|
Querying Hierarchies in Oracle
Hierarchies , when the number of levels is not fixed, because the only possible
implementation then is a recursive table relationship . If you're not using
Oracle, "painful" might be a better word choice. I've not seen a relational
database other than Oracle that provides support for querying hierarchical data.
If your database doesn't provide such support, you are forced to write recursive
code yourself. For example, to get the first level of components that make up an
airplane, you could issue the following query:
SELECT assembly_id, assembly_name
FROM bill_of_materials
WHERE parent_assembly = 200;
ASSEMBLY_ID ASSEMBLY_NAME
----------- ---------------
201 Jet Engine
202 Left Wing
203 Right Wing
204 Body |
Four rows come back. Each of these four assemblies might (or might not) in
turn be composed of other assemblies; thus, you must issue four more queries in
the same form:
SELECT assembly_id, assembly_name
FROM bill_of_materials
WHERE parent_assembly = 201;
SELECT assembly_id, assembly_name
FROM bill_of_materials
WHERE parent_assembly = 202;
etc. |
Painful! Think of all the network round trips required to execute these
queries and bring back each level of results. Consider the performance impact of
all those round trips. And imagine having to write and debug the code to do all
this work.
Oracle Provides a Better Way
Oracle has long provided specific support for querying hierarchical data. This
support comes in the form of the START WITH and CONNECT BY clauses of the SELECT
statement, which allow you to easily query hierarchical data in which each child
row contains a link back to its parent.
To issue a hierarchical query, you must know two things about your data: the
conditions identifying root rows (those at the top of a hierarchy) and the
column or columns in a child row that point to its parent.
In the bill_of_materials table used for the examples in this article, the
parent_assembly column contains an assembly ID value identifying the parent of
each row. For example, assembly 201 is a jet engine. The parent_assembly value
for that row is 200, indicating that the jet engine is part of an airplane. The
parent_assembly for airplane is NULL, indicating that the airplane is the final
product.
Begin writing a hierarchical query by using the START WITH clause to identify
the root nodes of interest. The clause in the following query tells Oracle to
select all rows from bill_of_materials having a NULL parent_assembly. Each of
those rows will then be treated as the root node of a tree growing downward, and
all the child nodes for each tree and subtree and so forth will also be
retrieved.
SELECT assembly_id, assembly_name, parent_assembly
FROM bill_of_materials
START WITH parent_assembly IS NULL |
Next, use the CONNECT BY clause to define the relationship between child and
parent rows. In the bill_of_materials table, the parent_assembly column holds
the parent row's assembly_id value. This leads to the CONNECT BY clause shown
here:
SELECT assembly_id, assembly_name, parent_assembly
FROM bill_of_materials
START WITH parent_assembly IS NULL
CONNECT BY parent_assembly = PRIOR assembly_id; |
The PRIOR keyword, which is really an operator, and not a keyword such as
SELECT or CONNECT BY, is critical to the operation of the query. PRIOR is
designed specifically for use in hierarchical queries. Its purpose is to return
a column value from a given row's parent row. In this case, the CONNECT BY
clause specifies that for a given row, the parent row is the one for which the
prospective parent's assembly_id value matches the child's parent_assembly
value. In effect, CONNECT BY specifies a recursive join. Listing 1 shows the
output from this query.
In Listing 1, "Automobile" is the first root node found while executing the
query, so it's listed first in the output. Next comes "Automobile's" first
child, the "Combustion Engine." You then see that "Piston," "Air Filter," and
"Spark Plug" are components of the engine. "Airplane" is the second root node
found, and its assemblies, subassemblies, and parts are also listed in
hierarchical order.
Recursive Joins
Before going deeper into Oracle's support for hierarchical queries, I want to
stop and talk more about the just what the CONNECT BY clause is, and why it's
important. If you think about it, from a certain point of view the CONNECT BY
clause is nothing more than a join specification. Imagine for a moment you were
going to join the bill_of_materials table to itself and return each row's
immediate parent. To that end, you could write:
SELECT bom1.assembly_id, bom1.assembly_name,
bom2.assembly_id parent
FROM bill_of_materials bom1 LEFT OUTER JOIN bill_of_materials bom2
ON bom1.parent_assembly = bom2.assembly_id;
ASSEMBLY_ID ASSEMBLY_NAME PARENT
----------- ----------------------- ----------
130 Interior 100
120 Body 100
110 Combustion Engine 100
115 Starter System 110
114 Block 110
... |
Doesn't the output from this query look suspiciously like that of the query
shown in the previous section? I'll save you the trouble of checking, and tell
you that this query returns the same 42 rows as the previous query. The only
difference is that those rows are ordered differently. But the order of the rows
matters! The order in which rows are returned from the self-join solution does
not reflect the hierarchy.
Many problems cannot be solved via a self-join. For example, if you were to use
a self-join, you wouldn't be able to select just the assemblies and parts that
make up an airplane. In fact, the self-join shown here is next to useless when
querying hierarchical data. What you need is a recursive self-join: The root row
is joined to its children, each of those child rows is joined to its children,
and so forth on down the hierarchy. The database can be aware of the
hierarchical nature of the data, return results in a useful order, return useful
information about a row's position in its hierarchy, and enable you to work with
nodes in a top-down fashion. CONNECT BY gives you all these things.
To convert the preceding self-join to a recursive join, begin by specifying the
table only once in the FROM clause:
FROM bill_of_materials
The ON clause goes away, and is replaced by CONNECT BY. The reference to
bom1.parent_assembly becomes simply parent_assembly. The two references to
bom2.assembly_id, which are references to a value in the parent table, become
PRIOR assembly_id. Add in the necessary START WITH clause, and you have the
following:
SELECT assembly_id, assembly_name, PRIOR assembly_id parent
FROM bill_of_materials
START WITH parent_assembly IS NULL
CONNECT BY parent_assembly = PRIOR assembly_id; |
This query returns the same 42 rows as the preceding self-join. However, the
order of those rows reflects the hierarchy, and because CONNECT BY is used.
|