RSS

Querying Hierarchies in Oracle

10 Jul

Querying Hierarchies in Oracle

Hierarchies , when the number of levels is not fixed, because the only possible implementation then is a recursive table relationship . If you’re not using Oracle, “painful” might be a better word choice. I’ve not seen a relational database other than Oracle that provides support for querying hierarchical data. If your database doesn’t provide such support, you are forced to write recursive code yourself. For example, to get the first level of components that make up an airplane, you could issue the following query:

SELECT assembly_id, assembly_name
FROM bill_of_materials
WHERE parent_assembly = 200;

ASSEMBLY_ID ASSEMBLY_NAME

———– —————
201 Jet Engine
202 Left Wing
203 Right Wing
204 Body

Four rows come back. Each of these four assemblies might (or might not) in turn be composed of other assemblies; thus, you must issue four more queries in the same form:

SELECT assembly_id, assembly_name
FROM bill_of_materials
WHERE parent_assembly = 201;

SELECT assembly_id, assembly_name
FROM bill_of_materials
WHERE parent_assembly = 202;

etc.

Painful! Think of all the network round trips required to execute these queries and bring back each level of results. Consider the performance impact of all those round trips. And imagine having to write and debug the code to do all this work.

Oracle Provides a Better Way

Oracle has long provided specific support for querying hierarchical data. This support comes in the form of the START WITH and CONNECT BY clauses of the SELECT statement, which allow you to easily query hierarchical data in which each child row contains a link back to its parent.

To issue a hierarchical query, you must know two things about your data: the conditions identifying root rows (those at the top of a hierarchy) and the column or columns in a child row that point to its parent.

In the bill_of_materials table used for the examples in this article, the parent_assembly column contains an assembly ID value identifying the parent of each row. For example, assembly 201 is a jet engine. The parent_assembly value for that row is 200, indicating that the jet engine is part of an airplane. The parent_assembly for airplane is NULL, indicating that the airplane is the final product.

Begin writing a hierarchical query by using the START WITH clause to identify the root nodes of interest. The clause in the following query tells Oracle to select all rows from bill_of_materials having a NULL parent_assembly. Each of those rows will then be treated as the root node of a tree growing downward, and all the child nodes for each tree and subtree and so forth will also be retrieved.

SELECT assembly_id, assembly_name, parent_assembly
FROM bill_of_materials
START WITH parent_assembly IS NULL

Next, use the CONNECT BY clause to define the relationship between child and parent rows. In the bill_of_materials table, the parent_assembly column holds the parent row’s assembly_id value. This leads to the CONNECT BY clause shown here:

SELECT assembly_id, assembly_name, parent_assembly
FROM bill_of_materials
START WITH parent_assembly IS NULL
CONNECT BY parent_assembly = PRIOR assembly_id;

The PRIOR keyword, which is really an operator, and not a keyword such as SELECT or CONNECT BY, is critical to the operation of the query. PRIOR is designed specifically for use in hierarchical queries. Its purpose is to return a column value from a given row’s parent row. In this case, the CONNECT BY clause specifies that for a given row, the parent row is the one for which the prospective parent’s assembly_id value matches the child’s parent_assembly value. In effect, CONNECT BY specifies a recursive join. Listing 1 shows the output from this query.

In Listing 1, “Automobile” is the first root node found while executing the query, so it’s listed first in the output. Next comes “Automobile’s” first child, the “Combustion Engine.” You then see that “Piston,” “Air Filter,” and “Spark Plug” are components of the engine. “Airplane” is the second root node found, and its assemblies, subassemblies, and parts are also listed in hierarchical order.

Recursive Joins

Before going deeper into Oracle’s support for hierarchical queries, I want to stop and talk more about the just what the CONNECT BY clause is, and why it’s important. If you think about it, from a certain point of view the CONNECT BY clause is nothing more than a join specification. Imagine for a moment you were going to join the bill_of_materials table to itself and return each row’s immediate parent. To that end, you could write:

SELECT bom1.assembly_id, bom1.assembly_name, bom2.assembly_id parent
FROM bill_of_materials bom1 LEFT OUTER JOIN bill_of_materials bom2
ON bom1.parent_assembly = bom2.assembly_id;

ASSEMBLY_ID ASSEMBLY_NAME PARENT
———– ———————– ———-

130 Interior 100
120 Body 100
110 Combustion Engine 100
115 Starter System 110
114 Block 110

Doesn’t the output from this query look suspiciously like that of the query shown in the previous section? I’ll save you the trouble of checking, and tell you that this query returns the same 42 rows as the previous query. The only difference is that those rows are ordered differently. But the order of the rows matters! The order in which rows are returned from the self-join solution does not reflect the hierarchy.

Many problems cannot be solved via a self-join. For example, if you were to use a self-join, you wouldn’t be able to select just the assemblies and parts that make up an airplane. In fact, the self-join shown here is next to useless when querying hierarchical data. What you need is a recursive self-join: The root row is joined to its children, each of those child rows is joined to its children, and so forth on down the hierarchy. The database can be aware of the hierarchical nature of the data, return results in a useful order, return useful information about a row’s position in its hierarchy, and enable you to work with nodes in a top-down fashion. CONNECT BY gives you all these things.

To convert the preceding self-join to a recursive join, begin by specifying the table only once in the FROM clause:

FROM bill_of_materials

The ON clause goes away, and is replaced by CONNECT BY. The reference to bom1.parent_assembly becomes simply parent_assembly. The two references to bom2.assembly_id, which are references to a value in the parent table, become PRIOR assembly_id. Add in the necessary START WITH clause, and you have the  following:

SELECT assembly_id, assembly_name, PRIOR assembly_id parent
FROM bill_of_materials
START WITH parent_assembly IS NULL
CONNECT BY parent_assembly = PRIOR assembly_id;

This query returns the same 42 rows as the preceding self-join. However, the order of those rows reflects the hierarchy, and because CONNECT BY is used.

Advertisements
 
Leave a comment

Posted by on July 10, 2009 in Database Administration

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: