Preorder tree traversal

One of the exclusive features of our worldCities database is the use of the pre-ordered tree traversal in the features_hierarchy table. Here is a quick look a how it works and what its advantages are.
(If you do not currently own worldCities but would like to see how it works in practice, just download the worldCities sample data for free)

Parent-child relationship

Before getting our hands dirty with the pre-ordered tree traversal, let’s take a look at the classic parent-child relationship, which most developers are familiar with. A list of the continents and macro-regions according to the United Nations could look like this:

feature_idparent_idname
10World
21Africa
32Western Africa
42Eastern Africa
52Northern Africa
62Middle Africa
72Southern Africa
81Americas
98Northern America
108Latin America and the Caribbean
1110South America
1210Central America
1310Caribbean
141Asia
.........

It is quite straightforward to fetch the list of the continents, for instance. You just have to query the database to fetch all records with parent_id=1 (World), and you’ll get Africa, Americas, Asia, etc…

Should you want the macro-regions as well, just write another query (or use a LEFT JOIN if you prefer a single query). If you have multiple or uneven depth levels, though, it can quickly become complicated to use a LEFT JOIN. Using individual queries and recursiveness for each depth level is not very efficient either.

Pre-ordered tree traversal

With the pre-ordered tree traversal, you add two values (left and right) to each record.
You just walk the tree and assign the left values one after the other until you hit the end of a branch, then assign the right value and go back then on to the next branch. While it might seem a little confusing at first, the principle is easier to grasp when seeing it, as in the following image, which shows how the left and right values are assigned.

modified preordered tree traversal sample

There, we start with the left values for World, Africa and then Western Africa. This is the end of the branch, so we give it the right value and follow with Eastern Africa, etc… After Southern Africa gets its right value and the same is done to Africa, we follow the tree to the next continent, and so on. After all continents and regions have been assigned their left and right values, we just have to give World its right value (58).

Our continents and regions table now looks like this:

feature_idparent_idleftrightname
10158World
21213Africa
3234Western Africa
4256Eastern Africa
5278Northern Africa
62910Middle Africa
721112Southern Africa
811425Americas
981516Northern America
1081724Latin America and the Caribbean
11101819South America
12102021Central America
13102223Caribbean
1412637Asia
...............

Why is it useful in practice? To me, the main reason is that it is easy to fetch multiples levels of data with a single query, thus reducing the load of the database server and speeding things up, as well as keeping cleaner code.

Fetch everything below a node

When you examine the left and right values, it quickly becomes obvious that:

  1. all records below a node have as left (and right) values that are bigger than that node’s left value;
  2. all records below a node have left (and right) values that are smaller than that node’s right value.

So, if for instance we would like to fetch Americas and all its descendants, we have to look for records where left between 14 and 25.
To fetch only the descendants, we would search for records where left between 15 and 25.

Fetch the path down to a node

Similarly, if you need the path down to a node, you’ll only have to look for records with:

  1. a left value that is smaller or equal to that node’s left value;
  2. a right value that is bigger or equal to that node’s right value.

To get the path down to the Caribbean, search for records where left<=22 and right>=23.
To fetch only the path without the Caribbean record itself, simply drop the equal sign and use where left<22 and right>23.

How many descendants does a node have?

As a last example to this quick introduction to the pre-ordered tree traversal, let’s count!
How many descendants a node has is very easy to find out with this simple formula:
(right – left – 1) / 2

Conclusion

The pre-ordered tree traversal is very powerful and easy to use when dealing with multiple depth levels and whole branches of hierarchical data. And as you noticed, whether working with Africa or the Americas, which has one more depth level, the queries are exactly the same! There no need to write your queries differently depending on the depth of the data or even where’s your starting point in the tree.

However, it is sometimes easier to use the good old parent-child relationships, as when looking only for the direct parent or children of a node instead of the whole ascendance or descendance, respectively. That’s why you can use both methods with our worldCities database.

In worldCities

To see how to use hierarchical data in practice, take a look at the worldCities sample queries, some of which use the classic parent-child relationship and others the preorder tree traversal.