One of the exclusive features of our worldCities database is the use of the pre-ordered tree traversal in the features_hierarchy table. Here is a quick look a how it works and what its advantages are.
(If you do not currently own worldCities but would like to see how it works in practice, just download the worldCities sample data for free)
Before getting our hands dirty with the pre-ordered tree traversal, let’s take a look at the classic parent-child relationship, which most developers are familiar with. A list of the continents and macro-regions according to the United Nations could look like this:
|10||8||Latin America and the Caribbean|
It is quite straightforward to fetch the list of the continents, for instance. You just have to query the database to fetch all records with parent_id=1 (World), and you’ll get Africa, Americas, Asia, etc…
Should you want the macro-regions as well, just write another query (or use a LEFT JOIN if you prefer a single query). If you have multiple or uneven depth levels, though, it can quickly become complicated to use a LEFT JOIN. Using individual queries and recursiveness for each depth level is not very efficient either.
Pre-ordered tree traversal
With the pre-ordered tree traversal, you add two values (left and right) to each record.
You just walk the tree and assign the left values one after the other until you hit the end of a branch, then assign the right value and go back then on to the next branch. While it might seem a little confusing at first, the principle is easier to grasp when seeing it, as in the following image, which shows how the left and right values are assigned.
There, we start with the left values for World, Africa and then Western Africa. This is the end of the branch, so we give it the right value and follow with Eastern Africa, etc… After Southern Africa gets its right value and the same is done to Africa, we follow the tree to the next continent, and so on. After all continents and regions have been assigned their left and right values, we just have to give World its right value (58).
Our continents and regions table now looks like this:
|10||8||17||24||Latin America and the Caribbean|
Why is it useful in practice? To me, the main reason is that it is easy to fetch multiples levels of data with a single query, thus reducing the load of the database server and speeding things up, as well as keeping cleaner code.
Fetch everything below a node
When you examine the left and right values, it quickly becomes obvious that:
- all records below a node have as left (and right) values that are bigger than that node’s left value;
- all records below a node have left (and right) values that are smaller than that node’s right value.
So, if for instance we would like to fetch Americas and all its descendants, we have to look for records where left between 14 and 25.
To fetch only the descendants, we would search for records where left between 15 and 25.
Fetch the path down to a node
Similarly, if you need the path down to a node, you’ll only have to look for records with:
- a left value that is smaller or equal to that node’s left value;
- a right value that is bigger or equal to that node’s right value.
To get the path down to the Caribbean, search for records where left<=22 and right>=23.
To fetch only the path without the Caribbean record itself, simply drop the equal sign and use where left<22 and right>23.
How many descendants does a node have?
As a last example to this quick introduction to the pre-ordered tree traversal, let’s count!
How many descendants a node has is very easy to find out with this simple formula:
(right – left – 1) / 2
The pre-ordered tree traversal is very powerful and easy to use when dealing with multiple depth levels and whole branches of hierarchical data. And as you noticed, whether working with Africa or the Americas, which has one more depth level, the queries are exactly the same! There no need to write your queries differently depending on the depth of the data or even where’s your starting point in the tree.
However, it is sometimes easier to use the good old parent-child relationships, as when looking only for the direct parent or children of a node instead of the whole ascendance or descendance, respectively. That’s why you can use both methods with our worldCities database.
To see how to use hierarchical data in practice, take a look at the worldCities sample queries, some of which use the classic parent-child relationship and others the preorder tree traversal.