SQL Tip: Self-Referential Tables

November 7, 2022

Tony Wallace

Note: The examples in this post are written for MSSQL. Syntax may differ for other SQL variants.

I was recently tasked with designing a SQL database for a new app. One of the requirements was to be able to store a category tree of unknown complexity and depth. It wasn't sufficient to have categories and subcategories; subcategories could also have subcategories, which could have subcategories, and so on. Each branch of the tree could have a different structure and continue to a different depth. An example tree would look something like this:

A nested List:

  • Category 1
    • Category 1.1
    • Category 1.2
      • Category 1.2.1
      • Category 1.2.2
    • Category 1.3
      • Category 1.3.1
        • Category 1.3.1.1
      • Category 1.3.2
  • Category 2
    • Category 2.1
      • Category 2.1.1
        • Category 2.1.1.1
        • Category 2.1.1.2
          • Category 2.1.1.2.1
    • Category 2.2
      • Category 2.2.1
      • Category 2.2.2
      • Category 2.2.3

It was clearly not going to be possible to satisfy these requirements with simple 'category' and 'subcategory' tables. I needed to be able to describe all the category relationships in one self-referential table.​

Designing the self-referential table

Relationships in a self-referential table are defined by two fields:

  • id (primary key, uniqueidentifier, not nullable)
  • parentId (foreign key, uniqueidentifier, nullable)

Note that 'parentId' is a foreign key that references the same table. If your table is called 'category', 'category.parentId' has a foreign key constraint to 'category.id'.

(You can also declare any other fields you might need to store the category name and other information. Those aren't relevant to this tip.)

When you insert a row into the table, you can set 'parentId' to 'null' to put the row at the top of the tree, or set 'parentId' to the 'id' of any other category to make the row a subcategory.

Selecting branches of the tree

Creating the tree is simple enough, but how do you select only the rows that are descendants of a given category id? For that, we can use a recursive common-table expression (recursive CTE).

The recursive CTE creates a virtual table ('[tree]') that is the union of the rows that have a given parent, the rows that have that first set of rows as parents, the rows that have that next set of rows as parents, and so on until there are no more descendent rows to fetch. We then select all rows from the virtual table:


	DECLARE @id UNIQUEIDENTIFIER;
	SET @id = '<id of the category whose descendants you want to select>';

	WITH [tree] AS (
	    SELECT parent.* FROM [category] parent WHERE parent.id = @id
	    UNION ALL
	    SELECT child.* FROM [tree]
	    INNER JOIN [category] child ON child.parentId = [tree].id
	)

	SELECT * from [tree]
  

We can modify the 'SELECT' statement to fetch only the bottom-most descendants of a given category, omitting the upper levels of the tree:


	DECLARE @id UNIQUEIDENTIFIER;
	SET @id = '<id of the category whose descendants you want to select>';

	WITH [tree] AS (
	    SELECT parent.* FROM [category] parent WHERE parent.id = @id
	    UNION ALL
	    SELECT child.* FROM [tree]
	    INNER JOIN [category] child ON child.parentId = [tree].id
	)

	SELECT * FROM [tree] WHERE NOT EXISTS (
	    SELECT parentId FROM [category] WHERE [category].parentId = [tree].id
	)

We can also invert the CTE to fetch categories that are ancestors of a given subcategory:


	DECLARE @id UNIQUEIDENTIFIER;
	SET @id = '<id of the category whose descendants you want to select>';

	WITH [tree] AS (
	    SELECT child.* FROM [category] child WHERE child.id = @id
	    UNION ALL
	    SELECT parent.* FROM [tree]
	    INNER JOIN [category] parent ON parent.id = [tree].parentId
	)

	SELECT * from [tree]


By modifying the 'SELECT' statement, we can find the top-most ancestor of any subcategory:


	DECLARE @id UNIQUEIDENTIFIER;
	SET @id = '<id of the category whose descendants you want to select>';

	WITH [tree] AS (
	    SELECT child.* FROM [category] child WHERE child.id = @id
	    UNION ALL
	    SELECT parent.* FROM [tree]
	    INNER JOIN [category] parent ON parent.id = [tree].parentId
	)

	SELECT * from [tree] WHERE parentId IS NULL
  

This combination of a self-referential table and recursive common-table expressions provided a great solution to my requirements. The model is straightforward, and easy to maintain, and the selection of entire branches can be performed in a single query.

Latest Posts

Be the first to know whats new at RedBit and in the industry. Plus zero spam.

Thank you! An email has been sent to confirm your subscription.
Oops! Something went wrong while submitting the form.

Let's Make Something Great Together

Contact Us