I have been inactive on this forum for a long time now, tackled quite a bit of challenging projects and did not get to shell out time for blog updates. It is time to make up for that now.
As a thank you note to all my followers, I am planning to explain and explore "Recursive SQL" which most candidates I have interviewed so far don't have a clear understanding of. So I presume this topic is an ideal candidate to discuss here.
What is Recursive query and when do you typically need to use it?
Recursion is a process of querying iteratively, and to perform a function multiple times on the data of interest. It comes in handy when you need to access hierarchies of data. For example, when you need to access multi-categorical data (with multiple levels), such as Home Furnishings -> Dining -> Tables. OR looking to navigate organizational structure : CEO -> Senior VP -> VP -> Senior Director etc. You get the idea?
Knowing when to make use of Recursion is nearly as important as understanding the processes of recursion itself. So, think about the result you need and list down the options you have to achieve that result. There could be other straight forward approaches to solving a problem than recursion, so choose carefully. Note that for illustration purposes, I have used a simple dataset, whose result may be obtained in different ways. I have used recursion merely to show a working example.
Components of a recursive SQL:
1. Seed
2. Recursion
3. Termination and final output
Seed:
This is the portion of the SQL that bring the subset of data, on which recursion is to be performed.
Recursion:
Operation to be performed multiple times on the seed
Termination:
Final output after the recursion completes
Keywords:
WITH RECURSIVE (parameters that need to be passed to recursion) - to invoke recursion
UNION ALL - to combine the intermediate results together
Syntax:
Example:
Illustrating an example of going through a hierarchy from top to bottom. Home & Garden has 2 subcategories: Dining and Outdoor, who in turn have their own subcategory: table, chair, hammock. We want to navigate through each of this category from top to the lowest available level.
Input:
Output:
Working SQL emulating above:
As a thank you note to all my followers, I am planning to explain and explore "Recursive SQL" which most candidates I have interviewed so far don't have a clear understanding of. So I presume this topic is an ideal candidate to discuss here.
What is Recursive query and when do you typically need to use it?
Recursion is a process of querying iteratively, and to perform a function multiple times on the data of interest. It comes in handy when you need to access hierarchies of data. For example, when you need to access multi-categorical data (with multiple levels), such as Home Furnishings -> Dining -> Tables. OR looking to navigate organizational structure : CEO -> Senior VP -> VP -> Senior Director etc. You get the idea?
Knowing when to make use of Recursion is nearly as important as understanding the processes of recursion itself. So, think about the result you need and list down the options you have to achieve that result. There could be other straight forward approaches to solving a problem than recursion, so choose carefully. Note that for illustration purposes, I have used a simple dataset, whose result may be obtained in different ways. I have used recursion merely to show a working example.
Components of a recursive SQL:
1. Seed
2. Recursion
3. Termination and final output
Seed:
This is the portion of the SQL that bring the subset of data, on which recursion is to be performed.
Recursion:
Operation to be performed multiple times on the seed
Termination:
Final output after the recursion completes
Keywords:
WITH RECURSIVE (parameters that need to be passed to recursion) - to invoke recursion
UNION ALL - to combine the intermediate results together
Syntax:
WITH RECURSIVE
<REC_NAME>(<COLUMNS
TO BE PASSED TO RECURSION SEPARATED BY COMMA>)
AS (
--SEED
(<SEED
SQL>
UNION ALL
--RECURSION
<RECURSIVE SQL
CREATING INTERMEDIATE RESULTS>
)
--FINAL OUTPUT
SEL <OUTPUT
COLUMNNAMES>
FROM <REC_NAME>;
Example:
Illustrating an example of going through a hierarchy from top to bottom. Home & Garden has 2 subcategories: Dining and Outdoor, who in turn have their own subcategory: table, chair, hammock. We want to navigate through each of this category from top to the lowest available level.
Input:
Output:
Working SQL emulating above:
--SAMPLE DATA TO TEST RECURSION
CREATE VOLATILE
TABLE CATEG
(ITEM_ID INTEGER,
CATEGORY VARCHAR(100),
SUBCATEGORY VARCHAR(100)
)PRIMARY
INDEX(ITEM_ID)
ON COMMIT
PRESERVE ROWS;
INSERT INTO
CATEG
SEL 100,'HOME & GARDEN','DINING';
INSERT INTO
CATEG
SEL 101,'DINING','TABLE';
INSERT INTO
CATEG
SEL 100,'HOME & GARDEN','OUTDOOR';
INSERT INTO
CATEG
SEL 102,'OUTDOOR','HAMMOCK';
INSERT INTO
CATEG
SEL 103,'DINING','CHAIR';
--EXECUTE TO SEE THE RESULT OF
RECURSIVE SQL
WITH RECURSIVE
REC_NM(CATEGORY,SUBCATEGORY, LVL)
AS
--SEED
(SELECT
CATEGORY,SUBCATEGORY,
1
FROM
CATEG
UNION ALL
--RECURSION
SELECT A.CATEGORY,A.CATEGORY||':'||A.SUBCATEGORY||':'||B.SUBCATEGORY, LVL+1
FROM
REC_NM A, CATEG B
WHERE A.SUBCATEGORY = B.CATEGORY
)
--FINAL OUTPUT
SELECT CATEGORY,SUBCATEGORY
FROM REC_NM
QUALIFY RANK() OVER(ORDER BY LVL DESC) = 1;