A question cropped up on the Oracle SQL and PL/SQL community forums regarding generating a JSON nested level structure that is hierarchical in nature, from hierarchical data – that typically comes in the form of an adjacency list.
So from this table data structure (PATH and LEVEL shown for clarity)…
ID PARENT_ID NAME PATH LVL
---------- ---------- ----- -------------------- ----------
1 Joe /Joe 1
2 1 Steve /Joe/Steve 2
3 1 Mandy /Joe/Mandy 2
4 2 Jeff /Joe/Steve/Jeff 3
5 3 Paul /Joe/Mandy/Paul 3
6 5 Sue /Joe/Mandy/Paul/Sue 4
etc...
To JSON like this…
{
"id" : 1,
"name" : "Joe",
"children" :
[
{
"id" : 2,
"name" : "Steve",
"children" :
[
{
"id" : 4,
"name" : "Jeff"
}
]
},
{
"id" : 3,
"name" : "Mandy",
"children" :
[
{
"id" : 5,
"name" : "Paul",
"children" :
[
{
"id" : 6,
"name" : "Sue"
},
{
"id" : 7,
"name" : "Marc"
}
]
}
]
}
]
}
Could this be solved using SQL alone? It’s an interesting problem that sounds simple, but it’s much more complex than it first appears.
Problems..
- The structure is recursive in nature, so any solution will have to rely on recursion or a stack.
- The standard approach you’d take is using nested aggregation JSON_ARRAYAGG coupled with JSON_OBJECT, yet Oracle’s standard recursive SQL approaches (connect by recursive queries or recursive subquery factoring queries) don’t support aggregation in the manner we would need to use it.
- If you were able to have aggregations in recursive SQL, to be able to nest, you’d have to start at the leaf nodes and work towards the root, but leaf nodes could be at different levels and each level could have any number of siblings. You’d have to aggregate siblings at each level under a parent, and those would contain all of their descendants too
- You couldn’t author a hard-coded static JSON generation structure of nested JSON_ARRAYAGG and JSON_OBJECT, as the structure could change in the source table, meaning the JSON is no longer correct.
- You could write code to dynamically generate the necessary SQL structure of nested JSON_ARRAYAGG and JSON_OBJECT, but with a large dataset, the resultant query could be huge, which makes this approach a non starter. The solution has to be scaleable.
A solution to this sort of problem was something that Odie had solved by writing a user-defined aggregate function “JSONNest” coupled with helper object types and PL/SQL package. It’s a clever solution, typical of Odie, but I wondered if the problem could be solved directly with SQL?
Table Setup
Here’s the setup data :
create table relationship (
id integer primary key,
parent_id integer,
name varchar2(20) not null,
constraint fk_parent foreign key (parent_id) references relationship (id)
)
/
insert into relationship (id, parent_id, name)
select 1, null, 'Joe' from dual union all
select 2, 1, 'Steve' from dual union all
select 3, 1, 'Mandy' from dual union all
select 4, 2, 'Jeff' from dual union all
select 5, 3, 'Paul' from dual union all
select 6, 5, 'Sue' from dual union all
select 7, 5, 'Marc' from dual
/
commit;
A Solution
Here’s the tweaked solution I arrived at for this blog based on that forum thread linked at the top. Note, Stew Ashton also produced a working solution for this problem, and we shared some similar ideas in our approaches. I tried to simplify my attempts and shamelessly merged the best ideas from both of us, so I must give him credit for some inspiration. Thanks Stew!
with rel_hier(id, parent_id, name, lvl) as (
select id, parent_id, name, 1 -- anchor
from relationship
where parent_id is null
union all
select n.id, n.parent_id, n.name, h.lvl + 1 -- recursion
from rel_hier h
join relationship n on n.parent_id = h.id
)
search depth first by id set rn -- depth first traversal order given by rn, siblings ordered by name
, rel_hier_with_leadlag as (
select r.*
, lag(lvl) over (order by rn) as lag_lvl -- The previous level in recursive traversal
, lead(lvl, 1, 1) over (order by rn) as lead_lvl -- The next level in recursive traversal, defaulted to 1 rather than null, as makes resolving closing tags easier
, json_object(
'id' value id
, 'name' value name
) jso
from rel_hier r
)
select
json_query( -- This line not required
xmlcast( -- Concatenate lines together, working around listagg 4000 byte limit
xmlagg(
xmlelement(e,
case
when lvl - lag_lvl = 1 then ',"children":[' -- Level incremented by one, so child level, start array
when lvl > 1 then ',' -- appending when not first level
end ||
substr(jso, 1, length(jso) - 1) || -- remove last brace, as we are controlling children
case
when lvl >= lead_lvl then '}' || -- Level same or greater than next level, so close json_object
rpad(' ', (lvl - lead_lvl) * 2 + 1, ']}') -- and add as many closing array / object blocks as required
end
)
order by rn
)
as clob
)
, '$' returning clob pretty) json_res
from rel_hier_with_leadlag
/
How It Works
The query has plenty of comments to describe how it works and is broken down into sections. It works around the limitations of recursive aggregation and Oracle JSON functions by constructing aspects of the JSON through string manipulation, but that seems one of the only options available. Anyway, in summary :
rel_hier
Performs a depth first recursive traversal (treewalk) of the data, ordering siblings by id and keeps track of the node order (rn) that the traversal occurred in (needed for processing children).
Note, you could use a connect by approach for the recursive traversal like this :
with rel_hier(id, parent_id, name, lvl, rn) as (
select id, parent_id, name, level, rownum -- rownum holds the traversal order
from relationship
start with parent_id is null -- start at root
connect by parent_id = prior id -- depth first recursive traversal
order siblings by id -- order siblings at each level by ID
)
But as Stew Ashton pointed out, if you are using ORDER SIBLINGS BY, it is an undocumented assumption that the ROWNUM will be returned in the correct order. Experience seems to show it is, but there is no guarantee in the documentation. Whereas the recursive subquery route does guarantee this, as would removing the ORDER SIBLINGS BY clause.
rel_hier_with_leadlag
Finds the previous and next node levels of each node (lag_lvl and lead_lvl), and constructs the json_object for each node. Note, the children key value content will be generated in the select, utilising the data fetched in this part.
select…
The JSON is built up utilising the json_object for each node in separate lines. I slice the closing brace “}” off the json_object if I know there are levels below it (children), so I can add them. I add the children and then work out how to close off the arrays and objects, based off the differences between the current level and the next level.
Don’t be confused by the xmlcast, xmlagg, xmlelement section – this is only used to concatenate all of the lines together into one JSON clob doc, but without the 4000 byte limit listagg provides. It uses the technique I describe in my post “A Solution to Listagg’s 4000 Byte Limit” Limit
The rpad(‘ ‘, (lvl – lead_lvl) * 2 + 1, ‘]}’) is simply used to calculate and produce how many closing array and object sections I need when the level lessens. As the structure can be jagged, this can be more than one.
json_query(… , ‘$’ returning clob pretty) isn’t really required, but it does beautify the JSON, formatting it nicely.
Ideally, Oracle would add functionality for this type of JSON structure problem, as it’s not that uncommon a structure, perhaps something similar to DBMS_XMLGen.NewcontextFromHierarchy (which does a similar thing for XML), or provide functionality similar to Odie’s JSONNest aggregation or allow aggregation in recursive structures, but for now that’s not the case.