Querying nested JSON in Python with recursive programming.
As we are requesting REST API heavily nowadays, we need to deal with JSON frequently. JSON objects can be transfer into Python dictionaries very easily by the json
library. Yet, it is quite tedious to query nested JSON if it contains many layers.
= {
d "layer1_item1": {"layer2_item1": {"layer3_item1": "some_info", "layer3_item2": "another_info"}},
"layer1_item2": {"layer2_item1": {"layer3_item1": "some_info", "layer3_item2": "another_info"}}
}
For example if we want to query layer3_item2
from the above JSON,
"layer1_item1").get("layer2_item1").get("layer3_item2") d.get(
'another_info'
It involves so much brackets and quotes. I have written a recursive function to query these nested JSON dictionary in Python. But we will first take a look what a recursive functions is.
A recursive function means the function execute itself within its own definition. A simple but endless definition is shown below.
def foo():
foo()
The function foo
calls itself within its own definition. It simply creates a loop (but an endless one in the above example).
We now take a look on a practical factorial function. Recall the factorial of \(n\) is given by
\[n! = n \cdot (n - 1) \cdot (n - 2) \cdots 3 \cdot 2 \cdot 1.\]
In Python, we can illustrate this by
def factorial(n):
if n == 1:
return n
return n * factorial(n - 1)
5) factorial(
120
The following function can query nested dictonary with syntax like key1.key2.key3
instead of multiple get
, brackets and quotes. See some examples in the coming section.
from typing import Any, Optional
def pjq(
'list | dict',
json_dict: 'list[str] | str',
query: = None,
default: Optional[Any] str = ".",
sep: str = ",",
idx_sep: bool = True,
trim: str] = None
prev_q: Optional[-> Any:
) = query.split(sep) if isinstance(query, str) else query
query # Cannot pop query index otherwise affecting the for loop in list
str = query[0]
q: list[str] = query[1:]
query: if json_dict == default:
# If `default` is set to a list of dict, it cannot go through this
return default
elif isinstance(json_dict, dict):
= json_dict.get(q, default)
json_dict
if query:
return pjq(json_dict, query, default, sep, idx_sep, trim=trim, prev_q=q)
return json_dict
elif isinstance(json_dict, list):
if q:
try:
list[int] = [int(i) for i in q.split(idx_sep)]
idx: = [json_dict[i] for i in idx]
json_dict except Exception:
return default
if query:
= [pjq(jd, query, default, sep, idx_sep, trim=trim, prev_q=q) for jd in json_dict]
json_dict
if trim:
= json_dict[0] if len(json_dict) == 1 else json_dict
json_dict
return json_dict
else:
return None
= {
d "a": {
"b": {"b1": 1, "b2": 2},
"c": {"c1": 3, "c2": 4}
} }
"a.b.b2") pjq(d,
2
= {
d "a": [
"b1": 1, "b2": 2},
{"b1": 3, "b2": 4},
{"b1": 5, "b2": 6}
{
] }
"a.1,2.b2") pjq(d,
[4, 6]
"a..b2") pjq(d,
[2, 4, 6]
= [
d "a": {"b": {"x": 1}, "c": {"y": 3}}},
{"a": {"b": {"x": 2}, "c": {"y": 4}}}
{ ]
".a.c.y") pjq(d,
[3, 4]