Recursive Programming - Querying nested JSON in Python

python recursive-programming

Querying nested JSON in Python with recursive programming.

Wilson Yip
2023-08-08

Introduction

As we are requesting REST API heavily nowadays, we need to deal with JSON frequently. JSON objects can be transfer into Python dictionaries very easily by the json library. Yet, it is quite tedious to query nested JSON if it contains many layers.

d = {
    "layer1_item1": {"layer2_item1": {"layer3_item1": "some_info", "layer3_item2": "another_info"}},
    "layer1_item2": {"layer2_item1": {"layer3_item1": "some_info", "layer3_item2": "another_info"}}
}

For example if we want to query layer3_item2 from the above JSON,

d.get("layer1_item1").get("layer2_item1").get("layer3_item2")
'another_info'

It involves so much brackets and quotes. I have written a recursive function to query these nested JSON dictionary in Python. But we will first take a look what a recursive functions is.

Recursive function

A recursive function means the function execute itself within its own definition. A simple but endless definition is shown below.

def foo():
    foo()

The function foo calls itself within its own definition. It simply creates a loop (but an endless one in the above example).

We now take a look on a practical factorial function. Recall the factorial of \(n\) is given by

\[n! = n \cdot (n - 1) \cdot (n - 2) \cdots 3 \cdot 2 \cdot 1.\]

In Python, we can illustrate this by

def factorial(n):
    if n == 1:
        return n
    return n * factorial(n - 1)
factorial(5)
120

Querying nested JSON dictionary

The following function can query nested dictonary with syntax like key1.key2.key3 instead of multiple get, brackets and quotes. See some examples in the coming section.

from typing import Any, Optional

def pjq(
    json_dict: 'list | dict', 
    query: 'list[str] | str', 
    default: Optional[Any] = None, 
    sep: str = ".", 
    idx_sep: str = ",", 
    trim: bool = True, 
    prev_q: Optional[str] = None
) -> Any:
    query = query.split(sep) if isinstance(query, str) else query
    # Cannot pop query index otherwise affecting the for loop in list
    q: str = query[0]
    query: list[str] = query[1:]
    if json_dict == default:
        # If `default` is set to a list of dict, it cannot go through this
        return default

    elif isinstance(json_dict, dict):
        json_dict = json_dict.get(q, default)
        
        if query:
            return pjq(json_dict, query, default, sep, idx_sep, trim=trim, prev_q=q)
        return json_dict

    elif isinstance(json_dict, list):
        if q:
            try:
                idx: list[int] = [int(i) for i in q.split(idx_sep)]
                json_dict = [json_dict[i] for i in idx]
            except Exception:
                return default
        if query:
            json_dict = [pjq(jd, query, default, sep, idx_sep, trim=trim, prev_q=q) for jd in json_dict]

        if trim:
            json_dict = json_dict[0] if len(json_dict) == 1 else json_dict

        return json_dict

    else:
        return None 

Examples

Example 1

d = {
    "a": {
        "b": {"b1": 1, "b2": 2},
        "c": {"c1": 3, "c2": 4}
    }
}
pjq(d, "a.b.b2")
2

Example 2

d = {
    "a": [
        {"b1": 1, "b2": 2},
        {"b1": 3, "b2": 4},
        {"b1": 5, "b2": 6}
    ]
}
pjq(d, "a.1,2.b2")
[4, 6]
pjq(d, "a..b2")
[2, 4, 6]

Example 3

d = [
    {"a": {"b": {"x": 1}, "c": {"y": 3}}},
    {"a": {"b": {"x": 2}, "c": {"y": 4}}}
]
pjq(d, ".a.c.y")
[3, 4]