A Python dict accessible by dot notation, similar to how objects are accessed in JavaScript. It behaves like a regular dict, adding a more convenient notation and a find function that helps with data exploration. It is particularly useful when dealing with a large JSON document that you don't know much about.
Implemented by overriding some built-in methods of the regular Python dict, it has no impact on performance or memory footprint.
from dictdot import dictdot
d = dictdot()
# Get and set items either by dot or regular notation.
d["foo"] = 1
d.bar = 1
assert d.foo == d["bar"]
# Nested dicts are also converted to dictdot.
d.bar = [{"baz": 1}]
assert type(d.bar[0]) is dictdot
# Find elements by key or value.
assert [".foo", ".bar[0].baz"] == list(d.find(value=1))
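As a rough illustration of what "overriding some built-in methods" can look like, here is a minimal sketch of a dict subclass with dot access. It is not the dictdot source code: the class name DotDict and the helper _wrap are hypothetical, and choices such as raising AttributeError for missing keys are assumptions made only for this example.

```python
class DotDict(dict):
    """Hypothetical minimal dict subclass with attribute-style access."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods
        # such as .keys() or .items() keep working as usual.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Route attribute assignment to item assignment.
        self[name] = value

    def __setitem__(self, key, value):
        # Convert nested dicts (also inside lists) on the way in.
        super().__setitem__(key, _wrap(value))


def _wrap(value):
    """Recursively turn plain dicts into DotDict (assumed behavior)."""
    if isinstance(value, dict) and not isinstance(value, DotDict):
        out = DotDict()
        for k, v in value.items():
            out[k] = v
        return out
    if isinstance(value, list):
        return [_wrap(v) for v in value]
    return value


# Start from an empty instance: dict.__init__ would bypass __setitem__.
d = DotDict()
d["foo"] = 1
d.bar = [{"baz": 1}]
assert d.foo == 1
assert type(d.bar[0]) is DotDict
assert d.bar[0].baz == 1
```

The sketch only converts values assigned after construction. A complete implementation would also need to handle data passed to the constructor, update(), and similar entry points.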
Now let's see a more illustrative use case. We'll be using this ~25MB JSON file. If you have curl installed, you can download it from your terminal as follows:
curl -JO https://raw.githubusercontent.com/json-iterator/test-data/master/large-file.json
The following Python code can be run as is if you have dictdot installed and large-file.json in the current directory. It loads the file into memory, converts it to dictdot, and performs some find tasks on it, measuring the time of each step. The times shown in the comments are from running it on my 2018 laptop (2.6GHz CPU):
import json
import time
from dictdot import dictdot
def get_time(old_t=None):
    new_t = time.time()
    if old_t:
        print(f"Time: {new_t - old_t:.2f} sec\n")
    return new_t
t = get_time()
print("Load data from file.")
with open("large-file.json") as f:
    data = json.load(f)
t = get_time(t)
# Time: 0.36 sec
print("Convert to `dictdot`.")
data = [dictdot(d) for d in data]
t = get_time(t)
# Time: 4.67 sec
print("List all keys.")
ks = list(dictdot.find(data))
print(f"{len(ks)} keys found.")
t = get_time(t)
# 626243 keys found.
# Time: 5.53 sec
print("Find values by function.")
vs = list(dictdot.find(data, value=lambda v: type(v) is str and " dict " in v))
print(f"{len(vs)} values found.")
t = get_time(t)
# 2 values found.
# Time: 0.62 sec
print("Convert back to dict.")
dic = [d.as_dict() for d in data]
t = get_time(t)
# Time: 0.38 sec
In the example above, the variable vs contains paths that represent items within data. These items match the condition of being strings that contain the substring " dict " (with a leading and a trailing space). Let's check them:
vs[0]
# '[169].payload.comment.body'
data[169].payload.comment.body
# "What about making the combined dict a local variable, like...
Which prefer?
vs[1]
# '[1275].payload.commits[0].message'
assert data[1275].payload.commits[0].message is \
    data[1275]["payload"]["commits"][0]["message"]
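If you want to follow such a path string programmatically instead of typing it out, a small helper can do it. The sketch below is not part of dictdot: resolve_path is a hypothetical function, and it assumes that paths only contain ".key" and "[index]" steps and that keys contain no dots or brackets. It works on plain dicts and lists as well.

```python
import re

# One path step: either ".key" or "[index]".
_TOKEN = re.compile(r"\.([^.\[\]]+)|\[(\d+)\]")


def resolve_path(root, path):
    """Follow a path such as '[0].payload.commits[0].message' from root."""
    current = root
    pos = 0
    while pos < len(path):
        match = _TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"Cannot parse path at position {pos}: {path!r}")
        key, index = match.groups()
        # Exactly one of the two groups is set for each step.
        current = current[key] if key is not None else current[int(index)]
        pos = match.end()
    return current


# Made-up sample data with the same shape as the paths shown above.
sample = [{"payload": {"commits": [{"message": "fix typo"}]}}]
assert resolve_path(sample, "[0].payload.commits[0].message") == "fix typo"
```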
- Can initialize dictdot the same way as dict:

      a = ["foo", "bar", "baz"]
      b = range(3)
      d = dictdot(zip(a, b))

- Access items by dot notation, or as a dict:

      assert d.foo < d["bar"] < d.get("baz")

- Also when setting an item:

      d.bar = {
          "fee": 1,
          "boo": None,
      }

- Convert to regular dict:

      d2 = d.as_dict()
      assert d == d2
      assert type(d2) is dict

- Function to find keys and values nested in the dict structure:

      # Find every key equal to "foo".
      assert list(d.find(key="foo")) == [".foo"]
      # Find every value that bool-evaluates to False.
      assert list(d.find(value=lambda v: not v)) == [".foo", ".bar.boo"]
      # Both key and value must evaluate to True.
      assert list(d.find(key="bar", value=1)) == []
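To make the find semantics more concrete, here is a minimal sketch of a recursive search over plain dicts and lists that yields paths in the same format. It is an approximation written for this explanation, not the dictdot implementation: find_paths, _matches, and the _ANY sentinel are hypothetical names, and the way a filter is treated as "not given" is an assumption.

```python
_ANY = object()  # sentinel meaning "no filter given" (assumption)


def _matches(criterion, candidate):
    """A criterion is either absent, a predicate function, or a value to compare."""
    if criterion is _ANY:
        return True
    if callable(criterion):
        return bool(criterion(candidate))
    return criterion == candidate


def find_paths(obj, key=_ANY, value=_ANY, path=""):
    """Yield the path of every dict item whose key and value both match."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            child = f"{path}.{k}"
            if _matches(key, k) and _matches(value, v):
                yield child
            # Keep descending, whether or not this item matched.
            yield from find_paths(v, key, value, child)
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            yield from find_paths(v, key, value, f"{path}[{i}]")


d = {"foo": 0, "bar": {"fee": 1, "boo": None}, "baz": 2}
assert list(find_paths(d, key="foo")) == [".foo"]
assert list(find_paths(d, value=lambda v: not v)) == [".foo", ".bar.boo"]
assert list(find_paths(d, key="bar", value=1)) == []
```

Writing it as a generator means paths are produced lazily, so a caller can stop after the first match without walking the whole structure.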
See the tests in the source code for more details about the behavior and usage.