Using Python to flatten and unpack dictionaries

I am using Python to flatten and unpack JSON structures. I have already figured out flattening and can flatten JSON files into dictionary structures like this:
# Given the JSON
{
    "a": "thing",
    "b": {
        "c": "foo"
    },
    "array": [
        "item1",
        "item2"
    ]
}
Then flatten() it into:
{
    "a": "thing",
    "b.c": "foo",
    "array.[0]": "item1",
    "array.[1]": "item2"
}
But any ideas on how to unpack those flattened dicts back into the original JSON? I had an idea on how to do it using string.split() on the key names, but the arrays complicated things and now I don't know how to go about it. The trouble is that arrays can contain items that are themselves arrays or dicts. I am guessing something recursive?
UPDATE: So I have been looking around for packages that unflatten (or flatten + unflatten, I don't care) and I found this one, which works well except it can't handle paths that include the separator character as part of the key name.
For example, I had a path that flattened down into REG_SRC.http://www.awebsite.com/ but when unflattened, it got a little mangled because the dots in the URL were interpreted as key separators. Does anyone know of a library that can handle key names with any text, even text containing the separator character? I am assuming it would require the flat paths to be quote-encapsulated or something: "REG_SRC"."http://www.awebsite.com/"

You could try this:
import re
from bisect import insort

# Split on dots that are not escaped with a backslash.
RE_SPLIT = re.compile(r'(?<!\\)\.')
# Match a list index such as [0].
RE_INDEX = re.compile(r'\[(\d+)\]')

data = {
    "a": "thing",
    "b.c": "foo",
    "array.[1]": "item2",
    "array.[0]": "item1",
    "some\\.text": 'bar2',  # escape a literal `.` to distinguish it from the `.` path operator
}

result = {}
for key, value in data.items():
    paths = RE_SPLIT.split(key)
    if len(paths) > 1:
        # Note: this only handles paths with exactly two segments.
        subkey, subvalue = paths
        m = RE_INDEX.search(subvalue)
        if m:
            # Store (index, value) tuples so insort keeps the list sorted by index.
            insort(result.setdefault(subkey, []), (int(m.group(1)), value))
        else:
            result.setdefault(subkey, {})[subvalue] = value
    else:
        result[key.replace('\\.', '.')] = value

# Convert each (index, item) tuple back to just the item.
for k in result:
    if isinstance(result[k], list):
        result[k] = [e[1] for e in result[k]]

print(result)
Output:
{'a': 'thing', 'b': {'c': 'foo'}, 'array': ['item1', 'item2'], 'some.text': 'bar2'}
I used the bisect module to insert the list items in index order. As you can see, item2 ends up in the second position even though it comes first in the input.
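The question also asks about arbitrarily deep nesting and about keys that contain the separator. The following is only a sketch of one way to do that, not a library API: the names flatten, unflatten and _listify are made up for illustration, and it assumes keys contain no backslashes, no dict key looks like "[3]", and there are no empty dicts or lists (those disappear when flattened). It first builds a tree of dicts, using integer keys for list positions, and then converts the integer-keyed dicts into lists.

```python
import re

SPLIT_RE = re.compile(r'(?<!\\)\.')  # dots not preceded by a backslash
INDEX_RE = re.compile(r'\[(\d+)\]')  # a list-index segment such as "[3]"


def flatten(obj, prefix=''):
    """Flatten nested dicts/lists into {path: leaf}, escaping dots in keys."""
    if isinstance(obj, dict):
        items = [(k.replace('.', '\\.'), v) for k, v in obj.items()]
    elif isinstance(obj, list):
        items = [('[%d]' % i, v) for i, v in enumerate(obj)]
    else:
        return {prefix: obj}
    flat = {}
    for segment, value in items:
        path = prefix + '.' + segment if prefix else segment
        flat.update(flatten(value, path))
    return flat


def unflatten(flat):
    """Rebuild the nested structure from a flattened dict."""
    root = {}
    for path, value in flat.items():
        node = root
        segments = SPLIT_RE.split(path)
        for i, raw in enumerate(segments):
            m = INDEX_RE.fullmatch(raw)
            # List positions become int keys; everything else is unescaped.
            key = int(m.group(1)) if m else raw.replace('\\.', '.')
            if i == len(segments) - 1:
                node[key] = value
            else:
                node = node.setdefault(key, {})
    return _listify(root)


def _listify(node):
    """Turn dicts keyed purely by ints into lists ordered by index."""
    if not isinstance(node, dict):
        return node
    if node and all(isinstance(k, int) for k in node):
        return [_listify(node[k]) for k in sorted(node)]
    return {k: _listify(v) for k, v in node.items()}


original = {
    "a": "thing",
    "b": {"c": "foo"},
    "array": ["item1", {"x": [1, 2]}],
    "REG_SRC": {"http://www.awebsite.com/": "ok"},
}
flat = flatten(original)
print(flat)
assert unflatten(flat) == original  # round trip
```

Gaps in list indices are silently closed by this sketch, so it is only suitable for flattened data that lists every index.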

Related

How to sum equal key values when inserting them into a new dictionary in Python?

I have a dictionary that I got from a .txt file.
dictOne = {
    "AAA": 0,
    "BBB": 1,
    "AAA": 3,
    "BBB": 1,
}
I would like to generate a new dictionary called dictTwo with the sum of the values of equal keys. Result:
dictTwo = {
    "AAA": 3,
    "BBB": 2,
}
I prepared the following code, but it raises a syntax error (SyntaxError: invalid syntax):
import json
dictOne = json.loads(text)
dictTwo = {}
for k, v in dictOne.items():
    dictTwo [k] = v += v
Can anyone help me find the error?

Assuming you resolve the duplicate key issue first (a dict literal keeps only the last value for a repeated key, so the pairs are held in a list here):
pairs = [
    ("AAA", 0),
    ("BBB", 1),
    ("AAA", 3),
    ("BBB", 1),
]
dictTwo = {}
for k, v in pairs:
    if k in dictTwo:
        dictTwo[k] += v
    else:
        dictTwo[k] = v
print(dictTwo)

You can do this if you do it while reading the JSON input.
JSON permits duplicate keys in objects, although it discourages the practice, noting that different JSON processors produce different results for duplicate keys.
Python does not allow duplicate keys in dictionaries, and Python's json module handles duplicate keys in one of the ways noted by the JSON standard: it ignores all but the last value for any such key. However, it gives you a mechanism to do your own processing of objects, in case you want to do something else with duplicate keys (or produce something other than a dictionary).
You do this by providing the object_pairs_hook parameter to json.load or json.loads. That parameter should be a function whose argument is an iterable of (key, value) pairs, where the key is a string and the value is an already processed JSON object. Whatever the function returns will be the value used by json.load for an object literal; it does not need to return a dict.
That implies that the handling of duplicate keys will be the same for every object literal in the JSON input, which is a bit of a limitation, but it may be acceptable in your case.
Here's a simple example:
import json

def key_combiner(pairs):
    rv = {}
    for k, v in pairs:
        if k in rv:
            rv[k] += v
        else:
            rv[k] = v
    return rv

# Sample usage:
# (Note: JSON doesn't allow trailing commas in objects or lists.)
json_data = '''{
    "AAA": 0,
    "BBB": 1,
    "AAA": 3,
    "BBB": 1
}'''
consolidated = json.loads(json_data, object_pairs_hook=key_combiner)
print(consolidated)
This prints {'AAA': 3, 'BBB': 2}.
If I'd known that the values were numbers, I could have used a slightly simpler definition using defaultdict. Writing it the way I did permits combining certain other value types, such as strings or arrays, provided that all the values for the same key in an object are the same type. (Unfortunately, it doesn't allow combining objects, because Python uses | to combine two dicts, instead of +.)
This feature was mostly intended to be used for creating class instances from json objects, but it has many other possible uses.
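For reference, the defaultdict variant mentioned above might look like the sketch below. It assumes every duplicated value is a number; the function name sum_duplicates is just illustrative.

```python
import json
from collections import defaultdict


def sum_duplicates(pairs):
    """object_pairs_hook that adds up numeric values sharing the same key."""
    totals = defaultdict(int)  # missing keys start at 0
    for key, value in pairs:
        totals[key] += value
    return dict(totals)  # hand back a plain dict


json_data = '{"AAA": 0, "BBB": 1, "AAA": 3, "BBB": 1}'
consolidated = json.loads(json_data, object_pairs_hook=sum_duplicates)
print(consolidated)  # {'AAA': 3, 'BBB': 2}
```

Because the hook runs for every object in the input, this version would raise a TypeError on any object whose values are not numbers.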

How to append the list of dictionary to same list in python?

I have JSON with nested values. I need to remove the key of the nested field and keep its values at the top level, as plain JSON.
JSON (structure of my JSON):
[
    {
        "id":"101",
        "name":"User1",
        "place":{
            "city":"City1",
            "district":"District1",
            "state":"State1",
            "country":"Country1"
        },
        "access":[{"status":"available"}]
    }
]
I need to get the JSON output as:
Expected Output:
[
    {
        "id":"101",
        "name":"User1",
        "city":"City1",
        "district":"District1",
        "state":"State1",
        "country":"Country1",
        "access":[{"status":"available"}]
    }
]
What I need is:
Parse the JSON
Get the place field out of the JSON
Remove the key and brackets and append the values to the existing object
Python:
for i in range(0,len(json_data)):
    place_data = json_data[i]['place']
    print(type(place_data)) #dict
    del place_data['place']
Is there an approach to get the expected output in Python?

One way to accomplish this is to pop the nested dict and merge it into its parent:
for i in json_data:
    i.update(i.pop("place"))

Another way to accomplish this, which lifts every nested dict rather than a single named key.
This only works for a single nested level, as described in the original question:
def expand(array):
    flatten = list()
    for obj in array:
        temp = {}
        for key, value in obj.items():
            if isinstance(value, dict):
                # Merge the nested dict's items into the top level.
                temp.update(value)
            else:
                temp.update({key:value})
        flatten.append(temp)
    return flatten
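If the nesting can go deeper than one level, a recursive variant is one option. This is a sketch only: lift_nested is a hypothetical helper name, the extra "geo" level in the sample data is invented for the demonstration, and when two nested dicts share a key the later one silently wins.

```python
def lift_nested(record):
    """Return a copy of record with every nested dict merged into the top level."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            # Recurse and drop the parent key; lists are left untouched.
            flat.update(lift_nested(value))
        else:
            flat[key] = value
    return flat


json_data = [{
    "id": "101",
    "name": "User1",
    "place": {"city": "City1", "geo": {"state": "State1", "country": "Country1"}},
    "access": [{"status": "available"}],
}]
expanded = [lift_nested(item) for item in json_data]
print(expanded)
```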

Remove entire JSON object if it contains a specified phrase (from a list in python)

I have JSON file output similar to:
{
    "object1": {
        "json_data": "{json data information}",
        "tables_data": "TABLES_DATA"
    },
    "object2": {
        "json_data": "{json data information}",
        "tables_data": ""
    }
}
Essentially, if there is an empty string for tables_data as shown in object2 (e.g. "tables_data": ""), I want the entire object to be removed so that the output would look like:
{
    "object1": {
        "json_data": "{json data information}",
        "tables_data": "TABLES_DATA"
    }
}
What is the best way to go about doing this? Each of these objects corresponds to a separate index in a list that I've appended to, called summary[].

To achieve this, you could iterate through the JSON dictionary and test the tables_data values, adding the objectX elements to a new dictionary if their tables_data value passes the test:
new_dict = {k: v for k, v in json_dict.items()
            if v.get("tables_data", "") != ""}
If your JSON objectX is stored in a list as you say, these could be processed as follows using a list comprehension:
filtered_summary = [object_dict for object_dict in summary
                    if object_dict.get("tables_data", "") != ""]

Unless you have compelling reasons to do otherwise, the Pythonic way to filter a list or dict (or any iterable) is not to change it in place but to create a new, filtered one. For your case this would look like:
raw_data = YOUR_DICT_HERE_WHEREVER_IT_COMES_FROM
# NB: an empty string is falsy in a boolean test, so only non-empty values are kept
cleaned_data = {k: v for k, v in raw_data.items() if v["tables_data"]}
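As an end-to-end check, here is a small sketch that parses the sample from the question, filters it, and serializes the result again. The variable names raw, data and kept are illustrative; using .get() means objects with no tables_data key at all are dropped as well.

```python
import json

raw = '''{
  "object1": {"json_data": "{json data information}", "tables_data": "TABLES_DATA"},
  "object2": {"json_data": "{json data information}", "tables_data": ""}
}'''

data = json.loads(raw)
# Keep only the objects whose tables_data is present and non-empty.
kept = {name: obj for name, obj in data.items() if obj.get("tables_data")}
print(json.dumps(kept, indent=2))
```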

creating nested json objects from a list

I am attempting to understand how to take a list and convert it into a nested JSON object.
Expected Output
{
    "Name1": [
        {
            "key": "value",
            "key": "value",
            "key": "value",
            "key": "value"
        }
    ]
}
So far my thinking has gone as follows: convert the list to a dictionary using a comprehension, splitting each item into a key/value pair.
list1 = ['key value', 'key value', 'key value']
dict1 = dict(item.split(" ") for item in list1)
I then thought converting that into a JSON object would be something similar to:
print json.loads(dict1)
However, I'm not sure how to create the "Name1" parent key. And it seems Google isn't being particularly helpful. I'm sure there is something simple I'm missing; any pointers would be appreciated.
EDIT
Included a list for reference

You simply put them in another dictionary, inside a new list. So:
import json
list1 = ['key1 value1', 'key2 value2', 'key3 value3']
# Outer dict -> list with one element -> inner dict built from the split pairs.
dict1 = {'Name1': [dict(item.split(" ", 1) for item in list1)]}
json.dumps(dict1)
And this produces:
>>> print(json.dumps(dict1))
{"Name1": [{"key1": "value1", "key2": "value2", "key3": "value3"}]}
Notes:
A Python dictionary can only contain distinct keys, so a repeated key in the list overwrites the earlier value.
It is better to split with .split(" ", 1): if the value contains spaces, everything after the first space is still treated as a single value.
Since Python 3.7, dictionaries preserve insertion order; in older versions they are unordered, so the order of the keys can be shuffled.
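To illustrate the note about .split(" ", 1), here is a small sketch with made-up sample data in which the values contain spaces. A plain .split(" ") would yield three-element lists for these items and dict() would raise a ValueError.

```python
import json

list1 = ['name John Smith', 'city New York']

# maxsplit=1 splits only at the first space, so the rest stays in the value.
record = dict(item.split(" ", 1) for item in list1)
output = json.dumps({"Name1": [record]})
print(output)
# {"Name1": [{"name": "John Smith", "city": "New York"}]}
```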

Python script to convert complicated flattened data to JSON

Sorry about the vague title, I need some help with Python magic and couldn't think of anything more descriptive.
I have a fixed JSON data structure that I need to convert a CSV file to. The structure is fixed, but deeply nested with lists and such. It's similar to this but more complicated:
{
    "foo" : bar,
    "baz" : qux,
    "nub" : [
        {
            "bub": "gob",
            "nab": [
                {
                    "nip": "jus",
                    "the": "tip",
                },
                ...
            ],
        },
        ...
    ],
    "cok": "hed"
}
Hopefully you get the idea. Lists on dicts on lists on lists and so forth. My csv for that might look like this:
foo, baz, nub.bub, nub.nab.nip, nub.nab.the, cok
bar, qux, "gob" ,,,, "hed"
,,,,, "nab", "jus","tip",,
,,,,, "nab", "other", "values",,
Sorry if this is hard to read, but the basic idea is if there's a listed item it will be in the next row, and values are repeated to denote what sub-lists belong to what.
I'm not looking for anyone to come up with a solution to this mess, just maybe some pointers on techniques or things to look into.
Right now I have a rough plan:
I start by turning the header into a list of tuples containing the keys. For each group of rows (item) I'll create a copy of my template dict. I have a function that will set a dict value from a tuple of keys, unless it finds a list. In this case I'm going to call a funky recursive function and pass it my iterator, and continue filling up the dict in that function, and making recursive calls as I find new lists.
I could also do a lot of hardcoding, but what's the fun in that?
So that's my story. Again, just looking for some pointers on what the best way to do this might be. I wrote this quickly so it might be kinda confusing, please let me know if any more info would help. Thanks!

Your JSON is malformed. Additionally, your JSON must not contain arrays for this approach to work.
import json

def _tocsv(obj, base=''):
    # Recursively flatten nested dicts into one dict with dotted keys.
    flat_dict = {}
    for k in obj:
        value = obj[k]
        if isinstance(value, dict):
            flat_dict.update(_tocsv(value, base + k + '.'))
        elif isinstance(value, (int, str, float, bool)):
            flat_dict[base + k] = value
        else:
            raise ValueError("Can't serialize value of type " + type(value).__name__)
    return flat_dict

def tocsv(json_content):
    value = json.loads(json_content)
    if isinstance(value, dict):
        return _tocsv(value)
    else:
        raise ValueError("JSON root object must be a hash")
This will let you flatten something like:
{
    "foo": "nestor",
    "bar": "kirchner",
    "baz": {
        "clorch": 1,
        "narf": 2,
        "peep": {
            "ooo": "you suck"
        }
    }
}
into something like:
{"foo": "nestor", "bar": "kirchner", "baz.clorch": 1, "baz.narf": 2, "baz.peep.ooo": "you suck"}
On Python 3.7+, dicts preserve insertion order. On older versions the keys don't keep any specific order; replace flat_dict = {} with an OrderedDict if you want to preserve it.
Assuming you have an array of such dicts:
def tocsv_many(json_content):
    # Assumes json is imported and _tocsv is defined as above.
    value = json.loads(json_content)
    result = []
    if isinstance(value, list):
        for element in value:
            if isinstance(element, dict):
                result.append(_tocsv(element))
            else:
                raise ValueError("root children must be dicts")
    else:
        raise ValueError("The JSON root must be a list")
    return result

flat_dicts = tocsv_many(yourJsonInput)
You could then:
Create a csvlines = [] list, which will hold the CSV lines for your file.
Create a keysSet = set(), which will hold the possible keys.
For each dict, add its .keys() to the set. A plain set guarantees no key order, so sort the keys into a list before using them. That gives the first CSV line (the header):
for flat_dict in flat_dicts:
    keysSet.update(flat_dict.keys())
keys = sorted(keysSet)
csvlines.append(",".join(keys))
For each dict (iterating again), generate a line like this:
for flat_dict in flat_dicts:
    csvline = ",".join([json.dumps(flat_dict.get(key, '')) for key in keys])
    csvlines.append(csvline)
Voilà! You have your lines in csvlines.
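As an alternative to joining strings by hand, the standard library's csv.DictWriter takes care of quoting and of filling in missing keys. The sketch below assumes flat_dicts is a list of already-flattened dicts; the sample data is made up for illustration.

```python
import csv
import io

flat_dicts = [
    {"foo": "nestor", "baz.clorch": 1},
    {"foo": "bar", "baz.peep.ooo": "x, y"},
]

# Collect every key that appears, in first-seen order.
fieldnames = []
for flat in flat_dicts:
    for key in flat:
        if key not in fieldnames:
            fieldnames.append(key)

buffer = io.StringIO()  # stands in for a file opened with newline=''
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval='')
writer.writeheader()
writer.writerows(flat_dicts)  # missing keys are written as restval
print(buffer.getvalue())
```

Values containing the delimiter, such as "x, y" here, are quoted by the writer automatically.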
