Python build dict from a mixture of dict keys and list values - python

Input body is:
{'columns': ['site_ref', 'site_name', 'region'], 'data': [['R005000003192', 'AIRTH DSR NS896876', 'WEST'], ['R005000003195', 'AIRTHREY DSR NS814971', 'WEST']]}
How could I build a new dict that will take the column values as keys and then populate a new dict for every list item within the data values for 1 to n?
Desired output would be:
{
{
"site_ref": "R005000003192",
"site_name": "AIRTH DSR NS896876",
"region": "WEST"
},
{
"site_ref": "R005000003195",
"site_name": "AIRTH DSR NS896876",
"region": "WEST"
}
}
I have attempted to iterate over it with:
for i in range(len(result["data"])):
    new_dict = []
    new_dict.append(dict(zip(result["columns"], result["data"][i])))
But I cannot seem to get it to complete the iteration.

Note that a dictionary output would require keys, which are missing from the desired output you provided. However, you could create a list of dictionaries as follows:
d = {'columns': ['site_ref', 'site_name', 'region'], 'data': [['R005000003192', 'AIRTH DSR NS896876', 'WEST'], ['R005000003195', 'AIRTHREY DSR NS814971', 'WEST']]}
res = [{k: v for k, v in zip(d['columns'], datum)} for datum in d['data']]
print(res)
prints
[{'site_ref': 'R005000003192',
'site_name': 'AIRTH DSR NS896876',
'region': 'WEST'},
{'site_ref': 'R005000003195',
'site_name': 'AIRTHREY DSR NS814971',
'region': 'WEST'}]
If you do want keys after all, you could for example use the position (i.e. 1 for the first datum, n for the n-th) as the key:
res = {i+1: {k: v for k, v in zip(d['columns'], datum)} for i, datum in enumerate(d['data'])}
print(res)
prints
{1: {'site_ref': 'R005000003192',
'site_name': 'AIRTH DSR NS896876',
'region': 'WEST'},
2: {'site_ref': 'R005000003195',
'site_name': 'AIRTHREY DSR NS814971',
'region': 'WEST'}}
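For completeness, the loop in the question fails because new_dict is re-created as an empty list on every pass, discarding earlier rows. Moving the initialisation out of the loop gives the list-of-dicts result (a minimal sketch using the question's data):

```python
result = {'columns': ['site_ref', 'site_name', 'region'],
          'data': [['R005000003192', 'AIRTH DSR NS896876', 'WEST'],
                   ['R005000003195', 'AIRTHREY DSR NS814971', 'WEST']]}

# Initialise once, outside the loop, so earlier rows are not thrown away
new_dict = []
for row in result["data"]:
    new_dict.append(dict(zip(result["columns"], row)))

print(new_dict)
```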

Python Dictionary comprehension with condition

Suppose that I have a dict named data like below:
{
001: {
'data': {
'fruit': 'apple',
'vegetable': 'spinach'
},
'text': 'lorem ipsum',
'status': 10
},
002: {
.
.
.
}
}
I want to flatten(?) the data key and convert it to this:
{
001: {
'fruit': 'apple',
'vegetable': 'spinach',
'text': 'lorem ipsum',
'status': 10
},
002: {
.
.
.
}
}
I am trying to achieve this using dict comprehensions. Below implementation is with for loops:
mydict = {}
for id, values in data.items():
    mydict[id] = {}
    for label, value in values.items():
        if label == 'data':
            for x, y in value.items():
                mydict[id][x] = y
        else:
            mydict[id][label] = value
I tried the comprehension below, but it gives a syntax error:
mydict = {
    id: {x: y} for x, y in value.items() if label == 'data' else {label: value}
    for id, values in data.items() for label, value in values.items()}
Is there a way to achieve this using comprehensions only?
With dict expansions:
mydict = {i:{**v['data'], **{k:u for k, u in v.items() if k != "data"}} for i, v in data.items()}
The if clause in a comprehension (dict, list, set, generator) applies to the iteration itself; it cannot be used for the production. For that you need conditionals in the production.
Generally speaking, comprehensions are really a reorganisation of a specific kind of (possibly nested) iterations:
a bunch of iterations and conditions, possibly nested
a single append/set
So
for a in b:
    if c:
        for d in e:
            for f in g:
                if h:
                    thing.append(i)
can be comprehension-ified, just move the production (i) to the head and put the other bits in a flat sequence:
thing = [
    i
    for a in b
    if c
    for d in e
    for f in g
    if h
]
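As a concrete (made-up) illustration of that rewrite, both of these produce the same list:

```python
b = range(10)

# Nested loops with a single append...
thing_loop = []
for a in b:
    if a % 2 == 0:          # condition on the outer loop
        for f in [a, a * 10]:
            if f > 5:       # condition on the inner loop
                thing_loop.append(f)

# ...and the equivalent comprehension: production first, clauses in order
thing_comp = [
    f
    for a in b
    if a % 2 == 0
    for f in [a, a * 10]
    if f > 5
]

print(thing_comp)
```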
Now your comprehension makes no sense: it starts by iterating value, there is no else in a comprehension filter, and even if we add parens, {x: y} for x, y in value.items() is not a value. Comprehensions also do not "merge" items, so with:
mydict = {
    id: {label: value}
    for id, values in data.items() for label, value in values.items()
}
Well you'll get only the last {label: value} for each id, because that's how dicts work.
Here if you consider the production loop, it's this:
for id, values in data.items():
    mydict[id] = {}
This means that this is your dict comprehension:
mydict = {
    id: {}
    for id, values in data.items()
}
the rest of the iteration is filling the value, so it needs to be a separate iteration inside the production:
mydict = {
    id: {
        label: value ???
        for label, value in values.items()
    }
    for id, values in data.items()
}
In which case you hit the issue that this doesn't quite work: you can't "conditionally iterate" in comprehensions, it's all or nothing.
Except you can: the right side of in is a normal expression, so you can do whatever you want with it, meaning you can unfold-or-refold:
mydict = {
    id: {
        x: y
        for label, value in values.items()
        for x, y in (value.items() if label == 'data' else [(label, value)])
    }
    for id, values in data.items()
}
This is a touch more expensive in the non-data case as you need to re-wrap the key and value in a tuple and list, but that's unlikely to be a huge deal.
Another alternative, instead of using a conditional comprehension, is to use splatting to merge the two dicts (one of which you create via a comprehension):
mydict = {
    id: {
        **values['data'],
        **{label: value for label, value in values.items() if label != 'data'}
    }
    for id, values in data.items()
}
This can also be applied to the original to simplify it:
mydict = {}
for id, values in data.items():
    mydict[id] = {}
    for label, value in values.items():
        if label == 'data':
            mydict[id].update(value)
        else:
            mydict[id][label] = value
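For a quick sanity check, the conditional-iteration comprehension and the splatting variant can be run against a one-entry version of the question's data (key names taken from the question):

```python
data = {
    "001": {"data": {"fruit": "apple", "vegetable": "spinach"},
            "text": "lorem ipsum", "status": 10},
}

# Conditional-iteration version: unfold 'data', refold everything else
flat_a = {
    id: {
        x: y
        for label, value in values.items()
        for x, y in (value.items() if label == 'data' else [(label, value)])
    }
    for id, values in data.items()
}

# Splatting version: merge 'data' with the remaining keys
flat_b = {
    id: {**values['data'],
         **{label: value for label, value in values.items() if label != 'data'}}
    for id, values in data.items()
}

print(flat_a["001"])
```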
Let me simplify:
sample_data = {
    "001": {
        "data": {
            "fruit": 'apple',
            "vegetable": 'spinach'
        },
        "text": 'lorem ipsum',
        "status": 10
    },
    "002": {
        "data": {
            "fruit": 'apple',
            "vegetable": 'spinach'
        },
        "text": 'lorem ipsum',
        "status": 10
    }
}

for key, row in sample_data.items():
    if 'data' in row.keys():
        info = sample_data[key].pop('data')
        sample_data[key] = {**row, **info}

print(sample_data)

Comparing 2 JSON files using Python and outputting the difference to a new output file

I am trying to compare 2 JSON files to do a delta check between them.
Existing JSON:
{
    "rules": [
        {
            "customer_id": "abc123",
            "id": "1"
        },
        {
            "customer_id": "xyz456",
            "id": "2"
        }
    ]
}
Updated JSON:
{
    "rules": [
        {
            "customer_id": "abc123",
            "id": "1"
        },
        {
            "customer_id": "def456",
            "id": "3"
        },
        {
            "customer_id": "xyz789",
            "id": "2"
        }
    ]
}
What I want is for my code to pick up the new objects from the new JSON (in this case id 3 with customer_id def456). However, I also want to keep the original existing values (the customer_id for id 2 should remain xyz456 instead of being updated to the new value xyz789).
Here is my current code:
import json

# Opening the JSON files
f = open('1.json')
y = open('2.json')

# returns the JSON objects as dictionaries
less_data = json.load(f)
more_data = json.load(y)

# Iterating through the json lists
for x in more_data['rules']:
    for y in less_data['rules']:
        if x['id'] == y['id']:
            print("x:" + x['id'], "y:" + y['id'])
            break
        print(x['id'] + " is not found")
    # take action to add the new objects into the json output
running the program i get the following output:
x:1 y:1
1 is found
x:3 y:1
3 is not found
x:3 y:2
3 is not found
x:2 y:1
2 is not found
x:2 y:2
2 is found
I only want "3 is not found" to be printed once, after running to the end of the inner for loop, instead of being printed on every iteration. Any help would be appreciated.
You can try flattening the JSON and comparing its keys, since dict/JSON is an unordered datatype:
If your list ordering matters:
import collections.abc

def flatten(d, parent_key='', sep='__'):
    items = []
    for k, v in d.items():
        new_key = parent_key + sep + k if parent_key else k
        if isinstance(v, collections.abc.MutableMapping):
            items.extend(flatten(v, new_key, sep=sep).items())
        elif isinstance(v, list):
            for idx, value in enumerate(v):
                items.extend(flatten(value, new_key + sep + str(idx), sep).items())
        else:
            items.append((new_key, v))
    return dict(items)

a1 = {'rules': [{'customer_id': 'abc123', 'id': '1'}, {'customer_id': 'xyz456', 'id': '2'}]}
a2 = {'rules': [{'customer_id': 'abc123', 'id': '1'}, {'customer_id': 'def456', 'id': '3'}, {'customer_id': 'xyz789', 'id': '2'}]}

flatten_a1 = flatten(a1)
flatten_a2 = flatten(a2)
print(flatten_a1)
print(flatten_a2)
Output:
>>> flatten_a1 => {'rules__0__customer_id': 'abc123', 'rules__0__id': '1', 'rules__1__customer_id': 'xyz456', 'rules__1__id': '2'}
>>> flatten_a2 => {'rules__0__customer_id': 'abc123', 'rules__0__id': '1', 'rules__1__customer_id': 'def456', 'rules__1__id': '3', 'rules__2__customer_id': 'xyz789', 'rules__2__id': '2'}
From the flatten_a1 and flatten_a2 values you can easily find the differences, since the data now sits at a single level (i.e. it is no longer nested).
You can find:
which keys are missing
which keys are present but have different values
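Separately, the print-once problem in the question can be solved with Python's for/else clause on loops: the else body runs only when the inner loop finishes without hitting break. A sketch using the question's data (the merged variable and its construction are assumptions, not part of the question's code):

```python
existing = {"rules": [{"customer_id": "abc123", "id": "1"},
                      {"customer_id": "xyz456", "id": "2"}]}
updated = {"rules": [{"customer_id": "abc123", "id": "1"},
                     {"customer_id": "def456", "id": "3"},
                     {"customer_id": "xyz789", "id": "2"}]}

merged = {"rules": list(existing["rules"])}  # keep all original entries
for new_rule in updated["rules"]:
    for old_rule in existing["rules"]:
        if new_rule["id"] == old_rule["id"]:
            break                   # id already present: keep the old value
    else:                           # no break: this id is genuinely new
        print(new_rule["id"] + " is not found")
        merged["rules"].append(new_rule)

print(merged)
```

This prints "3 is not found" exactly once and produces a merged dict that keeps xyz456 for id 2 while adding the new id 3 entry.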

Python: Convert multiple columns of CSV file to nested Json

This is my input CSV file with multiple columns. I would like to convert this CSV file to a JSON file with department, departmentID, and one nested field called customer, with first and last nested under it.
department, departmentID, first, last
fans, 1, Caroline, Smith
fans, 1, Jenny, White
students, 2, Ben, CJ
students, 2, Joan, Carpenter
...
Output json file what I need:
[
{
"department" : "fans",
"departmentID: "1",
"customer" : [
{
"first" : "Caroline",
"last" : "Smith"
},
{
"first" : "Jenny",
"last" : "White"
}
]
},
{
"department" : "students",
"departmentID":2,
"user" :
[
{
"first" : "Ben",
"last" : "CJ"
},
{
"first" : "Joan",
"last" : "Carpenter"
}
]
}
]
my code:
from csv import DictReader
from itertools import groupby
with open('data.csv') as csvfile:
r = DictReader(csvfile, skipinitialspace=True)
data = [dict(d) for d in r]
groups = []
uniquekeys = []
for k, g in groupby(data, lambda r: (r['group'], r['groupID'])):
groups.append({
"group": k[0],
"groupID": k[1],
"user": [{k:v for k, v in d.items() if k != 'group'} for d in list(g)]
})
uniquekeys.append(k)
pprint(groups)
My issue is that groupID shows up twice, both inside and outside the nested part of the JSON. What I want is group and groupID as the groupby key.
The issue was that you mixed up the key names, so the line
"user": [{k: v for k, v in d.items() if k != 'group'} for d in list(g)]
did not strip them properly from your dictionary: there was no such key, so nothing was deleted.
I do not fully understand which keys you want, so the following example assumes that data.csv looks exactly as in your question (department and departmentID), and the script converts them to group and groupID:
from csv import DictReader
from itertools import groupby
from pprint import pprint

with open('data.csv') as csvfile:
    r = DictReader(csvfile, skipinitialspace=True)
    data = [dict(d) for d in r]

groups = []
uniquekeys = []
for k, g in groupby(data, lambda r: (r['department'], r['departmentID'])):
    groups.append({
        "group": k[0],
        "groupID": k[1],
        "user": [{k: v for k, v in d.items() if k not in ['department', 'departmentID']} for d in list(g)]
    })
    uniquekeys.append(k)
pprint(groups)
Output:
[{'group': 'fans',
'groupID': '1',
'user': [{'first': 'Caroline', 'last': 'Smith'},
{'first': 'Jenny', 'last': 'White'}]},
{'group': 'students',
'groupID': '2',
'user': [{'first': 'Ben', 'last': 'CJ'},
{'first': 'Joan', 'last': 'Carpenter'}]}]
I used different keys so it is really obvious which line does what, and easy to customise for different keys in the input or output.
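One caveat worth knowing: itertools.groupby only groups consecutive rows, so if the CSV is not already sorted by department, the same key can show up in several separate groups. Sorting by the same key function first avoids that. A sketch with rows inlined instead of read from data.csv (the customer key follows the question's desired output):

```python
from itertools import groupby

# Deliberately unsorted rows, as they might come out of DictReader
rows = [
    {"department": "students", "departmentID": "2", "first": "Ben", "last": "CJ"},
    {"department": "fans", "departmentID": "1", "first": "Caroline", "last": "Smith"},
    {"department": "fans", "departmentID": "1", "first": "Jenny", "last": "White"},
]

def keyfunc(r):
    return (r["department"], r["departmentID"])

# Sort with the same key before grouping, so equal keys are adjacent
groups = [
    {"department": k[0], "departmentID": k[1],
     "customer": [{"first": d["first"], "last": d["last"]} for d in g]}
    for k, g in groupby(sorted(rows, key=keyfunc), key=keyfunc)
]
print(groups)
```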

How to mask a Python 3 nested dictionary to return a new dictionary with only certain items?

I am consuming several endpoints of an API that is very verbose in the data it returns. I would like to provide a subset of this data to another piece of code elsewhere.
Suppose I am given several dictionaries like this (which I plan to loop through and filter):
asset = {
    'id': 1,
    'name': 'MY-PC',
    'owner': 'me',
    'location': 'New York City',
    'model': {
        'id': 1,
        'name': 'Surface',
        'manufacturer': {
            'id': 1,
            'name': 'Microsoft'
        }
    }
}
I want to create a function that will take that dictionary in, along with a "mask" which will be used to create a new dictionary of only the allowed items. This might be an example mask (though, I can work with whatever format makes the resulting code the most concise):
mask = {
    'id': True,
    'name': True,
    'model': {
        'id': True,
        'name': True,
        'manufacturer': {
            'name': True
        }
    }
}
The function should then return this:
{
    'id': 1,
    'name': 'MY-PC',
    'model': {
        'id': 1,
        'name': 'Surface',
        'manufacturer': {
            'name': 'Microsoft'
        }
    }
}
Is there something already built into Python 3 that would help with this? It looks like if I have to do this manually, it's going to get ugly quickly. I found itertools.compress, but that seems to be for lists and won't handle the complexity of dictionaries.
You can recursively build a new dict from the mask by selecting only values corresponding in the main dict:
def prune_dict(dct, mask):
    result = {}
    for k, v in mask.items():
        if isinstance(v, dict):
            value = prune_dict(dct[k], v)
            if value:  # check that dict is non-empty
                result[k] = value
        elif v:
            result[k] = dct[k]
    return result

print(prune_dict(asset, mask))
print(prune_dict(asset, mask))
{'id': 1,
'model': {'id': 1, 'manufacturer': {'name': 'Microsoft'}, 'name': 'Surface'},
'name': 'MY-PC'}
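Note that prune_dict raises KeyError if the mask names a key the input dict lacks. If the inputs are not guaranteed to line up, a tolerant variant (an assumption, not part of the original answer) can simply skip missing keys:

```python
def prune_dict_safe(dct, mask):
    result = {}
    for k, v in mask.items():
        if k not in dct:       # silently skip mask keys absent from the data
            continue
        if isinstance(v, dict):
            value = prune_dict_safe(dct[k], v)
            if value:          # keep only non-empty sub-dicts
                result[k] = value
        elif v:
            result[k] = dct[k]
    return result

# This asset lacks the 'model' subtree the mask asks for
asset = {'id': 1, 'name': 'MY-PC'}
mask = {'id': True, 'name': True, 'model': {'id': True}}
pruned = prune_dict_safe(asset, mask)
print(pruned)
```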
This would be a good chance to use recursion, here is some sample code I haven't tested:
def copy(asset, result, mask):
    for key_name, value in mask.items():
        if value == True:
            result[key_name] = asset[key_name]
        else:
            result[key_name] = x = {}
            copy(asset[key_name], x, value)

y = {}
copy(asset, y, mask)
This would probably be a recursive function. Also, for the mask, I recommend this format: mask = ["id", "name", "model.id", "model.name", "model.manufacturer.name"]
Then, you'd first keep only entries that are named in the mask:
def filterstage1(dictionary, mask):
    result = {}
    for key in dictionary:
        if isinstance(dictionary[key], dict):
            # keep only mask entries under this key, with the "key." prefix stripped
            newmask = [maskname[len(key) + 1:] for maskname in mask if maskname.startswith(key + ".")]
            result[key] = filterstage1(dictionary[key], newmask)
        elif key in mask:
            result[key] = dictionary[key]
    return result
Then, depending on whether or not you want to remove sub-dictionaries that were not in the mask and had no subelements, you can include the second stage:
def filterstage2(dictionary, mask):
    result = {}
    for key in dictionary:
        if not (isinstance(dictionary[key], dict) and dictionary[key] == {} and key not in mask):
            result[key] = dictionary[key]
    return result
Final code: filterstage2(filterstage1(dictionary, mask), mask). You can combine the two stages together if you wish.

How to merge, with appending, two nested dictionaries in Python?

For example I have two dicts:
schema = {
    'type': 'object',
    'properties': {
        'reseller_name': {
            'type': 'string',
        },
        'timestamp': {
            'type': 'integer',
        },
    },
    'required': ['reseller_name', 'timestamp'],
}
and
schema_add = {
    'properties': {
        'user_login': {
            'type': 'string',
        },
    },
    'required': ['user_login'],
}
How can I get the following merged-with-appending result dict:
schema_result = {
    'type': 'object',
    'properties': {
        'reseller_name': {
            'type': 'string',
        },
        'timestamp': {
            'type': 'integer',
        },
        'user_login': {
            'type': 'string',
        },
    },
    'required': ['reseller_name', 'timestamp', 'user_login'],
}
Rules:
The "same path" means e.g. properties and required in both schema and schema_add in the example.
If both dicts have dicts at the same path, they are merged by the same rules.
If both dicts have lists at the same path, the second list is appended to the first.
If both dicts have simple values (or a dict and a non-dict, or a list and a non-list) at the same path, the first value is overridden by the second.
If only one dict has a key at some path, that key and value are kept.
Not sure where the problem lies, but the way you're writing it down is almost like a computer program, and the example is like a test case. Why don't you start from this?
def add_dict(d1, d2):
    newdict = {}
    for (key, value) in d1.items():
        if key in d2:
            ...  # apply rules, add to newdict
        else:
            ...  # simply add
    for (key, value) in d2.items():
        if key not in d1:
            ...  # simply add
    return newdict
This can probably be written more tightly, but might be easier to edit like that.
Edit: after writing the last comment, I couldn't help but write a nicer implementation:
def merge_values(a, b):
    if a is None or b is None:
        return a or b
    # now handle cases where both have values
    if type(a) == dict:
        return add_dict(a, b)
    if type(a) == list:
        ...

def add_dict(d1, d2):
    return dict(
        (key, merge_values(d1.get(key, None), d2.get(key, None)))
        for key in set(d1.keys()).union(d2.keys())
    )
My own solution, with @Nicolas78's help:
def merge(obj_1, obj_2):
    if type(obj_1) == dict and type(obj_2) == dict:
        result = {}
        for key, value in obj_1.items():
            if key not in obj_2:
                result[key] = value
            else:
                result[key] = merge(value, obj_2[key])
        for key, value in obj_2.items():
            if key not in obj_1:
                result[key] = value
        return result
    if type(obj_1) == list and type(obj_2) == list:
        return obj_1 + obj_2
    return obj_2
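Running a self-contained Python 3 version of that merge against the schemas from the question confirms the rules: dicts merge recursively, lists concatenate, and untouched scalars survive.

```python
def merge(obj_1, obj_2):
    if isinstance(obj_1, dict) and isinstance(obj_2, dict):
        result = {}
        for key, value in obj_1.items():
            # recurse where both sides have the key, otherwise keep the first
            result[key] = merge(value, obj_2[key]) if key in obj_2 else value
        for key, value in obj_2.items():
            if key not in obj_1:
                result[key] = value
        return result
    if isinstance(obj_1, list) and isinstance(obj_2, list):
        return obj_1 + obj_2       # lists are appended
    return obj_2                   # otherwise the second value wins

schema = {'type': 'object',
          'properties': {'reseller_name': {'type': 'string'},
                         'timestamp': {'type': 'integer'}},
          'required': ['reseller_name', 'timestamp']}
schema_add = {'properties': {'user_login': {'type': 'string'}},
              'required': ['user_login']}

merged = merge(schema, schema_add)
print(merged['required'])
```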
I am adding a simple solution to this problem, assuming the sample data will not change.
def merge_nested_dicts(schema, schema_add):
    new_schema = schema
    for k in schema:
        if k in schema_add.keys():
            if isinstance(schema_add[k], dict):
                new_schema[k].update(schema_add[k])
            if isinstance(schema_add[k], list):
                new_schema[k] = new_schema[k] + schema_add[k]
    return new_schema
Try this if you know the keys exactly:
schema['properties'].update(schema_add['properties'])
schema['required'].extend(schema_add['required'])
The result is merged into schema.
If you do not know the keys exactly, then one loop is required to find the inner lists and dictionaries:
for key in schema:
    if isinstance(schema[key], dict):
        if key in schema_add and isinstance(schema_add[key], dict):
            schema[key].update(schema_add[key])
    elif isinstance(schema[key], list):
        if key in schema_add and isinstance(schema_add[key], list):
            schema[key].extend(schema_add[key])
The result can be merged into a different dict as well.
