I have a list of dictionaries where all dicts contain similar items but some dicts are missing certain items. Here's an example of what it would look like:
data = [
    {'city': 'Toronto', 'colour': 'blue'},
    {'city': 'London', 'country': 'UK', 'colour': 'green', 'name': 'Alex'},
    {'city': 'Kingston', 'colour': 'purple', 'name': 'Alex'},
]
I need to match the format of the largest dict by inserting items (with the same keys but blank values) into the smaller dicts. I also need to preserve the order of the keys so I can't just insert them at the end. Following the previous example, it would look like this:
data = [
    {'city': 'Toronto', 'country': '', 'colour': 'blue', 'name': ''},
    {'city': 'London', 'country': 'UK', 'colour': 'green', 'name': 'Alex'},
    {'city': 'Kingston', 'country': '', 'colour': 'purple', 'name': 'Alex'},
]
I'm not sure how to loop through and add entries to each dict, since the dicts I'm comparing are different sizes. I've tried copying the largest dict and editing it, adding blank values to the end of each dict and reformatting those, and creating new dicts as I loop through, but nothing has worked so far.
Here is my code so far (where all_keys is a list of all the keys in the correct order).
def format_data(input_data, all_keys):
    formatted_list = [{} for _ in range(len(input_data))]
    for i in range(len(input_data)):
        # Walk all_keys (the required order) rather than each dict's own keys
        for key in all_keys:
            # Keep the existing value, or insert a blank one if the key is missing
            formatted_list[i][key] = input_data[i].get(key, '')
    return formatted_list
How can I format this? Thanks!
Dictionaries are unordered before Python 3.7; from Python 3.7 on, they preserve insertion order. If you need a specific order, you must specify it explicitly.
For Python < 3.7, you can use collections.OrderedDict. Either way, there's no concept of "inserting into the middle of a dictionary", so you build a new dictionary with the keys in the order you want.
The example below uses set.union to calculate the union of all keys, and sorted to sort the keys alphabetically.
from collections import OrderedDict
keys = sorted(set().union(*data))
res = [OrderedDict([(k, d.get(k, '')) for k in keys]) for d in data]
Result:
print(res)
[OrderedDict([('city', 'Toronto'),
('colour', 'blue'),
('country', ''),
('name', '')]),
OrderedDict([('city', 'London'),
('colour', 'green'),
('country', 'UK'),
('name', 'Alex')]),
OrderedDict([('city', 'Kingston'),
('colour', 'purple'),
('country', ''),
('name', 'Alex')])]
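sorted gives alphabetical order (city, colour, country, name), which is not the order the question asks for. A minimal sketch that instead follows the longest dict's key order, assuming Python 3.7+ (insertion-ordered dicts) and that the longest dict is the template, as the question describes; pad_dicts is a hypothetical helper name:

```python
from itertools import chain

def pad_dicts(dicts, fill=''):
    """Return new dicts that all share the same keys, in the same order."""
    # Assumption: the longest dict defines the desired key order
    template = max(dicts, key=len)
    # dict.fromkeys keeps first-seen order and drops duplicates, so the
    # template's keys come first, followed by any keys it is missing
    keys = list(dict.fromkeys(chain(template, *dicts)))
    # Fill missing keys with the blank value
    return [{k: d.get(k, fill) for k in keys} for d in dicts]

data = [
    {'city': 'Toronto', 'colour': 'blue'},
    {'city': 'London', 'country': 'UK', 'colour': 'green', 'name': 'Alex'},
    {'city': 'Kingston', 'colour': 'purple', 'name': 'Alex'},
]

for row in pad_dicts(data):
    print(row)
# {'city': 'Toronto', 'country': '', 'colour': 'blue', 'name': ''}
# {'city': 'London', 'country': 'UK', 'colour': 'green', 'name': 'Alex'}
# {'city': 'Kingston', 'country': '', 'colour': 'purple', 'name': 'Alex'}
```

If no single dict contains every key, the extra keys are appended after the template's keys, so the result is only as well ordered as the template.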
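You can use a set to collect all the keys. Note that a set has no guaranteed order, so the keys in the output come out in arbitrary order, and missing keys are filled with None (null in the JSON output) rather than a blank string: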
import json
d = [{'city': 'Toronto', 'colour': 'blue'}, {'city': 'London', 'country': 'UK', 'colour': 'green', 'name': 'Alex'}, {'city': 'Kingston', 'colour': 'purple', 'name': 'Alex'}]
full_keys = {i for b in map(dict.keys, d) for i in b}
final_dict = [{i:b.get(i) for i in full_keys} for b in d]
print(json.dumps(final_dict, indent=4))
Output:
[
{
"colour": "blue",
"city": "Toronto",
"name": null,
"country": null
},
{
"colour": "green",
"city": "London",
"name": "Alex",
"country": "UK"
},
{
"colour": "purple",
"city": "Kingston",
"name": "Alex",
"country": null
}
]
I have two dictionaries, as below. Both dictionaries have a list of dictionaries as the value associated with their properties key; each dictionary within these lists has an id key. I wish to merge my two dictionaries into one such that the properties list in the resulting dictionary only has one dictionary for each id.
{
"name":"harry",
"properties":[
{
"id":"N3",
"status":"OPEN",
"type":"energetic"
},
{
"id":"N5",
"status":"OPEN",
"type":"hot"
}
]
}
and the other dictionary:
{
"name":"harry",
"properties":[
{
"id":"N3",
"type":"energetic",
"language": "english"
},
{
"id":"N6",
"status":"OPEN",
"type":"cool"
}
]
}
The output I am trying to achieve is:
{
"name":"harry",
"properties":[
{
"id":"N3",
"status":"OPEN",
"type":"energetic",
"language": "english"
},
{
"id":"N5",
"status":"OPEN",
"type":"hot"
},
{
"id":"N6",
"status":"OPEN",
"type":"cool"
}
]
}
Since id N3 appears in both lists, those two dicts should be merged so that the result has all their fields. So far I have tried using itertools, and this:
ds = [d1, d2]
d = {}
for k in d1.keys():
d[k] = tuple(d[k] for d in ds)
Could someone please help in figuring this out?
Here is one approach:
a = {
"name":"harry",
"properties":[
{
"id":"N3",
"status":"OPEN",
"type":"energetic"
},
{
"id":"N5",
"status":"OPEN",
"type":"hot"
}
]
}
b = {
"name":"harry",
"properties":[
{
"id":"N3",
"type":"energetic",
"language": "english"
},
{
"id":"N6",
"status":"OPEN",
"type":"cool"
}
]
}
# Map each id to its index in the respective properties list
a_ids = {item['id']: index for index, item in enumerate(a['properties'])}  # {'N3': 0, 'N5': 1}
b_ids = {item['id']: index for index, item in enumerate(b['properties'])}  # {'N3': 0, 'N6': 1}
# Loop through the ids of a
for prop_id in a_ids:
    # If the same id exists in b, merge a's fields into b's entry
    if prop_id in b_ids:
        b['properties'][b_ids[prop_id]].update(a['properties'][a_ids[prop_id]])
    # If it does not exist, append a's entry to b
    else:
        b['properties'].append(a['properties'][a_ids[prop_id]])
print(b)
Output:
{'name': 'harry', 'properties': [{'id': 'N3', 'type': 'energetic', 'language': 'english', 'status': 'OPEN'}, {'id': 'N6', 'status': 'OPEN', 'type': 'cool'}, {'id': 'N5', 'status': 'OPEN', 'type': 'hot'}]}
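The approach above modifies b in place and appends N5 after N6, so the order differs from the desired output. A minimal sketch that builds a new result and keeps the ids in first-seen order, assuming both inputs share the same name (the other top-level keys are copied from the first dict) and that fields from the second dict win on conflicts; merge_by_id is a hypothetical name:

```python
from itertools import chain

def merge_by_id(first, second):
    """Merge the 'properties' lists of two dicts, one entry per id."""
    merged = {}  # id -> merged property dict, in first-seen order
    for prop in chain(first['properties'], second['properties']):
        # Start an empty dict for a new id, then copy this entry's fields in;
        # fields from `second` overwrite those from `first` on conflict
        merged.setdefault(prop['id'], {}).update(prop)
    # Copy the other top-level keys (e.g. "name") from the first dict
    return {**first, 'properties': list(merged.values())}

a = {"name": "harry", "properties": [
    {"id": "N3", "status": "OPEN", "type": "energetic"},
    {"id": "N5", "status": "OPEN", "type": "hot"}]}
b = {"name": "harry", "properties": [
    {"id": "N3", "type": "energetic", "language": "english"},
    {"id": "N6", "status": "OPEN", "type": "cool"}]}

print(merge_by_id(a, b))
```

Because every merged entry starts as a fresh dict, the property dicts in a and b are left unchanged (the copy is shallow, so nested values would still be shared).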
It might help to treat the two objects as elements of their own lists. Maybe you have other objects with different name values, such as might come out of a JSON-formatted REST request.
Then you could do a left outer join on both name and id keys:
#!/usr/bin/env python
a = [
{
"name": "harry",
"properties": [
{
"id":"N3",
"status":"OPEN",
"type":"energetic"
},
{
"id":"N5",
"status":"OPEN",
"type":"hot"
}
]
}
]
b = [
{
"name": "harry",
"properties": [
{
"id":"N3",
"type":"energetic",
"language": "english"
},
{
"id":"N6",
"status":"OPEN",
"type":"cool"
}
]
}
]
a_names = set()
a_prop_ids_by_name = {}
a_by_name = {}
for ao in a:
    an = ao['name']
    a_names.add(an)
    if an not in a_prop_ids_by_name:
        a_prop_ids_by_name[an] = set()
    for ap in ao['properties']:
        api = ap['id']
        a_prop_ids_by_name[an].add(api)
    a_by_name[an] = ao
res = []
for bo in b:
    bn = bo['name']
    if bn not in a_names:
        res.append(bo)
    else:
        ao = a_by_name[bn]
        bp = bo['properties']
        for bpo in bp:
            if bpo['id'] not in a_prop_ids_by_name[bn]:
                ao['properties'].append(bpo)
        res.append(ao)
print(res)
The idea above is to first process list a to collect its names and, for each name, its property ids. Both are stored in Python sets, so their members are always unique.
Once you have these sets, you can do the left outer join on the contents of list b.
Either an object in b has no counterpart in a (no object in a shares its name), in which case you add that object to the result as-is, or an object in a does share its name. In that case you iterate over the b object's properties, look for ids not already in a's ids-by-name set, add those missing properties to the a object, and then add that processed object to the result.
Output:
[{'name': 'harry', 'properties': [{'id': 'N3', 'status': 'OPEN', 'type': 'energetic'}, {'id': 'N5', 'status': 'OPEN', 'type': 'hot'}, {'id': 'N6', 'status': 'OPEN', 'type': 'cool'}]}]
This doesn't do any error checking on the input, and it relies on name values being unique within each list. If the same name appears more than once in a list, you may get garbage (incorrect or unexpected output).
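In the output above, N3 keeps only a's fields, so "language": "english" is lost. If you also want the fields of shared ids merged, one option is a full outer join that groups by name first and then by id. A sketch, assuming each object only needs its name and properties keys (any others are dropped) and that b's fields win on conflicts; merge_records is a hypothetical name, and the "sally" record is made-up data to show a name that exists only in b:

```python
from itertools import chain

def merge_records(a, b):
    """Full outer join of two lists of records on name, then on property id."""
    by_name = {}  # name -> {id -> merged property dict}
    for record in chain(a, b):
        props = by_name.setdefault(record['name'], {})
        for prop in record.get('properties', []):
            # Fields from later records (b) overwrite earlier ones (a)
            props.setdefault(prop['id'], {}).update(prop)
    # Assumption: only "name" and "properties" are kept in the output
    return [{'name': name, 'properties': list(props.values())}
            for name, props in by_name.items()]

a = [{"name": "harry", "properties": [
    {"id": "N3", "status": "OPEN", "type": "energetic"},
    {"id": "N5", "status": "OPEN", "type": "hot"}]}]
b = [{"name": "harry", "properties": [
    {"id": "N3", "type": "energetic", "language": "english"},
    {"id": "N6", "status": "OPEN", "type": "cool"}]},
     {"name": "sally", "properties": [{"id": "N1", "status": "CLOSED"}]}]

print(merge_records(a, b))
```

Unlike the left outer join, this does not mutate the input objects, and a duplicate name inside one list is simply merged into the same group.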
My json object:
{
"students": [
{
"name" : "ben",
"hometown" : "unknown"
},
{
"name" : "sam",
"hometown" : "unknown"
}
]
}
with this list
"hometowns":{California,Colorado}
change to this:
{
"students": [
{
"name" : "ben",
"hometown" : "California"
},
{
"name" : "sam",
"hometown" : "Colorado"
}
]
}
I need to loop through the students, check for the "hometown" key, and change its value, like
students[i]["hometown"] = hometowns[i]
First, note that your syntax is a bit off (and your towns are actually states). After correcting the syntax, we can use the zip() function to iterate over both lists together:
hometowns = ["California", "Colorado"]
students = [{"name": "ben", "hometown": "unknown"},
{"name": "sam", "hometown": "unknown"}]
for student, hometown in zip(students, hometowns):
    student['hometown'] = hometown
students
[{'name': 'ben', 'hometown': 'California'},
{'name': 'sam', 'hometown': 'Colorado'}]
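zip stops at the shorter input, so if there are fewer hometowns than students, the remaining students silently keep "unknown". A minimal sketch that works on the full object from the question and raises an error on a length mismatch instead; assign_hometowns is a hypothetical name (on Python 3.10+ you could pass strict=True to zip to get a similar check):

```python
def assign_hometowns(data, hometowns):
    """Set each student's hometown from the list, matched by position."""
    students = data['students']
    # Fail loudly instead of silently skipping students (zip would truncate)
    if len(students) != len(hometowns):
        raise ValueError(
            f"{len(students)} students but {len(hometowns)} hometowns")
    for student, hometown in zip(students, hometowns):
        student['hometown'] = hometown
    return data

data = {"students": [{"name": "ben", "hometown": "unknown"},
                     {"name": "sam", "hometown": "unknown"}]}
assign_hometowns(data, ["California", "Colorado"])
print(data)
# {'students': [{'name': 'ben', 'hometown': 'California'}, {'name': 'sam', 'hometown': 'Colorado'}]}
```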
You can do something like this:
hometowns = ["California", "Colorado"]
students = [{"name": "ben"}, {"name": "sam"}]
for student, town in zip(students, hometowns):
    student["hometown"] = town
I assumed you were trying to specify hometowns and students as variables rather than elements of a larger dictionary, which would change the syntax somewhat.
Here is an alternative simple solution if you haven't learned about zip() in Python yet:
hometowns = ["California","Colorado"]
a = {"students": [
{
"name" : "ben",
"hometown" : "unknown"
},
{
"name" : "sam",
"hometown" : "unknown"
}
] }
num = 0
for j in a["students"]:
    j["hometown"] = hometowns[num]
    num += 1
print(a)
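Keeping a manual counter works, but enumerate() yields the index and the item together, so there is no counter to forget to increment. A self-contained sketch of the same loop:

```python
hometowns = ["California", "Colorado"]
a = {"students": [{"name": "ben", "hometown": "unknown"},
                  {"name": "sam", "hometown": "unknown"}]}

# enumerate yields (index, student) pairs, replacing the manual num counter
for num, student in enumerate(a["students"]):
    student["hometown"] = hometowns[num]

print(a)
```

Like the counter version, this raises IndexError if there are more students than hometowns.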
I am facing this issue where I need to insert a new field in an existing document at a specific position.
Sample document:
{
"name": "user",
"age" : "21",
"designation": "Developer"
}
So the above one is the sample document. What I want is to add "university": "ASU" right after the "age" key. Is this possible?
Here's what you can do: take the document as a dict, add the new field, find the index of age, and then reorder the items by indexing. Look below:
>>> dic = { "name": "user", "age" : "21", "designation": "Developer" }
>>> dic['university'] = 'ASU'
>>> dic
{'name': 'user', 'age': '21', 'designation': 'Developer', 'university': 'ASU'}
With the university field added, we now reorder the items using dic.items():
>>> i = list(dic.items())
>>> i
[('name', 'user'), ('age', '21'), ('designation', 'Developer'), ('university', 'ASU')]
# now we acquire the index of the 'age' field
>>> index = [j for j in range(len(i)) if 'age' in i[j]][0]
# the comprehension returns a single-element list, so [0] extracts the index
>>> index
1
# insert the last element ('university') right after the age field, hence index + 1
>>> i.insert(index+1,i[-1])
# then build the dictionary without the last element, which has already been inserted after age
>>> dict(i[:-1])
{'name': 'user', 'age': '21', 'university': 'ASU', 'designation': 'Developer'}
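The same idea can be wrapped in a small helper that rebuilds the dict and places the new key right after a given one. This is a sketch that relies on dicts keeping insertion order (guaranteed from Python 3.7); insert_after is a hypothetical name, and it returns a new dict rather than modifying the original:

```python
def insert_after(d, after_key, new_key, value):
    """Return a copy of d with new_key: value placed right after after_key."""
    if after_key not in d:
        raise KeyError(after_key)
    result = {}
    for key, val in d.items():
        if key == new_key:
            continue  # drop any existing entry so the key appears only once
        result[key] = val
        if key == after_key:
            result[new_key] = value
    return result

doc = {"name": "user", "age": "21", "designation": "Developer"}
print(insert_after(doc, "age", "university", "ASU"))
# {'name': 'user', 'age': '21', 'university': 'ASU', 'designation': 'Developer'}
```

Iterating over the keys directly also avoids matching a value that happens to equal 'age', which the tuple check in the session above would do.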
I have a method that takes a list of field names. In the method, I am making an API call out to get a record which will contain a list of dictionaries of fields.
Example API response:
"fields": [
{
"datetime_value": "1987-02-03T00:00:00",
"name": "birth_date"
},
{
"text_value": "Dennis",
"name": "first_name"
},
{
"text_value": "Monsewicz",
"name": "last_name"
},
{
"text_value": "Male",
"name": "sex"
},
{
"text_value": "White",
"name": "socks"
}
]
My method signature is contact(contact_id, contact_fields), where contact_fields looks like ['last_name', 'first_name'].
The final fields dictionary I am trying to create would look like (not worried about order):
{
"last_name": "Monsewicz",
"first_name": "Dennis"
}
So basically I want to generate a single dictionary where each key is the name attribute of a dictionary in the list, but only if that name is in the list of field names passed into the method.
I've tried this:
"fields": {x: y for x, y in contact['fields'] if x in contact_fields}
Something like this?
>>> fields
[{'datetime_value': '1987-02-03T00:00:00', 'name': 'birth_date'},
{'name': 'first_name', 'text_value': 'Dennis'},
{'name': 'last_name', 'text_value': 'Monsewicz'},
{'name': 'sex', 'text_value': 'Male'},
{'name': 'socks', 'text_value': 'White'}]
>>> output = {}
>>> for field in fields:
...     key = field.pop('name')
...     _unused_key, value = field.popitem()
...     output[key] = value
...
>>> output
{'birth_date': '1987-02-03T00:00:00',
'first_name': 'Dennis',
'last_name': 'Monsewicz',
'sex': 'Male',
'socks': 'White'}
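The loop above returns every field (not just those in contact_fields) and empties the original dicts, because pop and popitem remove keys. A non-destructive sketch that filters on the wanted names and handles both text_value and datetime_value, assuming each field has exactly one key ending in _value besides name; extract_fields is a hypothetical name:

```python
def extract_fields(fields, wanted):
    """Map each wanted field name to its value, leaving `fields` untouched."""
    wanted = set(wanted)  # O(1) membership checks
    result = {}
    for field in fields:
        name = field['name']
        if name not in wanted:
            continue
        # Assumption: the value is stored under a key ending in "_value"
        # (e.g. text_value, datetime_value)
        result[name] = next(v for k, v in field.items() if k.endswith('_value'))
    return result

fields = [
    {"datetime_value": "1987-02-03T00:00:00", "name": "birth_date"},
    {"text_value": "Dennis", "name": "first_name"},
    {"text_value": "Monsewicz", "name": "last_name"},
    {"text_value": "Male", "name": "sex"},
    {"text_value": "White", "name": "socks"},
]
print(extract_fields(fields, ['last_name', 'first_name']))
# {'first_name': 'Dennis', 'last_name': 'Monsewicz'}
```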
How about this one-liner?
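output = {x['name']: x['text_value'] for x in fields if x['name'] in contact_fields}
It loops through fields, keeps only the entries whose name is in contact_fields, and builds a dict from the name/text_value pairs. Filtering first also avoids a KeyError on birth_date (as long as it isn't requested), since that field has a datetime_value instead of a text_value.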
I can receive the following JSON string:
{ "response" : [ [ { "name" : "LA_",
"uid" : 123456
} ],
[ { "cid" : "1",
"name" : "Something"
} ],
[ { "cid" : 1,
"name" : "Something-else"
} ]
] }
or one of the following:
{"error":"some-error"}
{ "response" : [ [ { "name" : "LA_",
"uid" : 123456
} ],
[ { "cid" : "1",
"name" : ""
} ],
[ { "cid" : 1,
"name" : "Something-else"
} ]
] }
{ "response" : [ [ { "name" : "LA_",
"uid" : 123456
} ] ] }
So I am not sure whether all the children and elements are there. Are the following checks enough to get the Something value?
if jsonstr.get('response'):
    jsonstr = jsonstr.get('response')[1][0]
    if jsonstr:
        name = jsonstr.get('name')
        if name:  # I don't need empty value
            pass  # save in the database
Can the same be simplified?
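One way to simplify the nested checks is to attempt the lookup directly and catch the errors a missing piece would raise (the "easier to ask forgiveness than permission" style). A sketch, assuming the JSON has already been parsed into a dict (as the .get calls imply) and that the wanted object is always at response[1][0]; get_second_name is a hypothetical name:

```python
def get_second_name(data):
    """Return response[1][0]['name'], or None if it is missing or empty."""
    try:
        name = data['response'][1][0]['name']
    except (KeyError, IndexError, TypeError):
        # KeyError: no "response" or "name" key (e.g. the error payload)
        # IndexError: fewer inner lists/objects than expected
        # TypeError: an element is not the list/dict shape assumed here
        return None
    # Treat an empty string as missing, as in the original check
    return name or None

full = {"response": [[{"name": "LA_", "uid": 123456}],
                     [{"cid": "1", "name": "Something"}],
                     [{"cid": 1, "name": "Something-else"}]]}
print(get_second_name(full))                     # Something
print(get_second_name({"error": "some-error"}))  # None
```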
You're not guaranteed that the server will always return the inner objects in the same order (and in the shortest response there is no index 1 at all), so indexing is not a safe way to reference the object whose name attribute is Something.
Instead of nesting all those if statements, you can get away with a list comprehension. Observe that if you iterate over the value of the response key, you get a series of lists, each with a dictionary inside it:
>>> data = {"response":[[{"uid":123456,"name":"LA_"}],[{"cid":"1","name":"Something"}],[{"cid":1,"name":"Something-else"}]]}
>>> [lst for lst in data.get('response')]
[[{'name': 'LA_', 'uid': 123456}], [{'name': 'Something', 'cid': '1'}], [{'name': 'Something-else', 'cid': 1}]]
If you index the first item in each list (lst[0]), you end up with a list of objects:
>>> [lst[0] for lst in data.get('response')]
[{'name': 'LA_', 'uid': 123456}, {'name': 'Something', 'cid': '1'}, {'name': 'Something-else', 'cid': 1}]
If you then add an if condition into your list comprehension to match the name attribute on the objects, you get a list with a single item containing your desired object:
>>> [lst[0] for lst in data.get('response') if lst[0].get('name') == 'Something']
[{'name': 'Something', 'cid': '1'}]
And then by indexing the first item of that final list, you get the desired object:
>>> [lst[0] for lst in data.get('response') if lst[0].get('name') == 'Something'][0]
{'name': 'Something', 'cid': '1'}
So then you can just turn that into a function and move on with your life:
def get_obj_by_name(data, name):
    objects = [lst[0] for lst in data.get('response', []) if lst[0].get('name') == name]
    if objects:
        return objects[0]
    return None
print(get_obj_by_name(data, 'Something'))
# => {'name': 'Something', 'cid': '1'}
print(get_obj_by_name(data, 'Something')['name'])
# => Something
And it should be resilient and return None if the response key isn't found:
print(get_obj_by_name({"error":"some-error"}, 'Something'))
# => None
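get_obj_by_name builds a full list just to take its first element, and lst[0] raises IndexError if an inner list is empty. A variant of the same name-based lookup using next() with a default stops at the first match and skips empty lists; this is a sketch for Python 3:

```python
def get_obj_by_name(data, name):
    """Return the first inner object whose 'name' equals name, else None."""
    return next(
        (lst[0] for lst in data.get('response', [])
         # `lst and` skips empty inner lists instead of raising IndexError
         if lst and lst[0].get('name') == name),
        None,  # default when nothing matches or "response" is missing
    )

data = {"response": [[{"uid": 123456, "name": "LA_"}],
                     [{"cid": "1", "name": "Something"}],
                     [{"cid": 1, "name": "Something-else"}]]}
print(get_obj_by_name(data, 'Something'))  # {'cid': '1', 'name': 'Something'}
print(get_obj_by_name({"error": "some-error"}, 'Something'))  # None
```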