What is the most efficient way to create nested dictionaries in Python? - python

I currently have over 10k elements in my dictionary looks like:
cars = [{'model': 'Ford', 'year': 2010},
{'model': 'BMW', 'year': 2019},
...]
And I have a second dictionary:
car_owners = [{'model': 'BMW', 'name': 'Sam', 'age': 34},
{'model': 'BMW', 'name': 'Taylor', 'age': 34},
.....]
However, I want to join together the 2 together to be something like:
combined = [{'model': 'BMW',
'year': 2019,
'owners: [{'name': 'Sam', 'age': 34}, ...]
}]
What is the best way to combine them? For the moment I am using a For loop but I feel like there are more efficient ways of dealing with this.
** This is just a fake example of data, the one I have is a lot more complex but this helps give the idea of what I want to achieve

Iterate over the first list, creating a dict with the key-val as model-val, then in the second dict, look for the same key (model) and update the first dict, if it is found:
cars = [{'model': 'Ford', 'year': 2010}, {'model': 'BMW', 'year': 2019}]
car_owners = [{'model': 'BMW', 'name': 'Sam', 'age': 34}, {'model': 'Ford', 'name': 'Taylor', 'age': 34}]
dd = {x['model']:x for x in cars}
for item in car_owners:
key = item['model']
if key in dd:
del item['model']
dd[key].update({'car_owners': item})
else:
dd[key] = item
print(list(dd.values()))
OUTPUT:
[{'model': 'BMW', 'year': 2019, 'car_owners': {'name': 'Sam', 'age': 34}}, {'model': 'Ford', 'year': 2010, 'car_owners': {'name': 'Taylor',
'age': 34}}]

Really, what you want performance wise is to have dictionaries with the model as the key. That way, you have O(1) lookup and can quickly get the requested element (instead of looping each time in order to find the car with model x).
If you're starting off with lists, I'd first create dictionaries, and then everything is O(1) from there on out.
models_to_cars = {car['model']: car for car in cars}
models_to_owners = {}
for car_owner in car_owners:
models_to_owners.setdefault(car_owner['model'], []).append(car_owner)
combined = [{
**car,
'owners': models_to_owners.get(model, [])
} for model, car in models_to_cars.items()]
Then you'd have
combined = [{'model': 'BMW',
'year': 2019,
'owners': [{'name': 'Sam', 'age': 34}, ...]
}]
as you wanted

Related

Remove dictionaries from a list of dictionaries under certain conditions

I have a very large list of dictionaries that looks like this (I show a simplified version):
list_of_dicts:
[{'ID': 1234,
'Name': 'Bobby',
'Animal': 'Dog',
'About': [{'ID': 5678, 'Food': 'Dog Food'}]},
{'ID': 5678, 'Food': 'Dog Food'},
{'ID': 91011,
'Name': 'Jack',
'Animal': 'Bird',
'About': [{'ID': 1996, 'Food': 'Seeds'}]},
{'ID': 1996, 'Food': 'Seeds'},
{'ID': 2007,
'Name': 'Bean',
'Animal': 'Cat',
'About': [{'ID': 2008, 'Food': 'Fish'}]},
{'ID': 2008, 'Food': 'Fish'}]
I'd like to remove the dictionaries containing IDs that are equal to the ID's nested in the 'About' entries. For example, 'ID' 2008, is already nested in the nested 'About' value, therefore I'd like to remove that dictionary.
I have some code that can do this, and for this specific example it works. However, the amount of data that I have is much larger, and the remove() function does not seem to remove all the entries unless I run it a couple of times.
Any suggestions on how I can do this better?
My code:
nested_ids = [5678, 1996, 2008]
for i in list_of_dicts:
if i['ID'] in nested_ids:
list_of_dicts.remove(i)
Desired output:
[{'ID': 1234,
'Name': 'Bobby',
'Animal': 'Dog',
'About': [{'ID': 5678, 'Food': 'Dog Food'}]},
{'ID': 91011,
'Name': 'Jack',
'Animal': 'Bird',
'About': [{'ID': 1996, 'Food': 'Seeds'}]},
{'ID': 2007,
'Name': 'Bean',
'Animal': 'Cat',
'About': [{'ID': 2008, 'Food': 'Fish'}]}]
You can use a list comprehension:
cleaned_list = [d for d in list_of_dicts if d['ID'] not in nested_ids]
It is happening because we are modifying the dict while iterating it, So to avoid that we can copy the required values to a new dict as follow
filtered_dicts = []
nested_ids = [5678, 1996, 2008]
for curr in list_of_dicts:
if curr['ID'] not in nested_ids:
filtered_dicts.append(curr)
the problem is that when you remove a member of a list you're changing the indexes of everything after that index so you should reorder the indices you want to remove in reverse so you start from the back of the list
so all you need to do is iterate over the list in reverse order:
for i in list_of_dicts[::-1]:

Adding key value pair to a dictionary list

I'm trying to add a third child to the below dictionary:
person = {'Name':'Jane', 'Age':32, 'Allergies':
[{'Allergen':'Dust'},
{'Allergen':'Feathers'},
{'Allergen':'Strawberries'}],
'Children':
[{'Name':'Ben', 'Age': 6}, {'Name':'Elly', 'Age': 8}]}
print(person)
{'Name': 'Jane', 'Age': 32, 'Allergies': [{'Allergen': 'Dust'}, {'Allergen': 'Feathers'}, {'Allergen': 'Strawberries'}], 'Children': [{'Name': 'Ben', 'Age': 6}, {'Name': 'Elly', 'Age': 8}]}
When I try update person.update('Children': [{'Name':'Hanna', 'Age':0}])
it replaces all children with just that one? Nothing else works either... Any suggestions?
The person dictionary does not know that the Allergies and Children are lists, so you need to use the lists' methods to append things to that specific list.
person["Allergies"].append({"Allergen": "gluten"})
# or
person["Children"].append({"name":"Hannah", "age": 0})

Obtain a set of unique values from a dictionary of lists

I have a list of dictionaries in which the dictionaries also contain a list.
I want to generate a set of the values of the respective nested lists so that I end up with a set of all of the unique items (in this case, hobbies).
I feel a set is perfect for this since it will automatically drop any duplicates, leaving me with a set of all of the unique hobbies.
people = [{'name': 'John', 'age': 47, 'hobbies': ['Python', 'cooking', 'reading']},
{'name': 'Mary', 'age': 16, 'hobbies': ['horses', 'cooking', 'art']},
{'name': 'Bob', 'age': 14, 'hobbies': ['Python', 'piano', 'cooking']},
{'name': 'Sally', 'age': 11, 'hobbies': ['biking', 'cooking']},
{'name': 'Mark', 'age': 54, 'hobbies': ['hiking', 'camping', 'Python', 'chess']},
{'name': 'Alisa', 'age': 52, 'hobbies': ['camping', 'reading']},
{'name': 'Megan', 'age': 21, 'hobbies': ['lizards', 'reading']},
{'name': 'Amanda', 'age': 19, 'hobbies': ['turtles']},
]
unique_hobbies = (item for item in people['hobbies'] for hobby in people['hobbies'].items())
print(unique_hobbies)
This generates an error:
TypeError: list indices must be integers or slices, not str
My comprehension is wrong, but I am not sure where. I want to iterate through each dictionary, then iterate through each nested list and update the items into the set, which will drop all duplicates, leaving me with a set of all of the unique hobbies.
You could also use a set-comprehension:
>>> unique_hobbies = {hobby for persondct in people for hobby in persondct['hobbies']}
>>> unique_hobbies
{'horses', 'lizards', 'cooking', 'art', 'biking', 'camping', 'reading', 'piano', 'hiking', 'turtles', 'Python', 'chess'}
The problem with your comprehension is that you want to access people['hobbies'] but people is a list and can only index lists with integers or slices. To make it work you need to iterate over you list and then access the 'hobbies' of each of the subdicts (like I did inside the set-comprehension above).
I got it:
unique_hobbies = set()
for d in people:
unique_hobbies.update(d['hobbies'])
print(unique_hobbies)

Creating a list of dictionaries from separate lists

I honestly expected this to have been asked previously, but after 30 minutes of searching I haven't had any luck.
Say we have multiple lists, each of the same length, each one containing a different type of data about something. We would like to turn this into a list of dictionaries with the data type as the key.
input:
data = [['tom', 'jim', 'mark'], ['Toronto', 'New York', 'Paris'], [1990,2000,2000]]
data_types = ['name', 'place', 'year']
output:
travels = [{'name':'tom', 'place': 'Toronto', 'year':1990},
{'name':'jim', 'place': 'New York', 'year':2000},
{'name':'mark', 'place': 'Paris', 'year':2001}]
This is fairly easy to do with index-based iteration:
travels = []
for d_index in range(len(data[0])):
travel = {}
for dt_index in range(len(data_types)):
travel[data_types[dt_index]] = data[dt_index][d_index]
travels.append(travel)
But this is 2017! There has to be a more concise way to do this! We have map, flatmap, reduce, list comprehensions, numpy, lodash, zip. Except I can't seem to compose these cleanly into this particular transformation. Any ideas?
You can use a list comprehension with zip after transposing your dataset:
>>> [dict(zip(data_types, x)) for x in zip(*data)]
[{'place': 'Toronto', 'name': 'tom', 'year': 1990},
{'place': 'New York', 'name': 'jim', 'year': 2000},
{'place': 'Paris', 'name': 'mark', 'year': 2000}]

Converting a dict of dicts into a spreadsheet

My data is a dict of dicts like this (but with a lot more fields and names):
{'Jack': {'age': 15, 'location': 'UK'},
'Jill': {'age': 23, 'location': 'US'}}
I want to export it to a spreadsheet like this:
Name Age Location
Jack 15 UK
Jill 23 US
But apparently csv.DictWriter wants to see a list of dicts like this:
[{'name': 'Jack', 'age': 15, 'location': 'UK'},
{'name': 'Jill', 'age': 23, 'location': 'US'}]
What's the simplest way to get my dict of dicts into a CSV file?
I suppose the entries should be in alphabetical name order, too, but that's easy enough to do in the spreadsheet software.
mydict = {'Jack': {'age': 15, 'location': 'UK'},
'Jill': {'age': 23, 'location': 'US'}}
datalist = []
for name in mydict:
data = mydict[name]
data['name'] = name
datalist.append(data)
And datalist will hold the desired result. Notes:
This also modifies mydict (adding the 'name' key to each datum). You probably don't mind about that. It's a bit more efficient than the alternative (copying).
It's not the most flashy one-line-way-to-do-it, but it's very readable, which IMHO is more important
You can use list-comprehension to get your new list of dicts:
>>> [dict(zip(['name'] + attrs.keys(), [name] + attrs.values())) \
for name, attrs in d.iteritems()]
[{'age': 23, 'location': 'US', 'name': 'Jill'},
{'age': 15, 'location': 'UK', 'name': 'Jack'}]
EDIT: d is your dict:
>>> d
{'Jack': {'age': 15, 'location': 'UK'}, 'Jill': {'age': 23, 'location': 'US'}}

Categories