Filter/group dictionary by nested value - python

Here's a simplified example of some data I have:
{"id": "1234565", "fields": {"name": "john", "email":"john#example.com", "country": "uk"}}
The nested dictionary is part of a bigger list of address data. The goal is to create pairs of people from the list with randomized partners, where partners from the same country should be preferred. So my first real issue is finding a good way to group them by that country value.
I'm sure there's a smarter way to do this than iterating through the dict and writing all records out to some new list/dict?

I think this is close to what you need:
import itertools
result = {key: [i for i in value] for key, value in itertools.groupby(people, lambda item: item["fields"]["country"])}
This uses itertools.groupby to group the people in the people list by their country. The resulting dictionary has countries as keys and the unpacked groups (the matching people) as values. The input is expected to be a list of dictionaries like the one in your example:
people = [{"id": "1234565", "fields": {"name": "john", "email":"john#example.com", "country": "uk"}},
{"id": "654321", "fields": {"name": "sam", "email":"sam#example.com", "country": "uk"}}]
Sample output:
>>> print(result)
{'uk': [{'id': '1234565', 'fields': {'name': 'john', 'email': 'john#example.com', 'country': 'uk'}}, {'id': '654321', 'fields': {'name': 'sam', 'email': 'sam#example.com', 'country': 'uk'}}]}
For a cleaner result, the looping construct can be tweaked so that only the ID of each person is included in the result dict:
result = {key:[i["id"] for i in value] for key, value in itertools.groupby(people, lambda item: item["fields"]["country"])}
>>> print(result)
{'uk': ['1234565', '654321']}
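EDIT: Sorry, I forgot about the sorting. groupby only groups consecutive items that share a key, so sort the list of people by country before passing it to groupby:
sorted_people = sorted(people, key=lambda item: item["fields"]["country"])
result = {key: [i["id"] for i in value] for key, value in itertools.groupby(sorted_people, lambda item: item["fields"]["country"])}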
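To see why the sort matters: with unsorted input, groupby emits a separate group each time the country changes, and in a dict comprehension the later group for a country silently overwrites the earlier one. A minimal sketch, assuming a hypothetical third person from 'fr' placed between the two 'uk' entries:

```python
from itertools import groupby

people = [
    {"id": "1234565", "fields": {"name": "john", "email": "john#example.com", "country": "uk"}},
    # Hypothetical extra record, added only to separate the two "uk" entries.
    {"id": "777777", "fields": {"name": "anne", "email": "anne#example.com", "country": "fr"}},
    {"id": "654321", "fields": {"name": "sam", "email": "sam#example.com", "country": "uk"}},
]

def country_of(person):
    """Key function: the nested country value."""
    return person["fields"]["country"]

# Without sorting, groupby yields "uk", "fr", "uk"; the second "uk" group
# replaces the first in the dict, so john is lost.
unsorted_result = {key: [p["id"] for p in group] for key, group in groupby(people, country_of)}

# Sorting by the same key first makes equal countries adjacent.
sorted_people = sorted(people, key=country_of)
result = {key: [p["id"] for p in group] for key, group in groupby(sorted_people, country_of)}

print(unsorted_result)  # {'uk': ['654321'], 'fr': ['777777']}
print(result)           # {'fr': ['777777'], 'uk': ['1234565', '654321']}
```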

Here is another one that uses defaultdict:
import collections

def make_groups(nested_dicts, nested_key):
    default = collections.defaultdict(list)
    for nested_dict in nested_dicts:
        for value in nested_dict.values():
            try:
                default[value[nested_key]].append(nested_dict)
            except TypeError:
                # value is not a dict (e.g. the 'id' value), so it can't be indexed by key; skip it
                pass
    return default
To test the results:
import random

COUNTRY = ('af', 'br', 'fr', 'mx', 'uk')
people = [{'id': i, 'fields': {
              'name': 'name'+str(i),
              'email': str(i)+'#email',
              'country': random.choice(COUNTRY)}}
          for i in range(10)]
country_groups = make_groups(people, 'country')
for country, persons in country_groups.items():
    print(country, persons)
Sample output (the countries are random, so yours will differ):
fr [{'id': 0, 'fields': {'name': 'name0', 'email': '0#email', 'country': 'fr'}}, {'id': 1, 'fields': {'name': 'name1', 'email': '1#email', 'country': 'fr'}}, {'id': 4, 'fields': {'name': 'name4', 'email': '4#email', 'country': 'fr'}}]
br [{'id': 2, 'fields': {'name': 'name2', 'email': '2#email', 'country': 'br'}}, {'id': 8, 'fields': {'name': 'name8', 'email': '8#email', 'country': 'br'}}]
uk [{'id': 3, 'fields': {'name': 'name3', 'email': '3#email', 'country': 'uk'}}, {'id': 7, 'fields': {'name': 'name7', 'email': '7#email', 'country': 'uk'}}]
af [{'id': 5, 'fields': {'name': 'name5', 'email': '5#email', 'country': 'af'}}, {'id': 9, 'fields': {'name': 'name9', 'email': '9#email', 'country': 'af'}}]
mx [{'id': 6, 'fields': {'name': 'name6', 'email': '6#email', 'country': 'mx'}}]
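make_groups scans every value of each record, which keeps it generic, but it would also pick up any other nested dict that happens to contain the key, and only TypeError is caught. If the records always have the question's {'id': ..., 'fields': {...}} layout (the one assumption here), a more direct sketch reads the path explicitly and needs no sorting:

```python
from collections import defaultdict

def group_by_country(records):
    """Group records by record['fields']['country'], keeping input order within each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record["fields"]["country"]].append(record)
    # Return a plain dict so lookups of unknown countries raise KeyError
    # instead of silently creating empty lists.
    return dict(groups)

people = [
    {"id": "1234565", "fields": {"name": "john", "email": "john#example.com", "country": "uk"}},
    {"id": "654321", "fields": {"name": "sam", "email": "sam#example.com", "country": "uk"}},
    # Hypothetical extra record for illustration.
    {"id": "777777", "fields": {"name": "anne", "email": "anne#example.com", "country": "fr"}},
]

groups = group_by_country(people)
print({country: [p["id"] for p in members] for country, members in groups.items()})
# {'uk': ['1234565', '654321'], 'fr': ['777777']}
```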

Related

Python convert multiple lists to dictionary

I have 3 lists:
names = ["john", "paul", "george", "ringo"]
job = ["guitar", "bass", "guitar", "drums"]
status = ["dead", "alive", "dead", "alive"]
I am trying to figure out the best way to combine these lists into a dict like the following:
{"person":{"Name":"john", "Job":"guitar", "Status":"dead"}, "person":{"Name":"paul", "Job":"bass", "Status":"alive"}, "person":{"Name":"george", "Job":"guitar", "Status":"dead"}, "person":{"Name":"ringo", "Job":"drums", "Status":"alive"}}
I have tried using dict(zip) but cannot get it to format like above.
Thanks in advance!
A dict can't hold several entries under the same "person" key (each one would overwrite the previous), so I think what you want is a list of dictionaries. You can zip your three lists together and use a list comprehension. Here's an example, with your job list renamed to jobs and status to statuses:
persons = [
    {'name': name, 'job': job, 'status': status}
    for name, job, status in zip(names, jobs, statuses)
]
Which will give you:
[
{'name': 'john', 'job': 'guitar', 'status': 'dead'},
{'name': 'paul', 'job': 'bass', 'status': 'alive'},
{'name': 'george', 'job': 'guitar', 'status': 'dead'},
{'name': 'ringo', 'job': 'drums', 'status': 'alive'}
]
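If you'd rather keep the key names in one place, dict(zip(...)) can build each record from one row of values. A short sketch using the question's original list names; the keys tuple is an illustrative choice:

```python
names = ["john", "paul", "george", "ringo"]
job = ["guitar", "bass", "guitar", "drums"]
status = ["dead", "alive", "dead", "alive"]

keys = ("name", "job", "status")

# zip(names, job, status) yields one (name, job, status) tuple per person;
# zip(keys, row) then pairs each value with its key.
persons = [dict(zip(keys, row)) for row in zip(names, job, status)]

print(persons[0])  # {'name': 'john', 'job': 'guitar', 'status': 'dead'}
```

Note that zip stops at the shortest input, so a missing entry in one list silently drops the trailing people.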
I think this is what you are looking for:
>>> names = ["john", "paul", "george", "ringo"]
>>> job = ["guitar", "bass", "guitar", "drums"]
>>> status = ["dead", "alive", "dead", "alive"]
>>> persons = []
>>> for n, j, s in zip(names, job, status):
...     person = {'name': n, 'job': j, 'status': s}
...     persons.append(person)
...
>>> persons
[{'name': 'john', 'job': 'guitar', 'status': 'dead'}, {'name': 'paul', 'job': 'bass', 'status': 'alive'}, {'name': 'george', 'job': 'guitar', 'status': 'dead'}, {'name': 'ringo', 'job': 'drums', 'status': 'alive'}]
>>>
Try this approach:
names = ["john", "paul", "george", "ringo"]
job = ["guitar", "bass", "guitar", "drums"]
status = ["dead", "alive", "dead", "alive"]
A two-step process:
description = [{'job': j, 'status': s} for j, s in zip(job, status)]
artist = {n: i for n, i in zip(names, description)}
final output:
print(artist)
{'john': {'job': 'guitar', 'status': 'dead'},
'paul': {'job': 'bass', 'status': 'alive'},
'george': {'job': 'guitar', 'status': 'dead'},
'ringo': {'job': 'drums', 'status': 'alive'}}
you can then do something like this:
artist['ringo']['job']
output:
'drums'
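The two steps can also be collapsed into a single dict comprehension keyed by name. A sketch; it assumes the names are unique, since a repeated name would overwrite the earlier entry:

```python
names = ["john", "paul", "george", "ringo"]
job = ["guitar", "bass", "guitar", "drums"]
status = ["dead", "alive", "dead", "alive"]

# One pass: each name maps to a small dict of that person's details.
artist = {n: {"job": j, "status": s} for n, j, s in zip(names, job, status)}

print(artist["ringo"]["job"])  # drums
```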

Python find element from list of dict in other list of dict

I have two lists of dicts:
students = [{'lastname': 'JAKUB', 'id': '92051048757', 'name': 'BAJOREK'},
{'lastname': 'MARIANNA', 'id': '92051861424', 'name': 'SLOTARZ'}, {'lastname':
'SZYMON', 'id': '92052033215', 'name': 'WNUK'}, {'lastname': 'WOJCIECH', 'id':
'92052877491', 'name': 'LESKO'}]
And
house = [{'id_pok': '2', 'id': '92051048757'}, {'id_pok': '24', 'id': '92051861424'}]
How can I find the students whose id does not exist in the house list?
Expected output:
output = [{'lastname':
'SZYMON', 'id': '92052033215', 'name': 'WNUK'}]
I tried this:
for student in students:
    for home in house:
        if student['id'] != home['id']:
            print(student)
But this just repeats the list.
The reason your code doesn't work is that a student is printed for every house entry whose id doesn't match, so the student is printed even when another house entry does match. You'd need some more logic, or the any function:
for student in students:
    if not any(student['id'] == home['id'] for home in house):
        print(student)
It outputs:
{'lastname': 'SZYMON', 'id': '92052033215', 'name': 'WNUK'}
{'lastname': 'WOJCIECH', 'id': '92052877491', 'name': 'LESKO'}
A more efficient solution would be to keep a set of house_ids, and find students whose id isn't included in this set:
students = [{'lastname': 'JAKUB', 'id': '92051048757', 'name': 'BAJOREK'},
{'lastname': 'MARIANNA', 'id': '92051861424', 'name': 'SLOTARZ'}, {'lastname':
'SZYMON', 'id': '92052033215', 'name': 'WNUK'}, {'lastname': 'WOJCIECH', 'id':
'92052877491', 'name': 'LESKO'}]
house = [{'id_pok': '2', 'id': '92051048757'}, {'id_pok': '24', 'id': '92051861424'}]
house_ids = set(house_dict['id'] for house_dict in house)
result = [student for student in students if student['id'] not in house_ids]
print(result)
It outputs:
[{'lastname': 'SZYMON', 'id': '92052033215', 'name': 'WNUK'}, {'lastname': 'WOJCIECH', 'id': '92052877491', 'name': 'LESKO'}]
Note that 2 students match your description.
The reason a set is used is that membership tests on a set are much faster than on a list.
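Another option is to compare the two sets of ids directly:
student_ids = set(d.get('id') for d in students)
house_ids = set(d.get('id') for d in house)
# difference (-), not symmetric difference (^), so house ids with no matching student are excluded
ids_not_in_house = student_ids - house_ids
The same idea, mapping the ids back to the full student records:
students = [{'lastname': 'JAKUB', 'id': '92051048757', 'name': 'BAJOREK'},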
{'lastname': 'MARIANNA', 'id': '92051861424', 'name': 'SLOTARZ'}, {'lastname':
'SZYMON', 'id': '92052033215', 'name': 'WNUK'}, {'lastname': 'WOJCIECH', 'id':
'92052877491', 'name': 'LESKO'}]
house = [{'id_pok': '2', 'id': '92051048757'}, {'id_pok': '24', 'id': '92051861424'}]
s = {item['id'] for item in students}
h = {item['id'] for item in house}
not_in_house_ids = s.difference(h)
not_in_house_items = [x for x in students if x['id'] in not_in_house_ids]
print(not_in_house_items)
[{'lastname': 'SZYMON', 'id': '92052033215', 'name': 'WNUK'}, {'lastname': 'WOJCIECH', 'id': '92052877491', 'name': 'LESKO'}]
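Set difference (-, or .difference()) and symmetric difference (^) only agree on this data because every id in house also appears in students. A small sketch with a hypothetical house entry whose id matches no student shows where they diverge:

```python
students = [
    {"lastname": "JAKUB", "id": "92051048757", "name": "BAJOREK"},
    {"lastname": "SZYMON", "id": "92052033215", "name": "WNUK"},
]
house = [
    {"id_pok": "2", "id": "92051048757"},
    # Hypothetical entry whose id matches no student.
    {"id_pok": "99", "id": "00000000000"},
]

student_ids = {s["id"] for s in students}
house_ids = {h["id"] for h in house}

print(student_ids - house_ids)  # {'92052033215'}: students without a house
print(student_ids ^ house_ids)  # also contains '00000000000', which is not a student
```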

Convert dictionary lists to multi-dimensional list of dictionaries

I've been trying to convert the following:
data = {'title':['doc1','doc2','doc3'], 'name':['test','check'], 'id':['ddi5i'] }
to:
[{'title':'doc1', 'name': 'test', 'id': 'ddi5i'},
{'title':'doc2', 'name': 'test', 'id': 'ddi5i'},
{'title':'doc3', 'name': 'test', 'id': 'ddi5i'},
{'title':'doc1', 'name': 'check', 'id': 'ddi5i'},
{'title':'doc2', 'name': 'check', 'id': 'ddi5i'},
{'title':'doc3', 'name': 'check', 'id': 'ddi5i'}]
I've tried various options (list comprehensions, pandas and custom code) but nothing seems to work. For example, the following:
pandas.DataFrame(data).to_dict('list')
raises an error, because the DataFrame constructor requires all of the lists to have the same length. Besides, the output would only be one-dimensional, which is not what I'm looking for.
itertools.product may be what you're looking for here, and it can be applied to the values of your data to get appropriate value groupings for the new dicts. Something like
list(dict(zip(data, ele)) for ele in product(*data.values()))
Demo
>>> from itertools import product
>>> list(dict(zip(data, ele)) for ele in product(*data.values()))
[{'title': 'doc1', 'name': 'test', 'id': 'ddi5i'},
{'title': 'doc1', 'name': 'check', 'id': 'ddi5i'},
{'title': 'doc2', 'name': 'test', 'id': 'ddi5i'},
{'title': 'doc2', 'name': 'check', 'id': 'ddi5i'},
{'title': 'doc3', 'name': 'test', 'id': 'ddi5i'},
{'title': 'doc3', 'name': 'check', 'id': 'ddi5i'}]
This is the output on Python 3.7+, where dicts keep insertion order: the same six dicts as in the question, in a different order. It becomes clear how this works once you see:
>>> list(product(*data.values()))
[('doc1', 'test', 'ddi5i'),
('doc1', 'check', 'ddi5i'),
('doc2', 'test', 'ddi5i'),
('doc2', 'check', 'ddi5i'),
('doc3', 'test', 'ddi5i'),
('doc3', 'check', 'ddi5i')]
and now it is just a matter of zipping back into a dict with the original keys.
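product() cycles its last iterable fastest, so the row order follows the order of the dict's keys. To reproduce the exact ordering in the question, with name varying slowest, a sketch can pass the lists in an explicit order; the order list is an assumption chosen to match the question's expected output:

```python
from itertools import product

data = {'title': ['doc1', 'doc2', 'doc3'], 'name': ['test', 'check'], 'id': ['ddi5i']}

# Keys from slowest- to fastest-varying; product() cycles its last iterable fastest.
order = ['name', 'title', 'id']

rows = []
for combo in product(*(data[k] for k in order)):
    row = dict(zip(order, combo))
    # Rebuild the dict in the original key order of `data` (title, name, id).
    rows.append({k: row[k] for k in data})

for r in rows:
    print(r)
```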

Updating a value in a dictionary inside a dictionary

If I have a list of contact dictionaries like this:
{'name': 'Rob', 'phoneNumbers': [{'phone': '123-3214', 'type': 'home'}, {'phone': '456-3216', 'type': 'work'}]}
how could I update this dictionary to remove the dashes from the phone numbers in a list of contact dictionaries pythonically?
You could just nest loops:
for contact_dict in list_of_dicts:
    for phone_dict in contact_dict['phoneNumbers']:
        phone_dict['phone'] = phone_dict['phone'].replace('-', '')
This alters the values in-place.
Or you could create a whole new copy of the structure, with the alterations made:
[dict(contact, phoneNumbers=[
dict(phone_dict, phone=phone_dict['phone'].replace('-', ''))
for phone_dict in contact['phoneNumbers']])
for contact in list_of_dicts]
This creates a semi-shallow copy; only the phoneNumbers key is explicitly copied, but any other mutable values are just referenced by the new dictionaries.
Demo:
>>> list_of_dicts = [{'name': 'Rob', 'phoneNumbers': [{'phone': '123-3214', 'type': 'home'}, {'phone': '456-3216', 'type': 'work'}]}]
>>> [dict(contact, phoneNumbers=[
... dict(phone_dict, phone=phone_dict['phone'].replace('-', ''))
... for phone_dict in contact['phoneNumbers']])
... for contact in list_of_dicts]
[{'name': 'Rob', 'phoneNumbers': [{'phone': '1233214', 'type': 'home'}, {'phone': '4563216', 'type': 'work'}]}]
>>> for contact_dict in list_of_dicts:
...     for phone_dict in contact_dict['phoneNumbers']:
...         phone_dict['phone'] = phone_dict['phone'].replace('-', '')
...
>>> list_of_dicts
[{'name': 'Rob', 'phoneNumbers': [{'phone': '1233214', 'type': 'home'}, {'phone': '4563216', 'type': 'work'}]}]
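The semi-shallow copy shares every value it does not rebuild. If the contacts may hold other nested, mutable data and the original must stay untouched, one option is copy.deepcopy followed by the same in-place loop. A sketch, with strip_dashes as a hypothetical helper name:

```python
import copy

def strip_dashes(contacts):
    """Return a fully independent copy of contacts with '-' removed from every phone number."""
    result = copy.deepcopy(contacts)  # copies the nested lists and dicts too
    for contact in result:
        for phone_dict in contact['phoneNumbers']:
            phone_dict['phone'] = phone_dict['phone'].replace('-', '')
    return result

list_of_dicts = [{'name': 'Rob', 'phoneNumbers': [{'phone': '123-3214', 'type': 'home'}, {'phone': '456-3216', 'type': 'work'}]}]
cleaned = strip_dashes(list_of_dicts)

print(cleaned[0]['phoneNumbers'][0]['phone'])        # 1233214
print(list_of_dicts[0]['phoneNumbers'][0]['phone'])  # 123-3214 (original unchanged)
```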
Just str.replace the -
d = {'name': "Rob", 'phoneNumbers': [{'phone': '123-3214', 'type': 'home'}, {'phone': '456-3216', 'type': 'work'}]}
for dct in d["phoneNumbers"]:
    # no count argument, so every dash is removed, not just the first
    dct['phone'] = dct['phone'].replace("-", "")
Which gives you:
{'name': 'Rob', 'phoneNumbers': [{'phone': '1233214', 'type': 'home'}, {'phone': '4563216', 'type': 'work'}]}

Extract multiple key:value pairs from one dict to a new dict

I have a list of dicts with some data, and I would like to extract certain key:value pairs into a new list of dicts. I know one way to do this would be del i['unwantedKey'], but I would rather not delete any data and instead create a new dict with only the needed data.
The column order might change, so I need something that extracts the two key:value pairs from the larger dict into a new dict.
Current Data Format
[{'Speciality': 'Math', 'Name': 'Matt', 'Location': 'Miami'},
{'Speciality': 'Science', 'Name': 'Ben', 'Location': 'Las Vegas'},
{'Speciality': 'Language Arts', 'Name': 'Sarah', 'Location': 'Washington DC'},
{'Speciality': 'Spanish', 'Name': 'Tom', 'Location': 'Denver'},
{'Speciality': 'Chemistry', 'Name': 'Jim', 'Location': 'Dallas'}]
Code to delete key:value from dict
import csv
data = []
for line in csv.DictReader(open('data.csv')):
    data.append(line)
for i in data:
    del i['Speciality']
print(data)
Desired Data Format without using del i['Speciality']
[{'Name': 'Matt', 'Location': 'Miami'},
{'Name': 'Ben', 'Location': 'Las Vegas'},
{'Name': 'Sarah', 'Location': 'Washington DC'},
{'Name': 'Tom', 'Location': 'Denver'},
{'Name': 'Jim', 'Location': 'Dallas'}]
If you want to give a positive list of keys to copy over into the new dictionaries:
import csv
# Python 3: open CSV files in text mode with newline='' (not 'rb')
with open('data.csv', newline='') as csv_file:
    data = list(csv.DictReader(csv_file))
keys = ['Name', 'Location']
new_data = [dict((k, d[k]) for k in keys) for d in data]
print(new_data)
Suppose we have:
l1 = [{'Location': 'Miami', 'Name': 'Matt', 'Speciality': 'Math'},
{'Location': 'Las Vegas', 'Name': 'Ben', 'Speciality': 'Science'},
{'Location': 'Washington DC', 'Name': 'Sarah', 'Speciality': 'Language Arts'},
{'Location': 'Denver', 'Name': 'Tom', 'Speciality': 'Spanish'},
{'Location': 'Dallas', 'Name': 'Jim', 'Speciality': 'Chemistry'}]
To create a new list of dictionaries that do not contain the key 'Speciality', we can do:
l2 = []
for oldd in l1:
    newd = {}
    for k, v in oldd.items():
        if k != 'Speciality':
            newd[k] = v
    l2.append(newd)
Now l2 is your desired output. In general, you can exclude an arbitrary list of keys like so:
exclude_keys = ['Speciality', 'Name']
l2 = []
for oldd in l1:
    newd = {}
    for k, v in oldd.items():
        if k not in exclude_keys:
            newd[k] = v
    l2.append(newd)
The same can be done with an include_keys variable:
include_keys = ['Name', 'Location']
l2 = []
for oldd in l1:
    newd = {}
    for k, v in oldd.items():
        if k in include_keys:
            newd[k] = v
    l2.append(newd)
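Each of these loops can also be written as a nested comprehension. A sketch of the exclude and include variants; it trims l1 to two entries, excludes only 'Speciality', and uses sets instead of lists for the key collections (lists would work the same way, just with slower membership tests):

```python
l1 = [{'Location': 'Miami', 'Name': 'Matt', 'Speciality': 'Math'},
      {'Location': 'Las Vegas', 'Name': 'Ben', 'Speciality': 'Science'}]

exclude_keys = {'Speciality'}
include_keys = {'Name', 'Location'}

# Keep every key that is not excluded.
without_excluded = [{k: v for k, v in d.items() if k not in exclude_keys} for d in l1]

# Keep only the included keys; keys a dict doesn't have are simply absent.
only_included = [{k: v for k, v in d.items() if k in include_keys} for d in l1]

print(without_excluded[0])                 # {'Location': 'Miami', 'Name': 'Matt'}
print(only_included == without_excluded)  # True for this data
```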
You can create a new list of dicts limited to the keys you want with one line of code (Python 2.7+, where dict comprehensions were introduced):
NLoD=[{k:d[k] for k in ('Name', 'Location')} for d in LoD]
Try it:
>>> LoD=[{'Speciality': 'Math', 'Name': 'Matt', 'Location': 'Miami'},
... {'Speciality': 'Science', 'Name': 'Ben', 'Location': 'Las Vegas'},
... {'Speciality': 'Language Arts', 'Name': 'Sarah', 'Location': 'Washington DC'},
... {'Speciality': 'Spanish', 'Name': 'Tom', 'Location': 'Denver'},
... {'Speciality': 'Chemistry', 'Name': 'Jim', 'Location': 'Dallas'}]
>>> [{k:d[k] for k in ('Name', 'Location')} for d in LoD]
[{'Name': 'Matt', 'Location': 'Miami'}, {'Name': 'Ben', 'Location': 'Las Vegas'}, {'Name': 'Sarah', 'Location': 'Washington DC'}, {'Name': 'Tom', 'Location': 'Denver'}, {'Name': 'Jim', 'Location': 'Dallas'}]
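{k: d[k] for k in keys} raises KeyError if a dict is missing one of the keys. A sketch of two tolerant variants, one that skips missing keys and one that fills them with None via dict.get; the second sample dict and the None default are illustrative:

```python
LoD = [{'Speciality': 'Math', 'Name': 'Matt', 'Location': 'Miami'},
       {'Speciality': 'Science', 'Name': 'Ben'}]  # illustrative: no 'Location'

wanted = ('Name', 'Location')

# Skip keys a dict does not have.
skipped = [{k: d[k] for k in wanted if k in d} for d in LoD]

# Or always include every wanted key, using None when it is missing.
filled = [{k: d.get(k) for k in wanted} for d in LoD]

print(skipped[1])  # {'Name': 'Ben'}
print(filled[1])   # {'Name': 'Ben', 'Location': None}
```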
Since you are using csv, you can limit the columns that you read in the first place to the desired columns so you do not need to delete the undesired data:
import csv
dc = ('Name', 'Location')
with open(fn, newline='') as f:  # fn is the path to your CSV file
    reader = csv.DictReader(f)
    LoD = [{k: row[k] for k in dc} for row in reader]
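To try the column-limited reader without a file on disk, io.StringIO can stand in for the opened file. The CSV text below is made up from two of the question's sample rows:

```python
import csv
import io

csv_text = """Speciality,Name,Location
Math,Matt,Miami
Science,Ben,Las Vegas
"""

dc = ('Name', 'Location')

# io.StringIO provides a file-like object, so DictReader can read from it directly.
with io.StringIO(csv_text) as f:
    reader = csv.DictReader(f)
    LoD = [{k: row[k] for k in dc} for row in reader]

print(LoD)  # [{'Name': 'Matt', 'Location': 'Miami'}, {'Name': 'Ben', 'Location': 'Las Vegas'}]
```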
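For a single dict, a dict comprehension with a membership test does the same job (event here stands for one of the dicts in your list):
keys_lst = ['Name', 'Location']
new_data = {key: val for key, val in event.items() if key in keys_lst}
print(new_data)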
