Extract multiple key:value pairs from one dict to a new dict - python

I have a list of dicts with some data, and I would like to extract certain key:value pairs into a new list of dicts. I know one way to do this would be to use del i['unwantedKey']; however, I would rather not delete any data but instead create a new dict with only the needed data.
The column order might change, so I need something to extract the two key:value pairs from the larger dict into a new dict.
Current Data Format
[{'Speciality': 'Math', 'Name': 'Matt', 'Location': 'Miami'},
{'Speciality': 'Science', 'Name': 'Ben', 'Location': 'Las Vegas'},
{'Speciality': 'Language Arts', 'Name': 'Sarah', 'Location': 'Washington DC'},
{'Speciality': 'Spanish', 'Name': 'Tom', 'Location': 'Denver'},
{'Speciality': 'Chemistry', 'Name': 'Jim', 'Location': 'Dallas'}]
Code to delete key:value from dict
import csv
data = []
for line in csv.DictReader(open('data.csv')):
    data.append(line)
for i in data:
    del i['Speciality']
print(data)
Desired Data Format without using del i['Speciality']
[{'Name': 'Matt', 'Location': 'Miami'},
{'Name': 'Ben', 'Location': 'Las Vegas'},
{'Name': 'Sarah', 'Location': 'Washington DC'},
{'Name': 'Tom', 'Location': 'Denver'},
{'Name': 'Jim', 'Location': 'Dallas'}]

If you want to give a positive list of keys to copy over into the new dictionaries:
import csv
# In Python 3, open CSV files in text mode with newline=''
with open('data.csv', newline='') as csv_file:
    data = list(csv.DictReader(csv_file))
keys = ['Name', 'Location']
new_data = [dict((k, d[k]) for k in keys) for d in data]
print(new_data)
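One thing to watch with the positive-list approach: d[k] raises KeyError if a row lacks one of the wanted keys. A minimal sketch, assuming that missing keys should simply be skipped (the sample rows are made up for illustration):

```python
# Hypothetical sample rows; the second one has no 'Location' key.
data = [
    {'Speciality': 'Math', 'Name': 'Matt', 'Location': 'Miami'},
    {'Speciality': 'Science', 'Name': 'Ben'},
]
keys = ['Name', 'Location']

# Copy only the wanted keys that are actually present in each row.
new_data = [{k: d[k] for k in keys if k in d} for d in data]
print(new_data)
```

The original dicts in data are left untouched; only new dicts are built.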

suppose we have,
l1 = [{'Location': 'Miami', 'Name': 'Matt', 'Speciality': 'Math'},
{'Location': 'Las Vegas', 'Name': 'Ben', 'Speciality': 'Science'},
{'Location': 'Washington DC', 'Name': 'Sarah', 'Speciality': 'Language Arts'},
{'Location': 'Denver', 'Name': 'Tom', 'Speciality': 'Spanish'},
{'Location': 'Dallas', 'Name': 'Jim', 'Speciality': 'Chemistry'}]
to create a new list of dictionaries that do not contain the key 'Speciality' we can do,
l2 = []
for oldd in l1:
    newd = {}
    for k,v in oldd.items():
        if k != 'Speciality':
            newd[k] = v
    l2.append(newd)
and now l2 will be your desired output. In general you can exclude an arbitrary list of keys like so
exclude_keys = ['Speciality', 'Name']
l2 = []
for oldd in l1:
    newd = {}
    for k,v in oldd.items():
        if k not in exclude_keys:
            newd[k] = v
    l2.append(newd)
the same can be done with an include_keys variable
include_keys = ['Name', 'Location']
l2 = []
for oldd in l1:
    newd = {}
    for k,v in oldd.items():
        if k in include_keys:
            newd[k] = v
    l2.append(newd)
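The same exclude/include logic can also be written as comprehensions. This is only a sketch: without_keys and with_keys are helper names made up here, and the two-row l1 is a shortened copy of the sample data.

```python
l1 = [
    {'Location': 'Miami', 'Name': 'Matt', 'Speciality': 'Math'},
    {'Location': 'Las Vegas', 'Name': 'Ben', 'Speciality': 'Science'},
]

def without_keys(dicts, exclude_keys):
    """Return new dicts that omit every key listed in exclude_keys."""
    exclude = set(exclude_keys)  # a set makes the membership test cheap
    return [{k: v for k, v in d.items() if k not in exclude} for d in dicts]

def with_keys(dicts, include_keys):
    """Return new dicts that keep only the keys listed in include_keys."""
    include = set(include_keys)
    return [{k: v for k, v in d.items() if k in include} for d in dicts]

l2 = without_keys(l1, ['Speciality'])
l3 = with_keys(l1, ['Name', 'Location'])
print(l2 == l3)
```

For this data both calls produce the same list, since dropping 'Speciality' leaves exactly 'Name' and 'Location'.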

You can create a new list of dicts limited to the keys you want with one line of code (dict comprehensions need Python 2.7+):
NLoD=[{k:d[k] for k in ('Name', 'Location')} for d in LoD]
Try it:
>>> LoD=[{'Speciality': 'Math', 'Name': 'Matt', 'Location': 'Miami'},
... {'Speciality': 'Science', 'Name': 'Ben', 'Location': 'Las Vegas'},
... {'Speciality': 'Language Arts', 'Name': 'Sarah', 'Location': 'Washington DC'},
... {'Speciality': 'Spanish', 'Name': 'Tom', 'Location': 'Denver'},
... {'Speciality': 'Chemistry', 'Name': 'Jim', 'Location': 'Dallas'}]
>>> [{k:d[k] for k in ('Name', 'Location')} for d in LoD]
[{'Name': 'Matt', 'Location': 'Miami'}, {'Name': 'Ben', 'Location': 'Las Vegas'}, {'Name': 'Sarah', 'Location': 'Washington DC'}, {'Name': 'Tom', 'Location': 'Denver'}, {'Name': 'Jim', 'Location': 'Dallas'}]
Since you are using csv, you can keep only the desired columns as each row is read, so there is no undesired data to delete afterwards:
import csv
dc=('Name', 'Location')
with open(fn) as f:  # fn is the path to the CSV file
    reader=csv.DictReader(f)
    LoD=[{k:row[k] for k in dc} for row in reader]
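To try the DictReader approach without a file on disk, here is a self-contained sketch that uses io.StringIO as a stand-in for the open file. The CSV text is made up, and wanted is just an illustrative name.

```python
import csv
import io

# Stand-in for an open CSV file; the contents are made up for illustration.
csv_text = "Speciality,Name,Location\nMath,Matt,Miami\nScience,Ben,Las Vegas\n"

wanted = ('Name', 'Location')
with io.StringIO(csv_text) as f:
    reader = csv.DictReader(f)
    # Each row is a full dict; only the wanted columns are copied out.
    rows = [{k: row[k] for k in wanted} for row in reader]
print(rows)
```

Note that DictReader still parses every column of each line; the comprehension just avoids keeping the ones you do not need.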

keys_lst = ['Name', 'Location']
# event is a single dict from the list
new_data = {key: val for key, val in event.items() if key in keys_lst}
print(new_data)

Related

understanding nested python dict comprehension

I am getting familiar with dict comprehensions and trying to understand how the two comprehensions below work:
select_vals = ['name', 'pay']
test_dict = {'data': [{'name': 'John', 'city': 'NYC', 'pay': 70000}, {'name': 'Mike', 'city': 'NYC', 'pay': 80000}, {'name': 'Kate', 'city': 'Houston', 'pay': 65000}]}
dict_comp1 = [{key: item[key] for key in select_vals } for item in test_dict['data'] if item['pay'] > 65000 ]
The above line gets me
[{'name': 'John', 'pay': 70000}, {'name': 'Mike', 'pay': 80000}]
dict_comp2 = [{key: item[key]} for key in select_vals for item in test_dict['data'] if item['pay'] > 65000 ]
The above line gets me
[{'name': 'John'}, {'name': 'Mike'}, {'pay': 70000}, {'pay': 80000}]
How do the two outputs differ when written as for loops? When I run this for loop:
dict_comp3 = []
for key in select_vals:
    for item in test_dict['data']:
        if item['pay'] > 65000:
            dict_comp3.append({key: item[key]})
print(dict_comp3)
This gives me the same result as dict_comp2:
[{'name': 'John'}, {'name': 'Mike'}, {'pay': 70000}, {'pay': 80000}]
How do I get the output of dict_comp1 with a for loop?
The select_vals iteration should be the inner one:
result = []
for item in test_dict['data']:
    if item['pay'] > 65000:
        aux = {}
        for key in select_vals:
            aux[key] = item[key]
        result.append(aux)
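The rule of thumb is that for clauses inside a single comprehension nest left to right, just like stacked for statements, whereas a comprehension placed inside the output expression runs once per outer item. A small check of the loop version against dict_comp1, reusing the data from the question:

```python
select_vals = ['name', 'pay']
test_dict = {'data': [
    {'name': 'John', 'city': 'NYC', 'pay': 70000},
    {'name': 'Mike', 'city': 'NYC', 'pay': 80000},
    {'name': 'Kate', 'city': 'Houston', 'pay': 65000},
]}

dict_comp1 = [{key: item[key] for key in select_vals}
              for item in test_dict['data'] if item['pay'] > 65000]

result = []
for item in test_dict['data']:        # outer loop: one new dict per item
    if item['pay'] > 65000:
        aux = {}
        for key in select_vals:       # inner loop: fill that one dict
            aux[key] = item[key]
        result.append(aux)

print(result == dict_comp1)
```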

Filter/group dictionary by nested value

Here's a simplified example of some data I have:
{"id": "1234565", "fields": {"name": "john", "email":"john#example.com", "country": "uk"}}
The whole data set is a bigger list of such nested dictionaries holding address data. The goal is to create pairs of people from the list with randomized partners, where partners from the same country should be preferred. So my first real issue is to find a good way to group them by that country value.
I'm sure there's a smarter way to do this than iterating through the dict and writing all records out to some new list/dict?
I think this is close to what you need:
result = {key:[i for i in value] for key, value in itertools.groupby(people, lambda item: item["fields"]["country"])}
What this does is use itertools.groupby to group all people in the people list by their specified country. The resulting dictionary has countries as keys, and the unpacked groupings (matching people) as values. Input is expected as a list of dictionaries like the one in your example:
people = [{"id": "1234565", "fields": {"name": "john", "email":"john#example.com", "country": "uk"}},
{"id": "654321", "fields": {"name": "sam", "email":"sam#example.com", "country": "uk"}}]
Sample output:
>>> print(result)
{'uk': [{'fields': {'name': 'john', 'email': 'john#example.com', 'country': 'uk'}, 'id': '1234565'}, {'fields': {'name': 'sam', 'email': 'sam#example.com', 'country': 'uk'}, 'id': '654321'}]}
For a cleaner result, the looping construct can be tweaked so that only the ID of each person is included in the result dict:
result = {key:[i["id"] for i in value] for key, value in itertools.groupby(people, lambda item: item["fields"]["country"])}
>>> print(result)
{'uk': ['1234565', '654321']}
EDIT: Sorry, I forgot about the sorting. groupby only groups consecutive items, so sort the list of people by country before putting it through groupby:
people = sorted(people, key=lambda item: item["fields"]["country"])
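To see why the sort matters, here is a small self-contained comparison. The three-person list is made up, and country_of is a helper name introduced only for this sketch.

```python
import itertools

people = [
    {"id": "1", "fields": {"name": "john", "country": "uk"}},
    {"id": "2", "fields": {"name": "ana", "country": "br"}},
    {"id": "3", "fields": {"name": "sam", "country": "uk"}},
]

def country_of(item):
    return item["fields"]["country"]

# Without sorting, the two 'uk' runs are separate groups and the
# second one overwrites the first in the resulting dict.
unsorted_groups = {
    country: [person["id"] for person in group]
    for country, group in itertools.groupby(people, key=country_of)
}

# Sorting by the same key first puts equal countries next to each other.
by_country = {
    country: [person["id"] for person in group]
    for country, group in itertools.groupby(sorted(people, key=country_of), key=country_of)
}

print(unsorted_groups)
print(by_country)
```

In the unsorted version the person with id "1" silently disappears from the 'uk' group, which is the symptom to look for if results seem incomplete.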
Here is another one that uses defaultdict:
import collections
def make_groups(nested_dicts, nested_key):
    default = collections.defaultdict(list)
    for nested_dict in nested_dicts:
        for value in nested_dict.values():
            try:
                default[value[nested_key]].append(nested_dict)
            except TypeError:
                pass
    return default
To test the results:
import random
COUNTRY = {'af', 'br', 'fr', 'mx', 'uk'}
people = [{'id': i, 'fields': {
              'name': 'name'+str(i),
              'email': str(i)+'#email',
              # random.sample() no longer accepts a set (Python 3.11+), so pick from a list
              'country': random.choice(sorted(COUNTRY))}}
          for i in range(10)]
country_groups = make_groups(people, 'country')
for country, persons in country_groups.items():
    print(country, persons)
Random output:
fr [{'id': 0, 'fields': {'name': 'name0', 'email': '0#email', 'country': 'fr'}}, {'id': 1, 'fields': {'name': 'name1', 'email': '1#email', 'country': 'fr'}}, {'id': 4, 'fields': {'name': 'name4', 'email': '4#email', 'country': 'fr'}}]
br [{'id': 2, 'fields': {'name': 'name2', 'email': '2#email', 'country': 'br'}}, {'id': 8, 'fields': {'name': 'name8', 'email': '8#email', 'country': 'br'}}]
uk [{'id': 3, 'fields': {'name': 'name3', 'email': '3#email', 'country': 'uk'}}, {'id': 7, 'fields': {'name': 'name7', 'email': '7#email', 'country': 'uk'}}]
af [{'id': 5, 'fields': {'name': 'name5', 'email': '5#email', 'country': 'af'}}, {'id': 9, 'fields': {'name': 'name9', 'email': '9#email', 'country': 'af'}}]
mx [{'id': 6, 'fields': {'name': 'name6', 'email': '6#email', 'country': 'mx'}}]
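If the nested dictionary always sits under the 'fields' key, as in the question's sample, the grouping can index it directly instead of scanning every value and catching TypeError. A sketch under that assumption; group_by_field and its container parameter are names invented here, and the sample people are made up:

```python
from collections import defaultdict

def group_by_field(records, field, container="fields"):
    """Group records by records[i][container][field]."""
    groups = defaultdict(list)
    for record in records:
        groups[record[container][field]].append(record)
    return dict(groups)  # plain dict, so missing keys raise KeyError again

people = [
    {"id": 0, "fields": {"name": "name0", "country": "uk"}},
    {"id": 1, "fields": {"name": "name1", "country": "br"}},
    {"id": 2, "fields": {"name": "name2", "country": "uk"}},
]

groups = group_by_field(people, "country")
for country, persons in groups.items():
    print(country, [p["id"] for p in persons])
```

Unlike groupby, this does not need the input to be sorted, and each group keeps the original order of the records.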

Mapping names in Python

I have been given the following list of dictionaries:
names = [
{'first_name': 'Jane', 'last_name': 'Doe'},
{'first_name': 'John', 'last_name': 'Kennedy'},
{'first_name': 'Ada', 'last_name': 'Lovelace'}
]
Part a was to return an array of full names, which I did as follows:
[user['first_name'] +' '+ user['last_name'] for user in names]
It returned the following:
['Jane Doe', 'John Kennedy', 'Ada Lovelace']
Part b is to do the same thing as above, only return a list of dictionaries, with 'name' being the key. The result should be:
[{'name':'Jane Doe'},{'name':'John Kennedy'},{'name': 'Ada Lovelace'}]
I have tried everything I can think of. From trying to change the key, to changing back to a list and then back to a dictionary. I'm very new at Python and would appreciate any help possible.
[{'name': '{first_name} {last_name}'.format(**n)} for n in names]
The following comprehension using join will work:
result = [{'name': ' '.join((d['first_name'], d['last_name']))} for d in names]
# [{'name': 'Jane Doe'}, {'name': 'John Kennedy'}, {'name': 'Ada Lovelace'}]
Adjust your list comprehension to the following:
names = [
{'first_name': 'Jane', 'last_name': 'Doe'},
{'first_name': 'John', 'last_name': 'Kennedy'},
{'first_name': 'Ada', 'last_name': 'Lovelace'}
]
result = [{'name':d['first_name']+' '+ d['last_name']} for d in names]
print(result)
The output:
[{'name': 'Jane Doe'}, {'name': 'John Kennedy'}, {'name': 'Ada Lovelace'}]
list(map(lambda d: {'name': ' '.join((d['first_name'], d['last_name']))},names))
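On Python 3.6 or later, an f-string is another way to build the full name. A minimal sketch using the same names list:

```python
names = [
    {'first_name': 'Jane', 'last_name': 'Doe'},
    {'first_name': 'John', 'last_name': 'Kennedy'},
    {'first_name': 'Ada', 'last_name': 'Lovelace'},
]

# Build one {'name': ...} dict per person; the f-string joins the two parts.
result = [{'name': f"{d['first_name']} {d['last_name']}"} for d in names]
print(result)
```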

Group a List of Python Dictionaries

I have some JSON data coming from the API as a list of dictionaries, such as:
entities = [
{'name': 'McDonalds', 'city': 'New York', 'gross': 250000000, 'id': '000001'},
{'name': 'McDonalds', 'city': 'Philadelphia', 'gross': 190000000, 'id': '000002'},
{'name': 'Shake Shack', 'city': 'Los Angeles', 'gross': 17000000, 'id': '000003'},
{'name': 'In-N-Out Burger', 'city': 'Houston', 'gross': 23000000, 'id': '000004'},
{'name': 'In-N-Out Burger', 'city': 'Atlanta', 'gross': 12000000, 'id': '000005'},
{'name': 'In-N-Out Burger', 'city': 'Dallas', 'gross': 950000, 'id': '000006'},
]
I'm trying to group all the entries with the same name into another list of dictionaries named for whatever business it is.
def group_entities(entities):
    entity_groups = []
    # Establish a blank list for each unique name
    for entity in entities:
        entity['name'] = []
        entity_groups.append(entity['name'])
    # Within each business's list, add separate dictionaries with details
    for entity in entities:
        entity['name'].append({
            'name':entity['name'],
            'city':entity['city'],
            'gross':entity['gross'],
            'id':entity['id']
        })
        entity_groups.extend(entity['name'])
    return entity_groups
I can't use entity['name'] as a variable name because it just changes the original value nor can I use a string version of the name. I want to end up with data I can iterate and display like:
Business
• All City 1 Dictionary Values
• All City 2 Dictionary Values, etc
Business
• All City 1 Dictionary Values
• All City 2 Dictionary Values, etc
I'm at a loss as to how to even do further research on this because I don't know proper 'googleable' terms to describe what I am trying to do.
If your data is ordered by name:
from itertools import groupby
from operator import itemgetter
entities = [
{'name': 'McDonalds', 'city': 'New York', 'gross': 250000000, 'id': '000001'},
{'name': 'McDonalds', 'city': 'Philadelphia', 'gross': 190000000, 'id': '000002'},
{'name': 'Shake Shack', 'city': 'Los Angeles', 'gross': 17000000, 'id': '000003'},
{'name': 'In-N-Out Burger', 'city': 'Houston', 'gross': 23000000, 'id': '000004'},
{'name': 'In-N-Out Burger', 'city': 'Atlanta', 'gross': 12000000, 'id': '000005'},
{'name': 'In-N-Out Burger', 'city': 'Dallas', 'gross': 950000, 'id': '000006'},
]
data = [{k: list(v)} for k, v in groupby(entities, itemgetter("name"))]
Which would give you:
[{'McDonalds': [{'id': '000001', 'city': 'New York', 'name': 'McDonalds', 'gross': 250000000}, {'id': '000002', 'city': 'Philadelphia', 'name': 'McDonalds', 'gross': 190000000}]}, {'Shake Shack': [{'id': '000003', 'city': 'Los Angeles', 'name': 'Shake Shack', 'gross': 17000000}]}, {'In-N-Out Burger': [{'id': '000004', 'city': 'Houston', 'name': 'In-N-Out Burger', 'gross': 23000000}, {'id': '000005', 'city': 'Atlanta', 'name': 'In-N-Out Burger', 'gross': 12000000}, {'id': '000006', 'city': 'Dallas', 'name': 'In-N-Out Burger', 'gross': 950000}]}]
Or if you don't want the name:
keys = ("id","gross", "city")
data = [{k: [dict(zip(keys, itemgetter(*keys)(dct))) for dct in v]} for k, v in groupby(entities, itemgetter("name"))]
If the data is not ordered you can use a defaultdict:
from collections import defaultdict
d = defaultdict(list)
for entity in entities:
    d[entity["name"]].append(dict(entity))
print([{k: v} for k,v in d.items()])
Again, you can remove the name, or maybe you want to use the original dicts and don't mind mutating them:
from collections import defaultdict
d = defaultdict(list)
for entity in entities:
    d[entity.pop("name")].append(entity)
print([{k: v} for k,v in d.items()])
That will give you:
[{'Shake Shack': [{'id': '000003', 'city': 'Los Angeles', 'gross': 17000000}]}, {'McDonalds': [{'id': '000001', 'city': 'New York', 'gross': 250000000}, {'id': '000002', 'city': 'Philadelphia', 'gross': 190000000}]}, {'In-N-Out Burger': [{'id': '000004', 'city': 'Houston', 'gross': 23000000}, {'id': '000005', 'city': 'Atlanta', 'gross': 12000000}, {'id': '000006', 'city': 'Dallas', 'gross': 950000}]}]
It all depends on whether you want to use the original dicts again and/or if you want the names kept in the dicts. You can combine parts of the logic to get whatever format you like.
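As one possible combination of those pieces, here is a sketch that drops the name from each grouped dict without mutating the originals, then prints the business/locations layout from the question. The entities list is a shortened copy of the sample data; groups and details are names chosen here.

```python
from collections import defaultdict

entities = [
    {'name': 'McDonalds', 'city': 'New York', 'gross': 250000000, 'id': '000001'},
    {'name': 'Shake Shack', 'city': 'Los Angeles', 'gross': 17000000, 'id': '000003'},
    {'name': 'McDonalds', 'city': 'Philadelphia', 'gross': 190000000, 'id': '000002'},
]

groups = defaultdict(list)
for entity in entities:
    # Build a filtered copy so the original dict keeps its 'name' key.
    details = {k: v for k, v in entity.items() if k != 'name'}
    groups[entity['name']].append(details)

for name, locations in groups.items():
    print(name)
    for loc in locations:
        print('  -', loc['city'], loc['gross'], loc['id'])
```

Because this uses a defaultdict rather than groupby, the input does not have to be ordered by name.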
This should work:
def group_entities(entities):
    entity_groups = {}
    # Within each business's list, add separate dictionaries with details
    for entity in entities:
        name = entity['name']  # name is the key for entity_groups
        del entity['name']  # remove it from each entity
        # add the entity to the entity_groups with the key (name)
        entity_groups[name] = entity_groups.get(name, []) + [entity]
    return entity_groups
If you want to keep the entity name in each entity, remove the del statement.
bycompany = {}
for ent in entities:
    if ent['name'] not in bycompany:
        # if there is no location list for this company name,
        # then start a new list for this company.
        bycompany[ent['name']] = []
    # Add the dict to the list of locations for this company.
    bycompany[ent['name']].append(ent)
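The "create the list if it is missing, then append" step can also be folded into a single call with dict.setdefault. A short sketch on a two-entry sample (the data is a shortened copy of the question's list):

```python
entities = [
    {'name': 'In-N-Out Burger', 'city': 'Houston', 'gross': 23000000, 'id': '000004'},
    {'name': 'In-N-Out Burger', 'city': 'Atlanta', 'gross': 12000000, 'id': '000005'},
]

bycompany = {}
for ent in entities:
    # setdefault returns the existing list, or stores and returns a new one.
    bycompany.setdefault(ent['name'], []).append(ent)

print(list(bycompany))
print(len(bycompany['In-N-Out Burger']))
```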

Updating a value in a dictionary inside a dictionary

If I have a list of contact dictionaries like this:
{'name': 'Rob', 'phoneNumbers': [{'phone': '123-3214', 'type': 'home'}, {'phone': '456-3216', 'type': 'work'}]}
how could I update this dictionary to remove the dashes from the phone numbers in a list of contact dictionaries pythonically?
You could just nest loops:
for contact_dict in list_of_dicts:
    for phone_dict in contact_dict['phoneNumbers']:
        phone_dict['phone'] = phone_dict['phone'].replace('-', '')
This alters the values in-place.
Or you could create a whole new copy of the structure, with the alterations made:
[dict(contact, phoneNumbers=[
     dict(phone_dict, phone=phone_dict['phone'].replace('-', ''))
     for phone_dict in contact['phoneNumbers']])
 for contact in list_of_dicts]
This creates a semi-shallow copy; only the phoneNumbers key is explicitly copied, but any other mutable values are just referenced by the new dictionaries.
Demo:
>>> list_of_dicts = [{'name': 'Rob', 'phoneNumbers': [{'phone': '123-3214', 'type': 'home'}, {'phone': '456-3216', 'type': 'work'}]}]
>>> [dict(contact, phoneNumbers=[
...      dict(phone_dict, phone=phone_dict['phone'].replace('-', ''))
...      for phone_dict in contact['phoneNumbers']])
...  for contact in list_of_dicts]
[{'phoneNumbers': [{'phone': '1233214', 'type': 'home'}, {'phone': '4563216', 'type': 'work'}], 'name': 'Rob'}]
>>> for contact_dict in list_of_dicts:
...     for phone_dict in contact_dict['phoneNumbers']:
...         phone_dict['phone'] = phone_dict['phone'].replace('-', '')
...
>>> list_of_dicts
[{'phoneNumbers': [{'phone': '1233214', 'type': 'home'}, {'phone': '4563216', 'type': 'work'}], 'name': 'Rob'}]
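The "semi-shallow" remark about the copying version can be checked directly. In this sketch the tags key is a made-up extra value, added only to show what is and is not copied:

```python
list_of_dicts = [{
    'name': 'Rob',
    'tags': ['friend'],  # hypothetical extra mutable value
    'phoneNumbers': [{'phone': '123-3214', 'type': 'home'}],
}]

cleaned = [dict(contact, phoneNumbers=[
               dict(phone_dict, phone=phone_dict['phone'].replace('-', ''))
               for phone_dict in contact['phoneNumbers']])
           for contact in list_of_dicts]

# The original phone number is untouched; only the copy lost its dash.
print(list_of_dicts[0]['phoneNumbers'][0]['phone'])
print(cleaned[0]['phoneNumbers'][0]['phone'])
# Other mutable values are shared, not copied.
print(cleaned[0]['tags'] is list_of_dicts[0]['tags'])
```

If fully independent copies are needed, copy.deepcopy on the original list before editing is one option.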
Just use str.replace to remove the -:
d ={'name': "Rob", 'phoneNumbers': [{'phone': '123-3214', 'type': 'home'}, {'phone': '456-3216', 'type': 'work'}]}
for dct in d["phoneNumbers"]:
    dct['phone'] = dct['phone'].replace("-","",1)
Which gives you:
{'phoneNumbers': [{'phone': '1233214', 'type': 'home'}, {'phone': '4563216', 'type': 'work'}], 'name': 'Rob'}
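One caveat about the third argument: replace("-", "", 1) removes only the first dash. That is enough for the numbers shown here, but not for a number with several dashes. A quick comparison with a made-up number:

```python
number = '555-123-3214'  # hypothetical number with two dashes

first_only = number.replace('-', '', 1)  # count=1: stops after one replacement
all_dashes = number.replace('-', '')     # no count: replaces every dash

print(first_only)
print(all_dashes)
```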
