I have multiple lists of data, for example: age, name, gender, etc. They are all in the same order, meaning that the x-th record of every list belongs to the same person.
What I'm trying to create is a list of dictionaries from these lists in the most Pythonic way. I was able to create it using one of the lists, but I'm not sure how to scale it from there.
What I currently have:
ages = [20, 21, 30]
names = ["Jhon", "Daniel", "Rob"]
list_of_dicts = [{"age": value} for value in ages]
It returns:
[{'age': 20}, {'age': 21}, {'age': 30}]
What I want:
[{'age': 20, 'name': 'Jhon'}, {'age': 21, 'name': 'Daniel'}, {'age': 30, 'name': 'Rob'}]
You need to zip:
ages = [20, 21, 30]
names = ["Jhon", "Daniel", "Rob"]
list_of_dicts = [{"age": value, 'name': name}
for value, name in zip(ages, names)]
You can take this one step further and use a double zip (useful if you have many more keys):
keys = ['age', 'name']
lists = [ages, names]
list_of_dicts = [dict(zip(keys, x)) for x in zip(*lists)]
output:
[{'age': 20, 'name': 'Jhon'},
{'age': 21, 'name': 'Daniel'},
{'age': 30, 'name': 'Rob'}]
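To see why the double zip works, it can help to look at each step on its own. The sketch below reuses the ages and names lists from the question; the intermediate name rows is only introduced here for illustration.

```python
ages = [20, 21, 30]
names = ["Jhon", "Daniel", "Rob"]

keys = ["age", "name"]
lists = [ages, names]

# zip(*lists) transposes the columns into one tuple per person
rows = list(zip(*lists))
print(rows)  # [(20, 'Jhon'), (21, 'Daniel'), (30, 'Rob')]

# zip(keys, row) pairs each key with that person's value
list_of_dicts = [dict(zip(keys, row)) for row in rows]
print(list_of_dicts)
```

The order of keys has to match the order of lists, since the pairing is purely positional.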
This is less obvious code than @mozway's, but IMHO it has one advantage: it relies on a single mapping dictionary, so if you need to add or remove keys you only have to change one k:v pair.
ages = [20, 21, 30]
names = ["Jhon", "Daniel", "Rob"]
d = {
"name" : names,
"age" : ages
}
list_of_dicts = [dict(zip(d,t)) for t in zip(*d.values())]
print(list_of_dicts)
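One caveat with any zip-based approach is that zip stops at the shortest input, so a list that is one element short silently drops a record. A minimal sketch of a guard against that, assuming the data is kept in a mapping of key to list; the helper name columns_to_records is hypothetical:

```python
def columns_to_records(columns):
    """Turn {"key": [values, ...]} into a list of per-row dicts."""
    # Refuse to continue if the lists do not all have the same length
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError(f"columns have different lengths: {sorted(lengths)}")
    return [dict(zip(columns, row)) for row in zip(*columns.values())]


records = columns_to_records({"name": ["Jhon", "Daniel", "Rob"], "age": [20, 21, 30]})
print(records)
```

Whether a length mismatch should raise or be padded depends on the data; raising is just the more conservative choice for this sketch.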
I currently have a list of over 10k dictionaries that looks like:
cars = [{'model': 'Ford', 'year': 2010},
{'model': 'BMW', 'year': 2019},
...]
And I have a second list:
car_owners = [{'model': 'BMW', 'name': 'Sam', 'age': 34},
{'model': 'BMW', 'name': 'Taylor', 'age': 34},
.....]
However, I want to join the two together to be something like:
combined = [{'model': 'BMW',
'year': 2019,
'owners': [{'name': 'Sam', 'age': 34}, ...]
}]
What is the best way to combine them? For the moment I am using a for loop, but I feel like there are more efficient ways of dealing with this.
** This is just a fake example of the data. The real data is a lot more complex, but this gives an idea of what I want to achieve.
Iterate over the first list, creating a dict keyed by model. Then, for each entry in the second list, look up the same key (model) and update the matching entry in the first dict if it is found:
cars = [{'model': 'Ford', 'year': 2010}, {'model': 'BMW', 'year': 2019}]
car_owners = [{'model': 'BMW', 'name': 'Sam', 'age': 34}, {'model': 'Ford', 'name': 'Taylor', 'age': 34}]
dd = {x['model']:x for x in cars}
for item in car_owners:
    key = item['model']
    if key in dd:
        del item['model']
        dd[key].update({'car_owners': item})
    else:
        dd[key] = item
print(list(dd.values()))
OUTPUT:
[{'model': 'BMW', 'year': 2019, 'car_owners': {'name': 'Sam', 'age': 34}}, {'model': 'Ford', 'year': 2010, 'car_owners': {'name': 'Taylor',
'age': 34}}]
Really, what you want performance-wise is to have dictionaries with the model as the key. That way, you have O(1) lookup and can quickly get the requested element (instead of looping each time to find the car with model x).
If you're starting off with lists, I'd first create dictionaries, and then everything is O(1) from there on out.
models_to_cars = {car['model']: car for car in cars}
models_to_owners = {}
for car_owner in car_owners:
    models_to_owners.setdefault(car_owner['model'], []).append(car_owner)
combined = [{
**car,
'owners': models_to_owners.get(model, [])
} for model, car in models_to_cars.items()]
Then you'd have
combined = [{'model': 'BMW',
'year': 2019,
'owners': [{'name': 'Sam', 'age': 34}, ...]
}]
as you wanted
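If the owner entries should not repeat the 'model' key, as in the desired output from the question, each owner can be copied without it while grouping. A minimal sketch using the sample data from the question; the names owners_by_model and details are only for illustration:

```python
from collections import defaultdict

cars = [{'model': 'Ford', 'year': 2010}, {'model': 'BMW', 'year': 2019}]
car_owners = [{'model': 'BMW', 'name': 'Sam', 'age': 34},
              {'model': 'BMW', 'name': 'Taylor', 'age': 34}]

# Group owners by model, copying each owner without its 'model' key
owners_by_model = defaultdict(list)
for owner in car_owners:
    details = {k: v for k, v in owner.items() if k != 'model'}
    owners_by_model[owner['model']].append(details)

# Attach the (possibly empty) owner list to each car
combined = [{**car, 'owners': owners_by_model.get(car['model'], [])} for car in cars]
print(combined)
```

Both passes are linear, so this stays at one loop over each list, and the original car_owners dicts are left unmodified.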
name=[]
age=[]
address=[]
...
for line in pg:
    for key,value in line.items():
        if key == 'name':
            name.append(value)
        elif key == 'age':
            age.append(value)
        elif key == 'address':
            address.append(value)
.
.
.
Is it possible to use a list comprehension for the code above? I need to separate a lot of values out of the dicts, and I will use the resulting lists to write to a text file.
Source Data:
a = [{'name': 'paul', 'age': '26.', 'address': 'AU', 'gender': 'male'},
{'name': 'mei', 'age': '26.', 'address': 'NY', 'gender': 'female'},
{'name': 'smith', 'age': '16.', 'address': 'NY', 'gender': 'male'},
{'name': 'raj', 'age': '13.', 'address': 'IND', 'gender': 'male'}]
I don't think a list comprehension is a wise choice here because you are building multiple lists.
Instead of creating multiple lists and appending the value whenever the key matches, you can use defaultdict to simplify your code.
from collections import defaultdict
result = defaultdict(list)
for line in pg:
    for key, value in line.items():
        result[key].append(value)
You can get the name list by using result.get('name')
['paul', 'mei', 'smith', 'raj']
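One thing to keep in mind is that indexing a defaultdict with a key that does not exist quietly creates an empty list for it. A minimal sketch, assuming pg holds the source data from the question, that converts the result to a plain dict once the grouping is done:

```python
from collections import defaultdict

pg = [{'name': 'paul', 'age': '26.', 'address': 'AU', 'gender': 'male'},
      {'name': 'mei', 'age': '26.', 'address': 'NY', 'gender': 'female'},
      {'name': 'smith', 'age': '16.', 'address': 'NY', 'gender': 'male'},
      {'name': 'raj', 'age': '13.', 'address': 'IND', 'gender': 'male'}]

result = defaultdict(list)
for line in pg:
    for key, value in line.items():
        result[key].append(value)

# Convert to a plain dict so that unknown keys raise KeyError again
result = dict(result)
print(result['name'])  # ['paul', 'mei', 'smith', 'raj']
print(result['age'])   # ['26.', '26.', '16.', '13.']
```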
This probably won't work the way you want: you're trying to fill three different lists, so you would need three different comprehensions. If your dict is large, this would roughly triple your execution time.
Something straightforward, such as
name = [value for key, value in line.items() if key == "name"]
seems to be what you'd want ... three times.
You can proceed as follows:
pg=[{"name":"name1","age":"age1","address":"address1"},{"name":"name2","age":"age2","address":"address2"}]
name=[v for line in pg for k,v in line.items() if k=="name"]
age=[v for line in pg for k,v in line.items() if k=="age"]
address=[v for line in pg for k,v in line.items() if k=="address"]
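Since each line is already a dict, there is no real need to loop over items() and compare keys; the value can be looked up directly. A sketch under the assumption that every dict contains all three keys (a missing key would raise KeyError):

```python
pg = [{"name": "name1", "age": "age1", "address": "address1"},
      {"name": "name2", "age": "age2", "address": "address2"}]

# One direct lookup per dict instead of scanning all of its items
name = [line["name"] for line in pg]
age = [line["age"] for line in pg]
address = [line["address"] for line in pg]

print(name)     # ['name1', 'name2']
print(age)      # ['age1', 'age2']
print(address)  # ['address1', 'address2']
```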
Following on from Vishal's answer, please don't use defaultdict. Using defaultdict is bad practice when you want to catch KeyErrors, because looking up a missing key silently creates an entry. Please use setdefault instead.
result = dict()
for line in pg:
    for key, value in line.items():
        result.setdefault(key, []).append(value)
Output
{
'name': ['paul', 'mei', 'smith', 'raj'],
'age': ['26.', '26.', '16.', '13.'],
...
}
However, note that if the dicts in pg don't all have the same keys, you will lose the correspondence between the items in the resulting lists.
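One way to keep the lists aligned when some keys are missing is to fill the gaps with a placeholder. A minimal sketch, assuming None is an acceptable placeholder; the shortened sample data and the name all_keys are made up for illustration:

```python
pg = [{'name': 'paul', 'age': '26.'},
      {'name': 'mei'},                  # no 'age' here
      {'name': 'smith', 'age': '16.'}]

# Collect every key that appears in any dict, preserving first-seen order
all_keys = list(dict.fromkeys(key for line in pg for key in line))

# line.get(key) returns None for a missing key, so every list has len(pg) entries
result = {key: [line.get(key) for line in pg] for key in all_keys}
print(result)
# {'name': ['paul', 'mei', 'smith'], 'age': ['26.', None, '16.']}
```

With this layout, index i in every list still refers to the i-th dict in pg.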
Here is a really simple solution if you want to use pandas:
import pandas as pd
df = pd.DataFrame(a)
name = df['name'].tolist()
age = df['age'].tolist()
address = df['address'].tolist()
print(name)
print(age)
print(address)
Output:
['paul', 'mei', 'smith', 'raj']
['26.', '26.', '16.', '13.']
['AU', 'NY', 'NY', 'IND']
Additionally, if your end result is a text file, you can skip the list creation and write the DataFrame (or parts thereof) directly to a CSV with something as simple as:
df.to_csv('/path/to/output.csv')
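If pandas is not available, the standard library's csv module can write the list of dicts directly, again without building the intermediate lists. A minimal sketch: it writes to an in-memory buffer so that it runs anywhere, and the field order is an assumption taken from the source data.

```python
import csv
import io

a = [{'name': 'paul', 'age': '26.', 'address': 'AU', 'gender': 'male'},
     {'name': 'mei', 'age': '26.', 'address': 'NY', 'gender': 'female'},
     {'name': 'smith', 'age': '16.', 'address': 'NY', 'gender': 'male'},
     {'name': 'raj', 'age': '13.', 'address': 'IND', 'gender': 'male'}]

# For a real file, replace the buffer with open(path, 'w', newline='')
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['name', 'age', 'address', 'gender'])
writer.writeheader()   # first row: the column names
writer.writerows(a)    # one row per dict
print(buffer.getvalue())
```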
The inputs are two dictionaries, dict1 and dict2, each mapping a company to a list of dictionaries.
dict1 = {company1:[{'age':27,'weight':200,'name':'john'},{'age':23,'weight':180,'name':'peter'}],
company2:[{'age':30,'weight':190,'name':'sam'},{'age':32,'weight':210,'name':'clove'},{'age':21,'weight':170,'name':'steve'}],
company3:[{'age':36,'weight':175,'name':'shaun'},{'age':40,'weight':205,'name':'dany'},{'age':25,'weight':160,'name':'mark'}],
company4:[{'age':36,'weight':155,'name':'lina'},{'age':40,'weight':215,'name':'sammy'},{'age':25,'weight':190,'name':'matt'}]
}
dict2 = {company2:[{'age':30},{'age':45},{'age':52}],
company4:[{'age':43},{'age':67},{'age':22},{'age':34},{'age':42}]
}
I am trying to write logic that checks, for each company key in dict2, whether any value of the inner key 'age' also appears under 'age' for the same company key in dict1. If at least one 'age' value matches, that company's entry from dict1 should be saved to a third dictionary. Please see the example below.
Example:
company2:[{'age':30}]
matches with
company2:[{'age':30,'weight':190,'name':'sam'}, ...]
I also want to save to dict3 the key:value pairs of dict1 whose keys do not appear in dict2. As the example below shows, the company1 key does not appear in dict2.
Example:
company1:[{'age':27,'weight':200,'name':'john'},{'age':23,'weight':180,'name':'peter'}]
and
company3:[{'age':36,'weight':175,'name':'shaun'},{'age':40,'weight':205,'name':'dany'},{'age':25,'weight':160,'name':'mark'}]
Expected Output:
dict3 = {company1:[{'age':27,'weight':200,'name':'john'},{'age':23,'weight':180,'name':'peter'}],
company2:[{'age':30,'weight':190,'name':'sam'},{'age':32,'weight':210,'name':'clove'},{'age':21,'weight':170,'name':'steve'}],
company3:[{'age':36,'weight':175,'name':'shaun'},{'age':40,'weight':205,'name':'dany'},{'age':25,'weight':160,'name':'mark'}]}
pardon my explanation!
There may be a more succinct way to do this, but the following accomplishes the desired result.
from pprint import pprint
dict3 = dict()
dict1 = {'company1':[{'age':27,'weight':200,'name':'john'},{'age':23,'weight':180,'name':'peter'}],
'company2':[{'age':30,'weight':190,'name':'sam'},{'age':32,'weight':210,'name':'clove'},{'age':21,'weight':170,'name':'steve'}],
'company3':[{'age':36,'weight':175,'name':'shaun'},{'age':40,'weight':205,'name':'dany'},{'age':25,'weight':160,'name':'mark'}],
'company4':[{'age':36,'weight':155,'name':'lina'},{'age':40,'weight':215,'name':'sammy'},{'age':25,'weight':190,'name':'matt'}]
}
dict2 = {'company2':[{'age':30},{'age':45},{'age':52}],
'company4':[{'age':43},{'age':67},{'age':22},{'age':34},{'age':42}]
}
for company, array in dict1.items():
    if company not in dict2:
        dict3[company] = array
    else:
        # all the ages for this company in dict1
        ages = set(map(lambda x: x['age'], array))
        for dictref in dict2[company]:
            if dictref['age'] in ages:
                dict3[company] = array
                break
pprint(dict3)
Output was
{'company1': [{'age': 27, 'name': 'john', 'weight': 200},
{'age': 23, 'name': 'peter', 'weight': 180}],
'company2': [{'age': 30, 'name': 'sam', 'weight': 190},
{'age': 32, 'name': 'clove', 'weight': 210},
{'age': 21, 'name': 'steve', 'weight': 170}],
'company3': [{'age': 36, 'name': 'shaun', 'weight': 175},
{'age': 40, 'name': 'dany', 'weight': 205},
{'age': 25, 'name': 'mark', 'weight': 160}]}
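The same rule can also be written as a single dict comprehension with a set intersection. This is only a sketch of one possible shorter form, using the same sample data; the helper name ages_of is hypothetical.

```python
dict1 = {'company1': [{'age': 27, 'weight': 200, 'name': 'john'}, {'age': 23, 'weight': 180, 'name': 'peter'}],
         'company2': [{'age': 30, 'weight': 190, 'name': 'sam'}, {'age': 32, 'weight': 210, 'name': 'clove'}, {'age': 21, 'weight': 170, 'name': 'steve'}],
         'company3': [{'age': 36, 'weight': 175, 'name': 'shaun'}, {'age': 40, 'weight': 205, 'name': 'dany'}, {'age': 25, 'weight': 160, 'name': 'mark'}],
         'company4': [{'age': 36, 'weight': 155, 'name': 'lina'}, {'age': 40, 'weight': 215, 'name': 'sammy'}, {'age': 25, 'weight': 190, 'name': 'matt'}]}
dict2 = {'company2': [{'age': 30}, {'age': 45}, {'age': 52}],
         'company4': [{'age': 43}, {'age': 67}, {'age': 22}, {'age': 34}, {'age': 42}]}


def ages_of(people):
    """Return the set of 'age' values in a list of dicts."""
    return {person['age'] for person in people}


# Keep a company if dict2 does not mention it, or if the two age sets overlap
dict3 = {
    company: people
    for company, people in dict1.items()
    if company not in dict2 or ages_of(people) & ages_of(dict2[company])
}
print(sorted(dict3))  # ['company1', 'company2', 'company3']
```

company4 is dropped because none of its ages in dict1 (36, 40, 25) appear in dict2.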
Hi, I have a dictionary like the one below:
{
'namelist': [{'name':"John",'age':23,'country':'USA'},
{'name':"Mary",'age':12,'country':'Italy'},
{'name':"Susan",'age':32,'country':'UK'}],
'classteacher':'Jon Smith'
}
I would like to know whether it is possible to change it to
{
'namelist': [{'name_1':"John",'age_1':23,'country_1':'USA'},
{'name_2':"Mary",'age_2':12,'country_2':'Italy'},
{'name_3':"Susan",'age_3':32,'country_3':'UK'}],
'classteacher':'Jon Smith'
}
That is, by appending _1, _2, ... to the end of every key.
Is it possible? Thank you for your help.
You can add the new keys to each dict in the list and remove the original ones at the same time with yourdict[j+'_'+str(num)] = yourdict.pop(j)
keys() returns all the keys of a dict (name, age, country in your case).
a = {'namelist': [{'name':"John",'age':23,'country':'USA'},
{'name':"Mary",'age':12,'country':'Italy'},
{'name':"Susan",'age':32,'country':'UK'}]}
num = 1
for i in a['namelist']:
    for j in list(i.keys()):
        i[j+'_'+str(num)] = i.pop(j)
    num += 1
print(a)
# {'namelist': [
# {'name_1': 'John', 'country_1': 'USA', 'age_1': 23},
# {'name_2': 'Mary', 'country_2': 'Italy', 'age_2': 12},
# {'name_3': 'Susan', 'age_3': 32, 'country_3': 'UK'}]}
Here is my one-line style solution, which works even if you have many keys other than 'namelist':
d = {'namelist': [{'name':"John",'age':23,'country':'USA'},
{'name':"Mary",'age':12,'country':'Italy'},
{'name':"Susan",'age':32,'country':'UK'}],
'classteacher':'Jon Smith'
}
d = {k:[{f'{k2}_{nb}':v2 for k2,v2 in i.items()} for nb,i in enumerate(v,1)] if isinstance(v,list) else v for k,v in d.items()}
print(d)
# {'namelist': [{'name_1': 'John', 'age_1': 23, 'country_1': 'USA'},
# {'name_2': 'Mary', 'age_2': 12, 'country_2': 'Italy'},
# {'name_3': 'Susan', 'age_3': 32, 'country_3': 'UK'}],
# 'classteacher': 'Jon Smith'
# }
However, as Aran-Fey said, this is not very readable and is difficult to maintain. So I also suggest a solution with nested for loops:
d1 = {'namelist': [{'name':"John",'age':23,'country':'USA'},
{'name':"Mary",'age':12,'country':'Italy'},
{'name':"Susan",'age':32,'country':'UK'}],
'classteacher':'Jon Smith'}
for k1,v1 in d1.items():
    if isinstance(v1,list):
        for nb,d2 in enumerate(v1,1):
            for k2 in list(d2):
                d2[f'{k2}_{nb}'] = d2.pop(k2)
print(d1)
# {'namelist': [{'name_1': 'John', 'age_1': 23, 'country_1': 'USA'},
# {'name_2': 'Mary', 'age_2': 12, 'country_2': 'Italy'},
# {'name_3': 'Susan', 'age_3': 32, 'country_3': 'UK'}],
# 'classteacher': 'Jon Smith'
# }
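If the original dictionary should stay untouched, the same logic can be wrapped in a function that builds and returns a new dict. A minimal sketch, assuming every list value contains only dicts; the function name suffix_keys is hypothetical.

```python
def suffix_keys(data):
    """Return a copy of data with _1, _2, ... appended to the keys of each dict in a list."""
    result = {}
    for key, value in data.items():
        if isinstance(value, list):
            # Rebuild each inner dict with its 1-based position appended to every key
            result[key] = [
                {f'{k}_{n}': v for k, v in item.items()}
                for n, item in enumerate(value, 1)
            ]
        else:
            # Non-list values (such as 'classteacher') are carried over as-is
            result[key] = value
    return result


d = {'namelist': [{'name': "John", 'age': 23, 'country': 'USA'},
                  {'name': "Mary", 'age': 12, 'country': 'Italy'}],
     'classteacher': 'Jon Smith'}
print(suffix_keys(d))
```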
Using enumerate
Ex:
d = {'namelist': [{'name':"John",'age':23,'country':'USA'},
{'name':"Mary",'age':12,'country':'Italy'},
{'name':"Susan",'age':32,'country':'UK'}]}
d["namelist"] = [{k+"_"+str(i): v for k,v in value.items()} for i , value in enumerate(d["namelist"], 1)]
print(d)
Output:
{'namelist': [{'age_1': 23, 'country_1': 'USA', 'name_1': 'John'},
{'age_2': 12, 'country_2': 'Italy', 'name_2': 'Mary'},
{'age_3': 32, 'country_3': 'UK', 'name_3': 'Susan'}]}
You will have to create a new key whose value is taken from the old key. You can do this easily in one line using dict.pop.
I will assume you want to add the index of the row to the field name. Other fields, or other kinds of modification, can be handled similarly.
for index, row in enumerate(a['namelist']):
    row['name_%d' % index] = row.pop('name')
Output:
{'namelist': [{'age': 23, 'country': 'USA', 'name_0': 'John'},
{'age': 12, 'country': 'Italy', 'name_1': 'Mary'},
{'age': 32, 'country': 'UK', 'name_2': 'Susan'}]}
You can use dict and list comprehensions:
d = {'namelist': [{'name': "John", 'age': 23, 'country': 'USA'},
{'name': "Mary", 'age': 12, 'country': 'Italy'},
{'name': "Susan", 'age': 32, 'country': 'UK'}]}
d = {k: [{'_'.join((n, str(i))): v for n, v in s.items()} for i, s in enumerate(l, 1)] for k, l in d.items()}
d would become:
{'namelist': [{'name_1': 'John', 'age_1': 23, 'country_1': 'USA'}, {'name_2': 'Mary', 'age_2': 12, 'country_2': 'Italy'}, {'name_3': 'Susan', 'age_3': 32, 'country_3': 'UK'}]}
Use dictionary comprehension:
mydictionary['namelist'] = [dict((key + '_' + str(i), value) for key,value in mydictionary['namelist'][i-1].items()) for i in [1, 2, 3]]
for i, dct in enumerate(inp['namelist'], 1):
    for key, value in list(dct.items()):  # take a copy since we are modifying the dct
        del dct[key]  # delete old pair
        dct[key+'_'+str(i)] = value  # new key format
This works in place, so apart from the temporary copy of each dict's items you are not using extra memory. It iterates over each item in the dict, deletes the old key-value pair, and adds it back under the changed key name.
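To make the "in place" point concrete, the sketch below checks that the dict objects in the list are the same ones before and after renaming. The shortened sample data and the name first_before are only for illustration.

```python
inp = {'namelist': [{'name': 'John', 'age': 23}, {'name': 'Mary', 'age': 12}]}
first_before = inp['namelist'][0]

for i, dct in enumerate(inp['namelist'], 1):
    for key in list(dct):  # list() snapshots the keys before dct is modified
        dct[f'{key}_{i}'] = dct.pop(key)

# The same dict object is still in the list, only its keys have changed
print(inp['namelist'][0] is first_before)  # True
print(inp)
```

Any other reference to these inner dicts will therefore see the renamed keys as well, which may or may not be what you want.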