Creating a list of dictionaries from separate lists - python

I honestly expected this to have been asked previously, but after 30 minutes of searching I haven't had any luck.
Say we have multiple lists, each of the same length, each one containing a different type of data about something. We would like to turn this into a list of dictionaries with the data type as the key.
input:
data = [['tom', 'jim', 'mark'], ['Toronto', 'New York', 'Paris'], [1990,2000,2000]]
data_types = ['name', 'place', 'year']
output:
travels = [{'name':'tom', 'place': 'Toronto', 'year':1990},
{'name':'jim', 'place': 'New York', 'year':2000},
{'name':'mark', 'place': 'Paris', 'year':2000}]
This is fairly easy to do with index-based iteration:
travels = []
for d_index in range(len(data[0])):
    travel = {}
    for dt_index in range(len(data_types)):
        travel[data_types[dt_index]] = data[dt_index][d_index]
    travels.append(travel)
But this is 2017! There has to be a more concise way to do this! We have map, flatmap, reduce, list comprehensions, numpy, lodash, zip. Except I can't seem to compose these cleanly into this particular transformation. Any ideas?

You can use a list comprehension with zip after transposing your dataset:
>>> [dict(zip(data_types, x)) for x in zip(*data)]
[{'place': 'Toronto', 'name': 'tom', 'year': 1990},
{'place': 'New York', 'name': 'jim', 'year': 2000},
{'place': 'Paris', 'name': 'mark', 'year': 2000}]
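To see why this works, it can help to split the one-liner into its two steps. The sketch below reuses data and data_types from the question; the names rows and travels are only illustrative.

```python
data = [['tom', 'jim', 'mark'],
        ['Toronto', 'New York', 'Paris'],
        [1990, 2000, 2000]]
data_types = ['name', 'place', 'year']

# Step 1: zip(*data) transposes the columns into one tuple per person.
rows = list(zip(*data))
print(rows[0])  # ('tom', 'Toronto', 1990)

# Step 2: pair each row with the keys and build a dict from the pairs.
travels = [dict(zip(data_types, row)) for row in rows]
print(travels[0])
```

Keep in mind that zip stops at the shortest input, so lists of unequal length are silently truncated rather than raising an error.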

Related

How can I compare and remove nested dictionaries with the same values within the same dictionary?

If I have a dictionary with data in it like the one below, how should I go about deleting duplicate entries such as nested dictionaries 1 and 4? Let's say I want to delete 4 because the user entered it, and I'm assuming that people are unique, so no two can share the same demographics: there can't be two John R. Smiths.
people = {1: {'name': 'John R. Smith', 'age': '27', 'sex': 'Male'},
2: {'name': 'Marie', 'age': '22', 'sex': 'Female'},
3: {'name': 'Mariah', 'age': '32', 'sex': 'Female'},
4: {'name': 'John R. Smith', 'age': '27', 'sex': 'Male'}}
I am just learning, so I wouldn't be surprised if there is something simple I was unable to come up with.
I attempted to compare the entries with something like if ['1']['name'] and ['1']['sex'] == ['4']['name'] and ['4']['sex']: followed by print['4'] just to test, and the error message told me that I need to be using indexes.
I've also turned it into a list, which was successful, but I was met with another error when trying to compare the entries inside a for loop, along the lines of: if person['name'] and person['age'] and person['sex'] are equal to those of another row, then print a message. I got nowhere.
I've also tried to turn it into a DataFrame and use pandas' duplicate function to remove the duplicates, but yesterday I got an error about 'dict', probably because the dictionaries get nested in the DataFrame, in contrast to a list with nested dictionaries, which tends to look like this:
[{1: {'name': 'John', 'age': '27', 'sex': 'Male'},
2: {'name': 'Marie', 'age': '22', 'sex': 'Female'}}]
You can take advantage of the fact that dict keys are always unique to help de-duplicate. Since dicts are unhashable and can't be used as keys directly, you can convert each sub-dict to a tuple of items first. Use dict.setdefault to keep only the first value for each distinct key:
records = {}
for number, record in people.items():
    records.setdefault(tuple(record.items()), (number, record))
print(dict(records.values()))
Given your sample input, this outputs:
{1: {'name': 'John R. Smith', 'age': '27', 'sex': 'Male'}, 2: {'name': 'Marie', 'age': '22', 'sex': 'Female'}, 3: {'name': 'Mariah', 'age': '32', 'sex': 'Female'}}
Demo: https://replit.com/#blhsing/LonelyNumbWatch
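One caveat: tuple(record.items()) depends on the order in which the keys were inserted, so two records with the same fields entered in a different order would not be treated as duplicates. A possible variation, assuming every value is hashable, is to use a frozenset of the items as the key. The people dict below is adapted from the question, with the key order of record 4 deliberately changed for the illustration.

```python
people = {
    1: {'name': 'John R. Smith', 'age': '27', 'sex': 'Male'},
    2: {'name': 'Marie', 'age': '22', 'sex': 'Female'},
    3: {'name': 'Mariah', 'age': '32', 'sex': 'Female'},
    # Same person as record 1, but with the keys in a different order.
    4: {'sex': 'Male', 'age': '27', 'name': 'John R. Smith'},
}

records = {}
for number, record in people.items():
    # frozenset ignores ordering; it requires every value to be hashable.
    records.setdefault(frozenset(record.items()), (number, record))

deduplicated = dict(records.values())
print(sorted(deduplicated))  # [1, 2, 3]
```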
One approach is to build a new dictionary by iterating over people and assigning a person to the new dictionary if their data is unique. The following solution uses a set for tracking unique users:
from pprint import pprint
unique_people = {}
unique_ids = set()
for key, data in people.items():
    data_id = tuple(data.values())
    if data_id in unique_ids:
        continue
    unique_people[key] = data
    unique_ids.add(data_id)
pprint(unique_people)
Output:
{1: {'age': '27', 'name': 'John R. Smith', 'sex': 'Male'},
2: {'age': '22', 'name': 'Marie', 'sex': 'Female'},
3: {'age': '32', 'name': 'Mariah', 'sex': 'Female'}}

How do I access a dictionary after using loops to convert some lists to a dictionary? The code is below

INSTRUCTION:
The data is organized such that the data at each index, from 0 to 33, corresponds to the same hurricane.
For example, names[0] yields the “Cuba I” hurricane, which occurred in months[0] (October) years[0] (1924).
Write a function that constructs a dictionary made out of the lists, where the keys of the dictionary are the names of the hurricanes, and the values are dictionaries themselves containing a key for each piece of data (Name, Month, Year, Max Sustained Wind, Areas Affected, Damage, Deaths) about the hurricane.
Thus the key "Cuba I" would have the value: {'Name': 'Cuba I', 'Month': 'October', 'Year': 1924, 'Max Sustained Wind': 165, 'Areas Affected': ['Central America', 'Mexico', 'Cuba', 'Florida', 'The Bahamas'], 'Damage': 'Damages not recorded', 'Deaths': 90}.
THESE ARE THE LISTS:
names = ['Cuba I', 'San Felipe II Okeechobee', 'Cuba II']
months = ['October', 'September', 'September']
years = [1924, 1928, 1932]
max_sustained_winds = [165, 160, 160]
areas_affected = [['Central America', 'Mexico', 'Cuba', 'Florida', 'The Bahamas'], ['Lesser Antilles', 'The Bahamas', 'United States East Coast', 'Atlantic Canada'], ['The Bahamas', 'Northeastern United States']]
damages = ['Damages not recorded', '100M', 'Damages not recorded', '40M']
deaths = [90,4000,16]
THE CODE I HAVE TRIED:
hurricane_records = list(zip(names, months, years, max_sustained_winds, areas_affected, update_damages(), deaths))
dict1 = {}
strongest_hurricane_records = {}
for i in range(len(names)):
    dict1["Name"] = hurricane_records[i][0]
    dict1["Month"] = hurricane_records[i][1]
    dict1["Year"] = hurricane_records[i][2]
    dict1["Max Sustained Wind"] = hurricane_records[i][3]
    dict1["Areas Affected"] = hurricane_records[i][4]
    dict1["Damage"] = hurricane_records[i][5]
    dict1["Deaths"] = hurricane_records[i][6]
    strongest_hurricane_records[names[i]] = dict1
print(strongest_hurricane_records["Cuba I"])
My problem is that when I try to access the "Cuba I" entry, instead of printing the values for "Cuba I", it prints the last dictionary, which is this:
{'Cuba II', 'September', 1928, 160, ['Lesser Antilles', 'The Bahamas', 'United States East Coast', 'Atlantic Canada'], '100M', 4000}
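The problem is that you have dict1 = {} outside of the for loop, so only one dictionary is ever created, and each iteration of the loop modifies that same dictionary. Every key in strongest_hurricane_records therefore points to the same object, which holds the values from the last iteration. Simply move dict1 = {} into the loop to re-initialize it for each hurricane: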
strongest_hurricane_records = {}
for i in range(len(names)):
    dict1 = {}  # Now dict1 is unique for each loop iteration
    dict1["Name"] = ...
    strongest_hurricane_records[names[i]] = dict1
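A quick way to convince yourself of this is to check object identity. The sketch below uses shortened versions of the question's lists and leaves out the damage column, since update_damages() isn't shown; the names shared, buggy, keys and fixed are made up for the illustration.

```python
names = ['Cuba I', 'San Felipe II Okeechobee', 'Cuba II']
months = ['October', 'September', 'September']
years = [1924, 1928, 1932]

# Buggy pattern: one dict created before the loop and reused each time.
shared = {}
buggy = {}
for name, month, year in zip(names, months, years):
    shared['Name'], shared['Month'], shared['Year'] = name, month, year
    buggy[name] = shared
# Every key refers to the very same dict object.
print(buggy['Cuba I'] is buggy['Cuba II'])  # True

# Fixed pattern: build a fresh dict for each hurricane.
keys = ['Name', 'Month', 'Year']
fixed = {
    name: dict(zip(keys, (name, month, year)))
    for name, month, year in zip(names, months, years)
}
print(fixed['Cuba I'])
```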
update_damages() isn't reproducible, so I made my own lists, which are similar to yours. Remember to run your code and fix errors before posting a minimal working example.
If I understand correctly, you want to create a nested dictionary, i.e. a dictionary which contains other dictionaries.
I give a simplified version below. You can find more advanced approaches in other closely related questions, such as How to zip three lists into a nested dict.
# Outer dictionary:
genders = ['male', 'female']
# Inner dictionary:
keys = ['name', 'surname', 'age']
names = ['Bob', 'Natalie']
surnames = ['Blacksmith', 'Smith']
ages = [10, 20]
inner_dictionaries = [dict(zip(keys, [names[i],
                                      surnames[i],
                                      ages[i]]))
                      for i, elem in enumerate(names)]
# [{'name': 'Bob', 'surname': 'Blacksmith', 'age': 10},
#  {'name': 'Natalie', 'surname': 'Smith', 'age': 20}]
# Loop variables are named gender/inner rather than keys/values,
# to avoid confusion with the keys list defined above.
outer_dictionaries = dict((gender, inner)
                          for gender, inner in
                          zip(genders, inner_dictionaries))
# {'male': {'name': 'Bob', 'surname': 'Blacksmith', 'age': 10},
#  'female': {'name': 'Natalie', 'surname': 'Smith', 'age': 20}}
By the way avoid range(len()) in a for loop if you want to loop like a native (also from R. Hettinger). Moving on, the inner_dictionaries comes from this for loop:
for index, element in enumerate(names):
    print(dict(zip(keys, [names[index], surnames[index], ages[index]])))
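Since the index is only used to pull the matching element out of each list, one possible simplification is to zip the lists together and skip the index altogether. This is a sketch using the same sample lists as above; people and by_gender are illustrative names.

```python
genders = ['male', 'female']
keys = ['name', 'surname', 'age']
names = ['Bob', 'Natalie']
surnames = ['Blacksmith', 'Smith']
ages = [10, 20]

# zip(names, surnames, ages) yields one (name, surname, age) tuple per person.
people = [dict(zip(keys, values)) for values in zip(names, surnames, ages)]

# Pair each gender with the inner dictionary at the same position.
by_gender = dict(zip(genders, people))
print(by_gender['female'])
```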
You can make a list of all the values, then loop over it, convert each value into a dictionary using dict(value), and merge the results into another dictionary using the .update() method:
Features = [all features]  # placeholder: each element must be something dict() accepts
Dict = {}
for f in Features:
    Dict.update(dict(f))
print(Dict)

Adding the 2D array list values to a string in python

I am trying to convert a 2D list into MongoDB document format.
Below is my list:
list1=[('New Mexico', 'NM', '2020-04-06', 686, 12), ('New Mexico', 'NM', '2020-07-07', 13727, 519)]
I want to convert this into the format below:
{'state':'New Mexico','State_code':'NM','Date':'2020-04-06','cases':686,'deaths':12}
I tried accessing the list but was not able to get the above format.
Below is the code:
for i in range(len(list1)):
    for j in range(len(list1[i])):
        item={
            'state': list1[i][j],
            'state_code': list1[i][j],
            'date': list1[i][j],
            'cases': list1[i][j],
            'deaths': list1[i][j]
        }
        print(item)
The above code inserts the same value into all of them. May I know how to insert each value into its own field (state, state_code, etc.)?
Thanks in advance.
You'd be better off assigning the items as shown here; there is no need for a second loop. You must be sure, however, that every item in list1 has the same length, i.e. contains all the necessary values:
for i in list1:
    item={
        'state': i[0],
        'state_code': i[1],
        'date': i[2],
        'cases': i[3],
        'deaths': i[4]
    }
    print(item)
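If you can't be sure that every tuple has all five values, one option is to check the length before building each dict. This is only a sketch: the field names come from the question, while FIELDS and to_document are hypothetical helper names.

```python
FIELDS = ('state', 'state_code', 'date', 'cases', 'deaths')

def to_document(row):
    """Turn one tuple into a dict, refusing rows of the wrong length."""
    if len(row) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(row)}: {row!r}")
    return dict(zip(FIELDS, row))

list1 = [('New Mexico', 'NM', '2020-04-06', 686, 12),
         ('New Mexico', 'NM', '2020-07-07', 13727, 519)]

documents = [to_document(row) for row in list1]
print(documents[0])
```

Raising early makes a malformed row visible immediately, instead of producing a document with silently missing fields.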
Hello sumesh, as rads said, you need to save the results from every iteration. With a small change you can get the end result:
items = []
for i in list1:
    temp_item={
        'state': i[0],
        'state_code': i[1],
        'date': i[2],
        'cases': i[3],
        'deaths': i[4]
    }
    items.append(temp_item)
print(items)
I am just saving results from every iteration in temp_item and then appending them to the items list :)
You can do this in a nice and pythonic fashion using a regular list comprehension and zip:
# Put the keys to associate with the values in order as they appear in the tuples inside list1
keys = ('state', 'State_code', 'Date', 'cases', 'deaths')
result = [dict(zip(keys, l)) for l in list1]
Output
[{'state': 'New Mexico',
'State_code': 'NM',
'Date': '2020-04-06',
'cases': 686,
'deaths': 12},
{'state': 'New Mexico',
'State_code': 'NM',
'Date': '2020-07-07',
'cases': 13727,
'deaths': 519}]

What is the most efficient way to create nested dictionaries in Python?

I currently have a list of over 10k dictionaries that looks like:
cars = [{'model': 'Ford', 'year': 2010},
{'model': 'BMW', 'year': 2019},
...]
And I have a second list:
car_owners = [{'model': 'BMW', 'name': 'Sam', 'age': 34},
{'model': 'BMW', 'name': 'Taylor', 'age': 34},
.....]
However, I want to join the two together to get something like:
combined = [{'model': 'BMW',
'year': 2019,
'owners': [{'name': 'Sam', 'age': 34}, ...]
}]
What is the best way to combine them? At the moment I am using a for loop, but I feel like there are more efficient ways of dealing with this.
** This is just made-up example data; the data I have is a lot more complex, but this gives the idea of what I want to achieve.
Iterate over the first list, creating a dict keyed by model. Then iterate over the second list, look up each owner's model in that dict, and update the matching entry if it is found:
cars = [{'model': 'Ford', 'year': 2010}, {'model': 'BMW', 'year': 2019}]
car_owners = [{'model': 'BMW', 'name': 'Sam', 'age': 34}, {'model': 'Ford', 'name': 'Taylor', 'age': 34}]
dd = {x['model']:x for x in cars}
for item in car_owners:
    key = item['model']
    if key in dd:
        del item['model']
        dd[key].update({'car_owners': item})
    else:
        dd[key] = item
print(list(dd.values()))
OUTPUT:
[{'model': 'BMW', 'year': 2019, 'car_owners': {'name': 'Sam', 'age': 34}}, {'model': 'Ford', 'year': 2010, 'car_owners': {'name': 'Taylor', 'age': 34}}]
Really, what you want performance-wise is to have dictionaries with the model as the key. That way, you have O(1) lookup and can quickly get the requested element (instead of looping each time in order to find the car with model x).
If you're starting off with lists, I'd first create dictionaries, and then everything is O(1) from there on out.
models_to_cars = {car['model']: car for car in cars}
models_to_owners = {}
for car_owner in car_owners:
    models_to_owners.setdefault(car_owner['model'], []).append(car_owner)
combined = [{
    **car,
    'owners': models_to_owners.get(model, [])
} for model, car in models_to_cars.items()]
Then you'd have
combined = [{'model': 'BMW',
'year': 2019,
'owners': [{'name': 'Sam', 'age': 34}, ...]
}]
as you wanted
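Here is a small end-to-end run of the same idea on made-up data, as a sketch rather than a drop-in solution. It uses collections.defaultdict in place of setdefault and strips the 'model' key from each owner so the result matches the shape asked for in the question; the third owner record is invented for the example.

```python
from collections import defaultdict

cars = [{'model': 'Ford', 'year': 2010}, {'model': 'BMW', 'year': 2019}]
car_owners = [{'model': 'BMW', 'name': 'Sam', 'age': 34},
              {'model': 'BMW', 'name': 'Taylor', 'age': 34},
              {'model': 'Ford', 'name': 'Alex', 'age': 51}]  # invented record

# One pass over the owners: group them by model, dropping the 'model' key.
owners_by_model = defaultdict(list)
for owner in car_owners:
    details = {k: v for k, v in owner.items() if k != 'model'}
    owners_by_model[owner['model']].append(details)

# One pass over the cars: attach the matching owners (or an empty list).
combined = [{**car, 'owners': owners_by_model.get(car['model'], [])}
            for car in cars]
print(combined[1])
```

Both passes are linear in the size of their input, so the whole join stays close to O(len(cars) + len(car_owners)) instead of scanning the owner list once per car.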

How can I use list comprehension to separate values in a dictionary?

name=[]
age=[]
address=[]
...
for line in pg:
    for key,value in line.items():
        if key == 'name':
            name.append(value)
        elif key == 'age':
            age.append(value)
        elif key == 'address':
            address.append(value)
        ...
Is it possible to use a list comprehension for the above code? I need to separate lots of values in the dicts, and I will use the resulting lists to write to a text file.
Source Data:
a = [{'name': 'paul', 'age': '26.', 'address': 'AU', 'gender': 'male'},
{'name': 'mei', 'age': '26.', 'address': 'NY', 'gender': 'female'},
{'name': 'smith', 'age': '16.', 'address': 'NY', 'gender': 'male'},
{'name': 'raj', 'age': '13.', 'address': 'IND', 'gender': 'male'}]
I don't think a list comprehension would be a wise choice, because you have multiple lists.
Instead of making multiple lists and appending the value to each one when the key matches, you can use defaultdict to simplify your code.
from collections import defaultdict
result = defaultdict(list)
for line in pg:
    for key, value in line.items():
        result[key].append(value)
You can get the name list by using result.get('name')
['paul', 'mei', 'smith', 'raj']
This probably won't work the way you want: you're trying to fill three different lists, so you would need three different comprehensions. If your data is large, this would roughly triple your execution time.
Something straightforward, such as
name = [value for line in pg for key, value in line.items() if key == "name"]
seems to be what you'd want ... three times.
You can proceed as follows:
pg=[{"name":"name1","age":"age1","address":"address1"},{"name":"name2","age":"age2","address":"address2"}]
name=[v for line in pg for k,v in line.items() if k=="name"]
age=[v for line in pg for k,v in line.items() if k=="age"]
address=[v for line in pg for k,v in line.items() if k=="address"]
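If every dict is guaranteed to have all three keys, another possibility is to pull the values out row by row and transpose them with zip, which walks pg only once. A sketch, assuming pg is non-empty (unpacking an empty zip would fail):

```python
pg = [{"name": "name1", "age": "age1", "address": "address1"},
      {"name": "name2", "age": "age2", "address": "address2"}]

# Build one (name, age, address) tuple per dict, then transpose with zip(*...).
name, age, address = map(list, zip(*((d["name"], d["age"], d["address"]) for d in pg)))
print(name)     # ['name1', 'name2']
print(address)  # ['address1', 'address2']
```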
Following on from Vishal's answer: please don't use defaultdict. Using defaultdict is a very bad practice when you want to catch KeyErrors, because looking up a missing key with result[key] silently creates it instead of raising. Please use setdefault instead.
result = dict()
for line in pg:
    for key, value in line.items():
        result.setdefault(key, []).append(value)
Output
{
'name': ['paul', 'mei', 'smith', 'raj'],
'age': ['26.', '26.', '16.', '13.'],
...
}
However, note that if the dicts in pg don't all have the same keys, you will lose the correspondence between a position in each list and the dict it came from.
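To illustrate that caveat and one possible way around it: the sketch below collects the full set of keys first and fills in None wherever a dict lacks a key, so every list stays aligned with pg. The sample data and the choice of None as filler are assumptions for the example.

```python
pg = [{'name': 'paul', 'age': '26.'},
      {'name': 'mei', 'address': 'NY'},   # no 'age'
      {'name': 'raj', 'age': '13.', 'address': 'IND'}]

# Collect every key that appears in any dict, preserving first-seen order.
all_keys = list(dict.fromkeys(key for line in pg for key in line))

# Use .get() so a missing key yields None instead of shifting positions.
result = {key: [line.get(key) for line in pg] for key in all_keys}
print(result['age'])      # ['26.', None, '13.']
print(result['address'])  # [None, 'NY', 'IND']
```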
Here is a really simple solution if you want to use pandas:
import pandas as pd
df = pd.DataFrame(a)
name = df['name'].tolist()
age = df['age'].tolist()
address = df['address'].tolist()
print(name)
print(age)
print(address)
Output:
['paul', 'mei', 'smith', 'raj']
['26.', '26.', '16.', '13.']
['AU', 'NY', 'NY', 'IND']
Additionally, if your end result is a text file, you can skip the list creation and write the DataFrame (or parts thereof) directly to a CSV with something as simple as:
df.to_csv('/path/to/output.csv')
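If you'd rather not depend on pandas, the standard library's csv module can write the list of dicts directly. This is a sketch that writes to an in-memory buffer so it runs anywhere; for a real file you would pass a file object opened with newline='' instead. Only the first two rows of the source data are used here.

```python
import csv
import io

a = [{'name': 'paul', 'age': '26.', 'address': 'AU', 'gender': 'male'},
     {'name': 'mei', 'age': '26.', 'address': 'NY', 'gender': 'female'}]

buffer = io.StringIO()
# extrasaction='ignore' skips keys (here 'gender') not listed in fieldnames.
writer = csv.DictWriter(buffer, fieldnames=['name', 'age', 'address'],
                        extrasaction='ignore')
writer.writeheader()
writer.writerows(a)
csv_text = buffer.getvalue()
print(csv_text)
```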
