Imagine I have the following list of dictionaries. For every record (row of data), I want to merge the dictionaries of sub-fields into a single dictionary, so that in the end I have a list of dictionaries, one per record.
Data = [{'Name': 'bob', 'age': '40'},
{'Name': 'tom', 'age': '30'},
{'Country': 'US', 'City': 'Boston'},
{'Country': 'US', 'City': 'New York'},
{'Email': 'bob#fake.com', 'Phone': 'bob phone'},
{'Email': 'tom#fake.com', 'Phone': 'none'}]
Output = [
{'Name': 'bob', 'age': '40', 'Country': 'US', 'City': 'Boston', 'Email': 'bob#fake.com', 'Phone': 'bob phone'},
{'Name': 'tom', 'age': '30', 'Country': 'US', 'City': 'New York', 'Email': 'tom#fake.com', 'Phone': 'none'}
]
Related: How do I merge a list of dicts into a single dict?
I understand you know which dictionary relates to Bob and which dictionary relates to Tom by their position: dictionaries at even positions relate to Bob, while dictionaries at odd positions relate to Tom.
You can check whether a number is odd or even using % 2:
Data = [{'Name': 'bob', 'age': '40'},
{'Name': 'tom', 'age': '30'},
{'Country': 'US', 'City': 'Boston'},
{'Country': 'US', 'City': 'New York'},
{'Email': 'bob#fake.com', 'Phone': 'bob phone'},
{'Email': 'tom#fake.com', 'Phone': 'none'}]
bob_dict = {}
tom_dict = {}
for i, d in enumerate(Data):
    if i % 2 == 0:
        bob_dict.update(d)
    else:
        tom_dict.update(d)
Output = [bob_dict, tom_dict]
Or alternatively:
Output = [{}, {}]
for i, d in enumerate(Data):
    Output[i % 2].update(d)
This second approach is not only shorter to write, it also scales easily if you have more than 2 people.
Splitting the list into more than 2 dictionaries
k = 4 # number of dictionaries you want
Data = [{'Name': 'Alice', 'age': '40'},
{'Name': 'Bob', 'age': '30'},
{'Name': 'Charlie', 'age': '30'},
{'Name': 'Diane', 'age': '30'},
{'Country': 'US', 'City': 'Boston'},
{'Country': 'US', 'City': 'New York'},
{'Country': 'UK', 'City': 'London'},
{'Country': 'UK', 'City': 'Oxford'},
{'Email': 'alice#fake.com', 'Phone': 'alice phone'},
{'Email': 'bob#fake.com', 'Phone': '12345'},
{'Email': 'charlie#fake.com', 'Phone': '0000000'},
{'Email': 'diane#fake.com', 'Phone': 'none'}]
Output = [{} for j in range(k)]
for i, d in enumerate(Data):
    Output[i % k].update(d)
# Output = [
# {'Name': 'Alice', 'age': '40', 'Country': 'US', 'City': 'Boston', 'Email': 'alice#fake.com', 'Phone': 'alice phone'},
# {'Name': 'Bob', 'age': '30', 'Country': 'US', 'City': 'New York', 'Email': 'bob#fake.com', 'Phone': '12345'},
# {'Name': 'Charlie', 'age': '30', 'Country': 'UK', 'City': 'London', 'Email': 'charlie#fake.com', 'Phone': '0000000'},
# {'Name': 'Diane', 'age': '30', 'Country': 'UK', 'City': 'Oxford', 'Email': 'diane#fake.com', 'Phone': 'none'}
#]
Additionally, instead of hardcoding k = 4:
If you know the number of fields but not the number of people, you can compute k by dividing the initial number of dictionaries by the number of dictionary types:
fields = ['Name', 'Country', 'Email']
assert len(Data) % len(fields) == 0  # make sure Data is consistent with number of fields
k = len(Data) // len(fields)
Or alternatively, you can compute k by counting how many dictionaries contain the 'Name' field:
k = sum(1 for d in Data if 'Name' in d)
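Putting the pieces above together, a minimal sketch might wrap the modulo approach in a helper. The name merge_by_position and its key_field parameter are hypothetical, introduced only for illustration. It assumes every record contributes exactly one dictionary per field group and that the groups always list the records in the same order.

```python
def merge_by_position(data, key_field='Name'):
    """Merge dictionaries that belong to the same record, matched by position.

    Assumes `data` holds k dictionaries per field group, always in the same
    record order, and that `key_field` appears exactly once per record.
    """
    # Number of records = number of dictionaries carrying the key field.
    k = sum(1 for d in data if key_field in d)
    if k == 0 or len(data) % k != 0:
        raise ValueError('data is not consistent with the number of records')
    merged = [{} for _ in range(k)]
    for i, d in enumerate(data):
        # Dictionary i belongs to record i % k.
        merged[i % k].update(d)
    return merged


sample = [{'Name': 'bob', 'age': '40'},
          {'Name': 'tom', 'age': '30'},
          {'Country': 'US', 'City': 'Boston'},
          {'Country': 'US', 'City': 'New York'}]
result = merge_by_position(sample)
print(result)
```

Raising ValueError instead of using assert keeps the consistency check active even when Python runs with assertions disabled.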
Related
I'm stuck parsing the below python nested dictionary based on the nested key. I want to filter a key's value and return all the nested key/values related to that.
{ 'US': { 'Washington': {'Seattle': {1: {'name': 'John', 'age': '27', 'gender': 'Male'}}},
'Florida': {'some city': {2: {'name': 'Marie', 'age': '22', 'gender': 'Female'}}},
'Ohio': {'some city': {3: {'name': 'Luna', 'age': '24', 'gender': 'Female', 'married': 'No'}}},
'Nevada': {'some city': {4: {'name': 'Peter', 'age': '29', 'gender': 'Male', 'married': 'Yes'}}}}}
For instance, filtering on gender "Male" should return the below:
US
Washington
Seattle
1
name:John
age: 27
US
Nevada
somecity
4
name:Peter
age: 29
married: Yes
Can you please suggest the best way to parse it? I tried to use contains within a loop, but that doesn't seem to work.
We can recursively explore the dict structure, keeping track of the path of keys at each point. When we reach a dict containing the target value, we yield the path and the content of the dict.
We can use this generator:
def recursive_search(dct, target, path=None):
    if path is None:
        path = []
    if target in dct.values():
        out = ' '.join(path) + ' ' + ' '.join(f'{key}:{value}' for key, value in dct.items())
        yield out
    else:
        for key, value in dct.items():
            if isinstance(value, dict):
                yield from recursive_search(value, target, path+[str(key)])
It can be used this way:
data = { 'US': { 'Washington': {'Seattle': {1: {'name': 'John', 'age': '27', 'gender': 'Male'}}},
'Florida': {'some city': {2: {'name': 'Marie', 'age': '22', 'gender': 'Female'}}},
'Ohio': {'some city': {3: {'name': 'Luna', 'age': '24', 'gender': 'Female', 'married': 'No'}}},
'Nevada': {'some city': {4: {'name': 'Peter', 'age': '29', 'gender': 'Male', 'married': 'Yes'}}}}}
for match in recursive_search(data, 'Male'):
    print(match)
# US Washington Seattle 1 name:John age:27 gender:Male
# US Nevada some city 4 name:Peter age:29 gender:Male married:Yes
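If the matches need further processing rather than printing as one string, a variation of the generator above could yield the path and the matching dict separately. This is only a sketch: find_records is a hypothetical name, and the sample data is a shortened version of the question's dictionary.

```python
def find_records(dct, target, path=()):
    """Yield (path, record) pairs for every dict that has `target` among its values."""
    if target in dct.values():
        yield path, dct
    else:
        for key, value in dct.items():
            if isinstance(value, dict):
                # Extend the path with the current key and search one level deeper.
                yield from find_records(value, target, path + (key,))


sample = {'US': {'Washington': {'Seattle': {1: {'name': 'John', 'gender': 'Male'}}},
                 'Florida': {'some city': {2: {'name': 'Marie', 'gender': 'Female'}}}}}

matches = list(find_records(sample, 'Male'))
for path, record in matches:
    for key in path:
        print(key)
    for field, value in record.items():
        if value != 'Male':  # the question's desired output omits the matched field
            print(f'{field}: {value}')
```

Keeping the path as a tuple of the original keys (rather than strings) preserves the integer IDs, at the cost of leaving the formatting to the caller.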
This code will work, assuming the nesting depth is fixed and every record has a "gender" key:
a_dict={ 'US': { 'Washington': {'Seattle': {1: {'name': 'John', 'age': '27', 'gender': 'Male'}}}, 'Florida': {'some city': {2: {'name': 'Marie', 'age': '22', 'gender': 'Female'}}}, 'Ohio': {'some city': {3: {'name': 'Luna', 'age': '24', 'gender': 'Female', 'married': 'No'}}}, 'Nevada': {'some city': {4: {'name': 'Peter', 'age': '29', 'gender': 'Male', 'married': 'Yes'}}}}}
for k,v in a_dict.items():
    for k1,v1 in v.items():
        for k2,v2 in v1.items():
            for k3,v3 in v2.items():
                if v3["gender"]=="Male":
                    string=""
                    for k4,v4 in v3.items():
                        string=string+ k4+":"+v4+" "
                    print(k,k1,k2,k3, string.strip())
def remove_repeated_lines(data):
    lines_seen = set()  # holds lines already seen
    d=[]
    for t in data:
        if t not in lines_seen:  # check if line is not duplicate
            d.append(t)
            lines_seen.add(t)
    return d
a=[{'name': 'paul', 'age': '26.', 'hometown': 'AU', 'gender': 'male'},
{'name': 'mei', 'age': '26.', 'hometown': 'NY', 'gender': 'female'},
{'name': 'smith', 'age': '16.', 'hometown': 'NY', 'gender': 'male'},
{'name': 'raj', 'age': '13.', 'hometown': 'IND', 'gender': 'male'}]
age=[]
for line in a:
    for key,value in line.items():
        if key == 'age':
            age.append(remove_repeated_lines(value.replace('.','___')))
print(age)
The output is
[['2', '6', '___'], ['2', '6', '___'], ['1', '6', '___'], ['1', '3', '___']]
My desired output is ['26___','16___','13___']
Above is my code to remove repeated lines from the values of a dictionary. After I run the code, the repeated lines are not removed.
In [37]: a=[{'name': 'paul', 'age': '26.', 'hometown': 'AU', 'gender': 'male'},
...: {'name': 'mei', 'age': '26.', 'hometown': 'NY', 'gender': 'female'},
...: {'name': 'smith', 'age': '16.', 'hometown': 'NY', 'gender': 'male'},
...: {'name': 'raj', 'age': '13.', 'hometown': 'IND', 'gender': 'male'}]
In [40]: set(i["age"].replace(".","")+"_" for i in a)
Out[40]: {'13_', '16_', '26_'}
You can use set comprehension to do it with ease, in a more readable fashion:
age = list({
    line['age'].replace('.', '___')
    for line in a
    if 'age' in line
})
Output (the order may vary, because sets are unordered):
['26___', '16___', '13___']
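Because a set does not guarantee any ordering, the result above may come back in a different order. If the first-seen order matters, one possible alternative (a sketch, not part of the original answers) is dict.fromkeys, which relies on dictionaries keeping insertion order in Python 3.7+. The data below is a trimmed copy of the question's list.

```python
a = [{'name': 'paul', 'age': '26.'},
     {'name': 'mei', 'age': '26.'},
     {'name': 'smith', 'age': '16.'},
     {'name': 'raj', 'age': '13.'}]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops duplicates while keeping first-seen order.
age = list(dict.fromkeys(
    line['age'].replace('.', '___') for line in a if 'age' in line
))
print(age)  # ['26___', '16___', '13___']
```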
How can I convert this JSON into a dataframe in Python, dropping the fields entry? I just need the employees data in my dataframe.
{'fields': [{'id': 'displayName', 'type': 'text', 'name': 'Display name'},
{'id': 'firstName', 'type': 'text', 'name': 'First name'},
{'id': 'gender', 'type': 'gender', 'name': 'Gender'}],
'employees': [{'id': '123', 'displayName': 'abc', 'firstName': 'abc','gender': 'Female'},
{'id': '234', 'displayName': 'xyz.', 'firstName': 'xyz','gender': 'Female'},
{'id': '345', 'displayName': 'pqr', 'firstName': 'pqr', 'gender': 'Female'}]}
If you want the employee information, you can do:
JSON = {var:[...],'employees':[{}]}
employee_info = JSON['employees']
employee_info will be a list of dictionaries, from which you can create a dataframe as described in this answer: Convert list of dictionaries to a pandas DataFrame
I am looking to read one list which consists of column names and another list of lists which consists of data that needs to be mapped to the columns. Each list in the list of lists is one row of data, to be pushed into the database later.
I've tried to use the following code to join these two lists:
dict(zip(column_names, data)) but I receive an error:
TypeError unhashable type: 'list'
How would I join a list of lists and another list together to a dict?
column_names = ['id', 'first_name', 'last_name', 'city', 'dob']
data = [
['1', 'Mike', 'Walters', 'New York City', '1998-12-01'],
['2', 'Daniel', 'Strange', 'Baltimore', '1992-08-12'],
['3', 'Sarah', 'McNeal', 'Miami', '1990-05-05'],
['4', 'Steve', 'Breene', 'Philadelphia', '1988-02-06']
]
The result I'm seeking is:
dict_items = {{'id': '1', 'first_name': 'Mike', 'last_name': 'Walters',
'city': 'New York City', 'dob': '1998-12-01'},
{'id': '2', ...}}
Later looking to push this dict of dicts to the database with SQLAlchemy.
You can create a list of dictionaries like this:
result = [dict(zip(column_names, row)) for row in data]
Note that the outer brackets are square, not curly as you specified.
zip alone will not work in your case, because it pairs its input arguments one to one: each column name would be paired with an entire row.
Zip Documentation
Demo:
>>> l1 = ["key01", "key02", "key03"]
>>> l2 = ["value01", "value02", "value03"]
>>> list(zip(l1, l2))
[('key01', 'value01'), ('key02', 'value02'), ('key03', 'value03')]
>>> dict(zip(l1, l2))
{'key01': 'value01', 'key02': 'value02', 'key03': 'value03'}
>>>
Use normal iteration and the list append method to create the final output:
Demo:
>>> list_data_items = []
>>> for item in data:
...     list_data_items.append(dict(zip(column_names, item)))
...
All the other answers above worked fine. Just for the sake of completeness you could also use pandas (and it might be convenient if your data is coming from say a csv file).
Just create a data frame with your data and then convert it to dict:
import pandas as pd
df = pd.DataFrame(data, columns=column_names)
df.to_dict(orient='records')
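If the data does come from a CSV file and pandas is not otherwise needed, the standard library's csv.DictReader gives a similar list of records. The following is only a sketch: csv_text is made-up content mirroring the question's data, and io.StringIO stands in for a real file.

```python
import csv
import io

# Stand-in for a real file; with a file on disk you would use open(path, newline='').
csv_text = (
    'id,first_name,last_name,city,dob\n'
    '1,Mike,Walters,New York City,1998-12-01\n'
    '2,Daniel,Strange,Baltimore,1992-08-12\n'
)

# DictReader takes the column names from the first line of the input.
reader = csv.DictReader(io.StringIO(csv_text))
rows = [dict(row) for row in reader]
print(rows)
```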
Two simple for-loops:
column_names = ['id', 'first_name', 'last_name', 'city', 'dob']
data = [
['1', 'Mike', 'Walters', 'New York City', '1998-12-01'],
['2', 'Daniel', 'Strange', 'Baltimore', '1992-08-12'],
['3', 'Sarah', 'McNeal', 'Miami', '1990-05-05'],
['4', 'Steve', 'Breene', 'Philadelphia', '1988-02-06']
]
result = []
for data_row in data:
    new_db_row = {}
    for i, data_value in enumerate(data_row):
        new_db_row[column_names[i]] = data_value
    result.append(new_db_row)
print(result)
The first for statement loops over all data rows.
The second uses enumerate to get both the index (i) and the data_value of each row. The index is used to look up the column name in the list column_names.
I hope this explanation does not make it more complicated.
The printed result follows:
[{'id': '1', 'first_name': 'Mike', 'last_name': 'Walters', 'city': 'New York City', 'dob': '1998-12-01'}, {'id': '2', 'first_name': 'Daniel', 'last_name': 'Strange', 'city': 'Baltimore', 'dob': '1992-08-12'}, {'id': '3', 'first_name': 'Sarah', 'last_name': 'McNeal', 'city': 'Miami', 'dob': '1990-05-05'}, {'id': '4', 'first_name': 'Steve', 'last_name': 'Breene', 'city': 'Philadelphia', 'dob': '1988-02-06'}]
Since you want to construct multiple dictionaries, you have to zip your column names with each list in data and pass the result to the dict constructor. Your result dict_items also needs to be a collection that can store unhashable types such as dictionaries. We cannot use a set for this (which you say you are seeking), but we can use a list (or a tuple).
Employ a simple list comprehension in order to build one dictionary for each sublist in data.
>>> [dict(zip(column_names, sublist)) for sublist in data]
[{'dob': '1998-12-01', 'city': 'New York City', 'first_name': 'Mike', 'last_name': 'Walters', 'id': '1'}, {'dob': '1992-08-12', 'city': 'Baltimore', 'first_name': 'Daniel', 'last_name': 'Strange', 'id': '2'}, {'dob': '1990-05-05', 'city': 'Miami', 'first_name': 'Sarah', 'last_name': 'McNeal', 'id': '3'}, {'dob': '1988-02-06', 'city': 'Philadelphia', 'first_name': 'Steve', 'last_name': 'Breene', 'id': '4'}]
I also assumed that {'id':'2'} in your expected result is a typo.
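To illustrate the point about containers, here is a small sketch (with shortened data) showing that a set rejects dictionaries while a list or tuple accepts them. The exact wording of the error message differs between Python versions, so the code only stores it rather than relying on it.

```python
column_names = ['id', 'first_name']
data = [['1', 'Mike'], ['2', 'Daniel']]

rows = [dict(zip(column_names, sublist)) for sublist in data]

try:
    as_set = set(rows)  # dictionaries are mutable, hence unhashable
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = None

print(error_message)

# A tuple (or the list itself) can hold the dictionaries without any problem.
as_tuple = tuple(rows)
print(as_tuple)
```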
Using Pandas:
>>> column_names
['id', 'first_name', 'last_name', 'city', 'dob']
>>> data
[['1', 'Mike', 'Walters', 'New York City', '1998-12-01'], ['2', 'Daniel', 'Strange', 'Baltimore', '1992-08-12'], ['3', 'Sarah', 'McNeal', 'Miami', '1990-05-05'], ['4', 'Steve', 'Breene', 'Philadelphia', '1988-02-06']]
>>> import pandas as pd
>>> list(pd.DataFrame(data, columns=column_names).T.to_dict().values())
[{'dob': '1998-12-01', 'city': 'New York City', 'first_name': 'Mike', 'last_name': 'Walters', 'id': '1'}, {'dob': '1992-08-12', 'city': 'Baltimore', 'first_name': 'Daniel', 'last_name': 'Strange', 'id': '2'}, {'dob': '1990-05-05', 'city': 'Miami', 'first_name': 'Sarah', 'last_name': 'McNeal', 'id': '3'}, {'dob': '1988-02-06', 'city': 'Philadelphia', 'first_name': 'Steve', 'last_name': 'Breene', 'id': '4'}]
column_names = ['id', 'first_name', 'last_name', 'city', 'dob']
data = [
['1', 'Mike', 'Walters', 'New York City', '1998-12-01'],
['2', 'Daniel', 'Strange', 'Baltimore', '1992-08-12'],
['3', 'Sarah', 'McNeal', 'Miami', '1990-05-05'],
['4', 'Steve', 'Breene', 'Philadelphia', '1988-02-06']
]
destinationList = []
for value in data:
    destinationList.append(dict(zip(column_names,value)))
print(destinationList)
#
# zip(column_names,value)
# [('id', '1'), ('first_name', 'Mike'), ('last_name', 'Walters'), ('city', 'New York City'), ('dob', '1998-12-01')]
# dict(zip(column_names,value))
# {'last_name': 'Walters', 'dob': '1998-12-01','id': '1','first_name': 'Mike','city': 'New York City'}
I'm writing a script in Python and have one small question.
I have 2 lists:
['name', 'age', 'sex', 'addr', 'city']
['Jack 24 male no23 NY', 'Jane 25 female no24 NY', 'Dane 14 male no14 NY']
So I want to have:
dictofJack = {'name': 'Jack', 'age': '24', 'sex': 'male', 'addr': 'no23', 'city':'NY'}
dictofJane = {'name': 'Jane', 'age': '25', 'sex': 'female', 'addr': 'no24', 'city':'NY'}
dictofDane = {'name': 'Dane', 'age': '14', 'sex': 'male', 'addr': 'no14', 'city':'NY'}
In this case, how can I use zip to make it get the dictionaries automatically in a for loop?
Using list comprehension or generator expression:
>>> header = ['name', 'age', 'sex', 'addr', 'city']
>>> values = ['Jack 24 male no23 NY',
...           'Jane 25 female no24 NY',
...           'Dane 14 male no14 NY']
>>> dictofJack, dictofJane, dictofDane = (
...     dict(zip(header, value.split())) for value in values
... )
>>> dictofJack
{'addr': 'no23', 'age': '24', 'city': 'NY', 'name': 'Jack', 'sex': 'male'}
>>> dictofJane
{'addr': 'no24', 'age': '25', 'city':'NY', 'name': 'Jane', 'sex': 'female'}
>>> dictofDane
{'addr': 'no14', 'age': '14', 'city': 'NY', 'name': 'Dane', 'sex': 'male'}
BTW, instead of making multiple dictionary variables, I recommend using a dictionary of dictionaries (think of a case where 100 dictionaries are required), built with a dictionary comprehension:
>>> {value.split()[0]: dict(zip(header, value.split())) for value in values}
{'Jane': {'addr': 'no24', 'age': '25', 'city': 'NY', 'name': 'Jane', 'sex': 'female'},
'Dane': {'addr': 'no14', 'age': '14', 'city': 'NY', 'name': 'Dane', 'sex': 'male'},
'Jack': {'addr': 'no23', 'age': '24', 'city': 'NY', 'name': 'Jack', 'sex': 'male'}}
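One caveat with split() is that zip silently drops extra or missing fields, for example when a city name contains a space. A minimal sketch that guards against this could look like the following; parse_rows is a hypothetical helper name, and it assumes the header contains a 'name' column to key the result on.

```python
header = ['name', 'age', 'sex', 'addr', 'city']
values = ['Jack 24 male no23 NY',
          'Jane 25 female no24 NY',
          'Dane 14 male no14 NY']


def parse_rows(header, lines):
    """Build {name: record} from whitespace-separated lines.

    Raises ValueError when a line does not split into len(header) fields,
    since zip would otherwise silently drop the extra or missing ones.
    """
    people = {}
    for line in lines:
        fields = line.split()
        if len(fields) != len(header):
            raise ValueError(
                f'expected {len(header)} fields, got {len(fields)}: {line!r}')
        record = dict(zip(header, fields))
        people[record['name']] = record  # assumes a 'name' column exists
    return people


people = parse_rows(header, values)
print(people['Jane']['age'])  # 25
```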