I have read 2 CSV files into 2 separate dictionaries. Now I need to merge (join) them on the Zipcode column. Please advise. Here is the sample data:
Data1:
{
'10029': {'Zipcode': '10029', 'City': 'New York', 'State': 'NY'},
'11221': {'Zipcode': '11221', 'City': 'Brooklyn', 'State': 'NY'},
'10162': {'Zipcode': '10162', 'City': 'New York', 'State': 'NY'}
}
Data2:
{
'10029': {'Zipcode': '10029', 'Latitude': '40.82374', 'Longitude': '-73.9373'},
'11211': {'Zipcode': '11211', 'Latitude': '40.72354', 'Longitude': '-73.98295'},
'10162': {'Zipcode': '10162', 'Latitude': '41.75554', 'Longitude': '-72.94225'}
}
Merged_Data (expected result):
{
'10029': {'Zipcode': '10029', 'City': 'New York', 'State': 'NY', 'Latitude': '40.82374', 'Longitude': '-73.9373'},
'10162': {'Zipcode': '10162', 'City': 'New York', 'State': 'NY', 'Latitude': '41.75554', 'Longitude': '-72.94225'}
}
There are only 2 matching zipcodes, so the result has 2 entries.
Here is the code I have, which does not work:
Data1[Zipcode] = Data2
if Data1[Zipcode] == Data2['Zipcode']:
Data1= Data2.append(['Zipcode'],['Longitude'],['Latitude'])
You cannot append to a dictionary.
What I would do is:
merged = dict()
for key in Data1:
    if key in Data2:
        merged[key] = {**Data1[key], **Data2[key]}
print(merged)
Result:
{
'10029': {'Zipcode': '10029', 'City': 'New York', 'State': 'NY', 'Latitude': '40.82374', 'Longitude': '-73.9373'},
'10162': {'Zipcode': '10162', 'City': 'New York', 'State': 'NY', 'Latitude': '41.75554', 'Longitude': '-72.94225'}
}
You can iterate over the keys of the first dict, check whether each key is also in the second dict and, if so, merge the two inner dicts:
merged_data = dict()
for key in Data1.keys():
    if key in Data2.keys():
        merged_data.update({key: {**Data1[key], **Data2[key]}})
If you would rather use dict comprehension:
merged_data = {k: {**Data1[k], **Data2[k]} for k in Data1.keys() if k in Data2.keys()}
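The same inner join can also be expressed with the set intersection of the two key views. This is only a sketch built on the sample data from the question; the name `common` is made up for illustration.

```python
# Sample data from the question (the outer keys are zipcodes)
Data1 = {
    '10029': {'Zipcode': '10029', 'City': 'New York', 'State': 'NY'},
    '11221': {'Zipcode': '11221', 'City': 'Brooklyn', 'State': 'NY'},
    '10162': {'Zipcode': '10162', 'City': 'New York', 'State': 'NY'},
}
Data2 = {
    '10029': {'Zipcode': '10029', 'Latitude': '40.82374', 'Longitude': '-73.9373'},
    '11211': {'Zipcode': '11211', 'Latitude': '40.72354', 'Longitude': '-73.98295'},
    '10162': {'Zipcode': '10162', 'Latitude': '41.75554', 'Longitude': '-72.94225'},
}

# dict key views support set operations; & keeps the keys present in both dicts
common = Data1.keys() & Data2.keys()

# sorted() is only there to make the iteration order deterministic
merged_data = {k: {**Data1[k], **Data2[k]} for k in sorted(common)}
print(merged_data)
```

Zipcodes that appear in only one of the two dicts ('11221' and '11211' here) are dropped, which matches the expected result in the question.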
Related
Say I have a nested dictionary like so:
dict = ["{'model': 'network.customer', 'pk': 'C00001', 'fields': {'name': 'Valentino Solomon', 'latitude': 57.13514, 'longitude': -2.11731}}"
"{'model': 'network.customer', 'pk': 'C00002', 'fields': {'name': 'Luna Armstrong', 'latitude': 57.13875, 'longitude': -2.09089}}"
"{'model': 'network.customer', 'pk': 'C00003', 'fields': {'name': 'Jaylen Crane', 'latitude': 57.101, 'longitude': -2.1106}}"
"{'model': 'network.customer', 'pk': 'C00004', 'fields': {'name': 'Christopher Fritz', 'latitude': 57.10801, 'longitude': -2.23776}}"
"{'model': 'network.customer', 'pk': 'C00005', 'fields': {'name': 'Timothy Hutchinson', 'latitude': 57.10076, 'longitude': -2.27073}}"
"{'model': 'network.customer', 'pk': 'C00006', 'fields': {'name': 'Yesenia Reeves', 'latitude': 57.13868, 'longitude': -2.16525}}"
"{'model': 'network.customer', 'pk': 'C00007', 'fields': {'name': 'Cameron Vargas', 'latitude': 57.16115, 'longitude': -2.15543}}"]
How can I iterate through this to get the fields pk and the keys and values inside the key fields so that the return is:
data = "{'pk': 'C00001', 'name': 'Valentino Solomon', 'latitude': 57.13514, 'longitude': -2.11731}",
"{'pk': 'C00002', 'name': 'Luna Armstrong', 'latitude': 57.13875, 'longitude': -2.09089}",
"{'pk': 'C00003', 'name': 'Jaylen Crane', 'latitude': 57.101, 'longitude': -2.1106}",
"{'pk': 'C00004', 'name': 'Christopher Fritz', 'latitude': 57.10801, 'longitude': -2.23776}",
"{'pk': 'C00005', 'name': 'Timothy Hutchinson', 'latitude': 57.10076, 'longitude': -2.27073}"
Thanks!
I can access the fields by using this:
print(customers[0]['fields'])
Make the outer object a list and make every entry a real dict instead of a string. Then iterate over the list and access each entry like you would any other dict. (The names list and dict are avoided below because they shadow the built-in types.)
customers = [{'model': 'network.customer', 'pk': 'C00001', 'fields': {'name': 'Valentino Solomon', 'latitude': 57.13514, 'longitude': -2.11731}},
    {'model': 'network.customer', 'pk': 'C00002', 'fields': {'name': 'Luna Armstrong', 'latitude': 57.13875, 'longitude': -2.09089}},
    {'model': 'network.customer', 'pk': 'C00003', 'fields': {'name': 'Jaylen Crane', 'latitude': 57.101, 'longitude': -2.1106}},
    {'model': 'network.customer', 'pk': 'C00004', 'fields': {'name': 'Christopher Fritz', 'latitude': 57.10801, 'longitude': -2.23776}},
    {'model': 'network.customer', 'pk': 'C00005', 'fields': {'name': 'Timothy Hutchinson', 'latitude': 57.10076, 'longitude': -2.27073}},
    {'model': 'network.customer', 'pk': 'C00006', 'fields': {'name': 'Yesenia Reeves', 'latitude': 57.13868, 'longitude': -2.16525}},
    {'model': 'network.customer', 'pk': 'C00007', 'fields': {'name': 'Cameron Vargas', 'latitude': 57.16115, 'longitude': -2.15543}}]
for entry in customers:
    print(entry['pk'])
    print(entry['fields']['name'])
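To get the flattened shape the question asks for (pk next to the contents of fields), one option is to unpack the inner dict into a new one. This is a minimal sketch, assuming the entries are already real dicts; the name `flat` and the two-entry sample are just for illustration.

```python
# Two entries from the question, already converted to real dicts
customers = [
    {'model': 'network.customer', 'pk': 'C00001',
     'fields': {'name': 'Valentino Solomon', 'latitude': 57.13514, 'longitude': -2.11731}},
    {'model': 'network.customer', 'pk': 'C00002',
     'fields': {'name': 'Luna Armstrong', 'latitude': 57.13875, 'longitude': -2.09089}},
]

# Build a new dict per customer: 'pk' first, then every key/value from 'fields'
flat = [{'pk': c['pk'], **c['fields']} for c in customers]

for row in flat:
    print(row)
```

The original list is left untouched, because each flattened row is a new dict.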
If anybody has a list like the OP's, where all the dicts sit in one single string (that is, a string of dicts without commas that has to become a list of dicts), you can use this bit of code:
import ast
def convertToList(inString: str):
    # Split a string of back-to-back dict literals by counting braces.
    # Note: braces inside string values are counted too and would confuse it.
    i: int = 0
    closeCounter: int = 0
    openCounter: int = 0
    firstOpen: int = 0
    outList: list = []
    while i < len(inString):
        openPos = inString.find("{", i)
        closePos = inString.find("}", i)
        if closePos == -1:
            return outList
        if openPos < closePos and openPos != -1:
            openCounter += 1
            if openCounter == 1:
                # remember where the current top-level dict starts
                firstOpen = openPos
            i = openPos + 1
        elif closePos < openPos or openPos == -1:
            closeCounter += 1
            if openCounter == closeCounter:
                # braces are balanced: parse this slice as one dict
                parsed = ast.literal_eval(inString[firstOpen:closePos+1])
                outList.append(parsed)
                openCounter = 0
                closeCounter = 0
            i = closePos + 1
    return outList
# The adjacent string literals below are concatenated by Python, so this list holds ONE long string
customer_strings = ["{'model': 'network.customer', 'pk': 'C00001', 'fields': {'name': 'Valentino Solomon', 'latitude': 57.13514, 'longitude': -2.11731}}"
    "{'model': 'network.customer', 'pk': 'C00002', 'fields': {'name': 'Luna Armstrong', 'latitude': 57.13875, 'longitude': -2.09089}}"
    "{'model': 'network.customer', 'pk': 'C00003', 'fields': {'name': 'Jaylen Crane', 'latitude': 57.101, 'longitude': -2.1106}}"
    "{'model': 'network.customer', 'pk': 'C00004', 'fields': {'name': 'Christopher Fritz', 'latitude': 57.10801, 'longitude': -2.23776}}"
    "{'model': 'network.customer', 'pk': 'C00005', 'fields': {'name': 'Timothy Hutchinson', 'latitude': 57.10076, 'longitude': -2.27073}}"
    "{'model': 'network.customer', 'pk': 'C00006', 'fields': {'name': 'Yesenia Reeves', 'latitude': 57.13868, 'longitude': -2.16525}}"
    "{'model': 'network.customer', 'pk': 'C00007', 'fields': {'name': 'Cameron Vargas', 'latitude': 57.16115, 'longitude': -2.15543}}"]
list0 = convertToList(customer_strings[0])
print(list0)
for entry in list0:
    print(entry['pk'])
    print(entry['fields']['name'])
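If the strings are separated by commas, so that the list really holds one string per dict, no brace counting is needed: ast.literal_eval can parse each string on its own. A minimal sketch, assuming that layout; the name `separate_strings` is made up and the sample entries are shortened versions of the OP's data.

```python
import ast

# Assumption: one string per dict (note the commas between the strings)
separate_strings = [
    "{'model': 'network.customer', 'pk': 'C00001', 'fields': {'name': 'Valentino Solomon'}}",
    "{'model': 'network.customer', 'pk': 'C00002', 'fields': {'name': 'Luna Armstrong'}}",
]

# literal_eval only evaluates Python literals, so it is safer than eval()
parsed = [ast.literal_eval(s) for s in separate_strings]

for entry in parsed:
    print(entry['pk'], entry['fields']['name'])
```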
I have a list of dictionaries, and within the dictionaries there are dictionaries, and within those dictionaries, there are lists as values - within those lists is the information I need to access.
I want to turn the lists into dictionaries. The entire list of dictionaries are set up like this:
data = [{'date': 'Aug 1 1980',
'hour': '2PM',
'group': {'location' :
[{'country': 'United States',
'state': 'Utah',
'city': 'St. George',
'coordinates': [37.0965, 113.5684]}]},
{'date': 'Aug 1 1980',
'hour': '4PM',
'group': {'location' :
[{'country': 'United States',
'state': 'Utah',
'city': 'St. George',
'coordinates': [37.0965, 113.5684]}]}]
I need the coordinates, but the type of location is a list. How can I turn this list into a dictionary? Should I start by splitting on ':' and ',' into keys and values? That seems like an awful way to do it and I'm hoping someone can help me with a better, quicker way.
Edit
I would want my dictionary to look like this:
{'country': 'United States', 'state': 'Utah', 'city' :'St George', 'coordinates': [37.0965, 113.5684]}
I think the following does what you want, although I'm not totally sure, since I had to fix your input data to make it valid and guess a little at what exactly you wanted the result to be.
from pprint import pprint
data = [{'date': 'Aug 1 1980',
'hour': '2PM',
'group': {'location': [{'country': 'United States',
'state': 'Utah',
'city': 'St. George',
'coordinates': [37.0965, 113.5684]}]}},
{'date': 'Aug 1 1980',
'hour': '4PM',
'group': {'location': [{'country': 'United States',
'state': 'Utah',
'city': 'St. George',
'coordinates': [37.0965, 113.5684]}]}}]
fixed_data = []
for dct in data:
    # replace the one-element list with the dict it contains (modifies dct in place)
    dct['group']['location'] = dct['group']['location'][0]
    fixed_data.append(dct)
pprint(fixed_data, sort_dicts=False)
Printed result:
[{'date': 'Aug 1 1980',
'hour': '2PM',
'group': {'location': {'country': 'United States',
'state': 'Utah',
'city': 'St. George',
'coordinates': [37.0965, 113.5684]}}},
{'date': 'Aug 1 1980',
'hour': '4PM',
'group': {'location': {'country': 'United States',
'state': 'Utah',
'city': 'St. George',
'coordinates': [37.0965, 113.5684]}}}]
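If only the coordinates are needed, the list does not have to be converted at all: indexing it with [0] reaches the inner dict directly. A small sketch on the corrected input from above, assuming every location list holds exactly one dict; the name `coordinates` is just for illustration.

```python
data = [
    {'date': 'Aug 1 1980', 'hour': '2PM',
     'group': {'location': [{'country': 'United States', 'state': 'Utah',
                             'city': 'St. George',
                             'coordinates': [37.0965, 113.5684]}]}},
    {'date': 'Aug 1 1980', 'hour': '4PM',
     'group': {'location': [{'country': 'United States', 'state': 'Utah',
                             'city': 'St. George',
                             'coordinates': [37.0965, 113.5684]}]}},
]

# [0] picks the single dict inside each 'location' list; data is not modified
coordinates = [d['group']['location'][0]['coordinates'] for d in data]
print(coordinates)
```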
Imagine I have the following list of dictionaries. For every record (row of data), I want to merge the dictionaries of sub-fields into a single dictionary, so that in the end I have a list of dictionaries, one per record.
Data = [{'Name': 'bob', 'age': '40'},
{'Name': 'tom', 'age': '30'},
{'Country': 'US', 'City': 'Boston'},
{'Country': 'US', 'City': 'New York'},
{'Email': 'bob#fake.com', 'Phone': 'bob phone'},
{'Email': 'tom#fake.com', 'Phone': 'none'}]
Output = [
{'Name': 'bob', 'age': '40', 'Country': 'US', 'City': 'Boston', 'Email': 'bob#fake.com', 'Phone': 'bob phone'},
{'Name': 'tom', 'age': '30', 'Country': 'US', 'City': 'New York', 'Email': 'tom#fake.com', 'Phone': 'none'}
]
Related: How do I merge a list of dicts into a single dict?
I understand you know which dictionary relates to Bob and which dictionary relates to Tom by their position: dictionaries at even positions relate to Bob, while dictionaries at odd positions relate to Tom.
You can check whether a number is odd or even using % 2:
Data = [{'Name': 'bob', 'age': '40'},
{'Name': 'tom', 'age': '30'},
{'Country': 'US', 'City': 'Boston'},
{'Country': 'US', 'City': 'New York'},
{'Email': 'bob#fake.com', 'Phone': 'bob phone'},
{'Email': 'tom#fake.com', 'Phone': 'none'}]
bob_dict = {}
tom_dict = {}
for i, d in enumerate(Data):
    if i % 2 == 0:
        bob_dict.update(d)
    else:
        tom_dict.update(d)
Output = [bob_dict, tom_dict]
Or alternatively:
Output = [{}, {}]
for i, d in enumerate(Data):
    Output[i % 2].update(d)
This second approach is not only shorter to write, it's also faster to execute and easier to scale if you have more than 2 people.
Splitting the list into more than 2 dictionaries
k = 4 # number of dictionaries you want
Data = [{'Name': 'Alice', 'age': '40'},
{'Name': 'Bob', 'age': '30'},
{'Name': 'Charlie', 'age': '30'},
{'Name': 'Diane', 'age': '30'},
{'Country': 'US', 'City': 'Boston'},
{'Country': 'US', 'City': 'New York'},
{'Country': 'UK', 'City': 'London'},
{'Country': 'UK', 'City': 'Oxford'},
{'Email': 'alice#fake.com', 'Phone': 'alice phone'},
{'Email': 'bob#fake.com', 'Phone': '12345'},
{'Email': 'charlie#fake.com', 'Phone': '0000000'},
{'Email': 'diane#fake.com', 'Phone': 'none'}]
Output = [{} for j in range(k)]
for i, d in enumerate(Data):
    Output[i % k].update(d)
# Output = [
# {'Name': 'Alice', 'age': '40', 'Country': 'US', 'City': 'Boston', 'Email': 'alice#fake.com', 'Phone': 'alice phone'},
# {'Name': 'Bob', 'age': '30', 'Country': 'US', 'City': 'New York', 'Email': 'bob#fake.com', 'Phone': '12345'},
# {'Name': 'Charlie', 'age': '30', 'Country': 'UK', 'City': 'London', 'Email': 'charlie#fake.com', 'Phone': '0000000'},
# {'Name': 'Diane', 'age': '30', 'Country': 'UK', 'City': 'Oxford', 'Email': 'diane#fake.com', 'Phone': 'none'}
#]
Additionally, instead of hardcoding k = 4:
If you know the number of fields but not the number of people, you can compute k by dividing the initial number of dictionaries by the number of dictionary types:
fields = ['Name', 'Country', 'Email']
assert(len(Data) % len(fields) == 0) # make sure Data is consistent with number of fields
k = len(Data) // len(fields)
Or alternatively, you can compute k by counting how many dictionaries contain the 'Name' field:
k = sum(1 for d in Data if 'Name' in d)
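The same grouping can also be written with slicing: Data[i::k] picks every k-th dictionary starting at position i, which is exactly the set of dictionaries that belong to person i. This is only a sketch using the two-person data from the question, and it assumes the records are interleaved in a fixed order as described above; the name `record` is illustrative.

```python
Data = [{'Name': 'bob', 'age': '40'},
        {'Name': 'tom', 'age': '30'},
        {'Country': 'US', 'City': 'Boston'},
        {'Country': 'US', 'City': 'New York'},
        {'Email': 'bob#fake.com', 'Phone': 'bob phone'},
        {'Email': 'tom#fake.com', 'Phone': 'none'}]

# number of people = number of dictionaries that carry a 'Name' key
k = sum(1 for d in Data if 'Name' in d)

Output = []
for i in range(k):
    record = {}
    # Data[i::k] -> dictionaries at positions i, i + k, i + 2k, ...
    for d in Data[i::k]:
        record.update(d)
    Output.append(record)

print(Output)
```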
I have a column "data" whose values are JSON objects. I would like to add a key-value pair inside the nested JSON.
source = {'my_dict':[{'_id': 'SE-DATA-BB3A'},{'_id': 'SE-DATA-BB3E'},{'_id': 'SE-DATA-BB3F'}], 'data': [ {'bb3a_bmls':[{'name': 'WAG 01', 'id': '105F', 'state': 'available', 'nodes': 3,'volumes-': [{'state': 'available', 'id': '330172', 'name': 'q_-4144d4e'}, {'state': 'available', 'id': '275192', 'name': 'p_3089d821ae', }]}]}
, {'bb3b_bmls':[{'name': 'FEC 01', 'id': '382E', 'state': 'available', 'nodes': 4,'volumes': [{'state': 'unavailable', 'id': '830172', 'name': 'w_-4144d4e'}, {'state': 'unavailable', 'id': '223192', 'name': 'g_3089d821ae', }]}]}
, {'bb3c_bmls':[{'name': 'ASD 01', 'id': '303F', 'state': 'available', 'nodes': 6,'volumes': [{'state': 'unavailable', 'id': '930172', 'name': 'e_-4144d4e'}, {'state': 'unavailable', 'id': '245192', 'name': 'h_3089d821ae', }]}]}
] }
input_df = pd.DataFrame(source)
input_df looks like below:
Now I need to add the "my_dict" column values as a 1st element inside the nested json values of "data" column
My Target dataframe should look like below ( I have highlighted the changes in bold)
I tried using dict.update() but it doesn't seem to help. I'm stuck here and have no idea how to take this forward. I'd appreciate your help.
I don't see any benefit in putting it into a dataframe. If you keep the original dictionary, the following loop will do:
my_dict=[{'_id': 'SE-DATA-BB3A'},{'_id': 'SE-DATA-BB3E'},{'_id': 'SE-DATA-BB3F'}]
data = [ {'bb3a_bmls':[{'name': 'WAG 01', 'id': '105F', 'state': 'available', 'nodes': 3,'volumes-': [{'state': 'available', 'id': '330172', 'name': 'q_-4144d4e'}, {'state': 'available', 'id': '275192', 'name': 'p_3089d821ae', }]}]}
, {'bb3b_bmls':[{'name': 'FEC 01', 'id': '382E', 'state': 'available', 'nodes': 4,'volumes': [{'state': 'unavailable', 'id': '830172', 'name': 'w_-4144d4e'}, {'state': 'unavailable', 'id': '223192', 'name': 'g_3089d821ae', }]}]}
, {'bb3c_bmls':[{'name': 'ASD 01', 'id': '303F', 'state': 'available', 'nodes': 6,'volumes': [{'state': 'unavailable', 'id': '930172', 'name': 'e_-4144d4e'}, {'state': 'unavailable', 'id': '245192', 'name': 'h_3089d821ae', }]}]}
]
for idx, val in enumerate(data):
    val[list(val.keys())[0]][0].update(my_dict[idx])
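A slightly more readable variant of the same loop pairs the two lists with zip instead of indexing. It is only a sketch on a shortened copy of the data (the 'volumes' lists are left out), and it assumes that every entry of data has exactly one key whose value is a list of dicts; the names `extra` and `records` are made up.

```python
my_dict = [{'_id': 'SE-DATA-BB3A'}, {'_id': 'SE-DATA-BB3E'}]
data = [
    {'bb3a_bmls': [{'name': 'WAG 01', 'id': '105F', 'state': 'available', 'nodes': 3}]},
    {'bb3b_bmls': [{'name': 'FEC 01', 'id': '382E', 'state': 'available', 'nodes': 4}]},
]

for extra, entry in zip(my_dict, data):
    # each entry has exactly one key; unpack its value (a list of dicts)
    (records,) = entry.values()
    if records:  # skip empty lists
        records[0].update(extra)

print(data)
```

Note that update() appends the new key at the end of the nested dict; a dict has no notion of a "1st element" beyond insertion order.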
def get_val(row):
    my_dict_val = row.loc['my_dict']
    dict_key = list(row['data'].keys())[0]
    if not list(row['data'].values())[0]:
        return row['data']
    data_dict = list(row['data'].values())[0][0]
    data_dict.update(my_dict_val)
    res = dict()
    res[dict_key] = []
    res[dict_key].append(data_dict)
    return res
input_df['data'] = input_df.apply(get_val, axis=1)
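The solution is as follows:
def update_data(row):
    data_dict = row['data']
    for key in data_dict:
        # each value is a list of dicts; add the my_dict pair to every nested dict
        # (updating data_dict itself here would change its size while iterating)
        for item in data_dict[key]:
            item.update(row.loc['my_dict'])
    return data_dict
input_df['data'] = input_df.apply(update_data, axis=1)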
I have a variable and list imported from excel that looks like below:
cities= [{'City': 'Buenos Aires',
'Country': 'Argentina',
'Population': 2891000,
'Area': 4758},
{'City': 'Toronto',
'Country': 'Canada',
'Population': 2800000,
'Area': 2731571},
{'City': 'Pyeongchang',
'Country': 'South Korea',
'Population': 2581000,
'Area': 3194},
{'City': 'Marakesh', 'Country': 'Morocco', 'Population': 928850, 'Area': 200},
{'City': 'Albuquerque',
'Country': 'New Mexico',
'Population': 559277,
'Area': 491},
{'City': 'Los Cabos',
'Country': 'Mexico',
'Population': 287651,
'Area': 3750},
{'City': 'Greenville', 'Country': 'USA', 'Population': 84554, 'Area': 68},
{'City': 'Archipelago Sea',
'Country': 'Finland',
'Population': 60000,
'Area': 8300},
{'City': 'Walla Walla Valley',
'Country': 'USA',
'Population': 32237,
'Area': 33},
{'City': 'Salina Island', 'Country': 'Italy', 'Population': 4000, 'Area': 27},
{'City': 'Solta', 'Country': 'Croatia', 'Population': 1700, 'Area': 59},
{'City': 'Iguazu Falls',
'Country': 'Argentina',
'Population': 0,
'Area': 672}]
I just want the value of 'Population' from each city.
What is the most efficient or easiest way to build a list of every city's 'Population' value?
Below is the code that I came up with, but it's inefficient.
City_Population = [cities[0]['Population'], cities[1]['Population'], cities[2]['Population']]
I am currently learning Python and any advice would be helpful!
Thank you!
Using list comprehension:
print([city['Population'] for city in cities])
OUTPUT:
[2891000, 2800000, 2581000, 928850, 559277, 287651, 84554, 60000, 32237, 4000, 1700, 0]
EDIT:
If some cities might have no 'Population' key:
print([city['Population'] for city in cities if 'Population' in city])
OUTPUT (removed population from a few cities in the list):
[2891000, 2800000, 2581000, 928850, 287651, 84554, 32237, 4000]
Use dict.get(); that way you get None for any city where the key is not defined.
populations = [city.get('Population') for city in cities]
If you don't want the empty values:
populations = [pop for pop in populations if pop is not None]
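One detail worth spelling out: the filter uses `is not None` rather than a plain truth test, so a legitimate population of 0 (Iguazu Falls) is kept. The sketch below uses a shortened city list in which 'Population' was removed from one entry purely for illustration; the -1 placeholder is likewise an arbitrary choice.

```python
cities = [
    {'City': 'Buenos Aires', 'Country': 'Argentina', 'Population': 2891000, 'Area': 4758},
    {'City': 'Toronto', 'Country': 'Canada', 'Area': 2731571},  # 'Population' removed for illustration
    {'City': 'Iguazu Falls', 'Country': 'Argentina', 'Population': 0, 'Area': 672},
]

# .get() returns None when the key is missing instead of raising KeyError
with_none = [city.get('Population') for city in cities]

# 'is not None' drops only the missing values; the real value 0 survives
present = [pop for pop in with_none if pop is not None]

# .get() also takes a default as second argument (-1 is an arbitrary placeholder)
with_default = [city.get('Population', -1) for city in cities]

print(with_none, present, with_default)
```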