I have a DataFrame like:
id  country  city   amount  duplicated
1   France   Paris  200     1
2   France   Paris  200     1
3   France   Lyon   50      2
4   France   Lyon   50      2
5   France   Lyon   50      2
I would like to store one list per distinct value of the duplicated column, like this:
list 1
[
{
"id": 1,
"country": "France",
"city": "Paris",
"amount": 200
},
{
"id": 2,
"country": "France",
"city": "Paris",
"amount": 200
}
]
list 2
[
{
"id": 3,
"country": "France",
"city": "Lyon",
"amount": 50
},
{
"id": 4,
"country": "France",
"city": "Lyon",
"amount": 50
},
{
"id": 5,
"country": "France",
"city": "Lyon",
"amount": 50
}
]
I tried filtering duplicates with
df[df.duplicated(['country','city','amount', 'duplicated'], keep = False)]
but it just returns the same df.
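Here is why the filter keeps everything. With keep=False, duplicated() flags every row that matches at least one other row on the listed columns. In this sample every row has such a match, because only id differs, so the mask is all True. The small sketch below rebuilds the sample table (the DataFrame construction is an assumption based on the table above) and compares the result with the default keep="first":

```python
import pandas as pd

# Rebuild the sample table from the question (values copied from it)
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "country": ["France"] * 5,
    "city": ["Paris", "Paris", "Lyon", "Lyon", "Lyon"],
    "amount": [200, 200, 50, 50, 50],
    "duplicated": [1, 1, 2, 2, 2],
})

cols = ["country", "city", "amount", "duplicated"]

# keep=False flags *all* members of each duplicate set
mask = df.duplicated(cols, keep=False)
print(mask.tolist())  # [True, True, True, True, True]

# The default keep="first" flags only the later copies
print(df.duplicated(cols).tolist())  # [False, True, False, True, True]
```

So filtering on duplicated cannot split the rows into separate lists. Grouping, as in the answers below, is what you need.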
You can use groupby:
lst = (df.groupby(['country', 'city', 'amount'])  # or .groupby('duplicated')
         .apply(lambda x: x.to_dict('records'))
         .tolist())
Output:
>>> lst
[[{'id': 3,
'country': 'France',
'city': 'Lyon',
'amount': 50,
'duplicated': 2},
{'id': 4,
'country': 'France',
'city': 'Lyon',
'amount': 50,
'duplicated': 2},
{'id': 5,
'country': 'France',
'city': 'Lyon',
'amount': 50,
'duplicated': 2}],
[{'id': 1,
'country': 'France',
'city': 'Paris',
'amount': 200,
'duplicated': 1},
{'id': 2,
'country': 'France',
'city': 'Paris',
'amount': 200,
'duplicated': 1}]]
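As the comment in the code suggests, you can group by the duplicated column instead. You can also skip groupby.apply and iterate over the groups directly. Iterating a GroupBy yields (key, sub-DataFrame) pairs in sorted key order, so here the Paris group (duplicated == 1) comes first. A sketch, rebuilding the sample df from the question:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "country": ["France"] * 5,
    "city": ["Paris", "Paris", "Lyon", "Lyon", "Lyon"],
    "amount": [200, 200, 50, 50, 50],
    "duplicated": [1, 1, 2, 2, 2],
})

# One list of row dicts per distinct value of 'duplicated'
lst = [group.to_dict("records") for _, group in df.groupby("duplicated")]
print(lst)
```

Each record still contains the duplicated column. If you don't want it, call group.drop(columns="duplicated") before to_dict.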
Another solution, if you want a dict keyed by the duplicated value:
data = {k: v.to_dict('records') for k, v in df.set_index('duplicated').groupby(level=0)}
>>> data[1]
[{'id': 1, 'country': 'France', 'city': 'Paris', 'amount': 200},
{'id': 2, 'country': 'France', 'city': 'Paris', 'amount': 200}]
>>> data[2]
[{'id': 3, 'country': 'France', 'city': 'Lyon', 'amount': 50},
{'id': 4, 'country': 'France', 'city': 'Lyon', 'amount': 50},
{'id': 5, 'country': 'France', 'city': 'Lyon', 'amount': 50}]
If I understand you correctly, you can use DataFrame.to_dict('records') to make your lists:
list_1 = df[df['duplicated'] == 1].to_dict('records')
list_2 = df[df['duplicated'] == 2].to_dict('records')
Or for an arbitrary number of values in the column, you can make a dict:
result = {}
for value in df['duplicated'].unique():
    result[value] = df[df['duplicated'] == value].to_dict('records')
import pandas as pd
data = [['INDIA', 'UP', 'BANARAS'], ['INDIA', 'UP', 'KANPUR'], ['INDIA', 'TN', 'CHENNAI'], ['US', 'TEXAS', 'HUSTON']]
cols = ['COUNTRY', 'STATE', 'CITY']
df = pd.DataFrame(data=data, columns=cols)
I want a result like this:
[
{
"COUNTRY": "INDIA",
"STATE": "TN",
"CITIES": [
{
"CITY": "CHENNAI"
}
]
},
{
"COUNTRY": "INDIA",
"STATE": "UP",
"CITIES": [
{
"CITY": "BANARAS"
},
{
"CITY": "KANPUR"
}
]
},
{
"COUNTRY": "US",
"STATE": "TEXAS",
"CITIES": [
{
"CITY": "HUSTON"
}
]
}
]
You can try
from pprint import pprint

out = (df.groupby(['COUNTRY', 'STATE'])
         .apply(lambda g: g[['CITY']].to_dict(orient='records'))
         .to_frame('CITIES')
         .reset_index()
         .to_dict(orient='records'))
pprint(out)
[{'CITIES': [{'CITY': 'CHENNAI'}], 'COUNTRY': 'INDIA', 'STATE': 'TN'},
{'CITIES': [{'CITY': 'BANARAS'}, {'CITY': 'KANPUR'}],
'COUNTRY': 'INDIA',
'STATE': 'UP'},
{'CITIES': [{'CITY': 'HUSTON'}], 'COUNTRY': 'US', 'STATE': 'TEXAS'}]
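A variant without groupby.apply is to iterate over the (COUNTRY, STATE) groups and build each dict yourself. This also keeps the keys in the order shown in the question (COUNTRY, STATE, CITIES). pprint sorts dict keys by default, which is why the output above lists CITIES first. A sketch using the df from the question:

```python
import pandas as pd

data = [['INDIA', 'UP', 'BANARAS'], ['INDIA', 'UP', 'KANPUR'],
        ['INDIA', 'TN', 'CHENNAI'], ['US', 'TEXAS', 'HUSTON']]
df = pd.DataFrame(data=data, columns=['COUNTRY', 'STATE', 'CITY'])

# Each group is a sub-DataFrame; keep only CITY and turn its rows into dicts
out = [
    {"COUNTRY": country, "STATE": state, "CITIES": group[["CITY"]].to_dict("records")}
    for (country, state), group in df.groupby(["COUNTRY", "STATE"])
]
print(out)
```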
To save the result to a file, add this to the above code:
import json
with open("sample.json", "w") as outfile:
    json.dump(out, outfile)
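Or you could drop the final .to_dict(orient='records') and write the DataFrame directly. Pass orient="records" to get the same list-of-records layout:
out.to_json("filename.json", orient="records")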
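To reproduce the indented layout from the question in the saved file, pass indent to json.dump. Below is a small sketch. The out list is copied from the output above so the block runs on its own, and the file name sample.json is the one used in the answer:

```python
import json

# The nested structure produced above (copied so this block is self-contained)
out = [
    {"COUNTRY": "INDIA", "STATE": "TN", "CITIES": [{"CITY": "CHENNAI"}]},
    {"COUNTRY": "INDIA", "STATE": "UP", "CITIES": [{"CITY": "BANARAS"}, {"CITY": "KANPUR"}]},
    {"COUNTRY": "US", "STATE": "TEXAS", "CITIES": [{"CITY": "HUSTON"}]},
]

# indent=4 pretty-prints nested objects, one key per line
text = json.dumps(out, indent=4)
print(text)

# Writing to a file works the same way with json.dump
with open("sample.json", "w") as outfile:
    json.dump(out, outfile, indent=4)
```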
I got the following list of dicts
list_of_dicts = [
{'product': 'car', 'city': 'new york', 'quantity': 13},
{'product': 'car', 'city': 'new york', 'quantity': 25},
{'product': 'bus', 'city': 'miami', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 8}
]
My target is, when values of 'product' and 'city' are the same, sum up the values of 'quantity'.
The result should look like this:
result_list_of_dicts = [
{'product': 'car', 'city': 'new york', 'quantity': 38},
{'product': 'bus', 'city': 'miami', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 13},
]
Is there a Pythonic way to do this? I tried a couple of things, but I'd better not show them because they are really ugly.
Thank you in advance!
You can do the following, using only standard library utils:
from operator import itemgetter
from functools import reduce
from itertools import groupby
pc = itemgetter("product", "city") # sorting and grouping key
q = itemgetter("quantity")
combine = lambda d1, d2: {**d1, "quantity": q(d1) + q(d2)}
[reduce(combine, g) for _, g in groupby(sorted(list_of_dicts, key=pc), key=pc)]
# [{'product': 'bus', 'city': 'miami', 'quantity': 5},
# {'product': 'car', 'city': 'new york', 'quantity': 38},
# {'product': 'container', 'city': 'atlanta', 'quantity': 13}]
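The sorted() call matters because itertools.groupby only groups consecutive items with equal keys. Unsorted input can therefore split one product/city pair into several groups. A small demonstration with a hypothetical reordered list (rows made up for illustration):

```python
from itertools import groupby
from operator import itemgetter

pc = itemgetter("product", "city")

# Hypothetical input where the two 'car' rows are not adjacent
rows = [
    {"product": "car", "city": "new york", "quantity": 13},
    {"product": "bus", "city": "miami", "quantity": 5},
    {"product": "car", "city": "new york", "quantity": 25},
]

# Without sorting, groupby starts a new group whenever the key changes
unsorted_keys = [k for k, _ in groupby(rows, key=pc)]
# After sorting, equal keys are adjacent and form a single group
sorted_keys = [k for k, _ in groupby(sorted(rows, key=pc), key=pc)]

print(unsorted_keys)  # [('car', 'new york'), ('bus', 'miami'), ('car', 'new york')]
print(sorted_keys)    # [('bus', 'miami'), ('car', 'new york')]
```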
Or, maybe even simpler and linear:
from collections import Counter
from operator import itemgetter

pc = itemgetter("product", "city")
q = itemgetter("quantity")
totals = Counter()
for dct in list_of_dicts:
    totals[pc(dct)] += q(dct)
result_list_of_dicts = [
    {"product": p, "city": c, "quantity": qty} for (p, c), qty in totals.items()
]
One approach using collections.Counter
from collections import Counter
list_of_dicts = [
{'product': 'car', 'city': 'new york', 'quantity': 13},
{'product': 'car', 'city': 'new york', 'quantity': 25},
{'product': 'bus', 'city': 'miami', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 8}
]
counts = sum((Counter({(d["product"], d["city"]): d["quantity"]}) for d in list_of_dicts), Counter())
result = [{"product": product, "city": city, "quantity": quantity} for (product, city), quantity in counts.items()]
print(result)
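Summing Counters has one caveat: the + operator keeps only positive totals. A product/city pair whose quantities add up to zero (or less) therefore silently disappears. Accumulating with += (or Counter.update) keeps every key. A sketch with a hypothetical zero-quantity row:

```python
from collections import Counter

# Hypothetical data: the 'bike' row has quantity 0
rows = [
    {"product": "car", "city": "new york", "quantity": 13},
    {"product": "bike", "city": "boston", "quantity": 0},
]

# Counter addition discards keys whose total is not positive
summed = sum((Counter({(d["product"], d["city"]): d["quantity"]}) for d in rows), Counter())

# In-place += keeps zero totals
looped = Counter()
for d in rows:
    looped[d["product"], d["city"]] += d["quantity"]

print(("bike", "boston") in summed)  # False
print(("bike", "boston") in looped)  # True
```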
A pandas implementation
Group by "product" and "city", sum over the groups, and reset the index to get the original columns back.
import pandas as pd
list_of_dicts = [
{'product': 'car', 'city': 'new york', 'quantity': 13},
{'product': 'car', 'city': 'new york', 'quantity': 25},
{'product': 'bus', 'city': 'miami', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 8}
]
df = pd.DataFrame(list_of_dicts)
print(df)
df = df.groupby(["product", "city"]).sum().reset_index()
print(df)
summed_dict = df.to_dict("records")
print(summed_dict)
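Here is a slightly more explicit pandas variant. With as_index=False, product and city stay regular columns, so no reset_index() is needed. Selecting ["quantity"] makes sure only that column is summed. A sketch with the same list_of_dicts:

```python
import pandas as pd

list_of_dicts = [
    {'product': 'car', 'city': 'new york', 'quantity': 13},
    {'product': 'car', 'city': 'new york', 'quantity': 25},
    {'product': 'bus', 'city': 'miami', 'quantity': 5},
    {'product': 'container', 'city': 'atlanta', 'quantity': 5},
    {'product': 'container', 'city': 'atlanta', 'quantity': 8},
]

df = pd.DataFrame(list_of_dicts)
# as_index=False returns the group keys as columns instead of an index
summed = df.groupby(["product", "city"], as_index=False)["quantity"].sum()

summed_dict = summed.to_dict("records")
print(summed_dict)
```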
You could do it with a loop, initializing the total the first time you encounter a product/city pair. Key on both fields: keying on 'product' alone would add up quantities for the same product sold in different cities.
list_of_dicts = [
{'product': 'car', 'city': 'new york', 'quantity': 13},
{'product': 'car', 'city': 'new york', 'quantity': 25},
{'product': 'bus', 'city': 'miami', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 8}
]
new_dict = {}
for ld in list_of_dicts:
    key = (ld['product'], ld['city'])  # group on product AND city
    if key not in new_dict:
        new_dict[key] = 0  # first time we see this pair
    new_dict[key] += ld['quantity']
# print(new_dict)
# {('car', 'new york'): 38, ('bus', 'miami'): 5, ('container', 'atlanta'): 13}
result_list_of_dicts = [{'product': product,
                         'city': city,
                         'quantity': quantity} for (product, city), quantity in new_dict.items()]
# print(result_list_of_dicts)
# [{'product': 'car', 'city': 'new york', 'quantity': 38}, {'product': 'bus', 'city': 'miami', 'quantity': 5}, {'product': 'container', 'city': 'atlanta', 'quantity': 13}]
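collections.defaultdict can handle the first-time initialization, since it supplies 0 for a missing key. A sketch keyed on the (product, city) pair, as the question requires, using list_of_dicts from the question:

```python
from collections import defaultdict

list_of_dicts = [
    {'product': 'car', 'city': 'new york', 'quantity': 13},
    {'product': 'car', 'city': 'new york', 'quantity': 25},
    {'product': 'bus', 'city': 'miami', 'quantity': 5},
    {'product': 'container', 'city': 'atlanta', 'quantity': 5},
    {'product': 'container', 'city': 'atlanta', 'quantity': 8},
]

# defaultdict(int) returns 0 for unseen keys, so no "if key not in" check is needed
totals = defaultdict(int)
for ld in list_of_dicts:
    totals[ld["product"], ld["city"]] += ld["quantity"]

result_list_of_dicts = [
    {"product": product, "city": city, "quantity": quantity}
    for (product, city), quantity in totals.items()
]
print(result_list_of_dicts)
```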
I have read 2 CSV files into 2 separate dictionaries. Now I need to merge (join) them on the zip code column. Please advise. Here is the sample data:
Data1:
{
'10029': {'Zipcode': '10029', 'City': 'New York', 'State': 'NY'},
'11221': {'Zipcode': '11221', 'City': 'Brooklyn', 'State': 'NY'},
'10162': {'Zipcode': '10162', 'City': 'New York', 'State': 'NY'}
}
Data2:
{
'10029': {'Zipcode': '10029', 'Latitude': '40.82374', 'Longitude': '-73.9373'},
'11211': {'Zipcode': '11211', 'Latitude': '40.72354', 'Longitude': '-73.98295'},
'10162': {'Zipcode': '10162', 'Latitude': '41.75554', 'Longitude': '-72.94225'}
}
Merged_Data (expected result):
{
'10029': {'Zipcode': '10029', 'City': 'New York', 'State': 'NY', 'Latitude': '40.82374', 'Longitude': '-73.9373'},
'10162': {'Zipcode': '10162', 'City': 'New York', 'State': 'NY', 'Latitude': '41.75554', 'Longitude': '-72.94225'}
}
Only 2 zip codes (10029 and 10162) appear in both dictionaries, so only 2 entries are expected.
Here is the code I have, which does not seem to work:
Data1[Zipcode] = Data2
if Data1[Zipcode] == Data2['Zipcode']:
Data1= Data2.append(['Zipcode'],['Longitude'],['Latitude'])
You cannot append to a dictionary.
What I would do is:
merged = dict()
for key in Data1:
    if key in Data2:
        merged[key] = {**Data1[key], **Data2[key]}
print(merged)
Result:
{
'10029': {'Zipcode': '10029', 'City': 'New York', 'State': 'NY', 'Latitude': '40.82374', 'Longitude': '-73.9373'},
'10162': {'Zipcode': '10162', 'City': 'New York', 'State': 'NY', 'Latitude': '41.75554', 'Longitude': '-72.94225'}
}
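Since the data comes from CSV files, another option is to let pandas do an inner join on Zipcode. The sketch below starts from the two dictionaries above. pandas is an extra dependency that the question does not use:

```python
import pandas as pd

Data1 = {
    '10029': {'Zipcode': '10029', 'City': 'New York', 'State': 'NY'},
    '11221': {'Zipcode': '11221', 'City': 'Brooklyn', 'State': 'NY'},
    '10162': {'Zipcode': '10162', 'City': 'New York', 'State': 'NY'},
}
Data2 = {
    '10029': {'Zipcode': '10029', 'Latitude': '40.82374', 'Longitude': '-73.9373'},
    '11211': {'Zipcode': '11211', 'Latitude': '40.72354', 'Longitude': '-73.98295'},
    '10162': {'Zipcode': '10162', 'Latitude': '41.75554', 'Longitude': '-72.94225'},
}

# One row per zip code; the outer dict keys become the index
df1 = pd.DataFrame.from_dict(Data1, orient="index")
df2 = pd.DataFrame.from_dict(Data2, orient="index")

# how="inner" keeps only zip codes present in both tables
merged_df = df1.merge(df2, on="Zipcode", how="inner")

# Back to the {zipcode: {column: value}} shape; drop=False keeps Zipcode as a column
merged = merged_df.set_index("Zipcode", drop=False).to_dict("index")
print(merged)
```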
You can iterate through the first data keys, check if this key is in the second dict and if so, merge both dicts:
merged_data = dict()
for key in Data1.keys():
    if key in Data2.keys():
        merged_data.update({key: {**Data1[key], **Data2[key]}})
If you would rather use a dict comprehension:
merged_data = {k: {**Data1[k], **Data2[k]} for k in Data1.keys() if k in Data2.keys()}
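Dictionary key views support set operations, so you can compute the common zip codes with &. On Python 3.9+ you can also combine the two inner dicts with the | operator instead of {**a, **b}. A set intersection does not preserve Data1's key order, so the sketch sorts the keys:

```python
Data1 = {
    '10029': {'Zipcode': '10029', 'City': 'New York', 'State': 'NY'},
    '11221': {'Zipcode': '11221', 'City': 'Brooklyn', 'State': 'NY'},
    '10162': {'Zipcode': '10162', 'City': 'New York', 'State': 'NY'},
}
Data2 = {
    '10029': {'Zipcode': '10029', 'Latitude': '40.82374', 'Longitude': '-73.9373'},
    '11211': {'Zipcode': '11211', 'Latitude': '40.72354', 'Longitude': '-73.98295'},
    '10162': {'Zipcode': '10162', 'Latitude': '41.75554', 'Longitude': '-72.94225'},
}

# Keys present in both dicts (keys views behave like sets)
common = Data1.keys() & Data2.keys()

# d1 | d2 (Python 3.9+) merges two dicts; values from d2 win on clashes
merged_data = {k: Data1[k] | Data2[k] for k in sorted(common)}
print(merged_data)
```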
I have a variable, cities, holding a list imported from Excel that looks like this:
cities= [{'City': 'Buenos Aires',
'Country': 'Argentina',
'Population': 2891000,
'Area': 4758},
{'City': 'Toronto',
'Country': 'Canada',
'Population': 2800000,
'Area': 2731571},
{'City': 'Pyeongchang',
'Country': 'South Korea',
'Population': 2581000,
'Area': 3194},
{'City': 'Marakesh', 'Country': 'Morocco', 'Population': 928850, 'Area': 200},
{'City': 'Albuquerque',
'Country': 'New Mexico',
'Population': 559277,
'Area': 491},
{'City': 'Los Cabos',
'Country': 'Mexico',
'Population': 287651,
'Area': 3750},
{'City': 'Greenville', 'Country': 'USA', 'Population': 84554, 'Area': 68},
{'City': 'Archipelago Sea',
'Country': 'Finland',
'Population': 60000,
'Area': 8300},
{'City': 'Walla Walla Valley',
'Country': 'USA',
'Population': 32237,
'Area': 33},
{'City': 'Salina Island', 'Country': 'Italy', 'Population': 4000, 'Area': 27},
{'City': 'Solta', 'Country': 'Croatia', 'Population': 1700, 'Area': 59},
{'City': 'Iguazu Falls',
'Country': 'Argentina',
'Population': 0,
'Area': 672}]
I just want the 'Population' value from each city.
What is the most efficient or easiest way to make a list of each city's 'Population'?
Below is the code that I came up with, but it's inefficient.
City_Population = [cities[0]['Population'], cities[1]['Population'], cities[2]['Population']]
I am currently learning Python and any advice would be helpful!
Thank you!
Using list comprehension:
print([city['Population'] for city in cities])
OUTPUT:
[2891000, 2800000, 2581000, 928850, 559277, 287651, 84554, 60000, 32237, 4000, 1700, 0]
EDIT:
Assuming some cities have no 'Population' key:
print([city['Population'] for city in cities if 'Population' in city])
OUTPUT (after removing 'Population' from a few cities in the list):
[2891000, 2800000, 2581000, 928850, 287651, 84554, 32237, 4000]
Use dict.get(). That way you get None instead of a KeyError for any city whose 'Population' is not defined.
populations = [city.get('Population') for city in cities]
If you don't want the empty values:
populations = [pop for pop in populations if pop is not None]
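You can fold the two steps into a single pass with an assignment expression (Python 3.8+). The "is not None" test keeps legitimate zeros such as Iguazu Falls' population of 0, which a plain truthiness check (if city.get('Population')) would drop. A sketch with a shortened cities list that includes a hypothetical city with no 'Population' key:

```python
# Shortened sample; the 'Atlantis' entry is hypothetical and has no Population key
cities = [
    {"City": "Buenos Aires", "Country": "Argentina", "Population": 2891000, "Area": 4758},
    {"City": "Atlantis", "Country": "Unknown", "Area": 1},
    {"City": "Iguazu Falls", "Country": "Argentina", "Population": 0, "Area": 672},
]

# := binds the looked-up value so it can be both tested and kept
populations = [pop for city in cities if (pop := city.get("Population")) is not None]
print(populations)  # [2891000, 0]
```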