I have the following list of dicts:
list_of_dicts = [
{'product': 'car', 'city': 'new york', 'quantity': 13},
{'product': 'car', 'city': 'new york', 'quantity': 25},
{'product': 'bus', 'city': 'miami', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 8}
]
My goal is to sum the values of 'quantity' whenever the values of 'product' and 'city' are the same.
The result should look like this:
result_list_of_dicts = [
{'product': 'car', 'city': 'new york', 'quantity': 38},
{'product': 'bus', 'city': 'miami', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 13},
]
Is there a Pythonic way to do this? I tried a couple of things, but I'd rather not show them because they are really ugly.
Thank you in advance!
You can do the following, using only standard library utils:
from operator import itemgetter
from functools import reduce
from itertools import groupby
pc = itemgetter("product", "city") # sorting and grouping key
q = itemgetter("quantity")
combine = lambda d1, d2: {**d1, "quantity": q(d1) + q(d2)}
[reduce(combine, g) for _, g in groupby(sorted(list_of_dicts, key=pc), key=pc)]
# [{'product': 'bus', 'city': 'miami', 'quantity': 5},
# {'product': 'car', 'city': 'new york', 'quantity': 38},
# {'product': 'container', 'city': 'atlanta', 'quantity': 13}]
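The sorted() call matters here: itertools.groupby only merges adjacent items with equal keys. A minimal sketch, using a hypothetical shortened list named rows, that compares the group keys with and without sorting:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample: the two 'car' rows are NOT adjacent.
rows = [
    {"product": "car", "city": "new york", "quantity": 13},
    {"product": "bus", "city": "miami", "quantity": 5},
    {"product": "car", "city": "new york", "quantity": 25},
]
pc = itemgetter("product", "city")

# Without sorting, the 'car' key shows up as two separate groups.
unsorted_keys = [key for key, _ in groupby(rows, key=pc)]

# After sorting by the same key, each key appears exactly once.
sorted_keys = [key for key, _ in groupby(sorted(rows, key=pc), key=pc)]

print(unsorted_keys)
print(sorted_keys)
```

This is also why the result comes back ordered by key rather than in the original order.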
Or, maybe even simpler and linear:
from collections import Counter
pc = itemgetter("product", "city")
q = itemgetter("quantity")
totals = Counter()
for dct in list_of_dicts:
    totals[pc(dct)] += q(dct)
result_list_of_dicts = [
    {"product": p, "city": c, "quantity": qty} for (p, c), qty in totals.items()
]
One approach using collections.Counter
from collections import Counter
list_of_dicts = [
{'product': 'car', 'city': 'new york', 'quantity': 13},
{'product': 'car', 'city': 'new york', 'quantity': 25},
{'product': 'bus', 'city': 'miami', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 8}
]
counts = sum((Counter({(d["product"], d["city"]): d["quantity"]}) for d in list_of_dicts), Counter())
result = [{"product": product, "city": city, "quantity": quantity} for (product, city), quantity in counts.items()]
print(result)
A pandas implementation
Group by "product" and "city", sum over the groups and reset index to get original columns.
import pandas as pd
list_of_dicts = [
{'product': 'car', 'city': 'new york', 'quantity': 13},
{'product': 'car', 'city': 'new york', 'quantity': 25},
{'product': 'bus', 'city': 'miami', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 8}
]
df = pd.DataFrame(list_of_dicts)
print(df)
df = df.groupby(["product", "city"]).sum().reset_index()
print(df)
summed_dict = df.to_dict("records")
print(summed_dict)
You could do it with a loop, initializing each product's entry the first time you encounter it.
list_of_dicts = [
{'product': 'car', 'city': 'new york', 'quantity': 13},
{'product': 'car', 'city': 'new york', 'quantity': 25},
{'product': 'bus', 'city': 'miami', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 5},
{'product': 'container', 'city': 'atlanta', 'quantity': 8}
]
new_dict = {}
for ld in list_of_dicts:
    if ld['product'] not in new_dict:
        new_dict[ld['product']] = {}
        new_dict[ld['product']]['city'] = ld['city']
        new_dict[ld['product']]['quantity'] = 0
    new_dict[ld['product']]['quantity'] += ld['quantity']
# print(new_dict)
# {'car': {'city': 'new york', 'quantity': 38}, 'bus': {'city': 'miami', 'quantity': 5}, 'container': {'city': 'atlanta', 'quantity': 13}}
result_list_of_dicts = [{'product': nd,
'city': new_dict[nd]['city'],
'quantity': new_dict[nd]['quantity']} for nd in new_dict]
# print(result_list_of_dicts)
# [{'product': 'car', 'city': 'new york', 'quantity': 38}, {'product': 'bus', 'city': 'miami', 'quantity': 5}, {'product': 'container', 'city': 'atlanta', 'quantity': 13}]
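Note that the loop above keys on 'product' alone, so the same product sold in two different cities would be merged into one entry. A minimal sketch that keys on both fields instead; the helper name sum_quantities and the 'boston' row are hypothetical, added only for illustration:

```python
from collections import defaultdict

def sum_quantities(rows):
    """Sum 'quantity' for rows that share the same (product, city) pair."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["product"], row["city"])] += row["quantity"]
    # Dicts keep insertion order, so groups appear in first-seen order.
    return [
        {"product": product, "city": city, "quantity": quantity}
        for (product, city), quantity in totals.items()
    ]

rows = [
    {"product": "car", "city": "new york", "quantity": 13},
    {"product": "car", "city": "boston", "quantity": 2},  # hypothetical extra row
    {"product": "car", "city": "new york", "quantity": 25},
]
result = sum_quantities(rows)
print(result)
```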
Related
I have a list of dicts that looks like this. They have been sorted so that, among duplicate IDs, the one I want to keep comes first.
[
{'id': '23', 'type': 'car', 'price': '445'},
{'id': '23', 'type': 'car', 'price': '78'},
{'id': '23', 'type': 'car', 'price': '34'},
{'id': '125', 'type': 'truck', 'price': '998'},
{'id': '125', 'type': 'truck', 'price': '722'},
{'id': '125', 'type': 'truck', 'price': '100'},
{'id': '87', 'type': 'bike', 'price': '50'},
]
What is the simplest way to remove rows that have duplicate IDs but always keep the first one? In this instance the end result would look like this...
[
{'id': '23', 'type': 'car', 'price': '445'},
{'id': '125', 'type': 'truck', 'price': '998'},
{'id': '87', 'type': 'bike', 'price': '50'},
]
I know I can remove duplicates from a list by converting it to a set with set(my_list), but in this instance I want to remove duplicates by ID.
Since you already have the list sorted properly, a simple way to do this is to use itertools.groupby to grab the first element of each group in a list comprehension:
from itertools import groupby
l= [
{'id': '23', 'type': 'car', 'price': '445'},
{'id': '23', 'type': 'car', 'price': '78'},
{'id': '23', 'type': 'car', 'price': '34'},
{'id': '125', 'type': 'truck', 'price': '998'},
{'id': '125', 'type': 'truck', 'price': '722'},
{'id': '125', 'type': 'truck', 'price': '100'},
{'id': '87', 'type': 'bike', 'price': '50'},
]
[next(g) for k, g in groupby(l, key=lambda d: d['id'])]
# [{'id': '23', 'type': 'car', 'price': '445'},
# {'id': '125', 'type': 'truck', 'price': '998'},
# {'id': '87', 'type': 'bike', 'price': '50'}]
I would probably convert to Pandas DataFrame and then use drop_duplicates
import pandas as pd
data = [
{'id': '23', 'type': 'car', 'price': '445'},
{'id': '23', 'type': 'car', 'price': '78'},
{'id': '23', 'type': 'car', 'price': '34'},
{'id': '125', 'type': 'truck', 'price': '998'},
{'id': '125', 'type': 'truck', 'price': '722'},
{'id': '125', 'type': 'truck', 'price': '100'},
{'id': '87', 'type': 'bike', 'price': '50'},
]
df = pd.DataFrame(data)
df.drop_duplicates(subset=['id'], inplace=True)
print(df.to_dict('records'))
# Output
# [{'id': '23', 'type': 'car', 'price': '445'},
# {'id': '125', 'type': 'truck', 'price': '998'},
# {'id': '87', 'type': 'bike', 'price': '50'}]
Here's an answer that involves no external modules or unnecessary manipulation of the data:
data = [
{'id': '23', 'type': 'car', 'price': '445'},
{'id': '23', 'type': 'car', 'price': '78'},
{'id': '23', 'type': 'car', 'price': '34'},
{'id': '125', 'type': 'truck', 'price': '998'},
{'id': '125', 'type': 'truck', 'price': '722'},
{'id': '125', 'type': 'truck', 'price': '100'},
{'id': '87', 'type': 'bike', 'price': '50'},
]
seen = set()
result = [row for row in data if row['id'] not in seen and not seen.add(row['id'])]
print(result)
Result:
[{'id': '23', 'type': 'car', 'price': '445'},
{'id': '125', 'type': 'truck', 'price': '998'},
{'id': '87', 'type': 'bike', 'price': '50'}]
Note that the not seen.add(row['id']) part of the list comprehension is always True, because set.add returns None. It is just a way of recording that an id has been seen by adding it to the seen set.
Assume the given list is named data.
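If the side effect inside the comprehension feels too clever, the same idea can be spelled out as a plain loop. A minimal sketch; the function name keep_first_by_id and its key parameter are hypothetical:

```python
def keep_first_by_id(rows, key="id"):
    """Return rows in order, keeping only the first row for each key value."""
    seen = set()
    result = []
    for row in rows:
        if row[key] in seen:
            continue  # a row with this id was already kept
        seen.add(row[key])  # set.add returns None, hence the 'not' trick above
        result.append(row)
    return result

data = [
    {'id': '23', 'type': 'car', 'price': '445'},
    {'id': '23', 'type': 'car', 'price': '78'},
    {'id': '125', 'type': 'truck', 'price': '998'},
    {'id': '125', 'type': 'truck', 'price': '722'},
    {'id': '87', 'type': 'bike', 'price': '50'},
]
result = keep_first_by_id(data)
print(result)
```

Unlike the groupby approach, this does not require duplicate ids to be adjacent.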
unique_ids = []
result = []
for item in data:
    if item["id"] not in unique_ids:
        result.append(item)
        unique_ids.append(item["id"])
print(result)
The result will be:
[{'id': '23', 'type': 'car', 'price': '445'},
{'id': '125', 'type': 'truck', 'price': '998'},
{'id': '87', 'type': 'bike', 'price': '50'}]
I have a column in my dataframe of type object that has values like:
for i in df3['placeholders'][:10]:
    print(i)
Output:
[{'type': 'experience', 'label': '0-1 Yrs'}, {'type': 'salary', 'label': '1,00,000 - 1,25,000 PA.'}, {'type': 'location', 'label': 'Chennai'}]
[{'type': 'date', 'label': '08 October - 13 October'}, {'type': 'salary', 'label': 'Not disclosed'}, {'type': 'location', 'label': 'Chennai'}]
[{'type': 'education', 'label': 'B.Com'}, {'type': 'salary', 'label': 'Not disclosed'}, {'type': 'location', 'label': 'Mumbai Suburbs, Navi Mumbai, Mumbai'}]
[{'type': 'experience', 'label': '0-2 Yrs'}, {'type': 'salary', 'label': '50,000 - 2,00,000 PA.'}, {'type': 'location', 'label': 'Chennai'}]
[{'type': 'experience', 'label': '0-1 Yrs'}, {'type': 'salary', 'label': '2,00,000 - 2,25,000 PA.'}, {'type': 'location', 'label': 'Bengaluru(JP Nagar)'}]
[{'type': 'experience', 'label': '0-3 Yrs'}, {'type': 'salary', 'label': '80,000 - 2,00,000 PA.'}, {'type': 'location', 'label': 'Hyderabad'}]
[{'type': 'experience', 'label': '0-5 Yrs'}, {'type': 'salary', 'label': 'Not disclosed'}, {'type': 'location', 'label': 'Hyderabad'}]
[{'type': 'experience', 'label': '0-1 Yrs'}, {'type': 'salary', 'label': '1,25,000 - 2,00,000 PA.'}, {'type': 'location', 'label': 'Mumbai'}]
[{'type': 'date', 'label': '08 October - 17 October'}, {'type': 'salary', 'label': 'Not disclosed'}, {'type': 'location', 'label': 'Pune(Bavdhan)'}]
[{'type': 'experience', 'label': '0-2 Yrs'}, {'type': 'salary', 'label': 'Not disclosed'}, {'type': 'location', 'label': 'Jaipur'}]
[{'type': 'experience', 'label': '0-0 Yrs'}, {'type': 'salary', 'label': '1,00,000 - 1,50,000 PA.'}, {'type': 'location', 'label': 'Delhi NCR(Sector-81 Noida)'}]
I want to add more columns to my existing dataframe by extracting features from this column such that
value of "type"= Column name
value of "label"= value under the column
The final expected output:
df.head(3)
Output:
..... experience, salary, location, date, education
..... 0-1 Yrs, 1,00,000 - 1,25,000 PA., Chennai, nan, nan
..... nan, 1,00,000 - 1,25,000 PA., Chennai, 08 October - 13 October, nan
..... nan, Not disclosed, Mumbai Suburbs, Navi Mumbai, Mumbai, nan, B.Com
The first answer worked.
[EDIT 2]
Later, I tried the code suggested in the first answer on a new dataset with the same issue, and got the following error:
<ipython-input-23-ad8e644044af> in <listcomp>(.0)
----> 1 new_columns = set([d['Name'] for l in dfr.RatingDistribution.values for d in l ])
2 # Make a dict of dicts
3 col_val_dict = {}
4 for col_name in new_columns:
5 col_val_dict[col_name] = {}
TypeError: 'float' object is not iterable
My Input column:
RatingDistribution
[{'Name': 'Work-Life Balance', 'count': 5}, {'Name': 'Skill Development', 'count': 5}, {'Name': 'Salary & Benefits', 'count': 5}, {'Name': 'Job Security', 'count': 5}, {'Name': 'Company Culture', 'count': 5}, {'Name': 'Career Growth', 'count': 5}, {'Name': 'Work Satisfaction', 'count': 5}]
[{'Name': 'Work-Life Balance', 'count': 4}, {'Name': 'Skill Development', 'count': 5}, {'Name': 'Salary & Benefits', 'count': 4}, {'Name': 'Job Security', 'count': 4}, {'Name': 'Company Culture', 'count': 3}, {'Name': 'Career Growth', 'count': 3}, {'Name': 'Work Satisfaction', 'count': 5}]
[{'Name': 'Work-Life Balance', 'count': 3}, {'Name': 'Skill Development', 'count': 4}, {'Name': 'Salary & Benefits', 'count': 5}, {'Name': 'Job Security', 'count': 4}, {'Name': 'Company Culture', 'count': 5}, {'Name': 'Career Growth', 'count': 4}, {'Name': 'Work Satisfaction', 'count': 4}]
[{'Name': 'Work-Life Balance', 'count': 5}, {'Name': 'Skill Development', 'count': 5}, {'Name': 'Salary & Benefits', 'count': 5}, {'Name': 'Job Security', 'count': 5}, {'Name': 'Company Culture', 'count': 5}, {'Name': 'Career Growth', 'count': 5}, {'Name': 'Work Satisfaction', 'count': 5}]
[{'Name': 'Work-Life Balance', 'count': 3}, {'Name': 'Skill Development', 'count': 5}, {'Name': 'Salary & Benefits', 'count': 3}, {'Name': 'Job Security', 'count': 3}, {'Name': 'Company Culture', 'count': 3}, {'Name': 'Career Growth', 'count': 3}, {'Name': 'Work Satisfaction', 'count': 4}]
[{'Name': 'Work-Life Balance', 'count': 3}, {'Name': 'Skill Development', 'count': 5}, {'Name': 'Salary & Benefits', 'count': 5}, {'Name': 'Job Security', 'count': 1}, {'Name': 'Company Culture', 'count': 3}, {'Name': 'Career Growth', 'count': 1}, {'Name': 'Work Satisfaction', 'count': 1}]
My code:
new_columns = set([d['Name'] for l in dfr.RatingDistribution.values for d in l ])
# Make a dict of dicts
col_val_dict = {}
for col_name in new_columns:
    col_val_dict[col_name] = {}
    # For each column name look to see if a row has that as a type
    # If so, get the label for that dict
    # otherwise fill it with NaN
    for i,l in enumerate(dfr.placeholders.values):
        the_label = [d['count'] for d in l if d['Name'] == col_name]
        if the_label:
            col_val_dict[col_name][i] = the_label[0]
        else:
            col_val_dict[col_name][i] = np.NaN
# Merge this new dfa with the old one
merged_dfa = pd.concat([dfr,pd.DataFrame(col_val_dict)],axis='columns')
dfr.shape
I'm getting an error on the very first line, and I can't figure out why it raises the float error.
PLEASE HELP
import numpy as np
import pandas as pd
# Get the unique types (column names)
new_columns = set([d['type'] for l in df3.placeholders.values for d in l ])
# Make a dict of dicts
col_val_dict = {}
for col_name in new_columns:
    col_val_dict[col_name] = {}
    # For each column name look to see if a row has that as a type
    # If so, get the label for that dict
    # otherwise fill it with NaN
    for i,l in enumerate(df3.placeholders.values):
        the_label = [d['label'] for d in l if d['type'] == col_name]
        if the_label:
            col_val_dict[col_name][i] = the_label[0]
        else:
            col_val_dict[col_name][i] = np.nan
# Merge this new df with the old one
merged_df = pd.concat([df3,pd.DataFrame(col_val_dict)],axis='columns')
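Regarding the TypeError in the edit: one likely cause (an assumption, since the full column is not shown) is that some rows of RatingDistribution hold NaN, which is a float, instead of a list. A minimal sketch in plain Python that skips such cells; the helper name expand_cells and the sample cells are hypothetical:

```python
import math

def expand_cells(cells, name_key, value_key):
    """Turn each list-of-dicts cell into one flat dict; non-list cells become {}."""
    rows = []
    for cell in cells:
        if isinstance(cell, list):
            rows.append({d[name_key]: d[value_key] for d in cell})
        else:
            rows.append({})  # e.g. NaN: nothing to extract for this row
    return rows

# Hypothetical column values, with a missing (NaN) cell in the middle.
cells = [
    [{"Name": "Job Security", "count": 5}, {"Name": "Career Growth", "count": 4}],
    math.nan,
    [{"Name": "Job Security", "count": 3}],
]
expanded = expand_cells(cells, "Name", "count")
print(expanded)
```

Passing such a list of dicts to pd.DataFrame should then give one column per name, with NaN wherever a key is absent from a row.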
Imagine I have the following list of dictionaries. For every record (row of data), I want to merge the dictionaries of sub-fields into a single dictionary, so that in the end I have a list of dictionaries, one per record.
Data = [{'Name': 'bob', 'age': '40'},
{'Name': 'tom', 'age': '30'},
{'Country': 'US', 'City': 'Boston'},
{'Country': 'US', 'City': 'New York'},
{'Email': 'bob#fake.com', 'Phone': 'bob phone'},
{'Email': 'tom#fake.com', 'Phone': 'none'}]
Output = [
{'Name': 'bob', 'age': '40', 'Country': 'US', 'City': 'Boston', 'Email': 'bob#fake.com', 'Phone': 'bob phone'},
{'Name': 'tom', 'age': '30', 'Country': 'US', 'City': 'New York', 'Email': 'tom#fake.com', 'Phone': 'none'}
]
Related: How do I merge a list of dicts into a single dict?
I understand you know which dictionary relates to Bob and which dictionary relates to Tom by their position: dictionaries at even positions relate to Bob, while dictionaries at odd positions relate to Tom.
You can check whether a number is odd or even using % 2:
Data = [{'Name': 'bob', 'age': '40'},
{'Name': 'tom', 'age': '30'},
{'Country': 'US', 'City': 'Boston'},
{'Country': 'US', 'City': 'New York'},
{'Email': 'bob#fake.com', 'Phone': 'bob phone'},
{'Email': 'tom#fake.com', 'Phone': 'none'}]
bob_dict = {}
tom_dict = {}
for i,d in enumerate(Data):
    if i % 2 == 0:
        bob_dict.update(d)
    else:
        tom_dict.update(d)
Output=[bob_dict, tom_dict]
Or alternatively:
Output = [{}, {}]
for i, d in enumerate(Data):
    Output[i%2].update(d)
This second approach is shorter to write and easier to scale if you have more than 2 people.
Splitting the list into more than 2 dictionaries
k = 4 # number of dictionaries you want
Data = [{'Name': 'Alice', 'age': '40'},
{'Name': 'Bob', 'age': '30'},
{'Name': 'Charlie', 'age': '30'},
{'Name': 'Diane', 'age': '30'},
{'Country': 'US', 'City': 'Boston'},
{'Country': 'US', 'City': 'New York'},
{'Country': 'UK', 'City': 'London'},
{'Country': 'UK', 'City': 'Oxford'},
{'Email': 'alice#fake.com', 'Phone': 'alice phone'},
{'Email': 'bob#fake.com', 'Phone': '12345'},
{'Email': 'charlie#fake.com', 'Phone': '0000000'},
{'Email': 'diane#fake.com', 'Phone': 'none'}]
Output = [{} for j in range(k)]
for i, d in enumerate(Data):
    Output[i%k].update(d)
# Output = [
# {'Name': 'Alice', 'age': '40', 'Country': 'US', 'City': 'Boston', 'Email': 'alice#fake.com', 'Phone': 'alice phone'},
# {'Name': 'Bob', 'age': '30', 'Country': 'US', 'City': 'New York', 'Email': 'bob#fake.com', 'Phone': '12345'},
# {'Name': 'Charlie', 'age': '30', 'Country': 'UK', 'City': 'London', 'Email': 'charlie#fake.com', 'Phone': '0000000'},
# {'Name': 'Diane', 'age': '30', 'Country': 'UK', 'City': 'Oxford', 'Email': 'diane#fake.com', 'Phone': 'none'}
#]
Additionally, instead of hardcoding k = 4:
If you know the number of fields but not the number of people, you can compute k by dividing the initial number of dictionaries by the number of dictionary types:
fields = ['Name', 'Country', 'Email']
assert(len(Data) % len(fields) == 0) # make sure Data is consistent with number of fields
k = len(Data) // len(fields)
Or alternatively, you can compute k by counting how many dictionaries contain the 'Name' field:
k = sum(1 for d in Data if 'Name' in d)
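Putting the pieces together, a minimal sketch that wraps the modulo loop in a function and derives k from the 'Name' count. The function name merge_by_position is hypothetical, and the sketch assumes the dictionaries are laid out field by field, with people in the same order within each field:

```python
def merge_by_position(dicts, k):
    """Merge dicts whose indices are congruent modulo k into k combined dicts."""
    if k <= 0 or len(dicts) % k != 0:
        raise ValueError("number of dicts must be a positive multiple of k")
    merged = [{} for _ in range(k)]
    for i, d in enumerate(dicts):
        merged[i % k].update(d)
    return merged

data = [
    {'Name': 'bob', 'age': '40'},
    {'Name': 'tom', 'age': '30'},
    {'Country': 'US', 'City': 'Boston'},
    {'Country': 'US', 'City': 'New York'},
    {'Email': 'bob#fake.com', 'Phone': 'bob phone'},
    {'Email': 'tom#fake.com', 'Phone': 'none'},
]
k = sum(1 for d in data if 'Name' in d)  # one 'Name' dict per person
output = merge_by_position(data, k)
print(output)
```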
Below is the result I got from an API query.
[{'type':'book','title': 'example1', 'id': 12456, 'price': '8.20', 'qty': '12', 'status': 'available'},
{'type':'book','title': 'example2', 'id': 12457, 'price': '10.50', 'qty': '5', 'status': 'none'}]
How do I keep only the key-value pairs for title, price, and status?
The result should look like this:
[{'title': 'example1', 'price': '8.20', 'status': 'available'},
{'title': 'example2', 'price': '10.50', 'status': 'none'}]
You can use a dictionary comprehension within a list comprehension:
L = [{'type':'book','title': 'example1', 'id': 12456, 'price': '8.20', 'qty': '12', 'status': 'available'},
{'type':'book','title': 'example2', 'id': 12457, 'price': '10.50', 'qty': '5', 'status': 'none'}]
keys = ['title', 'price', 'status']
res = [{k: d[k] for k in keys} for d in L]
print(res)
[{'price': '8.20', 'status': 'available', 'title': 'example1'},
{'price': '10.50', 'status': 'none', 'title': 'example2'}]
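The comprehension above raises KeyError if any record lacks one of the keys. A minimal sketch of a tolerant variant, assuming it is acceptable to simply omit missing keys; the helper name pick_keys and the record without 'status' are hypothetical:

```python
def pick_keys(rows, keys):
    """Keep only the given keys in each row, skipping keys a row does not have."""
    return [{k: row[k] for k in keys if k in row} for row in rows]

records = [
    {'type': 'book', 'title': 'example1', 'id': 12456, 'price': '8.20', 'qty': '12', 'status': 'available'},
    {'type': 'book', 'title': 'example2', 'id': 12457, 'price': '10.50', 'qty': '5'},  # no 'status'
]
res = pick_keys(records, ['title', 'price', 'status'])
print(res)
```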
class Weightcheck:
    def bag_products(self,product_list):
        bag_list = []
        non_bag_items = []
        MAX_BAG_WEIGHT = 5.0
        for product in product_list:
            if float(product['weight']) > MAX_BAG_WEIGHT:
                product_list.remove(product)
                non_bag_items.append(product)
and the argument product_list looks like this:
product_list = {'barcode': [123, 456], 'Name': ['Milk, 2 Litres', 'Bread'], 'Price': ['2', '3.5'], 'weight': ['2', '0.6']}
If the passed argument looks like this:
product_list = [{'name': 'Milk', 'price': 2.0, 'weight': 2.0},
{'name': 'LowfatMilk', 'price': 2.0, 'weight': 2.0},
{'name': 'HighfatMilk', 'price': 2.0, 'weight': 2.0},
{'name': 'Bread', 'price': 2.0, 'weight': 7.0}]
then it works properly (that is, when it is a list of dictionaries). How can I make it work with the first form?
This is not the best way but you can use something like this:
final_list = []
for i in range(len(product_in_basket['Name'])):
    item ={} # each new item
    for k,v in product_in_basket.items():
        item[k]= v[i] # filling that item with specific index
    final_list.append(item) # append to final list
> final_list
[
{'Name': 'Milk, 2 Litres', 'Price': '2', 'barcode': 123, 'weight': '2.0'},
{'Name': 'Bread', 'Price': '3.5', 'barcode': 456, 'weight': '0.6'}
]
Here's a one-liner that does the trick:
product_list = [dict(zip(product_in_basket,t)) for t in zip(*product_in_basket.values())]
print(product_list)
Output:
[{'Name': 'Milk, 2 Litres', 'Price': '2', 'barcode': 123, 'weight': '2.0'}, {'Name': 'Bread', 'Price': '3.5', 'barcode': 456, 'weight': '0.6'}]
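One caveat with the one-liner: zip stops at the shortest column, so a column with a missing value silently drops rows. A minimal sketch that checks the lengths first; the function name columns_to_rows is hypothetical:

```python
def columns_to_rows(columns):
    """Convert a dict of equal-length lists into a list of row dicts."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*values) yields one tuple per row; zip(columns, row) pairs keys with it.
    return [dict(zip(columns, row)) for row in zip(*columns.values())]

product_in_basket = {
    'barcode': [123, 456],
    'Name': ['Milk, 2 Litres', 'Bread'],
    'Price': ['2', '3.5'],
    'weight': ['2.0', '0.6'],
}
rows = columns_to_rows(product_in_basket)
print(rows)
```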
In general, it's better to not use a library when plain Python will do, but I thought a solution using pandas might be interesting:
import pandas as pd
product_in_basket = {'barcode': [123, 456], 'Name': ['Milk, 2 Litres', 'Bread'],
'Price': ['2', '3.5'], 'weight': ['2.0', '0.6']}
df = pd.DataFrame(product_in_basket)
output = list(df.T.to_dict().values())
print(output)
Output:
[{'Name': 'Milk, 2 Litres', 'Price': '2', 'barcode': 123, 'weight': '2.0'},
{'Name': 'Bread', 'Price': '3.5', 'barcode': 456, 'weight': '0.6'}]
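Once the data is a list of dicts, the weight check from the question can run on it. Note that the question's loop removes items from product_list while iterating over it, which can skip elements. A minimal sketch that builds two new lists instead; the function name split_by_weight and its return shape are assumptions, not the asker's API:

```python
MAX_BAG_WEIGHT = 5.0  # same limit as in the question

def split_by_weight(products, max_weight=MAX_BAG_WEIGHT):
    """Return (bag_items, non_bag_items) without mutating the input list."""
    bag_items, non_bag_items = [], []
    for product in products:
        if float(product['weight']) > max_weight:
            non_bag_items.append(product)
        else:
            bag_items.append(product)
    return bag_items, non_bag_items

product_list = [
    {'name': 'Milk', 'price': 2.0, 'weight': 2.0},
    {'name': 'LowfatMilk', 'price': 2.0, 'weight': 2.0},
    {'name': 'HighfatMilk', 'price': 2.0, 'weight': 2.0},
    {'name': 'Bread', 'price': 2.0, 'weight': 7.0},
]
bag_items, non_bag_items = split_by_weight(product_list)
print(bag_items)
print(non_bag_items)
```

Because float() is applied to each weight, the same function also accepts string weights such as '0.6' from the converted dictionary.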