Remove duplicates from list of lists by column value

Remove duplicates from list of lists by column value - python

I have a list of list that look like this, they have been sorted so that duplicate IDs are arranged with the one I want to keep at the top..
[
{'id': '23', 'type': 'car', 'price': '445'},
{'id': '23', 'type': 'car', 'price': '78'},
{'id': '23', 'type': 'car', 'price': '34'},
{'id': '125', 'type': 'truck', 'price': '998'},
{'id': '125', 'type': 'truck', 'price': '722'},
{'id': '125', 'type': 'truck', 'price': '100'},
{'id': '87', 'type': 'bike', 'price': '50'},
]
What is the simplest way to remove rows that have duplicate IDs but always keep the first one? In this instance the end result would look like this...
[
{'id': '23', 'type': 'car', 'price': '445'},
{'id': '125', 'type': 'truck', 'price': '998'},
{'id': '87', 'type': 'bike', 'price': '50'},
]
I know I can remove duplicates from lists by converting to set like set(my_list) but in this instance it is duplicates by ID that I want to remove by

Since you already hav the list sorted properly, a simple way to do this is to use itertools.groupby to grab the first element of each group in a list comprehension:
from itertools import groupby
l= [
{'id': '23', 'type': 'car', 'price': '445'},
{'id': '23', 'type': 'car', 'price': '78'},
{'id': '23', 'type': 'car', 'price': '34'},
{'id': '125', 'type': 'truck', 'price': '998'},
{'id': '125', 'type': 'truck', 'price': '722'},
{'id': '125', 'type': 'truck', 'price': '100'},
{'id': '87', 'type': 'bike', 'price': '50'},
]
[next(g) for k, g in groupby(l, key=lambda d: d['id'])]
# [{'id': '23', 'type': 'car', 'price': '445'},
# {'id': '125', 'type': 'truck', 'price': '998'},
# {'id': '87', 'type': 'bike', 'price': '50'}]

I would probably convert to Pandas DataFrame and then use drop_duplicates
import pandas as pd
data = [
{'id': '23', 'type': 'car', 'price': '445'},
{'id': '23', 'type': 'car', 'price': '78'},
{'id': '23', 'type': 'car', 'price': '34'},
{'id': '125', 'type': 'truck', 'price': '998'},
{'id': '125', 'type': 'truck', 'price': '722'},
{'id': '125', 'type': 'truck', 'price': '100'},
{'id': '87', 'type': 'bike', 'price': '50'},
]
df = pd.DataFrame(data)
df.drop_duplicates(subset=['id'], inplace=True)
print(df.to_dict('records'))
# Output
# [{'id': '23', 'type': 'car', 'price': '445'},
# {'id': '125', 'type': 'truck', 'price': '998'},
# {'id': '87', 'type': 'bike', 'price': '50'}]

Here's an answer that involves no external modules or unnecessary manipulation of the data:
data = [
{'id': '23', 'type': 'car', 'price': '445'},
{'id': '23', 'type': 'car', 'price': '78'},
{'id': '23', 'type': 'car', 'price': '34'},
{'id': '125', 'type': 'truck', 'price': '998'},
{'id': '125', 'type': 'truck', 'price': '722'},
{'id': '125', 'type': 'truck', 'price': '100'},
{'id': '87', 'type': 'bike', 'price': '50'},
]
seen = set()
result = [row for row in data if row['id'] not in seen and not seen.add(row['id'])]
print(result)
Result:
[{'id': '23', 'type': 'car', 'price': '445'},
{'id': '125', 'type': 'truck', 'price': '998'},
{'id': '87', 'type': 'bike', 'price': '50'}]
Note that the not seen.add(row['id'])] part of the list comprehension will always be True. It's just a way of noting that a unique entry has been seen by adding it to the seen set.

Let's take the name of the given list as data.
unique_ids = []
result = []
for item in data:
if item["id"] not in unique_ids:
result.append(item)
unique_ids.append(item["id"])
print(result)
The result will be,
[{'id': '23', 'type': 'car', 'price': '445'},
{'id': '125', 'type': 'truck', 'price': '998'},
{'id': '87', 'type': 'bike', 'price': '50'}]

Related

create dictionary of values based on matching keys in list from nested dictionary

i have nested dictionary with upto 300 items from TYPE1 TO TYPE300 called mainlookup
mainlookup = {'TYPE1': [{'Song': 'Rock', 'Type': 'Hard', 'Price': '10'}],
'TYPE2': [{'Song': 'Jazz', 'Type': 'Slow', 'Price': '5'}],
'TYPE37': [{'Song': 'Country', 'Type': 'Fast', 'Price': '7'}]}
input list to search in lookup based on string TYPE1, TYPE2 and so one
input_list = ['thissong-fav-user:type1-chan-44-John',
'thissong-fav-user:type1-chan-45-kelly-md',
'thissong-fav-user:type2-rock-45-usa',
'thissong-fav-user:type737-chan-45-patrick-md',
'thissong-fav-user:type37-chan-45-kelly-md']
i want to find the string TYPE IN input_list and then create a dictionary as shown below
Output_Desired = {'thissong-fav-user:type1-chan-44-John': [{'Song': 'Rock', 'Type': 'Hard',
'Price':'10'}],
'thissong-fav-user:type1-chan-45-kelly-md': [{'Song': 'Rock', 'Type': 'Hard', 'Price': '10'}],
'thissong-fav-user:type2-rock-45-usa': [{'Song': 'Jazz', 'Type': 'Slow', 'Price': '5'}],
'thissong-fav-user:type37-chan-45-kelly-md': [{'Song': 'Country', 'Type': 'Fast', 'Price': '7'}]}
Note-thissong-fav-user:type737-chan-45-patrick-md in the list has no match so i want to create a
seperate list if value is not found in main lookup
Notfound_list = ['thissong-fav-user:type737-chan-45-patrick-md', and so on..]
Appreciate your help.

You can try this:
mainlookup = {'TYPE1': [{'Song': 'Rock', 'Type': 'Hard', 'Price': '10'}],
'TYPE2': [{'Song': 'Jazz', 'Type': 'Slow', 'Price': '5'}], 'TYPE37': [{'Song': 'Country', 'Type': 'Fast', 'Price': '7'}]}
input_list = ['thissong-fav-user:type1-chan-44-John',
'thissong-fav-user:type1-chan-45-kelly-md', 'thissong-fav-user:type737-chan-45-kelly-md']
dct={i:mainlookup[i.split(':')[1].split('-')[0].upper()] for i in input_list if i.split(':')[1].split('-')[0].upper() in mainlookup.keys()}
Notfoundlist=[i for i in input_list if i not in dct.keys() ]
print(dct)
print(Notfoundlist)
Output:
{'thissong-fav-user:type1-chan-44-John': [{'Song': 'Rock', 'Type': 'Hard', 'Price': '10'}], 'thissong-fav-user:type1-chan-45-kelly-md': [{'Song': 'Rock', 'Type': 'Hard', 'Price': '10'}]}
['thissong-fav-user:type737-chan-45-kelly-md']

An answer using regular expressions:
import re
from pprint import pprint
input_list = ['thissong-fav-user:type1-chan-44-John', 'thissong-fav-user:type1-chan-45-kelly-md', 'thissong-fav-user:type2-rock-45-usa', 'thissong-fav-user:type737-chan-45-patrick-md', 'thissong-fav-user:type37-chan-45-kelly-md']
mainlookup = {'TYPE2': {'Song': 'Reggaeton', 'Type': 'Hard', 'Price': '30'}, 'TYPE1': {'Song': 'Rock', 'Type': 'Hard', 'Price': '10'}, 'TYPE737': {'Song': 'Jazz', 'Type': 'Hard', 'Price': '99'}, 'TYPE37': {'Song': 'Rock', 'Type': 'Soft', 'Price': '1'}}
pattern = re.compile('type[0-9]+')
matches = [re.search(pattern, x).group(0) for x in input_list]
result = {x: [mainlookup[matches[i].upper()]] for i, x in enumerate(input_list)}
pprint(result)
Output:
{'thissong-fav-user:type1-chan-44-John': [{'Price': '10',
'Song': 'Rock',
'Type': 'Hard'}],
'thissong-fav-user:type1-chan-45-kelly-md': [{'Price': '10',
'Song': 'Rock',
'Type': 'Hard'}],
'thissong-fav-user:type2-rock-45-usa': [{'Price': '30',
'Song': 'Reggaeton',
'Type': 'Hard'}],
'thissong-fav-user:type37-chan-45-kelly-md': [{'Price': '1',
'Song': 'Rock',
'Type': 'Soft'}],
'thissong-fav-user:type737-chan-45-patrick-md': [{'Price': '99',
'Song': 'Jazz',
'Type': 'Hard'}]}

How to separate values in a dictionary to be put into CSV? [duplicate]

This question already has answers here:
How can I convert JSON to CSV?
(26 answers)
Closed 3 years ago.
I am trying to write my JSON output to CSV, but I'm not sure how to separate my values into individual columns
This is my current code
with open('dict.csv', 'w') as csv_file:
writer = csv.writer(csv_file)
for key, value in response.json().items():
writer.writerow([value])
print(value)
This is the csv file I am getting:
current csv file
This is the desired csv file/output I want to get:
desired output
This is an example of my JSON Output
[{'id': '123', 'custom_id': '12', 'company': 28, 'company_name': 'Sunshine'}, {'id': '224', 'custom_id': '14', 'company': 38, 'company_name': 'Flowers'},
{'id': '888', 'custom_id': '10', 'company': 99, 'company_name': 'Fields'}]
how about this JSON format? (a more complicated one)
[{'id': '777', 'custom_id': '000112', 'company': 28, 'company_name':
'Weddings Inc', 'delivery_address': '25 olive park terrace, 61234', 'delivery_timeslot': {'lower': '2019-12-06T10:00:00Z', 'upper': '2019-12-06T13:00:00Z', 'bounds': '[)'}, 'sender_name': 'Joline', 'sender_email': '', 'sender_contact': '91234567', 'removed': None, 'recipient_name': 'Joline', 'recipient_contact': '91866655', 'notes': '', 'items': [{'id': 21668, 'name': 'Loose hair flowers', 'quantity': 1, 'metadata': {}, 'removed': None}, {'id': 21667, 'name': "Groom's Boutonniere", 'quantity': 1, 'metadata': {}, 'removed': None}, {'id': 21666, 'name': 'Bridal Bouquet', 'quantity': 1, 'metadata': {}, 'removed': None}], 'latitude': '1.1234550920764211111', 'longitude': '103.864352476201000000', 'created': '2019-08-15T05:40:30.385467Z', 'updated': '2019-08-15T05:41:27.930110Z', 'status': 'pending', 'verbose_status': 'Pending', 'logs': [{'id': 56363, 'order': '50c402', 'order_custom_id': '000112', 'order_delivery_address': '25 olive park terrace, 61234', 'order_delivery_timeslot': {'lower': '2019-12-06T10:00:00Z', 'upper': '2019-12-06T13:00:00Z', 'bounds': '[)'}, 'message': 'Order was created.', 'failure_reason': None, 'success_code': None, 'success_description': None, 'created': '2019-08-15T05:40:30.431790Z', 'removed': None}, {'id': 56364, 'order': '50c402d8-7c76-45b5-b883-e2fb887a507e', 'order_custom_id': 'INV-000112', 'order_delivery_address': '25 olive park terrace, 61234', 'order_delivery_timeslot': {'lower': '2019-12-06T10:00:00Z', 'upper': '2019-12-06T13:00:00Z', 'bounds': '[)'}, 'message': 'Order is pending.', 'failure_reason': None, 'success_code': None, 'success_description': None, 'created': '2019-08-15T05:40:30.433139Z', 'removed': None}], 'reschedule_requests': [], 'signature': None},
{'id': '241', 'custom_id': '000123', 'company': 22, 'company_name': 'Pearl Pte Ltd', 'delivery_address': '90 Merchant Road, Hotel Royal, 223344', 'delivery_timeslot': {'lower': '2019-11-29T10:00:00Z', 'upper': '2019-11-29T13:00:00Z', 'bounds': '[)'}, 'sender_name': 'Vera Smith', 'sender_email': '', 'sender_contact': '81234567', 'removed': None, 'recipient_name': 'Vera Smith', 'recipient_contact': '81234561', 'notes': '', 'items': [{'id': 22975, 'name': 'Custom wrapped bouquet', 'quantity': 2, 'metadata': {}, 'removed': None}, {'id': 22974, 'name': "Parents' boutonniere x 3", 'quantity': 1, 'metadata': {}, 'removed': None}, {'id': 22973, 'name': "Groom's boutonniere", 'quantity': 1, 'metadata': {}, 'removed': None}, {'id': 22972, 'name': 'Loose hair flowers', 'quantity': 1, 'metadata': {}, 'removed': None}, {'id': 22971, 'name': 'Bridal Bouquet', 'quantity': 1, 'metadata': {}, 'removed': None}], 'latitude': '1.28821802835873000000', 'longitude': '103.84569230314800000000', 'created': '2019-08-30T03:20:17.477528Z', 'updated': '2019-08-30T03:29:25.307856Z', 'status': 'pending', 'verbose_status': 'Pending', 'logs': [{'id': 59847, 'order': '24117085-9104-4442-841b-4a734f801d39', 'order_custom_id': 'INV-000123', 'order_delivery_address': '90 Merchant Road, Hotel Royal, 223344', 'order_delivery_timeslot': {'lower': '2019-11-29T10:00:00Z', 'upper': '2019-11-29T13:00:00Z', 'bounds': '[)'}, 'message': 'Order was created.', 'failure_reason': None, 'success_code': None, 'success_description': None, 'created': '2019-08-30T03:20:17.511250Z', 'removed': None}, {'id': 59848, 'order': '24117085-9104-4442-841b-4a734f801d39', 'order_custom_id': 'INV-000123', 'order_delivery_address': '90 Merchant Road, Hotel Royal, 223344', 'order_delivery_timeslot': {'lower': '2019-11-29T10:00:00Z', 'upper': '2019-11-29T13:00:00Z', 'bounds': '[)'}, 'message': 'Order is pending.', 'failure_reason': None, 'success_code': None, 'success_description': None, 'created': '2019-08-30T03:20:17.513132Z', 'removed': None}], 'reschedule_requests': [], 'signature': None}]

Use pandas library:
df.to_csv() - Write object to a comma-separated values (csv) file.
Ex.
import pandas as pd
data = [{'id': '123', 'custom_id': '12', 'company': 28, 'company_name': 'Sunshine'},
{'id': '224', 'custom_id': '14', 'company': 38, 'company_name': 'Flowers'},
{'id': '888', 'custom_id': '10', 'company': 99, 'company_name': 'Fields'}]
df = pd.DataFrame(data)
df.to_csv('sample.csv')

Try:
import csv
csv_file = 'my_file.csv'
csv_columns = ['id', 'custom_id', 'company', 'company_name']
dict_data = [{'id': '123', 'custom_id': '12', 'company': 28, 'company_name': 'Sunshine'}, {'id': '224', 'custom_id': '14', 'company': 38, 'company_name': 'Flowers'}, {'id': '888', 'custom_id': '10', 'company': 99, 'company_name': 'Fields'}]
try:
with open(csv_file, 'w') as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=csv_columns)
writer.writeheader()
for data in dict_data:
writer.writerow(data)
except IOError:
print("I/O error")

Given your response data in json format
response = [{'id': '123', 'custom_id': '12', 'company': 28, 'company_name': 'Sunshine'},
{'id': '224', 'custom_id': '14', 'company': 38, 'company_name': 'Flowers'},
{'id': '888', 'custom_id': '10', 'company': 99, 'company_name': 'Fields'}]
You can convert it to a list of lists using
header = [response[0].keys()]
data = [row.values() for row in response]
csv_list = header + data
And then save it to csv using
with open('dict.csv', "w") as f:
for row in csv_list:
f.write("%s\n" % ','.join(str(col) for col in row))
This should yield your desired output

How to get/filter values in python3 json list dictionary response?

Below is result I got from API query.
[{'type':'book','title': 'example1', 'id': 12456, 'price': '8.20', 'qty': '12', 'status': 'available'},
{'type':'book','title': 'example2', 'id': 12457, 'price': '10.50', 'qty': '5', 'status': 'none'}]
How do I specify in code to get value pairs of title, price, & status only?
So result will be like:
[{'title': 'example1', 'price': '8.20', 'status': 'available'},
{'title': 'example2', 'price': '10.50', 'status': 'none'}]

You can use a dictionary comprehension within a list comprehension:
L = [{'type':'book','title': 'example1', 'id': 12456, 'price': '8.20', 'qty': '12', 'status': 'available'},
{'type':'book','title': 'example2', 'id': 12457, 'price': '10.50', 'qty': '5', 'status': 'none'}]
keys = ['title', 'price', 'status']
res = [{k: d[k] for k in keys} for d in L]
print(res)
[{'price': '8.20', 'status': 'available', 'title': 'example1'},
{'price': '10.50', 'status': 'none', 'title': 'example2'}]

Python dictionary value conversion to dictionary list

class Weightcheck:
def bag_products(self,product_list):
bag_list = []
non_bag_items = []
MAX_BAG_WEIGHT = 5.0
for product in product_list:
if float(product['weight']) > MAX_BAG_WEIGHT:
product_list.remove(product)
non_bag_items.append(product)
and argument product_list is like
product_list = {'barcode': [123, 456], 'Name': ['Milk, 2 Litres', 'Bread'], 'Price': ['2', '3.5'], 'weight': ['2', '0.6']}
if the passed arugument is like
product_list = [{'name': 'Milk', 'price': 2.0, 'weight': 2.0},
{'name': 'LowfatMilk', 'price': 2.0, 'weight': 2.0},
{'name': 'HighfatMilk', 'price': 2.0, 'weight': 2.0},
{'name': 'Bread', 'price': 2.0, 'weight': 7.0}]
then it works properly. i mean list of dictionary. please help how can i solve this

This is not the best way but you can use something like this:
final_list = []
for i in range(len(product_in_basket['Name'])):
item ={} # each new item
for k,v in product_in_basket.items():
item[k]= v[i] # filling that item with specific index
final_list.append(item) # append to final list
> final_list
[
{'Name': 'Milk, 2 Litres', 'Price': '2', 'barcode': 123, 'weight': '2.0'},
{'Name': 'Bread', 'Price': '3.5', 'barcode': 456, 'weight': '0.6'}
]

Here's a one-liner that does the trick:
product_list = [dict(zip(product_in_basket,t)) for t in zip(*product_in_basket.values())]
print(product_list)
Output:
[{'Name': 'Milk, 2 Litres', 'Price': '2', 'barcode': 123, 'weight': '2.0'}, {'Name': 'Bread', 'Price': '3.5', 'barcode': 456, 'weight': '0.6'}]
In general, it's better to not use a library when plain Python will do, but I thought a solution using pandas might be interesting:
import pandas as pd
product_in_basket = {'barcode': [123, 456], 'Name': ['Milk, 2 Litres', 'Bread'],
'Price': ['2', '3.5'], 'weight': ['2.0', '0.6']}
df = pd.DataFrame(product_in_basket)
output = list(df.T.to_dict().values())
print(output)
Output:
[{'Name': 'Milk, 2 Litres', 'Price': '2', 'barcode': 123, 'weight': '2.0'},
{'Name': 'Bread', 'Price': '3.5', 'barcode': 456, 'weight': '0.6'}]

How can I order a list of dictionaries based on another list which has the key?

I have some ids as follows:
req_ids=['964','123','534','645','876','222']
and I get a response from another server like this not in any particular order:
res_result = [{'id':'123', 'name':'Sachin'},
{'id':'534', 'name':'Vipin'},
{'id':'222', 'name':'Ram'},
{'id':'645', 'name':'Anoop'},
{'id':'964', 'name':'Sani'},
{'id':'876', 'name':'John'}]
I need to get res_result in the same request order as req_ids,
[{'id':'964', 'name':'Sani'},
{'id':'123', 'name':'Sachin'},
{'id':'534', 'name':'Vipin'},
{'id':'645', 'name':'Anoop'},
{'id':'876', 'name':'John'},
{'id':'222', 'name':'Ram'}]
How do I do this if possible using an inbuilt python function instead writing our own loop to do this logic?

If you must sort, map the keys to integers, then use that map to sort:
id_to_pos = {key: i for i, key in enumerate(req_ids)}
sorted(res_result, key=lambda d: id_to_pos[d['id']])
Demo:
>>> from pprint import pprint
>>> req_ids = ['964', '123', '534', '645', '876', '222']
>>> res_result = [{'id':'123', 'name':'Sachin'},
... {'id':'534', 'name':'Vipin'},
... {'id':'222', 'name':'Ram'},
... {'id':'645', 'name':'Anoop'},
... {'id':'964', 'name':'Sani'},
... {'id':'876', 'name':'John'}]
>>> id_to_pos = {key: i for i, key in enumerate(req_ids)}
>>> sorted(res_result, key=lambda d: id_to_pos[d['id']])
[{'id': '964', 'name': 'Sani'}, {'id': '123', 'name': 'Sachin'}, {'id': '534', 'name': 'Vipin'}, {'id': '645', 'name': 'Anoop'}, {'id': '876', 'name': 'John'}, {'id': '222', 'name': 'Ram'}]
>>> pprint(_)
[{'id': '964', 'name': 'Sani'},
{'id': '123', 'name': 'Sachin'},
{'id': '534', 'name': 'Vipin'},
{'id': '645', 'name': 'Anoop'},
{'id': '876', 'name': 'John'},
{'id': '222', 'name': 'Ram'}]
However, you can avoid sorting altogether, you already have the right order, you only need to map from id keys to dictionaries (O(N)), then pull out the dictionaries in the right order (still O(N)):
id_to_dict = {d['id']: d for d in res_result}
[id_to_dict[id] for id in req_ids]
Demo:
>>> id_to_dict = {d['id']: d for d in res_result}
>>> [id_to_dict[id] for id in req_ids]
[{'id': '964', 'name': 'Sani'}, {'id': '123', 'name': 'Sachin'}, {'id': '534', 'name': 'Vipin'}, {'id': '645', 'name': 'Anoop'}, {'id': '876', 'name': 'John'}, {'id': '222', 'name': 'Ram'}]
>>> pprint(_)
[{'id': '964', 'name': 'Sani'},
{'id': '123', 'name': 'Sachin'},
{'id': '534', 'name': 'Vipin'},
{'id': '645', 'name': 'Anoop'},
{'id': '876', 'name': 'John'},
{'id': '222', 'name': 'Ram'}]

>>> map({d['id']: d for d in res_result}.__getitem__, req_ids)
[{'id': '964', 'name': 'Sani'}, {'id': '123', 'name': 'Sachin'}, {'id': '534', 'name': 'Vipin'}, {'id': '645', 'name': 'Anoop'}, {'id': '876', 'name': 'John'}, {'id': '222', 'name': 'Ram'}]

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Remove duplicates from list of lists by column value - python

Related

create dictionary of values based on matching keys in list from nested dictionary

How to separate values in a dictionary to be put into CSV? [duplicate]

How to get/filter values in python3 json list dictionary response?

Python dictionary value conversion to dictionary list

How can I order a list of dictionaries based on another list which has the key?

Categories

Resources