Removing common elements from a dictionary of lists in python - python

I have a dictionary of lists and the lists contain dictionaries like so:
my_dict = {
'list1': [{'catch': 100, 'id': '1'}, {'catch': 101, 'id': '2'},
{'catch': 50, 'id': '1'}],
'list2': [{'catch': 189, 'id': '1'}, {'catch': 120, 'id': '12'}],
'list3': [{'catch': 140, 'id': '1'}, {'catch': 10, 'id': '100'}]
}
What is the most Pythonic way of removing the list items with common 'id' values and storing them in a separate list? So the output would be something like this:
my_dict = {
'list1': [{'catch': 101, 'id': '2'}],
'list2': [{'catch': 120, 'id': '12'}],
'list3': [ {'catch': 10, 'id': '100'}],
'list4': [{'catch': 100, 'id': '1'}, {'catch': 50, 'id': '1'},
{'catch': 189, 'id': '1'}, {'catch': 140, 'id': '1'}]
}
In my program I have 7 lists similar to this, and if an 'id' appears in two or more of these lists, I want to store all appearances of an item with that 'id' in the 8th list for further processing.
with regards,
finnurtorfa

Consider restructuring your data into something like this:
>>> import itertools
>>> { k: [d['catch'] for d in v] for k, v in itertools.groupby(sorted(itertools.chain(*my_dict.values()), key=lambda d: d['id']), lambda d: d['id']) }
{'1': [100, 50, 189, 140], '100': [10], '12': [120], '2': [101]}
You haven't described what your data represents, so this may not be appropriate for you. But the tools used (chain and groupby from itertools) should at least give you some ideas.
Edit: I used the sample answer from the question in my testing by accident. Fixed by adding sorting to the input to groupby.

>>> import itertools
>>> import operator
>>> get_id = operator.itemgetter("id")
>>> flattened_dict = itertools.chain.from_iterable(my_dict.values())
>>> groups = itertools.groupby(sorted(flattened_dict, key=get_id), get_id)
>>> {k: list(v) for k, v in groups}
{'1': [{'catch': 100, 'id': '1'},
{'catch': 50, 'id': '1'},
{'catch': 189, 'id': '1'},
{'catch': 140, 'id': '1'}],
'100': [{'catch': 10, 'id': '100'}],
'12': [{'catch': 120, 'id': '12'}],
'2': [{'catch': 101, 'id': '2'}]}
Explanation:
get_id is a function that takes an object x and returns x["id"].
flattened_dict is just an iterable over all the lists (i.e. the concatenation of all the .values() of my_dict).
Now we sort flattened_dict with the key function get_id -- that is, sort by ID -- and group the result by ID.
This works because itertools.groupby collects consecutive items that share a key, and sorting by the same key first puts all items with equal IDs next to each other.
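To see why the sort matters, here is a minimal sketch comparing groupby on unsorted and sorted input. The ids list is made up for illustration and is not part of the original data.

```python
from itertools import groupby

# Hypothetical ids, deliberately not sorted
ids = ['1', '2', '1']

# Without sorting, groupby starts a new group each time the key changes
unsorted_groups = [(key, len(list(group))) for key, group in groupby(ids)]

# After sorting, equal ids are adjacent and end up in a single group
sorted_groups = [(key, len(list(group))) for key, group in groupby(sorted(ids))]

print(unsorted_groups)  # [('1', 1), ('2', 1), ('1', 1)]
print(sorted_groups)    # [('1', 2), ('2', 1)]
```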

Something along the following lines:
my_dict = {
'list1': [{'catch': 100, 'id': '1'}, {'catch': 101, 'id': '2'},
{'catch': 50, 'id': '1'}],
'list2': [{'catch': 189, 'id': '1'}, {'catch': 120, 'id': '12'}],
'list3': [{'catch': 140, 'id': '1'}, {'catch': 10, 'id': '100'}]
}
from itertools import groupby
sub = {}
for k in my_dict:
    # groupby only groups consecutive items, so the same id can come up
    # more than once; accumulate every run under its id
    for kk, g in groupby(my_dict[k], lambda v: v["id"]):
        if kk not in sub:
            sub[kk] = []
        sub[kk] = sub[kk] + list(g)
print(sub)
{'1': [{'catch': 100, 'id': '1'}, {'catch': 50, 'id': '1'}, {'catch': 189, 'id': '1'}, {'catch': 140, 'id': '1'}], '2': [{'catch': 101, 'id': '2'}], '12': [{'catch': 120, 'id': '12'}], '100': [{'catch': 10, 'id': '100'}]}
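The groupings above collect items per id but do not move the shared ones into their own list, which is what the question asks for. A minimal sketch of that step, assuming "common" means an id that occurs in two or more of the lists (shared_ids and result are names introduced here for illustration):

```python
from collections import Counter

my_dict = {
    'list1': [{'catch': 100, 'id': '1'}, {'catch': 101, 'id': '2'},
              {'catch': 50, 'id': '1'}],
    'list2': [{'catch': 189, 'id': '1'}, {'catch': 120, 'id': '12'}],
    'list3': [{'catch': 140, 'id': '1'}, {'catch': 10, 'id': '100'}],
}

# Count in how many distinct lists each id occurs; the inner set makes
# sure an id repeated inside one list is counted only once for that list
id_counts = Counter(
    item_id
    for items in my_dict.values()
    for item_id in {d['id'] for d in items}
)
shared_ids = {item_id for item_id, n in id_counts.items() if n >= 2}

# Keep the unshared items where they were ...
result = {
    name: [d for d in items if d['id'] not in shared_ids]
    for name, items in my_dict.items()
}
# ... and gather every appearance of a shared id in a separate list
result['list4'] = [
    d for items in my_dict.values() for d in items if d['id'] in shared_ids
]

print(result)
```

This builds a new dictionary instead of mutating my_dict, so the original lists stay available if they are needed later.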

Related

How to Iterate through an array of dictionaries to copy only relevant keys to new dictionary?

I want to iterate through a dictionary array like the following to only copy the 'symbol' and 'product_progress' keys and their corresponding values to new dictionary array.
[{'coin_name': 'Bitcoin', 'coin_id': 'bitcoin', 'symbol': 'btc', 'rank': 1, 'product_progress': 93, 'team': 100, 'token_fundamentals': 100, 'github_activity': 95, 'marketing': 5, 'partnership': 5, 'uniqueness': 5, 'total_score': 96, 'exchange_name': 'Bitfinex', 'exchange_link': 'https://www.bitfinex.com/t/BTCUSD', 'website': 'https://bitcoin.org/en/', 'twitter': 'https://twitter.com/Bitcoin', 'telegram': None, 'whitepaper': 'https://bitcoin.org/en/bitcoin-paper'}, {'coin_name': 'Ethereum', 'coin_id': 'ethereum', 'symbol': 'eth', 'rank': 2, 'product_progress': 87, 'team': 98, 'token_fundamentals': 97, 'github_activity': 100, 'marketing': 5, 'partnership': 5, 'uniqueness': 5, 'total_score': 94, 'exchange_name': 'Gemini', 'exchange_link': 'https://gemini.com/', 'website': 'https://www.ethereum.org/', 'twitter': 'https://twitter.com/ethereum', 'telegram': None, 'whitepaper': 'https://ethereum.org/en/whitepaper/'}] ...
The code I have so far is:
# need to iterate through list of dictionaries
for index in range(len(projectlist3)):
    for key in projectlist3[index]:
        d['symbol'] = projectlist3[index]['symbol']
        d['token_fundamentals'] = projectlist3[index]['token_fundamentals']
print(d)
It's just saving the last entry rather than all of the entries: {'symbol': 'eth', 'token_fundamentals': 97}
Given your data:
l = [{
'coin_name': 'Bitcoin',
'coin_id': 'bitcoin',
'symbol': 'btc',
'rank': 1,
'product_progress': 93,
'team': 100,
'token_fundamentals': 100,
'github_activity': 95,
'marketing': 5,
'partnership': 5,
'uniqueness': 5,
'total_score': 96,
'exchange_name': 'Bitfinex',
'exchange_link': 'https://www.bitfinex.com/t/BTCUSD',
'website': 'https://bitcoin.org/en/',
'twitter': 'https://twitter.com/Bitcoin',
'telegram': None,
'whitepaper': 'https://bitcoin.org/en/bitcoin-paper'
}, {
'coin_name': 'Ethereum',
'coin_id': 'ethereum',
'symbol': 'eth',
'rank': 2,
'product_progress': 87,
'team': 98,
'token_fundamentals': 97,
'github_activity': 100,
'marketing': 5,
'partnership': 5,
'uniqueness': 5,
'total_score': 94,
'exchange_name': 'Gemini',
'exchange_link': 'https://gemini.com/',
'website': 'https://www.ethereum.org/',
'twitter': 'https://twitter.com/ethereum',
'telegram': None,
'whitepaper': 'https://ethereum.org/en/whitepaper/'
}]
You can use a list comprehension:
new_l = [{field: d[field] for field in ['symbol', 'token_fundamentals']}
         for d in l]
which is a more concise equivalent of this:
new_l = []
for d in l:
    new_d = {}
    for field in ['symbol', 'token_fundamentals']:
        new_d[field] = d[field]
    new_l.append(new_d)
Judging by what you're writing into d, you want to save a list of objects, so this would work:
[{"symbol": i['symbol'], "token_fundamentals": i['token_fundamentals']} for i in projectlist3]
Result:
[{'symbol': 'btc', 'token_fundamentals': 100}, {'symbol': 'eth', 'token_fundamentals': 97}]
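The loop in the question kept only the last entry because it wrote every project into the same dictionary d. A minimal sketch of the difference, using a shortened, hypothetical two-field version of projectlist3 (selected is a name introduced here):

```python
# Shortened, hypothetical version of the project data (two fields only)
projectlist3 = [
    {'symbol': 'btc', 'token_fundamentals': 100},
    {'symbol': 'eth', 'token_fundamentals': 97},
]

# Reusing one dict: every iteration overwrites the same two keys
d = {}
for project in projectlist3:
    d['symbol'] = project['symbol']
    d['token_fundamentals'] = project['token_fundamentals']
print(d)  # {'symbol': 'eth', 'token_fundamentals': 97}

# Creating a fresh dict per project and collecting them in a list
selected = []
for project in projectlist3:
    selected.append({
        'symbol': project['symbol'],
        'token_fundamentals': project['token_fundamentals'],
    })
print(selected)
```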

Python - remove duplicate item from list of dictionaries based on dictionary value

I have a list of dictionaries with some duplicate IDs. I would like to keep the dictionaries that have a non-zero value under rssi and remove the ones that have '0', but if every entry for an ID has an rssi of '0', I need to keep one of them.
The current and desired lists are below. Is there a simple way to do this? Finding the non-'0' entries with a loop is the easy bit, but I'm not sure how to handle IDs whose entries are all '0'.
current_list = [
{'id': 255, 'rssi': -108.0},
{'id': 255, 'rssi': '0'},
{'id': 301, 'rssi': -82.0},
{'id': 301, 'rssi': '0'},
{'id': 263, 'rssi': -85.0},
{'id': 263, 'rssi': '0'},
{'id': 18, 'rssi': '0'},
{'id': 18, 'rssi': '0'}
]
desired_list = [
{'id': 255, 'rssi': -108.0},
{'id': 301, 'rssi': -82.0},
{'id': 263, 'rssi': -85.0},
{'id': 18, 'rssi': '0'}
]
In the itertools recipes there is a function called unique_everseen:
from itertools import filterfalse
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
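You could use that to deduplicate the list by a key, for example by 'rssi':
desired_list = list(unique_everseen(current_list, key=lambda x: x["rssi"]))
# [{'id': 255, 'rssi': -108.0}, {'id': 255, 'rssi': '0'},
# {'id': 301, 'rssi': -82.0}, {'id': 263, 'rssi': -85.0}]
The key parameter of unique_everseen selects 'rssi' as the value to compare. Note that this keeps the first entry for each distinct 'rssi' value, so it retains {'id': 255, 'rssi': '0'} and drops id 18, which differs from the desired list in the question.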
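Keying on 'rssi' alone deduplicates the readings rather than the ids, as the commented output above shows. A minimal sketch that keys on 'id' instead and prefers a non-zero reading, assuming Python 3.7+ (where dicts keep insertion order); best is a name introduced here for illustration:

```python
current_list = [
    {'id': 255, 'rssi': -108.0},
    {'id': 255, 'rssi': '0'},
    {'id': 301, 'rssi': -82.0},
    {'id': 301, 'rssi': '0'},
    {'id': 263, 'rssi': -85.0},
    {'id': 263, 'rssi': '0'},
    {'id': 18, 'rssi': '0'},
    {'id': 18, 'rssi': '0'},
]

# Map each id to the entry kept for it so far
best = {}
for item in current_list:
    kept = best.get(item['id'])
    # Keep the first entry per id, but let a non-zero reading replace a '0'
    if kept is None or (kept['rssi'] == '0' and item['rssi'] != '0'):
        best[item['id']] = item

desired_list = list(best.values())
print(desired_list)
```

If an id has several non-zero readings, this keeps only the first of them; whether that is wanted is not stated in the question.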
Using a simple iteration, e.g.:
current_list = [
{'id': 255, 'rssi': -108.0},
{'id': 255, 'rssi': '0'},
{'id': 301, 'rssi': -82.0},
{'id': 301, 'rssi': '0'},
{'id': 263, 'rssi': -85.0},
{'id': 263, 'rssi': '0'},
{'id': 18, 'rssi': '0'},
{'id': 18, 'rssi': '0'}
]
seen = set()
result = []
# sort non-zero readings first so each id's first occurrence is its best entry
for i in sorted(current_list, key=lambda x: x["rssi"] == "0"):
    if i["id"] not in seen:
        result.append(i)
        seen.add(i["id"])
Output:
[{'id': 255, 'rssi': -108.0},
{'id': 301, 'rssi': -82.0},
{'id': 263, 'rssi': -85.0},
{'id': 18, 'rssi': '0'}]
If you can use external libraries in your project, you can take advantage of Pandas vectorized operations.
E.g.:
import pandas as pd
df = pd.DataFrame(current_list)
df["rssi"] = pd.to_numeric(df["rssi"])
df = df[(df["rssi"] != 0) | (df.groupby("id").transform("min") == 0)["rssi"]]
df = df.drop_duplicates()
df.to_dict("records")
One solution using itertools.groupby. If an id has only items with an rssi of '0', we keep one of them. Otherwise we add all of its non-zero rssi items to the output list:
current_list = [
{'id': 255, 'rssi': -108.0},
{'id': 255, 'rssi': '0'},
{'id': 301, 'rssi': -82.0},
{'id': 301, 'rssi': '0'},
{'id': 263, 'rssi': -85.0},
{'id': 263, 'rssi': '0'},
{'id': 18, 'rssi': '0'},
{'id': 18, 'rssi': '0'}
]
from itertools import groupby
from pprint import pprint
out = []
for v, g in groupby(sorted(current_list, key=lambda k: (k['id'], k['rssi'] == '0')), lambda k: k['id']):
    out.append(next(g))  # first item per id: a non-zero `rssi` if there is one, else a `0`
    out.extend(i for i in g if i['rssi'] != '0')  # add any remaining non-zero `rssi` items
pprint(out)
Prints:
[{'id': 18, 'rssi': '0'},
{'id': 255, 'rssi': -108.0},
{'id': 263, 'rssi': -85.0},
{'id': 301, 'rssi': -82.0}]
Without using any imports, I would do it the following way:
current_list = [
{'id': 255, 'rssi': -108.0},
{'id': 255, 'rssi': '0'},
{'id': 301, 'rssi': -82.0},
{'id': 301, 'rssi': '0'},
{'id': 263, 'rssi': -85.0},
{'id': 263, 'rssi': '0'},
{'id': 18, 'rssi': '0'},
{'id': 18, 'rssi': '0'}
]
output = {}
for i in current_list:
    if i['id'] not in output:
        output[i['id']] = []
    output[i['id']].append(i['rssi'])
# now output is {255: [-108.0, '0'], 301: [-82.0, '0'], 263: [-85.0, '0'], 18: ['0', '0']}
def func(x):
    # return the first non-zero rssi, or '0' if there is none
    for j in x:
        if j != '0':
            return j
    return '0'
desired_list = [{'id': i[0], 'rssi': func(i[1])} for i in output.items()]
print(desired_list)
Output:
[{'id': 255, 'rssi': -108.0}, {'id': 301, 'rssi': -82.0}, {'id': 263, 'rssi': -85.0}, {'id': 18, 'rssi': '0'}]

Partitioning a List of Dictionaries in Python According to a Value

I have a coding challenge that requires me to create logic that partitions a list of dictionaries into three new lists of dicts. Each new list needs to have the same number of experienced and inexperienced personnel. The original list has an equal number of experienced and inexperienced personnel. I have no idea how to form the logic for this challenge. Here is a shortened version:
mylist = [
{'name': 'Jade', 'height': 64, 'experience': 'n'},
{'name': 'Diego', 'height': 60, 'experience': 'y'},
{'name': 'Twee', 'height': 70, 'experience': 'n'},
{'name': 'Wence', 'height': 72, 'experience': 'y'},
{'name': 'Shubha', 'height': 65, 'experience': 'y'},
{'name': 'Taylor', 'height': 68, 'experience': 'n'}
]
The new lists need to have equal numbers of experienced and inexperienced personnel, like this:
newlist_1 = [
{'name': 'Diego', 'height': 60, 'experience': 'y'},
{'name': 'Jade', 'height': 64, 'experience': 'n'},
]
newlist_2 = [
{'name': 'Wence', 'height': 72, 'experience': 'y'},
{'name': 'Twee', 'height': 70, 'experience': 'n'},
]
newlist_3 = [
{'name': 'Shubha', 'height': 65, 'experience': 'y'},
{'name': 'Taylor', 'height': 68, 'experience': 'n'}
]
I am keeping the original list, so in the end there needs to be a total of four collections.
def make_teams(my_list):
    # divide the member list in two
    experienced = list()
    novice = list()
    for record in my_list:
        if record.get('experience') in ['Y', 'y']:
            experienced.append(record)
        else:
            novice.append(record)
    # stitch the two lists together as a list of tuples
    # (list() is needed on Python 3, where zip returns an iterator)
    teams = list(zip(experienced, novice))
    # build a dictionary result starting with the member list
    results = {
        'members': my_list
    }
    # update results with each team
    for i in range(0, len(teams)):
        results.update(
            {'newlist_%s' % (i + 1): list(teams[i])})
    return results
Will produce the following...
from pprint import pprint
pprint(make_teams(mylist))
{'members': [{'experience': 'n', 'height': 64, 'name': 'Jade'},
{'experience': 'y', 'height': 60, 'name': 'Diego'},
{'experience': 'n', 'height': 70, 'name': 'Twee'},
{'experience': 'y', 'height': 72, 'name': 'Wence'},
{'experience': 'y', 'height': 65, 'name': 'Shubha'},
{'experience': 'n', 'height': 68, 'name': 'Taylor'}],
'newlist_1': [{'experience': 'y', 'height': 60, 'name': 'Diego'},
{'experience': 'n', 'height': 64, 'name': 'Jade'}],
'newlist_2': [{'experience': 'y', 'height': 72, 'name': 'Wence'},
{'experience': 'n', 'height': 70, 'name': 'Twee'}],
'newlist_3': [{'experience': 'y', 'height': 65, 'name': 'Shubha'},
{'experience': 'n', 'height': 68, 'name': 'Taylor'}]}
You can have 2 lists - one with experienced and one with inexperienced and build whatever lists you need from that, something like:
experienced = [worker for worker in mylist if 'y' == worker['experience']]
inexperienced = [worker for worker in mylist if 'n' == worker['experience']]
list1, list2, list3 = map(list, zip(experienced, inexperienced))
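Try slicing the list of dictionaries into three separate lists (this works for the sample because each consecutive pair contains one experienced and one inexperienced person):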
list1 = mylist[0:2]
list2 = mylist[2:4]
list3 = mylist[4:6]
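A sketch that generalises the zip approach to any number of teams by dealing each group out round-robin. make_balanced_teams and its parameters are hypothetical names, and 'experience' is assumed to be 'y' or 'n' as in the sample:

```python
def make_balanced_teams(members, n_teams):
    """Split members into n_teams lists with experience spread evenly."""
    experienced = [m for m in members if m['experience'] == 'y']
    novices = [m for m in members if m['experience'] != 'y']
    teams = [[] for _ in range(n_teams)]
    # Deal each group out in turn so every team gets its share of both
    for group in (experienced, novices):
        for i, member in enumerate(group):
            teams[i % n_teams].append(member)
    return teams


mylist = [
    {'name': 'Jade', 'height': 64, 'experience': 'n'},
    {'name': 'Diego', 'height': 60, 'experience': 'y'},
    {'name': 'Twee', 'height': 70, 'experience': 'n'},
    {'name': 'Wence', 'height': 72, 'experience': 'y'},
    {'name': 'Shubha', 'height': 65, 'experience': 'y'},
    {'name': 'Taylor', 'height': 68, 'experience': 'n'},
]

teams = make_balanced_teams(mylist, 3)
for team in teams:
    print([member['name'] for member in team])
```

The original list is left untouched, so together with the three teams there are four collections, as the question requires.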

Check unique values for a key in a list of dicts [duplicate]

I have a list of dictionaries where I want to drop any dictionaries that repeat their id key. What's the best way to do this? E.g.:
example dict:
product_1 = {'id': 1234, 'price': 234}
List_of_products = [product_1, product_2, ...]
How can I filter the list of products so I have non-repeating products based on their product['id']?
If product dictionaries with the same id can have different values, select one dictionary per id. Use itertools.groupby:
import itertools
list_products= [{'id': 12, 'price': 234},
{'id': 34, 'price': 456},
{'id': 12, 'price': 456},
{'id': 34, 'price': 78}]
list_dicts = list()
for name, group in itertools.groupby(sorted(list_products, key=lambda d: d['id']), key=lambda d: d['id']):
    list_dicts.append(next(group))
print(list_dicts)
# Output
[{'price': 234, 'id': 12}, {'price': 456, 'id': 34}]
If the product dictionaries with the same id are totally the same, there is an easier way as described in Remove duplicate dict in list in Python. Here is a MWE.
list_products= [{'id': 12, 'price': 234},
{'id': 34, 'price': 456},
{'id': 12, 'price': 234},
{'id': 34, 'price': 456}]
result = [dict(t) for t in set([tuple(d.items()) for d in list_products])]
print(result)
# Output
[{'price': 456, 'id': 34}, {'price': 234, 'id': 12}]
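a = [{'id': 124, 'price': 234}, {'id': 125, 'price': 234}, {'id': 1234, 'price': 234}, {'id': 1234, 'price': 234}]
a.sort(key=lambda d: d['id'])  # dicts are not orderable on Python 3, so sort by id
# walk backwards so that deleting an item does not shift the indices still to visit
for indx in range(len(a) - 1, 0, -1):
    if a[indx]['id'] == a[indx - 1]['id']:
        del a[indx]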
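An alternative sketch that avoids both sorting and in-place deletion: keep the first dictionary seen for each id in a single pass, preserving the original order. seen_ids and unique_products are names introduced here, and the sample data is the list used earlier in this thread:

```python
list_products = [{'id': 12, 'price': 234},
                 {'id': 34, 'price': 456},
                 {'id': 12, 'price': 456},
                 {'id': 34, 'price': 78}]

seen_ids = set()
unique_products = []
for product in list_products:
    # Only the first product with a given id is kept
    if product['id'] not in seen_ids:
        seen_ids.add(product['id'])
        unique_products.append(product)

print(unique_products)  # [{'id': 12, 'price': 234}, {'id': 34, 'price': 456}]
```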

sort a list of dicts by x then by y

I want to sort this info(name, points, and time):
list = [
{'name':'JOHN', 'points' : 30, 'time' : '0:02:2'},
{'name':'KARL','points':50,'time': '0:03:00'}
]
So what I want is the list sorted first by points made, then by time played (in my example, matt goes first because he took less time). Any help?
I'm trying with this:
import operator
list.sort(key=operator.itemgetter('points', 'time'))
but got a TypeError: list indices must be integers, not str.
Your example works for me. I would advise you not to use list as a variable name, since it is a builtin type.
You could try doing something like this also:
list.sort(key=lambda item: (item['points'], item['time']))
edit:
example list:
>>> a = [
... {'name':'JOHN', 'points' : 30, 'time' : '0:02:20'},
... {'name':'LEO', 'points' : 30, 'time': '0:04:20'},
... {'name':'KARL','points':50,'time': '0:03:00'},
... {'name':'MARK','points':50,'time': '0:02:00'},
... ]
descending 'points':
using sort() for inplace sorting:
>>> import pprint
>>> a.sort(key=lambda x: (-x['points'],x['time']))
>>> pprint.pprint(a)
[{'name': 'MARK', 'points': 50, 'time': '0:02:00'},
{'name': 'KARL', 'points': 50, 'time': '0:03:00'},
{'name': 'JOHN', 'points': 30, 'time': '0:02:20'},
{'name': 'LEO', 'points': 30, 'time': '0:04:20'}]
>>>
using sorted to return a sorted list:
>>> pprint.pprint(sorted(a, key=lambda x: (-x['points'],x['time'])))
[{'name': 'MARK', 'points': 50, 'time': '0:02:00'},
{'name': 'KARL', 'points': 50, 'time': '0:03:00'},
{'name': 'JOHN', 'points': 30, 'time': '0:02:20'},
{'name': 'LEO', 'points': 30, 'time': '0:04:20'}]
>>>
ascending 'points':
>>> a.sort(key=lambda x: (x['points'],x['time']))
>>> import pprint
>>> pprint.pprint(a)
[{'name': 'JOHN', 'points': 30, 'time': '0:02:20'},
{'name': 'LEO', 'points': 30, 'time': '0:04:20'},
{'name': 'MARK', 'points': 50, 'time': '0:02:00'},
{'name': 'KARL', 'points': 50, 'time': '0:03:00'}]
>>>
itemgetter will throw this error up to Python 2.4; support for multiple keys in itemgetter was added in Python 2.5.
If you are stuck on 2.4, you will need to use the lambda:
my_list.sort(key=lambda x: (x['points'], x['time']))
It would be preferable to upgrade to a newer Python if possible.
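One caveat with all of the keys above: 'time' is compared as a string, which only matches chronological order when every part is zero-padded. The question's own '0:02:2' is not. A sketch that converts the time to seconds for the sort key, assuming times always have three colon-separated integer parts; to_seconds is a hypothetical helper and LEO's entry is made up for illustration:

```python
def to_seconds(time_string):
    """Convert an 'H:MM:SS'-style string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in time_string.split(':'))
    return hours * 3600 + minutes * 60 + seconds


players = [
    {'name': 'JOHN', 'points': 30, 'time': '0:02:2'},
    {'name': 'LEO', 'points': 30, 'time': '0:02:10'},
    {'name': 'KARL', 'points': 50, 'time': '0:03:00'},
]

# String comparison puts '0:02:10' before '0:02:2'
by_string = sorted(players, key=lambda p: (-p['points'], p['time']))
# Numeric comparison orders 2 seconds before 10 seconds
by_seconds = sorted(players, key=lambda p: (-p['points'], to_seconds(p['time'])))

print([p['name'] for p in by_string])   # ['KARL', 'LEO', 'JOHN']
print([p['name'] for p in by_seconds])  # ['KARL', 'JOHN', 'LEO']
```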
