I am trying to remove all the duplicates and only make the original value remain in a nested dictionary. This is my nested dictionary below. In this dictionary, I am trying to check if two names are the same and if they are same, then remove the second and subsequent duplicates. For example: dictionaries 4 and 5 have the same name 'Sasha', so dictionary 5 should be removed.
dict1 = {
1: {'friends': [2],
'history': [],
'id': 1,
'name': 'Fred',
'date_of_birth': datetime.date(2022, 2, 1)},
2: {'friends': [1],
'history': [],
'id': 2,
'name': 'Jenny',
'date_of_birth': datetime.date(2004, 11, 18)},
3: {'friends': [4],
'history': [],
'id': 3,
'name': 'Jiang',
'date_of_birth': datetime.date(1942, 9, 16)},
4: {'friends': [3],
'history': [],
'id': 4,
'name': 'Sasha',
'date_of_birth': datetime.date(1834, 2, 2)},
5: {'friends': [6],
'history': [],
'id': 5,
'name': 'Sasha',
'date_of_birth': datetime.date(1834, 2, 2)},
6: {'friends': [5],
'history': [],
'id': 6,
'name': 'Amir',
'date_of_birth': datetime.date(1981, 8, 11)}}
I have implemented my solution like this but I don't understand where I am going wrong.
temp = []
res = dict()
for key, val in dict1.items():
    if val not in temp:
        temp.append(val)
        res[key] = val
print(pprint.pformat(res))
It would be great if someone could help me with this.
In your for-loop, val is the entire inner dictionary. If you look closely, the values for keys 4 and 5 are different ("friends" and "id" differ), so neither is dropped. However, since you only need the "name" to match (not the whole dictionary), you can keep track of the names seen so far and keep only the first entry for each name:
temp = []
res = dict()
for key, val in dict1.items():
    if val['name'] not in temp:
        temp.append(val['name'])
        res[key] = val
Edit:
If the goal is to "shift" keys as well, you could approach it a little differently by only storing the non-duplicate values in res, then zip it with the keys of dict1 to create the output dictionary:
temp = set()
res = []
for val in dict1.values():
    if val['name'] not in temp:
        temp.add(val['name'])
        res.append(val)
out = dict(zip(dict1, res))
Output:
{1: {'friends': [2],
'history': [],
'id': 1,
'name': 'Fred',
'date_of_birth': datetime.date(2022, 2, 1)},
2: {'friends': [1],
'history': [],
'id': 2,
'name': 'Jenny',
'date_of_birth': datetime.date(2004, 11, 18)},
3: {'friends': [4],
'history': [],
'id': 3,
'name': 'Jiang',
'date_of_birth': datetime.date(1942, 9, 16)},
4: {'friends': [3],
'history': [],
'id': 4,
'name': 'Sasha',
'date_of_birth': datetime.date(1834, 2, 2)},
5: {'friends': [5],
'history': [],
'id': 6,
'name': 'Amir',
'date_of_birth': datetime.date(1981, 8, 11)}}
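For comparison, here is a minimal sketch that wraps both variants in one function. The name dedupe_by_name and the trimmed-down sample data are made up for illustration; only the "name" lookup comes from the answer above. Note that, as in the output above, the renumbered variant changes only the outer keys and leaves each inner 'id' untouched.

```python
def dedupe_by_name(people):
    """Return (kept, shifted): first entry per name, under original and renumbered keys."""
    seen = set()
    kept = {}
    for key, person in people.items():
        if person["name"] not in seen:
            seen.add(person["name"])
            kept[key] = person
    # zip stops at the shorter input, so the leftover original keys are dropped
    shifted = dict(zip(people, kept.values()))
    return kept, shifted

# Hypothetical, shortened sample data
people = {
    4: {"id": 4, "name": "Sasha"},
    5: {"id": 5, "name": "Sasha"},
    6: {"id": 6, "name": "Amir"},
}
kept, shifted = dedupe_by_name(people)
print(kept)     # {4: {'id': 4, 'name': 'Sasha'}, 6: {'id': 6, 'name': 'Amir'}}
print(shifted)  # {4: {'id': 4, 'name': 'Sasha'}, 5: {'id': 6, 'name': 'Amir'}}
```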
I want to consolidate a list of lists (of dicts), but I have honestly no idea how to get it done.
The list looks like this:
l1 = [
[
{'id': 1, 'category': 5}, {'id': 3, 'category': 7}
],
[
{'id': 1, 'category': 5}, {'id': 4, 'category': 8}, {'id': 6, 'category': 9}
],
[
{'id': 6, 'category': 9}, {'id': 9, 'category': 16}
],
[
{'id': 2, 'category': 4}, {'id': 5, 'category': 17}
]
]
If one of the dicts from l1[0] is also present in l1[1], I want to concatenate the two lists and delete l1[0]. Afterwards I want to check if there are values from l1[1] also present in l1[2].
So my desired output would eventually look like this:
new_list = [
[
{'id': 1, 'category': 5}, {'id': 3, 'category': 7}, {'id': 4, 'category': 8}, {'id': 6, 'category': 9}, {'id': 9, 'category': 16}
],
[
{'id': 2, 'category': 4}, {'id': 5, 'category': 17}
]
]
Any idea how it can be done?
I tried it with three nested for loops, but it wouldn't work: I change the length of the list while iterating, which provokes an index-out-of-range error (apart from that, it would be an ugly solution anyway):
for list in l1:
    for dictionary in list:
        for index in range(0, len(l1), 1):
            if dictionary in l1[index]:
                dictionary in l1[index].append(list)
                dictionary.remove(list)
Can I apply some map or list_comprehension here?
Thanks a lot for any help!
IIUC, the following algorithm works.
Initialize result to empty
For each sublist in l1:
    if sublist and last item in result overlap
        append into last list of result without overlapping items
    otherwise
        append sublist at end of result
Code
# Helper functions
def append(list1, list2):
    ' append list1 and list2 (without duplicating elements) '
    return list1 + [d for d in list2 if d not in list1]

def is_intersect(list1, list2):
    ' True if list1 and list2 have an element in common '
    return any(d in list2 for d in list1) or any(d in list1 for d in list2)

# Generate desired result
result = []  # resulting list
for sublist in l1:
    if not result or not is_intersect(sublist, result[-1]):
        result.append(sublist)
    else:
        # Intersection with last list, so append to last list in result
        result[-1] = append(result[-1], sublist)
print(result)
Output
[[{'id': 1, 'category': 5},
{'id': 3, 'category': 7},
{'id': 4, 'category': 8},
{'id': 6, 'category': 9},
{'id': 9, 'category': 16}],
[{'id': 2, 'category': 4}, {'id': 5, 'category': 17}]]
Maybe you can try appending the elements to a new list. That way the original list stays the same and no index-out-of-range error is raised.
new_list = []
for list in l1:
    inner_list = []
    for ...
        if dictionary in l1[index]:
            inner_list.append(list)
        ...
    new_list.append(inner_list)
I have a question about converting dictionary keys.
First, I have this kind of word count from a DataFrame.
[Example]
dict = {'forest': 10, 'station': 3, 'office': 7, 'park': 2}
I want to get this result.
[Result]
result = {'name': 'forest', 'value': 10,
'name': 'station', 'value': 3,
'name': 'office', 'value': 7,
'name': 'park', 'value': 2}
Please check this issue.
As Rakesh said:
dict cannot have duplicate keys
The closest way to achieve what you want is to build a list of dicts, like this:
my_dict = {'forest': 10, 'station': 3, 'office': 7, 'park': 2}
result = list(map(lambda x: {'name': x[0], 'value': x[1]}, my_dict.items()))
You will get
result = [
{'name': 'forest', 'value': 10},
{'name': 'station', 'value': 3},
{'name': 'office', 'value': 7},
{'name': 'park', 'value': 2},
]
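The same list can also be built with a list comprehension, which some readers find easier to follow than map with a lambda. This is just an equivalent spelling of the line above; the loop variable names name and count are chosen here for illustration.

```python
my_dict = {'forest': 10, 'station': 3, 'office': 7, 'park': 2}

# One {'name': ..., 'value': ...} dict per key/value pair, in insertion order
result = [{'name': name, 'value': count} for name, count in my_dict.items()]
print(result)
```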
As Rakesh said, you can't have duplicate keys in a dictionary.
You can simply try this.
my_dict = {'forest': 10, 'station': 3, 'office': 7, 'park': 2}  # renamed so the built-in dict is not shadowed
result = {}
count = 0
for key in my_dict:
    result[count] = {'name': key, 'value': my_dict[key]}
    count = count + 1
print(result)
I have several lists of dictionaries, where each dictionary contains a unique id value that is common among all lists. I'd like to combine them into a single list of dicts, where each dict is joined on that id value.
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
desired_output = [{'id': 1, 'value': 20, 'sum': 10, 'total': 30}, {'id': 2, 'value': 21, 'sum': 11, 'total': 32}]
I tried doing something like the answer found at https://stackoverflow.com/a/42018660/7564393, but I'm getting very confused since I have more than 2 lists. Should I try using a defaultdict approach? More importantly, I am NOT always going to know the other values, only that the id value is present in all dicts.
You can use itertools.groupby():
from itertools import groupby
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
desired_output = []
for _, values in groupby(sorted([*list1, *list2, *list3], key=lambda x: x['id']), key=lambda x: x['id']):
    temp = {}
    for d in values:
        temp.update(d)
    desired_output.append(temp)
Result:
[{'id': 1, 'value': 20, 'sum': 10, 'total': 30}, {'id': 2, 'value': 21, 'sum': 11, 'total': 32}]
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
# combine all lists
d = {} # id -> dict
for l in [list1, list2, list3]:
    for list_d in l:
        if 'id' not in list_d: continue
        id = list_d['id']
        if id not in d:
            d[id] = dict(list_d)  # copy, so the dicts in the input lists are not modified
        else:
            d[id].update(list_d)
# dicts with same id are grouped together since id is used as key
res = [v for v in d.values()]
print(res)
You can first build a dict of dicts, then turn it into a list:
from itertools import chain
from collections import defaultdict
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
dict_out = defaultdict(dict)
for d in chain(list1, list2, list3):
    dict_out[d['id']].update(d)
out = list(dict_out.values())
print(out)
# [{'id': 1, 'value': 20, 'sum': 10, 'total': 30}, {'id': 2, 'value': 21, 'sum': 11, 'total': 32}]
itertools.chain allows you to iterate on all the dicts contained in the 3 lists. We build a dict dict_out having the id as key, and the corresponding dict being built as value. This way, we can easily update the already built part with the small dict of our current iteration.
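Since the question mentions having more than two lists, the same idea can be wrapped in a function that accepts any number of them. This is a sketch only: the name merge_on_key and its key parameter are made up here, and it assumes every dict contains the key.

```python
from collections import defaultdict
from itertools import chain

def merge_on_key(*lists, key="id"):
    """Merge dicts from any number of lists that share the same `key` value."""
    merged = defaultdict(dict)
    for d in chain.from_iterable(lists):
        # update() copies the items into a fresh dict, so the inputs stay untouched
        merged[d[key]].update(d)
    return list(merged.values())

list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 2, 'sum': 11}, {'id': 1, 'sum': 10}]  # order need not match list1
print(merge_on_key(list1, list2))
# [{'id': 1, 'value': 20, 'sum': 10}, {'id': 2, 'value': 21, 'sum': 11}]
```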
Here is a functional approach that does not use itertools (which is excellent for rapid development work).
This solution works for any number of lists, since the function takes a variable number of arguments; it also lets the caller choose the return type (list or dict).
By default it returns a list, as you want; pass as_list=False to get a dictionary instead.
I preferred a dictionary for the intermediate result because lookups by key are fast.
Have a look at the get_packed_list() function below.
get_packed_list()
def get_packed_list(*dicts_lists, as_list=True):
    output = {}
    for dicts_list in dicts_lists:
        for dictionary in dicts_list:
            _id = dictionary.pop("id") # id() is in-built function so preferred _id
            if _id not in output:
                # Create new id
                output[_id] = {"id": _id}
            for key in dictionary:
                output[_id][key] = dictionary[key]
            dictionary["id"] = _id # push back the 'id' after work (call by reference mechanism)
    if as_list:
        return [output[key] for key in output]
    return output # dictionary
Test
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
output = get_packed_list(list1, list2, list3)
print(output)
# [{'id': 1, 'value': 20, 'sum': 10, 'total': 30}, {'id': 2, 'value': 21, 'sum': 11, 'total': 32}]
output = get_packed_list(list1, list2, list3, as_list=False)
print(output)
# {1: {'id': 1, 'value': 20, 'sum': 10, 'total': 30}, 2: {'id': 2, 'value': 21, 'sum': 11, 'total': 32}}
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
print(list1+list2+list3)
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
result = []
for i in range(0,len(list1)):
    final_dict = dict(list(list1[i].items()) + list(list2[i].items()) + list(list3[i].items()))
    result.append(final_dict)
print(result)
output : [{'id': 1, 'value': 20, 'sum': 10, 'total': 30}, {'id': 2, 'value': 21, 'sum': 11, 'total': 32}]
I am trying to update a document in a collection:
for bookerName,bookingId in zip(bookerNames,bookingIds):
    Person.update_one({'Name': bookerName}, {'$push': {'Bookings': bookingId}})
I am trying to add a new bookingId to the list of Bookings this Person already has.
I receive the following error:
bson.errors.InvalidDocument: Cannot encode object: 0
From what I understand about this error, it means Mongo doesn't know how to encode a list and treats it as a custom type.
However, this is strange since I can do:
for bookerName in BookerNames:
    personRecord = {"Name":bookerName,"Bookings":[1,2,3,4]}
    Person.insert_one(personRecord)
And it works perfectly.
I have also tried to update using $set and even by deleting the old record and trying to insert a new one with the updated list.
Any idea why this is happening or how can I update a list in a document?
I am using Python 3.5, PyMongo 3.0.3 and MongoDB 3.0.7. Could the difference in version between PyMongo and MongoDB be the issue?
EDIT:
for bookerName in bookerNames:
    bookingList = bookersToBookings[bookerName]
    personRecord = {"Name":bookerName,"Bookings":bookingList}
    Person.insert_one(personRecord)
Where bookersToBookings is a dictionary mapping bookers to their bookingIds, as list.
This does not work, although if I make a record with a list like [1,2,3], it works.
EDIT 2:
for bookerName in bookerNames:
    personRecord = {"Name":bookerName,"Bookings":[]}
    Person.insert_one(personRecord)
for x in Person.find():
    print(x)
This works. It prints:
{'Name': 'Alice', '_id': ObjectId('56241ad79e44c71641efa383'), 'Bookings': []}
{'Name': 'John', '_id': ObjectId('56241ad79e44c71641efa384'), 'Bookings': []}
{'Name': 'Jane', '_id': ObjectId('56241ad79e44c71641efa385'), 'Bookings': []}
{'Name': 'Mary', '_id': ObjectId('56241ad79e44c71641efa386'), 'Bookings': []}
{'Name': 'Dan', '_id': ObjectId('56241ad79e44c71641efa387'), 'Bookings': []}
This also works:
for bookerName in bookerNames:
    personRecord = {"Name":bookerName,"Bookings":[1,2,3]}
    Person.insert_one(personRecord)
Result:
{'Bookings': [1, 2, 3], '_id': ObjectId('56241b029e44c716497b60b2'), 'Name': 'Alice'}
{'Bookings': [1, 2, 3], '_id': ObjectId('56241b029e44c716497b60b3'), 'Name': 'John'}
{'Bookings': [1, 2, 3], '_id': ObjectId('56241b029e44c716497b60b4'), 'Name': 'Jane'}
{'Bookings': [1, 2, 3], '_id': ObjectId('56241b029e44c716497b60b5'), 'Name': 'Mary'}
{'Bookings': [1, 2, 3], '_id': ObjectId('56241b029e44c716497b60b6'), 'Name': 'Dan'}
Even this works:
for bookerName in bookerNames:
    y = [1,2,3,4]
    personRecord = {"Name":bookerName,"Bookings":[x for x in y]}
    Person.insert_one(personRecord)
Result:
{'_id': ObjectId('56241b5a9e44c7165395a0f0'), 'Bookings': [1, 2, 3, 4], 'Name': 'Alice'}
{'_id': ObjectId('56241b5a9e44c7165395a0f1'), 'Bookings': [1, 2, 3, 4], 'Name': 'John'}
{'_id': ObjectId('56241b5a9e44c7165395a0f2'), 'Bookings': [1, 2, 3, 4], 'Name': 'Jane'}
{'_id': ObjectId('56241b5a9e44c7165395a0f3'), 'Bookings': [1, 2, 3, 4], 'Name': 'Mary'}
{'_id': ObjectId('56241b5a9e44c7165395a0f4'), 'Bookings': [1, 2, 3, 4], 'Name': 'Dan'}
This, however, does not work:
for bookerName in bookerNames:
    y = bookersToBookings[bookerName]
    personRecord = {"Name":bookerName,"Bookings":[x for x in y]}
    Person.insert_one(personRecord)
bookersToBookings looks like this:
{'Mary': [5, 9], 'Jane': [4, 7, 8, 13, 14], 'Alice': [0, 6, 10], 'Dan': [12], 'John': [1, 2, 3, 11]}
EDIT 3 - SOLVED:
For $%#^ sake! The list was made of NumPy int64 values, not Python ints. It works now.
Thanks for the help!
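For anyone hitting the same error, one way to apply this fix is to convert the values to built-in Python types before building the record. The sketch below is an illustration, not the asker's actual code: to_builtin is a hypothetical helper, and it relies on NumPy scalars exposing an .item() method that returns the equivalent Python object. If the IDs live in a NumPy array, calling .tolist() on the array should have the same effect.

```python
def to_builtin(value):
    """Return a plain Python object for NumPy-style scalars; pass other values through."""
    item = getattr(value, "item", None)  # NumPy scalars expose .item()
    return item() if callable(item) else value

try:
    import numpy as np
    booking_ids = list(np.array([0, 6, 10], dtype=np.int64))  # elements are np.int64
except ImportError:
    booking_ids = [0, 6, 10]  # fallback so the sketch still runs without NumPy

clean_ids = [to_builtin(b) for b in booking_ids]
print([type(b).__name__ for b in clean_ids])  # ['int', 'int', 'int']

# A record built from clean_ids holds only built-in ints
person_record = {"Name": "Alice", "Bookings": clean_ids}
```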
I have a dictionary of dictionaries. Within these subdictionaries, I have two keys - ui_section and section_order - that determine if the value of this subdictionary is shown in a specific portion of the UI and, if so, in what order it appears. My dictionary looks like this:
MASTER_DICT = {
'key1': {'ui_section':[1,2],'section_order':1, 'value': 'key1'},
'key2': {'ui_section':[1],'section_order':2, 'value': 'key2'},
'key3': {'ui_section':[1,2],'section_order':3, 'value': 'key3'},
'key4': {'ui_section':[1],'section_order':4, 'value': 'key4'},
'key5': {'ui_section':[1],'section_order':5, 'value': 'key5'},
'key6': {'ui_section':[1],'section_order':6, 'value': 'key6'},
'key7': {'ui_section':[1],'section_order':7, 'value': 'key7'},
'key8': {'ui_section':[1],'section_order':8, 'value': 'key8'},
'key9': {'ui_section':[1],'section_order':9, 'value': 'key9'},
}
The ui_section is a list of possible sections the key can appear in. I determine this via the following code:
def show_section_ui(master_dict, section=None):
    if section:
        ui_sections = []
        # Find the keys that are part of this section
        for k in master_dict.keys():
            try:
                if section in master_dict[k]['ui_section']:
                    ui_sections.append(master_dict[k])
            except AttributeError:
                pass
        # Order the keys by sort order
        ui_sections.sort(key=lambda x: x['section_order'])
        return ui_sections
    else:
        return None
This portion of the code works. The output below shows that order is correct for both sections 1 and 2.
>>> pprint.pprint(show_section_ui(MASTER_DICT, 1))
[{'section_order': 1, 'ui_section': [1,2], 'value': 'key1'},
{'section_order': 2, 'ui_section': [1], 'value': 'key2'},
{'section_order': 3, 'ui_section': [1,2], 'value': 'key3'},
{'section_order': 4, 'ui_section': [1], 'value': 'key4'},
{'section_order': 5, 'ui_section': [1], 'value': 'key5'},
{'section_order': 6, 'ui_section': [1], 'value': 'key6'},
{'section_order': 7, 'ui_section': [1], 'value': 'key7'},
{'section_order': 8, 'ui_section': [1], 'value': 'key8'},
{'section_order': 9, 'ui_section': [1], 'value': 'key9'}]
>>> pprint.pprint(show_section_ui(MASTER_DICT, 2))
[{'section_order': 1, 'ui_section': [1,2], 'value': 'key1'},
{'section_order': 3, 'ui_section': [1,2], 'value': 'key3'}]
My problem is that the section_order needs to have sort orders per ui_section. For example, in the above outputs, in section 2, I'd like key3 to be first. My initial thought was to make section_order a list as well. But, I'm not sure how to adjust this line to properly account for the list (and to select the correct index to sort by then)
ui_sections.sort(key=lambda x: x['section_order'])
My intention was to do something like this:
MASTER_DICT = {
'key1': {'ui_section':[1,2],'section_order':[1,2], 'value': 'key1'},
'key2': {'ui_section':[1],'section_order':[2], 'value': 'key2'},
'key3': {'ui_section':[1,2],'section_order':[3,1], 'value': 'key3'},
}
Getting me output like this:
>>> pprint.pprint(show_section_ui(MASTER_DICT, 2))
[{'section_order': [3,1], 'ui_section': [1,2], 'value': 'key3'},
{'section_order': [1,2], 'ui_section': [1,2], 'value': 'key1'}]
How can I sort by the ui_section and the appropriate index within the key?
I think I understood. If you want to sort a list of items according to the order of another list of items, you could do something like this. It is built up step by step; the last line is what you need.
We need itertools.count(), which is an infinite incrementing range, used to apply an index.
import itertools
These should be sorted by 'value':
>>> values = [{"name": "A", "value": 10},
{"name": "B", "value": 8},
{"name": "C", "value": 9}]
These are in the same input order as values, and should end up sorted in the same order:
>>> to_sort = [{"name": "A", "payload": "aaa"},
{"name": "B", "payload": "bbb"},
{"name": "C", "payload": "ccc"}]
Zip the values with their indexes. This annotates each item to include its original order. This is a list of pairs of (object, index)
>>> list(zip(values, itertools.count()))  # list() is needed on Python 3, where zip is lazy
[({'name': 'A', 'value': 10}, 0),
({'name': 'B', 'value': 8}, 1),
({'name': 'C', 'value': 9}, 2)]
Now sort by the key 'value'. The x[0] is to get the first item of the pair (the object).
>>> sorted(zip(values, itertools.count()), key=lambda x: x[0]["value"])
[({'name': 'B', 'value': 8}, 1),
({'name': 'C', 'value': 9}, 2),
({'name': 'A', 'value': 10}, 0)]
Now retrieve the indexes from the pairs to return the new order of the original indexes.
>>> list(map(lambda x: x[1],
             sorted(zip(values, itertools.count()),
                    key=lambda x: x[0]["value"])))
[1, 2, 0]
Now use those indexes to map over the to_sort list and retrieve the items at those indexes.
>>> list(map(lambda i: to_sort[i],
             map(lambda x: x[1],
                 sorted(zip(values, itertools.count()),
                        key=lambda x: x[0]["value"]))))
[{'name': 'B', 'payload': 'bbb'},
{'name': 'C', 'payload': 'ccc'},
{'name': 'A', 'payload': 'aaa'}]
I hope that answers your question. It means changing MASTER_DICT to be a list, but I think that's the best representation for it anyway.
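On Python 3 the same idea can be written without zip and map by sorting the indexes directly. This is a sketch of an equivalent approach rather than the code from the steps above; the names order and reordered are chosen here for illustration.

```python
values = [{"name": "A", "value": 10},
          {"name": "B", "value": 8},
          {"name": "C", "value": 9}]
to_sort = [{"name": "A", "payload": "aaa"},
           {"name": "B", "payload": "bbb"},
           {"name": "C", "payload": "ccc"}]

# Indexes of `values`, ordered by each item's "value" field
order = sorted(range(len(values)), key=lambda i: values[i]["value"])
print(order)  # [1, 2, 0]

# Apply the same order to the parallel list
reordered = [to_sort[i] for i in order]
print([d["name"] for d in reordered])  # ['B', 'C', 'A']
```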
I don't have a nice one line code change for you, but you can replace the line you have:
ui_sections.sort(key=lambda x: x['section_order'])
With this:
sort_orders = []
for s in ui_sections:
    ndx = s['ui_section'].index(section)
    # This next line makes the assumption that you ALWAYS have a section_order
    # for every ui_section listed. If not, you'll get an IndexError
    sort_orders.append(s['section_order'][ndx])

# Magic happens here
sorted_sections = [x for y, x in sorted(zip(sort_orders,ui_sections))]

return sorted_sections
Output:
>>> pprint.pprint(show_section_ui(MASTER_DICT, 2))
[{'section_order': [3, 1], 'ui_section': [1, 2], 'value': 'key3'},
{'section_order': [1, 2], 'ui_section': [1, 2], 'value': 'key1'}]
>>> pprint.pprint(show_section_ui(MASTER_DICT, 1))
[{'section_order': [1, 2], 'ui_section': [1, 2], 'value': 'key1'},
{'section_order': [2], 'ui_section': [1], 'value': 'key2'},
{'section_order': [3, 1], 'ui_section': [1, 2], 'value': 'key3'},
{'section_order': [4], 'ui_section': [1], 'value': 'key4'},
{'section_order': [5], 'ui_section': [1], 'value': 'key5'},
{'section_order': [6], 'ui_section': [1], 'value': 'key6'},
{'section_order': [7], 'ui_section': [1], 'value': 'key7'},
{'section_order': [8], 'ui_section': [1], 'value': 'key8'},
{'section_order': [9], 'ui_section': [1], 'value': 'key9'}]
Adding key8 to the second ui_section at position 3, and key7 at position 4:
[{'section_order': [3, 1], 'ui_section': [1, 2], 'value': 'key3'},
{'section_order': [1, 2], 'ui_section': [1, 2], 'value': 'key1'},
{'section_order': [8, 3], 'ui_section': [1, 2], 'value': 'key8'},
{'section_order': [7, 4], 'ui_section': [1, 2], 'value': 'key7'}]
This is utilizing this answer. First, though, it finds the index at which the section is listed in ui_section:
ndx = s['ui_section'].index(section)
The value at this location in section_order is then added to the sort_orders list. Note that the code provided does not check that this index is valid (i.e., that section_order has a value at that position), and will throw an IndexError if it is not.
sort_orders.append(s['section_order'][ndx])
Next it zips the two lists together so that you have a list of tuples containing the sort order and the section dictionary.
[(3, {'ui_section': [1, 2], 'section_order': [8, 3], 'value': 'key8'}),
(1, {'ui_section': [1, 2], 'section_order': [3, 1], 'value': 'key3'}),
(2, {'ui_section': [1, 2], 'section_order': [1, 2], 'value': 'key1'}),
(4, {'ui_section': [1, 2], 'section_order': [7, 4], 'value': 'key7'})
]
Then we sort that based on the first position in the tuple. Then we unzip it and pull back the sorted information. All of that occurs in this line:
sorted_sections = [x for y, x in sorted(zip(sort_orders,ui_sections))]
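As an alternative sketch, the same lookup can be done inside a sort key function, which avoids building the intermediate sort_orders list. It also sidesteps one caveat of sorting zipped tuples on Python 3: if two entries share the same sort order, the tuple comparison falls through to the dicts, which are not orderable and raise a TypeError. The function below is a rewrite for illustration, using the three-key MASTER_DICT from the question; like the code above, it assumes every ui_section entry has a matching section_order entry.

```python
def show_section_ui(master_dict, section=None):
    """Return the sub-dicts shown in `section`, ordered by that section's own sort order."""
    if not section:
        return None
    ui_sections = [v for v in master_dict.values() if section in v['ui_section']]
    # Find the position of `section` in ui_section, then use the
    # section_order entry at the same position as the sort key.
    ui_sections.sort(key=lambda s: s['section_order'][s['ui_section'].index(section)])
    return ui_sections

MASTER_DICT = {
    'key1': {'ui_section': [1, 2], 'section_order': [1, 2], 'value': 'key1'},
    'key2': {'ui_section': [1], 'section_order': [2], 'value': 'key2'},
    'key3': {'ui_section': [1, 2], 'section_order': [3, 1], 'value': 'key3'},
}
print([s['value'] for s in show_section_ui(MASTER_DICT, 2)])  # ['key3', 'key1']
print([s['value'] for s in show_section_ui(MASTER_DICT, 1)])  # ['key1', 'key2', 'key3']
```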