Remove duplicate dictionaries from a list of dictionaries in Python

I want to remove duplicate dictionaries from lists of dictionaries. I am trying to write configurable code that works on any field rather than code specific to one field.
Input Data
dct = {'Customer_Number': 90617174,
'Phone_Number': [{'Phone_Type': 'Mobile', 'Phone': [12177218280.0]},
{'Phone_Type': 'Mobile', 'Phone': [12177218280.0]}],
'Email': [{'Email_Type': 'Primary',
'Email': ['saman.zonouz#rutgers.edu']},
{'Email_Type': 'Primary',
'Email': ['saman.zonouz#rutgers.edu']}]
}
Expected Output:
{'Customer_Number': 90617174,
'Email': [{'Email_Type': 'Primary',
'Email': ['saman.zonouz#rutgers.edu']}],
'Phone_Number': [{'Phone_Type': 'Mobile',
'Phone': [12177218280]}]}
**Code tried:**
res_list = []
for key,value in address_dic.items():
    if isinstance(value,list):
        for i in range(len(value)):
            if value[i] not in value[i + 1:]:
                res_list.append(value[i])
    dic.append((res_list))
**Output Getting**
type: [[{'Email_Type': 'Primary', 'Email': ['saman.zonouz#rutgers.edu']}, {'Email_Type': 'alternate', 'Email': ['samance#gmail.com', 'saman.zonouz#rutgers.edu']}], [], [{'Phone_Type': 'Mobile', 'Phone': [12177218280.0]}, {'Phone_Type': 'work', 'Phone': [nan]}, {'Phone_Type': 'home', 'Phone': [2177218280.0]}], []]

Write a function to dedupe lists:
def dedupe_list(lst):
    result = []
    for el in lst:
        if el not in result:
            result.append(el)
    return result

def dedupe_dict_values(dct):
    result = {}
    for key in dct:
        if type(dct[key]) is list:
            result[key] = dedupe_list(dct[key])
        else:
            result[key] = dct[key]
    return result
Test it:
deduped_dict = {'Customer_Number': 90617174,
'Email': [{'Email_Type': 'Primary',
'Email': ['saman.zonouz#rutgers.edu']}],
'Phone_Number': [{'Phone_Type': 'Mobile',
'Phone': [12177218280]}]}
dedupe_dict_values(dct) == deduped_dict
## Out[12]: True
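The `el not in result` check compares each element with everything kept so far, so it is quadratic in the list length. For longer lists, one possible variation is to remember a canonical string for each element in a set. This is only a sketch, assuming every element is JSON-serializable; the name `dedupe_list_fast` is hypothetical and not part of the answer above.

```python
import json

def dedupe_list_fast(lst):
    """Order-preserving dedupe; assumes every element is JSON-serializable."""
    seen = set()
    result = []
    for el in lst:
        # Dicts are unhashable, so use a canonical JSON string as the set key.
        key = json.dumps(el, sort_keys=True)
        if key not in seen:
            seen.add(key)
            result.append(el)
    return result

phones = [{'Phone_Type': 'Mobile', 'Phone': [12177218280.0]},
          {'Phone': [12177218280.0], 'Phone_Type': 'Mobile'}]
print(dedupe_list_fast(phones))
```

Unlike the `==` comparison used by `dedupe_list`, this treats `1` and `1.0` as different values, because they serialize to different strings.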

Related

Sum up how many fields are empty or do not exist in a dict

I have a problem. I have a list, myList, that contains dictionaries. I want to count how often the field dataOriginSystem is empty or does not exist. Unfortunately, I get the wrong result with if(key_nested == 'dataOriginSystem'): ... else: count =+ 1
I think the reason lies in the if condition: the else branch counts every nested key that is not dataOriginSystem, because I loop through all the nested keys. Also, is there a way to make this more efficient?
How can I query how many of the field dataOriginSystem are empty or non-existent?
count = 0
for element in myList:
    for key in element.keys():
        if(key == 'metaData'):
            for key_nested in element[key].keys():
                if(key_nested == 'dataOriginSystem'):
                    if(key_nested == None):
                        count += 1
                else:
                    count += 1
print(count)
myList = [
{'_id': 'orders/213123',
'contactEditor': {'name': 'Max Power',
'phone': '1234567',
'email': 'max#power.com'},
'contactSoldToParty': {'name': 'Max Not',
'phone': '123456789',
'email': 'maxnot#power.com'},
'isCompleteDelivery': False,
'metaData': {'dataOriginSystem': 'Goods',
'dataOriginWasCreatedTime': '10:12:12',},
'orderDate': '2021-02-22',
'orderDateBuyer': '2021-02-22',
},
{'_id': 'orders/12323',
'contactEditor': {'name': 'Max Power2',
'phone': '1234567',
'email': 'max#power.com'},
'contactSoldToParty': {'name': 'Max Not',
'phone': '123456789',
'email': 'maxnot#power.com'},
'isCompleteDelivery': False,
'metaData': {'dataOriginSystem': 'Goods',
'dataOriginWasCreatedTime': '10:12:12',},
'orderDate': '2021-02-22',
'orderDateBuyer': '2021-02-22',
},
{'_id': 'orders/12323',
'contactEditor': {'name': 'Max Power2',
'phone': '1234567',
'email': 'max#power.com'},
'contactSoldToParty': {'name': 'Max Not',
'phone': '123456789',
'email': 'maxnot#power.com'},
'isCompleteDelivery': False,
'metaData': {
'dataOriginWasCreatedTime': '10:12:12',},
'orderDate': '2021-02-22',
'orderDateBuyer': '2021-02-22',
},
{'_id': 'orders/12323',
'contactEditor': {'name': 'Max Power2',
'phone': '1234567',
'email': 'max#power.com'},
'contactSoldToParty': {'name': 'Max Not',
'phone': '123456789',
'email': 'maxnot#power.com'},
'isCompleteDelivery': False,
'metaData': {'dataOriginSystem': None,
'dataOriginWasCreatedTime': '10:12:12',},
'orderDate': '2021-02-22',
'orderDateBuyer': '2021-02-22',
},
]
Result should be
[OUT] 2
# Because of the last two elements:
# in the first of them the field does not exist,
# and in the second it is None.
You can use dict.get directly on the nested key, returning a default value of None, and then count the number of None values you get:
sum(d['metaData'].get('dataOriginSystem', None) is None for d in myList)
Output
2
You don't have to loop through the keys. Access the item you want directly and increment the counter if the item is not found, or found but None.
count = 0
for element in myList:
    if element["metaData"].get("dataOriginSystem", None) is None:
        count += 1
print(count)
You can try something like this, which also counts empty strings:
non_existent = len([item for item in myList if item['metaData'].get('dataOriginSystem') in (None, '')])
print(non_existent)
Output...
2
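All three answers index `d['metaData']` directly, which raises a KeyError if a record has no metaData at all. A more defensive variant might look like the sketch below. The function name `count_missing` and the sample records are hypothetical, and it assumes that a missing outer dict should count as "empty" too.

```python
def count_missing(records, outer='metaData', inner='dataOriginSystem'):
    """Count records whose nested field is absent, None, or an empty string."""
    count = 0
    for record in records:
        nested = record.get(outer) or {}  # tolerate a missing or None outer dict
        if nested.get(inner) in (None, ''):
            count += 1
    return count

sample = [
    {'metaData': {'dataOriginSystem': 'Goods'}},
    {'metaData': {}},
    {'metaData': {'dataOriginSystem': None}},
    {'_id': 'orders/1'},  # no metaData key at all
]
print(count_missing(sample))  # 3
```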

Writing a GET functionality for json-like data in Python

I am working on a coding challenge for self-development and I came across a question where I am given an input like this:
add {"id":1,"last":"Doe","first":"John","location":{"city":"Oakland","state":"CA","postalCode":"94607"},"active":true}
add {"id":2,"last":"Doe","first":"Jane","location":{"city":"San Francisco","state":"CA","postalCode":"94105"},"active":true}
add {"id":3,"last":"Black","first":"Jim","location":{"city":"Spokane","state":"WA","postalCode":"99207"},"active":true}
add {"id":4,"last":"Frost","first":"Jack","location":{"city":"Seattle","state":"WA","postalCode":"98204"},"active":false}
get {"location":{"state":"WA"},"active":true}
get {"id":1}
get {"active":true}
delete {"active":true}
get {}
And what I am doing is adding the entries that start with add to a list called database = []:
json_input = []
database = []
for line in sys.stdin:
    json_input.append(line.split("', "))
for i in range(0, len(json_input)):
    if json_input[i][0] == 'add':
        database.append(json_input[i][1])
What I want to do is print every entry that matches what follows get, and delete every entry that matches what follows delete. This is where I am stuck. Currently, this is what json_input looks like, and database is empty:
[
['add {"id":1,"last":"Doe","first":"John","location":{"city":"Oakland","state":"CA","postalCode":"94607"},"active":true}\n'],
['add {"id":2,"last":"Doe","first":"Jane","location":{"city":"San Francisco","state":"CA","postalCode":"94105"},"active":true}\n'],
['add {"id":3,"last":"Black","first":"Jim","location":{"city":"Spokane","state":"WA","postalCode":"99207"},"active":true}\n'],
['add {"id":4,"last":"Frost","first":"Jack","location":{"city":"Seattle","state":"WA","postalCode":"98204"},"active":false}\n'],
['get {"location":{"state":"WA"},"active":true}\n'], ['get {"id":1}\n'],
['get {"active":true}\n'], ['delete {"active":true}\n'],
['get {}']
]
Perhaps an easy-to-read way to handle this would be a simple class that maintains a list of records. You can add methods for the various commands you want to handle. Then it's just a matter of defining the methods and processing the input to pass to the methods. Here's a possible way (without any frills like error checking):
import json
raw_data = '''add {"id":1,"last":"Doe","first":"John","location":{"city":"Oakland","state":"CA","postalCode":"94607"},"active":true}
add {"id":2,"last":"Doe","first":"Jane","location":{"city":"San Francisco","state":"CA","postalCode":"94105"},"active":true}
add {"id":3,"last":"Black","first":"Jim","location":{"city":"Spokane","state":"WA","postalCode":"99207"},"active":true}
add {"id":4,"last":"Frost","first":"Jack","location":{"city":"Seattle","state":"WA","postalCode":"98204"},"active":false}
get {"location":{"state":"WA"},"active":true}
get {"id":1}
get {"active":true}
delete {"active":true}
get {}'''
class Data:
    @staticmethod
    def matches(obj, query):
        # A non-dict query is a leaf value: compare it directly.
        if not isinstance(query, dict):
            return obj == query
        # Otherwise every key in the query must match recursively.
        return all(Data.matches(obj.get(key), q) for key, q in query.items())

    def __init__(self):
        self.data = []

    def add(self, record):
        self.data.append(record)

    def get(self, query):
        for item in self.data:
            if Data.matches(item, query):
                print(item)

    def delete(self, query):
        self.data = [record for record in self.data if not Data.matches(record, query)]

data = Data()
for line in raw_data.split('\n'):
    command, line = line.split(None, 1)
    # Dispatch to the method named by the command (add, get or delete).
    command = getattr(data, command)
    command(json.loads(line))
This will print the active record from WA, then the record with id 1, then all the active:True records. After deleting the True records, it prints everything that is left (the result of the {} query), which is only the active:False record:
{'id': 3, 'last': 'Black', 'first': 'Jim', 'location': {'city': 'Spokane', 'state': 'WA', 'postalCode': '99207'}, 'active': True}
{'id': 1, 'last': 'Doe', 'first': 'John', 'location': {'city': 'Oakland', 'state': 'CA', 'postalCode': '94607'}, 'active': True}
{'id': 1, 'last': 'Doe', 'first': 'John', 'location': {'city': 'Oakland', 'state': 'CA', 'postalCode': '94607'}, 'active': True}
{'id': 2, 'last': 'Doe', 'first': 'Jane', 'location': {'city': 'San Francisco', 'state': 'CA', 'postalCode': '94105'}, 'active': True}
{'id': 3, 'last': 'Black', 'first': 'Jim', 'location': {'city': 'Spokane', 'state': 'WA', 'postalCode': '99207'}, 'active': True}
{'id': 4, 'last': 'Frost', 'first': 'Jack', 'location': {'city': 'Seattle', 'state': 'WA', 'postalCode': '98204'}, 'active': False}
If this were a test or a serious coding challenge, you would probably want to look carefully at matches() to make sure it properly handles edge cases (I didn't do that).
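Two of those edge cases are easy to see: `obj.get(key)` returns None for a missing key, so a query such as {"nickname": None} matches records that have no such key, and a nested query against a non-dict value raises AttributeError. A stricter standalone version might look like the sketch below. It is one possible interpretation of the matching rules, not the challenge's official semantics, and the sample record is made up for illustration.

```python
def matches(obj, query):
    """Return True if every key in query is present in obj with a matching value."""
    if not isinstance(query, dict):
        return obj == query
    if not isinstance(obj, dict):
        return False  # a nested query cannot match a non-dict value
    return all(key in obj and matches(obj[key], q) for key, q in query.items())

record = {"id": 3, "location": {"city": "Spokane", "state": "WA"}, "active": True}
print(matches(record, {"location": {"state": "WA"}, "active": True}))  # True
print(matches(record, {"nickname": None}))                             # False
print(matches(record, {"id": {"value": 3}}))                           # False
```

An empty query still matches every dict record, which keeps the behavior of get {} from the example above.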

Search for string in values of a dict and return new dict

I have the following dict
{'returnData': [{'eMail': None,
'firstName': 'Peter',
'id': '1234',
'name': 'Parker'},
{'eMail': 'lucky#mail.example',
'firstName': 'Lucky',
'id': '123',
'name': 'Luke'},
{'eMail': 'micky#mail.example',
'firstName': 'Micky',
'id': '3456',
'name': 'Mouse'}],
'status': {'errorCode': 0,
'message': None,
'subErrorCode': None,
'success': True}}
How would I search the dict for the values of eMail, firstName and name, and return all matches in a new dict?
For example, if I search for mail.example, it should return only two entries.
Python3
Solution 1:
list(filter(lambda profile: '#mail.example' in str(profile['eMail']), data['returnData']))
To search in all values of each dict (values() rather than items(), so that key names are not searched as well):
list(filter(lambda profile: '#mail.example' in str(
    list(profile.values())), data['returnData']))
Solution 2:
Create a search function.
data = {'returnData': [{'eMail': None,
'firstName': 'Peter',
'id': '1234',
'name': 'Parker'},
{'eMail': 'lucky#mail.example',
'firstName': 'Lucky',
'id': '123',
'name': 'Luke'},
{'eMail': 'micky#mail.example',
'firstName': 'Micky',
'id': '3456',
'name': 'Mouse'}],
'status': {'errorCode': 0,
'message': None,
'subErrorCode': None,
'success': True}}
def search(search_term, field, data):
    result = []
    for item in data:
        if search_term in str(item[field]):
            result.append(item)
    return result
print(search("#mail.example", "eMail", data['returnData']))
You may want to use the re module here:
import re
d = {'returnData': [{'eMail': None,
'firstName': 'Peter',
'id': '1234',
'name': 'Parker'},
{'eMail': 'lucky#mail.example',
'firstName': 'Lucky',
'id': '123',
'name': 'Luke'},
{'eMail': 'micky#mail.example',
'firstName': 'Micky',
'id': '3456',
'name': 'Mouse'}],
'status': {'errorCode': 0,
'message': None,
'subErrorCode': None,
'success': True}}
pattern = re.compile(r'mail\.example')  # escape the dot so it matches literally
out = []
for entry in d['returnData']:
    email = entry.get('eMail', None)
    if email and pattern.search(email):
        out.append(email)
print(out)
Iterate over dict_['returnData'] to look for the eMail:
dict_ = {'returnData': [{'eMail': None,
'firstName': 'Peter',
'id': '1234',
'name': 'Parker'},
{'eMail': 'lucky#mail.example',
'firstName': 'Lucky',
'id': '123',
'name': 'Luke'},
{'eMail': 'micky#mail.example',
'firstName': 'Micky',
'id': '3456',
'name': 'Mouse'}],
'status': {'errorCode': 0,
'message': None,
'subErrorCode': None,
'success': True}}
for elem in dict_['returnData']:
    print(elem['eMail'])
Note: use if elem['eMail']: print(elem['eMail']) to print only the two entries that have an address.
OUTPUT:
None
lucky#mail.example
micky#mail.example
EDIT:
even better, using lists:
firstNames = []
names = []
emails = []
for elem in dict_['returnData']:
    if elem['firstName']:
        firstNames.append(elem['firstName'])
    if elem['name']:
        names.append(elem['name'])
    if elem['eMail']:
        emails.append(elem['eMail'])
print("First Names: {}".format(firstNames))
print("Names: {}".format(names))
print("Emails: {}".format(emails))
OUTPUT:
First Names: ['Peter', 'Lucky', 'Micky']
Names: ['Parker', 'Luke', 'Mouse']
Emails: ['lucky#mail.example', 'micky#mail.example']
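The question asks for a search across eMail, firstName and name that returns the matching entries themselves. Combining the ideas from the answers above, that could be sketched as follows. The helper name `search_fields` is hypothetical, the sample list repeats the question's returnData, and a plain substring test is assumed to be enough.

```python
def search_fields(records, term, fields=('eMail', 'firstName', 'name')):
    """Return the records where term occurs in any of the given fields."""
    return [
        record for record in records
        # A None or missing value is treated as an empty string.
        if any(term in str(record.get(field) or '') for field in fields)
    ]

profiles = [
    {'eMail': None, 'firstName': 'Peter', 'id': '1234', 'name': 'Parker'},
    {'eMail': 'lucky#mail.example', 'firstName': 'Lucky', 'id': '123', 'name': 'Luke'},
    {'eMail': 'micky#mail.example', 'firstName': 'Micky', 'id': '3456', 'name': 'Mouse'},
]
print(search_fields(profiles, 'mail.example'))
```

Because id is not in the default field list, searching for '1234' returns nothing; pass a different `fields` tuple to change that.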

Filter/group dictionary by nested value

Here's a simplified example of some data I have:
{"id": "1234565", "fields": {"name": "john", "email":"john#example.com", "country": "uk"}}
The whole nested dictionary is a bigger list of address data. The goal is to create pairs of people from the list with randomized partners, where partners from the same country should be preferred. So my first real issue is to find a good way to group them by that country value.
I'm sure there's a smarter way to do this than iterating through the dict and writing all records out to some new list/dict?
I think this is close to what you need:
import itertools
result = {key: list(value) for key, value in itertools.groupby(people, lambda item: item["fields"]["country"])}
What this does is use itertools.groupby to group all people in the people list by their specified country. The resulting dictionary has countries as keys, and the unpacked groupings (matching people) as values. Input is expected as a list of dictionaries like the one in your example:
people = [{"id": "1234565", "fields": {"name": "john", "email":"john#example.com", "country": "uk"}},
{"id": "654321", "fields": {"name": "sam", "email":"sam#example.com", "country": "uk"}}]
Sample output:
>>> print(result)
{'uk': [{'fields': {'name': 'john', 'email': 'john#example.com', 'country': 'uk'}, 'id': '1234565'}, {'fields': {'name': 'sam', 'email': 'sam#example.com', 'country': 'uk'}, 'id': '654321'}]}
For a cleaner result, the looping construct can be tweaked so that only the ID of each person is included in the result dict:
result = {key:[i["id"] for i in value] for key, value in itertools.groupby(people, lambda item: item["fields"]["country"])}
>>> print(result)
{'uk': ['1234565', '654321']}
EDIT: Sorry, I forgot about the sorting. itertools.groupby only groups consecutive items, so sort the list of people by country before putting it through groupby:
people = sorted(people, key=lambda item: item["fields"]["country"])
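To make the sorting requirement concrete, here is a small self-contained sketch that sorts and groups in one step. The function name `group_by_country` and the three sample records are made up for illustration.

```python
from itertools import groupby

def country_of(person):
    return person["fields"]["country"]

def group_by_country(people):
    """Group person dicts by country; sorts first because groupby only merges neighbours."""
    ordered = sorted(people, key=country_of)
    return {country: list(members) for country, members in groupby(ordered, key=country_of)}

people = [
    {"id": "1", "fields": {"name": "john", "country": "uk"}},
    {"id": "2", "fields": {"name": "anna", "country": "fr"}},
    {"id": "3", "fields": {"name": "sam", "country": "uk"}},
]
groups = group_by_country(people)
print({country: [p["id"] for p in members] for country, members in groups.items()})
# {'fr': ['2'], 'uk': ['1', '3']}
```

Without the sort, the two uk records would land in separate groups and the dict comprehension would keep only the last one.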
Here is another one that uses defaultdict:
import collections
def make_groups(nested_dicts, nested_key):
    default = collections.defaultdict(list)
    for nested_dict in nested_dicts:
        for value in nested_dict.values():
            try:
                default[value[nested_key]].append(nested_dict)
            except TypeError:
                pass
    return default
To test the results:
import random
# A list rather than a set: random.sample() no longer accepts sets (Python 3.11+).
COUNTRY = ['af', 'br', 'fr', 'mx', 'uk']
people = [{'id': i, 'fields': {
              'name': 'name'+str(i),
              'email': str(i)+'#email',
              'country': random.choice(COUNTRY)}}
          for i in range(10)]
country_groups = make_groups(people, 'country')
for country, persons in country_groups.items():
    print(country, persons)
Random output:
fr [{'id': 0, 'fields': {'name': 'name0', 'email': '0#email', 'country': 'fr'}}, {'id': 1, 'fields': {'name': 'name1', 'email': '1#email', 'country': 'fr'}}, {'id': 4, 'fields': {'name': 'name4', 'email': '4#email', 'country': 'fr'}}]
br [{'id': 2, 'fields': {'name': 'name2', 'email': '2#email', 'country': 'br'}}, {'id': 8, 'fields': {'name': 'name8', 'email': '8#email', 'country': 'br'}}]
uk [{'id': 3, 'fields': {'name': 'name3', 'email': '3#email', 'country': 'uk'}}, {'id': 7, 'fields': {'name': 'name7', 'email': '7#email', 'country': 'uk'}}]
af [{'id': 5, 'fields': {'name': 'name5', 'email': '5#email', 'country': 'af'}}, {'id': 9, 'fields': {'name': 'name9', 'email': '9#email', 'country': 'af'}}]
mx [{'id': 6, 'fields': {'name': 'name6', 'email': '6#email', 'country': 'mx'}}]
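Once the people are grouped, the pairing step from the question (random partners, same country preferred) could be sketched roughly as below. This is only one possible approach and not something the answers above provide; `pair_by_country`, its `seed` parameter, and the sample data are all hypothetical.

```python
import random
from collections import defaultdict

def pair_by_country(people, seed=None):
    """Pair people at random, preferring partners from the same country."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for person in people:
        groups[person["fields"]["country"]].append(person)

    pairs, leftovers = [], []
    for members in groups.values():
        rng.shuffle(members)
        # An odd-sized group leaves one person without a same-country partner.
        if len(members) % 2:
            leftovers.append(members.pop())
        pairs.extend(zip(members[0::2], members[1::2]))

    # Pair the remaining people across countries.
    rng.shuffle(leftovers)
    pairs.extend(zip(leftovers[0::2], leftovers[1::2]))
    unpaired = leftovers[-1] if len(leftovers) % 2 else None
    return pairs, unpaired

sample_people = [
    {"id": i, "fields": {"name": "name" + str(i), "country": country}}
    for i, country in enumerate(["uk", "uk", "uk", "fr", "fr"])
]
pairs, unpaired = pair_by_country(sample_people, seed=0)
for a, b in pairs:
    print(a["id"], b["id"])
print("unpaired:", unpaired)
```

With an odd number of people, one person is returned as `unpaired` instead of being forced into a pair.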

Extract multiple key:value pairs from one dict to a new dict

I have a list of dicts with some data, and I would like to extract certain key:value pairs into a new list of dicts. I know one way to do this would be to use del i['unwantedKey']; however, I would rather not delete any data but instead create a new dict with the needed data.
The column order might change, so I need something to extract the two key:value pairs from the larger dict into a new dict.
Current Data Format
[{'Speciality': 'Math', 'Name': 'Matt', 'Location': 'Miami'},
{'Speciality': 'Science', 'Name': 'Ben', 'Location': 'Las Vegas'},
{'Speciality': 'Language Arts', 'Name': 'Sarah', 'Location': 'Washington DC'},
{'Speciality': 'Spanish', 'Name': 'Tom', 'Location': 'Denver'},
{'Speciality': 'Chemistry', 'Name': 'Jim', 'Location': 'Dallas'}]
Code to delete key:value from dict
import csv
data = []
for line in csv.DictReader(open('data.csv')):
    data.append(line)
for i in data:
    del i['Speciality']
print(data)
Desired Data Format without using del i['Speciality']
[{'Name': 'Matt', 'Location': 'Miami'},
{'Name': 'Ben', 'Location': 'Las Vegas'},
{'Name': 'Sarah', 'Location': 'Washington DC'},
{'Name': 'Tom', 'Location': 'Denver'},
{'Name': 'Jim', 'Location': 'Dallas'}]
If you want to give a positive list of keys to copy over into the new dictionaries:
import csv
with open('data.csv', newline='') as csv_file:
    data = list(csv.DictReader(csv_file))
keys = ['Name', 'Location']
new_data = [dict((k, d[k]) for k in keys) for d in data]
print(new_data)
Suppose we have:
l1 = [{'Location': 'Miami', 'Name': 'Matt', 'Speciality': 'Math'},
{'Location': 'Las Vegas', 'Name': 'Ben', 'Speciality': 'Science'},
{'Location': 'Washington DC', 'Name': 'Sarah', 'Speciality': 'Language Arts'},
{'Location': 'Denver', 'Name': 'Tom', 'Speciality': 'Spanish'},
{'Location': 'Dallas', 'Name': 'Jim', 'Speciality': 'Chemistry'}]
To create a new list of dictionaries that do not contain the key 'Speciality', we can do:
l2 = []
for oldd in l1:
    newd = {}
    for k,v in oldd.items():
        if k != 'Speciality':
            newd[k] = v
    l2.append(newd)
Now l2 will be your desired output. In general, you can exclude an arbitrary list of keys like so:
exclude_keys = ['Speciality', 'Name']
l2 = []
for oldd in l1:
    newd = {}
    for k,v in oldd.items():
        if k not in exclude_keys:
            newd[k] = v
    l2.append(newd)
The same can be done with an include_keys variable:
include_keys = ['Name', 'Location']
l2 = []
for oldd in l1:
    newd = {}
    for k,v in oldd.items():
        if k in include_keys:
            newd[k] = v
    l2.append(newd)
You can create a new list of dicts limited to the keys you want with one line of code (dict comprehensions require Python 2.7+):
NLoD=[{k:d[k] for k in ('Name', 'Location')} for d in LoD]
Try it:
>>> LoD=[{'Speciality': 'Math', 'Name': 'Matt', 'Location': 'Miami'},
{'Speciality': 'Science', 'Name': 'Ben', 'Location': 'Las Vegas'},
{'Speciality': 'Language Arts', 'Name': 'Sarah', 'Location': 'Washington DC'},
{'Speciality': 'Spanish', 'Name': 'Tom', 'Location': 'Denver'},
{'Speciality': 'Chemistry', 'Name': 'Jim', 'Location': 'Dallas'}]
>>> [{k:d[k] for k in ('Name', 'Location')} for d in LoD]
[{'Name': 'Matt', 'Location': 'Miami'}, {'Name': 'Ben', 'Location': 'Las Vegas'}, {'Name': 'Sarah', 'Location': 'Washington DC'}, {'Name': 'Tom', 'Location': 'Denver'}, {'Name': 'Jim', 'Location': 'Dallas'}]
Since you are using csv, you can limit the columns that you read in the first place to the desired columns so you do not need to delete the undesired data:
dc=('Name', 'Location')
with open(fn) as f:  # fn is the path to the CSV file
    reader=csv.DictReader(f)
    LoD=[{k:row[k] for k in dc} for row in reader]
# For a single dict (here called event), keep only the wanted keys:
keys_lst = ['Name', 'Location']
new_data={key:val for key,val in event.items() if key in keys_lst}
print(new_data)
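The approaches above can be wrapped in a small reusable helper. The sketch below uses a hypothetical function name, `pick_keys`, and assumes that a key missing from a record should simply be skipped rather than raise a KeyError.

```python
def pick_keys(records, keys):
    """Copy only the wanted keys into new dicts; keys absent from a record are skipped."""
    return [{k: record[k] for k in keys if k in record} for record in records]

rows = [
    {'Speciality': 'Math', 'Name': 'Matt', 'Location': 'Miami'},
    {'Speciality': 'Science', 'Name': 'Ben'},  # no 'Location' key
]
print(pick_keys(rows, ['Name', 'Location']))
# [{'Name': 'Matt', 'Location': 'Miami'}, {'Name': 'Ben'}]
```

The original dicts are left untouched, which is what the question asks for instead of del.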
