Dealing with duplicates in lists of dictionaries


I loaded a CSV file using DictReader, so I essentially have a list of dictionaries. For example, I have a list called reader with the following:
[{'name': 'Jack', 'hits': 7, 'misses': 12, 'year': 10},
 {'name': 'Lisa', 'hits': 5, 'misses': 3, 'year': 8},
 {'name': 'Jack', 'hits': 5, 'misses': 7, 'year': 9}]
I am using a loop to create lists like the following:
name = []
hits = []
for row in reader:
    name.append(row["name"])
    hits.append(row["hits"])
However, I don't want duplicates in my lists: where a name appears more than once, I am only interested in the entry with the highest year. So I want to end up with the following:
name = ['Jack', 'Lisa']
hits = [7, 5]
What is the best way to go about this?

TRY:
reader = sorted(reader, key = lambda i: i['year'], reverse=True)
name = []
hits = []
for row in reader:
    if row['name'] in name:
        continue
    name.append(row["name"])
    hits.append(row["hits"])
The idea is to sort the list of dicts by year in descending order and then iterate over it, keeping only the first row seen for each name.
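The sort-then-skip idea above can also be sketched with a set that tracks the names already seen, which keeps the membership check constant-time as the list grows. The `seen` set and the sample `reader` list are assumptions added for illustration; the rest mirrors the answer.

```python
# Sample rows, assumed to be already converted to ints as in the question.
reader = [
    {"name": "Jack", "hits": 7, "misses": 12, "year": 10},
    {"name": "Lisa", "hits": 5, "misses": 3, "year": 8},
    {"name": "Jack", "hits": 5, "misses": 7, "year": 9},
]

seen = set()  # names already emitted (hypothetical helper name)
name = []
hits = []
# Newest year first, so the first row seen for each name is the one to keep.
for row in sorted(reader, key=lambda r: r["year"], reverse=True):
    if row["name"] in seen:
        continue
    seen.add(row["name"])
    name.append(row["name"])
    hits.append(row["hits"])

print(name)
print(hits)
```

Note that the resulting order follows the descending year sort, not the original row order.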

import pandas as pd
data = [{'name': 'Jack', 'hits' :7, 'misses': 12, 'year': 10},
{'name': 'Lisa', 'hits': 5, 'misses': 3,'year': 8},
{'name': 'Jack', 'hits': 5, 'misses':7, 'year': 9}]
df = pd.DataFrame(data).sort_values(by=['name','year'],ascending=False).groupby('name').first()
dict(zip(df.index,df['hits']))

In pure Python (no libraries):
people = {} # maps "name" -> "info"
for record in csv_reader:
    # do we have someone with that name already?
    old_record = people.get(record['name'], {})
    # what's their year (defaulting to -1)
    old_year = old_record.get('year', -1)
    # if this record is more up to date
    if record['year'] > old_year:
        # replace the old record
        people[record['name']] = record
# -- then, you can pull out your name and year lists
name = list(people.keys())
year = [r['year'] for r in people.values()]
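If you want to learn pandas:
import pandas as pd
df = pd.read_csv('yourdata.csv')
# groupby('name').max() would take the maximum of each column independently,
# so select the whole row with the highest year for each name instead
df.loc[df.groupby('name')['year'].idxmax()]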
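One caveat for the pure-Python loop: csv.DictReader yields every field as a string, so comparing years directly would compare text ("9" sorts after "10"). The sketch below converts the numeric fields first. The inline `csv_text` stands in for the real file, and the `latest` name is an assumption for illustration.

```python
import csv
import io

# Stand-in for the CSV file; the column names follow the question.
csv_text = """name,hits,misses,year
Jack,7,12,10
Lisa,5,3,8
Jack,5,7,9
"""

latest = {}  # maps name -> the row with the highest year seen so far
for row in csv.DictReader(io.StringIO(csv_text)):
    # DictReader yields strings, so convert before comparing numerically.
    row["year"] = int(row["year"])
    row["hits"] = int(row["hits"])
    current = latest.get(row["name"])
    if current is None or row["year"] > current["year"]:
        latest[row["name"]] = row

name = list(latest)
hits = [r["hits"] for r in latest.values()]
print(name, hits)
```

With a real file, `io.StringIO(csv_text)` would be replaced by the opened file object.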

Solution without pandas:
lst = [
    {"name": "Jack", "hits": 7, "misses": 12, "year": 10},
    {"name": "Lisa", "hits": 5, "misses": 3, "year": 8},
    {"name": "Jack", "hits": 5, "misses": 7, "year": 9},
]
out = {}
for d in lst:
    out.setdefault(d["name"], []).append(d)
name = [*out]
# take the hits from the record with the highest year for each name
hits = [max(v, key=lambda d: d["year"])["hits"] for v in out.values()]
print(name)
print(hits)
Prints:
['Jack', 'Lisa']
[7, 5]

Related

How to make a one key for all values in dictionary python

I have a list:
List_ = ["Peter", "Peter", "Susan"]
I want to make a dictionary like this:
Dict_ = {"Name": "Peter", "Count": 2, "Name": "Susan", "Count": 1}
My code so far:
Dict_ = {}
Dict_new = {}
for text in List_:
    if text not in Dict_:
        Dict_[text] = 1
    else:
        Dict_[text] += 1
for key, values in Dict_.items():
    Dict_new["Name"] = key
    Dict_new["Count"] = values
print(Dict_new)
It prints only the last one:
{'Name': 'Susan', 'Count': 1}

Here is an implementation you can adapt to the format you want:
from collections import Counter
# Your data
my_list = ["Peter", "Peter", "Susan"]
# Count the occurrences
counted = Counter(my_list)
# Your format
counted_list = []
for key, value in counted.items():
    counted_list.append({"Name": key, "Count": value})
print(counted_list)
And the output will be:
[{'Name': 'Peter', 'Count': 2}, {'Name': 'Susan', 'Count': 1}]

As noted in comments, a dictionary can only have each key once.
You may want a list of dictionaries, built with help from collections.Counter and a list comprehension.
>>> from collections import Counter
>>> List_ = ["Peter", "Peter", "Susan"]
>>> [{'name': k, 'count': v} for k, v in Counter(List_).items()]
[{'name': 'Peter', 'count': 2}, {'name': 'Susan', 'count': 1}]
In addition to using collections.Counter you could use a defaultdict.
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> for n in List_:
...     d[n] += 1
...
>>> d
defaultdict(<class 'int'>, {'Peter': 2, 'Susan': 1})
>>> [{'name': k, 'count': v} for k, v in d.items()]
[{'name': 'Peter', 'count': 2}, {'name': 'Susan', 'count': 1}]

You can use the following code to achieve what you are trying to do.
List_ = ["Peter", "Peter", "Susan"]
dict_ = {}
for name in List_:
    if name in dict_:
        dict_[name] += 1
    else:
        dict_[name] = 1
print(dict_)
This generates the following output, where each key is a name and each value is its count:
{'Peter': 2, 'Susan': 1}

Convert a text file into a dictionary list

I have a text file in this format (in_file.txt):
banana 4500 9
banana 350 0
banana 550 8
orange 13000 6
How can I convert this into a dictionary list in Python?
Code:
in_filepath = 'in_file.txt'
def data_dict(in_filepath):
    with open(in_filepath, 'r') as file:
        for line in file.readlines():
            title, price, count = line.split()
        d = {}
        d['title'] = title
        d['price'] = int(price)
        d['count'] = int(count)
    return [d]
The terminal shows the following result:
{'title': 'orange', 'price': 13000, 'count': 6}
Correct output:
{'title': 'banana', 'price': 4500, 'count': 9}, {'title': 'banana', 'price': 350, 'count': 0} , ....
Can anyone help me with my problem? Thank you!

titles = ["title","price","count"]
[dict(zip(titles, [int(word) if word.isdigit() else word for word in line.strip().split()])) for line in open("in_file.txt").readlines()]
or:
titles = ["title","price","count"]
[dict(zip(titles, [(data:=line.strip().split())[0], *map(int, data[1:])])) for line in open("in_file.txt").readlines()]
Your approach (corrected):
in_filepath = 'in_file.txt'
def data_dict(in_filepath):
    res = []
    with open(in_filepath, 'r') as file:
        for line in file.readlines():
            title, price, count = line.split()
            d = {}
            d['title'] = title
            d['price'] = int(price)
            d['count'] = int(count)
            res.append(d)
    return res
data_dict(in_filepath)
Why? Because in your version this block:
d = {}
d['title'] = title
d['price'] = int(price)
d['count'] = int(count)
is outside the for loop, so it runs only once, after the loop has finished. You therefore build and return a single dictionary from the last line and discard all the others. Instead, create a list before the loop, append each dictionary to it as the last statement inside the loop, and return the list at the end.
#Rockbar approach:
import pandas as pd
list(pd.read_csv("in_file.txt", sep=" ", header=None, names=["title","price","count"]).T.to_dict().values())

You can read the file line by line and build each dict from a list of keys defined up front.
keys = ['title', 'price', 'count']
res = []
with open('in_file.txt', 'r') as file:
    for line in file:
        # Or in python >= 3.8
        # while (line := file.readline().rstrip()):
        tmp = [int(w) if w.isdigit() else w for w in line.rstrip().split()]
        res.append(dict(zip(keys, tmp)))
print(res)
Output:
[
{'title': 'banana', 'price': 4500, 'count': 9},
{'title': 'banana', 'price': 350, 'count': 0},
{'title': 'banana', 'price': 550, 'count': 8},
{'title': 'orange', 'price': 13000, 'count': 6}
]

You are trying to create a list of dictionaries (an array of objects), so append each dictionary to a list as soon as you create it from a line of text.
Code
in_filepath = 'in_file.txt'
def data_dict(in_filepath):
    dictionary = []
    with open(in_filepath, 'r') as file:
        for line in file:
            title, price, count = line.split()
            dictionary.append({'title': title, 'price': int(price), 'count': int(count)})
    return dictionary
print(data_dict(in_filepath))
Output
[
{"title": "banana", "price": 4500, "count": 9},
{"title": "banana", "price": 350, "count": 0 },
{"title": "banana", "price": 550, "count": 8},
{"title": "orange", "price": 13000, "count": 6}
]
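The corrected loops above unpack `line.split()` into exactly three names, which raises ValueError on a blank line (for example a trailing empty line at the end of the file). A minimal sketch that skips such lines follows; the function name `parse_lines` and the in-memory `sample` list are assumptions for illustration.

```python
def parse_lines(lines):
    """Turn 'title price count' lines into a list of dicts, skipping blank lines."""
    records = []
    for line in lines:
        parts = line.split()
        if not parts:  # blank or whitespace-only line
            continue
        title, price, count = parts
        records.append({"title": title, "price": int(price), "count": int(count)})
    return records

# Stand-in for the lines of in_file.txt, with a blank line in the middle.
sample = ["banana 4500 9\n", "\n", "orange 13000 6\n"]
records = parse_lines(sample)
print(records)
```

Because the function accepts any iterable of lines, an open file object can be passed in directly.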

Add random values to json data in Python

I am new to Python. I want to read a JSON file and add random values to it. The JSON contains a nested object too, and I am unable to solve this.
sample.json
{"name": "Kash","age": 12,"loc": {"loc1":"Uk","loc2":"Usa"}}
import json
import random
f=open("sample.json")
data=json.load(f)
def iterate(dictionary):
    for key, value in dictionary.items():
        dictionary[key]=random.randrange(1,10)
        print(dictionary)
        if isinstance(value, dict):
            iterate(value)
    return dictionary
iterate(data)
Output I Got
{'name': 8, 'age': 12, 'loc': {'loc1': 'tc', 'loc2': 'cbe'}}
{'name': 8, 'age': 6, 'loc': {'loc1': 'tc', 'loc2': 'cbe'}}
{'name': 8, 'age': 6, 'loc': 9}
{'loc1': 5, 'loc2': 'cbe'}
{'loc1': 5, 'loc2': 1}
===========================================
Output Expected
{"name": 15,"age": 85,"loc": {"loc1":52,"loc2":36}}

dictionary[key] = random.randrange(1,10)
This is performing:
dictionary['loc'] = some_number
So you lose the nested dict that was already there.
You only want to modify keys that do not have a dict as a value.
def iterate(dictionary):
    for key, value in dictionary.items():
        if isinstance(value, dict):
            iterate(value)
        else:
            dictionary[key] = random.randrange(1, 10)
    return dictionary
If your JSON can contain lists - you will need to handle that case too.
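One way to cover the list case is to recurse on both dicts and lists and replace only the leaf values. The sketch below is an illustration, not part of the answer above: the function name `randomize` and the extra "tags" key are hypothetical, and it returns a new structure instead of modifying the input in place.

```python
import random

def randomize(value):
    """Return a copy of value with every non-container leaf replaced by a random int."""
    if isinstance(value, dict):
        return {k: randomize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [randomize(v) for v in value]
    return random.randrange(1, 10)

# The question's sample plus a hypothetical list-valued key.
sample = {"name": "Kash", "age": 12, "loc": {"loc1": "Uk", "loc2": "Usa"}, "tags": ["a", "b"]}
result = randomize(sample)
print(result)
```

Building a new structure also avoids assigning to a dictionary while iterating over it.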

From the list of dictionaries find the largest value lengths for each key

data = [{"id": "78ab45",
"name": "Jonh"},
{"id": "69cd234457",
"name": "Joe"}]
I want my function to return the largest value lengths for each key from all dictionaries:
expected_output = [
{ "size": 10, "name": "id" }, #because the length of the largest "id" value is 10
{ "size": 4, "name": "name" }, #because the length of the largest "name" value is 4
]
My code so far:
def my_func(data):
    headers_and_sizes = []
    for item in data:
        for key, value in item.items():
            headers_and_sizes.append({"size": f'{len(value)}', "name": key})
        if int(headers_and_sizes[0]["size"]) < len(value):
            headers_and_sizes[0]["size"] = len(value)
    return headers_and_sizes
Gives me this:
[{'size': '6', 'name': 'id'}, {'size': '4', 'name': 'name'}, {'size': '10', 'name': 'id'}, {'size': '3', 'name': 'name'}]
How can I fix that so that it will return the values as in expected_output?

You'll want to be updating a dictionary that stores each key mapped to the maximum length seen for that key thus far.
data = [
    {
        "id": "78ab45",
        "name": "Jonh",
    },
    {
        "id": "69cd234457",
        "name": "Joe",
    },
]
key_to_max_len = {}
for datum in data:
    for key, val in datum.items():
        if key not in key_to_max_len or len(val) > key_to_max_len[key]:
            key_to_max_len[key] = len(val)
key_size_arr = [{"size": val, "name": key} for key, val in key_to_max_len.items()]

You can get the maximum lengths for id and name as shown below, then structure the output accordingly:
>>> data
[{'id': '78ab45', 'name': 'Jonh'}, {'id': '69cd234457', 'name': 'Joe'}]
>>> id = max(map(lambda x:len(x['id']), data))
>>> name = max(map(lambda x:len(x['name']), data))
>>> id
10
>>> name
4

You can use a list comprehension to build a list of (id, name) tuples:
names_ids = [(eachdict['id'],eachdict['name']) for eachdict in data]
Then build the output in the desired shape (a list of dictionaries), using max() over a list comprehension of the lengths of the ids and of the names:
expected_output = \
    [{"size":max([len(each[0]) for each in names_ids]),"name":"id"},
     {"size":max([len(each[1]) for each in names_ids]),"name":"name"}]
Output will be:
[{'size': 10, 'name': 'id'}, {'size': 4, 'name': 'name'}]

Using the following:
keys = list(data[0].keys())
output = {key:-1 for key in keys}
for d in data:
    for k in d.keys():
        if len(d[k]) > output[k]:
            output[k] = len(d[k])
This leaves output as:
{'id': 10, 'name': 4}

I think the easiest method here is pandas...
import pandas as pd
df = pd.DataFrame(data)
out = [{'size': df['id'].str.len().max(), 'name':'id'},
{'size': df['name'].str.len().max(), 'name':'name'}]
output:
[{'size': 10, 'name': 'id'}, {'size': 4, 'name': 'name'}]
Or, for additional columns:
[{'size':df[col].str.len().max(), 'name':col} for col in df.columns]

Here is how you can use a nested list comprehension:
data = [{"id": "78ab45",
"name": "Jonh"},
{"id": "69cd234457",
"name": "Joe"}]
expected_output = [{'size': len(max([i[k] for i in data], key=len)),
'name': k} for k in data[0]]
print(expected_output)
Output:
[{'size': 10, 'name': 'id'},
{'size': 4, 'name': 'name'}]

updating dictionary inside list of dicts, when comparing two another list of dict

I have two lists of dictionaries, and when a certain key matches, I want to append the dict from the first list to the matching dict in the second. When the lists get big, this takes a very long time. Is there a faster way to do it?
import json
import pickle
with open('tables', 'rb') as fp:
    tables = pickle.load(fp)
# embedding
for table in tables:
    filename = table + "_constraints"
    with open(filename, 'rb') as fp:
        fkeys = pickle.load(fp)
    if fkeys and len(fkeys) == 1:
        key = fkeys[0][1]
        rkey = fkeys[0][2]
        rtable = fkeys[0][3]
        filename = table + ".json"
        with open(filename, 'rb') as fp:
            child = list(json.load(fp))
        filename = rtable + ".json"
        with open(filename, 'rb') as fp:
            parent = list(json.load(fp))
        for dict in child:
            for rdict in parent:
                if dict[key] == rdict[rkey]:
                    if "embed_"+table not in rdict:
                        rdict["embed_"+table] = []
                    del dict[key]
                    rdict["embed_"+table].append(dict)
                    break
An example input would be:
tables = ['child', 'parent']
child = [{'child_id': 1, 'child_name': 'matthew', 'parent_id': 1},
         {'child_id': 2, 'child_name': 'luke', 'parent_id': 1},
         {'child_id': 3, 'child_name': 'mark', 'parent_id': 2}]
parent = [{'parent_id': 1, 'parent_name': 'john'},
          {'parent_id': 2, 'parent_name': 'paul'},
          {'parent_id': 3, 'parent_name': 'titus'}]
The output would be:
parent = [{'parent_id': 1, 'parent_name': 'john', 'child_embed': [{'child_id': 1, 'child_name': 'matthew'}, {'child_id': 2, 'child_name': 'luke'}]},
          {'parent_id': 2, 'parent_name': 'paul', 'child_embed': [{'child_id': 3, 'child_name': 'mark'}]},
          {'parent_id': 3, 'parent_name': 'titus'}]

When you make a loop like this:
for dict in child:
    for rdict in parent:
you are setting up an O(n²) operation. For every child you can potentially search through every parent. Given 1000 kids and 1000 parents, that's on the order of a million iterations. Granted, you can break early, but that doesn't change the growth rate of the running time with respect to the size of the lists.
You should take the time to make an object that lets you find what you need at the same speed regardless of how big it is. In Python that's the dict. You can turn your parent list into a dict with one loop through it:
>> parent_d = {d['parent_id']: {'name': d['parent_name']} for d in parent}
>> print(parent_d)
{1: {'name': 'john'}, 2: {'name': 'paul'}, 3: {'name': 'titus'}}
This lets you lookup parents without looping through the whole list every time:
>> parent_d[1]
{'name': 'john'}
With that in place you can loop once through the kids and add them to the parent (using setdefault is a handy way to initialize a list if the key is new):
for d in child:
    parent = parent_d[d['parent_id']]
    parent.setdefault('child_embed', []).append({'child_id' : d['child_id'], 'child_name' : d['child_name'] })
Now you have a clear dictionary with all the info keyed to parent:
{ 1: {'name': 'john','child_embed': [{'child_id': 1, 'child_name': 'matthew'},{'child_id': 2, 'child_name': 'luke'}]},
2: {'name': 'paul', 'child_embed': [{'child_id': 3, 'child_name': 'mark'}]},
3: {'name': 'titus'}}
This is a nice format to work with. If you need to get back to a list to match your old format, you can use a list comprehension:
>> [{'parent_id':i, **rest} for i, rest in parent_d.items()]
[{'parent_id': 1,
'name': 'john',
'child_embed': [{'child_id': 1, 'child_name': 'matthew'},
{'child_id': 2, 'child_name': 'luke'}]},
{'parent_id': 2,
'name': 'paul',
'child_embed': [{'child_id': 3, 'child_name': 'mark'}]},
{'parent_id': 3, 'name': 'titus'}]
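Adapted to the generic key names in the question (key, rkey, and an embed field name), the lookup-dict idea above might look like the following sketch. The function name `embed_children` and its `embed_name` parameter are hypothetical, and the sketch assumes each rkey value appears in only one parent.

```python
def embed_children(child, parent, key, rkey, embed_name):
    """Attach each child dict to the parent whose rkey equals the child's key."""
    # Build the lookup once: rkey value -> parent dict (the same objects as in the list).
    by_rkey = {p[rkey]: p for p in parent}
    for c in child:
        p = by_rkey.get(c[key])
        if p is None:
            continue  # no matching parent, so nothing to embed
        # Copy the child without the foreign key instead of deleting it in place.
        embedded = {k: v for k, v in c.items() if k != key}
        p.setdefault(embed_name, []).append(embedded)
    return parent

child = [
    {"child_id": 1, "child_name": "matthew", "parent_id": 1},
    {"child_id": 2, "child_name": "luke", "parent_id": 1},
    {"child_id": 3, "child_name": "mark", "parent_id": 2},
]
parent = [
    {"parent_id": 1, "parent_name": "john"},
    {"parent_id": 2, "parent_name": "paul"},
    {"parent_id": 3, "parent_name": "titus"},
]
result = embed_children(child, parent, "parent_id", "parent_id", "child_embed")
print(result)
```

Because the lookup dict holds references to the original parent dicts, the parent list is updated in place and keeps its original shape, so no conversion back to a list is needed.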

This should do what you need if I am reading the question correctly...
for entry in child:
    p_id = entry['parent_id']
    parent_update = [x for x in parent if x['parent_id'] == p_id][0]
    position = parent.index(parent_update)
    del entry['parent_id']
    if 'child_embed' in list(parent_update.keys()):
        parent_update['child_embed'] = parent_update['child_embed'] + [entry]
    else:
        parent_update['child_embed'] = [entry]
    parent[position] = parent_update
print(parent)
which gives:
[{'parent_id': 1, 'parent_name': 'john', 'child_embed': [{'child_id': 1, 'child_name': 'matthew'}, {'child_id': 2, 'child_name': 'luke'}]}, {'parent_id': 2, 'parent_name': 'paul', 'child_embed': [{'child_id': 3, 'child_name': 'mark'}]}, {'parent_id': 3, 'parent_name': 'titus'}]
