Finding nth biggest value in a nested dictionary - python

I have been trying to figure out how to return the key which has the nth biggest value in a nested dictionary. What causes problems for me is when some values are missing, as in:
my_dict = {'0': {'name': 'A', 'price': 50}, '1': {'name': 'B', 'price': 20}, '2': {'name': 'C'}, '3': {'name': 'D', 'price': 10}}
If every price existed, I could use code such as this to get the correct key:
sorted_list = sorted(my_dict.items(), key=lambda item: item[1]['price'])
print(sorted_list[-number_im_using_to_find][1]['name'])
How to account for missing values in an efficient way?

You can use dict.get to achieve this:
get(key[, default])
Return the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError.
Here is a code implementation using dict.get:
default_value = -1
key = lambda item: item[1].get('price', default_value)
sorted_list = sorted(my_dict.items(), key=key)
If you want just sorted values you could remove entirely the index [1]:
default_value = -1
key = lambda value: value.get('price', default_value)
sorted_list = sorted(my_dict.values(), key=key)
EDIT: if you want the list to be sortable when it contains None, map None to a sentinel in the key function (test for None explicitly, since "x or -1" would also replace 0):
>>> mylist = [3, 1, None, None, 2, 0]
>>> mylist.sort(key=lambda x: -1 if x is None else x)
>>> mylist
[None, None, 0, 1, 2, 3]
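A sentinel such as -1 only works when every real value is larger than it. As a rough sketch of an alternative, a tuple key can sort None ahead of any number without picking a sentinel. The helper name none_first is hypothetical, not something from the answer above:

```python
def none_first(x):
    # (False, 0) for None, (True, value) for everything else.
    # False sorts before True, so None entries always come first,
    # and the second element is only compared between real values.
    return (x is not None, 0 if x is None else x)

mylist = [3, 1, None, None, 2, 0, -7]
mylist.sort(key=none_first)
print(mylist)  # [None, None, -7, 0, 1, 2, 3]
```

Swapping the first tuple element to "x is None" would push the None entries to the end instead.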

Use get with a default value, in this case 0 or even a negative value to put the elements with missing prices all the way down the sorted list.
lambda item: item[1].get('price', 0)
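For the original question (the key with the nth biggest price), here is a minimal sketch that skips entries without a price rather than giving them a default, and uses heapq.nlargest so the whole dictionary does not need to be sorted. The function name nth_biggest and its parameters are my own assumptions for illustration:

```python
import heapq

my_dict = {
    '0': {'name': 'A', 'price': 50},
    '1': {'name': 'B', 'price': 20},
    '2': {'name': 'C'},
    '3': {'name': 'D', 'price': 10},
}

def nth_biggest(d, n, field='price'):
    """Return the key whose `field` is the n-th largest (n is 1-based).

    Entries that lack `field` are ignored. Returns None when n is less
    than 1 or when fewer than n entries have the field.
    """
    priced = [(k, v) for k, v in d.items() if field in v]
    top = heapq.nlargest(n, priced, key=lambda item: item[1][field])
    if n < 1 or len(top) < n:
        return None
    return top[-1][0]

print(nth_biggest(my_dict, 2))  # 1  (the entry named 'B', price 20)
```

Whether missing prices should be skipped or ranked last depends on the use case; with the default-value approach above they still occupy a position in the sorted list.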

Related

The key of the dict in the python array is the most frequent

I have an array with some dictionaries in it.
The following approach works, but I have to do some more processing on the returned value, which I think is very bad.
Is there a better way?
data = [{'name': 'A'},
{'name': 'A'},
None,
None,
{'name': 'B'},
{'name': 'B'},
{'name': 'B'}]
process = list(map(lambda x: x.get('name') if isinstance(x, dict) else None, data))
result = max(process, key=process.count)
for _ in data:
    if isinstance(_, dict) and _['name'] == result:
        array_index = _
        break
print(data.index(array_index))
{'name':'B'} appears the most times.
At which index of the data array does {'name':'B'} first appear?
For the example above, I want to get 4.
But the code above has to go through the data again with a for loop, which I think is very bad.
I used GitHub Copilot to see how it would solve this problem:
def get_index_of_most_frequent_dict_value(data):
    """
    Return the index of the most frequent value in the data array
    """
    # Create a dictionary to store the frequency of each value
    frequency = {}
    for item in data:
        if item is None:
            continue
        if item['name'] in frequency:
            frequency[item['name']] += 1
        else:
            frequency[item['name']] = 1
    # Find the most frequent value
    most_frequent_value = None
    most_frequent_value_count = 0
    for key, value in frequency.items():
        if value > most_frequent_value_count:
            most_frequent_value = key
            most_frequent_value_count = value
    # Find the position of the most frequent value
    for i in range(len(data)):
        if data[i] is None:
            continue
        if data[i]['name'] == most_frequent_value:
            return i
Output:
4
Comparisons of time:
My solution (a): 5.499999999998562e-06 seconds
Your solution (b): 7.400000000004625e-06 seconds
a < b? Yes
Here is what you can do
import ast
data = [{'name': 'A'},
{'name': 'A'},
None,
None,
{'name': 'B'},
{'name': 'B'},
{'name': 'B'}]
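# Count how often each element occurs, keyed by its string representation
x = {str(y): data.count(y) for y in data}
# Pick the key with the highest count (a plain max(x) would compare the strings, not the counts)
j = ast.literal_eval(max(x, key=x.get))
print(j)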
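As a possible alternative, here is a sketch that counts the names and records where each one first appears in a single pass, so no second loop or data.index() call is needed. It uses collections.Counter; the function name index_of_most_common_name is hypothetical:

```python
from collections import Counter

data = [{'name': 'A'}, {'name': 'A'}, None, None,
        {'name': 'B'}, {'name': 'B'}, {'name': 'B'}]

def index_of_most_common_name(items):
    """Return the index of the first dict whose 'name' is the most frequent."""
    counts = Counter()   # name -> number of occurrences
    first_seen = {}      # name -> index of its first occurrence
    for i, item in enumerate(items):
        # Skip None entries and anything without a 'name' key
        if not isinstance(item, dict) or 'name' not in item:
            continue
        name = item['name']
        counts[name] += 1
        first_seen.setdefault(name, i)
    if not counts:
        return None
    name, _ = counts.most_common(1)[0]
    return first_seen[name]

print(index_of_most_common_name(data))  # 4
```

This assumes the 'name' values are hashable, which holds for the strings in the example.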

Removing a nested key value pair from dictionary based on value in dataframe

The question is one part help one part curiosity, so I have a dict that I'm appending to a list once all my conditions have been iterated through:
for col, row in df.iterrows():
    up_list = []
    if row['id_check'] == 'Add all':
        l = {'external': {'om': {'id' : row['posm']},
                          'wd': {'id': row['wdp']},
                          'wk': {'id': row['tw'].replace('ru: ', '')}
                          }
             }
        up_list.append(l)
Basically, I'm adding multiple keys and values to the dict l. My main question is: if one of the values for 'id' is 'None', I don't want to add that key-value pair to the dictionary at all.
So best case output looks like:
final_l = {'external': {'om': {'id' : '123'},
'wd': {'id': '456'},
'wk': {'id': '789'}
}}
BUT: provided one of those values == 'None' based on its corresponding dataframe value, I don't want to replace the 'id' with None, I don't want to have it there at all, so ideally say 'wk' == 'None' then the output dict would look like:
final_l = {'external': {'om': {'id' : '123'},
'wd': {'id': '456'}
}}
But the only thing I can get is:
final_l = {'external': {'om': {'id' : '123'},
'wd': {'id': '456'},
'wk': {'id': 'None'}
}}
Which is not optimal for my use case. So, how do you delete (or not even add) specific key-value pairs from a dictionary based on the corresponding dataframe value? Also, if there is a better way of doing this I'm very open to it, as this "works" but by god is it not elegant.
EDIT Sample Dataframe:
   id_check   om    wd    wk
0  Add all    123   None  789
1  Add all    472   628   None
2  Add None   528   874   629
I am editing my previous answer both based on your response that you are trying to alter the dictionary and not the dataframe and because my previous answer was incorrect.
I couldn't find a way to do what you are asking using a nice simple way - e.g. list comprehension, but was able to do it with this converter I created:
class Converter:
    def __init__(self):
        self.rows = []
        self.cols = []
    @classmethod
    def from_dict(cls, d):
        conv_df = cls()
        conv_df.cols = list(d.keys())
        # Transpose the column lists into row tuples
        conv_df.rows = list(zip(*d.values()))
        return conv_df
    def as_dict(self):
        vals = []
        for idx, _ in enumerate(self.cols):
            # Keep only the rows that contain no None at all
            vals.append([j[idx] for j in self.rows if None not in j])
        return {k: v for k, v in zip(self.cols, vals)}
Example usage:
>>> z = {'a': [1, 2, 3], 'b': ['a', 'b', 'c'], 'c': ['q', 'r', None]}
>>> conv = Converter.from_dict(z)
>>> conv.cols
['a', 'b', 'c']
>>> conv.rows
[(1, 'a', 'q'), (2, 'b', 'r'), (3, 'c', None)]
>>> "Get as dict and we expect last row not to appear in it:"
'Get as dict and we expect last row not to appear in it:'
>>> conv.as_dict()
{'a': [1, 2], 'b': ['a', 'b'], 'c': ['q', 'r']}
IIUC, you could try with to_dict, dropna, eq and to_list:
final_l = (df[df['id_check'].eq('Add all')]
           .drop(columns='id_check')
           .apply(lambda x: {'external': x.dropna().to_dict()}, axis=1)
           .to_list())
Output:
final_l
[{'external': {'om': 123.0, 'wk': '789'}},
{'external': {'om': 472.0, 'wd': '628'}}]
So I tried the provided answers, and the biggest issue I ran into was truth evaluation and speed. I coded this which "works" but I'm not too happy with it from an efficiency standpoint:
if row['id_check'] == 'Add all IDs':
    link_d, ex_link = {}, {}
    if row['posm'] != 'None':
        link_d['om'] = {'id': row['posm']}
    if row['pd'] != 'None':
        link_d['wd'] = {'id': row['pd']}
    if row['tw'] != 'None':
        link_d['wk'] = {'id': row['tw']}
    ex_link['external'] = link_d
    up_list.append(ex_link)
    up_d[row['id']] = up_list
    all_list.append(up_d)
Which outputs:
{'external': {'om': {'id' : '123'},
'wd': {'id': '456'},
'wk': {'id': '789'}}}
and ignores keys where the value == None:
{'external': {'om': {'id' : '123'},
'wd': {'id': '456'}}}
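The chain of if statements can probably be collapsed into a dict comprehension driven by a small mapping from output key to column name. This is only a sketch: FIELD_MAP and build_external are hypothetical names, the column names are taken from the first snippet in the question, the 'ru: ' stripping from that snippet is left out, and a plain dict stands in for the dataframe row (indexing with row[col] works the same way on a pandas Series):

```python
# Hypothetical mapping: key in the output dict -> column name in the row
FIELD_MAP = {'om': 'posm', 'wd': 'wdp', 'wk': 'tw'}

def build_external(row, field_map=FIELD_MAP):
    """Build {'external': {...}}, leaving out keys whose value is missing."""
    links = {
        out_key: {'id': row[col]}
        for out_key, col in field_map.items()
        # Skip both a real None and the string 'None' used in the question
        if row[col] is not None and row[col] != 'None'
    }
    return {'external': links}

row = {'posm': '123', 'wdp': '456', 'tw': 'None'}
print(build_external(row))
# {'external': {'om': {'id': '123'}, 'wd': {'id': '456'}}}
```

If the dataframe holds NaN rather than the string 'None', the condition would need an extra check such as pandas.isna.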

Finding and printing max value from dictionaries within a list [duplicate]

I'm trying to get the index of the dictionary with the max 'size' in a list of dictionaries like the following:
ld = [{'prop': 'foo', 'size': 100}, {'prop': 'boo', 'size': 200}]
with the following code I can take the maximum size:
items = [x['size'] for x in ld]
print(max(items))
How can I take its index now? Is there an easy way?
Test:
I just figured I can do this:
items = [x['size'] for x in ld]
max_val = max(items)
print(items.index(max_val))
is this correct?
Tell max() how to calculate the maximum for a sequence of indices:
max(range(len(ld)), key=lambda index: ld[index]['size'])
This'll return the index for which the size key is the highest:
>>> ld = [{'prop': 'foo', 'size': 100}, {'prop': 'boo', 'size': 200}]
>>> max(range(len(ld)), key=lambda index: ld[index]['size'])
1
>>> ld[1]
{'size': 200, 'prop': 'boo'}
If you wanted that dictionary all along, then you could just use:
max(ld, key=lambda d: d['size'])
and to get both the index and the dictionary, you could use enumerate() here:
max(enumerate(ld), key=lambda item: item[1]['size'])
Some more demoing:
>>> max(ld, key=lambda d: d['size'])
{'size': 200, 'prop': 'boo'}
>>> max(enumerate(ld), key=lambda item: item[1]['size'])
(1, {'size': 200, 'prop': 'boo'})
The key function is passed each element in the input sequence in turn, and max() will pick the element where the return value of that key function is highest.
Using a separate list to extract all the size values then mapping that back to your original list is not very efficient (you now need to iterate over the list twice). list.index() cannot work as it has to match the whole dictionary, not just one value in it.
You can pass enumerate(ld) to the max function with a suitable key:
>>> max(enumerate(ld),key=lambda arg:arg[1]['size'])[0]
1
If you just want the dictionary with max size value, as a Pythonic approach you could use operator.itemgetter function as the key:
In [10]: from operator import itemgetter
In [11]: ld = [{'prop': 'foo', 'size': 100}, {'prop': 'boo', 'size': 200}]
In [12]: fn = itemgetter('size')
In [13]: max(ld, key=fn)
Out[13]: {'prop': 'boo', 'size': 200}
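The answers above assume every dictionary has a 'size' key and that the list is not empty. A hedged sketch of the same enumerate() idea that tolerates both cases, using the default argument of max() (available since Python 3.4); the function name index_of_max is hypothetical:

```python
from operator import itemgetter

ld = [{'prop': 'foo', 'size': 100},
      {'prop': 'boo', 'size': 200},
      {'prop': 'bar'}]  # no 'size' key

def index_of_max(dicts, field='size'):
    """Return the index of the dict with the largest `field`, or None."""
    # Pair each index with its value, skipping dicts that lack the field
    candidates = ((i, d[field]) for i, d in enumerate(dicts) if field in d)
    best = max(candidates, key=itemgetter(1), default=None)
    return None if best is None else best[0]

print(index_of_max(ld))  # 1
print(index_of_max([]))  # None
```

On ties, max() returns the first maximal element it encounters, so the lowest index wins.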

Dictionary transformation and counter

Object:
data = [{'key': 11, 'country': 'USA'},{'key': 21, 'country': 'Canada'},{'key': 12, 'country': 'USA'}]
the result should be:
{'USA': {0: {'key':11}, 1: {'key': 12}}, 'Canada': {0: {'key':21}}}
I started experiment with:
result = {}
for i in data:
    k = 0
    result[i['country']] = dict(k = dict(key=i['key']))
and I get:
{'Canada': {'k': {'key': 21}}, 'USA': {'k': {'key': 12}}}
So how can I put the counter instead k? Maybe there is a more elegant way to create the dictionary?
I used the len() of the existing result item:
>>> import collections
>>> data = [{'key': 11, 'country': 'USA'},{'key': 21, 'country': 'Canada'},{'key': 12, 'country': 'USA'}]
>>> result = collections.defaultdict(dict)
>>> for item in data:
...     country = item['country']
...     result[country][len(result[country])] = {'key': item['key']}
...
>>> dict(result)
{'Canada': {0: {'key': 21}}, 'USA': {0: {'key': 11}, 1: {'key': 12}}}
There may be a more efficient way to do this, but I thought this would be most readable.
@zigg's answer is better.
Here's an alternative way:
import itertools as it, operator as op
def dict_transform(dataset, key_name=None, group_by=None):
    result = {}
    # groupby only groups consecutive items, so sort by the grouping key first
    sorted_dataset = sorted(dataset, key=op.itemgetter(group_by))
    for k, g in it.groupby(sorted_dataset, key=op.itemgetter(group_by)):
        result[k] = {i: {key_name: j[key_name]} for i, j in enumerate(g)}
    return result
if __name__ == '__main__':
    data = [{'key': 11, 'country': 'USA'},
            {'key': 21, 'country': 'Canada'},
            {'key': 12, 'country': 'USA'}]
    expected_result = {'USA': {0: {'key':11}, 1: {'key': 12}},
                       'Canada': {0: {'key':21}}}
    result = dict_transform(data, key_name='key', group_by='country')
    assert result == expected_result
To add the number, use the {key:value} syntax
result = {}
for i in data:
    k = 0
    result[i['country']] = dict({k : dict(key=i['key'])})
dict(k = dict(key=i['key']))
This passes i['key'] as the key keyword argument to the dict constructor (which is what you want - since that results in the string "key" being used as a key), and then passes the result of that as the k keyword argument to the dict constructor (which is not what you want) - that's how parameter passing works in Python. The fact that you have a local variable named k is irrelevant.
To make a dict where the value of k is used as a key, the simplest way is to use the literal syntax for dictionaries: {1:2, 3:4} is a dict where the key 1 is associated with the value 2, and the key 3 is associated with the value 4. Notice that here we're using arbitrary expressions for keys and values - not names - so we can use a local variable and the resulting dictionary will use the named value.
Thus, you want {k: {'key': i['key']}}.
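A tiny illustration of the difference described above, with 11 standing in for i['key'] (the variable names as_keyword and as_literal are only for this demo):

```python
k = 0

# Keyword-argument form: the parameter *name* "k" becomes the string key 'k';
# the local variable k is never looked at.
as_keyword = dict(k={'key': 11})

# Literal form: k is an expression, so its value 0 becomes the key.
as_literal = {k: {'key': 11}}

print(as_keyword)  # {'k': {'key': 11}}
print(as_literal)  # {0: {'key': 11}}
```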
Maybe there is a more elegant way to create the dictionary?
You could create a list by appending items, and then transform the list into a dictionary with dict(enumerate(the_list)). That at least saves you from having to do the counting manually, but it's pretty indirect.
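One way the list-then-enumerate idea might look, as a sketch: group the items per country in a defaultdict(list), then number each list with dict(enumerate(...)). The variable name grouped is an assumption of mine:

```python
from collections import defaultdict

data = [{'key': 11, 'country': 'USA'},
        {'key': 21, 'country': 'Canada'},
        {'key': 12, 'country': 'USA'}]

# Step 1: collect the items for each country in arrival order
grouped = defaultdict(list)
for item in data:
    grouped[item['country']].append({'key': item['key']})

# Step 2: turn each list into {0: first, 1: second, ...}
result = {country: dict(enumerate(items)) for country, items in grouped.items()}

print(result)
# {'USA': {0: {'key': 11}, 1: {'key': 12}}, 'Canada': {0: {'key': 21}}}
```

Unlike the groupby version, this does not need the input to be sorted first.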
