I'm trying to analyse some strings and count the number of times each one comes up. This data is stored in a dictionary. If I use the max function, only the first key with the highest count is printed:
count = {"cow": 4, "moo": 4, "sheep": 1}
print(max(count.keys(), key=lambda x: count[x]))
cow
This yields cow as the max. How would I get both "cow" and "moo" printed? For example:
count = {"cow": 4, "moo": 4, "sheep": 1}
cow, moo
Why not keep it simple?
mx = max(count.values())
print([k for k, v in count.items() if v == mx])
# ['cow', 'moo']
The bracketed expression in line two is a list comprehension, essentially shorthand for a for loop that runs over a list-like object (an "iterable") and builds a new list as it goes. A subtlety here is that there are two loop variables (k and v), which are assigned simultaneously by tuple unpacking, because .items() yields (key, value) pairs one after the other. In short, the list comprehension here is roughly equivalent to:
result = []
for k, v in count.items():
    if v == mx:
        result.append(k)
The list comprehension usually runs somewhat faster, though, and is also easier to read once you get used to it.
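The same idea can be wrapped in a small reusable function. This is a minimal sketch: the name `keys_with_max` is hypothetical, and returning an empty list for an empty dictionary is an assumption (a plain `max()` call would raise ValueError there).

```python
def keys_with_max(counts):
    """Return every key whose value equals the largest value (hypothetical helper)."""
    if not counts:
        # Assumption: an empty dict yields an empty list instead of raising ValueError.
        return []
    mx = max(counts.values())
    return [k for k, v in counts.items() if v == mx]


count = {"cow": 4, "moo": 4, "sheep": 1}
print(keys_with_max(count))  # ['cow', 'moo']
```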
Just group the counts with a defaultdict, and take the maximum:
from collections import defaultdict
count = {"cow": 4, "moo": 4, "sheep": 1}
d = defaultdict(list)
for animal, cnt in count.items():
    d[cnt].append(animal)
print(dict(d))
# {4: ['cow', 'moo'], 1: ['sheep']}
print(max(d.items())[1])
# ['cow', 'moo']
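Because the grouped dictionary's keys are the counts themselves, `max(d.items())` compares the `(count, animals)` tuples by count first, and since counts are unique dict keys the lists are never compared. An equivalent sketch takes the largest key and looks it up directly:

```python
from collections import defaultdict

count = {"cow": 4, "moo": 4, "sheep": 1}

# Group animal names under their shared count.
d = defaultdict(list)
for animal, cnt in count.items():
    d[cnt].append(animal)

# max(d) returns the largest key (the highest count); index into d with it.
top = d[max(d)]
print(top)  # ['cow', 'moo']
```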
I know how to write something simple and slow with a loop, but I need it to run very fast at large scale.
Input:
lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
Desired output:
d = {1: ["txt1", "txt2"], 2: "txt3"}
Is there something built into Python that makes dict() extend a key's value instead of replacing it? What I have now keeps only the last value for each repeated key:
dict(list(zip(lst[0], lst[1])))
One option is to use dict.setdefault:
out = {}
for k, v in zip(*lst):
    out.setdefault(k, []).append(v)
Output:
{1: ['txt1', 'txt2'], 2: ['txt3']}
If you want the element itself for singleton lists, one way is to add a condition that checks for this while you build the output dictionary:
out = {}
for k, v in zip(*lst):
    if k in out:
        if isinstance(out[k], list):
            out[k].append(v)
        else:
            out[k] = [out[k], v]
    else:
        out[k] = v
Or, if lst[0] is sorted (as it is in your sample), you could use itertools.groupby:
from itertools import groupby
out = {}
pos = 0
for k, v in groupby(lst[0]):
    length = len([*v])
    if length > 1:
        out[k] = lst[1][pos:pos+length]
    else:
        out[k] = lst[1][pos]
    pos += length
Output:
{1: ['txt1', 'txt2'], 2: 'txt3'}
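A variant of the groupby approach that avoids tracking positions by hand is to group the zipped `(key, value)` pairs on their first element. This sketch still assumes `lst[0]` is sorted (groupby only merges adjacent equal keys), and it keeps every value as a list:

```python
from itertools import groupby
from operator import itemgetter

lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]

# groupby merges only *adjacent* equal keys, so lst[0] must already be sorted.
out = {
    k: [value for _, value in pairs]
    for k, pairs in groupby(zip(*lst), key=itemgetter(0))
}
print(out)  # {1: ['txt1', 'txt2'], 2: ['txt3']}
```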
But as @timgeb notes, this is probably not what you want: afterwards you'll have to check the data type (list or not) every time you access this dictionary, which is an unnecessary problem you can avoid by making all values lists.
If you're dealing with large datasets, a pandas solution may be useful.
>>> import pandas as pd
>>> lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
>>> s = pd.Series(lst[1], index=lst[0])
>>> s
1 txt1
1 txt2
2 txt3
>>> s.groupby(level=0).apply(list).to_dict()
{1: ['txt1', 'txt2'], 2: ['txt3']}
Note that this also produces lists for single elements (e.g. ['txt3']) which I highly recommend. Having both lists and strings as possible values will result in bugs because both of those types are iterable. You'd need to remember to check the type each time you process a dict-value.
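To see why mixed value types cause trouble, consider iterating over a value to process its items. With a list you get whole strings, but with a bare string you silently get single characters. A small illustration using the mixed output shown earlier:

```python
mixed = {1: ['txt1', 'txt2'], 2: 'txt3'}
uniform = {1: ['txt1', 'txt2'], 2: ['txt3']}

# Iterating over a list yields its strings; iterating over a str yields characters.
print([item for item in mixed[2]])    # ['t', 'x', 't', '3']
print([item for item in uniform[2]])  # ['txt3']
```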
You can use a defaultdict to group the strings by their corresponding key, then make a second pass to extract the strings from singleton lists. Whatever you do, you'll need to access every element in both lists at least once, so some iteration is necessary (and even if you don't iterate explicitly, whatever you use will almost certainly iterate under the hood):
from collections import defaultdict
lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
result = defaultdict(list)
for key, value in zip(lst[0], lst[1]):
    result[key].append(value)
for key in result:
    if len(result[key]) == 1:
        result[key] = result[key][0]
print(dict(result)) # Prints {1: ['txt1', 'txt2'], 2: 'txt3'}
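The second pass can also be written as a dict comprehension that builds a new plain dict rather than mutating the defaultdict in place. A sketch that produces the same mixed result (the type-checking caveat discussed above still applies):

```python
from collections import defaultdict

lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]

# First pass: collect every value under its key.
grouped = defaultdict(list)
for key, value in zip(lst[0], lst[1]):
    grouped[key].append(value)

# Second pass: unwrap singleton lists into their only element.
result = {k: v[0] if len(v) == 1 else v for k, v in grouped.items()}
print(result)  # {1: ['txt1', 'txt2'], 2: 'txt3'}
```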
I've looked all over the internet for how to find all the keys in a dictionary that have the same value, where that value is not known in advance. The closest thing that came up was this, but there the values are known.
Say I had a dictionary like this, where the values are totally random and not hardcoded by me:
{'AGAA': 2, 'ATAA': 5,'AJAA':2}
How can I identify all the keys with the same value? What would be the most efficient way of doing this? Desired output:
['AGAA','AJAA']
The way I would do it is "invert" the dictionary. By this I mean to group the keys for each common value. So if you start with:
{'AGAA': 2, 'ATAA': 5, 'AJAA': 2}
You would want to group it such that the keys are now values and values are now keys:
{2: ['AGAA', 'AJAA'], 5: ['ATAA']}
After grouping the values, you can use max to determine the largest grouping.
Example:
from collections import defaultdict
data = {'AGAA': 2, 'ATAA': 5, 'AJAA': 2}
grouped = defaultdict(list)
for key in data:
    grouped[data[key]].append(key)
max_group = max(grouped.values(), key=len)
print(max_group)
Outputs:
['AGAA', 'AJAA']
You could also find the max key and print it that way:
max_key = max(grouped, key=lambda k: len(grouped[k]))
print(grouped[max_key])
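If several values can repeat and you want every set of keys that share a value (not just the largest group), you can keep only the groups with more than one key. A minimal sketch, assuming "same value" means at least two keys share it:

```python
from collections import defaultdict

data = {'AGAA': 2, 'ATAA': 5, 'AJAA': 2}

# Invert the dictionary: value -> list of keys that have it.
grouped = defaultdict(list)
for key, value in data.items():
    grouped[value].append(key)

# Keep only values shared by two or more keys.
shared = {value: keys for value, keys in grouped.items() if len(keys) > 1}
print(shared)  # {2: ['AGAA', 'AJAA']}
```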
You can try this:
from collections import Counter
d = {'AGAA': 2, 'ATAA': 5,'AJAA':2}
l = Counter(d.values())
l = [x for x,y in l.items() if y > 1]
out = [x for x,y in d.items() if y in l]
# ['AGAA', 'AJAA']
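For large dictionaries, the membership test `y in l` scans a list each time. Collecting the repeated values into a set makes each lookup constant time on average. A sketch of the same Counter approach with that change (the variable names are illustrative):

```python
from collections import Counter

d = {'AGAA': 2, 'ATAA': 5, 'AJAA': 2}

# Count how many keys share each value.
value_counts = Counter(d.values())

# Values that appear more than once; a set gives O(1) average membership tests.
repeated = {value for value, n in value_counts.items() if n > 1}

out = [key for key, value in d.items() if value in repeated]
print(out)  # ['AGAA', 'AJAA']
```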
I have this dictionary (key,list)
index={'chair':['one','two','two','two'],'table':['two','three','three']}
and I want this:
#1. number of times each value occurs in each key. ordered descending
indexCalc={'chair':{'two':3,'one':1}, 'table':{'three':2,'two':1}}
#2. value for maximum amount for each key
indexMax={'chair':3,'table':2}
#3. we divide each value in #1 by value in #2
indexCalcMax={'chair':{'two':3/3,'one':1/3}, 'table':{'three':2/2,'two':1/2}}
I think I should use lambda expressions, but I can't come up with any idea of how to do that. Any help?
First, define your values as lists correctly:
index = {'chair': ['one','two','two','two'], 'table': ['two','three','three']}
Then use collections.Counter with dictionary comprehensions:
from collections import Counter
# 1. number of times each value occurs in each key
res1 = {k: Counter(v) for k, v in index.items()}
# 2. value for maximum amount for each key
res2 = {k: v.most_common()[0][1] for k, v in res1.items()}
# 3. divide each value in #1 by the value in #2
res3 = {k: {m: n / res2[k] for m, n in v.items()} for k, v in res1.items()}
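The question also asks for #1 to be ordered descending. Counter.most_common() returns (element, count) pairs from most to least common, and since Python 3.7 a plain dict keeps insertion order, so one way to get that ordering is the following sketch:

```python
from collections import Counter

index = {'chair': ['one', 'two', 'two', 'two'], 'table': ['two', 'three', 'three']}

# most_common() lists (value, count) pairs in descending count order;
# dict() preserves that order (insertion order, Python 3.7+).
res1 = {k: dict(Counter(v).most_common()) for k, v in index.items()}
res2 = {k: max(counts.values()) for k, counts in res1.items()}
res3 = {k: {m: n / res2[k] for m, n in counts.items()} for k, counts in res1.items()}

print(res1)  # {'chair': {'two': 3, 'one': 1}, 'table': {'three': 2, 'two': 1}}
print(res2)  # {'chair': 3, 'table': 2}
print(res3)  # {'chair': {'two': 1.0, 'one': 0.3333333333333333}, 'table': {'three': 1.0, 'two': 0.5}}
```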
index={'chair':{'one','two','two','two'},'table':{'two','three','three'}}
Problem: {} with bare values creates a set, and a set drops duplicates, so you should use a list instead.
Now coming to your solution:
from collections import Counter
index={'chair': ['one','two','two','two'],'table':['two','three','three']}
updated_index = {'chair': dict(Counter(index['chair'])), 'table': dict(Counter(index['table']))}
updated_index_2 = {'chair': Counter(index['chair']).most_common()[0][1], 'table': Counter(index['table']).most_common()[0][1]}
print(updated_index)
print(updated_index_2)
You can use Counter from Python's collections module to find the counts without writing any lambda function. Output:
{'chair': {'one': 1, 'two': 3}, 'table': {'two': 1, 'three': 2}}
{'chair': 3, 'table': 2}
Firstly, there is a mistake in how you created the index dict. The values should be lists, but you currently have sets. Sets are automatically deduplicated, so you will not be able to get a proper count from them.
You should correct index to be:
index={'chair':['one','two','two','two'],'table':['two','three','three']}
You can use Counter from the collections module, a subclass of dict, to generate what you want for each entry in indexCalc. A Counter maps each element of a collection to the number of times it occurs in that collection.
indexCalc = {k: Counter(v) for k, v in index.items()}
indexCalc looks like this:
{'chair': Counter({'two': 3, 'one': 1}), 'table': Counter({'three': 2, 'two': 1})}
We can then easily find the maximum count in each sub-dictionary:
indexMax = {k: max(indexCalc[k].values()) for k in indexCalc}
indexMax looks like this:
{'chair': 3, 'table': 2}
You can create indexCalcMax with the following comprehension, which is a little ugly:
indexCalcMax = {k: {val: indexCalc[k][val] / indexMax[k] for val in indexCalc[k]} for k in indexCalc}
which is a dict-comprehension translation of this loop:
indexCalcMax = {}
for k in indexCalc:
    tmp = {}
    for val in indexCalc[k]:
        tmp[val] = indexCalc[k][val] / float(indexMax[k])
    indexCalcMax[k] = tmp
I know this is suboptimal, but I had to do it as a thought exercise:
indexCalc = {
k: {key: len([el for el in index[k] if el == key]) for key in set(index[k])}
for k in index
}
Not exactly lambdas, as suggested, but comprehensions... Don't use this code in production :) This answer is only partial; you can use the same approach to build the other two structures you require.
For example, if I have a dictionary that lists farm animals, it would be as follows:
{"groupA":["cow","lamb","lamb","lamb"], "groupB":["cow","cow"]}
How would I be able to count the number of times each value appears for the corresponding key? I want: groupA cow 1, groupA lamb 3, groupB cow 2. The trickier part is that the dictionary is not fixed; it needs to be dynamic. There may be more keys past groupB (e.g. groupC, groupD), and they might have more or fewer values associated with them. I have figured out a static way by popping out each key, then using an if statement to see what it contains and keeping a counter for it. (The only two animals will be cow or lamb.) Unfortunately, I cannot work out how to do this dynamically because I will never know how many keys there will be. Thank you all so much for the help. I couldn't find the answer anywhere and am extremely frustrated at the moment.
I think the canonical approach would be to use collections.Counter:
>>> from collections import Counter
>>> d = {"groupA":["cow","lamb","lamb","lamb"], "groupB":["cow","cow"]}
>>> dc = {k: Counter(v) for k,v in d.items()}
>>> dc
{'groupA': Counter({'lamb': 3, 'cow': 1}), 'groupB': Counter({'cow': 2})}
after which you can access the nested counts:
>>> dc["groupA"]["lamb"]
3
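To print the counts in the `groupA cow 1` format from the question without hardcoding group names, you can loop over whatever groups are present. A Counter returns 0 for missing elements, so if you also want zero counts (for example `groupB lamb 0`), you can look up every animal seen in any group. A sketch:

```python
from collections import Counter

d = {"groupA": ["cow", "lamb", "lamb", "lamb"], "groupB": ["cow", "cow"]}
dc = {k: Counter(v) for k, v in d.items()}

# Every animal that appears in any group, in sorted order.
animals = sorted({animal for group_animals in d.values() for animal in group_animals})

lines = []
for group, counts in dc.items():
    for animal in animals:
        # A Counter returns 0 for missing elements instead of raising KeyError.
        lines.append(f"{group} {animal} {counts[animal]}")

print("\n".join(lines))
```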
Another interpretation using Counter:
>>> from collections import Counter
>>> D = {"groupA":["cow","lamb","lamb","lamb"], "groupB":["cow","cow"]}
>>> Counter((k,v) for k in D for v in D[k])
Counter({('groupA', 'lamb'): 3, ('groupB', 'cow'): 2, ('groupA', 'cow'): 1})
If you need a dict:
>>> dict(Counter((k,v) for k in D for v in D[k]))
{('groupA', 'cow'): 1, ('groupB', 'cow'): 2, ('groupA', 'lamb'): 3}
or a list:
>>> list(Counter((k,v) for k in D for v in D[k]).items())
[(('groupA', 'cow'), 1), (('groupB', 'cow'), 2), (('groupA', 'lamb'), 3)]
You can first count the occurrences of each item in a list with a function like this:
def countList(items):
    # Map each item to the number of times it appears in items.
    counts = {}
    for item in items:
        if item in counts:
            counts[item] += 1
        else:
            counts[item] = 1
    return counts
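To apply the counting function to every group without knowing the keys in advance, call it once per value in a dict comprehension. A sketch, with the helper written here as `count_list` (an illustrative name using the same logic as countList above) so the example runs on its own:

```python
def count_list(items):
    """Count occurrences of each item (same logic as countList)."""
    counts = {}
    for item in items:
        if item in counts:
            counts[item] += 1
        else:
            counts[item] = 1
    return counts


farm = {"groupA": ["cow", "lamb", "lamb", "lamb"], "groupB": ["cow", "cow"]}

# One count dictionary per group, however many groups there are.
per_group = {group: count_list(animals) for group, animals in farm.items()}
print(per_group)  # {'groupA': {'cow': 1, 'lamb': 3}, 'groupB': {'cow': 2}}
```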
Use the count method on each list of animals:
farm = {"groupA":["cow","lamb","lamb","lamb"], "groupB":["cow","cow"]}
for group_name, animal_list in farm.items():
    for animal in 'cow', 'lamb':
        print(group_name, animal, animal_list.count(animal))
rite = {"groupA": ["cow", "lamb", "lamb", "lamb"], "groupB": ["cow", "cow"]}
for i, j in rite.items():
    b = set(j)
    for k in b:
        print(i, k, j.count(k))
I am working with dictionaries for the first time. I would like to know how I can count how many key value pairs there are in each dictionary where the value is 'available'. I know I probably use len().
seats = {
'A': {'A1':'available', 'A2':'unavailable', 'A3':'available'},
'B': {'B1':'unavailable', 'B2':'available', 'B3':'available'},
'C': {'C1':'available', 'C2':'available', 'C3':'unavailable'},
'D': {'D1':'unavailable', 'D2':'available', 'D3':'available'} }
rowChoice = input('What row? >> ')
numSeats = int(input('How many Seats? >> '))
I am very new to this, so I really need a very simple method and probably some annotation or explanation of how it works.
I'd use the following statement to count each nested dictionary's values:
{k: sum(1 for val in v.values() if val == 'available') for k, v in seats.items()}
This builds a new dictionary from the same keys as seats, with each value being the number of seats available. The sum(...) over a generator expression efficiently counts all values of the per-row dictionary that equal 'available'.
Result:
{'A': 2, 'B': 2, 'C': 2, 'D': 2}
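Since the question asks for a very simple method, here is the same count written as a plain loop with comments, in Python 3 syntax. It is a sketch equivalent to the comprehension above, not a different technique:

```python
seats = {
    'A': {'A1': 'available', 'A2': 'unavailable', 'A3': 'available'},
    'B': {'B1': 'unavailable', 'B2': 'available', 'B3': 'available'},
    'C': {'C1': 'available', 'C2': 'available', 'C3': 'unavailable'},
    'D': {'D1': 'unavailable', 'D2': 'available', 'D3': 'available'},
}

available_per_row = {}
for row, row_seats in seats.items():   # row is e.g. 'A', row_seats is {'A1': ..., ...}
    count = 0
    for status in row_seats.values():  # look only at the status strings
        if status == 'available':
            count += 1
    available_per_row[row] = count

print(available_per_row)  # {'A': 2, 'B': 2, 'C': 2, 'D': 2}
```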
To show the available seats for a specific row, filter them and use len():
row_available = [k for k, v in seats[rowChoice].items() if v == 'available']
avail_count = len(row_available)
if avail_count:
    print('there {is_are} {count} seat{plural} available in row {rowChoice}, seat{plural} {seats}'.format(
        is_are='are' if avail_count > 1 else 'is', count=avail_count,
        plural='s' if avail_count > 1 else '', rowChoice=rowChoice,
        seats=row_available[0] if avail_count == 1 else ' and '.join([', '.join(row_available[:-1]), row_available[-1]])))
For rowChoice = 'A' this prints:
there are 2 seats available in row A, seats A1 and A3
but it adjusts to form coherent sentences for more or fewer seats too.
Using collections.Counter and itertools.chain:
from collections import Counter
from itertools import chain
print(Counter(chain.from_iterable(i.values() for i in seats.values())))
# Counter({'available': 8, 'unavailable': 4})
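If you want the available/unavailable tally per row rather than for all rows combined, you can build one Counter per row instead of chaining them together. A sketch in Python 3 syntax:

```python
from collections import Counter

seats = {
    'A': {'A1': 'available', 'A2': 'unavailable', 'A3': 'available'},
    'B': {'B1': 'unavailable', 'B2': 'available', 'B3': 'available'},
    'C': {'C1': 'available', 'C2': 'available', 'C3': 'unavailable'},
    'D': {'D1': 'unavailable', 'D2': 'available', 'D3': 'available'},
}

# One Counter of seat statuses per row.
per_row = {row: Counter(row_seats.values()) for row, row_seats in seats.items()}
print(per_row['A'])  # Counter({'available': 2, 'unavailable': 1})
```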