Determining Key with Most Values - python

dictionary = {"key1": ["Item1", "Item2"], "key2": ["Item3", "Item4"]}
Working with the above dictionary, I am trying to iterate through it and return the key with the most values.
I was trying this:
def most_values(a):
    return max(a, key=a.get)
This isn't bad, though it returns whatever key it checks first. Next I tried:
def most_values(a):
    count = 0
    high = ""
    for t in a:
        if len(a[t]) > count:
            count += 1
            high = t
    return high
But it does the same and will return whatever key it iterated through first. It's also not a very elegant looking solution.
What's the most Pythonic way of going about this?

The problem with:
return max(a, key=a.get)
is that the key function returns the actual list, and Python compares lists lexicographically (element by element), not by length. There are arguments for either way of comparing lists, but the language designers chose lexicographic order. You can easily fix this with:
def most_values(a):
    return max(a, key=lambda x: len(a[x]))
This is probably the most Pythonic way, since it is declarative (you do not have to think about how the maximum is calculated), elegant, readable, and free of side effects.
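A minimal sketch of the difference between the two key functions. It reuses the question's dictionary and adds a third entry, `key3`, purely for illustration (that entry is an assumption, not part of the original data).

```python
# Question's dictionary plus a hypothetical third key with three values.
dictionary = {
    "key1": ["Item1", "Item2"],
    "key2": ["Item3", "Item4"],
    "key3": ["Item0", "Item1", "Item2"],
}

# key=dictionary.get compares the lists themselves, element by element.
by_list = max(dictionary, key=dictionary.get)

# Comparing lengths finds the key with the most values.
by_length = max(dictionary, key=lambda k: len(dictionary[k]))

print(by_list)    # key2
print(by_length)  # key3
```

When several keys tie on length, max returns the first one it encounters in iteration order.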
The problem with your second approach is that you should set count to the new len(a[t]) rather than increment it. You can fix it like this:
def most_values(a):
    count = -1
    high = None
    for key, val in a.items():
        if len(val) > count:
            count = len(val)  # assign the new length instead of incrementing
            high = key
    return high

How about this:
sorted(dictionary.iteritems(), key=lambda x:len(x[1]), reverse=True)[0][0]
sorted() sorts its input. dictionary.iteritems() is an iterator over the key:value pairs in the dict (this is Python 2; in Python 3 the equivalent is dictionary.items()). The key function receives each pair and compares by the length of its second item (the value). reverse=True sorts from big to small. The first [0] returns the "biggest" key:value pair, and the second [0] returns its key.
Or go with Willem Van Onsem's idea, because it's much cleaner.
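For readers on Python 3, where iteritems() no longer exists, the same idea might look like the sketch below. The extra "Item5" value is an assumption added so that one key clearly has more values.

```python
dictionary = {"key1": ["Item1", "Item2"], "key2": ["Item3", "Item4", "Item5"]}

# Sort (key, value) pairs by the length of the value list, longest first.
ranked = sorted(dictionary.items(), key=lambda pair: len(pair[1]), reverse=True)

# The first pair holds the key with the most values.
longest_key = ranked[0][0]
print(longest_key)  # key2
```

Sorting costs O(n log n), while a single max call is O(n), which is one more reason to prefer the max-based version when only the top key is needed.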

Related

MemoryError using numeric range as dict index (Inefficient)

I have a need to define numeric ranges as a dictionary index such as:
SCHEDULE = {
(0, 5000): 1,
(5001, 22500): 2,
(22501, 999999999): 3
}
I search it by this function:
def range_index(table, val):
    new_table = {k: v for tup, v in table.items() for k in range(tup[0], tup[1]+1)}
    return new_table.get(int(val)) # int() is used to deal with floats.
This works well as long as the range isn't too big. The last entry in SCHEDULE, which ends at 999999999, causes Python to throw MemoryError. If I decrease it to a smaller number, it's fine.
This obviously means we are building this whole table from the ranges. How can this be re-worked so that the entire ranges aren't enumerated for each search?
This is a job for an order-based data structure, not a hash-based data structure like a dict. Hashes are good for equality. They don't do range tests.
Your table should be a pair of lists. The first is sorted and represents range endpoints, and the second represents values associated with each range:
# I don't have enough information to give these better names.
endpoints = [0, 5001, 22501, 1000000000]
values = [1, 2, 3]
To find a value, perform a binary search for the index in the first list and look up the corresponding value in the second. You can use bisect for the binary search:
import bisect
def lookup(endpoints, values, key):
    index = bisect.bisect_right(endpoints, key) - 1
    if index < 0 or index >= len(values):
        raise KeyError('{!r} is out of range'.format(key))
    return values[index]
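As a rough sketch of how the two lists could be derived from the question's SCHEDULE dict: build_tables below is a hypothetical helper (not part of the original answer), and it assumes the ranges are contiguous and non-overlapping.

```python
import bisect

SCHEDULE = {
    (0, 5000): 1,
    (5001, 22500): 2,
    (22501, 999999999): 3,
}

def build_tables(schedule):
    """Hypothetical helper: split a dict of contiguous (low, high) ranges
    into a sorted endpoint list and a matching value list."""
    items = sorted(schedule.items())
    endpoints = [low for (low, _high), _value in items]
    # Close the last range with an exclusive upper bound.
    endpoints.append(items[-1][0][1] + 1)
    values = [value for _range, value in items]
    return endpoints, values

def lookup(endpoints, values, key):
    index = bisect.bisect_right(endpoints, key) - 1
    if index < 0 or index >= len(values):
        raise KeyError('{!r} is out of range'.format(key))
    return values[index]

endpoints, values = build_tables(SCHEDULE)
print(lookup(endpoints, values, 5000))       # 1
print(lookup(endpoints, values, 5001))       # 2
print(lookup(endpoints, values, 999999999))  # 3
```

The tables are built once, so each lookup is a binary search over a handful of endpoints instead of an enumeration of every integer in every range.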
You can call next on a generator expression with a default value of 0, which avoids a StopIteration when no range matches:
def range_index(table, val):
    return next((v for k, v in table.items() if k[0] <= int(val) <= k[1]), 0)
This uses ordinary <= comparisons to find the range that contains val and returns the corresponding value.
Advantages:
No new dictionary creation for every search.
Exits immediately when the condition is satisfied.
Iterate over SCHEDULE and return the first value where val is in the associated range.
category = next(category
for (start, stop), category in SCHEDULE.items()
if val in range(start, stop + 1))
It would be a bit faster if you started off with a dict of ranges, not of tuples. It would be even faster if you made SCHEDULE into a binary tree and did a binary search on it instead of a linear one. But this is good enough for the majority of cases.
This assumes your SCHEDULE is exhaustive, and you'll get a StopIteration error if you submit a val that is not covered by any of the ranges, to signify a programmer error. If you wish an else value, put it as a second parameter to next, after wrapping the first parameter in parentheses.
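A sketch of the "else value" variant described above, with the generator wrapped in parentheses and a fallback passed as the second argument to next. The default parameter name and the int() truncation are assumptions carried over from the question's own function.

```python
SCHEDULE = {
    (0, 5000): 1,
    (5001, 22500): 2,
    (22501, 999999999): 3,
}

def range_index(table, val, default=None):
    """Return the value of the first range containing val, else default."""
    val = int(val)  # truncate floats, as in the question
    return next(
        (category
         for (start, stop), category in table.items()
         if val in range(start, stop + 1)),
        default,
    )

print(range_index(SCHEDULE, 22500.7))  # 2
print(range_index(SCHEDULE, -5))       # None
```

In Python 3, an integer membership test against a range object is computed arithmetically, so the huge last range is never enumerated.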

How to find the most common string(s) in a Python list?

I am dealing with ancient DNA data. I have an array with n different base pair calls for a given coordinate.
e.g.,
['A','A','C','C','G']
I need to set up a bit in my script whereby the most frequent call(s) are identified. If there is one, it should use that one. If there are two (or three) that are tied (e.g., A and C here), I need it to randomly pick one of them.
I have been looking for a solution but cannot find anything satisfactory. The most frequent solution I see is Counter, but Counter is useless for me as c.most_common(1) will not identify that 1 and 2 are tied.
You can get the maximum count from the mapping returned by Counter with the max function first, and then use a list comprehension to output only the keys whose counts equal the maximum count. Since Counter, max, and the list comprehension all cost linear time, the overall time complexity of the code stays at O(n):
from collections import Counter
import random
lst = ['A','A','C','C','G']
counts = Counter(lst)
greatest = max(counts.values())
print(random.choice([item for item, count in counts.items() if count == greatest]))
This outputs either A or C.
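The same approach can be wrapped in a small function so it can be reused per coordinate. This is only a sketch: the name pick_most_common and the optional rng argument (useful for reproducible runs) are assumptions, not part of the answer above.

```python
from collections import Counter
import random

def pick_most_common(calls, rng=random):
    """Return one of the most frequent items, chosen at random on ties."""
    counts = Counter(calls)
    greatest = max(counts.values())
    tied = [item for item, count in counts.items() if count == greatest]
    return rng.choice(tied)

calls = ['A', 'A', 'C', 'C', 'G']
print(Counter(calls).most_common(1))  # a single entry, which hides the tie
print(pick_most_common(calls))        # 'A' or 'C'
```

Passing a seeded random.Random instance as rng makes the tie-breaking repeatable across runs.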
Something like this would work:
import random
string = ['A','A','C','C','G']
dct = {}
for x in set(string):
    dct[x] = string.count(x)
max_value = max(dct.values())
lst = []
for key, value in dct.items():
    if value == max_value:
        lst.append(key)
print(random.choice(lst))

Finding min and max values from a dictionary containing tuple values

I have a python dictionary named cdc_year_births.
For cdc_year_births, the keys are the unit (in this case the unit is a year), the values are the number of births in that unit:
print(cdc_year_births)
{2000: 4058814, 2001: 4025933, 2002: 4021726, 2003: 4089950, 1994: 3952767,
1995: 3899589, 1996: 3891494, 1997: 3880894, 1998: 3941553, 1999: 3959417}
I wrote a function that returns the maximum and minimum years and their births. When I started the function, I thought I'd hard code the max and min unit at 0 and 1000000000, respectively, and then iterate through the dictionary and compare each key's value to those hard coded values; if the conditions were met, I'd replace the max/min unit and the max/min birth.
But if the dictionary I used had negative values or values greater than 1000000000, this function wouldn't work, which is why I had to "load in" some actual values from the dictionary with the first loop, then loop over them again.
I built this function but could not get it to work properly:
def max_min_counts(data):
    max_min = {}
    for key,value in data.items():
        max_min["max"] = key,value
        max_min["min"] = key,value
    for key,value in data.items():
        if value >= max_min["max"]:
            max_min["max"]=key,value
        if value <= max_min["min"]:
            max_min["min"]=key,value
    return max_min
t=max_min_counts(cdc_year_births)
print(t)
It results in TypeError: unorderable types: int() >= tuple() for
if value >= max_min["max"]:
and
if value <= max_min["min"]:
I tried extracting the value from the tuple as described in Finding the max and min in dictionary as tuples python, but could not get this to work.
Can anyone help me make the second, shorter function work or show me how to write a better one?
Thank you very much in advance.
Your values are 2-tuples. You'll need one further level of indexing to get them to work:
if value >= max_min["max"][1]:
And,
if value <= max_min["min"][1]:
If you want to preset your max/min values, you can use float('inf') and -float('inf'):
max_min["max"] = (-1, -float('inf')) # Smallest value possible.
max_min["min"] = (-1, float('inf')) # Largest value possible.
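Putting both fixes together, a corrected version of the question's function might look like the sketch below. It uses None rather than -1 as the placeholder key, which is an assumption made here so that the placeholder cannot be mistaken for a real year.

```python
def max_min_counts(data):
    """Return the (key, value) pairs with the largest and smallest values."""
    max_min = {
        "max": (None, -float('inf')),  # any real value is larger
        "min": (None, float('inf')),   # any real value is smaller
    }
    for key, value in data.items():
        # Index [1] picks the stored value out of the (key, value) tuple.
        if value > max_min["max"][1]:
            max_min["max"] = (key, value)
        if value < max_min["min"][1]:
            max_min["min"] = (key, value)
    return max_min

cdc_year_births = {2000: 4058814, 2001: 4025933, 2002: 4021726, 2003: 4089950,
                   1994: 3952767, 1995: 3899589, 1996: 3891494, 1997: 3880894,
                   1998: 3941553, 1999: 3959417}
print(max_min_counts(cdc_year_births))
# {'max': (2003, 4089950), 'min': (1997, 3880894)}
```

With the infinite presets, a single pass is enough and the preliminary "load in" loop from the question is no longer needed.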
You can do this efficiently using max, min, and operator.itemgetter to avoid a lambda:
from operator import itemgetter
max(cdc_year_births.items(), key=itemgetter(1))
# (2003, 4089950)
min(cdc_year_births.items(), key=itemgetter(1))
# (1997, 3880894)
Here's a slick way to compute the max and min with reduce:
from functools import reduce
reduce(lambda x, y: x if x[1] > y[1] else y, cdc_year_births.items())
# (2003, 4089950)
reduce(lambda x, y: x if x[1] < y[1] else y, cdc_year_births.items())
# (1997, 3880894)
items() yields the (key, value) tuples of your dictionary, and the key argument tells max and min what to compare when picking the result.
In case you're interested in a more functional programming-oriented solution (or just something with more independent component parts), allow me to suggest the following:
Establish a comparison function between entries
Yes, we can use </> to compare the values as we iterate through the dict, but, as will become evident in a moment, it'll be useful to have something which lets us keep track of the year associated with that number of births.
def comp_births(op, lpair, rpair):
    lyr, lbirths = lpair
    ryr, rbirths = rpair
    return rpair if op(rbirths, lbirths) else lpair
At the end of the day, op will end up being either the numerical greater than or the numerical less than, but adding this tuple business accomplishes our goal of keeping track of the year associated with the number of births. Further, by factoring op out into a function parameter, rather than hard-coding the operator, we open the door for reusing this code for both the "min" and "max" variations.
Construct your iteratees
Now, all we need to do to create a function that compares two year/num_births pairs is partially apply our comparison function:
from functools import partial
from operator import gt, lt
get_max = partial(comp_births, gt)
get_min = partial(comp_births, lt)
get_max((2003, 150), (2012, 400)) #=> (2012, 400)
Pipe in your data
So where do we find these year/num_births pairs? Turns out it's just cdc_year_births.items(). And since we're lazy, let's use a function to do the iteration for us (reduce):
from functools import reduce
yr_of_max_births, max_births = reduce(get_max, cdc_year_births.items())
yr_of_min_births, min_births = reduce(get_min, cdc_year_births.items())
You need to compare against the value, not the entire tuple:
if value >= max_min["max"][1]:
As for not using the built-in functions, are you averse to using other built-ins? For instance, you could use reduce with a simple function -- x if x[1] < y[1] else y -- to get the minimum of all the entries. You could also sort the entries with x[1] as the key, then take the first and last elements of the sorted list.
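The sorting alternative mentioned above could be sketched as follows, using the question's data. operator.itemgetter(1) stands in for a lambda that returns x[1].

```python
from operator import itemgetter

cdc_year_births = {2000: 4058814, 2001: 4025933, 2002: 4021726, 2003: 4089950,
                   1994: 3952767, 1995: 3899589, 1996: 3891494, 1997: 3880894,
                   1998: 3941553, 1999: 3959417}

# Sort (year, births) pairs by births, smallest first.
ordered = sorted(cdc_year_births.items(), key=itemgetter(1))

# The ends of the sorted list are the minimum and maximum entries.
min_entry, max_entry = ordered[0], ordered[-1]
print(min_entry)  # (1997, 3880894)
print(max_entry)  # (2003, 4089950)
```

This does more work than a single pass (O(n log n) instead of O(n)), but it yields both extremes from one call and keeps the full ranking available.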
Yeah, I'm up to this exercise too.
Without using max and min functions (we haven't covered them yet in the course material) here's the hard way...
def minimax(dict):
    minimax_dict = {}
    if(len(dict) == 31):
        time = "day_of_month"
    elif(len(dict) == 12):
        time = "month"
    elif(len(dict) == 7):
        time = "day_of_week"
    else:
        time = 'year'
    min_time = "min_" + time
    max_time = "max_" + time
    for item in dict:
        if 'min_count' in minimax_dict:
            if dict[item] < minimax_dict['min_count']:
                minimax_dict['min_count'] = dict[item]
                minimax_dict[min_time] = item
        else:
            minimax_dict['min_count'] = dict[item]
            minimax_dict[min_time] = item
        if 'max_count' in minimax_dict:
            if dict[item] > minimax_dict['max_count']:
                minimax_dict['max_count'] = dict[item]
                minimax_dict[max_time] = item
        else:
            minimax_dict['max_count'] = dict[item]
            minimax_dict[max_time] = item
    return minimax_dict
#here's the test stuff...
min_max_dow_births = minimax(cdc_dow_births)
#min_max_dow_births
min_max_year_births = minimax(cdc_year_births)
#min_max_year_births
min_max_dom_births = minimax(cdc_dom_births)
#min_max_dom_births
min_max_month_births = minimax(cdc_month_births)
#min_max_month_births

Python- How to compare new value with previous value on for-loop?

My function needs to find the character with the highest speed in a dictionary. Character name is the key and value is a tuple of statistics. Speed is index 6 of the value tuple. How do I compare the previous highest speed to the current value to see if it is higher? Once I get the speed, how can I take that character's types (index 1 and 2 in value tuple,) and return them as a list?
Example Data:
d={'spongebob':(1,'sponge','aquatic',4,5,6,70,6), 'patrick':(1,'star','fish',4,5,6,100,1)}
Patrick has the highest speed(100) and his types are star and fish. Should return [star,fish]
Here is my current code which doesn't work since I don't know how to compare previous and current:
def fastest_type(db):
    l=[]
    previous=0
    new=0
    make_list=[[k,v] for k,v in db.items()]
    for key,value in db.items():
        speed=value[6]
    return l_types.sort()
The sorted function can do this quite easily:
Code:
from operator import itemgetter
def fastest_type(db):
    fastest = sorted(db.values(), reverse=True, key=itemgetter(6))[0]
    return fastest[1], fastest[2]
This code sorts by index 6 in reverse order, so that the largest speed comes first, and then sets fastest to the first element of the sorted result. Then it simply returns the two desired fields from fastest.
Test Code:
d = {
'spongebob':(1,'sponge','aquatic',4,5,6,70,6),
'patrick':(1,'star','fish',4,5,6,100,1)
}
print(fastest_type(d))
Results:
('star', 'fish')
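Since only the single fastest entry is needed, max can replace the full sort. The sketch below is a variation on the answer above, not the original code, and it returns a list because the question asks for one.

```python
from operator import itemgetter

def fastest_type(db):
    """Return the two types of the character with the highest speed."""
    fastest = max(db.values(), key=itemgetter(6))  # speed is index 6
    return [fastest[1], fastest[2]]

d = {
    'spongebob': (1, 'sponge', 'aquatic', 4, 5, 6, 70, 6),
    'patrick': (1, 'star', 'fish', 4, 5, 6, 100, 1),
}
print(fastest_type(d))  # ['star', 'fish']
```

max makes one pass over the values, so there is no need to track a "previous" speed by hand.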
If I understood what you meant.
def fastest_type(db):
    speed = 0
    new = []
    for key,value in db.items():
        if speed < value[6]:
            speed = value[6]
            new = value[1:3]
    return list(new) # return list instead of tuple

How to Sort List Items from Low to High Without Built-In Tools

I'm trying to write this function gensort(list) that takes a list of numbers and returns a new list with the same numbers, but ordered from low to high. An example of the output would be something like
>>> gensort([111, 1, 3.14])
[1, 3.14, 111]
I wrote a function to take one element and return it to its place in ascending order:
def insert_sorted(elem,list):
    if list == []:
        return [elem]
    elif elem < list[0]:
        return [elem] + list
    else:
        return [list[0]] + insert_sorted(elem, list[1:])
Now I'm trying to apply it to the rest of my list and I came up with this:
def gensort(list):
    insert = insert_sorted(list[min],list)
    return insert
However, this doesn't work in the least. I'm wondering how I can use insert_sorted recursively, or write a different list comprehension to get it to return the correct order for my whole list.
I know there are built in sorting tools but I'm trying to write this with what I've got currently.
You didn't ask whether creating your own sort function was a good idea, so I'll answer the question you asked, with one way of using insert_sorted to create a full gensort function:
def gensort(list):
    sorted_list = []
    for item in list:
        sorted_list = insert_sorted(item, sorted_list)
    return sorted_list
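For reference, here is a self-contained sketch that combines the question's insert_sorted with the gensort above. The parameter is renamed from list to items (an assumption made here to avoid shadowing the built-in); the logic is otherwise unchanged.

```python
def insert_sorted(elem, items):
    """Return a new list with elem inserted into the already sorted items."""
    if items == []:
        return [elem]
    elif elem < items[0]:
        return [elem] + items
    else:
        return [items[0]] + insert_sorted(elem, items[1:])

def gensort(items):
    """Insertion sort: insert each element into a growing sorted list."""
    sorted_list = []
    for item in items:
        sorted_list = insert_sorted(item, sorted_list)
    return sorted_list

numbers = [111, 1, 3.14]
print(gensort(numbers))  # [1, 3.14, 111]
print(numbers)           # [111, 1, 3.14], the input is left unchanged
```

Because insert_sorted recurses once per element it passes over, inputs of roughly a thousand elements or more can hit Python's default recursion limit.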
Why not use sort?
If you are only putting numbers there, simply do something like this:
def insert_sorted(elem, list):
    list.append(elem)
    list.sort()  # sorts in ascending order, in place
    return list
This is not an external tool; it's standard functionality in Python. Your method has a big disadvantage: it will be slow and memory-hungry on long lists. It would be better to append the new element to the list and run sort on it.
With your function, adding an element to an n-element list takes up to n+1 function calls and creates up to n+1 sublists. That's way too slow and not acceptable. Use a loop-based sort algorithm instead if you don't want to use Python's sort.
Example of bubble sort in python:
def bubble_sort(list_):
    """Implement bubblesort algorithm: iterate L to R in list, switching values
    if Left > Right. Break when no alterations made to list. """
    not_complete = True
    while not_complete:
        not_complete = False
        for val, item in enumerate(list_):
            if val == len(list_)-1: val = 0
            else:
                if list_[val] > list_[val+1]:
                    list_[val], list_[val+1] = list_[val+1], list_[val]
                    not_complete = True
    return list_
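A slightly simplified sketch of the same bubble sort, iterating over indices up to the second-to-last element so the special case for the last index is not needed. The names values and swapped are assumptions chosen for this illustration.

```python
def bubble_sort(values):
    """Sort values in place with bubble sort and return the same list."""
    swapped = True
    while swapped:
        swapped = False
        # Compare each adjacent pair; the last index has no right neighbour.
        for i in range(len(values) - 1):
            if values[i] > values[i + 1]:
                values[i], values[i + 1] = values[i + 1], values[i]
                swapped = True
    return values

data = [111, 1, 3.14]
print(bubble_sort(data))  # [1, 3.14, 111]
print(data)               # [1, 3.14, 111], sorted in place
```

Unlike the recursive insert_sorted approach, this version mutates its argument, so pass a copy (for example list(data)) if the original order must be preserved.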
