I have a Python dictionary named cdc_year_births.
Its keys are the unit (in this case a year), and its values are the number of births in that unit:
print(cdc_year_births)
{2000: 4058814, 2001: 4025933, 2002: 4021726, 2003: 4089950, 1994: 3952767,
1995: 3899589, 1996: 3891494, 1997: 3880894, 1998: 3941553, 1999: 3959417}
I wrote a function that returns the maximum and minimum years and their births. When I started the function, I thought I'd hard code the max and min unit at 0 and 1000000000, respectively, and then iterate through the dictionary and compare each key's value to those hard coded values; if the conditions were met, I'd replace the max/min unit and the max/min birth.
But if the dictionary I used had negative values or values greater than 1000000000, this function wouldn't work, which is why I had to "load in" some actual values from the dictionary with the first loop, then loop over them again.
I built this function but could not get it to work properly:
def max_min_counts(data):
    max_min = {}
    for key,value in data.items():
        max_min["max"] = key,value
        max_min["min"] = key,value
    for key,value in data.items():
        if value >= max_min["max"]:
            max_min["max"]=key,value
        if value <= max_min["min"]:
            max_min["min"]=key,value
    return max_min
t=max_min_counts(cdc_year_births)
print(t)
It results in TypeError: unorderable types: int() >= tuple() for
if value >= max_min["max"]:
and
if value <= max_min["min"]:
I tried extracting the value from the tuple as described in Finding the max and min in dictionary as tuples python, but could not get this to work.
Can anyone help me make the second, shorter function work or show me how to write a better one?
Thank you very much in advance.
The values you store in max_min are (key, value) 2-tuples, so you need one further level of indexing to compare against the count:
if value >= max_min["max"][1]:
And,
if value <= max_min["min"][1]:
If you want to preset your max/min values, you can use float('inf') and -float('inf'):
max_min["max"] = (-1, -float('inf')) # Smallest value possible.
max_min["min"] = (-1, float('inf')) # Largest value possible.
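Putting both suggestions together, a minimal sketch of the corrected function could look like the following. The infinity sentinels remove the need for the preliminary "loading" loop. The -1 placeholder year assumes the dictionary is non-empty; for an empty dict it would survive into the result.

```python
def max_min_counts(data):
    # Seed with sentinels: any real count beats -inf for max and inf for min.
    # The -1 "year" is a placeholder that a non-empty dict always overwrites.
    max_min = {"max": (-1, -float('inf')), "min": (-1, float('inf'))}
    for key, value in data.items():
        # Compare against the count stored at index 1 of each (key, value) tuple.
        if value >= max_min["max"][1]:
            max_min["max"] = key, value
        if value <= max_min["min"][1]:
            max_min["min"] = key, value
    return max_min


cdc_year_births = {2000: 4058814, 2001: 4025933, 2002: 4021726, 2003: 4089950, 1994: 3952767,
                   1995: 3899589, 1996: 3891494, 1997: 3880894, 1998: 3941553, 1999: 3959417}
print(max_min_counts(cdc_year_births))
# {'max': (2003, 4089950), 'min': (1997, 3880894)}
```

Because the sentinels are infinite, negative counts or counts above 1000000000 are handled without any special casing.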
You can do this efficiently using max, min, and operator.itemgetter to avoid a lambda:
from operator import itemgetter
max(cdc_year_births.items(), key=itemgetter(1))
# (2003, 4089950)
min(cdc_year_births.items(), key=itemgetter(1))
# (1997, 3880894)
Here's a slick way to compute the max and min with reduce:
from functools import reduce
reduce(lambda x, y: x if x[1] > y[1] else y, cdc_year_births.items())
# (2003, 4089950)
reduce(lambda x, y: x if x[1] < y[1] else y, cdc_year_births.items())
# (1997, 3880894)
items() yields (key, value) tuples from your dictionary, and the key argument tells max/min what to compare when picking the result; itemgetter(1) selects the count.
In case you're interested in a more functional programming-oriented solution (or just something with more independent component parts), allow me to suggest the following:
Establish a comparison function between entries
Yes, we can use </> to compare the values as we iterate through the dict, but, as will become evident in a moment, it'll be useful to have something which lets us keep track of the year associated with that number of births.
def comp_births(op, lpair, rpair):
    lyr, lbirths = lpair
    ryr, rbirths = rpair
    return rpair if op(rbirths, lbirths) else lpair
At the end of the day, op will end up being either the numerical greater than or the numerical less than, but adding this tuple business accomplishes our goal of keeping track of the year associated with the number of births. Further, by factoring op out into a function parameter, rather than hard-coding the operator, we open the door for reusing this code for both the "min" and "max" variations.
Construct your iteratees
Now, all we need to do to create a function that compares two year/num_births pairs is partially apply our comparison function:
from functools import partial
from operator import gt, lt
get_max = partial(comp_births, gt)
get_min = partial(comp_births, lt)
get_max((2003, 150), (2012, 400)) #=> (2012, 400)
Pipe in your data
So where do we find these year/num_births pairs? Turns out it's just cdc_year_births.items(). And since we're lazy, let's use a function to do the iteration for us (reduce):
from functools import reduce
yr_of_max_births, max_births = reduce(get_max, cdc_year_births.items())
yr_of_min_births, min_births = reduce(get_min, cdc_year_births.items())
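For reference, here are the pieces above assembled into one self-contained script, run against the sample data from the question. Nothing new is introduced; it only shows that the parts fit together.

```python
from functools import partial, reduce
from operator import gt, lt


def comp_births(op, lpair, rpair):
    # Each pair is (year, births); keep whichever pair wins under op.
    lyr, lbirths = lpair
    ryr, rbirths = rpair
    return rpair if op(rbirths, lbirths) else lpair


# Partially apply the operator to get the two reducers.
get_max = partial(comp_births, gt)
get_min = partial(comp_births, lt)

cdc_year_births = {2000: 4058814, 2001: 4025933, 2002: 4021726, 2003: 4089950, 1994: 3952767,
                   1995: 3899589, 1996: 3891494, 1997: 3880894, 1998: 3941553, 1999: 3959417}

# reduce folds the (year, births) pairs down to the single winning pair.
yr_of_max_births, max_births = reduce(get_max, cdc_year_births.items())
yr_of_min_births, min_births = reduce(get_min, cdc_year_births.items())
print(yr_of_max_births, max_births)  # 2003 4089950
print(yr_of_min_births, min_births)  # 1997 3880894
```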
You need to compare against the value, not the entire tuple:
if value >= max_min["max"][1]:
As for not using the built-in functions, are you averse to using other built-ins? For instance, you could use reduce (from functools in Python 3) with a simple function, lambda x, y: x if x[1] < y[1] else y, to get the minimum of all the entries. You could also sort the entries with x[1] as the key, then take the first and last elements of the sorted list.
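The sorting idea could look like the sketch below (the function name is just illustrative). It costs O(n log n) rather than a single O(n) pass, so it mainly makes sense when you want the full ordering anyway; it also assumes a non-empty dictionary.

```python
def min_max_by_sorting(data):
    # Sort (key, value) pairs by value; the ends of the list are the min and max.
    ordered = sorted(data.items(), key=lambda pair: pair[1])
    return ordered[0], ordered[-1]


cdc_year_births = {2000: 4058814, 2001: 4025933, 2002: 4021726, 2003: 4089950, 1994: 3952767,
                   1995: 3899589, 1996: 3891494, 1997: 3880894, 1998: 3941553, 1999: 3959417}
lowest, highest = min_max_by_sorting(cdc_year_births)
print(lowest, highest)  # (1997, 3880894) (2003, 4089950)
```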
Yeah, I'm working on this exercise too.
Without using the max and min functions (we haven't covered them yet in the course material), here's the hard way:
def minimax(dict):
    minimax_dict = {}
    if(len(dict) == 31):
        time = "day_of_month"
    elif(len(dict) == 12):
        time = "month"
    elif(len(dict) == 7):
        time = "day_of_week"
    else:
        time = 'year'
    min_time = "min_" + time
    max_time = "max_" + time
    for item in dict:
        if 'min_count' in minimax_dict:
            if dict[item] < minimax_dict['min_count']:
                minimax_dict['min_count'] = dict[item]
                minimax_dict[min_time] = item
        else:
            minimax_dict['min_count'] = dict[item]
            minimax_dict[min_time] = item
        if 'max_count' in minimax_dict:
            if dict[item] > minimax_dict['max_count']:
                minimax_dict['max_count'] = dict[item]
                minimax_dict[max_time] = item
        else:
            minimax_dict['max_count'] = dict[item]
            minimax_dict[max_time] = item
    return minimax_dict
#here's the test stuff...
min_max_dow_births = minimax(cdc_dow_births)
#min_max_dow_births
min_max_year_births = minimax(cdc_year_births)
#min_max_year_births
min_max_dom_births = minimax(cdc_dom_births)
#min_max_dom_births
min_max_month_births = minimax(cdc_month_births)
#min_max_month_births
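Guessing the unit from the dictionary's length means a year dictionary that happened to hold 7, 12, or 31 entries would be mislabeled. One option, sketched here under that assumption, is to pass the label in explicitly; the function name and the unit parameter are hypothetical. It still avoids max() and min(), and seeds both extremes from the first item instead of checking for missing keys on every iteration.

```python
def minimax_with_unit(counts, unit):
    """Return min/max keys and counts without using min() or max()."""
    if not counts:
        raise ValueError("counts must not be empty")
    items = iter(counts.items())
    # Seed both extremes with the first item, so no sentinel values are needed.
    first_key, first_count = next(items)
    min_key, min_count = first_key, first_count
    max_key, max_count = first_key, first_count
    # One pass over the remaining items; strict comparisons keep the first tie.
    for key, count in items:
        if count < min_count:
            min_key, min_count = key, count
        if count > max_count:
            max_key, max_count = key, count
    # Same key naming scheme as minimax(), e.g. "min_year" / "max_year".
    return {
        "min_" + unit: min_key,
        "min_count": min_count,
        "max_" + unit: max_key,
        "max_count": max_count,
    }


cdc_year_births = {2000: 4058814, 2001: 4025933, 2002: 4021726, 2003: 4089950, 1994: 3952767,
                   1995: 3899589, 1996: 3891494, 1997: 3880894, 1998: 3941553, 1999: 3959417}
print(minimax_with_unit(cdc_year_births, "year"))
# {'min_year': 1997, 'min_count': 3880894, 'max_year': 2003, 'max_count': 4089950}
```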
I need to replicate this same function but instead of having a list as a parameter I need a dictionary. The idea is that the calculation done by the function is done with the values, and the function returns the keys.
def funcion(dic, Sum):
    Subset = []
    def f(dic, i, Sum):
        if i >= len(dic): return 1 if Sum == 0 else 0
        count = f(dic, i + 1, Sum)
        count += f(dic, i + 1, Sum - dic[i])
        return count
    for i, x in enumerate(dic):
        if f(dic, i + 1, Sum - x) > 0:
            Subset.append(x)
            Sum -= x
    return Subset
The function works if I pass (300, 200, 100, 400), but I need to use something like {1:300, 2:200, 3:100, 4:400} as the input.
So the calculation is done with the values, but it returns the keys that match the condition.
I'm trying to work with dic.keys() and dic.values(), but it's not working. Could you help me?
Thank you so much.
Your code isn't working with your dictionary because it's expecting to be able to index into dic with numeric indexes starting at 0 and going up to len(dic)-1. However, you've given your dictionary keys that start at 1 and go to len(dic). That means you need to change things up.
The first change is in the recursive f function, where you need the base case to trigger on i > len(dic) rather than using the >= comparison.
The next change is in the loop that calls f. Rather than using enumerate, which will generate indexes starting at 0 (and pair them with the keys of the dictionary, which is what you get when you iterate on it directly), you probably want to do something else.
Now, ideally, you'd want to iterate on dic.items(), which would give you key, value pairs, with the keys playing the role of the indexes your code expects. But depending on how the dictionary gets built, that might iterate over the values in a different order than you expect. In recent versions of Python, dictionaries maintain the order their keys were added in, so if you're creating the dictionary with {1:300, 2:200, 3:100, 4:400}, you'll get the right order, but a mostly-equivalent dictionary like {3:100, 4:400, 1:300, 2:200} would give its results in a different order.
So if you need to be resilient against dictionaries that don't have their keys in the right order, you probably want to generate the keys 1 through len(dic) yourself with range, and then index to get the x value inside the loop:
for i in range(1, len(dic)+1):      # Generate the keys directly from a range
    x = dic[i]                      # and do the indexing manually.
    if f(dic, i + 1, Sum - x) > 0:  # The test is the same as before,
        Subset.append(i)            # but record the key i, not the value x.
        Sum -= x
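If the keys are not necessarily 1 through len(dic) (for example strings, or integers with gaps), one option is to snapshot the keys once and work on a positional list of values, exactly as the original list-based function did. This is a sketch under that assumption; subset_keys is a hypothetical name, and like the original it takes exponential time, so it is only suitable for small inputs.

```python
def subset_keys(dic, target):
    """Return the keys of dic whose values sum to target, in dict iteration order."""
    keys = list(dic)                 # snapshot the key order once
    values = [dic[k] for k in keys]  # positional values, like the original list input

    def count(i, remaining):
        # Number of subsets of values[i:] that sum to remaining.
        if i >= len(values):
            return 1 if remaining == 0 else 0
        return count(i + 1, remaining) + count(i + 1, remaining - values[i])

    chosen = []
    for i, (key, x) in enumerate(zip(keys, values)):
        # Take x only if the rest can still be completed to the target.
        if count(i + 1, target - x) > 0:
            chosen.append(key)
            target -= x
    return chosen


print(subset_keys({1: 300, 2: 200, 3: 100, 4: 400}, 600))          # [1, 2, 3]
print(subset_keys({'a': 300, 'b': 200, 'c': 100, 'd': 400}, 700))  # ['a', 'd']
```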
I need to use numeric ranges as dictionary keys, such as:
SCHEDULE = {
(0, 5000): 1,
(5001, 22500): 2,
(22501, 999999999): 3
}
I look values up with this function:
def range_index(table, val):
    new_table = {k: v for tup, v in table.items() for k in range(tup[0], tup[1]+1)}
    return new_table.get(int(val))  # int() is used to deal with floats.
This works well as long as the ranges aren't too big. The last entry in SCHEDULE, whose upper bound is 999999999, makes Python raise a MemoryError; if I decrease it to a smaller number, it's fine.
This is because the function builds the whole expanded table from the ranges on every call. How can this be reworked so that the ranges aren't fully enumerated for each search?
This is a job for an order-based data structure, not a hash-based data structure like a dict. Hashes are good for equality. They don't do range tests.
Your table should be a pair of lists. The first is sorted and represents range endpoints, and the second represents values associated with each range:
# I don't have enough information to give these better names.
endpoints = [0, 5001, 22501, 1000000000]
values = [1, 2, 3]
To find a value, perform a binary search for the index in the first list and look up the corresponding value in the second. You can use bisect for the binary search:
import bisect
def lookup(endpoints, values, key):
    index = bisect.bisect_right(endpoints, key) - 1
    if index < 0 or index >= len(values):
        raise KeyError('{!r} is out of range'.format(key))
    return values[index]
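To make this concrete, the two lists can be derived from the SCHEDULE dict in the question. The sketch below assumes the ranges are inclusive, contiguous, and non-overlapping, as they are in the question; build_table is a hypothetical helper, and lookup is repeated so the block runs on its own.

```python
import bisect

SCHEDULE = {
    (0, 5000): 1,
    (5001, 22500): 2,
    (22501, 999999999): 3,
}


def build_table(schedule):
    # Sort the (start, stop) ranges and keep each start as an endpoint.
    ranges = sorted(schedule)
    endpoints = [start for start, _ in ranges]
    # Turn the inclusive stop of the last range into an exclusive upper bound.
    endpoints.append(ranges[-1][1] + 1)
    values = [schedule[r] for r in ranges]
    return endpoints, values


def lookup(endpoints, values, key):
    index = bisect.bisect_right(endpoints, key) - 1
    if index < 0 or index >= len(values):
        raise KeyError('{!r} is out of range'.format(key))
    return values[index]


endpoints, values = build_table(SCHEDULE)
print(endpoints)                          # [0, 5001, 22501, 1000000000]
print(lookup(endpoints, values, 5000))    # 1
print(lookup(endpoints, values, 5001))    # 2
print(lookup(endpoints, values, 123456))  # 3
```

The table is built once and each lookup is a binary search, so memory stays proportional to the number of ranges rather than to their width.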
You can call next() on a generator expression with a default value of 0, so that no StopIteration is raised when val isn't in any range:
def range_index(table, val):
    return next((v for k, v in table.items() if k[0] <= int(val) <= k[1]), 0)
This uses the usual less than, greater than checks to find the range of val and get the value corresponding.
Advantages:
No new dictionary creation for every search.
Exits immediately when the condition is satisfied.
Iterate over SCHEDULE and return the first value where val is in the associated range.
category = next(category
                for (start, stop), category in SCHEDULE.items()
                if val in range(start, stop + 1))
It would be a bit faster if you started off with a dict whose keys are range objects rather than tuples. It would be even faster if you made SCHEDULE into a binary tree and did a binary search on it instead of a linear one. But this is good enough for the majority of cases.
This assumes your SCHEDULE is exhaustive: if you pass a val that isn't covered by any of the ranges, you'll get a StopIteration exception, signalling a programmer error. If you want a fallback value instead, wrap the first argument to next in parentheses and pass the fallback as the second argument.
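Following that suggestion, a version with a fallback might look like this sketch; range_category is a hypothetical name and the None default is an arbitrary choice. It uses chained <= comparisons instead of `in range(...)`, because range membership is only fast for integers: a float is checked by scanning the range element by element. A float that falls between two inclusive ranges (such as 5000.5) gets the default, so apply int() first if you want the question's truncating behavior.

```python
SCHEDULE = {
    (0, 5000): 1,
    (5001, 22500): 2,
    (22501, 999999999): 3,
}


def range_category(schedule, val, default=None):
    # Linear scan: return the category of the first range that contains val,
    # or default if no range matches.
    return next(
        (category for (start, stop), category in schedule.items() if start <= val <= stop),
        default,
    )


print(range_category(SCHEDULE, 4999.5))  # 1
print(range_category(SCHEDULE, 22501))   # 3
print(range_category(SCHEDULE, -10))     # None
```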
My function needs to find the character with the highest speed in a dictionary. The character name is the key and the value is a tuple of statistics. Speed is index 6 of the value tuple. How do I compare the previous highest speed to the current value to see if it is higher? Once I get the speed, how can I take that character's types (indexes 1 and 2 of the value tuple) and return them as a list?
Example Data:
d={'spongebob':(1,'sponge','aquatic',4,5,6,70,6), 'patrick':(1,'star','fish',4,5,6,100,1)}
Patrick has the highest speed (100) and his types are star and fish, so the function should return ['star', 'fish'].
Here is my current code which doesn't work since I don't know how to compare previous and current:
def fastest_type(db):
    l=[]
    previous=0
    new=0
    make_list=[[k,v] for k,v in db.items()]
    for key,value in db.items():
        speed=value[6]
    return l_types.sort()
The sorted function can do this quite easily:
Code:
from operator import itemgetter
def fastest_type(db):
    fastest = sorted(db.values(), reverse=True, key=itemgetter(6))[0]
    return fastest[1], fastest[2]
This code sorts the stat tuples by index 6 (speed) in reverse order, so that the fastest comes first, and sets fastest to the first element of the sorted result. It then simply returns the two type fields from fastest (as a tuple; wrap them in list() if you need a list).
Test Code:
d = {
'spongebob':(1,'sponge','aquatic',4,5,6,70,6),
'patrick':(1,'star','fish',4,5,6,100,1)
}
print(fastest_type(d))
Results:
('star', 'fish')
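Since only the single fastest entry is needed, max() with the same key avoids sorting the whole collection (one O(n) pass instead of O(n log n)), and slicing plus list() gives the list the question asked for. A sketch, reusing the question's data:

```python
from operator import itemgetter


def fastest_type(db):
    # Pick the stats tuple with the largest speed (index 6) in one pass.
    fastest = max(db.values(), key=itemgetter(6))
    # The types live at indexes 1 and 2; return them as a list.
    return list(fastest[1:3])


d = {
    'spongebob': (1, 'sponge', 'aquatic', 4, 5, 6, 70, 6),
    'patrick': (1, 'star', 'fish', 4, 5, 6, 100, 1),
}
print(fastest_type(d))  # ['star', 'fish']
```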
If I understood what you meant.
def fastest_type(db):
    speed = 0
    new = []
    for key,value in db.items():
        if speed < value[6]:
            speed = value[6]
            new = value[1:3]
    return list(new)  # return list instead of tuple
I'm trying to create a function that returns the largest element of an array. I feel I have the correct code, but my syntax is in the wrong order. I'm trying to use a for/while loop to do so. So far I have the following:
def manindex(arg):
    ans = 0
    while True:
        for i in range (len(arg)):
            if arg[i] > arg[ans]:
                pass
                ans = i
    return ans
I'm not sure where I'm going wrong; if anyone could provide some guidance, thanks.
EDIT: It's been pointed out that I'm causing an infinite loop, so if I take out the while statement I'm left with:
def manindex(arg):
    ans = 0
    for i in range (len(arg)):
        if arg[i] > arg[ans]:
            ans = i
    return ans
But I have a feeling it's still not correct
When you say array, I think you mean a list in Python. You don't need a for loop or a while loop to achieve this at all.
You can also use index with max, like so:
xs.index(max(xs))
sample:
xs = [1,123,12,234,34,23,42,34]
xs.index(max(xs))
3
You could use max with the key parameter set to seq.__getitem__:
def argmax(seq):
    return max(range(len(seq)), key=seq.__getitem__)
print(argmax([0,1,2,3,100,4,5]))
yields
4
The idea behind finding the index of the largest element is always the same: iterate over the elements of the array and compare each to the largest value seen so far. If it's larger, the index of the current element becomes the new maximum; if it's not, we keep looking.
enumerate approach:
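def max_element_index(items):
    max_index, max_value = None, None
    for index, item in enumerate(items):
        # max_value is None only before the first item (Python 3 can't compare None with numbers).
        if max_value is None or item > max_value:
            max_index, max_value = index, item
    return max_index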
functional approach:
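from functools import reduce

def max_element_index(items):
    # (None, None) is the result for an empty list and loses to any real (index, item) pair.
    return reduce(lambda x, y: x if x[1] is not None and x[1] > y[1] else y,
                  enumerate(items), (None, None))[0]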
At the risk of looking cryptic: the functional approach uses the reduce function, which repeatedly takes two elements and combines them into one. Those elements are (index, element) tuples, produced by the enumerate function.
The lambda passed to reduce takes two such tuples and returns the one with the larger element. Because reduce keeps combining until a single result remains, the winner is the tuple holding the index of the largest element and the element itself, so we only need to access item 0 of that tuple to get the index.
On the other hand, if the list is empty, None is returned, which is guaranteed by the third argument to reduce (the initializer).
Before I write a long-winded explanation, let me give you the solution:
index, value = max(enumerate(list1), key=lambda x: x[1])
One line, efficient (single pass O(n)), and readable (I think).
Explanation
In general, it's a good idea to use as many of Python's incredibly powerful built-in functions as possible.
In this instance, the two key functions are enumerate() and max().
enumerate() converts a list (or actually any iterable) into a sequence of indices and values. e.g.
>>> list1 = ['apple', 'banana', 'cherry']
>>> for tup in enumerate(list1):
...     print(tup)
...
(0, 'apple')
(1, 'banana')
(2, 'cherry')
max() takes an iterable and returns its largest element. Unfortunately, max(enumerate(list1)) doesn't do what we want, because max() compares the tuples created by enumerate() by their first element, which sadly is the index.
One lesser-known feature of max() is that it can take a second argument in the form max(list1, key=something). The key is a function that can be applied to each value in the list, and the output of that function is what gets used to determine the maximum. We can use this feature to tell max() that it should be ranking items by the second item of each tuple, which is the value contained in the list.
Combining enumerate() and max() with key (plus a little help from lambda to create a function that returns the second element of a tuple) gives you this solution.
index, value = max(enumerate(list1), key=lambda x: x[1])
I came up with this recently (and am sprinkling it everywhere in my code) after watching Raymond Hettinger's talk on Transforming Code into Beautiful, Idiomatic Python, where he suggests exorcising the for i in xrange(len(list1)): pattern from your code.
Alternatively, without resorting to lambda (Thanks #sweeneyrod!):
from operator import itemgetter
index, value = max(enumerate(list1), key=itemgetter(1))
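I believe if you change your for loop to:
for i in range (len(arg)):
    if arg[i] > ans:
        ans = arg[i]
it should work. Note that this returns the largest value rather than its index, and ans should start at arg[0] rather than 0 so that lists of negative numbers are handled correctly.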
You could try something like this. If the list is empty, the function raises an IndexError.
m is set to the first element of the list; we then iterate over the list, comparing against m at every step.
def findMax(xs):
    m = xs[0]
    for x in xs:
        if x > m:
            m = x
    return m
findMax([])  # raises IndexError
findMax([1])  # 1
findMax([2,1])  # 2
If you want to use a for loop and make it more generic:
def findGeneric(pred, xs):
    m = xs[0]
    for x in xs:
        if pred(x,m):
            m = x
    return m
findGeneric(lambda a,b: len(a) > len(b), [[1],[1,1,1,1],[1,1]])  # [1,1,1,1]
I have a dictionary structure that maps an id (integer) into a number (double). The numbers are actually weights of an item.
I am writing a function that fetches the id of a given weight if the weight is found in the dict; otherwise, it returns the id of the next closest (i.e. nearest matching) weight.
This is what I have so far:
def getBucketIdByValue(bucketed_items_dict, value):
    sorted_keys = sorted(bucketed_items_dict.keys())
    threshold = abs(bucketed_items_dict[sorted_keys[-2]] -bucketed_items_dict[sorted_keys[-1]]) # determine gap size between numbers
    # create a small dict containing likely candidates
    temp = dict([(x - value),x] for x in bucketed_items_dict.values() if abs(x - value) <= threshold)
    print 'DEBUG: Deviations list: ', temp.keys()
    smallest_deviation = min(temp.keys()) if value >= 0 else max(temp.keys()) # Not sure about this ?
    smallest_deviation_key = temp[smallest_deviation]
    print 'DEBUG: found bucketed item key:',smallest_deviation_key
    return smallest_deviation_key
I'm not sure the logic is actually correct (especially where I obtain the smallest deviation). In any event, even if the logic is correct, this seems an overly complicated way of doing things. Is there a more elegant/pythonic way of doing this?
Off the top of my head, I think a more pythonic/elegant way would be to do something like passing a custom function to the min function - don't know if that is possible...
Update: I am running Python 2.6.5.
Try picking the item whose weight is closest to your target value, using the distance as the comparison key for min:
from operator import itemgetter
distances = ((k, abs(v - value)) for k, v in bucketed_items_dict.items())
return min(distances, key=itemgetter(1))[0]
Or using a lambda function instead of itemgetter:
distances = ((k, abs(v - value)) for k, v in bucketed_items_dict.items())
return min(distances, key=lambda x:x[1])[0]
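The intermediate distances generator isn't strictly necessary: min() can iterate over the dictionary's ids directly and use the distance of each id's weight as the key. A sketch wrapping this in the question's function name; the sample weights are made up for illustration. The key argument to min() exists since Python 2.5, so this should also run on 2.6, and ties go to whichever id comes first in the dictionary's iteration order.

```python
def getBucketIdByValue(bucketed_items_dict, value):
    # Return the id whose weight is closest to value.
    return min(bucketed_items_dict,
               key=lambda bucket_id: abs(bucketed_items_dict[bucket_id] - value))


sample_weights = {1: 0.5, 2: 1.25, 3: 2.0, 4: 3.75}  # hypothetical data
print(getBucketIdByValue(sample_weights, 1.9))   # 3
print(getBucketIdByValue(sample_weights, 1.25))  # 2
```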
def getBucketIdByValue(bucket, value):
    distances = [(id, abs(number - value)) for id, number in bucket.items()]
    swapped = [(distance, id) for id, distance in distances]
    minimum = min(swapped)
    return minimum[1]
Or in short:
def getBucketIdByValue(bucket, value):
    return min((abs(number - value), id) for id, number in bucket.items())[1]
This function uses the bucket to create id/number pairs, turns them into an iterator of distance/id pairs, picks the minimum pair, and finally extracts and returns the id of that pair.
The distance is defined as the absolute value of the difference between the number and the sought-for value.
The minimum is defined as the pair with the lowest distance. If there are more, the pair with the lowest id is returned.
You can find the closest weight by running bisect on a sorted list of the weights (passed in as sorted_keys below):
import bisect
def bisect_weight(sorted_keys, value):
    index = bisect.bisect(sorted_keys, value)
    # edge cases
    if index == 0: return sorted_keys[0]
    if index == len(sorted_keys): return sorted_keys[index - 1]
    minor_weight = sorted_keys[index - 1]
    greater_weight = sorted_keys[index]
    return minor_weight if abs(minor_weight - value) < abs(greater_weight - value) else greater_weight
This way you only need to check two weights to find the best one. If you sort once and run many lookups, each lookup is a binary search (O(log n)) instead of a scan over all the weights (O(n)).
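bisect_weight returns a weight rather than an id, so you still need to map the result back to its id. One way, sketched here assuming you sort once and reuse the result for many lookups, is to sort (weight, id) pairs and keep two parallel lists. build_weight_index and closest_id are hypothetical helper names, and the sample weights are made up.

```python
import bisect


def build_weight_index(bucketed_items_dict):
    # Sort (weight, id) pairs once; reuse the two parallel lists for every lookup.
    pairs = sorted((weight, bucket_id) for bucket_id, weight in bucketed_items_dict.items())
    weights = [weight for weight, _ in pairs]
    ids = [bucket_id for _, bucket_id in pairs]
    return weights, ids


def closest_id(weights, ids, value):
    # Position where value would be inserted to keep weights sorted.
    index = bisect.bisect(weights, value)
    if index == 0:
        return ids[0]
    if index == len(weights):
        return ids[-1]
    # Only the neighbours on either side of the insertion point can be closest.
    below, above = weights[index - 1], weights[index]
    return ids[index - 1] if value - below < above - value else ids[index]


sample_weights = {1: 0.5, 2: 1.25, 3: 2.0, 4: 3.75}  # hypothetical data
weights, ids = build_weight_index(sample_weights)
print(closest_id(weights, ids, 1.9))  # 3
print(closest_id(weights, ids, 10))   # 4
```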
I'd also consider the bisect module.