MemoryError using numeric range as dict index (Inefficient)

I need to use numeric ranges as dictionary keys, such as:
SCHEDULE = {
    (0, 5000): 1,
    (5001, 22500): 2,
    (22501, 999999999): 3
}
I search it with this function:
def range_index(table, val):
    new_table = {k: v for tup, v in table.items() for k in range(tup[0], tup[1]+1)}
    return new_table.get(int(val))  # int() is used to deal with floats.
This works well as long as the ranges aren't too big. The last entry in SCHEDULE, which ends at 999999999, causes Python to throw a MemoryError. If I decrease it to a smaller number, it's fine.
This obviously happens because the whole table is built from the ranges on every call. How can this be reworked so that the ranges aren't enumerated for each search?

This is a job for an order-based data structure, not a hash-based data structure like a dict. Hashes are good for equality. They don't do range tests.
Your table should be a pair of lists. The first is sorted and represents range endpoints, and the second represents values associated with each range:
# I don't have enough information to give these better names.
endpoints = [0, 5001, 22501, 1000000000]
values = [1, 2, 3]
To find a value, perform a binary search for the index in the first list and look up the corresponding value in the second. You can use bisect for the binary search:
import bisect
def lookup(endpoints, values, key):
    index = bisect.bisect_right(endpoints, key) - 1
    if index < 0 or index >= len(values):
        raise KeyError('{!r} is out of range'.format(key))
    return values[index]
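As a quick check of the boundaries, here is a small usage sketch of the lookup function above with the same endpoints and values lists. The sample keys are illustrative choices, not part of the original question. Because the comparison is order-based, floats work without an int() conversion.

```python
import bisect

# Sorted range starts, plus one final entry marking the exclusive upper bound.
endpoints = [0, 5001, 22501, 1000000000]
values = [1, 2, 3]

def lookup(endpoints, values, key):
    # bisect_right returns the insertion point after any equal entries,
    # so subtracting 1 gives the index of the range that contains key.
    index = bisect.bisect_right(endpoints, key) - 1
    if index < 0 or index >= len(values):
        raise KeyError('{!r} is out of range'.format(key))
    return values[index]

# Illustrative keys, including both sides of a boundary and a float.
results = {key: lookup(endpoints, values, key)
           for key in (0, 5000, 5001, 22500.75, 999999999)}
print(results)
# {0: 1, 5000: 1, 5001: 2, 22500.75: 2, 999999999: 3}
```

A key of -1 or 1000000000 falls outside every range and raises KeyError. Each search costs O(log n) comparisons, and no table is ever materialized.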

You can call next on a generator expression, with a default value of 0 so that a miss does not raise StopIteration:
def range_index(table, val):
    return next((v for k, v in table.items() if k[0] <= int(val) <= k[1]), 0)
This uses ordinary less-than-or-equal comparisons to find the range that contains val and returns the corresponding value.
Advantages:
No new dictionary creation for every search.
Exits immediately when the condition is satisfied.

Iterate over SCHEDULE and return the first value whose associated range contains val:
category = next(category
                for (start, stop), category in SCHEDULE.items()
                if val in range(start, stop + 1))
The membership test is cheap only in Python 3, where range is a lazy sequence and checking an integer val takes constant time. It would be a bit faster if you started off with a dict of ranges rather than of tuples. It would be faster still if you made SCHEDULE into a binary tree and did a binary search on it instead of a linear one. But this is good enough for the majority of cases.
This assumes your SCHEDULE is exhaustive: if you submit a val that is not covered by any of the ranges, you'll get a StopIteration error, which signals a programmer error. If you want a fallback value instead, wrap the generator expression in parentheses and pass the fallback as the second argument to next.
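The fallback form described above can be sketched as follows. The function name category_for and the default of None are hypothetical choices for illustration. int(val) is applied first so that the range membership test stays a constant-time integer check in Python 3.

```python
SCHEDULE = {
    (0, 5000): 1,
    (5001, 22500): 2,
    (22501, 999999999): 3,
}

def category_for(val, default=None):
    # The generator expression is wrapped in parentheses so that
    # `default` can be passed as the second argument to next().
    return next(
        (category
         for (start, stop), category in SCHEDULE.items()
         if int(val) in range(start, stop + 1)),
        default,
    )

print(category_for(100))        # 1
print(category_for(22500.9))    # 2
print(category_for(-5))         # None
```

The search is still linear in the number of ranges, which is fine for a handful of entries.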

Related

I need some help using a dictionary as a function parameter

I need to replicate this same function but instead of having a list as a parameter I need a dictionary. The idea is that the calculation done by the function is done with the values, and the function returns the keys.
def funcion(dic, Sum):
    Subset = []
    def f(dic, i, Sum):
        if i >= len(dic): return 1 if Sum == 0 else 0
        count = f(dic, i + 1, Sum)
        count += f(dic, i + 1, Sum - dic[i])
        return count
    for i, x in enumerate(dic):
        if f(dic, i + 1, Sum - x) > 0:
            Subset.append(x)
            Sum -= x
    return Subset
The function works if I pass (300, 200, 100, 400), but I need to use something like {1: 300, 2: 200, 3: 100, 4: 400} as input.
So the calculation is done with the values, but the function returns the keys that match the condition.
I've tried working with dic.keys() and dic.values(), but it's not working. Could you help me?
Thank you so much.

Your code isn't working with your dictionary because it's expecting to be able to index into dic with numeric indexes starting at 0 and going up to len(dic)-1. However, you've given your dictionary keys that start at 1 and go to len(dic). That means you need to change things up.
The first change is in the recursive f function, where you need the base case to trigger on i > len(dic) rather than using the >= comparison.
The next change is in the loop that calls f. Rather than using enumerate, which will generate indexes starting at 0 (and pair them with the keys of the dictionary, which is what you get when you iterate on it directly), you probably want to do something else.
Now, ideally, you'd want to iterate on dic.items(), which would give you index, value pairs just like your code expects. But depending on how the dictionary gets built, that might iterate over the values in a different order than you expect. In recent versions of Python, dictionaries maintain the order their keys were added in, so if you're creating the dictionary with {1:300, 2:200, 3:100, 4:400 }, you'll get the right order, but a mostly-equivalent dictionary like {3:100, 4:400, 1:300, 2:200 } would give its results in a different order.
So if you need to be resilient against dictionaries that don't have their keys in the right order, you probably want to generate the keys 1 through len(dic) yourself with range, and then index to get the x value inside the loop:
for i in range(1, len(dic)+1):  # Generate the keys directly from a range
    x = dic[i]  # and do the indexing manually.
    if f(dic, i + 1, Sum - x) > 0:  # The rest of the loop is the same as before.
        Subset.append(x)
        Sum -= x
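Putting both changes together, a minimal sketch of the whole function might look like this. It assumes the keys are exactly 1 through len(dic), as in the example input. The names subset_keys and count are hypothetical, and it appends the key i rather than the value x, since the question asks for keys to be returned.

```python
def subset_keys(dic, target):
    """Return keys whose values sum to target (keys assumed to be 1..len(dic))."""

    def count(i, remaining):
        # Number of subsets of the values at keys i..len(dic) summing to `remaining`.
        if i > len(dic):  # keys run up to len(dic), so the base case is '>' here
            return 1 if remaining == 0 else 0
        return count(i + 1, remaining) + count(i + 1, remaining - dic[i])

    chosen = []
    for i in range(1, len(dic) + 1):
        x = dic[i]
        # Take key i only if the rest can still reach the remaining sum.
        if count(i + 1, target - x) > 0:
            chosen.append(i)
            target -= x
    return chosen

print(subset_keys({1: 300, 2: 200, 3: 100, 4: 400}, 500))  # [1, 2]
print(subset_keys({1: 300, 2: 200, 3: 100, 4: 400}, 700))  # [1, 4]
```

The recursion explores every subset, so this is only practical for small dictionaries, the same as the original list version.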

Not able to understand Python3 enumerate()

Question: Given an array of integers, return indices of the two numbers such that they add up to a specific target.
You may assume that each input would have exactly one solution, and you may not use the same element twice.
Example:
Given nums = [2, 7, 11, 15], target = 9,
Because nums[0] + nums[1] = 2 + 7 = 9,
return [0, 1].
class Solution:
    def twoSum(self, nums, target):
        lookup = {}
        for cnt, num in enumerate(nums):
            if target - num in lookup:
                return lookup[target - num], cnt
            lookup[num] = cnt
I am not able to understand the steps after the for loop begins. I am new to Python; could someone please help me?

Let me help you understand by explaining what the code does and how it solves the problem.
We need to find two numbers that sum to 9. To achieve this, we iterate over every number in the array and check whether we have already encountered a number equal to the target minus the current number. If we haven't encountered such a number yet, we store the current number and its index.
Because we need to return the indices, we want to be able to look up a number and immediately get its index. The solution uses a dictionary that stores each number as a key and its index as the value.
We iterate over every number. If we have already encountered target - num, we return the stored index of target - num together with the current index. Otherwise, we simply store the current number and its index.
The enumerate part simply provides an index along with each value of the array being iterated, in the form (index, item).
class Solution:
    def twoSum(self, nums, target):
        # Here a dictionary is created, which will store value, index as key, value pairs.
        lookup = {}
        # For every number in the array, get the index (cnt) and number (num)
        for cnt, num in enumerate(nums):
            # If we find target-num, we know that num + target-num = target
            if target - num in lookup:
                # Hence we return the index of the target-num we stored in the dict, and the index of the current value (cnt)
                return lookup[target - num], cnt
            # Otherwise we store the current number as key with its index as value
            lookup[num] = cnt
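To watch the dictionary fill up step by step, here is a sketch of the same logic as a plain function with a print on every iteration. The name two_sum and the trace format are illustrative additions, not part of the original solution.

```python
def two_sum(nums, target):
    lookup = {}  # maps each number seen so far to its index
    for cnt, num in enumerate(nums):
        # Show the state before deciding what to do with this element.
        print("cnt={} num={} need={} lookup={}".format(cnt, num, target - num, lookup))
        if target - num in lookup:
            return lookup[target - num], cnt
        lookup[num] = cnt
    return None  # no pair found

result = two_sum([2, 7, 11, 15], 9)
print(result)
# cnt=0 num=2 need=7 lookup={}
# cnt=1 num=7 need=2 lookup={2: 0}
# (0, 1)
```

On the first pass nothing is in lookup, so 2 is stored with index 0. On the second pass the needed partner 2 is already there, so the function returns both indices.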

The built-in enumerate() function adds a counter to an iterable and returns it as an enumerate object. This object can be used directly in for loops or converted into a list of tuples with list().
For example,
>>> list(enumerate("abc"))
gives
[(0, 'a'), (1, 'b'), (2, 'c')]
To make it easier to follow, I have commented your program. Go through it and it should become clear.
class Solution:
    def twoSum(self, nums, target):
        # lookup is a dictionary that stores the number and its index
        # e.g. '{7:1}'
        # number 7 at index 1
        lookup = {}
        # As explained above cnt and num will receive values one by one along with index.
        for cnt, num in enumerate(nums):
            # We look if the number required to be added into the 'num' is present in dictionary
            if target - num in lookup:
                # if value found in lookup then we return the current index along with the index of number found in lookup.
                return lookup[target - num], cnt
            # After every loop insert the current value and its index into the lookup dictionary.
            lookup[num] = cnt
Hope, I answered your query in the way you wanted. Please comment below, if anything is left unanswered, I'll surely try to answer that as well.

Using a selection sort to sort an array in python. How can I optimize?

Working on this challenge on HackerRank and got this code to pass 10 out of 15 test cases. It is failing due to timeout error which is HackerRank's way of telling you that the algorithm is not optimized. How can I optimize this code to run on larger input data?
The goal is to figure out the minimum number of swaps necessary to sort an unsorted array.
Update: Each element in the array is distinct.
def minimum_swaps(arr):
    """Returns the minimum number of swaps to reorder array in ascending order."""
    swaps = 0
    for val in range(len(arr) - 1, 0, -1):
        # Index of max value
        max_pos = 0
        for index in range(1, val + 1):
            if arr[index] > arr[max_pos]:
                max_pos = index
        # Skip if value is already in sorted position
        if max_pos == val:
            continue
        arr[val], arr[max_pos] = arr[max_pos], arr[val]
        swaps += 1
    return swaps

Look at the code. It has 2 nested loops:
The outer loop iterates over the positions val.
The inner loop finds the index of the value that should be at the index val, i.e., max_pos.
It takes a lot of time just to find the index. Instead, I will compute the index of each value and store it in a dict.
index_of = {value: index for index, value in enumerate(arr)}
(note that because all values in arr are distinct, there should be no duplicated keys)
And also prepare a sorted version of the array: that way it's easier to find the maximum value instead of having to loop over the array.
sorted_arr = sorted(arr)
Then do the rest similar to the original code: for each index visited, use sorted_arr to get the max, use index_of to get its current index, if it's out-of-place then swap. Remember to update the index_of dict while swapping too.
The algorithm takes O(n) operations (including dict indexing/modifying), plus sorting cost of n elements (which is about O(n log n)).
Note: If the array arr only contains integers in a small range, it may be faster to make index_of an array instead of a dict.
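The steps above can be sketched as follows. This is one possible reading of the description, not the answerer's own code. It assumes all elements are distinct (as stated in the question) and works on a copy so the caller's list is left untouched.

```python
def minimum_swaps(arr):
    """Count the swaps needed to sort arr, assuming distinct elements."""
    arr = list(arr)  # work on a copy
    index_of = {value: index for index, value in enumerate(arr)}
    sorted_arr = sorted(arr)
    swaps = 0
    # Walk positions from the end, as in the original selection sort.
    for pos in range(len(arr) - 1, 0, -1):
        wanted = sorted_arr[pos]      # value that belongs at pos
        current = index_of[wanted]    # where that value sits right now
        if current == pos:
            continue                  # already in place
        displaced = arr[pos]
        arr[pos], arr[current] = wanted, displaced
        # Keep the index map in sync with the swap.
        index_of[wanted] = pos
        index_of[displaced] = current
        swaps += 1
    return swaps

print(minimum_swaps([4, 3, 1, 2]))           # 3
print(minimum_swaps([1, 3, 5, 2, 4, 6, 7]))  # 3
```

The inner search loop is gone: each position costs a constant number of dict operations, so the sort dominates the running time.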

The short answer is: implement merge sort. The selection sort algorithm you are using has O(n^2) running time, while merge sort has O(n log n) running time.

Finding min and max values from a dictionary containing tuple values

I have a python dictionary named cdc_year_births.
For cdc_year_births, the keys are the unit (in this case the unit is a year), the values are the number of births in that unit:
print(cdc_year_births)
{2000: 4058814, 2001: 4025933, 2002: 4021726, 2003: 4089950, 1994: 3952767,
1995: 3899589, 1996: 3891494, 1997: 3880894, 1998: 3941553, 1999: 3959417}
I wrote a function that returns the maximum and minimum years and their births. When I started the function, I thought I'd hard code the max and min unit at 0 and 1000000000, respectively, and then iterate through the dictionary and compare each key's value to those hard coded values; if the conditions were met, I'd replace the max/min unit and the max/min birth.
But if the dictionary I used had negative values or values greater than 1000000000, this function wouldn't work, which is why I had to "load in" some actual values from the dictionary with the first loop, then loop over them again.
I built this function but could not get it to work properly:
def max_min_counts(data):
    max_min = {}
    for key,value in data.items():
        max_min["max"] = key,value
        max_min["min"] = key,value
    for key,value in data.items():
        if value >= max_min["max"]:
            max_min["max"]=key,value
        if value <= max_min["min"]:
            max_min["min"]=key,value
    return max_min
t=max_min_counts(cdc_year_births)
print(t)
It results in TypeError: unorderable types: int() >= tuple() for
if value >= max_min["max"]:
and
if value <= max_min["min"]:
I tried extracting the value from the tuple as described in Finding the max and min in dictionary as tuples python, but could not get this to work.
Can anyone help me make the second, shorter function work or show me how to write a better one?
Thank you very much in advance.

Your values are 2-tuples. You'll need one further level of indexing to get them to work:
if value >= max_min["max"][1]:
And,
if value <= max_min["min"][1]:
If you want to preset your max/min values, you can use float('inf') and -float('inf'):
max_min["max"] = (-1, -float('inf')) # Smallest value possible.
max_min["min"] = (-1, float('inf')) # Largest value possible.
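Combining the extra [1] index with the infinity presets, the question's function can be reduced to a single loop. The sketch below is one way to do it. It uses None rather than -1 as the placeholder key, which is an arbitrary choice made here.

```python
cdc_year_births = {
    2000: 4058814, 2001: 4025933, 2002: 4021726, 2003: 4089950, 1994: 3952767,
    1995: 3899589, 1996: 3891494, 1997: 3880894, 1998: 3941553, 1999: 3959417,
}

def max_min_counts(data):
    # Presets that any real value will beat, so no priming loop is needed.
    max_min = {
        "max": (None, -float("inf")),
        "min": (None, float("inf")),
    }
    for key, value in data.items():
        # Index [1] picks the count out of the stored (key, count) tuple.
        if value >= max_min["max"][1]:
            max_min["max"] = (key, value)
        if value <= max_min["min"][1]:
            max_min["min"] = (key, value)
    return max_min

print(max_min_counts(cdc_year_births))
# {'max': (2003, 4089950), 'min': (1997, 3880894)}
```

For an empty dictionary the presets are returned unchanged, which callers may want to check for.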
You can do this efficiently using max, min, and operator.itemgetter to avoid a lambda:
from operator import itemgetter
max(cdc_year_births.items(), key=itemgetter(1))
# (2003, 4089950)
min(cdc_year_births.items(), key=itemgetter(1))
# (1997, 3880894)
Here's a slick way to compute the max and min with reduce:
from functools import reduce
reduce(lambda x, y: x if x[1] > y[1] else y, cdc_year_births.items())
# (2003, 4089950)
reduce(lambda x, y: x if x[1] < y[1] else y, cdc_year_births.items())
# (1997, 3880894)
items() yields the dictionary's (key, value) pairs as tuples, and the key argument tells max and min what to compare when picking the result.

In case you're interested in a more functional programming-oriented solution (or just something with more independent component parts), allow me to suggest the following:
Establish a comparison function between entries
Yes, we can use </> to compare the values as we iterate through the dict, but, as will become evident in a moment, it'll be useful to have something which lets us keep track of the year associated with that number of births.
def comp_births(op, lpair, rpair):
    lyr, lbirths = lpair
    ryr, rbirths = rpair
    return rpair if op(rbirths, lbirths) else lpair
At the end of the day, op will end up being either the numerical greater than or the numerical less than, but adding this tuple business accomplishes our goal of keeping track of the year associated with the number of births. Further, by factoring op out into a function parameter, rather than hard-coding the operator, we open the door to reusing this code for both the "min" and "max" variations.
Construct your iteratees
Now, all we need to do to create a function that compares two year/num_births pairs is partially apply our comparison function:
from functools import partial
from operator import gt, lt
get_max = partial(comp_births, gt)
get_min = partial(comp_births, lt)
get_max((2003, 150), (2012, 400)) #=> (2012, 400)
Pipe in your data
So where do we find these year/num_births pairs? Turns out it's just cdc_year_births.items(). And since we're lazy, let's use a function to do the iteration for us (reduce):
from functools import reduce
yr_of_max_births, max_births = reduce(get_max, cdc_year_births.items())
yr_of_min_births, min_births = reduce(get_min, cdc_year_births.items())

You need to compare against the value, not the entire tuple:
if value >= max_min["max"][1]:
As for not using the built-in functions, are you averse to using other built-ins? For instance, you could use reduce with a simple function -- x if x[1] < y[1] else y -- to get the minimum of all the entries. You could also sort the entries with x[1] as the key, then take the first and last elements of the sorted list.
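Both suggestions can be sketched in a few lines. The variable names below are illustrative, and the data is the dictionary printed in the question.

```python
from functools import reduce

cdc_year_births = {
    2000: 4058814, 2001: 4025933, 2002: 4021726, 2003: 4089950, 1994: 3952767,
    1995: 3899589, 1996: 3891494, 1997: 3880894, 1998: 3941553, 1999: 3959417,
}

# Option 1: reduce keeps whichever (year, births) pair has the smaller count.
min_entry = reduce(lambda x, y: x if x[1] < y[1] else y, cdc_year_births.items())

# Option 2: sort the pairs by births, then take the first and last.
ordered = sorted(cdc_year_births.items(), key=lambda x: x[1])
lowest, highest = ordered[0], ordered[-1]

print(min_entry)  # (1997, 3880894)
print(lowest)     # (1997, 3880894)
print(highest)    # (2003, 4089950)
```

Sorting costs O(n log n) against O(n) for reduce, but it yields both extremes from one pass over the sorted list.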

Yeah, I'm up to this exercise too.
Without using max and min functions (we haven't covered them yet in the course material) here's the hard way...
def minimax(dict):
    minimax_dict = {}
    if(len(dict) == 31):
        time = "day_of_month"
    elif(len(dict) == 12):
        time = "month"
    elif(len(dict) == 7):
        time = "day_of_week"
    else:
        time = 'year'
    min_time = "min_" + time
    max_time = "max_" + time
    for item in dict:
        if 'min_count' in minimax_dict:
            if dict[item] < minimax_dict['min_count']:
                minimax_dict['min_count'] = dict[item]
                minimax_dict[min_time] = item
        else:
            minimax_dict['min_count'] = dict[item]
            minimax_dict[min_time] = item
        if 'max_count' in minimax_dict:
            if dict[item] > minimax_dict['max_count']:
                minimax_dict['max_count'] = dict[item]
                minimax_dict[max_time] = item
        else:
            minimax_dict['max_count'] = dict[item]
            minimax_dict[max_time] = item
    return minimax_dict
#here's the test stuff...
min_max_dow_births = minimax(cdc_dow_births)
#min_max_dow_births
min_max_year_births = minimax(cdc_year_births)
#min_max_year_births
min_max_dom_births = minimax(cdc_dom_births)
#min_max_dom_births
min_max_month_births = minimax(cdc_month_births)
#min_max_month_births

How to random choose several value from a list based on the value from other list?

I have two lists and one number. The list val holds numerical values (which may repeat):
val = [3,2,5,6,1,6]
The list pair holds paired values (with no repeats):
pair = [(1,3),(3,2),(7,3),(6,5),(3,4),(5,7)]
Both lists have the same length, i.e., len(val) == len(pair). The number is a numeric value, say num = 4.
The task is to check whether any value in val is greater than or equal to num. If so, find every position holding the maximum value and randomly choose one of the entries at those positions in pair. For the example above, the result should be a random choice between (6,5) and (5,7). I could write long code with several functions to do this, but I wonder whether there is a concise way?

import random
num = 4
val = [3,2,5,6,1,6]
pair = [(1,3),(3,2),(7,3),(6,5),(3,4),(5,7)]
best = max(val)
# Only choose when the maximum reaches num; pick a whole pair, not one of its elements.
if best >= num:
    print(random.choice([p for v, p in zip(val, pair) if v == best]))
