I have a problem in which I need to find the maximum mean over slices of varying lengths.
For example: I have a list of 5,000 ints, and for each slice length from 1 to 3600 elements I want to find the maximum mean over all slices of that length.
Currently my code is as follows:
import statistics

power_vals = ...  # some list / array of ints
max_vals = []
for i in range(1, 3600):
    max_vals += [max([statistics.mean(power_vals[ix:ix+i]) for ix in range(len(power_vals)) if ix+i < len(power_vals)])]
This works fine but it's really slow (for obvious reasons). I tried to use Cython to speed up the process. It's obviously better but still not ideal.
Is there a more time efficient way to do this?
Your first step is to prepend a 0 to the array and then create a cumulative sum. At this point, calculating the mean from any point to any other point is two subtractions followed by a division.
mean(x[i:j]) = (cumsum[j] - cumsum[i])/(j - i)
If you're trying to find the largest mean of, say, length 10, then you can make it even faster by just looking for the largest value of (cumsum[i + 10] - cumsum[i]). Once you've found that largest value, you can then divide it by 10 to get the mean.
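For instance, here is a minimal sketch of this approach (my own illustration, not the asker's code; it uses numpy, and the window lengths 1 to 3600 mirror the loop in the question):
import numpy as np

power_vals = np.random.randint(0, 1000, size=5000)  # stand-in for the real data

# prepend a 0 and build the cumulative sum
cumsum = np.concatenate(([0], np.cumsum(power_vals)))

max_means = []
for width in range(1, 3600):
    # cumsum[i + width] - cumsum[i] is the sum of every slice of length `width`
    window_sums = cumsum[width:] - cumsum[:-width]
    # the largest sum divided by the width gives the largest mean for that width
    max_means.append(window_sums.max() / float(width))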
I'm looking into the source code for the function sample in random.py (Python standard library).
The idea is simple:
If a small sample (k) is needed from a large population (n): Just pick k random indices, since it is unlikely you'll pick the same number twice as the population is so large. And if you do, just pick again.
If a relatively large sample (k) is needed, compared to the total population (n): It is better to keep track of what you have picked.
My Question
There are a few constants involved: setsize = 21 and setsize += 4 ** _ceil(_log(k * 3, 4)). The critical ratio is roughly k : 21 + 3k. The comments say # size of a small set minus size of an empty list and # table size for big sets.
Where do these specific numbers come from? What is their justification?
The comments shed some light, but I find they raise as many questions as they answer.
I would kind of understand "size of a small set", but I find the "minus size of an empty list" part confusing. Can someone shed any light on this?
What is meant specifically by "table" size, as opposed to, say, "set" size?
Looking at the GitHub repository, it seems a very old version simply used k : 6*k as the critical ratio, but I find that equally mysterious.
The code
def sample(self, population, k):
    """Chooses k unique random elements from a population sequence or set.
    Returns a new list containing elements from the population while
    leaving the original population unchanged. The resulting list is
    in selection order so that all sub-slices will also be valid random
    samples. This allows raffle winners (the sample) to be partitioned
    into grand prize and second place winners (the subslices).
    Members of the population need not be hashable or unique. If the
    population contains repeats, then each occurrence is a possible
    selection in the sample.
    To choose a sample in a range of integers, use range as an argument.
    This is especially fast and space efficient for sampling from a
    large population: sample(range(10000000), 60)
    """
    # Sampling without replacement entails tracking either potential
    # selections (the pool) in a list or previous selections in a set.
    # When the number of selections is small compared to the
    # population, then tracking selections is efficient, requiring
    # only a small set and an occasional reselection. For
    # a larger number of selections, the pool tracking method is
    # preferred since the list takes less space than the
    # set and it doesn't suffer from frequent reselections.
    if isinstance(population, _Set):
        population = tuple(population)
    if not isinstance(population, _Sequence):
        raise TypeError("Population must be a sequence or set. For dicts, use list(d).")
    randbelow = self._randbelow
    n = len(population)
    if not 0 <= k <= n:
        raise ValueError("Sample larger than population or is negative")
    result = [None] * k
    setsize = 21        # size of a small set minus size of an empty list
    if k > 5:
        setsize += 4 ** _ceil(_log(k * 3, 4)) # table size for big sets
    if n <= setsize:
        # An n-length list is smaller than a k-length set
        pool = list(population)
        for i in range(k):          # invariant: non-selected at [0,n-i)
            j = randbelow(n-i)
            result[i] = pool[j]
            pool[j] = pool[n-i-1]   # move non-selected item into vacancy
    else:
        selected = set()
        selected_add = selected.add
        for i in range(k):
            j = randbelow(n)
            while j in selected:
                j = randbelow(n)
            selected_add(j)
            result[i] = population[j]
    return result
(I apologise if this question would be better placed on math.stackexchange. I couldn't think of any probability/statistics-y reason for this particular ratio, and the comments sounded as though it was maybe something to do with the amount of space that sets and lists use - but I couldn't find any details anywhere.)
This code is attempting to determine whether using a list or a set would take more space (instead of trying to estimate the time cost, for some reason).
It looks like 21 was the difference between the size of an empty list and a small set on the Python build this constant was determined on, expressed in multiples of the size of a pointer. I don't have a build of that version of Python, but testing on my 64-bit CPython 3.6.3 gives a difference of 20 pointer sizes:
>>> sys.getsizeof(set()) - sys.getsizeof([])
160
and comparing the 3.6.3 list and set struct definitions to the list and set definitions from the change that introduced this code, 21 seems plausible.
I said "the difference between the size of an empty list and a small set" because both now and at the time, small sets used a hash table contained inside the set struct itself instead of externally allocated:
setentry smalltable[PySet_MINSIZE];
The
if k > 5:
    setsize += 4 ** _ceil(_log(k * 3, 4)) # table size for big sets
check adds the size of the external table allocated for sets larger than 5 elements, with size again expressed in number of pointers. This computation assumes the set never shrinks, since the sampling algorithm never removes elements. I am not currently sure whether this computation is exact.
Finally,
if n <= setsize:
compares the base overhead of a set plus any space used by an external hash table to the n pointers required by a list of the input elements. (It doesn't seem to account for the overallocation performed by list(population), so it may be underestimating the cost of the list.)
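To see what that threshold works out to in practice, here is a small sketch (sample_setsize is my own illustrative name) that just re-evaluates the setsize formula from the code above for a few values of k, using math.ceil and math.log in place of the module-private _ceil and _log:
from math import ceil, log

def sample_setsize(k):
    # mirrors the computation in random.sample
    setsize = 21                             # size of a small set minus size of an empty list
    if k > 5:
        setsize += 4 ** ceil(log(k * 3, 4))  # table size for big sets
    return setsize

# sample() takes the set-tracking branch only when n > setsize for the requested k
for k in (5, 6, 50, 500):
    print("k=%d  setsize=%d" % (k, sample_setsize(k)))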
I'm trying to return the running median for a series of streaming numbers. To do that I use a max-heap (which stores the values on the lower half of the series) and a min-heap (which stores the values on the higher half of the series).
In particular I'm using the built-in min-heap from Python 2's heapq module (https://docs.python.org/2/library/heapq.html). To build the max-heap I simply push the negatives of the numbers onto the heap.
My Python code is the following:
import heapq
maxh = []
minh = []
vals=[1,2,3,4,5,6,7,8,9,10]
for val in vals:
    # Initialize the data-structure and insert/push the 1st streaming value
    if not maxh and not minh:
        heapq.heappush(maxh,-val)
        print float(val)
    elif maxh:
        # Insert/push the other streaming values
        if val>-maxh[0]:
            heapq.heappush(minh,val)
        elif val<-maxh[0]:
            heapq.heappush(maxh,-val)
        # Calculate the median
        if len(maxh)==len(minh):
            print float(-maxh[0]+minh[0])/2
        elif len(maxh)==len(minh)+1:
            print float(-maxh[0])
        elif len(minh)==len(maxh)+1:
            print float(minh[0])
        # If min-heap and max-heap grow unbalanced we rebalance them by
        # removing/popping one element from a heap and inserting/pushing
        # it into the other heap, then we calculate the median
        elif len(minh)==len(maxh)+2:
            heapq.heappush(maxh,-heapq.heappop(minh))
            print float(-maxh[0]+minh[0])/2
        elif len(maxh)==len(minh)+2:
            heapq.heappush(minh,-heapq.heappop(maxh))
            print float(-maxh[0]+minh[0])/2
Below is the full list of test cases I've built to check my code:
vals=[1,2,3,4,5,6,7,8,9,10] # positive numbers, increasing series
vals=[10,9,8,7,6,5,4,3,2,1] # positive numbers, decreasing series
vals=[10,9,11,8,12,7,13,6,14,5] # positive numbers, jumping series (keeping
# heaps balanced)
vals=[-10,-9,-8,-7,-6,-5,-4,-3,-2,-1] # negative numbers, increasing series
vals=[-1,-2,-3,-4,-5,-6,-7,-8,-9,-10] # negative numbers, decreasing series
vals=[-10,-9,-11,-8,-12,-7,-13,-6,-14,-5] # negative numbers
# jumping series (keeping heaps
# balanced)
vals=[-5,-4,-3,-2,-1,0,1,2,3,4,5] # mixed positive-negative numbers,
# increasing series
vals=[5,4,3,2,1,0,-1,-2,-3,-4,-5] # mixed positive-negative numbers,
# decreasing series
vals=[0,-1,1,-2,2,-3,3,-4,4,-5,5] # mixed positive-negative numbers,
# jumping series (keeping heaps balanced)
My code seems ok to me but I cannot pass 4 out of 10 test cases with an online judge (https://www.hackerrank.com/challenges/ctci-find-the-running-median/problem).
Do you have any hint?
The problem is here:
# Insert/push the other streaming values
if val>-maxh[0]:
    heapq.heappush(minh,val)
elif val<-maxh[0]:
    heapq.heappush(maxh,-val)
If val == -maxh[0], then the item is never pushed onto either heap. You should be able to reveal the error with the test case [1,1,2].
A simple fix would be:
# Insert/push the other streaming values
if val >= -maxh[0]:
    heapq.heappush(minh,val)
else:
    heapq.heappush(maxh,-val)
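As a quick sanity check, here is a minimal sketch (my own wrapper, with the illustrative name running_median, not the original code) that applies the corrected comparison plus rebalancing to a list of values; it prints [1.0, 1.0, 1.0] for [1, 1, 2], as expected:
import heapq

def running_median(vals):
    maxh, minh, medians = [], [], []
    for val in vals:
        if maxh and val >= -maxh[0]:
            heapq.heappush(minh, val)    # value belongs to the upper half
        else:
            heapq.heappush(maxh, -val)   # value belongs to the lower half
        # rebalance so the heap sizes never differ by more than one
        if len(maxh) > len(minh) + 1:
            heapq.heappush(minh, -heapq.heappop(maxh))
        elif len(minh) > len(maxh) + 1:
            heapq.heappush(maxh, -heapq.heappop(minh))
        # the median is the root of the larger heap, or the average of both roots
        if len(maxh) == len(minh):
            medians.append((-maxh[0] + minh[0]) / 2.0)
        elif len(maxh) > len(minh):
            medians.append(float(-maxh[0]))
        else:
            medians.append(float(minh[0]))
    return medians

print(running_median([1, 1, 2]))   # [1.0, 1.0, 1.0]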
I was playing around with the Singpath Python practice questions and came across a simple question which asks the following:
Given an input of a list of numbers and a high number,
return the number of multiples
of each of those numbers that are less than the maximum number.
For this case the list will contain a maximum of 3 numbers
that are all relatively prime to each other.
I wrote this simple program, and it ran perfectly fine:
"""
Given an input of a list of numbers and a high number,
return the number of multiples
of each of those numbers that are less than the maximum number.
For this case the list will contain a maximum of 3 numbers
that are all relatively prime to each other.
>>> countMultiples([3],30)
9
>>> countMultiples([3,5],100)
46
>>> countMultiples([3,5,7],30)
16
"""
def countMultiples(l, max):
    j = []
    for num in l:
        i = 1
        count = 0
        while num * i < max:
            if num * i not in j:
                j.append(num * i)
            i += 1
    return len(j)

print countMultiples([3],30)
print countMultiples([3,5],100)
print countMultiples([3, 5, 7],30)
But when I tried to run the same code on Singpath, it gave me this error:
Your code took too long to return.
Your solution may be stuck in an infinite loop. Please try again.
Has anyone experienced the same issues with Singpath?
I suspect the error you're getting means exactly what it says. For some input that the test program gives your function, it takes too long to return. I don't know anything about Singpath myself, so I don't know exactly how long that might be. But I'd guess that they give you enough time to solve the problem if you use the best algorithm.
You can see for yourself that your code is slow if you pass in a very large max value. Try passing 10000 as max and you may end up waiting for a minute or two to get a result.
There are a couple of reasons your code is slow in these situations. The first is that you have a list of every multiple that you've found so far, and you are searching the list to see if the latest value has already been seen. Each search takes time proportional to the length of the list, so for the whole run of the function, it takes quadratic time (relative to the result value).
You could improve on this quite a lot by using a set instead of a list. You can test if an object is in a set in (amortized) constant time. But if j is a set, you don't actually need to test if a value is already in it before adding, since sets ignore duplicated values anyway. This means you can just add a value to the set without any care about whether it was there already.
def countMultiples(l, max):
    j = set()   # use a set object, rather than a list
    for num in l:
        i = 1
        count = 0
        while num * i < max:
            j.add(num*i)   # add items to the set unconditionally
            i += 1
    return len(j)   # duplicate values are ignored, and won't be counted
This runs a fair amount faster than the original code, and max values of a million or more will return in a not too unreasonable time. But if you try values larger still (say, 100 million or a billion), you'll eventually still run into trouble. That's because your code uses a loop to find all the multiples, which takes linear time (relative to the result value). Fortunately, there is a better algorithm.
(If you want to figure out the better approach on your own, you might want to stop reading here.)
The better way is to use division to find how many times you can multiply each value to get a value less than max. The number of multiples of num that are strictly less than max is (max-1) // num (the -1 is because we don't want to count max itself). Integer division is much faster than doing a loop!
There is an added complexity though. If you divide to find the number of multiples, you don't actually have the multiples themselves to put in a set like we were doing above. This means that any integer that is a multiple of more than one of our input numbers will be counted more than once.
Fortunately, there's a good way to fix this. We just need to count how many integers were over counted, and subtract that from our total. When we have two input values, we'll have double counted every integer that is a multiple of their least common multiple (which, since we're guaranteed that they're relatively prime, means their product).
If we have three values, we can do the same subtraction for each pair of numbers. But that won't be exactly right either. The integers that are multiples of all three of our input numbers will be counted three times, then subtracted back out three times as well (since they're multiples of the LCM of each pair of values). So we need to add a final value to make sure those multiples of all three values are included in the final sum exactly once.
import itertools

def countMultiples(numbers, max):
    count = 0
    for num in numbers:
        count += (max-1) // num       # count multiples of num that are less than max
    for a, b in itertools.combinations(numbers, 2):
        count -= (max-1) // (a*b)     # remove double counted numbers
    if len(numbers) == 3:
        a, b, c = numbers
        count += (max-1) // (a*b*c)   # add the vals that were removed too many times
    return count
This should run in something like constant time for any value of max.
Now, that's probably as efficient as you need to be for the problem you're given (which will always have no more than three values). But if you wanted a solution that works for more input values, you can write a general version. It uses the same algorithm as the previous version, and uses itertools.combinations a lot more to take different numbers of input values at a time. The counts of multiples of the LCM of an odd number of values get added to the total, while the counts of multiples of the LCM of an even number of values get subtracted.
import itertools
from functools import reduce
from operator import mul

def lcm(nums):
    return reduce(mul, nums)   # this is only correct if nums are all relatively prime

def countMultiples(numbers, max):
    count = 0
    for n in range(len(numbers)):
        for nums in itertools.combinations(numbers, n+1):
            count += (-1)**n * (max-1) // lcm(nums)
    return count
Here's an example output of this version, which was computed very quickly:
>>> countMultiples([2,3,5,7,11,13,17], 100000000000000)
81947464300342
I'm having some troubles understanding this behaviour.
I'm measuring the execution time with the timeit-module and get the following results for 10000 cycles:
Merge : 1.22722930395
Bubble: 0.810706578175
Select: 0.469924766812
This is my code for MergeSort:
def mergeSort(array):
    if len(array) <= 1:
        return array
    else:
        left = array[:len(array)/2]
        right = array[len(array)/2:]
        return merge(mergeSort(left),mergeSort(right))

def merge(array1,array2):
    merged_array=[]
    while len(array1) > 0 or len(array2) > 0:
        if array2 and not array1:
            merged_array.append(array2.pop(0))
        elif (array1 and not array2) or array1[0] < array2[0]:
            merged_array.append(array1.pop(0))
        else:
            merged_array.append(array2.pop(0))
    return merged_array
Edit:
I've changed the list operations to use pointers and my tests now work with a list of 1000 random numbers from 0-1000. (btw: I changed to only 10 cycles here)
result:
Merge : 0.0574434420723
Bubble: 1.74780097558
Select: 0.362952293025
This is my rewritten merge definition:
def merge(array1, array2):
    merged_array = []
    pointer1, pointer2 = 0, 0
    while pointer1 < len(array1) and pointer2 < len(array2):
        if array1[pointer1] < array2[pointer2]:
            merged_array.append(array1[pointer1])
            pointer1 += 1
        else:
            merged_array.append(array2[pointer2])
            pointer2 += 1
    while pointer1 < len(array1):
        merged_array.append(array1[pointer1])
        pointer1 += 1
    while pointer2 < len(array2):
        merged_array.append(array2[pointer2])
        pointer2 += 1
    return merged_array
seems to work pretty well now :)
list.pop(0) pops the first element and has to shift all the remaining ones; this is an additional O(n) operation that you want to avoid.
Also, slicing a list object creates a copy:
left = array[:len(array)/2]
right = array[len(array)/2:]
Which means you're also using O(n * log(n)) memory instead of O(n).
I can't see BubbleSort, but I bet it works in-place, no wonder it's faster.
You need to rewrite it to work in-place. Instead of copying part of the original list, pass starting and ending indices.
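A minimal sketch of that idea (my own illustration, with the illustrative name merge_sort_inplace): recurse on index bounds and merge through a single temporary buffer, so there is no pop(0) and no slice copying on the way down the recursion:
def merge_sort_inplace(a, lo=0, hi=None):
    """Sort a[lo:hi] in place, recursing on indices instead of slice copies."""
    if hi is None:
        hi = len(a)
    if hi - lo <= 1:
        return
    mid = (lo + hi) // 2
    merge_sort_inplace(a, lo, mid)
    merge_sort_inplace(a, mid, hi)
    # merge the two sorted halves through one temporary buffer
    merged = []
    i, j = lo, mid
    while i < mid and j < hi:
        if a[i] <= a[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(a[j])
            j += 1
    merged.extend(a[i:mid])
    merged.extend(a[j:hi])
    a[lo:hi] = merged

data = [5, 2, 9, 1, 5, 6]
merge_sort_inplace(data)
print(data)   # [1, 2, 5, 5, 6, 9]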
For starters: I cannot reproduce your timing results, on 100 cycles and lists of size 10000. The exhaustive benchmark with timeit of all implementations discussed in this answer (including bubblesort and your original snippet) is posted as a gist here. I find the following results for the average duration of a single run:
Python's native (Tim)sort : 0.0144600081444
Bubblesort : 26.9620819092
(Your) Original Mergesort : 0.224888720512
Now, to make your function faster, you can do a few things.
Edit: Well, apparently I was wrong on that one (thanks cwillu). Length computation takes O(1) in Python. But removing useless computation everywhere still improves things a bit (Original Mergesort: 0.224888720512, no-length Mergesort: 0.195795390606):
def nolenmerge(array1,array2):
    merged_array=[]
    while array1 or array2:
        if not array1:
            merged_array.append(array2.pop(0))
        elif (not array2) or array1[0] < array2[0]:
            merged_array.append(array1.pop(0))
        else:
            merged_array.append(array2.pop(0))
    return merged_array

def nolenmergeSort(array):
    n = len(array)
    if n <= 1:
        return array
    left = array[:n/2]
    right = array[n/2:]
    return nolenmerge(nolenmergeSort(left),nolenmergeSort(right))
Second, as suggested in this answer, pop(0) is linear. Rewrite your merge to pop() at the end:
def fastmerge(array1,array2):
    merged_array=[]
    while array1 or array2:
        if not array1:
            merged_array.append(array2.pop())
        elif (not array2) or array1[-1] > array2[-1]:
            merged_array.append(array1.pop())
        else:
            merged_array.append(array2.pop())
    merged_array.reverse()
    return merged_array
This is again faster: no-len Mergesort: 0.195795390606, no-len Mergesort+fastmerge: 0.126505711079
Third - and this would only be useful as-is if you were using a language that does tail call optimization; without it, it's a bad idea - your merge sort is not tail-recursive: it calls both mergeSort(left) and mergeSort(right) recursively while there is remaining work in the call (the merge).
But you can make it tail-recursive by using CPS (continuation-passing style); note that without TCO this will run out of stack space for even modest lists:
def cps_merge_sort(array):
    return cpsmergeSort(array,lambda x:x)

def cpsmergeSort(array,continuation):
    n = len(array)
    if n <= 1:
        return continuation(array)
    left = array[:n/2]
    right = array[n/2:]
    return cpsmergeSort(left, lambda leftR:
        cpsmergeSort(right, lambda rightR:
            continuation(fastmerge(leftR,rightR))))
Once this is done, you can do TCO by hand to defer the call stack management done by recursion to the while loop of a normal function (trampolining, explained e.g. here, trick originally due to Guy Steele). Trampolining and CPS work great together.
You write a thunking function that "records" and delays application: it takes a function and its arguments, and returns a function that returns the original function applied to those arguments.
thunk = lambda name, *args: lambda: name(*args)
You then write a trampoline that manages calls to thunks: it applies a thunk until the thunk returns a result (as opposed to another thunk).
def trampoline(bouncer):
    while callable(bouncer):
        bouncer = bouncer()
    return bouncer
Then all that's left is to "freeze" (thunk) all your recursive calls from the original CPS function, to let the trampoline unwrap them in proper sequence. Your function now returns a thunk, without recursion (and discarding its own frame), at every call:
def tco_cpsmergeSort(array,continuation):
    n = len(array)
    if n <= 1:
        return continuation(array)
    left = array[:n/2]
    right = array[n/2:]
    return thunk(tco_cpsmergeSort, left, lambda leftR:
        thunk(tco_cpsmergeSort, right, lambda rightR:
            continuation(fastmerge(leftR,rightR))))

mycpomergesort = lambda l: trampoline(tco_cpsmergeSort(l,lambda x:x))
Sadly this does not go that fast (recursive mergesort: 0.126505711079, this trampolined version: 0.170638551712). OK, I guess the stack growth of the recursive merge sort algorithm is in fact modest: as soon as you get out of the leftmost path in the array-slicing recursion pattern, the algorithm starts returning (and removing frames). So for 10K-sized lists, you get a function stack of depth at most log2(10000) ≈ 14 ... pretty modest.
You can do a slightly more involved stack-based recursion elimination, along the lines of what this SO answer gives:
def leftcomb(l):
    maxn,leftcomb = len(l),[]
    n = maxn/2
    while maxn > 1:
        leftcomb.append((l[n:maxn],False))
        maxn,n = n,n/2
    return l[:maxn],leftcomb

def tcomergesort(l):
    l,stack = leftcomb(l)
    while stack: # l sorted, stack contains tagged slices
        i,ordered = stack.pop()
        if ordered:
            l = fastmerge(l,i)
        else:
            stack.append((l,True)) # store return call
            rsub,ssub = leftcomb(i)
            stack.extend(ssub) #recurse
            l = rsub
    return l
But this goes only a tad faster (trampolined mergesort: 0.170638551712, this stack-based version: 0.144994809628). Apparently, the stack building Python does for the recursive calls of our original merge sort is pretty inexpensive.
The final results? On my machine (Ubuntu Natty's stock Python 2.7.1+), the average run timings (out of 100 runs - except for Bubblesort - on a list of size 10000 containing random integers between 0 and 10000000) are:
Python's native (Tim)sort : 0.0144600081444
Bubblesort : 26.9620819092
Original Mergesort : 0.224888720512
no-len Mergesort : 0.195795390606
no-len Mergesort + fastmerge : 0.126505711079
trampolined CPS Mergesort + fastmerge : 0.170638551712
stack-based mergesort + fastmerge: 0.144994809628
Your merge-sort has a big constant factor; you have to run it on large lists to see the asymptotic complexity benefit.
Umm.. 1,000 records?? You are still well within polynomial coefficient dominance here.. If I have
selection-sort: 15 * n ^ 2 (reads) + 5 * n^2 (swaps)
insertion-sort: 5 * n ^2 (reads) + 15 * n^2 (swaps)
merge-sort: 200 * n * log(n) (reads) + 1000 * n * log(n) (merges)
You're going to be in a close race for a long while.. By the way, 2x faster in sorting is NOTHING. Try 100x slower. That's where the real differences are felt. Try "won't finish in my lifetime" algorithms (there are known regular expressions that take this long to match simple strings).
So try 1M or 1G records and let us know if you still think merge-sort isn't doing too well.
That being said..
There are lots of things causing this merge-sort to be expensive. First of all, nobody ever runs quick-sort or merge-sort on small-scale data structures.. Where you have if (len <= 1), people generally put:
if (len <= 16) : (use inline insertion-sort)
else: merge-sort
At each level of the recursion, since insertion-sort has a smaller coefficient cost at small sizes of n. Note that 50% of your work is done in this last mile.
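For illustration, here is a rough sketch of that hybrid (my own names; the cutoff of 16 is just the conventional ballpark, not a tuned value, and the merge is a plain two-index merge rather than anything optimal):
def insertion_sort(a):
    # cheap for tiny lists: small constant factors beat the recursion overhead
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def hybrid_merge_sort(a, cutoff=16):
    if len(a) <= cutoff:
        return insertion_sort(a)
    mid = len(a) // 2
    left = hybrid_merge_sort(a[:mid], cutoff)
    right = hybrid_merge_sort(a[mid:], cutoff)
    # standard two-index merge of the sorted halves
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged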
Next, you are needlessly running array1.pop(0) instead of maintaining index counters. If you're lucky, Python is efficiently managing start-of-array offsets, but all else being equal, you're mutating the input parameters.
Also, you know the size of the target array during the merge, so why grow merged_array by repeated copy-and-double? Pre-allocate the target array at the start of the function. That'll save at least a dozen 'clones' per merge level.
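As a sketch of the pre-allocation point (again my own illustration, assuming both inputs are already sorted):
def merge_prealloc(a, b):
    # the merged length is known up front, so allocate the result once
    out = [None] * (len(a) + len(b))
    i = j = k = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out[k] = a[i]
            i += 1
        else:
            out[k] = b[j]
            j += 1
        k += 1
    # exactly one of the inputs still has elements left; copy them over
    out[k:] = a[i:] if i < len(a) else b[j:]
    return out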
In general, merge-sort uses 2x the memory of the input. Your algorithm is probably using 20x because of all the temporary merge buffers (hopefully Python can free structures before recursion). It breaks elegance, but generally the best merge-sort algorithms make an immediate allocation of a merge buffer equal to the size of the source array, and you perform complex address arithmetic (or array-index + span-length) to just keep merging data structures back and forth. It won't be as elegant as a simple recursive solution like this, but it's somewhat close.
When sorting in C, cache locality is your biggest enemy. You want hot data structures so you maximize your cache. By allocating transient temp buffers (even if the memory manager is returning pointers to hot memory) you run the risk of making slow DRAM calls (pre-filling cache lines for data you're about to overwrite). This is one advantage insertion-sort, selection-sort and quick-sort have over merge-sort (when implemented as above).
Speaking of which, something like quick-sort is both naturally elegant and naturally efficient code, and doesn't waste any memory (look it up on Wikipedia - they have a JavaScript implementation on which to base your code). Squeezing the last ounce of performance out of quick-sort is hard (especially in scripting languages, which is why they generally just use the C API to do that part), and you have a worst case of O(n^2). You can try to be clever by doing a combination bubble-sort/quick-sort to mitigate the worst case.
Happy coding.