The task:
Write a function that receives 3 lists and returns a list. The first list, "numbers", contains n integers with values between 0 and 10^9.
The second list, "low", is a low-range list containing the lower end of a range; it contains q integers.
The third list, "high", is a high-range list containing the higher end of a range; it contains q integers.
The function should return a list where, at index i, there is the number of integers in "numbers" that are greater than or equal to low[i] and less than or equal to high[i].
You can only import math; no other imports are allowed.
The "numbers" list is not necessarily sorted.
Examples:
count_range([12,13,14,15,17],[14],[14]) should return [1]
count_range([12,13,14,15,17],[14,15],[14,18]) should return [1,2]
count_range([12,13,14,15,17],[12],[17]) should return [5]
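For reference, a direct O(n*q) baseline pins down the required behaviour (a quick sketch with a hypothetical name; it is far too slow for large inputs, which is the whole problem):

def count_range_naive(numbers, low, high):
    # For each query range, scan all of "numbers" and count the hits.
    return [sum(lo <= v <= hi for v in numbers) for lo, hi in zip(low, high)]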
This is my solution, but it's not efficient enough. I need ways to optimize it or solve it differently, without importing any external packages.
def binarySearch(data, val):
    highIndex = len(data) - 1
    lowIndex = 0
    while highIndex > lowIndex:
        index = math.ceil((highIndex + lowIndex) / 2)
        sub = data[index]
        if sub > val:
            if highIndex == index:
                return sorted([highIndex, lowIndex])
            highIndex = index
        else:
            if lowIndex == index:
                return sorted([highIndex, lowIndex])
            lowIndex = index
    return sorted([highIndex, lowIndex])
def count_range(numbers, low, high):
    numbers.sort()
    result = []
    low_range_dict = {}
    high_range_dict = {}
    for i in range(len(numbers)):
        if numbers[i] not in low_range_dict:
            low_range_dict[numbers[i]] = i
        high_range_dict[numbers[i]] = i
    for i in range(len(low)):
        low_r = low[i]
        high_r = high[i]
        if low_r not in low_range_dict:
            low_range_dict[low_r] = binarySearch(numbers, low_r)[0]
            high_range_dict[low_r] = low_range_dict[low_r]
        low_index = low_range_dict.get(low_r)
        if high_r not in high_range_dict:
            high_range_dict[high_r] = binarySearch(numbers, high_r)[0]
            low_range_dict[high_r] = high_range_dict[high_r]
        high_index = high_range_dict.get(high_r)
        if low_r in numbers or low_r < numbers[0]:
            low_index -= 1
        result.append(high_index - low_index)
    return result
If we could use any module from the standard library, we could write a very simple solution.
from bisect import bisect_left
from functools import lru_cache, partial

def count_range(numbers, lows, highs):
    index = lru_cache()(partial(bisect_left, sorted(numbers)))
    return [index(hi + 1) - index(lo) for (lo, hi) in zip(lows, highs)]
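(Since all the values are integers, index(hi + 1) counts the elements less than hi + 1, i.e. the elements <= hi; it is equivalent to bisect_right(numbers, hi).)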
But we can write our own (simplified) equivalents of partial, lru_cache and bisect_left, so the imports are not needed.
It is less complicated than your original code and should probably run faster, but I don't know how big the difference is.
We'll use a simpler bisect function for the binary search, and we don't need two different memoization dictionaries for the low and high ranges.
# This bisect is based on the reference implementation in the standard library.
# In CPython this is actually implemented in C, which is faster.
def bisect_left(a, x):
    """Return the index where to insert item x in list a, assuming a is sorted."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid
    return lo
def count_range(numbers, lows, highs):
    numbers.sort()
    # Instead of both low_range_dict and high_range_dict
    # we only need a single memoization dictionary.
    # We could also use functools.cache from the standard library.
    memo = {}
    def index(val):
        """Memoized bisect"""
        if val not in memo:
            memo[val] = bisect_left(numbers, val)
        return memo[val]
    return [index(hi + 1) - index(lo) for (lo, hi) in zip(lows, highs)]
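A quick sanity check against the examples from the question:

print(count_range([12,13,14,15,17], [14, 15], [14, 18]))  # [1, 2]
print(count_range([12,13,14,15,17], [12], [17]))          # [5]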
I am trying to count the number of unique numbers in a sorted array using binary search. I need to find the edge where one number changes to the next in order to count. I was thinking of doing this without using recursion. Is there an iterative approach?
def unique(x):
    start = 0
    end = len(x)-1
    count = 0
    # This is the current number we are looking for
    item = x[start]
    while start <= end:
        middle = (start + end)//2
        if item == x[middle]:
            start = middle+1
        elif item < x[middle]:
            end = middle - 1
            # when item is greater, change to the next number
            count += 1
    # if the number
    return count
unique([1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,5,5,5,5,5,5,5,5,5,5])
Thank you.
Edit: Even if the runtime benefit over O(n) is negligible, what is my binary search missing? It's confusing when you're not searching for an actual item. How can I fix this?
Working code exploiting binary search (it returns 3 for the given example).
As discussed in the comments, the complexity is about O(k*log(n)), where k is the number of unique items, so this approach works well when k is small compared with n, and may become worse than a linear scan when k ~ n.
def countuniquebs(A):
    n = len(A)
    t = A[0]
    l = 1
    count = 0
    while l < n - 1:
        r = n - 1
        while l < r:
            m = (r + l) // 2
            if A[m] > t:
                r = m
            else:
                l = m + 1
        count += 1
        if l < n:
            t = A[l]
    return count
print(countuniquebs([1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,5,5,5,5,5,5,5,5,5,5]))
I wouldn't quite call it "using a binary search", but this binary divide-and-conquer algorithm works in O(k*log(n)/log(k)) time, which is better than a repeated binary search, and never worse than a linear scan:
def countUniques(A, start, end):
    length = end - start
    if length < 1:
        return 0
    if A[start] == A[end-1]:
        return 1
    if length < 3:
        return 2
    mid = start + length//2
    # The two halves share the element A[mid], so the run containing it
    # is counted on both sides; subtract 1 to compensate.
    return countUniques(A, start, mid+1) + countUniques(A, mid, end) - 1

A = [1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,3,4,5,5,5,5,5,5,5,5,5,5]
print(countUniques(A,0,len(A)))
First of all, sorry about the naive question, but I couldn't find help elsewhere.
I'm trying to create an Optimal Search Tree using Dynamic Programming in Python that receives two lists (a set of keys and a set of frequencies) and returns two answers:
1 - The smallest path cost.
2 - The generated tree for that smallest cost.
I basically need to create a tree organized by the most accessed items on top (the most accessed item is the root), and return the smallest path cost of that tree, using the Dynamic Programming solution.
I have the following code in Python:
def optimalSearchTree(keys, freq, n):
    # Create an auxiliary 2D matrix to store results of subproblems
    cost = [[0 for x in xrange(n)] for y in xrange(n)]

    # For a single key, cost is equal to frequency of the key
    #for i in xrange (0,n):
    #    cost[i][i] = freq[i]

    # Now we need to consider chains of length 2, 3, ... .
    # L is chain length.
    for L in xrange (2,n):
        for i in xrange(0,n-L+1):
            j = i+L-1
            cost[i][j] = sys.maxint
            for r in xrange (i,j):
                if (r > i):
                    c = cost[i][r-1] + sum(freq, i, j)
                elif (r < j):
                    c = cost[r+1][j] + sum(freq, i, j)
                elif (c < cost[i][j]):
                    cost[i][j] = c
    return cost[0][n-1]

def sum(freq, i, j):
    s = 0
    k = i
    for k in xrange (k,j):
        s += freq[k]
    return s

keys = [10,12,20]
freq = [34,8,50]
n = sys.getsizeof(keys)/sys.getsizeof(keys[0])
print(optimalSearchTree(keys, freq, n))
I'm trying to output answer 1. The smallest cost for that tree should be 142 (the value stored at matrix position [0][n-1], according to the Dynamic Programming solution). But unfortunately it returns 0. I couldn't find any issues in the code. What's going wrong?
You have several very questionable statements in your code, definitely inspired by C/Java programming practices. For instance,
keys = [10,12,20]
freq = [34,8,50]
n=sys.getsizeof(keys)/sys.getsizeof(keys[0])
I think you believe this calculates the number of items in the list. However, n is not 3:
>>> sys.getsizeof(keys)/sys.getsizeof(keys[0])
3.142857142857143
What you need is this:
n = len(keys)
One more issue: elif (r < j) is always True, because r is in the range between i (inclusive) and j (exclusive), so the elif (c < cost[i][j]) condition is never checked. The cost matrix is never updated in the loop; that's why you always end up with a 0.
Another suggestion: do not overwrite the built-in function sum(). Your namesake function calculates the sum of all items in a slice of a list:
sum(freq[i:j])
import sys

def optimalSearchTree(keys, freq):
    # Create auxiliary 2D matrices to store results of subproblems
    n = len(keys)
    cost = [[0 for x in range(n)] for y in range(n)]
    storeRoot = [[0 for i in range(n)] for i in range(n)]

    # For a single key, cost is equal to frequency of the key
    for i in range(0, n):
        cost[i][i] = freq[i]

    # Now we need to consider chains of length 2, 3, ... .
    # L is chain length.
    for L in range(2, n+1):
        for i in range(0, n-L+1):
            j = i + L - 1
            cost[i][j] = sys.maxsize
            for r in range(i, j+1):
                c = (cost[i][r-1] if r > i else 0)
                c += (cost[r+1][j] if r < j else 0)
                c += sum(freq[i:j+1])
                if (c < cost[i][j]):
                    cost[i][j] = c
                    storeRoot[i][j] = r
    return cost[0][n-1], storeRoot

if __name__ == "__main__":
    keys = [10,12,20]
    freq = [34,8,50]
    print(optimalSearchTree(keys, freq))
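With the sample keys and frequencies, this prints 142 as the first element of the returned tuple, which is the expected minimum cost; the second element is the storeRoot table, from which the tree itself can be reconstructed.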
After analyzing the fastest subset sum algorithm, which runs in 2^(n/2) time, I noticed a slight optimization that can be done. I'm not sure if it really counts as an optimization, and if it does, I'm wondering if it can be improved with recursion.
Basically from the original algorithm: http://en.wikipedia.org/wiki/Subset_sum_problem (see part with title Exponential time algorithm)
it takes the list and splits it into two halves
then it generates the sorted power sets of both in 2^(n/2) time
then it does a linear search over both lists to see if one value from each list sums to x, using a clever trick
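The clever trick is just the standard two-pointer scan over the two sorted sum lists; a minimal standalone sketch of it (hypothetical name, not part of my code below):

def pair_sum_exists(a, b, x):
    # a and b are the two sorted power-set sum lists.
    i, j = 0, len(b) - 1
    while i < len(a) and j >= 0:
        s = a[i] + b[j]
        if s < x:
            i += 1   # need a bigger sum
        elif s > x:
            j -= 1   # need a smaller sum
        else:
            return True
    return False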
In my version, with the optimization:
it takes the list and removes the last element, last
then it splits the remaining list in two
then it generates the sorted power sets of both in 2^((n-1)/2) time
then it does a linear search over both lists to see if one value from each list sums to x or x - last (at the same time, with the same running time), using a clever trick
If it finds either, then I know it worked. I tried using Python's time functions to test with lists of size 22, and my version apparently comes out about twice as fast.
After running the below code, it shows
0.050999879837 <- the original algorithm
0.0250000953674 <- my algorithm
My logic for the recursion part is: well, if it works for a size-n list in 2^((n-1)/2) time, can we not repeat this again and again?
Does any of this make sense, or am I totally wrong?
Thanks
I created this Python code:
from math import log, ceil, floor
import helper # my own code
from random import randint, uniform
import time

# gets a list of unique random floats
# s = how many random numbers
# l = smallest float can be
# h = biggest float can be
def getRandomList(s, l, h):
    lst = []
    while len(lst) != s:
        r = uniform(l,h)
        if not r in lst:
            lst.append(r)
    return lst

# This just generates the two sorted power-set lists that the 2^(n/2) algorithm makes.
# This is just a lazy way of doing it; this running time is way worse, but since
# this can be done in 2^(n/2) time, I just pretend it's that running time.
def getSortedPowerSets(lst):
    n = len(lst)
    l1 = lst[:n/2]
    l2 = lst[n/2:]
    xs = range(2**(n/2))
    ys1 = helper.getNums(l1, xs)
    ys2 = helper.getNums(l2, xs)
    return ys1, ys2

# this just checks using the regular 2^(n/2) algorithm to see if two values
# sum to the specified value
def checkListRegular(lst, x):
    lst1, lst2 = getSortedPowerSets(lst)
    left = 0
    right = len(lst2)-1
    while left < len(lst1) and right >= 0:
        sum = lst1[left] + lst2[right]
        if sum < x:
            left += 1
        elif sum > x:
            right -= 1
        else:
            return True
    return False

# this is my improved version of the above
def checkListSmaller(lst, x):
    last = lst.pop()
    x1, x2 = x, x - last
    return checkhelper(lst, x1, x2)

# this is the same as the function 'checkListRegular', but it checks 2 values
# at the same time
def checkhelper(lst, x1, x2):
    lst1, lst2 = getSortedPowerSets(lst)
    left = [0,0]
    right = [len(lst2)-1, len(lst2)-1]
    while 1:
        check = 0
        if left[0] < len(lst1) and right[0] >= 0:
            check += 1
            sum = lst1[left[0]] + lst2[right[0]]
            if sum < x1:
                left[0] += 1
            elif sum > x1:
                right[0] -= 1
            else:
                return True
        if left[1] < len(lst1) and right[1] >= 0:
            check += 1
            sum = lst1[left[1]] + lst2[right[1]]
            if sum < x2:
                left[1] += 1
            elif sum > x2:
                right[1] -= 1
            else:
                return True
        if check == 0:
            return False

n = 22
lst = getRandomList(n, 1, 3000)

startTime = time.time()
print checkListRegular(lst, -50) # -50 so it does worst case scenario
startTime2 = time.time()
print checkListSmaller(lst, -50) # -50 so it does worst case scenario
startTime3 = time.time()

print (startTime2 - startTime)
print (startTime3 - startTime2)
This is the helper library which I just use to generate the powerset list.
def dec_to_bin(x):
    return int(bin(x)[2:])

def getNums(lst, xs):
    sums = []
    n = len(lst)
    for i in xs:
        bin = str(dec_to_bin(i))
        bin = (n-len(bin))*"0" + bin
        chosen_items = getList(bin, lst)
        sums.append(sum(chosen_items))
    sums.sort()
    return sums

def getList(binary, lst):
    s = []
    for i in range(len(binary)):
        if binary[i]=="1":
            s.append(float(lst[i]))
    return s
then it generates the sorted power sets of both in 2^((n-1)/2) time
OK, since now the list has one less element. However, this is not a big deal; it's just a constant-factor improvement of 2^(1/2)...
then it does a linear search over both lists to see if one value from each list sums to x or x - last (at the same time, with the same running time), using a clever trick
... and this improvement will go away, because now you do twice as many operations to check for both the x and x - last sums instead of only for x.
can we not repeat this again and again?
No, you can't, for the same reason you couldn't keep splitting the original algorithm again and again. The trick only works once, because as soon as you start looking for values in more than two lists, you can't use the sorting trick anymore.
I have to find the largest and second largest numbers in a list with a divide and conquer algorithm. Everything is right except the part where I use indices a and b, which I use because it is faster and cheaper than slicing. Please don't rewrite the code or send other code and approaches; just help me fix this one if you can. Any help or ideas are welcome. Thanks.
#!/usr/local/bin/python2.7
def two_max(arr,a,b):
    n = len(arr)
    if n == 2:
        if arr[0] < arr[1]:
            return (arr[1], arr[0])
        else:
            return (arr[0], arr[1])
    (greatest_left, sec_greatest_left) = two_max(arr,a (a+b)/2)
    (greatest_right, sec_greatest_right) = two_max(arr,(a+b)/2,b)
    if greatest_left < greatest_right:
        greatest = greatest_right
        if greatest_left < sec_greatest_left:
            return (greatest, sec_greatest_left)
        else:
            return (greatest, greatest_left)
    else:
        greatest = greatest_left
        if greatest_right < sec_greatest_right: # Line 4
            return (greatest, sec_greatest_right)
        else:
            return (greatest, greatest_right)
The biggest problem is that you never get any closer to your recursive base case.
The base case is len(arr) == 2. But every time you call yourself, you just pass arr as-is:
(greatest_left, sec_greatest_left) = two_max(arr,a,(a+b)/2)
(greatest_right, sec_greatest_right) = two_max(arr,(a+b)/2,b)
(Note that I'm guessing on the comma in the first one, because as you posted it, you're actually calling the number a as a function, which is unlikely to do anything useful…)
So, either your base case should take a and b into account, like this:
if b-a == 2:
    if arr[a] < arr[a+1]:
        return (arr[a+1], arr[a])
    else:
        return (arr[a], arr[a+1])
… or you should send a slice of arr instead of the whole thing—in which case you don't need a and b in the first place:
(greatest_left, sec_greatest_left) = two_max(arr[:len(arr)/2])
(greatest_right, sec_greatest_right) = two_max(arr[len(arr)/2:])
Either one will fix your first problem. Of course the function still doesn't work for most inputs. In fact, it only works if the length of the list is a power of two.
If that isn't a good enough hint for how to fix it: What happens if b-a is 3? Obviously you can't split it into two halves, both of which are of size 2 or greater. So, you'll need to write another base case for b-a == 1, and return something that will make the rest of the algorithm work.
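For illustration only, here is a minimal sketch of one way both base cases could fit together, with the merge step simplified via max (not the only way to do it):

def two_max(arr, a, b):
    if b - a == 1:
        # Single element: use -infinity as a dummy "second largest"
        # so the comparisons in the merge step still work.
        return (arr[a], float('-inf'))
    if b - a == 2:
        if arr[a] < arr[a+1]:
            return (arr[a+1], arr[a])
        return (arr[a], arr[a+1])
    mid = (a + b) // 2
    greatest_left, sec_left = two_max(arr, a, mid)
    greatest_right, sec_right = two_max(arr, mid, b)
    # The overall second largest is either the loser of the top-level
    # comparison or the runner-up from the winning half.
    if greatest_left < greatest_right:
        return (greatest_right, max(greatest_left, sec_right))
    return (greatest_left, max(greatest_right, sec_left))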
Why don't you do it this way:
>>> def getIlargest(arr, i):
...     if (i <= len(arr) and i > 0):
...         return sorted(arr)[-i]
...
>>> a = [1,3,51,4,6,23,53,2,532,5,2,6,7,5,4]
>>> getIlargest(a, 2)
53
I took it one step further and tested 3 methods:
Using counting sort - getIlargestVer2
Using the Python sorted function - getIlargestVer1
Using a heap - heapIlargest, as @abarnert suggested.
The results:
For arrays with sizes from 1 to ~5000, sorted is the best; for larger arrays, heapq.nlargest is the winner:
[Plot: arrays with sizes in [1*150, 55*150]]
[Plot: full scan over arrays with sizes in [1*150, 300*150]]
The code I used is the following; the implementations of the 3 methods are in the setup string:
setup = """
import heapq, random
a = random.sample(xrange(1<<30), 150)
a = a * factor
class ILargestFunctions:
# taken from [wiki][3] and was rewriting it.
def counting_sort(self, array, maxval):
m = maxval + 1
count = {}
for a in array:
if count.get(a, None) is None:
count[a] = 1
else:
count[a] += 1
i = 0
for key in count.keys():
for c in range(count[key]):
array[i] = key
i += 1
return array
def getIlargestVer1(self, arr, i):
if (i <= len(arr) and i > 0):
return sorted(arr)[-i]
def getIlargestVer2(self, arr, i):
if (i <= len(arr) and i > 0):
return self.counting_sort(arr, max(arr))[-i]
def heapIlargest(self, arr, i):
if (i <= len(arr) and i > 0):
return heapq.nlargest(i,arr)
n = ILargestFunctions()
"""
And the main section, which runs the timing and plots the collected data:
import timeit
import numpy as np
import matplotlib.pyplot as plt

if __name__ == "__main__":
    results = {}
    r1 = []; r2 = []; r3 = []
    x = np.arange(1,300,1)
    for i in xrange(1,300,1):
        print i
        factorStr = "factor = " + str(i) + ";"
        newSetupStr = factorStr + setup
        r1.append(timeit.timeit('n.getIlargestVer1(a, 100)', number=200, setup=newSetupStr))
        r2.append(timeit.timeit('n.getIlargestVer2(a, 100)', number=200, setup=newSetupStr))
        r3.append(timeit.timeit('n.heapIlargest(a, 100)', number=200, setup=newSetupStr))
        results[i] = (r1,r2,r3)

    p1 = plt.plot(x, r1, 'r', label = "getIlargestVer1")
    p2 = plt.plot(x, r2, 'b', label = "getIlargestVer2")
    p3 = plt.plot(x, r3, 'g', label = "heapIlargest")
    plt.legend(bbox_to_anchor=(1.05, 1), loc=1, borderaxespad=0.)
    plt.show()
@0x90 has the right idea, but got it reversed.
def find_i_largest_element(seq, i):
    if (i <= len(seq) and i > 0):
        s = sorted(seq, reverse=True)
        return s[i-1]
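A quick check on the same sample list as above:

a = [1,3,51,4,6,23,53,2,532,5,2,6,7,5,4]
print(find_i_largest_element(a, 2))  # 53 (532 is the largest)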
By the way, is this a homework assignment? If so, what's the whole idea behind the algorithm you have to use?
I currently have a long list which is being sorted using a lambda function f. I then choose a random element from the first four elements. Something like:
f = lambda x: some_function_of(x, local_variable)
my_list.sort(key=f)
foo = choice(my_list[:4])
This is a bottleneck in my program, according to the profiler. How can I speed things up? Is there a fast, built-in way to retrieve the elements I want? (In theory, the whole list shouldn't need to be sorted.) Thanks.
Use heapq.nlargest or heapq.nsmallest.
For example:
import heapq
elements = heapq.nsmallest(4, my_list, key=f)
foo = choice(elements)
This will take O(N + K log N) time (where K is the number of elements returned and N is the list size), which is faster than the O(N log N) of a normal sort when K is small relative to N.
It's actually possible in linear time (O(N)) on average.
You need a partition algorithm:
def partition(seq, pred, start=0, end=-1):
    if end == -1: end = len(seq)
    while True:
        while True:
            if start == end: return start
            if not pred(seq[start]): break
            start += 1
        while True:
            if pred(seq[end-1]): break
            end -= 1
            if start == end: return start
        seq[start], seq[end-1] = seq[end-1], seq[start]
        start += 1
        end -= 1
which can be used by an nth_element algorithm:
def nth_element(seq_in, n, key=lambda x: x):
    start, end = 0, len(seq_in)
    seq = [(x, key(x)) for x in seq_in]
    def partition_pred(x): return x[1] < seq[end-1][1]
    while start != end:
        pivot = (end + start) // 2
        seq[pivot], seq[end-1] = seq[end-1], seq[pivot]
        pivot = partition(seq, partition_pred, start, end)
        seq[pivot], seq[end-1] = seq[end-1], seq[pivot]
        if pivot == n: break
        if pivot < n: start = pivot + 1
        else: end = pivot
    seq_in[:] = (x for x, k in seq)
Given these, just replace your second (sort) line with:
nth_element(my_list, 4, key=f)
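Used in place of the original sort, it would look something like this (a sketch reusing your my_list and f):

from random import choice

nth_element(my_list, 4, key=f)   # the 4 smallest by key f are now in front
foo = choice(my_list[:4])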