I'm having trouble finding the following functionality in Python:
Given a set of numbers, return the largest number less than or equal to n or return None if no such number exists.
For example, given the list [1, 3, 7, 10] and n = 9, the function would return 7.
I'm looking for Python functionality similar to Java's TreeSet.floor (TreeSet.lower is the strictly-less-than variant).
I can use another data structure. A heap seems appropriate.
The O(n) solution is too slow for the scale of the problem. I'm looking for an O(log n) solution.
Background
I'm working on https://www.hackerrank.com/challenges/maximise-sum. The possible values range from 1 to 10^14, so using a sorted list with binary search is too slow.
My current thought is to iterate on Python's heapq backing array directly. I was hoping there might be something more Pythonic.
I think you can use the bintrees library for this: https://bitbucket.org/mozman/bintrees/src
Example:
import bintrees
tree = bintrees.RBTree()
tree.insert(5, 1)
tree.insert(6, 1)
tree.insert(10, 1)
tree.ceiling_item(5)  # -> (5, 1), smallest key >= 5
tree.floor_item(9)    # -> (6, 1), largest key <= 9
The complexity of each of these lookups is O(log N).
# min() returns the (distance, value) pair, so take element [1] to get the value.
nextLowest = lambda seq, x: min([(x - i, i) for i in seq if x >= i] or [(0, None)])[1]
Usage:
t = [10, 20, 50, 200, 100, 300, 250, 150]
print(nextLowest(t, 55))
> 50
I took the above solution from a similar question.
If you can't make any assumptions about the ordering of the array, then I think the best you can do is O(n):
def largest_less_than(numlist, n):
    answer = min(numlist, key=lambda x: n-x if n>=x else float('inf'))
    if answer > n:
        answer = None
    return answer
If the question is about repeatedly getting the largest-less-than for different n values on the same dataset, then maybe one solution is using bucket sort to get your list sorted in O(n), and then use bisect repeatedly.
If the list is already sorted, a simple linear scan works. Note that this is O(n), not O(log n):
numbers = [1, 3, 7, 10]
n = 9
largest_number = None
for number in numbers:
    if number <= n:
        largest_number = number
    else:
        break
# Compare against None so that a result of 0 still counts as found.
if largest_number is not None:
    print('value found ' + str(largest_number))
else:
    print('value not found')
If you don't have to support dynamic additions and removals from the list, then just sort it and use binary search to find the largest value <= n in O(log N) time.
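A minimal sketch of the sort-then-binary-search idea, using the standard library's bisect module. The function name floor_value is made up for illustration, and the list is assumed to be sorted once up front and not modified between queries.

```python
from bisect import bisect_right

def floor_value(sorted_nums, n):
    """Return the largest element <= n, or None if there is none."""
    # bisect_right gives the index of the first element strictly greater than n.
    pos = bisect_right(sorted_nums, n)
    return sorted_nums[pos - 1] if pos else None

nums = sorted([10, 1, 7, 3])
print(floor_value(nums, 9))  # 7
print(floor_value(nums, 0))  # None
```

Each query is O(log N), but inserting into a plain list is still O(N), which is why a balanced tree is suggested elsewhere in this thread for the dynamic case.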
ig-melnyk's answer is probably the right way to complete this question. But since HackerRank doesn't have a way to use libraries, here's an implementation of a Left-Leaning Red Black Tree that I used for the problem.
class LLRB(object):
    class Node(object):
        RED = True
        BLACK = False
        __slots__ = ['value', 'left', 'right', 'color']
        def __init__(self, value):
            self.value = value
            self.left = None
            self.right = None
            self.color = LLRB.Node.RED
        def flip_colors(self):
            self.color = not self.color
            self.left.color = not self.left.color
            self.right.color = not self.right.color
    def __init__(self):
        self.root = None
    def search_higher(self, value):
        """Return the smallest item greater than or equal to value. If no such value
        can be found, return 0.
        """
        x = self.root
        best = None
        while x is not None:
            if x.value == value:
                return value
            elif x.value < value:
                x = x.left
            else:
                best = x.value if best is None else min(best, x.value)
                x = x.right
        return 0 if best is None else best
    @staticmethod
    def is_red(node):
        if node is None:
            return False
        else:
            return node.color == LLRB.Node.RED
    def insert(self, value):
        self.root = LLRB.insert_at(self.root, value)
        self.root.color = LLRB.Node.BLACK
    @staticmethod
    def insert_at(node, value):
        if node is None:
            return LLRB.Node(value)
        if LLRB.is_red(node.left) and LLRB.is_red(node.right):
            node.flip_colors()
        if node.value == value:
            node.value = value
        elif node.value < value:
            node.left = LLRB.insert_at(node.left, value)
        else:
            node.right = LLRB.insert_at(node.right, value)
        if LLRB.is_red(node.right) and not LLRB.is_red(node.left):
            node = LLRB.rotate_left(node)
        if LLRB.is_red(node.left) and LLRB.is_red(node.left.left):
            node = LLRB.rotate_right(node)
        return node
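The listing above calls LLRB.rotate_left and LLRB.rotate_right but does not show them. As a sketch only, here are the standard left-leaning red-black rotations written as standalone functions on a simplified stand-in Node class (both are assumptions for illustration, not the poster's code). In the class above they would presumably be static methods.

```python
RED, BLACK = True, False

class Node(object):
    """Simplified stand-in for LLRB.Node (hypothetical, for illustration)."""
    def __init__(self, value, color=RED):
        self.value = value
        self.left = None
        self.right = None
        self.color = color

def rotate_left(h):
    """Make h's right child the new subtree root; h becomes its red left child."""
    x = h.right
    h.right = x.left
    x.left = h
    x.color = h.color
    h.color = RED
    return x

def rotate_right(h):
    """Make h's left child the new subtree root; h becomes its red right child."""
    x = h.left
    h.left = x.right
    x.right = h
    x.color = h.color
    h.color = RED
    return x
```

Rotations only rearrange links, so they work the same way even though this tree keeps larger values in the left subtree.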
You can decrease the number you're looking for until it is found.
This function finds the position of the largest number <= n in fs, a sorted list of integers.
If no number is smaller than or equal to n, it returns -1.
def findmaxpos(n):
    if n < fs[0]: return -1
    while True:
        if n in fs: return fs.index(n)
        n -= 1
Related
The task:
Write a function that receives three lists and returns a list. The first list, "numbers", contains n integers with values between 0 and 10^9.
The second list, "low", contains q integers, each the lower end of a range.
The third list, "high", contains q integers, each the upper end of a range.
The function should return a list with one count per range.
In the returned list, at index i, there should be the number of integers in "numbers" that are greater than or equal to low[i] and less than or equal to high[i].
You can only import math; no other imports are allowed.
The list may not be sorted.
Examples:
count_range([12,13,14,15,17],[14],[14]) should return [1]
count_range([12,13,14,15,17],[14,15],[14,18]) should return [1,2]
count_range([12,13,14,15,17],[12],[17]) should return [5]
This is my solution, but it's not efficient enough. I need ways to optimize it or solve it differently without importing any external packages.
import math
def binarySearch(data, val):
    highIndex = len(data) - 1
    lowIndex = 0
    while highIndex > lowIndex:
        index = math.ceil((highIndex + lowIndex) / 2)
        sub = data[index]
        if sub > val:
            if highIndex == index:
                return sorted([highIndex, lowIndex])
            highIndex = index
        else:
            if lowIndex == index:
                return sorted([highIndex, lowIndex])
            lowIndex = index
    return sorted([highIndex, lowIndex])
def count_range(numbers, low, high):
    numbers.sort()
    result = []
    low_range_dict = {}
    high_range_dict = {}
    for i in range(len(numbers)):
        if numbers[i] not in low_range_dict:
            low_range_dict[numbers[i]] = i
        high_range_dict[numbers[i]] = i
    for i in range(len(low)):
        low_r = low[i]
        high_r = high[i]
        if low_r not in low_range_dict:
            low_range_dict[low_r] = binarySearch(numbers, low_r)[0]
            high_range_dict[low_r] = low_range_dict[low_r]
        low_index = low_range_dict.get(low_r)
        if high_r not in high_range_dict:
            high_range_dict[high_r] = binarySearch(numbers, high_r)[0]
            low_range_dict[high_r] = high_range_dict[high_r]
        high_index = high_range_dict.get(high_r)
        if low_r in numbers or low_r < numbers[0]:
            low_index -= 1
        result.append(high_index - low_index)
    return result
If we could use any module from the standard library, we could write a very simple solution.
from bisect import bisect_left
from functools import lru_cache, partial
def count_range(numbers, lows, highs):
    index = lru_cache()(partial(bisect_left, sorted(numbers)))
    return [index(hi + 1) - index(lo) for (lo, hi) in zip(lows, highs)]
But we can write our own (simplified) equivalent of partial, lru_cache and bisect_left, so the imports are not needed.
It is less complicated than your original code, and should probably run faster, but I don't know how big the difference is.
We'll use a simpler bisect function for the binary search. And we don't need two different memoization dictionaries for high and low range.
# This bisect is based on the reference implementation in the standard library.
# In CPython this is actually implemented in C, and is faster.
def bisect_left(a, x):
    """Return the index where to insert item x in list a, assuming a is sorted."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid
    return lo
def count_range(numbers, lows, highs):
    numbers.sort()
    # Instead of both low_range_dict and high_range_dict,
    # we only need a single memoization dictionary.
    # We could also use @functools.cache from the standard library.
    memo = {}
    def index(val):
        """Memoized bisect"""
        if val not in memo:
            memo[val] = bisect_left(numbers, val)
        return memo[val]
    return [index(hi + 1) - index(lo) for (lo, hi) in zip(lows, highs)]
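The index(hi + 1) trick above relies on the values being integers. As a hedged variant, the upper bound can be found with a hand-written bisect_right instead, which also works for non-integer values. The name count_in_ranges is made up to avoid clashing with the functions above, and memoization is left out for brevity.

```python
def bisect_left(a, x):
    """Index of the first element >= x in sorted list a."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid
    return lo

def bisect_right(a, x):
    """Index of the first element > x in sorted list a."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if x < a[mid]:
            hi = mid
        else:
            lo = mid + 1
    return lo

def count_in_ranges(numbers, lows, highs):
    """Count, for each (lo, hi) pair, the values v with lo <= v <= hi."""
    ordered = sorted(numbers)  # sorted copy, so the caller's list is untouched
    return [bisect_right(ordered, hi) - bisect_left(ordered, lo)
            for lo, hi in zip(lows, highs)]

print(count_in_ranges([12, 13, 14, 15, 17], [14, 15], [14, 18]))  # [1, 2]
```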
I am working on a Python algorithm to find the most frequent element in a list.
def GetFrequency(a, element):
    return sum([1 for x in a if x == element])
def GetMajorityElement(a):
    n = len(a)
    if n == 1:
        return a[0]
    k = n // 2
    elemlsub = GetMajorityElement(a[:k])
    elemrsub = GetMajorityElement(a[k:])
    if elemlsub == elemrsub:
        return elemlsub
    lcount = GetFrequency(a, elemlsub)
    rcount = GetFrequency(a, elemrsub)
    if lcount > k:
        return elemlsub
    elif rcount > k:
        return elemrsub
    else:
        return None
I tried some test cases. Some of them pass, but some of them fail.
For example, [1,2,1,3,4] should return 1, but I get None.
The implementation follows the pseudocode here:
http://users.eecs.northwestern.edu/~dda902/336/hw4-sol.pdf
The pseudocode finds the majority item, which has to make up at least half of the list. I only want to find the most frequent item.
Can I get some help?
Thanks!
I wrote an iterative version instead of the recursive one you're using in case you wanted something similar.
def GetFrequency(array):
    majority = int(len(array)/2)
    result_dict = {}
    while array:
        array_item = array.pop()
        if result_dict.get(array_item):
            result_dict[array_item] += 1
        else:
            result_dict[array_item] = 1
        if result_dict[array_item] > majority:
            return array_item
    return max(result_dict, key=result_dict.get)
This will iterate through the array and return the value as soon as one hits more than 50% of the total (being a majority). Otherwise it goes through the entire array and returns the value with the greatest frequency.
def majority_element(a):
    return max([(a.count(elem), elem) for elem in set(a)])[1]
EDIT
If there is a tie, the biggest value is returned. E.g: a = [1,1,2,2] returns 2. Might not be what you want but that could be changed.
EDIT 2
The pseudocode you gave divides the array into 1 to k inclusive and k + 1 to n. Your code does 1 to k - 1 and k to the end, though I'm not sure it changes much. If you want to respect the algorithm you gave, you should do:
elemlsub = GetMajorityElement(a[:k+1]) # this slice is indices 0 to k
elemrsub = GetMajorityElement(a[k+1:]) # this one is k + 1 to n.
Also, still according to your pseudocode, lcount and rcount should be compared to k + 1, not k:
if lcount > k + 1:
    return elemlsub
elif rcount > k + 1:
    return elemrsub
else:
    return None
EDIT 3
Some people in the comments highlighted that the provided pseudocode solves not for the most frequent item, but for the item that makes up more than 50% of the occurrences. So your output for your example is indeed correct. There is a good chance that your code already works as is.
EDIT 4
If you want to return None when there is a tie, I suggest this:
def majority_element(a):
    n = len(a)
    if n == 1:
        return a[0]
    if n == 0:
        return None
    sorted_counts = sorted([(a.count(elem), elem) for elem in set(a)], key=lambda x: x[0])
    if len(sorted_counts) > 1 and sorted_counts[-1][0] == sorted_counts[-2][0]:
        return None
    return sorted_counts[-1][1]
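The versions above call a.count once per distinct element, which can be quadratic when most elements are distinct. A possible alternative, sketched here with the standard library's collections.Counter, counts everything in one pass. The function name most_frequent and the choice to return None on a tie are assumptions carried over from EDIT 4, not part of the original question.

```python
from collections import Counter

def most_frequent(items):
    """Return the most frequent item, or None for an empty list or a tie for first place."""
    top = Counter(items).most_common(2)  # up to two (item, count) pairs, highest count first
    if not top:
        return None
    if len(top) == 2 and top[0][1] == top[1][1]:
        return None
    return top[0][0]

print(most_frequent([1, 2, 1, 3, 4]))  # 1
print(most_frequent([1, 1, 2, 2]))     # None
```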
The following is a heap class. I am trying to sort the heap, but I have a problem with my max_heapify function. I inserted the values [10, 9, 7, 6, 5, 4, 3], and my heap sort prints the output shown below. The actual and expected outputs are given below the class.
class of heap
class Heap(object):
    def __init__(self):
        self.A = []
    def insert(self, x):
        self.A.append(x)
    def Max(self):
        """
        returns the largest value in an array
        """
        return max(self.A)
    def extractMax(self):
        """
        returns and remove the largest value from an array
        """
        x = max(self.A)
        self.A.remove(x)
        self.max_heapify(0)
        return x;
    def parent(self, i):
        """
        returns the parent index
        """
        i+=1
        i = int(i/2)
        return i
    def left(self, i):
        """
        returns the index of left child
        """
        i = i+1
        i = 2*i
        return i
    def right(self, i):
        """
        returns the index of right child
        """
        i+=1;
        i = 2*i + 1
        return i
    def heap_size(self):
        """
        returns the size of heap
        """
        return len(self.A)
    def max_heapify(self, i):
        """
        heapify the array
        """
        l = self.left(i)
        r = self.right(i)
        if(l < self.heap_size() and self.A[l] > self.A[i]):
            largest = l
        else:
            largest = i
        if(r < self.heap_size() and self.A[r] > self.A[largest]):
            largest = r
        if largest != i:
            temp = self.A[i]
            self.A[i] = self.A[largest]
            self.A[largest] = temp
            self.max_heapify(largest)
    def build_max_heap(self):
        n = len(self.A)
        n = int(n/2)
        for i in range(n, -1, -1):
            self.max_heapify(i)
    def heap_sort(self):
        """
        sorts the heap
        """
        while self.heap_size() > 0:
            self.build_max_heap()
            temp = self.A[0]
            n = len(self.A) - 1
            self.A[0] = self.A[n]
            self.A[n] = temp
            x = self.A.pop()
            print(x)
            self.max_heapify(0)
h = Heap()
h.insert(10)
h.insert(9)
h.insert(7)
h.insert(6)
h.insert(5)
h.insert(4)
h.insert(3)
h.heap_sort()
given output
10
7
6
5
4
3
9
expected output
10
9
7
6
5
4
3
It looks like you're trying to build a max-heap with the root at A[0]. If that's correct, then your left, right, and parent index calculations are not correct. You have:
def parent(self, i):
    """
    returns the parent index
    """
    i+=1
    i = int(i/2)
    return i
def left(self, i):
    """
    returns the index of left child
    """
    i = i+1
    i = 2*i
    return i
def right(self, i):
    """
    returns the index of right child
    """
    i+=1;
    i = 2*i + 1
    return i
So if i=0, the left child would be 2, and the right child would be 3. Worse, given i=3, parent will return 2. So you have the case where parent(right(i)) != i. That's never going to work.
The correct calculations are:
left = (2*i)+1
right = (2*i)+2
parent = (i-1)//2  (integer division)
I don't know why your extractMax is calling max(self.A). You already know that the maximum element is at A[0]. To extract the maximum item, all you need to do is:
returnValue = save value at self.A[0]
take last item in the array and place at self.A[0]
decrease length of array
maxHeapify(0)
I've used pseudo-code because I'm not particularly comfortable with Python.
The loop inside your heap_sort method is far from optimal. You're calling self.build_max_heap at each iteration. You don't need to do that. If extractMax is working correctly, all you have to do is:
while self.heap_size() > 0:
    temp = self.extractMax()
    print(temp)
Now, if you want to sort the array in-place, so that self.A itself is sorted, that's a bit more tricky. But it doesn't look like that's what you're trying to do.
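Since the steps above are given in pseudo-code, here is one way they might look in Python. This is a sketch under the assumption of a 0-based array with the root at A[0]. The class name MaxHeap and its method names are made up for illustration and are not the asker's class.

```python
class MaxHeap(object):
    def __init__(self, items=()):
        self.A = list(items)
        # Build the heap bottom-up, starting at the last node that has a child.
        for i in range(len(self.A) // 2 - 1, -1, -1):
            self.max_heapify(i)

    def max_heapify(self, i):
        """Sift the value at index i down until both children are no larger."""
        n = len(self.A)
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            largest = i
            if left < n and self.A[left] > self.A[largest]:
                largest = left
            if right < n and self.A[right] > self.A[largest]:
                largest = right
            if largest == i:
                return
            self.A[i], self.A[largest] = self.A[largest], self.A[i]
            i = largest

    def extract_max(self):
        """Remove and return the root; raises IndexError on an empty heap."""
        top = self.A[0]
        last = self.A.pop()      # shrink the array by one
        if self.A:
            self.A[0] = last     # move the last item to the root
            self.max_heapify(0)  # and restore the heap property
        return top

    def heap_sort(self):
        """Drain the heap, returning its values in descending order."""
        out = []
        while self.A:
            out.append(self.extract_max())
        return out

print(MaxHeap([10, 9, 7, 6, 5, 4, 3]).heap_sort())  # [10, 9, 7, 6, 5, 4, 3]
```

The heap is built once in the constructor, so heap_sort only needs repeated extract_max calls rather than a rebuild on every iteration.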
I am trying to find the largest numeric palindrome that is a product of two numbers in a given range. First, I've written a simple function to check if the number is indeed a palindrome:
import math
def check_palindrome(n):
    if n == 0:  # 0 is a palindrome
        return True
    else:
        check = [0]
        i = 1  # Find the power of the number
        while (n/i) >= 10:
            i = i*10
        m = n
        j = 0
        while i >= 1:  # Add each digit to a list
            j = math.floor(m/i)
            check.append(j)
            m = m-(j*i)
            i = i/10
        length = len(check)
        if length == 1:  # One digit number is always a palindrome
            return True
        else:
            i = 1  # Check if list is a palindrome
            while i <= length/2:
                if check[i] != check[-i]:
                    return False
                i += 1
            return True
Next, I've implemented a priority queue and declared a class that contains a factor, the current number in the given range that is being multiplied by that factor, and functions for returning the value of this product and moving to the next lowest number.
import heapq
class PriorityQueue:
    def __init__(self):
        self.items = []
    def push(self, priority, x):
        heapq.heappush(self.items, (-priority, x))
    def pop(self):
        _, x = heapq.heappop(self.items)
        return x
    def empty(self):
        return not self.items
class Products:
    def __init__(self, factor, current):
        self.f = factor
        self.c = current
    def value(self):
        return self.f*self.c
    def move(self):
        self.c -= 1
Finally, the main function fills the priority queue with Products objects, one for each factor between min and max, each initially set to multiply by itself and prioritized by the magnitude of its product. It then pops the object with the highest product and checks whether that product is a palindrome. If it is not, it requeues the object with the multiplier set one lower, and continues with the next highest product until it finds a palindrome.
def max_palindrome(maximum, minimum):
    q = PriorityQueue()
    i = maximum
    while i >= minimum:
        p = Products(i, i)
        q.push(p.value(), p)
        i -= 1
    check = False
    while not check:
        p = q.pop()
        check = check_palindrome(p.value())
        if not check and p.c > minimum:
            p.move()
            q.push(p.value(), p)
    return p.value()
My problem is that max_palindrome() returns the correct answer up to (329,0), but range with a maximum of 330 or greater returns the error:
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    max_palindrome(330,0)
  File "C:\Users\Alec Collins\Documents\Euler\problem 4.py", line 67, in max_palindrome
    p = q.pop()
  File "C:\Users\Alec Collins\Documents\Euler\problem 4.py", line 52, in pop
    _, x = heapq.heappop(self.items)
TypeError: unorderable types: Products() < Products()
Clearly something is up with the priority queue so the pop is not working but I have no idea what. Any ideas?
When a heap has two items with the same priority*, it compares their values using the "less than" operator to determine their comparative order. But your Products class doesn't implement that operator.
Try implementing __lt__ for your class.
class Products:
    def __init__(self, factor, current):
        self.f = factor
        self.c = current
    def value(self):
        return self.f*self.c
    def move(self):
        self.c -= 1
    def __lt__(self, other):
        return self.value() < other.value()
(*this is something of a simplification; the heap algorithm does not have a native conception of "priority". But the outcome is the same; when you store two-element tuples in a heap, the second elements of each tuple are only compared when the first elements are equal, because that's how tuple comparison always works in Python.)
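An alternative to defining __lt__, sketched here as an option rather than the fix, is to put a unique tie-breaker between the priority and the payload so that the payloads are never compared. This follows the pattern described in the heapq documentation. The counter attribute is an addition for illustration and was not in the asker's PriorityQueue.

```python
import heapq
import itertools

class PriorityQueue(object):
    def __init__(self):
        self.items = []
        self.counter = itertools.count()  # hypothetical tie-breaker: insertion order

    def push(self, priority, x):
        # Tuples compare element by element; the unique count settles ties
        # before Python ever needs to compare the payloads x.
        heapq.heappush(self.items, (-priority, next(self.counter), x))

    def pop(self):
        _, _, x = heapq.heappop(self.items)
        return x

    def empty(self):
        return not self.items
```

With this layout, items of equal priority come out in insertion order, and the payload class needs no comparison methods at all.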
I have to find the largest and second largest numbers in a list using a divide-and-conquer algorithm. Everything is right except the part where I use indices like a and b, which I want to keep because it runs faster and costs less. Please don't rewrite the code or post other code and approaches; just help me fix it if you can. Any help or ideas are welcome. Thanks.
#!/usr/local/bin/python2.7
def two_max(arr,a,b):
    n = len(arr)
    if n==2:
        if arr[0]<arr[1]:
            return (arr[1], arr[0])
        else:
            return (arr[0], arr[1])
    (greatest_left, sec_greatest_left) = two_max(arr,a (a+b)/2)
    (greatest_right, sec_greatest_right) = two_max(arr,(a+b)/2,b)
    if greatest_left < greatest_right:
        greatest = greatest_right
        if greatest_left < sec_greatest_left:
            return (greatest, sec_greatest_left)
        else:
            return (greatest, greatest_left)
    else:
        greatest = greatest_left
        if greatest_right < sec_greatest_right: # Line 4
            return (greatest, sec_greatest_right)
        else:
            return (greatest, greatest_right)
The biggest problem is that you never get any closer to your recursive base case.
The base case is len(arr) == 2. But every time you call yourself, you just pass arr as-is:
(greatest_left, sec_greatest_left) = two_max(arr,a,(a+b)/2)
(greatest_right, sec_greatest_right) = two_max(arr,(a+b)/2,b)
(Note that I'm guessing on the comma in the first one, because as you posted it, you're actually calling the number a as a function, which is unlikely to do anything useful…)
So, either your base case should take a and b into account, like this:
if b-a == 2:
    if arr[a]<arr[a+1]:
        return (arr[a+1], arr[a])
    else:
        return (arr[a], arr[a+1])
… or you should send a slice of arr instead of the whole thing, in which case you don't need a and b in the first place:
(greatest_left, sec_greatest_left) = two_max(arr[:len(arr)//2])
(greatest_right, sec_greatest_right) = two_max(arr[len(arr)//2:])
Either one will fix your first problem. Of course the function still doesn't work for most inputs. In fact, it only works if the length of the list is a power of two.
If that isn't a good enough hint for how to fix it: What happens if b-a is 3? Obviously you can't split it into two halves, both of which are of size 2 or greater. So, you'll need to write another base case for b-a == 1, and return something that will make the rest of the algorithm work.
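To make the hints concrete, here is one possible index-based version. It is a sketch, not the only way to do it, and it assumes that arr[a:b] is non-empty and that a range of size 1 reports None as its second-largest value. It also compares the losing half's maximum against the winning half's runner-up in the merge step, which is my reading of what the original comparison was meant to do.

```python
def two_max(arr, a, b):
    """Return (largest, second_largest) of arr[a:b]; second is None when b - a == 1."""
    if b - a == 1:
        return (arr[a], None)
    if b - a == 2:
        if arr[a] < arr[a + 1]:
            return (arr[a + 1], arr[a])
        return (arr[a], arr[a + 1])
    mid = (a + b) // 2
    g_left, s_left = two_max(arr, a, mid)
    g_right, s_right = two_max(arr, mid, b)
    if g_left < g_right:
        # Right half wins; runner-up is the larger of g_left and s_right.
        second = g_left if s_right is None or s_right < g_left else s_right
        return (g_right, second)
    # Left half wins (or ties); runner-up is the larger of g_right and s_left.
    second = g_right if s_left is None or s_left < g_right else s_left
    return (g_left, second)

data = [1, 3, 51, 4, 6, 23, 53, 2, 532, 5, 2, 6, 7, 5, 4]
print(two_max(data, 0, len(data)))  # (532, 53)
```

No slices are created, so the list is never copied, which was the motivation for passing a and b.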
Why not do it this way:
>>> def getIlargest(arr, i):
...     if (i <= len(arr) and i > 0):
...         return sorted(arr)[-i]
...
>>> a = [1,3,51,4,6,23,53,2,532,5,2,6,7,5,4]
>>> getIlargest(a, 2)
53
I took it one step further and tested 3 methods:
Using counting sort - getIlargestVer2
Using Python's sorted function - getIlargestVer1
Using a heap - heapIlargest, as @abarnert suggested.
The results:
For array sizes from 1 to ~5000, sorted is the best; for larger arrays, heapq.nlargest is the winner.
[Plot: timings for array sizes in [1*150, 55*150]]
[Plot: full scan over array sizes in [1*150, 300*150]]
The code I used is the following; the implementations of the 3 methods are in the setup string:
setup = """
import heapq, random
a = random.sample(xrange(1<<30), 150)
a = a * factor
class ILargestFunctions:
    # taken from Wikipedia and rewritten
    def counting_sort(self, array, maxval):
        m = maxval + 1
        count = {}
        for a in array:
            if count.get(a, None) is None:
                count[a] = 1
            else:
                count[a] += 1
        i = 0
        for key in count.keys():
            for c in range(count[key]):
                array[i] = key
                i += 1
        return array
    def getIlargestVer1(self, arr, i):
        if (i <= len(arr) and i > 0):
            return sorted(arr)[-i]
    def getIlargestVer2(self, arr, i):
        if (i <= len(arr) and i > 0):
            return self.counting_sort(arr, max(arr))[-i]
    def heapIlargest(self, arr, i):
        if (i <= len(arr) and i > 0):
            return heapq.nlargest(i,arr)
n = ILargestFunctions()
"""
The main script, which runs the timings and plots the collected data, is:
import timeit
import numpy as np
import matplotlib.pyplot as plt
if __name__ == "__main__":
    results = {}
    r1 = []; r2 = []; r3 = [];
    x = np.arange(1,300,1)
    for i in xrange(1,300,1):
        print i
        factorStr = "factor = " + str(i) + ";"
        newSetupStr = factorStr + setup
        r1.append(timeit.timeit('n.getIlargestVer1(a, 100)', number=200, setup=newSetupStr))
        r2.append(timeit.timeit('n.getIlargestVer2(a, 100)', number=200, setup=newSetupStr))
        r3.append(timeit.timeit('n.heapIlargest(a, 100)', number=200, setup=newSetupStr))
        results[i] = (r1,r2,r3)
    p1 = plt.plot(x, r1, 'r', label = "getIlargestVer1")
    p2 = plt.plot(x, r2, 'b' , label = "getIlargestVer2")
    p3 = plt.plot(x, r3, 'g' , label = "heapIlargest")
    plt.legend(bbox_to_anchor=(1.05, 1), loc=1, borderaxespad=0.)
    plt.show()
@0x90 has the right idea, but he got it reversed.
def find_i_largest_element(seq, i):
    if (i <= len(seq) and i > 0):
        s = sorted(seq, reverse=True)
        return s[i-1]
By the way, is this a homework assignment? If so, what's the whole idea behind the algorithm you have to use?
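For completeness, the heap-based approach mentioned in this thread can return the i-th largest element directly by taking the last item of heapq.nlargest. This is a small sketch. The function name ith_largest is made up, and returning None for an out-of-range i mirrors the implicit behaviour of the functions above.

```python
import heapq

def ith_largest(seq, i):
    """Return the i-th largest element of seq (1-based), or None if i is out of range."""
    if 0 < i <= len(seq):
        # nlargest returns the i largest values in descending order.
        return heapq.nlargest(i, seq)[-1]
    return None

a = [1, 3, 51, 4, 6, 23, 53, 2, 532, 5, 2, 6, 7, 5, 4]
print(ith_largest(a, 2))  # 53
```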