Python - sum of integers in a list (recursive)

I am trying to make a program that returns the sum of all integers in a list that are greater than or equal to n. For example,
>>>floorSum([1,3,2,5,7,1,2,8], 4)
20
Here is the code I came up with:
def floorSum(l,n):
    if len(l)>0:
        if l[0]<n:
            floorSum(l[1:],n)
        else:
            s=l[0]+floorSum(l[1:],n)
    return s
I am getting: UnboundLocalError: local variable 's' referenced before assignment.
Any ideas?

You forgot to initialize s to zero:
def floorSum(l,n):
    s = 0
    if len(l) > 0:
        if l[0] < n:
            s = floorSum(l[1:], n)
        else:
            s = l[0] + floorSum(l[1:], n)
    else:
        return 0
    return s

As others pointed out, you neglected to initialize s for all cases and check for a length of zero.
Here's an alternative approach:
def floorSum(l, n):
    if len(l) > 1:
        mid = len(l) // 2  # Python 3 integer division
        return floorSum(l[:mid], n) + floorSum(l[mid:], n)
    if len(l) == 1 and l[0] >= n:
        return l[0]
    return 0
This version will divide the list into halves at each step, so although it doesn't do any less work, the depth of the recursion stack is O(log(len(l))) rather than O(len(l)). That will prevent stack overflow for large lists.
Another benefit of this approach is its lower additional storage requirement. Python creates sublists in both versions, but in your original version the additional storage required for the sublists is (n-1) + (n-2) + ... + 1, which is O(n^2). With the successive halving approach, the additional storage requirement is O(n log n), which is substantially lower for large values of n. Allocating and freeing that additional storage may even impact the run time for large n. (Note, however, that this can be avoided in both algorithms by passing the indices of the range of interest as arguments rather than creating sublists, as sketched below.)
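Here is a minimal sketch of that index-passing variant; the extra lo/hi parameters are my own choice of signature, not part of the original answer:
def floorSum(l, n, lo=0, hi=None):
    # lo/hi delimit the half-open range of interest, so no sublists are created
    if hi is None:
        hi = len(l)
    if hi - lo > 1:
        mid = (lo + hi) // 2
        return floorSum(l, n, lo, mid) + floorSum(l, n, mid, hi)
    if hi - lo == 1 and l[lo] >= n:
        return l[lo]
    return 0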

Thanks, I solved the problem!
I forgot to put s = 0.

Python is a wonderful language that allows you to do that in a single line with a list comprehension.
s = sum([value for value in l if value >= n])
Another way is to use filter
s = sum(filter(lambda e: e >= n, l))
The first one basically says:
"Create a new list from the elements of l, keeping only those that are greater than or equal to n. Sum that new list."
The second one:
"Keep only the elements that are greater than or equal to n. Sum over that."
You can find ample documentation on both of these techniques.
If you found the answer useful, mark it as accepted

Related

Sum of two squares in Python

I have written code based on the two-pointer algorithm to find the sum of two squares. My problem is that I run into a memory error when running this code for the input n = 55555**2 + 66666**2. I am wondering how to correct this memory error.
def sum_of_two_squares(n):
    look=tuple(range(n))
    i=0
    j = len(look)-1
    while i < j:
        x = (look[i])**2 + (look[j])**2
        if x == n:
            return (j,i)
        elif x < n:
            i += 1
        else:
            j -= 1
    return None

n=55555**2 + 66666**2
print(sum_of_two_squares(n))
The problem I'm trying to solve using the two-pointer algorithm is:
return a tuple of two positive integers whose squares add up to n, or return None if the integer n cannot be so expressed as a sum of two squares. The returned tuple must present the larger of its two numbers first. Furthermore, if some integer can be expressed as a sum of two squares in several ways, return the breakdown that maximizes the larger number. For example, the integer 85 allows two such representations, 7*7 + 6*6 and 9*9 + 2*2, of which this function must therefore return (9, 2).
You're creating a tuple of size 55555^2 + 66666^2 = 7530713581
So if each element of the tuple takes one byte, the tuple will take up 7.01 GiB.
You'll need to either reduce the size of the tuple, or possibly make each element take up less space by specifying the type of each element: I would suggest looking into Numpy for the latter.
Specifically for this problem:
Why use a tuple at all?
You create the variable look, which is just a tuple of consecutive integers:
look=tuple(range(n)) # = (0, 1, 2, ..., n-1)
Then you reference it, but never modify it. So: look[i] == i and look[j] == j.
So you're looking up numbers in a list of numbers. Why look them up? Why not just use i in place of look[i] and remove look altogether?
As others have pointed out, there's no need to use tuples at all.
One reasonably efficient way of solving this problem is to generate a series of integer square values (0, 1, 4, 9, etc...) and test whether or not subtracting these values from n leaves you with a value that is a perfect square.
You can generate a series of perfect squares efficiently by adding successive odd numbers together: 0 (+1) → 1 (+3) → 4 (+5) → 9 (etc.)
There are also various tricks you can use to test whether or not a number is a perfect square (for example, see the answers to this question), but — in Python, at least — it seems that simply testing the value of int(n**0.5) is faster than iterative methods such as a binary search.
def integer_sqrt(n):
    # If n is a perfect square, return its (integer) square
    # root. Otherwise return -1
    r = int(n**0.5)
    if r * r == n:
        return r
    return -1

def sum_of_two_squares(n):
    # If n can be expressed as the sum of two squared integers,
    # return these integers as a tuple. Otherwise return <None>
    # i: iterator variable
    # x: value of i**2
    # y: value we need to add to x to obtain (i+1)**2
    i, x, y = 0, 0, 1
    # If i**2 > n / 2, then we can stop searching
    max_x = n >> 1
    while x <= max_x:
        r = integer_sqrt(n-x)
        if r >= 0:
            # r >= i always holds here, so return the larger
            # number first, as the problem statement requires
            return (r, i)
        i, x, y = i+1, x+y, y+2
    return None
This returns a solution to sum_of_two_squares(55555**2 + 66666**2) in a fraction of a second.
You do not need the ranges at all, and certainly do not need to convert them into tuples. They take a ridiculous amount of space, but you only need their current elements, the numbers i and j. Also, as the friendly commenter suggested, you can start j at sqrt(n) to improve the performance further.
def sum_of_two_squares(n):
    i = 1
    j = int(n ** (1/2))
    while i < j:
        x = i * i + j * j
        if x == n:
            return j, i
        if x < n:
            i += 1
        else:
            j -= 1
Bear in mind that the problem takes a very long time to solve. Be patient. And no, NumPy won't help: there is nothing here to vectorize.

Consider an array of n integers A = [a1, a2, a3, ..., an]. Find and print the total number of pairs (i, j) with i < j such that ai*aj <= max(ai, ai+1, ..., aj)

Can anyone please help me with the above question?
We have to find combinations of elements in the array, (a1,a2), (a1,a3), (a1,a4) and so on, and pick those combinations which satisfy the condition ai*aj <= max(A), where A is the array, and return the number of combinations possible.
Example: the input array A = [1,1,2,4,2] returns 8, as the combinations are:
(1,1),(1,2),(1,4),(1,2),(1,2),(1,4),(1,2),(2,2).
It's easy to solve this using nested for loops, but that would be very time consuming (O(n^2)).
Naive algorithm:
array = [1,1,2,4,2]
result = []
for i in range(len(array)):
    for j in range(len(array)):
        if array[i] * array[j] <= max(array):
            if (array[j],array[i]) not in result:
                result.append((array[i],array[j]))
print(len(result))
What should the approach be when we encounter such problems?
What I understand from your problem description is that you want to find the total count of pairs whose product is at most the maximum element in the range between them, i.e. ai*aj <= max(ai, ai+1, ..., aj).
The naive approach suggested by Thomas is easy to understand, but it still has O(n^2) time complexity. We can optimize this to reduce the time complexity to O(n log^2 n). Let's discuss it in detail.
First, for each index i, we can find the range {l, r} in which the element at index i is greater than or equal to all the elements from l to i, and greater than all the elements from i + 1 to r. This can be calculated in O(n) time using the idea behind the histogram data structure.
Now, for each index i with its range {l, r}, if we always traverse the shorter of the two sides, i.e. min(i - l, r - i) elements, then across the whole array we traverse O(n log n) indices in total. While traversing the shorter range, when we encounter some element x, we have to find out how many elements in the other range have values at most ai / x. This can be solved using offline processing with a Fenwick tree data structure in O(log n) time per query. Hence, we can solve the above problem with overall O(n log^2 n) time complexity. A Fenwick tree sketch is below.
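To make the counting sub-step concrete, here is a minimal Fenwick (binary indexed) tree sketch supporting point updates and prefix-count queries in O(log n) each. The class name and interface are my own choices; the full O(n log^2 n) solution layered on top is left as described above:
class FenwickTree:
    def __init__(self, size):
        self.tree = [0] * (size + 1)  # 1-based indexing internally

    def update(self, i, delta=1):
        # add delta at position i (1-based)
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i

    def query(self, i):
        # total recorded at positions 1..i
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total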
What about sorting the array, then iterating up: for each element e, binary search the closest element to floor(max(A) / e) that's lower than or equal to e, and add the number of elements to the left of that index. (If there are many duplicates, hash their counts, keep only two of each in the sorted array, and use prefix sums to return the correct number of items to the left of any index.) For example:
1 1 2 4 2    (original array)
1 1 2 2 4    (sorted)
0 1 2 3 2    (elements counted to the left of each index; total: 8)
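Here is a sketch of that idea using the standard bisect module. It assumes positive integers and follows the asker's reading of the condition (the global max(A)); the duplicate-hashing refinement is omitted, and the function name is my own:
import bisect

def count_pairs(arr):
    a = sorted(arr)
    hi = a[-1]  # max(A), per the asker's interpretation
    total = 0
    for k, e in enumerate(a):
        limit = hi // e  # largest v with v * e <= max(A), for positive integers
        # count elements among a[0:k] with value <= limit
        total += bisect.bisect_right(a, limit, 0, k)
    return total

print(count_pairs([1, 1, 2, 4, 2]))  # 8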
"It's easy to solve this using nested for loops but that would be very time consuming (O(n^2))."
Since we require i < j, we can cut this in half:
for i in range(len(array)):
    for j in range(i+1, len(array)):
        ...
Now let's get rid of the part if (array[j],array[i]) not in result:, as it does not reflect your results: (1,1),(1,2),(1,4),(1,2),(1,2),(1,4),(1,2),(2,2) contains dupes.
The next expensive step we can get rid of is max(array), which is not only wrong (max(ai,ai+1,…aj) translates to max(array[i:j+1])) but also has to iterate over a whole section of the array in each iteration. Since the array doesn't change, the only thing that may change this maximum is array[j], the new value you're processing.
Let's store that in a variable:
array = [1,1,2,4,2]
result = []
for i in range(len(array)):
    maxValue = array[i]
    for j in range(i+1, len(array)):
        if array[j] > maxValue:
            maxValue = array[j]
        if array[i] * array[j] <= maxValue:
            result.append((array[i], array[j]))
print(len(result))
Still a naive algorithm, but imo we've made some improvements.
Another thing we could do is store not only the maxValue, but also a pivot = maxValue / array[i], and replace the multiplication by a simple comparison, if array[j] <= pivot:. This rests on the assumption that the multiplication would be executed far more often than the updates to maxValue and therefore to the pivot.
But since I'm not very experienced in Python, I'm not sure whether this would make any difference, or whether I'm on the road to pointless micro-optimizations with this. A sketch is below.
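For concreteness, a minimal sketch of that pivot idea (my own rendering, counting pairs instead of storing them; note the float division, which could lose precision for very large integers):
array = [1, 1, 2, 4, 2]
count = 0
for i in range(len(array)):
    maxValue = array[i]
    pivot = maxValue / array[i]
    for j in range(i + 1, len(array)):
        if array[j] > maxValue:
            maxValue = array[j]
            pivot = maxValue / array[i]  # recompute only when the maximum changes
        if array[j] <= pivot:  # comparison instead of multiplication
            count += 1
print(count)  # 8 for this input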

Find maximum sum of a sublist of specified length in a list of positive integers, under O(n^2), Python 3.5

For one of my programming questions, I am required to define a function that accepts two variables, a list of length l and an integer w. I then have to find the maximum sum of a sublist with length w within the list.
Conditions:
1 <= w <= l <= 100000
Each element in the list lies in the range [1, 100]
Currently, my solution works in O(n^2) (correct me if I'm wrong; code attached below), which the autograder does not accept, since we are required to find an even simpler solution.
My code:
def find_best_location(w, lst):
    best = 0
    n = 0
    while n <= len(lst) - w:
        lists = lst[n: n + w]
        cur = sum(lists)
        best = cur if cur > best else best
        n += 1
    return best
If anyone is able to find a more efficient solution, please do let me know! Also, if I computed my big-O notation wrongly, do let me know as well!
Thanks in advance!
1) Find the sum current of the first w elements and assign it to best.
2) Starting from i = w: current = current + lst[i] - lst[i-w], best = max(best, current).
3) Done (see the sketch below).
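A minimal sketch of that recipe, assuming the asker's find_best_location(w, lst) signature:
def find_best_location(w, lst):
    current = sum(lst[:w])              # 1) sum of the first w elements
    best = current
    for i in range(w, len(lst)):
        current += lst[i] - lst[i - w]  # 2) slide the window one step
        best = max(best, current)
    return best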
Your solution is indeed O(n^2) (or O(n*w) if you want a tighter bound)
You can do it in O(n) by creating an auxiliary array sums, where:
sums[0] = l[0]
sums[i] = sums[i-1] + l[i]
Then, by iterating over it and checking sums[i] - sums[i-w], you can find your solution in linear time.
You can even calculate the sums array on the fly to reduce space complexity, but if I were you, I'd start with this, and see if I can upgrade the solution afterwards.
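A minimal sketch of the prefix-sum version, again assuming the asker's signature:
def find_best_location(w, lst):
    sums = [0] * len(lst)
    sums[0] = lst[0]
    for i in range(1, len(lst)):
        sums[i] = sums[i-1] + lst[i]           # sums[i] = lst[0] + ... + lst[i]
    best = sums[w-1]                           # the window ending at index w-1
    for i in range(w, len(lst)):
        best = max(best, sums[i] - sums[i-w])  # window lst[i-w+1 .. i]
    return best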

Heap sort not working in Python

Referring to MIT's OpenCourseWare algorithms course, chapter 4, I created heap sort according to the pseudocode given:
def heap_sort(A):
    build_max_heap(A)
    curr_heap_size = len(A)
    while curr_heap_size:
        A[0],A[curr_heap_size-1] = A[curr_heap_size-1],A
        curr_heap_size -= 1
        max_heapify(A,0)
    return A
build_max_heap is guaranteed to be correct, as I checked it against Python's heapq library.
However, heap_sort does not seem to work correctly:
test_array = [1,5,3,6,49,2,4,5,6]
heap_sort(test_array)
print(test_array) # --> [6,5,5,4,3,49,6,2,1]
Completely stumped here. I cross-checked with a heap sort Python implementation and it appears to be the same...
Would appreciate help, thank you!
EDIT: Code for max_heapify and build_max_heap:
def max_heapify(A,i):
    heap_size = len(A)
    l,r = i*2+1,i*2+2
    largest = l
    if l < heap_size and A[l] > A[largest]:
        largest = l
    if r < heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i],A[largest] = A[largest],A[i]
        max_heapify(A,largest)

def build_max_heap(A):
    array_size = len(A)
    for i in range(array_size // 2 ,-1,-1):
        max_heapify(A,largest)
You have a few mistakes in your code that made it harder to reproduce your case and find the solution to your particular issue, but here it goes.
First, your code includes a typo in the heap_sort function, specifically where you try to swap the first and last elements of A. On the right-hand side of that assignment, the second value is A, even though it should be A[0].
Secondly, your usage of the variable largest in build_max_heap implies either that largest is a global variable (whose declaration you did not provide in your question), or that you meant to use i instead. I assumed the second case, and since I have a working heap_sort based on the code you provided, I reckon my assumption was correct.
Third, in max_heapify, you initialize largest to l, even though you should initialize it to i. I believe you will find this to be a trivial mistake, as further down that same function you clearly expect the value of largest to equal i.
Finally, your most crucial error is that you keep passing down the entire array and use an array length that never decreases (i.e. it is always the initial length of test_array). The algorithm you use finds the maximum element of the given array and excludes it from the remainder of the structure. That way, you have an array that keeps decreasing in size while sending its largest element to the end (i.e. just beyond its reach/length). But since you never decrease the size of the array, and its length is continually computed as len(test_array), it will never work as expected.
There are two approaches that can solve your issue. Option 1 is passing down to max_heapify a shorter version of A in heap_sort; specifically, you should pass A[:curr_heap_size] at each iteration of the loop. This method can work, but it is not really space efficient, as you make a new list each time. Instead, you can pass down curr_heap_size as an argument to build_max_heap and max_heapify, and treat that as the length (as opposed to len(A)).
Below is a working heap_sort implementation, based on your code. All I did was fixing the mistakes I listed above.
def max_heapify(A, heap_size, i):
    l,r = i*2+1,i*2+2
    largest = i
    if l < heap_size and A[l] > A[largest]:
        largest = l
    if r < heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, heap_size, largest)

def build_max_heap(A, array_size):
    for i in range(array_size // 2, -1, -1):
        max_heapify(A, array_size, i)

def heap_sort(A):
    curr_heap_size = len(A)
    build_max_heap(A, curr_heap_size)
    while curr_heap_size > 0:
        A[0], A[curr_heap_size-1] = A[curr_heap_size-1], A[0]
        curr_heap_size -= 1
        max_heapify(A, curr_heap_size, 0)
    return A
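As a quick sanity check with the asker's test input (expected output shown in the comment):
test_array = [1, 5, 3, 6, 49, 2, 4, 5, 6]
heap_sort(test_array)
print(test_array)  # [1, 2, 3, 4, 5, 5, 6, 6, 49]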

What's a fast and pythonic/clean way of removing a sorted list from another sorted list in python?

I am creating a fast method of generating a list of primes in the range(0, limit+1). In the function I end up removing all integers in the list named removable from the list named primes. I am looking for a fast and pythonic way of removing the integers, knowing that both lists are always sorted.
I might be wrong, but I believe list.remove(n) iterates over the list comparing each element with n, meaning that the following code runs in O(n^2) time.
# removable and primes are both sorted lists of integers
for composite in removable:
    primes.remove(composite)
Based on my assumption (which could be wrong, so please confirm whether it is correct) and the fact that both lists are always sorted, I would think that the following code runs faster, since it only loops over the lists once, in O(n) time. However, it is not at all pythonic or clean.
i = 0
j = 0
while i < len(primes) and j < len(removable):
    if primes[i] == removable[j]:
        primes = primes[:i] + primes[i+1:]
        j += 1
    else:
        i += 1
Is there perhaps a built-in function or a simpler way of doing this? And what is the fastest way?
Side notes: I have not actually timed the functions or code above. Also, it doesn't matter if the list removable is changed/destroyed in the process.
For anyone interested, the full function is below:
import math

# returns a list of primes in range(0, limit+1)
def fastPrimeList(limit):
    if limit < 2:
        return list()
    sqrtLimit = int(math.ceil(math.sqrt(limit)))
    primes = [2] + list(range(3, limit+1, 2))  # list(...) needed on Python 3
    index = 1
    while primes[index] <= sqrtLimit:
        removable = list()
        index2 = index
        while primes[index] * primes[index2] <= limit:
            composite = primes[index] * primes[index2]
            removable.append(composite)
            index2 += 1
        for composite in removable:
            primes.remove(composite)
        index += 1
    return primes
This is quite fast and clean: it does O(n) set membership checks, and in amortized time it runs in O(n) (the first line is O(n) amortized, the second line is O(n * 1) amortized, because a membership check is O(1) amortized):
removable_set = set(removable)
primes = [p for p in primes if p not in removable_set]
Here is a modification of your 2nd solution. It does O(n) basic operations (worst case):
tmp = []
i = j = 0
while i < len(primes) and j < len(removable):
    if primes[i] < removable[j]:
        tmp.append(primes[i])
        i += 1
    elif primes[i] == removable[j]:
        i += 1
    else:
        j += 1
primes[:i] = tmp
del tmp
Please note that constants also matter. The Python interpreter is quite slow (i.e. with a large constant) to execute Python code. The 2nd solution has lots of Python code, and it can indeed be slower for small practical values of n than the solution with sets, because the set operations are implemented in C, thus they are fast (i.e. with a small constant).
If you have multiple working solutions, run them on typical input sizes, and measure the time. You may get surprised about their relative speed, often it is not what you would predict.
The most important thing here is to remove the quadratic behavior. You have this for two reasons.
First, calling remove searches the entire list for values to remove. Doing this takes linear time, and you're doing it once for each element in removable, so your total time is O(NM) (where N is the length of primes and M is the length of removable).
Second, removing elements from the middle of a list forces you to shift the whole rest of the list up one slot. So, each one takes linear time, and again you're doing it M times, so again it's O(NM).
How can you avoid these?
For the first, you either need to take advantage of the sorting, or just use something that allows you to do constant-time lookups instead of linear-time, like a set.
For the second, you either need to create a list of indices to delete and then do a second pass to move each element up the appropriate number of indices all at once, or just build a new list instead of trying to mutate the original in-place.
So, there are a variety of options here. Which one is best? It almost certainly doesn't matter; changing your O(NM) time to just O(N+M) will probably be more than enough of an optimization that you're happy with the results. But if you need to squeeze out more performance, then you'll have to implement all of them and test them on realistic data.
The only one of these that I think isn't obvious is how to "use the sorting". The idea is to use the same kind of staggered-zip iteration that you'd use in a merge sort, like this:
def sorted_subtract(seq1, seq2):
    # yield the elements of sorted seq1 that are not in sorted seq2
    i1, i2 = 0, 0
    while i1 < len(seq1):
        if i2 == len(seq2):
            # seq2 is exhausted; everything left in seq1 survives
            yield from seq1[i1:]
            return
        if seq1[i1] < seq2[i2]:
            yield seq1[i1]
            i1 += 1
        elif seq1[i1] == seq2[i2]:
            # matched a value to remove: skip it
            i1 += 1
        else:
            i2 += 1
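Since sorted_subtract is a generator, a caller would materialize the result, for example:
primes = list(sorted_subtract(primes, removable))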
