Python how do i make list appends / extends quicker? - python

Heyo all.
Trying to get better at python and started doing leetcode problems.
I'm currently doing one where the goal is to capture water.
Link => https://leetcode.com/problems/trapping-rain-water/
The problem is that it times out for taking too long. My code is certainly inefficient. After googling around I found that .append is supposedly very slow / inefficient. So is .extend.
I can't find any obvious ways of making my code faster; hence my arrival here.
any response is much appreciated
class Solution:
    def trap(self, height: List[int]) -> int:
        max_height = max(height)
        water_blocks = 0
        for element in range(max_height):
            local_map = []
            element = element + 1
            for block in height:
                if block >= element:
                    local_map.extend([1])
                else:
                    local_map.extend([0])
            if local_map.count(1) > 1:
                first_index = local_map.index(1)
                reversed_list = local_map[::-1]
                last_index = len(local_map) - 1 - reversed_list.index(1)
                water_count = last_index - first_index - 1 - (local_map.count(1) - 2)
                water_blocks += water_count
            else:
                continue
        return water_blocks

Although many of your count and index calls can be avoided, the two big nested loops might still be a problem. For the outer loop, max_height can be a large number, and the inner loop iterates over the full list every time. You might need to come up with a different algorithm.
I don't have a leetcode account, so I can't really test my code, but this would be my suggestion: It iterates over the height-list only once, with a small inner loop to find the next matching wall.
class Solution:
    def trap(self, h):
        water = 0
        current_height = 0
        for i, n in enumerate(h):
            # found a "bucket", add water
            if n < current_height:
                water += current_height - n
            else:  # found a wall. calculate usable height
                current_height = self.n_or_max(h[i+1:], n)
        return water

    def n_or_max(self, h, n):
        local_max = 0
        for i in h:
            if i > local_max:
                local_max = i
            # that's high enough, return n
            if i >= n:
                return n
        return local_max
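For comparison, here is a minimal sketch of the widely used two-pointer approach to this problem. It is a different technique from the code above, not the answerer's method, and the function name trap_two_pointers is chosen here purely for illustration. It visits each bar once and does no slicing, so it should run in linear time; I have not submitted it to LeetCode.

```python
from typing import List


def trap_two_pointers(height: List[int]) -> int:
    """Two-pointer sketch: move inward from both ends, tracking the tallest wall seen on each side."""
    left, right = 0, len(height) - 1
    left_max = right_max = 0
    water = 0
    while left < right:
        if height[left] < height[right]:
            # The right side has a wall at least this tall, so left_max bounds the water here.
            left_max = max(left_max, height[left])
            water += left_max - height[left]
            left += 1
        else:
            # Symmetric case: the left side has a wall at least as tall as height[right].
            right_max = max(right_max, height[right])
            water += right_max - height[right]
            right -= 1
    return water


print(trap_two_pointers([0, 1, 0, 2, 1, 0, 1, 3, 2, 1, 2, 1]))  # 6
print(trap_two_pointers([4, 2, 0, 3, 2, 5]))  # 9
```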

Here are some pointers:
Do not use list.count() or list.index() (that is, try to remove local_map.count(1), local_map.index(1) and reversed_list.index(1)). The first will loop (internally) over the whole list, which is obviously expensive if the list is large. The second will loop over the list until a 1 is found. Currently you even have two calls to local_map.count(1) which will always return the same answer, so at least just store the result in a variable. In your loop over blocks, you construct local_map yourself, so you do in fact know exactly what it contains, you should not have to search through it afterwards. Just put a few ifs into the first loop over blocks.
The operation local_map[::-1] not only runs over the whole list, but additionally copies the whole thing into a new list (backwards, but that's not really contributing to the issue). Again, this new list does not contain new information, so you can figure out the value of water_count without doing this.
The above are really the major issues. A slight further optimization can be obtained by eliminating element = element + 1. Just shift the range, as in range(1, max_height + 1).
Also, as written in the comments, prefer list.append(x) to list.extend([x]). It's not huge, but the latter has to create an additional list (of length 1), put x into it, loop over the list and append its elements (just x) to the large list. Finally, the length-1 list is thrown away. On the contrary, list.append(x) just appends x to the list, no temporary length-1 list needed.
Note that list.append() is not slow. It's a function call, which is always somewhat slow, but the actual data operation is fast: constant time and even cleverly amortized, as juanpa.arrivillaga writes.
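One possible reading of these pointers, as a rough sketch: keep the level-by-level idea from the question, but track the first wall, the last wall and the number of walls inside the loop over blocks, so that no local_map, count(), index() or reversed copy is needed. The name trap_by_levels is made up for this example. Note that it still has the two nested loops, so it may well still time out when max_height is large.

```python
def trap_by_levels(height):
    """Count trapped water one horizontal level at a time, without building a per-level list."""
    water_blocks = 0
    # range(1, max + 1) replaces the "element = element + 1" shift.
    for level in range(1, max(height, default=0) + 1):
        first = last = None  # indices of the first and last wall at this level
        walls = 0            # number of blocks reaching this level
        for i, block in enumerate(height):
            if block >= level:
                if first is None:
                    first = i
                last = i
                walls += 1
        if walls > 1:
            # Cells between the outermost walls that are not walls themselves hold water.
            water_blocks += (last - first + 1) - walls
    return water_blocks


print(trap_by_levels([0, 1, 0, 2, 1, 0, 1, 3, 2, 1, 2, 1]))  # 6
print(trap_by_levels([4, 2, 0, 3, 2, 5]))  # 9
```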

Here's another way of looking at the problem. This scans left to right over the bins, and at each point, I track how many units of water are dammed up at each level. When there's a tall wall, we tally up whatever units it was damming, and clear them. However, this still gets an "overtime" flag on the next to the last test, which has about 10,000 entries. It takes 20 seconds on my relatively old box.
class Solution():
    def trap(self, height):
        trapped = 0
        accum = [0]*max(height)
        lastwall = -1
        for h in height:
            # Take credit for everything up to our height.
            trapped += sum(accum[0:h])
            accum[0:h] = [0]*h
            for v in range(h,lastwall):
                accum[v] += 1
            lastwall = max(lastwall,h)
        return trapped

print(Solution().trap([0,1,0,2,1,0,1,3,2,1,2,1])) # 6
print(Solution().trap([4,2,0,3,2,5])) # 9

Related

Trying to find sets of numbers with all distinct sums; help optimizing algorithm?

I was recently trying to write an algorithm to solve a math problem I came up with (long story how I encountered it): basically, I wanted to come up with sets of P distinct integers such that given a number, there is at most one way of selecting G numbers from the set (repetitions allowed) which sum to that number (or put another way, there are not two distinct sets of G integers from the set with the same sum, called a "collision"). For example, with P, G = 3, 3, the set (10, 1, 0) would work, but (2, 1, 0) wouldn't, since 1+1+1=2+1+0.
I came up with an algorithm in Python that can find and generate these sets, but when I tried it, it runs extremely slowly; I'm pretty sure there is a much more optimized way to do this, but I'm not sure how. The current code is also a bit messy because parts were added organically as I figured out what I needed.
The algorithm starts with these two functions:
import numpy as np

def rec_gen_list(leng, index, nums, func):
    if index == leng-1: #list full
        func(nums)
    else:
        nextMax = nums[index-1]
        for nextNum in range(nextMax)[::-1]: # nextMax-1 to 0
            nums[index] = nextNum
            rec_gen_list(leng, index+1, nums, func)

def gen_all_lists(leng, first, func):
    nums = np.zeros(leng, dtype='int')
    nums[0] = first
    rec_gen_list(leng, 1, nums, func)
Basically, this code generates all possible lists of distinct integers (with maximum of "first" and minimum 0) and applies some function to them. rec_gen_list is the recursive part; given a partial list and an index, it tries every possible next number in the list less than the last one, and sends that to the next recursion. Once it gets to the last iteration (with the list being full), it applies the given function to the completed list. Note that I stop before the last entry in the list, so it always ends with 0; I enforce that because if you have a list that doesn't contain 0, you can subtract the smallest number from each one in the list to get one that does, so I force them to have 0 to avoid duplicates and make other things I'm planning to do more convenient.
gen_all_lists is the wrapper around the recursive function; it sets up the array and first iteration of the process and gets it started. For example, you could display all lists of 4 distinct numbers between 7 and 0 by calling it as gen_all_lists(4, 7, print). The function included is so that once the lists are generated, I can test them for collisions before displaying them.
However, after coming up with these, I had to modify them to fit with the rest of the algorithm. First off, I needed to keep track of if the algorithm had found any lists that worked; this is handled by the foundOne and foundNew variables in the updated versions. This probably could be done with a global variable, but I don't think it's a significant issue with the slowdown.
In addition, I realized that I could use backtracking to significantly optimize this: if the first 3 numbers out of a long list are something like (100, 99, 98...), that already causes a collision, so I can skip checking all the lists generated from that. This is handled by the G variable (described before) and the test_no_colls function (which tests if a list has any collisions for a certain value of G); each time I make a new sublist, I check it for collisions, and skip the recursive call if I find any.
This is the result of these modifications, used in the current algorithm:
import numpy as np

def rec_test_list(leng, index, nums, G, func, foundOne):
    if index == leng - 1: #list full
        foundNew = func(nums)
        foundOne = foundOne or foundNew
    else:
        nextMax = nums[index-1]
        for nextNum in range(nextMax)[::-1]: # nextMax-1 to 0
            nums[index] = nextNum
            # If already a collision, don't bother going down this tree.
            if (test_no_colls(nums[:index+1], G)):
                foundNew = rec_test_list(leng, index+1, nums, G, func, foundOne)
                foundOne = foundOne or foundNew
    return foundOne

def test_all_lists(leng, first, G, func):
    nums = np.zeros(leng, dtype='int')
    nums[0] = first
    return rec_test_list(leng, 1, nums, G, func, False)
For the next two functions, test_no_colls takes a list of numbers and a number G, and determines if there are any "collisions" (two distinct sets of G numbers from the list that add to the same total), returning true if there are none. It starts by making a set that contains the possible scores, then generates every possible distinct set of G indices into the list (repetition allowed) and finds their totals. Each one is checked for in the set; if one is found, there are two combinations with the same total.
The combinations are generated with another algorithm I came up with; this probably could be done the same way as generating the initial lists, but I was a bit confused about the variable scope of the set, so I found a non-recursive way to do it. This may be something to optimize.
The second function is just a wrapper for test_no_colls, printing the input array if it passes; this is used in the test_all_lists later on.
def test_no_colls(nums, G):
    possiblePoints=set(()) # Set of possible scores.
    ranks = np.zeros(G, dtype='int')
    ranks[0] = len(nums) - 1 # Lowest possible rank.
    curr_ind = 0
    while True: # Repeat until break.
        if ranks[curr_ind] >= 0: # Copy over to make the start of the rest.
            if curr_ind < G - 1:
                copy = ranks[curr_ind]
                curr_ind += 1
                ranks[curr_ind] = copy
            else: # Start decrementing, since we're at the end. We also have a complete list, so test it.
                # First, get the score for these rankings and test to see if it collides with a previous score.
                total_score = 0
                for rank in ranks:
                    total_score += nums[rank]
                if total_score in possiblePoints: # Collision found.
                    return False
                # Otherwise, add the new score to the list.
                possiblePoints.add(total_score)
                #Then backtrack and continue.
                ranks[curr_ind] -= 1
        else:
            # If the current value is less than 0, we've exhausted the possibilities for the rest of the list,
            # and need to backtrack if possible and start with the next lowest number.
            curr_ind -= 1
            if (curr_ind < 0): # Backtracked from the start, so we're done.
                break
            else:
                ranks[curr_ind] -= 1 # Start with the next lowest number.
    # If we broke out of the loop before returning, no collisions were found.
    return True

def show_if_no_colls(nums, games):
    if test_no_colls(nums, games):
        print(nums)
        return True
    return False
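As a possible simplification of the hand-written combination generator, the standard library's itertools.combinations_with_replacement yields every way of picking G values with repetition allowed. The sketch below is an alternative to test_no_colls, not the original code; the name has_no_collisions is made up here, and it assumes the numbers in the list are distinct, as they are in the lists generated above. Whether it is actually faster would need to be measured.

```python
from itertools import combinations_with_replacement


def has_no_collisions(nums, G):
    """Return True if no two distinct multisets of G values from nums share the same sum."""
    seen_totals = set()
    for combo in combinations_with_replacement(nums, G):
        total = sum(combo)
        if total in seen_totals:  # a second multiset reached the same total
            return False
        seen_totals.add(total)
    return True


print(has_no_collisions([10, 1, 0], 3))  # True
print(has_no_collisions([2, 1, 0], 3))   # False, since 1+1+1 == 2+1+0
```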
These are the final functions that wrap everything up. find_good_lists wraps up test_all_lists more conveniently; it finds all lists ranging from 0 to maxPts of length P which have no collisions for a certain G. find_lowest_score then uses this to find the smallest possible maximum value of a list that works for a certain P and G (for example, find_lowest_score(6, 3) finds two possible lists with max 45, [45 43 34 19 3 0] and [45 42 26 11 2 0], with nothing that is all below 45); it also shows some timing data about how long each iteration took.
def find_good_lists(maxPts, P, G):
    return test_all_lists(P, maxPts, G, lambda nums: show_if_no_colls(nums, G))

from time import perf_counter

def find_lowest_score(P, G):
    maxPts = P - 1 # The minimum possible to even generate a scoring table.
    foundSet = False
    while not foundSet:
        start = perf_counter()
        foundSet = find_good_lists(maxPts, P, G)
        end = perf_counter()
        print("Looked for {}, took {:.5f} s".format(maxPts, end-start))
        maxPts += 1
So, this algorithm does seem to work, but it runs very slowly; when trying to run find_lowest_score(7, 3), for example, it starts taking minutes per iteration around maxPts in the 70s or so, even on Google Colab. Does anyone have suggestions for optimizing this algorithm to improve its runtime and time complexity, or better ways to solve the problem? I am interested in further exploration of this (such as filtering the lists generated for other qualities), but am concerned about the time it would take with this algorithm.

Python Find the mean school assignment - What is a loop?

I have been working on this assignment for about 2 weeks and have nothing done. I am a beginner at coding and my teacher is really not helping me with it. She redirects me to her videos that I have to learn from every time and will not directly tell or help me on how I can do it. Here are the instructions for the assignment (said in a video, but I made it into text).
Find the mean
Create a program that finds the mean of a list of numbers.
Iterate through it, and instead of printing each item, you want to add them together.
Create a new variable inside of that, that takes the grand total when you add things together,
And then you have to divide it by the length of your array, for python/java script you’ll need to use the method that lets you know the length of your list.
Bonus point for kids who can find the median, to do that you need to sort your list and then you need to remove items from the right and the left until you only have one left
All you’re doing is you need to create a variable that is your list
Create another variable that is a empty one at the moment and be a number
Iterate through your list and add each of the numbers to the variable you created
Then divide the number by the number of items that you had in the list.
Here's what I've done so far.
num = [1, 2, 3, 4, 5, 6];
total = 0;
total = (num[0] + total)
total = (num[1] + total)
total = (num[2] + total)
total = (num[3] + total)
total = (num[4] + total)
total = (num[5] + total)
print(total)
However, she tells me I need to shorten down the total = (num[_] + total) parts into a loop. Here is how she is telling me to do a loop in one of her videos.
for x in ____: print(x)
or
for x in range(len(___)): print (x+1, ____[x])
there is also
while i < len(___):
    print(___[i])
    i = i + 1
I really don't understand how to do this so explain it like I'm a total noob.
First of all, loops in python are of two types.
while: a while loop executes the code in its body as long as a condition is true. For example:
i = 0
while i < 5:
    i = i + 1
executes i = i + 1 as long as i < 5 is true, meaning that when i becomes equal to 5 the loop terminates because its condition becomes false.
for: a for loop in Python iterates over the items of any sequence, from the first to the last, and executes its body at each iteration.
Note: in both cases, by loop body I mean the indented code; in the example above the body is i = i + 1.
Iterating over a list. You can iterate over a list:
Using an index
As each position of the array is indexed with a positive number from 0 to the length of the array minus 1, you can access the positions of the array with an incremental index. So, for example:
i = 0
while i < len(arr):
    print(arr[i])
    i = i + 1
will access arr[0] in the first iteration, arr[1] in the second iteration and so on, up to arr[len(arr)-1] in the last iteration. Then, when i is incremented once more, i = len(arr) and so the condition in the while loop (i < len(arr)) becomes false. So the loop ends.
Using an iterator
I won't go in the details of how an iterator works under the surface since it may be too much to absorb for a beginner. However, what matters to you is the following. In Python you can use an iterator to write the condition of a for loop, as your teacher showed you in the example:
for x in arr:
    print(x)
An iterator is intuitively an object that iterates over something that has the characteristic of being "iterable". Lists are not the only iterable elements in python, however they are probably the most important to know. Using an iterator on a list allows you to access in order all the elements of the list. The value of the element of the list is stored in the variable x at each iteration. Therefore:
iter 1: x = arr[0]
iter 2: x = arr[1]
...
iter len(arr): x = arr[len(arr)-1]
Once all the elements of the list are accessed, the loop terminates.
Note: in python, the function range(n) creates an "iterable" from 0 to n-1, so the for loop
for i in range(len(arr)):
    print(arr[i])
uses an iterator to create the sequence of values stored in i and then i is in turn used on the array arr to access its elements positionally.
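To make the comparison concrete, here is a small sketch that runs the three loop styles described above on the same list and collects what each one visits. The list contents and the seen_by_... variable names are made up for the example; the point is only that all three visit the same elements in the same order.

```python
arr = [10, 20, 30]

# 1. while loop with an index that we increment ourselves
seen_by_while = []
i = 0
while i < len(arr):
    seen_by_while.append(arr[i])
    i = i + 1

# 2. for loop directly over the elements of the list
seen_by_for = []
for x in arr:
    seen_by_for.append(x)

# 3. for loop over the indices produced by range(len(arr))
seen_by_range = []
for i in range(len(arr)):
    seen_by_range.append(arr[i])

print(seen_by_while)
print(seen_by_for)
print(seen_by_range)
```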
Summing the elements. If you understand what I explained to you, it should be straightforward to write a loop to sum all the elements of a list. You initialize a variable total = 0 before the loop (I avoid the name sum because it would shadow Python's built-in sum function). Then, at each iteration, you add the element accessed as we saw above to the variable total. It will be something like:
total = 0
for x in arr:
    total = total + x
I will let you write an equivalent code with the other two methods I showed you and do the other points of the assignment by yourself. I am sure that once you'll understand how it works you'll be fine. I hope to have answered your question.
She wants you to loop through the list.
Python is really nice and makes this easier than other languages.
I have an example below that is close to what you need but I do not want to do your homework for you.
listName = [4,8,4,7,84]
for currentListValue in listName:
    #Do your calculating here...
    #Example: tempVar = tempVar + (currentListValue * 2)
    pass  # placeholder so the loop body is valid until you add your own code
as mentioned in the comments w3schools is a good reference for python.

optimizing code running time [difference between the codes below]

these are two codes, can anyone tell me why the second one takes more time to run.
#1
ar = [9547948, 8558390, 9999933, 5148263, 5764559, 906438, 9296778, 1156268]
count=0
big = max(ar)
for i in range(len(ar)):
    if(ar[i]==big):
        count+=1
print(count)
#2
ar = [9547948, 8558390, 9999933, 5148263, 5764559, 906438, 9296778, 1156268]
list = [i for i in ar if i == max(ar)]
print(len(list))
In the list comprehension (the second one), the if clause is evaluated for each candidate item, so max() is evaluated each time.
In the first one, the maximum is evaluated once, before the loop starts. You could probably get a similar performance from the list comprehension by pre-evaluating the maximum in the same way:
maxiest = max(ar)
list = [i for i in ar if i == maxiest]
Additionally, you're not creating a new list in the first one, rather you're just counting the items that match the biggest one. That may also have an impact but you'd have to do some measurements to be certain.
Of course, if you just want to know what the difference between those two algorithms is, that hopefully answers your question. However, you should be aware that max itself will generally pass over the data, then your search will do that again. There is a way to do it with only one pass, something like:
def countLargest(collection):
    # Just return zero for empty list.
    if len(collection) == 0: return 0
    # Start with the first item as largest; the loop below counts it.
    count = 0
    largest = collection[0]
    # Process every item.
    for item in collection:
        if item > largest:
            # Larger: Replace with count one.
            largest = item
            count = 1
        elif item == largest:
            # Equal: increase count.
            count += 1
    return count
Just keep in mind you should check if that's faster, based on likely data sets (the optimisation mantra is "measure, don't guess").
And, to be honest, it won't make much difference until either your data sets get very large or you need to do it many, many times per second. It certainly won't make any real difference for your eight-element collection. Sometimes, it's better to optimise for readability rather than performance.
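In the spirit of "measure, don't guess", here is a minimal sketch of how the variants could be timed with the standard timeit module. The function names are made up for this example, count_builtin (using list.count) is an extra variant not discussed above, and the actual timings will depend on your machine and Python version, so none are shown.

```python
import timeit

ar = [9547948, 8558390, 9999933, 5148263, 5764559, 906438, 9296778, 1156268]


def count_loop(values):
    """Maximum computed once, then a plain counting loop (like snippet #1)."""
    big = max(values)
    count = 0
    for value in values:
        if value == big:
            count += 1
    return count


def count_comprehension_slow(values):
    """max() re-evaluated for every element (like snippet #2)."""
    return len([v for v in values if v == max(values)])


def count_builtin(values):
    """Extra variant: let list.count do the counting."""
    return values.count(max(values))


for fn in (count_loop, count_comprehension_slow, count_builtin):
    elapsed = timeit.timeit(lambda: fn(ar), number=10_000)
    print(f"{fn.__name__}: {elapsed:.4f} s for 10,000 calls")
```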

Am I doing this right? Removing items from python list - is there room for optimization?

I have 2 lists. I want to remove all items from list which contain strings from second list. Now, I am using classical 2 loop approach, 1st I loop over copy of main list and then for every item i check if it contains any strings from 2nd list. Then I delete the item if string is found. And I can end the 2nd loop with break, since no more lookup is needed (we're gonna remove this line anyway). This works just fine - as you can see, I am iterating over copy of list, so removing elements is not a problem.
Here is the code:
intRemoved = 0
sublen = len(mylist) + 1
halflen = sublen / 2
for i, line in enumerate(mylist[:], 1):
    for item in REM:
        if item.encode('utf8').upper() in line.text.encode('utf8').upper():
            if i < halflen:
                linepos = i
            else:
                linepos = (sublen - i) * -1
            mylist.remove(line)
            intRemoved += 1
            break
Now, I need data how many lines I removed (intRemoved) and position in the list (from beginning of list or end of list, that's why it splits in half). Positive numbers indicate removed line position from the beginning of the file, negative from end.
Ahh, yes, and I am ignoring the case. That's why there is .upper().
Now, since I am in no way pro, I just need to know if I am doing it right, performance-wise? Am I doing something that's bad for performance? Is there a way to optimize this?
Thanx,
D.
As you have been told, you should probably be looking at Code Review. Anyhow, I am pretty sure using sets and the intersection operation is going to be much faster.
Take a look here: https://docs.python.org/2/library/stdtypes.html#set
Calling encode is unnecessary. Calling upper each time is not ideal. Copying the list for iterating is expensive. Removing is more expensive, since one has to search for the element and shift the elements after it. Counting intRemoved is not the best way, either.
sublen = len(mylist) + 1
halflen = sublen / 2
filtered_list = []
rem_upper = [item.upper() for item in REM]
for i, line in enumerate(mylist, 1):
    text = line.text.upper()
    if any(item in text for item in rem_upper):
        if i < halflen:
            linepos = i
        else:
            linepos = (sublen - i) * -1
    else:
        filtered_list.append(line)
intRemoved = len(mylist) - len(filtered_list)
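A self-contained sketch of the same filtering idea, for experimenting: the Line namedtuple is a hypothetical stand-in for whatever objects mylist really holds (only a .text attribute is assumed), and remove_matching and positions are names invented here. Unlike the snippet above, it collects every removed position in a list instead of overwriting linepos.

```python
from collections import namedtuple

# Hypothetical stand-in for the real line objects; only .text is assumed.
Line = namedtuple("Line", "text")


def remove_matching(lines, rem):
    """Return (kept_lines, positions) where positions are counted from the nearer end of the list."""
    rem_upper = [item.upper() for item in rem]  # upper-case the patterns once
    sublen = len(lines) + 1
    halflen = sublen / 2
    kept, positions = [], []
    for i, line in enumerate(lines, 1):
        text = line.text.upper()
        if any(item in text for item in rem_upper):
            # Positive: position from the start; negative: position from the end.
            positions.append(i if i < halflen else -(sublen - i))
        else:
            kept.append(line)
    return kept, positions


lines = [Line("Hello"), Line("spam here"), Line("world"), Line("More SPAM")]
kept, positions = remove_matching(lines, ["spam"])
print([line.text for line in kept])  # ['Hello', 'world']
print(positions)                     # [2, -1]
print(len(lines) - len(kept))        # 2 lines removed
```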

using recursion to find the maximum in a list

I'm trying to find the maximum element in a list using recursion.
The input needs to be the actual list, the left index, and the right index.
I have written a function and can't understand why it won't work. I drew the recursion tree, ran examples of lists in my head, and it makes sense, which is why it's even harder to find a solution now! (It's fighting myself, basically.)
I know it's ugly, but try to ignore that. My idea is to split the list in half at each recursive call (it's required), and while the left index will remain 0, the right will be the length of the new halved list, minus 1.
The first call to the function will be from the wrapper function.
Thanks for any help, and I hope I'm not missing something really stupid, or even worse, not even close!
By the way, I didn't use slicing to cut the list because I'm not allowed.
def max22(L,left,right):
    if len(L)==1:
        return L[0]
    a = max22([L[i] for i in range(left, (left+right)//2)], 0 , len([L[i] for i in range(left, (left+right)//2)])-1)
    b = max22([L[i] for i in range(((left+right)//2)+1, right)], 0 ,len([L[i] for i in range(left, (left+right)//2)])-1)
    return max(a,b)

def max_list22(L):
    return max22(L,0,len(L)-1)
input example -
for max_list22([1,20,3]) the output will be 20.
First off, for the sake of clarity I suggest assigning your list comprehensions to variables so you don't have to write each one twice. This should make the code easier to debug. You can also do the same for the (left+right)//2 value.
def max22(L,left,right):
    if len(L)==1:
        return L[0]
    mid = (left+right)//2
    left_L = [L[i] for i in range(left, mid)]
    right_L = [L[i] for i in range(mid+1, right)]
    a = max22(left_L, 0 , len(left_L)-1)
    b = max22(right_L, 0 , len(left_L)-1)
    return max(a,b)

def max_list22(L):
    return max22(L,0,len(L)-1)

print(max_list22([4,8,15,16,23,42]))
I see four problems with this code.
On your b = line, the second argument is using len(left_L) instead of len(right_L).
You're missing an element between left_L and right_L. You should not be adding one to mid in the right_L list comprehension.
You're missing the last element of the list. You should be going up to right+1 in right_L, not just right.
Your mid value is off by one in the case of even sized lists. Ex. [1,2,3,4] should split into [1,2] and [3,4], but with your mid value you're getting [1] and [2,3,4]. (assuming you've already fixed the missing element problems in the previous bullet points).
Fixing these problems looks like:
def max22(L,left,right):
    if len(L)==1:
        return L[0]
    mid = (left+right+1)//2
    left_L = [L[i] for i in range(left, mid)]
    right_L = [L[i] for i in range(mid, right+1)]
    a = max22(left_L, 0 , len(left_L)-1)
    b = max22(right_L, 0 , len(right_L)-1)
    return max(a,b)

def max_list22(L):
    return max22(L,0,len(L)-1)

print(max_list22([4,8,15,16,23,42]))
And if you insist on not using temporary variables, it looks like:
def max22(L,left,right):
    if len(L)==1:
        return L[0]
    a = max22([L[i] for i in range(left, (left+right+1)//2)], 0 , len([L[i] for i in range(left, (left+right+1)//2)])-1)
    b = max22([L[i] for i in range((left+right+1)//2, right+1)], 0 , len([L[i] for i in range((left+right+1)//2, right+1)])-1)
    return max(a,b)

def max_list22(L):
    return max22(L,0,len(L)-1)

print(max_list22([4,8,15,16,23,42]))
Bonus style tip: you don't necessarily need three arguments for max22, since left is always zero and right is always the length of the list minus one.
def max22(L):
    if len(L)==1:
        return L[0]
    mid = (len(L))//2
    left_L = [L[i] for i in range(0, mid)]
    right_L = [L[i] for i in range(mid, len(L))]
    a = max22(left_L)
    b = max22(right_L)
    return max(a,b)

print(max22([4,8,15,16,23,42]))
The problem is that you aren't handling empty lists at all. max_list22([]) recurses infinitely, and [L[i] for i in range(((left+right)//2)+1, right)] eventually produces an empty list.
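One way to guard against the empty case, as a rough sketch: work on indices only and treat left > right as an empty range. The names max_in_range and max_list are made up for this example, and raising ValueError for an empty list is an assumption about the desired behaviour (it mirrors what the built-in max does for an empty sequence).

```python
def max_in_range(L, left, right):
    """Hypothetical helper: maximum of L[left..right], both ends inclusive."""
    if left > right:
        # Empty range: there is no maximum, so fail loudly instead of recursing forever.
        raise ValueError("empty range has no maximum")
    if left == right:
        return L[left]
    mid = (left + right) // 2
    # Both halves are non-empty and strictly smaller, so the recursion terminates.
    return max(max_in_range(L, left, mid), max_in_range(L, mid + 1, right))


def max_list(L):
    return max_in_range(L, 0, len(L) - 1)


print(max_list([1, 20, 3]))  # 20
```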
Your problem is that you don't handle uneven splits. Lists could become empty using your code, but you can also stop on sizes 1 and 2 instead of 0 and 1, which is more natural (because you return a max, and zero-size lists don't have a max).
def max22(L,left,right):
    if left == right:
        # handle size 1
        return L[left]
    if left + 1 == right:
        # handle size 2
        return max(L[left], L[right])
    # split the lists (could be uneven lists!)
    split_index = (left + right) // 2  # integer division, so the index stays an int
    # solve two easier problems
    return max(max22(L, left, split_index), max22(L, split_index, right))

print(max22([1,20, 3], 0, 2))
Notes:
Lose the list comprehension, you don't have to create new lists since you have indices within the list.
When dealing with recursion, you have to think of:
1 - the stop condition(s), in this case there are two because list splits can be uneven, making the recursion stop at uneven conditions.
2 - the easier problem step . Assuming I can solve an easier problem, how can I solve this problem? This is usually what's in the end of the recursive function. In this case a call to the same function on two smaller (index-wise) lists. Recursion looks a lot like proof by induction if you're familiar with it.
Python prefers things to be done explicitly. While Python has some functional features, it's best to let readers of the code know what you're up to rather than having a big one-liner that makes people scratch their heads.
Good luck!
