Here are two snippets of code; can anyone tell me why the second one takes more time to run?
#1
ar = [9547948, 8558390, 9999933, 5148263, 5764559, 906438, 9296778, 1156268]
count = 0
big = max(ar)
for i in range(len(ar)):
    if ar[i] == big:
        count += 1
print(count)
#2
ar = [9547948, 8558390, 9999933, 5148263, 5764559, 906438, 9296778, 1156268]
result = [i for i in ar if i == max(ar)]
print(len(result))
In the list comprehension (the second one), the if clause is evaluated for each candidate item, so max() is evaluated each time.
In the first one, the maximum is evaluated once, before the loop starts. You could probably get a similar performance from the list comprehension by pre-evaluating the maximum in the same way:
maxiest = max(ar)
result = [i for i in ar if i == maxiest]
Additionally, you're not creating a new list in the first one; rather, you're just counting the items that match the biggest one. That may also have an impact, but you'd have to do some measurements to be certain.
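In fact, for this particular task you don't even need a loop or a comprehension: after the single max() pass, list.count() does the counting in one call:
big = max(ar)
print(ar.count(big))  # count occurrences of the maximum without building a new list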
Of course, if you just want to know what the difference between those two algorithms is, that hopefully answers your question. However, you should be aware that max itself will generally pass over the data, and then your search will pass over it again. There is a way to do it with only one pass, something like:
def countLargest(collection):
    # Just return zero for an empty collection.
    if len(collection) == 0:
        return 0
    # Set up so the first item is the current largest; the loop below counts it.
    count = 0
    largest = collection[0]
    # Process every item.
    for item in collection:
        if item > largest:
            # Larger: replace, with count one.
            largest = item
            count = 1
        elif item == largest:
            # Equal: increase count.
            count += 1
    return count
Just keep in mind you should check if that's faster, based on likely data sets (the optimisation mantra is "measure, don't guess").
And, to be honest, it won't make much difference until either your data sets get very large or you need to do it many, many times per second. It certainly won't make any real difference for your eight-element collection. Sometimes, it's better to optimise for readability rather than performance.
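If you do decide to measure, a minimal timeit sketch along these lines would do it; the function names are mine, and the list is repeated a thousand times so the effect is visible:
import timeit

ar = [9547948, 8558390, 9999933, 5148263, 5764559, 906438, 9296778, 1156268] * 1000

def count_with_hoisted_max():
    big = max(ar)
    return sum(1 for i in ar if i == big)

def count_with_max_per_item():
    return len([i for i in ar if i == max(ar)])

print(timeit.timeit(count_with_hoisted_max, number=100))
print(timeit.timeit(count_with_max_per_item, number=100))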
Related
I was recently trying to write an algorithm to solve a math problem I came up with (long story how I encountered it): basically, I wanted to come up with sets of P distinct integers such that, given a number, there is at most one way of selecting G numbers from the set (repetitions allowed) which sum to that number (or, put another way, there are no two distinct selections of G integers from the set with the same sum; I call such a pair a "collision"). For example, with P, G = 3, 3, the set (10, 1, 0) would work, but (2, 1, 0) wouldn't, since 1+1+1=2+1+0.
I came up with an algorithm in Python that can find and generate these sets, but when I tried it, it runs extremely slowly; I'm pretty sure there is a much more optimized way to do this, but I'm not sure how. The current code is also a bit messy because parts were added organically as I figured out what I needed.
The algorithm starts with these two functions:
import numpy as np

def rec_gen_list(leng, index, nums, func):
    if index == leng - 1:  # list full
        func(nums)
    else:
        nextMax = nums[index - 1]
        for nextNum in range(nextMax)[::-1]:  # nextMax-1 down to 0
            nums[index] = nextNum
            rec_gen_list(leng, index + 1, nums, func)

def gen_all_lists(leng, first, func):
    nums = np.zeros(leng, dtype='int')
    nums[0] = first
    rec_gen_list(leng, 1, nums, func)
Basically, this code generates all possible lists of distinct integers (with maximum of "first" and minimum 0) and applies some function to them. rec_gen_list is the recursive part; given a partial list and an index, it tries every possible next number in the list less than the last one, and sends that to the next recursion. Once it gets to the last iteration (with the list being full), it applies the given function to the completed list. Note that I stop before the last entry in the list, so it always ends with 0; I enforce that because if you have a list that doesn't contain 0, you can subtract the smallest number from each one in the list to get one that does, so I force them to have 0 to avoid duplicates and make other things I'm planning to do more convenient.
gen_all_lists is the wrapper around the recursive function; it sets up the array and first iteration of the process and gets it started. For example, you could display all lists of 4 distinct numbers between 7 and 0 by calling it as gen_all_lists(4, 7, print). The function included is so that once the lists are generated, I can test them for collisions before displaying them.
However, after coming up with these, I had to modify them to fit the rest of the algorithm. First off, I needed to keep track of whether the algorithm had found any lists that worked; this is handled by the foundOne and foundNew variables in the updated versions. This probably could be done with a global variable, but I don't think it contributes significantly to the slowdown.
In addition, I realized that I could use backtracking to significantly optimize this: if the first 3 numbers out of a long list are something like (100, 99, 98...), that already causes a collision, so I can skip checking all the lists generated from that. This is handled by the G variable (described before) and the test_no_colls function (which tests if a list has any collisions for a certain value of G); each time I make a new sublist, I check it for collisions, and skip the recursive call if I find any.
This is the result of these modifications, used in the current algorithm:
import numpy as np

def rec_test_list(leng, index, nums, G, func, foundOne):
    if index == leng - 1:  # list full
        foundNew = func(nums)
        foundOne = foundOne or foundNew
    else:
        nextMax = nums[index - 1]
        for nextNum in range(nextMax)[::-1]:  # nextMax-1 down to 0
            nums[index] = nextNum
            # If there's already a collision, don't bother going down this tree.
            if test_no_colls(nums[:index + 1], G):
                foundNew = rec_test_list(leng, index + 1, nums, G, func, foundOne)
                foundOne = foundOne or foundNew
    return foundOne

def test_all_lists(leng, first, G, func):
    nums = np.zeros(leng, dtype='int')
    nums[0] = first
    return rec_test_list(leng, 1, nums, G, func, False)
For the next two functions, test_no_colls takes a list of numbers and a number G, and determines if there are any "collisions" (two distinct sets of G numbers from the list that add to the same total), returning true if there are none. It starts by making a set that contains the possible scores, then generates every possible distinct set of G indices into the list (repetition allowed) and finds their totals. Each one is checked for in the set; if one is found, there are two combinations with the same total.
The combinations are generated with another algorithm I came up with; this probably could be done the same way as generating the initial lists, but I was a bit confused about the variable scope of the set, so I found a non-recursive way to do it. This may be something to optimize.
The second function is just a wrapper for test_no_colls, printing the input array if it passes; this is used in the test_all_lists later on.
def test_no_colls(nums, G):
    possiblePoints = set()  # Set of possible scores.
    ranks = np.zeros(G, dtype='int')
    ranks[0] = len(nums) - 1  # Lowest possible rank.
    curr_ind = 0
    while True:  # Repeat until break.
        if ranks[curr_ind] >= 0:  # Copy over to make the start of the rest.
            if curr_ind < G - 1:
                copy = ranks[curr_ind]
                curr_ind += 1
                ranks[curr_ind] = copy
            else:  # We're at the end, so we have a complete list: test it, then start decrementing.
                # First, get the score for these rankings and test whether it collides with a previous score.
                total_score = 0
                for rank in ranks:
                    total_score += nums[rank]
                if total_score in possiblePoints:  # Collision found.
                    return False
                # Otherwise, add the new score to the set.
                possiblePoints.add(total_score)
                # Then backtrack and continue.
                ranks[curr_ind] -= 1
        else:
            # If the current value is less than 0, we've exhausted the possibilities for the rest of the list,
            # and need to backtrack if possible and start with the next lowest number.
            curr_ind -= 1
            if curr_ind < 0:  # Backtracked from the start, so we're done.
                break
            else:
                ranks[curr_ind] -= 1  # Start with the next lowest number.
    # If we broke out of the loop before returning, no collisions were found.
    return True

def show_if_no_colls(nums, games):
    if test_no_colls(nums, games):
        print(nums)
        return True
    return False
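As an aside, and purely as a sketch of my own rather than part of the original code: since the entries of nums are distinct, the index juggling in test_no_colls can be expressed with itertools.combinations_with_replacement, which yields each multiset of G values exactly once:
from itertools import combinations_with_replacement

def test_no_colls_itertools(nums, G):
    seen = set()
    for combo in combinations_with_replacement(nums, G):
        total = sum(combo)
        if total in seen:  # two different multisets share a sum: a collision
            return False
        seen.add(total)
    return True
Whether this is faster than the hand-rolled loop would need measuring, but it is much easier to check for correctness.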
These are the final functions that wrap everything up. find_good_lists wraps test_all_lists more conveniently; it finds all lists ranging from 0 to maxPts of length P which have no collisions for a certain G. find_lowest_score then uses this to find the smallest possible maximum value of a list that works for a certain P and G (for example, find_lowest_score(6, 3) finds two possible lists with max 45, [45 43 34 19 3 0] and [45 42 26 11 2 0], and nothing that works below 45); it also shows some timing data about how long each iteration took.
def find_good_lists(maxPts, P, G):
    return test_all_lists(P, maxPts, G, lambda nums: show_if_no_colls(nums, G))

from time import perf_counter

def find_lowest_score(P, G):
    maxPts = P - 1  # The minimum possible to even generate a scoring table.
    foundSet = False
    while not foundSet:
        start = perf_counter()
        foundSet = find_good_lists(maxPts, P, G)
        end = perf_counter()
        print("Looked for {}, took {:.5f} s".format(maxPts, end - start))
        maxPts += 1
So, this algorithm does seem to work, but it runs very slowly; when trying to run find_lowest_score(7, 3), for example, it starts taking minutes per iteration around maxPts in the 70s or so, even on Google Colab. Does anyone have suggestions for optimizing this algorithm to improve its runtime and time complexity, or better ways to solve the problem? I am interested in exploring this further (such as filtering the generated lists for other qualities), but am concerned about the time it would take with this algorithm.
I have been working on this assignment for about 2 weeks and have gotten nothing done. I am a beginner at coding and my teacher is really not helping me with it. She redirects me to her videos every time and will not directly tell me or help me with how I can do it. Here are the instructions for the assignment (given in a video, but I've transcribed them into text):
Find the mean
Create a program that finds the mean of a list of numbers.
Iterate through it, and instead of printing each item, you want to add them together.
Create a new variable inside of that, that takes the grand total when you add things together,
And then you have to divide it by the length of your array, for python/java script you’ll need to use the method that lets you know the length of your list.
Bonus points for kids who can find the median; to do that, you need to sort your list and then remove items from the right and the left until you only have one left
All you’re doing is you need to create a variable that is your list
Create another variable that is empty at the moment and will be a number
Iterate through your list and add each of the numbers to the variable you created
Then divide the number by the number of items that you had in the list.
Here's what I've done so far.
num = [1, 2, 3, 4, 5, 6]
total = 0
total = (num[0] + total)
total = (num[1] + total)
total = (num[2] + total)
total = (num[3] + total)
total = (num[4] + total)
total = (num[5] + total)
print(total)
However, she tells me I need to shorten down the total = (num[_] + total) parts into a loop. Here is how she is telling me to do a loop in one of her videos.
for x in ____: print(x)
or
for x in range(len(___)): print (x+1, ____[x])
there is also
while i < len(___):
    print(___[i])
    i = i + 1
I really don't understand how to do this so explain it like I'm a total noob.
First of all, loops in Python come in two types.
while: a while loop executes the code in its body as long as a condition is true. For example:
i = 0
while i < 5:
    i = i + 1
repeats i = i + 1 as long as i < 5 is true, meaning that when i becomes equal to 5 the loop terminates because its condition becomes false.
for: a for loop in Python iterates over the items of any sequence, from the first to the last, and executes its body at each iteration.
Note: in both cases, by loop body I mean the indented code; in the example above the body is i = i + 1.
Iterating over a list. You can iterate over a list:
Using an index
As each position of the array is indexed with an integer from 0 to the length of the array minus 1, you can access the positions of the array with an incremental index. So, for example:
i = 0
while i < len(arr):
    print(arr[i])
    i = i + 1
will access arr[0] in the first iteration, arr[1] in the second iteration and so on, up to arr[len(arr)-1] in the last iteration. Then, when i is further incremented, i = len(arr), so the condition of the while loop (i < len(arr)) becomes false and the loop ends.
Using an iterator
I won't go in the details of how an iterator works under the surface since it may be too much to absorb for a beginner. However, what matters to you is the following. In Python you can use an iterator to write the condition of a for loop, as your teacher showed you in the example:
for x in arr:
    print(x)
An iterator is intuitively an object that iterates over something that is "iterable". Lists are not the only iterable objects in Python, but they are probably the most important to know. Using an iterator on a list allows you to access all the elements of the list in order. The value of the current element is stored in the variable x at each iteration. Therefore:
iter 1: x = arr[0]
iter 2: x = arr[1]
...
iter len(arr): x = arr[len(arr)-1]
Once all the elements of the list are accessed, the loop terminates.
Note: in python, the function range(n) creates an "iterable" from 0 to n-1, so the for loop
for i in range(len(arr)):
    print(arr[i])
uses an iterator to create the sequence of values stored in i and then i is in turn used on the array arr to access its elements positionally.
Summing the elements. If you understood what I explained above, it should be straightforward to write a loop that sums all the elements of a list. You initialize a variable total = 0 before the loop (don't call it sum, or you'll shadow Python's built-in sum function). Then, at each iteration, you add the element accessed as we saw above to total. It will be something like:
total = 0
for x in arr:
    total = total + x
I will let you write equivalent code with the other two methods I showed you and do the other parts of the assignment by yourself. I am sure that once you understand how it works you'll be fine. I hope this answers your question.
She wants you to loop through the list.
Python is really nice and makes this easier than other languages.
I have an example below that is close to what you need, but I do not want to do your homework for you.
listName = [4, 8, 4, 7, 84]
tempVar = 0  # the accumulator referred to in the example comment below
for currentListValue in listName:
    # Do your calculating here...
    # Example: tempVar = tempVar + (currentListValue * 2)
    pass  # placeholder so the template runs as-is
As mentioned in the comments, w3schools is a good reference for Python.
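To make the accumulator pattern concrete without doing the actual assignment, here is the same idea applied to a different task, counting the even values in a list (all the names here are mine):
values = [4, 8, 4, 7, 84]
even_count = 0  # the accumulator starts at zero
for v in values:
    if v % 2 == 0:
        even_count = even_count + 1  # update the accumulator on each pass
print(even_count)  # prints 4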
Heyo all.
Trying to get better at Python and started doing LeetCode problems.
I'm currently doing one where the goal is to trap water.
Link => https://leetcode.com/problems/trapping-rain-water/
Problem is, it times me out for taking too long. My code is certainly inefficient. After googling around I found that .append is supposedly very slow/inefficient, and so is .extend.
I can't find any obvious ways of making my code faster; hence my arrival here.
Any response is much appreciated.
from typing import List

class Solution:
    def trap(self, height: List[int]) -> int:
        max_height = max(height)
        water_blocks = 0
        for element in range(max_height):
            local_map = []
            element = element + 1
            for block in height:
                if block >= element:
                    local_map.extend([1])
                else:
                    local_map.extend([0])
            if local_map.count(1) > 1:
                first_index = local_map.index(1)
                reversed_list = local_map[::-1]
                last_index = len(local_map) - 1 - reversed_list.index(1)
                water_count = last_index - first_index - 1 - (local_map.count(1) - 2)
                water_blocks += water_count
            else:
                continue
        return water_blocks
Although many of your count and index calls can be avoided, the two big nested loops might still be a problem. For the outer loop, max_height can be a large number, and the inner loop iterates over the full list. You might need to come up with a different algorithm.
I don't have a leetcode account, so I can't really test my code, but this would be my suggestion: It iterates over the height-list only once, with a small inner loop to find the next matching wall.
class Solution:
    def trap(self, h):
        water = 0
        current_height = 0
        for i, n in enumerate(h):
            # Found a "bucket": add water.
            if n < current_height:
                water += current_height - n
            else:  # Found a wall: calculate usable height.
                current_height = self.n_or_max(h[i+1:], n)
        return water

    def n_or_max(self, h, n):
        local_max = 0
        for i in h:
            if i > local_max:
                local_max = i
                # That's high enough; return n.
                if i >= n:
                    return n
        return local_max
Here are some pointers:
Do not use list.count() or list.index() (that is, try to remove local_map.count(1), local_map.index(1) and reversed_list.index(1)). The first will loop (internally) over the whole list, which is obviously expensive if the list is large. The second will loop over the list until a 1 is found. Currently you even have two calls to local_map.count(1) which will always return the same answer, so at least just store the result in a variable. In your loop over blocks, you construct local_map yourself, so you do in fact know exactly what it contains, you should not have to search through it afterwards. Just put a few ifs into the first loop over blocks.
The operation local_map[::-1] not only runs over the whole list, but additionally copies the whole thing into a new list (backwards, but that's not really contributing to the issue). Again, this new list does not contain new information, so you can figure out the value of water_count without doing this.
The above are really the major issues. A slight further optimization can be obtained by eliminating element = element + 1; just shift the range, as in range(1, max_height + 1).
Also, as written in the comments, prefer list.append(x) to list.extend([x]). It's not huge, but the latter has to create an additional list (of length 1), put x into it, loop over the list and append its elements (just x) to the large list. Finally, the length-1 list is thrown away. On the contrary, list.append(x) just appends x to the list, no temporary length-1 list needed.
Note that list.append() is not slow. It's a function call, which is always somewhat slow, but the actual data operation is fast: constant time and even cleverly amortized, as juanpa.arrivillaga writes.
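For reference, a common O(n) approach to this problem (not derived from your code; the structure and names are mine) precomputes, for every index, the highest wall to its left and to its right. The water above each bar is then bounded by the lower of those two maxima:
from typing import List

class Solution:
    def trap(self, height: List[int]) -> int:
        if not height:
            return 0
        n = len(height)
        left_max = [0] * n    # highest wall at or to the left of each index
        right_max = [0] * n   # highest wall at or to the right of each index
        left_max[0] = height[0]
        for i in range(1, n):
            left_max[i] = max(left_max[i - 1], height[i])
        right_max[n - 1] = height[n - 1]
        for i in range(n - 2, -1, -1):
            right_max[i] = max(right_max[i + 1], height[i])
        # Water above bar i is bounded by the lower of the two maxima.
        return sum(min(left_max[i], right_max[i]) - height[i] for i in range(n))

print(Solution().trap([0,1,0,2,1,0,1,3,2,1,2,1]))  # 6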
Here's another way of looking at the problem. This scans left to right over the bins, and at each point I track how many units of water are dammed up at each level. When there's a tall wall, we tally up whatever units it was damming and clear them. However, this still gets an "overtime" flag on the next-to-last test, which has about 10,000 entries. It takes 20 seconds on my relatively old box.
class Solution():
    def trap(self, height):
        trapped = 0
        accum = [0] * max(height)
        lastwall = -1
        for h in height:
            # Take credit for everything up to our height.
            trapped += sum(accum[0:h])
            accum[0:h] = [0] * h
            for v in range(h, lastwall):
                accum[v] += 1
            lastwall = max(lastwall, h)
        return trapped

print(Solution().trap([0,1,0,2,1,0,1,3,2,1,2,1]))  # 6
print(Solution().trap([4,2,0,3,2,5]))  # 9
I'm trying to do it where the list is split in two and the function picks the unique numbers from the first half and then calls itself again with the second half of the list. I'm just a little stuck at this point, as this only gives me the first half numbers.
def function(Lst):
    s = set()
    mid = len(Lst)//2
    for item in Lst[:mid]:
Thank you for the direction.
I was able to use that for loop to add to the set and then recursively redo it for the right half.
Something like this should do the job, although it's not optimal:
def unique(Lst):
    if len(Lst) == 0:  # guard against infinite recursion on an empty list
        return set()
    if len(Lst) == 1:
        return {Lst[0]}
    s = set()
    mid = len(Lst) // 2
    for item in Lst[:mid]:
        if item not in s:
            s.add(item)
    s |= unique(Lst[mid:])
    return s
P.S. if you want uniqueness, you can just do this: set([1,2,3,3,3,4,5,6,7,7,8,8])
EDIT:
Your problem was that you forgot to merge the results from the recursive call into the set that already contains your items (s).
One recursive solution, for example, would be:
def unique(Lst):
    if len(Lst) == 1:
        return {Lst[0]}
    m = len(Lst) // 2
    left_values = unique(Lst[:m])
    right_values = unique(Lst[m:])
    merge_step = set.union(left_values, right_values)
    return merge_step
The divide & conquer solution does pretty much this:
If the subproblem size is 1, return a set containing the only element
The merge step merges the left and right subproblems using set union. The union of two sets is the set of all unique elements contained in at least one of those sets.
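A quick usage check (with either version of unique above):
print(unique([1, 2, 3, 3, 2, 7]))  # {1, 2, 3, 7}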
I have written a program in Python that displays the set of maximum values that are entered into a database. When an item is incomparable to the maxima, it is added to the maxima.
Currently I am performing a linear search through the entire database. The problem is that the worst-case runtime is O(n^2). I was wondering whether there can be a better implementation for this algorithm.
maxima = []
for item in items:
    should_insert = 1
    for val in list(maxima):  # iterate over a copy, since we may remove from maxima
        comp = self.test(item, val)
        if comp == 1:
            should_insert = 0
            break
        elif comp == -1:
            maxima.remove(val)
    if should_insert == 1:
        maxima.append(item)
return maxima
In general there is no way to improve on this.
However there are usually many linear extensions of your partial order that turn your partial order into a total order. (See http://en.wikipedia.org/wiki/Linear_extension for what I mean by that.) Let's suppose that you can find several which, between them, have the property that two elements are comparable in the original order if and only if they compare the same way on each. Now what you can do is take your set, do a heapsort using the first ordering until you find the first element not comparable with your max. (See http://en.wikipedia.org/wiki/Heapsort for that algorithm, which in Python is available from https://docs.python.org/2/library/heapq.html.) Take that set, switch to the second ordering, and repeat. Continue until you've used all orderings.
If you have n elements, and k such orderings, then this algorithm's worst case running time is O(k * n * log(n)). And frequently it will be much better - if m is the size of the group that you pull out on the first step, the running time is O(n + k * m * log(n)).
Your ability to use this approach will, unfortunately, depend on whether you can find several total extensions of your partial ordering that have this property. But in many cases you can. For instance, for one ordering you might break ties in your original sort by number of bathrooms ascending, and in the next by number of bathrooms descending. And so on.
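To make the flavour of this concrete: for the common special case where the partial order is the product of just two total orders (say, price and number of bathrooms), the maxima form a Pareto front, and a single sort suffices. This is my own illustration, not the general multi-extension algorithm described above:
def pareto_front(points):
    # Maxima of 2-D points under (a, b) <= (c, d) iff a <= c and b <= d.
    best_y = float('-inf')
    front = []
    # Scan in decreasing x; a point is maximal iff its y beats every y seen so far.
    for x, y in sorted(points, key=lambda p: (-p[0], -p[1])):
        if y > best_y:
            front.append((x, y))
            best_y = y
    return front

print(pareto_front([(1, 5), (2, 3), (3, 1), (2, 4)]))  # [(3, 1), (2, 4), (1, 5)]
The sort dominates, so this runs in O(n log n) rather than O(n^2).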
It's not entirely clear what you mean by "incomparable" values. If you mean equal values, then you probably want a simple variation on the normal max function, allowing it to return multiple equal values:
def find_maxima_if_incomparable_means_equal(self, items):
    it = iter(items)
    maxima = [next(it)]  # TODO: change the exception type raised here if items is empty
    for item in it:
        comp = self.test(item, maxima[0])
        if comp == 0:
            maxima.append(item)
        elif comp < 0:
            maxima = [item]
    return maxima
On the other hand, if you really mean it when you say that some cannot be compared (i.e. that comparing them has no meaning), the situation is more complicated. You want to find a "maxima" subset of the values such that each item in the maxima set is either greater than or incomparable to every other item in the original set. If your set was [1, 2, 3, "a", "b", "c"], you'd expect the maxima to be [3, "c"], since integers and strings cannot be compared to each other (at least, not in Python 3).
There's no way to avoid the potential of O(N^2) running time in the general case. That's because if none of your items can be compared to any of the others, the maxima set is going to end up being the same as the whole set, and you'll have to have tested every item against every other item to be sure they're really incomparable.
In fact, in the most general case where there's no requirement of a total ordering among any of the values (e.g. a < b < c does not imply a < c), you probably have to compare every item to every other item always. Here's a function that does exactly that:
import itertools

def find_maxima_no_total_ordering(self, items):
    non_maximal = set()
    for a, b in itertools.combinations(items, 2):
        comp = self.test(a, b)
        if comp > 0:
            non_maximal.add(a)
        elif comp < 0:
            non_maximal.add(b)
    return [x for x in items if x not in non_maximal]
Note that the maxima returned by this function may be empty, if the comparisons are bizarre enough that there are cycles (e.g. A < B, B < C and C < A are all true).
If your specific situation is more limited, you may have some better options. If your set of items is the union of several totally ordered groups (such that A < B < C implies A < C, and that A incomparable-to B and B < C implies A incomparable-to C), and there's no easy way to separate out the incomparable groups, you can use an algorithm similar to what your current code attempts, which will be O(M*N) where N is the number of items and M is the number of totally ordered groups. This is still O(N^2) in the worst case (N groups) but somewhat better if the items end up belonging to just a few groups. If all items are comparable to each other, it's O(N) (and the maxima will contain just a single value). Here's an improved version of your code:
def find_maxima_with_total_orderings(self, items):
    maxima = set()  # use a set for fast removal
    for item in items:
        for val in maxima:
            comp = self.test(item, val)
            if comp == 1:
                break
            elif comp == -1:
                maxima.remove(val)
                maxima.add(item)
                break
        else:  # the else clause runs if there was no break in the loop
            maxima.add(item)
    return maxima  # you may want to turn this into a list again before returning it
You can do even better if the group an item belongs to can be determined easily (for instance, by checking the item's type). You can first subdivide the items into their groups, then find the maximum of each totally ordered group. Here's code that's O(N) for all cases, assuming that there's an O(1) running time method self.group that returns some hashable value so that if self.group(A) == self.group(B) then self.test(A, B) != 0:
from collections import defaultdict

def _max(self, comparable_items):  # a helper method: find the max using self.test rather than >
    it = iter(comparable_items)
    max_item = next(it)
    for item in it:
        if self.test(item, max_item) < 0:
            max_item = item
    return max_item

def find_maxima_with_groups(self, items):
    groups = defaultdict(list)
    for item in items:
        groups[self.group(item)].append(item)
    return [self._max(g) for g in groups.values()]
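As a quick illustration of the grouped version, here is one hypothetical wiring; the Finder class, its cmp-style test, and the group-by-type choice are my own assumptions, chosen to match the [1, 2, 3, "a", "b", "c"] example above:
from collections import defaultdict

class Finder:
    def test(self, a, b):
        # cmp-style: -1 if a is greater, 1 if b is greater, 0 if equal.
        return (a < b) - (a > b)

    def group(self, item):
        return type(item)  # ints compare with ints, strings with strings

    def _max(self, comparable_items):
        it = iter(comparable_items)
        max_item = next(it)
        for item in it:
            if self.test(item, max_item) < 0:
                max_item = item
        return max_item

    def find_maxima_with_groups(self, items):
        groups = defaultdict(list)
        for item in items:
            groups[self.group(item)].append(item)
        return [self._max(g) for g in groups.values()]

print(Finder().find_maxima_with_groups([1, 2, 3, "a", "b", "c"]))  # [3, 'c']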