Optimizing the run time of the nested for loop - python

I am just getting started with competitive programming and after writing the solution to certain problem i got the error of RUNTIME exceeded.
max( | a [ i ] - a [ j ] | + | i - j | )
Where a is a list of elements and i,j are index i need to get the max() of the above expression.
Here is a short but complete code snippet.
t = int(input()) # Number of test cases
for i in range(t):
n = int(input()) #size of list
a = list(map(int, str(input()).split())) # getting space separated input
res = []
for s in range(n): # These two loops are increasing the run-time
for d in range(n):
res.append(abs(a[s] - a[d]) + abs(s - d))
print(max(res))
Input File This link may expire(Hope it works)
1<=t<=100
1<=n<=10^5
0<=a[i]<=10^5
Run-time on leader-board for C language is 5sec and that for Python is 35sec while this code takes 80sec.
It is an online judge so independent on machine.numpy is not available.
Please keep it simple i am new to python.
Thanks for reading.

For a given j<=i, |a[i]-a[j]|+|i-j| = max(a[i]-a[j]+i-j, a[j]-a[i]+i-j).
Thus for a given i, the value of j<=i that maximizes |a[i]-a[j]|+|i-j| is either the j that maximizes a[j]-j or the j that minimizes a[j]+j.
Both these values can be computed as you run along the array, giving a simple O(n) algorithm:
def maxdiff(xs):
mp = mn = xs[0]
best = 0
for i, x in enumerate(xs):
mp = max(mp, x-i)
mn = min(mn, x+i)
best = max(best, x+i-mn, -x+i+mp)
return best
And here's some simple testing against a naive but obviously correct algorithm:
def maxdiff_naive(xs):
best = 0
for i in xrange(len(xs)):
for j in xrange(i+1):
best = max(best, abs(xs[i]-xs[j]) + abs(i-j))
return best
import random
for _ in xrange(500):
r = [random.randrange(1000) for _ in xrange(50)]
md1 = maxdiff(r)
md2 = maxdiff_naive(r)
if md1 != md2:
print "%d != %d\n%s" % (md1, md2, r)
exit
It takes a fraction of a second to run maxdiff on an array of size 10^5, which is significantly better than your reported leaderboard scores.

"Competitive programming" is not about saving a few milliseconds by using a different kind of loop; it's about being smart about how you approach a problem, and then implementing the solution efficiently.
Still, one thing that jumps out is that you are wasting time building a list only to scan it to find the max. Your double loop can be transformed to the following (ignoring other possible improvements):
print(max(abs(a[s] - a[d]) + abs(s - d) for s in range(n) for d in range(n)))
But that's small fry. Worry about your algorithm first, and then turn to even obvious time-wasters like this. You can cut the number of comparisons to half, as #Brett showed you, but I would first study the problem and ask myself: Do I really need to calculate this quantity n^2 times, or even 0.5*n^2 times? That's how you get the times down, not by shaving off milliseconds.

Related

faster way to erode/dilate images

I'm making a script thats does some mathemagical morphology on images (mainly gis rasters). Now, I've implemented erosion and dilation, with opening/closing with reconstruction still on the TODO but thats not the subject here.
My implementation is very simple with nested loops, which I tried on a 10900x10900 raster and it took an absurdly long amount of time to finish, obviously.
Before I continue with other operations, I'd like to know if theres a faster way to do this?
My implementation:
def erode(image, S):
(m, n) = image.shape
buffer = np.full((m, n), 0).astype(np.float64)
for i in range(S, m - S):
for j in range(S, n - S):
buffer[i, j] = np.min(image[i - S: i + S + 1, j - S: j + S + 1]) #dilation is just np.max()
return buffer
I've heard about vectorization but I'm not quite sure I understand it too well. Any advice or pointers are appreciated. Also I am aware that opencv has these morphological operations, but I want to implement my own to learn about them.
The question here is do you want a more efficient implementation because you want to learn about numpy or do you want a more efficient algorithm.
I think there are two obvious things that could be improved with your approach. One is you want to avoid looping on the python level because that is slow. The other is that your taking a maximum of overlapping parts of arrays and you can make it more efficient if you reuse all the effort you put in finding the last maximum.
I will illustrate that with 1d implementations of erosion.
Baseline for comparison
Here is basically your implementation just a 1d version:
def erode(image, S):
n = image.shape[0]
buffer = np.full(n, 0).astype(np.float64)
for i in range(S, n - S):
buffer[i] = np.min(image[i - S: i + S + 1]) #dilation is just np.max()
return buffer
You can make this faster using stride_tricks/sliding_window_view. I.e. by avoiding the loops and doing that at the numpy level.
Faster Implementation
np.lib.stride_tricks.sliding_window_view(arr,2*S+1).min(1)
Notice that it's not quite doing the same since it only starts calculating values once there are 2S+1 values to take the maximum of. But for this illustration I will ignore this problem.
Faster Algorithm
A completely different approach would be to not start calculating the min from scratch but keeping the values ordered and only adding one and removing one when considering the next window one to the right.
Here is a ruff implementation of that:
def smart_erode(arr, m):
n = arr.shape[0]
sd = SortedDict()
for new in arr[:m]:
if new in sd:
sd[new] += 1
else:
sd[new] = 1
for to_remove,new in zip(arr[:-m+1],arr[m:]):
yield sd.keys()[0]
if new in sd:
sd[new] += 1
else:
sd[new] = 1
if sd[to_remove] > 1:
sd[to_remove] -= 1
else:
sd.pop(to_remove)
yield sd.keys()[0]
Notice that an ordered set wouldn't work and an ordered list would have to have a way to remove just one element with a specific value sind you could have repeated values in your array. I am using an ordered dict to store the amount of items present for a value.
A Ruff Benchmark
I want to illustrate how the 3 implementations compare for different window sizes. So I am testing them with an array of 10^5 random integers for different window sizes ranging from 10^3 to 10^4.
arr = np.random.randint(0,10**5,10**5)
sliding_window_times = []
op_times = []
better_alg_times = []
for m in np.linspace(0,10**4,11)[1:].astype('int'):
x = %timeit -o -n 1 -r 1 np.lib.stride_tricks.sliding_window_view(arr,2*m+1).min(1)
sliding_window_times.append(x.best)
x = %timeit -o -n 1 -r 1 erode(arr,m)
op_times.append(x.best)
x = %timeit -o -n 1 -r 1 tuple(smart_erode(arr,2*m+1))
better_alg_times.append(x.best)
print("")
pd.DataFrame({"Baseline Comparison":op_times,
'Faster Implementation':sliding_window_times,
'Faster Algorithm':better_alg_times,
},
index = np.linspace(0,10**4,11)[1:].astype('int')
).plot.bar()
Notice that for very small window sizes the raw power of the numpy implementation wins out but very quickly the amount of work we are saving by not calculating the min from scratch is more important.

How do I implement summation and array iteration correctly based on Pseudo code. PYTHON Relaxation Method

I am trying to implement relaxation iterative solver for a project. The function we create should intake two inputs: Matrix A, and Vector B, and should return iterative vectors X that Approximate solution Ax = b.
Pseudo Code from the book is here:
enter image description here
I am new to Python so I am struggling quite a bit with implementing this method. Here is my code:
def SOR_1(A,b):
k=1
n = len(A)
xo = np.zeros_like(b)
x = np.zeros_like(b)
omega = 1.25
while (k <= N):
for i in range(n-1):
x[i] = (1.0-omega)*xo[i] + (1.0/A[i][i])[omega(-np.sum(A[i][j]*x[j]))
-np.sum(A[i][j]*xo[j] + b[i])]
if ( np.linalg.norm(x - xo) < 1e-9):
print (x)
k = k + 1.0
for i in range(n-1):
xo[i] = x[i]
return x
My question is how do I implement the for loop and generating the arrays correctly based off of the Pseudo Code.
Welcome to Python!
Variables in Python are case sensitive so n is defined but N is not defined. If they are supposed to be different variables, I don't see what your value is for N.
You are off to a good start but the following line is still psuedocode for the most part:
x[i] = (1.0-omega)*xo[i] + (1.0/A[i][i])[omega(-np.sum(A[i][j]*x[j]))
-np.sum(A[i][j]*xo[j] + b[i])]
In the textbook's pseudocode square brackets are being used as a grouping symbol but in Python, they are reserved for creating and accessing lists (which is what python calls arrays). Also, there is no implicit multiplication in Python so you have to write things like (1 + 2)*(3*(4+5)) rather than (1 + 2)[3(4+5)]
The other major issue is that you don't define j. You probably need a for loop that would either look like:
for j in range(1, i):
or if you want to do it inline
sum(A[i][j]*x[j] for j in range(1, i))
Note that range has two arguments, where to start and what value to stop before so range(1, i) is equivalent to the summation from 1 to i - 1
I think you are struggling with that line because there's far too much going on in that line. See if you can figure out parts of it using separate variables or offload some of the work to separate functions.
something like: x[i] =a + b * c * d() - e() but give a,b c, d and e meaningful names. You'd then have to correctly set each variable and define each function but at least you are trying to solve separate problems rather than one huge complex one.
Also, make sure you have your tabs correct. k = k + 1.0 should not be inside that for loop, just inside the while loop.
Coding is an iterative process. First get the while loop working. Don't try to do anything in it (except print out the variable so you can see that it is working). Next get the for loop working inside the while loop (again, just printing the variables). Next get (1.0-omega)*xo[i] working. Along the way, you'll discover and solve issues such as (1.0-omega)*xo[i] will evaluate to 0 because xo is a NumPy list initiated with all zeros.
You'd start with something like:
k = 1
N = 3
n = 3
xo = [1, 2, 3]
while (k <= N):
for i in range(n):
print(k, i)
omega = 1.25
print((1.0-omega)*xo[i])
k += 1
And slowly work more and more of the relaxation solver in until you have everything working.

How to deal with excessively large inputs in python?

I am a beginner and I was practicing a question on hackerrank.
I wrote this code as part of a problem which timed out for large inputs:
K = int(input())
roomnos = input().split()
setroomnos = set(roomnos)
for r in setroomnos:
if roomnos.count(r) == 1:
print(r)
break
The next one was accepted by the judge for all test cases
K = int(input())
roomnos = [int(i) for i in input().split()]
setroomnos = set(roomnos)
c = (K * sum(setroomnos) - sum(roomnos)) // (K - 1)
print(c)
can you please explain why the first one timed out for large inputs and the second one worked fine
PS: The fundamental operation is to find a no which appears only once in a list as opposed to other nos which appear K times
Your first solution uses O(n) for with an O(n) count inside it - leading to O(n^2) complexity. Your second example doesn't nest operations in that way, and so is O(n) complexity.

Random prime Number in python

I currently have ↓ set as my randprime(p,q) function. Is there any way to condense this, via something like a genexp or listcomp? Here's my function:
n = randint(p, q)
while not isPrime(n):
n = randint(p, q)
It's better to just generate the list of primes, and then choose from that line.
As is, with your code there is the slim chance that it will hit an infinite loop, either if there are no primes in the interval or if randint always picks a non-prime then the while loop will never end.
So this is probably shorter and less troublesome:
import random
primes = [i for i in range(p,q) if isPrime(i)]
n = random.choice(primes)
The other advantage of this is there is no chance of deadlock if there are no primes in the interval. As stated this can be slow depending on the range, so it would be quicker if you cached the primes ahead of time:
# initialising primes
minPrime = 0
maxPrime = 1000
cached_primes = [i for i in range(minPrime,maxPrime) if isPrime(i)]
#elsewhere in the code
import random
n = random.choice([i for i in cached_primes if p<i<q])
Again, further optimisations are possible, but are very much dependant on your actual code... and you know what they say about premature optimisations.
Here is a script written in python to generate n random prime integers between tow given integers:
import numpy as np
def getRandomPrimeInteger(bounds):
for i in range(bounds.__len__()-1):
if bounds[i + 1] > bounds[i]:
x = bounds[i] + np.random.randint(bounds[i+1]-bounds[i])
if isPrime(x):
return x
else:
if isPrime(bounds[i]):
return bounds[i]
if isPrime(bounds[i + 1]):
return bounds[i + 1]
newBounds = [0 for i in range(2*bounds.__len__() - 1)]
newBounds[0] = bounds[0]
for i in range(1, bounds.__len__()):
newBounds[2*i-1] = int((bounds[i-1] + bounds[i])/2)
newBounds[2*i] = bounds[i]
return getRandomPrimeInteger(newBounds)
def isPrime(x):
count = 0
for i in range(int(x/2)):
if x % (i+1) == 0:
count = count+1
return count == 1
#ex: get 50 random prime integers between 100 and 10000:
bounds = [100, 10000]
for i in range(50):
x = getRandomPrimeInteger(bounds)
print(x)
So it would be great if you could use an iterator to give the integers from p to q in random order (without replacement). I haven't been able to find a way to do that. The following will give random integers in that range and will skip anything that it's tested already.
import random
fail = False
tested = set([])
n = random.randint(p,q)
while not isPrime(n):
tested.add(n)
if len(tested) == p-q+1:
fail = True
break
while n in s:
n = random.randint(p,q)
if fail:
print 'I failed'
else:
print n, ' is prime'
The big advantage of this is that if say the range you're testing is just (14,15), your code would run forever. This code is guaranteed to produce an answer if such a prime exists, and tell you there isn't one if such a prime does not exist. You can obviously make this more compact, but I'm trying to show the logic.
next(i for i in itertools.imap(lambda x: random.randint(p,q)|1,itertools.count()) if isPrime(i))
This starts with itertools.count() - this gives an infinite set.
Each number is mapped to a new random number in the range, by itertools.imap(). imap is like map, but returns an iterator, rather than a list - we don't want to generate a list of inifinite random numbers!
Then, the first matching number is found, and returned.
Works efficiently, even if p and q are very far apart - e.g. 1 and 10**30, which generating a full list won't do!
By the way, this is not more efficient than your code above, and is a lot more difficult to understand at a glance - please have some consideration for the next programmer to have to read your code, and just do it as you did above. That programmer might be you in six months, when you've forgotten what this code was supposed to do!
P.S - in practice, you might want to replace count() with xrange (NOT range!) e.g. xrange((p-q)**1.5+20) to do no more than that number of attempts (balanced between limited tests for small ranges and large ranges, and has no more than 1/2% chance of failing if it could succeed), otherwise, as was suggested in another post, you might loop forever.
PPS - improvement: replaced random.randint(p,q) with random.randint(p,q)|1 - this makes the code twice as efficient, but eliminates the possibility that the result will be 2.

python: Generating integer partitions

I need to generate all the partitions of a given integer.
I found this algorithm by Jerome Kelleher for which it is stated to be the most efficient one:
def accelAsc(n):
a = [0 for i in range(n + 1)]
k = 1
a[0] = 0
y = n - 1
while k != 0:
x = a[k - 1] + 1
k -= 1
while 2*x <= y:
a[k] = x
y -= x
k += 1
l = k + 1
while x <= y:
a[k] = x
a[l] = y
yield a[:k + 2]
x += 1
y -= 1
a[k] = x + y
y = x + y - 1
yield a[:k + 1]
reference: http://homepages.ed.ac.uk/jkellehe/partitions.php
By the way, it is not quite efficient. For an input like 40 it freezes nearly my whole system for few seconds before giving its output.
If it was a recursive algorithm I would try to decorate it with a caching function or something to improve its efficiency, but being like that I can't figure out what to do.
Do you have some suggestions about how to speed up this algorithm? Or can you suggest me another one, or a different approach to make another one from scratch?
To generate compositions directly you can use the following algorithm:
def ruleGen(n, m, sigma):
"""
Generates all interpart restricted compositions of n with first part
>= m using restriction function sigma. See Kelleher 2006, 'Encoding
partitions as ascending compositions' chapters 3 and 4 for details.
"""
a = [0 for i in range(n + 1)]
k = 1
a[0] = m - 1
a[1] = n - m + 1
while k != 0:
x = a[k - 1] + 1
y = a[k] - 1
k -= 1
while sigma(x) <= y:
a[k] = x
x = sigma(x)
y -= x
k += 1
a[k] = x + y
yield a[:k + 1]
This algorithm is very general, and can generate partitions and compositions of many different types. For your case, use
ruleGen(n, 1, lambda x: 1)
to generate all unrestricted compositions. The third argument is known as the restriction function, and describes the type of composition/partition that you require. The method is efficient, as the amount of effort required to generate each composition is constant, when you average over all the compositions generated. If you would like to make it slightly faster in python then it's easy to replace the function sigma with 1.
It's worth noting here as well that for any constant amortised time algorithm, what you actually do with the generated objects will almost certainly dominate the cost of generating them. For example, if you store all the partitions in a list, then the time spent managing the memory for this big list will be far greater than the time spent generating the partitions.
Say, for some reason, you want to take the product of each partition. If you take a naive approach to this, then the processing involved is linear in the number of parts, whereas the cost of generation is constant. It's quite difficult to think of an application of a combinatorial generation algorithm in which the processing doesn't dominate the cost of generation. So, in practice, there'll be no measurable difference between using the simpler and more general ruleGen with sigma(x) = x and the specialised accelAsc.
If you are going to use this function repeatedly for the same inputs, it could still be worth caching the return values (if you are going to use it across separate runs, you could store the results in a file).
If you can't find a significantly faster algorithm, then it should be possible to speed this up by an order of magnitude or two by moving the code into a C extension (this is probably easiest using cython), or alternatively by using PyPy instead of CPython (PyPy has its downsides - it does not yet support Python 3, or some commonly-used libraries like numpy and scipy).
The reason for this is, since python is dynamically typed, the interpreter is probably spending most of its time checking the types of the variables - for all the interpreter knows, one of the operations could turn x into a string, in which case expressions like x + y would suddenly have very different meanings. Cython gets around this problem by allowing you to statically declare the variables as integers, while PyPy has a just-in-time compiler which minimises redundant type checks.
Testing with n=75 I get:
PyPy 1.8:
w:\>c:\pypy-1.8\pypy.exe pstst.py
1.04800009727 secs.
CPython 2.6:
w:\>python pstst.py
5.86199998856 secs.
Cython + mingw + gcc 4.6.2:
w:\pstst> python -c "import pstst;pstst.run()"
4.06399989128
I saw no difference with Psyco(?)
The run function:
def run():
import time
start = time.time()
for p in accelAsc(75):
pass
print time.time() - start, 'secs.'
If I change the definition of accelAsc for Cython to start with:
def accelAsc(int n):
cdef int x, y, k
# no more changes..
I get the Cython time down to 2.27 secs.
I'd say that your performance issue is somewhere else.
I didn't compare it with other approaches, but it does seem efficient to me:
import time
start = time.time()
partitions = list(accelAsc(40))
print('time: {:.5f} sec'.format(time.time() - start))
print('length:', len(partitions))
Gave:
time: 0.03636 sec
length: 37338

Categories