Speed Up Nested For Loops with NumPy

I'm trying to solve a dynamic programming problem, and I came up with a simple loop-based algorithm which fills in a 2D array based on a series of if statements like this:
import numpy

s = "..."  # some string of size n
n = len(s)
opt = numpy.zeros(shape=(n, n))
for j in range(0, n):
    for i in range(j, -1, -1):
        if j - i == 0:
            opt[i, j] = 1
        elif j - i == 1:
            opt[i, j] = 2 if s[i] == s[j] else 1
        elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:
            opt[i, j] = 2 + opt[i + 1, j - 1]
        else:
            opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])
Unfortunately, this code is extremely slow for large values of n. I found that it is much better to use built-in functions such as numpy.where and ndarray.fill to fill in the values of the array instead of for loops, but I'm struggling to find any examples that explain how these functions (or other optimized NumPy methods) can be made to work with a series of if statements, as my algorithm does. What would be an appropriate way to rewrite the above code with built-in NumPy functions to make it faster?

I don't think that np.where and ndarray.fill can solve your problem. np.where selects elements of an array based on a condition evaluated on array values, but in your case the conditions depend mainly on the loop variables i and j rather than on the values of an existing array.
For your particular problem, I would recommend using Cython to optimize your code, especially for larger values of n. Cython is a Python-like language that compiles to C, so it lets you keep Python syntax while using C data types. You can declare the types of variables in a C-like manner to speed up your computations. For example, declaring i and j as C integers will speed things up considerably, because in plain Python the types of i and j are checked at every loop iteration.
Cython also lets you define classic, fast 2D C arrays and access their elements through pointers instead of going through NumPy's per-element indexing. In your case, opt would be that 2D array.

Your if statements and the right-hand sides of your assignment statements refer to entries of the array you are modifying in the same loop, so each value depends on values computed in earlier iterations. This means there is no general way to translate your loop into array operations, so you're stuck with some kind of for loop.
If you instead had the simpler loop:
for j in range(0, n):
    for i in range(j, -1, -1):
        if j - i == 0:
            opt[i, j] = 1
        elif j - i == 1:
            opt[i, j] = 2
        elif s[i] == s[j]:
            opt[i, j] = 3
        else:
            opt[i, j] = 4
you could construct boolean arrays (using some broadcasting) that represent your three conditions:
import numpy as np
# get arrays i and j that represent the row and column indices
i,j = np.ogrid[:n, :n]
# construct an array with the characters from s
sarr = np.fromiter(s, dtype='U1').reshape(1, -1)
cond1 = i==j # result will be a bool arr with True wherever row index equals column index
cond2 = j==i+1 # result will be a bool arr with True wherever col index equals (row index + 1)
cond3 = sarr==sarr.T # result will be a bool arr with True wherever s[i]==s[j]
You could then use numpy.select to construct your desired opt:
opt = np.select([cond1, cond2, cond3], [1, 2, 3], default=4)
For n=5 and s='abbca', this would yield:
array([[1, 2, 4, 4, 3],
       [4, 1, 2, 4, 4],
       [4, 3, 1, 2, 4],
       [4, 4, 4, 1, 2],
       [3, 4, 4, 4, 1]])
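As a quick sanity check, the `np.select` result can be compared with the simpler loop on the upper triangle (i <= j), which is the only part the loop actually fills. This is a sketch that assumes the same `s = 'abbca'`. The helper names `simple_loop` and `with_select` are made up for illustration.

```python
import numpy as np

def simple_loop(s):
    # The simplified loop from above; it only fills entries with i <= j.
    n = len(s)
    opt = np.zeros((n, n), dtype=int)
    for j in range(n):
        for i in range(j, -1, -1):
            if j - i == 0:
                opt[i, j] = 1
            elif j - i == 1:
                opt[i, j] = 2
            elif s[i] == s[j]:
                opt[i, j] = 3
            else:
                opt[i, j] = 4
    return opt

def with_select(s):
    # The same three conditions expressed as broadcast boolean arrays.
    n = len(s)
    i, j = np.ogrid[:n, :n]
    sarr = np.fromiter(s, dtype='U1').reshape(1, -1)
    cond1 = i == j
    cond2 = j == i + 1
    cond3 = sarr == sarr.T
    return np.select([cond1, cond2, cond3], [1, 2, 3], default=4)

s = 'abbca'
upper = np.triu_indices(len(s))  # index arrays for all positions with i <= j
print(np.array_equal(simple_loop(s)[upper], with_select(s)[upper]))  # True
```

The lower triangle of the `np.select` result has no counterpart in the loop, so only the upper triangle is compared.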

Here is a vectorized solution.
It creates diagonal views into the output array that allow us to accumulate along the diagonal direction.
Step-by-step explanation:
1. Evaluate s[i] == s[j] in the diagonal view.
2. Keep only those Trues that are connected to the main diagonal, or to the diagonal directly next to it, by an unbroken run of Trues in the top-right to bottom-left direction.
3. Replace all Trues with 2s, except on the main diagonal, which gets 1s instead; then take the cumulative sum in the bottom-left to top-right direction.
4. Finally, take the cumulative maximum in the bottom-up and left-to-right directions.
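The accumulation primitives behind steps 2 to 4 can be illustrated on a made-up 1-D sequence. This is a toy illustration only, not the answer's strided diagonal views, and it leaves out the main diagonal's special value of 1.

- `np.logical_and.accumulate` keeps only the leading run of Trues.
- `cumsum` turns the kept 2s into growing lengths.
- `np.maximum.accumulate` carries a running maximum forward.

```python
import numpy as np

# Hypothetical match flags along one diagonal, moving outward from the centre.
matches = np.array([True, True, False, True, True])

# Step 2: keep only the Trues connected to the start by an unbroken run.
connected = np.logical_and.accumulate(matches)
print(connected)  # [ True  True False False False]

# Step 3: each kept match contributes 2 characters; cumsum gives growing lengths.
lengths = (2 * connected).cumsum()
print(lengths)  # [2 4 4 4 4]

# Step 4: a running maximum carries the best value seen so far forward.
print(np.maximum.accumulate(np.array([1, 3, 2, 5, 4])))  # [1 3 3 5 5]
```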
Since it is not obvious that the vectorized code does the same thing as the loopy code, I've tested it on quite a few examples (using the stresstest function below), and it seems correct. It is also roughly 7x faster for moderately sized strings (1 to 100 characters).
import numpy as np

def loopy(s):
    n = len(s)
    opt = np.zeros(shape=(n, n), dtype=int)
    for j in range(0, n):
        for i in range(j, -1, -1):
            if j - i == 0:
                opt[i, j] = 1
            elif j - i == 1:
                opt[i, j] = 2 if s[i] == s[j] else 1
            elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:
                opt[i, j] = 2 + opt[i + 1, j - 1]
            else:
                opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])
    return opt

def vect(s):
    n = len(s)
    h = (n+1) // 2
    s = np.array([s, s]).view('U1').ravel()
    opt = np.zeros((n+2*h-1, n+2*h-1), int)
    y, x = opt.strides
    hh = np.lib.stride_tricks.as_strided(opt[h-1:, h-1:], (2, h, n), (x, x-y, x+y))
    p, o, c = np.ogrid[:2, :h, :n]
    hh[...] = 2 * np.logical_and.accumulate(s[c+o+p] == s[c-o], axis=1)
    np.einsum('ii->i', opt)[...] = 1
    hh[...] = hh.cumsum(axis=1)
    opt = np.maximum.accumulate(opt[-h-1:None if h == 1 else h-2:-1, h-1:-h], axis=0)[::-1]
    return np.maximum.accumulate(opt, axis=1)

def stresstest(n=100):
    from string import ascii_lowercase
    import random
    from timeit import timeit
    Tv, Tl = 0, 0
    for i in range(n):
        s = ''.join(random.choices(ascii_lowercase[:random.randint(2, 26)], k=random.randint(1, 100)))
        print(s, end=' ')
        assert np.all(vect(s) == loopy(s))
        Tv += timeit(lambda: vect(s), number=10)
        Tl += timeit(lambda: loopy(s), number=10)
    print()
    print(f"total time loopy {Tl}, vect {Tv}")
Demo:
>>> stresstest(20)
caccbbdbcfbfdcacebbecffacabeddcfdededeeafaebeaeedaaedaabebfacbdd fckjhrmupcqmihlohjog dffffgalbdbhkjigladhgdjaaagelddehahbbhejkibdgjhlkbcihiejdgidljfalfhlaglcgcih eacdebdcfcdcccaacfccefbccbced agglljlhfj mvwlkedblhvwbsmvtbjpqhgbaolnceqpgkhfivtbkwgbvujskkoklgforocj jljiqlidcdolcpmbfdqbdpjjjhbklcqmnmkfckkch ohsxiviwanuafkjocpexjmdiwlcmtcbagksodasdriieikvxphksedajwrbpee mcwdxsoghnuvxglhxcxxrezcdkahpijgujqqrqaideyhepfmrgxndhyifg omhppjaenjprnd roubpjfjbiafulerejpdniniuljqpouimsfukudndgtjggtbcjbchhfcdhrgf krutrwnttvqdemuwqwidvntpvptjqmekjctvbbetrvehsgxqfsjhoivdvwonvjd adiccabdbifigeigdfaieecceciaghadiaigibehdaichfibeaggcgdciahfegefigghgebhddciaei llobdegpmebejvotsr rtnsevatjvuowmquaulfmgiwsophuvlablslbwrpnhtekmpphsenarhrptgbjvlseeqstewjgfhopqwgmcbcihljeguv gcjlfihmfjbkdmimjknamfbahiccbhnceiahbnhghnlleimmieglgbfjbnmemdgddndhinncegnmgmfmgahhhjkg nhbnfhp cyjcygpaaeotcpwfhnumcfveq snyefmeuyjhcglyluezrx hcjhejhdaejchedbce
total time loopy 0.2523909523151815, vect 0.03500175685621798

Related

Greedy Makespan algorithm

I need to implement this greedy algorithm in Python, but I'm having trouble understanding how to find the 'processor' for which M[j] is the least. The algorithm is provided below:
greedy_min_make_span(T, m):
    # T is an array of n numbers, m >= 2
    A = [Nil, ... , Nil]  # Initialize the assignments to nil (array size n)
    M = [ 0, 0, ...., 0]  # initialize the current load of each processor to 0 (array size m)
    for i = 1 to n
        find processor j for which M[j] is the least.
        A[i] = j
        M[j] = M[j] + T[i]
    # Assignment achieves a makespan of max(M[1], .. M[m])
    return A
def greedy_makespan_min(times, m):
    # times is a list of n jobs.
    assert len(times) >= 1
    assert all(elt >= 0 for elt in times)
    assert m >= 2
    n = len(times)
    # please do not reorder the jobs in times or else tests will fail.
    # Return a tuple of two things:
    # - Assignment list of n numbers from 0 to m-1
    # - The makespan of your assignment
    A = n*[0]
    M = m*[0]
    i = 1
    for i in range(i, n):
        j = M.index(min(M))
        A[i] = j
        M[j] = M[j] + times[i]
    return (A, M)
FIXED: The error I was getting was "list assignment index out of range" when assigning A[i] = j.
Utility function:
def compute_makespan(times, m, assign):
    times_2 = m*[0]
    for i in range(len(times)):
        proc = assign[i]
        time = times[i]
        times_2[proc] = times_2[proc] + time
    return max(times_2)
Test cases that I have...
def do_test(times, m, expected):
    (a, makespan) = greedy_makespan_min(times, m)
    print('\t Assignment returned: ', a)
    print('\t Claimed makespan: ', makespan)
    assert compute_makespan(times, m, a) == makespan, 'Assignment returned is not consistent with the reported makespan'
    assert makespan == expected, f'Expected makespan should be {expected}, your code returned {makespan}'
    print('Passed')

print('Test 1:')
times = [2, 2, 2, 2, 2, 2, 2, 2, 3]
m = 3
expected = 7
do_test(times, m, expected)

print('Test 2:')
times = [1]*20 + [5]
m = 5
expected = 9
do_test(times, m, expected)
Right now I am failing the test cases: the assignment returned is not consistent with the reported makespan. My assignment is [0, 0, 1, 2, 0, 1, 2, 0, 1] and my claimed makespan is [6, 7, 4]. compute_makespan returns 8 when 7 is expected. Any ideas where I'm implementing this algorithm wrong?

Change A = n*[] to A = n*[0].
n*[] is just an empty list, so A[i] = j raises "list assignment index out of range"; n*[0] creates a list of length n that the loop can fill in. Since every entry is meant to be overwritten by A[i] = j, the placeholder 0s should not affect the output.
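The `n*[0]` change fixes the IndexError, but two other issues explain the failing tests:

- The loop starts at `i = 1`, so job 0 is never added to `M`. It still counts against processor 0 through the default `A[0] = 0`, which is why `compute_makespan` reports 8 instead of 7.
- The function returns the load list `M` instead of the makespan `max(M)`.

Below is a minimal corrected sketch. It assumes ties are broken by the lowest-numbered processor, as `M.index(min(M))` already does.

```python
def greedy_makespan_min(times, m):
    # times is a list of n non-negative job lengths; m >= 2 processors.
    assert len(times) >= 1
    assert all(elt >= 0 for elt in times)
    assert m >= 2
    n = len(times)
    A = n * [0]   # A[i] = processor assigned to job i
    M = m * [0]   # M[j] = current load of processor j
    for i in range(n):            # start at 0 so the first job is counted
        j = M.index(min(M))       # least-loaded processor (lowest index on ties)
        A[i] = j
        M[j] += times[i]
    return (A, max(M))            # the makespan is the largest load

def compute_makespan(times, m, assign):
    # Recompute the loads from the assignment, as in the question's utility function.
    loads = m * [0]
    for proc, time in zip(assign, times):
        loads[proc] += time
    return max(loads)

for times, m, expected in [([2] * 8 + [3], 3, 7), ([1] * 20 + [5], 5, 9)]:
    a, makespan = greedy_makespan_min(times, m)
    assert compute_makespan(times, m, a) == makespan == expected
    print(a, makespan)
```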

In numpy, most computationally efficient way to find the array with shortest non-zero sequence in array of arrays

Say that I have an array of arrays
import numpy as np
z = np.array(
    [
        [1, 1, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 0],
        [1, 1, 1, 0, 0, 0],
        [1, 1, 1, 1, 1, 1],
    ]
)
The 1s start on the left side of each array, and the 0s, if any, fill the right side. In many applications, this is how arrays are padded so that every array in an array of arrays has the same length.
How would I get the shortest sequence of non-zeros for such an array?
In this case, the shortest sequence is the first array, which has a length of 2.
The obvious answer is to iterate over each array and find the index of the first zero, but I feel that there's probably a method that takes more advantage of NumPy's C-level processing.

Benchmark with a 5000×5000 array:
74.3 ms Dani
33.8 ms user19077881
2.6 ms Kelly1
1.4 ms Kelly2
My Kelly1 is an O(m+n) saddleback search from top-right to bottom-left:
def Kelly1(z):
    m, n = z.shape
    j = n - 1
    for i in range(m):
        while not z[i, j]:
            j -= 1
            if j < 0:
                return 0
    return j + 1
(Michael Szczesny said it can trivially be made ~150x faster (if I remember correctly) by using Numba. I'm not equipped to test that myself, though.)
My Kelly2 is an O(m log n) horizontal binary search, using NumPy to check whether a column is full of non-zeros:
def Kelly2(z):
    m, n = z.shape
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if z[:, mid].all():
            lo = mid + 1
        else:
            hi = mid
    return lo
(Could be shorter by using bisect with a key, but I don't have Python 3.10 to test right now.)
Note: Dani's and user19077881's functions return different results: the smallest number of non-zeros in any row, and the index of the row with the fewest non-zeros, respectively. I followed Dani's lead, as that's the accepted answer. It doesn't really matter, as you can compute one result from the other very quickly (by finding the index of the first zero in the column or row, respectively).
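A small sketch of those conversions follows. It relies on every row being a left-aligned run of 1s. The helper names `count_to_row` and `row_to_count` are made up for illustration.

```python
import numpy as np

z = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
])

def count_to_row(z, k):
    # k is the smallest count; a row whose first 0 sits in column k has exactly k non-zeros.
    m, n = z.shape
    if k == n:                        # every row is full, so any row qualifies
        return 0
    return int(np.argmin(z[:, k]))    # first row with a 0 in column k

def row_to_count(z, r):
    # The index of the first 0 in row r equals its number of leading 1s.
    row = z[r]
    return row.size if row.all() else int(np.argmin(row))

print(count_to_row(z, 2))  # 0
print(row_to_count(z, 0))  # 2
```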
Full benchmark code:
import numpy as np
from timeit import timeit
import random

m, n = 5000, 5000

def genz():
    lo = random.randrange(n*5//100, n//3)
    return np.array(
        [
            [1]*ones + [0]*(n-ones)
            for ones in random.choices(range(lo, n+1), k=m)
        ]
    )

def Dani(z):
    return np.count_nonzero(z, axis=1).min()

def user19077881(z):
    z_sums = z.sum(axis = 1)
    z_least = np.argmin(z_sums)
    return z_least

def Kelly1(z):
    m, n = z.shape
    j = n - 1
    for i in range(m):
        while not z[i, j]:
            j -= 1
            if j < 0:
                return 0
    return j + 1

def Kelly2(z):
    m, n = z.shape
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if z[:, mid].all():
            lo = mid + 1
        else:
            hi = mid
    return lo

funcs = Dani, user19077881, Kelly1, Kelly2
for _ in range(3):
    z = genz()
    for f in funcs:
        t = timeit(lambda: f(z), number=1)
        print('%5.1f ms ' % (t * 1e3), f.__name__)
    print()

Use np.count_nonzero + np.min:
res = np.count_nonzero(z, axis=1).min()
print(res)
Output
2
The function count_nonzero returns an array like:
[2 5 3 6]
then simply find the minimum value.
If you want the index of the row, use np.argmin instead.

If you want to know which sub-array has the fewest non-zeros (that is, the most zeros), you could use:
z_sums = z.sum(axis = 1)
z_least = np.argmin(z_sums)

1d array or 2d array when solving dynamic programming problems

My question is: how can you tell when to use a 1D array or a 2D array for a dynamic programming problem? For instance, I came across the "number of ways to make change" problem.
Here is an example:
Input: n = 12 and denominations = [2, 3, 7].
Suppose you can use an unlimited number of coins of each denomination. In how many ways can you make change for 12? The answer is 4.
I got to the answer using dynamic programming, and here is my code:
def numberOfWaysToMakeChange(n, denoms):
    if n == 0:
        return 1
    if len(denoms) == 0:
        return 0  # no coins: a positive amount cannot be made
    ways = [[0 for _ in range(n + 1)] for _ in range(len(denoms))]
    for row in ways:
        row[0] = 1
    for i in range(n + 1):
        if i % denoms[0] == 0:
            ways[0][i] = 1
    for i in range(1, len(denoms)):
        for j in range(1, n + 1):
            if denoms[i] > j:
                ways[i][j] = ways[i - 1][j]
            else:
                ways[i][j] = ways[i - 1][j] + ways[i][j - denoms[i]]
    return ways[-1][-1]

result = numberOfWaysToMakeChange(12, [2, 3, 7])
print(result)
But online I found another answer that also works and looks like this:
def numberOfWaysToMakeChange(n, denoms):
    ways = [0 for _ in range(n + 1)]
    ways[0] = 1
    for denom in denoms:
        for amount in range(1, n + 1):
            if denom <= amount:
                ways[amount] += ways[amount - denom]
    return ways[n]
How can you identify when you can use a 1d array for these kind of questions?
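Here is a sketch of one way to see why the 1-D version is valid for this problem. In the 2-D table, row `i` reads only two things: row `i - 1` at the same amount, and row `i` itself at a smaller amount.

If you keep a single list and sweep amounts upward, then when `ways[amount]` is read it still holds the previous row's value. Meanwhile `ways[amount - d]` has already been updated for the current coin. That is exactly the 2-D recurrence.

The function names below are hypothetical, and the 2-D version is restated with an explicit "previous row" to make the dependency visible.

```python
def ways_2d(n, denoms):
    # table[i][amount] = ways to make `amount` using only denoms[:i + 1].
    prev = [1] + [0] * n          # "no coins yet" row: only amount 0 is reachable
    table = []
    for d in denoms:
        row = [0] * (n + 1)
        for amount in range(n + 1):
            # Skip coin d (row above) or use it at least once (same row, smaller amount).
            row[amount] = prev[amount] + (row[amount - d] if amount >= d else 0)
        table.append(row)
        prev = row
    return table[-1][n]


def ways_1d(n, denoms):
    # Only `prev` and the current row are ever read, so one list can play both roles.
    ways = [1] + [0] * n
    for d in denoms:
        for amount in range(d, n + 1):   # increasing order, like the 2-D row
            # Before the update, ways[amount] still holds the row-above value;
            # ways[amount - d] was already updated for coin d in this pass.
            ways[amount] += ways[amount - d]
    return ways[n]


print(ways_2d(12, [2, 3, 7]), ways_1d(12, [2, 3, 7]))  # 4 4
```

Roughly, a 1-D array is possible whenever each row of the table depends only on the row directly above it and on already-computed entries of its own row.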

Fill in an array using loop with multiple variables (new to Python, old to C++ (back in the day))

Basically, what I want to do is create something like this in Python (this is the basic idea, not actual code):
n = 3
i = n + 1
a = [1, 3, 3, 1]
b = [1, 2, 1]
while n > 1:
    Check if n is even
        - if n is even, then for all i in range(0, n), insert values into an array using the formula below
        - b[n-i] = a[n-i-1] + a[n-i]; this value will replace the previously given value of b[] above the code.
        - Print out the array
        - After each array is filled, n += 1 and i = n + 1 are applied, then the loop continues
    Check if n is odd
        - same process, except the formula is
        - a[n-i] = b[n-i-1] + b[n-i]; this value will replace the previously given value of a[] above the code.
        - Print out the array
        - After each array is filled, n += 1 and i = n + 1 are applied, then the loop continues
This process will loop, printing each row as it goes, so the arrays will look like this:
b = [1, 4, 6, 4, 1], a = [1, 5, 10, 10, 5, 1], b = [1, 6, 15, 20, 15, 6, 1], etc.
Here is the code that I currently have, however I'm getting an 'out of range' error.
n = 3
i = n + 1
b = [1, 2, 1]
a = [1, 3, 3, 1]
while n > 1:
    if n%2==0:
        print("even")
        for i in range(0,n):
            b[n-i].append(a[n-i-1]+a[n-i])
    else:
        print("odd")
        for i in range(0,n):
            print("yay")
            a[n-i].append(b[n-i-1]+b[n-i])
    if n%2==0:
        print(b)
    else:
        print(a)
    n +=1
    i = n + 1
    print("loop")
The random prints throughout the code are there to test whether execution even reaches each part of the process. They were from previous code and I just haven't removed them yet.
Hopefully you can help me; I can't find anything online about a loop that keeps increasing the size of an array and filling it at the same time.

Sorry, I'm struggling with the code in your sample. From your description I can see that you want to generate Pascal's triangle. Here's a short snippet that will do this.
a = [1, 1]
for _ in range(10):
    a = [1] + [x+y for (x,y) in zip(a[:-1], a[1:])] + [1]
    print(a)
a[:-1] refers to the whole array except the last element, and a[1:] refers to the whole array except the first element. zip pairs up the first elements of each array into a tuple, then the second elements, and so on. All that remains is to add each pair and pad the row with ones on the outside. _ tells the reader "I don't care about this variable", which is useful when you want to be explicit that the range value is used only for flow control.
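The same `zip` idea can be written as a small reusable function and applied to the rows from the question. The name `next_row` is just for illustration.

```python
def next_row(a):
    # Add each pair of neighbours, then pad both ends with 1.
    return [1] + [x + y for x, y in zip(a[:-1], a[1:])] + [1]

row = [1, 3, 3, 1]
for _ in range(3):
    row = next_row(row)
    print(row)
# [1, 4, 6, 4, 1]
# [1, 5, 10, 10, 5, 1]
# [1, 6, 15, 20, 15, 6, 1]
```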

Maria's answer is perfect, I think. If you want to start from your own code, you can rewrite it as below to get a similar result. FYI.
n = 3
b = [1, 2, 1]
while 1 < n < 10:
    if n % 2 == 0:
        print("even")
        b = [0] * (n + 1)
        for i in range(0, n + 1):
            if i == 0:
                b[i] = a[0]
            elif i == n:
                b[i] = a[i - 1]
            else:
                b[n - i] = a[i - 1] + a[i]
    else:
        print("odd")
        a = [0] * (n + 1)
        for i in range(0, n + 1):
            if i == 0:
                a[i] = b[0]
            elif i == n:
                a[i] = b[i - 1]
            else:
                a[i] = b[i - 1] + b[i]
    if n % 2 == 0:
        print(b)
    else:
        print(a)
    n += 1
    print("loop")
