Would like to vectorize while loop for performance (updated)

Would like to vectorize while loop for performance (updated) - python

Set values for a window of size n of an array based on the current value of another array
Ignore values that the window overrides
Need to be able to change the window size (n) for different runs
This code works but it is very slow.
n = 3
def signal(arr):
signal = pd.Series(data=0, index=arr.index)
i = 0
while i < len(arr) - 1:
s = arr.iloc[i]
if s in [-1, 1]:
j = i + n
signal.iloc[i: j] = s
i = i + n
else:
i += 1
return signal
arr = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0]
signal = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, -1, -1, -1, 0, 0, 0]

Don't make arr a pandas series object but just a numpy array.
Try this:
import numpy as np
def signal(arr, n):
size = len(arr)
signal = np.zeros(size)
for i in range(size):
s = arr[i]
if s in [-1, 1]:
j = i + n
signal[i: j] = s
i = i + n
else:
i += 1
return signal
arr = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0]
n = 3
signal(arr, n)
I benchmarked the two different solutions and this is way faster:
Original: 738 µs ± 21.9 µs per loop (mean ± std. dev. of 7 runs, 1000
loops each)
New: 9.56 µs ± 778 ns per loop (mean ± std. dev. of 7 runs, 100000
loops each)

Related

Conway's Game of Life: check if a cell is in the corner/border

I am trying to implement Game Of Life in Python. The new_state function should count the neighbours of a cell and decide, based on the rules (if statements) if it will die(turn 0) or stay alive(turn 1) in the next generation.
Is there a way to check if a value is in the corner/border of an array? The neighbouring cells of a cell will then be the immediate surrounding cells, there is no need to wrap the array. Right now, new_state throws out an index error. I am using numPy for this function.
import numpy
def new_state(array):
updated_array=[]
for x in range(len(array)):
for y in range(len(array[x])):
cell = array[x][y]
neighbours = (array[x-1][y-1], array[x][y-1], array[x+1][y-1], array[x+1][y], array[x+1][y+1], array[x][y+1],
array[x-1][y+1], array[x-1][y])
neighbours_count = sum(neighbours)
if cell == 1:
if neighbours_count == 0 or neighbours_count == 1:
updated_array.append(0)
elif neighbours_count == 2 or neighbours_count == 3:
updated_array.append(1)
elif neighbours_count > 3:
updated_array.append(0)
elif cell == 0:
if neighbours_count == 3:
updated_array.append(1)
return updated_array

To avoid the index error you can pad the array with zeroes:
def new_state(array):
array_padded = np.pad(array, 1)
updated_array=array.copy()
for x in range(1, array.shape[0]+1):
for y in range(1, array.shape[1]+1):
cell = array[x-1][y-1]
neighbours = (array_padded[x-1][y-1], array_padded[x][y-1],
array_padded[x+1][y-1], array_padded[x+1][y],
array_padded[x+1][y+1], array_padded[x][y+1],
array_padded[x-1][y+1], array_padded[x-1][y])
neighbours_count = sum(neighbours)
if cell == 1 and (neighbours_count < 2 or neighbours_count > 3):
updated_array[x-1, y-1] = 0
elif cell == 0 and neighbours_count == 3:
updated_array[x-1, y-1] = 1
return updated_array
Here's a much faster vectorized version:
from scipy.signal import correlate2d
def new_state_v(array):
kernel = np.ones((3,3))
kernel[1,1] = 0
neighbours = correlate2d(array, kernel, 'same')
updated_array=array.copy()
updated_array[(array == 1) & ((neighbours < 2) | (neighbours > 3))] = 0
updated_array[(array == 0) & (neighbours == 3)] = 1
return updated_array
Let's test
>>> array = np.random.randint(2, size=(5,5))
>>> array
array([[0, 0, 1, 1, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 1],
[1, 1, 0, 0, 1],
[0, 1, 1, 1, 0]])
>>> new_state_v(array)
array([[0, 0, 1, 1, 0],
[0, 0, 1, 0, 0],
[0, 1, 0, 1, 0],
[1, 1, 0, 0, 1],
[1, 1, 1, 1, 0]])
Speed comparison:
array = np.random.randint(2, size=(1000,1000))
%timeit new_state(array)
7.76 s ± 94.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit new_state_v(array)
90.4 ms ± 703 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

I have not thoroughly tested this, but the way I'd go about it would be to allow the array index value to wrap - this way no additional logic besides the wrapping is required (which could be more bug prone when requiring modifications down the line).
Use:
import numpy as np
a = np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]])
a[np.mod(x,a.shape[0]),np.mod(y,a.shape[1])]
where x and y are your original indexes.
Examples:
a[np.mod(4,a.shape[0]),np.mod(3,a.shape[1])]
>>> 1
# even wraps correctly with negative indexes!
a[np.mod(-1,a.shape[0]),np.mod(-1,a.shape[1])]
>>> 12

Non-Assert Way To Compare Two 2D Arrays for Accuracy

I am currently training a LSTM which classifies frames. What I am trying to do is compare two 2d numpy arrays to check for accuracy between my prediction and target. I have currently looked around for non-naive ways to solve this problem using NumPy / SciPy.
I am aware that there is np.testing.assert_array_equal(x, y) which uses Assertion to output the results. I am looking for a way to solve this issue using NumPy / SciPy so I can store the results rather than an Assert print out:
Arrays are not equal
(mismatch 14.285714285714292%)
x: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
y: array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0])
x = np.asarray([[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]])
y = np.asarray([[0, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 0], [0, 0, 0]])
try:
np.testing.assert_array_equal(x, y)
res = True
except AssertionError as err:
res = False
print (err)
I am looking for a way which I can store the mismatch of these two arrays without using a naive fashion (Two comparative loops):
accuracy = thisFunction(x,y)
I am sure there is something in NumPy which can solve this, I've had no luck with searching for built-in functions.

As hpaulj noted in the comment, you can use numpy.allclose() for checking the array equality, with acceptable difference of up to some tolerance value (see below or NumPy notes).
Here is a small illustration with two simple float arrays.
In [7]: arr1 = np.array([1.3, 1.4, 1.5, 3.4])
In [8]: arr2 = np.array([1.299999, 1.4, 1.4999999, 3.3999999999])
In [9]: np.allclose(arr1, arr2)
Out[9]: True
numpy.allclose will return True if the corresponding elements in the arrays are dissimilar (only up to the tolerance value). Else it would return False. NumPy default for relative & absolute tolerance values are rtol=1e-05, atol=1e-08 respectively.
Having said that, if you only want to compare int arrays, then you'd be better off with numpy.array_equal() which is approx. 8x faster than numpy.allclose.
In [17]: arr1 = np.random.randint(23045)
In [18]: arr2 = np.random.randint(23045)
In [19]: %timeit np.allclose(arr1, arr2)
22.9 µs ± 471 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [20]: %timeit np.array_equal(arr1, arr2)
3.99 µs ± 68.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

np.array_equal(x, y) is roughly equivalent to (x == y).all(). You can use this to compute the discrepancies:
def array_comp(x, y):
"""
Return the status of the comparison and the discrepancy.
For arrays of the same shape, the discrepancy is a ratio of mismatches to the total size.
For arrays of different shapes or sizes, the discrepancy is a message indicating the mismatch.
"""
if x.shape != y.shape:
return False, 'shape'
count = x.size - np.count_nonzero(x == y)
return count == 0, count / x.size

Incrementally increase non zero elements in a list

if i got this list
a = [1,0,0,1,0,0,0,1]
and I want it turned into
a = [1,0,0,2,0,0,0,3]

Setup for solution #1 and #2
from itertools import count
to_add = count()
a = [1,0,0,1,0,0,0,1]
Solution #1
>>> [x + next(to_add) if x else x for x in a]
[1, 0, 0, 2, 0, 0, 0, 3]
Solution #2, hacky but fun
>>> [x and x + next(to_add) for x in a]
[1, 0, 0, 2, 0, 0, 0, 3]
Setup for solution #3 and #4
import numpy as np
a = np.array([1,0,0,1,0,0,0,1])
Solution #3
>>> np.where(a == 0, 0, a.cumsum())
array([1, 0, 0, 2, 0, 0, 0, 3])
Solution #4 (my favorite one yet)
>>> a*a.cumsum()
array([1, 0, 0, 2, 0, 0, 0, 3])
All the cumsum solutions assume that the non-zero elements of a are all ones.
Timings:
# setup
>>> a = [1, 0, 0, 1, 0, 0, 0, 1]*1000
>>> arr = np.array(a)
>>> to_add1, to_add2 = count(), count()
# IPython timings # i5-6200U CPU # 2.30GHz (though only relative times are of interest)
>>> %timeit [x + next(to_add1) if x else x for x in a] # solution 1
669 µs ± 3.59 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit [x and x + next(to_add2) for x in a] # solution 2
673 µs ± 15.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit np.where(arr == 0, 0, arr.cumsum()) # solution 3
34.7 µs ± 94.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit arr = np.array(a); np.where(arr == 0, 0, arr.cumsum()) # solution 3 with array creation
474 µs ± 14.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit arr*arr.cumsum() # solution 4
23.6 µs ± 131 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit arr = np.array(a); arr*arr.cumsum() # solution 4 with array creation
465 µs ± 6.82 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Here is how I would do it:
def increase(l):
count = 0
for num in l:
if num == 1:
yield num + count
count += 1
else:
yield num
c = list(increase(a))
c
[1, 0, 0, 2, 0, 0, 0, 3]

So, you want to increase each 1 except for the first one, right?
How about:
a = [1,0,0,1,0,0,0,1]
current_number = 0
for i, num in enumerate(a):
if num == 1:
a[i] = current_number + 1
current_number += 1
print(a)
>>> [1, 0, 0, 2, 0, 0, 0, 3]
Or, if you prefer:
current_number = 1
for i, num in enumerate(a):
if num == 1:
a[i] = current_number
current_number += 1

Use a list comprehension for this:
print([a[i]+a[:i].count(1) if a[i]==1 else a[i] for i in range(len(a))])
Output:
[1, 0, 0, 2, 0, 0, 0, 3]
Loop version:
for i in range(len(a)):
if a[i]==1:
a[i]=a[i]+a[:i].count(1)

Using numpy cumsum or cumulative sum to replace 1's to sum of 1's
In [4]: import numpy as np
In [5]: [i if i == 0 else j for i, j in zip(a, np.cumsum(a))]
Out[5]: [1, 0, 0, 2, 0, 0, 0, 3]

Other option: a one liner list comprehension, no dependencies.
[ 0 if e == 0 else sum(a[:i+1]) for i, e in enumerate(a) ]
#=> [1, 0, 0, 2, 0, 0, 0, 3]

Replace 1's with 0's in a sequence

I have a huge list of 1's and 0's like this :
x = [1,0,0,0,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1].
Full list here.
I want to create a new list y with the condition that , the 1's should be preserved only if the they occur in a sequence of >= than 10, else those 1's should be replaced by zeroes.
ex based on x above ^ , y should become:
y = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1].
So far I have the following :
Finding out where the changes are occurring and
Finding out what sequences are occurring with what frequency:
import numpy as np
import itertools
nx = np.array(x)
print np.argwhere(np.diff(nx)).squeeze()
answer = []
for key, iter in itertools.groupby(nx):
answer.append((key, len(list(iter))))
print answer
which gives me :
[0 3 8 14] # A
[(1, 1), (0, 3), (1, 5), (0, 6), (1, 10)] # B
#A which means the changes are happening after the 0th, 3rd and so on positions.
#B means there is one 1, followed by three 0's followed by five 1's followed by 6 zeroes followed by 10 1's.
How do I proceed to the final step of creating y where we will be replacing the 1's with 0's depending upon the sequence length?
PS: ##I'm humbled by these brilliant solutions from all the wonderful people.

Just check while you are iterating over the group-by. Something like:
>>> from itertools import groupby
>>> x = [1,0,0,0,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1]
>>> result = []
>>> for k, g in groupby(x):
... if k:
... g = list(g)
... if len(g) < 10:
... g = len(g)*[0]
... result.extend(g)
...
>>> result
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Note, this is faster than the corresponding pandas solution, for a dataset of this size at least:
In [11]: from itertools import groupby
In [12]: %%timeit
...: result = []
...: for k, g in groupby(x):
...: if k:
...: g = list(g)
...: if len(g) < 10:
...: g = len(g)*[0]
...: result.extend(g)
...:
181 µs ± 1.72 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [13]: %%timeit s = pd.Series(x)
...: s[s.groupby(s.ne(1).cumsum()).transform('count').lt(10)] = 0
...:
4.03 ms ± 176 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
And note, that's being generous with the pandas solution, not counting any time converting from list to pd.Series or converting back, including those:
In [14]: %%timeit
...: s = pd.Series(x)
...: s[s.groupby(s.ne(1).cumsum()).transform('count').lt(10)] = 0
...: s = s.tolist()
...:
4.92 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Here is another numpy approach. Please take note of the benchmarks at the bottom of this post:
import numpy as np
import pandas as pd
from itertools import groupby
import re
from timeit import timeit
def f_pp(data):
switches = np.empty((data.size + 1,), bool)
switches[0] = data[0]
switches[-1] = data[-1]
switches[1:-1] = data[:-1]^data[1:]
switches = np.where(switches)[0].reshape(-1, 2)
switches = switches[switches[:, 1]-switches[:, 0] >= 10].ravel()
reps = np.empty((switches.size + 1,), int)
reps[1:-1] = np.diff(switches)
reps[0] = switches[0]
reps[-1] = data.size - switches[-1]
return np.repeat(np.arange(reps.size) & 1, reps)
def f_ja(data):
result = []
for k, g in groupby(data):
if k:
g = list(g)
if len(g) < 10:
g = len(g)*[0]
result.extend(g)
return result
def f_mu(s):
s = s.copy()
s[s.groupby(s.ne(1).cumsum()).transform('count').lt(10)] = 0
return s
def vrange(starts, stops):
stops = np.asarray(stops)
l = stops - starts # Lengths of each range.
return np.repeat(stops - l.cumsum(), l) + np.arange(l.sum())
def f_ka(data):
x = data.copy()
d = np.where(np.diff(x) != 0)[0]
d2 = np.diff(np.concatenate(([0], d, [x.size])))
ind = np.where(d2 >= 10)[0] - 1
x[vrange(d[ind] + 1, d[ind + 1] + 2)] = 0
return x
def f_ol(data):
return list(re.sub(b'(?<!\x01)\x01{,9}(?!\x01)', lambda m: len(m.group()) * b'\x00', bytes(data)))
n = 10_000
data = np.repeat((np.arange(n) + np.random.randint(2))&1, np.random.randint(1, 20, (n,)))
datal = data.tolist()
datap = pd.Series(data)
kwds = dict(globals=globals(), number=100)
print(np.where(f_ja(datal) != f_pp(data))[0])
print(np.where(f_ol(datal) != f_pp(data))[0])
#print(np.where(f_ka(data) != f_pp(data))[0])
print(np.where(f_mu(datap).values != f_pp(data))[0])
print('itertools.groupby: {:6.3f} ms'.format(10 * timeit('f_ja(datal)', **kwds)))
print('re: {:6.3f} ms'.format(10 * timeit('f_ol(datal)', **kwds)))
#print('numpy Kasramvd: {:6.3f} ms'.format(10 * timeit('f_ka(data)', **kwds)))
print('pandas: {:6.3f} ms'.format(10 * timeit('f_mu(datap)', **kwds)))
print('numpy pp: {:6.3f} ms'.format(10 * timeit('f_pp(data)', **kwds)))
Sample output:
[] # Delta ja, pp
[] # Delta ol, pp
[ 749 750 751 ... 98786 98787 98788] # Delta mu, pp
itertools.groupby: 5.415 ms
re: 28.197 ms
pandas: 14.972 ms
numpy pp: 0.788 ms
Note only from scratch solutions considered. #Olivier's #juanpa.arrivillaga's and my approach yield same answer, #MaxU's doesn't. Couldn't get #Kazramvd's to finish reliably. (May be my fault - don't know pandas and didn't fully understand #Kazramvd's solution).
Note that is only one example, other conditions (like shorter lists, more switches, etc.) may change the ranking.

With list comprehension
From your encoded list B, you can use list comprehension to generate the new list.
b = [(1, 1), (0, 3), (1, 5), (0, 6), (1, 10)] # B
y = sum(([num and int(rep >= 10)] * rep for num, rep in b), [])
From the start with re
Alternatively, from the start this looks like something re could do as it can work with bytes.
import re
x = [1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
y = list(re.sub(b'(?<!\x01)\x01{,9}(?!\x01)', lambda m: len(m.group()) * b'\x00', bytes(x)))
Both solutions output:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

If you wanna use Numpy here is one Vectorized approach:
ind = np.where(np.diff(np.concatenate(([0], np.where(np.diff(x) != 0)[0], [x.size]))) >= 10)[0] - 1
x[vrange(d[ind] + 1, d[ind + 1] + 2)] = 0
If you want to use Python, here is an approach using itertools.chain, itertools.repeat and itertools.groupby within a list-comprehension:
chain.from_iterable(repeat(0, len(i)) if len(i) >= 10 else i for i in [list(g) for _, g in groupby(x)])
Demos:
# Python
In [28]: list(chain.from_iterable(repeat(0, len(i)) if len(i) >= 10 else i for i in [list(g) for _, g in groupby(x)]))
Out[28]: [1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Numpy
In [161]: x = np.array([1,0,0,0,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1, 0, 0, 1, 1, 1, 1, 1, 1 ,1, 1, 1, 1, 0, 0])
In [162]: d = np.where(np.diff(x) != 0)[0]
In [163]: d2 = np.diff(np.concatenate(([0], d, [x.size])))
In [164]: ind = np.where(d2 >= 10)[0] - 1
In [165]: def vrange(starts, stops):
...: stops = np.asarray(stops)
...: l = stops - starts # Lengths of each range.
...: return np.repeat(stops - l.cumsum(), l) + np.arange(l.sum())
...:
In [166]: x[vrange(d[ind] + 1, d[ind + 1] + 2)] = 0
In [167]: x
Out[167]:
array([1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
For Vrange I used this answer https://codereview.stackexchange.com/questions/83018/vectorized-numpy-version-of-arange-with-multiple-start-stop but I think there might be more optimized approaches for that.

Try this:
y = []
for pair in b: ## b is the list which you called #B
add = 0
if pair[0] == 1 and pair[1] > 9:
add = 1
y.extend([add] * pair[1])

using Pandas:
import pandas as pd
In [130]: s = pd.Series(x)
In [131]: s
Out[131]:
0 1
1 0
2 0
3 0
4 1
..
20 1
21 1
22 1
23 1
24 1
Length: 25, dtype: int64
In [132]: s[s.groupby(s.ne(1).cumsum()).transform('count').lt(10)] = 0
In [133]: s.tolist()
Out[133]: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
In [134]: s
Out[134]:
0 0
1 0
2 0
3 0
4 0
..
20 1
21 1
22 1
23 1
24 1
Length: 25, dtype: int64
for your "huge" list it takes approx. 7 ms on my old notebook:
In [141]: len(x)
Out[141]: 5124
In [142]: %%timeit
...: s = pd.Series(x)
...: s[s.groupby(s.ne(1).cumsum()).transform('count').lt(10)] = 0
...: res = s.tolist()
...:
6.56 ms ± 16.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Vectorization to achieve performance

I want to avoid using for loop in the following code to achieve performance. Is vectorization suitable for this kind of problem?
a = np.array([[0,1,2,3,4],
[5,6,7,8,9],
[0,1,2,3,4],
[5,6,7,8,9],
[0,1,2,3,4]],dtype= np.float32)
temp_a = np.copy(a)
for i in range(1,a.shape[0]-1):
for j in range(1,a.shape[1]-1):
if a[i,j] > 3:
temp_a[i+1,j] += a[i,j] / 5.
temp_a[i-1,j] += a[i,j] / 5.
temp_a[i,j+1] += a[i,j] / 5.
temp_a[i,j-1] += a[i,j] / 5.
temp_a[i,j] -= a[i,j] * 4. / 5.
a = np.copy(temp_a)

You are basically doing convolution, with some special treatment for borders.
Try the following:
from scipy.signal import convolve2d
# define your filter
f = np.array([[0.0, 0.2, 0.0],
[0.2,-0.8, 0.2],
[0.0, 0.2, 0.0]])
# select parts of 'a' to be used for convolution
b = (a * (a > 3))[1:-1, 1:-1]
# convolve, padding with zeros ('same' mode)
c = convolve2d(b, f, mode='same')
# add the convolved result to 'a', excluding borders
a[1:-1, 1:-1] += c
# treat the special cases of the borders
a[0, 1:-1] += .2 * b[0, :]
a[-1, 1:-1] += .2 * b[-1, :]
a[1:-1, 0] += .2 * b[:, 0]
a[1:-1, -1] += .2 * b[:, -1]
It gives the following result, which is the same as you nested loops.
[[ 0. 2.2 3.4 4.6 4. ]
[ 6.2 2.6 4.2 3. 10.6]
[ 0. 3.4 4.8 6.2 4. ]
[ 6.2 2.6 4.2 3. 10.6]
[ 0. 2.2 3.4 4.6 4. ]]

My trail uses 3 filters, rot90, np.where, np.sum, and np.multiply. I am not sure which way to benchmark is more reasonable. If you do not take into account the time to create filters, it is roughly 4 times faster.
# Each filter basically does what `op` tries to achieve in a loop
filter1 = np.array([[0, 1 ,0, 0, 0],
[1, -4, 1, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]]) /5.
filter2 = np.array([[0, 0 ,1, 0, 0],
[0, 1, -4, 1, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]]) /5.
filter3 = np.array([[0, 0 ,0, 0, 0],
[0, 0, 1, 0, 0],
[0, 1, -4, 1, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 0]]) /5.
# only loop over the center of the matrix, a
center = np.array([[0, 0, 0, 0, 0],
[0, 1, 1, 1, 0],
[0, 1, 1, 1, 0],
[0, 1, 1, 1, 0],
[0, 0, 0, 0, 0]])
filter1 and filter2 can be rotated to represent 4 filters individually.
filter1_90_rot = np.rot90(filter1, k=1)
filter1_180_rot = np.rot90(filter1, k=2)
filter1_270_rot = np.rot90(filter1, k=3)
filter2_90_rot = np.rot90(filter2, k=1)
filter2_180_rot = np.rot90(filter2, k=2)
filter2_270_rot = np.rot90(filter2, k=3)
# Based on different index from `a` return different filter
filter_dict = {
(1,1): filter1,
(3,1): filter1_90_rot,
(3,3): filter1_180_rot,
(1,3): filter1_270_rot,
(1,2): filter2,
(2,1): filter2_90_rot,
(3,2): filter2_180_rot,
(2,3): filter2_270_rot,
(2,2): filter3
}
Main function
def get_new_a(a):
x, y = np.where(((a > 3) * center) > 0) # find pairs that match the condition
return a + np.sum(np.multiply(filter_dict[i, j], a[i,j])
for (i, j) in zip(x,y))
Note: There seem to be some numerical errors such that np.equal() would mostly return False between my result and OP's while np.close() would return true.
Timing results
def op():
temp_a = np.copy(a)
for i in range(1,a.shape[0]-1):
for j in range(1,a.shape[1]-1):
if a[i,j] > 3:
temp_a[i+1,j] += a[i,j] / 5.
temp_a[i-1,j] += a[i,j] / 5.
temp_a[i,j+1] += a[i,j] / 5.
temp_a[i,j-1] += a[i,j] / 5.
temp_a[i,j] -= a[i,j] * 4. / 5.
a2 = np.copy(temp_a)
%timeit op()
167 µs ± 2.72 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit get_new_a(a):
37.2 µs ± 2.68 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Note again, we ignore the time to create filter as I think it would be a one time thing. If you do want to include the time to create filters, it is roughly two times faster. You might think it is not fair becasue op's method contains two np.copy. The bottleneck of op's method, I think, is the for loop.
Reference:
numpy.multiply do a elementwise multiplication between two matrix.
np.rot90 does rotation for us. k is a parameter that you can decide how many times to rotate.
np.isclose can use this function to check whether two matrices are close within some error that you can define.

I came up with this solution:
a = np.array([[0,0,0,0,0],
[0,6,2,8,0],
[0,1,5,3,0],
[0,6,7,8,0],
[0,0,0,0,0]],dtype= np.float32)
up= np.zeros_like(a)
down= np.zeros_like(a)
right= np.zeros_like(a)
left = np.zeros_like(a)
def new_a(a,u,r,d,l):
c = np.copy(a)
c[c <= 3] = 0
up[:-2, 1:-1] += c[1:-1,1:-1] / 5.
down[2:, 1:-1] += c[1:-1,1:-1] / 5.
left[1:-1, :-2] += c[1:-1,1:-1]/ 5.
right[1:-1, 2:] += c[1:-1,1:-1] / 5.
a[1:-1,1:-1] -= c[1:-1,1:-1] * 4. / 5.
a += up + down + left + right
return a

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Would like to vectorize while loop for performance (updated) - python

Related

Conway's Game of Life: check if a cell is in the corner/border

Non-Assert Way To Compare Two 2D Arrays for Accuracy

Incrementally increase non zero elements in a list

Replace 1's with 0's in a sequence

Vectorization to achieve performance

Categories

Resources