Numba: how to speed up a numerical simulation that also requires a GUI - python

I was just starting to learn about Numba to speed up for loops.
I've read that it is impossible to call a non-jitted function from a Numba-jitted function. Therefore I don't think I can @jitclass(spec) my class or @njit the main algorithm function (compute()) while leaving my code as it is, since every step of the simulation (onestep()) also changes pixel values in the image tkinter.PhotoImage, which is a Python type. So, I was wondering whether:
there is any possible logical change to the program that would separate the GUI and the numerical part enough to allow Numba to be applied;
there is any alternative to Tkinter that is compatible with Numba;
there is any alternative to Numba which I may benefit from.
Here is a simplified version of my code for now:
import tkinter as tk
import numpy as np

window = tk.Tk()
window.geometry("600x600")
canv_w = 480
square_w = 16  # size of one element of the matrix
canvas = tk.Canvas(window, width=480, height=480)
canvas.pack()
my_image = tk.PhotoImage(width=480, height=480)
canvas.create_image((3, 3), image=my_image, anchor="nw", state="normal")
running = 0

def pixel(self, i, j):
    if self.matrix[i, j] == -1:
        temp = "#cc0000"  # red
    elif self.matrix[i, j] == 0:
        temp = "#fffafa"  # white
    elif self.matrix[i, j] == 1:
        temp = "#7CFC00"  # green
    my_image.put(temp, to=(i*square_w, j*square_w, (i+1)*square_w, (j+1)*square_w))

class myClass:
    def __init__(self, size):
        self.L = size
        self.matrix = np.random.choice([-1, 0, 1], (self.L, self.L), p=[0.45, 0.1, 0.45])
        self.white_number = len(np.where(self.matrix == 0)[0])
        self.iteration = 0
        for i in range(self.L):
            for j in range(self.L):
                pixel(self, i, j)

    def onestep(self):
        whites = np.where(self.matrix == 0)  # find position of all white squares
        my_v = np.random.choice(self.white_number)  # randomly pick one white square...
        x = whites[0][my_v]
        y = whites[1][my_v]
        num = np.random.choice([0, 1, 2, 3])  # ...randomly pick one of its 4 neighbours
        neighbour = [[(x + 1) % self.L, y], [x, (y + 1) % self.L], [(x - 1) % self.L, y], [x, (y - 1) % self.L]]
        # swap with neighbour
        self.matrix[x, y] = self.matrix[neighbour[num][0], neighbour[num][1]]
        self.matrix[neighbour[num][0], neighbour[num][1]] = 0
        pixel(self, x, y)  # update the pixel the white square has left
        pixel(self, neighbour[num][0], neighbour[num][1])  # update the pixel the white atom has jumped to

    def compute(self):
        if running:
            for j in range(1, self.white_number + 1):
                self.onestep()
            self.iteration += 1
        window.after(1000, self.compute)

running = 1
myObj = myClass(30)
myObj.compute()
window.mainloop()

there is any alternative to Numba which I may benefit from.
Cython exists and is more mature than Numba, but it requires a compiler: you build compiled binaries ahead of time rather than JIT-compiling functions. It provides static typing and removes the interpreter overhead.
there is any possible logical change to the program which would separate GUI and numerical part enough to allow Numba to be applied
You actually can call a non-jitted function from jitted code using Numba's objmode, but it still has some constraints. Instead, consider splitting each jitted function into a jitable part and a non-jitable part, as shown in the full example further below.
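For reference, the objmode escape hatch looks roughly like this (a minimal sketch; update_gui is a hypothetical placeholder for any interpreter-only call such as Tkinter drawing):

from numba import njit, objmode

@njit
def stepper(matrix):
    # ... heavy numeric work compiled by Numba ...
    with objmode():           # temporarily fall back to the interpreter
        update_gui(matrix)    # any pure-Python call is allowed inside the block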
import tkinter as tk
import numpy as np
import numba

window = tk.Tk()
window.geometry("600x600")
canv_w = 480
square_w = 16  # size of one element of the matrix
canvas = tk.Canvas(window, width=480, height=480)
canvas.pack()
my_image = tk.PhotoImage(width=480, height=480)
canvas.create_image((3, 3), image=my_image, anchor="nw", state="normal")
running = 0

def pixel(matrix, i, j):
    if matrix[i, j] == -1:
        temp = "#cc0000"  # red
    elif matrix[i, j] == 0:
        temp = "#fffafa"  # white
    elif matrix[i, j] == 1:
        temp = "#7CFC00"  # green
    my_image.put(temp, to=(i * square_w, j * square_w, (i + 1) * square_w, (j + 1) * square_w))

@numba.njit
def _onestep(matrix, white_number, L):
    whites = np.where(matrix == 0)  # find position of all white squares
    my_v = np.random.choice(white_number)  # randomly pick one white square...
    x = whites[0][my_v]
    y = whites[1][my_v]
    num = np.random.choice(np.array((0, 1, 2, 3)))  # ...randomly pick one of its 4 neighbours
    neighbour = [[(x + 1) % L, y], [x, (y + 1) % L], [(x - 1) % L, y], [x, (y - 1) % L]]
    # swap with neighbour
    matrix[x, y] = matrix[neighbour[num][0], neighbour[num][1]]
    matrix[neighbour[num][0], neighbour[num][1]] = 0
    return x, y, neighbour[num][0], neighbour[num][1]

class myClass:
    def __init__(self, size):
        self.L = size
        self.matrix = np.random.choice([-1, 0, 1], (self.L, self.L), p=[0.45, 0.1, 0.45])
        self.white_number = len(np.where(self.matrix == 0)[0])
        self.iteration = 0
        for i in range(self.L):
            for j in range(self.L):
                pixel(self.matrix, i, j)

    def onestep(self):
        x, y, z1, z2 = _onestep(self.matrix, self.white_number, self.L)
        pixel(self.matrix, x, y)  # update the pixel the white square has left
        pixel(self.matrix, z1, z2)  # update the pixel the white atom has jumped to

    def compute(self):
        if running:
            for j in range(1, self.white_number + 1):
                self.onestep()
            self.iteration += 1
        window.after(1000, self.compute)

running = 1
myObj = myClass(30)
myObj.compute()
window.mainloop()
Here onestep is not jitted, while _onestep, which does the heavy lifting, is jitted.
There is some speedup (11 ms per frame with Numba vs 19 ms without), but most of the time in your program is spent drawing, not computing, so it won't benefit from any more "compiling".
The better approach would be to store all your screen data in a 2D array, manipulate it in Numba, then redraw your entire screen in Python at once instead of part by part, as sketched below.
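A minimal sketch of that whole-frame idea (my addition, reusing the matrix and my_image from the code above; redraw is a hypothetical helper, relying on Tk accepting a frame as one string of brace-delimited rows):

PALETTE = ("#cc0000", "#fffafa", "#7CFC00")  # red, white, green for -1, 0, 1

def redraw(matrix, image, square_w):
    # Build the whole frame as one Tk color-data string: each braced group
    # is a row of pixels, with colors separated by spaces.
    rows = []
    for j in range(matrix.shape[1]):
        row = " ".join(PALETTE[matrix[i, j] + 1]
                       for i in range(matrix.shape[0])
                       for _ in range(square_w))       # widen each cell to square_w pixels
        rows.append("{" + row + "}")
    data = " ".join(row for row in rows for _ in range(square_w))  # repeat each row square_w times
    image.put(data, to=(0, 0))  # one put() call for the entire frame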
Cython would probably be able to get more optimized code out of this, as it can mix Python and non-Python objects in the same function and remove the loop overhead, but writing code for it is harder than for Numba.

Related

Is there a way to make a matplotlib grid interactable using motion_notify_event or the mouse package?

I'm making an implementation of Conway's Game of Life using matplotlib and numpy, but I'm having lots of trouble figuring out how to make the grid interactive. Basically, whenever I click, the code checks whether the mouse is on the grid (off the grid it would return the tuple None, None); if it is, the code uses the mouse's position to set the pixel of the grid closest to the mouse to ON (255). I tried to implement this using this function:
def mouse_move(event):
    xy = event.x, event.y
    return xy
and then, in the function that updates the grid:
def update(frameNum, img, grid, N, msx=0, msy=0):
    # copy grid because we need 8 neighbors
    # and we go line by line
    newGrid = grid.copy()
    # this is the part where I use the mouse pos
    if ms.is_pressed('left') == True:
        msx, msy = plt.connect('motion_notify_event', mouse_move)
        print(msx, msy)
        newGrid[msx, msy] = ON
    # this is the end of the part where I use the mouse pos
    for i in range(N):
        for j in range(N):
            # compute 8-neighbor sum using toroidal boundary conditions
            # x and y wrap around so that sim is toroidal
            total = int((grid[i, (j-1)%N] + grid[i, (j+1)%N] +
                         grid[(i-1)%N, j] + grid[(i+1)%N, j] +
                         grid[(i-1)%N, (j-1)%N] + grid[(i-1)%N, (j+1)%N] +
                         grid[(i+1)%N, (j-1)%N] + grid[(i+1)%N, (j+1)%N])/255)
            # apply rules
            if grid[i, j] == ON:
                if (total < 2) or (total > 3):
                    newGrid[i, j] = OFF
            else:
                if (total == 3):
                    newGrid[i, j] = ON
    # update data
    img.set_data(newGrid)
    grid[:] = newGrid[:]
    return img
The issue here is that plt.connect() was only returning 1, 0 for some reason, and not the mouse's position.
I then tried to ditch motion_notify_event altogether and opted to use the mouse package:
def get_ms():
    # the pixel dimensions of the grid are (200, 140) and (575, 510) on my screen
    x, y = ms.get_position()
    if (x > 575 or x < 200) and (y > 510 or y < 140):
        return None, None
    else:
        return x, y
But this is where I'm stumped. Even though I have my mouse's pixel position, it isn't the grid position that I want. I know there has to be a way to do this with plt.connect('motion_notify_event', mouse_move) but I don't know what it is. I've already used basic debugging techniques. Am I doing something wrong?
Also, I've included the entire update() function in case I messed up there.
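(Side note, not part of the original question: Matplotlib's connect() returns a connection id, which explains the single value seen above; positions are read from the event object passed to the callback, and event.xdata/event.ydata are already in data, i.e. grid, coordinates. A minimal sketch, assuming a figure created with plt.subplots():)

import matplotlib.pyplot as plt

fig, ax = plt.subplots()

def mouse_move(event):
    if event.inaxes is None:  # mouse is outside the axes
        return
    # xdata/ydata are data coordinates, so snap to the nearest grid cell
    col, row = int(round(event.xdata)), int(round(event.ydata))
    print(col, row)

cid = fig.canvas.mpl_connect('motion_notify_event', mouse_move)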

Iterating over regions of numpy array in parallel

I have a 3D array and need to iterate over it, extract a 2x2x2-voxel-large region at each position, and check whether any voxel in that region is non-zero. Of these locations, I need the unique elements of the region and the index:
import time
import numpy as np

np.random.seed(1234)

def _naive_iterator(array):
    lookup = np.pad(array, (1, 1), 'constant')  # border is filled with 0
    nx, ny, nz = lookup.shape
    for i in range(nx - 1):
        for j in range(ny - 1):
            for k in range(nz - 1):
                n = lookup[i:i + 2, j:j + 2, k:k + 2]
                if n.any():  # check if any value in the region is non-zero
                    yield n.ravel(), i, j, k
                    # yield set(n.ravel()), i, j, k  # `set()` alone takes some time - for testing purposes exclude this

# arrays that shall be used here are in the region of (1000, 1000, 1000) or larger
# arrays are asserted to contain only integer values >= 0.
array = np.random.randint(0, 2, (200, 200, 200), dtype=np.uint8)

for fun in (_naive_iterator, ):
    print(f"* {fun}")
    for _ in range(2):
        tic = time.time()
        [x for x in fun(array)]
        print(f" ** execution took {time.time() - tic}")
On my PC, this loop takes about 24 s to run. (Interesting side note: without the n.any(), the loop needs only 8 s, so maybe there is some optimization potential there as well?)
I thought about how I could make this faster, potentially by running it in parallel, but I cannot figure out how to do that without pre-generating all the 2x2x2 arrays.
I also thought about using scipy.ndimage.generic_filter, but with that I can only get an image that has, for example, 1 on all pixels I want to include; I would then have to iterate over the original image to get n.ravel(). (Ideally, one would use generic_filter directly, but I cannot get the index inside the called function.)
How can I speed up this loop, potentially by parallelizing the iteration?
without the n.any(), the loop needs only 8s, so maybe there is some optimization potential as well?
This is because Numpy functions have a big overhead for very small arrays like 2x2x2. The overhead of a Numpy call is about a few microseconds, while the actual n.any() computation should take no more than a dozen nanoseconds on a mainstream processor. The usual solution is to vectorize the operation so as to avoid many Numpy calls. You can use Numba to speed up this code and remove most of the CPython/Numpy overheads. Note that Numba does not support every function (np.pad, for example, is currently unsupported), so a workaround is needed. Here is the resulting code:
import time
import numpy as np
import numba as nb

np.random.seed(1234)

@nb.njit('(uint8[:,:,::1],)')
def numba_iterator(lookup):
    nx, ny, nz = lookup.shape
    for i in range(nx - 1):
        for j in range(ny - 1):
            for k in range(nz - 1):
                n = lookup[i:i + 2, j:j + 2, k:k + 2]
                if n.any():
                    yield n.ravel(), i, j, k

array = np.random.randint(0, 2, (200, 200, 200), dtype=np.uint8)

for fun in (numba_iterator, ):
    print(f"* {fun}")
    for _ in range(2):
        tic = time.time()
        lookup = np.pad(array, (1, 1), 'constant')  # border is filled with 0
        [x for x in fun(lookup)]
        print(f" ** execution took {time.time() - tic}")
This is several times faster on my machine (but still quite slow).
I thought about how I could make this faster, potentially by running it in parallel.
This is not possible as long as the yield is used, since generators are inherently sequential.
How can I speed up this loop
One solution could be to generate the whole output as a Numpy array in Numba, so as to avoid the creation of 8 million Numpy objects stored in a CPython list, which is the main source of slowdown in the code once it is optimized with Numba (each call to n.ravel creates a new array). Note that generators are generally slow, since they often require a context switch (of a kind of lightweight thread / coroutine). The best solution in terms of performance is to compute data on the fly in the loop.
Additionally, n.any and n.ravel can be manually rewritten in Numba so as to be more efficient. Indeed, the n array views are very small, and using 3 nested loops with a constant compile-time bound helps the compiler to produce fast code (i.e. it can unroll the loops and generate only a few instructions that the processor can execute very efficiently).
Here is a modified, improved code (that computes the padded array manually):
@nb.njit('(uint8[:,:,::1],)')
def fast_compute(array):
    nx, ny, nz = array.shape

    # Padding (with zeros)
    lookup = np.zeros((nx+2, ny+2, nz+2), dtype=np.uint8)
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                lookup[i+1, j+1, k+1] = array[i, j, k]

    # Actual computation
    size = (nx + 1) * (ny + 1) * (nz + 1)
    result = np.empty((size, 8), dtype=np.uint8)
    indices = np.empty((size, 3), dtype=np.uint32)
    cur = 0
    for i in range(nx + 1):
        for j in range(ny + 1):
            for k in range(nz + 1):
                n = lookup[i:i+2, j:j+2, k:k+2]
                # Fast manual n.any()
                found = False
                for i2 in range(2):
                    for j2 in range(2):
                        for k2 in range(2):
                            found |= n[i2, j2, k2]
                if found:
                    # Fast manual n.ravel()
                    cur2 = 0
                    for i2 in range(2):
                        for j2 in range(2):
                            for k2 in range(2):
                                result[cur, cur2] = n[i2, j2, k2]
                                cur2 += 1
                    indices[cur, 0] = i
                    indices[cur, 1] = j
                    indices[cur, 2] = k
                    cur += 1
    return result[:cur].reshape(cur, 2, 2, 2), indices[:cur]
The resulting code is quite big, but this is the price to pay for high-performance computing.
As pointed out by @norok2, result[:cur] and indices[:cur] are views referencing the full arrays. The views can be quite small compared to the allocated arrays. If this is a problem, you can return a copy (e.g. result[:cur].copy()) so as to avoid a possible memory overconsumption. In practice, it should not be a problem, since the arrays are allocated in virtual memory and only the written pages are mapped to physical memory on mainstream systems (e.g. Windows & Linux). Pages of virtual memory are only mapped to physical memory on first touch (i.e. when items are written for the first time). Modern platforms can allocate huge amounts of virtual memory (e.g. 131072 GiB on my mainstream x86-64 Windows, and even more on mainstream x86-64 Linux), while physical memory is much scarcer (e.g. 16 GiB on my machine). The underlying array is freed when no view references it anymore.
Benchmark
_naive_iterator: 21.25 s
numba_iterator: 8.10 s
get_windows_and_indices: 1.35 s
fast_compute: 0.13 s
The last Numba function is 163 times faster than the initial one and 10 times faster than the vectorized Numpy implementation of @flawr.
The Numba implementation could certainly be multi-threaded, but that is not easy to do, since threads need to write to the output and the location of the written items (i.e. cur) depends on the other threads. Moreover, it would make the code significantly more complex.
Whenever you're working with numpy, you should try to avoid explicit loops. These loops are written in Python and are therefore usually slower than anything you can do with vectorization, which defers the looping to the underlying C functions that are pretty much as fast as anything can be. So I would approach your problem with something like the following. This function does roughly the same thing as your _naive_iterator, but in a vectorized manner and without any Python loops:
from numpy.lib.stride_tricks import sliding_window_view

def get_windows_and_indices(array):
    lookup = np.pad(array, (1, 1), 'constant')  # border is filled with 0
    nx, ny, nz = lookup.shape
    x, y, z = np.mgrid[0:nx, 0:ny, 0:nz]
    lookup = np.stack([lookup, x, y, z])
    out = sliding_window_view(lookup, (2, 2, 2), axis=(1, 2, 3)).reshape(4, -1, 2, 2, 2)
    windows = out[0, ...]
    ic = out[1, ..., 0, 0, 0]
    jc = out[2, ..., 0, 0, 0]
    kc = out[3, ..., 0, 0, 0]
    mask = windows.any(axis=(1, 2, 3))
    return windows[mask], ic[mask], jc[mask], kc[mask]
Of course you will also need to think about the rest of the code a little bit differently, but vectorization is really something you need to get used to if you want to work with numpy efficiently.
Also, I'm pretty sure that even the function above is not optimal and can definitely be improved further.
The simplest approach to speeding up your code while retaining its features is Numba. I assume the padding to be essentially a decorating step, and I will deal with it separately at the end of the answer.
Here is a cleaner implementation of the originally proposed code, together with a naïve Numba acceleration of it:
import numpy as np
import numba as nb

def i_cubicles_3d_set_OP(arr, size=2):
    nx, ny, nz = arr.shape
    nx += 1 - size
    ny += 1 - size
    nz += 1 - size
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                window = arr[i:i + size, j:j + size, k:k + size]
                if window.any():
                    yield set(window.ravel()), (i, j, k)

i_cubicles_3d_set_OP_nb = nb.njit(i_cubicles_3d_set_OP)
i_cubicles_3d_set_OP_nb.__name__ = "i_cubicles_3d_set_OP_nb"
If one is interested in a dimension-agnostic version (which comes at the cost of some speed), one could write:
def i_cubicles_set_nb(arr, size=2):
    window = (size,) * arr.ndim
    window_size = size ** arr.ndim
    reduced_shape = tuple(dim - size + 1 for dim, size in zip(arr.shape, window))
    view = np.lib.stride_tricks.as_strided(
        arr, shape=reduced_shape + window, strides=arr.strides * 2, writeable=False)
    return _i_cubicles_set_nb(view.reshape((-1, window_size)), reduced_shape)

@nb.njit
def unravel_index(x, shape):
    result = np.zeros(len(shape), dtype=np.int_)
    for i, dim in enumerate(shape[::-1], 1):
        result[-i] = x % dim
        x //= dim
    return result

@nb.njit
def not_only_zeros(seq):
    # assumes seq is not empty
    count = 0
    for x in seq:
        if x == 0:
            count += 1
            break  # because only unique values
    return len(seq) != count

@nb.njit
def _i_cubicles_set_nb(arr, shape):
    for i, x in enumerate(arr):
        uniques = set(x)
        if not_only_zeros(uniques):
            yield uniques, unravel_index(i, shape)
This introduces the important trick of generating a strided (read-only) view of the input, which conceptually simplifies all the looping, at the cost of having to manually unravel the index.
This is a similar idea to the one proposed in @flawr's answer.
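(Side note, my addition: on NumPy >= 1.20, sliding_window_view() builds the same read-only windowed view without manual stride arithmetic:)

from numpy.lib.stride_tricks import sliding_window_view

# same shape as the as_strided() view above: reduced_shape + (2, 2, 2)
view = sliding_window_view(arr, (2, 2, 2))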
On a 50³-sized input, I get the following timings:
np.random.seed(42)

n = 50
arr = np.random.randint(0, 3, (n, n, n), dtype=np.uint8)

def is_equal_i_set(a, b):
    return all(x[0] == y[0] and np.allclose(x[1], y[1]) for x, y in zip(a, b))

funcs = i_cubicles_3d_set_OP_nb, i_cubicles_3d_set_OP, i_cubicles_set_nb

base = list(funcs[0](arr))
for func in funcs:
    res = list(func(arr))
    print(f"{func.__name__:>24} {is_equal_i_set(base, res)!s:>5}", end=' ')
    # %timeit -n 1 -r 1 list(func(arr))
    %timeit list(func(arr))
# i_cubicles_3d_set_OP_nb  True  1 loop, best of 5: 130 ms per loop
#    i_cubicles_3d_set_OP  True  1 loop, best of 5: 776 ms per loop
#       i_cubicles_set_nb  True  10 loops, best of 5: 151 ms per loop
This indicates that the use of Numba is quite effective.
No uniques
If one is willing to forgo the requirement of returning only the unique elements inside a cubicle, replacing them with all the elements inside the cubicle, one does gain some (but not much) speed:
@nb.njit
def i_cubicles_3d_nb(arr, size=2):
    nx, ny, nz = arr.shape
    nx += 1 - size
    ny += 1 - size
    nz += 1 - size
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                window = arr[i:i + size, j:j + size, k:k + size]
                if window.any():
                    yield window.ravel(), (i, j, k)

def i_cubicles_nb(arr, size=2):
    window = (size,) * arr.ndim
    window_size = size ** arr.ndim
    reduced_shape = tuple(dim - size + 1 for dim, size in zip(arr.shape, window))
    view = np.lib.stride_tricks.as_strided(
        arr, shape=reduced_shape + window, strides=arr.strides * 2, writeable=False)
    return _i_cubicles_nb(view.reshape((-1, window_size)), reduced_shape)

@nb.njit
def unravel_index(x, shape):
    result = np.zeros(len(shape), dtype=np.int_)
    for i, dim in enumerate(shape[::-1], 1):
        result[-i] = x % dim
        x //= dim
    return result

@nb.njit
def any_nb(arr):
    for x in arr:
        if x:
            return True
    return False

@nb.njit
def _i_cubicles_nb(arr, shape):
    for i, x in enumerate(arr):
        if any_nb(x):
            yield x, unravel_index(i, shape)
as evidenced by the following benchmark (on the same 50³-sized input as before):
def is_equal_i(a, b):
    return all(np.allclose(x[0], y[0]) and np.allclose(x[1], y[1]) for x, y in zip(a, b))

funcs = i_cubicles_3d_nb, i_cubicles_nb

base = list(funcs[0](arr))
for func in funcs:
    res = list(func(arr))
    print(f"{func.__name__:>24} {is_equal_i(base, res)!s:>5}", end=' ')
    # %timeit -n 1 -r 1 list(func(arr))
    %timeit list(func(arr))
    # print()
# i_cubicles_3d_nb  True  10 loops, best of 5: 116 ms per loop
#    i_cubicles_nb  True  10 loops, best of 5: 125 ms per loop
No yield (and no uniques)
While it is clear that a function matching the OP's output exactly can only be made faster with Numba / Cython, a number of fast approaches can be obtained by forgoing some features of the OP's code.
In particular, when creating generators, a significant amount of time is spent creating the actual objects to yield.
The same information can be returned (and, most importantly, allocated) all at once, with a substantial speed gain, especially if we skip creating the containers for computing the unique elements.
Once we accept returning all elements inside a cubicle instead of its unique elements, it is also possible to devise NumPy-only vectorized (fast and dimension-agnostic) approaches, alongside faster Numba (3D-specific) implementations:
def cubicles_np(arr, size=2):
    window = (size,) * arr.ndim
    window_size = size ** arr.ndim
    reduced_shape = tuple(dim - size + 1 for dim, size in zip(arr.shape, window))
    view = np.lib.stride_tricks.as_strided(
        arr, shape=reduced_shape + window, strides=arr.strides * 2, writeable=False)
    mask = np.any(view, axis=tuple(range(-arr.ndim, 0)))
    return view[mask, ...], np.array(np.nonzero(mask)).transpose()

def cubicles_tr_np(arr, size=2):
    window = (size,) * arr.ndim
    window_size = size ** arr.ndim
    reduced_shape = tuple(dim - size + 1 for dim, size in zip(arr.shape, window))
    view = np.lib.stride_tricks.as_strided(
        arr, shape=window + reduced_shape, strides=arr.strides * 2, writeable=False)
    mask = np.any(view, axis=tuple(range(arr.ndim)))
    return (
        view[..., mask].reshape((window_size, -1)).transpose().reshape((-1, *window)),
        np.array(np.nonzero(mask)).transpose())

def cubicles_nb(arr, size=2):
    window = (size,) * arr.ndim
    window_size = size ** arr.ndim
    reduced_shape = tuple(dim - size + 1 for dim, size in zip(arr.shape, window))
    view = np.lib.stride_tricks.as_strided(
        arr, shape=reduced_shape + window, strides=arr.strides * 2, writeable=False)
    values, indexes = _cubicles_nb(view.reshape((-1, window_size)), reduced_shape, arr.ndim)
    return values.reshape((-1, *window)), indexes

@nb.njit
def any_nb(arr):
    for x in arr:
        if x:
            return True
    return False

@nb.njit
def _cubicles_nb(arr, shape, ndim):
    n, k = arr.shape
    indexes = np.empty((n, ndim), dtype=np.int_)  # must be an integer dtype (np.bool_ would corrupt the indexes)
    result = np.empty((n, k), dtype=arr.dtype)
    count = 0
    for i in range(n):
        x = arr[i]
        if any_nb(x):
            indexes[count] = unravel_index(i, shape)
            result[count] = x
            count += 1
    return result[:count].copy(), indexes[:count].copy()

@nb.njit
def any_cubicle_3d_nb(arr, size):
    for i in range(size):
        for j in range(size):
            for k in range(size):
                if arr[i, j, k]:
                    return True
    return False

@nb.njit
def cubicles_3d_nb(arr, size=2):
    nx, ny, nz = arr.shape
    nx += 1 - size
    ny += 1 - size
    nz += 1 - size
    nn = nx * ny * nz
    indexes = np.empty((nn, 3), dtype=np.int_)  # must be an integer dtype (np.bool_ would corrupt the indexes)
    result = np.empty((nn, size, size, size), dtype=arr.dtype)
    count = 0
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                x = arr[i:i + size, j:j + size, k:k + size]
                if any_cubicle_3d_nb(x, size):
                    result[count] = x
                    indexes[count] = i, j, k
                    count += 1
    return result[:count].copy(), indexes[:count].copy()
The timings, obtained again on the same 50³-sized input, do indicate for the Numba-based approaches that spelling out the loops is significantly faster than looping through a view.
In fact, without explicitly looping along the dimensions, the NumPy-only approaches can be faster than the Numba-accelerated one.
Note that cubicles_3d_nb() can be seen essentially as a cleaned-up version of @JérômeRichard's answer.
(Actually, the timing of @JérômeRichard's fast_compute() on my machine and input, with the addition of the extra .copy(), seems to indicate that cubicles_3d_nb() is more efficient, possibly because of the short-circuiting in the "any" code and because there is no need to ravel the values manually.)
def is_equal(a, b):
    return all(np.allclose(x[0], y[0]) and np.allclose(x[1], y[1]) for x, y in zip(a, b))

funcs = cubicles_3d_nb, cubicles_nb, cubicles_np, cubicles_tr_np

base = funcs[0](arr)
for func in funcs:
    res = func(arr)
    print(f"{func.__name__:>24} {is_equal(base, res)!s:>5}", end=' ')
    %timeit func(arr)
#   cubicles_3d_nb  True  100 loops, best of 5: 3.82 ms per loop
#      cubicles_nb  True  10 loops, best of 5: 23 ms per loop
#      cubicles_np  True  10 loops, best of 5: 24.7 ms per loop
#   cubicles_tr_np  True  100 loops, best of 5: 16.5 ms per loop
Notes on indexes
If the result is to be given all at once, then the indexes themselves are not a particularly efficient way of storing the information on where the non-zero cubicles are, unless there are few of them.
Instead, a boolean array is more memory efficient.
The indexing requires index_size * ndim * num bits (num being the number of non-zero cubicles, bounded by 0 < num < prod(shape)).
The masking requires bool_size * prod(shape) bits.
For NumPy, bool_size = 8 while index_size = 64 (it can be tweaked, but is typically at least 16), so: index_size = bool_size * k.
So the indexing is more efficient as long as:
num < prod(shape) // (k * ndim)
For 3D and a typical index_size = 64, this means that (num / prod(shape)) < (1 / 24), so indexing is efficient only when non-zero cubicles make up ~5% or less of all positions.
Speed-wise, using a boolean mask instead of the indexes could lead to implementations that are faster by a small but fair margin (~5% to ~20%), as long as the non-zero cubicles are not too few.
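A quick sanity check of that break-even point (my addition; assumes 64-bit indexes and 1-byte bools):

import numpy as np

shape = (50, 50, 50)
ndim, index_size, bool_size = 3, 64, 8
k = index_size // bool_size                # 8
threshold = np.prod(shape) // (k * ndim)   # 125000 // 24 = 5208
print(threshold / np.prod(shape))          # ~0.042, i.e. the ~5% figure above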
Addendum: Padding
While np.pad() is not supported by Numba, it is quite simple to call any padding function outside of Numba.
Additionally, for some combinations of inputs, np.pad() is slower than a simple assignment into a sliced output:
import numpy as np
import numba as nb

@nb.njit
def pad_3d_nb(arr, size=1):
    nx, ny, nz = arr.shape
    result = np.zeros((nx + 2 * size, ny + 2 * size, nz + 2 * size), dtype=arr.dtype)
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                result[i + size, j + size, k + size] = arr[i, j, k]
    return result

def const_pad(arr, size=1, value=0):
    shape = tuple(dim + 2 * size for dim in arr.shape)
    mask = tuple(slice(size, dim + size) for dim in arr.shape)
    result = np.full(shape, value, dtype=arr.dtype)
    result[mask] = arr
    return result

np.random.seed(42)

n = 200
k = 10
arr = np.random.randint(0, 3, (n, n, n), dtype=np.uint8)

base = np.pad(arr, (k, k))
print(np.allclose(pad_3d_nb(arr, k), base))
# True
print(np.allclose(const_pad(arr, k), base))
# True

%timeit np.pad(arr, (k, k))
# 100 loops, best of 5: 3.01 ms per loop
%timeit pad_3d_nb(arr, k)
# 100 loops, best of 5: 11.5 ms per loop
%timeit const_pad(arr, k)
# 100 loops, best of 5: 2.53 ms per loop

How can I transfer the data that my program generates to a new turtle program to use?

Seen below is the code for my iteration program. I want to be able to use turtle graphics to take each parameter (k) and have the equation's output plotted against its corresponding k value. This should create a Feigenbaum diagram, if I'm not mistaken. My problem is: how can I get a turtle to plot these points for each k value, then connect them to the points from the neighbouring k values, and so on?
def iteration(xstore):
    global x0
    x0 = xstore
    print(x0)

x0 = float(input("x0:"))
n = float(input("max parameter value:"))
divison = float(input("divisons between parameters:"))
xv = x0
x1 = 0
k = 0
while k < (n + divison):
    print("K VALUE:" + str(k))
    for i in range(0, 20):
        x1 = x0 + x0 * k * (1 - x0)
        iteration(x1)
    print("________________________")
    x0 = xv
    k = k + divison
Here is a Feigenbaum diagram generated using tkinter. It is from the "Open Book Project", Visualizing Chaos.
The program source is here; I converted it to Python 3 and posted it hereunder. There is a lot for you to learn by reading and understanding this code.
#
# chaos-3.py
#
# Build Feigenbaum Logistic map. Input start and end K
#
# python chaos-3.py 3.4 3.9
#

canWidth = 500
canHeight = 500

def setupWindow():
    global win, canvas
    from tkinter import Tk, Canvas, Frame
    win = Tk()
    canvas = Canvas(win, height=canHeight, width=canWidth)
    f = Frame(win)
    canvas.pack()
    f.pack()

def startApp():
    global win, canvas
    import sys
    # k1 = float(sys.argv[1])  # starting value of K
    # k2 = float(sys.argv[2])  # ending value of K
    x = .2                     # is somewhat arbitrary
    vrng = range(200)          # We'll do 200 horz steps
    for t in range(canWidth):
        win.update()
        k = k1 + (k2 - k1) * t / canWidth
        # print("K = %.04f" % k)
        for i in vrng:
            p = x * canHeight
            canvas.create_line(t, p, t, p + 1)  # just makes a pixel dot
            x = x * (1 - x) * k                 # next x value
            if x <= 0 or x >= 1.0:
                # print("overflow at k", k)
                return

def main():
    setupWindow()   # Create Canvas with Frame
    startApp()      # Start up the display
    win.mainloop()  # Just wait for user to close graph

k1 = 2.9
k2 = 3.8
main()
how can I get a turtle to plot these points for each k value
Here's a simple, crude, slow example I worked out using Python turtle:
from turtle import Screen, Turtle

WIDTH, HEIGHT = 800, 400

Kmin = 2.5
Kmax = 3.8

x = 0.6

screen = Screen()
screen.setup(WIDTH, HEIGHT)
screen.setworldcoordinates(Kmin, 0.0, Kmax, 1.0)
screen.tracer(False)

turtle = Turtle()
turtle.hideturtle()
turtle.penup()

k = Kmin
while k < Kmax:
    for _ in range(HEIGHT//4):
        x *= (1.0 - x) * k
        turtle.goto(k, x)
        turtle.dot(2)
        x *= 1 + 1/(HEIGHT//4)
    k *= 1 + 1/WIDTH

screen.tracer(True)
screen.exitonclick()
I hope it gives you some ideas about plotting functions using a turtle. (Of course, using matplotlib with numpy usually works out better in the end.)
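For comparison (my addition, not part of the original answer), a vectorized numpy/matplotlib version of the same diagram can be sketched like this:

import numpy as np
import matplotlib.pyplot as plt

k = np.linspace(2.5, 3.8, 800)  # one column of k values
x = np.full_like(k, 0.6)
for _ in range(200):            # let the transients die out first
    x = k * x * (1.0 - x)
for _ in range(100):            # then plot the attractor
    x = k * x * (1.0 - x)
    plt.plot(k, x, ',k')        # ',' draws single-pixel markers
plt.xlabel('k')
plt.ylabel('x')
plt.show()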

Can I vectorise this python code?

I have written this Python code to get the neighbours of a label (a set of pixels sharing some common properties). The neighbours of a label are defined as the other labels that lie on the other side of the boundary (neighbouring labels share a boundary). The code I wrote works, but it is extremely slow:
# segments: It is a 2-dimensional numpy array (an image really)
# where segments[x, y] = label_index. So each entry defines the
# label associated with a pixel.
# i: The label whose neighbours we want.
def get_boundaries(segments, i):
    neighbors = []
    for y in range(1, segments.shape[1]):
        for x in range(1, segments.shape[0]):
            # Check if current index has the label we want
            if segments[x-1, y] == i:
                # Check if neighbour in the x direction has
                # a different label
                if segments[x-1, y] != segments[x, y]:
                    neighbors.append(segments[x, y])
            # Check if neighbour in the y direction has
            # a different label
            if segments[x, y-1] == i:
                if segments[x, y-1] != segments[x, y]:
                    neighbors.append(segments[x, y])
    return np.unique(np.asarray(neighbors))
As you can imagine, I have probably completely misused python here. I was wondering if there is a way to optimize this code to make it more pythonic.
Here you go:
def get_boundaries2(segments, i):
    x, y = np.where(segments == i)  # where i is
    right = x + 1
    rightMask = right < segments.shape[0]  # keep in bounds
    down = y + 1
    downMask = down < segments.shape[1]
    rightNeighbors = segments[right[rightMask], y[rightMask]]
    downNeighbors = segments[x[downMask], down[downMask]]
    neighbors = np.union1d(rightNeighbors, downNeighbors)
    return neighbors
As you can see, there are no Python loops at all; I also tried to minimize copies (my first attempt made a copy of segments with a NaN border, but then I devised the "keep in bounds" check).
Note that I did not filter out i itself from the neighbors here; you can add that easily at the end if you want (see the one-liner after the timings). Some timings:
Input 2000x3000: original takes 13 seconds, mine takes 370 milliseconds (35x speedup).
Input 1000x300: original takes 643 ms, mine takes 17.5 ms (36x speedup).
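For instance (my addition, assuming you want the label itself excluded):

neighbors = neighbors[neighbors != i]  # drop the label i from its own neighbour list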
You need to replace your for loops with numpy's implicit looping.
I don't know enough about your code to convert it directly, but I can give an example.
Suppose you have an array of 100000 random integers, and you need to get an array of each element divided by its neighbor.
import random, numpy as np
a = np.fromiter((random.randint(1, 100) for i in range(100000)), int)
One way to do this would be:
[a[i] / a[i+1] for i in range(len(a)-1)]
Or this, which is much faster (note that np.roll wraps around, so the last element is divided by the first; see the note after the timings):
a / np.roll(a, -1)
Timeit:
initcode = 'import random, numpy as np; a = np.fromiter((random.randint(1, 100) for i in range(100000)), int)'
timeit.timeit('[a[i] / a[i+1] for i in range(len(a)-1)]', initcode, number=100)
5.822079309000401
timeit.timeit('(a / np.roll(a, -1))', initcode, number=100)
0.1392055350006558
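(If you need the two results to match exactly, my addition: drop the wrapped-around last element, which is a[-1] / a[0]:)

exact = (a / np.roll(a, -1))[:-1]  # now identical to the list comprehension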

Why is my simple python gtk+cairo program running so slowly/stutteringly?

My program draws circles moving on the window. I think I must be missing some basic gtk/cairo concept because it seems to be running too slowly/stutteringly for what I am doing. Any ideas? Thanks for any help!
#!/usr/bin/python

import gtk
import gtk.gdk as gdk
import math
import random
import gobject

# The number of circles and the window size.
num = 128
size = 512

# Initialize circle coordinates and velocities.
x = []
y = []
xv = []
yv = []
for i in range(num):
    x.append(random.randint(0, size))
    y.append(random.randint(0, size))
    xv.append(random.randint(-4, 4))
    yv.append(random.randint(-4, 4))

# Draw the circles and update their positions.
def expose(*args):
    cr = darea.window.cairo_create()
    cr.set_line_width(4)
    for i in range(num):
        cr.set_source_rgb(1, 0, 0)
        cr.arc(x[i], y[i], 8, 0, 2 * math.pi)
        cr.stroke_preserve()
        cr.set_source_rgb(1, 1, 1)
        cr.fill()
        x[i] += xv[i]
        y[i] += yv[i]
        if x[i] > size or x[i] < 0:
            xv[i] = -xv[i]
        if y[i] > size or y[i] < 0:
            yv[i] = -yv[i]

# Self-evident?
def timeout():
    darea.queue_draw()
    return True

# Initialize the window.
window = gtk.Window()
window.resize(size, size)
window.connect("destroy", gtk.main_quit)
darea = gtk.DrawingArea()
darea.connect("expose-event", expose)
window.add(darea)
window.show_all()

# Self-evident?
gobject.idle_add(timeout)
gtk.main()
One of the problems is that you are drawing the same basic object again and again. I'm not sure about GTK+'s buffering behavior, but also keep in mind that basic function calls incur a cost in Python. I added a frame counter to your program, and with your code I got around 30 fps max.
There are several things you can do. For instance, compose larger paths before actually calling any fill or stroke method (i.e. add all arcs to the path and fill/stroke them in a single call). Another solution, which is vastly faster, is to compose your ball in an off-screen buffer and then just paint it to the screen repeatedly:
import cairo  # needed for the off-screen surface

def create_basic_image():
    img = cairo.ImageSurface(cairo.FORMAT_ARGB32, 24, 24)
    c = cairo.Context(img)
    c.set_line_width(4)
    c.arc(12, 12, 8, 0, 2 * math.pi)
    c.set_source_rgb(1, 0, 0)
    c.stroke_preserve()
    c.set_source_rgb(1, 1, 1)
    c.fill()
    return img

def expose(sender, event, img):
    cr = darea.window.cairo_create()
    for i in range(num):
        cr.set_source_surface(img, x[i], y[i])
        cr.paint()
    ...  # your update code here

darea.connect("expose-event", expose, create_basic_image())
This gives about 273 fps on my machine. Because of this, you should think about using gobject.timeout_add rather than idle_add.
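For instance (my addition; the 17 ms interval is an assumption aiming at roughly 60 fps):

gobject.timeout_add(17, timeout)  # redraw ~60 times per second instead of as fast as possible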
I don't see anything fundamentally wrong with your code. To narrow the problem down I tried a different approach that may be minimally faster, but the difference is almost negligible:
class Area(gtk.DrawingArea):
    def do_expose_event(self, event):
        cr = self.window.cairo_create()
        # Restrict Cairo to the exposed area; avoid extra work
        cr.rectangle(event.area.x,
                     event.area.y,
                     event.area.width,
                     event.area.height)
        cr.clip()
        cr.set_line_width(4)
        for i in range(num):
            cr.set_source_rgb(1, 0, 0)
            cr.arc(x[i], y[i], 8, 0, 2 * math.pi)
            cr.stroke_preserve()
            cr.set_source_rgb(1, 1, 1)
            cr.fill()
            x[i] += xv[i]
            y[i] += yv[i]
            if x[i] > size or x[i] < 0:
                xv[i] = -xv[i]
            if y[i] > size or y[i] < 0:
                yv[i] = -yv[i]
        self.queue_draw()

gobject.type_register(Area)

# Initialize the window.
window = gtk.Window()
window.resize(size, size)
window.connect("destroy", gtk.main_quit)
darea = Area()
window.add(darea)
window.show_all()
Also, overriding DrawingArea.draw() with a stub makes no major difference.
I'd probably try the Cairo mailing list, or look at Clutter or pygame for drawing a large number of items on the screen.
I had the same problem in a program written in C#. Before you leave the expose event, try calling cr.dispose().
