I have part of a MATLAB function that multiplies two matrices in 8x8 blocks. table1 is 8x8, table2 is 320x240. I want to transform the code below to Python.
fun = @(x) x.data .* table1;
I_spatial = blockproc(table2,[8 8],fun);
I want to use a method like np.dot to multiply the matrices, but the input arrays do not have compatible row/column sizes, so I cannot do it directly. Could somebody help me port that fragment to Python?
I also have a second part of this function:
fun = @(x) idct2(x.data);
I_spatial = blockproc(I_spatial,[8 8],fun)+128;
How can I write that part in Python?
Using Ahmed's function and example:
In [284]: a = np.ones([320, 240])
     ...: b = np.zeros([8, 8])
     ...:
     ...:
     ...: def func_mul(x):
     ...:     return x @ b
In [285]: result = blockproc(a, 8, 8, func_mul)
In [286]: result.shape
Out[286]: (320, 240)
In a comment I suggested reshaping/transposing a into an (n, m, 8, 8) array:
In [287]: a1 = a.reshape(40, 8, 30, 8).transpose(0, 2, 1, 3)
In [288]: a1.shape
Out[288]: (40, 30, 8, 8)
In [289]: res = a1 @ b   # matmul does 'batch' on lead dimensions
In [290]: res.shape
Out[290]: (40, 30, 8, 8)
In [291]: res1 = res.transpose(0, 2, 1, 3).reshape(a.shape)
Compare times:
In [292]: timeit result = blockproc(a, 8, 8, func_mul)
10.2 ms ± 171 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [293]: def foo(a, b):
     ...:     a1 = a.reshape(40, 8, 30, 8).transpose(0, 2, 1, 3)
     ...:     res = a1 @ b
     ...:     res1 = res.transpose(0, 2, 1, 3).reshape(a.shape)
     ...:     return res1
In [294]: timeit foo(a,b)
918 µs ± 2.19 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Changing the arrays so the result values are significant (not all 0) to verify the equality of these methods:
In [295]: a = np.arange(320 * 240).reshape(320, 240)
In [296]: b = np.arange(64).reshape(8, 8)
In [297]: result = blockproc(a, 8, 8, func_mul)
In [298]: res1 = foo(a, b)
In [299]: np.allclose(result, res1)
Out[299]: True
My approach is much faster because it does not iterate over the lead (40, 30) dimensions. But it depends on the func being something like matmul that can work with this mix of dimensions - in other words, a function that makes full use of numpy broadcasting.
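As a side note, here is a minimal sketch that wraps the same reshape/transpose trick in a reusable helper (the name blockwise_matmul is mine; it assumes a square block b whose size divides both dimensions of a evenly):

import numpy as np

def blockwise_matmul(a, b):
    # b is assumed square (k, k); both of a's dimensions must be multiples of k
    k = b.shape[0]
    m, n = a.shape
    a4 = a.reshape(m // k, k, n // k, k).transpose(0, 2, 1, 3)  # (m//k, n//k, k, k) blocks
    return (a4 @ b).transpose(0, 2, 1, 3).reshape(m, n)         # batch matmul, then reassemble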
edit
And @Victor's version:
In [308]: def victor(A, blockdims, func):
     ...:     vr, hr = A.shape[0] // blockdims[0], A.shape[1] // blockdims[1]
     ...:     B = A.copy()
     ...:     verts = np.vsplit(B, vr)
     ...:     for i in range(len(verts)):
     ...:         for j, v in enumerate(np.hsplit(verts[i], hr)):
     ...:             B[
     ...:                 i * blockdims[0] : (i + 1) * blockdims[0],
     ...:                 j * blockdims[1] : (j + 1) * blockdims[1],
     ...:             ] = func(v)
     ...:     return B
     ...:
In [309]: res2 = victor(a, (8, 8), func_mul)
In [310]: res2.shape
Out[310]: (320, 240)
In [311]: np.allclose(result, res2)
Out[311]: True
In [312]: timeit res2 = victor(a, (8, 8), func_mul)
13.7 ms ± 5.51 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
There aren't any premade versions I'm aware of, but this implementation will be quite fast, as data copying is minimal; note that it will always pad the output to the proper size.
import numpy as np

def blockproc(A, m, n, fun):
    results_rows = []
    for y in range(0, A.shape[0], m):
        results_cols = []
        for x in range(0, A.shape[1], n):
            results_cols.append(fun(A[y:y+m, x:x+n]))
        results_rows.append(results_cols)

    patch_rows = results_rows[0][0].shape[0]
    patch_cols = results_rows[0][0].shape[1]
    final_array_cols = results_rows[0][0].shape[1] * len(results_rows[0])
    final_array_rows = results_rows[0][0].shape[0] * len(results_rows)
    final_array = np.zeros([final_array_rows, final_array_cols], dtype=results_rows[0][0].dtype)

    for y in range(len(results_rows)):
        for x in range(len(results_rows[y])):
            data = results_rows[y][x]
            final_array[y*patch_rows:y*patch_rows+data.shape[0], x*patch_cols:x*patch_cols+data.shape[1]] = data
    return final_array
testing it:
a = np.ones([320, 240])
b = np.zeros([8, 8])

def func_mul(x):
    return x @ b

result = blockproc(a, 8, 8, func_mul)
print('dims:', result.shape)

import time
t1 = time.time()
for i in range(1000):
    blockproc(a, 8, 8, func_mul)
t2 = time.time()
print('time:', (t2-t1)/1000)
dims:(320, 240)
time:0.006634121179580689
Like @Ahmed AEK mentioned, there is no built-in solution for this. I have come up with a solution that leverages numpy's extremely optimized vsplit and hsplit functions and even allows you to apply the function in place:
import scipy
import numpy as np
from typing import *
from scipy.fftpack import idct

npd = NewType('npd', np.ndarray)
id = lambda x: x  # default function is the identity

def blockproc(A: npd, blockdims: Tuple[int, int], func: Callable[[npd], Any] = id, inplace: bool = False) -> npd:
    blocks: List[npd] = []
    if A.shape[0] % blockdims[0] != 0 or A.shape[1] % blockdims[1] != 0:
        print(f"Invalid block dimensions - {A.shape} must be divided evenly by {tuple(blockdims)}")
    vr, hr = A.shape[0] // blockdims[0], A.shape[1] // blockdims[1]
    B = A if inplace else A.copy()
    verts: List[npd] = np.vsplit(B, vr)
    try:
        for i in range(len(verts)):
            for j, v in enumerate(np.hsplit(verts[i], hr)):
                B[i*blockdims[0]:(i+1)*blockdims[0], j*blockdims[1]:(j+1)*blockdims[1]] = func(v)
    except Exception as e:
        print("Invalid block function")
        exit(e)
    return B

if __name__ == "__main__":
    # Assume table1 and table2 are defined above ...
    # First code sample
    fun = lambda x: x @ table1
    I_spatial = blockproc(table2, [8, 8], fun)
    # Second code sample
    fun = lambda x: idct(x)
    I_spatial = blockproc(I_spatial, [8, 8], fun) + 128
Look at the two code samples you provided - nearly identical! If you're curious, see the scipy documentation for more info about idct.
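Note that MATLAB's idct2 is a 2-D transform while scipy.fftpack.idct is 1-D, so (as an assumption about what the original MATLAB code needs) a common sketch applies the 1-D IDCT along both axes, with norm='ortho' to match MATLAB's scaling:

from scipy.fftpack import idct

def idct2(block):
    # 2-D inverse DCT built from two 1-D passes (first down the columns, then across the rows)
    return idct(idct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

That idct2 could then be passed to blockproc in place of the plain idct used above.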
EDIT:
Per @Ahmed AEK's comments (see below), it appears enumerate is slowing down the code significantly. I've now removed the outer enumerate in an effort to decrease runtime.
I want to change a multidimensional numpy array (say mydata) based on some boolean conditions (cascaded, one after the other).
This works:
mydata[condition] = something
This does not:
mydata[condition1][condition2] = something
Where all the conditions are boolean arrays of compatible (broadcastable) shape.
Any reason why this doesn't work, and what could be a good solution? Right now, I resolve it by reassigning to the original as follows:
tempdata = mydata[condition1]
tempdata[condition2] = something
mydata[condition1] = tempdata
To solve cases like those, use chained/cascaded integer-indexing -
idx1 = np.flatnonzero(condition1)
idx2 = np.flatnonzero(condition2)
mydata[idx1[idx2]] = something
Sample run -
In [42]: mydata = np.array([2,6,8,0,9,3,1,4])
...: mydata_copy = mydata.copy() # make copy for verification
...: condition1 = np.array([True,False,True,True,True,False,False,True])
...: condition2 = np.array([False,True,False,True,True])
...: something = -1
...:
# Working solution from question
In [43]: tempdata = mydata[condition1]
...: tempdata[condition2] = something
...: mydata[condition1] = tempdata
...:
In [44]: mydata # Check changed values
Out[44]: array([ 2, 6, -1, 0, -1, 3, 1, -1])
# Proposed solution
In [45]: idx1 = np.flatnonzero(condition1)
...: idx2 = np.flatnonzero(condition2)
...: mydata_copy[idx1[idx2]] = something
...:
In [46]: mydata_copy # Verify changed values in copy
Out[46]: array([ 2, 6, -1, 0, -1, 3, 1, -1])
Alternative method: If you don't mind editing condition1, you could do -
condition1[idx1] = condition2
and then using mydata[condition1] = something as the final step.
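A minimal sketch of that alternative (the helper name alt_app is mine; it overwrites condition1 and uses the something value from the sample run above):

def alt_app(mydata, condition1, condition2):
    idx1 = np.flatnonzero(condition1)
    condition1[idx1] = condition2      # keep only the True positions that condition2 also selects
    mydata[condition1] = something
    return mydata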
Performance benefits
Let's time the proposed one and see if there's any benefit over the one in the question.
Approaches -
# Original approach
def org_app(mydata, condition1, condition2):
    tempdata = mydata[condition1]
    tempdata[condition2] = something
    mydata[condition1] = tempdata
    return mydata

# Proposed one
def proposed_app(mydata, condition1, condition2):
    idx1 = np.flatnonzero(condition1)
    idx2 = np.flatnonzero(condition2)
    mydata[idx1[idx2]] = something
    return mydata
Timings -
In [58]: mydata = np.random.rand(1000000)
...: mydata_copy = mydata.copy()
...: condition1 = np.random.rand(mydata.size)>0.5
...: condition2 = np.random.rand(condition1.sum())>0.5
...: something = -1
...:
In [59]: %timeit org_app(mydata,condition1,condition2)
100 loops, best of 3: 14.1 ms per loop
In [61]: %timeit proposed_app(mydata_copy,condition1,condition2)
100 loops, best of 3: 7.44 ms per loop
Incorporating the alternative method should bring a further performance boost.
I'm using itertools.combinations() as follows:
import itertools
import numpy as np
L = [1,2,3,4,5]
N = 3
output = np.array([a for a in itertools.combinations(L,N)]).T
Which yields me the output I need:
array([[1, 1, 1, 1, 1, 1, 2, 2, 2, 3],
[2, 2, 2, 3, 3, 4, 3, 3, 4, 4],
[3, 4, 5, 4, 5, 5, 4, 5, 5, 5]])
I'm using this expression repeatedly and excessively in a multiprocessing environment and I need it to be as fast as possible.
From this post I understand that itertools-based code isn't the fastest solution and using numpy could be an improvement; however, I'm not good enough at numpy optimization tricks to understand and adapt the iterative code written there, or to come up with my own optimization.
Any help would be greatly appreciated.
EDIT:
L comes from a pandas dataframe, so it can as well be seen as a numpy array:
L = df.L.values
Here's one that's slightly faster than itertools. UPDATE: and one (nump2) that's actually quite a bit faster:
import numpy as np
import itertools
import timeit
def nump(n, k, i=0):
    if k == 1:
        a = np.arange(i, i+n)
        return tuple([a[None, j:] for j in range(n)])
    template = nump(n-1, k-1, i+1)
    full = np.r_[np.repeat(np.arange(i, i+n-k+1),
                           [t.shape[1] for t in template])[None, :],
                 np.c_[template]]
    return tuple([full[:, j:] for j in np.r_[0, np.add.accumulate(
        [t.shape[1] for t in template[:-1]])]])

def nump2(n, k):
    a = np.ones((k, n-k+1), dtype=int)
    a[0] = np.arange(n-k+1)
    for j in range(1, k):
        reps = (n-k+j) - a[j-1]
        a = np.repeat(a, reps, axis=1)
        ind = np.add.accumulate(reps)
        a[j, ind[:-1]] = 1-reps[1:]
        a[j, 0] = j
        a[j] = np.add.accumulate(a[j])
    return a

def itto(L, N):
    return np.array([a for a in itertools.combinations(L, N)]).T
k = 6
n = 12
N = np.arange(n)
assert np.all(nump2(n,k) == itto(N,k))
print('numpy ', timeit.timeit('f(a,b)', number=100, globals={'f':nump, 'a':n, 'b':k}))
print('numpy 2 ', timeit.timeit('f(a,b)', number=100, globals={'f':nump2, 'a':n, 'b':k}))
print('itertools', timeit.timeit('f(a,b)', number=100, globals={'f':itto, 'a':N, 'b':k}))
Timings:
k = 3, n = 50
numpy 0.06967267207801342
numpy 2 0.035096961073577404
itertools 0.7981023890897632
k = 3, n = 10
numpy 0.015058324905112386
numpy 2 0.0017436158377677202
itertools 0.004743851954117417
k = 6, n = 12
numpy 0.03546895203180611
numpy 2 0.00997065706178546
itertools 0.05292179994285107
This is most certainly not faster than itertools.combinations, but it is vectorized numpy:
def nd_triu_indices(T, N):
    o = np.array(np.meshgrid(*(np.arange(len(T)),)*N))
    return np.array(T)[o[..., np.all(o[1:] > o[:-1], axis=0)]]
%timeit np.array(list(itertools.combinations(T,N))).T
The slowest run took 4.40 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.6 µs per loop
%timeit nd_triu_indices(T,N)
The slowest run took 4.64 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 52.4 µs per loop
Not sure if this is vectorizable another way, or if one of the optimization wizards around here can make this method faster.
EDIT: Came up with another way, but still not faster than combinations:
%timeit np.array(T)[np.array(np.where(np.fromfunction(lambda *i: np.all(np.array(i)[1:]>np.array(i)[:-1], axis=0),(len(T),)*N,dtype=int)))]
The slowest run took 7.78 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 34.3 µs per loop
I know this question is old, but I have been working on it recently, and it still might help. From my (pretty extensive) testing, I have found that first generating combinations of each index, and then using these indexes to slice the array, is much faster than directly making combinations from the array. I'm sure that using @Paul Panzer's nump2 function to generate these indices could be even faster.
Here is an example:
import numpy as np
from math import factorial
import itertools as iters
from timeit import timeit
from perfplot import show
def combinations_iter(array: np.ndarray, r: int = 3) -> np.ndarray:
    return np.array([*iters.combinations(array, r=r)], dtype=array.dtype)

def combinations_iter_idx(array: np.ndarray, r: int = 3) -> np.ndarray:
    n_items = array.shape[0]
    num_combinations = factorial(n_items) // (factorial(n_items-r) * factorial(r))
    combination_idx = np.fromiter(
        iters.chain.from_iterable(iters.combinations(np.arange(n_items, dtype=np.int64), r=r)),
        dtype=np.int64,
        count=num_combinations*r,
    ).reshape(-1, r)
    return array[combination_idx]

show(
    setup=lambda n: np.random.uniform(0, 100, (n, 3)),
    kernels=[combinations_iter, combinations_iter_idx],
    labels=['pure itertools', 'itertools for index'],
    n_range=np.geomspace(5, 300, 10, dtype=np.int64),
    xlabel="n",
    logx=True,
    logy=False,
    equality_check=np.allclose,
    show_progress=True,
    max_time=None,
    time_unit="ms",
)
It is clear that the indexing method is much faster.
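Applied to the L and N from the question, for instance (a quick sketch):

L = np.array([1, 2, 3, 4, 5])
print(combinations_iter_idx(L, r=3).T)
# [[1 1 1 1 1 1 2 2 2 3]
#  [2 2 2 3 3 4 3 3 4 4]
#  [3 4 5 4 5 5 4 5 5 5]]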
First off, apologies for the vague title, I couldn't think of an appropriate name for this issue.
I have 3 numpy arrays in the following formats:
N = ([[13, 14, 15], [2, 5, 7], [4, 6, 8] ... several hundred thousand elements long
e1 = [1, 0, 0]
e2 = [0, 1, 0]
The idea is to create a fourth array, 'v', which shall have the same dimensions as 'N', but will be given values based on an if statement. Here is what I currently have which should better explain the issue:
v = np.zeros([len(N), 3])
for i in range(0, len(N)):
    if (N*e1)[i, 0] != 0:
        v[i] = np.cross(N[i], e1)
    else:
        v[i] = np.cross(N[i], e2)
This code does what I require it to but does so in a longer than anticipated time (> 5 mins). Is there any form of list comprehension or similar concept I could use to increase the efficiency of the code?
You can use numpy.where to replace the if-else and vectorize the process with broadcasting; here is an option:
import numpy as np
np.where(np.repeat(N[:,0] != 0, 3).reshape(1000,3), np.cross(N, e1), np.cross(N, e2))
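The reshape above hardcodes the 1000 rows of the benchmark below; assuming N has shape (n, 3), a broadcasting sketch avoids that:

mask = (N[:, 0] != 0)[:, None]   # shape (n, 1), broadcasts across the 3 columns
v = np.where(mask, np.cross(N, e1), np.cross(N, e2))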
Some benchmarks here:
1) Data set up:
N = np.array([np.random.randint(0,10,3) for i in range(1000)])
N
#array([[3, 5, 0],
# [5, 0, 8],
# [4, 6, 0],
# ...,
# [9, 4, 2],
# [6, 9, 3],
# [2, 9, 2]])
e1 = np.array([1, 0, 0])
e2 = np.array([0, 1, 0])
2) Timing:
def forloop():
    v = np.zeros([len(N), 3])
    for i in range(0, len(N)):
        if (N*e1)[i, 0] != 0:
            v[i] = np.cross(N[i], e1)
        else:
            v[i] = np.cross(N[i], e2)
    return v

def forloop2():
    v = np.zeros([len(N), 3])
    # Only calculate this one time.
    my_product = N*e1
    for i in range(0, len(N)):
        if my_product[i, 0] != 0:
            v[i] = np.cross(N[i], e1)
        else:
            v[i] = np.cross(N[i], e2)
    return v
%timeit forloop()
10 loops, best of 3: 25.5 ms per loop
%timeit forloop2()
100 loops, best of 3: 12.7 ms per loop
%timeit np.where(np.repeat(N[:,0] != 0, 3).reshape(1000,3), np.cross(N, e1), np.cross(N, e2))
10000 loops, best of 3: 71.9 µs per loop
3) Result checking for all methods:
v1 = forloop()
v2 = np.where(np.repeat(N[:,0] != 0, 3).reshape(1000,3), np.cross(N, e1), np.cross(N, e2))
v3 = forloop2()
(v3 == v1).all()
# True
(v1 == v2).all()
# True
I'm not certain what it is you're trying to do, but I know why this specific code is so slow for you. The worst offender is (N*e1). That's a simple calculation, and it runs pretty fast with numpy, but you're executing it inside the loop, len(N) times!
I am able to execute your code with len(N) == 1000000 in less than 15 seconds on my machine by pulling that calculation outside of the loop. Example below.
v = np.zeros([len(N), 3])

# Only calculate this one time.
my_product = N*e1

for i in range(0, len(N)):
    if my_product[i, 0] != 0:
        v[i] = np.cross(N[i], e1)
    else:
        v[i] = np.cross(N[i], e2)
The other answer demonstrates how to avoid the for loop and if statements for a lot of extra speed at the cost of somewhat less readable code.
I have to cluster the consecutive elements of a NumPy array. Consider the following example:
a = [ 0, 47, 48, 49, 50, 97, 98, 99]
The output should be a list of tuples as follows
[(0), (47, 48, 49, 50), (97, 98, 99)]
Here the difference between the elements is just one. It would be great if the difference could also be specified as a limit or a hardcoded number.
def consecutive(data, stepsize=1):
    return np.split(data, np.where(np.diff(data) != stepsize)[0]+1)
a = np.array([0, 47, 48, 49, 50, 97, 98, 99])
consecutive(a)
yields
[array([0]), array([47, 48, 49, 50]), array([97, 98, 99])]
Here's a lil func that might help:
def group_consecutives(vals, step=1):
    """Return list of consecutive lists of numbers from vals (number list)."""
    run = []
    result = [run]
    expect = None
    for v in vals:
        if (v == expect) or (expect is None):
            run.append(v)
        else:
            run = [v]
            result.append(run)
        expect = v + step
    return result
>>> group_consecutives(a)
[[0], [47, 48, 49, 50], [97, 98, 99]]
>>> group_consecutives(a, step=47)
[[0, 47], [48], [49], [50, 97], [98], [99]]
P.S. This is pure Python. For a NumPy solution, see unutbu's answer.
(a[1:]-a[:-1])==1 will produce a boolean array where False indicates breaks in the runs. You can also use the built-in numpy.gradient.
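For the sample array from the question, that mask looks like this (a quick check):

a = np.array([0, 47, 48, 49, 50, 97, 98, 99])
(a[1:] - a[:-1]) == 1
# array([False,  True,  True,  True, False,  True,  True])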
This is what I came up with so far; not sure it's 100% correct:
import numpy as np
a = np.array([ 0, 47, 48, 49, 50, 97, 98, 99])
print np.split(a, np.cumsum( np.where(a[1:] - a[:-1] > 1) )+1)
returns:
>>>[array([0]), array([47, 48, 49, 50]), array([97, 98, 99])]
Tested for one dimensional arrays
Get where diff isn't one
diffs = numpy.diff(array) != 1
Get the indexes of diffs, grab the first dimension and add one to all because diff compares with the previous index
indexes = numpy.nonzero(diffs)[0] + 1
Split with the given indexes
groups = numpy.split(array, indexes)
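Put together with the sample array from the question (a quick sketch):

import numpy as np

array = np.array([0, 47, 48, 49, 50, 97, 98, 99])
diffs = np.diff(array) != 1            # where the step is not 1
indexes = np.nonzero(diffs)[0] + 1     # split points
groups = np.split(array, indexes)
# [array([0]), array([47, 48, 49, 50]), array([97, 98, 99])]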
It turns out that instead of np.split, a list comprehension is more performant. So the function below (almost like @unutbu's consecutive function, except it uses a list comprehension to split the array) is much faster:
def consecutive_w_list_comprehension(arr, stepsize=1):
    idx = np.r_[0, np.where(np.diff(arr) != stepsize)[0]+1, len(arr)]
    return [arr[i:j] for i, j in zip(idx, idx[1:])]
For example, for an array of length 100_000, consecutive_w_list_comprehension is over 4x faster:
arr = np.sort(np.random.choice(range(150000), size=100000, replace=False))
%timeit -n 100 consecutive(arr)
96.1 ms ± 1.22 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit -n 100 consecutive_w_list_comprehension(arr)
23.2 ms ± 858 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In fact, this relationship holds up no matter the size of the array. The plot below shows the runtime difference between the answers on here.
Code used to produce the plot above:
import perfplot
import numpy as np

def consecutive(data, stepsize=1):
    return np.split(data, np.where(np.diff(data) != stepsize)[0]+1)

def consecutive_w_list_comprehension(arr, stepsize=1):
    idx = np.r_[0, np.where(np.diff(arr) != stepsize)[0]+1, len(arr)]
    return [arr[i:j] for i, j in zip(idx, idx[1:])]

def group_consecutives(vals, step=1):
    run = []
    result = [run]
    expect = None
    for v in vals:
        if (v == expect) or (expect is None):
            run.append(v)
        else:
            run = [v]
            result.append(run)
        expect = v + step
    return result

def JozeWs(array):
    diffs = np.diff(array) != 1
    indexes = np.nonzero(diffs)[0] + 1
    groups = np.split(array, indexes)
    return groups

perfplot.show(
    setup=lambda n: np.sort(np.random.choice(range(2*n), size=n, replace=False)),
    kernels=[consecutive, consecutive_w_list_comprehension, group_consecutives, JozeWs],
    labels=['consecutive', 'consecutive_w_list_comprehension', 'group_consecutives', 'JozeWs'],
    n_range=[2 ** k for k in range(5, 22)],
    equality_check=lambda *lst: all((x == y).all() for x, y in zip(*lst)),
    xlabel='~len(arr)'
)
This sounds a little like homework, so if you don't mind, I will suggest an approach.
You can iterate over a list using
for i in range(len(a)):
    print(a[i])
You could test whether the next element in the list meets some criterion, as follows:
if a[i+1] == a[i] + 1:
    print("it must be a consecutive run")
And you can store results separately in
results = []
Beware - there is an index-out-of-range error hidden in the above that you will need to deal with.
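A minimal sketch that assembles those hints and handles the last index (where the out-of-range error lurks), assuming a non-empty input list:

def group_consecutive_runs(a, step=1):
    results = [[a[0]]]
    for i in range(len(a) - 1):           # stop one short of the end to avoid the index error
        if a[i + 1] == a[i] + step:
            results[-1].append(a[i + 1])  # it must be a consecutive run
        else:
            results.append([a[i + 1]])    # start a new run
    return results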
I am trying to translate every element of a numpy.array according to a given key:
For example:
a = np.array([[1,2,3],
[3,2,4]])
my_dict = {1:23, 2:34, 3:36, 4:45}
I want to get:
array([[ 23., 34., 36.],
[ 36., 34., 45.]])
I can see how to do it with a loop:
def loop_translate(a, my_dict):
    new_a = np.empty(a.shape)
    for i, row in enumerate(a):
        new_a[i,:] = map(my_dict.get, row)
    return new_a
Is there a more efficient and/or pure numpy way?
Edit:
I timed it, and the np.vectorize method proposed by DSM is considerably faster for larger arrays:
In [13]: def loop_translate(a, my_dict):
   ....:     new_a = np.empty(a.shape)
   ....:     for i, row in enumerate(a):
   ....:         new_a[i,:] = map(my_dict.get, row)
   ....:     return new_a
   ....:
In [14]: def vec_translate(a, my_dict):
   ....:     return np.vectorize(my_dict.__getitem__)(a)
   ....:
In [15]: a = np.random.randint(1,5, (4,5))
In [16]: a
Out[16]:
array([[2, 4, 3, 1, 1],
[2, 4, 3, 2, 4],
[4, 2, 1, 3, 1],
[2, 4, 3, 4, 1]])
In [17]: %timeit loop_translate(a, my_dict)
10000 loops, best of 3: 77.9 us per loop
In [18]: %timeit vec_translate(a, my_dict)
10000 loops, best of 3: 70.5 us per loop
In [19]: a = np.random.randint(1, 5, (500,500))
In [20]: %timeit loop_translate(a, my_dict)
1 loops, best of 3: 298 ms per loop
In [21]: %timeit vec_translate(a, my_dict)
10 loops, best of 3: 37.6 ms per loop
I don't know about efficient, but you could use np.vectorize on the .get method of dictionaries:
>>> a = np.array([[1,2,3],
[3,2,4]])
>>> my_dict = {1:23, 2:34, 3:36, 4:45}
>>> np.vectorize(my_dict.get)(a)
array([[23, 34, 36],
[36, 34, 45]])
Here's another approach, using numpy.unique:
>>> a = np.array([[1,2,3],[3,2,1]])
>>> a
array([[1, 2, 3],
[3, 2, 1]])
>>> d = {1 : 11, 2 : 22, 3 : 33}
>>> u,inv = np.unique(a,return_inverse = True)
>>> np.array([d[x] for x in u])[inv].reshape(a.shape)
array([[11, 22, 33],
[33, 22, 11]])
This approach is much faster than np.vectorize approach when the number of unique elements in array is small.
Explanation: Python is slow; in this approach the in-Python loop is used only to convert the unique elements, and afterwards we rely on the extremely optimized numpy indexing operation (done in C) to do the mapping. Hence, if the number of unique elements is comparable to the overall size of the array, there will be no speedup. On the other hand, if there are just a few unique elements, you can observe a speedup of up to 100x.
I think it'd be better to iterate over the dictionary, and set values in all the rows and columns "at once":
>>> a = np.array([[1,2,3],[3,2,1]])
>>> a
array([[1, 2, 3],
[3, 2, 1]])
>>> d = {1 : 11, 2 : 22, 3 : 33}
>>> for k,v in d.iteritems():
... a[a == k] = v
...
>>> a
array([[11, 22, 33],
[33, 22, 11]])
Edit:
While it may not be as sexy as DSM's (really good) answer using numpy.vectorize, my tests of all the proposed methods show that this approach (using @jamylak's suggestion) is actually a bit faster:
from __future__ import division
import numpy as np

a = np.random.randint(1, 5, (500, 500))
d = {1: 11, 2: 22, 3: 33, 4: 44}

def unique_translate(a, d):
    u, inv = np.unique(a, return_inverse=True)
    return np.array([d[x] for x in u])[inv].reshape(a.shape)

def vec_translate(a, d):
    return np.vectorize(d.__getitem__)(a)

def loop_translate(a, d):
    n = np.ndarray(a.shape)
    for k in d:
        n[a == k] = d[k]
    return n

def orig_translate(a, d):
    new_a = np.empty(a.shape)
    for i, row in enumerate(a):
        new_a[i,:] = map(d.get, row)
    return new_a

if __name__ == '__main__':
    import timeit
    n_exec = 100
    print 'orig'
    print timeit.timeit("orig_translate(a,d)",
                        setup="from __main__ import np,a,d,orig_translate",
                        number=n_exec) / n_exec
    print 'unique'
    print timeit.timeit("unique_translate(a,d)",
                        setup="from __main__ import np,a,d,unique_translate",
                        number=n_exec) / n_exec
    print 'vec'
    print timeit.timeit("vec_translate(a,d)",
                        setup="from __main__ import np,a,d,vec_translate",
                        number=n_exec) / n_exec
    print 'loop'
    print timeit.timeit("loop_translate(a,d)",
                        setup="from __main__ import np,a,d,loop_translate",
                        number=n_exec) / n_exec
Outputs:
orig
0.222067718506
unique
0.0472617006302
vec
0.0357889199257
loop
0.0285375618935
The numpy_indexed package (disclaimer: I am its author) provides an elegant and efficient vectorized solution to this type of problem:
import numpy_indexed as npi
remapped_a = npi.remap(a, list(my_dict.keys()), list(my_dict.values()))
The method implemented is similar to the approach mentioned by John Vinyard, but even more general. For instance, the items of the array do not need to be ints, but can be any type, even nd-subarrays themselves.
If you set the optional 'missing' kwarg to 'raise' (default is 'ignore'), performance will be slightly better, and you will get a KeyError if not all elements of 'a' are present in the keys.
Assuming your dict keys are positive integers, without huge gaps (similar to a range from 0 to N), you would be better off converting your translation dict to an array such that my_array[i] = my_dict[i], and using numpy indexing to do the translation.
A code using this approach is:
def direct_translate(a, d):
    src, values = d.keys(), d.values()
    d_array = np.arange(a.max() + 1)
    d_array[src] = values
    return d_array[a]
Testing with random arrays:
N = 10000
shape = (5000, 5000)
a = np.random.randint(N, size=shape)
my_dict = dict(zip(np.arange(N), np.random.randint(N, size=N)))
For these sizes I get around 140 ms for this approach. The np.vectorize approach takes around 5.8 s and unique_translate around 8 s.
Possible generalizations:
If you have negative values to translate, you could shift the values in a and in the keys of the dictionary by a constant to map them back to positive integers:
def direct_translate(a, d):  # handles negative source keys
    min_a = a.min()
    src, values = np.array(d.keys()) - min_a, d.values()
    d_array = np.arange(a.max() - min_a + 1)
    d_array[src] = values
    return d_array[a - min_a]
If the source keys have huge gaps, the initial array creation would waste memory. I would resort to cython to speed up that function.
If you don't really have to use a dictionary as the substitution table, a simple solution would be (for your example):
a = numpy.array([your array])
my_dict = numpy.array([0, 23, 34, 36, 45])  # your dictionary as array

def Sub(myarr, table):
    return table[myarr]

values = Sub(a, my_dict)
This will of course work only if the indexes of my_dict cover all possible values of your a; in other words, only for a with unsigned integers.
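For the arrays from the question, a quick check of the lookup-table idea (sketch):

import numpy as np

a = np.array([[1, 2, 3],
              [3, 2, 4]])
table = np.array([0, 23, 34, 36, 45])  # table[i] == my_dict[i]; index 0 is unused padding
print(table[a])
# [[23 34 36]
#  [36 34 45]]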