I want to implement SVD++ with numpy or tensorflow.
( https://pdfs.semanticscholar.org/8451/c2812a1476d3e13f2a509139322cc0adb1a2.pdf )
(page 4, equation 4)
I want to implement the equation above without any for loop, but the summation of y_j over the index set R(u) makes this hard.
Specifically, I want to compute the term in which q_v is multiplied by the sum of y_j, without any for loop. So my questions are:
1. Is it possible to implement it with NumPy without a for loop?
2. Is it possible to implement it with TensorFlow without a for loop?
My implementation is below, but I would like to remove the remaining for loops.
import numpy as np
num_users = 3
num_items = 5
latent_dim = 2
p = 0.1
r = np.random.binomial(1, 1 - p,(num_users, num_items))
r_hat = np.zeros([num_users,num_items])
q = np.random.randn(latent_dim,num_items)
y = np.random.randn(latent_dim,num_items)
## First Try
for user in range(num_users):
    for item in range(num_items):
        q_j = q[:,item]
        user_item_list = [i for i, e in enumerate(r[user,:]) if e != 0] # R_u
        sum_y_j = 0 # to make sum of y_i
        for user_item in user_item_list:
            sum_y_j = sum_y_j + y[:,user_item]
        sum_y_j = np.asarray(sum_y_j)
        r_hat[user,item] = np.dot(np.transpose(q_j),sum_y_j)
print(r_hat)
print("=" * 100)
## Second Try
for user in range(num_users):
    for item in range(num_items):
        q_j = q[:,item]
        user_item_list = [i for i, e in enumerate(r[user,:]) if e != 0] # R_u
        sum_y_j = np.sum(y[:,user_item_list],axis=1) # to make sum of y_i
        r_hat[user,item] = np.dot(np.transpose(q_j),sum_y_j)
print(r_hat)
print("=" * 100)
## Third Try
for user in range(num_users):
    user_item_list = [i for i, e in enumerate(r[user,:]) if e != 0] # R_u
    sum_y_j = np.sum(y[:,user_item_list],axis=1) # to make sum of y_i
    r_hat[user,:] = np.dot(np.transpose(q),sum_y_j)
print(r_hat)
Try this.
sum_y = []
for user in range(num_users):
    # repeat the user's 0/1 rating row so it masks every latent dimension of y
    mask = np.repeat(r[user,:][None,:],latent_dim, axis=0)
    sum_y.append(np.sum(np.multiply(y, mask),axis=1))
sum_y = np.asarray(sum_y)
r_hat = (np.dot(q.T,sum_y.T)).T
print(r_hat)
This eliminates the enumerate loop, and the dot product is done in a single call. I don't think it can be reduced beyond this.
Simply use two matrix-multiplications there with np.dot for the final output -
r_hat = r.dot(y.T).dot(q)
Sample run to verify results -
OP's sample setup :
In [68]: import numpy as np
...:
...: num_users = 3
...: num_items = 5
...: latent_dim = 2
...: p = 0.1
...:
...: r = np.random.binomial(1, 1 - p,(num_users, num_items))
...: r_hat = np.zeros([num_users,num_items])
...:
...: q = np.random.randn(latent_dim,num_items)
...: y = np.random.randn(latent_dim,num_items)
...:
In [69]: ## Second Try from OP
...: for user in range(num_users):
...:     for item in range(num_items):
...:         q_j = q[:,item]
...:         user_item_list = [i for i, e in enumerate(r[user,:]) if e != 0] # R_u
...:         sum_y_j = np.sum(y[:,user_item_list],axis=1) # to make sum of y_i
...:         r_hat[user,item] = np.dot(np.transpose(q_j),sum_y_j)
...:
Let's print out the result from OP's solution -
In [70]: r_hat
Out[70]:
array([[ 4.06866107e+00, 2.91099460e+00, -6.50447668e+00,
7.44275731e-03, -2.14857566e+00],
[ 4.06866107e+00, 2.91099460e+00, -6.50447668e+00,
7.44275731e-03, -2.14857566e+00],
[ 5.57369599e+00, 3.76169533e+00, -8.47503476e+00,
1.48615948e-01, -2.82792374e+00]])
Now, I am using my proposed solution -
In [71]: r.dot(y.T).dot(q)
Out[71]:
array([[ 4.06866107e+00, 2.91099460e+00, -6.50447668e+00,
7.44275731e-03, -2.14857566e+00],
[ 4.06866107e+00, 2.91099460e+00, -6.50447668e+00,
7.44275731e-03, -2.14857566e+00],
[ 5.57369599e+00, 3.76169533e+00, -8.47503476e+00,
1.48615948e-01, -2.82792374e+00]])
Value check seems successful!
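To see why two matrix products are enough: because r is binary, row u of r.dot(y.T) is exactly the sum of y_j over the items j in R(u), and multiplying by q then takes the dot product with every item vector at once. The sketch below checks this against an explicit loop on a fixed seed. The seed, the helper name r_hat_loop, and the use of np.allclose are choices made for this illustration only; they are not part of the answers above.

```python
import numpy as np

rng = np.random.RandomState(0)
num_users, num_items, latent_dim = 3, 5, 2
r = rng.binomial(1, 0.9, (num_users, num_items))   # binary rating indicator
q = rng.randn(latent_dim, num_items)
y = rng.randn(latent_dim, num_items)

def r_hat_loop(r, q, y):
    """Reference: one explicit loop over users, like the question's third try."""
    out = np.zeros(r.shape)
    for user in range(r.shape[0]):
        rated = np.flatnonzero(r[user])        # R(u): items rated by this user
        sum_y = y[:, rated].sum(axis=1)        # sum of y_j over R(u)
        out[user] = q.T.dot(sum_y)
    return out

sum_y_all = r.dot(y.T)          # shape (num_users, latent_dim)
r_hat_vec = sum_y_all.dot(q)    # shape (num_users, num_items)
loop_matches = np.allclose(r_hat_vec, r_hat_loop(r, q, y))
print(loop_matches)
```

The same expression should carry over to TensorFlow as tf.matmul(tf.matmul(r, y, transpose_b=True), q), assuming r is first cast to the same float dtype as y and q.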
Is there a faster way to compute the following matrix (from this paper), given an n×n matrix M and an n-vector X?
[formula image not recovered]
I currently compute it as follows:
import math
import numpy as np
# M (n x n) and X (length n) are given as numpy arrays
n = len(X)
G = np.zeros((n,n))
for i in range(0,n):
    for j in range(i,n):
        xi = X[i]
        if i == j:
            G[i,j] = abs(xi)
        else:
            xi2 = xi*xi
            xj = X[j]
            xj2 = xj*xj
            mij = M[i,j]
            mid = (xi2 - xj2)/mij
            top = mij*mij + mid*mid + 2*xi2 + 2*xj2
            G[i,j] = math.sqrt(top)/2
This is very slow, but I suspect there is a nicer "numpythonic" way of doing this instead of looping...
EDIT: While all answers work and are much faster than my naive implementation, I chose the one I benchmarked to be the fastest. Thanks!
Quite straightforward actually.
import math
import numpy as np
n = 5
M = np.random.rand(n, n)
X = np.random.rand(n)
Your code and result:
G = np.zeros((n,n))
for i in range(0,n):
    for j in range(i,n):
        xi = X[i]
        if i == j:
            G[i,j] = abs(xi)
        else:
            xi2 = xi*xi
            xj = X[j]
            xj2 = xj*xj
            mij = M[i,j]
            mid = (xi2 - xj2)/mij
            top = mij*mij + mid*mid + 2*xi2 + 2*xj2
            G[i,j] = math.sqrt(top)/2
array([[0.77847813, 5.26334534, 0.8794082 , 0.7785694 , 0.95799072],
[0. , 0.15662266, 0.88085031, 0.47955479, 0.99219171],
[0. , 0. , 0.87699707, 8.92340836, 1.50053712],
[0. , 0. , 0. , 0.45608367, 0.95902308],
[0. , 0. , 0. , 0. , 0.95774452]])
Using broadcasting:
temp = M**2 + ((X[:, None]**2 - X[None, :]**2) / M)**2 + 2 * (X[:, None]**2) + 2 * (X[None, :]**2)
G = np.sqrt(temp) / 2
array([[0.8284724 , 5.26334534, 0.8794082 , 0.7785694 , 0.95799072],
[0.89251217, 0.25682736, 0.88085031, 0.47955479, 0.99219171],
[0.90047282, 1.10306597, 0.95176428, 8.92340836, 1.50053712],
[0.85131766, 0.47379576, 0.87723514, 0.55013345, 0.95902308],
[0.9879939 , 1.46462011, 0.99516443, 0.95774481, 1.02135642]])
Note that you did not use the formula directly for diagonal elements and only computed for upper triangular region of G. I simply implemented the formula to calculate all G[i, j].
Note: If diagonal elements of M don't matter and they contain some zeros, just add some offset to avoid the divide by zero error like:
M[np.arange(n), np.arange(n)] += 1e-5
# Do calculation to get G
# Assign diagonal to X
G[np.arange(n), np.arange(n)] = abs(X)
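If the goal is to reproduce the loop exactly (strictly upper triangle from the formula, |X[i]| on the diagonal, zeros below), the broadcasting result can be masked afterwards. A minimal sketch, assuming M has no zeros strictly above the diagonal; the function names g_loop and g_vectorized and the test inputs are made up for this example.

```python
import math
import numpy as np

def g_loop(M, X):
    """The question's double loop, kept as the reference."""
    n = len(X)
    G = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            if i == j:
                G[i, j] = abs(X[i])
            else:
                mid = (X[i]**2 - X[j]**2) / M[i, j]
                top = M[i, j]**2 + mid**2 + 2*X[i]**2 + 2*X[j]**2
                G[i, j] = math.sqrt(top) / 2
    return G

def g_vectorized(M, X):
    x2 = X**2
    # Zeros on or below the diagonal of M only affect entries that are
    # overwritten afterwards, so the warnings are silenced here.
    with np.errstate(divide="ignore", invalid="ignore"):
        mid = (x2[:, None] - x2[None, :]) / M
        full = np.sqrt(M**2 + mid**2 + 2*x2[:, None] + 2*x2[None, :]) / 2
    G = np.triu(full, k=1)            # keep the strictly upper triangle
    np.fill_diagonal(G, np.abs(X))    # the diagonal is |X[i]| by definition
    return G

rng = np.random.RandomState(0)
M = rng.rand(5, 5) + 0.1              # offset keeps M away from zero
X = rng.rand(5)
G_vec = g_vectorized(M, X)
same = np.allclose(G_vec, g_loop(M, X))
print(same)
```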
First, your function does not match your equation. This line
mid = (xi2 - xj2)/mij
should be
mid = (xi - xj)/mij
Second, I use NumPy to implement your equation.
Generate test data
test_m = np.array(
[
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5],
]
)
test_x = np.array([5, 6, 7, 8, 9])
Build the function:
def solve(m, x):
    x_size = x.shape[0]
    x = x.reshape(1, -1)           # row vector, shape (1, n)
    reshaped_x = x.reshape(-1, 1)  # column vector, shape (n, 1)
    result = np.sqrt(
        m ** 2
        + ((reshaped_x - x) / m) ** 2
        + 2 * np.repeat(reshaped_x, x_size, axis=1) ** 2
        + 2 * np.repeat(x, x_size, axis=0) ** 2
    ) / 2
    return result
Run it:
print(solve(test_m, test_x))
In fact, the result expression can be simplified, because broadcasting makes the np.repeat calls unnecessary:
    result = np.sqrt(
        m ** 2
        + ((reshaped_x - x) / m) ** 2
        + 2 * reshaped_x ** 2
        + 2 * x ** 2
    ) / 2
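The simplification works because a column vector of shape (n, 1) and a row vector of shape (1, n) broadcast to (n, n) in element-wise arithmetic, which is exactly what the np.repeat calls build by hand. A small check on made-up inputs:

```python
import numpy as np

x = np.array([5.0, 6.0, 7.0])
row = x.reshape(1, -1)      # shape (1, 3)
col = x.reshape(-1, 1)      # shape (3, 1)
n = x.shape[0]

# Explicit tiling, as in the first version of solve()
explicit = 2 * np.repeat(col, n, axis=1) ** 2 + 2 * np.repeat(row, n, axis=0) ** 2
# Implicit broadcasting, as in the simplified version
broadcast = 2 * col ** 2 + 2 * row ** 2

print(explicit.shape, broadcast.shape)
print(np.array_equal(explicit, broadcast))
```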
Tested with Google Colab:
import numba
import numpy as np
import math
# your implementation
def bench_1(n):
    #M, X are given as numpy arrays
    G = np.zeros((n,n))
    M = np.random.rand(n, n)
    X = np.random.rand(n)
    for i in range(0,n):
        for j in range(i,n):
            xi = X[i]
            if i == j:
                G[i,j] = abs(xi)
            else:
                xi2 = xi*xi
                xj = X[j]
                xj2 = xj*xj
                mij = M[i,j]
                mid = (xi2 - xj2)/mij
                top = mij*mij + mid*mid + 2*xi2 + 2*xj2
                G[i,j] = math.sqrt(top)/2
    return G
%%timeit
n = 1000
bench_1(n)
1 loop, best of 3: 1.61 s per loop
Using Numba to compile the function:
@numba.jit(nopython=True, parallel=True)
def bench_2(n):
    #M, X are given as numpy arrays
    G = np.zeros((n,n))
    M = np.random.rand(n, n)
    X = np.random.rand(n)
    for i in range(0,n):
        for j in range(i,n):
            xi = X[i]
            if i == j:
                G[i,j] = abs(xi)
            else:
                xi2 = xi*xi
                xj = X[j]
                xj2 = xj*xj
                mij = M[i,j]
                mid = (xi2 - xj2)/mij
                top = mij*mij + mid*mid + 2*xi2 + 2*xj2
                G[i,j] = math.sqrt(top)/2
    return G
%%timeit
n = 1000
bench_2(n)
The slowest run took 88.13 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 9.8 ms per loop
import numpy as np
import math
import random
from scipy.optimize import minimize
def matrixmult (A, B):
    rows_A = len(A)
    cols_A = len(A[0])
    rows_B = len(B)
    cols_B = len(B[0])
    Z = [[0 for row in range(rows_B)] for col in range(cols_A)]
    for i in range(cols_A):
        for j in range(rows_A):
            #for k in range(cols_A):
            Z[i][j] += A[i][j] * B[i][j]
    return Z
def constraint1(x):
    A=x
    rows_X = cols_X = len(x)
    ad = np.ones((len(x),1)) #makes a 7x1 array of ones
    ad1 = x.sum(axis=1) # makes 7x1 array, each element is sum of each rows
    ad2 = np.matrix(ad1)
    for i in range(len(x)):
        ad[i] = ad[i] - ad2[i] # sum of each row in a binary matrix must be 1 to indicate there is only one entrance or exit for each node
        #for j in range(cols_X):
            #ad = ad - ad1[i]
    return ad
def constraint2(x):
    rows_X = cols_X = len(x)
    ad3 = np.ones((1,len(x)))
    ad4 = x.sum(axis=0)
    ad5 = np.matrix(ad4)
    for i in range(len(x)):
        ad3[i] = ad3[i] - ad5[i]
        #for j in range(cols_X):
            #ad = ad - ad1[i]
    return ad3
def total(C):
    C = np.array([[np.nan,3,5,np.nan,np.nan,np.nan,3],[3,np.nan,3,7,np.nan,np.nan,11],[5,3,np.nan,3,np.nan,np.nan,np.nan],[np.nan,7,3,np.nan,3,9,11],[np.nan,np.nan,np.nan,3,np.nan,3,np.nan],[np.nan,np.nan,np.nan,9,3,np.nan,3],[3,11,np.nan,11,np.nan,3,np.nan]])
    X = [[0 for row in range(len(C))] for col in range(len(C[0]))]
    for i in range(len(C[0])):
        for j in range(len(C)):
            if math.isnan(C[i][j]) == False :
                X[i][j] += random.randint(0,1)
            else :
                X[i][j]==np.nan
    CX = matrixmult (C, X)
    cx = np.array(CX)
    x = np.matrix(X)
    print(x.sum(axis=1))
    print(x.sum(axis=0))
    print(x)
    print(cx)
    tot = 0
    for i in range(len(cx[0])):
        for j in range(len(cx)):
            if math.isnan(cx[i][j]) == False :
                #print (i,j)
                tot += cx[i][j]
    #for i in range(len(cx[0])):
        #for j in range(len(cx)):
            #if math.isnan(cx[i][j]) == False :
                #print (i,j)
    return tot
C = np.array([[np.nan,3,5,np.nan,np.nan,np.nan,3],[3,np.nan,3,7,np.nan,np.nan,11],[5,3,np.nan,3,np.nan,np.nan,np.nan],[np.nan,7,3,np.nan,3,9,11],[np.nan,np.nan,np.nan,3,np.nan,3,np.nan],[np.nan,np.nan,np.nan,9,3,np.nan,3],[3,11,np.nan,11,np.nan,3,np.nan]])
con1 = {'type' : 'eq', 'fun' : constraint1}
con2 = {'type' : 'eq', 'fun' : constraint2}
cons = [con1,con2]
path = minimize(total, 12,method='SLSQP', jac=None, bounds=None, tol=None, callback=None, constraints = cons)
print(path)
I need to implement the traveling salesman problem with linear programming, and I intend to use Python optimization tools. This is my first program in Python and my first optimization program.
Since two constraints force the traveling salesman to visit (enter and leave) every node exactly once, I wanted to create a binary selection matrix 'x' with the same dimensions as the cost matrix. Since there is one entrance to each node, every column of the selection matrix will sum to 1, and the same holds for each exit.
I have problems with the usage of the scipy.optimize.minimize method: I am not able to pass the selection matrix to the constraint functions. I would appreciate any help, thanks in advance. (Sub-tour elimination constraints are not implemented yet.)
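For reference, the two constraints described above (each entrance sum and each exit sum of the binary selection matrix equals 1) define an assignment problem. The sketch below is not the approach from the code above: it swaps in scipy.optimize.linear_sum_assignment instead of scipy.optimize.minimize, and it replaces the missing (NaN) edges with a large finite cost BIG, which is an assumption made here to forbid those edges. Sub-tour elimination is still not handled, so in general the result may consist of several disjoint cycles rather than a single tour.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

nan = np.nan
C = np.array([
    [nan, 3,   5,   nan, nan, nan, 3],
    [3,   nan, 3,   7,   nan, nan, 11],
    [5,   3,   nan, 3,   nan, nan, nan],
    [nan, 7,   3,   nan, 3,   9,   11],
    [nan, nan, nan, 3,   nan, 3,   nan],
    [nan, nan, nan, 9,   3,   nan, 3],
    [3,   11,  nan, 11,  nan, 3,   nan],
])

BIG = 1e6                                # hypothetical penalty for missing edges
cost = np.where(np.isnan(C), BIG, C)

# One selected entry per row and per column, minimizing the summed cost
rows, cols = linear_sum_assignment(cost)

X = np.zeros(C.shape, dtype=int)         # binary selection matrix
X[rows, cols] = 1
total_cost = cost[rows, cols].sum()

print(X)
print(total_cost)
```

If total_cost comes out at or above BIG, at least one forbidden edge was selected, which means no assignment exists that uses only the available edges.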
from cvxpy import *
import numpy as np
import math;
import random;
n = 7
#X = Bool(n , n)
#Y = Bool(n , 1)
#C = np.random.randint(1,5,(n,n))
C = np.array([[np.nan,3,5,np.nan,np.nan,np.nan,3],[3,np.nan,3,7,np.nan,np.nan,11],[5,3,np.nan,3,np.nan,np.nan,np.nan],[np.nan,7,3,np.nan,3,9,11],[np.nan,np.nan,np.nan,3,np.nan,3,np.nan],[np.nan,np.nan,np.nan,9,3,np.nan,3],[3,11,np.nan,11,np.nan,3,np.nan]])
#X = [[0 for row in range(len(C))] for col in range(len(C[0]))]
X = np.zeros((n,n))
for i in range(n):
    for j in range(n):
        if math.isnan(C[i][j]) == False :
            X[i][j] += random.randint(0,1)
        else :
            X[i][j]== np.nan
#x = np.array(X, dtype = np.float64)
P = C*X
nodes = []
tot = 0
for i in range(n):
    for j in range(n):
        if math.isnan(P[i][j]) == False :
            tot += P[i][j]
            if(P[i][j] >0):
                print (i,j)
                nodes.append((i,j))
print(nodes)
print(len(nodes))
objective = Minimize(tot)
constraints = []
constraints.append( sum_entries( X, axis=0 ) == 1 )
constraints.append( sum_entries( X, axis=1 ) == 1 )
#constraints.append( sum_entries(Y) == C )
prob = Problem(objective, constraints)
prob.solve(solver=GLPK_MI)
print (prob.value)
print(tot)
print(C)
print(X)
print(P)
#print(objective)
Now I have an edited version of the optimization code that uses the cvxpy package, but it does not minimize the objective. I could not find many cvxpy MILP examples. Any suggestions would be welcome. Thanks.
I would like to delete every data point that is within 10 cm of a previous data point.
This is what I have, but it takes a long time to run because my dataset is very large:
for i in range(len(data)):
    for j in range(i, len(data)):
        if (i == j):
            continue
        elif np.sqrt((data[i, 0]-data[j, 0])**2 + (data[i, 1]-data[j, 1])**2) <= 0.1:
            data[j, 0] = np.nan
data = data[~np.isnan(data).any(axis=1)]
Is there a pythonic way to do this?
Here is an approach using a KDTree:
import numpy as np
from scipy.spatial import cKDTree as KDTree
def cluster_data_KDTree(a, thr=0.1):
    t = KDTree(a)
    mask = np.ones(a.shape[:1], bool)
    idx = 0
    nxt = 1
    while nxt:
        # drop every point within thr of the current point (including itself)
        mask[t.query_ball_point(a[idx], thr)] = False
        # offset to the next point that is still kept; 0 means none is left
        nxt = mask[idx:].argmax()
        mask[idx] = True
        idx += nxt
    return a[mask]
Borrowing @Divakar's test case, we see that this delivers another 100x speedup on top of the 400x Divakar reports. Compared to OP, we extrapolate a ridiculous 40,000x:
np.random.seed(0)
data1 = np.random.rand(10000,2)
data2 = data1.copy()
from timeit import timeit
kwds = dict(globals=globals(), number=10)
print(timeit("cluster_data_KDTree(data1)", **kwds))
print(timeit("cluster_data_pdist_v1(data2)", **kwds))
np.random.seed(0)
data1 = np.random.rand(10000,2)
data2 = data1.copy()
out1 = cluster_data_KDTree(data1, thr=0.1)
out2 = cluster_data_pdist_v1(data2, dist_thresh = 0.1)
print(np.allclose(out1, out2))
Sample output:
0.05073001119308174
5.646531613077968
True
It turns out that this test case happens to be quite favorable to my approach because there are very few clusters and thus very few iterations.
If we drastically increase the number of clusters to about 3800 by changing the threshold to 0.01, KDTree still wins, but the speedup is reduced from 100x to 15x:
0.33647687803022563
5.28947562398389
True
We can use pdist with one-loop -
from scipy.spatial.distance import pdist
def cluster_data_pdist_v1(a, dist_thresh = 0.1):
    d = pdist(a)
    mask = d<=dist_thresh
    n = len(a)
    idx = np.concatenate(( [0], np.arange(n-1,0,-1).cumsum() ))
    start, stop = idx[:-1], idx[1:]
    idx_out = np.zeros(mask.sum(), dtype=int) # use np.empty for bit more speedup
    cur_start = 0
    for iterID,(i,j) in enumerate(zip(start, stop)):
        if iterID not in idx_out[:cur_start]:
            rm_idx = np.flatnonzero(mask[i:j])+iterID+1
            L = len(rm_idx)
            idx_out[cur_start:cur_start+L] = rm_idx
            cur_start += L
    return np.delete(a, idx_out[:cur_start], axis=0)
Benchmarking
Original approach -
def cluster_data_org(data, dist_thresh = 0.1):
    for i in range(len(data)):
        for j in range(i, len(data)):
            if (i == j):
                continue
            elif np.sqrt((data[i, 0]-data[j, 0])**2 +
                         (data[i, 1]-data[j, 1])**2) <= dist_thresh:
                data[j, 0] = np.nan
    return data[~np.isnan(data).any(axis=1)]
Runtime test, verification on random data in the range : [0,1) with 10,000 points -
In [207]: np.random.seed(0)
...: data1 = np.random.rand(10000,2)
...: data2 = data1.copy()
...:
...: out1 = cluster_data_org(data1, dist_thresh = 0.1)
...: out2 = cluster_data_pdist_v1(data2, dist_thresh = 0.1)
...: print np.allclose(out1, out2)
True
In [208]: np.random.seed(0)
...: data1 = np.random.rand(10000,2)
...: data2 = data1.copy()
In [209]: %timeit cluster_data_org(data1, dist_thresh = 0.1)
1 loop, best of 3: 1min 50s per loop
In [210]: %timeit cluster_data_pdist_v1(data2, dist_thresh = 0.1)
1 loop, best of 3: 287 ms per loop
Around 400x speedup for such a setup!
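As a reference for what both answers compute, the greedy rule from the question can be written with a single Python loop and a vectorised inner step: walk through the points in order, and let every point that is still kept remove all later points within the threshold. Points that were already removed do not remove anything, which matches the NaN behaviour of the original loop. This is a minimal sketch; the function name thin_points and the four sample points are made up for illustration.

```python
import numpy as np

def thin_points(points, thr=0.1):
    """Greedy thinning: a kept point removes every later point within thr."""
    points = np.asarray(points, dtype=float)
    keep = np.ones(len(points), dtype=bool)
    for i in range(len(points)):
        if not keep[i]:
            continue                      # removed points remove nothing
        # distances from point i to all later points, computed in one call
        dist = np.linalg.norm(points[i + 1:] - points[i], axis=1)
        keep[i + 1:][dist <= thr] = False  # writes through the slice view
    return points[keep]

pts = np.array([[0.0, 0.0], [0.05, 0.0], [0.12, 0.0], [1.0, 1.0]])
kept = thin_points(pts)
print(kept)
```

In the sample, the second point is removed by the first. The third point lies within 0.1 of the second but not of the first, so it survives, because the second point was already gone when its turn came.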
The following problem concerns evaluating many monomials (x**k * y**l * z**m) at many points.
I would like to compute the "inner power" of two numpy arrays, i.e.,
import numpy
a = numpy.random.rand(10, 3)
b = numpy.random.rand(3, 5)
out = numpy.ones((10, 5))
for i in range(10):
    for j in range(5):
        for k in range(3):
            out[i, j] *= a[i, k]**b[k, j]
print(out.shape)
If instead the line read
out[i, j] += a[i, k]*b[k, j]
this would be a number of inner products, computable with a simple dot or einsum.
Is it possible to perform the above loop in just one numpy line?
What about thinking of it in terms of logarithms:
import numpy
a = numpy.random.rand(10, 3)
b = numpy.random.rand(3, 5)
out = numpy.exp(numpy.matmul(numpy.log(a), b))
Since c_ij = prod(a_ik ** b_kj, k=1..K), it follows that log(c_ij) = sum(log(a_ik) * b_kj, k=1..K).
Note: Zeros in a may mess up the result (negatives too, but then the result would not be well defined anyway). I gave it a try and it does not seem to actually break; I don't know whether that behavior is guaranteed by NumPy, so to be safe you can add something like the following at the end, where eps is a small positive threshold of your choice:
out[numpy.logical_or.reduce(a < eps, axis=1)] = 0
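A quick way to check the identity is to compare the exp-log version with the triple loop on strictly positive inputs, where the logarithm is well defined. The seed and the 0.1 offset below are arbitrary choices made for this check.

```python
import numpy

rng = numpy.random.RandomState(0)
a = rng.rand(10, 3) + 0.1      # strictly positive, so log(a) is finite
b = rng.rand(3, 5)

# Reference: the triple loop from the question
out_loop = numpy.ones((10, 5))
for i in range(10):
    for j in range(5):
        for k in range(3):
            out_loop[i, j] *= a[i, k] ** b[k, j]

# A product of powers becomes a sum of logs, i.e. one matrix product
out_log = numpy.exp(numpy.matmul(numpy.log(a), b))
agree = numpy.allclose(out_loop, out_log)
print(agree)
```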
You can use broadcasting after extending those arrays to 3D versions -
(a[:,:,None]**b[None,:,:]).prod(axis=1)
Simply put -
(a[...,None]**b[None]).prod(1)
Basically, we keep the shared axis (the last axis of a and the first axis of b) aligned, broadcast the element-wise powers across the remaining axes (the first axis of a and the last axis of b), and then take the product over the shared axis. Schematically, using the shapes from the given sample -
a : 10 x 3 x 1
b : 1 x 3 x 5
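The shape bookkeeping can be checked directly. In this sketch (inputs and seed are arbitrary), the intermediate array has shape (10, 3, 5), so its memory use grows with the product of all three dimensions, which is worth keeping in mind for large inputs.

```python
import numpy

rng = numpy.random.RandomState(0)
a = rng.rand(10, 3)
b = rng.rand(3, 5)

powers = a[:, :, None] ** b[None, :, :]   # (10, 3, 1) ** (1, 3, 5) -> (10, 3, 5)
out = powers.prod(axis=1)                 # reduce over the shared axis -> (10, 5)

# Reference built entry by entry
expected = numpy.array([[numpy.prod(a[i] ** b[:, j]) for j in range(5)]
                        for i in range(10)])
print(powers.shape, out.shape)
print(numpy.allclose(out, expected))
```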
Two more solutions:
Inlining
numpy.array([
numpy.prod([a[:, i]**bb[i] for i in range(len(bb))], axis=0)
for bb in b.T
]).T
and using power.outer:
numpy.prod([numpy.power.outer(a[:, k], b[k]) for k in range(len(b))], axis=0)
Both are a bit slower than the broadcasting solution.
Even with some logic to accommodate for zero and negative values, the exp-log solution takes the cake.
Code to reproduce the plot:
import numpy
import perfplot
def loop(data):
    a, b = data
    m = a.shape[0]
    n = b.shape[1]
    out = numpy.ones((m, n))
    for i in range(m):
        for j in range(n):
            for k in range(3):
                out[i, j] *= a[i, k]**b[k, j]
    return out
def broadcasting(data):
    a, b = data
    return (a[..., None]**b[None]).prod(1)
def log_exp(data):
    a, b = data
    neg_a = numpy.zeros(a.shape, dtype=int)
    neg_a[a < 0.0] = 1
    odd_b = numpy.zeros(b.shape, dtype=int)
    odd_b[b % 2 == 1] = 1
    negative_count = numpy.dot(neg_a, odd_b)
    out = (-1)**negative_count * numpy.exp(
        numpy.matmul(
            numpy.log(abs(a), where=abs(a) > 0.0),
            b
        ))
    zero_a = numpy.zeros(a.shape, dtype=int)
    zero_a[a == 0.0] = 1
    pos_b = numpy.zeros(b.shape, dtype=int)
    pos_b[b > 0] = 1
    zero_count = numpy.dot(zero_a, pos_b)
    out[zero_count > 0] = 0.0
    return out
def inline(data):
    a, b = data
    return numpy.array([
        numpy.prod([a[:, i]**bb[i] for i in range(len(bb))], axis=0)
        for bb in b.T
    ]).T
def outer_power(data):
    a, b = data
    return numpy.prod([
        numpy.power.outer(a[:, k], b[k]) for k in range(len(b))
    ], axis=0)
perfplot.show(
setup=lambda n: (
numpy.random.rand(n, 3) - 0.5,
numpy.random.randint(0, 10, (3, n))
),
n_range=[2**k for k in range(11)],
repeat=10,
kernels=[
loop,
broadcasting,
inline,
log_exp,
outer_power
],
logx=True,
logy=True,
xlabel='len(a)',
)
import numpy
a = numpy.random.rand(10, 3)
b = numpy.random.rand(3, 5)
out = [[numpy.prod([a[i, k]**b[k, j] for k in range(3)]) for j in range(5)] for i in range(10)]
I have a numpy operation that looks like the following:
for i in range(i_max):
    for j in range(j_max):
        r[i, j, x[i, j], y[i, j]] = c[i, j]
where x, y and c have the same shape.
Is it possible to use numpy's advanced indexing to speed this operation up?
I tried using:
i = numpy.arange(i_max)
j = numpy.arange(j_max)
r[i, j, x, y] = c
However, I didn't get the result I expected.
Using linear indexing -
d0,d1,d2,d3 = r.shape
np.put(r,np.arange(i_max)[:,None]*d1*d2*d3 + np.arange(j_max)*d2*d3 + x*d3 +y,c)
Benchmarking and verification
Define functions -
def linear_indx(r,x,y,c,i_max,j_max):
    d0,d1,d2,d3 = r.shape
    np.put(r,np.arange(i_max)[:,None]*d1*d2*d3 + np.arange(j_max)*d2*d3 + x*d3 +y,c)
    return r
def org_app(r,x,y,c,i_max,j_max):
    for i in range(i_max):
        for j in range(j_max):
            r[i, j, x[i,j], y[i,j]] = c[i,j]
    return r
Setup input arrays and benchmark -
In [134]: # Setup input arrays
...: i_max = 40
...: j_max = 50
...: D0 = 60
...: D1 = 70
...: N = 80
...:
...: r = np.zeros((D0,D1,N,N))
...: c = np.random.rand(i_max,j_max)
...:
...: x = np.random.randint(0,N,(i_max,j_max))
...: y = np.random.randint(0,N,(i_max,j_max))
...:
In [135]: # Make copies for testing, as both functions make in-situ changes
...: r1 = r.copy()
...: r2 = r.copy()
...:
In [136]: # Verify results by comparing with original loopy approach
...: np.allclose(linear_indx(r1,x,y,c,i_max,j_max),org_app(r2,x,y,c,i_max,j_max))
Out[136]: True
In [137]: # Make copies for testing, as both functions make in-situ changes
...: r1 = r.copy()
...: r2 = r.copy()
...:
In [138]: %timeit linear_indx(r1,x,y,c,i_max,j_max)
10000 loops, best of 3: 115 µs per loop
In [139]: %timeit org_app(r2,x,y,c,i_max,j_max)
100 loops, best of 3: 2.25 ms per loop
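The flat index used above is the usual row-major offset: for an array of shape (d0, d1, d2, d3), element [i, j, x, y] sits at position i*d1*d2*d3 + j*d2*d3 + x*d3 + y of the flattened array. The sketch below checks that formula against the loop on a small array; the sizes and the seed are arbitrary.

```python
import numpy as np

rng = np.random.RandomState(0)
i_max, j_max, N = 4, 5, 6
c = rng.rand(i_max, j_max)
x = rng.randint(0, N, (i_max, j_max))
y = rng.randint(0, N, (i_max, j_max))

# Reference: the original double loop
r_loop = np.zeros((i_max, j_max, N, N))
for i in range(i_max):
    for j in range(j_max):
        r_loop[i, j, x[i, j], y[i, j]] = c[i, j]

# Flat (row-major) offsets, built by broadcasting to shape (i_max, j_max)
r_flat = np.zeros((i_max, j_max, N, N))
d0, d1, d2, d3 = r_flat.shape
flat_idx = (np.arange(i_max)[:, None] * d1 * d2 * d3
            + np.arange(j_max) * d2 * d3
            + x * d3 + y)
np.put(r_flat, flat_idx, c)

print(np.array_equal(r_loop, r_flat))
```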
The indexing arrays need to be broadcastable for this to work. The only change needed is to add an axis to the first index i to match the shape with the rest. The quick way to accomplish this is by indexing with None (which is equivalent to numpy.newaxis):
i = numpy.arange(i_max)
j = numpy.arange(j_max)
r[i[:,None], j, x, y] = c
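To see why the original attempt fails and the fix works: with i of shape (i_max,) and j of shape (j_max,), the index arrays either cannot broadcast against x and y of shape (i_max, j_max) or, when i_max equals j_max, i is silently matched with the column position instead of the row. Turning i into a column of shape (i_max, 1) makes all four index arrays broadcast to (i_max, j_max). A small check against the loop, with arbitrary sizes and seed:

```python
import numpy as np

rng = np.random.RandomState(0)
i_max, j_max, N = 4, 5, 6
c = rng.rand(i_max, j_max)
x = rng.randint(0, N, (i_max, j_max))
y = rng.randint(0, N, (i_max, j_max))

# Reference: the original double loop
r_loop = np.zeros((i_max, j_max, N, N))
for a in range(i_max):
    for b in range(j_max):
        r_loop[a, b, x[a, b], y[a, b]] = c[a, b]

i = np.arange(i_max)
j = np.arange(j_max)
r_adv = np.zeros((i_max, j_max, N, N))
# index shapes: (4, 1), (5,), (4, 5), (4, 5) -> all broadcast to (4, 5)
r_adv[i[:, None], j, x, y] = c

print(np.array_equal(r_loop, r_adv))
```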