Numpy calculate gradients across matrices - python

I am using the following to calculate the running gradients between data in the same indexes across multiple matrices:
import numpy as np
array_1 = np.array([[1,2,3], [4,5,6]])
array_2 = np.array([[2,3,4], [5,6,7]])
array_3 = np.array([[1,8,9], [9,6,7]])
flat_1 = array_1.flatten()
flat_2 = array_2.flatten()
flat_3 = array_3.flatten()
print('flat_1: {0}'.format(flat_1))
print('flat_2: {0}'.format(flat_2))
print('flat_3: {0}'.format(flat_3))
data = []
gradient_list = []
for item in zip(flat_1,flat_2,flat_3):
    data.append(list(item))
    print('items: {0}'.format(list(item)))
    grads = np.gradient(list(item))
    print('grads: {0}'.format(grads))
    gradient_list.append(grads)
grad_array=np.array(gradient_list)
print('grad_array: {0}'.format(grad_array))
This doesn't look like an optimal way of doing this - is there a vectorized way of calculating gradients between data in 2d arrays?

numpy.gradient takes an axis parameter, so you can stack the arrays and then calculate the gradient along the stacking axis. For instance, stack with np.dstack (which joins the arrays along the third axis) and pass axis=2 to np.gradient. If you need a different shape for the result, use the reshape method:
np.gradient(np.dstack((array_1, array_2, array_3)), axis=2)
#array([[[ 1. , 0. , -1. ],
# [ 1. , 3. , 5. ],
# [ 1. , 3. , 5. ]],
# [[ 1. , 2.5, 4. ],
# [ 1. , 0.5, 0. ],
# [ 1. , 0.5, 0. ]]])
Or, if you flatten the arrays first:
np.gradient(np.column_stack((array_1.ravel(), array_2.ravel(), array_3.ravel())), axis=1)
#array([[ 1. , 0. , -1. ],
# [ 1. , 3. , 5. ],
# [ 1. , 3. , 5. ],
# [ 1. , 2.5, 4. ],
# [ 1. , 0.5, 0. ],
# [ 1. , 0.5, 0. ]])
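As a quick sanity check, the sketch below compares the loop from the question with a stacked, vectorized call. It uses np.stack with axis=-1 instead of np.dstack, which is an assumption made here for illustration; for 2D inputs the two should produce the same layout.

```python
import numpy as np

array_1 = np.array([[1, 2, 3], [4, 5, 6]])
array_2 = np.array([[2, 3, 4], [5, 6, 7]])
array_3 = np.array([[1, 8, 9], [9, 6, 7]])

# Loop version from the question: one np.gradient call per element position.
loop_result = np.array([
    np.gradient(list(item))
    for item in zip(array_1.ravel(), array_2.ravel(), array_3.ravel())
])

# Vectorized version: stack along a new last axis, then differentiate along it.
stacked = np.stack((array_1, array_2, array_3), axis=-1)  # shape (2, 3, 3)
vectorized = np.gradient(stacked, axis=-1)

# Flattening the two leading axes recovers the loop's (6, 3) layout.
same = np.allclose(vectorized.reshape(-1, 3), loop_result)
print(same)
```

The reshape at the end is only needed if you want the flat, one-row-per-element layout that the loop produced.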

Related

Combine array of indices with array of values

I have an array in the following form where the first two columns are supposed to be indices of a 2-dimensional array and the following columns are arbitrary values.
data = np.array([[ 0. , 1. , 48. , 4. ],
[ 1. , 2. , 44. , 4.4],
[ 1. , 1. , 34. , 2.3],
[ 0. , 2. , 55. , 2.2],
[ 0. , 0. , 42. , 2. ],
[ 1. , 0. , 22. , 1. ]])
How do I combine the indices data[:,:2] with their values data[:,2:] such that the resulting array is accessible by the indices in the first two columns?
In my example that would be:
result = np.array([[[42. , 2. ], [48. , 4. ], [55. , 2.2]],
[[22. , 1. ], [34. , 2.3], [44. , 4.4]]])
I know that there is a trivial solution using python loops. But performance is a concern since I'm dealing with a huge amount of data. Specifically it's output of another program that I need to process.
Maybe there is a relatively trivial numpy solution as well. But I'm kind of stuck.
If it helps the following can be safely assumed:
All numbers in the first two columns are whole numbers (although the array consists of floats).
Every possible index (or rather combinations of indices) in the original array is used exactly once. I.e. there is guaranteed to be exactly one entry of the form [i, j, ...].
The indices start at 0 and I know the highest indices beforehand.
Edit:
Hmm. I see now how my example is misleading. The truth is that some of my input arrays are sorted, but that's unreliable. So I shouldn't assume anything about the order. I reordered some rows in my example to make it clearer. In case anyone wants to make sense of the answer and comment below: In my original question the array appeared to be sorted by the first two columns.
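Find the row, column, and depth sizes from your data array, then fill the output as below. Note that the reshape relies on the rows being sorted by the first two columns: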
import numpy as np
data = np.array([[ 0. , 0. , 42. , 2. ],
[ 0. , 1. , 48. , 4. ],
[ 0. , 2. , 55. , 2.2],
[ 1. , 0. , 22. , 1. ],
[ 1. , 1. , 34. , 2.3],
[ 1. , 2. , 44. , 4.4]])
row = int(max(data[:,0]))+1
col = int(max(data[:,1]))+1
depth = len(data[0, 2:])
out = np.zeros([row, col, depth])
out = data[:, 2:].reshape(row,col,depth)
print(out)
Output:
[[[42. 2. ]
[48. 4. ]
[55. 2.2]]
[[22. 1. ]
[34. 2.3]
[44. 4.4]]]
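If the rows are not sorted, as the edit to the question describes, the reshape above would put values in the wrong cells. A minimal sketch of one alternative, assuming every (i, j) pair occurs exactly once as the question states, is to use the first two columns as integer indices and assign the value columns directly. The names rows, cols and out are chosen here for illustration.

```python
import numpy as np

# Unsorted input, as in the edited question.
data = np.array([[0., 1., 48., 4. ],
                 [1., 2., 44., 4.4],
                 [1., 1., 34., 2.3],
                 [0., 2., 55., 2.2],
                 [0., 0., 42., 2. ],
                 [1., 0., 22., 1. ]])

# The index columns hold whole numbers stored as floats; convert them.
rows = data[:, 0].astype(np.intp)
cols = data[:, 1].astype(np.intp)

# One cell per (row, col) pair, with as many entries as there are value columns.
out = np.empty((rows.max() + 1, cols.max() + 1, data.shape[1] - 2))

# Integer-array indexing writes each row's values to its (row, col) cell,
# regardless of the order of the rows in data.
out[rows, cols] = data[:, 2:]
print(out)
```

Because np.empty leaves memory uninitialized, this is only safe when every index combination is present; np.zeros or np.full with a fill value may be a better choice otherwise.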
You can use numba in nopython parallel mode, which is designed to accelerate plain Python loops. As szczesny mentioned in the comments, this will be one of the most efficient methods in terms of performance, and it does not need the rows to be sorted. The code below is written for exactly two value columns; it can be modified if the number of columns varies:
import numba as nb
import numpy as np

# without signature --> @nb.njit(parallel=True)
@nb.njit("float64[:, :, ::1](float64[:, ::1])", parallel=True)
def numba_(data):
    # integer copy of the two index columns (int64 avoids overflow for indices above 127)
    data_ = data[:, :2].astype(np.int64)
    res = np.empty((data_[:, 0].max() + 1, data_[:, 1].max() + 1, 2))
    for i in nb.prange(data_.shape[0]):
        res[data_[i, 0], data_[i, 1], 0] = data[i, 2]
        res[data_[i, 0], data_[i, 1], 1] = data[i, 3]
    return res
[Figure: benchmark without the sorting step and with the proposed NumPy code corrected; horizontal axis is data.shape[0]]
A more general version that handles more than two value columns:
@nb.njit("float64[:, :, ::1](float64[:, ::1])", parallel=True)
def numba_(data):
    # integer copy of the two index columns (int64 avoids overflow for indices above 127)
    data_ = data[:, :2].astype(np.int64)
    assert data_.shape[0] == data.shape[0]
    depth = data[:, 2:].shape[1]
    res = np.empty((data_[:, 0].max() + 1, data_[:, 1].max() + 1, depth))
    for i in nb.prange(data_.shape[0]):
        for j in range(depth):
            res[data_[i, 0], data_[i, 1], j] = data[i, j + 2]
    return res

What is the Numpy best practice for a function that works with scalars or vectors as inputs?

I often write equations using numpy that take scalars as inputs and return another scalar or a vector. Then later I find that I would like to do the same thing, but with one or more vectors as inputs. I'm trying to figure out a way to make one function work in both cases without sprinkling in all sorts of if tests calling np.isscalar() or np.atleast_1d() (unless that's the only way).
Is it possible to handle scalar and vector inputs with only one function or am I stuck with multiple implementations?
As an example, here are some functions to convert x, y angles into a north, east, down unit vector (assume the angles are in radians already). I've included a scalar version, vectorized version, and one using np.meshgrid() with calls to the scalar version. I'm trying to avoid doing explicit for looping or calls to np.vectorize().
Scalar Example
import numpy as np
def xy_to_nez_scalar(x, y):
    n = np.sin(y)
    e = np.sin(x) * np.cos(y)
    z = np.cos(x) * np.cos(y)
    return np.array([n, e, z])
x = 1
y = 1
nez1 = xy_to_nez_scalar(x, y)
print(f'{nez1=}')
print(f'{nez1.shape=}\n')
producing
nez1=array([0.84147098, 0.45464871, 0.29192658])
nez1.shape=(3,)
Vectorized Example
# X and Y are arrays, so we'll follow the convention that X
# is a column of N rows and Y is a row of M columns. Then we would
# return an array of shape (N, M, 3).
def xy_to_nez_vector(x, y):
    n = np.sin(y)
    e = np.sin(x[:, np.newaxis]) * np.cos(y)
    z = np.cos(x[:, np.newaxis]) * np.cos(y)
    # make sure n has the same shape as e and z
    nn = np.broadcast_to(n, e.shape)
    nez = np.stack([nn, e, z], axis=-1)
    return nez
x_array = np.arange(4)
y_array = np.arange(2)
nez2 = xy_to_nez_vector(x_array, y_array)
print(f'{nez2=}')
print(f'{nez2.shape=}\n')
producing
nez2=array([[[ 0. , 0. , 1. ],
[ 0.84147098, 0. , 0.54030231]],
[[ 0. , 0.84147098, 0.54030231],
[ 0.84147098, 0.45464871, 0.29192658]],
[[ 0. , 0.90929743, -0.41614684],
[ 0.84147098, 0.4912955 , -0.2248451 ]],
[[ 0. , 0.14112001, -0.9899925 ],
[ 0.84147098, 0.07624747, -0.53489523]]])
nez2.shape=(4, 2, 3)
Meshgrid Example
# note this produces a (3, M, N) result.
xv, yv = np.meshgrid(x_array, y_array)
nez3 = xy_to_nez_scalar(xv, yv)
print(f'{nez3=}')
print(f'{nez3.shape=}\n')
# fix things by transposing.
nez4 = nez3.T
print(f'{nez4=}')
print(f'{nez4.shape=}\n')
print(np.allclose(nez2, nez4))
producing
nez3=array([[[ 0. , 0. , 0. , 0. ],
[ 0.84147098, 0.84147098, 0.84147098, 0.84147098]],
[[ 0. , 0.84147098, 0.90929743, 0.14112001],
[ 0. , 0.45464871, 0.4912955 , 0.07624747]],
[[ 1. , 0.54030231, -0.41614684, -0.9899925 ],
[ 0.54030231, 0.29192658, -0.2248451 , -0.53489523]]])
nez3.shape=(3, 2, 4)
nez4=array([[[ 0. , 0. , 1. ],
[ 0.84147098, 0. , 0.54030231]],
[[ 0. , 0.84147098, 0.54030231],
[ 0.84147098, 0.45464871, 0.29192658]],
[[ 0. , 0.90929743, -0.41614684],
[ 0.84147098, 0.4912955 , -0.2248451 ]],
[[ 0. , 0.14112001, -0.9899925 ],
[ 0.84147098, 0.07624747, -0.53489523]]])
nez4.shape=(4, 2, 3)
True
This may be bordering on a frame challenge, but I would suggest changing your implementation philosophy slightly to conform to what most numpy functions do already. This has two advantages: (1) experienced numpy users will know what to expect from your functions, and (2) the scalar-vector problems go away.
Normally if faced with a function like xy_to_nez(x, y), I would expect it to take arrays x and y, and return something that has the broadcasted shape of the two, with 3 as either the first or last dimension. The choice of putting 3 in the last dimension is totally fine here. However, magically meshing the arrays together instead of broadcasting them is a rather un-num-pythonic thing to do.
Have the user tell you what they want explicitly (a core tenet of python, item #2 in import this). For example, given a broadcasting interface as suggested above, your scalar function is nearly complete. The only change you would need to make is to stack along the last axis instead of the first, as np.array does:
def xy_to_nez(x, y):
    n = np.sin(y)
    e = np.sin(x) * np.cos(y)
    z = np.cos(x) * np.cos(y)
    return np.stack(np.broadcast_arrays(n, e, z), -1)
I would expect the following three examples to work as nez1, nez2 and nez4:
>>> xy_to_nez(1, 1) # shape: 3
array([0.84147098, 0.45464871, 0.29192658])
>>> xy_to_nez(np.arange(4)[:, None], np.arange(2)) # shape: 4, 2
array([[[ 0. , 0. , 1. ],
[ 0.84147098, 0. , 0.54030231]],
[[ 0. , 0.84147098, 0.54030231],
[ 0.84147098, 0.45464871, 0.29192658]],
[[ 0. , 0.90929743, -0.41614684],
[ 0.84147098, 0.4912955 , -0.2248451 ]],
[[ 0. , 0.14112001, -0.9899925 ],
[ 0.84147098, 0.07624747, -0.53489523]]])
>>> xy_to_nez(*np.meshgrid(np.arange(4), np.arange(2), indexing='ij'))
array([[[ 0. , 0. , 1. ],
[ 0.84147098, 0. , 0.54030231]],
[[ 0. , 0.84147098, 0.54030231],
[ 0.84147098, 0.45464871, 0.29192658]],
[[ 0. , 0.90929743, -0.41614684],
[ 0.84147098, 0.4912955 , -0.2248451 ]],
[[ 0. , 0.14112001, -0.9899925 ],
[ 0.84147098, 0.07624747, -0.53489523]]])
In the third example, I corrected the problem with nez3 by passing indexing='ij' to np.meshgrid. The issue there was not with vectorization, but with the shape of the meshgrid you were passing in.
I would not expect xy_to_nez(np.arange(4), np.arange(2)) to work at all: the arrays don't broadcast together and we shouldn't try to figure out how to combine them. To see why, pretend that they had three random dimensions each. Do you interleave? Do you place one set after the other? What if some of the dimensions broadcast but others don't? Leave it up to the user to eliminate those questions from consideration.
At the same time:
>>> xy_to_nez(np.arange(4), np.arange(4)) # Shape: 4, 3
array([[ 0. , 0. , 1. ],
[ 0.84147098, 0.45464871, 0.29192658],
[ 0.90929743, -0.37840125, 0.17317819],
[ 0.14112001, -0.13970775, 0.98008514]])
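To make the broadcasting behaviour described above concrete, here is a small sketch that reuses the xy_to_nez function from this answer and records the result shape for each kind of input. The variable names (shape_scalar, shape_outer, shape_paired, mismatch_raises) are chosen here for illustration only.

```python
import numpy as np

def xy_to_nez(x, y):
    n = np.sin(y)
    e = np.sin(x) * np.cos(y)
    z = np.cos(x) * np.cos(y)
    return np.stack(np.broadcast_arrays(n, e, z), -1)

# Two scalars: the result is a single unit vector.
shape_scalar = xy_to_nez(1, 1).shape

# Column of 4 against row of 2: broadcasts to a (4, 2) grid of vectors.
shape_outer = xy_to_nez(np.arange(4)[:, None], np.arange(2)).shape

# Equal-length 1D arrays: paired element by element.
shape_paired = xy_to_nez(np.arange(4), np.arange(4)).shape

# Lengths 4 and 2 do not broadcast, so NumPy raises instead of guessing.
try:
    xy_to_nez(np.arange(4), np.arange(2))
    mismatch_raises = False
except ValueError:
    mismatch_raises = True

print(shape_scalar, shape_outer, shape_paired, mismatch_raises)
```

The caller decides how the inputs combine by choosing their shapes; the function itself never has to test whether it received scalars or arrays.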

Vectorize an index-based matrix operation in numpy

How can I vectorize the following loop?
def my_fnc():
    m = np.arange(27.).reshape((3,3,3))
    ret = np.empty_like(m)
    it = np.nditer(m, flags=['multi_index'])
    for x in it:
        i,j,k = it.multi_index
        ret[i,j,k] = x / m[i,j,i]
    return ret
Basically I'm dividing each value in m by something similar to a diagonal. Not all values in m will be different, the arange is just an example.
Thanks in advance! ~
P.S.: here's the output of the function above, don't mind the nans :)
array([[[ nan, inf, inf],
[ 1. , 1.33333333, 1.66666667],
[ 1. , 1.16666667, 1.33333333]],
[[ 0.9 , 1. , 1.1 ],
[ 0.92307692, 1. , 1.07692308],
[ 0.9375 , 1. , 1.0625 ]],
[[ 0.9 , 0.95 , 1. ],
[ 0.91304348, 0.95652174, 1. ],
[ 0.92307692, 0.96153846, 1. ]]])
Use advanced indexing to get the equivalent of m[i,j,i] for all i and j in one go, and then simply divide the input array by it:
r = np.arange(len(m))
ret = m/m[r,:,r,None] # Add new axis with None to allow for broadcasting
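The sketch below checks the indexing trick against an explicit triple loop. It starts the example data at 1 rather than 0, an assumption made here only to avoid the division by zero that produces nan and inf in the question's output.

```python
import numpy as np

# Values 1..27 so that no divisor is zero.
m = np.arange(1., 28.).reshape((3, 3, 3))

# Reference result: the element-by-element definition from the question.
expected = np.empty_like(m)
for i in range(3):
    for j in range(3):
        for k in range(3):
            expected[i, j, k] = m[i, j, k] / m[i, j, i]

# Vectorized: the two index arrays are paired up, so divisor[i, j] == m[i, j, i].
r = np.arange(len(m))
divisor = m[r, :, r]          # shape (3, 3)

# A trailing axis of length 1 lets the divisor broadcast over k.
ret = m / divisor[..., None]

print(np.allclose(ret, expected))
```

Splitting the indexing and the new axis into two steps is equivalent to the one-line m[r,:,r,None] form; it is written out here only to make the intermediate shape visible.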

Why does python numpy std() make unwanted spaces?

'car3.csv' file download link
import csv
num = open('car3.csv')
nums = csv.reader(num)
nums_list = []
for i in nums:
    nums_list.append(i)
import numpy as np
nums_arr = np.array(nums_list, dtype = np.float32)
print(nums_arr)
print(np.std(nums_arr, axis=0))
The result is this.
[[ 1. 1. 2.]
[ 1. 1. 2.]
[ 1. 1. 2.]
...,
[ 0. 0. 5.]
[ 0. 0. 5.]
[ 0. 0. 5.]]
[ 0.5 0.5 1.11803401]
There are lots of spaces that I didn't expect.
How can I handle these anyway?
That is not a spacing problem. All you need to do is save the output of the standard deviation. Then you can access each value like this:
std_arr = np.std(nums_arr, axis=0) # array which holds std of each column
# now, you can access them by indexing:
print(std_arr[0]) # output here is 0.5
print(std_arr[1]) # output here is 0.5
print(std_arr[2]) # output here is 1.118034
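The extra spaces appear to come from NumPy padding every element to a common width when it prints an array, so the stored values themselves are unaffected. If the goal is a more compact printout, one option is to format the values yourself, as in this sketch. The std_arr values are copied from the output above rather than read from car3.csv.

```python
import numpy as np

# Stand-in for np.std(nums_arr, axis=0); values taken from the printed output.
std_arr = np.array([0.5, 0.5, 1.118034], dtype=np.float32)

# Format each element explicitly and join with single spaces.
line = " ".join(f"{float(v):.4f}" for v in std_arr)
print(line)

# Alternatively, ask NumPy for a string with a fixed number of decimals.
text = np.array2string(std_arr, precision=3, floatmode="fixed")
print(text)
```

np.set_printoptions accepts the same precision and floatmode settings if you prefer to change how all arrays are printed.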

Predicting missing values in recommender System

I am trying to implement Non-negative Matrix Factorization so as to find the missing values of a matrix for a Recommendation Engine Project. I am using the nimfa library to implement matrix factorization. But can't seem to figure out how to predict the missing values.
The missing values in this matrix are represented by 0.
a=[[ 1. 0.45643546 0. 0.1 0.10327956 0.0225877 ]
[ 0.15214515 1. 0.04811252 0.07607258 0.23570226 0.38271325]
[ 0. 0.14433757 1. 0.07905694 0. 0.42857143]
[ 0.1 0.22821773 0.07905694 1. 0. 0.27105237]
[ 0.06885304 0.47140452 0. 0. 1. 0.13608276]
[ 0.00903508 0.4592559 0.17142857 0.10842095 0.08164966 1. ]]
import numpy
import nimfa
model = nimfa.Lsnmf(a, max_iter=100000, rank=4)
# fit the model
fit = model()
# get U and V matrices from fit
U = fit.basis()
V = fit.coef()
print(numpy.dot(U, V))
But the answer given is nearly the same as a, so I can't predict the zero values.
Please tell me which method to use or any other implementations possible and any possible resources.
I want to use this function to minimize the error in predicting the values.
error=|| a - UV ||_F + c*||U||_F + c*||V||_F
where _F denotes the Frobenius norm.
I have not used nimfa before, so I cannot say exactly how to do it there, but with sklearn you can use a preprocessor to impute the missing values, like this:
In [28]: import numpy as np
In [29]: from sklearn.preprocessing import Imputer
# prepare a numpy array
In [30]: a = np.array(a)
In [31]: a
Out[31]:
array([[ 1. , 0.45643546, 0. , 0.1 , 0.10327956,
0.0225877 ],
[ 0.15214515, 1. , 0.04811252, 0.07607258, 0.23570226,
0.38271325],
[ 0. , 0.14433757, 1. , 0.07905694, 0. ,
0.42857143],
[ 0.1 , 0.22821773, 0.07905694, 1. , 0. ,
0.27105237],
[ 0.06885304, 0.47140452, 0. , 0. , 1. ,
0.13608276],
[ 0.00903508, 0.4592559 , 0.17142857, 0.10842095, 0.08164966,
1. ]])
In [32]: pre = Imputer(missing_values=0, strategy='mean')
# transform missing_values as "0" using mean strategy
In [33]: pre.fit_transform(a)
Out[33]:
array([[ 1. , 0.45643546, 0.32464951, 0.1 , 0.10327956,
0.0225877 ],
[ 0.15214515, 1. , 0.04811252, 0.07607258, 0.23570226,
0.38271325],
[ 0.26600665, 0.14433757, 1. , 0.07905694, 0.35515787,
0.42857143],
[ 0.1 , 0.22821773, 0.07905694, 1. , 0.35515787,
0.27105237],
[ 0.06885304, 0.47140452, 0.32464951, 0.27271009, 1. ,
0.13608276],
[ 0.00903508, 0.4592559 , 0.17142857, 0.10842095, 0.08164966,
1. ]])
You can read more here.
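For readers on newer scikit-learn releases: the Imputer class in sklearn.preprocessing has since been removed, and sklearn.impute.SimpleImputer fills the same role. The sketch below reproduces the column-mean strategy shown above with plain NumPy, so it does not depend on either class. The names missing, col_means and filled are chosen here for illustration, and the approach assumes each column has at least one observed value.

```python
import numpy as np

a = np.array([
    [1.,         0.45643546, 0.,         0.1,        0.10327956, 0.0225877 ],
    [0.15214515, 1.,         0.04811252, 0.07607258, 0.23570226, 0.38271325],
    [0.,         0.14433757, 1.,         0.07905694, 0.,         0.42857143],
    [0.1,        0.22821773, 0.07905694, 1.,         0.,         0.27105237],
    [0.06885304, 0.47140452, 0.,         0.,         1.,         0.13608276],
    [0.00903508, 0.4592559,  0.17142857, 0.10842095, 0.08164966, 1.        ],
])

# Zeros mark missing entries in this matrix.
missing = (a == 0)

# Mean of the observed entries in each column. Zeros add nothing to the sum,
# so only the count needs to exclude them.
col_means = a.sum(axis=0) / (~missing).sum(axis=0)

# Replace each missing entry with the mean of its column.
filled = np.where(missing, col_means, a)
print(filled)
```

Note that this only fills gaps with a column average. It does not predict values from the factorization, so it is a preprocessing step rather than a replacement for the regularized error function in the question.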
