I am converting some code from MATLAB to Python.
Given the values
N = 100
V = [[ -7.94627203e+01 -1.81562235e+02 -3.05418070e+02 -2.38451033e+02][ 9.43740653e+01 1.69312771e+02 1.68545575e+01 -1.44450299e+02][ 5.61599000e+00 8.76135909e+01 1.18959245e+02 -1.44049237e+02]]
V is a NumPy array.
for i = 1:N
    L(i) = sqrt(norm(v(:,i)));
    if L(i) > 0.0001
        q(:,i) = v(:,i)/L(i);
    else
        q(:,i) = v(:,i)*0.0001;
    end
end
I have converted this code to:
L = []
q = []
for i in range(1, (N +1)):
    L.insert((i -1),np.sqrt( np.linalg.norm(v[:, (i -1)])))
    if L[(i -1)] > 0.0001:
        q.insert((i -1), (v[:, (i -1)] / L[(i -1)]).tolist())
    else:
        q.insert((i -1), (v[:, (i -1)] * 0.0001).tolist())
q = np.array(q)
return q, len_
But in MATLAB the resulting dimensions are 3 x 4, whereas I am getting 4 x 3 in Python. Can anyone tell me where I am making a mistake?
You are inserting lists of length 3 into q. When you finish the loop that creates q, q is a list of 4 items, where each item is a list of length 3. So np.array(q) creates an array with shape 4x3. You could change the second-to-last line to this:
q = np.array(q).T
Or, you can use numpy more effectively to eliminate all the explicit for loops. For example, if you are using numpy 1.8 or later, the norm function accepts an axis argument.
Here's a vectorized version of your code.
First, some setup for this example.
In [152]: np.set_printoptions(precision=3)
In [153]: np.random.seed(111)
Create some data to work with.
In [154]: v = 5e-9 * np.random.randint(0, 3, size=(3, 4))
In [155]: v
Out[155]:
array([[ 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00],
[ 1.000e-08, 5.000e-09, 1.000e-08, 1.000e-08],
[ 1.000e-08, 0.000e+00, 5.000e-09, 0.000e+00]])
Compute the square root of the norms of the columns by using the argument axis=0 in numpy.linalg.norm.
In [156]: L = np.sqrt(np.linalg.norm(v, axis=0))
In [157]: L
Out[157]: array([ 1.189e-04, 7.071e-05, 1.057e-04, 1.000e-04])
Use numpy.where to select the values by which the columns of v are to be divided to create q.
In [158]: scale = np.where(L > 0.0001, L, 10000.0)
In [159]: scale
Out[159]: array([ 1.189e-04, 1.000e+04, 1.057e-04, 1.000e+04])
v has shape (3, 4), and scale has shape (4,), so we can use broadcasting to divide each column of v by the corresponding value in scale. (Dividing by 10000 is the same as multiplying by 0.0001, as the MATLAB code does.)
In [160]: q = v / scale
In [161]: q
Out[161]:
array([[ 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00],
[ 8.409e-05, 5.000e-13, 9.457e-05, 1.000e-12],
[ 8.409e-05, 0.000e+00, 4.729e-05, 0.000e+00]])
Repeated here are the three lines of the vectorized code:
L = np.sqrt(np.linalg.norm(v, axis=0))
scale = np.where(L > 0.0001, L, 10000.0)
q = v / scale
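As a quick sanity check, the sketch below compares a direct loop translation of the MATLAB code with a vectorized version on random data. The function names and the eps parameter are made up for illustration. The fallback branch divides by 1/0.0001 = 10000 so that it matches the `v(:,i)*0.0001` of the MATLAB original.

```python
import numpy as np

def normalize_columns_loop(v, eps=1e-4):
    """Direct translation of the MATLAB loop (hypothetical helper name)."""
    q = np.empty_like(v, dtype=float)
    for i in range(v.shape[1]):
        L = np.sqrt(np.linalg.norm(v[:, i]))
        # Same branches as the MATLAB code: divide by L, or multiply by eps.
        q[:, i] = v[:, i] / L if L > eps else v[:, i] * eps
    return q

def normalize_columns_vec(v, eps=1e-4):
    """Vectorized version: one divisor per column, applied by broadcasting."""
    L = np.sqrt(np.linalg.norm(v, axis=0))
    scale = np.where(L > eps, L, 1.0 / eps)  # dividing by 1/eps == multiplying by eps
    return v / scale

rng = np.random.default_rng(0)
v = rng.normal(size=(3, 4))
v[:, 1] = 0.0  # force one column into the fallback branch
q_loop = normalize_columns_loop(v)
q_vec = normalize_columns_vec(v)
print(q_loop.shape, np.allclose(q_loop, q_vec))
```

Both versions keep the (3, 4) layout of v, so no transpose is needed at the end.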
Given a 2-d numpy array, X, of shape [m,m], I wish to apply a function and obtain a new 2-d numpy matrix P, also of shape [m,m], whose [i,j]th element is obtained as follows:
P[i][j] = exp (-|| X[i] - X[j] ||**2)
where ||.|| represents the standard L-2 norm of a vector. Is there any way faster than a simple nested for loop?
For example,
X = [[1,1,1],[2,3,4],[5,6,7]]
Then, at diagonal entries the rows accessed will be the same and the norm/magnitude of their difference will be 0. Hence,
P[0][0] = P[1][1] = P[2][2] = exp (0) = 1.0
Also,
P[0][1] = exp (- || X[0] - X[1] ||**2) = exp (- || [-1,-2,-3] || ** 2) = exp (-14)
etc.
The most trivial solution using a nested for loop is as follows:
import numpy as np
X = np.array([[1,2,3],[4,5,6],[7,8,9]])
P = np.zeros (shape=[len(X),len(X)])
for i in range (len(X)):
    for j in range (len(X)):
        P[i][j] = np.exp (- np.linalg.norm (X[i]-X[j])**2)
print (P)
This prints:
P = [[1.00000000e+00 1.87952882e-12 1.24794646e-47]
[1.87952882e-12 1.00000000e+00 1.87952882e-12]
[1.24794646e-47 1.87952882e-12 1.00000000e+00]]
Here, m is of the order of 5e4.
In [143]: X = np.array([[1,2,3],[4,5,6],[7,8,9]])
...: P = np.zeros (shape=[len(X),len(X)])
...: for i in range (len(X)):
...:     for j in range (len(X)):
...:         P[i][j] = np.exp (- np.linalg.norm (X[i]-X[j]))
...:
In [144]: P
Out[144]:
array([[1.00000000e+00, 5.53783071e-03, 3.06675690e-05],
[5.53783071e-03, 1.00000000e+00, 5.53783071e-03],
[3.06675690e-05, 5.53783071e-03, 1.00000000e+00]])
A no-loop version:
In [145]: np.exp(-np.sqrt(((X[:,None,:]-X[None,:,:])**2).sum(axis=2)))
Out[145]:
array([[1.00000000e+00, 5.53783071e-03, 3.06675690e-05],
[5.53783071e-03, 1.00000000e+00, 5.53783071e-03],
[3.06675690e-05, 5.53783071e-03, 1.00000000e+00]])
I had to drop your **2 to match values.
With the norm applied to the 3d difference array:
In [148]: np.exp(-np.linalg.norm(X[:,None,:]-X[None,:,:], axis=2))
Out[148]:
array([[1.00000000e+00, 5.53783071e-03, 3.06675690e-05],
[5.53783071e-03, 1.00000000e+00, 5.53783071e-03],
[3.06675690e-05, 5.53783071e-03, 1.00000000e+00]])
scipy.spatial.distance has a cdist function that may handle this sort of thing faster.
As hpaulj mentioned cdist does it better. Try the following.
from scipy.spatial.distance import cdist
import numpy as np
np.exp(-cdist(X,X,'sqeuclidean'))
Notice the sqeuclidean. This means that scipy does not take the square root so you don't have to square like you did above with the norm.
This would be easier if you provided a sample array. If X has shape [m, n], you can create an array Q of shape [m, m, n] where Q[i, j, k] = X[i, k] - X[j, k] by using
X[:,None,:] - X[None,:,:]
At this point, you're performing simple numpy operations against the third axis.
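A minimal sketch of the broadcasting approach with the square kept in, so that it matches the formula in the question. The helper name gaussian_kernel_rows and the block size of 2 are assumptions for illustration. Computing a block of rows at a time keeps the intermediate difference array small. Note that for m around 5e4 the full m x m result alone is about 5e4 * 5e4 * 8 bytes = 20 GB in float64, so storing all of P may not be practical.

```python
import numpy as np

def gaussian_kernel_rows(X, start, stop):
    """Rows start..stop-1 of P, where P[i, j] = exp(-||X[i] - X[j]||**2)."""
    # Broadcasting gives diff[i, j, k] = X[start + i, k] - X[j, k].
    diff = X[start:stop, None, :] - X[None, :, :]
    # Sum of squares over the last axis is the squared L2 norm.
    return np.exp(-(diff ** 2).sum(axis=2))

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

# Assemble P from blocks of two rows each (block size chosen arbitrarily).
P = np.vstack([gaussian_kernel_rows(X, s, s + 2) for s in range(0, len(X), 2)])
print(P)
```

On this X the result should agree with the nested-loop version from the question, for example P[0, 1] = exp(-27).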
Let's say I have a simple array, like this one:
import numpy as np
a = np.array([1,2,3])
Which returns me, obviously:
array([1, 2, 3])
I'm trying to add calculated values between consecutive values in this array. The calculation should return n equally spaced values between each pair of bounds.
To express myself in numbers, let's say I want to add 1 value between each pair of consecutive values, so the function should return an array like this:
array([1, 1.5, 2, 2.5, 3])
Another example, now with 2 values between each pair:
array([1, 1.33, 1.66, 2, 2.33, 2.66, 3])
I know the logic and I can create myself a function which will do the work, but I feel numpy has specific functions that would make my code so much cleaner!
If your array is not evenly spaced, you can interpolate with np.interp:
import numpy as np
n = 2
a = np.array([1,2,5])
new_size = a.size + (a.size - 1) * n
x = np.linspace(a.min(), a.max(), new_size)
xp = np.linspace(a.min(), a.max(), a.size)
fp = a
result = np.interp(x, xp, fp)
returns: array([1. , 1.33333333, 1.66666667, 2. , 3. , 4. , 5. ])
If your array is always evenly spaced, you can just use
new_size = a.size + (a.size - 1) * n
result = np.linspace(a.min(), a.max(), new_size)
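The same idea can be wrapped in a small function that interpolates over index positions instead of over a.min() and a.max(), which also works when the values in a are not sorted. This is only a sketch, and the name insert_between is made up for illustration.

```python
import numpy as np

def insert_between(a, n):
    """Return a with n linearly interpolated values between consecutive elements."""
    a = np.asarray(a, dtype=float)
    old_pos = np.arange(a.size)                    # positions of the original values
    new_size = a.size + (a.size - 1) * n
    new_pos = np.linspace(0, a.size - 1, new_size) # original positions plus n in between
    return np.interp(new_pos, old_pos, a)

print(insert_between([1, 2, 3], 1))
print(insert_between([1, 2, 5], 2))
```

Every original value is kept, because each integer position 0, 1, ..., a.size - 1 appears in new_pos.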
Using linspace should do the trick:
a = np.array([1,2,3])
n = 1
temps = []
for i in range(1, len(a)):
    temps.append(np.linspace(a[i-1], a[i], num=n+1, endpoint=False))
# Add last final ending point
temps.append(np.array([a[-1]]))
new_a = np.concatenate(temps)
print(new_a)
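Try with np.arange (this assumes the values in a are spaced 1 apart; arange excludes the stop value, so the final element is not included):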
a = np.array([1,2,3])
n = 2
print(np.arange(a.min(), a.max(), 1 / (n + 1)))
Output:
[1. 1.33333333 1.66666667 2. 2.33333333 2.66666667]
Given a function like my_function(x,y) that takes two ndarrays x and y as an input and outputs a scalar:
def my_function(x,y):
    perm = np.take(x, y)
    return np.sum((np.power(2, perm) - 1) / (np.log2(np.arange(3, k + 3))))
I want to find a way to apply it to two matrices r and p
r = np.asarray([[5,6,7],[8,9,10]])
p = np.asarray([[2,1,0],[0,2,1]])
in such a way that an ndarray is returned with the values
np.asarray([my_function([5,6,7],[2,1,0]), my_function([8,9,10],[0,2,1])])
You can slightly modify your function to use take_along_axis instead of take, which will allow you to adapt to the 2D solution.
def my_function_2d(x, y, k=1):
    t = np.take_along_axis(x, y, -1)
    u = np.power(2, t) - 1
    v = np.log2(np.arange(3, k+3))
    return (u / v).sum(-1)
my_function_2d(r, p, k=1)
array([ 139.43547554, 1128.73332914])
Validation
In [96]: k = 1
In [97]: my_function([5,6,7],[2,1,0])
Out[97]: 139.4354755392921
In [98]: my_function([8,9,10],[0,2,1])
Out[98]: 1128.7333291393375
This will also still work on the 1D case:
In [145]: my_function_2d(r[0], p[0], k=1)
Out[145]: 139.4354755392921
This approach generalizes to the N-dimensional case:
In [157]: r = np.random.randint(1, 5, (2, 2, 2, 2, 2, 3))
In [158]: p = np.random.randint(0, r.shape[-1], r.shape)
In [159]: my_function_2d(r, p, k=3)
Out[159]:
array([[[[[ 8.34718483, 14.25597598],
[12.25597598, 19.97868221]],
[[12.97868221, 4.68481893],
[ 2.42295943, 1.56160631]]],
[[[23.42409467, 9.82346582],
[10.93124418, 16.42409467]],
[[23.42409467, 1.56160631],
[ 3.68481893, 10.68481893]]]],
[[[[15.97868221, 10.93124418],
[ 5.40752517, 14.93124418]],
[[ 4.14566566, 6.34718483],
[14.93124418, 3.68481893]]],
[[[ 9.20853795, 13.39462286],
[23.42409467, 3.82346582]],
[[23.42409467, 9.85293763],
[ 4.56160631, 10.93124418]]]]])
I assume you realize your approach doesn't work for all inputs and values of k; there are some shape requirements.
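To make the shape requirement concrete: the numerator has the shape of the index array (last axis of length 3 for the r and p in the question), while the denominator has shape (k,), so the division only broadcasts when k is 1 or equal to that last-axis length. A small sketch, where the helper name k_is_compatible is made up for illustration:

```python
import numpy as np

r = np.asarray([[5, 6, 7], [8, 9, 10]])
p = np.asarray([[2, 1, 0], [0, 2, 1]])

def k_is_compatible(k):
    """Return True if the division used in my_function_2d broadcasts for this k."""
    u = np.power(2, np.take_along_axis(r, p, -1)) - 1   # shape (2, 3)
    v = np.log2(np.arange(3, k + 3))                     # shape (k,)
    try:
        (u / v).sum(-1)
    except ValueError:
        # NumPy raises ValueError when the shapes cannot be broadcast together.
        return False
    return True

compat = {k: k_is_compatible(k) for k in (1, 2, 3)}
print(compat)  # {1: True, 2: False, 3: True}
```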
You can try either map or a list comprehension with zip, as follows. Please note that I took k=1 to have running code, as you did not specify k.
def my_function(x,y):
    k=1
    perm = np.take(x, y)
    return np.sum((np.power(2, perm) - 1) / (np.log2(np.arange(3, k + 3))))
r = np.asarray([[5,6,7],[8,9,10]])
p = np.asarray([[2,1,0],[0,2,1]])
result = np.asarray([my_function(i, j) for i, j in zip(r, p)])
print (result)
# [ 139.43547554 1128.73332914]
You can use np.vectorize with the signature keyword:
k = 3
np.vectorize(my_function, signature='(i),(i)->()')(r, p)
# array([124.979052 , 892.46280834])
How do I (efficiently) sample zero values from a scipy.sparse.coo_matrix?
>>> import numpy as np
>>> from scipy.sparse import coo_matrix
>>> # create sparse array
>>> X = np.array([[1., 0.], [2., 1.], [0., 0.]])
>>> X_sparse = coo_matrix(X)
>>> # randomly sample 0's from X_sparse, retrieving as [(row, col), (row, col)]
>>> def sample_zeros(sp_arr, n, replacement=False):
...     # ???
...     return negs
>>> zero_indices = sample_zeros(X_sparse, n=3, replacement=False)
>>> print(zero_indices)
[(0, 1), (2, 0), (2, 1)]
Efficiency is important here, since I will be doing this in an iterator that feeds a neural network.
Since you know the shape of X, you could use np.random.choice to generate
random (row, col) locations in X:
h, w = X.shape
rows = np.random.choice(h, size=n)
cols = np.random.choice(w, size=n)
The main difficulty is how to check if a (row, col) is a non-zero location in X.
Here's a way to do that: Make a new sparse X which equals 1 wherever X is nonzero.
Next, create a new sparse matrix, Y, with non-zero values at the random locations generated above. Then subtract:
Y = Y - X.multiply(Y)
This sparse matrix Y will be zero wherever X is nonzero.
So if we've managed to generate enough nonzero values in Y, then we can use their (row, col) locations as the return value for sample_negs:
import unittest
import sys
import numpy as np
import scipy.sparse as sparse
def sample_negs(X, n=3, replace=False):
    N = np.prod(X.shape)
    m = N - X.size
    if n == 0:
        result = []
    elif (n < 0) or (not replace and m < n) or (replace and m == 0):
        raise ValueError("{n} samples from {m} locations do not exist"
                         .format(n=n, m=m))
    elif n/m > 0.5:
        # Y (in the else clause, below) would be pretty dense so there would be no point
        # trying to use sparse techniques. So let's use hpaulj's idea
        # (https://stackoverflow.com/a/53577267/190597) instead.
        import warnings
        warnings.filterwarnings("ignore", category=sparse.SparseEfficiencyWarning)
        Y = sparse.coo_matrix(X == 0)
        rows = Y.row
        cols = Y.col
        idx = np.random.choice(len(rows), size=n, replace=replace)
        result = list(zip(rows[idx], cols[idx]))
    else:
        X_row, X_col = X.row, X.col
        X_data = np.ones(X.size)
        X = sparse.coo_matrix((X_data, (X_row, X_col)), shape=X.shape)
        h, w = X.shape
        Y = sparse.coo_matrix(X.shape)
        Y_size = 0
        while Y_size < n:
            m = n - Y.size
            Y_data = np.concatenate([Y.data, np.ones(m)])
            Y_row = np.concatenate([Y.row, np.random.choice(h, size=m)])
            Y_col = np.concatenate([Y.col, np.random.choice(w, size=m)])
            Y = sparse.coo_matrix((Y_data, (Y_row, Y_col)), shape=X.shape)
            # Remove values in Y where X is nonzero
            # This also consolidates (row, col) duplicates
            Y = sparse.coo_matrix(Y - X.multiply(Y))
            if replace:
                Y_size = Y.data.sum()
            else:
                Y_size = Y.size
        if replace:
            rows = np.repeat(Y.row, Y.data.astype(int))
            cols = np.repeat(Y.col, Y.data.astype(int))
            idx = np.random.choice(rows.size, size=n, replace=False)
            result = list(zip(rows[idx], cols[idx]))
        else:
            rows = Y.row
            cols = Y.col
            idx = np.random.choice(rows.size, size=n, replace=False)
            result = list(zip(rows[idx], cols[idx]))
    return result
class Test(unittest.TestCase):
    def setUp(self):
        import warnings
        warnings.filterwarnings("ignore", category=sparse.SparseEfficiencyWarning)
        self.ncols, self.nrows = 100, 100
        self.X = sparse.random(self.ncols, self.nrows, density=0.05, format='coo')
        Y = sparse.coo_matrix(self.X == 0)
        self.expected = set(zip(Y.row, Y.col))

    def test_n_too_large(self):
        self.assertRaises(ValueError, sample_negs, self.X, n=100*100+1, replace=False)
        X_dense = sparse.coo_matrix(np.ones((4,2)))
        self.assertRaises(ValueError, sample_negs, X_dense, n=1, replace=True)

    def test_no_replacement(self):
        for m in range(100):
            negative_list = sample_negs(self.X, n=m, replace=False)
            negative_set = set(negative_list)
            self.assertEqual(len(negative_list), m)
            self.assertLessEqual(negative_set, self.expected)

    def test_no_repeats_when_replace_is_false(self):
        negative_list = sample_negs(self.X, n=10, replace=False)
        self.assertEqual(len(negative_list), len(set(negative_list)))

    def test_dense_replacement(self):
        N = self.ncols * self.nrows
        m = N - self.X.size
        for i in [-1, 0, 1]:
            negative_list = sample_negs(self.X, n=m+i, replace=True)
            negative_set = set(negative_list)
            self.assertEqual(len(negative_list), m+i)
            self.assertLessEqual(negative_set, self.expected)

    def test_sparse_replacement(self):
        for m in range(100):
            negative_list = sample_negs(self.X, n=m, replace=True)
            negative_set = set(negative_list)
            self.assertEqual(len(negative_list), m)
            self.assertLessEqual(negative_set, self.expected)

if __name__ == '__main__':
    sys.argv.insert(1,'--verbose')
    unittest.main(argv = sys.argv)
Since sample_negs is rather complicated, I've included some unit tests
to hopefully verify reasonable behavior.
I don't think there's an efficient way that takes advantage of the sparse matrix structure:
In [197]: X = np.array([[1., 0.], [2., 1.], [0., 0.]])
...: X_sparse = sparse.coo_matrix(X)
In [198]: X_sparse
Out[198]:
<3x2 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in COOrdinate format>
In [199]: print(X_sparse)
(0, 0) 1.0
(1, 0) 2.0
(1, 1) 1.0
With the dense array you could do something like:
In [204]: zeros = np.argwhere(X==0)
In [205]: zeros
Out[205]:
array([[0, 1],
[2, 0],
[2, 1]])
In [206]: idx=np.random.choice(3,3, replace=False)
In [207]: idx
Out[207]: array([0, 2, 1])
In [208]: zeros[idx,:]
Out[208]:
array([[0, 1],
[2, 1],
[2, 0]])
We could ask for all 0s of the sparse matrix:
In [209]: X_sparse==0
/usr/local/lib/python3.6/dist-packages/scipy/sparse/compressed.py:214: SparseEfficiencyWarning: Comparing a sparse matrix with 0 using == is inefficient, try using != instead.
", try using != instead.", SparseEfficiencyWarning)
Out[209]:
<3x2 sparse matrix of type '<class 'numpy.bool_'>'
with 3 stored elements in Compressed Sparse Row format>
In [210]: print(_)
(0, 1) True
(2, 0) True
(2, 1) True
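One possible way to avoid materializing every zero position is to work with flat indices: sample ranks among the zero entries, then shift each rank past the stored entries that precede it. The sketch below is an assumption-laden illustration, not a tested recipe for the neural-network use case. The function name sample_zero_positions is made up, it takes the COO coordinate arrays (for example X_sparse.row and X_sparse.col) instead of the matrix itself, it samples without replacement only, and it treats every stored entry as nonzero, including explicitly stored zeros.

```python
import numpy as np

def sample_zero_positions(rows, cols, shape, n, rng=None):
    """Sample n distinct (row, col) positions that are not stored in the COO data."""
    rng = np.random.default_rng() if rng is None else rng
    # Sorted, de-duplicated flat indices of the stored entries.
    occupied = np.unique(np.ravel_multi_index((rows, cols), shape))
    n_zeros = shape[0] * shape[1] - occupied.size
    if n > n_zeros:
        raise ValueError("cannot draw %d distinct zeros from %d" % (n, n_zeros))
    # Pick ranks among the zero entries: rank k means "the k-th zero in flat order".
    ranks = rng.choice(n_zeros, size=n, replace=False)
    # zeros_before[j] = number of zero entries that precede the j-th stored entry.
    zeros_before = occupied - np.arange(occupied.size)
    # A zero of rank k is preceded by every stored entry with zeros_before <= k.
    flat = ranks + np.searchsorted(zeros_before, ranks, side="right")
    r, c = np.unravel_index(flat, shape)
    return list(zip(r.tolist(), c.tolist()))

# Coordinates of the nonzeros of [[1, 0], [2, 1], [0, 0]].
rows = np.array([0, 1, 1])
cols = np.array([0, 0, 1])
picked = sample_zero_positions(rows, cols, (3, 2), n=3, rng=np.random.default_rng(0))
print(sorted(picked))  # [(0, 1), (2, 0), (2, 1)]
```

The cost depends on the number of stored entries and on n rather than on building the full list of zero positions.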
I have a set of data in Python like:
x y angle
I want to calculate the distance between every possible pair of points and plot the distances against the difference between the two angles.
x, y, a = np.loadtxt('w51e2-pa-2pk.log', unpack=True)
n = 0
f=(((x[n])-x[n+1:])**2+((y[n])-y[n+1:])**2)**0.5
d = a[n]-a[n+1:]
plt.scatter(f,d)
There are 255 points in my data.
f is the distance and d is the difference between two angles.
My question is: can I set n = [1, 2, 3, ..., 255] and repeat the calculation to get f and d for all possible pairs?
You can obtain the pairwise distances through broadcasting by considering it as an outer operation on the array of 2-dimensional vectors as follows:
vecs = np.stack((x, y)).T
np.linalg.norm(vecs[np.newaxis, :] - vecs[:, np.newaxis], axis=2)
For example,
In [1]: import numpy as np
...: x = np.array([1, 2, 3])
...: y = np.array([3, 4, 6])
...: vecs = np.stack((x, y)).T
...: np.linalg.norm(vecs[np.newaxis, :] - vecs[:, np.newaxis], axis=2)
...:
Out[1]:
array([[ 0. , 1.41421356, 3.60555128],
[ 1.41421356, 0. , 2.23606798],
[ 3.60555128, 2.23606798, 0. ]])
Here, the (i, j)'th entry is the distance between the i'th and j'th vectors.
The case of the pairwise differences between angles is similar, but simpler, as you only have one dimension to deal with:
In [2]: a = np.array([10, 12, 15])
...: a[np.newaxis, :] - a[: , np.newaxis]
...:
Out[2]:
array([[ 0, 2, 5],
[-2, 0, 3],
[-5, -3, 0]])
Moreover, plt.scatter does not care that the results are given as matrices, and putting everything together using the notation of the question, you can obtain the plot of angles by distances by doing something like
vecs = np.stack((x, y)).T
f = np.linalg.norm(vecs[np.newaxis, :] - vecs[:, np.newaxis], axis=2)
d = a[np.newaxis, :] - a[:, np.newaxis]
plt.scatter(f, d)
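The full matrices contain every pair twice (once in each order) plus the zero diagonal. If each unordered pair should appear only once, as in the question's `x[n] - x[n+1:]` pattern, one option is to index with np.triu_indices. A small sketch with made-up sample data:

```python
import numpy as np

# Made-up sample data standing in for the columns loaded with np.loadtxt.
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 4.0, 6.0])
a = np.array([10.0, 12.0, 15.0])

# Index pairs (i, j) with i < j: here (0, 1), (0, 2), (1, 2).
i, j = np.triu_indices(len(x), k=1)

f = np.hypot(x[i] - x[j], y[i] - y[j])  # distance for each pair
d = a[i] - a[j]                          # angle difference, same sign as a[n] - a[n+1:]
print(f)
print(d)
```

For 255 points this gives 255 * 254 / 2 = 32385 pairs, and f and d can be passed to plt.scatter(f, d) as before.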
You have to use a for loop and range() to iterate over n, e.g. like this:
n = len(x)
for i in range(n):
    # do something with the current index
    # e.g. print the points
    print(x[i])
    print(y[i])
But note that if you use i+1 inside the last iteration, this will already be outside of your list.
Also, note that (x[n])-x[n+1:] only works because x is a NumPy array (np.loadtxt returns arrays), so the single value x[n] is broadcast against the slice x[n+1:], which starts at the n+1'th element. With plain Python lists, subtracting a list from a number raises a TypeError.
Maybe you will have to even use two nested loops to do what you want. I guess that you want to calculate the distance between each point so a two dimensional array may be the data structure you want.
If you are interested in all combinations of the points in x and y, I suggest using itertools, which will give you all possible combinations. Then you can do it as follows:
import itertools
f = [((x[i]-x[j])**2 + (y[i]-y[j])**2)**0.5 for i, j in itertools.product(range(255), repeat=2) if i != j]
# and similar for the angles
But maybe there is even an easier way...