Python version: 2.7
I have the following numpy 2d array:
array([[ -5.05000000e+01,  -1.05000000e+01],
       [ -4.04000000e+01,  -8.40000000e+00],
       [ -3.03000000e+01,  -6.30000000e+00],
       [ -2.02000000e+01,  -4.20000000e+00],
       [ -1.01000000e+01,  -2.10000000e+00],
       [  7.10542736e-15,  -1.77635684e-15],
       [  1.01000000e+01,   2.10000000e+00],
       [  2.02000000e+01,   4.20000000e+00],
       [  3.03000000e+01,   6.30000000e+00],
       [  4.04000000e+01,   8.40000000e+00]])
If I wanted all the combinations of the first and second columns, I would use np.array(np.meshgrid(first_column, second_column)).T.reshape(-1,2). As a result, I would get a 100*2 matrix with 10*10 = 100 combinations. However, my matrix can have 3, 4, or more columns, so this hard-coded call no longer works.
Question: how can I generate the meshgridded matrix automatically when there are 3+ columns?
UPD: for example, I have the initial array:
[[-50.5 -10.5]
[ 0. 0. ]]
As a result, I want to have the output array like this:
array([[-10.5, -50.5],
[-10.5, 0. ],
[ 0. , -50.5],
[ 0. , 0. ]])
or this:
array([[-50.5, -10.5],
[-50.5, 0. ],
[ 0. , -10.5],
[ 0. , 0. ]])
You could use the * operator on the transposed array to unpack those columns sequentially into np.meshgrid. Finally, a swapaxes operation is needed to merge the output grid arrays into one array.
Thus, one generic solution would be -
np.swapaxes(np.meshgrid(*arr.T),0,2)
Sample run -
In [44]: arr
Out[44]:
array([[-50.5, -10.5],
[ 0. , 0. ]])
In [45]: np.swapaxes(np.meshgrid(*arr.T),0,2)
Out[45]:
array([[[-50.5, -10.5],
[-50.5, 0. ]],
[[ 0. , -10.5],
[ 0. , 0. ]]])
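If a flat list of combinations (one row per combination) is what you're after, the question's original .T.reshape pattern also generalizes with the same * unpacking; a minimal sketch:

```python
import numpy as np

arr = np.array([[-50.5, -10.5],
                [  0. ,   0. ]])

# Unpack each column into meshgrid, stack the grids, then flatten
# to one row per combination: shape (n_rows**n_cols, n_cols).
combos = np.array(np.meshgrid(*arr.T)).T.reshape(-1, arr.shape[1])
print(combos)
# [[-50.5 -10.5]
#  [-50.5   0. ]
#  [  0.  -10.5]
#  [  0.    0. ]]
```

This works unchanged for 3+ columns, since *arr.T feeds every column to np.meshgrid.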
Can someone help me generate a weighted adjacency matrix from a NumPy array, based on the Euclidean distance between every pair of rows, i.e. 0 and 1, 0 and 2, ..., 1 and 2, ...?
Given the following example with an input matrix of shape (5, 4):
matrix = [[2,10,9,6],
[5,1,4,7],
[3,2,1,0],
[10, 20, 1, 4],
[17, 3, 5, 18]]
I would like to obtain a weighted adjacency matrix of shape (5, 5) containing the minimal distances between nodes, i.e.,
if dist(row0, row1) = 10.77 and dist(row0, row2) = 12.84,
--> the output matrix takes the first (smaller) distance as the column value.
I have already solved the first part, the generation of the distance matrix, with the following code:
from scipy.spatial.distance import cdist
dist = cdist( matrix, matrix, metric='euclidean')
and I get the following result :
array([[ 0. , 10.77032961, 12.84523258, 15.23154621, 20.83266666],
[10.77032961, 0. , 7.93725393, 20.09975124, 16.43167673],
[12.84523258, 7.93725393, 0. , 19.72308292, 23.17326045],
[15.23154621, 20.09975124, 19.72308292, 0. , 23.4520788 ],
[20.83266666, 16.43167673, 23.17326045, 23.4520788 , 0. ]])
But I don't yet know how to specify the number of neighbors to keep. For example, if we define the number of neighbors N = 2, then for each row we keep only the two neighbors with the two minimum distances, and get as a result:
[[ 0. , 10.77032961, 12.84523258, 0, 0],
[10.77032961, 0. , 7.93725393, 0, 0],
[12.84523258, 7.93725393, 0. , 0, 0],
[15.23154621, 0, 19.72308292, 0. , 0 ],
[20.83266666, 16.43167673, 0, 0 , 0. ]]
You can use this cleaner solution to get the n smallest values per row. Try the following -
dist.argsort(1).argsort(1) creates a rank order (smallest is 0, largest is 4) over axis=1, and <= 2 keeps ranks 0 through 2, i.e. the diagonal zero plus the two nearest neighbors; np.where replaces everything else with 0.
np.where(dist.argsort(1).argsort(1) <= 2, dist, 0)
array([[ 0. , 10.77032961, 12.84523258, 0. , 0. ],
[10.77032961, 0. , 7.93725393, 0. , 0. ],
[12.84523258, 7.93725393, 0. , 0. , 0. ],
[15.23154621, 0. , 19.72308292, 0. , 0. ],
[20.83266666, 16.43167673, 0. , 0. , 0. ]])
This works along any axis, and for the n largest values from a matrix as well (flip the comparison).
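To make the double-argsort step concrete, here is a minimal sketch on a single made-up row (loosely rounded values, not the exact distances above):

```python
import numpy as np

row = np.array([12.8, 0.0, 20.8, 10.8, 15.2])

# First argsort: the order that would sort the row.
# Second argsort: each element's rank within the row (0 = smallest).
ranks = row.argsort().argsort()
print(ranks)                          # [2 0 4 1 3]

# Keep ranks 0-2 (the zero plus the two smallest distances), zero the rest.
print(np.where(ranks <= 2, row, 0))   # [12.8  0.   0.  10.8  0. ]
```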
Assuming a is your Euclidean distance matrix, you can use np.argpartition to choose the n min/max values per row. Keep in mind the diagonal is always 0 and Euclidean distances are non-negative, so to keep the two closest points in each row you need to keep the three smallest values per row (including the 0 on the diagonal). This does not hold if you want the maximum values, however.
a[np.arange(a.shape[0])[:,None],np.argpartition(a, 3, axis=1)[:,3:]] = 0
output:
array([[ 0. , 10.77032961, 12.84523258, 0. , 0. ],
[10.77032961, 0. , 7.93725393, 0. , 0. ],
[12.84523258, 7.93725393, 0. , 0. , 0. ],
[15.23154621, 0. , 19.72308292, 0. , 0. ],
[20.83266666, 16.43167673, 0. , 0. , 0. ]])
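The one-liner packs several steps together; decomposed on the distance matrix from the question (same operation, just spelled out):

```python
import numpy as np

a = np.array([[ 0.        , 10.77032961, 12.84523258, 15.23154621, 20.83266666],
              [10.77032961,  0.        ,  7.93725393, 20.09975124, 16.43167673],
              [12.84523258,  7.93725393,  0.        , 19.72308292, 23.17326045],
              [15.23154621, 20.09975124, 19.72308292,  0.        , 23.4520788 ],
              [20.83266666, 16.43167673, 23.17326045, 23.4520788 ,  0.        ]])

# Per row, argpartition puts the 3 smallest entries in the first 3
# positions; columns [:, 3:] index everything else (the 2 largest).
cols = np.argpartition(a, 3, axis=1)[:, 3:]

# Broadcast the row indices against those columns and zero them out.
rows = np.arange(a.shape[0])[:, None]
a[rows, cols] = 0
print(a)
```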
I have the following program:
import numpy as np
arr = np.random.randn(3,4)
print(arr)
regArr = (arr > 0.8)
print (regArr)
print (arr[ regArr].reshape(arr.shape))
output:
[[ 0.37182134 1.4807685 0.11094223 0.34548185]
[ 0.14857641 -0.9159358 -0.37933393 -0.73946522]
[ 1.01842304 -0.06714827 -1.22557205 0.45600827]]
I am looking for an output where the values of arr greater than 0.8 are kept and all other values are set to zero.
I tried boolean masking as shown above, but I am unable to solve this. Kindly help.
I'm not entirely sure what exactly you want to achieve, but this is what I did to filter.
arr = np.random.randn(3,4)
array([[-0.04790508, -0.71700005, 0.23204224, -0.36354634],
[ 0.48578236, 0.57983561, 0.79647091, -1.04972601],
[ 1.15067885, 0.98622772, -0.7004639 , -1.28243462]])
arr[arr < 0.8] = 0
array([[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ],
[1.15067885, 0.98622772, 0. , 0. ]])
Thanks to user3053452, I have added one more solution, in which the original data is not changed.
arr = np.random.randn(3,4)
array([[ 0.4297907 , 0.38100702, 0.30358291, -0.71137138],
[ 1.15180635, -1.21251676, 0.04333404, 1.81045931],
[ 0.17521058, -1.55604971, 1.1607159 , 0.23133528]])
new_arr = np.where(arr < 0.8, 0, arr)
array([[0. , 0. , 0. , 0. ],
[1.15180635, 0. , 0. , 1.81045931],
[0. , 0. , 1.1607159 , 0. ]])
This question already has answers here:
Numpy array loss of dimension when masking
(5 answers)
Closed 3 years ago.
The question sounds very basic. But when I try to use where or boolean conditions on numpy arrays, it always returns a flattened array.
I have the NumPy array
P = array([[ 0.49530662, 0.07901 , -0.19012371],
[ 0.1421513 , 0.48607405, -0.20315014],
[ 0.76467375, 0.16479826, -0.56598029],
[ 0.53530718, -0.21166188, -0.08773241]])
I want to extract the array of only negative values, but when I try
P[P<0]
array([-0.19012371, -0.41421612, -0.20315014, -0.56598029, -0.21166188,
-0.08773241, -0.09241335])
P[np.where(P<0)]
array([-0.19012371, -0.41421612, -0.20315014, -0.56598029, -0.21166188,
-0.08773241, -0.09241335])
I get a flattened array. How can I extract the array of the form
array([[ 0, 0, -0.19012371],
[ 0 , 0, -0.20315014],
[ 0, 0, -0.56598029],
[ 0, -0.21166188, -0.08773241]])
I do not wish to create a temp array and then use something like Temp[Temp>=0] = 0
Since your need is:
I want to "extract" the array of only negative values
You can use numpy.where() with your condition (checking for negative values), which can preserve the dimension of the array, as in the below example:
In [61]: np.where(P<0, P, 0)
Out[61]:
array([[ 0. , 0. , -0.19012371],
[ 0. , 0. , -0.20315014],
[ 0. , 0. , -0.56598029],
[ 0. , -0.21166188, -0.08773241]])
where P is your input array.
Another idea could be to use numpy.zeros_like() to initialize a same-shape array and numpy.where() to gather the indices at which our condition is satisfied.
# initialize our result array with zeros
In [106]: non_positives = np.zeros_like(P)
# gather the indices where our condition is obeyed
In [107]: idxs = np.where(P < 0)
# copy the negative values to correct indices
In [108]: non_positives[idxs] = P[idxs]
In [109]: non_positives
Out[109]:
array([[ 0. , 0. , -0.19012371],
[ 0. , 0. , -0.20315014],
[ 0. , 0. , -0.56598029],
[ 0. , -0.21166188, -0.08773241]])
Yet another idea would be to simply use the barebones numpy.clip() API, which would return a new array, if we omit the out= kwarg.
In [22]: np.clip(P, -np.inf, 0) # P.clip(-np.inf, 0)
Out[22]:
array([[ 0. , 0. , -0.19012371],
[ 0. , 0. , -0.20315014],
[ 0. , 0. , -0.56598029],
[ 0. , -0.21166188, -0.08773241]])
This should work: essentially, get the indexes of all elements that are greater than or equal to 0 and set them to 0; this preserves the dimensions. I got the idea from here: Replace all elements of Python NumPy Array that are greater than some value
Also note that this modifies the original array; I haven't used a temp array here.
import numpy as np
P = np.array([[ 0.49530662, 0.07901 , -0.19012371],
[ 0.1421513 , 0.48607405, -0.20315014],
[ 0.76467375, 0.16479826, -0.56598029],
[ 0.53530718, -0.21166188, -0.08773241]])
P[P >= 0] = 0
print(P)
The output will be
[[ 0. 0. -0.19012371]
[ 0. 0. -0.20315014]
[ 0. 0. -0.56598029]
[ 0. -0.21166188 -0.08773241]]
As noted below, this modifies the array in place, so to preserve the original array we should use np.where(P<0, P, 0) as follows, thanks @kmario123:
import numpy as np
P = np.array([[ 0.49530662, 0.07901 , -0.19012371],
[ 0.1421513 , 0.48607405, -0.20315014],
[ 0.76467375, 0.16479826, -0.56598029],
[ 0.53530718, -0.21166188, -0.08773241]])
print( np.where(P<0, P, 0))
print(P)
The output will be
[[ 0. 0. -0.19012371]
[ 0. 0. -0.20315014]
[ 0. 0. -0.56598029]
[ 0. -0.21166188 -0.08773241]]
[[ 0.49530662 0.07901 -0.19012371]
[ 0.1421513 0.48607405 -0.20315014]
[ 0.76467375 0.16479826 -0.56598029]
[ 0.53530718 -0.21166188 -0.08773241]]
I'd like to assign a value to an element of a numpy array addressed by a list. Is this possible? It seems like the sort of thing you ought to be able to do.
I tried:
q = np.zeros((2,2,2))
index = [0,0,0]
print(index)
q[index]=4.3
print(q)
Which didn't give an error, which is promising, but q is now:
[[[ 4.3 4.3]
[ 4.3 4.3]]
[[ 0. 0. ]
[ 0. 0. ]]]
As opposed to:
[[[ 4.3 0. ]
[ 0. 0.]]
[[ 0. 0. ]
[ 0. 0. ]]]
As I hoped it would be.
Thanks in advance for your help.
You can't use a list to index a single element - it has to be a tuple:
import numpy as np
q = np.zeros((2,2,2))
index = [0,0,0]
print(index)
q[tuple(index)]=4.3
print(q)
[0, 0, 0]
[[[ 4.3 0. ]
[ 0. 0. ]]
[[ 0. 0. ]
[ 0. 0. ]]]
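For context on why the list behaved differently (a sketch of the distinction): a list index triggers fancy indexing along the first axis, while a tuple supplies one index per axis:

```python
import numpy as np

# A list is fancy indexing along axis 0: q[[0, 0, 0]] selects row 0
# three times, so the assignment broadcasts 4.3 into all of q[0].
q = np.zeros((2, 2, 2))
q[[0, 0, 0]] = 4.3
print(q[0])        # the whole first 2x2 block is 4.3

# A tuple means one index per axis: only q[0, 0, 0] is set.
q = np.zeros((2, 2, 2))
q[(0, 0, 0)] = 4.3
print(q.sum())     # 4.3 -> exactly one element was set
```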
I have an ndarray. From this array I need to choose the list of N numbers with the biggest values. I found heapq.nlargest to find the N largest entries, but I need to extract their indexes.
I want to build a new array where only the N rows with the largest weights in the first column will survive. The rest of the rows will be replaced by random values
import numpy as np
import heapq # For choosing list of max values
a = [[1.1,2.1,3.1], [2.1,3.1,4.1], [5.1,0.1,7.1],[0.1,1.1,1.1],[4.1,3.1,9.1]]
a = np.asarray(a)
maxVal = heapq.nlargest(2,a[:,0])
if __name__ == '__main__':
    print a
    print maxVal
The output I have is:
[[ 1.1 2.1 3.1]
[ 2.1 3.1 4.1]
[ 5.1 0.1 7.1]
[ 0.1 1.1 1.1]
[ 4.1 3.1 9.1]]
[5.0999999999999996, 4.0999999999999996]
but what I need is [2,4] as the indexes to build a new array. The indexes are the rows so if in this example I want to replace the rest by 0 I need to finish with:
[[0.0 0.0 0.0]
[ 0.0 0.0 0.0]
[ 5.1 0.1 7.1]
[ 0.0 0.0 0.0]
[ 4.1 3.1 9.1]]
I am stuck at the point where I need the indexes. The original array has 1000 rows and 100 columns. The weights are normalized floating-point values, and I don't want to do something like if a[:,1] == maxVal[0]:, because the weights are sometimes very close and I could end up with more values equal to maxVal[0] than my original N.
Is there any simple way to extract indexes on this setup to replace the rest of the array?
If you only have 1000 rows, I would forget about the heap and use np.argsort on the first column:
>>> np.argsort(a[:,0])[::-1][:2]
array([2, 4])
If you want to put it all together, it would look something like:
def trim_rows(a, n):
    idx = np.argsort(a[:,0])[:-n]
    a[idx] = 0
>>> a = np.random.rand(10, 4)
>>> a
array([[ 0.34416425, 0.89021968, 0.06260404, 0.0218131 ],
[ 0.72344948, 0.79637177, 0.70029863, 0.20096129],
[ 0.27772833, 0.05372373, 0.00372941, 0.18454153],
[ 0.09124461, 0.38676351, 0.98478492, 0.72986697],
[ 0.84789887, 0.69171688, 0.97718206, 0.64019977],
[ 0.27597241, 0.26705301, 0.62124467, 0.43337711],
[ 0.79455424, 0.37024814, 0.93549275, 0.01130491],
[ 0.95113795, 0.32306471, 0.47548887, 0.20429272],
[ 0.3943888 , 0.61586129, 0.02776393, 0.2560126 ],
[ 0.5934556 , 0.23093912, 0.12550062, 0.58542137]])
>>> trim_rows(a, 3)
>>> a
array([[ 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. ],
[ 0.84789887, 0.69171688, 0.97718206, 0.64019977],
[ 0. , 0. , 0. , 0. ],
[ 0.79455424, 0.37024814, 0.93549275, 0.01130491],
[ 0.95113795, 0.32306471, 0.47548887, 0.20429272],
[ 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. ]])
And for your data size it's probably fast enough:
In [7]: a = np.random.rand(1000, 100)
In [8]: %timeit -n1 -r1 trim_rows(a, 50)
1 loops, best of 1: 7.65 ms per loop
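If the array were much larger, one hedged alternative sketch: np.argpartition finds the n largest first-column rows in linear time instead of a full sort (we don't need the kept or zeroed rows in any particular order):

```python
import numpy as np

def trim_rows_argpartition(a, n):
    # Indices of every row EXCEPT the n with the largest first-column
    # values; argpartition avoids sorting the whole column.
    idx = np.argpartition(a[:, 0], -n)[:-n]
    a[idx] = 0

a = np.asarray([[1.1, 2.1, 3.1],
                [2.1, 3.1, 4.1],
                [5.1, 0.1, 7.1],
                [0.1, 1.1, 1.1],
                [4.1, 3.1, 9.1]])
trim_rows_argpartition(a, 2)
print(a)   # only the rows starting with 5.1 and 4.1 survive
```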