Given an array and a mask for this array, fancy indexing makes it easy to select only the data of the array corresponding to the mask:
import numpy as np
a = np.arange(20).reshape(4, 5)
mask = [0, 2]
data = a[:, mask]
But is there a fast way to select all the data of the array that does not belong to the mask (i.e. the mask lists the columns we want to reject)?
I tried to find a general solution going through an intermediate boolean array, but I'm sure there is something much simpler:
mask2 = np.ones(a.shape, dtype=bool)
mask2[:, mask] = False
data = a[mask2].reshape(a.shape[0], a.shape[1] - len(mask))
Thank you
Have a look at numpy.invert, numpy.bitwise_not, numpy.logical_not, or more concisely ~mask. (For a boolean mask, they all do the same thing.)
As a quick example:
import numpy as np
x = np.arange(10)
mask = x > 5
print(x[mask])
print(x[~mask])
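Note that ~ works on boolean masks, while the original question's mask is a list of column indices. One option (a sketch) is to build a boolean mask from the index list before inverting, or simply use np.delete:

```python
import numpy as np

a = np.arange(20).reshape(4, 5)
mask = [0, 2]

# Turn the index list into a boolean column mask, then invert the selection
keep = np.ones(a.shape[1], dtype=bool)
keep[mask] = False
data = a[:, keep]                  # columns 1, 3, 4

# np.delete achieves the same result in one call
data2 = np.delete(a, mask, axis=1)
```

Both return a new (4, 3) array containing only the columns outside the mask.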
Is it possible to generate a boolean mask from the output of numpy.where()?
Specifically, I would like to use the output of where to select values while retaining the shape of the array, because further processing needs to be done row-wise. However, indexing with the output of np.where flattens the array without retaining the dimensions:
import numpy as np
A = np.random.random((10, 10))
A[np.where(A > 0.9)]
Out[76]:
array([0.98981282, 0.9128424 , 0.92600831, 0.98639861, 0.97051929,
0.90718864, 0.95667512])
But what I would like to get is either a (10, 10) boolean mask, or the actual values from A, in a way that the dimensions are identifiable.
My current workaround looks like this, but I am not sure whether there is a better, more direct way of doing it.
A = np.random.random((10, 10))
B = np.nan * np.zeros_like(A)
C = np.where(A > 0.9, A, B)
where I can do the processing for each row separately.
What should be in the positions where A <= 0.9? Try A * (A > 0.9), which leaves zeros in the rejected positions.
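As a sketch of both fill choices — zeros via multiplication, or NaNs via np.where so that NaN-aware reductions can process each row:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((10, 10))

mask = A > 0.9                      # (10, 10) boolean mask, shape preserved
zeros = A * mask                    # rejected positions become 0.0
nans = np.where(mask, A, np.nan)    # rejected positions become NaN

# Row-wise processing that ignores the rejected entries
row_sums = np.nansum(nans, axis=1)
```

With NaN fill you can use np.nansum, np.nanmean, etc. per row; with zero fill, plain sums still work but means would be diluted by the zeros.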
Out of a huge matrix in numpy (currently 1000x1000) only a few elements are relevant for me. Say these elements are >1000 in value and others are way lower. I need to find the indices of all such elements in the most efficient way, because the search will be repeated often and the matrix can become even bigger.
For now I have two different approaches which should be about the same complexity (I omit possible solutions with for loops as inefficient):
import numpy as np
A = np.zeros((1000,1000))
#do something with the matrix
#first solution with np.where
np.transpose(np.where(A > 999))  # np.where returns a tuple, so transpose it
# array([[0, 0],[1, 20]....[785, 445]], dtype=int64) - made up numbers
#another solution with np.argwhere
np.argwhere(A > 999)
# array([[0, 0],[1, 20]....[785, 445]], dtype=int64) - outputs the same
Is there any possible way to speed up this search or is my solution the most efficient?
Thanks for any advice and suggestions!
You can try this: the filter is built directly into the numpy array!
import numpy as np
arr = np.array([998, 999, 1000, 1001])
filter_arr = arr > 999
newarr = arr[filter_arr]
print(filter_arr)
print(newarr)
https://www.w3schools.com/python/numpy_array_filter.asp
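For what it's worth, np.argwhere is documented as essentially np.transpose(np.nonzero(...)), so the two approaches in the question do the same work. If flat indices suffice for your later lookups, np.flatnonzero avoids the 2-D unpacking (a sketch, not a benchmark):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((1000, 1000)) * 1100

# (n, 2) array of row/column index pairs, as in the question
rows_cols = np.argwhere(A > 999)

# Flat indices into the raveled array; unravel only if 2-D indices are needed
flat = np.flatnonzero(A > 999)
rows, cols = np.unravel_index(flat, A.shape)
```

Both orderings are row-major, so the unraveled indices line up pair-for-pair with the argwhere output.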
We can find indices of a scalar in numpy array like below:
import numpy as np
array = np.array([1,2,3])
mask = (array == 2) #mask is now [False,True,False]
When the element is a vector:
import numpy as np
array = np.array([[1,2],[1,4],[5,6]])
mask = (array == [1,4]) #mask is now [[True,False],[True,True],[False,False]]
I actually want to generate a mask like the one in the first code snippet, but for the second example:
mask = [False,True,False]
Is this possible in numpy library?
Since the comparison is element-wise, you need to reduce it using all on the first axis:
(array == [1, 4]).all(axis=1)
Out: array([False, True, False], dtype=bool)
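A quick self-contained sketch of the reduction and how the resulting row mask can be used:

```python
import numpy as np

array = np.array([[1, 2], [1, 4], [5, 6]])

# Element-wise comparison gives a (3, 2) mask; .all(axis=1) collapses it
# to one True/False per row
row_mask = (array == [1, 4]).all(axis=1)

selected = array[row_mask]          # rows equal to [1, 4]
indices = np.flatnonzero(row_mask)  # positions of the matching rows
```

This yields row_mask == [False, True, False], so only the middle row is selected.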
I have a 2D MaskedArray X and I want to randomly select 30 non-masked elements from it and return their indices idx.
The goal is that I could use the indices to read / set values efficiently later in my code:
selected = X[idx]
X[idx] = a # some array with the same length
What is the most efficient way of generating idx?
Ok I have figured out a way... if anyone has a better approach please let me know.
pos = np.random.choice(X.count(), size=30)
idx = tuple(np.take((~X.mask).nonzero(), pos, axis=1))
I solved a similar task by passing the True/False values of the array mask as weights to np.random.choice:
import numpy.ma as ma
import numpy as np
data = np.array([[0,0,0,1],[0,1,3,2],[2,0,0,3],[0,3,4,1]])
numSample = 2
masked = ma.masked_where(data < 3, data)
weights = ~masked.mask + 0  # False -> 0, True -> 1
normalized = weights.ravel() / float(weights.sum())
index = np.random.choice(
    masked.size,
    size=numSample,
    replace=False,
    p=normalized
)
idx, idy = np.unravel_index(index, data.shape)
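The resulting idx, idy can then be used for reading and writing, as the question asks. A sketch with the same toy data (the 99 written back is just an arbitrary illustration value):

```python
import numpy as np
import numpy.ma as ma

data = np.array([[0, 0, 0, 1], [0, 1, 3, 2], [2, 0, 0, 3], [0, 3, 4, 1]])
masked = ma.masked_where(data < 3, data)

# Masked cells get probability 0, so only non-masked positions can be drawn
weights = (~masked.mask).ravel().astype(float)
flat_idx = np.random.choice(masked.size, size=2, replace=False,
                            p=weights / weights.sum())
idx, idy = np.unravel_index(flat_idx, data.shape)

values = data[idx, idy]   # read the sampled (non-masked) values
data[idx, idy] = 99       # ...or write back through the same indices
```

Every sampled position is guaranteed to be non-masked, since masked cells carry zero probability.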
I have three arrays: longitude (400, 600), latitude (400, 600), and data (30, 400, 60); what I am trying to do is extract values from the data array according to their location (latitude and longitude).
Here is my code:
import numpy
import tables
hdf = "data.hdf5"
h5file = tables.openFile(hdf, mode = "r")
lon = numpy.array(h5file.root.Lonitude)
lat = numpy.array(h5file.root.Latitude)
arr = numpy.array(h5file.root.data)
lon = numpy.array(lon.flat)
lat = numpy.array(lat.flat)
arr = numpy.array(arr.flat)
lonlist=[]
latlist=[]
layer=[]
fre=[]
for i in range(0, len(lon)):
    for j in range(0, 30):
        longi = lon[j]
        lati = lat[j]
        layers = [j]
        frequency = arr[i]
        lonlist.append(longi)
        latlist.append(lati)
        layer.append(layers)
        fre.append(frequency)
output = numpy.column_stack((lonlist,latlist,layer,fre))
The problem is that "frequency" is not what I want. I want the data array to be flattened along axis zero, so that "frequency" would be the 30 values at one location. Is there such a function in numpy to flatten an ndarray along a particular axis?
You can try np.ravel(your_array), or your_array.shape = -1. The np.ravel function takes an optional order argument: 'C' for row-major order or 'F' for column-major order.
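A quick sketch of what the order argument does:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

c = np.ravel(a)                  # row-major (C): [0, 1, 2, 3, 4, 5]
f = np.ravel(a, order='F')       # column-major (Fortran): [0, 3, 1, 4, 2, 5]
```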
I guess what you actually want is just transpose to change the axis order. Depending on what you do with it, it might be useful to do a .copy() after the transpose to optimize the memory layout, since transpose itself does not create a copy.
Just to add: if you want something beyond F and C order, you can use transposed = ndarray.transpose([1, 2, 0]) to move the first axis to the end and the last axis into second position, and then call transposed.ravel() (I assumed C order, so moved axis 0 to the end). You can also use reshape, which is more powerful than a simple ravel (the return shape can have any dimensions).
Note that unless the strides add up exactly, numpy will have to make a copy of the array; you can often avoid that with the very convenient transposed.flat iterator.
>>> a = np.random.rand(2,2,2)
>>> a
array([[[ 0.67379148, 0.95508303],
[ 0.80520281, 0.34666202]],
[[ 0.01862911, 0.33851973],
[ 0.18464121, 0.64637853]]])
>>> np.ravel(a)
array([ 0.67379148, 0.95508303, 0.80520281, 0.34666202, 0.01862911,
0.33851973, 0.18464121, 0.64637853])
You are essentially unfolding a high-dimensional tensor. Try tensorly.unfold(arr, mode=the_direction_you_want). For example,
import numpy as np
import tensorly as tl
a = np.zeros((3, 4, 5))
b = tl.unfold(a, mode=1)
b.shape # (4, 15)