Given an array and a mask, we can assign new values to the positions that are TRUE in the mask:
import numpy as np
a = np.array([1,2,3,4,5,6])
mask1 = (a==2) | (a==5)
a[mask1] = 100
print(a)
# [ 1 100 3 4 100 6]
However, if we apply a second mask over the first one, we can access the values but we cannot modify them:
a = np.array([1,2,3,4,5,6])
mask1 = (a==2) | (a==5)
mask2 = (a[mask1]==2)
print(a[mask1][mask2])
# [2]
a[mask1][mask2] = 100
print(a)
# [1 2 3 4 5 6]
Why does it happen?
(Even if it seems a bizarre way to do this; I'm asking just out of curiosity.)
This is probably because you mix getters and setters, which prevents the change from propagating back to the original array.
It's because you use mask1 as an indexer:
>>> mask1
array([False, True, False, False, True, False], dtype=bool)
Now, by setting a[mask1] = 100, you set all the elements where mask1 is True, resulting in
>>> a
array([ 1, 100, 3, 4, 100, 6])
Note that you have only called a "setter", so to speak, on a.
Now for a[mask1][mask2] = 100 you actually call both a getter and a setter. Indeed, you can write this as:
temp = a[mask1]    # getter
temp[mask2] = 100  # setter
As a result, you only set the value in temp, and the value is not "backpropagated", so to speak, to a itself. You should think of temp as a copy: boolean (advanced) indexing always returns a new array rather than a view onto a.
Note: there are circumstances where this pattern does work: if temp is, for instance, a view on an array, the assignment propagates back to the original. This page, for instance, shows ways to return a view instead of a copy.
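A minimal sketch of the view case mentioned in the note: chaining two basic (slice) indexing operations writes through to the original array, whereas chaining two boolean masks does not. The arrays reuse the values from the question.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

# a[1:5] is a view of a (positions 1..4); [::3] then picks positions
# 0 and 3 of that view, i.e. positions 1 and 4 of a.
a[1:5][::3] = 100
print(a)  # [  1 100   3   4 100   6]

b = np.array([1, 2, 3, 4, 5, 6])
mask1 = (b == 2) | (b == 5)
mask2 = b[mask1] == 2
# b[mask1] is a copy, so the assignment only modifies that temporary copy.
b[mask1][mask2] = 100
print(b)  # [1 2 3 4 5 6]
```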
You are chaining advanced* indexing operations for the assignment, which prevents the value 100 from being written back to the original array.
a[mask1] returns a new array with a copy of the original data. Writing a[mask1][mask2] = 100 means that this new array is indexed with mask2 and the value 100 assigned to it. This leaves a unchanged.
Simply viewing the items will appear to work fine because the values you pick out from the copy a[mask1] are the values you would want from the original array (although this is still inefficient as data is copied multiple times).
*Advanced (or "fancy") indexing is triggered by a boolean array or an array of indices. It always returns a new array, unlike basic indexing, which returns a view onto the original data (basic indexing is triggered, for example, by slicing).
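If the goal really is to write through both masks, one option (a sketch, not taken from the answers above) is to turn the first mask into integer positions with np.flatnonzero and select those positions with the second mask, or to build a single full-size boolean mask. Either way, the final assignment is one indexing operation on a itself.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
mask1 = (a == 2) | (a == 5)
mask2 = a[mask1] == 2          # mask over the *selected* elements only

# Option 1: map mask2 back to positions in a, then assign with a single
# indexing operation, which calls __setitem__ on a itself.
positions = np.flatnonzero(mask1)[mask2]
a[positions] = 100
print(a)  # [  1 100   3   4   5   6]

# Option 2: build one boolean mask with the same shape as the full array.
b = np.array([1, 2, 3, 4, 5, 6])
combined = np.zeros_like(mask1)   # all False, dtype bool
combined[mask1] = mask2
b[combined] = 100
print(b)  # [  1 100   3   4   5   6]
```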
Related
I have a 3D int64 NumPy array, which is output from skimage.measure.label. I need a list of 3D indices that match each of our possible (previously known) values, separated out by which indices correspond to each value.
Currently, we do this by the following idiom:
for cur_idx, count in values_counts.items():
    region = labels[:, :, :] == cur_idx
    [dim1_indices, dim2_indices, dim3_indices] = np.nonzero(region)
While this code works and produces correct output, it is quite slow, especially the np.nonzero part, as we call this 200+ times on a large array. I realize that there is probably a faster way to do this via, say, numba, but we'd like to avoid adding on additional requirements unless needed.
Ultimately, what we're looking for is a list of indices that correspond to each (nonzero) value, computed relatively efficiently. Assume the number of values is < 1000 but the array size is > 100x1000x1000. So, for example, on the array created by the following:
x = np.zeros((4,4,4))
x[3,3,3] = 1; x[1,0,3] = 2; x[1,2,3] = 2
we would want some idx_value dict/array such that idx_value_1[2] = 1, idx_value_2[2] = 2, idx_value_3[2] = 3.
I've tried tackling problems similar to the one you describe, and I think the np.argwhere function is probably your best option for reducing runtime (see the NumPy documentation for np.argwhere). See the code example below for how it could be used under the constraints you identify above.
import numpy as np
x = np.zeros((4,4,4))
x[3,3,3] = 1; x[1,0,3] = 2; x[1,2,3] = 3
# Instantiate dictionary/array to store indices
idx_value = {}
# Get the indices equal to 3, 2 and 1
idx_value[3] = np.argwhere(x == 3)
idx_value[2] = np.argwhere(x == 2)
idx_value[1] = np.argwhere(x == 1)
# Display idx_value, which is consistent with the indices set above
print(idx_value)
# {3: array([[1, 2, 3]]), 2: array([[1, 0, 3]]), 1: array([[3, 3, 3]])}
For the first use case, I think you would still have to use a for loop to iterate over the values you're searching over, but it could be done as:
# Instantiate a dictionary to store the indices
idx_value = {}
# Loop over the known values, incrementally adding key/value pairs
for cur_idx, count in values_counts.items():
    idx_value[cur_idx] = np.argwhere(labels == cur_idx)
NOTE: This incrementally creates a dictionary where each key is a label value to search for, and each value is an np.ndarray of shape (N_matches, 3).
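Both np.nonzero and np.argwhere still scan the whole array once per label. A different technique, not taken from the answer above, is to sort the flattened label array once and split the sorted positions into one group per value. A minimal sketch, assuming labels is an integer array and a dict keyed by label value is wanted; the function name indices_by_label is hypothetical.

```python
import numpy as np

def indices_by_label(labels, skip_zero=True):
    """Hypothetical helper: group the N-d indices of `labels` by value
    using a single sort.

    Returns a dict mapping each label value to an (n_matches, ndim) array,
    in the same row-major order that np.argwhere would produce.
    """
    flat = labels.ravel()
    # A stable sort keeps positions with equal labels in ascending order.
    order = np.argsort(flat, kind="stable")
    sorted_vals = flat[order]
    # First index of each run of equal values in the sorted labels.
    values, starts = np.unique(sorted_vals, return_index=True)
    groups = np.split(order, starts[1:])
    result = {}
    for value, flat_positions in zip(values, groups):
        if skip_zero and value == 0:
            continue
        coords = np.unravel_index(flat_positions, labels.shape)
        result[value.item()] = np.stack(coords, axis=1)
    return result

x = np.zeros((4, 4, 4), dtype=np.int64)
x[3, 3, 3] = 1; x[1, 0, 3] = 2; x[1, 2, 3] = 2
idx_value = indices_by_label(x)
print(idx_value[1])  # [[3 3 3]]
print(idx_value[2])  # [[1 0 3]
                     #  [1 2 3]]
```

The sort costs O(N log N) once instead of one full scan per label, and the index array it creates is as large as the input, so whether this is faster in practice depends on the array size and number of labels; it is worth timing on real data.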
I ran into a bug caused by using multiple sets of brackets to index an array, i.e. using a[i][j] (for various reasons, mostly laziness, which I've now fixed properly). While attempting to assign an element of the array, I found that I was unable to, but I didn't receive any kind of error to tell me why. I am confused as to why I can't do this:
>>> import numpy as np
>>> x = np.arange(0,50,10)
>>> idx = np.array([1,3,4])
>>> x[idx]
array([10, 30, 40])
>>> x[idx][1]
30
>>> x[idx][1] = 10 #this assignment doesn't work
>>> x[idx][1]
30
However, if I instead index the idx array inside the brackets, then it seems to work:
>>> x[idx[1]]
30
>>> x[idx[1]] = 100
>>> x[idx[1]]
100
Can someone explain to me why?
Another way to explain this is that each [] translates into a __getitem__ call, and each []= into a __setitem__ call.
x[idx][1]= 2
is then equivalent to
x.__getitem__(idx).__setitem__(1, 2)
If x.__getitem__(idx) produces a new array with its own data buffer, the set changes that array, not x. If, on the other hand, the get produces a view, then the change will appear in x.
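A small sketch of the equivalence described above, using the x and idx from the question: calling the dunder methods explicitly behaves like the chained bracket syntax, a slice (which gives a view) lets the chained form reach x, and a single indexing operation writes through directly.

```python
import numpy as np

x = np.arange(0, 50, 10)      # [ 0 10 20 30 40]
idx = np.array([1, 3, 4])

# Same as x[idx][1] = 2: __getitem__ with an index array returns a copy,
# and __setitem__ then modifies only that temporary copy.
x.__getitem__(idx).__setitem__(1, 2)
print(x)  # [ 0 10 20 30 40]

# A slice produces a view, so the same chained pattern does reach x.
x.__getitem__(slice(1, 4)).__setitem__(1, 2)
print(x)  # [ 0 10  2 30 40]

# Collapsing to a single indexing operation calls x.__setitem__ directly.
x[idx[1]] = 100
print(x)  # [  0  10   2 100  40]
```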
When working with arrays, it's important to understand the difference between a view and a copy.
Because:
x[idx]
Creates a new array object with an independent underlying buffer.
So then you index into that:
[1] = 10
Which does work, but you don't keep that new array around, so it is discarded immediately.
Whereas:
x[idx[1]] = 100
Assigns to some particular index in the existing array object.
I'm not very familiar with Python. I've been reading the book 'Python for Data Analysis' recently, and I'm a bit confused about NumPy boolean indexing and setting.
The book said:
Selecting data from an array by boolean indexing always creates a copy of the data, even if the returned array is unchanged.
Setting values with boolean arrays works in a common-sense way.
I tried it with the following code:
First:
data = np.random.randn(7, 4)
data[data < 0] = 0 # this could change the `data`
Second:
data = np.random.randn(7, 4)
copied = data[data < 0]
copied[1] = 1 # this couldn't change the `data`
I don't quite understand this; can anyone explain it? In my understanding, copied should be a pointer to the data[data < 0] slice.
While data[data < 0] = 0 sort of looks like a view being set to 0, that's not what's actually happening. In reality, an indexing expression on the left-hand side of = calls the array's __setitem__ method, which handles the piecewise assignment.
When the ndarray is on the other side of the =, __setitem__ isn't called and you assign a copy (as boolean indexing always does), which is independent of the original array.
Essentially:
foo[foo != bar] = bar # calls __setitem__
foo[:2] = bar # calls __setitem__
bar = foo[foo != bar] # makes a copy
bar = foo[:2] # makes a view
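To check which of these lines produce a copy and which a view, np.shares_memory can be used. A minimal sketch with a small random array; the seed and the use of np.random.default_rng (NumPy 1.17+) are illustrative choices, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((7, 4))

copied = data[data < 0]     # boolean indexing: always a new array
sliced = data[:2]           # basic slicing: a view onto data
print(np.shares_memory(data, copied))  # False
print(np.shares_memory(data, sliced))  # True

# Modifying the view changes data; modifying the copy does not.
sliced[0, 0] = 123.0
print(data[0, 0])  # 123.0

before = data.copy()
copied[:] = 1.0
print(np.array_equal(data, before))  # True

# Assigning through the mask calls data.__setitem__ and edits in place.
data[data < 0] = 0
print((data >= 0).all())  # True
```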
As a rule of thumb numpy creates a view where possible and a copy where necessary.
When is a view possible? When the data can be addressed using strides, i.e., for a 2D array A, each A[i, j] sits in memory at address base + i*stride[0] + j*stride[1]. If you create a subarray using only slices, this will always be the case, which is why you get a view.
For boolean and other advanced indexing, it will typically not be possible to find a base and strides that happen to address the right elements. Therefore these operations return a new array with the data copied.
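A minimal sketch of the strides argument, assuming an int64 array so each element is 8 bytes (the byte counts below depend on that dtype). The sliced subarray is described by new strides and a new starting address over the same buffer; the boolean selection here picks elements at buffer positions 0, 1, 4 and 9, which no single stride can reach, so the result is a separate array.

```python
import numpy as np

A = np.arange(12, dtype=np.int64).reshape(3, 4)
print(A.strides)  # (32, 8): next row is 32 bytes on, next column 8 bytes on

# Every other row, columns 1..2: still expressible as base + i*s0 + j*s1.
B = A[::2, 1:3]
print(B.strides)  # (64, 8)
offset = B.__array_interface__["data"][0] - A.__array_interface__["data"][0]
print(offset)     # 8: B's base is one element into A's buffer
print(np.shares_memory(A, B))  # True

# Boolean indexing always copies; here the selected positions are not
# evenly spaced, so a strided view would not even be possible.
C = A[np.isin(A, [0, 1, 4, 9])]
print(C)  # [0 1 4 9]
print(np.shares_memory(A, C))  # False
```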
Based on the sequence of the code:
data = np.random.randn(7, 4): This step creates an array of size 7 by 4.
data[data < 0] = 0: This sets every element in data that is < 0 to 0.
copied = data[data < 0]: This step generates an empty array, as there is no longer any element in data that is < 0 because of the previous step.
copied[1] = 1: This step raises an IndexError, as copied is an empty array and thus index 1 does not exist.
I'm fairly new to Python/NumPy. What I have here is a standard array and a function that I have vectorized.
import numpy as np
def f(i):
    return np.random.choice(2,1,p=[0.7,0.3])*9
f = np.vectorize(f)
Defining an example array:
array = np.array([[1,1,0],[0,1,0],[0,0,1]])
With the vectorized function, f, I would like to evaluate f on each cell on the array with a value of 0.
I am trying to leave for loops as a last resort. My arrays will eventually be larger than 100 by 100, so visiting each cell individually to check it and evaluate f might take too long.
I have tried:
print(f(array[array==0]))
Unfortunately, this gives me a row array consisting of 5 elements (the zeroes in my original array).
Alternatively, I have tried:
array[array==0] = f(1)
But, as expected, this just turns every zero element of array into the same value (either all 0s or all 9s), because f(1) is evaluated only once.
What I'm looking for is somehow to give me my original array with the zero elements replaced individually. Ideally, 30% of my original zero elements will become 9 and the array structure is conserved.
Thanks
The reason your first try doesn't work is that the vectorized function handle (let's call it f_v to distinguish it from the original f) performs the operation on exactly 5 elements: the 5 elements returned by the boolean indexing operation array[array==0]. It returns 5 values; it doesn't set those 5 items to the returned values. Your analysis of why the second form fails is spot-on.
If you wanted to solve it you could combine your second approach with adding the size option to np.random.choice:
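import numpy as np
array = np.array([[1,1,0],[0,1,0],[0,0,1]])
mask = array==0
# f returns 0 or 9; [18, 9] makes every replaced cell visible, while [0, 9] reproduces f exactly
array[mask] = np.random.choice([18,9], size=mask.sum(), p=[0.7, 0.3])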
# example output:
# array([[ 1, 1, 9],
# [18, 1, 9],
# [ 9, 18, 1]])
There was no need for np.vectorize: the size option takes care of that already.
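A variant of the same idea using NumPy's newer Generator API (np.random.default_rng, NumPy 1.17+), with [0, 9] as the choices so the result matches what f would produce. The seed is an arbitrary illustrative choice, so no particular output is claimed; only the structural properties (shape kept, non-zero cells untouched) are guaranteed.

```python
import numpy as np

rng = np.random.default_rng(42)
array = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 1]])
original = array.copy()

mask = array == 0
# One independent draw per zero cell: 0 with probability 0.7, 9 with 0.3.
array[mask] = rng.choice([0, 9], size=mask.sum(), p=[0.7, 0.3])
print(array)
```

Because the right-hand side has exactly mask.sum() elements, each zero cell receives its own draw, and the assignment through mask edits array in place.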
I was an R user and I am learning Python (NumPy in particular), but I cannot perform a simple task of updating a submatrix in Python which can be very easily done in R.
So I have 2 problems.
The first one: say we have a 4 by 4 matrix
A = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
and a 2 by 2 matrix
B = np.array([[100,200],[300,400]]).
I want to grab the 2 by 2 submatrix of A formed by taking the 2nd and 4th rows and columns (array([[6,8],[14,16]])) and replace it with B.
I can pull the correct matrix by doing
m = [1,3]
A[m][:,m]
but nothing happens to A when I try to assign B to it. That is,
A[m][:,m] = B
print(A)
and A comes out to be the same.
Is there a way to do this without using loops or maybe with a relatively simple code?
The second problem, which is relatively easy, is that in R we can subset a matrix with TRUE and FALSE. From the A above, we can subset the same 2 by 2 matrix with
m <- c(FALSE, TRUE, FALSE, TRUE)
A[m,m]
However, in Python the same code does not seem to work because True is 1 and False is 0. I think I can convert [F,T,F,T] to [1,3] and subset, but I thought there may be a one step method to do this.
Is there an easy way to do the same operation in Python when the index is given in terms of True and False?
For part 1, from NumPy for MATLAB Users, there are examples showing both read-only and mutable access to arbitrary slices.
The read-only pattern is similar to what you already describe, A[:, m][m]. This slices the columns first, then the rows, and returns a copy of the selected data rather than a view, so assigning to it leaves A unchanged.
To obtain clean indices for mutating the sub-array, a convenience function is provided, np.ix_. It will stitch together its arguments into an R-like or MATLAB-like slice:
indxs = np.ix_([1,3], [1,3])
A[indxs] = B
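A self-contained version of the np.ix_ approach using the A and B from the question; the printed results were worked out by hand from those two matrices.

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
B = np.array([[100, 200], [300, 400]])

rows = cols = [1, 3]          # 2nd and 4th rows/columns (0-based indices)
print(A[np.ix_(rows, cols)])  # [[ 6  8]
                              #  [14 16]]

A[np.ix_(rows, cols)] = B     # a single __setitem__ call on A
print(A)
# [[  1   2   3   4]
#  [  5 100   7 200]
#  [  9  10  11  12]
#  [ 13 300  15 400]]
```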
The reason behind this is that NumPy follows certain shape-conformability rules (called "broadcasting" rules) about how to infer the shapes you intended based on the shapes present in the data. When NumPy does this for a row index and column index pair, it tries to pair them up element-wise.
So A[[1, 3], [1, 3]], under NumPy's chosen conventions, is interpreted as "Fetch for me the value of A at index (1,1) and at index (3,3)." This differs from the conventions for the same syntax in MATLAB, Octave, or R.
If you want to get around this manually, without np.ix_, you still can, but you must write down your indices to take advantage of NumPy's broadcasting rules. What this means is you have to give NumPy a reason to believe that you want a 2x2 grid of indices instead of a 1x2 list of two specific points.
You can trick it into believing this by making your row entries into lists themselves: rows = [[1], [3]]. Now, when NumPy examines the shape of this (2 x 1 instead of a flat list of 2), it broadcasts the row index against the column list and pairs every row with every column, giving a 2 x 2 grid of indices. That's why this will also work:
A[[[1], [3]], [1, 3]] = B
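To see the broadcasting at work, a short sketch comparing the flat and nested row indices; np.broadcast reports the shape that the row and column index arrays combine to, and np.ix_ turns out to produce the same kind of reshaped index arrays.

```python
import numpy as np

A = np.arange(1, 17).reshape(4, 4)  # same values as A in the question
rows_flat = np.array([1, 3])        # shape (2,)
rows_col = np.array([[1], [3]])     # shape (2, 1)
cols = np.array([1, 3])             # shape (2,)

print(np.broadcast(rows_flat, cols).shape)  # (2,): two (row, col) points
print(np.broadcast(rows_col, cols).shape)   # (2, 2): a 2x2 grid

print(A[rows_flat, cols])  # [ 6 16]: A[1, 1] and A[3, 3]
print(A[rows_col, cols])   # [[ 6  8]
                           #  [14 16]]

# np.ix_ builds reshaped index arrays of this kind for you.
r, c = np.ix_([1, 3], [1, 3])
print(r.shape, c.shape)    # (2, 1) (1, 2)
```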
For the second part of your question, the issue is that you want to let NumPy know that your array of [False, True, False, True] is a boolean array and should not be implicitly cast as any other type of array.
This can be done in many ways, but one easy way is to construct an np.array of your boolean values, and its dtype will be bool:
indxs = np.array([False, True, False, True])
print(A[:, indxs][indxs])  # remember, this returns a copy, so assigning to it won't change A
A[np.ix_(indxs, indxs)] = B
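A runnable sketch of the boolean version. np.ix_ accepts boolean sequences and interprets each one as a mask for its dimension, which is equivalent to passing the positions of its True entries; np.flatnonzero shows those positions explicitly.

```python
import numpy as np

A = np.arange(1, 17).reshape(4, 4)
B = np.array([[100, 200], [300, 400]])
indxs = np.array([False, True, False, True])

print(np.flatnonzero(indxs))    # [1 3]: the positions of the True entries
print(A[np.ix_(indxs, indxs)])  # [[ 6  8]
                                #  [14 16]]

A[np.ix_(indxs, indxs)] = B
print(A[np.ix_(indxs, indxs)])  # [[100 200]
                                #  [300 400]]
```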
Another helpful NumPy convenience tool is np.s_, which is not a function (it is an instance of numpy.lib.index_tricks.IndexExpression) but can be used kind of like one.
np.s_ allows you to use the element-getting syntax (called getitem syntax in Python, after the __getitem__ method that any new-style class instances will have). By way of example:
In [60]: np.s_[[1,3], [1,3]]
Out[60]: ([1, 3], [1, 3])
In [61]: np.s_[np.ix_([1,3], [1,3])]
Out[61]:
(array([[1],
        [3]]), array([[1, 3]]))
In [62]: np.s_[:, [1,3]]
Out[62]: (slice(None, None, None), [1, 3])
In [63]: np.s_[:, :]
Out[63]: (slice(None, None, None), slice(None, None, None))
In [64]: np.s_[-1:1:-2, :]
Out[64]: (slice(-1, 1, -2), slice(None, None, None))
So np.s_ basically just mirrors back what the index object will look like if you were to place it inside the square brackets to access some array's data.
In particular, the first two of these np.s_ examples show the difference between plain A[[1,3], [1,3]] and the use of np.ix_([1,3], [1,3]), and how they result in different indices.
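One practical use, sketched here as an illustration rather than taken from the answer: because np.s_ simply returns the index object, you can build an index once, store it under a name, and reuse it for both reading and writing.

```python
import numpy as np

A = np.arange(1, 17).reshape(4, 4)

grid = np.s_[np.ix_([1, 3], [1, 3])]   # the same tuple np.ix_ returns
block = np.s_[1:3, ::2]                # (slice(1, 3, None), slice(None, None, 2))

print(np.array_equal(A[block], A[1:3, ::2]))  # True
A[grid] = 0
print(A)
# [[ 1  2  3  4]
#  [ 5  0  7  0]
#  [ 9 10 11 12]
#  [13  0 15  0]]
```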