Replacing numpy array elements with chained masks - python

Consider an array arr and a boolean (advanced-indexing) mask mask:
import numpy as np
arr = np.arange(4).reshape(2, 2)
mask = arr < 2
Advanced indexing creates a new copy of the array. Accordingly, one cannot "chain" a mask with an additional mask, or even with a basic slicing operation, to replace elements of an array:
submask = [False, True]
arr[mask][submask] = -1 # chaining 2 masks
arr[mask][:] = -1 # chaining a mask with a basic slicing operation
print(arr)
[[0 1]
[2 3]]
I have two related questions:
1/ What is the best way to replace elements of an array using chained masks?
2/ If advanced indexing returns a copy of an array, why does the following work?
arr[mask] = -1
print(arr)
[[-1 -1]
[ 2 3]]

The short answer:
you have to find a way of combining the masks. Since masks can "chain" in different ways, I don't think there's a simple all-purpose substitute.
Indexing can be either a __getitem__ call or a __setitem__ call. Your last case is a set.
With chained indexing, a[mask1][mask2] = value gets translated into
a.__getitem__(mask1).__setitem__(mask2, value)
Whether a gets modified or not depends on what the first getitem produces (a view vs copy).
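A minimal sketch of that translation, calling the dunder methods directly on the question's example array (for illustration only; normal code would use the bracket syntax):

```python
import numpy as np

arr = np.arange(4).reshape(2, 2)
mask = arr < 2
submask = [False, True]

# arr[mask][submask] = -1 is evaluated as these two calls:
tmp = arr.__getitem__(mask)    # boolean indexing: tmp is a new 1-d copy
tmp.__setitem__(submask, -1)   # modifies tmp only; arr is untouched
print(arr)

# arr[mask] = -1 is a single __setitem__ call on arr itself:
arr.__setitem__(mask, -1)
print(arr)
```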
In [11]: arr = np.arange(4).reshape(2,2)
In [12]: mask = arr<2
In [13]: mask
Out[13]:
array([[ True, True],
[False, False]])
In [14]: arr[mask]
Out[14]: array([0, 1])
Indexing with a list or integer array may preserve the number of dimensions, but a boolean mask like this returns a 1d array: the items where the mask is True.
In your example, we could tweak the mask (details may vary with the intent of the 2nd mask):
In [15]: mask[:,0]=False
In [16]: mask
Out[16]:
array([[False, True],
[False, False]])
In [17]: arr[mask]
Out[17]: array([1])
In [18]: arr[mask] += 10
In [19]: arr
Out[19]:
array([[ 0, 11],
[ 2, 3]])
Or a logical combination of masks:
In [26]: (np.arange(4).reshape(2,2)<2)&[False,True]
Out[26]:
array([[False, True],
[False, False]])
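For the specific pattern in the question, where the second mask indexes the 1d result of arr[mask], one way to combine the masks is to write the second mask back into the True positions of the first. A sketch, assuming both masks are boolean and submask has exactly one entry per True in mask (the name combined is illustrative):

```python
import numpy as np

arr = np.arange(4).reshape(2, 2)
mask = arr < 2               # selects 2 elements
submask = [False, True]      # indexes the 1-d result of arr[mask]

# Start from mask and, at its True positions, keep only what submask selects.
combined = mask.copy()
combined[mask] = submask

arr[combined] = -1           # one __setitem__ call, so arr itself changes
print(arr)
```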

Couple of good questions! My take:
I would do something like this:
x,y=np.where(mask)
arr[x[submask],y[submask]] = -1
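The same idea, written so it does not assume a 2d array: np.where(mask) with a single argument returns the same tuple of per-dimension index arrays as np.nonzero(mask), and each of those can be filtered by submask. A sketch, assuming submask is a boolean array with one entry per True in mask:

```python
import numpy as np

arr = np.arange(4).reshape(2, 2)
mask = arr < 2
submask = np.array([False, True])

# One index array per dimension; filter each by submask, then index with the tuple.
idx = tuple(i[submask] for i in np.nonzero(mask))
arr[idx] = -1
print(arr)
```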
From the official documentation:
Most of the following examples show the use of indexing when referencing data in an array. The examples work just as well when assigning to an array. See the section at the end for specific examples and explanations on how assignments work.
which means arr[mask] = -1 assigns into arr in place (a __setitem__ call), while arr[mask] on its own extracts the data and creates a copy.
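A small check of the copy/view distinction, using np.shares_memory to test whether two arrays overlap in memory (the variable names are illustrative):

```python
import numpy as np

arr = np.arange(4).reshape(2, 2)
mask = arr < 2

extracted = arr[mask]                    # boolean __getitem__ -> copy
print(np.shares_memory(extracted, arr))  # False
sliced = arr[0]                          # basic indexing -> view
print(np.shares_memory(sliced, arr))     # True

extracted[:] = 99                        # changes the copy only
arr[mask] = -1                           # __setitem__ writes into arr directly
print(arr)
```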

Related

Count the occurrences of a specific value and remove them at the same time

I want to count the occurrences of a specific value (in my case -1) in a numpy array and delete them at the same time.
I managed to do it; here is what I've done:
import numpy as np
a = np.array([1, 2, 0, -1, 3, -1, -1])
b = a[a==-1]
a = np.delete(a, np.where(a==-1))
print("a -> ", a) # a -> [1 2 0 3]
print("b -> ", len(b)) # b -> 3
Is there a more optimized way to do it?
Something like this?
a = [x for x in a if x != -1]
Using numpy like you did is probably more optimized, though.
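If the count is also needed, it can be recovered from the length difference. A small sketch on a plain Python list (kept and removed are illustrative names):

```python
a = [1, 2, 0, -1, 3, -1, -1]

kept = [x for x in a if x != -1]   # new list without the -1 entries
removed = len(a) - len(kept)       # how many entries were dropped
print(kept, removed)               # [1, 2, 0, 3] 3
```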
First, a list in-place count and delete operation:
In [100]: al=a.tolist(); cnt=0
In [101]: for i in range(len(a)-1,-1,-1):
     ...:     if al[i]==-1:
     ...:         del al[i]
     ...:         cnt += 1
In [102]: al
Out[102]: [1, 2, 0, 3]
In [103]: cnt
Out[103]: 3
It operates in place, but has to work from the end. The list comprehension alternative makes a new list, but often is easier to write and read.
The cleanest array operation uses a boolean mask.
In [104]: idx = a==-1
In [105]: idx
Out[105]: array([False, False, False, True, False, True, True], dtype=bool)
In [106]: np.sum(idx) # or np.count_nonzero(idx)
Out[106]: 3
In [107]: a[~idx]
Out[107]: array([1, 2, 0, 3])
You have to identify, in one way or other, all elements that match the target. The count is a trivial operation. Masking is also easy.
np.delete has to be told which items to delete; and in one way or other constructs a new array that contains all but the 'deleted' ones. Because of its generality it will almost always be slower than a direct action like this masking.
np.where with a single argument (equivalent to np.nonzero) uses count_nonzero to determine how many values it will return.
So I'm proposing the same actions as you are doing, but in a little more direct way.
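The masking approach wrapped in a small function, as a sketch; count_and_remove is a hypothetical helper name, not a numpy function:

```python
import numpy as np

def count_and_remove(a, value):
    """Hypothetical helper: return (a without `value`, number of occurrences)."""
    idx = a == value                      # boolean mask of the matches
    return a[~idx], np.count_nonzero(idx)

a = np.array([1, 2, 0, -1, 3, -1, -1])
a, b = count_and_remove(a, -1)
print("a -> ", a)
print("b -> ", b)
```

Computing the mask once and reusing it for both the count and the selection avoids the second pass that np.where plus np.delete would make.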

flatten arrays in a list in python

I have multiple numpy masked arrays arr0, arr1, ..., arrn.
I put them in a list arrs = [arr0, ..., arrn].
I want to flatten these arrays and put a mask on them. I did something like:
for arr in arrs:
    arr = np.ravel(arr)
    arr[mask] = ma.masked
I do not understand when Python makes copies and when it just uses a reference. This for loop does not flatten arr0, ..., arrn (even though ravel returns a view and not a copy); it just flattens the variable arr, although it does change their mask!
As I understand it, arr is a view of the elements in the list arrs, so when I change elements of arr it changes the elements of the corresponding array in the list. But when I assign a new value to arr, it does not change the original array, even if the assigned value is supposed to be a view of this array. Why?
Edit with an example:
Arrays to flatten:
arr0 = masked_array(data=[[1,2],[3,4]], mask=False)
arr1 = masked_array(data=[[5,6],[7,8]], mask=False)
mask = [[False,True],[True,False]]
Expected output:
arr0 = masked_array(data=[[1,--],[--,4]], mask=[[False,True],[True,False]])
arr1 = masked_array(data=[[5,--],[--,8]], mask=[[False,True],[True,False]])
I'd like to do this in a loop because I have a lot of arrays (15, more or less), and I want to use the array names in the code. Is there no other way than to do:
arr0 = np.ravel(arr0)
...
arrn = np.ravel(arrn)
In [1032]: arr0 = np.ma.masked_array(data=[[1,2],[3,4]], mask=False)
In [1033]: arr1 = np.ma.masked_array(data=[[5,6],[7,8]], mask=False)
This is the basic way of iterating over a list, applying some action to each element, and collecting the results in another list:
In [1037]: ll=[arr0,arr1]
In [1038]: ll1=[]
In [1047]: for a in ll:
      ...:     a1=a.flatten() # makes a copy
      ...:     a1.mask=mask
      ...:     ll1.append(a1)
In [1049]: ll1
Out[1049]:
[masked_array(data = [1 -- -- 4], mask = [False True True False],
fill_value = 999999),
masked_array(data = [5 -- -- 8], mask = [False True True False],
fill_value = 999999)]
Often that can be written as a list comprehension
[foo(a) for a in alist]
but the action here isn't a neat function.
If I use ravel instead, a1 is a view (not a copy), and applying mask to it changes the mask of a as well - the result is changed masks for arr0, but no change in shape:
In [1051]: for a in ll:
      ......:     a1=a.ravel()
      ......:     a1.mask=mask
(the same happens with your a=a.ravel(). The a= assigns a new value to a, breaking the link to the iteration value. That's true for any Python iteration. It's best to use new variable names inside the iteration like a1 so you don't confuse yourself.)
Essentially the same as
In [1054]: for a in ll:
      ......:     a.mask=mask
I can change the shape in the same in-place way
In [1055]: for a in ll:
      ......:     a.shape=[-1] # shorthand for an in-place ravel
      ......:     a.mask=mask
In [1056]: arr0
Out[1056]:
masked_array(data = [1 -- -- 4],
mask = [False True True False],
fill_value = 999999)
Here's a functional way of creating new arrays with new shape and mask, and using it in a list comprehension (and no change to arr0)
[np.ma.masked_array(a,mask=mask).ravel() for a in [arr0,arr1]]
Understanding these alternatives does require understanding how Python assigns iterative variables, and how numpy makes copies and views.
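A compact illustration of both points, using plain 2d ndarrays instead of masked arrays to keep it short (an assumption; the rebinding behavior is the same for any Python object):

```python
import numpy as np

arrs = [np.arange(4).reshape(2, 2), np.arange(4, 8).reshape(2, 2)]

for arr in arrs:
    arr = np.ravel(arr)      # rebinds the loop name only; the list is unchanged
print([x.shape for x in arrs])           # [(2, 2), (2, 2)]

a = arrs[0]
print(np.shares_memory(a.ravel(), a))    # True: ravel of a contiguous array is a view
print(np.shares_memory(a.flatten(), a))  # False: flatten always copies

for arr in arrs:
    arr[0, 1] = -1           # mutating through the loop name reaches the list's arrays
print(arrs[0])
```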

python numpy strange boolean arithmetic behaviour

Why is it, in python/numpy:
from numpy import asarray
bools=asarray([False,True])
print(bools)
[False True]
print(1*bools, 0+bools, 0-bools) # False, True are valued as 0, 1
[0 1] [0 1] [ 0 -1]
print(-2*bools, -bools*2) # !? expected same result! :-/
[0 -2] [2 0]
print(-bools) # this is the reason!
[True False]
I consider it weird that -bools returns logical_not(bools), because in all other cases the behaviour is "arithmetic", not "logical".
One who wants to use an array of booleans as a 0/1 mask (or "characteristic function") is forced to use somewhat convoluted expressions such as (0-bools) or (-1)*bools, and can easily run into bugs if they forget about this.
Why is it so, and what would be the best acceptable way to obtain the desired behaviour? (besides commenting, of course)
It's all about operator order and data types.
>>> import numpy as np
>>> B = np.array([0, 1], dtype=bool)
>>> B
array([False, True], dtype=bool)
With numpy, boolean arrays are treated as just that: boolean arrays. Every operation applied to them will first try to preserve the data type. That is why:
>>> -B
array([ True, False], dtype=bool)
and
>>> ~B
array([ True, False], dtype=bool)
which are equivalent: both return the element-wise negation. Note, however, that -B on a boolean array is deprecated: older NumPy versions emit a warning, and current versions raise a TypeError, so ~B (or np.logical_not(B)) is the form to use.
When you use things like:
>>> B + 1
array([1, 2])
B and 1 are first cast under the hood to a common data type. In type promotion, the boolean array is always cast to a numeric array. In the above case, B is cast to int, which is similar to:
>>> B.astype(int) + 1
array([1, 2])
In your example:
>>> -B * 2
array([2, 0])
First the array B is negated by the operator - and then multiplied by 2. The desired behaviour can be adopted either by explicit data conversion, or adding brackets to ensure proper operation order:
>>> -(B * 2)
array([ 0, -2])
or
>>> -B.astype(int) * 2
array([ 0, -2])
Note that B.astype(int) can be replaced, without copying data, by B.view(np.int8): booleans are stored as single bytes (8 bits), so the .view method can reinterpret the same data as integers without converting it.
>>> B.view(np.int8)
array([0, 1], dtype=int8)
So, in short, B.view(np.int8) or B.astype(yourtype) will always give you a [0, 1] numeric version of B.
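The alternatives from this answer in one runnable sketch; the try/except around -B is there because current NumPy rejects unary minus on boolean arrays:

```python
import numpy as np

B = np.array([False, True])

print(~B)                  # logical NOT
print(B + 1)               # bool promoted to int before adding
print(-(B * 2))            # multiply first (result is int), then negate
print(-B.astype(int) * 2)  # explicit conversion, then negate and multiply

v = B.view(np.int8)        # the same bytes reinterpreted as int8, no copy
print(v, np.shares_memory(v, B))

try:
    -B                     # unary minus on a bool array
except TypeError as exc:   # raised by current NumPy versions
    print("TypeError:", exc)
```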
Numpy arrays are homogeneous: all elements have the same type for a given array, and the array object stores what type that is. When you create an array with True and False, it is an array of type bool and operators behave on the array as such. It's not surprising, then, that you get logical negation happening in situations that would be logical negation for a normal bool. When you use the arrays for integer math, they are converted to 1's and 0's. Of all your examples, those are the more anomalous cases; that is behavior that shouldn't be relied upon in good code.
As suggested in the comments, if you want to do math with an array of 0's and 1's, it's better to just make an array of 0's and 1's. However, depending on what you want to do with them, you might be better served looking into functions like numpy.where().
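A sketch of the numpy.where route: np.where(condition, x, y) picks x where the condition is True and y elsewhere, so it produces a numeric array directly (the values -2 and 0 are just the question's example):

```python
import numpy as np

bools = np.array([False, True])

weights = np.where(bools, -2, 0)   # -2 where True, 0 where False
print(weights)

indicator = bools.astype(int)      # an explicit 0/1 array, as suggested above
print(-indicator * 2)
```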

How to generate a bool 2D arrays from two 1D arrays using numpy

I have two arrays a=[1,2,3,4] and b=[2,3]. I am wondering whether there is an efficient way to construct a boolean 2D array c (a 2*4 matrix) based on element comparisons, i.e. c[0,0] = True iff a[0] == b[0]. The basic way is to iterate through all the elements of a and b, but I think there may be a better way using numpy. I checked the numpy reference, but could not find a routine that does exactly that.
Thanks
If I understood the question correctly, you can extend the dimensions of b with np.newaxis/None to form a 2D array and then perform equality check against a, which will bring in broadcasting for a vectorized solution, like so -
b[:,None] == a
Sample run -
In [5]: a
Out[5]: array([1, 2, 3, 4])
In [6]: b
Out[6]: array([2, 3])
In [7]: b[:,None] == a
Out[7]:
array([[False, True, False, False],
[False, False, True, False]], dtype=bool)
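The same table can also be spelled with the outer method that every binary ufunc has; a short sketch checking that both forms agree:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([2, 3])

c1 = b[:, None] == a          # broadcasting: (2, 1) against (4,) -> (2, 4)
c2 = np.equal.outer(b, a)     # ufunc.outer builds the same (2, 4) table
print(c1.shape, np.array_equal(c1, c2))
```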

Equality of copy.copy and copy.deepcopy in python copy module

I am creating a list of numpy arrays, then copying it to another array to keep an original copy. Copying was done using the deepcopy() function. When I compare the two arrays now, the equality check returns False. But it's all good when I use the copy() function. I understand the difference between copy and deepcopy, but shouldn't the equality result be the same?
That is:
import copy
import numpy as np
grid1=np.empty([3,3],dtype=object)
for i in range(3):
    for j in range(3):
        grid1[i][j] = [i,np.random.uniform(-3.5,3.5,(3,3))]
grid_init=[]
grid_init=copy.deepcopy(grid1)
grid1==grid_init #returns False
grid_init=[]
grid_init=copy.copy(grid1)
grid1==grid_init #returns True
grid_init=[]
grid_init=copy.deepcopy(grid1)
np.array_equal(grid1,grid_init) #returns False
Shouldn't all of these be True?
This is what I'm getting when running the first example:
WARNING:py.warnings:/usr/local/bin/ipython:1: DeprecationWarning: elementwise comparison failed; this will raise the error in the future.
To see why the elementwise comparison fails, simply try to compare a single element:
grid_init=copy.deepcopy(grid1)
grid_init[0][0] == grid1[0][0]
>>> ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This fails because the second element in the list is in itself a numpy array, and comparison of two numpy arrays does not return a bool (but an array).
Now, why does the shallow-copy case behave differently?
It seems to be an interpreter optimization that skips the actual comparison logic when the two objects are the same one: Python's built-in containers assume that identical objects are equal. The two are the same object here, because the copying was shallow.
grid_init=copy.copy(grid1)
grid_init[0][0] is grid1[0][0]
> True
grid_init[0][0] == grid1[0][0]
> True
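A minimal reproduction of that identity shortcut with plain lists (same and other are illustrative names):

```python
import numpy as np

x = np.arange(3)
same = [0, x]
other = [0, x]          # a different list holding the same array object
print(same == other)    # True: x is x, so x == x is never evaluated

try:
    [0, np.arange(3)] == [0, np.arange(3)]   # distinct arrays with equal values
except ValueError as exc:
    print("ValueError:", exc)
```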
The root cause is that you're using a numpy array of dtype=object, with lists in it. This is not a good idea, and can lead to all sorts of weirdnesses.
Instead, you should simply create 2 aligned arrays, one for the first element in your lists, and one for the second.
I must be running a different version of numpy/python, but I get slightly different errors and/or results. Still the same issue applies - mixing arrays and lists can produce complicated results.
Make the 2 copies
In [217]: x=copy.copy(grid1)
In [218]: y=copy.deepcopy(grid1)
Equality with the shallow copy gives an element-by-element comparison, a 3x3 boolean array:
In [219]: x==grid1
Out[219]:
array([[ True, True, True],
[ True, True, True],
[ True, True, True]], dtype=bool)
The elements are 2 item lists:
In [220]: grid1[0,0]
Out[220]:
[0, array([[ 2.08833787, -0.24595155, -3.15694342],
[-3.05157909, 1.83814619, -0.78387624],
[ 1.70892355, -0.87361521, -0.83255383]])]
And in the shallow copy, the list ids are the same. The 2 arrays have different data buffers (x is not a view), but they both point to the same list objects (located elsewhere in memory).
In [221]: id(grid1[0,0])
Out[221]: 2958477004
In [222]: id(x[0,0])
Out[222]: 2958477004
With the same id the lists are equal (they also satisfy the is test).
In [234]: grid1[0,0]==x[0,0]
Out[234]: True
But == with the deepcopy produces a simple False. No element by element comparison here. I'm not sure why. Maybe this is an area in which numpy is undergoing development.
In [223]: y==grid1
Out[223]: False
Note that the deepcopy element ids are different:
In [229]: id(y[0,0])
Out[229]: 2957009900
When I try to apply == to an element of these arrays I get an error:
In [235]: grid1[0,0]==y[0,0]
...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This is the error that comes up repeatedly in SO questions, usually because people try to use a boolean array (from a comparison) in a scalar Python context.
I can compare the arrays within the lists:
In [236]: grid1[0,0][1]==y[0,0][1]
Out[236]:
array([[ True, True, True],
[ True, True, True],
[ True, True, True]], dtype=bool)
I can reproduce the ValueError with a simpler comparison - 2 lists, which contain an array. On the surface they look the same, but because the arrays have different ids, it fails.
In [239]: [0,np.arange(3)]==[0,np.arange(3)]
...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This pair of comparisons shows what is going on:
In [242]: [0,np.arange(3)][0]==[0,np.arange(3)][0]
Out[242]: True
In [243]: [0,np.arange(3)][1]==[0,np.arange(3)][1]
Out[243]: array([ True, True, True], dtype=bool)
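Python compares the corresponding elements of the lists and needs a single True/False from each comparison to decide whether the lists are equal. It gets True for the first pair, but the second comparison returns array([True,True,True]), and converting that array to a single bool raises the ValueError.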
So in my version, y==grid1 returns False because the element by element comparisons return ValueErrors. It's either that or raise an error or warning. They clearly aren't equal.
In sum, with this array of lists of numbers and arrays, equality tests end up mixing array operations and list operations. The outcomes are logical, but complicated. You have to be keenly aware of how arrays are compared, and how lists are compared. They are not interchangeable.
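If the object array has to stay, one workaround is to compare cell by cell with np.array_equal, so no array is ever forced into a plain bool context. A sketch; cells_equal is a hypothetical helper, and each cell is assumed to be a [number, array] list as in the question:

```python
import copy
import numpy as np

def cells_equal(c1, c2):
    """Hypothetical helper: compare two [number, array] cells item by item."""
    return len(c1) == len(c2) and all(np.array_equal(u, v) for u, v in zip(c1, c2))

grid1 = np.empty([3, 3], dtype=object)
for i in range(3):
    for j in range(3):
        grid1[i, j] = [i, np.random.uniform(-3.5, 3.5, (3, 3))]

grid_init = copy.deepcopy(grid1)
print(all(cells_equal(p, q) for p, q in zip(grid1.flat, grid_init.flat)))
```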
A structured array
You could put this data in a structured array, with a dtype like
dt = np.dtype([('f0',int),('f1',float,(3,3))])
In [263]: dt = np.dtype([('f0',int),('f1',float,(3,3))])
In [264]: grid2=np.empty([3,3],dtype=dt)
In [265]: for i in range(3):
     ...:     for j in range(3):
     ...:         grid2[i][j] = (i,np.random.uniform(-3.5,3.5,(3,3)))
.....:
In [266]: grid2
Out[266]:
array([[ (0,
[[2.719807845330254, -0.6379512247418969, -0.02567206509563602],
[0.9585030371031278, -1.0042751112999135, -2.7805349057485946],
[-2.244526250770717, 0.5740647379258945, 0.29076071288760574]]),
....]])]],
dtype=[('f0', '<i4'), ('f1', '<f8', (3, 3))])
The first field, the integers, can be fetched with (giving a 3x3 array):
In [267]: grid2['f0']
Out[267]:
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2]])
The second field contains 3x3 arrays, which when accessed by field name are a 4d array:
In [269]: grid2['f1'].shape
Out[269]: (3, 3, 3, 3)
A single element is a record (or tuple),
In [270]: grid2[2,1]
Out[270]: (2, [[1.6236266210555836, -2.7383730706629636, -0.46604477485902374], [-2.781740733659544, 0.7822732671353201, 3.0054266762730473], [3.3135671425199824, -2.7466097112667103, -0.15205961855874406]])
Now both kinds of copy produce the same thing:
In [271]: x=copy.copy(grid2)
In [272]: y=copy.deepcopy(grid2)
In [273]: x==grid2
Out[273]:
array([[ True, True, True],
[ True, True, True],
[ True, True, True]], dtype=bool)
In [274]: y==grid2
Out[274]:
array([[ True, True, True],
[ True, True, True],
[ True, True, True]], dtype=bool)
Since grid2 is a pure ndarray (no intermediate lists), I suspect copy.copy and copy.deepcopy end up doing the same thing as grid2.copy(). In numpy we normally use the array copy method, and don't bother with the copy module.
p.s. it appears that with dtype=object, grid1.copy() is the same as copy.copy(grid1) - a new array, but the same object pointers (i.e. same data).
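A sketch of both observations. It fills the structured array with vectorized field assignments instead of the nested loop (a substitution for brevity, not the answer's code) and checks the copy field by field with np.array_equal; the second half shows that .copy() on a dtype=object array reuses the same list objects:

```python
import copy
import numpy as np

dt = np.dtype([('f0', int), ('f1', float, (3, 3))])
grid2 = np.zeros([3, 3], dtype=dt)
grid2['f0'] = np.arange(3)[:, None]       # row index, as in the loop above
grid2['f1'] = np.random.uniform(-3.5, 3.5, (3, 3, 3, 3))

y = copy.deepcopy(grid2)
print(np.shares_memory(y, grid2))         # a real copy, no shared buffer
print(np.array_equal(y['f0'], grid2['f0']), np.array_equal(y['f1'], grid2['f1']))

# With dtype=object, .copy() duplicates the pointers, not the lists they point to.
grid1 = np.empty([3, 3], dtype=object)
for i in range(3):
    for j in range(3):
        grid1[i, j] = [i, np.zeros((3, 3))]
c = grid1.copy()
print(c[0, 0] is grid1[0, 0])
```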
