Build a new array from an existing one using a boolean mask - python

I have created a boolean mask, say mask, which I want to apply to an existing array, say old, to create an entirely new one, say new, which retains only the non-zero elements. The new array should then be smaller than old.
Can someone suggest the fastest and most concise way, if possible without using the numpy.append function?

Say you have:
import numpy as np
old = np.array([2,4,3,5,6])
mask = [True, False, True, False, False]
Simply do:
new = old[mask]
print(new)
[2 3]
I suggest you read about Boolean or “mask” index arrays

Just use logical indexing
x = x[x!=0]
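As a small sketch of the two answers above applied to the original goal (keeping only the non-zero elements): the sample values below are illustrative and not taken from the question. It assumes `old` is a NumPy array, builds the mask from a condition, and shows that boolean indexing returns a new, smaller array rather than modifying `old`.

```python
import numpy as np

# Illustrative data (assumed): an array with some zero entries.
old = np.array([2, 0, 3, 0, 0, 7])

# Build the mask from a condition, then index with it.
mask = old != 0
new = old[mask]

print(new)        # [2 3 7]
print(new.shape)  # (3,)

# Boolean indexing copies: changing `new` does not affect `old`.
new[0] = 99
print(old[0])     # 2
```

No call to numpy.append is needed, since the result is allocated once at its final size.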

Related

A boolean list from ranges of True values (start and end), without using a for loop

For example I have this list containing ranges.
x=[[1,4],
[6,7],
[9,9]]
where the first value of each item (e.g. [1,4]) is the start position (1) and, the second value is the end (4) position.
I want to convert this list of ranges into a boolean list, wherein the value is True if the position is between (any of) the ranges (i.e. the start and end positions) indicated in the list above, otherwise the value should be False.
[False, True, True, True, True, False, True, True, False, True]
This is obviously possible using a for loop. However, I am looking for other options that are one-liners. Ideally, I am looking for a way that could also be applied to a pandas Series.
Note: This is essentially an opposite problem of this question: Get ranges of True values (start and end) in a boolean list (without using a for loop)
A hopefully efficient way using numpy:
low, high = np.array(x).T[:,:, None] # rearrange the limits into a 3d array in a convenient shape
a = np.arange(high.max() + 1) # make a range from 0 to 9
print(((a >= low) & (a <= high)).any(axis=0))
An alternative that edits the array in a python loop:
result = np.zeros(np.array(x).max() + 1, dtype=bool)
for start, end in x:
    result[start:end+1] = True
This could be faster depending on the speed of editing a slice of an array relative to numpy 2d matrix comparisons.
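For comparison, here is a third loop-free variant that is not from either answer above: a difference-array (running count) approach. It marks +1 where a range opens and -1 just past where it closes, then takes a cumulative sum. This is only a sketch, assuming the ranges are inclusive and non-negative as in the question; the names `delta` and `starts`/`ends` are illustrative.

```python
import numpy as np

x = [[1, 4], [6, 7], [9, 9]]  # inclusive [start, end] ranges from the question

starts, ends = np.array(x).T

# +1 where a range opens, -1 just past where it closes.
# np.add.at accumulates correctly even if several ranges share an index.
delta = np.zeros(ends.max() + 2, dtype=int)
np.add.at(delta, starts, 1)
np.add.at(delta, ends + 1, -1)

# A position is covered while the running count of open ranges is positive.
result = np.cumsum(delta)[:-1] > 0
print(result.tolist())
# [False, True, True, True, True, False, True, True, False, True]
```

Unlike the broadcasting version, the memory used here grows with the length of the output rather than with the number of ranges times that length.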

Counting occurrences of specific True/False ordering in Numpy Array

I have a Numpy Array of True and False values like:
test = np.array([False, False, False, True, False, True, False, True, False,False, False, False, True, True, False, True])
I would like to know the number of times the following pattern (False, True, False) happens in the array. In the test above it will be 4 (counting the match that wraps around from the end of the array to the beginning). This is not the only pattern, but I assume that once I understand this code I can adapt it to the others.
Of course, I can loop over the array. If the first value is equal, compare the next and otherwise go to the next value in the loop. Like this:
totalTimes = 0
def swapToBegin(x):
    # wrap an index that runs past the end back to the start
    if x >= len(test):
        x -= len(test)
    return x
for i in range(len(test)):
    if test[i] == False:
        if test[swapToBegin(i+1)] == True:
            if test[swapToBegin(i+2)] == False:
                totalTimes += 1
However, since I need to do this many times, this code will be very slow. Small improvements can be made, since this was written very quickly to show what I need. But there must be a better solution.
Is there a better way to search for a pattern in an array? It does not need to combine the end and beginning of the array, since I would be able to do this afterwards. But if it can be included it would be nice.
You haven't given any details on how large test is, so for benchmarks of the methods I've used it has 1000 elements. The next important part is to actually profile the code. You can't say it's slow (or fast) until there are hard numbers to back it up. Your code runs in around 1.49ms on my computer.
You can often get improvements with numpy by removing python loops and replacing them with numpy functions.
So, rather than testing each element individually (lots of if conditions could slow things down) I've put it all into one array comparison, then used all to check that every element matches.
check = np.array([False, True, False])
sum([(test[i:i+3]==check).all() for i in range(len(test) - 2)])
Profiling this shows it running in 1.91ms.
That's actually a step backwards. So, what could be causing the slowdown? Well, array access using [] creates a new array object which could be part of it. A better approach may be to create one large array with the offsets, then use broadcasting to do the comparison.
sum((np.c_[test[:-2], test[1:-1], test[2:]] == check).all(1))
This time check is compared with each row of the array np.c_[test[:-2], test[1:-1], test[2:]]. The axis argument (1) of all is used so that only rows in which every element matches are counted. This runs in 40.1us. That's a huge improvement.
Of course, creating the array to broadcast is going to have a large cost in terms of copying elements over. Why not do the comparisons directly?
np.sum(np.all([test[i:len(test)-2+i]==v for i, v in enumerate(check)], 0))
This runs in 18.7us.
The last idea to speed things up is using as_strided. This is an advanced trick to alter the strides of an array to get the offset array without copying any data. It's usually not worth the effort, but I'm including it here just for fun.
sum((np.lib.stride_tricks.as_strided(test, (len(test) - len(check) + 1, len(check)), test.strides + (1, )) == check).all(1))
This also runs in around 40us. So, the extra effort doesn't add anything in this case.
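As a hedged alternative to calling as_strided by hand, newer NumPy versions (1.20 and later) provide `sliding_window_view`, which builds the same windowed view with bounds checking. The sketch below is not part of the answer above and has not been benchmarked against its timings; `count_pattern` and its `wrap` flag are hypothetical names introduced here. With `wrap=True` it also counts matches that cross from the end of the array to the start, which is what the loop in the question does.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

test = np.array([False, False, False, True, False, True, False, True,
                 False, False, False, False, True, True, False, True])
check = np.array([False, True, False])

def count_pattern(arr, pattern, wrap=False):
    """Count positions where `pattern` occurs in `arr` (hypothetical helper)."""
    if wrap:
        # Append the first len(pattern)-1 elements so windows can cross the end.
        arr = np.concatenate([arr, arr[:len(pattern) - 1]])
    # Each row of `windows` is one length-len(pattern) slice of `arr`.
    windows = sliding_window_view(arr, len(pattern))
    return int((windows == pattern).all(axis=1).sum())

print(count_pattern(test, check))             # 3
print(count_pattern(test, check, wrap=True))  # 4
```

The count of 4 stated in the question corresponds to the wrap-around case; without wrapping, the same array contains 3 matches.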
You can use an array containing [False, True, False] and search for this instead.
searchfor = np.array([False, True, False])

Count number of "True" values in boolean Tensor

I understand that tf.where will return the locations of True values, so that I could use the result's shape[0] to get the number of Trues.
However, when I try and use this, the dimension is unknown (which makes sense as it needs to be computed at runtime). So my question is, how can I access a dimension and use it in an operation like a sum?
For example:
myOtherTensor = tf.constant([[True, True], [False, True]])
myTensor = tf.where(myOtherTensor)
myTensor.get_shape() #=> [None, 2]
sum = 0
sum += myTensor.get_shape().as_list()[0] # Well defined at runtime but considered None until then.
You can cast the values to floats and compute the sum on them:
tf.reduce_sum(tf.cast(myOtherTensor, tf.float32))
Depending on your actual use case you can also compute sums per row/column if you specify the reduce dimensions of the call.
I think this is the easiest way to do it:
In [38]: myOtherTensor = tf.constant([[True, True], [False, True]])
In [39]: if_true = tf.count_nonzero(myOtherTensor)
In [40]: sess.run(if_true)
Out[40]: 3
Rafal's answer is almost certainly the simplest way to count the number of true elements in your tensor, but the other part of your question asked:
[H]ow can I access a dimension and use it in an operation like a sum?
To do this, you can use TensorFlow's shape-related operations, which act on the runtime value of the tensor. For example, tf.size(t) produces a scalar Tensor containing the number of elements in t, and tf.shape(t) produces a 1D Tensor containing the size of t in each dimension.
Using these operators, your program could also be written as:
myOtherTensor = tf.constant([[True, True], [False, True]])
myTensor = tf.where(myOtherTensor)
countTrue = tf.shape(myTensor)[0] # Size of `myTensor` in the 0th dimension.
sess = tf.Session()
sum = sess.run(countTrue)
There is a TensorFlow function for counting non-zero values: tf.count_nonzero. The function also accepts axis and keep_dims arguments.
Here is a simple example:
import numpy as np
import tensorflow as tf
a = tf.constant(np.random.random(100))
with tf.Session() as sess:
    print(sess.run(tf.count_nonzero(tf.greater(a, 0.5))))

Updating array values using two masks a[mask1][mask2]=value

Given an array and a mask, we can assign new values to the positions that are TRUE in the mask:
import numpy as np
a = np.array([1,2,3,4,5,6])
mask1 = (a==2) | (a==5)
a[mask1] = 100
print(a)
# [  1 100   3   4 100   6]
However, if we apply a second mask over the first one, we can access the values but we cannot modify them:
a = np.array([1,2,3,4,5,6])
mask1 = (a==2) | (a==5)
mask2 = (a[mask1]==2)
print(a[mask1][mask2])
# [2]
a[mask1][mask2] = 100
print(a)
# [1 2 3 4 5 6]
Why does it happen?
(Even if it seems a bizarre way to do this. Just out of curiosity)
This is probably because you mix getters and setters preventing backpropagation.
It's because you use mask1 as an indexer:
>>> mask1
array([False, True, False, False, True, False], dtype=bool)
Now, by setting a[mask1] = 100, you set all the elements where mask1 is True, resulting in
>>> a
array([ 1, 100, 3, 4, 100, 6])
Note that you have only called a "setter", so to speak, on a.
Now for a[mask1][mask2] = 100 you actually call both a getter and setter. Indeed you can write this as:
temp = a[mask1]  # getter
temp[mask2] = 100  # setter
As a result, you only set the value in temp, so the value is not "backpropagated", so to speak, to a itself. You should see temp as a copy (although internally it is definitely possible that a Python interpreter handles it differently).
Note: there can be circumstances where this chained assignment does work: if temp is, for instance, a view on the array, the assignment propagates back to a. This page for instance shows ways to return a view instead of a copy.
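To illustrate the view case mentioned in the note, here is a minimal sketch (the variable names `view` and `copy` are illustrative). It assumes NumPy's usual behaviour that basic slicing yields a view while indexing with an integer or boolean array yields a copy, and uses `np.shares_memory` to check which is which.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

view = a[1:5]            # basic slicing: a view onto a's data
copy = a[[1, 2, 3, 4]]   # integer-array indexing: a copy

print(np.shares_memory(a, view))  # True
print(np.shares_memory(a, copy))  # False

# Chained assignment reaches `a` only when the first step is a view.
a[1:5][view == 2] = 100
print(a.tolist())  # [1, 100, 3, 4, 5, 6]
```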
You are chaining advanced* indexing operations for the assignment, which prevents the value 100 being written back to the original array.
a[mask1] returns a new array with a copy of the original data. Writing a[mask1][mask2] = 100 means that this new array is indexed with mask2 and the value 100 assigned to it. This leaves a unchanged.
Simply viewing the items will appear to work fine because the values you pick out from the copy a[mask1] are the values you would want from the original array (although this is still inefficient as data is copied multiple times).
*advanced (or "fancy") indexing is triggered with a boolean array or an array of indices. It always returns a new array, unlike basic indexing which returns a view onto the original data (this is triggered, for example, by slicing).
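If the goal is to make the two-mask assignment actually modify the original array, one option is to combine both masks into a single index before assigning. The sketch below shows two ways of doing that; neither comes from the answers above, and the names `idx`, `combined`, `b` and `c` are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
mask1 = (a == 2) | (a == 5)
mask2 = a[mask1] == 2          # defined relative to the mask1 selection

# Option 1: translate mask2 back to integer positions in `a`.
idx = np.flatnonzero(mask1)[mask2]
b = a.copy()
b[idx] = 100

# Option 2: build one full-length boolean mask, then assign once.
combined = mask1.copy()
combined[mask1] = mask2        # single indexing step, so this writes in place
c = a.copy()
c[combined] = 100

print(b.tolist())  # [1, 100, 3, 4, 5, 6]
print(c.tolist())  # [1, 100, 3, 4, 5, 6]
```

Both options perform exactly one indexing operation on the left-hand side of the assignment, which is why the value is written to the target array rather than to a temporary copy.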

Numpy/Python: Array iteration without for-loop

So it's another n-dimensional array question:
I want to be able to compare each value in an n-dimensional array with its neighbours. For example, if a is a 2-dimensional array, I want to be able to check:
a[y][x]==a[y+1][x]
for all elements. So basically check all neighbours in all dimensions. Right now I'm doing it via:
for x in range(1,a.shape[0]-1):
    do.something(a[x])
The shape of the array is used, so that I don't run into an index out of range at the edges. So if I want to do something like this in n-D for all elements in the array, I do need n for-loops which seems to be untidy. Is there a way to do so via slicing? Something like a==a[:,-1,:] or am I understanding this fully wrong? And is there a way to tell a slice to stop at the end? Or would there be another idea of getting things to work in a totally other way? Masked arrays?
Greets Joni
Something like:
a = np.array([1,2,3,4,4,5])
a == np.roll(a,1)
which returns
array([False, False, False, False, True, False], dtype=bool)
You can specify an axis too for higher dimensions, though as others have said you'll need to handle the edges somehow as the values wrap around (as you can guess from the name)
For a fuller example in 2D:
# generate 2d data
a = np.array((np.random.rand(5,5)) * 10, dtype=np.uint8)
# check all neighbours
for ax in range(len(a.shape)):
    for i in [-1,1]:
        print(a == np.roll(a, i, axis=ax))
This might also be useful, this will compare each element to the following element, along axis=1. You can obviously adjust the axis or the distance. The trick is to make sure that both sides of the == operator have the same shape.
a[:, :-1, :] == a[:, 1:, :]
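The slicing idea above can be generalised to any number of dimensions by building the slice tuples programmatically. This is a sketch under the assumption that only the "next" neighbour along each axis is compared; `equal_to_next` is a hypothetical helper name, not a NumPy function. Because the two operands are trimmed rather than rolled, nothing wraps around at the edges.

```python
import numpy as np

def equal_to_next(a, axis):
    """Compare each element with its successor along `axis` (hypothetical helper).

    The result is one element shorter along `axis`, so nothing wraps around.
    """
    head = [slice(None)] * a.ndim
    tail = [slice(None)] * a.ndim
    head[axis] = slice(None, -1)   # all but the last entry along `axis`
    tail[axis] = slice(1, None)    # all but the first entry along `axis`
    return a[tuple(head)] == a[tuple(tail)]

a = np.array([[1, 1, 2],
              [1, 3, 2]])

for axis in range(a.ndim):
    print(axis, equal_to_next(a, axis).tolist())
# 0 [[True, False, True]]
# 1 [[True, False], [False, False]]
```

A single loop over the axes replaces the n nested for-loops, and both slices are views, so no data is copied before the comparison.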
How about just:
np.diff(a) != 0
?
If you need the neighbours along another axis, you can pass the axis argument to np.diff, or diff the result of np.swapaxes(a, 0, 1), and then merge the results together somehow.
