find stretches of Trues in numpy array - python

Is there a good way to find stretches of Trues in a numpy boolean array? If I have an array like:
x = numpy.array([True,True,False,True,True,False,False])
Can I get an array of indices like:
starts = [0,3]
ends = [1,4]
or any other appropriate way to store this information. I know this can be done with some complicated while loops, but I'm looking for a better way.

You can pad x with Falses (one at the beginning and one at the end), and use np.diff. A "diff" of 1 means transition from False to True, and of -1 means transition from True to False.
The usual convention is to represent a range's end as the index one past its last element. This example follows that convention (you can easily use ends-1 instead of ends to get the arrays from your question):
import numpy as np

x1 = np.hstack([[False], x, [False]])   # pad with a False on each side
d = np.diff(x1.astype(int))             # 1 marks False->True, -1 marks True->False
starts = np.where(d == 1)[0]
ends = np.where(d == -1)[0]
starts, ends
=> (array([0, 3]), array([2, 5]))
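For example, to recover the inclusive end indices from the question, you only need to subtract one (a small follow-up, not part of the original answer):
ends_inclusive = ends - 1   # array([1, 4]), matching the question's ends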

Related

How to identify / find a 1D numpy array pattern in a 2D numpy array?

I have a 2D numpy array, and am trying to find the entries where it is equal to a 1D array, but the dimensions of these two arrays prohibit broadcasting. Specifically, my 2D array is about 300x400, and I want to see where it's equal to a 2-element row vector [1, -1].
I am trying to find the location of pixels in an image that are on the border of segmentations. This is denoted in this mask by a 1 being adjacent to the foreground and -1 to the background. So I need to find locations where [1,-1] occurs in the rows of the mask, let's say a.
I have tried a == [1,-1] but this just performs object-level equality and returns False.
I guess I could do this with
for i in range(a.shape[0]):
    for j in range(a.shape[1]-1):
        if a[i,j] == 1:
            if a[i,j+1] == -1:
                print(i)
but is there not some cute way to do this with a numpy method or something? I hate loops
arr = np.array([[1,1,-1],[1,-1,-1]])
arr_idx = (arr==1)[:,:-1] & (arr==-1)[:,1:]
Gives
>>> arr_idx
array([[False,  True],
       [ True, False]])
This is a boolean mask marking the positions that meet your criteria. Note that it has one fewer column than your input matrix (for obvious reasons).
You can add a column on one side or the other to change the indexing to either side of the pair you're looking for.
arr_idx = np.concatenate((np.zeros(shape=(2, 1), dtype=bool), arr_idx), axis=1)
>>> arr_idx
array([[False, False,  True],
       [False,  True, False]])
Sticking a new column on the left gives the index for the -1 component of the pair.
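If you'd rather have explicit (row, column) coordinates than a boolean mask, np.argwhere will convert it for you (this extra step is my addition, not part of the original answer):
>>> np.argwhere(arr_idx)
array([[0, 2],
       [1, 1]])
Each row gives the position of the -1 half of a matching pair, using the left-padded arr_idx from above.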
You can use a 2d cross-correlation for this.
There is a function in scipy for this: signal.correlate2d()
import numpy as np
from scipy import signal
arr = np.array([[1, 1, -1],
                [1, -1, -1]])
# the pattern you are looking for, has to be 2d
krn = np.array([[1,-1]])
res = signal.correlate2d(arr, krn, mode='same')
print(res)
The result is
[[ 0  2 -1]
 [ 2  0 -1]]
The higher the value, the better the match. In your case, the value 2 marks the positions where your pattern is found.
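To turn that into indices, you can look for the maximum possible correlation value, which for this kernel is 2 (and, since the mask only contains 1, -1 and possibly 0, only an exact [1, -1] pair can reach it). This extraction step is my addition, not part of the original answer:
matches = np.argwhere(res == 2)
print(matches)   # [[0 1]
                 #  [1 0]]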
Check for a 1 starting from the first column (drop the last one, since no -1 can come after it):
tmp1 = a[:,0:-1]
t1 = tmp1==1
t1 is True wherever the shifted a is 1
Check for -1 from the second column up to the last:
tmp2 = a[:,1:]
t2 = tmp2 ==-1
t1 and t2 contain True and False (shifted according to your goal)
t1*t2 will give you True at the positions where your condition is fulfilled. Summing over the rows gives you a number above zero at every row index i where the pattern occurs:
res = np.sum(t1*t2,axis=1)
desired_Indices = np.where(res>0)
... everybody hates loops ;)
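Putting those fragments together on the example array from above (a consolidated sketch of this answer, not code its author posted):
import numpy as np

a = np.array([[1, 1, -1],
              [1, -1, -1]])
t1 = a[:, :-1] == 1              # shifted check for 1
t2 = a[:, 1:] == -1              # shifted check for -1
res = np.sum(t1 * t2, axis=1)    # per-row count of [1, -1] pairs
desired_Indices = np.where(res > 0)
print(desired_Indices)           # (array([0, 1]),)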

finding the occurrence of vector v (1,k) inside a matrix M (m,k)

I want to find the number of occurrences of vector v in matrix M.
What I have is a matrix of size (60K, 10),
and I initialised a test vector v (1,10):
tester = np.zeros((1, 10))
Now I want to check how many times that vector appears, in its entirety, among the matrix rows.
I did it iteratively and it works, but since the matrix is very large this hurts performance, and I'm trying to find a more elegant and faster way.
I would appreciate some help.
Thanks.
you can do the following:
temp = np.where((prediction == tester).all(axis=1))
len(temp[0])
When np.where() is given only a condition (without the x and y value arguments), it returns the indices where the condition is True, so here it returns the row indices of the full matches.
Using this will surely lower your running time, and to me it's much more elegant than looping through the matrix.
You can check the np.where API:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html
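As a quick illustration on small made-up data (the names prediction and tester mirror the answer; the array contents here are just an example):
import numpy as np

prediction = np.array([[0]*10, [1]*10, [0]*10])   # stand-in for the (60K, 10) matrix
tester = np.zeros((1, 10))
temp = np.where((prediction == tester).all(axis=1))
print(len(temp[0]))   # 2 -> tester occurs in two of the three rows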
Just compare and use all, so each row will result in a True value only if all its elements compare equal to the reference array. Then, you can simply sum the result, since int(True) == 1.
Example:
np.random.seed(0)
data = np.random.randint(0, 2, size=(50, 3))
to_match = np.random.randint(0, 2, size=(1, 3))
print(to_match)
print((data == to_match).all(axis=1).sum())
Output:
[[0 0 0]]
4
...which means that there are 4 instances of [0, 0, 0] in data.

Evaluating a function using numpy

What is the significance of the return part when evaluating functions? Why is this necessary?
Your assumption is right: dfdx[0] is indeed the first value in that array, so according to your code that would correspond to evaluating the derivative at x=-1.0.
To know the correct index where x is equal to 0, you will have to look for it in the x array.
One way to find this is the following, where we find the index of the value where |x-0| is minimal (so essentially where x=0, but float arithmetic requires taking some precautions) using argmin:
index0 = np.argmin(np.abs(x - 0))
And we then get what we want, dfdx at the index where x is 0:
print(dfdx[index0])
Another way, less robust with respect to float-arithmetic trickery, is the following:
# we make a boolean array that is True where x is zero and False everywhere else
bool_array = (x == 0)
# Numpy allows using a boolean array to index another array.
# Doing so gives you all the values of dfdx where bool_array is True,
# which in our case will hopefully be dfdx where x=0
print(dfdx[bool_array])
# same thing as a one-liner
print(dfdx[x == 0])
You give the answer yourself: x[0] is -1.0, and you want the value at the middle of the array. np.linspace is the right function to build such a series of values:
import math
import numpy as np

def f1(x):
    g = np.sin(math.pi*np.exp(-x))
    return g

n = 1001                     # odd!
x = np.linspace(-1, 1, n)    # x[n//2] is 0
f1x = f1(x)
df1 = np.diff(f1(x), 1)
dx = np.diff(x)
# analytic derivative; discard the last element so it matches the length of df1/dx
df1dx = (-math.pi*np.exp(-x)*np.cos(math.pi*np.exp(-x)))[:-1]
# sanity check against the finite difference: a forward difference is only
# first-order accurate, so allow a tolerance on the order of dx * max|f''|
# np.allclose(df1/dx, df1dx, atol=0.1)
As another tip, numpy arrays are used more efficiently and readably without explicit loops.

To sum arrays and get the final large value of pixel == 1, others values == 0

I am new to python syntax.
I want to sum 5 arrays (boolean, with pixel values of 1 or 0) and, in the result, set the pixels whose sum reaches the maximum value (5) to 1 and all other values to 0.
resArraySum = np.array(5, (first | second | third | fourth | fifth), 1, 0)
print resArraySum
This is not correct.
If I understood properly, you want to add 5 arrays and then create a mask where the pixels sum to 5. I see some misconceptions here that have nothing to do with python and its syntax.
First, booleans in python are written True and False; 0 and 1 are just integers with no boolean property (not entirely true if you compare them bitwise, but let's leave that for now).
Second, what you are looking for is just the logical and operation; summing 5 arrays of 0s and 1s and then finding the indices that sum to 5 is overkill, since you could directly compare them logically.
The pseudo-code would look like this:
For small number of masks:
result = bool_mask[0] & bool_mask[1] & bool_mask[2] & bool_mask[3] & bool_mask[4]
For a large number of masks:
# bool_mask = [a, b, c, d, e, f, ...]  a list of boolean arrays
result = bool_mask[0]            # Equivalent to:
for mask in bool_mask[1:]:       # for i in range(1, len(bool_mask)):
    result &= mask               #     result &= bool_mask[i]
Here bool_mask is a list containing the N boolean arrays that you want to combine.
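As a side note (my addition, not part of the original answer), numpy can also fold any number of equally-shaped masks in a single call, which gives the same result as the loop above:
result = np.logical_and.reduce(bool_mask)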
So, the first thing you have to do is to properly create a boolean array. With numpy you can do that in a variety of ways:
Creating it:
>>> A = np.array([1, 0, 1], dtype=bool) # will cast 1 to True and 0 to False
>>> A
array([ True, False, True], dtype=bool)
Converting it:
>>> A = np.array([1, 0, 1], dtype=int) # An integer array with 0 and 1
>>> A = A.astype(bool)
>>> A
array([ True, False, True], dtype=bool)
Comparing it:
>>> A = np.array([1, 0, 1], dtype=int) # Same int array
>>> A = A > 0.5
>>> A
array([ True, False, True], dtype=bool)
Once your boolean arrays are truly boolean, any of the pseudo-code above (which is in fact real working code) will work just fine.
Afterwards, you will end up with a boolean array result with True values where all the masks intersect and False elsewhere. If you really want to go back to 0 and 1 values, you can just cast the result down:
result = result.astype(int)
And True will be mapped to 1 while False to 0. Other ways of doing similar things would be:
result = result * 1
Any kind of numerical operation in numpy will cast a boolean array to integers.
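For instance, a quick illustration of that casting:
>>> np.array([True, False]) * 1
array([1, 0])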
Now, you will find (as mentioned above) that although 0/1 integer arrays are not boolean arrays, the same & operator performs bit-wise comparisons that end up behaving like boolean logic (as long as the arrays only contain 1 and 0 values). Thus,
result = a & b & c & d
would work for both integer (with only 1 and 0) and boolean arrays. However, I would suggest using boolean arrays wherever possible, as many interesting features of numpy, such as boolean indexing, only work if the arrays are truly boolean (an integer array of 0s and 1s behaves very differently from what you might expect).
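To see why that last point matters, here is a small illustration (my addition, not from the original answer) of boolean indexing versus integer indexing with the same 0/1 values:
>>> a = np.array([10, 20, 30])
>>> a[np.array([True, False, True])]   # boolean mask: keeps the marked elements
array([10, 30])
>>> a[np.array([1, 0, 1])]             # integer indices: picks positions 1, 0, 1
array([20, 10, 20])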

Walk through each column in a numpy matrix efficiently in Python

I have a very big two-dimensional array in Python, using the numpy library. I want to walk through each column efficiently and check whether the elements are different from 0, in order to count their number in every column.
Suppose I have the following matrix.
M = array([[1,2], [3,4]])
The following code enables us to walk through each row efficiently, for example (it is not what I intend to do of course!):
for row_idx, row in enumerate(M):
    print("row_idx", row_idx, "row", row)
    for col_idx, element in enumerate(row):
        print("col_idx", col_idx, "element", element)
        # update the matrix M: square each element
        M[row_idx, col_idx] = element ** 2
However, in my case I want to walk through each column efficiently, since I have a very big matrix.
I've heard that there is a very efficient way to achieve this using numpy, instead of my current code:
curr_col, curr_row = 0, 0
while curr_col < numb_colonnes:
    result = 0
    while curr_row < numb_rows:
        # If different from 0
        if M[curr_row][curr_col] != 0:
            result += 1
        curr_row += 1
    # .... using result value ...
    curr_col += 1
    curr_row = 0
Thanks in advance!
In the code you showed us, you treat numpy's arrays as lists and, as you can see, it works! But arrays are not lists, and while you can treat them as such, doing so gives up most of the point of using arrays, or even numpy, at all.
To really exploit the usefulness of numpy you have to operate directly on arrays, writing, e.g.,
M = M*M
when you want to square the elements of an array and using the rich set of numpy functions to operate directly on arrays.
That said, I'll try to get a bit closer to your problem...
If your intent is to count the elements of an array that are different from zero, you can use the numpy function sum.
Using sum, you can obtain the sum of all the elements in an array, or you can sum across a particular axis.
import numpy as np
a = np.array(((3,4),(5,6)))
print(np.sum(a))          # 18
print(np.sum(a, axis=0))  # [ 8 10]
print(np.sum(a, axis=1))  # [ 7 11]
Now you are protesting: I don't want to sum the elements, I want to count the non-zero elements... but
if you write a logical test on an array, you obtain an array of booleans, e.g., to test which elements of a are even:
print(a % 2 == 0)
# [[False  True]
#  [False  True]]
False is zero and True is one, at least when we sum it...
print(np.sum(a % 2 == 0))  # 2
or, if you want to sum over a column, i.e., the index that changes is the 0-th
print(np.sum(a % 2 == 0, axis=0))  # [0 2]
or sum across a row
print(np.sum(a % 2 == 0, axis=1))  # [1 1]
To summarize, for your particular use case
by_col = np.sum(M!=0, axis=0)
# use the counts of non-zero terms in each column, stored in an array
...
# if you need the grand total, use sum again
total = np.sum(by_col)
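For what it's worth (my addition, not part of the original answer), recent numpy versions also expose a dedicated helper, np.count_nonzero, which accepts an axis argument and gives the same per-column counts directly:
by_col = np.count_nonzero(M, axis=0)   # same as np.sum(M != 0, axis=0)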
