Summing along a function in a 2D array - python

Is it possible to place a linear function over a 2D array and sum all the elements in the 2D array that coincide with the function? So for example, I would have a 2D array shaped say (400, 500). Now somewhere, I would overlap a linear function which stretches from the bottom of the 2D array towards the top. I now only want to sum the elements of the 2D array that overlap with the linear line.
Is there a fast way to sum only the elements of the 2D array that coincide with the line? I have been able to do so by using a for loop within a for loop, but this already takes quite some time, especially if I want to apply this trick to even larger arrays.

It depends somewhat on how exactly you define the line, and what array locations count as being "on" the line. But one simple approach would be to use a boolean mask. You can define a mask along a line pretty easily using numpy.mgrid:
>>> grid = numpy.mgrid[0:5,0:5]
>>> grid
array([[[0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1],
        [2, 2, 2, 2, 2],
        [3, 3, 3, 3, 3],
        [4, 4, 4, 4, 4]],

       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]]])
As you can see, this is a grid of x and y values, and you can then relate them in an equation like so:
>>> grid[0] == 2 * grid[1]
array([[ True, False, False, False, False],
       [False, False, False, False, False],
       [False,  True, False, False, False],
       [False, False, False, False, False],
       [False, False,  True, False, False]], dtype=bool)
>>> grid[0] == grid[1]
array([[ True, False, False, False, False],
       [False,  True, False, False, False],
       [False, False,  True, False, False],
       [False, False, False,  True, False],
       [False, False, False, False,  True]], dtype=bool)
>>> grid[0] == grid[1] // 2
array([[ True,  True, False, False, False],
       [False, False,  True,  True, False],
       [False, False, False, False,  True],
       [False, False, False, False, False],
       [False, False, False, False, False]], dtype=bool)
Note that you'll probably have to do some careful thinking about why grid[0] == grid[1] // 2 gives a "continuous" line, while grid[0] == 2 * grid[1] does not, and figure out precisely what behavior you want. (With a slightly more complex equation, you could specify a tolerance value that would allow you to create lines of different thicknesses.)
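For instance, a rough sketch of that tolerance idea (the threshold of 1 is just an illustration, not something from the question):
# Treat a cell as "on" the line grid[0] == 2 * grid[1] when it lies within a
# small distance of it; a larger threshold gives a thicker line.
thick_mask = numpy.abs(grid[0] - 2 * grid[1]) <= 1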
Then you can use the resulting mask to do a sum:
>>> a = numpy.arange(25).reshape(5, 5)
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
>>> a[grid[0] == grid[1] // 2]
array([ 0,  1,  7,  8, 14])
>>> a[grid[0] == grid[1] // 2].sum()
30
This is a lot faster than nested Python for loops because the looping happens inside numpy's compiled code, but it still performs the same number of comparisons.
Another approach is to calculate the y values directly from the x values. The advantage is that it performs far fewer operations, so it will be faster for very large arrays:
>>> x = numpy.arange(5)
>>> y = x * 2
>>> valid_indices = (x < 5) & (y < 5)
>>> a[x[valid_indices], y[valid_indices]]
array([ 0, 7, 14])
Then just use .sum(). To show what the line looks like now:
>>> a[x[valid_indices], y[valid_indices]] = -1
>>> a
array([[-1,  1,  2,  3,  4],
       [ 5,  6, -1,  8,  9],
       [10, 11, 12, 13, -1],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
Again, as you can see, there are gaps in the line; if you want to remove those gaps you'll have to change it up a bit by "stretching" the x or y values. Here's a function that does so using a simple slope-intercept specification. You'll still need to add special-casing for 0 and infinite slopes, but this does a lot of the necessary work, and produces a nice, smooth line in all the cases I tested it against:
def linear_index(slope, intercept, x_range, y_range):
    if numpy.abs(slope) < 1:
        # For shallow slopes, swap the roles of x and y so the line stays gap-free.
        intercept = -intercept / slope
        slope = 1 / slope
        y, x = linear_index(slope, intercept, y_range, x_range)
        return x, y
    x_min, x_max = x_range
    y_min, y_max = y_range
    # Oversample x so that consecutive y values never skip a row.
    x = numpy.linspace(x_min, x_max - 1, int((x_max - x_min - 1) * abs(slope)) + 1)
    y = x * slope + intercept
    valid_indices = (y >= y_min) & (y < y_max)
    return x[valid_indices].astype(int), y[valid_indices].astype(int)
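For instance, a quick usage sketch of my own (not part of the original answer), re-creating the 5x5 array from earlier since it was modified above:
a = numpy.arange(25).reshape(5, 5)
rows, cols = linear_index(2, 0, (0, 5), (0, 5))
# rows -> [0, 0, 1, 1, 2], cols -> [0, 1, 2, 3, 4]: a gap-free line of slope 2
a[rows, cols].sum()   # 0 + 1 + 7 + 8 + 14 == 30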
Given an adequate understanding of the Hough Transform, you should have no problem converting an (r, theta) pair into slope-intercept form. Still, if you need a thicker line, this might not be the best approach.
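For reference, a minimal sketch of that conversion (my addition, assuming the common parameterization r = x*cos(theta) + y*sin(theta); vertical lines, where sin(theta) == 0, still need the special-casing mentioned above):
def hough_to_slope_intercept(r, theta):
    # From x*cos(theta) + y*sin(theta) = r, solve for y as a function of x.
    slope = -numpy.cos(theta) / numpy.sin(theta)
    intercept = r / numpy.sin(theta)
    return slope, intercept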

Related

Histogramming boolean numpy arrays by giving each array a unique label

I have a large sample (M) of boolean arrays of length 'N'. So there are 2^N unique boolean arrays possible.
I would like to know how many arrays are duplicates of each other and create a histogram.
One possibility is to create a unique integer (a[0] + a[1]*2 + a[2]*4 + ... + a[N-1]*2^(N-1)) for each unique array and histogram that integer.
But this is going to be O(M*N). What is the best way to do this?
numpy.ravel_multi_index is able to do this for you:
arr = np.array([[True, True, True],
                [True, False, True],
                [True, False, True],
                [False, False, False],
                [True, False, True]], dtype=int)
nums = np.ravel_multi_index(arr.T, (2,) * arr.shape[1])
>>> nums
array([7, 5, 5, 0, 5], dtype=int64)
And since you need a histogram, use
>>> np.histogram(nums, bins=np.arange(2**arr.shape[1]+1))
(array([1, 0, 0, 0, 0, 3, 0, 1], dtype=int64),
 array([0, 1, 2, 3, 4, 5, 6, 7, 8]))
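If all you need are the counts, np.bincount is arguably a more direct fit here (a small sketch of mine, not from the original answer):
counts = np.bincount(nums, minlength=2 ** arr.shape[1])
# counts -> [1, 0, 0, 0, 0, 3, 0, 1], one entry per possible key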
Another option is to use np.unique:
>>> np.unique(arr, return_counts=True, axis=0)
(array([[0, 0, 0],
        [1, 0, 1],
        [1, 1, 1]]),
 array([1, 3, 1], dtype=int64))
With vectorized operations, building the key is much faster than computing a[0] + a[1]*2 + a[2]*4 + ... + a[N-1]*2^(N-1) element by element. That said, I don't think there is a fundamentally better solution: in any case you need to read each value at least once, and that requires M*N steps.
N = 3
M = 5
sample = [
    np.array([True, True, True]),
    np.array([True, False, True]),
    np.array([True, False, True]),
    np.array([False, False, False]),
    np.array([True, False, True]),
]
multipliers = [2 << i for i in range(N - 2, -1, -1)] + [1]   # [4, 2, 1] for N = 3

buckets = {}
buck2vec = {}
for s in sample:
    key = sum(s * multipliers)
    if key not in buckets:
        buckets[key] = 0
        buck2vec[key] = s
    buckets[key] += 1

for key in buckets:
    print(f"{buck2vec[key]} -> {buckets[key]} occurrence(s)")
Results:
[False False False] -> 1 occurrence(s)
[ True False  True] -> 3 occurrence(s)
[ True  True  True] -> 1 occurrence(s)
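For what it's worth, the key construction can also be done for all M samples at once with a single matrix product; a sketch under the same N = 3, powers-of-two encoding (the variable names here are mine):
arr2d = np.stack(sample).astype(int)        # shape (M, N)
weights = 1 << np.arange(N - 1, -1, -1)     # [4, 2, 1]
keys = arr2d @ weights                      # [7, 5, 5, 0, 5]
vals, counts = np.unique(keys, return_counts=True)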

is there a method for finding the indexes of a 2d-array based on a given array

suppose we have two arrays like these two:
A=np.array([[1, 4, 3, 0, 5],[6, 0, 7, 12, 11],[20, 15, 34, 45, 56]])
B=np.array([[4, 5, 6, 7]])
I intend to write a code in which I can find the indexes of an array such as A based on values in
the array B
for example, I want the final results to be something like this:
C=[[0 1]
   [0 4]
   [1 0]
   [1 2]]
can anybody provide me with a solution or a hint?
Do you mean?
In [375]: np.isin(A,B[0])
Out[375]:
array([[False,  True, False, False,  True],
       [ True, False,  True, False, False],
       [False, False, False, False, False]])
In [376]: np.argwhere(np.isin(A,B[0]))
Out[376]:
array([[0, 1],
       [0, 4],
       [1, 0],
       [1, 2]])
B has shape (1, 4), where the leading 1 isn't necessary. That's why I used B[0], though isin (via in1d) ravels it anyway.
The where result is often more useful:
In [381]: np.where(np.isin(A,B))
Out[381]: (array([0, 0, 1, 1]), array([1, 4, 0, 2]))
though it's a bit harder to understand.
Another way to get the isin array:
In [383]: (A==B[0,:,None,None]).any(axis=0)
Out[383]:
array([[False,  True, False, False,  True],
       [ True, False,  True, False, False],
       [False, False, False, False, False]])
You can try it this way, using np.where():
index = []
for num in B:
    for nums in num:
        # assumes each value of B occurs exactly once in A
        x, y = np.where(A == nums)
        index.append([int(x[0]), int(y[0])])
print(index)
Output:
[[0, 1], [0, 4], [1, 0], [1, 2]]
With zip and np.where:
>>> list(zip(*np.where(np.in1d(A, B).reshape(A.shape))))
[(0, 1), (0, 4), (1, 0), (1, 2)]
Alternatively:
>>> np.vstack(np.where(np.isin(A,B))).transpose()
array([[0, 1],
       [0, 4],
       [1, 0],
       [1, 2]], dtype=int64)

Identifying all consecutive positive triplets in a 1D numpy array

Consider the 1D array arr shown below, and assume n = 3.
I want to identify all 'islands' holding >= n consecutive positive values.
The following code successfully finds the FIRST set of 3 consecutive positive numbers by determining its initial index, but it does not find all such sets.
import numpy as np
arr = np.array([1, -1, 5, 6, 3, -4, 2, 5, 9, 2, 1, -6, 8])
def find_consec_pos(arr, n):
    mask = np.convolve(np.greater(arr, 0), np.ones(n, dtype=int)) >= n
    if mask.any():
        return mask.argmax() - n + 1
    else:
        return None
find_consec_pos(arr, 3)
This gives output 2, the index of the 1st triplet of consecutive positive values.
I want to know how to modify the code to get the output [2, 6, 7, 8], identifying all consecutive positive triples.
This code does the job and is simple while being relatively efficient:
positive = arr > 0
np.where(positive[:-2] & positive[1:-1] & positive[2:])
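The same idea generalizes to an arbitrary window size n by AND-ing n shifted slices (a sketch of mine, not from the original answer):
def find_consec_pos(arr, n):
    positive = arr > 0
    # one slice per offset within the window, all of equal length
    windows = [positive[i:len(positive) - n + 1 + i] for i in range(n)]
    return np.where(np.logical_and.reduce(windows))[0].tolist()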
You could use sliding_window_view:
In [1]: from numpy.lib.stride_tricks import sliding_window_view
In [2]: sliding_window_view(arr, 3) > 0
Out[2]:
array([[ True, False,  True],
       [False,  True,  True],
       [ True,  True,  True],
       [ True,  True, False],
       [ True, False,  True],
       [False,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True, False],
       [ True, False,  True]])
Turning this into your desired function (and assuming you want a list as output):
def find_consec_pos(arr, n):
    all_n_positive = np.all(sliding_window_view(arr > 0, n), axis=1)
    return np.argwhere(all_n_positive).flatten().tolist()
Demo of some different "window" sizes:
In [4]: arr
Out[4]: array([ 1, -1, 5, 6, 3, -4, 2, 5, 9, 2, 1, -6, 8])
In [5]: find_consec_pos(arr, 3)
Out[5]: [2, 6, 7, 8]
In [6]: find_consec_pos(arr, 4)
Out[6]: [6, 7]
In [7]: find_consec_pos(arr, 2)
Out[7]: [2, 3, 6, 7, 8, 9]

Numpy element-wise in operation

Suppose I have a column vector y with length n, and I have a matrix X of size n*m. I want to check for each element i in y, whether the element is in the corresponding row in X. What is the most efficient way of doing this?
For example:
y = [1,2,3,4].T
and
X =[[1, 2, 3],[3, 4, 5],[4, 3, 2],[2, 2, 2]]
Then the output should be
[1, 0, 1, 0] or [True, False, True, False]
whichever is easier.
Of course we can use a for loop to iterate through both y and X, but is there any more efficient way of doing this?
Vectorized approach using broadcasting -
((X == y[:,None]).any(1)).astype(int)
Sample run -
In [41]: X # Input 1
Out[41]:
array([[1, 2, 3],
       [3, 4, 5],
       [4, 3, 2],
       [2, 2, 2]])
In [42]: y # Input 2
Out[42]: array([1, 2, 3, 4])
In [43]: X == y[:,None] # Broadcasted comparison
Out[43]:
array([[ True, False, False],
[False, False, False],
[False, True, False],
[False, False, False]], dtype=bool)
In [44]: (X == y[:,None]).any(1) # Check for any match along each row
Out[44]: array([ True, False, True, False], dtype=bool)
In [45]: ((X == y[:,None]).any(1)).astype(int) # Convert to 1s and 0s
Out[45]: array([1, 0, 1, 0])
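As a self-contained version of the same idea (my sketch, with the reshaping of y that makes the broadcast work spelled out):
import numpy as np

X = np.array([[1, 2, 3], [3, 4, 5], [4, 3, 2], [2, 2, 2]])
y = np.array([1, 2, 3, 4])

# y[:, None] has shape (4, 1), so the comparison broadcasts against X's (4, 3)
result = (X == y[:, None]).any(axis=1).astype(int)   # -> [1, 0, 1, 0]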

Using numpy.where() to iterate through a matrix

There's something about numpy.where() I do not understand:
Let's say I have a 2D numpy ndarray:
import numpy as np
twodim = np.array([[1, 2, 3, 4], [1, 6, 7, 8], [1, 1, 1, 12], [17, 3, 15, 16], [17, 3, 18, 18]])
Now, I would like to create a function which "checks" this numpy array for a variety of conditions.
array([[ 1,  2,  3,  4],
       [ 1,  6,  7,  8],
       [ 1,  1,  1, 12],
       [17,  3, 15, 16],
       [17,  3, 18, 18]])
For example, which entries in this array have (A) even numbers (B) greater than 7 (C) divisible by 3?
I would like to use numpy.where() for this, and iterate through each entry of this array, finally finding the elements which match all conditions (if such an entry exists):
even_entries = np.where(twodim % 2 == 0)
greater_seven = np.where(twodim > 7 )
divisible_three = np.where(twodim % 3 == 0)
How does one do this? I am not sure how to iterate through Booleans...
I could access the indices of the matrix (i,j) via
np.argwhere(even_entries)
We could do something like
import numpy as np
twodim = np.array([[1, 2, 3, 4], [1, 6, 7, 8], [1, 1, 1, 12], [17, 3, 15, 16], [17, 3, 18, 18]])
even_entries = np.where(twodim % 2 == 0)
greater_seven = np.where(twodim > 7)
divisible_three = np.where(twodim % 3 == 0)

for row in even_entries:
    for item in row:
        if item:  # equivalent to `if item == True`
            for row in greater_seven:
                for item in row:
                    if item:  # equivalent to `if item == True`
                        for row in divisible_three:
                            for item in row:
                                if item:  # equivalent to `if item == True`
                                    pass  # something like print(np.argwhere())
Any advice?
EDIT1: Great ideas below. As #hpaulj mentions "Your tests produce a boolean matrix of the same shape as twodim"
This is a problem I'm running into as I toy around: not all conditionals produce matrices the same shape as my starting matrix. For instance, let's say I'm comparing whether an array element has a matching element to its left or right (i.e. horizontally):
twodim[:, :-1] == twodim[:, 1:]
That results in a (5,3) Boolean array, whereas our original matrix is a (5,4) array
array([[False, False, False],
       [False, False, False],
       [ True,  True, False],
       [False, False, False],
       [False, False,  True]], dtype=bool)
If we do the same vertically, that results in a (4,4) Boolean array, whereas the original matrix is (5,4)
twodim[:-1] == twodim[1:]
array([[ True, False, False, False],
       [ True, False, False, False],
       [False, False, False, False],
       [ True,  True, False, False]], dtype=bool)
If we wished to know which entries have both vertical and horizontal pairs, it is non-trivial to figure out which dimension we are in.
Your tests produce a boolean matrix of the same shape as twodim:
In [487]: mask3 = twodim%3==0
In [488]: mask3
Out[488]:
array([[False, False,  True, False],
       [False,  True, False, False],
       [False, False, False,  True],
       [False,  True,  True, False],
       [False,  True,  True,  True]], dtype=bool)
As other answers noted, you can combine tests logically with & (and) and | (or).
np.where is the same as np.nonzero (in this use), and just returns the coordinates of the True values - as a tuple of 2 arrays.
In [489]: np.nonzero(mask3)
Out[489]:
(array([0, 1, 2, 3, 3, 4, 4, 4], dtype=int32),
array([2, 1, 3, 1, 2, 1, 2, 3], dtype=int32))
argwhere returns the same values, but as a transposed 2d array.
In [490]: np.argwhere(mask3)
Out[490]:
array([[0, 2],
       [1, 1],
       [2, 3],
       [3, 1],
       [3, 2],
       [4, 1],
       [4, 2],
       [4, 3]], dtype=int32)
Both the mask and tuple can be used to index your array directly:
In [494]: twodim[mask3]
Out[494]: array([ 3, 6, 12, 3, 15, 3, 18, 18])
In [495]: twodim[np.nonzero(mask3)]
Out[495]: array([ 3, 6, 12, 3, 15, 3, 18, 18])
The argwhere can't be used directly for indexing, but may be more suitable for iteration, especially if you want the indexes as well as the values:
In [496]: for i,j in np.argwhere(mask3):
   .....:     print(i,j,twodim[i,j])
   .....:
0 2 3
1 1 6
2 3 12
3 1 3
3 2 15
4 1 3
4 2 18
4 3 18
The same thing with where requires a zip:
for i,j in zip(*np.nonzero(mask3)): print(i,j,twodim[i,j])
BUT in general in numpy we try to avoid iteration. If you can use twodim[mask] directly your code will be much faster.
Logical combinations of the boolean masks are easier to produce than combinations of the where indices. To use the indices I'd probably resort to set operations (union, intersect, difference).
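For example, a sketch of that set-based route (mine, not from the answer): intersect two sets of (row, col) coordinates obtained from argwhere.
even_idx = set(map(tuple, np.argwhere(twodim % 2 == 0)))
div3_idx = set(map(tuple, np.argwhere(twodim % 3 == 0)))
both = sorted(even_idx & div3_idx)   # coordinates passing both tests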
As for a reduced size test, you have to decide how that maps on to the original array (and other tests). e.g.
A (5,3) mask (difference between columns):
In [505]: dmask=np.diff(twodim, 1).astype(bool)
In [506]: dmask
Out[506]:
array([[ True,  True,  True],
       [ True,  True,  True],
       [False, False,  True],
       [ True,  True,  True],
       [ True,  True, False]], dtype=bool)
It can index 3 columns of the original array
In [507]: twodim[:,:-1][dmask]
Out[507]: array([ 1, 2, 3, 1, 6, 7, 1, 17, 3, 15, 17, 3])
In [508]: twodim[:,1:][dmask]
Out[508]: array([ 2, 3, 4, 6, 7, 8, 12, 3, 15, 16, 3, 18])
It can also be combined with 3 columns of another mask:
In [509]: dmask & mask3[:,:-1]
Out[509]:
array([[False, False,  True],
       [False,  True, False],
       [False, False, False],
       [False,  True,  True],
       [False,  True, False]], dtype=bool)
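If you instead want a full-shape mask, one option (a sketch of mine, not from the answer) is to expand the reduced mask so that both elements of every differing pair are flagged:
full = np.zeros(twodim.shape, dtype=bool)
full[:, :-1] |= dmask   # left element of each differing pair
full[:, 1:] |= dmask    # right element of each differing pair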
It is still easier to combine tests in the boolean array form than with where indices.
import numpy as np

twodim = np.array([[1, 2, 3, 4], [1, 6, 7, 8], [1, 1, 1, 12], [17, 3, 15, 16], [17, 3, 18, 18]])
condition = (twodim % 2 == 0) & (twodim > 7) & (twodim % 3 == 0)
location = np.argwhere(condition)
for i in location:
    print(i, twodim[i[0], i[1]], end=' ')
Output:
[2 3] 12 [4 2] 18 [4 3] 18
If you want to find where all three conditions are satisfied:
import numpy as np
twodim = np.array([[1, 2, 3, 4], [1, 6, 7, 8], [1, 1, 1, 12], [17, 3, 15, 16], [17, 3, 18, 18]])
mask = (twodim % 2 == 0) & (twodim > 7) & (twodim % 3 == 0)
print(twodim[mask])
[12 18 18]
It's not clear whether you want the individual elements that satisfy the condition, or the rows in which every element satisfies it.
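If it is whole rows you are after, a small sketch under that reading (my addition, reusing the mask from above):
rows_all = np.where(mask.all(axis=1))[0]   # rows where every element passes all three tests
# (empty for this twodim, since no row satisfies all conditions everywhere)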
