Python numpy: Add elements of a numpy array of arrays to another array of arrays (initialized to zeros) at the specified positions

Suppose we have a numpy array of numpy arrays of zeros as
arr1 = np.zeros((len(Train), L))
where Train is a (dataset) numpy array of arrays of integers of fixed length.
We also have another 1-D numpy array, positions, of length len(Train).
Now we wish to add elements of Train to arr1 at the positions specified by positions.
One way is to use a for loop on the Train array as:
k = len(Train[0])
for i in range(len(Train)):
    arr1[i, int(positions[i]):int(positions[i] + k)] = Train[i, 0:k]
However, going over the entire Train set using the explicit for loop is slow and I would like to optimize it.

Here is one way: generate all the indices you want to assign to and use them in a single assignment. Setup:
import numpy as np
n = 12 # Number of training samples
l = 8 # Number of columns in the output array
k = 4 # Number of columns in the training samples
arr = np.zeros((n, l), dtype=int)
train = np.random.randint(10, size=(n, k))
positions = np.random.randint(l - k, size=n)
Random example data:
>>> train
array([[3, 4, 3, 2],
[3, 6, 4, 1],
[0, 7, 9, 6],
[4, 0, 4, 8],
[2, 2, 6, 2],
[4, 5, 1, 7],
[5, 4, 4, 4],
[0, 8, 5, 3],
[2, 9, 3, 3],
[3, 3, 7, 9],
[8, 9, 4, 8],
[8, 7, 6, 4]])
>>> positions
array([3, 2, 3, 2, 0, 1, 2, 2, 3, 2, 1, 1])
Advanced indexing with broadcasting trickery:
rows = np.arange(n)[:, None] # Shape (n, 1)
cols = np.arange(k) + positions[:, None] # Shape (n, k)
arr[rows, cols] = train
output:
>>> arr
array([[0, 0, 0, 3, 4, 3, 2, 0],
[0, 0, 3, 6, 4, 1, 0, 0],
[0, 0, 0, 0, 7, 9, 6, 0],
[0, 0, 4, 0, 4, 8, 0, 0],
[2, 2, 6, 2, 0, 0, 0, 0],
[0, 4, 5, 1, 7, 0, 0, 0],
[0, 0, 5, 4, 4, 4, 0, 0],
[0, 0, 0, 8, 5, 3, 0, 0],
[0, 0, 0, 2, 9, 3, 3, 0],
[0, 0, 3, 3, 7, 9, 0, 0],
[0, 8, 9, 4, 8, 0, 0, 0],
[0, 8, 7, 6, 4, 0, 0, 0]])
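
As a rough check of the indexing approach above, the following sketch compares it with the explicit loop from the question on random data. The seed and array sizes are arbitrary choices for illustration. It also uses l - k + 1 as the upper bound so that the last valid start column can be drawn, which is an assumption that differs slightly from the setup above.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n, l, k = 1000, 8, 4            # illustrative sizes, not taken from the question

train = rng.integers(10, size=(n, k))
positions = rng.integers(l - k + 1, size=n)  # start columns 0 .. l - k inclusive

# Reference result: the explicit loop from the question.
expected = np.zeros((n, l), dtype=int)
for i in range(n):
    start = int(positions[i])
    expected[i, start:start + k] = train[i]

# Vectorized result: one (row, col) index pair per element of train.
rows = np.arange(n)[:, None]              # shape (n, 1)
cols = positions[:, None] + np.arange(k)  # shape (n, k)
arr = np.zeros((n, l), dtype=int)
arr[rows, cols] = train

loop_matches = np.array_equal(arr, expected)
```

Because every (row, column) pair in this index is unique, arr[rows, cols] += train should also work if the target array already holds values that need to be added to rather than overwritten.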

Related

Row-wise sorting a batch of pytorch tensors by column value

I would like to sort each row in a bxmxn pytorch tensor (where b represents the batch size) by the k-th column value in each row. So my input tensor is bxmxn, and my output tensor is also bxmxn with the rows of each mxn tensor rearranged based on the k-th column value.
For example, if my original tensor is:
a = torch.as_tensor([[[1, 3, 7, 6], [9, 0, 6, 2], [3, 0, 5, 8]], [[1, 0, 1, 0], [2, 1, 0, 3], [0, 0, 6, 1]]])
My sorted tensor should be:
sorted_dim = 1 # sort by rows, preserving each row
sorted_column = 2 # sort rows on value of 3rd column of each row
sorted_a = torch.as_tensor([[[3, 0, 5, 8], [9, 0, 6, 2], [1, 3, 7, 6]], [[2, 1, 0, 3], [1, 0, 1, 0], [0, 0, 6, 1]]])
Thanks!
Try this
import torch
a = torch.as_tensor([[[1, 3, 7, 6], [9, 0, 6, 2], [3, 0, 5, 8]], [[1, 0, 1, 0], [2, 1, 0, 3], [0, 0, 6, 1]]])
b = torch.argsort(a[:, :, 2])  # per-batch row order, sorted by the 3rd column
sorted_a = torch.stack([a[i, b[i], :] for i in range(a.shape[0])])
sorted_a
output:
tensor([[[3, 0, 5, 8],
[9, 0, 6, 2],
[1, 3, 7, 6]],
[[2, 1, 0, 3],
[1, 0, 1, 0],
[0, 0, 6, 1]]])
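
The list comprehension above still loops over the batch in Python. A possible loop-free variant, sketched here under the assumption that advanced indexing with a broadcast batch index is acceptable, pairs each batch index with its own row order. The names sort_column, order and batch are illustrative.

```python
import torch

a = torch.as_tensor([[[1, 3, 7, 6], [9, 0, 6, 2], [3, 0, 5, 8]],
                     [[1, 0, 1, 0], [2, 1, 0, 3], [0, 0, 6, 1]]])
sort_column = 2  # sort the rows of each matrix by their 3rd column

# order[i] lists the row indices of a[i] in sorted order; shape (b, m).
order = torch.argsort(a[:, :, sort_column], dim=1)

# Broadcasting a (b, 1) batch index against the (b, m) row order
# selects whole rows, so the result has shape (b, m, n).
batch = torch.arange(a.shape[0])[:, None]
sorted_a = a[batch, order]
```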

Set numpy matrix elements to zero if varying row index is exceeded

I have a fairly large m-by-n numpy matrix M filled with non-zero values and an array x of length m, where each entry gives the column index from which the elements of the corresponding row should be set to zero. For example, if n=5 and x[i]=3, then the i-th row of the matrix should be set to [M_i1, M_i2, M_i3, 0, 0].
If all entries of x had the same value k, I could simply use slicing with something like M[:,k:]=0, but I could not figure out an efficient way to do this with a different value for each row without looping over all rows and slicing each one.
I thought about creating a matrix that looks like [[1]*x[1] + [0]*(n-x[1]),...,[1]*x[m] + [0]*(n-x[m])] and using it for boolean indexing, but I don't know how to create it without looping either.
The non-vectorized solution looks like this:
for i in range(m):
    if x[i] < n:
        M[i, x[i]:] = 0
with example input
M = np.array([[1,2,3],[4,5,6]])
m, n = 2, 3
x = np.array([1,2])
and output
array([[1, 0, 0],
[4, 5, 0]])
Does anyone have a vectorized solution for this problem?
Thank you very much!
You can use multi-dimensional boolean indexing:
M[x[:,None]<=np.arange(M.shape[1])] = 0
example:
M = np.array([[7, 8, 4, 2, 3, 9, 1, 8, 4, 3],
              [2, 1, 6, 1, 5, 2, 2, 2, 9, 2],
              [6, 1, 6, 8, 4, 3, 6, 9, 2, 6],
              [5, 4, 0, 8, 3, 0, 0, 1, 8, 7],
              [8, 7, 8, 8, 9, 2, 0, 8, 0, 2]])
x = np.array([4, 4, 0, 6, 2])
output:
[[7, 8, 4, 2, 0, 0, 0, 0, 0, 0],
[2, 1, 6, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[5, 4, 0, 8, 3, 0, 0, 0, 0, 0],
[8, 7, 0, 0, 0, 0, 0, 0, 0, 0]]
This looks like a mask-smearing exercise. At each row, you want to smear starting with the element at np.minimum(x[row], n):
mask = np.zeros(M.shape, bool)
mask[np.flatnonzero(x < n), x[x < n]] = True
M[np.cumsum(mask, axis=1, dtype=bool)] = 0
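
To see that both answers agree with the loop from the question, here is a small self-contained sketch. The input is a shortened version of the example above, with one entry of x set equal to n to exercise the case where a row must stay untouched. The variable names are illustrative.

```python
import numpy as np

M = np.array([[7, 8, 4, 2, 3, 9, 1, 8, 4, 3],
              [2, 1, 6, 1, 5, 2, 2, 2, 9, 2],
              [6, 1, 6, 8, 4, 3, 6, 9, 2, 6]])
x = np.array([4, 10, 0])  # 10 == n, so the second row is left untouched
m, n = M.shape

# Reference: the explicit loop from the question.
expected = M.copy()
for i in range(m):
    if x[i] < n:
        expected[i, x[i]:] = 0

# Approach 1: compare each row's cut-off with the column indices.
by_compare = M.copy()
by_compare[x[:, None] <= np.arange(n)] = 0

# Approach 2: mark the first column to clear, then smear the mark rightwards.
mask = np.zeros(M.shape, dtype=bool)
valid = x < n
mask[np.flatnonzero(valid), x[valid]] = True
by_cumsum = M.copy()
by_cumsum[np.cumsum(mask, axis=1, dtype=bool)] = 0
```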

How to use skimage.measure.regionprops to query labels

Can someone help me out with skimage.measure.regionprops? The documentation was confusing to me in describing the list of properties that regionprops provides.
I would like to do the following:
Query a point (x,y) and return which labeled area the point belongs in.
Get an ndarray of all points within a labeled area.
Here is some code showing what I have so far:
import numpy as np
from skimage.measure import label
import matplotlib.pyplot as plt
arr = np.array([[1, 0, 1, 0, 0, 0, 1],
[1, 1, 1, 0, 0, 0, 1],
[0, 1, 1, 0, 0, 0, 1],
[0, 1, 1, 0, 0, 1, 1],
[0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 1, 1, 1],
[1, 0, 0, 1, 1, 1, 1],
[1, 0, 0, 1, 1, 1, 1],
[1, 0, 0, 1, 1, 1, 1]])
img = label(arr)
plt.imshow(img)
plt.show()
For example, I want to query arr[8][6] and find out which label it belongs to (green), and to get all of the points that belong to an arbitrary label (such as green).
The numeric label of any pixel can be retrieved by indexing img:
In [67]: row, col = 8, 6
In [68]: index = img[row, col]
In [69]: print(f'The label of pixel [{row}, {col}] is {index}')
The label of pixel [8, 6] is 2
Then you could use NumPy's nonzero to get the coordinates of all the pixels with the same label:
In [70]: coords = np.nonzero(img == index)
In [71]: coords
Out[71]:
(array([0, 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8], dtype=int32),
array([6, 6, 6, 5, 6, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6], dtype=int32))
In [72]: out = np.zeros(shape = arr.shape + (3,), dtype=np.uint8)
In [73]: out[coords] = [0, 255, 0] # green
In [74]: plt.imshow(out)
Out[74]: <matplotlib.image.AxesImage at 0x11a2ec10>
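
Since the question asks about regionprops specifically, here is a minimal sketch of how the same two queries might look with it. It relies on the label and coords attributes of the region objects that regionprops returns. The dictionary coords_by_label is an illustrative helper, not part of skimage.

```python
import numpy as np
from skimage.measure import label, regionprops

arr = np.array([[1, 0, 1, 0, 0, 0, 1],
                [1, 1, 1, 0, 0, 0, 1],
                [0, 1, 1, 0, 0, 0, 1],
                [0, 1, 1, 0, 0, 1, 1],
                [0, 0, 0, 0, 1, 1, 1],
                [0, 0, 0, 1, 1, 1, 1],
                [1, 0, 0, 1, 1, 1, 1],
                [1, 0, 0, 1, 1, 1, 1],
                [1, 0, 0, 1, 1, 1, 1]])
img = label(arr)

# Query 1: the label of a pixel is the value of the label image at that pixel.
label_at = int(img[8, 6])

# Query 2: each region exposes its label and an (N, 2) array of (row, col) coordinates.
coords_by_label = {region.label: region.coords for region in regionprops(img)}
points = coords_by_label[label_at]
```

Here points holds one (row, col) pair per pixel of the queried region, which may be more convenient than the tuple of index arrays returned by np.nonzero.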

Delete rows in ndarray where sum of multiple indexes is 0

So I have a very large two-dimensional numpy array such as:
array([[ 2, 4, 0, 0, 0, 5, 9, 0],
[ 2, 3, 0, 1, 0, 3, 1, 1],
[ 1, 5, 4, 3, 2, 7, 8, 3],
[ 0, 7, 0, 0, 0, 6, 4, 4],
...,
[ 6, 5, 6, 0, 0, 1, 9, 5]])
I would like to quickly remove each row of the array where np.sum(row[2:5]) == 0
The only way I can think to do this is with for loops, but that takes very long when there are millions of rows. Additionally, this needs to be constrained to Python 2.7
Boolean expressions can be used as an index. You can use them to mask the array.
inputarray = numpy.array([[ 2, 4, 0, 0, 0, 5, 9, 0],
[ 2, 3, 0, 1, 0, 3, 1, 1],
[ 1, 5, 4, 3, 2, 7, 8, 3],
[ 0, 7, 0, 0, 0, 6, 4, 4],
...,
[ 6, 5, 6, 0, 0, 1, 9, 5]])
mask = numpy.sum(inputarray[:,2:5], axis=1) != 0
result = inputarray[mask,:]
What this is doing:
inputarray[:, 2:5] selects all the columns you want to sum over
axis=1 means the sum runs across the columns of each row, giving one sum per row
We want to keep the rows where the sum is not zero
The mask is used as a row index and selects the rows where the boolean expression is True
Another solution would be to use numpy.apply_along_axis to calculate the sums and cast it as a bool, and use that for your index:
my_arr = np.array([[ 2, 4, 0, 0, 0, 5, 9, 0],
[ 2, 3, 0, 1, 0, 3, 1, 1],
[ 1, 5, 4, 3, 2, 7, 8, 3],
[ 0, 7, 0, 0, 0, 6, 4, 4],])
my_arr[np.apply_along_axis(lambda x: bool(sum(x[2:5])), 1, my_arr)]
array([[2, 3, 0, 1, 0, 3, 1, 1],
[1, 5, 4, 3, 2, 7, 8, 3]])
We just cast the sum to a bool, since any number that's not 0 is going to be True.
>>> a
array([[2, 4, 0, 0, 0, 5, 9, 0],
[2, 3, 0, 1, 0, 3, 1, 1],
[1, 5, 4, 3, 2, 7, 8, 3],
[0, 7, 0, 0, 0, 6, 4, 4],
[6, 5, 6, 0, 0, 1, 9, 5]])
You are interested in columns 2 through 4 (the slice 2:5 excludes index 5)
>>> a[:,2:5]
array([[0, 0, 0],
[0, 1, 0],
[4, 3, 2],
[0, 0, 0],
[6, 0, 0]])
>>> b = a[:,2:5]
You want to find the sum of those columns in each row
>>> sum_ = b.sum(1)
>>> sum_
array([0, 1, 9, 0, 6])
These are the rows that meet your criteria
>>> sum_ != 0
array([False, True, True, False, True], dtype=bool)
>>> keep = sum_ != 0
Use boolean indexing to select those rows
>>> a[keep, :]
array([[2, 3, 0, 1, 0, 3, 1, 1],
[1, 5, 4, 3, 2, 7, 8, 3],
[6, 5, 6, 0, 0, 1, 9, 5]])
>>>
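
One caveat worth illustrating: testing the sum is not the same as testing for any non-zero entry once negative values are involved. The sketch below uses the example array from above plus a made-up one-row array named signed (an assumption for illustration) to show where the two criteria diverge.

```python
import numpy as np

a = np.array([[2, 4, 0, 0, 0, 5, 9, 0],
              [2, 3, 0, 1, 0, 3, 1, 1],
              [1, 5, 4, 3, 2, 7, 8, 3],
              [0, 7, 0, 0, 0, 6, 4, 4],
              [6, 5, 6, 0, 0, 1, 9, 5]])

# Keep rows whose columns 2..4 sum to a non-zero value (the question's criterion).
by_sum = a[a[:, 2:5].sum(axis=1) != 0]

# Keep rows that have at least one non-zero entry in columns 2..4.
by_any = a[a[:, 2:5].any(axis=1)]

# The two criteria differ once negative values can cancel out.
signed = np.array([[9, 9, 1, -1, 0, 9]])
kept_by_sum = signed[signed[:, 2:5].sum(axis=1) != 0]  # row dropped: 1 - 1 + 0 == 0
kept_by_any = signed[signed[:, 2:5].any(axis=1)]       # row kept: it has non-zero entries
```

For non-negative data such as the example, both masks select the same rows.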

Sum a staggered array "columns" in this way

Let's say I have the following array
import numpy as np
matrix = np.array([
[[1, 2, 3, 4], [0, 1], [2, 3, 4, 5]],
[[1, 2, 3], [4], [0, 1], [2, 0], [0, 0]],
[[2, 2], [3, 4, 0], [1, 1, 0, 0], [0]],
[[6, 3, 3, 4, 0], [4, 2, 3, 4, 5]],
[[1, 2, 3, 2], [0, 1, 2], [3, 4, 5]]], dtype=object)  # dtype=object is needed for ragged input on NumPy >= 1.24
As you can see, it's a staggered array. What I want to do is to sum the elements in a way so that the output is:
[11, 11, 15, 18, 0, 8, 9, 9, 12, 15]
I want to sum the elements in the "columns" of the matrix, but I don't know how to do it.
As mentioned by juanpa.arrivillaga in the comments, you don't have a multi-dimensional array; you have a 1-D array of lists of lists. You need to flatten the inner lists first:
>>> np.array([[z for y in x for z in y] for x in matrix])
array([[1, 2, 3, 4, 0, 1, 2, 3, 4, 5],
[1, 2, 3, 4, 0, 1, 2, 0, 0, 0],
[2, 2, 3, 4, 0, 1, 1, 0, 0, 0],
[6, 3, 3, 4, 0, 4, 2, 3, 4, 5],
[1, 2, 3, 2, 0, 1, 2, 3, 4, 5]])
It should be much easier to solve your problem now. This matrix has a shape of (5,10), and supports T for transposition and np.sum() for summing rows or columns.
You didn't write any code, so I won't solve the problem completely, but with this matrix, you're one step away from:
array([11, 11, 15, 18, 0, 8, 9, 9, 12, 15])
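
For completeness, the remaining step could look like the sketch below. It assumes the data is kept as a plain nested list (so no object array is needed) and that every row flattens to the same length, as it does in this example. The names flat and column_sums are illustrative.

```python
from itertools import chain

import numpy as np

matrix = [
    [[1, 2, 3, 4], [0, 1], [2, 3, 4, 5]],
    [[1, 2, 3], [4], [0, 1], [2, 0], [0, 0]],
    [[2, 2], [3, 4, 0], [1, 1, 0, 0], [0]],
    [[6, 3, 3, 4, 0], [4, 2, 3, 4, 5]],
    [[1, 2, 3, 2], [0, 1, 2], [3, 4, 5]],
]

# Flatten each row's inner lists so that every row has the same length (10 here).
flat = np.array([list(chain.from_iterable(row)) for row in matrix])

# Summing along axis 0 adds the rows together, giving one total per column.
column_sums = flat.sum(axis=0)
```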
