I have a 2D array with filled with some values (column 0) and zeros (rest of the columns). I would like to do pretty much the same as I do with MS excel but using numpy, meaning to put into the rest of the columns values from calculations based on the first column. Here it is a MWE:
import numpy as np
a = np.zeros(20, dtype=np.int8).reshape(4,5)
b = [1, 2, 3, 4]
b = np.array(b)
a[:, 0] = b
# don't change the first column
for column in a[:, 1:]:
a[:, column] = column[0]+1
The expected output:
array([[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8]], dtype=int8)
The resulting output:
array([[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0]], dtype=int8)
Any help would be appreciated.
Looping is slow and there is no need to loop to produce the array that you want:
>>> a = np.ones(20, dtype=np.int8).reshape(4,5)
>>> a[:, 0] = b
>>> a
array([[1, 1, 1, 1, 1],
[2, 1, 1, 1, 1],
[3, 1, 1, 1, 1],
[4, 1, 1, 1, 1]], dtype=int8)
>>> np.cumsum(a, axis=1)
array([[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8]])
What went wrong
Let's start, as in the question, with this array:
>>> a
array([[1, 0, 0, 0, 0],
[2, 0, 0, 0, 0],
[3, 0, 0, 0, 0],
[4, 0, 0, 0, 0]], dtype=int8)
Now, using the code from the question, let's do the loop and see what column actually is:
>>> for column in a[:, 1:]:
... print(column)
...
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
As you can see, column is not the index of the column but the actual values in the column. Consequently, the following does not do what you would hope:
a[:, column] = column[0]+1
Another method
If we want to loop (so that we can do something more complex), here is another approach to generating the desired array:
>>> b = np.array([1, 2, 3, 4])
>>> np.column_stack([b+i for i in range(5)])
array([[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8]])
Your usage of column is a little ambiguous: in for column in a[:, 1:], it is treated as a column and in the body, however, it is treated as index to the column. You can try this instead:
for column in range(1, a.shape[1]):
a[:, column] = a[:, column-1]+1
a
#array([[1, 2, 3, 4, 5],
# [2, 3, 4, 5, 6],
# [3, 4, 5, 6, 7],
# [4, 5, 6, 7, 8]], dtype=int8)
Related
I would like to sort each row in a bxmxn pytorch tensor (where b represents the batch size) by the k-th column value in each row. So my input tensor is bxmxn, and my output tensor is also bxmxn with the rows of each mxn tensor rearranged based on the k-th column value.
For example, if my original tensor is:
a = torch.as_tensor([[[1, 3, 7, 6], [9, 0, 6, 2], [3, 0, 5, 8]], [[1, 0, 1, 0], [2, 1, 0, 3], [0, 0, 6, 1]]])
My sorted tensor should be:
sorted_dim = 1 # sort by rows, preserving each row
sorted_column = 2 # sort rows on value of 3rd column of each row
sorted_a = torch.as_tensor([[[3, 0, 5, 8], [9, 0, 6, 2], [1, 3, 7, 6]], [[2, 1, 0, 3], [1, 0, 1, 0], [0, 0, 6, 1]]])
Thanks!
Try this
a = torch.as_tensor([[[1, 3, 7, 6], [9, 0, 6, 2], [3, 0, 5, 8]], [[1, 0, 1, 0], [2, 1, 0, 3], [0, 0, 6, 1]]])
b=torch.argsort(a[:,:,2])
sorted_a=torch.stack([a[i,b[i],:] for i in range(a.shape[0])] )
sorted_a
output:
tensor([[[3, 0, 5, 8],
[9, 0, 6, 2],
[1, 3, 7, 6]],
[[2, 1, 0, 3],
[1, 0, 1, 0],
[0, 0, 6, 1]]])
I have two arrays
a = np.array([[0, 0, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 2, 2, 3, 4]])
and
b = np.array([[1, 1],
[2, 2],
[3, 3]])
I want to one array where I am adding the values of b to the first two columns in a like this:
c = np.array([[1, 1, 2, 3, 4],
[2, 3, 2, 3, 4],
[3, 5, 2, 3, 4]])
if it helps you can think of the first two columns in a as the x,y coordinates and b as dx, dy.
My current method is as follows:
c = np.concatenate([a[:, 0:2] + b, a[:, 2:]],1)
but I am looking for a better method
Thank you
You can use np.pad to add zeros to b to make its shape the same as a's, then add them:
>>> a + np.pad(b, ((0, 0), (0, 3)))
array([[1, 1, 2, 3, 4],
[2, 3, 2, 3, 4],
[3, 5, 2, 3, 4]])
In general (for 2-D):
>>> a = np.array([[0, 0, 2, 3, 4],
... [0, 1, 2, 3, 4],
... [0, 2, 2, 3, 4]])
>>> b = np.array([[1, 1],
... [2, 2],
... [3, 3],
... [4, 4],
... [5, 5]])
>>> a_shape, b_shape = a.shape, b.shape
>>> max_w = max(a_shape[0], b_shape[0])
>>> max_h = max(a_shape[1], b_shape[1])
>>> padded_a = np.pad(a,
((0, np.abs(a_shape[0] - max_w)),
(0, np.abs(a_shape[1] - max_h))))
>>> padded_a
array([[0, 0, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 2, 2, 3, 4],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]])
>>> padded_b = np.pad(b,
((0, np.abs(b_shape[0] - max_w)),
(0, np.abs(b_shape[1] - max_h))))
>>> padded_b
array([[1, 1, 0, 0, 0],
[2, 2, 0, 0, 0],
[3, 3, 0, 0, 0],
[4, 4, 0, 0, 0],
[5, 5, 0, 0, 0]])
>>> padded_a + padded_b
array([[1, 1, 2, 3, 4],
[2, 3, 2, 3, 4],
[3, 5, 2, 3, 4],
[4, 4, 0, 0, 0],
[5, 5, 0, 0, 0]])
In general (2-D, using a zeros array and adding to it):
>>> c = np.zeros((max_h, max_w), dtype=a.dtype)
>>> c[:a_shape[0], :a_shape[1]] += a
>>> c[:b_shape[0], :b_shape[1]] += b
>>> c
array([[1, 1, 2, 3, 4],
[2, 3, 2, 3, 4],
[3, 5, 2, 3, 4],
[4, 4, 0, 0, 0],
[5, 5, 0, 0, 0]])
There is a bunch of questions regarding reshaping of matrices using NumPy here on stackoverflow. I have found one that is closely related to what I am trying to achieve. However, this answer is not general enough for my application. So here we are.
I have got a matrix with millions of lines (shape m x n) that looks like this:
[[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3],
[4, 4, 4, 4],
[5, 5, 5, 5],
[6, 6, 6, 6],
[7, 7, 7, 7],
[...]]
From this I would like to go to a shape m/2 x 2n like it can be seen below. For that one has to take n consecutive rows every n rows (in this example n = 2). The blocks of consecutively taken rows are then horizontally stacked to the untouched rows. In this example that would mean:
The first two rows stay like they are.
Take row two and three and horizontally concatenate them to row zero and one.
Take row six and seven and horizontally concatenate them to row four and five. This concatenated block then becomes row two and three.
...
[[0, 0, 0, 0, 2, 2, 2, 2],
[1, 1, 1, 1, 3, 3, 3, 3],
[4, 4, 4, 4, 6, 6, 6, 6],
[5, 5, 5, 5, 7, 7, 7, 7],
[...]]
How would I most efficiently (in terms of the least computation time possible) do that using Numpy? And would it make sense to speed the process up using Numba? Or is there not much to speed up?
Assuming your array's length is divisible by 4, here one way you can do it using numpy.hstack after creating the correct indices for selecting the rows for the "left" and "right" parts of the resulting array:
import numpy
# Create the array
N = 1000*4
a = np.hstack([np.arange(0, N)[:, None]]*4) #shape (4000, 4)
a
array([[ 0, 0, 0, 0],
[ 1, 1, 1, 1],
[ 2, 2, 2, 2],
...,
[3997, 3997, 3997, 3997],
[3998, 3998, 3998, 3998],
[3999, 3999, 3999, 3999]])
left_idx = np.array([np.array([0,1]) + 4*i for i in range(N//4)]).reshape(-1)
right_idx = np.array([np.array([2,3]) + 4*i for i in range(N//4)]).reshape(-1)
r = np.hstack([a[left_idx], a[right_idx]]) #shape (2000, 8)
r
array([[ 0, 0, 0, ..., 2, 2, 2],
[ 1, 1, 1, ..., 3, 3, 3],
[ 4, 4, 4, ..., 6, 6, 6],
...,
[3993, 3993, 3993, ..., 3995, 3995, 3995],
[3996, 3996, 3996, ..., 3998, 3998, 3998],
[3997, 3997, 3997, ..., 3999, 3999, 3999]])
Here's an application of the swapaxes answer in your link.
In [11]: x=np.array([[0, 0, 0, 0],
...: [1, 1, 1, 1],
...: [2, 2, 2, 2],
...: [3, 3, 3, 3],
...: [4, 4, 4, 4],
...: [5, 5, 5, 5],
...: [6, 6, 6, 6],
...: [7, 7, 7, 7]])
break the array into 'groups' with a reshape, keeping the number of columns (4) unchanged.
In [17]: x.reshape(2,2,2,4)
Out[17]:
array([[[[0, 0, 0, 0],
[1, 1, 1, 1]],
[[2, 2, 2, 2],
[3, 3, 3, 3]]],
[[[4, 4, 4, 4],
[5, 5, 5, 5]],
[[6, 6, 6, 6],
[7, 7, 7, 7]]]])
swap the 2 middle dimensions, regrouping rows:
In [18]: x.reshape(2,2,2,4).transpose(0,2,1,3)
Out[18]:
array([[[[0, 0, 0, 0],
[2, 2, 2, 2]],
[[1, 1, 1, 1],
[3, 3, 3, 3]]],
[[[4, 4, 4, 4],
[6, 6, 6, 6]],
[[5, 5, 5, 5],
[7, 7, 7, 7]]]])
Then back to the target shape. This final step creates a copy of the original (the previous steps were view):
In [19]: x.reshape(2,2,2,4).transpose(0,2,1,3).reshape(4,8)
Out[19]:
array([[0, 0, 0, 0, 2, 2, 2, 2],
[1, 1, 1, 1, 3, 3, 3, 3],
[4, 4, 4, 4, 6, 6, 6, 6],
[5, 5, 5, 5, 7, 7, 7, 7]])
It's hard to generalize this, since there are different ways of rearranging blocks. For example my first try produced:
In [16]: x.reshape(4,2,4).transpose(1,0,2).reshape(4,8)
Out[16]:
array([[0, 0, 0, 0, 2, 2, 2, 2],
[4, 4, 4, 4, 6, 6, 6, 6],
[1, 1, 1, 1, 3, 3, 3, 3],
[5, 5, 5, 5, 7, 7, 7, 7]])
Suppose I have a matrix A with some arbitrary values:
array([[ 2, 4, 5, 3],
[ 1, 6, 8, 9],
[ 8, 7, 0, 2]])
And a matrix B which contains indices of elements in A:
array([[0, 0, 1, 2],
[0, 3, 2, 1],
[3, 2, 1, 0]])
How do I select values from A pointed by B, i.e.:
A[B] = [[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]]
EDIT: np.take_along_axis is a builtin function for this use case implemented since numpy 1.15. See #hpaulj 's answer below for how to use it.
You can use NumPy's advanced indexing -
A[np.arange(A.shape[0])[:,None],B]
One can also use linear indexing -
m,n = A.shape
out = np.take(A,B + n*np.arange(m)[:,None])
Sample run -
In [40]: A
Out[40]:
array([[2, 4, 5, 3],
[1, 6, 8, 9],
[8, 7, 0, 2]])
In [41]: B
Out[41]:
array([[0, 0, 1, 2],
[0, 3, 2, 1],
[3, 2, 1, 0]])
In [42]: A[np.arange(A.shape[0])[:,None],B]
Out[42]:
array([[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]])
In [43]: m,n = A.shape
In [44]: np.take(A,B + n*np.arange(m)[:,None])
Out[44]:
array([[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]])
More recent versions have added a take_along_axis function that does the job:
A = np.array([[ 2, 4, 5, 3],
[ 1, 6, 8, 9],
[ 8, 7, 0, 2]])
B = np.array([[0, 0, 1, 2],
[0, 3, 2, 1],
[3, 2, 1, 0]])
np.take_along_axis(A, B, 1)
Out[]:
array([[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]])
There's also a put_along_axis.
I know this is an old question, but another way of doing it using indices is:
A[np.indices(B.shape)[0], B]
output:
[[2 2 4 5]
[1 9 8 6]
[2 0 7 8]]
Following is the solution using for loop:
outlist = []
for i in range(len(B)):
lst = []
for j in range(len(B[i])):
lst.append(A[i][B[i][j]])
outlist.append(lst)
outarray = np.asarray(outlist)
print(outarray)
Above can also be written in more succinct list comprehension form:
outlist = [ [A[i][B[i][j]] for j in range(len(B[i]))]
for i in range(len(B)) ]
outarray = np.asarray(outlist)
print(outarray)
Output:
[[2 2 4 5]
[1 9 8 6]
[2 0 7 8]]
Let's say I have the following array
import numpy as np
matrix = np.array([
[[1, 2, 3, 4], [0, 1], [2, 3, 4, 5]],
[[1, 2, 3], [4], [0, 1], [2, 0], [0, 0]],
[[2, 2], [3, 4, 0], [1, 1, 0, 0], [0]],
[[6, 3, 3, 4, 0], [4, 2, 3, 4, 5]],
[[1, 2, 3, 2], [0, 1, 2], [3, 4, 5]]])
As you can see, it's a staggered array. What I want to do is to sum the elements in a way so that the output is:
[11, 11, 15, 18, 0, 8, 9, 9, 12, 15]
I want to sum the elements in the "columns" of the matrix, but I don't know how to do it.
As mentioned by juanpa.arrivillaga in the comments, you don't have a multi-dimensional array, you have a 1-D array of lists of lists. You need to flatten the inner lists first :
>>> np.array([[z for y in x for z in y] for x in matrix])
array([[1, 2, 3, 4, 0, 1, 2, 3, 4, 5],
[1, 2, 3, 4, 0, 1, 2, 0, 0, 0],
[2, 2, 3, 4, 0, 1, 1, 0, 0, 0],
[6, 3, 3, 4, 0, 4, 2, 3, 4, 5],
[1, 2, 3, 2, 0, 1, 2, 3, 4, 5]])
It should be much easier to solve your problem now. This matrix has a shape of (5,10), and supports T for transposition and np.sum() for summing rows or columns.
You didn't write any code, so I won't solve the problem completely, but with this matrix, you're one step away from:
array([11, 11, 15, 18, 0, 8, 9, 9, 12, 15])