I've been looking all over but I'm not really sure how to even describe what it is I want. Essentially I need to turn
np.array(
[[0,0, 1,1, 2,2],
[0,0, 1,1, 2,2],
[3,3, 4,4, 5,5],
[3,3, 4,4, 5,5]]
)
into
np.array(
[[[0,1,2,3,4,5], [0,1,2,3,4,5]],
[[0,1,2,3,4,5], [0,1,2,3,4,5]]]
)
I think I can accomplish that using np.reshape and maybe some other stuff but if I try reshape with arguments (2,2,6) I get back
[[[0 0 1 1 2 2]
[0 0 1 1 2 2]]
[[3 3 4 4 5 5]
[3 3 4 4 5 5]]]
which is not quite what I want.
Make your array with a couple of repeats:
In [208]: arr = np.arange(0,6).reshape(2,3)
In [209]: arr
Out[209]:
array([[0, 1, 2],
[3, 4, 5]])
In [210]: arr = arr.repeat(2,0).repeat(2,1)
In [211]: arr
Out[211]:
array([[0, 0, 1, 1, 2, 2],
[0, 0, 1, 1, 2, 2],
[3, 3, 4, 4, 5, 5],
[3, 3, 4, 4, 5, 5]])
Now break it into blocks which we can transpose:
In [215]: arr1 = arr.reshape(2,2,3,2)
In [216]: arr1
Out[216]:
array([[[[0, 0],
[1, 1],
[2, 2]],
[[0, 0],
[1, 1],
[2, 2]]],
[[[3, 3],
[4, 4],
[5, 5]],
[[3, 3],
[4, 4],
[5, 5]]]])
In [217]: arr1.shape
Out[217]: (2, 2, 3, 2)
In [218]: arr1.transpose(1,0,2,3)
Out[218]:
array([[[[0, 0],
[1, 1],
[2, 2]],
[[3, 3],
[4, 4],
[5, 5]]],
[[[0, 0],
[1, 1],
[2, 2]],
[[3, 3],
[4, 4],
[5, 5]]]])
Let's consolidate the middle 2 axes:
In [220]: arr1.transpose(1,0,2,3).reshape(2,6,2)
Out[220]:
array([[[0, 0],
[1, 1],
[2, 2],
[3, 3],
[4, 4],
[5, 5]],
[[0, 0],
[1, 1],
[2, 2],
[3, 3],
[4, 4],
[5, 5]]])
Almost there; just need another transpose:
In [221]: arr1.transpose(1,0,2,3).reshape(2,6,2).transpose(0,2,1)
Out[221]:
array([[[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5]],
[[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5]]])
The basic idea is to reshape the array into blocks, do a transpose, and reshape again. Here I needed another transpose, but with a better choice of the first transpose that extra step might not have been needed.
I don't know of a systematic way of doing this; there may be one, but so far I've just used a bit of trial and error when answering this kind of question. Everyone wants a different final arrangement.
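As a sketch of the "right transpose to start with" idea: if the reshape splits rows into (block row, row within block) and columns into (block column, column within block), a single transpose that moves the two within-block axes to the front is enough for this particular input. The axis order (1, 3, 0, 2) below was worked out for this example only and is not a general recipe.

```python
import numpy as np

# Rebuild the 4x6 input from the question.
arr = np.arange(6).reshape(2, 3).repeat(2, axis=0).repeat(2, axis=1)

# Axes after reshape: (block_row, row_in_block, block_col, col_in_block).
blocks = arr.reshape(2, 2, 3, 2)

# Move the within-block axes first, then merge (block_row, block_col)
# into a single trailing axis of length 6.
out = blocks.transpose(1, 3, 0, 2).reshape(2, 2, 6)
print(out)  # every out[i, j] is [0, 1, 2, 3, 4, 5]
```

The final reshape has to copy, because the transposed view is no longer contiguous.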
This should work:
>>> import numpy as np
>>> A = np.array(
... [[0,0, 1,1, 2,2],
... [0,0, 1,1, 2,2],
... [3,3, 4,4, 5,5],
... [3,3, 4,4, 5,5]]
... )
>>> B = A[::2,::2].flatten()
>>> B
array([0, 1, 2, 3, 4, 5])
>>> C = np.tile(B, (2,2,1))
>>> C
array([[[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5]],
[[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5]]])
We can generalize this for a given n * m matrix that contains blocks of size n/y * m/x of identical values (so there are y rows and x columns of blocks):
def transform(A, y, x):
    dy = A.shape[0] // y  # block height; integer division so it can be used as a slice step
    dx = A.shape[1] // x  # block width
    B = A[::dy, ::dx].flatten()
    return np.tile(B, (y,x,1))
I think this is what you are looking for:
import numpy
b=numpy.array([[0,0,1,1,2,2],[0,0,1,1,2,2],[3,3,4,4,5,5],[3,3,4,4,5,5]])
c1=(b[::2,::2].flatten(),b[::2,1::2].flatten())
c2=(b[1::2,::2].flatten(),b[1::2,1::2].flatten())
c=numpy.vstack((c1,c2)).reshape((2,2,6))
print(c)
which outputs:
[[[0 1 2 3 4 5]
[0 1 2 3 4 5]]
[[0 1 2 3 4 5]
[0 1 2 3 4 5]]]
The same approach extends to larger inputs; here it is with a 6*6 input array:
import numpy
b=numpy.array([[0,0,1,1,2,2],[0,0,1,1,2,2],[0,0,1,1,2,2],[3,3,4,4,5,5],[3,3,4,4,5,5],[3,3,4,4,5,5]])
(m,n)=b.shape
C=b[::int(m/2),::2].flatten(),b[::int(m/2),1::2].flatten()
for i in range(1,int(m/2)):
    C=numpy.vstack((C,(b[i::int(m/2),::2].flatten(),b[i::int(m/2),1::2].flatten())))
print(C)
which outputs:
[[0 1 2 3 4 5]
[0 1 2 3 4 5]
[0 1 2 3 4 5]
[0 1 2 3 4 5]
[0 1 2 3 4 5]
[0 1 2 3 4 5]]
I have the following numpy array:
[[[1], [2], [3], [1], [2], [3]],
[[4], [5], [6], [4], [5], [6]],
[[7], [8], [9], [7], [8], [9]]]
And I want each of the elements in the last dimension, [1], [2], [3] etc., to be concatenated with the following n arrays in the second dimension. In case of overflow, elements can be filled with 0. For example, for n = 2:
[[[1, 2, 3], [2, 3, 1], [3, 1, 2], [1, 2, 3], [2, 3, 0], [3, 0, 0]],
[[4, 5, 6], [5, 6, 4], [6, 4, 5], [4, 5, 6], [5, 6, 0], [6, 0, 0]],
[[7, 8, 9], [8, 9, 7], [9, 7, 8], [7, 8, 9], [8, 9, 0], [9, 0, 0]]]
I want to do this with the built in numpy functions for good performance and also want to do it in reverse i.e., a shift of n = -2 is fair game. How to do this?
For n = -2:
[[[0, 0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 1], [3, 1, 2], [1, 2, 3]],
[[0, 0, 4], [0, 4, 5], [4, 5, 6], [5, 6, 4], [6, 4, 5], [4, 5, 6]],
[[0, 0, 7], [0, 7, 8], [7, 8, 9], [8, 9, 7], [9, 7, 8], [7, 8, 9]]]
For n = 3
[[[1, 2, 3, 1], [2, 3, 1, 2], [3, 1, 2, 3], [1, 2, 3, 0], [2, 3, 0, 0], [3, 0, 0, 0]],
[[4, 5, 6, 4], [5, 6, 4, 5], [6, 4, 5, 6], [4, 5, 6, 0], [5, 6, 0, 0], [6, 0, 0, 0]],
[[7, 8, 9, 7], [8, 9, 7, 8], [9, 7, 8, 9], [7, 8, 9, 0], [8, 9, 0, 0], [9, 0, 0, 0]]]
If the current shape of the array is (height, width, 1), after the operation, the shape will be (height, width, abs(n) + 1).
How to generalize this so that the numbers 1, 2, 3 etc. can themselves be numpy arrays?
This sounds like a textbook application for the monster that is as_strided. One of the nice things about it is that it does not require any additional imports. The general idea is this:
You have an array with shape (3, 6, 1) and strides (6, 1, 1) * element_size.
x = ...
n = ... # Must not be zero, but you can special-case it to return the original array
You want to transform this into an array that has shape (3, 6, |n| + 1) and therefore strides (6 * (|n| + 1), |n| + 1, 1) * element_size.
To do this, you first pad the left or the right with |n| zeros:
pad = np.zeros((x.shape[0], np.abs(n), x.shape[2]), dtype=x.dtype)  # match x's dtype so the result is not upcast to float
x_pad = np.concatenate([x, pad][::np.sign(n)], axis=1)
Now you can index directly into the buffer with a custom shape and strides to get the result you want. Instead of using the proper strides (6 * (|n| + 1), |n| + 1, 1) * element_size, we will index each repeated element directly into the same buffer of the original array, meaning that the strides will be adjusted. The middle dimension will move by one element, rather than the proper |n| + 1. That way, the columns can start exactly where you want them to:
new_shape = (x.shape[0], x.shape[1], x.shape[2] + np.abs(n))
new_strides = (x_pad.strides[0], x_pad.strides[2], x_pad.strides[2])
result = np.lib.stride_tricks.as_strided(x_pad, shape=new_shape, strides=new_strides)
There are many caveats here. The biggest thing to be aware of is that multiple array elements access the same memory. My advice is to make a proper fleshed-out copy if you plan to do anything besides just reading the data:
result = result.copy()
This will give you a buffer of the correct size rather than a crazy view into the original data with padding.
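The padding and striding steps above can be put together as a minimal sketch. The function name `shifted_windows` is made up for illustration, the input is assumed to have shape (height, width, 1) as in the question, and the result is copied before returning so that no two elements share memory.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def shifted_windows(x, n):
    """Hypothetical helper: (height, width, 1) -> (height, width, abs(n) + 1)."""
    if n == 0:
        return x.copy()
    k = abs(n)
    # Zeros go on the right for positive n and on the left for negative n.
    pad = np.zeros((x.shape[0], k, x.shape[2]), dtype=x.dtype)
    parts = [x, pad] if n > 0 else [pad, x]
    x_pad = np.ascontiguousarray(np.concatenate(parts, axis=1))
    step = x_pad.strides[1]  # bytes between neighbouring elements in a row
    view = as_strided(
        x_pad,
        shape=(x.shape[0], x.shape[1], k + 1),
        strides=(x_pad.strides[0], step, step),  # each window starts one element later
    )
    return view.copy()  # detach from the overlapping view

x = np.array([[[1], [2], [3], [1], [2], [3]],
              [[4], [5], [6], [4], [5], [6]],
              [[7], [8], [9], [7], [8], [9]]])
print(shifted_windows(x, 2)[0])
print(shifted_windows(x, -2)[0])
```

Passing an incorrect shape or strides to as_strided can read outside the buffer, so it is worth checking the output shape against (height, width, abs(n) + 1) when adapting this.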
Here is a way to do it:
import numpy as np
from skimage.util import view_as_windows
# a is the input of shape (height, width, 1); n is the signed shift
if n>=0:
    a = np.pad(a.reshape(*a.shape[:-1]),((0,0),(0,n)))
else:
    n *= -1
    a = np.pad(a.reshape(*a.shape[:-1]),((0,0),(n,0)))
b = view_as_windows(a,(1,n+1))
b = b.reshape(*b.shape[:-2]+(n+1,))
a is your input array and b is your output:
n=2:
[[[1 2 3]
[2 3 1]
[3 1 2]
[1 2 3]
[2 3 0]
[3 0 0]]
[[4 5 6]
[5 6 4]
[6 4 5]
[4 5 6]
[5 6 0]
[6 0 0]]
[[7 8 9]
[8 9 7]
[9 7 8]
[7 8 9]
[8 9 0]
[9 0 0]]]
n=-2:
[[[0 0 1]
[0 1 2]
[1 2 3]
[2 3 1]
[3 1 2]
[1 2 3]]
[[0 0 4]
[0 4 5]
[4 5 6]
[5 6 4]
[6 4 5]
[4 5 6]]
[[0 0 7]
[0 7 8]
[7 8 9]
[8 9 7]
[9 7 8]
[7 8 9]]]
Explanation:
np.pad(a.reshape(*a.shape[:-1]),((0,0),(0,n))) pads enough zeros to the right side of array for overflow of windows (similarly padding left side for negative n)
view_as_windows(a,(1,n+1)) creates windows of shape (1,n+1) from the array as desired by the question.
b.reshape(*b.shape[:-2]+(n+1,)) gets rid of the extra dimension of length 1 created by (1,n+1)-shaped windows and reshape b to desired shape. Note the argument *b.shape[:-2]+(n+1,) is simply concatenation of two tuples to create a single tuple as shape.
You can also do this (requires NumPy 1.20 or later):
import numpy as np
arr = np.array([[[1], [2], [3], [1], [2], [3]],
[[4], [5], [6], [4], [5], [6]],
[[7], [8], [9], [7], [8], [9]]])
n = 2 # your n value
nslices = n+1
# reshape into 2d array
arr = arr.reshape((3, -1)) # <-- add n zeros as padding here
# perform the slicing
slices = np.lib.stride_tricks.sliding_window_view(arr, (1, nslices))
slices = slices[:, :, 0]
print(slices)
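The snippet above leaves the zero padding as a comment. A possible way to fill that in, as a sketch only: pad the 2-D array with np.pad on the side that matches the sign of n, then take windows along axis 1. The helper name `padded_windows` is hypothetical, and the returned array is a read-only view unless copied.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def padded_windows(arr, n):
    """Hypothetical helper: (height, width, 1) -> (height, width, abs(n) + 1)."""
    k = abs(n)
    flat = arr.reshape(arr.shape[0], -1)  # drop the trailing length-1 axis
    # Pad zeros on the right for n >= 0, on the left for n < 0.
    pad_width = ((0, 0), (0, k)) if n >= 0 else ((0, 0), (k, 0))
    padded = np.pad(flat, pad_width)
    # One window of length k + 1 per original column.
    return sliding_window_view(padded, k + 1, axis=1)

arr = np.array([[[1], [2], [3], [1], [2], [3]],
                [[4], [5], [6], [4], [5], [6]],
                [[7], [8], [9], [7], [8], [9]]])
print(padded_windows(arr, 2).shape)  # (3, 6, 3)
print(padded_windows(arr, -2)[0])
```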
Suppose I have a matrix A with some arbitrary values:
array([[ 2, 4, 5, 3],
[ 1, 6, 8, 9],
[ 8, 7, 0, 2]])
And a matrix B which contains indices of elements in A:
array([[0, 0, 1, 2],
[0, 3, 2, 1],
[3, 2, 1, 0]])
How do I select values from A pointed by B, i.e.:
A[B] = [[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]]
EDIT: np.take_along_axis is a built-in function for this use case, available since NumPy 1.15. See @hpaulj's answer below for how to use it.
You can use NumPy's advanced indexing -
A[np.arange(A.shape[0])[:,None],B]
One can also use linear indexing -
m,n = A.shape
out = np.take(A,B + n*np.arange(m)[:,None])
Sample run -
In [40]: A
Out[40]:
array([[2, 4, 5, 3],
[1, 6, 8, 9],
[8, 7, 0, 2]])
In [41]: B
Out[41]:
array([[0, 0, 1, 2],
[0, 3, 2, 1],
[3, 2, 1, 0]])
In [42]: A[np.arange(A.shape[0])[:,None],B]
Out[42]:
array([[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]])
In [43]: m,n = A.shape
In [44]: np.take(A,B + n*np.arange(m)[:,None])
Out[44]:
array([[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]])
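To make the linear-indexing variant less opaque, here is a small sketch of what the offset term does: element (i, j) of a row-major m x n array sits at flat position i*n + j, so adding n*i to every column index in row i of B turns B into indices into the flattened A. The intermediate names `row_offsets` and `flat_idx` are only for illustration.

```python
import numpy as np

A = np.array([[2, 4, 5, 3],
              [1, 6, 8, 9],
              [8, 7, 0, 2]])
B = np.array([[0, 0, 1, 2],
              [0, 3, 2, 1],
              [3, 2, 1, 0]])

m, n = A.shape
row_offsets = n * np.arange(m)[:, None]  # column vector [[0], [4], [8]]
flat_idx = B + row_offsets               # positions in the flattened array
out = A.ravel()[flat_idx]                # same result as np.take(A, flat_idx)
print(out)
```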
More recent versions have added a take_along_axis function that does the job:
A = np.array([[ 2, 4, 5, 3],
[ 1, 6, 8, 9],
[ 8, 7, 0, 2]])
B = np.array([[0, 0, 1, 2],
[0, 3, 2, 1],
[3, 2, 1, 0]])
np.take_along_axis(A, B, 1)
Out[]:
array([[2, 2, 4, 5],
[1, 9, 8, 6],
[2, 0, 7, 8]])
There's also a put_along_axis.
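As an illustration of how the two functions pair up, here is a sketch that reads the per-row maximum with take_along_axis and then overwrites it with put_along_axis. The argmax use case and the replacement value -1 are chosen for the example and are not from the question.

```python
import numpy as np

A = np.array([[2, 4, 5, 3],
              [1, 6, 8, 9],
              [8, 7, 0, 2]])

# Index arrays must have the same number of dimensions as A, hence [:, None].
idx = np.argmax(A, axis=1)[:, None]

row_max = np.take_along_axis(A, idx, axis=1)  # read the selected elements

A2 = A.copy()
np.put_along_axis(A2, idx, -1, axis=1)        # write in place at the same positions
print(row_max.ravel())
print(A2)
```

put_along_axis modifies its first argument in place and returns None, which is why the example works on a copy.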
I know this is an old question, but another way of doing it using indices is:
A[np.indices(B.shape)[0], B]
output:
[[2 2 4 5]
[1 9 8 6]
[2 0 7 8]]
The following is a solution using for loops:
outlist = []
for i in range(len(B)):
    lst = []
    for j in range(len(B[i])):
        lst.append(A[i][B[i][j]])
    outlist.append(lst)
outarray = np.asarray(outlist)
print(outarray)
The above can also be written more succinctly as a list comprehension:
outlist = [ [A[i][B[i][j]] for j in range(len(B[i]))]
for i in range(len(B)) ]
outarray = np.asarray(outlist)
print(outarray)
Output:
[[2 2 4 5]
[1 9 8 6]
[2 0 7 8]]
I have a dataframe df with a single column that contains arrays of length 3. Now, I want to transform this column to a numpy array of the correct shape. However, applying np.reshape does not work. How can I do this?
Here is a brief example:
import pandas as pd
import numpy as np
df = pd.DataFrame(columns=['col'])
for i in range(10):
    df.loc[i,'col'] = np.zeros(3)
arr = np.array(df['col'])
np.reshape(arr, (10,3)) # This does not work
Here are two approaches using np.vstack and np.concatenate -
np.vstack(df.col)
np.concatenate(df.col).reshape(df.shape[0],-1) # for performance
For best performance, we could use the underlying data with df.col.values instead.
Sample run -
In [116]: df
Out[116]:
col
0 [7, 5, 2]
1 [1, 1, 3]
2 [6, 1, 4]
3 [7, 0, 0]
4 [8, 8, 0]
5 [7, 8, 0]
6 [0, 5, 8]
7 [8, 3, 1]
8 [6, 6, 8]
9 [8, 2, 3]
In [117]: np.vstack(df.col)
Out[117]:
array([[7, 5, 2],
[1, 1, 3],
[6, 1, 4],
[7, 0, 0],
[8, 8, 0],
[7, 8, 0],
[0, 5, 8],
[8, 3, 1],
[6, 6, 8],
[8, 2, 3]])
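Why reshape fails while vstack works can be shown without pandas. The sketch below builds a 1-D object array as a stand-in for np.array(df['col']) (an assumption about what that call returns for a column of arrays): it holds only 10 elements, each of which is itself an array, so there is nothing to reshape into (10, 3) until the cells are stacked.

```python
import numpy as np

# Stand-in for np.array(df['col']): 10 cells, each holding a length-3 array.
cells = np.empty(10, dtype=object)
for i in range(10):
    cells[i] = np.arange(3) + i

print(cells.shape, cells.dtype)  # 1-D object array: 10 elements, not 30

try:
    cells.reshape(10, 3)
except ValueError as exc:
    print("reshape failed:", exc)

stacked = np.vstack(cells)       # stacks the 10 inner arrays as rows
print(stacked.shape)
```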
I have a numpy array x (with (n,4) shape) of integers like:
[[0 1 2 3],
[1 2 7 9],
[2 1 5 2],
...]
I want to transform the array into an array of pairs:
[0,1]
[0,2]
[0,3]
[1,2]
...
so first element makes a pair with other elements in the same sub-array. I have already a for-loop solution:
y=np.array([[x[j,0],x[j,i]] for i in range(1,4) for j in range(0,n)],dtype=int)
but since looping over numpy array is not efficient, I tried slicing as the solution. I can do the slicing for every column as:
y[1]=np.array([x[:,0],x[:,1]]).T
# [[0,1],[1,2],[2,1],...]
I can repeat this for all columns. My questions are:
How can I append y[2] to y[1],... such that the shape is (N,2)?
If number of columns is not small (in this example 4), how can I find y[i] elegantly?
What are the alternative ways to achieve the final array?
The cleanest way of doing this I can think of would be:
>>> x = np.arange(12).reshape(3, 4)
>>> x
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> n = x.shape[1] - 1
>>> y = np.repeat(x, (n,)+(1,)*n, axis=1)
>>> y
array([[ 0, 0, 0, 1, 2, 3],
[ 4, 4, 4, 5, 6, 7],
[ 8, 8, 8, 9, 10, 11]])
>>> y.reshape(-1, 2, n).transpose(0, 2, 1).reshape(-1, 2)
array([[ 0, 1],
[ 0, 2],
[ 0, 3],
[ 4, 5],
[ 4, 6],
[ 4, 7],
[ 8, 9],
[ 8, 10],
[ 8, 11]])
This will make two copies of the data, so it will not be the most efficient method. A more efficient version would probably be something like:
>>> y = np.empty((x.shape[0], n, 2), dtype=x.dtype)
>>> y[..., 0] = x[:, 0, None]
>>> y[..., 1] = x[:, 1:]
>>> y.shape = (-1, 2)
>>> y
array([[ 0, 1],
[ 0, 2],
[ 0, 3],
[ 4, 5],
[ 4, 6],
[ 4, 7],
[ 8, 9],
[ 8, 10],
[ 8, 11]])
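The preallocation approach can be wrapped into a small function and applied to the array from the question. This is a sketch; the name `pair_first_with_rest` is made up, and it assumes a 2-D input with at least two columns.

```python
import numpy as np

def pair_first_with_rest(x):
    """Hypothetical helper: pair column 0 of each row with every other column."""
    rows, cols = x.shape
    out = np.empty((rows, cols - 1, 2), dtype=x.dtype)
    out[..., 0] = x[:, :1]  # first column, broadcast across the row
    out[..., 1] = x[:, 1:]  # remaining columns
    return out.reshape(-1, 2)

x = np.array([[0, 1, 2, 3],
              [1, 2, 7, 9],
              [2, 1, 5, 2]])
pairs = pair_first_with_rest(x)
print(pairs)
```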
Like Jaimie, I first tried a repeat of the 1st column followed by reshaping, but then decided it was simpler to make 2 intermediary arrays, and hstack them:
x=np.array([[0,1,2,3],[1,2,7,9],[2,1,5,2]])
m,n=x.shape
x1=x[:,0].repeat(n-1)[:,None]
x2=x[:,1:].reshape(-1,1)
np.hstack([x1,x2])
producing
array([[0, 1],
[0, 2],
[0, 3],
[1, 2],
[1, 7],
[1, 9],
[2, 1],
[2, 5],
[2, 2]])
There probably are other ways of doing this sort of rearrangement. The result will copy the original data in one way or other. My guess is that as long as you are using compiled functions like reshape and repeat, the time differences won't be significant.
Suppose the numpy array is
arr = np.array([[0, 1, 2, 3],
[1, 2, 7, 9],
[2, 1, 5, 2]])
You can get the array of pairs as
import itertools
m, n = arr.shape
new_arr = np.array([x for i in range(m)
for x in itertools.product(arr[i, 0 : 1], arr[i, 1 : n])])
The output would be
array([[0, 1],
[0, 2],
[0, 3],
[1, 2],
[1, 7],
[1, 9],
[2, 1],
[2, 5],
[2, 2]])