I have a NumPy array that looks like
array([1, 2, 3, 4, 5, 6, 7, 8])
and I want to reshape it to an array
array([[5, 0, 0, 6],
       [0, 1, 2, 0],
       [0, 3, 4, 0],
       [7, 0, 0, 8]])
More specifically, I'm trying to reshape a 2D NumPy array into a 3D NumPy array, going from
array([[ 1,  2,  3,  4,  5,  6,  7,  8],
       [ 9, 10, 11, 12, 13, 14, 15, 16],
       [17, 18, 19, 20, 21, 22, 23, 24],
       ...
       [81, 82, 83, 84, 85, 86, 87, 88],
       [89, 90, 91, 92, 93, 94, 95, 96]])
to a numpy array that looks like
array([[[ 5,  0,  0,  6],
        [ 0,  1,  2,  0],
        [ 0,  3,  4,  0],
        [ 7,  0,  0,  8]],

       [[13,  0,  0, 14],
        [ 0,  9, 10,  0],
        [ 0, 11, 12,  0],
        [15,  0,  0, 16]],

       ...

       [[93,  0,  0, 94],
        [ 0, 89, 90,  0],
        [ 0, 91, 92,  0],
        [95,  0,  0, 96]]])
Is there an efficient way to do this using NumPy functionality, ideally vectorized?
We can make use of slicing -
import numpy as np

def expand(a):  # a is a 2D array with 8 values per row
    out = np.zeros((len(a), 4, 4), dtype=a.dtype)
    out[:, 1:3, 1:3] = a[:, :4].reshape(-1, 2, 2)  # first four values -> the 2x2 center
    out[:, ::3, ::3] = a[:, 4:].reshape(-1, 2, 2)  # last four values -> the four corners
    return out
The benefit is memory, and hence performance, efficiency: only the output occupies memory space. The intermediate steps work on views, thanks to the slicing on both the input and the output.
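As a quick check of that view claim (a minimal sketch, not part of the original answer), np.shares_memory confirms that the input slices copy nothing:

a = np.arange(1, 17).reshape(2, 8)

np.shares_memory(a, a[:, :4])  # True: slicing yields a view
np.shares_memory(a, a[:, 4:])  # True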
Sample run -
2D input:

In [223]: a
Out[223]:
array([[ 1,  2,  3,  4,  5,  6,  7,  8],
       [ 9, 10, 11, 12, 13, 14, 15, 16]])
In [224]: expand(a)
Out[224]:
array([[[ 5,  0,  0,  6],
        [ 0,  1,  2,  0],
        [ 0,  3,  4,  0],
        [ 7,  0,  0,  8]],

       [[13,  0,  0, 14],
        [ 0,  9, 10,  0],
        [ 0, 11, 12,  0],
        [15,  0,  0, 16]]])
1D input (feed a 2D-extended input using None):
In [225]: a = np.array([1, 2, 3, 4, 5, 6, 7, 8])

In [226]: expand(a[None])
Out[226]:
array([[[5, 0, 0, 6],
        [0, 1, 2, 0],
        [0, 3, 4, 0],
        [7, 0, 0, 8]]])
I have an array a
a = np.arange(5*5).reshape(5, 5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
and want to select the last two columns from rows one and two, and the first two columns of rows three and four.
The result should look like this
array([[ 3,  4, 10, 11],
       [ 8,  9, 15, 16]])
How can I do that in one go, without indexing twice and concatenating?
I tried using take
a.take([[0, 1, 2, 3], [3, 4, 0, 1]])

array([[0, 1, 2, 3],
       [3, 4, 0, 1]])
ix_
a[np.ix_([0, 1, 2, 3], [3, 4, 0, 1])]

array([[ 3,  4,  0,  1],
       [ 8,  9,  5,  6],
       [13, 14, 10, 11],
       [18, 19, 15, 16]])
and r_
a[np.r_[0:2, 2:4], np.r_[3:5, 0:2]]

array([ 3,  9, 10, 16])
and a combination of ix_ and r_
a[np.ix_([0, 1, 2, 3], np.r_[3:4, 0:1])]

array([[ 3,  0],
       [ 8,  5],
       [13, 10],
       [18, 15]])
Using integer advanced indexing, you can do something like this
index_rows = np.array([
    [0, 0, 2, 2],
    [1, 1, 3, 3],
])
index_cols = np.array([
    [-2, -1, 0, 1],
    [-2, -1, 0, 1],
])

a[index_rows, index_cols]
where you directly spell out which elements you want: output position (i, j) takes a[index_rows[i, j], index_cols[i, j]], and the negative column indices select from the end.
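Putting it together as a self-contained check (nothing here beyond the question's own arrays):

import numpy as np

a = np.arange(5 * 5).reshape(5, 5)
index_rows = np.array([[0, 0, 2, 2],
                       [1, 1, 3, 3]])
index_cols = np.array([[-2, -1, 0, 1],
                       [-2, -1, 0, 1]])

a[index_rows, index_cols]
# array([[ 3,  4, 10, 11],
#        [ 8,  9, 15, 16]])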
Is there a faster way to do np.convolve(A, B)[::N] using NumPy? It feels wasteful to compute the full convolution and then throw N - 1 out of every N values away... I could use a for loop or a list comprehension, but I thought it would be faster to use only native NumPy methods.
EDIT
Or does NumPy do lazy evaluation? I just saw this in a JS library; it would be awesome for NumPy as well:
// Get first 3 unique values
const arr = [1, 2, 2, 3, 3, 4, 5, 6];
const result = R.pipe(
  arr,
  R.map(x => {
    console.log('iterate', x);
    return x;
  }),
  R.uniq(),
  R.take(3)
); // => [1, 2, 3]
/**
 * Console output:
 * iterate 1
 * iterate 2
 * iterate 2
 * iterate 3
 */
A convolution is the product of your kernel and a sliding window on your array, followed by a sum. You can achieve the same thing manually using a rolling window:
First, let's see a dummy example:

A = np.arange(30)
B = np.ones(6)
N = 3

out = np.convolve(A, B)[::N]
print(out)

output: [  0.   6.  21.  39.  57.  75.  93. 111. 129. 147. 135.  57.]
Now we do the same with a rolling view, padding, and slicing:

from numpy.lib.stride_tricks import sliding_window_view as swv

out = (swv(np.pad(A, B.shape[0] - 1), B.shape[0])[::N] * B).sum(axis=1)
print(out)

output: [  0.   6.  21.  39.  57.  75.  93. 111. 129. 147. 135.  57.]
Intermediate sliding view:

swv(np.pad(A, B.shape[0] - 1), B.shape[0])

array([[ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  1],
       [ 0,  0,  0,  0,  1,  2],
       [ 0,  0,  0,  1,  2,  3],
       [ 0,  0,  1,  2,  3,  4],
       [ 0,  1,  2,  3,  4,  5],
       [ 1,  2,  3,  4,  5,  6],
       [ 2,  3,  4,  5,  6,  7],
       ...
       [24, 25, 26, 27, 28, 29],
       [25, 26, 27, 28, 29,  0],
       [26, 27, 28, 29,  0,  0],
       [27, 28, 29,  0,  0,  0],
       [28, 29,  0,  0,  0,  0],
       [29,  0,  0,  0,  0,  0]])
# with slicing
swv(np.pad(A, B.shape[0] - 1), B.shape[0])[::N]

array([[ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  1,  2,  3],
       [ 1,  2,  3,  4,  5,  6],
       [ 4,  5,  6,  7,  8,  9],
       [ 7,  8,  9, 10, 11, 12],
       [10, 11, 12, 13, 14, 15],
       [13, 14, 15, 16, 17, 18],
       [16, 17, 18, 19, 20, 21],
       [19, 20, 21, 22, 23, 24],
       [22, 23, 24, 25, 26, 27],
       [25, 26, 27, 28, 29,  0],
       [28, 29,  0,  0,  0,  0]])
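One caveat worth adding (not part of the original answer): np.convolve flips the kernel, so the rolling-window product above only matches np.convolve for symmetric kernels such as np.ones. For an arbitrary kernel, multiply by the reversed kernel instead:

# np.convolve computes sum(A[j] * B[k - j]), i.e. a correlation with B reversed,
# so flip B to reproduce it for non-symmetric kernels:
out = (swv(np.pad(A, B.shape[0] - 1), B.shape[0])[::N] * B[::-1]).sum(axis=1)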
I have a numpy array A:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
And another array B:
array([0, 1])
How can I get the following result by multiplying A and B?
array([[[ 0,  0,  0,  0],
        [ 0,  0,  0,  0],
        [ 0,  0,  0,  0]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
Thank you very much.
You need to reshape the second ndarray so both arrays have the same number of dimensions:
arr1 * arr2[:, None, None]
or
arr1 * arr2.reshape(2, 1, -1)
arr1.shape
# (2, 3, 4)
arr2[:, None, None].shape
# (2, 1, 1)
arr2.reshape(2, 1, -1).shape
# (2, 1, 1)
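As a quick, self-contained check with the arrays from the question: once B has shape (2, 1, 1), broadcasting stretches it along the remaining axes, so each 3x4 block of A is scaled by the matching scalar.

import numpy as np

A = np.arange(24).reshape(2, 3, 4)
B = np.array([0, 1])

result = A * B[:, None, None]  # block 0 is zeroed out, block 1 is unchanged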
I've got K feature vectors that all share the dimension n but have a variable dimension m (shape n x m). They all live in a list together.
to_be_padded = []

to_be_padded.append(np.reshape(np.arange(9), (3, 3)))
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

to_be_padded.append(np.reshape(np.arange(18), (3, 6)))
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17]])

to_be_padded.append(np.reshape(np.arange(15), (3, 5)))
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
What I am looking for is a smart way to zero pad the rows of these np.arrays such that they all share the same dimension m. I've tried solving it with np.pad but I have not been able to come up with a pretty solution. Any help or nudges in the right direction would be greatly appreciated!
The result should leave the arrays looking like this:
array([[0, 1, 2, 0, 0, 0],
       [3, 4, 5, 0, 0, 0],
       [6, 7, 8, 0, 0, 0]])

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17]])

array([[ 0,  1,  2,  3,  4,  0],
       [ 5,  6,  7,  8,  9,  0],
       [10, 11, 12, 13, 14,  0]])
You could use np.pad for that, which can also pad 2-D arrays using a tuple of values specifying the padding width, ((top, bottom), (left, right)). For that you could define:
def pad_to_length(x, m):
    # pad with zeros on the right of axis 1 until x has m columns
    return np.pad(x, ((0, 0), (0, m - x.shape[1])), mode='constant')
Usage
You could start by finding the ndarray with the highest number of columns. Say you have two of them, a and b:
a = np.array([[0, 1, 2],
              [3, 4, 5],
              [6, 7, 8]])

b = np.array([[ 0,  1,  2,  3,  4],
              [ 5,  6,  7,  8,  9],
              [10, 11, 12, 13, 14]])

m = max(i.shape[1] for i in [a, b])
# 5
And then use this parameter to pad the ndarrays:
pad_to_length(a, m)

array([[0, 1, 2, 0, 0],
       [3, 4, 5, 0, 0],
       [6, 7, 8, 0, 0]])
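Applied to the whole to_be_padded list from the question in one go (a short sketch reusing pad_to_length from above):

m = max(x.shape[1] for x in to_be_padded)
padded = [pad_to_length(x, m) for x in to_be_padded]  # every array is now (3, 6)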
I believe there is no very efficient solution for this: you will need to loop over the list with a for loop and treat every array individually:
maxM = max(arr.shape[1] for arr in to_be_padded)  # the longest m among the matrices in the list
n = to_be_padded[0].shape[0]                      # shared row count

for i in range(len(to_be_padded)):
    padded = np.zeros((n, maxM), dtype=to_be_padded[i].dtype)
    padded[:, :to_be_padded[i].shape[1]] = to_be_padded[i]
    to_be_padded[i] = padded
I have a bunch of matrices that I stored in a big dataframe. Let's say this is my dataframe:
data = pd.DataFrame([[13, 1, 3, 4, 0, 0], [0, 2, 6, 2, 0, 0], [3, 1, 5, 2, 2, 0], [0, 0, 10, 11, 6, 0], [5, 5, 21, 25, 41, 0],
                     [11, 1, 3, 2, 0, 1], [3, 1, 7, 3, 1, 1], [1, 1, 6, 5, 3, 1], [1, 1, 6, 7, 6, 1], [6, 6, 21, 24, 42, 1],
                     [17, 1, 7, 0, 0, 2], [1, 1, 6, 1, 1, 2], [2, 4, 6, 2, 1, 2], [0, 2, 11, 7, 8, 2], [5, 6, 17, 16, 46, 2],
                     [11, 1, 10, 2, 1, 3], [2, 2, 7, 1, 1, 3], [0, 0, 14, 4, 1, 3], [0, 0, 7, 7, 5, 3], [5, 1, 20, 18, 48, 3],
                     [16, 3, 7, 1, 2, 4], [1, 2, 4, 1, 0, 4], [2, 4, 7, 5, 3, 4], [3, 0, 4, 4, 7, 4], [7, 2, 13, 12, 58, 4]],
                    columns=['1', '2', '3', '4', '5', 'iteration'])
print(data)
Each value of data['iteration'] labels a matrix of its own. So, as you can see, there are 5 matrices here (iterations 0 to 4). I want to add them all together, as in basic matrix addition, to get one single matrix.
I have tried the following, but there's something wrong with it; it doesn't work:

matrix = data[['1','2','3','4','5']]
print(np.sum([matrix[matrix_list['iteration']==i] for i in range(0,9)], axis=0))
How do I do this the right way?
You can use:
In [98]: d = data.set_index('iteration')

In [99]: np.sum(d.loc[i].values for i in d.index.drop_duplicates().values)
Out[99]:
array([[ 68,   7,  30,   9,   3],
       [  7,   8,  30,   8,   3],
       [  8,  10,  38,  18,  10],
       [  4,   3,  38,  36,  32],
       [ 28,  20,  92,  95, 235]])
Or alternatively, use groupby():
np.sum(e[1].iloc[:, :-1].values for e in data.groupby('iteration'))

array([[ 68,   7,  30,   9,   3],
       [  7,   8,  30,   8,   3],
       [  8,  10,  38,  18,  10],
       [  4,   3,  38,  36,  32],
       [ 28,  20,  92,  95, 235]])
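Two side notes, flagged as additions rather than part of the original answer: recent NumPy versions deprecate passing a generator to np.sum, so Python's built-in sum is the safer spelling; and because each iteration occupies the same number of contiguous rows in the sample data, a reshape-based sum works as well:

# Built-in sum avoids the deprecated np.sum(generator) pattern:
total = sum(g.iloc[:, :-1].to_numpy() for _, g in data.groupby('iteration'))

# Reshape-based alternative, assuming contiguous, equally sized blocks:
mats = data[['1', '2', '3', '4', '5']].to_numpy()
total = mats.reshape(-1, 5, 5).sum(axis=0)  # stack to (n_iter, 5, 5), then add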