I want to convert a NumPy array of shape (X, Y, Z) into a NumPy array of shape (X*Z, Y).
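Code (slow):
import numpy as np

def rearrange(data):
    samples, channels, t_insts = data.shape
    # Start with zero rows and one column per channel
    append_data = np.empty([0, channels])
    for sample in range(samples):
        for t_inst in range(t_insts):
            # All channel values for this sample at this time instant
            channel_data = data[sample, :, t_inst]
            append_data = np.vstack((append_data, channel_data))
    # Return the rearranged array itself (the original returned only its shape)
    return append_data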
I am looking for a faster, vectorized approach, if possible.
You can use np.transpose to swap the last two axes, so that the channel axis comes last, and then reshape -
data.transpose(0,2,1).reshape(-1,data.shape[1])
Or use np.swapaxes to swap the same two axes and then reshape -
data.swapaxes(1,2).reshape(-1,data.shape[1])
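As a quick sanity check, the sketch below compares both one-liners against a loop equivalent to the one in the question. The small test array and the helper name rearrange_loop are assumptions made for illustration only.

```python
import numpy as np

def rearrange_loop(data):
    # Loop-based reference: one output row per (sample, time instant) pair
    samples, channels, t_insts = data.shape
    rows = []
    for sample in range(samples):
        for t_inst in range(t_insts):
            rows.append(data[sample, :, t_inst])
    return np.vstack(rows)

# Small illustrative array with shape (X, Y, Z) = (2, 3, 4)
data = np.arange(2 * 3 * 4).reshape(2, 3, 4)

via_transpose = data.transpose(0, 2, 1).reshape(-1, data.shape[1])
via_swapaxes = data.swapaxes(1, 2).reshape(-1, data.shape[1])

print(via_transpose.shape)  # (8, 3), i.e. (X*Z, Y)
print(np.array_equal(via_transpose, rearrange_loop(data)))
```

Both vectorized versions should produce the rows in the same order as the nested loop, because the sample axis stays first and the time axis becomes the faster-varying one after the swap.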
I'm trying to discretize the columns of a 2D array into equal-sized buckets.
A simple 2D array example, which contains NaNs:
import numpy as np
import pandas as pd
np.random.seed(0)
sarray = np.random.rand(500,500)
sarray[sarray>0.9] = np.nan
I tried using the Pandas qcut function:
pd.qcut(sarray, q=10, labels=False, duplicates='drop')
But it only supports 1D arrays:
ValueError: Input array must be 1 dimensional
I can get the results with a list comprehension:
[pd.qcut(sarray[:,col], q=10, labels=False, duplicates='drop') for col in range(sarray.shape[1])]
But this approach is not vectorizing the calculation. Is there a way with Numpy to vectorize this discretization problem (i.e. to perform the calculation on a single 2D array instead of on multiple 1D arrays)?
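One possible NumPy-only direction, sketched here under assumptions: compute per-column quantile edges with np.nanquantile and count how many edges each value exceeds. The helper name qcut_columns is hypothetical, and this sketch is not guaranteed to reproduce pd.qcut exactly; in particular it does nothing equivalent to duplicates='drop' when edges repeat.

```python
import numpy as np

def qcut_columns(arr, q=10):
    """Hypothetical helper: quantile-bin each column of a 2D array, ignoring NaNs."""
    # Interior quantile levels, e.g. 0.1, 0.2, ..., 0.9 for q=10
    probs = np.linspace(0, 1, q + 1)[1:-1]
    # Per-column bin edges with shape (q - 1, n_cols); NaNs are skipped
    edges = np.nanquantile(arr, probs, axis=0)
    # The bin index of a value is the number of edges it strictly exceeds.
    # Comparisons against NaN are False, so NaN positions are restored below.
    bins = (arr[None, :, :] > edges[:, None, :]).sum(axis=0).astype(float)
    bins[np.isnan(arr)] = np.nan
    return bins

np.random.seed(0)
sarray = np.random.rand(500, 500)
sarray[sarray > 0.9] = np.nan

binned = qcut_columns(sarray, q=10)
print(binned.shape)
```

The comparison builds a temporary boolean array of shape (q - 1, rows, cols), so memory use grows with the number of buckets; for very large inputs a per-column loop may still be the more practical choice.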
I have a matrix A with 500 rows and 1024 columns. I would like to generate a matrix consisting of evenly spaced columns from A, say with step size 2^5. How do I do this in Numpy? I haven't seen this explained in the references I have.
You can just use slicing:
import numpy as np
arr = np.random.rand(512,1024)
step_size = 2 ** 5
arr[:, ::step_size] # shape is (512, 32)
This keeps all the rows and takes every step_size-th column, starting from the first. You can read more about NumPy indexing at the following link:
https://numpy.org/doc/stable/user/basics.indexing.html?highlight=indexing#other-indexing-options
You can apply the same logic to the rows or to both rows and columns to get a more sophisticated slicing.
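A small sketch of stepping along rows, or along rows and columns at once. The array size and the step values here are arbitrary choices for illustration.

```python
import numpy as np

arr = np.arange(8 * 12).reshape(8, 12)

# Every second row, all columns
every_other_row = arr[::2, :]

# Every second row and every fourth column
rows_and_cols = arr[::2, ::4]

print(every_other_row.shape)  # (4, 12)
print(rows_and_cols.shape)    # (4, 3)
```

Basic slicing like this returns a view rather than a copy, so writing into the result also changes arr; call .copy() if an independent array is needed.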
I have a numpy array a with shape (m,n,3), and I want to index into its first two axes using another array idx of shape (100,2), where each row of idx is an (x, y) pair. So what I want is the following:
np.array([a[x,y,:] for x,y in idx])
What's the most efficient way to do this?
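One common approach, sketched here with made-up sizes, is integer array indexing: pass the two columns of idx as separate index arrays for the first two axes. The values of m and n and the random data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 7
a = rng.random((m, n, 3))

# 100 (x, y) pairs, each within the bounds of the first two axes
idx = np.column_stack((rng.integers(0, m, size=100),
                       rng.integers(0, n, size=100)))

# Index arrays for axis 0 and axis 1 are paired elementwise
picked = a[idx[:, 0], idx[:, 1]]

# Reference result from the list comprehension in the question
expected = np.array([a[x, y, :] for x, y in idx])

print(picked.shape)  # (100, 3)
print(np.array_equal(picked, expected))
```

Because the trailing axis is left unindexed, each (x, y) pair selects the full length-3 vector, which matches what a[x,y,:] does in the loop.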
I have a numpy boolean vector of shape 1 x N, and a 2D array with shape 160 x N. What is a fast way of subsetting the columns of the 2D array such that a column is kept wherever the boolean vector is True and discarded wherever it is False?
If you call the vector mask and the array features, I've found the following to be far too slow: np.array([f[mask] for f in features])
Is there a better way? I feel like there has to be, right?
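You can index the columns directly with the boolean vector:
new_array = features[:, mask]
The position of the mask in the index decides which axis is filtered: here it sits on the second axis, so columns are dropped, while features[mask, :] would drop rows instead. The mask must be 1-D for this to work; if it has shape (1, N), flatten it first with mask.ravel(). If you end up with a 1-D result, you can reshape it to the shape you need. This avoids the Python-level loop, so it should also be faster.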
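A minimal sketch of the column selection, using the names features and mask from the question. The array contents and the particular mask are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.random((160, 10))
mask = np.array([True, False, True, True, False,
                 False, True, False, False, True])

# Keep only the columns where mask is True
kept = features[:, mask]

# Loop version from the question, used here only as a reference
slow = np.array([f[mask] for f in features])

# If the mask is stored with shape (1, N), flatten it before indexing
mask_2d = mask.reshape(1, -1)
kept_from_2d = features[:, mask_2d.ravel()]

print(kept.shape)  # (160, 5)
print(np.array_equal(kept, slow))
```

Boolean indexing returns a copy, so modifying kept leaves features unchanged.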
I have a numpy array of shape (3,12,7,5). I would like to have the sum of all slices along the first axis of this array.
data = np.random.randint(low=0, high=8000, size=3*12*7*5).reshape(3,12,7,5)
data[0,...].sum()
data[1,...].sum()
data[2,...].sum()
np.array((data[0,...].sum(), data[1,...].sum(), data[2,...].sum()))
At first I thought this should be possible using np.sum(data, axis=...), but I could not get it to work.
How do I perform this calculation in a single shot? What is the numpy magic?
For a generic ndarray, you could reshape into a 2D array, keeping the number of elements along the first axis same and merging all of the remaining axes as the second axis and finally sum along that axis, like so -
data.reshape(data.shape[0],-1).sum(axis=1)
For a 4D array, you could include the axes along which the summation is to be performed. So, to solve our case, we would have -
data.sum(axis=(1,2,3))
This could be extended to make it work for generic ndarrays by creating a tuple of appropriate axis IDs and thus avoid reshaping, like so -
data.sum(axis=tuple(np.arange(1,data.ndim)))
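The three variants can be checked against the manual per-slice sums from the question. This sketch uses a deterministic array instead of random data so the comparison is reproducible; that substitution is an assumption made for illustration.

```python
import numpy as np

data = np.arange(3 * 12 * 7 * 5).reshape(3, 12, 7, 5)

# Manual per-slice sums, as in the question
manual = np.array([data[i, ...].sum() for i in range(data.shape[0])])

# Reshape to 2D, then sum along the merged axis
via_reshape = data.reshape(data.shape[0], -1).sum(axis=1)

# Sum over an explicit tuple of axes
via_axes = data.sum(axis=(1, 2, 3))

# Generic form for any number of dimensions
via_generic = data.sum(axis=tuple(range(1, data.ndim)))

print(via_axes.shape)  # (3,)
print(np.array_equal(manual, via_axes))
```

The generic form here uses the built-in range rather than np.arange; both yield the same axis tuple, and range avoids creating an intermediate array.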