I want to stack the first n columns of a 2D array vertically. My current implementation is the following:
np.vstack(input_seq[:,:n].flatten().tolist())
I am wondering whether stacking the 1D array directly would be faster: np.vstack(input_seq[:,:n].flatten())
Or is there a faster approach altogether? I'm asking because I will repeat this process millions of times.
Any hint would be appreciated! Thanks!
Just reshape your array:
new = input_seq[:, :n].reshape(-1)
Since you're slicing first, the reshaped array is already a copy, so you can modify it without changing the original array (otherwise a reshaped array points to the same data).
Note that this method makes new one dimensional, while your methods make it two dimensional. If you need your new array to be two dimensional, just reshape it with an extra dimension of 1:
new = input_seq[:, :n].reshape(-1, 1)
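A minimal sketch, assuming a small example array purely for illustration:
import numpy as np

input_seq = np.arange(12).reshape(3, 4)   # hypothetical example data
n = 2

new = input_seq[:, :n].reshape(-1)        # 1D, shape (6,)
new_2d = input_seq[:, :n].reshape(-1, 1)  # 2D column, shape (6, 1)

# the sliced block is non-contiguous, so reshape returned a copy here
print(np.shares_memory(new, input_seq))   # False: safe to modify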
I am a bit new to Python and I have a large array of data whose shape is as follows:
print(mydata.shape)
>>> (487, 24, 12 , 13)
One slice of this array looks like this:
[4.22617843e-11 5.30694273e-11 3.73693923e-11 2.09353628e-11
2.42581129e-11 3.87492538e-11 2.34626762e-11 1.87155829e-11
2.99512706e-11 3.32095254e-11 4.91165476e-11 9.57019117e-11
7.86496424e-11]]]]
I am trying to take all the elements from this multi-dimensional array and put them into a one-dimensional one so I can graph it.
I would appreciate any help. Thank you.
mydata.ravel()
will give you the array flattened (as a view where possible, a copy otherwise) to shape:
(487*24*12*13,)
or...
(1823328,)
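For instance, on a stand-in array with the same number of dimensions (shape scaled down here purely for illustration):
import numpy as np

mydata = np.zeros((4, 3, 2, 5))  # hypothetical small stand-in
flat = mydata.ravel()
print(flat.shape)                # (120,) i.e. 4*3*2*5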
You can do this by using flatten()
mydata.flatten(order='C')
The order parameter controls the order in which items from the NumPy array are read:
'C': read items from the array row-wise, i.e. using C-like index order.
'F': read items from the array column-wise, i.e. using Fortran-like index order.
'A': read items from the array based on the memory order of the items.
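A quick illustration of the difference between 'C' and 'F' order:
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
print(a.flatten(order='C'))  # [1 2 3 4]  row-wise
print(a.flatten(order='F'))  # [1 3 2 4]  column-wise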
I know that we can concatenate two 2-D NumPy arrays named arr1 and arr2 with the same number of rows using the following command:
np.concatenate((arr1,arr2),axis=1)
But I have n NumPy arrays (I haven't assigned global variable names to them) in a list, say list_array, which contains n elements where each element is a 2-D array. A loop or any efficient approach would be okay.
Question
How can I concatenate these elements of the list, which are 2-D arrays, column-wise?
Thank you
I am not from a CS background. Any help will be appreciated.
Just a side note: concatenating with np.concatenate on axis=1 is equivalent to a horizontal stack, np.hstack:
>>> np.hstack(list_array)
vs
>>> np.concatenate(list_array, axis=1)
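A minimal sketch with a hypothetical list_array of three 2-D arrays sharing the same number of rows:
import numpy as np

list_array = [np.ones((3, 2)), np.zeros((3, 1)), np.full((3, 2), 7.0)]

out_h = np.hstack(list_array)
out_c = np.concatenate(list_array, axis=1)
print(out_h.shape)                   # (3, 5)
print(np.array_equal(out_h, out_c))  # True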
I have a 2D array of shape (t*40, 6) which I want to convert into a 3D array of shape (t, 40, 5) for the LSTM's input data layer. The desired conversion is shown in the figure below. Here, F1...5 are the 5 input features, T1...40 are the time steps for the LSTM, and C1...t are the various training examples. Basically, for each unique Ct, I want a "T x F" 2D array, and I want to concatenate them all along the 3rd dimension. I do not mind losing the value of Ct as long as each Ct ends up in a different dimension.
I have the following code to do this by looping over each unique Ct, and appending the "T X F" 2D arrays in 3rd dimension.
import pandas as pd

# load the 2D data
data = pd.read_csv('LSTMTrainingData.csv')
trainX = []
# loop over each unique ct and append its "T x F" subset along the 3rd dimension
for index, ct in enumerate(data.ct.unique()):
    trainX.append(data[data['ct'] == ct].iloc[:, 1:])
However, there are over 1,800,000 such Ct's, so looping over each unique Ct is quite slow. I am looking for suggestions on doing this operation faster.
EDIT:
data_3d = data.values.reshape(t, 40, 6)  # the DataFrame as a NumPy array
trainX = data_3d[:, :, 1:]               # drop the ct column
This is the solution for the original question posted.
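As a sketch with dummy numbers (assuming every Ct has exactly 40 rows and the first column holds ct):
import numpy as np

t = 3  # assumed small for illustration
data_2d = np.arange(t * 40 * 6, dtype=float).reshape(t * 40, 6)

data_3d = data_2d.reshape(t, 40, 6)
trainX = data_3d[:, :, 1:]  # drop the ct column
print(trainX.shape)         # (3, 40, 5)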
Updating the question with an additional problem: the T1...40 time steps can have the highest number of steps = 40, but it could be less than 40 as well. The rest of the values can be 'np.nan' out of the 40 slots available.
Since not all Ct have the same length, you have no choice but to rebuild a new block.
But using data[data['ct'] == ct] in a loop can be O(n²), so it's a bad way to do it.
Here is a solution using Panel. cumcount renumbers the lines within each Ct:
import pandas as pd
from numpy.random import randint

t = 5
CFt = randint(0, t, (40 * t, 6)).astype(float)  # 2D data
df = pd.DataFrame(CFt)
# index by (ct value, position within that ct), then sort
df2 = df.set_index([df[0], df.groupby(0).cumcount()]).sort_index()
df3 = df2.to_panel()
This automatically fills missing data with NaN. But it warns:
DeprecationWarning:
Panel is deprecated and will be removed in a future version.
The recommended way to represent these types of 3-dimensional data are with a MultiIndex on a DataFrame, via the Panel.to_frame() method
So perhaps working with df2 is the recommended way to manage your data.
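Panel has since been removed from pandas entirely, so here is a sketch of the MultiIndex route the warning points to, reusing df2 from above; the unstack/reshape ordering is an assumption based on how unstack lays out its columns, so verify it on your data:
# unstack moves the within-ct counter into the columns, padding
# missing steps with NaN, much like to_panel() did
wide = df2.unstack()                 # rows: ct, columns: (feature, step)
n_ct, n_feat = wide.shape[0], df2.shape[1]
n_steps = wide.shape[1] // n_feat

# columns are ordered (feature, step), so reshape and then swap axes
block = wide.to_numpy().reshape(n_ct, n_feat, n_steps).transpose(0, 2, 1)
print(block.shape)                   # (n_ct, n_steps, n_feat)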
I have the following numpy arrays:
a truth table of shape (n, 1), and a matrix of shape (n, k), where n is 5 and k is 2 in this example.
btable = np.array([[True],[False],[False],[True],[True]])
bb=np.array([[1.842,4.607],[5.659,4.799],[6.352,3.290],[2.904,4.612],[3.231,4.939]])
I would like to extract the vectors in bb according to the indexing values in btable.
I tried choicebb=bb[btable==True], which gets me the result
[ 1.84207953 2.90401653 3.23197916]
choicebb=bb[btable] gets me the same result.
What I want instead is
[[1.842,4.607]
[2.904,4.612]
[3.231,4.939]]
I also tried
choicebb=bb[btable==True,:]
but then I would get
---> 13 choicebb=bb[btable==True,:]
14 print(choicebb)
IndexError: too many indices for array
This can be done easily in MATLAB with choicebb=bb(btable,:);
Get the 1D version of the mask with np.ravel() or slice out the first column with [:,0] and use it for logical indexing into the data array, like so -
bb[btable.ravel()]
bb[btable[:,0]]
Note that bb[btable.ravel()] is essentially bb[btable.ravel(), :]. In NumPy, we can skip the trailing axes if all their elements are to be selected, which is why it simplifies to bb[btable.ravel()].
Explanation: To index into a single axis while selecting all elements along the rest of the axes, we need to feed in a 1D array (boolean or integer) along that axis and use : along the leftover axes. In our case, we are indexing into the first axis to select rows, so we need a boolean array along that axis and : along the rest of the axes.
When we feed in the 2D version of the mask, it indexes along the corresponding multiple axes. So when we feed in an (N, 1)-shaped boolean array, we select the correct rows but also only the first column's elements, which is not the intended output.
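Putting it together with the arrays from the question:
import numpy as np

btable = np.array([[True], [False], [False], [True], [True]])
bb = np.array([[1.842, 4.607], [5.659, 4.799], [6.352, 3.290],
               [2.904, 4.612], [3.231, 4.939]])

print(bb[btable.ravel()])  # whole rows, shape (3, 2)
print(bb[btable[:, 0]])    # same result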
I have a pretty big NumPy matrix (a 2-D array with more than 1000 * 1000 cells) and another 2-D array of indexes of the form [[x1,y1],[x2,y2],...,[xn,yn]], which is also quite large (n > 1000). I want to extract all the cells of the matrix whose (x, y) coordinates appear in the index array, as efficiently as possible, i.e. without loops. If the array were an array of tuples I could just do
cells = matrix[array]
and get what I want, but the array is not in that format, and I couldn't find an efficient way to convert it to the desired form...
You can make your array into a tuple of arrays like this:
tuple(array.T)
This matches the output style of np.where(), which can be used directly for indexing.
cells=matrix[tuple(array.T)]
You can also use standard NumPy integer-array indexing to get Divakar's answer from the comments:
cells=matrix[array[:,0],array[:,1]]
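A minimal sketch with a small hypothetical matrix and index array:
import numpy as np

matrix = np.arange(16).reshape(4, 4)        # hypothetical data
array = np.array([[0, 1], [2, 3], [3, 0]])  # hypothetical (x, y) pairs

print(matrix[tuple(array.T)])            # [ 1 11 12]
print(matrix[array[:, 0], array[:, 1]])  # same: [ 1 11 12]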