I have an array that has 8450 rows and 16 columns. I want to feed these data points into an RNN with each 50 points being an entry. So rows 0-49 are z=0, rows 1-50 are z=1, and so forth. The columns need to remain unchanged so that I still have the same data in each z-axis entry. So basically I am taking every chunk of 50 points and moving it into a third axis. Is there a simple way to do this in Python? I tried reshape but I may not have been doing it correctly. Currently the data is in a pandas dataframe.
points = 50
for i in range(len(data_prepped_dataframe) - points):
    x_data = data_prepped_dataframe.iloc[i:i+points, :]
So far I have this, but all it does is give me the last 50 points in the data set. I tried adding indexes to the x_data term, but that threw an error.
I tried
x_data[:,:,i] = data_prepped_dataframe.iloc[i:i+points,:]
but the error said x_data wasn't defined.
If you change the dataframe to an array using np.array() and collect each 50-row slice (adding .copy() at the end) instead of overwriting x_data, this will output the 3D array — specifically, in this case, an (8400, 50, 16) array.
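A minimal sketch of that idea (data_prepped_dataframe is assumed from the question):

import numpy as np

points = 50
arr = np.array(data_prepped_dataframe)           # shape (8450, 16)

# stack every 50-row window along a new first axis
x_data = np.array([arr[i:i + points].copy()
                   for i in range(len(arr) - points)])
print(x_data.shape)                              # (8400, 50, 16)

On NumPy 1.20+, numpy.lib.stride_tricks.sliding_window_view(arr, points, axis=0) builds the same windows without copying; note it yields 8401 windows (it includes the final one) with the window axis last, so a .transpose(0, 2, 1) is needed to match the shape above.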
New here. I have a 100x100x100 array. I want to select only the first 10x10x10 values in the array, then average across the first two dimensions for a 100x10 array. Then I need to pull 20 of the values out of the 100x10 and plot those 20 numbers on a scatterplot. Any help?
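No answer is attached here, and the shapes as stated don't quite line up (a 10x10x10 block averaged over its first two axes gives 10 values, not 100x10), so the following is only a sketch of the mechanics under one plausible reading, with all names invented:

import numpy as np
import matplotlib.pyplot as plt

cube = np.random.rand(100, 100, 100)

block = cube[:10, :10, :10]          # the literal "first 10x10x10" block
small_avg = block.mean(axis=(0, 1))  # shape (10,)

# to end up with a (100, 10) array as described, restrict only the
# last axis and average over the middle one instead
avg = cube[:, :, :10].mean(axis=1)   # shape (100, 10)

# pull 20 of the values out and scatter-plot them
idx = np.random.choice(avg.size, 20, replace=False)
vals = avg.ravel()[idx]
plt.scatter(range(20), vals)
plt.show()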
I have a 3D numpy array points of dimensions [10000x3000x128], where the first dimension is the number of frames, the second is the number of points in each frame, and the third is a 128-element feature vector associated with each point. What I want to do is efficiently filter the points in each frame using a boolean 2D mask of dimensions [10000x3000], and for each selected point also take the related 128-dim feature vector. Moreover, the output still needs to be a 3D array, not a merged 2D array, and I would like to avoid any for loop.
Actually what I'm doing is:
import numpy as np

# example of points (illustrative shape; substitute the real data)
points = np.zeros((10000, 3000, 128))
# fg, bg: 2D boolean arrays of shape (10000, 3000)

# init empty lists
fg_points, bg_points = [], []
for i in range(points.shape[0]):
    fg_mask_tmp, bg_mask_tmp = fg[i], bg[i]
    fg_points.append(points[i, fg_mask_tmp, :])
    bg_points.append(points[i, bg_mask_tmp, :])
fg_features, bg_features = np.array(fg_points), np.array(bg_points)
But this is quite a naive solution that can surely be improved in a more numpy-like way.
In addition, I also tried other solutions as:
fg_features = points[fg,:]
But this solution does not preserve the dimensions of the array: it merges the first two dimensions, since the number of filtered points in each frame can vary.
Another solution I tried was to enlarge the 2D masks by appending a [128]-long run of True values along a new last dimension, but without any success.
Does anyone know a possible efficient solution?
Thank you in advance for any help!
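No accepted answer is included here, but one loop-free approach (a sketch, not necessarily optimal) keeps the output 3D by padding every frame to the largest selection count, using a stable argsort plus np.take_along_axis:

import numpy as np

def masked_gather(points, mask, fill=0.0):
    # points: (frames, N, D); mask: (frames, N) boolean.
    # Returns (frames, k, D) with k = max points selected in any frame;
    # frames with fewer selections are padded with `fill`.
    counts = mask.sum(axis=1)
    k = counts.max()
    # a stable argsort of ~mask moves the True positions to the front
    # of each row while preserving their original order
    order = np.argsort(~mask, axis=1, kind="stable")[:, :k]           # (F, k)
    gathered = np.take_along_axis(points, order[:, :, None], axis=1)  # (F, k, D)
    valid = np.arange(k)[None, :] < counts[:, None]                   # (F, k)
    gathered[~valid] = fill            # blank out the padded rows
    return gathered, valid

# usage on toy shapes
points = np.random.rand(4, 6, 128)
fg = np.random.rand(4, 6) > 0.5
fg_features, fg_valid = masked_gather(points, fg)
print(fg_features.shape)               # (4, k, 128)

Some padding is unavoidable if the result must stay rectangular, since each frame can select a different number of points; the valid mask records which rows are real.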
I am trying to subtract two matrices of different shapes using broadcasting. However, I am stuck at one point and need a simple solution to the problem.
Concretely, I am evaluating data on a grid (the first step is subtracting). For example, I have 5 grid points, grid = (-20, -10, 0, 10, 20), and an array of data of length 100.
Line:
u = grid.reshape((ngrid, 1)) - data
works perfectly fine. ngrid = 5 in this trivial example.
The output is a matrix of 5 rows and 100 columns, so each data point is evaluated at each grid point.
Next I want to do it for 2 grids and 2 data sets simultaneously (the data has shape (100, 2), e.g. two randn arrays). I have already succeeded in subtracting the two data sets from one grid, but using two grids throws an error.
In the example below, a is the vertical array of the grid, of length 5 points, and data is an array of random data of shape (100, 2).
In this case u is an array of shape (2, 5, 100), so u[0] and u[1] each have 5 rows and 100 columns, meaning the data was subtracted correctly from the grid.
The second line of the code below is what I am trying to do; the first one works:
u = a - data.T[:, None]          # a is the vertical grid of 5 elements. Works ok.
u = grid_test - data.T[:, None]  # grid_test is a 5-row, 2-column matrix of the 2 grids. Error.
The error is the following:
ValueError: operands could not be broadcast together with shapes (5,2) (2,1,100)
What I need is the same kind of line as above, but it should work when "a" contains 2 columns, i.e. two different grids. So in the end the expected result is "u", which contains, in addition to the results described above, another two matrices where the same data (both arrays) is evaluated on the second grid.
Unfortunately I cannot use any loops - only vectorization and broadcasting.
Thanks in advance.
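No answer is attached here; a hedged sketch (names invented) that makes the shapes line up by inserting explicit length-1 axes:

import numpy as np

ngrid, ndata = 5, 100
grid_test = np.stack([np.linspace(-20, 20, ngrid),
                      np.linspace(-40, 40, ngrid)], axis=1)  # (5, 2): two grids
data = np.random.randn(ndata, 2)                             # (100, 2): two data sets

# every grid against every data set -> (2 grids, 2 data sets, 5, 100)
u = grid_test.T[:, None, :, None] - data.T[None, :, None, :]
print(u.shape)         # (2, 2, 5, 100)

# if instead grid i should pair only with data set i -> (2, 5, 100)
u_paired = grid_test.T[:, :, None] - data.T[:, None, :]
print(u_paired.shape)  # (2, 5, 100)

The original error arises because broadcasting (5,2) against (2,1,100) aligns the trailing axes 2 and 100, which are incompatible; transposing the grid to (2,5) and adding singleton axes puts grids, data sets, grid points, and data points each on their own axis.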
I have a 3D data cube and I am trying to make a plot of the first axis at a specific value of the other two axes. The goal is to make a velocity plot at given coordinates in the sky.
I have tried to create a 1D array from the 3D array by plugging in my values for the last two axes. This is what I have tried:
achan = 50
dchan = 200
lmcdata[:][achan][dchan]  # this array has three axes: vchan, achan, dchan
I am expecting an array of size 120, as there are 120 velocity channels making up the vchan axis. When trying the code above I keep getting an array of size 655, which is the number of entries along the dchan axis.
Python slicing works from left to right. In this case, lmcdata[:] is returning the whole lmcdata list. So, lmcdata[:][achan][dchan] is equivalent to just lmcdata[achan][dchan].
For higher-level indexing and slicing tasks like this, I highly recommend the numpy package. You will be able to slice lmcdata as expected after turning it into a numpy array: lmcdata = np.asarray(lmcdata).
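For example, a short sketch (shapes taken from the question, achan and dchan as above):

import numpy as np

lmcdata = np.asarray(lmcdata)        # axes: (vchan, achan, dchan), e.g. (120, ..., 655)
spectrum = lmcdata[:, achan, dchan]  # all velocity channels at one sky position
print(spectrum.shape)                # (120,)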
I am trying to create a histogram in python using matplotlib.pyplot.hist.
I have an array of data that varies; however, when I run my code, the histogram comes back with the values in all bins equal to each other, or equal to zero, which is not correct.
The histogram should look like the line graph above it, with bins of roughly the same height and in the same shape as the graph. The line graph above the histogram is there to illustrate what my data looks like and to show that it does vary.
My data array is called spectrumnoise and is just a function I have created against an array x
x = np.arange(0.1, 20.1, 0.1)
The code I am using to create the histogram and the line graph above it is
import matplotlib.pyplot as mpl
mpl.plot(x, spectrumnoise)
mpl.hist(spectrumnoise, bins=50, histtype='step')
mpl.show()
I have also tried using
mpl.hist((x, spectrumnoise), bins=50, histtype='step')
I have also changed the number of bins countless times to see if that helps, and tried normalising the histogram, but nothing works.
An image of the output of the code can be seen here.
The problem is that spectrumnoise is a list of arrays, not a numpy.ndarray. When you hand hist a list of arrays as its first argument, it treats each element as a separate dataset to plot. All the bins have the same height because each 'dataset' in the list has only one value in it!
From the hist docstring:
Multiple data can be provided via x as a list of datasets
of potentially different length ([x0, x1, ...]), or as
a 2-D ndarray in which each column is a dataset.
Try converting spectrumnoise to a 1D array:
mpl.hist(np.concatenate(spectrumnoise), bins=50)
As an aside, looking at your code there's absolutely no reason to convert your data to lists in the first place. What you ought to do is operate directly on slices of your array, e.g.:
data[20:40] += y1