I am working with multiple numpy objects that are 1D numpy arrays in which each element contains a 2D array. Strangely, the shape attribute does not reflect this and returns only the overall number of samples.
x_train.shape, x_test.shape, x_test.iloc[0].shape
#((22507,), (5627,), (25, 100))
This code snippet accomplishes the task but I am wondering if there is a better/numpy way to accomplish this.
x = []
[x.append(item) for item in x_train]
np.array(x).shape
# (22507, 25, 100)
I have searched through Stack Overflow, and although there are many reshape questions, I have not seen one that solves this problem efficiently.
A simple and efficient way to convert your 1D dtype=object array of 2D arrays into a single 3D array is:
np.stack(x_train)
But it would be more efficient to load the original data into a 3D array in the first place.
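To make the np.stack suggestion concrete, here is a minimal sketch. The x_train below is a made-up stand-in (four small arrays instead of 22507), built as a 1D object array so that it reproduces the (n,) shape from the question.

```python
import numpy as np

# Hypothetical stand-in for x_train: a 1-D object array holding four (25, 100) arrays.
x_train = np.empty(4, dtype=object)
for i in range(len(x_train)):
    x_train[i] = np.full((25, 100), float(i))

print(x_train.shape)   # (4,)  only the number of samples is visible

# np.stack joins the elements along a new leading axis.
stacked = np.stack(x_train)
print(stacked.shape)   # (4, 25, 100)
```

This only works when every element has the same shape; np.stack raises a ValueError otherwise.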
Related
I have a question about how to efficiently apply a function that takes an m-dimensional slice of an n-dimensional array as input.
For example, I have an n-dimensional array of shape (i,j,k,l). And on the dimensions (j,l), I want to apply the function, which gives me back a matrix of shape (j,l). The resulting numpy array should again have the shape (i,j,k,l).
For example, I want to apply the following normalisation function
def norm(arr2d):
    return arr2d - np.mean(arr2d)
over the array
arrnd = np.arange(2*3*4*5).reshape(2,3,4,5) # Shape is (2,3,4,5)
on the slice (j,l).
The result I want to achieve I would get via a (slow?) Python list comprehension and moving axes.
result = np.asarray([[norm(arrnd[:, j, :, l]) for l in range(5)] for j in range(3)])  # Shape is (3,5,2,4)
result = np.moveaxis(np.moveaxis(result, 2, 0), 2, 3)  # Shape is (2,3,4,5) again
Is there any better, more "numpyic" way to achieve this, without any involved loops?
I already looked at np.apply_along_axis() and np.apply_over_axes(), but the former only works for 1-d functions, and the latter might only work if my function is implemented as a ufunc.
The example I provided is just a toy example. The solution should work for any python function.
(If normalising a slice were my specific problem, I could have circumvented the Python loop and moveaxis by using the ufunc's axes=(..).)
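For the toy normalisation only, the shortcut mentioned in the last remark might look like the sketch below. It assumes that "axes" means reducing over the two axes that the loop slices out (axes 0 and 2 here) with keepdims=True, and it checks the result against the loop version. It does not generalise to an arbitrary Python function.

```python
import numpy as np

def norm(arr2d):
    return arr2d - np.mean(arr2d)

arrnd = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

# Loop version from the question: one 2-D slice per (j, l) pair.
looped = np.asarray([[norm(arrnd[:, j, :, l]) for l in range(5)] for j in range(3)])
looped = np.moveaxis(np.moveaxis(looped, 2, 0), 2, 3)   # back to (2, 3, 4, 5)

# Vectorised version: subtract the mean taken over axes 0 and 2,
# keeping those axes so the result broadcasts against arrnd.
vectorised = arrnd - arrnd.mean(axis=(0, 2), keepdims=True)

print(looped.shape)                      # (2, 3, 4, 5)
print(np.allclose(looped, vectorised))   # True
```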
I'm trying to turn a list of 2d numpy arrays into a 2d numpy array. For example,
dat_list = []
for i in range(10):
    dat_list.append(np.zeros([5, 10]))
What I would like to get out of this list is an array that is (50, 10). However, when I try the following, I get a (10,5,10) array.
output = np.array(dat_list)
Thoughts?
You want to stack them vertically:
np.vstack(dat_list)
The accepted answer above is correct for a list of 2D arrays, as you requested. For 3D input arrays, though, vstack() joins along the existing first axis instead of adding a new one, which may be a surprising outcome. For those, use np.stack(<list of 3D arrays>, 0).
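A short sketch of the difference between these calls, using the dat_list from the question plus a made-up list of 3D arrays (cubes) for the second case:

```python
import numpy as np

dat_list = [np.zeros([5, 10]) for _ in range(10)]

print(np.array(dat_list).shape)    # (10, 5, 10)  new leading axis
print(np.vstack(dat_list).shape)   # (50, 10)     joined along the rows

# Hypothetical 3-D inputs: vstack still joins along the existing first axis,
# while stack adds a new one.
cubes = [np.zeros((2, 5, 10)) for _ in range(3)]
print(np.vstack(cubes).shape)      # (6, 5, 10)
print(np.stack(cubes, 0).shape)    # (3, 2, 5, 10)
```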
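See https://docs.scipy.org/doc/numpy/reference/generated/numpy.append.html for details. You can use np.append, but you will want to specify the axis on which to append. Note that np.append works on arrays and returns a new array; the append method of a Python list takes no axis argument.
output = np.append(output, np.zeros([5, 10]), axis=0)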
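As a rough sketch of the np.append route: the starting array output with shape (0, 10) is my own assumption, chosen so that the first append has something to attach to.

```python
import numpy as np

# Start from an empty array with zero rows and the right number of columns.
output = np.empty((0, 10))
for i in range(10):
    # np.append returns a new array each time; it does not modify output in place.
    output = np.append(output, np.zeros([5, 10]), axis=0)

print(output.shape)   # (50, 10)
```

Because every call copies the whole array, collecting the pieces in a list and calling np.vstack once is usually the cheaper option.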
I'm trying to convert a list of N-D arrays to a DataFrame in order to run an Isomap on it, but the conversion fails. Does anyone know how to convert it so that I can run an Isomap on it?
# Create and fill the list of samples
samples = list()
for i in range(72):
    img = misc.imread('Datasets/ALOI/32/32_r'+str(i*5)+'.png')
    samples.append(img)
...
df = pd.DataFrame(samples)  # This doesn't work; it raises
# ValueError: Must pass 2-d input
...
iso = manifold.Isomap(n_neighbors=4, n_components=3)
iso.fit(df) #The end goal of my DataFrame
That is obvious, isn't it? All images are 2D data, rows and columns. Stacking them in a list causes it to gain a third dimension. DataFrames are by nature 2D. Hence the error.
You have 2 possible fixes:
1. Create a Panel (only available in older pandas; Panel was removed in pandas 0.25):
wp = pd.Panel.from_dict({str(i*5): pd.DataFrame(sample) for i, sample in enumerate(samples)})
2. Stack your arrays one on top of the other, or side by side:
# On top of another:
df = pd.concat([pd.DataFrame(sample) for sample in samples], axis=0,
keys=[str(i*5) for i in range(72)])
# Side by side:
df = pd.concat([pd.DataFrame(sample) for sample in samples], axis=1,
keys=[str(i*5) for i in range(72)])
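To see what the two concat variants produce, here is a small sketch with made-up data: three 4x5 arrays stand in for the 72 images, and the keys follow the same str(i*5) pattern.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the images: three 4x5 arrays.
samples = [np.full((4, 5), i) for i in range(3)]
keys = [str(i * 5) for i in range(3)]

on_top = pd.concat([pd.DataFrame(s) for s in samples], axis=0, keys=keys)
side_by_side = pd.concat([pd.DataFrame(s) for s in samples], axis=1, keys=keys)

print(on_top.shape)             # (12, 5)  rows carry a (key, row) MultiIndex
print(side_by_side.shape)       # (4, 15)  columns carry a (key, column) MultiIndex
print(on_top.loc['5'].shape)    # (4, 5)   the second image recovered by its key
```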
Another way to do it is to convert your 2D arrays (images) to 1D arrays (that are expected by sklearn) using the reshape method on the images:
for i in range(yourRange):
    img = misc.imread(yourFile)
    samples.append(img.reshape(-1))
df = pd.DataFrame(samples)
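A minimal sketch of the flattening idea with random arrays in place of the image files (the 6x8 size and the images name are made up). Each image becomes one row, which is the (n_samples, n_features) layout that sklearn estimators expect.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the 72 images: small 6x8 integer arrays.
images = [rng.integers(0, 256, size=(6, 8)) for _ in range(72)]

# Flatten each image to 1-D before collecting it.
samples = [img.reshape(-1) for img in images]

data = np.array(samples)
print(data.shape)   # (72, 48)  one row per image, one column per pixel
```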
Olivera's answer has the right idea.
the problem
When you run misc.imread, the output is an NxM (2D) array. Putting these arrays in a list makes the data 3D. DataFrame expects a 2D input.
the fix
Before it goes in the list, the array should be 'flattened', for example using ravel:
img = misc.imread('Datasets/ALOI/32/32_r'+str(i*5)+'.png').ravel()
a note on .reshape(-1)
reshape(-1) also returns a 1D array of length N*M, so for this purpose it is equivalent to ravel(). What does not work is reshape(-1, 1), which keeps the array 2D with shape (N*M, 1).
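A quick check of how these calls behave on a small made-up array; note that reshape(-1) and ravel() give the same 1D result, and only reshape(-1, 1) keeps a second axis:

```python
import numpy as np

img = np.arange(12).reshape(3, 4)   # stand-in for an NxM image

flat_ravel = img.ravel()
flat_reshape = img.reshape(-1)
column = img.reshape(-1, 1)

print(flat_ravel.shape)     # (12,)
print(flat_reshape.shape)   # (12,)
print(column.shape)         # (12, 1)  still 2-D
print(np.array_equal(flat_ravel, flat_reshape))   # True
```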
I have several N-dimensional arrays of different shapes and want to combine them into a new (N+1)-dimensional array, where the new axis has a length corresponding to the number of initial N-d arrays.
This answer is sufficient if the original arrays are all the same shape; however, it does not work if they have different shapes.
I don't really want to reshape the arrays to a congruent size and fill with empty elements due to the subsequent analysis I need to perform on the final array.
Specifically, I have four 4D arrays. One of the things I want to do with the resulting 5D array is plot parts of the four arrays on the same matplotlib figure. Obviously I could plot each one separately, however soon I will have more than four 4D arrays and am looking for a dynamic solution.
While I was writing this, Sven gave the same answer in the comments...
Put the arrays in a python list in the following manner:
list_5d = []
list_5d.append(array_4d_1)
list_5d.append(array_4d_2)
...
Then you can iterate over them:
for array_4d in list_5d:
    pass  # plot array_4d on the figure here
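A sketch of the list approach with made-up 4D arrays of different shapes. The plotting step is replaced by collecting each array's shape and mean, since the actual plot calls depend on your data.

```python
import numpy as np

# Hypothetical 4-D arrays with different shapes; they cannot form a regular 5-D array.
arrays_4d = [
    np.zeros((2, 3, 4, 5)),
    np.ones((3, 3, 2, 6)),
    np.full((1, 2, 2, 2), 7.0),
]

summary = []
for array_4d in arrays_4d:
    # Replace this with the plotting code for one array.
    summary.append((array_4d.shape, float(array_4d.mean())))

for shape, mean in summary:
    print(shape, mean)
```

The list grows with however many arrays you add, so the loop does not need to change when more than four arrays arrive.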
I have two (or sometimes more) matrixes, which I want to combine to a tensor. The matrixes e.g. have the shape (100, 400) and when they are combined, they should have the dimensions (2, 100, 400).
How do I do that? I tried it the same way I created matrixes from vectors, but that didn't work:
tensor = numpy.concatenate(list_of_matrixes, axis=0)
Probably you want
tensor = np.array(list_of_matrices)
np.array([...]) just loves to combine the inputs into a new array along a new axis. In fact it takes some effort to prevent that.:)
To use concatenate you need to add an axis to your arrays. axis=0 means 'join on the current 1st axis', so it would produce a (200,400) array.
np.concatenate([arr1[None,...], arr2[None,...]], axis=0)
would do the trick, or more generally
np.concatenate([arr[None,...] for arr in list_arr], axis=0)
If you look at the code for dstack, hstack, vstack you'll see that they do this sort of dimension adjustment before passing the task to concatenate.
The np.array solution is easy, but the concatenate solution is a good learning opportunity.
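The options above can be compared side by side in a small sketch. The two matrices are made up, and np.stack is included as a further way to add the new axis.

```python
import numpy as np

list_arr = [np.zeros((100, 400)), np.ones((100, 400))]

via_array = np.array(list_arr)
via_concat = np.concatenate([arr[None, ...] for arr in list_arr], axis=0)
via_stack = np.stack(list_arr, axis=0)

print(via_array.shape, via_concat.shape, via_stack.shape)   # (2, 100, 400) each

# Without the added axis, concatenate joins along the existing first axis.
joined = np.concatenate(list_arr, axis=0)
print(joined.shape)   # (200, 400)
```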