Convert a list of ndarrays to a DataFrame - Python

I'm trying to convert a list of ND arrays to a DataFrame in order to run Isomap on it, but the conversion fails. Does anyone know how to convert it so that I can run Isomap on it?
# Creation and filling of list samples
samples = list()
for i in range(72):
    img = misc.imread('Datasets/ALOI/32/32_r' + str(i*5) + '.png')
    samples.append(img)
...
df = pd.DataFrame(samples)  # This doesn't work, gives
# ValueError: Must pass 2-d input
...
iso = manifold.Isomap(n_neighbors=4, n_components=3)
iso.fit(df)  # The end goal of my DataFrame

That is obvious, isn't it? All images are 2D data (rows and columns); stacking them in a list adds a third dimension, and DataFrames are by nature 2D. Hence the error.
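For illustration, here is a minimal sketch (toy 2D arrays standing in for the images) that reproduces the error:
import numpy as np
import pandas as pd

# three toy 2D "images" of shape (4, 5); together they form a 3 x 4 x 5 block
samples = [np.zeros((4, 5)) for _ in range(3)]
df = pd.DataFrame(samples)  # raises ValueError: Must pass 2-d input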
You have 2 possible fixes:
Create a Panel.
# note: pd.Panel has since been removed from pandas; the concat approach below works on current versions
wp = pd.Panel.from_dict({str(i*5): pd.DataFrame(samples[i]) for i in range(72)})
Stack your arrays one on top of the other, or side by side:
# On top of another:
df = pd.concat([pd.DataFrame(sample) for sample in samples], axis=0,
               keys=[str(i*5) for i in range(72)])
# Side by side:
df = pd.concat([pd.DataFrame(sample) for sample in samples], axis=1,
               keys=[str(i*5) for i in range(72)])
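To see what these two layouts produce, here is a small sketch with toy 2x3 "images" (the shapes are the point; the names are illustrative):
import numpy as np
import pandas as pd

toy_samples = [np.ones((2, 3)) for _ in range(4)]   # four tiny 2x3 "images"
toy_keys = [str(i*5) for i in range(4)]

stacked = pd.concat([pd.DataFrame(s) for s in toy_samples], axis=0, keys=toy_keys)
side_by_side = pd.concat([pd.DataFrame(s) for s in toy_samples], axis=1, keys=toy_keys)

print(stacked.shape)       # (8, 3): rows stacked, with a MultiIndex on the rows
print(side_by_side.shape)  # (2, 12): columns concatenated, with a MultiIndex on the columns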

Another way to do it is to convert your 2D arrays (images) to the 1D arrays that sklearn expects, using the reshape method on the images:
for i in range(yourRange):
    img = misc.imread(yourFile)
    samples.append(img.reshape(-1))
df = pd.DataFrame(samples)
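With each image flattened to one row, the DataFrame is a plain 2D samples-by-features table, so the Isomap step from the question should then work on it directly (a sketch reusing the names from the question):
from sklearn import manifold

# df has shape (72, height*width): one flattened image per row
iso = manifold.Isomap(n_neighbors=4, n_components=3)
iso.fit(df)
embedding = iso.transform(df)  # shape (72, 3)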

Olivera almost had it.
the problem
When you run misc.imread, the output is an NxM (2D) array. Putting these arrays in a list makes the collection 3D, but DataFrame expects 2D input.
the fix
Before it goes in the list, the array should be 'flattened' using ravel:
img = misc.imread('Datasets/ALOI/32/32_r' + str(i*5) + '.png').ravel()
a note on .reshape(-1)
Reshaping with -1 does not actually preserve the array's rank: img.reshape(-1) also returns a flat 1D array of shape (N,), just like ravel() (which returns a view where possible rather than a copy), so either works here. What you want to avoid is reshape(-1, 1), which gives an Nx1 array instead of the flat Nx(nothing) shape that ravel() produces.
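A quick shape check illustrating the difference (a minimal sketch):
import numpy as np

img = np.ones((144, 192))        # a toy 2D "image"
print(img.ravel().shape)         # (27648,)  -- flat, 1D
print(img.reshape(-1).shape)     # (27648,)  -- the same flat shape
print(img.reshape(-1, 1).shape)  # (27648, 1) -- still 2D, a single column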

Related

Efficiently filter 3D matrix in numpy with variable 2D masks

I have a 3D numpy array points of dimensions [10000x3000x128], where the first dimension is the number of frames, the second is the number of points in each frame, and the third is a 128-element feature vector associated with each point. What I want to do is efficiently filter the points in each frame using a boolean 2D mask of dimensions [10000x3000] and, for each selected point, also take the related 128-dim feature vector. Moreover, the output still needs to be a 3D array rather than a merged 2D one, and I would like to avoid any for loop if possible.
Actually what I'm doing is:
# example of points: an array of shape (10000, 3000, 128)
points = np.zeros((10000, 3000, 128))
# fg, bg = 2D boolean np.arrays of shape (10000, 3000)
# init empty lists
fg_points, bg_points = [], []
for i in range(points.shape[0]):
    fg_mask_tmp, bg_mask_tmp = fg[i], bg[i]
    fg_points.append(points[i, fg_mask_tmp, :])
    bg_points.append(points[i, bg_mask_tmp, :])
fg_features, bg_features = np.array(fg_points), np.array(bg_points)
But this is quite a naive solution that can surely be improved in a more numpy-like way.
In addition, I also tried other solutions such as:
fg_features = points[fg, :]
But this solution does not preserve the dimensions of the array: it merges the first two dimensions, since the number of filtered points can vary from frame to frame.
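For illustration, this is what the dimension merging looks like on a small example (a minimal sketch with toy shapes):
import numpy as np

toy_points = np.arange(2 * 4 * 3).reshape(2, 4, 3)   # 2 frames, 4 points each, 3 features
toy_mask = np.array([[True, False, True, True],
                     [False, True, False, False]])   # different counts per frame

selected = toy_points[toy_mask]  # a 2D boolean mask indexes the first two axes at once
print(selected.shape)            # (4, 3): the 3 + 1 selected points, with the frame axis merged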
Another solution I tried was to enlarge the 2D masks by adding a last dimension of 128 True values, but without any success.
Does anyone know a possible efficient solution?
Thank you in advance for any help!

How to stack matrices in a single column table

I am trying to store 20 automatically generated matrices in a single-column matrix, so this last matrix would be a 1x20 matrix.
For this I am using numpy and vstack, but it doesn't work; I keep getting the following error:
ValueError: all the input arrays must have same number of dimensions
This happens even though all the matrices that I'm trying to stack together have the same dimensions (881 x 882).
So I'd like to know what is wrong with this, or whether there is another way to stack all the matrices such that, if one of them is needed, I can access it easily.
Try changing the dimensions with the expand_dims and squeeze functions:
y = np.expand_dims(x, axis=0)  # shape (20,) becomes (1, 20)
x = np.squeeze(y, axis=0)      # shape (1, 20) becomes (20,) again
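Applied to the question, the idea would be to give every matrix a leading axis of length 1 before stacking, so vstack sees matching dimensions (a sketch using the 881x882 shape from the question; np.stack does the same thing in one step):
import numpy as np

matrices = [np.random.rand(881, 882) for _ in range(20)]  # the 20 generated matrices

stacked = np.vstack([np.expand_dims(m, axis=0) for m in matrices])
# stacked.shape == (20, 881, 882), so stacked[i] gives back the i-th matrix

# equivalent, without the manual expand_dims:
stacked = np.stack(matrices, axis=0)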

Transpose a 1-dimensional array in Numpy without casting to matrix

My goal is to turn a row vector into a column vector and vice versa. The documentation for numpy.ndarray.transpose says:
For a 1-D array, this has no effect. (To change between column and row vectors, first cast the 1-D array into a matrix object.)
However, when I try this:
my_array = np.array([1, 2, 3])
my_array_T = np.transpose(np.matrix(my_array))
I do get the wanted result, albeit in matrix form (matrix([[1], [2], [3]])), but I also get this warning:
PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
my_array_T = np.transpose(np.matrix(my_array))
How can I properly transpose an ndarray then?
A 1D array is its own transpose, unlike in Matlab, where 1D arrays don't exist and everything is at least 2D.
What you want is to reshape it:
my_array.reshape(-1, 1)
Or:
my_array.reshape(1, -1)
Depending on what kind of vector you want (column or row vector).
The -1 tells NumPy to infer that dimension from the number of elements, and the 1 creates the second, required dimension.
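A quick check of the resulting shapes (a minimal sketch):
import numpy as np

my_array = np.array([1, 2, 3])
print(my_array.reshape(-1, 1).shape)    # (3, 1): column vector
print(my_array.reshape(1, -1).shape)    # (1, 3): row vector
print(my_array.reshape(-1, 1).T.shape)  # (1, 3): transposing now behaves as expected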
If your array is my_array and you want to convert it to a column vector you can do:
my_array.reshape(-1, 1)
For a row vector you can use
my_array.reshape(1, -1)
Both of these can also be transposed and that would work as expected.
IIUC, use reshape
my_array.reshape(my_array.size, -1)

Weird issue masking 3D arrays iteratively in Python loop

I have a 3D array, containing 10 2D maps of the world. I created a mask of the oceans, and I am trying to create a second array, identical to my first 3D array, but where the oceans are masked for each year. I thought that this should work:
SIF_year = np.ndarray(SIF_year0.shape)
for i in range(0, SIF_year0.shape[0]):
    SIF_year[i,:,:] = np.ma.array(SIF_year0[i,:,:], mask=np.logical_not(mask_global_land))
where SIF_year0 is the initial 3D array, and SIF_year is the version that has been masked. However, SIF_year comes out looking just like SIF_year0. Interestingly, if I do:
SIF_year = np.ndarray(SIF_year0.shape)
for i in range(0, SIF_year0.shape[0]):
    SIF_test = np.ma.array(SIF_year0[i,:,:], mask=np.logical_not(mask_global_land))
then SIF_test is the masked 2D array I need. I have tried saving the masked array to SIF_test and then resaving it into SIF_year[i,:,:], but then SIF_year looks like SIF_year0 again!
There must be some obvious mistake I'm missing...
The reason the first version fails is that SIF_year is a plain ndarray, so assigning a masked array into SIF_year[i,:,:] keeps only the underlying data and silently drops the mask. I think I have solved the problem by adding an extra step in the loop that replaces the masked values with np.nan using ma.filled (https://docs.scipy.org/doc/numpy/reference/routines.ma.html):
SIF_year = np.ndarray(SIF_year0.shape)
for i in range(0, SIF_year0.shape[0]):
    SIF_test = np.ma.array(SIF_year0[i,:,:], mask=np.logical_not(mask_global_land))
    SIF_year[i,:,:] = np.ma.filled(SIF_test, np.nan)
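For reference, a possible loop-free alternative (a sketch, assuming mask_global_land is a 2D boolean array with the same shape as each yearly map; the new variable names are illustrative) is to build one 3D masked array with the mask broadcast across the year axis:
import numpy as np

ocean_mask_3d = np.broadcast_to(np.logical_not(mask_global_land), SIF_year0.shape)
SIF_year_masked = np.ma.array(SIF_year0, mask=ocean_mask_3d)

# or, as in the loop above, a plain float array with NaN over the oceans:
SIF_year = np.ma.filled(SIF_year_masked, np.nan)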

Python2.7 (numpy) Keeping shape of array when appending a 3-d numpy array to an empty array

I am trying to create an empty numpy array and save all the images that I get from my device. The images come in as numpy arrays of shape (240,320,3). Creating an empty array to store these images seems like the correct thing to do. When I try to append, however, I get this error:
ValueError: all the input arrays must have same number of dimensions
Code as follows:
import numpy as np
# will be appending many images of size (240,320,3)
images = np.empty((0,240,320,3),dtype='uint8')
# filler image to append
image = np.ones((240,320,3),dtype='uint8') * 255
images = np.append(images,image,axis=0)
I need to append many images to this array, so after 100 appends, the shape of the images array should be of shape (100,240,320,3) if done correctly.
Better than np.append is:
images = np.empty((100, 240, 320, 3), dtype='uint8')
for i in range(100):
    image = ....
    images[i, ...] = image
or
alist = []
for i in range(100):
    image = ....
    alist.append(image)
images = np.array(alist)
# or images = np.stack(alist, axis=0) for more control
np.append is just a cover for np.concatenate, so it makes a new array each time through the loop. By the time you add the 100th image, you have copied the first one 100 times! Another disadvantage of np.append is that you have to adjust the dimensions of image, a frequent source of error. Another frequent error is getting that initial 'empty' array shape wrong.
Your images array has four dimensions, so you must append a four-dimensional item to it. To do so, simply add a new axis to image like so:
images = np.append(images, image[np.newaxis, ...], axis=0)
In a sense, when passing an axis, numpy.append is more akin to list.extend than list.append.
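A quick shape check of this approach with the arrays from the question (a minimal sketch):
import numpy as np

images = np.empty((0, 240, 320, 3), dtype='uint8')
image = np.ones((240, 320, 3), dtype='uint8') * 255

images = np.append(images, image[np.newaxis, ...], axis=0)
print(images.shape)  # (1, 240, 320, 3); each further append adds one more along axis 0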
