how to save feature matrix as csv file - python

I have several image features, and after image pre-processing I have plenty of data that I need to use frequently in future work. To save time, I want to save the image feature data in CSV format. The following image features are the row attributes: Intensity, Skewness, Kurtosis, Std_deviation, Max5, Min5.
Here every image feature is a numpy array of shape (34560, 1).
How can I make a CSV file that contains all of these image features?

You can use a structured array if you want to include attribute names in the numpy array, but that makes it more complicated to use. I would rather save a numpy array with a single dtype and store the attribute names somewhere else. That is more straightforward and easier to use.
Example: for the sake of simplicity, say you have three column arrays of length 4: a, b, c
a -> array([[1],
            [2],
            [3],
            [4]])
a.shape -> (4,1)
b and c have the same shape.
It is convenient to flatten each column into a 1-D (row) array first, so that stacking puts each feature in its own row. For a contiguous (4, 1) array this does not change the memory layout; ravel() simply returns a 1-D view of the same data.
a = a.ravel(); b = b.ravel(); c = c.ravel()
a -> array([1, 2, 3, 4])
a.shape -> (4,)
Then stack them into one large array:
x = np.vstack((a,b,c))
x -> array([[ 1,  2,  3,  4],
            [ 5,  6,  7,  8],
            [ 9, 10, 11, 12]])
x.shape -> (3,4)
Then save the stacked array to a CSV file:
np.savetxt('out.csv', x, delimiter=',')
This can be done in one line:
np.savetxt('out.csv', np.vstack((a.ravel(), b.ravel(), c.ravel())), delimiter=',')
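A minimal sketch of the same idea applied to the six features from the question. The random data, the file name features.csv, and writing one feature per column (rather than per row) with a header line are assumptions for illustration; the header and comments parameters of np.savetxt and skiprows of np.loadtxt are standard NumPy.

```python
import numpy as np

# Hypothetical stand-in data: six column arrays of shape (34560, 1),
# one per image feature named in the question.
names = ["Intensity", "Skewness", "Kurtosis", "Std_deviation", "Max5", "Min5"]
rng = np.random.default_rng(0)
features = [rng.random((34560, 1)) for _ in names]

# np.hstack places the (34560, 1) columns side by side -> shape (34560, 6),
# so each CSV column holds one feature and the header row names it.
table = np.hstack(features)
np.savetxt("features.csv", table, delimiter=",",
           header=",".join(names), comments="")  # comments="" drops the default "# " prefix

# Reading it back later: skip the header line.
loaded = np.loadtxt("features.csv", delimiter=",", skiprows=1)
print(loaded.shape)
```

Keeping the names in the header means the file documents itself, while the data still loads back as a plain float array.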

For example, if you have your output in a variable result, you can save it in CSV format with the commands below:
import pandas as pd
result = ...  # your expression here; must be a 2-D numpy array with at least two columns
result1 = pd.DataFrame({'col1': result[:, 0], 'col2': result[:, 1]})
result1.to_csv('Myresult.csv', header=False)
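In place of 'Myresult.csv' you can give your desired output path. In my case it would be something like r'C:\Users\dinesh.n\Desktop\Myresult.csv' (the r prefix makes it a raw string, so Python does not treat the backslashes as escape sequences).
I hope this clears up your question; if not, please excuse me and correct me.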
Thank you.
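A sketch of the pandas route with named columns, so the CSV is self-describing. The array contents and column names below are hypothetical placeholders; index=False (an addition to the answer above) stops to_csv from writing the row index as an extra column.

```python
import numpy as np
import pandas as pd

# Hypothetical 2-D result: one row per sample, one column per feature.
result = np.arange(12, dtype=float).reshape(4, 3)
names = ["Intensity", "Skewness", "Kurtosis"]  # illustrative column names

# Passing the whole array avoids building the dict column by column.
df = pd.DataFrame(result, columns=names)
df.to_csv("Myresult.csv", index=False)  # no extra row-number column

# Reading it back: the first line is used as the header.
back = pd.read_csv("Myresult.csv")
print(back.columns.tolist())
```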

Related

concatenate a column vector to the end of a matrix

I was trying to add a column vector at the end of a matrix as follows:
import numpy as np
datas=[[1,2],[3,4]]
temp=[1,2]
datas=np.array(datas)
temp=np.transpose(np.array(temp))
np.append(datas,temp,axis=1)
But I'm getting a dimension mismatch error.
How do I do this properly?
You need to add a dimension to temp so that both arrays have the same number of dimensions:
import numpy as np
datas=[[1,2],[3,4]]
temp=[1,2]
datas=np.array(datas)
temp=np.array(temp)[:, np.newaxis] ## this adds new dimension
np.append(datas,temp,axis=1)
You can also do it with np.concatenate, as below. This performs better when you are combining more than two arrays: build a Python list ls of the arrays (for example, in a loop) and then concatenate them in one call.
ls = [datas,temp]
np.concatenate(ls, axis=1)
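A sketch of the list-then-concatenate pattern described above. The extra columns generated in the loop are made up for illustration; the point is that np.concatenate is called once at the end instead of copying the whole array on every np.append.

```python
import numpy as np

datas = np.array([[1, 2], [3, 4]])

# Collect the pieces in a Python list (cheap appends) ...
pieces = [datas]
for k in range(3):
    # Hypothetical new column; each must have shape (2, 1) to match datas.
    col = np.array([10 * k, 10 * k + 1])[:, np.newaxis]
    pieces.append(col)

# ... then build the final array with a single copy.
out = np.concatenate(pieces, axis=1)
print(out)
```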
I would recommend just using np.expand_dims() and then np.hstack():
import numpy as np
datas=[[1,2],[3,4]]
temp=[1,2]
#Expand the dims of temp
temp = np.expand_dims(temp,1)
#Stack horizontally
np.hstack((datas, temp))
array([[1, 2, 1],
       [3, 4, 2]])
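A quick sketch, using the data from the question, that checks the approaches above produce the same array. It also shows np.column_stack, which none of these answers use, accepting the 1-D temp directly.

```python
import numpy as np

datas = np.array([[1, 2], [3, 4]])
temp = np.array([1, 2])
col = temp[:, np.newaxis]  # shape (2, 1)

via_append = np.append(datas, col, axis=1)
via_concat = np.concatenate([datas, col], axis=1)
via_hstack = np.hstack((datas, np.expand_dims(temp, 1)))
# np.column_stack turns the 1-D temp into a column by itself.
via_colstack = np.column_stack((datas, temp))

for arr in (via_append, via_concat, via_hstack, via_colstack):
    print(arr.tolist())  # [[1, 2, 1], [3, 4, 2]] each time
```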

numpy Reshaping changes the images

I have a numpy array X which contains 2D images. Its shape is (1000, 60, 40), where 1000 is the number of images.
I want to feed this array to my model, but the model requires the shape
(1000, 60, 40, 1) (the appended 1 is the number of channels).
So I reshape the array with:
Y=X.reshape(1000,60,40,1)
As I was getting wrong predictions, I checked whether reshaping the reshaped array back gives my original images.
I did that with:
Z=Y.reshape(1000,60,40)
And I saved them as PNG files with:
for i in range(1000):
misc.imsave('img_rereshaped'+str(i)+'.png',Z[i])
This produces PNG files, but they are not the same as the corresponding original images in X.
Am I reshaping the wrong way, or does reshaping change the input data, so that reshaping the reshaped data back gives a different result from the original?
To test whether the reshaping is causing the problem, it's better to test it without involving other potential sources of error, such as misc.imsave().
Running something like:
import numpy as np
a = np.random.rand(10,3)
b = np.reshape(a, [10, 3, 1])
c = np.reshape(b, [10, 3])
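print(np.array_equal(a, c))  # True: the round trip leaves the data unchanged
You'll see that going back and forth with reshape doesn't change the data.
It could be that you're not saving the PNG correctly; perhaps the function expects 3 channels, for example. Note also that scipy.misc.imsave rescales each image to the full 0-255 range before saving, so the files need not match the raw values in X (the function was removed in SciPy 1.2). Try plotting the images locally with matplotlib.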
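A sketch of the same round-trip test at the question's shapes, using random uint8 images as a stand-in for X (an assumption). It also shows that adding a length-1 channel axis to a contiguous array returns a view on the same data, plus two equivalent ways to add that axis.

```python
import numpy as np

# Hypothetical stand-in for X: 1000 grayscale images of 60x40 pixels.
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(1000, 60, 40)).astype(np.uint8)

Y = X.reshape(1000, 60, 40, 1)  # add a channel axis
Z = Y.reshape(1000, 60, 40)     # remove it again

# A length-1 axis does not reorder any elements; for a contiguous X,
# Y is just a view on the same buffer.
print(np.array_equal(X, Z))
print(np.shares_memory(X, Y))

# Equivalent ways to add the channel axis.
Y2 = X[..., np.newaxis]
Y3 = np.expand_dims(X, -1)
```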

Append value to each array in a numpy array

I have a numpy array of arrays, for example:
x = np.array([[1,2,3],[10,20,30]])
Now let's say I want to append 4 to the first row and 40 to the second, to produce the following array:
[[1,2,3,4],[10,20,30,40]]
How can I do this without making a copy of the whole array? I tried to change the shape of the array in place but it throws a ValueError:
x[0] = np.append(x[0],4)
x[1] = np.append(x[1],40)
ValueError: could not broadcast input array from shape (4) into shape (3)
You can't do this. A NumPy array occupies a single contiguous block of memory, so any change to its size forces a copy of the whole array. If possible, grow your data in Python lists, then convert the end result to an array once.
However, if you know the final size of the resulting array, you can create it up front with something like np.empty() and then assign values by index rather than appending. This does not change the size of the array, it only reassigns values, so it does not require copying.
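A sketch contrasting the two suggestions above: growing a Python list and converting once, versus preallocating with np.empty when the final size is known. The per-row values are hypothetical.

```python
import numpy as np

n_rows = 5

# Option 1: grow a Python list (cheap appends), convert once at the end.
rows = []
for i in range(n_rows):
    rows.append([i, i ** 2, i ** 3])  # hypothetical per-row values
from_list = np.array(rows)

# Option 2: final size known in advance, so allocate once and assign by index.
prealloc = np.empty((n_rows, 3), dtype=int)  # contents are arbitrary until written
for i in range(n_rows):
    prealloc[i] = [i, i ** 2, i ** 3]

print(np.array_equal(from_list, prealloc))  # True
```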
While @roganjosh is right that you cannot resize a numpy array without a copy being made (in the underlying process), there is a simpler way to append each value of a 1-D array to the end of the corresponding row of a 2-D array, using numpy.column_stack:
>>> import numpy as np
>>> x = np.array([[1,2,3],[10,20,30]])
>>> x
array([[ 1,  2,  3],
       [10, 20, 30]])
>>> stack_y = np.array([4,40])
>>> stack_y
array([ 4, 40])
>>> np.column_stack((x, stack_y))
array([[ 1,  2,  3,  4],
       [10, 20, 30, 40]])
Create a new matrix
Insert the values of your old matrix
Then, insert your new values in the last positions
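import numpy as np
x = np.array([[1,2,3],[10,20,30]])
new_X = np.zeros((2, 4), dtype=x.dtype)  # keep the integer dtype of x
new_X[:, :3] = x          # copy the old values
new_X[:, -1] = [4, 40]    # new values go in the last column
x = new_X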
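Note that np.reshape() cannot change the number of elements, and np.resize() pads by repeating the flattened data, so neither adds a column directly.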
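A caveat about np.resize, sketched on the question's array: it does return a larger array, but it fills it by repeating the flattened data, so the rows no longer line up with the originals; np.reshape refuses outright because the element count changes.

```python
import numpy as np

x = np.array([[1, 2, 3], [10, 20, 30]])

# np.resize repeats the flattened data [1, 2, 3, 10, 20, 30, 1, 2, ...].
print(np.resize(x, (2, 4)).tolist())  # [[1, 2, 3, 10], [20, 30, 1, 2]]

# np.reshape cannot change the total number of elements (6 -> 8).
reshape_ok = True
try:
    np.reshape(x, (2, 4))
except ValueError as err:
    reshape_ok = False
    print("reshape failed:", err)
```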

How to perform iterative 2D operation on 4D numpy array

Let me preface this post by saying that I'm pretty new to Python and NumPy, so I'm sure I'm overlooking something simple. What I'm trying to do is image processing on a PGM (grayscale) file using a mask (a mask convolution operation); however, I don't want to use the all-in-one SciPy image processing libraries that are available. I'm trying to implement the masking and processing operations myself. What I want to do is the following:
Iterate a 3x3 sliding window over a 256x256 array
At each iteration, I want to perform an operation with a 3x3 image mask (an array of fractional values < 1) and the 3x3 window from my original array
The operation is that the image mask gets multiplied by the 3x3 window, and that the results get summed up into one number, which represents a weighted average of the original 3x3 area
This sum should get inserted back into the center of the 3x3 window, with the original surrounding values left untouched
However, the output of one of these operations shouldn't be the input of the next operation, so a new array should be created or the original 256x256 array shouldn't be updated until all operations have completed.
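The steps above can be written directly as loops. A sketch like this (the function name mask_filter_loops is hypothetical) is slow, but it is a useful reference to check a vectorized version against. It writes into a copy, so no result feeds a later window, and it leaves the border pixels unchanged.

```python
import numpy as np

def mask_filter_loops(image, mask):
    """Plain-loop version of the steps above (a reference sketch, not optimised)."""
    out = image.astype(float)  # astype copies, so image itself is never modified
    rows, cols = image.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = image[r - 1:r + 2, c - 1:c + 2]  # 3x3 neighbourhood centred on (r, c)
            out[r, c] = np.sum(window * mask)         # weighted sum goes to the centre
    return out  # border pixels keep their original values

# Usage with the mask from the question on a small linear ramp.
image = np.arange(25).reshape(5, 5)
mask = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float) / 16
filtered = mask_filter_loops(image, mask)
# For a linear ramp, this symmetric mask (weights summing to 1) returns the centre value,
# so filtered equals image here.
```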
The process is sort of like this, except I need to put the result of the convolved feature back into the center of the window it came from:
[Figure: a 3x3 mask sliding over an image to produce convolved features (source: stanford.edu)]
So, in this above example, the 4 would go back into the center position of the 3x3 window it came from (after all operations had concluded), so it would look like [[1, 1, 1], [0, 4, 1], [0, 0, 1]] and so on for every other convolved feature obtained. A non-referential copy could also be made of the original and this new value inserted into that.
So, this is what I've done so far: I have a 256x256 2D numpy array which is my source image. Using as_strided, I convert it into a 4D numpy array of 3x3 slices. The main problem I'm facing is that I want to execute the operation I've specified over each slice. I'm able to perform it on one slice, but in the np.sum operations I've tried, it adds up all the slices' results into one value. After this, I either want to create a new 256x256 array with the results, in the fashion that I've described, or iterate over the original, replacing the middle values of each 3x3 window as appropriate. I've tried using ndenumerate to change just the same value (v, x, 1, 1) of my 4D array each time, but since the index returned from my 4D array is of the form (v, x, y, z), I can't figure out how to iterate only through (v, x) and keep the last two indices constant.
Here's my code thus far:
import numpy as np
from numpy.lib import stride_tricks
# create 256x256 NumPy 2D array from image data and image size so we can manipulate the image data, then create a 4D array of strided windows
# currently, it only takes 10 slices to test with
imageDataArray = np.array(parsedPGMFile.imageData, dtype=int).reshape(parsedPGMFile.numRows, parsedPGMFile.numColumns)
xx = stride_tricks.as_strided(imageDataArray, shape=(1, 10, 3, 3), strides=imageDataArray.strides + imageDataArray.strides)
# create the image mask to be used
mask = [1,2,1,2,4,2,1,2,1]
mask = np.array(mask, dtype=float).reshape(3, 3)/16
# this will execute the operation on just the first 3x3 element of xx, but need to figure out how to iterate through all elements and perform this operation individually on each element
result = np.sum(mask * xx[0,0])
Sources like http://wiki.scipy.org/Cookbook/GameOfLifeStrides, http://www.johnvinyard.com/blog/?p=268, and http://chintaksheth.wordpress.com/2013/07/31/numpy-the-tricks-of-the-trade-part-ii/ were very helpful (as was SO), but they don't seem to address exactly what I'm trying to do (unless I'm missing something obvious). I could probably use a ton of for loops, but I'd rather learn how to do it with these awesome Python libraries. I also realize I'm combining a few questions together, but that's only because I have a sneaking suspicion that this can all be done very simply! Thanks in advance for any help!
When you need to multiply element-wise, then reduce with addition, think np.dot or np.einsum:
import numpy as np
from numpy.lib.stride_tricks import as_strided
arr = np.random.rand(256, 256)
mask = np.random.rand(3, 3)
arr_view = as_strided(arr, shape=(254, 254, 3, 3), strides=arr.strides*2)
arr[1:-1, 1:-1] = np.einsum('ijkl,kl->ij', arr_view, mask)
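On NumPy 1.20 or later, numpy.lib.stride_tricks.sliding_window_view builds the same (254, 254, 3, 3) windows without hand-computed strides, and it returns a read-only view, which prevents accidental writes through it. This sketch, with random data as an assumption and the mask from the question, also writes into a copy so that arr stays unchanged, as the question requires.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
arr = rng.random((256, 256))  # hypothetical stand-in for the image
mask = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float) / 16

windows = sliding_window_view(arr, (3, 3))          # shape (254, 254, 3, 3)
weighted = np.einsum('ijkl,kl->ij', windows, mask)  # per-window multiply, then sum

out = arr.copy()            # arr is untouched, so no result feeds a later window
out[1:-1, 1:-1] = weighted  # windows[i, j] is centred on arr[i + 1, j + 1]
print(out.shape)
```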
Based on the example illustration:
In [1]: import numpy as np
In [2]: from scipy.signal import convolve2d
In [3]: image = np.array([[1,1,1,0,0],[0,1,1,1,0],[0,0,1,1,1],[0,0,1,1,0],[0,1,1,0,0]])
In [4]: m = np.array([[1,0,1],[0,1,0],[1,0,1]])
In [5]: convolve2d(image, m, mode='valid')
Out[5]:
array([[4, 3, 4],
[2, 4, 3],
[2, 3, 4]])
And putting it back where it came from:
In [6]: image[1:-1,1:-1] = convolve2d(image, m, mode='valid')
In [7]: image
Out[7]:
array([[1, 1, 1, 0, 0],
[0, 4, 3, 4, 0],
[0, 2, 4, 3, 1],
[0, 2, 3, 4, 0],
[0, 1, 1, 0, 0]])
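convolve2d computes a true convolution, which flips the kernel on both axes, whereas the einsum answer computes a correlation (no flip). The two agree here only because m is symmetric. A NumPy-only sketch (it needs NumPy 1.20+ for sliding_window_view) that reproduces the output above both ways:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

image = np.array([[1, 1, 1, 0, 0], [0, 1, 1, 1, 0], [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0], [0, 1, 1, 0, 0]])
m = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]])

windows = sliding_window_view(image, m.shape)  # every 3x3 window, shape (3, 3, 3, 3)

# Correlation: kernel used as-is (what the einsum answer computes).
corr = np.einsum('ijkl,kl->ij', windows, m)
# Convolution: flip the kernel on both axes first (what convolve2d computes).
conv = np.einsum('ijkl,kl->ij', windows, m[::-1, ::-1])
print(conv)
```

For a non-symmetric mask the two results differ, so it is worth deciding which one you actually want.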

Convert 1D array into numpy matrix

I have a simple, one dimensional Python array with random numbers. What I want to do is convert it into a numpy Matrix of a specific shape. My current attempt looks like this:
randomWeights = []
for i in range(80):
randomWeights.append(random.uniform(-1, 1))
W = np.mat(randomWeights)
W.reshape(8,10)
Unfortunately it always creates a matrix of the form:
[[random1, random2, random3, ...]]
So only the first element of one dimension gets used and the reshape command has no effect. Is there a way to convert the 1D array to a matrix so that the first x items will be row 1 of the matrix, the next x items will be row 2 and so on?
Basically this would be the intended shape:
[[1, 2, 3, 4, 5, 6, 7, 8],
[9, 10, 11, ... , 16],
[..., 80]]
I suppose I can always build a new matrix in the desired form manually by parsing through the input array, but I'd like to know whether there is a simpler, more elegant solution with built-in functions that I'm not seeing. If I have to build those matrices manually, I'll have a ton of extra work in other areas of the code, since all my source data comes in simple 1D arrays but will be computed as matrices.
reshape() doesn't reshape in place; you need to assign the result:
>>> W = W.reshape(8,10)
>>> W.shape
(8, 10)
Alternatively, W.resize(8, 10) (see ndarray.resize()) changes the shape in place, with no assignment needed.
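A sketch that avoids np.matrix altogether; the NumPy documentation no longer recommends the matrix class, and a plain 2-D ndarray works for the same computations. rng.uniform replaces the random.uniform loop (an assumed substitution). reshape fills rows in C (row-major) order, so the first 10 values become row 0, the next 10 row 1, and so on.

```python
import numpy as np

rng = np.random.default_rng(0)
random_weights = rng.uniform(-1, 1, size=80)  # 1-D array of 80 values in [-1, 1)

W = random_weights.reshape(8, 10)  # returns a new array object; assign it
print(W.shape)

# The same works starting from a plain Python list.
W2 = np.reshape([float(i) for i in range(80)], (8, 10))
print(W2[1, :3])  # the second row starts with the 11th value
```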
