Export a numpy matrix of complex numbers to CSV - python

I am having the following trouble in Python. Assume a numpy.matrix A with entries of dtype complex128. I want to export A in CSV format so that the entries are separated by commas and each line of the output file corresponds to a row of A. I also need 18 decimal digits of precision for both the real and imaginary parts, with no spaces within an entry. For example, I need
`6.103515626000000000e+09+1.712134684679831166e+05j`
instead of
`6.103515626000000000e+09 + 1.712134684679831166e+05j`
The following command works, but only for a 1-by-1 matrix:
numpy.savetxt('A.out', A, fmt='%.18e%+.18ej', delimiter=',')
If I use:
numpy.savetxt('A.out', A, delimiter=',')
there are two problems. First, I don't know how many decimal digits are preserved by default. Second, each complex entry is wrapped in parentheses, like
(6.103515626000000000e+09+1.712134684679831166e+05j)
and I cannot read the file in MATLAB.
What do you suggest?

This is probably not the most efficient way of converting the data in a large matrix, and I am sure a more efficient one-line solution exists, but you can try the code below and see if it works. Here I use pandas to save the data to a csv file. The first two columns of the generated csv file hold the real and imaginary parts, respectively. I also assume that the input matrix has dimension Nx1.
import pandas as pd
import numpy as np

def to_csv(t, nr_of_decimals=18):
    # Split each complex entry into a real and an imaginary column
    t_new = np.hstack([np.real(t), np.imag(t)])
    t_new = np.round(t_new, decimals=nr_of_decimals)
    # float_format keeps the requested number of decimal digits in the file
    pd.DataFrame(t_new).to_csv('out.csv', index=False, header=False,
                               float_format='%.18e')

# Assume t is your complex matrix
t = np.matrix([[6.103515626000000000e+09+1.712134684679831166e+05j],
               [6.103515626000000000e+09+1.712134684679831166e+05j]])
to_csv(t)
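Alternatively, the format string from your 1-by-1 attempt can be applied entry by entry without pandas. A minimal sketch, assuming A is a 2-D complex128 array or matrix:
import numpy as np

A = np.matrix([[6.103515626e+09 + 1.712134684679831166e+05j]])
with open('A.out', 'w') as f:
    for row in np.asarray(A):
        # '%.18e%+.18ej' prints the real and imaginary parts back to back, no spaces
        f.write(','.join('%.18e%+.18ej' % (z.real, z.imag) for z in row) + '\n')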

Related

Space separated values to ndarray

I am trying to import a data set for my machine learning model.
The data consists of images stored as the pixel values of each image.
The image size is 48x48.
I need to convert this to an ndarray, but since the values are space separated,
X = data[:, 1].reshape(data.shape[0], 1, 48, 48).astype('float32')
doesn't work.
I need help converting this data to an ndarray of shape (n, 48, 48), where n is the number of rows.
EDIT:
I tried
data = pd.read_csv('../input/fer2013.csv').values
X = data[:, 1]
for i in range(len(X)):
    X[i] = np.asarray(X[i].split(" "), dtype=np.float32)
    X[i] = X[i].reshape(1, 48, 48).astype('float32')
It does not change the shape of the column.
I want the shape of X to be (n, 1, 48, 48), but doing the above keeps the shape at (n,).
Format of data: [sample rows from the pixels column of the dataset]
Thanks and regards,
Judging from your description, it seems that the columns are space separated and the rows are newline separated. If this is the case, then you can use Numpy's genfromtxt() method.
from numpy import genfromtxt
my_data = genfromtxt(f, delimiter=' ')
If you've already got the space-newline-separated mess read in as a string string_data, you need to split it into lines first:
f = string_data.split('\n')
Judging from your link, you may be able to pass the data["pixels"] Series to genfromtxt directly.
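If passing the Series directly doesn't work, here is a minimal sketch of converting the pixels column by hand, assuming each entry is a single space-separated string of 48x48 = 2304 values:
import numpy as np
import pandas as pd

data = pd.read_csv('../input/fer2013.csv')
# Split each space-separated string and stack the rows into one float array
X = np.array([np.array(s.split(), dtype=np.float32) for s in data['pixels']])
X = X.reshape(-1, 1, 48, 48)  # shape (n, 1, 48, 48)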

Python: How does converters work in genfromtxt() function?

I am new to Python and I have the following example that I don't understand.
The following is a csv file with some data:
%%writefile wood.csv
item,material,number
100,oak,33
110,maple,14
120,oak,7
145,birch,3
Then the example defines a function to convert the tree names above to integers:
tree_to_int = dict(oak=1,
                   maple=2,
                   birch=3)

def convert(s):
    return tree_to_int.get(s, 0)
The first question is: why is there a "0" after "s"? I removed the "0" and got the same result.
The last step is to read the data with np.genfromtxt:
data = np.genfromtxt('wood.csv',
                     delimiter=',',
                     dtype=np.int,
                     names=True,
                     converters={1: convert})
I was wondering, for the converters argument, what does {1: convert} mean exactly? In particular, what does the number 1 refer to in this case?
For the first question: the 0 is the default value that dict.get returns when s is not found in tree_to_int. With this particular file every material name is in the dictionary, so removing the 0 appears to make no difference, but it protects against unexpected names.
For the second question, according to the documentation (https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html), {1: convert} is a dictionary whose keys are column numbers (where the first column is column 0) and whose values are functions that convert the entries in that column.
So in this code, the 1 indicates column one of the csv file, the one with the names of the trees. Including this argument causes numpy to use the convert function to replace the tree names with their corresponding numbers in data.
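A quick sketch of the dict.get fallback for the first question, using a hypothetical unknown name:
tree_to_int = dict(oak=1, maple=2, birch=3)

print(tree_to_int.get('oak', 0))   # 1: key found, the default is ignored
print(tree_to_int.get('pine', 0))  # 0: key missing, the default is returned
# tree_to_int['pine'] would raise a KeyError instead of falling back to 0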

How to create a table in Python that has headers (text), filled with values (int and float)

I am filling a numpy array in Python (I could change this to a list if necessary), and I want to fill it with column headings, then enter a loop and fill the table with values. I am struggling with which type to use for the array. I have something like this so far...
info = np.zeros(shape=(no_of_label+1, 19), dtype=np.str)  # Creates array to store coordinates of particles
info[0,:] = ['Xpos','Ypos','Zpos','NodeNumber','BoundingBoxTopX','BoundingBoxTopY','BoundingBoxTopZ','BoundingBoxBottomX','BoundingBoxBottomY','BoundingBoxBottomZ','BoxVolume','Xdisp','Ydisp','Zdisp','Xrot','Yrot','Zrot','CC','Error']
for i in np.arange(1, no_of_label+1, 1):
    info[i,:] = [C[0],C[1],C[2],i,int(round(C[0]-b)),int(round(C[1]-b)),int(round(C[2]-b)),int(round(C[0]+b)),int(round(C[1]+b)),int(round(C[2]+b)),volume,0,0,0,0,0,0,0,0]  # Fills a row with label no., bounding box, and coords
np.savetxt(save_path+Folder+'/Data_'+Folder+'.csv', info, fmt='%10.5f', delimiter=",")
There are other things in the loop, but they are irrelevant; C is an array of floats and b is an int.
I also need to be able to save it as a csv file, as shown in the last line, and open it in Excel.
What I have now returns all the values as integers, when I need C[0], C[1], C[2] to be floating point.
Thanks in advance!
It depends on what you want to do with this array, but I think you want dtype=object instead of np.str. You can do that explicitly by changing np.str to object, or here is how I would write the first part of your code:
import numpy as np

labels = ['Xpos', 'Ypos', 'Zpos', 'NodeNumber', 'BoundingBoxTopX', 'BoundingBoxTopY',
          'BoundingBoxTopZ', 'BoundingBoxBottomX', 'BoundingBoxBottomY', 'BoundingBoxBottomZ',
          'BoxVolume', 'Xdisp', 'Ydisp', 'Zdisp', 'Xrot', 'Yrot', 'Zrot', 'CC', 'Error']
no_of_label = len(labels)
# Make a list of length (no_of_label+1)*19, convert it to an array, and reshape it
info = np.array([None] * ((no_of_label + 1) * 19)).reshape(no_of_label + 1, 19)
info[0] = labels
Again, there is probably a better way of doing this if you have a specific application in mind, but this should let you store different types of data in the same 2D array.
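One caveat, as a sketch: np.savetxt with a numeric format such as '%10.5f' cannot print the header strings sitting in row 0 of the object array, but formatting every cell as text works:
# '%s' formats every cell (strings and numbers alike) as text
np.savetxt('Data.csv', info, fmt='%s', delimiter=',')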
I have solved it as follows:
info = np.zeros(shape=(no_of_label, 19), dtype=float)  # the header row now comes from savetxt, so no extra row is needed
for i in np.arange(1, no_of_label+1, 1):
    info[i-1] = [C[0],C[1],C[2],i,int(round(C[0]-b)),int(round(C[1]-b)),int(round(C[2]-b)),int(round(C[0]+b)),int(round(C[1]+b)),int(round(C[2]+b)),volume,0,0,0,0,0,0,0,0]
np.savetxt(save_path+Folder+'/Data_'+Folder+'.csv', info, fmt='%10.5f', delimiter=",", header='Xpos,Ypos,Zpos,NodeNumber,BoundingBoxTopX,BoundingBoxTopY,BoundingBoxTopZ,BoundingBoxBottomX,BoundingBoxBottomY,BoundingBoxBottomZ,BoxVolume,Xdisp,Ydisp,Zdisp,Xrot,Yrot,Zrot,CC,Error', comments='')
This uses the header argument built into numpy.savetxt. Thanks everyone!

Saving/loading a table (with different column lengths) using numpy

A bit of context: I am writing code to save the data I plot to a text file. This data should be stored in such a way that it can be loaded back by a script and displayed again (this time without performing any calculation). The initial idea was to store the data in columns with the format x1,y1,x2,y2,x3,y3...
I am using code that simplifies to something like this (incidentally, I am not sure whether using a list to group my arrays is the most efficient approach):
import numpy as np
MatrixResults = []
x1 = np.array([1,2,3,4,5,6])
y1 = np.array([7,8,9,10,11,12])
x2 = np.array([0,1,2,3])
y2 = np.array([0,1,4,9])
MatrixResults.append(x1)
MatrixResults.append(y1)
MatrixResults.append(x2)
MatrixResults.append(y2)
MatrixResults = np.array(MatrixResults)
TextFile = open('/Users/UserName/Desktop/Datalog.txt',"w")
np.savetxt(TextFile, np.transpose(MatrixResults))
TextFile.close()
However, this code gives an error when any of the data sets have different lengths. Reading similar questions:
Can numpy.savetxt be used on N-dimensional ndarrays with N>2?
Table, with the different length of columns
However, these require breaking the format (either by flattening the data or by padding the shorter columns with filler values).
My issue summarises as:
1) Is there any method that transposes the arrays and saves each one as a separate consecutive column?
2) Or is there any way to append columns to a text file (given a certain number of rows and columns to skip)?
3) Should I try this with another library such as pandas?
Thank you very much for any advice.
Edit 1:
After looking a bit more, it seems that leaving blank spaces is more inefficient than filling the lists.
In the end I wrote my own padding routine (not sure if there is a numpy function for this) that matches the array lengths with "nan" values.
To get the data back I use the genfromtxt method and then this line:
x = x[~np.isnan(x)]
to remove these cells from the arrays.
If I find a better solution I will post it :)
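For reference, a minimal sketch of that nan-padding approach, using the arrays from the example above:
import numpy as np

x1 = np.array([1, 2, 3, 4, 5, 6])
y1 = np.array([7, 8, 9, 10, 11, 12])
x2 = np.array([0, 1, 2, 3])
y2 = np.array([0, 1, 4, 9])

arrays = [x1, y1, x2, y2]
longest = max(len(a) for a in arrays)
padded = np.full((longest, len(arrays)), np.nan)  # one array per column, nan-filled
for j, a in enumerate(arrays):
    padded[:len(a), j] = a
np.savetxt('Datalog.txt', padded)

# Loading back: read the table and strip the padding per column
table = np.genfromtxt('Datalog.txt')
x2_restored = table[:, 2]
x2_restored = x2_restored[~np.isnan(x2_restored)]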
To save your arrays you can use np.savez and read them back with np.load:
# Write to file: each array is stored under its own key (arr_0, arr_1, ...),
# which also works when the arrays have different lengths
np.savez(filename, *matrixResults)
# Read back
with np.load(filename + '.npz') as data:
    matrixResults = [data[key] for key in sorted(data.files)]
As a side note, you should follow naming conventions, i.e. only class names start with upper-case letters.

calculating means of many matrices in numpy

I have many csv files which each contain roughly identical matrices. Each matrix is 11 columns by either 5 or 6 rows. The columns are variables and the rows are test conditions. Some of the matrices do not contain data for the last test condition, which is why there are 5 rows in some matrices and six rows in other matrices.
My application is in Python 2.6, using numpy and scipy.
My question is this:
How can I most efficiently create a summary matrix that contains the means of each cell across all of the identical matrices?
The summary matrix would have the same structure as all of the other matrices, except that the value in each cell in the summary matrix would be the mean of the values stored in the identical cell across all of the other matrices. If one matrix does not contain data for the last test condition, I want to make sure that its contents are not treated as zeros when the averaging is done. In other words, I want the means of all the non-zero values.
Can anyone show me a brief, flexible way of organizing this code so that it does everything I want with as little code as possible, while remaining flexible enough to re-use later with other data structures?
I know how to pull all the csv files in and how to write output. I just don't know the most efficient way to structure flow of data in the script, including whether to use python arrays or numpy arrays, and how to structure the operations, etc.
I have tried coding this in a number of different ways, but they all seem rather code-intensive and inflexible if I later want to use this code for other data structures.
You could use masked arrays. Say N is the number of csv files. You can store all your data in a masked array A, of shape (N,11,6).
import numpy as np

N = 10                        # number of csv files (example value)
A = np.ma.zeros((N, 11, 6))
A.mask = np.zeros_like(A)     # fills the mask with zeros: nothing is masked
A.mask = (A.data == 0)        # another way of masking: mask all data equal to zero
A.mask[0, 0, 0] = True        # mask a single value
A[1, 2, 3] = 12.              # fill a value, just like a usual array
Then the mean values along the first axis, taking the masked values into account, are given by:
A.mean(axis=0)  # the returned shape is (11, 6)
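Putting it together, a minimal sketch, assuming the file names are collected in a list and each file parses cleanly with np.genfromtxt (shapes here are rows by columns, so 5 or 6 by 11):
import numpy as np

csv_files = ['run1.csv', 'run2.csv', 'run3.csv']  # hypothetical file names
A = np.ma.masked_all((len(csv_files), 6, 11))     # everything masked until filled
for k, fname in enumerate(csv_files):
    m = np.genfromtxt(fname, delimiter=',')
    A[k, :m.shape[0], :] = m                      # a 5-row file leaves its last row masked
means = A.mean(axis=0)                            # per-cell means over unmasked entries only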
