Import matrix from text file using python - python

I have two text files that contain matrices (not numpy matrices, just lists of lists). The matrices were written out as strings, so each text file looks like this:
[[1,2,3],[3,4,5],[6,7,8]],[[3,3,3],[5,6,7],.....
I want to read these matrices back from the text file using Python. I can't read them with numpy, as it raises ValueError: could not convert string to float.
Is there any way to do this? Would it be easier to write the matrix as a numpy matrix in the first place? That would mean changing the code of an earlier program, so I was wondering whether there is a Python way of loading matrices that were stored as strings in a text file.

You could make use of the ast module:
import ast
strArray = "[[1,2,3],[3,4,5],[6,7,8]]"
# evaluates the string as a Python literal and returns a nested Python list
array = ast.literal_eval(strArray)
Note:
When the text holds several top-level matrices separated by commas, as yours does, literal_eval returns a tuple whose elements are the nested lists. Keep that in mind when you use the result.
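As a minimal sketch of the approach above: the text below stands in for the contents of the file (the sample values are an assumption, since the original file is truncated). Wrapping the text in one extra pair of brackets is an optional design choice, not part of the answer; it makes the result a list of matrices whether the file holds one matrix or several.

```python
import ast

import numpy as np

# Stand-in for f.read() on the text file; the values are hypothetical.
text = "[[1,2,3],[3,4,5],[6,7,8]],[[3,3,3],[5,6,7],[7,8,9]]\n"

# Parsed directly, comma-separated top-level matrices come back as a tuple.
parsed = ast.literal_eval(text.strip())

# Wrapping in brackets always yields a list of matrices, even for one matrix.
as_list = ast.literal_eval("[" + text.strip() + "]")

# Convert each nested list to a numpy array if numeric operations are needed.
matrices = [np.array(m) for m in as_list]
```

ast.literal_eval only accepts Python literals (numbers, strings, lists, tuples, dicts and similar), so it will not execute arbitrary code found in the file the way eval would.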

Related

Python: Convert Long Numpy Array Into a Short Sequence of Characters?

I have an image that I converted to a numpy array using OpenCV. I want to copy the array's printed output and assign it to a different variable. The issue is that the printed array is thousands of lines long, arranged vertically. Below is a small screenshot of the output:
My question is: is there a way to print the numpy array so that it runs horizontally instead? Or is there a way to convert the array into a short unique identifier, for example with some bitwise operation?
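No answer is recorded for this question. One possible way to get both things asked for is sketched below; the small array is a hypothetical stand-in for the OpenCV image, and using a SHA-256 digest as the "short identifier" is an assumption about what would be acceptable, not something the question specifies.

```python
import hashlib

import numpy as np

# Hypothetical stand-in for an image loaded with OpenCV (4x4 pixels, 3 channels).
image = np.arange(48, dtype=np.uint8).reshape(4, 4, 3)

# Single-line text form: flatten, convert to a plain list, and stringify.
# The shape has to be stored separately to rebuild the array later.
flat_text = str(image.ravel().tolist())
shape = image.shape

# Short fingerprint of the contents. It identifies the data but cannot be
# turned back into the array.
digest = hashlib.sha256(image.tobytes()).hexdigest()
```

The single-line text can be pasted into another script and turned back into an array with np.array(...).reshape(shape). The digest is only useful for checking whether two arrays hold the same bytes.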

When saving a numpy array of float arrays to a .npy file using numpy.save/numpy.load, is there any reason why the order of the arrays would change?

I currently have data where each row has a text passage and a numpy float array.
As far as I know, it is not efficient to save these two datatypes in one data format (correct me if I am wrong). So I am going to save them separately, with another column of ints that will be used to map the two datasets together when I want to join them again.
I am having trouble figuring out how to append a column of ints next to the float arrays (if anyone has a solution to that I would love to hear it) and then save the numpy array.
But then I realized I can just save the float arrays as is with numpy.save without the extra int column if I can get a confirmation that numpy.save and numpy.load will never change the order of the arrays.
That way I can just append the loaded numpy float arrays to the pandas dataframe as is.
Logically, I don't see any reason why the order of the rows would change, but perhaps there is some optimization or compression step that I am unaware of.
Would numpy.save or numpy.load ever change the order of a numpy array of float arrays?
numpy.save and numpy.load do not change the order. The numpy object is saved as is, and an array is an ordered object.
Note: if you want to save multiple data arrays to the same file, you can use np.savez.
>>> np.savez('out.npz', f=array_of_floats, s=array_of_strings)
You can retrieve back each with the following:
>>> data = np.load('out.npz')
>>> array_of_floats = data['f']
>>> array_of_strings = data['s']
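The ordering claim can be checked with a quick round trip. This is only an illustrative sketch: the array values and the temporary file name are made up for the example.

```python
import os
import tempfile

import numpy as np

# Hypothetical data: three rows of floats in a known order.
array_of_floats = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "floats.npy")
    np.save(path, array_of_floats)   # write the array as is
    loaded = np.load(path)           # read it back

# True when shape and every element match, row for row.
same_order = np.array_equal(loaded, array_of_floats)
```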

Converting python Dataframe to Matlab file

I am trying to convert a python Dataframe to a Matlab (.mat) file.
I start from a txt file (an EEG signal) that I import using pandas.read_csv:
MyDataFrame = pd.read_csv("data.txt",sep=';',decimal='.'), where data.txt is a 2D array with labels. This creates a dataframe which looks like this.
To convert it to .mat, I tried this solution, where the idea is to convert the dataframe into a dictionary of lists, but after trying every variant of it I was still unsuccessful.
scipy.io.savemat('EEG_data.mat', {'struct':MyDataFrame.to_dict("list")})
It did create a .mat file but it did not save my dataframe properly. The file I obtain after looks like this, so all the values are basically gone, and the remaining labels you see are empty when you look into them.
I also tried using mat4py which is designed to export python structures into Matlab files, but it did not work either. I don't understand why, because converting my dataframe to a dictionary of lists is exactly what should be done according to the mat4py documentation.
I believe that the reason the previous solutions haven't worked for you is that your DataFrame column names are not valid MATLAB struct field names, because they contain spaces and/or start with digit characters.
When I do:
import pandas as pd
import scipy.io
MyDataFrame = pd.read_csv('eeg.txt',sep=';',decimal='.')
truncDataFrame = MyDataFrame[0:1000] # reduce data size for test purposes
scipy.io.savemat('EEGdata1.mat', {'struct1':truncDataFrame.to_dict("list")})
the result in MATLAB is a struct with the 4 fields reltime, datetime, iSensor and quality. Each of these has 1000 elements, so the data from these columns has been converted, but the rest of your data is missing.
However if I first rename the DataFrame columns:
truncDataFrame = truncDataFrame.rename(columns=lambda x: 'col_' + x.replace(' ', '_'))
scipy.io.savemat('EEGdata2.mat', {'struct2':truncDataFrame.to_dict("list")})
the result in MATLAB is a struct with 36 fields. This is not the same format as your mat4py solution but it does contain (as far as I can see) all the data from the source DataFrame.
(Note that in your question, you are creating a .mat file that contains a variable called struct and when this is loaded into MATLAB it masks the builtin struct datatype - that might also cause issues with subsequent MATLAB code.)
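The renaming step above could be generalized along the following lines. This is a sketch, not the answerer's code: the helper name to_matlab_field_name and the sample column names are hypothetical, and it relies on the rule that MATLAB field names must start with a letter and contain only letters, digits and underscores.

```python
import re


def to_matlab_field_name(column_name, prefix="col_"):
    """Return a version of column_name usable as a MATLAB struct field name.

    Hypothetical helper: replaces every character other than ASCII letters,
    digits and underscores with "_", then adds a prefix if the result does
    not start with a letter.
    """
    cleaned = re.sub(r"\W", "_", str(column_name), flags=re.ASCII)
    if not re.match(r"[A-Za-z]", cleaned):
        cleaned = prefix + cleaned
    return cleaned


# Hypothetical column labels similar to the ones described in the answer.
columns = ["reltime", "1 Fp1", "signal (uV)"]
renamed = [to_matlab_field_name(c) for c in columns]
```

Because DataFrame.rename accepts a callable, the helper can be passed as MyDataFrame.rename(columns=to_matlab_field_name). Unlike the lambda above, it prefixes only the names that need it. Two different labels can still collapse to the same field name, so the result is worth checking for duplicates before saving.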
I finally found a solution thanks to this post. There, the poster did not create a dictionary of lists but a dictionary of integers, which worked on my side. It is a small example, easily reproducible. Then I tried to manually add lists by entering values like [1, 2], and it did not work. What did work was manually adding tuples!
MyDataFrame needs to be converted to a dictionary, and if a dictionary of lists doesn't work, try tuples.
For beginners: lists are delimited by [] and tuples by (). Here is an image showing both.
This worked for me:
import mat4py as mp
EEGdata = MyDataFrame.apply(tuple).to_dict()
mp.savemat('EEGdata.mat',{'structs': EEGdata})
EEGdata.mat should now be readable by Matlab, as it is on my side.

How to define multidimensional matrix with strings in Python?

I wish to store strings in a multidimensional array. I tried using the numpy package with the following line:
co_entity = np.zeros((5000,4))
However, I need to store strings in it later on, and this matrix cannot hold strings because its elements are floats. I tried using a list to store the strings, but since the number of inputs is dynamic, I have to use a multidimensional array with an upper limit.
Any ideas for this?
You could try the object dtype with the empty() function, like so:
co_entity = np.empty((5000,4), dtype='object')
This will allow you to store a string in each of the elements generated.
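A short sketch of how such an array behaves, with a fixed-width string dtype shown for contrast. The sample strings and the "U5" width are arbitrary choices for illustration.

```python
import numpy as np

# Object array: every cell starts as None and can hold a string of any length.
co_entity = np.empty((5000, 4), dtype=object)
co_entity[0, 0] = "first entity"
co_entity[0, 1] = "a much longer string that still fits without truncation"

# For contrast: a fixed-width unicode dtype silently truncates longer strings.
fixed = np.zeros((2, 2), dtype="U5")
fixed[0, 0] = "abcdefgh"   # stored as "abcde"
```

Cells that are never assigned stay None, which makes it easy to tell which rows of the preallocated array have been filled.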

numpy array values to be converted from string to float?

I have a dataset like the one shown below
http://i.stack.imgur.com/1uxCK.png
I am able to read the data into a numpy array, but the values come out as strings when read from the CSV file. I have been unable to convert them to float, and without that I cannot proceed. Note that there are blank spaces between the two data columns shown in the first screenshot.
When printed, the numpy array looks like the screenshot below:
http://i.stack.imgur.com/JFfzw.png
Note: the single quotation marks at the start and end of each data line in the screenshot show that numpy has stored the data as strings rather than floats.
Any help converting the data from string to float would be appreciated. I have tried many things, all in vain.
numpy.loadtxt(filename) should work out of the box: it yields numbers.
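As a hedged illustration, assuming the file holds two whitespace-separated numeric columns (the screenshots are not available, so the sample values below are made up):

```python
import io

import numpy as np

# Stand-in for the data file: two columns separated by blank spaces.
raw = io.StringIO("1.5   2.0\n3.25  4.0\n")

# loadtxt splits on whitespace by default and returns floats.
data = np.loadtxt(raw)

# If the values are already in a string array, astype converts them.
strings = np.array(["1.5", "2.0"])
floats = strings.astype(float)
```

With a real file, the file name is passed in place of the StringIO object, as in the answer above.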
