numpy array values to be converted from string to float? - python

I have a dataset like the one shown below
http://i.stack.imgur.com/1uxCK.png
I am able to read them into a numpy array, but the values come out of the CSV file as strings. I am unable to convert them to float, and without that I cannot proceed further. Note that there are blank spaces between the two data columns shown in the first screenshot.
The numpy array structure when printed looks like in the screenshot given below:
http://i.stack.imgur.com/JFfzw.png
Note: observe the single quotation marks at the start and end of each data line in the screenshot, which show that numpy has stored the data as strings rather than floats.
Any help converting the data from string to float would be appreciated. I have tried many things, but all in vain.

numpy.loadtxt(filename) should work out of the box: it yields numbers.
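To illustrate the answer above, here is a minimal sketch. The two-column sample text is made up (the real file is only visible in the screenshot), and it assumes the columns are separated by whitespace, which is what numpy.loadtxt splits on by default. The second part shows astype(float) for an array that has already been read in as strings.

```python
import io
import numpy as np

# Hypothetical stand-in for the CSV file: two columns separated by blanks.
sample = io.StringIO("1.5   2.0\n3.25  4.0\n")

# loadtxt splits on whitespace by default and returns float64 values.
values = np.loadtxt(sample)

# If the data is already in memory as strings, astype(float) converts it.
already_strings = np.array(["1.5", "2.0"])
as_float = already_strings.astype(float)
```

If the real file uses commas instead of blanks, passing delimiter="," to loadtxt should do the same job.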

Related

Getting three dots in the CSV

I've saved some data of mine in a csv file using pandas (from a dict) and if I'm looking at it or printing it I'm getting 3 dots in the middle of the information. I think it might be because the string is too long.
Example:
[-1.19583108e-02, 7.44251342e-03, -1.35046719e-02, ..., 1.01258847e-03, -4.75816538e-03, 1.09870630e-02]
When it should've been about 300 different numbers.
Is there any solution?
Explanation:
Let's say I have a numpy array of 300 entries (we'll call it arr).
I want to store this array in a csv file under the header of test.
So I read the csv file (using the pd.read_csv function) and try to get this array by using: df['test'].iloc[0]. Even when I use the commands that were suggested in the answers, I still get dots (I think because it was saved this way). What I actually want to do is to eval this string to get an actual numpy array and use it as an array, but what I get instead is this:
I figured the ellipsis object is the 3 dots I don't want to get.
Just add the following in the beginning of your code:
import sys
import numpy
# Print every element instead of abbreviating long arrays with "..."
numpy.set_printoptions(threshold=sys.maxsize)
Edit:
Try:
df.loc[df[0] != ...]
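If the dots are already in the saved file, changing the print options when reading will not bring the numbers back, so the fix has to happen at write time. One option, sketched below under the assumption that you control how the column is written, is to store the array as a JSON list instead of its printed form. The array here is made up, and json is used instead of eval for parsing the string back.

```python
import json
import numpy as np

# Made-up array, long enough that numpy abbreviates its printed form.
arr = np.linspace(0.0, 1.0, 2000)

# str(arr) is what ends up in the CSV if the array is written as-is.
abbreviated = str(arr)

# Serialise every element explicitly, then parse it back without eval.
text = json.dumps(arr.tolist())
restored = np.array(json.loads(text))
```

The text string can be stored in a DataFrame cell and written with to_csv like any other string.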

Save float and int to file simultaneously in Python

I have a numpy array with a few columns, containing floats, and I want to add one more column, containing only zeros, and save this to file. For the program I need to use this file with, the last column should appear as 0 instead of 0.00000e+00. I tried this:
z = np.zeros(len(data[:,0])).astype(int)
new_data = np.column_stack((data,z))
np.savetxt("data_new.dat",new_data)
but it doesn't seem to work i.e. the zeros appear as floats. One more thing, how can I also specify the number of decimals that the floats should be saved with to the file? Thank you!
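The zeros come out as floats because np.column_stack produces a single float array, so the int cast is lost. A possible way around it, sketched here with made-up data, is the fmt argument of np.savetxt, which accepts one format string per column. That also covers the number of decimals. The sketch writes to an in-memory buffer; a filename such as "data_new.dat" works the same way.

```python
import io
import numpy as np

# Made-up float data with two columns.
data = np.array([[1.5, 2.25], [3.0, 4.125]])

# Append a column of zeros; the stacked array is float throughout.
z = np.zeros(len(data))
new_data = np.column_stack((data, z))

# One format per column: 3 decimals for the floats, integer for the zeros.
fmt = ["%.3e"] * data.shape[1] + ["%d"]

buf = io.StringIO()
np.savetxt(buf, new_data, fmt=fmt)
text = buf.getvalue()
```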

Convert column in df to float (sounds simple)

Am fairly new to code but have managed to solve most problems, here though I am stuck. I have a column in a df where all the values are a string in brackets for example '[0.0987]', I can't seem to convert these to float in order to calculate the mean. Every method results in an error such as: 'could not convert string to float:' or 'Could not convert to numeric'. Can't share a link so image below shows an example csv I am loading into pandas.
You have to strip the brackets from the values.
df["qout"].str.strip('[]').astype(float)
Strip - will remove the [] from the column
astype - Will typecast the data as float
You probably have to strip the brackets. Does this work? qout_as_float = float(qout[1:-1])
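Putting the two answers together, here is a small end-to-end sketch with a made-up qout column. It uses pd.to_numeric with errors="coerce" instead of astype(float), on the assumption that some rows might be empty brackets or otherwise malformed. Those become NaN, and mean() skips them.

```python
import pandas as pd

# Made-up column in the same shape as the question: floats wrapped in brackets.
df = pd.DataFrame({"qout": ["[0.0987]", "[0.1013]", "[]"]})

# Remove the surrounding brackets, then convert; bad values become NaN.
stripped = df["qout"].str.strip("[]")
df["qout"] = pd.to_numeric(stripped, errors="coerce")

# mean() ignores NaN by default.
mean_qout = df["qout"].mean()
```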

Converting python Dataframe to Matlab file

I am trying to convert a python Dataframe to a Matlab (.mat) file.
I initially have a txt file (EEG signal) that I import using pandas.read_csv:
MyDataFrame = pd.read_csv("data.txt",sep=';',decimal='.'), data.txt being a 2D array with labels. This creates a dataframe which looks like this.
In order to convert it to .mat, I tried this solution where the idea is to convert the dataframe into a dictionary of lists but after trying every aspect of this solution it's still unsuccessful.
scipy.io.savemat('EEG_data.mat', {'struct':MyDataFrame.to_dict("list")})
It did create a .mat file but it did not save my dataframe properly. The file I obtain after looks like this, so all the values are basically gone, and the remaining labels you see are empty when you look into them.
I also tried using mat4py which is designed to export python structures into Matlab files, but it did not work either. I don't understand why, because converting my dataframe to a dictionary of lists is exactly what should be done according to the mat4py documentation.
I believe that the reason the previous solutions haven't worked for you is that your DataFrame column names are not valid MATLAB struct field names, because they contain spaces and/or start with digit characters.
When I do:
import pandas as pd
import scipy.io
MyDataFrame = pd.read_csv('eeg.txt',sep=';',decimal='.')
truncDataFrame = MyDataFrame[0:1000] # reduce data size for test purposes
scipy.io.savemat('EEGdata1.mat', {'struct1':truncDataFrame.to_dict("list")})
the result in MATLAB is a struct with the 4 fields reltime, datetime, iSensor and quality. Each of these has 1000 elements, so the data from these columns has been converted, but the rest of your data is missing.
However if I first rename the DataFrame columns:
# Assign the result instead of using inplace=True, since truncDataFrame is a slice
truncDataFrame = truncDataFrame.rename(columns=lambda x: 'col_' + x.replace(' ', '_'))
scipy.io.savemat('EEGdata2.mat', {'struct2':truncDataFrame.to_dict("list")})
the result in MATLAB is a struct with 36 fields. This is not the same format as your mat4py solution but it does contain (as far as I can see) all the data from the source DataFrame.
(Note that in your question, you are creating a .mat file that contains a variable called struct and when this is loaded into MATLAB it masks the builtin struct datatype - that might also cause issues with subsequent MATLAB code.)
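The renaming idea above can be made a bit more general. The helper below is a hypothetical sketch (the function name and the col_ prefix are illustrative choices). It assumes the usual MATLAB rules for field names: start with a letter, contain only letters, digits and underscores, and stay within 63 characters.

```python
import re

def to_matlab_field(name, prefix="col_"):
    """Turn an arbitrary column label into a plausible MATLAB field name."""
    # Replace anything that is not an ASCII letter, digit or underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", str(name).strip())
    # Field names must start with a letter, so prefix the rest.
    if not re.match(r"[A-Za-z]", cleaned):
        cleaned = prefix + cleaned
    # Truncate to the assumed 63-character limit.
    return cleaned[:63]

examples = [to_matlab_field(n) for n in ["reltime", "1 Hz power", "Fp1-F7"]]
```

It can be passed straight to rename, e.g. MyDataFrame.rename(columns=to_matlab_field). Note that truncation or replacement could make two different labels collide, so check for duplicate names afterwards.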
I finally found a solution thanks to this post. There, the poster did not create a dictionary of lists but a dictionary of integers, which worked on my side. It is a small example, easily reproducible. Then I tried to manually add lists by entering values like [1, 2], and it did not work. But what worked was manually adding tuples!
MyDataFrame needs to be converted to a dictionary, and if a dictionary of lists doesn't work, try with tuples.
For beginners: lists are written with [] and tuples with ().
This worked for me:
import mat4py as mp
EEGdata = MyDataFrame.apply(tuple).to_dict()
mp.savemat('EEGdata.mat',{'structs': EEGdata})
EEGdata.mat should now be readable by Matlab, as it is on my side.

python data types

I wrote a script to take files of data that is in columns and plot it depending on which column the user wants to view. Well, I noticed that the plots look crazy, and have all the wrong numbers because python is ignoring the exponential.
My numbers are in the format: 1.000000E+1 OR 1.000000E-1
What dtype is that? I am using numpy.genfromtxt to import with a dtype = float. I know there are all sorts of dtypes you can enter, but I cannot find a comprehensive list of the options, and examples.
Thanks.
Here is an example of my input (those spaces are tabs):
Time Stamp    T1_ModBt    T2_90Bend    T3_InPE    T5_Stg2Rfrg
5:22 AM    2.115800E+2    1.400000E+0    1.400000E+0    3.035100E+1
5:23 AM    2.094300E+2    1.400000E+0    1.400000E+0    3.034800E+1
5:24 AM    2.079300E+2    1.400000E+0    1.400000E+0    3.031300E+1
5:25 AM    2.069500E+2    1.400000E+0    1.400000E+0    3.031400E+1
5:26 AM    2.052600E+2    1.400000E+0    1.400000E+0    3.030400E+1
5:27 AM    2.040700E+2    1.400000E+0    1.400000E+0    3.029100E+1
Update
I figured out at least part of the reason why what I am doing does not work. Still do not know how to define dtypes the way I want to.
import numpy as np
file = np.genfromtxt('myfile.txt', usecols = (0,1), dtype = (str, float), delimiter = '\t')
That returns an array of strings for each column. How do I tell it I want column 0 to be a str, and all the rest of the columns to be float?
In [55]: type(1.000000E+1)
Out[55]: <type 'float'>
What does your input data look like? It's quite possible that it's in the wrong input format, but it should be fairly easy to convert it to the right format.
Numbers in the form 1.0000E+1 can be parsed by float(), so I'm not sure what the problem is:
>>> float('1.000E+1')
10.0
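For the update in the question (column 0 as str, the rest as float), one possible approach is to let numpy.genfromtxt pick a type per column with dtype=None. The sketch below uses a shortened, tab-separated copy of the sample data in an in-memory buffer. The field names f0, f1, ... are what genfromtxt assigns by default when the header row is skipped.

```python
import io
import numpy as np

# Shortened copy of the sample input, with real tab separators.
raw = (
    "Time Stamp\tT1_ModBt\tT2_90Bend\n"
    "5:22 AM\t2.115800E+2\t1.400000E+0\n"
    "5:23 AM\t2.094300E+2\t1.400000E+0\n"
)

# dtype=None infers a type per column and returns a structured array.
data = np.genfromtxt(
    io.StringIO(raw),
    delimiter="\t",
    skip_header=1,
    dtype=None,
    encoding="utf-8",
)

times = data["f0"]      # strings such as '5:22 AM'
t1_modbt = data["f1"]   # floats parsed from the E notation
```

The E notation needs no special dtype: 2.115800E+2 is read as the float 211.58.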
I think you'll want a text parser to turn the format into a native Python data type.
For example, 1.00000E+1 means 1.0 × 10^1, i.e. 10.0, which is an ordinary float.
