Conversion of string to float - python

I read my data from the following file and create a table.
tracks=pd.read_csv('C:\\Users\\demet\\Desktop\\Internship\\scripts\\tracks-rainy.csv')
Yet when I print an element, for instance, I get a string instead of a float.
print(tracks.iloc[track_id][3][0])
What should I add to my project?

You can try:
tracks=pd.read_csv('C:\\Users\\demet\\Desktop\\Internship\\scripts\\tracks-rainy.csv', dtype={'track_id':'Float64'})
This tells pandas to interpret the column as a float (as Karl Knechtel said).

If you do not want to do the conversion when reading the csv file, you can always use a list comprehension with a float conversion afterwards.
tracks['track_id'] = [float(i) for i in tracks['track_id']]
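Both approaches above can be sketched on a small in-memory file. This is only an illustration: the CSV text and the `speed` column are made up, and `pd.to_numeric` with `errors="coerce"` is swapped in for the list comprehension because it turns unparseable cells into NaN instead of raising.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file; "speed" has one non-numeric cell.
csv_text = "track_id,speed\n1,3.5\n2,unknown\n3,4.25\n"

# Ask pandas for a float column while reading.
tracks = pd.read_csv(io.StringIO(csv_text), dtype={"track_id": "float64"})

# "speed" was read as strings because of "unknown";
# convert afterwards and turn cells that cannot be parsed into NaN.
tracks["speed"] = pd.to_numeric(tracks["speed"], errors="coerce")

print(tracks.dtypes)
```

If a column still comes back as strings after this, it is worth printing a few raw cells to see what non-numeric text they contain.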

Related

Convert column in df to float (sounds simple)

I am fairly new to coding and have managed to solve most problems, but here I am stuck. I have a column in a df where every value is a string in brackets, for example '[0.0987]', and I can't seem to convert these to float in order to calculate the mean. Every method results in an error such as 'could not convert string to float:' or 'Could not convert to numeric'. I can't share a link, so the image below shows an example csv I am loading into pandas.
You have to strip the brackets from the values.
df["qout"].str.strip('[]').astype(float)
strip - removes the [ and ] characters from each value in the column
astype - casts the resulting strings to float
You probably have to strip the brackets. Does this work? qout_as_float = float(qout[1:-1])
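A minimal sketch of the bracket-stripping approach, using made-up values in the same '[0.0987]' shape as the question. Note that the converted column has to be assigned back (or used directly) for the mean to work.

```python
import pandas as pd

# Made-up sample data in the same shape as the question's column.
df = pd.DataFrame({"qout": ["[0.0987]", "[0.1013]", "[0.2000]"]})

# Remove the surrounding brackets, cast to float, and store the result.
df["qout"] = df["qout"].str.strip("[]").astype(float)

mean_qout = df["qout"].mean()
print(mean_qout)
```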

Reading a csv persisted list of floats back into a list of floats

I have persisted a list of floats in a csv file and it appears thus (a single row).
"[6.61501123e-04 1.23390303e-04 1.59454121e-03 2.17852772e-02
:
3.02987776e-04 3.83064064e-03 6.90607396e-04 3.30468375e-03
2.78064613e-02]"
Now, when reading it back into a list, I am using the ast literal_eval approach:
probs = [float(p) for p in ast.literal_eval(row['prob_array'])]
And I get this error:
probs = [float(p) for p in ast.literal_eval(row['prob_array'])]
File "/Users/santino/anaconda/lib/python2.7/ast.py", line 49, in literal_eval
node_or_string = parse(node_or_string, mode='eval')
File "/Users/santino/anaconda/lib/python2.7/ast.py", line 37, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
File "<unknown>", line 1
[6.61501123e-04 1.23390303e-04 1.59454121e-03 2.17852772e-02
^
SyntaxError: invalid syntax
I'm not sure how I can instruct ast to read the exponent syntax. Or am I wrong in assuming that it's the exponent syntax that is causing the exception?
Edit: I used csv.DictWriter to persist into the csv file. Is there a different way I should be persisting?
Edit2:
with open("./input_file.csv","w") as r:
    writer = csv.DictWriter(r,fieldnames=["item_id","item_name","prob_array"])
    writer.writeheader()
    res_list = ...
    for i,res in enumerate(res_list):
        row_dict = {}
        row_dict['item_id'] = id_list[i]
        row_dict['prob_array'] = res
        row_dict['item_name'] = item_list[i]
        writer.writerow(row_dict)
CSV only stores string columns. Using it to store strings, ints, floats, and a few other basic types is fine, as long as you manually convert the objects: whenever you do str(i) to an int, you can get the int back with int(s).
But that isn't true for a list of floats. There's no function you can use to get back the result of str(lst) on an arbitrary list. [1] And it isn't true for… whatever you have, which seems to be most likely a numpy array or Pandas Series… either. [2]
If you can store each float as a separate column, instead of storing a list of them in a single column, that's the easiest answer. But it may not be appropriate. [3]
So, you just need to pick some other function to use in place of the implicit str, which can be reversed with a simple function call. There are formats designed for persisting data to strings—JSON, XML, even a nested CSV—so that's the first place to look.
Usually JSON should be the first one you look at. As long as it can handle all of your data (and it definitely can here), it's dead simple to use, someone's already thought through all the annoying edge cases, and there's code to parse it for every platform in the universe.
So, you write the value like this:
row_dict['prob_array'] = json.dumps(res)
And then you can read it back like this:
prob_array = json.loads(row['prob_array'])
If prob_array is actually a numpy array or Pandas Series or something rather than a list, you'll want to either convert through list, or use the numpy or Pandas JSON methods instead of the stdlib module.
The only real problem here is that if you want the CSV to be human-readable/editable, the escaped commas and quotes could be pretty ugly.
In this case, you can define a simpler format that's still easy to write and parse for your specific data, and also more human-readable, like just space-separated floats:
row_dict['prob_array'] = ' '.join(map(str, res))
prob_array = [float(val) for val in row['prob_array'].split()]
[1] Sometimes you can use ast.literal_eval, but relying on that is never a good idea, and it isn't working here.
[2] The human-readable format used by numpy and Pandas is even less parser-friendly than the one used by Python lists. You could switch to their repr instead of their str, but that still isn't going to work with ast.literal_eval.
[3] For an obvious example, imagine a table with two different arbitrary-length lists…
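The JSON round trip described in this answer can be sketched end to end as follows. It assumes the value is a numpy array (the question does not confirm this), uses an in-memory buffer in place of the real file, and keeps only two of the question's columns. The last lines show one possible way to recover a row that was already written in numpy's str format; that part is an addition for illustration, not something from the answer.

```python
import csv
import io
import json

import numpy as np

# Assumed input: a numpy array of probabilities (values taken from the question).
res = np.array([6.61501123e-04, 1.23390303e-04, 2.78064613e-02])

# Write: convert through list so the stdlib json module can serialize it.
buf = io.StringIO()  # stands in for the real CSV file
writer = csv.DictWriter(buf, fieldnames=["item_id", "prob_array"])
writer.writeheader()
writer.writerow({"item_id": 1, "prob_array": json.dumps(res.tolist())})

# Read: json.loads reverses json.dumps and returns a list of floats.
buf.seek(0)
rows = list(csv.DictReader(buf))
probs = json.loads(rows[0]["prob_array"])

# Recovering a cell that was already saved via str() of a numpy array:
legacy = "[6.61501123e-04 1.23390303e-04\n 2.78064613e-02]"
legacy_probs = [float(x) for x in legacy.strip("[]").split()]
```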

Python Pandas, convert iloc slice to float

I use
Report.iloc[:,6:] = Report.iloc[:,6:].astype(float)
to convert those columns to float.
The problem is that it doesn't change Report when used in my code file. But when I use it in the console it works.
Report.iloc[:,6:] = Report.iloc[:,6:].astype(float)
Report.info()
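One possible cause, depending on the pandas version: assigning through iloc writes the new values into the existing columns and may keep their original dtype. A sketch of an alternative, assuming a small made-up Report frame, is to select the columns by label and assign them back, which replaces the columns together with their dtype.

```python
import pandas as pd

# Made-up stand-in for Report; the numeric columns start out as strings.
Report = pd.DataFrame({
    "name": ["a", "b"],
    "x": ["1.5", "2"],
    "y": ["3", "4.25"],
})

# The question converts from position 6 onwards; this toy frame starts at 1.
cols = list(Report.columns[1:])

# Assigning by column label replaces the columns, so the float dtype sticks.
Report[cols] = Report[cols].astype(float)

Report.info()
```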

How can I read every field as string in xlwings?

I have an Excel file that I want to convert, but the default type for numbers is float. How can I change it so that xlwings explicitly uses strings and not numbers?
This is how I read the value of a field:
xw.Range(sheet, fieldname ).value
The problem is that numbers like 40 get converted to 40.0 if I create a string from that. I strip it with: str(xw.Range(sheetFronius, fieldname ).value).rstrip('0').rstrip('.') but that is not very helpful and leads to errors because sometimes the same field can contain both a number and a string. (Not at the same time, the value is chosen from a list)
With xlwings, if no options are set during reading/writing operations, cells containing numbers are read in as 'floats' by default. I scoured the docs, but I don't think you can convert a cell that has numbers to a 'string' via xlwings outright. Fortunately all is not lost...
You could read in the cells as 'int' with xlwings and then convert the 'int' to 'string' in Python. The way to do that is as follows:
xw.Range(sheet, fieldname).options(numbers=int).value
Then you would just convert that to a string in Python in the usual way.
Alternatively, you can pack the string conversion into the options up front and read in your data this way:
xw.Range(sheet, fieldname).options(numbers=lambda x: str(int(x))).value
Good luck!
In my case, the solution was simply to add one extra row after the last row of the raw data.
Write any text in the column you want to read as str, save, load, and then delete that last line.
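Because the same field can hold either a number or a string, another option is to leave the xlwings read unchanged and normalize the value in plain Python afterwards. The helper below, `cell_to_str`, is hypothetical (it is not part of xlwings) and only assumes that numeric cells arrive as floats and empty cells as None.

```python
def cell_to_str(value):
    """Convert a cell value to a string without a spurious '.0' suffix."""
    if value is None:
        # Assumed: an empty cell is read as None.
        return ""
    if isinstance(value, float) and value.is_integer():
        # Whole numbers such as 40.0 become "40".
        return str(int(value))
    # Strings and non-integer numbers are kept as they are.
    return str(value)


print(cell_to_str(40.0))
print(cell_to_str(40.5))
print(cell_to_str("manual"))
```

Unlike chaining rstrip('0').rstrip('.'), this does not mangle strings that happen to end in zeros or dots.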

python data types

I wrote a script that takes files of column data and plots whichever column the user wants to view. Well, I noticed that the plots look crazy and have all the wrong numbers, because Python is ignoring the exponent.
My numbers are in the format: 1.000000E+1 OR 1.000000E-1
What dtype is that? I am using numpy.genfromtxt to import with a dtype = float. I know there are all sorts of dtypes you can enter, but I cannot find a comprehensive list of the options, and examples.
Thanks.
Here is an example of my input (those spaces are tabs):
Time Stamp    T1_ModBt       T2_90Bend      T3_InPE        T5_Stg2Rfrg
5:22 AM       2.115800E+2    1.400000E+0    1.400000E+0    3.035100E+1
5:23 AM       2.094300E+2    1.400000E+0    1.400000E+0    3.034800E+1
5:24 AM       2.079300E+2    1.400000E+0    1.400000E+0    3.031300E+1
5:25 AM       2.069500E+2    1.400000E+0    1.400000E+0    3.031400E+1
5:26 AM       2.052600E+2    1.400000E+0    1.400000E+0    3.030400E+1
5:27 AM       2.040700E+2    1.400000E+0    1.400000E+0    3.029100E+1
Update
I figured out at least part of the reason why what I am doing does not work. Still do not know how to define dtypes the way I want to.
import numpy as np
file = np.genfromtxt('myfile.txt', usecols = (0,1), dtype = (str, float), delimiter = '\t')
That returns an array of strings for each column. How do I tell it I want column 0 to be a str, and all the rest of the columns to be float?
In [55]: type(1.000000E+1)
Out[55]: <type 'float'>
What does your input data look like? It's quite possible that it's in the wrong input format, but it should also be fairly easy to convert it to the right format.
Numbers in the form 1.0000E+1 can be parsed by float(), so I'm not sure what the problem is:
>>> float('1.000E+1')
10.0
I think you'll want to get a text parser to parse the format into a native Python data type.
For example, 1.00000E+1 means 1.0 × 10^1, i.e. 10.0, which can be expressed as a float.
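For the open question of reading column 0 as a string and the rest as floats, one option is to give genfromtxt a structured dtype with one entry per column. The sketch below uses an in-memory copy of the first three columns of the sample data; the field names `time`, `t1` and `t2` are made up for illustration.

```python
import io

import numpy as np

# Tab-separated sample modelled on the question's data (first three columns).
text = (
    "Time Stamp\tT1_ModBt\tT2_90Bend\n"
    "5:22 AM\t2.115800E+2\t1.400000E+0\n"
    "5:23 AM\t2.094300E+2\t1.400000E+0\n"
)

data = np.genfromtxt(
    io.StringIO(text),   # a file name would go here instead
    delimiter="\t",
    skip_header=1,       # the header row is not data
    dtype=[("time", "U8"), ("t1", float), ("t2", float)],
    encoding="utf-8",
)

print(data["time"])  # string column
print(data["t1"])    # float column, exponent notation already parsed
```

The result is a structured array, so each column is accessed by field name rather than by position.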
