I'm fairly new to coding and have managed to solve most problems so far, but here I am stuck. I have a column in a DataFrame where every value is a string in brackets, for example '[0.0987]'. I can't seem to convert these to float in order to calculate the mean. Every method results in an error such as 'could not convert string to float:' or 'Could not convert to numeric'. I can't share a link, so the image below shows an example CSV I am loading into pandas.
You have to strip the brackets from the values.
df["qout"].str.strip('[]').astype(float)
strip - removes the [ and ] characters from both ends of each value in the column
astype - casts the stripped values to float
You probably have to strip the brackets. Does this work? qout_as_float = float(qout[1:-1])
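Putting the two steps together, here is a minimal sketch of the whole round trip up to the mean. The sample values are made up for illustration (only '[0.0987]' comes from the question), and "qout" is the column name assumed in the answer above.

```python
import pandas as pd

# Hypothetical sample data; only the first value appears in the question.
df = pd.DataFrame({"qout": ["[0.0987]", "[0.1013]", "[0.2000]"]})

# Strip the brackets from both ends, then cast the remaining text to float.
# The result is assigned back, because str.strip/astype return a new Series.
df["qout"] = df["qout"].str.strip("[]").astype(float)

# With a float column, the mean can now be computed.
mean_qout = df["qout"].mean()
print(mean_qout)
```

Note that the expression on its own does not modify the DataFrame, so the assignment back to the column is what makes the conversion stick.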
I read from the following file my data, and create a table.
tracks=pd.read_csv('C:\\Users\\demet\\Desktop\\Internship\\scripts\\tracks-rainy.csv')
Yet when I print an element, for instance, I get a string instead of a float.
print(tracks.iloc[track_id][3][0])
What should I add to my project?
You can try:
tracks=pd.read_csv('C:\\Users\\demet\\Desktop\\Internship\\scripts\\tracks-rainy.csv', dtype={'track_id':'Float64'})
This tells pandas to interpret the column as float (as Karl Knechtel said).
If you do not want to do the conversion when reading the CSV file, you can always use a list comprehension with a float conversion:
tracks['track_id'] = [float(i) for i in tracks['track_id']]
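Both approaches can be tried on an in-memory file, as in the rough sketch below. The CSV text and the "speed" column are invented stand-ins, since the real contents of tracks-rainy.csv are not shown. The second step uses pd.to_numeric, which is a swapped-in alternative to the list comprehension rather than what the answer proposed; with errors="coerce" it turns unparseable text into NaN instead of raising.

```python
import io

import pandas as pd

# Hypothetical stand-in for tracks-rainy.csv (the real columns are unknown).
csv_text = "track_id,speed\n1,12.5\n2,bad\n3,14.0\n"

# Option 1: ask for a float dtype while reading, as in the answer above.
tracks = pd.read_csv(io.StringIO(csv_text), dtype={"track_id": "Float64"})

# Option 2: convert after loading. "speed" was read as text because of the
# stray value "bad"; errors="coerce" replaces such values with NaN.
tracks["speed"] = pd.to_numeric(tracks["speed"], errors="coerce")

print(tracks.dtypes)
print(tracks)
```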
I am doing some image processing in Python and need to crop an area of the image. However, my pixel coordinate data is arranged as three values in one Excel column separated by commas, as follows:
[1345.83,1738,44.26] (i.e. [x,y,r]) - this is exactly how it appears in the Excel cell, square brackets and all.
Any idea how I can read this into my script and start cropping images according to the pixel coordinate values? Is there a function that can separate them and treat them as three independent values?
Thanks, Rhod
My understanding is that if you use pandas.read_excel(), you will get a column of strings in this situation. There are lots of options, but here is what I would do, assuming your column name is xyr:
# clean up strings to remove the square brackets on either side
data['xyr_clean'] = data['xyr'].str.lstrip('[').str.rstrip(']')
# split on ',' (the sample cell has no space after the commas) and cast to float
data[['x', 'y', 'r']] = (
data['xyr_clean'].str.split(',', expand=True).astype(float)
)
The key thing to know is that pandas string columns have a .str attribute that contains adapted versions of all or most of Python's built-in string methods. Then you can search for "pandas convert string column to float" to get the last bit!
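As a self-contained illustration of the .str approach, here is a small sketch that builds the column in memory instead of calling pandas.read_excel(). The first row is the cell from the question; the second row and the name `parts` are made up for the example.

```python
import pandas as pd

# First value is the cell from the question; the second is a hypothetical row.
data = pd.DataFrame({"xyr": ["[1345.83,1738,44.26]", "[200.0,310.5,12.0]"]})

# Strip the brackets, split on commas into separate columns, cast to float.
parts = data["xyr"].str.strip("[]").str.split(",", expand=True).astype(float)
parts.columns = ["x", "y", "r"]

# Attach the three numeric columns next to the original text column.
data = data.join(parts)
print(data[["x", "y", "r"]])
```

If the real cells contain a space after each comma, the separator passed to split would need to match that.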
I have an Excel file that I want to convert, but the default type for numbers is float. How can I change it so xlwings explicitly uses strings and not numbers?
This is how I read the value of a field:
xw.Range(sheet, fieldname ).value
The problem is that numbers like 40 get converted to 40.0 if I create a string from the value. I strip it with: str(xw.Range(sheetFronius, fieldname ).value).rstrip('0').rstrip('.') but that is not very helpful and leads to errors, because the same field can sometimes contain a number and sometimes a string. (Not at the same time; the value is chosen from a list.)
With xlwings if no options are set during reading/writing operations single cells are read in as 'floats'. Also, by default cells with numbers are read as 'floats'. I scoured the docs, but don't think you can convert a cell that has numbers to a 'string' via xlwings outright. Fortunately all is not lost...
You could read in the cells as 'int' with xlwings and then convert the 'int' to 'string' in Python. The way to do that is as follows:
xw.Range(sheet, fieldname).options(numbers=int).value
Then you would just convert that to string in Python in the usual way.
Alternatively, you can read in your data this way (by packing the string conversion into the options upfront):
xw.Range(sheet, fieldname).options(numbers=lambda x: str(int(x))).value
Good luck!
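For a field that sometimes holds a number and sometimes text, the conversion can also be done in plain Python after reading the value. The sketch below does not call xlwings at all; `cell_to_str` is a hypothetical helper name, and it assumes the value arrives as either a float or a string, as described in the question.

```python
def cell_to_str(value):
    """Return a cell value as text, dropping a redundant '.0' from whole numbers."""
    # Whole-number floats such as 40.0 become "40"; other floats keep their fraction.
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    # Strings (and anything else) pass through str() unchanged.
    return str(value)


print(cell_to_str(40.0))    # 40
print(cell_to_str(2.5))     # 2.5
print(cell_to_str("auto"))  # auto
```

Unlike str(int(x)), this keeps the fractional part of a value such as 2.5 instead of truncating it, and it avoids the rstrip('0') approach turning "100" into "1".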
In my case the solution was simply to add one row after the last row of the raw data.
Write any text in the column you want to change to str, save, load, and then delete that last row.
I have a dataset like the one shown below
http://i.stack.imgur.com/1uxCK.png
I am able to read them into a numpy array, but the datatype is string when it is read from the CSV file. I am unable to convert it to float, and without that I cannot proceed further. Mind you, there are blank spaces between the two data columns shown in the first screenshot.
The numpy array structure when printed looks like in the screenshot given below:
http://i.stack.imgur.com/JFfzw.png
Note: observe the single quotation marks at the start and end of each data line in the screenshot, which show that numpy has stored the data as strings rather than floats.
Any help in converting the data from string to float would be appreciated. I have tried many things, all in vain!
numpy.loadtxt(filename) should work out of the box: it yields numbers.
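A quick way to check this is to feed loadtxt an in-memory file. The two-column sample below is invented (the real data is only visible in the screenshots), and the second half is a hedged fallback for the case where each whole line has already been loaded as one string.

```python
import io

import numpy as np

# Hypothetical two-column sample, separated by runs of spaces.
text = "1.5   2.25\n3.0   4.75\n"

# loadtxt splits on any whitespace by default and returns floats.
arr = np.loadtxt(io.StringIO(text))
print(arr.dtype, arr.shape)

# Fallback: an array whose elements are whole lines stored as strings.
lines = np.array(["1.5   2.25", "3.0   4.75"])
arr_from_lines = np.array([line.split() for line in lines], dtype=float)
print(arr_from_lines)
```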
I wrote a script to take files of data that is in columns and plot it depending on which column the user wants to view. Well, I noticed that the plots look crazy, and have all the wrong numbers because python is ignoring the exponential.
My numbers are in the format: 1.000000E+1 OR 1.000000E-1
What dtype is that? I am using numpy.genfromtxt to import with a dtype = float. I know there are all sorts of dtypes you can enter, but I cannot find a comprehensive list of the options, and examples.
Thanks.
Here is an example of my input (those spaces are tabs):
Time Stamp    T1_ModBt       T2_90Bend      T3_InPE        T5_Stg2Rfrg
5:22 AM       2.115800E+2    1.400000E+0    1.400000E+0    3.035100E+1
5:23 AM       2.094300E+2    1.400000E+0    1.400000E+0    3.034800E+1
5:24 AM       2.079300E+2    1.400000E+0    1.400000E+0    3.031300E+1
5:25 AM       2.069500E+2    1.400000E+0    1.400000E+0    3.031400E+1
5:26 AM       2.052600E+2    1.400000E+0    1.400000E+0    3.030400E+1
5:27 AM       2.040700E+2    1.400000E+0    1.400000E+0    3.029100E+1
Update
I figured out at least part of the reason why what I am doing does not work. Still do not know how to define dtypes the way I want to.
import numpy as np
file = np.genfromtxt('myfile.txt', usecols = (0,1), dtype = (str, float), delimiter = '\t')
That returns an array of strings for each column. How do I tell it I want column 0 to be a str, and all the rest of the columns to be float?
In [55]: type(1.000000E+1)
Out[55]: <type 'float'>
What does your input data look like? It's quite possible that it's in the wrong input format, but it should be fairly easy to convert it to the right one.
Numbers in the form 1.0000E+1 can be parsed by float(), so I'm not sure what the problem is:
>>> float('1.000E+1')
10.0
I think you'll want a text parser to turn the format into a native Python data type.
For example, 1.00000E+1 means 1.0 × 10^1, which is the float 10.0.
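On the question of making column 0 a string and the rest floats, one option is to pass genfromtxt a structured dtype with one (name, type) pair per column. The sketch below uses two rows from the sample input with the tab separators restored; the field names ("time", "t1", and so on) are illustrative choices, not names from the original script.

```python
import io

import numpy as np

# Two rows copied from the sample input, with tabs restored between columns.
sample = (
    "Time Stamp\tT1_ModBt\tT2_90Bend\tT3_InPE\tT5_Stg2Rfrg\n"
    "5:22 AM\t2.115800E+2\t1.400000E+0\t1.400000E+0\t3.035100E+1\n"
    "5:23 AM\t2.094300E+2\t1.400000E+0\t1.400000E+0\t3.034800E+1\n"
)

# One (name, type) pair per column: text for column 0, float for the rest.
# The field names are hypothetical.
column_types = [
    ("time", "U8"),
    ("t1", float),
    ("t2", float),
    ("t3", float),
    ("t5", float),
]

data = np.genfromtxt(
    io.StringIO(sample),
    delimiter="\t",
    skip_header=1,  # skip the header line, since names come from the dtype
    dtype=column_types,
    encoding="utf-8",
)

print(data["time"])  # time stamps as strings
print(data["t1"])    # exponent notation parsed as ordinary floats
```

The result is a one-dimensional structured array, so columns are selected by field name rather than by integer index.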