Can't convert string to float because of '.'? - python

I need to round the string values of a column in my dataframe to 2 decimal places, so I started by converting them to floats using astype(float) and then using round(2).
Ex:
df['col'] = df['col'].astype(float).round(2)
But I'm getting the following error:
ValueError: could not convert string to float: '.'
I thought the dots would be no problem, is there something I'm missing here?
Edit: It's a huge amount of data, so there could be unexpected values, but after filtering it to testing samples the error continues.
Edit2: Turns out I still had invalid data even after filtering the sheet, so sorry for the seemingly dumb question lol. mozways's solution worked fine.

To convert string to numeric without errors upon invalid data, use pandas.to_numeric:
df['col'] = pandas.to_numeric(df['col'], errors='coerce').round(2)
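Since the question mentions unexpected values hiding in a large dataset, here is a minimal sketch of how one might list the offending entries before coercing them. The sample data is made up for illustration and is not the asker's data.

```python
import pandas as pd

# Hypothetical sample data: a lone '.' and 'abc' are not valid floats.
df = pd.DataFrame({"col": ["1.234", "5.678", ".", "abc", "9"]})

# errors="coerce" turns anything unparseable into NaN instead of raising.
converted = pd.to_numeric(df["col"], errors="coerce")

# Entries that were present in the input but failed to parse.
bad_values = df.loc[converted.isna() & df["col"].notna(), "col"].tolist()
print(bad_values)

# Only now overwrite the column with the rounded numeric values.
df["col"] = converted.round(2)
print(df["col"].tolist())
```

Inspecting bad_values first shows what coercion would silently turn into NaN, which helps decide whether those rows should be fixed at the source instead.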

Related

Python- problem converting negative numbers to floats, issues with hyphen encoding

I have a Pandas dataframe that I've read from a file - pd.read_csv() - and I'm having trouble converting a column with string values to float.
Firstly, I'm not entirely sure why pandas is reading the column as strings to begin with, since all the values are numeric. The problem seems to be with the hyphen-minus sign for the negative numbers. There are other threads on this topic that mention how an em-dash can mess things up (here, for example).
However, when I try converting the hyphen type, it still gives me an error. For example,
df['Verified_m'] = df['Verified_m'].str.replace("\U00002013", "-").astype(float)
doesn't change anything; all the values start with the '-' hyphen, so it's not actually replacing anything. It still gives me the error:
ValueError: could not convert string to float: '-'
I've tried replacing all of the hyphens with a numeric value to see if that would work, and I'm able to convert to float (example: df['Verified_m'] = df['Verified_m'].str.replace("-", "0").astype(float)). But I'd like to retain the negative values in the dataset. Does anyone know what's wrong with my hyphens?
Try this:
df['Verified_m'] = df['Verified_m'].str.replace("\U00002013", "-").str.replace(r'^-$', '0', regex=True).astype(float)
After replacing any en dashes (U+2013) with hyphens, this converts a lone - to zero, which is the value that float() was failing on.
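If a lone hyphen is a placeholder for "no value" rather than a real zero, an alternative sketch is to normalize the dash variants and let pandas.to_numeric mark it as missing. This is an assumption about what the data means, not part of the answer above, and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample: an en dash, a Unicode minus sign and a lone hyphen placeholder.
df = pd.DataFrame({"Verified_m": ["-1.5", "\u20132.0", "\u22123.25", "-", "4"]})

cleaned = (
    df["Verified_m"]
    .str.strip()
    # Map en dash, em dash and the Unicode minus sign to an ASCII hyphen.
    .str.replace("[\u2013\u2014\u2212]", "-", regex=True)
)

# A lone "-" is not a number; errors="coerce" turns it into NaN.
df["Verified_m"] = pd.to_numeric(cleaned, errors="coerce")
print(df["Verified_m"].tolist())
```

Negative values are preserved, and the placeholder rows can later be dropped or filled explicitly with dropna() or fillna().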

ValueError: could not convert string to float: ' caing low enough for rotary follow on' | Python

I have a Python script that iterates over date-formatted values and returns just the hour.
Below is a similar script (the one I use for iteration):
zaman = "06:00:00"  # hours:minutes:seconds
hm = zaman.split(":")
vaxt = [hm[1]]
saat = float(hm[0]) + float(float(hm[1])/60)
print(f"{saat:,.2f}")
In one of the files which has several rows I get the error:
ValueError: could not convert string to float: ' caing low enough for rotary follow on'
I have checked that this row does not differ from the previous ones, but I still get an error on it.
Do you have suggestions on how to solve it? (Maybe by getting the hours from a datetime in another way.)
The issue is that you're not correctly identifying the datetime in the string, so you end up trying to convert the wrong bit to a float.
One potential fix for this would be to not rely on splitting the string at the :s, but instead to use a regex to look for the part of the string with the appropriate format.
For example:
import re
test_string = 'this is a string with 06:00:00 in it somewhere'
matches = re.search(r'(\d{2}):(\d{2}):(\d{2})', test_string)
matches = [float(m) for m in matches.groups()]
print(matches)
# [6.0, 0.0, 0.0]
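Building on the regex idea, here is a small sketch that returns fractional hours and tolerates lines with no time in them. The helper name decimal_hours is hypothetical, and unlike the script in the question it also counts the seconds.

```python
import re

# Matches HH:MM:SS anywhere in a line.
TIME_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2})")


def decimal_hours(line):
    """Return the first HH:MM:SS found in `line` as fractional hours, or None."""
    match = TIME_RE.search(line)
    if match is None:
        # No time in this line, e.g. a free-text comment row.
        return None
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours + minutes / 60 + seconds / 3600


print(decimal_hours("06:30:00"))
print(decimal_hours("note: casing low enough at 14:45:00 today"))
print(decimal_hours(" caing low enough for rotary follow on"))
```

Returning None for rows without a time lets the caller skip or log them instead of crashing partway through a file.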
I have tested the code you provided above and it works. However, after doing some research it appears:
The Python "ValueError: could not convert string to float" occurs when we pass a string that cannot be converted to a float (e.g. an empty string or one containing characters) to the float() class. To solve the error, remove all unnecessary characters from the string.
So check your file to make sure the input is clean for float() to work perfectly.

ValueError: invalid literal for int() with base 10: '48.5200.048.5200.0200

I am a newbie to programming and I recently came across this error. I am working on the Space Analysis dataset from Kaggle, and the Price column is a pandas Series. I tried using astype() to convert it to float and int, which was working fine a while ago, but now it shows me the ValueError. When the astype() call is removed, the TypeError: '<' not supported between instances of 'str' and 'int' occurs.
df_money = df_.groupby(["Organisation"])["Price"].sum().reset_index()
df_money["Price"] = df_money["Price"].astype('float')
df_money = df_money[df_money["Price"]>0]
df_money.head()
Error was:
ValueError: invalid literal for int() with base 10: '48.5200.048.5200.0200.0200.037.0200.037.0200.0200.037.0200.0200.037.0200.0200.037.0200.037.0200.0200.0200.037.0200.0200.037.0200.037.0200.0200.0200.0200.037.0200.0200.0200.0200.037.0200.0200.037.0200
This means some record in your Price column likely has a string with the literal value "48.5200.048.5200.0200.0200.037.0200.037.0200.0200.037.0200.0200.037.0200.0200.037.0200.037.0200.0200.0200.037.0200.0200.037.0200.037.0200.0200.0200.0200.037.0200.0200.0200.0200.037.0200.0200.037.0200"
This string format cannot be naively converted to a number, and thus the conversion fails.
Later you compare that column with > 0. Python, while dynamic, is still strongly typed, meaning that you can usually only compare values of compatible types. A string isn't a number and cannot be compared to a number, hence the TypeError if the Price column contains strings.
To resolve your issue, you will have to somehow convert your long string to something that can be converted to a number. You can typically only convert something with a single decimal separator. For example, float("48.5200") will work, but float("48.5200.048") won't.
I believe your groupby.sum() is probably creating this long string. Convert the Price column to a float before doing the groupby operation.
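A minimal sketch of converting before aggregating, using a made-up stand-in for the dataset (the rows below are hypothetical, and coercing unparseable prices to NaN is an assumption about how they should be handled):

```python
import pandas as pd

# Hypothetical stand-in for the dataset: Price was read in as strings.
df_ = pd.DataFrame({
    "Organisation": ["A", "A", "B", "B"],
    "Price": ["48.5", "200.0", "37.0", "n/a"],
})

# Adding strings concatenates them, which is how a value like '48.5200.0' arises.
print("48.5" + "200.0")

# Convert first (unparseable entries become NaN), then aggregate.
df_["Price"] = pd.to_numeric(df_["Price"], errors="coerce")
df_money = df_.groupby("Organisation")["Price"].sum().reset_index()
df_money = df_money[df_money["Price"] > 0]
print(df_money)
```

With the column numeric before the groupby, the sum is arithmetic and the later comparison with 0 no longer mixes strings and numbers.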

Why can I not convert an object to int in Python and how do I check the troublesome data?

I am trying to run the code:
df["columnname"].astype(int)
And it does not convert my datatype to int. Instead, it's still listed as an object. There are a lot of rows in the column, but I quickly did a sort in Excel and they were all numbers. Integers, in fact. Why does Python think there's a string in there when there is not? I've stubbornly tried float as well (just to make sure there's not a non-int in there) and it still thinks it's a string.
Assuming Excel is wrong, how do I check exactly which value cannot be converted to an int and is causing the problem?
astype returns a new Series rather than modifying the column in place, so you may need to assign the result back explicitly, as follows:
df["columnname"] = df["columnname"].astype(int)
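To answer the second part of the question (which values cannot be converted), one possible sketch is to coerce with pandas.to_numeric and look at the rows that come back as NaN. The sample column below is hypothetical.

```python
import pandas as pd

# Hypothetical column: looks numeric at a glance, but contains stray text.
df = pd.DataFrame({"columnname": ["1", "2", "4a", "six", "7"]})

# Values that cannot be parsed become NaN instead of raising an error.
as_numbers = pd.to_numeric(df["columnname"], errors="coerce")

# The original rows whose values failed to parse.
problem_rows = df[as_numbers.isna()]
print(problem_rows)

# Once the bad rows are dealt with (here simply dropped), the cast to int succeeds.
clean = as_numbers.dropna().astype(int)
print(clean.tolist())
```

Printing problem_rows keeps the original index, so the bad cells can be traced back to the source spreadsheet.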

Python - Convert negative decimals from string to float

I need to read in a large number of .txt files, each of which contains a decimal (some are positive, some are negative), and append these into 2 arrays (genotypes and phenotypes). Subsequently, I wish to perform some mathematical operations on these arrays in scipy, however the negative ('-') symbol is causing problems. Specifically, I cannot convert the arrays to float, because the '-' is being read as a string, causing the following error:
ValueError: could not convert string to float:
Here is my code as it's currently written:
import linecache

gene_array=[]
phen_array=[]

for i in genotype:
    for j in phenotype:
        genotype='/path/g.txt'
        phenotype='/path/p.txt'
        g=linecache.getline(genotype,1)
        p=linecache.getline(phenotype,1)
        p=p.strip()
        g=g.strip()
        gene_array.append(g)
        phen_array.append(p)

gene_array=map(float,gene_array)
phen_array=map(float,phen_array)
I am fairly certain at this point that it is the negative sign that is causing the problem, but it is not clear to me why. Is my use of Linecache the problem here? Is there an alternative method that would be better?
The result of
print gene_array
is
['-0.0448022516321286', '-0.0236187263814157', '-0.150505384829925', '-0.00338459268479522', '0.0142429109897682', '0.0286253352284279', '-0.0462358095345649', '0.0286232317578776', '-0.00747425206137217', '0.0231790239373428', '-0.00266935581919541', '0.00825077426011094', '0.0272744527203547', '0.0394829854063242', '0.0233109171715023', '0.165841084392078', '0.00259693465334536', '-0.0342590874424289', '0.0124600520095644', '0.0713627590092807', '-0.0189374898081401', '-0.00112750710611284', '-0.0161387333242288', '0.0227226505624106', '0.0382173405035751', '0.0455518646388402', '-0.0453048799717046', '0.0168570746329513']
The issue seems to be an empty string or a whitespace-only string, as is evident from your error message, which shows nothing after the colon:
ValueError: could not convert string to float:
To make it work, replace the map call with a list comprehension that skips empty strings:
gene_array=[float(e) for e in gene_array if e]
phen_array=[float(e) for e in phen_array if e]
To be clear about what "empty string" means here:
float(" ") or float("") would raise a ValueError, so if any item in gene_array or phen_array is empty or contains only whitespace, converting it to float will throw this error.
There can be several reasons for an empty string, such as:
an empty or blank line in the file
a blank line at the beginning or end of the file
The issue is definitely not in the negative sign. Python converts strings with negative sign without a problem. I suggest you run each of your entries against a float RegEx and see if they all pass.
There is nothing in the error message to suggest that - is the problem. The most likely reason is that gene_array and/or phen_array contain an empty string ('').
As stated in the documentation, linecache.getline()
will return '' on errors (the terminating newline character will be included for lines that are found).
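The behaviour described in these answers can be checked with a short sketch. The path is a hypothetical file that does not exist, and to_float_or_none is an illustrative helper name, not something from the question.

```python
import linecache

# linecache.getline returns '' rather than raising when the file or line is missing.
# The path below is a hypothetical file that does not exist.
line = linecache.getline("/nonexistent/path/g.txt", 1).strip()
print(repr(line))


def to_float_or_none(text):
    """Convert text to float, returning None for blank or invalid input."""
    try:
        return float(text)
    except ValueError:
        return None


# Negative numbers convert fine; only the empty strings are rejected.
raw = ["-0.0448022516321286", "", "0.0142429109897682", line]
values = [v for v in (to_float_or_none(s) for s in raw) if v is not None]
print(values)
```

Checking for an empty result right after getline() also makes it easier to report which file was missing or empty.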
