I am sorry if this question has been answered already, I just couldn't find it.
I have scraped a table with pandas.read_html. The table contains a couple of expressions I am interested in, in the format "20.15 X 200", stored as strings. I want to convert them to floats so that I can do the multiplication and later compare the values. The issue is that the expressions use the "X" sign for multiplication, and I am not able to find any information on how to convert such an expression.
When I call float() on the string I get: ValueError: could not convert string to float.
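One way to handle this, sketched under the assumption that every expression is exactly two numbers separated by a single "X" (upper or lower case): split on the "X", convert each half with float(), and multiply. The helper name multiply_expression is made up for illustration.

```python
def multiply_expression(expr):
    """Evaluate a string such as "20.15 X 200" as a product of two floats.

    Assumes exactly two operands separated by a single X (either case).
    """
    left, sep, right = expr.upper().partition("X")
    if not sep:
        raise ValueError("no 'X' found in %r" % expr)
    # float() ignores surrounding whitespace, so "20.15 " is fine.
    return float(left) * float(right)


values = ["20.15 X 200", "3 x 1.5"]
products = [multiply_expression(v) for v in values]
print(products)
```

With a pandas column, the same function could probably be passed to Series.apply to get a numeric column that can be compared directly.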
I have a Pandas dataframe that I've read from a file - pd.read_csv() - and I'm having trouble converting a column with string values to float.
Firstly, I'm not entirely sure why pandas is even reading the column as strings to begin with, since all the values are numeric. The problem seems to be with the hyphen-minus sign for the negative numbers. There are other threads on this topic that mention how an em-dash can mess things up (here, for example).
However, when I try converting the hyphen type, it still gives me an error. For example,
df['Verified_m'] = df['Verified_m'].str.replace("\U00002013", "-").astype(float)
doesn't change anything; all the values start with the '-' hyphen, so it's not actually replacing anything. It still gives me the error:
ValueError: could not convert string to float: '-'
I've tried replacing all of the hyphens with a numeric value to see if that would work, and then I'm able to convert to float (example: df['Verified_m'] = df['Verified_m'].str.replace("-", "0").astype(float)). But I'd like to retain the negative values in the dataset. Does anyone know what's wrong with my hyphens?
Try this:
df['Verified_m'] = df['Verified_m'].str.replace("\U00002013", "-").str.replace(r'^-$', '0', regex=True).astype(float)
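After converting any en dashes (U+2013) to hyphens, this replaces every value that consists of a lone "-" with zero. The error message shows that such a bare "-", with no digits after it, is the value float() is failing on.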
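To see which characters are really in a value, and to normalize them before conversion, something like the following may help. It is a sketch in plain Python rather than pandas; the names DASH_LIKE and to_float are invented here, and mapping a lone "-" to 0.0 is an assumption carried over from the answer above, not something the data requires.

```python
# Characters that look like a minus sign but are rejected by float().
DASH_LIKE = {
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
    "\u2212": "-",  # Unicode minus sign
}


def to_float(text, lone_dash_value=0.0):
    """Convert text to float after normalizing dash-like characters."""
    cleaned = text.strip()
    for bad, good in DASH_LIKE.items():
        cleaned = cleaned.replace(bad, good)
    if cleaned == "-":
        # A dash on its own is usually a placeholder, not a number.
        return lone_dash_value
    return float(cleaned)


print([to_float(s) for s in ["-1.5", "\u20132.25", "-", " 3 "]])
# ascii() shows non-ASCII characters as escapes, which makes
# look-alike dashes easy to spot.
print(ascii("\u20132.25"))
```

If the lone dashes actually mean "missing", passing float("nan") as lone_dash_value may be a better fit than zero.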
I am writing some code and have run into a problem; I need a solution if one exists.
Suppose we have the following string variable in Python, which contains an integer value.
E.g.: x='123'
I know that we can easily convert this by type conversion to int.
However, suppose we have the following list.
x=['123','Spain']
Is there any method in Python that tells me which elements of the list x are integers stored inside a string and which are purely text?
I would recommend this method:
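x = "123"
if x.isdigit():
    kind = "int"    # only digits, e.g. "123"
elif x.replace(".", "", 1).isdigit():
    kind = "float"  # digits with a single decimal point, e.g. "1.5"
else:
    kind = "str"    # anything else, e.g. "Spain"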
I assume your question is similar to the one in this post.
From my perspective, though, for a more general (language-agnostic) solution you should learn more about regular expressions; the same question is also asked here.
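As a sketch of the regular-expression route applied to the list from the question: the patterns below are one reasonable definition of "integer" and "float" (optional sign, no exponent), not the only one, and the function name classify is invented here. Unlike isdigit(), this also accepts negative numbers.

```python
import re

INT_RE = re.compile(r"[+-]?\d+")
FLOAT_RE = re.compile(r"[+-]?(\d+\.\d*|\.\d+)")


def classify(text):
    """Return 'int', 'float' or 'str' depending on what the text looks like."""
    text = text.strip()
    if INT_RE.fullmatch(text):
        return "int"
    if FLOAT_RE.fullmatch(text):
        return "float"
    return "str"


x = ["123", "Spain", "-4.5"]
for item in x:
    print(item, classify(item))
```

An alternative that avoids patterns altogether is to call int() or float() inside try/except ValueError and treat a failure as "not a number".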
Hi,
How do I plot the attached DataFrame in Python? I am looking for a line graph with multiple series.
Any help will be much appreciated.
Error: ValueError: could not convert string to float
Thanks
Your problem here is that the % signs in your csv file are making Pandas read each value as a string object, rather than as a float.
The best option for resolving this would probably be to not have extraneous characters like %s everywhere in your csv file. Instead, it would probably make more sense to list units in your columns, or elsewhere in descriptions.
However, in this case, it can also be solved afterward by removing the extraneous characters and converting manually, eg, for a DataFrame a:
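# .ix was removed in pandas 1.0, so select the object-dtype columns by name instead
obj_cols = a.select_dtypes(include="object").columns
a[obj_cols] = a[obj_cols].applymap(lambda x: float(x[:-1]))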
This will work for your specific case of one % at the end being the offending character consistently:
The indexing here selects all columns that are of dtype 'object', which in this case are all strings with the last character %.
The lambda function that is applied to each element removes the last character from the string, and then converts it to a float.
It is then assigned to the same columns.
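If some values might lack the % sign or carry stray spaces, slicing off the last character would silently eat a digit. A slightly more defensive converter, which could be passed to applymap in place of the lambda, might look like this. The name percent_to_float is an assumption for illustration.

```python
def percent_to_float(value):
    """Convert strings like '12.5%' to 12.5; pass real numbers through float()."""
    if isinstance(value, str):
        # rstrip('%') only removes a trailing percent sign if one is present.
        return float(value.strip().rstrip("%"))
    return float(value)


print([percent_to_float(v) for v in ["12.5%", " 7% ", "3", 4]])
```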
Write a python program to accept 2 "string" numbers for calculation.
Note: convert the string numbers to integers before performing the calculation.
Are there any example answers?
You have to accept the input as strings using raw_input() (input() in Python 3), parse them with int(), and then perform your calculation.
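A minimal sketch of that idea. The function name add_strings and the choice of addition are assumptions, since the exercise does not say which calculation is wanted. In an interactive program the two strings would come from raw_input() (Python 2) or input() (Python 3) instead of being hard-coded.

```python
def add_strings(a, b):
    """Convert two numeric strings to int and return their sum."""
    return int(a) + int(b)


first = "12"   # e.g. first = input("First number: ")
second = "30"  # e.g. second = input("Second number: ")
print(add_strings(first, second))
```

int() raises ValueError for anything that is not a whole number, so a real program would probably want to catch that and ask again.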
I need to read in a large number of .txt files, each of which contains a decimal (some are positive, some are negative), and append these into 2 arrays (genotypes and phenotypes). Subsequently, I wish to perform some mathematical operations on these arrays in scipy, however the negative ('-') symbol is causing problems. Specifically, I cannot convert the arrays to float, because the '-' is being read as a string, causing the following error:
ValueError: could not convert string to float:
Here is my code as it's currently written:
import linecache
gene_array=[]
phen_array=[]
for i in genotype:
    for j in phenotype:
        genotype='/path/g.txt'
        phenotype='/path/p.txt'
        g=linecache.getline(genotype,1)
        p=linecache.getline(phenotype,1)
        p=p.strip()
        g=g.strip()
        gene_array.append(g)
        phen_array.append(p)
gene_array=map(float,gene_array)
phen_array=map(float,phen_array)
I am fairly certain at this point that it is the negative sign that is causing the problem, but it is not clear to me why. Is my use of Linecache the problem here? Is there an alternative method that would be better?
The result of
print gene_array
is
['-0.0448022516321286', '-0.0236187263814157', '-0.150505384829925', '-0.00338459268479522', '0.0142429109897682', '0.0286253352284279', '-0.0462358095345649', '0.0286232317578776', '-0.00747425206137217', '0.0231790239373428', '-0.00266935581919541', '0.00825077426011094', '0.0272744527203547', '0.0394829854063242', '0.0233109171715023', '0.165841084392078', '0.00259693465334536', '-0.0342590874424289', '0.0124600520095644', '0.0713627590092807', '-0.0189374898081401', '-0.00112750710611284', '-0.0161387333242288', '0.0227226505624106', '0.0382173405035751', '0.0455518646388402', '-0.0453048799717046', '0.0168570746329513']
The issue seems to be an empty string or a whitespace-only string, as is evident from your error message, which shows nothing after the colon:
ValueError: could not convert string to float:
To make it work, convert the map to a list comprehension
gene_array=[float(e) for e in gene_array if e]
phen_array=[float(e) for e in phen_array if e]
By empty string I mean the following:
float(" ") or float("") raises a ValueError, so if any item in gene_array or phen_array is empty or contains only whitespace, the conversion to float will fail.
There could be many reasons for an empty string, such as:
an empty or blank line
a blank line at either the beginning or the end of the file
The issue is definitely not in the negative sign. Python converts strings with negative sign without a problem. I suggest you run each of your entries against a float RegEx and see if they all pass.
There is nothing in the error message to suggest that - is the problem. The most likely reason is that gene_array and/or phen_array contain an empty string ('').
As stated in the documentation, linecache.getline()
will return '' on errors (the terminating newline character will be included for lines that are found).
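A quick way to confirm this and to find the offending entries, sketched with hypothetical names (find_bad_entries, and a path that is assumed not to exist):

```python
import linecache

# getline() does not raise for a missing file or line; it returns ''.
missing = linecache.getline("/no/such/dir/g.txt", 1)
print(repr(missing))


def find_bad_entries(values):
    """Return (index, value) pairs that float() cannot convert."""
    bad = []
    for index, value in enumerate(values):
        try:
            float(value)
        except ValueError:
            bad.append((index, value))
    return bad


gene_array = ["-0.0448", "", "0.0142", " "]
print(find_bad_entries(gene_array))
```

Knowing the index of each bad entry makes it possible to trace it back to the file that produced it, rather than dropping it silently.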