Python Matplotlib ValueError - python

Hi,
How do I plot the attached DataFrame in Python? I am looking for a line graph with multiple series.
Any help will be much appreciated.
Error: ValueError: could not convert string to float
Thanks

Your problem here is that the % signs in your csv file are making Pandas read each value as a string object, rather than as a float.
The best option for resolving this would probably be to not have extraneous characters like %s everywhere in your csv file. Instead, it would probably make more sense to list units in your columns, or elsewhere in descriptions.
However, in this case you can also fix it after loading by removing the extraneous characters and converting manually, e.g., for a DataFrame a:
cols = a.columns[a.dtypes == object]
a[cols] = a[cols].applymap(lambda x: float(x[:-1]))
This works for your specific case, where the offending character is consistently a single % at the end of each value.
The first line selects all columns of dtype 'object', which in this case are the string columns whose values end in %.
The lambda function applied to each element removes the last character from the string and converts the rest to a float.
The result is then assigned back to the same columns.
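As a rough sketch, the same cleanup can also be written with pandas' vectorized string methods instead of an element-wise lambda. The column names and values below are made up for illustration, since the CSV from the question is not shown.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV in the question: every value ends in '%'.
csv_text = "month,series_a,series_b\nJan,10.5%,3%\nFeb,12.25%,4.5%\n"
df = pd.read_csv(io.StringIO(csv_text), index_col="month")

# Because of the '%', the columns are read as strings rather than floats.
for col in df.columns:
    # Drop a trailing '%' and parse what is left as a number.
    df[col] = pd.to_numeric(df[col].astype(str).str.rstrip("%"))

print(df.dtypes)
```

Once the columns are numeric, calling df.plot() should draw one line per column, which is the multiple-series line graph the question asks for.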

Related

Python- problem converting negative numbers to floats, issues with hyphen encoding

I have a Pandas dataframe that I've read from a file - pd.read_csv() - and I'm having trouble converting a column with string values to float.
Firstly, I'm not entirely sure why pandas is reading the column as strings to begin with, since all the values are numeric. The problem seems to be with the hyphen-minus sign on the negative numbers. There are other threads on this topic that mention how an em-dash can mess things up.
However, when I try converting the hyphen type, it still gives me an error. For example,
df['Verified_m'] = df['Verified_m'].str.replace("\U00002013", "-").astype(float)
doesn't change anything; all the values start with the '-' hyphen, so it's not actually replacing anything. It still gives me the error:
ValueError: could not convert string to float: '-'
I've tried replacing all of the hyphens with a numeric value to see if that would work, and I'm able to convert to float (example: df['Verified_m'] = df['Verified_m'].str.replace("-", "0").astype(float)). But I'd like to retain the negative values in the dataset. Does anyone know what's wrong with my hyphens?
Try this:
df['Verified_m'] = df['Verified_m'].str.replace("\U00002013", "-").str.replace(r'^-$', '0', regex=True).astype(float)
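After converting any en dashes (U+2013) to hyphens, this replaces each value that consists of nothing but a lone - with 0. Those lone hyphens are what the error message is complaining about; the negative numbers themselves are kept.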
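The same idea can be sketched for a single value in plain Python. The helper name to_float and its lone_dash parameter are hypothetical, and handling the Unicode minus sign (U+2212) is an extra assumption that goes beyond the answer above.

```python
def to_float(text, lone_dash=0.0):
    """Parse a numeric string whose minus sign may be a Unicode dash."""
    # Map en dash (U+2013) and minus sign (U+2212) to the ASCII hyphen-minus.
    cleaned = text.strip().replace("\u2013", "-").replace("\u2212", "-")
    # A dash with no digits is a placeholder, not a number.
    if cleaned == "-":
        return lone_dash
    return float(cleaned)


values = ["\u20131.5", "-", "2.25"]
print([to_float(v) for v in values])  # [-1.5, 0.0, 2.25]
```

Whether a lone dash should become 0 or a missing value depends on the data; passing lone_dash=float("nan") keeps it distinguishable from a real zero.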

Python np.fromfile() adding arbitrary random comma when reading from binary file

I have run into a weird problem that I have not been able to solve for days. I created a byte array containing the values 1 to 250 and wrote it to a binary file from C# using WriteAllBytes.
Later I read it from Python using np.fromfile(filename, dtype=np.ubyte). However, I noticed that this function seemed to be adding an arbitrary comma (see the image). Interestingly, it is not visible in the array's properties, and if I call numpy.array2string, the comma turns into '\n'. One solution is to replace the comma with nothing, but I have very long sequences, and running a replace over 100 GB of data would take forever. I also rechecked the files by reading them with .NET Core, and I'm quite sure the comma is not there.
What could I be missing?
Edit:
I was trying to read all the byte values into an array and convert each element, or the entire array, to a string. I found that the most reliable way to do this is:
list(map(str, ubyte_array))
The code above returns a list of strings whose elements contain no arbitrary commas or blank spaces.
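A likely explanation, offered as an assumption because the image is not available, is that the extra characters come from how NumPy formats an array for display rather than from the file itself. The sketch below writes the bytes 1 to 250 from Python as a stand-in for the C# file and compares the raw array with its formatted text.

```python
import os
import tempfile

import numpy as np

# Stand-in for the file written from C#: the raw bytes 1..250.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as fh:
    fh.write(bytes(range(1, 251)))
    path = fh.name

ubyte_array = np.fromfile(path, dtype=np.ubyte)
os.remove(path)

# The array holds exactly the bytes that were written.
print(ubyte_array.size, ubyte_array[0], ubyte_array[-1])

# array2string wraps long output onto several lines, so '\n' shows up
# in the text even though it is not part of the data.
wrapped = np.array2string(ubyte_array)
print("\n" in wrapped)

# One string per element, with no display formatting involved.
strings = list(map(str, ubyte_array))
print(strings[:3])
```

If this matches what was observed, the array itself is fine and only its printed form contains separators and line breaks.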

How to convert String (containing a table of numbers without comma delimiter) into Array in Python

I have a CSV file that I load with pd.read_csv. One of the columns has a string datatype, but it actually contains a table of numbers (like a 2D array) without comma delimiters.
I would like to convert it into an array. I tried the eval() function, but it gives an error (as can be seen in the following image).
If you have any idea how to solve this issue, please let me know.

How to compare two percentages in python?

I am new to python and I am dealing with some csv files. To sort these files, I have to compare some percentages in string format, such as "5.265%" and "2.1545%". So how do I compare the actual values of these two strings? I have tried to convert them to float but it didn't work. Thanks in advance!
Still convert them to floats, but without the % sign:
float(value.strip(' \t\n\r%'))
The .strip() call removes any surrounding whitespace as well as the % sign, which you don't need in order to compare two values:
>>> float('5.265% '.strip(' \t\n\r%'))
5.265
>>> float('2.1545%'.strip(' \t\n\r%'))
2.1545
float() itself will normally strip away whitespace for you but by stripping it yourself you make sure that the % sign is also properly removed, making this a little more robust when handling data from files.
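Since the goal in the question is sorting, here is a minimal sketch that wraps the conversion in a key function. The helper name percent_to_float and the sample values are illustrative only.

```python
def percent_to_float(value):
    """Turn a string such as '5.265%' into the float 5.265."""
    return float(value.strip(" \t\n\r%"))


rows = ["5.265%", "2.1545% ", "10%"]

# Compare two percentages by their numeric values.
print(percent_to_float("5.265%") > percent_to_float("2.1545%"))  # True

# Sort the original strings numerically rather than alphabetically.
print(sorted(rows, key=percent_to_float))  # ['2.1545% ', '5.265%', '10%']
```

Sorting the raw strings would put "10%" first, because "1" sorts before "2" and "5" as text; the key function avoids that.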

Python - Convert negative decimals from string to float

I need to read in a large number of .txt files, each of which contains a decimal (some are positive, some are negative), and append these into 2 arrays (genotypes and phenotypes). Subsequently, I wish to perform some mathematical operations on these arrays in scipy, however the negative ('-') symbol is causing problems. Specifically, I cannot convert the arrays to float, because the '-' is being read as a string, causing the following error:
ValueError: could not convert string to float:
Here is my code as it's currently written:
import linecache
gene_array=[]
phen_array=[]
for i in genotype:
    for j in phenotype:
        genotype='/path/g.txt'
        phenotype='/path/p.txt'
        g=linecache.getline(genotype,1)
        p=linecache.getline(phenotype,1)
        p=p.strip()
        g=g.strip()
        gene_array.append(g)
        phen_array.append(p)
gene_array=map(float,gene_array)
phen_array=map(float,phen_array)
I am fairly certain at this point that it is the negative sign that is causing the problem, but it is not clear to me why. Is my use of linecache the problem here? Is there an alternative method that would be better?
The result of
print gene_array
is
['-0.0448022516321286', '-0.0236187263814157', '-0.150505384829925', '-0.00338459268479522', '0.0142429109897682', '0.0286253352284279', '-0.0462358095345649', '0.0286232317578776', '-0.00747425206137217', '0.0231790239373428', '-0.00266935581919541', '0.00825077426011094', '0.0272744527203547', '0.0394829854063242', '0.0233109171715023', '0.165841084392078', '0.00259693465334536', '-0.0342590874424289', '0.0124600520095644', '0.0713627590092807', '-0.0189374898081401', '-0.00112750710611284', '-0.0161387333242288', '0.0227226505624106', '0.0382173405035751', '0.0455518646388402', '-0.0453048799717046', '0.0168570746329513']
The issue seems to be an empty or whitespace-only string, as is evident from your error message, which shows nothing after the colon:
ValueError: could not convert string to float:
To make it work, replace the map calls with list comprehensions that skip empty strings:
gene_array=[float(e) for e in gene_array if e]
phen_array=[float(e) for e in phen_array if e]
The empty strings matter because float(" ") and float("") both raise ValueError, so if any item in gene_array or phen_array is empty or contains only whitespace, the conversion to float fails.
There are several ways an empty string can end up in the list, such as:
an empty or blank line in a file
a blank line at the beginning or end of a file
The issue is definitely not in the negative sign. Python converts strings with negative sign without a problem. I suggest you run each of your entries against a float RegEx and see if they all pass.
There is nothing in the error message to suggest that - is the problem. The most likely reason is that gene_array and/or phen_array contain an empty string ('').
As stated in the documentation, linecache.getline()
will return '' on errors (the terminating newline character will be included for lines that are found).
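Putting these answers together, a minimal sketch might read the first line of each file and skip anything that comes back empty. The helper name first_line_as_float and the temporary files are hypothetical stand-ins for the genotype and phenotype files in the question.

```python
import linecache
import os
import tempfile


def first_line_as_float(path):
    """Return line 1 of `path` as a float, or None if it is missing or blank."""
    text = linecache.getline(path, 1).strip()
    if not text:
        # getline returns '' for unreadable files; a blank line strips to ''.
        return None
    return float(text)


# Hypothetical test files: one valid value, one blank line, one missing file.
tmp_dir = tempfile.mkdtemp()
paths = []
for name, content in [("g1.txt", "-0.0448022516321286\n"), ("g2.txt", "\n")]:
    path = os.path.join(tmp_dir, name)
    with open(path, "w") as fh:
        fh.write(content)
    paths.append(path)
paths.append(os.path.join(tmp_dir, "missing.txt"))

values = [first_line_as_float(p) for p in paths]
gene_array = [v for v in values if v is not None]
print(values)
print(gene_array)
```

Returning None instead of silently dropping the entry makes it possible to report which files were blank or missing before filtering them out.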
