I am working with a dataset that contains numerical values stored as type object. Below you can see what the data looks like in a pandas.core.frame.DataFrame.
So I want to convert these columns from object type into int64. To do this, I tried this line of code:
X['duration']=X['duration'].astype('float64')
But this line does not work and gives this error message:
could not convert string to float: '18`'
Can anybody help me solve this and convert the column into numerical values?
Try extracting the digits first and then using pd.to_numeric:
X['duration'] = X['duration'].str.extract(r'(\d+)', expand=False)
X['duration'] = pd.to_numeric(X['duration'], errors='coerce')
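To illustrate the two steps above, here is a minimal sketch on made-up data (the sample values, including the 'n/a' row, and the extra duration_int column are assumptions for illustration, not taken from the question). It also shows one way to get an integer type afterwards, since the question asks for int64 but missing values force a float or nullable dtype.

```python
import pandas as pd

# Hypothetical sample data; '18`' mimics the value from the error message.
X = pd.DataFrame({"duration": ["12", "18`", "7", "n/a"]})

# Pull out the first run of digits; expand=False returns a Series.
digits = X["duration"].str.extract(r"(\d+)", expand=False)

# Rows with no digits at all become NaN instead of raising.
X["duration"] = pd.to_numeric(digits, errors="coerce")

# Optional: the nullable Int64 dtype keeps whole numbers alongside missing values.
X["duration_int"] = X["duration"].astype("Int64")
```

Note that extracting digits silently drops everything else in the string (a sign or a decimal part, for example), so it is worth checking what the stray characters are before relying on it.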
I am using the groupby function of pandas. Instead of adding the numbers, pandas treats them as strings and concatenates them, returning something like 3,000,000,000.0092,315,000.00 instead of 3,092,315,000. I have tried several conversion methods, but each time I get "ValueError: could not convert string to float: '3,000,000,000.00'".
I am unable to attach the CSV file, which might be the real source of the problem.
df['AMOUNT'] = df['AMOUNT'].astype('float')
Try removing the commas first by replacing ',' with an empty string.
df['AMOUNT'] = df['AMOUNT'].str.replace(',', '').astype('float')
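As a rough end-to-end sketch of that fix, the example below uses invented data (the ACCOUNT column and the sample amounts are assumptions) and shows that the groupby sum is numeric once the separators are removed.

```python
import pandas as pd

# Hypothetical data: amounts stored as strings with thousands separators.
df = pd.DataFrame({
    "ACCOUNT": ["a", "a", "b"],
    "AMOUNT": ["3,000,000,000.00", "92,315,000.00", "1,500.50"],
})

# Strip the separators literally (regex=False), then convert to float.
df["AMOUNT"] = df["AMOUNT"].str.replace(",", "", regex=False).astype(float)

# The sum now adds numbers instead of concatenating strings.
totals = df.groupby("ACCOUNT")["AMOUNT"].sum()
```

If the data comes from a CSV file, passing thousands=',' to pd.read_csv may avoid the problem at load time.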
I tried making a boxplot for 'horsepower', but it shows up as an object type, so I tried converting it to float but it displays that error.
The horsepower column seems to hold some non-numeric values. In this case, I suggest using pandas.to_numeric instead of pandas.Series.astype.
Replace this:
df_T['horsepower'] = df_T['horsepower'].astype(float)
With this:
df_T['horsepower'] = pd.to_numeric(df_T['horsepower'], errors='coerce')
With errors='coerce', any value that cannot be parsed is set to NaN.
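Before overwriting the column, it can help to see which values would be coerced. The sketch below assumes a small invented sample in which '?' stands in for whatever non-numeric marker the real column contains.

```python
import pandas as pd

# Hypothetical sample; the real non-numeric marker may differ from '?'.
df_T = pd.DataFrame({"horsepower": ["130", "165", "?", "150"]})

converted = pd.to_numeric(df_T["horsepower"], errors="coerce")

# Values that were present in the original but failed to parse.
failed = converted.isna() & df_T["horsepower"].notna()
bad_values = df_T.loc[failed, "horsepower"]

# Store the numeric version; unparseable entries are now NaN.
df_T["horsepower"] = converted
```

Inspecting bad_values first makes it easier to decide whether NaN is the right replacement or whether those rows need separate handling.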
When I write/save the CSV file, the column data is saved as NoneType.
I am changing the columns to a numeric type one at a time using the command below:
df['A'] = pd.to_numeric(df['A'])
That is fine when there are only a few columns, but in a large dataset converting each column individually is a big task.
I am doing this because I have to plot a line graph from the column data, and because of the NoneType I am not able to plot it.
I want the data saved in its natural data type, so that float data is saved as float and integer data is saved as int.
If some of your columns are non-numeric, you can try this:
for c in df.columns:
    try:
        df[c] = pd.to_numeric(df[c])
    except (ValueError, TypeError):
        # leave columns that cannot be parsed as numbers unchanged
        pass
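Check this; hopefully it solves your problem. pd.to_numeric does not accept a whole DataFrame, so apply it column by column (this raises if any column holds non-numeric values):
df = df.apply(pd.to_numeric)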
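The column-by-column approach can be wrapped in a small helper. This is only a sketch: the function name to_numeric_where_possible and the sample frame are hypothetical, and a column is converted only if every value in it parses.

```python
import pandas as pd

def to_numeric_where_possible(frame):
    """Return a copy with every fully parseable column converted to a numeric dtype."""
    out = frame.copy()
    for col in out.columns:
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            # Column contains something non-numeric; keep it unchanged.
            pass
    return out

# Hypothetical data: A is numeric, B has one bad value, C is plain text.
df = pd.DataFrame({
    "A": ["1", "2", "3"],
    "B": ["1.5", "2.5", "x"],
    "C": ["u", "v", "w"],
})
converted = to_numeric_where_possible(df)
```

In this sample only A changes type; B is left as it was because of the single 'x'. Using errors='coerce' instead would convert B as well, at the cost of turning 'x' into NaN.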
I am trying to read a CSV file as a pandas data frame. Besides the column names, I am given the expected dtype of each column. My approach is:
reading the CSV with inferred column types (as I want to be able to catch issues)
reading the expected column types
iterating over the columns and trying to convert them with 'astype'
Now, I have timedeltas in nanoseconds. They are read in as float64 and can contain missing values. 'astype' fails with the following message:
ValueError: Cannot convert non-finite values (NA or inf) to integer
This little script reproduces my issue. The method 'to_timedelta' works perfectly on my data, while the 'astype' conversion gives the error.
import pandas as pd
import numpy as np
timedeltas = [200800700, 30020010030, np.nan]
data = {'timedelta': timedeltas}
pd.to_timedelta(timedeltas)
df = pd.DataFrame(data)
df.dtypes
df['timedelta'].astype('timedelta64[ns]')
Can anybody help me fix this issue? Is there any safe representation other than nanoseconds that would work with 'astype'?
Thanks to MrFuppes.
It's not possible to use astype() but to_timedelta works. Thank you!
df['timedelta'] = pd.to_timedelta(df['timedelta'])
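A short sketch of that conversion on the data from the question, with the unit spelled out explicitly (passing unit="ns" is an addition here; nanoseconds is what the question says the numbers represent):

```python
import numpy as np
import pandas as pd

# Float column of nanosecond counts with one missing value, as in the question.
df = pd.DataFrame({"timedelta": [200800700, 30020010030, np.nan]})

# to_timedelta maps NaN to NaT instead of failing on the integer cast.
df["timedelta"] = pd.to_timedelta(df["timedelta"], unit="ns")
```

The missing entry ends up as NaT, so the column can hold both valid durations and gaps.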
I am trying to convert a pandas data frame (read in from a .csv file) from string to float. Columns 1 to 20 are recognized as strings by the system. However, they hold float values in the format "10,93847722". Therefore, I tried the following code:
new_df = df[df.columns[1:20]].transform(lambda col: col.str.replace(',','.').astype(float))
The last line causes the Error:
AttributeError: 'DataFrame' object has no attribute 'transform'
It may be important to know that I can only use pandas version 0.16.2.
Thank you very much for your help!
#all: Short extract from one of the columns
23,13854599
23,24945831
23,16853714
23,0876255
23,05908775
Use DataFrame.apply:
df[df.columns[1:20]].apply(lambda col: col.str.replace(',','.').astype(float))
EDIT: If some non-numeric values are possible, use to_numeric with errors='coerce' to replace those values with NaN:
df[df.columns[1:20]].apply(lambda col: pd.to_numeric(col.str.replace(',','.'),errors='coerce'))
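The apply-based conversion can be tried on a small made-up frame like the one below (the column names and the 'n/a' entry are assumptions; the numbers are from the extract in the question). This sketch assumes a recent pandas version, since pd.to_numeric is not available in 0.16.2.

```python
import pandas as pd

# Hypothetical frame: first column is a label, the rest use ',' as decimal mark.
df = pd.DataFrame({
    "id": ["a", "b", "c"],
    "x": ["23,13854599", "23,24945831", "n/a"],
    "y": ["23,0876255", "23,05908775", "23,16853714"],
})

# Convert every column except the first one.
cols = list(df.columns[1:])
df[cols] = df[cols].apply(
    lambda col: pd.to_numeric(col.str.replace(",", "."), errors="coerce")
)
```

Assigning the result back to df[cols] keeps the untouched first column in the same frame, whereas the one-liners above return only the converted columns.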
You should load them directly as numbers:
pd.read_csv(..., decimal=',')
This will recognize , as decimal point for every column.
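A minimal sketch of loading such a file directly, using an in-memory string in place of the real CSV. The semicolon delimiter and the column names are assumptions; a file with decimal commas usually needs a field separator other than ',' or quoted fields.

```python
import io
import pandas as pd

# Hypothetical file contents: ';' separates fields, ',' is the decimal mark.
csv_text = (
    "name;a;b\n"
    "r1;23,13854599;23,0876255\n"
    "r2;23,24945831;23,05908775\n"
)

# decimal=',' makes the parser read the values as floats right away.
df = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",")
```

With this approach no string replacement is needed afterwards, because the numeric columns already arrive as floats.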