Having problems converting strings to floats in pandas data frame [duplicate] - python

This question already has answers here:
Convert number strings with commas in pandas DataFrame to float
(4 answers)
Closed 4 years ago.
I'm trying to make a scatter plot from a pandas DataFrame, but "ValueError: x and y must be the same size" keeps popping up. It looks like the Slaughter Steers column holds strings instead of floats, so I tried to convert it, but then I get ValueError: could not convert string to float: '1,062.6'. I tried replacing ' with a space, but I still get the same error.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
#Read in Data set date as index
cattle_price = pd.read_csv('C:/Users/SkyLH/Documents/CattleForcast Model/Slaughter Cattle Monthly Data.csv', index_col = 'DATE')
cattle_slaughter = pd.read_csv('C:/Users/SkyLH/Documents/Cattle Forcast Model/SlaughterCountsFull - Sheet1.csv', index_col = 'Date')
cattle_price.index = pd.to_datetime(cattle_price.index)
cattle_price.index.names = ['Date',]
cattle_slaughter.replace("'"," ")
cattle_slaughter.astype(float)
cattle_df = cattle_price.join(cattle_slaughter, how = 'inner')
print(cattle_df)
plt.scatter(cattle_df, y = 'Price')
plt.show()
Price Slaughter Steers
Date
1955-01-01 34.899999 983.8
1955-02-01 35.999998 847.9
1955-03-01 34.600001 1,062.6
1955-04-01 35.800002 1,000.9
1955-05-01 33.100002 1,090.1

I believe the commas (thousands separators) are preventing the conversion. This question has possible solutions that may help you:
How do I use Python to convert a string to a number if it has commas in it as thousands separators?
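A minimal sketch of that idea applied to the data in the question, assuming the `Slaughter Steers` column was read as strings. The small frame below is a stand-in built from the first rows of the printed output, not the real CSV files.

```python
import pandas as pd

# Stand-in for the joined frame, using rows from the printed output above.
cattle_df = pd.DataFrame(
    {
        "Price": [34.899999, 35.999998, 34.600001],
        "Slaughter Steers": ["983.8", "847.9", "1,062.6"],
    },
    index=pd.to_datetime(["1955-01-01", "1955-02-01", "1955-03-01"]),
)
cattle_df.index.name = "Date"

# Strip the thousands separators, then convert. The result is assigned back
# because neither str.replace nor astype modifies the frame in place.
cattle_df["Slaughter Steers"] = (
    cattle_df["Slaughter Steers"].str.replace(",", "", regex=False).astype(float)
)

print(cattle_df.dtypes)
```

Once both columns are numeric, they can be passed to the plot as two separate sequences of equal length, for example `plt.scatter(cattle_df["Slaughter Steers"], cattle_df["Price"])`.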

Related

Convert objects to numeric values [duplicate]

This question already has answers here:
Change Pandas String Column with commas into Float
(2 answers)
Closed 6 months ago.
I have a CSV file and it has a column full of numbers. These numbers can be formatted as 45.11, 1,234.33, 122.33, 10,222.22, etc.
Right now they are showing up as objects in my data frame, and I need to convert them to numeric. I have tried:
df['Value'].astype(str).astype(float)
But I am getting errors like this:
ValueError: could not convert string to float: '1,054.43'
Does anyone know how to solve this for the weirdly formatted numbers?
This should do the job:
vals={'Value': ["45.11" , "1,234.33", "122.33", "10,222.22"]}
df = pd.DataFrame(vals)
df.Value = df.Value.apply(lambda x: x.replace(",", "")).astype(float)
print(df.Value)
Output:
0 45.11
1 1234.33
2 122.33
3 10222.22
Name: Value, dtype: float64
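If the values come straight from a CSV file, another option is to let the reader handle the separators through the `thousands` parameter of `pd.read_csv`. This is only a sketch: the in-memory `csv_text` below is a stand-in for the real file, and it assumes the fields containing commas are quoted so they are not split into two columns.

```python
import io

import pandas as pd

# In-memory stand-in for the CSV file; the quotes keep "1,234.33" in one field.
csv_text = 'Value\n45.11\n"1,234.33"\n122.33\n"10,222.22"\n'

# thousands="," tells the parser to ignore commas inside numeric fields.
df = pd.read_csv(io.StringIO(csv_text), thousands=",")

print(df["Value"])
```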

How do I fix this type error ('value' must be an instance of str or bytes, not a float) on Python

I want to plot a graph of COVID-19 cases in India. There is no problem when I enter the x and y data manually, but the data is quite long, and when I read it from a .csv file instead I get this error: 'value' must be an instance of str or bytes, not a float. I also tried wrapping the data as int(corona_case), but that gives a new error: cannot convert the series to <class 'int'>. I would also appreciate suggestions for tutorials on plotting graphs involving datetimes in Python, since this is my first time learning Python.
I am using Python 3.
P.S. I can't find a way to share my CSV file, so I am leaving it in the snippet.
import pandas as pd
from datetime import datetime, timedelta
from matplotlib import pyplot as plt
from matplotlib import dates as mpl_dates
import numpy as np
plt.style.use('seaborn')
data = pd.read_csv('india.csv')
corona_date = data['Date']
corona_case = data['Case']
plt.plot_date (corona_date, corona_case, linestyle='solid')
plt.gcf().autofmt_xdate()
plt.title('COVID-19 in India')
plt.xlabel('Date')
plt.ylabel('Cumulative Case')
plt.tight_layout()
plt.show()
Date,Case
2020-09-30,6225763
2020-10-01,6312584
2020-10-02,6394068
2020-10-03,6473544
2020-10-04,6549373
2020-10-05,6623815
2020-10-06,6685082
# convert all DataFrame columns to the int64 dtype
df = df.astype(int)
# convert column "a" to int64 dtype and "b" to complex type
df = df.astype({"a": int, "b": complex})
# convert Series to float16 type
s = s.astype(np.float16)
# convert Series to Python strings
s = s.astype(str)
# convert Series to categorical type - see docs for more details
s = s.astype('category')
You can convert the type of a column in a pandas DataFrame like this:
corona_case = data['Case'].astype(int) # or str for string
if that is what you are trying to do.
I can't give further information, but this might work:
corona_date = (data['Date']).astype(str)
corona_case = (data['Case']).astype(str)
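An alternative sketch that parses the dates while reading the file, so the x values are real datetimes and not strings. The cause of the original error is not confirmed here; this assumes it comes from unparsed dates or missing values in the file, and `csv_text` is a stand-in for `india.csv` built from the rows in the question.

```python
import io

import pandas as pd

# In-memory stand-in for india.csv, using rows from the question.
csv_text = """Date,Case
2020-09-30,6225763
2020-10-01,6312584
2020-10-02,6394068
"""

# parse_dates converts the Date column to datetime64 while reading.
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Drop rows where either value is missing (assumption: the real file may
# contain blank or incomplete lines).
data = data.dropna(subset=["Date", "Case"])

corona_date = data["Date"]
corona_case = data["Case"].astype(int)

print(data.dtypes)
```

The resulting `corona_date` and `corona_case` can then be passed to `plt.plot` in place of the raw columns.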

How do I convert one column from an imported csv using numpy from string to float?

I have two CSV files which I have imported into Python using NumPy.
The data has 2 columns:
[['month' 'total_rainfall']
['1982-01' '107.1']
['1982-02' '27.8']
['1982-03' '160.8']
['1982-04' '157']
['1982-05' '102.2']
I need to create a 2D array and calculate statistics for the 'total_rainfall' column (mean, standard deviation, min and max).
So I have this:
import numpy as np
datafile=np.genfromtxt("C:\rainfall-monthly-total.csv",delimiter=",",dtype=None,encoding=None)
print(datafile)
rainfall=np.asarray(datafile).astype(np.float32)
print (np.mean(datafile,axis=1))
ValueError: could not convert string to float: '2019-04'
Converting a str to a float works like this:
>>> a = "545.2222"
>>> float(a)
545.2222
>>> int(float(a))
545
But the error message says the problem is converting 2019-04 to float.
Converting 2019-04 to a float doesn't work because float literals can't contain a - in the middle. That is why you got the error.
You can convert the rainfall values to float or int, but the date can't be converted directly. To convert a date to an int, you have to split the string, combine the parts back into a date, and then convert it to milliseconds:
from datetime import datetime
month1 = '1982-01'
# datetime() needs integers, so convert the split string parts first
year, month = (int(part) for part in month1.split('-'))
date = datetime(year, month, 1)
milliseconds = int(round(date.timestamp() * 1000))
This assumes the first day of the month.
Your error message reads could not convert string to float,
but actually your problem is a bit different.
Your array contains string columns, which should be converted:
month - to Period (month),
total_rainfall - to float.
Unfortunately, NumPy was designed to process arrays where all
cells are of the same type, so a much more convenient tool is pandas,
where each column can have its own type.
First, convert your NumPy array (I assume arr) to a pandas
DataFrame:
import pandas as pd
df = pd.DataFrame(arr[1:], columns=arr[0])
I took column names from the initial row and data from
following rows. Print df to see the result.
So far both columns are still of object type (actually string),
so the only thing to do is to convert both columns,
each to its desired type:
df.month = pd.PeriodIndex(df.month, freq='M')
df.total_rainfall = df.total_rainfall.astype(float)
Now, when you run df.info(), you will see that both
columns are of proper types.
To process your data, use pandas as well; it is the more convenient tool.
E.g. to get quarterly sums, you can run:
df.set_index('month').resample('Q').sum()
getting (for your data sample):
total_rainfall
month
1982Q1 295.7
1982Q2 259.2
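If staying in NumPy is preferred, the statistics asked for in the question only need the second column, so the month strings never have to be converted. A minimal sketch, assuming `arr` is the string array printed in the question (header row first); only the five rows shown there are used.

```python
import numpy as np

# Stand-in for the array returned by genfromtxt, from the rows shown above.
arr = np.array(
    [
        ["month", "total_rainfall"],
        ["1982-01", "107.1"],
        ["1982-02", "27.8"],
        ["1982-03", "160.8"],
        ["1982-04", "157"],
        ["1982-05", "102.2"],
    ]
)

# Skip the header row and take only the rainfall column before converting.
rainfall = arr[1:, 1].astype(float)

stats = {
    "mean": rainfall.mean(),
    "std": rainfall.std(),
    "min": rainfall.min(),
    "max": rainfall.max(),
}
print(stats)
```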

Multiply by a float in pandas -> numbers with comma disappearing [duplicate]

This question already has answers here:
TypeError: can't multiply sequence by non-int of type 'float' (python 2.7)
(1 answer)
Finding non-numeric rows in dataframe in pandas?
(7 answers)
Change column type in pandas
(16 answers)
Closed 4 years ago.
I'm having an issue applying a currency rate in pandas.
Some numbers are converted to NaN whenever they contain a comma, e.g. 1,789 becomes NaN.
I started with this code:
import pandas as pd
usd_rate = 0.77
salary = pd.read_csv("salary.csv")
#create revenue clean (convert usd to gbp)
salary['revenue_gbp'] = salary.usd_revenue * usd_rate
So I was getting this error:
TypeError: can't multiply sequence by non-int of type 'float'
I've read that you can't multiply the column by a float, so I converted my column to numeric:
salary.net_revenue = pd.to_numeric(salary.usd_revenue, errors='coerce')
salary['revenue_gbp'] = salary.usd_revenue * usd_rate
Now I don't get any errors, yet when I look at my file, all of the numbers above 999.99 (the ones containing a comma) come out as NaN.
I thought it could be a translation issue, but I'm getting confused here.
Any ideas?
Thanks a lot.
usd_revenue is probably not already a numeric type. Try this:
salary['usd_revenue'] = salary['usd_revenue'].map(float)
before your actual line:
salary['revenue_gbp'] = salary.usd_revenue * usd_rate
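Note that `float('1,789')` raises a ValueError, and `pd.to_numeric(..., errors='coerce')` turns such strings into NaN, so the thousands separators have to be removed before either conversion. A minimal sketch, assuming `usd_revenue` was read as strings; the three sample values are hypothetical stand-ins for `salary.csv`.

```python
import pandas as pd

usd_rate = 0.77

# Hypothetical stand-in for salary.csv; only "1,789" comes from the question.
salary = pd.DataFrame({"usd_revenue": ["950.50", "1,789", "12,000.25"]})

# Remove the thousands separators first, then convert to numbers.
cleaned = salary["usd_revenue"].astype(str).str.replace(",", "", regex=False)
salary["usd_revenue"] = pd.to_numeric(cleaned, errors="coerce")

# The column is now float64, so multiplying by a float works element-wise.
salary["revenue_gbp"] = salary["usd_revenue"] * usd_rate

print(salary)
```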

Pandas - ValueError: Error parsing datetime string "17-Jan-23" at position 3 [duplicate]

This question already has an answer here:
Error parsing datetime string "09-11-2017 00:02:00" at position 8
(1 answer)
Closed 4 years ago.
I have the following code where I am reading date column:
data = pd.DataFrame(array, columns=names)
data[['D_DATE']] = data[['D_DATE']].astype('datetime64')
But this gives me an error:
ValueError: Error parsing datetime string "17-Jan-23" at position 3
Can someone help me resolve this?
Try this:
data['D_DATE'] = pd.to_datetime(data['D_DATE'])
Indexing a single column with double brackets (df[['D_DATE']]) returns a DataFrame with one column named 'D_DATE'. Indexing with a single set of brackets (df['D_DATE']) returns a Series named 'D_DATE'. To create a new column in a DataFrame using the form df[new_col], use single brackets.
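A string such as "17-Jan-23" is ambiguous (the 17 could be a day or a year), so it may help to state the format explicitly. A minimal sketch, assuming the layout is day, abbreviated English month name, two-digit year; if the real data is year-month-day, the format string would be `'%y-%b-%d'`.

```python
import pandas as pd

# Stand-in frame containing the value from the error message.
data = pd.DataFrame({"D_DATE": ["17-Jan-23"]})

# Assumption: day-month-year. %d = day, %b = abbreviated month, %y = 2-digit year.
data["D_DATE"] = pd.to_datetime(data["D_DATE"], format="%d-%b-%y")

print(data["D_DATE"])
```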
