Unable to correctly use Pandas Interpolate over a series - python

I am trying to use the interpolation functionality provided by Pandas, but for some reason I cannot get my Series to adjust to the correct values. I cast them to float64, but that did not appear to help. Any recommendations?
The code:
for feature in price_data:
    print(price_data[feature])
    print("type:")
    print(type(price_data[feature]))
    newSeries = price_data[feature].astype(float).interpolate()
    print("newSeries: ")
    print(newSeries)
The output:
0 178.9000
1 0.0000
2 178.1200
Name: open_price, dtype: object
type:
<class 'pandas.core.series.Series'>
newSeries:
0 178.90
1 0.00
2 178.12
Name: open_price, dtype: float64

The problem is that there is nothing to interpolate: interpolate() only fills missing values (NaN), and 0 is a valid number. I'm assuming you want to interpolate the value where the zero is. In that case, replace the zero with np.nan and then interpolate. One way to do this is
price_data = price_data.astype(float)
price_data.where(price_data != 0, np.nan).interpolate()
0 178.90
1 178.51
2 178.12
Name: open_price, dtype: float64
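A minimal, self-contained sketch of the same idea, assuming the zeros really mark missing prices. The sample values are copied from the question, and replace is used here instead of where, which is only one of several equivalent spellings:

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question: object dtype, with 0 standing in for a gap.
open_price = pd.Series(["178.9000", "0.0000", "178.1200"], name="open_price")

# Cast to float, mark zeros as missing, then interpolate linearly.
filled = open_price.astype(float).replace(0, np.nan).interpolate()
print(filled)
```

The middle value ends up halfway between its neighbours, because the default interpolation method is linear.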

Related

Pandas data frame. Change float format. Keep type "float"

I'm trying to change the display format of a pandas DataFrame column without changing the data type.
Here is what I have: df = pd.DataFrame({'Age': [24.0, 32.0]})
I'd like to display Age as 24 32 or as 24.00 32.00 while keeping the values as floats.
Here is what I can do:
df['Age'].map('{:,.2f}'.format)
But this line changes the type of data to object.
I was also trying to apply:
df = df.style.format({'Age': '{:,.2f}'.format})
but there is something wrong with it. Please help me figure out the right way.
Your DataFrame column is already of type float.
Dataframe:
>>> df
Age
0 24.0
1 32.0
Check DataFrame type:
>>> df.dtypes
Age float64
dtype: object
Check the dtype of the column:
>>> df.Age
0 24.0
1 32.0
Name: Age, dtype: float64
OR even check like:
>>> df['Age'].dtype.kind
'f'
The formatting you used does produce two decimal places, but it returns strings. Converting those strings back to float shows a single trailing zero again, because that is how floats are displayed by default.
>>> df['Age'].map('{:,.2f}'.format)
0 24.00
1 32.00
Name: Age, dtype: object
Since you want the values to look either like ints (24, 32) or like 24.00 and 32.00, and you are only interested in how the floats are displayed, you can use pd.set_option('display.float_format','{:.0f}'.format), which doesn't actually affect your data.
Float format without decimals:
>>> pd.set_option('display.float_format','{:.0f}'.format)
>>> df
Age
0 24
1 32
>>> df.dtypes
Age float64
dtype: object
Float format with two decimals:
>>> pd.set_option('display.float_format','{:.2f}'.format)
>>> df
Age
0 24.00
1 32.00
>>> df.dtypes
Age float64
dtype: object
Alternative way
Set the display precision option:
>>> pd.set_option('display.precision', 0)
>>> df
Age
0 24
1 32
>>> df.dtypes
Age float64
dtype: object
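I believe using df.round is the best way. Note that it rounds the stored values and keeps them as floats; it does not change how many decimals are displayed:
>>> df = pd.DataFrame({'Age': [24.0, 32.0]})
>>> df2 = df.round({'Age': 2})
>>> print(df2.dtypes)
Age float64
dtype: object
>>> df2
Age
0 24.0
1 32.0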
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.round.html
If you want to apply the format to a specific column of the DataFrame (note that this stores strings, not floats):
df["col_name"] = df["col_name"].apply(lambda x: format(float(x),".2f"))
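To make the difference between the approaches above concrete, here is a small sketch (my own illustration, not taken from any of the answers) that compares formatting to strings with a temporary display option. The variable names as_text and rendered are made up for the example:

```python
import pandas as pd

df = pd.DataFrame({"Age": [24.0, 32.0]})

# Formatting each value produces strings, so the column is no longer float.
as_text = df["Age"].map("{:,.2f}".format)

# A display option only affects how the frame is rendered; the data stays float64.
with pd.option_context("display.float_format", "{:.2f}".format):
    rendered = df.to_string()

print(type(as_text.iloc[0]))
print(df["Age"].dtype)
print(rendered)
```

Using option_context instead of set_option keeps the format change local to the with block, which may be preferable if the rest of the session should keep the default display.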

Convert DataFrame with 'N/As' to float to compute percent change

I am trying to convert the following DataFrame (which contains several 'N/A' values) to float so that I can perform a percent change operation:
d = pd.DataFrame({"A":['N/A','$10.00', '$5.00'],
"B":['N/A', '$10.00', '-$5.00']})
Ultimately, I would like the result to be:
(UPDATE: I do not want to remove the original N/A values. I'd like to keep them there as placeholders.)
Because there aren't any flags for dealing with negative numbers, I cannot use:
pct_change(-1)
So, I need to use:
d['A'].diff(-1)/d['A'].shift(-1).abs()
But, I get the error:
TypeError: unsupported operand type(s) for -: 'str' and 'str'
For a first step, I am trying to convert the data from object/string to float, but the output is unexpected (to me). I am getting float 'NaNs' instead of the actual number.
>d['A_float'] = pd.to_numeric(d['A'], errors='coerce')
>d
A B A_float
0 N/A N/A NaN
1 $10.00 -$100.00 NaN
2 $5.00 -$5.00 NaN
>d.dtypes
A object
B object
A_float float64
dtype: object
As a simple test, I tried subtracting '1' from the value, but still got float 'NaN'.
>d['A_float_minus1_test'] = pd.to_numeric(d['A'], errors='coerce')-1
>d
A B A_float A_float_minus1_test
0 N/A N/A NaN NaN
1 $10.00 -$100.00 NaN NaN
2 $5.00 -$5.00 NaN NaN
>d.dtypes
A object
B object
A_float float64
A_float_minus1_test float64
dtype: object
Is there a simple way to get the following result? The way I am thinking is to individually change each DataFrame column to float, then perform the operation. There must be an easier way.
Desired output:
(UPDATE: I do not want to remove the original N/A values. I'd like to keep them there as placeholders.)
Thanks!
To convert your columns from string to float, you can use apply, like this:
d['A_float'] = d['A'].apply(lambda x: float(x.replace('$', '')) if x != 'N/A' else np.nan)
d['B_float'] = d['B'].apply(lambda x: float(x.replace('$', '')) if x != 'N/A' else np.nan)
The x.replace('$', '') removes the $ character while keeping a leading minus sign, and 'N/A' becomes NaN so that it stays as a placeholder.
I am not sure what you are trying to do after that, but if you want to compute B as a percentage of A, you can use np.vectorize like this:
def percent(p1, p2):
    return (100 * p2) / p1
d['Percent'] = np.vectorize(percent)(d['A_float'], d['B_float'])
import pandas as pd
d = pd.DataFrame({"A":['N/A','$10.00', '$5.00'],
"B":['N/A', '$10.00', '-$5.00']})
# Convert to numbers: remove the literal '$' (regex=False, since '$' is a regex anchor) and assign to new columns
d[['dA','dB']] = d[['A','B']].apply(lambda s: s.str.replace('$','', regex=False)).apply(pd.to_numeric, errors='coerce')
# Perform calculations across desired column
d[['dA','dB']] = d[['dA','dB']].diff(-1)/d[['dA','dB']].shift(-1).abs()
print(d)
A B dA dB
0 N/A N/A NaN NaN
1 $10.00 $10.00 1.0 3.0
2 $5.00 -$5.00 NaN NaN

Converting exponential notation numbers to strings - explanation

I have DataFrame from this question:
temp=u"""Total,Price,test_num
0,71.7,2.04256e+14
1,39.5,2.04254e+14
2,82.2,2.04188e+14
3,42.9,2.04171e+14"""
from io import StringIO  # pd.compat.StringIO is not available in newer pandas versions
df = pd.read_csv(StringIO(temp))
print (df)
Total Price test_num
0 0 71.7 2.042560e+14
1 1 39.5 2.042540e+14
2 2 82.2 2.041880e+14
3 3 42.9 2.041710e+14
Converting the floats to strings gives a trailing .0:
print (df['test_num'].astype('str'))
0 204256000000000.0
1 204254000000000.0
2 204188000000000.0
3 204171000000000.0
Name: test_num, dtype: object
The solution is to convert the floats to int64 first:
print (df['test_num'].astype('int64'))
0 204256000000000
1 204254000000000
2 204188000000000
3 204171000000000
Name: test_num, dtype: int64
print (df['test_num'].astype('int64').astype(str))
0 204256000000000
1 204254000000000
2 204188000000000
3 204171000000000
Name: test_num, dtype: object
The question is: why does it convert this way?
I added this poor explanation, but I feel it should be better:
Poor explanation:
You can check the dtype of the converted column; it is float64.
print (df['test_num'].dtype)
float64
Converting to string removes the exponential notation but still formats the values as floats, so a trailing .0 is added:
print (df['test_num'].astype('str'))
0 204256000000000.0
1 204254000000000.0
2 204188000000000.0
3 204171000000000.0
Name: test_num, dtype: object
When you use pd.read_csv to import data and do not define data types, pandas makes an educated guess and in this case decides that column values like "2.04256e+14" are best represented by a float value. Converting this back to a string adds a ".0". As you correctly write, converting to int64 fixes this.
If you know before input that the column has int64 values only (and no empty values, which np.int64 cannot handle), you can force this type on import to avoid the unneeded conversions.
import numpy as np
import pandas as pd
from io import StringIO  # pd.compat.StringIO is not available in newer pandas versions
temp=u"""Total,Price,test_num
0,71.7,2.04256e+14
1,39.5,2.04254e+14
2,82.2,2.04188e+14
3,42.9,2.04171e+14"""
df = pd.read_csv(StringIO(temp), dtype={2: np.int64})
print(df)
returns
Total Price test_num
0 0 71.7 204256000000000
1 1 39.5 204254000000000
2 2 82.2 204188000000000
3 3 42.9 204171000000000
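The conversion chain discussed in this thread can be reproduced without a CSV file. This is only a sketch with two of the values from the question typed in by hand; the names as_float_text and as_int_text are made up for the example:

```python
import pandas as pd

test_num = pd.Series([2.04256e+14, 2.04254e+14], name="test_num")

# A whole-valued float is rendered with a trailing ".0" when turned into a string.
as_float_text = test_num.astype(str)

# Going through int64 first drops the fractional part before formatting.
as_int_text = test_num.astype("int64").astype(str)

print(as_float_text.tolist())
print(as_int_text.tolist())
```

This is safe here because the values are whole numbers well below 2**53, so the float-to-int64 cast loses nothing.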

Pandas convert data type from object to float

I read some weather data from a .csv file as a dataframe named "weather". The problem is that the data type of one of the columns is object. This is weird, as it indicates temperature. How do I change it to having a float data type? I tried to_numeric, but it can't parse it.
weather.info()
weather.head()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 304 entries, 2017-01-01 to 2017-10-31
Data columns (total 2 columns):
Temp 304 non-null object
Rain 304 non-null float64
dtypes: float64(1), object(1)
memory usage: 17.1+ KB
Temp Rain
Date
2017-01-01 12.4 0.0
2017-02-01 11 0.6
2017-03-01 10.4 0.6
2017-04-01 10.9 0.2
2017-05-01 13.2 0.0
You can use pandas.Series.astype, for example:
weather["Temp"] = weather.Temp.astype(float)
You can also use pd.to_numeric, which converts the column from object to float.
For details on how to use it, check out this link: http://pandas.pydata.org/pandas-docs/version/0.20/generated/pandas.to_numeric.html
Example :
s = pd.Series(['apple', '1.0', '2', -3])
print(pd.to_numeric(s, errors='ignore'))
print("=========================")
print(pd.to_numeric(s, errors='coerce'))
Output:
0 apple
1 1.0
2 2
3 -3
dtype: object
=========================
0 NaN
1 1.0
2 2.0
3 -3.0
dtype: float64
In your case you can do something like this:
weather["Temp"] = pd.to_numeric(weather.Temp, errors='coerce')
Another option is convert_objects, which is deprecated and only available in older pandas versions.
An example is as follows:
>> pd.Series([1,2,3,4,'.']).convert_objects(convert_numeric=True)
0 1
1 2
2 3
3 4
4 NaN
dtype: float64
You can use this as follows:
weather["Temp"] = weather.Temp.convert_objects(convert_numeric=True)
I have shown these examples because any value in your column that is not a number will be converted to NaN, so be careful when using it.
I tried all methods suggested here but sadly none worked. Instead, found this to be working:
df['column'] = pd.to_numeric(df['column'],errors = 'coerce')
And then check it using:
print(df.info())
I eventually used:
weather["Temp"] = weather["Temp"].convert_objects(convert_numeric=True)
It worked just fine, except that I got the following message.
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:3: FutureWarning:
convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
You can try the following:
df['column'] = df['column'].map(lambda x: float(x))
First check your data, because you may get an error if it uses ',' instead of '.' as the decimal separator.
If so, you need to turn every ',' into '.' with a function:
def replacee(s):
    s = str(s)
    i = s.find(',')
    if i >= 0:
        return s[:i] + '.' + s[i+1:]
    else:
        return s
Then apply this function to every row in your column:
dfOPA['Montant']=dfOPA['Montant'].apply(replacee)
After that the conversion will work fine:
dfOPA['Montant'] = pd.to_numeric(dfOPA['Montant'],errors = 'coerce')
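The same decimal-comma fix can probably be done without a helper function by using the vectorized string methods. A sketch on made-up sample values (the montant Series here is hypothetical, not the poster's data):

```python
import pandas as pd

# Hypothetical column that uses a decimal comma.
montant = pd.Series(["12,5", "7", "3,25"], name="Montant")

# Replace the literal comma with a dot for the whole column, then convert.
converted = pd.to_numeric(montant.str.replace(",", ".", regex=False), errors="coerce")
print(converted)
```

If the data comes from a CSV file, passing decimal=',' to pd.read_csv is another way to handle this at load time.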
For example, to convert the object value $40,000.00 to 40000 as int or float32, follow these steps:
1. Remove $: $40,000.00 -> 40,000.00
2. Remove the comma: 40,000.00 -> 40000.00
3. Remove the dot: 40000.00 -> 4000000
4. Remove empty spaces: 4000000
5. Remove NA values: 4000000
6. The column is still object type, so convert it to int using .astype(int): 4000000
7. Divide by 100: 40000
Implementing the code in pandas (regex=False is needed because '$' and '.' are special characters in regular expressions):
table1["Price"] = table1["Price"].str.replace('$','', regex=False)
table1["Price"] = table1["Price"].str.replace(',','', regex=False)
table1["Price"] = table1["Price"].str.replace('.','', regex=False)
table1["Price"] = table1["Price"].str.replace(' ','', regex=False)
table1 = table1.dropna()
table1["Price"] = table1["Price"].astype(int)
table1["Price"] = table1["Price"] / 100
Finally it's done
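As a possible shortcut, the dot does not have to be removed and divided back out: once the $ and the thousands separator are gone, pd.to_numeric can parse the value directly. A sketch on made-up sample values (the price Series is hypothetical), which keeps missing entries as NaN instead of dropping them:

```python
import pandas as pd

# Hypothetical price column with a currency sign, thousands separators and a gap.
price = pd.Series(["$40,000.00", " $1,250.50 ", None], name="Price")

cleaned = (
    price.str.strip()                           # drop surrounding spaces
         .str.replace("$", "", regex=False)     # remove the literal dollar sign
         .str.replace(",", "", regex=False)     # remove the thousands separator
)

# Parse what is left; anything unparseable or missing becomes NaN.
as_float = pd.to_numeric(cleaned, errors="coerce")
print(as_float)
```

If integers are required, the NaN rows would still have to be dropped or filled before calling .astype(int).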

How do I display floats as currency with negative sign before currency

Consider the pd.Series s
s = pd.Series([-1.23, 4.56])
s
0 -1.23
1 4.56
dtype: float64
I can format floats with the pandas display.float_format option:
with pd.option_context('display.float_format', '${:,.2f}'.format):
    print(s)
0 $-1.23
1 $4.56
dtype: float64
But how do I format it in such a way that I get the - sign in front of the $?
0 -$1.23
1 $4.56
dtype: float64
You can substitute the formatting function with your own. Below is just a demo of how it works, you can tune it to your own needs:
def formatfunc(*args, **kwargs):
    value = args[0]
    if value >= 0:
        return '${:,.2f}'.format(value)
    else:
        return '-${:,.2f}'.format(abs(value))
with pd.option_context('display.float_format', formatfunc):
    print(s)
And you get:
0 -$1.23
1 $4.56
dtype: float64
