Computing daily return/increment on a DataFrame - Python

So I've got some time-series data on which I want to compute the daily return/increment, where daily increment = value_at_time(T) / value_at_time(T-1)
import pandas as pd
df = pd.DataFrame([1,2,3,7])  # sample DataFrame
df[1:]
out:
0
1 2
2 3
3 7
df[:-1]
out:
0
0 1
1 2
2 3
######### Method 1
df[1:]/df[:-1]
out:
0
0 NaN
1 1.0
2 1.0
3 NaN
######### Method 2
df[1:]/df[:-1].values
out:
0
1 2.000000
2 1.500000
3 2.333333
######### Method 3
df[1:].values/df[:-1]
out:
0
0 2
1 1
2 2
My questions are:
If df[:-1] and df[1:] each have only three values (row slices of the DataFrame), why doesn't Method 1 work?
Why do Methods 2 and 3, which are almost identical, give different results?
Why does using .values in Method 2 make it work?

Let's look at each method.
Method 1: if you look at what the slices return, you can see that the indices don't align:
In [87]:
print(df[1:])
print(df[:-1])
0
1 2
2 3
3 7
0
0 1
1 2
2 3
so when you do the division, only two index labels (1 and 2) intersect; the non-overlapping labels produce NaN:
In [88]:
df[1:]/df[:-1]
Out[88]:
0
0 NaN
1 1.0
2 1.0
3 NaN
Method 2: .values produces a NumPy array, which has no index, so the division is performed element-wise in order, as expected:
In [89]:
df[:-1].values
Out[89]:
array([[1],
[2],
[3]], dtype=int64)
Giving:
In [90]:
df[1:]/df[:-1].values
Out[90]:
0
1 2.000000
2 1.500000
3 2.333333
Method 3 works for the same reason as Method 2: the NumPy array has no index, so the division is positional. The visible differences are that the result keeps the index of df[:-1] (0, 1, 2) rather than df[1:] (1, 2, 3), and since both operands are integers the division here was floored to 2, 1, 2 (Python 2 integer division) instead of 2.0, 1.5, 2.33.
So the question is: how do we do this in pure pandas? We use shift to align the indices as desired:
In [92]:
df.shift(-1)/df
Out[92]:
0
0 2.000000
1 1.500000
2 2.333333
3 NaN
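Note that the question's definition, value_at_time(T) / value_at_time(T-1), labels each ratio with the later timestamp, whereas df.shift(-1)/df labels it with the earlier one. A minimal sketch of the other alignment, together with the built-in pct_change, which reports the same information as a fractional change:
import pandas as pd
df = pd.DataFrame([1, 2, 3, 7])
# ratio labelled at time T, matching the question's definition
print(df / df.shift(1))   # NaN, 2.0, 1.5, 2.333...
# pct_change computes (value(T) - value(T-1)) / value(T-1), i.e. the ratio minus 1
print(df.pct_change())    # NaN, 1.0, 0.5, 1.333...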

I can't find the min value (which is > 0) in each row of selected columns with df[df[col] > 0]

This is my data, and I want to find the min value of the selected columns (a, b, c, d) in each row, then calculate the difference between that and dd. I need to ignore 0 in each row; for example, in the first row I need to find 8.

need to ignore 0 in rows

Then just replace those zeros with NaN; consider the following simple example:
import numpy as np
import pandas as pd
df = pd.DataFrame({"A":[1,2,0],"B":[3,5,7],"C":[7,0,7]})
df["minvalue"] = df.replace(0, np.nan).apply("min", axis=1)  # row-wise min, zeros ignored
print(df)
gives output
A B C minvalue
0 1 3 7 1.0
1 2 5 0 2.0
2 0 7 7 7.0
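Applied directly to the question's frame, a minimal sketch (column names a, b, c, d and dd taken from the question):
import numpy as np
import pandas as pd
df = pd.DataFrame({"a": [0, 2, 5, 0], "b": [15, 0, 3, 2],
                   "c": [0, 5, 3, 3], "d": [8, 3, 0, 4],
                   "dd": [6, 2, 2, 2]})
cols = ["a", "b", "c", "d"]
# zeros become NaN so that min() skips them, then subtract dd
df["min_dd"] = df[cols].replace(0, np.nan).min(axis=1) - df["dd"]
print(df)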
You can use pandas apply with axis=1: take the columns ['a','b','c','d'] from each row as a Series, replace 0 with +inf, and find the min. At the end, compute the difference between that min and column 'dd':
import numpy as np
df['min_dd'] = df.apply(lambda row: row[['a','b','c','d']].replace(0, np.inf).min() - row['dd'], axis=1)
print(df)
a b c d dd min_dd
0 0 15 0 8 6 2.0 # min_without_zero : 8 , dd : 6 -> 8-6=2
1 2 0 5 3 2 0.0 # min_without_zero : 2 , dd : 2 -> 2-2=0
2 5 3 3 0 2 1.0 # 3 - 2
3 0 2 3 4 2 0.0 # 2 - 2
You can try
cols = ['a','b','c','d']
df['res'] = df[cols][df[cols].ne(0)].min(axis=1) - df['dd']
print(df)
a b c d dd res
0 0 15 0 8 6 2.0
1 2 0 5 3 2 0.0
2 5 3 3 0 2 1.0
3 2 3 4 4 2 0.0

append two data frames with unequal columns

I am trying to append two dataframes in pandas which have different numbers of columns.
Example:
df1
A B
1 1
2 2
3 3
df2
A
4
5
Expected concatenated dataframe
df
A B
1 1
2 2
3 3
4 Null(or)0
5 Null(or)0
I am using df1.append(df2) when the columns are the same, but I have no idea how to deal with an unequal number of columns.
How about pd.concat?
>>> pd.concat([df1,df2])
A B
0 1 1.0
1 2 2.0
2 3 3.0
0 4 NaN
1 5 NaN
Also, df1.append(df2) gives the same result (note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so prefer pd.concat on current versions):
>>> df1.append(df2)
A B
0 1 1.0
1 2 2.0
2 3 3.0
0 4 NaN
1 5 NaN
From the docs of df.append:
Columns not in this frame are added as new columns.
Use concat to join the two frames and pass the additional argument ignore_index=True to reset the index; otherwise you might end up with indexes like 0 1 2 0 1. See the pd.concat documentation for additional information:
df1 = pd.DataFrame({'A':[1,2,3], 'B':[1,2,3]})
df2 = pd.DataFrame({'A':[4,5]})
df = pd.concat([df1,df2],ignore_index=True)
df
Output:
without ignore_index = True :
A B
0 1 1.0
1 2 2.0
2 3 3.0
0 4 NaN
1 5 NaN
with ignore_index = True :
A B
0 1 1.0
1 2 2.0
2 3 3.0
3 4 NaN
4 5 NaN
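Since the question accepts either null or 0 in the missing cells, here is a small follow-on sketch that fills the gaps with 0 after concatenating:
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3]})
df2 = pd.DataFrame({'A': [4, 5]})
# concat aligns on column names; cells missing from df2 become NaN, then 0
df = pd.concat([df1, df2], ignore_index=True).fillna(0)
print(df)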

Pandas agg function with operations on multiple columns

I am interested in whether we can use the pandas.core.groupby.DataFrameGroupBy.agg function to perform arithmetic operations on multiple columns. For example:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(15).reshape(5, 3))
df['C'] = [0, 0, 2, 2, 5]
print(df.groupby('C').mean()[0] - df.groupby('C').mean()[1])
# pseudocode for the multi-column aggregation I would like ('mean0-mean1' is not a real function):
print(df.groupby('C').agg({0: 'mean', 1: 'sum', 2: 'nunique', 'C': 'mean0-mean1'}))
Is it somehow possible that we receive result like in this example: the difference between means of column 0 and column 1 grouped by column 'C'?
df
0 1 2 C
0 0 1 2 0
1 3 4 5 0
2 6 7 8 2
3 9 10 11 2
4 12 13 14 5
Grouped difference
C
0 -1.0
2 -1.0
5 -1.0
dtype: float64
I am not interested in solutions that do not use the agg method. I am curious only whether agg can take multiple columns as arguments and perform some operation on them to return one column when the job is done.
IIUC:
In [12]: df.groupby('C').mean().diff(axis=1)
Out[12]:
0 1 2
C
0 NaN 1.0 1.0
2 NaN 1.0 1.0
5 NaN 1.0 1.0
or
In [13]: df.groupby('C').mean().diff(-1, axis=1)
Out[13]:
0 1 2
C
0 -1.0 -1.0 NaN
2 -1.0 -1.0 NaN
5 -1.0 -1.0 NaN
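For what it's worth, agg hands each column to the aggregation function separately, so it cannot combine two columns in a single aggregation; the closest multi-column idiom goes through apply, which receives the whole sub-frame per group. A sketch for contrast (this is apply, not agg, so it does not satisfy the question's constraint):
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(15).reshape(5, 3))
df['C'] = [0, 0, 2, 2, 5]
# apply sees every column of each group at once, so it can mix them
print(df.groupby('C').apply(lambda g: g[0].mean() - g[1].mean()))  # -1.0 per group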

Combine data from two columns into one, except if second is already occupied in pandas

Say I have two columns in a data frame, one of which is incomplete.
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b':[5, '', 6, '']})
df
Out:
a b
0 1 5
1 2
2 3 6
3 4
Is there a way to fill the empty values in column b with the corresponding values from column a, while leaving the rest of column b intact and without iterating over the column, so that you obtain:
df
Out:
a b
0 1 5
1 2 2
2 3 6
3 4 4
I think you can use the apply method, but I am not sure. For reference, the dataset I'm dealing with is quite large (approx. 1 GB), which is why iteration (my first attempt) was not a good idea.
If the blanks are empty strings, you could do:
In [165]: df.loc[df['b'] == '', 'b'] = df['a']
In [166]: df
Out[166]:
a b
0 1 5
1 2 2
2 3 6
3 4 4
However, if your blanks are NaNs, you could use fillna
In [176]: df
Out[176]:
a b
0 1 5.0
1 2 NaN
2 3 6.0
3 4 NaN
In [177]: df['b'] = df['b'].fillna(df['a'])
In [178]: df
Out[178]:
a b
0 1 5.0
1 2 2.0
2 3 6.0
3 4 4.0
You can use np.where to evaluate df.b: if it's not empty, keep its value; otherwise use df.a instead.
df.b = np.where(df.b, df.b, df.a)
df
Out[33]:
a b
0 1 5
1 2 2
2 3 6
3 4 4
You can use pd.Series.where with a boolean version of df.b, because '' resolves to False:
df.assign(b=df.b.where(df.b.astype(bool), df.a))
a b
0 1 5
1 2 2
2 3 6
3 4 4
You can use replace and ffill with axis=1:
df.replace('', np.nan).ffill(axis=1).astype(df.a.dtype)
Output:
a b
0 1 5
1 2 2
2 3 6
3 4 4
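One more variant in the same spirit, a sketch using Series.mask under the same empty-string assumption:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, '', 6, '']})
# mask replaces the values where the condition holds
df['b'] = df['b'].mask(df['b'] == '', df['a'])
print(df)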

Looking for a pandas function analogous to DataFrame.fillna()

I would like to apply a function that acts like fillna() but takes a value other than NaN. Unfortunately, DataFrame.replace() will not work in my case. Here is an example. Given a DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame([[1, 2, 3], [4, -1, -1], [5, 6, -1], [7, 8, np.nan]])
0 1 2
0 1 2.0 3.0
1 4 -1.0 -1.0
2 5 6.0 -1.0
3 7 8.0 NaN
I am looking for a function which will output:
0 1 2
0 1 2.0 3.0
1 4 2.0 3.0
2 5 6.0 3.0
3 7 8.0 NaN
So df.replace() with to_replace=-1 and method='ffill' will not work, because it requires a single column-independent value to replace the -1 entries, whereas in my example the replacement is column-dependent. I know I can code it with a loop, but I am looking for efficient code since it will be applied to a large DataFrame. Any suggestions? Thank you.
You can just replace the value with NaN and then call ffill:
In [3]:
df.replace(-1, np.nan).ffill()
Out[3]:
0 1 2
0 1 2.0 3.0
1 4 2.0 3.0
2 5 6.0 3.0
3 7 8.0 3.0
I think you're overthinking this. Note, though, that this fills the pre-existing NaN in the last row as well.
EDIT
If you already have NaN values that should stay NaN, create a boolean mask and update just those elements, using ffill on the inverse of the mask:
In [15]:
df[df == -1] = df[df != -1].ffill()
df
Out[15]:
0 1 2
0 1 2.0 3.0
1 4 2.0 3.0
2 5 6.0 3.0
3 7 8.0 NaN
Another method (thanks to @DSM in the comments) is to use where to do essentially the same thing as above:
In [17]:
df.where(df != -1, df.replace(-1, np.nan).ffill())
Out[17]:
0 1 2
0 1 2.0 3.0
1 4 2.0 3.0
2 5 6.0 3.0
3 7 8.0 NaN
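A close variant of the where answer, sketched with DataFrame.mask under the same setup:
import numpy as np
import pandas as pd
df = pd.DataFrame([[1, 2, 3], [4, -1, -1], [5, 6, -1], [7, 8, np.nan]])
# mask(cond, other) swaps in 'other' where cond is True;
# the pre-existing NaN is untouched because NaN == -1 is False
print(df.mask(df == -1, df.replace(-1, np.nan).ffill()))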
