I need to get the rolling 2nd largest value of a df.
To get the largest value I do
max = df.sort_index(ascending=True).rolling(10).max()
But when I try the same pattern for the second largest value, pandas throws an error:
max = df.sort_index(ascending=True).rolling(10).nlargest(2)
AttributeError: 'Rolling' object has no attribute 'nlargest'
Is this a bug? What else can I use that is performant?
I'd do something like this:
df.rolling(10).apply(lambda x: pd.Series(x).nlargest(2).iloc[-1])
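For context, a minimal self-contained sketch of that idea (using a window of 3 on a toy Series so the result is easy to verify; the window size is just a placeholder):
import pandas as pd

s = pd.Series([8, 18, 5, 15, 12, 10])
# nlargest(2).iloc[-1] picks the 2nd largest value inside each window of 3
second = s.rolling(3).apply(lambda x: pd.Series(x).nlargest(2).iloc[-1])
print(second)
# 0     NaN
# 1     NaN
# 2     8.0
# 3    15.0
# 4    12.0
# 5    12.0
# dtype: float64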
Use np.sort in descending order and select the second value:
import numpy as np
import pandas as pd

np.random.seed(2019)
df = pd.DataFrame({
    'B': np.random.randint(20, size=15)
})
print (df)
B
0 8
1 18
2 5
3 15
4 12
5 10
6 16
7 16
8 7
9 5
10 19
11 12
12 16
13 18
14 5
a = df.rolling(10).apply(lambda x: -np.sort(-x)[1])
#alternative
#a = df.rolling(10).apply(lambda x: np.sort(x)[-2])
print (a)
B
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 16.0
10 18.0
11 16.0
12 16.0
13 18.0
14 18.0
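Since the question also asks about performance: rolling.apply calls a Python function once per window, so it will never be as fast as a built-in like rolling.max. One thing that usually helps (assuming a pandas version new enough to support the raw= argument, 0.23+) is raw=True, which passes each window as a plain NumPy array instead of building a Series per window. Reusing the df built above:
# pass raw ndarrays to the lambda instead of per-window Series
a = df.rolling(10).apply(lambda x: np.sort(x)[-2], raw=True)
On newer pandas with numba installed, engine='numba' (together with raw=True) can speed this up further.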
I am trying to add rows to my dataframes based on a condition, but I am having difficulty achieving this.
Currently, I have pandas dataframes in a list (see the toy data below).
The objective is to add rows that keep a fixed 'ID', increase 'month' in steps of 3, and fill 'num' with NaN for the new months.
For example, for this[1] I want it to add rows so that it looks like the following:
ID | month | num
6 | 0 | 5
6 | 3 | NaN
6 | 6 | 4
6 | 9 | NaN
6 | 12 | 3
...
6 | 36 | 1
I am trying to create a function that takes a dataframe from the list, the maximum month of that dataframe, and the increment I want for the month (3). It would look something like:
def add_rows(df, max_mon, res):
    if max_mon > res:
        # add rows with the fixed ID and NaN num,
        # skipping the months that already exist
        ...

final = []
for i in range(len(this)):
    final.append(add_rows(this[i], this[i]['month'].max(), 3))
I have tried inserting rows, but I did not manage to get it to work.
The toy data:
d = {'ID':[5,5,5,5,5], 'month':[0,6,12,24,36], 'num':[5,4,3,2,1]}
tempo = pd.DataFrame(data = d)
d2 = {'ID':[6,6,6,6,6], 'month':[0,6,12,18,36], 'num':[5,4,3,2,1]}
tempo2 = pd.DataFrame(data = d2)
this = []
this.append(tempo)
this.append(tempo2)
I would really appreciate it if I could get help building this function!
You can use:
for i, df in enumerate(this):
this[i] = (df
.set_index('month')
.groupby('ID')
.apply(lambda x: x.drop(columns='ID')
.reindex(range(x.index.min(), x.index.max()+3, 3))
)
.reset_index()[df.columns]
)
Updated this:
[ ID month num
0 5 0 5.0
1 5 3 NaN
2 5 6 4.0
3 5 9 NaN
4 5 12 3.0
5 5 15 NaN
6 5 18 NaN
7 5 21 NaN
8 5 24 2.0
9 5 27 NaN
10 5 30 NaN
11 5 33 NaN
12 5 36 1.0,
ID month num
0 6 0 5.0
1 6 3 NaN
2 6 6 4.0
3 6 9 NaN
4 6 12 3.0
5 6 15 NaN
6 6 18 2.0
7 6 21 NaN
8 6 24 NaN
9 6 27 NaN
10 6 30 NaN
11 6 33 NaN
12 6 36 1.0]
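If you prefer a function with the signature you sketched, a minimal alternative sketch (assuming each dataframe in this holds a single ID, as in the toy data; step is just a placeholder name for the 3-month increment) is to build the full month grid and left-merge the original rows onto it:
import numpy as np
import pandas as pd

def add_rows(df, step=3):
    # full month range for this ID, in increments of `step`
    months = np.arange(df['month'].min(), df['month'].max() + step, step)
    grid = pd.DataFrame({'ID': df['ID'].iloc[0], 'month': months})
    # existing months keep their num, new months get NaN
    return grid.merge(df, on=['ID', 'month'], how='left')

this = [add_rows(df) for df in this]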
I have a pd.DataFrame df with one column, say:
A = [1,2,3,4,5,6,7,8,2,4]
df = pd.DataFrame(A,columns = ['A'])
For each row, I want to take the previous 2 values, the current value, and the next 2 values (a window of 5), take the sum, and store it in a new column. Desired output:
A A_sum
1 6
2 10
3 15
4 20
5 25
6 30
7 28
8 27
2 21
4 14
I have tried:
df['A_sum'] = df['A'].rolling(2).sum()
I also tried shift, but that only looks either forward or backward; I'm looking for a combination of both.
Use a rolling window of 5 and pass center=True and min_periods=1 to Series.rolling:
df['A_sum'] = df['A'].rolling(5, center=True, min_periods=1).sum()
print (df)
A A_sum
0 1 6.0
1 2 10.0
2 3 15.0
3 4 20.0
4 5 25.0
5 6 30.0
6 7 28.0
7 8 27.0
8 2 21.0
9 4 14.0
If you are allowed to use numpy, then you might use numpy.convolve to get the desired output:
import numpy as np
import pandas as pd
A = [1,2,3,4,5,6,7,8,2,4]
B = np.convolve(A,[1,1,1,1,1], 'same')
df = pd.DataFrame({"A":A,"A_sum":B})
print(df)
output
A A_sum
0 1 6
1 2 10
2 3 15
3 4 20
4 5 25
5 6 30
6 7 28
7 8 27
8 2 21
9 4 14
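As a side note, 'same' mode zero-pads both ends of the array, so the edge sums only include the values that actually exist; that is why this matches the min_periods=1 behaviour of the rolling solution above. The only visible difference is that convolve keeps the integer dtype while rolling returns floats.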
You can use shift for this (straightforward if not elegant):
df["A_sum"] = df.A + df.A.shift(2).fillna(0) + df.A.shift(1).fillna(0) + df.A.shift(-1).fillna(0) + df.A.shift(-2).fillna(0)
output:
   A  A_sum
0  1    6.0
1  2   10.0
2  3   15.0
3  4   20.0
4  5   25.0
5  6   30.0
6  7   28.0
7  8   27.0
8  2   21.0
9  4   14.0
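If the window size might change, the same idea can be written once as a sum over shifts (a small generalisation of the line above, not a different method):
# centered window of 5: offsets -2..2 relative to each row
df["A_sum"] = sum(df["A"].shift(k).fillna(0) for k in range(-2, 3))
which reproduces the rolling(5, center=True, min_periods=1) result.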
I am trying to get a rolling sum of the past 3 rows for the same ID, but lagging this by 1 row. My attempt looks like the code below, where i is the column. There has to be a way to do this, but this method doesn't seem to work.
for i in df.columns.values:
df.groupby('Id', group_keys=False)[i].rolling(window=3, min_periods=2).mean().shift(1)
id dollars lag
1 6 nan
1 7 nan
1 6 6.5
3 7 nan
3 4 nan
3 4 5.5
3 3 5
5 6 nan
5 5 nan
5 6 5.5
5 12 5.67
5 7 8.3
I am trying to get a rolling sum of the past 3 rows for the same ID but lagging this by 1 row.
You can create the lagged rolling sum by chaining DataFrame.groupby(ID), .shift(1) for the lag 1, .rolling(3) for the window 3, and .sum() for the sum.
Example: Let's say your dataset is:
import pandas as pd
# Reproducible datasets are your friend!
d = pd.DataFrame({'grp':pd.Series(['A']*4 + ['B']*5 + ['C']*6),
'x':pd.Series(range(15))})
print(d)
grp x
A 0
A 1
A 2
A 3
B 4
B 5
B 6
B 7
B 8
C 9
C 10
C 11
C 12
C 13
C 14
I think what you're asking for is this:
d['y'] = d.groupby('grp')['x'].shift(1).rolling(3).sum()
print(d)
grp x y
A 0 NaN
A 1 NaN
A 2 NaN
A 3 3.0
B 4 NaN
B 5 NaN
B 6 NaN
B 7 15.0
B 8 18.0
C 9 NaN
C 10 NaN
C 11 NaN
C 12 30.0
C 13 33.0
C 14 36.0
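One subtlety worth knowing: in the chained version only the shift is group-aware; the .rolling(3).sum() afterwards runs over the whole shifted Series. It still gives per-group results here because shift(1) puts a NaN at the start of every group, which turns any window crossing a group boundary into NaN. If you add min_periods or change the lag, it is safer to keep both steps inside the groupby, for example with transform:
# both the lag and the rolling window stay within each group
d['y'] = d.groupby('grp')['x'].transform(lambda s: s.shift(1).rolling(3).sum())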
I have searched the forums for a cleaner way to create a new column in a dataframe that is the sum of each row with the previous row, the opposite of the .diff() function, which takes the difference.
This is how I'm currently solving the problem:
df = pd.DataFrame({'c': ['dd', 'ee', 'ff', 'gg', 'hh'], 'd': [1, 2, 3, 4, 5]})
df['e']= df['d'].shift(-1)
df['f'] = df['d'] + df['e']
Your ideas are appreciated.
You can use rolling with a window size of 2 and sum:
df['f'] = df['d'].rolling(2).sum().shift(-1)
c d f
0 dd 1 3.0
1 ee 2 5.0
2 ff 3 7.0
3 gg 4 9.0
4 hh 5 NaN
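Note that the .shift(-1) here (like the shift(-1) in the question) pairs each row with the next one. If you literally want the sum of each row with the previous row, the inverse of .diff(), just drop the shift:
df['f'] = df['d'].rolling(2).sum()   # first value NaN, then d[i] + d[i-1]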
df.cumsum()
Example:
data = {'a':[1,6,3,9,5], 'b':[13,1,2,5,23]}
df = pd.DataFrame(data)
df =
a b
0 1 13
1 6 1
2 3 2
3 9 5
4 5 23
df.diff()
a b
0 NaN NaN
1 5.0 -12.0
2 -3.0 1.0
3 6.0 3.0
4 -4.0 18.0
df.cumsum()
a b
0 1 13
1 7 14
2 10 16
3 19 21
4 24 44
If you cannot use rolling (due to a MultiIndex or otherwise), you can try using .cumsum() and then .diff(2) to subtract the .cumsum() result from two positions before.
data = {'a':[1,6,3,9,5,30, 101, 8]}
df = pd.DataFrame(data)
df['opp_diff'] = df['a'].cumsum().diff(2)
a opp_diff
0 1 NaN
1 6 NaN
2 3 9.0
3 9 12.0
4 5 14.0
5 30 35.0
6 101 131.0
7 8 109.0
Generally, .cumsum().diff(n) gives you the sum of n consecutive values, so it plays the role of an inverse of .diff(). The catch is that the first n results come out as NaN, one more than with rolling(n).sum().
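A quick check of that claim on the same data (the only difference is the extra leading NaN from diff):
import pandas as pd

s = pd.Series([1, 6, 3, 9, 5, 30, 101, 8])
print(s.rolling(2).sum().tolist())   # [nan, 7.0, 9.0, 12.0, 14.0, 35.0, 131.0, 109.0]
print(s.cumsum().diff(2).tolist())   # [nan, nan, 9.0, 12.0, 14.0, 35.0, 131.0, 109.0]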
I have a sessions dataframe that contains E-mail and Sessions (int) columns.
I need to calculate rolling sum of sessions per email (i.e. not globally).
Now, the following works, but it's painfully slow:
emails = set(list(sessions['E-mail']))
ses_sums = []
for em in emails:
email_sessions = sessions[sessions['E-mail'] == em]
email_sessions.is_copy = False
email_sessions['Session_Rolling_Sum'] = pd.rolling_sum(email_sessions['Sessions'], window=self.window).fillna(0)
ses_sums.append(email_sessions)
df = pd.concat(ses_sums, ignore_index=True)
Is there a way of achieving the same in pandas, but using pandas operators on a dataframe instead of creating separate dataframes for each email and then concatenating them?
(either that or some other way of making this faster)
Setup
np.random.seed([3,1415])
df = pd.DataFrame({'E-Mail': np.random.choice(list('AB'), 20),
'Session': np.random.randint(1, 10, 20)})
Solution
The current and proper way to do this is with rolling.sum, which can be used on the result of a pd.Series groupby object.
# Series Group By
# /------------------------\
df.groupby('E-Mail').Session.rolling(3).sum()
# \--------------/
# Method you want
E-Mail
A 0 NaN
2 NaN
4 11.0
5 7.0
7 10.0
12 16.0
15 16.0
17 16.0
18 17.0
19 18.0
B 1 NaN
3 NaN
6 18.0
8 14.0
9 16.0
10 12.0
11 13.0
13 16.0
14 20.0
16 22.0
Name: Session, dtype: float64
Details
df
E-Mail Session
0 A 9
1 B 7
2 A 1
3 B 3
4 A 1
5 A 5
6 B 8
7 A 4
8 B 3
9 B 5
10 B 4
11 B 4
12 A 7
13 B 8
14 B 8
15 A 5
16 B 6
17 A 4
18 A 8
19 A 6
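If you want this back on the original frame as the Session_Rolling_Sum column from the question, note that the grouped result carries E-Mail as an extra index level, so drop that level before assigning (a sketch, assuming df keeps its default RangeIndex):
roll = df.groupby('E-Mail').Session.rolling(3).sum()
df['Session_Rolling_Sum'] = roll.reset_index(level=0, drop=True)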
Say you start with
In [58]: df = pd.DataFrame({'E-Mail': ['foo'] * 3 + ['bar'] * 3 + ['foo'] * 3, 'Session': range(9)})
In [59]: df
Out[59]:
E-Mail Session
0 foo 0
1 foo 1
2 foo 2
3 bar 3
4 bar 4
5 bar 5
6 foo 6
7 foo 7
8 foo 8
In [60]: df[['Session']].groupby(df['E-Mail']).apply(pd.rolling_sum, 3)
Out[60]:
Session
E-Mail
bar 3 NaN
4 NaN
5 12.0
foo 0 NaN
1 NaN
2 3.0
6 9.0
7 15.0
8 21.0
Incidentally, note that I just rearranged your rolling_sum, but it has since been deprecated; you should now use rolling:
df[['Session']].groupby(df['E-Mail']).apply(lambda g: g.rolling(3).sum())