I am trying to shift data in a pandas DataFrame in the following manner, from this:
time  value
1     1
2     2
3     3
4     4
5     5
1     6
2     7
3     8
4     9
5     10
To this:
time  value
1
2
3     1
4     2
5     3
1
2
3     6
4     7
5     8
In short, I want to shift the values two rows down within each cycle of the time blocks, starting over each time a new cycle begins.
I have not been able to find a solution for this; my English is quite limited, so I don't know how to describe the problem without an example.
Edit:
Both solutions work. Thank you.
IIUC, you can shift per group:
df['value_shift'] = df.groupby(df['time'].eq(1).cumsum())['value'].shift(2)
output:
time value value_shift
0 1 1 NaN
1 2 2 NaN
2 3 3 1.0
3 4 4 2.0
4 5 5 3.0
5 1 6 NaN
6 2 7 NaN
7 3 8 6.0
8 4 9 7.0
9 5 10 8.0
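To see how the grouping key works: `df['time'].eq(1).cumsum()` increments a counter each time `time` returns to 1, so every cycle gets its own label. A minimal sketch with the sample data:

```python
import pandas as pd

df = pd.DataFrame({"time": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
                   "value": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# eq(1) is True at the start of each cycle; cumsum turns that into
# the labels 1, 1, 1, 1, 1, 2, 2, 2, 2, 2
group = df["time"].eq(1).cumsum()

# shift(2) then operates inside each cycle, never across the boundary
df["value_shift"] = df.groupby(group)["value"].shift(2)
```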
Try with groupby:
df["value"] = df.groupby(df["time"].diff().lt(0).cumsum())["value"].shift(2)
>>> df
time value
0 1 NaN
1 2 NaN
2 3 1.0
3 4 2.0
4 5 3.0
5 1 NaN
6 2 NaN
7 3 6.0
8 4 7.0
9 5 8.0
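The key here, `df['time'].diff().lt(0).cumsum()`, starts a new group wherever `time` drops, so unlike the `eq(1)` variant it also works when a cycle does not begin exactly at 1. A sketch with the same sample data:

```python
import pandas as pd

df = pd.DataFrame({"time": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
                   "value": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# diff() is negative exactly where the sequence restarts; lt(0) flags it
# and cumsum labels the cycles 0, 0, 0, 0, 0, 1, 1, 1, 1, 1
cycles = df["time"].diff().lt(0).cumsum()
df["value"] = df.groupby(cycles)["value"].shift(2)
```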
I have a series that looks like the following:
Time Step
0 0
1 1
2 2
3 2
4 2
5 3
6 0
7 1
8 2
9 2
10 2
11 3
I want to use Pandas to perform a conditional rolling count of each block of time that contains step = 2 and output the count to a new column. I have found answers on how to do conditional rolling counts (Pandas: conditional rolling count), but I cannot figure out how to count the sequential runs of each step as a single block. The output should look like this:
Time Step Run_count
0 0
1 1
2 2 RUN1
3 2 RUN1
4 2 RUN1
5 3
6 0
7 1
8 2 RUN2
9 2 RUN2
10 2 RUN2
11 3
Let's try:
s = df.Step.where(df.Step.eq(2))
df['Run_count'] = s.dropna().groupby(s.isna().cumsum()).ngroup()+1
Output:
Time Step Run_count
0 0 0 NaN
1 1 1 NaN
2 2 2 1.0
3 3 2 1.0
4 4 2 1.0
5 5 3 NaN
6 6 0 NaN
7 7 1 NaN
8 8 2 2.0
9 9 2 2.0
10 10 2 2.0
11 11 3 NaN
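Put together as a self-contained sketch (using the sample data above): `s` keeps only the rows where `Step == 2`, and grouping the surviving values by `s.isna().cumsum()` gives each consecutive run its own `ngroup` number, since the cumulative NaN count is constant inside a run:

```python
import pandas as pd

df = pd.DataFrame({"Time": range(12),
                   "Step": [0, 1, 2, 2, 2, 3, 0, 1, 2, 2, 2, 3]})

# Mask everything except Step == 2; the NaN gaps separate the runs
s = df.Step.where(df.Step.eq(2))

# isna().cumsum() is constant within each run of 2s, so ngroup()
# numbers the runs 0, 1, ... ; +1 makes them 1-based
df["Run_count"] = s.dropna().groupby(s.isna().cumsum()).ngroup() + 1
```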
I am trying to calculate the moving average on the following DataFrame, but I have trouble joining the result back to the DataFrame.
The DataFrame is (moving-average values are shown in parentheses):
Key1 Key2 Value MovingAverage
1 2 1 (Nan)
1 7 2 (Nan)
1 8 3 (Nan)
2 5 1 (Nan)
2 3 2 (Nan)
2 2 3 (Nan)
3 7 1 (Nan)
3 5 2 (Nan)
3 8 3 (Nan)
4 7 1 (1.33)
4 2 2 (2)
4 9 3 (Nan)
5 8 1 (2.33)
5 3 2 (Nan)
5 9 3 (Nan)
6 2 1 (2)
6 7 2 (1.33)
6 9 3 (3)
The Code is :
import pandas as pd
d = {'Key1':[1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6], 'Key2':[2,7,8,5,3,2,7,5,8,7,2,9,8,3,9,2,7,9],'Value':[1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3]}
df = pd.DataFrame(d)
print(df)
MaDf = df.groupby(['Key2'])['Value'].rolling(window=3).mean().to_frame('mean')
print (MaDf)
If you run the code, it correctly calculates the moving average based on 'Key2' and 'Value', but I can't find a way to insert the result back into the original DataFrame (df).
Remove the first level of the MultiIndex with Series.reset_index and drop=True, so the result aligns on the second level (the original row index):
df['mean'] = (df.groupby('Key2')['Value']
.rolling(window=3)
.mean()
.reset_index(level=0, drop=True))
print (df)
Key1 Key2 Value mean
0 1 2 1 NaN
1 1 7 2 NaN
2 1 8 3 NaN
3 2 5 1 NaN
4 2 3 2 NaN
5 2 2 3 NaN
6 3 7 1 NaN
7 3 5 2 NaN
8 3 8 3 NaN
9 4 7 1 1.333333
10 4 2 2 2.000000
11 4 9 3 NaN
12 5 8 1 2.333333
13 5 3 2 NaN
14 5 9 3 NaN
15 6 2 1 2.000000
16 6 7 2 1.333333
17 6 9 3 3.000000
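To see why dropping the first level aligns things: the groupby/rolling result carries a MultiIndex of (Key2, original row label), so after `reset_index(level=0, drop=True)` the remaining level matches df's index and the assignment aligns automatically. A sketch with the same data:

```python
import pandas as pd

d = {'Key1': [1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6],
     'Key2': [2,7,8,5,3,2,7,5,8,7,2,9,8,3,9,2,7,9],
     'Value': [1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3]}
df = pd.DataFrame(d)

ma = df.groupby('Key2')['Value'].rolling(window=3).mean()
# ma.index has two levels: level 0 = Key2, level 1 = the original row label
df['mean'] = ma.reset_index(level=0, drop=True)  # now aligned on the row label
```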
If the DataFrame has the default RangeIndex, you can also use Series.sort_index:
df['mean'] = (df.groupby(['Key2'])['Value']
.rolling(window=3)
.mean()
.sort_index(level=1)
.values)
print (df)
Key1 Key2 Value mean
0 1 2 1 NaN
1 1 7 2 NaN
2 1 8 3 NaN
3 2 5 1 NaN
4 2 3 2 NaN
5 2 2 3 NaN
6 3 7 1 NaN
7 3 5 2 NaN
8 3 8 3 NaN
9 4 7 1 1.333333
10 4 2 2 2.000000
11 4 9 3 NaN
12 5 8 1 2.333333
13 5 3 2 NaN
14 5 9 3 NaN
15 6 2 1 2.000000
16 6 7 2 1.333333
17 6 9 3 3.000000
Note that simply assigning df.groupby(['Key2'])['Value'].rolling(window=3).mean().values misaligns the rows, because groupby returns them in group order rather than the original order. A safe one-liner is GroupBy.transform, which returns a result aligned to the original index: df['mean'] = df.groupby('Key2')['Value'].transform(lambda s: s.rolling(window=3).mean())
I have a data frame (sample, not real):
df =
A B C D E F
0 3 4 NaN NaN NaN NaN
1 9 8 NaN NaN NaN NaN
2 5 9 4 7 NaN NaN
3 5 7 6 3 NaN NaN
4 2 6 4 3 NaN NaN
Now I want to fill the NaN values in each row with the previous couple (!) of values in that row, i.e. propagate the last existing pair of numbers to the right, and apply this to the whole dataset.
There are a lot of answers about filling down columns, but in this
case I need to fill along rows.
There are also answers about filling NaN based on another column, but
in my case the number of columns is more than 2000. This is sample data.
Desired output is:
df =
A B C D E F
0 3 4 3 4 3 4
1 9 8 9 8 9 8
2 5 9 4 7 4 7
3 5 7 6 3 6 3
4 2 6 4 3 4 3
IIUC, a quick solution without reshaping the data:
df.iloc[:, ::2] = df.iloc[:, ::2].ffill(axis=1)
df.iloc[:, 1::2] = df.iloc[:, 1::2].ffill(axis=1)
df
Output:
A B C D E F
0 3 4 3 4 3 4
1 9 8 9 8 9 8
2 5 9 4 7 4 7
3 5 7 6 3 6 3
4 2 6 4 3 4 3
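As a self-contained sketch: the even-indexed columns (A, C, E) and the odd-indexed ones (B, D, F) are forward-filled separately along the rows, so each NaN pair inherits the nearest existing couple to its left:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [3, 9, 5, 5, 2],
                   'B': [4, 8, 9, 7, 6],
                   'C': [np.nan, np.nan, 4, 6, 4],
                   'D': [np.nan, np.nan, 7, 3, 3],
                   'E': [np.nan] * 5,
                   'F': [np.nan] * 5})

# Forward-fill the first members of each couple (A, C, E) row-wise,
# then the second members (B, D, F), so the pairs stay together
df.iloc[:, ::2] = df.iloc[:, ::2].ffill(axis=1)
df.iloc[:, 1::2] = df.iloc[:, 1::2].ffill(axis=1)
```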
The idea is to reshape the DataFrame so the missing values can be forward- and back-filled: build a column MultiIndex from integer division and modulo 2 of an array over the column positions, then stack:
c = df.columns
a = np.arange(len(df.columns))
df.columns = [a // 2, a % 2]
# if some pairs may be completely missing, remove .astype(int)
df1 = df.stack().ffill(axis=1).bfill(axis=1).unstack().astype(int)
df1.columns = c
print (df1)
A B C D E F
0 3 4 3 4 3 4
1 9 8 9 8 9 8
2 5 9 4 7 4 7
3 5 7 6 3 6 3
4 2 6 4 3 4 3
Detail:
print (df.stack())
0 1 2
0 0 3 NaN NaN
1 4 NaN NaN
1 0 9 NaN NaN
1 8 NaN NaN
2 0 5 4.0 NaN
1 9 7.0 NaN
3 0 5 6.0 NaN
1 7 3.0 NaN
4 0 2 4.0 NaN
1 6 3.0 NaN
I have these two Series in a DataFrame:
A B
1 2
2 3
2 1
4 3
5 2
and I would like to create a new column df['C'] that counts how many times the value in column df['A'] is higher than the value in column df['B'] within a rolling window over the previous 2 (or more) rows.
The result would be something like this:
A B C
1 2 NaN
2 3 NaN
2 1 0
4 3 1
5 2 2
I would also like to create a column that sums the df['A'] values that are higher than df['B'], again over the rolling window.
With the following result:
A B C D
1 2 NaN NaN
2 3 NaN NaN
2 1 0 0
4 3 1 2
5 2 2 6
Thanks in advance.
IIUC
df.assign(C=df.A.gt(df.B).rolling(2).sum().shift(),
          D=(df.A.gt(df.B)*df.A).rolling(2).sum().shift())
Out[1267]:
A B C D
0 1 2 NaN NaN
1 2 3 NaN NaN
2 2 1 0.0 0.0
3 4 3 1.0 2.0
4 5 2 2.0 6.0
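The same logic unpacked step by step, as a sketch of the one-liner above (assuming a rolling window of 2; the trailing shift() excludes the current row so only the previous rows are counted):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 4, 5], 'B': [2, 3, 1, 3, 2]})

higher = df.A.gt(df.B)                        # True where A > B
# C: how many of the previous 2 rows had A > B
df['C'] = higher.rolling(2).sum().shift()
# D: sum of the A values that beat B, over the same window
df['D'] = (higher * df.A).rolling(2).sum().shift()
```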