I am trying to shift data in a pandas DataFrame in the following manner, from this:
time  value
1     1
2     2
3     3
4     4
5     5
1     6
2     7
3     8
4     9
5     10
To this:
time  value
1
2
3     1
4     2
5     3
1
2
3     6
4     7
5     8
In short, I want to shift the values two rows down within each cycle of the time blocks, starting over each time a new cycle begins.
I have not been able to find a solution for this; my English is quite limited, so I don't know how to describe the problem without an example.
Edit:
Both solutions work. Thank you.
IIUC, you can shift per group:
df['value_shift'] = df.groupby(df['time'].eq(1).cumsum())['value'].shift(2)
output:
time value value_shift
0 1 1 NaN
1 2 2 NaN
2 3 3 1.0
3 4 4 2.0
4 5 5 3.0
5 1 6 NaN
6 2 7 NaN
7 3 8 6.0
8 4 9 7.0
9 5 10 8.0
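To see how the grouping key works: `df['time'].eq(1).cumsum()` increments a counter each time `time` returns to 1, so every cycle gets its own label. A minimal sketch with the sample data:

```python
import pandas as pd

df = pd.DataFrame({"time": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
                   "value": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# eq(1) is True at the start of each cycle; cumsum turns that into
# the labels 1, 1, 1, 1, 1, 2, 2, 2, 2, 2
group = df["time"].eq(1).cumsum()

# shift(2) then operates inside each cycle, never across the boundary
df["value_shift"] = df.groupby(group)["value"].shift(2)
```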
Try with groupby:
df["value"] = df.groupby(df["time"].diff().lt(0).cumsum())["value"].shift(2)
>>> df
time value
0 1 NaN
1 2 NaN
2 3 1.0
3 4 2.0
4 5 3.0
5 1 NaN
6 2 NaN
7 3 6.0
8 4 7.0
9 5 8.0
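The key here, `df['time'].diff().lt(0).cumsum()`, starts a new group wherever `time` drops, so unlike the `eq(1)` variant it also works when a cycle does not begin exactly at 1. A sketch with the same sample data:

```python
import pandas as pd

df = pd.DataFrame({"time": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
                   "value": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# diff() is negative exactly where the sequence restarts; lt(0) flags it
# and cumsum labels the cycles 0, 0, 0, 0, 0, 1, 1, 1, 1, 1
cycles = df["time"].diff().lt(0).cumsum()
df["value"] = df.groupby(cycles)["value"].shift(2)
```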
I have a series that looks like the following:
Time Step
0 0
1 1
2 2
3 2
4 2
5 3
6 0
7 1
8 2
9 2
10 2
11 3
I want to use Pandas to perform a conditional rolling count of each block of time that contains step = 2 and output the count to a new column. I have found answers on how to do conditional rolling counts (Pandas: conditional rolling count), but I cannot figure out how to count the sequential runs of each step as a single block. The output should look like this:
Time Step Run_count
0 0
1 1
2 2 RUN1
3 2 RUN1
4 2 RUN1
5 3
6 0
7 1
8 2 RUN2
9 2 RUN2
10 2 RUN2
11 3
Let's try:
s = df.Step.where(df.Step.eq(2))
df['Run_count'] = s.dropna().groupby(s.isna().cumsum()).ngroup()+1
Output:
Time Step Run_count
0 0 0 NaN
1 1 1 NaN
2 2 2 1.0
3 3 2 1.0
4 4 2 1.0
5 5 3 NaN
6 6 0 NaN
7 7 1 NaN
8 8 2 2.0
9 9 2 2.0
10 10 2 2.0
11 11 3 NaN
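Put together as a self-contained sketch (using the sample data above): `s` keeps only the rows where `Step == 2`, and grouping the surviving values by `s.isna().cumsum()` gives each consecutive run its own `ngroup` number, since the cumulative NaN count is constant inside a run:

```python
import pandas as pd

df = pd.DataFrame({"Time": range(12),
                   "Step": [0, 1, 2, 2, 2, 3, 0, 1, 2, 2, 2, 3]})

# Mask everything except Step == 2; the NaN gaps separate the runs
s = df.Step.where(df.Step.eq(2))

# isna().cumsum() is constant within each run of 2s, so ngroup()
# numbers the runs 0, 1, ... ; +1 makes them 1-based
df["Run_count"] = s.dropna().groupby(s.isna().cumsum()).ngroup() + 1
```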
I am trying to calculate the moving average on the following DataFrame, but I have trouble joining the result back to the DataFrame.
The DataFrame is (moving-average values are shown in parentheses):
Key1 Key2 Value MovingAverage
1 2 1 (Nan)
1 7 2 (Nan)
1 8 3 (Nan)
2 5 1 (Nan)
2 3 2 (Nan)
2 2 3 (Nan)
3 7 1 (Nan)
3 5 2 (Nan)
3 8 3 (Nan)
4 7 1 (1.33)
4 2 2 (2)
4 9 3 (Nan)
5 8 1 (2.33)
5 3 2 (Nan)
5 9 3 (Nan)
6 2 1 (2)
6 7 2 (1.33)
6 9 3 (3)
The Code is :
import pandas as pd
d = {'Key1':[1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6], 'Key2':[2,7,8,5,3,2,7,5,8,7,2,9,8,3,9,2,7,9],'Value':[1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3]}
df = pd.DataFrame(d)
print(df)
MaDf = df.groupby(['Key2'])['Value'].rolling(window=3).mean().to_frame('mean')
print (MaDf)
If you run the code, it correctly calculates the moving average based on 'Key2' and 'Value', but I can't find a way to insert the result back into the original DataFrame (df).
Remove the first level of the MultiIndex with Series.reset_index and drop=True, so the result aligns on the second level (the original row index):
df['mean'] = (df.groupby('Key2')['Value']
.rolling(window=3)
.mean()
.reset_index(level=0, drop=True))
print (df)
Key1 Key2 Value mean
0 1 2 1 NaN
1 1 7 2 NaN
2 1 8 3 NaN
3 2 5 1 NaN
4 2 3 2 NaN
5 2 2 3 NaN
6 3 7 1 NaN
7 3 5 2 NaN
8 3 8 3 NaN
9 4 7 1 1.333333
10 4 2 2 2.000000
11 4 9 3 NaN
12 5 8 1 2.333333
13 5 3 2 NaN
14 5 9 3 NaN
15 6 2 1 2.000000
16 6 7 2 1.333333
17 6 9 3 3.000000
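To see why dropping the first level aligns things: the groupby/rolling result carries a MultiIndex of (Key2, original row label), so after `reset_index(level=0, drop=True)` the remaining level matches df's index and the assignment aligns automatically. A sketch with the same data:

```python
import pandas as pd

d = {'Key1': [1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6],
     'Key2': [2,7,8,5,3,2,7,5,8,7,2,9,8,3,9,2,7,9],
     'Value': [1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3]}
df = pd.DataFrame(d)

ma = df.groupby('Key2')['Value'].rolling(window=3).mean()
# ma.index has two levels: level 0 = Key2, level 1 = the original row label
df['mean'] = ma.reset_index(level=0, drop=True)  # now aligned on the row label
```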
If the DataFrame has the default RangeIndex, you can also use Series.sort_index:
df['mean'] = (df.groupby(['Key2'])['Value']
.rolling(window=3)
.mean()
.sort_index(level=1)
.values)
print (df)
Key1 Key2 Value mean
0 1 2 1 NaN
1 1 7 2 NaN
2 1 8 3 NaN
3 2 5 1 NaN
4 2 3 2 NaN
5 2 2 3 NaN
6 3 7 1 NaN
7 3 5 2 NaN
8 3 8 3 NaN
9 4 7 1 1.333333
10 4 2 2 2.000000
11 4 9 3 NaN
12 5 8 1 2.333333
13 5 3 2 NaN
14 5 9 3 NaN
15 6 2 1 2.000000
16 6 7 2 1.333333
17 6 9 3 3.000000
Note that simply assigning df.groupby(['Key2'])['Value'].rolling(window=3).mean().values misaligns the rows, because groupby returns them in group order rather than the original order. A safe one-liner is GroupBy.transform, which returns a result aligned to the original index: df['mean'] = df.groupby('Key2')['Value'].transform(lambda s: s.rolling(window=3).mean())
I have a data frame (sample, not real):
df =
A B C D E F
0 3 4 NaN NaN NaN NaN
1 9 8 NaN NaN NaN NaN
2 5 9 4 7 NaN NaN
3 5 7 6 3 NaN NaN
4 2 6 4 3 NaN NaN
Now I want to fill the NaN values in each row with the previous couple (!) of values in that row, i.e. propagate the last existing pair of numbers to the right, and apply this to the whole dataset.
There are a lot of answers about filling down columns, but in this
case I need to fill along rows.
There are also answers about filling NaN based on another column, but
in my case the number of columns is more than 2000. This is sample data.
Desired output is:
df =
A B C D E F
0 3 4 3 4 3 4
1 9 8 9 8 9 8
2 5 9 4 7 4 7
3 5 7 6 3 6 3
4 2 6 4 3 4 3
IIUC, a quick solution without reshaping the data:
df.iloc[:, ::2] = df.iloc[:, ::2].ffill(axis=1)
df.iloc[:, 1::2] = df.iloc[:, 1::2].ffill(axis=1)
df
Output:
A B C D E F
0 3 4 3 4 3 4
1 9 8 9 8 9 8
2 5 9 4 7 4 7
3 5 7 6 3 6 3
4 2 6 4 3 4 3
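As a self-contained sketch: the even-indexed columns (A, C, E) and the odd-indexed ones (B, D, F) are forward-filled separately along the rows, so each NaN pair inherits the nearest existing couple to its left:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [3, 9, 5, 5, 2],
                   'B': [4, 8, 9, 7, 6],
                   'C': [np.nan, np.nan, 4, 6, 4],
                   'D': [np.nan, np.nan, 7, 3, 3],
                   'E': [np.nan] * 5,
                   'F': [np.nan] * 5})

# Forward-fill the first members of each couple (A, C, E) row-wise,
# then the second members (B, D, F), so the pairs stay together
df.iloc[:, ::2] = df.iloc[:, ::2].ffill(axis=1)
df.iloc[:, 1::2] = df.iloc[:, 1::2].ffill(axis=1)
```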
The idea is to reshape the DataFrame so the missing values can be forward- and back-filled: build a column MultiIndex from integer division and modulo 2 of an array over the column positions, then stack:
c = df.columns
a = np.arange(len(df.columns))
df.columns = [a // 2, a % 2]
# if some pairs may be completely missing, remove .astype(int)
df1 = df.stack().ffill(axis=1).bfill(axis=1).unstack().astype(int)
df1.columns = c
print (df1)
A B C D E F
0 3 4 3 4 3 4
1 9 8 9 8 9 8
2 5 9 4 7 4 7
3 5 7 6 3 6 3
4 2 6 4 3 4 3
Detail:
print (df.stack())
0 1 2
0 0 3 NaN NaN
1 4 NaN NaN
1 0 9 NaN NaN
1 8 NaN NaN
2 0 5 4.0 NaN
1 9 7.0 NaN
3 0 5 6.0 NaN
1 7 3.0 NaN
4 0 2 4.0 NaN
1 6 3.0 NaN
I have these two Series in a DataFrame:
A B
1 2
2 3
2 1
4 3
5 2
and I would like to create a new column df['C'] that counts how many times the value in column df['A'] is higher than the value in column df['B'] within a rolling window over the previous 2 (or more) rows.
The result would be something like this:
A B C
1 2 NaN
2 3 NaN
2 1 0
4 3 1
5 2 2
I would also like to create a column that sums the df['A'] values that are higher than df['B'], again over the rolling window.
With the following result:
A B C D
1 2 NaN NaN
2 3 NaN NaN
2 1 0 0
4 3 1 2
5 2 2 6
Thanks in advance.
IIUC
df.assign(C=df.A.gt(df.B).rolling(2).sum().shift(),
          D=(df.A.gt(df.B)*df.A).rolling(2).sum().shift())
Out[1267]:
A B C D
0 1 2 NaN NaN
1 2 3 NaN NaN
2 2 1 0.0 0.0
3 4 3 1.0 2.0
4 5 2 2.0 6.0
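The same logic unpacked step by step, as a sketch of the one-liner above (assuming a rolling window of 2; the trailing shift() excludes the current row so only the previous rows are counted):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 4, 5], 'B': [2, 3, 1, 3, 2]})

higher = df.A.gt(df.B)                        # True where A > B
# C: how many of the previous 2 rows had A > B
df['C'] = higher.rolling(2).sum().shift()
# D: sum of the A values that beat B, over the same window
df['D'] = (higher * df.A).rolling(2).sum().shift()
```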