Pandas: conditional shift in blocks with reset - python

I am trying to shift data in a Pandas dataframe in the following manner from this:
time  value
   1      1
   2      2
   3      3
   4      4
   5      5
   1      6
   2      7
   3      8
   4      9
   5     10
To this:
time  value
   1    NaN
   2    NaN
   3      1
   4      2
   5      3
   1    NaN
   2    NaN
   3      6
   4      7
   5      8
In short, I want to shift the values down by two rows within each block, restarting the shift whenever a new time cycle begins.
I have not been able to find a solution for this; it seems my English is too limited to describe the problem without an example.
Edit:
Both solutions work. Thank you.

IIUC, you can shift per group:
df['value_shift'] = df.groupby(df['time'].eq(1).cumsum())['value'].shift(2)
Output:
time value value_shift
0 1 1 NaN
1 2 2 NaN
2 3 3 1.0
3 4 4 2.0
4 5 5 3.0
5 1 6 NaN
6 2 7 NaN
7 3 8 6.0
8 4 9 7.0
9 5 10 8.0
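
The grouping key is just a cumulative counter that increases every time a new block starts (i.e. wherever time equals 1). A quick sketch of the intermediate Series, assuming the example frame above:
print(df['time'].eq(1).cumsum())
# -> 1 1 1 1 1 2 2 2 2 2  (one label per block)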

Try with groupby:
df["value"] = df.groupby(df["time"].diff().lt(0).cumsum())["value"].shift(2)
>>> df
time value
0 1 NaN
1 2 NaN
2 3 1.0
3 4 2.0
4 5 3.0
5 1 NaN
6 2 NaN
7 3 6.0
8 4 7.0
9 5 8.0
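
This grouper instead marks a new block wherever time decreases, which also works if blocks do not start exactly at 1. A sketch of the intermediate steps on the example frame:
print(df['time'].diff())                  # -> NaN 1 1 1 1 -4 1 1 1 1
print(df['time'].diff().lt(0))            # -> True only at the 5 -> 1 reset
print(df['time'].diff().lt(0).cumsum())   # -> 0 0 0 0 0 1 1 1 1 1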

Related

How to include Moving Average with Pandas based on Values on other Columns

I am trying to calculate the moving average on the following dataframe, but I have trouble joining the result back to the dataframe.
The dataframe is (moving average values are displayed in parentheses):
Key1 Key2 Value MovingAverage
1    2    1     (NaN)
1    7    2     (NaN)
1    8    3     (NaN)
2    5    1     (NaN)
2    3    2     (NaN)
2    2    3     (NaN)
3    7    1     (NaN)
3    5    2     (NaN)
3    8    3     (NaN)
4    7    1     (1.33)
4    2    2     (2)
4    9    3     (NaN)
5    8    1     (2.33)
5    3    2     (NaN)
5    9    3     (NaN)
6    2    1     (2)
6    7    2     (1.33)
6    9    3     (3)
The code is:
import pandas as pd
d = {'Key1':[1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6], 'Key2':[2,7,8,5,3,2,7,5,8,7,2,9,8,3,9,2,7,9],'Value':[1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3]}
df = pd.DataFrame(d)
print(df)
MaDf = df.groupby(['Key2'])['Value'].rolling(window=3).mean().to_frame('mean')
print (MaDf)
If you run the code, it will correctly calculate the moving average based on 'Key2' and 'Value', but I can't find a way to correctly reinsert it back into the original dataframe (df).
Remove the first level of the MultiIndex with Series.reset_index(level=0, drop=True) so the result aligns by the second level (the original index):
df['mean'] = (df.groupby('Key2')['Value']
                .rolling(window=3)
                .mean()
                .reset_index(level=0, drop=True))
print (df)
Key1 Key2 Value mean
0 1 2 1 NaN
1 1 7 2 NaN
2 1 8 3 NaN
3 2 5 1 NaN
4 2 3 2 NaN
5 2 2 3 NaN
6 3 7 1 NaN
7 3 5 2 NaN
8 3 8 3 NaN
9 4 7 1 1.333333
10 4 2 2 2.000000
11 4 9 3 NaN
12 5 8 1 2.333333
13 5 3 2 NaN
14 5 9 3 NaN
15 6 2 1 2.000000
16 6 7 2 1.333333
17 6 9 3 3.000000
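Detail: the intermediate groupby-rolling result is a Series with a MultiIndex of Key2 and the original row index, which is why the first level must be dropped before assigning back. A sketch of the first entries on the sample data:
print(df.groupby('Key2')['Value'].rolling(window=3).mean().head(8))
Key2
2     0          NaN
      5          NaN
      10    2.000000
      15    2.000000
3     4          NaN
      13         NaN
5     3          NaN
      7          NaN
Name: Value, dtype: float64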
If the DataFrame has the default RangeIndex, it is also possible to restore the original row order with Series.sort_index:
df['mean'] = (df.groupby(['Key2'])['Value']
                .rolling(window=3)
                .mean()
                .sort_index(level=1)
                .values)
print (df)
Key1 Key2 Value mean
0 1 2 1 NaN
1 1 7 2 NaN
2 1 8 3 NaN
3 2 5 1 NaN
4 2 3 2 NaN
5 2 2 3 NaN
6 3 7 1 NaN
7 3 5 2 NaN
8 3 8 3 NaN
9 4 7 1 1.333333
10 4 2 2 2.000000
11 4 9 3 NaN
12 5 8 1 2.333333
13 5 3 2 NaN
14 5 9 3 NaN
15 6 2 1 2.000000
16 6 7 2 1.333333
17 6 9 3 3.000000
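Detail: sort_index(level=1) puts the intermediate Series back into the original row order, so the bare .values line up positionally with df. A sketch of the first entries:
print(df.groupby(['Key2'])['Value'].rolling(window=3).mean().sort_index(level=1).head())
Key2
2  0   NaN
7  1   NaN
8  2   NaN
5  3   NaN
3  4   NaN
Name: Value, dtype: float64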
Simply df['mean'] = df.groupby(['Key2'])['Value'].rolling(window=3).mean().values. Be careful, though: .values comes back in group order rather than the original row order, so without the sort_index(level=1) step above the values can end up misaligned.

How to interpolate in Pandas using only the above/lag values?

I have a dataframe like this
ID Value
1  5
1  6
1  NaN
2  NaN
2  8
2  4
2  NaN
2  10
3  NaN
Expected output:
ID Value
1  5
1  6
1  7
2  NaN
2  8
2  4
2  2
2  10
3  NaN
I want to do something like this (pseudocode):
df.groupby('ID')['Value'].shift(all).interpolate()
Currently I am using this code, but it also takes the rows below into account:
df.groupby('ID')['Value'].interpolate()

For each value in a dataframe column, create values in another column (pandas)

I have a dataframe of many patients and their measurements over six hours, but for some patients not all six hourly values were recorded.
For each subject_id, I want the hour column to contain the values 1 to 6; if an hour already exists, keep its value, otherwise leave the value blank.
Note: I will deal with these blank values using missing-value techniques later.
subject_id hour value
2 1 23
2 3 15
2 5 28
2 6 11
3 4 18
3 6 22
This is the output I want to get:
subject_id hour value
2 1 23
2 2
2 3 15
2 4
2 5 28
2 6 11
3 1
3 2
3 3
3 4 18
3 5
3 6 22
Can anyone help me with how to do that? Any help will be appreciated.
Use DataFrame.reindex with MultiIndex.from_product:
import numpy as np

mux = pd.MultiIndex.from_product([df['subject_id'].unique(), np.arange(1, 7)],
                                 names=['subject_id', 'hour'])
df = df.set_index(['subject_id', 'hour']).reindex(mux).reset_index()
print (df)
subject_id hour value
0 2 1 23.0
1 2 2 NaN
2 2 3 15.0
3 2 4 NaN
4 2 5 28.0
5 2 6 11.0
6 3 1 NaN
7 3 2 NaN
8 3 3 NaN
9 3 4 18.0
10 3 5 NaN
11 3 6 22.0
An alternative is to create all possible combinations with itertools.product and then use DataFrame.merge with a left join:
from itertools import product

df1 = pd.DataFrame(list(product(df['subject_id'].unique(), np.arange(1, 7))),
                   columns=['subject_id', 'hour'])
df = df1.merge(df, how='left')
print (df)
subject_id hour value
0 2 1 23.0
1 2 2 NaN
2 2 3 15.0
3 2 4 NaN
4 2 5 28.0
5 2 6 11.0
6 3 1 NaN
7 3 2 NaN
8 3 3 NaN
9 3 4 18.0
10 3 5 NaN
11 3 6 22.0
EDIT: If you get the error:
cannot handle a non-unique multi-index
it means there are duplicated subject_id and hour pairs:
print (df)
subject_id hour value
0 2 1 23 <- duplicate 2, 1
1 2 1 50 <- duplicate 2, 1
2 2 3 15
3 2 5 28
4 2 6 11
5 3 4 18
6 3 6 22
A possible solution is to aggregate with sum or mean instead of set_index:
mux = pd.MultiIndex.from_product([df['subject_id'].unique(), np.arange(1, 7)],
                                 names=['subject_id', 'hour'])
df1 = df.groupby(['subject_id', 'hour']).sum().reindex(mux).reset_index()
print (df1)
subject_id hour value
0 2 1 73.0
1 2 2 NaN
2 2 3 15.0
3 2 4 NaN
4 2 5 28.0
5 2 6 11.0
6 3 1 NaN
7 3 2 NaN
8 3 3 NaN
9 3 4 18.0
10 3 5 NaN
11 3 6 22.0
Detail:
print (df.groupby(['subject_id','hour']).sum())
value
subject_id hour
2 1 73
3 15
5 28
6 11
3 4 18
6 22
Or remove the duplicates:
mux = pd.MultiIndex.from_product([df['subject_id'].unique(), np.arange(1, 7)],
                                 names=['subject_id', 'hour'])
df1 = (df.drop_duplicates(['subject_id', 'hour'])
         .set_index(['subject_id', 'hour'])
         .reindex(mux)
         .reset_index())
print (df1)
subject_id hour value
0 2 1 23.0
1 2 2 NaN
2 2 3 15.0
3 2 4 NaN
4 2 5 28.0
5 2 6 11.0
6 3 1 NaN
7 3 2 NaN
8 3 3 NaN
9 3 4 18.0
10 3 5 NaN
11 3 6 22.0
Detail:
print (df.drop_duplicates(['subject_id','hour']))
subject_id hour value
0 2 1 23 <- duplicates are removed
2 2 3 15
3 2 5 28
4 2 6 11
5 3 4 18
6 3 6 22

Fill NaN based on previous value of row

I have a data frame (sample, not real):
df =
A B C D E F
0 3 4 NaN NaN NaN NaN
1 9 8 NaN NaN NaN NaN
2 5 9 4 7 NaN NaN
3 5 7 6 3 NaN NaN
4 2 6 4 3 NaN NaN
Now I want to fill the NaN values in each row with the previous couple(!!!) of values, i.e. fill each NaN with the nearest existing pair of numbers to its left, and apply this to the whole dataset.
There are a lot of answers about filling down columns, but in this case I need to fill along rows.
There are also answers about filling NaN based on another column, but my data has more than 2000 columns. This is sample data.
Desired output is:
df =
A B C D E F
0 3 4 3 4 3 4
1 9 8 9 8 9 8
2 5 9 4 7 4 7
3 5 7 6 3 6 3
4 2 6 4 3 4 3
IIUC, a quick solution without reshaping the data:
# forward fill the even-positioned columns (A, C, E) along the rows
df.iloc[:, ::2] = df.iloc[:, ::2].ffill(axis=1)
# forward fill the odd-positioned columns (B, D, F) along the rows
df.iloc[:, 1::2] = df.iloc[:, 1::2].ffill(axis=1)
df
Output:
A B C D E F
0 3 4 3 4 3 4
1 9 8 9 8 9 8
2 5 9 4 7 4 7
3 5 7 6 3 6 3
4 2 6 4 3 4 3
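This works because the even-positioned columns (A, C, E) hold the first member of each pair and the odd-positioned columns (B, D, F) the second, so forward filling each parity slice separately propagates whole pairs to the right. A quick check of what each slice selects:
print(df.iloc[:, ::2].columns.tolist())   # -> ['A', 'C', 'E']
print(df.iloc[:, 1::2].columns.tolist())  # -> ['B', 'D', 'F']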
The idea is to reshape the DataFrame so the missing values can be forward and back filled along rows: build MultiIndex columns from the integer division and the modulo by 2 of the column positions, then stack:
import numpy as np

c = df.columns
a = np.arange(len(df.columns))
df.columns = [a // 2, a % 2]
# if some pairs can be entirely missing, remove .astype(int)
df1 = df.stack().ffill(axis=1).bfill(axis=1).unstack().astype(int)
df1.columns = c
print (df1)
A B C D E F
0 3 4 3 4 3 4
1 9 8 9 8 9 8
2 5 9 4 7 4 7
3 5 7 6 3 6 3
4 2 6 4 3 4 3
Detail:
print (df.stack())
0 1 2
0 0 3 NaN NaN
1 4 NaN NaN
1 0 9 NaN NaN
1 8 NaN NaN
2 0 5 4.0 NaN
1 9 7.0 NaN
3 0 5 6.0 NaN
1 7 3.0 NaN
4 0 2 4.0 NaN
1 6 3.0 NaN

Merging two dataframes with different lengths

How can I merge two pandas dataframes with different lengths, like these:
df1 =
Index  block_id  Ut_rec_0
0      0         7
1      1         10
2      2         2
3      3         0
4      4         10
5      5         3
6      6         6
7      7         9
df2 =
Index  block_id  Ut_rec_1
0      0         3
2      2         5
3      3         5
5      5         9
7      7         4
result =
Index  block_id  Ut_rec_0  Ut_rec_1
0      0         7         3
1      1         10        NaN
2      2         2         5
3      3         0         5
4      4         10        NaN
5      5         3         9
6      6         6         NaN
7      7         9         4
I already tried something like this, but it did not work:
df_result = pd.concat([df1, df2], join_axes=[df1['block_id']])
I also tried:
df_result = pd.concat([df1, df2], axis=1)
But the result was:
Index block_id Ut_rec_0 Index block_id Ut_rec_1
0 0 7 0.0 0.0 3.0
1 1 10 1.0 2.0 5.0
2 2 2 2.0 3.0 5.0
3 3 0 3.0 5.0 9.0
4 4 10 4.0 7.0 4.0
5 5 3 NaN NaN NaN
6 6 6 NaN NaN NaN
7 7 9 NaN NaN NaN
pandas.DataFrame.join can "join" dataframes based on overlap in column data (or index). Something like this will likely work for you:
df1.join(df2.set_index('block_id'), on='block_id')
As @Wen said, you could use concat with axis=1, as in the code below. Note that concat aligns on the index, so this only produces the desired result if both frames share a meaningful index (for example after set_index('block_id')):
pd.concat([df1, df2], axis=1)
You need pd.merge with an outer join:
pd.merge(df1,df2,on=['Index','block_id'],how='outer')
#[out]
#Index block_id Ut_rec_0 Ut_rec_1
#0 0 7 3.0
#1 1 10 NaN
#2 2 2 5.0
#3 3 0 5.0
#4 4 10 NaN
#5 5 3 9.0
#6 6 6 NaN
#7 7 9 4.0
