I'm using Python and need a cumulative sum of the Value column that keeps running until the boolean column changes from True to False: the True row closes the running total and the next row starts a new one. How can I do this?
Bool Value Expected_cumsum
0 False 1 1
1 False 2 3
2 False 4 7
3 True 1 8
4 False 3 3 << reset from here
5 False 5 8
6 True 2 10
....
Thanks, all!
You can try this
a = df.Bool.eq(True).cumsum().shift().fillna(0)
df['Expected_cumsum']= df.groupby(a)['Value'].cumsum()
df
Output
Bool Value Expected_cumsum
0 False 1 1
1 False 2 3
2 False 4 7
3 True 1 8
4 False 3 3
5 False 5 8
6 True 2 10
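To see why this works, it can help to print the intermediate grouping key. The sketch below rebuilds the sample frame (how the data was loaded is an assumption) and uses shift(fill_value=0) instead of shift().fillna(0), which keeps the key as integers:

```python
import pandas as pd

df = pd.DataFrame({
    'Bool':  [False, False, False, True, False, False, True],
    'Value': [1, 2, 4, 1, 3, 5, 2],
})

# Count the True rows seen so far, then shift down one row so that each
# True row still belongs to the block it closes.
key = df['Bool'].cumsum().shift(fill_value=0)
print(key.tolist())  # [0, 0, 0, 0, 1, 1, 1]

# Cumulative sum of Value within each block.
df['Expected_cumsum'] = df.groupby(key)['Value'].cumsum()
print(df['Expected_cumsum'].tolist())  # [1, 3, 7, 8, 3, 8, 10]
```

Every row with the same key value is summed together, so the total restarts on the row right after each True.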
Related
I have the following table format:
id
bool
1
true
2
true
3
false
4
false
5
false
6
true
I'd like to get another column holding, for each row, the id of the most recent true occurrence in the bool column. If the row itself is true, it should return its own id. It doesn't sound too hard with a for loop, but I want a clean pandas solution. In this example I would get:
column = [1,2,2,2,2,6]
IIUC, you can mask and ffill:
df['new'] = df['id'].where(df['bool']).ffill(downcast='infer')
output:
id bool new
0 1 True 1
1 2 True 2
2 3 False 2
3 4 False 2
4 5 False 2
5 6 True 6
In your case do
df['new'] = df['id'].mul(df['bool']).cummax()
Out[344]:
0 1
1 2
2 2
3 2
4 2
5 6
dtype: int64
df1.assign(col1=np.where(df1.bool2,df1.id,pd.NA)).ffill()
id bool2 col1
0 1 True 1
1 2 True 2
2 3 False 2
3 4 False 2
4 5 False 2
5 6 True 6
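One thing worth checking before picking between the answers above: the mul/cummax approach returns the largest id among the true rows seen so far, which equals the last true id only when the ids are non-negative and increasing. A small sketch of the difference follows; the function names and the second frame with shuffled ids are made up for illustration:

```python
import pandas as pd

def last_true_id_ffill(df):
    # Keep id only on True rows, carry it forward, and go back to integers.
    return df['id'].where(df['bool']).ffill().astype('Int64')

def last_true_id_cummax(df):
    # Zero out False rows and take the running maximum.
    return df['id'].mul(df['bool']).cummax()

sample = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6],
                       'bool': [True, True, False, False, False, True]})
shuffled = pd.DataFrame({'id': [5, 3, 9, 1],
                         'bool': [True, True, False, True]})

print(last_true_id_ffill(sample).tolist())     # [1, 2, 2, 2, 2, 6]
print(last_true_id_cummax(sample).tolist())    # [1, 2, 2, 2, 2, 6]
print(last_true_id_ffill(shuffled).tolist())   # [5, 3, 3, 1]
print(last_true_id_cummax(shuffled).tolist())  # [5, 5, 5, 5]
```

For the table in the question both give the same column. If the ids are not sorted, the where/ffill form is the one that follows the "last true occurrence" wording.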
In a Pandas Dataframe I have 1 series A
Index A
0 2
1 1
2 6
3 3
4 2
5 7
6 1
7 3
8 2
9 1
10 3
I would like to check if between every 1 of column A there is a number 2 and to write in column B the results like this:
Index A B
0 2 FALSE
1 1 FALSE
2 6 FALSE
3 3 FALSE
4 2 TRUE
5 7 FALSE
6 1 FALSE
7 3 FALSE
8 2 TRUE
9 1 FALSE
10 3 FALSE
I thought of using rolling(), but can rolling work with a variable-sized window (ranges of values)?
mask = (df["A"] == 1) | (df["A"] == 2)
df = df.assign(B=df.loc[mask, "A"].sub(df.loc[mask, "A"].shift()).eq(1)).fillna(False)
>>> df
A B
0 2 False
1 1 False
2 6 False
3 3 False
4 2 True
5 7 False
6 1 False
7 3 False
8 2 True
9 1 False
10 3 False
IIUC use np.where to assign 2s in 'A' to True after the first 1 is found:
df['B'] = np.where(df['A'].eq(1).cumsum().ge(1) & df['A'].eq(2), True, False)
df:
A B
0 2 False
1 1 False
2 6 False
3 3 False
4 2 True
5 7 False
6 1 False
7 3 False
8 2 True
9 1 False
10 3 False
Imports and DataFrame used:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [2, 1, 6, 3, 2, 7, 1, 3, 2, 1, 3]})
Here is a very basic and straightforward way to do it:
import pandas as pd
df= pd.DataFrame({"A":[2,1,6,3,2,7,1,3,2,1,3]})
for i, cell in enumerate(df["A"]):
    # True only for a 2 that has a 1 somewhere before it and a 1 somewhere after it
    if (1 in list(df.loc[:i,"A"])) and (1 in list(df.loc[i:,"A"])) and cell==2:
        df.at[i,"B"] = True
    else:
        df.at[i,"B"] = False
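The loop rescans the column on every row, which gets slow on long frames. A vectorised sketch of the same condition (a 2 with a 1 somewhere before it and a 1 somewhere after it) could look like this; the helper name twos_between_ones is made up for illustration:

```python
import pandas as pd

def twos_between_ones(a: pd.Series) -> pd.Series:
    is_one = a.eq(1)
    # True from the first 1 onwards.
    seen_before = is_one.cumsum().ge(1)
    # True up to and including the last 1 (cumulative sum from the bottom up).
    seen_after = is_one[::-1].cumsum()[::-1].ge(1)
    return a.eq(2) & seen_before & seen_after

df = pd.DataFrame({'A': [2, 1, 6, 3, 2, 7, 1, 3, 2, 1, 3]})
df['B'] = twos_between_ones(df['A'])
print(df.index[df['B']].tolist())  # [4, 8]
```

Unlike the cumsum-only check, a trailing 2 with no 1 after it stays False here.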
I am a newbie in Python; please assist. I have a huge DataFrame consisting of thousands of rows. An example of the df is shown below.
STATE VOLUME
INDEX
1 on 10
2 on 15
3 on 10
4 off 20
5 off 30
6 on 15
7 on 20
8 off 10
9 off 30
10 off 10
11 on 20
12 off 25
I want to index this data based on the 'STATE' column such that the first batch of 'on' and 'off' rows registers as index 1, the next batch of 'on' and 'off' rows registers as index 2, and so on. I want to be able to select a whole batch by selecting the rows with, say, index 1.
STATE VOLUME
INDEX
1 on 10
1 on 15
1 on 10
1 off 20
1 off 30
2 on 15
2 on 20
2 off 10
2 off 30
2 off 10
3 on 20
3 off 25
You can do this with pd.Series.eq and pd.Series.shift, then take the cumulative sum with pd.Series.cumsum:
df.index = (df['STATE'].eq('off').shift() & df['STATE'].eq('on')).cumsum() + 1
df.index.name = 'INDEX'
STATE VOLUME
INDEX
1 on 10
1 on 15
1 on 10
1 off 20
1 off 30
2 on 15
2 on 20
2 off 10
2 off 30
2 off 10
3 on 20
3 off 25
Details
The idea is to find where an off is followed by an on.
# (df['STATE'].eq('off').shift() & df['STATE'].eq('on')).cumsum() + 1
eq(off).shift eq(on) eq(off).shift & eq(on)
INDEX
1 NaN True False
2 False True False
3 False True False
4 False False False
5 True False False
6 True True True
7 False True False
8 False False False
9 True False False
10 True False False
11 True True True
12 False False False
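Once the index is relabelled this way, a batch can be pulled out with .loc, which is what the question asks for. A minimal sketch, assuming the sample data is built as below and using shift(fill_value=False) so the shifted mask stays boolean instead of starting with NaN:

```python
import pandas as pd

df = pd.DataFrame({
    'STATE': ['on', 'on', 'on', 'off', 'off', 'on', 'on',
              'off', 'off', 'off', 'on', 'off'],
    'VOLUME': [10, 15, 10, 20, 30, 15, 20, 10, 30, 10, 20, 25],
})

# A new batch starts wherever the previous row is 'off' and this row is 'on'.
starts = df['STATE'].eq('off').shift(fill_value=False) & df['STATE'].eq('on')
df.index = starts.cumsum() + 1
df.index.name = 'INDEX'

first_batch = df.loc[1]               # all rows of the first on/off batch
print(first_batch['VOLUME'].tolist())  # [10, 15, 10, 20, 30]
```

Because the index labels repeat, df.loc[1] returns every row of that batch rather than a single row.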
You could try this with pd.Series.shift and pd.Series.cumsum:
df.index=((df.STATE.shift(-1) != df.STATE)&df.STATE.eq('off')).shift(fill_value=0).cumsum()+1
Same as this with np.where:
temp=pd.Series(np.where((df.STATE.shift(-1) != df.STATE)&(df.STATE.eq('off')),1,0))
df.index=temp.shift(1,fill_value=0).cumsum().astype(int).add(1)
Output:
df
STATE VOLUME
1 on 10
1 on 15
1 on 10
1 off 20
1 off 30
2 on 15
2 on 20
2 off 10
2 off 30
2 off 10
3 on 20
3 off 25
Explanation:
With (df.STATE.shift(-1) != df.STATE)&df.STATE.eq('off'), you get a mask that is True on the last row of each run of 'off' values:
(df.STATE.shift(-1) != df.STATE)&df.STATE.eq('off')
1 False
2 False
3 False
4 False
5 True
6 False
7 False
8 False
9 False
10 True
11 False
12 True
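Then you shift the mask down by one row, so that the last 'off' row stays in its own batch and the flag lands on the first row of the next batch. A cumsum() then counts the flags, since True counts as 1 and False as 0: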
((df.STATE.shift(-1) != df.STATE)&df.STATE.eq('off')).shift(fill_value=0)
1 0
2 False
3 False
4 False
5 False
6 True
7 False
8 False
9 False
10 False
11 True
12 False
((df.STATE.shift(-1) != df.STATE)&df.STATE.eq('off')).shift(fill_value=0).cumsum()
1 0
2 0
3 0
4 0
5 0
6 1
7 1
8 1
9 1
10 1
11 2
12 2
Finally, you add 1 to the result so the index starts at 1 and matches the desired output.
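As a quick sanity check, the two ways of building the batch labels can be compared on the sample data. This is only a sketch: the variable names are made up, and fill_value=False is used in both shifts to keep the masks boolean:

```python
import pandas as pd

state = pd.Series(['on', 'on', 'on', 'off', 'off', 'on', 'on',
                   'off', 'off', 'off', 'on', 'off'])

# Approach 1: previous row is 'off' and this row is 'on'.
labels_a = (state.eq('off').shift(fill_value=False) & state.eq('on')).cumsum() + 1

# Approach 2: flag the last row of each 'off' run, then shift the flag down.
last_off = (state.shift(-1) != state) & state.eq('off')
labels_b = last_off.shift(fill_value=False).cumsum() + 1

print(labels_a.tolist())  # [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3]
print(labels_a.equals(labels_b))  # True
```

Both flag the first 'on' row after an 'off' run, so with only two states they produce the same labels.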
A solution for a single column is already provided here: Pandas: Check if column value is smaller than any previous column value.
However, my dataset has many columns and I don't want to repeat the same code for each one.
Example dataset:
c d e
0 3 5 8
1 1 5 8
2 5 6 8
3 6 7 8
4 2 1 9
5 9 3 3
Desired result:
c d e c_diff d_diff e_diff
0 3 5 8 False False False
1 1 5 8 True False False
2 5 6 8 False False False
3 6 7 8 False False False
4 2 1 9 True True False
5 9 3 3 False False True
Is there a way to do this in a few simple lines of Python/pandas code?
We can use DataFrame.diff with DataFrame.lt
df.diff().lt(0).add_suffix('_diff')
c_diff d_diff e_diff
0 False False False
1 True False False
2 False False False
3 False False False
4 True True False
5 False False True
You can operate on the whole dataframe:
df.lt(df.shift()).add_suffix('_diff')
gives you
c_diff d_diff e_diff
0 False False False
1 True False False
2 False False False
3 False False False
4 True True False
5 False False True
And you can join:
df.join(df.lt(df.shift()).add_suffix('_diff'))
which gives:
c d e c_diff d_diff e_diff
0 3 5 8 False False False
1 1 5 8 True False False
2 5 6 8 False False False
3 6 7 8 False False False
4 2 1 9 True True False
5 9 3 3 False False True
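For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the example frame (the construction is an assumption) and checks that the diff-based and shift-based comparisons agree:

```python
import pandas as pd

df = pd.DataFrame({
    'c': [3, 1, 5, 6, 2, 9],
    'd': [5, 5, 6, 7, 1, 3],
    'e': [8, 8, 8, 8, 9, 3],
})

# Each value is compared with the value directly above it in the same column.
via_diff = df.diff().lt(0).add_suffix('_diff')
via_shift = df.lt(df.shift()).add_suffix('_diff')
print(via_diff.equals(via_shift))  # True

result = df.join(via_diff)
print(result.columns.tolist())
# ['c', 'd', 'e', 'c_diff', 'd_diff', 'e_diff']
```

Note that both forms compare each row only with the immediately preceding row, which is what the desired result in the question shows. The first row has nothing to compare against and comes out False.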
I have a dataset with below data.
id status div
1 True 0
2 False 2
2 True 1
3 False 4
3 False 5
1 False 5
4 True 3
4 True 10
5 False 3
5 False 3
5 True 2
I want my output as
id status div
1 True 0
2 True 1
3 False 4
4 True 3
5 True 2
If True is present in a group, I want the group's status to be True; if only False is present, I want it to be False.
I have tried using pandas groupby but could not work out how to apply this condition.
Use DataFrameGroupBy.any, then map with a helper Series holding the div of the first True row per group if one exists (otherwise the first False row):
s = (df.sort_values(['status','id'], ascending=False)
       .drop_duplicates('id')
       .set_index('id')['div'])
print (s)
id
5 2
4 3
2 1
1 0
3 4
Name: div, dtype: int64
df1 = df.groupby('id')['status'].any().reset_index()
df1['div'] = df1['id'].map(s)
print (df1)
id status div
0 1 True 0
1 2 True 1
2 3 False 4
3 4 True 3
4 5 True 2
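A shorter sketch of the same idea, not the code above: sort so that True rows come first within each id, then keep the first row per id. It assumes the multi-column sort leaves tied rows in their original order, and the frame construction is an assumption about how the data was loaded:

```python
import pandas as pd

df = pd.DataFrame({
    'id':     [1, 2, 2, 3, 3, 1, 4, 4, 5, 5, 5],
    'status': [True, False, True, False, False, False,
               True, True, False, False, True],
    'div':    [0, 2, 1, 4, 5, 5, 3, 10, 3, 3, 2],
})

# Within each id, True rows sort ahead of False rows, so the first row per id
# is the first True row if there is one, otherwise the first False row.
out = (df.sort_values(['id', 'status'], ascending=[True, False])
         .drop_duplicates('id')
         .reset_index(drop=True))
print(out['div'].tolist())  # [0, 1, 4, 3, 2]
```

The kept row already carries the right status, so no separate any() step is needed in this variant.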