Select a specific value in a group using Python pandas

I have a dataset with the data below:
id status div
1 True 0
2 False 2
2 True 1
3 False 4
3 False 5
1 False 5
4 True 3
4 True 10
5 False 3
5 False 3
5 True 2
I want my output to be:
id status div
1 True 0
2 True 1
3 False 4
4 True 3
5 True 2
If True is present in the group, I want the result to be True; if only False is present, I want it to be False.
I have tried using pandas groupby but am unable to select by this condition.

Use DataFrameGroupBy.any, then map a helper Series that holds the div of the first True row per group (falling back to any row if no True exists):
s = (df.sort_values(['status','id'], ascending=False)
       .drop_duplicates('id')
       .set_index('id')['div'])
print(s)
id
5 2
4 3
2 1
1 0
3 4
Name: div, dtype: int64
df1 = df.groupby('id')['status'].any().reset_index()
df1['div'] = df1['id'].map(s)
print(df1)
id status div
0 1 True 0
1 2 True 1
2 3 False 4
3 4 True 3
4 5 True 2
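Putting the steps together, a minimal runnable sketch (the frame is reconstructed from the question's sample data):

```python
import pandas as pd

df = pd.DataFrame({
    'id':     [1, 2, 2, 3, 3, 1, 4, 4, 5, 5, 5],
    'status': [True, False, True, False, False, False, True, True, False, False, True],
    'div':    [0, 2, 1, 4, 5, 5, 3, 10, 3, 3, 2],
})

# div of the first True row per id (multi-column sort is stable,
# so for all-False groups the first original row is kept)
s = (df.sort_values(['status', 'id'], ascending=False)
       .drop_duplicates('id')
       .set_index('id')['div'])

# True if any row in the group is True, else False
out = df.groupby('id')['status'].any().reset_index()
out['div'] = out['id'].map(s)
print(out)
```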

Related

Pandas: Delete rows of each group by condition

Good afternoon. I have a dataframe like this, where the different groups are reflected in the column "NumeroPosesion".
Event NumeroPosesion
0 completedPass 1
1 completedPass 1
2 takeon 1
3 failedPass 1
4 takeon 1
5 dribbleYES 1
6 shot 1
7 takeon 2
8 dribbleNO 2
9 completedPass 2
10 completedPass 2
11 shot 2
12 completedPass 2
13 completedPass 2
14 completedPass 2
The idea is the following:
When the first "Event" = "shot" appears, delete all the rows below it in that group.
Iterate from the last row of the group (it will be the one with "Event" = "shot") and go up until "Event" is different from "takeon", "completedPass" or "dribbleYES".
When it is different, delete all rows above the different one in the group.
Dataframe expected:
Event NumeroPosesion
0 takeon 1
1 dribbleYES 1
2 shot 1
3 completedPass 2
4 completedPass 2
5 shot 2
Use boolean indexing with the help of groupby.cummax/groupby.cummin:
# remove rows after "shot" for each group
m1 = df.loc[::-1, 'Event'].eq('shot').groupby(df['NumeroPosesion']).cummax()
# remove rows before the first non "takeon"/"completedPass"/"dribbleYES"
m2 = (df.loc[m1, 'Event'].isin(['shot', 'takeon', 'completedPass', 'dribbleYES'])[::-1]
        .groupby(df['NumeroPosesion']).cummin())
# slice
out = df[m1 & m2]
Output:
Event NumeroPosesion
4 takeon 1
5 dribbleYES 1
6 shot 1
9 completedPass 2
10 completedPass 2
11 shot 2
Intermediates:
Event NumeroPosesion m1 m2
0 completedPass 1 True False
1 completedPass 1 True False
2 takeon 1 True False
3 failedPass 1 True False
4 takeon 1 True True
5 dribbleYES 1 True True
6 shot 1 True True
7 takeon 2 True False
8 dribbleNO 2 True False
9 completedPass 2 True True
10 completedPass 2 True True
11 shot 2 True True
12 completedPass 2 False NaN
13 completedPass 2 False NaN
14 completedPass 2 False NaN
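For reference, a self-contained sketch of the approach above on the question's data (the `keep` name is introduced here for readability; it is not in the original answer):

```python
import pandas as pd

df = pd.DataFrame({
    'Event': ['completedPass', 'completedPass', 'takeon', 'failedPass', 'takeon',
              'dribbleYES', 'shot', 'takeon', 'dribbleNO', 'completedPass',
              'completedPass', 'shot', 'completedPass', 'completedPass', 'completedPass'],
    'NumeroPosesion': [1] * 7 + [2] * 8,
})

keep = ['shot', 'takeon', 'completedPass', 'dribbleYES']

# m1: True from each group's last 'shot' backwards (drops rows after the shot)
m1 = df.loc[::-1, 'Event'].eq('shot').groupby(df['NumeroPosesion']).cummax()
# m2: walking up from the shot, stay True only while events are in `keep`
m2 = df.loc[m1, 'Event'].isin(keep)[::-1].groupby(df['NumeroPosesion']).cummin()

out = df[m1 & m2]
print(out)
```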

Find the index of the last true occurrence in a column by row

I have the following table format:
id bool
1 true
2 true
3 false
4 false
5 false
6 true
I'd like to get another column with the id of the last true occurrence in the bool column, by row. If it's true in its own row, then return its own id. It doesn't sound too hard using a for loop, but I want it in a clean pandas style. I.e., in this example I would get:
column = [1,2,2,2,2,6]
IIUC, you can mask and ffill:
df['new'] = df['id'].where(df['bool']).ffill(downcast='infer')
output:
id bool new
0 1 True 1
1 2 True 2
2 3 False 2
3 4 False 2
4 5 False 2
5 6 True 6
In your case, do:
df['new'] = df['id'].mul(df['bool']).cummax()
Out[344]:
0 1
1 2
2 2
3 2
4 2
5 6
dtype: int64
df1.assign(col1=np.where(df1.bool2, df1.id, pd.NA)).fillna(method='pad')
id bool2 col1
0 1 True 1
1 2 True 2
2 3 False 2
3 4 False 2
4 5 False 2
5 6 True 6
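A runnable sketch of the mask-and-ffill approach on the question's table; note that `downcast='infer'` in `ffill` is deprecated in recent pandas, so the result is cast explicitly here instead:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6],
                   'bool': [True, True, False, False, False, True]})

# keep id where bool is True, then forward-fill the gaps
df['new'] = df['id'].where(df['bool']).ffill().astype(int)
print(df['new'].tolist())
```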

How to check if every range of a dataframe series between 2 values has another value

In a pandas DataFrame I have one series, A:
Index A
0 2
1 1
2 6
3 3
4 2
5 7
6 1
7 3
8 2
9 1
10 3
I would like to check if between every 1 of column A there is a number 2, and write the results in column B like this:
Index A B
0 2 FALSE
1 1 FALSE
2 6 FALSE
3 3 FALSE
4 2 TRUE
5 7 FALSE
6 1 FALSE
7 3 FALSE
8 2 TRUE
9 1 FALSE
10 3 FALSE
I thought of using rolling(), but can rolling work with a random-sized window of values (ranges)?
mask = (df["A"] == 1) | (df["A"] == 2)
df = df.assign(B=df.loc[mask, "A"].sub(df.loc[mask, "A"].shift()).eq(1)).fillna(False)
>>> df
A B
0 2 False
1 1 False
2 6 False
3 3 False
4 2 True
5 7 False
6 1 False
7 3 False
8 2 True
9 1 False
10 3 False
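The same mask-and-shift idea, rewritten as a runnable sketch in small steps (the `kept` name is introduced here for clarity):

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 1, 6, 3, 2, 7, 1, 3, 2, 1, 3]})

# keep only the 1s and 2s, then flag each 2 whose previous kept value was a 1
mask = df['A'].isin([1, 2])
kept = df.loc[mask, 'A']
df['B'] = kept.sub(kept.shift()).eq(1).reindex(df.index, fill_value=False)
print(df)
```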
IIUC, use np.where to assign True to the 2s in 'A' that appear after the first 1 is found:
df['B'] = np.where(df['A'].eq(1).cumsum().ge(1) & df['A'].eq(2), True, False)
df:
A B
0 2 False
1 1 False
2 6 False
3 3 False
4 2 True
5 7 False
6 1 False
7 3 False
8 2 True
9 1 False
10 3 False
Imports and DataFrame used:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [2, 1, 6, 3, 2, 7, 1, 3, 2, 1, 3]})
Here is a very basic and straightforward way to do it:
import pandas as pd

df = pd.DataFrame({"A": [2, 1, 6, 3, 2, 7, 1, 3, 2, 1, 3]})

for i, cell in enumerate(df["A"]):
    if (1 in list(df.loc[:i, "A"])) and (1 in list(df.loc[i:, "A"])) and cell == 2:
        df.at[i, "B"] = True
    else:
        df.at[i, "B"] = False

Pandas cummulative sum based on True/False condition

I'm using Python and need to cumsum() the values in the dataframe, resetting whenever the boolean column changes from True to False. How can I solve this task?
Bool Value Expected_cumsum
0 False 1 1
1 False 2 3
2 False 4 7
3 True 1 8
4 False 3 3 << reset from here
5 False 5 8
6 True 2 10
....
Thank all!
You can try this:
a = df.Bool.eq(True).cumsum().shift().fillna(0)
df['Expected_cumsum']= df.groupby(a)['Value'].cumsum()
df
Output
Bool Value Expected_cumsum
0 False 1 1
1 False 2 3
2 False 4 7
3 True 1 8
4 False 3 3
5 False 5 8
6 True 2 10
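A self-contained sketch of this answer on the question's data (`.eq(True)` is dropped here since the column is already boolean):

```python
import pandas as pd

df = pd.DataFrame({'Bool':  [False, False, False, True, False, False, True],
                   'Value': [1, 2, 4, 1, 3, 5, 2]})

# a new group starts on the row *after* each True
group = df['Bool'].cumsum().shift().fillna(0)
df['Expected_cumsum'] = df.groupby(group)['Value'].cumsum()
print(df['Expected_cumsum'].tolist())
```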

Using cumsum to find unique chapters

I have a dataframe like this:
df = pd.DataFrame()
text secFlag
0 book 1
1 headings 1
2 chapter 1
3 one 1
4 page 0
5 one 0
6 text 0
7 chapter 1
8 two 1
9 page 0
10 two 0
11 text 0
12 page 0
13 three 0
10 text 0
11 chapter 1
12 three 1
13 something 0
I want to find the cumulative sum so that I can mark all the pages belonging to a specific chapter by a running index number.
Desired output:
text secFlag chapter
0 book 1 1
1 headings 1 1
2 chapter 1 2
3 one 1 2
4 page 0 2
5 one 0 2
6 text 0 2
7 chapter 1 3
8 two 1 3
9 page 0 3
10 two 0 3
11 text 0 3
12 page 0 3
13 three 0 3
10 text 0 3
11 chapter 1 4
12 three 1 4
13 something 0 4
This is what I tried:
df['chapter'] = ((df['secFlag'].shift(-1) == 1)).cumsum()
But this is not giving me the desired output, as it increments as soon as a value of 1 appears in the section flag. Note that multiple words are part of the text, and a chapter heading will usually span multiple words.
Can you please suggest a simple way to get this done?
thanks
If you need to increment only at the first 1 of each consecutive run in secFlag, the solution is:
df['chapter'] = ((df['secFlag'] == 1) & (df['secFlag'] != df['secFlag'].shift())).cumsum()
print(df)
text secFlag chapter
0 book 1 1
1 headings 1 1
2 chapter 1 1
3 one 1 1
4 page 0 1
5 one 0 1
6 text 0 1
7 chapter 1 2
8 two 1 2
9 page 0 2
10 two 0 2
11 text 0 2
12 page 0 2
13 three 0 2
10 text 0 2
11 chapter 1 3
12 three 1 3
13 something 0 3
Details:
a = (df['secFlag'] == 1)
b = (df['secFlag'] != df['secFlag'].shift())
c = a & b
d = c.cumsum()
print(pd.concat([df, a, b, c, d],
                axis=1,
                keys=('orig', '==1', '!=shifted', 'chained by &', 'cumsum')))
orig ==1 !=shifted chained by & cumsum
text secFlag secFlag secFlag secFlag secFlag
0 book 1 True True True 1
1 headings 1 True False False 1
2 chapter 1 True False False 1
3 one 1 True False False 1
4 page 0 False True False 1
5 one 0 False False False 1
6 text 0 False False False 1
7 chapter 1 True True True 2
8 two 1 True False False 2
9 page 0 False True False 2
10 two 0 False False False 2
11 text 0 False False False 2
12 page 0 False False False 2
13 three 0 False False False 2
10 text 0 False False False 2
11 chapter 1 True True True 3
12 three 1 True False False 3
13 something 0 False True False 3
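The run-boundary trick above can be checked end to end with the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['book', 'headings', 'chapter', 'one', 'page', 'one', 'text',
             'chapter', 'two', 'page', 'two', 'text', 'page', 'three', 'text',
             'chapter', 'three', 'something'],
    'secFlag': [1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0],
})

# count only the first 1 of each consecutive run of 1s
df['chapter'] = ((df['secFlag'] == 1) & (df['secFlag'] != df['secFlag'].shift())).cumsum()
print(df['chapter'].tolist())
```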
