How do I drop all rows after last occurrence of a value? - python

I have a dataframe with a string column and I would like to drop all rows after the last occurrence of a name.
first_name
Andy
Josh
Mark
Tim
Alex
Andy
Josh
Mark
Tim
Alex
Andy
Josh
Mark
What I would like is to drop rows after Alex occurs for the last time, so drop the rows with Andy, Josh and Mark.
I figured out how to drop the rows before the first occurrence with df = df[(df.first_name == 'Alex').idxmax():], but I don't know how to drop the trailing rows.
Thanks!

Option 1: argmax
df.iloc[:len(df) - (df.first_name.to_numpy() == 'Alex')[::-1].argmax()]
first_name
0 Andy
1 Josh
2 Mark
3 Tim
4 Alex
5 Andy
6 Josh
7 Mark
8 Tim
9 Alex
Option 2: last_valid_index
df.loc[:df.where(df == 'Alex').last_valid_index()]
Option 3
df.loc[:df.first_name.eq('Alex')[::-1].idxmax()]
Option 4
df.iloc[:np.flatnonzero(df.first_name.eq('Alex')).max() + 1]
Option 5
This is silly!
df[np.logical_or.accumulate(df.first_name.eq('Alex')[::-1])[::-1]]
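The options above assume `import numpy as np` and the question's frame already exist; a minimal self-contained sketch of the argmax approach (Option 1):

```python
import numpy as np
import pandas as pd

# Example column from the question
df = pd.DataFrame({'first_name': ['Andy', 'Josh', 'Mark', 'Tim', 'Alex',
                                  'Andy', 'Josh', 'Mark', 'Tim', 'Alex',
                                  'Andy', 'Josh', 'Mark']})

# Reverse the boolean array so argmax finds the *last* 'Alex',
# then convert that reversed position back to a slice end.
mask = (df.first_name.to_numpy() == 'Alex')
last_pos = len(df) - mask[::-1].argmax()
out = df.iloc[:last_pos]
print(out)  # rows 0..9, ending at the last 'Alex'
```

Note that if 'Alex' never occurs, `argmax` on an all-False array returns 0 and the whole frame is kept, so a guard like `mask.any()` may be wanted in practice.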

mask and bfill
df[df['first_name'].mask(df['first_name'] != 'Alex').bfill().notna()]
first_name
0 Andy
1 Josh
2 Mark
3 Tim
4 Alex
5 Andy
6 Josh
7 Mark
8 Tim
9 Alex
cumsum and idxmax
df.loc[:(df['first_name'] == 'Alex').cumsum().idxmax()]
first_name
0 Andy
1 Josh
2 Mark
3 Tim
4 Alex
5 Andy
6 Josh
7 Mark
8 Tim
9 Alex
cumsum and max
u = (df['first_name'] == 'Alex').shift(fill_value=False).cumsum()
df[u < u.max()]
first_name
0 Andy
1 Josh
2 Mark
3 Tim
4 Alex
5 Andy
6 Josh
7 Mark
8 Tim
9 Alex
Note: without fill_value=False, shift() puts a NaN at position 0, and NaN < u.max() is False, so the first row would be silently dropped.

Related

Merging two dataframes while changing the order of the second dataframe each time

df is like so:
Week Name
1 TOM
1 BEN
1 CARL
2 TOM
2 BEN
2 CARL
3 TOM
3 BEN
3 CARL
and df1 is like so:
ID Letter
1 A
2 B
3 C
I want to merge the two dataframes so that each name is assigned a different letter each time. So the result should be like this:
Week Name Letter
1 TOM A
1 BEN B
1 CARL C
2 TOM B
2 BEN C
2 CARL A
3 TOM C
3 BEN A
3 CARL B
Any help would be greatly appreciated. Thanks in advance.
Using the question's names (df holds Week/Name, df1 holds ID/Letter):
df['Letter'] = df.groupby('Week').cumcount().add(df['Week'].sub(1)).mod(df.groupby('Week')['Name'].transform('count')).map(df1['Letter'])
Output:
>>> df
Week Name Letter
0 1 TOM A
1 1 BEN B
2 1 CARL C
3 2 TOM B
4 2 BEN C
5 2 CARL A
6 3 TOM C
7 3 BEN A
8 3 CARL B
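That one-liner is dense; a step-by-step sketch with the question's frames, showing the intermediate position arithmetic:

```python
import pandas as pd

df = pd.DataFrame({'Week': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'Name': ['TOM', 'BEN', 'CARL'] * 3})
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Letter': ['A', 'B', 'C']})

pos = df.groupby('Week').cumcount()                   # 0, 1, 2 within each week
offset = df['Week'] - 1                               # rotate one extra step per week
size = df.groupby('Week')['Name'].transform('count')  # group size (3 here)

# Rotated position, wrapped around the group size, mapped to a letter
df['Letter'] = ((pos + offset) % size).map(df1['Letter'])
print(df)
```

Week 1 maps positions 0, 1, 2 to A, B, C; week 2 shifts them to B, C, A; week 3 to C, A, B.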

How to drop duplicates in column with respect to values in another column in pandas?

I have a database with person names and the date of their visit. I need to remove duplicated rows in the "Visit_date" column with respect to each person in another column. I have a very big database, so I need efficient code. I've spent several days trying to do this with no result. Here is a sample:
Person Visit_date
0 John 11.09.2020
1 John 11.09.2020
2 John 11.08.2020
3 Andy 11.07.2020
4 Andy 11.09.2020
5 Andy 11.09.2020
6 George 11.09.2020
7 George 11.09.2020
8 George 11.07.2020
9 George 11.07.2020
The code should return:
Person Visit_date
0 John 11.09.2020
1 John 11.08.2020
2 Andy 11.07.2020
3 Andy 11.09.2020
4 George 11.09.2020
5 George 11.07.2020
Hope this helps. Use df.drop_duplicates(), then df.reset_index(drop=True):
import pandas as pd
df = pd.DataFrame({"Person" :['John','John','John','Andy','Andy','Andy','George','George','George','George'],"Visit_date" :['11.09.2020','11.09.2020','11.08.2020','11.07.2020','11.09.2020','11.09.2020','11.09.2020','11.09.2020','11.07.2020','11.07.2020']})
df=df.drop_duplicates()
df=df.reset_index(drop=True)
print(df)
[Result]:
Person Visit_date
0 John 11.09.2020
1 John 11.08.2020
2 Andy 11.07.2020
3 Andy 11.09.2020
4 George 11.09.2020
5 George 11.07.2020
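If the real database has more columns than these two, passing subset (a sketch, assuming the column names from the question) restricts the duplicate check to just Person and Visit_date:

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['John', 'John', 'John', 'Andy', 'Andy', 'Andy',
               'George', 'George', 'George', 'George'],
    'Visit_date': ['11.09.2020', '11.09.2020', '11.08.2020', '11.07.2020',
                   '11.09.2020', '11.09.2020', '11.09.2020', '11.09.2020',
                   '11.07.2020', '11.07.2020'],
})

# Only Person + Visit_date decide what counts as a duplicate, so any
# extra columns in the real data would not block deduplication.
out = df.drop_duplicates(subset=['Person', 'Visit_date']).reset_index(drop=True)
print(out)
```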

Create a cumulative count column in pandas dataframe

I have a dataframe set up similar to this
**Person Value**
Joe 3
Jake 4
Patrick 2
Stacey 1
Joe 5
Stacey 6
Lara 7
Joe 2
Stacey 1
I need to create a new column 'x' which keeps a running count of how many times each person's name has appeared so far in the list.
Expected output:
**Person Value** **x**
Joe 3 1
Jake 4 1
Patrick 2 1
Stacey 1 1
Joe 5 2
Stacey 6 2
Lara 7 1
Joe 2 3
Stacey 1 3
All I've managed so far is to create an overall count, which is not what I'm looking for.
Any help is appreciated
You could use
df['x'] = df.groupby('Person').cumcount() + 1
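A runnable sketch with the question's data; cumcount numbers occurrences from zero within each group, hence the + 1:

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['Joe', 'Jake', 'Patrick', 'Stacey', 'Joe',
               'Stacey', 'Lara', 'Joe', 'Stacey'],
    'Value': [3, 4, 2, 1, 5, 6, 7, 2, 1],
})

# cumcount() is 0-based per group; add 1 for a 1-based running count
df['x'] = df.groupby('Person').cumcount() + 1
print(df)
```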

How to strip the string and replace the existing elements in DataFrame

I have a df as below:
Index Site Name
0 Site_1 Tom
1 Site_2 Tom
2 Site_4 Jack
3 Site_8 Rose
5 Site_11 Marrie
6 Site_12 Marrie
7 Site_21 Jacob
8 Site_34 Jacob
I would like to strip the 'Site_' and only leave the number in the "Site" column, as shown below:
Index Site Name
0 1 Tom
1 2 Tom
2 4 Jack
3 8 Rose
5 11 Marrie
6 12 Marrie
7 21 Jacob
8 34 Jacob
What is the best way to do this operation?
Using pd.Series.str.extract
This produces a copy with an updated column:
df.assign(Site=df.Site.str.extract(r'\D+(\d+)', expand=False))
Site Name
Index
0 1 Tom
1 2 Tom
2 4 Jack
3 8 Rose
5 11 Marrie
6 12 Marrie
7 21 Jacob
8 34 Jacob
To persist the results, reassign to the data frame name:
df = df.assign(Site=df.Site.str.extract(r'\D+(\d+)', expand=False))
Using pd.Series.str.split
df.assign(Site=df.Site.str.split('_', n=1).str[1])
Alternative
Update instead of producing a copy
df.update(df.Site.str.extract(r'\D+(\d+)', expand=False))
# Or
# df.update(df.Site.str.split('_', n=1).str[1])
df
Site Name
Index
0 1 Tom
1 2 Tom
2 4 Jack
3 8 Rose
5 11 Marrie
6 12 Marrie
7 21 Jacob
8 34 Jacob
Make an array consisting of the column names you want, then call
yourarray = pd.DataFrame(yourpd, columns=yournamearray)
Just call replace on the column to replace all instances of "Site_":
df['Site'] = df['Site'].str.replace('Site_', '')
Use .apply() to apply a function to each element in the series:
df['Site'] = df['Site'].apply(lambda x: x.split('_')[-1])
You can use exactly what you wanted (the strip method)
>>> df["Site"] = df.Site.str.strip("Site_")
Be aware that strip treats its argument as a set of characters to remove from both ends, not as a literal prefix; it works here only because the remaining digits contain none of the characters S, i, t, e or _.
Output
Index Site Name
0 1 Tom
1 2 Tom
2 4 Jack
3 8 Rose
5 11 Marrie
6 12 Marrie
7 21 Jacob
8 34 Jacob
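All of the string-based approaches leave Site as text; a sketch (using str.extract, one of the options above) that also casts to integers so numeric sorting and filtering behave as expected:

```python
import pandas as pd

df = pd.DataFrame({'Site': ['Site_1', 'Site_2', 'Site_4', 'Site_8',
                            'Site_11', 'Site_12', 'Site_21', 'Site_34'],
                   'Name': ['Tom', 'Tom', 'Jack', 'Rose',
                            'Marrie', 'Marrie', 'Jacob', 'Jacob']})

# Pull out the trailing digits, then cast the strings to int
df['Site'] = df['Site'].str.extract(r'(\d+)', expand=False).astype(int)
print(df)
```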

Pandas intersection of groups

Hi, I'm trying to find the unique players that show up in every Team.
df =
Team Player Number
A Joe 8
A Mike 10
A Steve 11
B Henry 9
B Steve 19
B Joe 4
C Mike 18
C Joe 6
C Steve 18
C Dan 1
C Henry 3
and the result should be:
result =
Team Player Number
A Joe 8
A Steve 11
B Joe 4
B Steve 19
C Joe 6
C Steve 18
since Joe and Steve are the only players that appear in every Team
You can use a GroupBy.transform to get a count of unique teams that each player is a member of, and compare this to the overall count of unique teams. This will give you a Boolean array, which you can use to filter your DataFrame:
df = df[df.groupby('Player')['Team'].transform('nunique') == df['Team'].nunique()]
The resulting output:
Team Player Number
0 A Joe 8
2 A Steve 11
4 B Steve 19
5 B Joe 4
7 C Joe 6
8 C Steve 18
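A runnable sketch with the question's data, assuming the frame layout shown above:

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
    'Player': ['Joe', 'Mike', 'Steve', 'Henry', 'Steve', 'Joe',
               'Mike', 'Joe', 'Steve', 'Dan', 'Henry'],
    'Number': [8, 10, 11, 9, 19, 4, 18, 6, 18, 1, 3],
})

# Keep a row only if its player belongs to as many distinct teams
# as there are teams overall
result = df[df.groupby('Player')['Team'].transform('nunique') == df['Team'].nunique()]
print(result)
```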
