How do I drop all rows after last occurrence of a value? - python

I have a dataframe with a string column and I would like to drop all rows after the last occurrence of a name.
first_name
Andy
Josh
Mark
Tim
Alex
Andy
Josh
Mark
Tim
Alex
Andy
Josh
Mark
What I would like is to drop rows after Alex occurs for the last time, so drop the rows with Andy, Josh and Mark.
I figured out how to drop the rows before the first occurrence with df = df[(df.first_name == 'Alex').idxmax():], but I don't know how to drop the trailing rows.
Thanks!

Option 1: argmax
df.iloc[:len(df) - (df.first_name.to_numpy() == 'Alex')[::-1].argmax()]
first_name
0 Andy
1 Josh
2 Mark
3 Tim
4 Alex
5 Andy
6 Josh
7 Mark
8 Tim
9 Alex
Option 2: last_valid_index
df.loc[:df.where(df == 'Alex').last_valid_index()]
Option 3
df.loc[:df.first_name.eq('Alex')[::-1].idxmax()]
Option 4
df.iloc[:np.flatnonzero(df.first_name.eq('Alex')).max() + 1]
Option 5
This is silly!
df[np.logical_or.accumulate(df.first_name.eq('Alex')[::-1])[::-1]]
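The options above assume `import numpy as np` and the question's frame already exist; a minimal self-contained sketch of the argmax approach (Option 1):

```python
import numpy as np
import pandas as pd

# Example column from the question
df = pd.DataFrame({'first_name': ['Andy', 'Josh', 'Mark', 'Tim', 'Alex',
                                  'Andy', 'Josh', 'Mark', 'Tim', 'Alex',
                                  'Andy', 'Josh', 'Mark']})

# Reverse the boolean array so argmax finds the *last* 'Alex',
# then convert that reversed position back to a slice end.
mask = (df.first_name.to_numpy() == 'Alex')
last_pos = len(df) - mask[::-1].argmax()
out = df.iloc[:last_pos]
print(out)  # rows 0..9, ending at the last 'Alex'
```

Note that if 'Alex' never occurs, `argmax` on an all-False array returns 0 and the whole frame is kept, so a guard like `mask.any()` may be wanted in practice.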

mask and bfill
df[df['first_name'].mask(df['first_name'] != 'Alex').bfill().notna()]
first_name
0 Andy
1 Josh
2 Mark
3 Tim
4 Alex
5 Andy
6 Josh
7 Mark
8 Tim
9 Alex
cumsum and idxmax
df.loc[:(df['first_name'] == 'Alex').cumsum().idxmax()]
first_name
0 Andy
1 Josh
2 Mark
3 Tim
4 Alex
5 Andy
6 Josh
7 Mark
8 Tim
9 Alex
cumsum and max
u = (df['first_name'] == 'Alex').shift(fill_value=False).cumsum()
df[u < u.max()]
first_name
0 Andy
1 Josh
2 Mark
3 Tim
4 Alex
5 Andy
6 Josh
7 Mark
8 Tim
9 Alex
Note: without fill_value=False, shift() puts a NaN at position 0, and NaN < u.max() is False, so the first row would be silently dropped.

Related

Merging two dataframes while changing the order of the second dataframe each time

df is like so:
Week Name
1 TOM
1 BEN
1 CARL
2 TOM
2 BEN
2 CARL
3 TOM
3 BEN
3 CARL
and df1 is like so:
ID Letter
1 A
2 B
3 C
I want to merge the two dataframes so that each name is assigned a different letter each time. So the result should be like this:
Week Name Letter
1 TOM A
1 BEN B
1 CARL C
2 TOM B
2 BEN C
2 CARL A
3 TOM C
3 BEN A
3 CARL B
Any help would be greatly appreciated. Thanks in advance.
Using the question's names (df holds Week/Name, df1 holds ID/Letter):
df['Letter'] = df.groupby('Week').cumcount().add(df['Week'].sub(1)).mod(df.groupby('Week')['Name'].transform('count')).map(df1['Letter'])
Output:
>>> df
Week Name Letter
0 1 TOM A
1 1 BEN B
2 1 CARL C
3 2 TOM B
4 2 BEN C
5 2 CARL A
6 3 TOM C
7 3 BEN A
8 3 CARL B
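That one-liner is dense; a step-by-step sketch with the question's frames, showing the intermediate position arithmetic:

```python
import pandas as pd

df = pd.DataFrame({'Week': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'Name': ['TOM', 'BEN', 'CARL'] * 3})
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Letter': ['A', 'B', 'C']})

pos = df.groupby('Week').cumcount()                   # 0, 1, 2 within each week
offset = df['Week'] - 1                               # rotate one extra step per week
size = df.groupby('Week')['Name'].transform('count')  # group size (3 here)

# Rotated position, wrapped around the group size, mapped to a letter
df['Letter'] = ((pos + offset) % size).map(df1['Letter'])
print(df)
```

Week 1 maps positions 0, 1, 2 to A, B, C; week 2 shifts them to B, C, A; week 3 to C, A, B.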

How to drop duplicates in column with respect to values in another column in pandas?

I have a database with person names and the date of their visit. I need to remove duplicated rows in the "Visit_date" column with respect to each person in another column. I have a very big database, so I need efficient code. I've spent several days trying to do this with no result. Here is a sample:
Person Visit_date
0 John 11.09.2020
1 John 11.09.2020
2 John 11.08.2020
3 Andy 11.07.2020
4 Andy 11.09.2020
5 Andy 11.09.2020
6 George 11.09.2020
7 George 11.09.2020
8 George 11.07.2020
9 George 11.07.2020
The code should return:
Person Visit_date
0 John 11.09.2020
1 John 11.08.2020
2 Andy 11.07.2020
3 Andy 11.09.2020
4 George 11.09.2020
5 George 11.07.2020
Hope this helps. Use df.drop_duplicates(), then df.reset_index(drop=True):
import pandas as pd
df = pd.DataFrame({"Person" :['John','John','John','Andy','Andy','Andy','George','George','George','George'],"Visit_date" :['11.09.2020','11.09.2020','11.08.2020','11.07.2020','11.09.2020','11.09.2020','11.09.2020','11.09.2020','11.07.2020','11.07.2020']})
df=df.drop_duplicates()
df=df.reset_index(drop=True)
print(df)
[Result]:
Person Visit_date
0 John 11.09.2020
1 John 11.08.2020
2 Andy 11.07.2020
3 Andy 11.09.2020
4 George 11.09.2020
5 George 11.07.2020
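If the real database has more columns than these two, passing subset (a sketch, assuming the column names from the question) restricts the duplicate check to just Person and Visit_date:

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['John', 'John', 'John', 'Andy', 'Andy', 'Andy',
               'George', 'George', 'George', 'George'],
    'Visit_date': ['11.09.2020', '11.09.2020', '11.08.2020', '11.07.2020',
                   '11.09.2020', '11.09.2020', '11.09.2020', '11.09.2020',
                   '11.07.2020', '11.07.2020'],
})

# Only Person + Visit_date decide what counts as a duplicate, so any
# extra columns in the real data would not block deduplication.
out = df.drop_duplicates(subset=['Person', 'Visit_date']).reset_index(drop=True)
print(out)
```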

Create a cumulative count column in pandas dataframe

I have a dataframe set up similar to this
**Person Value**
Joe 3
Jake 4
Patrick 2
Stacey 1
Joe 5
Stacey 6
Lara 7
Joe 2
Stacey 1
I need to create a new column 'x' which keeps a running count of how many times each person's name has appeared so far in the list.
Expected output:
**Person Value** **x**
Joe 3 1
Jake 4 1
Patrick 2 1
Stacey 1 1
Joe 5 2
Stacey 6 2
Lara 7 1
Joe 2 3
Stacey 1 3
All I've managed so far is to create an overall count, which is not what I'm looking for.
Any help is appreciated
You could use
df['x'] = df.groupby('Person').cumcount() + 1
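A runnable sketch with the question's data; cumcount numbers occurrences from zero within each group, hence the + 1:

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['Joe', 'Jake', 'Patrick', 'Stacey', 'Joe',
               'Stacey', 'Lara', 'Joe', 'Stacey'],
    'Value': [3, 4, 2, 1, 5, 6, 7, 2, 1],
})

# cumcount() is 0-based per group; add 1 for a 1-based running count
df['x'] = df.groupby('Person').cumcount() + 1
print(df)
```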

How to strip the string and replace the existing elements in DataFrame

I have a df as below:
Index Site Name
0 Site_1 Tom
1 Site_2 Tom
2 Site_4 Jack
3 Site_8 Rose
5 Site_11 Marrie
6 Site_12 Marrie
7 Site_21 Jacob
8 Site_34 Jacob
I would like to strip the 'Site_' and only leave the number in the "Site" column, as shown below:
Index Site Name
0 1 Tom
1 2 Tom
2 4 Jack
3 8 Rose
5 11 Marrie
6 12 Marrie
7 21 Jacob
8 34 Jacob
What is the best way to do this operation?
Using pd.Series.str.extract
This produces a copy with an updated column:
df.assign(Site=df.Site.str.extract(r'\D+(\d+)', expand=False))
Site Name
Index
0 1 Tom
1 2 Tom
2 4 Jack
3 8 Rose
5 11 Marrie
6 12 Marrie
7 21 Jacob
8 34 Jacob
To persist the results, reassign to the data frame name:
df = df.assign(Site=df.Site.str.extract(r'\D+(\d+)', expand=False))
Using pd.Series.str.split
df.assign(Site=df.Site.str.split('_', n=1).str[1])
Alternative
Update instead of producing a copy
df.update(df.Site.str.extract(r'\D+(\d+)', expand=False))
# Or
# df.update(df.Site.str.split('_', n=1).str[1])
df
Site Name
Index
0 1 Tom
1 2 Tom
2 4 Jack
3 8 Rose
5 11 Marrie
6 12 Marrie
7 21 Jacob
8 34 Jacob
Make an array consisting of the column names you want, then call
yourarray = pd.DataFrame(yourpd, columns=yournamearray)
Just call replace on the column to replace all instances of "Site_":
df['Site'] = df['Site'].str.replace('Site_', '')
Use .apply() to apply a function to each element in the series:
df['Site'] = df['Site'].apply(lambda x: x.split('_')[-1])
You can use exactly what you wanted (the strip method)
>>> df["Site"] = df.Site.str.strip("Site_")
Be aware that strip treats its argument as a set of characters to remove from both ends, not as a literal prefix; it works here only because the remaining digits contain none of the characters S, i, t, e or _.
Output
Index Site Name
0 1 Tom
1 2 Tom
2 4 Jack
3 8 Rose
5 11 Marrie
6 12 Marrie
7 21 Jacob
8 34 Jacob
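All of the string-based approaches leave Site as text; a sketch (using str.extract, one of the options above) that also casts to integers so numeric sorting and filtering behave as expected:

```python
import pandas as pd

df = pd.DataFrame({'Site': ['Site_1', 'Site_2', 'Site_4', 'Site_8',
                            'Site_11', 'Site_12', 'Site_21', 'Site_34'],
                   'Name': ['Tom', 'Tom', 'Jack', 'Rose',
                            'Marrie', 'Marrie', 'Jacob', 'Jacob']})

# Pull out the trailing digits, then cast the strings to int
df['Site'] = df['Site'].str.extract(r'(\d+)', expand=False).astype(int)
print(df)
```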

Pandas intersection of groups

Hi, I'm trying to find the unique players that show up in every Team.
df =
Team Player Number
A Joe 8
A Mike 10
A Steve 11
B Henry 9
B Steve 19
B Joe 4
C Mike 18
C Joe 6
C Steve 18
C Dan 1
C Henry 3
and the result should be:
result =
Team Player Number
A Joe 8
A Steve 11
B Joe 4
B Steve 19
C Joe 6
C Steve 18
since Joe and Steve are the only players that appear in every Team
You can use a GroupBy.transform to get a count of unique teams that each player is a member of, and compare this to the overall count of unique teams. This will give you a Boolean array, which you can use to filter your DataFrame:
df = df[df.groupby('Player')['Team'].transform('nunique') == df['Team'].nunique()]
The resulting output:
Team Player Number
0 A Joe 8
2 A Steve 11
4 B Steve 19
5 B Joe 4
7 C Joe 6
8 C Steve 18
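A runnable sketch with the question's data, assuming the frame layout shown above:

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
    'Player': ['Joe', 'Mike', 'Steve', 'Henry', 'Steve', 'Joe',
               'Mike', 'Joe', 'Steve', 'Dan', 'Henry'],
    'Number': [8, 10, 11, 9, 19, 4, 18, 6, 18, 1, 3],
})

# Keep a row only if its player belongs to as many distinct teams
# as there are teams overall
result = df[df.groupby('Player')['Team'].transform('nunique') == df['Team'].nunique()]
print(result)
```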
