Create a cumulative count column in pandas dataframe - python

I have a dataframe set up similar to this:
**Person Value**
Joe 3
Jake 4
Patrick 2
Stacey 1
Joe 5
Stacey 6
Lara 7
Joe 2
Stacey 1
I need to create a new column 'x' which keeps a running count of how many times each person's name has appeared so far in the list.
Expected output:
**Person Value** **x**
Joe 3 1
Jake 4 1
Patrick 2 1
Stacey 1 1
Joe 5 2
Stacey 6 2
Lara 7 1
Joe 2 3
Stacey 1 3
All I've managed so far is to create an overall count, which is not what I'm looking for.
Any help is appreciated

You could use groupby with cumcount, which numbers the rows within each group starting at 0, and then add 1:
df['x'] = df.groupby('Person').cumcount() + 1
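As a quick check of the one-liner above, here is a minimal, self-contained sketch that rebuilds the sample data from the question (the construction of df is an assumption about how the data is loaded) and applies the same call:

```python
import pandas as pd

# Sample data copied from the question; building it by hand is an assumption.
df = pd.DataFrame({
    "Person": ["Joe", "Jake", "Patrick", "Stacey", "Joe",
               "Stacey", "Lara", "Joe", "Stacey"],
    "Value": [3, 4, 2, 1, 5, 6, 7, 2, 1],
})

# cumcount() numbers rows within each Person group from 0, in original row order,
# so adding 1 gives "how many times this name has appeared so far".
df["x"] = df.groupby("Person").cumcount() + 1

print(df)
```

Because cumcount follows the existing row order, sort the frame first if "so far" should mean something other than the current order.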

Related

create new column of incremental number based on 2 categorical columns pandas dataframe

I have a pandas dataframe with the columns username and phase. I want to create a separate column called count with incremental values.
The count will be based on how many times a username has appeared in a specific phase. How can I accomplish this efficiently? Any suggestion is appreciated.
username phase count
0 andrew 1 1
1 andrew 1 2
2 alex 1 1
3 alex 2 1
4 andrew 1 3
5 cindy 3 1
6 alex 2 2
You can use cumcount after groupby on username and phase.
df['count'] = df.groupby(['username', 'phase']).cumcount()+1
print(df)
username phase count
0 andrew 1 1
1 andrew 1 2
2 alex 1 1
3 alex 2 1
4 andrew 1 3
5 cindy 3 1
6 alex 2 2

How to loop pandas dataframe with subset of data (like group by) [duplicate]

This question already has answers here:
How to groupby consecutive values in pandas DataFrame
(4 answers)
Closed 11 months ago.
I have a pandas dataframe that, after sorting, looks like the one below (think of a few people working shifts in a shop):
A B C D
1 1 1 Anna
2 3 1 Anna
3 1 2 Anna
4 3 2 Tom
5 3 2 Tom
6 3 2 Tom
7 3 2 Tom
8 1 1 Anna
9 3 1 Anna
10 1 2 Tom
...
I want to loop over the dataframe, split it into sub-dataframes, and call another function of mine on each one, e.g.:
first subset df would be
A B C D
1 1 1 Anna
2 3 1 Anna
3 1 2 Anna
second subset df would be
4 3 2 Tom
5 3 2 Tom
6 3 2 Tom
7 3 2 Tom
third subset df would be
8 1 1 Anna
9 3 1 Anna
Is there a good way to loop over the main dataframe and split it?
for x in some_magic_here:
    sub_df = some_magic_here_too()
    my_fun(sub_df)
Thanks!
You need to loop over a groupby object whose groups are runs of consecutive equal values. Build the group key by comparing D with its shifted values for inequality and then taking the cumulative sum:
for i, sub_df in df.groupby(df.D.ne(df.D.shift()).cumsum()):
    print(sub_df)
    my_fun(sub_df)
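To see why this key separates consecutive runs, the following sketch rebuilds the ten visible rows from the question and looks at the key on its own. The names run_id and sub_frames are hypothetical, chosen only for illustration, and collecting the pieces in a list stands in for calling my_fun:

```python
import pandas as pd

# The ten visible rows from the question (rows after the "..." are unknown).
df = pd.DataFrame({
    "A": range(1, 11),
    "B": [1, 3, 1, 3, 3, 3, 3, 1, 3, 1],
    "C": [1, 1, 2, 2, 2, 2, 2, 1, 1, 2],
    "D": ["Anna"] * 3 + ["Tom"] * 4 + ["Anna"] * 2 + ["Tom"],
})

# True wherever D differs from the previous row (the first row is always True);
# the cumulative sum then increases by 1 at the start of every new run.
run_id = df["D"].ne(df["D"].shift()).cumsum()

# Grouping by run_id keeps the two separate "Anna" blocks apart,
# which a plain groupby("D") would merge.
sub_frames = [sub_df for _, sub_df in df.groupby(run_id)]

for sub_df in sub_frames:
    print(sub_df, end="\n\n")
```

A plain df.groupby('D') would yield only two groups here, one per name, which is why the shifted comparison is needed.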

Merging two dataframes while changing the order of the second dataframe each time

df is like so:
Week Name
1 TOM
1 BEN
1 CARL
2 TOM
2 BEN
2 CARL
3 TOM
3 BEN
3 CARL
and df1 is like so:
ID Letter
1 A
2 B
3 C
I want to merge the two dataframes so that each name is assigned a different letter each time. So the result should be like this:
Week Name Letter
1 TOM A
1 BEN B
1 CARL C
2 TOM B
2 BEN C
2 CARL A
3 TOM C
3 BEN A
3 CARL B
Any help would be greatly appreciated. Thanks in advance.
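Number the names within each week with cumcount, shift that position by the week number minus 1, wrap around with mod, and map the result onto the letters. Using the names from the question (df holds the weeks and names, df1 the letters):
df['Letter'] = df.groupby('Week').cumcount().add(df['Week'].sub(1)).mod(df.groupby('Week')['Name'].transform('count')).map(df1['Letter'])
Output:
>>> df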
Week Name Letter
0 1 TOM A
1 1 BEN B
2 1 CARL C
3 2 TOM B
4 2 BEN C
5 2 CARL A
6 3 TOM C
7 3 BEN A
8 3 CARL B
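The one-liner is dense, so here is a step-by-step sketch of the same rotation idea. The frame names weeks and letters and the intermediate names pos, offset, size and idx are hypothetical, picked to avoid confusion with the df/df1/df2 names used above; the sketch also assumes the letters frame has the default 0-based index:

```python
import pandas as pd

# Hypothetical names: "weeks" is the Week/Name frame, "letters" the ID/Letter frame.
weeks = pd.DataFrame({
    "Week": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Name": ["TOM", "BEN", "CARL"] * 3,
})
letters = pd.DataFrame({"ID": [1, 2, 3], "Letter": ["A", "B", "C"]})

pos = weeks.groupby("Week").cumcount()                      # 0, 1, 2 within each week
offset = weeks["Week"] - 1                                  # rotate one step further each week
size = weeks.groupby("Week")["Name"].transform("count")     # names per week
idx = (pos + offset) % size                                 # wrap around the end of the list

# Series.map with a Series looks each value up in that Series' index,
# so this relies on letters having the default index 0, 1, 2.
weeks["Letter"] = idx.map(letters["Letter"])

print(weeks)
```

If a week had more names than there are letters, idx could point past the last letter and the map would produce NaN for those rows.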

Pandas: how to compare two columns in different sheets and return matched value

I have two dataframes with multiple columns.
I would like to compare df1['id'] and df2['id'] and return a new df with an additional column that holds the matched value.
example:
df1
**id** **Name**
1 1 Paul
2 2 Jean
3 3 Alicia
4 4 Jennifer
df2
**id** **Name**
1 1 Paul
2 6 Jean
3 3 Alicia
4 7 Jennifer
output
**id** **Name** *correct_id*
1 1 Paul 1
2 2 Jean N/A
3 3 Alicia 3
4 4 Jennifer N/A
Note: the two columns I want to match do not have the same length.
Try:
df1["correct_id"] = (df1["id"].isin(df2["id"]) * df1["id"]).replace(0, "N/A")
print(df1)
Prints:
id Name correct_id
0 1 Paul 1
1 2 Jean N/A
2 3 Alicia 3
3 4 Jennifer N/A
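One caveat with multiplying by the boolean mask is that a genuine id of 0 would also be turned into "N/A". A possible alternative, sketched here under the assumption that a real missing value (NaN) is acceptable in place of the "N/A" string, is Series.where. Like the answer above, it tests membership in df2['id'] rather than comparing row by row:

```python
import pandas as pd

# Sample frames copied from the question.
df1 = pd.DataFrame({"id": [1, 2, 3, 4],
                    "Name": ["Paul", "Jean", "Alicia", "Jennifer"]})
df2 = pd.DataFrame({"id": [1, 6, 3, 7],
                    "Name": ["Paul", "Jean", "Alicia", "Jennifer"]})

# Keep the id where it also occurs anywhere in df2["id"]; otherwise leave NaN.
# Because isin() is a membership test, the two columns may differ in length.
df1["correct_id"] = df1["id"].where(df1["id"].isin(df2["id"]))

print(df1)
```

The new column becomes floating point because of the NaN entries; whether to convert it back or fill in a placeholder string depends on how the result is used.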

Converting Repeated Names in Dataframe to Single Values

Looking for this:
Anthony now equals 1
John now equals 2
Smith now equals 3
and so on, with every occurrence of the same name getting the same number. Looking for this:
1
1
2
2
3
3
The code is fairly long, but here is the spot where I need to convert the names to numbers:
LM = frame[['Name','COMMENT']]
Name currently holds characters in a movie, and I want to change it to numbers so that I can run an SVM model with 'Name' as the response variable.
IIUC, you need to look at pd.factorize, or convert the name to pd.Categorical and use its codes.
import numpy as np
import pandas as pd
np.random.seed(123)
df = pd.DataFrame({'Name':np.random.choice(['John','Smith','Anthony'],10)})
# factorize numbers the names in order of first appearance, starting at 0
df['Name_Code'] = pd.factorize(df.Name)[0] + 1
df
Output:
Name Name_Code
0 Anthony 1
1 Smith 2
2 Anthony 1
3 Anthony 1
4 John 3
5 Anthony 1
6 Anthony 1
7 Smith 2
8 Anthony 1
9 Smith 2
OR
df['Name_Cat_Code'] = pd.Categorical(df.Name).codes + 1
Output:
Name Name_Code Name_Cat_Code
0 Anthony 1 1
1 Smith 2 3
2 Anthony 1 1
3 Anthony 1 1
4 John 3 2
5 Anthony 1 1
6 Anthony 1 1
7 Smith 2 3
8 Anthony 1 1
9 Smith 2 3
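The two outputs differ because the approaches number the names differently: pd.factorize assigns codes in order of first appearance, while pd.Categorical (without explicit categories) sorts the unique values first. A small deterministic sketch, using a hand-written list of names rather than the random sample above, makes the difference visible:

```python
import pandas as pd

# Hand-written example data (an assumption, not the random sample from the answer).
df = pd.DataFrame({"Name": ["Anthony", "Smith", "Anthony", "John"]})

# factorize: codes follow the order of first appearance.
codes, uniques = pd.factorize(df["Name"])
df["Name_Code"] = codes + 1

# Categorical: categories are the sorted unique values, so codes follow
# alphabetical order here.
df["Name_Cat_Code"] = pd.Categorical(df["Name"]).codes + 1

print(df)
print(list(uniques))
```

Either encoding gives every repeated name the same number; which one to use mostly depends on whether the numbering should be stable under reordering of the rows.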
