Looking for this:
Anthony now equals 1
John now equals 2
Smith now equals 3
and this continues even when a name is repeated. So the encoded column would look like this:
1
1
2
2
3
3
The code is fairly long, but here is the spot where I need to convert the names to numbers:
LM = frame[['Name','COMMENT']]
Name currently holds movie character names (strings), and I want to convert it to numbers so I can run an SVM model with 'Name' as the response variable.
IIUC, you need to look at pd.factorize, or convert Name to pd.Categorical and use its category codes (.codes).
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({'Name':np.random.choice(['John','Smith','Anthony'],10)})
df['Name_Code'] = pd.factorize(df.Name)[0] + 1
df
Output:
Name Name_Code
0 Anthony 1
1 Smith 2
2 Anthony 1
3 Anthony 1
4 John 3
5 Anthony 1
6 Anthony 1
7 Smith 2
8 Anthony 1
9 Smith 2
OR
df['Name_Cat_Code'] = pd.Categorical(df.Name).codes + 1
Output:
Name Name_Code Name_Cat_Code
0 Anthony 1 1
1 Smith 2 3
2 Anthony 1 1
3 Anthony 1 1
4 John 3 2
5 Anthony 1 1
6 Anthony 1 1
7 Smith 2 3
8 Anthony 1 1
9 Smith 2 3
Related
I have the following df
Original df
Step | CampaignSource | UserId
1 Banana Jeff
1 Banana John
2 Banana Jefferson
3 Website Nunes
4 Banana Jeff
5 Attendance Nunes
6 Attendance Antonio
7 Banana Antonio
8 Website Joseph
9 Attendance Joseph
9 Attendance Joseph
Desired output
Steps | CampaignSource | CountedDistinctUserid
1 Website 2 (because of the different user ids)
2 Banana 1
3 Banana 1
4 Website 1
5 Banana 1
6 Attendance 1
7 Attendance 1
8 Attendance 1
9 Attendance 1 (but I want 2 here, even though the user ids are the same, because it is the 9th step)
What I want to do is impose a condition: if the step column (which holds strings) equals '9', count the user ids as non-distinct. Any ideas on how I could do that? I tried applying a function, but I just couldn't make it work.
What I am currently doing:
df[['Steps','UserId','CampaignSource']].groupby(['Steps','CampaignSource'],as_index=False,dropna=False).nunique()
You can group by "Step" and use a condition on the group name:
df.groupby('Step')['UserId'].apply(lambda g: g.nunique() if g.name<9 else g.count())
output:
Step
1 2
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 2
Name: UserId, dtype: int64
As DataFrame:
(df.groupby('Step', as_index=False)
.agg(CampaignSource=('CampaignSource', 'first'),
CountedDistinctUserid=('UserId', lambda g: g.nunique() if g.name<9 else g.count())
)
)
output:
Step CampaignSource CountedDistinctUserid
0 1 Banana 2
1 2 Banana 1
2 3 Website 1
3 4 Banana 1
4 5 Attendance 1
5 6 Attendance 1
6 7 Banana 1
7 8 Website 1
8 9 Attendance 2
You can apply different functions to different groups, depending on whether the condition matches:
out = (df[['Steps','UserId','CampaignSource']]
.groupby(['Steps','CampaignSource'],as_index=False,dropna=False)
.apply(lambda g: g.assign(CountedDistinctUserid=( [len(g)]*len(g)
if g['Steps'].eq(9).all()
else [g['UserId'].nunique()]*len(g) ))))
print(out)
Steps UserId CampaignSource CountedDistinctUserid
0 1 Jeff Banana 2
1 1 John Banana 2
2 2 Jefferson Banana 1
3 3 Nunes Website 1
4 4 Jeff Banana 1
5 5 Nunes Attendance 1
6 6 Antonio Attendance 1
7 7 Antonio Banana 1
8 8 Joseph Website 1
9 9 Joseph Attendance 2
10 9 Joseph Attendance 2
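For completeness, a vectorized sketch of the same idea: compute both the distinct and the total count per group, then keep the total only for step 9. This assumes 'Steps' is numeric, as in the answers above.

counts = (df[['Steps','UserId','CampaignSource']]
          .groupby(['Steps','CampaignSource'], as_index=False, dropna=False)
          .agg(nunique=('UserId', 'nunique'), total=('UserId', 'size')))

# Keep the distinct count everywhere except step 9, where the raw count is wanted.
counts['CountedDistinctUserid'] = counts['nunique'].where(counts['Steps'].ne(9),
                                                          counts['total'])
print(counts[['Steps', 'CampaignSource', 'CountedDistinctUserid']])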
df is like so:
Week Name
1 TOM
1 BEN
1 CARL
2 TOM
2 BEN
2 CARL
3 TOM
3 BEN
3 CARL
and df1 is like so:
ID Letter
1 A
2 B
3 C
I want to merge the two dataframes so that each name is assigned a different letter each time. So the result should be like this:
Week Name Letter
1 TOM A
1 BEN B
1 CARL C
2 TOM B
2 BEN C
2 CARL A
3 TOM C
3 BEN A
3 CARL B
Any help would be greatly appreciated. Thanks in advance.
df['Letter'] = df.groupby('Week').cumcount().add(df['Week'].sub(1)).mod(df.groupby('Week').transform('count')['Name']).map(df1['Letter'])
Output:
>>> df
Week Name Letter
0 1 TOM A
1 1 BEN B
2 1 CARL C
3 2 TOM B
4 2 BEN C
5 2 CARL A
6 3 TOM C
7 3 BEN A
8 3 CARL B
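To unpack the one-liner, here is the same logic step by step (a sketch that assumes df1 keeps its default 0-based index, so positions 0, 1, 2 map to A, B, C):

pos = df.groupby('Week').cumcount()                    # position of each name within its week
shift = df['Week'] - 1                                 # rotate the starting letter each week
size = df.groupby('Week')['Name'].transform('count')   # names per week, so the letters wrap around
df['Letter'] = ((pos + shift) % size).map(df1['Letter'])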
I have two dataframes with multiple columns.
I would like to compare df1['id'] and df2['id'] and return a new df with an extra column that holds the matched value.
example:
df1
id Name
1 1 Paul
2 2 Jean
3 3 Alicia
4 4 Jennifer
df2
id Name
1 1 Paul
2 6 Jean
3 3 Alicia
4 7 Jennifer
output
id Name correct_id
1 1 Paul 1
2 2 Jean N/A
3 3 Alicia 3
4 4 Jennifer N/A
Note: the two columns I want to match are not the same length.
Try:
df1["correct_id"] = (df1["id"].isin(df2["id"]) * df1["id"]).replace(0, "N/A")
print(df1)
Prints:
id Name correct_id
0 1 Paul 1
1 2 Jean N/A
2 3 Alicia 3
3 4 Jennifer N/A
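If you would rather keep real missing values (NaN) instead of the string "N/A", a small variation with Series.where does the same match; it also sidesteps the edge case where a genuine id of 0 would be turned into "N/A" by the multiplication trick above:

# Keep the id where it also appears in df2, NaN otherwise.
df1['correct_id'] = df1['id'].where(df1['id'].isin(df2['id']))
print(df1)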
I have a dataframe set up similar to this
Person Value
Joe 3
Jake 4
Patrick 2
Stacey 1
Joe 5
Stacey 6
Lara 7
Joe 2
Stacey 1
I need to create a new column 'x' which keeps a running count of how many times each person's name has appeared so far in the list.
Expected output:
Person Value x
Joe 3 1
Jake 4 1
Patrick 2 1
Stacey 1 1
Joe 5 2
Stacey 6 2
Lara 7 1
Joe 2 3
Stacey 1 3
All I've managed so far is to create an overall count, which is not what I'm looking for.
Any help is appreciated
You could use:
df['x'] = df.groupby('Person').cumcount() + 1
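For reference, a self-contained version built from the data in the question:

import pandas as pd

df = pd.DataFrame({'Person': ['Joe', 'Jake', 'Patrick', 'Stacey', 'Joe',
                              'Stacey', 'Lara', 'Joe', 'Stacey'],
                   'Value': [3, 4, 2, 1, 5, 6, 7, 2, 1]})

# Running count of how many times each person has appeared so far.
df['x'] = df.groupby('Person').cumcount() + 1
print(df)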
I have a dataset like the one below. I want to populate the missing text with whatever is normal for the group. I have tried using ffill, but that doesn't help the entries that are blank at the start, and bfill has the same problem at the end. How can I do this?
Group Name
1 Annie
2 NaN
3 NaN
4 David
1 NaN
2 Bertha
3 Chris
4 NaN
Desired Output:
Group Name
1 Annie
2 Bertha
3 Chris
4 David
1 Annie
2 Bertha
3 Chris
4 David
Using collections.Counter to create a modal mapping by group:
from collections import Counter
s = df.dropna(subset=['Name'])\
.groupby('Group')['Name']\
.apply(lambda x: Counter(x).most_common()[0][0])
df['Name'] = df['Name'].fillna(df['Group'].map(s))
print(df)
Group Name
0 1 Annie
1 2 Bertha
2 3 Chris
3 4 David
4 1 Annie
5 2 Bertha
6 3 Chris
7 4 David
You can use value_counts and head:
s = df.groupby('Group')['Name'].apply(lambda x: x.value_counts().head(1).index[0])
df['Name'] = df['Name'].fillna(df['Group'].map(s))
print(df)
Output:
Group Name
0 1 Annie
1 2 Bertha
2 3 Chris
3 4 David
4 1 Annie
5 2 Bertha
6 3 Chris
7 4 David
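Another compact sketch fills each gap with the per-group mode in one step via transform. It assumes every group has at least one non-missing name, as in the sample data:

# Broadcast each group's most common name and use it to fill the gaps.
df['Name'] = df['Name'].fillna(
    df.groupby('Group')['Name'].transform(lambda x: x.mode().iat[0]))
print(df)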