Pandas Fillna with the MAX value_counts of each group - python

There are two columns in the DataFrame, named "country" and "taster_name". The "taster_name" column has some missing values. I want to fill the missing values with the most frequent (MAX VALUE_COUNTS) taster_name of each country (depending on which country the missing value belongs to), but I don't know how to do it.
With the code below, we can check the value counts of taster_name for each country.
wine[['country','taster_name']].groupby('country').taster_name.value_counts()

Try this:
df.groupby('country')['taster_name'].apply(lambda x: x.fillna(x.value_counts().index[0]))
As you didn't provide sample data, I created some myself.
Sample Input:
country taster_name
0 A abraham
1 B silva
2 A abraham
3 A NaN
4 B NaN
5 C john
6 C NaN
7 C john
8 C jacob
9 A NaN
10 B silva
11 A william
Output:
country taster_name
0 A abraham
1 B silva
2 A abraham
3 A abraham
4 B silva
5 C john
6 C john
7 C john
8 C jacob
9 A abraham
10 B silva
11 A william
Explanation:
Group by country and fill the NaN values using value_counts. By default, value_counts sorts in descending order, so you can take the first element and use it to fill the NaN values.
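As a runnable sketch of the same idea (using transform, a close relative of the apply above, with sample data modeled on the question's columns):

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['A', 'B', 'A', 'A', 'B', 'C', 'C', 'C', 'C', 'A', 'B', 'A'],
    'taster_name': ['abraham', 'silva', 'abraham', None, None, 'john',
                    None, 'john', 'jacob', None, 'silva', 'william'],
})

# Within each country, value_counts() ignores NaN and sorts descending,
# so index[0] is that country's most frequent taster.
df['taster_name'] = df.groupby('country')['taster_name'].transform(
    lambda s: s.fillna(s.value_counts().index[0])
)
```

transform aligns the result back to the original index, so it can be assigned straight into the column.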

Related

How do I create a column with a previous value (in column B) when the names (in column A) in the current and previous rows match?

I want to create a new column that copies the values in column "Value" corresponding to the same person in column "Name", taken from the immediately previous row with the same name. I want to leave an empty string when there is no previous value for that person.
I tried to use this code, but it doesn't work:
previous_value = []
col_list = df['Name'].values.tolist()
for idx in df.index:
    last_name = df['Name'].loc[idx]
    last_value = df['Value'].loc[idx]
    for i in range(len(col_list) - 1):
        actual_name = col_list[i + 1]
        if last_name == actual_name:
            previous_value.append(last_value)
        else:
            previous_value.append("")
My idea was to transform later the previous_value list into a data frame and then add it to the original data frame.
This is how it should look:
Name Value Previous_value
1 Andrew 12
2 Marco 10
3 Philips 9
4 Andrew 8 12
5 Oscar 7
6 Peter 15
7 Maria 25
8 Marco 3 10
9 Andrew 7 8
10 Oscar 19 7
11 Oscar 21 19
12 Maria 2 25
Thank you
This question was answered previously here. You can use groupby and shift to achieve this (although by default you will get NaN for the first entry, not an empty string).
df = pd.DataFrame({'Name':[1,2,3,1,2,3,1,2,3],'Value':[0,1,2,3,4,5,6,7,8]})
df['Previous_Value'] = df.groupby('Name')['Value'].shift()
For loops often don't mix well with pandas.
In this case, you want to group by name and then shift the values down by one to create the previous value column. This should do the trick:
>>> df['previous_value'] = df.groupby('Name')['Value'].shift()
>>> df
Name Value previous_value
0 Andrew 12 NaN
1 Marco 10 NaN
2 Philips 9 NaN
3 Andrew 8 12.0
4 Oscar 7 NaN
5 Peter 15 NaN
6 Maria 25 NaN
7 Marco 3 10.0
8 Andrew 7 8.0
9 Oscar 19 7.0
10 Oscar 21 19.0
11 Maria 2 25.0
You can then use fillna('') on the new column to replace the NaNs with an empty string if desired.
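Putting the two steps together, a minimal runnable sketch (with a small sample invented here):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Andrew', 'Marco', 'Andrew', 'Andrew'],
    'Value': [12, 10, 8, 7],
})

# Shift each person's values down one row within their own group,
# then turn the leading NaN into an empty string.
df['Previous_value'] = df.groupby('Name')['Value'].shift().fillna('')
```

Note that shift() returns floats, so the non-empty entries come back as 12.0 and 8.0.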

Pandas DataFrame removing NaN rows based on condition?

I'm trying to remove the rows where gender == 'male' and status == NaN.
Sample df:
name status gender leaves
0 tom NaN male 5
1 tom True male 6
2 tom True male 7
3 mary True female 1
4 mary NaN female 10
5 mary True female 15
6 john NaN male 2
7 mark True male 3
Expected Output:
name status gender leaves
0 tom True male 6
1 tom True male 7
2 mary True female 1
3 mary NaN female 10
4 mary True female 15
5 mark True male 3
You can use the isna (or isnull) function to flag the rows where status is NaN.
With this knowledge, you can filter your DataFrame using something like:
conditions = (df.gender == 'male')&(df.status.isna())
filtered_df = df[~conditions]
A good one was given by @Derlin; another way is to use fillna() to fill the NaN with -1 and filter on that, just like below:
>>> df[~((df.fillna(-1)['status']==-1)&(df['gender']=='male'))]
Just for reference, the ~ operator is the same as NumPy's np.logical_not(). So this:
df[np.logical_not((df.fillna(-1)['status'] == -1) & (df['gender'] == 'male'))] (don't forget to import numpy as np) means the same.
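The mask approach above, as a self-contained sketch (with a trimmed version of the sample data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name':   ['tom', 'tom', 'mary', 'john', 'mark'],
    'status': [np.nan, True, np.nan, np.nan, True],
    'gender': ['male', 'male', 'female', 'male', 'male'],
})

# Rows to drop: gender is male AND status is NaN; ~ keeps the complement.
conditions = (df['gender'] == 'male') & (df['status'].isna())
filtered_df = df[~conditions].reset_index(drop=True)
```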

Map multiple columns using Series from another DataFrame

I have two DataFrames. What I need is to replace the text in columns B, C, and D of df1 with the values from df2['SC'], based on the value of df2['Title'].
df1
A B C D
Dave Green Blue Yellow
Pete Red
Phil Purple
df2
A ID N SC Title
Dave 1 5 2 Green
Dave 1 10 2 Blue
Dave 1 15 3 Yellow
Pete 2 100 3 Red
Phil 3 200 4 Purple
Desired output:
A B C D
Dave 2 2 3
Pete 3
Phil 4
Using stack + map + unstack
df1.set_index('A').stack().map(df2.set_index('Title')['SC']).unstack()
B C D
A
Dave 2.0 2.0 3.0
Pete 3.0 NaN NaN
Phil 4.0 NaN NaN
If a column contains all NaN it will be lost. To avoid this you could reindex
.reindex(df1.columns, axis=1) # append to previous command
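Here is the same chain as a self-contained sketch, rebuilding the two sample frames from the question:

```python
import pandas as pd

df1 = pd.DataFrame({
    'A': ['Dave', 'Pete', 'Phil'],
    'B': ['Green', 'Red', 'Purple'],
    'C': ['Blue', None, None],
    'D': ['Yellow', None, None],
})
df2 = pd.DataFrame({
    'A': ['Dave', 'Dave', 'Dave', 'Pete', 'Phil'],
    'Title': ['Green', 'Blue', 'Yellow', 'Red', 'Purple'],
    'SC': [2, 2, 3, 3, 4],
})

# stack() turns B/C/D into one long Series keyed by (A, column);
# map() swaps each Title for its SC; unstack() restores the layout.
out = (df1.set_index('A')
          .stack()
          .map(df2.set_index('Title')['SC'])
          .unstack())
```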

How to melt a DataFrame -- get the column name in a field of the melted DataFrame

I have a df as below
name 0 1 2 3 4
0 alex NaN NaN aa bb NaN
1 mike NaN rr NaN NaN NaN
2 rachel ss NaN NaN NaN ff
3 john NaN ff NaN NaN NaN
The melted DataFrame should look like this:
name code
0 alex 2
1 alex 3
2 mike 1
3 rachel 0
4 rachel 4
5 john 1
Any suggestion is helpful. Thanks.
Just follow these steps: melt, dropna, sort by name, reset the index, and finally drop any unwanted columns:
In [1171]: df.melt(['name'], var_name='code').dropna().sort_values('name').reset_index().drop(['index', 'value'], axis=1)
Out[1171]:
name code
0 alex 2
1 alex 3
2 john 1
3 mike 1
4 rachel 0
5 rachel 4
This should work:
df.set_index('name').unstack().reset_index().rename(columns={'level_0': 'Code'}).dropna().drop(0, axis=1)[['name', 'Code']].sort_values('name')
output will be
name Code
alex 2
alex 3
john 1
mike 1
rachel 0
rachel 4
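A self-contained version of the melt route (sorting by both columns so the tie order is deterministic):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['alex', 'mike', 'rachel', 'john'],
    0: [np.nan, np.nan, 'ss', np.nan],
    1: [np.nan, 'rr', np.nan, 'ff'],
    2: ['aa', np.nan, np.nan, np.nan],
    3: ['bb', np.nan, np.nan, np.nan],
    4: [np.nan, np.nan, 'ff', np.nan],
})

# melt keeps 'name', moves the old column labels into 'code',
# dropna removes the cells that were empty, and we tidy up.
out = (df.melt(id_vars='name', var_name='code')
         .dropna()
         .sort_values(['name', 'code'])
         .reset_index(drop=True)
         .drop(columns='value'))
```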

Pandas intersection of groups

Hi, I'm trying to find the unique Players who show up in every Team.
df =
Team Player Number
A Joe 8
A Mike 10
A Steve 11
B Henry 9
B Steve 19
B Joe 4
C Mike 18
C Joe 6
C Steve 18
C Dan 1
C Henry 3
and the result should be:
result =
Team Player Number
A Joe 8
A Steve 11
B Joe 4
B Steve 19
C Joe 6
C Steve 18
since Joe and Steve are the only Players who appear in every Team
You can use a GroupBy.transform to get a count of unique teams that each player is a member of, and compare this to the overall count of unique teams. This will give you a Boolean array, which you can use to filter your DataFrame:
df = df[df.groupby('Player')['Team'].transform('nunique') == df['Team'].nunique()]
The resulting output:
Team Player Number
0 A Joe 8
2 A Steve 11
4 B Steve 19
5 B Joe 4
7 C Joe 6
8 C Steve 18
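The one-liner above, runnable against the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
    'Player': ['Joe', 'Mike', 'Steve', 'Henry', 'Steve', 'Joe',
               'Mike', 'Joe', 'Steve', 'Dan', 'Henry'],
    'Number': [8, 10, 11, 9, 19, 4, 18, 6, 18, 1, 3],
})

# Keep a row only if its player belongs to as many distinct teams
# as there are teams overall.
result = df[df.groupby('Player')['Team'].transform('nunique') == df['Team'].nunique()]
```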
