This question already has answers here:
Get the row(s) which have the max value in groups using groupby
(15 answers)
Closed 6 months ago.
I have the following data frame
user_id  value
1        5
1        7
1        11
1        15
1        35
2        8
2        9
2        14
I want to drop all rows that are not the maximum value for each user_id, resulting in a 2-row data frame:
user_id  value
1        35
2        14
How can I do that?
You can use max() after the grouping.
Assuming that your original dataframe is named df, try the code below:
out = df.groupby('user_id', as_index=False)['value'].max()
print(out)

   user_id  value
0        1     35
1        2     14
Edit:
If you want to group by more than one column, use this:
out = df.groupby(['user_id', 'sex'], as_index=False, sort=False)['value'].max()
print(out)
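If you need to keep the entire original rows at each per-user maximum (rather than just the grouped columns), one common alternative is idxmax. A minimal sketch, assuming the frame is named df and built from the sample data above:

import pandas as pd

# Sample data from the question
df = pd.DataFrame({'user_id': [1, 1, 1, 1, 1, 2, 2, 2],
                   'value': [5, 7, 11, 15, 35, 8, 9, 14]})

# idxmax returns the index label of the max 'value' within each user_id group,
# and loc then keeps the full rows for those labels.
out = df.loc[df.groupby('user_id')['value'].idxmax()]
print(out)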
This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 1 year ago.
I have multiple dataframes that each have an ID and a value, and I am trying to merge them so that each ID has all of its values in one row.
ID  Value
1   10
3   21
4   12
5   43
7   11
And then I have another dataframe:
ID  Value2
1   12
2   14
4   55
6   23
7   90
I want to merge these two in a way that considers the IDs already in the first dataframe, and if an ID from the second dataframe is not in the first one, it adds a row for that ID with Value2 filled in and Value left empty. This is what my result would look like:
ID  Value  Value2
1   10     12
3   21     -
4   12     55
5   43     -
7   11     90
2   -      14
6   -      23
Hope this makes sense. I don't really care about the order of the ID numbers; they can be sorted or not. My goal is to be able to create a dictionary for each ID, with "Value", "Value2", "Value3", ... as keys and the corresponding value numbers as the dictionary values. Please let me know if any clarification is needed.
You can use pandas' merge method (see the pandas documentation for DataFrame.merge):
import pandas as pd
df1.merge(df2, how='outer', on='ID')
Specifying how='outer' uses the union of the keys from both dataframes.
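As a minimal sketch with the sample data above (the frame names df1 and df2 are assumptions), the outer merge can also be turned into the per-ID dictionaries mentioned in the question:

import pandas as pd

df1 = pd.DataFrame({'ID': [1, 3, 4, 5, 7], 'Value': [10, 21, 12, 43, 11]})
df2 = pd.DataFrame({'ID': [1, 2, 4, 6, 7], 'Value2': [12, 14, 55, 23, 90]})

# Outer merge keeps every ID from either frame; missing entries become NaN.
merged = df1.merge(df2, how='outer', on='ID')

# One dict per ID, e.g. {1: {'Value': 10.0, 'Value2': 12.0}, ...}
per_id = merged.set_index('ID').to_dict(orient='index')
print(per_id)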
This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 1 year ago.
I have two data frames:
df1:
ID COUNT
0 202485 6
1 215893 8
2 181840 8
3 168337 7
and another dataframe
df2:
ID
0 202485
1 215893
2 181840
I want to filter / left join the two dataframes.
The desired result is:
ID COUNT
0 202485 6
1 215893 8
2 181840 8
I tried df1.merge(df2, how='inner', on='ID') but got an error like: You are trying to merge on object and int64 columns.
I also used isin, but it didn't work:
list=df1['ID'].drop_duplicates().to_list()
df1[df1['ID'].isin(list)]
Any help?
import pandas as pd

df1 = pd.DataFrame({'ID': [202485, 215893, 181840, 168337], 'COUNT': [6, 8, 8, 7]})
df2 = pd.DataFrame({"ID": [202485, 215893, 181840]})

out_df = pd.merge(df1, df2)
print(out_df)
This gives the desired result:
ID COUNT
0 202485 6
1 215893 8
2 181840 8
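The error mentioned in the question ("You are trying to merge on object and int64 columns") usually means the two ID columns have different dtypes. A minimal sketch of one way to handle that, assuming the frames are named df1 and df2 and the IDs are genuinely numeric, is to cast them to a common type before merging:

# Cast both ID columns to the same integer dtype, then merge as before.
df1['ID'] = df1['ID'].astype('int64')
df2['ID'] = df2['ID'].astype('int64')
out_df = df1.merge(df2, how='inner', on='ID')
print(out_df)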
This question already has answers here:
Group dataframe and get sum AND count?
(4 answers)
Closed 1 year ago.
I have a table that looks like this:
user id  observation
25       2
25       3
25       2
23       1
23       3
The desired outcome is:
user id  observation  retention
25       7            3
23       4            2
I want to keep the user id column with unique ids, sum up the observation column values for each id, and have another column showing how many times each id appears in the dataset.
Any help will be appreciated, thanks.
Use the groupby() method and chain the agg() method to it:
outputdf = df.groupby('user id', as_index=False).agg(observation=('observation', 'sum'), retention=('observation', 'count'))
Now if you print outputdf you will get your desired output:
user id observation retention
0 23 4 2
1 25 7 3
You have to use groupby:
import pandas as pd

d = {'user id': [25, 25, 25, 23, 23], 'observation': [2, 3, 2, 1, 3]}
# get the dataframe
df = pd.DataFrame(data=d)
# aggregate the observation column with both sum and count per user id
df_new = df.groupby('user id')['observation'].agg(['sum', 'count']).reset_index()
# rename the columns as you desire
df_new.columns = ['user id', 'observation', 'retention']
df_new
Output:

   user id  observation  retention
0       23            4          2
1       25            7          3
This question already has an answer here:
Replace value in a specific with corresponding value
(1 answer)
Closed 3 years ago.
I have the following df1:
Id value
'so' 5
'fe' 6
'd1' 4
Then I have a ref_df:
Id value
'so' 3
'fe' 3
'ju' 2
'd1' 1
I want to check whether any of the Ids in ref_df appear in df1, and if so, replace the value in df1 with the value from ref_df.
The desired output would be:
Id value
'so' 3
'fe' 3
'd1' 1
How can I achieve this?
Try this:
df1['value'] = df1['Id'].map(ref_df.set_index('Id')['value'])
Output:
   Id  value
0  so      3
1  fe      3
2  d1      1
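A minimal end-to-end sketch with the sample frames from the question (the frame names are taken from the question; dropping the quotes around the Ids is an assumption):

import pandas as pd

df1 = pd.DataFrame({'Id': ['so', 'fe', 'd1'], 'value': [5, 6, 4]})
ref_df = pd.DataFrame({'Id': ['so', 'fe', 'ju', 'd1'], 'value': [3, 3, 2, 1]})

# Look up each Id of df1 in ref_df; Ids missing from ref_df would map to NaN,
# so .fillna(df1['value']) could be chained to keep the original value instead.
df1['value'] = df1['Id'].map(ref_df.set_index('Id')['value'])
print(df1)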
This question already has answers here:
Pandas number rows within group in increasing order
(2 answers)
Generate column of unique ID in pandas
(1 answer)
Closed 4 years ago.
I have a pandas DataFrame in Python with one column A with numerical values:
A
11
12
13
12
14
I want to add a column that contains a counter that counts the number of elements per group up to that index in column A, like this:
A B
11 1
12 1
13 1
12 2
14 1
How do I create column B?
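A minimal sketch of one way to build such a counter, using groupby().cumcount() (the approach behind the linked duplicates); the frame name df is an assumption:

import pandas as pd

df = pd.DataFrame({'A': [11, 12, 13, 12, 14]})

# cumcount() numbers the occurrences within each group starting at 0,
# so adding 1 makes the counter start at 1.
df['B'] = df.groupby('A').cumcount() + 1
print(df)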