count repeated data in pandas dataframe [duplicate]


This question already has answers here:
Pandas, groupby and count
(3 answers)
Get statistics for each group (such as count, mean, etc) using pandas GroupBy?
(9 answers)
Closed 3 months ago.
I need to count how many times each combination of two values (date and cod) is repeated in a DataFrame, and create a column that holds that count.
I have this example DataFrame:
date      cod  type
20210127  29   h
20210127  29   h
20210126  26   h
20210125  26   h
I need this:
date      cod  type  count
20210127  29   h     2
20210126  26   h     1
20210125  26   h     1
I tried this, but it does not work:
df['count'] = df.apply(lamba x: df.count() if date and cod)
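No working answer is shown for this question, so here is a minimal sketch of one common approach: group by the repeated columns and take the size of each group. It assumes the column names date, cod and type from the example; the variable name counts is only illustrative.

```python
import pandas as pd

# Example data copied from the question above.
df = pd.DataFrame({
    "date": [20210127, 20210127, 20210126, 20210125],
    "cod": [29, 29, 26, 26],
    "type": ["h", "h", "h", "h"],
})

# One row per distinct (date, cod, type) combination plus its number of occurrences.
# sort=False keeps the groups in order of first appearance.
counts = (
    df.groupby(["date", "cod", "type"], sort=False)
      .size()
      .reset_index(name="count")
)
print(counts)

# Alternative: keep every original row and attach the group size to each of them.
df["count"] = df.groupby(["date", "cod"])["cod"].transform("size")
print(df)
```

The first form collapses duplicates, which matches the desired output in the question. The second keeps all four rows and repeats the count on each duplicate.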

Related

drop rows based on a condition based on another [duplicate]

This question already has answers here:
Get the row(s) which have the max value in groups using groupby
(15 answers)
Closed 6 months ago.
I have the following data frame:
user_id  value
1        5
1        7
1        11
1        15
1        35
2        8
2        9
2        14
I want to drop all rows that do not hold the maximum value for their user_id, resulting in a 2-row data frame:
user_id  value
1        35
2        14
How can I do that?

You can call max on the grouped column (pandas GroupBy.max).
Assuming that your original DataFrame is named df, try the code below:
out = df.groupby('user_id', as_index=False)['value'].max()
print(out)
Edit:
If you want to group by more than one column (for example, an additional sex column), use this:
out = df.groupby(['user_id', 'sex'], as_index=False, sort=False)['value'].max()
print(out)
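If the data frame has further columns and the whole row with the maximum should be kept, one option is to look up the row labels with idxmax. The following is a sketch under the assumption that the data matches the question; it is not the only way to do this.

```python
import pandas as pd

# Example data copied from the question above.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 1, 2, 2, 2],
    "value": [5, 7, 11, 15, 35, 8, 9, 14],
})

# idxmax returns, for each group, the index label of the row holding the maximum.
# Selecting those labels with .loc keeps the complete rows, including any extra columns.
# If a group contains ties, only the first maximum is kept.
out = df.loc[df.groupby("user_id")["value"].idxmax()].reset_index(drop=True)
print(out)
```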

create a new dataframe based on given dataframe [duplicate]

This question already has answers here:
Group dataframe and get sum AND count?
(4 answers)
Closed 1 year ago.
I have a table that looks like this:
user id  observation
25       2
25       3
25       2
23       1
23       3
The desired outcome is:
user id  observation  retention
25       7            3
23       4            2
I want to keep the user id column with unique ids, sum the observation values for each id, and add another column showing how many times that id appears in the dataset.
Any help will be appreciated, thanks.

Use the groupby() method and chain the agg() method to it:
outputdf = df.groupby('user id', as_index=False).agg(observation=('observation', 'sum'), retention=('observation', 'count'))
Now if you print outputdf you will get your desired output (sorted by user id):
   user id  observation  retention
0       23            4          2
1       25            7          3

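You have to use groupby:
import pandas as pd
d = {'user id': [25, 25, 25, 33, 33], 'observation': [2, 3, 2, 1, 3]}
# build the dataframe
df = pd.DataFrame(data=d)
# aggregate with a list (not a set) so the column order is guaranteed: sum first, then count
df_new = df.groupby('user id')['observation'].agg(['sum', 'count']).reset_index()
# rename the columns as you desire
df_new.columns = ['user id', 'observation', 'retention']
print(df_new)
Output:
   user id  observation  retention
0       25            7          3
1       33            4          2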

Count the sum of a subset of the index in a pandas series [duplicate]

This question already has answers here:
Sum only certain rows in a given column of pandas dataframe
(2 answers)
Closed 3 years ago.
I have a pandas.core.series.Series with some data, and I want to calculate the sum of the values at index 0 to 13. How would I do that?
This is what I have tried so far:
#preg.prglngth.value_counts().sort_index()
prglnght_var = preg['prglngth']
prglnght_var.ser[:14]
The series data looks like this:
0 15
1 9
....
47 1
48 7
50 2
Name: prglngth, dtype: int64

You can try:
prglnght_var.loc[:13].sum()
.loc is the label-based indexer of the Series class.
It selects the rows whose index labels match the criteria you choose. Unlike an ordinary Python slice, a .loc slice includes both endpoints, so :13 selects the labels 0 to 13 (whereas .loc[:14] would also include label 14).
It returns a Series.
.sum is a Series method that adds up all the values in it.
Because the Series is already filtered to the rows you want, the result is the sum of exactly those values.
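To see how label-based and position-based slicing differ, here is a small sketch on stand-in data. The Series s is hypothetical (the real prglngth data is not reproduced here) and assumes a sorted integer index without gaps in the selected range.

```python
import pandas as pd

# Hypothetical stand-in data: labels 0..19 holding the values 1..20.
s = pd.Series(range(1, 21), index=range(20), name="prglngth")

# .loc slices by label and includes BOTH endpoints: labels 0..13 (14 values).
by_label = s.loc[:13].sum()

# .iloc slices by position and excludes the stop: positions 0..13 (14 values).
by_position = s.iloc[:14].sum()

# .loc[:14] reaches one label further and therefore also adds the value at label 14.
one_too_many = s.loc[:14].sum()

print(by_label, by_position, one_too_many)
```

The two first forms agree only when the labels are 0, 1, 2, ... without gaps. If labels are missing, .loc[:13] still means "all labels up to 13", while .iloc[:14] means "the first 14 rows".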

groupby with multiple columns with addition and frequency counts in pandas [duplicate]

This question already has answers here:
Multiple aggregations of the same column using pandas GroupBy.agg()
(4 answers)
Closed 4 years ago.
I have a table that looks as follows:
name type val
A online 12
B online 24
A offline 45
B online 32
A offline 43
B offline 44
I want to group the DataFrame by the two columns name and type, and add columns that return the number of records in each group and the sum of val over those records. It should look as follows:
name  type     count  val
A     online   1      12
      offline  2      88
B     online   2      56
      offline  1      44
I have tried df.groupby(['name', 'type'])['val'].sum(), which gives the sum, but I am unable to add the count of records.

Pass sort=False to groupby to avoid the default sorting, aggregate with agg using tuples of new column name and aggregate function, and finally call reset_index to turn the MultiIndex into columns:
df1 = (df.groupby(['name', 'type'], sort=False)['val']
         .agg([('count', 'count'), ('val', 'sum')])
         .reset_index())
print(df1)
  name     type  count  val
0    A   online      1   12
1    B   online      2   56
2    A  offline      2   88
3    B  offline      1   44

You can also try pivoting, i.e.:
df.pivot_table(index=['name', 'type'], aggfunc=['count', 'sum'], values='val')
             count sum
               val val
name type
A    offline     2  88
     online      1  12
B    offline     1  44
     online      2  56

add a column with a counter per group in pandas [duplicate]

This question already has answers here:
Pandas number rows within group in increasing order
(2 answers)
Generate column of unique ID in pandas
(1 answer)
Closed 4 years ago.
I have a pandas DataFrame in python with one column A with numerical values:
A
11
12
13
12
14
I want to add a column that contains a counter that counts the number of elements per group up to that index in column A, like this:
A B
11 1
12 1
13 1
12 2
14 1
How do I create column B?
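No answer is shown for this question. A minimal sketch, assuming the column is named A as in the example, uses groupby with cumcount, which numbers the rows inside each group in order of appearance:

```python
import pandas as pd

# Example data copied from the question above.
df = pd.DataFrame({"A": [11, 12, 13, 12, 14]})

# cumcount numbers the rows of each group starting at 0, in order of appearance,
# so adding 1 gives a running counter per distinct value of A.
df["B"] = df.groupby("A").cumcount() + 1
print(df)
```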
