add a column with a counter per group in pandas [duplicate] - python

This question already has answers here:
Pandas number rows within group in increasing order
(2 answers)
Generate column of unique ID in pandas
(1 answer)
Closed 4 years ago.
I have a pandas DataFrame in python with one column A with numerical values:
A
11
12
13
12
14
I want to add a column B that contains a running counter of how many times each value in column A has occurred up to and including that row, like this:
A B
11 1
12 1
13 1
12 2
14 1
How do I create column B?
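A minimal sketch of one way to build column B, assuming your dataframe is named df and the column is A, uses groupby with cumcount:
import pandas as pd

df = pd.DataFrame({"A": [11, 12, 13, 12, 14]})

# cumcount numbers the rows within each group of equal A values starting at 0,
# so adding 1 makes the counter start at 1 as in the desired output.
df["B"] = df.groupby("A").cumcount() + 1
print(df)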

count repeated data in pandas dataframe [duplicate]

This question already has answers here:
Pandas, groupby and count
(3 answers)
Get statistics for each group (such as count, mean, etc) using pandas GroupBy?
(9 answers)
Closed 3 months ago.
I need to count how many times each (date, cod) pair is repeated in the dataframe and create a column with that count.
I have this example dataframe:
date      cod  type
20210127  29   h
20210127  29   h
20210126  26   h
20210125  26   h
I need this:
date      cod  type  count
20210127  29   h     2
20210126  26   h     1
20210125  26   h     1
I tried something like this:
df['count'] = df.apply(lamba x: df.count() if date and cod)
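That attempt is not valid Python. A minimal sketch of one way to get the desired result, assuming the dataframe is named df with the columns shown above, uses groupby with transform and then drops the duplicated rows:
import pandas as pd

df = pd.DataFrame({
    "date": [20210127, 20210127, 20210126, 20210125],
    "cod": [29, 29, 26, 26],
    "type": ["h", "h", "h", "h"],
})

# Attach the size of each (date, cod) group to every row of that group.
df["count"] = df.groupby(["date", "cod"])["cod"].transform("size")

# Collapse repeated rows so each (date, cod) pair appears once, as in the desired output.
out = df.drop_duplicates(subset=["date", "cod"]).reset_index(drop=True)
print(out)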

drop rows based on a condition on another column [duplicate]

This question already has answers here:
Get the row(s) which have the max value in groups using groupby
(15 answers)
Closed 6 months ago.
I have the following data frame
user_id  value
1        5
1        7
1        11
1        15
1        35
2        8
2        9
2        14
I want to drop all rows that do not hold the maximum value for their user_id,
resulting in a 2-row data frame:
user_id  value
1        35
2        14
How can I do that?
You can take the max of the value column after grouping by user_id.
Assuming that your original dataframe is named df, try the code below:
out = df.groupby('user_id', as_index=False)['value'].max()
>>> print(out)
Edit:
If you want to group by more than one column, use this:
out = df.groupby(['user_id', 'sex'], as_index=False, sort=False)['value'].max()
>>> print(out)
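A sketch of an alternative, not part of the original answer, that keeps every column of the max-value rows by filtering with a mask built from transform:
# Keep only the rows whose value equals the maximum of their user_id group.
out = df[df['value'] == df.groupby('user_id')['value'].transform('max')]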

Split dataframe into 3 equally sized new dataframes - Pandas [duplicate]

This question already has answers here:
Split large Dataframe into smaller equal dataframes
(2 answers)
Split a large pandas dataframe
(10 answers)
Closed 1 year ago.
I have a dataframe that is 100,227 records long.
I would like to export this dataframe into 3 equally sized CSVs.
I did the following but am getting an error. Is there perhaps a simpler approach to doing this?
df_seen = pd.read_csv("data.csv")
df1 = df_seen.shape.iloc[:, :33409]
df2 = df_seen.shape.iloc[33410:, 66818:]
df3 = df_seen.shape.iloc[66819:, 100227]
df1.to_csv('data1.csv', index=False)
df2.to_csv('data2.csv',index = False)
df3.to_csv('data3.csv',index = False)
So for example, if the data looked like the single column of 15 numbers below, I would split it into 3 columns of 5 each:
Numbers
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Desired output (note each column represents a new CSV with a single column):
1 6 11
2 7 12
3 8 13
4 9 14
5 10 15
Try using numpy.array_split:
import numpy as np
df1, df2, df3 = np.array_split(df_seen, 3)
To save each DataFrame to a separate file, you could do:
for i, df in enumerate(np.array_split(df_seen, 3)):
    df.to_csv(f"data{i+1}.csv", index=False)

How to create a dataframe from 2 dataframes if a value exists in both [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
(11 answers)
Closed 2 years ago.
I have 2 pandas DataFrames with one column (ID).
The first one looks like this:
ID
1
2
3
4
5
and the second one looks like this:
ID
3
4
5
6
7
I want to make a new DataFrame by combining those 2 DataFrames, keeping only the values that exist in both.
This is the result that I want:
ID
3
4
5
Can you show me how to do this in the most efficient way with pandas? Thank you.
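A minimal sketch, assuming the two frames are named df1 and df2: an inner merge on ID keeps only the values present in both.
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"ID": [3, 4, 5, 6, 7]})

# Inner merge keeps only the IDs that appear in both frames.
common = df1.merge(df2, on="ID", how="inner")

# An equivalent filter using isin.
common_alt = df1[df1["ID"].isin(df2["ID"])]
print(common)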

Create Range Column with duplicate values pandas [duplicate]

This question already has answers here:
Pandas DENSE RANK
(4 answers)
pandas group by and assign a group id then ungroup
(3 answers)
Closed 5 years ago.
I have a pandas dataframe with a column, call it range_id, that looks something like this:
range_id
1
1
2
2
5
5
5
8
8
10
10
...
I want to maintain the grouping (rows that share a value still share a value), but make the numbers ascend consecutively. So the new column would look like this:
range_id
1
1
2
2
3
3
3
4
4
5
5
...
I could write a lambda function that maps these values to the desired output, but I was wondering if pandas has any built-in functionality to achieve this, as it has often surprised me before with what it is capable of doing. Thanks for the help!
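A minimal sketch, assuming the column is named range_id in a frame df: a dense rank gives consecutive numbers while rows with equal values keep sharing the same number.
import pandas as pd

df = pd.DataFrame({"range_id": [1, 1, 2, 2, 5, 5, 5, 8, 8, 10, 10]})

# Dense rank maps the sorted distinct values to 1, 2, 3, ... with no gaps.
df["range_id"] = df["range_id"].rank(method="dense").astype(int)

# Alternative that numbers groups in order of first appearance:
# df["range_id"] = df.groupby("range_id", sort=False).ngroup() + 1
print(df)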
