Count the number of times value A occurs with value B - Python

I'm trying to count the number of times a value in one column of a Pandas dataframe occurs together with a value in another column, and attach that count to each row.
This is what I mean:
a t
0 a 2
1 b 4
2 c 2
3 g 2
4 b 3
5 a 2
6 b 3
Say I want to count the number of times a occurs together with the number 2; I'd like the result to be:
a t freq
0 a 2 2
1 b 4 1
2 c 2 1
3 g 2 1
4 b 3 2
5 a 2 2
6 b 3 2
The freq (frequency) column here indicates the number of times the value in column a appears together with the value in column t.
Please note that a solution that, for example, only counts how often a value in column a occurs (ignoring column t) would give a wrong frequency, given the size of my dataframe.
Is there a way to achieve this in Python?

Use transform with size or count:
df['freq'] = df.groupby(['a', 't'])['a'].transform('size')
#alternative solution
#df['freq'] = df.groupby(['a', 't'])['a'].transform('count')
print (df)
a t freq
0 a 2 2
1 b 4 1
2 c 2 1
3 g 2 1
4 b 3 2
5 a 2 2
6 b 3 2
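For reference, a self-contained sketch of this answer (the frame below is rebuilt from the question's sample data, so treat it as an assumption):
import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'c', 'g', 'b', 'a', 'b'],
                   't': [2, 4, 2, 2, 3, 2, 3]})

# 'size' counts every row in each (a, t) group, while 'count' counts only
# non-NaN values of the selected column, so the two differ only when that
# column contains missing values.
df['freq'] = df.groupby(['a', 't'])['a'].transform('size')
print(df)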

Related

Fill Pandas dataframe based on conditions from same row values (based on Excel COUNTIFS formula)

I have a table like the following, and I want to fill a new column based on conditions on the other columns. In this case, I count rows that:
have the same value in Cond1,
have an Amount different from zero,
and come from previous months.
The column "CountPreviousMonth" is what I need to fill in.
Cond1  Amount  Month  CountPreviousMonth
a      10      1      0
a      20      2      1
a      15      3      2
b      10      1      0
b      0       2      1
b      15      3      1
c      5       1      0
c      25      2      1
c      15      3      2
When Month is 1 the count is zero because it is the first month.
For Cond1 = b the count stays at 1 in Month 3 because the Amount in Month 2 was zero.
In Excel I used COUNTIFS, but I would like to do it in Python. I could do it with a for loop, but the real table has many rows and that wouldn't be efficient. Is there a better way to calculate it?
First replace Month with missing values where Amount=0, and then use a custom lambda function with Series.shift and forward filling of the missing values:
f = lambda x: x.shift(fill_value=0).ffill().astype(int)
df['count'] = df['Month'].mask(df['Amount'].eq(0)).groupby(df['Cond1']).transform(f)
print (df)
Cond1 Amount Month CountPreviousMonth count
0 a 10 1 0 0
1 a 20 2 1 1
2 a 15 3 2 2
3 b 10 1 0 0
4 b 0 2 1 1
5 b 15 3 1 1
6 c 5 1 0 0
7 c 25 2 1 1
8 c 15 3 2 2
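A self-contained sketch of this approach, with the sample frame rebuilt from the question (assumed values; here the target column is computed directly rather than compared against an existing one):
import pandas as pd

df = pd.DataFrame({'Cond1': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
                   'Amount': [10, 20, 15, 10, 0, 15, 5, 25, 15],
                   'Month': [1, 2, 3, 1, 2, 3, 1, 2, 3]})

# Hide Month wherever Amount is 0 so those rows do not advance the count.
months = df['Month'].mask(df['Amount'].eq(0))

# Per Cond1 group: shift down one row (the first row becomes 0), forward fill
# the gaps left by the masked rows, and cast back to int.
f = lambda x: x.shift(fill_value=0).ffill().astype(int)
df['CountPreviousMonth'] = months.groupby(df['Cond1']).transform(f)
print(df)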
Shift Amount down by 1, and check which of the shifted values are not equal to 0:
arr = df.Amount.shift().ne(0)
Get boolean where month is 1:
repl = df.Month.eq(1)
Index arr with repl, to keep track of the first month per group:
arr[repl] = True
Groupby, run a cumulative sum, and finally deduct 1, to ensure every group starts at 0:
df.assign(CountPreviousMonth = arr.groupby(df.Cond1).cumsum().sub(1))
Cond1 Amount Month CountPreviousMonth
0 a 10 1 0
1 a 20 2 1
2 a 15 3 2
3 b 10 1 0
4 b 0 2 1
5 b 15 3 1
6 c 5 1 0
7 c 25 2 1
8 c 15 3 2
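Put together as a runnable sketch (same assumed sample data as above):
import pandas as pd

df = pd.DataFrame({'Cond1': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
                   'Amount': [10, 20, 15, 10, 0, 15, 5, 25, 15],
                   'Month': [1, 2, 3, 1, 2, 3, 1, 2, 3]})

# True where the previous row's Amount was non-zero.
arr = df.Amount.shift().ne(0)
# Force the first month of every group to True so each group starts from 0.
arr[df.Month.eq(1)] = True
# Cumulative sum per Cond1 group, minus 1, is the count of prior non-zero months.
print(df.assign(CountPreviousMonth=arr.groupby(df.Cond1).cumsum().sub(1)))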

Search and update values in other dataframe for specific columns

I have two different dataframes in pandas.
First
A  B  C  D  VALUE
1  2  3  5  0
1  5  3  2  0
2  5  3  2  0
Second
A  B  C  D  Value
5  3  3  2  1
1  5  4  3  1
I want the A and B values of the first dataframe to be searched for in the second dataframe. If the A and B values match, then update the VALUE column. So: search on only two columns of the other dataframe and update only one column. This is essentially the operation we know from SQL.
Result
A  B  C  D  VALUE
1  2  3  5  0
1  5  3  2  1
2  5  3  2  0
In the result above only the second row changes, which should make it easier to follow. Despite my attempts, I could not succeed: I only want one column to change, but my attempts also changed A and B. I only want the VALUE column of the matching rows to change.
You can use a merge:
cols = ['A', 'B']
df1['VALUE'] = (df2.merge(df1[cols], on=cols, how='right')['Value']
                   .fillna(df1['VALUE'], downcast='infer'))
output:
A B C D VALUE
0 1 2 3 5 0
1 1 5 3 2 1
2 2 5 3 2 0
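As a runnable sketch, with the two frames rebuilt from the question (assumed values):
import pandas as pd

df1 = pd.DataFrame({'A': [1, 1, 2], 'B': [2, 5, 5],
                    'C': [3, 3, 3], 'D': [5, 2, 2],
                    'VALUE': [0, 0, 0]})
df2 = pd.DataFrame({'A': [5, 1], 'B': [3, 5],
                    'C': [3, 4], 'D': [2, 3],
                    'Value': [1, 1]})

cols = ['A', 'B']
# how='right' keeps df1's rows and order; unmatched rows get NaN in 'Value',
# which is then filled back from df1's original VALUE column.
df1['VALUE'] = (df2.merge(df1[cols], on=cols, how='right')['Value']
                   .fillna(df1['VALUE'], downcast='infer'))
print(df1)
Note that this relies on the (A, B) pairs being unique in the second frame; duplicates there would add extra rows to the merge result and break the index alignment used by fillna.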

Group by id and get all rows corresponding to the min value in a column

I have a dataset with three columns A,B and C.
A B C
1 2 3
1 3 4
1 4 5
1 2 6
2 1 9
2 9 8
2 8 2
2 1 2
I need to get the A, B, C values of the rows corresponding to the minimum B value within each group of column A.
As you can see, the minimum is duplicated: (A=1, B=2) and (A=2, B=1) each appear twice. If I run this command:
dataset[['A', 'B', 'C']].loc[dataset.groupby('A').B.idxmin()]
I get only the first matching row per group for the minimum B. But how can I get all such rows?
Output:
A B C
1 2 3
2 1 9
Output expected:
A B C
1 2 3
1 2 6
2 1 9
2 1 2
Use GroupBy.transform and compare by column B in boolean indexing:
df = dataset[dataset.groupby('A').B.transform('min').eq(dataset['B'])]
print (df)
A B C
0 1 2 3
3 1 2 6
4 2 1 9
7 2 1 2
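A self-contained sketch (sample data rebuilt from the question, so treat it as an assumption):
import pandas as pd

dataset = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 2, 2],
                        'B': [2, 3, 4, 2, 1, 9, 8, 1],
                        'C': [3, 4, 5, 6, 9, 8, 2, 2]})

# transform('min') broadcasts each group's minimum B back onto every row,
# so the comparison keeps all rows that reach that minimum, not just the first.
df = dataset[dataset.groupby('A').B.transform('min').eq(dataset['B'])]
print(df)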

Replace string values in pandas with their count

I'm trying to calculate the count of some values in a data frame like
user_id event_type
1 a
1 a
1 b
2 a
2 b
2 c
2 c
and I want to get table like
user_id event_type event_type_count
1 a 2
1 a 2
1 b 1
2 a 1
2 b 1
2 c 2
2 c 2
In other words, I want to put the count of each value next to the value in the data frame.
I've tried using df.join(pd.crosstab)..., but I get a large data frame with many columns.
Which way is better to solve this problem?
Use GroupBy.transform by both columns with GroupBy.size:
df['event_type_count'] = df.groupby(['user_id','event_type'])['event_type'].transform('size')
print (df)
user_id event_type event_type_count
0 1 a 2
1 1 a 2
2 1 b 1
3 2 a 1
4 2 b 1
5 2 c 2
6 2 c 2
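This is the same pattern as the first answer on this page; as a self-contained sketch (sample data rebuilt from the question):
import pandas as pd

df = pd.DataFrame({'user_id': [1, 1, 1, 2, 2, 2, 2],
                   'event_type': ['a', 'a', 'b', 'a', 'b', 'c', 'c']})

# Broadcast each (user_id, event_type) group size back onto its rows.
df['event_type_count'] = (df.groupby(['user_id', 'event_type'])['event_type']
                            .transform('size'))
print(df)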

Repeating rows of a dataframe based on a column value

I have a data frame like this:
df1 = pd.DataFrame({'a': [1, 2],
                    'b': [3, 4],
                    'c': [6, 5]})
df1
Out[150]:
a b c
0 1 3 6
1 2 4 5
Now I want to create a df that repeats each row based on the difference between columns c and b, plus 1. The difference for the first row is 6 - 3 = 3, so I want to repeat that row 3 + 1 = 4 times. Similarly, for the second row the difference is 5 - 4 = 1, so I want to repeat it 1 + 1 = 2 times. A column d is added that counts up from b to c within each original row (i.e. 3, 4, 5, 6 for the first row, going from 3 -> 6). So I want to get this df:
a b c d
0 1 3 6 3
0 1 3 6 4
0 1 3 6 5
0 1 3 6 6
1 2 4 5 4
1 2 4 5 5
Do it with reindex + repeat, then use groupby with cumcount to assign the new column d:
df1.reindex(df1.index.repeat(df1.eval('c-b').add(1)))\
   .assign(d=lambda x: x.c - x.groupby('a').cumcount(ascending=False))
Out[572]:
a b c d
0 1 3 6 3
0 1 3 6 4
0 1 3 6 5
0 1 3 6 6
1 2 4 5 4
1 2 4 5 5
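As a runnable sketch of the same idea:
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [6, 5]})

# Repeat each row c - b + 1 times, then number the copies from c downwards so
# that d runs from b up to c within each original row.
out = (df1.reindex(df1.index.repeat(df1['c'].sub(df1['b']).add(1)))
          .assign(d=lambda x: x['c'] - x.groupby('a').cumcount(ascending=False)))
print(out)
Grouping on a works here because a is unique per original row; if a could repeat, grouping on the index (x.groupby(level=0)) would be a safer key for the cumcount.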
