I'm trying to count the number of times a value in one column of a Pandas dataframe occurs together with a value in another column, and record that count on each row.
This is what I mean:
a t
0 a 2
1 b 4
2 c 2
3 g 2
4 b 3
5 a 2
6 b 3
Say I want to count how many times each value in column a occurs together with the value in column t; for example, a occurs with the number 2 twice. I'd like the result to be:
a t freq
0 a 2 2
1 b 4 1
2 c 2 1
3 g 2 1
4 b 3 2
5 a 2 2
6 b 3 2
The freq (frequency) column here indicates the number of times a value in column a appears together with a value in column t.
Please note that a solution that only counts how often a value in column a occurs, ignoring column t, would give wrong frequencies, considering the size of my dataframe.
Is there a way to achieve this in Python?
Use transform with size or count:
df['freq'] = df.groupby(['a', 't'])['a'].transform('size')
#alternative solution
#df['freq'] = df.groupby(['a', 't'])['a'].transform('count')
print(df)
a t freq
0 a 2 2
1 b 4 1
2 c 2 1
3 g 2 1
4 b 3 2
5 a 2 2
6 b 3 2
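An equivalent route, in case transform is unfamiliar, is to count each (a, t) pair once with groupby(...).size() and merge the counts back onto the rows (a sketch on the sample data above):

```python
import pandas as pd

df = pd.DataFrame({'a': list('abcgbab'), 't': [2, 4, 2, 2, 3, 2, 3]})

# Count each (a, t) pair once, then attach the counts back to every row.
counts = df.groupby(['a', 't']).size().reset_index(name='freq')
out = df.merge(counts, on=['a', 't'], how='left')
print(out)
```

A left merge preserves the row order of df, so the result lines up with the original frame.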
Related
I have a table like the following, and I want to fill a new column based on conditions from other columns. In this case:
Having the same value in Cond1
Amount being different than zero
Being from previous months
Column "CountPreviousMonth" is what I need to fill out
Cond1 Amount Month CountPreviousMonth
a     10     1     0
a     20     2     1
a     15     3     2
b     10     1     0
b     0      2     1
b     15     3     1
c     5      1     0
c     25     2     1
c     15     3     2
When Month is 1 the count is zero because it is the first month.
For Cond1=b the count stays at 1 because in Month 2 the Amount was zero.
In Excel I used COUNTIFS, but I would like to do this in Python. I could do it with a for loop, but the real table has many rows, so that would not be efficient. Is there a better way to calculate it?
First replace Month with missing values where Amount=0, and then use a custom lambda function with Series.shift and forward-fill the missing values:
f = lambda x: x.shift(fill_value=0).ffill().astype(int)
df['count'] = df['Month'].mask(df['Amount'].eq(0)).groupby(df['Cond1']).transform(f)
print(df)
Cond1 Amount Month CountPreviousMonth count
0 a 10 1 0 0
1 a 20 2 1 1
2 a 15 3 2 2
3 b 10 1 0 0
4 b 0 2 1 1
5 b 15 3 1 1
6 c 5 1 0 0
7 c 25 2 1 1
8 c 15 3 2 2
Shift Amount down by one row and check which values are not equal to 0:
arr = df.Amount.shift().ne(0)
Get a boolean mask where Month is 1:
repl = df.Month.eq(1)
Index arr with repl to mark the first month of each group as True:
arr[repl] = True
Group by Cond1, run a cumulative sum, and finally subtract 1 so that every group starts at 0:
df.assign(CountPreviousMonth = arr.groupby(df.Cond1).cumsum().sub(1))
Cond1 Amount Month CountPreviousMonth
0 a 10 1 0
1 a 20 2 1
2 a 15 3 2
3 b 10 1 0
4 b 0 2 1
5 b 15 3 1
6 c 5 1 0
7 c 25 2 1
8 c 15 3 2
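Putting the steps above together as a runnable sketch (the sample frame is reconstructed from the question; the boolean mask is cast to int before the cumulative sum for portability across pandas versions):

```python
import pandas as pd

df = pd.DataFrame({
    'Cond1':  list('aaabbbccc'),
    'Amount': [10, 20, 15, 10, 0, 15, 5, 25, 15],
    'Month':  [1, 2, 3, 1, 2, 3, 1, 2, 3],
})

# A month counts toward later rows only if the previous Amount was nonzero.
arr = df['Amount'].shift().ne(0)
arr[df['Month'].eq(1)] = True  # force True at each group's first month
df['CountPreviousMonth'] = arr.astype(int).groupby(df['Cond1']).cumsum().sub(1)
print(df)
```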
I have two different dataframes in pandas.
First
A B C D VALUE
1 2 3 5 0
1 5 3 2 0
2 5 3 2 0
Second
A B C D Value
5 3 3 2 1
1 5 4 3 1
I want the values of columns A and B in the first dataframe to be looked up in the second dataframe. If the A and B values match, then update the VALUE column. So: search on two columns of the other dataframe and update only one column, like the corresponding operation in SQL.
Result
A B C D VALUE
1 2 3 5 0
1 5 3 2 1
2 5 3 2 0
The second row is the one that changes: its A and B values (1, 5) match a row in the second dataframe, so its VALUE becomes 1. Despite my attempts, I could not get this to work. I only want the VALUE column to change, but my attempts also changed A and B. I only want the VALUE column of the matching rows to change.
You can use a merge:
cols = ['A', 'B']
df1['VALUE'] = (df2.merge(df1[cols], on=cols, how='right')
['Value'].fillna(df1['VALUE'], downcast='infer')
)
output:
A B C D VALUE
0 1 2 3 5 0
1 1 5 3 2 1
2 2 5 3 2 0
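An alternative that avoids the fillna step is to align both frames on an (A, B) index and use Series.update, which only overwrites entries whose keys match. A sketch, assuming the (A, B) pairs in df2 are unique:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 1, 2], 'B': [2, 5, 5], 'C': [3, 3, 3],
                    'D': [5, 2, 2], 'VALUE': [0, 0, 0]})
df2 = pd.DataFrame({'A': [5, 1], 'B': [3, 5], 'C': [3, 4],
                    'D': [2, 3], 'Value': [1, 1]})

# Index both value columns by the (A, B) key; update in place where keys match.
s = df1.set_index(['A', 'B'])['VALUE']
s.update(df2.set_index(['A', 'B'])['Value'])
df1['VALUE'] = s.to_numpy()  # set_index does not change row order
print(df1)
```

Only VALUE is touched; A, B, C and D stay exactly as they were.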
I have a dataset with three columns A,B and C.
A B C
1 2 3
1 3 4
1 4 5
1 2 6
2 1 9
2 9 8
2 8 2
2 1 2
I need to get the values of the A, B, C columns corresponding to the minimum B value, grouped by column A.
As you can see, I have duplicated values for (A=1, B=2) and (A=2, B=1). If I run this command:
dataset[['A','B','C']].loc[dataset.groupby('A').B.idxmin()]
I get only the first row per group for the minimum B. But how can I get all such rows?
Output:
A B C
1 2 3
2 1 9
Output expected:
A B C
1 2 3
1 2 6
2 1 9
2 1 2
Use GroupBy.transform to broadcast each group's minimum, then compare it with column B in boolean indexing:
df = dataset[dataset.groupby('A').B.transform('min').eq(dataset['B'])]
print(df)
A B C
0 1 2 3
3 1 2 6
4 2 1 9
7 2 1 2
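The same filter can also be written with Series.map instead of transform, by mapping each group's minimum of B back onto column A (a sketch on the question's data):

```python
import pandas as pd

dataset = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 2, 2],
                        'B': [2, 3, 4, 2, 1, 9, 8, 1],
                        'C': [3, 4, 5, 6, 9, 8, 2, 2]})

# Per-group minimum of B, broadcast back to the rows via map.
min_b = dataset['A'].map(dataset.groupby('A')['B'].min())
out = dataset[dataset['B'] == min_b]
print(out)
```

Unlike idxmin, this keeps every row that ties for the group minimum.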
I'm trying to calculate the count of some values in a data frame like
user_id event_type
1 a
1 a
1 b
2 a
2 b
2 c
2 c
and I want to get table like
user_id event_type event_type_count
1 a 2
1 a 2
1 b 1
2 a 1
2 b 1
2 c 2
2 c 2
In other words, I want to attach the count of each value to the rows of the data frame.
I've tried df.join(pd.crosstab(...)), but I get a large data frame with many columns.
What is a better way to solve this problem?
Use GroupBy.transform by both columns with GroupBy.size:
df['event_type_count'] = df.groupby(['user_id','event_type'])['event_type'].transform('size')
print(df)
user_id event_type event_type_count
0 1 a 2
1 1 a 2
2 1 b 1
3 2 a 1
4 2 b 1
5 2 c 2
6 2 c 2
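Since the question mentions pd.crosstab: that route can work too, if you stack the crosstab into a Series of pair counts and look the rows up against it. A sketch (the transform answer above is simpler):

```python
import pandas as pd

df = pd.DataFrame({'user_id': [1, 1, 1, 2, 2, 2, 2],
                   'event_type': ['a', 'a', 'b', 'a', 'b', 'c', 'c']})

# crosstab holds one count per (user_id, event_type); stack makes it a lookup Series.
pair_counts = pd.crosstab(df['user_id'], df['event_type']).stack()
keys = pd.MultiIndex.from_frame(df[['user_id', 'event_type']])
df['event_type_count'] = pair_counts.reindex(keys).to_numpy()
print(df)
```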
I have a data frame like this:
df1 = pd.DataFrame({'a': [1,2],
'b': [3,4],
'c': [6,5]})
df1
Out[150]:
a b c
0 1 3 6
1 2 4 5
Now I want to create a df that repeats each row based on the difference between columns c and b, plus 1. The difference for the first row is 6 - 3 = 3, so I want to repeat that row 3 + 1 = 4 times. Similarly, for the second row the difference is 5 - 4 = 1, so I want to repeat it 1 + 1 = 2 times. A column d is added that counts from b up to c within each repeated block (so for the first row it goes 3 -> 6). So I want to get this df:
a b c d
0 1 3 6 3
0 1 3 6 4
0 1 3 6 5
0 1 3 6 6
1 2 4 5 4
1 2 4 5 5
Do it with reindex plus Index.repeat, then assign the new column d using groupby cumcount:
df1.reindex(df1.index.repeat(df1.eval('c - b').add(1)))\
   .assign(d=lambda x: x.c - x.groupby('a').cumcount(ascending=False))
Out[572]:
a b c d
0 1 3 6 3
0 1 3 6 4
0 1 3 6 5
0 1 3 6 6
1 2 4 5 4
1 2 4 5 5
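An equivalent numpy-based sketch, building the repeated rows with Index.repeat and column d from explicit ranges:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [6, 5]})

reps = df1['c'] - df1['b'] + 1                    # copies per row: [4, 2]
out = df1.loc[df1.index.repeat(reps)].copy()
# d counts up from b to c inside each repeated block
out['d'] = np.concatenate([np.arange(b, c + 1) for b, c in zip(df1['b'], df1['c'])])
print(out)
```

Building d from np.arange makes the "b up to c" intent explicit, at the cost of a Python-level loop over the (here, very few) original rows.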