I'm trying to count occurrences of some values in a data frame like
user_id  event_type
1        a
1        a
1        b
2        a
2        b
2        c
2        c
and I want to get a table like

user_id  event_type  event_type_count
1        a           2
1        a           2
1        b           1
2        a           1
2        b           1
2        c           2
2        c           2
In other words, I want to add the count of each value next to the values in the data frame.
I've tried df.join(pd.crosstab(...)), but I get a large data frame with many columns.
What is a better way to solve this problem?
Use GroupBy.transform by both columns with GroupBy.size:
df['event_type_count'] = df.groupby(['user_id','event_type'])['event_type'].transform('size')
print (df)
user_id event_type event_type_count
0 1 a 2
1 1 a 2
2 1 b 1
3 2 a 1
4 2 b 1
5 2 c 2
6 2 c 2
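As a self-contained sketch, reconstructing the example above end-to-end:

```python
import pandas as pd

# Reconstruct the example frame
df = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 2, 2, 2],
    "event_type": ["a", "a", "b", "a", "b", "c", "c"],
})

# For each (user_id, event_type) pair, broadcast the group size
# back onto every row of that group
df["event_type_count"] = (
    df.groupby(["user_id", "event_type"])["event_type"].transform("size")
)
```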
I have two different dataframes in pandas.

First

A  B  C  D  VALUE
1  2  3  5  0
1  5  3  2  0
2  5  3  2  0
Second

A  B  C  D  Value
5  3  3  2  1
1  5  4  3  1
I want the column values A and B in the first dataframe to be searched for in the second dataframe. If the A and B values match, then update the Value column. Search only 2 columns in the other dataframe and update only 1 column; essentially the update-join process we know from SQL.
Result

A  B  C  D  VALUE
1  2  3  5  0
1  5  3  2  1
2  5  3  2  0
If you focus on the changed value (the VALUE in the second row), you can understand it more easily. Despite my attempts, I could not succeed: I only want one column to change, but my attempts also change A and B. I only want the Value column of the matching rows to change.
You can use a merge:
cols = ['A', 'B']
df1['VALUE'] = (df2.merge(df1[cols], on=cols, how='right')
                   ['Value'].fillna(df1['VALUE'], downcast='infer')
               )
output:
A B C D VALUE
0 1 2 3 5 0
1 1 5 3 2 1
2 2 5 3 2 0
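A runnable sketch of the same merge with the data reconstructed (here `.astype(int)` is used instead of `downcast='infer'`, which recent pandas versions deprecate; it assumes the A/B key pairs are unique in df2, so the right-merge preserves df1's row count and order):

```python
import pandas as pd

df1 = pd.DataFrame({
    "A": [1, 1, 2], "B": [2, 5, 5], "C": [3, 3, 3],
    "D": [5, 2, 2], "VALUE": [0, 0, 0],
})
df2 = pd.DataFrame({
    "A": [5, 1], "B": [3, 5], "C": [3, 4], "D": [2, 3], "Value": [1, 1],
})

cols = ["A", "B"]
# The right-merge keeps every row of df1 in order; df1 rows without a
# match in df2 get NaN in "Value", which is then filled from the old VALUE
merged = df2.merge(df1[cols], on=cols, how="right")
df1["VALUE"] = merged["Value"].fillna(df1["VALUE"]).astype(int)
```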
I am trying to create a "two-entry table" from many columns in my df. I tried pivot_table / crosstab / groupby, but the results of these functions do not have the appearance I need, since they are not a "two-entry table".
For example, if I have a dataframe like this:
df
A B C D E
1 0 0 1 1
0 1 0 1 0
1 1 1 1 1
I will like to transform my df to a df which could be seen like a "two-entry table"
   A  B  C  D  E
A  2  1  1  2  2
B  1  2  1  2  1
C  1  1  1  1  1
D  2  2  1  3  2
E  2  1  1  2  2
So, to explain the first row: A has two 1s in its column, so A-A = 2; A-B = 1 because they share a 1 in the third row of df; A-C = 1 because they also share a 1 in the third row; A-D = 2 because they share 1s in the first and third rows; and finally A-E = 2 because they share 1s in the first and third rows of df.
Use pd.DataFrame.dot with T:
df.T.dot(df)  # or df.T @ df
Output:
A B C D E
A 2 1 1 2 2
B 1 2 1 2 1
C 1 1 1 1 1
D 2 2 1 3 2
E 2 1 1 2 2
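A minimal sketch, reconstructing the frame and computing the co-occurrence matrix:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 0, 1], "B": [0, 1, 1], "C": [0, 0, 1],
    "D": [1, 1, 1], "E": [1, 0, 1],
})

# For 0/1 indicator columns, X.T @ X counts, for every pair of columns,
# the number of rows where both columns are 1 (a co-occurrence matrix);
# the diagonal is each column's own count of 1s
co = df.T @ df
```

The result is symmetric by construction, since `(X.T @ X).T == X.T @ X`.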
Assuming that I have a pandas dataframe of purchases, with no invoice ID, like this:
item_id customer_id
1 A
2 A
1 B
3 C
4 C
1 A
5 A
So, my assumption is: if multiple items are bought by a customer in consecutive orders, they belong to one group. So I would like to create an order_id column as:
item_id customer_id order_id
1 A 1
2 A 1
1 B 2
3 C 3
4 C 3
1 A 4
5 A 4
The order_id shall be created automatically and incrementally. How should I do that with pandas?
Many thanks
IIUC, here's one way:
df['order_id'] = df.customer_id.ne(df.customer_id.shift()).cumsum()
OUTPUT:
item_id customer_id order_id
0 1 A 1
1 2 A 1
2 1 B 2
3 3 C 3
4 4 C 3
5 1 A 4
6 5 A 4
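A self-contained sketch of the shift/cumsum trick on the reconstructed data:

```python
import pandas as pd

df = pd.DataFrame({
    "item_id":     [1, 2, 1, 3, 4, 1, 5],
    "customer_id": ["A", "A", "B", "C", "C", "A", "A"],
})

# True wherever customer_id differs from the previous row; the cumulative
# sum of those "new block" flags numbers each consecutive run of rows
df["order_id"] = df["customer_id"].ne(df["customer_id"].shift()).cumsum()
```

Note that customer A gets two different order_ids (1 and 4) because the runs are not adjacent, which is exactly the desired grouping.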
I have the following df:
Doc Item
1 1
1 1
1 2
1 3
2 1
2 2
I want to add a third column with repeating values that (1) increment by one if there is a change in column "Item" and that also (2) restart if there is a change in column "Doc":
Doc Item NewCol
1 1 1
1 1 1
1 2 2
1 3 3
2 1 1
2 2 2
What is the best way to achieve this?
Thanks a lot.
Use GroupBy.transform with a custom lambda function with factorize:
df['NewCol'] = df.groupby('Doc')['Item'].transform(lambda x: pd.factorize(x)[0]) + 1
print (df)
Doc Item NewCol
0 1 1 1
1 1 1 1
2 1 2 2
3 1 3 3
4 2 1 1
5 2 2 2
If the values in Item are integers, it is possible to use GroupBy.rank:
df['NewCol'] = df.groupby('Doc')['Item'].rank(method='dense').astype(int)
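A sketch running both variants on the reconstructed data:

```python
import pandas as pd

df = pd.DataFrame({
    "Doc":  [1, 1, 1, 1, 2, 2],
    "Item": [1, 1, 2, 3, 1, 2],
})

# factorize numbers distinct Item values 0, 1, 2, ... within each Doc group,
# in order of first appearance; +1 makes the numbering start at 1
df["NewCol"] = (
    df.groupby("Doc")["Item"].transform(lambda x: pd.factorize(x)[0]) + 1
)

# For numeric Item values: a dense rank within each group gives the same result
rank_col = df.groupby("Doc")["Item"].rank(method="dense").astype(int)
```

One difference to keep in mind: factorize numbers values by order of first appearance, while rank(method='dense') numbers them by sorted value; the two agree here because each Doc's items appear in increasing order.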
I'm trying to count the number of times a value in a Pandas dataframe occurs along with another value, and to record that count for each row.
This is what I mean:
a t
0 a 2
1 b 4
2 c 2
3 g 2
4 b 3
5 a 2
6 b 3
Say I want to count the number of times a occurs along with the number 2; I'd like the result to be:
a t freq
0 a 2 2
1 b 4 1
2 c 2 1
3 g 2 1
4 b 3 2
5 a 2 2
6 b 3 2
The freq (frequency) column here indicates the number of times a value in column a appears together with a value in column t.
Please note that a solution that e.g. only counts the number of times a occurs would result in a wrong frequency, considering the size of my dataframe.
Is there a way to achieve this in Python?
Use transform with size or count:
df['freq'] = df.groupby(['a', 't'])['a'].transform('size')
#alternative solution
#df['freq'] = df.groupby(['a', 't'])['a'].transform('count')
print (df)
a t freq
0 a 2 2
1 b 4 1
2 c 2 1
3 g 2 1
4 b 3 2
5 a 2 2
6 b 3 2
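As a sketch of an alternative route (not from the answer above): build the pair counts once with GroupBy.size, then merge them back onto the original rows. This produces the same freq column and makes the intermediate counts available if you need them separately.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["a", "b", "c", "g", "b", "a", "b"],
    "t": [2, 4, 2, 2, 3, 2, 3],
})

# Count each (a, t) pair once, then left-merge the counts onto every row
counts = df.groupby(["a", "t"]).size().rename("freq").reset_index()
df = df.merge(counts, on=["a", "t"], how="left")
```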