Search and update values in other dataframe for specific columns - python

I have two different dataframes in pandas.
First:

A  B  C  D  VALUE
1  2  3  5  0
1  5  3  2  0
2  5  3  2  0
Second:

A  B  C  D  Value
5  3  3  2  1
1  5  4  3  1
I want the values of columns A and B in the first dataframe to be looked up in the second dataframe. If the A and B values match a row there, update the VALUE column from that row's Value. So the search uses only 2 columns of the other dataframe and updates only 1 column. It is basically the process we know from SQL (an UPDATE with a JOIN).
Result:

A  B  C  D  VALUE
1  2  3  5  0
1  5  3  2  1
2  5  3  2  0
Only the second row's VALUE changes, because its A and B match a row in the second dataframe. Despite my attempts, I could not get this to work: I only want the VALUE column to change, but my approach also changes A and B. I only want the VALUE of the matching rows to be updated.

You can use a merge:
cols = ['A', 'B']
# right-merge df2 onto df1's key columns so the result aligns with df1's rows,
# then take df2's Value where there is a match and keep df1's VALUE otherwise
df1['VALUE'] = (df2.merge(df1[cols], on=cols, how='right')['Value']
                   .fillna(df1['VALUE'], downcast='infer'))
Output:
A B C D VALUE
0 1 2 3 5 0
1 1 5 3 2 1
2 2 5 3 2 0
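For reference, a minimal self-contained sketch, assuming the two dataframes are built from the tables in the question:

import pandas as pd

df1 = pd.DataFrame({'A': [1, 1, 2], 'B': [2, 5, 5], 'C': [3, 3, 3],
                    'D': [5, 2, 2], 'VALUE': [0, 0, 0]})
df2 = pd.DataFrame({'A': [5, 1], 'B': [3, 5], 'C': [3, 4],
                    'D': [2, 3], 'Value': [1, 1]})

cols = ['A', 'B']
df1['VALUE'] = (df2.merge(df1[cols], on=cols, how='right')['Value']
                   .fillna(df1['VALUE'], downcast='infer'))
print(df1)  # the (1, 5) row gets VALUE 1, the other rows keep 0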

Related

Compare 2 columns and, if the value in the second column is 3, replace it in the first

I need to compare two columns: if the value in the second column is 3, replace the value in the first column with it; if the second column is 0, keep the first column as it is.
I have these two columns:
Column A  Column B
2         0
2         3
1         0
2         3
2         3
2         0
it should be like this
Column A  Column B
2         0
3         3
1         0
3         3
3         3
2         0
You can use boolean indexing for that. df['B'] == 3 returns a boolean mask that selects the rows of column A whose value should be set to 3.
df = pd.DataFrame({'A': [2, 2, 1, 2, 2, 2], 'B': [0, 3, 0, 3, 3, 0]})
df.loc[df['B'] == 3, 'A'] = 3  # single .loc call avoids chained-assignment warnings
print(df)
Output:
A B
0 2 0
1 3 3
2 1 0
3 3 3
4 3 3
5 2 0
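As a small alternative sketch (not part of the original answer), the same replacement can be written without assigning through .loc, using Series.mask:

# wherever B equals 3, replace the value in A with 3; other rows are left unchanged
df['A'] = df['A'].mask(df['B'] == 3, 3)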

Group by id and get all rows corresponds to min value in a column

I have a dataset with three columns A, B and C:
A B C
1 2 3
1 3 4
1 4 5
1 2 6
2 1 9
2 9 8
2 8 2
2 1 2
I need to get the A, B, C values of the rows where B is at its minimum, grouped by column A.
As you can see, the minimum B value is duplicated in both groups (B=2 for A=1 and B=1 for A=2). If I run this command:
dataset[['A', 'B', 'C']].loc[dataset.groupby('A').B.idxmin()]
I only get the first row per group for the minimum B. But how can I get all such rows?
Output:
A B C
1 2 3
2 1 9
Output expected:
A B C
1 2 3
1 2 6
2 1 9
2 1 2
Use GroupBy.transform to broadcast each group's minimum of B back to the rows, then compare it with column B in boolean indexing:
df = dataset[dataset.groupby('A').B.transform('min').eq(dataset['B'])]
print (df)
A B C
0 1 2 3
3 1 2 6
4 2 1 9
7 2 1 2
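As a self-contained sketch, assuming the dataset is built from the table in the question:

import pandas as pd

dataset = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 2, 2],
                        'B': [2, 3, 4, 2, 1, 9, 8, 1],
                        'C': [3, 4, 5, 6, 9, 8, 2, 2]})

# keep every row whose B equals the minimum B of its A group
df = dataset[dataset.groupby('A').B.transform('min').eq(dataset['B'])]
print(df)  # rows 0, 3, 4 and 7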

Replace string values in pandas to their count

I'm trying to calculate the count of some values in a data frame like
user_id event_type
1 a
1 a
1 b
2 a
2 b
2 c
and I want to get a table like
user_id event_type event_type_count
1 a 2
1 a 2
1 b 1
2 a 1
2 b 1
2 c 2
2 c 2
In other words, for each row I want to insert the count of that value into the data frame.
I've tried using df.join(pd.crosstab)..., but I get a large data frame with many columns.
What is a better way to solve this problem?
Use GroupBy.transform by both columns with GroupBy.size:
df['event_type_count'] = df.groupby(['user_id','event_type'])['event_type'].transform('size')
print (df)
user_id event_type event_type_count
0 1 a 2
1 1 a 2
2 1 b 1
3 2 a 1
4 2 b 1
5 2 c 2
6 2 c 2
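A self-contained sketch; the data here is reconstructed from the printed output above, so it includes the repeated (2, c) row:

import pandas as pd

# data reconstructed from the printed output above
df = pd.DataFrame({'user_id': [1, 1, 1, 2, 2, 2, 2],
                   'event_type': ['a', 'a', 'b', 'a', 'b', 'c', 'c']})

# size of each (user_id, event_type) group, broadcast back to the original rows
df['event_type_count'] = df.groupby(['user_id', 'event_type'])['event_type'].transform('size')
print(df)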

Repeating rows of a dataframe based on a column value

I have a data frame like this:
df1 = pd.DataFrame({'a': [1,2],
'b': [3,4],
'c': [6,5]})
df1
Out[150]:
a b c
0 1 3 6
1 2 4 5
Now I want to create a df that repeats each row based on the difference between columns b and c, plus 1. The difference between b and c for the first row is 6 - 3 = 3, so I want to repeat that row 3 + 1 = 4 times. Similarly, for the second row the difference is 5 - 4 = 1, so I want to repeat it 1 + 1 = 2 times. A column d is added that runs from b to c for each row (i.e. for the first row it goes 3 -> 6). So I want to get this df:
a b c d
0 1 3 6 3
0 1 3 6 4
0 1 3 6 5
0 1 3 6 6
1 2 4 5 4
1 2 4 5 5
Do it with reindex + repeat, then use groupby cumcount to assign the new value d:
# repeat each row (c - b + 1) times, then count down within each 'a' group to build d
(df1.reindex(df1.index.repeat(df1.eval('c - b').add(1)))
    .assign(d=lambda x: x.c - x.groupby('a').cumcount(ascending=False)))
Out[572]:
a b c d
0 1 3 6 3
0 1 3 6 4
0 1 3 6 5
0 1 3 6 6
1 2 4 5 4
1 2 4 5 5
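As an alternative sketch (not from the original answer), the same result can be built by expanding a per-row range and exploding it:

# build d as the range b..c for each row, then explode it into one row per value
out = (df1.assign(d=[list(range(b, c + 1)) for b, c in zip(df1['b'], df1['c'])])
          .explode('d'))
print(out)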

Count the amount of times value A occurs with value B

I'm trying to count the number of times a value in one column of a Pandas dataframe occurs together with a given value in another column, and record that count for each row.
This is what I mean:
a t
0 a 2
1 b 4
2 c 2
3 g 2
4 b 3
5 a 2
6 b 3
Say I want to count the number of times a occurs together with the number 2; I'd like the result to be:
a t freq
0 a 2 2
1 b 4 1
2 c 2 1
3 g 2 1
4 b 3 2
5 a 2 2
6 b 3 2
The freq (frequency) column here indicates the number of times a value in column a appears together with the value in column t on the same row.
Please note that a solution that only counts how often a value in column a occurs (ignoring t) would give the wrong frequency, given the size of my dataframe.
Is there a way to achieve this in Python?
Use transform with size or count:
df['freq'] = df.groupby(['a', 't'])['a'].transform('size')
#alternative solution
#df['freq'] = df.groupby(['a', 't'])['a'].transform('count')
print (df)
a t freq
0 a 2 2
1 b 4 1
2 c 2 1
3 g 2 1
4 b 3 2
5 a 2 2
6 b 3 2
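A self-contained sketch, assuming the frame is built from the table in the question. Note that 'size' counts all rows in each group while 'count' counts only non-NaN values of the selected column, so the two agree here because there is no missing data:

import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'c', 'g', 'b', 'a', 'b'],
                   't': [2, 4, 2, 2, 3, 2, 3]})

# number of rows sharing the same (a, t) pair, broadcast back to every row
df['freq'] = df.groupby(['a', 't'])['a'].transform('size')
print(df)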
