Compare 2 columns and replace the value in the first if the second column is 3 - python

I need to compare two columns: if the value in the second column is 3, it should replace the value in the first column; if the value in the second column is 0, the first column should be kept as it is.
I have these two columns:
Column A  Column B
2         0
2         3
1         0
2         3
2         3
2         0
The result should look like this:
Column A  Column B
2         0
3         3
1         0
3         3
3         3
2         0

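You can use boolean indexing for that. df['B'] == 3 returns a boolean Series that can be used as a row mask in df.loc to set column A to 3 on the matching rows.
import pandas as pd
df = pd.DataFrame({'A': [2,2,1,2,2,2], 'B': [0,3,0,3,3,0]})
# select rows and column in a single .loc call to avoid chained assignment
df.loc[df['B'] == 3, 'A'] = 3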
print(df)
Output:
A B
0 2 0
1 3 3
2 1 0
3 3 3
4 3 3
5 2 0
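The same replacement can also be written without in-place assignment. The following is a minimal sketch, assuming the column names A and B from the answer above; it uses Series.mask to take the value from B wherever B equals 3, which is one possible reading of "replace it at the first column".

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 2, 1, 2, 2, 2], 'B': [0, 3, 0, 3, 3, 0]})

# Where the condition is True, take the value from B; elsewhere keep A.
df['A'] = df['A'].mask(df['B'] == 3, df['B'])
print(df)
```

Because the replacement value is read from B rather than hard-coded, this form would also work if the trigger value were changed to something other than 3.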

Related

Selecting rows based on multiple columns in pandas - why are these two commands different?

I have a pandas DataFrame:
import pandas as pd
a = [0,0,1,1,2,7]
b = [1,0,0,1,1,4]
df = pd.DataFrame(list(zip(a,b)), columns = ('a','b'))
df
a b
0 0 1
1 0 0
2 1 0
3 1 1
4 2 1
5 7 4
I want to select all rows where both a and b are greater than zero:
Why does this command only return some of the desired rows:
df[(df['a'] & df['b'])>0]
a b
3 1 1
5 7 4
While this other command returns all of the desired rows:
df[((df['a']>0) & (df['b']>0))]
a b
3 1 1
4 2 1
5 7 4
Would that mean the row sum should be greater than 1? Do you also have negative values in these columns?
df[df.sum(axis=1)>1]
a b
3 1 1
4 2 1
5 7 4
Or keep only the values greater than zero and sum those along each row:
df[df[df>0].sum(axis=1)>1]
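On why the two commands in the question differ: applied to two integer columns, & is an element-wise bitwise AND, not a logical one. A short sketch using the question's data shows the intermediate values; the variable names bitwise, first and second are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 1, 1, 2, 7], 'b': [1, 0, 0, 1, 1, 4]})

# Bitwise AND of the integer values, row by row.
# Row 4 is 2 & 1, i.e. 0b10 & 0b01, which is 0, so it fails the "> 0" test
# even though both values are positive.
bitwise = df['a'] & df['b']
first = df[bitwise > 0]

# Comparing first yields two boolean Series; & then acts as a logical AND.
second = df[(df['a'] > 0) & (df['b'] > 0)]

print(bitwise.tolist())
print(first.index.tolist())
print(second.index.tolist())
```

The explicit comparison on each column is also the safer form compared with the sum-based filters, since a row such as a=2, b=0 has a sum greater than 1 without both values being positive.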

Search and update values in other dataframe for specific columns

I have two different dataframe in pandas.
First
A B C D VALUE
1 2 3 5 0
1 5 3 2 0
2 5 3 2 0
Second
A B C D Value
5 3 3 2 1
1 5 4 3 1
I want the A and B values of each row in the first dataframe to be looked up in the second dataframe. If both A and B match, the VALUE column should be updated. Only these 2 columns should be used for the search and only 1 column should be updated, like an UPDATE with a join in SQL.
Result
A B C D VALUE
1 2 3 5 0
1 5 3 2 1
2 5 3 2 0
Only the VALUE of the second row changes, because its A and B values (1 and 5) also appear in the second dataframe. Despite my attempts, I could not get this to work: I only want the VALUE column of matching rows to change, but A and B change as well.
You can use a merge:
cols = ['A', 'B']
df1['VALUE'] = (df2.merge(df1[cols], on=cols, how='right')
                   ['Value'].fillna(df1['VALUE'], downcast='infer')
                )
output:
A B C D VALUE
0 1 2 3 5 0
1 1 5 3 2 1
2 2 5 3 2 0
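For reference, here is a self-contained sketch of the same idea written as a left merge from df1, with an explicit integer cast in place of the downcast argument. It assumes that the (A, B) pairs in df2 are unique and that df1 has a default 0..n-1 index, so the merged rows line up with df1; the name merged is only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 1, 2], 'B': [2, 5, 5], 'C': [3, 3, 3],
                    'D': [5, 2, 2], 'VALUE': [0, 0, 0]})
df2 = pd.DataFrame({'A': [5, 1], 'B': [3, 5], 'C': [3, 4],
                    'D': [2, 3], 'Value': [1, 1]})

cols = ['A', 'B']

# Left merge keeps one row per df1 row (assuming unique keys in df2);
# 'Value' is NaN where no (A, B) match exists in df2.
merged = df1[cols].merge(df2[cols + ['Value']], on=cols, how='left')

# Fall back to the existing VALUE where there was no match, then restore ints.
df1['VALUE'] = merged['Value'].fillna(df1['VALUE']).astype(int)
print(df1)
```

Only VALUE is assigned back, so columns A, B, C and D of df1 stay untouched.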

Pandas: idempotent/force join between dataframes with column overlap

I am working in a notebook, so if I run:
df1 = df1.join(series2)
It works fine. However, if I run it again, I receive the following error:
ValueError: columns overlap but no suffix specified
This happens because running the cell twice is equivalent to df1 = df1.join(series2).join(series2). Is there any way I can force an overwrite of the overlapping columns without creating an endless number of columns with the _y suffix?
Sample df1
index a
0 0
0 1
1 2
1 3
2 4
2 5
Sample series2
index b
0 1
1 2
2 3
Desired output from df1 = df1.join(series2)
index a b
0 0 1
0 1 1
1 2 2
1 3 2
2 4 3
2 5 3
Desired output from df1 = df1.join(series2); df1 = df1.join(series2)
# same as above because of forced overwrite on either the left or right join.
index a b
0 0 1
0 1 1
1 2 2
1 3 2
2 4 3
2 5 3
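One possible way to make the cell safe to re-run is to drop any existing copy of the incoming column before joining. This is a minimal sketch under the assumption that series2 is a named Series (here named b, as in the sample data) and that overwriting the old column is acceptable.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [0, 1, 2, 3, 4, 5]}, index=[0, 0, 1, 1, 2, 2])
series2 = pd.Series([1, 2, 3], index=[0, 1, 2], name='b')

# errors='ignore' makes the drop a no-op on the first run, when 'b' is absent.
df1 = df1.drop(columns=series2.name, errors='ignore').join(series2)

# Running the same line again replaces 'b' instead of raising ValueError.
df1 = df1.drop(columns=series2.name, errors='ignore').join(series2)
print(df1)
```

The join is still index-based, so the right-hand values are repeated for the duplicated index labels of df1 exactly as in the desired output.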

Create unique MultiIndex from Non-unique Index Python Pandas

I have a pandas DataFrame with a non-unique index:
index = [1,1,1,1,2,2,2,3]
df = pd.DataFrame(data = {'col1': [1,3,7,6,2,4,3,4]}, index=index)
df
Out[12]:
col1
1 1
1 3
1 7
1 6
2 2
2 4
2 3
3 4
I'd like to turn this into unique MultiIndex and preserve order, like this:
       col1
  Ind2
1 0       1
  1       3
  2       7
  3       6
2 0       2
  1       4
  2       3
3 0       4
I would imagine pandas would have a function for something like this but haven't found anything
You can do a groupby.cumcount on the index, and then append it as a new level to the index using set_index:
df = df.set_index(df.groupby(level=0).cumcount(), append=True)
The resulting output:
     col1
1 0     1
  1     3
  2     7
  3     6
2 0     2
  1     4
  2     3
3 0     4
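A runnable version of the above, as a sketch: it additionally names the new level Ind2 to match the desired output in the question, by renaming the cumcount Series before it is appended. The variable name out is only for illustration.

```python
import pandas as pd

index = [1, 1, 1, 1, 2, 2, 2, 3]
df = pd.DataFrame(data={'col1': [1, 3, 7, 6, 2, 4, 3, 4]}, index=index)

# cumcount numbers the rows within each index value: 0, 1, 2, ...
counter = df.groupby(level=0).cumcount().rename('Ind2')

# append=True keeps the original index as the first level;
# the Series name becomes the name of the new level.
out = df.set_index(counter, append=True)
print(out)
print(out.index.is_unique)
```

Row order is preserved because cumcount returns its result in the original row order.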

Rearranging a non-consecutive order of columns in pandas dataframe

I have a pandas data frame (result) df with n (variable) columns that I generated using the merge of two other data frames:
result1 = df1.merge(df2, on='ID', how='left')
The result1 dataframe is expected to have a variable number of columns (this is part of a larger script). I want to arrange the columns so that the last 2 columns become the second and third columns, followed by all the remaining columns, while the first column stays first. If result1 is known to have 6 columns, then I could use:
result2=result1.iloc[:,[0,4,5,1,2,3]] #this works fine.
But I need the 1,2,3 part to be expressed as a range, because it is not practical to type out all the column numbers for each dataframe. So I thought of using:
result2=result1.iloc[:,[0,len(result1.columns), len(result1.columns)-1, 1:len(result1.columns-2]]
#Assuming 6 columns : 0, 5 , 4 , 1, 2, 3
That would be the ideal way, but it produces syntax errors. Any suggestions to fix this?
Instead of using slicing syntax, I'd just build a list and use that:
>>> df
0 1 2 3 4 5
0 0 1 2 3 4 5
1 0 1 2 3 4 5
2 0 1 2 3 4 5
3 0 1 2 3 4 5
4 0 1 2 3 4 5
>>> ncol = len(df.columns)
>>> df.iloc[:,[0, ncol-1, ncol-2] + list(range(1,ncol-2))]
0 5 4 1 2 3
0 0 5 4 1 2 3
1 0 5 4 1 2 3
2 0 5 4 1 2 3
3 0 5 4 1 2 3
4 0 5 4 1 2 3
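Since the question mentions a larger script with a variable number of columns, the list-building approach can be wrapped in a small helper. This is only a sketch; the function name move_last_two_forward and the minimum-width check are assumptions added for illustration.

```python
import pandas as pd

def move_last_two_forward(frame):
    """Return frame with columns ordered: first, last, second-to-last, rest."""
    ncol = frame.shape[1]
    if ncol < 3:
        raise ValueError("need at least 3 columns to reorder")
    # Positions, e.g. for 6 columns: [0, 5, 4, 1, 2, 3]
    order = [0, ncol - 1, ncol - 2] + list(range(1, ncol - 2))
    return frame.iloc[:, order]

df = pd.DataFrame([[0, 1, 2, 3, 4, 5]] * 5)
result2 = move_last_two_forward(df)
print(result2)
```

The same helper works for any width of at least three columns, because the middle block is generated with range rather than typed out.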
