Overwrite data frame value - python

I have two data frames, df and ddff.
df has 3 rows and 5 columns:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[0,1,0,0,1], [1,0,0,1,0], [0,1,1,0,0]]))
df
0 1 2 3 4
0 0 1 0 0 1
1 1 0 0 1 0
2 0 1 1 0 0
ddff lists the neighbour columns of each column of df. It has 5 rows and 3 columns, and each value in ddff is a column name of df:
ddff = pd.DataFrame(np.array([[3,2,1], [4,2,3], [3,1,4], [4,1,2], [2,3,1]]))
ddff
0 1 2
0 3 2 1
1 4 2 3
2 3 1 4
3 4 1 2
4 2 3 1
Now I need a final data frame in which the neighbour columns of df are set to 1, overwriting the previous values.
Expected output:
0 1 2 3 4
0 0 1 1 1 0
1 1 0 0 1 0
2 0 1 1 0 0

You can filter the relevant column numbers from ddff, and set the values in those columns in the first row equal to 1 and set the values in the remaining columns to 0:
relevant_columns = ddff.loc[0]
df.loc[0,relevant_columns] = 1
df.loc[0,df.columns[~df.columns.isin(relevant_columns)]] = 0
Output:
0 1 2 3 4
0 0 1 1 1 0
1 1 0 0 1 0
2 0 1 1 0 0

You can use:
s = ddff.loc[0].values
df.loc[0] = np.where(df.loc[[0]].columns.isin(s),1,0)
>>> df
0 1 2 3 4
0 0 1 1 1 0
1 1 0 0 1 0
2 0 1 1 0 0
Breaking it down:
>>> np.where(df.loc[[0]].columns.isin(s),1,0)
array([0, 1, 1, 1, 0])
# Before the update
>>> df.loc[0]
0 0
1 1
2 0
3 0
4 1
# After the assignment back
>>> df.loc[0]
0 0
1 1
2 1
3 1
4 0
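A minimal self-contained sketch of the same idea, wrapped in a helper so it can be applied to any row. The function name overwrite_row is hypothetical and not part of pandas; the sketch assumes df uses the default integer column labels shown in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[0, 1, 0, 0, 1], [1, 0, 0, 1, 0], [0, 1, 1, 0, 0]]))
ddff = pd.DataFrame(np.array([[3, 2, 1], [4, 2, 3], [3, 1, 4], [4, 1, 2], [2, 3, 1]]))


def overwrite_row(frame, neighbours, row):
    """Hypothetical helper: return a copy of `frame` in which `row` is 1 in the
    columns listed in the same row of `neighbours` and 0 everywhere else."""
    out = frame.copy()
    # Boolean mask over the column labels: True where the label is a neighbour.
    mask = out.columns.isin(neighbours.loc[row])
    out.loc[row] = mask.astype(int)
    return out


result = overwrite_row(df, ddff, 0)
print(result)
```

Because the helper works on a copy, the original df is left unchanged, which makes it easier to compare the frame before and after the update.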

Related

Splitting a non delimited column and create an additional column to count which number value

I want to turn Table 1 into Table 2 using Python.
Does anybody have any ideas? I've tried to split the Value column from Table 1, but each value has a different length, so I can't fix the number of pieces to split it into.
I also haven't worked out how to create a new column that records the position each character had in the string.
Table 1, before:
ID  Value
1   000000S
2   000FY
Table 2, after:
ID  Position  Value
1   1         0
1   2         0
1   3         0
1   4         0
1   5         0
1   6         0
1   7         S
2   1         0
2   2         0
2   3         0
2   4         F
2   5         Y
You can split each string into individual characters and explode:
out = (df
       .assign(Value=df['Value'].apply(list))
       .explode('Value')
      )
Output:
ID Value
0 1 0
0 1 0
0 1 0
0 1 0
0 1 0
0 1 0
0 1 S
1 2 0
1 2 0
1 2 0
1 2 F
1 2 Y
Given:
ID Value
0 1 000000S
1 2 000FY
Doing:
df.Value = df.Value.apply(list)
df = df.explode('Value')
df['Position'] = df.groupby('ID').cumcount() + 1
Output:
ID Value Position
0 1 0 1
0 1 0 2
0 1 0 3
0 1 0 4
0 1 0 5
0 1 0 6
0 1 S 7
1 2 0 1
1 2 0 2
1 2 0 3
1 2 F 4
1 2 Y 5
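Putting the steps together, here is a self-contained sketch that rebuilds Table 1, explodes the characters, and numbers them within each ID. Resetting the index and reordering the columns to match Table 2 are additions for illustration, not something the answers above do.

```python
import pandas as pd

# Table 1 from the question.
df = pd.DataFrame({"ID": [1, 2], "Value": ["000000S", "000FY"]})

out = (
    df.assign(Value=df["Value"].apply(list))  # each string becomes a list of characters
      .explode("Value")                       # one row per character
      .reset_index(drop=True)                 # replace the repeated index with 0..n-1
)
# Position of each character within its ID, starting at 1.
out["Position"] = out.groupby("ID").cumcount() + 1
# Column order used in Table 2.
out = out[["ID", "Position", "Value"]]
print(out)
```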

Data Frame value set to 1 based on condition

There are two data frames:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[0,1,0,0,1], [1,0,0,1,0], [0,1,1,0,0]]))
df
0 1 2 3 4
0 0 1 0 0 1
1 1 0 0 1 0
2 0 1 1 0 0
ddff = pd.DataFrame(np.array([[3,2,1], [4,2,3], [3,1,4], [4,1,2], [2,3,1]]))
ddff
0 1 2
0 3 2 1
1 4 2 3
2 3 1 4
3 4 1 2
4 2 3 1
Now I need to modify row 0 of df based on ddff. Row 0 of ddff contains the values [3,2,1], so wherever columns 3, 2 and 1 of df hold 0 in row 0, the value should be set to 1.
Expected output:
0 1 2 3 4
0 0 1 1 1 1
1 1 0 0 1 0
2 0 1 1 0 0
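This sets the elements of row 0 of df to 1 in the columns given by row 0 of ddff. It uses a single .loc call, because chained indexing such as df.iloc[0][...] = 1 may assign to a temporary copy and leave df unchanged:
df.loc[0, ddff.iloc[0].values] = 1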
# Out:
# 0 1 2 3 4
# 0 0 1 1 1 1
# 1 1 0 0 1 0
# 2 0 1 1 0 0
Explanation: ddff.iloc[0].values reads the column names from row 0 in ddff.
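A minimal self-contained sketch of this update, assuming the default integer column labels from the question. It does the assignment in a single .loc call; since df holds only 0s and 1s, setting the listed columns to 1 is the same as changing only the 0s.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[0, 1, 0, 0, 1], [1, 0, 0, 1, 0], [0, 1, 1, 0, 0]]))
ddff = pd.DataFrame(np.array([[3, 2, 1], [4, 2, 3], [3, 1, 4], [4, 1, 2], [2, 3, 1]]))

# Column labels listed in row 0 of ddff: [3, 2, 1].
cols = ddff.iloc[0].to_numpy()

# One indexing call selects row 0 and the listed columns, then assigns in place.
df.loc[0, cols] = 1
print(df)
```

Column 4 keeps its original value of 1 because it is not listed in row 0 of ddff.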

Pandas: sort according to a row

I have a Dataframe like this (with labels on rows and columns):
0 1 2 3
0 1 1 0 0
1 0 1 1 0
2 1 0 1 0
-1 5 6 3 2
I would like to order the columns according to the last row (and then drop the row):
0 1 2 3
0 1 1 0 0
1 1 0 1 0
2 0 1 1 0
Try np.argsort to get the order, then iloc to rearrange columns and drop rows:
df.iloc[:-1, np.argsort(-df.iloc[-1])]
Output:
1 0 2 3
0 1 1 0 0
1 1 0 1 0
2 0 1 1 0
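For reference, a self-contained sketch of the argsort approach on the data from the question. Converting the last row to a NumPy array first and asking for a stable sort are choices made here for illustration; they keep tied columns in their original order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 0, 0], [0, 1, 1, 0], [1, 0, 1, 0], [5, 6, 3, 2]],
    index=[0, 1, 2, -1],
)

# Positions of the columns ordered by the last row, largest value first.
order = np.argsort(-df.iloc[-1].to_numpy(), kind="stable")

# Drop the last row and rearrange the columns in one iloc call.
result = df.iloc[:-1, order]
print(result)
```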

How do I create a column whose value is the count of 1s in that row that appear for the first time in their own column?

How do I do this operation using pandas?
Initial Df:
A B C D
0 0 1 0 0
1 0 1 0 0
2 0 0 1 1
3 0 1 0 1
4 1 1 0 0
5 1 1 1 0
Final Df:
A B C D Param
0 0 1 0 0 1
1 0 1 0 0 0
2 0 0 1 1 2
3 0 1 0 1 0
4 1 1 0 0 1
5 1 1 1 0 0
Basically, Param is the number of 1s in that row that appear for the first time in their own column.
Example:
index 0 : the 1 in column B appears for the first time, hence Param = 1
index 1 : no 1 appears for the first time in its own column, hence Param = 0
index 2 : the 1s in columns C and D appear for the first time in their columns, hence Param = 2
index 3 : no 1 appears for the first time in its own column, hence Param = 0
index 4 : the 1 in column A appears for the first time in its column, hence Param = 1
index 5 : no 1 appears for the first time in its own column, hence Param = 0
I would use idxmax and value_counts:
df['Param']=df.idxmax().value_counts().reindex(df.index,fill_value=0)
df
A B C D Param
0 0 1 0 0 1
1 0 1 0 0 0
2 0 0 1 1 2
3 0 1 0 1 0
4 1 1 0 0 1
5 1 1 1 0 0
You can check for duplicated values, multiply with df and sum:
df['Param'] = df.apply(lambda x: ~x.duplicated()).mul(df).sum(1)
Output:
A B C D Param
0 0 1 0 0 1
1 0 1 0 0 0
2 0 0 1 1 2
3 0 1 0 1 0
4 1 1 0 0 1
5 1 1 1 0 0
Assuming these are integers, you can use cumsum() twice to isolate the first occurrence of 1.
df2 = (df.cumsum() > 0).cumsum() == 1
df['Param'] = df2.sum(axis = 1)
print(df)
If df elements are strings, you should first convert them to integers.
df = df.astype(int)
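One edge case worth checking is a column that contains no 1 at all. The sketch below adds a hypothetical all-zero column E to the data from the question and compares the cumsum approach with the idxmax approach. On an all-zero column, idxmax returns the first index label, so that column is counted against row 0.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [0, 0, 0, 0, 1, 1],
    "B": [1, 1, 0, 1, 1, 1],
    "C": [0, 0, 1, 0, 0, 1],
    "D": [0, 0, 1, 1, 0, 0],
    "E": [0, 0, 0, 0, 0, 0],  # hypothetical extra column with no 1
})

# True only at the first 1 in each column; an all-zero column stays False throughout.
first_one = (df.cumsum() > 0).cumsum() == 1
param_cumsum = first_one.sum(axis=1)

# idxmax picks row 0 for the all-zero column E, so row 0 is counted twice.
param_idxmax = df.idxmax().value_counts().reindex(df.index, fill_value=0)

print(param_cumsum.tolist())  # [1, 0, 2, 0, 1, 0]
print(param_idxmax.tolist())  # [2, 0, 2, 0, 1, 0]
```

If every column is known to contain at least one 1, as in the question, both approaches agree.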

How to turn cell values into new columns in a pandas dataframe

I have a dataframe like the following:
Labels
1 Nail_Polish,Nails
2 Nail_Polish,Nails
3 Foot_Care,Targeted_Body_Care
4 Foot_Care,Targeted_Body_Care,Skin_Care
I want to generate the following matrix:
Nail_Polish Nails Foot_Care Targeted_Body_Care Skin_Care
1 1 1 0 0 0
2 1 1 0 0 0
3 0 0 1 1 0
4 0 0 1 1 1
How can I achieve this?
Use str.get_dummies:
df2 = df['Labels'].str.get_dummies(sep=',')
The resulting output:
Foot_Care Nail_Polish Nails Skin_Care Targeted_Body_Care
1 0 1 1 0 0
2 0 1 1 0 0
3 1 0 0 0 1
4 1 0 0 1 1
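Note that str.get_dummies returns the columns in sorted order, while the matrix in the question lists them in order of first appearance. A minimal sketch of one way to restore that order, assuming the index 1 to 4 shown in the question:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Labels": [
            "Nail_Polish,Nails",
            "Nail_Polish,Nails",
            "Foot_Care,Targeted_Body_Care",
            "Foot_Care,Targeted_Body_Care,Skin_Care",
        ]
    },
    index=[1, 2, 3, 4],
)

# One indicator column per label, sorted alphabetically by default.
dummies = df["Labels"].str.get_dummies(sep=",")

# Labels in the order they first appear in the data.
order = list(df["Labels"].str.split(",").explode().unique())

# Select the indicator columns in that order.
result = dummies[order]
print(result)
```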
