Data Frame value set to 1 based on condition - python

There are two data frames:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[0,1,0,0,1], [1,0,0,1,0], [0,1,1,0,0]]))
df
0 1 2 3 4
0 0 1 0 0 1
1 1 0 0 1 0
2 0 1 1 0 0
ddff = pd.DataFrame(np.array([[3,2,1], [4,2,3], [3,1,4], [4,1,2], [2,3,1]]))
ddff
0 1 2
0 3 2 1
1 4 2 3
2 3 1 4
3 4 1 2
4 2 3 1
Now I need to modify row 0 of df based on ddff. Row 0 of ddff contains the values [3, 2, 1], so if columns 3, 2 and 1 of df hold 0 in row 0, they should be set to 1.
Expected output:
0 1 2 3 4
0 0 1 1 1 1
1 1 0 0 1 0
2 0 1 1 0 0

This sets every element of row 0 in df to 1 in the columns given by row 0 of ddff:
# a single .loc call; chained indexing such as df.iloc[0][...] may write to a temporary copy
df.loc[0, ddff.iloc[0].values] = 1
# Out:
# 0 1 2 3 4
# 0 0 1 1 1 1
# 1 1 0 0 1 0
# 2 0 1 1 0 0
Explanation: ddff.iloc[0].values returns the values in row 0 of ddff, [3, 2, 1], which serve as the column labels to select in df.
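If df could hold values other than 0 and 1, the condition "only where the current value is 0" can be made explicit. The following is a minimal sketch under that assumption; the names cols and current are illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[0, 1, 0, 0, 1], [1, 0, 0, 1, 0], [0, 1, 1, 0, 0]]))
ddff = pd.DataFrame(np.array([[3, 2, 1], [4, 2, 3], [3, 1, 4], [4, 1, 2], [2, 3, 1]]))

# Column labels of df taken from row 0 of ddff: [3, 2, 1]
cols = ddff.iloc[0].to_numpy()

# Current values of row 0 in those columns (illustrative helper name)
current = df.loc[0, cols].to_numpy()

# Write 1 only where the current value is 0; keep any other value unchanged
df.loc[0, cols] = np.where(current == 0, 1, current)

print(df)
```

For the binary data in the question this gives the same result as assigning 1 directly; the mask only matters when other values may be present.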

Related

Splitting a non delimited column and create an additional column to count which number value

I have a problem in which I want to take Table 1 and turn it into Table 2 using Python.
Does anybody have any ideas? I've tried to split the Value column from table 1 but run into issues in that each value is a different length, hence I can't always define how much to split it.
Equally, I have not worked out how to create a new column that records the position each character had in the string.
Table 1, before:
ID  Value
1   000000S
2   000FY
Table 2, after:
ID  Position  Value
1   1         0
1   2         0
1   3         0
1   4         0
1   5         0
1   6         0
1   7         S
2   1         0
2   2         0
2   3         0
2   4         F
2   5         Y
You can split the string to individual characters and explode:
out = (df
       .assign(Value=df['Value'].apply(list))
       .explode('Value')
      )
Output:
ID Value
0 1 0
0 1 0
0 1 0
0 1 0
0 1 0
0 1 0
0 1 S
1 2 0
1 2 0
1 2 0
1 2 F
1 2 Y
Given:
ID Value
0 1 000000S
1 2 000FY
Doing:
df.Value = df.Value.apply(list)
df = df.explode('Value')
df['Position'] = df.groupby('ID').cumcount() + 1
Output:
ID Value Position
0 1 0 1
0 1 0 2
0 1 0 3
0 1 0 4
0 1 0 5
0 1 0 6
0 1 S 7
1 2 0 1
1 2 0 2
1 2 0 3
1 2 F 4
1 2 Y 5
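To get exactly the layout of Table 2 (columns ID, Position, Value and a fresh index), the two steps above can be combined. This is only a sketch: the input frame is rebuilt from Table 1, and the use of ignore_index and insert is my own choice rather than part of either answer.

```python
import pandas as pd

# Table 1 from the question
df = pd.DataFrame({"ID": [1, 2], "Value": ["000000S", "000FY"]})

# Split each string into characters and give every character its own row
out = (
    df.assign(Value=df["Value"].map(list))
      .explode("Value", ignore_index=True)
)

# Number the characters within each ID, starting at 1, and place
# the new column between ID and Value as in Table 2
out.insert(1, "Position", out.groupby("ID").cumcount() + 1)

print(out)
```

Note that ignore_index in explode requires pandas 1.1 or later; on older versions, call reset_index(drop=True) after explode instead.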

Overwrite data frame value

I have two data frames, df and ddff.
The df data frame has 3 rows and 5 columns:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[0,1,0,0,1], [1,0,0,1,0], [0,1,1,0,0]]))
df
0 1 2 3 4
0 0 1 0 0 1
1 1 0 0 1 0
2 0 1 1 0 0
The ddff data frame lists the neighbour columns of each df column. It has 5 rows and 3 columns, and each value in it is a column name of df:
ddff = pd.DataFrame(np.array([[3,2,1], [4,2,3], [3,1,4], [4,1,2], [2,3,1]]))
ddff
0 1 2
0 3 2 1
1 4 2 3
2 3 1 4
3 4 1 2
4 2 3 1
Now I need a final data frame in which the neighbour columns in df are set to 1 (overwriting the previous values).
Expected output:
0 1 2 3 4
0 0 1 1 1 0
1 1 0 0 1 0
2 0 1 1 0 0
You can filter the relevant column numbers from ddff, and set the values in those columns in the first row equal to 1 and set the values in the remaining columns to 0:
relevant_columns = ddff.loc[0]
df.loc[0,relevant_columns] = 1
df.loc[0,df.columns[~df.columns.isin(relevant_columns)]] = 0
Output:
0 1 2 3 4
0 0 1 1 1 0
1 1 0 0 1 0
2 0 1 1 0 0
You can use:
s = ddff.loc[0].values
df.loc[0] = np.where(df.loc[[0]].columns.isin(s),1,0)
>>> df
0 1 2 3 4
0 0 1 1 1 0
1 1 0 0 1 0
2 0 1 1 0 0
Breaking it down:
>>> np.where(df.loc[[0]].columns.isin(s),1,0)
array([0, 1, 1, 1, 0])
# Before the update
>>> df.loc[0]
0 0
1 1
2 0
3 0
4 1
# After the assignment back
>>> df.loc[0]
0 0
1 1
2 1
3 1
4 0
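Since isin already returns a boolean mask, the same row can also be built by casting that mask to integers. This is a self-contained sketch of that variant, not something either answer proposes; it assumes, like the answers above, that only row 0 is to be overwritten.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[0, 1, 0, 0, 1], [1, 0, 0, 1, 0], [0, 1, 1, 0, 0]]))
ddff = pd.DataFrame(np.array([[3, 2, 1], [4, 2, 3], [3, 1, 4], [4, 1, 2], [2, 3, 1]]))

# True for columns listed in row 0 of ddff, False for all others
mask = df.columns.isin(ddff.loc[0].to_numpy())

# Overwrite row 0: 1 for neighbour columns, 0 everywhere else
df.loc[0] = mask.astype(int)

print(df)
```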

Pandas: sort according to a row

I have a Dataframe like this (with labels on rows and columns):
0 1 2 3
0 1 1 0 0
1 0 1 1 0
2 1 0 1 0
-1 5 6 3 2
I would like to order the columns by the values in the last row, in descending order (and then drop that row):
0 1 2 3
0 1 1 0 0
1 1 0 1 0
2 0 1 1 0
Try np.argsort to get the order, then iloc to rearrange columns and drop rows:
df.iloc[:-1, np.argsort(-df.iloc[-1])]
Output:
1 0 2 3
0 1 1 0 0
1 1 0 1 0
2 0 1 1 0
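For a runnable check, here is a sketch that rebuilds the frame from the question (the row label -1 is taken from the table above) and passes a plain NumPy array of positions to iloc. It also shows sort_values with axis=1 as a label-based alternative; both variable names, result and alt, are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 0, 0], [0, 1, 1, 0], [1, 0, 1, 0], [5, 6, 3, 2]],
    index=[0, 1, 2, -1],
)

# Positions of the columns ordered by the last row, largest value first
order = np.argsort(-df.iloc[-1].to_numpy())

# Reorder the columns and leave out the last row
result = df.iloc[:-1, order]

# Label-based alternative: sort columns by the row labelled -1, then drop it
alt = df.sort_values(by=-1, axis=1, ascending=False).drop(index=-1)

print(result)
```

The argsort variant depends only on positions, so it does not care what the last row is called; the sort_values variant needs the row label.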

How to use cell values as new columns in pandas dataframe

I have a dataframe like the following:
Labels
1 Nail_Polish,Nails
2 Nail_Polish,Nails
3 Foot_Care,Targeted_Body_Care
4 Foot_Care,Targeted_Body_Care,Skin_Care
I want to generate the following matrix:
Nail_Polish Nails Foot_Care Targeted_Body_Care Skin_Care
1 1 1 0 0 0
2 1 1 0 0 0
3 0 0 1 1 0
4 0 0 1 1 1
How can I achieve this?
Use str.get_dummies:
df2 = df['Labels'].str.get_dummies(sep=',')
The resulting output (str.get_dummies sorts the new columns alphabetically):
Foot_Care Nail_Polish Nails Skin_Care Targeted_Body_Care
1 0 1 1 0 0
2 0 1 1 0 0
3 1 0 0 0 1
4 1 0 0 1 1
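If the columns should appear in the order requested in the question (order of first appearance), they can be reordered afterwards. This is a sketch under the assumption that the frame looks like the one in the question; the names dummies, order and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Labels": [
            "Nail_Polish,Nails",
            "Nail_Polish,Nails",
            "Foot_Care,Targeted_Body_Care",
            "Foot_Care,Targeted_Body_Care,Skin_Care",
        ]
    },
    index=[1, 2, 3, 4],
)

# Indicator columns, sorted alphabetically by get_dummies
dummies = df["Labels"].str.get_dummies(sep=",")

# Labels in order of first appearance; unique() keeps that order
order = list(df["Labels"].str.split(",").explode().unique())

# Select the columns in the desired order
result = dummies[order]

print(result)
```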

How to get a cross tabulation with pandas crosstab that would display the frequency of multiple values of a column variable?

Let's say I have a dataframe:
df = pd.DataFrame(np.random.randint(0,5, size=(5,6)), columns=list('ABCDEF'))
Crossing variables with pd.crosstab is simple enough:
table = pd.crosstab(index=df['A'], columns=df['B'])
Yields:
B 1 2 3 4
A
0 1 0 0 0
1 0 0 0 1
2 0 1 1 0
3 0 1 0 0
What I would like instead is, for example, a table like this:
B (1+2+3) 1 2 3 4
A
0 1 1 0 0 0
1 0 0 0 0 1
2 2 0 1 1 0
3 1 0 1 0 0
Can anyone set me on the right track here?
Select the subset of columns and sum it along axis=1. Be aware that a small random DataFrame changes on every run, so the values in B, and therefore the columns of the crosstab, will differ. Calling np.random.seed(100) first reproduces the output shown in this answer.
table['(1+2+3)'] = table[[1,2,3]].sum(axis=1)
Sample:
np.random.seed(100)
df = pd.DataFrame(np.random.randint(0,5, size=(5,6)), columns=list('ABCDEF'))
table = pd.crosstab(index=df['A'], columns=df['B'])
table['(1+2+3)'] = table[[1,2,3]].sum(axis=1)
print (table)
B 0 1 2 3 4 (1+2+3)
A
0 1 0 0 0 1 0
1 0 0 0 1 0 1
2 0 0 1 0 0 1
3 0 1 0 0 0 1
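With random data, one of the columns 1, 2 or 3 may be absent from the crosstab, in which case table[[1,2,3]] raises a KeyError. A possible workaround, sketched here with fixed data that reproduces the first table in the question, is to reindex with fill_value=0 before summing. The fixed frame, the name wanted and the use of insert to put the new column first are my assumptions, not part of the answer.

```python
import pandas as pd

# Fixed data (an assumption, replacing the random frame) so the result is reproducible
df = pd.DataFrame({"A": [0, 1, 2, 2, 3], "B": [1, 4, 2, 3, 2]})
table = pd.crosstab(index=df["A"], columns=df["B"])

# Columns to combine; any that are missing from the crosstab count as 0
wanted = [1, 2, 3]
combined = table.reindex(columns=wanted, fill_value=0).sum(axis=1)

# Put the combined column first, as in the table the question asks for
table.insert(0, "(1+2+3)", combined)

print(table)
```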
