I have a DataFrame like this (with labels on rows and columns):
    0  1  2  3
0   1  1  0  0
1   0  1  1  0
2   1  0  1  0
-1  5  6  3  2
I would like to order the columns according to the last row (and then drop the row):
   0  1  2  3
0  1  1  0  0
1  1  0  1  0
2  0  1  1  0
Try np.argsort to get the (descending) column order, then iloc to rearrange the columns and drop the last row:
df.iloc[:-1, np.argsort(-df.iloc[-1])]
Output:
   1  0  2  3
0  1  1  0  0
1  1  0  1  0
2  0  1  1  0
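For reference, a minimal reproducible sketch of the same approach, with the frame built from the values in the question:

import numpy as np
import pandas as pd

# frame from the question; -1 labels the row holding the sort keys
df = pd.DataFrame(
    [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [1, 0, 1, 0],
     [5, 6, 3, 2]],
    index=[0, 1, 2, -1],
)

# argsort of the negated last row gives the descending column order;
# iloc[:-1, ...] drops that last row in the same step
out = df.iloc[:-1, np.argsort(-df.iloc[-1])]
print(out)
#    1  0  2  3
# 0  1  1  0  0
# 1  1  0  1  0
# 2  0  1  1  0

# optional: relabel the columns 0..3 to match the expected output exactly
# out.columns = range(out.shape[1])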
I have a problem in which I want to take Table 1 and turn it into Table 2 using Python.
Does anybody have any ideas? I've tried to split the Value column from Table 1, but I run into issues because each value is a different length, so I can't always define how much to split it.
Equally, I have not been able to work out how to create a new column that records the position each character had within the string.
Table 1, before:
ID  Value
1   000000S
2   000FY
Table 2, after:
ID  Position  Value
1   1         0
1   2         0
1   3         0
1   4         0
1   5         0
1   6         0
1   7         S
2   1         0
2   2         0
2   3         0
2   4         F
2   5         Y
You can split the string into individual characters and explode:
out = (df
.assign(Value=df['Value'].apply(list))
.explode('Value')
)
Output:
ID Value
0 1 0
0 1 0
0 1 0
0 1 0
0 1 0
0 1 0
0 1 S
1 2 0
1 2 0
1 2 0
1 2 F
1 2 Y
Given:
ID Value
0 1 000000S
1 2 000FY
Doing:
df.Value = df.Value.apply(list)
df = df.explode('Value')
df['Position'] = df.groupby('ID').cumcount() + 1
Output:
ID Value Position
0 1 0 1
0 1 0 2
0 1 0 3
0 1 0 4
0 1 0 5
0 1 0 6
0 1 S 7
1 2 0 1
1 2 0 2
1 2 0 3
1 2 F 4
1 2 Y 5
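Putting the two answers together, a self-contained sketch; the reordering to the Table 2 column layout and the index reset are small additions not shown in the answers above:

import pandas as pd

df = pd.DataFrame({'ID': [1, 2], 'Value': ['000000S', '000FY']})

# split each string into characters, explode to one row per character,
# then number each character's position within its ID
out = df.assign(Value=df['Value'].apply(list)).explode('Value')
out['Position'] = out.groupby('ID').cumcount() + 1

# match the Table 2 layout: column order ID, Position, Value and a fresh index
out = out[['ID', 'Position', 'Value']].reset_index(drop=True)
print(out)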
There are two data frames:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[0,1,0,0,1], [1,0,0,1,0], [0,1,1,0,0]]))
df
0 1 2 3 4
0 0 1 0 0 1
1 1 0 0 1 0
2 0 1 1 0 0
ddff = pd.DataFrame(np.array([[3,2,1], [4,2,3], [3,1,4], [4,1,2], [2,3,1]]))
ddff
0 1 2
0 3 2 1
1 4 2 3
2 3 1 4
3 4 1 2
4 2 3 1
Now, I need to modify the row 0 values of df based on the ddff data frame. Row 0 of ddff contains the values [3, 2, 1]; if columns 3, 2 and 1 of df have the value 0, set them to 1.
Expected output:
0 1 2 3 4
0 0 1 1 1 1
1 1 0 0 1 0
2 0 1 1 0 0
This will replace all elements of row 0 of df in the columns given by row 0 of ddff (using a single .loc assignment rather than chained indexing, which may not write back to df):
df.loc[0, ddff.iloc[0].values] = 1
# Out:
# 0 1 2 3 4
# 0 0 1 1 1 1
# 1 1 0 0 1 0
# 2 0 1 1 0 0
Explanation: ddff.iloc[0].values reads the column names from row 0 in ddff.
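For reference, a minimal end-to-end sketch with the frames from the question:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[0, 1, 0, 0, 1], [1, 0, 0, 1, 0], [0, 1, 1, 0, 0]]))
ddff = pd.DataFrame(np.array([[3, 2, 1], [4, 2, 3], [3, 1, 4], [4, 1, 2], [2, 3, 1]]))

# ddff.iloc[0].values -> array([3, 2, 1]), used as column labels of df
df.loc[0, ddff.iloc[0].values] = 1
print(df)
#    0  1  2  3  4
# 0  0  1  1  1  1
# 1  1  0  0  1  0
# 2  0  1  1  0  0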
I have two data frames, df and ddff.
The df data frame has 3 rows and 5 columns:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[0,1,0,0,1], [1,0,0,1,0], [0,1,1,0,0]]))
df
0 1 2 3 4
0 0 1 0 0 1
1 1 0 0 1 0
2 0 1 1 0 0
The ddff data frame holds the neighbour columns of each particular column; it has 5 rows and 3 columns, and each value in ddff represents a column name of df:
ddff = pd.DataFrame(np.array([[3,2,1], [4,2,3], [3,1,4], [4,1,2], [2,3,1]]))
ddff
0 1 2
0 3 2 1
1 4 2 3
2 3 1 4
3 4 1 2
4 2 3 1
Now I need a final data frame in which the neighbour columns in row 0 of df are set to 1 and the remaining columns in that row are set to 0 (overwriting the previous values).
Expected output:
0 1 2 3 4
0 0 1 1 1 0
1 1 0 0 1 0
2 0 1 1 0 0
You can take the relevant column numbers from ddff, set the values in those columns in the first row to 1, and set the remaining columns to 0:
relevant_columns = ddff.loc[0]
df.loc[0,relevant_columns] = 1
df.loc[0,df.columns[~df.columns.isin(relevant_columns)]] = 0
Output:
0 1 2 3 4
0 0 1 1 1 0
1 1 0 0 1 0
2 0 1 1 0 0
You can use:
s = ddff.loc[0].values
df.loc[0] = np.where(df.loc[[0]].columns.isin(s),1,0)
>>> df
0 1 2 3 4
0 0 1 1 1 0
1 1 0 0 1 0
2 0 1 1 0 0
Breaking it down:
>>> np.where(df.loc[[0]].columns.isin(s),1,0)
array([0, 1, 1, 1, 0])
# Before the update
>>> df.loc[0]
0 0
1 1
2 0
3 0
4 1
# After the assignment back
0 0
1 1
2 1
3 1
4 0
In a pandas data frame, the one-hot encoded vectors are present as columns, i.e.:
Rows A B C D E
0 0 0 0 1 0
1 0 0 1 0 0
2 0 1 0 0 0
3 0 0 0 1 0
4 1 0 0 0 0
5 0 0 0 0 1
How can I convert these columns into one data frame column by label encoding them in Python? I.e.:
Rows A
0 4
1 3
2 2
3 4
4 1
5 5
I also need a suggestion: some rows have multiple 1s, so how should those rows be handled, given that we can only have one category at a time?
Try with argmax:
#df=df.set_index('Rows')
df['New']=df.values.argmax(1)+1
df
Out[231]:
A B C D E New
Rows
0 0 0 0 1 0 4
1 0 0 1 0 0 3
2 0 1 0 0 0 2
3 0 0 0 1 0 4
4 1 0 0 0 0 1
5 0 0 0 0 1 5
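For reference, a minimal reproducible sketch of the argmax approach, assuming Rows starts out as a regular column and is set as the index first (as the commented line above suggests). Note that argmax returns the position of the first maximum, so a row with several 1s would be mapped to its first 1:

import pandas as pd

df = pd.DataFrame({
    'Rows': [0, 1, 2, 3, 4, 5],
    'A': [0, 0, 0, 0, 1, 0],
    'B': [0, 0, 1, 0, 0, 0],
    'C': [0, 1, 0, 0, 0, 0],
    'D': [1, 0, 0, 1, 0, 0],
    'E': [0, 0, 0, 0, 0, 1],
}).set_index('Rows')

# position of the (first) 1 in each row, shifted so labels start at 1
df['New'] = df.values.argmax(1) + 1
print(df)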
argmax is the way to go; adding another way using idxmax and get_indexer:
df['New'] = df.columns.get_indexer(df.idxmax(1))+1
#df.idxmax(1).map(df.columns.get_loc)+1
print(df)
Rows A B C D E New
0 0 0 0 1 0 4
1 0 0 1 0 0 3
2 0 1 0 0 0 2
3 0 0 0 1 0 4
4 1 0 0 0 0 1
5 0 0 0 0 1 5
I also need a suggestion: some rows have multiple 1s, so how should those
rows be handled, given that we can only have one category at a time?
In this case you can dot your DataFrame of dummies with an array of the powers of 2 (one per column). This ensures that any unique combination of dummies (A, A+B, A+B+C, B+C, ...) gets a unique category label. (A few rows were added at the bottom to illustrate the unique counting.)
df['Category'] = df.dot(2**np.arange(df.shape[1]))
A B C D E Category
Rows
0 0 0 0 1 0 8
1 0 0 1 0 0 4
2 0 1 0 0 0 2
3 0 0 0 1 0 8
4 1 0 0 0 0 1
5 0 0 0 0 1 16
6 1 0 0 0 1 17
7 0 1 0 0 1 18
8 1 1 0 0 1 19
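As a quick check on how this encoding works (using values from the table above): with five columns the weights are 1, 2, 4, 8, 16, so a row with only D set maps to 8, and a row with both A and E set maps to 1 + 16 = 17:

import numpy as np

weights = 2 ** np.arange(5)              # array([ 1,  2,  4,  8, 16]) for A..E
print(np.dot([0, 0, 0, 1, 0], weights))  # only D set  -> 8
print(np.dot([1, 0, 0, 0, 1], weights))  # A and E set -> 17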
Another readable solution, on top of the other great solutions provided:
df['variables'] = np.where(df.values)[1]+1
Output:
A B C D E variables
0 0 0 0 1 0 4
1 0 0 1 0 0 3
2 0 1 0 0 0 2
3 0 0 0 1 0 4
4 1 0 0 0 0 1
5 0 0 0 0 1 5
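One assumption worth stating: np.where(df.values) returns the indices of every nonzero cell, so this works only when each row contains exactly one nonzero entry. A small sketch with the data above:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0, 0, 0, 1, 0],
     [0, 0, 1, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 0, 1, 0],
     [1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1]],
    columns=list('ABCDE'),
)

rows, cols = np.where(df.values)   # (row, column) indices of the nonzero cells
# with exactly one 1 per row, cols has one entry per row, in row order,
# so cols + 1 gives the 1-based column position shown above
df['variables'] = cols + 1
print(df)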
I have a dataframe like the following:
Labels
1 Nail_Polish,Nails
2 Nail_Polish,Nails
3 Foot_Care,Targeted_Body_Care
4 Foot_Care,Targeted_Body_Care,Skin_Care
I want to generate the following matrix:
Nail_Polish Nails Foot_Care Targeted_Body_Care Skin_Care
1 1 1 0 0 0
2 1 1 0 0 0
3 0 0 1 1 0
4 0 0 1 1 1
How can I achieve this?
Use str.get_dummies:
df2 = df['Labels'].str.get_dummies(sep=',')
The resulting output:
Foot_Care Nail_Polish Nails Skin_Care Targeted_Body_Care
1 0 1 1 0 0
2 0 1 1 0 0
3 1 0 0 0 1
4 1 0 0 1 1
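str.get_dummies returns the indicator columns in sorted order, as shown above; if you want them in the order listed in the question, a small follow-up sketch (column names taken from the question):

import pandas as pd

df = pd.DataFrame(
    {'Labels': ['Nail_Polish,Nails',
                'Nail_Polish,Nails',
                'Foot_Care,Targeted_Body_Care',
                'Foot_Care,Targeted_Body_Care,Skin_Care']},
    index=[1, 2, 3, 4],
)

df2 = df['Labels'].str.get_dummies(sep=',')

# optional: reorder the columns to match the layout asked for in the question
df2 = df2[['Nail_Polish', 'Nails', 'Foot_Care', 'Targeted_Body_Care', 'Skin_Care']]
print(df2)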