Sort pandas df into individual columns - python

I am trying to sort a pandas df into individual columns based on when values in columns change. For the df below I can sort the df into separate columns when a value changes in Col B. But I'm trying to add Col C, so that a new column starts when the values change in both Col B and Col C.
import pandas as pd
df = pd.DataFrame({
'A' : [10,20,30,40,40,30,20,10,5,10,15,20,20,15,10,5],
'B' : ['X','X','X','X','Y','Y','Y','Y','X','X','X','X','Y','Y','Y','Y'],
'C' : ['W','W','Z','Z','Z','Z','W','W','W','W','Z','Z','Z','Z','W','W'],
})
d = df['B'].ne(df['B'].shift()).cumsum()
df['C'] = d.groupby(df['B']).transform(lambda x: pd.factorize(x)[0]).add(1).astype(str)
df['D'] = df.groupby(['B','C']).cumcount()
df = df.set_index(['D','C','B'])['A'].unstack([2,1])
df.columns = df.columns.map(''.join)
Output:
X1 Y1 X2 Y2
D
0 10 40 5 20
1 20 30 10 15
2 30 20 15 10
3 40 10 20 5
As you can see, this creates a new column every time there's a new value in Col B. But I'm trying to incorporate Col C as well. So it should be every time there's a change in both Col B and Col C.
Intended output:
XW1 XZ1 YZ1 YW1 XW2 XZ2 YZ2 YW2
0 10 30 40 20 5 15 20 10
1 20 40 30 10 10 20 15 5

Based on your intended output, you can build the helper columns one at a time.
df['key'] = df.B + df.C  # combined key from B and C
df['key2'] = df.key.ne(df.key.shift()).cumsum()  # number each run of consecutive equal keys
df['key2'] = df.groupby('key').key2.transform(lambda x: x.astype('category').cat.codes + 1)  # renumber the runs of each key as 1, 2, ...
df['key3'] = df.groupby(['key', 'key2']).cumcount()  # position within each run, used as the pivot index
df['key'] = df.key + df.key2.astype(str)  # column labels for the pivot
df.pivot(index='key3', columns='key', values='A')  # yields
Out[126]:
key XW1 XW2 XZ1 XZ2 YW1 YW2 YZ1 YZ2
key3
0 10 5 30 15 20 10 40 20
1 20 10 40 20 10 5 30 15
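The pivot above returns the columns in sorted order, while the intended output lists them in order of first appearance. A minimal sketch of one way to get that order, assuming the original B and C columns from the question (before C is overwritten); the names `key`, `run`, `occurrence`, `label` and `pos` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40, 40, 30, 20, 10, 5, 10, 15, 20, 20, 15, 10, 5],
    'B': ['X', 'X', 'X', 'X', 'Y', 'Y', 'Y', 'Y', 'X', 'X', 'X', 'X', 'Y', 'Y', 'Y', 'Y'],
    'C': ['W', 'W', 'Z', 'Z', 'Z', 'Z', 'W', 'W', 'W', 'W', 'Z', 'Z', 'Z', 'Z', 'W', 'W'],
})

key = df['B'] + df['C']                      # combined key, e.g. 'XW'
run = key.ne(key.shift()).cumsum()           # id of each run of equal keys
# occurrence number (1, 2, ...) of each run within its key
occurrence = run.groupby(key).transform(lambda s: pd.factorize(s)[0] + 1)
label = key + occurrence.astype(str)         # e.g. 'XW1'
pos = df.groupby(label).cumcount()           # row position inside each run

out = (df.assign(label=label, pos=pos)
         .pivot(index='pos', columns='label', values='A'))
# reorder the columns by first appearance instead of alphabetically
out = out[label.unique().tolist()]
print(out)
```

Selecting the columns with `label.unique()` works here because `unique` returns the labels in the order they first occur.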


How to stack two columns of a pandas dataframe in python

I want to stack two columns on top of each other
So I have Left and Right values in one column each, and want to combine them into a single one. How do I do this in Python?
I'm working with Pandas Dataframes.
Basically from this
Left Right
0 20 25
1 15 18
2 10 35
3 0 5
To this:
New Name
0 20
1 15
2 10
3 0
4 25
5 18
6 35
7 5
It doesn't matter how they are combined as I will plot it anyway, and the new column name also doesn't matter because I can rename it.
You can build a list of the columns, calling squeeze on each so that it is a plain one-dimensional Series and concat does not try to align on column names, and then call concat on this list. Passing ignore_index=True creates a new index; otherwise the original index values are repeated:
cols = [df[col].squeeze() for col in df]
pd.concat(cols, ignore_index=True)
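To see what ignore_index changes, here is a small sketch on the Left/Right data from the question; the variable names `stacked` and `kept` are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Left": [20, 15, 10, 0], "Right": [25, 18, 35, 5]})

# one Series per column, stacked end to end with a fresh 0..7 index
stacked = pd.concat([df[col] for col in df], ignore_index=True)

# without ignore_index the original row labels 0..3 appear twice
kept = pd.concat([df[col] for col in df])

print(stacked.index.tolist())
print(kept.index.tolist())
```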
Many options, stack, melt, concat, ...
Here's one:
>>> df.melt(value_name='New Name').drop(columns='variable')
New Name
0 20
1 15
2 10
3 0
4 25
5 18
6 35
7 5
You can also use np.ravel:
import numpy as np
out = pd.DataFrame(np.ravel(df.values.T), columns=['New name'])
print(out)
# Output
New name
0 20
1 15
2 10
3 0
4 25
5 18
6 35
7 5
Update
If you have only 2 cols:
out = pd.concat([df['Left'], df['Right']], ignore_index=True).to_frame('New name')
print(out)
# Output
New name
0 20
1 15
2 10
3 0
4 25
5 18
6 35
7 5
Solution with unstack
df2 = df.unstack()
# recreate index
df2.index = np.arange(len(df2))
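A solution that selects the columns and ravels them. Note that this interleaves the values row by row instead of placing one column after the other.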
# Your data
import numpy as np
import pandas as pd
df = pd.DataFrame({"Left":[20,15,10,0], "Right":[25,18,35,5]})
# Masking columns to ravel
df2 = pd.DataFrame({"New Name":np.ravel(df[["Left","Right"]])})
df2
New Name
0 20
1 25
2 15
3 18
4 10
5 35
6 0
7 5
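If the column-after-column order is wanted from ravel, the order argument can be used. A short sketch, assuming the same Left/Right frame; `row_major` and `col_major` are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Left": [20, 15, 10, 0], "Right": [25, 18, 35, 5]})
values = df[["Left", "Right"]].to_numpy()

# default order="C" walks the array row by row (Left, Right, Left, Right, ...)
row_major = np.ravel(values)

# order="F" walks it column by column (all of Left, then all of Right)
col_major = np.ravel(values, order="F")

out = pd.DataFrame({"New Name": col_major})
print(out)
```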
I ended up using this solution, seems to work fine
df1 = dfTest[['Left']].copy()
df2 = dfTest[['Right']].copy()
df2.columns=['Left']
df3 = pd.concat([df1, df2],ignore_index=True)

Set dictionary keys as cells in dataframe column

Please look at my code:
import pandas as pd
my_dict = {
1 :{'a':5 , 'b':10},
5 :{'a':6 , 'b':67},
7 :{'a':33 , 'b':9},
8 :{'a':21 , 'b':37},
}
df = pd.DataFrame (my_dict).transpose()
df['new'] = df.index
print (df)
Here I convert a dictionary to a DataFrame and set the index as a new column.
a b new
1 5 10 1
5 6 67 5
7 33 9 7
8 21 37 8
Can this be done in one line, at the stage of converting the dictionary to a DataFrame, without
df['new'] = df.index
I want the top-level keys to become the cells of the new column right away.
Something like
df = pd.DataFrame (my_dict, 'new' = list(my_dict.keys()).transpose()
You can call reset_index() to turn the index into a column, and then df.rename to change the name of that column from index to new:
df = pd.DataFrame(my_dict).transpose().reset_index().rename(columns={"index": "new"})
print(df)
new a b
0 1 5 10
1 5 6 67
2 7 33 9
3 8 21 37
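Another possible one-liner, sketched here as an alternative rather than as the answer's method, builds the frame with from_dict and names the index before resetting it, so no rename is needed afterwards:

```python
import pandas as pd

my_dict = {
    1: {'a': 5, 'b': 10},
    5: {'a': 6, 'b': 67},
    7: {'a': 33, 'b': 9},
    8: {'a': 21, 'b': 37},
}

# orient="index" makes the outer keys the row labels, so no transpose is needed;
# rename_axis names the index, and reset_index turns it into the column 'new'
df = pd.DataFrame.from_dict(my_dict, orient="index").rename_axis("new").reset_index()
print(df)
```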

Python Dataframe drop rows of multi columns with specific values

My dataframe is given below. I want to drop every row that has a value less than 0 in either of two columns.
df =
name value1 value2
0 A 10 10
1 B -10 10 #drop
2 A 10 10
3 A 40 -10 #drop
4 C 50 10
5 C 60 10
6 D -70 -10 #drop
I want to drop rows with negative values in value1 and value2 columns.
My expected output:
df =
name value1 value2
0 A 10 10
1 A 10 10
2 C 50 10
3 C 60 10
My present code:
df = df[df['value1','value2']>0]
Output:
KeyError: ('value1','value2')
I guess you mean that if either 'value1' or 'value2' is negative, you want to drop the row. In that case, keep only the rows where both are non-negative:
df = df[(df['value1'] >= 0) & (df['value2'] >= 0)]
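The same filter can be written over a list of columns, which scales better when more than two columns are involved. A minimal sketch with the data from the question; `cols`, `mask` and `out` are illustrative names, and the reset_index call is only there to reproduce the 0..3 index of the expected output:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "B", "A", "A", "C", "C", "D"],
    "value1": [10, -10, 10, 40, 50, 60, -70],
    "value2": [10, 10, 10, -10, 10, 10, -10],
})

cols = ["value1", "value2"]          # note the double brackets when selecting
mask = (df[cols] >= 0).all(axis=1)   # True where every listed column is >= 0
out = df[mask].reset_index(drop=True)
print(out)
```

The KeyError in the question comes from `df['value1','value2']`, which looks up a single column labelled by the tuple; selecting several columns needs a list, as in `df[cols]`.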

How do I reorder by column totals?

For example, in the following data with a row of column totals appended, how do I reorder the rows and columns by their totals?
import pandas as pd
data=[['fileA',47,15,3,5,7],['fileB',33,13,4,7,2],['fileC',25,17,9,3,5],
['fileD',25,7,1,4,2],['fileE',19,15,3,8,4], ['fileF',11,17,8,4,5]]
df = pd.DataFrame(data, columns=['filename','rows_cnt','cols_cnt','col_A','col_B','col_C'])
print(df)
filename rows_cnt cols_cnt col_A col_B col_C
0 fileA 47 15 3 5 7
1 fileB 33 13 4 7 2
2 fileC 25 17 9 3 5
3 fileD 25 7 1 4 2
4 fileE 19 15 3 8 4
5 fileF 11 17 8 4 5
df.loc[6]= df.sum(0)
filename rows_cnt cols_cnt col_A col_B col_C
0 fileA 47 15 3 5 7
1 fileB 33 13 4 7 2
2 fileC 25 17 9 3 5
3 fileD 25 7 1 4 2
4 fileE 19 15 3 8 4
5 fileF 11 17 8 4 5
6 fileA... 160 84 28 31 25
I made an image of the question.
How do I reorder the red frame in this image by the standard?
df.reindex([2,5,0,4,1,3,6], axis='index')
Is creating the index manually like this the only way?
data=[['fileA',47,15,3,5,7],['fileB',33,13,4,7,2],['fileC',25,17,9,3,5],
['fileD',25,7,1,4,2],['fileE',19,15,3,8,4], ['fileF',11,17,8,4,5]]
df = pd.DataFrame(data, columns=['filename','rows_cnt','cols_cnt','col_A','col_B','col_C'])
df = df.sort_values(by='cols_cnt', axis=0, ascending=False)
df.loc[6]= df.sum(0)
# keep the original index values as a column
df = df.reset_index(drop=False)
# move filename into the index: sorting along axis=1 cannot compare
# strings with integers, so only numeric columns may remain
df = df.set_index('filename', drop=True)
df = df.sort_values(by=df.index[-1], axis=1, ascending=False)
# restore the original index
df = df.reset_index(drop=False)
df = df.set_index('index', drop=True)
# columns that should stay in front
fixed_cols = ['filename', 'rows_cnt','cols_cnt']
# new column order: the fixed columns first, then the remaining
# columns in their sorted order
new_cols = fixed_cols + (df.columns.drop(fixed_cols).tolist())
df[new_cols]
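A shorter variant of the same idea, offered as a sketch rather than a drop-in replacement: sort the rows, order the value columns by their totals, and only then append the totals row. The label 'total' for the last row and the names `value_cols`, `ordered`, `total_row` and `out` are assumptions made for this example:

```python
import pandas as pd

data = [['fileA', 47, 15, 3, 5, 7], ['fileB', 33, 13, 4, 7, 2], ['fileC', 25, 17, 9, 3, 5],
        ['fileD', 25, 7, 1, 4, 2], ['fileE', 19, 15, 3, 8, 4], ['fileF', 11, 17, 8, 4, 5]]
df = pd.DataFrame(data, columns=['filename', 'rows_cnt', 'cols_cnt', 'col_A', 'col_B', 'col_C'])

fixed_cols = ['filename', 'rows_cnt', 'cols_cnt']
value_cols = [c for c in df.columns if c not in fixed_cols]

# rows: largest cols_cnt first
df = df.sort_values('cols_cnt', ascending=False, kind='stable')

# columns: value columns ordered by their totals, largest first
ordered = df[value_cols].sum().sort_values(ascending=False).index.tolist()
df = df[fixed_cols + ordered]

# append the totals as a final row with the next free integer label
numeric = df.drop(columns='filename')
total_row = pd.DataFrame([{'filename': 'total', **numeric.sum().to_dict()}], index=[len(df)])
out = pd.concat([df, total_row])
print(out)
```

Appending the totals last means the sorts never have to mix strings and integers, so the set_index and reset_index round trip is not needed.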

How can I add two values of a row, and then put the result into a new cell?

In Python, I have a dataset/frame with 2 values, column A has values of say, 10, 20, 30 and column B has values of 5, 10, 15 etc.
How can I add the value of each row of each column and have the result in a column next to it?
So essentially it will be column C that has the sum results, so the first row will add column A and B for a result in column C for 15, and so on.
Thanks.
Simple addition will do:
df['C'] = df['A'] + df['B']
Using eval, you can make a copy by passing inplace=False:
df.eval('C = A + B', inplace=False)
# create a copy with a new column
A B C
0 10 5 15
1 20 10 30
2 30 15 45
or alter the existing dataframe by passing inplace=True:
df.eval('C = A + B', inplace=True)
df
A B C
0 10 5 15
1 20 10 30
2 30 15 45
Like this:
df = pd.DataFrame({'A':[10,20,30],'B':[5,10,15]})
df = df.assign(C=df.A + df.B)
print(df)
Output:
A B C
0 10 5 15
1 20 10 30
2 30 15 45
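If more than two columns need to be added together, a row-wise sum avoids writing out each term. A small sketch on the same example data:

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [5, 10, 15]})

# axis=1 sums across the selected columns within each row
df['C'] = df[['A', 'B']].sum(axis=1)
print(df)
```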
