How to rearrange the order of columns based on another dataframe in pandas - python

I have the following dataframe columns in pandas:
df1
Index(['Income','Age','Gender'])
df2
Index(['Age','Gender','Income'])
I want to reorder the columns of df2 as per df1. How can I do it in pandas? In R I can easily do it with df2[names(df1)].
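A minimal sketch of the pandas equivalent of R's df2[names(df1)] (the one-row values below are only illustrative):

import pandas as pd

df1 = pd.DataFrame({'Income': [50000], 'Age': [30], 'Gender': ['F']})
df2 = pd.DataFrame({'Age': [30], 'Gender': ['F'], 'Income': [50000]})

df2 = df2[df1.columns]                    # select df2's columns in df1's order
# or, equivalently (this also fills columns missing from df2 with NaN):
df2 = df2.reindex(columns=df1.columns)
print(df2.columns.tolist())               # ['Income', 'Age', 'Gender']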

Related

How to write df.query to select all columns from a dataframe with NaN?

I have the following code, which returns a df containing only the columns that have NaN values:
df.loc[:, df.isna().any()]
How would I write this code using df.query?
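As far as I know, df.query evaluates a boolean expression over rows, so it selects rows rather than columns; the loc-based selection from the question is the direct way to pick columns. A small runnable sketch of what it does (hypothetical values):

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [np.nan, 2], 'c': [3, 4]})

nan_cols = df.loc[:, df.isna().any()]   # keep columns with at least one NaN
print(nan_cols.columns.tolist())        # ['b']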

How to drop columns, do operation on remaining columns, then insert back the dropped columns?

I have a large dataset. I want to apply something to all the columns except for 2.
I dropped the 2 columns and created a separate dataframe from them, then tried merging the dataframes after the operation was applied.
I tried appending, merging, and joining the two dataframes, but they all created duplicate rows. Appending doubled the row count and changed the dropped columns.
I just want to add back the 2 columns to the initial dataframe unchanged. Any help?
df =
   col1  col2  col3  ...  col100
      1     2     3  ...     100
df2=df.loc[:,['col2', 'col3']]
df.drop(columns=['col2', 'col3'], inplace=True)
Then do what I needed to do to df.
Now I want to merge df and df2.
Like this:
import pandas as pd

cols = ['col2', 'col3']
df2 = df[cols]                      # keep a copy of the two columns
df.drop(columns=cols, inplace=True)
# do something to df
df = pd.concat([df, df2], axis=1)   # put the columns back (appended on the right)
This will work as long as you don't remove rows from either dataframe or change their order, since concat aligns on the index. If the original column positions matter, see the sketch below.
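To restore the original column order after the concat (a sketch; original_cols is just an illustrative name), record the order before dropping and reindex at the end:

original_cols = df.columns.tolist()   # remember the original order first
cols = ['col2', 'col3']
df2 = df[cols]
df.drop(columns=cols, inplace=True)
# do something to df
df = pd.concat([df, df2], axis=1).reindex(columns=original_cols)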

Filter pandas dataframe columns based on other dataframe

I have two dataframes df1 and df2. df1 gives some numerical data on some elements (A,B,C ...) while df2 is a dataframe acting like a classification table with its index being the column names of df1. I would like to filter df1 by only keeping columns that are matching a certain classification in df2.
For instance, let's assume the following two dataframes and that I only want to keep elements (i.e. columns of df1) that belong to class 'C1':
df1 = pd.DataFrame({'A': [1,2],'B': [3,4],'C': [5,6]},index=[0, 1])
df2 = pd.DataFrame({'Name': ['A','B','C'],'Class': ['C1','C1','C2'],'Subclass': ['C11','C12','C21']},index=[0, 1, 2])
df2 = df2.set_index('Name')
The expected result should be the dataframe df1 with only columns A and B, because in df2 we can see that A and B are in class C1. Not sure how to do that. I was thinking about first filtering df2 by 'C1' values in its 'Class' column and then checking if df1.columns are in df2.index, but I suppose there is a much more efficient way to do that. Thanks for your help.
Here is one way, using an index slice:
df1.loc[:,df2.index[df2.Class=='C1']]
Out[578]:
Name  A  B
0     1  3
1     2  4
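Put together as a self-contained sketch (using the question's example data, with the Subclass values quoted):

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})
df2 = pd.DataFrame({'Name': ['A', 'B', 'C'],
                    'Class': ['C1', 'C1', 'C2'],
                    'Subclass': ['C11', 'C12', 'C21']}).set_index('Name')

# keep only the df1 columns whose class in df2 is 'C1'
result = df1.loc[:, df2.index[df2['Class'] == 'C1']]
print(result.columns.tolist())   # ['A', 'B']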

How to drop rows from a dataframe as per null values in a specific column?

Say I have a dataframe that has three columns a, b, c and all can have null values, but I only want to drop rows where column b has null/NaN. How can I do that in a pandas dataframe?
This should do the trick:
df = df.dropna(subset=['b'])
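A minimal runnable sketch (hypothetical values):

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3],
                   'b': [4, np.nan, 6],
                   'c': [np.nan, 8, 9]})

df = df.dropna(subset=['b'])   # drops only the row where column b is NaN
print(df)                      # rows 0 and 2 remain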

How do I join columns from different Pandas DataFrames?

I have two Dataframes with a single column and a common date index. How do I create a THIRD dataframe with the same date index and a copy of both columns?
If df1 and df2 have the same index:
df_joined = df1.join(df2)
See the pandas documentation for DataFrame.join.
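A minimal sketch with a shared date index (the dates and column names are just illustrative):

import pandas as pd

idx = pd.date_range('2024-01-01', periods=3)
df1 = pd.DataFrame({'x': [1, 2, 3]}, index=idx)
df2 = pd.DataFrame({'y': [4, 5, 6]}, index=idx)

df_joined = df1.join(df2)            # third dataframe with both columns
print(df_joined.columns.tolist())    # ['x', 'y']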
