How do I join columns from different Pandas DataFrames? - python

I have two Dataframes with a single column and a common date index. How do I create a THIRD dataframe with the same date index and a copy of both columns?

If df1 and df2 have the same index:
df_joined = df1.join(df2)
See the pandas DataFrame.join documentation for details.
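A minimal sketch of the above, assuming two hypothetical single-column DataFrames that share a date index:

```python
import pandas as pd

# Hypothetical data: two single-column DataFrames with a common date index
idx = pd.date_range("2023-01-01", periods=3, freq="D")
df1 = pd.DataFrame({"price": [10.0, 11.0, 12.0]}, index=idx)
df2 = pd.DataFrame({"volume": [100, 150, 120]}, index=idx)

# join aligns on the index, producing a third DataFrame with both columns
df_joined = df1.join(df2)
print(df_joined.columns.tolist())  # ['price', 'volume']
```

Because join aligns on the index, df_joined keeps the same date index as df1 and df2.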

Related

Pandas merge two data frame only to first occurrence

I have two dataframes, and I am able to merge them with pd.merge(df1, df2, on='column_name'). But I only want to merge on the first occurrence in df1. It's a many-to-one merge, and I only want the first occurrence merged. Any pointers or solutions? Thanks in advance!
Since you are merging two dataframes of different lengths, the merged dataframe will need NaN values in the cells that have no corresponding row in df2. So try this: do a left merge, which duplicates df2's values for repeated column_name rows in df1, then use a mask to find those repeated rows and assign NaN to their df2 columns.
import numpy as np

mask = df1['column_name'].duplicated()
new_df = df1.merge(df2, how='left', on='column_name')
new_df.loc[mask, df2.columns[df2.columns != 'column_name']] = np.nan
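A self-contained sketch of the mask-then-merge approach, using hypothetical data with a repeated key in df1:

```python
import numpy as np
import pandas as pd

# Hypothetical many-to-one example: 'key' repeats in df1 but is unique in df2
df1 = pd.DataFrame({"key": ["a", "a", "b"], "x": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["a", "b"], "y": [10, 20]})

mask = df1["key"].duplicated()                 # True for repeats after the first
new_df = df1.merge(df2, how="left", on="key")  # left merge copies df2 values to every match
new_df.loc[mask, df2.columns[df2.columns != "key"]] = np.nan  # blank out the repeats

print(new_df["y"].tolist())  # [10.0, nan, 20.0]
```

Only the first 'a' row keeps its merged value; the duplicated 'a' row is set to NaN.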

Create single pandas dataframe from a list of dataframes

I have a list of about 25 dfs and all of the columns are the same. Although the row count is different, I am only interested in the first row of each df.
How can I iterate through the list of dfs, copy the first row from each and concatenate them all into a single df?
Select the first row by position with DataFrame.iloc, using [[0]] so each selection is a one-row DataFrame, then join them together with concat:
df = pd.concat([x.iloc[[0]] for x in dfs], ignore_index=True)
Or use DataFrame.head(1), which also returns a one-row DataFrame:
df = pd.concat([x.head(1) for x in dfs], ignore_index=True)
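A runnable sketch of either variant, with a hypothetical list of DataFrames standing in for the 25 mentioned above:

```python
import pandas as pd

# Hypothetical list of DataFrames: same columns, differing row counts
dfs = [
    pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}),
    pd.DataFrame({"a": [7], "b": [8]}),
    pd.DataFrame({"a": [9, 10], "b": [11, 12]}),
]

# First row of each, stacked into a single DataFrame with a fresh index
df = pd.concat([x.iloc[[0]] for x in dfs], ignore_index=True)
print(df["a"].tolist())  # [1, 7, 9]
```

ignore_index=True discards the original row labels (all 0 here) and renumbers the result 0..n-1.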

combine dataframes in pandas with repeat slices of one of them

I want to combine 2 dataframes.
Here are the samples of dataframes:
df1:
linktoDF1
df2:
linkdoDF2
Desired output should be:
linktoResultcsv
What I want, in essence, is to extend df1 with data from df2. The key linking the data is the index shared by both dataframes, ['latitude','level','longitude']. I want to omit data whose index is unique to df2, i.e. I don't want to see the row with index [41, 1000, 19.25].
Any help is appreciated.
Use pd.merge with how='left', which omits keys not present in df1:
rslt = pd.merge(df1, df2, on=["latitude", "level", "longitude"], how="left")
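Since the linked CSVs are not available here, a sketch with hypothetical stand-in data shows the behavior: the left merge keeps every df1 row and silently drops df2 rows whose key never appears in df1.

```python
import pandas as pd

# Hypothetical stand-ins for the linked CSVs, keyed by latitude/level/longitude
df1 = pd.DataFrame({"latitude": [40, 41], "level": [1000, 1000],
                    "longitude": [19.25, 20.25], "t": [280.0, 281.0]})
df2 = pd.DataFrame({"latitude": [40, 41], "level": [1000, 1000],
                    "longitude": [19.25, 19.25], "u": [5.0, 6.0]})

# how='left' keeps all df1 rows; the df2-only key [41, 1000, 19.25]
# contributes no row of its own
rslt = pd.merge(df1, df2, on=["latitude", "level", "longitude"], how="left")
print(len(rslt))  # 2
```

Rows of df1 with no match in df2 get NaN in the df2 columns rather than being dropped.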

Filter pandas dataframe columns based on other dataframe

I have two dataframes df1 and df2. df1 gives some numerical data on some elements (A,B,C ...) while df2 is a dataframe acting like a classification table with its index being the column names of df1. I would like to filter df1 by only keeping columns that are matching a certain classification in df2.
For instance, let's assume the following two dataframes and that I only want to keep elements (i.e. columns of df1) that belong to class 'C1':
df1 = pd.DataFrame({'A': [1,2],'B': [3,4],'C': [5,6]},index=[0, 1])
df2 = pd.DataFrame({'Name': ['A','B','C'],'Class': ['C1','C1','C2'],'Subclass': ['C11','C12','C21']},index=[0, 1, 2])
df2 = df2.set_index('Name')
The expected result is df1 with only columns A and B, because df2 shows that A and B are in class C1. I'm not sure how to do that. I was thinking of first filtering df2 by the 'C1' values in its 'Class' column and then checking whether df1.columns are in df2.index, but I suppose there is a much more efficient way. Thanks for your help.
Here is one way, using an index slice:
df1.loc[:,df2.index[df2.Class=='C1']]
Out[578]:
Name A B
0 1 3
1 2 4

how to rearrange order of columns from other dataframe in pandas

I have following dataframe columns in pandas
df1
Index(['Income','Age','Gender'])
df2
Index(['Age','Gender','Income'])
I want to reorder the columns of df2 to match df1. How can I do that in pandas? In R I can easily do it with df2[names(df1)].
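A minimal sketch (with hypothetical data for the three columns): the pandas analogue of R's df2[names(df1)] is indexing df2 with df1.columns.

```python
import pandas as pd

# Hypothetical frames matching the column orders shown above
df1 = pd.DataFrame({"Income": [50000], "Age": [30], "Gender": ["F"]})
df2 = pd.DataFrame({"Age": [25], "Gender": ["M"], "Income": [40000]})

# Reorder df2's columns to df1's order -- same idea as R's df2[names(df1)]
df2 = df2[df1.columns]
print(df2.columns.tolist())  # ['Income', 'Age', 'Gender']
```

This selects the columns by label in df1's order; it raises a KeyError if df2 is missing any of df1's columns.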
