I have two dataframes, and I am able to merge them with pd.merge(df1, df2, on='column_name'). However, I only want to merge on the first occurrence of each key in df1. Any pointers or solutions? It's a many-to-one merge, and I only want the first occurrence merged. Thanks in advance!
Since only the first occurrence of each key in df1 should receive df2's values, the merged dataframe needs NaN in df2's columns on every repeated row. So let's try this: merge with how='left'. This copies the df2 values onto every duplicated column_name row of df1. Prepare a mask that flags those repeated rows, and assign NaN to the df2 columns on them.
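import numpy as np
import pandas as pd
mask = df1['column_name'].duplicated().to_numpy()  # True for repeats after the first occurrence; positional, so df1's index doesn't matter
new_df = df1.merge(df2, how='left', on='column_name')  # keeps df1's row order (and row count, since df2's keys are unique)
new_df.loc[mask, df2.columns[df2.columns != 'column_name']] = np.nan  # blank df2's columns on the repeated rows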
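Here is a self-contained run of the left-merge-then-mask approach on small hypothetical data, where the key column is called `key` in place of `column_name`. It assumes df2 has unique keys, so the left merge keeps exactly one row per df1 row in df1's order, which is what lets a mask computed on df1 line up with the merged frame.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'key' stands in for 'column_name'.
df1 = pd.DataFrame({"key": ["a", "a", "b", "b", "c"], "x": [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"key": ["a", "b"], "y": [10.0, 20.0]})

# True for every repeat of a key after its first occurrence in df1.
# .to_numpy() makes the mask positional, avoiding index-alignment errors.
mask = df1["key"].duplicated().to_numpy()

# Left merge against unique df2 keys: same row count and order as df1.
new_df = df1.merge(df2, how="left", on="key")

# Blank out df2's non-key columns on the repeated rows.
df2_cols = df2.columns[df2.columns != "key"]
new_df.loc[mask, df2_cols] = np.nan

print(new_df)
```

Only the first "a" and the first "b" keep their y values. The "c" row is NaN as well because "c" has no match in df2.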
I have two dataframes that look like this:
df1:
df2:
What I want is to multiply the ratio column of df1 with the Total, Hombres and Mujeres columns of df2 wherever the 'Estado' value in df1 matches the 'Entidad Federativa' value in df2. When the state stops matching, it should move on to the next row of df1 and do the same with the matching rows of df2. Does anyone have an idea how to do this? I would appreciate it a lot.
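Use DataFrame.mul with level=0 so that the first level of the MultiIndex in df2 is matched against the Estado index values of df1 (DataFrame.div takes the same arguments if you need to divide instead):
import pandas as pd

# if necessary:
# df1 = df1.set_index('Estado')
df1 = df1.rename(index={'Aguascal': 'Aguascalientes'})  # make the state name match df2
# set the label columns as the index first, so only the count columns are converted to int
df2 = (df2.set_index(['Entidad Federativa', 'Grupo quinquenal de edad'])
          .replace(',', '', regex=True)
          .astype(int))
df = df2.mul(df1['Ratio'], level=0, axis=0)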
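The following is a minimal, runnable sketch of level-based broadcasting on made-up data. The states, age groups and counts below are hypothetical, and it assumes df1 is indexed by Estado and that df2's counts arrive as strings with thousands separators. Because the question asks to multiply, it uses DataFrame.mul; DataFrame.div accepts the same level and axis arguments.

```python
import pandas as pd

# Hypothetical data: one ratio per state, indexed by 'Estado'.
df1 = pd.DataFrame(
    {"Ratio": [0.5, 2.0]},
    index=pd.Index(["Aguascalientes", "Colima"], name="Estado"),
)

# Hypothetical data: several age-group rows per state, counts stored as text.
df2 = pd.DataFrame({
    "Entidad Federativa": ["Aguascalientes", "Aguascalientes", "Colima"],
    "Grupo quinquenal de edad": ["0-4", "5-9", "0-4"],
    "Total": ["1,000", "2,000", "3,000"],
    "Hombres": ["400", "1,100", "1,500"],
    "Mujeres": ["600", "900", "1,500"],
})

# Move the label columns into the index first so only numeric text is converted.
df2 = (df2.set_index(["Entidad Federativa", "Grupo quinquenal de edad"])
          .replace(",", "", regex=True)
          .astype(int))

# level=0 matches df1's index against the first MultiIndex level of df2,
# broadcasting each state's Ratio over all of that state's age groups.
result = df2.mul(df1["Ratio"], level=0, axis=0)
print(result)
```

Level matching is positional (level=0), so the index names Estado and Entidad Federativa do not need to agree. The state labels themselves must match exactly, which is why the answer renames 'Aguascal'.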
I am trying to make 2 new dataframes by using 2 given dataframe objects:
DF1:
id  feature_text     length
1   "example text"   12
2   "example text2"  13
....

DF2:
id  case_num
3   0
....
As you can see, both df1 and df2 have a column called "id". However, df1 contains every id value, whereas df2 contains only some of them. That is, df1 has 3200 rows, each with a unique id value (1~3200), but df2 has only some of those ids (i.e. id=[3,7,20,...]).
What I want to do is 1) get a merged dataframe containing all rows whose id values appear in both df1 and df2, and 2) get a dataframe containing the rows of df1 whose id values do not appear in df2.
I was able to find a solution for 1), but I have no idea how to do 2).
Thanks.
For the first case, you could use an inner merge (the default for merge):
out = df1.merge(df2, on='id')
For the second case, you could use isin with the negation operator (~), which filters out the rows of df1 whose ids also exist in df2:
out = df1[~df1['id'].isin(df2['id'])]
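Below, both cases run on a hypothetical miniature of DF1 and DF2. The same sketch also shows an alternative way to get case 2: merge with `indicator=True`, which adds a `_merge` column marking rows found only in the left frame. That anti-join pattern is an addition here and not part of the original answer.

```python
import pandas as pd

# Hypothetical miniature versions of DF1 and DF2 from the question.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "feature_text": ["example text", "example text2", "text three", "text four"],
    "length": [12, 13, 10, 9],
})
df2 = pd.DataFrame({"id": [3, 4], "case_num": [0, 1]})

# 1) Inner merge: only ids present in both frames.
both = df1.merge(df2, on="id")

# 2) Rows of df1 whose id is absent from df2, using isin with negation.
only_df1 = df1[~df1["id"].isin(df2["id"])]

# Alternative for 2): left merge with indicator, then keep 'left_only' rows.
flagged = df1.merge(df2[["id"]], on="id", how="left", indicator=True)
only_df1_alt = flagged.loc[flagged["_merge"] == "left_only", df1.columns]

print(both)
print(only_df1)
```

The isin version is shorter. The indicator version generalizes to multi-column keys, where isin on a single column no longer applies.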
I want to combine 2 dataframes.
Here are samples of the dataframes:
df1:
linktoDF1
df2:
linktoDF2
Desired output should be:
linktoResultcsv
In essence, I want to extend df1 with data from df2. The key linking the data is the index of both dataframes, which is ['latitude','level','longitude']. I want to omit rows whose index appears only in df2, i.e. I don't want to see data with index [41, 1000, 19.25].
Any help is appreciated.
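Use merge with how='left', which keeps only the keys present in df1 and drops those that appear only in df2 (on accepts index level names as well as column names):
rslt = pd.merge(df1, df2, on=["latitude", "level", "longitude"], how="left")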
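A small sketch with hypothetical values (the `temp` and `wind` columns are made up) shows the left merge dropping the df2-only key (41.0, 1000, 19.25). Because both frames share the same MultiIndex, `DataFrame.join` is an equivalent option: it aligns on the index and defaults to a left join.

```python
import pandas as pd

keys = ["latitude", "level", "longitude"]

# Hypothetical data: both frames indexed by (latitude, level, longitude).
df1 = pd.DataFrame({
    "latitude": [40.0, 40.5], "level": [1000, 1000],
    "longitude": [19.0, 19.0], "temp": [280.1, 281.3],
}).set_index(keys)
df2 = pd.DataFrame({
    "latitude": [40.0, 41.0], "level": [1000, 1000],
    "longitude": [19.0, 19.25], "wind": [5.2, 7.9],
}).set_index(keys)

# 'on' refers to index level names here; how="left" keeps only df1's keys.
rslt = pd.merge(df1, df2, on=keys, how="left")

# Alternative: join on the shared index (left join by default).
rslt_join = df1.join(df2, how="left")

print(rslt)
print(rslt_join)
```

In both results the (40.5, 1000, 19.0) row gets NaN for wind, since df2 has no data for it.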
I have two dataframes df1 and df2. df1 holds numerical data on some elements (A, B, C, ...), while df2 acts as a classification table whose index is the column names of df1. I would like to filter df1, keeping only the columns that match a certain classification in df2.
For instance, let's assume the following two dataframes and that I only want to keep elements (i.e. columns of df1) that belong to class 'C1':
import pandas as pd
df1 = pd.DataFrame({'A': [1,2],'B': [3,4],'C': [5,6]},index=[0, 1])
df2 = pd.DataFrame({'Name': ['A','B','C'],'Class': ['C1','C1','C2'],'Subclass': ['C11','C12','C21']},index=[0, 1, 2])
df2 = df2.set_index('Name')
The expected result is df1 with only columns A and B, because df2 shows that A and B are in class C1. I'm not sure how to do that. I was thinking about first filtering df2 by the 'C1' values in its 'Class' column and then checking whether df1.columns are in df2.index, but I suppose there is a much more efficient way to do that. Thanks for your help
Here is one way: select the C1 names from df2's index with a boolean mask and pass them to .loc as the column labels:
df1.loc[:,df2.index[df2.Class=='C1']]
Output:
Name A B
0 1 3
1 2 4
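A variant of the same idea builds a boolean mask over df1's own columns with `Index.isin` instead of passing labels to `.loc`. The practical difference is an assumption-free pandas behaviour: `.loc` with a list of labels raises a KeyError if any label is missing from df1, while the isin mask simply ignores names that df1 doesn't have. The data below is the question's example.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]}, index=[0, 1])
df2 = pd.DataFrame(
    {"Name": ["A", "B", "C"], "Class": ["C1", "C1", "C2"],
     "Subclass": ["C11", "C12", "C21"]}
).set_index("Name")

# Names of the elements classified as C1.
c1_names = df2.index[df2["Class"] == "C1"]

# Boolean mask over df1's columns; names absent from df1 are simply ignored.
filtered = df1.loc[:, df1.columns.isin(c1_names)]
print(filtered)
```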
I have two DataFrames, each with a single column and a common date index. How do I create a third DataFrame with the same date index and a copy of both columns?
If df1 and df2 have the same index:
df_joined = df1.join(df2)
See the pandas documentation for DataFrame.join for the available options.
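Here is a runnable sketch with a hypothetical three-day date index and made-up column names `a` and `b`. It shows `join` alongside `pd.concat(..., axis=1)`, which also aligns on the index and gives the same result when the indexes are identical.

```python
import pandas as pd

# Hypothetical shared date index and single-column frames.
dates = pd.date_range("2024-01-01", periods=3, freq="D")
df1 = pd.DataFrame({"a": [1, 2, 3]}, index=dates)
df2 = pd.DataFrame({"b": [10, 20, 30]}, index=dates)

# join aligns on the index (left join by default) and returns a new DataFrame.
df_joined = df1.join(df2)

# concat along axis=1 also aligns on the index (outer join by default).
df_concat = pd.concat([df1, df2], axis=1)

print(df_joined)
```

Note that join raises an error if the two frames share a column name, unless you pass lsuffix/rsuffix.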