Merge Dataframes using List of Columns (Pandas Vlookup) - python

I'd like to lookup several columns from another dataframe that I have in a list to bring them over to my main dataframe, essentially doing a "v-lookup" of ~30 columns using ID as the key or lookup value for all columns.
However, for the columns that are shared between the two dataframes, I don't want to bring over duplicate columns; I want those values filled into df1's existing columns from df2.
I've tried below:
df = pd.merge(df, df2[['ID', [look_up_cols]]],
              on='ID',
              how='left',
              #suffixes=(False, False)
              )
but it brings in the shared columns from df2 when I want df2's values filled into the same columns in df1.
I've also tried creating a dictionary with the column pairs from each df and doing this for loop to look up each item in the dictionary (lookup_map) in the other df, using ID as the key:
for col in look_up_cols:
    df1[col] = df2['ID'].map(lookup_map)
but this just returns NaNs.

You should be able to do something like the following:
df = pd.merge(df, df2[look_up_cols + ['ID']],
              on='ID',
              how='left')
This just adds the ID column to the look_up_cols list and thereby allows it to be used in the merge function.
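
For reference, here is a minimal, self-contained sketch of that call; the contents of df, df2, and look_up_cols below are made up purely for illustration (the real frames have ~30 lookup columns):
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({'ID': [1, 2, 3], 'region': ['N', 'S', 'E']})
df2 = pd.DataFrame({'ID': [1, 2, 3], 'price': [10, 20, 30], 'qty': [5, 6, 7]})
look_up_cols = ['price', 'qty']

# Select only the lookup columns plus the ID key, then left-merge onto df.
df = pd.merge(df, df2[look_up_cols + ['ID']], on='ID', how='left')
print(df)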

Related

Update column of a dataframe when key matches from another dataframe in pandas

I have two dataframes.
For all rows in df1, find the corresponding row in df2 (through a matching key) and update the column final in df2 to 1.
How shall I proceed in pandas?
Remove the column final, then use a left join with the indicator parameter; this makes it possible to create a 1/0 column by comparing the indicator against 'both' and mapping True/False to 1/0:
df = df2.drop('final', axis=1).merge(df1, how='left', indicator='final')
df['final'] = df['final'].eq('both').astype(int)
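
A quick sketch of how this behaves, with made-up data (the key column name here is an assumption):
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'c']})
df2 = pd.DataFrame({'key': ['a', 'b', 'c'], 'final': [0, 0, 0]})

# Drop the old 'final', left-join on the shared key, and let merge's
# indicator column record whether each df2 row found a match in df1.
df = df2.drop('final', axis=1).merge(df1, how='left', indicator='final')
df['final'] = df['final'].eq('both').astype(int)
print(df)
#   key  final
# 0   a      1
# 1   b      0
# 2   c      1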

How to convert List from column to list of DataFrame?

I have a DataFrame which contains categories, and I want to split the DataFrame using categories as df_name:
df_name = df['category'].unique()
print(df_name)
result:
['df1' 'df2']
after splitting the DataFrame using a loop, I get 2 smaller DataFrames, df1 and df2.
Next, I want to alter the DataFrame. I want to remove the column category from df1 and df2 using df_name, but I got an error. After trying for a while, I think the problem is that df_name is a list.
How do I convert df_name from
['df1' 'df2']
to
[df1 df2]
?
Why use a loop to filter? You can just use df[df['column'] == 'df1'] to select the rows whose value in that column is 'df1'.
Then, if you want to remove the column, you can use del df['category'].
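
As a minimal sketch of that approach (the data here is invented; the column is called category as in the question):
import pandas as pd

df = pd.DataFrame({'category': ['df1', 'df1', 'df2'], 'value': [1, 2, 3]})

# Filter rows for one category directly instead of looping over unique values.
df1 = df[df['category'] == 'df1'].copy()

# Drop the now-redundant category column from the slice.
del df1['category']
print(df1)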

Pandas - Using column of lists as key to create additional column

I have a dataframe "df1" with a column "team_name". I have a different dataframe "df2" with two columns: "city" as a string, and "teams" as a list. I want to create a new column in df1 called "team_city", where the city name is found in df2's "city" column by finding the row whose list in the "teams" column contains "team_name."
(example: if "team_name" equals "Denver", I want to find the row in df2 where "teams" contains "Denver", and then extract the value of "team_city" in that row.)
I'm currently applying a function over the "team_name" column in df1 that looks like the following:
def get_city(name):
    df2 = clean_cities()
    for index, row in df2.iterrows():
        if name in row['teams']:
            return row['city']
I'm curious if there's a better way to do this. Does a vectorized function exist within pandas that can accomplish this?
Instead of using your get_city function, explode your df2['teams'] sublist into multiple rows:
team_df = df2.explode(column='teams')
Then, in your original data frame, let's call it df:
df['city'] = df['name'].map(team_df.set_index('teams')['city'])
This assumes that there is a unique team name. If there isn't, this won't work. If that is the case, I would then try merging: df.merge(team_df[['teams', 'city']], left_on=$TEAM_NAME_COLUMN, right_on='teams', how='left'). You may need to do further reduction.
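
Here is a small, self-contained sketch of the explode-and-map approach; the frames and the team_name / team_city column names are assumptions taken from the question:
import pandas as pd

df = pd.DataFrame({'team_name': ['Denver', 'Dallas']})
df2 = pd.DataFrame({'city': ['Denver City', 'Dallas City'],
                    'teams': [['Denver', 'Nuggets'], ['Dallas', 'Mavericks']]})

# One row per (city, team) pair, so team names can serve as an index.
team_df = df2.explode(column='teams')

# Look up each team's city; assumes team names are unique across the lists.
df['team_city'] = df['team_name'].map(team_df.set_index('teams')['city'])
print(df)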

How can I add new rows from a dataframe to another one based on key column

My df1 is something like the first table in the below image, with the key column being Name. I want to add new rows from another dataframe, df2, which has only Name, Year, and Value columns. The new rows should get added based on Name. The other columns would just repeat the same value per Name. Results should be similar to the second table in the below image. How can I do this in pandas?
Create a sub-table df3 from df1 consisting of Group, Name, and Other, keeping only distinct records. Then left-join df2 with df3 to get the desired result.
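
A rough sketch of that idea with invented data; the concat at the end, which appends the joined rows back onto df1, is my reading of the desired result:
import pandas as pd

df1 = pd.DataFrame({'Group': ['A', 'A'], 'Name': ['x', 'x'],
                    'Other': ['o1', 'o1'], 'Year': [2019, 2020],
                    'Value': [1, 2]})
df2 = pd.DataFrame({'Name': ['x'], 'Year': [2021], 'Value': [3]})

# Distinct Group/Name/Other combinations from df1.
df3 = df1[['Group', 'Name', 'Other']].drop_duplicates()

# The left join attaches the repeating columns to each new row by Name.
new_rows = df2.merge(df3, on='Name', how='left')

# Append the completed new rows to df1.
result = pd.concat([df1, new_rows], ignore_index=True)
print(result)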

fill blank values of one dataframe with the values of another dataframe based on conditions-pandas

I have the above dataframe df, and I have the following dataframe df2.
I want to fill the missing values in df with the values in df2 corresponding to the id.
Also, for Player1, Player2, Player3: if the value is missing, I want to replace Player1, Player2, Player3 of df with the corresponding values of df2.
Thus the resultant dataframe would look like this.
Notice that Rick, Scott, and Andrew are still forwards, as they are in df. I just replaced the players in df with the corresponding players in df2.
So far, I have attempted to fill the blank values in df with the values in df2:
import pandas as pd
import numpy as np

df = pd.read_csv('file1.csv')
for s in list(df.columns[1:]):
    df[s] = df[s].str.strip()
df.fillna('', inplace=True)
df.replace(r'', np.nan, regex=True, inplace=True)
df2 = pd.read_csv('file2.csv')
for s in list(df2.columns[1:]):
    df2[s] = df2[s].str.strip()
df.set_index('Team ID', inplace=True)
df2.set_index('Team ID', inplace=True)
df.fillna(df2, inplace=True)
df.reset_index(inplace=True)
I am getting the above result. How can I get the result shown in Image Number 3?
Using combine_first
df1=df1.set_index('Team ID')
df2=df2.set_index('Team ID')
df2=df2.combine_first(df1.replace('',np.nan)).reset_index()
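
A small sketch of what combine_first does here, with invented rosters (column names other than Team ID are assumptions):
import pandas as pd
import numpy as np

df1 = pd.DataFrame({'Team ID': [1, 2],
                    'Player1': ['Rick', 'Tom'],
                    'Position': ['Forward', '']})
df2 = pd.DataFrame({'Team ID': [1, 2],
                    'Player1': ['Scott', 'Andrew'],
                    'Position': [np.nan, 'Defense']})

df1 = df1.set_index('Team ID')
df2 = df2.set_index('Team ID')

# combine_first keeps df2's values and falls back to df1 only where df2
# is missing; empty strings in df1 are converted to NaN first.
out = df2.combine_first(df1.replace('', np.nan)).reset_index()
print(out)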
