Vlookup/Map value from one dataframe to another dataframe in Python

I want to do something similar to Excel's VLOOKUP in Python.
Here is the dataframe in which I want to look up the value 'Flow_Rate_Lupa'.
And here is the dataframe I want to fill: its missing values should be filled by matching on the same month and day. Can anyone help me work out how to do this?

I usually merge the two dataframes with an indicator column, then keep the rows where the indicator is 'both', meaning the key is present in both dataframes.
import pandas as pd
mergedData = pd.merge(df1, df2, how='left', left_on='Key1', right_on='Key2', indicator='Exists')
filteredData = mergedData[mergedData['Exists'] == 'both']

Use DataFrame.merge with a left join, then replace the missing values with Series.fillna, using DataFrame.pop to read and remove the helper column in one step:
df = df2.merge(df1, on=['month','day'], how='left', suffixes=('','_'))
df['Flow_Rate_Lupa'] = df['Flow_Rate_Lupa'].fillna(df.pop('Flow_Rate_Lupa_'))
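To make the idea concrete, here is a minimal sketch on made-up data. The sample values are assumptions, since the original dataframes are not shown; only the column names month, day and Flow_Rate_Lupa come from the question. It also assumes each (month, day) pair appears only once in df1, otherwise the left join would duplicate rows of df2.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup table: one flow rate per (month, day).
df1 = pd.DataFrame({
    "month": [1, 1, 2],
    "day": [1, 2, 1],
    "Flow_Rate_Lupa": [10.0, 11.0, 12.0],
})

# Hypothetical target table with gaps to fill.
df2 = pd.DataFrame({
    "month": [1, 1, 2, 3],
    "day": [1, 2, 1, 1],
    "Flow_Rate_Lupa": [np.nan, 99.0, np.nan, np.nan],
})

# The left join keeps every row of df2; the looked-up column gets the "_" suffix.
df = df2.merge(df1, on=["month", "day"], how="left", suffixes=("", "_"))

# Fill only the missing values; pop() removes the helper column as it is read.
df["Flow_Rate_Lupa"] = df["Flow_Rate_Lupa"].fillna(df.pop("Flow_Rate_Lupa_"))

print(df)
```

The existing value 99.0 is kept, the gaps for (1, 1) and (2, 1) are filled from df1, and the row for (3, 1) stays NaN because df1 has no entry for it.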

Related

Compare data of two columns of one dataframe with two columns of another dataframe and find mismatch data

I have dataframe df1 as following-
Second dataframe df2 is as following-
and I want the resulted dataframe as following
Dataframes df1 and df2 contain many more columns and rows; I am showing only sample data here. My goal is to compare the Customer and ID columns of df1 with the Customer and Part Number columns of df2 and find the rows of df1 whose (Customer, ID) pair has no match in df2. The mismatched rows should be stored in another dataframe, df3. For example, Customer (rishab) with ID (89ab) is present in df1 but not in df2, so its Customer, Order#, and Part are stored in df3.
I am using isin() to find the mismatches between df1 and df2 on one column, but I cannot work out how to compare two columns.
df3 = df1[~df1['ID'].isin(df2['Part Number'].values)]
# This finds mismatches on the ID column only; I also want to include Customer.
I could also use a loop, but the data is very large, so that would be slow, and I am sure there is a one-liner for this. I have also tried merge but could not produce the exact output.
So how can I produce this exact output? As far as I can tell, isin() cannot be used on two columns at once.

The easiest way to achieve this is:
df3 = df1.merge(df2, left_on = ['Customer', 'ID'],right_on= ['Customer', 'Part Number'], how='left', indicator=True)
df3.reset_index(inplace = True)
df3 = df3[df3['_merge'] == 'left_only']
Here, you first do a left join on the two key columns with indicator=True, which adds a column named _merge recording which side each row came from ('left_only', 'right_only', or 'both'). Then you keep only the rows marked left_only.
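A small self-contained sketch of this anti-join follows. The sample rows are invented for illustration (only the rishab/89ab pair and the column names come from the question), and selecting df1.columns at the end is my own addition to drop the helper columns.

```python
import pandas as pd

# Hypothetical sample data using the question's column names.
df1 = pd.DataFrame({
    "Customer": ["rishab", "amit", "neha"],
    "Order#": [1, 2, 3],
    "ID": ["89ab", "12cd", "34ef"],
})
df2 = pd.DataFrame({
    "Customer": ["amit", "neha", "rishab"],
    "Part Number": ["12cd", "34ef", "zzzz"],
})

# Left join on the (Customer, ID) pair; _merge records where each row matched.
merged = df1.merge(
    df2,
    left_on=["Customer", "ID"],
    right_on=["Customer", "Part Number"],
    how="left",
    indicator=True,
)

# Keep the rows found only in df1, restricted to df1's original columns.
df3 = merged.loc[merged["_merge"] == "left_only", df1.columns].reset_index(drop=True)

print(df3)
```

Here rishab appears in both dataframes, but not with ID 89ab, so that row is the only one reported as a mismatch.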

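You can try an outer join, which keeps the non-matching rows from both sides. Something like df3 = df1.merge(df2, left_on = ['Customer', 'ID'],right_on= ['Customer', 'Part Number'], how = "outer"). To isolate only the non-matching rows, also pass indicator=True and filter on the _merge column.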

Filling a dataframe with multiple dataframe values

I have some 100 dataframes that need to be filled in another big dataframe. Presenting the question with two dataframes
import pandas as pd
df1 = pd.DataFrame([1,1,1,1,1], columns=["A"])
df2 = pd.DataFrame([2,2,2,2,2], columns=["A"])
Please note that both the dataframes have same column names.
I have a master dataframe that has repetitive index values as follows:-
master_df=pd.DataFrame(index=df1.index)
master_df= pd.concat([master_df]*2)
Expected Output:-
master_df['A']=[1,1,1,1,1,2,2,2,2,2]
I am using a for loop to replace every n rows of master_df with df1, df2, ..., df100.
Please suggest a better way of doing it.
In fact, df1, df2, ..., df100 are the outputs of a function whose input is a value of column A (1, 2). I was wondering if there is something like
another_df=master_df['A'].apply(lambda x: function(x))
Thanks in advance.

If you want to concatenate the dataframes you could just use pandas concat with a list as the code below shows.
First you can add df1 and df2 to a list:
df_list = [df1, df2]
Then you can concat the dfs:
master_df = pd.concat(df_list)
I used the default value of 0 for 'axis' in the concat function (which is what I think you are looking for), but if you want to concatenate the different dfs side by side you can just set axis=1.
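Since the question says the dataframes come out of a function, the list can probably be built directly from that function. The following is only a sketch: make_frame is a hypothetical stand-in for the asker's function, which is not shown.

```python
import pandas as pd

def make_frame(value, n_rows=5):
    # Hypothetical stand-in for the function in the question:
    # returns a dataframe with a single column "A" filled with `value`.
    return pd.DataFrame({"A": [value] * n_rows})

inputs = [1, 2]  # would be the 100 input values in the real case

# Build all frames in a list comprehension and concatenate them once.
master_df = pd.concat([make_frame(v) for v in inputs])

print(master_df["A"].tolist())
```

Calling concat once on the whole list avoids growing master_df inside a loop. The index repeats (0 to 4, then 0 to 4 again), as in the question's master_df; pass ignore_index=True if a fresh 0...n-1 index is preferred.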

Pandas how to concat two dataframes without losing the column headers

I have the following toy code:
import pandas as pd
df = pd.DataFrame()
df["foo"] = [1,2,3,4]
df2 = pd.DataFrame()
df2["bar"]=[4,5,6,7]
df = pd.concat([df,df2], ignore_index=True,axis=1)
print(list(df))
Output: [0,1]
Expected Output: [foo,bar] (order is not important)
Is there any way to concatenate two dataframes without losing the original column headers, if I can guarantee that the headers will be unique?
Iterating through the columns and then adding them to one of the DataFrames comes to mind, but is there a pandas function, or concat parameter that I am unaware of?
Thanks!

As stated in the merge, join, and concat documentation, ignore_index=True discards the labels along the concatenation axis and uses a range (0...n-1) instead. With axis=1 that axis is the columns, which is why the headers are lost. You should get the result you want once you remove the ignore_index argument or set it to False (the default).
df = pd.concat([df, df2], axis=1)
This joins df and df2 on their indexes: rows with the same index label are placed side by side, and where one dataframe has no row for a label, the missing cells are filled with NaN.
If your dataframes have different indexes and you still want to concatenate them this way, you can either create a temporary index and join on that, or set the new dataframe's columns after using concat(..., ignore_index=True).
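A quick sketch of the difference, assuming df2 has a different index from df (the index values 10 to 13 are made up for illustration). Resetting both indexes first is one way to do the "temporary index" approach and keeps the column headers intact.

```python
import pandas as pd

df = pd.DataFrame({"foo": [1, 2, 3, 4]})
# Hypothetical case: df2 carries a different index than df.
df2 = pd.DataFrame({"bar": [4, 5, 6, 7]}, index=[10, 11, 12, 13])

# Index-aligned concat: no labels overlap, so the result has 8 rows padded with NaN.
aligned = pd.concat([df, df2], axis=1)

# Positional concat: drop both indexes first so the rows pair up by position.
positional = pd.concat(
    [df.reset_index(drop=True), df2.reset_index(drop=True)],
    axis=1,
)

print(list(positional))
```

Both results keep the column names foo and bar; only the row alignment differs.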

I don't think the accepted answer answers the question, which is about column headers, not indexes.
I am facing the same problem, and my workaround is to add the column names after the concatenation:
df.columns = ["foo", "bar"]

Left Outer Join a Data frame with a Series object by key on data frame, index on series

Is it possible to join a series object to a dataframe without having to turn the series into a dataframe?
Currently, I calculate something, get a series as a result, and have to turn the series into a dataframe to merge the two:
clicked_series = p.clicked.sum();
temp_df = pd.DataFrame({'ad_id':clicked_series.index, 'clicks':clicked_series.values})
full_df = pd.merge(full_df, temp_df, on='ad_id', how='left')
Is it possible to conduct a left outer join on the series and dataframe directly, without having to create a temporary data frame?

use reindex
full_df['clicks'] = clicked_series.reindex(full_df.ad_id).values
old answers
Use join
technically, I'm still converting to a pd.DataFrame but...
clicked_series = p.clicked.sum();
full_df = full_df.join(clicked_series.to_frame('clicks'), on='ad_id', how='left')
Another option is to use pd.concat. But this will look like an outer join.
pd.concat([full_df.set_index('ad_id'),
           clicked_series.rename('clicks')], axis=1).reset_index()
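Here is a small runnable sketch of the reindex approach on invented data (the cost column and all values are assumptions). It also shows Series.map, which is not part of the answer above but should give the same left-join-style lookup. Both assume the index of clicked_series is unique.

```python
import pandas as pd

# Hypothetical data: ad 2 appears twice, ad 3 has no clicks recorded.
full_df = pd.DataFrame({"ad_id": [1, 2, 3, 2], "cost": [0.5, 0.7, 0.2, 0.9]})
clicked_series = pd.Series([5, 8], index=pd.Index([1, 2], name="ad_id"))

# reindex looks up each ad_id in the series index; unknown ids become NaN.
full_df["clicks"] = clicked_series.reindex(full_df["ad_id"]).values

# Alternative: map each ad_id through the series, which acts as a lookup table.
via_map = full_df["ad_id"].map(clicked_series)

print(full_df)
```

The .values call in the reindex version matters: without it, the assignment would align on the reindexed labels (the ad ids) rather than on full_df's own row index.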

pandas combine_first with particular index columns?

I'm trying to join two dataframes in pandas to have the following behavior: I want to join on a specified column, but have it so redundant columns are not added to the dataframe. This is analogous to combine_first except combine_first does not seem to take an index column optional argument. Example:
# combine df1 and df2 based on "id" column
df1 = pandas.merge(df1, df2, how="outer", on=["id"])
The problem with the above is that columns common to df1/df2 aside from "id" will be added twice (with _x, _y suffixes) to df1. How can I do something like:
# Do outer join from df2 to df1, matching items by "id" but not adding
# columns that are redundant (df1 takes precedence if the values disagree)
df1.combine_first(df2, on=["id"])
How can this be done?

If you are trying to merge columns from df2 into df1 while excluding any redundant columns, the following should work.
df1.set_index("id", inplace=True)
df2.set_index("id", inplace=True)
df3 = df1.merge(df2[df2.columns.difference(df1.columns)], left_index=True, right_index=True, how="outer")
However this obviously will not update any values from df1 with values from df2 as it is only bringing in non-redundant columns. But since you said df1 will take precedence on any values that disagree, perhaps this will do the trick?
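For the behaviour the question asks for, combine_first itself may already be enough once "id" is the index of both dataframes, since it aligns on the index. This is a sketch on made-up data, not something taken from the answer above.

```python
import pandas as pd

# Hypothetical data: both frames share "id" and "a"; they disagree on a for id 2.
df1 = pd.DataFrame({"id": [1, 2], "a": [10.0, 15.0], "b": ["x", "y"]})
df2 = pd.DataFrame({"id": [2, 3], "a": [20.0, 30.0], "c": [True, False]})

# With "id" as the index, combine_first takes the union of rows and columns,
# keeps df1's values where present, and fills the remaining gaps from df2.
combined = (
    df1.set_index("id")
       .combine_first(df2.set_index("id"))
       .reset_index()
)

print(combined)
```

For id 2 the value of a stays 15.0 from df1, id 3 is brought in from df2, and the shared column a appears only once. Note that combine_first also fills NaN cells in df1 from df2, which may or may not be wanted.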
