Variable instead of dataframe name in pandas function - python

I have something like
df3 = pd.merge(df1, df2, how='inner', left_on='x', right_on='y')
But I would like the two dataframes to be represented by variables instead:
df3 = pd.merge(df_var, df_var2, how='inner', left_on='x', right_on='y')
I get this error: ValueError: can not merge DataFrame with instance of type
I'm stuck on how to get pandas to recognize the variable as the name of the dataframe. thanks!

How about storing the dataframes in a dict and referencing them by key? Here var and var2 are variables that hold the keys (for example, strings):
df_dict = {var: df1, var2: df2}
df3 = pd.merge(df_dict[var], df_dict[var2], how='inner', left_on='x', right_on='y')
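A minimal, self-contained sketch of the dictionary approach, using made-up data. The frame contents and the keys "left" and "right" are assumptions for illustration only; the point is that the variables hold string keys, and the dictionary maps those keys to the actual DataFrame objects.

```python
import pandas as pd

# Hypothetical sample data; the column names 'x' and 'y' follow the question.
df1 = pd.DataFrame({"x": [1, 2, 3], "a": ["p", "q", "r"]})
df2 = pd.DataFrame({"y": [2, 3, 4], "b": ["s", "t", "u"]})

# Map string names to the DataFrame objects themselves.
df_dict = {"left": df1, "right": df2}

# These variables hold names (strings), not DataFrames.
var = "left"
var2 = "right"

# Look the frames up by name before handing them to pd.merge.
df3 = pd.merge(df_dict[var], df_dict[var2], how="inner", left_on="x", right_on="y")
print(df3)
```

Passing var or var2 directly to pd.merge would fail, because pd.merge needs DataFrame objects rather than their names.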

If the DataFrames have been converted to strings, you need to convert them back to DataFrames before merging. Here's one way you could do it, assuming no value contains whitespace:
from io import StringIO
import pandas as pd
# to strings (whitespace-aligned text, not comma-separated)
df_var = df1.to_string(index=False)
df_var2 = df2.to_string(index=False)
# back to DataFrames; to_string pads columns with spaces, so split on whitespace
df_var = pd.read_csv(StringIO(df_var), sep=r'\s+')
df_var2 = pd.read_csv(StringIO(df_var2), sep=r'\s+')
df3 = pd.merge(df_var, df_var2, how='inner', left_on='x', right_on='y')

Related

What's the most efficient way to export multiple pandas dataframes to csv files?

I have multiple pandas dataframes:
df1
df2
df3
And I want to export them all to csv files.
df1.to_csv('df1.csv', index = False)
df2.to_csv('df2.csv', index = False)
df3.to_csv('df3.csv', index = False)
What's the most efficient way to do this?
def printToCSV(num):
    num = str(num)
    csvTemp = "df99.csv"
    csv = csvTemp.replace('99', num)
    dfTemp = "df99"
    dfString = dfTemp.replace('99', num)
    # I know I can't use the .to_csv function on a string,
    # but I don't know how to iterate through different dataframes
    dfString.to_csv(csv, index = False)
for i in range(1,4):
    printToCSV(i)
How can I call different dataframes to export?
You can add them all to a list with:
list_of_dataframes = [df1, df2, df3]
Then you can iterate through the list using enumerate to keep a running count of which dataframe you are writing, using f-strings for the filenames:
for count, dataframe in enumerate(list_of_dataframes):
    dataframe.to_csv(f"dataframe_{count}.csv", index=False)
When you create the dataframes, you can store them in a suitable data structure such as a list; when you want to write the CSV files, loop over that list:
import pandas as pd
dfs = []
for i in range(10):
    dfs.append(pd.DataFrame())
for i, df in enumerate(dfs):
    df.to_csv(f'df{i}.csv')
If you want to stick with your function approach, use:
df_dict = {"df1": df1, "df2": df2, "df3": df3}
def printToCSV(num):
    df_dict[f"df{num}"].to_csv(f"df{num}.csv", index=False)
printToCSV(1) # will create "df1.csv"
However, if you want to increase efficiency, I'd propose using a list of dataframes (as @vtasca proposed as well):
df_list = [df1, df2, df3]
for num, df in enumerate(df_list): # this will give you a number (starting from 0) and the df, in each loop
    df.to_csv(f"df{num}.csv", index=False)
Or working with a dict:
df_dict = {"df1": df1, "df2": df2, "df3": df3}
for name, df in df_dict.items():
    df.to_csv(f"{name}.csv", index=False)
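For reference, here is a small runnable sketch of the dict-based loop. The sample frames and the temporary output directory are assumptions made so the example can run anywhere; in practice you would write to your own path.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-ins for df1, df2, df3.
df_dict = {
    "df1": pd.DataFrame({"a": [1, 2]}),
    "df2": pd.DataFrame({"a": [3, 4]}),
    "df3": pd.DataFrame({"a": [5, 6]}),
}

# Write into a throwaway directory so nothing lands in the working folder.
out_dir = Path(tempfile.mkdtemp())

# The dict key becomes the file name, so no string manipulation is needed.
for name, df in df_dict.items():
    df.to_csv(out_dir / f"{name}.csv", index=False)

written = sorted(p.name for p in out_dir.iterdir())
print(written)
```

Keeping the names next to the frames in one dict avoids having to rebuild a variable name like "df1" from a number.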

Mapping Two dataframes Pandas

I want to map two dataframes in pandas. In df1 I have:
df1
My second dataframe looks like:
df2
I want to merge the two dataframes and get something like this:
merged DF
Wherever a 1 occurs in df1, it should be replaced by the corresponding value after merging.
So far I have tried:
mergedDF = pd.merge(df1,df2, on=companies)
Seems like you need the .idxmax() method.
merged = df1.merge(df2, on='Company')
merged['values'] = merged[[x for x in merged.columns if x != 'Company']].idxmax(axis=1)
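Since the original tables are not shown, here is a sketch on invented data of what idxmax(axis=1) does: for each row it returns the label of the column holding the largest value, which for 0/1 indicator columns is the column containing the 1. The column names "tech", "retail" and "Country" are hypothetical, and the indicator columns are selected explicitly so that non-numeric columns from df2 are not included.

```python
import pandas as pd

# Hypothetical data: df1 flags one category per company with a 1.
df1 = pd.DataFrame({"Company": ["A", "B"], "tech": [1, 0], "retail": [0, 1]})
df2 = pd.DataFrame({"Company": ["A", "B"], "Country": ["US", "DE"]})

merged = df1.merge(df2, on="Company")

# Restrict idxmax to the 0/1 indicator columns (an assumption about the data).
indicator_cols = ["tech", "retail"]
merged["values"] = merged[indicator_cols].idxmax(axis=1)
print(merged)
```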

Concatenate multiple dataframe and columns names

I have a list of data-frames
liste = [df1, df2, df3, df4]
sharing the same index, called "date". I concatenate them as follows:
pd.concat((dd for dd in liste), axis=1, join='inner')
But the columns all have the same name. I can override the column names manually, but I wonder if there is a way for each column to take the name of the corresponding data-frame, in this case "df1", "df2", and so on.
You can merge them as follows:
import pandas as pd
from functools import reduce
liste = [df1, df2, df3, df4]
df_final = reduce(lambda left,right: pd.merge(left,right,on='name'), liste)
Or:
... code snippet ...
df1.merge(df2,on='col_name').merge(df3,on='col_name').merge(df4,on='col_name')
Update based on comment:
To collect the column names of each dataframe automatically, you can adapt the code below to your liking (I assume each dataframe has a single column):
colnames = {}
for i in range(len(liste)):
    name = liste[i].columns
    colnames[i+1] = name
... merge with code above ...
You could use merge:
df = liste[0]
for data_frame in liste[1:]:
    df = df.merge(data_frame, left_index=True, right_index=True)
By default pandas appends _x and _y to overlapping column names, so repeated merges produce awkward or clashing names, but you can control this with suffixes=. Perhaps use the position from an enumerate in the loop?
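Another option, not mentioned in the answers above, is the keys argument of pd.concat, which labels each input frame's columns. The sketch below assumes each frame has a single column named "value" and that the labels "df1" and "df2" are supplied by hand, since a DataFrame does not know the name of the variable it is bound to.

```python
import pandas as pd

# Hypothetical frames sharing a "date" index and the same column name.
idx = pd.Index(["2020-01-01", "2020-01-02"], name="date")
df1 = pd.DataFrame({"value": [1, 2]}, index=idx)
df2 = pd.DataFrame({"value": [3, 4]}, index=idx)

liste = [df1, df2]
names = ["df1", "df2"]  # labels supplied manually

# keys= adds an outer column level holding the label of each frame.
combined = pd.concat(liste, axis=1, join="inner", keys=names)

# With one column per frame, the outer level alone identifies each column.
combined.columns = combined.columns.get_level_values(0)
print(combined)
```

If the frames have several columns each, you may prefer to keep the two-level columns instead of flattening them.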

How to call a list of DataFrames in a function? [duplicate]

I have multiple dataframes:
df1, df2, df3,..., dfn
They have the same type of data but from different groups of descriptors that cannot be joined. Now I need to apply the same function to each dataframe manually.
How can I apply the same function to multiple dataframes?
pipe + comprehension
If your dataframes contain related data, as in this case, you should store them in a list (if numeric ordering is sufficient) or dict (if you need to provide custom labels to each dataframe). Then you can pipe each dataframe through a function foo via a comprehension.
List example
df_list = [df1, df2, df3]
df_list = [df.pipe(foo) for df in df_list]
Then access your dataframes via df_list[0], df_list[1], etc.
Dictionary example
df_dict = {'first': df1, 'second': df2, 'third': df3}
df_dict = {k: v.pipe(foo) for k, v in df_dict.items()}
Then access your dataframes via df_dict['first'], df_dict['second'], etc.
If the data frames have the same columns, you could concat them into a single data frame, but otherwise there is not really a "smart" way of doing it:
df1, df2, df3 = (df.apply(...) for df in [df1, df2, df3]) # or .map or .applymap instead of .apply
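To make the pipe approach concrete, here is a short sketch with a made-up function foo that adds a row total. Both foo and the sample frames are assumptions for illustration; any function that takes a DataFrame and returns one would work the same way.

```python
import pandas as pd


def foo(df):
    """Hypothetical transformation: append a column with each row's sum."""
    return df.assign(total=df.sum(axis=1))


# Frames with different columns, as in the question.
df_dict = {
    "first": pd.DataFrame({"a": [1, 2], "b": [3, 4]}),
    "second": pd.DataFrame({"c": [5], "d": [6]}),
}

# Apply the same function to every frame, keeping the labels.
df_dict = {k: v.pipe(foo) for k, v in df_dict.items()}
print(df_dict["first"])
```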

How to combine (merge) an array of identical DataFrames into a single one?

How do I merge or combine an array of DataFrames in pandas?
dfs = []
for df in pd.read_csv(....chunksize=chunk_size):
    df1 = df
    # ....
    if condition:
        dfs.append(df1)
As you can see, they all have the same structure, I just need to combine them in a single DataFrame.
Normally you can concatenate your list of dataframes, so you could have:
dfs = []
for df in pd.read_csv(....chunksize=chunk_size):
    df1 = df
    # ....
    if condition:
        dfs.append(df1)
result = pd.concat(dfs)
You can find more info on this here.
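A runnable sketch of the same pattern, reading from an in-memory CSV so it needs no file. The data, the chunk size of 2, and the filter "value > 0" are all assumptions standing in for the elided parts of the original code.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV content standing in for the real file.
csv_text = "id,value\n1,10\n2,-5\n3,7\n4,-1\n5,3\n"

dfs = []
for chunk in pd.read_csv(StringIO(csv_text), chunksize=2):
    # Example condition: keep only rows with a positive value.
    filtered = chunk[chunk["value"] > 0]
    if not filtered.empty:
        dfs.append(filtered)

# Combine all kept pieces once, at the end, and renumber the rows.
result = pd.concat(dfs, ignore_index=True)
print(result)
```

Collecting the pieces in a list and calling pd.concat once is generally cheaper than growing a DataFrame inside the loop.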
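Older pandas versions have a DataFrame.append method to combine two DataFrames. It returns a new DataFrame rather than modifying the original, so the result must be assigned back. Note that it was deprecated in pandas 1.4 and removed in 2.0; on current versions use pd.concat instead.
See the documentation
import pandas as pd
dfs = pd.DataFrame()
for df in pd.read_csv(....chunksize=chunk_size):
    df1 = df
    # ....
    if condition:
        dfs = dfs.append(df1)  # pandas < 2.0 only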
