Concatenate multiple dataframe and columns names - python

I have a list of data-frames
liste = [df1, df2, df3, df4]
that share the same index, called "date". I concatenate them as follows:
pd.concat((dd for dd in liste), axis=1, join='inner')
But the columns all have the same name. I can override the column names manually, but I wonder whether there is a way to make each column take the name of its corresponding data-frame, in this case "df1", "df2", and so on.

You can merge them as follows:
import pandas as pd
from functools import reduce
liste = [df1, df2, df3, df4]
df_final = reduce(lambda left,right: pd.merge(left,right,on='name'), liste)
Or:
... code snippet ...
df1.merge(df2,on='col_name').merge(df3,on='col_name').merge(df4,on='col_name')
Update based on comment:
To grab the column names of each data-frame automatically, you can adapt the code below to your liking (I assume each data-frame has a single column):
colnames = {}
for i in range(len(liste)):
    name = liste[i].columns
    colnames[i+1] = name
... merge with code above ...

You could use merge:
df = liste[0]
for data_frame in liste[1:]:
    df = df.merge(data_frame, left_index=True, right_index=True)
By default, overlapping column names get the suffixes _x and _y, so repeated merges produce clashing names. You can control this with suffixes=, so perhaps use the position from an enumerate in the loop?
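Another option, not mentioned in the answers above, is to pass a dict to pd.concat so that the dict keys become the column labels. This is only a minimal sketch: it assumes each data-frame holds a single column, and the sample data and the names `frames` and `combined` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: two single-column frames sharing a "date" index.
idx = pd.Index(["2020-01-01", "2020-01-02"], name="date")
df1 = pd.DataFrame({"value": [1, 2]}, index=idx)
df2 = pd.DataFrame({"value": [3, 4]}, index=idx)

# The dict keys play the role of the data-frame names.
frames = {"df1": df1, "df2": df2}

# With a dict, concat builds MultiIndex columns: (key, original column name).
combined = pd.concat(frames, axis=1, join="inner")

# Keep only the outer level so each column is named after its frame.
combined.columns = combined.columns.get_level_values(0)
```

If the frames have more than one column each, it is probably better to keep the MultiIndex rather than flatten it, since flattening would produce duplicate column names.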

Related

What's the most efficient way to export multiple pandas dataframes to csv files?

I have multiple pandas dataframes:
df1
df2
df3
And I want to export them all to csv files.
df1.to_csv('df1.csv', index = False)
df2.to_csv('df2.csv', index = False)
df3.to_csv('df3.csv', index = False)
What's the most efficient way to do this?
def printToCSV(number):
    num = str(number)
    csvTemp = "df99.csv"
    csv = csvTemp.replace('99', num)
    dfTemp = "df99"
    dfString = dfTemp.replace('99', num)
    # I know I can't use the .to_csv function on a string,
    # but I don't know how to iterate through different dataframes
    dfString.to_csv(csv, index=False)
for i in range(1, 4):
    printToCSV(i)
How can I call different dataframes to export?
You can add them all to a list with:
list_of_dataframes = [df1, df2, df3]
Then you can iterate through the list using enumerate to keep a running count of which dataframe you are writing, using f-strings for the filenames:
for count, dataframe in enumerate(list_of_dataframes):
    dataframe.to_csv(f"dataframe_{count}.csv", index=False)
When you are creating the dataframes, you can store them in a suitable data structure such as a list, and then loop over it when you want to create the CSV files.
import pandas as pd
dfs = []
for i in range(10):
    dfs.append(pd.DataFrame())
# a plain loop is clearer than a comprehension used only for its side effects
for i, df in enumerate(dfs):
    df.to_csv(f'df{i}.csv')
If you want to stick with your function approach, use:
df_dict = {"df1": df1, "df2": df2, "df3": df3}
def printToCSV(num):
    df_dict[f"df{num}"].to_csv(f"df{num}.csv", index=False)
printToCSV(1)  # will create "df1.csv"
However, if you want to increase efficiency, I'd propose using a df list (as @vtasca proposed as well):
df_list = [df1, df2, df3]
for num, df in enumerate(df_list):  # this will give you a number (starting from 0) and the df, in each loop
    df.to_csv(f"df{num}.csv", index=False)
Or working with a dict:
df_dict = {"df1": df1, "df2": df2, "df3": df3}
for name, df in df_dict.items():
    df.to_csv(f"{name}.csv", index=False)
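For completeness, here is a self-contained version of the dict approach that can be run as-is. It is only a sketch: the sample frames, the temporary output directory, and the `written` list are assumptions added for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample frames keyed by the name each file should get.
df_dict = {
    "df1": pd.DataFrame({"a": [1, 2]}),
    "df2": pd.DataFrame({"a": [3, 4]}),
}

# Write into a throwaway directory so the example leaves no files behind in the cwd.
out_dir = Path(tempfile.mkdtemp())

written = []
for name, df in df_dict.items():
    path = out_dir / f"{name}.csv"
    df.to_csv(path, index=False)
    written.append(path)
```

Keeping the names as dict keys avoids having to look up variables by string, which is what the original function was trying to do.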

Merging DFs from two different lists in python

There are two lists whose elements are DataFrames with a DatetimeIndex:
lst_1 = [df1, df2, df3, df4]  # each has the same column, 'price'
lst_2 = [df1, df2, df3, df4]  # each has the same column, 'quantity'
I am merging them one pair at a time with the pandas merge function. I tried something where I combine the two lists and wrote a function like this:
def df_merge(df1_price, df1_quantity):
    p_q_df1 = pd.merge(df1_price, df1_quantity, on='Dates')
    return p_q_df1
# this merged df now has the price and quantity for df1 from lst_1 and lst_2
But I still have to apply it to every pair again. Is there a better way, maybe a loop, to automate this?
Consider elementwise looping with zip which can be handled in a list comprehension.
# DATES AS INDEX
final_lst = [pd.concat([i, j], axis=1) for i, j in zip(lst_1, lst_2)]
# DATES AS COLUMN
final_lst = [pd.merge(i, j, on='Dates') for i, j in zip(lst_1, lst_2)]
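As a quick check of the index-based variant, here is a small runnable sketch. The sample prices, quantities, and dates are invented for illustration. Note that pd.concat expects the frames wrapped in a list.

```python
import pandas as pd

# Hypothetical sample data: two price frames and two quantity frames on the same dates.
dates = pd.to_datetime(["2021-01-01", "2021-01-02"])
lst_1 = [
    pd.DataFrame({"price": [10.0, 11.0]}, index=dates),
    pd.DataFrame({"price": [20.0, 21.0]}, index=dates),
]
lst_2 = [
    pd.DataFrame({"quantity": [1, 2]}, index=dates),
    pd.DataFrame({"quantity": [3, 4]}, index=dates),
]

# zip pairs the first price frame with the first quantity frame, and so on.
final_lst = [pd.concat([p, q], axis=1) for p, q in zip(lst_1, lst_2)]
```

Each element of final_lst then holds both the price and quantity columns for one pair.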
IIUC, you could concat your DataFrames and then merge:
dfs_1 = pd.concat(lst_1)
dfs_2 = pd.concat(lst_2)
pd.merge(dfs_1,dfs_2,on='Dates',how='outer')
# change how to specify the behavior of the merge.
I'm assuming your dataframes are the same shape so they can be concatenated.
If you want to merge multiple dataframes in your list, you can use the reduce function from the Python standard library with an outer merge to keep every possible row.
from functools import reduce
import pandas as pd
lst_1 = [df1, df2, df3, df4]
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['Dates'],
                                                how='outer'), lst_1)
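lst_1 = [df1, df2, df3, df4]  # each has the same column, 'price'
lst_2 = [df1, df2, df3, df4]  # each has the same column, 'quantity'
def merge(lst_1, lst_2):
    # Start from the first frame: an empty DataFrame has no 'Dates' column to merge on.
    frames = lst_1 + lst_2
    df = frames[0]
    for _df in frames[1:]:
        # Repeated column names get _x/_y suffixes and can clash; rename columns first if needed.
        df = df.merge(_df, on='Dates')
    return df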

How to call a list of DataFrames in a function? [duplicate]

I have multiple dataframes:
df1, df2, df3,..., dfn
They have the same type of data but from different groups of descriptors that cannot be joined. Now I need to apply the same function to each dataframe manually.
How can I apply the same function to multiple dataframes?
pipe + comprehension
If your dataframes contain related data, as in this case, you should store them in a list (if numeric ordering is sufficient) or dict (if you need to provide custom labels to each dataframe). Then you can pipe each dataframe through a function foo via a comprehension.
List example
df_list = [df1, df2, df3]
df_list = [df.pipe(foo) for df in df_list]
Then access your dataframes via df_list[0], df_list[1], etc.
Dictionary example
df_dict = {'first': df1, 'second': df2, 'third': df3}
df_dict = {k: v.pipe(foo) for k, v in df_dict.items()}
Then access your dataframes via df_dict['first'], df_dict['second'], etc.
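To make the dictionary version concrete, here is a minimal runnable sketch. The function add_total is a hypothetical stand-in for foo, and the sample frames are made up.

```python
import pandas as pd

def add_total(df):
    # Hypothetical stand-in for foo: returns a copy with a row-sum column added.
    return df.assign(total=df.sum(axis=1))

# Hypothetical sample frames with the same kind of data.
df_dict = {
    "first": pd.DataFrame({"a": [1, 2], "b": [3, 4]}),
    "second": pd.DataFrame({"a": [5], "b": [6]}),
}

# pipe passes each dataframe as the first argument of add_total.
df_dict = {k: v.pipe(add_total) for k, v in df_dict.items()}
```

Because add_total returns a new dataframe, the originals are left untouched and the dict simply ends up pointing at the transformed copies.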
If the data frames have the same columns, you could concat them into a single data frame; otherwise there is not really a "smart" way of doing it:
df1, df2, df3 = (df.apply(...) for df in [df1, df2, df3])  # or .applymap (renamed DataFrame.map in pandas 2.1)

Variable instead of dataframe name in pandas function

I have something like
df3 = pd.merge(df1, df2, how='inner', left_on='x', right_on='y')
But I would like the two dataframes to be represented by variables instead:
df3 = pd.merge(df_var, df_var2, how='inner', left_on='x', right_on='y')
I get this error: ValueError: can not merge DataFrame with instance of type
I'm stuck on how to get pandas to recognize the variable as the name of the dataframe. thanks!
How about storing the dataframes in a dict and referencing them using the keys:
df_dict = {var: df1, var2: df2}
df3 = pd.merge(df_dict[var], df_dict[var2], how='inner', left_on='x', right_on='y')
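Here is a small runnable sketch of that idea, assuming the variables hold the dataframe names as strings (which is what the error message suggests). The sample data and the dict name `frames` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample frames to merge on x == y.
df1 = pd.DataFrame({"x": [1, 2, 3], "v1": ["a", "b", "c"]})
df2 = pd.DataFrame({"y": [2, 3, 4], "v2": ["p", "q", "r"]})

# Map each name (a string) to the actual DataFrame object.
frames = {"df1": df1, "df2": df2}

# The variables hold names, not DataFrames, so look them up before merging.
df_var, df_var2 = "df1", "df2"
df3 = pd.merge(frames[df_var], frames[df_var2],
               how="inner", left_on="x", right_on="y")
```

Passing the strings themselves to pd.merge is what raises the ValueError; the dict lookup turns each name back into a DataFrame.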
If the DataFrames have been converted to strings, you need to convert them back to DataFrames before merging. Here's one way you could do it:
from io import StringIO
import pandas as pd
# to strings
df_var = df1.to_string(index=False)
df_var2 = df2.to_string(index=False)
# back to DataFrames (to_string output is whitespace-aligned, so split on whitespace;
# this assumes no value contains spaces)
df_var = pd.read_csv(StringIO(df_var), sep=r'\s+')
df_var2 = pd.read_csv(StringIO(df_var2), sep=r'\s+')
df3 = pd.merge(df_var, df_var2, how='inner', left_on='x', right_on='y')

How do I combine two dataframes?

I have an initial dataframe D. I extract two data frames from it like this:
A = D[D.label == k]
B = D[D.label != k]
I want to combine A and B into one DataFrame. The order of the data is not important. However, when we sample A and B from D, they retain their indexes from D.
DEPRECATED: DataFrame.append and Series.append were deprecated in v1.4.0 and removed in pandas 2.0; use pd.concat instead.
Use append:
df_merged = df1.append(df2, ignore_index=True)
And to keep their indexes, set ignore_index=False.
Use pd.concat to join multiple dataframes:
df_merged = pd.concat([df1, df2], ignore_index=True, sort=False)
Merge across rows:
df_row_merged = pd.concat([df_a, df_b], ignore_index=True)
Merge across columns:
df_col_merged = pd.concat([df_a, df_b], axis=1)
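Applied to the question's A and B, a minimal sketch might look like this. The contents of D and the value of k are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the original dataframe D and the label k.
D = pd.DataFrame({"label": [0, 1, 0, 1], "value": [10, 20, 30, 40]})
k = 1

A = D[D.label == k]  # keeps index 1, 3 from D
B = D[D.label != k]  # keeps index 0, 2 from D

# Stack the rows and renumber the index from 0.
combined = pd.concat([A, B], ignore_index=True)

# Alternatively, keep the original index and sort it to recover D's row order.
restored = pd.concat([A, B]).sort_index()
```

Which variant fits depends on whether the original index from D still carries meaning after the split.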
If you're working with big data and need to concatenate multiple datasets, calling concat many times can hurt performance.
If you don't want to create a new DataFrame each time, you can instead collect the frames and call concat only once:
frames = [df_A, df_B] # Or perform operations on the DFs
result = pd.concat(frames)
This is pointed out in the pandas docs, in a note at the bottom of the section on concatenating objects:
Note: It is worth noting however, that concat (and therefore append)
makes a full copy of the data, and that constantly reusing this
function can create a significant performance hit. If you need to use
the operation over several datasets, use a list comprehension.
If you want to update or replace the values of the first dataframe df1 with the values of the second dataframe df2, you can do it with the following steps:
Step 1: Set the index of the first dataframe (df1). set_index returns a new dataframe, so assign the result back.
df1 = df1.set_index('id')
Step 2: Set the index of the second dataframe (df2)
df2 = df2.set_index('id')
Finally, update the dataframe using the following snippet:
df1.update(df2)
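Put together, the steps above can be sketched as follows. The 'id' and 'value' columns and their contents are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical sample frames keyed by an 'id' column.
df1 = pd.DataFrame({"id": [1, 2, 3], "value": [10, 20, 30]})
df2 = pd.DataFrame({"id": [2, 3], "value": [200, 300]})

# set_index returns a new frame, so the result must be assigned back.
df1 = df1.set_index("id")
df2 = df2.set_index("id")

# update modifies df1 in place, aligning on the index and overwriting
# values wherever df2 has a non-NA entry.
df1.update(df2)
```

Rows of df1 whose id does not appear in df2 (id 1 here) keep their original values.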
To join 2 pandas dataframes by column, using their indices as the join key, you can do this:
both = a.join(b)
And if you want to join multiple DataFrames, Series, or a mixture of them, by their index, just put them in a list, e.g.:
everything = a.join([b, c, d])
See the pandas docs for DataFrame.join().
# collect excel content into list of dataframes
data = []
for excel_file in excel_files:
    data.append(pd.read_excel(excel_file, engine="openpyxl"))
# concatenate dataframes horizontally
df = pd.concat(data, axis=1)
# save combined data to excel
df.to_excel(excelAutoNamed, index=False)
You can try the above when you are appending horizontally. Hope this helps someone.
Use this code to attach two Pandas Data Frames horizontally:
df3 = pd.concat([df1, df2],axis=1, ignore_index=True, sort=False)
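You must specify along which axis you intend to concatenate the two frames. Note that with axis=1, ignore_index=True replaces the column labels with 0, 1, 2, and so on.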
