I have two lists whose elements are DataFrames with a DatetimeIndex:
lst_1 = [df1, df2, df3, df4]  # every df here has the same single column, 'price'
lst_2 = [df1, df2, df3, df4]  # every df here has the same single column, 'quantity'
I am merging them one pair at a time with the pandas merge function. I tried writing a function like this:
def df_merge(df_a, df_b):
    p_q_df1 = pd.merge(df_a, df_b, on='Dates')
    return p_q_df1
# this merged df now has both price and quantity, representing df1 from lst_1 and lst_2
Still, I have to apply this to every pair by hand. Is there a better way, maybe a loop, to automate this?
Consider elementwise looping with zip, which can be handled in a list comprehension.
# DATES AS INDEX
final_lst = [pd.concat([i, j], axis=1) for i, j in zip(lst_1, lst_2)]
# DATES AS COLUMN
final_lst = [pd.merge(i, j, on='Dates') for i, j in zip(lst_1, lst_2)]
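A minimal sketch of the index-based variant with made-up frames, just to show the shape of the result (the values are illustrative):

import pandas as pd

idx = pd.date_range('2021-01-01', periods=3, name='Dates')
lst_1 = [pd.DataFrame({'price': [10, 11, 12]}, index=idx) for _ in range(2)]
lst_2 = [pd.DataFrame({'quantity': [5, 6, 7]}, index=idx) for _ in range(2)]

# pair the i-th price frame with the i-th quantity frame, side by side
final_lst = [pd.concat([i, j], axis=1) for i, j in zip(lst_1, lst_2)]
print(final_lst[0])  # columns: price, quantity; index: Dates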
IIUC, you could concat your dfs and then merge:
dfs_1 = pd.concat(lst_1)
dfs_2 = pd.concat(lst_2)
pd.merge(dfs_1, dfs_2, on='Dates', how='outer')
# change how to specify the behavior of the merge.
I'm assuming your dataframes are the same shape so they can be concatenated.
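A rough sketch of that approach with toy frames (values made up):

import pandas as pd

lst_1 = [pd.DataFrame({'Dates': ['2021-01-01'], 'price': [10]}),
         pd.DataFrame({'Dates': ['2021-01-02'], 'price': [11]})]
lst_2 = [pd.DataFrame({'Dates': ['2021-01-01'], 'quantity': [5]}),
         pd.DataFrame({'Dates': ['2021-01-02'], 'quantity': [6]})]

dfs_1 = pd.concat(lst_1)  # one long price frame
dfs_2 = pd.concat(lst_2)  # one long quantity frame
print(pd.merge(dfs_1, dfs_2, on='Dates', how='outer'))  # Dates, price, quantity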
If you want to merge multiple dataframes in your list, you can use the reduce function from the standard library's functools, with an outer merge to keep every possible row.
from functools import reduce
lst_1 = [df1, df2, df3, df4]
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['Dates'],
                                                how='outer'), lst_1)
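To make the folding explicit, that reduce call is equivalent to nesting the merges by hand (df1..df4 as above, pandas imported as pd):

df_merged = pd.merge(
    pd.merge(
        pd.merge(df1, df2, on=['Dates'], how='outer'),
        df3, on=['Dates'], how='outer'),
    df4, on=['Dates'], how='outer')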
lst_1 = [df1, df2, df3, df4]  # every df has the single column 'price'
lst_2 = [df1, df2, df3, df4]  # every df has the single column 'quantity'
def merge(lst_1, lst_2):
    # seed with the first frame: an empty DataFrame has no 'Dates' column to merge on
    df = lst_1[0]
    for _df in lst_1[1:] + lst_2:
        df = df.merge(_df, on='Dates')
    return df
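Usage would then be as below (lst_1 and lst_2 as in the question; note that with many identically named 'price'/'quantity' columns, pandas adds _x/_y suffixes, so you may want to rename columns first):

combined = merge(lst_1, lst_2)  # one wide frame keyed by 'Dates'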
Related
I have multiple pandas dataframes:
df1
df2
df3
And I want to export them all to csv files.
df1.to_csv('df1.csv', index=False)
df2.to_csv('df2.csv', index=False)
df3.to_csv('df3.csv', index=False)
What's the most efficient way to do this?
def printToCSV(number):
    num = str(number)
    csvTemp = "df99.csv"
    csv = csvTemp.replace('99', num)
    dfTemp = "df99"
    dfString = dfTemp.replace('99', num)
    # i know i can't call .to_csv on a string,
    # but i don't know how to iterate through different dataframes
    dfString.to_csv(csv, index=False)

for i in range(1, 4):
    printToCSV(i)
How can I call the different dataframes to export them?
You can add them all to a list with:
list_of_dataframes = [df1, df2, df3]
Then you can iterate through the list, using enumerate to keep a running count of which dataframe you are writing, and f-strings for the filenames:
for count, dataframe in enumerate(list_of_dataframes):
    dataframe.to_csv(f"dataframe_{count}.csv", index=False)
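If the filenames should keep matching the original df1/df2/df3 numbering, enumerate takes a start argument:

for count, dataframe in enumerate(list_of_dataframes, start=1):
    dataframe.to_csv(f"df{count}.csv", index=False)  # writes df1.csv, df2.csv, df3.csv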
When you create the dataframes, store them in a suitable data structure such as a list; then, when you want to write the csv files, you can do it in a single comprehension:
dfs = []
for i in range(10):
    dfs.append(pd.DataFrame())  # placeholder frames; use your real dataframes here
result = [dfs[i].to_csv(f'df{i}.csv') for i in range(10)]
If you want to stick with your function approach, use:
df_dict = {"df1": df1, "df2": df2, "df3": df3}
def printToCSV(num):
    df_dict[f"df{num}"].to_csv(f"df{num}.csv", index=False)

printToCSV(1)  # will create "df1.csv"
However, if you want something more efficient, I'd propose using a list of dfs (as #vtasca proposed as well):
df_list = [df1, df2, df3]
for num, df in enumerate(df_list):  # gives a number (starting from 0) and the df on each loop
    df.to_csv(f"df{num}.csv", index=False)
Or working with a dict:
df_dict = {"df1": df1, "df2": df2, "df3": df3}
for name, df in df_dict.items():
    df.to_csv(f"{name}.csv", index=False)
Using Pandas 1.2.1
MRE:
df_a = pd.DataFrame({"A": [1, 2, 3, 4], "B": [33, 44, 55, 66]})
df_b = pd.DataFrame({"B": [33, 44, 99], "C": ["v", "z", "z"]})
df_c = pd.DataFrame({"A": [3, 4, 77, 55], "D": ["aa", "bb", "cc", "dd"]})
Using the three dfs created above, I want to join all of them together; however:
df_a and df_b share column "B", therefore they join on column "B"
df_a and df_c share column "A", therefore they join on column "A"
I want to left-join df_b and df_c onto df_a. Currently this is my method:
merged_df = pd.merge(df_a, df_b, on=["B"], how="left")
merged_df = pd.merge(merged_df, df_c, on=["A"], how="left")
I know this works fine, but I can't stop thinking there must be an easier, faster way. There are multiple questions about joining multiple dfs on the same column using the reduce function, but I could not find a solution for my case.
You can omit the on parameter, so each merge uses the intersection of the column names of the two DataFrames:
merged_df = pd.merge(df_a, df_b, how="left")
merged_df = pd.merge(merged_df, df_c, how="left")
A more dynamic option is reduce, again with the on parameter omitted:
from functools import reduce
dfList = [df_a, df_b, df_c]
df = reduce(lambda left, right: pd.merge(left, right, how="left"), dfList)
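Applied to the MRE above, reduce folds left to right: df_a is merged with df_b on the shared "B", then that result with df_c on the shared "A". The output should look roughly like:

print(df)
#    A   B    C    D
# 0  1  33    v  NaN
# 1  2  44    z  NaN
# 2  3  55  NaN   aa
# 3  4  66  NaN   bb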
I have a list of data-frames
liste = [df1, df2, df3, df4]
sharing the same index, called "date". I concatenate them as follows:
pd.concat(liste, axis=1, join='inner')
But the columns all have the same name. I can override the column names manually, but I wonder if there is a way for each column to take the name of its data-frame, in this case "df1", "df2", etc.
You can replace them as follows:
import pandas as pd
from functools import reduce
liste = [df1, df2, df3, df4]
df_final = reduce(lambda left, right: pd.merge(left, right, on='name'), liste)
Or:
df1.merge(df2, on='col_name').merge(df3, on='col_name').merge(df4, on='col_name')
Update based on comment:
An example of automatically grabbing the column name of each df (assuming each is a single-column frame), which you can integrate with the code above to your liking:
colnames = {}
for i in range(len(dfs)):
    name = dfs[i].columns
    colnames[i + 1] = name
... merge with code above ...
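Putting that together, one possible sketch names each column after its frame's position in the list (assumes single-column frames; the f'df{i + 1}' convention is just illustrative):

named = [d.rename(columns={d.columns[0]: f'df{i + 1}'})
         for i, d in enumerate(liste)]
result = pd.concat(named, axis=1, join='inner')  # columns: df1, df2, ...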
You could use merge:
df = liste[0]
for data_frame in liste[1:]:
    df = df.merge(data_frame, left_index=True, right_index=True)
By default you'll get _y appended to overlapping columns, so you'll end up with _y_y etc., but you can control this with suffixes=; perhaps use the position from an enumerate in the loop, as in the sketch below.
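A sketch of that suffixes idea (the '_df{pos}' naming is just one possible convention):

df = liste[0]
for pos, data_frame in enumerate(liste[1:], start=2):
    # keep the left-hand names; tag the incoming frame's overlapping columns by position
    df = df.merge(data_frame, left_index=True, right_index=True,
                  suffixes=('', f'_df{pos}'))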
I have a list where every element is itself a DataFrame, and these dfs have duplicate datetime indices. I want to remove every duplicate index from every df in that list.
list_dfs = [df_1, df_2, df_3, df_4]
dtype='datetime64[ns]'  # index dtype of every df in list_dfs
I am using this list comprehension. It removes the duplicate indices, but it also drops the columns; in the end I am left with only the indices.
[df.index.drop_duplicates(keep='last') for df in list_dfs]
Any idea how one can achieve it?
Use Index.duplicated, filtering by boolean indexing with ~ to invert the boolean mask:
df = pd.DataFrame({
'A':list('abcdef'),
'F':list('aaabbb')
}).set_index('F')
df1 = pd.DataFrame({
'A':list('tyuio'),
'F':list('rrffv')
}).set_index('F')
list_dfs = [df, df1]
L = [df[~df.index.duplicated(keep='last')] for df in list_dfs]
print(L)
[ A
F
a c
b f, A
F
r y
f i
v o]
I have multiple dataframes:
df1, df2, df3,..., dfn
They have the same type of data but come from different groups of descriptors that cannot be joined, so right now I apply the same function to each dataframe manually.
How can I apply the same function to multiple dataframes?
pipe + comprehension
If your dataframes contain related data, as in this case, you should store them in a list (if numeric ordering is sufficient) or a dict (if you need to provide custom labels for each dataframe). Then you can pipe each dataframe through a function foo via a comprehension.
List example
df_list = [df1, df2, df3]
df_list = [df.pipe(foo) for df in df_list]
Then access your dataframes via df_list[0], df_list[1], etc.
Dictionary example
df_dict = {'first': df1, 'second': df2, 'third': df3}
df_dict = {k: v.pipe(foo) for k, v in df_dict.items()}
Then access your dataframes via df_dict['first'], df_dict['second'], etc.
If the data frames have the same columns you could concat them into a single data frame (see the sketch below); otherwise there is not really a "smart" way of doing it:
df1, df2, df3 = (df.apply(...) for df in [df1, df2, df3]) # or either .map or .applymap
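For the same-columns case, a sketch of the concat route, using keys so the frames can be split apart again afterwards (the frames and foo are hypothetical stand-ins):

import pandas as pd

df1 = pd.DataFrame({'x': [1, 2]})
df2 = pd.DataFrame({'x': [3, 4]})

def foo(df):
    return df * 10  # stand-in for whatever per-frame transformation you need

combined = foo(pd.concat([df1, df2], keys=['df1', 'df2']))  # apply once to the stack
df1_new, df2_new = combined.loc['df1'], combined.loc['df2']  # split back per source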