I have a list of dataframes:
all_df = [df1, df2, df3]
I would like to remove rows with duplicated indices in all dataframes in the list, such that the changes are reflected in the original dataframes df1, df2 and df3.
I tried to do
for df in all_df:
    df = df[~df.index.duplicated()]
But this only rebinds the loop variable; neither the list nor the original dataframes change.
Essentially, I want to avoid doing the following:
df1 = df1[~df1.index.duplicated()]
df2 = df2[~df2.index.duplicated()]
df3 = df3[~df3.index.duplicated()]
all_df = [df1,df2,df3]
You need to recreate the list of DataFrames:
all_df = [df[~df.index.duplicated()] for df in all_df]
Or:
for i, df in enumerate(all_df):
    all_df[i] = df[~df.index.duplicated()]
print(all_df[0])
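A small self-contained sketch of the rebuild approach (the frames here are toy data with duplicated index labels, not the asker's originals):

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1, 2, 3]}, index=[0, 0, 1])
df2 = pd.DataFrame({'y': [4, 5]}, index=[2, 2])
all_df = [df1, df2]

# rebuild the list, keeping only the first row for each duplicated index label
all_df = [df[~df.index.duplicated()] for df in all_df]
print(all_df[0])
```

Note that the names df1 and df2 still point at the old, unfiltered frames; only the list entries are new.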
EDIT: If the names matter, use a dictionary of DataFrames instead. The originals df1 and df2 are still not modified in place; select the deduplicated frames by the dictionary keys:
d = {'price': df1, 'volumes': df2}
d = {k: df[~df.index.duplicated()] for k, df in d.items()}
print(d['price'])
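A quick runnable sketch of the dictionary variant, again with toy frames:

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1, 2]}, index=[0, 0])
df2 = pd.DataFrame({'y': [3, 4]}, index=[1, 2])
d = {'price': df1, 'volumes': df2}

# deduplicate every frame while keeping the names as dict keys
d = {k: df[~df.index.duplicated()] for k, df in d.items()}
print(d['price'])
```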
I have a list of dataframes with a respective column names to join so for example:
dfs = [df1, df2, df3, df4]
col_join = ["col1", "col2", "col3"]
I have seen answers using the reduce function in Python.
import pandas as pd
from functools import reduce
reduce(lambda x, y: pd.merge(x, y, on=["Col1"], how="outer"), dfs)
What I want to achieve is the following:
df1 Columns:
Data1 Dim1 Dim2 Dim3
df2 Columns:
Example1 Example2 Example3
df3 Columns:
Other1 Other2 Other3
df1 to df2 is joined by Dim1 to Example1.
df1 to df3 is joined by Dim2 to Other1.
This list goes until df(n) where n can be even 20 dataframes being joined to df1 on different column names.
My idea is to pass a function with the list of the original df1 and the remainder df2, df3, df4 ... dfn.
Other argument is the list of merging columns, like above for example it would be: left_on=["Dim1"], right_on=["Example1"].
Next would be, joining df1 (already including df2 in the join) to df3 on Dim2 and Other1.
Each dataframe will be joined to df1 on a different column, which may or may not share the same name as in df1; that's why left and right should be separate arguments.
How to incorporate the fact that merging columns are changing at each join inside the reduce function?
Thank you in advance.
This might work (untested), assuming col_join is a list of (left, right) column pairs, one per dataframe after df1:
result = df1
for df, (lcol, rcol) in zip(dfs[1:], col_join):
    result = pd.merge(result, df, left_on=lcol, right_on=rcol, how='outer')
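A toy run of the pairwise idea, using hypothetical data shaped like the example columns above (Dim1 matched to Example1, Dim2 to Other1):

```python
import pandas as pd

df1 = pd.DataFrame({'Data1': [10, 20], 'Dim1': ['a', 'b'], 'Dim2': ['x', 'y']})
df2 = pd.DataFrame({'Example1': ['a', 'b'], 'Example2': [1, 2]})
df3 = pd.DataFrame({'Other1': ['x', 'y'], 'Other2': [3, 4]})

# one (left, right) column pair per dataframe to be joined onto df1
pairs = [('Dim1', 'Example1'), ('Dim2', 'Other1')]

result = df1
for df, (lcol, rcol) in zip([df2, df3], pairs):
    result = pd.merge(result, df, left_on=lcol, right_on=rcol, how='outer')
print(result)
```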
I have two dataframes that I want to merge into one. The first has an ID column, while the second holds the same values in a column named id_number. I tried the code below, but final_df ends up with both the ID and id_number columns and their values. How can I keep only one id column after merging?
final_df = df.merge(
    df2,
    left_on='ID',
    right_on='id_number',
    how='inner')
Also, say column A of df has the following format:
A
0
1
2
The same column A in the second dataframe has some empty fields, like this:
A
-
1
2
After the merge, how can the final dataframe combine the two so that A has no empty values?
Try selecting the required columns after the merge:
final_df = df.merge(
    df2,
    left_on='ID',
    right_on='id_number',
    how='inner')[['ID', 'col1', 'col2']]
Or drop the column after the merge:
final_df = df.merge(
    df2,
    left_on='ID',
    right_on='id_number',
    how='inner').drop(['id_number'], axis=1)
The solution you're looking for:
df.combine_first(df2.rename(columns={'id_number': 'ID'}))
A full working example:
import pandas as pd
dfa = pd.DataFrame({'ID': [1, 2, 3], 'other': ['A', 'B', 'C']})
dfb = pd.DataFrame({'id_number': [None, 2, 3], 'other_2': ['A2', 'B2', 'C2']})
dfa.combine_first(dfb.rename(columns={'id_number': 'ID'}))
Rename the id_number column of df2 to ID on the fly:
final_df = df.merge(
    df2.rename(columns={'id_number': 'ID'}),
    on='ID',
    how='inner')
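A runnable sketch of the rename-then-merge approach, with toy data (the column names col1/col2 are placeholders for the asker's real columns):

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3], 'col1': ['a', 'b', 'c']})
df2 = pd.DataFrame({'id_number': [2, 3, 4], 'col2': ['B', 'C', 'D']})

# renaming first lets merge collapse both key columns into a single ID column
final_df = df.merge(df2.rename(columns={'id_number': 'ID'}), on='ID', how='inner')
print(final_df)
```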
I have 2 dataframes: df1 and df2. I would like to merge the two on the link column in df2. The link column in df2 contains a comma-separated list of column=value conditions to match in df1:
df1 = pd.DataFrame({'p':[1,2,3,4], 'a':[1,2,2,2],'b':['z','z','z','z'],'c':[3,3,4,4],'d':[5,5,5,6]})
df2 = pd.DataFrame({'e':[11,22,33,44], 'link':['a=1,c=3','a=2,c=3','a=2,c=4,d=5','a=2,c=4']})
The result should be a dataframe like this, where column e from df2 is merged into df1:
df_res = pd.DataFrame({'p':[1,2,3,3,4], 'a':[1,2,2,2,2],'b':['z','z','z','z','z'],'c':[3,3,4,4,4],'d':[5,5,5,5,6],'e':[11,22,33,44,44]})
How can this be done in pandas?
Assigning df1["e"] = df2["e"] only aligns rows by position and ignores the link conditions. Parse each link into column/value pairs and filter df1 per row of df2 instead:
import numpy as np
out = []
for _, row in df2.iterrows():
    cond = dict(kv.split('=') for kv in row['link'].split(','))
    mask = np.logical_and.reduce([df1[c] == int(v) for c, v in cond.items()])
    out.append(df1[mask].assign(e=row['e']))
df_res = pd.concat(out).sort_values('p', kind='stable').reset_index(drop=True)
I have four dataframes (df1, df2, df3, df4).
Sometimes df1 is null, sometimes df2 is null, and likewise df3 and df4.
How can I do an outer merge so that any dataframe which is empty is automatically ignored? I am using the code below to merge as of now:
df = (f1.result()
      .merge(f2.result(), how='left', on='time')
      .merge(f3.result(), how='left', on='time')
      .merge(f4.result(), how='left', on='time'))
and
df = reduce(lambda x,y: pd.merge(x,y, on='time', how='outer'), [f1.result(),f2.result(),f3.result(),f4.result()])
You can use the df.empty attribute or len(df) > 0 to check whether a dataframe is empty.
Try this:
from functools import reduce
import pandas as pd

dfs = [df1, df2, df3, df4]
non_empty_dfs = [df for df in dfs if not df.empty]
df_final = reduce(lambda left, right: pd.merge(left, right, on='time', how='outer'), non_empty_dfs)
Or you could filter out empty dataframes with:
non_empty_dfs = [df for df in dfs if len(df) > 0]
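A self-contained sketch with toy frames, where df3 is left empty to stand in for the "sometimes null" case:

```python
import pandas as pd
from functools import reduce

df1 = pd.DataFrame({'time': [1, 2], 'a': [10, 20]})
df2 = pd.DataFrame({'time': [2, 3], 'b': [30, 40]})
df3 = pd.DataFrame(columns=['time', 'c'])   # empty on this run
df4 = pd.DataFrame({'time': [1, 3], 'd': [50, 60]})

# keep only the non-empty frames, then chain the outer merges
non_empty_dfs = [df for df in [df1, df2, df3, df4] if not df.empty]
df_final = reduce(lambda l, r: pd.merge(l, r, on='time', how='outer'), non_empty_dfs)
print(df_final)
```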
Use pandas' DataFrame.empty attribute to filter out the empty dataframe, then you can concatenate or run whatever merge operation you have in mind:
df4 = pd.DataFrame({'A':[]}) #empty dataframe
df1 = pd.DataFrame({'B':[2]})
df2 = pd.DataFrame({'C':[3]})
df3 = pd.DataFrame({'D':[4]})
dfs = [df1,df2,df3,df4]
# concat
# you can run other operations now that the empty dataframe is filtered out
pd.concat([df for df in dfs if not df.empty], axis=1)
B C D
0 2 3 4
I have 10,000 data points that I'm sorting into dictionaries and then exporting to a csv using pandas. I'm sorting temperatures, pressures and flows associated with a key. But when doing this I get: https://imgur.com/a/aNX7RHf
but I want something like this: https://imgur.com/a/ZxJgPv4
I'm transposing my dataframe so the index becomes the rows, but in this case I want only 3 rows (1, 2 and 3), with all the data populating those rows.
flow_dictionary = {'200:P1F1':[5.5, 5.5, 5.5]}
pres_dictionary = {'200:PT02': [200, 200, 200],
                   '200:PT03': [200, 200, 200],
                   '200:PT06': [66, 66, 66],
                   '200:PT07': [66, 66, 66]}
temp_dictionary = {'200:TE02': [27, 27, 27],
                   '200:TE03': [79, 79, 79],
                   '200:TE06': [113, 113, 113],
                   '200:TE07': [32, 32, 32]}
df = pd.DataFrame.from_dict(temp_dictionary, orient='index').T
df2 = pd.DataFrame.from_dict(pres_dictionary, orient='index').T
df3 = pd.DataFrame.from_dict(flow_dictionary, orient='index').T
df = df.append(df2, ignore_index=False, sort=True)
df = df.append(df3, ignore_index=False, sort=True)
df.to_csv('processedSegmentedData.csv')
SOLUTION (pd.concat, which also avoids the now-deprecated DataFrame.append):
df1 = pd.DataFrame.from_dict(temp_dictionary, orient='index').T
df2 = pd.DataFrame.from_dict(pres_dictionary, orient='index').T
df3 = pd.DataFrame.from_dict(flow_dictionary, orient='index').T
df4 = pd.concat([df1,df2,df3], axis=1)
df4.to_csv('processedSegmentedData.csv')
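A condensed, runnable version of the concat solution, using a subset of the sensors and writing to an in-memory buffer instead of a real file:

```python
import io
import pandas as pd

flow_dictionary = {'200:P1F1': [5.5, 5.5, 5.5]}
pres_dictionary = {'200:PT02': [200, 200, 200], '200:PT03': [66, 66, 66]}
temp_dictionary = {'200:TE02': [27, 27, 27], '200:TE03': [79, 79, 79]}

df1 = pd.DataFrame.from_dict(temp_dictionary, orient='index').T
df2 = pd.DataFrame.from_dict(pres_dictionary, orient='index').T
df3 = pd.DataFrame.from_dict(flow_dictionary, orient='index').T

# axis=1 places each sensor in its own column, leaving 3 data rows
df4 = pd.concat([df1, df2, df3], axis=1)
buf = io.StringIO()           # stand-in for 'processedSegmentedData.csv'
df4.to_csv(buf)
print(df4.shape)
```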