How to combine certain columns from one data frame into another? - python

How do I add three columns from one dataframe to another at a certain position? I want to insert these columns after a specific column: DF1 = ['C','D'] should go after columns A and B in DF2. In other words, how do I join columns in between other columns of another dataframe?
df1=pd.read_csv(csvfile)
df2=pd.read_csv(csvfile)
I want to take df1's ['C','D','E'] into df2's ['K','L','A','B','F'],
so that it looks like df3 = ['K','L','A','B','C','D','F'].

Use concat with DataFrame.reindex to change the order of the columns:
df3 = pd.concat([df1, df2], axis=1).reindex(['K','L','A','B','C','D'], axis=1)
A more general solution:
df1 = pd.DataFrame(columns=['H','G','C','D','E'])
df2 = pd.DataFrame(columns=['K','L','A','B','F'])
df3 = pd.concat([df1, df2], axis=1)
c = df3.columns.difference(['C', 'D'], sort=False)
pos = c.get_loc('B') + 1
c = list(c)
#https://stackoverflow.com/a/3748092/2901002
c[pos:pos] = ['C', 'D']
df3 = df3.reindex(c, axis=1)
print (df3)
Empty DataFrame
Columns: [H, G, E, K, L, A, B, C, D, F]
Index: []
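As a sanity check, here is the same recipe applied to the column layout from the question, with made-up single-row data (column 'E' ends up at the end and can be dropped if it is not wanted):
import pandas as pd

# Hypothetical one-row data matching the column names in the question
df1 = pd.DataFrame({'C': [1], 'D': [2], 'E': [3]})
df2 = pd.DataFrame({'K': [4], 'L': [5], 'A': [6], 'B': [7], 'F': [8]})

df3 = pd.concat([df2, df1], axis=1)
c = df3.columns.difference(['C', 'D'], sort=False)   # K, L, A, B, F, E
pos = c.get_loc('B') + 1                             # insert right after column 'B'
c = list(c)
c[pos:pos] = ['C', 'D']
df3 = df3.reindex(c, axis=1)
print(df3.columns.tolist())
# ['K', 'L', 'A', 'B', 'C', 'D', 'F', 'E']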

Try:
df3=pd.DataFrame()
df3[['K','L','A','B']]=df2[['K','L','A','B']]
df3[['C','D','E']]=df1[['C','D','E']]
Finally:
df3=df3[['K','L','A','B','C','D']]
OR
df3=df3.loc[:,['K','L','A','B','C','D']]

This should work:
pd.merge(df1, df2, left_index=True, right_index=True)[['K','L','A','B','C','D']]
Or simply use join, which is a left join by default:
df1.join(df2)[['K','L','A','B','C','D']]
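A tiny sketch with made-up data showing that join aligns the two frames on their index before the column selection puts everything in the desired order:
import pandas as pd

# Hypothetical data; both frames share the default RangeIndex
df1 = pd.DataFrame({'C': [1, 2], 'D': [3, 4]})
df2 = pd.DataFrame({'K': [5, 6], 'L': [7, 8], 'A': [9, 10], 'B': [11, 12]})

df3 = df2.join(df1)[['K', 'L', 'A', 'B', 'C', 'D']]
print(df3.columns.tolist())   # ['K', 'L', 'A', 'B', 'C', 'D']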

Related

Why pandas merge is skipping some rows

I have df1 and df2, which share a mutual column (time), where df2.time ⊆ df1.time. df1's shape is (2353, 11) and df2's shape is (57, 1). I'm trying to create df3 using the merge method to extract some rows from df1 based on the rows of df2. The issue is that df3 is missing some rows even though both df1 and df2 are float64 and have mutual values.
df3 should also have 57 rows, but I get only 54!
(Screenshots of df1 and df2 were attached to the original question.)
import numpy as np
import pandas as pd
from scipy import signal as sig

def pressure_filter(noisydata, reducedtime, filcutoff, tzero):
    # Low-pass Butterworth filter on the noisy pressure signal
    b, a = sig.butter(2, filcutoff, btype='low', analog=False)
    noisydata['p_lowcut'] = sig.filtfilt(b, a, noisydata.p_noisy)
    noisydata.at[0, 'p_lowcut'] = noisydata.at[0, 'p_noisy']
    # Centered 20-sample rolling mean; keep the unsmoothed value where the window is undefined
    noisydata['p_lowcut_ma'] = noisydata['p_lowcut'].rolling(20, center=True).mean()
    noisydata['p_lowcut_ma'] = noisydata.apply(
        lambda row: row['p_lowcut'] if np.isnan(row['p_lowcut_ma']) else row['p_lowcut_ma'],
        axis=1)
    # Inner merge on 'time' keeps only rows whose time values match exactly
    datared = pd.merge(noisydata, reducedtime, on=['time'], how='inner')
    return datared
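A common cause of this symptom is floating-point precision: pd.merge matches float keys only when the values are bit-for-bit equal, so times that differ in the last few decimal places are silently dropped by an inner merge. A minimal sketch of the failure and a rounding-based workaround (the data are made up; pd.merge_asof with a tolerance is another option when the times are sorted):
import pandas as pd

# 0.1 + 0.2 is 0.30000000000000004, not 0.3, so the inner merge drops that row
df1 = pd.DataFrame({'time': [0.1 + 0.2, 0.4], 'p': [1.0, 2.0]})
df2 = pd.DataFrame({'time': [0.3, 0.4]})
print(len(pd.merge(df1, df2, on='time', how='inner')))   # 1 row instead of 2

# Workaround: round both key columns to a common precision before merging
df1['time'] = df1['time'].round(6)
df2['time'] = df2['time'].round(6)
print(len(pd.merge(df1, df2, on='time', how='inner')))   # 2 rows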

Ignore empty dataframe when merging

I have four dataframes (df1, df2, df3, df4).
Sometimes df1 is empty, sometimes df2 is empty, and sometimes df3 or df4 accordingly.
How can I do an outer merge so that any dataframe which is empty is automatically ignored? I am using the code below to merge as of now:
df = (f1.result()
      .merge(f2.result(), how='left', left_on='time', right_on='time')
      .merge(f3.result(), how='left', left_on='time', right_on='time')
      .merge(f4.result(), how='left', left_on='time', right_on='time'))
and
df = reduce(lambda x,y: pd.merge(x,y, on='time', how='outer'), [f1.result(),f2.result(),f3.result(),f4.result()])
You can use the df.empty attribute or len(df) > 0 to check whether a dataframe is empty or not.
Try this:
from functools import reduce

dfs = [df1, df2, df3, df4]
non_empty_dfs = [df for df in dfs if not df.empty]
df_final = reduce(lambda left, right: pd.merge(left, right, on='time', how='outer'), non_empty_dfs)
Or, you could also filter out the empty dataframes with:
non_empty_dfs = [df for df in dfs if len(df) > 0]
Use pandas' DataFrame.empty attribute to filter out the empty dataframes; then you can concatenate or run whatever merge operation you have in mind:
df4 = pd.DataFrame({'A':[]})  # empty dataframe
df1 = pd.DataFrame({'B':[2]})
df2 = pd.DataFrame({'C':[3]})
df3 = pd.DataFrame({'D':[4]})
dfs = [df1, df2, df3, df4]
# concat; you can run other operations too, since the empty dataframe has been dropped
pd.concat([df for df in dfs if not df.empty], axis=1)
   B  C  D
0  2  3  4
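Putting the filter and the reduce together, a minimal end-to-end sketch with made-up frames that share a 'time' column, one of which happens to be empty:
from functools import reduce
import pandas as pd

df1 = pd.DataFrame({'time': [1, 2], 'a': [10, 20]})
df2 = pd.DataFrame({'time': [2, 3], 'b': [30, 40]})
df3 = pd.DataFrame(columns=['time', 'c'])   # empty this run
df4 = pd.DataFrame({'time': [1, 3], 'd': [50, 60]})

non_empty = [df for df in (df1, df2, df3, df4) if not df.empty]
out = reduce(lambda left, right: pd.merge(left, right, on='time', how='outer'), non_empty)
print(out.columns.tolist())   # ['time', 'a', 'b', 'd'] - the empty df3 is simply skipped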

How to add columns to an empty pandas dataframe?

I have an empty dataframe.
df=pd.DataFrame(columns=['a'])
For some reason I want to generate df2, another empty dataframe, with two columns 'a' and 'b'.
If I do
df.columns=df.columns+'b'
it does not work (the column gets renamed to 'ab'), and neither does the following:
df.columns=df.columns.tolist()+['b']
How do I add a separate column 'b' to df so that df.empty keeps on being True?
Using .loc is also not possible:
df.loc[:,'b']=None
as it returns
Cannot set dataframe with no defined index and a scalar
Here are a few ways to add an empty column to an empty dataframe:
df=pd.DataFrame(columns=['a'])
df['b'] = None
df = df.assign(c=None)
df = df.assign(d=df['a'])
df['e'] = pd.Series(index=df.index)
df = pd.concat([df,pd.DataFrame(columns=list('f'))])
print(df)
Output:
Empty DataFrame
Columns: [a, b, c, d, e, f]
Index: []
I hope it helps.
If you just do df['b'] = None then df.empty is still True and df is:
Empty DataFrame
Columns: [a, b]
Index: []
EDIT:
To create an empty df2 from the columns of df while adding new columns, you can do:
df2 = pd.DataFrame(columns = df.columns.tolist() + ['b', 'c', 'd'])
If you want to add multiple columns at the same time, you can also reindex:
new_cols = ['c', 'd', 'e', 'f', 'g']
df2 = df.reindex(df.columns.union(new_cols), axis=1)
#Empty DataFrame
#Columns: [a, c, d, e, f, g]
#Index: []
This is one way:
df2 = df.join(pd.DataFrame(columns=['b']))
The advantage of this method is you can add an arbitrary number of columns without explicit loops.
In addition, this satisfies your requirement of df.empty evaluating to True if no data exists.
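The same join trick extends to several columns at once; a small sketch, assuming the new names are collected in a list:
new_cols = ['b', 'c', 'd']
df2 = df.join(pd.DataFrame(columns=new_cols))
# df2 is still empty (df2.empty is True) and its columns are [a, b, c, d]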
You can use concat:
df=pd.DataFrame(columns=['a'])
df
Out[568]:
Empty DataFrame
Columns: [a]
Index: []
df2=pd.DataFrame(columns=['b', 'c', 'd'])
pd.concat([df,df2])
Out[571]:
Empty DataFrame
Columns: [a, b, c, d]
Index: []

Check if text in data frame exists in any of all headers pandas

I have dataframes from an Excel file called
df1, df2, df3, df4
I also have a df called df5, shown below.
A B C
df1 df2 df3
df1 df3 df4
How do I check whether each row of columns A, B, C contains text, then get the dataframe with that name and perform an action on it? All of the dataframes have columns labeled A, B, C.
So for row 1:
go to df1 and call df1.pop('A')
go to df2 and call df2.pop('A')
go to df3 and call df3.pop('A')
I'm aware of solutions that involve columns.
df = pd.DataFrame([[0,1],[2,3],[4,5]], columns=['A', 'B'])
aa = ((df['A'] == 2) & (df['B'] == 3)).any()
Not quite what I desire.
Below could be one way to handle this.
Create a dictionary mapping dataframe names to dataframe objects:
objs = {'df1': df1, 'df2': df2, 'df3': df3}
Define a function which manipulates a dataframe:
def handler(df):
    df.pop('A')
Then apply it over a column of df5:
df5['A'].apply(lambda x: handler(objs.get(x)))
It may not be the most elegant way, but it should meet your requirement.
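A self-contained sketch of that approach, using made-up dataframes and extended to every column of df5 (df4 is added to the lookup so the second row also resolves; the guard in handler avoids popping 'A' twice from df1):
import pandas as pd

# Hypothetical dataframes, all with columns A, B, C
df1 = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})
df2 = pd.DataFrame({'A': [4], 'B': [5], 'C': [6]})
df3 = pd.DataFrame({'A': [7], 'B': [8], 'C': [9]})
df4 = pd.DataFrame({'A': [10], 'B': [11], 'C': [12]})

df5 = pd.DataFrame({'A': ['df1', 'df1'], 'B': ['df2', 'df3'], 'C': ['df3', 'df4']})

objs = {'df1': df1, 'df2': df2, 'df3': df3, 'df4': df4}

def handler(df):
    # Drop column 'A' in place, if it is still there
    if 'A' in df.columns:
        df.pop('A')

# Walk every cell of df5 and apply the handler to the named dataframe
for col in df5.columns:
    df5[col].map(lambda name: handler(objs[name]))

print(df1.columns.tolist())   # ['B', 'C'] - column 'A' was popped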

Pandas concatenate alternating columns

I have two dataframes as follows:
df2 = pd.DataFrame(np.random.randn(5,2),columns=['A','C'])
df3 = pd.DataFrame(np.random.randn(5,2),columns=['B','D'])
I wish to get the columns in an alternating fashion such that I get the result below:
df4 = pd.DataFrame()
for i in range(len(df2.columns)):
    df4[df2.columns[i]] = df2[df2.columns[i]]
    df4[df3.columns[i]] = df3[df3.columns[i]]
df4
A B C D
0 1.056889 0.494769 0.588765 0.846133
1 1.536102 2.015574 -1.279769 -0.378024
2 -0.097357 -0.886320 0.713624 -1.055808
3 -0.269585 -0.512070 0.755534 0.855884
4 -2.691672 -0.597245 1.023647 0.278428
I think I'm being really inefficient with this solution. What is a more pythonic/pandas way of doing this?
P.S. In my specific case the column names are not A, B, C, D and aren't alphabetically arranged; they are only used here so you know which two dataframes I want to combine.
If you need something more dynamic, first zip the column names of both DataFrames and then flatten them:
df5 = pd.concat([df2, df3], axis=1)
print (df5)
A C B D
0 0.874226 -0.764478 1.022128 -1.209092
1 1.411708 -0.395135 -0.223004 0.124689
2 1.515223 -2.184020 0.316079 -0.137779
3 -0.554961 -0.149091 0.179390 -1.109159
4 0.666985 1.879810 0.406585 0.208084
#http://stackoverflow.com/a/10636583/2901002
print (list(sum(zip(df2.columns, df3.columns), ())))
['A', 'B', 'C', 'D']
print (df5[list(sum(zip(df2.columns, df3.columns), ()))])
A B C D
0 0.874226 1.022128 -0.764478 -1.209092
1 1.411708 -0.223004 -0.395135 0.124689
2 1.515223 0.316079 -2.184020 -0.137779
3 -0.554961 0.179390 -0.149091 -1.109159
4 0.666985 0.406585 1.879810 0.208084
How about this?
df4 = pd.concat([df2, df3], axis=1)
Or do they have to be in a specific order? Anyway, you can always reorder them:
df4 = df4[['A','B','C','D']]
And without writing out the columns:
df4 = df4[[item for items in zip(df2.columns, df3.columns) for item in items]]
You could concat and then reindex (reindex_axis has been removed from newer pandas versions):
df = pd.concat([df2, df3], axis=1)
df.reindex(df.columns[::2].tolist() + df.columns[1::2].tolist(), axis=1)
Append even indices to df2's columns and odd indices to df3's columns, then use these new levels to sort:
import numpy as np

df2_ = df2.T.set_index(np.arange(len(df2.columns)) * 2, append=True).T
df3_ = df3.T.set_index(np.arange(len(df3.columns)) * 2 + 1, append=True).T
df = pd.concat([df2_, df3_], axis=1).sort_index(axis=1, level=1)
df.columns = df.columns.droplevel(1)
df
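Since the question notes that the real column names are not alphabetical, here is a quick check with made-up names that the zip-and-flatten selection interleaves purely by position, not by name:
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.random.randn(3, 2), columns=['foo', 'baz'])
df3 = pd.DataFrame(np.random.randn(3, 2), columns=['bar', 'qux'])

df4 = pd.concat([df2, df3], axis=1)
order = [c for pair in zip(df2.columns, df3.columns) for c in pair]
print(order)                 # ['foo', 'bar', 'baz', 'qux']
df4 = df4[order]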
