I have two dataframes as follows:
df2 = pd.DataFrame(np.random.randn(5,2),columns=['A','C'])
df3 = pd.DataFrame(np.random.randn(5,2),columns=['B','D'])
I wish to get the columns in an alternating fashion such that I get the result below:
df4 = pd.DataFrame()
for i in range(len(df2.columns)):
    df4[df2.columns[i]] = df2[df2.columns[i]]
    df4[df3.columns[i]] = df3[df3.columns[i]]
df4
A B C D
0 1.056889 0.494769 0.588765 0.846133
1 1.536102 2.015574 -1.279769 -0.378024
2 -0.097357 -0.886320 0.713624 -1.055808
3 -0.269585 -0.512070 0.755534 0.855884
4 -2.691672 -0.597245 1.023647 0.278428
I think I'm being really inefficient with this solution. What is the more pythonic/pandas-idiomatic way of doing this?
P.S. In my specific case the column names are not A, B, C, D and aren't alphabetically arranged; I mention them just so you know which two dataframes I want to combine.
If you need something more dynamic, first zip the column names of both DataFrames and then flatten the result:
df5 = pd.concat([df2, df3], axis=1)
print (df5)
A C B D
0 0.874226 -0.764478 1.022128 -1.209092
1 1.411708 -0.395135 -0.223004 0.124689
2 1.515223 -2.184020 0.316079 -0.137779
3 -0.554961 -0.149091 0.179390 -1.109159
4 0.666985 1.879810 0.406585 0.208084
#http://stackoverflow.com/a/10636583/2901002
print (list(sum(zip(df2.columns, df3.columns), ())))
['A', 'B', 'C', 'D']
print (df5[list(sum(zip(df2.columns, df3.columns), ()))])
A B C D
0 0.874226 1.022128 -0.764478 -1.209092
1 1.411708 -0.223004 -0.395135 0.124689
2 1.515223 0.316079 -2.184020 -0.137779
3 -0.554961 0.179390 -0.149091 -1.109159
4 0.666985 0.406585 1.879810 0.208084
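For what it's worth, an equivalent way to interleave the names is itertools.chain.from_iterable, which avoids the quadratic sum(..., ()) trick:

from itertools import chain

# interleave the two column lists pairwise, then flatten in linear time
interleaved = list(chain.from_iterable(zip(df2.columns, df3.columns)))
print(df5[interleaved])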
How about this?
df4 = pd.concat([df2, df3], axis=1)
Or do they have to be in a specific order? Anyway, you can always reorder them:
df4 = df4[['A','B','C','D']]
And without writing out the columns:
df4 = df4[[item for items in zip(df2.columns, df3.columns) for item in items]]
You could concat and then reindex_axis.
df = pd.concat([df2, df3], axis=1)
df.reindex_axis(df.columns[::2].tolist() + df.columns[1::2].tolist(), axis=1)
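Note that reindex_axis was deprecated in pandas 0.21 and later removed; on recent versions the same reordering can be written with plain reindex:

# equivalent reordering for pandas versions where reindex_axis is gone
df.reindex(columns=df.columns[::2].tolist() + df.columns[1::2].tolist())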
Append even indices to df2 columns and odd indices to df3 columns. Use these new levels to sort.
df2_ = df2.T.set_index(np.arange(len(df2.columns)) * 2, append=True).T
df3_ = df3.T.set_index(np.arange(len(df3.columns)) * 2 + 1, append=True).T
df = pd.concat([df2_, df3_], axis=1).sort_index(axis=1, level=1)
df.columns = df.columns.droplevel(1)
df
Related
I have multiple dataframes like this:
df=pd.DataFrame({'a':[1,2,3],'b':[3,4,5],'c':[4,6,7]})
df2=pd.DataFrame({'a':[1,2,3],'d':[66,24,55],'c':[4,6,7]})
df3=pd.DataFrame({'a':[1,2,3],'f':[31,74,95],'c':[4,6,7]})
I want this output:
a c
0 1 4
1 2 6
2 3 7
These are the common columns across the 3 datasets. I am looking for a solution that works for multiple columns without having to specify the common columns, as in the answers I have seen on SO (since the actual data frames are huge).
If you need to filter the columns whose names and content are the same in each DataFrame, you can convert each column to a tuple and compare:
dfs = [df, df2, df3]
df1 = pd.concat([x.apply(tuple) for x in dfs], axis=1)
cols = df1.index[df1.eq(df1.iloc[:, 0], axis=0).all(axis=1)]
df2 = df[cols]
print (df2)
a c
0 1 4
1 2 6
2 3 7
If the column names can differ and only the content needs to be compared:
df=pd.DataFrame({'a':[1,2,3],'b':[3,4,5],'c':[4,6,7]})
df2=pd.DataFrame({'r':[1,2,3],'t':[66,24,55],'l':[4,6,7]})
df3=pd.DataFrame({'f':[1,2,3],'g':[31,74,95],'m':[4,6,7]})
dfs = [df, df2, df3]
p = [x.apply(tuple).tolist() for x in dfs]
a = set(p[0]).intersection(*p)
print (a)
{(4, 6, 7), (1, 2, 3)}
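If you then want those matching columns back as a DataFrame rather than a set of tuples, one possibility (a sketch, selecting from the first frame) is:

# keep the columns of the first frame whose content appears in every frame
common = df.loc[:, df.apply(tuple).isin(a)]
print(common)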
You can use reduce to apply the function r_common cumulatively to the dataframes in dfs, from left to right, reducing the list to a single dataframe df_common. The intersection method is used inside r_common to find the common columns of the two dataframes d1 and d2.
from functools import reduce

def r_common(d1, d2):
    # columns present in both frames
    cols = d1.columns.intersection(d2.columns).tolist()
    # of those, keep only the columns whose content matches in both
    m = d1[cols].eq(d2[cols]).all()
    return d1[m[m].index]

df_common = reduce(r_common, dfs)  # dfs = [df, df2, df3]
Result:
# print(df_common)
a c
0 1 4
1 2 6
2 3 7
A combination of reduce, intersection, filter and concat could help with your use case:
dfs = (df,df2,df3)
cols = [ent.columns for ent in dfs]
cols
[Index(['a', 'b', 'c'], dtype='object'),
Index(['a', 'd', 'c'], dtype='object'),
Index(['a', 'f', 'c'], dtype='object')]
# find the columns common to all:
from functools import reduce
universal_cols = reduce(lambda x,y : x.intersection(y), cols).tolist()
universal_cols
['a', 'c']
#filter for only universal_cols for each df
updates = [ent.filter(universal_cols) for ent in dfs]
If the columns and their contents are the same across the dataframes, you can skip the list comprehension and just filter one dataframe:
#let's use the first dataframe
output = df.filter(universal_cols)
If the columns' contents are different, then concatenate and drop duplicates:
#concatenate and drop duplicates
res = pd.concat(updates).drop_duplicates()
res #output has the same result
a c
0 1 4
1 2 6
2 3 7
I have a list ['df1', 'df2'] in which I have stored some dataframes that were filtered on a few conditions. Then I converted this list to a dataframe using
df = pd.DataFrame(list1)
Now the df has only one column:
0
df1
df2
Sometimes it may also have:
0
df1
df2
df3
I want to concatenate all of these. My static code is
df_new = pd.concat([df1, df2], axis=1)
# or
df_new = pd.concat([df1, df2, df3], axis=1)
How can I make it dynamic (without me specifying df1, df2 by hand) so that it takes the values and concatenates them?
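If list1 can hold the DataFrame objects themselves rather than their names as strings, the dynamic version is a one-liner; a minimal sketch, assuming list1 is a plain Python list of DataFrames:

# list1 holds actual DataFrame objects, however many there are
df_new = pd.concat(list1, axis=1)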
Collect the dataframes in a list, then concatenate them in one call:
import pandas as pd
lists = [[1,2,3],[4,5,6]]
arr = []
for l in lists:
    new_df = pd.DataFrame(l)
    arr.append(new_df)
df = pd.concat(arr,axis=1)
df
Result :
0 0
0 1 4
1 2 5
2 3 6
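Both columns in this result are named 0; if unique names matter, one option (a sketch) is to renumber them after the concat:

# give the concatenated frame distinct integer column labels
df.columns = range(df.shape[1])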
Df1 has columns A B C D; Df2 has columns A B D. Df1 and Df2 are in a list. How do I concatenate them into one df?
Or can I directly append these dfs into one single df without using a list?
Short answer: yes, you can combine them into a single pandas dataframe without much work. Sample code:
import pandas as pd
df1 = [(1,2,3,4)]
df2 = [(9,9,9)]
df1 = pd.DataFrame(df1, columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(df2, columns=['A', 'B', 'D'])
df = pd.concat([df1, df2], sort=False)
Which results in the following (note that column C is upcast to float to accommodate the missing value):
>>> pd.concat([df1, df2], sort=False)
A B C D
0 1 2 3.0 4
0 9 9 NaN 9
I have two data frames with the same columns, and similar content.
I'd like to apply the same functions to each, without having to brute-force them or concatenate the dfs. I tried to pass the objects into nested dictionaries, but that seems more trouble than it's worth (I don't believe dataframe.to_dict supports passing into an existing list).
However, it appears that the for loop only stores the result in the local df object, and I don't know how to get it back into the original dfs... see my example below.
df1 = {'Column1': [1,2,2,4,5],
'Column2': ["A","B","B","D","E"]}
df1 = pd.DataFrame(df1, columns=['Column1','Column2'])
df2 = {'Column1': [2,11,2,2,14],
'Column2': ["B","Y","B","B","V"]}
df2 = pd.DataFrame(df2, columns=['Column1','Column2'])
def filter_fun(df1, df2):
    for df in (df1, df2):
        df = df[(df['Column1']==2) & (df['Column2'].isin(['B']))]
    return df1, df2
filter_fun(df1, df2)
If you write the filter as a function you can apply it in a list comprehension:
def filter(df):  # note: this shadows Python's built-in filter; rename if that matters
    return df[(df['Column1']==2) & (df['Column2'].isin(['B']))]

df1, df2 = [filter(df) for df in (df1, df2)]
I would recommend concatenation with custom specified keys, because 1) it is easy to assign it back, and 2) you can do the same operation once instead of twice.
# Concatenate df1 and df2
df = pd.concat([df1, df2], keys=['a', 'b'])
# Perform your operation
out = df[(df['Column1'] == 2) & df['Column2'].isin(['B'])]
out.loc['a'] # result for `df1`
Column1 Column2
1 2 B
2 2 B
out.loc['b'] # result for `df2`
Column1 Column2
0 2 B
2 2 B
3 2 B
This should work fine for most operations. For groupby, you will want to group on the 0th index level as well.
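For example, a hypothetical groupby that keeps the results for df1 and df2 separate by including the 0th index level:

# count matching rows per source frame ('a' vs 'b') and per Column2 value
out.groupby([out.index.get_level_values(0), 'Column2']).size()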
I have some data in pandas:
df1
df1['ID_A'].nunique()
5
df2
df2['ID_B'].nunique()
6
df3
df3['ID_A'].nunique()
2
df4
df4['ID_B'].nunique()
9
and so on, up to 200 dataframes.
How can I make a new dataframe based on these nunique counts?
My expected result looks like this:
combine ID_A ID_B
combine_1 5 6
combine_2 2 9
Thank you.
Use a list comprehension over the list of DataFrames and, if necessary, rename the index with an f-string list comprehension:
df1 = pd.DataFrame({'ID_A':[1,2,3,4,5,5],
'ID_B':[1,2,3,4,5,6]})
df2 = pd.DataFrame({'ID_A':[1,2,1,2,1,1,1,2,1],
'ID_B':[1,2,3,4,5,6,7,8,9]})
dfs = [df1, df2]
df = pd.DataFrame([x.nunique() for x in dfs])
df.index = [f'combine_{x+1}' for x in df.index]
df.index.name= 'combine'
print (df)
ID_A ID_B
combine
combine_1 5 6
combine_2 2 9
If you need to restrict this to certain columns, filter them by a list:
cols = ['ID_A', 'ID_B']
dfs = [df1, df2]
df = pd.DataFrame([x[cols].nunique() for x in dfs])
#filter only columns starting by ID_
#df = pd.DataFrame([x.filter(regex='^ID_').nunique() for x in dfs])
df.index = [f'combine_{x+1}' for x in df.index]
df.index.name= 'combine'
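With 200 frames, listing them by hand in dfs is impractical; if the frames are produced in a loop, collect them as they are created. A sketch, where sources and make_frame are hypothetical placeholders for however your frames are actually built:

# hypothetical: 'sources' and 'make_frame' stand in for however
# the 200 dataframes are really produced
dfs = [make_frame(s) for s in sources]
df = pd.DataFrame([x[cols].nunique() for x in dfs])
df.index = [f'combine_{x+1}' for x in df.index]
df.index.name = 'combine'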