I have two dataframes that look like this:
df1=
A B
1 A1 B1
2 A2 B2
3 A3 B3
df2 =
A C
4 A4 C4
5 A5 C5
I would like to append df2 to df1, like so:
A B
1 A1 B1
2 A2 B2
3 A3 B3
4 A4 NaN
5 A5 NaN
(Note: I've edited the dataframes so that not all the columns in df1 are necessarily in df2)
Whether I use concat or append, the resulting dataframe has a column called "C" with the first three rows filled with NaN. I just want to keep the two original columns in df1, with the new values appended. Is there a way to concatenate the dataframes without having to drop the extra column afterwards?
You can first select only the columns you need from df2, then append that subset:
print (df2[['A']])
A
4 A4
5 A5
print (pd.concat([df1, df2[['A']]]))
A B
1 A1 B1
2 A2 B2
3 A3 B3
4 A4 NaN
5 A5 NaN
Or, on pandas versions before 2.0 (DataFrame.append was removed in 2.0):
print (df1.append(df2[['A']]))
A B
1 A1 B1
2 A2 B2
3 A3 B3
4 A4 NaN
5 A5 NaN
If df2 also has a column B, select both columns:
print (df2[['A','B']])
A B
4 A4 B4
5 A5 B5
print (pd.concat([df1, df2[['A','B']]]))
A B
1 A1 B1
2 A2 B2
3 A3 B3
4 A4 B4
5 A5 B5
Or, with DataFrame.append (pandas < 2.0 only):
print (df1.append(df2[['A','B']]))
A B
1 A1 B1
2 A2 B2
3 A3 B3
4 A4 B4
5 A5 B5
EDIT by comment:
If df1 and df2 have different columns, use intersection to select only the columns they share:
print (df1)
A B D
1 A1 B1 R
2 A2 B2 T
3 A3 B3 E
print (df2)
A B C
4 A4 B4 C4
5 A5 B5 C5
print (df1.columns.intersection(df2.columns))
Index(['A', 'B'], dtype='object')
print (pd.concat([df1, df2[df1.columns.intersection(df2.columns)]]))
A B D
1 A1 B1 R
2 A2 B2 T
3 A3 B3 E
4 A4 B4 NaN
5 A5 B5 NaN
Actually the solution is in an obscure corner of this page. Here's the code to use:
pd.concat([df1,df2],join_axes=[df1.columns])
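Note that join_axes was deprecated in pandas 0.25 and removed in 1.0, so the call above raises a TypeError on current releases. A minimal sketch of one replacement, assuming the frames from the question (rebuilt by hand here), is to concatenate and then reindex the result to df1's columns:

```python
import pandas as pd

# Frames rebuilt by hand from the question.
df1 = pd.DataFrame({"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]}, index=[1, 2, 3])
df2 = pd.DataFrame({"A": ["A4", "A5"], "C": ["C4", "C5"]}, index=[4, 5])

# Concatenate, then keep only df1's columns, in df1's order.
# Column "C" is discarded and rows from df2 get NaN in "B".
result = pd.concat([df1, df2]).reindex(columns=df1.columns)
print(result)
```

This still builds the extra column temporarily. Selecting the shared columns from df2 before concatenating, as in the intersection answer, avoids that.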
Related
I have two dfs that I want to concat
(sorry I don't know how to properly recreate a df here)
A B
a1 b1
a2 b2
a3 b3
A C
a1 c1
a4 c4
Result:
A B C
a1 b1 c1
a2 b2 NaN
a3 b3 NaN
a4 NaN c4
I have tried:
merge = pd.concat([df1,df2],axis = 0,ignore_index= True)
but this seems to just append the second df to the first df
Thank you!
I believe you need an outer join:
>>> pd.merge(df1,df2,how='outer')
A B C
0 a1 b1 c1
1 a2 b2 NaN
2 a3 b3 NaN
3 a4 NaN c4
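For reference, a self-contained sketch of the outer join, assuming the two frames from the question are named df1 and df2 and rebuilt by hand. Passing on="A" is optional here, since "A" is the only shared column, but it makes the join key explicit:

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]})
df2 = pd.DataFrame({"A": ["a1", "a4"], "C": ["c1", "c4"]})

# Outer join: keep every key from both frames and fill the gaps with NaN.
merged = pd.merge(df1, df2, on="A", how="outer")
print(merged)
```

pd.concat with axis=0 only stacks rows, which is why the original attempt appended df2 below df1 instead of matching rows on "A".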
I have two dataframes:
df1 :
A B C
0 a0 b0 c0
1 a1 b1 c1
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
df2 :
A B C
0 a0 b0 c11
1 a1 b1 c5
2 a70 b2 c20
3 a3 b9 c9
In df1, for every row, whenever Column A and Column B values are equal to values in df2, column C should be updated with value from df2.
Output:
A B C
0 a0 b0 c11
1 a1 b1 c5
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
I tried the following, but it did not work.
df1.set_index(['A', 'B'])
df2.set_index(['A', 'B'])
df1.update(df2)
df1.reset_index()
df2.reset_index()
df1["C"][:4] = np.where((df1["A"][:4]==df2["A"])&(df1["B"][:4]==df2["B"]),df2["C"],df1["C"][:4])
A B C
0 a0 b0 c11
1 a1 b1 c5
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
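The set_index attempt has no effect because set_index and reset_index return new frames instead of modifying df1 and df2 in place, so update still aligns on the original row index. The np.where line works for this data, but it depends on matching rows sitting at the same positions in both frames. A key-based alternative, sketched here under the assumption that each (A, B) pair appears at most once in df2, is a left merge followed by fillna (the name C_new is just an illustrative suffix):

```python
import pandas as pd

df1 = pd.DataFrame({
    "A": ["a0", "a1", "a2", "a3", "a4"],
    "B": ["b0", "b1", "b2", "b3", "b4"],
    "C": ["c0", "c1", "c2", "c3", "c4"],
})
df2 = pd.DataFrame({
    "A": ["a0", "a1", "a70", "a3"],
    "B": ["b0", "b1", "b2", "b9"],
    "C": ["c11", "c5", "c20", "c9"],
})

# Left-merge on the key columns; df2's C arrives as "C_new" (NaN where no match).
merged = df1.merge(df2, on=["A", "B"], how="left", suffixes=("", "_new"))

# Prefer df2's value where a match exists, otherwise keep df1's original.
merged["C"] = merged["C_new"].fillna(merged["C"])
result = merged.drop(columns="C_new")
print(result)
```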
I have a df like this:
A B C D E F
2 a1 a2 a3 a4 100
2 a1 b2 c3 a4 100 # note
2 b1 b2 b3 b4 100
2 c1 c2 c3 c4 100
1 a1 a2 a3 a4 120
2 a1 b2 c3 a4 150 # note
1 b1 b2 b3 b4 130
1 c1 c2 c3 c4 110
0 a1 a2 a3 a4 80
I want to compare the values in column F between rows whose columns B-E match, following the order given by column A, like so:
A B C D E F diff
2 a1 a2 a3 a4 100 120/100
2 a1 b2 c3 a4 100 # note 150/100
2 b1 b2 b3 b4 100 130/100
2 c1 c2 c3 c4 100 110/100
1 a1 a2 a3 a4 120 80/120
1 a1 b2 c3 a4 150 # note
1 b1 b2 b3 b4 130
1 c1 c2 c3 c4 110
0 a1 a2 a3 a4 80
Since the first row has the same B-E values as the first row where A is 1, I compute 120/100.
What I've tried:
df.groupby(['B', 'C', 'D', 'E']) - this groups the data, but I don't know how to apply the logic of dividing by the F value of the previous A level. Or maybe there is a simpler way of achieving it.
Use DataFrameGroupBy.shift with Series.div:
df['d'] = df.groupby(['B', 'C', 'D', "E"])['F'].shift(-1).div(df['F'])
print (df)
A B C D E F d
0 2 a1 a2 a3 a4 100 1.200000
1 2 a1 b2 c3 a4 100 1.500000
2 2 b1 b2 b3 b4 100 1.300000
3 2 c1 c2 c3 c4 100 1.100000
4 1 a1 a2 a3 a4 120 0.666667
5 2 a1 b2 c3 a4 150 NaN
6 1 b1 b2 b3 b4 130 NaN
7 1 c1 c2 c3 c4 110 NaN
8 0 a1 a2 a3 a4 80 NaN
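A runnable version of the same idea, with the frame from the question rebuilt by hand. The shift works on the existing row order within each group, so this sketch assumes the rows are already ordered the way the comparison needs:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [2, 2, 2, 2, 1, 2, 1, 1, 0],
    "B": ["a1", "a1", "b1", "c1", "a1", "a1", "b1", "c1", "a1"],
    "C": ["a2", "b2", "b2", "c2", "a2", "b2", "b2", "c2", "a2"],
    "D": ["a3", "c3", "b3", "c3", "a3", "c3", "b3", "c3", "a3"],
    "E": ["a4", "a4", "b4", "c4", "a4", "a4", "b4", "c4", "a4"],
    "F": [100, 100, 100, 100, 120, 150, 130, 110, 80],
})

# For each row, fetch F from the next row in the same (B, C, D, E) group.
next_f = df.groupby(["B", "C", "D", "E"])["F"].shift(-1)

# Ratio of the next F to the current F; NaN for the last row of each group.
df["d"] = next_f.div(df["F"])
print(df)
```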
I have two DataFrames of arbitrary shape of the type:
A B C
0 A0 B0 C0
1 A1 B1 C1
2 A2 B2 NaN
3 A3 NaN NaN
4 A4 NaN NaN
and
A B C
2 NaN NaN C2
3 NaN B3 C3
4 NaN B4 C4
5 A5 B5 C5
6 A6 B6 C6
The two DataFrames have overlapping indexes. Where there is an overlap, for a given column, there is a non-NaN in one DataFrame, and a NaN in the other. How can I concatenate these such that I can achieve a DataFrame with all values and no NaNs:
A B C
0 A0 B0 C0
1 A1 B1 C1
2 A2 B2 C2
3 A3 B3 C3
4 A4 B4 C4
5 A5 B5 C5
6 A6 B6 C6
My proposed solution is:
df3 = pd.concat([pd.concat([df1[col].dropna(), df2[col].dropna()]) for col in df1.columns], axis=1)
However, ideally I would not work column-by-column.
Use combine_first:
df = df1.combine_first(df2)
print(df)
A B C
0 A0 B0 C0
1 A1 B1 C1
2 A2 B2 C2
3 A3 B3 C3
4 A4 B4 C4
5 A5 B5 C5
6 A6 B6 C6
Using df.fillna() and df.append() with dropna() (append requires pandas < 2.0):
df1.fillna(df2).append(df2).dropna()
A B C
0 A0 B0 C0
1 A1 B1 C1
2 A2 B2 C2
3 A3 B3 C3
4 A4 B4 C4
5 A5 B5 C5
6 A6 B6 C6
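Both answers can be checked with the sketch below, which rebuilds the two frames by hand. Since DataFrame.append was removed in pandas 2.0, the second answer is written with pd.concat instead, which should behave the same way here:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "A": ["A0", "A1", "A2", "A3", "A4"],
    "B": ["B0", "B1", "B2", np.nan, np.nan],
    "C": ["C0", "C1", np.nan, np.nan, np.nan],
})
df2 = pd.DataFrame({
    "A": [np.nan, np.nan, np.nan, "A5", "A6"],
    "B": [np.nan, "B3", "B4", "B5", "B6"],
    "C": ["C2", "C3", "C4", "C5", "C6"],
}, index=[2, 3, 4, 5, 6])

# combine_first: take df1's values and fill its gaps (and missing rows) from df2.
combined = df1.combine_first(df2)

# Append-free spelling of the fillna answer: fill df1 from df2, stack df2 below,
# then drop the partially empty df2 rows that overlap with df1.
alt = pd.concat([df1.fillna(df2), df2]).dropna()

print(combined)
```

The dropna variant only works because every cell ends up filled. If some value were missing from both frames, dropna would discard that whole row, whereas combine_first would keep it with a NaN.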
I have a DataFrame as follows. Both columns contain Member_IDs, and each row indicates that the Member_ID in col1 is connected to the Member_ID in col2.
col1 col2
1 3
1 4
1 5
2 3
2 4
3 1
3 2
3 5
4 1
4 2
5 1
5 3
I have calculated how many Member_IDs each Member_ID is connected to. For example, Member_ID 1 is connected to 3 Member_IDs. If a Member_ID has 3 or more connections, we prefix it with "a"; otherwise we prefix it with "b". So the label for Member_ID 1 is "a1".
I have calculated the labels for each Member_ID in the same way, and the label array is below.
member_ID No_of_con Label
1 3 a1
2 2 b2
3 3 a3
4 2 b4
5 2 b5
Now I have to replace the values in the first DataFrame with the corresponding labels from the label array. The DataFrame is big, so using for loops is not efficient. How can I achieve this in a simpler way using pandas? I'm expecting the result below:
col1 col2
a1 a3
a1 b4
a1 b5
b2 a3
b2 b4
a3 a1
a3 b2
a3 b5
b4 a1
b4 b2
b5 a1
b5 a3
We can stack, map and unstack:
In [9]: d1.stack().map(d2.set_index('member_ID')['Label']).unstack()
Out[9]:
col1 col2
0 a1 a3
1 a1 b4
2 a1 b5
3 b2 a3
4 b2 b4
5 a3 a1
6 a3 b2
7 a3 b5
8 b4 a1
9 b4 b2
10 b5 a1
11 b5 a3
Or you can try this:
df2.set_index('member_ID',inplace=True)
df1.apply(lambda x: x.map(df2['Label']))
col1 col2
0 a1 a3
1 a1 b4
2 a1 b5
3 b2 a3
4 b2 b4
5 a3 a1
6 a3 b2
7 a3 b5
8 b4 a1
9 b4 b2
10 b5 a1
11 b5 a3
You can use pd.DataFrame.replace using a pd.Series in a dictionary context.
d1.replace(d2.set_index('member_ID').Label)
col1 col2
0 a1 a3
1 a1 b4
2 a1 b5
3 b2 a3
4 b2 b4
5 a3 a1
6 a3 b2
7 a3 b5
8 b4 a1
9 b4 b2
10 b5 a1
11 b5 a3
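A self-contained sketch of the lookup, using the names d1 and d2 from the answers above and the data from the question. It uses the column-wise map variant, and assumes every ID in d1 appears in the label table:

```python
import pandas as pd

d1 = pd.DataFrame({
    "col1": [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5],
    "col2": [3, 4, 5, 3, 4, 1, 2, 5, 1, 2, 1, 3],
})
d2 = pd.DataFrame({
    "member_ID": [1, 2, 3, 4, 5],
    "No_of_con": [3, 2, 3, 2, 2],
    "Label": ["a1", "b2", "a3", "b4", "b5"],
})

# Series indexed by member_ID: acts as a lookup table from ID to label.
labels = d2.set_index("member_ID")["Label"]

# Map every column of d1 through the lookup table.
result = d1.apply(lambda col: col.map(labels))
print(result)
```

One difference between the variants: Series.map yields NaN for any ID missing from the lookup table, whereas DataFrame.replace leaves such values unchanged.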