I have the following two DataFrames and I want to merge them:
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])
df2 = pd.DataFrame({'z': ['A4', 'A5', 'A6', 'A7'],
                    'e': ['B4', 'B5', 'B6', 'B7'],
                    'y': ['C4', 'C5', 'C6', 'C7'],
                    'f': ['D4', 'D5', 'D6', 'D7']},
                   index=[12, 2, 43, 24])
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
z e y f
12 A4 B4 C4 D4
2 A5 B5 C5 D5
43 A6 B6 C6 D6
24 A7 B7 C7 D7
And I want the result to look like this:
A B C D z e y f
0 A0 B0 C0 D0 A4 B4 C4 D4
1 A1 B1 C1 D1 A5 B5 C5 D5
2 A2 B2 C2 D2 A6 B6 C6 D6
3 A3 B3 C3 D3 A7 B7 C7 D7
Can anyone help? I tried the code below, but it didn't give me the result I wanted:
pd.concat([df1, df2], axis=1)
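You need to call reset_index first. pd.concat(axis=1) aligns rows on their index labels, so both DataFrames must share the same 0..n-1 index before the rows line up by position: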
df = pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)
You could either reset the index and use pd.concat (as in YOBEN_S's answer), or stack the values with NumPy:
>>> import numpy as np
>>> pd.DataFrame(np.hstack([df1, df2]), columns=[*df1.columns, *df2.columns])
A B C D z e y f
0 A0 B0 C0 D0 A4 B4 C4 D4
1 A1 B1 C1 D1 A5 B5 C5 D5
2 A2 B2 C2 D2 A6 B6 C6 D6
3 A3 B3 C3 D3 A7 B7 C7 D7
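A self-contained sketch of the reset_index approach follows. It assumes df1 and df2 have the same number of rows and should be paired by position. It also shows why the plain concat fails, and a variant using set_axis (an alternative not taken from the answers above) that keeps df1's index labels instead of replacing both indexes:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])
df2 = pd.DataFrame({'z': ['A4', 'A5', 'A6', 'A7'],
                    'e': ['B4', 'B5', 'B6', 'B7'],
                    'y': ['C4', 'C5', 'C6', 'C7'],
                    'f': ['D4', 'D5', 'D6', 'D7']},
                   index=[12, 2, 43, 24])

# Plain concat aligns on index labels: only label 2 is shared, so we get 7 rows
misaligned = pd.concat([df1, df2], axis=1)

# Pair rows by position: give both frames a fresh 0..n-1 index first
by_position = pd.concat([df1.reset_index(drop=True),
                         df2.reset_index(drop=True)], axis=1)

# Alternative: relabel df2's rows with df1's index so df1's labels are kept
keep_df1_index = pd.concat([df1, df2.set_axis(df1.index)], axis=1)

print(by_position)
```

Use set_axis when df1's index carries meaning, such as dates or IDs. Use reset_index when a plain 0..n-1 index is fine.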
Is there an easy way to reorder the columns from
2000-01-03 Location1 A1 B1 C1 A2 B2 C2 A3 B3 C3
2000-01-04 Location2 A1 B1 C1 A2 B2 C2 A3 B3 C3
2000-01-05 Location3 A1 B1 C1 A2 B2 C2 A3 B3 C3
to
2000-01-03 Location1 A1 A2 A3 B1 B2 B3 C1 C2 C3
2000-01-04 Location2 A1 A2 A3 B1 B2 B3 C1 C2 C3
2000-01-05 Location3 A1 A2 A3 B1 B2 B3 C1 C2 C3
Thanks
To reorder the columns dynamically based on their values, you can sort along axis=1:
df.sort_values(df.index[0], axis=1)
This returns a DataFrame whose columns (axis=1) are ordered by the sorted values of the first row.
Here is a full example using your data sample:
import pandas as pd
from io import StringIO

sample = StringIO('''date location x1 y1 z1 x2 y2 z2 x3 y3 z3
2000-01-03 Location1 A1 B1 C1 A2 B2 C2 A3 B3 C3
2000-01-04 Location2 A1 B1 C1 A2 B2 C2 A3 B3 C3
2000-01-05 Location3 A1 B1 C1 A2 B2 C2 A3 B3 C3''')
df = pd.read_csv(sample, sep=' ')
print(df)

# Sort the columns by the values in the first row
df2 = df.sort_values(df.index[0], axis=1)

# Move date and location back to the front; the other columns keep their sorted order
initialCols = ['date', 'location']
restCols = [col for col in df2.columns if col not in initialCols]
dfFinal = df2[initialCols + restCols]
print(dfFinal)
One of the easiest ways to reorder the columns of a pandas DataFrame is to index it with the list of columns in the desired order:
reordered_col_list = ['Date', 'Location', 'A1', 'A2', 'A3', 'B1', 'B2', 'B3', 'C1', 'C2', 'C3']
df = df[reordered_col_list]
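If the column names follow a pattern such as x1, y1, z1, x2, y2, z2 and so on, as in the sample above, the order can be built from the names instead of typed by hand. This is a sketch. It assumes every column after date and location is a single letter followed by a number, which may not hold for the real data:

```python
import pandas as pd
from io import StringIO

sample = StringIO('''date location x1 y1 z1 x2 y2 z2 x3 y3 z3
2000-01-03 Location1 A1 B1 C1 A2 B2 C2 A3 B3 C3
2000-01-04 Location2 A1 B1 C1 A2 B2 C2 A3 B3 C3
2000-01-05 Location3 A1 B1 C1 A2 B2 C2 A3 B3 C3''')
df = pd.read_csv(sample, sep=' ')

fixed = ['date', 'location']
rest = [c for c in df.columns if c not in fixed]

# Assumed naming scheme: one letter + integer. Sort by letter, then numerically,
# so that a column like x10 would come after x9 rather than after x1
rest_sorted = sorted(rest, key=lambda c: (c[0], int(c[1:])))

df = df[fixed + rest_sorted]
print(df)
```

Unlike sorting by the first row's values, this approach does not depend on the data, so it also works when different rows would sort differently.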
I have two dataframes:
df1:
A B C
0 a0 b0 c0
1 a1 b1 c1
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
df2:
A B C
0 a0 b0 c11
1 a1 b1 c5
2 a70 b2 c20
3 a3 b9 c9
In df1, for every row where the values in columns A and B both match the corresponding row of df2, column C should be updated with the value from df2.
Output:
A B C
0 a0 b0 c11
1 a1 b1 c5
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
I tried the following, but it did not work.
df1.set_index(['A', 'B'])
df2.set_index(['A', 'B'])
df1.update(df2)
df1.reset_index()
df2.reset_index()
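# .loc[:3] selects rows 0-3 (df2's labels) and writes into df1 directly; chained indexing like df1["C"][:4] = ... may only modify a copy
df1.loc[:3, "C"] = np.where((df1.loc[:3, "A"] == df2["A"]) & (df1.loc[:3, "B"] == df2["B"]), df2["C"], df1.loc[:3, "C"])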
A B C
0 a0 b0 c11
1 a1 b1 c5
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
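The np.where answer compares rows by position, so it only works when matching rows sit at the same index in both frames. (The attempt in the question has no effect because set_index and reset_index return new DataFrames rather than modifying df1 and df2 in place.) The sketch below matches on the A and B values wherever they appear, using a left merge. It assumes each (A, B) pair occurs at most once in df2; otherwise the merge would duplicate rows of df1:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['a0', 'a1', 'a2', 'a3', 'a4'],
                    'B': ['b0', 'b1', 'b2', 'b3', 'b4'],
                    'C': ['c0', 'c1', 'c2', 'c3', 'c4']})
df2 = pd.DataFrame({'A': ['a0', 'a1', 'a70', 'a3'],
                    'B': ['b0', 'b1', 'b2', 'b9'],
                    'C': ['c11', 'c5', 'c20', 'c9']})

# A left merge keeps every row of df1 in order and brings in df2's C where (A, B) matches
merged = df1.merge(df2, on=['A', 'B'], how='left', suffixes=('', '_df2'))

# Take df2's value where there was a match, otherwise keep df1's original C
merged['C'] = merged['C_df2'].fillna(merged['C'])
result = merged.drop(columns='C_df2')
print(result)
```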
I'm trying to join two DataFrames by index. They can have columns in common, and I only want to take a value from the second DataFrame if the corresponding value in the first is NaN or doesn't exist. Based on the example from the pandas documentation, I have:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=[0, 1, 2, 3])
as
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
and
df4 = pd.DataFrame({'B': ['B2p', 'B3p', 'B6p', 'B7p'],
'D': ['D2p', 'D3p', 'D6p', 'D7p'],
'F': ['F2p', 'F3p', 'F6p', 'F7p']},
index=[2, 3, 6, 7])
as
B D F
2 B2p D2p F2p
3 B3p D3p F3p
6 B6p D6p F6p
7 B7p D7p F7p
and the desired result is:
A B C D F
0 A0 B0 C0 D0 NaN
1 A1 B1 C1 D1 NaN
2 A2 B2 C2 D2 F2p
3 A3 B3 C3 D3 F3p
6 NaN B6p NaN D6p F6p
7 NaN B7p NaN D7p F7p
This is a good use case for combine_first. The row and column indices of the resulting DataFrame are the union of the two, and values from the calling DataFrame (df1) take precedence. Where a label is missing from one of the DataFrames, the value from the other is used, the same behaviour as if it contained NaN:
df1.combine_first(df4)
A B C D F
0 A0 B0 C0 D0 NaN
1 A1 B1 C1 D1 NaN
2 A2 B2 C2 D2 F2p
3 A3 B3 C3 D3 F3p
6 NaN B6p NaN D6p F6p
7 NaN B7p NaN D7p F7p
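To make the union behaviour concrete, here is a rough equivalent built from reindex and fillna. It illustrates the semantics described above and is not how pandas actually implements combine_first:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])
df4 = pd.DataFrame({'B': ['B2p', 'B3p', 'B6p', 'B7p'],
                    'D': ['D2p', 'D3p', 'D6p', 'D7p'],
                    'F': ['F2p', 'F3p', 'F6p', 'F7p']},
                   index=[2, 3, 6, 7])

# The result's labels are the union of both frames' rows and columns
index = df1.index.union(df4.index)
columns = df1.columns.union(df4.columns)

# Expand both frames to the union; cells that did not exist become NaN
left = df1.reindex(index=index, columns=columns)
right = df4.reindex(index=index, columns=columns)

# Keep df1's values and fill only its gaps from df4 (aligned by label)
manual = left.fillna(right)
print(manual)
```

Note that B and D at labels 2 and 3 keep df1's values (B2, D2, ...), because only missing cells are filled.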
I would like to concatenate DataFrames along both axes at the same time.
That means that if an index label does not exist, it is created,
and if a column does not exist, it is created as well.
import pandas as pd
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['A4'],
'D': ['D4']},
index=[4])
df3 = pd.DataFrame({'A': ['E4'],
'F': ['F4']},
index=[4])
result = pd.concat([df1, df2, df3])
It gives:
A B C D F
0 A0 B0 C0 D0 NaN
1 A1 B1 C1 D1 NaN
2 A2 B2 C2 D2 NaN
3 A3 B3 C3 D3 NaN
4 A4 NaN NaN D4 NaN
4 E4 NaN NaN NaN F4
instead of the result I want:
A B C D F
0 A0 B0 C0 D0 NaN
1 A1 B1 C1 D1 NaN
2 A2 B2 C2 D2 NaN
3 A3 B3 C3 D3 NaN
4 E4 NaN NaN D4 F4
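One possible approach, sketched here under the assumption that the later frame's non-missing values should win when several frames share an index label, is to concatenate as above and then collapse rows that share a label with groupby(level=0).last(). That method keeps the last non-null value in each column:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['A4'], 'D': ['D4']}, index=[4])
df3 = pd.DataFrame({'A': ['E4'], 'F': ['F4']}, index=[4])

# Stack all rows; label 4 now appears twice
stacked = pd.concat([df1, df2, df3])

# Collapse duplicate labels: last() skips NaN, so df3's A (E4) wins,
# while df2's D4 and df3's F4 are both kept for label 4
result = stacked.groupby(level=0).last()
print(result)
```

Chaining combine_first so that later frames take precedence, df3.combine_first(df2).combine_first(df1), should give the same result for this data.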
I want to update DataFrame X with values from DataFrame Y.
X = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2'],
'C': ['C0', 'C1', 'C2'],
'D': ['D0', 'D1', 'D2']})
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
Y = pd.DataFrame({'A': ['A0', 'A1'],
'B': ['B0', 'B1'],
'C': ['C0xx', 'C1xx'],
'D': ['D0xx', 'D1xx']})
A B C D
0 A0 B0 C0xx D0xx
1 A1 B1 C1xx D1xx
And the result to be:
A B C D
0 A0 B0 C0xx D0xx
1 A1 B1 C1xx D1xx
2 A2 B2 C2 D2
Of course, my real DataFrame is much bigger.
1. Both DataFrames have the same index
This is the case you presented in the example given in your question.
You might want to use the update method:
>>> X.update(Y)
>>> X
A B C D
0 A0 B0 C0xx D0xx
1 A1 B1 C1xx D1xx
2 A2 B2 C2 D2
It also works if the rows are in a different order in X and Y, as long as the index labels match:
>>> Y = pd.DataFrame({'A': ['A1', 'A0'],
'B': ['B1', 'B0'],
'C': ['C1xx', 'C0xx'],
'D': ['D1xx', 'D0xx']},
index=[1,0])
>>> Y
A B C D
1 A1 B1 C1xx D1xx
0 A0 B0 C0xx D0xx
>>> X.update(Y)
>>> X
A B C D
0 A0 B0 C0xx D0xx
1 A1 B1 C1xx D1xx
2 A2 B2 C2 D2
2. Different indexes
If Y has a different index:
>>> Y = pd.DataFrame({'A': ['A0', 'A1'],
'B': ['B0', 'B1'],
'C': ['C0xx', 'C1xx'],
'D': ['D0xx', 'D1xx']},
index=[2,1])
>>> Y
A B C D
2 A0 B0 C0xx D0xx
1 A1 B1 C1xx D1xx
You can still use update if another column can serve as the index, that is, a column that identifies each row so that it matches the row to be replaced. I use column "A" here, but a MultiIndex built from several columns would work as well.
>>> X2, Y2 = X.set_index("A"), Y.set_index("A")
>>> X2.update(Y2)
>>> X2.reset_index(inplace=True)
>>> X2
A B C D
0 A0 B0 C0xx D0xx
1 A1 B1 C1xx D1xx
2 A2 B2 C2 D2
I think you need combine_first with set_index if you want to fill in values matched on the A and B columns of both DataFrames:
print (Y.set_index(['A','B']).combine_first(X.set_index(['A','B'])).reset_index())
A B C D
0 A0 B0 C0xx D0xx
1 A1 B1 C1xx D1xx
2 A2 B2 C2 D2
Unfortunately, update does not work well here, because it aligns on the index rather than on the A and B values:
Y = pd.DataFrame({'A': ['A0', 'A1'],
'B': ['B0', 'B1'],
'C': ['C0xx', 'C1xx'],
'D': ['D0xx', 'D1xx']}, index=[2,1])
print (X)
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
print (Y)
A B C D
2 A0 B0 C0xx D0xx
1 A1 B1 C1xx D1xx
X.update(Y)
print (X)
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1xx D1xx
2 A0 B0 C0xx D0xx
# set_index returns a new DataFrame, so this updates a temporary copy and X is left unchanged
X.set_index(['A','B']).update(Y.set_index(['A','B']))
print (X)
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
print (Y.set_index(['A','B']).combine_first(X.set_index(['A','B'])).reset_index())
A B C D
0 A0 B0 C0xx D0xx
1 A1 B1 C1xx D1xx
2 A2 B2 C2 D2
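For completeness, here is a minimal sketch of update with a composite (A, B) key. It keeps a reference to the re-indexed frame so that the in-place update is not lost, and it assumes each (A, B) pair is unique in both frames:

```python
import pandas as pd

X = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                  'B': ['B0', 'B1', 'B2'],
                  'C': ['C0', 'C1', 'C2'],
                  'D': ['D0', 'D1', 'D2']})
Y = pd.DataFrame({'A': ['A0', 'A1'],
                  'B': ['B0', 'B1'],
                  'C': ['C0xx', 'C1xx'],
                  'D': ['D0xx', 'D1xx']}, index=[2, 1])

# Match rows on (A, B) instead of on the original integer index
X_keyed = X.set_index(['A', 'B'])

# update modifies X_keyed in place, overwriting C and D where (A, B) exists in Y
X_keyed.update(Y.set_index(['A', 'B']))

# Turn A and B back into ordinary columns
result = X_keyed.reset_index()
print(result)
```

Unlike combine_first, update never adds rows. (A, B) pairs that exist only in Y are ignored.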