Adding a column in dataframes based on similar columns in them


I want to add column d of d1 and d2 wherever columns a, b and c are the same (similar to a groupby).
For example:
d1 = pd.DataFrame([[1,2,3,4]],columns=['a','b','c','d'])
d2 = pd.DataFrame([[1,2,3,4],[2,3,4,5]],columns=['a','b','c','d'])
then I'd like to get an output as
a b c d
0 1 2 3 8
1 2 3 4 5
That is, merge the two DataFrames and sum column d where a, b and c are the same.
d1.add(d2) or radd sums all columns, not just d.
The result should be a DataFrame that can be added to another one in the same way.
Any help is appreciated.

You can use set_index first:
print (d2.set_index(['a','b','c'])
.add(d1.set_index(['a','b','c']), fill_value=0)
.astype(int)
.reset_index())
a b c d
0 1 2 3 8
1 2 3 4 5
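The question also asks that the result can be added to another frame in the same way. A minimal sketch of that, wrapping the set_index approach above in a helper; the name add_on_keys is hypothetical, and the sketch assumes each a, b, c combination appears at most once per frame:

```python
import pandas as pd

KEYS = ["a", "b", "c"]  # key columns from the question


def add_on_keys(left, right, keys=KEYS):
    """Sum the non-key columns of two frames, matching rows on `keys`.

    Hypothetical helper; assumes each key combination is unique per frame.
    """
    total = left.set_index(keys).add(right.set_index(keys), fill_value=0)
    # fill_value=0 introduces floats, so cast back to int before returning
    return total.astype(int).reset_index()


d1 = pd.DataFrame([[1, 2, 3, 4]], columns=["a", "b", "c", "d"])
d2 = pd.DataFrame([[1, 2, 3, 4], [2, 3, 4, 5]], columns=["a", "b", "c", "d"])

once = add_on_keys(d1, d2)     # d is 8 and 5
twice = add_on_keys(once, d1)  # the result can be added again: d is 12 and 5
print(twice)
```

Because the helper returns a plain DataFrame with the key columns restored, its output can be passed back in as either argument.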

df = pd.concat([d1, d2])
df.drop_duplicates()
a b c d
0 1 2 3 4
1 2 3 4 5
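Note that the output above keeps d = 4 for the shared row and does not sum it to 8. If the sum is what is wanted, one possible variation on the concat idea (a sketch, not part of the original answer) is to group on the key columns and sum d:

```python
import pandas as pd

d1 = pd.DataFrame([[1, 2, 3, 4]], columns=["a", "b", "c", "d"])
d2 = pd.DataFrame([[1, 2, 3, 4], [2, 3, 4, 5]], columns=["a", "b", "c", "d"])

# Stack both frames, then sum d within each (a, b, c) group
summed = (pd.concat([d1, d2], ignore_index=True)
            .groupby(["a", "b", "c"], as_index=False)["d"]
            .sum())
print(summed)
```

Unlike the set_index version, this also works when a key combination is repeated within one frame, since every matching row ends up in the same group.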

Related

Merge on key and keep values of first dataframe

I have two dataframes:
df1
key value
A 1
B 2
C 2
D 3
df2
key value
C 3
D 3
E 5
F 7
I would like to merge these DataFrames on their key and get a DataFrame that looks like the one below. I want a single value column (no new columns with suffixes), and where both DataFrames have the same key but different values, the value from df1 should be kept and the one from df2 dropped.
df_merged
key value
A 1
B 2
C 2
D 3
E 5
F 7
How can I do this? Should I rather take join or concatenate? Thanks a lot!

Use concat with DataFrame.drop_duplicates by column key:
df = pd.concat([df1, df2], ignore_index=True).drop_duplicates('key')
print (df)
key value
0 A 1
1 B 2
2 C 2
3 D 3
6 E 5
7 F 7
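For reference, a self-contained version of the same idea, with the frames rebuilt from the question. The keep="first" argument is the default and is spelled out only to show why df1 wins on shared keys; the reset_index call is an optional addition that renumbers the rows:

```python
import pandas as pd

df1 = pd.DataFrame({"key": list("ABCD"), "value": [1, 2, 2, 3]})
df2 = pd.DataFrame({"key": list("CDEF"), "value": [3, 3, 5, 7]})

# df1 comes first in the concat, so keep="first" retains its rows for C and D
merged = (pd.concat([df1, df2], ignore_index=True)
            .drop_duplicates("key", keep="first")
            .reset_index(drop=True))
print(merged)
```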

Just adding to @jezrael's answer, you could also use groupby with first:
>>> pd.concat([df1, df2], ignore_index=True).groupby('key', as_index=False).first()
key value
0 A 1
1 B 2
2 C 2
3 D 3
4 E 5
5 F 7

Add all columns from one dataframe to another without joining on a key/index

Given two DataFrames df1 and df2 with the same number of rows, how can we simply take all the columns from df2 and add them to df1? join matches rows on the index or on a given column, but assume the indexes are completely different and the frames have no columns in common. Is that doable without the obvious approach of looping over each column in df2 and adding it to df1 as a new column?
EDIT: added an example.
Note: no index or column names are given, since they should not matter (that is the "problem").
df1 = [[1,3,2],
       [11,20,33]]
df2 = [["bird",np.nan,37,np.sqrt(2)],
       ["dog",0.123,3.14,0]]
pd.some_operation(df1,df2)
#[[1,3,2,"bird",np.nan,37,np.sqrt(2)],
# [11,20,33,"dog",0.123,3.14,0]]

Samples:
df1 = pd.DataFrame({
'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
}, index = list('QRSTUW'))
df2 = pd.DataFrame({
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')
}, index = list('KLMNOP'))
Pandas always aligns on index values when you use join or concat with axis=1, so for correct alignment both DataFrames need the same index values:
df = df1.join(df2.set_index(df1.index))
df = pd.concat([df1, df2.set_index(df1.index)], axis=1)
print (df)
A B C D E F
Q a 4 7 1 5 a
R b 5 8 3 3 a
S c 4 9 5 6 a
T d 5 4 7 9 b
U e 5 2 1 2 b
W f 4 3 0 4 b
Or create a default index in both DataFrames:
df = df1.reset_index(drop=True).join(df2.reset_index(drop=True))
df = pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)
print (df)
A B C D E F
0 a 4 7 1 5 a
1 b 5 8 3 3 a
2 c 4 9 5 6 a
3 d 5 4 7 9 b
4 e 5 2 1 2 b
5 f 4 3 0 4 b
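To see why the index has to be aligned first, here is a small sketch that compares a plain concat with the set_index version. The helper name concat_by_position and the length check are illustrative additions, not part of the answer above:

```python
import pandas as pd


def concat_by_position(left, right):
    """Place the columns of `right` next to `left`, row by row (hypothetical helper)."""
    if len(left) != len(right):
        raise ValueError("both frames must have the same number of rows")
    # Give `right` the index of `left` so concat pairs rows by position
    return pd.concat([left, right.set_index(left.index)], axis=1)


df1 = pd.DataFrame({"A": list("abc"), "B": [4, 5, 4]}, index=list("QRS"))
df2 = pd.DataFrame({"D": [1, 3, 5], "F": list("xyz")}, index=list("KLM"))

naive = pd.concat([df1, df2], axis=1)   # indexes differ: 6 rows padded with NaN
aligned = concat_by_position(df1, df2)  # 3 rows, one per original row
print(naive.shape, aligned.shape)
```

The result keeps the index of the left frame; use reset_index(drop=True) on both inputs instead if a default 0..n-1 index is preferred.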

How do I merge pandas df with multiple columns using one key from another column?

I have a Dataframe with two columns A and B (df1)
A B
1 2
1 3
2 3
And a Dataframe (df2) with a "dictionary" describing 1, 2 and 3
O P Q
1 s a
2 s b
3 t b
Now I want to merge the first table with the second table such that I get the following:
A B P1 Q1 P2 Q2
1 2 s a s b
1 3 s a t b
2 3 s b t b
I've tried df1.merge(df2, left_on=["A","B"], right_on=["O","O"])

You have two separate merging schemes here, so you'll have to call merge twice:
(df1.merge(df2, left_on="A", right_on="O")
.merge(df2, left_on="B", right_on="O")
.drop(columns=['O_x', 'O_y']))
A B P_x Q_x P_y Q_y
0 1 2 s a s b
1 1 3 s a t b
2 2 3 s b t b
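If the column names P1, Q1, P2, Q2 from the question are wanted instead of the _x and _y suffixes, one possible alternative is to turn df2 into a lookup table indexed by O and join it twice with different suffixes. This is a sketch that assumes the values in O are unique; the name lookup is illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 1, 2], "B": [2, 3, 3]})
df2 = pd.DataFrame({"O": [1, 2, 3], "P": ["s", "s", "t"], "Q": ["a", "b", "b"]})

lookup = df2.set_index("O")  # assumes each value of O appears once

# join(..., on=col) matches df1[col] against the index of the other frame
result = (df1.join(lookup.add_suffix("1"), on="A")
             .join(lookup.add_suffix("2"), on="B"))
print(result)
```

Because the key column O becomes the index of the lookup table, there is no O_x or O_y to drop afterwards.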

Pandas Copy columns from one data frame to another with different name

I have to copy columns from one DataFrame A to another DataFrame B. The column names in A and B do not match.
What is the best way to do it? There are several columns like this. Do I need to write a line for each column, like B["SO"] = A["Sales Order"], etc.?

I would use pd.concat:
combined_df = pd.concat([df1, df2[['column_a', 'column_b']]], axis=1)
It also lets you concatenate DataFrames of different sizes, do an outer join, etc.

Use:
df1 = pd.DataFrame({
'SO':list('abcdef'),
'RI':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
})
print (df1)
SO RI C
0 a 4 7
1 b 5 8
2 c 4 9
3 d 5 4
4 e 5 2
5 f 4 3
df2 = pd.DataFrame({
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')
})
print (df2)
D E F
0 1 5 a
1 3 3 a
2 5 6 a
3 7 9 b
4 1 2 b
5 0 4 b
Create a dictionary for renaming, select the matching columns, rename them with the dictionary, and use DataFrame.join to attach them to the other DataFrame. The DataFrames are matched by index values:
d = {'SO':'Sales Order',
'RI':'Retail Invoices'}
df11 = df1[list(d)].rename(columns=d)
print (df11)
Sales Order Retail Invoices
0 a 4
1 b 5
2 c 4
3 d 5
4 e 5
5 f 4
df = df2.join(df11)
print (df)
D E F Sales Order Retail Invoices
0 1 5 a a 4
1 3 3 a b 5
2 5 6 a c 4
3 7 9 b d 5
4 1 2 b e 5
5 0 4 b f 4

Make a dictionary that maps each abbreviation to its full name, then use it to build the list of matching column names.
Example:
full_form_dict = {'SO':'Sales Order',
'RI':'Retail Invoices',}
A_col = list(A.columns)
B_col = [v for k,v in full_form_dict.items() if k in A_col]
# to loop over A_col
# B_col = [v for col in A_col for k,v in full_form_dict.items() if k == col]
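Putting the dictionary idea together for the direction asked in the question (copying A["Sales Order"] into B["SO"]), a minimal sketch might look like this. The sample data, the extra columns, and the name name_map are made up for illustration, and both frames are assumed to share the same index:

```python
import pandas as pd

# Illustrative data: A uses the long names, B should receive the short ones
A = pd.DataFrame({"Sales Order": ["a", "b", "c"],
                  "Retail Invoices": [4, 5, 4],
                  "Other": [7, 8, 9]})
B = pd.DataFrame({"D": [1, 3, 5]})

# Maps a column name in A to the name it should have in B
name_map = {"Sales Order": "SO", "Retail Invoices": "RI"}

# Select the mapped columns, rename them, and attach them to B by index
B = B.join(A[list(name_map)].rename(columns=name_map))
print(B)
```

Columns of A that are not in the dictionary, such as Other here, are left out.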

How to combine two Series with the same index in python?

I have two Series (df1 and df2) of equal length, which need to be combined into one DataFrame column as follows. Each index has only one value or no values but never two values, so there are no duplicates (e.g. if df1 has a value 'A' at index 0, then df2 is empty at index 0, and vice versa).
df1 = c1    df2 = c2
0  A        0
1  B        1
2           2  C
3  D        3
4  E        4
5           5  F
6           6
7  G        7
The result I want is this:
0 A
1 B
2 C
3 D
4 E
5 F
6
7 G
I have tried .concat, .append and .union, but these do not produce the desired result. What is the correct approach then?

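You can try this, assuming the empty cells are empty strings, so that string concatenation keeps whichever value is present: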
df1['new'] = df1['c1'] + df2['c2']

For an in-place solution, I recommend pd.Series.replace:
df1['c1'].replace('', df2['c2'], inplace=True)
print(df1)
c1
0 A
1 B
2 C
3 D
4 E
5 F
6
7 G
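A version that avoids inplace modification is Series.where, which keeps a value where the condition holds and takes it from the other Series elsewhere. This is a sketch that assumes the empty cells are empty strings (not NaN) and that both Series share the same index:

```python
import pandas as pd

c1 = pd.Series(["A", "B", "", "D", "E", "", "", "G"], name="c1")
c2 = pd.Series(["", "", "C", "", "", "F", "", ""], name="c2")

# Keep c1 where it is non-empty; otherwise take the value from c2
combined = c1.where(c1 != "", c2)
print(combined.tolist())
```

If the gaps are NaN instead of empty strings, c1.combine_first(c2) or c1.fillna(c2) follows the same pattern.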
