Pandas Copy columns from one data frame to another with different name - python

I have to copy columns from one DataFrame A to another DataFrame B. The column names in A and B do not match.
What is the best way to do it? There are several columns like this. Do I need to write a line for each column, like B["SO"] = A["Sales Order"], etc.?

I would use pd.concat:
combined_df = pd.concat([df1, df2[['column_a', 'column_b']]], axis=1)
It also lets you concatenate DataFrames of different sizes, choose an outer or inner join, and so on.
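As a rough illustration of the join options mentioned above, the sketch below concatenates two frames of different length. The frames and their contents are made up for the example; only the column names column_a and column_b come from the answer.

```python
import pandas as pd

# Hypothetical frames of different length; the data is illustrative only.
df1 = pd.DataFrame({"x": [1, 2, 3]})
df2 = pd.DataFrame({"column_a": [10, 20], "column_b": ["u", "v"], "other": [0, 0]})

# Default join="outer": keep every index label and fill the gaps with NaN.
outer = pd.concat([df1, df2[["column_a", "column_b"]]], axis=1)

# join="inner": keep only the index labels present in both frames.
inner = pd.concat([df1, df2[["column_a", "column_b"]]], axis=1, join="inner")

print(outer)
print(inner)
```

Note that the rows are paired by index label, not by position.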

Use:
df1 = pd.DataFrame({
'SO':list('abcdef'),
'RI':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
})
print (df1)
SO RI C
0 a 4 7
1 b 5 8
2 c 4 9
3 d 5 4
4 e 5 2
5 f 4 3
df2 = pd.DataFrame({
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')
})
print (df2)
D E F
0 1 5 a
1 3 3 a
2 5 6 a
3 7 9 b
4 1 2 b
5 0 4 b
Create a dictionary for renaming, select the matching columns, rename them with the dictionary, and use DataFrame.join to attach them to the original. The DataFrames are matched by index values:
d = {'SO':'Sales Order',
'RI':'Retail Invoices'}
df11 = df1[list(d)].rename(columns=d)
print (df11)
Sales Order Retail Invoices
0 a 4
1 b 5
2 c 4
3 d 5
4 e 5
5 f 4
df = df2.join(df11)
print (df)
D E F Sales Order Retail Invoices
0 1 5 a a 4
1 3 3 a b 5
2 5 6 a c 4
3 7 9 b d 5
4 1 2 b e 5
5 0 4 b f 4

Make a dictionary of abbreviations and try this code, for example:
full_form_dict = {'SO':'Sales Order',
'RI':'Retail Invoices',}
A_col = list(A.columns)
B_col = [v for k,v in full_form_dict.items() if k in A_col]
# to loop over A_col
# B_col = [v for col in A_col for k,v in full_form_dict.items() if k == col]
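A mapping like this can also drive the copy itself. The sketch below follows the question's direction (long names in A, short names in B) and assumes A and B share the same index; the sample data and the name mapping are hypothetical.

```python
import pandas as pd

# Hypothetical source frame A and target frame B with the same index.
A = pd.DataFrame({
    "Sales Order": ["a", "b", "c"],
    "Retail Invoices": [4, 5, 4],
    "Other": [0, 0, 0],
})
B = pd.DataFrame({"D": [1, 3, 5]})

# Map each column name in A to the name it should get in B.
mapping = {"Sales Order": "SO", "Retail Invoices": "RI"}

# Select the mapped columns, rename them, and attach them to B by index.
B = B.join(A[list(mapping)].rename(columns=mapping))
print(B)
```

This avoids writing one assignment per column; adding another column only means adding an entry to the dictionary.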

Related

Join tables and create combinations in python

Apologies in advance: the title is a bit fuzzy.
I have two tables. One contains unique names, for example 'A', 'B', 'C', and the other contains a time series of months, for example 10/2021, 11/2021, 12/2021. I want to join the tables so that I have all timestamps for each name. The final data should look like this:
Month    Name
10/2021  A
11/2021  A
12/2021  A
10/2021  B
11/2021  B
12/2021  B
10/2021  C
11/2021  C
12/2021  C
Adapted from the answers on the cartesian product in pandas:
df1 = pd.DataFrame([1, 2, 3], columns=['A'])
df2 = pd.DataFrame(["a", "b", "c"], columns=['B'])
df = (df1.assign(key=1)
.merge(df2.assign(key=1), on="key")
.drop("key", axis=1)
)
A B
0 1 a
1 1 b
2 1 c
3 2 a
4 2 b
5 2 c
6 3 a
7 3 b
8 3 c
If you are only trying to get the cartesian product of the values, you can use itertools.product:
import pandas as pd
from itertools import product
df1 = pd.DataFrame(list('abcd'), columns=['letters'])
df2 = pd.DataFrame(list('1234'), columns=['numbers'])
df_combined = pd.DataFrame(product(df1['letters'], df2['numbers']), columns=['letters', 'numbers'])
Output:
letters numbers
0 a 1
1 a 2
2 a 3
3 a 4
4 b 1
5 b 2
6 b 3
7 b 4
8 c 1
9 c 2
10 c 3
11 c 4
12 d 1
13 d 2
14 d 3
15 d 4
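On pandas 1.2 or later, merge also accepts how="cross", which removes the need for a temporary key column. The sketch below applies it to the names and months from the question; the frame and variable names are chosen for this example.

```python
import pandas as pd

names = pd.DataFrame({"Name": ["A", "B", "C"]})
months = pd.DataFrame({"Month": ["10/2021", "11/2021", "12/2021"]})

# how="cross" pairs every row of names with every row of months
# (requires pandas 1.2 or later).
result = names.merge(months, how="cross")[["Month", "Name"]]
print(result)
```

Putting names on the left keeps all months for one name together, which matches the order of the desired output.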

Add all columns from one dataframe to another without joining on a key/index

Given two dataframes df1 and df2 (same number of rows), how can we, very simply, take all the columns from df2 and add them to df1? With join, we join on the index or on a given column, but assume their indexes are completely different and they have no columns in common. Is that doable (without the obvious way of looping over each column in df2 and adding it as a new column to df1)?
EDIT: added an example.
Note: no index or column names are mentioned, since they should not matter (that is the "problem").
df1 = [[1, 3, 2],
       [11, 20, 33]]
df2 = [["bird", np.nan, 37, np.sqrt(2)],
       ["dog", 0.123, 3.14, 0]]
pd.some_operation(df1, df2)
# [[1, 3, 2, "bird", np.nan, 37, np.sqrt(2)],
#  [11, 20, 33, "dog", 0.123, 3.14, 0]]
Samples:
df1 = pd.DataFrame({
'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
}, index = list('QRSTUW'))
df2 = pd.DataFrame({
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')
}, index = list('KLMNOP'))
pandas always aligns on index values when you use join or concat with axis=1, so for correct alignment both DataFrames need the same index values:
df = df1.join(df2.set_index(df1.index))
df = pd.concat([df1, df2.set_index(df1.index)], axis=1)
print (df)
A B C D E F
Q a 4 7 1 5 a
R b 5 8 3 3 a
S c 4 9 5 6 a
T d 5 4 7 9 b
U e 5 2 1 2 b
W f 4 3 0 4 b
Or create a default index in both DataFrames:
df = df1.reset_index(drop=True).join(df2.reset_index(drop=True))
df = pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)
print (df)
A B C D E F
0 a 4 7 1 5 a
1 b 5 8 3 3 a
2 c 4 9 5 6 a
3 d 5 4 7 9 b
4 e 5 2 1 2 b
5 f 4 3 0 4 b
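To see why the index has to be made equal first, the small sketch below joins two frames whose index labels do not overlap, before and after giving them the same index. The data is invented for the example.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2]}, index=["Q", "R"])
df2 = pd.DataFrame({"D": [10, 20]}, index=["K", "L"])

# Joining on unrelated index labels finds no matches, so D is all NaN.
misaligned = df1.join(df2)

# Giving df2 the index of df1 pairs the rows by position instead.
aligned = df1.join(df2.set_index(df1.index))

print(misaligned)
print(aligned)
```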

Create a pd.DataFrame from a Series

I have a Dataframe like this:
I then select one row and try to add a new column named time with the value 15:
loc_OBL_ein = df.loc[5]
loc_OBL_ein.insert(1,'time',value=15)
Then I get the error 'Series' object has no attribute 'insert'.
My idea was to convert loc_OBL_ein into an object with the same column names as df. How can I do that?
Or is there another way to get this one particular row and keep the DataFrame format?
Thank you,
R
It seems you need a nested list (double brackets) in loc, so that the row with index 5 is returned as a one-row DataFrame rather than a Series:
df = pd.DataFrame({
'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')
})
print (df)
A B C D E F
0 a 4 7 1 5 a
1 b 5 8 3 3 a
2 c 4 9 5 6 a
3 d 5 4 7 9 b
4 e 5 2 1 2 b
5 f 4 3 0 4 b
loc_OBL_ein = df.loc[[5]]
loc_OBL_ein.insert(1,'time',value=15)
print (loc_OBL_ein)
A time B C D E F
5 f 15 4 3 0 4 b
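If the row has already been extracted as a Series, one possible way to get back to a one-row DataFrame is Series.to_frame followed by a transpose. This is a sketch on a smaller made-up frame, and the resulting columns have object dtype because the row mixes strings and numbers.

```python
import pandas as pd

df = pd.DataFrame({"A": list("abcdef"), "B": [4, 5, 4, 5, 5, 4]})

row = df.loc[5]              # a Series whose index holds the column names
row_df = row.to_frame().T    # one-row DataFrame with the original column names
row_df.insert(1, "time", 15)
print(row_df)
```

Selecting with df.loc[[5]] in the first place is simpler and keeps the original dtypes.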

How to combine two Series with the same index in python?

I have two Series (df1 and df2) of equal length, which need to be combined into one DataFrame column as follows. Each index has only one value or no values but never two values, so there are no duplicates (e.g. if df1 has a value 'A' at index 0, then df2 is empty at index 0, and vice versa).
df1 =  c1        df2 =  c2
0      A         0
1      B         1
2                2      C
3      D         3
4      E         4
5                5      F
6                6
7      G         7
The result I want is this:
0 A
1 B
2 C
3 D
4 E
5 F
6
7 G
I have tried .concat, .append and .union, but these do not produce the desired result. What is the correct approach then?
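You can try this (string concatenation works here because each index holds a value in at most one of the two Series, assuming the gaps are empty strings):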
df1['new'] = df1['c1'] + df2['c2']
For an in-place solution, I recommend pd.Series.replace:
df1['c1'].replace('', df2['c2'], inplace=True)
print(df1)
c1
0 A
1 B
2 C
3 D
4 E
5 F
6
7 G
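Another option that returns a new Series instead of modifying df1 in place is Series.mask. The sketch below assumes, as in the question, that the gaps are empty strings; the data is reconstructed from the example tables.

```python
import pandas as pd

# Assumed inputs: empty strings mark the missing positions.
df1 = pd.DataFrame({"c1": ["A", "B", "", "D", "E", "", "", "G"]})
df2 = pd.DataFrame({"c2": ["", "", "C", "", "", "F", "", ""]})

# Where c1 is empty, take the value from c2 at the same index label.
combined = df1["c1"].mask(df1["c1"] == "", df2["c2"])
print(combined.tolist())
```

If the gaps were NaN rather than empty strings, df1["c1"].combine_first(df2["c2"]) would be the closer fit.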

Formatting dataframe in appending

I want to append 2 dataframes:
data1:
a
1 a
2 b
3 c
4 d
5 e
data2:
b
1 f
2 g
3 h
4 i
5 j
output:
1 a
2 b
3 c
4 d
5 e
6 f
7 g
8 h
9 i
10 j
Currently I am using:
all_data= data1.append(data2, ignore_index=True)
this gives me result as:
a b
1 a
2 b
3 c
4 d
5 e
6 f
7 g
8 h
9 i
10 j
i.e. in different columns.
How can I get them in the same column?
I also tried converting the dataframes into lists and then appending, but that gave me the error:
TypeError: append() takes no keyword arguments
Also, is there any other function to remove duplicates from a dataframe of strings? The drop_duplicates() function does not work in my case. The data still has duplicates.
You need to change one column name, so append can detect what you want to do:
data2.columns = ["a"]
or
data1.columns = ["b"]
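And then, after using data2.columns = ["a"] (on pandas 2.0 and later, where DataFrame.append was removed, use pd.concat([data1, data2], ignore_index=True) instead):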
all_data = data1.append(data2, ignore_index=True)
all_data
a
0 a
1 b
2 c
3 d
4 e
5 f
6 g
7 h
8 i
9 j
The resulting column takes its name from the column of data1; you can rename it if you want:
all_data.columns = ["Foo"]
merge and concat work on keys, and in this case there are no common columns. However, why not use numpy append and create the dataframe from the result? (pd.np was removed in pandas 2.0, so numpy is imported directly as np here.)
In [68]: pd.DataFrame(np.append(data1.values, data2.values), columns=['A'])
Out[68]:
A
0 a
1 b
2 c
3 d
4 e
5 f
6 g
7 h
8 i
9 j
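Alternatively, rename the column of the first frame so both frames share the name b, then use pd.concat (df1 and df2 here stand for data1 and data2):
df1.columns = ['b']
df1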
Out[78]:
b
0 a
1 b
2 c
3 d
4 e
pd.concat([df1 , df2] , ignore_index=True)
Out[80]:
b
0 a
1 b
2 c
3 d
4 e
5 f
6 g
7 h
8 i
9 j
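On pandas 2.0 and later, where DataFrame.append is no longer available, the same result can be sketched with rename and pd.concat. The frames below are rebuilt from the question's tables; the final index shift is only there to reproduce the 1 to 10 numbering of the desired output.

```python
import pandas as pd

data1 = pd.DataFrame({"a": list("abcde")}, index=range(1, 6))
data2 = pd.DataFrame({"b": list("fghij")}, index=range(1, 6))

# Rename data2's column to match data1, then stack the rows.
all_data = pd.concat(
    [data1, data2.rename(columns={"b": "a"})],
    ignore_index=True,
)

# Shift the default 0-based index to 1..10, as in the desired output.
all_data.index = all_data.index + 1
print(all_data)
```

Using rename returns a renamed copy, so data2 itself keeps its original column name.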
