Python Pandas multiply based on columns and add suffix


I have two DataFrame objects that I would like to multiply based on their column names, writing each product to a new column with a suffix:
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.randint(0,10, size=(5,5)), columns=list('ABCDE'))
A B C D E
0 6 2 1 7 2
1 0 0 2 1 8
2 7 2 6 6 9
3 2 5 5 1 3
4 9 1 6 7 4
df2 = pd.DataFrame(np.random.randint(1, 10, size=(5,3)), columns=list('ABC'))
A B C
0 2 1 2
1 7 5 1
2 2 1 4
3 7 8 5
4 9 2 2
I would like the output to contain columns A_x, B_x and C_x, each being the product of the matching columns in df1 and df2:
A B C A_x B_x C_x D E
0 6 2 1 12 2 2 7 2
1 0 0 2 0 0 2 1 8
2 7 2 6 14 2 24 6 9
3 2 5 5 14 40 25 1 3
4 9 1 6 81 2 12 7 4

You can use intersection to get the shared column names, then multiply with mul, add the suffix with add_suffix, and finally concat the result with df1:
cols = df1.columns.intersection(df2.columns)
df = df1[cols].mul(df2[cols], axis=1).add_suffix('_x')
df = pd.concat([df1, df], axis=1)
print (df)
A B C D E A_x B_x C_x
0 6 2 1 7 2 12 2 2
1 0 0 2 1 8 0 0 2
2 7 2 6 6 9 14 2 24
3 2 5 5 1 3 14 40 25
4 9 1 6 7 4 81 2 12
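A self-contained version of the steps above, using the fixed values printed in the question instead of random data, so the products can be checked by hand:

```python
import pandas as pd

# Fixed copies of the example frames from the question (instead of np.random data)
df1 = pd.DataFrame({'A': [6, 0, 7, 2, 9], 'B': [2, 0, 2, 5, 1],
                    'C': [1, 2, 6, 5, 6], 'D': [7, 1, 6, 1, 7],
                    'E': [2, 8, 9, 3, 4]})
df2 = pd.DataFrame({'A': [2, 7, 2, 7, 9], 'B': [1, 5, 1, 8, 2],
                    'C': [2, 1, 4, 5, 2]})

# Columns present in both frames: A, B, C (in df1's order)
cols = df1.columns.intersection(df2.columns)
# Element-wise product of the shared columns, renamed to A_x, B_x, C_x
products = df1[cols].mul(df2[cols]).add_suffix('_x')
# Append the product columns to the right of df1
df = pd.concat([df1, products], axis=1)
print(df)
```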
If you need to change the order of the columns:
cols = df1.columns.intersection(df2.columns)
df = df1[cols].mul(df2[cols], axis=1).add_suffix('_x')
cols1 = cols.tolist() + \
df.columns.tolist() + \
df1.columns.difference(df2.columns).tolist()
df = pd.concat([df1, df], axis=1)
print (df[cols1])
A B C A_x B_x C_x D E
0 6 2 1 12 2 2 7 2
1 0 0 2 0 0 2 1 8
2 7 2 6 14 2 24 6 9
3 2 5 5 14 40 25 1 3
4 9 1 6 81 2 12 7 4
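One caveat: Index.difference sorts its result by default, so the remaining columns come out alphabetically rather than in df1's order. That is invisible here because D and E are already sorted. A minimal sketch that keeps df1's original order, using a small hypothetical frame with E placed before D (passing sort=False to difference should have the same effect):

```python
import pandas as pd

# Hypothetical frames: E deliberately comes before D in df1
df1 = pd.DataFrame({'A': [6, 0], 'B': [2, 0], 'C': [1, 2], 'E': [2, 8], 'D': [7, 1]})
df2 = pd.DataFrame({'A': [2, 7], 'B': [1, 5], 'C': [2, 1]})

cols = df1.columns.intersection(df2.columns)
products = df1[cols].mul(df2[cols]).add_suffix('_x')
# Columns of df1 without a partner in df2, kept in df1's original order
rest = [c for c in df1.columns if c not in cols]
order = cols.tolist() + products.columns.tolist() + rest
df = pd.concat([df1, products], axis=1)[order]
print(df)
```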

Related

Python dataframe add columns in groups of 3

I have a DataFrame with n rows:
df = 1 2 3
4 5 6
4 2 3
3 1 9
6 7 0
9 2 5
I want to add a column that holds the same value for each group of 3 rows.
n (the number of rows) is guaranteed to be divisible by 3.
So the new df will be:
df = 1 2 3 A
4 5 6 A
4 2 3 A
3 1 9 B
6 7 0 B
9 2 5 B
What is the best way to do so?

First remove the trailing rows with DataFrame.iloc if the length is not divisible by 3, then create unique group labels by integer-dividing the row positions by 3:
print (df)
a b d
0 1 2 3
1 4 5 6
2 4 2 3
3 3 1 9
4 6 7 0
5 9 2 5
6 0 0 4 <- removed last row
N = 3
num = len(df) // N * N
df = df.iloc[:num].copy()
df['groups'] = np.arange(len(df)) // N
print (df)
a b d groups
0 1 2 3 0
1 4 5 6 0
2 4 2 3 0
3 3 1 9 1
4 6 7 0 1
5 9 2 5 1
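The question asked for letter labels (A, B, ...) rather than integers. A minimal sketch, assuming there are at most 26 groups, that maps the integer group numbers to letters:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [4, 2, 3],
                   [3, 1, 9], [6, 7, 0], [9, 2, 5], [0, 0, 4]],
                  columns=['a', 'b', 'd'])
N = 3
num = len(df) // N * N
# Drop the incomplete last group; copy() avoids a SettingWithCopyWarning below
df = df.iloc[:num].copy()
codes = np.arange(len(df)) // N   # 0, 0, 0, 1, 1, 1
# Assumed labelling scheme: 0 -> 'A', 1 -> 'B', ... (only valid for up to 26 groups)
df['groups'] = [chr(ord('A') + int(c)) for c in codes]
print(df)
```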

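If I understand correctly, sum each row, then group consecutive blocks of 3 rows with groupby and broadcast each block's total back with transform:
df['new_col'] = df.sum(axis=1).groupby(np.arange(len(df))//3).transform('sum')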
Output:
0 1 2 new_col
0 1 2 3 30
1 4 5 6 30
2 4 2 3 30
3 3 1 9 42
4 6 7 0 42
5 9 2 5 42
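This answer reads the question differently: the new column holds the total of all values in each block of 3 rows rather than a block label. A runnable version with the question's data, so the totals 30 and 42 can be checked:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [4, 2, 3],
                   [3, 1, 9], [6, 7, 0], [9, 2, 5]])
# Block id for each row: 0, 0, 0, 1, 1, 1
blocks = np.arange(len(df)) // 3
# Row totals, summed per block and broadcast back to every row of that block
df['new_col'] = df.sum(axis=1).groupby(blocks).transform('sum')
print(df)
```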

Can You Preserve Column Order When Pandas Dataframe.Combine Or DataFrame.Combine_First?

If you have two DataFrames, represented as:
A F Y
0 1 2 3
1 4 5 6
And
B C T
0 7 8 9
1 10 11 12
When I combine them, the result is:
A B C F T Y
0 1 7 8 2 9 3
1 4 10 11 5 12 6
I would like it to become:
A F Y B C T
0 1 2 3 7 8 9
1 4 5 6 10 11 12
How do I combine one DataFrame with another but keep the original column order?

In [1294]: new_df = df.join(df1)
In [1295]: new_df
Out[1295]:
A F Y B C T
0 1 2 3 7 8 9
1 4 5 6 10 11 12
Alternatively, you can use pd.merge and join on the indexes (not as clean as join, though):
In [1309]: pd.merge(df, df1, left_index=True, right_index=True)
Out[1309]:
A F Y B C T
0 1 2 3 7 8 9
1 4 5 6 10 11 12
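To compare the column orders side by side, here is a small sketch with the question's frames. pd.concat(axis=1) is a swapped-in alternative not mentioned in the answer, and the last step is an assumed way (not taken from the answer) to restore the original order after combine_first, which aligns on the union of the columns and may return them sorted:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 4], 'F': [2, 5], 'Y': [3, 6]})
df1 = pd.DataFrame({'B': [7, 10], 'C': [8, 11], 'T': [9, 12]})

# join and concat both put the left frame's columns first, then the right frame's
joined = df.join(df1)
concatenated = pd.concat([df, df1], axis=1)

# Explicit order: df's columns, then df1's columns that df does not already have
order = list(df.columns) + [c for c in df1.columns if c not in df.columns]
# Reindexing the combine_first result by that list restores the original order
combined = df.combine_first(df1)[order]
print(combined)
```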

Reshaping groupby dataframe to fixed dimensions

I have a DataFrame df with the following data:
A B C D
1 1 3 1
1 2 9 8
1 3 3 9
2 1 2 9
2 2 1 4
2 3 9 5
2 4 6 4
3 1 4 1
3 2 0 4
4 1 2 6
5 1 2 4
5 2 8 3
grp = df.groupby('A')
Next, I want every group of df (grouped on column A) to have the same number of rows, either by truncating extra rows or by padding with rows of 0. For the data above, I want each group to have 3 rows. I need the following result:
A B C D
1 1 3 1
1 2 9 8
1 3 3 9
2 1 2 9
2 2 1 4
2 3 9 5
3 1 4 1
3 2 0 4
3 0 0 0
4 1 2 6
4 0 0 0
4 0 0 0
5 1 2 4
5 2 8 3
5 0 0 0
Similarly, I may want to group by multiple columns, like:
grp = df.groupby(['A','B'])

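Use GroupBy.cumcount to create a counter column, then DataFrame.reindex with a MultiIndex built by MultiIndex.from_product; counters outside range(3) are dropped and missing ones are filled with 0: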
df['g'] = df.groupby('A').cumcount()
mux = pd.MultiIndex.from_product([df['A'].unique(), range(3)], names=('A','g'))
df = (df.set_index(['A','g'])
.reindex(mux, fill_value=0)
.reset_index(level=1, drop=True)
.reset_index())
print (df)
A B C D
0 1 1 3 1
1 1 2 9 8
2 1 3 3 9
3 2 1 2 9
4 2 2 1 4
5 2 3 9 5
6 3 1 4 1
7 3 2 0 4
8 3 0 0 0
9 4 1 2 6
10 4 0 0 0
11 4 0 0 0
12 5 1 2 4
13 5 2 8 3
14 5 0 0 0
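The same cumcount + from_product + reindex idea can be wrapped in a reusable function. A minimal sketch; pad_groups is a hypothetical helper name, and for several keys it pads the full product of the unique key values, as the EDIT below does:

```python
import pandas as pd

def pad_groups(df, keys, n, fill_value=0):
    """Hypothetical helper: truncate or pad every group of `keys` to exactly n rows."""
    keys = [keys] if isinstance(keys, str) else list(keys)
    # Counter inside each group (0, 1, 2, ...), used as an extra index level 'g'
    pos = df.groupby(keys).cumcount().rename('g')
    # Every combination of unique key values with counters 0 .. n-1
    levels = [df[k].unique() for k in keys] + [range(n)]
    mux = pd.MultiIndex.from_product(levels, names=keys + ['g'])
    return (df.set_index(keys + [pos])
              .reindex(mux, fill_value=fill_value)   # extra rows dropped, gaps filled
              .reset_index(level='g', drop=True)
              .reset_index())

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5],
                   'B': [1, 2, 3, 1, 2, 3, 4, 1, 2, 1, 1, 2],
                   'C': [3, 9, 3, 2, 1, 9, 6, 4, 0, 2, 2, 8],
                   'D': [1, 8, 9, 9, 4, 5, 4, 1, 4, 6, 4, 3]})
res = pad_groups(df, 'A', 3)
print(res)
```

Unlike the answer's code, the helper does not add a 'g' column to the input frame, so it can be called repeatedly on the same df.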
Another solution uses DataFrame.merge with a left join against a helper DataFrame:
from itertools import product
df['g'] = df.groupby('A').cumcount()
df1 = pd.DataFrame(list(product(df['A'].unique(), range(3))), columns=['A','g'])
df = df1.merge(df, how='left').fillna(0).astype(int).drop('g', axis=1)
print (df)
A B C D
0 1 1 3 1
1 1 2 9 8
2 1 3 3 9
3 2 1 2 9
4 2 2 1 4
5 2 3 9 5
6 3 1 4 1
7 3 2 0 4
8 3 0 0 0
9 4 1 2 6
10 4 0 0 0
11 4 0 0 0
12 5 1 2 4
13 5 2 8 3
14 5 0 0 0
EDIT: to group by multiple columns, extend the same approach:
df['g'] = df.groupby(['A','B']).cumcount()
mux = pd.MultiIndex.from_product([df['A'].unique(),
df['B'].unique(),
range(3)], names=('A','B','g'))
df = (df.set_index(['A','B','g'])
.reindex(mux, fill_value=0)
.reset_index(level=2, drop=True)
.reset_index())
print (df.head(10))
A B C D
0 1 1 3 1
1 1 1 0 0
2 1 1 0 0
3 1 2 9 8
4 1 2 0 0
5 1 2 0 0
6 1 3 3 9
7 1 3 0 0
8 1 3 0 0
9 1 4 0 0
Or with the merge-based solution:
from itertools import product
df['g'] = df.groupby(['A','B']).cumcount()
df1 = pd.DataFrame(list(product(df['A'].unique(),
df['B'].unique(),
range(3))), columns=['A','B','g'])
df = df1.merge(df, how='left').fillna(0).astype(int).drop('g', axis=1)
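Both multi-column versions pad the full product of the unique A and B values, which also creates groups such as (1, 4) that never occur in the data. If only the existing (A, B) pairs should be padded, one possible variant (an assumption about the desired output, and how='cross' needs pandas 1.2 or later) builds the helper frame from those pairs instead:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5],
                   'B': [1, 2, 3, 1, 2, 3, 4, 1, 2, 1, 1, 2],
                   'C': [3, 9, 3, 2, 1, 9, 6, 4, 0, 2, 2, 8],
                   'D': [1, 8, 9, 9, 4, 5, 4, 1, 4, 6, 4, 3]})
keys, n = ['A', 'B'], 3

# Counter inside each (A, B) group, as in the answer
counted = df.assign(g=df.groupby(keys).cumcount())
# Skeleton: each (A, B) pair that occurs in df, repeated with g = 0 .. n-1
skeleton = df[keys].drop_duplicates().merge(pd.DataFrame({'g': range(n)}), how='cross')
# Left join keeps the skeleton's order; missing slots become NaN, then 0
out = (skeleton.merge(counted, on=keys + ['g'], how='left')
               .fillna(0)
               .astype(int)
               .drop(columns='g'))
print(out.head(6))
```

For this data that gives 12 pairs times 3 = 36 rows, compared with 5 * 4 * 3 = 60 rows for the full product.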

add_suffix to column name based on position

I have a dataset where I want to add a suffix to the column names based on their positions. For example, the 1st to 4th columns should be named 'abc_1', the 5th to 8th columns 'abc_2', and so on.
I have tried DataFrame.rename, but it is time-consuming. What would be the most efficient way to achieve this?

I think a good choice here is to create a MultiIndex to avoid duplicated column names: build the first level by floor-dividing the column positions by 4 and add the prefix with f-strings:
np.random.seed(123)
df = pd.DataFrame(np.random.randint(10, size=(5, 10)))
df.columns = [[f'abc_{i+1}' for i in df.columns // 4], df.columns]
print (df)
abc_1 abc_2 abc_3
0 1 2 3 4 5 6 7 8 9
0 2 2 6 1 3 9 6 1 0 1
1 9 0 0 9 3 4 0 0 4 1
2 7 3 2 4 7 2 4 8 0 7
3 9 3 4 6 1 5 6 2 1 8
4 3 5 0 2 6 2 4 4 6 3
A more general solution, if the column names are not a RangeIndex:
cols = [f'abc_{i+1}' for i in np.arange(len(df.columns)) // 4]
df.columns = [cols, df.columns]
print (df)
abc_1 abc_2 abc_3
0 1 2 3 4 5 6 7 8 9
0 2 2 6 1 3 9 6 1 0 1
1 9 0 0 9 3 4 0 0 4 1
2 7 3 2 4 7 2 4 8 0 7
3 9 3 4 6 1 5 6 2 1 8
4 3 5 0 2 6 2 4 4 6 3
It is also possible to name the MultiIndex levels with MultiIndex.from_arrays:
df.columns = pd.MultiIndex.from_arrays([cols, df.columns], names=('level0','level1'))
print (df)
level0 abc_1 abc_2 abc_3
level1 0 1 2 3 4 5 6 7 8 9
0 2 2 6 1 3 9 6 1 0 1
1 9 0 0 9 3 4 0 0 4 1
2 7 3 2 4 7 2 4 8 0 7
3 9 3 4 6 1 5 6 2 1 8
4 3 5 0 2 6 2 4 4 6 3
Then you can select each block of columns with xs:
print (df.xs('abc_2', axis=1))
4 5 6 7
0 3 9 6 1
1 3 4 0 0
2 7 2 4 8
3 1 5 6 2
4 6 2 4 4
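With the blocks stored as the first column level, they can also be aggregated. A minimal sketch on deterministic data (the answer uses random values); the per-block sums go through a transpose, which sidesteps groupby(axis=1), deprecated in recent pandas versions:

```python
import numpy as np
import pandas as pd

# Deterministic 2 x 10 frame: row 0 holds 0..9, row 1 holds 10..19
df = pd.DataFrame(np.arange(20).reshape(2, 10))
# Block label per column position: abc_1 for 0-3, abc_2 for 4-7, abc_3 for 8-9
labels = [f'abc_{i + 1}' for i in np.arange(df.shape[1]) // 4]
df.columns = pd.MultiIndex.from_arrays([labels, df.columns], names=('level0', 'level1'))

# The 5th to 8th columns, with the level0 label dropped
block2 = df.xs('abc_2', axis=1, level='level0')
# Per-block row sums: transpose, group rows by level0, transpose back
block_sums = df.T.groupby(level='level0').sum().T
print(block_sums)
```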

sort dataframe by position in group then by that group

Consider the DataFrame df:
df = pd.DataFrame(dict(
A=list('aaaaabbbbccc'),
B=range(12)
))
print(df)
A B
0 a 0
1 a 1
2 a 2
3 a 3
4 a 4
5 b 5
6 b 6
7 b 7
8 b 8
9 c 9
10 c 10
11 c 11
I want to sort the DataFrame so that, grouping by column 'A', I take the first row from each group, then cycle back and take the second row from each group that still has one, and so on.
I'd expect the result to look like this:
A B
0 a 0
5 b 5
9 c 9
1 a 1
6 b 6
10 c 10
2 a 2
7 b 7
11 c 11
3 a 3
8 b 8
4 a 4

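You can first use cumcount to number the rows within each group, then sort those counters with a stable sort (so tied rows keep their original order) and reindex by the index of the resulting Series cum:
cum = df.groupby('A')['B'].cumcount().sort_values(kind='mergesort')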
print (cum)
0 0
5 0
9 0
1 1
6 1
10 1
2 2
7 2
11 2
3 3
8 3
4 4
dtype: int64
print (df.reindex(cum.index))
A B
0 a 0
5 b 5
9 c 9
1 a 1
6 b 6
10 c 10
2 a 2
7 b 7
11 c 11
3 a 3
8 b 8
4 a 4
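The sort needs to be stable because the default sort algorithm (quicksort) does not promise to keep tied rows in their original order. A minimal sketch of the same idea with an explicitly stable NumPy argsort:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(A=list('aaaaabbbbccc'), B=range(12)))

# Position of each row inside its group: 0..4 for 'a', 0..3 for 'b', 0..2 for 'c'
pos = df.groupby('A').cumcount()
# Stable sort on the position only: ties keep their original (group) order
result = df.iloc[np.argsort(pos.to_numpy(), kind='stable')]
print(result)
```

If the groups were interleaved rather than contiguous, sorting on both the position and the group, for example df.assign(pos=pos).sort_values(['pos', 'A']).drop(columns='pos'), would keep the a, b, c cycle.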

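Here's a NumPy approach:
import numpy as np

def approach1(g, v):
    # Inputs: 1D arrays of the groupby and value columns
    # Assumes rows of the same group are contiguous and there are at least two groups
    id_arr2 = np.ones(v.size, dtype=int)
    # Start index of every group except the first
    sf = np.flatnonzero(g[1:] != g[:-1]) + 1
    # At each group start, step back so the running sum restarts at 1
    id_arr2[sf[0]] = -sf[0] + 1
    id_arr2[sf[1:]] = sf[:-1] - sf[1:] + 1
    # cumsum gives the 1-based position inside each group; a stable argsort interleaves the groups
    return id_arr2.cumsum().argsort(kind='mergesort')
Sample run: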
In [246]: df
Out[246]:
A B
0 a 0
1 a 1
2 a 2
3 a 3
4 a 4
5 b 5
6 b 6
7 b 7
8 b 8
9 c 9
10 c 10
11 c 11
In [247]: df.iloc[approach1(df.A.values, df.B.values)]
Out[247]:
A B
0 a 0
5 b 5
9 c 9
1 a 1
6 b 6
10 c 10
2 a 2
7 b 7
11 c 11
3 a 3
8 b 8
4 a 4
Or use df.reindex as in jezrael's answer (this works here because the index is the default RangeIndex, so positions and labels coincide):
df.reindex(approach1(df.A.values, df.B.values))
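approach1 fails when there is only one group, because sf is then empty and sf[0] raises an IndexError. A hypothetical variant (round_robin_order is an illustrative name, not from the answer) that computes the in-group positions from the group start indices directly, still assuming contiguous groups:

```python
import numpy as np

def round_robin_order(g):
    """Hypothetical variant of approach1: row order that cycles through contiguous groups."""
    g = np.asarray(g)
    # Start index of every run of equal group values (always includes 0)
    starts = np.r_[0, np.flatnonzero(g[1:] != g[:-1]) + 1]
    lengths = np.diff(np.r_[starts, g.size])
    # Position of each element inside its own run: 0, 1, 2, ...
    pos = np.arange(g.size) - np.repeat(starts, lengths)
    # Stable sort so that, for equal positions, earlier groups come first
    return np.argsort(pos, kind='stable')

print(round_robin_order(list('aaaaabbbbccc')))
```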

