My task is like this:
df=pd.DataFrame([(1,2,3,4,5,6),(1,2,3,4,5,6),(1,2,3,4,5,6)],columns=['a','b','c','d','e','f'])
Out:
a b c d e f
0 1 2 3 4 5 6
1 1 2 3 4 5 6
2 1 2 3 4 5 6
What I want is for the output dataframe to look like this:
Out
s1 b s2 d s3 f
0 3 2 7 4 11 6
1 3 2 7 4 11 6
2 3 2 7 4 11 6
That is to say, sum the column pairs (a,b), (c,d), (e,f) separately, keep the second column of each pair, and rename the sum columns (s1,s2,s3). Could anyone help solve this problem in pandas? Thank you so much.
You can select columns by position with iloc, sum each pair of values, and then rename the summed columns with f-strings:
i = 2
for x in range(0, len(df.columns), i):
    df.iloc[:, x] = df.iloc[:, x:x+i].sum(axis=1)
    df = df.rename(columns={df.columns[x]: f's{x // i + 1}'})
print (df)
s1 b s2 d s3 f
0 3 2 7 4 11 6
1 3 2 7 4 11 6
2 3 2 7 4 11 6
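A loop-free variant of the same idea is also possible (a sketch; it assumes an even number of columns arranged in adjacent pairs, with the sum landing in the left column of each pair):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(1, 2, 3, 4, 5, 6)] * 3, columns=['a', 'b', 'c', 'd', 'e', 'f'])

out = df.copy()
# write each pair's sum into the left column of the pair
out.iloc[:, ::2] = df.iloc[:, ::2].to_numpy() + df.iloc[:, 1::2].to_numpy()
# rename every even-positioned column to s1, s2, ...
out.columns = [f's{i // 2 + 1}' if i % 2 == 0 else c
               for i, c in enumerate(df.columns)]
print(out)
```

This avoids rebuilding the dataframe inside a loop, at the cost of a brief round trip through numpy for the aligned addition.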
For one pair, do:
df['a'] = df['a'] + df['b']
df.rename(columns={'a': 's1'}, inplace=True)
You can use a loop to do them all. The loop, using enumerate and zip, generates
(0, ('a','b')), (1, ('c','d')), (2, ('e','f'))
Use these indexes to do the sum and the renaming:
import pandas as pd

cols = ['a','b','c','d','e','f']
df = pd.DataFrame([(1,2,3,4,5,6),(1,2,3,4,5,6),(1,2,3,4,5,6)], columns=cols)
for idx, (col1, col2) in enumerate(zip(cols[::2], cols[1::2])):
    df[col1] = df[col1] + df[col2]
    df.rename(columns={col1: 's' + str(idx + 1)}, inplace=True)
print(df)
You can try this:
res = pd.DataFrame()
for i in range(len(df.columns) - 1):
    if i % 2 == 0:
        res[df.columns[i]] = df[df.columns[i]] + df[df.columns[i+1]]
    else:
        res[df.columns[i]] = df[df.columns[i]]
res['f'] = df[df.columns[-1]]
res.columns = ['s1', 'b', 's2', 'd', 's3', 'f']
Output:
s1 b s2 d s3 f
0 3 2 7 4 11 6
1 3 2 7 4 11 6
2 3 2 7 4 11 6
df=pd.DataFrame([(1,2,3,4,5,6),(1,2,3,4,5,6),(1,2,3,4,5,6)],columns=['a','b','c','d','e','f'])
df['s1'] = df['a'] + df['b']
df['s2'] = df['c'] + df['d']
df['s3'] = df['e'] + df['f']
df is now:
a b c d e f s1 s2 s3
0 1 2 3 4 5 6 3 7 11
1 1 2 3 4 5 6 3 7 11
2 1 2 3 4 5 6 3 7 11
and you can remove the columns 'a', 'c' and 'e':
df.pop('a')
df.pop('c')
df.pop('e')
df is now:
b d f s1 s2 s3
0 2 4 6 3 7 11
1 2 4 6 3 7 11
2 2 4 6 3 7 11
The jump is in steps of two, so we can split the dataframe with np.split:
import numpy as np

res = np.split(df.to_numpy(), df.shape[-1] // 2, 1)
Next, we compute the new data, where we sum the pairs of columns and keep the last column of each pair:
new_frame = np.hstack([np.vstack((np.sum(entry,1), entry[:,-1])).T for entry in res])
Create the new column names, taking into account the jump of 2:
new_cols = [f"s{ind//2+1}" if ind%2==0 else val for ind,val in enumerate(df.columns)]
Create the new dataframe:
pd.DataFrame(new_frame, columns=new_cols)
s1 b s2 d s3 f
0 3 2 7 4 11 6
1 3 2 7 4 11 6
2 3 2 7 4 11 6
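The numpy steps above, collected into one self-contained sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(1, 2, 3, 4, 5, 6)] * 3, columns=['a', 'b', 'c', 'd', 'e', 'f'])

# split the values into (n_cols // 2) arrays of two columns each
res = np.split(df.to_numpy(), df.shape[-1] // 2, 1)
# for each pair: its row-wise sum, then its last column
new_frame = np.hstack([np.vstack((np.sum(entry, 1), entry[:, -1])).T
                       for entry in res])
# rename every even-positioned column to s1, s2, ...
new_cols = [f"s{ind // 2 + 1}" if ind % 2 == 0 else val
            for ind, val in enumerate(df.columns)]
out = pd.DataFrame(new_frame, columns=new_cols)
print(out)
```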
I have a dataframe and I would like to insert rows at specific indexes at the beginning of each group within the dataframe. As an example lets say I have the following dataframe:
import pandas as pd
df = pd.DataFrame(data=[['A',1,1],['A',2,3],['A',5,4],['B',3,4],['B',2,6],['B',8,4],['C',9,3],['C',3,7],['C',1,9],['D',5,5],['D',8,3],['D',4,7]], columns=['Group','val1','val2'])
I would like to copy the first row of each unique value in the Group column and insert that row at the beginning of its group, growing the dataframe. I can currently achieve this with a for loop, but it is pretty slow because my dataframe is large, so I am looking for a vectorized solution.
I have a list of indexes where I would like to insert the rows.
idxs = [0, 3, 6, 9]
In each iteration of the loop I currently slice the dataframe at one of the idxs into two dataframes, insert the row, and concat them. Since my dataframe is very large, this process is very slow.
The solution would look like this:
Group val1 val2
0 A 1 1
1 A 1 1
2 A 2 3
3 A 5 4
4 B 3 4
5 B 3 4
6 B 2 6
7 B 8 4
8 C 9 3
9 C 9 3
10 C 3 7
11 C 1 9
12 D 5 5
13 D 5 5
14 D 8 3
15 D 4 7
You can do this by grouping on Group, concatenating each group's first row to the group itself, and then concatenating all of those results together.
Code:
import pandas as pd
df = pd.DataFrame(data=[['A',1,1],['A',2,3],['A',5,4],['B',3,4],['B',2,6],['B',8,4],['C',9,3],['C',3,7],['C',1,9],['D',5,5],['D',8,3],['D',4,7]], columns=['Group','val1','val2'])
df_new = pd.concat([
    pd.concat([grp.iloc[[0], :], grp])
    for key, grp in df.groupby('Group')
])
print(df_new)
Output:
Group val1 val2
0 A 1 1
0 A 1 1
1 A 2 3
2 A 5 4
3 B 3 4
3 B 3 4
4 B 2 6
5 B 8 4
6 C 9 3
6 C 9 3
7 C 3 7
8 C 1 9
9 D 5 5
9 D 5 5
10 D 8 3
11 D 4 7
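Since the question explicitly asks for a vectorized solution and the first-row positions are already available in idxs, an alternative sketch (treating idxs as positional offsets) builds the duplicated row order once and takes it with a single iloc call, which also restores the sequential 0..15 index shown in the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[['A',1,1],['A',2,3],['A',5,4],['B',3,4],['B',2,6],['B',8,4],
                        ['C',9,3],['C',3,7],['C',1,9],['D',5,5],['D',8,3],['D',4,7]],
                  columns=['Group','val1','val2'])
idxs = [0, 3, 6, 9]  # positions of each group's first row

# positions at idxs appear twice after the concatenation; sorting
# puts the duplicate right next to the original row
take = np.sort(np.concatenate([np.arange(len(df)), idxs]))
out = df.iloc[take].reset_index(drop=True)
print(out)
```

This does no per-group Python work at all, so it should scale much better than slice-and-concat in a loop.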
So I have the following pandas dataframe:
import pandas as pd
sample_df = pd.DataFrame({'note': ['D','C','D','C'], 'time': [1,1,4,6], 'val': [6,4,7,9]})
which gives the result
note time val
0 D 1 6
1 C 1 4
2 D 4 7
3 C 6 9
What I want is
note index time val
C 1 1 4
3 6 9
D 0 1 6
2 4 7
I tried sample_df.set_index('note',append=True) and it didn't work.
Use DataFrame.swaplevel together with DataFrame.sort_index on the first level:
df = sample_df.set_index('note', append=True).swaplevel(1,0).sort_index(level=0)
print (df)
time val
note
C 1 1 4
3 6 9
D 0 1 6
2 4 7
If you need to set a level name, add DataFrame.rename_axis:
df = (sample_df.rename_axis('idx')
               .set_index('note', append=True)
               .swaplevel(1, 0)
               .sort_index(level=0))
print (df)
time val
note idx
C 1 1 4
3 6 9
D 0 1 6
2 4 7
Alternatively:
sample_df.index.rename('old_index', inplace=True)
sample_df.reset_index(inplace=True)
sample_df.set_index(['note','old_index'], inplace=True)
sample_df.sort_index(level=0, inplace=True)
print (sample_df)
time val
note old_index
C 1 1 4
3 6 9
D 0 1 6
2 4 7
I am using MultiIndex.from_arrays to create the target index:
sample_df.index=pd.MultiIndex.from_arrays([sample_df.note,sample_df.index])
sample_df.drop('note', axis=1, inplace=True)
sample_df=sample_df.sort_index(level=0)
sample_df
time val
note
C 1 1 4
3 6 9
D 0 1 6
2 4 7
I would use set_index and pop to simultaneously discard the column 'note' and set the new index:
df.set_index([df.pop('note'), df.index]).sort_index(level=0)
Out[380]:
time val
note
C 1 1 4
3 6 9
D 0 1 6
2 4 7
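One thing to watch with the pop-based one-liner: pop mutates the dataframe it is called on, so if you still need the original afterwards, run it on a copy (a sketch):

```python
import pandas as pd

sample_df = pd.DataFrame({'note': ['D', 'C', 'D', 'C'],
                          'time': [1, 1, 4, 6],
                          'val': [6, 4, 7, 9]})

df = sample_df.copy()  # pop removes 'note' from df in place, so work on a copy
res = df.set_index([df.pop('note'), df.index]).sort_index(level=0)
print(res)
```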
Given the following DataFrame:
>>> pd.DataFrame(data=[['a',1],['a',2],['b',3],['b',4],['c',5],['c',6],['d',7],['d',8],['d',9],['e',10]],columns=['key','value'])
key value
0 a 1
1 a 2
2 b 3
3 b 4
4 c 5
5 c 6
6 d 7
7 d 8
8 d 9
9 e 10
I'm looking for a method that will change the structure based on the key value, like so:
a b c d e
0 1 3 5 7 10
1 2 4 6 8 10 <- 10 is duplicated
2 2 4 6 9 10 <- 10 is duplicated
The number of result rows equals the longest group's count (d in the above example), and missing values are filled with duplicates of the last available value.
Create a MultiIndex with set_index and a counter column from cumcount, reshape with unstack, replace missing values with the last non-missing ones using ffill, and finally convert all data to integers if necessary:
df = df.set_index([df.groupby('key').cumcount(),'key'])['value'].unstack().ffill().astype(int)
Another solution with custom lambda function:
df = (df.groupby('key')['value']
        .apply(lambda x: pd.Series(x.values))
        .unstack(0)
        .ffill()
        .astype(int))
print (df)
key a b c d e
0 1 3 5 7 10
1 2 4 6 8 10
2 2 4 6 9 10
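For reference, the first one-liner run end-to-end on the sample data from the question (a sketch):

```python
import pandas as pd

df = pd.DataFrame(data=[['a',1],['a',2],['b',3],['b',4],['c',5],['c',6],
                        ['d',7],['d',8],['d',9],['e',10]],
                  columns=['key','value'])

# cumcount numbers the rows within each key: a->0,1  b->0,1  ...  d->0,1,2
out = (df.set_index([df.groupby('key').cumcount(), 'key'])['value']
         .unstack()       # keys become columns, counter becomes the index
         .ffill()         # forward-fill the short groups
         .astype(int))
print(out)
```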
Using pivot with groupby + cumcount:
df.assign(key2=df.groupby('key').cumcount()).pivot(index='key2', columns='key', values='value').ffill().astype(int)
Out[214]:
key a b c d e
key2
0 1 3 5 7 10
1 2 4 6 8 10
2 2 4 6 9 10
SQL: SELECT MAX(A), MIN(B), C FROM Table GROUP BY C
I want to do the same operation in pandas on a dataframe. The closest I got was:
DF2 = DF1.groupby(by=['C']).max()
where I end up getting the max of both columns. How do I do more than one aggregation while grouping?
You can use the agg function:
DF2 = DF1.groupby('C').agg({'A': max, 'B': min})
Sample:
print(DF1)
A B C D
0 1 5 a a
1 7 9 a b
2 2 10 c d
3 3 2 c c
DF2 = DF1.groupby('C').agg({'A': max, 'B': min})
print(DF2)
A B
C
a 7 5
c 3 2
GroupBy-fu: improvements in grouping and aggregating data in pandas - nice explanations.
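In newer pandas (0.25+), the same grouping can also be written with named aggregation, which lets you rename the output columns at the same time (a sketch; the names max_A and min_B are my own choice, and the sample data mirrors the table above):

```python
import pandas as pd

DF1 = pd.DataFrame({'A': [1, 7, 2, 3],
                    'B': [5, 9, 10, 2],
                    'C': ['a', 'a', 'c', 'c'],
                    'D': ['a', 'b', 'd', 'c']})

# named aggregation: result_column=(source_column, aggregation)
DF2 = DF1.groupby('C').agg(max_A=('A', 'max'), min_B=('B', 'min'))
print(DF2)
```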
Try the agg() function:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,5,size=(20, 3)), columns=list('ABC'))
print(df)
print(df.groupby('C').agg({'A': max, 'B':min}))
Output:
A B C
0 2 3 0
1 2 2 1
2 4 0 1
3 0 1 4
4 3 3 2
5 0 4 3
6 2 4 2
7 3 4 0
8 4 2 2
9 3 2 1
10 2 3 1
11 4 1 0
12 4 3 2
13 0 0 1
14 3 1 1
15 4 1 1
16 0 0 0
17 4 0 1
18 3 4 0
19 0 2 4
A B
C
0 4 0
1 4 0
2 4 2
3 0 4
4 0 1
Alternatively, you may want to check the pandas.read_sql_query() function...
You can use the agg function:
import pandas as pd
import numpy as np
df.groupby('something').agg({'column1': np.max, 'column2': np.min})