Dynamic cumulative summations in Pandas - python

In the following DataFrame, the column B computes the sum of column A from index 0 to n.
ix A B
---------------
0 1 1
1 1 2
2 1 3
3 1 4
4 2 6
5 -1 5
6 -3 2
Alternatively, the column B sums 1 for each type == 'I' and -1 for each type == 'O'.
ix type B
----------------
0 I 1
1 I 2
2 O 1
3 I 2
4 O 1
5 O 0
6 I 1
How can I perform this type of calculation, where the n-th value of one column depends on the aggregated values of another column up to row n?

You can use cumsum:
df['C'] = df.A.cumsum()
print (df)
ix A B C
0 0 1 1 1
1 1 1 2 2
2 2 1 3 3
3 3 1 4 4
4 4 2 6 6
5 5 -1 5 5
6 6 -3 2 2
For the second DataFrame, first map the labels to numbers with a dictionary, then take the cumulative sum:
df['C'] = df.type.map({'I':1, 'O':-1}).cumsum()
print (df)
ix type B C
0 0 I 1 1
1 1 I 2 2
2 2 O 1 1
3 3 I 2 2
4 4 O 1 1
5 5 O 0 0
6 6 I 1 1
Or:
df['C'] = df.type.replace({'I':1, 'O':-1}).cumsum()
print (df)
ix type B C
0 0 I 1 1
1 1 I 2 2
2 2 O 1 1
3 3 I 2 2
4 4 O 1 1
5 5 O 0 0
6 6 I 1 1
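As a self-contained illustration of the map-then-cumsum idea above, here is a minimal sketch that rebuilds the second table from the question. The variable name steps is only illustrative and does not come from the original answer.

```python
import pandas as pd

# Sample data taken from the second table in the question.
df = pd.DataFrame({"type": ["I", "I", "O", "I", "O", "O", "I"]})

# Map each label to its signed step: +1 for 'I', -1 for 'O'.
steps = df["type"].map({"I": 1, "O": -1})

# The running total of the steps is the requested column B.
df["B"] = steps.cumsum()

print(df)
```

Note that map turns any label missing from the dictionary into NaN, so it is worth checking the labels first if the column may contain other values.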

Related

How to calculate sum of squares of each cell for a row in dataframe?

I have a DataFrame like this:
Index A B C D E
0 4 2 4 4 1
1 1 4 1 4 4
2 3 1 2 0 1
3 1 0 2 2 4
4 0 1 1 0 2
I want to square each cell in a row, add the squares up, and put the result in a column "sum of squares". How can I do that?
I expect this result:
Index A B C D E sum of squares
0 4 2 4 4 1 53
1 1 4 1 4 4 50
2 3 1 2 0 1 15
3 1 0 2 2 4 25
4 0 1 1 0 2 6
You can do this with apply() and sum().
Code:
import pandas as pd
lis = [(4, 2, 4, 4, 1),
       (1, 4, 1, 4, 4),
       (3, 1, 2, 0, 1),
       (1, 0, 2, 2, 4),
       (0, 1, 1, 0, 2)]
df = pd.DataFrame(lis)
df.columns = ['A', 'B', 'C', 'D', 'E']
# print(df)
# Main code
# Square every cell; apply() passes each column to the lambda as a Series.
new = df.apply(lambda num: num**2)
# Sum the squares across each row and store them in a new column.
df['sum_of_squares'] = new.sum(axis=1)
print(df)
Output:
A B C D E sum_of_squares
0 4 2 4 4 1 53
1 1 4 1 4 4 50
2 3 1 2 0 1 15
3 1 0 2 2 4 25
4 0 1 1 0 2 6
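The same result can also be reached without apply(), since arithmetic on a DataFrame is already element-wise. The sketch below is an alternative to the answer above, not the answerer's own code; the cols list is an illustrative name, and it selects the input columns explicitly so the new column is never fed back into the sum if the line is run twice.

```python
import pandas as pd

df = pd.DataFrame(
    [(4, 2, 4, 4, 1),
     (1, 4, 1, 4, 4),
     (3, 1, 2, 0, 1),
     (1, 0, 2, 2, 4),
     (0, 1, 1, 0, 2)],
    columns=["A", "B", "C", "D", "E"],
)

# Columns that take part in the calculation (illustrative name).
cols = ["A", "B", "C", "D", "E"]

# pow(2) squares every cell; sum(axis=1) adds the squares across each row.
df["sum_of_squares"] = df[cols].pow(2).sum(axis=1)

print(df)
```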

Pandas group consecutive and label the length

I want to label each run of consecutive 1s in a column with the length of that run. My data:
a
---
1
0
1
0
1
1
1
0
1
1
I want:
a | c
--------
1 1
0 0
1 2
1 2
0 0
1 3
1 3
1 3
0 0
1 2
1 2
Then I can calculate the mean of the "b" column by group "c". I tried shift, cumsum and cumcount, but none of them worked.
Use GroupBy.transform on the consecutive groups, then set the result to 0 wherever column a is not 1:
df['c1'] = (df.groupby(df.a.ne(df.a.shift()).cumsum())['a']
.transform('size')
.where(df.a.eq(1), 0))
print (df)
a b c c1
0 1 1 1 1
1 0 2 0 0
2 1 3 2 2
3 1 2 2 2
4 0 1 0 0
5 1 3 3 3
6 1 1 3 3
7 1 3 3 3
8 0 2 0 0
9 1 2 2 2
10 1 1 2 2
If the column contains only the values 0 and 1, you can instead multiply by a:
df['c1'] = (df.groupby(df.a.ne(df.a.shift()).cumsum())['a']
.transform('size')
.mul(df.a))
print (df)
a b c c1
0 1 1 1 1
1 0 2 0 0
2 1 3 2 2
3 1 2 2 2
4 0 1 0 0
5 1 3 3 3
6 1 1 3 3
7 1 3 3 3
8 0 2 0 0
9 1 2 2 2
10 1 1 2 2
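To make the grouping key in the answers above easier to follow, the sketch below splits it into named steps and then computes the mean of b per label, which was the asker's stated goal. The data is copied from the printed output above; the names starts, run_id and mean_b are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1],
    "b": [1, 2, 3, 2, 1, 3, 1, 3, 2, 2, 1],
})

# True wherever the value differs from the previous row, i.e. where a run starts.
starts = df["a"].ne(df["a"].shift())

# Counting the run starts cumulatively gives every run its own id.
run_id = starts.cumsum()

# Size of the run each row belongs to, set to 0 where a is not 1.
df["c"] = df.groupby(run_id)["a"].transform("size").where(df["a"].eq(1), 0)

# Mean of b for each run-length label.
mean_b = df.groupby("c")["b"].mean()

print(df)
print(mean_b)
```

Grouping by c pools all runs of the same length together (here the two separate runs of length 2). If each run should be averaged on its own, grouping by run_id instead would keep them apart.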

Duplicate a selected row and put the duplicate just below in a Pandas DataFrame

I have a Pandas dataframe like this :
id A B
0 2 2
1 1 1
2 3 3
3 7 7
And I want to duplicate the first row 3 times, with the copies placed directly below it:
id A B
0 2 2
1 2 2
2 2 2
3 2 2
4 1 1
5 3 3
6 7 7
Is there a method that already exists in the Pandas library?
There is no built-in method for doing just this. However, you can build a list of repeat counts, one per row, and use df.loc with df.index.repeat:
new_df = df.loc[df.index.repeat([4] + [1] * (len(df) - 1))].reset_index(drop=True)
Output:
>>> new_df
id A B
0 0 2 2
1 0 2 2
2 0 2 2
3 0 2 2
4 1 1 1
5 2 3 3
6 3 7 7
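Use reindex and Index.repeat to create your dataframe (with a repeat count of 3 the first row appears three times in total, and the original index labels are kept):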
>>> df.reindex(df.index.repeat([3] + [1] * (len(df) - 1)))
id A B
0 0 2 2
0 0 2 2
0 0 2 2
1 1 1 1
2 2 3 3
3 3 7 7
Another way:
>>> df.loc[[df.index[0]]*3 + df.index[1:].tolist()]
id A B
0 0 2 2
0 0 2 2
0 0 2 2
1 1 1 1
2 2 3 3
3 3 7 7
A more generalized way proposed by @MuhammadHassan:
row_index = 0
repeat_time = 3
>>> df.reindex(df.index.tolist() + [row_index]*repeat_time).sort_index()
id A B
0 0 2 2
0 0 2 2
0 0 2 2
0 0 2 2
1 1 1 1
2 2 3 3
3 3 7 7
Let us try pd.concat (DataFrame.append was removed in pandas 2.0):
n=3
row = 0
df = pd.concat([df, df.loc[[row]*(n-1)]]).sort_index()
df
id A B
0 0 2 2
0 0 2 2
0 0 2 2
1 1 1 1
2 2 3 3
3 3 7 7
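The repeat-count idea from the answers above can be wrapped in a small function so that any row can be duplicated, not only the first. This is a sketch under the assumption that the index labels are unique; duplicate_row is a hypothetical helper name, not a pandas method.

```python
import pandas as pd


def duplicate_row(df, label, copies):
    """Return a new frame with `copies` extra copies of the row at index
    `label`, placed directly below it. Hypothetical helper, not part of pandas;
    assumes the index labels are unique."""
    # One repeat count per row: the chosen row is kept once plus `copies` extras.
    counts = [copies + 1 if idx == label else 1 for idx in df.index]
    return df.loc[df.index.repeat(counts)].reset_index(drop=True)


df = pd.DataFrame({"A": [2, 1, 3, 7], "B": [2, 1, 3, 7]})

# Original first row plus three duplicates, seven rows in total.
new_df = duplicate_row(df, label=0, copies=3)
print(new_df)
```

Because reset_index(drop=True) is applied at the end, the result gets a fresh 0..n-1 index, matching the layout the question asks for.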

Python: create new column conditionally on values from two other columns

I would like to combine two columns into a new column.
Let's suppose I have:
Index A B
0 1 0
1 1 0
2 1 0
3 1 0
4 1 0
5 1 2
6 1 2
7 1 2
8 1 2
9 1 2
10 1 2
Now I would like to create a column C with the entries from A from Index 0 to 4 and from column B from Index 5 to 10. It should look like this:
Index A B C
0 1 0 1
1 1 0 1
2 1 0 1
3 1 0 1
4 1 0 1
5 1 2 2
6 1 2 2
7 1 2 2
8 1 2 2
9 1 2 2
10 1 2 2
Is there Python code that does this? Thanks in advance!
If Index is an actual column, you can use numpy.where and specify your condition:
import numpy as np
df['C'] = np.where(df['Index'] <= 4, df['A'], df['B'])
Index A B C
0 0 1 0 1
1 1 1 0 1
2 2 1 0 1
3 3 1 0 1
4 4 1 0 1
5 5 1 2 2
6 6 1 2 2
7 7 1 2 2
8 8 1 2 2
9 9 1 2 2
10 10 1 2 2
If Index is instead the DataFrame's actual index, you can slice the columns by position with iloc and build the new column with concat:
df['C'] = pd.concat([df['A'].iloc[:5], df['B'].iloc[5:]])
print(df)
A B C
0 1 0 1
1 1 0 1
2 1 0 1
3 1 0 1
4 1 0 1
5 1 2 2
6 1 2 2
7 1 2 2
8 1 2 2
9 1 2 2
10 1 2 2
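When Index is the DataFrame's actual index, numpy.where can also be applied to df.index directly, which avoids hard-coding positional slices. The following is a minimal sketch that rebuilds the sample data from the question; it assumes the default integer index 0 to 10.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with the default integer index 0..10.
df = pd.DataFrame({"A": [1] * 11, "B": [0] * 5 + [2] * 6})

# Take A where the index label is at most 4, otherwise take B.
df["C"] = np.where(df.index <= 4, df["A"], df["B"])

print(df)
```

The condition is evaluated on index labels rather than positions, so it only lines up with the iloc version when the labels are the default 0, 1, 2, ... sequence.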

group consecutive equal values and count up

The df:
a b
0 1
0 3
0 3
0 1
1 1
1 2
1 4
I would like to group by a and count up the equal consecutive rows in a group:
a b c
0 1 1
0 3 2
0 3 2
0 1 3
1 1 1
1 2 2
1 4 3
I tried:
df['c'] = df.b.groupby([df.a, df.b.diff().ne(0).cumsum()])
which gave me a type error:
Length of values does not match length of index
In your case, you can use factorize:
s=df.b.diff().ne(0).cumsum().groupby(df.a).transform(lambda x : x.factorize()[0])+1
Out[276]:
0 1
1 2
2 2
3 3
4 1
5 2
6 3
Name: b, dtype: int32
df['c']=s
Or (group_keys=False keeps the original flat index on pandas 2.0 and later):
df.b.groupby(df.a, group_keys=False).apply(lambda x : x.diff().ne(0).cumsum())
Out[277]:
0 1
1 2
2 2
3 3
4 1
5 2
6 3
Name: b, dtype: int32
Another approach:
s = df.ne(df.shift()).any(axis=1).astype(int)
df['c'] = s.groupby(df['a']).cumsum()
Output:
a b c
0 0 1 1
1 0 3 2
2 0 3 2
3 0 1 3
4 1 1 1
5 1 2 2
6 1 4 3
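The last approach compares whole rows rather than only column b, and that detail matters at group boundaries. The sketch below uses the data from the question to show both variants side by side; the names changed, changed_b and c_b_only are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 0, 0, 0, 1, 1, 1], "b": [1, 3, 3, 1, 1, 2, 4]})

# 1 where a or b differs from the previous row, 0 where the row repeats it.
changed = df.ne(df.shift()).any(axis=1).astype(int)

# Running total of changes, restarted for every value of a.
df["c"] = changed.groupby(df["a"]).cumsum()

# Comparing only b misses the boundary at row 4, where a changes but b stays 1,
# so the second group would start counting at 0 instead of 1.
changed_b = df["b"].ne(df["b"].shift()).astype(int)
c_b_only = changed_b.groupby(df["a"]).cumsum()

print(df)
print(c_b_only.tolist())
```

Including a in the comparison guarantees that the first row of every group is flagged as a change, which is what makes each group's count start at 1.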
