Calculate changes of column in Pandas - python

I have a dataframe and I want to calculate how values change over time.
UserId DateTime Value
1 1 0
1 2 0
1 3 0
1 4 1
1 6 1
1 7 1
2 1 0
2 2 1
2 3 1
2 4 0
2 6 1
2 7 1
So after the script executes I want to get a column with a change identifier (per user and date). Only changes from 0 to 1 are interesting.
UserId DateTime Value IsChanged
1 1 0 0
1 2 0 0
1 3 0 0
1 4 1 1 <- Value was changed from 0 to 1
1 6 1 0
1 7 1 0
2 1 0 0
2 2 1 1 <- Value was changed from 0 to 1
2 3 1 0
2 4 0 0 <- Change from 1 to 0 not interesting
2 6 1 1 <- Value was changed from 0 to 1 for the user
2 7 1 0

What about this?
# df is your dataframe
df['IsChanged'] = (df['Value'].diff()==1).astype(int)
The only case you care about is Value being 0 before and 1 after, so you can simply calculate the change in value and check if it is equal to 1.
UserId DateTime Value IsChanged
0 1 1 0 0
1 1 2 0 0
2 1 3 0 0
3 1 4 1 1
4 1 6 1 0
5 1 7 1 0
6 2 1 0 0
7 2 2 1 1
8 2 3 1 0
9 2 4 0 0
10 2 6 1 1
11 2 7 1 0
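For completeness, a self-contained sketch of the same idea that runs diff() within each user via groupby, so a change straddling the boundary between two users can never be miscounted (on this particular data the plain diff() happens to give the same result):

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "UserId":   [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "DateTime": [1, 2, 3, 4, 6, 7, 1, 2, 3, 4, 6, 7],
    "Value":    [0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1],
})

# diff() is computed per user, so the first row of each user gets NaN
# (which compares unequal to 1) instead of a cross-user difference.
df["IsChanged"] = df.groupby("UserId")["Value"].diff().eq(1).astype(int)
print(df)
```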

Related

Add a column based on frequency for each group

I have a dataframe like this:
vehicle_id trip
0 0 0
1 0 0
2 0 0
3 0 1
4 0 1
5 1 0
6 1 0
7 1 1
8 1 1
9 1 1
10 1 1
11 1 1
12 1 2
13 2 0
14 2 1
15 2 2
I want to add a column that counts the frequency of each trip value within each vehicle_id group, and drop the rows where the frequency equals one. So after adding the column the frequency will look like this:
vehicle_id trip frequency
0 0 0 3
1 0 0 3
2 0 0 3
3 0 1 2
4 0 1 2
5 1 0 2
6 1 0 2
7 1 1 5
8 1 1 5
9 1 1 5
10 1 1 5
11 1 1 5
12 1 2 1
13 2 0 1
14 2 1 1
15 2 2 1
and the final result will be like this
vehicle_id trip frequency
0 0 0 3
1 0 0 3
2 0 0 3
3 0 1 2
4 0 1 2
5 1 0 2
6 1 0 2
7 1 1 5
8 1 1 5
9 1 1 5
10 1 1 5
11 1 1 5
what is the best solution for that? Also, what should I do if I intend to directly drop rows where the frequency is equal to 1 in each group (without adding the frequency column)?
Check the Colab here:
https://colab.research.google.com/drive/1AuBTuW7vWj1FbJzhPuE-QoLncoF5W_7W?usp=sharing
You can use df.groupby():
df["frequency"] = df.groupby(["vehicle_id", "trip"]).transform("count")
But you need to create the frequency column beforehand, so that transform has a column to count:
df["frequency"] = 0
Taking your dataframe as an example:
import pandas as pd

data = {"vehicle_id": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
        "trip":       [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1]}
df = pd.DataFrame.from_dict(data)
df["frequency"] = 0
df["frequency"] = df.groupby(["vehicle_id", "trip"]).transform("count")
Try:
df["frequency"] = (
    df.assign(frequency=0).groupby(["vehicle_id", "trip"]).transform("count")
)
print(df[df.frequency > 1])
Prints:
vehicle_id trip frequency
0 0 0 3
1 0 0 3
2 0 0 3
3 0 1 2
4 0 1 2
5 1 0 2
6 1 0 2
7 1 1 5
8 1 1 5
9 1 1 5
10 1 1 5
11 1 1 5
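On the second sub-question (dropping the rows directly, without keeping a frequency column), a sketch using transform("size"), which needs no dummy column, and GroupBy.filter:

```python
import pandas as pd

# Rebuild the full frame from the question.
df = pd.DataFrame({
    "vehicle_id": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2],
    "trip":       [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 2, 0, 1, 2],
})

# "size" counts rows per (vehicle_id, trip) group without a dummy column.
df["frequency"] = df.groupby(["vehicle_id", "trip"])["trip"].transform("size")

# Keep only groups with more than one row -- via the column...
kept = df[df["frequency"] > 1]

# ...or directly, without ever adding the column:
kept_direct = (df[["vehicle_id", "trip"]]
               .groupby(["vehicle_id", "trip"])
               .filter(lambda g: len(g) > 1))
```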

Pandas group consecutive and label the length

I want to label consecutive runs of values with the run length:
a
---
1
0
1
1
0
1
1
1
0
1
1
I want :
a | c
--------
1 1
0 0
1 2
1 2
0 0
1 3
1 3
1 3
0 0
1 2
1 2
Then I can calculate the mean of column "b" grouped by "c". I tried shift, cumsum and cumcount, but none of them worked.
Use GroupBy.transform over the consecutive groups, then set 0 wherever a is not 1:
df['c1'] = (df.groupby(df.a.ne(df.a.shift()).cumsum())['a']
              .transform('size')
              .where(df.a.eq(1), 0))
print(df)
print (df)
a b c c1
0 1 1 1 1
1 0 2 0 0
2 1 3 2 2
3 1 2 2 2
4 0 1 0 0
5 1 3 3 3
6 1 1 3 3
7 1 3 3 3
8 0 2 0 0
9 1 2 2 2
10 1 1 2 2
If there are only 0/1 values, it is also possible to multiply by a:
df['c1'] = (df.groupby(df.a.ne(df.a.shift()).cumsum())['a']
              .transform('size')
              .mul(df.a))
print (df)
a b c c1
0 1 1 1 1
1 0 2 0 0
2 1 3 2 2
3 1 2 2 2
4 0 1 0 0
5 1 3 3 3
6 1 1 3 3
7 1 3 3 3
8 0 2 0 0
9 1 2 2 2
10 1 1 2 2
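The asker's end goal, the mean of "b" grouped by "c", is then a plain groupby; a minimal sketch, with the b values taken from the answer's printed frame:

```python
import pandas as pd

# "a" and "b" as in the answer's printed frame above.
df = pd.DataFrame({
    "a": [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1],
    "b": [1, 2, 3, 2, 1, 3, 1, 3, 2, 2, 1],
})

runs = df["a"].ne(df["a"].shift()).cumsum()          # id of each consecutive run
df["c"] = (df.groupby(runs)["a"].transform("size")   # run length
             .where(df["a"].eq(1), 0))               # zero out the 0-runs

means = df.groupby("c")["b"].mean()                  # mean of b per run label
```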

From geeks for geeks Frequencies of Limited Range Array Elements problem

This is an exercise from the GfG must-do questions, but my code is not passing all the test cases. Can anyone please help me out here?
Wrong Answer.
Possibly your code doesn't work correctly for multiple test-cases (TCs).
The first test case where your code failed:
Input:
37349
27162 38945 3271 34209 37960 17314 13663 17082 37769 2714 19280 17626 34997 33512 29275 25207 4706 12532 34909 23823 272 29688 19312 8154 5091 26858 30814 19105 14105 11303 16863 1861 2961 36601 10157 114 11491 31810 29152 2627 14327 30116 14828 37781 38925 16319 10972 4506 18669 19366 28984 6948 15170 24135 6256 38121 3835 38031 9855 25152 19132 23573 29587 1719 33440 26311 12647 23022 34206 39955 3791 18555 336 7317 12033 7278 27508 5521 24935 15078 915 35478 37253 6863 39182 23429 33867.................
Its Correct output is:
2 4 1 2 5 2 0 4 1 3 1 2 1 3 2 4 4 1 1 0 2 0 4 1 3 5 1 0 1 2 1 3 2 0 1 1 2 0 0 2 1 2 2 1 4 2 0 1 2 2 0 1 2 0 2 4 4 5 2 5 2 1 5 1 2 1 0 1 1 2 2 1 3 1 2 0 3 4 1 2 0 2 3 5 2 2 1 3 1 4 0 3 5 1 1 3 1 2 2 3 2 2 4 1 1 3 1 4 3 4 0 2 1 4 4 2 2 3 3 0 0 0 4 1 2 1 2 4 1 3 1 2 4 0 2 1 1 1 0 3 4 3 2 0 3 0 0 0 1 1 0 0 2 2 3 0 1 2 2 2 0 2 3 2 1 1 3 0 1 5 1 1 1 0 2 0 3 1 2 1 1 1 2 3 3 1 1 3 1 4 1 3 1 1 1 2 2 0 1 0 2 2 0 2 2 2 1 4 1 0 3 1 2 0 3 1 2 1 8 3 0 0 1 1 1 1 2 1 1 4 1 3 0 3 2 1 1 1 1 2 4 2 2 1 4 2 1 3 1 0 .................
And Your Code's output is:
1 0 0 0 1 0 0 1 0 2 1 2 0 0 1 1 2 1 1 0 0 0 2 1 2 2 0 0 1 0 1 2 2 0 1 1 1 0 0 1 0 1 0 0 1 0 0 0 1 1 0 1 1 0 1 2 3 3 2 2 2 1 3 1 1 1 0 1 0 1 0 0 1 1 2 0 1 3 1 0 0 2 1 4 0 1 0 1 0 3 0 1 2 0 0 1 1 1 1 3 1 0 2 1 0 3 1 3 2 2 0 2 0 2 3 1 0 0 1 0 0 0 3 0 0 0 1 3 1 2 0 1 2 0 2 0 0 0 0 2 2 2 1 0 0 0 0 0 1 0 0 0 2 2 2 0 0 0 1 2 0 0 2 1 1 1 2 0 1 3 0 1 0 0 1 0 1 1 1 0 0 0 1 1 0 1 0 2 0 2 1 3 1 0 1 2 0 0 1 0 1 1 0 1 2 1 0 3 0 0 1 0 1 0 2 0 2 1 4 2 0 0 1 0 0 1 2 0 0 1 0 2 0 2 2 1 0 0 0 0 2 0 1 0 2 1 0 2 0 0 .................
Given an array A[] of n positive integers which can contain integers from 1 to n where elements can be repeated or can be absent from the array. Your task is to count the frequency of all elements from 1 to n.
Input:
n = 5
A[] = {2,3,2,3,5}
Output:
0 2 2 0 1
Explanation:
Counting frequencies of each array element
We have:
1 occurring 0 times.
2 occurring 2 times.
3 occurring 2 times.
4 occurring 0 times.
5 occurring 1 time.
problem link :
https://practice.geeksforgeeks.org/problems/frequency-of-array-elements-1587115620/1
class Solution:
    # Function to count the frequency of all elements from 1 to N in the array.
    def frequencycount(self, A, N):
        s = {}
        for i in A:
            if i in s:
                s[i] += 1
            else:
                s[i] = 1
        for i in range(1, len(A) + 1):
            if i in s:
                A[i - 1] = s[i]
            else:
                A[i - 1] = 0
        return A
#{
# Driver Code Starts
# Initial Template for Python 3
import math

if __name__ == "__main__":
    T = int(input())
    while T > 0:
        N = int(input())
        A = [int(x) for x in input().strip().split()]
        ob = Solution()
        ob.frequencycount(A, N)
        for i in range(len(A)):
            print(A[i], end=" ")
        print()
        T -= 1
# } Driver Code Ends
Suppose you have a list:
l = [0, 2, 5, 6, 0, 5, 3]
Then you can do:
res = [l.count(e) for e in range(1, max(l) + 1)]
This gives a list with the number of times each value e from 1 to max(l) occurs in the original list (beware that l.count inside a comprehension is O(n²), which can be too slow for large inputs).
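A linear-time sketch of the same counting, using collections.Counter (the function name here is ours, not GfG's template):

```python
from collections import Counter

def frequency_count(a, n):
    """Count how often each of the values 1..n occurs in a, in O(n) time."""
    counts = Counter(a)                      # one pass over the input
    return [counts.get(i, 0) for i in range(1, n + 1)]

print(frequency_count([2, 3, 2, 3, 5], 5))   # [0, 2, 2, 0, 1]
```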

Cumulative sum problem considering data till last record with multiple IDs

I have a dataset with multiple IDs and dates where I have created a column for Cumulative supply in python.
My data is as follows
SKU Date Demand Supply Cum_Supply
1 20160207 6 2 2
1 20160214 5 0 2
1 20160221 1 0 2
1 20160228 6 0 2
1 20160306 1 0 2
1 20160313 101 0 2
1 20160320 1 0 2
1 20160327 1 0 2
2 20160207 0 0 0
2 20160214 0 0 0
2 20160221 2 0 0
2 20160228 2 0 0
2 20160306 2 0 0
2 20160313 1 0 0
2 20160320 1 0 0
2 20160327 1 0 0
Where Cum_supply was calculated by
idx = pd.MultiIndex.from_product([np.unique(data.Date), data.SKU.unique()])
data2 = data.set_index(['Date', 'SKU']).reindex(idx).fillna(0)
data2 = pd.concat([data2, data2.groupby(level=1).cumsum().add_prefix('Cum_')],1).sort_index(level=1).reset_index()
I want to create a column 'True_Demand': the maximum unfulfilled demand seen up to that date, i.e. the running max(Demand - Supply), plus Cum_Supply.
So my output would be something this:
SKU Date Demand Supply Cum_Supply True_Demand
1 20160207 6 2 2 6
1 20160214 5 0 2 7
1 20160221 1 0 2 7
1 20160228 6 0 2 8
1 20160306 1 0 2 8
1 20160313 101 0 2 103
1 20160320 1 0 2 103
1 20160327 1 0 2 103
2 20160207 0 0 0 0
2 20160214 0 0 0 0
2 20160221 2 0 0 2
2 20160228 2 0 0 2
2 20160306 2 0 0 2
2 20160313 1 0 0 2
2 20160320 1 0 0 2
2 20160327 1 0 0 2
So for the 3rd record (20160221), the maximum unfulfilled demand before 20160221 was 5, so the true demand is 5 + 2 = 7, even though the unfulfilled demand on that date itself was only 1 + 2.
Code for the dataframe
data = pd.DataFrame({'SKU':    [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2],
                     'Date':   [20160207, 20160214, 20160221, 20160228,
                                20160306, 20160313, 20160320, 20160327,
                                20160207, 20160214, 20160221, 20160228,
                                20160306, 20160313, 20160320, 20160327],
                     'Demand': [6, 5, 1, 6, 1, 101, 1, 1, 0, 0, 2, 2, 2, 1, 1, 1],
                     'Supply': [2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]},
                    columns=['Date', 'SKU', 'Demand', 'Supply'])
Would you try this pretty fun one-liner?
(data.groupby('SKU', as_index=False, group_keys=False)
     .apply(lambda x: x.assign(Cum_Supply=x.Supply.cumsum())
                       .pipe(lambda x: x.assign(
                           True_Demand=(x.Demand - x.Supply + x.Cum_Supply).cummax()))))
Output:
Date SKU Demand Supply Cum_Supply True_Demand
0 20160207 1 6 2 2 6
1 20160214 1 5 0 2 7
2 20160221 1 1 0 2 7
3 20160228 1 6 0 2 8
4 20160306 1 1 0 2 8
5 20160313 1 101 0 2 103
6 20160320 1 1 0 2 103
7 20160327 1 1 0 2 103
8 20160207 2 0 0 0 0
9 20160214 2 0 0 0 0
10 20160221 2 2 0 0 2
11 20160228 2 2 0 0 2
12 20160306 2 2 0 0 2
13 20160313 2 1 0 0 2
14 20160320 2 1 0 0 2
15 20160327 2 1 0 0 2
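A sketch of the same computation without apply, as two vectorised group operations (equivalent to the one-liner above on this data):

```python
import pandas as pd

# Rebuild the frame from the question.
data = pd.DataFrame({
    'SKU':    [1] * 8 + [2] * 8,
    'Date':   [20160207, 20160214, 20160221, 20160228,
               20160306, 20160313, 20160320, 20160327] * 2,
    'Demand': [6, 5, 1, 6, 1, 101, 1, 1, 0, 0, 2, 2, 2, 1, 1, 1],
    'Supply': [2, 0, 0, 0, 0, 0, 0, 0] + [0] * 8,
})

# Running supply per SKU.
data['Cum_Supply'] = data.groupby('SKU')['Supply'].cumsum()

# Largest (Demand - Supply + Cum_Supply) seen so far within each SKU.
data['True_Demand'] = ((data['Demand'] - data['Supply'] + data['Cum_Supply'])
                       .groupby(data['SKU']).cummax())
```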

Dataframe from all possible combinations of values of given categories

I have
{"A":[0,1], "B":[4,5], "C":[0,1], "D":[0,1]}
What I want is:
A B C D
0 4 0 0
0 4 0 1
0 4 1 0
0 4 1 1
1 4 0 1
...and so on. Basically all the combinations of values for each of the categories.
What would be the best way to achieve this?
If x is your dict:
>>> pandas.DataFrame(list(itertools.product(*x.values())), columns=x.keys())
A C B D
0 0 0 4 0
1 0 0 4 1
2 0 0 5 0
3 0 0 5 1
4 0 1 4 0
5 0 1 4 1
6 0 1 5 0
7 0 1 5 1
8 1 0 4 0
9 1 0 4 1
10 1 0 5 0
11 1 0 5 1
12 1 1 4 0
13 1 1 4 1
14 1 1 5 0
15 1 1 5 1
If you want the columns in a particular order you'll need to reorder them afterwards (with, e.g., df[["A", "B", "C", "D"]]). On Python 3.7+, dicts preserve insertion order, so the columns already come out in the order the dict was written.
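An alternative sketch that builds the cartesian product with pd.MultiIndex.from_product, keeping the dict's column order without any reshuffle:

```python
import pandas as pd

d = {"A": [0, 1], "B": [4, 5], "C": [0, 1], "D": [0, 1]}

# from_product forms the cartesian product of the value lists; the dict's
# insertion order gives the column order, so no reordering is needed.
df = (pd.MultiIndex.from_product(list(d.values()), names=list(d.keys()))
        .to_frame(index=False))
```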
