I have a dataframe as below:
Datetime Data Fn
0 18747.385417 11275.0 0
1 18747.388889 8872.0 1
2 18747.392361 7050.0 0
3 18747.395833 8240.0 1
4 18747.399306 5158.0 1
5 18747.402778 3926.0 0
6 18747.406250 4043.0 0
7 18747.409722 2752.0 1
8 18747.420139 3502.0 1
9 18747.423611 4026.0 1
I want to calculate, for each row, the rolling sum of the last 3 values of Fn and put the result in a new column Sum.
My expected result is this:
Datetime Data Fn Sum
0 18747.385417 11275.0 0 0
1 18747.388889 8872.0 1 0
2 18747.392361 7050.0 0 1
3 18747.395833 8240.0 1 2
4 18747.399306 5158.0 1 2
5 18747.402778 3926.0 0 2
6 18747.406250 4043.0 0 1
7 18747.409722 2752.0 1 1
8 18747.420139 3502.0 1 2
9 18747.423611 4026.0 1 3
df['Sum'] = df.Fn.rolling(3).sum().fillna(0)
Output:
Datetime Data Fn Sum
0 18747.385417 11275.0 0 0.0
1 18747.388889 8872.0 1 0.0
2 18747.392361 7050.0 0 1.0
3 18747.395833 8240.0 1 2.0
4 18747.399306 5158.0 1 2.0
5 18747.402778 3926.0 0 2.0
6 18747.406250 4043.0 0 1.0
7 18747.409722 2752.0 1 1.0
8 18747.420139 3502.0 1 2.0
9 18747.423611 4026.0 1 3.0
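Note that the Sum column comes out as float, because rolling(3).sum() produces NaN for the first two rows before fillna(0). If you want the integer values shown in the expected output, casting afterwards should work (a small variation on the line above):

df['Sum'] = df.Fn.rolling(3).sum().fillna(0).astype(int)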
I have a dataframe
Group Score Rank
1 0 3
1 4 1
1 2 2
2 3 2
2 1 3
2 7 1
I have to take the difference between the score at each rank and the score at the next rank within each group. For example, in group 1, rank(1) - rank(2) = 4 - 2 = 2.
Expected output:
Group Score Rank Difference
1 0 3 0
1 4 1 2
1 2 2 2
2 3 2 2
2 1 3 0
2 7 1 4
you can try:
df = df.sort_values(['Group', 'Rank'], ascending=[True, False])
df['Difference'] = df.groupby('Group')['Score'].diff().fillna(0).astype(int)
OUTPUT:
Group Score Rank Difference
0 1 0 3 0
2 1 2 2 2
1 1 4 1 2
4 2 1 3 0
3 2 3 2 2
5 2 7 1 4
NOTE: The result is sorted by Rank in descending order within each Group.
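If you need the dataframe back in its original row order afterwards, sorting on the index should restore it (assuming the default RangeIndex shown in the OUTPUT above):

df = df.sort_index()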
I think you can create a new column holding the score at the next rank by using shift() and then calculate the difference. See the code below:
# Sort the dataframe
df = df.sort_values(['Group','Rank']).reset_index(drop=True)
# Shift up values by one row within a group
df['Score_next'] = df.groupby('Group')['Score'].shift(-1).fillna(0)
# Calculate the difference
df['Difference'] = df['Score'] - df['Score_next']
Here is the result:
print(df)
Group Score Rank Score_next Difference
0 1 4 1 2.0 2.0
1 1 2 2 0.0 2.0
2 1 0 3 0.0 0.0
3 2 7 1 3.0 4.0
4 2 3 2 1.0 2.0
5 2 1 3 0.0 1.0
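The helper column can also be skipped: on the frame sorted by Group and Rank, a grouped diff with a negative period computes each score minus the next one directly, and filling the trailing NaN in each group with 0 matches the Difference values in the expected output (a sketch of the same idea, not a separate method):

df['Difference'] = df.groupby('Group')['Score'].diff(-1).fillna(0).astype(int)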
In the given dataframe, I am trying to perform a row-wise replace operation where a 1 in columns A, B, or C should be replaced by the value from the Values column.
Input:
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 2, 3, 3, 4, 5, 6, 7],
                   'A': [0, 1, 0, 1, 0, 0, 1, 0, np.nan, 0],
                   'B': [0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
                   'C': [1, 0, 1, 0, 0, 0, 0, 0, 1, 1],
                   'Values': [10, 2, 3, 4, 9, 3, 4, 5, 2, 3]})
Expected Output:
ID A B C Values
0 1 0.0 0 10 10
1 1 2.0 0 0 2
2 1 0.0 0 3 3
3 2 4.0 0 0 4
4 3 0.0 9 0 9
5 3 0.0 3 0 3
6 4 4.0 0 0 4
7 5 0.0 0 0 5
8 6 NaN 0 2 2
9 7 0.0 0 3 3
Note: The data is very large.
Use df.where
df[['A','B','C']] = df[['A','B','C']].where(df[['A','B','C']].ne(1), df['Values'], axis=0)
ID A B C Values
0 1 0.0 0 10 10
1 1 2.0 0 0 2
2 1 0.0 0 3 3
3 2 4.0 0 0 4
4 3 0.0 9 0 9
5 3 0.0 3 0 3
6 4 4.0 0 0 4
7 5 0.0 0 0 5
8 6 NaN 0 2 2
9 7 0.0 0 3 3
Or
df[['A','B','C']] = df[['A','B','C']].mask(df[['A','B','C']].eq(1), df['Values'], axis=0)
My data is really large and it is very slow.
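If speed is the issue, the same replacement can be done at the NumPy level, which avoids the alignment overhead of where/mask (a sketch, assuming A, B and C are numeric columns):

import numpy as np

cols = ['A', 'B', 'C']
arr = df[cols].to_numpy()
# take Values where the cell equals 1, otherwise keep the original cell
df[cols] = np.where(arr == 1, df['Values'].to_numpy()[:, None], arr)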
If we exploit the nature of your dataset (the A, B, C columns only contain 1s, 0s, or NaNs), you simply have to multiply df['Values'] with each column independently. This should be very fast because it is vectorized.
df['A'] = df['A']*df['Values']
df['B'] = df['B']*df['Values']
df['C'] = df['C']*df['Values']
print(df)
ID A B C Values
0 1 0.0 0 10 10
1 1 2.0 0 0 2
2 1 0.0 0 3 3
3 2 4.0 0 0 4
4 3 0.0 9 0 9
5 3 0.0 3 0 3
6 4 4.0 0 0 4
7 5 0.0 0 0 5
8 6 NaN 0 2 2
9 7 0.0 0 3 3
If you want to explicitly check the condition that the values of A, B, C are 1 (maybe because those columns could hold values other than NaNs or 0s), then you can use this:
df[['A','B','C']] = (df[['A','B','C']] == 1) * df[['Values']].values
This will replace the values in columns A, B, C of the original dataframe, but it also replaces NaNs with 0.
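The three per-column assignments can also be collapsed into a single vectorized call that multiplies all three columns by Values at once, with the same caveat that A, B, C must only contain 0s, 1s, or NaNs (a sketch using DataFrame.mul):

df[['A', 'B', 'C']] = df[['A', 'B', 'C']].mul(df['Values'], axis=0)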
My df looks like
a b c
0 1 nan
0 2 3
0 3 4
1 1 nan
I need an itertools.product()-like combination of the entries in the rows within each group of 'a'. Since the second row has 2 different values, there is more than one possible combination:
a b
1 0 1
0 2
0 3
2 0 1
0 3
0 3
3 0 1
0 2
0 4
4 0 1
0 3
0 4
5 1 1
Any ideas?
In your case
df = pd.concat([y.dropna(axis=1, thresh=1).ffill(axis=1).melt('a') for x, y in df.groupby('a')])
a variable value
0 0.0 b 1.0
1 0.0 b 2.0
2 0.0 b 3.0
3 0.0 c 1.0
4 0.0 c 3.0
5 0.0 c 3.0
0 1.0 b 1.0
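If the one-liner is hard to follow, here is the same logic written out step by step (a sketch of the same approach; parts is just an illustrative name):

parts = []
for _, g in df.groupby('a'):
    g = g.dropna(axis=1, thresh=1)  # drop columns that are entirely NaN within this group
    g = g.ffill(axis=1)             # fill remaining NaNs from the column to the left
    parts.append(g.melt('a'))       # reshape to long format, keeping 'a' as the id column
df = pd.concat(parts)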
I have a dataframe as below:
distance_along_path ID
0 0 1
1 2.2 1
2 4.5 1
3 7.0 1
4 0 2
5 0 3
6 3.0 2
7 5.0 3
8 0 4
9 2.0 4
10 5.0 4
11 0 5
12 3.0 5
11 7.0 4
I want to be able to group these by ID first and then by the distance_along_path values: every time a 0 is seen in distance_along_path for an ID, a new group is created, and until the next 0 all of those rows fall under that group, as indicated below
distance_along_path ID group
0 0 1 1
1 2.2 1 1
2 4.5 1 1
3 7.0 1 1
4 0 1 2
5 0 2 3
6 3.0 1 2
7 5.0 2 3
8 0 2 4
9 2.0 2 4
10 5.0 2 4
11 0 1 5
12 3.0 1 5
13 7.0 1 5
14 0 1 6
15 0 2 7
16 3.0 1 6
17 5.0 2 7
18 1.0 2 7
Thank you
Try the following: build a per-ID counter that increases at every 0 in distance_along_path, then number the (ID, counter) pairs in order of appearance:
zero_run = df.groupby('ID')['distance_along_path'].transform(lambda s: s.eq(0).cumsum())
df['group'] = df.groupby(['ID', zero_run], sort=False).ngroup() + 1
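Here zero_run increases by 1 at every 0 within an ID, and ngroup() with sort=False numbers each (ID, zero_run) pair in the order it first appears, which reproduces the group column in the expected output above (zero_run is just an illustrative helper name). You can inspect the intermediate counter with:

print(df.assign(zero_run=zero_run)[['distance_along_path', 'ID', 'zero_run', 'group']])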