I have time series data with a column that accumulates 2-day sums. I want to take the last value in each 2-day period and write it into a new column, per user id.
The data looks like (with desired output column 'new'):
df
timestamp         uid  cols  new
2020-10-10 00:00  1    10
2020-10-10 00:00  2    5
2020-10-10 00:10  1    20
2020-10-10 00:10  2    20
2020-10-10 00:20  1    40
...
2020-10-11 23:50  1    3400
2020-10-11 23:50  2    5250
2020-10-12 00:00  1    20    3400
2020-10-12 00:00  2    15    5250
How can I achieve this?
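A minimal sketch of one approach, assuming the frame is sorted by timestamp and that the 2-day periods are anchored at the earliest date in the data (here 2020-10-10): bin each row into its 2-day period, take the last cols value per uid and period, shift it forward one period, and merge it back. Note this stamps the value on every row of the following period; mask all but the first rows per period if you only want it there.
import pandas as pd

df['timestamp'] = pd.to_datetime(df['timestamp'])

# anchor 2-day bins at the earliest day in the data (an assumption;
# adjust 'origin' if your periods start elsewhere)
origin = df['timestamp'].min().normalize()
two_days = pd.Timedelta('2D')
df['bin'] = origin + (df['timestamp'] - origin) // two_days * two_days

# last 'cols' value per (uid, 2-day period), shifted forward one period
last = df.groupby(['uid', 'bin'])['cols'].last().rename('new').reset_index()
last['bin'] += two_days

# each row now sees the final value of the previous 2-day period
df = df.merge(last, on=['uid', 'bin'], how='left').drop(columns='bin')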
I have two dataframes, df_rates and df_profit, as shown below. df_rates has a date-time value as its column name, its values are rates, and its index gives the minutes before that date-time (i.e. row 1 denotes 0 mins before 2012-03-31 23:45:00, row 2 denotes 5 mins before 2012-03-31 23:45:00, and so on). df_profit has timestamps as its index and a Profit column.
I want to add the Profit column from df_profit as a new column to df_rates under the following condition:
If the column name of df_rates is '2012-03-31 23:45:00', then find the timestamp 30 mins before it, i.e. '2012-03-31 23:15:00', in the index of df_profit, and populate the new column with the corresponding profit value (-21.48) at the row where 'Mins before time' is 0.
The next value in line from the Profit column of df_profit (-8.538, at timestamp '2012-03-31 23:00:00') should be populated in the new column against the row where 'Mins before time' is 15, and so on.
With some help I implemented the code below, but it grabs and populates the value from the exactly matching timestamp in the df_profit index. I am unsure how to grab the value from the df_profit index that is 30 mins before the column name of df_rates. Could someone kindly help?
df_rates
Mins before time  2012-03-31 23:45:00
0                 113.1
5                 112.1
10                113.1
15                113.17
20                103.17
25                133.17
30                101.39
df_profit
                     Profit
2012-04-01 00:30:00  251.71
2012-04-01 00:15:00  652.782
2012-04-01 00:00:00  458.099
2012-03-31 23:45:00  3504.664
2012-03-31 23:30:00  1215.76
2012-03-31 23:15:00  -21.48
2012-03-31 23:00:00  -8.538
2012-03-31 22:45:00  -5.11
Expected dataframe:
Mins before time  2012-03-31 23:45:00  New_column
0                 113.1                -21.48
5                 112.1
10                113.1
15                113.17               -8.538
20                103.17
25                133.17
30                101.39               -5.11
Implemented code:
df_rates['New column'] = df_profit.Profit.reindex(
    pd.to_datetime(df_rates.columns[-1])
    - pd.to_timedelta(df_rates['Mins before time'], unit='min')
).to_numpy()
Like this:
# parse the column label (it may be a string) into a Timestamp
anchor_time = pd.to_datetime(df_rates.columns[-1])
lookback_minutes = 30
df_rates = (
    df_rates
    .set_index(anchor_time - pd.to_timedelta(df_rates['Mins before time'] + lookback_minutes, unit='min'))
    .join(df_profit)  # align on df_profit's timestamp index
    .reset_index(drop=True)
    .rename(columns={'Profit': 'New_column'})
)
Output:
   Mins before time  2012-03-31 23:45:00  New_column
0                 0               113.10     -21.480
1                 5               112.10         NaN
2                10               113.10         NaN
3                15               113.17      -8.538
4                20               103.17         NaN
5                25               133.17         NaN
6                30               101.39      -5.110
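Only rows whose shifted timestamp exactly matches an entry in df_profit's index get a value: the 5, 10, 20 and 25 minute rows map to 23:10, 23:05, 22:55 and 22:50, which are absent from df_profit, hence the NaNs.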
I have data, and I want to add a column that shows the moving average of the val column within each day.
df
timestamp         val  val_mean
2022-10-10 00:00  10   10
2022-10-10 00:01  20   15
...
2022-10-10 23:59  50   23
2022-10-11 00:00  80   80
How can I achieve this?
Looks like you want a grouped, expanding mean:
# normalize timestamps to midnight so rows group by calendar day
group = pd.to_datetime(df['timestamp']).dt.normalize()
df['val_mean'] = df.groupby(group)['val'].expanding().mean().droplevel(0)
output:
timestamp val val_mean
0 2022-10-10 00:00 10 10.000000
1 2022-10-10 00:01 20 15.000000
2 2022-10-10 23:59 50 26.666667
3 2022-10-11 00:00 80 80.000000
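An equivalent formulation uses transform with a lambda, which aligns the result on the original index directly:
df['val_mean'] = df.groupby(group)['val'].transform(lambda s: s.expanding().mean())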
Below is an example of a dataframe sampled at quarter-hour intervals. I wish to resample it to one-minute frequency without any aggregation.
Input dataframe:
Date (CET)        Price
2020-01-01 11:00  50
2020-01-01 11:15  60
2020-01-01 11:15  100
The output I want is this:
Date (CET)        Price
2020-01-01 11:00  50
2020-01-01 11:01  50
2020-01-01 11:02  50
2020-01-01 11:03  50
2020-01-01 11:04  50
2020-01-01 11:05  50
2020-01-01 11:06  50
2020-01-01 11:07  50
2020-01-01 11:08  50
2020-01-01 11:09  50
2020-01-01 11:10  50
2020-01-01 11:11  50
2020-01-01 11:12  50
2020-01-01 11:13  50
2020-01-01 11:14  50
2020-01-01 11:15  60
I tried using df.resample, but it requires me to aggregate with mean() or sum(), which I don't want. I want the values to remain the same within a particular quarter-hour, as in the output table where the price stays at 50 from 11:00 to 11:14.
Use:
# convert the column to datetime
df['Date (CET)'] = pd.to_datetime(df['Date (CET)'])
# remove duplicate timestamps (keeps the first price at 11:15)
df = df.drop_duplicates('Date (CET)')
df = df.set_index('Date (CET)')
# upsample to minutes, forward-filling values
df.resample('Min').ffill()
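As a side note, once the frame has a unique DatetimeIndex, asfreq does the same upsample-and-fill in one call:
df.asfreq('min', method='ffill')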
I have a pandas dataframe with a minute datetime index:
Index                Col1
2022-12-25 09:01:00  5
2022-12-25 09:10:00  15
2022-12-25 11:12:00  10
2022-12-26 10:05:00  2
2022-12-26 12:29:00  2
2022-12-26 13:56:00  5
I want to remove the daily average from this data (the daily means are 10 for the first day and 3 for the second), resulting in this dataframe:
Index                Col1
2022-12-25 09:01:00  -5
2022-12-25 09:10:00  5
2022-12-25 11:12:00  0
2022-12-26 10:05:00  -1
2022-12-26 12:29:00  -1
2022-12-26 13:56:00  2
Assuming df is your dataframe, this should do the trick:
# group by calendar date; df.index.day would conflate the same
# day-of-month across different months
for day, df_group in df.groupby(by=df.index.date):
    df.loc[df_group.index, "Col1"] -= df_group["Col1"].mean()
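A loop-free equivalent broadcasts each day's mean back onto its rows with transform:
df['Col1'] -= df.groupby(df.index.date)['Col1'].transform('mean')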
I have a .csv file with some data. There is only one column in this file, which contains timestamps. I need to organize that data into bins of 30 minutes. This is what my data looks like:
Timestamp
04/01/2019 11:03
05/01/2019 16:30
06/01/2019 13:19
08/01/2019 13:53
09/01/2019 13:43
So in this case, the last two data points would be grouped together in the bin that includes all the data from 13:30 to 14:00.
This is what I have already tried:
df = pd.read_csv('book.csv')
df['Timestamp'] = pd.to_datetime(df.Timestamp)
df.groupby(pd.Grouper(key='Timestamp', freq='30min')).count().dropna()
I am getting around 7000 rows showing all hours for all days with the count next to them, like this:
2019-09-01 03:00:00 0
2019-09-01 03:30:00 0
2019-09-01 04:00:00 0
...
I want to create bins for only the hours that I have in my dataset. I want to see something like this:
Time      Count
11:00:00  1
13:00:00  1
13:30:00  2    (two data points fall in this interval)
16:30:00  1
Thanks in advance!
Use groupby.size:
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = (df.Timestamp.dt.floor('30min').dt.time.to_frame()
        .groupby('Timestamp').size()
        .reset_index(name='Count'))
Or, per the suggestion by jpp:
df = df.Timestamp.dt.floor('30min').dt.time.value_counts().reset_index(name='Count')
print(df)
Timestamp Count
0 11:00:00 1
1 13:00:00 1
2 13:30:00 2
3 16:30:00 1
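One caveat with the value_counts variant: it orders by frequency rather than by time, so add sort_index to get the chronological order shown above:
df = df.Timestamp.dt.floor('30min').dt.time.value_counts().sort_index().reset_index(name='Count')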