I would like to get the number of hours that fall between 8 p.m. and 9 a.m., based on a start_time and an end_time.
It is like computing the difference in hours between two dates, but counting only the hours between 8 p.m. and 9 a.m.
For example:
start_date, end_date, result
2020-10-25 10:35:30, 2020-10-25 23:35:32, 3 hours and 35 minutes (start counting at 8 p.m.)
2020-10-25 08:14:37, 2020-10-25 20:14:37, approx. 1 hour (from 8:14 a.m. to 9 a.m. + from 8 p.m. to 8:14 p.m.)
2020-10-25 00:00:00, 2020-10-26 00:00:00, 13 hours (midnight to 9 a.m. + 8 p.m. to midnight = 9 + 4)
Thank you
The DataFrame contains a date column, a revenue column (for that specific date) and the name of the day.
This is the code for creating the df:
df = pd.DataFrame({'Date':['2015-01-08','2015-01-09','2015-01-10','2015-02-10','2015-08-09','2015-08-13','2015-11-09','2015-11-15'],
                   'Revenue':[15,4,15,13,16,20,12,9],
                   'Weekday':['Monday','Tuesday','Wednesday','Monday','Friday','Saturday','Monday','Sunday']})
I want to find the sum of revenue between Mondays:
2015-02-10 34 Monday
2015-11-09 49 Monday etc.
The first idea is to build the groups from the Weekday column: compare it with 'Monday', take the cumulative sum so that each Monday starts a new group, and aggregate per group:
df1 = (df.groupby(df['Weekday'].eq('Monday').cumsum())
.agg({'Date':'first','Revenue':'sum', 'Weekday':'first'}))
print (df1)
Date Revenue Weekday
Weekday
1 2015-01-08 34 Monday
2 2015-02-10 49 Monday
3 2015-11-09 21 Monday
However, the Weekday column does not match the actual dates in the sample data, so DataFrame.resample with weekly bins anchored on Mondays (W-Mon, where each bin ends on a Monday and is labelled by it) returns a different output:
df['Date'] = pd.to_datetime(df['Date'])
df2 = df.resample('W-Mon', on='Date').agg({'Revenue':'sum', 'Weekday':'first'}).dropna()
print (df2)
Revenue Weekday
Date
2015-01-12 34 Monday
2015-02-16 13 Monday
2015-08-10 16 Friday
2015-08-17 20 Saturday
2015-11-09 12 Monday
2015-11-16 9 Sunday
First convert your Date column from string to datetime type:
df.Date = pd.to_datetime(df.Date)
Then generate the result:
result = df.groupby(pd.Grouper(key='Date', freq='W-MON', label='left')).Revenue.sum()\
    .reset_index()
This result does not contain the day of the week, which in my opinion is fine,
as the labels are all Mondays.
If you want to see only the weeks with a non-zero result, you can filter it:
result[result.Revenue != 0]
For your source data the result is:
Date Revenue
0 2015-01-05 34
5 2015-02-09 13
30 2015-08-03 16
31 2015-08-10 20
43 2015-11-02 12
44 2015-11-09 9
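One detail worth checking with either weekly approach is which week a Monday row lands in. With freq='W-MON' the bins are closed on the right by default, so a Monday belongs to the week that ends on it. The following is a minimal sketch, using a small made-up frame rather than the question's data, of how closed='left' together with label='left' gives weeks that start on Monday and include it:

```python
import pandas as pd

# Hypothetical sample: 2015-01-05 and 2015-01-12 are Mondays.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-05", "2015-01-08", "2015-01-12"]),
    "Revenue": [1, 2, 4],
})

# Bins are [Monday, next Monday) and are labelled by their starting Monday.
weekly = (
    df.groupby(pd.Grouper(key="Date", freq="W-MON", closed="left", label="left"))["Revenue"]
      .sum()
)
print(weekly)
```

Here the first Monday and the following Thursday are summed together, and the second Monday opens a new week.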
My company uses a 4-4-5 calendar for reporting purposes. Each month (aka period) is 4-weeks long, except every 3rd month is 5-weeks long.
Pandas seems to have good support for custom calendar periods. However, I'm having trouble figuring out the correct frequency string or custom business month offset to achieve months for a 4-4-5 calendar.
For example:
df_index = pd.date_range("2020-03-29", "2021-03-27", freq="D", name="date")
df = pd.DataFrame(
index=df_index, columns=["a"], data=np.random.randint(0, 100, size=len(df_index))
)
df.groupby(pd.Grouper(level=0, freq="4W-SUN")).mean()
Grouping by 4-weeks starting on Sunday results in the following. The first three month start dates are correct but I need every third month to be 5-weeks long. The 4th month start date should be 2020-06-28.
a
date
2020-03-29 16.000000
2020-04-26 50.250000
2020-05-24 39.071429
2020-06-21 52.464286
2020-07-19 41.535714
2020-08-16 46.178571
2020-09-13 51.857143
2020-10-11 44.250000
2020-11-08 47.714286
2020-12-06 56.892857
2021-01-03 55.821429
2021-01-31 53.464286
2021-02-28 53.607143
2021-03-28 45.037037
Essentially what I'd like to achieve is something like this:
a
date
2020-03-29 20.000000
2020-04-26 50.750000
2020-05-24 49.750000
2020-06-28 49.964286
2020-07-26 52.214286
2020-08-23 47.714286
2020-09-27 46.250000
2020-10-25 53.357143
2020-11-22 52.035714
2020-12-27 39.750000
2021-01-24 43.428571
2021-02-21 49.392857
Pandas currently supports only yearly and quarterly offsets for the 52-53 week fiscal year (aka the 4-4-5 calendar).
See pandas.tseries.offsets.FY5253 and pandas.tseries.offsets.FY5253Quarter.
df_index = pd.date_range("2020-03-29", "2021-03-27", freq="D", name="date")
df = pd.DataFrame(index=df_index)
df['a'] = np.random.randint(0, 100, df.shape[0])
So you do need some more work to get to the week level while maintaining a 4-4-5 calendar. You could align to quarters using the native pandas offset and fill in the 4-4-5 week pattern manually.
def date_range(start, end, offset_array, name=None):
    start = pd.to_datetime(start)
    end = pd.to_datetime(end)
    index = []
    start -= offset_array[0]
    while(start<end):
        for x in offset_array:
            start += x
            if start > end:
                break
            index.append(start)
    return pd.Series(index, name=name)
This function takes a list of offsets rather than a regular frequency period, so it lets you move from date to date following the offsets in the given array:
offset_445 = [
pd.tseries.offsets.FY5253Quarter(weekday=6),
4*pd.tseries.offsets.Week(weekday=6),
4*pd.tseries.offsets.Week(weekday=6),
]
df_index_445 = date_range("2020-03-29", "2021-03-27", offset_445, name='date')
Out:
0 2020-05-03
1 2020-05-31
2 2020-06-28
3 2020-08-02
4 2020-08-30
5 2020-09-27
6 2020-11-01
7 2020-11-29
8 2020-12-27
9 2021-01-31
10 2021-02-28
Name: date, dtype: datetime64[ns]
Once the index is created, it is back to aggregation logic to get the data into the right row buckets. Assuming that you want the mean labelled by the start of each 4- or 5-week period, according to the df_index_445 you have generated, it could look like this:
# calculate the mean on reindex groups
reindex = df_index_445.searchsorted(df.index, side='right') - 1
res = df.groupby(reindex).mean()
# filter valid output
res = res[res.index>=0]
res.index = df_index_445
Out:
a
2020-05-03 47.857143
2020-05-31 53.071429
2020-06-28 49.257143
2020-08-02 40.142857
2020-08-30 47.250000
2020-09-27 52.485714
2020-11-01 48.285714
2020-11-29 56.178571
2020-12-27 51.428571
2021-01-31 50.464286
2021-02-28 53.642857
Note that since the frequency is not regular, pandas will set the datetime index frequency to None.
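As an alternative sketch (not the method used above), the period starts listed in the question's desired output can be reproduced by accumulating a repeating 4, 4, 5 week pattern from a known fiscal-year start. The helper name period_starts_445, its parameters, and the deterministic column values are assumptions made for illustration; the fiscal start date 2020-03-29 is taken from the question.

```python
import numpy as np
import pandas as pd

def period_starts_445(fiscal_start, n_periods=12, pattern=(4, 4, 5)):
    """Start dates of n_periods consecutive periods following the week pattern."""
    weeks = np.resize(pattern, n_periods)                    # 4, 4, 5, 4, 4, 5, ...
    offsets = np.concatenate(([0], np.cumsum(weeks)[:-1]))   # weeks elapsed before each period
    return pd.DatetimeIndex(
        pd.Timestamp(fiscal_start) + pd.to_timedelta(offsets * 7, unit="D"),
        name="date",
    )

starts = period_starts_445("2020-03-29")

df_index = pd.date_range("2020-03-29", "2021-03-27", freq="D", name="date")
df = pd.DataFrame({"a": np.arange(len(df_index))}, index=df_index)

# For each day, find the last period start that is not after it.
bucket = np.searchsorted(starts.values, df.index.values, side="right") - 1
res = df.assign(period_start=starts[bucket]).groupby("period_start")["a"].mean()
print(res)
```

This relies on the caller knowing where the fiscal year begins and does not handle 53-week years, which is the part that FY5253 and FY5253Quarter take care of.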
I’m trying to look at some sales data for a small store. I have a timestamp of when the settlement was made, but sometimes it’s done before midnight and sometimes it’s done after midnight.
This is giving me data correct for some days and incorrect for others, as anything after midnight should be for the day before. I couldn’t find the correct pandas documentation for what I’m looking for.
Is there an if/else solution to create a new column by looping through the NEW_TIMESTAMP column and applying a custom cutoff (if after midnight but before 3 p.m., set the day before; else keep the day)? Every time I write something, it either runs forever or crashes Jupyter.
Data:
What I did was create another series that flags when a date should be moved back by one day, and multiply it by a pd.Timedelta object, so that 0 turns into "0 days" and 1 turns into "1 day". Subtracting the two series gives the right result.
Let me know how the following code works for you.
import pandas as pd
import numpy as np
# copied from https://stackoverflow.com/questions/50559078/generating-random-dates-within-a-given-range-in-pandas
def random_dates(start, end, n=15):
    start_u = start.value//10**9
    end_u = end.value//10**9
    return pd.to_datetime(np.random.randint(start_u, end_u, n), unit='s')
dates = random_dates(start=pd.to_datetime('2020-01-01'),
end=pd.to_datetime('2021-01-01'))
timestamps = pd.Series(dates)
# this takes only the hour component of every datetime
hours = timestamps.dt.hour
# this takes only the date component of every datetime
dates = timestamps.dt.date
# this compares the hours with 15, and returns a boolean if it is smaller
flag_is_day_before = hours < 15
# now you can set the dates by multiplying the 1s and 0s with a day timedelta
new_dates = dates - pd.to_timedelta(1, unit='day') * flag_is_day_before
df = pd.DataFrame(data=dict(timestamps=timestamps, new_dates=new_dates))
print(df)
This outputs
timestamps new_dates
0 2020-07-10 20:11:13 2020-07-10
1 2020-05-04 01:20:07 2020-05-03
2 2020-03-30 09:17:36 2020-03-29
3 2020-06-01 16:16:58 2020-06-01
4 2020-09-22 04:53:33 2020-09-21
5 2020-08-02 20:07:26 2020-08-02
6 2020-03-22 14:06:53 2020-03-21
7 2020-03-14 14:21:12 2020-03-13
8 2020-07-16 20:50:22 2020-07-16
9 2020-09-26 13:26:55 2020-09-25
10 2020-11-08 17:27:22 2020-11-08
11 2020-11-01 13:32:46 2020-10-31
12 2020-03-12 12:26:21 2020-03-11
13 2020-12-28 08:04:29 2020-12-27
14 2020-04-06 02:46:59 2020-04-05
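The same rule can also be written without a flag column, as a minimal sketch: subtract the cutoff (15 hours) from every timestamp and then truncate to midnight. Anything before 3 p.m. falls back into the previous day, and anything later stays on the same day. The function name business_date and the parameter cutoff_hour are hypothetical; the sample timestamps are copied from the output above.

```python
import pandas as pd

def business_date(timestamps, cutoff_hour=15):
    """Assign timestamps before cutoff_hour to the previous calendar day."""
    shifted = timestamps - pd.Timedelta(hours=cutoff_hour)
    return shifted.dt.normalize()  # drop the time part, keep datetime64 dtype

timestamps = pd.Series(pd.to_datetime([
    "2020-07-10 20:11:13",
    "2020-05-04 01:20:07",
    "2020-06-01 16:16:58",
    "2020-03-22 14:06:53",
]))
new_dates = business_date(timestamps)
print(pd.DataFrame({"timestamps": timestamps, "new_dates": new_dates}))
```

Because the result stays a datetime64 column, it can be used directly in groupby or resample calls.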
I have a DataFrame, df, and I wish to take the delta over every 7-day period and label it with the timestamp for that particular period.
df:
Date Value
10/15/2020 75
10/14/2020 70
10/13/2020 65
10/12/2020 60
10/11/2020 55
10/10/2020 50
10/9/2020 45
10/8/2020 40
10/7/2020 35
10/6/2020 30
10/5/2020 25
10/4/2020 20
10/3/2020 15
10/2/2020 10
10/1/2020 5
Desired Output:
10/15/2020 to 10/9/2020 is 7 days with the delta being: 75 - 45 = 30
10/9/2020 timestamp would be: 30 and so on
Date Value
10/9/2020 30
10/2/2020 30
This is what I am doing:
df= df['Delta']=df.iloc[:,6].sub(df.iloc[:,0]),Date=pd.Series(pd.date_range(pd.Timestamp('2020-10-15'), periods=7, freq='7d')))[['Delta','Date']]
I am also thinking I may be able to do this:
Edit I updated callDate to Date
for row in df.itertuples():
    Date = datetime.strptime(row.Date, "%m/%d/%y %I:%M %p")
    previousRecord = df['Date'].shift(-6).strptime(row.Date, "%m/%d/%y %I:%M %p")
    Delta = Date - previousRecord
Any suggestion is appreciated
Don't iterate through the dataframe. You can use a merge (this assumes the Date column has already been converted with pd.to_datetime):
(df.merge(df.assign(Date=df['Date'] - pd.to_timedelta('6D')),
on='Date')
.assign(Value = lambda x: x['Value_y']-x['Value_x'])
[['Date','Value']]
)
Output:
Date Value
0 2020-10-09 30
1 2020-10-08 30
2 2020-10-07 30
3 2020-10-06 30
4 2020-10-05 30
5 2020-10-04 30
6 2020-10-03 30
7 2020-10-02 30
8 2020-10-01 30
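The merge returns a delta for every date. If only the two non-overlapping windows from the desired output are wanted, a shift-based sketch along the following lines may be enough. It assumes exactly one row per day with no gaps, sorted newest first as in the question; the variable names delta and out are made up for illustration.

```python
import pandas as pd

# Rebuild the question's data: 15 daily rows, newest first.
df = pd.DataFrame({
    "Date": pd.date_range("2020-10-15", periods=15, freq="-1D"),
    "Value": range(75, 0, -5),
})

df = df.sort_values("Date", ascending=False).reset_index(drop=True)

# In newest-first order, shift(6) brings down the value from 6 days later,
# so each row holds value(date + 6 days) - value(date).
delta = df["Value"].shift(6) - df["Value"]

out = (
    df.assign(Value=delta)
      .dropna()            # the 6 newest rows have no complete window
      .iloc[::7]           # keep one row per non-overlapping 7-day window
      .astype({"Value": int})
)
print(out)
```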
The last block of code you wrote is the way I would do it. Only problem is in Delta = Date - previousRecord, there is nothing called Date here. You should instead be accessing the value associated with callDate.
I have the following models:
class Destination_Deal(models.Model):
    name = models.CharField(_("Nombre"),max_length=200)
    duration = models.IntegerField(_(u"Días"))
class Departure_Date(models.Model):
    date_from= models.DateField(_('Desde'))
    date_to= models.DateField(_('Hasta'))
    destination_deal = models.ForeignKey(Destination_Deal,verbose_name = _("Oferta de Destino"))
I would like to filter Destination Deals that are suitable to travel in a weekend. That means:
Departure Day = Friday or Saturday
Return Day = Sunday. So duration must be 3 or 2 if departure day is Friday or Saturday.
Example
Destination_Deal
id name duration
1 Deal1 3
2 Deal2 5
3 Deal3 2
4 Deal4 7
Departure_Date
id date_from date_to destination_deal_id
1 2012-11-05 2012-11-15 1
2 2012-11-01 2012-12-16 2
3 2013-01-21 2013-01-27 3
4 2013-01-14 2013-01-18 3
5 2013-01-04 2013-01-11 4
Desired Result
ID1: 2012-11-09 was a Friday and the deal's duration is 3. So in this case, Friday, Saturday and Sunday make up a valid weekend.
ID3: 2013-01-26 is a Saturday and the deal's duration is 2. It is also valid.
--Edit--
Ok, sorry if I was not clear. I need to filter the Destination Deals based on the above weekend rule. I was thinking to do it by getting the date_from from the model (DateField) to python (datetime), iterate it until date_to and use weekday() function to check if it is a Friday or Saturday. I am aware of django weekday function but it will only work on a specific date (no range) so I would like to know if there is a simpler approach for this scenario.
where = '(deparure_date__date_from - deparure_date__date_to) > 4 || \
DAYOFWEEK(deparure_date__date_from) IN (1, 6, 7) || \
DAYOFWEEK(deparure_date__date_to) IN (1, 6, 7)'
Destination_Deal.objects.filter(duration__in=[1, 2, 3]) \
.extra(where=[where])
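This keeps deals whose duration is 1, 2 or 3 and whose departure range meets at least one of these conditions:
the difference between date_from and date_to is more than 4 days, in which case there is definitely going to be a Friday, Saturday or Sunday in between,
or date_from is a Friday, Saturday or Sunday,
or date_to is a Friday, Saturday or Sunday (in MySQL, DAYOFWEEK returns 1 for Sunday, 6 for Friday and 7 for Saturday).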
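The iteration idea described in the question's edit can be sketched in plain Python, independently of the ORM. This is only an illustration under stated assumptions: the function name has_weekend_departure is hypothetical, a duration of 3 is paired with a Friday departure and a duration of 2 with a Saturday departure as the question specifies, and only the departure day (not the return day) is required to fall inside the range.

```python
from datetime import date, timedelta

FRIDAY, SATURDAY = 4, 5  # date.weekday() counts Monday as 0

def has_weekend_departure(date_from, date_to, duration):
    """True if the range contains a departure day matching the duration."""
    wanted = {3: FRIDAY, 2: SATURDAY}.get(duration)
    if wanted is None:
        return False  # other durations cannot fit a Friday/Saturday to Sunday trip
    day = date_from
    while day <= date_to:
        if day.weekday() == wanted:
            return True
        day += timedelta(days=1)
    return False

# Rows from the Departure_Date example, paired with their deal's duration.
print(has_weekend_departure(date(2012, 11, 5), date(2012, 11, 15), 3))   # deal 1
print(has_weekend_departure(date(2013, 1, 21), date(2013, 1, 27), 2))    # deal 3
print(has_weekend_departure(date(2013, 1, 14), date(2013, 1, 18), 2))    # deal 3
```

In a Django view this check would run over the rows fetched for deals already filtered with duration__in=[2, 3], which keeps the Python loop small.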