Python iterate through month, use month in between Query - python

I have the following model:
class Deal(models.Model):
    start_date = models.DateTimeField()
    end_date = models.DateTimeField()
I want to iterate through a given year
year = '2010'
For each month in year I want to execute a query to see if the month is between start_date and end_date.
How can I iterate through a given year? Use the month to do a query?
SELECT * FROM deals WHERE month BETWEEN start_date AND end_date
The outcome will tell me if I had a deal in January 2010 and/or in February 2010, etc.

How can I iterate through a given year?
You could use python-dateutil's rrule. Install with command pip install python-dateutil.
Example usage:
In [1]: from datetime import datetime
In [2]: from dateutil import rrule
In [3]: list(rrule.rrule(rrule.MONTHLY, dtstart=datetime(2010, 1, 1, 0, 1), count=12))
Out[3]:
[datetime.datetime(2010, 1, 1, 0, 1),
datetime.datetime(2010, 2, 1, 0, 1),
datetime.datetime(2010, 3, 1, 0, 1),
datetime.datetime(2010, 4, 1, 0, 1),
datetime.datetime(2010, 5, 1, 0, 1),
datetime.datetime(2010, 6, 1, 0, 1),
datetime.datetime(2010, 7, 1, 0, 1),
datetime.datetime(2010, 8, 1, 0, 1),
datetime.datetime(2010, 9, 1, 0, 1),
datetime.datetime(2010, 10, 1, 0, 1),
datetime.datetime(2010, 11, 1, 0, 1),
datetime.datetime(2010, 12, 1, 0, 1)]
Use the month to do a query?
You could iterate over months like this:
In [1]: from dateutil import rrule
In [2]: from datetime import datetime
In [3]: months = list(rrule.rrule(rrule.MONTHLY, dtstart=datetime(2010, 1, 1, 0, 1), count=13))
In [4]: i = 0
In [5]: while i < len(months) - 1:
   ...:     print("start_date", months[i], "end_date", months[i+1])
   ...:     i += 1
   ...:
start_date 2010-01-01 00:01:00 end_date 2010-02-01 00:01:00
start_date 2010-02-01 00:01:00 end_date 2010-03-01 00:01:00
start_date 2010-03-01 00:01:00 end_date 2010-04-01 00:01:00
start_date 2010-04-01 00:01:00 end_date 2010-05-01 00:01:00
start_date 2010-05-01 00:01:00 end_date 2010-06-01 00:01:00
start_date 2010-06-01 00:01:00 end_date 2010-07-01 00:01:00
start_date 2010-07-01 00:01:00 end_date 2010-08-01 00:01:00
start_date 2010-08-01 00:01:00 end_date 2010-09-01 00:01:00
start_date 2010-09-01 00:01:00 end_date 2010-10-01 00:01:00
start_date 2010-10-01 00:01:00 end_date 2010-11-01 00:01:00
start_date 2010-11-01 00:01:00 end_date 2010-12-01 00:01:00
start_date 2010-12-01 00:01:00 end_date 2011-01-01 00:01:00
Replace the "print" statement with a query. Feel free to adapt it to your needs.
There is probably a better way but that could do the job.
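As a rough illustration of what "replace the print with a query" could look like, here is a minimal standard-library sketch. It assumes that "a deal in a month" means the deal's date range overlaps that month; the helper names `month_bounds` and `months_with_deals` and the sample data are hypothetical, not part of the original answer.

```python
from datetime import datetime

def month_bounds(year):
    """Yield (month_start, next_month_start) for each month of the given year."""
    for month in range(1, 13):
        start = datetime(year, month, 1)
        if month == 12:
            end = datetime(year + 1, 1, 1)
        else:
            end = datetime(year, month + 1, 1)
        yield start, end

def months_with_deals(deals, year):
    """Return the month numbers in which at least one (start, end) deal is active."""
    active = []
    for month_start, next_month_start in month_bounds(year):
        # A deal overlaps the month if it starts before the month ends
        # and ends on or after the first instant of the month.
        if any(s < next_month_start and e >= month_start for s, e in deals):
            active.append(month_start.month)
    return active

# Hypothetical sample data standing in for Deal rows (start_date, end_date).
deals = [
    (datetime(2010, 1, 15), datetime(2010, 3, 2)),
    (datetime(2010, 11, 30), datetime(2011, 1, 5)),
]
print(months_with_deals(deals, 2010))
```

With the Django model from the question, the same overlap test could probably be written as Deal.objects.filter(start_date__lt=next_month_start, end_date__gte=month_start).exists() inside the loop.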

Related

Is there a function to calculate the duration in minutes between two datetime values?

This is my dataframe.
Start_hour End_date
23:58:00 00:26:00
23:56:00 00:01:00
23:18:00 23:36:00
How can I get in a new column the difference (in minutes) between these two columns?
>>> from datetime import datetime
>>>
>>> before = datetime.now()
>>> print('wait for more than 1 minute')
wait for more than 1 minute
>>> after = datetime.now()
>>> td = after - before
>>>
>>> td
datetime.timedelta(seconds=98, microseconds=389121)
>>> td.total_seconds()
98.389121
>>> td.total_seconds() / 60
1.6398186833333335
Then you can round it or use it as-is.
You can do something like this:
import pandas as pd
df = pd.DataFrame({
    'Start_hour': ['23:58:00', '23:56:00', '23:18:00'],
    'End_date': ['00:26:00', '00:01:00', '23:36:00'],
})
df['Start_hour'] = pd.to_datetime(df['Start_hour'])
df['End_date'] = pd.to_datetime(df['End_date'])
# .seconds is always in 0..86399, so a negative difference wraps past midnight
df['diff'] = df.apply(
    lambda row: (row['End_date'] - row['Start_hour']).seconds / 60,
    axis=1
)
print(df)
Start_hour End_date diff
0 2021-03-29 23:58:00 2021-03-29 00:26:00 28.0
1 2021-03-29 23:56:00 2021-03-29 00:01:00 5.0
2 2021-03-29 23:18:00 2021-03-29 23:36:00 18.0
You can also convert your dates back to strings if you like:
df['Start_hour'] = df['Start_hour'].apply(lambda x: x.strftime('%H:%M:%S'))
df['End_date'] = df['End_date'].apply(lambda x: x.strftime('%H:%M:%S'))
print(df)
Output:
Start_hour End_date diff
0 23:58:00 00:26:00 28.0
1 23:56:00 00:01:00 5.0
2 23:18:00 23:36:00 18.0
Short answer:
df['interval'] = df['End_date'] - df['Start_hour']
df.loc[df['End_date'] < df['Start_hour'], 'interval'] += timedelta(hours=24)
Why so:
You are probably trying to handle the case where your Start_hour and End_date values sometimes belong to different days, which is why you can't simply subtract one from the other.
If your time window never exceeds 24 hours, you can use some modular arithmetic to deal with the 23:59:59 - 00:00:00 boundary:
if End_date < Start_hour, End_date always belongs to the next day;
this implies that if End_date - Start_hour < 0, we should add 24 hours to End_date to find the actual difference.
The final formula is:
if rec['Start_hour'] < rec['End_date']:
    offset = timedelta(0)
else:
    offset = timedelta(hours=24)
rec['delta'] = offset + rec['End_date'] - rec['Start_hour']
To do the same with pandas.DataFrame we need to change code accordingly. And
that's how we get the snippet from the beginning of the answer.
from datetime import datetime, timedelta
import pandas as pd
df = pd.DataFrame([
    {'Start_hour': datetime(1, 1, 1, 23, 58, 0), 'End_date': datetime(1, 1, 1, 0, 26, 0)},
    {'Start_hour': datetime(1, 1, 1, 23, 58, 0), 'End_date': datetime(1, 1, 1, 23, 59, 0)},
])
# ...
df['interval'] = df['End_date'] - df['Start_hour']
# add a day wherever the end time is earlier than the start time
df.loc[df['End_date'] < df['Start_hour'], 'interval'] += timedelta(hours=24)
> df
Start_hour End_date interval
0 0001-01-01 23:58:00 0001-01-01 00:26:00 0 days 00:28:00
1 0001-01-01 23:58:00 0001-01-01 23:59:00 0 days 00:01:00
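The "add 24 hours when the difference is negative" rule can also be written as a single modulo operation. The following is a small sketch using only the standard library; `minutes_between` is a hypothetical helper name, and it assumes, as above, that the end time is less than 24 hours after the start time.

```python
from datetime import datetime, timedelta

def minutes_between(start_hms, end_hms):
    """Minutes from start to end, assuming end is less than 24 hours after start."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end_hms, fmt) - datetime.strptime(start_hms, fmt)
    # Taking the difference modulo one day folds a negative value
    # (end time past midnight) back into the range 0..24h.
    return (delta % timedelta(days=1)).total_seconds() / 60

rows = [("23:58:00", "00:26:00"), ("23:56:00", "00:01:00"), ("23:18:00", "23:36:00")]
print([minutes_between(start, end) for start, end in rows])
```

For the three rows from the question this prints [28.0, 5.0, 18.0].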

set_codes in multiIndexed pandas series

I want to multiIndex an array of data.
Initially, I was indexing my data with datetime, but for some later applications, I had to add another numeric index (that goes from 0 to len(array)-1).
I have written those little lines:
O = [0.701733664614, 0.699495411782, 0.572129320819, 0.613315597684, 0.58079660603, 0.596638918579, 0.48453382119]
Ab = [datetime.datetime(2018, 12, 11, 14, 0), datetime.datetime(2018, 12, 21, 10, 0), datetime.datetime(2018, 12, 21, 14, 0), datetime.datetime(2019, 1, 1, 10, 0), datetime.datetime(2019, 1, 1, 14, 0), datetime.datetime(2019, 1, 11, 10, 0), datetime.datetime(2019, 1, 11, 14, 0)]
tst = pd.Series(O,index=Ab)
ld = len(tst)
index = pd.MultiIndex.from_product([(x for x in range(0,ld)),Ab], names=['id','dtime'])
print (index)
data = pd.Series(O,index=index)
But when printing the index, I get some bizarre "codes":
The levels and names are perfect, but the codes go from 0 to 763 ... 764 times (instead of once)!
I tried to add the set_codes command:
index.set_codes([x for x in range(0,ld)], level=0)
print (index)
In vain; I get the following error:
ValueError: Unequal code lengths: [764, 583696]
the initial pandas series:
print (tst)
2005-01-01 14:00:00 0.544177
2005-01-01 14:00:00 0.544177
2005-01-21 14:00:00 0.602239
...
2019-05-21 10:00:00 0.446813
2019-05-21 14:00:00 0.466573
Length: 764, dtype: float64
the new expected one
id dtime
0 2005-01-01 14:00:00 0.544177
1 2005-01-01 14:00:00 0.544177
2 2005-01-21 14:00:00 0.602239
...
762 2019-05-21 10:00:00 0.446813
763 2019-05-21 14:00:00 0.466573
Thanks in advance
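You can create a new index with MultiIndex.from_arrays and assign it to the Series (s is your Series, tst in the question, and np is numpy). Unlike from_product, which builds every combination of its inputs (764 × 764 = 583696 entries here, hence the error), from_arrays pairs the arrays element by element: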
s.index = pd.MultiIndex.from_arrays([np.arange(len(s)), s.index], names=['id','dtime'])
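The size mismatch can be reproduced without pandas. This is only an analogy, using shortened, made-up data: a Cartesian product of two sequences of length n has n * n pairs, while element-wise pairing has n.

```python
from itertools import product

ids = range(3)
stamps = ["2018-12-11 14:00", "2018-12-21 10:00", "2018-12-21 14:00"]

all_pairs = list(product(ids, stamps))  # every id combined with every timestamp
row_pairs = list(zip(ids, stamps))      # one id per timestamp, in order

print(len(all_pairs), len(row_pairs))
```

This prints 9 3; the second form is the one that matches the expected output in the question.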

How to handle sum and multiplication of time intervals?

I need to calculate a deadline (datetime) after adding N (int) intervals to a start date. Each interval is a relativedelta, because it can be expressed in months or years as well as in days or seconds. I can do this simply by multiplying the interval by N and adding the result to start_date (datetime). At the same time, I need to do it in multiple steps, such as calculating the 5th deadline, then the 6th, and so on, so I just add the interval to start_date N times.
In some cases, these two methods give different results.
Assume start_date = datetime(year=2019, month=1, day=1), interval = relativedelta(months=1, days=2), and N = 16.
From one point of view, both methods are correct: interval*16 = relativedelta(years=+1, months=+4, days=+32), so start_date + 16*interval = 2019-01-01 + 1 year + 4 months + 32 days = 2020-05-01 + 32 days = 2020-06-02 (because May has 31 days).
At the same time, adding the intervals one by one gives 2020-05-01 after the 15th step, so the 16th step yields 2020-05-01 + 1 month + 2 days = 2020-06-03.
The problem is related to "month-days overflow", but I can't figure out how to handle it. Should I always use repeated addition instead of multiplication? That isn't practical to compute (imagine the 9999999th deadline with an interval of 1 day and 1 second).
Steps to reproduce:
from datetime import datetime
from dateutil.relativedelta import relativedelta

def test_relative_sum_mult_with_date():
    start = datetime(year=2019, month=1, day=1)
    interval = relativedelta(months=1, days=2)
    check_up_to = 100
    for i in range(check_up_to):
        multiplied = start + i*interval
        summed = start
        for j in range(i):
            summed += interval
        print('i=%s, i*interval=%s, diff(multiplied-summed)=%s, multiplied=%s, summed=%s' %
              (i, i*interval, multiplied-summed, multiplied, summed))
        assert multiplied == summed
Trace:
i=0, i*interval=relativedelta(), diff(multiplied-summed)=0:00:00, multiplied=2019-01-01 00:00:00, summed=2019-01-01 00:00:00
i=1, i*interval=relativedelta(months=+1, days=+2), diff(multiplied-summed)=0:00:00, multiplied=2019-02-03 00:00:00, summed=2019-02-03 00:00:00
i=2, i*interval=relativedelta(months=+2, days=+4), diff(multiplied-summed)=0:00:00, multiplied=2019-03-05 00:00:00, summed=2019-03-05 00:00:00
i=3, i*interval=relativedelta(months=+3, days=+6), diff(multiplied-summed)=0:00:00, multiplied=2019-04-07 00:00:00, summed=2019-04-07 00:00:00
i=4, i*interval=relativedelta(months=+4, days=+8), diff(multiplied-summed)=0:00:00, multiplied=2019-05-09 00:00:00, summed=2019-05-09 00:00:00
i=5, i*interval=relativedelta(months=+5, days=+10), diff(multiplied-summed)=0:00:00, multiplied=2019-06-11 00:00:00, summed=2019-06-11 00:00:00
i=6, i*interval=relativedelta(months=+6, days=+12), diff(multiplied-summed)=0:00:00, multiplied=2019-07-13 00:00:00, summed=2019-07-13 00:00:00
i=7, i*interval=relativedelta(months=+7, days=+14), diff(multiplied-summed)=0:00:00, multiplied=2019-08-15 00:00:00, summed=2019-08-15 00:00:00
i=8, i*interval=relativedelta(months=+8, days=+16), diff(multiplied-summed)=0:00:00, multiplied=2019-09-17 00:00:00, summed=2019-09-17 00:00:00
i=9, i*interval=relativedelta(months=+9, days=+18), diff(multiplied-summed)=0:00:00, multiplied=2019-10-19 00:00:00, summed=2019-10-19 00:00:00
i=10, i*interval=relativedelta(months=+10, days=+20), diff(multiplied-summed)=0:00:00, multiplied=2019-11-21 00:00:00, summed=2019-11-21 00:00:00
i=11, i*interval=relativedelta(months=+11, days=+22), diff(multiplied-summed)=0:00:00, multiplied=2019-12-23 00:00:00, summed=2019-12-23 00:00:00
i=12, i*interval=relativedelta(years=+1, days=+24), diff(multiplied-summed)=0:00:00, multiplied=2020-01-25 00:00:00, summed=2020-01-25 00:00:00
i=13, i*interval=relativedelta(years=+1, months=+1, days=+26), diff(multiplied-summed)=0:00:00, multiplied=2020-02-27 00:00:00, summed=2020-02-27 00:00:00
i=14, i*interval=relativedelta(years=+1, months=+2, days=+28), diff(multiplied-summed)=0:00:00, multiplied=2020-03-29 00:00:00, summed=2020-03-29 00:00:00
i=15, i*interval=relativedelta(years=+1, months=+3, days=+30), diff(multiplied-summed)=0:00:00, multiplied=2020-05-01 00:00:00, summed=2020-05-01 00:00:00
i=16, i*interval=relativedelta(years=+1, months=+4, days=+32), diff(multiplied-summed)=-1 day, 0:00:00, multiplied=2020-06-02 00:00:00, summed=2020-06-03 00:00:00
datetime.datetime(2020, 6, 2, 0, 0, 0) != datetime.datetime(2020, 6, 3, 0, 0, 0)
Expected :datetime.datetime(2020, 6, 3, 0, 0, 0)
Actual :datetime.datetime(2020, 6, 2, 0, 0, 0)
Versions:
Python 3.6
python-dateutil==2.8.0
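Let me restate your example more simply:
from datetime import datetime
from dateutil.relativedelta import relativedelta
start = datetime(year=2018, month=3, day=29)
interval = relativedelta(months=1, days=2)
d1 = start + interval * 2 # 2018-06-02
d2 = start + interval + interval # 2018-06-03
print(d1, d2)
So I don't think it's a library bug: follow the same calculations by hand and you'll see they make sense. relativedelta adds the months first and the days afterwards, so d1 is May 29 plus 4 days (June 2), while d2 passes through April 29 plus 2 days (May 1) and ends at June 1 plus 2 days (June 3).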
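To make the by-hand calculation concrete, here is a standard-library sketch that imitates the relevant part of relativedelta's arithmetic: shift the month first (clamping the day to the end of the target month), then add the days. `add_interval` is a hypothetical helper written for this illustration, not dateutil's implementation.

```python
import calendar
from datetime import datetime, timedelta

def add_interval(start, months=0, days=0):
    """Shift by whole months first (day clamped to month end), then add days."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    shifted = start.replace(year=year, month=month, day=min(start.day, last_day))
    return shifted + timedelta(days=days)

start = datetime(2018, 3, 29)

# One step with the doubled interval: Mar 29 -> May 29 -> Jun 2
multiplied = add_interval(start, months=2, days=4)

# Two steps with the single interval: Mar 29 -> Apr 29 -> May 1 -> Jun 1 -> Jun 3
first = add_interval(start, months=1, days=2)
summed = add_interval(first, months=1, days=2)

print(multiplied.date(), summed.date())
```

The two results differ because the intermediate step crosses the end of April (30 days), whereas the single multiplied step crosses the end of May (31 days).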

Filter Dataframe with a list of time ranges

below is a simplified version of my setup:
import pandas as pd
import datetime as dt
df_data = pd.DataFrame({'DateTime' : [dt.datetime(2017, 9, 1, 0, 0, 0),dt.datetime(2017, 9, 1, 1, 0, 0),dt.datetime(2017, 9, 1, 2, 0, 0),dt.datetime(2017, 9, 1, 3, 0, 0)], 'Data' : [1,2,3,5]})
df_timeRanges = pd.DataFrame({'startTime':[dt.datetime(2017, 8, 30, 0, 0, 0), dt.datetime(2017, 9, 1, 1, 30, 0)], 'endTime':[dt.datetime(2017, 9, 1, 0, 30, 0), dt.datetime(2017, 9, 1, 2, 30, 0)]})
print(df_data)
print(df_timeRanges)
This gives:
Data DateTime
0 1 2017-09-01 00:00:00
1 2 2017-09-01 01:00:00
2 3 2017-09-01 02:00:00
3 5 2017-09-01 03:00:00
endTime startTime
0 2017-09-01 00:30:00 2017-08-30 00:00:00
1 2017-09-01 02:30:00 2017-09-01 01:30:00
I would like to filter df_data with df_timeRanges, with the remaining rows in a single dataframe, kind of like:
df_data_filt = df_data[(df_data['DateTime'] >= df_timeRanges['startTime']) & (df_data['DateTime'] <= df_timeRanges['endTime'])]
I did not expect the above line to work, and it returned this error:
ValueError: Can only compare identically-labeled Series objects
Would anyone be able to provide some tips on this? The df_data and df_timeRanges in my real task are much bigger.
Thanks in advance
If I understand correctly, use the following (with numpy imported as np):
In [794]: mask = np.logical_or.reduce([
     ...:     (df_data.DateTime >= x.startTime) & (df_data.DateTime <= x.endTime)
     ...:     for i, x in df_timeRanges.iterrows()])
In [795]: df_data[mask]
Out[795]:
Data DateTime
0 1 2017-09-01 00:00:00
2 3 2017-09-01 02:00:00
Or, also
In [807]: func = lambda x: (df_data.DateTime >= x.startTime) & (df_data.DateTime <= x.endTime)
In [808]: df_data[df_timeRanges.apply(func, axis=1).any()]
Out[808]:
Data DateTime
0 1 2017-09-01 00:00:00
2 3 2017-09-01 02:00:00
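Both variants build one boolean mask per time range and combine them with a logical OR. The same logic in plain Python, as a minimal sketch using the sample data from the question (`in_any_range` is a hypothetical helper name), looks like this:

```python
from datetime import datetime

def in_any_range(stamp, ranges):
    """True if stamp falls inside at least one inclusive (start, end) range."""
    return any(start <= stamp <= end for start, end in ranges)

stamps = [datetime(2017, 9, 1, hour) for hour in range(4)]
ranges = [
    (datetime(2017, 8, 30, 0, 0), datetime(2017, 9, 1, 0, 30)),
    (datetime(2017, 9, 1, 1, 30), datetime(2017, 9, 1, 2, 30)),
]

kept = [stamp for stamp in stamps if in_any_range(stamp, ranges)]
print(kept)
```

This keeps the 00:00 and 02:00 timestamps, matching rows 0 and 2 in the output above. For large inputs the vectorized pandas versions are likely to be much faster than this loop.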

Python checking daytime

Basically, I want my script to pause between 4 and 5 AM. The only way to do this I've come up with so far is this:
seconds_into_day = time.time() % (60*60*24)
if 60*60*4 < seconds_into_day < 60*60*5:
    sleep(time_left_till_5am)
Any "proper" way to do this? Aka some built-in function/lib for calculating time; rather than just using seconds all the time?
You want datetime
The datetime module supplies classes for manipulating dates and times in both simple and complex ways
If you use the hour attribute of datetime.now(), you'll get the current hour:
from datetime import datetime
from time import sleep
datetimenow = datetime.now()
if datetimenow.hour in range(4, 5):
    sleep(time_left_till_5am)
You can calculate time_left_till_5am by subtracting the seconds already elapsed in the current hour from 3600: 3600 - (datetimenow.minute * 60 + datetimenow.second).
Python has a built-in datetime library: http://docs.python.org/library/datetime.html
This should probably get you what you're after:
import datetime as dt
from time import sleep
now = dt.datetime.now()
if now.hour >= 4 and now.hour < 5:
    # seconds remaining until the top of the hour
    sleep(3600 - (now.minute * 60 + now.second))
OK, the above works, but here's the purer, less error-prone solution (and what I was originally thinking of but suddenly forgot how to do):
import datetime as dt
from time import sleep
now = dt.datetime.now()
pause = dt.datetime(now.year, now.month, now.day, 4)
start = dt.datetime(now.year, now.month, now.day, 5)
if now >= pause and now < start:
    sleep((start - now).seconds)
That's where my original "timedelta" comment came from -- what you get from subtracting two datetime objects is a timedelta object (which in this case we pull the 'seconds' attribute from).
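The timedelta approach can be wrapped in a small function so that it can be checked without actually sleeping. This is a sketch under the assumption that the pause window starts and ends on the hour within the same day; `seconds_until_resume` and its parameters are hypothetical names.

```python
import datetime as dt

def seconds_until_resume(now, pause_hour=4, resume_hour=5):
    """Seconds left in the pause window, or 0.0 if `now` is outside it."""
    pause = now.replace(hour=pause_hour, minute=0, second=0, microsecond=0)
    resume = now.replace(hour=resume_hour, minute=0, second=0, microsecond=0)
    if pause <= now < resume:
        # total_seconds() also keeps the fractional part of the timedelta
        return (resume - now).total_seconds()
    return 0.0

print(seconds_until_resume(dt.datetime(2010, 7, 5, 4, 30)))
```

In a script, the result would be passed to time.sleep, for example sleep(seconds_until_resume(dt.datetime.now())).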
The following code covers the more general case where a script needs to pause during any fixed window of less than 24 hours duration. Example: must sleep between 11:00 PM and 01:00 AM.
import datetime as dt
def sleep_duration(sleep_from, sleep_to, now=None):
    # sleep_* are datetime.time objects
    # now is a datetime.datetime object
    if now is None:
        now = dt.datetime.now()
    duration = 0
    lo = dt.datetime.combine(now, sleep_from)
    hi = dt.datetime.combine(now, sleep_to)
    if lo <= now < hi:
        duration = (hi - now).seconds
    elif hi < lo:
        # the window wraps past midnight
        if now >= lo:
            duration = (hi + dt.timedelta(hours=24) - now).seconds
        elif now < hi:
            duration = (hi - now).seconds
    return duration
tests = [
(4, 5, 3, 30),
(4, 5, 4, 0),
(4, 5, 4, 30),
(4, 5, 5, 0),
(4, 5, 5, 30),
(23, 1, 0, 0),
(23, 1, 0, 30),
(23, 1, 0, 59),
(23, 1, 1, 0),
(23, 1, 1, 30),
(23, 1, 22, 30),
(23, 1, 22, 59),
(23, 1, 23, 0),
(23, 1, 23, 1),
(23, 1, 23, 59),
]
for hfrom, hto, hnow, mnow in tests:
    sfrom = dt.time(hfrom)
    sto = dt.time(hto)
    dnow = dt.datetime(2010, 7, 5, hnow, mnow)
    print(sfrom, sto, dnow, sleep_duration(sfrom, sto, dnow))
and here's the output:
04:00:00 05:00:00 2010-07-05 03:30:00 0
04:00:00 05:00:00 2010-07-05 04:00:00 3600
04:00:00 05:00:00 2010-07-05 04:30:00 1800
04:00:00 05:00:00 2010-07-05 05:00:00 0
04:00:00 05:00:00 2010-07-05 05:30:00 0
23:00:00 01:00:00 2010-07-05 00:00:00 3600
23:00:00 01:00:00 2010-07-05 00:30:00 1800
23:00:00 01:00:00 2010-07-05 00:59:00 60
23:00:00 01:00:00 2010-07-05 01:00:00 0
23:00:00 01:00:00 2010-07-05 01:30:00 0
23:00:00 01:00:00 2010-07-05 22:30:00 0
23:00:00 01:00:00 2010-07-05 22:59:00 0
23:00:00 01:00:00 2010-07-05 23:00:00 7200
23:00:00 01:00:00 2010-07-05 23:01:00 7140
23:00:00 01:00:00 2010-07-05 23:59:00 3660
When dealing with dates and times in Python, I still prefer mxDateTime over Python's datetime module. Although the built-in module has improved greatly over the years, it is still rather awkward and lacking in comparison. If you are interested, look up mxDateTime: it's free to download and use, and it makes datetime math much easier.
import mx.DateTime as dt
from time import sleep
now = dt.now()
if 4 <= now.hour < 5:
    stop = dt.RelativeDateTime(hour=5, minute=0, second=0)
    secs_remaining = ((now + stop) - now).seconds
    sleep(secs_remaining)
