"Uncertain" datetime objects in python? - python

I have a bunch of field data. Some records have a well-known acquisition day, while for others the acquisition date is only known within an uncertainty margin, say +/- 1.5 months.
Is there something such as an "uncertain datetime object" that could handle these uncertainties?
I was thinking of inserting just "99" and "99" for day and month as a zeroth-order approach and then, for example, creating an Enum that labels the date as uncertain. But inserting nines doesn't work, because datetime makes sure you pass a valid month and day when instantiating a datetime object.
Is there a cleverer approach to this? Is there maybe an existing package that can deal with uncertain dates?

An entry like +/- 1.5 months is a difference from some measured time. One way to codify this is with a timedelta object, which represents the difference between two datetime objects.
Here is how you would represent an interval of minus five minutes:
import datetime
i = datetime.timedelta(minutes=-5)
Now, to calculate a time you just add that to an actual date time value.
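To tie this back to the question, here is a minimal sketch of an "uncertain date" built from a datetime plus a timedelta margin. The class name UncertainDate and its members are made up for illustration, not taken from any existing package, and since timedelta has no month unit the +/- 1.5 months is approximated as 45 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class UncertainDate:
    """A nominal acquisition time plus a symmetric uncertainty margin (hypothetical helper)."""
    nominal: datetime
    margin: timedelta = timedelta(0)  # zero margin means the date is well known

    @property
    def earliest(self) -> datetime:
        # Lower bound of the uncertainty window
        return self.nominal - self.margin

    @property
    def latest(self) -> datetime:
        # Upper bound of the uncertainty window
        return self.nominal + self.margin

    def may_include(self, moment: datetime) -> bool:
        """True if `moment` lies within the uncertainty window."""
        return self.earliest <= moment <= self.latest


# Assumption: +/- 1.5 months is approximated as 45 days.
sample = UncertainDate(datetime(2019, 6, 15), timedelta(days=45))
exact = UncertainDate(datetime(2019, 6, 15))
```

Keeping the nominal value as a real datetime means sorting and arithmetic still work, and the margin tells you which records to treat with care.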

Related

Set column with different time zones as index

I have a DataFrame with time values from different timezones. See here:
The start of the data is in standard time and the second half is in daylight saving time. As you can see, I want to convert it to a datetime column, but because of the different time zones it doesn't work. My goal is to set this column as the index. How can I do that?
"... timezone-aware inputs with mixed time offsets ..." can be a bit problematic with pandas. However, pandas.to_datetime has a utc parameter that converts timezone-aware inputs with mixed time offsets into a DatetimeIndex, which may be acceptable here.
Excerpt from the docs:
... timezone-aware inputs with mixed time offsets (for example issued from a timezone with daylight savings, such as Europe/Paris)
are not successfully converted to a DatetimeIndex. Instead a simple
Index containing datetime.datetime objects is returned:
...
Setting utc=True solves most of the ... issues:
...
Timezone-aware inputs are converted to UTC (the output represents the exact same datetime, but viewed from the UTC time offset +00:00)
[and a DatetimeIndex is returned].
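As a rough sketch of what that looks like in practice (the column names and sample values below are made up, since the original data is not shown):

```python
import pandas as pd

# Hypothetical data: one timestamp before and one after a DST change,
# so the two rows carry different UTC offsets.
df = pd.DataFrame({
    "time": ["2019-03-30 12:00:00+01:00", "2019-03-31 12:00:00+02:00"],
    "value": [1, 2],
})

# utc=True converts every value to UTC, so the result has a single timezone
df["time"] = pd.to_datetime(df["time"], utc=True)
df = df.set_index("time")
```

If local wall-clock time is needed afterwards, df.index.tz_convert("Europe/Paris") (or whichever zone applies) gives a DatetimeIndex in that zone.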

How to compare np.datetime64 up to month only?

I stumbled upon a "problem" while working with my data some time ago, when I started messing with pandas. That problem is that, when you compare np.datetime64 objects with strings, numpy will fill out the rest of the information to fit datetime with the lowest value possible (01 for months, 01 for days and so on).
The same happens if you call an np.datetime64 object and specify only up to the month, the rest of the information will still be filled with the lowest possible value:
np.datetime64('2019-07', 'D')
>>numpy.datetime64('2019-07-01')
The problem for me is that, many times, my only concern is with what happens between time periods, like months.
For example, if I want to filter every row where payments were made within last month, it would be ideal to use:
month = '2019-07'
df[df['pay_day']==month]
But when doing something like that, it will compare up to the day and fail for every date that isn't the first day of the month. I have tried transforming datetime to str, slicing and putting it back together, but for filtering purposes it gets messy. Another thing I have tried is:
df['pay_day'].days=1
The idea was to bring all days to 01, so there would be no problem when comparing and filtering, but it just fills the whole column with int64 1's.
Any ideas on how to do that?
You can use the pandas datetime accessor .dt and compare the corresponding properties. Note that .dt.month is an integer (7 for July), so it cannot be compared with the string '2019-07'; check .dt.year as well to pin down the year.
df[(df['pay_day'].dt.year == 2019) & (df['pay_day'].dt.month == 7)]
I have found a way that works for this specific problem: if we set all days to 01, there should be no problem, but it is hard to manipulate np.datetime64. There is a way, though:
df['pay_day'] = df['pay_day'].astype('datetime64[M]')
So that all days will be set to 01, and comparison based on month becomes easy. But if there is a need of editing the days to any other value, I guess it's harder, but this works.
I got the idea from: https://stackoverflow.com/a/52810147/8424939
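Another option, sketched here with made-up sample data, is to compare monthly periods through the .dt accessor instead of casting to a month-unit dtype. The column name pay_day comes from the question; the amount column and the dates are assumptions for illustration.

```python
import pandas as pd

# Hypothetical payments spread over three months
df = pd.DataFrame({
    "pay_day": pd.to_datetime(["2019-06-30", "2019-07-01", "2019-07-15", "2019-08-02"]),
    "amount": [10, 20, 30, 40],
})

# Compare at month resolution: every date is mapped to its monthly Period
month = pd.Period("2019-07", freq="M")
in_month = df[df["pay_day"].dt.to_period("M") == month]

# If the "all days set to 01" form is still wanted, convert the periods back
first_of_month = df["pay_day"].dt.to_period("M").dt.to_timestamp()
```

This keeps the original pay_day column untouched, so the day information is still there if it is needed later.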

Query [large] data records from Proficy Historian?

I'm using the Proficy Historian SDK with python27. I can create a data record object and add the query criteria attributes (sample type, start time, end time, sample interval - in milliseconds) and use datarecord.QueryRecordset() to execute a query.
The problem I'm facing is that QueryRecordset only seems to work when returning a small number of data sets (a few hundred records at most), i.e. a small date range; otherwise it returns no results for any of the SCADA tags. I can sometimes get it to return more (a few thousand) records by slowly incrementing the date range, but it seems unreliable. So, is there a way to fix this, or a different way to run the query or set it up? Most of my queries contain multiple tags. Otherwise, I guess I'll just have to successively execute the query, sliding the date range and pulling a few hundred records at a time.
Update:
I'm performing the query using the following steps:
from win32com.client.gencache import EnsureDispatch
from win32com.client import constants as c
import datetime
ihApp = EnsureDispatch('iHistorian_SDK.Server')
drecord = ihApp.Data.NewRecordset()
drecord.Criteria.Tags = ['Tag_1', 'Tag_2', 'Tag_3']
drecord.Criteria.SamplingMode = c.Calculated
drecord.Criteria.CalculationMode = c.Average
drecord.Criteria.Direction = c.Forward
drecord.Criteria.NumberOfSamples = 0 # This is the default value
drecord.Criteria.SamplingInterval = 30*60*1000 # 30 min interval in ms
# I've tried using the win32com pytime type instead of datetime, but it
# doesn't make a difference
drecord.Criteria.StartTime = datetime.datetime(2015, 11, 1)
drecord.Criteria.EndTime = datetime.datetime(2015, 11, 10)
# Run the query
drecord.Fields.Clear()
drecord.Fields.AllFields()
drecord.QueryRecordset()
One problem that may be happening is the use of dates/times in dd/mm/yyyy hh:mm format. When I create a pytime or datetime object, the individual attributes (year, month, day, hour, minute) are all correct before and after assignment to drecord.Criteria.StartTime and drecord.Criteria.EndTime, but when I print the variable it always comes out in mm/dd/yyyy hh:mm format. This is probably just due to the object's str or repr method.
So, it turns out there were two properties that could be adjusted to increase the number of samples returned and time before a timeout occurred. Both properties are set on the server object (ihApp):
ihApp.MaximumQueryIntervals = MaximumQueryIntervals # Default is 50000
ihApp.MaximumQueryTime = MaximumQueryTime # Default is 60 (seconds)
Increasing both these values seemed to fix my problems. Some tags definitely seem to take longer to query than others (over the same time period and same sampling method), so increasing the max. query time helped make returning query data more reliable.
When QueryRecordset() completes, it returns False if there was an error and doesn't populate any of the data records. The error can be shown using:
drecord.LastError
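If raising those limits is not enough, the fallback mentioned in the question (sliding the date range and querying one window at a time) could be sketched like this. split_range is a hypothetical helper, not part of the Historian SDK, and the 4-day window is an arbitrary choice; each (start, end) pair would be assigned to drecord.Criteria.StartTime and drecord.Criteria.EndTime before calling drecord.QueryRecordset().

```python
import datetime


def split_range(start, end, step):
    """Yield consecutive (window_start, window_end) pairs covering start..end."""
    current = start
    while current < end:
        # Clamp the last window so it never runs past the requested end
        window_end = min(current + step, end)
        yield current, window_end
        current = window_end


# Same overall range as in the question, cut into windows of at most 4 days
windows = list(split_range(
    datetime.datetime(2015, 11, 1),
    datetime.datetime(2015, 11, 10),
    datetime.timedelta(days=4),
))
```

Adjacent windows share a boundary timestamp, so depending on how the server treats the end time you may need to drop a duplicate sample at each seam.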

return datetimes in the active timezone with a django query

I am trying to retrieve the rows from the last n hours of a table and print their datetimes in a given timezone. I am trying to use activate to make Django return the datetimes in the proper timezone, but it returns them as UTC.
Here is my current code:
min_time = datetime.datetime.now(link.monitor.timezone) - datetime.timedelta(hours=period)
timezone.activate(link.monitor.timezone)
rows = TraceHttp.objects.values_list('time', 'elapsed').filter(time__gt=min_time,link_id=link_id,elapsed__gt=0)
array = []
for row in rows:
    array.append((row[0].astimezone(link.monitor.timezone),row[1]))
I want to avoid using the astimezone function and make Django do this for me. Is there something I'm missing about the activate function?
EDIT
Here are my models, as you can see the timezone to display is saved on the "monitor" model:
class Link(models.Model):
    ...
    monitor = models.ForeignKey(Monitor)
    ...
class Monitor(models.Model):
    ...
    timezone = TimeZoneField(default='Europe/London')
class TraceHttp(models.Model):
    link = models.ForeignKey(Link)
    time = models.DateTimeField()
    elapsed = models.FloatField()
After some research I noticed that Django always returns datetimes as UTC, and it's up to you to interpret them in the correct timezone, either by using the datetime.astimezone(timezone) method or by activating a certain timezone.
Django's activate function just changes the way the datetime is rendered in a template; it doesn't actually convert the datetimes a query returns.
If you ever find yourself calling timezone.localtime(dt_value) or dt_value.astimezone(tzinfo) in a loop a few million times to work out the current date in your timezone, the best approach as of 1.10 <= django.VERSION <= 2.1 is likely to use django.db.models.functions.Trunc and related functions, i.e. use a queryset like:
from django.db.models.functions import Trunc, TruncDate
qs = MyModel.objects.filter(...).values(
    'dtime',
    ...,
    dtime_at_my_tz=Trunc('dtime', 'second', tzinfo=yourtz),
    date_at_my_tz=TruncDate('dtime', tzinfo=yourtz),
    month=TruncDate(Trunc('dtime', 'month', tzinfo=yourtz)),
    quarter=TruncDate(Trunc('dtime', 'quarter', tzinfo=yourtz))
)
This will return datetimes or dates in the right timezone. You can use other Trunc* functions as shorthand. TruncDate is especially useful if all you need are datetime.date objects.
This offloads the date calculations to the database, usually with a big reduction in code complexity and an increase in speed (in my case, over 6.5 million timezone.localtime(ts) calls were contributing 25% of total CPU time).
Note on TruncMonth and timezones
A while ago I found that I couldn't get 'proper' months out of TruncMonth or TruncQuarter: a January 1st would become a December 31st.
TruncMonth uses the currently active timezone, so (correctly) a datetime of 2019-01-01T00:00:00Z is truncated to local midnight on the 1st. For any timezone that has a positive offset from UTC (Western Europe and everywhere further east), that local midnight falls on the previous day when viewed in UTC.
If you're only interested in the 'pure month' of an event datetime (and you probably are if you're using TruncMonth), this isn't helpful. However, if you call timezone.activate(timezone.utc) before executing the query (that is, before evaluating your QuerySet), you'll get the intended result. Keep in mind that events that occurred between your midnight and UTC's midnight will then fall under the previous month (and, in the same way, datetimes between your timezone's midnight and UTC's midnight will be converted to the 'wrong' month).
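One way a January 1st can end up as a December 31st is sketched below using plain datetime objects. This is not Django's implementation, only an illustration of the timezone arithmetic; the fixed UTC+01:00 offset and the sample event are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+01:00 offset stands in for a Western European zone
local_tz = timezone(timedelta(hours=1))

# Hypothetical event stored in UTC
event = datetime(2019, 1, 15, 12, 0, tzinfo=timezone.utc)

# Truncate to the start of the month in local time...
local_month_start = event.astimezone(local_tz).replace(
    day=1, hour=0, minute=0, second=0, microsecond=0
)
# ...and look at that same instant from UTC: it is 23:00 on the previous day
month_start_seen_from_utc = local_month_start.astimezone(timezone.utc)

# Truncating directly in UTC keeps the boundary on the 1st
utc_month_start = event.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
```

Taking .date() of month_start_seen_from_utc gives 2018-12-31, while utc_month_start.date() gives 2019-01-01.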
You can use the now() function from django.utils.timezone, but you need to set two variables in settings: USE_TZ to True, and TIME_ZONE to the default timezone that will be used to generate the datetime.
You can find more information in the Django documentation.

Event Extraction from Recurrence Pattern

I've been working on an event-based AJAX application that stores recurring events in a table in the following format (Django models):
event_id = models.CharField(primary_key=True, max_length=24)
# start_date - the start date of the first event in a series
start_date = models.DateTimeField()
# the end date of the last event in a series
end_date = models.DateTimeField()
# Length of the occurrence
event_length = models.BigIntegerField(null=True, blank=True, default=0)
rec_type = models.CharField(max_length=32)
The rec_type stores data in the following format:
[type]_[count]_[day]_[count2]_[days]#[extra]
type - the type of repetition: 'day', 'week', 'month', 'year'.
count - the interval between events in the “type” units.
day and count2 - define a day of a month (first Monday, third Friday, etc.).
days - the comma-separated list of affected week days.
extra - the extra info that can be used to change presentation of recurring details.
For example:
day_3___ - every three days
month_2___ - every two months
month_1_1_2_ - second Monday of each month
week_2___1,5 - Monday and Friday of each second week
This works fine and allows many events to be transmitted concisely, but I now have a requirement to extract all events that occur during a given range, for example on a specific date, week or month, and I am a bit lost as to how best to approach it.
In particular, I am stuck with how to check if an event with a given recurrence pattern is eligible to be in the results.
What is the best approach here?
Personally, I'd store an rrule object from python-dateutil (http://labix.org/python-dateutil) rather than inventing your own recurrence format. Then you can just define some methods that use rrule.between(after, before) to generate instances of your event object for a given range.
One catch though, dateutil's rrule object doesn't pickle correctly, so you should define your own mechanism of serialising the object to the database. I've generally gone with a JSON representation of the keyword arguments for instantiating the rrule. The annoying edge case is that if you want to store stuff like '2nd Monday of the month', you have to do additional work with MO(2), because the value it returns isn't useful. It's hard to explain, but you'll see the problem when you try it.
I'm not aware of any efficient way to find all eligible events within a range, though; you'll have to load all the Event models that potentially overlap with the range, so you'll always be loading potentially more data than you'll eventually use. Just make sure you're relatively smart about it to reduce the burden. Short of someone adding recurrence handling to databases themselves, I'm not aware of any way to improve this.
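As a rough sketch of the approach described above, here is the "Monday and Friday of every second week" pattern stored as JSON keyword arguments and expanded for a range. The variable names, the start date and the query range are made up for illustration. Weekdays are stored as plain integers so the dictionary serialises cleanly; note that dateutil counts Monday as 0, whereas the rec_type examples in the question use 1 for Monday.

```python
import json
from datetime import datetime

from dateutil.rrule import rrule, WEEKLY

# Keyword arguments for the rule; WEEKLY is an integer constant, so the
# whole dictionary can be dumped to JSON and stored in a text column.
rule_kwargs = {"freq": WEEKLY, "interval": 2, "byweekday": [0, 4]}  # 0 = Monday, 4 = Friday
stored = json.dumps(rule_kwargs)

# Later: rebuild the rule from the stored JSON plus the event's start date
rule = rrule(dtstart=datetime(2024, 1, 1), **json.loads(stored))

# All occurrences that fall inside the requested range (inc=True keeps the endpoints)
occurrences = rule.between(datetime(2024, 1, 1), datetime(2024, 1, 31), inc=True)
```

A database filter on start_date and end_date can first narrow down which stored rules might overlap the range, and only those need to be expanded this way.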
