MongoDB date and removed objects - python

Yesterday I had a strange experience with MongoDB. I am using Twisted and txmongo, an asynchronous driver for MongoDB (similar to pymongo).
I have a REST service that receives some data and puts it into MongoDB. One field is a timestamp in milliseconds (13 digits).
First of all, there is no trivial way to convert a millisecond timestamp into a Python datetime. I ended up with something like this:
import time
import datetime

def date2ts(ts):
    return int((time.mktime(ts.timetuple()) * 1000) + (ts.microsecond / 1000))

def ts2date(ts):
    return datetime.datetime.fromtimestamp(ts / 1000) + datetime.timedelta(microseconds=(ts % 1000))
The problem is that when I save the data to MongoDB, retrieve the datetime back, and convert it to a timestamp using my function, I don't get the same result in milliseconds.
I did not understand why this was happening. The datetime is saved in MongoDB as an ISODate object. I tried to query it from the shell and there is indeed a difference of a second or a few milliseconds.
QUESTION 1: Does anybody know why this is happening?
But this is not all. I decided not to use datetime and to save the timestamp directly as a long. Before that I removed all the data from the collection. I was quite surprised that when I tried to save the same field not as a date but as a long, it was still represented as an ISODate in the shell. And when retrieved there was still a difference of a few milliseconds.
I tried to drop the collection and the index. When that did not help I tried to drop the entire database. When that did not help either, I dropped the database again and restarted mongod. Only after that did it start saving the value as a Long.
QUESTION 2: Does anybody know why this is happening?
Thank you!

Python's timestamp is calculated in seconds since the Unix epoch of Jan 1, 1970. The timestamp in JavaScript (and in turn MongoDB), on the other hand, is in terms of milliseconds.
That said, if you only have the timestamps on hand, you can multiply the Python value by 1000 to get milliseconds and store that value in MongoDB. Likewise, you can take the value from MongoDB and divide it by 1000 to turn it back into a Python timestamp. Note that Python only seems to care about two digits after the decimal point instead of three (it doesn't typically deal in milliseconds), so keep that in mind if you are still seeing differences of < 10 milliseconds.
Normally I would suggest working with tuples instead, but the conventions for the value ranges are different in each language (JavaScript is unintuitive in that it numbers months starting at 0 instead of 1) and may cause issues down the road.
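For instance, a minimal sketch of that conversion in plain Python (nothing MongoDB-specific is assumed here):
import time

py_ts = time.time()             # seconds since the epoch, as a float
mongo_ms = int(py_ts * 1000)    # milliseconds, the unit MongoDB/JavaScript use
back_to_py = mongo_ms / 1000.0  # back to a Python-style timestamp in seconds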

It could be a case of different timezones. Please use the function below to correct for it.
function time_format(d, offset) {
    // Convert the local time to UTC, then apply the given offset (in hours)
    var utc = d.getTime() + (d.getTimezoneOffset() * 60000);
    var nd = new Date(utc + (3600000 * offset));
    return nd;
}
searchdate = time_format(searchdate, '+5.5');
'+5.5' here is the target timezone's offset in hours from GMT.
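For what it's worth, a rough Python equivalent of the same idea might look like this (a sketch assuming Python 3.6+, where astimezone() on a naive datetime treats it as local time; the names mirror the JavaScript above):
import datetime

def time_format(d, offset_hours):
    # Convert the naive local datetime to UTC, then shift it by a fixed
    # offset such as +5.5 hours; the result stays tagged as UTC.
    utc = d.astimezone(datetime.timezone.utc)
    return utc + datetime.timedelta(hours=offset_hours)

searchdate = time_format(datetime.datetime.now(), 5.5)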

Related

What format do we need to use to store data in db.Time?

I'm creating a database with SQLAlchemy and SQLite. I have some database models and one of them needs a time. Not the time-of-the-day sort of time but like when-you-study-for-10-minutes sort of time. What format can we store in db.Time?
Code:
time = db.Column(db.Time, nullable=False)
What you're looking for is a "time interval". SQLite doesn't have that. In fact, it doesn't have a time type at all. So use whatever you want.
I would suggest storing the number of seconds in the interval as an integer. For "15 minutes" you would store 900.
SQLAlchemy does have an Interval type for storing datetime.timedelta objects which you could try. The docs say "the value is stored as a date which is relative to the “epoch” (Jan. 1, 1970)" which I think is just a fancy way to say they'll store the number of seconds.
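If it helps, here is a minimal sketch of both approaches with plain SQLAlchemy (assuming SQLAlchemy 1.4+; the model and column names are made up, and with Flask-SQLAlchemy the column definitions are analogous):
import datetime
from sqlalchemy import Column, Integer, Interval, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class StudySession(Base):  # hypothetical model
    __tablename__ = "study_session"
    id = Column(Integer, primary_key=True)
    duration_seconds = Column(Integer, nullable=False)  # e.g. 900 for 15 minutes
    duration = Column(Interval)                         # stores a datetime.timedelta

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add(StudySession(duration_seconds=900,
                             duration=datetime.timedelta(minutes=15)))
    session.commit()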

Why is there a difference between the stored and queried time in Mongo database using PyMongo client?

I have a MongoDB database with foo objects. I'd like to record the time that they are added to the database. To do this, my approach is to record a time, add the foo to the database, and then set its start time. When I get the foo from the database, the start time attribute should match the time when I added it.
Below is my test which reflects this. The functions are not terribly important as they're essentially db.collection.insert or db.collection.find_one calls.
def test_set_foo_start_time(self):
    import datetime
    t = datetime.datetime.utcnow()
    foo_id = "42"
    print("\nAdding {0} to the database.".format(t))
    add_foo_to_database(foo_id)
    self.assertIsNotNone(get_foo_from_database(foo_id))
    set_foo_start_time(foo_id, t)
    time = get_foo_from_database(foo_id)['start_time']
    print("Extracting {0} from the database.".format(time))
    self.assertEqual(t, time)
When I run my test, I'm surprised to see that it fails:
test_set_foo_start_time (test_foo_misc.FooMiscellanyTestCase) ...
Adding 2016-10-10 17:01:16.559332 to the database.
Extracting 2016-10-10 17:01:16.559000+00:00 from the database.
FAIL
where the assertion error is:
Traceback (most recent call last):
  File "/Users/erip/Code/proj/tests/test_foo_misc.py", line 336, in test_set_foo_start_time
    self.assertEqual(t, time)
AssertionError: datet[36 chars], 559332) != datet[36 chars], 559000, tzinfo=<bson.tz_util.FixedOffset obj[15 chars]240>)
Why is there a difference in the times that are being stored and extracted? I could add a delta to check a range of times, but I'd prefer to store the exact time. How can I fix my test to account for these microsecond differences?
According to the BSON spec, date precision is limited to milliseconds (emphasis mine),
UTC datetime - The int64 is UTC milliseconds since the Unix epoch.
There are a few different ways to approach this, but it largely depends on whether you need sub-millisecond precision. If you don't, then you can truncate to millisecond precision before storing:
dt = datetime.datetime.utcnow()
dt = dt.replace(microsecond=dt.microsecond // 1000 * 1000)
You would also want to ensure that no timezone info is carried along (as that can affect your equality check).
If you need sub-millisecond precision, you could
Store a string-encoded timestamp that you can parse upon querying.
datetime.datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S.%f")
The upside is you maintain precision.
The obvious downside here is that strings only sort lexicographically, and this adds another processing step both before insertion and after querying.
Store a long or double containing the timestamp.
t = datetime.datetime.utcnow() - datetime.datetime(1970, 1, 1)
milli = t.total_seconds() * 1000 # microseconds present here
The upside is you maintain precision and can now sort on it correctly.
The downside is you have an extra processing step again (see the round-trip sketch after these options).
Store the date as a dictionary containing both the timestamp as a BSON UTC datetime (millisecond precision) and the sub-millisecond part as a long or double:
dt = datetime.datetime.utcnow()
t = {
    "utc": dt,
    "micro": dt.microsecond
}
The upside is again you've maintained precision and can (index and) sort on both fields.
The downside is the processing step and deconstructed time object.
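For the second (long/double) option, the reverse conversion might look like this (a minimal sketch; EPOCH and the helper names are illustrative, not part of PyMongo):
import datetime

EPOCH = datetime.datetime(1970, 1, 1)

def to_millis(dt):
    # Float milliseconds since the epoch; the fractional part carries microseconds.
    return (dt - EPOCH).total_seconds() * 1000

def from_millis(ms):
    return EPOCH + datetime.timedelta(milliseconds=ms)

t = datetime.datetime.utcnow()
restored = from_millis(to_millis(t))
# The round trip is accurate to about a microsecond; float rounding can shift
# the very last microsecond, so store an integer microsecond count if that matters.
assert abs(restored - t) <= datetime.timedelta(microseconds=1)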

Query [large] data records from Proficy Historian?

I'm using the Proficy Historian SDK with python27. I can create a data record object and add the query criteria attributes (sample type, start time, end time, sample interval - in milliseconds) and use datarecord.QueryRecordset() to execute a query.
The problem I'm facing is that the QueryRecordset method seems to only work for returning a small number of data sets (a few hundred records at most), i.e. a small date range; otherwise it returns no results for any of the SCADA tags. I can sometimes get it to return more (a few thousand) records by slowly incrementing the date range, but it seems unreliable. So, is there a way to fix this, or a different way to do the query or set it up? Most of my queries contain multiple tags. Otherwise, I guess I'll just have to execute the query repeatedly, sliding the date range and pulling a few hundred records at a time.
Update:
I'm performing the query using the following steps:
from win32com.client.gencache import EnsureDispatch
from win32com.client import constants as c
import datetime
ihApp = EnsureDispatch('iHistorian_SDK.Server')
drecord = ihApp.Data.NewRecordset()
drecord.Criteria.Tags = ['Tag_1', 'Tag_2', 'Tag_3']
drecord.Criteria.SamplingMode = c.Calculated
drecord.Criteria.CalculationMode = c.Average
drecord.Criteria.Direction = c.Forward
drecord.Criteria.NumberOfSamples = 0 # This is the default value
drecord.Criteria.SamplingInterval = 30*60*1000 # 30 min interval in ms
# I've tried using the win32com pytime type instead of datetime, but it
# doesn't make a difference
drecord.Criteria.StartTime = datetime.datetime(2015, 11, 1)
drecord.Criteria.EndTime = datetime.datetime(2015, 11, 10)
# Run the query
drecord.Fields.Clear()
drecord.Fields.AllFields()
drecord.QueryRecordset()
One problem that may be happening is the use of dates/times in the dd/mm/yyyy hh:mm format. When I create a pytime or datetime object, the individual attributes (e.g. year, day, month, hour, minute) are all correct before and after assignment to drecord.Criteria.StartTime and drecord.Criteria.EndTime. However, when I print the variable it always comes out in mm/dd/yyyy hh:mm format, though that is probably just down to the object's str or repr method.
So, it turns out there were two properties that could be adjusted to increase the number of samples returned and the time allowed before a timeout occurs. Both properties are set on the server object (ihApp):
ihApp.MaximumQueryIntervals = MaximumQueryIntervals # Default is 50000
ihApp.MaximumQueryTime = MaximumQueryTime # Default is 60 (seconds)
Increasing both these values seemed to fix my problems. Some tags definitely seem to take longer to query than others (over the same time period and same sampling method), so increasing the max. query time helped make returning query data more reliable.
When QueryRecordset() completes it returns False if there was an error and doesn't populate any of the data records. The error can be shown using:
drecord.LastError
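Putting that together, a minimal sketch using the objects from the question above (the larger limits are only illustrative values):
# Raise the server-side query limits before running the query.
ihApp.MaximumQueryIntervals = 200000  # default is 50000
ihApp.MaximumQueryTime = 300          # default is 60 seconds

if not drecord.QueryRecordset():
    # QueryRecordset() returns False on failure and leaves the record set empty.
    print(drecord.LastError)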

Python script for Sqlite time subtraction after midnight

I have a database of General Transit Feed Specification (GTFS) data for a city that defines transit service after midnight with hour values greater than 24. So, in the stop_times table, we have many times defined, for example, as 25:00:00, 26:00:00, etc. Since I need to perform time subtraction on part of this database, I figured I'd write a user-defined Python function to handle this and use the sqlite3 create_function command to register it with my database.
For some reason, when I run the query I have in mind on this dataset, I get
sqlite3.OperationalError: user-defined function raised exception
Here's the time subtraction function I wrote to handle times after midnight. I'm sure it's a mess; if you have any tips for how to more efficiently handle this, I'd love to hear those as well. Thanks in advance.
import datetime

def time_delta(t1, t2):
    old_arrival = t1.encode('utf-8').split(':')
    old_departure = t2.encode('utf-8').split(':')
    new_arrival_string = "2013-03-16 %s:%s:%s" % (int(old_arrival[0])-12, old_arrival[1], old_arrival[2])
    new_arrival_format = "%Y-%m-%d %H:%M:%S"
    arr = datetime.datetime.strptime(new_arrival_string, new_arrival_format)
    new_departure_string = "2013-03-16 %s:%s:%s" % (int(old_departure[0])-12, old_departure[1], old_departure[2])
    new_departure_format = "%Y-%m-%d %H:%M:%S"
    dep = datetime.datetime.strptime(new_departure_string, new_departure_format)
    difference = arr-dep
    seconds = difference.seconds
    if difference.days < 0:
        difference = dep-arr
        seconds = (-1) * difference.seconds
    return seconds
Are you able to change the database schema? If so, one way to sidestep this problem might be to store arrival and departure times not as strings but as integer numbers of seconds since midnight (well, "noon minus 12h", as the spec defines), and change whatever tool you're using to load the database to convert from the "HH:MM:SS" format used in stop_times.txt.
Not only does this give you a nice, canonical representation of stop times that isn't bounded by any 24-hour limit, it makes it simple to compute time intervals and to construct database queries for stop times within specific time periods.
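For example, a small sketch of the conversion the loader could apply (the function name is illustrative):
def gtfs_time_to_seconds(hms):
    # GTFS allows hour values of 24 and above for trips that run past midnight.
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

# Subtraction then becomes plain integer arithmetic:
dwell = gtfs_time_to_seconds("25:10:00") - gtfs_time_to_seconds("24:55:00")  # 900 seconds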

Shall I bother with storing DateTime data as julianday in SQLite?

The SQLite docs specify that the preferred format for storing datetime values in the DB is as Julian Day numbers (using the built-in functions).
However, all the Python frameworks I have seen (pysqlite, SQLAlchemy) store datetime.datetime values as ISO formatted strings. Why do they do that?
I usually try to adapt the frameworks to store datetimes as julianday values, and it's quite painful. I have started to doubt whether it is worth the effort.
Please share your experience in this area with me. Does sticking with julianday make sense?
Julian Day is handy for all sorts of date calculations, but it can't store the time part decently (with precise hours, minutes, and seconds). In the past I've used both Julian Day fields (for dates) and seconds-from-the-epoch (for datetime instances), but only when I had specific computation needs (on dates and times respectively). The simplicity of ISO formatted dates and datetimes, I think, should make them the preferred choice, say about 97% of the time.
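As a rough illustration of why ISO text is usually enough, SQLite's built-in julianday() can still do the date arithmetic on demand (a minimal sqlite3 sketch; the table name is made up):
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (stamp TEXT)")
con.execute("INSERT INTO events VALUES ('2010-06-22 00:45:56')")
# julianday() converts the ISO text on the fly, so interval arithmetic works
# without storing Julian Day numbers yourself.
days = con.execute(
    "SELECT julianday('2010-06-23 00:45:56') - julianday(stamp) FROM events"
).fetchone()[0]
print(days)  # roughly 1.0 (one day)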
Store it both ways. Frameworks can be set in their ways, and if yours expects to find a raw column with an ISO formatted string, then that is probably more of a pain to work around than it's worth.
The concern with having two columns is data consistency, but SQLite has everything you need to make it work. Version 3.3 added support for check constraints and triggers. Read up on the date and time functions; you should be able to do what you need entirely in the database.
CREATE TABLE Table1 (jd, isotime);

CREATE TRIGGER trigger_name_1 AFTER INSERT ON Table1
BEGIN
    UPDATE Table1 SET jd = julianday(isotime) WHERE rowid = last_insert_rowid();
END;

CREATE TRIGGER trigger_name_2 AFTER UPDATE OF isotime ON Table1
BEGIN
    UPDATE Table1 SET jd = julianday(isotime) WHERE rowid = old.rowid;
END;
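A quick way to check the trigger behaviour from Python (a sketch using the standard sqlite3 module with the schema above):
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Table1 (jd, isotime);
    CREATE TRIGGER trigger_name_1 AFTER INSERT ON Table1
    BEGIN
        UPDATE Table1 SET jd = julianday(isotime) WHERE rowid = last_insert_rowid();
    END;
""")
con.execute("INSERT INTO Table1 (isotime) VALUES ('2010-06-22 00:45:56')")
print(con.execute("SELECT jd, isotime FROM Table1").fetchone())
# (2455369.5318981484, '2010-06-22 00:45:56')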
And if you can't do what you need within the DB, you can write a C extension to perform the functionality you need. That way you won't need to touch the framework other than to load your extension.
But typically, a human doesn't read directly from the database. The fractional time in a Julian Day value is easily converted to something human readable by (for example):
void hour_time(GenericDate *ConvertObject)
{
    double frac_time = ConvertObject->jd;
    double hour = (24.0 * (frac_time - (int)frac_time));
    double minute = 60.0 * (hour - (int)hour);
    double second = 60.0 * (minute - (int)minute);
    double microsecond = 1000000.0 * (second - (int)second);

    ConvertObject->hour = hour;
    ConvertObject->minute = minute;
    ConvertObject->second = second;
    ConvertObject->microsecond = microsecond;
}
Because 2010-06-22 00:45:56 is far easier for a human to read than 2455369.5318981484. Text dates are great for doing ad-hoc queries in SQLiteSpy or SQLite Manager.
The main drawback, of course, is that text dates require 19 bytes instead of 8.
