Query [large] data records from Proficy Historian? - python

I'm using the Proficy Historian SDK with Python 2.7. I can create a data record object, add the query criteria attributes (sample type, start time, end time, sample interval in milliseconds), and use datarecord.QueryRecordset() to execute a query.
The problem I'm facing is that QueryRecordset seems to work only for small result sets (a few hundred records at most, i.e. a small date range); otherwise it returns no results for any of the SCADA tags. I can sometimes get it to return more (a few thousand) records by slowly incrementing the date range, but it seems unreliable. Is there a way to fix this, or a different way to do or set up the query? Most of my queries contain multiple tags. Otherwise, I guess I'll just have to execute the query repeatedly, sliding the date range and pulling a few hundred records at a time.
Update:
I'm performing the query using the following steps:
from win32com.client.gencache import EnsureDispatch
from win32com.client import constants as c
import datetime
ihApp = EnsureDispatch('iHistorian_SDK.Server')
drecord = ihApp.Data.NewRecordset()
drecord.Criteria.Tags = ['Tag_1', 'Tag_2', 'Tag_3']
drecord.Criteria.SamplingMode = c.Calculated
drecord.Criteria.CalculationMode = c.Average
drecord.Criteria.Direction = c.Forward
drecord.Criteria.NumberOfSamples = 0 # This is the default value
drecord.Criteria.SamplingInterval = 30*60*1000 # 30 min interval in ms
# I've tried using the win32com pytime type instead of datetime, but it
# doesn't make a difference
drecord.Criteria.StartTime = datetime.datetime(2015, 11, 1)
drecord.Criteria.EndTime = datetime.datetime(2015, 11, 10)
# Run the query
drecord.Fields.Clear()
drecord.Fields.AllFields()
drecord.QueryRecordset()
One problem that may be happening is the use of dates/times in dd/mm/yyyy hh:mm format. When I create a pytime or datetime object, the individual attributes (year, month, day, hour, minute) are all correct both before and after assignment to drecord.Criteria.StartTime and drecord.Criteria.EndTime, but when I print the variable it always comes out in mm/dd/yyyy hh:mm format. That is probably just down to the object's __str__ or __repr__ method, though.
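For reference, a plain datetime prints in ISO order, so the mm/dd/yyyy output is likely the COM time object's own (locale-dependent) formatting; either way, the attribute values are what counts:
import datetime

t = datetime.datetime(2015, 11, 1)
print(t)                         # 2015-11-01 00:00:00 -- from __str__, display only
print((t.year, t.month, t.day))  # (2015, 11, 1) -- unambiguous regardless of display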

So, it turns out there were two properties that could be adjusted to increase the number of samples returned and time before a timeout occurred. Both properties are set on the server object (ihApp):
ihApp.MaximumQueryIntervals = MaximumQueryIntervals # Default is 50000
ihApp.MaximumQueryTime = MaximumQueryTime # Default is 60 (seconds)
Increasing both these values seemed to fix my problems. Some tags definitely seem to take longer to query than others (over the same time period and same sampling method), so increasing the max. query time helped make returning query data more reliable.
When QueryRecordset() completes, it returns False if there was an error and doesn't populate any of the data records. The error type can be shown using:
drecord.LastError
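Putting it together, a sketch with both limits raised (the numbers here are illustrative, not recommendations) and the return value checked:
ihApp.MaximumQueryIntervals = 500000  # raised from the 50000 default (illustrative value)
ihApp.MaximumQueryTime = 300          # raised from the 60 s default (illustrative value)

if not drecord.QueryRecordset():
    print(drecord.LastError)  # query failed; inspect before widening limits further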

Related

What format do we need to use to store data in db.Time?

I'm creating a database with SQLAlchemy and SQLite. I have some database models and one of them needs a time. Not the time-of-the-day sort of time but like when-you-study-for-10-minutes sort of time. What format can we store in db.Time?
Code:
time = db.Column(db.Time, nullable=False)
What you're looking for is a "time interval". SQLite doesn't have that. In fact, it doesn't have a time type at all. So use whatever you want.
I would suggest storing the number of seconds in the interval as an integer. For "15 minutes" you would store 900.
SQLAlchemy does have an Interval type for storing datetime.timedelta objects which you could try. The docs say "the value is stored as a date which is relative to the “epoch” (Jan. 1, 1970)" which I think is just a fancy way to say they'll store the number of seconds.
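For illustration, a minimal sketch of both options in the Flask-SQLAlchemy style of the question (the model and column names are made up):
import datetime

class StudySession(db.Model):  # hypothetical model
    id = db.Column(db.Integer, primary_key=True)
    # Option 1: plain integer seconds -- 900 means "15 minutes"
    duration_seconds = db.Column(db.Integer, nullable=False)
    # Option 2: Interval, which accepts datetime.timedelta objects
    duration = db.Column(db.Interval, nullable=True)

row = StudySession(duration_seconds=900,
                   duration=datetime.timedelta(minutes=15))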

Update a specific field for each document based on a function

I have ~10k documents in my collection, with three fields (name, wait, utc).
The timestamps are too granular for my use, and I want to round them down to the last 10 minutes.
I created a function to modify these timestamps (I am rounding them via a function called round_to_10min(), which I import from another python file I have called utility_func.py).
It's not slick or anything but it works:
from datetime import datetime as dt
def round_to_10min(my_dt):
    hours = my_dt.hour
    minutes = (my_dt.minute // 10) * 10
    date = dt(my_dt.year, my_dt.month, my_dt.day)
    return dt(date.year, date.month, date.day, hours, minutes)
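As an aside, the same rounding can be written more compactly with datetime.replace; this is equivalent, since your version also discards seconds and microseconds:
def round_to_10min(my_dt):
    # zero out everything below the 10-minute boundary
    return my_dt.replace(minute=(my_dt.minute // 10) * 10,
                         second=0, microsecond=0)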
Is there a way for me to update the 'utc' field for each document in my collection, without taking the cursor and saving it into a list, iterating through it?
An example of what I would like to avoid having to do (it doesn't seem efficient):
alldocs = collection.find({})
for x in alldocs:
    old_value = x['utc']  # already a datetime if stored as a BSON date
    new_value = utility_func.round_to_10min(old_value)
    update_val = {"$set": {"utc": new_value}}
    collection.update_one({"_id": x['_id']}, update_val)
Here's where I think I should be headed, but the update argument has me stumped...
update_value = {'$set': {'utc': result_from_function}}
collection.update_many({}, update_value)
Is this achievable in pymongo?
The mechanism you are seeking will not work.
Pymongo supports MongoDB operations only. If you can find a way to achieve your goal using MongoDB operations, you can perform this in a single update_many or aggregate query.
If you prefer to use Python, then you're limited to your original approach of find, loop, update_one (though see the sketches below).
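For example, on MongoDB 5.0+ the $dateTrunc operator can express the rounding server-side in a single aggregation-pipeline update (this assumes utc is stored as a BSON date):
collection.update_many(
    {},
    [{"$set": {"utc": {"$dateTrunc": {"date": "$utc",
                                      "unit": "minute",
                                      "binSize": 10}}}}]
)
If you stay in Python, you can at least batch the per-document writes with bulk_write to cut down on round trips:
from pymongo import UpdateOne

ops = [UpdateOne({"_id": doc["_id"]},
                 {"$set": {"utc": utility_func.round_to_10min(doc["utc"])}})
       for doc in collection.find({}, {"utc": 1})]
if ops:
    collection.bulk_write(ops)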

Why is there a difference between the stored and queried time in Mongo database using PyMongo client?

I have a MongoDB database with foo objects. I'd like to record the time that they are added to the database. To do this, my approach is to store a time, add the foo to the database and set the start time. When I get the foo from the database, the start time attribute should match the time when I added it.
Below is my test which reflects this. The functions are not terribly important as they're essentially db.collection.insert or db.collection.find_one calls.
def test_set_foo_start_time(self):
    import datetime
    t = datetime.datetime.utcnow()
    foo_id = "42"
    print("\nAdding {0} to the database.".format(t))
    add_foo_to_database(foo_id)
    self.assertIsNotNone(get_foo_from_database(foo_id))
    set_foo_start_time(foo_id, t)
    time = get_foo_from_database(foo_id)['start_time']
    print("Extracting {0} from the database.".format(time))
    self.assertEqual(t, time)
When I run my test, I'm surprised to see that it fails:
test_set_foo_start_time (test_foo_misc.FooMiscellanyTestCase) ...
Adding 2016-10-10 17:01:16.559332 to the database.
Extracting 2016-10-10 17:01:16.559000+00:00 from the database.
FAIL
where the assertion error is:
Traceback (most recent call last):
  File "/Users/erip/Code/proj/tests/test_foo_misc.py", line 336, in test_set_foo_start_time
    self.assertEqual(t, time)
AssertionError: datet[36 chars], 559332) != datet[36 chars], 559000, tzinfo=<bson.tz_util.FixedOffset obj[15 chars]240>)
Why is there a difference in the times that are being stored and extracted? I could add a delta to check a range of times, but I'd prefer to store the exact time. How can I fix my test to account for these microsecond differences?
According to the BSON spec, date precision is limited to milliseconds (emphasis mine),
UTC datetime - The int64 is UTC milliseconds since the Unix epoch.
There are a few different ways to approach this, but it largely depends on whether you need sub-millisecond precision. If you don't, you can truncate to millisecond precision before storing:
now = datetime.datetime.utcnow()
now = now.replace(microsecond=(now.microsecond // 1000) * 1000)
You'd also want to ensure that no timezone info is carried along, as that can affect your equality check (see the sketch below).
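For instance, the test's comparison could be made robust like this (a sketch reusing the names from the question):
t = datetime.datetime.utcnow()
t = t.replace(microsecond=(t.microsecond // 1000) * 1000)  # keep millisecond precision only
set_foo_start_time(foo_id, t)
stored = get_foo_from_database(foo_id)['start_time']
# PyMongo returns timezone-aware UTC datetimes; strip tzinfo before comparing
self.assertEqual(t, stored.replace(tzinfo=None))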
If you need sub-millisecond precision, you could
Store a string-encoded timestamp that you can parse upon querying.
datetime.datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S.%f")
The upside is you maintain precision.
The obvious downside here is strings are limited to lexicographic sorting, and this adds another step both pre-insertion and post-query.
Store a long or double containing the timestamp.
t = datetime.datetime.utcnow() - datetime.datetime(1970, 1, 1)
milli = t.total_seconds() * 1000 # microseconds present here
The upside is you maintain precision and can now sort on it correctly.
The downside is you have an extra processing step again.
Store the date as a subdocument containing both the timestamp as a BSON UTC datetime (millisecond precision) and the sub-millisecond part as a long or double:
dt = datetime.datetime.utcnow()
t = {
    "utc": dt,
    "micro": dt.microsecond
}
The upside is again you've maintained precision and can (index and) sort on both fields.
The downside is the processing step and deconstructed time object.
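With that subdocument approach, reassembling the full-precision datetime after a query is one line (a sketch; some_id is a placeholder):
doc = collection.find_one({"_id": some_id})
full_dt = doc["t"]["utc"].replace(microsecond=doc["t"]["micro"])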

Python script for Sqlite time subtraction after midnight

I have a database of General Transit Feed Specification data for a city that defines transit service after midnight as hour > 24. So, in the stop_times table, we have many times defined for example as 25:00:00, 26:00:00, etc. Since I need to perform time subtraction on part of this database, I figured I'd write up a user-defined python script to handle this and use the python create_function sqlite command to associate it with my database.
For some reason, when I run the query I have in mind on this dataset, I get
sqlite3.OperationalError: user-defined function raised exception
Here's the time subtraction function I wrote to handle times after midnight. I'm sure it's a mess; if you have any tips for how to more efficiently handle this, I'd love to hear those as well. Thanks in advance.
import datetime

def time_delta(t1, t2):
    old_arrival = t1.encode('utf-8').split(':')
    old_departure = t2.encode('utf-8').split(':')
    new_arrival_string = "2013-03-16 %s:%s:%s" % (int(old_arrival[0]) - 12, old_arrival[1], old_arrival[2])
    new_arrival_format = "%Y-%m-%d %H:%M:%S"
    arr = datetime.datetime.strptime(new_arrival_string, new_arrival_format)
    new_departure_string = "2013-03-16 %s:%s:%s" % (int(old_departure[0]) - 12, old_departure[1], old_departure[2])
    new_departure_format = "%Y-%m-%d %H:%M:%S"
    dep = datetime.datetime.strptime(new_departure_string, new_departure_format)
    difference = arr - dep
    seconds = difference.seconds
    if difference.days < 0:
        difference = dep - arr
        seconds = (-1) * difference.seconds
    return seconds
Are you able to change the database schema? If so, one way to sidestep this problem might be to store arrival and departure times not as strings but as integer numbers of seconds since midnight (well, "noon minus 12h", as the spec defines), and change whatever tool you're using to load the database to convert from the "HH:MM:SS" format used in stop_times.txt.
Not only does this give you a nice, canonical representation of stop times that isn't bounded by any 24-hour limit, it makes it simple to compute time intervals and to construct database queries for stop times within specific time periods.
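A sketch of that loader-side conversion (GTFS stop times may legitimately exceed 24:00:00, so this deliberately avoids datetime parsing):
def gtfs_time_to_seconds(hhmmss):
    # "25:10:00" -> 90600 seconds; hours are allowed past 24
    h, m, s = (int(part) for part in hhmmss.split(':'))
    return h * 3600 + m * 60 + s

# time subtraction then becomes plain integer arithmetic:
# dwell = gtfs_time_to_seconds(departure) - gtfs_time_to_seconds(arrival)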

MongoDB date and removed objects

Yesterday I had a strange experience with MongoDB. I am using Twisted and txmongo, an asynchronous driver for MongoDB (similar to PyMongo).
I have a REST service that receives some data and puts it into MongoDB. One field is a timestamp in milliseconds (13 digits).
First of all, there is no trivial way to convert a millisecond timestamp into a Python datetime. I ended up with something like this:
import time
import datetime

def date2ts(ts):
    return int((time.mktime(ts.timetuple()) * 1000) + (ts.microsecond / 1000))

def ts2date(ts):
    return datetime.datetime.fromtimestamp(ts / 1000) + datetime.timedelta(microseconds=(ts % 1000))
The problem is that when I save the data to MongoDB, retrieve the datetime back, and convert it to a timestamp using my function, I don't get the same result in milliseconds.
I don't understand why this is happening. The datetime is saved in MongoDB as an ISODate object. I tried querying it from the shell, and there is indeed a difference of one second or a few milliseconds.
QUESTION 1: Does anybody know why is this happening?
But this is not all. I decided not to use datetime and to save the timestamp directly as a long. Before that, I removed all the data from the collection. I was quite surprised that when I tried to save the same field not as a date but as a long, it was still represented as ISODate in the shell. And when retrieved, there was still a difference of a few milliseconds.
I tried to drop the collection and the index. When that did not help, I tried to drop the entire database. When that did not help, I tried to drop the entire database and restart mongod. Only after this did it start to save the value as a Long.
QUESTION 2: Does anybody know why is this happening?
Thank you!
Python's timestamp is calculated in seconds since the Unix epoch of Jan 1, 1970. The timestamp in JavaScript (and in turn MongoDB), on the other hand, is in terms of milliseconds.
That said, if you only have the timestamps on hand, you can multiply the Python value by 1000 to get milliseconds and store that value in MongoDB. Likewise, you can take the value from MongoDB and divide it by 1000 to make it a Python timestamp. Keep in mind that Python only seems to care about two digits after the decimal point instead of three (as it doesn't typically deal in milliseconds), so bear that in mind if you are still seeing differences of < 10 milliseconds.
Normally I would suggest working with tuples instead, but the conventions for the value ranges are different for each language (JavaScript is unintuitive in that it starts months of the year at 0 instead of 1) and may cause issues down the road.
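As an aside on the "no trivial way" point: an epoch-based round trip avoids time.mktime's dependence on the local timezone (a sketch; it assumes naive datetimes that are already in UTC):
import datetime

EPOCH = datetime.datetime(1970, 1, 1)

def dt_to_millis(dt):
    # milliseconds since the Unix epoch
    return int((dt - EPOCH).total_seconds() * 1000)

def millis_to_dt(ms):
    return EPOCH + datetime.timedelta(milliseconds=ms)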
It could be a case of differing timezones. You can use a function like the one below to adjust for it:
function time_format(d, offset) {
    // Convert to UTC, then apply the desired offset (in hours)
    var utc = d.getTime() + (d.getTimezoneOffset() * 60000);
    var nd = new Date(utc + (3600000 * offset));
    return nd;
}
searchdate = time_format(searchdate, '+5.5');
'+5.5' here is the timezone difference from the local time to GMT time.
