I have ~10k documents in my collection, with three fields (name, wait, utc).
The timestamps are too granular for my use, and I want to round them down to the last 10 minutes.
I created a function to modify these timestamps: round_to_10min(), which I import from another Python file of mine called utility_func.py.
It's not slick or anything, but it works:
from datetime import datetime as dt

def round_to_10min(my_dt):
    hours = my_dt.hour
    minutes = (my_dt.minute // 10) * 10
    date = dt(my_dt.year, my_dt.month, my_dt.day)
    return dt(date.year, date.month, date.day, hours, minutes)
Is there a way for me to update the 'utc' field for each document in my collection, without pulling the cursor into a list and iterating through it?
An example of what I would like to avoid having to do (it doesn't seem efficient):
alldocs = collection.find({})
for x in alldocs:
    old_value = x['utc']  # already a datetime when stored as a BSON date
    new_value = utility_func.round_to_10min(old_value)
    update_val = {"$set": {"utc": new_value}}
    collection.update_one({"_id": x['_id']}, update_val)
Here's where I think I should be headed, but the update argument has me stumped...
update_value = {'$set': {'utc': result_from_function}}
collection.update_many({}, update_value)
Is this achievable in pymongo?
The mechanism you are seeking will not work.
Pymongo supports MongoDB operations only. If you can find a way to achieve your goal using MongoDB operations, you can perform this in a single update_many or aggregate query.
If you need to compute the new values in Python, then you're limited to your original approach of find, loop, update_one.
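As it happens, rounding down to 10 minutes is expressible as a MongoDB operation. A minimal sketch, assuming your server is MongoDB 5.0+ (which added $dateTrunc; update pipelines themselves need 4.2+) and that 'utc' is stored as a BSON date; the database and collection names are placeholders:

from pymongo import MongoClient

collection = MongoClient()["mydb"]["mycollection"]  # placeholder names

# One round trip: the update pipeline recomputes 'utc' server-side.
collection.update_many(
    {},
    [{"$set": {"utc": {"$dateTrunc": {
        "date": "$utc",
        "unit": "minute",
        "binSize": 10,  # floor to the nearest 10-minute boundary
    }}}}],
)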
Related
I'm creating a database with SQLAlchemy and SQLite. I have some database models and one of them needs a time. Not the time-of-the-day sort of time but like when-you-study-for-10-minutes sort of time. What format can we store in db.Time?
Code:
time = db.Column(db.Time, nullable=False)
What you're looking for is a "time interval". SQLite doesn't have that. In fact, it doesn't have a time type at all. So use whatever you want.
I would suggest storing the number of seconds in the interval as an integer. For "15 minutes" you would store 900.
SQLAlchemy does have an Interval type for storing datetime.timedelta objects which you could try. The docs say "the value is stored as a date which is relative to the “epoch” (Jan. 1, 1970)" which I think is just a fancy way to say they'll store the number of seconds.
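A minimal sketch of both options, assuming the usual db = SQLAlchemy(app) object from the question; the model and column names are made up for illustration:

import datetime

class StudySession(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    # option 1: plain seconds in an integer; "15 minutes" is stored as 900
    duration_s = db.Column(db.Integer, nullable=False)
    # option 2: SQLAlchemy's Interval type, which accepts timedelta objects
    duration = db.Column(db.Interval, nullable=False)

session = StudySession(
    duration_s=900,
    duration=datetime.timedelta(minutes=15),
)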
We have a Django/PostgreSQL database that contains objects with:
object_date = models.DateTimeField()
as a field.
We need to count the objects by hour per day, so we need to remove some of the extra time data, for example: minutes, seconds and microseconds.
We can remove the extra time data in python:
query = MyModel.objects.values('object_date')
data = [d['object_date'].replace(minute=0, second=0, microsecond=0)
        for d in query]
Which leaves us with a list containing the date and hour.
My Question: Is there a better, faster, cleaner way to do this in the query itself?
If you simply want to obtain the dates without the time data, you can use extra to declare calculated fields:
query = (MyModel.objects
    .extra(select={
        'object_date_group': 'CAST(object_date AS DATE)',
        'object_hour_group': 'EXTRACT(HOUR FROM object_date)',
    })
    .values('object_date_group', 'object_hour_group'))
You don't gain too much from just that, though; the database is now sending you even more data.
However, with these additional fields, you can use aggregation to instantly get the counts you were looking for, by adding one line:
query = (MyModel.objects
    .extra(select={
        'object_date_group': 'CAST(object_date AS DATE)',
        'object_hour_group': 'EXTRACT(HOUR FROM object_date)',
    })
    .values('object_date_group', 'object_hour_group')
    .annotate(count=Count('*')))
Alternatively, you could use any valid SQL to combine the two fields I made into a single field, for example by formatting them into a string. The nice thing about doing that is that you can then use the tuples to construct a Counter for convenient querying (use values_list()).
This query will certainly be more efficient than doing the counting in Python. For a background job that may not be so important, however.
One downside is that this code is not portable; for one, it does not work on SQLite, which you may still be using for testing purposes. In that case, you might save yourself the trouble and write a raw query right away, which will be just as unportable but more readable.
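A sketch of that raw-query route, using Django's database cursor directly; the table name myapp_mymodel is a placeholder for your app's actual table:

from django.db import connection

with connection.cursor() as cursor:
    cursor.execute("""
        SELECT CAST(object_date AS DATE) AS day,
               EXTRACT(HOUR FROM object_date) AS hour,
               COUNT(*) AS count
        FROM myapp_mymodel
        GROUP BY 1, 2
    """)
    rows = cursor.fetchall()  # [(date, hour, count), ...]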
Update
As of Django 1.10 it is possible to perform this query nicely using expressions, thanks to the addition of TruncHour. Here's a suggestion for how the solution could look:
from collections import Counter
from django.db.models import Count
from django.db.models.functions import TruncHour
counts_by_group = Counter(dict(
    MyModel.objects
    .annotate(object_group=TruncHour('object_date'))
    .values_list('object_group')
    .annotate(count=Count('object_group'))
))  # query with counts_by_group[datetime.datetime(year, month, day, hour)]
It's elegant, efficient and portable. :)
count = len(MyModel.objects.filter(object_date__range=(beginning_of_hour, end_of_hour)))
or
count = MyModel.objects.filter(object_date__range=(beginning_of_hour, end_of_hour)).count()
Assuming I understand what you're asking for, this returns the number of objects that have a date within a specific time range. Set the range to be from the beginning of the hour until the end of the hour and you will return all objects created in that hour. Count() or len() can be used depending on the desired use. For more information on that check out https://docs.djangoproject.com/en/1.9/ref/models/querysets/#count
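A sketch of building those boundaries; beginning_of_hour and end_of_hour are not defined in the question, so the construction below is illustrative, assuming naive local datetimes:

import datetime

now = datetime.datetime.now()
beginning_of_hour = now.replace(minute=0, second=0, microsecond=0)
end_of_hour = beginning_of_hour + datetime.timedelta(hours=1)

count = MyModel.objects.filter(
    object_date__range=(beginning_of_hour, end_of_hour),
).count()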
I'm using the Proficy Historian SDK with Python 2.7. I can create a data record object, add the query criteria attributes (sample type, start time, end time, sample interval in milliseconds) and use datarecord.QueryRecordset() to execute a query.
The problem I'm facing is that QueryRecordset seems to work only for small result sets (a few hundred records at most), i.e. a small date range; otherwise it returns no results for any of the SCADA tags. I can sometimes get it to return more (a few thousand) records by slowly incrementing the date range, but it seems unreliable. So, is there a way to fix this, or a different way to do or set up the query? Most of my queries contain multiple tags. Otherwise, I guess I'll just have to execute the query repeatedly, sliding the date range and pulling a few hundred records at a time.
Update:
I'm performing the query using the following steps:
from win32com.client.gencache import EnsureDispatch
from win32com.client import constants as c
import datetime
ihApp = EnsureDispatch('iHistorian_SDK.Server')
drecord = ihApp.Data.NewRecordset()
drecord.Criteria.Tags = ['Tag_1', 'Tag_2', 'Tag_3']
drecord.Criteria.SamplingMode = c.Calculated
drecord.Criteria.CalculationMode = c.Average
drecord.Criteria.Direction = c.Forward
drecord.Criteria.NumberOfSamples = 0 # This is the default value
drecord.Criteria.SamplingInterval = 30*60*1000 # 30 min interval in ms
# I've tried using the win32com pytime type instead of datetime, but it
# doesn't make a difference
drecord.Criteria.StartTime = datetime.datetime(2015, 11, 1)
drecord.Criteria.EndTime = datetime.datetime(2015, 11, 10)
# Run the query
drecord.Fields.Clear()
drecord.Fields.AllFields()
drecord.QueryRecordset()
One problem that may be happening is the use of dates/times in dd/mm/yyyy hh:mm format. When I create a pytime or datetime object, the individual attributes (e.g. year, day, month, hour, minute) are all correct before and after assignment to drecord.Criteria.StartTime and drecord.Criteria.EndTime, but when I print the variable it always comes out in mm/dd/yyyy hh:mm format; that is probably just down to the object's str or repr method.
So, it turns out there were two properties that could be adjusted to increase the number of samples returned and time before a timeout occurred. Both properties are set on the server object (ihApp):
ihApp.MaximumQueryIntervals = MaximumQueryIntervals # Default is 50000
ihApp.MaximumQueryTime = MaximumQueryTime # Default is 60 (seconds)
Increasing both these values seemed to fix my problems. Some tags definitely seem to take longer to query than others (over the same time period and same sampling method), so increasing the max. query time helped make returning query data more reliable.
When QueryRecordset() completes, it returns False if there was an error and doesn't populate any of the data records. The error type can be shown using:
drecord.LastError
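Putting it together, a sketch based only on the properties and methods shown above; the specific values are arbitrary:

ihApp.MaximumQueryIntervals = 500000  # raised from the 50000 default
ihApp.MaximumQueryTime = 300          # seconds; raised from the 60 default

if not drecord.QueryRecordset():      # returns False on error
    print(drecord.LastError)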
I have a simple table in Google App Engine with a date field. I want to query all the rows with the date field valued between now and 6 hours ago. How do I form this query?
I know you say GQL, but here's a python helper function I use:
import datetime

def seconds_ago(time_s):
    return datetime.datetime.now() - datetime.timedelta(seconds=time_s)
There may well be a more concise way to write it: I'm not a python expert and went with the first thing that worked. Take a look at the datetime docs if you care. It's used like this:
my_query = MyTable.all().filter("date >", seconds_ago(6*60*60))
I'm sure that can be translated to GQL without much bother, but I prefer the object-oriented interface, and I don't know the necessary DATETIME syntax.
In python the query is then used like this:
# get a count
my_query.count()

# get up to 1000 records
my_query.fetch(1000)

# iterate over up to 1000 records
for result in my_query:
    # do something with result
    pass
SELECT * FROM simpletable
WHERE datefield > DATETIME(year, month, day, hour, minute, second)

computing those year, month, etc. values (for the instant six hours ago) in your application code.
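For completeness, a sketch of wiring the computed cutoff into GQL from Python, reusing the seconds_ago() helper above; parameter binding avoids formatting the DATETIME literal by hand (db.GqlQuery is part of the old App Engine db API):

from google.appengine.ext import db

cutoff = seconds_ago(6 * 60 * 60)  # six hours ago
results = db.GqlQuery(
    "SELECT * FROM simpletable WHERE datefield > :1", cutoff).fetch(1000)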
Yesterday I had a strange experience with MongoDB. I am using twisted and txmongo, an asynchronous driver for MongoDB (similar to pymongo).
I have a REST service that receives some data and puts it into MongoDB. One field is a timestamp in milliseconds, 13 digits.
First of all, there is no trivial way to convert a millisecond timestamp into a Python datetime. I ended up with something like this:
import time
import datetime

def date2ts(ts):
    return int((time.mktime(ts.timetuple()) * 1000) + (ts.microsecond / 1000))

def ts2date(ts):
    # the remainder is milliseconds, so it belongs in milliseconds=,
    # not microseconds=
    return (datetime.datetime.fromtimestamp(ts / 1000)
            + datetime.timedelta(milliseconds=(ts % 1000)))
The problem is that when I save the data to MongoDB, retrieve the datetime back and convert it to a timestamp using my function, I don't get the same result in milliseconds.
I did not understand why this was happening. The datetime is saved in MongoDB as an ISODate object. I tried to query it from the shell, and there is indeed a difference of a second or a few milliseconds.
QUESTION 1: Does anybody know why is this happening?
But this is not over. I decided not to use datetime and to save the timestamp directly as a long. Before that I removed all the data from the collection. I was quite surprised that when I tried to save the same field not as a date but as a long, it was still represented as an ISODate in the shell. And when retrieved, there was still a difference of a few milliseconds.
I tried to drop the collection and the index. When that did not help, I tried to drop the entire database. When that did not help, I tried to drop the entire database and restart mongod. And after this, I guess, it started to save it as a Long.
QUESTION 2: Does anybody know why is this happening?
Thank you!
Python's timestamp is calculated in seconds since the Unix epoch of Jan 1, 1970. The timestamp in JavaScript (and in turn MongoDB), on the other hand, is in terms of milliseconds.
That said, if you only have the timestamps on hand, you can multiply the Python value by 1000 to get milliseconds and store that value in MongoDB. Likewise, you can take the value from MongoDB and divide it by 1000 to make it a Python timestamp. Keep in mind that Python only seems to care about two significant digits after the decimal point instead of three (as it doesn't typically care about milliseconds), so you may still see differences of < 10 milliseconds.
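A quick sketch of that round trip; the int() truncation is my own choice, to keep the stored value an integer:

import time

py_ts = time.time()           # Python timestamp: seconds, as a float
mongo_ts = int(py_ts * 1000)  # milliseconds, as JavaScript/MongoDB expect
back = mongo_ts / 1000.0      # back to a Python-style timestamp in seconds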
Normally I would suggest working with tuples instead, but the conventions for the value ranges are different for each language (JavaScript is unintuitive in that it numbers months from 0 instead of 1) and may cause issues down the road.
It can be a timezone issue. Use the function below to correct for it.
function time_format(d, offset) {
    // convert to UTC, then apply the desired offset (in hours)
    var utc = d.getTime() + (d.getTimezoneOffset() * 60000);
    var nd = new Date(utc + (3600000 * offset));
    return nd;
}
searchdate = time_format(searchdate, '+5.5');
'+5.5' here is the target timezone's offset from GMT, in hours.