I have a Python 2.7 API that queries a SQL db and delivers a JSON list of dictionaries that is then used in a bootstrap/Django site.
Dates in the DB are strings in the format '2017-04-20 00:00:00', but sometimes the time of the source data instead has a decimal, which causes trouble with strptime, so I'm removing the seconds by keeping only the first 10 characters of the string.
import datetime
dict_list = response['my_list_of_dicts']
for dt_to_cmpr in dict_list:
dt_to_cmpr['date_key'] = dt_to_cmpr['date_key'][:10]
Before I can compare date ranges, the dates need to be date time not strings. (Note: For production, I plan to account for exceptions such as null values.)
dt_to_cmpr['date_key'] = datetime.datetime.strptime(dt_to_cmpr['date_key'],
'%Y-%m-%d')
I want to know things about dictionaries where date_key is roughly no more than 90 days from today. (i.e. the total number in the time frame, or the sum of every dictionary's price_key.)
under_days = datetime.timedelta(days=-1)
over_days = datetime.timedelta(days=91)
now = datetime.datetime.now()
ttl_within_90days = sum(1 for d in response['my_list_of_dicts'] if (under_days <
(d.get('date_key')-now) < over_days))
One problem is now that I've converted my dates, the are not JSON serializable. So, now I have to put them back into a string again
for dt_to_cmpr in dict_list:
dt_to_cmpr['date_key'] = dt_to_cmpr['date_key'].strftime("%Y-%m-%d")
I cleaned up the above for simplicity, but that should all work. When it gets to Django, the view is going to covert them all back to date time again for use in a template.
Can I have Python just treat my date strings as time for the 90 day comparison, but leave them alone. Or, maybe have JSON use the Python date times? That much iteration every page load is slow, and can't be the best way.
The main problem is the way you're storing the datetimes. You should probably be storing them as actual datetimes in your database, not strings. You can't do date queries on string fields. Instead, you have to use the inefficient method of querying all the records and then filtering all of them in python after the fact. Database data types were created for a reason, use them.
There's no reason to convert datetimes to strings except at the very last moment when you need to format it for json or html, and the only bit of code that should need to do that is the Django app. That means:
Your Django app should almost entirely be using datetimes. It only coverts to strings when it needs to render out html or json.
Your API should only use python datetimes.
Your database should only use datetimes as well.
If you don't control the database, the best case is going to be 2 conversions
string -> datetime when pulling data out of the database.
datetime -> string when serializing to html or json.
If you can fix the database, then you only need to do the 2nd conversion.
Related
We've a Django, Postgresql database that contains objects with:
object_date = models.DateTimeField()
as a field.
We need to count the objects by hour per day, so we need to remove some of the extra time data, for example: minutes, seconds and microseconds.
We can remove the extra time data in python:
query = MyModel.objects.values('object_date')
data = [tweet['tweet_date'].replace(minute=0, second=0, microsecond=0) for tweet in query
Which leaves us with a list containing the date and hour.
My Question: Is there a better, faster, cleaner way to do this in the query itself?
If you simply want to obtain the dates without the time data, you can use extra to declare calculated fields:
query = MyModel.objects
.extra(select={
'object_date_group': 'CAST(object_date AS DATE)',
'object_hour_group': 'EXTRACT(HOUR FROM object_date)'
})
.values('object_date_group', 'object_hour_group')
You don't gain too much from just that, though; the database is now sending you even more data.
However, with these additional fields, you can use aggregation to instantly get the counts you were looking for, by adding one line:
query = MyModel.objects
.extra(select={
'object_date_group': 'CAST(object_date AS DATE)',
'object_hour_group': 'EXTRACT(HOUR FROM object_date)'
})
.values('object_date_group', 'object_hour_group')
.annotate(count=Count('*'))
Alternatively, you could use any valid SQL to combine what I made two fields into one field, by formatting it into a string, for example. The nice thing about doing that, is that you can then use the tuples to construct a Counter for convenient querying (use values_list()).
This query will certainly be more efficient than doing the counting in Python. For a background job that may not be so important, however.
One downside is that this code is not portable; for one, it does not work on SQLite, which you may still be using for testing purposes. In that case, you might save yourself the trouble and write a raw query right away, which will be just as unportable but more readable.
Update
As of 1.10 it is possible to perform this query nicely using expressions, thanks to the addition of TruncHour. Here's a suggestion for how the solution could look:
from collections import Counter
from django.db.models import Count
from django.db.models.functions import TruncHour
counts_by_group = Counter(dict(
MyModel.objects
.annotate(object_group=TruncHour('object_date'))
.values_list('object_group')
.annotate(count=Count('object_group'))
)) # query with counts_by_group[datetime.datetime(year, month, day, hour)]
It's elegant, efficient and portable. :)
count = len(MyModel.objects.filter(object_date__range=(beginning_of_hour, end_of_hour)))
or
count = MyModel.objects.filter(object_date__range=(beginning_of_hour, end_of_hour)).count()
Assuming I understand what you're asking for, this returns the number of objects that have a date within a specific time range. Set the range to be from the beginning of the hour until the end of the hour and you will return all objects created in that hour. Count() or len() can be used depending on the desired use. For more information on that check out https://docs.djangoproject.com/en/1.9/ref/models/querysets/#count
I have having a problem with inserting date values into an SQL query. I am using sqlite3 and python. The query is:
c.execute("""SELECT tweeterHash.* FROM tweeterHash, tweetDates WHERE
Date(tweetDates.start) > Date(?) AND
Date(tweetDates.end) > Date(?)""",
(start,end,))
The query doesn't return any values, and there is no error message. If I use this query:
c.execute("""SELECT tweeterHash.* FROM tweeterHash, tweetDates WHERE
Date(tweetDates.start) > Date(2014-01-01) AND
Date(tweetDates.end) > Date(2015-01-01)""")
Then I get the values that I want, which is as expected?
The values start and end come from a text file:
f = open('dates.txt','r')
start = f.readline().strip('\n')
end = f.readline().strip('\n')
but I have also just tried declaring it as well:
start = '2014-01-01'
end = '2015-01-01'
I guess I don't understand why passing the string in from the start and end variables doesn't work? What is the best way to pass a date variable into a SQL query? Any help is greatly appreciated.
These aren't the same dates—and it's the non-parameterized ones you've got wrong.
Date(2014-01-01) calculates the arithmetic expression 2014 - 01 - 01, then constructs a Date from the resulting number 2012, which will get you something in 4707 BC.
Date('2014-01-01'), or Date(?) where the parameter is the string '2014-01-01', constructs the date you want, in 2014 AD.
You can see this more easily by just selecting dates directly:
>>> cur.execute('SELECT Date(2014-01-01), Date(?)', ['2014-01-01'])
>>> print(cur.fetchone())
('-4707-05-28', '2014-01-01')
Meanwhile:
What is the best way to pass a date variable into a SQL query?
Ideally, use actual date objects instead of strings. The sqlite3 library knows how to handle datetime.datetime and datetime.date. And don't call Date on the values, just compare them. (Yes, sqlite3 might then compare them as strings instead of dates, but the whole point of using ISO8601-like formats is that this always gives the same result… unless of course you have a bunch of dates from 4707 BC lying around.) So:
start = datetime.date(2014, 1, 1)
end = datetime.date(2015, 1, 1)
c.execute("""SELECT tweeterHash.* FROM tweeterHash, tweetDates WHERE
tweetDates.start > ? AND
tweetDates.end > ?""",
(start,end,))
And would this also mean that when I create the table, I would want: " start datetime, end datetime, "?
That would work, but I wouldn't do that. Python will convert date objects to ISO8601-format strings, but not convert back on SELECT, and SQLite will let you transparently compare those strings to the values returned by the Date function.
You could get the same effect with TEXT, but I believe you'd find it less confusing, DATETIME will set the column affinity to NUMERIC, which can confuse both humans and other tools when you're actually storing strings.
Or you could use the type DATE—which is just as meaningless to SQLite as DATETIME, but it can tell Python to transparently convert return values into datetime.date objects. See Default adapters and converters in the sqlite3 docs.
Also, if you haven't read Datatypes in SQLite Version 3 and SQLite and Python types, you really should; there are a lot of things that are both surprising (even—or maybe especially—if you've used other databases), and potentially very useful.
Meanwhile, if you think you're getting the "right" results from passing Date(2014-01-01) around, that means you've actually got a bunch of garbage values in your database. And there's no way to fix them, because the mistake isn't reversible. (After all, 2014-01-01 and 2015-01-02 are both 2012…) Hopefully you either don't need the old data, or can regenerate it. Otherwise, you'll need some kind of workaround that lets you deal with existing data as usefully as possible under the circumstances.
I have chapter times in the form of HH:MM:SS. I am parsing them from a document, and I will have times as a string in the format of '00:12:14'. How would I store this in a mysql column, and then retrieve it in the required format to be able to:
1) order by time;
2) convert to a string in the above format.
I suggest you look at the MySQL time type. It will allow you to sort and format as you wish.
http://dev.mysql.com/doc/refman/5.0/en/time.html
Use the TIME type.
It allows "time values to be represented in several formats, such as quoted strings or as numbers, depending on the exact type of the value and other factors." In addition, you can perform various functions to manipulate the time.
If I have such a simple task, I choose a simple solution: I would choose the python datetime.time module (see: datetime.time) and store a TIME object using strftime.
Loading it back in is a little painful as you would have to split your string at : and then pass the values to the time constructor. Example:
def load(timestr):
hours,minutes,seconds = timestr.split(":")
return datetime.time(hours,minutes,seconds)
Hope this helps.
Yesterday I had some strange experience with MongoDB. I am using twisted and txmongo - an asynchronous driver for mongodb (similar to pymongo).
I have a rest service where it receives some data and put it to mongodb. One field is timestamp in milliseconds, 13 digits.
First of all ther is no trivial way to convert millisecond timestamp into python datetime in python. I ended up with something like this:
def date2ts(ts):
return int((time.mktime(ts.timetuple()) * 1000) + (ts.microsecond / 1000))
def ts2date(ts):
return datetime.datetime.fromtimestamp(ts / 1000) + datetime.timedelta(microseconds=(ts % 1000))
The problem is that when I save the data to mongodb, retreive datetime back and convert it back to timestamp using my function I don't get the same result in milliseconds.
I did not understand why is it happening. Datetime is saved in mongodb as ISODate object. I tried to query it from shell and there is indeed difference in one second or few millisoconds.
QUESTION 1: Does anybody know why is this happening?
But this is not over. I decided not to use datetime and to save timestamp directly as long. Before that I removed all the data from collection. I was quite surprised that when I tried to save same field not as date but as long, it was represented as ISODate in shell. And when retrieved there was still difference in few milliseconds.
I tried to drop the collection and index. When it did not help I tried to drop entire database. When it did not help I tried to drop entire database and to restart mongod. And after this I guess it started to save it as Long.
QUESTION 2: Does anybody know why is this happening?
Thank you!
Python's timestamp is calculated in seconds since the Unix epoch of Jan 1, 1970. The timestamp in JavaScript (and in turn MongoDB), on the other hand, is in terms of milliseconds.
That said, if you have only have the timestamps on hand, you can multiple the Python value by 1000 to get milliseconds and store that value into MongoDB. Likewise, you can take the value from MongoDB and divide it by 1000 to make it a Python timestamp. Keep in mind that Python only seems to care for two significant digits after the decimal point instead of three (as it doesn't typically care for milliseconds) so keep that in mind if you are still having differences of < 10 milliseconds.
Normally I would suggest working with tuples instead, but the conventions for the value ranges are different for each language (JavaScript is unintuitive in that it starts days of the month at 0 instead of 1) and may cause issues down the road.
It can be the case of different timezone's. Please use the below mentioned function to rectify it.
function time_format(d, offset) {
// Set timezone
utc = d.getTime() + (d.getTimezoneOffset() * 60000);
nd = new Date(utc + (3600000*offset));
return nd;
}
searchdate = time_format(searchdate, '+5.5');
'+5.5' here is the timezone difference from the local time to GMT time.
SQLite docs specifies that the preferred format for storing datetime values in the DB is to use Julian Day (using built-in functions).
However, all frameworks I saw in python (pysqlite, SQLAlchemy) store the datetime.datetime values as ISO formatted strings. Why are they doing so?
I'm usually trying to adapt the frameworks to storing datetime as julianday, and it's quite painful. I started to doubt that is worth the efforts.
Please share your experience in this field with me. Does sticking with julianday make sense?
Julian Day is handy for all sorts of date calculations, but it can's store the time part decently (with precise hours, minutes, and seconds). In the past I've used both Julian Day fields (for dates), and seconds-from-the-Epoch (for datetime instances), but only when I had specific needs for computation (of dates and respectively of times). The simplicity of ISO formatted dates and datetimes, I think, should make them the preferred choice, say about 97% of the time.
Store it both ways. Frameworks can be set in their ways and if yours is expecting to find a raw column with an ISO formatted string then that is probably more of a pain to get around than it's worth.
The concern in having two columns is data consistency but sqlite should have everything you need to make it work. Version 3.3 has support for check constraints and triggers. Read up on date and time functions. You should be able to do what you need entirely in the database.
CREATE TABLE Table1 (jd, isotime);
CREATE TRIGGER trigger_name_1 AFTER INSERT ON Table1
BEGIN
UPDATE Table1 SET jd = julianday(isotime) WHERE rowid = last_insert_rowid();
END;
CREATE TRIGGER trigger_name_2 AFTER UPDATE OF isotime ON Table1
BEGIN
UPDATE Table1 SET jd = julianday(isotime) WHERE rowid = old.rowid;
END;
And if you cant do what you need within the DB you can write a C extension to perform the functionality you need. That way you wont need to touch the framework other than to load your extension.
But typically, the Human doesn't read directly from the database. Fractional time on a Julian Day is easily converted to human readible by (for example)
void hour_time(GenericDate *ConvertObject)
{
double frac_time = ConvertObject->jd;
double hour = (24.0*(frac_time - (int)frac_time));
double minute = 60.0*(hour - (int)hour);
double second = 60.0*(minute - (int)minute);
double microsecond = 1000000.0*(second - (int)second);
ConvertObject->hour = hour;
ConvertObject->minute = minute;
ConvertObject->second = second;
ConvertObject->microsecond = microsecond;
};
Because 2010-06-22 00:45:56 is far easier for a human to read than 2455369.5318981484. Text dates are great for doing ad-hoc queries in SQLiteSpy or SQLite Manager.
The main drawback, of course, is that text dates require 19 bytes instead of 8.