I have the following dateTime column, stored as text, in a Postgres table:
"2016-05-12T23:59:11+00:00"
"2016-05-13T11:00:11+00:00"
"2016-05-13T23:59:11+00:00"
"2016-05-15T10:10:11+00:00"
"2016-05-16T10:10:11+00:00"
"2016-05-17T10:10:11+00:00"
I have to write a Python function to extract the data for a few variables between two dates
def fn(dateTime):
    df1=pd.DataFrame()
    query = """ SELECT "recordId" from "Table" where "dateTime" BETWEEN %s AND %s """ %(dStart,dEnd)
    df1=pd.read_sql_query(query1,con=engine)
    return df1
I need to create dStart and dEnd variables and use them as function parameters as below
fn('2016-05-12','2016-05-15')
I tried using to_char("dateTime", 'YYYY-MM-DD') Postgres function but didn't work out. Please let me know how to solve this
When working with SQL, you should always let your SQL library substitute parameters into the query instead of using Python's string operators. This avoids the risk of malformed queries and SQL injection attacks. See e.g., this page. Right now your code won't run because it inserts dStart and dEnd directly, without any quoting, so they are interpreted as arithmetic expressions (2016 - 5 - 12 = 1999).
There's also a secondary problem: your query will exclude dateTime values that fall on the end date, because dEnd will be treated as having a time of 00:00:00 when it is compared to dateTime. And if you use to_char() or some other function to extract just the date from the dateTime column for the comparison, the query can no longer use an ordinary index on that column, which makes it very inefficient on large tables.
Here is some revised code that may work for you:
# assumes `import pandas as pd` and an existing SQLAlchemy `engine`
def fn(dStart, dEnd):
    query = """
        SELECT "recordId"
        FROM "Table"
        WHERE "dateTime" >= %(start)s AND "dateTime" < %(end)s + interval '1 day'
    """
    query_params = {'start': dStart, 'end': dEnd}
    # let the database driver substitute the parameters safely
    df1 = pd.read_sql_query(query, con=engine, params=query_params)
    return df1
This code relies on a few assumptions (welcome to the wonderful world of datetime querying!):
you will pass dStart and dEnd to fn(), instead of just a single dateTime,
the dateTime column is type timestamp with timezone (not text),
the timezones in the dateTime column are correct, and
the dates given by dStart and dEnd are in the server's timezone or you have used SET TIMEZONE ... with your engine object to select the right time zone to use for this session.
Notes
Different database engines use different placeholders for the parameters, so you will need to check your database driver's documentation to decide what placeholders to use. The code above should work fine for postgresql.
With the code above, dStart and dEnd will be inserted into the query as strings, and postgresql automatically converts them into timestamps when it runs the query. This should work fine for the example dates you gave, but if you need more direct control, you have two options:
call fn() with Python date or datetime values for dStart and dEnd, and the code above will insert them into the query as postgresql dates or timestamps; or
explicitly convert the dStart and dEnd strings into postgresql dates by replacing %(start)s and %(end)s with something like this: to_date(%(start)s, 'YYYY-MM-DD').
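The first option can be sketched as follows. This is only an illustration, assuming the inputs are 'YYYY-MM-DD' strings; date_bounds is a hypothetical helper, not part of pandas or any database driver. It converts the strings to Python date objects and moves the end date forward by one day, so the query could compare with "dateTime" < %(end)s and drop the interval arithmetic.

```python
from datetime import datetime, timedelta


def date_bounds(d_start, d_end):
    """Return (start, exclusive_end) as date objects for an inclusive date range.

    Hypothetical helper: parses 'YYYY-MM-DD' strings and shifts the end
    by one day so the range can be queried as start <= x < exclusive_end.
    """
    start = datetime.strptime(d_start, "%Y-%m-%d").date()
    end = datetime.strptime(d_end, "%Y-%m-%d").date()
    return start, end + timedelta(days=1)


start, end_exclusive = date_bounds("2016-05-12", "2016-05-15")

# These values would be passed as params=query_params to pd.read_sql_query.
query_params = {"start": start, "end": end_exclusive}
```

Parsing the strings up front also rejects malformed input (for example '2016-13-40') before anything is sent to the database.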
I'm not familiar with postgresql, but you can convert the strings to the struct_time class which is part of the built in time package in Python and simply make comparisons between them.
import time
time_data = ["2016-05-12T23:59:11+00:00",
"2016-05-13T11:00:11+00:00",
"2016-05-13T23:59:11+00:00",
"2016-05-15T10:10:11+00:00",
"2016-05-16T10:10:11+00:00",
"2016-05-17T10:10:11+00:00"]
def fn(t_init, t_fin, t_all):
    # Convert string inputs to struct_time using time.strptime()
    t_init, t_fin = [time.strptime(x, '%Y-%m-%d') for x in [t_init, t_fin]]
    t_all = [time.strptime(x, '%Y-%m-%dT%H:%M:%S+00:00') for x in t_all]
    # Collect the indices of the timestamps strictly between the two dates
    out = []
    for jj in range(len(t_all)):
        if t_init < t_all[jj] < t_fin:
            out.append(jj)
    return out
out = fn('2016-05-12','2016-05-15', time_data)
print(out)
# [0, 1, 2]
The time.strptime routine uses format specifiers to indicate which parts of the string correspond to which time components.
%Y Year with century as a decimal number.
%m Month as a decimal number [01,12].
%d Day of the month as a decimal number [01,31].
%H Hour (24-hour clock) as a decimal number [00,23].
%M Minute as a decimal number [00,59].
%S Second as a decimal number [00,61].
%z Time zone offset from UTC.
%a Locale's abbreviated weekday name.
%A Locale's full weekday name.
%b Locale's abbreviated month name.
%B Locale's full month name.
%c Locale's appropriate date and time representation.
%I Hour (12-hour clock) as a decimal number [01,12].
%p Locale's equivalent of either AM or PM.
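The same filtering can also be sketched with the datetime module, which keeps the UTC offset instead of matching it as literal text. This is a minimal sketch, assuming Python 3.7 or later (for datetime.fromisoformat) and assuming the start and end dates are meant as UTC days; rows_between is a hypothetical name. Unlike the loop above, it includes timestamps that fall on the end date.

```python
from datetime import datetime, timedelta, timezone

time_data = [
    "2016-05-12T23:59:11+00:00",
    "2016-05-13T11:00:11+00:00",
    "2016-05-13T23:59:11+00:00",
    "2016-05-15T10:10:11+00:00",
    "2016-05-16T10:10:11+00:00",
    "2016-05-17T10:10:11+00:00",
]


def rows_between(d_start, d_end, stamps):
    """Return indices of stamps with d_start <= stamp < d_end + 1 day (UTC)."""
    start = datetime.strptime(d_start, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    end = datetime.strptime(d_end, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    end += timedelta(days=1)  # make the end date inclusive
    return [
        index
        for index, stamp in enumerate(stamps)
        if start <= datetime.fromisoformat(stamp) < end
    ]


selected = rows_between("2016-05-12", "2016-05-15", time_data)
print(selected)
```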
I am having trouble passing a datetime.time variable into a SQLite database, I have some very basic code here to show what exactly the variable is.
import datetime as dt
time = dt.datetime.now().time()
time = time.strftime('%H:%M')
time = dt.datetime.strptime(time, '%H:%M').time()
print(time)
print(type(time))
time = dt.datetime.now().time() gets the current time in type datetime.time.
Output:
17:34:48.286215
<class 'datetime.time'>
time = time.strftime('%H:%M') is then retrieving just the hour and minute but is of type str
Output:
17:35
<class 'str'>
I then convert it back to a datetime.time with time = dt.datetime.strptime(time, '%H:%M').time(), which gives the output:
17:32:00
<class 'datetime.time'>
The column of type Time accepts the format HH:MM, as shown in the documentation (SQLite3 DateTime Documentation), so I am not sure why I am getting this error:
sqlite3.InterfaceError: Error binding parameter 11 - probably unsupported type.
From this INSERT statement:
cursor.execute("INSERT INTO booked_tickets VALUES (?,?,?,?,?,?,?,?,?,?,?,?)", (booking_ref, ticket_date, film, showing, ticket_type, num_tickets, cus_name, cus_phone, cus_email, ticket_price, booking_date, booking_time, ))
EDIT: As requested, here is a snippet of code to recreate the table with the broken columns:
import datetime as dt
import sqlite3
connection = sqlite3.connect("your_database.db")
cursor = connection.cursor()
# Get the current time
time = dt.datetime.now().time()
# Format the time as a string using the '%H:%M' format
time_str = time.strftime('%H:%M')
# Parse the string back to a time object using the '%H:%M' format
time = dt.datetime.strptime(time_str, '%H:%M').time()
# Create the table
cursor.execute("CREATE TABLE test (example_time Time)")
# Insert the time into the example_time column
cursor.execute("INSERT INTO test VALUES (?)", (time, ))
connection.commit()
connection.close()
There is no Date or Time data type in SQLite.
The documentation from the link that you have in your question clearly states that in SQLite you can store datetime in 3 ways: text in ISO-8601 format, integer unix epochs and float julian days.
If you chose the first way then you should pass strings:
booking_date = dt.datetime.now().date().strftime('%Y-%m-%d')
booking_time = dt.datetime.now().time().strftime('%H:%M:00')
sql = "INSERT INTO booked_tickets VALUES (?,?,?,?,?,?,?,?,?,?,?,?)"
cursor.execute(sql, (booking_ref, ticket_date, film, showing, ticket_type, num_tickets, cus_name, cus_phone, cus_email, ticket_price, booking_date, booking_time))
But, you could also let SQLite get the current date and/or time.
Assuming that in the columns booking_date and booking_time you want the current date and time, you can define these columns as:
booking_date TEXT NOT NULL DEFAULT CURRENT_DATE,
booking_time TEXT NOT NULL DEFAULT CURRENT_TIME
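and then you don't need to pass anything for them in the INSERT statement. Because you now supply fewer values than the table has columns, the statement has to name the columns it does fill (the column names below are assumed to match the variable names):
sql = "INSERT INTO booked_tickets (booking_ref, ticket_date, film, showing, ticket_type, num_tickets, cus_name, cus_phone, cus_email, ticket_price) VALUES (?,?,?,?,?,?,?,?,?,?)"
cursor.execute(sql, (booking_ref, ticket_date, film, showing, ticket_type, num_tickets, cus_name, cus_phone, cus_email, ticket_price,))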
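The default-value approach can be tried out with an in-memory database. This is a cut-down sketch, not the asker's real schema: the table here has only three columns, and the booking reference 'REF001' is made up for illustration. The INSERT names only the column it fills, so SQLite supplies the two defaults.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-in for booked_tickets with the two defaulted columns.
conn.execute(
    """
    CREATE TABLE booked_tickets (
        booking_ref  TEXT,
        booking_date TEXT NOT NULL DEFAULT CURRENT_DATE,
        booking_time TEXT NOT NULL DEFAULT CURRENT_TIME
    )
    """
)

# Only booking_ref is supplied; SQLite fills in the date and time itself.
conn.execute("INSERT INTO booked_tickets (booking_ref) VALUES (?)", ("REF001",))

row = conn.execute(
    "SELECT booking_ref, booking_date, booking_time FROM booked_tickets"
).fetchone()
print(row)
conn.close()
```

Note that CURRENT_DATE and CURRENT_TIME are in UTC, so the stored values may differ from the local wall-clock time.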
Check out the SQLite datatypes documentation:
2.2. Date and Time Datatype
SQLite does not have a storage class set aside for storing dates
and/or times. Instead, the built-in Date And Time Functions of SQLite
are capable of storing dates and times as TEXT, REAL, or INTEGER
values:
TEXT as ISO8601 strings ("YYYY-MM-DD HH:MM:SS.SSS").
REAL as Julian day numbers, the number of days since noon in Greenwich on November 24, 4714 B.C. according to the proleptic
Gregorian calendar.
INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 UTC.
Applications can choose to store dates and times in any of these
formats and freely convert between formats using the built-in date and
time functions.
Store the dates as TEXT datatypes.
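A minimal sketch of storing a time as TEXT and reading it back, assuming the 'HH:MM' format is sufficient. The table and column names follow the question's test snippet; the fixed time 17:35 is just an example value.

```python
import datetime as dt
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (example_time TEXT)")

booking_time = dt.time(17, 35)

# sqlite3 cannot bind a datetime.time object, so format it as a string first.
conn.execute("INSERT INTO test VALUES (?)", (booking_time.strftime("%H:%M"),))

# Read the text back and parse it into a datetime.time again.
stored = conn.execute("SELECT example_time FROM test").fetchone()[0]
restored = dt.datetime.strptime(stored, "%H:%M").time()

print(stored, restored)
conn.close()
```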
The documentation you refer to mostly discusses how to format column values that represent dates and times. That is, it discusses what you can do with dates and times that already exist in your database.
It does, however, give just enough information to help you here I think. It says:
Date and time values can be stored as
text in a subset of the ISO-8601 format,
numbers representing the Julian day, or
numbers representing the number of seconds since (or before) 1970-01-01 00:00:00 UTC (the unix timestamp).
So you want to define and supply your dates and times as either ISO-8601 date strings or as numbers. When defining a table, you indicate which of these formats you wish to use by declaring the column type as TEXT, REAL or INTEGER respectively.
Here's some documentation that discusses how to store dates and times in one of these formats: https://www.sqlitetutorial.net/sqlite-date/
I have a string extracted from a PDF that I want to convert to a date format I can work with later.
the string is
05Dec22
how can I change it to 12/05/2022?
import datetime
date1 = '05Dec22'
date1 = datetime.datetime.strptime(date1, '%d%m%Y').strftime('%m/%d/%y')
date1 = str(date1)
This is what i tried so far
If you execute the code, you'll get the following error:
ValueError: time data '05Dec22' does not match format '%d%m%Y'
This is because your string is not in the format you specified ('%d%m%Y'). Tables of strptime format codes are easy to find online; if you look at the one provided here, you'll see that the format of your string is '%d%b%y'. The %b placeholder represents the abbreviated month name, and the %y placeholder is the year without century, just as in your example string. Now, if you fix that in your code,
import datetime
date1 = '05Dec22'
date1 = datetime.datetime.strptime(date1, '%d%b%y').strftime('%m/%d/%Y')
date1 = str(date1)
you'll get the desired result.
Note that you also have to change the output format in strftime. As I said before, the %y placeholder is the year without century. For you to get the year including the century, you have to use %Y.
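Wrapped in a small function, the conversion looks like this. It is only a sketch: reformat_date is a hypothetical name, and %b matches English month abbreviations only when the process runs in an English or default "C" locale.

```python
from datetime import datetime


def reformat_date(text):
    """Convert a string like '05Dec22' (day, month abbreviation, 2-digit year)
    into 'MM/DD/YYYY'."""
    parsed = datetime.strptime(text, "%d%b%y")
    return parsed.strftime("%m/%d/%Y")


converted = reformat_date("05Dec22")
print(converted)
```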
A mysql database table has a column whose datatype is time ( http://dev.mysql.com/doc/refman/5.0/en/time.html ). When the table data is accessed, Python returns the value of this column as a datetime.timedelta object. How do I extract the time out of this? (I didn't really understand what timedelta is for from the python manuals).
E.g. The column in the table contains the value "18:00:00"
Python-MySQLdb returns this as datetime.timedelta(0, 64800)
Please ignore what is below (it does return different value) -
Added: Irrespective of the time value in the table, python-MySQLdb seems to only return datetime.timedelta(0, 64800).
Note: I use Python 2.4
It's strange that Python returns the value as a datetime.timedelta. It probably should return a datetime.time. Anyway, it looks like it's returning the elapsed time since midnight (assuming the column in the table is 6:00 PM). In order to convert to a datetime.time, you can do the following:
import datetime
value = datetime.timedelta(0, 64800)
(datetime.datetime.min + value).time()
datetime.datetime.min and datetime.time() are, of course, documented as part of the datetime module if you want more information.
A datetime.timedelta is, by the way, a representation of the difference between two datetime.datetime values. So if you subtract one datetime.datetime from another, you will get a datetime.timedelta. And if you add a datetime.timedelta to a datetime.datetime, you'll get a datetime.datetime. That's how the code above works.
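As a self-contained sketch of that conversion (timedelta_to_time is a hypothetical helper name): it assumes the timedelta is a non-negative offset from midnight, which is what the question's value looks like. A negative timedelta would raise OverflowError, because datetime.datetime.min cannot go any earlier.

```python
import datetime


def timedelta_to_time(delta):
    """Interpret a non-negative timedelta as an offset from midnight
    and return the matching time of day."""
    return (datetime.datetime.min + delta).time()


value = datetime.timedelta(0, 64800)  # 64800 seconds = 18 hours
as_time = timedelta_to_time(value)
print(as_time)
```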
It seems to me that the TIME type in MySQL is intended to represent time intervals as datetime.timedelta does in Python. From the docs you referenced:
TIME values may range from '-838:59:59' to '838:59:59'. The hours part may be so large because the TIME type can be used not only to represent a time of day (which must be less than 24 hours), but also elapsed time or a time interval between two events (which may be much greater than 24 hours, or even negative).
An alternative to converting from datetime.timedelta to datetime.time would be to change the column type to DATETIME and ignore the date fields.
-Insert:
tIn = datetime.datetime(
    year=datetime.MINYEAR,
    month=1,
    day=1,
    hour=10,
    minute=52,
    second=10
)
cursor.execute('INSERT INTO TableName (TimeColumn) VALUES (%s)', [tIn])
-Select:
cursor.execute('SELECT TimeColumn FROM TableName')
result = cursor.fetchone()
if result is not None:
    tOut = result[0].time()
    print('Selected time: {0}:{1}:{2}'.format(tOut.hour, tOut.minute, tOut.second))
datetime.time() is called on a datetime object to get a time object.
The Sqlite documentation states:
SQLite has no DATETIME datatype. Instead, dates and times can be stored in any of these ways:
As a TEXT string in the ISO-8601 format. Example: '2018-04-02 12:13:46'.
As an INTEGER number of seconds since 1970 (also known as "unix time").
...
so I decided to use an INTEGER unix timestamp:
import sqlite3, time
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE data(datetime INTEGER, t TEXT);")
conn.execute("INSERT INTO data VALUES (CURRENT_TIMESTAMP, 'hello')")
Why does the following query return no result?
ts = int(time.time()) + 31*24*3600 # unix timestamp 1 month in the future
print(list(conn.execute("SELECT * FROM data WHERE datetime <= ?", (ts, ))))
More generally, how to do a SELECT query with a comparison with a unix timestamp with Sqlite?
PS:
I have already read SQLite DateTime comparison and similar questions, which offer other comparison methods, but here I'd like to precisely discuss why this unix timestamp comparison does not work.
For performance reasons, I'd like to:
do a query that compares integers (which is super fast if many rows): WHERE datetime <= unix_timestamp,
avoid converting unix_timestamp into a string and then comparing datetime to this string (I guess it'll be far slower)
You use CURRENT_TIMESTAMP when inserting new rows.
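This means that the values in your column are not stored as unix timestamps, because CURRENT_TIMESTAMP returns the current date and time as text in the format YYYY-MM-DD hh:mm:ss. SQLite considers any TEXT value greater than any INTEGER value, so the comparison datetime <= ? is never true for these rows.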
You can transform the unix timestamp to datetime in the format of YYYY-MM-DD hh:mm:ss with the function datetime() and the unixepoch modifier:
conn.execute("SELECT * FROM data WHERE datetime <= datetime(?, 'unixepoch')", (ts, ))
If your unix timestamp contains milliseconds you must strip them off:
conn.execute("SELECT * FROM data WHERE datetime <= datetime(? / 1000, 'unixepoch')", (ts, ))
Or, you can transform the string datetime in the column datetime to a unix timestamp with the function strftime():
conn.execute("SELECT * FROM data WHERE strftime('%s', datetime) + 0 <= ?", (ts, ))
If you want to store integer values in the column, use strftime() like this:
INSERT INTO data VALUES (strftime('%s', CURRENT_TIMESTAMP) + 0, 'hello')
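The difference between the two kinds of stored value can be checked with typeof(). The following is a small sketch on an in-memory database; the row labels 'text row' and 'int row' are made up for illustration, and CAST(... AS INTEGER) is used here as an alternative to the + 0 trick.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data(datetime INTEGER, t TEXT)")

# CURRENT_TIMESTAMP yields text such as '2018-04-02 12:13:46'; it cannot be
# converted to an integer, so it is stored as TEXT despite the column type.
conn.execute("INSERT INTO data VALUES (CURRENT_TIMESTAMP, 'text row')")

# strftime('%s', 'now') gives the unix time; the cast stores a real integer.
conn.execute(
    "INSERT INTO data VALUES (CAST(strftime('%s', 'now') AS INTEGER), 'int row')"
)

stored_types = conn.execute(
    "SELECT t, typeof(datetime) FROM data ORDER BY t"
).fetchall()

ts = int(time.time()) + 31 * 24 * 3600  # unix timestamp one month ahead
matches = [
    row[1]
    for row in conn.execute("SELECT datetime, t FROM data WHERE datetime <= ?", (ts,))
]

print(stored_types)
print(matches)
conn.close()
```

Only the row stored as an integer satisfies the comparison, which is the fast integer comparison the question asks for.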
I need to store a timestamp in a readable format, and then later on I need to convert it to epoch for comparison purposes.
I tried doing this:
import time
format = '%Y %m %d %H:%M:%S +0000'
timestamp1 = time.strftime(format,time.gmtime()) # '2016 03 25 04:06:22 +0000'
t1 = time.strptime(timestamp1, format) # time.struct_time(..., tm_isdst=-1)
time.sleep(1)
epoch_now = time.mktime(time.gmtime())
epoch_t1 = time.mktime(t1)
print("Delta: %s" % (epoch_now - epoch_t1))
Running this, instead of getting Delta of 1 sec, I get 3601 (1 hr 1 sec), CONSISTENTLY.
Investigating further, it seems that when I just do time.gmtime(), the struct has tm_isdst=0, whereas the converted struct t1 from timestamp1 string has tm_isdst=-1.
How can I ensure the isdst is preserved to 0. I think that's probably the issue here.
Or is there a better way to record time in human readable format (UTC), and yet be able to convert back to epoch properly for time diff calculation?
UPDATES:
After doing more research last night, I switched to using datetime because it preserves more information in the datetime object, and this is confirmed by albertoql's answer below.
Here's what I have now:
import time
from datetime import datetime
format = '%Y-%m-%d %H:%M:%S.%f +0000' # +0000 is optional; only for user to see it's UTC
d1 = datetime.utcnow()
timestamp1 = d1.strftime(format)
d1a = datetime.strptime(timestamp1, format)
time.sleep(1)
d2 = datetime.utcnow()
print("Delta: %s" % (d2 - d1a).seconds)
I chose not to add tz to keep it simple/shorter; I can still strptime that way.
Below, first an explanation about the problem, then two possible solutions, one using time, another using datetime.
Problem explanation
The problem lies in the observation that the OP made in the question: tm_isdst=-1. tm_isdst is a flag that indicates whether daylight saving time is in effect (for more details, see https://docs.python.org/2/library/time.html#time.struct_time).
Specifically, given the format of the time string from the OP (that complies with RFC 2822 Internet email standard), time.strptime does not store the information about the timezone, namely +0000. Thus, when the struct_time is created again from the string, tm_isdst=-1, meaning unknown. time.mktime then has to guess that information from the local system's settings. For example, on a system in a North American time zone where daylight saving time is in effect, tm_isdst is treated as set, which shifts the result by one hour.
Solution with time
If you want to use only the time package, the easiest way to parse the information directly is to specify that the time is in UTC by adding %Z to the format. Note that time does not provide a way to store the timezone in struct_time, so it cannot print the actual time zone associated with the saved time; strftime retrieves the time zone name from the system. Therefore, it is not possible to use the same format directly for time.strftime. The code for writing and reading the string would look like:
format = '%Y %m %d %H:%M:%S UTC'
format2 = '%Y %m %d %H:%M:%S %Z'
timestamp1 = time.strftime(format, time.gmtime())
t1 = time.strptime(timestamp1, format2)
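A different route, which is not the one taken in this answer, is to avoid time.mktime altogether: calendar.timegm from the standard library converts a struct_time that is known to be in UTC to epoch seconds and ignores tm_isdst. The sketch below uses a fixed example timestamp and keeps the question's original format string.

```python
import calendar
import time

fmt = "%Y %m %d %H:%M:%S +0000"
stamp = "2016 03 25 04:06:22 +0000"

# strptime leaves tm_isdst at -1, but timegm does not consult it.
parsed = time.strptime(stamp, fmt)
epoch = calendar.timegm(parsed)

# Converting back through gmtime reproduces the original string.
round_trip = time.strftime(fmt, time.gmtime(epoch))

print(epoch, round_trip)
```

The current epoch time can be taken from time.time() directly, so a delta is simply time.time() - epoch.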
Solution with datetime
Another solution involves the use of the datetime and dateutil packages, which directly support timezones, and the code could be (assuming that preserving the timezone information is a requirement):
from datetime import datetime
from dateutil import tz, parser
import time
time_format = '%Y %m %d %H:%M:%S %z'
utc_zone = tz.gettz('UTC')
utc_time1 = datetime.utcnow()
utc_time1 = utc_time1.replace(tzinfo=utc_zone)
utc_time1_string = utc_time1.strftime(time_format)
utc_time1 = parser.parse(utc_time1_string)
time.sleep(1)
utc_time2 = datetime.utcnow()
utc_time2 = utc_time2.replace(tzinfo=utc_zone)
print("Delta: %s" % (utc_time2 - utc_time1).total_seconds())
There are some aspects to pay attention to:
After the call of utcnow, the timezone is not set, as it is a naive UTC datetime. If the information about UTC is not needed, it is possible to delete both lines where the timezone is set for the two times, and the result would be the same, as there is no guess about DST.
In Python 2, it is not possible to use datetime.strptime because of %z, which is not correctly parsed (datetime.strptime in Python 3.2 and later does support %z). If the string contains the information about the timezone, then parser should be used.
It is possible to directly perform the difference from two instances of datetime and transform the resulting delta into seconds.
If it is necessary to get the time in seconds since the epoch, an explicit computation should be made, as there is no direct function that does that automatically in datetime (at the time of the answer). Because utc_time2 is timezone-aware, the epoch must be timezone-aware as well. Below is the code, for example for utc_time2:
epoch_time = datetime(1970, 1, 1, tzinfo=utc_zone)
epoch2 = (utc_time2 - epoch_time).total_seconds()
The precision of the result is limited by datetime.resolution, namely the smallest possible difference between two non-equal datetime objects, so the computed difference is accurate up to that resolution.
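For readers on Python 3, the same round trip can be sketched with the standard library alone. This assumes Python 3.3 or later, where datetime.timestamp() exists and datetime.strptime parses %z; the fixed timestamp is just an example value, and dateutil is not needed here.

```python
from datetime import datetime, timezone

time_format = "%Y %m %d %H:%M:%S %z"

# A timezone-aware UTC datetime (fixed here so the example is reproducible).
utc_time1 = datetime(2016, 3, 25, 4, 6, 22, tzinfo=timezone.utc)

# Write it out in readable form, then parse it back with the same format.
utc_time1_string = utc_time1.strftime(time_format)
parsed = datetime.strptime(utc_time1_string, time_format)

# Seconds since the epoch, and the elapsed time up to now.
epoch_seconds = parsed.timestamp()
delta_seconds = (datetime.now(timezone.utc) - parsed).total_seconds()

print(utc_time1_string, epoch_seconds)
```

Because both datetimes are timezone-aware, the subtraction is unaffected by the local time zone or DST setting.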