Time column in mysql table - python

I want to store a time value in a MySQL table, e.g.
1345:55
i.e. 1345 hours and 55 minutes. What type should the column have?
And if I want to pass a time variable from Python to this column using the MySQLdb module, which time type should I use in Python? datetime.timedelta?

Generally speaking, one can use MySQL's TIME datatype to store time values:
MySQL retrieves and displays TIME values in 'HH:MM:SS' format (or 'HHH:MM:SS' format for large hours values). TIME values may range from '-838:59:59' to '838:59:59'.
Obviously, in your case, this is insufficient for the range of values required. I would therefore suggest that you instead convert the value to an integer number of minutes and store the result in a 4-byte INT UNSIGNED column (capable of storing values in the range 0 to 4294967295, representing 0:00 to 71582788:15).
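For example, a minimal sketch of the conversion on the Python side using MySQLdb; the table name events, the column name duration_minutes, and the connection parameters are all hypothetical:
import datetime
import MySQLdb

# Hypothetical connection details for illustration only.
conn = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="mydb")
cur = conn.cursor()

# 1345 hours and 55 minutes, held as a timedelta on the Python side.
duration = datetime.timedelta(hours=1345, minutes=55)

# Convert to a whole number of minutes before storing in the INT UNSIGNED column.
minutes = int(duration.total_seconds() // 60)  # 80755

cur.execute("INSERT INTO events (duration_minutes) VALUES (%s)", (minutes,))
conn.commit()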

Related

convert nanosecond precision datetime to snowflake TIMESTAMP_NTZ format

I have a string datetime "2017-01-01T20:19:47.922596536+09".
I would like to convert this into snowflake's DATETIME_NTZ date type (which can be found here). Simply put, DATETIME_NTZ is defined as
TIMESTAMP_NTZ
TIMESTAMP_NTZ internally stores “wallclock” time with a specified precision. All operations are performed without taking any time zone into account.
If the output format contains a time zone, the UTC indicator (Z) is displayed.
TIMESTAMP_NTZ is the default for TIMESTAMP.
Aliases for TIMESTAMP_NTZ:
TIMESTAMPNTZ
TIMESTAMP WITHOUT TIME ZONE
I've tried using numpy.datetime64 but I get the following:
> numpy.datetime64("2017-01-01T20:19:47.922596536+09")
numpy.datetime64('2017-01-01T11:19:47.922596536')
This, for some reason, converts the time to a different timezone (UTC), dropping the offset.
I've also tried pd.to_datetime:
> pd.to_datetime("2017-01-01T20:19:47.922596536+09")
Timestamp('2017-01-01 20:19:47.922596536+0900', tz='pytz.FixedOffset(540)')
This gives me the correct value but when I try to insert the above value to snowflake db, I get the following error:
sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 252004: Failed processing pyformat-parameters: 255001: Binding data in type (timestamp) is not supported.
Any suggestions would be much appreciated!
You can do this on the Snowflake side if you want by sending the string format as-is and converting it to a timestamp_ntz. This single query shows two ways: one that simply strips off the time zone information, and one that converts the time zone to UTC before stripping it off.
select try_to_timestamp_ntz('2017-01-01T20:19:47.922596536+09',
'YYYY-MM-DD"T"HH:MI:SS.FF9TZH') TS_NTZ
,convert_timezone('UTC',
try_to_timestamp_tz('2017-01-01T20:19:47.922596536+09',
'YYYY-MM-DD"T"HH:MI:SS.FF9TZH'))::timestamp_ntz UTC_TS_NTZ
;
Note that the Snowflake UI by default only shows 3 decimal places (milliseconds) unless you specify higher precision for the output display using to_varchar() and a timestamp format string.
TS_NTZ                          UTC_TS_NTZ
2017-01-01 20:19:47.922596536   2017-01-01 11:19:47.922596536
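If you would rather handle it on the Python side, one approach (an assumption about what the connector will accept, not a confirmed fix) is to make the pandas Timestamp timezone-naive before binding it, mirroring the two SQL variants above:
import pandas as pd

ts = pd.to_datetime("2017-01-01T20:19:47.922596536+09")

# Keep the wall-clock time and simply drop the offset...
wallclock_ntz = ts.tz_localize(None)               # 2017-01-01 20:19:47.922596536

# ...or convert to UTC first, then drop the offset.
utc_ntz = ts.tz_convert("UTC").tz_localize(None)   # 2017-01-01 11:19:47.922596536
A naive timestamp (or its string form via isoformat()) avoids binding a timezone-aware value, which is what the error above complains about.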

How to convert the values of all fields in data frame from decimal to integers

We have a data frame of 1.1 million rows x 4300 columns. The data frame contains different floating-point values in different columns; for example, one field value is 0.567. Reading the complete data frame with all the floating-point values requires a considerable amount of memory, so we are thinking of reading it with all values converted to integers by multiplying/scaling them by 1000.
So could anyone guide us?
You can use .mul() to multiply the columns in the dataframe, and then cast to a nullable integer type (which also supports null values as integers; otherwise, if a column contains NaN, the whole column is treated as float even when the other values are integers with no decimal points):
df = df.mul(1000).astype('Int64')
If your dataframe does not contain NaN or other null values, you can simply use:
df = df.mul(1000).astype(int)
I'd suggest looking into reading the data in "chunks", e.g.:
with pd.read_csv("data.csv", chunksize=1000) as reader:
    for chunk in reader:
        process(chunk)
that way you only need to keep ~4 million values in memory at one time, rather than ~5 billion. See the IO section of the user guide for more details.
Another option would be to ensure the values are loaded as 32-bit floats (rather than the default 64-bit floats) by passing an appropriate dtype when loading the data. But this only halves the memory requirement; using scaled 16-bit integer values (or even float16) might help more, but again you're not going to save nearly as much memory as with chunked loading.
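As a rough sketch of the dtype approach (data.csv is a hypothetical file with all-numeric columns):
import numpy as np
import pandas as pd

# Load every column as 32-bit floats instead of the default 64-bit floats.
df = pd.read_csv("data.csv", dtype=np.float32)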

Storing a TimeDelta in my SQL Server database

Table Tasks has 2 datetime columns, StartDate and EndDate. I used python to calculate the difference between these two, resulting in column Time_Difference. See below:
Tasks['Time_Difference'] = Tasks['EndDate'] - Tasks['StartDate']
Avg_Task_Duration = Tasks['Time_Difference'].mean()
Now, I want to store the Avg_Task_Duration value in a Stats table from my SQL Server. It is of type timedelta and looks like this: Timedelta('438 days 09:25:10')
Therefore, I have the following questions:
Is it possible to store a timedelta in my SQL Server? If yes, what data type should the column have?
If not, is there any other alternative?
Unfortunately, SQL Server does not support the ISO SQL standard interval type. You could use time for intervals of less than 24 hours, but you would need to store the value as an integer number of units (e.g. seconds) to hold longer periods.
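For example, a minimal sketch of storing the value as an integer number of seconds; the Stats table, its avg_task_duration_seconds column, and the use of pyodbc are all assumptions for illustration:
import pyodbc

# Avg_Task_Duration is the pandas Timedelta computed above.
seconds = int(Avg_Task_Duration.total_seconds())

conn = pyodbc.connect(connection_string)  # connection_string assumed to be defined elsewhere
cur = conn.cursor()
cur.execute("INSERT INTO Stats (avg_task_duration_seconds) VALUES (?)", seconds)
conn.commit()
You can convert the stored seconds back to a timedelta in Python when reading the value out again.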

SQL get aggregate value within a time-window in a time-series table

I am trying to write a SQL method in Python using SQLite/sqlalchemy to build a new table containing analyzed data such as the mean, median, max, and variance over a certain period of time, based on another table that contains the raw time-series data.
Let's say the raw data is as follows, with timestamps that are not evenly distributed. I want to derive another table from the raw data table that is basically the aggregate value over a 60-second sliding time window, e.g.:
RAW:
TIME VALUE
11:11:12 12
11:11:22 24
11:11:34 16
11:12:21 18
11:12:45 22
11:13:03 15
And I want to get:
ID WINDOW_TIME MEAN MEDIAN MAX VAR
1 11:11 mean(12,24,16) med(12,24,16) ...
2 11:12 mean(18,22) ...
3 11:13 ...
...
How could I group the data according to timestamp?
If your TIME column is of the TIME type (https://dev.mysql.com/doc/refman/5.7/en/time.html), you could do something like this in your GROUP BY:
GROUP BY TIME_FORMAT(`TIME`, '%H:%i')
If, on the other hand, the column is just a string type, things are a little trickier. I suppose, if you can guarantee every value is in the format hh:ii:ss, you could use SUBSTRING.
GROUP BY SUBSTRING(`TIME` FROM 1 FOR 5)
If you go with either of these options, however, I really hope, for your sake, that you have a paucity of records in your database because I'm pretty sure each of these options is going to be terrible in terms of performance. I haven't done extensive tests, but I don't think mysql is going to be able to use indexes on either example.
Honestly, you're probably better off creating a table which contains time as hh:ii for each record, and then using that table for your aggregate queries, than you are trying to do this in all one query.
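Since the question mentions doing this from Python anyway, a hedged alternative is to pull the raw rows into pandas and aggregate there; a minimal sketch, assuming the raw table has already been read into a DataFrame df with TIME (strings like '11:11:12') and VALUE columns:
import pandas as pd

# Truncate each timestamp to the minute, e.g. '11:11:12' -> '11:11'.
df["WINDOW_TIME"] = df["TIME"].str[:5]

# Aggregate per minute: mean, median, max and variance of VALUE.
stats = (df.groupby("WINDOW_TIME")["VALUE"]
           .agg(["mean", "median", "max", "var"])
           .reset_index())
The resulting stats frame can then be written to the new table, for example with DataFrame.to_sql().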

python mysql - unicode

I have a column in mysql which is intended to take decimal values (e.g. 0.00585431)
In Python, I have a function that gets this value from a webpage and returns it into a variable. When I print this variable I get [u'0.00585431'] (which is strange).
I then try to insert this into the MySQL column, which is set to take a decimal(10,0) value. However, the database stores it as just 0.
the code to insert is nothing special and works for other things:
cur.execute("""INSERT INTO earnings VALUES (%s)""", (variable))
If I change the column to a string type then it stores the whole [u'0.00585431']. So I imagine that when I try to store it as a decimal it's not actually receiving a proper decimal value and stores a 0 instead?
any thoughts on how to fix this?
DECIMAL(10,0) allows 0 digits to the right of the decimal point.
The declaration syntax for a DECIMAL column remains DECIMAL(M,D), although the range of values for the arguments has changed somewhat:
M is the maximum number of digits (the precision). It has a range of 1 to 65. This introduces a possible incompatibility for older applications, because previous versions of MySQL permit a range of 1 to 254. (The precision of 65 digits actually applies as of MySQL 5.0.6. From 5.0.3 to 5.0.5, the precision is 64 digits.)
D is the number of digits to the right of the decimal point (the scale). It has a range of 0 to 30 and must be no larger than M.
Try to change your column datatype to DECIMAL(10,8)
If your values will always be in the same format as 0.00585431 then DECIMAL(9,8) would suffice.
https://dev.mysql.com/doc/refman/5.0/en/precision-math-decimal-changes.html
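On the Python side, it may also help to unwrap the one-element list and convert the string to a Decimal before binding; a minimal sketch using the variable and earnings table from the question (the column layout is assumed):
from decimal import Decimal

# variable comes back from the scraper as a one-element list of unicode strings,
# e.g. [u'0.00585431']; unwrap it and convert before inserting.
value = Decimal(variable[0])

cur.execute("INSERT INTO earnings VALUES (%s)", (value,))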
