Parsing pandas DateTime where there are different timezones in dataframe - python

I'm trying to parse a .csv file into a dataframe. The csv has multiple timezones because of daylight savings that happened during the recording of the data (ones at +01:00 others at +02:00). Here's a snippet for understanding:
After reading in the csv file, I have set up my code as follows:
df_vitals.Date_time = pd.to_datetime(df_vitals.Date_time, format ='%Y-%m-%d %H:%M:%S%z')
df_vitals.Date_time = df_vitals.Date_time.dt.tz_convert("Europe/Madrid")
Where Date_time is my column containing the mixed timezones. I get the following error:
AttributeError: Can only use .dt accessor with datetimelike values
Note that this works perfectly fine for my csv files with just one time zone (i.e. where no daylight savings happened)
How can I properly parse csv files that have more than one time zone in it?

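Instead of using format, set the utc param of to_datetime. With mixed offsets the column is otherwise left as plain Python datetime objects, which is why the .dt accessor fails. From the docs: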
utc (boolean): Return UTC DatetimeIndex if True (converting any tz-aware datetime.datetime objects as well).
df_vitals.Date_time = pd.to_datetime(df_vitals.Date_time, utc=True)
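To illustrate, here is a minimal sketch of the full round trip. The two sample rows are made up (one on each side of the 2019 spring clock change in Spain); only the column name and target zone come from the question.

```python
import pandas as pd

# Hypothetical sample spanning the 2019 spring DST change in Spain.
df_vitals = pd.DataFrame({
    "Date_time": [
        "2019-03-30 12:00:00+01:00",  # before the change (+01:00)
        "2019-03-31 12:00:00+02:00",  # after the change (+02:00)
    ]
})

# utc=True normalises every offset to UTC, so the column gets a real
# timezone-aware datetime dtype instead of object.
df_vitals["Date_time"] = pd.to_datetime(df_vitals["Date_time"], utc=True)

# Now .dt is available and the local zone can be restored.
df_vitals["Date_time"] = df_vitals["Date_time"].dt.tz_convert("Europe/Madrid")
print(df_vitals["Date_time"])
```

After tz_convert, each row shows its original wall-clock time again, with the offset that applies on that date.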

Related

Set column with different time zones as index

I have a DataFrame with time values from different timezones. See here:
The start of the data is the usual time and the second half is daylight savings time. As you can see I want to convert it to a datetime column but because of the different time zones it doesn't work. My goal is to set this column as index. How can I do that?
Timezone-aware inputs with mixed time offsets can be problematic in pandas. However, pandas.to_datetime has a parameter (utc=True) that lets you turn such inputs into a DatetimeIndex, provided that getting the values back in UTC is acceptable.
Excerpt from the docs:
... timezone-aware inputs with mixed time offsets (for example issued from a timezone with daylight savings, such as Europe/Paris)
are not successfully converted to a DatetimeIndex. Instead a simple
Index containing datetime.datetime objects is returned:
...
Setting utc=True solves most of the ... issues:
...
Timezone-aware inputs are converted to UTC (the output represents the exact same datetime, but viewed from the UTC time offset +00:00)
[and a DatetimeIndex is returned].
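A small sketch of how that could look when the goal is to use the column as the index. The column names "time" and "value" and the sample rows are assumptions, since the original data was not included; Europe/Paris is taken from the docs excerpt.

```python
import pandas as pd

# Hypothetical data around the 2020 autumn clock change; column names are assumed.
df = pd.DataFrame({
    "time": ["2020-10-24 23:00:00+02:00", "2020-10-25 23:00:00+01:00"],
    "value": [1.0, 2.0],
})

# Parse to UTC first, then use the column as the index.
df["time"] = pd.to_datetime(df["time"], utc=True)
df = df.set_index("time")

# Optional: show the index in local time again.
df.index = df.index.tz_convert("Europe/Paris")
print(df)
```

The tz_convert step is optional; leaving the index in UTC avoids ambiguous local times around the autumn change.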

Storing struct_time in SQL

Some code I am writing in Python takes in a date from a server in the struct_time format (with the 9 args).
How can I store this date in an SQL database, and be able to read back this date as a struct_time while keeping the timezone and all additional information coming from struct_time?
I tried putting the struct_time directly in the SQL
struct_date = time.struct_time((2020, 9, 10, 22, 31, 4, 3, 254, 0))  # tm_year, tm_mon, tm_mday, tm_hour, tm_min, tm_sec, tm_wday, tm_yday, tm_isdst
cursor.execute("UPDATE dbo.RSS_Links SET last_update=? WHERE link=?;", struct_date, links)
> "A TVP's rows must be Sequence objects.", 'HY000'
I can put the time in the database using the below, but I don't see where the timezone is kept when converting to strftime.
date_to_store = time.strftime("%Y-%m-%d %H:%M:%S", struct_date)
I'd highly suggest doing one of these (in this specific order):
Use built-in DATETIME data type and store all dates in UTC
Use LONG/BIGINT type to store date in epoch
Use built-in DATETIME format that can store time zone information
Don't store dates as strings, don't couple it with struct_time or any other struct/class, you'll regret it later :)
Your application should have a data layer, which would handle data serialization/deserialization.
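As a rough sketch of the first two options, assuming the struct_time from the server is already in UTC (the question does not say). sqlite3 is used here only as a stand-in for the asker's database, and the table and URL are made up.

```python
import calendar
import sqlite3
import time
from datetime import datetime, timezone

# Assumption: this struct_time represents a UTC moment.
struct_date = time.struct_time((2020, 9, 10, 22, 31, 4, 3, 254, 0))

# Epoch seconds; calendar.timegm interprets the struct_time as UTC.
epoch_seconds = calendar.timegm(struct_date)

# The same moment as a timezone-aware UTC datetime (for a DATETIME column).
utc_dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

# Hypothetical table in an in-memory database, storing the epoch as an integer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rss_links (link TEXT PRIMARY KEY, last_update INTEGER)")
conn.execute("INSERT INTO rss_links VALUES (?, ?)", ("https://example.com/feed", epoch_seconds))

# Reading back: the data layer rebuilds a struct_time from the stored integer.
(stored,) = conn.execute("SELECT last_update FROM rss_links").fetchone()
restored = time.gmtime(stored)
print(utc_dt.isoformat(), restored)
```

If the server time is local rather than UTC, time.mktime would be the counterpart to calendar.timegm, but then the result depends on the machine's timezone settings.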

Converting UTC Timestamp in CSV to Local time (PST)

I am looking for code to convert timestamps from some GPS data in a csv file to local time (in this case PST). I also have some other files that I would have to convert to CDT and EDT.
This is what the output looks like:
2019-09-18T07:07:48.000Z
I would like to create a separate column to the right in the Excel sheet for the date and another for the time, e.g.:
TIME_UTC DATE TIME_PST
2019-09-18T07:07:48.000Z 09-18-2019 12:07:48 AM
I only know basic Python and nothing about Excel in python so it would be super helpful!
Thank you!!!
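Calling localize tells pytz which time zone a naive datetime is in. So in your case you localize the timestamp as UTC and then call astimezone with the target time zone. For example:
from datetime import datetime
import pytz
fmt = '%m-%d-%Y %I:%M:%S %p'
# Parse the naive timestamp and mark it as UTC
utc_dt = pytz.utc.localize(datetime.strptime('2019-09-18T07:07:48.000Z', '%Y-%m-%dT%H:%M:%S.%fZ'))
pst_tz = pytz.timezone('US/Pacific')
pst_dt = pst_tz.normalize(utc_dt.astimezone(pst_tz))
pst_dt.strftime(fmt)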
If you want to use Excel Formula:
For the date:
=INT(SUBSTITUTE(LEFT(A2,LEN(A2)-1),"T"," ")-TIME(7,0,0))
For the Time:
=MOD(SUBSTITUTE(LEFT(A2,LEN(A2)-1),"T"," ")-TIME(7,0,0),1)
Then format the outputs with the desired formats: mm-dd-yyyy and hh:mm:ss AM/PM respectively.
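If you would rather do the whole file in Python, something along these lines may work with pandas. This is only a sketch: the CSV is faked with an in-memory string, the column names are taken from the example in the question, and "America/Los_Angeles" is used as the zone name for Pacific time.

```python
import io
import pandas as pd

# Stand-in for the GPS file; replace with pd.read_csv("your_file.csv").
csv_text = "TIME_UTC\n2019-09-18T07:07:48.000Z\n"
df = pd.read_csv(io.StringIO(csv_text))

# Parse as UTC, then convert to Pacific time (daylight saving is handled by the zone rules).
local = pd.to_datetime(df["TIME_UTC"], utc=True).dt.tz_convert("America/Los_Angeles")

# Separate date and time columns in the requested formats.
df["DATE"] = local.dt.strftime("%m-%d-%Y")
df["TIME_PST"] = local.dt.strftime("%I:%M:%S %p")
print(df)
```

For the other files, swapping the zone name for "America/Chicago" or "America/New_York" should give Central and Eastern time. Unlike the fixed TIME(7,0,0) offset in the Excel formulas, this follows the clock changes automatically.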

Convert UTC timestamp to local timezone issue in pandas

I'm trying to convert a Unix UTC timestamp to a local date format in Pandas. I've been looking through a few solutions but I can't quite get my head around how to do this properly.
I have a dataframe with multiple UTC timestamp columns which all need to be converted to a local timezone, let's say Europe/Berlin.
I first convert all the timestamp columns into valid datetime columns with the following adjustments:
df['date'] = pd.to_datetime(df['date'], unit='s')
This works and gives me, for example, 2019-01-18 15:58:25. When I then try to adjust the timezone for this datetime, I have tried both:
df['date'].tz_localize('UTC').tz_convert('Europe/Berlin')
and
df['date'].tz_convert('Europe/Berlin')
In both cases the error is: TypeError: index is not a valid DatetimeIndex or PeriodIndex and I don't understand why.
The problem must be that the DateTime column is not on the index. But even when I use df.set_index('date') and after I try the above options it doesn't work and I get the same error.
Also, if it would work it seems that this method only allows the indexed DateTime to be timezone adjusted. How would I then adjust for the other columns that need adjustment?
Looking to find some information on how to best approach these issues once and for all! Thanks
On a regular (non-index) column you have to go through the .dt accessor to reach the datetime methods:
df['date'] = df['date'].dt.tz_localize('UTC').dt.tz_convert('Europe/Berlin')
This should be used if the column is not the index column.
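For several timestamp columns, a loop over the column names might be enough. A minimal sketch, where the second column name ("updated") and the sample values are made up; passing utc=True to to_datetime is an alternative to the separate tz_localize('UTC') step.

```python
import pandas as pd

# Hypothetical frame with two Unix-timestamp columns (seconds since the epoch).
df = pd.DataFrame({"date": [1547827105], "updated": [1547830705]})

for col in ["date", "updated"]:
    # unit="s" reads the integers as epoch seconds; utc=True makes the result UTC-aware.
    df[col] = pd.to_datetime(df[col], unit="s", utc=True).dt.tz_convert("Europe/Berlin")

print(df)
```

If one of the columns is later set as the index, the same methods are called directly on it (df.index.tz_convert(...)) without .dt.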

Pandas to_datetime specifying format for iso 8601 timestamp string with nanoseconds

I have ISO 8601 formatted timestamp strings in a column of a Pandas dataframe. I am looking for the most efficient way to convert this column of strings to Pandas datetime objects. pd.to_datetime() works, but my dataframe is about 7.5 million rows, so it is very slow.
I can specify the format using strftime syntax to avoid auto format detection and (I assume) substantially speed up the conversion.
import pandas as pd
pd.to_datetime('2013-04-27 08:27:30.000001540', format='%Y-%m-%d %H:%M:%S.%f')
Gives me:
ValueError: unconverted data remains: 540
If I chop off the last three characters of the timestamp this works perfectly. Seems like the elegant solution here is to determine the Python compatible strftime directive for nanoseconds. I tried %9N, %N, %9. Any ideas?
