Set column with different time zones as index - python

I have a DataFrame with time values from different timezones. See here:
The first part of the data is in standard time and the second half is in daylight saving time. I want to convert the column to datetime, but because of the different UTC offsets it doesn't work. My goal is to set this column as the index. How can I do that?

"... timezone-aware inputs with mixed time offsets ..." can be a bit problematic with Pandas. However, there is a pandas.to_datetime parameter setting that may be acceptable to use timezone-aware inputs with mixed time offsets as a DatetimeIndex.
Excerpt from the docs:
... timezone-aware inputs with mixed time offsets (for example issued from a timezone with daylight savings, such as Europe/Paris)
are not successfully converted to a DatetimeIndex. Instead a simple
Index containing datetime.datetime objects is returned:
...
Setting utc=True solves most of the ... issues:
...
Timezone-aware inputs are converted to UTC (the output represents the exact same datetime, but viewed from the UTC time offset +00:00)
[and a DatetimeIndex is returned].
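A minimal sketch of that approach for the original question. It assumes a hypothetical column named time holding strings with +01:00 and +02:00 offsets (the question's actual data is not shown), and Europe/Paris is only an assumed zone for viewing the result:

```python
import pandas as pd

# Hypothetical sample: one wall-clock zone, before and after a DST switch
df = pd.DataFrame({
    "time": ["2021-03-27 10:00:00+01:00", "2021-03-28 10:00:00+02:00"],
    "value": [1, 2],
})

# utc=True parses the mixed offsets into a single tz-aware (UTC) datetime column
df["time"] = pd.to_datetime(df["time"], utc=True)
df = df.set_index("time")

# Optional: view the index in a named zone that knows the DST rules (assumed zone)
df.index = df.index.tz_convert("Europe/Paris")
```

Converting to a named zone afterwards keeps a single DatetimeIndex while restoring the original local wall-clock times.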

Related

Parsing pandas DateTime where there are different timezones in dataframe

I'm trying to parse a .csv file into a DataFrame. The CSV contains two UTC offsets because a daylight saving transition happened during the recording of the data (some timestamps are at +01:00, others at +02:00). Here's a snippet for understanding:
After reading in the CSV file, I set up my code as follows:
df_vitals.Date_time = pd.to_datetime(df_vitals.Date_time, format ='%Y-%m-%d %H:%M:%S%z')
df_vitals.Date_time = df_vitals.Date_time.dt.tz_convert("Europe/Madrid")
Where Date_time is my column containing the mixed timezones. I get the following error:
AttributeError: Can only use .dt accessor with datetimelike values
Note that this works perfectly fine for my CSV files with just one UTC offset (i.e. where no daylight saving transition happened).
How can I properly parse csv files that have more than one time zone in it?
Instead of using format, set the utc parameter of to_datetime:
utc (boolean): Return UTC DatetimeIndex if True (converting any tz-aware datetime.datetime objects as well).
df_vitals.Date_time = pd.to_datetime(df_vitals.Date_time, utc=True)
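A sketch of the full fix, reusing the question's Date_time column and Europe/Madrid zone; the two CSV rows and the hr column are made-up data around the October 2021 DST change:

```python
import io
import pandas as pd

# Hypothetical CSV excerpt spanning the switch from +02:00 to +01:00
csv = io.StringIO(
    "Date_time,hr\n"
    "2021-10-31 01:30:00+02:00,60\n"
    "2021-10-31 02:30:00+01:00,62\n"
)
df_vitals = pd.read_csv(csv)

# utc=True yields a datetime64[ns, UTC] column, so the .dt accessor now works
df_vitals["Date_time"] = pd.to_datetime(df_vitals["Date_time"], utc=True)
df_vitals["Date_time"] = df_vitals["Date_time"].dt.tz_convert("Europe/Madrid")
```

After tz_convert, each row again shows its original local time, with the correct offset for that moment.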

How can I eliminate timezone awareness from timestamps in python on elements in a pandas series?

I have a few different data frames with a similar structure, but their contents differ slightly. Specifically, the datetime fields are formatted differently: in some cases the timestamps are timezone-aware, in others they are not. I need to find the time range that all three dataframes overlap, so that the data in the final dataframes covers exactly the same time periods.
The approach I wanted to take was to take the minimum start time from each of the starttime timestamps in each dataframe, and then take the max of those, and then repeat (but invert) the process for the endtimes. However, when I do this I get an error indicating I cannot compare timestamps with different timezone awareness. I've tried a few different approaches, for example using tz_convert on the timestamp series, as below:
model_output_dataframes['workRequestSplitEndTime']= pd.to_datetime(model_output_dataframes['workRequestSplitEndTime'], infer_datetime_format=True).tz_convert(None)
this generates the error
TypeError: index is not a valid DatetimeIndex or PeriodIndex
So I tried converting it into a datetimeindex, and then converting it:
model_output_dataframes['workRequestSplitEndTime']= pd.DatetimeIndex(pd.to_datetime(model_output_dataframes['workRequestSplitEndTime'], infer_datetime_format=True)).tz_convert(None)
and this generates a separate error:
ValueError: cannot reindex from a duplicate axis
So at this point, I'm somewhat stuck - I feel like after my conversions I'm back at the place I started.
I would appreciate any help you can give me.
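The thread shows no answer here, so this is only a hedged sketch. Series.tz_convert converts the Series' index, which explains the "index is not a valid DatetimeIndex" error, whereas the .dt accessor works on the values. The helper name to_naive_utc and the sample data are hypothetical, and it assumes naive timestamps already represent UTC:

```python
import pandas as pd

def to_naive_utc(values: pd.Series) -> pd.Series:
    """Hypothetical helper: parse values and return naive timestamps in UTC.

    Assumption: naive inputs already represent UTC (utc=True localizes them as UTC).
    """
    parsed = pd.to_datetime(values, utc=True)  # every value becomes tz-aware UTC
    return parsed.dt.tz_convert(None)          # .dt acts on the values: convert to UTC, drop tz

# Hypothetical time columns standing in for two of the dataframes
a = pd.Series(["2021-01-01 10:00:00+01:00", "2021-01-01 12:00:00+01:00"])
b = pd.Series(["2021-01-01 08:30:00", "2021-01-01 11:30:00"])  # naive, assumed UTC

a_utc, b_utc = to_naive_utc(a), to_naive_utc(b)

# Common range: the latest of the start times and the earliest of the end times
overlap_start = max(a_utc.min(), b_utc.min())
overlap_end = min(a_utc.max(), b_utc.max())
```

Once every column is naive UTC, the comparisons no longer mix aware and naive values.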

Converting strings to datetime while changing timezone

I have many strings containing dates and times (or only times), like these:
'Thu Jun 18 19:30:21 2015'
'21:07:52'
I want to convert these times to a proper datetime format while also changing the timezone to UTC. The current timezone is 4 hours behind UTC. Is there a way to tell Python to add 4 hours while converting the formats? Can it also handle the date in UTC, so that when the hour goes past 24 the date advances and the time wraps around?
I will ultimately be inserting these into a MySQL table, into columns of type DATETIME and TIME, but they all need to be in UTC.
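I would approach this with time.strptime() to parse the source time string, time.mktime() to convert the resulting struct_time into an epoch time (seconds since 1970-01-01 00:00:00 UTC; note that mktime interprets the struct as local time), and time.strftime() to format the time as you like.
For the timezone adjustment, you could add 4*3600 to the epoch time value or, more generally, append a UTC offset such as -0400 to the source string, parse it with %z using datetime.datetime.strptime(), and convert the result with astimezone(timezone.utc). (%Z only recognizes a few names, such as UTC, GMT and the local zone's names, so it cannot express an arbitrary offset.)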
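A sketch of the offset-based variant using the datetime module instead of the time module. It assumes the source is a fixed UTC-4 with no DST, and the helper names are hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the source zone is a fixed UTC-4 offset with no DST
SOURCE_TZ = timezone(timedelta(hours=-4))

def full_to_utc(text):
    """Hypothetical helper: 'Thu Jun 18 19:30:21 2015' -> naive UTC datetime."""
    local = datetime.strptime(text, "%a %b %d %H:%M:%S %Y").replace(tzinfo=SOURCE_TZ)
    # astimezone rolls the date forward when needed; drop tzinfo for a MySQL DATETIME
    return local.astimezone(timezone.utc).replace(tzinfo=None)

def time_to_utc(text):
    """Hypothetical helper: '21:07:52' -> UTC time of day (the date is unknown)."""
    t = datetime.strptime(text, "%H:%M:%S")   # date part defaults to 1900-01-01
    return (t + timedelta(hours=4)).time()     # wraps past midnight; date is discarded
```

For time-only strings there is no date to advance, so only the wrapped time of day can be stored in a TIME column.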

How can I convert a half-hour timezone to a pytz timezone object?

I am trying to read times with their timezone specified by its UTC-offset and store them as python datetimes.
The pytz module provides the available timezones and I think the complete list is given in this question. If so, most of the times can be stored by using the corresponding Etc/ timezone and flipping the sign:
Etc/GMT
Etc/GMT+0
Etc/GMT+1
Etc/GMT+10
Etc/GMT+11
Etc/GMT+12
Etc/GMT+2
Etc/GMT+3
Etc/GMT+4
Etc/GMT+5
Etc/GMT+6
Etc/GMT+7
Etc/GMT+8
Etc/GMT+9
Etc/GMT-0
Etc/GMT-1
Etc/GMT-10
Etc/GMT-11
Etc/GMT-12
Etc/GMT-13
Etc/GMT-14
Etc/GMT-2
Etc/GMT-3
Etc/GMT-4
Etc/GMT-5
Etc/GMT-6
Etc/GMT-7
Etc/GMT-8
Etc/GMT-9
However, some timezones, like Newfoundland Standard Time and Afghanistan Time are half-hour offsets from GMT (-3:30 and +4:30, respectively). How can I store these times with their appropriate timezone without manually mapping these specific offsets to America/St_Johns or Asia/Kabul?
There is something you can do, although it is not exactly what you seem to want.
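>>> import pytz
>>> from datetime import datetime
>>> nf_offset = pytz.FixedOffset(-150)  # offset is in minutes: -150 = -2.5 hours (NDT, UTC-2:30)
>>> nf_tz = pytz.timezone("Canada/Newfoundland")
>>> datetime.now(nf_tz).strftime("%H:%M") == datetime.now(nf_offset).strftime("%H:%M")
True
This is True only while Newfoundland observes daylight saving time; during standard time (NST, UTC-3:30) the matching offset is pytz.FixedOffset(-210).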
They are different objects, so it's not certain you can do what you're hoping after this point, but this will get you a generic object from which you can compare times at arbitrary offsets from UTC.
Do not use Etc/GMT±h POSIX timezones. They are present only for historical reasons (and perhaps for ships at sea).
In general, a UTC offset is not enough to specify a timezone: the same timezone may have different UTC offsets at different times, and conversely, multiple timezones may share the same UTC offset at a given moment.
If you are given a fixed UTC offset with the corresponding dates, then you could use any FixedOffset implementation.
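For instance, the standard library's datetime.timezone is one such fixed-offset implementation and accepts half-hour offsets. A sketch, assuming the input strings look like "YYYY-mm-dd HH:MM:SS±HHMM" (the helper name is hypothetical):

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, including half-hour ones, can be built directly
NST = timezone(timedelta(hours=-3, minutes=-30))  # Newfoundland Standard Time offset
AFT = timezone(timedelta(hours=4, minutes=30))    # Afghanistan Time offset

def parse_with_offset(text):
    """Hypothetical helper: parse 'YYYY-mm-dd HH:MM:SS±HHMM' into an aware datetime."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S%z")

dt = parse_with_offset("2015-01-13 18:13:26-0330")
```

This records the offset only, not the zone's DST rules, so it is correct only for the dates the offset was given for.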
If you want to get the correct time for other dates then you have to map your input data to the correct timezone ids such as 'America/St_Johns'. See pytz: return Olson Timezone name from only a GMT Offset.
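As a sketch of that mapping step, you can list candidate zone IDs whose offset matches at a given instant. The helper name is hypothetical, and the result is only a list of candidates, since an offset does not identify a zone uniquely:

```python
from datetime import datetime, timedelta
import pytz

def zones_with_offset(offset, when_utc):
    """Hypothetical helper: common zone names whose UTC offset equals `offset`
    at the naive UTC time `when_utc`."""
    instant = pytz.utc.localize(when_utc)
    return [
        name
        for name in pytz.common_timezones
        if instant.astimezone(pytz.timezone(name)).utcoffset() == offset
    ]

# Which zones were at -03:30 and +04:30 at 2015-01-13 12:00 UTC?
nst = zones_with_offset(timedelta(hours=-3, minutes=-30), datetime(2015, 1, 13, 12))
aft = zones_with_offset(timedelta(hours=4, minutes=30), datetime(2015, 1, 13, 12))
```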

How to remove the tzinfo completely from the time after converting to UTC in Python?

I came across this exact issue, and I can't figure out how to achieve the solution in my case.
Guido says
The solution is to remove the tzinfo completely from the time after converting to UTC.
This is what I have tried:
date_time = parser.parse(i.pubDate.text)
news.publication_date = date_time.replace(tzinfo=None).date()
And I get the same error:
NotImplementedError: DatetimeProperty publication_date_time can only support UTC. Please derive a new Property to support alternative timezones.
So it seems I have to convert the date to UTC first. And here my research has failed me.
I came across this solution, which suggests:
def date_time_to_utc(date_time):
    tz = pytz.timezone('???')
    return tz.normalize(tz.localize(date_time)).astimezone(pytz.utc)
But I don't have the timezone. I am scraping the date from an HTML source, so the timezone could really be from anywhere in the world. Is there no easy and reliable way to convert a date time to UTC?
I can use both dateutil and pytz to achieve this. Many thanks.
UPDATE
It has been a really long day. I have misread the stack trace. However the question remains valid.
date_time = {datetime} 2015-01-13 18:13:26+00:00
news.publication_date_time = date_time
This caused the crash. And it seems by doing this, I pass the unit test:
news.publication_date_time = date_time.replace(tzinfo=None)
Is this the correct way to convert a GMT 0 datetime to a UTC datetime? Or, in fact, any timezone to UTC?
If the aware datetime object is already in UTC (+0000), then your formula works:
naive_utc = aware_utc.replace(tzinfo=None)
where aware_utc is a timezone-aware datetime object that represents time in UTC.
But if the aware datetime object is not in UTC, it fails. In the general case you should take into account a (possibly) non-zero UTC offset:
assert aware.tzinfo is not None and aware.utcoffset() is not None
# local time = utc time + utc offset (by definition)
# -> utc = local - offset
naive_utc = aware.replace(tzinfo=None) - aware.utcoffset()
where aware is a timezone-aware datetime object in an arbitrary timezone.
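A quick check of that formula against the standard astimezone() route, using an arbitrary +05:30 offset chosen only for illustration:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical aware datetime in a non-UTC zone (+05:30 chosen for illustration)
aware = datetime(2015, 1, 13, 18, 13, 26, tzinfo=timezone(timedelta(hours=5, minutes=30)))
assert aware.tzinfo is not None and aware.utcoffset() is not None

# utc = local - offset
naive_utc = aware.replace(tzinfo=None) - aware.utcoffset()

# Equivalent: convert to UTC first, then drop the tzinfo
naive_utc_2 = aware.astimezone(timezone.utc).replace(tzinfo=None)
```

Both give 12:43:26. Simply calling replace(tzinfo=None) on the aware value would instead keep 18:13:26 and silently treat local time as UTC.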
But I don't have the timezone. I am scraping the date from an HTML source. So the timezone could really be from anywhere in the world. Is there no easy and reliable way to convert a date time to UTC? I could use both dateutil and pytz to achieve this. Many thanks.
No. dateutil and pytz won't help you unless the date string itself contains the timezone (or at least its UTC offset).
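A sketch of that rule with dateutil, whose parser.parse the question already uses. The helper name is hypothetical: it converts only when the parsed string carried an offset and otherwise reports that the source zone is unknown:

```python
from datetime import timezone
from dateutil import parser

def to_utc_or_none(text):
    """Hypothetical helper: aware UTC datetime if `text` includes an offset, else None."""
    dt = parser.parse(text)
    if dt.tzinfo is None or dt.utcoffset() is None:
        return None  # no timezone in the string: the source zone is unknown
    return dt.astimezone(timezone.utc)
```

For example, an RSS-style "Tue, 13 Jan 2015 18:13:26 +0100" converts to 17:13:26 UTC, while "2015-01-13 18:13:26" yields None.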
Remember: it is always noon somewhere on Earth. If you collect date/time strings from different places on Earth, you can't compare them unless you attach the corresponding timezones. Without the source timezone of a date, you can neither convert it to UTC nor get a valid POSIX timestamp for it.
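I'm an idiot and it's late here; this time I actually read the question.
from datetime import datetime, timezone
tstmp = date_time.timestamp()  # date_time must be aware; its UTC offset is taken into account
naive_date = datetime.fromtimestamp(tstmp, tz=timezone.utc).replace(tzinfo=None)  # naive UTC
My first answer would just have given you the current naive time.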
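Try this:
dateTime = dateTime.replace(tzinfo=None)  # drops the offset without converting: only correct if dateTime was already in UTC
dtUtcAware = pytz.UTC.localize(dateTime)  # re-attach UTC (requires: import pytz)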
