I am trying to convert UTC data to local time in Mozambique. Mozambique follows GMT+2, i.e. the Africa/Maputo time zone. However, when using .tz_localize('UTC').tz_convert(X), where X is either 'Etc/GMT+2' or 'Africa/Maputo', I get different answers. As an example:
import pandas as pd
import numpy as np
np.random.seed(2019)
N = 1000
rng = pd.date_range('2019-01-01', freq='10Min', periods=N)
df = pd.DataFrame(np.random.rand(N, 3), columns=['temp','depth','acceleration'], index=rng)
print(df.tz_localize('UTC').tz_convert('Etc/GMT+2'))
print(df.tz_localize('UTC').tz_convert('Africa/Maputo'))
The code that solves my problem is df.tz_localize('UTC').tz_convert('Africa/Maputo'). Therefore, I wonder whether I have misunderstood the tz_convert('Etc/GMT+2') call, and why the two approaches don't give the same answer. tz_convert('Etc/GMT-2') does the trick, but it is not intuitive, at least to me.
Thanks in advance.
Time zone conversion using the Etc/* area works in reverse: the offset signs are flipped relative to the usual convention, so Etc/GMT+2 actually means UTC-02:00. Perhaps these zones should be deprecated altogether, considering the following observation in their documentation:
These entries are mostly present for historical reasons, so that
people in areas not otherwise covered by the tz files could "zic -l"
to a time zone that was right for their area. These days, the
tz files cover almost all the inhabited world, so there's little
need now for the entries that are not on UTC.
Your workaround is correct, and the best explanation of why can be found here. Maybe stick with tz_convert('Africa/Maputo').
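To make the reversed signs concrete, here is a minimal sketch (reusing the 2019 timestamp from the question; the output comments reflect current tz data) showing that Etc/GMT-2, not Etc/GMT+2, matches Africa/Maputo:
import pandas as pd
# Etc/GMT* zones use POSIX-style signs: Etc/GMT-2 means UTC+02:00,
# while Etc/GMT+2 means UTC-02:00. Africa/Maputo is UTC+02:00 year-round.
ts = pd.Timestamp('2019-01-01 00:00', tz='UTC')
print(ts.tz_convert('Africa/Maputo'))  # 2019-01-01 02:00:00+02:00
print(ts.tz_convert('Etc/GMT-2'))      # 2019-01-01 02:00:00+02:00 (same wall clock)
print(ts.tz_convert('Etc/GMT+2'))      # 2018-12-31 22:00:00-02:00 (sign reversed)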
Related
Is there a way to generate negative time values in Python?
I want to generate a time range running from -4 minutes up to a variable positive time (between 5 and 10 minutes), something like this:
import datetime
import pandas as pd
time_range = range(-datetime.time(minute=4), datetime.time(minute=5))
# or
time_range = pd.date_range(-datetime.time(minute=4), datetime.time(minute=5))
But datetime does not seem to support negative values.
I need it to generate a graph like the following one, but with a time/datetime index instead of integer values (a time/datetime index is especially useful on a plotly graph, as it gives a readable axis at any zoom level).
In addition, I believe that the possibility to generate negative time values could have many other applications.
datetime.time doesn't accept negative values. Maybe you can do something with timedelta instead:
from datetime import timedelta
delta = timedelta(minutes=-4)
I hope this clue will help you.
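Building on that hint, a minimal sketch using pandas (assuming pandas is acceptable here; the 30-second step is an illustrative choice, not from the question) that generates a range of timedeltas starting below zero, which can then serve as a plot index:
import pandas as pd
# A range of timedeltas can start below zero; here it runs from -4 minutes
# to the question's +5 minute example value in 30-second steps.
time_range = pd.timedelta_range(start='-4min', end='5min', freq='30s')
print(time_range)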
The crux of this is that it's stock data, so the first day of the month might not always be the 1st. I have found a way of isolating these rows, but I don't know how to then edit the dataframe to put a "1" next to each of them.
Hopefully this makes sense.
import pandas_datareader as web
import pandas as pd
import numpy as np
import datetime as dt
df = web.DataReader('AAPL', 'google')
df = df.set_index(pd.to_datetime(df['Date']))
df.sort_index(inplace=True)
print(df.groupby(pd.Grouper(freq='MS')).nth(0))
This is the code I'm using. Currently it prints the first day of each month correctly, but I'm not sure how to make a new column (D_FoM) with a 1 at every one of these dates.
I'm sure it's something easy, but I can't work it out; R is much easier for this sort of thing, I feel.
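For what it's worth, a minimal sketch of one way to add such a flag column, using made-up business-day data in place of the Google feed (which may no longer be available); the column name D_FoM comes from the question:
import pandas as pd
import numpy as np
# Made-up prices on business days, standing in for the downloaded stock data.
idx = pd.bdate_range('2019-01-01', '2019-03-31')
df = pd.DataFrame({'Close': np.random.rand(len(idx))}, index=idx)
# head(1) keeps the original index, giving the first available trading day of
# each month; flag those rows with 1 and every other row with 0.
first_days = df.groupby(pd.Grouper(freq='MS')).head(1).index
df['D_FoM'] = df.index.isin(first_days).astype(int)
print(df[df['D_FoM'] == 1])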
I intend to find the time difference between two time variables in seconds. The issue here is that I am referring to time in a different zone. I have managed to find a solution, but it is a mix of pandas datetime functions and the Python datetime library. I guess the objective can be achieved with pandas/numpy alone and in fewer lines of code. Below is my code; I'd appreciate any guidance on how I can compute final_output more efficiently.
import pandas as pd
from datetime import timedelta
local_time = pd.to_datetime('now').tz_localize('UTC').tz_convert('Asia/Dubai')
t1 = timedelta(hours=local_time.now('Asia/Dubai').hour, minutes=local_time.now('Asia/Dubai').minute)
t2 = timedelta(hours=9, minutes=14)
final_output = (t2 - t1).seconds
You may want to convert both times to UTC, then find the difference. Programmers usually like to work with UTC until the time reaches the front end.
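A pandas-only sketch along those lines, assuming (as in the question) that the target is 09:14 local time in Asia/Dubai:
import pandas as pd
# Build both instants as timezone-aware timestamps, convert to UTC and
# subtract; note that total_seconds() is signed, so the result goes negative
# once 09:14 Dubai time has already passed today.
now_utc = pd.Timestamp.now(tz='UTC')
target_local = pd.Timestamp.now(tz='Asia/Dubai').replace(hour=9, minute=14, second=0, microsecond=0)
final_output = (target_local.tz_convert('UTC') - now_utc).total_seconds()
print(final_output)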
Example
import pytz
b=pytz.timezone('Europe/Rome')
c=pytz.timezone('Europe/Berlin')
These two timezones have different names but represent the same thing. However, b == c returns False, and b.zone is different from c.zone.
Is there any way to see that b is in reality equal to c?
The concrete problem is that I have to convert the timezone of a pandas DataFrame, but only if this zone is different from, let's say, c. The original timezone might be b, and in this case I do not want to convert, as it would be a waste of time to convert b into c (since they represent the same time zone in the end).
Thanks for any help.
Update:
Changed 'CET' to 'Europe/Rome' to make sure that the timezones in the example really are the same, following the feedback from an answer.
They do not represent the same thing.
"CET" is always UTC+01:00
"Europe/Berlin" alternates between CET (UTC+01:00) in the winter, and CEST (UTC+02:00) in the summer.
See also:
The timezone tag wiki - specifically, the section "Time Zone != Offset"
The dst tag wiki - covering daylight saving time.
With regards to the edit, Europe/Rome is a distinct time zone. It is not the same as Europe/Berlin, nor Europe/Zurich, nor Europe/Amsterdam. At least not for their entire histories.
If you compare their definitions (using the links in the prior paragraph), you'll see that these each aligned to the "EU" rule for CET/CEST at some point in their past. Rome and Berlin since 1980, Zurich since 1981, and Amsterdam since 1977. Before those dates, they differed significantly. Other time zones have different rules as well.
If you're interested in the history of these zones, I suggest reading through the europe file in the TZ data. The comments alone are quite interesting.
On the other hand, if you are only working with modern dates, where all zones are following the same rules and offsets, then you could consider these substitutable - at least as long as they don't change in the future.
Also, there are some time zones that are just aliases and are completely interchangeable. In the TZ data, they're called "links". For example, you can see here that Europe/Vatican and Europe/San_Marino are both linked to Europe/Rome, and are therefore equivalent.
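As a small illustration of such a link (assuming Europe/Vatican is present in your tz data, as it is in current pytz releases), an alias and its target report the same offset for any instant:
import pytz
from datetime import datetime
# Europe/Vatican is a link to Europe/Rome in the tz data, so the two zones
# report the same UTC offset whatever the date.
rome = pytz.timezone('Europe/Rome')
vatican = pytz.timezone('Europe/Vatican')
winter = datetime(2019, 1, 1, 12, 0)
summer = datetime(2019, 7, 1, 12, 0)
print(rome.utcoffset(winter) == vatican.utcoffset(winter))  # True (CET, +01:00)
print(rome.utcoffset(summer) == vatican.utcoffset(summer))  # True (CEST, +02:00)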
It's kind of hacky, but I could compare the offsets of both timezones at a given timestamp.
from datetime import datetime
today = datetime.today()
b.utcoffset(today) == c.utcoffset(today)
If the only reason for not wanting to convert is inefficiency, I would question whether avoiding the conversion is really necessary. There is a good blog post by Wes McKinney on vectorized datetime conversion: http://wesmckinney.com/blog/?p=506. As an example, for a series with 1e6 rows:
In [1]: from pandas import *
In [2]: import numpy as np
In [3]: rng = date_range('3/6/2012', periods=1000000, freq='s', tz='US/Eastern')
In [4]: ts = Series(np.random.randn(len(rng)),rng)
In [5]: %timeit ts.tz_convert('utc')
100 loops, best of 3: 2.17 ms per loop
Is there a way to compare the size of two DateOffset objects?
>>> from pandas.core.datetools import *
>>> Hour(24) > Minute(5)
False
This works with timedelta, so I assumed that pandas would inherit that behavior - or is its time system built from scratch?
pandas DateOffset does not inherit from timedelta. It's possible for some DateOffsets to be compared, but for offsets like MonthEnd, MonthBegin, etc., the span of time to the next offset is non-uniform and depends on the starting date.
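As a rough sketch of that distinction (assuming a reasonably recent pandas, where the offset classes live under pd.offsets), fixed-length Tick offsets such as Hour and Minute can be compared by converting them to Timedelta, while calendar offsets cannot:
import pandas as pd
# Hour and Minute are fixed-length "Tick" offsets, so they convert cleanly
# to Timedelta and compare as expected.
print(pd.Timedelta(pd.offsets.Hour(24)) > pd.Timedelta(pd.offsets.Minute(5)))  # True
# MonthEnd spans 28-31 days depending on the anchor date, so it has no
# single Timedelta equivalent and the conversion is rejected.
try:
    pd.Timedelta(pd.offsets.MonthEnd(1))
except (ValueError, TypeError) as exc:
    print('MonthEnd has no fixed length:', exc)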
Please feel free to open a GitHub issue on this at https://github.com/pydata/pandas; we can continue the discussion there, and it'll serve as a reminder.
Thanks.