Pandas: convert datetime timestamp to whether it's day or night? - python

I am trying to determine whether it's day or night based on a list of timestamps. Is it correct to simply classify a timestamp as "day" if the hour is between 7:00 AM and 6:00 PM, and "night" otherwise, as I have done in the code below? I'm not sure about this because sometimes it's still daylight after 6 PM, so what's an accurate way to differentiate between day and night using Python?
Sample data (timezone = UTC/Zulu time):
timestamps = ['2015-03-25 21:15:00', '2015-06-27 18:24:00', '2015-06-27 18:22:00', '2015-06-27 18:21:00', '2015-07-07 07:53:00']
Code:
import datetime

for timestamp in timestamps:
    time = datetime.datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    hr, mi = (time.hour, time.minute)
    if 7 <= hr < 18:
        print("daylight")
    else:
        print("evening or night")
sample output:
evening or night
evening or night
evening or night
evening or night
daylight

You could use PyEphem for this task. It's a
Python package for performing high-precision astronomy computations.
You can set the desired location and get the sun's altitude. There are multiple definitions of night, depending on whether it's for civil (-6°), nautical (-12°) or astronomical (-18°) purposes. Just pick a threshold: if the sun is below it, it's nighttime!
#encoding: utf8
import ephem
import math
import datetime
sun = ephem.Sun()
observer = ephem.Observer()
# ↓ Define your coordinates here ↓
observer.lat, observer.lon, observer.elevation = '48.730302', '9.149483', 400
# ↓ Set the time (UTC) here ↓
observer.date = datetime.datetime.utcnow()
sun.compute(observer)
current_sun_alt = sun.alt
print(current_sun_alt*180/math.pi)
# -16.8798870431°

As a workaround, there is a free API for adhan (Islamic prayer) times. It includes exact sunset and sunrise times. However, you still need location coordinates to obtain the data. It is free at the moment.

Unfortunately, a Python timestamp by itself cannot tell you whether it's day or night. That's because the answer depends on where you are located and on how exactly you define day and night. I'm afraid you will have to get auxiliary data for that.

You need to know both latitude and longitude; in fact, if a place is in a deep valley, sunrise there will be later and sunset earlier. You can pay for this kind of service if you need to query it many times per day, or simply scrape pages like https://www.timeanddate.com/worldclock/uk/london.

Related

How to handle Pandas time series analysis, Daylight Savings time and conversion to other time zones

I am pretty new to pandas and am having trouble with pandas/time-series analysis and Daylight Savings Time.
I have a 1-minute-frequency txt file with NY Daylight Savings Time data.
When I use pytz to localize and convert to UTC and then downsample to 2H or 4H, all data and times match for rows during DST, but do not match for rows during Standard Time (mid-November to mid-March). What I need is for it all to match.
So what I (believe) I need is to somehow normalize the Standard Time rows. As it is, when downsampled to, say, 2H, it is obvious that time changes from even times (midnight, 2am, 4am, 6am, etc.) to odd times (1pm, 3pm, 5pm, etc.). If that can get addressed, then converting to UTC and/or other time zones and resampling (all of which I think I have somewhat figured out) should work.
The closest I have been able to get so far is to:
Take 1min data, Localize to US/Eastern, Converted to UTC.
When resampled to 2H, this matches (all even hours) but is incorrect for dates that fall during Standard Time.
I then tried:
Take 1 min data, Localize to America/New_York, Converted to UTC.
This matches the entire year in resampled 2H and is CORRECT for Standard time, but not for DST.
The code is below, and you'll note that I have commented out several of the blocks of code. That is because I have tried many different combinations of things to try work this out myself.
Should I be using something other than pytz? or? Thanks for the help!
import pandas as pd
import datetime as datetime
import pytz as pytz
from pytz import all_timezones
colnames=['Date', 'Open', 'High', 'Low','Close','Volume']
df = pd.read_csv ("/Users/aiki/Desktop/GC_1min_full.txt", sep=',',names=colnames, header=None,index_col='Date',parse_dates=True)
#Make this naive NY data TZ aware: America/New_York handles DST, US/Eastern does not.
df = df.tz_localize('America/New_York')
#Convert the NY TZ aware data to UTC
df = df.tz_convert(tz= 'UTC')
#Make this naive NY data TZ aware
#df = df.tz_localize('US/Eastern')
#Make this UTC NY localized (again)
#df = df.tz_convert('US/Eastern')
#Convert this data to central time
#df = df.tz_convert(tz= 'America/Chicago')
Adding more info here after guidance from other posters:
My data source says: "All data is in US Eastern Timezone (ie EST/EDT depending on the time of year)".
If I read in original data, df.index says "datetime" but there is no TZ info, and just to check, I: print (df.index.tz) and get "None". This means I have naive TZ data that they say is in DST/ST format.
Since my original post I've learned that:
1 - EST is UTC -5 hours. This is pytz (US/Eastern).
2 - But pytz (America/New_York) is, essentially, EST in the winter and EDT in the summer. And so, importantly, America/New_York deals with DST.
(I believe this is right, please correct if wrong)
After (many) more tries, what I now know is:
EST Convert -
Localize to America/New_York, Convert to UTC,
Do Resample, Convert to America/Chicago.
With validation material set to Chicago tz, this works for EST, but not DST.
#Make this naive NY data TZ aware using America/New_York which handles DST.
df = df.tz_localize(tz= 'America/New_York')
df = df.tz_convert(tz= 'UTC')
(do resample code, etc)
df = df.tz_convert(tz= 'America/Chicago')
DST Convert
Localized to US/Eastern, Convert to US/Central, Do Resample.
With validation material set to Chicago tz, this works for DST but not EST.
df = df.tz_localize('US/Eastern')
df = df.tz_convert(tz= 'US/Central')
(do resample code, etc)
This does not solve the question of how to get the whole year in one shot, but I can live with a two-part solution that loses some data if I have to. Not ideal, but time is finite...
Thank you all for your good ideas- I do appreciate it very much. If you have any other thoughts, I am all ears.
Addressing Ultra909's comments below:
1- Yes, done per my method(s) above, my resampled DST and my (separate) EST data match my charting platform, and also match another public facing charting system.
2- Data is 1min data, and since it's market related, it has no data (or time stamps) during the actual (missing /ambiguous time) 0200 DST/EST switch. So for me, it's hard to tell how it's being handled.
DST in 2021 starts Sunday, 2021-03-14 @ 02:00.
2021-03-12 is Friday, NY
2021-03-14 is Sunday, NY
This from the data:
2021-03-12 16:59:00,65.57,65.60,65.55,65.56,28
#(Market reopens Sunday @ 18:00, so we have timestamps.)
2021-03-14 18:00:00,65.56,65.75,65.54,65.68,238
#(So...the actual DST change is not visible here...)
DST in 2021 ends Sunday 2021-11-07
2021-11-05 is Friday, NY
2021-11-07 is Sunday, NY
2021-11-05 16:59:00,81.15,81.22,81.13,81.17,96
2021-11-07 18:00:00,81.13,81.65,81.05,81.60,974
It could be me (so very likely). Or it could be the data. Either way, I think I've gone as far as I can. Should anyone know data providers where this isn't an issue, I'd be grateful to know about it.
Also, if I continue on this path, I'll need to split out Dec 1 - Feb 28 for every year I have. I'll lose the 2 months involved in the DST/EST switch, but it will have to do. So if anyone has thoughts on how to do that programmatically in pandas en masse, that would be great.
Thank you again for all your input!
So if I understand correctly, the data is in Eastern Standard Time (GMT-5) without any daylight savings?
Then the way I would solve it is to add 5:00:00 to the index across the board and then localise as UTC.
ix = df.index + pd.Timedelta(hours=5)
df_utc = df.set_index(ix).tz_localize("UTC")
You can then tz_convert(..) further, if desired.
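A self-contained sketch of that shift-then-localize idea, with a made-up two-row frame standing in for the asker's file:

```python
import pandas as pd

# Hypothetical 1-minute bars stamped in fixed EST (UTC-5, no DST)
df = pd.DataFrame(
    {'Close': [65.56, 65.68]},
    index=pd.to_datetime(['2021-03-12 16:59:00', '2021-03-14 18:00:00']),
)

# Shift the naive EST index by +5 hours so the wall-clock values become UTC,
# then attach the UTC timezone.
ix = df.index + pd.Timedelta(hours=5)
df_utc = df.set_index(ix).tz_localize('UTC')

# Further conversions now behave normally, e.g. to Chicago time
# (note the two rows straddle the 2021-03-14 DST switch, so their
# UTC offsets to Chicago differ by an hour, as expected).
df_chi = df_utc.tz_convert('America/Chicago')
print(df_chi)
```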

Python - Calendar / Date Library for Arithmetic Date Operations

This is for Python:
I need a library that is able to do arithmetic operations on dates while taking into account the duration of a month and or year.
For example, say I add a value of "1 day" to 3/31/2020; the result should be:
3/31/2020 + 1 day = 4/1/2020.
I also would need to be able to convert this to datetime format, and extract day, year and month.
Does a library like this exist?
import datetime
tday = datetime.date.today() # create today
print("Today:", tday)
""" create one week time duration """
oneWeek = datetime.timedelta(days=7)
""" create 1 day and 1440 minutes of time duraiton """
eightDays = datetime.timedelta(days=7, minutes=1440)
print("A week later than today:", tday + oneWeek) # print today +7 days
And the output to this code snippet is:
Today: 2020-03-25
A week later than today: 2020-04-01
>>>
As you see, it takes month overflow into account and turns March into April. The datetime module has lots of features; I don't know all its attributes well and haven't used it for a long time. However, I believe you can find good documentation and tutorials on the web.
You can definitely create any specific date (within some constraints) instead of today by supplying day, month and year info. I just don't remember the exact call.
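For the record, creating a specific date and doing the month-overflow arithmetic from the question is straightforward:

```python
import datetime

# datetime.date(year, month, day) builds a specific date;
# invalid combinations (e.g. Feb 30) raise ValueError.
d = datetime.date(2020, 3, 31)
result = d + datetime.timedelta(days=1)
print(result)                                  # 2020-04-01
print(result.year, result.month, result.day)   # 2020 4 1
```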

Bi-monthly salary between interval of two dates

I'm trying to program a salary calculator that tells you what your salary is during sick leave. In Costa Rica, where I live, salaries are paid bi-monthly (the 15th and 30th of each month), and for each sick day you get paid 80% of your salary. So, the program asks you what your monthly salary is and then asks for the start date and finish date of your sick leave. Finally, it's meant to print out what you got paid on each payday during your sick leave. This is what I have so far:
import datetime
salario = float(input("What is your monthly salary? "))
fecha1 = datetime.datetime.strptime(input('Start date of sick leave m/d/y: '), '%m/%d/%Y')
fecha2 = datetime.datetime.strptime(input('End date of sick leave m/d/y: '), '%m/%d/%Y')
diasinc = (fecha2 - fecha1).days
print("Number of days in sick leave: ")
print(diasinc)
def daterange(fecha1, fecha2):
    for n in range(int((fecha2 - fecha1).days)):
        yield fecha1 + datetime.timedelta(n)
for single_date in daterange(fecha1, fecha2):
    print(single_date.strftime("%Y-%m-%d"))  # This prints out each individual day between those dates.
I know for the salary I just multiply it by .8 to get 80% but how do I get the program to print it out for each pay day?
Thank you in advance.
Here's an old answer to a similar question from about eight years ago: python count days ignoring weekends ...
... read up on the Python: datetime module and adjust Dave Webb's generator expression to count each time the date is on the 15th or the 30th. Here's another example for counting the number of occurrences of Friday on the 13th of any month.
There are fancier ways to shortcut this calculation using modulo arithmetic. But they won't matter unless you're processing millions of these at a time on lower powered hardware and for date ranges spanning months at a time. There may even be a module somewhere that does this sort of thing, more efficiently, for you. But it might be hard for you to validate (test for correctness) as well as being hard to find.
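A minimal sketch of that counting idea for this question's bi-monthly paydays. The names and the payout formula (half the monthly salary at 80% per payday) are illustrative assumptions, not the asker's actual rules, and note that February has no 30th, which real code would need a rule for:

```python
import datetime

def paydays(start, end):
    """Yield every 15th and 30th of a month between start and end (inclusive)."""
    day = start
    while day <= end:
        if day.day in (15, 30):
            yield day
        day += datetime.timedelta(days=1)

# Example: sick leave from 2020-03-10 to 2020-04-20 on a salary of 1000/month
salario = 1000.0
for payday in paydays(datetime.date(2020, 3, 10), datetime.date(2020, 4, 20)):
    # half a month's salary per payday, at 80% during sick leave
    print(payday, salario / 2 * 0.8)
```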
Note that one approach which might be better in the long run would be to use Python: SQLite3 which should be included with the standard libraries of your Python distribution. Use that to generate a reference table of all dates over a broad range (from the founding of your organization until a century from now). You can add a column to that table to note all paydays and use SQL to query that table and select the dates WHERE payday==True AND date BETWEEN .... etc.
There's an example of how to SQLite: Get all dates between dates.
That approach invests some minor coding effort and some storage space into a reference table which can be used efficiently for the foreseeable future.

Explaining how NOAA UV Index forecast calculates hour in forecast file?

I'm getting the hourly UV index from the FTP site of NOAA. As mentioned here, the forecast time is present in the file name as uv.t12z.grbfXX, where XX is the forecast hour (01 to 120). But inside the grib2 files, the key hour always indicates that the hour is 12. Also, the date doesn't change at all in any of the files; as of today it is 2016/06/04 in all 120 files. I couldn't find the timezone in the keys of the file either.
So I'd like to know: if I want to find the UV index for Delhi (28.6139, 77.2090), timezone Asia/Kolkata, on 2016/06/05 at 2 pm, how can I calculate this from those files? Is it the UV index in the uv.t12z.grbf38 file, assuming 01 in the files is 00:00 hrs on 2016/06/04? I couldn't find anything in the documentation regarding this.
Here's the code snippet for finding the hour:
import pygrib
grbs = pygrib.open('uv.t12z.grbf01.grib2')
grb = grbs.select(name='UV index')[0]
year = grb.year
month = grb.month
day = grb.day
hour = grb.hour
minute = grb.minute
print year,month,day,hour,minute
Output:
>>2016 6 4 12 0
The hour, day, month, year values in the GRIB file only indicate the start time of the forecast (the initial time from which the forecast is produced).
Date and time in this file are in UTC.
To get the forecast date and time in UTC, you need to add the value XX to the initial time.
Then, to get your local time, add your time offset to that value.
For example, suppose I need to convert uv.t12z.grbf38, for which hour, day, month, year = 12, 1, 1, 2016, to Moscow time. First, I add 38 hours to the start time and get 2, 3, 1, 2016 (this time in UTC). Moscow's offset is UTC+3, so the local time is 5:00 on 3 January 2016.
You may also convert date and time the other way round, as described above.
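The Moscow example from this answer, written out as a few lines of datetime arithmetic:

```python
import datetime

# Initial (cycle) time read from the GRIB keys, in UTC:
# hour, day, month, year = 12, 1, 1, 2016
initial = datetime.datetime(2016, 1, 1, 12, 0)

# uv.t12z.grbf38 -> forecast valid 38 hours after the initial time
valid_utc = initial + datetime.timedelta(hours=38)
print(valid_utc)     # 2016-01-03 02:00:00  (UTC)

# Moscow is UTC+3, so add the local offset:
valid_moscow = valid_utc + datetime.timedelta(hours=3)
print(valid_moscow)  # 2016-01-03 05:00:00
```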

How to calculate the angle of the sun above the horizon using pyEphem

I'm new to PyEphem and this is probably a simple question. I want to calculate the angle of the sun above the horizon at a certain GPS point and date. My code is as follows:
import ephem
import datetime
date = datetime.datetime(2010,1,1,12,0,0)
print "Date: " + str(date)
obs=ephem.Observer()
obs.lat='31:00'
obs.long='-106:00'
obs.date = date
print obs
sun = ephem.Sun(obs)
sun.compute(obs)
print float(sun.alt)
print str(sun.alt)
sun_angle = float(sun.alt) * 57.2957795 # Convert Radians to degrees
print "sun_angle: %f" % sun_angle
And the output is:
python sunTry.py
Date: 2010-01-01 12:00:00
<ephem.Observer date='2010/1/1 12:00:00' epoch='2000/1/1 12:00:00' lon=-106:00:00.0 lat=31:00:00.0 elevation=0.0m horizon=0:00:00.0 temp=15.0C pressure=1010.0mBar>
-0.44488877058
-25:29:24.9
sun_angle: -25.490249
Why is the alt negative? The GPS location is somewhere in Mexico and I've specified 12 noon in the date parameter of the observer. The sun should be pretty much directly overhead, so I would have thought that the alt variable would return an angle somewhere in the range of 70-90 degrees? What am I missing here?
Thanks
Stephen
I believe the problem is that in PyEphem, dates are always in UTC. So no matter what the local timezone is for your observer's lat/lon, if you tell it Noon, it assumes you mean Noon in UTC. That means you have to pass in the time you intend already converted to UTC.
The UTC time for "somewhere in Mexico at datetime.datetime(2010,1,1,12,0,0)" is roughly datetime.datetime(2010,1,1,18,0,0).
With this new date, I get the output
sun_angle: 33.672932
That still seems kind of low, but more reasonable than -25.
If you want a programmatic way of doing it, you can (at your own risk) use the pytz module.
import pytz
tz_mexico = pytz.timezone('America/Mexico_City')
# Use localize() to attach the timezone; passing a pytz timezone straight
# to the datetime constructor attaches a wrong (LMT) offset.
mexico_time = tz_mexico.localize(datetime.datetime(2010, 1, 1, 12, 0, 0))
utc_time = mexico_time.astimezone(pytz.utc)
obs.date = utc_time
