Python Time 'Lag' Effect in URL

My server's timezone and the data I fetch with the code below span two consecutive hours. Once the hour changes, the hour that the Python code builds into the URL is not yet found on the server providing the content, because the clock rolls over to the next hour before that hour's data has finished processing. In case you are wondering, the data in question is weather model data in .grib2 format.
I have the following code now:
#!/usr/bin/python
import time
# Save your URL to a variable:
url = time.strftime("http://nomads.ncep.noaa.gov/pub/data/nccf/nonoperational/com/hrrr/para/hrrr.%Y%m%d/hrrr.t%Hz.wrfnatf04.grib2")
# Save that string to a file:
with open('hrrr/hrrrf4.txt', 'a') as f:
    f.write(url + '\n')
Is there a way to 'lag' the %H variable in the URL above by one hour, or another method that will delay it enough to ensure smooth data processing for all desired hours?
Thank you for taking the time to answer my question.

The code below prints the current datetime and then offsets it by subtracting one hour; you could also add an hour, or minutes, seconds, etc. I scrape a lot of forums that are in different timezones from my scraping server, and this is how I adjust for it. It also helps if the server's clock is off a little: you can shift the time backward or forward by however much you need.
import datetime
timenow = datetime.datetime.now()
timeonehourago = timenow - datetime.timedelta(hours=1)
url = timenow.strftime("http://nomads.ncep.noaa.gov/pub/data/nccf/nonoperational/com/hrrr/para/hrrr.%Y%m%d/hrrr.t%Hz.wrfnatf04.grib2")
offseturl = timeonehourago.strftime("http://nomads.ncep.noaa.gov/pub/data/nccf/nonoperational/com/hrrr/para/hrrr.%Y%m%d/hrrr.t%Hz.wrfnatf04.grib2")
print(url)
print(offseturl)
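If you would rather fall back only when the current hour's file really is missing, a minimal sketch along these lines could probe the URL first and step back an hour on failure; the helper name and the HEAD-request check are my own additions, not part of the original answer:
import datetime
import urllib.request
import urllib.error

URL_TEMPLATE = ("http://nomads.ncep.noaa.gov/pub/data/nccf/nonoperational/"
                "com/hrrr/para/hrrr.%Y%m%d/hrrr.t%Hz.wrfnatf04.grib2")

def latest_available_url(max_hours_back=3):
    # Return the most recent hourly URL that the server actually answers for.
    now = datetime.datetime.now()
    for hours_back in range(max_hours_back + 1):
        candidate = (now - datetime.timedelta(hours=hours_back)).strftime(URL_TEMPLATE)
        try:
            # HEAD request: we only care whether the file exists yet.
            request = urllib.request.Request(candidate, method='HEAD')
            with urllib.request.urlopen(request, timeout=10):
                return candidate
        except (urllib.error.HTTPError, urllib.error.URLError):
            continue  # not published yet, try the previous hour
    return None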

Related

How to clear a Set at Midnight?

I have a script that receives information from an endpoint, stores this information and counts the number of occurrences in a set.
At 23:59 the size of the set is written to a file, and I want the set to be cleared by 00:00 (midnight) so the occurrences can be recounted the next day.
from datetime import datetime

def delayedCounter(delayedSet):
    now = datetime.now()
    date = now.strftime('%Y-%m-%d')
    hour = now.strftime('%H:%M')
    with open('delayedData.csv', 'a+') as file:
        if hour == '23:59':
            file.write(f'Nº : {len(delayedSet)} Data: {date}\n')
        elif hour == '00:00':
            delayedSet.clear()
Along with this I am using Flask to display everything in a webapp.
I make the request every minute using apscheduler.
However, it writes to the delayedData.csv file but never resets the set.
Does anyone know what the problem could be? I can make more code available if needed.
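One possible approach, sketched below as my own suggestion rather than something from this thread, is to give the reset its own cron-triggered job at midnight instead of comparing formatted time strings every minute; this assumes APScheduler 3.x, and the function names are mine:
from datetime import datetime
from apscheduler.schedulers.background import BackgroundScheduler

delayedSet = set()

def write_daily_count():
    # Runs at 23:59: record the day's count.
    with open('delayedData.csv', 'a+') as file:
        file.write(f'Nº : {len(delayedSet)} Data: {datetime.now():%Y-%m-%d}\n')

def reset_daily_count():
    # Runs at 00:00: start the new day with an empty set.
    delayedSet.clear()

scheduler = BackgroundScheduler()
scheduler.add_job(write_daily_count, 'cron', hour=23, minute=59)
scheduler.add_job(reset_daily_count, 'cron', hour=0, minute=0)
scheduler.start()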

Getting microseconds past the hour

I am working on a quick program to generate DIS (Distributed Interactive Simulation) packets to stress test a gateway we have. I'm all set and raring to go, except for one small issue: I'm having trouble pulling the current microseconds past the top of the hour correctly.
Currently I'm doing it like this:
from datetime import datetime as dt

now = dt.now()
minutes = int(now.strftime("%M"))
seconds = int(now.strftime("%S")) + minutes*60
microseconds = int(now.strftime("%f")) + seconds*(10**6)
However, when I run this multiple times in a row, I get results all over the place, with numbers that cannot physically be right. Can someone sanity-check my process?
Thanks very much
You can eliminate all that formatting and just do the following:
from datetime import datetime as dt

now = dt.now()
microseconds_past_the_hour = now.microsecond + 1000000*(now.minute*60 + now.second)
Keep in mind that running this multiple times in a row will continually produce different results, as the current time keeps advancing.
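As a side note (my own addition, not part of the original answer), the same value can be computed by subtracting the top of the hour directly, which avoids the manual arithmetic:
from datetime import datetime, timedelta

now = datetime.now()
top_of_hour = now.replace(minute=0, second=0, microsecond=0)
# Integer division by a 1-microsecond timedelta yields whole microseconds.
microseconds_past_the_hour = (now - top_of_hour) // timedelta(microseconds=1)
print(microseconds_past_the_hour)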

How to use search imaplib

I need to download only the emails from the IMAP server that are older than a certain Unix timestamp. The timestamp has to be accurate to the second, and it is updated according to the server time.
Example :
datetime.now().strftime('%d-%b-%Y %H:%M:%s')
'21-Dec-2012 16:50:1356088844'
Problem code:
result, data = imap_server.search(None, '(SINCE '+datetime.now().strftime('%d-%b-%Y')+')')
The search function only accepts 'SINCE' plus a date. It won't take a full date-time argument; it only takes a date, for example '21-Dec-2012'.
How can I solve this?
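IMAP SEARCH criteria such as SINCE are date-only, so one common workaround (a sketch of my own, not taken from this thread; the host and credentials are placeholders) is to let the server do the coarse day-level filtering and then compare each message's INTERNALDATE against the exact timestamp on the client:
import imaplib
import time
from datetime import datetime

cutoff_epoch = 1356088844  # exact Unix timestamp, to the second
cutoff_date = datetime.fromtimestamp(cutoff_epoch).strftime('%d-%b-%Y')

imap_server = imaplib.IMAP4_SSL('imap.example.com')  # hypothetical host
imap_server.login('user', 'password')
imap_server.select('INBOX')

# Coarse, day-level filter done by the server.
result, data = imap_server.search(None, '(SINCE "' + cutoff_date + '")')

matching = []
for num in data[0].split():
    # Exact, second-level filter done on the client via INTERNALDATE.
    result, idate = imap_server.fetch(num, '(INTERNALDATE)')
    received = time.mktime(imaplib.Internaldate2tuple(idate[0]))
    if received >= cutoff_epoch:  # flip the comparison for "older than"
        matching.append(num)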

Run only 10k requests per day and next day another 10k and so on

I'm writing a small program that fetches some data from an API.
I use YQL to access Yahoo's geo services and match GeoNames ids to WOEIDs. For example:
import yql

def get_woeid(geonames_id):
    y = yql.Public()
    query = ('select * from geo.concordance where '
             'namespace="geonames" and text="' + geonames_id + '"')
    result = y.execute(query)
    for row in result.rows:
        print(row.get('woeid'))
This function takes the geonames_id from the database and makes a request to match that id to the WOEID (Where On Earth ID) from Yahoo geo.
The problem is that the API only allows 10k requests per day, so I need some logic that makes 10k requests, waits, and continues with the next 10k the following day.
I could loop over all the data and, once 10k requests have been made, sleep until the next day and then process the rest, but I think this could be done better; I just don't know how.
Hope someone could help out here.
Thank you :)
OK, I'm going to do it like this: I will save the id after each query, write a script that filters for objects with missing WOEIDs and queries them (but no more than 10k), and run the script daily with e.g. kronos.
Thanks to all :)
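A minimal sketch of that daily-batch idea could look like the following; the database helpers (objects_missing_woeid, save_woeid) are hypothetical placeholders, and only the yql calls already shown above are assumed:
import yql

DAILY_LIMIT = 10000

def objects_missing_woeid(limit):
    # Hypothetical DB helper: return up to `limit` records without a WOEID.
    raise NotImplementedError

def save_woeid(record, woeid):
    # Hypothetical DB helper: store the WOEID on the record.
    raise NotImplementedError

def run_daily_batch():
    y = yql.Public()
    for record in objects_missing_woeid(limit=DAILY_LIMIT):
        query = ('select * from geo.concordance where '
                 'namespace="geonames" and text="' + record.geonames_id + '"')
        result = y.execute(query)
        for row in result.rows:
            save_woeid(record, row.get('woeid'))

if __name__ == '__main__':
    # Schedule this script once a day (cron, kronos, etc.).
    run_daily_batch()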

What's the best performing xml parsing for GAE (Python Version)?

I think we all know this page, but the benchmarks it provides date from more than two years ago. So, I would like to know if you could point out the best XML parser around. Since I only need an XML parser, the most important thing to me is speed above everything else.
My objective is to process some XML feeds (about 25k of them, each about 4 KB in size) as a daily task. As you probably know, I'm restricted by the 30-second request timeout. So, what's the best parser (Python only) that I can use?
Thanks for your answers.
Edit 01:
#Peter Recore
I will. I'm writing some code now and plan to run some profiling in the near future. Regarding your question, the answer is no: processing takes very little time compared with downloading the actual XML feed. But I can't increase Google's bandwidth, so that's all I can focus on right now.
My only problem is that I need to do this as fast as possible, because my objective is to get a snapshot of a website's status. Since the web is live and people keep adding and changing its data, any data inserted during the "downloading and processing" window will skew my statistical analysis.
I used to do it from my own computer, and the process took 24 minutes back then, but now the website has 12 times more information.
I know this doesn't answer my question directly, but it does what I needed.
I remembered that XML is not the only file type I could use, so instead of using an XML parser I chose to use JSON, which is about 2.5 times smaller in size and therefore faster to download. I used simplejson as my JSON library.
I used from google.appengine.api import urlfetch to get the json feeds in parallel:
class GetEntityJSON(webapp.RequestHandler):
    def post(self):
        url = 'http://url.that.generates.the.feeds/'
        if self.request.get('idList'):
            idList = self.request.get('idList').split(',')
            try:
                asyncRequests = self._asyncFetch([url + id + '.json' for id in idList])
            except urlfetch.DownloadError:
                # Dealt with timeout errors (#5) as these were very frequent
                pass
            for result in asyncRequests:
                if result.status_code == 200:
                    entityJSON = simplejson.loads(result.content)
                    # Filled a database entity with some json info. It goes like this:
                    # entity = Entity(
                    #     name = entityJSON['name'],
                    #     dateOfBirth = entityJSON['date_of_birth']
                    # ).put()
        self.redirect('/')

    def _asyncFetch(self, urlList):
        rpcs = []
        for url in urlList:
            rpc = urlfetch.create_rpc(deadline = 10)
            urlfetch.make_fetch_call(rpc, url)
            rpcs.append(rpc)
        return [rpc.get_result() for rpc in rpcs]
I tried getting 10 feeds at a time, but most of the time an individual feed raised DownloadError #5 (timeout). I then increased the deadline to 10 seconds and started getting 5 feeds at a time.
But still, fetching 25k feeds 5 at a time means 5k calls; with a queue that can spawn 5 tasks a second, the total task time works out to about 17 minutes.
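For reference, here is a sketch of how those 5-id chunks could be enqueued with the App Engine task queue; the handler route and the enqueue helper are my own assumptions, mirroring the GetEntityJSON handler above:
from google.appengine.api import taskqueue

CHUNK_SIZE = 5  # ids per task, matching the 5-feeds-at-a-time limit

def enqueue_feed_tasks(all_ids):
    # Split the full id list into chunks and hand each chunk to the
    # GetEntityJSON handler via a POST task.
    for start in range(0, len(all_ids), CHUNK_SIZE):
        chunk = all_ids[start:start + CHUNK_SIZE]
        taskqueue.add(url='/get_entity_json',  # hypothetical route for GetEntityJSON
                      params={'idList': ','.join(chunk)})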
