I've got a Python CGI script that pulls data from a GPS service; I'd like this information to be updated on the webpage about once every 10s (the max allowed by the GPS service's TOS). But there could be, say, 100 users viewing the webpage at once, all calling the script.
I think the users' scripts need to grab data from a buffer page that itself only updates once every ten seconds. How can I make this buffer page auto-update if no one is directly viewing the content (and no one is accessing the CGI)? Are there better ways to accomplish this?
Cache the results of your GPS data query in a file or database (sqlite) along with a datetime.
You can then do a datetime check against the last cached datetime to initiate another GPS data query.
You'll probably run into concurrency issues with cgi and the datetime check though...
To get around concurrency issues, you can use sqlite, and put the write in a try/except.
Here's a sample cache implementation using sqlite.
import datetime
import sqlite3
class GpsCache(object):
    db_path = 'gps_cache.db'
    def __init__(self):
        self.con = sqlite3.connect(self.db_path)
        self.cur = self.con.cursor()
    def _get_period(self, dt=None):
        '''normalize time to 15 minute periods'''
        if dt is None:
            dt = datetime.datetime.now()
        if dt.minute < 15:
            minute_period = 0
        elif 15 <= dt.minute < 30:
            minute_period = 15
        elif 30 <= dt.minute < 45:
            minute_period = 30
        else:
            minute_period = 45
        period_dt = datetime.datetime(year=dt.year, month=dt.month, day=dt.day, hour=dt.hour, minute=minute_period)
        return period_dt
    def get_cache(self, dt=None):
        period_dt = self._get_period(dt)
        # bind the period as a parameter instead of formatting it into the SQL string
        select_sql = 'SELECT * FROM GPS_CACHE WHERE date_time = ?;'
        self.cur.execute(select_sql, (period_dt.strftime('%Y-%m-%d %H:%M'),))
        row = self.cur.fetchone()
        # fetchone() returns None when nothing is cached for this period
        return row[0] if row else None
    def put_cache(self, dt=None, data=None):
        period_dt = self._get_period(dt)
        insert_sql = 'INSERT ....' # edit to your table structure
        try:
            self.cur.execute(insert_sql)
            self.con.commit()
        except sqlite3.OperationalError:
            # assume db is being updated by another process with the current results and ignore
            pass
So that's the cache tool; now for the implementation side.
You'll want to check the cache first; if it's not 'fresh' (doesn't return anything), go grab the data using your current method, then cache the data you grabbed.
You should probably organize this better, but you should get the general idea here.
Using this sample, you just replace your current calls to 'remote_get_gps_data' with 'get_gps_data'.
import datetime
from gps_cacher import GpsCache
def remote_get_gps_data():
    # your function here
    return data
def get_gps_data():
    data = None
    gps_cache = GpsCache()
    current_dt = datetime.datetime.now()
    cached_data = gps_cache.get_cache(current_dt)
    if cached_data:
        data = cached_data
    else:
        data = remote_get_gps_data()
        gps_cache.put_cache(current_dt, data)
    return data
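The class above buckets time into 15-minute periods, while the question asks for a 10-second refresh. For that case, comparing a stored timestamp against the current time may be simpler. The following is only a minimal sketch under assumptions: the table layout, the names `open_cache` and `get_fresh`, and the JSON serialization are all hypothetical, and `fetch` stands in for whatever function currently calls the GPS service.

```python
import json
import sqlite3
import time

MAX_AGE_SECONDS = 10  # refresh interval mentioned in the question


def open_cache(path='gps_cache.db'):
    """Open the cache database and create the single-row table if needed."""
    con = sqlite3.connect(path)
    con.execute(
        'CREATE TABLE IF NOT EXISTS gps_cache '
        '(id INTEGER PRIMARY KEY CHECK (id = 1), fetched_at REAL, payload TEXT)'
    )
    con.commit()
    return con


def get_fresh(con, fetch, now=time.time, max_age=MAX_AGE_SECONDS):
    """Return cached data if it is younger than max_age seconds,
    otherwise call fetch() and store its result."""
    row = con.execute(
        'SELECT fetched_at, payload FROM gps_cache WHERE id = 1'
    ).fetchone()
    if row is not None and now() - row[0] < max_age:
        return json.loads(row[1])
    data = fetch()
    try:
        con.execute(
            'INSERT OR REPLACE INTO gps_cache (id, fetched_at, payload) '
            'VALUES (1, ?, ?)',
            (now(), json.dumps(data)),
        )
        con.commit()
    except sqlite3.OperationalError:
        # another process holds the write lock and is presumably
        # storing fresh data; skip our write
        pass
    return data
```

With this approach nothing has to run while nobody is viewing the page: the first request after the data goes stale triggers the refresh. Two requests arriving at the same instant could still both call the service, so this reduces the request rate rather than strictly guaranteeing it.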
I am using Python Snowflake connector to extract data from tables in Snowflake. Here is my file structure:
sql
a.sql
b.sql
c.sql
configurations.py
data_extract.py
main.py
Here the sql folder contains all my SQL queries in .sql files. I keep these SQL files separate because they are hundreds of lines long each and look messy if I put them into Python files.
configurations.py contains datetime parameters I want to change every time I run the code. It looks like this:
START_TIME = '2018-10-01 00:00:00'
END_TIME = '2019-04-01 00:00:00'
I want to add these parameters into the .sql files. For example, a.sql includes the following content:
DECLARE
#START_PICKUP_DATE DATE,
#END_PICKUP_DATE DATE,
SET
#START_PICKUP_DATE = '2018-10-01'
SET
#END_PICKUP_DATE = '2019-04-01'
select supplier_confirmation_id, pickup_datetime, dropoff_datetime, pickup_station_distance
from SANDBOX.ZQIAN.V_PDL
where pickup_datetime >= START_PICKUP_DATE and pickup_datetime < END_PICKUP_DATE
and supplier_confirmation_id is not null;
I use a.sql in my python code in the following way:
import os
import pandas as pd
import snowflake.connector
def executeSQLScriptsFromFile(filepath):
    # snowflake credentials, replace SECRET with your own
    ctx = snowflake.connector.connect(
        user='S_ANALYTICS_USER',
        account=SECRET_A,
        region='us-east-1',
        warehouse=SECRET_B,
        database=SECRET_C,
        role=SECRET_D,
        password=SECRET_E)
    fd = open(filepath, 'r')
    query = fd.read()
    fd.close()
    cs = ctx.cursor()
    try:
        cur = cs.execute(query)
        df = pd.DataFrame.from_records(iter(cur), columns=[x[0] for x in cur.description])
    finally:
        cs.close()
        ctx.close()
    return df
def extract_data():
    # pass the path parts separately; in 'sql\a.sql' the '\a' is read as an escape character
    a_sqlpath = os.path.join(os.getcwd(), 'sql', 'a.sql')
    a_df = executeSQLScriptsFromFile(a_sqlpath)
    return a_df
The problem is I want START_PICKUP_DATE and END_PICKUP_DATE in a.sql file to be synced and equal to START_TIME and END_TIME in configurations.py file so that I only need to change START_TIME and END_TIME in configurations.py and extract data in different timeframe using a.sql in Snowflake.
I've been looking for solutions online for quite a long time, but still not able to find a good solution that is specific to my problem. Many thanks to anyone who can provide a hint!
You should be able to parameterize the sql statements so that instead of declaring in the SQL file you can just make it a parameter passed during execution.
select supplier_confirmation_id, pickup_datetime, dropoff_datetime, pickup_station_distance
from SANDBOX.ZQIAN.V_PDL
where pickup_datetime >= %(START_PICKUP_DATE)s and pickup_datetime < %(END_PICKUP_DATE)s and supplier_confirmation_id is not null;
Then when calling the function, just send the parameters START_PICKUP_DATE and END_PICKUP_DATE as parameters to the execute statement. One way to do this is to do a mapping from the parameter name to the value of the parameter. (In this example I'm assuming you have a function that will get the parameter value).
cur = cs.execute(query, {'START_PICKUP_DATE':get_value_from_config('start_pickup'), 'END_PICKUP_DATE':get_value_from_config('end_pickup')})
Or you can pass them by position, in which case the placeholders in the query are plain %s instead of %(name)s:
cur = cs.execute(query, [get_value_from_config('start_pickup'), get_value_from_config('end_pickup')])
Which in essence becomes
cur = cs.execute(query, ['2018-10-01 00:00:00','2019-04-01 00:00:00'])
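To tie this back to the files in the question, the sketch below reads a query from a .sql file and passes the configuration values as bound parameters. It is only an illustration under assumptions: `run_sql_file` and `pickup_params` are hypothetical helper names, the two constants are copied from the configurations.py shown in the question (in the real project they would be imported from it), and `cursor` is any cursor whose `execute(query, params)` accepts `%(name)s` placeholders.

```python
# Values as they appear in configurations.py in the question;
# in the real project: from configurations import START_TIME, END_TIME
START_TIME = '2018-10-01 00:00:00'
END_TIME = '2019-04-01 00:00:00'


def pickup_params(start=START_TIME, end=END_TIME):
    """Map the placeholder names used in a.sql to the configured values."""
    return {'START_PICKUP_DATE': start, 'END_PICKUP_DATE': end}


def run_sql_file(cursor, filepath, params):
    """Read the query text from filepath and execute it with bound parameters."""
    with open(filepath, 'r') as fd:
        query = fd.read()
    return cursor.execute(query, params)
```

Inside executeSQLScriptsFromFile, the call `cs.execute(query)` would then become something like `run_sql_file(cs, filepath, pickup_params())`, so only configurations.py needs to change between runs.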
To accomplish this, I would take your .sql files and extract the queries into triple-quoted python strings with format specifiers for your variables. Then import the queries into your main script just like you import your configuration:
sql_queries.py:
sql_a = """
DECLARE
#START_PICKUP_DATE DATE,
#END_PICKUP_DATE DATE,
SET
#START_PICKUP_DATE = '{START_TIME}'
SET
#END_PICKUP_DATE = '{END_TIME}'
select supplier_confirmation_id, pickup_datetime, dropoff_datetime, pickup_station_distance
from SANDBOX.ZQIAN.V_PDL
where pickup_datetime >= START_PICKUP_DATE and pickup_datetime < END_PICKUP_DATE
and supplier_confirmation_id is not null;
"""
main:
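import configurations
from sql_queries import sql_a
# the placeholders are named, so the values must be passed as keyword arguments
print(sql_a.format(START_TIME=configurations.START_TIME, END_TIME=configurations.END_TIME))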
I'm writing some code to query data in Elasticsearch. We have huge amounts of data, so I am using the scan feature and searching a specific index. We index Elasticsearch by the day, so for example today = index_2019_04_15 and yesterday = index_2019_04_14. Is there a way I can query only the previous day's index?
Second, if I query _all and then limit the query to, say, 2019-04-14, will I see a big performance hit? If not, then I can just do a previous-day query.
Here's my code:
import pandas as pd
from elasticsearch_dsl import Search
from elasticsearch_dsl import connections
class get_data:
    def __init__(self, host, query):
        self.host = host
        self.query = query
    def pull_es_data(self):
        connections.create_connection(alias='client',hosts=self.host,timeout=60)
        s = Search(using='client', index="data-2019-04-15") \
            .query("match", clientid=r"AB1234-12345")
        response = s.scan()
        return response
test = get_data("localhost","test")
x = test.pull_es_data()
results_df = pd.DataFrame(([item.clientid,item.clientlocation] for item in x),\
                          columns=['clientid','clientlocation'])
I was able to take care of this using Index in Elasticsearch-dsl
from elasticsearch_dsl import Index
def get_index_list(self):
    i = Index("*").get_alias("client")
    return i
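Since the indices are named by day, another option is to compute the previous day's index name and pass it as the `index` argument of `Search`. This is a sketch rather than a tested Elasticsearch setup: it assumes the `data-YYYY-MM-DD` naming used in the code in the question, and the helper names `daily_index_name` and `previous_day_index` are hypothetical.

```python
from datetime import date, timedelta


def daily_index_name(day, prefix="data-", fmt="%Y-%m-%d"):
    """Build the name of the daily index for the given date."""
    return prefix + day.strftime(fmt)


def previous_day_index(today=None, prefix="data-", fmt="%Y-%m-%d"):
    """Return the index name for the day before `today` (default: the current date)."""
    if today is None:
        today = date.today()
    return daily_index_name(today - timedelta(days=1), prefix, fmt)


print(previous_day_index(date(2019, 4, 15)))  # data-2019-04-14
```

The result can replace the hard-coded name, for example `Search(using='client', index=previous_day_index())`. For the `index_2019_04_14` style mentioned in the text, pass `prefix="index_"` and `fmt="%Y_%m_%d"`.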
I am still trying to populate a query with the results of a calendar widget I created. The calendar widget works perfectly and the weeklyrpt.py works perfectly if the message box is answered with a yes response. A no response opens the calendar widget where dates are picked and parsed out individually but I cannot figure out how to integrate the results into my query.
The rundate function is
def rundate():
    global run
    result = tkinter.messagebox.askyesno(title="Rundate", message="back 7 days?")
    #result = tkinter.messagebox.askyesno()
    if result == True:
        run = date.today() - timedelta(7)
        return run
    if result == False:
        os.system("python pick.py")
#opens the pick.py program which is a calendar the after choosing the dates returns this function
def get_rundate():
    start = self.result1.strftime("%m/%d/%Y")
    end = self.result2.strftime("%m/%d/%Y")
    daterange = pd.date_range(start, end)
    for single_date in daterange:
        print(single_date.strftime("%m/%d/%Y"))
    return single_date.strftime("%m/%d/%Y")
#then returns to weeklyreport
rundate = rundate()
print(rundate)
The data I am trying to obtain for the query is the "rundate"
#Query permits made effective since rundate
query = '''select `NPDES_ID`, `EffectiveDate`, `FacilityName`, `StateFacilityID`, `City`, `CountyName`
from Permits
where `EffectiveDate` >= ?
order by `NPDES_ID`'''
#for each row in result of query
for row in cur.execute(query, (rundate)):
    try:
        d= row[1].strftime('%m/%d/%Y')
    except:
        d=""
    for i in range(len(row)):
        if not row[i]:
            row[i] = ""
I need help getting the results from the pick.py widget to return as rundate within weeklyrpt.py. As I said the first part of the rundate function works fine (returns the data request going back 7 days)
I don't know if it's the only problem, but definitely a problem is this:
for row in cur.execute(query, (rundate)):
(rundate) is just rundate wrapped in parentheses, i.e. a single value; (rundate,) is a one-element tuple. cur.execute expects a sequence of parameters, such as a tuple.
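The difference is easy to check in isolation. The sketch below uses an in-memory sqlite3 database purely as a stand-in, since the question does not say which database driver is in use; the cut-down table and its sample row are made up for illustration.

```python
import sqlite3
from datetime import date, timedelta

rundate = date(2019, 4, 15) - timedelta(7)

not_a_tuple = (rundate)   # parentheses alone do nothing: still a date
one_tuple = (rundate,)    # the trailing comma is what makes a tuple

print(type(not_a_tuple).__name__)  # date
print(type(one_tuple).__name__)    # tuple

# Stand-in database with a hypothetical, simplified Permits table
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Permits (NPDES_ID TEXT, EffectiveDate TEXT)")
con.execute("INSERT INTO Permits VALUES ('A1', '2019-04-10')")

query = "SELECT NPDES_ID FROM Permits WHERE EffectiveDate >= ? ORDER BY NPDES_ID"
# parameters are passed as a one-element tuple
rows = con.execute(query, (rundate.isoformat(),)).fetchall()
print(rows)  # [('A1',)]
```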
I am accessing an API and extracting JSON, but I want to make sure I stay within the hourly request limit. What would be the best way to do this?
This is where I make the request:
# return the json
def returnJSONQuestion(id):
    url = 'http://someApi.com?index_id={0}&output=json'
    format_url = url.format(id)
    try:
        urlobject = urllib2.urlopen(format_url)
        jsondata = json.loads(urlobject.read().decode("utf-8"))
        print jsondata
        shortRandomSleep()
    except urllib2.URLError, e:
        print e.reason
    except(json.decoder.JSONDecodeError,ValueError):
        print 'Decode JSON has failed'
    return jsondata
I usually use a cheap hack where I make the script run every other minute by checking the current time. This is the general form of the function:
def minuteMod(x, p=0):
    import datetime
    minute = datetime.datetime.now() + datetime.timedelta(seconds=15)
    minute = int(datetime.datetime.strftime(minute, "%M"))
    if minute % x == p:
        return True
    return False
p is the remainder here and has a default value of 0 so no particular need to pass in the second argument.
So basically, if you want your script to run only every other minute, you use it like this:
def returnJSONQuestion(id):
    if not minuteMod(2):
        return None or ''
    # rest of the code
This will stop the request if the current minute is not even. This is not the best way to do things, though; alternatively, you can use this function to decide when to serve cached results (depending on whether caching is allowed). So basically, you would do something like this:
def returnJSONQuestion(id):
    if minuteMod(3): # current minute is a multiple of 3
        return jsonFromCache # open a file and output cached contents
    else:
        url = 'http://...'
        storeJSONToFile(url)
        return json
You could use a token bucket algorithm, something like this: http://code.activestate.com/recipes/511490/
Have tokens added to the bucket at the rate the API allows you to make requests, and take a token from the bucket each time you make a request.
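As a rough illustration of the idea (this is not the code from the linked recipe), a token bucket can be sketched as below. The class name, its parameters, and the example limit of 60 requests per hour are assumptions; substitute the limit your API actually documents.

```python
import time


class TokenBucket(object):
    """Holds up to `capacity` tokens, refilled at `fill_rate` tokens per second."""

    def __init__(self, capacity, fill_rate, clock=time.monotonic):
        self.capacity = float(capacity)
        self.fill_rate = float(fill_rate)
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def _refill(self):
        # add the tokens earned since the last check, capped at capacity
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.fill_rate)
        self._last = now

    def consume(self, tokens=1):
        """Take `tokens` from the bucket; return False if too few are available."""
        self._refill()
        if tokens <= self._tokens:
            self._tokens -= tokens
            return True
        return False


# Hypothetical limit: 60 requests per hour
bucket = TokenBucket(capacity=60, fill_rate=60 / 3600.0)
if bucket.consume():
    pass  # make the API request here
```

When `consume()` returns False, the caller can sleep briefly and retry, or skip the request. The `clock` argument exists only so the behavior can be checked without waiting in real time.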
I'm trying to fetch results in a python2.7 appengine app using cursors, but each time I use with_cursor() it fetches the same result set.
query = Model.all().filter("profile =", p_key).order('-created')
if r.get('cursor'):
    query = query.with_cursor(start_cursor = r.get('cursor'))
cursor = query.cursor()
objs = query.fetch(limit=10)
count = len(objs)
for obj in objs:
    ...
Each time through I'm getting the same 10 results. I'm thinking it has to do with using end_cursor, but how do I get that value if query.cursor() is returning the start_cursor? I've looked through the docs, but this is poorly documented.
Your formatting is a bit screwy, by the way. Looking at your code (which is incomplete and may therefore be leaving something out), I have to assume you forgot to store the cursor after fetching results (or to return it to the user; I am assuming r is a request?).
So after you have fetched some data, you need to call cursor() on the query. For example, this function counts all entities using a cursor.
def count_entities(kind):
    c = None
    count = 0
    q = kind.all(keys_only=True)
    while True:
        if c:
            q.with_cursor(c)
        i = q.fetch(1000)
        count = count + len(i)
        if not i:
            break
        c = q.cursor()
    return count
See how c = q.cursor() is called after fetch(), and how c is then used as the cursor the next time through the loop.
Here's what finally worked:
query = Model.all().filter("profile =", p_key).order('-created')
if request.get('cursor'):
    query = query.with_cursor(request.get('cursor'))
objs = query.fetch(limit=10)
cursor = query.cursor()
for obj in objs:
    ...