Choose sleep time to retry request - Python

I have a program that makes a web request. I protect against an invalid request by doing the following:
import time

for i in range(5):
    val = makeRequest()
    if val is None:
        time.sleep(i*5)
    else:
        break
My question: is this a good way to protect against invalid web requests (i.e. connection errors, server errors, etc.)? In particular, is an increasing delay in 5-second steps a good choice?
Edit: the function makeRequest() returns None upon an exception.
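A common refinement is exponential backoff with a little random jitter, which spreads retries out when many clients fail at the same time. A minimal sketch of that idea, assuming makeRequest() behaves as described above (returns None on failure):

import random
import time

def make_request_with_backoff(max_attempts=5, base_delay=1, cap=30):
    # Retry makeRequest() with exponential backoff plus jitter.
    # makeRequest() is assumed to return None on any failure, as in the question.
    for attempt in range(max_attempts):
        val = makeRequest()
        if val is not None:
            return val
        # Backoff: 1, 2, 4, 8, ... seconds, capped, plus up to 1 second of jitter.
        time.sleep(min(cap, base_delay * 2 ** attempt) + random.uniform(0, 1))
    return None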

Related

In a single-threaded application server (Flask) that is handling a time-consuming (CPU-bound) request, what happens to other requests?

In short:
I expect that while a single-threaded server is handling a request, other requests must be dropped (so the browser would have to keep resending the HTTP request until it finally gets the response), but in my example this is not happening. Why?
In my simple Flask app I have two routes whose view functions return the following:
/fibo returns a JSON object containing the start and end times of the view function's execution while it calculates Fibonacci(37). I use the recursive version of Fibonacci as a CPU-bound process.
The start_process and end_process values are in the form %M:%S (the current time's minute and second).
/fibo2 contains the same logic as /fibo (I will use it to better illustrate my question).
An example of the returned JSON response looks like this:
{"start_process":"02:02",
"fibo execution time":"7.22","fibo(37)":24157817,
"end_process":"02:09"}
I run the app with the flask run command. I send two GET requests to the /fibo route in separate browser tabs (while the first tab is loading, I send the second request).
The responses are:
{"start_process":"02:02",
"fibo execution time":"7.22",
"fibo(37)":24157817,
"end_process":"02:09"}
and
{"start_process":"02:09",
"fibo execution time":"6.93",
"fibo(37)":24157817,
"end_process":"02:16"}
As the JSON responses illustrate, the second process started only after the first request had finished its job.
Looking at the Wireshark log of this exchange: why, after a separate TCP handshake for each request, was the second request's GET sent only after the response to the first request had been received? I expected to see lost HTTP packets in Wireshark.
I repeat this step, but this time I send the requests to the two different routes /fibo and /fibo2. The results are:
{"start_process":"29:41",
"fibo execution time":"16.30",
"fibo(37)":24157817,
"end_process":"29:58"
}
and
{"start_process":"29:43",
"fibo execution time":"16.21",
"fibo(37)":24157817,
"end_process":"29:59"
}
This time the requests start at approximately the same time (the roughly 2-second offset is probably the time I spent entering the second request in the browser), but the Fibonacci execution time is doubled.
This time, as Wireshark shows, the HTTP requests are sent independently of each other.
Code for the Flask app:
import re
from flask import Flask, request
from fibo import fib_t
from datetime import datetime

app = Flask(__name__)

@app.route('/home')
def hello():
    return 'hello '

@app.route('/fibo')
def cal_fibo():
    start_p = datetime.now().strftime("%M:%S")
    v, duration = fib_t(37)
    end_p = datetime.now().strftime("%M:%S")
    d = {'start_process': start_p, 'fibo(37)': v, 'fibo execution time': duration, 'end_process': end_p}
    return d

@app.route('/fibo2')
def cal_fibo2():
    start_p = datetime.now().strftime("%M:%S")
    v, duration = fib_t(37)
    end_p = datetime.now().strftime("%M:%S")
    d = {'start_process': start_p, 'fibo(37)': v, 'fibo execution time': duration, 'end_process': end_p}
    return d

if __name__ == '__main__':
    app.run(debug=True)
fibo module:
import time

def fib(a):
    if a == 1 or a == 2:
        return 1
    else:
        return fib(a-1) + fib(a-2)

def fib_t(a):
    t = time.time()
    v = fib(a)
    return v, f'{time.time()-t:.2f}'

if __name__ == '__main__':
    print(fib_t(38))
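One way to probe this behaviour is to run the development server with an explicit threaded setting and compare the two experiments again. A minimal sketch, assuming the same app object as above (the flag is passed through to Werkzeug's development server):

if __name__ == '__main__':
    # threaded=False handles one request at a time;
    # threaded=True lets the dev server use a thread per request.
    app.run(debug=True, threaded=False)

Comparing the two settings makes it easier to tell whether the serialisation you see comes from the server itself or from how the browser issues the two requests.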

(Step Functions Activity Worker) Best practice for handling long polling timeouts in boto?

I am working on my first Step Functions Activity Worker (EC2). Predictably, after 5 minutes of long polling with no activity from the Step Functions state machine, the client connection times out with the error:
botocore.exceptions.ReadTimeoutError: Read timeout on endpoint URL: "https://states.us-east-1.amazonaws.com/"
Would it be better to catch the error and retry the long poll (every 5 minutes when no activity is present), or try to terminate the call early and retry before the error? I've thought about using a different type of loop, but I want to maximize the value of long polling and not repeatedly request against the Step Functions API (although if that's the best way I'll do it).
Thank you,
Andrew
import boto3
import time
import json

region = 'us-east-1'
activity_arn = 'arn:aws:states:us-east-1:754112345676:activity:Process_Imagery'

while True:
    client = boto3.client('stepfunctions', region_name=region)
    response = client.get_activity_task(activityArn=activity_arn,
                                        workerName='imagery_processor')
    activity_token = response['taskToken']
    input_params = json.loads(response['input'])
    print("================")
    print(input_params)
    client.send_task_success(taskToken=activity_token, output='true')
I believe I answered my own question here. The AWS documentation states:
"The maximum time the service holds on to the request before responding is 60 seconds. If no task is available within 60 seconds, the poll returns a taskToken with a null string."
However, instead of a string being returned, I believe the JSON response from Step Functions has no 'taskToken' key at all. This while loop works:
import boto3
import time
import json
from botocore.config import Config as BotoCoreConfig

region = 'us-east-1'
boto_config = BotoCoreConfig(read_timeout=70, region_name=region)
sf_client = boto3.client('stepfunctions', config=boto_config)
activity_arn = 'arn:aws:states:us-east-1:754185699999:activity:Process_Imagery'

while True:
    response = sf_client.get_activity_task(activityArn=activity_arn,
                                           workerName='imagery_processor')
    if 'taskToken' not in response:
        print('No Task Token')
        # time.sleep(2)
    else:
        print(response['taskToken'])
        print("===================")
        activity_token = response['taskToken']
        sf_client.send_task_success(taskToken=activity_token, output='true')
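For completeness, the other option raised in the question (catching the read timeout and simply polling again) can also work. Below is a minimal sketch, not from the original post, reusing the sf_client and activity_arn defined above; whether this or the longer read_timeout is preferable depends on how the worker is supervised:

from botocore.exceptions import ReadTimeoutError

while True:
    try:
        response = sf_client.get_activity_task(activityArn=activity_arn,
                                               workerName='imagery_processor')
    except ReadTimeoutError:
        # The long poll outlived the client's read timeout; just poll again.
        continue
    if 'taskToken' in response:
        sf_client.send_task_success(taskToken=response['taskToken'],
                                    output='true')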

How to check for changes in a URL parameter inside a loop with Python Flask

I am trying to start and stop a service running as a Python web app using Flask. The service involves a loop that executes continuously, listening for microphone input and taking some action if the input surpasses a predefined threshold. I can get the program to start executing when the URL is passed with an /on parameter, but once it starts, I can't find a way to stop it. I have tried using request.args.get to monitor the state of the URL parameter and watch for it to change from /on to /off, but for some reason the program doesn't register that I have changed the query string to attempt to halt execution. Is there a better way to execute my code and have it stop when the URL parameter is changed from /on to /off? Any help is greatly appreciated!
import alsaaudio, time, audioop
import RPi.GPIO as G
import pygame
from flask import Flask
from flask import request

app = Flask(__name__)

G.setmode(G.BCM)
G.setup(17, G.OUT)

pygame.mixer.init()
pygame.mixer.music.load("/home/pi/OceanLoud.mp3")

@app.route('/autoSoothe', methods=['GET', 'POST'])
def autoSoothe():
    toggle = request.args.get('state')
    print(toggle)
    if toggle == 'on':
        # Open the device in nonblocking capture mode. The last argument could
        # just as well have been zero for blocking mode. Then we could have
        # left out the sleep call in the bottom of the loop
        inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NONBLOCK, 'null', 0)

        # Set attributes: Mono, 8000 Hz, 16 bit little endian samples
        inp.setchannels(1)
        inp.setrate(8000)
        inp.setformat(alsaaudio.PCM_FORMAT_S16_LE)

        # The period size controls the internal number of frames per period.
        # The significance of this parameter is documented in the ALSA api.
        # For our purposes, it is sufficient to know that reads from the device
        # will return this many frames. Each frame being 2 bytes long.
        # This means that the reads below will return either 320 bytes of data
        # or 0 bytes of data. The latter is possible because we are in nonblocking
        # mode.
        inp.setperiodsize(160)

        musicPlay = 0
        while toggle == 'on':
            toggle = request.args.get('state')
            print(toggle)
            if toggle == 'off':
                break
            # Read data from device
            l, data = inp.read()
            if l:
                try:
                    # Return the maximum of the absolute value of all samples in a fragment.
                    if audioop.max(data, 2) > 20000:
                        G.output(17, True)
                        musicPlay = 1
                    else:
                        G.output(17, False)
                    if musicPlay == 1:
                        pygame.mixer.music.play()
                        time.sleep(10)
                        pygame.mixer.music.stop()
                        musicPlay = 0
                except audioop.error as e:
                    if str(e) != "not a whole number of frames":
                        raise e
            time.sleep(.001)
    return toggle

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=5000, debug=True)
When an HTTP request is made from the client to the Flask server, the client sends one request and waits for a response from the server. This means when you send the state parameter, there is no way for the client to retroactively change it.
There are a few different ways to get your desired behavior.
The first that comes to my mind is to use some asynchronous code. You could have code that starts a thread/process when the state is "on" and then finishes the request. This thread would run your audio loop. Then the client could send another request but with the state being "off", which could alert the other process to gracefully stop.
Here is some information about multiprocessing; however, there is a lot of information about how to do similar things with Flask using Celery, etc.
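As an illustration of that idea, here is a minimal sketch (not the poster's code) using a background thread and a threading.Event as the stop signal; the audio-handling body is elided and only the route and parameter names are kept from the question:

import threading
from flask import Flask, request

app = Flask(__name__)
stop_event = threading.Event()
worker = None

def audio_loop():
    # Placeholder for the microphone/GPIO logic from the question.
    while not stop_event.is_set():
        # ... read from the device and react to it here ...
        stop_event.wait(0.001)  # avoid a busy loop

@app.route('/autoSoothe')
def auto_soothe():
    global worker
    state = request.args.get('state')
    if state == 'on' and (worker is None or not worker.is_alive()):
        stop_event.clear()
        worker = threading.Thread(target=audio_loop, daemon=True)
        worker.start()
    elif state == 'off':
        stop_event.set()  # the running loop sees this and exits gracefully
    return state or ''

Each request now returns immediately; the request with state=off only flips the event that the background loop checks, so nothing blocks inside a view function.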

Stop request processing in CherryPy and return 200 response from a tool

My Question
I am looking for a way to stop request processing in a Tool without raising an exception. In other words: I want to stop the request before it reaches the specified controller and return a 2xx status code.
Background
We want our application to support CORS and therefore the preflight request. The idea was to write a tool which hooks before_handler. If an OPTIONS request is made, return the relevant CORS-headers and exit.
The problem is that I haven't found a way to stop the execution flow. This means that the original URL is processed as it would be if requested normally. The point is: this could lead to security issues, as the preflight request is always made. This is what I have so far:
class CORSTool(cherrypy.Tool):
    def __init__(self):
        cherrypy.Tool.__init__(self, 'before_handler', self.insert_cors_header, priority=40)
        cherrypy.request.hooks.attach('before_handler', self.insert_cors_header, priority=40)

    def _setup(self):
        cherrypy.Tool._setup(self)

    def insert_cors_header(self):
        """
        Inserts the relevant CORS-Headers:
        - Access-Control-Allow-Origin: <from-config>
        - Access-Control-Allow-Methods: POST, GET, OPTIONS
        - Access-Control-Max-Age: 86400
        """
        if cherrypy.config.get('enable_cors') is True:
            if cherrypy.request.method.upper() == "OPTIONS":
                cherrypy.response.headers['Access-Control-Allow-Origin'] = cherrypy.config.get('cors_domains')
                cherrypy.response.headers['Access-Control-Allow-Methods'] = 'POST, GET, OPTIONS'
                cherrypy.response.headers['Access-Control-Max-Age'] = '86400'
                # Stop execution
                cherrypy.response.body = None
                cherrypy.response.status = 200
                cherrypy.response.finalize()
                # And NOW stop doing anything else...
Alternatives likely not working
I know that there is a cherrypy-cors plugin, but from the source I can't see how this stops the execution.
I also know that CherryPy has a MethodDispatcher, but that would mean a complete rewrite of our code.
Searching Stack Overflow I found this answer; however, I don't want to "kill" the execution, I just want a way to prevent the handler from being called.
Just have your Tool set request.handler = None. See Request.respond for the code which implements this and the CachingTool for an example:
request = cherrypy.serving.request
if _caching.get(**kwargs):
    request.handler = None
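Applied to the tool from the question, that would mean ending the OPTIONS branch of insert_cors_header by clearing the handler. A minimal sketch of just that method (the rest of the class as in the original tool, with cherrypy imported as before):

    def insert_cors_header(self):
        if cherrypy.config.get('enable_cors') is True:
            if cherrypy.request.method.upper() == "OPTIONS":
                cherrypy.response.headers['Access-Control-Allow-Origin'] = cherrypy.config.get('cors_domains')
                cherrypy.response.headers['Access-Control-Allow-Methods'] = 'POST, GET, OPTIONS'
                cherrypy.response.headers['Access-Control-Max-Age'] = '86400'
                cherrypy.response.status = 200
                # Skip the page handler entirely; CherryPy still runs the
                # response phase and returns the headers set above.
                cherrypy.serving.request.handler = None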

Errno 10054 while scraping HTML with Python: how to reconnect

I'm a novice Python programmer trying to use Python to scrape a large amount of pages from fanfiction.net and deposit a particular line of the page's HTML source into a .csv file. My program works fine, but eventually hits a snag where it stops running. My IDE told me that the program has encountered "Errno 10054: an existing connection was forcibly closed by the remote host".
I'm looking for a way to get my code to reconnect and continue every time I get the error. My code will be scraping a few hundred thousand pages every time it runs; is this maybe just too much for the site? The site doesn't appear to prevent scraping. I've done a fair amount of research on this problem already and attempted to implement a retry decorator, but the decorator doesn't seem to work. Here's the relevant section of my code:
import time
import urllib.request
import urllib.error
from functools import wraps

def retry(ExceptionToCheck, tries=4, delay=3, backoff=2, logger=None):
    def deco_retry(f):
        @wraps(f)
        def f_retry(*args, **kwargs):
            mtries, mdelay = tries, delay
            while mtries > 1:
                try:
                    return f(*args, **kwargs)
                except ExceptionToCheck as e:
                    msg = "%s, Retrying in %d seconds..." % (str(e), mdelay)
                    if logger:
                        logger.warning(msg)
                    else:
                        print(msg)
                    time.sleep(mdelay)
                    mtries -= 1
                    mdelay *= backoff
            return f(*args, **kwargs)
        return f_retry  # true decorator
    return deco_retry

@retry(urllib.error.URLError, tries=4, delay=3, backoff=2)
def retrieveURL(URL):
    response = urllib.request.urlopen(URL)
    return response

def main():
    # first check: 5000 to 100,000
    MAX_ID = 600000
    ID = 400001
    URL = "http://www.fanfiction.net/s/" + str(ID) + "/index.html"
    fCSV = open('buffyData400k600k.csv', 'w')
    fCSV.write("Rating, Language, Genre 1, Genre 2, Character A, Character B, Character C, Character D, Chapters, Words, Reviews, Favorites, Follows, Updated, Published, Story ID, Story Status, Author ID, Author Name" + '\n')

    while ID <= MAX_ID:
        URL = "http://www.fanfiction.net/s/" + str(ID) + "/index.html"
        response = retrieveURL(URL)
Whenever I run the .py file outside of my IDE, it eventually locks up and stops grabbing new pages after about an hour, tops. I'm also running a different version of the same file in my IDE, and that appears to have been running for almost 12 hours now, if not longer. Is it possible that the file could work in my IDE but not when run independently?
Have I set my decorator up wrong? What else could I potentially do to get Python to reconnect? I've also seen claims that the SQL native client being out of date could cause problems for a Windows user such as myself - is this true? I've tried to update it but had no luck.
Thank you!
You are catching URLError, which Errno 10054 is not, so your @retry decorator is not going to retry. Try this:
@retry(Exception, tries=4)
def retrieveURL(URL):
    response = urllib.request.urlopen(URL)
    return response
This should retry 4 times on any Exception. Your @retry decorator is defined correctly.
Your code for reconnecting looks good except for one part - the exception that you're trying to catch. According to this Stack Overflow question, an Errno 10054 is a socket.error. All you need to do is import socket and add an except socket.error clause in your retry handler.
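Putting the two answers together, a sketch of what the decorator line might look like if you prefer to stay narrower than catching every Exception (this reuses the retry decorator defined above; socket.error is an alias of OSError on Python 3, and URLError is kept from the original code):

import socket
import urllib.error
import urllib.request

@retry((urllib.error.URLError, socket.error), tries=4, delay=3, backoff=2)
def retrieveURL(URL):
    return urllib.request.urlopen(URL)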
