Context:
Imagine that you have a standard CherryPy hello world app:
def index(self):
    return "Hello world!"
index.exposed = True
and you would like to do some post-processing, e.g. record request processing or just log the fact that we were called from a specific IP. What you would probably do is:
def index(self):
    self.RunMyPostProcessing()
    return "Hello world!"
index.exposed = True
However, that will add to your request processing time. (By the way, you would probably use decorators, or an even more sophisticated method, if you wanted to call it on every function.)
Question:
Is there a way of creating a global thread-aware queue (buffer) to which each request can write messages (events) that need to be logged, while some magic function grabs them and post-processes them? Would you know a pattern for such a thing?
I bet that CherryPy supports something like that :-)
Thank you in advance...
The "global threading aware queue" is called Queue.Queue.
As I was looking for this and it's now outdated, I found it useful to provide the correct (2012-ish) answer. Simply add this at the beginning of the function that handles your URL:
cherrypy.request.hooks.attach('on_end_request', mycallbackfunction)
There's more info on hooks in the documentation, but it's not very clear to me.
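Put together, a minimal sketch of how that attach call might sit inside an exposed handler; the log_request callback, its logged message, and the Root class are placeholders of my own, only the hooks.attach line comes from the answer above.

import cherrypy

def log_request():
    # Called by CherryPy once this request has finished.
    cherrypy.log("finished request from %s" % cherrypy.request.remote.ip)

class Root(object):
    def index(self):
        # Run log_request at the end of the request, without delaying the response.
        cherrypy.request.hooks.attach('on_end_request', log_request)
        return "Hello world!"
    index.exposed = True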
Related
I am working on a Python Flask app, and the main method start() calls an external API (third_party_api_wrapper()). That external API has an associated webhook (webhook()) that receives the output of that external API call (note that the output that webhook() receives is actually different from the response returned by third_party_api_wrapper()).
The main method start() needs the result of webhook(). How do I make start() wait for webhook() to be executed? And how do we pass the return value of webhook() back to start()?
Here is a minimal code snippet to capture the scenario.
@app.route('/webhook', methods=['POST'])
def webhook():
    return "webhook method has executed"

# this method has a webhook that calls webhook() after this method has executed
def third_party_api_wrapper():
    url = 'https://api.thirdparty.com'
    response = requests.post(url)
    return response

# this is the main entry point
@app.route('/start', methods=['POST'])
def start():
    third_party_api_wrapper()
    # The rest of this code depends on the output of webhook().
    # How do we wait until webhook() is called, and how do we access the returned value?
The answer to this question really depends on how you plan on running your app in production. It's much simpler if we make the assumption that you only plan to have a single instance of your app running at once (as opposed to multiple behind a load balancer, for example), so I'll make that assumption first to give you a place to start, and comment on a more "production-ready" solution afterwards.
A big thing to keep in mind when writing a web application is that you have to understand how you want the outside world to interact with your app. Do you expect to have the /start endpoint called only once at the beginning of your app's lifetime, or is this a generic endpoint that may start any number of background processes that you want the caller of each to wait for? Or, do you want the behavior where any caller after the first one will wait for the same process to complete as the first one? I can't answer these questions for you, it depends on the use-case you're trying to implement. I'll give you a relatively simple solution that you should be able to modify to fulfill any of the ones I mentioned though.
This solution will use the Event class from the threading standard library module; I added some comments to clarify which parts you may have to change depending on the specifics of the API you're calling and stuff like that.
import threading
import uuid
from typing import Any

import requests
from flask import Flask, Response, request

# The base URL for your app, if you're running it locally this should be fine
# however external providers can't communicate with your `localhost` so you'll
# need to change this for your app to work end-to-end.
BASE_URL = "http://localhost:5000"

app = Flask(__name__)


class ThirdPartyProcessManager:
    def __init__(self) -> None:
        self.events = {}
        self.values = {}

    def wait_for_request(self, request_id: str) -> Any:
        event = threading.Event()
        actual_event = self.events.setdefault(request_id, event)
        if actual_event is not event:
            raise ValueError(f"Request {request_id} already exists.")

        event.wait()
        return self.values.pop(request_id)

    def finish_request(self, request_id: str, value: Any) -> None:
        event = self.events.pop(request_id, None)
        if event is None:
            raise ValueError(f"Request {request_id} does not exist.")

        self.values[request_id] = value
        event.set()


MANAGER = ThirdPartyProcessManager()


# This is assuming that you can specify the callback URL per-request, otherwise
# you may have to get the request ID from the body of the request or something
@app.route('/webhook/<request_id>', methods=['POST'])
def webhook(request_id: str) -> Response:
    MANAGER.finish_request(request_id, request.json)
    return "webhook method has executed"


# Somehow in here you need to create or generate a unique identifier for this
# request--this may come from the third-party provider, or you can generate one
# yourself. There are two main paths I see here:
# - If you can specify the callback/webhook URL in each call, you can just pass them
#   <base>/webhook/<request_id> and use that to identify which request is being
#   responded to in the webhook.
# - If the provider gives you a request ID, you can return it from this function
#   then retrieve it from the request body in the webhook route
# For now, I'll assume the first situation but you should be able to implement the second
# with minimal changes
def third_party_api_wrapper() -> str:
    request_id = uuid.uuid4().hex
    url = 'https://api.thirdparty.com'

    # Just an example, I don't know how the third party API you're working with works
    response = requests.post(
        url,
        json={"callback_url": f"{BASE_URL}/webhook/{request_id}"}
    )

    # NOTE: unrelated to the problem at hand, you should always check for errors
    # in HTTP responses. This method is an easy way provided by requests to raise
    # for non-success status codes.
    response.raise_for_status()
    return request_id


@app.route('/start', methods=['POST'])
def start() -> Response:
    request_id = third_party_api_wrapper()
    result = MANAGER.wait_for_request(request_id)
    return result
If you want to run the example fully locally to test it, do the following:
Comment out the requests.post(...) call and response.raise_for_status() in third_party_api_wrapper(), which actually make the external API call
Add a print statement right after request_id is generated in third_party_api_wrapper(), so that you can get the ID of the "in flight" request. E.g. print("Request ID", request_id)
In one terminal, run the app by pasting the above code into an app.py file and running flask run in that directory.
In another terminal, start the process via:
curl -XPOST http://localhost:5000/start
Copy the request ID that will be logged in the first terminal that's running the server.
In a third terminal, complete the process by calling the webhook:
curl -XPOST http://localhost:5000/webhook/<your_request_id> -H "Content-Type: application/json" -d '{"foo":"bar"}'
You should see {"foo":"bar"} as the response in the second terminal that made the /start request.
I hope that's enough to help you get started w/ whatever problem you're trying to solve.
There are a couple of design-y comments I have based on the information provided as well:
As I mentioned before, this will not work if you have more than one instance of the app running at once. This works by storing the state of in-flight requests in a global state inside your python process, so if you have more than one process, they won't all be working and modifying the same state. If you need to run more than one instance of your process, I would use a similar approach with some database backend to store the shared state (assuming your requests are pretty short-lived, Redis might be a good choice here, but once again it'll depend on exactly what you're trying to do).
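As a rough sketch of what that might look like with Redis (using the redis-py package; the key naming, JSON encoding, and timeout are assumptions on my part), the webhook pushes the payload onto a per-request list and the waiting request blocks on it, which works across any number of app processes sharing the same Redis instance:

import json
import redis

r = redis.Redis()  # assumes a Redis instance reachable with default settings

def wait_for_request(request_id, timeout=30):
    # BLPOP blocks until the webhook pushes a value or the timeout expires.
    item = r.blpop("webhook:" + request_id, timeout=timeout)
    if item is None:
        raise TimeoutError("No webhook received for " + request_id)
    _key, payload = item
    return json.loads(payload)

def finish_request(request_id, value):
    r.rpush("webhook:" + request_id, json.dumps(value))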
Even if you do only have one instance of the app running, flask is capable of being run in a variety of different server contexts--for example, the server might be using threads (the default), greenlets via gevent or a similar library, or multiple processes, or maybe some other approach entirely in order to handle multiple requests concurrently. If you're using an approach that creates multiple processes, you should be able to use the utilities provided by the multiprocessing module to implement the same approach as I've given above.
This approach probably will work just fine for something where the difference in time between the API call and the webhook response is small (on the order of a couple of seconds at most I'd say), but you should be wary of using this approach for something where the difference in time can be quite large. If the connection between the client and your server fails, they'll have to make another request and run the long-running process that your third party is completing for you again. Some proxies and load balancers may also have time out behavior that could terminate the request after a certain amount of time even if nothing goes wrong in the connection between your server and the client making a request to it. An alternative approach would be for your /start endpoint to return quickly and give the client a request_id that they could poll for updates. As an example, AWS Athena's API is structured like this--there is a StartQueryExecution method, and separate GetQueryExecution and GetQueryResults methods that the client makes requests to check the status of a query and retrieve the results respectively (there are also other methods like StopQueryExecution and GetQueryRuntimeStatistics available as well). You can check out the documentation here.
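A rough sketch of that polling-style alternative, meant as a replacement for the blocking /start and /webhook routes above and reusing app, request, and third_party_api_wrapper() from that example. The endpoint names, the response shapes, and the in-memory RESULTS dict are my own assumptions (and returning dicts like this relies on Flask's built-in dict-to-JSON responses, available since Flask 1.1):

RESULTS = {}  # request_id -> webhook payload; in-memory, single-process only

@app.route('/start', methods=['POST'])
def start():
    request_id = third_party_api_wrapper()
    # Return immediately; the client polls /status/<request_id> for the result.
    return {"request_id": request_id, "status": "PENDING"}

@app.route('/webhook/<request_id>', methods=['POST'])
def webhook(request_id):
    RESULTS[request_id] = request.json
    return "webhook method has executed"

@app.route('/status/<request_id>', methods=['GET'])
def status(request_id):
    if request_id not in RESULTS:
        return {"status": "PENDING"}
    return {"status": "DONE", "result": RESULTS[request_id]}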
I know that's a lot of info, but I hope it helps. Happy to update the answer w/ more specific info if you'll provide some more details about your use-case.
How do I redirect within Flask without changing the URL? I've looked for a while now and haven't found a good solution. There are other threads for .htaccess and nginx, but they aren't applicable to this situation.
The only solution I have is:
# url_root is currently http://127.0.0.1:5000/
# args = t/Some%20text?p=preset
res = requests.get(request.url_root + args)
return res.text, res.status_code, res.headers.items()
It's essentially just visiting the url, then copying the data and sending it. While this does work, it is very slow, around 300-400ms wait time on a local server.
I feel like there's got to be some way to reroute the request, but I can't find it.
Edit: I am not looking for a basic Flask proxy, as that does the same thing that I am already doing. It contains the same bottleneck that I want to avoid. I want something that will call a different page from within Flask.
I ended up doing the following:
@app.route('/t/<urltext>', defaults={'args': None})
def index(urltext, args):
    # Stuff and things
    ...

# later...
@app.route('/l/<urlcode>')
def little(urlcode):
    # Processing urlcode
    return index(urltext=text, args=args)
It's basically just avoiding the arguments problem but it works
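For what it's worth, another way to do the internal "forward" without a second HTTP round trip is to look the view up in Flask's app.view_functions registry and call it directly. A sketch along those lines, assuming request is already imported as in the question and treating the route, endpoint name, and arguments as placeholders:

@app.route('/alias/<urltext>')
def alias(urltext):
    # app.view_functions maps endpoint names to the registered view callables,
    # so this dispatches to index() in-process instead of re-requesting our own URL.
    view = app.view_functions['index']
    return view(urltext=urltext, args=request.args.get('p'))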
In circuits 3.1.0, is there a way to set at runtime the channel for a handler?
A useful alternative would be to add a handler at runtime and specify the channel.
I've checked the Manager.addHandler implementation but couldn't make it work. I tried:
self._my_method.__func__.channel = _my_method_channel
self._my_method.__func__.names = ["event name"]
self.addHandler(self._my_method)
Yes there is; however, it's not really a publicly exposed API.
Example (of creating event handlers at runtime):
from circuits import Manager, handler

@handler("foo")
def on_foo(self):
    return "Hello World!"

def test_addHandler():
    m = Manager()
    m.start()

    m.addHandler(on_foo)
This is taken from tests.core.test_dynamic_handlers
NB: Every BaseComponent/Component subclass is also a subclass of Manager and has the .addHandler() and .removeHandler() methods. You can also apply the @handler() decorator dynamically like this:
def on_foo(...):
    ...

self.addHandler(handler("foo")(on_foo))
You can also see a good example of this in the library itself with circuits.io.process where we dynamically create event handlers for stdin, stdout and stderr.
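To tie this back to the original question about choosing the channel at runtime, a sketch of how that could look, assuming handler() accepts the same channel keyword it takes for statically defined handlers (the channel name and handler body are placeholders):

def on_foo(self, *args):
    return "Hello World!"

# Wrap on_foo with a channel chosen at runtime, then register it.
my_channel = "my-runtime-channel"
self.addHandler(handler("foo", channel=my_channel)(on_foo))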
Why does this not print anything:
for item in pipe.json["value"]["items"]:
    print item["pubDate"]
but this does:
for item in pipe.json["value"]["items"]:
    print item["pubDate"] + "\n"
P.S. The loop is running inside another loop.
P.P.S. This is running inside a Google App Engine application. I have looked at the HTTP response and it is completely empty in the first case.
It might be a problem with buffering, in which case flushing stdout would help.
import sys
sys.stdout.flush()
Are you using some sort of WSGI framework, or just trying to write pure CGI code (which would be a mistake)?
You probably don't want to be using print at all here, but rather using your framework's method of adding to the response (for webapp, self.response.out.write). My guess would be that without the extra \n, you're writing all of this data to the HTTP headers, and with it you're only losing the first line of your output.
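With GAE's old webapp framework, for example, that would look roughly like this (treating pipe as whatever object you already have in scope; the handler class name is just a placeholder):

from google.appengine.ext import webapp

class PubDatesHandler(webapp.RequestHandler):
    def get(self):
        # Write to the response body instead of printing to stdout.
        for item in pipe.json["value"]["items"]:
            self.response.out.write(item["pubDate"] + "\n")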
On GAE, if you want to use print for output, you'll have to print an empty string before any other printing so that this kind of problem won't happen:
print ""
print "something"
This is just a wild guess, but if item['pubDate'] is a non-string object, it might be a result of differences between special methods. Perhaps the __str__ method returns nothing, while the __add__ method does something different.
I've been playing around with the pybluez module recently to scan for nearby Bluetooth devices. What I want to do now is extend the program to also find nearby WiFi client devices.
The WiFi client scanner will need to have a while True loop to continually monitor the airwaves. If I were to write this as a straight-up, one-file program, it would be easy.
import ...

while True:
    client = scan()
    print client['mac']
What I want, however, is to make this a module. I want to be able to reuse it later and, possibly, have others use it too. What I can't figure out is how to handle the loop.
import mymodule
scan()
Assuming the first example code was 'mymodule', this program would simply print out the data to stdout. I would want to be able to use this data in my program instead of having the module print it out...
How should I code the module?
I think the best approach is going to be to have the scanner run on a separate thread from the main program. The module should have methods that start and stop the scanner, and another that returns the current access point list (using a lock to synchronize). See the threading module.
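A minimal sketch of that shape of module; scan() here stands in for whatever pybluez/WiFi scan call you already have, and all the other names are illustrative:

# scanner.py
import threading

_lock = threading.Lock()
_stop = threading.Event()
_thread = None
_clients = {}  # mac -> latest record seen for that client

def _run():
    while not _stop.is_set():
        client = scan()  # your existing scan() call
        with _lock:
            _clients[client['mac']] = client

def start():
    global _thread
    _stop.clear()
    _thread = threading.Thread(target=_run)
    _thread.daemon = True
    _thread.start()

def stop():
    _stop.set()
    if _thread is not None:
        _thread.join()

def get_clients():
    # Snapshot of everything seen so far; safe to call from the main program.
    with _lock:
        return list(_clients.values())

The main program would then call start(), poll get_clients() whenever it wants the current list, and call stop() on shutdown.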
How about something pretty straightforward like:
mymodule.py
import ...

def scanner():
    while True:
        client = scan()
        yield client['mac']
othermodule.py
import mymodule

for mac in mymodule.scanner():
    print mac
If you want something more useful than that, I'd also suggest a background thread as @kindall did.
Two interfaces would be useful.
scan() itself, which returns a list of found devices, so that I could call it to get an instantaneous snapshot of available Bluetooth devices. It might take a max_seconds_to_search or a max_num_to_return parameter.
A "notify on found" function that accepts a callback. For instance (maybe typos, I just wrote this off the cuff):
import time

def find_bluetooth(callback_func, time_to_search=5.0):
    already_found = []
    start_time = time.clock()
    while 1:
        if time.clock() - start_time > time_to_search:
            break
        found = scan()
        for entry in found:
            if entry not in already_found:
                callback_func(entry)
                already_found.append(entry)
which would be used by doing this:
def my_callback(new_entry):
    print new_entry  # or something more interesting...
find_bluetooth(my_callback)
If I understand your question correctly, you want scan() in a separate file so that it can be reused later.
Create utils.py
def scan():
    # write code for scan here.
    pass
Create WiFi.py
import utils

def scan_wifi():
    while True:
        cli = utils.scan()
        ...
    return