Dispatch isn't routing internal queue requests properly - python

I have a process a user can launch which inserts a bunch of items into a queue. This queue sends a request to the worker's URL /pipeline/foo, and the handler takes it from there.
The handler that consumes items coming into /pipeline/foo lives in a separate module, module-pipeline.
The issue is that in production, dispatch.yaml doesn't seem to dispatch the internal requests made by the queue to the right module. However, if I manually send a request to the URL (as a user typing the URL into the browser), I get dispatched to the right module...
dispatch.yaml
application: my-app
dispatch:
  - url: "*/pipeline/*"
    module: module-pipeline
load-queue.py
# does some things before...
for url in urls_to_load:
    task = taskqueue.Task(url='/pipeline/foo', params={'id': url.key.id()})
    queue.add(task)
This works fine on the dev_appserver. However, in production, when the queue sends the request to /pipeline/foo, it is processed by the default module (returning a 404, as the handler is not implemented there), whereas if I send the same request manually through my browser (a GET request), it is processed by module-pipeline.
Any ideas what's wrong?

Try using the target parameter when defining the queue in queue.yaml, instead of the routing defined in dispatch.yaml.
Target parameter
A string naming a module/version, a frontend version, or a backend, on which to execute all of the tasks enqueued onto this queue.
The module (or frontend or backend) and version in which the handler runs is determined by:
The target or header keyword arguments in your call to the Task() constructor.
The target directive in the queue.yaml file.
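For example, assuming your queue is called pipeline-queue (the queue name here is an assumption, not from the question), queue.yaml might look like:

```yaml
queue:
- name: pipeline-queue
  rate: 5/s
  # route every task enqueued on this queue to the pipeline module
  target: module-pipeline
```

Alternatively, you can set the target per task by passing target='module-pipeline' to the Task() constructor.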

Related

Load the value of a route dynamically in Sanic at startup

I want to do an asynchronous HTTP call to an external service at server startup and get an URL from there which I can then use in my own Sanic routing. E.g., fetch the string which needs to be my actual route via an httpx call (for the sake of simplicity, let's say the string returned by the external service is api/users/) and then use it as a route in my Sanic microservice.
Unfortunately it seems a before_server_start listener does not do the trick, as that is run after routes are loaded and I get a FinalizationError("Cannot finalize router more than once.") if I try to update the string value of a route.
Any ideas of how else I could hook up my call before defining / adding routes? I would like to keep it as coupled as possible to the Sanic app, i.e., not use a utility script that would run before it, but instead have the call to the external service triggered every time the app starts.
You can do it inside the before_server_start listener with a small workaround:
@app.before_server_start
async def setup_dynamic_routes(app, _):
    app.router.reset()
    # add your routes here
    app.router.finalize()

How to handle callbacks in Python 3?

I have a custom HTTP method/verb (let's say LISTEN) which allows me to listen for an update on a resource stored on a remote server. The API available for this has a blocking call which keeps my client code listening for updates until I interrupt the execution of that call. For example, if I were to perform a curl as follows:
curl -X LISTEN http://<IP-Address>:<Port>/resource
Executing this creates a blocking call that provides updates on the resource whenever a new value is pushed to the server (similar to a pub/sub model). The response would look similar to this:
{"data":"value update 1","id":"id resource"}
{"data":"value update 2","id":"id resource"}
(...)
If I were to write code to handle this in Python, how do I call my url using this custom verb and handle the blocking call/call back while ensuring that this does not block the execution of the rest of my code?
If you're using Python requests lib with a custom HTTP verb and need to read stream content, you can do something like this:
import json
import requests  # sudo pip3 install requests

url = "http://........."
r = requests.request('LISTEN', url, stream=True)
for line in r.iter_lines():
    # filter out keep-alive new lines
    if line:
        decoded_line = line.decode('utf-8')
        print(json.loads(decoded_line))
Note: by default all requests calls are blocking, so you need to run this code in a separate thread/process to avoid that.
...while ensuring that this does not block the execution of the rest of my code
Since you provided no details about your application, I will try to list some general thoughts on the question.
Your task can be solved in many ways; the solution depends on your app's architecture.
If this is a web server, you can take a look at Tornado (see its streaming callback) or the aiohttp streaming examples.
Alternatively, you can run the code above in a separate process and communicate with other applications/services using, for example, RabbitMQ (or another IPC mechanism).
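As a sketch of the thread-based approach, the streaming loop above can be moved onto a worker thread (listen_in_background and on_update are illustrative names, not part of the requests library):

```python
import json
import threading

def listen_in_background(lines, on_update):
    """Consume a blocking line stream on a daemon thread,
    calling on_update for each decoded JSON object."""
    def worker():
        for line in lines:
            if line:  # skip keep-alive newlines
                on_update(json.loads(line.decode('utf-8')))
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t
```

In the streaming example above you would pass r.iter_lines() as lines; the main thread continues immediately while updates arrive on the worker thread.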

Why binding to context is necessary in Werkzeug

I was reading the source code of the Werkzeug library on GitHub, and in one of the examples (Simplewiki, to name it), the application.py file has a function which binds the application to the current active context. I would like to know why this is necessary, or where I can find something that explains this.
The function is this:
def bind_to_context(self):
    """
    Useful for the shell. Binds the application to the current active
    context. It's automatically called by the shell command.
    """
    local.application = self
And this is the part where the dispatcher binds the request.
def dispatch_request(self, environ, start_response):
    """Dispatch an incoming request."""
    # set up all the stuff we want to have for this request. That is
    # creating a request object, propagating the application to the
    # current context and instantiating the database session.
    self.bind_to_context()
    request = Request(environ)
    request.bind_to_context()
As far as I know, contexts in Werkzeug are about separating the environment between different threads. For example, contexts are a very common thing in the Flask framework, which is built on top of Werkzeug. You can run a Flask application in multi-threaded mode. In that case you'll have only one application object, accessed by multiple threads simultaneously. Each thread requires a piece of data within the app for private usage. Storing such data is organized via the thread's local storage. And this is called the context.
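Werkzeug's Local behaves much like Python's built-in threading.local. A minimal illustration of thread-private storage (handle_request is just an illustrative stand-in for a request handler):

```python
import threading

local = threading.local()

def handle_request(user, results):
    # each thread writes to its own slot; no other thread sees it
    local.user = user
    results[user] = local.user

results = {}
threads = [threading.Thread(target=handle_request, args=(name, results))
           for name in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread reads back only the value it set itself, which is exactly the isolation the application/request context provides in a multi-threaded server.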

Use a flask session inside a python thread

How can I update a flask session inside a python thread? The below code is throwing this error:
*** RuntimeError: working outside of request context
import threading

from flask import session

def test(ses):
    ses['test'] = "test"

@app.route('/test', methods=['POST', 'GET'])
def mytest():
    t = threading.Thread(target=test, args=(session,))
    t.start()
When you execute t.start(), you are creating an independent thread of execution which is not synchronized with the execution of the main thread in any way.
The Flask session object is only defined in the context of a particular HTTP request.
What does the variable session mean in the second thread (t)?
When t executes, there is no guarantee that the user request from the main thread still exists or is in a modifiable state. Perhaps the HTTP request has already been fully handled in the main thread.
Flask detects that you are trying to manipulate an object that is dependent on a particular context, and that your code is not running in that context. So it raises an exception.
There are a variety of approaches to synchronizing output from multiple threads into a single request context but... what are you actually trying to do here?
None of the documentation I've seen really elaborates why this isn't possible in this framework - it's as if they have never heard of the use case.
In a nutshell, the built-in session uses the user's browser (the cookie) as storage for the session. This is not what I understand sessions to be, and oh boy, the security issues: don't store any secrets in there. The session is basically JSON-encoded, compressed, and set as a cookie. At least it's signed, I guess.
Flask-Session mitigates the security issues by behaving more like sessions do in other frameworks: the cookie is just an opaque identifier, meaningful only in the back end. But the value changes every time the session changes, requiring the cookie to be sent to the browser again. A background thread won't have access to the request once it completed long ago, so all you have is a one-way transfer of data: out of the session and into your background task.
Might I suggest the baggage claim pattern? Your initial request-handling function designates some key in some shared storage (a file on disk, a row in a database identified by some key, an object key in an in-memory cache, whatever), puts that key in the session, then passes the session to your background process, which can inspect it for the location in which to place the results. Your subsequent request-handling functions can then check this location for the results.
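A minimal sketch of that pattern, using a plain dict as a stand-in for shared server-side storage (all names here are illustrative, not Flask APIs):

```python
import threading
import uuid

results_store = {}  # stand-in for a DB table, file, or cache entry

def start_job(session):
    """Request handler: put only a claim ticket in the session."""
    ticket = str(uuid.uuid4())
    session['ticket'] = ticket
    worker = threading.Thread(target=background_job, args=(ticket,))
    worker.start()
    return worker

def background_job(ticket):
    # the worker never touches the session, only the shared storage
    results_store[ticket] = {"status": "done"}

def check_job(session):
    """A later request looks the results up by the ticket."""
    return results_store.get(session['ticket'])
```

The session only ever carries the ticket, which it already has before the background thread starts, so nothing in the worker needs a request context.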

Class instance in Python Bottle application, is it shared between threads/processes?

I have created a class in a Bottle application which handles and stores URL information and is created each time a http request is made:
@route('/<fullurl:path>')
def page_req(fullurl=''):
    urlData = urlReq(request.urlparts[1], fullurl)
urlData is the instance name and urlReq is the class name.
Obviously, the urlData instance will contain information generated from one request. I'm just wondering what happens if another request comes in before the cycle of the first request has finished and sent its output. Will the second request change the data in urlData, or will there be two separate processes, each with their own version of urlData?
I've been reading the WSGI processes/threads information and the Bottle docs all afternoon, and it's still not immediately clear. I have tried writing a small automated script to fire multiple requests at the development server, but it seems to hold excess requests off until one has finished. Hope I've been clear enough.
bottle.request is a thread-safe instance of LocalRequest(). If accessed from within a request callback, this instance always refers to the current request (even on a multithreaded server).
see http://bottlepy.org/docs/dev/api.html#bottle.request
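To illustrate why this is safe: urlData is a local variable, so every call to page_req (and therefore every request, even on a multi-threaded server) gets its own instance; only module-level state would be shared between requests. A standalone sketch (UrlReq is a stand-in for the question's urlReq class):

```python
import threading

class UrlReq:
    """Stand-in for the urlReq class from the question."""
    def __init__(self, host, path):
        self.host = host
        self.path = path

results = {}

def page_req(request_id, path):
    # created fresh on every call: private to this request/thread
    url_data = UrlReq("example.com", path)
    results[request_id] = url_data.path

threads = [threading.Thread(target=page_req, args=(i, "/page/%d" % i))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each simulated request ends up with its own UrlReq instance and its own path, even though the calls overlap in time.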
