of tornado and blocking code - python

I am trying to move away from CherryPy for a web service that I am working on, and one alternative I am considering is Tornado. On the backend, most of my requests look something like:

1. get the POST data
2. see if I have it in cache (database access)
3. if not, make multiple HTTP requests to some other web service, which can take a good few seconds depending on the number of requests

I keep hearing that one should not block the Tornado main loop. If all of the above is executed in the post() method of a RequestHandler, does that mean I am blocking the loop? And if so, what is the appropriate way to use Tornado with these requirements?

Tornado ships with an asynchronous HTTP client, AsyncHTTPClient (two implementations of it, actually, if I recall correctly). Use that if you need to make additional HTTP requests.
The database lookup should also be done with an asynchronous client so that it does not block the Tornado IOLoop/main loop. There are a couple of tailor-made Tornado database clients out there (e.g. for Redis and MongoDB), and a MySQL lib is included in the Tornado distribution.
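A rough sketch of what such a non-blocking handler could look like (not from the original answer; check_cache() and UPSTREAM_URLS are stand-ins for the asker's cache lookup and third-party service):

```python
import json

from tornado import gen, web
from tornado.httpclient import AsyncHTTPClient

UPSTREAM_URLS = ["http://example.com/a", "http://example.com/b"]  # placeholders

async def check_cache(key):
    # Stand-in for a real lookup via an asynchronous database client.
    return None

class MyHandler(web.RequestHandler):
    async def post(self):
        key = self.get_body_argument("key")
        cached = await check_cache(key)
        if cached is not None:
            self.write(cached)
            return
        client = AsyncHTTPClient()
        # Fetch all upstream URLs concurrently; the IOLoop keeps serving
        # other requests while these are in flight.
        responses = await gen.multi([client.fetch(u) for u in UPSTREAM_URLS])
        self.write(json.dumps([r.body.decode() for r in responses]))
```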

Related

Performance of Singleton Controllers in fastapi

I am having trouble tracking down a performance bottleneck in one of my APIs, and I have a theory that I need somebody with deeper knowledge of Python to validate for me.
I have a FastAPI web service and a Node.js web service deployed on AWS. The Node.js API performs perfectly under heavier loads: multiple concurrent requests all take the same amount of time to be served.
My FastAPI service, however, performs absurdly. If I make two requests concurrently, only one is served while the other has to wait for the first to finish, so the response time for the second request is twice that of the first.
My theory is that I use the Singleton pattern to instantiate the controller when a request comes in to a route, and the object already being in use and locked causes the second request to wait until the first is resolved. Could this be it, or am I missing something very obvious here? Two concurrent requests should absolutely not be a problem for any type of web server.
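For illustration only, here is a hypothetical reconstruction of such a setup, with invented names. A common cause of exactly this symptom in FastAPI is blocking work inside an async def endpoint, which stalls the single event loop and serializes requests even without any lock; a plain def endpoint runs in a threadpool instead:

```python
import time

from fastapi import FastAPI

app = FastAPI()

class Controller:
    """Hypothetical singleton controller, as described in the question."""
    _instance = None

    @classmethod
    def instance(cls):
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def heavy_work(self):
        time.sleep(5)  # stand-in for the slow backend call
        return {"ok": True}

@app.get("/report")
async def report():
    # Blocking here holds the event loop, so a second concurrent request
    # must wait -- the singleton itself is not necessarily the culprit.
    return Controller.instance().heavy_work()

@app.get("/report-threaded")
def report_threaded():
    # A plain def endpoint runs in a threadpool, so two concurrent
    # requests can proceed in parallel.
    return Controller.instance().heavy_work()
```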

Asynchronous Socket connections in Python or Node

I am creating an application that has multiple connections to a third-party chat streaming API (socket based).
The way it works is: every user has an account on my app and another account on the third-party app. They give me an access token for the third-party chat app, and I connect to the third-party API to stream their chats. This happens for hundreds of users.
I need to create a socket connection pool for every user and run parallel threads. I am using a Python library (for that API) and am able to get real-time feeds for single users. How do I implement an asynchronous socket connection pool in Python or Node.js? I have a Linux micro instance on EC2, and I need to run this application for 1000 users.
I am exploring Redis + Tornado to implement this. Are there any better alternatives?
This will be messy, and there are a couple of things to consider.
If you are going to use multiple threads, remember that the OS only permits so many threads per CPU; go with multiprocessing instead.
If you go async, long-polling work on the main loop will prevent other clients' requests from being processed.
Solution
When your application absolutely needs to be real-time, I would suggest WebSockets for server-client interaction.
Then, from each client's request, start a single process that listens/polls on your streaming API using multiprocessing in Python; you will essentially create a separate process for each client.
Now, to make your WebSocketHandler and background API streamer interact with each other, you can use the Observer pattern (https://en.wikipedia.org/wiki/Observer_pattern) to notify the WebSocket that you have received data from the API.
Make sure that you assign a unique ID to every client, and that you only post the data to the intended client when using WebSockets.
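A bare-bones sketch of that observer registry (names are illustrative; in a real multiprocessing setup the streamer would hand data back to the IOLoop thread, e.g. via a queue and IOLoop.add_callback, before notify_client runs):

```python
from tornado import web, websocket

clients = {}  # unique client ID -> open WebSocket handler

class ChatSocket(websocket.WebSocketHandler):
    def open(self, client_id):
        self.client_id = client_id
        clients[client_id] = self  # register this client as an observer

    def on_close(self):
        clients.pop(self.client_id, None)  # deregister

def notify_client(client_id, payload):
    """Called when the background streamer for this client receives data
    from the third-party API; only the intended client is notified."""
    handler = clients.get(client_id)
    if handler is not None:
        handler.write_message(payload)

app = web.Application([(r"/ws/(\w+)", ChatSocket)])
```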
EDIT:
Web:
Also, on your question regarding Tornado: it is a good, lightweight framework and fine for up to perhaps 1000 users. For anything beyond that I would suggest looking at Django, as it will let you be more productive and there are lots of tools the community has developed over time.
Database:
Redis is a good choice if you need a very fast NoSQL DB; also have a look at MongoDB. If you require a multi-region DB, I would suggest Cassandra or CouchDB because of their partitioned nodes.

Handling multiple requests in Flask

My Flask application has to do quite a large calculation to serve a certain page. While Flask is running that calculation, another user cannot access the website, because Flask is busy with it.
Is there any way I can make my Flask application accept requests from multiple users?
Yes, deploy your application on a different WSGI server, see the Flask deployment options documentation.
The server component that comes with Flask is really only meant for development, even though it can be configured to handle concurrent requests with app.run(threaded=True) (as of Flask 1.0 this is the default). The document linked above lists several server options that can handle concurrent requests and are far more robust and tunable.
For requests that take a long time, you might want to consider starting a background job for them.
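As a minimal illustration (the module and route names are assumptions), the development server can at least be made threaded; in production you would run the same app under one of the servers from the linked document, e.g. gunicorn -w 4 myapp:app:

```python
# myapp.py -- illustrative names throughout
import time

from flask import Flask

app = Flask(__name__)

def do_large_calculation():
    time.sleep(10)  # stand-in for the heavy computation
    return 42

@app.route("/slow")
def slow():
    return str(do_large_calculation())

if __name__ == "__main__":
    # Development server only; threaded=True lets other requests be
    # served while one is busy (the default since Flask 1.0).
    app.run(threaded=True)
```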

Twisted server-client interconnecting XML-RPC and REST services

I have a service provided by a REST API, with a Python library wrapping it using python-requests.
I have a 'dumb' user interface designed by a third party (not in Python) that connects to a local XML-RPC server.
Now I have to connect both ends, forwarding the XML-RPC calls to the REST API and returning the results. It's mostly asynchronous and doesn't depend on results reaching the user in real time: most of the XML-RPC calls are supposed to return immediately and queue a task, and some other call will query the results later. Data is stored in an SQLite database until needed.
So I decided to use twisted.web.xmlrpc for this middle layer and the requests-based lib for the remote calls, and it works fine. I guess I'm blocking Twisted's main loop for a few seconds once in a while, but that's not a big deal.
The problem is that I also have to make some big file uploads from this middle layer to the HTTP server providing the REST API. I can't make those uploads with the requests-based lib because it would block the Twisted loop until the upload finishes.
I'd rather not use multithreading, and I really don't want to rewrite the python-requests-based lib I have as a Twisted client. Is there any way I can integrate requests into Twisted's main loop, or any other reasonable solution?
If you like requests' style of API, but want something that would work with Twisted, consider using treq. There are support libraries for writing interfaces which can be either synchronous or asynchronous depending on their caller's needs.
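For instance, a hypothetical upload with treq (URL and payload invented) reads almost like requests but returns Deferreds:

```python
import treq
from twisted.internet import task

def main(reactor):
    d = treq.post("http://example.com/upload", data=b"payload")
    d.addCallback(treq.content)  # read the response body
    d.addCallback(lambda body: print(body))
    return d  # task.react stops the reactor when this Deferred fires

task.react(main)
```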
If you really want to use requests, but you don't want to block the main loop, you can invoke it with twisted.internet.threads.deferToThread. This is mostly transparent, and if your requests don't share any state you can almost ignore the fact that you're using multithreading.
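A minimal sketch of that approach, keeping the blocking requests call off the reactor thread (the URL and file path are illustrative):

```python
import requests
from twisted.internet import reactor, threads

def upload_file(path):
    # Ordinary blocking requests code, run in a worker thread.
    with open(path, "rb") as f:
        return requests.post("http://example.com/upload", data=f)

d = threads.deferToThread(upload_file, "/tmp/bigfile.bin")
d.addCallback(lambda resp: print("uploaded:", resp.status_code))
d.addErrback(lambda failure: print("upload failed:", failure))
d.addBoth(lambda _: reactor.stop())
reactor.run()
```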
But, ultimately, Jean-Paul's comment is correct; you are going to need to make some changes to the way this code works, if you want to change the way it works.

Making an asynchronous interface appear synchronous to mod_python users

I have a Python-driven web interface powered by Apache 2.2 with mod_python and Python 2.4. I need to make an asynchronous process appear synchronous to users of this web interface.
When users access one module on this website:
1. an external SOAP interface will be contacted with a unique identifier and will respond with a number N
2. the external interface will respond asynchronously by contacting a SOAP server on my machine between 1 and 10 times (the number N tells us how many responses we will receive)
3. I need to somehow aggregate these responses and pass them to the original module, which will display the information back to the user

The goal is to make the process appear synchronous to the user.
What is the best way to handle this synchronization issue? Is this something Twisted would be well-suited for?
I am not restricting myself to Python for the solution, though it is preferred because everything else on the server is in Python. I prefer a solution that is both scalable and will take a minimal amount of programming time (though I understand that these attributes are somewhat at odds).
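To make the synchronization concrete, here is a minimal, framework-agnostic sketch (all names invented): the web request blocks until the local SOAP endpoint has collected all N responses, or a timeout expires:

```python
import threading
import uuid

class ResponseCollector:
    """Aggregates the 1..N asynchronous SOAP callbacks for one request."""

    def __init__(self, expected):
        self.expected = expected
        self.responses = []
        self.done = threading.Event()

    def add(self, response):
        self.responses.append(response)
        if len(self.responses) >= self.expected:
            self.done.set()

collectors = {}  # request ID -> ResponseCollector

def handle_user_request(call_external_soap):
    # 1. contact the external SOAP interface; it returns N
    request_id = str(uuid.uuid4())
    n = call_external_soap(request_id)
    collector = collectors[request_id] = ResponseCollector(n)
    # 2. block this web request until all N callbacks arrive (or time
    #    out), so the process appears synchronous to the user
    collector.done.wait(timeout=30)
    del collectors[request_id]
    return collector.responses

def soap_callback(request_id, response):
    # Called by the local SOAP server for each asynchronous response.
    collectors[request_id].add(response)
```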
Maybe you can use Orbited to get Ajax push with long-lived HTTP connections to your web clients. Orbited is based on Twisted, so I think it makes sense to look at it if you already know Twisted. Have a look at this tutorial to get started.
