Why do Flask rate limiting solutions use Redis? - python

I want to rate limit my Flask API. I found 2 solutions.
The Flask-Limiter extension.
A snippet from the Flask website using Redis: http://flask.pocoo.org/snippets/70/
What is the significance of Redis when Flask-Limiter is able to rate limit the request on the basis of remote address without Redis?

Redis allows you to store the rate-limiting state in a persistent store.
This means you can:
Restart your web server or web application and still have the rate-limitation work. You won't lose the records of the last requests made because of the working process being destroyed and a new one being created, instead.
Use multiple web servers or web applications. This is because the rate-limitation state is stored in an external data store that also solves the issue of shared data synchronisation and data races. You can run as many web servers as you wish - the rate-limitation is shared among all of them.
Look at the rate-limitation state. Redis offers easy CLI tools that allow you to look at the current active data in an ad-hoc manner, even MONITORing the incoming commands and requests.
Let Redis manage TTL, LRU etc for rate-limitation algorithms. Redis supports this intrinsically.

Related

Proper Celery Monitoring and File-Management with Web-Services

We are working on an Internet and Intranet platform, that serves client-requests over website applications.
There are heavy-weight computations on database entries and files. We want to update the state of those computations via push-notification to the client and make changes to files without the risk of race-conditions. The architecture is supposed to run on both, low- scaled one-server environments and high-scaled cluster environments.
So far, we are running a Django Webserver with Postgresql, the Python-Library Channels and RabbitMQ as Messagebroker.
Once a HTTP-Request from a client arrives in Django, we trigger the task via task.delay() and immediatly return the task_id to the client. The client then opens a websocket to another Django-route and hands over the task_ids he is interested in. Django then polls the state of the task via AsyncResult(task_id).state. Once the state changes, we read the results via AsyncResult(task_id).get and push the task_results to the client.
Here a similar sequence diagramm, from another project I found online.
Source(18.09.21)
Something that is not seen on the diagram, the channels_worker have to fetch the file they are working on from Django. A part of the result is not for the client, but to update the file. Django locks and updates the file localy as soon, as the client asks for and Django receives the task_results from celery (the changes only add attributes and will not be in conflict with each other).
My thoughts about this architecture are:
monitoring of the celery-events is bad so far.
It is only triggered by the client, which has to know about the tasks to begin with.
Django is not suited for monitoring
and polling is not efficient in general.
The file management seems fishy.
I would prefer a proper monitoring, where events are pushed to Django and the client. The client have to be able to consume the events at any time later.
I have some thoughts about solutions, but I would like to hear your opinion first. Later I can bring them in the discussion too.
Greetings
Python
Edit 1
From other sources I got helpful information regarding a good strategy.
Instead of Django "monitoring" the celery tasks, we can use a dedicated Websocket-Service, like FastAPI thand monitors task events and propagates them to the clients via websocket.
The client doesn't have to know about it's running tasks per se. Instead we can have ownership of tasks and the client only has to authenticate himself. The whole Security Blog will be implemented anyways and its supported by Celery.
For the file management, we should use a dedicated object storage like minio. This service can become subscriber to task-events related to files.
We all like Python, but we don't have to re-invent the wheel whenever we want a better monitoring or more control on the behavior of our systems.
That being said, I would recommend re-architecture the solution to decrease the complexity of your django application by exploring what native cloud solutions are offering in terms of micro-service architecture (API-Gateway), AWS SQS and SNS, computation, and storage options for your files.
Such an approach will carry out a lot of the monitoring, configurations, file management activities, and most importantly your monolithic application could scale without code changes or additional configurations.

Asynchronous Socket connections in Python or Node

I am creating an application that basically has multiple connections to a third party Chat Streaming API(Socket based).
The way it works is - Every user has an account on my app and another account on the third party app. He gives me an access token for the third party chat app and I connect to the third party API to stream his chats. This happens for hundreds of users.
I need to create a socket connection pool for every user and run parallel threads. I am using a python library(for that API) and am able to achieve real time feeds for single users. How do I implement an asynchronous socket connection pool in Python or NodeJS? I have a Linux micro instance on EC2 and I need to run this application for 1000 users.
I am exploring Redis+Tornado to implement this. Are there any better alternatives?
This will be messy and also a couple of things to consider.
If you are going to use multiple threads remember that you can only run so many per CPU as the OS permits, rather go multiprocessing.
If you are going async with long polling processes it will prevent other clients from processing requests.
Solution
When your application absolutely needs to be real-time I would suggest websockets for server-client interaction.
Then from your clients request start a single process that listens\polls on your streaming API using multiprocessing in python. So you will essentially create a separate process for each client.
And now, to make your WebSocketHandler and Background API Streamer interact with each other you can use the Observer Pattern (https://en.wikipedia.org/wiki/Observer_pattern) to notify the WebSocket that you have received data from the API.
Make sure that you assign a unique ID to every client and make sure that you only post the data to the intended client when using websockets.
EDIT:
Web:
Also on your question regarding Tornado. It is a good lightweight framework for running a couple of users maybe 1000. But anything more than that I would suggest looking at Django as it will allow you to be more productive in producing code and also there are lots of tools out there that the community have developed over time.
Database:
Red.is is a good choice if you need a very fast no-sql db, also have a look at mongodb. If you require a multi-region DB I would suggest going with Cassandra or CouchDB due to the partitioned nodes. The image below might help you better decide which DB to use.

Spawning and monitoring concurrent tasks with Python and Flask

I have a larger application which automates several data aggregation tasks for an end-user. I'm wrapping this application with a web-based interface using Flask. Both of these applications are wrapped and joined with a MongoDB back-end via a Docker container.
The end-goal is for the user to start and monitor tasks via the web front-end (storing the on-going process information in the MongoDB instance).
What I'm missing is the glue between the Flask front-end and my own back-end. I need a method for starting a process from a web request, storing and monitoring that process, and administering that process from subsequent web requests (starting, stopping, etc.).
Note: This particular application will run a max of two or three processes at once and needs to stay within the confines of being able to run on a docker instance with limited memory/CPU (the back-end code simply makes web requests and aggregates data). Therefore, Celery (as prescribed in an answer from a similar question) is not ideal as it's far too bloated for the simple implementation I need.

django/python - what's the recommended secure way to exchange data between my infrastructure and my customers?

I'm using Django/Postgres and Python for my web site and the background processes. I have hundreds of messages every minute populating my database and I would like to securely allow other customers access their data.
My customers use either linux or windows so I would like some solution that will be platform/database agnostic.
I looked so far at Piston , Twisted , Celery and RabbitMQ. All these have some way to exchange data. But I'm not sure what to use or if there are any better options.
For example I need the customers to be able to access only their data on my database. Another thing I need is to allow the customers to send a short command back to my servers. My servers will execute the command and return re result in real time back to the customer.
Any ideas?
You asked how your customers can securely transmit commands to your website and retrieve results in their response (near "real-time").
... have you considered hooking a reasonable API into your django app? If you're concerned about security, you can use authentication and serve it over HTTPS.
It's not as fancy as the messaging and queuing platforms that the kids are using these days but it'll get the job done.
Things to like about HTTP/HTTPS APIs:
They can be load balanced (highly available and scalable!)
They can be cached (mo' betta performance and the ability to still serve content while rate limiting how often a client can hit the DB)
Just about every programming language has a mature library that allows HTTP/HTTPS connections. Some have multiple, e.g. Python: urllib,urllib2,httplib

Many producer, single consumer with python/mod_wsgi

I have a Pylons web application served by Apache (mod_wsgi, prefork). Because of Apache, there are multiple separate processes running my application code concurrently. Some of the non-critical tasks that the application does I want to defer for processing in background to improve "live" response times. So I'm thinking of task queue, many Apache processes adding tasks to this queue, a single separate Python process processing them one-by-one and removing from queue.
The queue should preferably be persisted to disk so queued unprocessed tasks are not lost because of power outage, server restart etc. The question is what would be a reasonable way to implement such queue?
As for the things I've tried: I started with simple SQLite database and single table in it for storing queue items. In load testing, when increasing level of concurrency, I started getting "database locked" errors, as expected. The quick'n'dirty fix was to replace SQLite with MySQL--it handles concurrency issues well but feels like an overkill for the simple thing I need to do. Queue-related DB operations also show up prominently in my profiling reports.
A message broker like Apache's ActiveMQ is an ideal solution here.
The pipeline could be following:
Application process that is responsible for handling HTTP requests generates replies quickly and sends low-priority, heavy tasks to AMQ queue.
One or more another processes are subscribed to consume AMQ queue and do what is intended to do with these heavy tasks.
The requirement of queue persistence is fulfilled out of the box since ActiveMQ stores messages that are not yet consumed in persistent storage. Furthermore it scales quite well since you're free to deploy multiple HTTP-apps, multiple consumer apps and AMQ itself on different machines each.
We use something like this in our project written in Python utilizing STOMP as underlying communication protocol.
A web server (any web server) is multi-producer, single-consumer process.
A simple solution is to build a wsgiref or Werkzeug backend server to handle your backend requests.
Since this "backend" server is build using WSGI technology, it's very, very similar to the front-end web server. Except. It doesn't produce HTML responses (JSON is usually simpler). Other than that, it's very straightforward.
You design RESTful transactions for this backend. You use all of the various WSGI features for URI parsing, authorization, authentication, etc. You -- generally -- don't need session management, since RESTful servers don't usually offer sessions.
If you get into serious scalability issues, you simply wrap your backend server in lighttpd or some other web engine to create a multi-threaded backend.

Categories