I've set up a Kafka server on AWS, and I already have a Django project acting as the producer, using kafka-python.
I've also set up a second Django project to act as the consumer (kafka-python), but I'm trying to figure out a way to run the consumer automatically after the server has started, without having to trigger the consumer through an API call.
Everything I've tried so far either runs the consumer and blocks the server from starting, or runs the server and blocks the consumer.
I did something like this on a Django project: I put the consumer launch into a daemon thread inside a method, and I call this method in the manage.py file.
I'm not really sure about the impact of modifying the manage.py file, but it works fine.
import threading
import main  # the module defining the consumer entry point

def run_consumers():
    # daemon thread: doesn't block server startup and dies with the process
    thread = threading.Thread(name='my_consumer', target=main.lauch_consumer, args=())
    thread.daemon = True
    thread.start()
And in manage.py I added :
if os.environ.get('RUN_MAIN'):
    # Run consumers once at server start
    run_consumers()
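For reference, a minimal sketch of how the modified manage.py could look; myproject and the consumers module are placeholder names. Note that RUN_MAIN is set by the development server's autoreloader, so this guard fires only in the child process that actually serves requests under runserver:

#!/usr/bin/env python
import os
import sys

from consumers import run_consumers  # hypothetical module holding run_consumers()

def main():
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')

    # RUN_MAIN is set only in the autoreloader's child process, so the
    # consumer thread starts once instead of twice under runserver
    if os.environ.get('RUN_MAIN'):
        run_consumers()

    from django.core.management import execute_from_command_line
    execute_from_command_line(sys.argv)

if __name__ == '__main__':
    main()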
I'm deploying a Flask app on Cloud Run using gunicorn. My setup is kind of unusual, so bear with me: my app is written in Python, and part of it is a gRPC server binary embedded in the Python package, originally written in C++. This gRPC server is started only once (enforced by checking the process name using psutil) by the factory method for the gRPC channel stub.
The problem is the following:
While testing locally everything runs smoothly: the Flask app gets a request, proxies it to the gRPC subprocess, waits for a response, serializes it, and then sends it back to the client.
However, on Cloud Run I'm facing thousands of InactiveRpcError failures.
What I suspect is that Cloud Run is killing the workers, and the worker is coincidentally the parent process of the gRPC server subprocess.
However, I tried to add a retry to reconnect, but it doesn't work.
That led me to another suspicion: the Cloud Run runtime might be suspending the process without killing it, since the connect function should otherwise start another server process.
Finally, I tried to create an init process using bash and run the gRPC server as a background process, as detailed in Ahmed-tb's blog here, but still without any success.
Code for the gRPC server initializer:
import os
import subprocess

import grpc
import psutil
from retrying import retry  # provides the @retry decorator used below

from . import pb2  # the package's generated protobuf module
from . import svc  # the package's generated gRPC stub module

def connect(timeout=TIMEOUT_SEC):
    @retry(retry_on_exception=grpc.FutureTimeoutError, stop_max_attempt_number=5)
    def create_channel():
        ch = grpc.insecure_channel(f'localhost:{CHANNEL_PORT}')
        grpc.channel_ready_future(ch).result(timeout=timeout)
        return ch

    executable_path = os.path.join(
        os.path.dirname(pb2.__file__),
        EXECUTABLE_NAME
    )
    # spawn the embedded gRPC server binary if it isn't already running
    if EXECUTABLE_NAME not in [p.name() for p in psutil.process_iter()]:
        subprocess.Popen([executable_path], shell=True, stdout=subprocess.PIPE)
    ch = create_channel()
    client = svc.LocalServiceStub(ch)
    return client
I'm working on a Flask/Python back-end for a web application, which will be deployed with Docker. It receives some data from the user, then sends it to a RabbitMQ queue. On the other side there are workers in Docker containers: they consume tasks, process them, and send the results back to another RabbitMQ queue. It works pretty well when I run the Flask app locally without placing it into Docker. What I'm doing: on application start-up I run a receive thread, which is linked to a callback function that consumes results from the message queue and updates a note in the database. When I deploy it with Docker and gunicorn as the web server, the thread is not started, for no apparent reason.
How can I run background threads that would work alongside the Flask web application through the whole application life cycle?
At the moment I'm thinking about the following options:
Try to start the thread on application start-up (a sketch of this option appears after the create_app() listing below).
Run the background task with Celery (cons: after reading a lot of material I think Celery is not applicable here, because it is mostly used for tasks that finish in the foreseeable future; I could be wrong, if so please correct me).
A Flask module called app_scheduler. I've used it just a couple of times, and I think Celery is better in this case.
My web app structure is pretty simple:
app.py
models.py
app
|- engine            # classes Send and Receive, function run_receive()
|- tasks
|  |- __init__.py
|  |- routes.py
|  |- tools.py
|  |- email.py
|- templates
|- __init__.py       # create_app() function here
My create_app() function from the app/__init__.py file.
def create_app(config_class=Config):
    app_ = Flask(__name__)
    app_.config.from_object(config_class)

    # extensions
    db.init_app(app_)
    migrate.init_app(app_, db)
    celery.init_app(app_)
    mail.init_app(app_)

    # blueprints
    from app.engine import bp as bp_engine, run_receive
    from app.tasks import bp as bp_tasks
    app_.register_blueprint(bp_engine)
    app_.register_blueprint(bp_tasks)
...
So what I want to do is, on start-up, run a thread which will consume results from RabbitMQ. The function for this action is called run_receive() and lives in app/engine/__init__.py.
Also, in the future I'd like to run another thread to check whether notes in the database are out of date; it would be invoked once a day to scan the database.
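For the first option, here is a minimal sketch of what starting the thread at application start-up could look like, assuming run_receive() loops forever consuming from the queue; only the relevant part of create_app() is shown, and note that every gunicorn worker process would start its own copy of the thread:

import threading

def create_app(config_class=Config):
    app_ = Flask(__name__)
    app_.config.from_object(config_class)
    # ... extensions and blueprints as above ...

    from app.engine import run_receive

    # daemon thread: consumes from RabbitMQ for the life of this worker
    # process and exits together with it
    receiver = threading.Thread(target=run_receive, daemon=True)
    receiver.start()

    return app_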
So what are the best practices in this situation?
Thank you for your response.
I'm using Flask + uWSGI + nginx to deploy a website on a server.
In Flask, my code is below. Here is what I want to design: every time the user clicks "run model", the app runs a model in another process, but the interface redirects to a waiting page immediately.
train_dataset.status = status
db.session.commit()
text = 'Start training your models, please wait for a while or check results several hours later'
# would run a model in another process
task = Process(target=start_train, args=(app.config['UPLOAD_FOLDER'], current_user.id, p.id, p.task), name="training.exe")
task.start()
print(task.pid, task.name)
session[f"{project_name}_train"] = task.pid
# in the meanwhile, link to waiting interface
return render_template('p_computeview.html', project_name=project_name, text=text,
status=status, p=p, email=email, not_identify=not_identify)
And when I test in the local development environment with
app.run()
everything is OK: when I click "run model", the interface goes straight to the waiting page, and I can see the model's running logs.
But when I deployed to the server, I chose uWSGI + nginx + Flask.
In uwsgi.ini I already specified the processes and threads:
processes=2
threads=5
But when I click "run model", the interface stalls and doesn't go to the waiting page. However, I can see the model's running logs, and when the modeling is finished, the interface finally goes to the waiting page (which suggests the Process call is not working as intended?).
My server has 2 CPUs, so I think it can support multiple processes.
Can someone help me? I guess there are some problems in uwsgi or nginx?
The desire to run separate threads or processes within a request context is a common one. For various reasons, except in very narrow circumstances, it is a desire that leads to frustration. In this case, as soon as task goes out of scope, the Process machinery gets torn down.
If you want to start a long-running task from a request handler, use a framework like Celery or RQ, which arrange to run jobs entirely out of process from the HTTP server.
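For illustration, a minimal sketch of the RQ variant, assuming a Redis instance on localhost and that start_train is importable by the worker (the tasks module name here is hypothetical):

from flask import Flask
from redis import Redis
from rq import Queue

from tasks import start_train  # hypothetical module exposing the training function

app = Flask(__name__)
queue = Queue(connection=Redis())  # defaults to localhost:6379

@app.route('/train/<int:project_id>', methods=['POST'])
def train(project_id):
    # enqueue() returns immediately; a separate worker process runs the
    # job, so this handler can render the waiting page right away
    job = queue.enqueue(start_train, project_id)
    return {'job_id': job.get_id()}, 202

A worker started with the rq worker command in the same environment then executes start_train entirely outside the uWSGI processes.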
I'm trying to run some dramatiq actors from my Falcon API method, like this:
def on_post(self, req, resp):
    begin_id = int(req.params["begin_id"])
    count = int(req.params["count"])
    for page_id in range(begin_id, begin_id + count):
        process_vk_page.send(f"https://vk.com/id{page_id}")
    resp.status = falcon.HTTP_200
My code gets to the "send" method and goes through the loop without any problems. But there are no new tasks in the queue! The actor itself is not called, and the "default" queue in my broker is empty. If I set a custom queue, it is still empty. My actor looks like this:
@dramatiq.actor(broker=broker)
def process_vk_page(link: str):
    pass
Where broker is
broker = RabbitmqBroker(url="amqp://guest:guest@rabbitmq:5672")
RabbitMQ logs show that it is connecting fine.
I've done some additional research in the debugger. It gets the message (which is meant to be sent to the broker) fine, and broker.enqueue in Actor.send_with_options() raises no exceptions, although I can't really follow its internal logic. I don't really know why it fails, but it is definitely RabbitmqBroker.enqueue() which is causing the problem.
The broker is RabbitMQ 3.8.2 on Erlang 22.2.1, running in Docker from the rabbitmq Docker Hub image with default settings. The dramatiq version is 1.7.0.
In the RabbitMQ logs there are only connections to the broker when the app starts and disconnections when I turn it off, like this:
2020-01-05 08:25:35.622 [info] <0.594.0> accepting AMQP connection <0.594.0> (172.20.0.1:51242 -> 172.20.0.3:5672)
2020-01-05 08:25:35.627 [info] <0.594.0> connection <0.594.0> (172.20.0.1:51242 -> 172.20.0.3:5672): user 'guest' authenticated and granted access to vhost '/'
2020-01-05 08:28:35.625 [error] <0.597.0> closing AMQP connection <0.597.0> (172.20.0.1:51246 -> 172.20.0.3:5672):
missed heartbeats from client, timeout: 60s
The broker is defined in __init__.py of the main package and imported in subpackages. I'm not sure that specifying the same broker instance in the decorators of all the functions is fine, but there is nothing in the docs that bans it. I guess it doesn't matter, since if I create a new broker for each actor it still doesn't work.
I've tried to set Redis as the broker, but I still get the same issue.
What might be the reason for this?
Most likely the issue is that you're not telling the workers which broker to use, since you're not declaring a default broker.
You haven't mentioned how your files are laid out in your application, but, assuming your broker is defined as broker inside tasks.py, then you would have to let your workers know about it like so:
dramatiq tasks:broker
See the examples at the end of dramatiq --help for more information and patterns.
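Alternatively, a minimal sketch of declaring a default broker, assuming the actor lives in tasks.py; dramatiq.set_broker() must run before the actors are declared, so both the web process and the workers pick up the same broker:

# tasks.py
import dramatiq
from dramatiq.brokers.rabbitmq import RabbitmqBroker

broker = RabbitmqBroker(url="amqp://guest:guest@rabbitmq:5672")
dramatiq.set_broker(broker)  # becomes the default for @dramatiq.actor

@dramatiq.actor
def process_vk_page(link: str):
    ...

With the default broker set, the workers can be started with just dramatiq tasks.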
I have a Django web application. I also have a spell server written using Twisted running on the same machine as Django (on localhost:8090). The idea is that when the user does some action, the request comes to Django, which in turn connects to this Twisted server, and the server sends data back to Django. Finally, Django puts this data in some HTML template and serves it back to the user.
Here's where I am having a problem. In my Django app, when the request comes in, I create a simple Twisted client to connect to the locally running Twisted server.
...
factory = Spell_Factory(query)
reactor.connectTCP(AS_SERVER_HOST, AS_SERVER_PORT, factory)
reactor.run(installSignalHandlers=0)
print factory.results
...
The reactor.run() call is causing a problem, since it's an event loop: the next time this same code is executed by Django, I am unable to connect to the server. How does one handle this?
The above two answers are correct. However, considering that you've already implemented a spelling server, run it as one. You can start by running it on the same machine as a separate process, at localhost:PORT. Right now it seems you have a very simple binary protocol interface already; you could implement an equally simple Python client using the standard library's socket interface in blocking mode.
However, I suggest playing around with twisted.web and exposing a simple web interface. You can use JSON to serialize and deserialize data, which is well supported by Django. Here's a very quick example:
import json

from twisted.web import server, resource
from twisted.python import log

class Root(resource.Resource):
    def getChild(self, path, request):
        # represents / on your web interface
        return self

class WebInterface(resource.Resource):
    isLeaf = True

    def render_GET(self, request):
        log.msg('GOT a GET request.')
        # read request.args if you need to process query args
        # ... call some internal service and get output ...
        return json.dumps(output)

class SpellingSite(server.Site):
    def __init__(self, *args, **kwargs):
        self.root = Root()
        server.Site.__init__(self, self.root, **kwargs)
        self.root.putChild('spell', WebInterface())
And to run it you can use the following skeleton .tac file:
from twisted.application import service, internet
site = SpellingSite()
application = service.Application('WebSpell')
# attach the service to its parent application
service_collection = service.IServiceCollection(application)
internet.TCPServer(PORT, site).setServiceParent(service_collection)
Running your service as another first-class service allows you to run it on another machine one day if you find the need; exposing a web interface also makes it easy to scale it horizontally behind a reverse-proxying load balancer.
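On the Django side, the handler can then call the spell service with a plain blocking HTTP request; a minimal sketch, assuming the .tac file above listens on localhost:8090 and that the query is passed as a query-string argument (the parameter name q is hypothetical):

import json
import urllib.parse
import urllib.request

SPELL_URL = 'http://localhost:8090/spell'

def check_spelling(query):
    # a plain blocking call: no reactor needed inside Django
    url = SPELL_URL + '?q=' + urllib.parse.quote(query)
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read())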
reactor.run() should be called only once in your whole program. Don't think of it as "start this one request I have", think of it as "start all of Twisted".
Running the reactor in a background thread is one way to get around this; your Django application can then use blockingCallFromThread and call any Twisted API as it would a blocking API. You will need a little bit of cooperation from your WSGI container, though, because you will need to make sure that this background Twisted thread is started and stopped at appropriate times (when your interpreter is initialized and torn down, respectively).
You could also use Twisted as your WSGI container, and then you don't need to start or stop anything special; blockingCallFromThread will just work immediately. See the command-line help for twistd web --wsgi.
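A minimal sketch of the background-reactor approach, reusing the Spell_Factory from the question and assuming the factory exposes a Deferred that fires with the results (the start/stop wiring with the WSGI container is omitted):

import threading

from twisted.internet import reactor
from twisted.internet.threads import blockingCallFromThread

# start all of Twisted once, in a daemon thread, at process start-up
threading.Thread(
    target=reactor.run,
    kwargs={'installSignalHandlers': False},
    daemon=True,
).start()

def spell_check(query):
    factory = Spell_Factory(query)

    def start():
        reactor.connectTCP(AS_SERVER_HOST, AS_SERVER_PORT, factory)
        return factory.deferred  # assumed: a Deferred firing with the results

    # blocks the calling (Django) thread until the Deferred fires
    return blockingCallFromThread(reactor, start)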
You should stop the reactor after you get results from the Twisted server, or after some error or timeout. So on each Django request that queries your Twisted server, you would have to run the reactor and then stop it. But that's not supported by the Twisted library: the reactor is not restartable. Possible solutions:
Use a separate thread for the Twisted reactor, but you will need to deploy your Django app with a server that supports long-running threads (I don't know any of these, but you can write your own easily :-)).
Don't use Twisted for implementing the client protocol; just use the plain stdlib socket module (see the sketch below).
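For the second option, a minimal sketch of a blocking stdlib client, assuming a simple newline-delimited protocol on AS_SERVER_HOST/AS_SERVER_PORT (the framing is hypothetical; adapt it to the spell server's actual protocol):

import socket

def spell_query(query, host=AS_SERVER_HOST, port=AS_SERVER_PORT, timeout=5.0):
    # one short-lived blocking connection per request; no reactor involved
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(query.encode() + b'\n')
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    return b''.join(chunks).decode()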