I'm running a socketio server with a flask app using gevent. My namespace code is here:
class ConversationNamespace(BaseNamespace):
    def __init__(self, *args, **kwargs):
        request = kwargs.get('request', None)
        if request:
            self.current_app = request['current_app']
            self.current_user = request['current_user']
        super(ConversationNamespace, self).__init__(*args, **kwargs)

    def listener(self):
        r = StrictRedis(host=self.current_app.config['REDIS_HOST'])
        p = r.pubsub()
        p.subscribe(self.current_app.config['REDIS_CHANNEL_CONVERSATION_KEY'] + self.current_user.user_id)

        conversation_keys = r.lrange(self.current_app.config['REDIS_CONVERSATION_LIST_KEY'] +
                                     self.current_user.user_id, 0, -1)

        # Reverse conversations so the newest is up top.
        conversation_keys.reverse()

        # Emit conversation history.
        pipe = r.pipeline()
        for key in conversation_keys:
            pipe.hgetall(self.current_app.config['REDIS_CONVERSATION_KEY'] + key)
        self.emit(self.current_app.config['SOCKETIO_CHANNEL_CONVERSATION'] + self.current_user.user_id, pipe.execute())

        # Listen for new conversations.
        for m in p.listen():
            conversation = r.hgetall(self.current_app.config['REDIS_CONVERSATION_KEY'] + str(m['data']))
            self.emit(self.current_app.config['SOCKETIO_CHANNEL_CONVERSATION'] +
                      self.current_user.user_id, conversation)

    def on_subscribe(self):
        self.spawn(self.listener)
What I'm noticing in my app is that when I first start the SocketIO server (code below), clients are able to connect via a websocket in Firefox and Chrome.
#!vendor/venv/bin/python
from gevent import monkey
monkey.patch_all()

from yellowtomato import app_instance
import werkzeug.serving
from socketio.server import SocketIOServer

app = app_instance('sockets')

@werkzeug.serving.run_with_reloader
def runServer():
    SocketIOServer(('0.0.0.0', app.config['SOCKET_PORT']), app, resource='socket.io').serve_forever()

runServer()
After some time (maybe an hour or so), when I try to connect to that namespace via the browser client, it no longer communicates over a websocket but rather falls back to xhr-polling. Moreover, it takes about 20 seconds before the first response comes from the server. This gives the end user the perception that things have become very slow (but it's only when rendering the page on the first subscribe; the xhr polling happens frequently and events get pushed to clients in a timely fashion).
What is triggering this latency, and how can I ensure that clients connect quickly using websockets?
Figured it out - I was running it from the command line in an ssh session. Ending the session killed the parent process, which caused gevent to stop working properly.
Forking the SocketIOServer process in a screen session fixed the problem.
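For anyone hitting the same thing, the detach step can look something like this (the session name and script path are illustrative, not from the original setup):
screen -dmS socketio vendor/venv/bin/python run_server.py
# or, without screen, detach from the terminal entirely:
nohup vendor/venv/bin/python run_server.py &
Either way the server no longer has the ssh session as its parent, so closing the terminal does not kill it.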
TLDR:
I need to setup a flask app for multiprocessing such that the API and stomp queue listener are running in separate processes and therefore not interfering with each other's operations.
Details:
I am building a python flask app that has API endpoints and also creates a message queue listener to connect to an activemq queue with the stomp package.
I need to implement multiprocessing such that the API and listener do not block each other's operation. That way the API will accept new requests and the listener will continue to listen for new messages and carry out tasks accordingly.
A simplified version of the code is shown below (some details are omitted for brevity).
Problem: The multiprocessing is causing the application to get stuck. The worker's run method is not called consistently, and therefore the listener never gets created.
# Start the worker as a subprocess -- this is not working -- the app gets
# stuck before the worker's run method is called
m = Manager()
shared_state = m.dict()
worker = CustomWorker(shared_state=shared_state)
worker.start()
After several days of troubleshooting, I suspect the problem is due to the multiprocessing not being set up correctly. I was able to confirm that the rest of the code works: when I stripped out all of the multiprocessing code and called the worker's run method directly, all of the queue management code worked correctly - the CustomWorker class created the listener, created the message, and picked up the message. This indicates that the queue management code is fine and the source of the problem is most likely the multiprocessing.
# Removing the multiprocessing and calling the worker's run method directly
# works without getting stuck, so the issue is likely due to multiprocessing
# not being set up correctly
worker = CustomWorker()
worker.run()
Here is the code I have so far:
App
This part of the code creates the API and attempts to spawn a new process that creates the queue listener. The 'custom_worker_utils' module is a custom module whose CustomWorker class creates the stomp listener in its run method.
from flask import Flask, request, make_response, jsonify
from flask_restx import Resource, Api
import sys, os, logging, time

basedir = os.path.dirname(os.getcwd())
sys.path.append('..')

from custom_worker_utils.custom_worker_utils import *
from multiprocessing import Manager

# app.py
def create_app():
    app = Flask(__name__)
    app.config['BASE_DIR'] = basedir
    api = Api(app, version='1.0', title='MPS Worker', description='MPS Common Worker')
    logger = get_logger()

    '''
    This is a placeholder to trigger the sending of a message to the first queue
    '''
    @api.route('/initialapicall', endpoint="initialapicall", methods=['GET', 'POST', 'PUT', 'DELETE'])
    class InitialApiCall(Resource):
        # Sends a message to the queue
        def get(self, *args, **kwargs):
            mqconn = get_mq_connection()
            message = create_queue_message(initial_tracker_file)
            mqconn.send('/queue/test1', message, headers={"persistent": "true"})
            return make_response(jsonify({'message': 'Initial Test Call Worked!'}), 200)

    # Start the worker as a subprocess -- this is not working -- the app gets
    # stuck before the worker's run method is called
    m = Manager()
    shared_state = m.dict()
    worker = CustomWorker(shared_state=shared_state)
    worker.start()

    # Removing the multiprocessing and calling the worker's run method directly
    # works without getting stuck, so the issue is likely due to multiprocessing
    # not being set up correctly
    # worker = CustomWorker()
    # worker.run()

    return app
Custom worker utils
The run() method, once called, connects to the queue and creates the listener with the stomp package.
# custom_worker_utils.py
from multiprocessing import Manager, Process
from datetime import datetime
import os, time, json, stomp, requests, logging, random

'''
The listener
'''
class MyListener(stomp.ConnectionListener):
    def __init__(self, p):
        self.process = p
        self.logger = p.logger
        self.conn = p.mqconn
        self.conn.connect(_user, _password, wait=True)
        self.subscribe_to_queue()

    def on_message(self, headers, message):
        message_data = json.loads(message)
        ticket_id = message_data[constants.TICKET_ID]
        prev_status = message_data[constants.PREVIOUS_STEP_STATUS]
        task_name = message_data[constants.TASK_NAME]
        # Run the service
        if prev_status == "success":
            resp = self.process.do_task(ticket_id, task_name)
        elif hasattr(self, 'revert_task'):
            resp = self.process.revert_task(ticket_id, task_name)
        else:
            resp = True
        if resp:
            self.logger.debug('Acknowledging')
            self.logger.debug(resp)
            self.conn.ack(headers['message-id'], self.process.conn_id)
        else:
            self.conn.nack(headers['message-id'], self.process.conn_id)

    def on_disconnected(self):
        self.conn.connect('admin', 'admin', wait=True)
        self.subscribe_to_queue()

    def subscribe_to_queue(self):
        queue = os.getenv('QUEUE_NAME')
        self.conn.subscribe(destination=queue, id=self.process.conn_id, ack='client-individual')

def get_mq_connection():
    conn = stomp.Connection([(_host, _port)], heartbeats=(4000, 4000))
    conn.connect(_user, _password, wait=True)
    return conn

class CustomWorker(Process):
    def __init__(self, **kwargs):
        super(CustomWorker, self).__init__()
        self.logger = logging.getLogger("Worker Log")
        log_level = os.getenv('LOG_LEVEL', 'WARN')
        self.logger.setLevel(log_level)
        self.mqconn = get_mq_connection()
        self.conn_id = random.randrange(1, 100)
        for k, v in kwargs.items():
            setattr(self, k, v)

    def revert_task(self, ticket_id, task_name):
        # If the subclass does not implement this,
        # then there is nothing to undo so just return True
        return True

    def run(self):
        lst = MyListener(self)
        self.mqconn.set_listener('queue_listener', lst)
        while True:
            pass
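One hedged guess at the hang: the stomp connection (a socket plus a background receiver thread) is created in __init__, which runs in the parent process, but it is used after the fork in the child, and neither sockets nor threads survive a fork cleanly. A sketch that defers all connection setup to run(), so everything is created inside the child process (this is an assumption, not a confirmed fix):

# Sketch only: defer socket/logger creation to the child process.
class CustomWorker(Process):
    def __init__(self, **kwargs):
        super(CustomWorker, self).__init__()
        for k, v in kwargs.items():
            setattr(self, k, v)

    def run(self):
        # Everything that opens sockets or spawns threads happens here,
        # inside the child process, after the fork.
        self.logger = logging.getLogger("Worker Log")
        self.logger.setLevel(os.getenv('LOG_LEVEL', 'WARN'))
        self.mqconn = get_mq_connection()
        self.conn_id = random.randrange(1, 100)
        self.mqconn.set_listener('queue_listener', MyListener(self))
        while True:
            time.sleep(1)  # keep the process alive without a busy-wait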
It seems like Celery is exactly what you need.
Celery is a task queue that can distribute work across worker processes and even across machines.
Miguel Grinberg wrote a great post about that, showing how to accept tasks via Flask and spawn them as Celery tasks.
Good Luck!
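For example, the message handling could become a Celery task along these lines (the broker URL and task body here are placeholders, not from your code):

from celery import Celery

# Placeholder broker URL; point it at your own Redis/RabbitMQ instance.
celery_app = Celery('worker', broker='redis://localhost:6379/0')

@celery_app.task
def handle_queue_message(ticket_id, task_name):
    # The do_task / revert_task logic from CustomWorker would go here.
    return True

The Flask endpoint then just calls handle_queue_message.delay(ticket_id, task_name) and returns immediately, while a separate celery worker process does the work.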
To resolve this issue I have decided to run the Flask API and the message queue listener as two entirely separate applications in the same docker container. I have installed and configured supervisord to start and manage the two processes individually.
[supervisord]
nodaemon=true
logfile=/home/appuser/logs/supervisord.log
[program:gunicorn]
command=gunicorn -w 1 -c gunicorn.conf.py "app:create_app()" -b 0.0.0.0:8081 --timeout 10000
directory=/home/appuser/app
user=appuser
autostart=true
autorestart=true
stdout_logfile=/home/appuser/logs/supervisord_worker_stdout.log
stderr_logfile=/home/appuser/logs/supervisord_worker_stderr.log
[program:mqlistener]
command=python3 start_listener.py
directory=/home/appuser/mqlistener
user=appuser
autostart=true
autorestart=true
stdout_logfile=/home/appuser/logs/supervisord_mqlistener_stdout.log
stderr_logfile=/home/appuser/logs/supervisord_mqlistener_stderr.log
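start_listener.py itself can stay tiny; a rough sketch (details omitted, same helpers as above):

# start_listener.py - minimal sketch; supervisord restarts it if it dies
from custom_worker_utils.custom_worker_utils import CustomWorker

if __name__ == '__main__':
    worker = CustomWorker()
    worker.run()  # run in this process; no multiprocessing needed anymore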
I have a Flask app. I want the client-server connection to terminate if the server does not respond within a stipulated time (say 20 seconds). I read here that session.permanent = True can be set. I am a bit unclear about where this goes in the server-side code (if this is even the right approach??).
For simplicity I am including the minimal server-side code I have. In reality the server performs a file read/write operation and returns a result to the client.
from flask import Flask, session, app
from flask_restful import Api, Resource
from datetime import timedelta

app = Flask(__name__)
api = Api(app)

class GetParams(Resource):
    def get(self):
        print("Hello.")
        return 'OK'

api.add_resource(GetParams, '/data')

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=5002)
Can anyone tell me what I should do here so that the connection between my client and server is terminated if the server does not respond (i.e., send data back to the client) within 20 seconds?
Long-running tasks should be dealt with in a different design, because if you allow your server to keep a request alive for 50 minutes, you can't force the user's browser to do the same.
I would recommend implementing the long-running task as a thread that notifies the user once it's done, as sketched below.
For more reading about the problem statement and suggested solutions:
timeout issue with chrome and flask
long request time patterns
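A minimal sketch of that thread-based pattern (the endpoint names and the in-memory result store are illustrative, not from the original post):

import threading
import uuid
from flask import Flask, jsonify

app = Flask(__name__)
task_results = {}  # illustrative in-memory store: task_id -> result

def long_file_task(task_id):
    # ... the slow file read/write would go here ...
    task_results[task_id] = 'OK'

@app.route('/data')
def start_task():
    task_id = str(uuid.uuid4())
    task_results[task_id] = None
    threading.Thread(target=long_file_task, args=(task_id,)).start()
    # Respond immediately; the client polls /status/<task_id> for the result.
    return jsonify({'task_id': task_id}), 202

@app.route('/status/<task_id>')
def task_status(task_id):
    result = task_results.get(task_id)
    return jsonify({'done': result is not None, 'result': result})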
I believe the only thing you need is to put your connection statement in a try/except block, so that you can handle any kind of connection error.
Furthermore, a session timeout and a connection failure/unreachable server are different things. A session timeout disconnects a user who has been connected to a server for too long (usually used to avoid a user forgetting about an open session). When a server is unreachable, on the other hand, the user never connects, so there is no session timeout.
from flask import Flask, session, app
from flask_restful import Api, Resource
from datetime import timedelta

app = Flask(__name__)
api = Api(app)

class GetParams(Resource):
    def get(self):
        print("Hello.")
        return 'OK'

api.add_resource(GetParams, '/data')

if __name__ == '__main__':
    try:
        app.run(host='130.0.1.1', port=5002)
    except Exception:
        print("unexpected error")
You could qualify the received exception, but you'll have to read a bit of the docs: http://flask.pocoo.org/docs/1.0/quickstart/#what-to-do-if-the-server-does-not-start
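For instance (OSError is my guess at what a failed bind raises; check the linked doc):

if __name__ == '__main__':
    try:
        app.run(host='130.0.1.1', port=5002)
    except OSError as e:
        # e.g. "Cannot assign requested address" when the host IP is not local
        print("server failed to start:", e)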
I have a Flask application with a route (webhook) receiving POST requests (webhooks) from an external phone application (incoming call = POST request). This route sets threading.Event.set(), and based on this event another route (eventsource) sends an event stream to an open EventSource connection on a web page created by yet another route (eventstream).
telfa_called = Event()
telfa_called.clear()
call = ""

@telfa.route('/webhook', methods=['GET', 'POST'])
def webhook():
    global call
    print('THE CALL IS HERE')
    x = request.data
    y = ET.fromstring(x.decode())
    caller_number = y.find('caller_number').text
    telfa_called.set()  # setting threading.Event for another route
    return Response(status=200)

@telfa.route('/eventstream', methods=['GET', 'POST'])
@login_required
def eventstream():
    jsid = str(uuid.uuid4())
    return render_template('telfa/stream.html', jsid=jsid)

def eventsource_gen():
    while 1:
        if telfa_called.wait(10):
            telfa_called.clear()
            print('JE TO TADY')  # Czech: "IT'S HERE"
            yield "data: {}\n\n".format(json.dumps(call))

@telfa.route('/eventsource', methods=['GET', 'POST'])
def eventsource():
    return Response(eventsource_gen(), mimetype='text/event-stream')
Everything works great when testing in a pure Python application. The problem starts when I move this to the production server, where I use uWSGI with nginx. (Other parts of this Python application work without any trouble.)
When the EventSource connection is opened and an incoming webhook should be processed, the whole Flask server gets stuck (for all other users, too), the page stops loading and I cannot find where the error is.
I only know that the POST request from the external application is received, but the response to the EventSource is never made.
I suspect it has something to do with processes - the EventSource connection from JavaScript is one process, the webhook route another - and they do not communicate. Either way, I suppose this has a very trivial solution, but I haven't found it in the past 3 days and nights. Any hints, please? Thanks in advance.
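If that is the cause - each uWSGI worker is a separate process with its own copy of telfa_called, so a set() in one worker is invisible to the others - then the signal would have to live outside process memory. A sketch of what I mean, using Redis pub/sub instead of threading.Event (the channel name is just an example):

import json
from redis import StrictRedis

r = StrictRedis()

# In webhook(), instead of telfa_called.set():
#     r.publish('telfa_calls', json.dumps(call))

def eventsource_gen():
    p = r.pubsub()
    p.subscribe('telfa_calls')
    for m in p.listen():
        if m['type'] == 'message':
            yield "data: {}\n\n".format(m['data'].decode())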
To be complete, this is my uwsgi config file:
[uwsgi]
module = wsgi:app
enable-threads = true
master = true
processes = 5
threads = 2
uid = www-data
gid= www-data
socket = /tmp/myproject.sock
chmod-socket = 666
vacuum = true
die-on-term = true
limit-as=512
buffer-size = 512000
workers = 5
max-requests = 100
req-logger = file:/tmp/uwsg-req.log
logger = file:/tmp/uwsgi.log
I have an internal website which is required to have file-sharing links that are direct links to a shared location on the PC that each table row represents.
When accessing the links, I would like to first test whether the remote PC is available, in the quickest possible fashion. I thought this would be a ping, but for some reason the timeout does not work with -w (yes, Windows).
This is not allowed to take time; for some reason it causes the web server to block on the ping, even though I am using Tornado to serve the Flask routes asynchronously.
Preferably, I would like the server to continuously update the front end with active/inactive links, allowing users to only access links for PCs that are online, and restricting them otherwise. Possibly even maintaining the value in a database.
Any and all advice is welcome; I've never really worked with file sharing before.
Backend is Python 3.4, Flask & Tornado.
The Ajax Call
function is_drive_online2(sender){
    hostname = sender.parentNode.parentNode.id;
    $.get('Media/test', {
        drive: hostname
    },
    function(returnedData){
        console.log(returnedData[hostname]);
        if(returnedData[hostname] == 0){
            open("file://" + hostname + "/MMUsers");
        }else{
            alert("Server Offline");
        }
    });
}
The Response (Flask route)
@app.route('/Media/test', methods=['GET', 'POST'])
def ping_response():
    before = datetime.datetime.now()
    my_dict = dict()
    drive = request.args.get('drive')
    print(drive)
    response = os.system("ping -n 1 -w 1 " + drive)
    my_dict[drive] = response
    after = datetime.datetime.now()
    print(after - before)
    return json.dumps(my_dict), 200, {'Content-Type': 'application/json'}
The ping call takes 18 seconds to resolve, even with -w 1 (or 1000)
I only need to support Internet Explorer 11. Is this even a plausible scenario? Are there hardware limitations to something like this? Should the server have a long-running thread whose sole task is to continuously update the active/inactive links? I am not sure of the best approach.
Thanks for reading.
EDIT 1:
Trying to implement ping_response as a native Tornado asynchronous handler. The result is the same.
class PingHandler(RequestHandler):
    @asynchronous
    def get(self):
        dr = self.get_argument('drive')
        print(dr)
        b = datetime.datetime.now()
        myreturn = {self.get_argument('drive'):
                    os.system("ping -n 1 -w 1 " + self.get_argument('drive'))}
        a = datetime.datetime.now()
        print(a - b)
        self.write(myreturn)

wsgi = WSGIContainer(app)
application = Application([(r"/Media/test", PingHandler),
                           (r".*", FallbackHandler, dict(fallback=wsgi))])
application.listen(8080)
IOLoop.instance().start()
EDIT 2: Trying to use Celery. Still blocking.
def make_celery(app):
    celery = Celery(app.name, broker=app.config['CELERY_BROKER_URL'])
    celery.conf.update(app.config)
    TaskBase = celery.Task

    class ContextTask(TaskBase):
        abstract = True

        def __call__(self, *args, **kwargs):
            with app.app_context():
                return TaskBase.__call__(self, *args, **kwargs)

    celery.Task = ContextTask
    return celery

celery = make_celery(app)

@celery.task
def ping(drive):
    """
    Background task to test whether a computer is online
    :param drive: The drive name to test
    :return: Non-zero status code for offline boxes.
    """
    response = os.system("ping -n 1 -w 1 " + drive)
    return json.dumps({drive: response}), 200, {'Content-Type': 'application/json'}

@app.route('/Media/test', methods=['GET', 'POST'])
def ping_response():
    before = datetime.datetime.now()
    my_dict = dict()
    drive = request.args.get('drive')
    print(drive)
    this_drive = temp_session.query(Drive).filter(Drive.name == drive).first()
    address = this_drive.computer.ip_address if this_drive.computer.ip_address else this_drive.name
    response = ping.apply_async(args=[address])
    return response
Tornado isn't serving your Flask app asynchronously (that's impossible: asynchronousness is a property of the interface, and ping_response is a synchronous function). Tornado's WSGIContainer is a poor fit for what you're trying to do (see the warning in its docs).
You should either use Flask with a multi-threaded server like gunicorn or uwsgi, or use native Tornado asynchronous RequestHandlers, as sketched below.
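A sketch of the native-Tornado option, pushing the blocking ping onto a thread pool so the IOLoop stays free (the handler name and pool size are illustrative):

import os
from concurrent.futures import ThreadPoolExecutor
from tornado import gen
from tornado.concurrent import run_on_executor
from tornado.web import RequestHandler

class AsyncPingHandler(RequestHandler):
    executor = ThreadPoolExecutor(max_workers=4)  # shared pool for blocking calls

    @run_on_executor
    def do_ping(self, drive):
        # The blocking call runs on a worker thread, not on the IOLoop.
        return os.system("ping -n 1 -w 1 " + drive)

    @gen.coroutine
    def get(self):
        drive = self.get_argument('drive')
        status = yield self.do_ping(drive)
        self.write({drive: status})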
I am trying to respond to incoming web requests simultaneously, while the processing of a request includes a quite long IO call. I'm going to use gevent, as it's supposed to be "non-blocking".
The problem I found is that requests are processed sequentially even though I have a lot of gevent threads. For some reason requests get served by a single green thread.
I have nginx (with the default config, which I think isn't relevant here), and I have uwsgi and a simple wsgi app that emulates an IO-blocking call with gevent.sleep(). Here they are:
uwsgi.ini
[uwsgi]
chdir = /srv/website
home = /srv/website/env
module = wsgi:app
socket = /tmp/uwsgi_mead.sock
#daemonize = /data/work/zx900/mob-effect.mead/logs/uwsgi.log
processes = 1
gevent = 100
gevent-monkey-patch = true
wsgi.py
import gevent
import time
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    t0 = time.time()
    gevent.sleep(10.0)
    t1 = time.time()
    return "{1} - {0} = {2}".format(t0, t1, t1 - t0)
Then I open two tabs in my browser (almost) simultaneously, and here is what I get as the result:
1392297388.98 - 1392297378.98 = 10.0021491051
# first tab, processing finished at 1392297388.98
1392297398.99 - 1392297388.99 = 10.0081849098
# second tab, processing started at 1392297388.99
As you can see, the first call blocked execution of the view. What did I do wrong?
Send requests with curl or anything else other than a browser, as browsers have a limit on the number of simultaneous connections per site or per address. Or use two different browsers.
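A quick way to verify, sketched with two threads firing requests at (almost) the same time (the URL is whatever your nginx/uwsgi listens on):

import threading
import time
import requests

URL = "http://localhost/"  # adjust to your nginx/uwsgi address

def timed_get(tag):
    t0 = time.time()
    requests.get(URL)
    print("request {} took {:.1f}s".format(tag, time.time() - t0))

threads = [threading.Thread(target=timed_get, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

If both requests finish in roughly 10 seconds, the green threads are serving them concurrently.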