Flask server unresponsive port - how to kill all threads - python

I am running a Flask-powered server in Python. The server has several POST and GET routes. Everything works fine for many days, then suddenly the server becomes unresponsive from the client's point of view, i.e. from the client side it looks like the server is unreachable or not running. On the server side everything appears to be running normally. I suspect that at the OS level (Windows Server 2012) a TCP socket may go bad, but I do not get any exception notification on the server.
To try to escape this state I have added a heartbeat POST and, server-side, a periodic check (scheduled every 2 minutes) that a fresh heartbeat has come from the client. If not (i.e. the heartbeats are stale), my idea was to kill all threads server-side (a CTRL-C generated from within the Python code) and then restart the server externally via the launching script. This is what I followed to kill the server: http://flask.pocoo.org/snippets/67/
However, at the first occurrence of the "unreachable" state, even the GET/POST routes from localhost are unresponsive, so the shutdown never gets triggered.
Now my question: is there a way from a scheduler spawned thread to kill all other threads, including the Flask app.run thread?
Many thanks!
PB
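The stale-heartbeat watchdog described in the question could be sketched roughly as below. This is a minimal, hypothetical sketch (names are illustrative, not from any answer): the heartbeat POST handler is assumed to update `last_heartbeat['ts']`, and `os._exit(1)` hard-kills the whole process, every thread included, including the one running `app.run`, relying on the external launching script to restart it:

```python
import os
import threading
import time

def start_watchdog(last_heartbeat, timeout_s, interval_s=120,
                   on_stale=lambda: os._exit(1)):
    """Periodically check that the client heartbeat is fresh.

    last_heartbeat: a dict whose 'ts' key the heartbeat POST handler
    updates with time.time(). If the newest heartbeat is older than
    timeout_s, fire on_stale; the default hard-exits the process,
    which kills every thread without any cleanup.
    """
    def check():
        while True:
            time.sleep(interval_s)
            if time.time() - last_heartbeat['ts'] > timeout_s:
                on_stale()

    t = threading.Thread(target=check, daemon=True)
    t.start()
    return t
```

Unlike the shutdown route from the linked Flask snippet, this does not depend on the server still being able to serve a request, so it can fire even when the listening socket has gone bad.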

Related

Python - best way to wait until a remote system is booted

I am using wake on lan to start a certain server in a python script.
The server is online when I can make a successful API request, such as:
return requests.get(
    url + path,
    auth=('user', user_password),
    headers={'Content-Type': 'application/json'},
    verify=False,
    timeout=0.05
).json()
What is the best method to wait for the server bootup process (until it is reachable via API) without spamming the network with requests in a loop?
I believe you're very close. Why not put that request inside while and try/except blocks?
while True:
    try:
        return requests.head(...)
    except requests.exceptions.ConnectionError:
        time.sleep(0.5)
Your two choices are to poll the remote service until it responds or configure the service to in some way notify you that it's up.
There's really nothing wrong with sending requests in a loop, as long as you don't do so unnecessarily often. If the service takes ~10 seconds to come up, checking once a second would be reasonable. If it takes ~10 minutes, every 30 seconds or so would probably be fine.
The alternative - some sort of push notification - is more elegant, but it requires having some other service up and running already, listening for the notification. For example, you could start a simple webserver locally before restarting the remote service and have the remote service make a request against your server when it's ready to start handling requests.
Generally speaking I would start with the polling approach since it's easier and involves fewer moving parts. Just be sure you design your polling in a fault-tolerant way; in particular be sure to specify a maximum time to wait or number of polling attempts to make before giving up. Otherwise your script will just hang if the remote service never comes up.
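The fault-tolerant polling described above can be sketched as a small helper (names are hypothetical; the `probe` would be something like `lambda: requests.head(url, timeout=2).ok`):

```python
import time

def wait_until_up(probe, max_wait_s=300, interval_s=5):
    """Poll `probe` until it returns True or max_wait_s elapses.

    probe: a zero-argument callable that returns True when the remote
    service is reachable; while the service is still booting it may
    raise (e.g. a connection error) or return False.
    """
    deadline = time.time() + max_wait_s
    while time.time() < deadline:
        try:
            if probe():
                return True
        except Exception:  # e.g. requests.exceptions.ConnectionError
            pass
        time.sleep(interval_s)
    return False  # give up instead of hanging forever
```

The explicit deadline is the important part: without it, a service that never comes up would hang the script indefinitely.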

Terminating a uwsgi worker programmatically

In my application I need to "simulate" a HTTP timeout. Simply put, in this scenario:
client -> myapp -> server
client makes a HTTP POST connection to myapp which forwards it to server. However, server does not respond due to network issues or similar problems. I am stuck with an open TCP session from client which I'll need to drop.
My application uses web.py, nginx and uwsgi.
I cannot return a custom HTTP error such as 418 I am a teapot - it has to be a connection timeout to mirror server's behaviour as closely as possible.
One hack-y solution could be (I guess) to just time.sleep() until the client disconnects, but this would tie up a uwsgi thread, and I have a feeling it could lead to resource starvation because a server timeout is likely to happen for other connections too. Another approach is pointed out here; however, that solution implies returning something to the client, which is not my case.
So my question is: is there an elegant way to kill a uwsgi worker programmatically from python code?
So far I've found:
- set_user_harakiri(N), which I could combine with a time.sleep(N+1). However, in this scenario uwsgi detects the harakiri and tries re-spawning the worker.
- worker_id(), but I'm not sure how to handle it - I can't find much documentation on using it.
- A suggestion to use connection_fd(), as explained here.
- disconnect(), which does not seem to do anything - the code continues and returns to the client.
- suspend(), which does suspend the instance, but nginx returns the boilerplate error page.
Any other idea?
UPDATE
Turns out it's more complicated than that. If I just close the socket or disconnect from uwsgi the nginx web server detects a 'server error' and returns a 500 boilerplate error page. And, I do not know how to tell nginx to stop being so useful.
The answer is a combination of both:
1. From the Python app, return 444.
2. Configure nginx as explained in this answer, i.e. using the uwsgi_intercept_errors directive.
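Loosely, the nginx side might look like the fragment below. This is only a sketch: the location and socket path are hypothetical, and the exact behaviour depends on the nginx version; the point, per the linked answer, is the uwsgi_intercept_errors directive:

```nginx
location / {
    include uwsgi_params;
    uwsgi_pass unix:/tmp/app.sock;  # hypothetical socket path
    uwsgi_intercept_errors on;      # per the linked answer, needed so nginx honours the app's 444
}
```

With this in place, returning status 444 from the app results in nginx closing the connection without sending a response body, which is what the client perceives as a timeout.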

Which web servers are compatible with gevent and how do the two relate?

I'm looking to start a web project using Flask and its SocketIO plugin, which depends on gevent (something something greenlets), but I don't understand how gevent relates to the webserver. Does using gevent restrict my server choice at all? How does it relate to the different levels of web servers that we have in python (e.g. Nginx/Apache, Gunicorn)?
Thanks for the insight.
First, let's clarify what we are talking about:
- gevent is a library that makes programming with event loops easy. It is a way to immediately return responses without "blocking" the requester.
- socket.io is a JavaScript library for creating clients that maintain permanent connections to servers, which push events; the library can then react to those events.
- greenlet: think of this as a lightweight thread - a way to launch multiple workers that perform tasks.
A highly simplified overview of the entire process follows:
Imagine you are creating a chat client.
You need a way to notify the users' screens when anyone types a message. For this to happen, you need some way to tell all the users when a new message is there to be displayed. That's what socket.io does. You can think of it like a radio that is tuned to a particular frequency. Whenever someone transmits on this frequency, the code does something. In the case of the chat program, it adds the message to the chat box window.
Of course, if you have a radio tuned to a frequency (your client), then you need a radio station/dj to transmit on this frequency. Here is where your flask code comes in. It will create "rooms" and then transmit messages. The clients listen for these messages.
You can also write the server-side ("radio station") code in socket.io using node, but that is out of scope here.
The problem here is that traditionally - a web server works like this:
1. A user types an address into a browser, and hits enter (or go).
2. The browser reads the web address, and then, using the DNS system, finds the IP address of the server.
3. It creates a connection to the server, and then sends a request.
4. The webserver accepts the request.
5. It does some work, or launches some process (depending on the type of request).
6. It prepares (or receives) a response from the process.
7. It sends the response to the client.
8. It closes the connection.
Between steps 3 and 8, the client (the browser) is waiting for a response - it is blocked from doing anything else. So if there is a problem somewhere - say, some server-side script is taking too long to process the request - the browser stays stuck on a white page with the loading icon spinning. It can't do anything until the entire process completes. This is just how the web was designed to work.
This kind of 'blocking' architecture works well for 1-to-1 communication. However, for multiple people to keep updated, this blocking doesn't work.
The event libraries (gevent) help with this because they accept the request without blocking the client: they immediately send a response and finish the actual work when the process completes.
Your application, however, still needs to notify the client - but since the connection has been closed, you have no way to reach the client again.
In order to notify the client and to make sure the client doesn't need to "refresh", a permanent connection should be open - that's what socket.io does. It opens a permanent connection, and is always listening for messages.
So:
1. A work request comes in from one end and is accepted.
2. The work is executed and a response is generated by something else (it could be the same program or another program).
3. A notification is then sent: "hey, I'm done with your request - here is the response".
4. The person from step 1 listens for this message and then does something.
Underneath it all is WebSocket, a full-duplex protocol that enables all this radio/DJ functionality.
Things WebSockets and HTTP have in common:
- They work on the same port (80).
- WebSocket requests start off as HTTP requests for the handshake (an Upgrade header), but then shift over to the WebSocket protocol, at which point the connection is handed off to a websocket-compatible server.
All your traditional web server has to do is listen for this handshake request, acknowledge it, and then pass the request on to a websocket-compatible server - just like any other normal proxy request.
- For Apache, you can use mod_proxy_wstunnel.
- nginx versions 1.3+ have WebSocket support built in.
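For nginx, the handshake pass-through described above can be sketched with a proxy block like the one below (the upstream address and location are hypothetical):

```nginx
location /socket.io/ {
    proxy_pass http://127.0.0.1:8000;        # hypothetical websocket-capable backend
    proxy_http_version 1.1;                  # the Upgrade handshake requires HTTP/1.1
    proxy_set_header Upgrade $http_upgrade;  # forward the client's Upgrade header
    proxy_set_header Connection "upgrade";
}
```

Ordinary HTTP requests to the same location still work: without an Upgrade header they are proxied as plain HTTP.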

Server sent events with Flask/Redis: how can more than one client view a stream?

I have multiple clients trying to connect to a server sent events stream at /stream. This works with a single client, but attempting to connect any more clients results in the new client becoming indefinitely blocked waiting for data. If I send more data, it only goes to the first client, and no others.
Here is a small snippet that illustrates my problem:
import flask
import time

app = flask.Flask(__name__)

def event_stream():
    for i in xrange(9999):
        yield "data: %d\n\n" % i
        time.sleep(1)

@app.route("/stream", methods=["GET"])
def stream():
    return flask.Response(
        event_stream(),
        mimetype="text/event-stream"
    )
I then run this with gunicorn --worker-class=gevent -w 4 -t 99999 app:app. It works for a single client, but any others get blocked when issuing GET /stream.
What is the cause of the block, and how should I fix it?
I debugged a little more and got some strange results. If I do this procedure, then this happens:
Start client 1 (only client 1 receiving data)
Start client 2 (only client 1 receiving data)
Start client 3 (only client 1 receiving data)
Start client 4 (only client 1 receiving data)
Restart client 1 (all 4 clients suddenly start receiving data at the same time)
It turns out that this is something to do with the Chromium web browser, where I was testing. It holds back on making the request until the first one completes, for some reason. Using curl, or an incognito browser session allowed multiple sessions to run at the same time. This means that my problem doesn't really exist in reality, it just appears that way because of the way that Chromium handles simultaneous requests to the same resource.
I'm not sure quite why Chromium behaves this way, it seems weird. Either way, this isn't a real problem, only a perceived one by my browser.
I was getting similar results in Firefox (as I noted in the comments). Then I switched to using WSGIServer in the main block instead of gunicorn, and everything works: the timeout is gone (WSGIServer doesn't time out its workers, but gunicorn does), so I thought this was worth adding as an answer.
Add this:
from gevent.pywsgi import WSGIServer

if __name__ == '__main__':
    http_server = WSGIServer(('127.0.0.1', 8001), app)
    http_server.serve_forever()
Then just do
python app.py
[With Chris's command line and timeout set to 99999 I would not have hit a timeout after 30s, but it would still have happened, just much later.]

python - can't restart socket connection from client if server becomes unavailable temporarily

I am running a Graphite server to monitor instruments at remote locations. I have a "perpetual" ssh tunnel to the machines from my server (loving autossh) to map their local ports to my server's local ports. This works well; data comes through with no hassles. However, we use a flaky satellite connection to the sites, which goes down rather regularly. I am running a "data crawler" on the instrument that uses Python and the socket module to send packets to the Graphite server. The problem is: if the link goes down temporarily (or the server gets rebooted, mostly for testing), I cannot re-establish the connection to the server. I trap the error, run socket.close(), and then re-open, but I just can't re-establish the connection. If I quit the Python program and restart it, the connection comes up just fine. Any ideas how I can "refresh" my socket connection?
It's hard to answer this correctly without a code sample. However, it sounds like you might be trying to reuse a closed socket, which is not possible.
If the socket has been closed (or has experienced an error), you must re-create a new connection using a new socket object. For this to work, the remote server must be able to handle multiple client connections in its accept() loop.
