uWSGI kill issue - python

After adding run-loop code to my Python app, uWSGI seems to take much longer to kill.
Setup
Python, Flask
Running on Nginx with uWSGI
Using a psql database
Issue
Stopping uWSGI used to be very quick.
Recently I integrated a background thread that checks the database every 60 seconds and makes changes if needed.
This seems to be working just fine, except now every time I try to kill uWSGI, it takes a long time.
It seems like the longer I leave the server running, the longer it takes to die, or maybe it always just gets killed after the current 60-second loop ends (I'm not sure my visual inspection supports this).
It sounds like a leak?
Here is the code I recently added:
################################
## deploy.ini module .py file ##
################################
from controllers import runloop
from flask import Flask
from flask import request, redirect, Response

app = Flask(__name__)
runloop.startrunloop()

if __name__ == '__main__':
    app.run()  # app.run(debug=True)
################################
## runloop.py ##
################################
### initialize run loop ###
## code ref: http://stackoverflow.com/a/22900255/2298002
# "Your additional threads must be initiated from the same app that is called by the WSGI server.
# The example below creates a background thread that executes every 5 seconds and manipulates data
# structures that are also available to Flask routed functions."
#####################################################################
import atexit
import logging
import threading

POOL_TIME = 60  # seconds

# variables that are accessible from anywhere
commonDataStruct = {}

# lock to control access to variable
dataLock = threading.Lock()

# thread handler
yourThread = threading.Thread()

def startrunloop():
    logfuncname = 'runloop.startrunloop'
    logging.info(' >> %s >> ENTER ' % logfuncname)

    def interrupt():
        logging.info(' %s >>>> interrupt() ' % logfuncname)
        global yourThread
        yourThread.cancel()

    def loopfunc():
        logging.info(' %s >>> loopfunc() ' % logfuncname)
        global commonDataStruct
        global yourThread
        with dataLock:
            # Do your stuff with commonDataStruct here:
            # a function that performs at most 15 db queries (right now);
            # it will perform many times more db queries in production
            auto_close_dws()
        # Schedule the next run
        yourThread = threading.Timer(POOL_TIME, loopfunc, ())
        yourThread.start()

    def initfunc():
        # Do initialisation stuff here
        logging.info(' %s >> initfunc() ' % logfuncname)
        global yourThread
        # Create your thread
        yourThread = threading.Timer(POOL_TIME, loopfunc, ())
        yourThread.start()

    # Initiate
    initfunc()

    # When you kill Flask (SIGTERM), clear the trigger for the next thread
    atexit.register(interrupt)
Additional info (all flask requests work just fine):
I start server with:
$ nginx
and stop with:
$ nginx -s stop
I start uWSGI with:
$ uwsgi --enable-threads --ini deploy.ini
I stop uWSGI to make python changes with:
ctrl + c (if in the foreground)
Otherwise I stop uWSGI with:
$ killall -s INT uwsgi
Then after making changes to the Python code, I start uWSGI again with:
$ uwsgi --enable-threads --ini deploy.ini
The following is example uWSGI output when I try to kill it:
^CSIGINT/SIGQUIT received...killing workers...
Fri May 6 00:50:39 2016 - worker 1 (pid: 49552) is taking too much time to die...NO MERCY !!!
Fri May 6 00:50:39 2016 - worker 2 (pid: 49553) is taking too much time to die...NO MERCY !!!
Any help or hints are greatly appreciated. Please let me know if I need to be more clear with anything or if I’m missing any details.

I know the question is a bit old, but I had the same problem and Google got me here, so I will answer for anyone who gets here in the same boat.
The problem seems to be caused by the --enable-threads option; we have several applications running with uWSGI and Flask, and only the one with this option has the problem.
If what you want is to have the uWSGI process die faster, you can add these options:
reload-mercy = <int>
worker-reload-mercy = <int>
They will cause uWSGI to force the process to quit after that many seconds.
On the other hand, if all you need is to reload uWSGI, try just sending a SIGHUP signal. This will cause uWSGI to reload its children.
POST NOTE: It seems I spoke too soon; using SIGHUP also hangs sometimes. I am using the mercy options to keep the hang from taking too long.
Also, I found the issue report on the uWSGI GitHub, if anyone wants to follow it:
https://github.com/unbit/uwsgi/issues/844
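For reference, a minimal sketch of how those mercy options might look in an ini file (assuming deploy.ini is also your uWSGI config, which the question's --ini flag suggests); the module name and the 5-second value are illustrative assumptions, not taken from the original setup:
[uwsgi]
module = deploy
enable-threads = true
# force workers that hang on the background Timer thread to be killed after 5 seconds
reload-mercy = 5
worker-reload-mercy = 5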

Related

Profiling an application that uses reactors/websockets and threads

Hi, I wrote a Python program that should run unattended. What it basically does is fetch some data via HTTP GET requests in a couple of threads and fetch data via WebSockets and the Autobahn framework. Running it for 2 days shows me that it has a growing memory demand and eventually stops without any notice.
The documentation says I have to run the reactor as the last line of code in the app.
I read that yappi is capable of profiling threaded applications.
Here is some pseudo code
from autobahn.twisted.websocket import WebSocketClientFactory, connectWS
from twisted.internet import reactor, ssl

if __name__ == "__main__":
    # setting up a thread
    # start the thread
    Consumer.start()

    xfactory = WebSocketClientFactory("wss://url")
    xfactory.protocol = socket
    ## SSL client context: default
    ##
    if xfactory.isSecure:
        contextFactory = ssl.ClientContextFactory()
    else:
        contextFactory = None

    connectWS(xfactory, contextFactory)
    reactor.run()
The example from the yappi project site is the following:
import yappi

def a():
    for i in range(10000000): pass

yappi.start()
a()
yappi.get_func_stats().print_all()
yappi.get_thread_stats().print_all()
So I could put yappi.start() at the beginning and yappi.get_func_stats().print_all() plus yappi.get_thread_stats().print_all() after reactor.run(), but since the code after reactor.run() is never reached, those calls never execute.
So how do I profile a program like that?
Regards
It's possible to use the twistd profilers in the following way:
twistd -n --profile=profiling_results.txt --savestats --profiler=hotshot your_app
hotshot is the default profiler; you are also able to use cProfile.
Or you can run twistd from your Python script by means of:
from twisted.scripts.twistd import run
run()
And add the necessary parameters to the script via sys.argv[1:1] = ["--profile=profiling_results.txt", ...]
Afterwards you can convert the hotshot format to calltree by means of:
hotshot2calltree profiling_results.txt > calltree_profiling
And open generated calltree_profiling file:
kcachegrind calltree_profiling
There is a project for profiling asynchronous execution time: twisted-theseus.
You can also try PyCharm's thread concurrency visualization tool.
There is also a related question on Stack Overflow.
You can also run your function by:
reactor.callWhenRunning(your_function, *parameters_list)
Or use reactor.addSystemEventTrigger() with an event description and your profiling function call, as in the sketch below.
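To make that last suggestion concrete, here is a minimal sketch that combines the question's yappi snippet with reactor.addSystemEventTrigger(); the dump_profile name and the overall arrangement are my own illustration, not code from the original answer:
import yappi
from twisted.internet import reactor

def dump_profile():
    # called when the reactor begins shutting down
    yappi.stop()
    yappi.get_func_stats().print_all()
    yappi.get_thread_stats().print_all()

yappi.start()
# ... build factories and call connectWS(...) here, as in the question ...
reactor.addSystemEventTrigger('before', 'shutdown', dump_profile)
reactor.run()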

Reload python flask server by function

I'm writing a python/flask application and would like to add the functionality of reloading the server.
I'm currently running the server with the following option
app.run(debug=True)
which results in the following, each time a code change happens
* Running on http://127.0.0.1:5000/
* Restarting with reloader
In a production environment however, I would rather not have debug=True set, but be able to only reload the application server whenever I need to.
I'm trying to get two things working:
if reload_needed: reload_server(), and
if a user clicks on a "Reload Server" button in the admin panel, the reload_server() function should be called.
However, despite the fact that the server gets reloaded after code changes, I couldn't find a function that lets me do exactly that.
If possible I would like to use the flask/werkzeug internal capabilities. I am aware that I could achieve something like that by adding things like gunicorn/nginx/apache, etc.
I think I've had the same problem.
So there was a Python/Flask application (XY.py) on clients. I wrote a build step (TeamCity) which deploys this Python code to the clients. Let's suppose XY.py is already running on the clients. After deploying this new/fixed/corrected XY.py, I had to restart it to apply the changes to the running code.
The problem I had is that after using the well-known restart one-liner os.execl(sys.executable, *([sys.executable]+sys.argv)), the port used by the app is still busy/established, so after restarting I can't reach it.
This is how I resolved the problem:
I run my app in a separate Process and made a queue for it. To see it more clearly, here is some code.
some_queue = None

@app.route('/restart')
def restart():
    try:
        some_queue.put("something")
        return "Quit"
    except Exception:
        return "Something went wrong"

def start_flaskapp(queue):
    global some_queue
    some_queue = queue
    app.run(your_parameters)
Add this to your main:
from multiprocessing import Process, Queue
import subprocess
import sys
import time

q = Queue()
p = Process(target=start_flaskapp, args=[q,])
p.start()
while True:  # watching queue: sleep if there is no call, otherwise break
    if q.empty():
        time.sleep(1)
    else:
        break
p.terminate()  # terminate flaskapp and then restart the app on subprocess
args = [sys.executable] + [sys.argv[0]]
subprocess.call(args)
Hope it was clean and short enough and that it helps you!
Add the following to your Python code in order to kill the server:
@app.route('/quit')
def _quit():
    os._exit(0)
When the process is killed, it gets restarted by the while loop below.
app_run.sh:
#!/bin/bash
while true
do
    hypercorn app_async:app -b 0.0.0.0:5000
    sleep 1
done

Celery: correct way to run lengthy initialization function (per process)

TLDR;
To run an initialization function for each process that is spawned by celery, you can use the worker_process_init signal. As you can read in the docs, handlers for that signal should not be blocking for more than 4 seconds.
But what are the options, if I have to run an init function that takes more than 4 seconds to execute?
Problem
I use a C extension module to run certain operations within Celery tasks. This module requires an initialization that might take several seconds (maybe 4-10). Since I would prefer not to run this init function for every task, but rather once for every process that is spawned, I made use of the worker_process_init signal:
# lib.py
import isclient  # C extension module

client = None

def init():
    global client
    client = isclient.Client()  # this might take a while

def create_ne_list(text):
    return client.ne_receiventities4datachunk(text)
# celery.py
from celery import Celery
from celery.signals import worker_process_init
from lib import init

celery = Celery(include=[
    'isc.ne.tasks'
])
celery.config_from_object('celeryconfig')

@worker_process_init.connect
def process_init(sender=None, conf=None, **kwargs):
    init()

if __name__ == '__main__':
    celery.start()
# tasks.py
from celery import celery
from lib import create_ne_list as cnl

@celery.task(time_limit=1200)
def create_ne_list(text):
    return cnl(text)
What happens when I run this code is what I described in my earlier question (Celery: stuck in infinitly repeating timeouts (Timed out waiting for UP message)). In short: since my init function takes longer than 4 seconds, it sometimes happens that a worker gets killed and restarted, and during the restart gets killed again, because that's what automatically happens after 4 seconds of unresponsiveness. This eventually results in an infinitely repeating kill-and-restart loop.
Another option would be to run my init function only once per worker, using the worker_init signal. If I do that, I get a different problem: now the queued-up tasks get stuck for some reason.
When I start the worker with a concurrency of 3 and then send a couple of tasks, the first three get finished and the remaining ones don't get touched. (I assume it might have something to do with the fact that the client object needs to be shared between multiple processes and that the C extension, for some reason, doesn't support that. But to be honest, I'm relatively new to multi-processing, so I can only guess.)
Question
So, the question remains: How can I run an init function per process that takes longer than 4 seconds? Is there a correct way to do that and what way would that be?
Celery limits the process init timeout to 4.0 sec.
Check the source code.
To work around this limit, you can consider changing it before you create the Celery app:
from celery.concurrency import asynpool
asynpool.PROC_ALIVE_TIMEOUT = 10.0  # set this long enough
Note that there is no configuration or setting to change this value.
@changhwan's answer is no longer the only method as of celery 4.4.0. Here is the pull request that added the config option for this feature.
Use the config option
With celery ^4.4.0, this value is configurable. Use the celery application config option worker_proc_alive_timeout. From the stable version docs:
worker_proc_alive_timeout
Default: 4.0.
The timeout in seconds (int/float) when waiting for a new worker process to start up.
Example:
from celery import Celery
from celery.signals import worker_process_init

app = Celery('app')
app.conf.worker_proc_alive_timeout = 10

@worker_process_init.connect
def long_init_function(*args, **kwargs):
    import time
    time.sleep(8)

Python consumes 99% of CPU running eventlet

I have posted to the python and eventlet mailing list already so I apologize if I seem impatient.
I am running eventlet 0.9.16 on a Small (not micro) reserved ubuntu 11.10 aws instance.
I have a socketserver that is similar to the echo server from the examples in the eventlet documentation. When I first start running the code, everything seems fine, but I have been noticing that after 10 or 15 hours the cpu usage goes from about 1% to 99+%. At that point I am unable to make further connections to the socketserver.
This is the code that I am running:
def socket_listener(self, port, socket_type):
    L.LOGG(self._CONN, 0, H.func(), 'Action:Starting|SocketType:%s' % socket_type)
    listener = eventlet.listen((self._host, port))
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    pool = eventlet.GreenPool(20000)
    while True:
        connection, address = listener.accept()
        connection.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        L.LOGG(self._CONN, 0, H.func(), 'IPAddress:%s|GreenthreadsFree:%s|GreenthreadsRunning:%s' % (str(address[0]), str(pool.free()), str(pool.running())))
        pool.spawn_n(self.spawn_socketobject, connection, address, socket_type)
    listener.shutdown(socket.SHUT_RDWR)
    listener.close()
The L.LOGG method simply logs the supplied parameters to a mysql table.
I am running the socket_listener in a thread like so:
def listen_phones(self):
    self.socket_listener(self._port_phone, 'phone')

t_phones = Thread(target=self.listen_phones)
t_phones.start()
From my initial Google searches I thought the issue might be similar to the bug reported at https://lists.secondlife.com/pipermail/eventletdev/2008-October/000140.html, but I am using a newer version of eventlet, so surely that cannot be it?
If listener.accept() is non-blocking, you should put the thread to sleep for a small amount of time, so that the OS scheduler can dispatch work to other processes. Do this by putting
time.sleep(0.03)
at the end of your while True loop.
Sorry for the late reply.
There is no code like listener.setblocking(0); therefore, it MUST behave as blocking and no sleep should be required.
Also, please use a tool like ps or top to at least make sure that it's the Python process that is eating all the CPU.
If the issue still persists, please, report it to one of these channels, whichever you like:
https://bitbucket.org/which_linden/eventlet/issues/new
https://github.com/eventlet/eventlet/issues/new
email to eventletdev@lists.secondlife.com

error: can't start new thread

I have a site that runs with the following configuration:
Django + mod_wsgi + Apache
In one of the user requests, I send another HTTP request to another service, and I do this with Python's httplib library.
But sometimes this service doesn't answer for too long, and httplib's timeout doesn't work. So I create a thread; in this thread I send the request to the service and join it after 20 sec (20 sec is the timeout of the request). This is how it works:
class HttpGetTimeOut(threading.Thread):
    def __init__(self, **kwargs):
        self.config = kwargs
        self.resp_data = None
        self.exception = None
        super(HttpGetTimeOut, self).__init__()

    def run(self):
        h = httplib.HTTPSConnection(self.config['server'])
        h.connect()
        sended_data = self.config['sended_data']
        h.putrequest("POST", self.config['path'])
        h.putheader("Content-Length", str(len(sended_data)))
        h.putheader("Content-Type", 'text/xml; charset="utf-8"')
        if 'base_auth' in self.config:
            base64string = base64.encodestring('%s:%s' % self.config['base_auth'])[:-1]
            h.putheader("Authorization", "Basic %s" % base64string)
        h.endheaders()
        try:
            h.send(sended_data)
            self.resp_data = h.getresponse()
        except httplib.HTTPException, e:
            self.exception = e
        except Exception, e:
            self.exception = e
something like this...
And use it by this function:
getting = HttpGetTimeOut(**req_config)
getting.start()
getting.join(COOPERATION_TIMEOUT)
if getting.isAlive():  # maybe need some block
    getting._Thread__stop()
    raise ValueError('Timeout')
else:
    if getting.resp_data:
        r = getting.resp_data
    else:
        if getting.exception:
            raise ValueError('Request Exception')
        else:
            raise ValueError('Undefined exception')
And it all works fine, but sometimes I start catching this exception:
error: can't start new thread
at the line that starts the new thread:
getting.start()
and the next and final line of the traceback is:
File "/usr/lib/python2.5/threading.py", line 440, in start
    _start_new_thread(self.__bootstrap, ())
And the question is: what is happening?
Thanks for everything, and sorry for my poor English. :)
The "can't start new thread" error almost certainly due to the fact that you have already have too many threads running within your python process, and due to a resource limit of some kind the request to create a new thread is refused.
You should probably look at the number of threads you're creating; the maximum number you will be able to create will be determined by your environment, but it should be in the order of hundreds at least.
It would probably be a good idea to re-think your architecture here; seeing as this is running asynchronously anyhow, perhaps you could use a pool of threads to fetch resources from another site instead of always starting up a thread for every request.
Another improvement to consider is your use of Thread.join and Thread.stop; this would probably be better accomplished by providing a timeout value to the constructor of HTTPSConnection.
You are starting more threads than can be handled by your system. There is a limit to the number of threads that can be active for one process.
Your application is starting threads faster than the threads are running to completion. If you need to start many threads, you need to do it in a more controlled manner; I would suggest using a thread pool, as sketched below.
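As an illustration of the thread-pool idea, here is a minimal Python 3 sketch using concurrent.futures; it is not the original poster's code, and fetch() plus the URL list are placeholders for the real HTTPS POST logic (on the question's Python 2.5, a backport or a hand-rolled pool would be needed):
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def fetch(url, timeout=20):
    # placeholder worker; the 20 s mirrors COOPERATION_TIMEOUT from the question
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()

urls = ["https://example.com/a", "https://example.com/b"]  # illustrative

# never more than 10 threads exist at once, however many requests are queued
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fetch, urls))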
I was running into a similar situation, but my process needed a lot of threads running to take care of a lot of connections.
I counted the number of threads with the command:
ps -fLu user | wc -l
It displayed 4098.
I switched to the user and looked at the system limits:
sudo -u myuser -s /bin/bash
ulimit -u
Got 4096 as response.
So, I edited /etc/security/limits.d/30-myuser.conf and added the lines:
myuser hard nproc 16384
myuser soft nproc 16384
Restarted the service and now it's running with 7017 threads.
P.S. I have a 32-core server and I'm handling 18k simultaneous connections with this configuration.
I think the best way in your case is to set a socket timeout instead of spawning a thread:
h = httplib.HTTPSConnection(self.config['server'],
                            timeout=self.config['timeout'])
Also you can set global default timeout with socket.setdefaulttimeout() function.
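For illustration, a minimal sketch of that global default; the 20-second value simply mirrors COOPERATION_TIMEOUT from the question and is not mandated by the API:
import socket

# affects every socket created afterwards that doesn't set its own timeout,
# including the ones httplib opens internally
socket.setdefaulttimeout(20)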
Update: See the answers to the question Is there any way to kill a Thread in Python? (there are several quite informative ones) to understand why. Thread.__stop() doesn't terminate the thread, but rather sets an internal flag so that it's considered already stopped.
I completely rewrote the code from httplib to pycurl.
c = pycurl.Curl()
c.setopt(pycurl.FOLLOWLOCATION, 1)
c.setopt(pycurl.MAXREDIRS, 5)
c.setopt(pycurl.CONNECTTIMEOUT, CONNECTION_TIMEOUT)
c.setopt(pycurl.TIMEOUT, COOPERATION_TIMEOUT)
c.setopt(pycurl.NOSIGNAL, 1)
c.setopt(pycurl.POST, 1)
c.setopt(pycurl.SSL_VERIFYHOST, 0)
c.setopt(pycurl.SSL_VERIFYPEER, 0)
c.setopt(pycurl.URL, "https://"+server+path)
c.setopt(pycurl.POSTFIELDS,sended_data)
b = StringIO.StringIO()
c.setopt(pycurl.WRITEFUNCTION, b.write)
c.perform()
something like that.
And I'm testing it now. Thanks to all of you for your help.
If you are trying to set a timeout, why don't you use urllib2?
I'm running a Python script on my machine just to copy and convert some files from one format to another, and I want to maximize the number of running threads to finish as quickly as possible.
Note: this is not a good workaround from an architecture perspective if you aren't using it for a quick script on a specific machine.
In my case, I checked the max number of running threads that my machine can run before I got the error; it was 150.
I added this code before starting a new thread. It checks whether the max limit of running threads has been reached; if so, the app waits until some of the running threads finish, then it starts the new thread.
while threading.active_count() > 150:
    time.sleep(5)
mythread.start()
If you are using a ThreadPoolExecutor, the problem may be that your max_workers is higher than the threads allowed by your OS.
It seems that the executor keeps the information of the last executed threads in the process table, even if the threads are already done. This means that when your application has been running for a long time, it will eventually register in the process table as many threads as ThreadPoolExecutor.max_workers.
As far as I can tell it's not a Python problem. Your system simply cannot create another thread (I had the same problem and couldn't start htop in another CLI via SSH).
The answer of Fernando Ulisses dos Santos is really good. I just want to add that there are other tools limiting the number of processes and memory usage "from the outside". This is pretty common for virtual servers. A starting point is the interface of your vendor, or you might have luck finding some information in files like
/proc/user_beancounters
