Python Flask threads do not close

My Python Flask app runs under nohup, i.e. it is always live. I see that it creates a thread every time a user submits from the page. That is because app.run is called with threaded=True. My problem is that even after the processing is over, the thread doesn't seem to be closed. I'm checking this with the command ps -eLf | grep userid, where I see many threads still active long after the code execution is over, and another one is added each time a submit is done. All threads are removed when the app itself is restarted.
What are the criteria for a thread to close without restarting the app?
Many posts like these suggest gc.collect, del object, etc.
I have many user-defined classes that get instantiated on submit, and one object refers to another. So
is it because the memory is not getting released?
Should I use gc.collect or del the objects?
Python should be clearing these objects once the variables go out of scope. Is that correct?
from flask import Flask

app = Flask(__name__)

@app.route('/submit', methods=['GET', 'POST'])
def submit():
    # obj1 = class1()
    # obj2 = class2(obj1)
    # obj3 = class3(obj1)
    # refer objects, process data, done
    return 'done'

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=4000, threaded=True, debug=False)

It looks like the problem was a paramiko object not getting closed. Once an SFTPClient or SSHClient is opened, it has to be closed explicitly. I had assumed it would be closed along with my class object (where the paramiko object is defined), but it isn't.
So at the end of my process I call the lines below, and now the threads seem to be closed properly:
import gc

# close every paramiko object explicitly; going out of scope is not enough
if objs.ssh:
    objs.ssh.close()
if objs.sftp:
    objs.t.close()
    objs.sftp.close()
del objs
gc.collect()
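The answer comes down to this: an object that owns a running thread is not cleaned up when its last variable goes out of scope, because the thread itself keeps the object alive, so del and gc.collect() cannot stop it. The minimal sketch below assumes a hypothetical FakeTransport class standing in for a paramiko connection (paramiko's Transport does run its own thread, but this class is not paramiko). It shows one request leaking a thread and another releasing it through contextlib.closing.

```python
import threading
from contextlib import closing


class FakeTransport:
    """Hypothetical stand-in for a connection that owns a background thread."""

    def __init__(self):
        self._stop = threading.Event()
        # daemon=True only so this demo can exit; the thread still outlives the request
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Runs until close() is called. The thread holds a reference to self
        # (through the bound method), so gc.collect() cannot reclaim the object.
        self._stop.wait()

    def close(self):
        self._stop.set()
        self._thread.join()


def leaky_submit():
    transport = FakeTransport()
    return "processed"  # transport goes out of scope, but its thread keeps running


def closing_submit():
    # closing() calls .close() when the block exits, even if the body raises
    with closing(FakeTransport()):
        return "processed"


base = threading.active_count()
leaky_submit()
print(threading.active_count() - base)  # 1: one thread left behind
closing_submit()
print(threading.active_count() - base)  # still 1: this call cleaned up after itself
```

With real paramiko objects the same idea applies: close them in a finally block (or, in recent paramiko versions, use SSHClient and SFTPClient directly in a with statement) rather than relying on scope.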

Related

Python Flask returning a html page while simultaneously performing a function

I'm currently creating a web app using Python Flask and I've run into a road block and I'm not sure if I'm even thinking about it correctly.
So my website's homepage is just a simple landing page with a text input that is required to perform the website's function. What I am trying to accomplish is for the web app to do two things after the text is entered. First, the server takes the username input and performs a function that doesn't return anything to the user but creates a bunch of data that is logged into an SQLite database and used later on in the process. Then, the server returns the web page for a survey that has to be taken after the username is entered. However, the function that the server performs can take upwards of 2 minutes depending on the user. The way I currently have it coded, the server performs the function and, once it has finished, returns the web page, so the user is stuck at a loading screen for up to 2 minutes.
from flask import render_template, request

@app.route("/survey")
def main():
    raw_user = request.args.get("SteamID")
    # the slow part (user_obj comes from code not shown in the question)
    games = createGameDict(user_obj)
    tag_lst = get_tags(games)
    return render_template("survey_page.html")
Since the survey doesn't depend on the user input, instead of having the user sit at a loading screen, I would like them to be able to start the survey while the function works in the background. Is that possible, and how would I do that?
Update: I've had to solve this problem a number of times in Flask, so I wrote a small Flask extension called Flask-Executor to do it for me. It's a wrapper for concurrent.futures that provides a few handy features, and is my preferred way of handling background tasks that don't require distribution in Flask.
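Flask-Executor's own API is not shown here. The same "run it in the background" behaviour can be had from a plain concurrent.futures.ThreadPoolExecutor, which also caps how many background threads exist at once. This is a minimal sketch, assuming a single shared pool; submit_slow_function, the pool size, and the short sleep are illustrative choices, not part of the original answer.

```python
from concurrent.futures import ThreadPoolExecutor
from time import sleep

# One shared, bounded pool for the whole app (max_workers=4 is an arbitrary choice).
executor = ThreadPoolExecutor(max_workers=4)


def slow_function(some_object):
    sleep(0.1)  # stands in for the long-running work
    return "processed %s" % some_object


def submit_slow_function(some_object):
    # Returns a Future immediately; a route could call this and then
    # render its template without waiting for the work to finish.
    return executor.submit(slow_function, some_object)


future = submit_slow_function("some_user")
print(future.result(timeout=5))  # processed some_user
```

Calling future.result() blocks, so a route would normally just submit and return; result() is used here only to show the outcome.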
For more complex background tasks, something like Celery is your best bet. For simpler use cases, however, what you want is the threading module.
Consider the following example:
from flask import Flask
from time import sleep

app = Flask(__name__)

def slow_function(some_object):
    sleep(5)
    print(some_object)

@app.route('/')
def index():
    some_object = 'This is a test'
    slow_function(some_object)  # blocks the response for five seconds
    return 'hello'

if __name__ == '__main__':
    app.run()
Here, we create a function, slow_function() that sleeps for five seconds before returning. When we call it in our route function it blocks the page load. Run the example and hit http://127.0.0.1:5000 in your browser, and you'll see the page wait five seconds before loading, after which the test message is printed in your terminal.
What we want to do is to put slow_function() on a different thread. With just a couple of additional lines of code, we can use the threading module to separate out the execution of this function onto a different thread:
from flask import Flask
from time import sleep
from threading import Thread

app = Flask(__name__)

def slow_function(some_object):
    sleep(5)
    print(some_object)

@app.route('/')
def index():
    some_object = 'This is a test'
    # run slow_function on its own thread so the response returns immediately
    thr = Thread(target=slow_function, args=[some_object])
    thr.start()
    return 'hello'

if __name__ == '__main__':
    app.run()
What we're doing here is simple. We're creating a new instance of Thread and passing it two things: the target, which is the function we want to run, and args, the argument(s) to be passed to the target function. Notice that there are no parentheses on slow_function, because we're not running it - functions are objects, so we're passing the function itself to Thread. As for args, it expects a sequence of arguments (a list or a tuple). Even if you only have one argument, wrap it in a list (or a one-element tuple with a trailing comma) so args gets what it's expecting.
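To see why args must be a sequence, here is a small stdlib-only sketch (no Flask) in which both a one-element list and a one-element tuple deliver exactly one argument to the target. The results list and the join() calls exist only so the demo can check what the thread received.

```python
from threading import Thread

results = []


def slow_function(some_object):
    results.append(some_object)


# A one-element list and a one-element tuple (note the trailing comma) both work.
t1 = Thread(target=slow_function, args=["This is a test"])
t2 = Thread(target=slow_function, args=("This is a test",))
for t in (t1, t2):
    t.start()
    t.join()  # demo only: wait so `results` is filled before printing

print(results)  # ['This is a test', 'This is a test']
```

Passing a bare string instead, as in args="This is a test", would unpack it into 14 separate one-character arguments, and the TypeError would be raised inside the new thread rather than in the route.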
With our thread ready to go, thr.start() executes it. Run this example and load the page in your browser, and you'll notice that the index route now loads instantly. But wait another five seconds and sure enough, the test message will print in your terminal.
Now, we could stop here - but in my opinion at least, it's a bit messy to actually have this threading code inside the route itself. What if you need to call this function in another route, or in a different context? Better to separate it out into its own function. You could make the threading behaviour part of slow_function() itself, or you could write a "wrapper" function - which approach you take depends a lot on what you're doing and what your needs are.
Let's create a wrapper function, and see what it looks like:
from flask import Flask
from time import sleep
from threading import Thread

app = Flask(__name__)

def slow_function(some_object):
    sleep(5)
    print(some_object)

def async_slow_function(some_object):
    # start slow_function in the background and hand back the Thread object
    thr = Thread(target=slow_function, args=[some_object])
    thr.start()
    return thr

@app.route('/')
def index():
    some_object = 'This is a test'
    async_slow_function(some_object)
    return 'hello'

if __name__ == '__main__':
    app.run()
The async_slow_function() function is doing pretty much exactly what we were doing before - it's just a bit neater now. You can call it in any route without having to rewrite your threading logic all over again. You'll notice that this function actually returns the thread - we don't need that for this example, but there are other things you might want to do with that thread later, so returning it makes the thread object available if you ever need it.
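One thing you might do with the returned thread is wait for it or check whether it is still running. The sketch below shortens the sleep so the demo finishes quickly. Joining inside a request handler would block the response again, so this is mainly useful in tests or at shutdown.

```python
from threading import Thread
from time import sleep


def slow_function(some_object):
    sleep(0.2)  # shortened from five seconds for the demo
    print(some_object)


def async_slow_function(some_object):
    thr = Thread(target=slow_function, args=[some_object])
    thr.start()
    return thr


thr = async_slow_function("This is a test")
thr.join(timeout=5)    # wait (at most five seconds) for the background work
print(thr.is_alive())  # False: slow_function has finished
```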

Django - dictionary gets empty on next endpoint

I have a program which starts multiple threads with data from the database, and I store the thread objects in a dictionary.
thread_manager.py
threads = {}
# ............

# controller for /start-threads
def start_threads(request):
    datas = Data.objects.all()
    for data in datas:
        thread = MyThread(data)
        threads[data.id] = thread
        thread.start()
    return HttpResponse("all threads are running")

def get_thread(id, request):
    return threads[id]
At this point the threads dictionary has all the threads in it, and I can access a thread object with threads[id]. Now I try to get the thread from another endpoint (I'm using Django):
views.py
from django.http import HttpResponse

import thread_manager

def get_thread(request, id):
    thread = thread_manager.get_thread(id, request)
    return HttpResponse("got thread with id {0}".format(id))
At this point the threads dictionary is empty (so of course I get a KeyError). If I run this on the local server, everything works fine, but on the live server, which runs Django under uWSGI, it doesn't work. Is this a problem with uWSGI, or am I doing something wrong? Thanks.
Your server is almost certainly running with more than one process. But threads belong to a single process; you can't access them from a different one.
You don't say what you're doing with these threads, but offline work is almost always better done with a specific system such as Celery.
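The per-process point can be seen with the standard library alone. This POSIX-only sketch (it uses os.fork) mimics what a prefork server such as uWSGI typically does with its workers: each worker gets its own copy of module-level state, so a dictionary filled in one process stays empty in another. start_in_worker is a hypothetical helper, and the local Django development server, which serves requests from a single process, would not show this behaviour.

```python
import os

# Module-level registry, like `threads = {}` in thread_manager.py
threads = {}


def start_in_worker(key):
    """Fork a child process and register `key` in the child's copy of `threads`."""
    pid = os.fork()
    if pid == 0:
        # Child process: this dict is a copy, the parent never sees the change.
        threads[key] = "running"
        os._exit(0)  # leave immediately without running the parent's cleanup
    os.waitpid(pid, 0)
    return pid


start_in_worker(42)
print(threads)  # {} in the parent process
```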

Strange blocking behavior with python multiprocessing queue put() and get()

I have written a class in python 2.7 (under linux) that uses multiple processes to manipulate a database asynchronously. I encountered a very strange blocking behaviour when using multiprocessing.Queue.put() and multiprocessing.Queue.get() which I can't explain.
Here is a simplified version of what I do:
import time
from multiprocessing import Process, Queue

class MyDB(object):
    def __init__(self):
        self.inqueue = Queue()
        p1 = Process(target=self._worker_process, kwargs={"inqueue": self.inqueue})
        p1.daemon = True
        started = False
        while not started:
            try:
                p1.start()
                started = True
            except Exception:
                time.sleep(1)

        # Sometimes I start an identical second process, but it makes no difference to my problem
        p2 = Process(target=self._worker_process, kwargs={"inqueue": self.inqueue})
        # blahblah... (same as above)

    @staticmethod
    def _worker_process(inqueue):
        while True:
            # -------------- this blocks despite data having arrived ------------
            op = inqueue.get(block=True)
            # do something with specified operation
            # --------------- problem area end --------------------
            print "if this text gets printed, the problem was solved"

    def delete_parallel(self, key, rawkey=False):
        someid = ...  # blahblah (elided in the question)
        # -------------- this section blocked when I was posting the question, but for unknown reasons it's fine now
        self.inqueue.put({"optype": "delete", "kwargs": {"key": key, "rawkey": rawkey}, "callid": someid}, block=True)
        # -------------- problem area end ----------------
        print "if you see this text, there was no blocking or block was released"
If I run the code above inside a test (in which I call delete_parallel on the MyDB object), everything works, but if I run it in the context of my entire application (importing other stuff, including pygtk), strange things happen:
For some reason self.inqueue.get blocks and never releases despite self.inqueue having the data in its buffer. When I instead call self.inqueue.get(block = False, timeout = 1) then the call finishes by raising Queue.Empty, despite the queue containing data. qsize() returns 1 (suggests that data is there) while empty() returns True (suggests that there is no data).
Now clearly there must be something somewhere else in my application that renders self.inqueue unusable by acquiring some internal semaphore. However, I don't know what to look for. Eclipse debugging becomes useless once a blocking semaphore is reached.
Edit 8 (cleaning up and summarizing my previous edits) Last time I had a similar problem, it turned out that pygtk was hijacking the global interpreter lock, but I solved it by calling gobject.threads_init() before I called anything else. Could this issue be related?
When I introduce a print "successful reception" after the get() method and execute my application in a terminal, the same behaviour happens at first. When I then terminate it by pressing CTRL+D, I suddenly get the string "successful reception" in between other messages. This looks to me like some other process/thread is terminated and releases the lock that blocks the process stuck at get().
Since the process that was stuck terminates later, I still see the message. What kind of process could externally mess with a Queue like that? self.inqueue is only accessed inside my class.
Right now it seems to come down to this queue which won't return anything despite the data being there:
the get() method seems to get stuck when it attempts to receive the actual data from some internal pipe. The last line before my debugger hangs is:
res = self._recv()
which is inside of multiprocessing.queues.get()
Tracking this internal python stuff further I find the assignments
self._recv = self._reader.recv and self._reader, self._writer = Pipe(duplex=False).
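The same one-way pipe can be exercised on its own. Connection.poll(timeout) reports whether recv() has something to read, so it is a non-blocking way to check whether data really reached the pipe before calling recv(). This is a single-process sketch for illustration only; it does not reproduce the pygtk interaction.

```python
from multiprocessing import Pipe

# The same construction multiprocessing.Queue uses internally: a one-way pipe.
reader, writer = Pipe(duplex=False)

print(reader.poll(0.1))  # False: nothing written yet, so recv() would block
writer.send({"optype": "delete"})
print(reader.poll(0.1))  # True: data is waiting in the pipe
print(reader.recv())     # {'optype': 'delete'}
```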
Edit 9
I'm currently trying to hunt down the import that causes it. My application is quite complex, with hundreds of classes and each class importing a lot of other classes, so it's a pretty painful process. I have found a first candidate class which uses 3 different MyDB instances when I track all its imports (but doesn't access MyDB.inqueue at any time, as far as I can tell). The strange thing is that it's basically just a wrapper, and the wrapped class works just fine when imported on its own. This also means that it uses MyDB without freezing. As soon as I import the wrapper (which imports that class), I have the blocking issue.
I started rewriting the wrapper by gradually reusing the old code. I'm testing each time I introduce a couple of new lines until I will hopefully see which line will cause the problem to return.
multiprocessing.Queue uses an internal feeder thread to move data into its underlying pipe. If you are using GTK, it can break these threads, so you will need to call gobject.threads_init().
It should also be noted that qsize() only returns an approximate size of the queue: because of multithreading/multiprocessing semantics, the number is not reliable and may already be out of date when you act on it.
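A worker loop that does not depend on qsize() or empty() can wait with a timeout and stop on a sentinel value. The sketch below assumes Linux and the fork start method (which is what Python 2.7's multiprocessing uses there), is written for Python 3, and uses a hypothetical STOP sentinel and result queue. It is not a fix for the pygtk interaction, only a loop that makes a stuck queue show up as repeated timeouts instead of a silent hang.

```python
import multiprocessing as mp
import queue  # multiprocessing's get() raises queue.Empty on timeout

STOP = None  # hypothetical sentinel that tells the worker to exit


def worker(inqueue, outqueue):
    while True:
        try:
            op = inqueue.get(timeout=1)  # wait at most one second per attempt
        except queue.Empty:
            continue  # nothing arrived yet; a shutdown check could go here
        if op is STOP:
            break
        outqueue.put((op["callid"], op["optype"]))


def run_demo():
    ctx = mp.get_context("fork")  # assumption: Linux, as in the question
    inqueue, outqueue = ctx.Queue(), ctx.Queue()
    p = ctx.Process(target=worker, args=(inqueue, outqueue), daemon=True)
    p.start()
    inqueue.put({"optype": "delete", "kwargs": {"key": "k1", "rawkey": False}, "callid": 1})
    result = outqueue.get(timeout=5)
    inqueue.put(STOP)
    p.join(timeout=5)
    inqueue.close()
    inqueue.join_thread()  # wait for this queue's feeder thread to flush
    return result, p.exitcode


print(run_demo())  # ((1, 'delete'), 0)
```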

Reload python flask server by function

I'm writing a python/flask application and would like to add the functionality of reloading the server.
I'm currently running the server with the following option
app.run(debug=True)
which results in the following, each time a code change happens
* Running on http://127.0.0.1:5000/
* Restarting with reloader
In a production environment however, I would rather not have debug=True set, but be able to only reload the application server whenever I need to.
I'm trying to get two things working:
if reload_needed: reload_server(), and
if a user clicks on a "Reload Server" button in the admin panel, the reload_server() function should be called.
However, although the server gets reloaded after code changes, I couldn't find a function that lets me do exactly that.
If possible I would like to use the flask/werkzeug internal capabilities. I am aware that I could achieve something like that by adding things like gunicorn/nginx/apache, etc.
I think I've had the same problem.
I had a Python/Flask application (XY.py) running on client machines. I wrote a build step (TeamCity) which deploys this Python code to the clients. Suppose XY.py is already running on a client: after deploying the new/fixed XY.py, I had to restart it to apply the changes to the running code.
The problem I had is that after using the handy restarting one-liner os.execl(sys.executable, *([sys.executable]+sys.argv)), the port used by the app was still busy/established, so after restarting I couldn't reach it.
This is how I resolved the problem:
I run my app in a separate Process and give it a queue. To make this clearer, here is some code:
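from flask import Flask

app = Flask(__name__)
some_queue = None

@app.route('/restart')
def restart():
    try:
        some_queue.put("something")
        return "Quit"
    except Exception as exc:
        return "Could not request a restart: %s" % exc

def start_flaskapp(queue):
    global some_queue  # rebind the module-level name, not a local one
    some_queue = queue
    app.run()  # pass your own run parameters here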
Add this to your main:
import subprocess
import sys
import time
from multiprocessing import Process, Queue

if __name__ == "__main__":
    q = Queue()
    p = Process(target=start_flaskapp, args=[q])
    p.start()
    while True:  # watch the queue: sleep while it is empty, break once /restart puts something
        if q.empty():
            time.sleep(1)
        else:
            break
    p.terminate()  # terminate the Flask app, then restart the app in a subprocess
    args = [sys.executable] + [sys.argv[0]]
    subprocess.call(args)
I hope this is clear and short enough, and that it helps!
How about the following in your Python code to kill the server:
import os

@app.route('/quit')
def _quit():
    os._exit(0)  # exits immediately; the client gets no response
When the process exits, the while loop in the following script starts it again.
app_run.sh:
#!/bin/bash
while true
do
    hypercorn app_async:app -b 0.0.0.0:5000
    sleep 1
done
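The same restart loop can be written in Python if a shell script is not an option. In this sketch, supervise, max_runs and delay are hypothetical names, and the demo uses a stand-in "server" that exits at once. Because subprocess.call() waits for the child to finish, the old server process has fully exited before the next one is started.

```python
import subprocess
import sys
import time


def supervise(cmd, max_runs=None, delay=1.0):
    """Start cmd again every time it exits, like the while loop in app_run.sh.

    max_runs=None loops forever; a number stops after that many runs.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        # call() blocks until the server process has exited.
        returncode = subprocess.call(cmd)
        runs += 1
        print("server exited with code", returncode)
        time.sleep(delay)
    return runs


# Demo with a stand-in "server" that exits immediately.
print(supervise([sys.executable, "-c", "pass"], max_runs=3, delay=0))  # 3
```

For a real app, the command would be something like [sys.executable, "app.py"] (the filename is hypothetical), and the /quit route above is what makes the current run exit.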

How can I run django with mod_wsgi and use the multiprocess module?

The workflow I am dealing with (user-wise) looks like this:
User submits information and files with a form
Form is saved
Additional post-save processing is done
This is fine, but the post-save processing takes quite a while, so I'm looking to do it in the background and issue an HttpResponseRedirect to a message informing the user that processing is happening and to please return later. Unfortunately, this doesn't seem to be working; what I've got at the moment is this:
if form.is_valid():
    p = multiprocessing.Process(target=form.save)
    p.start()
    return HttpResponseRedirect('/running')
But the error that I get back is this:
IOError at /content/script/new/
sys.stdout access restricted by mod_wsgi
...
/usr/lib/python2.6/multiprocessing/forking.py in __init__
    # We define a Popen class similar to the one from subprocess, but
    # whose constructor takes a process object as its argument.
    #
    class Popen(object):
        def __init__(self, process_obj):
>>>>            sys.stdout.flush() ...
            sys.stderr.flush()
            self.returncode = None
            self.pid = os.fork()
            if self.pid == 0:
                if 'random' in sys.modules:
Local vars:
process_obj = <Process(Process-1, initial)>
self = <multiprocessing.forking.Popen object at 0xb8a06dec>
Does python have a more magical way to do this? Does Django? If not, how can I go ahead and use multiprocessing?
Use Celery.
The mod_wsgi environment can be multi-threaded, depending on your configuration. You do not want to interfere with how Apache, mod_wsgi, and Django are already using threads and processes to manage web server throughput.
You have to assume that your Django operation is a single thread and cannot do anything except respond to Apache as quickly as possible.
The error is because you are using an old mod_wsgi version and also because you haven't read:
http://code.google.com/p/modwsgi/wiki/ApplicationIssues#Writing_To_Standard_Output
http://blog.dscpl.com.au/2009/04/wsgi-and-printing-to-standard-output.html
As someone else said, you are better off using something like Celery anyway and trigger such stuff outside of Apache processes.
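Celery is the recommended tool here. The sketch below only illustrates the underlying idea with the standard library: the view records a job and returns immediately, and a separate worker process outside Apache does the slow post-save processing. The jobs table, enqueue_job, run_pending and the form_id payload are hypothetical and are not Celery's API. An in-memory SQLite database is used only to keep the demo self-contained; a real setup would need a database file shared by the web and worker processes.

```python
import json
import sqlite3


def init(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY, payload TEXT, status TEXT DEFAULT 'pending')"
    )


def enqueue_job(conn, payload):
    """Called from the view: a quick insert, then redirect to /running."""
    cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (json.dumps(payload),))
    conn.commit()
    return cur.lastrowid


def run_pending(conn, handler):
    """Called by a separate worker process, e.g. on a timer."""
    rows = conn.execute("SELECT id, payload FROM jobs WHERE status = 'pending'").fetchall()
    for job_id, payload in rows:
        handler(json.loads(payload))  # the slow post-save processing
        conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
        conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
init(conn)
enqueue_job(conn, {"form_id": 7})
processed = []
print(run_pending(conn, processed.append))  # 1
print(processed)  # [{'form_id': 7}]
```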
