Context
I'm trying to write a Python Flask server that answers a simple request. The data to be returned in the response is queried from a backend service. Since this query may take some time to complete, I don't want to do it synchronously; I want it to run periodically, in a background thread, where I can explicitly control the frequency. It should update a data structure that is shared with the Flask view function, so that GET requests get their answer from the shared data.
I'm including two code samples below. In both, cnt is a global variable, starting at 0 and increased inside a separate thread. The index.html file displays the value of cnt:
<h1>cnt = {{ cnt }}</h1>
The issue I'm facing
When the view function is inside the same module, it works: every time I refresh the page with F5, the value of cnt has changed, I can see it increasing.
But when I put the view function in a separate routes module (which I import at the end of my hello.py file), it no longer works: I can see in the server traces that cnt is being increased by the background thread, but when I refresh the page I always see
cnt = 1
It's as if I now have two different copies of the cnt variable, even though the variable has been imported into the routes module.
Note
I've found countless questions on SO on this topic, but none that really addresses this specific concern. Also, I'm perfectly aware that in my examples below there is no lock protecting the shared data (which is a simple cnt variable) and I'm not handling thread termination. This is deliberately ignored for now, in order to keep the sample code minimal.
Here are both versions of the code.
Single module, it works
Here's the main hello.py file, with everything inside the same module:
from flask import Flask, render_template
import threading as th
from time import sleep

cnt = 0
app = Flask(__name__)

# Run in background thread
def do_stuff():
    global cnt
    while True:
        cnt += 1
        print(f'do_stuff: cnt={cnt}')
        sleep(1)

# Create the separate thread
t = th.Thread(target=do_stuff)
t.start()

@app.route("/")
def hello():
    return render_template('index.html', cnt=cnt)
The variable is indeed shared between the background thread and the view function, as expected.
Separate modules, it no longer works
Here's the main hello.py module, without the view function:
from flask import Flask
import threading as th
from time import sleep

cnt = 0
app = Flask(__name__)

# Run in background thread
def do_stuff():
    global cnt
    while True:
        cnt += 1
        print(f'do_stuff: cnt={cnt}')
        sleep(1)

# Create the separate thread
t = th.Thread(target=do_stuff)
t.start()

import routes
import routes
And here is the separate routes.py file (see import at the end of hello.py above):
# routes.py
from hello import app, cnt
from flask import render_template

@app.route("/")
def hello():
    return render_template('index.html', cnt=cnt)
With this code, the web page always displays cnt = 1, as if the two modules had two distinct instances of the cnt variable.
I feel like I'm missing some basic insight into python modules, or threads, or their interaction. Any help will be appreciated, and my apologies for such a long question.
Globals are not shared between modules in the way you expect.

When you say from moduleA import count, you have imported an immutable number. The import binds the name count in your module to whatever object it refers to at import time. If something in moduleA later rebinds count (for example with count += 1), that rebinding does not update your module's name; your copy of the name still points at the old object, effectively as if your imported count had been overwritten with a separate variable.

If count were a mutable object, in-place changes would indeed be reflected everywhere, so this would work if it were a list, a dictionary, or a class with fields, etc.

Instead, you can import the module itself, which is a mutable object:

import moduleA

Now when moduleA rebinds count, you read the current value through the module attribute, and you can see the change elsewhere by printing moduleA.count.
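As a minimal sketch of the difference, here is the two-module situation simulated in one file (the module name counter is hypothetical, built with types.ModuleType so the sketch is self-contained):

```python
import types

# Simulate a hypothetical module "counter" containing:
#   cnt = 0
#   def bump(): global cnt; cnt += 1
counter = types.ModuleType('counter')
counter.cnt = 0

def bump():
    counter.cnt += 1  # rebinds the module attribute

counter.bump = bump

# "from counter import cnt" copies the binding that exists right now...
cnt = counter.cnt

counter.bump()

print(cnt)          # 0 -- the from-import snapshot never changes
print(counter.cnt)  # 1 -- reading through the module sees the rebinding
```

This is exactly why the separate-module version of the Flask app always shows the value that cnt had at import time.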
Related
I have multiple Python files in different folders that work together to make my program function. They consist of a main.py file that creates new threads for each file and then starts them with the necessary parameters. This works great while the parameters are static, but if a variable changes in main.py it doesn't get changed in the other files. I also can't import the main.py file into otherfile.py to get the new variable since it is in a previous dir.
I have created an example below. What should happen is that the main.py file creates a new thread and calls otherfile.py with set params. After 5 seconds, the variable in main.py changes and so should the var in otherfile (so it starts printing the number 5 instead of 10), but I haven't found a solution to update them in otherfile.py
The folder structure is as follows:
|-main.py
|-other
|  |-otherfile.py
Here is the code in both files:
main.py
from time import sleep
from threading import Thread

var = 10

def newthread():
    from other.otherfile import loop
    nt = Thread(target=loop(var))
    nt.daemon = True
    nt.start()

newthread()
sleep(5)
var = 5  # change the var, otherfile.py should start printing it now (doesn't)
otherfile.py
from time import sleep

def loop(var):
    while True:
        sleep(1)
        print(var)
In Python, there are two types of objects:
Immutable objects can’t be changed.
Mutable objects can be changed.
An int is immutable, so you must use a mutable container such as a list or dict:
from time import sleep
from threading import Thread

var = [10]

def newthread():
    from other.otherfile import loop
    nt = Thread(target=loop, args=(var,), daemon=True)
    nt.start()

newthread()
sleep(5)
var[0] = 5
This happens because of how objects are passed into functions in Python. You'll hear that everything is passed by reference in Python, but since integers are immutable, when you edit the value of var, you're actually creating a new object and your thread still holds a reference to the integer with a value of 10.
To get around this, I wrote a simple wrapper class for an integer:
class IntegerHolder():
    def __init__(self, n):
        self.value = n

    def set_value(self, n):
        self.value = n

    def get_value(self):
        return self.value
Then, instead of var = 10, I did i = IntegerHolder(10), and after the sleep(5) call, I simply did i.set_value(5), which updates the wrapper object. The thread still has the same reference to the IntegerHolder object i, and when i.get_value() is called in the thread, it will return 5, as required.
You can also do this with a Python list, since lists are objects — it's just that this implementation makes it clearer what's going on. You'd just do var = [10] and do var[0] = 5, which would work since your thread should still keep a reference to the same list object as the main thread.
Two more errors:
Instead of Thread(target=loop(var)), you need to do Thread(target=loop, args=(i,)). This is because target is supposed to be a callable object, which is basically a function. Writing loop(var) calls the function immediately, in the main thread; since loop never returns, the Thread constructor never even runs (it would have received loop's return value as its target), so the thread never actually gets created. You can verify this with your favorite Python debugger, or print statements.
Setting nt.daemon = True allows main.py to exit before the thread finishes. This means that as soon as i.set_value(5) is called, the main program terminates and your integer wrapper object ceases to exist. This makes your thread very confused when it tries to access the wrapper object, and by very confused, I mean it throws an exception and dies because threads do that. You can verify this by catching the exit code of the thread. Deleting that line fixes things (nt.daemon = False by default), but it's probably safer to do a nt.join() call in the main thread, which waits for a thread to finish execution.
And one warning, because programming wouldn't be complete without warnings:
Whenever different threads try to access a value, if AT LEAST ONE thread is modifying the value, this can cause a race condition. This means that all accesses at that point should be wrapped in a lock/mutex to prevent this. The Python (3.7.4) docs have more info about this.
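As a sketch of that warning, wrapping every read-modify-write of the shared value in a threading.Lock prevents lost updates (the counter and thread count here are illustrative):

```python
from threading import Thread, Lock

counter = 0
lock = Lock()

def worker():
    global counter
    for _ in range(100_000):
        with lock:        # only one thread may read-modify-write at a time
            counter += 1

threads = [Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000, reliably -- without the lock, updates can be lost
```

Without the `with lock:` line, two threads can read the same old value and both write back value+1, silently dropping an increment.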
Let me know if you have any more questions!
I'm trying to understand multiprocessing. My actual application is to display log messages in real time on a pyqt5 GUI, but I ran into some problems using queues so I made a simple program to test it out.
The issue I'm seeing is that I am unable to add elements to a Queue across python modules and across processes. Here is my code and my output, along with the expected output.
Config file for globals:
# cfg.py
# Using a config file to import my globals across modules
#import queue
import multiprocessing
# q = queue.Queue()
q = multiprocessing.Queue()
Main module:
# mod1.py
import cfg
import mod2
import multiprocessing

def testq():
    global q
    print("q has {} elements".format(cfg.q.qsize()))

if __name__ == '__main__':
    testq()
    p = multiprocessing.Process(target=mod2.add_to_q)
    p.start()
    p.join()
    testq()
    mod2.pullfromq()
    testq()
Secondary module:
# mod2.py
import cfg

def add_to_q():
    cfg.q.put("Hello")
    cfg.q.put("World!")
    print("qsize in add_to_q is {}".format(cfg.q.qsize()))

def pullfromq():
    if not cfg.q.empty():
        msg = cfg.q.get()
        print(msg)
Here is the output that I actually get from this:
q has 0 elements
qsize in add_to_q is 2
q has 0 elements
q has 0 elements
vs the output that I would expect to get:
q has 0 elements
qsize in add_to_q is 2
q has 2 elements
Hello
q has 1 elements
So far I have tried using both multiprocessing.Queue and queue.Queue. I have also tested this with and without Process.join().
If I run the same program without using multiprocessing, I get the expected output shown above.
What am I doing wrong here?
EDIT:
Process.run() gives me the expected output, but it also blocks the main process while it is running, which is not what I want to do.
My understanding is that Process.run() runs the created process in the context of the calling process (in my case the main process), meaning that it is no different from the main process calling the same function.
I still don't understand why my queue behavior isn't working as expected
I've discovered the root of the issue and I'll document it here for future searches, but I'd still like to know if there's a standard solution to creating a global queue between modules so I'll accept any other answers/comments.
I found the problem when I added the following to my cfg.py file.
print("cfg.py is running in process {}".format(multiprocessing.current_process()))
This gave me the following output:
cfg.py is running in process <_MainProcess(MainProcess, started)>
cfg.py is running in process <_MainProcess(Process-1, started)>
cfg.py is running in process <_MainProcess(Process-2, started)>
It would appear that I'm creating separate Queue objects for each process that I create, which would certainly explain why they aren't interacting as expected.
This question has a comment stating that
a shared queue needs to originate from the master process, which is then passed to all of its subprocesses.
All this being said, I'd still like to know if there is an effective way to share a global queue between modules without having to pass it between methods.
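Following that comment, the standard pattern is to create the Queue once in the parent process and pass it explicitly to each worker via args, rather than creating it at module import time. A minimal sketch (function names are illustrative):

```python
import multiprocessing

def add_to_q(q):
    # The worker uses the queue object it was handed, which is
    # connected to the parent's queue under the hood.
    q.put("Hello")
    q.put("World!")

def run():
    q = multiprocessing.Queue()  # created once, in the parent
    p = multiprocessing.Process(target=add_to_q, args=(q,))
    p.start()
    first = q.get()    # q.get() blocks until the child has put an item
    second = q.get()
    p.join()
    return first, second

if __name__ == '__main__':
    print(run())
```

Because the queue is handed to the child through the Process constructor, both processes share the same underlying pipe, instead of each process building its own queue when it re-imports cfg.py.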
I'm making a website, and on startup, I want to launch another process that starts loading an embedding model because this takes a long time and will be needed by the user eventually. This is my code:
from flask import Flask, render_template
from flask_socketio import SocketIO, send
import bot
import sys
sys.path = sys.path + ['filepath']
from BigLearnPy import BigLearn
from multiprocessing import Process

app = Flask(__name__)
app.config['SECRET_KEY'] = 'password'
socketio = SocketIO(app)

def loadModel():
    BigLearn.LoadEmbeddingEngine()
    emb = BigLearn.EmbeddingEngine('filepath')

@app.route('/')
def index():
    return render_template('index.html')

@socketio.on('message')
def handleMessage(msg):
    send(msg, broadcast=True)
    p1.join()
    send('0' + bot.getResponse(msg, emb), broadcast=True)
    send('2' + bot.getKB(msg, emb), broadcast=True)

if __name__ == '__main__':
    emb = None
    p1 = Process(target=loadModel)
    p1.start()
    socketio.run(app)
I start the process to load the model right before I start running the app (penultimate line). I join the process in the handleMessage function right before I need the value of emb. So that I can access emb outside of the loadModel function, I declared it right before creating the process. However, when I run the code, I get an error saying emb is a NoneType object. This seems like a scoping issue, but no matter where I say emb = None, I either get that emb is None or undefined when I try to use it. How can I load the model in a different process and then access the model? Thanks.
You cannot load the model in a different process and then read it back this way. That is not how multiprocessing works.

At the fork, each process gets its own copy of memory (conceptually; in practice there are tricks to avoid copying everything). Any change to a variable after the fork is only visible in the process that made it, not in its parent. Your loadModel assignment to emb therefore happens in the child and never reaches the Flask process.

If you want to share memory you need to use threads, not processes. But mutating memory that is shared between threads in a safe way is fairly complicated. In any case it might not help you that much, because Python has a Global Interpreter Lock: only one Python thread can run at a time.

If you want to experiment with threads or processes, I would recommend starting with simpler examples.

As for your problem, I would start by trying to optimize the loading code so it is faster. Without knowing what it does, it is hard to make more specific suggestions.
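To illustrate the thread-based alternative: the loader assigns to a module-level variable, and the first handler that needs the model joins the thread. The string here is a stand-in for the real embedding engine, and the sleep stands in for the slow loading step:

```python
import threading
import time

emb = None  # shared between the loader thread and request handlers

def load_model():
    global emb
    time.sleep(0.1)           # stand-in for the slow loading work
    emb = "embedding-engine"  # stand-in for BigLearn.EmbeddingEngine(...)

loader = threading.Thread(target=load_model)
loader.start()

# ... the app would start serving requests here ...

loader.join()  # a handler that needs the model waits for loading to finish
print(emb)
```

Because threads share the process's memory, the assignment made inside load_model is visible after join(), which is exactly what the fork-based version cannot provide.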
My Python Flask app runs using nohup, i.e. it is always live. I see that it creates a thread every time a user submits from the page. That is because app.run is called with threaded=True. But my problem is that even after the processing is over, the thread doesn't seem to be closed. I'm checking this with the ps -eLf | grep userid command, where I see many threads still active long after the code execution is over, and more get added on every submit. All threads are removed only when the app itself is restarted.
What is the criterion for a thread to close without restarting the app?
Many posts like these suggest gc.collect, del object, etc.
I have many user-defined classes getting instantiated on submit, and one object refers to another. So:
Is it because the memory is not getting released?
Should I use gc.collect or del the objects?
Python should clear these objects once the variables go out of scope. Is that correct?
app = Flask(__name__)

@app.route('/submit', methods=['GET', 'POST'])
def submit():
    # obj1 = class1()
    # obj2 = class2(obj1)
    # obj3 = class3(obj1)
    # refer objects
    # process data
    # done
    ...

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=4000, threaded=True, debug=False)
It looks like the problem was with a paramiko object not getting closed. Once the SFTPClient or the SSHClient is opened, it has to be closed explicitly. I had assumed that it would get closed along with my class object (where the paramiko object is defined), but it doesn't.
So at the end of my process I call the lines below. Now the threads seem to get closed properly:
if objs.ssh:
    objs.ssh.close()
if objs.sftp:
    objs.t.close()
    objs.sftp.close()
del objs
gc.collect()
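A more defensive variant of this cleanup (the connection class below is a stand-in for the real SSH/SFTP clients, not paramiko itself) is to put the close calls in a finally block, so they run even when the processing step raises:

```python
class FakeConn:
    """Stand-in for an SSH/SFTP client that must be closed explicitly."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

conn = FakeConn()
try:
    pass  # ... process data over the connection ...
finally:
    conn.close()  # always runs, whether processing succeeded or raised

print(conn.closed)
```

With the explicit del/gc.collect approach, an exception thrown mid-processing would skip the close calls entirely; try/finally guarantees the cleanup either way.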
I'm currently creating a web app using Python Flask and I've run into a road block and I'm not sure if I'm even thinking about it correctly.
So my website's homepage is just a simple landing page with a text input that is required to perform the website's function. What I am trying to accomplish is for the web app to perform two things after the text is input. First, the server takes the username input and performs a function that doesn't return anything to the user but creates a bunch of data that is logged into an sqlite database and used later on in the process. Then, the server returns the web page for a survey that has to be taken after the username is input. However, the function that the server performs can take upwards of 2 minutes depending on the user. The way I currently have it coded, the server performs the function and then, once it has finished, returns the web page, so the user is stuck at a loading screen for up to 2 minutes.
@app.route("/survey")
def main(raw_user):
    raw_user = request.args.get("SteamID")
    games = createGameDict(user_obj)  # <----- the function
    tag_lst = get_tags(games)
    return render_template("survey_page.html")
Since the survey doesn't depend on the user input, instead of having the user sitting at a loading screen, I would like them to be able to start the survey while the functions works in the background, is that possible, and how would I do that?
Update: I've had to solve this problem a number of times in Flask, so I wrote a small Flask extension called Flask-Executor to do it for me. It's a wrapper for concurrent.futures that provides a few handy features, and is my preferred way of handling background tasks that don't require distribution in Flask.
For more complex background tasks, something like celery is your best bet. For simpler use cases however, what you want is the threading module.
Consider the following example:
from flask import Flask
from time import sleep

app = Flask(__name__)

def slow_function(some_object):
    sleep(5)
    print(some_object)

@app.route('/')
def index():
    some_object = 'This is a test'
    slow_function(some_object)
    return 'hello'

if __name__ == '__main__':
    app.run()
Here, we create a function, slow_function() that sleeps for five seconds before returning. When we call it in our route function it blocks the page load. Run the example and hit http://127.0.0.1:5000 in your browser, and you'll see the page wait five seconds before loading, after which the test message is printed in your terminal.
What we want to do is to put slow_function() on a different thread. With just a couple of additional lines of code, we can use the threading module to separate out the execution of this function onto a different thread:
from flask import Flask
from time import sleep
from threading import Thread

app = Flask(__name__)

def slow_function(some_object):
    sleep(5)
    print(some_object)

@app.route('/')
def index():
    some_object = 'This is a test'
    thr = Thread(target=slow_function, args=[some_object])
    thr.start()
    return 'hello'

if __name__ == '__main__':
    app.run()
What we're doing here is simple. We're creating a new instance of Thread and passing it two things: the target, which is the function we want to run, and args, the argument(s) to be passed to the target function. Notice that there are no parentheses on slow_function, because we're not running it - functions are objects, so we're passing the function itself to Thread. As for args, it expects an iterable such as a list or tuple. Even if you only have one argument, wrap it in a list so args gets what it's expecting.
With our thread ready to go, thr.start() executes it. Run this example in your browser, and you'll notice that the index route now loads instantly. But wait another five seconds and sure enough, the test message will print in your terminal.
Now, we could stop here - but in my opinion at least, it's a bit messy to actually have this threading code inside the route itself. What if you need to call this function in another route, or a different context? Better to separate it out into its own function. You could make threading behaviour a part of slow function itself, or you could make a "wrapper" function - which approach you take depends a lot on what you're doing and what your needs are.
Let's create a wrapper function, and see what it looks like:
from flask import Flask
from time import sleep
from threading import Thread

app = Flask(__name__)

def slow_function(some_object):
    sleep(5)
    print(some_object)

def async_slow_function(some_object):
    thr = Thread(target=slow_function, args=[some_object])
    thr.start()
    return thr

@app.route('/')
def index():
    some_object = 'This is a test'
    async_slow_function(some_object)
    return 'hello'

if __name__ == '__main__':
    app.run()
The async_slow_function() function is doing pretty much exactly what we were doing before - it's just a bit neater now. You can call it in any route without having to rewrite your threading logic all over again. You'll notice that this function actually returns the thread - we don't need that for this example, but there are other things you might want to do with that thread later, so returning it makes the thread object available if you ever need it.