Using multiprocessing with gunicorn in a Flask application - python

I made a basic Flask application running under Gunicorn with the gevent worker class. The issue I ran into was as follows. Given a basic Flask app like this:
from multiprocessing import Pool
import Queue  # Python 2; on Python 3 this is "import queue"
import random
from threading import Thread
import time

from flask import Flask

app = Flask(__name__)

def f(x):
    return random.randint(1, 6)

def thread_random(queue):
    time.sleep(random.random())
    queue.put(random.randint(1, 6))

def thread_roll():
    q = Queue.Queue()
    threads = []
    for _ in range(3):
        t = Thread(target=thread_random, args=(q,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    dice_roll = sum([q.get() for _ in range(3)])
    return dice_roll

@app.route('/')
def hello_world():
    # technique 1: a fresh process pool per request
    pool = Pool(processes=4)
    return 'roll is: %s \n' % sum(pool.map(f, range(3)))
    # technique 2: plain threads (unreachable while technique 1 is in place)
    return 'roll is: %s \n' % thread_roll()

if __name__ == '__main__':
    app.run(debug=True)
I tried both techniques. Technique 1 breaks Gunicorn when I run it like this:
sudo gunicorn -b 0.0.0.0:8000 app:app --worker-class gevent
but technique 2 doesn't. I can see this is because technique 1 relies on multiprocessing and technique 2 relies on threads, but I can't figure out why the gevent worker class doesn't allow a pool.

If you're using gevent, you should try monkey patching with gevent.monkey.patch_all():
http://www.gevent.org/gevent.monkey.html
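A minimal sketch of where the patch goes (my assumption about the file layout, not the asker's actual code): patch_all() must run before anything else imports the standard library, so it sits at the very top of the module.

from gevent import monkey
monkey.patch_all()  # must run before other imports so sockets/threads become cooperative

from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'patched hello\n'

Note that patch_all() makes threading and socket operations cooperative; it does not make multiprocessing.Pool gevent-friendly, which is likely why technique 2 (threads) is the safer fit under a gevent worker.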

Related

Multiple Threads or Workers in Python? - Want to increase Performance

Currently I have a little Python script running that makes some web requests.
I am absolutely new to Python, so I took a bare-bones script I found, and it uses multiple threads (see the end of the question for the full script):
if __name__ == '__main__':
    threads = []
    for i in range(THREAD_COUNT):
        t = Thread(target=callback)
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
However, this script feels kind of slow, as if it performs the requests one after another rather than at the same time.
So I took another approach and tried to learn more about workers and multithreading.
It seems "workers" are the way to go, instead of plain threads?
So I took the following from a tutorial and modified it a little:
import logging
import os
from queue import Queue
from threading import Thread
from time import time

from multi import callback

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

class DownloadWorker(Thread):
    def __init__(self, queue):
        Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            # callback() is my function in multi.py (a simple web-request function)
            try:
                callback()
            finally:
                self.queue.task_done()

if __name__ == '__main__':
    ts = time()
    queue = Queue()
    for x in range(8):
        worker = DownloadWorker(queue)
        worker.daemon = True
        worker.start()
    # I put this here because I want to run my "program" (almost) infinitely many times
    for i in range(500000):
        logger.info('Queueing')
        queue.put(i)
    queue.join()
    logging.info('Took %s', time() - ts)
I am not sure whether this is the correct approach. From my understanding, I created 8 workers, and with queue.put(i) I give them jobs (500,000 in this case?), passing them the current counter (which does nothing; it seems to be required though?).
After it is done queueing, the function is executed, as I can see in my console.
However, it still feels as slow as before.
(My original request file)
from threading import Thread
import requests
import json
import string
import urllib3
import threading

THREAD_COUNT = 5
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

def callback():
    counter = 0
    try:
        while True:
            print("Prozess " + str(threading.get_ident()) + " " + str(counter))
            counter = counter + 1
            response = requests.post('ourAPIHere', verify=False, json={"pingme": "hello"})
            json_data = json.loads(response.text)
            if json_data["status"] == "error":
                print("Server Error? Check logs!")
            if json_data["status"] == "success":
                print("OK")
    except KeyboardInterrupt:
        return

if __name__ == '__main__':
    threads = []
    for i in range(THREAD_COUNT):
        t = Thread(target=callback)
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
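For contrast, here is a hedged sketch of the usual queue-worker pattern (not the asker's code; make_request() is a hypothetical stand-in for the real callback): each worker pulls exactly one job per loop iteration, so the 500,000 jobs are shared among the 8 workers instead of each worker running its own endless loop, which is why the version above feels just as slow.

from queue import Queue
from threading import Thread
import requests

def make_request(i):
    # hypothetical stand-in for the real request in multi.py
    requests.post('ourAPIHere', verify=False, json={"pingme": "hello"})

class DownloadWorker(Thread):
    def __init__(self, queue):
        Thread.__init__(self)
        self.queue = queue
        self.daemon = True

    def run(self):
        while True:
            i = self.queue.get()        # take exactly one job
            try:
                make_request(i)
            finally:
                self.queue.task_done()  # matches exactly one get()

if __name__ == '__main__':
    queue = Queue()
    for _ in range(8):
        DownloadWorker(queue).start()
    for i in range(500000):
        queue.put(i)
    queue.join()  # block until every queued job is marked done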

How to execute code after the Flask `app.run()` statement (run a Flask app and a function in parallel, execute code while the Flask server is running)

I recently added Flask to a sample infinite loop that randomly prints words. However, after adding app.run(host='0.0.0.0'), the code after that line won't execute until I stop Flask.
if __name__ == '__main__':
    app.run(host='0.0.0.0')
    while True:  # won't run until I press stop once (stop Flask) when running directly from the IDE
        ...
What I want is to be able to run the while loop while the Flask app is running.
Is there any way to solve this?
You can use before_first_request instead. Functions decorated with @app.before_first_request will run once before the first request to this instance of the application. (Note: before_first_request was deprecated in Flask 2.2 and removed in 2.3, so this applies to older Flask versions.)
The code looks like this:
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    print("index is running!")
    return "Hello world"

@app.before_first_request
def before_first_request_func():
    print("This function will run once")

if __name__ == "__main__":
    app.run(host="0.0.0.0")
The code in before_first_request_func will be executed once, before the first request to the server. Therefore, after starting the Flask instance, one can simulate the first request to the server using curl or similar.
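For example, assuming Flask's default port 5000:

curl http://localhost:5000/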
You can do what you want by using multithreading:
from flask import Flask
import threading
import time

app = Flask(__name__)

@app.route("/")
def hello_world():
    return "Hello, World!"

def run_app():
    app.run(debug=False, threaded=True)

def while_function():
    i = 0
    while i < 20:
        time.sleep(1)
        print(i)
        i += 1

if __name__ == "__main__":
    first_thread = threading.Thread(target=run_app)
    second_thread = threading.Thread(target=while_function)
    first_thread.start()
    second_thread.start()
Output:
* Serving Flask app "app"
* Environment: production
* Debug mode: off
* Running on [...] (Press CTRL+C to quit)
0
1
2
3
4
5
6
7
8
[...]
The idea is simple:
create two functions, one to run the app and another to execute the while loop,
and then run each function in a separate thread, so they execute in parallel (one refinement is sketched below).
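One hedged refinement (my addition, not part of the answer above): marking the Flask thread as a daemon lets the whole process exit once the finite loop is done, instead of hanging on the still-running server thread.

import threading

# Variant of the __main__ block above; assumes run_app and while_function as defined there.
first_thread = threading.Thread(target=run_app, daemon=True)  # dies with the main process
second_thread = threading.Thread(target=while_function)
first_thread.start()
second_thread.start()
second_thread.join()  # wait only for the finite while loop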
You can do this with multiprocessing instead of multithreading too.
The main difference is that the functions will run in separate processes, potentially on different CPUs, and in separate memory spaces.
from flask import Flask
from multiprocessing import Process
import time

# Helper function to easily parallelize multiple functions
def parallelize_functions(*functions):
    processes = []
    for function in functions:
        p = Process(target=function)
        p.start()
        processes.append(p)
    for p in processes:
        p.join()

# The function that will run in parallel with the Flask app
def while_function():
    i = 0
    while i < 20:
        time.sleep(1)
        print(i)
        i += 1

app = Flask(__name__)

@app.route("/")
def hello_world():
    return "Hello, World!"

def run_app():
    app.run(debug=False)

if __name__ == '__main__':
    parallelize_functions(while_function, run_app)
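One thing to note about this design (my observation, not from the answer): run_app() never returns, so parallelize_functions will join the while_function process after about 20 seconds and then block forever on the server process. That is fine for a long-lived server, but it means the helper only returns when every function, including the server, exits.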
If you want to use before_first_request as proposed by @Triet Doan: you will have to pass the while function as an argument to before_first_request, like this:
from flask import Flask
import time

app = Flask(__name__)

def while_function():
    i = 0
    while i < 5:
        time.sleep(1)
        print(i)
        i += 1

# registered callbacks are invoked with no arguments
app.before_first_request(while_function)

@app.route("/")
def index():
    print("index is running!")
    return "Hello world"

if __name__ == "__main__":
    app.run()
In this setup, the while function will be executed before the first request is served; only when it finishes will your app respond. But I don't think that was what you were asking for?

Python - how can I run separate module (not function) as a separate process?

tl;dr: How can I programmatically execute a Python module (not function) as a separate process from a different Python module?
On my development laptop, I have a 'server' module containing a Bottle server. In this module, the `__name__ == '__main__'` clause starts the Bottle server.
@bt_app.post("/")
def server_post():
    << Generate response to 'http://server.com/' >>

if __name__ == '__main__':
    serve(bt_app, host='localhost', port=8080)
I also have a 'test_server' module containing pytest tests. In this module, the `__name__ == '__main__'` clause runs pytest and displays the results.
def test_something():
    _rtn = some_server_function()
    assert _rtn == desired

if __name__ == '__main__':
    _rtn = pytest.main([__file__])
    print("Pytest returned: ", _rtn)
Currently, I manually run the server module (starting the web server on localhost), then I manually start the pytest module, which issues HTTP requests to the running server and checks the responses.
Sometimes I forget to start the server module. No big deal but annoying. So I'd like to know if I can programmatically start the server module as a separate process from the pytest module (just as I'm doing manually now) so I don't forget to start it manually.
Thanks
Here is my test-case directory tree:
test
├── server.py
└── test_server.py
server.py starts a web server with Flask:
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

if __name__ == '__main__':
    app.run()
test_server.py makes a request to test it:
import sys
import requests
import subprocess
import time

p = None  # server process

def start_server():
    global p
    sys.path.append('/tmp/test')
    # Here you may want to check whether the server is already
    # started; if it is, skip this function.
    kwargs = {}  # here you can pass any other args needed
    p = subprocess.Popen(['python', 'server.py'], **kwargs)

def test_function():
    response = requests.get('http://localhost:5000/')
    print('This is the response body: ', response.text)

if __name__ == '__main__':
    start_server()
    time.sleep(3)  # wait for the server to start
    test_function()
    p.kill()
Then you can run python test_server.py to start the server and execute the test cases.
PS: subprocess.run() needs Python 3.5+; on older versions, use Popen() instead.
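As a hedged variation (my own sketch, not from the answers above), the same start/stop logic fits naturally into a pytest fixture, so the server starts before the tests run and is killed afterwards even if a test fails:

import subprocess
import time

import pytest
import requests

@pytest.fixture(scope="session")
def server():
    # Assumes server.py sits next to this file and listens on port 5000.
    p = subprocess.Popen(['python', 'server.py'])
    time.sleep(3)  # crude wait for startup
    yield p
    p.kill()

def test_root(server):
    response = requests.get('http://localhost:5000/')
    assert response.status_code == 200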
import logging
import threading
import time

def thread_function(name):
    logging.info("Thread %s: starting", name)
    time.sleep(2)
    logging.info("Thread %s: finishing", name)

if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO,
                        datefmt="%H:%M:%S")
    threads = list()
    for index in range(3):
        logging.info("Main : create and start thread %d.", index)
        x = threading.Thread(target=thread_function, args=(index,))
        threads.append(x)
        x.start()
    for index, thread in enumerate(threads):
        logging.info("Main : before joining thread %d.", index)
        thread.join()
        logging.info("Main : thread %d done", index)
With threading you can run multiple tasks at once (threads within one process, not separate processes)!
Wim basically answered this question. I looked into the subprocess module, and while reading up on it I stumbled on the os.system function.
In short, subprocess is a highly flexible and featureful module for running a program. os.system, on the other hand, is much simpler, with far fewer capabilities.
Just running a Python module is simple, so I settled on os.system.
import os

# "python -m" expects a module name, not a file path, so the file is
# run directly here. Note that os.system() blocks until the command exits.
server_path = "python ../src/server.py"
os.system(server_path)
Wim, thanks for the pointer. Had it been a full-fledged answer I would have upvoted it. Redo it as a full-fledged answer and I'll do so.
Async to the rescue.
import gevent
from gevent import monkey, sleep, spawn
monkey.patch_all()

from gevent.pywsgi import WSGIServer

@bt_app.post("/")
def server_post():
    << Generate response to 'http://server.com/' >>

def test_something():
    _rtn = some_server_function()
    assert _rtn == desired
    print("Pytest returned: ", _rtn)
    sleep(0)

if __name__ == '__main__':
    spawn(test_something)  # runs async
    server = WSGIServer(("0.0.0.0", 8080), bt_app)
    server.serve_forever()

gunicorn with eventlet runs threads in sequential manner

from flask import Flask
import threading

app = Flask(__name__)

class SThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        for i in range(1, 1000):
            print(0)

t = SThread()
t.start()
for i in range(1, 1000):
    print(1)
t.join()

@app.route('/')
def hello_world():
    return 'Hello, World!'
When you start this server with gunicorn run:app -b 0.0.0.0:8000, you will see the 0s and 1s appear in random order: the main thread and the child thread run in parallel.
But when you run the same piece of code with gunicorn --worker-class eventlet run:app -b 0.0.0.0:8000, you will see all the 0s first and then all the 1s. That means the main thread and the child thread are not running in parallel.
Is this expected behaviour?
And how can I use eventlet and still get threading behaviour?
Edit:
Based on a suggestion, I am trying something like this to achieve thread-like random interleaving and to join these multiple execution streams.
But it still runs sequentially.
from flask import Flask

app = Flask(__name__)

import eventlet

def background():
    for i in range(1, 10000):
        print(0)
    return 42

def callback(gt, *args, **kwargs):
    result = gt.wait()
    print("[cb] %s" % result)

greenth = eventlet.spawn(background)
for i in range(1, 10000):
    print(1)
greenth.link(callback)

@app.route('/')
def hello_world():
    return 'Hello, World!'
This "tight loop" doesn't give chance to run other green threads.
for i in range(1, 1000):
print 0
Eventlet, gevent, asyncio, and other similar technologies provide cooperative multithreading, so you must write code that cooperates. You may find this answer useful: https://stackoverflow.com/a/14227272/73957
In more "real" code, you'd perform some network IO or wait on synchronisation, which would run other green threads implicitly. Otherwise you need to yield control to other green threads explicitly with eventlet.sleep().
Unsolicited code review: it would help if you settled on one of eventlet or threading rather than mixing both.

Multiprocess within flask app spinning up 2 processes

I am building a Flask app and need some background processes to run. I decided to go with multiprocessing, but it produces two processes when running within Flask. Does anyone know why this would happen? I've tested it on OS X and Ubuntu 12.04, with the same results. Here is an example:
import time
import multiprocessing
from flask import Flask

app = Flask(__name__)
backProc = None

def testFun():
    print('Starting')
    while True:
        time.sleep(3)
        print('looping')
        time.sleep(3)
        print('3 Seconds Later')

@app.route('/')
def root():
    return 'Started a background process with PID ' + str(backProc.pid) + " is running: " + str(backProc.is_alive())

@app.route('/kill')
def kill():
    backProc.terminate()
    return 'killed: ' + str(backProc.pid)

@app.route('/kill_all')
def kill_all():
    proc = multiprocessing.active_children()
    for p in proc:
        p.terminate()
    return 'killed all'

@app.route('/active')
def active():
    proc = multiprocessing.active_children()
    arr = []
    for p in proc:
        print(p.pid)
        arr.append(p.pid)
    return str(arr)

@app.route('/start')
def start():
    global backProc
    backProc = multiprocessing.Process(target=testFun, args=(), daemon=True)
    backProc.start()
    return 'started: ' + str(backProc.pid)

if __name__ == '__main__':
    app.run(port=int("7879"))
This is a problem with the Flask auto-reload feature, which is used during development to automatically restart the web server when code changes are detected, in order to serve the new code without requiring a manual restart.
In the guides, the app.run() call is always placed inside an `if __name__ == '__main__'` condition, since the reloader is on by default. The reloader works by launching the script again in a child process, which is why you see two processes. To avoid this, disable the Flask auto-reloader when starting the app from a function, like so:
def startWebserver():
    app.run(debug=True, use_reloader=False)
Link for reference:
http://blog.davidvassallo.me/2013/10/23/nugget-post-python-flask-framework-and-multiprocessing/
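If you would rather keep the reloader during development, a hedged alternative (my sketch, not from the linked post): Werkzeug sets the WERKZEUG_RUN_MAIN environment variable in the process that actually serves requests, so one-time startup work can be guarded on it.

import os

if __name__ == '__main__':
    # WERKZEUG_RUN_MAIN is only 'true' in the serving (child) process
    # when the reloader is active, so this block runs once.
    if os.environ.get('WERKZEUG_RUN_MAIN') == 'true':
        print('safe place to start background processes once')
    app.run(port=7879, debug=True)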
