gunicorn with eventlet runs threads in sequential manner - python

from flask import Flask
import threading

app = Flask(__name__)

class SThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        for i in range(1, 1000):
            print(0)

t = SThread()
t.start()
for i in range(1, 1000):
    print(1)
t.join()

@app.route('/')
def hello_world():
    return 'Hello, World!'
When you start the server like this, gunicorn run:app -b 0.0.0.0:8000, you will see the 0s and 1s printed in random, interleaved order; the main thread and the child thread are running in parallel.
But when you run the same piece of code with gunicorn --worker-class eventlet run:app -b 0.0.0.0:8000, you will see all the 0s first and then all the 1s. That means the main thread and the child thread are not running in parallel.
Is this expected behaviour?
And how can I use eventlet while still getting this threading behaviour?
Edit:
Based on the suggestion below, I am trying something like this to get thread-like interleaved behaviour and to join these multiple execution streams.
But it still runs sequentially.
from flask import Flask
import eventlet

app = Flask(__name__)

def background():
    for i in range(1, 10000):
        print(0)
    return 42

def callback(gt, *args, **kwargs):
    result = gt.wait()
    print("[cb] %s" % result)

greenth = eventlet.spawn(background)
for i in range(1, 10000):
    print(1)
greenth.link(callback)

@app.route('/')
def hello_world():
    return 'Hello, World!'

This "tight loop" doesn't give other green threads a chance to run.
for i in range(1, 1000):
    print(0)
Eventlet / gevent / asyncio / other similar technologies provide cooperative multithreading, so you must write code that cooperates. You may find this answer useful: https://stackoverflow.com/a/14227272/73957
In more "real" code, you'd perform some network IO or wait on synchronisation, which would implicitly run other green threads. Otherwise you need to yield control to other green threads explicitly: eventlet.sleep()
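For example (a minimal sketch of the cooperative version of the loops above), adding eventlet.sleep(0) inside each tight loop yields to the hub so the two green threads interleave:
import eventlet

def background():
    for i in range(1, 10000):
        print(0)
        eventlet.sleep(0)  # yield to the hub so other green threads can run
    return 42

greenth = eventlet.spawn(background)
for i in range(1, 10000):
    print(1)
    eventlet.sleep(0)  # cooperate from this side of the loop as well
print(greenth.wait())  # 42, once the background green thread finishes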
Unwanted code review: it would help if you decided to use one of eventlet or threading.

Related

Python - how can I run separate module (not function) as a separate process?

tl;dr: How can I programmatically execute a Python module (not function) as a separate process from a different Python module?
On my development laptop, I have a 'server' module containing a bottle server. In this module, the __name__ == '__main__' clause starts the bottle server.
@bt_app.post("/")
def server_post():
    << Generate response to 'http://server.com/' >>

if __name__ == '__main__':
    serve(bt_app, host='localhost', port=8080)
I also have a 'test_server' module containing pytests. In this module, the __name__ == '__main__' clause runs pytest and displays the results.
import pytest

def test_something():
    _rtn = some_server_function()
    assert _rtn == desired

if __name__ == '__main__':
    _rtn = pytest.main([__file__])
    print("Pytest returned: ", _rtn)
Currently, I manually run the server module (starting the web server on localhost), then I manually start the pytest module, which issues HTTP requests to the running server module and checks the responses.
Sometimes I forget to start the server module. No big deal but annoying. So I'd like to know if I can programmatically start the server module as a separate process from the pytest module (just as I'm doing manually now) so I don't forget to start it manually.
Thanks
Here is my test-case directory tree:
test
├── server.py
└── test_server.py
server.py starts a web server with Flask.
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

if __name__ == '__main__':
    app.run()
test_server.py makes a request to test it.
import sys
import subprocess
import time

import requests

p = None  # server process

def start_server():
    global p
    sys.path.append('/tmp/test')
    # here you may want to check whether the server
    # is already started, and if so skip this function
    kwargs = {}  # here you can pass other args as needed
    p = subprocess.Popen(['python', 'server.py'], **kwargs)

def test_function():
    response = requests.get('http://localhost:5000/')
    print('This is the response body: ', response.text)

if __name__ == '__main__':
    start_server()
    time.sleep(3)  # wait for the server to start
    test_function()
    p.kill()
Then you can run python test_server.py to start the server and run the test cases.
PS: subprocess.run() needs Python 3.5+; on older versions, stick with Popen() as above.
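If the checks already run under pytest, the same start/stop logic can live in a fixture so the server cannot be forgotten. A minimal sketch reusing the Popen call above (the script path, port, and 3-second wait are the assumptions from this post):
import subprocess
import time

import pytest
import requests

@pytest.fixture(scope="session")
def server():
    # start the Flask server as a child process for the whole test session
    p = subprocess.Popen(['python', 'server.py'])
    time.sleep(3)  # crude wait; polling the URL until it answers is more robust
    yield p
    p.kill()

def test_root(server):
    response = requests.get('http://localhost:5000/')
    assert response.status_code == 200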
import logging
import threading
import time

def thread_function(name):
    logging.info("Thread %s: starting", name)
    time.sleep(2)
    logging.info("Thread %s: finishing", name)

if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO,
                        datefmt="%H:%M:%S")
    threads = list()
    for index in range(3):
        logging.info("Main : create and start thread %d.", index)
        x = threading.Thread(target=thread_function, args=(index,))
        threads.append(x)
        x.start()
    for index, thread in enumerate(threads):
        logging.info("Main : before joining thread %d.", index)
        thread.join()
        logging.info("Main : thread %d done", index)
With threading you can run multiple threads at once!
Wim basically answered this question. I looked into the subprocess module. While reading up on it, I stumbled on the os.system function.
In short, subprocess is a highly flexible and capable module for running programs. os.system, on the other hand, is much simpler, with far fewer features.
Since just running a Python module is simple, I settled on os.system.
import os

# "python -m" expects a module name, not a file path, so invoke the file directly
server_cmd = "python ../src/server.py"
os.system(server_cmd)
Wim, thanks for the pointer. Had it been a full-fledged answer I would have upvoted it. Redo it as a full-fledged answer and I'll do so.
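One caveat worth adding: os.system() blocks until the launched command exits, so starting a long-running server this way will not return control to the test code. If that matters, a non-blocking sketch with subprocess.Popen (the relative path is the assumption from the snippet above):
import subprocess

# Popen returns immediately; keep the handle to stop the server later
server_proc = subprocess.Popen(['python', '../src/server.py'])
# ... run the tests against the server here ...
server_proc.terminate()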
Async to the rescue.
from gevent import monkey, sleep, spawn
monkey.patch_all()
from gevent.pywsgi import WSGIServer

@bt_app.post("/")
def server_post():
    << Generate response to 'http://server.com/' >>

def test_something():
    _rtn = some_server_function()
    assert _rtn == desired
    print("Pytest returned: ", _rtn)
    sleep(0)

if __name__ == '__main__':
    spawn(test_something)  # runs async
    server = WSGIServer(("0.0.0.0", 8080), bt_app)
    server.serve_forever()

gunicorn returns same pid for different worker process

I'm trying to get a better understanding of how gunicorn manages its processes so I wrote the following code:
import os
from time import sleep

from fastapi import FastAPI

app = FastAPI()

sleep_time = [5, 0]
global_var = 100
order_called = 0

@app.get('/test')
def test_handler():
    global global_var, order_called
    order_called += 1
    local_order_called = order_called
    sleep(sleep_time[order_called - 1])  # doing some time-consuming stuff here
    global_var += 1
    return {"test": os.getpid(), "global_var": global_var, "order called": local_order_called}
And I started gunicorn with:
gunicorn -w 2 -k uvicorn.workers.UvicornWorker server:app --preload
The server is supposed to output pid, 102, 1 and then pid, 101, 2 for two consecutive requests, because it takes more time for worker #1 to finish.
I'm getting the correct output, but somehow the two PIDs returned are the same, which is very strange; this happens with or without --preload.
Anyone can shed some lights on this? Thanks!
gunicorn -w 2 does fork two worker processes, each with its own PID, but nothing forces two consecutive requests onto different workers. The UvicornWorker serves requests concurrently (a plain def handler runs in a thread pool), so a single worker can handle both of your overlapping requests, and os.getpid() then returns the same value each time. The fact that your second request sees global_var incremented by the first is another hint that both landed in the same process. Hope that helps!
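If you want to actually observe both worker PIDs, one option (a sketch, assuming the /test endpoint above and the requests library) is to send the two requests concurrently so they can be in flight at the same time:
from concurrent.futures import ThreadPoolExecutor

import requests

def call():
    return requests.get('http://localhost:8000/test').json()

# two overlapping requests have a chance of landing on different workers
with ThreadPoolExecutor(max_workers=2) as ex:
    results = [f.result() for f in [ex.submit(call) for _ in range(2)]]
print(results)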

Execute function periodically or when an event fires

I am working on Python Flask server-side code where a background task runs periodically and executes a function (note that 'periodically' is not a hard requirement; execute once and then again after x seconds also works). But I also need the same function to execute immediately when the server receives a request (and then resume the background task).
This kind of reminds me of the SELECT system call in C, where the system waits for a timeout or until a packet arrives.
Here is the minimal version of what I came up with after looking up a lot of answers.
from flask import Flask, request
import threading, os, time

POOL_TIME = 2

myThread = threading.Thread()

def pollAndExecute(a='|'):
    time.sleep(1)
    print(time.time(), a)
    # time.sleep(1)
    myThread = threading.Timer(POOL_TIME, pollAndExecute)
    myThread.start()

def startWork():
    global myThread
    myThread = threading.Timer(POOL_TIME, pollAndExecute)
    myThread.start()

app = Flask(__name__)

@app.route('/ping', methods=['POST'])
def ping():
    global myThread
    myThread.cancel()
    pollAndExecute("#")
    return "Hello"

if __name__ == '__main__':
    app.secret_key = os.urandom(12)
    startWork()
    app.run(port=5001)
Output: (screenshot not included)
But the output clearly shows that it stops behaving properly after a request arrives (sent using curl -X POST http://localhost:5001/ping).
Please guide me on how to correct this, or on other ways to do it. Just FYI, in the original code there are various database updates in pollAndExecute() as well, and I need to make sure there are no race conditions between polling and ping. Needless to say, only one copy of the function should execute at a particular time (preferably in a single thread).
Here is the solution I made for your problem. I used a priority queue that takes in data to be run with the printTime function. The background and Flask functions are two separate threads that push jobs into the priority queue, which prioritizes the Flask call over the background one (PriorityQueue serves the smallest priority number first). Notice how it now waits for the current job to finish before executing the next one.
from flask import Flask, request
import threading, os, time
from threading import Thread, Lock
from queue import PriorityQueue

POOL_TIME = 2
lock = Lock()

def printTime(a='|'):
    time.sleep(1)  # simulate a job taking 1 sec
    print(time.time(), a)

jobs = PriorityQueue()

class Queue(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.daemon = True
        self.start()

    def run(self):
        while True:
            _, data = jobs.get()
            printTime(data)

class backGroundProcess(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.daemon = True
        self.start()

    def run(self):
        while True:
            time.sleep(2)  # background process enqueues a job every 2 secs
            jobs.put((1, "|"))  # larger number = lower priority

class flaskProcess(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.start()

    def run(self):
        jobs.put((0, "#"))  # 0 beats 1, so flask jobs jump ahead of background jobs

app = Flask(__name__)

@app.route('/ping', methods=['POST'])
def ping():
    flaskThread = flaskProcess()
    return "Hello"

if __name__ == '__main__':
    backGroundProcess()
    Queue()
    app.secret_key = os.urandom(12)
    app.run(port=5001)
The above snippet may be a little verbose because I used classes, but this should get you started.

Using multiprocessing with gunicorn in Flask application

I made a basic flask application using Gunicorn with worker class gevent. The issue I ran into was as follows. If I had a basic flask app like this:
import queue
import random
import time
from multiprocessing import Pool
from threading import Thread

from flask import Flask

app = Flask(__name__)

def f(x):
    return random.randint(1, 6)

def thread_random(q):
    time.sleep(random.random())
    q.put(random.randint(1, 6))

def thread_roll():
    q = queue.Queue()
    threads = []
    for _ in range(3):
        t = Thread(target=thread_random, args=(q,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    dice_roll = sum([q.get() for _ in range(3)])
    return dice_roll

@app.route('/')
def hello_world():
    # technique 1
    pool = Pool(processes=4)
    return 'roll is: %s \n' % sum(pool.map(f, range(3)))
    # technique 2 (comment out technique 1 above to reach this)
    return 'roll is: %s \n' % thread_roll()

if __name__ == '__main__':
    app.run(debug=True)
I tried two techniques on it. Technique 1 will break gunicorn if I run it like:
sudo gunicorn -b 0.0.0.0:8000 app:app --worker-class gevent
but technique 2 won't. I can see that this is because technique 1 relies on multiprocessing while technique 2 relies on threads, but I can't figure out why the gevent worker class doesn't allow a pool.
If you're using gevent, you should try monkey-patching with gevent.monkey:
http://www.gevent.org/gevent.monkey.html
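A minimal sketch of the ordering that matters; monkey.patch_all() has to run before the modules it patches (socket, threading, time, ...) are imported by your app:
# patch the stdlib first, before any other imports
from gevent import monkey
monkey.patch_all()

import time
from threading import Thread  # now backed by greenlets

from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    # a patched Thread cooperates with the gevent worker instead of blocking it
    t = Thread(target=time.sleep, args=(0.1,))
    t.start()
    t.join()
    return 'patched'
Note that this helps the thread-based technique 2; multiprocessing.Pool still forks real OS processes, which gevent does not patch.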

Multiprocess within flask app spinning up 2 processes

I am building a Flask app and need some background processes to run. I decided to go with multiprocessing, but it's producing two processes when it runs within Flask. Does anyone know why this would happen? I've tested it on OS X and Ubuntu 12.04, with the same results. Here is an example:
import time
import multiprocessing
from flask import Flask

app = Flask(__name__)
backProc = None

def testFun():
    print('Starting')
    while True:
        time.sleep(3)
        print('looping')
        time.sleep(3)
        print('3 Seconds Later')

@app.route('/')
def root():
    return 'Started a background process with PID ' + str(backProc.pid) + " is running: " + str(backProc.is_alive())

@app.route('/kill')
def kill():
    backProc.terminate()
    return 'killed: ' + str(backProc.pid)

@app.route('/kill_all')
def kill_all():
    proc = multiprocessing.active_children()
    for p in proc:
        p.terminate()
    return 'killed all'

@app.route('/active')
def active():
    proc = multiprocessing.active_children()
    arr = []
    for p in proc:
        print(p.pid)
        arr.append(p.pid)
    return str(arr)

@app.route('/start')
def start():
    global backProc
    backProc = multiprocessing.Process(target=testFun, args=(), daemon=True)
    backProc.start()
    return 'started: ' + str(backProc.pid)

if __name__ == '__main__':
    app.run(port=int("7879"))
This is a problem with the Flask auto-reload feature, which is used during development to automatically restart the webserver when a code change is detected, so the new code is served without a manual restart.
The guides always place the app.run() call within an if __name__ == '__main__' condition, since the reloader is on by default. The reloader works by spawning a second process that watches your files, which is where the extra process comes from. When combining Flask with multiprocessing, disable the auto-reload like so:
def startWebserver():
    app.run(debug=True, use_reloader=False)
Link for reference:
http://blog.davidvassallo.me/2013/10/23/nugget-post-python-flask-framework-and-multiprocessing/
