python apscheduler not shutting down

I'm trying to stop APScheduler from running by removing the job and shutting the scheduler down completely, but neither works: my function expire_data still gets triggered.
```python
def process_bin(value):
    print "Stored:",pastebin.value
    print "Will expire in",pastebin_duration.value,"seconds!"
    if pastebin_duration>=0:
        scheduler = BlockingScheduler()
        job=scheduler.add_job(expire_data, 'interval', seconds=5)
        scheduler.start()
        job.remove()
        scheduler.shutdown()

def expire_data():
    print "Delete data!"
```
How can I stop it?

Question: I'm trying to stop the apscheduler from running
You are using a BlockingScheduler, therefore you can't: scheduler.start() never returns, so job.remove() and scheduler.shutdown() are never reached.
APScheduler BlockingScheduler
BlockingScheduler is the simplest possible scheduler.
It runs in the foreground, so when you call start(), the call never returns.
Read about choosing the right scheduler:
BlockingScheduler: use when the scheduler is the only thing running in your process
BackgroundScheduler: use when you’re not using any of the frameworks below, and want the scheduler to run in the background inside your application
AsyncIOScheduler: use if your application uses the asyncio module
GeventScheduler: use if your application uses gevent
TornadoScheduler: use if you’re building a Tornado application
TwistedScheduler: use if you’re building a Twisted application
QtScheduler: use if you’re building a Qt application
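To make the blocking/background difference concrete, here is a minimal stdlib-only sketch. It is not APScheduler code: IntervalJob is a hypothetical helper whose start() returns immediately, the way BackgroundScheduler.start() does, so the calling code can go on and stop it later. With APScheduler itself you would swap BlockingScheduler for BackgroundScheduler (from apscheduler.schedulers.background) and keep the same calling sequence.

```python
import threading
import time


class IntervalJob:
    """Hypothetical helper (not APScheduler's API): call func every
    `seconds` in a background thread until shutdown() is called."""

    def __init__(self, func, seconds):
        self._func = func
        self._seconds = seconds
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait() returns True as soon as shutdown() sets the event,
        # so the loop ends without sitting out a full interval.
        while not self._stopped.wait(self._seconds):
            self._func()

    def start(self):
        # Returns immediately, unlike BlockingScheduler.start().
        self._thread.start()

    def shutdown(self):
        self._stopped.set()
        self._thread.join()  # wait for an in-flight call to finish

    def is_running(self):
        return self._thread.is_alive()


calls = []

def expire_data():
    calls.append("Delete data!")

job = IntervalJob(expire_data, seconds=0.02)
job.start()
time.sleep(0.3)                # the caller keeps running while the job fires
job.shutdown()                 # reachable, because start() did not block
calls_at_shutdown = len(calls)
time.sleep(0.1)                # no further calls arrive after shutdown()
```

The key point is the ordering: because start() hands the work to another thread, the line that stops the job is actually executed.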

Related

Adding PeriodicCallback to already running IOLoop instance

I want to create a simple scheduler in Tornado, where along the course of the app, some job with a (time,callback) is generated dynamically, for example,
Send a push notification 30 mins before an event,
but this reminder is created only after the job is created by the server, which may be through a POST request.
I wanted to achieve this through PeriodicCallback, but I read that IOLoop.start() must be called after the PeriodicCallback is created. How can I add PeriodicCallback to already running IOLoop or is there any other way?
There is no requirement that PeriodicCallbacks be started before the IOLoop. You can start them while the IOLoop is running. You have to schedule something before calling IOLoop.start() since that will run forever, but whatever you schedule on the IOLoop can go on to schedule other stuff.
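The same idea can be sketched with the standard library's asyncio instead of Tornado (on Python 3, Tornado 5+ runs its IOLoop on top of asyncio): timers are added while the loop is already running. handle_post and send_reminder are hypothetical stand-ins for the POST handler and the push notification. In Tornado itself the corresponding calls would be IOLoop.current().call_later(delay, callback), or PeriodicCallback(callback, callback_time).start() for a repeating job, where callback_time is in milliseconds.

```python
import asyncio

sent = []

def send_reminder(event_name):
    # Stand-in for "send a push notification".
    sent.append(event_name)

def handle_post(event_name, delay_seconds):
    # Runs while the loop is already running, like a request handler would.
    loop = asyncio.get_running_loop()
    loop.call_later(delay_seconds, send_reminder, event_name)

async def main():
    # The loop is running here; nothing had to be scheduled before it started.
    handle_post("standup", 0.05)
    handle_post("lunch", 0.01)
    await asyncio.sleep(0.2)  # keep the loop alive long enough for both timers

asyncio.run(main())
```

For a real "30 minutes before the event" reminder, the delay would be computed from the event time at the moment the POST arrives.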

Tornado server caused Django unable to handle concurrent requests

I wrote a Django website that handles concurrent database requests and subprocess calls perfectly fine if I just run "python manage.py runserver".
This is my model:
```python
class MyModel:
    ...
    def foo(self):
        args = [......]
        pipe = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
```
In my view:
```python
def call_foo(request):
    my_model = MyModel()
    my_model.foo()
```
However, after I wrap it in a Tornado server, it's no longer able to handle concurrent requests. When I click the part of my website that sends an async GET request to this call_foo() view, my app seems unable to handle any other requests. For example, if I open the home page URL, it keeps waiting and won't display until the subprocess call in foo() has finished.
If I do not use Tornado, everything works fine.
Below is my code to start the tornado server. Is there anything that I did wrong?
```python
MAX_WAIT_SECONDS_BEFORE_SHUTDOWN = 5

def sig_handler(sig, frame):
    logging.warning('Caught signal: %s', sig)
    tornado.ioloop.IOLoop.instance().add_callback(force_shutdown)

def force_shutdown():
    logging.info("Stopping tornado server")
    server.stop()
    logging.info('Will shutdown in %s seconds ...', MAX_WAIT_SECONDS_BEFORE_SHUTDOWN)
    io_loop = tornado.ioloop.IOLoop.instance()
    deadline = time.time() + MAX_WAIT_SECONDS_BEFORE_SHUTDOWN

    def stop_loop():
        now = time.time()
        if now < deadline and (io_loop._callbacks or io_loop._timeouts):
            io_loop.add_timeout(now + 1, stop_loop)
        else:
            io_loop.stop()
            logging.info('Force Shutdown')

    stop_loop()

def main():
    parse_command_line()
    logging.info("starting tornado web server")
    os.environ['DJANGO_SETTINGS_MODULE'] = 'mydjango.settings'
    django.setup()
    wsgi_app = tornado.wsgi.WSGIContainer(django.core.handlers.wsgi.WSGIHandler())
    tornado_app = tornado.web.Application([
        (r'/(favicon\.ico)', tornado.web.StaticFileHandler, {'path': "static"}),
        (r'/static/(.*)', tornado.web.StaticFileHandler, {'path': "static"}),
        ('.*', tornado.web.FallbackHandler, dict(fallback=wsgi_app)),
    ])
    global server
    server = tornado.httpserver.HTTPServer(tornado_app)
    server.listen(options.port)
    signal.signal(signal.SIGTERM, sig_handler)
    signal.signal(signal.SIGINT, sig_handler)
    tornado.ioloop.IOLoop.instance().start()
    logging.info("Exit...")

if __name__ == '__main__':
    main()
```
There is nothing wrong with your set-up. This is by design.
So, the WSGI protocol (and therefore Django) uses a synchronous model. This means that when your app starts processing a request, it takes control and gives it back only when the request is finished. That's why it can process only one request at a time. To allow simultaneous requests, one usually launches a WSGI application in multithreaded or multiprocess mode.
The Tornado server, on the other hand, uses an asynchronous model. The idea here is to have its own scheduler instead of the OS scheduler that works with threads and processes. So your code runs some logic, then launches some long task (DB call, URL fetch), sets up what to run when the task finishes, and gives control back to the scheduler.
Giving control back to the scheduler is the crucial part: it allows an async server to work fast because it can start processing a new request while the previous one is waiting for data.
This answer explains sync/async in detail. It focuses on the client side, but I think you can see the idea.
So what's wrong with your code: the Popen call does not give control back to the IOLoop. Python does nothing else until your subprocess is finished, and so it cannot process other requests, not even Django's. runserver "works" here because it's multithreaded: while one thread is entirely blocked, other threads can still process requests.
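The effect can be reproduced with nothing but the standard library. This is an asyncio sketch, not Tornado or Django code, and it assumes the view actually waits for the child (for example via communicate()), since a bare Popen returns immediately. A blocking wait starves a concurrent "home page" request; awaiting an asyncio subprocess lets that request through first.

```python
import asyncio
import subprocess
import sys

# A child process that takes a while, standing in for the real Popen call.
CHILD = [sys.executable, "-c", "import time; time.sleep(0.3)"]

async def other_request(log):
    # A cheap "home page" request that should be served promptly.
    await asyncio.sleep(0.01)
    log.append("home page served")

async def blocking_view(log):
    # Waiting synchronously: the thread is stuck, the loop cannot switch tasks.
    subprocess.run(CHILD, check=True)
    log.append("subprocess done")

async def async_view(log):
    # Awaiting the child hands control back to the event loop meanwhile.
    proc = await asyncio.create_subprocess_exec(*CHILD)
    await proc.wait()
    log.append("subprocess done")

async def serve(view):
    log = []
    await asyncio.gather(view(log), other_request(log))
    return log

blocking_log = asyncio.run(serve(blocking_view))
async_log = asyncio.run(serve(async_view))
```

In blocking_log the home page only appears after the subprocess finishes; in async_log it appears first, which is the behavior tornado.process.Subprocess aims to give inside Tornado.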
For this reason, it's usually not recommended to run WSGI apps under an async server like Tornado. The docs say it will be less scalable, and you can see the problem in your own code. So if you need both servers (e.g. Tornado for sockets and Django for the main site), I'd suggest running both behind nginx and using uWSGI or Gunicorn to run Django. Or take a look at the django-channels app instead of Tornado.
Besides, while it works in a test environment, I guess it's not the recommended way to do what you're trying to achieve. It's hard to suggest a solution, as I don't know what you call with Popen, but it seems to be something long-running. Maybe you should take a look at the Celery project. It's a package for running long-running background jobs.
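The shape of that pattern (a view that only enqueues work while a separate worker runs it) can be sketched with the standard library's queue and threading modules. This is a stand-in, not Celery: there is no broker, no retries, and the worker lives in the same process. call_foo mirrors the question's view, and the job is a placeholder command.

```python
import queue
import subprocess
import sys
import threading

jobs = queue.Queue()
results = []

def worker():
    # Pulls commands off the queue and runs them outside the request path.
    while True:
        args = jobs.get()
        completed = subprocess.run(args, capture_output=True, text=True)
        results.append(completed.stdout.strip())
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def call_foo():
    # The "view": enqueue the long job and return at once.
    jobs.put([sys.executable, "-c", "print('done')"])
    return "accepted"

response = call_foo()
jobs.join()  # only for the demo: wait until the worker has drained the queue
```

A real task queue adds what this sketch lacks: persistence across restarts, workers in separate processes or machines, and result tracking.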
However, back to running subprocesses. In Tornado you can use tornado.process.Subprocess. It's a wrapper over Popen that allows it to work with the IOLoop. Unfortunately, I don't know if you can use it in the WSGI part under Tornado. There are some projects I remember, like django futures, but it seems to be abandoned.
As another quick-and-dirty fix, you can run Tornado with several processes; check this example of how to fork the server. But I would not recommend using this in production anyway (forking is OK, running the WSGI fallback is not).
So, to summarize, I would rewrite your code to do one of the following:
Run the Popen call in a background queue, like Celery.
Process such views with Tornado and use the tornado.process module to run the subprocess.
And overall, I'd look for another deployment infrastructure and would not run Django under Tornado.

Stop a background process in flask without creating zombie processes

I need to start a long-running background process with subprocess when someone visits a particular view.
My code:
```python
from flask import Flask
import subprocess

app = Flask(__name__)

@app.route("/")
def index():
    subprocess.Popen(["sleep", "10"])
    return "hi\n"

if __name__ == "__main__":
    app.run(debug=True)
```
This works great, for the most part.
The problem is that when the process (sleep) ends, ps -Af | grep sleep shows it as [sleep] <defunct>.
From what I've read, this is because I still have a reference to the process in flask.
Is there a way to drop this reference after the process exits?
I tried doing g.subprocess = subprocess.Popen(["sleep", "10"]) and waiting for the process to end in @app.after_request(response) so I can use del on it, but this prevents Flask from returning the response until the subprocess exits. I need it to return the response before the subprocess exits.
Note:
I need the subprocess.Popen operation to be non-blocking - this is important.
As I've suggested in the comments, one of the cleanest and most robust ways of achieving this kind of thing in Python is by using Celery.
Celery requires a broker transport for messaging, for which RabbitMQ is the default, and at least one process running workers. However, what increases readability and maintainability is that the worker code can coexist in the same file or files as your server app. You invoke the remote procedures as though they were simple function calls.
Celery can handle retries, post-task events, and lots of other things for free, everything with mature code hardened by years of use in production.
This is your example after rewriting it for use with Celery:
```python
from flask import Flask
from celery import Celery
import subprocess

app = Flask(__name__)
celery_app = Celery("test")

@celery_app.task
def run_process():
    subprocess.Popen(["sleep", "5"])

@app.route("/")
def index():
    run_process.delay()
    return "hi\n"

if __name__ == "__main__":
    app.run(debug=True, port=8080)
```
This code works on a system with the RabbitMQ server running with default options (I installed the package and started the service, with no configuration whatsoever. Of course, in production you would have to tune that, but if everything is on the same server, it may not even be needed.)
With RabbitMQ in place, start the worker process with a command line like: celery worker -A bla1.celery_app -D, where bla1 is the module containing the code above (pip install celery in the same virtualenv as your Flask app). Then just launch the Flask server and see it working.
Of course, this has even more advantages if you are doing more work in Python itself than just calling an external process. The task can access your database models, and you can perform asynchronous actions that modify objects in there (and eventually trigger responses for the user, such as "flash" messages in the user session, or e-mails).
I've seen a lot of "poor man's parallel processing" using subprocess.Popen and letting it run freely, but that's often leading to zombie problems as you noted.
You could run your process in a thread. In that case there's no need for Popen: just use call, or check_call if you want an exception raised when the process fails. call and check_call (or run, since Python 3.5) wait for the process to complete, so no zombies are left, and since you're running them in a thread, you're not blocked.
```python
import threading

def in_background():
    subprocess.call(["sleep", "10"])

@app.route("/")
def index():
    t = threading.Thread(target=in_background)
    t.start()
    return "hi\n"
```
Note: to wait for thread completion you'd have to use t.join(), and for that you'd have to keep a reference to the t thread object.
BTW, I suppose your real process isn't sleep; otherwise it's not very useful, and time.sleep(10) would do the same (in a thread, of course!).
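If you do need the non-blocking Popen handle itself (for instance to kill the process later), a variation on the same idea is to start the child with Popen and let a daemon thread call wait() on it. wait() collects the exit status, which is what removes the <defunct> entry. A minimal sketch; start_and_reap is a hypothetical helper name, and the child is a placeholder for the real command.

```python
import subprocess
import sys
import threading
import time

def start_and_reap(args):
    """Start args without blocking; a daemon thread reaps the child on exit."""
    proc = subprocess.Popen(args)
    reaper = threading.Thread(target=proc.wait, daemon=True)
    reaper.start()
    return proc, reaper

t0 = time.monotonic()
proc, reaper = start_and_reap([sys.executable, "-c", "import time; time.sleep(0.5)"])
elapsed = time.monotonic() - t0  # time the "view" spent before it could return
reaper.join()                    # demo only: wait until the child has been reaped
```

In a Flask view you would return right after start_and_reap(); the join() here only lets the sketch check that the exit status was collected.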

How to force apscheduler to add jobs to the job store?

I'm adding a job to an APScheduler scheduler from a script. Unfortunately, the job is not properly scheduled when added from a script, because I didn't start the scheduler.
```python
scheduler = self.getscheduler()  # initializes and returns scheduler
# sample code; note that I did not call scheduler.start()
scheduler.add_job(trigger=trigger, func=function, jobstore='mongo')
```
I'm seeing a message: apscheduler.scheduler - INFO - Adding job tentatively -- it will be properly scheduled when the scheduler starts
The script is supposed to add jobs to the scheduler (not run the scheduler at that particular moment), and there is some other info that has to be added when a job is added to the database. Is it possible to add a job and force the scheduler to write it to the jobstore without actually running the scheduler?
I know, that it is possible to start and shutdown the scheduler after addition of each job to make the scheduler save the job information into the jobstore. Is that really a good approach?
Edit: My original intention was to isolate the initialization process of my software. I just wanted to add some jobs to a scheduler that is not yet started. The real issue is that I've given the user permission to start and stop the scheduler, so I cannot be sure that there is a running instance of the scheduler in the system. I've temporarily fixed the problem by starting the scheduler and shutting it down after adding the jobs. It works.
You would have to have some way to notify the scheduler that a job has been added, so that it could wake up and adjust the delay to its next wakeup. It's better to do this via some sort of RPC mechanism. What kind of mechanism is appropriate for your particular use case, I don't know. But RPyC and Execnet are good candidates. Use one of them or something else to remotely control the scheduler process to add said jobs, and you'll be fine.
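As a rough illustration of that RPC approach, here is a sketch using the standard library's xmlrpc modules, swapped in for RPyC or Execnet: the long-running scheduler process exposes an add_job function, and the script calls it remotely instead of touching an unstarted scheduler. The server runs in a thread only so the sketch fits in one file; pending_jobs stands in for the live scheduler, and "mymodule:expire_data" is a hypothetical textual "module:function" reference of the kind APScheduler's add_job accepts.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# --- scheduler process side --------------------------------------------
pending_jobs = []  # stand-in for the running scheduler's job list

def add_job(func_ref, seconds):
    # A real server would call scheduler.add_job(func_ref, 'interval',
    # seconds=seconds) on its already-started scheduler here.
    pending_jobs.append((func_ref, seconds))
    return len(pending_jobs)

server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add_job, "add_job")
port = server.server_address[1]  # port 0 lets the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# --- script side ---------------------------------------------------------
with ServerProxy(f"http://127.0.0.1:{port}/") as proxy:
    job_count = proxy.add_job("mymodule:expire_data", 5)

server.shutdown()
server.server_close()
```

Because the running scheduler adds the job itself, it can also recompute its next wakeup time, which is exactly what a script writing to the jobstore behind its back cannot trigger.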

Run code at Pyramid shutdown

Pyramid supports an ApplicationCreated event. However, I can't find any ApplicationDestroyed/ApplicationShutdown event. Is it at all possible to execute a function upon shutdown?
Do I have any choice other than to go further up my stack: ie. I'm using gevent inside uWSGI. It might be possible to get gevent or uWSGI to run my shutdown code, but it certainly isn't as pretty.
Pyramid does not support any shutdown event.
However, Python has the atexit module, whose registered functions run on interpreter shutdown:
http://docs.python.org/library/atexit.html
```python
import atexit

@atexit.register
def goodbye():
    print("You are now leaving the Python sector.")
```
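To see when the handler fires, this sketch runs a tiny program in a child interpreter and captures its output. One caveat from the atexit documentation matters under uWSGI: registered functions are not called when the process is killed by a signal not handled by Python, when a fatal internal error is detected, or when os._exit() is called.

```python
import subprocess
import sys
import textwrap

# A small program that registers a shutdown hook, then does its "work".
child_source = textwrap.dedent("""
    import atexit

    @atexit.register
    def goodbye():
        print("You are now leaving the Python sector.")

    print("app running")
""")

result = subprocess.run(
    [sys.executable, "-c", child_source],
    capture_output=True,
    text=True,
    check=True,
)
lines = result.stdout.splitlines()
```

The hook's output comes after the program's own output, because it runs only once the interpreter begins shutting down.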
