Does anyone know of a working and well documented implementation of a daemon using python? Please post a link here if you know of a project that fits these two requirements.
Three options I can think of:
1. Make a cron job that calls your script. Cron is a common name for a GNU/Linux daemon that periodically launches scripts according to a schedule you set. You add your script to a crontab, or place a symlink to it in a special directory, and the daemon handles the job of launching it in the background. You can read more on Wikipedia. There are several different cron daemons, but your GNU/Linux system should already have one installed.
2. Use a Pythonic approach (a library, for example) that lets your script daemonize itself. Yes, it will require a simple event loop (where your events are timer triggers, possibly provided by the sleep function). Here is the one I recommend and use: A simple unix/linux daemon in Python
3. Use Python's multiprocessing module. The nitty-gritty of forking a process and so on is hidden in this implementation. It's pretty neat.
I wouldn't recommend option 2 or 3, because you would in fact be reimplementing cron. The Linux philosophy is to let multiple simple tools interact to solve your problem. Unless there are additional reasons to write a daemon (beyond triggering periodically), choose option 1.
Also, if your script daemonizes itself and runs in a loop, make sure it writes logs that will help you debug a crash, and devise a way to start the script again afterwards. A cron job, by contrast, is simply launched again at the next scheduled time.
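If you do go the self-looping route anyway, the advice above (a sleep-based loop that logs failures and keeps going) might look roughly like this. It is only a sketch: `run_periodically`, `job`, `interval` and `max_runs` are names I made up for illustration, and it does not restart the process itself if the interpreter dies.

```python
import logging
import time

log = logging.getLogger("worker")


def run_periodically(job, interval, max_runs=None):
    """Call job() every `interval` seconds; log failures instead of crashing.

    `max_runs` is a hypothetical knob so the loop can be stopped in tests;
    leave it as None to loop forever.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        started = time.monotonic()
        try:
            job()
        except Exception:
            # Record the full traceback so a crash can be debugged later.
            log.exception("job failed; will try again on the next tick")
        runs += 1
        # Sleep for whatever is left of the interval (never a negative time).
        elapsed = time.monotonic() - started
        time.sleep(max(0.0, interval - elapsed))
    return runs
```

In a real daemon you would call `logging.basicConfig(filename=...)` first so the tracebacks end up in a file rather than on a terminal that no longer exists.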
If you just want to run a daemon, consider Supervisor, a daemon that itself controls and manages daemons.
If you want to look at the nitty-gritty, you can check out Supervisor's launch script or some of the responses to this lazyweb request.
Check this link for a double-fork daemon: http://code.activestate.com/recipes/278731-creating-a-daemon-the-python-way/
The code is readable and well-documented. For detailed information on Unix daemons, take a look at chapter 13 of W. Richard Stevens's book 'Advanced Programming in the UNIX Environment'.
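For orientation, here is a stripped-down sketch of the double-fork idiom. It is not the recipe's code; the function name `daemonize` and the `workdir` parameter are my own, it is Unix-only, and it skips things a production daemon would want (PID file, signal handling, closing other inherited descriptors).

```python
import os
import sys


def daemonize(workdir="/"):
    """Detach the calling process from its terminal (double-fork idiom, Unix only)."""
    # Flush buffered output before forking so it is not written twice.
    sys.stdout.flush()
    sys.stderr.flush()

    # First fork: the parent exits, so the child is not a process group
    # leader, which setsid() requires.
    if os.fork() > 0:
        os._exit(0)

    # Start a new session; this drops the controlling terminal.
    os.setsid()

    # Second fork: the session leader exits, so the surviving process can
    # never reacquire a controlling terminal.
    if os.fork() > 0:
        os._exit(0)

    os.chdir(workdir)  # do not keep the launch directory's mount point busy
    os.umask(0o022)    # predictable permissions for files created later

    # Point stdin, stdout and stderr at /dev/null.
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)
    if devnull > 2:
        os.close(devnull)
```

Call `daemonize()` once at startup, before opening log files or sockets; everything after the call runs in the detached process.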
Related
I have a page where the user selects a Python script, and then this script executes.
My issue is that some scripts take a while to execute (up to 30m) so I'd like to run them in the background while the user can still navigate on the website.
I tried to use Celery but as I'm on Windows I couldn't do better than using --pool=solo which, while allowing the user to do something else, can only do so for one user at a time.
I also saw this thread while searching for a solution, but didn't manage to really understand how it worked nor how to implement it, as well as determine if it was really answering my problem...
So here is my question: how can I have multiple threads or multiple processes with Celery on Windows? Or, if there's another way, how can I execute several tasks simultaneously in the background?
Have you identified whether your slow scripts are CPU-bound or I/O-bound?
If they're I/O-bound, you can use eventlet or gevent, as described in Strategy 1 of the blog post on distributedpython.com.
If they're CPU-bound, you may have to use something like a dedicated Celery Windows box (or a Windows Docker container) and work around Celery's billiard issue on Windows by setting the environment variable FORKED_BY_MULTIPROCESSING=1, as described in Strategy 2 of the same blog post.
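On the "is there another way" part of the question: if Celery turns out to be more trouble than it is worth on Windows, a rough alternative using only the standard library is to run each selected script in its own interpreter process and track it from a small thread pool. This is not Celery and not what the blog post describes; `run_script`, `submit_script`, the pool size of 4 and the 30-minute timeout are all assumptions for illustration.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pool size: at most 4 scripts run at the same time.
executor = ThreadPoolExecutor(max_workers=4)


def run_script(path, timeout=1800):
    """Run one script in its own interpreter process and capture its output."""
    done = subprocess.run(
        [sys.executable, path],
        capture_output=True,
        text=True,
        timeout=timeout,  # assumed cap of 30 minutes per script
    )
    return done.returncode, done.stdout


def submit_script(path):
    """Queue a script and return immediately with a Future."""
    return executor.submit(run_script, path)
```

The web request handler would call `submit_script(path)`, store the returned future under some job id, and let a later request check `future.done()`. Because every script gets its own process, CPU-bound scripts do not block each other, but the jobs are lost if the web server process restarts.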
I have a pretty complex computation code written in Octave and a python script which receives user input, and needs to run the Octave code based on the user inputs. As I see it, I have these options:
Port the Octave code to python.
Use external libraries (i.e. oct2py) which enable you to run the Octave/Matlab engine from python.
Communicate between a python process and an octave process. One such possibility would be to use subprocess from the python code and wait for the answer.
Since I'm pretty reluctant to port my code to python and I don't want to rely on maintenance of external libraries such as oct2py, I am in favor of option 3. However, since the system should scale well, I do not want to spawn a new octave process for every request, and a tasks queue system seems more reasonable. Is there any (recommended) tasks queue system to enqueue tasks in python and have an octave worker on the other end process it?
The way it is described here, option 3 degenerates to option 2 because Octave does not have an obvious way (an API or package) for the 'Octave worker' to connect to a task queue.
The only way Octave does "networking" is by the sockets package and this means implementing the protocol for communicating with the task queue from scratch (in Octave).
The original motivation for having an 'Octave worker' is to have the main process of Octave launch once and then "direct it" to execute functions and return results, rather than launching the main process of Octave for every call to a function.
Since Octave cannot do 'a worker' (that launches, listens to a 'channel' and executes code) out of the box, the only other way to achieve this is to have the task queue framework all work in Python and only call Octave when you need its functionality, most likely via oct2py (i.e. option 2).
There are many different ways to do this, ranging from Redis to PyPubSub, Celery and RabbitMQ. All of them are straightforward and very well documented. PyPubSub does not require any additional components.
(Just as a note: the solution of having an 'executable' Octave script, calling it from Python and blocking until it returns is not as bad as it sounds, however, and for some parallel-processing frameworks it is the only way to have multiple copies of the same Octave script operate on different data segments.)
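To make the "queue lives in Python, Octave is only called when needed" idea concrete, here is a minimal in-process sketch using the standard library's `queue` module instead of Redis or RabbitMQ. The names `run_octave` and `start_workers` are mine; `run_octave` assumes an `octave` executable on PATH and starts a fresh Octave process per task (the blocking approach from the note above), and you could pass an oct2py-based callable as `evaluate` instead.

```python
import queue
import subprocess
import threading


def run_octave(expression):
    """Evaluate one Octave expression in a fresh process (assumes `octave` on PATH)."""
    done = subprocess.run(
        ["octave", "--quiet", "--eval", expression],
        capture_output=True,
        text=True,
        check=True,
    )
    return done.stdout.strip()


def start_workers(tasks, results, evaluate=run_octave, count=2):
    """Start `count` threads that pull (task_id, expression) pairs off `tasks`."""

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: shut this worker down
                tasks.task_done()
                break
            task_id, expression = item
            try:
                results[task_id] = evaluate(expression)
            except Exception as exc:  # keep the worker alive on failures
                results[task_id] = exc
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(count)]
    for thread in threads:
        thread.start()
    return threads
```

The request handler puts `(task_id, expression)` on the queue and later looks the id up in `results`; putting one `None` per worker stops them. `count` caps how many Octave processes run at once, which addresses the scaling worry without writing a queue client in Octave.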
All three options are reasonable depending on your particular case.
"I don't want to rely on maintenance of external libraries such as oct2py, I am in favor of option 3"
oct2py is implemented using option 3. You can reinvent what it already does or use it directly. oct2py is pure Python and has a permissive license: if its development were to stop tomorrow, you could include its code alongside yours.
I have a python program that I would like to constantly be running updates and gathering new data. Essentially, I am gathering data from a bunch of domains. My processors take about a day and a half to run. Once they finish, I'd like them to automatically start over again.
I don't want to use a while loop to just restart the processes without killing everything related first because some of the packages that I am using to support these processors (mainly pyV8) have a problem of memory slowly accumulating and I'm not a good enough programmer to dive into debugging a memory leak in a big package like that. So, I need all of the related processes to successfully die and then come back to life.
I have heard that supervisord can do this type of work, but don't like messing around with .conf files and would prefer to keep everything inside of python.
Summary: Is there a package that will kill a script together with all of its related processes, which I could call from a while loop (or otherwise use to get this behavior) from inside a Python script?
I don't see why you couldn't use supervisord. The configuration is really simple and very flexible and it's not limited to python programs.
For example, you can create file /etc/supervisor/conf.d/myprog.conf:
[program:myprog]
command=/opt/myprog/bin/myprog --opt1 --opt2
directory=/opt/myprog
user=myuser
Then reload supervisor's config:
$ sudo supervisorctl reload
and it's on. Isn't it simple enough?
More about supervisord configuration: http://supervisord.org/subprocess.html
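If you really do want to keep everything inside Python, one possible approach (not what supervisord does internally, just a sketch) is to launch the processor in its own session and kill the whole process group before starting over. This is Unix-only; `run_once`, `run_forever`, `pause` and `max_runs` are hypothetical names.

```python
import os
import signal
import subprocess
import time


def run_once(command, timeout=None):
    """Run `command` in its own session, then kill whatever is left of its group."""
    # start_new_session=True makes the child a session and process group
    # leader, so its group id equals its pid.
    proc = subprocess.Popen(command, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    finally:
        try:
            # SIGKILL is blunt; send SIGTERM first if children need to clean up.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # nothing left in the group
        proc.wait()


def run_forever(command, pause=5.0, max_runs=None):
    """Restart `command` each time it finishes; `max_runs` exists for testing."""
    runs = 0
    while max_runs is None or runs < max_runs:
        run_once(command)
        runs += 1
        time.sleep(pause)
    return runs
```

Something like `run_forever([sys.executable, "processors.py"])` (script name made up) would then give each cycle a fresh set of processes, so leaked memory is returned to the OS between runs. Helpers that move themselves into a different session escape the group kill.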
I am attempting to build an educational coding site, similar to Codecademy, but I am frankly at a loss as to what steps should be taken. Could I be pointed in the right direction in including even a simple python interpreter in a webapp?
One option might be to use PyPy to create a sandboxed python. It would limit the external operations someone could do.
Once you have that set up, your website would take the code source, send it over ajax to your webserver, and the server would run the code in a subprocess of a sandboxed python instance. You would also be able to kill the process if it took longer than say 5 seconds. Then you return the output back as a response to the client.
See these links for help on a PyPy sandbox:
http://doc.pypy.org/en/latest/sandbox.html
http://readevalprint.com/blog/python-sandbox-with-pypy.html
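The server-side flow described above (run the submitted code in a subprocess, kill it after about 5 seconds, return the output) could be sketched like this. Important caveat: the default here is the ordinary CPython interpreter as a stand-in, which is NOT a sandbox; you would pass the command line of your sandboxed interpreter via the hypothetical `interpreter_argv` parameter. `run_snippet` and the shape of the returned dict are my own invention.

```python
import subprocess
import sys


def run_snippet(source, timeout=5.0, interpreter_argv=(sys.executable, "-I")):
    """Run `source` in a separate interpreter process with a time limit.

    WARNING: the default interpreter is plain CPython in isolated mode (-I),
    which does not restrict what the code can do. Substitute a sandboxed
    interpreter here before exposing this to users.
    """
    try:
        done = subprocess.run(
            [*interpreter_argv, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout,  # subprocess.run kills the child on timeout
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "output": "", "error": "timed out after %.0f s" % timeout}
    return {"ok": done.returncode == 0, "output": done.stdout, "error": done.stderr}
```

The ajax handler would call `run_snippet(code_from_request)` and send the resulting dict back as JSON.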
Creating a fully interactive REPL would be even more involved. You would need to keep an interpreter alive for each client on your server, then accept ajax "lines" of input, run them through the interpreter by communicating with the running process, and return the output.
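To show just the bookkeeping involved (one live interpreter per client, fed one line at a time), here is a sketch built on the standard library's `code.InteractiveConsole`. It runs the code inside the server process, so it is not sandboxed and not the separate-process design described above; `ReplSession`, `sessions` and `handle_line` are made-up names.

```python
import code
import contextlib
import io


class ReplSession:
    """One interpreter with its own namespace, kept alive between requests."""

    def __init__(self):
        self._console = code.InteractiveConsole()

    def send(self, line):
        """Feed one line; return (captured output, whether more input is needed)."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
            more = self._console.push(line)
        return buffer.getvalue(), more


# Hypothetical in-memory registry keyed by whatever identifies a client.
sessions = {}


def handle_line(client_id, line):
    """Route one line of input to the client's session, creating it on first use."""
    session = sessions.setdefault(client_id, ReplSession())
    return session.send(line)
```

The `more` flag tells the front end whether to show a continuation prompt, for example after `for i in range(2):`. Redirecting `sys.stdout` this way is process-wide, so this only works if one request is handled at a time.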
Overall, not trivial. You would need some strong dev skills to do this comfortably. You may find this task a bit daunting if you are just learning.
There's more to do here than you think.
The major problem is that you cannot let people run arbitrary Python code on your webserver. For example, what happens if they do
import os
os.system("rm -rf *.*")
So clearly you have to run this Python code securely. But then you have the problem of securing Python, which is basically impossible because of how dynamic it is. And so you'll probably have to run the Python shell in a virtual machine, which comes with its own headaches.
Have you seen e.g. http://code.google.com/p/google-app-engine-samples/downloads/detail?name=shell_20091112.tar.gz&can=2&q=?
One more recent option is repl.it.
It is appealing because the interpreters are implemented in JavaScript, so compilation and execution happen on the user's side, in the browser. That means your server never runs the submitted code.
They support Python 3, Python, JavaScript, Java, Ruby, PHP...
I strongly recommend checking out their site at http://repl.it
Look into LXC containers. They have a pretty cool API that you can use to create lightweight Linux containers. You could run the subprocess commands inside such a container, so that the end user cannot mess with your main server.
I'm starting a web project in Python and I'm looking for a process manager that offers reloading in the same manner as PHP-FPM.
I've built stuff with Python before and Paste seems similar to what I want, but not quite.
The need for the ability to reload the process rather than restart is to allow long-running tasks to complete uninterrupted where necessary.
How about supervisor with uwsgi?