The title is a bit fuzzy because I don't know the right vocabulary.
Here's the thing I am trying to do: I have a script/program on the server for running checks. Now my co-workers want that this script can be started from a website, and the logs viewed from there. The process can be quite long running for the checks, usually more than a few hours.
for that, I gathered, I'd have to monitor the processes with the website script, and show their logs. The chosen language would be either PHP or Python.
I'd very much like a hint or view on how such a thing is generally done and what are best practices, as I'm unsure how to start with this one. Especially a reliable way to start/monitor the processes would be much welcome.
If you choose Python check out Celery (although it may be a little bit overkill if you want to keep things simple). It allows you to run asynchronous tasks and you can easily monitor them. There is also a django integration for celery (django-celery) that includes a web monitor for the tasks.
Related
I have a python script, that calls HandBrakeCli as a subprocess on Linux. During normal operations, HandBrakeCli takes almost 100% of available CPU resources. But sometimes it seems to starve or whatever - it seems not to be active, but does not return for a long time. This seems to be due to a broken DVD (Handbrake is a dvd ripping tool) - it seems that Handbrake is not handling this so well. It will finish after a long while (hours), but bevor it does nothing. Whereas in normal operation it will use lots of cpu cycles.
So my idea to solve this was to have a watch dog that checks if there is a HandBrakeCli process, and if so monitors if it does use a good part of the cpu. If that is not the case for maybe 5 minutes, it will kill that process, so that the parent script can continue its operations.
It does not seem too hard to program this, but I have a feeling it could involve some tedium. Also it seems not unlikely that this problem has been solved before. Is there a solution for this, possibly in python? If there is a standalone program or daemone that does the job on Linux I would not mind if its not python, as long as it is open source.
Well perhaps the most obvious answer for something to monitor and control child process in python would have to be supervisord. It's very mature, and has a lot of features and documentation.
It is written in Python, and has a plugin API, as well as a service API and a web dashboard.
Your problem description is a little bit vague though, re 'starvation'. It does sound as if your job is blocking on some resource, but you might want to diagnose this a little more precisely before reaching for a complicated software scaffolding to prop it up.
i.e. perhaps it's blocked on I/O (network or disk bandwidth), or it's stalled reading from a pipeline that's failed. You might want to look into process control (e.g. the renice command ) to manage it's CPU allocation a little better etc.
My organization spends a lot of time processing GIS data. I have built a number of python scripts that perform different steps of the data processing. Other than the first script, all scripts rely on a different script to finish before it can start. Many of the scripts take 5+ minutes to execute (one is over an hour), so I do not want to repeat already-executed steps. I want this to work similar to Make, so that if an error occurs in "script3", I don't have to re-execute "script1" and "script2". I can just re-run "script3".
Is SCons the right tool for this? I looked at it, and it seems to be focused on compiling code rather than running scripts. I'm open to other suitable tools.
I am not sure a build system is what you want. Unless I am missing something, what you want is some kind of controlled automation to execute your processing tasks, and handle runtime errors.
Of course, 'make' and 'SCons' can do that, but it would be like using a bazooka to hammer a nail. And you're actually overlooking something that might be easier and more rewarding to invest time learning on the long run, which is Python itself. Python is a full-fledged, multi-paradigm programming language, with a lot of features for robust exception handling and interaction with the operating system (and it is heavily used in system administration on Unix-like platforms).
A first simple step would be to have a master script call each of your other scripts, each inside a try ... except block, and handle the exceptions according to your requirements. And you might improve on that as you go along, by refactoring your scripts into a consistent Python application.
Here are some links to start with: link1, link2.
Question is relevant to this and this;
the difference is, I'd prefer something with possibly more precision and low load (per-minute cron job isn't preferable for those) and with minimal overhead (i.e. installing celery with rabbitmq seems like a big overkill).
An example task for such is personal reminders server (with reminders that could be edited over web and sent out through e-mail or XMPP).
I'm probably looking for something more like node.js's setTimeout but for django (and though I might prefer to implement reminders in node.js anyway, it's still a possibly interesting question).
For example, it's possible to start new threads in django app (with functions consisting of sleep() and send()); in what ways this can be bad?
The problem with using threads for this solution are the typical problems with Python threads that always drive people towards multi-process solutions instead. The problem is compounded here by the fact your thread isn't driven by the normal request-response cycle. This is summarized nicely by Malcolm Tredinnick here:
Have to disagree. Threads are not a good solution to this problem. The
issue is process management. As written, your threads will never be
rejoined. Webserver processes have a lifecycle uncontrollable by you
(the MaxRequestsPerChild Apache parameter and similar things in other
servers) and you are messing with that by using threads.
If you need a process with a lifecycle that is not matched by the
request-response path — something long running and independent of the
response — a completely separate process is definitely the right model
to use. Using a thread is tying it to the response lifecycle, which
wil have unintended side-effects.
A possible solution for you might be to have a long running process performing your tasks which gets a wake-up signal from a light cron process.
Another possibility would be build something using 0mq, which is much lighter than AMQP style queues (at the cost of some features of course). Tarek Ziade is working on a Mozilla project called powerhose that uses 0mq, looks super simple, and has a heartbeat capability with resolution to the second.
I want to create a python script that polls a folder that a java server will fill with images when a user transfers them, however I want this script to be almost as invisible as possible in terms of noticeable effects. Keep in mind that this computer has many servers on it, and management of memory and speed are thing I want to optimize. How is the best way to poll this directory without it clogging up the system? Would I want to pull sleep functions in there, or does that cause even more problems?
If your server is Linux, the best and cleanest way to do this is with the system service inotify, which is designed just for your needs. Python has a lib as a part of the twisted network programming framework, which is loosely coupled, so you can use it while keeping it simple. Just check-out this example:
http://twistedmatrix.com/documents/10.2.0/api/twisted.internet.inotify.html
it is quite straight-forward.
I'm working on a turn-based web game that will perform all world updates (player orders, physics, scripted events, etc.) on the server. For now, I could simply update the world in a web request callback. Unfortunately, that naive approach is not at all scalable. I don't want to bog down my web server when I start running many concurrent games.
So what is the best way to separate the load from the web server, ideally in a way that could even be run on a separate machine?
A simple python module with infinite loop?
A distributed task in something like Celery?
Some sort of cross-platform Cron scheduler?
Some other fancy Django feature or third-party library that I don't know about?
I also want to minimize code duplication by using the same model layer. That probably means my service would need access to the Django model code, so that definitely determines how I architect the service.
I think Celery, which you mention in your question, is the way to go here. It will interface nicely with the rest of your setup, support your eventual aim of separating out the systems, and is compatible with Django.
I'd just write the backend to just use the Django database interface (look at the setup code in your manage.py), spawn it as its own process, and interface to it with Protocol Buffers. That route should move to a separate machine with little work. MPI may be an option, too.
Pipes, FIFOs, and most other IPC requires both processes to be on the same box.
Though I have to point out a flaw in your premise:
Unfortunately, that naive approach is not at all scalable. I don't want to bog down my web server when I start running many concurrent games.
If you run concurrent games, so long as you keep all the parts for a given game on the same server, this is a non-issue, unless there's a common resource needed by all games. Then the real issue becomes load balancing across the servers.