Worried about threading in python on apache server

I don't need the threads to be aware of each other. They just need to perform a task that shouldn't take more than two or three seconds tops. What can I do to guarantee that the thread will not be killed before the task is completed? Also, I occasionally need to use a timer thread. The timer only runs for a minute, but I'm nervous about that being too long for Apache.

Why not start these threads in the background? Why do they need to be part of the webserver? I would suggest writing scripts that either sit idle in the background all the time or are called periodically by a cron job. The Python scripts could look up info in the database, or even use a file to indicate what they need to do, run, then exit.
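For instance, here is a minimal sketch of the file-flag variant: a standalone script (run by cron every minute, or kept looping in the background) that checks for a flag file written by the web app, does the short task, and exits. The path and the task are illustrative, not from the original post:

    import os

    FLAG = "/tmp/myapp-work-requested"   # hypothetical path written by the web request handler

    def do_task():
        pass   # the two-to-three-second job goes here

    if os.path.exists(FLAG):
        os.remove(FLAG)   # claim the work before running it
        do_task()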

Related

Spawn a subprocess but kill it if main process gets killed

I am creating a program in Python that listens to various user interactions and logs them. I have these requirements/restrictions:
I need a separate process that sends those logs to a remote database every hour
I can't do it in the current process because it blocks the UI.
If the main process stops, the background process should also stop.
I've been reading about subprocess, but I can't seem to find anything on how to stop both simultaneously. I need the equivalent of spawn_link, if anybody knows some Erlang/Elixir.
Thanks!
To answer the question in the title (for visitors from Google): there are robust solutions on Linux and Windows using OS-specific APIs, and less robust but more portable psutil-based solutions.
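On Linux, one robust OS-specific approach is prctl(PR_SET_PDEATHSIG), which asks the kernel to signal the child when the parent dies. A rough sketch via ctypes (Linux-only; the constant comes from <sys/prctl.h>, and the sleep command merely stands in for your real child process):

    import ctypes
    import signal
    import subprocess

    PR_SET_PDEATHSIG = 1   # from <sys/prctl.h>
    libc = ctypes.CDLL("libc.so.6", use_errno=True)

    def set_pdeathsig():
        # Runs in the child between fork() and exec(): ask the kernel to
        # deliver SIGKILL to this child if the parent process dies.
        libc.prctl(PR_SET_PDEATHSIG, signal.SIGKILL)

    child = subprocess.Popen(["sleep", "3600"], preexec_fn=set_pdeathsig)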
To fix your specific problem (it is an XY problem): use a daemon thread instead of a process.
A thread would let you perform I/O without blocking the GUI, even if the GUI toolkit you've chosen doesn't provide an async I/O API such as tkinter's createfilehandler() or gtk's io_add_watch().
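For example, a minimal sketch of such a daemon thread; the hour interval and the send_logs() uploader are placeholders for your own code:

    import threading
    import time

    def send_logs():
        pass   # your "upload logs to the remote database" code (placeholder)

    def log_sender(interval=3600):
        while True:
            time.sleep(interval)
            send_logs()

    # A daemon thread dies automatically when the main (GUI) process
    # exits -- the spawn_link-like behaviour asked for.
    t = threading.Thread(target=log_sender)
    t.daemon = True
    t.start()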

Run specific django manage.py commands at intervals

I need to run a specific manage.py command on an EC2 instance every X minutes. For example: python manage.py some_command.
I have looked up django-chronograph. Following the instructions, I've added chronograph to my settings.py but on runserver it keeps telling me No module named chronograph.
Is there something I'm missing to get this running? And after running how do I get manage.py commands to run using chronograph?
Edit: It's installed in the EC2 instance's virtualenv.
I would suggest you configure cron to run your command at specific times/intervals.
First, install django-chronograph by running pip install django-chronograph.
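If memory serves, chronograph itself is dispatched from a single cron entry along these lines (the paths are illustrative; adjust them to your virtualenv and project):

    # crontab entry: have cron invoke chronograph's dispatcher every
    # minute; chronograph then runs whichever jobs are due.
    * * * * * /path/to/virtualenv/bin/python /path/to/project/manage.py cron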
I would say handle this through cron, but if you don't want to use cron then:
Make sure you installed the module in the virtualenv (with easy_install, pip, or any other way that Amazon EC2 allows). After that you might want to look at the threading module documentation:
Python 2 threading module documentation
Python 3 threading module documentation
The purpose of using threading will be to have the following structure:
A "control" thread, which will use the chronograph module and do the time measurements, and putting the new work to do in an "input queue" on each scheduled time, for the worker threads (which will be active already) to process, or just trigger each worker thread (make it active) at the time you want to trigger each execution. In the first case you'll be taking advantage of parallel threads to do a big chunk of work and minimize io wait times, but since the work is in a queue, the workers will process one at a time. Meaning if you schedule two things too close together and the previous element is still being processed, the new item will have to wait (Depending on your programming logic and amount of worker threads some workers might start processing the new item, but is a bit more complex logic).
In the second case, your control thread actually starts a new thread (or group of threads) each time you want to trigger a scheduled action. If there is a lot of data to process you might need to spawn a new queue for each task and create a group of worker threads for it, but if the data is not that big you can get away with having each worker process just one data package and finish once execution is done and you have a result. Either way, this method lets you schedule tasks with no limit on how close together they can be, since new independent worker threads are created for them every time.
Finally, you might want to create an "output queue" and an output thread to store and process (or output, or do anything else you want with) the results of the worker threads.
The control thread will basically try to imitate cron in its logic, triggering actions at certain times depending on how it was configured.
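Here is a minimal sketch of the first variant (pre-started workers pulling from an input queue). The interval, the worker count, and the placeholder job are all assumptions, and a plain sleep loop stands in for chronograph's scheduling:

    import threading
    import time
    from queue import Queue   # spelled "Queue" on Python 2

    def worker(tasks):
        # Worker threads block on the input queue and process one item at a time.
        while True:
            job = tasks.get()
            job()              # run the scheduled action
            tasks.task_done()

    def control(tasks, interval, job):
        # The control thread imitates cron: every `interval` seconds it
        # puts a new unit of work on the queue for the workers to pick up.
        while True:
            time.sleep(interval)
            tasks.put(job)

    def sample_job():
        print("running scheduled command")   # placeholder for the real work

    tasks = Queue()
    for _ in range(4):                       # worker count is an arbitrary choice
        t = threading.Thread(target=worker, args=(tasks,))
        t.daemon = True
        t.start()

    control(tasks, 300, sample_job)          # 300-second interval is an assumption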
There's also a multiprocessing module in Python, which works with processes instead and takes advantage of true multiprocessing hardware, but I don't think you'll really need it in this case unless you see performance issues caused by CPU-bound work.
If you need any clarification, help, examples, just let me know.

long running CPU intensive python script sent to sleep by scheduler

I have written a data munging script that is very CPU intensive. It had been running for a few days, but now (thanks to trace messages sent to the console) I can see that it is not working (actually, it has not been working for the last 10 hours or so).
When I run top, I notice that the process is either sleeping (S) or in uninterruptible sleep (D). This is wasting a lot of time.
I used sudo renice -10 PID to change the process's nice value, and after it ran for a short while, I noticed that the process had gone back to sleep again.
My question(s):
Is there anything I can do to FORCE the script to run until it finishes (even if it means the machine is unusable until the end of the script)?
Is there a yield command I can use in Python that lets me periodically pass control to other processes/threads, to stop the scheduler from trying to put my script to sleep?
I am using Python 2.7.x on Ubuntu 10.04.
The scheduler will only put your process on hold if there is another process ready to run. If you have no other processes which hog up the CPU, your process will be running most of the time. The scheduler does not put your process to sleep just because it feels like it.
My guess is that there is some reason your process is not runnable, e.g. it is blocking and waiting for I/O or data.
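One quick way to confirm that from the outside, assuming the third-party psutil package is available (pip install psutil; the PID is of course illustrative):

    import psutil

    p = psutil.Process(12345)      # PID of the munging script
    print(p.status())              # e.g. 'running', 'sleeping', 'disk-sleep'
    print(p.io_counters())         # counters that keep growing suggest the
                                   # process is busy waiting on I/O, not CPU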

Handling Processing-Intensive Event-Actions in Jython

I have some long-term processes and such that must occur at given button-presses or other events in a Jython GUI I am creating.
In such situations, it seems the best option is to make a separate thread in which to run the called method/function when the event happens.
What is the best way to do this? Import threading and have a class that I initialize and run when actionPerformed fires? Use invokeLater? It seems there are a lot of ways to go about this, but which would work best in the Jython-Swing environment and be the "fastest"?
    start = JButton("Analyze", actionPerformed=self.do_analysis)

    def do_analysis(self, event):   # actionPerformed handlers receive the event
        ...
        # large, time-consuming task
        ...
I'm not 100% sure that Jython has the same problem, but in CPython you would run into a problem with the GIL, or Global Interpreter Lock. It means that while your background thread is running, the GUI thread cannot run (even if it is queued to run on another core). You click a button and nothing happens :(
To get round this, I would split the long running process into short steps that can be run on an event, and queue up the event to start the next step as the current step ends. Then the GUI will be able to run between steps if it needs to. The shorter you make the steps, the more responsive the GUI will be - 50ms to 100ms should be OK.
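A minimal Jython sketch of that pattern, assuming `steps` is a list of zero-argument callables that each finish in well under 100 ms:

    from javax.swing import SwingUtilities

    def run_in_steps(steps, i=0):
        if i >= len(steps):
            return
        steps[i]()   # do one short chunk of work
        # Queue the next chunk on the event thread; any pending GUI
        # events get to run in between.
        SwingUtilities.invokeLater(lambda: run_in_steps(steps, i + 1))

    # kick it off from the button's event handler, e.g.:
    # run_in_steps([step1, step2, step3])   # step1..step3 are hypothetical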
This approach has the nice side effect that you don't need to worry about threads, locking, message queueing or anything. You can try and add these to a GUI but the GUI events and the threads can fight, leading to some very strange and hard to debug errors.
As for the "fastest", this is probably the lowest overhead for shorter background tasks. If you create a new process to run the background task (very heavy overhead in Windows), it will run faster because it has its own core, but the start/stop overhead is high.
This is a situation where you will get the best results by remembering that Jython is running on the JVM. Jython has full access to Java classes, so use the Java threading API to set up a separate computation thread. And if the CPU load is high enough that using more cores would help, Java (the jvm) will take care of that by itself.
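For instance, a sketch using the Java API directly from Jython (do_heavy_work and show_result are hypothetical stand-ins for your own code):

    from java.lang import Runnable, Thread
    from javax.swing import SwingUtilities

    class AnalysisTask(Runnable):
        def run(self):
            result = do_heavy_work()   # runs off the Swing event thread
            # Hand the result back to the event thread for any GUI updates.
            SwingUtilities.invokeLater(lambda: show_result(result))

    Thread(AnalysisTask()).start()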
In some circumstances, with long-running processes, people have used jstack -l to get the nids of the running threads, and then used taskset to set the CPU affinity. The JVM nid is in hex and is the PID of the Linux process corresponding to a thread. Other OSes may have similar capabilities.
In general, it is not necessary to do anything other than make your Jython code multithreaded. If you use the Python threading module you don't have access to the full Java threading feature set, but it does use JVM threads under the hood. Just remember to limit access to global variables or you will end up recreating the Global Interpreter Lock. The Queue module can help with this.

Kill sub-threads when Django restarts?

I'm running Django, and I'm creating threads that run in parallel while Django runs. Those threads sometimes run external processes that block while waiting for external input.
When I restart Django, the threads that are blocking while awaiting external input sometimes persist through the restart; moreover, they hold port 8080 open, so Django can't restart.
If I knew when Django was restarting, I could kill those threads. How can I tell when Django is restarting, so that I can kill those threads (and their spawn)?
It wasn't obvious from django.utils.autoreload where any hooks might be to tell when a restart is occurring.
Is there an alternative way to kill these threads when Django starts up?
Thanks for reading.
Brian
It's not easy for a Python process to kill its own threads -- and even harder (nearly impossible) to kill the threads of another process, which I suspect is your case... the "restart" is presumably happening in a different process, so those threads are more or less out of bounds for you!
What I suggest instead is "a stitch in time saves nine": when you create those threads, make sure you set their daemon property to True (see the docs -- it's the setDaemon method in Python <= 2.5). This way, when the main thread finishes, e.g. to restart in another process, so will the entire process (which should take all the daemon threads down, too, automatically!-)
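For example (the target function is an illustrative stand-in for your blocking worker):

    import threading

    def run_external_process():
        pass   # hypothetical: the thread that blocks on external input

    t = threading.Thread(target=run_external_process)
    t.daemon = True        # spelled t.setDaemon(True) on Python <= 2.5
    t.start()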
What are you using to restart Django? I'd put something in that script to look for process IDs in the socket file(s) and kill those before starting Django.
Alternatively, you could be very heavy handed and just run something like 'pkill -9 *django*' before your django startup sequence.
