Long-running CPU-intensive python script sent to sleep by scheduler - python

I have written a data-munging script that is very CPU intensive. It has been running for a few days, but now (thanks to trace messages sent to the console) I can see that it is no longer making progress (in fact, it has not been working for the last 10 hours or so).
When I run top, I notice that the process is either sleeping (S) or in uninterruptible sleep (D). This is wasting a lot of time.
I used sudo renice -10 PID to change the process's nice value, but after running for a short while, the process went back to sleep again.
My question(s):
Is there anything I can do to FORCE the script to run until it finishes (even if it means the machine is unusable until the end of the script)?
Is there a yield command I can use in Python that lets me periodically pass control to other processes/threads, so the scheduler stops trying to put my script to sleep?
I am using Python 2.7.x on Ubuntu 10.04.

The scheduler will only put your process on hold if there is another process ready to run. If no other processes are hogging the CPU, your process will be running most of the time. The scheduler does not put your process to sleep just because it feels like it.
My guess is that there is some reason your process is not runnable, e.g. it is blocking and waiting for I/O or data.
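One way to see what the process is actually waiting on is to ask the kernel directly through /proc. A minimal, Linux-only sketch (the PID value and the describe() helper are illustrative placeholders):

    # Minimal Linux-only sketch: check why a given process is not runnable.
    # PID is a placeholder; substitute the id reported by top.
    PID = 12345

    def describe(pid):
        with open('/proc/%d/stat' % pid) as f:
            data = f.read()
        # The state letter (R, S, D, ...) is the field right after the
        # parenthesised command name.
        state = data[data.rindex(')') + 2]
        # wchan names the kernel function the process is sleeping in, if any.
        with open('/proc/%d/wchan' % pid) as f:
            wchan = f.read().strip() or '-'
        print('state=%s, waiting in: %s' % (state, wchan))

    describe(PID)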

Related

Python process takes a very long time to interrupt

I'm running a particularly CPU-expensive script. I let it run for an hour or so and then interrupt it at some point. When I interrupt it, however, it doesn't actually stop the process. When I check whether the process is still running with ps a, it lists the process with "U+".
I can see that + means "running in the foreground", but I don't know what U stands for.
Is this due to a memory leak or something? What is the explanation for it taking so long to interrupt the process?
Thanks

Profiling a python process when there are multiple processes running on the system

time.clock() measures CPU time. Say there are multiple processes running on the system; in that case it's possible our Python process will be swapped out at various times during execution by the scheduler, so there will be periods when our Python process is in a waiting state.
In that case, can we still measure CPU ticks just for our process, ignoring everything else that is scheduled to run on the system?
Basically, the results should not change depending on what else is running on the CPU; the numbers should be the same as if there were a single core and our process were the only process bound to that core. I want to profile from within the code.
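As a hedged sketch of one way to do this from within the code: os.times() reports CPU seconds charged to the calling process only, so time the scheduler gives to other processes is not counted (do_work() below is a placeholder for the code being profiled):

    import os

    def cpu_seconds():
        # os.times() returns (user, system, children_user, children_system,
        # elapsed); user + system is CPU time charged to this process only.
        t = os.times()
        return t[0] + t[1]

    start = cpu_seconds()
    do_work()  # placeholder: the code being profiled
    print('CPU seconds used: %.3f' % (cpu_seconds() - start))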

How to change process state from sleep to running in linux?

I have a python program which needs to scan some large log files to extract useful information.
In this program, to better utilize the computing resources of the server (which runs Ubuntu 12.04 LTS and has 64 cores and 96 GB of memory), I create a process pool of size 10 and submit several jobs to these pool workers. Each job reads from several large files (about 50 GB each, 20 files in total) using file.readlines(), then analyzes them line by line to find useful information and saves the results in a dictionary. After all files are scanned and analyzed, the result dictionary is written to disk. There is no explicit call to gc.collect() anywhere in the script.
I started this program on the server using the root account, and the processes work fine at first: each process occupies about 3.8 GB of memory, so about 40 GB in total.
After a few hours, another user starts a different memory-consuming program (also under the root account), which aggressively uses almost all the memory (99% of the total). That program is later interrupted with CTRL-Z and killed using killall -9 process_name.
However, after this, I found that the process state of most of my pool workers has changed to S, and the CPU usage of these sleeping processes has dropped to 0. According to man top:
The status of the task which can be one of:
'D' = uninterruptible sleep,
'R' = running,
'S' = sleeping,
'T' = traced or stopped,
'Z' = zombie
I used the ps -axl command to check the name of the kernel function each process is sleeping in, and it turns out these poolworker processes are sleeping on _fastMutex.
This situation has lasted a long time (the process state is still S now), and I don't want to restart my process and scan all the files again. How can I change these processes from state Sleep to Running?
The Sleeping state indicates that they are waiting for something; the way to wake them up is to satisfy whatever condition they are waiting for (the mutex is probably the mechanism of waiting, not the condition itself). The references to memory consumption suggest that some processes may be at least partially paged out, in which case they would be waiting for the swapper; however, that results in uninterruptible sleep (D), not S.
System calls in interruptible sleep can also be interrupted by signals, such as alarm, terminate, stop, or continue. Most signals cause the program to abort, however. The two that are (usually) safe, continue and ignore, don't change program flow, so the process would just go back to sleep on the same condition again.
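For illustration, sending the continue signal from Python looks like this (the PID is a placeholder); as noted, a process in interruptible sleep simply resumes waiting on the same condition:

    import os
    import signal

    PID = 12345  # placeholder: the sleeping worker's process id
    # SIGCONT does not abort the target; a process blocked in interruptible
    # sleep just goes back to waiting on whatever it was waiting for.
    os.kill(PID, signal.SIGCONT)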
Most likely, the reason your processes are in S is that they're genuinely waiting for outside input. Since all we know of your program is that it loads a lot of data, I can't tell you where that happens.
As for how you've described your program: "Each job reads from several large files ... using file.readlines(), and then analyze them line by line". This is highly unlikely to be an efficient way to do it; if you're only scanning line by line in one sweep, it's better to iterate over the file objects in the first place (getting one line at a time). If you're reading lines in a random order, linecache is your friend. Using mmap you could avoid copying the data from the disk buffers. Which fits best depends on the structure of your data and your algorithm.
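A rough sketch of the difference, assuming each file only needs one forward pass (the file name and process() are placeholders):

    # file.readlines() materialises the whole ~50 GB file as a list in memory:
    with open('big.log') as f:
        for line in f.readlines():
            process(line)

    # Iterating over the file object streams it one line at a time instead:
    with open('big.log') as f:
        for line in f:
            process(line)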
By "state of most of my poolworkers have been changed to S" I suspect that the other workers are what's interesting. Perhaps the sleeping ones are just waiting for the ones that are paged out to return.

python handling subprocess

I am running os.system(cmd) in a for-loop. Since it sometimes hangs, I am trying to use process = subprocess.Popen(cmd) in the for-loop instead. But I want to know the following:
If I do sleep(60) and then check whether the process is still running using process.poll(), how do I differentiate between a process that is genuinely still running after 1 minute and a process that has hung?
If I kill a process that has hung, will the for-loop continue or will it exit?
Thanks!
I don't know of any general way to tell whether a process is hung or working. If a process hangs due to a locking issue, it might consume 0% CPU and you might be able to guess that it is hung and not working; but if it hangs in an infinite loop, the process might keep the CPU 100% busy while not accomplishing any useful work. And you might have a process communicating on the network, talking to a really slow host with long timeouts; that would not be hung but would consume 0% CPU while waiting.
I think that, in general, the only hope you have is to set up some sort of "watchdog" system, where your sub-process uses inter-process communication to periodically send a signal that means "I'm still alive".
If you can't modify the program you are running as a sub-process, then at least try to figure out why it hangs, and see if you can then figure out a way to guess that it has hung. Maybe it normally has a balanced mix of CPU and I/O, but when it hangs it goes in a tight infinite loop and the CPU usage goes to 100%; that would be your clue that it is time to kill it and restart. Or, maybe it writes to a log file every 30 seconds, and you can monitor the size of the file and restart it if the file doesn't grow. Or, maybe you can put the program in a "verbose" mode where it prints messages as it works (either to stdout or stderr) and you can watch those. Or, if the program works as a daemon, maybe you can actively query it and see if it is alive; for example, if it is a database, send a simple query and see if it succeeds.
So I can't give you a general answer, but I have some hope that you should be able to figure out a way to detect when your specific program hangs.
Finally, the best possible solution would be to figure out why it hangs, and fix the problem so it doesn't happen anymore. This may not be possible, but at least keep it in mind. You don't need to detect the program hanging if the program never hangs anymore!
P.S. I suggest you do a Google search for "how to monitor a process" and see if you get any useful ideas from that.
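As a rough, hedged sketch of the poll()-after-a-timeout pattern described in the question (the commands and timeout are placeholders, and treating a slow child as hung is only a guess, for the reasons above); note that killing one hung child does not exit the for-loop, it simply moves on to the next command:

    import time
    import subprocess

    commands = [['some_tool', 'input1'], ['some_tool', 'input2']]  # placeholders
    TIMEOUT = 60  # seconds to wait before treating a child as hung

    for cmd in commands:
        proc = subprocess.Popen(cmd)
        deadline = time.time() + TIMEOUT
        while proc.poll() is None and time.time() < deadline:
            time.sleep(1)
        if proc.poll() is None:
            # Still running after the timeout: assume it hung and kill it.
            proc.kill()
            proc.wait()
        # The for-loop continues with the next command either way.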
A common way to detect things that have stopped working is to have them emit a signal at roughly regular intervals and have another process monitor the signal. If the monitor sees that no signal has arrived after, say, twice the interval it can take action such as killing and restarting the process.
This general idea can be used not only for software but also for hardware. I have used it to restart embedded controllers by simply charging a capacitor from an a.c. coupled signal from an output bit. A simple detector monitors the capacitor and if the voltage ever falls below a threshold it just pulls the reset line low and at the same time holds the capacitor charged for long enough for the controller to restart.
The principle for software is similar; one way is for the process to simply touch a file at intervals. The monitor checks the file modification time at intervals and if it is too old kills and restarts the process.
In OP's case the subprocess could write a status code to a file to say how far it has got in its work.
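A minimal sketch of that touch-a-file scheme (the paths, command, and intervals are placeholders): the worker updates the file's mtime as it makes progress, and the monitor kills and restarts it when the file goes stale.

    import os
    import time
    import subprocess

    HEARTBEAT = '/tmp/worker.heartbeat'  # placeholder: file the worker touches
    STALE_AFTER = 120                    # restart if untouched for this long

    # Inside the worker (worker.py), after each unit of work:
    #     open(HEARTBEAT, 'w').close()   # updates the file's mtime

    def start_worker():
        open(HEARTBEAT, 'w').close()     # reset the heartbeat on (re)start
        return subprocess.Popen(['python', 'worker.py'])  # placeholder command

    proc = start_worker()
    while True:
        time.sleep(30)
        age = time.time() - os.path.getmtime(HEARTBEAT)
        if age > STALE_AFTER:
            proc.kill()                  # assume the worker hung
            proc.wait()
            proc = start_worker()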

Worried about threading in python on apache server

I don't need the threads to be aware of each other. They just need to perform a task that shouldn't take more than two or three seconds, tops. What can I do to guarantee that the thread will not be killed before the task is completed? Also, I occasionally need to use a timer thread. The timer is only for a minute, but I'm nervous that that may be too long for Apache.
Why not start these threads in the background? Why do they need to be part of the web server? I would suggest that you write some scripts that either sit idle in the background all the time or are called periodically by a cron job. The Python scripts could look up info in the database, or even use a file to indicate what they need to do, run, then exit.
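A hedged sketch of the cron-driven variant (the job-file path and handle() are placeholders): cron launches the script every few minutes, it picks up whatever work the web app has written to a hand-off file, handles it, and exits.

    #!/usr/bin/env python
    # Run from cron, e.g.:  */5 * * * *  /path/to/background_worker.py
    import os

    JOB_FILE = '/var/tmp/pending_jobs.txt'  # placeholder hand-off file

    def handle(job):
        pass  # placeholder: the two-or-three-second task from the question

    if os.path.exists(JOB_FILE):
        with open(JOB_FILE) as f:
            jobs = [line.strip() for line in f if line.strip()]
        os.remove(JOB_FILE)                 # claim the work before processing
        for job in jobs:
            handle(job)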
