I'm running a particularly CPU-intensive script. I let it run for an hour or so and then interrupt it. When I interrupt it, however, the process doesn't actually stop. When I check whether it is still running with ps a, the process is listed with the state "U+".
I can see that + means "running in the foreground" but don't know what U stands for.
Is this due to a memory leak or something? What is the explanation for it taking so long to interrupt the process?
Thanks
Related
I have a multithreaded program and have run into an interesting phenomenon recently.
If I call print in a thread's worker, the program becomes very reactive. There's no big trick; just calling print resolves everything.
I recently read an article about Python's Global Interpreter Lock (GIL), which said that the GIL is released whenever an I/O-bound operation is executed. Do you think the print call also counts as I/O-bound?
I would really like to make my program reactive, but it is obviously awkward to dump data to stdout while it's running. So I tried to redirect the output to /dev/null, but that didn't resolve the issue:
import contextlib

with contextlib.redirect_stdout(None):
    print('')
I would appreciate any ideas on how to reproduce the effect of the following call without dumping anything:
print('')
As far as I can tell, the GIL is released while the interpreter is executing print(''). Maybe all I need is a short break like that which releases the GIL.
For reference, I have also tried the following call:
print('', end='', flush=True)
Of course, it didn't dump anything, but my program became a bit jerky. It looked as if the thread was hogging the execution time, so the other threads ran very infrequently.
Update
If I call QThread's usleep(1), expecting the thread to sleep for 1 µs, it waits much longer than I specified. For example, the thread worker runs only about every 1 ms, which is very slow because I was expecting it to run on the order of microseconds. Calling print('') makes the thread run on the order of a few microseconds. That is what I mean by reactive.
Update
I feel that something is dragging out the execution time of the thread, but it's not usleep or time.sleep(). However, I've found that print can kick the blocker away, so I would like to know what is actually kicking it away.
So there are two things happening here. First, for the GIL itself, most of the I/O functions will release it just before calling into platform code, so a print call will definitely release it. This will naturally let the runtime schedule another thread.
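As a minimal sketch of getting a similar yield without printing anything: in CPython, time.sleep(0) also releases the GIL around its call into the platform sleep routine. The run_workers helper and its parameters are hypothetical names for illustration (Python 3 assumed), and whether this reproduces the effect of print('') in the asker's program is an assumption, not something verified here.

```python
import threading
import time


def run_workers(iterations=1000, yield_gil=True):
    """Run two counting threads; optionally give up the GIL on every iteration."""
    counts = [0, 0]

    def worker(index):
        for _ in range(iterations):
            counts[index] += 1      # each thread only touches its own slot
            if yield_gil:
                time.sleep(0)       # releases the GIL briefly, prints nothing

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counts


if __name__ == "__main__":
    print(run_workers())
```

Releasing the GIL only gives other threads a chance to run; which thread the interpreter and the OS pick next is not something this sketch controls.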
Second, for usleep, this function is guaranteed to sleep at least as many microseconds as you ask for, but is pretty much not going to sleep less than the duration of the OS scheduler tick. On Linux this is often running at 1,000 Hz, 250 Hz, or 100 Hz, but can vary quite a bit.
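Now, if you want something more granular than that, there's the nanosleep call. On old Linux kernels (before 2.5.39) it would "busy-wait" for delays shorter than 2 ms instead of yielding to the scheduler, but only for threads running under a real-time scheduling policy such as SCHED_FIFO or SCHED_RR. On current kernels that special case is gone, and the achievable resolution depends on the kernel's high-resolution timer support.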
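To see how far the real delay is from the requested one on a given machine, a rough measurement along these lines can help. This is only a sketch: measure_sleep is a hypothetical helper, it uses Python's time.sleep rather than QThread's usleep, and the numbers it prints depend entirely on the OS and its timer configuration.

```python
import time


def measure_sleep(requested_seconds, repeats=20):
    """Return the average wall-clock time one time.sleep(requested_seconds) takes."""
    start = time.perf_counter()
    for _ in range(repeats):
        time.sleep(requested_seconds)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    for requested in (1e-6, 1e-4, 1e-3):
        actual = measure_sleep(requested)
        print(f"requested {requested * 1e6:8.1f} us, measured {actual * 1e6:8.1f} us")
```

If the measured value for a 1 µs request is far above 1 µs, the gap is the scheduling and timer overhead described above rather than anything specific to the thread's own work.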
What exactly is happening when I call time.sleep(5) in a python script? Is the program using a lot of resources from the computer?
I see people using the sleep function in their programs to schedule tasks, but this requires you to leave your hard drive running the whole time, right? Wouldn't that be taxing for your computer over the long haul?
I'm trying to figure out how to run programs at specific times remotely, but I haven't found an explanation of how to do this that is very intuitive.
Any suggestions?
sleep marks the process (thread) as inactive until the given time is up. During this time the kernel simply does not schedule this process (thread), so it uses no CPU time while it waits.
Hard disks typically have spin-down policies based solely on their usage. If they aren't accessed for a specific time, they will spin down. They will spin up as soon as some process (thread) is accessing them again.
This means that letting a process (thread) sleep for some time gives the hard disk a chance to spin down (especially if the sleep duration is large, say, more than some minutes).
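The claim that a sleeping process is not scheduled can be checked by comparing wall-clock time with CPU time around a sleep. A minimal sketch, assuming Python 3 (cpu_used_while_sleeping is a hypothetical helper name):

```python
import time


def cpu_used_while_sleeping(seconds):
    """Return (wall_seconds, cpu_seconds) spent across one time.sleep call."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()   # CPU time consumed by this process
    time.sleep(seconds)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = cpu_used_while_sleeping(2.0)
    print(f"wall clock: {wall:.3f} s, CPU time: {cpu:.6f} s")
```

The wall-clock figure should be close to the requested duration, while the CPU figure should stay near zero.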
I have a script which runs quite a lot of concurrent threads (at least 200). Every thread does some quite complex evaluations, which can take an unpredictable amount of time. The evaluation method is implemented in C and I can't change it. I want to limit the method's execution time for every thread. Please advise.
From what I understand of your problem, it might be a good case for using multiprocessing instead of multithreading. Multiprocessing will allow you to make use of all the available resources on the system - and then some, if you're not careful.
In CPython, threads don't actually run Python code in parallel because of the GIL, so unless you're doing a lot of waiting for I/O or something like that, it would make more sense to call it from a separate process. You could use the Python multiprocessing library to call it from a Python script, or you could use a wrapper written in C and use some form of interprocess communication. The second option will avoid the overhead of launching another Python instance just to run some C code.
You could call time.sleep (or perform other tasks and check the system clock for elapsed time), and then check for results after the desired interval, permitting any processes that haven't finished to continue running while you make use of the results. Or, if you don't care at that point, you can send a signal to kill the process.
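The "start the work in separate processes, wait, then collect or kill" idea can be sketched as follows. Note that this sketch swaps in subprocess.Popen instead of the multiprocessing library, and assumes each evaluation can be launched as its own command; run_batch_with_deadline and its return convention are hypothetical names chosen for illustration.

```python
import subprocess
import sys
import time


def run_batch_with_deadline(commands, deadline_seconds):
    """Start every command, wait until the deadline, then kill the stragglers.

    Returns one exit code per command; None marks a command that was killed.
    """
    children = [subprocess.Popen(cmd) for cmd in commands]
    stop_at = time.monotonic() + deadline_seconds
    # Poll in short steps so we can return early once everything has finished.
    while time.monotonic() < stop_at and any(c.poll() is None for c in children):
        time.sleep(0.05)

    exit_codes = []
    for child in children:
        if child.poll() is None:      # still running after the deadline
            child.kill()              # forcibly stop it
            child.wait()              # reap it so no zombie is left behind
            exit_codes.append(None)
        else:
            exit_codes.append(child.returncode)
    return exit_codes


if __name__ == "__main__":
    fast = [sys.executable, "-c", "pass"]
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_batch_with_deadline([fast, slow], 3.0))
```

Killing a whole process is safe in a way that interrupting a thread stuck inside C code is not, which is the main reason for moving the evaluation out of the threads.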
I am running os.system(cmd) in a for-loop. Since it sometimes hangs, I am trying to use process = subprocess.Popen(cmd) in the for-loop instead. But I want to know the following:
If I do sleep(60) and then check whether the process is still running by using process.poll(), how do I differentiate between a process that is genuinely still working after 1 minute and a process that has hung?
If I kill the process which hung, will the for-loop still continue or will it exit?
Thanks!
I don't know of any general way to tell whether a process is hung or working. If a process hangs due to a locking issue, then it might consume 0% CPU and you might be able to guess that it is hung and not working; but if it hangs with an infinite loop, the process might make the CPU 100% busy but not accomplish any useful work. And you might have a process communicating on the network, talking to a really slow host with long timeouts; that would not be hung but would consume 0% CPU while waiting.
I think that, in general, the only hope you have is to set up some sort of "watchdog" system, where your sub-process uses inter-process communication to periodically send a signal that means "I'm still alive".
If you can't modify the program you are running as a sub-process, then at least try to figure out why it hangs, and see if you can then figure out a way to guess that it has hung. Maybe it normally has a balanced mix of CPU and I/O, but when it hangs it goes in a tight infinite loop and the CPU usage goes to 100%; that would be your clue that it is time to kill it and restart. Or, maybe it writes to a log file every 30 seconds, and you can monitor the size of the file and restart it if the file doesn't grow. Or, maybe you can put the program in a "verbose" mode where it prints messages as it works (either to stdout or stderr) and you can watch those. Or, if the program works as a daemon, maybe you can actively query it and see if it is alive; for example, if it is a database, send a simple query and see if it succeeds.
So I can't give you a general answer, but I have some hope that you should be able to figure out a way to detect when your specific program hangs.
Finally, the best possible solution would be to figure out why it hangs, and fix the problem so it doesn't happen anymore. This may not be possible, but at least keep it in mind. You don't need to detect the program hanging if the program never hangs anymore!
P.S. I suggest you do a Google search for "how to monitor a process" and see if you get any useful ideas from that.
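On the second part of the question: killing a child process does not end the parent's for-loop. A minimal sketch of the loop, assuming Python 3 (run_each_with_timeout and the result tuples are hypothetical names); note that it treats "still running after the timeout" as hung, which, as explained above, is only a guess:

```python
import subprocess
import sys


def run_each_with_timeout(commands, timeout_seconds):
    """Run commands one after another, killing any that exceed the timeout."""
    results = []
    for cmd in commands:
        process = subprocess.Popen(cmd)
        try:
            returncode = process.wait(timeout=timeout_seconds)
            results.append(("finished", returncode))
        except subprocess.TimeoutExpired:
            process.kill()    # stop the child; the loop itself is unaffected
            process.wait()    # collect its exit status
            results.append(("killed", None))
    return results


if __name__ == "__main__":
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    fast = [sys.executable, "-c", "pass"]
    print(run_each_with_timeout([slow, fast], 2.0))
```

In the demo the slow command is killed first and the fast one still runs afterwards, so the loop continues past a killed child.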
A common way to detect things that have stopped working is to have them emit a signal at roughly regular intervals and have another process monitor the signal. If the monitor sees that no signal has arrived after, say, twice the interval it can take action such as killing and restarting the process.
This general idea can be used not only for software but also for hardware. I have used it to restart embedded controllers by simply charging a capacitor from an a.c. coupled signal from an output bit. A simple detector monitors the capacitor and if the voltage ever falls below a threshold it just pulls the reset line low and at the same time holds the capacitor charged for long enough for the controller to restart.
The principle for software is similar; one way is for the process to simply touch a file at intervals. The monitor checks the file modification time at intervals and if it is too old kills and restarts the process.
In OP's case the subprocess could write a status code to a file to say how far it has got in its work.
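The touch-a-file approach might look like this. It is a sketch under the assumption that the subprocess can be modified to call a function at intervals; touch_heartbeat, heartbeat_is_stale, and the file path are hypothetical names for illustration.

```python
import os
import time


def touch_heartbeat(path):
    """Worker side: create the file if needed and update its modification time."""
    with open(path, "a"):
        os.utime(path, None)


def heartbeat_is_stale(path, max_age_seconds):
    """Monitor side: True if the file is missing or older than max_age_seconds."""
    try:
        age = time.time() - os.path.getmtime(path)
    except OSError:
        return True          # no heartbeat file at all counts as stale
    return age > max_age_seconds
```

The worker would call touch_heartbeat at its regular interval, and the monitor would call heartbeat_is_stale with, say, twice that interval before deciding to kill and restart the process.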
I have written a data munging script that is very CPU intensive. It has been running for a few days now, but (thanks to trace messages sent to the console) I can see that it is not working; in fact, it has not been working for the last 10 hours or so.
When I run top, I notice that the process is either sleeping (S) or in uninterruptible sleep (D). This is wasting a lot of time.
I used sudo renice -10 PID to change the process's nice value, and after running for a short while, I notice that the process has gone back to sleep again.
My question(s):
Is there anything I can do to FORCE the script to run until it finishes (even if it means the machine is unusable until the end of the script)?
Is there a yield command I can use in Python which allows me to periodically pass control to other processes/threads, to stop the scheduler from trying to put my script to sleep?
I am using python 2.7.x on Ubuntu 10.0.4
The scheduler will only put your process on hold if there is another process ready to run. If you have no other processes which hog up the CPU, your process will be running most of the time. The scheduler does not put your process to sleep just because it feels like it.
My guess is that there is some reason your process is not runnable, e.g. it is blocking and waiting for I/O or data.
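To check that guess from outside the script, the process state can be read from /proc on Linux, where the third field of /proc/<pid>/stat is the one-letter state (R running, S sleeping, D uninterruptible sleep). A minimal sketch, assuming Linux and Python 3; parse_proc_state and read_proc_state are hypothetical helper names:

```python
def parse_proc_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' before reading the state.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def read_proc_state(pid):
    """Linux only: read the current state of the process with this PID."""
    with open(f"/proc/{pid}/stat") as handle:
        return parse_proc_state(handle.read())
```

A process that shows D most of the time is waiting on I/O, which renice cannot speed up because the process is not competing for the CPU in the first place.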