On a Windows 7 platform I'm using Python 3.6 as a framework to start worker processes (written in C).
The processes are started with subprocess.Popen. The following shows the relevant code (one thread per process to be started).
import os
import subprocess

redirstream = open(redirfilename, "w")
proc = subprocess.Popen(batchargs, shell=False, stdout=redirstream)
# communicate() is used so a hanging executable can be terminated after 60 seconds
outs, errs = proc.communicate(timeout=60)
# wait for the job to finish
ret = proc.wait()
...
if ret == 0:
    redirstream.flush()
    redirstream.close()
    os.remove(redirfilename)
communicate() is only used so that the executable can be terminated after 60 seconds in case it hangs. redirstream is used to write output from the executable (written in C) to a file for general debugging purposes (not related to this issue). Of course, all processes are passed redirfiles with different filenames.
Up to ten such subprocesses are started in that way from independent Python threads.
Although it works, I made a mysterious observation:
When an executable has finished without errors, I want to delete redirfilename, because it is no longer needed.
Now let's say I have started processes A, B and C.
Processes A and B have finished and returned 0.
Process C, however, intentionally gets no data (just for testing, a serial connection has been disconnected) and waits for input from a named pipe (created from Python) using the Windows ReadFile function:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365467(v=vs.85).aspx
In that case, while C is still waiting for ReadFile to return, os.remove(redirfilename) for A and B sometimes throws a PermissionError saying that the file is still in use by another process. But from Task Manager I can see that processes A and B no longer exist (as expected).
I tried to catch the PermissionError and retry the delete after some delay. Only after C has terminated (timeout after 60 seconds) can the redirfile for A or B be deleted.
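A minimal sketch of that retry loop (the retry count and delay are arbitrary choices for illustration, not part of the original code):

import os
import time

def remove_with_retry(path, retries=10, delay=1.0):
    # Try to delete a file, retrying while Windows still reports it as in use.
    for _ in range(retries):
        try:
            os.remove(path)
            return True
        except PermissionError:
            time.sleep(delay)
    return False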
Why is the redirstream still blocked and somehow in use, although the process behind it is no longer alive, and why is it blocked by ReadFile() in a completely unrelated process that has nothing to do with that particular file? Is this an issue in Python or in my implementation?
Any hints are highly appreciated...
Related
I have a problem with using Python to run Windows executables in parallel.
I will explain my problem in more detail.
I was able to write some code that creates a number of threads equal to the number of cores. Each thread executes the following function, which starts the executable using subprocess.Popen().
The executables are unit tests for an application. The tests use the gtest library. As far as I know they just read and write on the file system.
def _execute(self, test_file_path) -> None:
    test_path = self._get_test_path_without_extension(test_file_path)
    process = subprocess.Popen(test_path,
                               shell=False,
                               stdout=sys.stdout,
                               stderr=sys.stderr,
                               universal_newlines=True)
    try:
        process.communicate(timeout=TEST_TIMEOUT_IN_SECONDS)
        if process.returncode != 0:
            print(f'Test fail')
    except subprocess.TimeoutExpired:
        process.kill()
During execution some of the processes hang and never end. I set a timeout as a workaround, but I am wondering why some of these applications never terminate; this blocks the execution of the Python code.
The following code shows the creation of the threads. The function _execute_tests just takes a test from the Queue (with the .get() function) and passes it to the function _execute(test_file_path).
### Piece of code used to spawn the threads
for i in range(int(core_num)):
    thread = threading.Thread(target=self._execute_tests,
                              args=(tests,),
                              daemon=True)
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()
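For context, here is a minimal sketch of what the _execute_tests worker described above might look like (the queue.Empty handling is an assumption; the rest follows the description):

import queue

def _execute_tests(self, tests) -> None:
    # Pull test paths from the shared queue until it is empty.
    while True:
        try:
            test_file_path = tests.get(block=False)
        except queue.Empty:
            break
        self._execute(test_file_path)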
I have already tried to:
use subprocess.run, subprocess.call and the other functions explained on the documentation page
use a larger buffer via the bufsize parameter
disable the buffer
move the stdout to a file per thread
move the stdout to subprocess.DEVNULL
remove the use of subprocess.communicate()
remove the use of threading
use multiprocessing
On my local machine with 16 cores / 64 GB RAM I can run 16 threads without problems; all of them always terminate. To reproduce the problem I need to increase the number of threads to 30-40.
On an Azure machine with 8 cores / 32 GB RAM the issue can be reproduced with just 8 threads in parallel.
If I run the executables from a .bat file
for /r "." %%a in (*.exe) do start /B "" "%%~fa"
the problem never happens.
Does anyone have an idea of what the problem could be?
I have a script that prints colored output if it is on a tty. A bunch of them execute in parallel, so I can't attach their stdout to a tty. I don't have control over the script code either (to force coloring), so I want to fake it via a pty. My code:
import os
import pty
import subprocess

invocation = get_invocation()
master, slave = pty.openpty()
subprocess.call(invocation, stdout=slave)
print string_from_fd(master)
And I can't figure out what should be in string_from_fd. For now, I have something like
def string_from_fd(fd):
    return os.read(fd, 1000)
It works, but that number 1000 looks strange. The output can be quite large, and no fixed number there may be sufficient. I tried a lot of solutions from Stack Overflow, but none of them works (they print nothing or hang forever).
I am not very familiar with file descriptors and all that, so any clarification if I'm doing something wrong would be much appreciated.
Thanks!
This won't work for long outputs: subprocess.call will block once the PTY's buffer is full. That's why subprocess.communicate exists, but that won't work with a PTY.
The standard/easiest solution is to use the external module pexpect, which uses PTYs internally. For example,
pexpect.spawn("/bin/ls --color=auto").read()
will give you the ls output with color codes.
If you'd like to stick to subprocess, then you must use subprocess.Popen for the reason stated above. You are right in your assumption that by passing 1000 you read at most 1000 bytes, so you'll have to use a loop. os.read blocks if there is nothing to read and waits for data to appear.

The catch is how to recognize when the process has terminated: in that case you know that no more data will arrive, and the next call to os.read would block forever. Luckily, the operating system helps you detect this situation: if all file descriptors to the pseudo terminal that could be used for writing are closed, then os.read will either return an empty string or return an error, depending on the OS. You can check for this condition and exit the loop when it happens.

The final piece to understanding the following code is how open file descriptors and subprocess go together: subprocess.Popen internally calls fork(), which duplicates the current process including all open file descriptors, and then within one of the two execution paths calls exec(), which replaces the current process with a new one. In the other execution path, control returns to your Python script. So after calling subprocess.Popen there are two valid file descriptors for the slave end of the PTY: one belongs to the spawned process, one to your Python script. If you close yours, then the only file descriptor that could be used to send data to the master end belongs to the spawned process. Upon its termination, it is closed, and the PTY enters the state where calls to read on the master end fail.
Here's the code:
import os
import pty
import subprocess

master, slave = pty.openpty()
process = subprocess.Popen("/bin/ls --color", shell=True, stdout=slave,
                           stdin=slave, stderr=slave, close_fds=True)
os.close(slave)

output = []
while True:
    try:
        data = os.read(master, 1024)
    except OSError:
        break
    if not data:
        break
    output.append(data)  # In Python 3, append ".decode()" to os.read()
output = "".join(output)
I am using the multiprocessing module in Python to launch a few processes in parallel. These processes are independent of each other. They generate their own output and write the results to different files. Each process calls an external tool using the subprocess.call method.
It was working fine until I discovered an issue in the external tool where, due to some error condition, it goes into a 'prompt' mode and waits for user input. In my Python script I use the join method to wait until all the processes finish their tasks. This causes the whole thing to wait for the erroneous subprocess call. I could put a timeout on each process, but I do not know in advance how long each one is going to run, so that option is ruled out.
How do I figure out if any child process is waiting for user input, and how do I send an 'exit' command to it? Any pointers or suggestions about relevant Python modules would be really appreciated.
My code here:
import subprocess
import sys
import os
import multiprocessing

def write_script(fname, e):
    f = open(fname, 'w')
    f.write("Some useful command calling external tool")
    f.close()
    subprocess.call(['chmod', '+x', os.path.abspath(fname)])
    return os.path.abspath(fname)

def run_use(mname, script):
    print "ssh "+mname+" "+script
    subprocess.call(['ssh', mname, script])

if __name__ == '__main__':
    dict1 = {}
    dict1['mod1'] = ['pp1', 'ext2', 'les3', 'pw4']
    dict1['mod2'] = ['aaa', 'bbb', 'ccc', 'ddd']
    machines = ['machine1', 'machine2', 'machine3', 'machine4']
    log_file.write(str(dict1.keys()))
    for key in dict1.keys():
        arr = []
        for mod in dict1[key]:
            d = {}
            arr.append(mod)
            if ((mod == dict1[key][-1]) | (len(arr) % 4 == 0)):
                for i in range(0, len(arr)):
                    e = arr.pop()
                    script = write_script(e + "_temp.sh", e)
                    d[i] = multiprocessing.Process(target=run_use, args=(machines[i], script,))
                    d[i].daemon = True
                for pp in d:
                    d[pp].start()
                for pp in d:
                    d[pp].join()
Since you're writing a shell script to run your subcommands, can you simply tell them to read input from /dev/null?
#!/bin/bash
# ...
my_other_command -a -b arg1 arg2 < /dev/null
# ...
This may stop them blocking on input and is a really simple solution. If this doesn't work for you, read on for some other options.
The subprocess.call() function is simply shorthand for constructing a subprocess.Popen instance and then calling the wait() method on it. So, your spawned processes could instead create their own subprocess.Popen instances and poll them with the poll() method (in a loop with a suitable delay) instead of calling wait(). This leaves them free to remain in communication with the main process, so you can, for example, allow the main process to tell the child process to terminate the Popen instance with the terminate() or kill() methods and then exit itself.
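A rough sketch of that polling approach, assuming a fixed polling interval (the function name is made up for illustration):

import subprocess
import time

def run_use_polling(mname, script, poll_interval=1.0):
    proc = subprocess.Popen(['ssh', mname, script])
    while proc.poll() is None:  # None means the child is still running
        # The worker stays responsive here; it could, for example, check a
        # flag set by the main process and call proc.terminate().
        time.sleep(poll_interval)
    return proc.returncode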
So, the question is how does the child process tell whether the subprocess is awaiting user input, and that's a trickier question. I would say perhaps the easiest approach is to monitor the output of the subprocess and search for the user input prompt, assuming that it always uses some string that you can look for. Alternatively, if the subprocess is expected to generate output continually then you could simply look for any output and if a configured amount of time goes past without any output then you declare that process dead and terminate it as detailed above.
Since you're reading the output, actually you don't need poll() or wait() - the process closing its output file descriptor is good enough to know that it's terminated in this case.
Here's an example of a modified run_use() method which watches the output of the subprocess:
def run_use(mname, script):
    print "ssh "+mname+" "+script
    proc = subprocess.Popen(['ssh', mname, script], stdout=subprocess.PIPE)
    for line in proc.stdout:
        if "UserPrompt>>>" in line:
            proc.terminate()
            break
In this example we assume that the process either gets hung up on UserPrompt>>> (replace with the appropriate string) or terminates naturally. If it were to get stuck in an infinite loop, for example, then your script would still not terminate - you can only really address that with an overall timeout, but you didn't seem keen to do that. Hopefully your subprocess won't misbehave in that way, however.
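If you prefer the idle-timeout variant mentioned earlier, a rough sketch (Unix-only, since it uses select on a pipe; the timeout value is an arbitrary example) could be:

import os
import select
import subprocess

def run_use_idle_timeout(mname, script, idle_timeout=300):
    proc = subprocess.Popen(['ssh', mname, script], stdout=subprocess.PIPE)
    fd = proc.stdout.fileno()
    while True:
        # Wait up to idle_timeout seconds for the child to produce output.
        ready, _, _ = select.select([fd], [], [], idle_timeout)
        if not ready:
            proc.terminate()  # silent for too long: assume it is stuck at a prompt
            break
        if not os.read(fd, 4096):  # empty read means EOF: the child exited
            break
    proc.wait()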
Finally, if you don't know in advance the prompt that your process will give, then your job is rather harder. Effectively what you're asking to do is monitor an external process and know when it's blocked reading on a file descriptor, and I don't believe there's a particularly clean solution to this. You could consider running a process under strace or similar, but that's quite an awful hack and I really wouldn't recommend it. Things like strace are great for manual diagnostics, but they really shouldn't be part of a production setup.
I have a simple problem to solve (more or less).
If I look at Python multiprocessing tutorials, I see that a process should be started more or less like this:
from multiprocessing import *

def u(m):
    print(m)
    return

A = Process(target=u, args=(0,))
A.start()
A.join()
It should print a 0 but nothing gets printed. Instead it hangs forever at the A.join().
If I manually start the function u by doing this
A.run()
it actually prints 0 in the shell, but then it doesn't run concurrently.
For example, the output of the following code:
from multiprocessing import *
from time import sleep

def u(m):
    sleep(1)
    print(m)
    return

A = Process(target=u, args=(1,))
A.start()
print(0)
should be
0
1
but actually is
0
and if I add, before the last line,
A.run()
then the output becomes
1
0
This seems confusing to me...
And if I try to join the process, it waits forever.
However, if it helps in giving me an answer: my OS is Mac OS X 10.6.8, the Python versions used are 3.1 and 3.3, and my computer has one Intel Core i3 processor.
--Update--
I have noticed that this strange behaviour is present only when launching the program from IDLE. If I run the program from the terminal everything works as it is supposed to, so this problem must be connected to some IDLE bug.
But running programs from the terminal is even weirder: using something like range(100000000) uses all my computer's RAM until the end of the program; if I remember correctly this shouldn't happen in Python 3, only in older Python versions.
I hope this new information helps you give an answer.
--Update 2--
The bug occurs even if my process produces no output, because setting this:
def u():
    return
as the target of the process and then starting it, if I try to join the process, IDLE waits forever.
As suggested here and here, the problem is that IDLE overrides sys.stdin and sys.stdout in some weird ways, which do not propagate cleanly to processes you spawn from it (they are not real filehandles).
The first link also indicates it's unlikely to be fixed any time soon ("may be a 'cannot fix' issue", they say).
So unfortunately the only solution I can suggest is not to use IDLE for this script...
Have you tried adding A.join() to your program? I am guessing that your main process is exiting before the child process prints, which is causing the output to be hidden. If you tell the main process to wait for the child process (A.join()), I bet you'll see the output you expect.
Given that it only happens with IDLE, I suspect the problem has to do with the stdout used by both processes. Perhaps it's some file-like object that's not safe to use from two different processes.
If you don't have the child process write to stdout, I suspect it will complete and join properly. For example, you could have it write to a file instead, or you could set up a pipe between the parent and child.
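For example, a minimal sketch of the pipe approach (just to show the idea, not tested under IDLE):

from multiprocessing import Process, Pipe

def u(conn, m):
    # Send the result back over the pipe instead of printing to stdout.
    conn.send(m)
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    A = Process(target=u, args=(child_conn, 1))
    A.start()
    print(parent_conn.recv())  # prints 1 in the parent process
    A.join()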
Have you tried unbuffered output? Try importing the sys module and changing the print call to write to stderr:
print(m, file=sys.stderr)
How does this affect the behavior? I'm with the others who suspect that IDLE is mucking with stdio . . .
Note: This question has been re-asked with a summary of all debugging attempts here.
I have a Python script that is running as a background process executing every 60 seconds. Part of that is a call to subprocess.Popen to get the output of ps.
ps = subprocess.Popen(['ps', 'aux'], stdout=subprocess.PIPE).communicate()[0]
After running for a few days, the call is erroring with:
File "/home/admin/sd-agent/checks.py", line 436, in getProcesses
File "/usr/lib/python2.4/subprocess.py", line 533, in __init__
File "/usr/lib/python2.4/subprocess.py", line 835, in _get_handles
OSError: [Errno 12] Cannot allocate memory
However the output of free on the server is:
$ free -m
total used free shared buffers cached
Mem: 894 345 549 0 0 0
-/+ buffers/cache: 345 549
Swap: 0 0 0
I have searched around for the problem and found this article which says:
Solution is to add more swap space to your server. When the kernel is forking to start the modeler or discovery process, it first ensures there's enough space available on the swap to store the new process if needed.
I note that there is no available swap from the free output above. Is this likely to be the problem and/or what other solutions might there be?
Update 13th Aug 09: The code above is called every 60 seconds as part of a series of monitoring functions. The process is daemonized and the check is scheduled using sched. The specific code for the above function is:
def getProcesses(self):
    self.checksLogger.debug('getProcesses: start')

    # Memory logging (case 27152)
    if self.agentConfig['debugMode'] and sys.platform == 'linux2':
        mem = subprocess.Popen(['free', '-m'], stdout=subprocess.PIPE).communicate()[0]
        self.checksLogger.debug('getProcesses: memory before Popen - ' + str(mem))

    # Get output from ps
    try:
        self.checksLogger.debug('getProcesses: attempting Popen')
        ps = subprocess.Popen(['ps', 'aux'], stdout=subprocess.PIPE).communicate()[0]
    except Exception, e:
        import traceback
        self.checksLogger.error('getProcesses: exception = ' + traceback.format_exc())
        return False

    self.checksLogger.debug('getProcesses: Popen success, parsing')

    # Memory logging (case 27152)
    if self.agentConfig['debugMode'] and sys.platform == 'linux2':
        mem = subprocess.Popen(['free', '-m'], stdout=subprocess.PIPE).communicate()[0]
        self.checksLogger.debug('getProcesses: memory after Popen - ' + str(mem))

    # Split out each process
    processLines = ps.split('\n')
    del processLines[0]  # Removes the headers
    processLines.pop()   # Removes a trailing empty line

    processes = []
    self.checksLogger.debug('getProcesses: Popen success, parsing, looping')

    for line in processLines:
        line = line.split(None, 10)
        processes.append(line)

    self.checksLogger.debug('getProcesses: completed, returning')
    return processes
This is part of a bigger class called checks which is initialised once when the daemon is started.
The entire checks class can be found at http://github.com/dmytton/sd-agent/blob/82f5ff9203e54d2adeee8cfed704d09e3f00e8eb/checks.py with the getProcesses function defined from line 442. This is called by doChecks() starting at line 520.
You've perhaps got a memory leak bounded by some resource limit (RLIMIT_DATA, RLIMIT_AS?) inherited by your Python script. Check your ulimit(1) settings before you run your script, and profile the script's memory usage, as others have suggested.
What do you do with the variable ps after the code snippet you show us? Do you keep a reference to it, never to be freed? Quoting the subprocess module docs:
Note: The data read is buffered in memory, so do not use this
method if the data size is large or unlimited.
... and ps aux can be verbose on a busy system...
Update
You can check rlimits from within your Python script using the resource module:
import resource
print resource.getrlimit(resource.RLIMIT_DATA) # => (soft_lim, hard_lim)
print resource.getrlimit(resource.RLIMIT_AS)
If these return "unlimited" -- (-1, -1) -- then my hypothesis is incorrect and you may move on!
See also resource.getrusage, esp. the ru_??rss fields, which can help you to instrument for memory consumption from within the Python script, without shelling out to an external program.
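For example, a quick way to log resident memory from inside the script (the exact field to watch is up to you):

import resource

usage = resource.getrusage(resource.RUSAGE_SELF)
print usage.ru_maxrss  # peak resident set size, in kilobytes on Linux (may be 0 on older kernels)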
When you use Popen, you need to pass close_fds=True if you want it to close extra file descriptors.
Creating a new pipe, which occurs in the _get_handles function from the backtrace, creates two file descriptors, but your current code never closes them, and you're eventually hitting your system's max fd limit.
I'm not sure why the error you're getting indicates an out-of-memory condition; it should be a file descriptor error, as the return value of pipe() has an error code for this problem.
That swap space answer is bogus. Historically Unix systems wanted swap space available like that, but they don't work that way anymore (and Linux never worked that way). You're not even close to running out of memory, so that's not likely the actual problem - you're running out of some other limited resource.
Given where the error is occurring (_get_handles calls os.pipe() to create pipes to the child), the only real problem you could be running into is not enough free file descriptors. I would instead look for unclosed files (lsof -p on the PID of the process doing the Popen). If your program really needs to keep a lot of files open at one time, then increase the user limit and/or the system limit for open file descriptors.
If you're running a background process, chances are that you've redirected your process's stdin/stdout/stderr.
In that case, append the option "close_fds=True" to your Popen call, which will prevent the child process from inheriting your redirected output. This may be the limit you're bumping into.
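Applied to the call from the question, that would look like:

ps = subprocess.Popen(['ps', 'aux'], stdout=subprocess.PIPE, close_fds=True).communicate()[0]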
You might want to actually wait for all of those PS processes to finish before adding swap space.
It's not at all clear what "running as a background process executing every 60 seconds" means.
But your call to subprocess.Popen is forking a new process each time.
Update.
I'd guess that you're somehow leaving all those processes running or hung in a zombie state. However, the communicate method should clean up the spawned subprocesses.
Have you watched your process over time?
lsof
ps -aux | grep -i pname
top
All should give interesting information. I am thinking that the process is tying up resources that should be freed up. Is there a chance that it is tying up resource handles (memory blocks, streams, file handles, thread or process handles)? Perhaps the stdin, stdout and stderr handles from the spawned "ps", or memory handles from many small incremental allocations. I would be very interested in seeing what the above commands display for your process when it has just launched and run for the first time, and again after 24 hours of "sitting" there launching the sub-process regularly.
Since it dies after a few days, you could have it run for only a few loops, and then restart it once a day as a workaround. That would help you in the meantime.
Jacob
You need to
ps = subprocess.Popen(["sleep", "1000"])
os.waitpid(ps.pid, 0)
to free resources.
Note: this does not work on Windows.
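If you need something that also behaves on Windows, calling wait() on the Popen object gives the same cleanup (the "sleep" command itself is still Unix-only; it is just taken from the snippet above):

import subprocess

ps = subprocess.Popen(["sleep", "1000"])
ps.wait()  # reaps the child process and releases its resources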
I don't think that the circumstances given in the Zenoss article you linked to are the only cause of this message, so it's not clear yet that swap space is definitely the problem. I would advise logging some more information, even around successful calls, so that you can see the state of free memory every time just before you do the ps call.
One more thing - if you specify shell=True in the Popen call, do you see different behaviour?
Update: If not memory, the next possible culprit is indeed file handles. I would advise running the failing command under strace to see exactly which system calls are failing.
Virtual Memory matters!!!
I encountered the same issue before I added swap to my OS. The formula for virtual memory is usually something like SwapSize + 50% * PhysicalMemorySize. I finally got this resolved by adding more physical memory or by adding a swap disk; close_fds did not work in my case.