I want to get screenshots of a webpage in Python. For this I am using http://github.com/AdamN/python-webkit2png/ .
newArgs = ["xvfb-run", "--server-args=-screen 0, 640x480x24", sys.argv[0]]
for i in range(1, len(sys.argv)):
if sys.argv[i] not in ["-x", "--xvfb"]:
newArgs.append(sys.argv[i])
logging.debug("Executing %s" % " ".join(newArgs))
os.execvp(newArgs[0], newArgs)
This basically calls xvfb-run with the correct arguments. But man xvfb-run says:
Note that the demo X clients used in the above examples will not exit on their own, so they will have to be killed before xvfb-run will exit.
So that means that this script will hang if this whole thing is run in a loop (to get multiple screenshots), unless the X server is killed. How can I do that?
The documentation for os.execvp states:
These functions all execute a new program, replacing the current process; they do not return. [..]
So after calling os.execvp no other statement in the program will be executed. You may want to use subprocess.Popen instead:
The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes. This module intends to replace several other, older modules and functions, such as: os.system, os.spawn*, os.popen*, popen2.*, commands.*.
Using subprocess.Popen, the code to run xlogo in the virtual framebuffer X server becomes:
import subprocess
xvfb_args = ['xvfb-run', '--server-args=-screen 0, 640x480x24', 'xlogo']
process = subprocess.Popen(xvfb_args)
Now the problem is that xvfb-run launches Xvfb in a background process. Calling process.kill() will not kill Xvfb (at least not on my machine...). I have been fiddling around with this a bit, and so far the only thing that works for me is:
import os
import signal
import subprocess

SERVER_NUM = 99  # 99 is the default used by xvfb-run; you can leave this out.

xvfb_args = ['xvfb-run', '--server-num=%d' % SERVER_NUM,
             '--server-args=-screen 0, 640x480x24', 'xlogo']
subprocess.Popen(xvfb_args)

# ... do whatever you want to do here ...

pid = int(open('/tmp/.X%s-lock' % SERVER_NUM).read().strip())
os.kill(pid, signal.SIGINT)
So this code reads the process ID of Xvfb from /tmp/.X99-lock and sends the process an interrupt. It works, but does yield an error message every now and then (I suppose you can ignore it, though). Hopefully somebody else can provide a more elegant solution. Cheers.
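For what it's worth, a possibly more elegant route (my own sketch, not part of the answer above) is to skip xvfb-run entirely and start Xvfb yourself, so the Popen handle points at the X server and terminate() works directly:

import os
import subprocess
import time

# Start Xvfb directly so we own its process handle (display :99 chosen arbitrarily)
xvfb = subprocess.Popen(['Xvfb', ':99', '-screen', '0', '640x480x24'])
time.sleep(1)  # crude wait for the server to come up

# Run the client against the virtual display
client = subprocess.Popen(['xlogo'], env=dict(os.environ, DISPLAY=':99'))
# ... take your screenshots here ...

client.kill()
xvfb.terminate()  # kills the X server we started; no lock-file digging needed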
Related
I have a Python script that runs all day long, checking the time every 60 seconds so it can start/end tasks (other Python scripts) at specific times of the day.
The script runs mostly fine: tasks start at the right time and are opened in a new cmd window, so the main script can keep running and sampling the time. The only problem is that it just won't kill the tasks.
import os
import time
import signal
import subprocess
import ctypes

freq = 60  # sampling frequency in seconds

while True:
    print 'Sampling time...'
    now = int(time.time())

    # initialize the task... let's say 8:30am
    if time.strftime("%H:%M", time.localtime(now)) == '08:30':
        # The following is used so python opens another cmd window and keeps the original script running and sampling the time
        pro = subprocess.Popen(["start", "cmd", "/k", "python python-task.py"], shell=True)

    # kill process attempts... let's say 11:40am
    if time.strftime("%H:%M", time.localtime(now)) == '11:40':
        pro.kill()  # not working - nothing happens
        pro.terminate()  # not working - nothing happens
        os.kill(pro.pid, signal.SIGINT)  # not working - windows error 5 access denied
        # Kill the process using ctypes - not working - nothing happens
        ctypes.windll.kernel32.TerminateProcess(int(pro._handle), -1)
        # Kill the process using windows taskkill - nothing happens
        os.popen('TASKKILL /PID ' + str(pro.pid) + ' /F')

    time.sleep(freq)
Important Note: the task script python-task.py will run indefinitely. That's exactly why I need to be able to "force" kill it at a certain time while it still running.
Any clues? What am I doing wrong? How to kill it?
You're killing the shell that spawns your sub-process, not your sub-process.
Edit: From the documentation:
The only time you need to specify shell=True on Windows is when the command you wish to execute is built into the shell (e.g. dir or copy). You do not need shell=True to run a batch file or console-based executable.
Warning
Passing shell=True can be a security hazard if combined with untrusted input. See the warning under Frequently Used Arguments for details.
So, instead of passing a single string, pass each argument separately in the list, and eschew using the shell. You probably want to use the same executable for the child as for the parent, so it's usually something like:
pro = subprocess.Popen([sys.executable, "python-task.py"])
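With the shell out of the picture, the Popen handle refers to your task itself, so the normal termination calls work. A rough sketch of the scheduler loop with that fix (reusing the times and the python-task.py name from the question):

import sys
import time
import subprocess

pro = None
while True:
    now = time.strftime("%H:%M")
    if now == '08:30' and pro is None:
        # Launch the task directly: no shell, no intermediate cmd window
        pro = subprocess.Popen([sys.executable, "python-task.py"])
    if now == '11:40' and pro is not None:
        pro.kill()  # now kills the task itself, not a wrapper shell
        pro = None
    time.sleep(60)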
I have a .jar file that I'm running with arguments via Popen. This server takes about 4 seconds to start up and then dumps out "Server Started" on the terminal and then runs until the user quits the terminal. However, the print and webbrowser.open execute immediately because of Popen and if I use call, they never run at all. Is there a way to ensure that the print and webbrowser don't run until after the server is started other than using wait? Maybe grep for server started?
from subprocess import Popen
import glob
import sys
import webbrowser

reasoner = glob.glob("reasoner*.jar")
reasoner = reasoner.pop()

port = str(input("Enter connection port: "))
space = ""
portArg = ("-p", port)
portArg = space.join(portArg)

print "Navigate to the Reasoner at http://localhost:" + port

reasoner_process = Popen(["java", "-jar", reasoner, "-i", "0.0.0.0", portArg, "--dbconnect", "jdbc:h2:tcp://localhost//tmp/UXDemo;user=sa;password=admin"])

# I want the following to execute after the .jar process above
print "Opening http://localhost:" + port + "..."
webbrowser.open("http://localhost:" + port)
What you're looking to do is a very simple, special version of interacting with a CLI app. So, you have two options.
First, you can use a library like pexpect that's designed to handle driving almost any CLI application. It may be overkill, and there is a bit of a learning curve, but once you get the basics down this will make your problem trivial: you launch the JAR, block expecting "Server Started", then close.
Alternatively, you can do this manually with the Popen pipes. In general this has a lot of problems, but when you know there is going to be exactly one line of output that fits easily into 128 bytes, and you don't want to do anything but block on that output and then close the pipe, none of those problems come up. So:
from subprocess import Popen, PIPE

reasoner_process = Popen(args, stdout=PIPE)
line = reasoner_process.stdout.readline()
if line.strip() != 'Server Started':
    raise RuntimeError('unexpected startup output: %r' % line)  # error handling

# Any code that you want to run while the server is up goes here

reasoner_process.stdout.close()
reasoner_process.kill()
reasoner_process.wait()
But first make sure you actually have to kill it; often closing the pipe is sufficient, in which case you can and should leave out the kill(); then you can also check the exit code and raise if it's not 0.
Also, you probably want a with contextlib.closing(…) or whatever's appropriate, or just a try/finally to make sure you can raise an exception for error handling and not leak the child. (Python 3.2+ makes this a lot simpler, because it guarantees that both the pipes and the Popen itself are usable as context managers.)
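For instance, a try/finally arrangement along those lines might look like this (a sketch; args is the same argument list as in the snippet above):

from subprocess import Popen, PIPE

reasoner_process = Popen(args, stdout=PIPE)
try:
    line = reasoner_process.stdout.readline()
    if line.strip() != 'Server Started':
        raise RuntimeError('server failed to start: %r' % line)
    # ... work with the running server here ...
finally:
    # always reached, even if an exception was raised above
    reasoner_process.stdout.close()
    reasoner_process.kill()
    reasoner_process.wait()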
Finally, I was assuming that "runs until the user quits the terminal" means you want to wait for it to start, then leave it running while you do other stuff, then kill it. If your workflow is different, you obviously need to change the order in which you do things.
I am using the multiprocessing module in Python to launch a few processes in parallel. The processes are independent of each other: they generate their own output and write the results to different files. Each process calls an external tool using the subprocess.call method.
It was working fine until I discovered an issue in the external tool: under some error condition it goes into a 'prompt' mode and waits for user input. In my Python script I use the join method to wait until all the processes finish their tasks, which causes the whole thing to wait on this erroneous subprocess call. I could put a timeout on each of the processes, but I do not know in advance how long each one is going to run, so that option is ruled out.
How do I figure out if any child process is waiting for user input, and how do I send an 'exit' command to it? Any pointers or suggestions to relevant modules in Python would be really appreciated.
My code here:
import subprocess
import sys
import os
import multiprocessing

def write_script(fname, e):
    f = open(fname, 'w')
    f.write("Some useful command calling the external tool")
    f.close()
    subprocess.call(['chmod', '+x', os.path.abspath(fname)])
    return os.path.abspath(fname)

def run_use(mname, script):
    print "ssh " + mname + " " + script
    subprocess.call(['ssh', mname, script])

if __name__ == '__main__':
    dict1 = {}
    dict1['mod1'] = ['pp1', 'ext2', 'les3', 'pw4']
    dict1['mod2'] = ['aaa', 'bbb', 'ccc', 'ddd']
    machines = ['machine1', 'machine2', 'machine3', 'machine4']
    log_file.write(str(dict1.keys()))  # log_file is opened elsewhere in the original script
    for key in dict1.keys():
        arr = []
        for mod in dict1[key]:
            d = {}
            arr.append(mod)
            if (mod == dict1[key][-1]) or (len(arr) % 4 == 0):
                for i in range(0, len(arr)):
                    e = arr.pop()
                    script = write_script(e + "_temp.sh", e)
                    d[i] = multiprocessing.Process(target=run_use, args=(machines[i], script,))
                    d[i].daemon = True
                for pp in d:
                    d[pp].start()
                for pp in d:
                    d[pp].join()
Since you're writing a shell script to run your subcommands, can you simply tell them to read input from /dev/null?
#!/bin/bash
# ...
my_other_command -a -b arg1 arg2 < /dev/null
# ...
This may stop them blocking on input and is a really simple solution. If this doesn't work for you, read on for some other options.
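The same idea works directly from Python, if you would rather not edit the shell script (a sketch; mname and script are the same variables as in the question's run_use()):

import os
import subprocess

# Redirect the tool's stdin to /dev/null so it can never block on a prompt
devnull = open(os.devnull, 'r')
subprocess.call(['ssh', mname, script], stdin=devnull)
devnull.close()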
The subprocess.call() function is simply shorthand for constructing a subprocess.Popen instance and then calling its wait() method. So, your spawned processes could instead create their own subprocess.Popen instances and poll them with the poll() method (in a loop with a suitable delay) instead of calling wait(). This leaves them free to remain in communication with the main process, so you can, for example, allow the main process to tell a child process to terminate the Popen instance with the terminate() or kill() methods and then itself exit.
So, the question is how does the child process tell whether the subprocess is awaiting user input, and that's a trickier question. I would say perhaps the easiest approach is to monitor the output of the subprocess and search for the user input prompt, assuming that it always uses some string that you can look for. Alternatively, if the subprocess is expected to generate output continually then you could simply look for any output and if a configured amount of time goes past without any output then you declare that process dead and terminate it as detailed above.
Since you're reading the output, actually you don't need poll() or wait() - the process closing its output file descriptor is good enough to know that it's terminated in this case.
Here's an example of a modified run_use() method which watches the output of the subprocess:
def run_use(mname, script):
    print "ssh " + mname + " " + script
    proc = subprocess.Popen(['ssh', mname, script], stdout=subprocess.PIPE)
    for line in proc.stdout:
        if "UserPrompt>>>" in line:
            proc.terminate()
            break
In this example we assume that the process either gets hung up on UserPrompt>>> (replace with the appropriate string) or terminates naturally. If it were to get stuck in an infinite loop, for example, then your script would still not terminate; you can only really address that with an overall timeout, but you didn't seem keen to do that. Hopefully your subprocess won't misbehave in that way, however.
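If you'd rather use the silence-based timeout mentioned earlier (declare the process dead when no output arrives for a configured time), select can supply the per-read timeout. A rough, Unix-only sketch of that variant (the function name is mine):

import select
import subprocess

def run_with_output_timeout(args, timeout=60):
    proc = subprocess.Popen(args, stdout=subprocess.PIPE)
    while True:
        ready, _, _ = select.select([proc.stdout], [], [], timeout)
        if not ready:  # no output for `timeout` seconds: assume it is stuck
            proc.terminate()
            break
        line = proc.stdout.readline()
        if not line:  # EOF: the process finished on its own
            break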
Finally, if you don't know in advance the prompt that your process will give, then your job is rather harder. Effectively what you're asking to do is monitor an external process and know when it's blocked reading on a file descriptor, and I don't believe there's a particularly clean solution to this. You could consider running the process under strace or similar, but that's quite an awful hack and I really wouldn't recommend it. Things like strace are great for manual diagnostics, but they really shouldn't be part of a production setup.
I'm trying to launch a background process from a CGI scripts. Basically, when a form is submitted the CGI script will indicate to the user that his or her request is being processed, while the background script does the actual processing (because the processing tends to take a long time.) The problem I'm facing is that Apache won't send the output of the parent CGI script to the browser until the child script terminates.
I've been told by a colleague that what I want to do is impossible because there is no way to prevent Apache from waiting for the entire process tree of a CGI script to die. However, I've also seen numerous references around the web to a "double fork" trick which is supposed to do the job. The trick is described succinctly in this Stack Overflow answer, but I've seen similar code elsewhere.
Here's a short script I wrote to test the double-fork trick in Python:
import os
import sys

if os.fork():
    print 'Content-type: text/html\n\n Done'
    sys.exit(0)

if os.fork():
    os.setsid()
    sys.exit(0)

# Second child
os.chdir("/")
sys.stdout.close()
sys.stderr.close()
sys.stdin.close()

f = open('/tmp/lol.txt', 'w')
while 1:
    f.write('test\n')
If I run this from the shell, it does exactly what I'd expect: the original script and first descendant die, and the second descendant keeps running until it's killed manually. But if I access it through CGI, the page won't load until I kill the second descendant or Apache kills it because of the CGI timeout. I've also tried replacing the second sys.exit(0) with os._exit(0), but there is no difference.
What am I doing wrong?
Don't fork - run batch separately
This double-forking approach is a kind of hack, which to me is an indication that it shouldn't be done :). For CGI, anyway. Under the general principle that if something is too hard to accomplish, you are probably approaching it the wrong way.
Luckily you give the background info on what you need: a CGI call to initiate some processing that happens independently, and to return to the caller right away. Well sure, there are unix commands that do just that: schedule a command to run at a specific time (at) or whenever the CPU is free (batch). So do this instead:
import os
os.system("batch <<< '/home/some_user/do_the_due.py'")
# or if you don't want to wait for system idle,
# os.system("at now <<< '/home/some_user/do_the_due.py'")
print 'Content-type: text/html\n'
print 'Done!'
And there you have it. Keep in mind that if there is any output to stdout/stderr, it will be mailed to the user (which is good for debugging, but otherwise the script should probably keep quiet).
PS. I just remembered that Windows also has a version of at, so with a minor modification of the invocation you can have this work under Apache on Windows too (vs. the fork trick, which won't work on Windows).
PPS. Make sure the user running the CGI process isn't denied in /etc/at.deny from scheduling batch jobs.
I think there are two issues: setsid is in the wrong place, and buffered IO operations are being performed in one of the transient children:
if os.fork():
    print "success"
    sys.exit(0)

if os.fork():
    os.setsid()
    sys.exit()
You've got the original process (grandparent, prints "success"), the middle parent, and the grandchild ("lol.txt").
The os.setsid() call is being performed in the middle parent after the grandchild has been spawned. The middle parent can't influence the grandchild's session after the grandchild has been created. Try this:
print "success"
sys.stdout.flush()
if os.fork():
sys.exit(0)
os.setsid()
if os.fork():
sys.exit(0)
This creates a new session before spawning the grandchild. Then the middle parent dies, leaving the session without a process group leader, ensuring that any calls to open a terminal will fail, making sure there's never any blocking on terminal input or output, or sending unexpected signals to the child.
Note that I've also moved the success message to the grandparent; there's no guarantee of which process will run first after calling fork(2), and you run the risk that the child would be spawned, and potentially try to write output to standard out or standard error, before the middle parent has had a chance to write success to the remote client.
In this case, the streams are closed quickly, but still, mixing standard IO streams among multiple processes is bound to give difficulty: keep it all in one process, if you can.
Edit: I've found a strange behavior I can't explain:
#!/usr/bin/python

import os
import sys
import time

print "Content-type: text/plain\r\n\r\npid: " + str(os.getpid()) + "\nppid: " + str(os.getppid())
sys.stdout.flush()

if os.fork():
    print "\nfirst fork pid: " + str(os.getpid()) + "\nppid: " + str(os.getppid())
    sys.exit(0)

os.setsid()
print "\nafter setsid pid: " + str(os.getpid()) + "\nppid: " + str(os.getppid())
sys.stdout.flush()

if os.fork():
    print "\nsecond fork pid: " + str(os.getpid()) + "\nppid: " + str(os.getppid())
    sys.exit(0)

#time.sleep(1) # comment me out, uncomment me, notice the following line appear and disappear
print "\nafter second fork pid: " + str(os.getpid()) + "\nppid: " + str(os.getppid())
The last line, after second fork pid, only appears when the time.sleep(1) call is commented out. When the call is left in place, the last line never appears in the browser. (But otherwise all the content is printed to the browser.)
I wouldn't suggest going about the problem this way. If you need to execute some task asynchronously, why not use a work queue like beanstalkd instead of trying to fork off the tasks from the request? There are client libraries for beanstalkd available for Python.
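For illustration, enqueuing a job with the beanstalkc client library might look something like this (the host, port, and job body here are hypothetical):

import beanstalkc

# Connect to a beanstalkd server assumed to be running locally on the default port
queue = beanstalkc.Connection(host='localhost', port=11300)
queue.put('process-form-submission:42')  # a separate worker process reserves and runs this job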
I needed to break the stdout as well as the stderr like this:
import os
import sys

sys.stdout.flush()
os.close(sys.stdout.fileno())  # Break web pipe
sys.stderr.flush()
os.close(sys.stderr.fileno())  # Break web pipe

if os.fork():  # Get out of the parent process
    sys.exit()

# background processing follows here
OK, I'm adding a simpler solution for when you don't need to start another script but want to continue in the same one and do the long process in the background. This lets you give a waiting message that the client sees instantly, and continue your server processing even if the client kills the browser session:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import sys
import time
import datetime
print "Content-Type: text/html;charset=ISO-8859-1\n\n"
print "<html>Please wait...<html>\n"
sys.stdout.flush()
os.close(sys.stdout.fileno()) # Break web pipe
if os.fork(): # Get out parent process
sys.exit()
# Continue with new child process
time.sleep(1) # Be sure the parent process reaches its exit command.
os.setsid() # Become process group leader
# From here I cannot print to Webserver.
# But I can write in other files or do any long process.
f=open('long_process.log', 'a+')
f.write( "Starting {0} ...\n".format(datetime.datetime.now()) )
f.flush()
time.sleep(15)
f.write( "Still working {0} ...\n".format(datetime.datetime.now()) )
f.flush()
time.sleep(300)
f.write( "Still alive - Apache didn't scalped me!\n" )
f.flush()
time.sleep(150)
f.write( "Finishing {0} ...\n".format(datetime.datetime.now()) )
f.flush()
f.close()
I read half the Internet for a week without success on this one. Finally I tested whether there is a difference between sys.stdout.close() and os.close(sys.stdout.fileno()), and there is a huge one: the first didn't do anything, while the second closed the pipe from the web server and completely disconnected from the client. The fork is only necessary because the webserver will kill its processes after a while, and your long process probably needs more time to complete.
As other answers have noted, it is tricky to start a persistent process from your CGI script because the process must cleanly dissociate itself from the CGI program. I have found that a great general-purpose program for this is daemon. It takes care of the messy details involving open file handles, process groups, root directory, etc etc for you. So the pattern of such a CGI program is:
#!/bin/sh
foo-service-ping || daemon --restart foo-service
# ... followed below by some CGI handler that uses the "foo" service
The original post describes the case where you want your CGI program to return quickly, while spawning off a background process to finish handling that one request. But there is also the case where your web application depends on a running service which must be kept alive. (Other people have talked about using beanstalkd to handle jobs. But how do you ensure that beanstalkd itself is alive?) One way to do this is to restart the service (if it's down) from within the CGI script. This approach makes sense in an environment where you have limited control over the server and can't rely on things like cron or an init.d mechanism.
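From a Python CGI script, the same check-and-restart could be done with subprocess (foo-service-ping and foo-service are the placeholder names from the shell version above):

import subprocess

# Restart the service via the daemon wrapper if the liveness check fails
if subprocess.call(['foo-service-ping']) != 0:
    subprocess.call(['daemon', '--restart', 'foo-service'])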
There are situations where passing work off to a daemon or cron is not appropriate. Sometimes you really DO need to fork, let the parent exit (to keep Apache happy) and let something slow happen in the child.
What worked for me: When done generating web output, and before the fork:
fflush(stdout), close(0), close(1), close(2); // in the process BEFORE YOU FORK
Then fork() and have the parent immediately exit(0);
The child then AGAIN does
close(0), close(1), close(2);
and also a
setsid();
...and then gets on with whatever it needs to do.
Why you need to close them in the child even though they were already closed in the primordial process is confusing to me, but this is what worked. It didn't work without the second set of closes. This was on Linux (on a Raspberry Pi).
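Translated to Python, the sequence described above would look roughly like this (my sketch, not the original author's code):

import os
import sys

def close_standard_fds():
    for fd in (0, 1, 2):
        try:
            os.close(fd)
        except OSError:
            pass  # already closed

sys.stdout.flush()
close_standard_fds()   # in the process BEFORE YOU FORK

if os.fork():
    os._exit(0)        # parent exits immediately, keeping Apache happy

close_standard_fds()   # again in the child, as described above
os.setsid()
# ... the slow work continues here ...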
I haven't tried using fork but I have accomplished what you're asking by executing a sys.stdout.flush() after the original message, before calling the background process.
i.e.
print "Please wait..."
sys.stdout.flush()
output = some_processing() # put what you want to accomplish here
print output # in my case output was a redirect to a results page
My head is still hurting from that one. I tried every possible way to use your code with fork and stdout closing, nulling, or anything else, but nothing worked. The display of output from an uncompleted process depends on the webserver (Apache or other) configuration, and in my case changing it wasn't an option, so attempts with "Transfer-Encoding: chunked;chunk=CRLF" and sys.stdout.flush() didn't work either. Here is the solution that finally worked.
In short, use something like:
import subprocess
import sys
import time

if len(sys.argv) == 1:  # I'm in the parent process
    childProcess = subprocess.Popen('./myScript.py X', bufsize=0, stdin=open("/dev/null", "r"), stdout=open("/dev/null", "w"), stderr=open("/dev/null", "w"), shell=True)
    print "My HTML message that says to wait a long time"
else:  # Here comes the child and its long process
    # From here I cannot print to the webserver, but I can write to files that will be refreshed in my web page.
    time.sleep(15)  # To verify that the parent completes quickly.
I use the "X" parameter to make the distinction between parent and child because I call the same script for both, but you could do it simpler by calling another script. If a complete example would be useful, please ask.
For those who get "sh: 1: Syntax error: redirection unexpected" with the at/batch solution, try something like this:
Make sure that the at command is installed and that the user running the application isn't in /etc/at.deny:
os.system("echo sudo /srv/scripts/myapp.py | /usr/bin/at now")
I am working with a cluster system on Linux (www.mosix.org) that allows me to run jobs and have the system run them on different computers. Jobs are run like so:
mosrun ls &
This will naturally create the process and run it in the background, returning the process id, like so:
[1] 29199
Later it will return. I am writing a Python infrastructure that would run jobs and control them. For that I want to run jobs using the mosrun program as above, and save the process ID of the spawned process (29199 in this case). This naturally cannot be done using os.system or commands.getoutput, as the printed ID is not what the process prints to output... Any clues?
Edit:
Since the Python script is only meant to launch the jobs, the jobs need to run longer than the Python shell. I guess that means the mosrun process cannot be the script's child process. Any suggestions?
Thanks
Use the subprocess module. Popen instances have a pid attribute.
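A minimal sketch (mosrun and its argument are taken from the question):

import subprocess

# Launch the job under mosrun; Popen returns immediately
proc = subprocess.Popen(['mosrun', 'ls'])
print proc.pid  # the process id of the spawned mosrun process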
Looks like you want to ensure the child process is daemonized. PEP 3143 documents this and points to a reference implementation, and to others too.
Once your process (still running Python code) is daemonized, be it by the means offered in PEP 3143 or others, you can os.execl (or other os.exec... function) your target code -- this runs said target code in exactly the same process which we just said is daemonized, and so it keeps being daemonized, as desired.
The last step cannot use subprocess because it needs to run in the same (daemonized) process, overlaying its executable code -- exactly what os.execl and friends are for.
The first step, before daemonization, might conceivably be done via subprocess, but that's somewhat inconvenient (you need to put the daemonize-then-os.exec code in a separate .py): most commonly you'd just want to os.fork and immediately daemonize the child process.
subprocess is quite convenient as a mostly-cross-platform way to run other processes, but it can't really replace Unix's good old "fork and exec" approach for advanced uses (such as daemonization, in this case) -- which is why it's a good thing that the Python standard library also lets you do the latter via those functions in module os!-)
Thanks all for the help. Here's what I did in the end, and it seems to work OK. The code uses python-daemon. Maybe something smarter should be done about transferring the process id from the child to the parent, but that's the easier part.
import daemon
import os
import time
import warnings

def run_in_background(command, tmp_dir="/tmp"):
    # Decide on a temp file beforehand
    warnings.filterwarnings("ignore", "tempnam is a potential security")
    tmp_filename = os.tempnam(tmp_dir)
    # Duplicate the process
    pid = os.fork()
    # If we're the child, daemonize and run
    if pid == 0:
        with daemon.DaemonContext():
            child_id = os.getpid()
            file(tmp_filename, 'w').write(str(child_id))
            sp = command.split(' ')
            os.execl(*([sp[0]] + sp))
    else:
        # If we're the parent, poll for the new file
        n_iter = 0
        while True:
            if os.path.exists(tmp_filename):
                child_id = int(file(tmp_filename, 'r').read().strip())
                break
            if n_iter == 100:
                raise Exception("Cannot read process id from temp file %s" % tmp_filename)
            n_iter += 1
            time.sleep(0.1)
        return child_id
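Usage is then along these lines (the path to mosrun is hypothetical; note that os.execl does not search PATH, so the command should be given with its full path):

# Launch the job in a daemonized process and get its pid back
pid = run_in_background("/usr/local/bin/mosrun ls")
print "started background process", pid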