Printing in a loop - Python

I have the following code to print to the system default printer:
import subprocess

def printFile(file):
    print("printing file...")
    with open(file, "rb") as source:
        printer = subprocess.Popen('/usr/bin/lpr', stdin=subprocess.PIPE)
        printer.stdin.write(source.read())
This function works quite well if I use it on its own. But if I use it in a loop construct like this:
while True:
    printFile(file)
    (...)
the printing job won't run, although the loop will continue without error...
I tried to build in a time delay, but it didn't help...
[Edit]: Further investigation showed me that the printing function (when called from the loop) puts the print jobs on hold...?

In modern Python 3, it is advised to use subprocess.run() in most cases instead of using subprocess.Popen directly. And I would leave it to lpr to read the file, rather than passing it to standard input:
def printFile(file):
    print("printing file...")
    cp = subprocess.run(['/usr/bin/lpr', file])
    return cp.returncode
Using subprocess.run allows you to ascertain that the lpr process finished correctly. And this way you don't have to read and write the complete file. You can even remove the file once lpr is finished.
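For example, a minimal sketch of that check-and-remove idea (the function name print_and_remove is made up; it deletes the file only when lpr reports success):

import os
import subprocess

def print_and_remove(file):
    # let lpr read the file itself
    cp = subprocess.run(['/usr/bin/lpr', file])
    # only delete the file if lpr exited with status 0
    if cp.returncode == 0:
        os.remove(file)
    return cp.returncode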
Using Popen directly has some disadvantages here:
Using Popen.stdin might produce a deadlock if it overfills the OS pipe buffers (according to the Python docs).
Since you don't wait() for the Popen process to finish, you don't know if it finished without errors (see the sketch after this list).
Depending on how lpr is set up, it might have rate controls. That is, it might stop printing if it gets a lot of print requests in a short span of time.
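If you do want to keep writing the data to standard input, a minimal sketch that avoids the first two pitfalls would use communicate(), which writes the data, closes the pipe, and waits for lpr to exit:

import subprocess

def printFile(file):
    print("printing file...")
    with open(file, "rb") as source:
        printer = subprocess.Popen(['/usr/bin/lpr'], stdin=subprocess.PIPE)
        # communicate() writes everything, closes stdin and waits for
        # lpr to finish, so the job cannot be left on hold
        printer.communicate(source.read())
    return printer.returncode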
Edit: I just thought of something. Most lpr implementations allow you to print more than one file at a time. So you could also do:
def printFile(files):
    """
    Print file(s).

    Arguments:
        files: string or sequence of strings.
    """
    if isinstance(files, str):
        files = [files]
    # if you want to be super strict...
    if not isinstance(files, (list, tuple)):
        raise ValueError('files must be a sequence type')
    else:
        if not all(isinstance(f, str) for f in files):
            raise ValueError('files must be a sequence of strings')
    cp = subprocess.run(['/usr/bin/lpr'] + files)
    return cp.returncode
That would print a single file or a whole bunch of them in one go...
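Usage is then simply (file names made up):

printFile('report.pdf')
printFile(['report.pdf', 'invoice.pdf', 'cover.pdf'])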

Related

Python: Why does Pool.map() hang when attempting to use the input arguments to its map function?

I have the following function (shortened for readability), which I parallelize using Python's (3.5) multiprocessing module:
def evaluate_prediction(enumeration_tuple):
    i = enumeration_tuple[0]
    logits_pred = enumeration_tuple[1]
    print("This prints successfully")
    print("This never gets printed: ")
    print(enumeration_tuple[0])
    filename = sample_names_test[i]
    onehots_pred = logits_to_onehots(logits_pred)
    np.save("/media/nfs/7_raid/ebos/models/fcn/" + channels + "/test/ndarrays/" + filename, onehots_pred)
However, this function hangs whenever I attempt to read its input argument. Execution can get past the logits_pred = enumeration_tuple[1] line, as evidenced by a print statement printing a simple string, but it halts whenever I print(logits_pred). So apparently, whenever I actually need the passed value, the process stops. I do not get an exception or error message. When using either Python's built-in map() function or a for-loop, the function finishes successfully. I should have sufficient memory and computing power available. All processes are writing to different files. enumerate(predictions) yields correct index-value pairs, as expected. I call this function using Pool.map():
pool = multiprocessing.Pool()
file_results = pool.map(evaluate_prediction, enumerate(predictions))
Why is it hanging? And how can I get an exception, so I know what's going wrong?
UPDATE: After outsourcing the mapped function to another module, importing it from there, and adding __init__.py to my directory, I manage to print the first item in the tuple, but not the second.
I had a similar issue before, and a solution that worked for me was to put the function you want to parallelize in a separate module and then import it.
import multiprocessing

from eval_prediction import evaluate_prediction

pool = multiprocessing.Pool()
file_results = pool.map(evaluate_prediction, enumerate(predictions))
I assume you will save the function definition in a file named eval_prediction.py in the same directory. Make sure you have an __init__.py as well.
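For illustration, the module could look like this (a sketch; the helper names come from the question, and the model_helpers import is made up):

# eval_prediction.py
import numpy as np

# sample_names_test, channels and logits_to_onehots must also be
# defined or imported here, e.g. (module name made up):
# from model_helpers import sample_names_test, channels, logits_to_onehots

def evaluate_prediction(enumeration_tuple):
    # unpack the (index, logits) pair produced by enumerate(predictions)
    i, logits_pred = enumeration_tuple
    filename = sample_names_test[i]
    onehots_pred = logits_to_onehots(logits_pred)
    np.save("/media/nfs/7_raid/ebos/models/fcn/" + channels
            + "/test/ndarrays/" + filename, onehots_pred)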

Python - good way to determine that my PyInstaller-based Python script is the only copy running, and possibly terminate other copies?

I'm looking for a way to determine that a script I wrote, packed by PyInstaller, is the only copy of itself running - so that it can quit if it finds itself open already.
I'd also like to implement an argument to kill all currently running versions of the .exe. Killing them one by one from a simple list of PIDs associated with the .exe isn't an option, since I could accidentally kill my own process before finishing.
It would be best if I could use only win32 APIs, as this script is sometimes called by services and thus is unfriendly to many subprocess.Popen calls; I don't want to have to go through UAC spoofing. However, sometimes the .exe is invoked by the Windows Scheduler or by user-land programs.
My current version of finding processes uses win32pdh. I'm not exactly sure whom to attribute this to, though it's very close to the first example from here: http://www.programcreek.com/python/example/51184/win32pdh.OpenQuery
import win32pdh

def get_win_processes():
    win32pdh.EnumObjects(None, None, win32pdh.PERF_DETAIL_WIZARD)
    junk, instances = win32pdh.EnumObjectItems(None, None, 'Process', win32pdh.PERF_DETAIL_WIZARD)
    proc_dict = {}
    for instance in instances:
        if instance in proc_dict:
            proc_dict[instance] = proc_dict[instance] + 1
        else:
            proc_dict[instance] = 0
    proc_ids = []
    for instance, max_instances in proc_dict.items():
        for inum in xrange(max_instances + 1):
            hq = win32pdh.OpenQuery()  # initializes the query handle
            try:
                path = win32pdh.MakeCounterPath((None, 'Process', instance, None, inum, 'ID Process'))
                counter_handle = win32pdh.AddCounter(hq, path)  # convert counter path to counter handle
                try:
                    win32pdh.CollectQueryData(hq)  # collects data for the counter
                    type, val = win32pdh.GetFormattedCounterValue(counter_handle, win32pdh.PDH_FMT_LONG)
                    proc_ids.append((instance, val))
                except win32pdh.error, e:
                    pass
                win32pdh.RemoveCounter(counter_handle)
            except win32pdh.error, e:
                pass
            win32pdh.CloseQuery(hq)
    return proc_ids
However, this returns two processes per copy: one is the guardian process for PyInstaller, the other is the actual instance of the program. Furthermore, it doesn't indicate which one is the currently running guardian or child.
Example output when the exe is 'wcdo.exe' and there are two copies running:
(u'wcdo', 11700)
(u'wcdo', 8748)
(u'wcdo', 4152)
(u'wcdo', 9308)
Thanks!
You could query wmic and check how the processes are connected ...
C:\>wmic process where name="webserver2.exe" get processid,parentprocessid,commandline
CommandLine                 ParentProcessId  ProcessId
webserver2.exe --scheduled  3136             2212
webserver2.exe --scheduled  2212             6004
Here:
3136 is cmd.exe
2212 is the PyInstaller wrapper (because it is both a child of cmd.exe and a parent itself)
6004 is the application itself
Using PDH seems to be overkill; it is slow and quite inflexible for identifying processes on Windows.
Calling 'wmic' through subprocess and parsing the output is done in a few lines.
Additionally, there is a format flag that controls how the wmic output is presented (CSV, XML, ...).
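A rough sketch of that (the function name is made up, the image name comes from the question, and error handling is omitted):

import subprocess

def get_wmic_processes(image_name='wcdo.exe'):
    # ask wmic for the parent PID and PID of all matching processes, as CSV
    cmd = ['wmic', 'process', 'where', 'name="%s"' % image_name,
           'get', 'ParentProcessId,ProcessId', '/format:csv']
    output = subprocess.check_output(cmd, universal_newlines=True)
    processes = []
    for line in output.splitlines():
        line = line.strip()
        # skip blank lines and the CSV header (Node,ParentProcessId,ProcessId)
        if not line or line.startswith('Node'):
            continue
        node, ppid, pid = line.split(',')
        processes.append((int(ppid), int(pid)))
    return processes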
Btw. you could try to create your exe with py2exe, which does not use a wrapper application.
Not sure if it is relevant to identify how the application was started, but you could add a special command line argument to your Windows Scheduler task to run wcdo.exe --scheduled.

Maya GUI freezes during subprocess call

I need to conform some Maya scenes we receive from a client to make them compatible with our pipeline. I'd like to batch that action, obviously, and I'm asked to launch the process from within Maya.
I've tried two methods already (quite similar to each other), which both work, but the problem is that the Maya GUI freezes until the process is complete. I'd like the process to be completely transparent to the user, so that they can keep working and only see a message when it's done.
Here's what I've tried and found so far:
This tutorial (http://www.toadstorm.com/blog/?p=136) led me to write the following and save it:
import sys
import maya.standalone as std
import maya.cmds as mc

filename = sys.argv[1]

def createSphere(filename):
    std.initialize(name='python')
    try:
        mc.file(filename, open=True, pmt=False, force=True)
        sphere = mc.polySphere()[0]
        mc.file(save=True, force=True)
        sys.stdout.write(sphere)
    except Exception, e:
        sys.stderr.write(str(e))
        sys.exit(-1)
    if float(mc.about(v=True)) >= 2016.0:
        std.uninitialize()

createSphere(filename)
Then I call it from within Maya this way:
import subprocess

mayapyPath = 'C:/Program Files/Autodesk/Maya2016/bin/mayapy.exe'
scriptPath = 'P:/WG_MAYA_Users/lbouet/scripts/createSphere.py'
filenames = ['file1', 'file2', 'file3', 'file4']

def massCreateSphere(filenames):
    for filename in filenames:
        maya = subprocess.Popen(mayapyPath + ' ' + scriptPath + ' ' + filename,
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = maya.communicate()
        exitcode = maya.returncode
        if str(exitcode) != '0':
            print(err)
            print 'error opening file: %s' % (filename)
        else:
            print 'added sphere %s to %s' % (out, filename)

massCreateSphere(filenames)
It works fine, but like I said, it freezes the Maya GUI until the process is over. And it's just for creating a sphere, so nowhere near all the actions I'll actually have to perform on the scenes.
I've also tried to run the first script via a .bat file calling mayabatch and running the script, same issue.
I found this post (Running list of cmd.exe commands from maya in Python), which seems to be exactly what I'm looking for, but I can't see how to adapt it to my situation.
From what I understand the issue might come from calling Popen in a loop (i.e. multiple times), but I really can't see how to do otherwise... I'm thinking maybe I should save the second script somewhere on disk too and call that one from Maya?
In this case subprocess.communicate() will block until the child process is done, so it is not going to fix your problem on its own.
If you just want to kick off the processes and not wait for them to complete -- 'fire and forget' style -- you can just use threads, starting a new thread for each process. However, you'll have to be very careful about reporting back to the user: if you try to touch the Maya scene or GUI from an outside thread you'll get mysterious, undebuggable errors. print() is usually OK but maya.cmds is not. If you're only printing messages you can probably get away with maya.utils.executeDeferred(), which is discussed in this question and in the docs.
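A rough sketch of that approach, reusing mayapyPath and scriptPath from the question (the report helper is made up; executeDeferred runs it on Maya's main thread):

import subprocess
import threading

import maya.utils

def report(message):
    # called on Maya's main thread via executeDeferred, so printing
    # (or even maya.cmds calls) would be safe here
    print(message)

def run_one(filename):
    # run one mayapy job to completion inside this worker thread;
    # communicate() blocks the thread, not the Maya GUI
    proc = subprocess.Popen([mayapyPath, scriptPath, filename],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    maya.utils.executeDeferred(
        report, 'finished %s (exit code %s)' % (filename, proc.returncode))

def massCreateSphere(filenames):
    # fire and forget: start one daemon thread per file and return immediately
    for filename in filenames:
        thread = threading.Thread(target=run_one, args=(filename,))
        thread.daemon = True
        thread.start()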

Python continuously parse console input

I am writing a little Python script that parses the input from a QR reader (which is seen as a keyboard by the system).
At the moment I am using raw_input(), but this function waits for an EOF/end-of-line symbol before submitting the received string to the program.
I am wondering if there is a way to continuously parse the input string and not just in chunks limited by a line end.
In practice:
- Is there a way in Python to asynchronously and continuously parse console input?
- Is there a way to change raw_input() (or an equivalent function) to look for another character in order to submit the string read into the program?
It seems like you're generally trying to solve two problems:
Read input in chunks
Parse that input asynchronously
For the first part, it will vary greatly based on the specifics of the input function you're calling, but for standard input you could use something like
sys.stdin.read(1)
As for parsing asynchronously, there are a number of approaches you could take. Python is synchronous, so you will necessarily have to involve some subprocess calls. Manually spawning a function using the subprocess library is one option. You could also use something like Redis or some lightweight job queue to pop input chunks onto, and have them read and processed by another background script. Finally, gevent is a very popular coroutine-based library for spawning asynchronous processes. Using gevent, this whole setup would look something like this:
import sys
import gevent

class QRLoader(object):
    def __init__(self):
        self.data = []

    def add_data(self, data):
        self.data.append(data)
        # if self.data constitutes a full QR code,
        # do something with it asynchronously
        gevent.spawn(parse_async)

def parse_async():
    # do something with qr_loader.data
    pass

qr_loader = QRLoader()

while True:
    data = sys.stdin.read(1)
    if data:
        qr_loader.add_data(data)

Is there a hidden possible deadlock in ppmap/parallel python?

I am having some trouble with using a parallel version of map (ppmap wrapper, implementation by Kirk Strauser).
The function I am trying to run in parallel runs a simple regular expression search on a large number of strings (protein sequences), which are parsed from the filesystem using BioPython's SeqIO. Each of the function calls uses its own file.
If I run the function using a normal map, everything works as expected. However, when using ppmap, some of the runs simply freeze; there is no CPU usage and the main program does not even react to KeyboardInterrupt. Also, when I look at the running processes, the workers are still there (but not using any CPU anymore).
e.g.
/usr/bin/python -u /usr/local/lib/python2.7/dist-packages/pp-1.6.1-py2.7.egg/ppworker.py 2>/dev/null
Furthermore, the workers do not seem to freeze on any particular data entry: if I manually kill the process and re-run the execution, it stops at a different point. (So I have temporarily resorted to keeping a list of finished entries and re-starting the program multiple times.)
Is there any way to see where the problem is?
Sample of the code that I am running:
def analyse_repeats(data):
    """
    Loads whole proteome in memory and then looks for repeats in sequences,
    flags both real repeats and sequences not containing a particular aminoacid
    """
    (organism, organism_id, filename) = data
    import re
    import Bio.SeqIO
    letters = ['C', 'M', 'F', 'I', 'L', 'V', 'W', 'Y', 'A', 'G',
               'T', 'S', 'Q', 'N', 'E', 'D', 'H', 'R', 'K', 'P']
    try:
        handle = open(filename)
        data = Bio.SeqIO.parse(handle, "fasta")
        records = [record for record in data]
        store_records = []
        for record in records:
            sequence = str(record.seq)
            uniprot_id = str(record.name)
            for letter in letters:
                # find all runs of this letter in the sequence
                items = set(re.compile("(%s+)" % letter).findall(sequence))
                if items:
                    for item in items:
                        store_records.append((organism_id, len(item), uniprot_id, letter))
                else:
                    # letter not present in the string, "zero" repeat
                    store_records.append((organism_id, 0, uniprot_id, letter))
        handle.close()
        return (organism, store_records)
    except IOError as e:
        print e
        return (organism, [])
res_generator = ppmap.ppmap(
    None,
    analyse_repeats,
    zip(todo_list, organism_ids, filenames)
)

for res in res_generator:
    # process the output
    pass
If I use the built-in map instead of ppmap, everything works fine:
res_generator = map(
    analyse_repeats,
    zip(todo_list, organism_ids, filenames)
)
You could try using one of the methods (like map) of the Pool object from the multiprocessing module instead. The advantage is that it's built in and doesn't require external packages. It also works very well.
By default, it uses as many worker processes as your computer has cores, but you can specify a higher number as well.
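A minimal sketch, reusing analyse_repeats and the same input data from the question:

import multiprocessing

if __name__ == '__main__':
    pool = multiprocessing.Pool()  # one worker per core by default
    results = pool.map(analyse_repeats,
                       zip(todo_list, organism_ids, filenames))
    pool.close()
    pool.join()
    for organism, store_records in results:
        # process the output
        pass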
May I suggest using dispy (http://dispy.sourceforge.net)? Disclaimer: I am the author. I understand it doesn't address the question directly, but hopefully it helps you.
