How to capture prints in real time from a function? - python

I want to capture all the prints and do something like return them, but keep the function running.
I found this method, but it only returns the prints once the code has finished.
import io
from contextlib import redirect_stdout

f = io.StringIO()
with redirect_stdout(f):
    # my code
return f.getvalue()
Is there any method to capture every print in real-time?

You could write your own file-like object that processes lines of text as it sees them. In the simplest case you only need to supply a write method, as shown below. The tricky part is knowing when a "print" call is done: print may call stdout.write several times for a single print operation. In this example, processing happens whenever a newline is seen. This code does not return interim prints, but it does let you intercept the writes to stdout and process them before returning to the function that calls print.
from contextlib import redirect_stdout
import sys

real_stdout_for_test = sys.stdout

class WriteProcessor:
    def __init__(self):
        self.buf = ""

    def write(self, buf):
        # emit on each newline
        while buf:
            try:
                newline_index = buf.index("\n")
            except ValueError:
                # no newline, buffer for next call
                self.buf += buf
                break
            # get data to next newline and combine with any buffered data
            data = self.buf + buf[:newline_index + 1]
            self.buf = ""
            buf = buf[newline_index + 1:]
            # perform complex calculations... or just print with a note.
            real_stdout_for_test.write("fiddled with " + data)

with redirect_stdout(WriteProcessor()):
    print("hello there")
    print("a\nprint\nof\nmany\nlines")
    print("goodbye ", end="")
    print("for now")

Related

How to set a prefix for all print() output in python?

I am printing to a console in Python. I am looking for a one-off piece of code so that all print statements after a given line have 4 spaces at the start. E.g.:
print('Computer: Hello world')
print.setStart('    ')
print('receiving...')
print('received!')
print.setStart('')
print('World: Hi!')
Output:
Computer: Hello world
    receiving...
    received!
World: Hi!
This would be helpful for tabbing all of the output contained in a function, and for controlling when a function's output is tabbed. Is this possible?
You can define a print function which first prints your prefix and then internally calls the built-in print function. You can even make your custom print() function look at the call stack and determine accordingly how many spaces to use as a prefix:
import builtins
import traceback

def print(*objs, **kwargs):
    my_prefix = len(traceback.format_stack()) * " "
    builtins.print(my_prefix, *objs, **kwargs)

Test it out:

def func_f():
    print("Printing from func_f")
    func_g()

def func_g():
    print("Printing from func_g")

func_f()
Output (the deeper the call stack, the wider the prefix):
   Printing from func_f
    Printing from func_g
Reverting back to the built-in print() function:
When you are done with your custom printing and want to go back to using the built-in print() function, just use del to remove your own definition of print:
del print
Why not define your own custom function and use that when needed:
def tprint(*args):
    print('   ', *args)  # three spaces plus print's separator gives four
It would be used like so:
print('Computer: Hello world')
tprint('receiving...')
tprint('received!')
print('World: Hi!')
Output:
Computer: Hello world
    receiving...
    received!
World: Hi!
You might want to use specific prefixes only at specific places
import sys
from contextlib import contextmanager

@contextmanager
def add_prefix(prefix):
    global is_new_line
    orig_write = sys.stdout.write
    is_new_line = True

    def new_write(*args, **kwargs):
        global is_new_line
        if args[0] == "\n":
            is_new_line = True
        elif is_new_line:
            orig_write("[" + str(prefix) + "]: ")
            is_new_line = False
        orig_write(*args, **kwargs)

    sys.stdout.write = new_write
    yield
    sys.stdout.write = orig_write

with add_prefix("Computer 1"):
    print("Do something", "cool")
    print("Do more stuffs")
    with add_prefix("Computer 2"):
        print("Do further stuffs")
print("Done")

# [Computer 1]: Do something cool
# [Computer 1]: Do more stuffs
# [Computer 2]: Do further stuffs
# Done
The advantage is that it's a utility function: you just have to import it to use it, without having to redefine it every time you write a new script.
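For instance, if add_prefix lived in a small helper module (prefix_util is a hypothetical name here), any script could pull it in, as in this quick sketch:

# prefix_util.py is assumed to hold the add_prefix context manager from above
from prefix_util import add_prefix

with add_prefix("worker-1"):
    print("job started")  # [worker-1]: job started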

How to put all print result in a function into a variable?

I have a function in Python:
def f():
    ...
    a lot of code
    ...
    print "hello"
    ...
    a lot of code
    ...
I want to call this function; however, the print result should be put into a variable instead of being printed to the screen directly. How can I do this with Python?
PS: please don't just say "use return"; sometimes I don't know where the print statements are.
Assuming that print is writing to sys.stdout, you can temporarily replace that with something like a StringIO object.
import sys
from StringIO import StringIO

stdout = sys.stdout
sys.stdout = StringIO()
f()
x = sys.stdout.getvalue()
sys.stdout = stdout
Or, if you have a reference to the file handle that print is using, you can use that instead of sys.stdout.
If there are multiple uses of print inside f and you only want to capture some of them (say, only those from a function g called inside f), I'm afraid you are out of luck. The amount of introspection you would need to make that possible would let you simply re-implement the function to accumulate the desired output in a variable instead of using print.
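On Python 3, the same save-swap-restore dance comes ready-made as contextlib.redirect_stdout. A minimal sketch of the equivalent capture:

import io
from contextlib import redirect_stdout

def f():
    print("hello")

buf = io.StringIO()
with redirect_stdout(buf):  # everything printed inside lands in buf
    f()
x = buf.getvalue()          # 'hello\n'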
Use a decorator like the one below:

import sys
from StringIO import StringIO

s = StringIO()

def catch_stdout(user_method):
    def decorated(*args, **kwargs):
        sys.stdout = s  # swapped at call time, so only user_method is captured
        user_method(*args, **kwargs)
        sys.stdout = sys.__stdout__
        print 'printing result of all prints in one go'
        s.seek(0, 0)
        print s.read()
    return decorated

@catch_stdout
def test():
    print 'hello '
    print 'world '

test()
You could also define your own context manager if you find you need to do this a lot, so you can capture the output for a block of statements, e.g.:
import contextlib
from StringIO import StringIO
import sys

@contextlib.contextmanager
def capture_stdout():
    old_stdout = sys.stdout
    sys.stdout = StringIO()
    yield sys.stdout, old_stdout
    sys.stdout = old_stdout
Then use as follows:

def something():
    print 'this is something'

# All prints that go to stdout inside this block, whether called
# directly or indirectly, will be put into a StringIO object instead,
# unless the original stdout is used directly...
with capture_stdout() as (res, stdout):
    print 'hello',
    print >> stdout, "I'm the original stdout!"
    something()

print res.getvalue() + 'blah'  # normal print to stdout outside the with block
Gives you:
I'm the original stdout!
hello this is something
blah
def f():
    # code
    variable = 'hello\n'
    # code
    variable += 'hello2\n'
    # code
    ...
    print(variable)

or

def f():
    # code
    variable = 'hello\n'
    # code
    variable += 'hello2\n'
    # code
    ...
    return variable

and then

print(f())

python multithreading and file locking issues

I have implemented multithreaded code in two ways, but in both ways I got an error. Could someone explain what causes the problem?
In version 1, I got an exception saying two arguments were passed to the writekey function instead of one.
In version 2, one of the threads reads an empty line, so an exception is raised while processing the empty string.
I am using locks; shouldn't that prevent multiple threads from accessing the function or the file at the same time?
Version 1:
class SomeThread(threading.Thread):
    def __init__(self, somequeue, lockfile):
        threading.Thread.__init__(self)
        self.myqueue = somequeue
        self.myfilelock = lockfile

    def writekey(key):
        if os.path.exists(os.path.join('.', outfile)):
            with open(outfile, 'r') as fc:
                readkey = int(fc.readline().rstrip())
            os.remove(os.path.join('.', outfile))
        with open(outfile, 'w') as fw:
            if readkey > key:
                fw.write(str(readkey))
            else:
                fw.write(str(key))

    def run(self):
        while True:
            dict = self.myqueue.get()
            self.myfilelock.acquire()
            try:
                self.writekey(dict.get("key"))
            finally:
                self.myfilelock.release()
            self.myqueue.task_done()

populateQueue()  # populate queue with objects
filelock = threading.Lock()
for i in range(threadnum):
    thread = SomeThread(somequeue, filelock)
    thread.setDaemon(True)
    thread.start()
somequeue.join()
Version 2:
def writekey(key):
    if os.path.exists(os.path.join('.', outfile)):
        with open(outfile, 'r') as fc:
            # do something...
        os.remove(os.path.join('.', outfile))
    with open(outfile, 'w') as fw:
        # do something...

class SomeThread(threading.Thread):
    def __init__(self, somequeue, lockfile):
        threading.Thread.__init__(self)
        self.myqueue = somequeue
        self.myfilelock = lockfile

    def run(self):
        while True:
            dict = self.myqueue.get()
            self.myfilelock.acquire()
            try:
                writekey(dict.get("key"))
            finally:
                self.myfilelock.release()
            self.myqueue.task_done()

# Same as above ....
In version 1, def writekey(key) should be declared with "self" as the first parameter, i.e.
def writekey(self, key):
The problem in version 2 is less clear. I assume that an empty line is being read while reading outfile. This is normal and it indicates that the end-of-file has been reached. Normally you would just break out of your read loop. Usually it is preferable to read your file line-by-line in a for loop, e.g.
with open(outfile, 'r') as fc:
    for line in fc:
        # process the line
The for loop will terminate naturally upon reaching end-of-file.
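Putting both fixes together, a rough sketch of what the method might look like with an empty-file guard (assuming, as in the question, that outfile is a module-level name and os is imported):

def writekey(self, key):
    readkey = key
    if os.path.exists(outfile):
        with open(outfile, 'r') as fc:
            line = fc.readline().rstrip()
            if line:  # guard against an empty or partially written file
                readkey = max(int(line), key)
        os.remove(outfile)
    with open(outfile, 'w') as fw:
        fw.write(str(readkey))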

python: change sys.stdout print to custom print function

I'm trying to understand how to create a custom print function (using Python 2.7).
import sys

class CustomPrint():
    def __init__(self):
        self.old_stdout = sys.stdout  # save stdout

    def write(self, text):
        sys.stdout = self.old_stdout  # restore normal stdout and print
        print 'custom Print--->' + text
        sys.stdout = self  # make stdout use CustomPrint on next 'print'
        # this is the line that triggers the problem
        # how to avoid this??

myPrint = CustomPrint()
sys.stdout = myPrint
print 'why you make 2 lines??...'
The code above prints this to console:
>>>
custom Print--->why you make 2 lines??...
custom Print--->
>>>
and I want to print only one line:
>>>
custom Print--->why you make 2 lines??...
>>>
But I can't figure out how to make this custom print work. I understand that there's some kind of recursion that triggers the second output to the console (I use self.write to assign stdout to itself!).
How can I make this work? Or is my approach just completely wrong?
It's not recursion. What happens is that your write function is called twice: once with the text you expect, and a second time with just '\n'. Try this:
import sys

class CustomPrint():
    def __init__(self):
        self.old_stdout = sys.stdout

    def write(self, text):
        text = text.rstrip()
        if len(text) == 0:
            return
        self.old_stdout.write('custom Print--->' + text + '\n')

    def flush(self):
        self.old_stdout.flush()
What the above code does is add the newline character back to the text passed in the first call, and make sure the second call made by the print statement, the one meant to print the newline, doesn't print anything.
Now try to comment out the first two lines and see what happens:
def write(self, text):
    #text = text.rstrip()
    #if len(text) == 0: return
    self.old_stdout.write('custom Print--->' + text + '\n')
One solution may be to use a context manager if it's localised.
#!/usr/bin/env python
from __future__ import print_function
from contextlib import contextmanager

#############################
@contextmanager
def no_stdout():
    import sys
    old_stdout = sys.stdout

    class CustomPrint():
        def __init__(self, stdout):
            self.old_stdout = stdout

        def write(self, text):
            if len(text.rstrip()):
                self.old_stdout.write('custom Print--->' + text)

    sys.stdout = CustomPrint(old_stdout)
    try:
        yield
    finally:
        sys.stdout = old_stdout
#############################

print("BEFORE")

with no_stdout():
    print("WHY HELLO!\n")
    print("DING DONG!\n")

print("AFTER")
The above produces:
BEFORE
custom Print--->WHY HELLO!
custom Print--->DING DONG!
AFTER
The code would need tidying up, especially around what the class should do with regard to setting stdout back to what it was.
How about doing from __future__ import print_function? This way you will use the Python 3 print function instead of the print statement from Python 2. Then you can redefine the print function:
def print(*args, **kwargs):
    __builtins__.print("Custom--->", *args, **kwargs)
There is a catch, however: you will have to start using the print function everywhere.
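Put together, a quick sketch of how that looks in a script (this relies on __builtins__ being the builtins module, which holds in the main script):

from __future__ import print_function

def print(*args, **kwargs):
    __builtins__.print("Custom--->", *args, **kwargs)

print('hello', 'world')
# Custom---> hello world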

Creating interruptible process in python

I'm creating a Python script which parses a large (but simple) CSV.
It'll take some time to process. I would like the ability to interrupt the parsing of the CSV so I can continue at a later stage.
Currently I have this, which lives in a larger class: (unfinished)
Edit:
I have some changed code. But the system will parse over 3 million rows.
def parseData(self):
    reader = csv.reader(open(self.file))
    for id, title, disc in reader:
        print "%-5s %-50s %s" % (id, title, disc)
        l = LegacyData()
        l.old_id = int(id)
        l.name = title
        l.disc_number = disc
        l.parsed = False
        l.save()
This is the old code.
def parseData(self):
    # first line start
    fields = self.data.next()
    for row in self.data:
        items = zip(fields, row)
        item = {}
        for (name, value) in items:
            item[name] = value.strip()
        self.save(item)
Thanks guys.
If under Linux, hit Ctrl-Z to stop the running process. Type "fg" to bring it back and start where you stopped it.
You can use signal to catch the event. This is a mockup of a parser that can catch CTRL-C on Windows and stop parsing:
import signal, time, sys

def onInterupt(signum, frame):
    raise Interupted()

try:
    # windows
    signal.signal(signal.CTRL_C_EVENT, onInterupt)
except:
    pass

class Interupted(Exception):
    pass

class InteruptableParser(object):
    def __init__(self, previous_parsed_lines=0):
        self.parsed_lines = previous_parsed_lines

    def _parse(self, line):
        # do stuff
        time.sleep(1)  # mock up
        self.parsed_lines += 1
        print 'parsed %d' % self.parsed_lines

    def parse(self, filelike):
        for line in filelike:
            try:
                self._parse(line)
            except Interupted:
                print 'caught interupt'
                self.save()
                print 'exiting ...'
                sys.exit(0)

    def save(self):
        # do what you need to save state
        # like write the parsed_lines to a file maybe
        pass

parser = InteruptableParser()
parser.parse([1, 2, 3])
Can't test it, though, as I'm on Linux at the moment.
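As a portability note: signal.SIGINT exists on both Windows and POSIX and is what CTRL-C actually delivers, so a hedged alternative to the CTRL_C_EVENT registration in the mockup above is:

import signal

class Interupted(Exception):
    pass  # same exception class as in the mockup above

def on_interrupt(signum, frame):
    raise Interupted()

signal.signal(signal.SIGINT, on_interrupt)  # CTRL-C raises SIGINT on Windows and Linux alike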
The way I'd do it:
Put the actual processing code in a class, and on that class implement the Pickle protocol (http://docs.python.org/library/pickle.html) (basically, write proper __getstate__ and __setstate__ functions).
This class would accept the filename and keep the open file and the CSV reader instance as instance members. The __getstate__ method would save the current file position, and __setstate__ would reopen the file, forward it to the proper position, and create a new reader.
I'd perform the actual work in an __iter__ method that would yield to an external function after each line was processed.
This external function would run a "main loop" monitoring input for interrupts (sockets, keyboard, the state of a specific file on the filesystem, etc.). Everything being quiet, it would just call for the next iteration of the processor. If an interrupt happens, it would pickle the processor state to a specific file on disk.
When starting, the program just has to check if there is a saved execution; if so, use pickle to retrieve the executor object and resume the main loop.
Here goes some (untested) code; the idea is simple enough:
from cPickle import load, dump
import csv
import os, sys

SAVEFILE = "running.pkl"
STOPNOWFILE = "stop.now"

class Processor(object):
    def __init__(self, filename):
        self.file = open(filename, "rt")
        self.reader = csv.reader(self.file)

    def __iter__(self):
        for line in self.reader:
            # do stuff
            yield None

    def __getstate__(self):
        return (self.file.name, self.file.tell())

    def __setstate__(self, state):
        self.file = open(state[0], "rt")
        self.file.seek(state[1])
        self.reader = csv.reader(self.file)

def check_for_interrupts():
    # Use your imagination here!
    # One simple thing would be to check for the existence of a specific file
    # on disk.
    # But you could go all the way up to instantiating a tcp server and
    # listening for interruptions on the network.
    if os.path.exists(STOPNOWFILE):
        return True
    return False

def main():
    if os.path.exists(SAVEFILE):
        with open(SAVEFILE, "rb") as savefile:
            processor = load(savefile)
        os.unlink(SAVEFILE)
    else:
        # Assumes the name of the .csv file to be passed on the command line
        processor = Processor(sys.argv[1])
    for line in processor:
        if check_for_interrupts():
            with open(SAVEFILE, "wb") as savefile:
                dump(processor, savefile)
            break

if __name__ == "__main__":
    main()
My Complete Code
I followed the advice of @jsbueno with a flag, but instead of another file I kept it within the class as a variable:
I create a class; when called, it asks for ANY input and then begins another process doing my work. As it's looped, if I press a key the flag is set, and it is only checked when the loop comes around for my next parse, so I don't kill the current action.
Adding a process flag in the database for each object from the data I'm calling means I can start this at any time and resume where I left off.
class MultithreadParsing(object):
    process = None
    process_flag = True

    def f(self):
        print "\nMultithreadParsing has started\n"
        while self.process_flag:
            ''' get my object from database '''
            legacy = LegacyData.objects.filter(parsed=False)[0:1]
            if legacy:
                print "Processing: %s %s" % (legacy[0].name, legacy[0].disc_number)
                for l in legacy:
                    ''' ... Do what I want it to do ...'''
                    sleep(1)
            else:
                self.process_flag = False
                print "Nothing to parse"

    def __init__(self):
        self.process = Process(target=self.f)
        self.process.start()
        print self.process
        a = raw_input("Press any key to stop \n")
        print "\nKILL FLAG HAS BEEN SENT\n"
        if a:
            print "\nKILL\n"
            self.process_flag = False
Thanks for all your help guys (especially yours, @jsbueno); if it wasn't for you I wouldn't have got this class idea.
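One caveat worth noting about the flag approach above: if Process here is multiprocessing.Process, the child gets its own copy of process_flag, so flipping it in the parent will not stop the worker; a shared primitive such as multiprocessing.Event is needed. A minimal sketch of that variant (names kept from the code above):

from multiprocessing import Process, Event
from time import sleep

class MultithreadParsing(object):
    def __init__(self):
        self.stop_event = Event()  # shared between parent and child processes
        self.process = Process(target=self.f)
        self.process.start()
        raw_input("Press Enter to stop\n")
        self.stop_event.set()      # the child sees this, unlike a plain attribute
        self.process.join()

    def f(self):
        while not self.stop_event.is_set():
            # ... parse one batch, then check the flag again ...
            sleep(1)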
