Your program just paused on a pdb.set_trace().
Is there a way to monkey patch the function that is currently running, and "resume" execution?
Is this possible through call frame manipulation?
Some context:
Oftentimes, I will have a complex function that processes large quantities of data, without having a priori knowledge of what kind of data I'll find:
def process_a_lot(data_stream):
    # process a lot of stuff
    # ...
    data_unit = data_stream.next()
    if not can_process(data_unit):
        import pdb; pdb.set_trace()
    # continue processing
This convenient construction launches an interactive debugger when it encounters unknown data, so I can inspect it at will and change the process_a_lot code to handle it properly.
The problem here is that, when data_stream is big, you don't really want to chew through all the data again (let's assume next is slow, so you can't save what you already have and skip on the next run)
Of course, you can replace other functions at will once in the debugger. You can also replace the function itself, but it won't change the current execution context.
Edit:
Since some people are getting side-tracked:
I know there are a lot of ways of structuring your code such that your processing function is separate from process_a_lot. I'm not really asking about ways to structure the code as much as how to recover (in runtime) from the situation when the code is not prepared to handle the replacement.
First a (prototype) solution, then some important caveats.
# process.py
import sys
import pdb
import handlers

def process_unit(data_unit):
    global handlers
    while True:
        try:
            data_type = type(data_unit)
            handler = handlers.handler[data_type]
            handler(data_unit)
            return
        except KeyError:
            print "UNUSUAL DATA: {0!r}".format(data_unit)
            print "\n--- INVOKING DEBUGGER ---\n"
            pdb.set_trace()
            print
            print "--- RETURNING FROM DEBUGGER ---\n"
            del sys.modules['handlers']
            import handlers
            print "retrying"

process_unit("this")
process_unit(100)
process_unit(1.04)
process_unit(200)
process_unit(1.05)
process_unit(300)
process_unit(4+3j)
sys.exit(0)
And:
# handlers.py
def handle_default(x):
    print "handle_default: {0!r}".format(x)

handler = {
    int: handle_default,
    str: handle_default
}
In Python 2.7, this gives you a dictionary linking expected/known types to functions that handle each type. If no handler is available for a type, the user is dropped into the debugger, giving them a chance to amend the handlers.py file with appropriate handlers. In the above example, there is no handler for float or complex values. When those values arrive, the user will have to add appropriate handlers. For example, one might add:
def handle_float(x):
    print "FIXED FLOAT {0!r}".format(x)

handler[float] = handle_float
And then:
def handle_complex(x):
    print "FIXED COMPLEX {0!r}".format(x)

handler[complex] = handle_complex
Here's what that run would look like:
$ python process.py
handle_default: 'this'
handle_default: 100
UNUSUAL DATA: 1.04
--- INVOKING DEBUGGER ---
> /Users/jeunice/pytest/testing/sfix/process.py(18)process_unit()
-> print
(Pdb) continue
--- RETURNING FROM DEBUGGER ---
retrying
FIXED FLOAT 1.04
handle_default: 200
FIXED FLOAT 1.05
handle_default: 300
UNUSUAL DATA: (4+3j)
--- INVOKING DEBUGGER ---
> /Users/jeunice/pytest/testing/sfix/process.py(18)process_unit()
-> print
(Pdb) continue
--- RETURNING FROM DEBUGGER ---
retrying
FIXED COMPLEX (4+3j)
Okay, so that basically works. You can improve and tweak that into a more production-ready form, making it compatible across Python 2 and 3, et cetera.
Please think long and hard before you do it that way.
This "modify the code in real-time" approach is an incredibly fragile pattern and error-prone approach. It encourages you to make real-time hot fixes in the nick of time. Those fixes will probably not have good or sufficient testing. Almost by definition, you have just this moment discovered you're dealing with a new type T. You don't yet know much about T, why it occurred, what its edge cases and failure modes might be, etc. And if your "fix" code or hot patches don't work, what then? Sure, you can put in some more exception handling, catch more classes of exceptions, and possibly continue.
Web frameworks like Flask have debug modes that work basically this way. But those are debug modes, and generally not suited for production. Moreover, what if you type the wrong command in the debugger? Accidentally type "quit" rather than "continue" and the whole program ends, and with it, your desire to keep the processing alive. If this is for use in debugging (exploring new kinds of data streams, maybe), have at.
If this is for production use, consider instead a strategy that sets aside unhandled-types for asynchronous, out-of-band examination and correction, rather than one that puts the developer / operator in the middle of a real-time processing flow.
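For example, here is a minimal sketch of that set-aside strategy, reusing the handlers dictionary from above; the quarantine list and dump_quarantine helper are illustrative names, not part of any library:

import pickle
import handlers

quarantine = []  # data units we don't yet know how to handle

def process_unit_safe(data_unit):
    handler = handlers.handler.get(type(data_unit))
    if handler is None:
        # Set the oddball aside for offline analysis instead of
        # stopping the stream in an interactive debugger.
        quarantine.append(data_unit)
        return
    handler(data_unit)

def dump_quarantine(path="quarantine.pkl"):
    # Examine these later, write proper handlers with tests,
    # then re-run just the quarantined units.
    with open(path, "wb") as f:
        pickle.dump(quarantine, f)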
No.
You can't monkey-patch a currently running Python function and continue pressing on as though nothing had happened. At least not in any general or practical way.
In theory, it is possible--but only under limited circumstances, with much effort and wizardly skill. It cannot be done with any generality.
To make the attempt, you'd have to:
Find the relevant function source and edit it (straightforward)
Compile the changed function source to bytecode (straightforward)
Insert the new bytecode in place of the old (doable)
Alter the function housekeeping data to point at the logically "same point" in the program where it exited to pdb (iffy, under some conditions)
"Continue" from the debugger, falling back into the debugged code (iffy)
There are some circumstances where you might achieve 4 and 5, if you knew a lot about the function housekeeping and analogous debugger housekeeping variables. But consider:
The bytecode offset at which your pdb breakpoint is called (f_lasti in the frame object) might change. You'd probably have to narrow your goal to "alter only code further down in the function's source code than the point where the breakpoint occurred" to keep things reasonably simple--else, you'd have to be able to compute where the breakpoint is in the newly compiled bytecode. That might be feasible, but again only under restrictions (such as "will only call pdb.set_trace() once," or similar "leave breadcrumbs for post-breakpoint analysis" stipulations).
You're going to have to be sharp at patching up function, frame, and code objects. Pay special attention to func_code in the function (__code__ if you're also supporting Python 3); f_lasti, f_lineno, and f_code in the frame; and co_code, co_lnotab, and co_stacksize in the code.
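To get a concrete feel for that housekeeping, here's a small sketch (Python 3 syntax) that merely inspects those attributes from inside a running function; it patches nothing, and peek_at_frame is just an illustrative helper:

import dis
import sys

def peek_at_frame():
    frame = sys._getframe(1)      # the caller's frame, as a debugger sees it
    code = frame.f_code
    print("paused at bytecode offset:", frame.f_lasti)
    print("source line:", frame.f_lineno)
    print("locals:", code.co_nlocals, "stack size:", code.co_stacksize)
    print("raw bytecode length:", len(code.co_code))
    dis.dis(code)                  # the bytecode any patch would have to match up with

def some_function(x):
    y = x + 1
    peek_at_frame()
    return y

some_function(10)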
For the love of God, hopefully you do not intend to change the function's parameters, name, or other macro defining characteristics. That would at least treble the amount of housekeeping required.
More troubling, adding new local variables (a pretty common thing you'd want to do to alter program behavior) is very, very iffy. It would affect f_locals, co_nlocals, and co_stacksize--and quite possibly, completely rearrange the order and way bytecode accesses values. You might be able to minimize this by adding assignment statements like x = None to all your original locals. But depending on how the bytecodes change, it's possible you'll even have to hot-patch the Python stack, which cannot be done from Python per se. So C/Cython extensions could be required there.
Here's a very simple example showing that bytecode ordering and arguments can change significantly even for small alterations of very simple functions:
def a(x):                     LOAD_FAST                0 (x)
    y = x + 1                 LOAD_CONST               1 (1)
    return y                  BINARY_ADD
                              STORE_FAST               1 (y)
                              LOAD_FAST                1 (y)
                              RETURN_VALUE
------------------            ------------------
def a2(x):                    LOAD_CONST               1 (2)
    inc = 2                   STORE_FAST               1 (inc)
    y = x + inc               LOAD_FAST                0 (x)
    return y                  LOAD_FAST                1 (inc)
                              BINARY_ADD
                              STORE_FAST               2 (y)
                              LOAD_FAST                2 (y)
                              RETURN_VALUE
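(You can reproduce comparisons like this yourself with the standard dis module; the exact instructions vary by CPython version:)

import dis

def a(x):
    y = x + 1
    return y

def a2(x):
    inc = 2
    y = x + inc
    return y

dis.dis(a)
dis.dis(a2)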
Be equally sharp at patching some of the pdb values that track where it's debugging, because when you type "continue," those are what dictate where control flow goes next.
Limit your patchable functions to those that have rather static state. They must, for example, never have objects that might be garbage-collected before the breakpoint is resumed, but accessed after it (e.g. in your new code). E.g.:
some = SomeObject()
# blah blah including last touch of `some`
# ...
pdb.set_trace()
# Look, Ma! I'm monkey-patching!
if some.some_property:
    # oops, `some` was GC'd - DIE DIE DIE
While "ensuring the execution environment for the patched function is same as it ever was" is potentially problematic for many values, it's guaranteed to crash and burn if any of them exit their normal dynamic scope and are garbage-collected before patching alters their dynamic scope/lifetime.
Accept that you only ever want to run this on CPython, since PyPy, Jython, and other Python implementations don't even use standard CPython bytecodes and do their function, code, and frame housekeeping differently.
I would love to say this super-dynamic patching is possible. And I'm sure you can, with a lot of housekeeping object twiddling, construct simple cases where it does work. But real code has objects that go out of scope. Real patches might want new variables allocated. Etc. Real world conditions vastly multiply the effort required to make the patching work--and in some cases, make that patching strictly impossible.
And at the end of the day, what have you achieved? A very brittle, fragile, unsafe way to extend your processing of a data stream. There is a reason most monkey-patching is done at function boundaries, and even then, reserved for a few very-high-value use cases. Production data streaming is better served adopting a strategy that sets aside unrecognized values for out-of-band examination and accommodation.
If I understand correctly:
you don't want to repeat all the work that has already been done
you need a way to replace the #continue processing as usual with the new code once you have figured out how to handle the new data
@user2357112 was on the right track: expected_types should be a dictionary of
data_type:(detect_function, handler_function)
and detect_type needs to go through that to find a match. If no match is found, pdb pops up, you can then figure out what's going on, write a new detect_function and handler_function, add them to expected_types, and continue from pdb.
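A rough sketch of that structure, following the wording above (the keys and the lambda detect/handler pairs are made up purely for illustration):

expected_types = {
    # label: (detect_function, handler_function)
    "integer": (lambda x: isinstance(x, int), lambda x: x * 2),
    "text":    (lambda x: isinstance(x, str), lambda x: x.strip()),
}

def detect_type(data_unit):
    for label, (detect, handle) in expected_types.items():
        if detect(data_unit):
            return handle(data_unit)
    # No match: drop into the debugger, add a new (detect, handler)
    # pair to expected_types, then continue ("c") to retry.
    import pdb; pdb.set_trace()
    return detect_type(data_unit)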
What I wanted to know is if there's a way to monkey patch the function that is currently running (process_a_lot), and "resume" execution.
So you want to somehow, from within pdb, write a new process_a_lot function, and then transfer control to it at the location of the pdb call?
Or, do you want to rewrite the function outside pdb, and then somehow reload that function from the .py file and transfer control into the middle of the function at the location of the pdb call?
The only possibility I can think of is: from within pdb, import your newly written function, then replace the current process_a_lot byte-code with the byte-code from the new function (I think it's func.co_code or something). Make sure you change nothing in the new function before the pdb lines (not even the pdb lines themselves), and it might work.
But even if it does, I would imagine it is a very brittle solution.
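For what it's worth, swapping a function's byte-code wholesale looks roughly like the sketch below; the attribute is actually __code__ in Python 3 (func_code in Python 2) rather than co_code on the function. Note that this only affects future calls: the frame currently paused in pdb keeps running the old code object, which is exactly the limitation discussed above.

def process_a_lot(data):              # the original, possibly paused in pdb elsewhere
    return sum(data)

def process_a_lot_fixed(data):        # the hot-fixed version written while debugging
    return sum(d for d in data if d is not None)

# Swap in the new byte-code; subsequent calls run the fixed logic.
process_a_lot.__code__ = process_a_lot_fixed.__code__

print(process_a_lot([1, 2, None, 3]))   # 6, via the new code object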
Related
I have started questioning if the way I handle errors is correct or pythonic. The code scenarios below are easy in nature, whilst the actual use would be more in line with discord.py and PRAW (reddit). The boolean indicates success, and the message returned is a message summarising the exception triggered.
Which of the following scenarios (WITH VERY SIMPLIFIED FUNCTIONS) is the proper pythonic way of doing it, or is there none? Or one I haven't thought of/learned yet?
Scenario 1: Main function returns on false from check function, but check function prints message
def main_function():
    # calls the check function
    success = check_function(1, 2)
    if not success:
        return

def check_function(intone, inttwo):
    if intone != inttwo:
        print("Not the same")
        return False
    return True
Scenario 2: The main functions prints an error message based on boolean returned from check function
def main_function():
    # calls the check function
    success = check_function(1, 2)
    if not success:
        print("Failure, not the same")
        return

def check_function(intone, inttwo):
    if intone != inttwo:
        return False
    return True
Scenario 3: The check function returns a boolean and a generated error message which the main function prints if false.
def main_function():
    # calls the check function
    success, error = check_function(1, 2)
    if not success:
        print(error)
        return

def check_function(intone, inttwo):
    if intone != inttwo:
        return False, "Not the same"
    return True, None
Obviously, the coding I'm doing that made me think about this is slightly more complicated, but I believe the examples are carried over.
It sounds like you are used to outputting a Boolean to indicate whether things went well or not.
You output False if an error occurs, and True otherwise.
Additionally, if there is an error, you are using print() statements to display the error message.
An example of pseudo-code using an approach similar to yours is shown below:
# define function for baking a cake:
def make_a_pancake():
    """
    outputs an ordered pair (`ec`, cake)
    `ec` is an error code
    If `ec` is zero, then `cake` is a finished cake

    ERROR CODES:
        0 - EVERYTHING WENT WELL
        1 - NO MIXING BOWLS AVAILABLE
        2 - WE NEED TO GET SOME EGGS
        3 - WE NEED TO GET SOME FLOUR
    """
    # IF THERE IS A MIXING BOWL AVAILABLE:
    #     GET OUT A MIXING BOWL
    if kitchen.has(1, "bowl"):
        my_bowl = kitchen.pop(1, "bowl")
    else:  # no mixing bowl available
        print("WE NEED A MIXING BOWL TO MAKE A CAKE")
        return 1, None  # NO MIXING BOWL IS AVAILABLE

    # IF THERE ARE EGGS AVAILABLE:
    #     PUT THE EGGS INTO THE BOWL
    if kitchen.has(6, "eggs"):
        my_eggs = kitchen.pop(6, "eggs")
        my_bowl.insert(my_eggs)
    else:  # OUT OF EGGS
        print("RAN OUT OF EGGS. NEED MORE EGGS")
        return 2, None

    # IF THERE IS WHEAT FLOUR AVAILABLE:
    #     put 2 cups of flour into the bowl
    if len(kitchen.peek("flour")) > 0:
        f = kitchen.pop("flour", 2, "cups")
        my_bowl.insert(f)
    else:
        print("NOT ENOUGH FLOUR IS AVAILABLE TO MAKE A CAKE")
        return 3, None

    # stir the eggs and flour inside of the bowl
    # stir 3 times
    for _ in range(0, 3):
        my_bowl.stir()

    # pour the cake batter from the bowl into a pan
    my_batter = my_bowl.pop()
    pan.push(my_batter)

    # cook the cake
    stove.push(pan)
    stove.turn_on()
    stove.turn_off()
    the_cake = pan.pop()

    return 0, the_cake  # 0 means everything went well
The code above is similar to the way code was written many decades ago.
Usually,
0 is interpreted as False
1, 2, 3, etc... are all True.
It might be confusing that 0 means no error occurred
However,
There is only one way for things to go right.
There are many ways for things to go wrong.
Every program written in Python, Java, C++, C#, etc. will give the operating system (Microsoft Windows, Linux, etc.) an error code when the whole program is done running.
A "clear" error flag (zero) is good.
An error flag of 1, 2, 3, ..., 55, ... 193, etc... is bad.
The most Pythonic way to handle error reporting is something you may not have learned about yet.
It is called exception handling.
It looks like the following:
class NO_MORE_EGGS(Exception):
    pass

def cook_omelet():
    # if THERE ARE NO EGGS:
    #     raise an exception
    if kitchen.peek("eggs") < 1:
        msg = "you cannot make an omelet without breaking some eggs"
        raise NO_MORE_EGGS(msg)

    eggs = kitchen.pop(2, "eggs")
    pan.push(eggs)

    # cook the omelet
    stove.push(pan)  # `push` means `insert` or `give`
    stove.turn_on()
    stove.turn_off()
    pan = stove.pop()

    the_omelet = pan.pop()
    return the_omelet
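The caller then decides what an error means in its own context. A minimal sketch, assuming the cook_omelet and NO_MORE_EGGS definitions above (serve is a hypothetical follow-up step):

try:
    omelet = cook_omelet()
except NO_MORE_EGGS as exc:
    # Only the caller knows the right response here:
    # log it, show it in a GUI, retry after shopping, or re-raise.
    print("could not cook:", exc)
else:
    serve(omelet)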
Ideally, a function is like a small component in a larger machine.
For example, a car, or truck, contains many smaller components:
Alternators
Stereos (for music)
Radiator (to lower the engine temperature)
Brake-pads
Suppose that Bob designs a stereo for a car made by a company named "Ford."
Ideally, I can take the stereo Bob designed and put it into a design for a different car.
A function is also like a piece of equipment in a kitchen.
Examples of kitchen equipment might be:
A rice-cooker
A toaster
A kitchen sink
Suppose that you design a rice-cooker which works in Sarah's kitchen.
However, your rice-cooker does not work in Bob's kitchen.
That would be a very badly designed rice-cooker.
A well designed rice-cooker works in anybody's kitchen.
If you write a function in Python, then someone else should be able to use the function you wrote in somebody else's code.
For example, suppose you write a function to calculate the volume of a sphere.
If I cannot re-use your function in someone else's computer program, then you did a bad job writing your function.
A well-written function can be used in many, many, many different computer programs.
If you write a function containing print statements, that is a very bad idea.
Suppose you make a website to teach children about math.
Children type the radius of a sphere into the website.
The website calculates the volume of the sphere.
The website prints "The volume of the sphere is 5.11 cubic meters"
import math

def calc_sphere_vol(radius):
    T = type(radius)
    rad_cubed = radius**T(3)
    vol = T(4)/T(3)*T(math.pi)*rad_cubed
    print("the volume of the sphere is", vol)
    return vol
In a different computer program, people might want to calculate the volume of a sphere quickly and easily, without seeing any message printed to the console.
Maybe calculating the volume of a sphere is one tiny step on the way to a larger, more complicated result.
A function should only print messages to the console if the person using the function has given it permission to do so.
Suppose that:
you write some code.
you post your code on Github
I download your code from Github.
I should be able to run your code without seeing a single print statement (if I want to).
I should not have to re-write your code to turn-off the print statements.
Imagine that you were paid to design next year's model of some type of car.
You should not have to look inside the radio/stereo-system.
If you are designing a large system, you should not have to see what is inside each small component.
A computer program is too big and complicated to re-write the code inside of the existing functions.
Imagine pieces of a computer program as small black cubes, or boxes.
Each box has input USB ports.
Each box has output USB ports.
I should be able to plug in any wires I want into the small box you designed and built.
I should never have to open up the box and re-wire the inside.
A computer programmer should be able to change where output from a function goes without modifying the code inside of that function.
print statements buried inside a function like this are very, very bad.
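One hedged way to keep the sphere example reusable is to return the value and let the caller decide where, or whether, to print; report_sphere_vol and its out parameter below are illustrative, not an established API:

import math
import sys

def calc_sphere_vol(radius):
    # Pure calculation: no printing, no side effects.
    return 4.0 / 3.0 * math.pi * radius**3

def report_sphere_vol(radius, out=sys.stdout):
    # The caller controls where the text goes: console, file, GUI buffer...
    vol = calc_sphere_vol(radius)
    out.write("The volume of the sphere is {:.2f} cubic meters\n".format(vol))
    return vol

report_sphere_vol(1.0)         # prints to the console
quiet = calc_sphere_vol(1.0)   # no output at all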
I would avoid option 1 in the majority of cases. By having it do the reporting, you're limiting how it can be used in the future. What if you want this function to be used in a tkinter app later? Is printing still appropriate? If the function's job is to do a check, it should just do the check. The caller can figure out how they want to use the returned boolean.
Options 2 and 3 are both viable, but which you'd use would depend on the case.
If there's exactly one way in which a function can fail, or all different failures should be treated the same, a simple False is fine, since False implies the single failure reason.
If there are multiple distinct ways in which a function can fail, returning some failure indicator that allows for multiple reasons would be better. A return of type Union[bool, str] may be appropriate for some cases, although I don't think strings make for good error types since they can be typo'd. I think a tuple of Union[bool, ErrorEnum] would be better, where ErrorEnum is some enum.Enum that can take a restricted set of possible values.
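A sketch of that enum-based variant (CheckError and its members are illustrative names):

import enum
from typing import Optional, Tuple

class CheckError(enum.Enum):
    NOT_EQUAL = "values are not the same"
    OUT_OF_RANGE = "value out of range"

def check_function(intone: int, inttwo: int) -> Tuple[bool, Optional[CheckError]]:
    if intone != inttwo:
        return False, CheckError.NOT_EQUAL
    return True, None

success, error = check_function(1, 2)
if not success:
    print(error.value)   # the caller decides how to present the error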
Option 3 bears a weak resemblance to Haskell's Maybe and Either types. If you're interested in different ways of error handling, seeing how the "other side of the pond" handles errors may be enlightening. Here's a similar example to what you have, from the Either link:
>>> import Data.Char ( digitToInt, isDigit )
>>> :{
let parseEither :: Char -> Either String Int
    parseEither c
        | isDigit c = Right (digitToInt c)
        | otherwise = Left "parse error"
>>> :}
parseEither returns either a parsed integer, or a string representing an error.
I am getting a segmentation fault when initializing an array.
I have a callback function from when an RFID tag gets read
IDS = []
def readTag(e):
    epc = str(e.epc, 'utf-8')
    if epc not in IDS:
        now = datetime.datetime.now().strftime('%m/%d/%Y %H:%M:%S')
        IDS.append([epc, now, "name.instrument"])
and a main function from which it's called
def main():
    for x in vals:
        IDS.append([vals[0], vals[1], vals[2]])
    for x in IDS:
        print(x[0])
    r = mercury.Reader("tmr:///dev/ttyUSB0", baudrate=9600)
    r.set_region("NA")
    r.start_reading(readTag, on_time=1500)
    input("press any key to stop reading: ")
    r.stop_reading()
The error occurs because of the line IDS.append([epc, now, "name.instrument"]). I know because when I replace it with a print call instead, the program runs just fine. I've tried using different types for the array objects (integers), creating an array of the same objects outside of the append function, etc. For some reason, just creating a list inside the readTag function, like row = [1,2,3], causes the segmentation fault.
Does anyone know what causes this error and how I can fix it? To be a little more specific: the readTag function works fine for the first two calls (only ever two), but then it crashes. The Reader object that provides start_reading() is from the mercury-api.
This looks like a scoping issue to me; the mercury library doesn't have permission to access your list's memory address, so when it invokes your callback function readTag(e) a segfault occurs. I don't think the behavior you want is supported by that library.
To extend Michael's answer, this appears to be an issue with scoping and the API you're using. In general pure-Python doesn't seg-fault. Or at least, it shouldn't seg-fault unless there's a bug in the interpreter, or some extension that you're using. That's not to say pure-Python won't break, it's just that a genuine seg-fault indicates the problem is probably the result of something messy outside of your code.
I'm assuming you're using this Python API.
In that case, the README.md mentions that the Reader.start_reading() method you're using is "asynchronous", meaning it invokes a new thread or process and returns immediately, and then the background thread continues to call your callback each time something is scanned.
I don't really know enough about the nitty-gritty of CPython to say exactly what's going on, but you've declared IDS = [] as a global variable and it seems like the background thread is running the callback with a different context to the main program. So when it attempts to access IDS it's reading memory it doesn't own, hence the seg-fault.
Because of how restrictive the callback is and the apparent lack of a buffer, this might be an oversight on the behalf of the developer. If you really need asynchronous reads it's worth sending them an issue report.
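If you do keep the asynchronous callback, one common pattern is to have the callback do nothing but push tags onto a thread-safe queue, and let the main thread do the list work. A sketch, reusing the names from the question:

import datetime
import queue

tag_queue = queue.Queue()

def readTag(e):
    # Do as little as possible in the reader's background thread.
    tag_queue.put(str(e.epc, 'utf-8'))

def drain_queue(IDS):
    # Call this from the main thread, e.g. after stop_reading().
    while not tag_queue.empty():
        epc = tag_queue.get()
        if epc not in (row[0] for row in IDS):
            now = datetime.datetime.now().strftime('%m/%d/%Y %H:%M:%S')
            IDS.append([epc, now, "name.instrument"])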
Otherwise, considering you're just waiting for input you probably don't need the asynchronous reads, and you could use the synchronous Reader.read() method inside your own busy loop instead with something like:
try:
    while True:
        readTags(r.read(timeout=10))
except KeyboardInterrupt:  # break loop on SIGINT (Ctrl-C)
    pass
Note that r.read() returns a list of tags rather than just one, so you'd need to modify your callback slightly, and if you're writing more than just a quick script you probably want to use threads to interrupt the loop properly as SIGINT is pretty hacky.
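If you do want a cleaner stop condition than SIGINT, here's a hedged sketch using a helper thread and a threading.Event (r and readTags as above):

import threading

stop = threading.Event()

def wait_for_enter():
    input("press enter to stop reading: ")
    stop.set()

threading.Thread(target=wait_for_enter, daemon=True).start()

while not stop.is_set():
    readTags(r.read(timeout=10))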
I'm using an in-house Python library for scientific computing. I need to consecutively copy an object, modify it, and then delete it. The object is huge which causes my machine to run out of memory after a few cycles.
The first problem is that I use Python's del to delete the object, which apparently only removes the name binding to the object, rather than freeing up RAM.
The second problem is that even when I encapsulate the whole process in a function, after the function is invoked, the RAM is still not freed up. Here's a code snippet to better explain the issue.
import gc
import openpnm as op

ws = op.core.Workspace()
net = op.network.Cubic(shape=[100, 100, 100], spacing=1e-6)
proj = net.project

def f():
    for i in range(5):
        clone = ws.copy_project(proj)
        result = do_something_with(clone)
        del clone

f()
gc.collect()
>>> ws
{'sim_01': [<openpnm.network.Cubic object at 0x7fed1c417780>],
'sim_02': [<openpnm.network.Cubic object at 0x7fed1c417888>],
'sim_03': [<openpnm.network.Cubic object at 0x7fed1c417938>],
'sim_04': [<openpnm.network.Cubic object at 0x7fed1c417990>],
'sim_05': [<openpnm.network.Cubic object at 0x7fed1c4179e8>],
'sim_06': [<openpnm.network.Cubic object at 0x7fed1c417a40>]}
My question is how do I completely delete a Python object?
Thanks!
PS. In the code snippet, each time ws.copy_project is called, a copy of proj is stored in ws dictionary.
There are some really smart python people on here. They may be able to tell you better ways to keep your memory clear, but I have used leaky libraries before, and found one (so-far) foolproof way to guarantee that your memory gets cleared after use: execute the memory hog in another process.
To do this, you'd need to arrange for an easy way to make your long calculation be executable separately. I have done this by adding special flags to my existing python script that tells it just to run that function; you may find it easier to put that function in a separate .py file, e.g.:
# do_something_with.py
import sys

def main(args):
    # Your example is still too vague. Clearly, something differentiates
    # each do_something_with call, otherwise you're just taking the
    # same inputs 5 times over.
    # Whatever the difference is, pass it in as an argument to the function.
    ws = op.core.Workspace()
    net = op.network.Cubic(shape=[100, 100, 100], spacing=1e-6)
    proj = net.project

    # You may not even need to clone anymore?
    clone = ws.copy_project(proj)
    result = do_something_with(clone)  # your existing processing function

# Whatever arg(s) you need to get to the function, just pass them in on the command line
if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
You can do this using any of the python tools that handle subprocesses. In python 3.5+, the recommended way to do this is subprocess.run. You could change your bigger function to something like this:
import subprocess

def invoke_do_something(i):
    completed_args = subprocess.run(["python", "do_something_with.py", str(i)], check=False)
    return completed_args.returncode

results = map(invoke_do_something, range(5))
You'll obviously need to tailor this to fit your own situation, but by running in a subprocess, you're guaranteed not to have to worry about the memory getting cleaned up. As an added bonus, you could potentially use multiprocessing.Pool.map to use multiple processors at one time. (I deliberately coded this to use map to make such a transition simple. You could still use your for loop if you prefer, and then you don't need the invoke... function.) Multiprocessing could speed up your processing, but since you're already worried about memory, it is almost certainly a bad idea: with multiple processes of the big memory hog, your system itself will likely run out of memory quickly and kill your process.
Your example is fairly vague, so I've written this at a high level. I can answer some questions if you need.
According to Tim Peters, "There should be one-- and preferably only one --obvious way to do it." In Python, there appear to be three ways to print information:
print('Hello World', end='')
sys.stdout.write('Hello World')
os.write(1, b'Hello World')
Question: Are there best-practice policies that state when each of these three different methods of printing should be used in a program?
Note that the statement of Tim is perfectly correct: there is only one obvious way to do it: print().
The other two possibilities that you mention have different goals.
If we want to summarize the goals of the three alternatives:
print is the high-level function that allows you to write something to stdout (or another file). It provides a simple and readable API, with some fancy options about how the individual items are separated, or whether or not you want to add a terminator, etc. This is what you want to do most of the time.
sys.stdout.write is just a method of the file objects. So the real point of sys.stdout is that you can pass it around as if it were any other file. This is useful when you have to deal with a function that is expecting a file and you want it to print the text directly on stdout.
In other words you shouldn't use sys.stdout.write at all. You just pass around sys.stdout to code that expects a file.
Note: in python2 there were some situations where using the print statement produced worse code than calling sys.stdout.write. However the print function allows you to define the separator and terminator and thus avoids almost all these corner cases.
os.write is a low-level call to write to a file. You must manually encode the contents and you also have to pass the file descriptor explicitly. This is meant to handle only low level code that, for some reason, cannot be implemented on top of the higher-level interfaces. You almost never want to call this directly, because it's not required and has a worse API than the rest.
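For reference, here are the three calls side by side; note that os.write needs raw bytes and a file descriptor, and that it bypasses Python's stream buffering, so mixing it with the other two can reorder output unless you flush first:

import os
import sys

print("Hello World", end="")                     # high-level: separators, end=, file=
sys.stdout.write("Hello World")                  # file-object API: pass sys.stdout around like any file
sys.stdout.flush()
os.write(sys.stdout.fileno(), b"Hello World")    # low-level: bytes + file descriptor (normally fd 1)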
Note that if you have code that should write things to a file, it's better to do:
my_file.write(a)
# ...
my_file.write(b)
# ...
my_file.write(c)
Than:
print(a, file=my_file)
# ...
print(b, file=my_file)
# ...
print(c, file=my_file)
Because it's more DRY. Using print you have to repeat file= every time. This is fine if you only write in one place in the code, but with five or six different writes it is much easier to simply call the write method directly.
To me, print is the right way to print to stdout, but:
There is a good reason why sys.stdout.write exists: imagine a class which generates some text output, and you want to make it write to either stdout, a file on disk, or a string. Ideally the class really shouldn't care what output type it is writing to. The class can simply be given a file object, and so long as that object supports the write method, the class can use it to output the text.
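A small sketch of that pattern (ReportWriter is an illustrative name, not a real library class):

import io
import sys

class ReportWriter:
    def __init__(self, out=sys.stdout):
        # Any object with a .write() method works: a real file,
        # sys.stdout, an io.StringIO buffer, and so on.
        self.out = out

    def emit(self, text):
        self.out.write(text + "\n")

ReportWriter().emit("to the console")

buf = io.StringIO()
ReportWriter(buf).emit("captured in memory")

with open("report.txt", "w") as f:
    ReportWriter(f).emit("written to disk")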
Two of these methods require importing entire modules. Based on this alone, print() is the best standard use option.
sys.stdout is useful whenever stdout may change. This gives quite a bit of power for stream handling.
os.write is useful for OS-specific writing tasks (non-blocking writes, for instance)
This question has been asked a number of times on this site for sys.stdout vs. print:
Python - The difference between sys.stdout.write and print
print() vs sys.stdout.write(): which and why?
One example of using os.write is non-blocking file writes, demonstrated in the question below. The function may only be useful on some OSes, but it still must remain portable even when certain OSes don't support different/special behaviors.
How to write to a file using non blocking IO?
I'm making a progress indicator for some long-running console process with intent to use it like this:
pi = ProgressIndicator()
for x in somelongstuff:
    # do stuff
    pi.update()
pi.print_totals()
Basically, it should output some kind of a progress bar with dots and dashes, and something like "234234 bytes processed" at the end.
I thought it would be nice to use it as a context manager:
with ProgressIndicator() as pi:
    for x in somelongstuff:
        # do stuff
        pi.update()
However there are a few things that concern me about this solution:
extra indentation makes the indicator feature appear more important than it actually is
I don't want ProgressIndicator to handle any exceptions that might occur in the loop
Is this a valid use case for a context manager? What other solutions can you suggest?
It definitely seems a valid use case. The context manager doesn't have to handle exceptions if you don't want it to, although you would want to end the line that the progress bar is output on to prevent it being confused with the traceback, and have it not print the totals if it is exited through an exception.
With regard to indentation, I'd argue that letting the user see progress is actually a very important feature so it's fine for it to take up an indentation level.
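Here's a sketch of a ProgressIndicator along those lines; returning False (or None) from __exit__ means exceptions from the loop propagate untouched, and the cleanup just ends the dotted line and skips the totals on error. The byte-counting details are guesses at what your class does:

import sys

class ProgressIndicator:
    def __init__(self):
        self.count = 0

    def update(self, nbytes=1):
        self.count += nbytes
        sys.stdout.write(".")
        sys.stdout.flush()

    def print_totals(self):
        print("\n{0} bytes processed".format(self.count))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            self.print_totals()
        else:
            # End the dotted line so a traceback starts cleanly,
            # but return False so the exception is NOT swallowed.
            sys.stdout.write("\n")
        return False

It still works un-managed, exactly as in your first snippet, by calling print_totals() yourself.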
There's a GUI application which has a very similar ProgressTask API, which you use like this:
def slow_func():
    t = nuke.ProgressTask()
    t.setMessage("Doing something")
    for x in range(100):
        do_something()
        t.setProgress(x + 1)
When ProgressTask.__del__ is called, the progress bar UI disappears. This works nicely for the most part; however, if an exception is raised (e.g. by do_something()), the traceback object keeps a reference to the ProgressTask object, so the progress bar gets stuck (until another traceback occurs).
If ProgressTask implemented the context-manager protocol, it could use the __exit__ method to ensure the progress bar has been hidden.
For a command-line UI (which it sounds like you are writing), this may not be an issue, but you could do similar cleanup tasks, e.g. display a ######### 100% (error) type bar, and ensure the traceback output isn't messed up, etc.
There's no reason your progress-bar class couldn't be usable in both manners - most context-managers are perfectly usable as both regular objects and context-managers, e.g:
import threading

lock = threading.Lock()

lock.acquire()
lock.release()

# or:
with lock:
    pass