I'm reading a book on Python 3 (Introducing Python by Bill Lubanovic) and came across something I wasn't sure was a Python preference or just a "simplification" the author made while trying to describe something else.
It's on how to write to a file using chunks instead of in one shot.
poem = '''There was a young lady named Bright,
Whose speed was far faster than light;
She started one day
In a relative way,
And returned on the previous night.'''
fout = open('relativity', 'wt')
size = len(poem)
offset = 0
chunk = 100
while True:
    if offset > size:
        break
    fout.write(poem[offset:offset+chunk])
    offset += chunk
fout.close()
I was about to ask why it has while True instead of while (offset > size), but decided to try it for myself, and saw that while (offset > size) doesn't actually do anything in my Python console.
Is that just a bug in the console, or does Python really require you to move the condition inside the while loop like that? With all of the changes to make it as minimal as possible, this seems very verbose.
(I'm coming from a background in Java, C#, and JavaScript where the condition as the definition for the loop is standard.)
EDIT
Thanks to xnx's comment, I realized that I had my logic incorrect in what I would have the condition be.
This brings me back to a clearer question that I originally wanted to focus on:
Does Python prefer to do while True and have the condition use a break inside the loop, or was that just an oversight on the author's part as he tried to explain a different concept?
I was about to ask why it has while True instead of while (offset <= size), but decided to try it for myself,
This is actually how I would have written it. It should be logically equivalent.
and saw that while (offset > size) doesn't actually do anything in my Python console.
You needed to use (offset <= size), not (offset > size). The current logic stops as soon as the offset is greater than size, so reverse the condition if you want to put it in the while statement.
does Python really require you to move the condition inside the while loop like that?
No, Python allows you to write the condition in the while statement directly. Both options are fine, and it really is more a matter of personal preference in how you write your logic. I prefer the simpler form, as you were suggesting, over the original author's version.
This should work fine:
while offset <= size:
    fout.write(poem[offset:offset+chunk])
    offset += chunk
For details, see the documentation for while, which specifically states that any expression can be used before the :.
Edit:
Does Python prefer to do while True and have the condition use a break inside the loop, or was that just an oversight on the author's part as he tried to explain a different concept?
Python does not prefer while True:. Either version is fine, and it's completely a matter of preference for the coder. I personally prefer keeping the expression in the while statement, as I find the code more clear, concise, and maintainable using while offset <= size:.
It is legal Python code to put the conditional in the loop. Personally I think:
while offset <= size:
is clearer than:
while True:
    if offset > size:
        break
I prefer the first form because there's one less branch to follow but the logic is not any more complex. All other things being equal, lower levels of indentation are easier to read.
If there were multiple different conditions that would break out of the loop then it might be preferable to go for the while True syntax.
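For instance, here is a sketch of my own (the helper functions are hypothetical, not from the book) where while True arguably reads better, because the loop can exit at several unrelated points in the body:

while True:
    chunk = get_next_chunk()       # hypothetical helper: fetch more work
    if chunk is None:              # exit 1: no more data
        break
    if not looks_valid(chunk):     # exit 2: bad data, hypothetical check
        break
    handle(chunk)                  # hypothetical processing step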
As for the observed behavior with the incorrect loop logic, consider this snippet:
size = len(poem)
offset = 0
while offset > size:
    # loop code
The while loop will never be entered as offset > size starts off false.
while True:
    if offset > size:
        break
    func(x)
is exactly equivalent to
while offset <= size:
    func(x)
They both run until offset > size. It is simply a different way of phrasing it -- both are acceptable, and I'm not aware of any performance differences. They would only run differently if you had the break condition at the bottom of the while loop (i.e. after func(x))
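For example, a bottom-tested loop (my own sketch, not from the book) runs the body once before the condition is ever checked, which the plain while offset <= size: header cannot express directly:

while True:
    func(x)              # body always runs at least once
    if offset > size:    # condition tested after the body, do-while style
        break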
edit:
According to the Python wiki, in Python 2.x it "slows things down a lot" to put the condition inside the while loop: "this is due to first testing the True condition for the while, then again testing" the break condition. I don't know what measure they use for "a lot", but the difference seems minuscule in practice.
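If you want to measure it yourself, here is a rough sketch using timeit (my own example; the absolute numbers will vary by machine and Python version):

import timeit

plain = '''
offset = 0
while offset <= 1000:
    offset += 1
'''

with_break = '''
offset = 0
while True:
    if offset > 1000:
        break
    offset += 1
'''

print(timeit.timeit(plain, number=10000))
print(timeit.timeit(with_break, number=10000))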
When reading from a file, you usually do
output = []
while True:
    chunk = f.read(chunksize)
    if len(chunk) == 0:
        break
    output.append(chunk)
It seems to me like the author is more used to doing reading than writing, and in this case the reading idiom of using while True has leaked through to the writing code.
As most of the folks answering the question can attest to, using simply while offset <= size is probably more Pythonic and simpler, though even simpler might be just to do
f.write(poem)
since the underlying I/O library can handle the chunked writes for you.
Does Python prefer to do while True and have the condition use a break
inside the loop, or was that just an oversight on the author's part as
he tried to explain a different concept?
No, it doesn't; this is a quirk or error of the author's own.
There are situations where typical Python style (and Guido van Rossum) actively advise using while True, but this isn't one of them. That said, they don't advise against it either. I imagine there are cases where a test would be easier to read and understand as "say when to break" than as "say when to keep going". Even though they're just logical negations of each other, one or the other might express things a little more simply:
while not god_unwilling() and not creek_risen():
while not (god_unwilling() or creek_risen()):
vs.
while True:
    if god_unwilling() or creek_risen():
        break
I still sort of prefer the first, but YMMV. Even better: introduce functions that correspond to the English idiom, god_willing() and creek_dont_rise().
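For example (just a sketch using those hypothetical helpers), the loop header then reads almost like the English idiom itself:

while god_willing() and creek_dont_rise():
    keep_on_keeping_on()   # hypothetical loop body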
The "necessary" use is when you want to execute the break test at the end or middle of the loop (that is to say, when you want to execute part or all of the loop unconditionally the first time through). Where other languages have a greater variety of more complex loop constructs, and other examples play games with a variable to decide whether to break or not, Guido says "just use while True". Here's an example from the FAQ, for a loop that starts with an assignment:
C code:
while (line = readline(f)) {
    // do something with line
}
Python code:
while True:
    line = f.readline()
    if not line:
        break
    # do something with line
The FAQ remarks (and this relates to typical Python style):
An interesting phenomenon is that most experienced Python programmers
recognize the while True idiom and don’t seem to be missing the
assignment in expression construct much; it’s only newcomers who
express a strong desire to add this to the language.
It also points out that for this particular example, you can in fact avoid the whole problem anyway with for line in f.
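That is, instead of the while True version above, you can simply iterate the file object directly and let the loop end when the file is exhausted:

for line in f:
    # do something with line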
In my opinion, while True is better in big programs with a lot of code, because you can't always see whether a variable may be changed by some function somewhere else. while True means: start this loop and keep running it no matter what happens, until the program is closed or you explicitly break out. So maybe the writer of the book wants you to get used to while True because it's a little less risky than the alternatives.
It's better to use while True and manage, inside the loop, the variable that may stop it.
Related
I am experiencing some really weird behavior with pool.starmap in the context of a groupby apply function. Without getting the specifics, my code is something like this:
def groupby_apply_func(input_df):
    print('Processing group # x (I determine x from the df)')
    output_df = pool.starmap(another_func, zip(some fields in input_df))
    return output_df
result = some_df.groupby(groupby_fields).apply(groupby_apply_func)
In words, this takes a dataframe, forms a groupby on it, sends these groups to groupby_apply_func, which does some processing asynchronously using starmap and returns the results, which are concatenated into a final df. pool is a worker pool made from the multiprocessing library.
This code works for smaller datasets without problem. So there are no syntax errors or anything. The computer will loop through all of the groups formed by groupby, send them to groupby_apply_func (I can see the progress from the print statement), and come back fine.
The weird behavior is: on large datasets, it starts looping through the groups. Then, halfway through, or 3/4 way through (which in real time might be 12 hours), it starts completely over at the beginning of the groupbys! It resets the loop and begins again. Then, sometimes, the second loop resets also and so on... and it gets stuck in an infinite loop. Again, this is only with large datasets, it works as intended on small ones.
Could there be something in the apply functionality that, upon running out of memory, for example, decides to start re-processing all the groups? Seems unlikely to me, but I did read that the apply function will actually process the first group multiple times in order to optimize code paths, so I know that there is "meta" functionality in there - and some logic to handle the processing - and it's not just a straight loop.
Hope all that made sense. Does anyone know the inner workings of groupby.apply and if so if anything in there could possibly be causing this?
thx
EDIT: It appears to reset the loop at this point in ops.py ... it gets to this except clause and then proceeds to line 195, which is for key, (i, group) in zip(group_keys, splitter):, which starts the entire loop over again. Does this mean anything to anybody?
except libreduction.InvalidApply as err:
    # This Exception is raised if `f` triggers an exception
    # but it is preferable to raise the exception in Python.
    if "Let this error raise above us" not in str(err):
        # TODO: can we infer anything about whether this is
        # worth-retrying in pure-python?
        raise
I would use a list of the group dataframes as the argument to map (I don't think you need starmap here), rather than hiding the multiprocessing in the function to be applied.
import multiprocessing as mp

def func(df):
    # do something with the group
    return df.apply(func2)

with mp.Pool(mp.cpu_count()) as p:
    groupby = some_df.groupby(groupby_fields)
    groups = [groupby.get_group(group) for group in groupby.groups]
    result = p.map(func, groups)
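If func returns a DataFrame for each group, you can stitch the pieces back together afterwards, for example (assuming pandas is imported as pd):

import pandas as pd

final_df = pd.concat(result)   # reassemble the per-group results into one DataFrame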
OK, so I figured it out. It doesn't have anything to do with starmap; it is due to the groupby apply function. This function tries to call fast_apply over the groupby prior to running "normal" apply. If anything causes an error in that fast_apply loop (in my case it was an out-of-memory error), it then tries to re-run using "normal" apply. However, it does not print the exception/error and just catches all errors.
Not sure if any Python people will read this but I'd humbly suggest that:
if an error really occurs in the fast_apply loop, maybe print it out rather than catching everything; this could make debugging issues like this much easier
the logic to re-run the entire loop if fast_apply fails seems a little weird to me. It's probably not a big deal for small apply operations, but in my case I had a huge one and I really don't want it re-running the entire thing again. How about giving the user an option to NOT use fast_apply, to avoid the whole fast_apply optimization? I don't know its inner workings and I'm sure it's in there for a good reason, but it does add complexity, and in my case it created a very confusing situation which took hours to figure out.
I've done everything in my ability to write this using only while loops and no functions (as required by my teacher). Please could you find out why it doesn't work? I even tried dry running it, and as far as I can tell it should work.
P.S. It doesn't work with either of these:
while c!=1 and f<50: or while c!=1 or f<50:
happy_num=1
x= [0]*30
f=0
#f is a safety measure so that the program has a stop and doesnt go out of control
happynumbers=" "
number=int(input("input number "))
while happy_num!=31:
    c=0
    happy=number
    while c!=1 or f<50:
        integer=number
        f=f+1
        if integer<10:
            a=number
            b=0
            d=0
        elif integer<100:
            a=number // 10
            b=number % 10
            d=0
        else:
            a=number // 10
            bee=number % 100
            b=bee // 10
            d=number % 10
        number=(a*a)+(b*b)+(d*d)
        c=number
    x[happy_num-1]=happy
    if c==1:
        happy_num+=1
    elif f>49:
        print("too many iterations program shutting down")
        exit()
    number=happy+1
print ("the happy numbers are: ", happynumbers)
Firstly, you say:
It doesn't work with either of these: while c!=1 and f<50: or while c!=1 or f<50:
You should use the one with and (while c!=1 and f<50:), because otherwise it is useless as a failsafe. Right now your program gets stuck either way, so it might seem to you not to make a difference; I understand that. It's important in general that a failsafe is added with and, and not with or (because True or True == True and True or False == True, so when your loop is infinite, the f<50 failsafe will not make any difference to the truth value of the guard).
Adding print statements to your program, you can see that at around f=30 the program starts to become very slow, but this is not because of some infinite loop or anything; it's because the computations start to become very big at the line:
number=(a*a)+(b*b)+(d*d)
So f never actually reaches 50, because the program gets stuck trying to perform enormous multiplications. So much for your failsafe :/
I am not sure what algorithm you are using to find these happy numbers, but my guess it that there is something lacking in your guard of the inner loop, or some break statement missing. Do you account for the situation that the number isn't a happy number? Somehow you should also exit the loop in that situation. If you have some more background to what the program is supposed to do, that would be very helpful.
Edit: judging from the programming example on the Wikipedia page of happy numbers, you need some way to keep track of which numbers you've already 'visited' during your computations; that is the only way to know that some number is not a happy number.
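For illustration only, here is a rough sketch of that idea using only while loops and a set (my own code, not a drop-in fix for yours):

happynumbers = []
candidate = 1
while len(happynumbers) < 30:
    n = candidate
    seen = set()
    while n != 1 and n not in seen:
        seen.add(n)              # remember values we've already visited
        total = 0
        while n > 0:             # sum of the squares of the digits
            total += (n % 10) ** 2
            n //= 10
        n = total
    if n == 1:                   # reached 1, so candidate is happy
        happynumbers.append(candidate)
    candidate += 1
print("the happy numbers are:", happynumbers)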
Plus, the question I was given was to find the first 30 happy numbers, so the input would be 1.
I have written this code to solve a problem given to me in class; the task was to solve the "toads and frogs problem" using backtracking. My code solves the problem but does not stop once it reaches the solution (it keeps printing "states" showing other paths that are not a solution to the problem). Is there a way to make it stop? This is the code:
def solution_recursive(frogs):
    # Prints the state of the problem (example: "L L L L _ R R R R" in the starting case
    # when all the "left" frogs are on the left side and all the "right" frogs are on
    # the right side)
    show_frogs(frogs)
    # If the solution is found, return the list of frogs that contains the right order
    if frogs == ["R","R","R","R","E","L","L","L","L"]:
        return(frogs)
    # If the solution isn't the actual state, then start (or continue) recursion
    else:
        # S_prime contains possible solutions to the problem a.k.a. "moves"
        S_prime = possible_movements(frogs)
        # while S_prime contains solutions, do the following
        while len(S_prime) > 0:
            s = S_prime[0]
            S_prime.pop(0)
            # Start again with solution s
            solution_recursive(s)
Thanks in advance!
How do I stop my backtracking algorithm once I find an answer?
You could use Python's exception facilities for such a purpose.
You could also adopt the convention that your solution_recursive returns a boolean telling the caller to stop the backtracking.
It is also a matter of taste or opinion.
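As a rough sketch of that boolean convention (adapting your own function, so treat the details as a suggestion rather than the definitive fix): each call reports whether it found the goal, and the loop stops exploring further moves as soon as one call returns True.

def solution_recursive(frogs):
    show_frogs(frogs)
    if frogs == ["R", "R", "R", "R", "E", "L", "L", "L", "L"]:
        return True                    # goal reached: tell the caller to stop
    S_prime = possible_movements(frogs)
    while len(S_prime) > 0:
        s = S_prime.pop(0)
        if solution_recursive(s):      # a deeper call found the solution
            return True                # so stop trying the remaining moves
    return False                       # no move from this state leads to the goal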
I'd like to expand a bit on your recursive code.
One of your problems is that your program displays paths that are not the solutions. This is because each call to solution_recursive starts with
show_frogs(frogs)
regardless of whether frogs is the solution or not.
Then, you say that the program is continuing even after the solution has been found. There are two reasons for this, the first being that your while loop doesn't care about whether the solution has been found or not, it will go through all the possible moves:
while len(S_prime) > 0:
And the other reason is that you are reinitializing S_prime every time this function is called. I'm actually quite amazed that it didn't enter an infinite loop just checking the first move over and over again.
Since this is a problem in class, I won't give you an exact solution but these problems need to be resolved and I'm sure that your teaching material can guide you.
Your program just paused on a pdb.set_trace().
Is there a way to monkey patch the function that is currently running, and "resume" execution?
Is this possible through call frame manipulation?
Some context:
Oftentimes, I will have a complex function that processes large quantities of data, without having a priori knowledge of what kind of data I'll find:
def process_a_lot(data_stream):
    # process a lot of stuff
    # ...
    data_unit = data_stream.next()
    if not can_process(data_unit):
        import pdb; pdb.set_trace()
    # continue processing
This convenient construction launches an interactive debugger when it encounters unknown data, so I can inspect it at will and change the process_a_lot code to handle it properly.
The problem here is that, when data_stream is big, you don't really want to chew through all the data again (let's assume next is slow, so you can't save what you already have and skip on the next run)
Of course, you can replace other functions at will once in the debugger. You can also replace the function itself, but it won't change the current execution context.
Edit:
Since some people are getting side-tracked:
I know there are a lot of ways of structuring your code such that your processing function is separate from process_a_lot. I'm not really asking about ways to structure the code as much as how to recover (in runtime) from the situation when the code is not prepared to handle the replacement.
First a (prototype) solution, then some important caveats.
# process.py
import sys
import pdb
import handlers
def process_unit(data_unit):
    global handlers
    while True:
        try:
            data_type = type(data_unit)
            handler = handlers.handler[data_type]
            handler(data_unit)
            return
        except KeyError:
            print "UNUSUAL DATA: {0!r}".format(data_unit)
            print "\n--- INVOKING DEBUGGER ---\n"
            pdb.set_trace()
            print
            print "--- RETURNING FROM DEBUGGER ---\n"
            del sys.modules['handlers']
            import handlers
            print "retrying"
process_unit("this")
process_unit(100)
process_unit(1.04)
process_unit(200)
process_unit(1.05)
process_unit(300)
process_unit(4+3j)
sys.exit(0)
And:
# handlers.py
def handle_default(x):
    print "handle_default: {0!r}".format(x)

handler = {
    int: handle_default,
    str: handle_default,
}
In Python 2.7, this gives you a dictionary linking expected/known types to functions that handle each type. If no handler is available for a type, the user is dropped into the debugger, giving them a chance to amend the handlers.py file with appropriate handlers. In the above example, there is no handler for float or complex values. When those come along, the user will have to add appropriate handlers. For example, one might add:
def handle_float(x):
    print "FIXED FLOAT {0!r}".format(x)

handler[float] = handle_float
And then:
def handle_complex(x):
    print "FIXED COMPLEX {0!r}".format(x)

handler[complex] = handle_complex
Here's what that run would look like:
$ python process.py
handle_default: 'this'
handle_default: 100
UNUSUAL DATA: 1.04
--- INVOKING DEBUGGER ---
> /Users/jeunice/pytest/testing/sfix/process.py(18)process_unit()
-> print
(Pdb) continue
--- RETURNING FROM DEBUGGER ---
retrying
FIXED FLOAT 1.04
handle_default: 200
FIXED FLOAT 1.05
handle_default: 300
UNUSUAL DATA: (4+3j)
--- INVOKING DEBUGGER ---
> /Users/jeunice/pytest/testing/sfix/process.py(18)process_unit()
-> print
(Pdb) continue
--- RETURNING FROM DEBUGGER ---
retrying
FIXED COMPLEX (4+3j)
Okay, so that basically works. You can improve and tweak that into a more production-ready form, making it compatible across Python 2 and 3, et cetera.
Please think long and hard before you do it that way.
This "modify the code in real-time" approach is an incredibly fragile pattern and error-prone approach. It encourages you to make real-time hot fixes in the nick of time. Those fixes will probably not have good or sufficient testing. Almost by definition, you have just this moment discovered you're dealing with a new type T. You don't yet know much about T, why it occurred, what its edge cases and failure modes might be, etc. And if your "fix" code or hot patches don't work, what then? Sure, you can put in some more exception handling, catch more classes of exceptions, and possibly continue.
Web frameworks like Flask have debug modes that work basically this way. But those are debug modes, and generally not suited for production. Moreover, what if you type the wrong command in the debugger? Accidentally type "quit" rather than "continue" and the whole program ends, and with it, your desire to keep the processing alive. If this is for use in debugging (exploring new kinds of data streams, maybe), have at.
If this is for production use, consider instead a strategy that sets aside unhandled-types for asynchronous, out-of-band examination and correction, rather than one that puts the developer / operator in the middle of a real-time processing flow.
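A minimal sketch of that out-of-band idea (the names and the unhandled list are my own assumptions, not part of the original code): unknown items are set aside for later inspection instead of stopping the stream.

import handlers

unhandled = []   # could just as well be a file or a queue reviewed offline

def process_unit(data_unit):
    handler = handlers.handler.get(type(data_unit))
    if handler is None:
        unhandled.append(data_unit)   # park it for out-of-band examination
        return
    handler(data_unit)

# Later, outside the hot path: inspect `unhandled`, write the missing
# handlers, and re-run just those items.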
No.
You can't monkey-patch a currently running Python function and press on as though nothing had happened. At least not in any general or practical way.
In theory, it is possible--but only under limited circumstances, with much effort and wizardly skill. It cannot be done with any generality.
To make the attempt, you'd have to:
Find the relevant function source and edit it (straightforward)
Compile the changed function source to bytecode (straightforward)
Insert the new bytecode in place of the old (doable)
Alter the function housekeeping data to point at the "logically" "same point" in the program where it exited to pdb. (iffy, under some conditions)
"Continue" from the debugger, falling back into the debugged code (iffy)
There are some circumstances where you might achieve 4 and 5, if you knew a lot about the function housekeeping and analogous debugger housekeeping variables. But consider:
The bytecode offset at which your pdb breakpoint is called (f_lasti in the frame object) might change. You'd probably have to narrow your goal to "alter only code further down in the function's source code than the breakpoint occurred" to keep things reasonably simple--else, you'd have to be able to compute where the breakpoint is in the newly-compiled bytecode. That might be feasible, but again under restrictions (such as "will only call pdb.set_trace() once", or similar "leave breadcrumbs for post-breakpoint analysis" stipulations).
You're going to have to be sharp at patching up function, frame, and code objects. Pay special attention to func_code in the function (__code__ if you're also supporting Python 3); f_lasti, f_lineno, and f_code in the frame; and co_code, co_lnotab, and co_stacksize in the code.
For the love of God, hopefully you do not intend to change the function's parameters, name, or other macro defining characteristics. That would at least treble the amount of housekeeping required.
More troubling, adding new local variables (a pretty common thing you'd want to do to alter program behavior) is very, very iffy. It would affect f_locals, co_nlocals, and co_stacksize--and quite possibly, completely rearrange the order and way bytecode accesses values. You might be able to minimize this by adding assignment statements like x = None to all your original locals. But depending on how the bytecodes change, it's possible you'll even have to hot-patch the Python stack, which cannot be done from Python per se. So C/Cython extensions could be required there.
Here's a very simple example showing that bytecode ordering and arguments can change significantly even for small alterations of very simple functions:
def a(x):            LOAD_FAST     0 (x)
    y = x + 1        LOAD_CONST    1 (1)
    return y         BINARY_ADD
                     STORE_FAST    1 (y)
                     LOAD_FAST     1 (y)
                     RETURN_VALUE
------------------   ------------------
def a2(x):           LOAD_CONST    1 (2)
    inc = 2          STORE_FAST    1 (inc)
    y = x + inc      LOAD_FAST     0 (x)
    return y         LOAD_FAST     1 (inc)
                     BINARY_ADD
                     STORE_FAST    2 (y)
                     LOAD_FAST     2 (y)
                     RETURN_VALUE
Be equally sharp at patching some of the pdb values that track where it's debugging, because when you type "continue," those are what dictates where control flow goes next.
Limit your patchable functions to those that have rather static state. They must, for example, never have objects that might be garbage-collected before the breakpoint is resumed, but accessed after it (e.g. in your new code). E.g.:
some = SomeObject()
# blah blah including last touch of `some`
# ...
pdb.set_trace()
# Look, Ma! I'm monkey-patching!
if some.some_property:
    # oops, `some` was GC'd - DIE DIE DIE
While "ensuring the execution environment for the patched function is same as it ever was" is potentially problematic for many values, it's guaranteed to crash and burn if any of them exit their normal dynamic scope and are garbage-collected before patching alters their dynamic scope/lifetime.
Assert you only ever want to run this on CPython, since PyPy, Jython, and other Python implementations don't even have standard Python bytecodes and do their function, code, and frame housekeeping differently.
I would love to say this super-dynamic patching is possible. And I'm sure you can, with a lot of housekeeping object twiddling, construct simple cases where it does work. But real code has objects that go out of scope. Real patches might want new variables allocated. Etc. Real world conditions vastly multiply the effort required to make the patching work--and in some cases, make that patching strictly impossible.
And at the end of the day, what have you achieved? A very brittle, fragile, unsafe way to extend your processing of a data stream. There is a reason most monkey-patching is done at function boundaries, and even then, reserved for a few very-high-value use cases. Production data streaming is better served adopting a strategy that sets aside unrecognized values for out-of-band examination and accommodation.
If I understand correctly:
you don't want to repeat all the work that has already been done
you need a way to replace the #continue processing as usual with the new code once you have figured out how to handle the new data
@user2357112 was on the right track: expected_types should be a dictionary of
data_type:(detect_function, handler_function)
and detect_type needs to go through that to find a match. If no match is found, pdb pops up; you can then figure out what's going on, write a new detect_function and handler_function, add them to expected_types, and continue from pdb.
What I wanted to know is if there's a way to monkey patch the function that is currently running (process_a_lot), and "resume" execution.
So you want to somehow, from within pdb, write a new process_a_lot function, and then transfer control to it at the location of the pdb call?
Or, do you want to rewrite the function outside pdb, and then somehow reload that function from the .py file and transfer control into the middle of the function at the location of the pdb call?
The only possibility I can think of is: from within pdb, import your newly written function, then replace the current process_a_lot byte-code with the byte-code from the new function (I think it's func.co_code or something). Make sure you change nothing in the new function (not even the pdb lines) before the pdb lines, and it might work.
But even if it does, I would imagine it is a very brittle solution.
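For what it's worth, swapping a function's code object is possible in general, though it only affects future calls, not the frame currently paused in pdb. A minimal illustration (Python 3 attribute name; in Python 2 it is func_code):

def old(x):
    return x + 1

def new(x):
    return x * 2

old.__code__ = new.__code__
print(old(10))   # prints 20: subsequent calls use the new body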
I saw some questions similar to this, but none seems to address this specific question, so I don't know if I am overlooking something, since I am new to Python.
Here is the context for the question:
for i in range(10):
    if something_happens(i):
        break

if something_happened_on_last_position():
    # do something
From my C background, if I had a for (i=0; i<10; i++) doing the same thing with a break, then the value of i would be 10, not 9, if the break didn't occur, and 9 if it occurred on the last element. That means the method something_happened_on_last_position() could use this fact to distinguish between the two events. However, what I noticed in Python is that i stops at 9 even after running the full loop without a break.
While making the distinction between the two could be as simple as adding a flag variable, I never liked that usage in C. So I was curious: is there another alternative, or am I missing something silly here?
Do notice that I can't just use range(11), because that would run something_happens(10). C is different here, since 10 would fail the for loop's condition and something_happens(10) would never execute (since we start from index 0, the final value is 10 in both Python and C).
I used the methods just to illustrate which code chunk I was interested in; they stand for a set of other conditions that are irrelevant to explaining the problem.
Thank you!
It works the other way:
for i in range(10):
    if something_happens(i):
        break
else:  # no break in any position
    # do whatever
This is precisely what the else clause is for on for loops:
for i in range(10):
    if something_happens(i):
        break
else:
    # Never hit the break
The else clause is confusing to many; think of it as the else that goes with all those ifs you executed in the loop. The else clause runs if the break never does. More about this: For/else.
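A quick demonstration of both outcomes (made-up values, just to show when the else clause fires):

for i in range(10):
    if i == 20:        # never true, so the loop never breaks
        break
else:
    print("no break happened")   # this line prints

for i in range(10):
    if i == 5:         # true at i == 5, so we break
        break
else:
    print("no break happened")   # this line does NOT print
print("loop exited at i =", i)   # i is 5 here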