Best way in Python to check if a loop is not executed

The title might be misleading, so here is a better explanation.
Consider the following code:
def minimum_working_environment(r):
    trial = np.arange(0, 6)[::-1]
    for i in range(len(trial)):
        if r > trial[i]:
            return i
    return len(trial)
We see that if r is smaller than the smallest element of trial, the if clause inside the loop is never executed. Therefore, the function never returns anything in the loop and returns something in the last line. If the if clause inside the loop is executed, return terminates the code, so the last line is never executed.
I want to implement something similar, but without return, i.e.,
def minimum_working_environment(self, r):
    self.trial = np.arange(0, 6)[::-1]
    for i in range(len(self.trial)):
        if r > self.trial[i]:
            self.some_aspect = i
            break
    self.some_aspect = len(self.trial)
Here, break exits the loop but does not terminate the function, so the last line still runs and overwrites self.some_aspect.
The solutions I can think of are:
Replace break with return 0 and not check the return value of the function.
Use a flag variable.
Expand the self.trial array with a very small negative number, like -1e99.
First method looks good, I will probably implement it if I don't get any answer. The second one is very boring. The third one is not just boring but also might cause performance problems.
My questions are:
Is there a reserved word like return that would work in the way that I want, i.e., terminate the function?
If not, what is the best solution to this?
Thanks!

You can check that a for loop did not run into a break with else, which seems to be what you're after.
import numpy as np

def minimum_working_environment(r):
    trial = np.arange(0, 6)[::-1]
    for i in range(len(trial)):
        if r > trial[i]:
            return i
    return len(trial)

def alternative(r):
    trial = np.arange(0, 6)[::-1]
    for i in range(len(trial)):
        if r > trial[i]:
            break
    else:
        i = len(trial)
    return i

print(minimum_working_environment(3))
print(minimum_working_environment(-3))
print(alternative(3))
print(alternative(-3))
Result:
3
6
3
6
This works because the loop controlling variable i will still have the last value it had in the loop after the break and the else will only be executed if the break never executes.
However, if you just want to terminate a function, you should use return. The example I provided is mainly useful if you do indeed need to know if a loop completed fully (i.e. without breaking) or if it terminated early. It works for your example, which I assume was exactly that, just an example.
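Applied to the method from the question, the same idea might look like the sketch below. The class name Environment is hypothetical (the question does not show the class), and a plain list stands in for the NumPy array so the snippet runs without extra dependencies.

```python
class Environment:
    """Hypothetical container for the attributes used in the question."""

    def minimum_working_environment(self, r):
        # Plain-list stand-in for np.arange(0, 6)[::-1]: [5, 4, 3, 2, 1, 0]
        self.trial = list(range(5, -1, -1))
        for i in range(len(self.trial)):
            if r > self.trial[i]:
                self.some_aspect = i
                break
        else:
            # Reached only if the loop finished without hitting break.
            self.some_aspect = len(self.trial)


env = Environment()
env.minimum_working_environment(3)
print(env.some_aspect)   # 3
env.minimum_working_environment(-3)
print(env.some_aspect)   # 6
```

Because the fallback assignment sits in the else branch of the loop, it no longer overwrites the value set just before break.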

Related

Python multiprocessing pool.map within an apply loop causing resets and strange behavior

I am experiencing some really weird behavior with pool.starmap in the context of a groupby apply function. Without getting into the specifics, my code is something like this:
def groupby_apply_func(input_df):
    print('Processing group # x (I determine x from the df)')
    output_df = pool.starmap(another_func, zip(some fields in input_df))
    return output_df

result = some_df.groupby(groupby_fields).apply(groupby_apply_func)
In words, this takes a dataframe, forms a groupby on it, sends these groups to groupby_apply_func, which does some processing asynchronously using starmap and returns the results, which are concatenated into a final df. pool is a worker pool made from the multiprocessing library.
This code works for smaller datasets without problem. So there are no syntax errors or anything. The computer will loop through all of the groups formed by groupby, send them to groupby_apply_func (I can see the progress from the print statement), and come back fine.
The weird behavior is: on large datasets, it starts looping through the groups. Then, halfway through, or 3/4 way through (which in real time might be 12 hours), it starts completely over at the beginning of the groupbys! It resets the loop and begins again. Then, sometimes, the second loop resets also and so on... and it gets stuck in an infinite loop. Again, this is only with large datasets, it works as intended on small ones.
Could there be something in the apply functionality that, upon running out of memory, for example, decides to start re-processing all the groups? Seems unlikely to me, but I did read that the apply function will actually process the first group multiple times in order to optimize code paths, so I know that there is "meta" functionality in there - and some logic to handle the processing - and it's not just a straight loop.
Hope all that made sense. Does anyone know the inner workings of groupby.apply and if so if anything in there could possibly be causing this?
thx
EDIT: IT APPEARS TO RESET THE LOOP at this point in ops.py ... it gets to this except clause and then proceeds to line 195 which is for key, (i, group) in zip(group_keys, splitter): which starts the entire loop over again. Does this mean anything to anybody?
except libreduction.InvalidApply as err:
    # This Exception is raised if `f` triggers an exception
    # but it is preferable to raise the exception in Python.
    if "Let this error raise above us" not in str(err):
        # TODO: can we infer anything about whether this is
        # worth-retrying in pure-python?
        raise
I would use a list of the group dataframes as the argument to map (I don't think you need starmap here), rather than hiding the multiprocessing in the function to be applied.
import multiprocessing as mp

def func(df):
    # do something
    return df.apply(func2)

with mp.Pool(mp.cpu_count()) as p:
    groupby = some_df.groupby(groupby_fields)
    groups = [groupby.get_group(group) for group in groupby.groups]
    result = p.map(func, groups)
OK so I figured it out. Doesn't have anything to do with starmap. It is due to the groupby apply function. This function tries to call fast_apply over the groupby prior to running "normal" apply. If anything causes an error in that fast_apply loop (in my case it was an out of memory error) it then tries to re-run using "normal" apply. However, it does not print the exception / error and just catches all errors.
Not sure if any Python people will read this but I'd humbly suggest that:
if an error really occurs in the fast_apply loop, maybe print it out rather than catching everything; this could make debugging things like this much easier
the logic to re-run the entire loop if fast_apply fails seems a little weird to me. It is probably not a big deal for small apply operations, but in my case I had a huge one and I really don't want it re-running the entire thing. Perhaps give the user an option to NOT use fast_apply, to skip the whole fast_apply optimization? I don't know the inner workings of it and I'm sure it's in there for a good reason, but it does add complexity, and in my case it created a very confusing situation which took hours to figure out.
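The behavior described above can be illustrated with a toy model. This is not pandas code: apply_with_fallback and process are hypothetical names, and the "fast" and "slow" passes are the same list comprehension. It only shows how a silently caught error followed by a full retry makes the per-group output start over.

```python
def apply_with_fallback(groups, func):
    """Toy model: try one pass; on any error, silently redo every group."""
    try:
        return [func(g) for g in groups]      # "fast" attempt
    except Exception:
        # The error is swallowed and nothing is reported to the user.
        return [func(g) for g in groups]      # full re-run from the first group


calls = []                   # records every group that gets processed
state = {"failed": False}    # lets the simulated failure happen only once


def process(group):
    calls.append(group)
    if group == 2 and not state["failed"]:
        state["failed"] = True
        raise MemoryError("simulated out-of-memory error")
    return group * 10


result = apply_with_fallback([0, 1, 2, 3], process)
print(calls)    # [0, 1, 2, 0, 1, 2, 3]
print(result)   # [0, 10, 20, 30]
```

Groups 0, 1 and 2 are processed twice, which matches the symptom of the loop appearing to reset partway through.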

How to run a for loop with variable range?

I want to have a Python program like--
for i in range(r):
    if (i==2):
        #change r in some way
which will run the loop for the new range r, after it gets modified in the if statement.
Even if I change r after the if statement, the for loop runs for the initial r I gave. This must be happening because range(r) gets fixed in the for statement in the first line itself, and is not affected by a change in r later on.
Is there a "simple way" to bypass this?
By "simple" I mean that I don't want to add a counter which counts how many times the loop already ran and how many times it needs to run again after changing (specifically increasing) r, or to replace the for loop with a while loop.
When you say:
for i in range(r):
range(r) creates a range object. It does this only once, when the loop is set up initially. Therefore, any changes you make to r inside the loop have no effect on the number of iterations, since it's the range object that dictates that (it just happens to be initialized with r).
Rule of thumb: If you know how many iterations you need in advance, use a for loop. If you don't know how many iterations you need, use a while loop.
I don't believe this is possible. However, using a while loop and a manual counter is itself an extremely simple way to do this.
The code will look something like this:
i = 0
while i < r:
    if i == 2:
        pass  # Change r in some way
    i += 1
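To make the difference concrete, here is a small sketch comparing the two loops. The function names and the choice of increasing r by 3 are illustrative assumptions, not part of the question.

```python
def iterations_with_for(r):
    """Count iterations when r is changed inside a for loop."""
    count = 0
    for i in range(r):       # range(r) is evaluated once, right here
        if i == 2:
            r += 3           # rebinding r does not touch the range object
        count += 1
    return count


def iterations_with_while(r):
    """Count iterations when r is changed inside a while loop."""
    count = 0
    i = 0
    while i < r:             # r is re-read on every pass
        if i == 2:
            r += 3
        count += 1
        i += 1               # the manual counter must be advanced explicitly
    return count


print(iterations_with_for(5))     # 5
print(iterations_with_while(5))   # 8
```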

How to prevent a repeat of a program call within a while loop

I am trying to prevent this switch function from being called repeatedly within the probability while loop. I want it to be called once, prompting for input, and then use the returned value on every pass through the while loop instead of asking every time.
Click here to see screenshot of code
(It won't let me add a second picture of the switch function, so I'll just copy and paste it.)
def switch_door():
    switch=raw_input("Switch doors?:")
    if switch!="y" and switch!="n":
        return "Incorrect inputs"
    elif switch=='y':
        return True
    elif switch=='n':
        return False
You may set a variable, e.g. if_switch = switch_door(), in the probability() function before your while loop and pass that variable to your simulation function as a parameter.
Note that you will need to change your simulation definition to e.g. def simulation(doors, if_switch):; you will also need to change these two lines:
if switch_door()==True: to if if_switch==True:
elif switch_door()==False: to simply else:
Now your problem should be solved.
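Since the original probability() and simulation() code was only posted as a screenshot, the following is a rough sketch under assumptions: it guesses a Monty Hall style game, uses Python 3's input instead of raw_input, and adds hypothetical ask, rng, trials and seed parameters so the prompt can be replaced in a test. The point it illustrates is only that switch_door() is called once, before the loop.

```python
import random


def switch_door(ask=input):
    """Ask once whether to switch; return True, False, or None for bad input."""
    answer = ask("Switch doors?: ")
    if answer == "y":
        return True
    if answer == "n":
        return False
    return None


def simulation(doors, if_switch, rng):
    """Play one round (assumed rules); return True if the player wins."""
    car = rng.randrange(doors)
    pick = rng.randrange(doors)
    # Assumption: the host opens every other losing door, so switching
    # wins exactly when the first pick was wrong.
    if if_switch:
        return pick != car
    return pick == car


def probability(trials, ask=input, seed=0):
    if_switch = switch_door(ask)       # asked once, before the loop
    if if_switch is None:
        raise ValueError("Incorrect input")
    rng = random.Random(seed)
    wins = 0
    count = 0
    while count < trials:
        if simulation(3, if_switch, rng):   # reuses the stored answer
            wins += 1
        count += 1
    return wins / trials


# Non-interactive usage: a stand-in for input() that always answers "y".
print(probability(10000, ask=lambda prompt: "y"))
```

With three doors and the assumed rules, the printed value should come out close to 2/3 when switching.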

Check statement for a loop only once

Let’s say I have following simple code:
useText = True
for i in range(20):
    if useText:
        print("The square is "+ str(i**2))
    else:
        print(i**2)
I use the variable useText to control which way to print the squares. It doesn’t change while running the loop, so it seems inefficient to me to check it every time the loop runs. Is there any way to check useText only once, before the loop, and then always print out according to that result?
This question occurs to me quite often. In this simple case of course it doesn’t matter but I could imagine this leading to slower performance in more complex cases.
The only difference that useText accomplishes here is the formatting string. So move that out of the loop.
fs = '{}'
if useText:
    fs = "The square is {}"
for i in range(20):
    print(fs.format(i**2))
(This assumes that useText doesn't change during the loop! In a multithreaded program that might not be true.)
The general structure of your program is to loop through a sequence and print the result in some manner.
In code, this becomes
for i in range(20):
    print_square(i)
Before the loop runs, set print_square appropriately depending on the useText variable.
if useText:
    print_square = lambda x: print("The square is " + str(x**2))
else:
    print_square = lambda x: print(x**2)
for i in range(20):
    print_square(i)
This has the advantage of not repeating the loop structure or the check for useText and could easily be extended to support other methods of printing the results inside the loop.
If you are not going to change the value of useText inside the loop, you can move it outside of for:
if useText:
    for i in range(20):
        print("The square is "+ str(i**2))
else:
    for i in range(20):
        print(i**2)
We can move if outside of for since you mentioned useText is not changing.
If you write something like this, you're checking the condition, running code, moving to the next iteration, and repeating, checking the condition each time, because you're running the entire body of the for loop, including the if statement, on each iteration:
for i in a_list:
    if condition:
        code()
If you write something like this, with the for loop inside the if statement, you're checking the condition once and running the entire for loop only if the condition is true:
if condition:
    for i in a_list:
        code()
I think you want the second one, because that one only checks the condition once, at the start. It does that because the if statement isn't inside the loop. Remember that everything inside the loop is run on each iteration.

What is the best way on Python to identify if a break occurred on the last element iteration due to a condition within?

I saw some similar questions to this, but none seems to address this specific question, so I don't know if I am overlooking something since I am new to Python.
Here is the context for the question:
for i in range(10):
    if something_happens(i):
        break
if(something_happened_on_last_position()):
    # do something
From my C background, if I had a for (i=0;i<10;i++) doing the same thing with a break, then the value of i would be 10 if the break didn't occur, and 9 if it occurred on the last element. That means the method something_happened_on_last_position() could use this fact to distinguish between both events. However, what I noticed in Python is that i stops at 9 even after running the whole loop without a break.
While making a distinction between both could be as simple as adding a flag variable, I never liked such usage in C. So I was curious: is there another alternative to do this, or am I missing something silly here?
Do notice that I can't just use range(11), because this would run something_happens(10). C differs here: 10 would fail the condition in the for loop, so something_happens(10) would never execute (since we start from index 0, the value is 10 in both Python and C).
I used the methods just to illustrate which code chunk I was interested in; they stand in for a set of other conditions that are irrelevant for explaining the problem.
Thank you!
It works the other way:
for i in range(10):
    if something_happens(i):
        break
else:  # no break in any position
    do whatever
This is precisely what the else clause is for on for loops:
for i in range(10):
    if something_happens(i):
        break
else:
    # Never hit the break
The else clause is confusing to many. Think of it as the else that goes with all those ifs you executed in the loop: the else clause runs if the break never does. More about this: For/else
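As a runnable illustration, the sketch below reproduces the C-style convention from the question: the index is one past the end when no break happened. The helper name first_match_index and the lambda predicates are made up for the example.

```python
def first_match_index(values, predicate):
    """Return the index of the first match, or len(values) if none matched."""
    for i, value in enumerate(values):
        if predicate(value):
            break
    else:
        # Runs only when the loop ended without break (also for empty input).
        i = len(values)
    return i


print(first_match_index(range(10), lambda v: v == 9))   # 9  (break on the last element)
print(first_match_index(range(10), lambda v: v > 99))   # 10 (no break at all)
```

The two cases now give different results without a separate flag variable and without ever calling the predicate on an eleventh value.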
