takewhile in itertools takes one value too many - Python

I have the following code that stops an iterator at a certain value, saves the values seen up to that point, and returns both the saved values and the rest of the original iterator. I use takewhile from itertools to collect the values up to the given break_point, then chain to merge the saved values back with the initial iterator:
from itertools import takewhile, chain

def iter_break(iterator_input, break_point):
    new_iter = list(takewhile(lambda x: x <= break_point - 1, iterator_input))
    return chain(iter(new_iter), iterator_input)
import unittest

class TestEqual(unittest.TestCase):
    def test_iters(self):
        it = iter(range(20))
        old_it = iter_break(it, 10)
        self.assertEqual(list(it), list(old_it))

if __name__ == '__main__':
    unittest.main()
The problem is that in the end the returned iterator and the original iterator do not compare equal: the result is missing one value, namely the value equal to the break point itself. Please help.

It is not just missing the break-point value; it's missing all the values before it as well, because it is just an iterator created with iter and not a list, so iter_break uses up values as it iterates over them with takewhile. This includes the break point itself, because takewhile needs to consume that value in order to know that the condition is no longer satisfied, and a plain iterator gives it no way to put the value back.
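One way around this, as a minimal sketch keeping the iter_break name from the question: consume the prefix manually, so the first value that fails the test can be chained back in front of the remainder instead of being thrown away:

from itertools import chain

def iter_break(iterator_input, break_point):
    saved = []
    for x in iterator_input:
        if x >= break_point:
            # Chain the saved prefix, the value that ended the loop,
            # and whatever is left of the original iterator.
            return chain(saved, [x], iterator_input)
        saved.append(x)
    return iter(saved)  # input ran out before reaching the break point

With this version, list(old_it) yields all twenty values, break point included; building that list consumes the original it, so the test should compare against list(range(20)) rather than list(it).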

Related

Parallelizing a list comprehension in Python

someList = [x for x in someList if not isOlderThanXDays(x, XDays, DtToday)]
I have this line, and the function isOlderThanXDays makes some API calls that cause it to take a while. I would like to perform this using multiprocessing/parallel processing in Python. The order in which the list is processed doesn't matter (so it can be asynchronous, I think).
The function isOlderThanXDays essentially returns a boolean value, and everything newer than X days is kept in the new list by the list comprehension.
Edit:
Params of the function: XDays is passed in by the user, let's say 60 days, and DtToday is today's date (a datetime object). I then make API calls to read the file's modified-date metadata; if the file is older I return True, otherwise False.
I am looking for something similar to the question below. The difference is that in that question there is one output for every list input, whereas mine filters the list based on the boolean value returned by the function, so I don't know how to apply it to my scenario:
How to parallelize list-comprehension calculations in Python?
This should run all of your checks in parallel, and then filter out the ones that failed the check.
import multiprocessing

try:
    cpus = multiprocessing.cpu_count()
except NotImplementedError:
    cpus = 2  # arbitrary default

def MyFilterFunction(x):
    if not isOlderThanXDays(x, XDays, DtToday):
        return x
    return None

pool = multiprocessing.Pool(processes=cpus)
parallelized = pool.map(MyFilterFunction, someList)
newList = [x for x in parallelized if x is not None]  # drop the failed checks
You can use a ThreadPool:
from functools import partial
from multiprocessing.pool import ThreadPool  # supports applying a function to many arguments concurrently

NUMBER_CALLS_SAME_TIME = 10  # take care to avoid throttling

# Assume that the signature is isOlderThanXDays(x, XDays, DtToday)
my_api_call_func = partial(isOlderThanXDays, XDays=XDays, DtToday=DtToday)

pool = ThreadPool(NUMBER_CALLS_SAME_TIME)
responses = pool.map(my_api_call_func, someList)
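Note that responses is a list of booleans here, one per input, rather than the filtered list itself. Assuming that layout, the final filtering step is a matter of zipping the answers back against the inputs, as in this sketch:

newList = [x for x, too_old in zip(someList, responses) if not too_old]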

Python: how to return two values at different points of a calculation in a function?

I want a function that processes data in a predefined pipeline, and I need the data at both an intermediate step and the final step. So I want a function I can call twice: on the first call the function returns the intermediate data, and on the second call it resumes from the last return position and returns the final data, something like this:
def tworeturns(x):
    intermediate = do_something(x)
    return intermediate                  # first call should stop here
    final = do_something(intermediate)   # second call should resume here
    return final
How can I implement this in Python?
Edit:
Thanks to new-dev-123's answer I was able to return two values, but then I ran into another problem. After the first yield I modify the intermediate value, and when I call next the second time, I want the function to compute based on the modified intermediate, not the original one. Is there a way to do that?
Using a generator like a coroutine allows you to effectively pause the function, and you can drive it using next():
>>> def tworeturns(x):
...     print(f"hi {x}")
...     yield
...     print(f"bye {x}")
...     yield
...
>>> corou = tworeturns("bob")
>>> next(corou)
hi bob
>>> next(corou)
bye bob
Here's a quick demo I did on the CLI.
So for your example you would do something like:
def tworeturns(x):
    intermediate = do_something(x)
    yield intermediate
    final = do_something(intermediate)
    yield final
corou = tworeturns(x)
first_value = next(corou)
second_value = next(corou)
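To address the edit (recomputing from a modified intermediate): a generator can also receive values, via send(). A minimal sketch, where new_intermediate stands for whatever modification you make between the two calls:

def tworeturns(x):
    intermediate = do_something(x)
    # Whatever the caller sends in replaces the intermediate for step two.
    modified = yield intermediate
    if modified is not None:
        intermediate = modified
    yield do_something(intermediate)

corou = tworeturns(x)
first_value = corou.send(None)               # same as next(corou)
second_value = corou.send(new_intermediate)  # resumes with the modified value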
OK, that's a nice question; I sometimes run into similar situations.
As one of the comments to your question says, you can use a generator, which literally answers your question.
But I like better the following solution:
def tworeturns(x, intermediate=None):
    if intermediate is None:
        return do_something(x)
    else:
        return do_something(intermediate)
Of course, you will now need to call tworeturns a second time with the intermediate result. But the code is clear, and also stateless.
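That is, the call pattern would look like this sketch:

intermediate = tworeturns(x)
# ...modify intermediate here if needed...
final = tworeturns(x, intermediate)

Because the function keeps no state, whatever you pass in as intermediate is exactly what the second step computes from, which also covers the edit above.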
Return both values (which will give you a tuple):
def tworeturns(x):
    intermediate = do_something(x)
    final = do_something(intermediate)
    return intermediate, final

Mocking subprocess.check_call more than once

I have a function that calls subprocess.check_call() twice. I want to test all of their possible outputs: I want to be able to set the first check_call() to return 1 and the second to return 0, and to do so for all possible combinations. Below is what I have so far; I am not sure how to adjust the expected return value:
@patch('subprocess.check_call')
def test_hdfs_dir_func(mock_check_call):
    for p, d in list(itertools.product([1, 0], repeat=2)):
        if p or d:
You can assign the side_effect of your mock to an iterable, and the mock will then return the next value from that iterable each time it's called. In this case, you could do something like this:
import copy
import itertools
import subprocess
from unittest.mock import patch

@patch('subprocess.check_call')
def test_hdfs_dir_func(mock_check_call):
    return_values = itertools.product([0, 1], repeat=2)
    # Flatten the pairs; only one return value per call
    mock_check_call.side_effect = itertools.chain.from_iterable(copy.copy(return_values))
    for p, d in return_values:
        assert p == subprocess.check_call()
        assert d == subprocess.check_call()
Note a few things:
I don't have your original functions so I put my own calls to check_call in the loop.
I'm using copy on the original itertools.product iterator because otherwise the side_effect would consume the same iterator the loop reads from, exhausting it. What we want is two independent iterators: one for the mock's side_effect and one for you to loop through in your test.
You can do other neat stuff with side_effect, not just raise exceptions. As shown above, you can change the return value across multiple calls: https://docs.python.org/3/library/unittest.mock-examples.html#side-effect-functions-and-iterables
Not only that: you can see from the link above that you can also give it a function. That allows even more complex logic when keeping track of multiple mock calls.
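For instance, here is a sketch of a callable side_effect; the call recording and the alternating return codes are just illustrative:

from unittest.mock import MagicMock

calls = []

def fake_check_call(*args, **kwargs):
    # Record every invocation and alternate the return code.
    calls.append(args)
    return len(calls) % 2

mock_check_call = MagicMock(side_effect=fake_check_call)
print(mock_check_call('a'), mock_check_call('b'))  # 1 0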

Suggestions to implement an automatic counter in Python

In TeX, we have count variables that are automatically updated each time they are referenced; that way the figure counter goes up automatically.
I wanted to do something similar for counters in Python: each time I need the counter it should already have the new value, without me needing to add
A += 1
Thanks
Use itertools.count(); it is an iterator, so use the next() function to advance the object to the next value:
from itertools import count
yourcounter = count()
next_counted_value = next(yourcounter)
You can create a lambda to wrap the function:
yourcounter = lambda c=count(): next(c)
Or use a functools.partial() object:
from functools import partial
yourcounter = partial(next, count())
Then call the object each time:
next_counted_value = yourcounter()
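As a usage sketch tying this back to the figure-counter motivation (the caption strings are made up):

from functools import partial
from itertools import count

fig_num = partial(next, count(1))  # start numbering at 1

print(f"Figure {fig_num()}: setup")    # Figure 1: setup
print(f"Figure {fig_num()}: results")  # Figure 2: results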

Python: iterator from a function

What is an idiomatic way to create an infinite iterator from a function? For example
from itertools import islice
import random

rand_characters = to_iterator(random.randint(0, 256))  # to_iterator is the helper I am after
print(' '.join(islice(rand_characters, 100)))
would produce 100 random numbers
You want an iterator which continuously yields values until you stop asking it for new ones? Simply use
it = iter(function, sentinel)
which calls function() for each iteration step until the result == sentinel.
So choose a sentinel which can never be returned by your wanted function, such as None in your case.
rand_iter = lambda start, end: iter(lambda: random.randint(start, end), None)  # iter() needs a zero-argument callable
rand_bytes = rand_iter(0, 256)
If you want to monitor some state on your machine, you could do
iter_mystate = iter(getstate, None)
which, in turn, infinitely calls getstate() for each iteration step.
But beware of functions returning None as a valid value! In this case, you should choose a sentinel which is guaranteed to be unique, maybe an object created for exactly this job:
iter_mystate = iter(getstate, object())
Every time I see iter with two arguments, I need to scratch my head and look up the documentation to figure out exactly what is going on. Simply because of that, I would probably roll my own:
def call_forever(callback):
    while True:
        yield callback()
Or, as stated in the comments by Jon Clements, you could use the itertools.repeatfunc recipe which allows you to pass arguments to the function as well:
import itertools as it

def repeatfunc(func, times=None, *args):
    """
    Repeat calls to func with specified arguments.
    Example: repeatfunc(random.random)
    """
    if times is None:
        return it.starmap(func, it.repeat(args))
    return it.starmap(func, it.repeat(args, times))
Although I think that the function signature def repeatfunc(func, times=None, *args) is a little awkward. I'd prefer to pass a tuple as args (it seems more explicit to me, and "explicit is better than implicit"):
import itertools as it

def repeatfunc(func, args=(), times=None):
    """
    Repeat calls to func with specified arguments.
    Example: repeatfunc(random.random)
    """
    if times is None:
        return it.starmap(func, it.repeat(args))
    return it.starmap(func, it.repeat(args, times))
which allows it to be called like:
repeatfunc(func, (arg1, arg2, ..., argN), times=4)  # repeat 4 times
repeatfunc(func, (arg1, arg2, ...))                 # repeat infinitely
instead of the vanilla version from itertools:
repeatfunc(func, 4, arg1, arg2, ...)     # repeat 4 times
repeatfunc(func, None, arg1, arg2, ...)  # repeat infinitely
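As a quick usage check of the modified version (the actual numbers will vary):

import random
from itertools import islice

stream = repeatfunc(random.random)  # no arguments, repeat infinitely
print(list(islice(stream, 3)))      # e.g. [0.417..., 0.905..., 0.072...]
print(list(repeatfunc(random.random, times=2)))  # exactly two values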
