Suggestions to implement an automatic counter in Python

In TeX, we have count registers that get updated automatically each time a figure or reference is created, so the figure counter goes up on its own.
I wanted to do something similar in Python for counters: each time I need the counter, it should already hold the new value without me needing to add
A += 1
Thanks

Use itertools.count(). This is an iterator, so use the next() function to advance the object to the next value:
from itertools import count
yourcounter = count()
next_counted_value = next(yourcounter)
You can create a lambda to wrap the function:
yourcounter = lambda c=count(): next(c)
Or use a functools.partial() object:
from functools import partial
yourcounter = partial(next, count())
Then call the object each time:
next_counted_value = yourcounter()
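For instance, numbering figures the way TeX does could look like this (a small sketch; count(1) makes the counter start at 1 instead of the default 0):
from functools import partial
from itertools import count

figure_counter = partial(next, count(1))  # starts at 1, like a figure counter
print(figure_counter())  # 1
print(figure_counter())  # 2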

Related

Any way to convert pandas DataFrame itertuples object to dictionary without using a private function?

This is what I want to do to drive a computational experiment:
import pandas

def foo(a=1, b=2):
    print("In foo with %s" % str(locals()))
    return a + b

def expand_grid(dictionary):
    from itertools import product
    return pandas.DataFrame([row for row in product(*dictionary.values())],
                            columns=dictionary.keys())

experiment = {"a": [1, 2], "b": [10, 12]}
grid = expand_grid(experiment)
for g in grid.itertuples(index=False):
    foo(**g._asdict())
That works, but the issue is the private _asdict() call. The convention in Python is that names starting with "_" are internal and should not be used by external code, so here is my question:
Can you do the above without _asdict(), and if so, how?
Also note that while foo(a=g.a, b=g.b) is a solution, the actual code is a heavily parameterized call, and for my own knowledge I was just trying to figure out how to treat the row as kwargs without the "_" call, if that is possible.
Thanks for the hint.
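One public-API possibility (a sketch, not from the original thread; it assumes each row can be materialized as a plain dict) is to skip itertuples() and use pandas' public DataFrame.to_dict():
# Sketch: avoid the private _asdict() via the public to_dict(orient="records")
for row in grid.to_dict(orient="records"):
    foo(**row)  # row is already a plain {column: value} dict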

Dictionary With Lambda Values Updates All Entries

I'm in Python 2.7. I have two classes and one namedtuple. One class houses a dictionary as an instance attribute and a function that assigns to that dictionary. (This is a very simplified version of the situation.) The namedtuple is simple enough. The other class adds entries into test_dict via the add_to_test_dict function call.
Then I instantiate DictManipulator and call the test function:
from collections import namedtuple

class DictHolder(object):
    def __init__(self):
        self.test_dict = {}

    def add_to_test_dict(self, key, val):
        self.test_dict[key] = val

TestTuple = namedtuple('TestTuple', 'name data')

class DictManipulator(object):
    def test(self):
        named_tuple_list = [TestTuple(name='key1', data=1),
                            TestTuple(name='key2', data=1000)]
        self.my_dh = DictHolder()
        for item in named_tuple_list:
            self.my_dh.add_to_test_dict(item.name, lambda: item.data)

my_dm = DictManipulator()
my_dm.test()
print('key1 value: ', my_dm.my_dh.test_dict['key1']())
print('key2 value: ', my_dm.my_dh.test_dict['key2']())
# ('key1 value: ', 1000)
# ('key2 value: ', 1000)
Why do both keys return the same value there? I have experimented enough to say that the original named_tuple_list is not updated, and I've tried to use lambda: copy.deepcopy(item.data), but that doesn't work either. Thanks very much, folks.
This is a typical late-binding issue (see the common gotchas): when the functions are called (their being lambdas/anonymous has nothing to do with it), they look up the current value of item, which is the last one from the loop. Try
lambda x=item: x.data
in your loop instead. This works because default arguments are bound to a function at definition time, while ordinary local variables are looked up at call time.
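Applied to the loop above, the fix would look like this (a minimal sketch):
for item in named_tuple_list:
    # the default argument binds the *current* item at definition time
    self.my_dh.add_to_test_dict(item.name, lambda x=item: x.data)
# ('key1 value: ', 1)
# ('key2 value: ', 1000)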
Similar (possible duplicate) question: Python Lambda in a loop

How to pass a function with more than one argument to python concurrent.futures.ProcessPoolExecutor.map()?

I would like concurrent.futures.ProcessPoolExecutor.map() to call a function that takes 2 or more arguments. In the example below, I have resorted to using a lambda function and defining ref as a list of the same size as numberlist, filled with an identical value.
1st Question: Is there a better way of doing this? In the case where numberlist can be millions to billions of elements in size, ref would have to grow with it, and this approach unnecessarily takes up precious memory, which I would like to avoid. I did this because I read that the map function terminates its mapping when the shortest iterable is exhausted.
import concurrent.futures as cf

nmax = 10
numberlist = range(nmax)
ref = [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
workers = 3

def _findmatch(listnumber, ref):
    print('def _findmatch(listnumber, ref):')
    x = ''
    listnumber = str(listnumber)
    ref = str(ref)
    print('listnumber = {0} and ref = {1}'.format(listnumber, ref))
    if ref in listnumber:
        x = listnumber
    print('x = {0}'.format(x))
    return x

a = map(lambda x, y: _findmatch(x, y), numberlist, ref)
for n in a:
    print(n)
    if str(ref[0]) in n:
        print('match')

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    #for n in executor.map(_findmatch, numberlist):
    for n in executor.map(lambda x, y: _findmatch(x, ref), numberlist, ref):
        print(type(n))
        print(n)
        if str(ref[0]) in n:
            print('match')
Running the code above, I found that the built-in map function achieved my desired outcome. However, when I transferred the same terms to concurrent.futures.ProcessPoolExecutor.map(), Python 3.5 failed with this error:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 241, in _feed
    obj = ForkingPickler.dumps(obj)
  File "/usr/lib/python3.5/multiprocessing/reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function <lambda> at 0x7fd2a14db0d0>: attribute lookup <lambda> on __main__ failed
Question 2: Why did this error occur and how do I get concurrent.futures.ProcessPoolExecutor.map() to call a function with more than 1 argument?
To answer your second question first, you are getting an exception because a lambda function like the one you're using is not picklable. Since Python uses the pickle protocol to serialize the data passed between the main process and the ProcessPoolExecutor's worker processes, this is a problem. It's not clear why you are using a lambda at all. The lambda you had takes two arguments, just like the original function. You could use _findmatch directly instead of the lambda and it should work.
with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    for n in executor.map(_findmatch, numberlist, ref):
        ...
As for the first issue about passing the second, constant argument without creating a giant list, you could solve this in several ways. One approach might be to use itertools.repeat to create an iterable object that repeats the same value forever when iterated on.
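That approach might look like this (a sketch reusing the executor setup above):
import itertools

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    # repeat(5) yields 5 indefinitely; map stops when numberlist runs out
    for n in executor.map(_findmatch, numberlist, itertools.repeat(5)):
        ...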
But a better approach would probably be to write an extra function that passes the constant argument for you. (Perhaps this is why you were trying to use a lambda function?) It should work if the function you use is accessible at the module's top-level namespace:
def _helper(x):
    return _findmatch(x, 5)

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    for n in executor.map(_helper, numberlist):
        ...
(1) No need to make a list. You can use itertools.repeat to create an iterator that just repeats the same value.
(2) You need to pass a named function to map because it will be passed to the subprocess for execution. map uses the pickle protocol to send things; lambdas can't be pickled, and therefore they can't be part of the map. But it's totally unnecessary: all your lambda did was call a 2-parameter function with 2 parameters. Remove it completely.
The working code is
import concurrent.futures as cf
import itertools

nmax = 10
numberlist = range(nmax)
workers = 3

def _findmatch(listnumber, ref):
    print('def _findmatch(listnumber, ref):')
    x = ''
    listnumber = str(listnumber)
    ref = str(ref)
    print('listnumber = {0} and ref = {1}'.format(listnumber, ref))
    if ref in listnumber:
        x = listnumber
    print('x = {0}'.format(x))
    return x

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    #for n in executor.map(_findmatch, numberlist):
    for n in executor.map(_findmatch, numberlist, itertools.repeat(5)):
        print(type(n))
        print(n)
        #if str(ref[0]) in n:
        #    print('match')
Regarding your first question, do I understand correctly that you want to pass an argument whose value is determined only at the time you call map, but is constant for all invocations of the mapped function? If so, I would do the map with a function derived from a "template function", with the second argument (ref in your example) baked into it using functools.partial:
from functools import partial

refval = 5

def _findmatch(ref, listnumber):  # arguments swapped
    ...

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    for n in executor.map(partial(_findmatch, refval), numberlist):
        ...
Re. question 2, first part: I haven't found the exact piece of code that pickles (serializes) the function to be executed in parallel, but it is natural that this has to happen: not only the arguments but also the function itself has to be transferred to the workers somehow, and it likely has to be serialized for this transfer. The fact that partial functions can be pickled while lambdas cannot is mentioned elsewhere, for instance here: https://stackoverflow.com/a/19279016/6356764.
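A quick way to see the difference for yourself (a sketch using the pickle module directly):
import pickle
from functools import partial

def add(a, b):
    return a + b

pickle.dumps(partial(add, 1))      # works: partial wrapping a module-level function
pickle.dumps(lambda b: add(1, b))  # raises PicklingError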
Re. question 2, second part: if you want to call a function with more than one argument in ProcessPoolExecutor.map, you pass it the function as the first argument, followed by an iterable of first arguments for the function, followed by an iterable of its second arguments, etc. In your case:
for n in executor.map(_findmatch, numberlist, ref):
    ...

takewhile in itertools takes one value too many

I have the following code to stop an iterator at a certain value, save the state up to that value, and return both the saved state and the original iterator. I use takewhile from itertools to get the values up to the given break_point, and then use chain to merge the saved values and the original iterator back together:
from itertools import takewhile, chain

def iter_break(iterator_input, break_point):
    new_iter = list(takewhile(lambda x: x <= break_point - 1, iterator_input))
    return chain(iter(new_iter), iterator_input)

import unittest

class TestEqual(unittest.TestCase):
    def test_iters(self):
        it = iter(range(20))
        old_it = iter_break(it, 10)
        self.assertEqual(list(it), list(old_it))

if __name__ == '__main__':
    unittest.main()
The problem is that in the end the returned iterator and the original iterator are not equal: the returned one is missing a value, namely the value equal to the break point itself. Please help.
it is not just missing the breakpoint value, it's missing all the values before it, because it is just an iterator created with iter and not a list, so iter_break uses up values as it iterates over them with takewhile. This includes the breakpoint itself, because takewhile needs to see that value in order to know that the condition is no longer satisfied.
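One way to keep the value that takewhile swallows is to re-insert it by hand (a sketch; it assumes the first value failing the predicate is exactly break_point):
from itertools import takewhile, chain

def iter_break(iterator_input, break_point):
    # takewhile consumes the first failing value, so put it back explicitly
    head = list(takewhile(lambda x: x < break_point, iterator_input))
    return chain(head, [break_point], iterator_input)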

python: iterator from a function

What is an idiomatic way to create an infinite iterator from a function? For example,
from itertools import islice
import random

rand_characters = to_iterator(random.randint(0, 256))  # to_iterator is the function I'm asking for
print ' '.join(str(n) for n in islice(rand_characters, 100))
would produce 100 random numbers
You want an iterator which continuously yields values until you stop asking it for new ones? Simply use
it = iter(function, sentinel)
which calls function() for each iteration step until the result == sentinel.
So choose a sentinel which can never be returned by your wanted function, such as None, in your case.
rand_iter = lambda start, end: iter(lambda: random.randint(start, end), None)
rand_bytes = rand_iter(0, 256)
If you want to monitor some state on your machine, you could do
iter_mystate = iter(getstate, None)
which, in turn, infinitely calls getstate() for each iteration step.
But beware of functions returning None as a valid value! In this case, you should choose a sentinel which is guaranteed to be unique, maybe an object created for exactly this job:
iter_mystate = iter(getstate, object())
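For what it's worth, the same two-argument iter() idiom is the classic way to read a file in fixed-size chunks until read() returns an empty bytes object (a sketch; the filename is hypothetical):
with open('data.bin', 'rb') as f:
    for chunk in iter(lambda: f.read(4096), b''):
        print(len(chunk))  # do something with each chunk of up to 4 KiB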
Every time I see iter with 2 arguments, I need to scratch my head and look up the documentation to figure out exactly what is going on. Simply because of that, I would probably roll my own:
def call_forever(callback):
    while True:
        yield callback()
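Hooked up to the original example, it could be used like this (sketch):
import random
from itertools import islice

rand_characters = call_forever(lambda: random.randint(0, 256))
print(' '.join(str(n) for n in islice(rand_characters, 100)))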
Or, as stated in the comments by Jon Clements, you could use the itertools.repeatfunc recipe which allows you to pass arguments to the function as well:
import itertools as it

def repeatfunc(func, times=None, *args):
    """
    Repeat calls to func with specified arguments.
    Example: repeatfunc(random.random)
    """
    if times is None:
        return it.starmap(func, it.repeat(args))
    return it.starmap(func, it.repeat(args, times))
I think the function signature def repeatfunc(func, times=None, *args) is a little awkward, though. I'd prefer to pass a tuple as args (it seems more explicit to me, and "explicit is better than implicit"):
import itertools as it

def repeatfunc(func, args=(), times=None):
    """
    Repeat calls to func with specified arguments.
    Example: repeatfunc(random.random)
    """
    if times is None:
        return it.starmap(func, it.repeat(args))
    return it.starmap(func, it.repeat(args, times))
which allows it to be called like:
repeatfunc(func, (arg1, arg2, ..., argN), times=4)  # repeat 4 times
repeatfunc(func, (arg1, arg2, ...))                 # repeat infinitely
instead of the vanilla version from itertools:
repeatfunc(func, 4, arg1, arg2, ...)     # repeat 4 times
repeatfunc(func, None, arg1, arg2, ...)  # repeat infinitely
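A quick usage sketch of the modified version, drawing five random 2-element samples (assuming the args-tuple variant defined above):
import random

for pair in repeatfunc(random.sample, ([1, 2, 3, 4], 2), times=5):
    print(pair)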
