I have a function f that takes an int and returns a bool. I want to find the minimum non-negative integer x for which f(x) is False. How can I do it in the most Pythonic way (ideally in one line)?
Here is how I do it now:
x = 0
while f(x):
    x += 1
print(x)
I want something like:
x = <perfect one line expression>
print(x)
Here it is, using next:
from itertools import count
x = next(i for i in count() if not f(i))
Demo:
>>> def f(x):
...     return (x - 42)**2
...
>>> next(i for i in count() if not f(i))
42
A similar functional approach with itertools.filterfalse and itertools.count could be
from itertools import filterfalse, count
x = next(filterfalse(f, count()))
Or you can swap filterfalse out for dropwhile, which performs about the same but keeps the same syntax across Python 2 and 3 (thanks to rici).
from itertools import dropwhile, count
x = next(dropwhile(f, count()))
If you'd like a single line without imports, one way might be a list comprehension (Python 2.7 / PyPy):
def f(x):
    return x != 5

x = [g(0) for g in [lambda x: x if not f(x) else g(x+1)]][0]
print(x)
Is it okay to use the yield statement in an instance method of a class? For example,
# Similar to itertools.islice
class Nth(object):
    def __init__(self, n):
        self.n = n
        self.i = 0
        self.nout = 0

    def itervalues(self, x):
        for xi in x:
            self.i += 1
            if self.i == self.n:
                self.i = 0
                self.nout += 1
                yield self.nout, xi
Python doesn't complain about this, and simple cases seem to work. However, I've only seen examples with yield from regular functions.
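For instance, a simple case with the class above behaves as expected:
>>> nth = Nth(3)
>>> list(nth.itervalues(range(9)))
[(1, 2), (2, 5), (3, 8)]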
I start having problems when I try to use it with itertools functions. For example, suppose I have two large data streams X and Y that are stored across multiple files, and I want to compute their sum and difference with only one loop through the data. I could use itertools.tee and itertools.izip for this. In code it would be something like this (sorry, it's long):
from itertools import izip_longest, izip, tee
import random

def add(x, y):
    for xi, yi in izip(x, y):
        yield xi + yi

def sub(x, y):
    for xi, yi in izip(x, y):
        yield xi - yi

class NthSumDiff(object):
    def __init__(self, n):
        self.nthsum = Nth(n)
        self.nthdiff = Nth(n)

    def itervalues(self, x, y):
        xadd, xsub = tee(x)
        yadd, ysub = tee(y)
        gen_sum = self.nthsum.itervalues(add(xadd, yadd))
        gen_diff = self.nthdiff.itervalues(sub(xsub, ysub))
        # Have to use izip_longest here, but why?
        #for (i, nthsum), (j, nthdiff) in izip_longest(gen_sum, gen_diff):
        for (i, nthsum), (j, nthdiff) in izip(gen_sum, gen_diff):
            assert i == j, "sum row %d != diff row %d" % (i, j)
            yield nthsum, nthdiff

nskip = 12
ns = Nth(nskip)
nd = Nth(nskip)
nsd = NthSumDiff(nskip)
nfiles = 10
for i in range(nfiles):
    # Generate some data.
    # If the block length is a multiple of nskip there's no problem.
    #n = random.randint(5000, 10000) * nskip
    n = random.randint(50000, 100000)
    print 'file %d n=%d' % (i, n)
    x = range(n)
    y = range(100, n+100)
    # Independent processing is no problem but requires two loops.
    for i, nthsum in ns.itervalues(add(x, y)):
        pass
    for j, nthdiff in nd.itervalues(sub(x, y)):
        pass
    assert i == j
    # Trying to do both with one loop causes problems.
    for nthsum, nthdiff in nsd.itervalues(x, y):
        # If izip_longest is necessary, why don't I ever get a fillvalue?
        assert nthsum is not None
        assert nthdiff is not None
    # After each block of data the two iterators should have the same state.
    assert nsd.nthsum.nout == nsd.nthdiff.nout, \
        "sum nout %d != diff nout %d" % (nsd.nthsum.nout, nsd.nthdiff.nout)
But this fails unless I swap itertools.izip out for itertools.izip_longest, even though the iterators have the same length. It's the last assert that gets hit, with output like
file 0 n=58581
file 1 n=87978
Traceback (most recent call last):
File "test.py", line 71, in <module>
"sum nout %d != diff nout %d" % (nsd.nthsum.nout, nsd.nthdiff.nout)
AssertionError: sum nout 12213 != diff nout 12212
Edit: I guess it's not obvious from the example I wrote, but the input data X and Y are only available in blocks (in my real problem they're chunked in files). This is important because I need to maintain state between blocks. In the toy example above, this means Nth needs to yield the equivalent of
>>> x1 = range(0,10)
>>> x2 = range(10,20)
>>> (x1 + x2)[::3]
[0, 3, 6, 9, 12, 15, 18]
NOT the equivalent of
>>> x1[::3] + x2[::3]
[0, 3, 6, 9, 10, 13, 16, 19]
I could use itertools.chain to join the blocks ahead of time and then make one call to Nth.itervalues, but I'd like to understand what's wrong with maintaining state in the Nth class between calls (my real app is image processing involving more saved state, not simple Nth/add/subtract).
I don't understand how my Nth instances end up in different states when their lengths are the same. For example, if I give izip two strings of equal length
>>> [''.join(x) for x in izip('ABCD','abcd')]
['Aa', 'Bb', 'Cc', 'Dd']
I get a result of the same length; how come my Nth.itervalues generators seem to be getting unequal numbers of next() calls even though each one yields the same number of results?
Quick answer
You never reset self.i and self.nout in class Nth. Also, you should have used something like this:
# Similar to itertools.islice
from itertools import islice

class Nth(object):
    def __init__(self, n):
        self.n = n

    def itervalues(self, x):
        for a, b in enumerate(islice(x, self.n - 1, None, self.n)):
            self.nout = a
            yield a, b
but since you don't even need nout, you should use this:
import itertools

def Nth(iterable, step):
    return enumerate(itertools.islice(iterable, step - 1, None, step))
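For example, with step 4:
>>> list(Nth(range(12), 4))
[(0, 3), (1, 7), (2, 11)]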
Long answer
Your code had an off-by-one smell that led me to this line in NthSumDiff.itervalues():
for (i,nthsum), (j,nthdiff) in izip(gen_sum, gen_diff):
If you swap gen_sum and gen_diff, you'll see that gen_diff will always be the one with nout greater by one. This is because izip() pulls from gen_sum before pulling from gen_diff. gen_sum raises a StopIteration exception before gen_diff is even tried in the last iteration.
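You can see the pull order with a pair of toy generators (a minimal sketch; noisy is just an illustrative name, and izip is Python 2's, zip in Python 3):
>>> from itertools import izip
>>> def noisy(name, n):
...     for k in range(n):
...         yield k
...     print '%s exhausted' % name
...
>>> list(izip(noisy('A', 2), noisy('B', 2)))
A exhausted
[(0, 0), (1, 1)]
The first argument gets one extra next() call and runs to completion; the second is never advanced past its last yielded value.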
For example, say you pick N samples where N % step == 7. At the end of each iteration, self.i for the Nth instances should equal 0. But on the very last iteration, self.i in gen_sum will increment up to 7 and then there will be no more elements in x. It will raise StopIteration. gen_diff is still sitting at self.i equal to 0, though.
If you add self.i = 0 and self.nout = 0 to the beginning of Nth.itervalues(), the problem goes away.
Lesson
You only had this problem because your code is too complicated and not Pythonic. If you find yourself using lots of counters and indexes in loops, that's a good sign (in Python) to take a step back and see if you can simplify your code. I have a long history of C programming, and consequently, I still catch myself doing the same thing from time to time in Python.
Simpler implementation
Putting my money where my mouth is...
from itertools import izip, islice
import random

def sumdiff(x, y, step):
    # filter for the Nth values of x and y now
    x = islice(x, step-1, None, step)
    y = islice(y, step-1, None, step)
    return ((xi + yi, xi - yi) for xi, yi in izip(x, y))

nskip = 12
nfiles = 10
for i in range(nfiles):
    # Generate some data.
    n = random.randint(50000, 100000)
    print 'file %d n=%d' % (i, n)
    x = range(n)
    y = range(100, n+100)
    for nthsum, nthdiff in sumdiff(x, y, nskip):
        assert nthsum is not None
        assert nthdiff is not None
    assert len(list(sumdiff(x, y, nskip))) == n/nskip
More explanation of the problem
In response to Brian's comment:
This doesn't do the same thing. Not resetting i and nout is intentional. I've basically got a continuous data stream X that's split across several files. Slicing the blocks gives a different result than slicing the concatenated stream (I commented earlier about possibly using itertools.chain). Also my actual program is more complicated than mere slicing; it's just a working example. I don't understand the explanation about the order of StopIteration. If izip('ABCD','abcd') --> Aa Bb Cc Dd then it seems like equal-length generators should get an equal number of next calls, no? – Brian Hawkins
Your problem was so long that I missed the part about the stream coming from multiple files. Let's just look at the code itself. First, we need to be really clear about how itervalues(x) actually works.
# Similar to itertools.islice
class Nth(object):
    def __init__(self, n):
        self.n = n
        self.i = 0
        self.nout = 0

    def itervalues(self, x):
        for xi in x:
            # We increment self.i by self.n on every next()
            # call to this generator method unless the
            # number of objects remaining in x is less than
            # self.n. In that case, we increment by that amount
            # before the for loop exits normally.
            self.i += 1
            if self.i == self.n:
                self.i = 0
                self.nout += 1
                # We're yielding, so we're a generator
                yield self.nout, xi
        # Python helpfully raises StopIteration to fulfill the
        # contract of an iterable. That's how for loops and
        # others know when to stop.
In itervalues(x) above, for every next() call, it internally increments self.i by self.n and then yields OR it increments self.i by the number of objects remaining in x and then exits the for loop and then exits the generator (itervalues() is a generator because it yields). When the itervalues() generator exits, Python raises a StopIteration exception.
So, for every instance of class Nth initialized with N, the value of self.i after exhausting all elements in itervalues(X) will be:
self.i = (value_of_self_i_before_itervalues(X) + len(X)) % N
Now when you iterate over izip(Nth_1, Nth_2), it will do something like this:
def izip(A, B):
    try:
        while True:
            a = A.next()
            b = B.next()
            yield a, b
    except StopIteration:
        pass
So, imagine N=10 and len(X)=13. On the very last next() call to izip(), both A and B have self.i==0 as their state. A.next() is called, increments self.i += 3, runs out of elements in X, exits the for loop, returns, and then Python raises StopIteration. Now, inside izip() we go directly to the exception block, skipping B.next() entirely. So, A.i==3 and B.i==0 at the end.
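You can check this directly with the Nth class from the question:
>>> nth = Nth(10)
>>> list(nth.itervalues(range(13)))
[(1, 9)]
>>> nth.i
3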
Second try at simplification (with correct requirements)
Here's another simplified version that treats all file data as one continuous stream. It uses chained, small, re-usable generators. I would highly, highly recommend watching this PyCon '14 talk about generators by David Beazley. Guessing from your problem description, it should be 100% applicable.
from itertools import izip, islice
import random

def sumdiff(data):
    return ((x + y, x - y) for x, y in data)

def combined_file_data(files):
    for i, n in files:
        # Generate some data.
        x = range(n)
        y = range(100, n+100)
        for data in izip(x, y):
            yield data

def filelist(nfiles):
    for i in range(nfiles):
        # Generate some data.
        n = random.randint(50000, 100000)
        print 'file %d n=%d' % (i, n)
        yield i, n

def Nth(iterable, step):
    return islice(iterable, step-1, None, step)

nskip = 12
nfiles = 10
filedata = combined_file_data(filelist(nfiles))
nth_data = Nth(filedata, nskip)
for nthsum, nthdiff in sumdiff(nth_data):
    assert nthsum is not None
    assert nthdiff is not None
Condensing the discussion, there's nothing wrong with using yield in an instance method per se. You get into trouble with izip if the instance state changes after the last yield because izip stops calling next() on its arguments once any of them stops yielding results. A clearer example might be
from itertools import izip

class Three(object):
    def __init__(self):
        self.status = 'init'

    def run(self):
        self.status = 'running'
        yield 1
        yield 2
        yield 3
        self.status = 'done'
        raise StopIteration()

it = Three()
for x in it.run():
    assert it.status == 'running'
assert it.status == 'done'

it1, it2 = Three(), Three()
for x, y in izip(it1.run(), it2.run()):
    pass
assert it1.status == 'done'
assert it2.status == 'done', "Expected status=done, got status=%s." % it2.status
which hits the last assertion,
AssertionError: Expected status=done, got status=running.
In the original question, the Nth class can consume input data after its last yield, so the sum and difference streams can get out of sync with izip. Using izip_longest would work since it will try to exhaust each iterator. A clearer solution might be to refactor to avoid changing state after the last yield.
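For comparison, a quick check with izip_longest (a minimal sketch reusing the Three class above; zip_longest in Python 3):

from itertools import izip_longest

it1, it2 = Three(), Three()
for x, y in izip_longest(it1.run(), it2.run()):
    pass
assert it1.status == 'done'
assert it2.status == 'done'  # passes: izip_longest exhausts both iterators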
I would like to perform a calculation using Python where the current value of the equation (i) is based on the previous value of the equation (i-1). This is really easy to do in a spreadsheet, but I would rather learn to code it.
I have noticed that there is loads of information on finding the previous value from a list, but I don't have a list; I need to create it! My equation is shown below.
h = (2*b) - h[i-1]
Can anyone tell me a method to do this?
I tried this sort of thing, but it will not work: when I try to do the equation I'm calling a value I haven't created yet, and if I set h = 0 then I get an error that I am out of index range.
i = 1
for i in range(1, len(b)):
    h = []
    h = (2*b) - h[i-1]
    x += 1
h = [b[0]]
for val in b[1:]:
    h.append(2 * val - h[-1])  # As you add to h, you keep up with its tail
For a large b list (brr, one-letter identifiers), you can avoid creating a large slice:

from itertools import islice  # For a big list it will keep the code less wasteful

for val in islice(b, 1, None):
    ....
As pointed out by @pad, you simply need to handle the base case of receiving the first sample.
However, your equation makes no use of i other than to retrieve the previous result. It's looking more like a running filter than something which needs to maintain a list of past values (with an array which might never stop growing).
If that is the case, and you only ever want the most recent value, then you might want to go with a generator instead.
def gen():
    def eqn(b):
        eqn.h = 2*b - eqn.h
        return eqn.h
    eqn.h = 0
    return eqn
And then use it thus:
>>> f = gen()
>>> f(2)
4
>>> f(3)
2
>>> f(2)
2
>>>
The same effect could be achieved with a true generator using yield and send.
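For example, a minimal sketch of that yield/send version (eqn_gen is a hypothetical name):

def eqn_gen(h=0):
    # b arrives via send(); the updated h is yielded back each time
    b = yield
    while True:
        h = 2*b - h
        b = yield h

>>> g = eqn_gen()
>>> g.next()        # prime the generator (next(g) in Python 3)
>>> g.send(2)
4
>>> g.send(3)
2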
First off, do you need all the intermediate values? That is, do you want a list h from 0 to i? Or do you just want h[i]?
If you just need the i-th value, you could use recursion:
def get_h(i):
    if i > 0:
        return (2*b) - get_h(i-1)
    else:
        return h_0
But be aware that this will not work for large i, as it will exceed the maximum recursion depth. (Thanks for pointing this out, kdopen.) In that case a simple for loop or a generator is better.
Even better is to use a (mathematically) closed form of the equation (for your example that is possible, it might not be in other cases):
def get_h(i):
    if i % 2 == 0:
        return h_0
    else:
        return (2*b) - h_0
In both cases h_0 is the initial value that you start out with.
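The generator alternative mentioned above could look like this (a sketch, assuming a scalar b and initial value h0):

from itertools import islice

def h_gen(b, h0=0):
    # yields h[1], h[2], ... where h[i] = 2*b - h[i-1]
    h = h0
    while True:
        h = 2*b - h
        yield h

print list(islice(h_gen(10), 5))   # [20, 0, 20, 0, 20]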
h = []
for i in range(len(b)):
    if i > 0:
        h.append(2*b - h[i-1])
    else:
        h.append(h_0)  # handle the i = 0 base case here
You are successively applying a function (equation) to the result of a previous application of that function; the process needs a seed to start it. Your result looks like this: [seed, f(seed), f(f(seed)), f(f(f(seed))), ...]. This concept is function composition. You can create a generalized function that will do this for any sequence of functions; in Python, functions are first-class objects and can be passed around just like any other object. If you need to preserve the intermediate results, use a generator.
def composition(functions, x):
    """ yields f(x), f(f(x)), f(f(f(x))) ....
    for each f in functions
    functions is an iterable of callables taking one argument
    """
    for f in functions:
        x = f(x)
        yield x
Your specs require a seed and a constant,
seed = 0
b = 10
The equation/function,
def f(x, b=b):
    return 2*b - x
f is applied b times.
functions = [f]*b
Usage
print list(composition(functions, seed))
If the intermediate results are not needed composition can be redefined as
def composition(functions, x):
    """ Returns the final value h(g(f(x)))
    for functions [f, g, h]
    functions is an iterable of callables taking one argument
    """
    for f in functions:
        x = f(x)
    return x
print composition(functions, seed)
Or more generally, with no limitations on call signature:
def compose(funcs):
    '''Return a callable composed of successive application of functions
    funcs is an iterable producing callables
    for [f, g, h] returns f(g(h(*args, **kwargs)))
    '''
    def outer(f, g):
        def inner(*args, **kwargs):
            return f(g(*args, **kwargs))
        return inner
    return reduce(outer, funcs)
def plus2(x):
    return x + 2

def times2(x):
    return x * 2

def mod16(x):
    return x % 16

funcs = (mod16, plus2, times2)
eq = compose(funcs)  # mod16(plus2(times2(x)))
print eq(15)         # -> 0
While the process definition appears to be recursive, I resisted the temptation so I could stay out of maximum recursion depth hades.
I got curious, searched SO for function composition and, of course, there are numerous relevant Q&A's.
I wrote a function "rep" that takes a function f and returns the composition of n copies of f.
So rep(square, 3) behaves like this: square(square(square(x))).
And when I pass 3 into it, rep(square, 3)(3) = 6561.
There is no problem with my code, but I was wondering if there was a way to make it "prettier" (or shorter) without having to call another function or import anything. Thanks!
def compose1(f, g):
    """Return a function h, such that h(x) = f(g(x))."""
    def h(x):
        return f(g(x))
    return h

def rep(f, n):
    newfunc = f
    count = 1
    while count < n:
        newfunc = compose1(f, newfunc)
        count += 1
    return newfunc
If you're looking for speed, the for loop is clearly the way to go. But if you're looking for theoretical academic acceptance ;-), stick to terse functional idioms. Like:
def rep(f, n):
    return f if n == 1 else lambda x: f(rep(f, n-1)(x))
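A quick check against the example from the question:
>>> def square(x):
...     return x * x
...
>>> rep(square, 3)(3)
6561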
def rep(f, n):
    def repeated(x):
        for i in xrange(n):
            x = f(x)
        return x
    return repeated
Using a for loop instead of while is shorter and more readable, and compose1 doesn't really need to be a separate function.
While I agree that repeated composition of the same function is best done with a loop, you could use *args to compose an arbitrary number of functions:
def identity(x):
    return x

def compose(*funcs):
    if funcs:
        rest = compose(*funcs[1:])
        return lambda x: funcs[0](rest(x))
    else:
        return identity
And in this case you would have:
def rep(f, n):
    funcs = (f,)*n  # tuple with f repeated n times
    return compose(*funcs)
And as DSM kindly pointed out in the comments, you could remove the recursion like so:
def compose(*funcs):
    if not funcs:
        return identity
    else:
        def composed(x):
            for f in reversed(funcs):
                x = f(x)
            return x
        return composed
(also note that you can replace x with *args if you also want to support arbitrary arguments to the functions you're composing, but I left it at one argument since that's how you have it in the original problem)
Maybe someone will find this solution useful.
Compose any number of functions:
from functools import reduce

def compose(*functions):
    return reduce(lambda x, y: (lambda arg: x(y(arg))), functions)
Use a list comprehension to generate the list of functions:
def multi(how_many, func):
    return compose(*[func for num in range(how_many)])
Usage
def square(x):
    return x * x

multi(3, square)(3) == 6561
I should write a function min_in_list(numbers), which takes a list of numbers and returns the smallest one. NOTE: the built-in function min is NOT allowed!
def min_in_list(numbers):
    the_smallest = [n for n in numbers if n < n+1]
    return the_smallest
What's wrong?
def min_of_two(x, y):
    if x >= y:
        return y
    else:
        return x

def min_in_list(numbers):
    return reduce(min_of_two, numbers)
You have to produce one number from the list, not just another list. And this is a job for the reduce function (of course, you can implement it without reduce, but by analogy with it).
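For example:
>>> min_in_list([3, 1, 4, 1, 5])
1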
Here you go. This is almost certainly about as simple as you could make it. You don't even have to give me credit when you turn the assignment in.
import itertools
import functools
import operator

def min(seq, keyfun=operator.gt):
    lt = lambda n: functools.partial(keyfun, n)
    for i in seq:
        lti = lt(i)
        try:
            next(itertools.ifilter(lti, seq))
        except StopIteration:
            return i
min = lambda n: reduce(lambda x, y: y if x > y else x, n)
Never been tested, use at your own risk.