benefit of generators in fibonacci series - python

I just saw an example of the Fibonacci series using a generator on a webpage.
This is the example I saw:
def fibonacci(n):
    curr = 1
    prev = 0
    counter = 0
    while counter < n:
        yield curr
        prev, curr = curr, prev + curr
        counter += 1
The user can input the limit 'n'.
My question is: what is the use of a generator here?
How does the generator benefit the user here?
What advantage does it give over just printing the series with a normal print statement?
I agree that if the user has to store the values in a list and process them later, yield can generate the numbers on the fly.
What could be the use of yield in this case?

By yielding the value, the caller of the function has access to the intermediate results, making this memory efficient.
For example, the following would give you all results up to n as well, but requires that you create a list first to hold those values:
def fibonacci(n):
    results = []
    curr = 1
    prev = 0
    counter = 0
    while counter < n:
        results.append(curr)
        prev, curr = curr, prev + curr
        counter += 1
    return results
If my code then asked for fibonacci(10 ** 9), that would not only take a long time to produce, but would also require a significant amount of memory.
The generator option gives the caller access to the results immediately, which means they can use just that one intermediate result in another part of the program.
This makes the generator function far more versatile and flexible. You could sum the Fibonacci results without having to rewrite the generator, for example:
sum_of_fibonacci_10 = sum(fibonacci(10))
This is still memory efficient: the summing takes place as the results are produced, so at no point do all 10 results have to exist in memory. Only the current intermediate value and the running total are required.
If you wanted to just print the values instead, you can do so in a loop:
for next_value in fibonacci(10):
    print(next_value)
and the code is still just as efficient; once again we did not have to change the generator function.
So generators let you maintain and share intermediate state, saving you memory. This can be done with a class too, but a generator is much more readable. The following would do the same, but is not nearly as easy to follow:
class FibonacciGenerator:
    def __init__(self, n):
        self.n = n
        self.curr = 1
        self.prev = 0
        self.counter = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.counter >= self.n:
            raise StopIteration
        self.prev, self.curr = self.curr, self.prev + self.curr
        self.counter += 1
        return self.prev
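For illustration, a quick sanity check (assuming both the fibonacci generator and the FibonacciGenerator class above are defined) that the two produce identical values:

print(list(fibonacci(10)))            # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
print(list(FibonacciGenerator(10)))   # same ten values, via the class-based iterator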

The advantage of a generator over a plain function is that it can keep its state between calls.
If you had used a recursive function, producing the 5th Fibonacci number would be equivalent to calculating the 1st, then calculating the 1st and 2nd, then calculating the 1st, 2nd and 3rd, and so on until the 5th.
You can see how this becomes a problem for large numbers; a lot of resources are being wasted.
With a generator, you calculate the 1st, and remember what it is, so that you don't have to calculate it again in order to find the second. Unlike a function that stops when it reaches its return, the generator can keep state. So it calculates the 1st number, and when requested, calculates the 2nd, then the 3rd, without wasting processing power because it doesn't need to recalculate the whole thing each time; it remembers its previous state.
With regards to print vs. yield: obviously you can't use a printed result if you want to use the Fibonacci numbers you calculated for something else; it's just output on the screen. If you'd like to generate a graph, for example, print wouldn't do; you'd have to pass the values to your plotting function.
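To make the contrast concrete, here is a small sketch of my own (the recursive helper fib_at is hypothetical, not from the question) comparing a from-scratch recursive computation with the stateful generator:

# Hypothetical recursive helper that recomputes everything on each call
def fib_at(k):
    if k < 2:
        return 1
    return fib_at(k - 1) + fib_at(k - 2)

# Building the series this way repeats the same work for every index
series_recursive = [fib_at(k) for k in range(10)]

# The generator walks the sequence once, remembering prev and curr as it goes
series_generator = list(fibonacci(10))

print(series_recursive == series_generator)   # True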

Using a generator allows you to generate the 'tip' of the sequence one at a time, at your leisure. You can retrieve and print the result, but you don't need to.
A plain ol' loop in a function would run until the breaking condition was met - or forever - and without any special bells and whistles, would block the execution of your code until the loop terminated.
This means an unbounded loop would only stop once you killed the interpreter, and soon enough you'd have to kill it because it would start eating resources like crazy computing very high sums.
yield pauses the function between calls and hands the value back to the caller, rather than just displaying it the way print() does. This means you can retrieve the nth Fibonacci number, do some other stuff for five minutes, then retrieve the (n+1)th number.
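A minimal sketch of that on-demand usage, assuming the fibonacci generator from the question:

fib = fibonacci(10)    # nothing is computed yet
first = next(fib)      # 1 -- computed only when asked for
second = next(fib)     # 1
# ... do other work here for as long as you like ...
third = next(fib)      # 2 -- the generator resumes where it paused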

Related

What is a nice, python style way to yield ranged subsets of a string/bytes/list several items at a time?

I want to loop over bytes data, and I hope the same principle applies to strings and lists, where I don't go item by item but a few items at a time. I know I can do mystr[0:5] to get the first five characters, and I'd like to do that in a loop.
I can do it the C style way, looping over ranges and then returning the remaining elements, if any:
import math

def chunkify(listorstr, chunksize: int):
    # Loop until the last chunk that is still chunksize long
    end_index = int(math.floor(len(listorstr) / chunksize))
    for i in range(0, end_index):
        print(f"yield ")
        yield listorstr[i*chunksize:(i+1)*chunksize]
    # If anything remains at the end, yield the rest
    remainder = len(listorstr) % chunksize
    if remainder != 0:
        yield listorstr[end_index*chunksize:len(listorstr)]

[i for i in chunkify("123456789", 2)]
This works just fine, but I strongly suspect python language features could make this a lot more compact.
You can condense your code using the range step parameter. Instead of the for loop, a generator for this is
listorstr[i:i+chunksize] for i in range(0,len(listorstr), chunksize)
Your function could yield from this generator to make for a tidier call.
def chunkify(listorstr, chunksize: int):
    yield from (listorstr[i:i+chunksize]
                for i in range(0, len(listorstr), chunksize))
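For illustration (my own quick check, not part of the original answer), calling the condensed version on the question's example input:

print(list(chunkify("123456789", 2)))   # ['12', '34', '56', '78', '9']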

yield slower than return. why?

I wrote two functions, f and g, with the same functionality:
def f(l, count):
    if count > 1:
        for i in f(l, count-1):
            yield i + 1
    else:
        yield from l

for i in f(range(100000), 900):
    pass
print('f')
and
def g(l, count):
    if count > 1:
        tmp = []
        for i in g(l, count-1):
            tmp.append(i+1)
        return tmp
    else:
        return l

for i in g(range(100000), 900):
    pass
print('f')
I think f should be faster, but g is faster when I run it.
time for g
real 0m5.977s
user 0m5.956s
sys 0m0.020s
time for f
real 0m7.389s
user 0m7.376s
sys 0m0.012s
There are a couple of big differences between a solution that yields a result and one that computes the complete result.
The yield version keeps returning the next result until it is exhausted, while the complete calculation is always done fully. So if you have a test that might terminate your calculation early (often the case), the yield method will only be called enough times to meet that criterion; this often results in faster code.
The yield result only consumes enough memory to hold the generator and a single result at any moment in time - the full calculation consumes enough memory to hold all of the results at once. When you get to really large data sets that can make the difference between something that runs regardless of the size and something that crashes.
So yield is slightly more expensive, per operation, but much more reliable and often faster in cases where you don't exhaust the results.
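A small sketch of that early-termination point, reusing the asker's f and g as defined above: the generator version stops doing work as soon as the caller stops asking, whereas g has already built the whole list (and every intermediate level) before the loop starts.

# With f, only a few hundred values are ever produced before the break;
# with g, the full 100000-element result is computed up front.
for value in f(range(100000), 900):
    if value > 1000:
        break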
I have no idea what your f and g functions are doing exactly, but remember this:
yield is a keyword that is used like return, except that the function returns a generator, and that is the reason it takes extra time.
There is a wonderful explanation of what yield does; check this answer on Stack Overflow.

python 3 median-of-3 Quicksort implementation which switches to heapsort after a recursion depth limit is met

Functions called: (regardless of class)
def partition(pivot, lst):
    less, same, more = list(), list(), list()
    for val in lst:
        if val < pivot:
            less.append(val)
        elif val > pivot:
            more.append(val)
        else:
            same.append(val)
    return less, same, more
def medianOf3(lst):
    """
    From a lst of unordered data, find and return the median value from
    the first, middle and last values.
    """
    finder = []
    start = lst[0]
    mid = lst[len(lst)//2]
    end = lst[len(lst)-1]
    finder.append(start)
    finder.append(mid)
    finder.append(end)
    finder.sort()
    pivot_val = finder[1]
    return pivot_val
main code
def quicheSortRec(lst, limit):
    """
    A non in-place, depth limited quickSort, using median-of-3 pivot.
    Once the limit drops to 0, it uses heapSort instead.
    """
    if limit == 0:
        return heapSort.heapSort(lst)
    else:
        pivot = qsPivotMedian3.medianOf3(lst)  # select the pivot as the median of 3
        less, same, more = qsPivotMedian3.partition(pivot, lst)
        return quicheSortRec(less, limit-1) + same + quicheSortRec(more, limit-1)
def quicheSort(lst):
    """
    The main routine called to do the sort. It should call the
    recursive routine with the correct values in order to perform
    the sort.
    """
    N = len(lst)
    end = int(math.log(N, 2))
    if len(lst) == 0:
        return list()
    else:
        return quicheSortRec(lst, end)
So this code is supposed to take a list and sort it using a median-of-3 implementation of quicksort, up until it reaches a recursion depth limit, at which point the function will instead use heapsort. The results of this code are then fed into another program designed to test the algorithm with lists of length 1, 10, 100, 1000, 10000 and 100000.
However, when I run the code, it tells me that lengths 1 and 10 work fine, but after that it gives a list index out of range error for
start = lst[0] in the medianOf3 function.
I can't figure out why.
Without seeing your test data, we're flying blind here. In general,
less, same, more = qsPivotMedian3.partition(pivot, lst)
is fine on its own, but you never check less or more to see whether they're empty. Even if they are empty, they're passed on via:
return quicheSortRec(less,limit-1) + same + quicheSortRec(more,limit-1)
and quicheSortRec() doesn't check to see whether they're empty either: it unconditionally calls:
pivot=qsPivotMedian3.medianOf3(lst)
and that function will raise the error you're seeing if it's passed an empty list. Note that your quicheSort() does check for an empty input. The recursive worker function should too.
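For example, a minimal sketch of that guard (my phrasing; it reuses the question's module names heapSort and qsPivotMedian3) would bail out before picking a pivot:

def quicheSortRec(lst, limit):
    if not lst:            # guard: an empty sublist is already "sorted"
        return []
    if limit == 0:
        return heapSort.heapSort(lst)
    pivot = qsPivotMedian3.medianOf3(lst)
    less, same, more = qsPivotMedian3.partition(pivot, lst)
    return quicheSortRec(less, limit - 1) + same + quicheSortRec(more, limit - 1)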
BTW, consider this rewrite of medianOf3():
def medianOf3(lst):
    return sorted([lst[0], lst[len(lst)//2], lst[-1]])[1]
This is one case where brevity is clearer ;-)

getting Nth value of a linked list recursively

I want to be able to recursively pass count, or increment count and then pass it into my recursion. However, I know I must declare count = 0 to be able to use it when incrementing. I am still learning Python and I am finding it hard to recursively increment count. Could someone help me with this please?
I know my code is currently wrong because on each recursive call I make, count will be reset to 0. I don't want to set count as a 3rd argument because I feel it is not necessary.
my code:
def getNth(head, n):
    count = 0
    if count == n:
        count += 1
        return head.value
    else:
        if head.next is not None:
            getNth(head.next, n)
        else:
            print 'not in linked list'
Count backwards rather than up.
def getNth(head, n):
    if n == 0:
        return head.value
    return getNth(head.next, n - 1)
This however will perform miserably in practice, and you'll get a stack overflow if your list is any reasonable length. Functional programming style is not usually good Python style (since, for example, tail recursion isn't a feature of Python).
I'd just write the loop out.
def getNth(head, n):
    for _ in xrange(n):
        head = head.next
    return head.value
This is a common pattern in recursion that is handled cleanly in Python, so it's worth mentioning.
Functions allow default arguments, which are useful for keeping track of recursion depth. The change to your function signature is trivial:
def getNth(head, n, count=0):
0 is the default value for count. Just leave it out in your initial call (or explicitly call it with count=0) and you're good. You can then recurse with a call like getNth(head.next, n, count + 1).
I should note now that I've explained this that recursion is quite slow in python. If you care at all about performance you should favor iterative solutions (often involving generators) over recursive solutions.
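A minimal sketch of that default-argument version, completing the idea above (my code, with an IndexError in place of the original print; adjust the error handling to taste):

def getNth(head, n, count=0):
    if count == n:
        return head.value
    if head.next is not None:
        return getNth(head.next, n, count + 1)
    raise IndexError('not in linked list')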

Is there any built-in way to get the length of an iterable in python?

For example, files in Python are iterable - they iterate over the lines in the file. I want to count the number of lines.
One quick way is to do this:
lines = len(list(open(fname)))
However, this loads the whole file into memory (at once). This rather defeats the purpose of an iterator (which only needs to keep the current line in memory).
This doesn't work:
lines = len(line for line in open(fname))
as generators don't have a length.
Is there any way to do this short of defining a count function?
def count(i):
    c = 0
    for el in i: c += 1
    return c
To clarify, I understand that the whole file will have to be read! I just don't want it all in memory at once.
Short of iterating through the iterable and counting the number of iterations, no. That's what makes it an iterable and not a list. This isn't really even a python-specific problem. Look at the classic linked-list data structure. Finding the length is an O(n) operation that involves iterating the whole list to find the number of elements.
As mcrute mentioned above, you can probably reduce your function to:
def count_iterable(i):
    return sum(1 for e in i)
Of course, if you're defining your own iterable object you can always implement __len__ yourself and keep an element count somewhere.
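A hedged sketch of that idea, using a hypothetical Squares iterable whose size is known without iterating:

class Squares:
    """Hypothetical iterable whose length is known up front."""
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return (i * i for i in range(self.n))
    def __len__(self):
        return self.n

squares = Squares(1000)
print(len(squares))    # 1000, without producing a single square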
If you need a count of lines you can do this, I don't know of any better way to do it:
line_count = sum(1 for line in open("yourfile.txt"))
The cardinality package provides an efficient count() function and some related functions to count and check the size of any iterable: http://cardinality.readthedocs.org/
import cardinality
it = some_iterable(...)
print(cardinality.count(it))
Internally it uses enumerate() and collections.deque() to move all the actual looping and counting logic to the C level, resulting in a considerable speedup over for loops in Python.
I've used this redefinition for some time now:
def len(thingy):
    try:
        return thingy.__len__()
    except AttributeError:
        return sum(1 for item in iter(thingy))
It turns out there is an implemented solution for this common problem. Consider using the ilen() function from more_itertools.
more_itertools.ilen(iterable)
An example of printing a number of lines in a file (we use the with statement to safely handle closing files):
# Example
import more_itertools

with open("foo.py", "r+") as f:
    print(more_itertools.ilen(f))
# Output: 433
This example returns the same result as solutions presented earlier for totaling lines in a file:
# Equivalent code
with open("foo.py", "r+") as f:
    print(sum(1 for line in f))
# Output: 433
Absolutely not, for the simple reason that iterables are not guaranteed to be finite.
Consider this perfectly legal generator function:
def forever():
    while True:
        yield "I will run forever"
Attempting to calculate the length of this generator with len([x for x in forever()]) will clearly not work.
As you noted, much of the purpose of iterators/generators is to be able to work on a large dataset without loading it all into memory. The fact that you can't get an immediate length should be considered a tradeoff.
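For instance (a small sketch of my own, not from the answer), if you only care about a bounded prefix, you can cap the iterable yourself before counting:

from itertools import islice

# Count at most the first 1000 items; this terminates even though forever() never does.
capped = sum(1 for _ in islice(forever(), 1000))
print(capped)   # 1000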
Because apparently the duplication wasn't noticed at the time, I'll post an extract from my answer to the duplicate here as well:
There is a way to do this meaningfully faster than sum(1 for i in it) when the iterable may be long (and not meaningfully slower when it is short), while maintaining fixed memory overhead (unlike len(list(it))), avoiding swap thrashing and reallocation overhead for larger inputs.
# On Python 2 only, get zip that lazily generates results instead of returning a list
from future_builtins import zip

from collections import deque
from itertools import count

def ilen(it):
    # Make a stateful counting iterator
    cnt = count()
    # zip it with the input iterator, then drain until input exhausted at C level
    deque(zip(it, cnt), 0)  # cnt must be second zip arg to avoid advancing too far
    # Since count is 0-based, the next value is the count
    return next(cnt)
Like len(list(it)), ilen(it) performs the loop in C code on CPython (deque, count and zip are all implemented in C); avoiding byte code execution per loop is usually the key to performance in CPython.
Rather than repeat all the performance numbers here, I'll just point you to my answer with the full perf details.
For filtering, this variation can be used:
sum(is_good(item) for item in iterable)
which can be naturally read as "count good items" and is shorter and simpler (although perhaps less idiomatic) than:
sum(1 for item in iterable if is_good(item))
Note: The fact that True evaluates to 1 in numeric contexts is specified in the docs
(https://docs.python.org/3.6/library/stdtypes.html#boolean-values), so this coercion is not a hack (as opposed to some other languages like C/C++).
Well, if you think about it, how do you propose to find the number of lines in a file without reading the whole file for newlines? Sure, you can find the size of the file, and if you can guarantee that the length of a line is x, you can get the number of lines in the file. But unless you have some kind of constraint, I fail to see how this can work at all. Also, since iterables can be infinitely long...
I did a test between the two common procedures in some code of mine, which finds how many graphs on n vertices there are, to see which method of counting elements of a generated list goes faster. Sage has a generator graphs(n) which generates all graphs on n vertices. I created two functions which obtain the length of a list obtained by an iterator in two different ways and timed each of them (averaging over 100 test runs) using the time.time() function. The functions were as follows:
def test_code_list(n):
    l = graphs(n)
    return len(list(l))
and
def test_code_sum(n):
    S = sum(1 for _ in graphs(n))
    return S
Now I time each method:
import time

t0 = time.time()
for i in range(100):
    test_code_list(5)
t1 = time.time()
avg_time = (t1 - t0) / 100
print 'average list method time = %s' % avg_time

t0 = time.time()
for i in range(100):
    test_code_sum(5)
t1 = time.time()
avg_time = (t1 - t0) / 100
print "average sum method time = %s" % avg_time
average list method time = 0.0391882109642
average sum method time = 0.0418473792076
So computing the number of graphs on n=5 vertices this way, the list method is slightly faster (although 100 test runs isn't a great sample size). But when I increased the length of the list being computed by trying graphs on n=7 vertices (i.e. changing graphs(5) to graphs(7)), the result was this:
average list method time = 4.14753051996
average sum method time = 3.96504004002
In this case the sum method was slightly faster. All in all, the two methods are approximately the same speed but the difference MIGHT depend on the length of your list (it might also just be that I only averaged over 100 test runs, which isn't very high -- would have taken forever otherwise).
