Python: itertools.product consuming too much resources

I've created a Python script that generates a list of words by permutation of characters. I'm using itertools.product to generate my permutations. My character list is composed of letters and numbers: 01234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVXYZ. Here is my code:
#!/usr/bin/python
import itertools, hashlib, math

class Words:
    chars = '01234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVXYZ'

    def __init__(self, size):
        self.make(size)

    def getLenght(self, size):
        res = []
        for i in range(1, size+1):
            res.append(math.pow(len(self.chars), i))
        return sum(res)

    def getMD5(self, text):
        m = hashlib.md5()
        m.update(text.encode('utf-8'))
        return m.hexdigest()

    def make(self, size):
        file = open('res.txt', 'w+')
        res = []
        i = 1
        for i in range(1, size+1):
            prod = list(itertools.product(self.chars, repeat=i))
            res = res + prod
        j = 1
        for r in res:
            text = ''.join(r)
            md5 = self.getMD5(text)
            res = text+'\t'+md5
            print(res + ' %.3f%%' % (j/float(self.getLenght(size))*100))
            file.write(res+'\n')
            j = j + 1
        file.close()

Words(3)
This script works fine for lists of words with at most 4 characters. If I try 5 or 6 characters, my computer consumes 100% of the CPU and 100% of the RAM, and it freezes.
Is there a way to restrict the use of those resources or optimize this heavy processing?

Does this do what you want?
I've made all the changes in the make method:
def make(self, size):
    with open('res.txt', 'w+') as file_:  # file is a builtin function in python 2
        # also, use with statements for files used on only a small block,
        # it handles file closure even if an error is raised.
        for i in range(1, size+1):
            prod = itertools.product(self.chars, repeat=i)
            for j, r in enumerate(prod):
                text = ''.join(r)
                md5 = self.getMD5(text)
                res = text+'\t'+md5
                print(res + ' %.3f%%' % ((j+1)/float(self.getLenght(size))*100))
                file_.write(res+'\n')
Be warned this will still chew through gigabytes of disk space and a great deal of time, but it will no longer exhaust your (virtual) memory.
EDIT: As noted by Padraic, there is no file keyword in Python 3, and as it is a "bad builtin", it's not too worrying to override it. Still, I'll name it file_ here.
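As a further tweak (a sketch, not part of the original answer): the percentage calculation above calls self.getLenght(size) once per generated word; computing the total once, outside the loops, avoids that repeated work.
def make(self, size):
    total = float(self.getLenght(size))   # compute the denominator once
    count = 0
    with open('res.txt', 'w+') as file_:
        for i in range(1, size + 1):
            for r in itertools.product(self.chars, repeat=i):
                count += 1
                text = ''.join(r)
                line = text + '\t' + self.getMD5(text)
                print(line + ' %.3f%%' % (count / total * 100))
                file_.write(line + '\n')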
EDIT2:
To explain why this works so much faster and better than the previous, original version, you need to know how lazy evaluation works.
Say we have a simple expression as follows (for Python 3; use xrange for Python 2):
a = [i for i in range(10**12)]
This immediately tries to evaluate one trillion elements into memory, overflowing your memory.
So we can use a generator to solve this:
a = (i for i in range(10**12))
Here, none of the values have been evaluated yet; we have just given the interpreter instructions on how to evaluate them. We can then iterate through each item one by one and do work on each separately, so almost nothing is in memory at a given time (only one integer at a time). This makes the seemingly impossible task very manageable.
The same is true with itertools: it allows you to do memory-efficient, fast operations by using iterators rather than lists or arrays to do operations.
In your example, you have 62 characters and want to do the Cartesian product with 5 repeats, or 62**5 (nearly a billion elements, or over 30 gigabytes of RAM). This is prohibitively large.
In order to solve this, we can use iterators.
chars = '01234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVXYZ'
for i in itertools.product(chars, repeat=5):
    print(i)
Here, only a single item from the cartesian product is in memory at a given time, meaning it is very memory efficient.
However, if you evaluate the full iterator using list(), it then exhausts the iterator and adds it to a list, meaning the nearly one billion combinations are suddenly in memory again. We don't need all the elements in memory at once: just 1. Which is the power of iterators.
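For instance, a throwaway snippet (not from the original answer) shows how itertools.islice can pull just a few items from an enormous product without materializing anything:
import itertools

chars = '0123456789abcdefghijklmnopqrstuvwxyz'
# Only ten tuples are ever created; the remaining ~60 million are never computed.
first_ten = list(itertools.islice(itertools.product(chars, repeat=5), 10))
print(first_ten)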
Here are links to the itertools module and another explanation on iterators in Python 2 (mostly true for 3).

Related

How to create a script that gives me every combination possible of a six digit code

A friend and I want to create a script that gives us every possible permutation of a six-digit code, comprised of 36 alphanumeric characters (0-9 and a-z), in alphabetical order, and then lets us see them in a .txt file.
And I want it to use all of the CPU and RAM it can, so that it takes less time to complete the task.
So far, this is the code:
import random

charset = "0123456789abcdefghijklmnopqrstuvwxyz"
links = []
file = open("codes.txt", "a")
for g in range(0, 36**6):
    key = ""
    base = ""
    print(str(g))
    for i in range(0, 6):
        char = random.choice(charset)
        key += char
    base += key
    file.write(base + "\n")
file.close()
This code randomly generates the combinations and immediately writes them to a .txt file, while printing the number of codes it has already created, but the output isn't in alphabetical order (that has to be fixed afterwards), and it takes too long.
How can the code be improved to give the desired outcome?
Thanks to @R0Best for providing the best answer
Although this post already has 6 answers, I'm not content with any of them, so I've decided to contribute a solution of my own.
First, note that many of the answers provide the combinations or permutations of letters; however, the post actually wants the Cartesian product of the alphabet with itself (repeated N times, where N=6). There are (at this time) two answers that do this, however they both write an excessive number of times, resulting in subpar performance, and they also concatenate their intermediate results in the hottest portion of the loop (also bringing down performance).
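For context, here is a sketch of the straightforward, unoptimized baseline that those two answers build on (plain itertools.product, one write per code); the optimized code below improves on exactly this:
from itertools import product
from string import digits, ascii_lowercase

# Naive baseline: one ''.join and one write per code, 36**6 times over.
with open("output.txt", "w") as f:
    for combo in product(digits + ascii_lowercase, repeat=6):
        f.write("".join(combo) + "\n")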
In the interest of taking optimization to the absolute max, I present the following code:
from string import digits, ascii_lowercase
from itertools import chain

ALPHABET = (digits + ascii_lowercase).encode("ascii")

def fast_brute_force():
    # Define some constants to make the following sections more readable
    base_size = 6
    suffix_size = 4
    prefix_size = base_size - suffix_size
    word_size = base_size + 1

    # define two containers
    #   word_blob - placeholder words, with hyphens in the unpopulated characters (followed by newline)
    #   sleds - a tuple of repeated bytes, used for substituting a bunch of characters in a batch
    word_blob = bytearray(b"-" * base_size + b"\n")
    sleds = tuple(bytes([char]) for char in ALPHABET)

    # iteratively extend word_blob and sleds, filling in unpopulated characters using the sleds
    # in doing so, we construct a single "blob" that contains concatenated suffixes of the desired
    # output with placeholders so we can quickly substitute in the prefix, write, repeat, in batches
    for offset in range(prefix_size, base_size)[::-1]:
        word_blob *= len(ALPHABET)
        word_blob[offset::word_size] = chain.from_iterable(sleds)
        sleds = tuple(sled * len(ALPHABET) for sled in sleds)

    with open("output.txt", "wb") as f:
        # I've expanded out the logic for substituting in the prefixes into explicit nested for loops
        # to avoid both redundancy (reassigning the same value) and the overhead associated with
        # a recursive implementation
        # I assert this below, so any changes in suffix_size will fail loudly
        assert prefix_size == 2
        for sled1 in sleds:
            word_blob[0::word_size] = sled1
            for sled2 in sleds:
                word_blob[1::word_size] = sled2
                # we write to the raw FileIO since we know we don't need buffering or other fancy
                # bells and whistles, however in practice it doesn't seem that much faster
                f.raw.write(word_blob)
There's a lot of magic happening in that code block, but in a nutshell:
I batch the writes, so that I'm writing 36**4 or 1679616 entries at once, so there's less context switching.
I update all 1679616 entries per batch simultaneously with the new prefix, using bytearray slicing / assignment.
I operate on bytes, write to the raw FileIO, expand the loops for the prefix assignments, and other small optimizations to avoid encoding/buffering/function call overhead/other performance hits.
Note, unless you have a very fast disk and slowish CPU, you won't see much benefit from the smaller optimizations, just the write batching probably.
On my system, it takes about 45 seconds to produce and write the 14,880,348 KB file, and that's writing to my slowest disk. On my NVMe drive, it takes 6.868 seconds.
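The bytearray slice assignment that drives the batching can be illustrated in isolation (a toy snippet, not part of the answer's code):
# Every third byte of the buffer is overwritten in a single batch operation,
# which is what word_blob[offset::word_size] = ... does on a much larger scale.
buf = bytearray(b"-ab-ab-ab")
buf[0::3] = b"XYZ"      # positions 0, 3 and 6
print(buf)              # bytearray(b'XabYabZab')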
The fastest way I can think of is using pypy3 with this code:
import functools
import time
from string import digits, ascii_lowercase

#functools.lru_cache(maxsize=128)
def main():
    cl = []
    cs = digits + ascii_lowercase
    for letter in cs:
        cl.append(letter)
    ct = tuple(cl)
    with open("codes.txt", "w") as file:
        for p1 in ct:
            for p2 in ct:
                for p3 in ct:
                    for p4 in ct:
                        for p5 in ct:
                            for p6 in ct:
                                file.write(f"{p1}{p2}{p3}{p4}{p5}{p6}\n")

if __name__ == '__main__':
    start = time.time()
    main()
    print(f"Done!\nTook {time.time() - start} seconds!")
It writes at around 10-15 MB/s. The total file is around 15 GB I believe, so it would take roughly 990-1500 seconds to generate. These results are from an Unraid VM with one 3.4 GHz core of a server CPU and an old SATA3 SSD. You will probably get better results with an NVMe drive and a faster single-core CPU.
Using random can be very inefficient. You can try:
from itertools import permutations
from pandas import Series

charset = list("0123456789abcdefghijklmnopqrstuvwxyz")
links = []
file = open("codes.txt", "a")
comb = permutations(charset, 6)
comb = list(comb)
comb = list(map(lambda x: ''.join(x), comb))
mySeries = Series(comb)
mySeries = mySeries.sort_values()

base = ""
for k in mySeries:
    base += k
    file.write(base + "\n")
file.close()
You could use itertools.permutations from the standard itertools library. You can also specify the number of characters in the combination.
from itertools import permutations

charset = "0123456789abcdefghijklmnopqrstuvwxyz"
c = permutations(charset, 6)
with open('code.txt', 'w') as f:
    for i in c:
        f.write("".join(i) + '\n')
Runs on my computer in about 200 milliseconds for creating the list of permutations, then spends a lot of time writing to the file
For permutations, this would do the trick:
from itertools import permutations

charset = "0123456789abcdefghijklmnopqrstuvwxyz"
links = []
with open("codes.txt", "w") as f:
    for permutation in permutations(charset, 6):
        f.write(''.join(permutation) + '\n')
FYI, it would create a 7.8 GigaByte file
For combinations, this would do the trick:
from itertools import combinations

charset = "0123456789abcdefghijklmnopqrstuvwxyz"
links = []
with open("codes.txt", "w") as f:
    for comb in combinations(charset, 6):
        f.write(''.join(comb) + '\n')
FYI, it would create a 10.8 megabyte file
First things first: there are better ways to do this, but I want to write something clear and understandable.
Pseudo Code:
base = "";
for(x1=0; x1<charset.length(); x1++)
  for(x2=0; x2<charset.length(); x2++)
    for(x3=0; x3<charset.length(); x3++)
      .
      .
      .
      {
        base = charset[x1]+charset[x2]+charset[x3]+.....+charset[x6];
        file.write(base + "\n")
      }
This is a combination problem where you are trying to get combinations of length 6 from a character set of length 36. This will produce an output of size 36!/(30!*6!). You can refer to itertools for solving a combination problem like yours, specifically the combinations function in the itertools documentation. It is recommended not to perform such a performance-intensive computation using Python.

Convert stream of hex values to 16-bit ints

I get packages of binary strings of size 61440 in hex values, something like:
b'004702AF42324fe380ac...'
I need to split those into batches of 4 and convert them to integers. 16 bit would be preferred but casting this later is not a problem. The way I did it looks like this, and it works.
out = [int(img[i][j:j+4],16) for j in range(0,len(img[i]), 4)]
The issue I'm having is performance. The thing is, I get a minimum of 200 of those a second, possibly more, and without multithreading I can only pass through 100-150 a second.
Can I improve the speed of this in some way?
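One possible alternative, assuming NumPy is acceptable (a sketch, not the asker's code): decode the whole hex string in one vectorized step instead of a Python-level loop. The variable sample below is made-up stand-in data.
import numpy as np

sample = "004702af42324fe380ac" * 10        # stand-in for one 61440-character packet
raw = bytes.fromhex(sample)                  # 2 hex chars -> 1 byte
out = np.frombuffer(raw, dtype=">u2")        # 4 hex chars -> one big-endian uint16
print(out[:5])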
This is a rewrite of my earlier offering showing how multithreading does, in fact, make a very significant difference - possibly depending on the system architecture.
The following code executes in ~0.05s on my machine:-
import random
from datetime import datetime
import concurrent.futures

N = 10
R = 61440
IMG = []
for _ in range(N):
    IMG.append(''.join(random.choice('0123456789abcdef')
                       for _ in range(R)))
"""
now IMG has N elements each containing R pseudo randomly generated hexadecimal values
"""

def tfunc(img, k):
    return k, [int(img[j:j + 4], 16) for j in range(0, len(img), 4)]

R = [0] * N
start = datetime.now()
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = []
    """
    note that we pass the relevant index to the worker function
    because we can't be sure of the order of completion
    """
    for i in range(N):
        futures.append(executor.submit(tfunc, IMG[i], i))
    for future in concurrent.futures.as_completed(futures):
        k, r = future.result()
        R[k] = r
"""
list R now contains the converted values from the same relative indexes in IMG
"""
print(f'Duration={datetime.now()-start}')
I don't think multi-threading will help in this case as it's purely CPU intensive. The overheads of breaking it down over, say, 4 threads would outweigh any theoretical advantages. Your list comprehension appears to be as efficient as it can be although I'm unclear as to why img seems to have multiple dimensions. I've written the following simulation and on my machine this consistently executes in ~0.8 seconds. I think the performance you'll get from your code is going to be highly dependent on your CPU's capabilities. Here's the code:-
import random
from datetime import datetime

hv = '0123456789abcdef'
img = ''.join(random.choice(hv) for _ in range(61440))
start = datetime.now()
for _ in range(200):
    out = [int(img[j:j + 4], 16) for j in range(0, len(img), 4)]
print(f'Duration={datetime.now()-start}')
I did some more research and found that not multithreading but multiple processes are what I need. That gave me a speedup from 220 batches per second to ~370 batches per second. This probably bottlenecks somewhere else now, since I only got 15% load on all cores, but it puts me comfortably above spec and that's good enough.
import numpy as np
from multiprocessing import Pool

def combine(img):
    return np.array([int(img[j:j+4], 16) for j in range(0, len(img), 4)]).reshape((24, 640))

p = Pool(20)
img = p.map(combine, tmp)
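One hedged caveat about the snippet above: on platforms that start workers with "spawn" (Windows, macOS), the Pool should be created under a __main__ guard, otherwise the module is re-imported and re-executed in every worker. A minimal sketch, reusing combine() from above with placeholder data:
if __name__ == "__main__":
    tmp = ["00ff" * 15360]          # placeholder: one fake 61440-character batch
    with Pool(processes=4) as p:
        img = p.map(combine, tmp)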

Concatenate Big List Elements Efficiently

I want to make a list of elements where each element starts with 4 numbers and ends with 4 letters with every possible combination. This is my code
import itertools

def char_range(c1, c2):
    """Generates the characters from `c1` to `c2`"""
    for c in range(ord(c1), ord(c2)+1):
        yield chr(c)

chars = list()
nums = list()

for combination in itertools.product(char_range('a','b'), repeat=4):
    chars.append(''.join(map(str, combination)))

for combination in itertools.product(range(10), repeat=4):
    nums.append(''.join(map(str, combination)))

c = [str(x)+y for x,y in itertools.product(nums, chars)]

for dd in c:
    print(dd)
This runs fine but when I use a bigger range of characters, such as (a-z) the program hogs the CPU and memory, and the PC becomes unresponsive. So how can I do this in a more efficient way?
The documentation of itertools says that "it is roughly equivalent to nested for-loops in a generator expression". So itertools.product is never an enemy of memory, but if you store its results in a list, that list is. Therefore:
for element in itertools.product(...):
    print element
is okay, but
myList = [element for element in itertools.product(...)]
or the equivalent loop of
for element in itertools.product(...):
    myList.append(element)
is not! So you want itertools to generate results for you, but you don't want to store them, rather use them as they are generated. Think about this line of your code:
c = [str(x)+y for x,y in itertools.product(nums,chars)]
Given that nums and chars can be huge lists, building another gigantic list of all combinations on top of them is definitely going to choke your system.
Now, as mentioned in the comments, if you replace all the lists that are too fat to fit into the memory with generators (functions that just yield), memory is not going to be a concern anymore.
Here is my full code. I basically changed your lists of chars and nums to generators, and got rid of the final list of c.
import itertools

def char_range(c1, c2):
    """Generates the characters from `c1` to `c2`"""
    for c in range(ord(c1), ord(c2)+1):
        yield chr(c)

def char(a):
    for combination in itertools.product(char_range(str(a[0]), str(a[1])), repeat=4):
        yield ''.join(map(str, combination))

def num(n):
    for combination in itertools.product(range(n), repeat=4):
        yield ''.join(map(str, combination))

def final(one, two):
    for foo in char(one):
        for bar in num(two):
            print str(bar)+str(foo)
Now let's ask what every combination of ['a','b'] and range(2) is:
final(['a','b'],2)
Produces this:
0000aaaa
0001aaaa
0010aaaa
0011aaaa
0100aaaa
0101aaaa
0110aaaa
0111aaaa
1000aaaa
1001aaaa
1010aaaa
1011aaaa
1100aaaa
1101aaaa
1110aaaa
1111aaaa
0000aaab
0001aaab
0010aaab
0011aaab
0100aaab
0101aaab
0110aaab
0111aaab
1000aaab
1001aaab
1010aaab
1011aaab
1100aaab
1101aaab
1110aaab
1111aaab
0000aaba
0001aaba
0010aaba
0011aaba
0100aaba
0101aaba
0110aaba
0111aaba
1000aaba
1001aaba
1010aaba
1011aaba
1100aaba
1101aaba
1110aaba
1111aaba
0000aabb
0001aabb
0010aabb
0011aabb
0100aabb
0101aabb
0110aabb
0111aabb
1000aabb
1001aabb
1010aabb
1011aabb
1100aabb
1101aabb
1110aabb
1111aabb
0000abaa
0001abaa
0010abaa
0011abaa
0100abaa
0101abaa
0110abaa
0111abaa
1000abaa
1001abaa
1010abaa
1011abaa
1100abaa
1101abaa
1110abaa
1111abaa
0000abab
0001abab
0010abab
0011abab
0100abab
0101abab
0110abab
0111abab
1000abab
1001abab
1010abab
1011abab
1100abab
1101abab
1110abab
1111abab
0000abba
0001abba
0010abba
0011abba
0100abba
0101abba
0110abba
0111abba
1000abba
1001abba
1010abba
1011abba
1100abba
1101abba
1110abba
1111abba
0000abbb
0001abbb
0010abbb
0011abbb
0100abbb
0101abbb
0110abbb
0111abbb
1000abbb
1001abbb
1010abbb
1011abbb
1100abbb
1101abbb
1110abbb
1111abbb
0000baaa
0001baaa
0010baaa
0011baaa
0100baaa
0101baaa
0110baaa
0111baaa
1000baaa
1001baaa
1010baaa
1011baaa
1100baaa
1101baaa
1110baaa
1111baaa
0000baab
0001baab
0010baab
0011baab
0100baab
0101baab
0110baab
0111baab
1000baab
1001baab
1010baab
1011baab
1100baab
1101baab
1110baab
1111baab
0000baba
0001baba
0010baba
0011baba
0100baba
0101baba
0110baba
0111baba
1000baba
1001baba
1010baba
1011baba
1100baba
1101baba
1110baba
1111baba
0000babb
0001babb
0010babb
0011babb
0100babb
0101babb
0110babb
0111babb
1000babb
1001babb
1010babb
1011babb
1100babb
1101babb
1110babb
1111babb
0000bbaa
0001bbaa
0010bbaa
0011bbaa
0100bbaa
0101bbaa
0110bbaa
0111bbaa
1000bbaa
1001bbaa
1010bbaa
1011bbaa
1100bbaa
1101bbaa
1110bbaa
1111bbaa
0000bbab
0001bbab
0010bbab
0011bbab
0100bbab
0101bbab
0110bbab
0111bbab
1000bbab
1001bbab
1010bbab
1011bbab
1100bbab
1101bbab
1110bbab
1111bbab
0000bbba
0001bbba
0010bbba
0011bbba
0100bbba
0101bbba
0110bbba
0111bbba
1000bbba
1001bbba
1010bbba
1011bbba
1100bbba
1101bbba
1110bbba
1111bbba
0000bbbb
0001bbbb
0010bbbb
0011bbbb
0100bbbb
0101bbbb
0110bbbb
0111bbbb
1000bbbb
1001bbbb
1010bbbb
1011bbbb
1100bbbb
1101bbbb
1110bbbb
1111bbbb
Which is the exact result you are looking for. Each element of this result is generated on the fly, hence never creates a memory problem. You can now try and see that much bigger operations such as final(['a','z'],10) are CPU-friendly.
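For what it's worth, the same lazy pipeline can also be written as a single generator expression over one itertools.product call (a sketch with the same tiny inputs; the ordering matches the listing above):
import itertools

combos = (''.join(num) + ''.join(ch)
          for ch, num in itertools.product(
              itertools.product('ab', repeat=4),      # the character part
              itertools.product('01', repeat=4)))     # the numeric part
for item in itertools.islice(combos, 3):
    print(item)   # 0000aaaa, 0001aaaa, 0010aaaa, ...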

OCaml equivalent of Python generators

The French Sécurité Sociale identification numbers end with a two-digit check code. I have verified that every possible common transcription error can be detected, and found some other kinds of errors (e.g., rolling three consecutive digits) that may stay undetected.
def check_code(number):
    return 97 - int(number) % 97

def single_digit_generator(number):
    for i in range(len(number)):
        for wrong_digit in "0123456789":
            yield number[:i] + wrong_digit + number[i+1:]

def roll_generator(number):
    for i in range(len(number) - 2):
        yield number[:i] + number[i+2] + number[i] + number[i+1] + number[i+3:]
        yield number[:i] + number[i+1] + number[i+2] + number[i] + number[i+3:]

def find_error(generator, number):
    control = check_code(number)
    for wrong_number in generator(number):
        if number != wrong_number and check_code(wrong_number) == control:
            return (number, wrong_number)

assert find_error(single_digit_generator, "0149517490979") is None
assert find_error(roll_generator, "0149517490979") == ('0149517490979', '0149517499709')
My Python 2.7 code (working fragment above) makes heavy use of generators. I was wondering how I could adapt them in OCaml. I surely can write a function maintaining some internal state, but I'm looking for a purely functional solution. Should I study the lazy library, which I'm not too familiar with? I'm not asking for code, just directions.
You may simply define a generator as a stream, using the stream extension of the language:
let range n = Stream.from (fun i -> if i < n then Some i else None);;
The for syntactic construct cannot be used with that, but there's a set of functions provided by the Stream module to check the state of the stream and iterate on its elements.
try
  let r = range 10 in
  while true do
    Printf.printf "next element: %d\n" @@ Stream.next r
  done
with Stream.Failure -> ();;
Or more simply:
Stream.iter (Printf.printf "next element: %d\n") @@ range 10;;
You may also use a special syntax provided by the camlp4 preprocessor:
Stream.iter (Printf.printf "next element: %d\n") [< '11; '3; '19; '52; '42 >];;
Other features include creating streams out of lists, strings, bytes or even channels. The API documentation succinctly describes the different possibilities.
The special syntax lets you compose them, so as to put back elements before or after, but it can be somewhat unintuitive at first:
let dump = Stream.iter (Printf.printf "next element: %d\n");;
dump [< range 10; range 20 >];;
will produce the elements of the first stream, and then pick up the second stream at element ranked right after the rank of the last yielded element, thus it will appear in this case as if only the second stream got iterated.
To get all the elements, you could construct a stream of 'a Stream.t and then recursively iterate on each:
Stream.iter dump [< '(range 10); '(range 20) >];;
These would produce the expected output.
I recommend reading the old book on OCaml (available online) for a better introduction on the topic.
The Core library provides generators in a Python style; see the Sequence module.
Here is an example, taken from one of my project:
open Core_kernel.Std

let intersections tab (x : mem) : _ seq =
  let open Sequence.Generator in
  let init = return () in
  let m = fold_intersections tab x ~init ~f:(fun addr x gen ->
      gen >>= fun () -> yield (addr,x)) in
  run m

Where do you use generators feature in your python code?

I have studied the generators feature and I think I've got it, but I would like to understand where I could apply it in my code.
I have in mind the following example, which I read in the "Python Essential Reference" book:
# tail -f
def tail(f):
    f.seek(0, 2)
    while True:
        line = f.readline()
        if not line:
            time.sleep(0.1)
            continue
        yield line
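Hypothetical usage of that tail() generator, just to show how it would be consumed (the filename is made up):
import time  # tail() itself relies on time.sleep

with open("access.log") as f:          # made-up log file
    for line in tail(f):
        print(line, end="")            # process each new line as it appears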
Do you have any other effective examples where generators are the best tool for the job, like tail -f?
How often do you use the generators feature, and in which kind of functionality/part of a program do you usually apply it?
I use them a lot when I implement scanners (tokenizers) or when I iterate over data containers.
Edit: here is a demo tokenizer I used for a C++ syntax highlight program:
whitespace = ' \t\r\n'
operators = '~!%^&*()-+=[]{};:\'"/?.,<>\\|'

def scan(s):
    "returns a token and a state/token id"
    words = {0:'', 1:'', 2:''}  # normal, operator, whitespace
    state = 2  # I pick ws as first state
    for c in s:
        if c in operators:
            if state != 1:
                yield (words[state], state)
                words[state] = ''
                state = 1
            words[state] += c
        elif c in whitespace:
            if state != 2:
                yield (words[state], state)
                words[state] = ''
                state = 2
            words[state] += c
        else:
            if state != 0:
                yield (words[state], state)
                words[state] = ''
                state = 0
            words[state] += c
    yield (words[state], state)
Usage example:
>>> it = scan('foo(); i++')
>>> it.next()
('', 2)
>>> it.next()
('foo', 0)
>>> it.next()
('();', 1)
>>> it.next()
(' ', 2)
>>> it.next()
('i', 0)
>>> it.next()
('++', 1)
>>>
Whenever your code would either generate an unlimited number of values or, more generally, whenever too much memory would be consumed by generating the whole list up front.
Or when it is likely that you won't iterate over the whole generated list (and the list is very large): there is no point in generating every value first (and waiting for that generation) if it is never used.
My latest encounter with generators was when I implemented a linear recurrent sequence (LRS) like e.g. the Fibonacci sequence.
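A minimal sketch of that kind of generator (not the poster's actual code), for the Fibonacci case:
import itertools

def fib():
    a, b = 0, 1
    while True:            # an unlimited sequence; values are produced on demand
        yield a
        a, b = b, a + b

# take the first ten values without ever building an unbounded list
print(list(itertools.islice(fib(), 10)))   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]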
In all cases where I have algorithms that read anything, I use generators exclusively.
Why?
Layering in filtering, mapping and reduction rules is so much easier in a context of multiple generators.
Example:
def discard_blank( source ):
    for line in source:
        if len(line) == 0:
            continue
        yield line

def clean_end( source ):
    for line in source:
        yield line.rstrip()

def split_fields( source ):
    for line in source:
        yield line.split()

def convert_pos( tuple_source, position ):
    for line in tuple_source:
        yield line[:position] + [int(line[position])] + line[position+1:]

with open('somefile','r') as source:
    data = convert_pos( split_fields( discard_blank( clean_end( source ) ) ), 0 )
    total = 0
    for l in data:
        print l
        total += l[0]
    print total
My preference is to use many small generators so that a small change is not disruptive to the entire process chain.
In general, to separate data acquisition (which might be complicated) from consumption. In particular:
to concatenate the results of several b-tree queries - the db part generates and executes the queries, yielding records from each one; the consumer only sees single data items arriving.
buffering (read-ahead) - the generator fetches data in blocks and yields single elements from each block. Again, the consumer is separated from the gory details.
Generators can also work as coroutines. You can pass data into them using nextval = g.send(data) on the 'consumer' side and data = yield nextval on the generator side. In this case the generator and its consumer 'swap' values. You can even make yield throw an exception within the generator context: g.throw(exc) does that.
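A tiny illustration of that coroutine pattern (a sketch, not from the answer):
def running_total():
    total = 0
    while True:
        value = yield total     # receives whatever the consumer send()s
        total += value

g = running_total()
next(g)             # prime the coroutine: run up to the first yield
print(g.send(5))    # 5
print(g.send(3))    # 8
# g.throw(ValueError())  # would raise ValueError at the yield inside the generator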
