I am new to Python and still in the steep part of the learning curve, so thank you in advance for any comments.
I have a big for loop to run (big in the sense of many iterations), for example:
for i in range(10000):
    for j in range(10000):
        f((i, j))
I thought that how to parallelize it would be a common question, and after hours of searching on Google I arrived at a solution using the "multiprocessing" module, as follows:
from multiprocessing import Pool

pool = Pool()
x = pool.map(f, [(i, j) for i in range(10000) for j in range(10000)])
This works when the loop is small. However, it is really slow when the loop is large, and sometimes a memory error occurs if the loops are too big. It seems that Python generates the list of arguments first, and then feeds the list to the function "f", even when using xrange. Is that correct?
So this parallelization does not work for me because I do not really need to store all arguments in a list. Is there a better way to do this? I appreciate any suggestions or references. Thank you.
It seems that Python generates the list of arguments first, and then feeds the list to the function "f", even when using xrange. Is that correct?
Yes, because you're using a list comprehension, which explicitly asks it to generate that list.
(Note that xrange isn't really relevant here, because you only have two ranges at a time, each 10K long; compared to the 100M of the argument list, that's nothing.)
If you want it to generate the values on the fly as needed, instead of all 100M at once, you want to use a generator expression instead of a list comprehension. Which is almost always just a matter of turning the brackets into parentheses:
x = pool.map(f, ((i, j) for i in range(10000) for j in range(10000)))
However, as you can see from the source, map will ultimately just make a list if you give it a generator, so in this case, that won't solve anything. (The docs don't explicitly say this, but it's hard to see how it could pick a good chunksize to chop the iterable into if it didn't have a length…).
And, even if that weren't true, you'd still just run into the same problem again with the results, because pool.map returns a list.
To solve both problems, you can use pool.imap instead. It consumes the iterable lazily, and returns a lazy iterator of results.
One thing to note is that imap does not guess at the best chunksize if you don't pass one; it just defaults to 1, so you may need a bit of thought or trial and error to optimize it.
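For instance, here is a minimal sketch of imap with an explicit chunksize. The worker body and the chunksize value are made up purely for illustration; the only thing taken from your code is that f receives a single (i, j) tuple:

from multiprocessing import Pool

def f(arg):
    i, j = arg       # f receives a single (i, j) tuple, as in the question
    return i * j     # placeholder work, just for illustration

if __name__ == '__main__':
    pool = Pool()
    args = ((i, j) for i in range(10000) for j in range(10000))
    # chunksize is a guess here; tune it by trial and error for your workload
    for result in pool.imap(f, args, chunksize=1000):
        pass         # results arrive lazily, in order, one at a time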
Also, imap will still queue up some results as they come in, so it can feed them back to you in the same order as the arguments. In pathological cases, it could end up queuing up (poolsize-1)/poolsize of your results, although in practice this is incredibly rare. If you want to solve this, use imap_unordered. If you need to know the ordering, just pass the indexes back and forth with the args and results:
args = ((i, j) for i in range(10000) for j in range(10000))

def indexed_f(indexed_args):
    # imap_unordered passes a single argument, so unpack the (index, (i, j)) pair here
    index, (i, j) = indexed_args
    return index, f((i, j))

results = pool.imap_unordered(indexed_f, enumerate(args))
However, I notice that in your original code you're not doing anything at all with the results of f(i, j), so why bother gathering them up at all? In that case, you can just go back to the loop:
for i in range(10000):
    for j in range(10000):
        pool.apply_async(f, ((i, j),))   # pass the (i, j) tuple as a single argument
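One thing the snippet above glosses over: with apply_async you have to close and join the pool yourself if you want to block until all the submitted work has finished. A minimal sketch, with a placeholder worker and deliberately small ranges:

from multiprocessing import Pool

def f(arg):
    i, j = arg                # same single-tuple signature as in the question

if __name__ == '__main__':
    pool = Pool()
    for i in range(100):      # smaller ranges, just to keep the sketch quick
        for j in range(100):
            pool.apply_async(f, ((i, j),))
    pool.close()              # no more tasks will be submitted
    pool.join()               # block until every queued task has finished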
However, imap_unordered may still be worth using, because it provides a very easy way to block until all of the tasks are done, while still leaving the pool itself running for later use:
from collections import deque

def consume(iterator):
    deque(iterator, maxlen=0)

x = pool.imap_unordered(f, ((i, j) for i in range(10000) for j in range(10000)))
consume(x)
In a Python tutorial, I've learned that
Like functions, generators can be recursively programmed. The following
example is a generator to create all the permutations of a given list of items.
def permutations(items):
    n = len(items)
    if n == 0:
        yield []
    else:
        for i in range(len(items)):
            for cc in permutations(items[:i] + items[i+1:]):
                yield [items[i]] + cc
for p in permutations(['r','e','d']): print(''.join(p))
for p in permutations(list("game")): print(''.join(p) + ", ", end="")
I cannot figure out how it generates the results. The combination of recursion and 'yield' really confuses me. Could someone explain the whole process clearly?
There are two parts to this: recursion and generators. Here's a non-generator version that just uses recursion:
def permutations2(items):
    n = len(items)
    if n == 0:
        return [[]]
    else:
        l = []
        for i in range(len(items)):
            for cc in permutations2(items[:i] + items[i+1:]):
                l.append([items[i]] + cc)
        return l
l.append([items[i]] + cc) roughly translates to: the permutations of these items include every entry where items[i] is the first item, followed by a permutation of the rest of the items.
The generator version yields one permutation at a time instead of returning the entire list of permutations.
An ordinary function, when called, runs to completion, produces its result, and then disappears.
When you ask a generator for its next element, it produces it (yields it), and pauses -- yields (the control back) to you. When asked again for the next element, it will resume its operations, and run normally until hitting a yield statement. Then it will again produce a value and pause.
Thus, calling a generator with some arguments creates an actual in-memory entity, an object capable of running, remembering its state and arguments, and producing values when asked.
Different calls to the same generator produce different objects in memory. The definition is a recipe for creating such an object; once defined, whenever that recipe runs it can invoke any other recipe it needs, or the same one again, to create the new objects that produce values for it.
This is a general answer, not Python-specific.
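To make that concrete in Python, you can drive the permutations generator from the question by hand and watch it pause and resume. (next(gen) works in Python 2.6+ and Python 3; in older Python 2 code you would call gen.next() instead.)

gen = permutations(['r', 'e', 'd'])   # nothing runs yet; we only get a generator object

print(next(gen))   # ['r', 'e', 'd'] - runs until the first yield, then pauses
print(next(gen))   # ['r', 'd', 'e'] - resumes exactly where it paused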
Thanks for the answers. They really helped me clear things up, and now I want to share some useful resources about recursion and generators that I found on the internet, which are also very beginner-friendly.
To understand generators in Python, the link below is really readable and easy to follow:
What does the "yield" keyword do in Python?
To understand recursion: "https://www.youtube.com/watch?v=MyzFdthuUcA". This YouTube video gives a "patented" four-step method for writing any recursive method/function, which is very clear and practical. The channel also has several videos showing how recursion works and how to trace it.
I hope it can help someone like me.
I am trying to understand Python generators. Many tutorials say that they are much faster than, for example, iterating through a list, so I gave it a try and wrote some simple code. I didn't expect the time difference to be that big; can someone explain why? Or maybe I am doing something wrong here.
def f(limit):
    for i in range(limit):
        if (i / 7.0) % 1 == 0:
            yield i

def f1(limit):
    l = []
    for i in range(limit):
        if (i / 7.0) % 1 == 0:
            l.append(i)
    return l
t = timeit.Timer(stmt="f(50)", setup="from __main__ import f")
print t.timeit()
t1 = timeit.Timer(stmt="f1(50)", setup="from __main__ import f1")
print t1.timeit()
Results:
t = 0.565694382945
t1 = 11.9298217371
You are not comparing f and f1 fairly.
Your first test is only measuring how long it takes Python to construct a generator object. It never iterates over this object though, which means the code inside f is never actually executed (generators only execute their code when they are iterated over).
Your second test however measures how long it takes to call f1. Meaning, it counts how long it takes the function to construct the list l, run the for-loop to completion, call list.append numerous times, and then return the result. Obviously, this will be much slower than just producing a generator object.
For a fair comparison, exhaust the generator f by converting it into a list:
t = timeit.Timer(stmt="list(f(50))", setup="from __main__ import f")
This will cause it to be iterated over entirely, which means the code inside f will now be executed.
You're timing how long it takes to create a generator object. Creating one doesn't actually execute any code, so you're essentially timing an elaborate way to do nothing.
After you fix that, you'll find that generators are usually slightly slower when run to completion. Their advantage is that they don't need to store all elements in memory at once, and can stop halfway through. For example, when you have a sequence of boolean values and want to check whether any of them are true, with lists you'd first compute all the values and create a list of them, before checking for truth, while with generators you can:
Create the first boolean
Check if it's true, and if so, stop creating booleans
Else, create the second boolean
Check if that one's true, and if so, stop creating booleans
And so on.
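For instance, here is a small sketch of that early-exit behaviour; the check function is made up purely for illustration:

def check(n):
    print(n)        # so we can see how many values actually get computed
    return n > 2

print(any([check(n) for n in range(10)]))   # list: prints 0..9, every value is computed first
print(any(check(n) for n in range(10)))     # generator: prints only 0, 1, 2, 3, then stops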
https://wiki.python.org/moin/Generators has some good information under the "Improved Performance" section. Although creating a generator can take a bit of time, it offers a number of advantages.
Uses less memory. By creating the values one by one, the whole list is never in memory.
Less time to begin. Making a whole list takes time, while a generator can be used as soon as it creates the first value.
Generators don't have a set ending point.
Here's a good tutorial on creating generators and iterators http://sahandsaba.com/python-iterators-generators.html. Check it out!
I've just run into a tricky issue. The following code is supposed to split words into chunks of length numOfChar. The function calls itself, which makes it impossible to have the resulting list (res) inside the function. But if I keep it outside as a global variable, then every subsequent call of the function with different input values leads to a wrong result because res doesn't get cleared.
Can anyone help me out?
Here's the code
(in case you are interested, this is problem 7-23 from PySchools.com):
res = []

def splitWord(word, numOfChar):
    if len(word) > 0:
        res.append(word[:numOfChar])
        splitWord(word[numOfChar:], numOfChar)
    return res

print splitWord('google', 2)
print splitWord('google', 3)
print splitWord('apple', 1)
print splitWord('apple', 4)
A pure recursive function should not modify global state; that counts as a side effect.
Instead of appending-and-recursion, try this:
def splitWord(word, numOfChar):
    if len(word) > 0:
        return [word[:numOfChar]] + splitWord(word[numOfChar:], numOfChar)
    else:
        return []
Here, you chop the word into pieces one piece at a time, on every call while going down, and then rebuild the pieces into a list while going up.
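A rough trace of splitWord('google', 2) may make that going-down-then-back-up picture concrete:

# splitWord('google', 2)
# -> ['go'] + splitWord('ogle', 2)
#               -> ['og'] + splitWord('le', 2)
#                             -> ['le'] + splitWord('', 2)
#                                           -> []
#                             -> ['le']
#               -> ['og', 'le']
# -> ['go', 'og', 'le']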
This is a common recursive pattern (strictly speaking it is not tail recursion, since the result of the recursive call still has to be combined with [word[:numOfChar]] after it returns).
P.S. As #e-satis notes, recursion is not an efficient way to do this in Python. See also #e-satis's answer for a more elaborate example of tail recursion, along with a more Pythonic way to solve the problem using generators.
Recursion is completely unnecessary here:
def splitWord(word, numOfChar):
    return [word[i:i+numOfChar] for i in xrange(0, len(word), numOfChar)]
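For example, using the function above:

print splitWord('google', 2)   # ['go', 'og', 'le']
print splitWord('google', 3)   # ['goo', 'gle']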
If you insist on a recursive solution, it is a good idea to avoid global variables (they make it really tricky to reason about what's going on). Here is one way to do it:
def splitWord(word, numOfChar):
    if len(word) > 0:
        return [word[:numOfChar]] + splitWord(word[numOfChar:], numOfChar)
    else:
        return []
To elaborate on #Helgi's answer, here is a more performant recursive implementation. It updates a list in place instead of concatenating two lists (which creates a new list object every time).
This pattern forces you to pass a list object as a third parameter.
def split_word(word, num_of_chars, tail):
    if len(word) > 0:
        tail.append(word[:num_of_chars])
        return split_word(word[num_of_chars:], num_of_chars, tail)
    return tail

res = split_word('fdjskqmfjqdsklmfjm', 3, [])
Another advantage of this form is that it allows tail-call optimisation. That is useless in Python, because Python does not perform such optimisation, but if you translate this code into Erlang or Lisp, you will get it for free.
Remember, in Python you are limited by the recursion stack, and there is no way out of it. This is why recursion is not the preferred method.
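To see that limit in action with the recursive split_word above (the 5000-character input is just an illustrative value that comfortably exceeds CPython's default limit of about 1000 frames):

import sys

print(sys.getrecursionlimit())      # typically 1000 by default

try:
    split_word('a' * 5000, 1, [])   # 5000 recursive calls: far past the default limit
except RuntimeError as e:           # RecursionError in Python 3 (a subclass of RuntimeError)
    print(e)                        # maximum recursion depth exceeded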
You would most likely use generators, using yield and itertools (a module to manipulate generators). Here is a very good example of a function that can split any iterable into chunks:
from itertools import chain, islice

def chunk(seq, chunksize, process=iter):
    it = iter(seq)
    while True:
        # when seq is exhausted, it.next() raises StopIteration, which ends this generator
        yield process(chain([it.next()], islice(it, chunksize - 1)))
Now, this is a bit complicated if you are just starting to learn Python, so I'm not expecting you to fully get it now, but it's good to see it and know it exists. You'll come back to it later (we all did; Python's iteration tools are overwhelming at first).
The benefits of this approach are:
It can chunk ANY iterable, not just strings, but also lists, dictionaries, tuples, streams, files, sets, querysets, you name it...
It accepts iterables of any length, and even one with an unknown length (think bytes stream here).
It uses very little memory, because the best thing about generators is that they produce values on the fly, one by one, without storing all the previous results before computing the next.
It returns chunks of any nature, meaning you can have chunks of x letters, lists of x items, or even generators spitting out x items (which is the default).
It returns a generator, and can therefore be used in a flow of other generators. Piping data from one generator to another, bash style, is a wonderful Python ability.
To get the same result as with your function, you would do:
In [17]: list(chunk('fdjskqmfjqdsklmfjm', 3, ''.join))
Out[17]: ['fdj', 'skq', 'mfj', 'qds', 'klm', 'fjm']
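And, to back the claim that it chunks any iterable, the same function can, for example, split a lazy xrange into lists:

In [18]: list(chunk(xrange(10), 4, list))
Out[18]: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]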
for i in vr_world.getNodeNames():
    if i != "_error_":
        World[i] = vr_world.getChild(i)
vr_world.getNodeNames() returns a gigantic list, and vr_world.getChild(i) returns a specific type of object.
This is taking a long time to run; is there any way to make it more efficient? I have seen one-liner for loops before that are supposed to be faster. Ideas?
kaloyan suggests using a generator. Here's why that may help.
If getNodeNames() builds a list, then your loop is basically going over the list twice: once to build it, and once when you iterate over the list.
If getNodeNames() is a generator, then your loop doesn't ever build the list; instead of creating the item and adding it to the list, it creates the item and yields it to the caller.
Whether or not this helps is contingent on a couple of things. First, it has to be possible to implement getNodeNames() as a generator. We don't know anything about the implementation details of that function, so it's not possible to say if that's the case. Next, the number of items you're iterating over needs to be pretty big.
Of course, none of this will have any effect at all if it turns out that the time-consuming operation in all of this is vr_world.getChild(). That's why you need to profile your code.
I don't think you can make it faster than what you have there. Yes, you can put the whole thing on one line, but that will not make it any faster. The bottleneck obviously is getNodeNames(). If you can make it a generator, you will start populating the World dict with results sooner (if that matters to you), and if you make it filter out the "_error_" values, you will not have to deal with that at a later stage.
World = dict((i, vr_world.getChild(i)) for i in vr_world.getNodeNames() if i != "_error_")
This is a one-liner, but not necessarily much faster than your solution...
Maybe you can use filter and map, though I don't know whether this would be any faster:
valid = filter(lambda i: i != "_error_", vr_world.getNodeNames())
World = dict(zip(valid, map(lambda i: vr_world.getChild(i), valid)))   # keep the name -> child mapping
Also, as you'll see a lot around here, profile first, and then optimize, otherwise you may be wasting time. You have two functions there, maybe they are the slow parts, not the iteration.
More and more features of Python are becoming "lazily executable", like generator expressions and other kinds of iterators.
Sometimes, however, I see myself wanting to roll a one-liner "for" loop, just to perform some action.
What would be the most pythonic thing to get the loop actually executed?
For example:
a = open("numbers.txt", "w")
(a.write ("%d " % i) for i in xrange(100))
a.close()
Not actual code, but you see what I mean. If I use a list comprehension instead, I have the side effect of creating an N-length list filled with None's.
Currently what I do is use the expression as the argument in a call to "any" or "all". But I would like to find a way that does not depend on the result of the expression performed in the loop, since both "any" and "all" can stop early depending on the values evaluated.
To be clear, these are ways to do it that I already know about, and each one has its drawbacks:
[a.write ("%d " % i) for i in xrange(100))]
any((a.write ("%d " % i) for i in xrange(100)))
for item in (a.write ("%d " % i) for i in xrange(100)): pass
There is one obvious way to do it, and that is the way you should do it. There is no excuse for doing it a clever way.
a = open("numbers.txt", "w")
for i in xrange(100):
a.write("%d " % i)
d.close()
Lazy execution gives you a serious benefit: It allows you to pass a sequence to another piece of code without having to hold the entire thing in memory. It is for the creation of efficient sequences as data types.
In this case, you do not want lazy execution. You want execution. You can just ... execute. With a for loop.
If I wanted to do this specific example, I'd write
for i in xrange(100): a.write('%d ' % i)
If I often needed to consume an iterator for its effect, I'd define
def for_effect(iterable):
    for _ in iterable:
        pass
There are many accumulators that have the effect of consuming the whole iterable they're given, such as min or max, but even they don't entirely ignore the results yielded in the process (min and max, for example, will raise an exception if some of the results are complex numbers). I don't think there's a built-in accumulator that does exactly what you want, so you'll have to write (and add to your personal stash of tiny utility functions) something such as:
def consume(iterable):
    for item in iterable:
        pass
The main reason, I guess, is that Python has a for statement and you're supposed to use it when it fits like a glove (i.e., for the cases you'd want consume for;-).
BTW, a.write returns None, which is falsish, so any will actually consume it (and a.writelines will do even better!). But I realize you were just giving that as an example;-).
It is 2019, and this is a question from 2010 that keeps showing up. A recent thread on one of Python's mailing lists generated over 70 e-mails on this subject, and once again the idea of adding a consume call to the language was rejected.
On that thread, the most efficient way to do this actually showed up, and it is far from obvious, so I am posting it as the answer here:
from collections import deque

consume = deque(maxlen=0).extend
And then use the consume callable to process generator expressions.
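For instance, applied to the file-writing example from the question (Python 3 here, since this part of the answer is from 2019):

from collections import deque

consume = deque(maxlen=0).extend

with open("numbers.txt", "w") as a:
    consume(a.write("%d " % i) for i in range(100))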
It turns out that the native deque code in CPython is actually optimized for the maxlen=0 case, and will just consume the iterable without storing anything.
The any and all calls I mentioned in the question should be just as efficient, but you have to worry about the truthiness of the expression in order for the whole iterable to be consumed.
I see this may still be controversial; after all, an explicit two-line for loop can handle this. I remembered this question because I just made a commit where I create some threads, start them, and join them back. Without a consume callable, that is 4 lines of mostly boilerplate, and without the benefit of cycling through the iterable in native code:
https://github.com/jsbueno/extracontext/blob/a5d24be882f9aa18eb19effe3c2cf20c42135ed8/tests/test_thread.py#L27