I've just run into a tricky issue. The following code is supposed to split words into chunks of length numOfChar. The function calls itself, which makes it impossible to have the resulting list (res) inside the function. But if I keep it outside as a global variable, then every subsequent call of the function with different input values leads to a wrong result because res doesn't get cleared.
Can anyone help me out?
Here's the code
(in case you are interested, this is problem 7-23 from PySchools.com):
res = []
def splitWord(word, numOfChar):
    if len(word) > 0:
        res.append(word[:numOfChar])
        splitWord(word[numOfChar:], numOfChar)
    return res

print splitWord('google', 2)
print splitWord('google', 3)
print splitWord('apple', 1)
print splitWord('apple', 4)
A pure recursive function should not modify global state; that counts as a side effect. Instead of appending and recursing, try this:
def splitWord(word, numOfChar):
    if len(word) > 0:
        return [word[:numOfChar]] + splitWord(word[numOfChar:], numOfChar)
    else:
        return []
Here, you chop one piece off the word on each call while going down, and then rebuild the pieces into a list while coming back up. This is a common recursive pattern. (Note that it is not, strictly speaking, tail recursion: the list concatenation happens after the recursive call returns, so each frame still has work to do on the way back up.)
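For example, splitWord('google', 2) expands roughly like this:

splitWord('google', 2)
-> ['go'] + splitWord('ogle', 2)
-> ['go'] + (['og'] + splitWord('le', 2))
-> ['go'] + (['og'] + (['le'] + splitWord('', 2)))
-> ['go'] + (['og'] + (['le'] + []))
-> ['go', 'og', 'le']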
P.S. As @e-satis notes, recursion is not an efficient way to do this in Python. See also @e-satis's answer for a more elaborate example of tail recursion, along with a more Pythonic way to solve the problem using generators.
Recursion is completely unnecessary here:
def splitWord(word, numOfChar):
    return [word[i:i+numOfChar] for i in xrange(0, len(word), numOfChar)]
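Used with the inputs from the question, this gives, for example:

print splitWord('google', 2)   # ['go', 'og', 'le']
print splitWord('apple', 4)    # ['appl', 'e']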
If you insist on a recursive solution, it is a good idea to avoid global variables (they make it really tricky to reason about what's going on). Here is one way to do it:
def splitWord(word, numOfChar):
    if len(word) > 0:
        return [word[:numOfChar]] + splitWord(word[numOfChar:], numOfChar)
    else:
        return []
To elaborate on @Helgi's answer, here is a more performant recursive implementation. It appends to a single list instead of summing two lists (which creates a new list object on every call). This pattern forces you to pass a list object as a third parameter.
def split_word(word, num_of_chars, tail):
    if len(word) > 0:
        tail.append(word[:num_of_chars])
        return split_word(word[num_of_chars:], num_of_chars, tail)
    return tail

res = split_word('fdjskqmfjqdsklmfjm', 3, [])
Another advantage of this form is that it allows tail-recursion optimisation. It's useless in Python, because it's not a language that performs such optimisation, but if you translate this code into Erlang or Lisp, you will get it for free.
Remember, in Python you are limited by the recursion stack, and there is no way around it. This is why recursion is not the preferred method.
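A quick sketch of what that limit looks like in practice (the default depth is about 1000 frames, and the exact error message depends on your Python version):

import sys
print sys.getrecursionlimit()   # typically 1000
split_word('a' * 5000, 1, [])   # RuntimeError: maximum recursion depth exceeded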
You would most likely use generators, with yield and itertools (a module to manipulate generators). Here is a very good example of a function that can split any iterable into chunks:
from itertools import chain, islice
def chunk(seq, chunksize, process=iter):
    it = iter(seq)
    while True:
        # it.next() raises StopIteration once the input is exhausted,
        # which ends this generator as well (Python 2 syntax)
        yield process(chain([it.next()], islice(it, chunksize - 1)))
Now, this is a bit complicated if you are just starting to learn Python, so I'm not expecting you to fully get it now, but it's good that you see it and know it exists. You'll come back to it later (we all did; Python's iteration tools are overwhelming at first).
The benefits of this approach are:
It can chunk ANY iterable, not just strings: lists, dictionaries, tuples, streams, files, sets, querysets, you name it...
It accepts iterables of any length, even of unknown length (think byte streams here).
It uses very little memory: the best thing about generators is that they produce values on the fly, one by one, and don't store previous results before computing the next.
It returns chunks of any nature, meaning you can have chunks of x letters, lists of x items, or even generators spitting out x items (which is the default).
It returns a generator, and can therefore be used in a flow of other generators. Piping data from one generator to the next, bash style, is a wonderful Python ability.
To get the same result as with your function, you would do:
In [17]: list(chunk('fdjskqmfjqdsklmfjm', 3, ''.join))
Out[17]: ['fdj', 'skq', 'mfj', 'qds', 'klm', 'fjm']
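And to echo the point about chunking any iterable, a quick sketch chunking a plain list into sub-lists:

In [18]: list(chunk(range(12), 5, list))
Out[18]: [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11]]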
So I was writing a function where a lot of processing happens in the body of a loop, and occasionally it may be of interest to the caller to have the answer to some of the computations.
Normally I would just put the results in a list and return the list, but in this case the results are too large (a few hundred MB on each loop).
I wrote this without really thinking about it, expecting Python's dynamic typing to figure things out, but the following always creates a generator.
def mixed(is_generator=False):
    for i in range(5):
        # process some stuff including writing to a file
        if is_generator:
            yield i
    return
From this I have two questions:
1) Does the presence of the yield keyword in a scope immediately turn the function it's in into a generator?
2) Is there a sensible way to obtain the behaviour I intended?
2.1) If no, what is the reasoning behind it not being possible? (In terms of how functions and generators work in Python.)
Let's go step by step:
1) Does the presence of the yield keyword in a scope immediately turn the function it's in into a generator? Yes
2) Is there a sensible way to obtain the behaviour I intended? Yes, see example below
The thing is to wrap the computation and either return a generator or a list with the data of that generator:
def mixed(is_generator=False):
    # create a generator object
    gen = (compute_stuff(i) for i in range(5))
    # if we want just the generator
    if is_generator:
        return gen
    # if not, we consume it with a list and return that list
    return list(gen)
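Usage would then look something like this (with compute_stuff standing in for your per-item work):

gen = mixed(is_generator=True)   # a generator object; nothing is computed yet
data = mixed()                   # a plain list, fully computed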
Anyway, I would say this is bad practice. You should keep the two concerns separated: usually, just have the generator function and apply the logic outside:
def computation():
    for i in range(5):
        # process some stuff including writing to a file
        yield i

gen = computation()
if lazy:
    for data in gen:
        use_data(data)
else:
    data = list(gen)
    use_all_data(data)
Think about a function that I'm calling for its side effects, not return values (like printing to screen, updating GUI, printing to a file, etc.).
def fun_with_side_effects(x):
    ...side effects...
    return y
Now, is it Pythonic to use list comprehensions to call this func:
[fun_with_side_effects(x) for x in y if (...conditions...)]
Note that I don't save the list anywhere
Or should I call this func like this:
for x in y:
    if (...conditions...):
        fun_with_side_effects(x)
Which is better and why?
It is very anti-Pythonic to do so, and any seasoned Pythonista will give you hell over it. The intermediate list is thrown away after it is created, and it could potentially be very, very large, and therefore expensive to create.
You shouldn't use a list comprehension, because, as people have said, that would build a large temporary list that you don't need. The following two methods are equivalent:
consume(side_effects(x) for x in xs)

for x in xs:
    side_effects(x)
with the definition of consume from the itertools man page:
import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)
Of course, the latter is clearer and easier to understand.
List comprehensions are for creating lists. And unless you are actually creating a list, you should not use list comprehensions.
So I would go for the second option: just iterate over the list and call the function when the conditions apply.
Second is better.
Think of the person who would need to understand your code. You can get bad karma easily with the first :)
You could take a middle path between the two by using filter(). Consider this example:

y = [1, 2, 3, 4, 5, 6]

def func(x):
    print "call with %r" % x

for x in filter(lambda x: x > 3, y):
    func(x)
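which prints:

call with 4
call with 5
call with 6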
Depends on your goal.
If you are trying to do some operation on each object in a list, the second approach should be adopted.
If you are trying to generate a list from another list, you may use list comprehension.
Explicit is better than implicit.
Simple is better than complex. (Python Zen)
You can do
for z in (fun_with_side_effects(x) for x in y if (...conditions...)): pass
but it's not very pretty.
Using a list comprehension for its side effects is ugly, non-Pythonic, inefficient, and I wouldn't do it. I would use a for loop instead, because a for loop signals a procedural style in which side-effects are important.
But if you absolutely insist on using a list comprehension for its side effects, at least avoid the inefficiency by using a generator expression instead, and do one of these two:
any(fun_with_side_effects(x) and False for x in y if (...conditions...))
or:
all(fun_with_side_effects(x) or True for x in y if (...conditions...))
These are generator expressions, and they do not build a throwaway list that gets tossed out. I think the all form is perhaps slightly more clear, though I think both of them are confusing and shouldn't be used.
I think this is ugly and I wouldn't actually do it in code. But if you insist on implementing your loops in this fashion, that's how I would do it.
I tend to feel that list comprehensions and their ilk should signal an attempt to use something at least faintly resembling a functional style. Putting things with side effects that break that assumption will cause people to have to read your code more carefully, and I think that's a bad thing.
In a Python tutorial, I've learned that
Like functions, generators can be recursively programmed. The following
example is a generator to create all the permutations of a given list of items.
def permutations(items):
    n = len(items)
    if n == 0:
        yield []
    else:
        for i in range(len(items)):
            for cc in permutations(items[:i] + items[i+1:]):
                yield [items[i]] + cc

for p in permutations(['r', 'e', 'd']): print(''.join(p))
for p in permutations(list("game")): print(''.join(p) + ", ", end="")
I cannot figure out how it generates the results. The recursive things and 'yield' really confused me. Could someone explain the whole process clearly?
There are two parts to this: recursion and generators. Here's the non-generator version that just uses recursion:
def permutations2(items):
    n = len(items)
    if n == 0:
        return [[]]
    else:
        l = []
        for i in range(len(items)):
            for cc in permutations2(items[:i] + items[i+1:]):
                l.append([items[i]] + cc)
        return l
l.append([items[i]]+cc) roughly translates to: the permutations of these items include every entry where items[i] is the first item, followed by a permutation of the rest of the items. The generator part yields the permutations one at a time instead of returning the entire list of permutations.
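To see the difference on a tiny input (a small sketch): the list version hands back everything at once, while the generator produces one permutation per request:

print(permutations2(['a', 'b']))   # [['a', 'b'], ['b', 'a']]
gen = permutations(['a', 'b'])
print(next(gen))                   # ['a', 'b']
print(next(gen))                   # ['b', 'a']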
When you call an ordinary function, it runs to completion, produces its result, and disappears.
When you ask a generator for its next element, it produces it (yields it), and pauses -- yields (the control back) to you. When asked again for the next element, it will resume its operations, and run normally until hitting a yield statement. Then it will again produce a value and pause.
Thus, calling a generator with some argument causes the creation of an actual entity in memory, an object, capable of running, remembering its state and arguments, and producing values when asked.
Different calls to the same generator produce different objects in memory. The definition is a recipe for the creation of such objects. Once the recipe is defined, whenever it is called it can call any other recipe it needs -- or the same one -- to create the new objects it needs to produce its values.
This is a general answer, not Python-specific.
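To ground this in the question's code: two calls to permutations create two independent generator objects, each with its own paused state (a small sketch):

g1 = permutations(['r', 'e', 'd'])
g2 = permutations(['r', 'e', 'd'])
print(next(g1))   # ['r', 'e', 'd']
print(next(g1))   # ['r', 'd', 'e']
print(next(g2))   # ['r', 'e', 'd'] -- g2 starts over with its own state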
Thanks for the answers. They really helped me clear my mind, and now I want to share some useful resources about recursion and generators that I found on the internet, which are also very beginner-friendly.
To understand generators in Python, the link below is really readable and easy to understand:
What does the "yield" keyword do in Python?
To understand recursion: https://www.youtube.com/watch?v=MyzFdthuUcA. This YouTube video gives a "patented" 4-step method for writing any recursive method/function. It is very clear and practicable. The channel also has several videos showing how recursion works and how to trace it.
I hope it can help someone like me.
Is there a way I can implement the code block below using map or list comprehension or any other faster way, keeping it functionally the same?
def name_check(names, n_val):
    lower_names = names.lower()
    for item in set(n_val):
        if item in lower_names:
            return True
    return False
Any help here is appreciated
A simple implementation would be
return any(character in names_lower for character in n_val)
A naive guess at the complexity would be O(K + 2N), where K is the number of characters in names and N is the number of characters in n_val: we need one "loop" for the call to lower*, one for the inner comprehension, and one for any. Since any is a built-in function and we're using a generator expression, I would expect it to be faster than your manual loop, but as always, profile to be sure.
To be clear, any short-circuits, so that behaviour is preserved.
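Putting it together, the rewritten function might look like this (a sketch of the suggested form, without the set optimization discussed below):

def name_check(names, n_val):
    names_lower = names.lower()   # lowercase once, up front
    return any(character in names_lower for character in n_val)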
Notes on Your Implementation
On using a set: your intuition to use a set to reduce the number of checks is a good one (you could add it to my version above, too), but it's a trade-off. In the case where the first element short-circuits, the extra call to set costs an additional N steps to build the set. In the case where you wind up checking every item, it will save you some time. It depends on your expected inputs. If n_val was originally a lazy iterable, you've also lost that benefit and allocated all the memory up front. If you control the input to the function, why not just recommend that it be called with lists that don't contain duplicates (i.e., call set() on the input beforehand), and leave the function general?
* @Neopolitan pointed out that names_lower = names.lower() should be called outside the loop, as your original implementation does; otherwise it may (will?) be called repeatedly in the generator expression.
Which user-made Python list-comprehension constructions are the most useful?
I have created the following two quantifiers, which I use to do different verification operations:
def every(f, L): return not (False in [f(x) for x in L])
def some(f, L): return True in [f(x) for x in L]
An optimized version (requires Python 2.5+) was proposed below:
def every(f, L): return all(f(x) for x in L)
def some(f, L): return any(f(x) for x in L)
So, how does it work?
"""For all x in [1,4,9] there exists such y from [1,2,3] that x = y**2"""
answer = every(lambda x: some(lambda y: y**2 == x, [1, 2, 3]), [1, 4, 9])
Using such operations, you can easily do smart verifications, like:
"""There exists at least one bot in a room which has a life below 30%"""
answer = some(bots_in_this_room, lambda x: x.life < 0.3)
and so on, you can answer even very complicated questions using such quantifiers. Of course, there is no infinite lists in Python (hey, it's not Haskell :) ), but Python's lists comprehensions are very practical.
Do you have your own favourite lists-comprehension constructions?
PS: I wonder why most people tend not to answer the question but to criticize the presented examples? The question is actually about favourite list-comprehension constructions.
any and all are part of standard Python from 2.5 on. There's no need to make your own versions of these. Also, the official versions of any and all short-circuit the evaluation if possible, giving a performance improvement; your versions always iterate over the entire list.
If you want a version that accepts a predicate, use something like this that leverages the existing any and all functions:
def anyWithPredicate(predicate, l): return any(predicate(x) for x in l)
def allWithPredicate(predicate, l): return all(predicate(x) for x in l)
I don't particularly see the need for these functions though, as it doesn't really save much typing.
Also, hiding existing standard Python functions with your own functions that have the same name but different behaviour is a bad practice.
There aren't all that many cases where a list comprehension (LC for short) will be substantially more useful than the equivalent generator expression (GE for short, i.e., using round parentheses instead of square brackets, to generate one item at a time rather than "all in bulk at the start").
Sometimes you can get a little extra speed by "investing" the extra memory to hold the list all at once, depending on vagaries of optimization and garbage collection on one or another version of Python, but that hardly amounts to substantial extra usefulness of LC vs GE.
Essentially, to get substantial extra use out of the LC as compared to the GE, you need use cases which intrinsically require "more than one pass" on the sequence. In such cases, a GE would require you to generate the sequence once per pass, while, with an LC, you can generate the sequence once, then perform multiple passes on it (paying the generation cost only once). Multiple generation may also be problematic if the GE / LC are based on an underlying iterator that's not trivially restartable (e.g., a "file" that's actually a Unix pipe).
For example, say you are reading a non-empty open text file f which has a bunch of (textual representations of) numbers separated by whitespace (including newlines here and there, empty lines, etc). You could transform it into a sequence of numbers with either a GE:
G = (float(s) for line in f for s in line.split())
or a LC:
L = [float(s) for line in f for s in line.split()]
Which one is better? It depends on what you're doing with it (i.e., the use case!). If all you want is, say, the sum, sum(G) and sum(L) will do just as well. If you want the average, sum(L)/len(L) is fine for the list, but won't work for the generator -- given the difficulty of "restarting" f, to avoid an intermediate list you'll have to do something like:
tot = 0.0
for i, x in enumerate(G): tot += x
return tot/(i+1)
nowhere near as snappy, fast, concise, and elegant as return sum(L)/len(L).
Remember that sorted(G) does return a list (inevitably), so L.sort() (which is in-place) is the rough equivalent in this case -- sorted(L) would be supererogatory (as you would then have two lists). So when a sorted result is needed, the generator may often be preferred simply for conciseness.
All in all, since L is identically equivalent to list(G), it's hard to get very excited about the ability to express it via punctuation (square brackets instead of round parentheses) instead of a single, short, pronounceable and obvious word like list;-). And that's all a LC is -- punctuation-based syntax shortcut for list(some_genexp)...!
This solution shadows builtins which is generally a bad idea. However the usage feels fairly pythonic, and it preserves the original functionality.
Note there are several ways to potentially optimize this based on testing, including moving the imports out to module level, and changing f's default to None and testing for it instead of using a default lambda as I did.
def any(l, f=lambda x: x):
    from __builtin__ import any as _any
    return _any(f(x) for x in l)

def all(l, f=lambda x: x):
    from __builtin__ import all as _all
    return _all(f(x) for x in l)
Just putting that out there for consideration and to see what people think of doing something so potentially dirty.
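For what it's worth, usage would look like this (a small sketch; note these are the Python 2 shadowed versions above):

print any([0, '', None])               # False, same as the builtin
print any([1, 2, 3], lambda x: x > 2)  # True, using the predicate form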
For your information, the documentation for the itertools module in Python 3.x lists some pretty nice generator functions.