How to use Python's yield statement

I have a list of items and would like to generate all possible subsets. Therefore I'm using a recursive function with the item number and a list of all selected items as parameters. The function is called with 0 as the first parameter and does the following:
It looks at the item described by the index parameter
It selects it
It calls itself with an incremented index parameter
It deselects the item
It calls itself with an incremented index parameter
I need the possible subsets to optimise something, but since the list will get very long, I can't look at all of them. At first I tried brute force, taking all subsets into consideration, but that was a naive idea. The new plan is a greedy algorithm that takes the first "useful" selection: I want to look at subsets one at a time until I find one that suits my needs, and I figured that Python's yield statement is exactly the right choice. Here's some code:
def bruteForceLeft(selected,index):
    #left is the list of which i need subsets
    #it's a global variable. to test the code, just make sure that you have a
    #list called left in scope
    if index==len(left):
        #print(selected)
        yield selected
    else:
        #the algorithm stores the selection in a tuple of two lists
        #that's necessary since there's a second list called right as well
        #I think you can just ignore this. Think of selected as a list that
        #contains the current selection, not a tuple that contains the current
        #selection on the right as well as the left side.
        selected[0].append(left[index])
        bruteForceLeft(selected,index+1)
        selected[0].pop()
        bruteForceLeft(selected,index+1)
#as you can see I pass a tuple of two empty lists to the function.
#only the first one is used in this piece of code
for option in bruteForceLeft( ([],[]) ,0):
    print(option)
    #check if the option is "good"
    #break
The output is: nothing
At first I thought that I had made an error in generating the subsets, but in the if condition you can see that I have a commented print statement. If I uncomment this print statement and instead comment out the yield statement all the possible choices are printed - and the for loop is broken
With the yield statement the code runs without error, but it doesn't do anything either.

The problem is that when you recursively call bruteForceLeft, the yielded values don't magically get yielded from the enclosing function. So, you need to re-yield them yourself:
def bruteForceLeft(selected,index):
    #left is the list of which i need subsets
    if index==len(left):
        #print(selected)
        yield selected
    else:
        #the algorithm stores the selection in a tuple of two lists
        #that's necessary since there's a second list called right as well
        #I think you can just ignore this. Think of selected as a list that
        #contains the current selection, not a tuple that contains the current
        #selection on the right as well as the left side.
        selected[0].append(left[index])
        for s in bruteForceLeft(selected,index+1):
            yield s
        selected[0].pop()
        for s in bruteForceLeft(selected,index+1):
            yield s
(Edit: I actually just tested this, and your code has errors, but I'm pretty sure not re-yielding is the problem)
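For what it's worth, here is a minimal sketch of the same idea on Python 3.3+, where `yield from` does the re-yielding for you. The function name `subsets` and its parameters are made up for illustration; it takes the items as an argument instead of reading a global, and it assumes a single list rather than your tuple of two lists. It also yields a copy of the current selection, because the recursion keeps mutating the same list and a caller who stores the yielded objects would otherwise see them all change.

```python
def subsets(items, index=0, selected=None):
    """Lazily yield every subset of items (hypothetical helper)."""
    if selected is None:
        selected = []
    if index == len(items):
        # Yield a copy: 'selected' is mutated as the recursion unwinds.
        yield list(selected)
        return
    # Branch 1: include items[index].
    selected.append(items[index])
    yield from subsets(items, index + 1, selected)
    # Branch 2: exclude items[index].
    selected.pop()
    yield from subsets(items, index + 1, selected)


# Stop at the first subset that is "good enough"; the rest are never built.
for option in subsets([3, 5, 8]):
    if sum(option) == 8:
        print(option)
        break
```

Because the generator is lazy, breaking out of the loop means the remaining subsets are never generated.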

Related

Returning list of different results that are created recursively in Python

Lately I've been working with some recursive problems in Python where I have to generate a list of possible configurations (i.e list of permutations of a given string, list of substrings, etc..) using recursion. I'm having a very hard time in finding the best practice and also in understanding how to manage this sort of variable in recursion.
I'll give the example of the generate binary trees problem. I more-or-less know what I have to implement in the recursion:
If n=1, return just one node.
If n=3, return the only possible binary tree.
For n>3, create one node and then explore the possibilities: left node is childless, right node is childless, neither node is childless. Explore these possibilities recursively.
Now the thing I'm having the most trouble visualising is how exactly I am going to arrive to the list of trees. Currently the practice I do is pass along a list in the function call (as an argument) and the function would return this list, but then the problem is in case 3 when calling the recursive function to explore the possibilities for the nodes it would be returning a list and not appending nodes to a tree that I am building. When I picture the recursion tree in my head I imagine a "tree" variable that is unique to each of the tree leaves, and these trees are added to a list which is returned by the "root" (i.e. first) call. But I don't know if that is possible. I thought of a global list and the recursive function not returning anything (just appending to it) but the problem I believe is that at each call the function would receive a copy of the variable.
How can I deal with generating combinations and returning lists of configurations in these cases in recursion? While I gave an example, the more general the answer the better. I would also like to know if there is a "best practice" when it comes to that.
"Currently the practice I do is pass along a list in the function call (as an argument) and the function would return this list"
This is not the purest way to attack a recursive problem. It would be better if you can make the recursive function such that it solves the sub problem without an extra parameter variable that it must use. So the recursive function should just return a result as if it was the only call that was ever made (by the testing framework). So in the example, that recursive call should return a list with trees.
Alternatively the recursive function could be a sub-function that doesn't return a list, but yields the individual values (in this case: trees). The caller can then decide whether to pack that into a list or not. This is more pythonic.
As to the example problem, it is also important to identify some invariants. For instance, it is clear that there are no solutions when n is even. As to the recursive aspect: once you have decided to create a root, both its left and right subtrees will have an odd number of nodes. Of course, this is an observation that is specific to this problem, but it is important to look for such problem properties.
Finally, it is equally important to see if the same sub problems can reoccur multiple times. This surely is the case in the example problem: for instance, the left subtree may sometimes have the same number of nodes as the right subtree. In such cases memoization will improve efficiency (dynamic programming).
When the recursive function returns a list, the caller can then iterate that list to retrieve its elements (trees in the example), and use them to build an extended result that satisfies the caller's task. In the example case that means that the tree taken from the recursively retrieved list is attached as a child to a new root. Then this new tree is appended to a new list (not related to the one returned from the recursive call). This new list will in many cases be longer, although this depends on the type of problem.
To further illustrate the way to tackle these problems, here is a solution for the example problem: one which uses the main function for the recursive calls, and using memoization:
from typing import List, Optional

# TreeNode(val, left, right) is assumed to be supplied by the testing framework
class Solution:
    memo = { 1: [TreeNode()] }

    def allPossibleFBT(self, n: int) -> List[Optional[TreeNode]]:
        # If we didn't solve this problem before...
        if n not in self.memo:
            # Create a list for storing the results (the trees)
            results = []
            # Before creating any root node,
            # decide the size of the left subtree.
            # It must be odd
            for num_left in range(1, n, 2):
                # Make the recursive call to get all shapes of the
                # left subtree
                left_shapes = self.allPossibleFBT(num_left)
                # The remainder of the nodes must be in the right subtree
                num_right = n - 1 - num_left # The root also counts as 1
                right_shapes = self.allPossibleFBT(num_right)
                # Now iterate the results we got from recursion and
                # combine them in all possible ways to create new trees
                for left in left_shapes:
                    for right in right_shapes:
                        # We have a combination. Now create a new tree from it
                        # by putting a root node on top of the two subtrees:
                        tree = TreeNode(0, left, right)
                        # Append this possible shape to our results
                        results.append(tree)
            # All done. Save this for later re-use
            self.memo[n] = results
        return self.memo[n]
This code can be made more compact using list comprehension, but it may make the code less readable.
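For illustration, the compact form might look like the sketch below. It is written as a standalone function (`all_possible_fbt` and the module-level `_memo` dict are names invented here), and it defines a small `TreeNode` class on the assumption that the testing framework's version takes `(val, left, right)`. As in the longer version, the resulting trees share subtree objects with each other.

```python
class TreeNode:
    """Stand-in for the node class the testing framework is assumed to provide."""
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


# Results per node count; a single node is the only base case.
_memo = {1: [TreeNode()]}


def all_possible_fbt(n):
    """Return all full binary trees with n nodes (empty list when n is even)."""
    if n not in _memo:
        _memo[n] = [
            TreeNode(0, left, right)
            for num_left in range(1, n, 2)                    # odd left sizes
            for left in all_possible_fbt(num_left)
            for right in all_possible_fbt(n - 1 - num_left)   # root counts as 1
        ]
    return _memo[n]


print(len(all_possible_fbt(7)))  # 5
```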
Don't pass information into the recursive calls, unless they need that information to compute their local result. It's much easier to reason about recursion when you write without side effects. So instead of having the recursive call put its own results into a list, write the code so that the results from the recursive calls are used to create the return value.
Let's take a trivial example, converting a simple loop to recursion, and using it to accumulate a sequence of increasing integers.
def recursive_range(n):
    if n == 0:
        return []
    return recursive_range(n - 1) + [n]
We are using functions in the natural way: we put information in with the arguments, and get information out using the return value (rather than mutation of the parameters).
In your case:
"Now the thing I'm having the most trouble visualising is how exactly I am going to arrive to the list of trees."
So you know that you want to return a list of trees at the end of the process. So the natural way to proceed, is that you expect each recursive call to do that, too.
"How can I deal with generating combinations and returning lists of configurations in these cases in recursion? While I gave an example, the more general the answer the better."
The recursive calls return their lists of results for the sub-problems. You use those results to create the list of results for the current problem.
You don't need to think about how recursion is implemented in order to write recursive algorithms. You don't need to think about the call stack. You do need to think about two things:
What are the base cases?
How does the problem break down recursively? (Alternately: why is recursion a good fit for this problem?)
The thing is, recursion is not special. Making the recursive call is just like calling any other function that would happen to give you the correct answer for the sub-problem. So all you need to do is understand how solving the sub-problems helps you to solve the current one.
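To make that concrete with one of the other problems mentioned in the question (permutations of a string), here is a rough sketch. The function name `permutations` is just for illustration (the standard library has `itertools.permutations` if you don't need to write it yourself). Each call returns the complete list for its own sub-problem, and the caller builds its answer from that list without any shared accumulator.

```python
def permutations(s):
    """Return a list of all orderings of the characters in s."""
    # Base case: the empty string and single characters have one ordering.
    if len(s) <= 1:
        return [s]
    result = []
    for i, ch in enumerate(s):
        # Trust the recursive call to solve the smaller problem correctly...
        for rest in permutations(s[:i] + s[i + 1:]):
            # ...and use its results to build this call's results.
            result.append(ch + rest)
    return result


print(permutations("abc"))  # ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
```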

Class wordplay with four methods

I have the following problem and two very important questions.
Write a class called Wordplay. It should have a field that holds a list of words. The user
of the class should pass the list of words they want to use to the class. There should be the
following methods:
words_with_length(length) — returns a list of all the words of length length
starts_with(s) — returns a list of all the words that start with s
ends_with(s) — returns a list of all the words that end with s
palindromes() — returns a list of all the palindromes in the list
First problem. When I run my program, the methods starts_with and ends_with return the same word.
Next problem. In this case I have created a list of three names. But what if I wanted to ask for a list size and then loop that many times, asking the user to input a word each time? How can I implement that idea?
class Wordplay:
    def __init__(self):
        self.words_list=[]
    def words_with_lenght(self,lenght):
        for i in range(0,len(self.words_list)-1):
            if len(self.words_list[i])==lenght:
                return self.words_list[i]
    def starts_with_s(self,s):
        for i in range(0,len(self.words_list)-1):
            if s.startswith('s')==True:
                return self.words_list[i]
    def ends_with_s(self,s):
        for i in range(0,len(self.words_list)-1):
            if s.endswith('s')==True:
                return self.words_list[i]
    def palindromes(self):
        for i in range(0,len(self.words_list)-1):
            normal_word=self.words_list[i]
            reversed_word=normal_word[::-1]
            if reversed_word==normal_word:
                return reversed_word
verification=Wordplay()
verification.words_list=['sandro','abba','luis']
lenght=int(input('Digit size you want to compare\n'))
s='s'
print(verification.words_with_lenght(lenght))
print(verification.starts_with_s(s))
print(verification.ends_with_s(s))
print(verification.palindromes())
If I input, for example, size 4, I expect the result to be:
abba, luis ; sandro ; luis ; abba
and not:
abba ; sandro ; sandro ; abba
In the line if s.startswith('s')==True:, you've passed the string "s" into the function resulting in
if 's'.startswith('s')==True:
#  ^^^
    return self.words_list[i]
This conditional is always true. You probably don't need a parameter here at all since the assignment asks you to hard code "s". You can use:
if self.words_list[i].startswith('s'):
    return self.words_list[i]
Notice the above example uses a return as soon as a match is found. This is a problem. The loops in this program break early, returning from the function as soon as a single match is located. You may have intended to append each successful match to a list and return the resulting list or use the yield keyword to return a generator (but the caller would need to use list() if they want a persistent list from the generator). Using a list to build a result would look like:
result = []
for i in range(len(self.words_list)):
    if self.words_list[i].startswith('s'):
        result.append(self.words_list[i])
return result
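The generator alternative mentioned above could look roughly like this. It is written as a free function with made-up names (`words_starting_with`, `words`, `prefix`) so that it runs on its own; inside the class you would loop over `self.words_list` instead.

```python
def words_starting_with(words, prefix):
    """Yield each word that starts with prefix, one at a time."""
    for word in words:
        if word.startswith(prefix):
            yield word


matches = words_starting_with(['sandro', 'abba', 'luis', 'sam'], 's')
# A generator can only be consumed once; wrap it in list() to keep the results.
print(list(matches))  # ['sandro', 'sam']
```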
Another issue: the loops in this program don't iterate all the way through their respective lists. The range() function is inclusive of the start and exclusive of the end, so you likely intended range(len(self.words_list)) instead of range(0, len(self.words_list) - 1).
Beyond that, there are a number of design and style points I'd like to suggest:
Use horizontal space between operators and use vertical whitespace around blocks.
foo=bar.corge(a,b,c)
if foo==baz:
    return quux
is clearer as
foo = bar.corge(a, b, c)

if foo == baz:
    return quux
Use 4 spaces to indent instead of 2, which makes it easier to quickly determine which code is in which block.
Prefer for element in my_list instead of for i in range(len(my_list)). If you need the index, in most cases you can use for i, elem in enumerate(my_list). Better yet, use list comprehensions to perform filtering operations, which is most of this logic.
There's no need to use if condition == True. if condition is sufficient. You can simplify confusing and inaccurate logic like:
def palindromes(self):
    for i in range(0,len(self.words_list)-1):
        normal_word=self.words_list[i]
        reversed_word=normal_word[::-1]
        if reversed_word==normal_word:
            return reversed_word
to, for example:
def palindromes(self):
    return [word for word in self.words_list if word[::-1] == word]
that is, avoid intermediate variables and indexes whenever possible.
I realize you're probably tied down to the design, but this strikes me as a strange way to write a utility class. It'd be more flexible as static methods that operate on iterables. Typical usage might be like:
from Wordplay import is_palindrome
is_palindrome(some_iterable)
instead of:
wordplay = Wordplay(some_iterable)
wordplay.palindromes()
My rationale is that this class is basically stateless, so it seems odd to impose state when none is needed. This is a bit subjective, but worth noting (if you've ever used the math or random modules, it's the same idea).
The lack of a parameter in the constructor is even weirder; the client of the class has to magically "know" somehow that words_list is the internal variable name they need to assign to in order to populate class state. This variable name should be an implementation detail that the client has no idea about. If the initialization function doesn't take a parameter, there should at least be a setter for this field (or just skip internal state entirely).
ends_with_s(self, s) is a silly function; it seems the designer is confused between wanting to write ends_with(self, letter) and ends_with_s(self) (the former is far preferable). What if you want a new letter? Do you need to write dozens of functions for each possible ending character ends_with_a, ends_with_b, ends_with_c, etc? I realize it's just a contrived assignment, but the class still exhibits poor design.
Spelling error: words_with_lenght -> words_with_length.
Here's a general tip on how to build the skill at locating these problems: work in very small chunks and run your program often. It appears that these four functions were written all in one go without testing each function along the way to make sure it worked first. This is apparent because the same mistakes were repeated in all four functions.
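Pulling the points above together, one possible shape for the class is sketched below. This is only my reading of the assignment text as quoted in the question (methods named `words_with_length`, `starts_with(s)`, `ends_with(s)` and `palindromes`, each returning a list); the constructor parameter `words` and the attribute name are my own choices.

```python
class Wordplay:
    def __init__(self, words):
        # Copy the caller's words so later changes to their list don't leak in.
        self.words = list(words)

    def words_with_length(self, length):
        return [word for word in self.words if len(word) == length]

    def starts_with(self, s):
        return [word for word in self.words if word.startswith(s)]

    def ends_with(self, s):
        return [word for word in self.words if word.endswith(s)]

    def palindromes(self):
        return [word for word in self.words if word == word[::-1]]


wordplay = Wordplay(['sandro', 'abba', 'luis'])
print(wordplay.words_with_length(4))  # ['abba', 'luis']
print(wordplay.starts_with('s'))      # ['sandro']
print(wordplay.ends_with('s'))        # ['luis']
print(wordplay.palindromes())         # ['abba']
```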
s.endswith('s') compares your input string s ("s") with "s". "s" ends in "s", so the check is always true and the method always returns your first entry. Change it to if self.words_list[i].endswith('s'): (and likewise for startswith).
I would recommend changing your for loops to iterate over the words themselves though:
def ends_with_s(self, s):
    for word in self.words_list:
        if word.endswith('s'):
            return word
Entering a list of values as you described:
amount = int(input("How many words? "))
words = [input("Word {}".format(i + 1)) for i in range(amount)]

making python code block with loop faster

Is there a way I can implement the code block below using map or list comprehension or any other faster way, keeping it functionally the same?
def name_check(names, n_val):
    lower_names = names.lower()
    for item in set(n_val):
        if item in lower_names:
            return True
    return False
Any help here is appreciated
A simple implementation would be
return any(character in names_lower for character in n_val)
A naive guess at the complexity would be O(K*2*N) where K is the number of characters in names and N is the number of characters in n_val. We need one "loop" for the call to lower*, one for the inner comprehension, and one for any. Since any is a built-in function and we're using a generator expression, I would expect it to be faster than your manual loop, but as always, profile to be sure.
To be clear, any short-circuits, so that behaviour is preserved
Notes on Your Implementation
On using a set: Your intuition to use a set to reduce the number of checks is a good one (you could add it to my form above, also), but it's a trade-off. In the case that the first element short circuits, the extra call to set is an additional N steps to produce the set expression. In the case where you wind up checking each item, it will save you some time. It depends on your expected inputs. If n_val was originally an iterable, you've lost that benefit and allocated all the memory up front. If you control the input to the function, why not just recommend it's called using lists that don't have duplicates (i.e., call set() on its input), and leave the function general?
* @Neopolitan pointed out that names_lower = names.lower() should be called outside the loop, as your original implementation did, else it may (will?) be called repeatedly in the generator expression
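Putting the one-liner and the footnote together, the whole function might look like this. It is a sketch that keeps the question's function and parameter names and lowercases `names` once, before the generator expression runs.

```python
def name_check(names, n_val):
    # Lowercase once, outside the generator expression.
    lower_names = names.lower()
    # any() stops at the first match, like the early return in the loop version.
    return any(item in lower_names for item in n_val)


print(name_check("Alice Bob", ["bob", "carol"]))  # True
print(name_check("Alice Bob", ["dave"]))          # False
```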

Pythonic way to add to a set and care about if it worked?

Oftentimes I find that, when working with Python sets, the Pythonic way seems to be absent.
For example, doing something like a dijkstra or a*:
openSet, closedSet = set(nodes), set(nodes)
while openSet:
    walkSet, openSet = openSet, set()
    for node in walkSet:
        for dest in node.destinations():
            if dest.weight() < constraint:
                if dest not in closedSet:
                    closedSet.add(dest)
                    openSet.add(dest)
This is a weakly contrived example, the focus is the last three lines:
if not value in someSet:
    someSet.add(value)
    doAdditionalThings()
Given the Python way of working with, for example, accessing/using values of a dict, I would have expected to be able to do:
try:
    someSet.add(value)
except KeyError:
    continue # well, that's ok then.
doAdditionalThings()
As a C++ programmer, my skin crawls a bit that I can't even do:
if someSet.add(value):
    # add wasn't blocked by the value already being present
    doAdditionalThings()
Is there a more Pythonic (and if possible more efficient) way to work with this sort of set-as-guard usage?
The add operation is not supposed to also tell you if the item was already in the set; it just makes sure it is in there after you add it. Or put another way, what you want is not "add an item and check if it worked"; you want to first check if the item is there, and if not, then do some special stuff. If all you wanted to do was add the item, you wouldn't do the check at all. There is nothing unpythonic about this pattern:
if item not in someSet:
    someSet.add(item)
    doStuff()
else:
    doOtherStuff()
It is true that the API could have been designed so that .add returned whether the item was already in there, but in my experience that's not a particularly common use case. Part of the point of sets is that you can freely add items without worrying about whether they were already in there (since adding an already-included item has no effect). Also, having .add return None is consistent with the general convention for Python builtin types that methods that mutate their arguments return None. It is really things like dict.setdefault (which gets an item but first adds it if it isn't there) that are the unusual case.
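If you find yourself writing this guard a lot and miss the C++-style return value, you could wrap the pattern in a small helper of your own. The name `add_if_new` below is made up; nothing like it exists on the built-in set type.

```python
def add_if_new(some_set, value):
    """Add value to some_set; return True only if it was not already present."""
    if value in some_set:
        return False
    some_set.add(value)
    return True


closed_set = {1, 2}
if add_if_new(closed_set, 3):
    print("3 was new")        # this branch runs
if not add_if_new(closed_set, 3):
    print("3 already seen")   # and so does this one
```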

Correct way to iterate twice over a list?

What is the correct way to perform multiple iteration over a container? From python documentation:
Iterator - A container object (such as a list) produces a fresh new
iterator each time you pass it to the iter() function or use it in a
for loop. Attempting this with an iterator will just return the same
exhausted iterator object used in the previous iteration pass, making
it appear like an empty container.
The intention of the protocol is that once an iterator’s next() method
raises StopIteration, it will continue to do so on subsequent calls.
Implementations that do not obey this property are deemed broken.
(This constraint was added in Python 2.3; in Python 2.2, various
iterators are broken according to this rule.)
If I have this code:
slist = [1,2,3,4]
rlist = reversed(slist)
list(rlist)
#[4,3,2,1]
tuple(rlist)
#()
What would be the easiest and most correct way to iterate over 'rlist' twice?
rlist = list(reversed(slist))
Then iterate as often as you want. This trick applies more generally; whenever you need to iterate over an iterator multiple times, turn it into a list. Here's a code snippet that I keep copy-pasting into different projects for exactly this purpose:
import collections.abc

def tosequence(it):
    """Turn iterable into a sequence, avoiding a copy if possible."""
    # collections.Sequence was removed in Python 3.10; use collections.abc
    if not isinstance(it, collections.abc.Sequence):
        it = list(it)
    return it
(Sequence is the abstract type of lists, tuples and many custom list-like objects.)
I wouldn't store the list twice. If you can't combine the work into a single iteration, then I would do:
slist = [1,2,3,4]
for element in reversed(slist):
    print(element) # do first iteration stuff
for element in reversed(slist):
    print(element) # do second iteration stuff
Just think of the reversed() as setting up a reverse iterator on slist. The reversed is cheap. That being said, if you only ever need it reversed, I would reverse it and just have it stored like that.
What is the correct way to perform multiple iteration over a container?
Just do it twice in a row. No problem.
What would be the easiest and most correct way to iterate over 'rlist' twice?
See, the reason that isn't working for you is that rlist isn't "a container".
Notice how
list(slist) # another copy of the list
tuple(slist) # still works!
So, the simple solution is to just ensure you have an actual container of items if you need to iterate multiple times:
rlist = list(reversed(slist)) # we store the result of the first iteration
# and then that result can be iterated over multiple times.
If you really must not store the items, try itertools.tee. But note that you won't really avoid storing the items if you need to complete one full iteration before starting the next. In the general case, storage is really unavoidable under those restrictions.
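For reference, a small sketch of what the tee approach looks like with the list from the question. Note that after calling `tee` you should only use the returned iterators, not the original one.

```python
import itertools

slist = [1, 2, 3, 4]

# tee() splits one iterator into two independent ones. Items consumed by one
# but not yet by the other are buffered internally.
first, second = itertools.tee(reversed(slist))

print(list(first))    # [4, 3, 2, 1]
print(tuple(second))  # (4, 3, 2, 1)
```

Here the first iterator is exhausted before the second one starts, so tee ends up buffering every item, which is exactly the storage caveat described above.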
Why don't you simply reverse the original list in-place (slist.reverse()), then iterate over it as many times as you wish, and finally reverse it again to obtain the original list once again?
If this doesn't work for you, the best solution for iterating over the list in reversed order is to create a new reverse iterator every time you need to iterate
for _ in range(as_many_times_as_i_wish_to_iterate_this_list_in_reverse_order):
    for x in reversed(slist):
        do_stuff(x)
