Generator not closing over data as expected - python

Sorry if the title is poorly worded, I'm not sure how to phrase it. I have a function that basically iterates over the 2nd dimension of a 2 dimensional iterable. The following is a simple reproduction:
words = ['ACGT', 'TCGA']
def make_lists():
    for i in range(len(words[0])):
        iter_ = iter([word[i] for word in words])
        yield iter_
lists = list(make_lists())
for list_ in lists:
    print(list(list_))
Running this outputs:
['A', 'T']
['C', 'C']
['G', 'G']
['T', 'A']
I would like to yield generators instead of having to evaluate words, in case words is very long, so I tried the following:
words = ['ACGT', 'TCGA']
def make_iterator():
    for i in range(len(words[0])):
        gen = (word[i] for word in words)
        yield gen
generators = list(make_iterator())
for gen in generators:
    print(list(gen))
However, running outputs:
['T', 'A']
['T', 'A']
['T', 'A']
['T', 'A']
I'm not sure exactly what's happening. I suspect it has something to do with the generator comprehension not closing over its scope when yielded, so they're all sharing. If I create the generators inside a separate function and yield the return from that function it seems to work.

i is a free variable for those generators, and they all use its last value, i.e. 3, because none of them runs until the loop has finished. In simple words, each generator knows where it is supposed to fetch the value of i from, but is not aware of the value i had when the generator was created. So, something like this:
def make_iterator():
    for i in range(len(words[0])):
        gen = (word[i] for word in words)
        yield gen
    i = 0 # Modified the value of i
will result in:
['A', 'T']
['A', 'T']
['A', 'T']
['A', 'T']
Generator expressions are implemented as a function scope and are lazy: the body does not run until the generator is consumed, and only then is i looked up. A list comprehension, on the other hand, runs right away and can fetch the value of i during that iteration itself. (List comprehensions are also implemented as a function scope in Python 3, but the difference is that they are not lazy.)
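The lazy versus eager difference can be reproduced in a minimal sketch. The function name build and the values used here are made up for illustration and are not part of the question:

```python
def build(eager):
    """Create three sequences inside a loop, then read them after the loop ends."""
    made = []
    for i in range(3):
        if eager:
            # List comprehension: evaluated immediately, with the current i.
            made.append([i * 10 for _ in range(2)])
        else:
            # Generator expression: the body runs later and looks i up then.
            made.append((i * 10 for _ in range(2)))
    # The loop has finished here, so i is 2 for every pending generator.
    return [list(seq) for seq in made]


print(build(eager=True))   # [[0, 0], [10, 10], [20, 20]]
print(build(eager=False))  # [[20, 20], [20, 20], [20, 20]]
```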
A fix is to use an inner function that captures the actual value of i in each loop iteration using a default argument value:
words = ['ACGT', 'TCGA']
def make_iterator():
    for i in range(len(words[0])):
        # default argument value is calculated at the time of
        # function creation, hence for each generator it is going
        # to be the value at the time of that particular iteration
        def inner(i=i):
            return (word[i] for word in words)
        yield inner()
generators = list(make_iterator())
for gen in generators:
    print(list(gen))
You may also want to read:
What do (lambda) function closures capture?
Python internals: Symbol tables, part 1
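The asker's observation that a separate function "seems to work" can be sketched as follows. The helper name column is hypothetical; the point is only that i becomes a parameter, so each call gets its own binding even though the body still runs lazily:

```python
words = ['ACGT', 'TCGA']


def column(words, i):
    # i is bound when column() is called, not when the generator is consumed.
    for word in words:
        yield word[i]


def make_iterator():
    for i in range(len(words[0])):
        yield column(words, i)


# Materialise all generators first, then consume them after the loop is done.
generators = list(make_iterator())
columns = [list(gen) for gen in generators]
print(columns)  # [['A', 'T'], ['C', 'C'], ['G', 'G'], ['T', 'A']]
```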

Related

Append multiple items to a list on a for loop in python

I have a nested Python for loop and need to append two values on each iteration. Is the code below PEP8 valid? Or is there a better, more Pythonic way to write the function?
def function():
    empty_list = []
    my_list = ['a', 'b', 'c']
    for letter_1 in my_list:
        for letter_2 in my_list:
            empty_list.append(letter_1)
            empty_list.append(letter_2)
    return empty_list
Your code is right and PEP8 compliant. I would remove my_list from the function body and make it a parameter of the function. I would suggest using list.extend() to perform the operation you need in one line. To make it a bit more Pythonic, I would add type hints and a docstring. The code would look like this:
from typing import List
def function(my_list: List) -> List:
    """Function's docstring.
    Args:
        my_list (List): List of characters.
    Returns:
        List: Processed list of characters.
    """
    empty_list = []
    for a in my_list:
        for b in my_list:
            empty_list.extend((a, b))
    return empty_list
I don't know which IDE you use, but in Visual Studio Code you can download extensions that generate docstrings automatically from your functions' and classes' signatures and type hints. There are also extensions that automatically lint Python code for PEP8 compliance.
I would also add a small test to make sure my function works as expected. Something like this:
assert function(['a', 'b', 'c']) == ['a', 'a', 'a', 'b', 'a', 'c',
'b', 'a', 'b', 'b', 'b', 'c', 'c', 'a', 'c', 'b', 'c', 'c']
Assuming that your desired output is:
['a','a','a','b','a','c', # 'a' paired with each letter_2
 'b','a','b','b','b','c', # 'b' paired with each letter_2
 'c','a','c','b','c','c'] # 'c' paired with each letter_2
An alternative (more "pythonic"?) way to write your function is to use the itertools library:
from itertools import chain, product
def alt_function(my_list=('a', 'b', 'c')):
    # product yields every (letter_1, letter_2) pair; chain flattens the pairs
    return list(chain.from_iterable(product(my_list, repeat=2)))
alt_function()
Output
['a','a','a','b','a','c',
'b','a','b','b','b','c',
'c','a','c','b','c','c']
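For comparison, the same result can also be sketched with a single nested list comprehension and no imports. The name pairs_flat is made up for this illustration:

```python
def pairs_flat(my_list):
    # The clauses read left to right like nested for loops:
    # for a ... for b ... for x in (a, b): emit x
    return [x for a in my_list for b in my_list for x in (a, b)]


print(pairs_flat(['a', 'b', 'c']))
```

Whether this reads better than the explicit loops is a matter of taste; with three for clauses it is already close to the limit of what most readers parse comfortably.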

Understanding the syntax of list comprehensions

I don't understand the syntax for list comprehension:
newList = [expression(element) for element in oldList if condition]
The bit I don't understand is (element). Let's say you had a following code:
List = [character for character in 'Hello world!']
print(List)
And then you will get:
['H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '!']
Since the first character isn't quite an expression, what is it doing? Does it just mean that each item in the string is getting stored in a new list?
A Python list comprehension is a for loop written inside brackets to generate a new list. It can look as if it is evaluated backward, because the expression is written first, but the for clause drives the evaluation: for each item the loop produces, the expression at the front is evaluated and its result is stored in the new list. Another thing to note is that a string is an iterable of characters (just as lists, tuples, sets, dictionaries and NumPy arrays are iterables of their items), so it can be iterated over like a list.
List comprehension form:
new_list = [item for item in my_list]
This will have the same effect:
new_list = []
for item in my_list:
    new_list.append(item)
Since a string is an iterable of characters, you can do this:
my_list = [character for character in 'Hello world!']
print(my_list)
Output:
['H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '!']
Your list comprehension can also be written as:
my_list = []
for character in 'Hello world!':
    my_list.append(character)
print(my_list)
I would also point out that you shouldn't use the names of built-ins (such as list) as variable names, because doing so overrides them, which bars you from using that built-in later in the same scope.
Here is a complete list of all built-ins as of Python 3.9.6:
Let's understand the syntax for Python List Comprehensions using a few examples. Reference to complete documentation.
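Basic usage: [<expression> for <item> in <iterable>]. In Python, any object that implements the __iter__() method (or supports indexing through __getitem__()) is an iterable; __next__() is what the iterator returned by iter() implements.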
For example, List objects are iterables. Consider the following code:
list_a = [1, 2, 3]
list_b = [item for item in list_a] # [1, 2, 3]
list_c = [item + 1 for item in list_a] # [2, 3, 4]
list_d = [None for item in list_a] # [None, None, None]
Interestingly, a String object is also iterable. So when you iterate over a string, each item will be a character. This was the case in your example.
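A short sketch of what iterating over a string means at the protocol level, using only the built-in iter() and next(). The sample string is arbitrary:

```python
it = iter('Hi!')   # ask the string (an iterable) for an iterator
first = next(it)   # the iterator hands out one character at a time
rest = list(it)    # consume whatever is left

print(first)  # H
print(rest)   # ['i', '!']
```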
Now, coming to expressions. In Python, anything that evaluates to a value is an expression: a literal (integer, float, string, etc.), a name, a function call, or any combination of these with operators. Just to clarify, an assignment statement (with an equals symbol) is not an expression. This is an excellent answer to the question, "What is an expression in Python?".
I noticed that you had used (also pointed out in comments), list as the name of a variable. There are some keywords and names you should not use as variable names in Python. You can find the list of all such builtin names by: (refer this post for more details)
import builtins
dir(builtins)
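To tie this back to the general form in the question, [expression(element) for element in oldList if condition], here is a minimal sketch with an equivalent explicit loop. The data and the names old_list, new_list and loop_list are made up for illustration:

```python
old_list = [1, 2, 3, 4, 5, 6]

# expression: n * n, element: n, condition: n is even
new_list = [n * n for n in old_list if n % 2 == 0]

# The same thing written as an ordinary loop.
loop_list = []
for n in old_list:
    if n % 2 == 0:
        loop_list.append(n * n)

print(new_list)               # [4, 16, 36]
print(new_list == loop_list)  # True
```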

recursive function for extract elements from deep nested lists/tuples

I want to write a function that extracts elements from deep nested tuples and lists, say I have something like this
l = ('THIS', [('THAT', ['a', 'b']), 'c', ('THAT', ['d', 'e', 'f'])])
And I want a flat list without 'THIS' and 'THAT':
list = ['a', 'b', 'c', 'd', 'e', 'f']
Here's what I have so far:
def extract(List):
    global terms
    terms = []
    for i in List:
        if type(i) is not str:
            extract(i)
        else:
            if i is not "THIS" and i is not "THAT":
                terms.append(i)
    return terms
But I keep getting list = ['d', 'e', 'f'], it looks like the terms = [] is set again after looping to 'c'.
You're doing terms = [] at the top of the function, so of course every time you recursively call the function, you're doing that terms=[] again.
The quickest solution is to write a simple wrapper:
def _extract(List):
    global terms
    for i in List:
        if type(i) is not str:
            _extract(i)
        else:
            if i is not "THIS" and i is not "THAT":
                terms.append(i)
    return terms
def extract(List):
    global terms
    terms = []
    return _extract(List)
One more thing: You shouldn't use is to test for string equality (except in very, very special cases). That tests that they're the same string object in memory. It will happen to work here, at least in CPython (because both "THIS" strings are constants in the same module—and even if they weren't, they'd get interned)—but that's not something you want to rely on. Use ==, which tests that they both mean the same string, whether or not they're actually the identical object.
Testing types for identity is useful a little more often, but still not usually what you want. In fact, you usually don't even want to test types for equality. You don't often have subclasses of str—but if you did, you'd probably want to treat them as str (since that's the whole point of subtyping). And this is even more important for types that you do subclass from more often.
If you don't completely understand all of that, the simple guideline is to just never use is unless you know you have a good reason to.
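The difference between identity and equality can be sketched as below. Lists are used for the identity check because two separately built lists are guaranteed to be distinct objects; for strings the result of is depends on interning, so it is deliberately not asserted here. All variable names are illustrative:

```python
x = [1, 2]
y = [1, 2]
print(x == y)  # True: same contents
print(x is y)  # False: two separate objects

literal = 'THIS'
built = ''.join(['TH', 'IS'])  # same text, assembled at runtime
print(built == literal)        # True: equality compares the characters
# Whether "built is literal" is True is an implementation detail,
# which is exactly why it should not be used to compare strings.
```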
So, change this:
if i is not "THIS" and i is not "THAT":
… to this:
if i != "THIS" and i != "THAT":
Or, maybe even better (definitely better if you had, say, four strings to check instead of two), use a set membership test instead of anding together multiple tests:
if i not in {"THIS", "THAT"}:
And likewise, change this:
if type(i) is not str:
… to this:
if not isinstance(i, str):
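A small sketch of why isinstance is usually the better test. The subclass Tag is hypothetical and exists only to show the difference:

```python
class Tag(str):
    """Hypothetical str subclass, used only for this illustration."""


t = Tag('THIS')
print(type(t) is str)      # False: the exact type is Tag
print(isinstance(t, str))  # True: a Tag is still a str
print(t == 'THIS')         # True: equality works as for any str
```

With the type(i) is not str check, a Tag would be treated as a nested container and recursed into character by character, which is probably not what was intended.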
But while we're being all functional here, why not use a closure to eliminate the global?
def extract(List):
    terms = []
    def _extract(List):
        # terms is only mutated, never rebound, so nonlocal is optional here
        nonlocal terms
        for i in List:
            if not isinstance(i, str):
                _extract(i)
            else:
                if i not in {"THIS", "THAT"}:
                    terms.append(i)
        return terms
    return _extract(List)
This isn't the way I'd solve this problem (wim's answer is probably what I'd do if given this spec and told to solve it with recursion), but this has the virtue of preserving the spirit of (and most of the implementation of) your existing design.
It will be good to separate the concerns of "flattening" and "filtering". Decoupled code is easier to write and easier to test. So let's first write a "flattener" using recursion:
from collections.abc import Iterable  # importing it from collections fails on Python 3.10+
def flatten(collection):
    for x in collection:
        if isinstance(x, Iterable) and not isinstance(x, str):
            yield from flatten(x)
        else:
            yield x
Then extract and blacklist:
def extract(data, exclude=()):
    yield from (x for x in flatten(data) if x not in exclude)
L = ('THIS', [('THAT', ['a', 'b']), 'c', ('THAT', ['d', 'e', 'f'])])
print(*extract(L, exclude={'THIS', 'THAT'}))
Assuming that the first element of each tuple can be disregarded, and we should recurse with list that is the second element, we can do this:
def extract(node):
    if isinstance(node, tuple):
        return extract(node[1])
    if isinstance(node, list):
        return [item for sublist in [extract(elem) for elem in node] for item in sublist]
    return [node]  # wrap a leaf so the caller can always iterate over a list
The list comprehension is a little dense; here's the same with loops:
def extract(node):
    if isinstance(node, tuple):
        return extract(node[1])
    if isinstance(node, list):
        result = []
        for elem in node:
            for item in extract(elem):
                result.append(item)
        return result
    return [node]  # wrap a leaf so multi-character strings stay whole
This recursive function should do the trick, using the list.extend() method.
def func(lst):
    new_lst = []
    for i in lst:
        if i != 'THAT' and i != 'THIS':
            if isinstance(i, (list, tuple)):
                new_lst.extend(func(i))
            else:
                new_lst.append(i)
    return new_lst
l = ('THIS', [('THAT', ['a', 'b']), 'c', ('THAT', ['dk', 'e', 'f'])])
print(func(l))
['a', 'b', 'c', 'dk', 'e', 'f']
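All the answers above recurse. For very deep nesting, a genuinely iterative variant with an explicit stack of iterators avoids Python's recursion limit. This is only a sketch: the name extract_iterative and the exclude parameter are assumptions made for this example, not something from the answers.

```python
def extract_iterative(data, exclude=('THIS', 'THAT')):
    """Flatten nested lists/tuples without recursion, skipping excluded strings."""
    result = []
    stack = [iter([data])]           # stack of iterators still being walked
    while stack:
        for item in stack[-1]:
            if isinstance(item, (list, tuple)):
                stack.append(iter(item))  # descend; resume the parent later
                break
            if item not in exclude:
                result.append(item)
        else:
            stack.pop()              # top iterator is exhausted
    return result


data = ('THIS', [('THAT', ['a', 'b']), 'c', ('THAT', ['dk', 'e', 'f'])])
print(extract_iterative(data))  # ['a', 'b', 'c', 'dk', 'e', 'f']
```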

making calls to iter and next when iterating through a generator

I am writing a function that takes an iterable, an int n, and a padding value to be yielded at the end if the iterable has fewer than n values. The function works completely for arguments that are not generators. For a generator, indexing raises a TypeError, so I handle the generator in that except block. The problem is that I am able to yield all the values inside the generator, but I have not been able to figure out a way to add the padding at the end, because the outer for loop interferes. I need to implement this by making calls to iter and next, which I have been playing around with, but it has not been working. Here is the function:
def n_with_pad(iterable,n,pad=None):
    for i in range(n):
        try:
            yield iterable[i]
        except IndexError:
            yield pad
        except TypeError:
            for i in iterable:
                yield i
So if I were to call this function as follows:
for i in n_with_pad('function',3):
    print(i,end=' ')
it would print: 'f' 'u' 'n'
With the pad, an iterable that has fewer than n values should print as follows:
for i in n_with_pad('abcdefg',10,'?'):
    print(i,end=' ')
'a', 'b', 'c', 'd', 'e', 'f', 'g', '?', '?' and '?'
For the second call I am able to get up to
'a', 'b', 'c', 'd', 'e', 'f', 'g'
with the code I have so far, but cannot seem to add the ??? to reach n values.
I see no benefit to trying the __getitem__ approach and falling back to the iterator protocol. Just use the iterable, that's even the name of the variable!
def n_with_pad(iterable,n,pad=None):
    it = iter(iterable)
    for _ in range(n):
        yield next(it,pad)
demo:
''.join(n_with_pad('function',3,pad='?'))
Out[6]: 'fun'
''.join(n_with_pad('function',10,pad='?'))
Out[7]: 'function??'
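If the requirement to call iter and next directly is relaxed, the same behaviour can be sketched with itertools. The name n_with_pad_islice is made up to keep it apart from the answer's function:

```python
from itertools import chain, islice, repeat


def n_with_pad_islice(iterable, n, pad=None):
    # Append an endless supply of pad values, then cut off after n items.
    return islice(chain(iterable, repeat(pad)), n)


print(''.join(n_with_pad_islice('function', 3, pad='?')))   # fun
print(''.join(n_with_pad_islice('function', 10, pad='?')))  # function??
# Works for generators too, since nothing is indexed.
print(list(n_with_pad_islice((c for c in 'ab'), 4, pad='?')))  # ['a', 'b', '?', '?']
```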

looking for unique elements in chemical species

Determining unique elements: Write a function which, when given a list of species,
will return an alphabetized list of the unique elements contained in the set of species.
Make use of the parser from the previous step. Example: calling your function with
an input of ['CO', 'H2O', 'CO2', 'CH4'] should return an output of ['C', 'H',
'O']
This is part of a larger project that I am doing.
The problem I am having is how to look at the individual characters of each element. Once I have this, I should be able to check whether it's unique or not. I know this is not right; it's just a rough idea of what I am thinking.
def unique_elements(x):
    if x in y:
        pass
    else:
        y.append(x)
    return y
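>>> import re, string
>>> from itertools import chain
>>> def sanitize(compound):
...     # Python 3 form of translate(None, string.digits): drop every digit
...     return compound.translate(str.maketrans('', '', string.digits))
>>> def elementize(compound):
...     return re.findall("([A-Z][a-z]*)", compound)
>>> sorted(set(chain(*(elementize(sanitize(s)) for s in species))))
['Au', 'C', 'H', 'O']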
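Wrapped up as the function the exercise asks for, the idea might look like the sketch below. The parser from the previous step is not shown in the question, so a regular expression stands in for it here; that substitution is an assumption, and it only handles plain element symbols (an uppercase letter followed by optional lowercase letters).

```python
import re


def unique_elements(species):
    """Return an alphabetized list of the element symbols found in species."""
    elements = set()
    for formula in species:
        # Stand-in for the real parser: digits are simply never matched.
        elements.update(re.findall(r'[A-Z][a-z]*', formula))
    return sorted(elements)


print(unique_elements(['CO', 'H2O', 'CO2', 'CH4']))  # ['C', 'H', 'O']
```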
