multi line v single line for-loop different results - python

This is an exercise on Kaggle/Python/Strings and Dictionaries. I wasn't able to solve it so I peeked at the solution and tried to write it in a way I would do it (i.e. not necessarily as sophisticated but in a way I understood). I use Python tutor to visualise what's going on behind the code and understand most things but the for-loop is getting me.
normalised = (token.strip(",.").lower() for token in tokens)
This works and gives me index [0],
but if I rewrite it as:
for token in tokens:
    normalised = token.strip(",.").lower()
it doesn't work; it gives me indices [0, 2] (presumably because casino is in casinoville). Can someone write the multi-line equivalent: for token in tokens:...?
The code is below for a bit more context.
def word_search(doc_list, keyword):
    """
    Takes a list of documents (each document is a string) and a keyword.
    Returns list of the index values into the original list for all documents
    containing the keyword.
    Example:
    doc_list = ["The Learn Python Challenge Casino.", "They bought a car", "Casinoville"]
    >>> word_search(doc_list, 'casino')
    >>> [0]
    """
    indices = []
    counter = 0
    for doc in doc_list:
        tokens = doc.split()
        normalised = (token.strip(",.").lower() for token in tokens)  # the line in question
        if keyword.lower() in normalised:
            indices.append(counter)
        counter += 1
    return indices
#Test - output should be [0]
doc_list = ["The Learn Python Challenge Casino.", "They bought a car", "Casinoville"]
keyword = 'Casino'
print(word_search(doc_list,keyword))

normalised = (token.strip(",.").lower() for token in tokens) creates a generator (via a generator expression), not a tuple. Let's explore this:
>>> a = [1,2,3]
>>> [x**2 for x in a]
[1, 4, 9]
This is a list comprehension. The multi-line equivalent is:
>>> a = [1,2,3]
>>> b = []
>>> for x in a:
...     b.append(x**2)
...
>>> print(b)
[1, 4, 9]
Using parentheses instead of square brackets does not return a tuple (as one might suspect naively, as I did earlier), but a generator:
>>> a = [1,2,3]
>>> (x**2 for x in a)
<generator object <genexpr> at 0x0000024BD6E33B48>
We can iterate over this object with next:
>>> a = [1,2,3]
>>> b = (x**2 for x in a)
>>> next(b)
1
>>> next(b)
4
>>> next(b)
9
>>> next(b)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
The same thing can be written as a multi-line generator function like this:
>>> a = [1,2,3]
>>> def my_iterator(x):
...     for k in x:
...         yield k**2
...
>>> b = my_iterator(a)
>>> next(b)
1
>>> next(b)
4
>>> next(b)
9
>>> next(b)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
In the original example, an in membership test is used. This works for both the list and the generator, but for the generator it only works once, because the test consumes items until it finds a match (or until the generator is exhausted):
>>> a = [1,2,3]
>>> b = [x**2 for x in a]
>>> 9 in b
True
>>> 5 in b
False
>>> b = (x**2 for x in a)
>>> 9 in b
True
>>> 9 in b
False
Here is a discussion of the issue with generator reset: Resetting generator object in Python
I hope that clarified the differences between list comprehensions, generators and multi-line loops.
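To answer the question as asked, here is a sketch of a multi-line equivalent, assuming the goal is to collect every normalised token before the membership test. In the loop from the question, normalised is rebound on every pass, so after the loop it holds only the last token as a string, and in on a string is a substring test; that is presumably why 'casino' matched 'casinoville'. The name word_search_loop is made up for this illustration.

```python
def word_search_loop(doc_list, keyword):
    """Return indices of documents that contain keyword as a whole word."""
    indices = []
    for index, doc in enumerate(doc_list):
        normalised = []  # collect every token, not just the last one
        for token in doc.split():
            normalised.append(token.strip(",.").lower())
        # A membership test on a list compares whole elements,
        # so 'casino' does not match 'casinoville'.
        if keyword.lower() in normalised:
            indices.append(index)
    return indices


doc_list = ["The Learn Python Challenge Casino.", "They bought a car", "Casinoville"]
print(word_search_loop(doc_list, "Casino"))  # [0]
```

The inner loop builds a list rather than a generator, so it can be tested with in more than once.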

Related

what is the difference and uses of these lines of code?

first line of code:
for i in list:
    print(i)
second line of code:
print(i for i in list)
what would I use each of them for?
You can see for yourself what the difference is.
The first one iterates over range and then prints integers.
>>> for i in range(4):
...     print(i)
...
0
1
2
3
The second one is a generator expression.
>>> print(i for i in range(4))
<generator object <genexpr> at 0x10b6c20f0>
How iteration works in a generator: Python generators are a simple way of creating iterators.
Simply speaking, a generator function (or a generator expression) returns an iterator object that we can iterate over one value at a time.
>>> g=(i for i in range(4))
>>> print(g)
<generator object <genexpr> at 0x100f015d0>
>>> print(next(g))
0
>>>
>>> print(next(g))
1
>>> print(next(g))
2
>>> print(next(g))
3
>>> print(next(g))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> g=(i for i in range(4))
>>> for i in g:
...     print(i)
...
0
1
2
3
>>> for i in g:
...     print(i)
...
>>>
In Python 3, you can use iterable unpacking (the * operator) to print the generator's values, if that's what you were going for.
>>> print(*(i for i in range(4)))
0 1 2 3
The first code snippet will iterate over your list and print the value of i for each pass through the loop. In most cases you will want to use something like this to print the values in a list:
my_list = list(range(5))
for i in my_list:
    print(i)
0
1
2
3
4
The second snippet evaluates the argument passed to print and prints the result. Since that argument, i for i in my_list, is a generator expression, the string representation of the generator object is output. I cannot think of any real-world case where that is the result you would want.
my_list = list(range(5))
print(i for i in my_list)
<generator object <genexpr> at 0x0E9EB2F0>
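If the intent was to see the values rather than the generator object, the generator has to be consumed first. A small sketch of two common ways to do that; the names as_list and as_text are only illustrative:

```python
my_list = list(range(5))

# Materialise the generator into a list to see its values.
as_list = list(i for i in my_list)
print(as_list)  # [0, 1, 2, 3, 4]

# Or join the values into one string, one value per line.
as_text = "\n".join(str(i) for i in my_list)
print(as_text)
```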
The first way is just a loop going through a list and printing the elements one by one:
l = [1, 2, 3]
for i in l:
    print(i)
output:
1
2
3
A list comprehension, which uses square brackets rather than the parentheses in the question's second snippet, builds a list that you can store, index and slice:
l = [i for i in l]
print( l ) #[1, 2, 3]
print( l[0] ) #1
print( l[1:] ) #[2, 3]
output:
[1, 2, 3]
1
[2, 3]
A comprehension is used for applying one operation to every element, e.g. turning all elements from str to int:
l = ['1', '2', '3']
l = [int(i) for i in l] #now the list is [1, 2, 3]
Loops are better for doing many things per element:
for i in range(4):
    #Code
    #more code
    #lots of more code
    pass
In response to the (since edited) answers that suggested otherwise:
the second one is NOT a list comprehension.
the two code snippets do NOT do the same thing.
>>> x = [1,2,3,4,5]
>>> print(i for i in x)
<generator object <genexpr> at 0x000002322FA1BA50>
The second one is printing a generator object because (i for i in x) is a generator. The first snippet simply prints the elements in the list one at a time.
BTW: don't use list as a variable name. It's the name of a built-in type in Python, so when you use it as a variable name, you shadow the built-in in that scope: the name list then refers to your object, and calls such as list(...) fail until the name is deleted or goes out of scope.
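A minimal sketch of what goes wrong when the name list is reused. The function shadowed_call is hypothetical and exists only to keep the reused name confined to one scope:

```python
def shadowed_call():
    list = [1, 2, 3]  # the local name 'list' now hides the built-in type
    try:
        return list("abc")  # tries to call the list object, not the type
    except TypeError as exc:
        return f"TypeError: {exc}"


print(shadowed_call())  # TypeError: 'list' object is not callable
print(list("abc"))      # ['a', 'b', 'c'] (the built-in is untouched outside the function)
```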
The second one is a generator expression. Writing it with square brackets instead of parentheses makes it a list comprehension, and printing that shows the elements. A list comprehension is usually somewhat faster than the equivalent loop with append, but it builds the whole list in memory; a generator expression produces its values lazily, one at a time, so it needs far less memory. Comprehensions are best kept to short expressions, generally one or two lines.
For more information, see this link on official python website - https://docs.python.org/3/tutorial/datastructures.html?highlight=list%20comprehensions
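A rough way to see the memory difference between the two forms, as a sketch only: sys.getsizeof measures just the container object, not the integers it refers to, and the names squares_list and squares_gen are made up here.

```python
import sys

numbers = range(100_000)

squares_list = [n * n for n in numbers]  # all values are built up front
squares_gen = (n * n for n in numbers)   # values are produced on demand

# The list object grows with its length; the generator object stays small.
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True

# Both yield the same values (this consumes the generator).
print(sum(squares_list) == sum(squares_gen))  # True
```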

Preventing a generator from yielding the same object twice

Assuming I have a generator yielding hashable values (str / int etc.) is there a way to prevent the generator from yielding the same value twice?
Obviously, I'm using a generator so I don't need to unpack all the values first so something like yield from set(some_generator) is not an option, since that will unpack the entire generator.
Example:
# Current result
for x in my_generator():
    print(x)
>>> 1
>>> 17
>>> 15
>>> 1 # <-- This shouldn't be here
>>> 15 # <-- This neither!
>>> 3
>>> ...
# Wanted result
for x in my_no_duplicate_generator():
    print(x)
>>> 1
>>> 17
>>> 15
>>> 3
>>> ...
What's the most Pythonic solution for this?
There is a unique_everseen recipe in the Python itertools module documentation that is roughly equivalent to @NikosOikou's answer.
The main drawback of these solutions is that they rely upon the hypothesis that elements of the iterable are hashable:
>>> L = [[1], [2,3], [1]]
>>> seen = set()
>>> for e in L: seen.add(e)
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
The more-itertools module refines the implementation to accept unhashable elements, and the docs give a tip on how to keep good speed in some cases (disclaimer: I'm the "author" of the tip).
You can check the source code.
You can try this:
def my_no_duplicate_generator(iterable):
    seen = set()
    for x in iterable:
        if x not in seen:
            yield x
            seen.add(x)
You can use it by passing your generator as an argument:
for x in my_no_duplicate_generator(my_generator()):
    print(x)
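The set-based generator fails on unhashable values, as noted earlier in this thread. Below is a sketch of one possible workaround: fall back to a list for values that cannot be hashed. This is not the more-itertools source, only an illustration of the idea, and unique_any is a made-up name.

```python
def unique_any(iterable):
    """Yield each value once, in first-seen order, hashable or not."""
    seen_hashable = set()
    seen_unhashable = []  # fallback with slower, linear lookups
    for item in iterable:
        try:
            if item in seen_hashable:
                continue
            seen_hashable.add(item)
        except TypeError:  # item is unhashable, e.g. a list
            if item in seen_unhashable:
                continue
            seen_unhashable.append(item)
        yield item


print(list(unique_any([1, 17, 15, 1, 15, 3])))  # [1, 17, 15, 3]
print(list(unique_any([[1], [2, 3], [1]])))     # [[1], [2, 3]]
```

Values are still yielded lazily, so the source generator is never unpacked in full up front. Lookups in the fallback list are linear, which can get slow when there are many unhashable values.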

Lambda as iterator in Python returns function object on first iteration

I have this code snippet which I can't understand:
>>> l = lambda: -4, 'c', 0
>>> i = iter(l)
>>> i
<tuple_iterator object at 0x00700CD0>
>>> next(i)
<function <lambda> at 0x0070A4F8>
>>> next(i)
'c'
>>> next(i)
0
>>> next(i)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>>
Why is it returning lambda object on first iteration, instead of -4?
I think you might have misunderstood what l is.
l is a tuple of 3 elements:
a lambda that returns -4
the string 'c'
the integer 0
When you create an iterator iterating through the tuple, of course the first call to next is going to give you a lambda!
Maybe you meant to call the lambda:
next(i)()
Or maybe you meant to declare l like this:
l = lambda: (-4, 'c', 0) # you might think you don't need the parentheses, but you do
A lambda that returns a tuple.
And then do iter(l()).
When you do this :
>>> l = lambda: -4, 'c', 0
l is actually a tuple containing first item as a lambda function, second item a string and third item an integer.
It is equivalent to the following :
>>> l = (lambda: -4, 'c', 0)
If you want to get access to the lambda function which returns -4, you should try this :
>>> i = iter(l)
>>> next(i)()
-4
But note that next(i)() works only with callable objects (lambdas, functions, etc.). If you use next(i)() on a string, Python will raise TypeError: 'str' object is not callable. So always check whether the item is callable, i.e.:
i = iter(l)
item = next(i)
if callable(item):
    print(item())
else:
    print(item)
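The difference between the two spellings can be checked directly. A short sketch; the name f is introduced here only for the parenthesised variant:

```python
l = lambda: -4, 'c', 0    # parsed as ((lambda: -4), 'c', 0): a tuple
f = lambda: (-4, 'c', 0)  # one lambda that returns a tuple

print(type(l).__name__, len(l))  # tuple 3
print(l[0]())                    # -4
print(list(iter(f())))           # [-4, 'c', 0]
```

The comma binds more loosely than lambda, so without the parentheses the lambda body ends at -4.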

Python - Printing Map Object Issue

I was playing with the map object and noticed that it didn't print if I do list() beforehand. When I viewed only the map beforehand, the printing worked. Why?
map returns an iterator and you can consume an iterator only once.
Example:
>>> a=map(int,[1,2,3])
>>> a
<map object at 0x1022ceeb8>
>>> list(a)
[1, 2, 3]
>>> next(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> list(a)
[]
Another example where I consume the first element and create a list with the rest
>>> a=map(int,[1,2,3])
>>> next(a)
1
>>> list(a)
[2, 3]
As per the answer from @newbie, this is happening because you are consuming the map iterator before you use it. (Here is another great answer on this topic from @LukaszRogalski)
Example 1:
w = [[1,5,7],[2,2,2,9],[1,2],[0]]
m = map(sum,w) # map iterator is generated
list(m) # map iterator is consumed here (output: [13,15,3,0])
for v in m:
    print(v) # there is nothing left in m, so there's nothing to print
Example 2:
w = [[1,5,7],[2,2,2,9],[1,2],[0]]
m = map(sum,w) #map iterator is generated
for v in m:
    print(v) #map iterator is consumed here
# if you try and print again, you won't get a result
for v in m:
    print(v) # there is nothing left in m, so there's nothing to print
So you have two options here. If you only want to iterate over the results once, Example 2 will work fine. However, if you want to keep using m as a list in your code, you need to amend Example 1 like so:
Example 1 (amended):
w = [[1,5,7],[2,2,2,9],[1,2],[0]]
m = map(sum,w) # map iterator is generated
m = list(m) # map iterator is consumed here, but it is converted to a reusable list.
for v in m:
    print(v) # now you are iterating a list, so you should have no issue iterating
             # and reiterating to your heart's content!
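If building a full list is undesirable (for example with very large inputs), another option is to create a fresh map iterator for each pass. A sketch, where sums is a hypothetical helper name:

```python
w = [[1, 5, 7], [2, 2, 2, 9], [1, 2], [0]]


def sums():
    """Return a fresh map iterator each time it is called."""
    return map(sum, w)


first_pass = [v for v in sums()]
second_pass = [v for v in sums()]  # a new iterator, so nothing is lost
print(first_pass)   # [13, 15, 3, 0]
print(second_pass)  # [13, 15, 3, 0]
```

This recomputes the sums on every pass, so it trades time for memory.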
It's because map returns an iterator, which can only be consumed once, just like a generator. A clearer example:
>>> gen=(i for i in (1,2,3))
>>> list(gen)
[1, 2, 3]
>>> for i in gen:
...     print(i)
...
>>>
Explanation:
Converting the iterator to a list loops through all of it. When you loop over it again afterwards, it resumes where it left off, but there are no more elements.
So the best thing to do is:
>>> M=list(map(sum,W))
>>> M
[13, 15, 3, 0]
>>> for i in M:
...     print(i)
...
13
15
3
0
You can either use this:
list(map(sum,W))
or this, which builds a set instead (so order and duplicates are not preserved):
{*map(sum,W)}

Python Store dynamic data

I don't know if the heading makes sense... but this is what I am trying to do using list
>>> x = 5
>>> l = [x]
>>> l
[5]
>>> x = 6
>>> l
[5] # I want l to automatically get updated and wish to see [6]
>>>
The same happens with dict, tuple. Is there a python object that can store the dynamic value of variable?
Thanks,
There's no way to get this to work due to how the assignment operator works in Python. x = WHATEVER will always rebind the local name x to WHATEVER, without modifying what x was previously bound to.(*)
You can work around this by replacing the integers with a container data type, such as single-element lists:
>>> x = [5]
>>> l = [x]
>>> l
[[5]]
>>> x[0] = 6
>>> l
[[6]]
but that's really a hack, and I wouldn't recommend it for anything but experimentation.
(*) Rebinding may actually modify previously bound objects when their reference count drops to zero, e.g. it may close files. You shouldn't rely on that, though.
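The single-element-list workaround can be made a little more readable with a small holder class. A sketch only: Ref is a made-up name, not a standard library type.

```python
class Ref:
    """Minimal mutable holder; containers store the Ref, not the value."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"Ref({self.value!r})"


x = Ref(5)
l = [x]
print(l)     # [Ref(5)]
x.value = 6  # mutate the shared object instead of rebinding x
print(l)     # [Ref(6)]
x = Ref(7)   # rebinding x breaks the link; l still holds the old Ref
print(l)     # [Ref(6)]
```

As with the list hack, this only works while every update goes through x.value; a plain x = ... still rebinds the name.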
A variable is a place to store data. A datastructure is a place to store data. Pick the one which meets your needs.
You can do it with the numpy module.
>>> from numpy import array
>>> a = array(5)
>>> a
array(5)
>>> l = [a]
>>> l
[array(5)]
>>> a[...] = 6  # in-place assignment; older NumPy releases also offered a.itemset(6)
>>> a
array(6)
>>> l
[array(6)]
Generally a 0-D numpy array can be treated as any regular value as shown below:
>>> a + 3
9
However, if you need to, you can access the underlying object as such:
>>> a.item()
6
Here's a kind of hacky method of dynamic access that isn't very extensible/flexible in its given form, but could be used as a basis for something better.
>>> a = 7
>>> class l:
...     def a_get(self):
...         global a
...         return a
...     def a_set(self, value):
...         global a
...         a = value
...     a = property(a_get, a_set)
...
>>> c = l()
>>> c.a
7
>>> a = 4
>>> c.a
4
>>> c.a = 6
>>> a
6
