I have 2 generators. One has a nested generator, and the other has a nested list comprehension.
# list of variables
variables = []
nestedGen = (x for x in (y for y in variables))
nestedList = (x for x in [y for y in variables])
Both generators can be simplified to remove the nesting, but are they functionally identical as written?
There's a difference if variables gets modified.
Reassigning variables won't do anything, because both versions retrieve variables up front. The iterable of the first for clause in a genexp is evaluated immediately, unlike the rest of the genexp. For nestedList, that means evaluating the list comprehension immediately. For nestedGen, that means evaluating (but not iterating over) the nested genexp (y for y in variables), and evaluating that genexp evaluates variables immediately.
Mutating the variables list, however, will affect nestedGen but not nestedList. nestedList immediately iterates over variables and builds a new list with the same contents, so adding or removing elements from variables won't affect nestedList. nestedGen doesn't do that, so adding or removing elements from variables will affect nestedGen.
For example, the following test:
variables = [1]
nestedGen = (x for x in (y for y in variables))
nestedList = (x for x in [y for y in variables])
variables.append(2)
variables = []
print(list(nestedGen))
print(list(nestedList))
prints
[1, 2]
[1]
The append affected only nestedGen. The variables = [] reassignment didn't affect either genexp.
Aside from that, there's also a difference in memory consumption, since nestedList builds a copy of variables with the list comprehension. That doesn't affect output, but it's still a practical consideration to keep in mind.
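To make that memory difference visible, here's a minimal sketch using tracemalloc; the size of variables and the exact byte counts are illustrative and will vary by interpreter and platform:
import tracemalloc

variables = list(range(1_000_000))

tracemalloc.start()
nestedGen = (x for x in (y for y in variables))
gen_bytes, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

tracemalloc.start()
nestedList = (x for x in [y for y in variables])
list_bytes, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(gen_bytes)   # a few hundred bytes: only the generator objects were created
print(list_bytes)  # several megabytes: the list comprehension already copied variables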
Yes, both expressions will produce a similar result. When the outer generator is enumerated, the inner sequence will be iterated and then disposed.
The difference is that, in the second case, a list object will have to be instantiated to hold the sequence of elements and then discarded. This is not a huge penalty, unless you are doing this a lot. The first option is preferred, in the general case.
**Important Note:**
This would NOT be equivalent if you decide to store the inner sequence in a variable!
variables = (y for y in "abdce")
nestedGenList = list(x for x in variables)
nestedGenList = list(x for x in variables)  # will be empty!
variables = [y for y in "abdce"]
nestedGenList = list(x for x in variables)
nestedGenList = list(x for x in variables)  # ok!
In the first case, the iterator will already be exhausted after the first use, so when you try to iterate over the variable again, it will be empty.
Related
I tested two snippets of code and found out that declaring a set ahead of using it in a list comprehension was much faster than declaring it inside the list comprehension. Why does this happen? (Using Python 3.9.13)
import time
# Setup
a = [x for x in range(10000)]
b = [x for x in range(8000)]
t = time.time()
b = set(b)
[x for x in a if x in b]
print(time.time() - t)
# 0.0010492801666259766
t = time.time()
[x for x in a if x in set(b)]
print(time.time() - t)
# 1.0515294075012207
I didn't expect there to be orders of magnitude of a difference...
The set(iterable) call returns a new set object. Inside the list comprehension you repeat that call on every iteration, whereas in the first case you build the set once, bind it to b, and run the list comprehension against that already-built object.
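If you want the fast version without rebinding b, a minimal sketch (reusing the question's own names) is to hoist the set construction out of the comprehension:
a = [x for x in range(10000)]
b = [x for x in range(8000)]

b_set = set(b)                          # built once, O(len(b))
result = [x for x in a if x in b_set]   # each membership test is O(1) on average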
I can't understand the documentation's explanation about evaluation.
What is the difference between normal and augmented assignments in this respect?
I know about the in-place behavior.
https://docs.python.org/3/reference/simple_stmts.html#grammar-token-augmented-assignment-stmt
An augmented assignment expression like x += 1 can be rewritten as x = x + 1 to achieve a similar, but not exactly equal effect. In the augmented version, x is only evaluated once. Also, when possible, the actual operation is performed in-place, meaning that rather than creating a new object and assigning that to the target, the old object is modified instead.
For a simple variable x, there is nothing to evaluate. The difference is seen when x is an expression. For example:
some_list[get_index()] = some_list[get_index()] + 1
some_list[get_index()] += 1
The first line will call get_index() twice, but the second one calls it once.
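A small runnable sketch of that difference; the counting get_index here is purely illustrative, not from the original post:
calls = 0

def get_index():
    global calls
    calls += 1
    return 0

some_list = [10]

some_list[get_index()] = some_list[get_index()] + 1
print(calls)  # 2: the subscript expression is evaluated twice

calls = 0
some_list[get_index()] += 1
print(calls)  # 1: the target is evaluated only once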
The second difference mentioned is when x is a mutable object like a list. In that case x can be mutated in place instead of creating a new object. So for example lst = lst + [1,2] has to copy the entire list to the new one, but lst += [1,2] will mutate the list itself which might be possible to do without having to copy it. It also affects whether other references to lst see the change:
lst = [1,2,3]
lst2 = lst
lst += [4] # Also affects lst2
lst = lst + [5] # lst2 is unchanged because lst is now a new object
I have a set of objects which are comparable in some way, and I want to remove objects from the set. I thought about how this problem changes for different comparability relations between the elements. I was interested in the growth of the search space, the use of memory, and how the problem scales.
1st scenario: In the simplest scenario the relation is bidirectional, so we could remove both elements, as long as we can ensure that removing them does not also remove their other 'partners'.
2nd scenario: The comparable relation is not bidirectional; remove only the element in question, not the one it is comparable to. A simplified scenario would be a set of integers with the comparable operation 'is divisible without remainder'.
I can do the following, which avoids removing elements during iteration:
a_set = set([4, 2, 3, 7, 9, 16])

def compare(x, y):
    if (x % y == 0) and not (x is y):
        return True
    return False

def list_solution():
    out = set()
    for x in a_set:
        for y in a_set:
            if compare(x, y):
                out.add(x)
                break
    result = a_set - out
    print(result)
Of course, my first question as a junior Python programmer would be:
What is the appropriate set comprehension for this?
Also: I cannot modify the size of a Python set during iteration without copying, right?
And now for the algo people out there: how does this problem change as the number of elements each element can be comparable to increases? How does it change if the comparable relation represents a partial order?
I will start by confirming your claim - changing a set while iterating over it triggers a RuntimeError that says something along the lines of "Set changed size during iteration".
Now, let's start with the compare function: since you are using sets, x is y probably behaves like x == y here, but the latter is always the better choice when possible.
Furthermore, there is no need for the if statement; you can return the condition directly:
def compare(x, y):
    return x != y and x % y == 0
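For a quick sanity check (values picked for illustration):
print(compare(4, 2))  # True: 4 is divisible by 2 and they are different elements
print(compare(4, 3))  # False: 4 % 3 != 0
print(compare(4, 4))  # False: an element is never compared with itself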
Now to the set comprehension - that's a messy one. After passing the set as an argument - which is better than using a global variable - the normal code would be something like
temp = []
for x in my_set:
    for y in my_set:
        if compare(x, y):
            for a in (x, y):
                temp.append(a)
notice the last two lines, which do not use unpacking because that would be impossible in the comprehension. Now, all that's left is to move the a to the front and make all the colons disappear - and the magic happens:
def list_solution(my_set):
    return my_set - {a for x in my_set for y in my_set if compare(x, y) for a in (x, y)}
you can test it with something like
my_set = set([4, 2, 3, 7, 9, 16])
print(list_solution(my_set)) # {7}
The condition and the iteration on (x, y) can switch places - but I believe that iterating after confirming would be faster than going in and starting to iterate when there's a chance you wouldn't perform any action.
The change for the second case is minor - merely using x instead of the x, y unpacking:
def list_solution_2(my_set):
    return my_set - {x for x in my_set for y in my_set if compare(x, y)}
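With the same sample set, only the elements that are divisible by some other element are removed:
my_set = {4, 2, 3, 7, 9, 16}
print(list_solution_2(my_set))  # {2, 3, 7} - 4, 9 and 16 are each divisible by another element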
Can someone please explain the meaning of the syntax behind the following line of code:
temp3 = [x for x in temp1 if x not in s]
I understand it's for finding the differences between 2 lists, but what does the 'x' represent here? Each individual element in the list that is being compared? I understand that temp1 and s are lists. Also, does x for x have to have the same variable or could it be x for y?
[x for x in temp1 if x not in s]
It may help to re-order it slightly, so you can read the whole thing left to right. Let's move the first x to the end.
[for x in temp1 if x not in s yield x]
I've added a fake yield keyword so it reads naturally as English. If we then add some colons it becomes even clearer.
[for x in temp1: if x not in s: yield x]
Really, this is the order that things get evaluated in. The x variable comes from the for loop; that's why you can refer to it in the if and yield clauses. But the way list comprehensions are written is to put the value being yielded at the front. So you end up using a variable name that's not yet defined.
In fact, this final rewrite is exactly how you'd write an explicit generator function.
def func(temp1, s):
    for x in temp1:
        if x not in s:
            yield x
If you call func(temp1, s) you get a generator equivalent to the list. You could turn it into that list with list(func(temp1, s)).
It iterates through each element in temp1 and checks to see if it is not in s before including it in temp3.
It is a shorter and more pythonic way of writing
temp3 = []
for item in temp1:
    if item not in s:
        temp3.append(item)
Where temp1 and s are the two lists you are comparing.
As for your second question, x for y will work, but probably not in the way you intend to, and certainly not in a very useful way. It will assign each item in temp1 to the variable name y, and then search for x in the scope outside of the list comprehension. Assuming x is defined previously (otherwise you will get NameError or something similar), the condition if x not in s will evaluate to the same thing for every item in temp1, which is why it’s not terribly useful. And if that condition is true, your resulting temp3 will be populated with xs; the y values are unused.
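A minimal sketch of that behaviour (the values here are made up purely for illustration):
x = 99
temp1 = [1, 2, 3]
s = [2]

# y is bound to each element of temp1, but both the condition and the result
# only look at the outer x, so you get one 99 per element of temp1.
print([x for y in temp1 if x not in s])  # [99, 99, 99]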
Do not take this as saying that using different variables in a list comprehension is never useful. In fact list comprehensions like [a if condition(x) else b for x in original_sequence] are often very useful. A list comprehension like [a for x in original_sequence if condition(x)] can also be useful for constructing a list containing exactly as many instances of a as the number of items in original_sequence that satisfy condition().
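Two quick sketches of those patterns (the condition and the values are made up for illustration):
original_sequence = [1, 2, 3, 4]

def condition(x):
    return x % 2 == 0

# Conditional expression: choose a value per element.
print(['even' if condition(x) else 'odd' for x in original_sequence])
# ['odd', 'even', 'odd', 'even']

# Filtering clause with a constant result: one 'even' per matching element.
print(['even' for x in original_sequence if condition(x)])
# ['even', 'even']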
Try yourself:
arr = [1,2,3]
[x+5 for x in arr]
This should give you [6, 7, 8], which are the values in the [1, 2, 3] list plus 5. This syntax is known as a list comprehension (a form of mapping). It applies the same expression to every element of a list. It would be the same as doing this:
result = []
for x in arr:
    result.append(x + 5)
x is the same variable; it is not y. It works the same as the code below:
newList = []
for x in temp1:
    if x not in s:
        newList.append(x)
So in x for x, the first x is the value being appended (the x inside append in the code above), and the x after for is the loop variable, the same as in for x in temp1.
I am looking for the proper term to describe this well-known property of collection objects, and more importantly, the way the stack diagram changes when variables are used to reference their elements:
>>> x = 5
>>> l = [x]
>>> x += 1
>>> l
[5]
>>> x
6
What is the name of what the list is doing to the variable x to prevent it from being bound to any changes to the original value of x? Shielding? Shared structure? List binding? Nothing is coming back from a Google search using those terms.
Here's an example with more detail (but doesn't have a definition, unfortunately).
Credit to ocw.mit.edu
What is the name of what the list is doing to the variable x to prevent it from being bound to any changes to the original value of x? Shielding? Shared structure? List binding? Nothing is coming back from a Google search using those terms.
Because the list isn't doing anything, and it isn't a property of collections.
In Python, variables are names.
>>> x = 5
This means: x shall be a name for the value 5.
>>> l = [x]
This means: l shall be a name for the value that results from taking the value that x names (5), and making a one-element list with that value ([5]).
>>> x += 1
x += 1 gets rewritten into x = x + 1 here, because integers are immutable. You cannot cause the value 5 to increase by 1, because then it would not be 5 any more.
Thus, this means: x shall stop being a name for what it currently names, and start being a name for the value that results from the mathematical expression x + 1. I.e., 6.
That's how it happens with reference semantics. There is no reason to expect the contents of the list to change.
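A quick way to see the rebinding is to compare object identities (the exact id values will differ per run):
x = 5
l = [x]
print(id(x) == id(l[0]))  # True: the list element and x currently name the same object

x += 1
print(x, l)               # 6 [5]
print(id(x) == id(l[0]))  # False: x now names a different object; l[0] still names 5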
Now, let's look at what happens with value semantics, in a hypothetical language that looks just like Python but treats variables the same way they are treated in C.
>>> x = 5
This now means: x is a label for a chunk of memory that holds a representation of the number 5.
>>> l = [x]
This now means: l is a label for a chunk of memory that holds some list structure (possibly including some pointers and such), which will be initialized in some way so that it represents a list with 1 element, that has the value 5 (copied from the x variable). It cannot be made to contain x logically, since that is a separate variable and we have value semantics; so we store a copy.
>>> x += 1
This now means: increment the number in the x variable; now it is 6. The list is, again, unaffected.
Regardless of your semantics, you cannot affect the list contents this way. Expecting the list contents to change means being inconsistent in your interpretations. (This becomes more obvious if you rewrite the code to l = [5]; x = l[0]; x += 1.)
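Here is that rewrite spelled out:
l = [5]
x = l[0]     # x names the same 5 object that the list holds
x += 1       # rebinds x to a new object, 6; the list still holds 5
print(x, l)  # 6 [5]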
I would call it "immutability" of the contained object.
I think you compare your situation with the following one:
x = []
l = [x]
x += [1]
print(l)  # --> [[1]]
The difference is:
In this situation (the mutable situation), you mutate your original object x which is contained in the list l.
In your situation however, you have x point to an immutable object (5), which is then added to the list. Afterwards, this reference is replaced by the one to 6, but only for x, not for the list.
So x += <something> either modifies x or replaces it with another object depending on the nature of the object type.
EDIT: It also has nothing to do with the nature of lists. You can achieve the same with two variables:
x = 5
y = x
print(x, y, x is y)
x += 1
print(x, y, x is y)
vs.
x = []
y = x
print(x, y, x is y)
x += [1]
print(x, y, x is y)
The first one will change x due to the immutability of ints, resulting in x is y being False, while in the second one x is y remains True, because the object (the list) is mutated and the identity of the object referenced by x and y stays the same.
The behavior you're describing has to do with references. If you're familiar with C, you're probably also familiar with "pointers." Pointers in C can get really complicated, and Python uses a data model that greatly simplifies things. But having a bit of background in C helps understand Python's behavior here, which is closely related to the behavior of C pointers. So "pointer," "reference," and "dereference" are all terms that are related to what you're talking about, although none of them is quite a "name" for it. Perhaps the best name for it is "indirection" -- though that's a bit too abstract and inclusive; this is a very specific kind of indirection. Perhaps "reference semantics"? Here's a slide from a talk by GvR himself that uses that term, and a Google search turns up a few useful hits.
But if you don't have any background in C, here's my best explanation. In short, you can think of a Python name as a pointer to an object. So when you assign a name to a value, you're "pointing" that name at the value. Then, when you assign a new value to the name, you point it at a new value; but the old value isn't changed at all as a result. This seems natural when thinking about names as pointers here; the "value of the name" is changed, but the "value of the value" is not.
The thing that's a bit confusing is that += behaves inconsistently. When you use += on a number, the result makes perfect sense using the above metaphor:
x = 5
y = x
x += 1
print(x, y)
# 6 5
The behavior is exactly the same as it would be if you did x = x + 1.
But sometimes += is overloaded in a way that does in-place mutation. It's a pragmatic, but slightly inconsistent, idiom:
x = [5]
y = x
x += [6]
print(x, y)
# [5, 6] [5, 6]
So here, the "value of the value" is changed. This is actually not my favorite thing about Python, but there are good reasons for it.
The only thing the list is doing to x in your example is reading it; it interacts with the variable in no other way. In fact, the list generated by the expression [x] does not interact with the variable x at all.
If you had to introduce a piece of jargon for it, it would be value-capture, or perhaps simply independence.
I think the reason there isn't a special term for this is that it is (a) not something that requires much discussion and (b) an aspect of strict by-value semantics where variables always hold references. Those kinds of semantics are pretty much the norm now (except where it is by reference, with variables actually naming bits of memory containing objects). I think the OP was expecting by-name or lazy semantics.
What you're effectively doing is constructing a list with one element that references the same object as x. Then you bind a new value to x while the list still references the old element. This is because += returns a reference to a new object (6) and leaves the old 5 object intact. Integers are immutable in Python.