I tested two snippets of code and found that declaring a set before using it in a list comprehension is much faster than constructing the set inside the list comprehension. Why does this happen? (Using Python 3.9.13.)
import time
# Setup
a = [x for x in range(10000)]
b = [x for x in range(8000)]
t = time.time()
b = set(b)
[x for x in a if x in b]
print(time.time() - t)
# 0.0010492801666259766
t = time.time()
[x for x in a if x in set(b)]
print(time.time() - t)
# 1.0515294075012207
I didn't expect there to be an orders-of-magnitude difference...
The set(iterable) call returns a set object, and inside the list comprehension that call is repeated on every iteration, so the set is rebuilt once for each element of a. In the first case the set is built once, bound to the variable b, and the comprehension only tests membership against that existing object.
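To make the repeated evaluation visible, here is a minimal sketch (the make_set helper is hypothetical, added only to count calls):
calls = 0

def make_set(seq):
    # Hypothetical wrapper around set() that counts how often it runs.
    global calls
    calls += 1
    return set(seq)

a = list(range(5))
b = list(range(3))

[x for x in a if x in make_set(b)]
print(calls)  # 5 -- the set is rebuilt for every element of a

calls = 0
s = make_set(b)
[x for x in a if x in s]
print(calls)  # 1 -- the set is built once, up front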
I have 2 generators. One has a nested generator, and the other has a nested list comprehension.
# list of variables
variables = []
nestedGen = (x for x in (y for y in variables))
nestedList = (x for x in [y for y in variables])
Both generators can be simplified to remove the nesting, but are they functionally identical as written?
There's a difference if variables gets modified.
Reassigning variables won't do anything, because both versions retrieve variables up front. The iterable of the first for clause in a genexp is evaluated immediately, unlike the rest of the genexp. For nestedList, that means evaluating the list comprehension immediately. For nestedGen, it means evaluating (but not iterating over) the nested genexp (y for y in variables), and evaluating that genexp evaluates variables immediately.
Mutating the variables list, however, will affect nestedGen but not nestedList. nestedList immediately iterates over variables and builds a new list with the same contents, so adding or removing elements from variables won't affect nestedList. nestedGen doesn't do that, so adding or removing elements from variables will affect nestedGen.
For example, the following test:
variables = [1]
nestedGen = (x for x in (y for y in variables))
nestedList = (x for x in [y for y in variables])
variables.append(2)
variables = []
print(list(nestedGen))
print(list(nestedList))
prints
[1, 2]
[1]
The append affected only nestedGen. The variables = [] reassignment didn't affect either genexp.
Aside from that, there's also a difference in memory consumption, since nestedList builds a copy of variables with the list comprehension. That doesn't affect output, but it's still a practical consideration to keep in mind.
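A rough way to see this copy is a sketch using sys.getsizeof, which reports only the container's own size, not that of its elements:
import sys

variables = list(range(100_000))
inner_gen = (y for y in variables)   # lazy: no copy of variables is made
inner_list = [y for y in variables]  # eager: a full copy of variables
print(sys.getsizeof(inner_gen))      # small constant size (a few hundred bytes)
print(sys.getsizeof(inner_list))     # grows with len(variables)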
Yes, both expressions will produce a similar result. When the outer generator is enumerated, the inner sequence will be iterated and then disposed.
The difference is that, in the second case, a list object will have to be instantiated to hold the sequence of elements and then discarded. This is not a huge penalty, unless you are doing this a lot. The first option is preferred, in the general case.
Important note:
This would NOT be equivalent if you decide to store the inner sequence in a variable!
variables = (y for y in "abdce")
nestedGenList = list(x for x in variables)
nestedGenList = list(x for x in variables) # will be empty!
variables = [y for y in "abdce"]
nestedGenList = list(x for x in variables)
nestedGenList = list(x for x in variables) # ok!
In the first case, the generator is already exhausted after the first pass, so iterating it again yields nothing.
Let's assume we have two lists containing unique values and want to find the values that are in both lists using list comprehension.
a = [1,3,5]
b = [3,5,7]
c = [x for x in a if x in b]
print(c)
[3,5]
Simple enough. Now, what if each list had 1 million elements that we wanted to compare? The list comprehension would continue to compare every element in list a to every element in list b, even after it has found '5' (from the example above) in both lists. Is that correct?
Would removing an element from the lists when it is found in both lists be more efficient to shorten the comparison time as it loops? Or is there something else I've probably missed?
for x in a[:]:  # iterate over a copy; removing from a while iterating over it would skip elements
    if x in b:
        c.append(x)
        a.remove(x)
        b.remove(x)
print(a)
[1]
print(b)
[7]
print(c)
[3,5]
Removing x from your list a in
for x in a[:]:
    if x in b:
        c.append(x)
        a.remove(x)
        b.remove(x)
adds extra time complexity for removing the items: each remove call is O(n), where n is the number of items in the list. You could write a second for loop instead, which would be a bit faster because you could break as soon as you find the element. However, the biggest performance gain comes from using a set, because of its O(1) lookup time. You can read about sets here:
https://www.w3schools.com/python/python_sets.asp
https://stackoverflow.com/questions/7351459/time-complexity-of-python-set-operations
I wrote a little code snippet for you where you can test and also see the performance difference:
I import the performance counter from the time library to measure the time.
from time import perf_counter
I generate two lists with unique elements:
a = [x for x in range(10000)]
b = [x * 2 for x in range(10000)]
I measure the time from your operation mentioned above:
start_list = perf_counter()
c = [x for x in a if x in b]
stop_list = perf_counter()
print(f"Calculating with list operations took {stop_list - start_list}s")
I measure the time via set operations:
start_set = perf_counter()
d = list(set(a) & set(b))
stop_set = perf_counter()
print(f"Calculating with set operations took {stop_set - start_set}s")
Just to make sure the two methods give the same result:
assert sorted(c) == sorted(d)  # sets are unordered, so compare sorted contents
Output:
Calculating with list operations took 0.796774061s
Calculating with set operations took 0.0013706330000000655s
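If you also need to keep the order of a (which the list comprehension preserves but the set intersection does not), a common variant, sketched here, is to build the set once and keep the comprehension:
bs = set(b)                    # one-time conversion, O(len(b))
c = [x for x in a if x in bs]  # O(1) membership test per element; order of a preserved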
I can't understand the documentation's explanation about evaluation.
What is the difference between normal and augmented assignment in this respect?
I already understand the in-place behavior.
https://docs.python.org/3/reference/simple_stmts.html#grammar-token-augmented-assignment-stmt
An augmented assignment expression like x += 1 can be rewritten as x = x + 1 to achieve a similar, but not exactly equal effect. In the augmented version, x is only evaluated once. Also, when possible, the actual operation is performed in-place, meaning that rather than creating a new object and assigning that to the target, the old object is modified instead.
For a simple variable x, there is nothing to evaluate. The difference is seen when x is an expression. For example:
some_list[get_index()] = some_list[get_index()] + 1
some_list[get_index()] += 1
The first line will call get_index() twice, but the second one calls it once.
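A minimal sketch (with a hypothetical get_index that counts its own calls) makes this observable:
calls = 0

def get_index():
    # Hypothetical helper: always returns index 0 and counts its calls.
    global calls
    calls += 1
    return 0

some_list = [10]
some_list[get_index()] = some_list[get_index()] + 1
print(calls)  # 2 -- the subscript expression is evaluated twice

calls = 0
some_list[get_index()] += 1
print(calls)  # 1 -- the target is evaluated only once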
The second difference mentioned is when x is a mutable object like a list. In that case x can be mutated in place instead of a new object being created. For example, lst = lst + [1,2] has to copy the entire list into a new one, but lst += [1,2] mutates the list itself, which can usually be done without copying the existing elements. It also affects whether other references to lst see the change:
lst = [1,2,3]
lst2 = lst
lst += [4] # Also affects lst2
lst = lst + [5] # lst2 is unchanged because lst is now a new object
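You can confirm that += keeps the same object while + creates a new one by checking id() (a small sketch):
lst = [1, 2]
before = id(lst)
lst += [3]
print(id(lst) == before)  # True -- same object, mutated in place
lst = lst + [4]
print(id(lst) == before)  # False -- a new list object was bound to lst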
I am using any with a list comprehension. I would like to break the list comprehension when any returns True. For example,
import time
def f(x):
    time.sleep(2)
    return x
beginTime = time.time()
result = any([f(x) == 0 for x in [0,1,3,5,7]])
endTime = time.time()
print(endTime - beginTime)
The above code prints 10 seconds, although it could stop iterating after the first True.
Use a generator expression instead of a list comprehension to avoid forming the list first:
result = any(f(x) == 0 for x in [0,1,3,5,7])
(the square brackets of the list comprehension are gone.)
Note that any short-circuits in either case; what differs is that the generator expression never builds the whole list first.
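Using the f from the question, a rough timing comparison (a sketch; exact times will vary) shows the effect:
import time

def f(x):
    time.sleep(2)
    return x

t = time.time()
any([f(x) == 0 for x in [0, 1, 3, 5, 7]])  # builds the full list: f runs 5 times
print(time.time() - t)                     # ~10 seconds

t = time.time()
any(f(x) == 0 for x in [0, 1, 3, 5, 7])    # lazy: stops at the first True
print(time.time() - t)                     # ~2 seconds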
You can use a generator, as Mustafa said, but you can also retrieve the first matching element itself.
The generator does not have to be consumed completely, and the walrus operator does the rest:
import time
def f(x):
    time.sleep(2)
    return x
beginTime = time.time()
result = next((wr for x in [0, 1, 3, 5, 7] if (wr := f(x)) == 0), None)  # the None default avoids StopIteration if nothing matches
endTime = time.time()
print(endTime - beginTime)
This takes only the minimum time needed to find the first occurrence.
I am doing a conditional list comprehension e.g. newlist = [x for x in list if x % 2 == 0]. I want to limit the length of the resulting list to a specific number.
Is this possible without first comprehending the entire list and then slicing it?
I imagine something that has the functionality of:
limit = 10
newlist = []
for x in list:
    if len(newlist) >= limit:  # >= so the result never exceeds limit
        break
    if x % 2 == 0:
        newlist.append(x)
Slicing the original list (e.g. [x for x in list[:25] if x % 2 == 0]) is not possible, as the if condition does not return True at predictable intervals in my specific use case.
Many thanks in advance.
Please don't name any variables list as it shadows the built-in list constructor. I used li as a replacement for the input list here.
import itertools as it
gen = (x for x in li if x % 2 == 0) # Lazy generator.
result = list(it.islice(gen, 25))
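To confirm that this stops early, here is a sketch with a hypothetical counting predicate:
import itertools as it

checked = 0

def is_even(x):
    # Hypothetical predicate that counts how many items were examined.
    global checked
    checked += 1
    return x % 2 == 0

li = list(range(1000))
gen = (x for x in li if is_even(x))
print(list(it.islice(gen, 10)))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(checked)                   # far fewer than 1000: iteration stopped after the tenth match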
Since you are creating a list with the list comprehension, you can slice it directly after it is created (note that this still builds the full list first):
[x for x in li if x % 2 == 0][:limit]