Set variables in python list comprehension - python

If I have 2 or 3 of the same calculations done within a generator for each loop, is there a way to just set them as a variable?
A quick example would be like this:
#Normal
[len( i ) for i in list if len( i ) > 1]
#Set variable
[x for i in list if x > 1; x = len( i )]
Before anyone says len( i ) would be so fast the difference would be negligible, I also mean for other calculations, using len just made it easier to read.
Also, if there is a way, how would you set multiple variables?
Apologies if it's been asked before, but I've searched around and not found anything.

One way to get around the expensive operation is to nest a generator in a list comprehension that simply acts as a filter, for example
def foo(x):  # assume this function is expensive
    return 2*x
>>> [j for j in (foo(i) for i in range(6)) if j > 4]
#                ^ only called once per element
[6, 8, 10]
Using analogous functions and variables to your example, you'd have
[x for x in (len(i) for i in list) if x > 1]

Most implementations of Python do not, as you correctly surmise, have common sub-expression optimization, so your first expression would indeed call len(i) twice for every element that passes the filter. So why not just have two comprehensions:
a = [len(x) for x in list]
b = [x for x in a if x > 1]
That makes two passes, but only one call of len() per element. If the function were an expensive one, that's probably a win. I'd have to time this to be sure.
Cyber's nested version is essentially the same thing.

Using itertools.imap in Python 2 is an efficient way to do what you need, and it will most likely outperform a generator expression:
[x for x in imap(len, lst) if x > 4]
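On Python 3.8 and later, an assignment expression (the := operator) can bind the intermediate value inside the comprehension itself. The following is a minimal sketch, assuming a hypothetical sample list named words; it also shows that Python 3's built-in map is already lazy, so it can stand in for Python 2's imap:

```python
# Hypothetical sample data, used only for illustration.
words = ["a", "bb", "ccc", "", "dddd"]

# len(w) is evaluated once per element and bound to n by the := operator.
lengths = [n for w in words if (n := len(w)) > 1]

# Equivalent lazy pipeline: Python 3's map returns an iterator, like Python 2's imap.
lengths_via_map = [n for n in map(len, words) if n > 1]

print(lengths)          # [2, 3, 4]
print(lengths_via_map)  # [2, 3, 4]
```

Several values can be bound the same way by using one := expression per name inside the condition.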

Related

How to find last "K" indexes of vector satisfying condition (Python) ? (Analogue of Matlab's "find" )

Consider some vector:
import numpy as np
v = np.arange(10)
Assume we need to find last 2 indexes satisfying some condition.
For example in Matlab it would be written e.g.
find(v <5 , 2,'last')
answer = [ 3 , 4 ] (Note: these are the 0-based indices; Matlab indexes from 1, so it would report [ 4 , 5 ])
Question: What would be the clearest way to do that in Python ?
"Nice" solution should STOP search when it finds 2 desired results, it should NOT search over all elements of vector.
So np.where does not seem to be "nice" in that sense.
We can easily write that using "for", but is there any alternative way?
I am wary of using "for" since it might be slow (at least it is very much so in Matlab).
This attempt doesn't use numpy, and it is probably not very idiomatic.
Nevertheless, if I understand it correctly, zip, filter and reversed are all lazy iterators that take only the elements that they really need. Therefore, you could try this:
from itertools import islice
x = list(range(10))
res = reversed(list(map(
    lambda xi: xi[1],
    islice(
        filter(
            lambda xi: xi[0] < 5,
            zip(reversed(x), reversed(range(len(x))))
        ),
        2
    )
)))
print(list(res))
Output:
[3, 4]
What it does (from inside to outside):
create index range
reverse both array and indices
zip the reversed array with indices
filter the two (value, index)-pairs that you need, extract them by islice
Throw away the values, retain only indices with map
reverse again
Even though it looks somewhat monstrous, it should all be lazy, and stop after it finds the first two elements that you are looking for. I haven't compared it with a simple loop, maybe just using a loop would be both simpler and faster.
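The same lazy pipeline can be sketched more compactly with a generator expression. This is only an illustration under assumptions: the helper name last_k_indices and its parameters are hypothetical, and it has not been timed against a plain loop.

```python
from itertools import islice

def last_k_indices(values, condition, k):
    """Return, in ascending order, the indices of the last k items satisfying condition."""
    n = len(values)
    # Lazily walk the sequence from the end, yielding the original index of each hit.
    hits = (n - 1 - offset
            for offset, value in enumerate(reversed(values))
            if condition(value))
    # islice stops pulling from the generator after k hits.
    return sorted(islice(hits, k))

print(last_k_indices(list(range(10)), lambda value: value < 5, 2))  # [3, 4]
```

Because islice stops after k hits, elements before the k-th match from the end are never tested.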
Any solution you'd find will iterate over the list even if the loop is 'hidden' inside a function.
The solution to your problem depends on the assumptions you can make e.g. is the list sorted?
For the general case I'd iterate over the list starting from the end:
def find(condition, k, v):
    indices = []
    for i, var in enumerate(reversed(v)):
        if condition(var):
            indices.append(len(v) - i - 1)
        if len(indices) >= k:
            break
    return indices
The condition should then be passed as a function, so you can use a lambda:
v = range(10)
find(lambda x: x < 5, 3, v)
will output
[4, 3, 2]
I'm not aware of a "good" numpy solution to short-circuiting.
The most principled way to go would be using something like Cython, which, to brutally oversimplify it, adds fast loops to Python. Once you have set that up, it would be easy.
If you do not want to do that you'd have to employ some gymnastics like:
import numpy as np
def find_last_k(vector, condition, k, minchunk=32):
    if k > minchunk:
        minchunk = k
    l, r = vector.size - minchunk, vector.size
    found = []
    n_found = 0
    while r > 0:
        if l <= 0:
            l = 0
        found.append(l + np.where(condition(vector[l:r]))[0])
        n_found += len(found[-1])
        if n_found >= k:
            break
        l, r = 3 * l - 2 * r, l
    return np.concatenate(found[::-1])[-k:]
This tries balancing loop overhead and numpy "inflexibility" by searching in chunks, which we grow exponentially until enough hits are found.
Not exactly pretty, though.
This is what I've found that seems to do this job for the example described (using argwhere which returns all indices that meet the criteria and then we find the last two of these as a numpy array):
ind = np.argwhere(v<5)
ind[-2:]
This searches through the entire array so is not optimal but is easy to code.

Most efficient if statement?

I would like to write a function that takes integer numbers x, y, L and R as parameters and returns True if x**y lies in the interval (L, R] and False otherwise.
I am considering several ways to write a conditional statement inside this function:
if L < x ** y <= R:
if x ** y > L and x ** y <= R:
if x ** y in range(L + 1, R + 1):
Why is option 1 the most efficient in terms of execution time ?
Both #1 and #3 avoid recalculating x ** y, where #2 must calculate it twice.
On Python 2, #3 will be terrible, because it must compute the whole contents of the range. On Python 3.2+, it doesn't have to (range is smart, and can properly determine mathematically whether an int appears in the range without actually iterating, in constant time), but it's at best equivalent to #1, since creating the range object at all has some overhead.
As tobias_k mentions in the comments, if x ** y produces a float, #3 will be slower (breaks the Python 3.2+ O(1) membership testing optimization, requiring an implicit loop over all values), and will get different results than #1 and #2 if the value is not equal to any int value in the range. That is, testing 3.5 in range(1, 5) returns False, and has to check 3.5 against 1, 2, 3, and 4 individually before it can even tell you that much.
Basically, stick to #1, it's going to be the only one that avoids redundant computations and avoids creating a ton of values for comparison on both Py 2 and Py3. #3 is not going to be much (if at all) slower on Python 3.2+, but it does involve creating a range object that isn't needed here, and won't be quite logically equivalent.
The first one has to evaluate x**y only once, so it should be faster than the second (also, more readable). The third one would have to loop over the iterator (in python 2, so it should be slower than both) or make two comparisons (in python 3, so it is no better than the first one). Keep the first one.
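The claim about redundant evaluation can be checked with a small sketch. The helper power and the list call_log are hypothetical names introduced here only to count how often the power is computed:

```python
call_log = []  # one entry is appended each time the power is computed

def power(x, y):
    call_log.append((x, y))
    return x ** y

def in_interval_chained(x, y, L, R):
    # Option 1: the middle operand of a chained comparison is evaluated once.
    return L < power(x, y) <= R

def in_interval_and(x, y, L, R):
    # Option 2: the power is computed again for the second comparison.
    return power(x, y) > L and power(x, y) <= R

call_log.clear()
print(in_interval_chained(2, 3, 7, 8), len(call_log))  # True 1
call_log.clear()
print(in_interval_and(2, 3, 7, 8), len(call_log))      # True 2
print(3.5 in range(1, 5))                              # False
```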

applying for loop such that counters are multiplied rather than being added in python

Hello, I am relatively new to Python! Is there a way to do this using for loops in Python?
This is a Java implementation of something I want to do in Python:
for (i=1;i<20; i*= 2)
{System.out.println(i);}
Solution using a while loop in Python:
i = 1
while i < 20:
    print(i)
    i *= 2
I cannot figure out a way to do this using for loops. I implemented it using a while loop, obviously, but I am still curious to know whether there is a way to do so or not.
There are lots of ways to do this, e.g.
for i in range(5):
    i = 2 ** i
    print(i)
or using generators
from itertools import count, takewhile
def powers_of_two():
    for i in count():
        yield 2 ** i
for i in takewhile(lambda x: x < 20, powers_of_two()):
    print(i)
But in the end, it depends on your use case which version gives the clearest and most readable code. In most cases, you would probably just use a while loop, since it's simple and does the job.
You think of for loops like they would be in other languages, like C, C++, Java, JavaScript etc.
Python for loops are different; they work on iterables, and you always have to read them like:
for element in iterable
instead of the C'ish
for(start_condition; continue_condition; step_statement)
Hence, you would need an iterable to generate your products.
I like readability, so here's how I'd do it:
for a in (2**i for i in range(5)):  # 2**4 = 16 is the last power below 20
    print(a)
But that mainly works because we mathematically know that the i'th element of your sequence is going to be 2**i.
There is not a real way to do this in Python. If you wanted to mimic the logic of that for loop exactly, then a manual while loop would definitely be the way to go.
Otherwise, in Python, you would try to find a generator or generator expression that produces the values of i. Depending on the complexity of your post loop expression, this may require an actual function.
In your case, it’s a bit simpler because the numbers you are looking for are the following:
1 = 2 ** 0
2 = 2 ** 1
4 = 2 ** 2
8 = 2 ** 3
...
So you can generate the numbers using a generator expression (2 ** k for k in range(x)). The problem here is that you would need to specify a value x which happens to be math.floor(math.log2(20)) + 1 (because you are looking for the largest number k for which 2 ** k < 20 is true).
So the full expression would be this:
import math
for i in (2 ** k for k in range(math.floor(math.log2(20)) + 1)):
    print(i)
… which is a bit messy, so if you don’t necessarily need the i to be those values, you could move it inside the loop body:
for k in range(math.floor(math.log2(20)) + 1):
    i = 2 ** k
    print(i)
But this still only fits your purpose. If you wanted a “real” C-for loop expression, you could write a generator function:
def classicForLoop(init, stop, step):
    i = init
    while i < stop:
        yield i
        i = step(i)
Used like this:
for i in classicForLoop(1, 20, lambda x: x * 2):
    print(i)
Of course, you could also modify the generator function to take lambdas as the first and second parameter, but it’s a bit simpler like this.
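One possible shape for that more general variant is sketched below. The name classic_for is hypothetical, and only the continue condition is turned into a callable here; the start value stays a plain argument.

```python
def classic_for(init, condition, step):
    """Yield init, step(init), step(step(init)), ... while condition holds."""
    i = init
    while condition(i):
        yield i
        i = step(i)

# Mirrors the Java loop: for (i = 1; i < 20; i *= 2)
powers = list(classic_for(1, lambda i: i < 20, lambda i: i * 2))
print(powers)  # [1, 2, 4, 8, 16]
```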
Use the range() function to define the number of iterations. You can use print() directly in place of System.out.println.
Alexander mentioned it already, but to reiterate:
for k in range(5): print(2**k)
You can also consider a while loop here:
i = 0
while 2**i < 20:
    print(2**i)
    i = i + 1
Remember that indentation matters in Python.

Python: Is this the most efficient way to reverse order without using shortcuts?

x = [1,2,3,4,5,6,7,8,9,10]
#Random list elements
for i in range(int(len(x)/2)):
    value = x[i]
    x[i] = x[len(x)-i-1]
    x[len(x)-i-1] = value
#Confusion on efficiency
print(x)
This is a uni course for first year. So no python shortcuts are allowed
Not sure what counts as "a shortcut" (reversed and the "Martian Smiley" [::-1] being obvious candidates -- but does either count as "a shortcut"?!), but at least a couple small improvements are easy:
L = len(x)
for i in range(L//2):
    mirror = L - i - 1
    x[i], x[mirror] = x[mirror], x[i]
This gets len(x) only once -- it's a fast operation but there's no reason to keep repeating it over and over -- also computes mirror but once, does the swap more directly, and halves L (for the range argument) directly with the truncating-division operator rather than using the non-truncating division and then truncating with int. Nanoseconds for each case, but it may be considered slightly clearer as well as microscopically faster.
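Wrapped in a function, the swap loop can be sketched as follows. The function name reverse_in_place is hypothetical; the odd-length case shows that the middle element is simply left where it is.

```python
def reverse_in_place(items):
    """Reverse a list in place by swapping mirrored positions."""
    length = len(items)          # computed once, not on every iteration
    for i in range(length // 2):
        mirror = length - i - 1  # index of the element mirrored around the centre
        items[i], items[mirror] = items[mirror], items[i]
    return items

print(reverse_in_place([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(reverse_in_place([1, 2, 3]))                        # [3, 2, 1]
```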
x = [1,2,3,4,5,6,7,8,9,10]
x = x.__getitem__(slice(None,None,-1))
slice is a Python builtin (like range and len, which you used in your example).
__getitem__ is the method that implements indexing and slicing on sequence types (of which x is one).
There are absolutely no shortcuts here :) and it's effectively one line.

Possible to capture the returned value from a Python list comprehension for use a condition?

I want to construct a value in a list comprehension, but also filter on that value. For example:
[expensive_function(x) for x in generator if expensive_function(x) < 5]
I want to avoid calling expensive_function twice per iteration.
The generator may return an infinite series, and list comprehensions aren't lazily evaluated. So this wouldn't work:
[y for y in [expensive_function(x) for x in generator] if y < 5]
I could write this another way, but it feels right for a list comprehension and I'm sure this is a common usage pattern (possible or not!).
If generator may be infinite, you do not want to use a list comprehension. And not everything has to be a one-liner.
def filtered_gen(gen):
    for item in gen:
        result = expensive_function(item)
        if result < 5:
            yield result
I'm going to answer the part of the question about how to capture intermediate results in a list comprehension for use in a condition, and ignore the question of a list comprehension built from an infinite generator (which obviously isn't going to work), just in case anyone looking for an answer to the question in the title comes here.
So, you have a list comprehension like this:
[expensive_function(x) for x in xrange(5) if expensive_function(x) % 2 == 0]
And you want to avoid calculating expensive_function twice when it passes your filter. Languages with more expressive comprehension syntax (Scala, Haskell, etc) allow you to simply assign names to expressions calculated from comprehension variables, which lets you do things like the following:
# NOT REAL PYTHON
[result for x in xrange(5) for result = expensive_function(x) if result % 2 == 0]
But you can easily emulate this by turning the assignment result = expensive_function(x) into another for iteration over a sequence of one element:
[result for x in xrange(5) for result in (expensive_function(x),) if result % 2 == 0]
And the proof:
>>> def expensive_function(x):
...     print 'expensive_function({})'.format(x)
...     return x + 10
...
>>> [expensive_function(x) for x in xrange(5) if expensive_function(x) % 2 == 0]
expensive_function(0)
expensive_function(0)
expensive_function(1)
expensive_function(2)
expensive_function(2)
expensive_function(3)
expensive_function(4)
expensive_function(4)
[10, 12, 14]
>>> [result for x in xrange(5) for result in (expensive_function(x),) if result % 2 == 0]
expensive_function(0)
expensive_function(1)
expensive_function(2)
expensive_function(3)
expensive_function(4)
[10, 12, 14]
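The transcript above is Python 2. A Python 3 rendering of the same check is sketched here, with range in place of xrange and a hypothetical call_log list in place of the print statements to count the calls:

```python
call_log = []  # records every argument expensive_function is called with

def expensive_function(x):
    call_log.append(x)
    return x + 10

# Naive form: called once for the filter and again for each element that passes.
naive = [expensive_function(x) for x in range(5) if expensive_function(x) % 2 == 0]
naive_calls = len(call_log)

call_log.clear()
# One-element inner loop: the result is bound once and reused by the filter.
single = [result for x in range(5) for result in (expensive_function(x),) if result % 2 == 0]
single_calls = len(call_log)

print(naive, naive_calls)    # [10, 12, 14] 8
print(single, single_calls)  # [10, 12, 14] 5
```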
You should make two generator expressions:
ys_all = (expensive(x) for x in xs)
ys_filtered = (y for y in ys_all if y < 5)
or
from itertools import imap, ifilter
ys = ifilter(lambda y : y < 5, imap(expensive, xs))
Warning: this is a bit convoluted, but it does the job. I will use an example to explain it.
Let's say expensive_function = math.sin
infinite generator = itertools.count(0.1,0.1)
then
[z for z in (y if y < 0.5 else next(iter([]))
             for y in (math.sin(x) for x in itertools.count(0.1,0.1)))]
is
[0.09983341664682815,
0.19866933079506122,
0.2955202066613396,
0.3894183423086505,
0.479425538604203]
So your problem boils down to
[z for z in (y if y < 0.5 else next(iter([])) \
for y in (expensive_function(x) for x in generator))]
The trick is to force a StopIteration from inside the generator expression, and I know of nothing more elegant than next(iter([])).
Here expensive_function is only called once per iteration.
Extend the infinite generator into a finite generator by adding the stop condition.
As a generator expression cannot contain a raise StopIteration statement, we opt for a convoluted way, i.e. next(iter([])).
Now you have a finite generator, which can be used in a list comprehension.
As the OP was concerned about applying the above method to a non-monotonic function, here is a fictitious non-monotonic function:
Expensive non-monotonic function f(x) = random.randint(1,10)*x
Stop condition = the first value that is not < 7
[z for z in (y if y < 7 else next(iter([])) for y in
(random.randint(1,10)*x for x in itertools.count(0.1,0.1)))]
[0.9,
0.6000000000000001,
1.8000000000000003,
4.0,
0.5,
6.0,
4.8999999999999995,
3.1999999999999997,
3.5999999999999996,
5.999999999999999]
Btw: sin in true sense is non-monotonic over the entire range (0,2pi)
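Since Python 3.7 (PEP 479), a StopIteration raised inside a generator expression is converted into a RuntimeError, so the next(iter([])) trick only works on older interpreters. A portable sketch of the same stop-at-first-failure idea uses itertools.takewhile; math.sin again stands in for the expensive function, as in the example above:

```python
import math
from itertools import count, takewhile

# Lazily apply the expensive function to an infinite stream of inputs.
results = (math.sin(x) for x in count(0.1, 0.1))

# takewhile stops consuming the stream at the first value that fails the test,
# so the function is called once per consumed element and the list stays finite.
values = list(takewhile(lambda y: y < 0.5, results))

print(len(values))  # 5
```

Like the original trick, this stops at the first failing value rather than filtering the whole stream, which is the only option when the input is infinite.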
