Why does `itertools.repeat` always generate the same random number? - python

Compare the outputs of these two functions:
from itertools import repeat
import numpy as np

def rand_list1():
    l = lambda: np.random.rand(3)
    return list(repeat(l(), 5))

def rand_list2():
    return [np.random.rand(3) for i in range(5)]
We see that rand_list1, which uses itertools.repeat, always generates the same 3 numbers. Why is this? Can it be avoided, so that each call of rand_list1() generates new numbers?
For example, the output of rand_list1():
[[0.07678796 0.22623777 0.07533145]
[0.07678796 0.22623777 0.07533145]
[0.07678796 0.22623777 0.07533145]
[0.07678796 0.22623777 0.07533145]
[0.07678796 0.22623777 0.07533145]]
and the output of rand_list2():
[[0.77863856 0.30345662 0.7007517 ]
[0.56422447 0.97138115 0.47976387]
[0.20576279 0.92875791 0.06518335]
[0.2992384 0.89726684 0.16917078]
[0.8440534 0.38016789 0.51691172]]

There is a basic misunderstanding about how the language works in your question.
With the lambda expression, you simply create a new function named l.
At the moment you write l(), Python calls the function and it returns a value: it is that returned value which is used in place of the expression l() in the rest of the larger expression. So, in this case, you are actually calling repeat with a single, already generated, array as its first argument.
Functions that are passed as arguments to be called at their destination, and then run anew each time, are an allowed construct in Python, but (1) they depend on the receiving function being able to use functions as arguments, which is not the case for repeat, and (2) more importantly, one has to pass the function name without typing the parentheses.
In this case, repeat is redundant, as the comprehension syntax that calls a function multiple times to build a list already does the repetition you thought repeat would create for you.
Just do:
return [l() for _ in range(5)]
This will call l() on each iteration of the loop.
(By the way, one should strongly avoid l as a variable or function name in any context, as in many fonts it is hard to distinguish l from 1.)
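To make the call-once-versus-call-each-time difference concrete, here is a minimal, self-contained sketch. It uses a hypothetical counting helper, make_value, instead of numpy, so the effect is deterministic and easy to assert:

```python
from itertools import repeat

calls = []

def make_value():
    # Record each invocation so we can see how often the function actually runs
    calls.append(len(calls))
    return len(calls)

# repeat() receives the *result* of one call, so that single value is repeated
repeated = list(repeat(make_value(), 5))
print(repeated)    # [1, 1, 1, 1, 1]
print(len(calls))  # 1 -- the function ran only once

# A comprehension calls the function on every iteration
fresh = [make_value() for _ in range(5)]
print(fresh)       # [2, 3, 4, 5, 6]
print(len(calls))  # 6
```

The same reasoning explains the numpy output in the question: np.random.rand(3) runs once before repeat ever sees it.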

The reason why list(repeat(l(), 5)) repeats the same value:
itertools.repeat() just yields the same object over and over.
The expression l() is evaluated once, before repeat is ever called, so repeat receives the already-computed result rather than the function itself.
See the itertools.repeat() documentation:
repeat(10, 3) --> 10 10 10
So what is exactly going on?
stage 1
list(repeat(l(), 5))
stage 2
list(repeat([some numbers], 5))
stage 3
list(repeat([some numbers], 5)) --> [some numbers], [some numbers], [some numbers], [some numbers], [some numbers]

Related

What is the most optimal method for "slicing" path objects, specifically the iterdir() function? [duplicate]

I would like to loop over a "slice" of an iterator. I'm not sure if this is possible as I understand that it is not possible to slice an iterator. What I would like to do is this:
def f():
    for i in range(100):
        yield i

x = f()
for i in x[95:]:
    print(i)
This of course fails with:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-37-15f166d16ed2> in <module>()
4 x = f()
5
----> 6 for i in x[95:]:
7 print(i)
TypeError: 'generator' object is not subscriptable
Is there a pythonic way to loop through a "slice" of a generator?
Basically the generator I'm actually concerned with reads a very large file and performs some operations on it line by line. I would like to test slices of the file to make sure that things are performing as expected, but it is very time consuming to let it run over the entire file.
Edit:
As mentioned, I need to do this on a file. I was hoping that there was a way of specifying this explicitly with the generator, for instance:
import skbio
import itertools

f = 'seqs.fna'
seqs = skbio.io.read(f, format='fasta')  # seqs is a generator object
for seq in itertools.islice(seqs, 30516420, 30516432):
    # do a bunch of stuff here
    pass
The above code does what I need; however, it is still very slow, as the generator still loops through all of the lines. I was hoping to only loop over the specified slice.
In general, the answer is itertools.islice, but you should note that islice doesn't, and can't, actually skip values. It just grabs and throws away start values before it starts yield-ing values. So it's usually best to avoid islice if possible when you need to skip a lot of values and/or the values being skipped are expensive to acquire/compute. If you can find a way to not generate the values in the first place, do so. In your (obviously contrived) example, you'd just adjust the start index for the range object.
In the specific case of trying to run on a file object, pulling in a huge number of lines (particularly reading from a slow medium) may not be ideal. Assuming you don't need specific lines, one trick you can use to avoid actually reading huge blocks of the file, while still testing some distance into the file, is to seek to a guessed offset, read out to the end of the line (to discard the partial line you probably seeked into the middle of), then islice off however many lines you want from that point. For example:
import itertools

with open('myhugefile') as f:
    # Assuming roughly 80 characters per line, this seeks to somewhere roughly
    # around the 100,000th line without reading in the data preceding it
    f.seek(80 * 100000)
    next(f)  # Throw away the partial line you probably landed in the middle of
    for line in itertools.islice(f, 100):  # Process 100 lines
        ...  # Do stuff with each line
For the specific case of files, you might also want to look at mmap, which can be used in similar ways (and is especially useful if you're processing blocks of data rather than lines of text, possibly randomly jumping around as you go).
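As a rough illustration of the mmap variant of the seek-and-discard trick (the sample file and its size are invented here so the sketch is self-contained; a real script would open an existing large file):

```python
import mmap
import tempfile

# Build a small sample file so the sketch is self-contained
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.writelines(f"line {i}\n" for i in range(1000))
    path = tmp.name

with open(path, 'rb') as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    mm.seek(len(mm) // 2)  # jump to a guessed byte offset
    mm.readline()          # discard the partial line we probably landed in
    line = mm.readline()   # first complete line after the midpoint
    print(line)
    mm.close()
```

Because the mapping is lazy, none of the bytes before the seek target need to be read from disk.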
Update: From your updated question, you'll need to look at your API docs and/or data format to figure out exactly how to skip around properly. It looks like skbio offers some features for skipping using seq_num, but that's still going to read, if not process, most of the file. If the data was written out with equal sequence lengths, I'd look at the docs on Alignment; aligned data may be loadable without processing the preceding data at all, e.g. by using Alignment.subalignment to create new Alignments that skip the rest of the data for you.
islice is the pythonic way
from itertools import islice
g = (i for i in range(100))
for num in islice(g, 95, None):
    print(num)
You can't slice a generator object or iterator using normal slice operations.
Instead you need to use itertools.islice, as @jonrsharpe already mentioned in his comment.
import itertools

for i in itertools.islice(x, 95, None):
    print(i)
Also note that islice returns an iterator and consumes data from the underlying iterator or generator. So you will need to convert your data to a list, or create a new generator object if you need to go back and do something, or use the little-known itertools.tee to create a copy of your generator.
from itertools import tee
first, second = tee(f())
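For example, a small sketch showing that tee produces two independent copies of a one-shot generator (f here is a stand-in generator for the illustration):

```python
from itertools import tee

def f():
    yield from range(5)

first, second = tee(f())
a = list(first)
b = list(second)
print(a)  # [0, 1, 2, 3, 4]
print(b)  # [0, 1, 2, 3, 4] -- exhausting one copy does not affect the other
```

One caveat: tee has to buffer every item one copy has produced but the other has not yet consumed, so it is not free for large or widely diverging iterations.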
Let's clarify something first.
Suppose you want to extract the first values from your generator, based on the number of arguments you specified on the left of the assignment. Starting from this moment, we have a problem, because in Python there are two alternatives to unpack something.
Let's discuss these alternatives using the following example. Imagine you have the following list l = [1, 2, 3]
1) The first alternative is to NOT use the "star" expression
a, b, c = l # a=1, b=2, c=3
This works great if the number of arguments on the left of the assignment (in this case, 3 arguments) is equal to the number of elements in the list.
But if you try something like this
a, b = l # ValueError: too many values to unpack (expected 2)
you get an error, because the list contains more elements than the arguments specified on the left of the assignment.
2) The second alternative is to use the "star" expression; this solves the previous error
a, b, *c = l # a=1, b=2, c=[3]
The "star" argument acts like a buffer list.
The buffer can have three possible values:
a, b, *c = [1, 2] # a=1, b=2, c=[]
a, b, *c = [1, 2, 3] # a=1, b=2, c=[3]
a, b, *c = [1, 2, 3, 4, 5] # a=1, b=2, c=[3,4,5]
Note that the list must contain at least 2 values (in the above example); if not, an error will be raised.
Now, jump to your problem. If you try something like this:
a, b, c = generator
This will work only if the generator yields exactly three values (the number of generator values must equal the number of variables on the left). Otherwise, an error will be raised.
If you try something like this:
a, b, *c = generator
If the generator yields fewer than 2 values, an error will be raised, because the variables "a" and "b" must each get a value.
If the generator yields exactly 2 values, then a=<val_1>, b=<val_2>, c=[].
If the generator yields more than 2 values, then a=<val_1>, b=<val_2>, c=[<val_3>, ...].
In this case, if the generator is infinite, the program will block trying to consume the generator.
What I propose for you is the following solution
# Create a dummy generator for this example
def my_generator():
    i = 0
    while i < 2:
        yield i
        i += 1

# Our generator unpacker
class GeneratorUnpacker:
    def __init__(self, generator):
        self.generator = generator

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self.generator)
        except StopIteration:
            return None  # When the generator ends, we return None as the value

if __name__ == '__main__':
    dummy_generator = my_generator()
    g = GeneratorUnpacker(dummy_generator)
    a, b, c = next(g), next(g), next(g)
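The same padding idea can be sketched with only itertools: chain the generator onto an endless supply of None, then slice off exactly as many values as there are targets. The helper name unpack is invented for this illustration:

```python
from itertools import chain, islice, repeat

def unpack(gen, n):
    # Pad with None so unpacking never fails when the generator runs short
    return tuple(islice(chain(gen, repeat(None)), n))

a, b, c = unpack(iter(range(2)), 3)
print(a, b, c)  # 0 1 None
```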

How to print a list of function values from a generating function in python?

I'm new to python and just starting to learn the basics.
I have defined a function recursively and I want to print a list of function outputs.
This is the code:
def x(n):
    assert n >= 0, "Only non-negative integers n are allowed"
    if n == 0:
        return 5
    else:
        return (x(n-1) + 5) / x(n-1)

print([x(0), x(1), x(2)])
for k in range(0, 9, 1):
    print(x(k))
So my question is: say I want to print a list of the first 10 outputs of the sequence/function, i.e. x(0),...,x(9), how do I do this without actually listing each output manually? I want them to be in the form "[x(0),...,x(9)]", just like I did for the first 3 values. My attempt is in the last command of the program, where k moves from 0 to 9. The last command clearly prints the first 10 outputs, but not as a list, i.e. in [] brackets.
Any input is greatly appreciated.
One Solution:
I replaced the code
for k in range(0, 9, 1):
    print(x(k))
with
print([x(k) for k in range(9)])
This puts the outputs in a list, i.e. in the [ ] brackets. Worked wonderfully!
You can use list comprehension.
print([x(n) for n in range(9)])
# outputs: [5, 2.0, 3.5, 2.4285714285714284, 3.058823529411765, 2.634615384615384, 2.8978102189781025, 2.72544080604534, 2.83456561922366]
Explanation:
We're making a list by calling the function x() for each of the numbers n in the range from 0 to 9 (not included).
Please note that it is implicit that the starting point of the range() function is 0, that the step is 1, and that the endpoint (9) is not included.
Here's a solution for a beginner (not a one-liner, so it should be easier to understand):
myarray = []
for i in range(9):
    myarray.append(x(i))
Just to show the alternative to a list comprehension using map, since this is practically the scenario that map was made for:
xs = map(x, range(9))
map takes a function, and applies it to each member of the supplied iterable.
The main difference between this and using a comprehension is this returns a lazy iterable (a map object), not a list. x will not be applied to an element until you request the element.
Use of a list comprehension/generator expression is preferable in the majority of scenarios, but map is nice if you need/can tolerate a lazy result, and you already have a predefined function.
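That laziness can be observed directly. A small sketch using a recording wrapper (square and calls are invented names for the illustration) shows exactly when map actually invokes the function:

```python
calls = []

def square(n):
    calls.append(n)  # record every actual invocation
    return n * n

xs = map(square, range(5))
print(len(calls))  # 0 -- map is lazy; nothing has been computed yet

first = next(xs)   # computes square(0) only
print(first, len(calls))  # 0 1

rest = list(xs)    # computes the remaining elements on demand
print(rest)        # [1, 4, 9, 16]
```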

lambda operators in python loops [duplicate]

This question already has answers here:
Creating lambda inside a loop [duplicate]
(3 answers)
Closed 6 years ago.
I'm encountering some strange behavior with lambda functions in a loop in python. When I try to assign lambda functions to dictionary entries in a list, and when other entries in the dictionary are used in the function, only the last time through the loop is the lambda operator evaluated. So all of the functions end up having the same value!
Below is stripped-down code that captures just the parts of what I'm trying that is behaving oddly. My actual code is more complex, not as trivial as this, so I'm looking for an explanation and, preferably, a workaround.
n = 4
numbers = range(n)
entries = [dict() for x in numbers]
for number, entry in zip(numbers, entries):
    n = number
    entry["number"] = n
    entry["number2"] = lambda x: n*1
for number in numbers:
    print(entries[number]["number"], entries[number]["number2"](2))
The output is:
0 3
1 3
2 3
3 3
In other words, the dictionary entries that are just integers are fine, and were filled properly by the loop. But the lambda functions, which are trivial and should just return the same value as the "number" entries, are all set to the last pass through.
What's going on?
Try this
N = 4
numbers = range(N)
entries = [dict() for x in numbers]
for number, entry in zip(numbers, entries):
    entry["number"] = number
    entry["number2"] = lambda x, n=number: n*1
for number in numbers:
    print(entries[number]["number"], entries[number]["number2"](2))
It prints (python3)
0 0
1 1
2 2
3 3
To avoid confusion: n referred to different things in your code; I used it in only one place.
It is a closure problem.
By the end of your for loop, the n variable is set to 3, and it is that variable which is accessed in the lambda expression. Unlike in static languages such as C#, the variable's value is not fixed at definition time; as another answer on the site points out, lambda expressions will retain references to the variables involved instead of capturing their values at the time of creation. This question also discusses your issue.
To fix it, you need to give the lambdas new, local variable via default parameters:
entry["number2"] = lambda x, n=n: n*1
This creates a new variable in the lambda's scope, called n, which sets its default value to the "outside" value of n. Note that this is the solution endorsed by the official FAQ, as this answer by Adrien Plisson states.
Now, you can call your lambda like normal and ignore the optional parameter, with no ill effect.
EDIT: As originally stated by Sci Prog, this solution makes n = number redundant. Your final code will look similar to this:
lim = 4
numbers = range(lim)
entries = [dict() for x in numbers]
for number, entry in zip(numbers, entries):
    entry["number"] = number
    entry["number2"] = lambda x, n=number: n*1
for number in numbers:
    print(entries[number]["number"], entries[number]["number2"](2))
You are probably running into the problem that the lambda is created as a reference to the variable n. The function is only evaluated after the loop, so when called it sees the final value of n. If you're OK with having the function evaluated at the time of assignment, you could wrap it in an immediate call:
(lambda x: n*1)(2)
or if you want to have the functions to use, have them reference the specific value you want. From your code you could use a default argument as a workaround:
entry["number"] = n
entry["number2"] = lambda x, n=n: n*1
The difference comes down to a question of memory addressing. I imagine it went something like this:
You: Python, please give me a variable called "n"
Python: Ok! Here it is, it is at memory slot 1
You: Cool! I will now create functions which say take that variable "n"
value (at memory slot 1) and multiply it by 1 and return that to me.
Python: Ok! Got it:
1. Take the value at memory slot 1.
2. Multiply by 1.
3. Return it to you.
You: Done with my looping, now evaluate those instructions!
Python: Ok! Now I will take the value at memory slot 1 and multiply by 1
and give that to you.
You: Hey, I wanted each function to reference different values!
Python: I followed your instructions exactly!

Please help me translate the following lambda to human language

What do the following expression actually does?
list = [lambda n=n: lambda x: x+n for n in range(10)]
More specifically:
What does n=n mean?
What will be the content of 'list'?
What will be the output of
print(list[0](14)) and print(list[0]()(14))
and why?
What does n=n mean?
lambda lets you define functions that take parameters, just like def. And those parameters can have default argument values. So, lambda n=n: is the same as def foo(n=n):.
In fact, when faced with an expression that's too complicated for you to read, it's often worth unpacking into simple statements:
list = []
for n in range(10):
    def spam(n=n):
        def eggs(x):
            return x + n
        return eggs
    list.append(spam)
Now, why would you want to create a parameter named n with default value n? Why not just lambda:? The official FAQ explains this, but let's try to summarize.
If you just write this:
funcs = [lambda: n for n in range(10)]
… what you get is 10 functions of no parameters, that are all closures over the same variable, n. Because n has the value 9 at the end of the loop, when called, they're all going to return 9.
But if you do this:
funcs = [lambda n=n: n for n in range(10)]
… what you get is 10 functions of one optional parameter n (which hides the closure n from view), whose default value is the value of n at the time each function was defined. So, when called with no arguments, the first one will return 0, the second 1, and so on.
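A quick runnable contrast of the two variants (simplified to lambdas that just return n, as in the paragraphs above):

```python
late = [lambda: n for n in range(10)]
late_results = [f() for f in late]
print(late_results)   # all 9: the closures share the comprehension's final n

bound = [lambda n=n: n for n in range(10)]
bound_results = [f() for f in bound]
print(bound_results)  # 0 through 9: each default was captured at definition time
```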
In your case, of course, the functions aren't just returning n, they're returning a function that takes a parameter, adds n to it, and returns the result. But the idea is the same; you want them to return different functions, which add 0, 1, … 9 to their arguments, not all return equal functions that all add 9.
What will be the content of list?
list will be 10 functions of one optional parameter whose default values range from 0 to 9, each of which returns a function of one parameter. That returned function is a closure over the value of n from the outer function. So, when it's called, it returns its argument, x, plus the n variable that ranges from 0 through 9.
What will be the output of
print(list[0](14))
Here, you're calling the first outer function, list[0], with the argument 14. So, instead of its default value 0 for n, it's going to have 14. So, what you'll get is a function that takes one argument and adds 14 to it. But it will print out as something like:
<function <listcomp>.<lambda>.<locals>.<lambda> at 0x105f21f28>
That long mess is Python 3.4+ trying to be helpful by telling you where to find the function definition. Usually, when a function is nested this deeply, most of the steps along the way have names. In this case, you've got three layers of anonymous functions, so none of the names are very useful…
In order to see it do anything useful, you'll have to call it:
print(list[0](14)(20))
And this will give you 34.
You could also use the inspect module, or just dir, to poke around inside the function. For example, print(list[0](14).__code__.co_freevars[0], list[0](14).__closure__[0].cell_contents) will tell you that it's stashed the number 14 under the name n for use by its internal function.
…
print(list[0]()(14))
Here, you're again calling list[0], but this time with no argument, so its n gets the default value of 0. So, it returns a function that adds 0 to its argument. You then call that function with 14, so you get 14.
To answer the last part first:
In [1]: list = [lambda n=n: lambda x: x+n for n in range(10)]
In [2]: print(list[0](14))
<function <lambda> at 0x7f47b5ca7cf8>
In [3]: print(list[0]()(14))
14
Obtained by running the code. (list is a bad name, by the way, as list is a Python builtin.) This gives you 10 lambda functions that don't do much: the first will return the original argument x, the second the argument + 1, etc., as n is stored as the index of the lambda via the n=n default, local to that lambda.

Inconsistent behavior of python generators

The following python code produces [(0, 0), (0, 7)...(0, 693)] instead of the expected list of tuples combining all of the multiples of 3 and multiples of 7:
multiples_of_3 = (i*3 for i in range(100))
multiples_of_7 = (i*7 for i in range(100))
list((i,j) for i in multiples_of_3 for j in multiples_of_7)
This code fixes the problem:
list((i,j) for i in (i*3 for i in range(100)) for j in (i*7 for i in range(100)))
Questions:
The generator object seems to play the role of an iterator instead of providing a fresh iterator object each time the generated list is to be enumerated. The latter strategy seems to be the one adopted by .NET LINQ query objects. Is there an elegant way to get around this?
How come the second piece of code works? Shall I understand that the generator's iterator is not reset after looping through all multiples of 7?
Don't you think that this behavior is counter intuitive if not inconsistent?
A generator object is an iterator, and therefore one-shot. It's not an iterable which can produce any number of independent iterators. This behavior is not something you can change with a switch somewhere, so any workaround amounts to either using an iterable (e.g. a list) instead of a generator or repeatedly constructing generators.
The second snippet does the latter. It is by definition equivalent to the loops
for i in (i*3 for i in range(100)):
    for j in (i*7 for i in range(100)):
        ...
Hopefully it isn't surprising that here, the latter generator expression is evaluated anew on each iteration of the outer loop.
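That re-evaluation can be observed directly. Here is a small sketch (with the ranges shrunk, and an invented helper inner that records each time the inner iterable is constructed):

```python
evals = []

def inner():
    evals.append(1)  # record each construction of the inner iterable
    return (j * 7 for j in range(3))

pairs = [(i, j) for i in (i * 3 for i in range(2)) for j in inner()]
print(pairs)       # [(0, 0), (0, 7), (0, 14), (3, 0), (3, 7), (3, 14)]
print(len(evals))  # 2 -- built once per outer iteration
```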
As you discovered, the object created by a generator expression is an iterator (more precisely a generator-iterator), designed to be consumed only once. If you need a resettable generator, simply create a real generator and use it in the loops:
def multiples_of_3():  # generator
    for i in range(100):
        yield i * 3

def multiples_of_7():  # generator
    for i in range(100):
        yield i * 7

list((i, j) for i in multiples_of_3() for j in multiples_of_7())
Your second code works because the expression list of the inner loop ((i*7 ...)) is evaluated on each pass of the outer loop. This results in creating a new generator-iterator each time around, which gives you the behavior you want, but at the expense of code clarity.
To understand what is going on, remember that there is no "resetting" of an iterator when the for loop iterates over it. (This is a feature; such a reset would break iterating over a large iterator in pieces, and it would be impossible for generators.) For example:
multiples_of_2 = iter(range(0, 100, 2))  # iterator
for i in multiples_of_2:
    print(i)

# prints nothing because the iterator is spent
for i in multiples_of_2:
    print(i)
...as opposed to this:
multiples_of_2 = range(0, 100, 2)  # iterable sequence; each loop converts it to a fresh iterator
for i in multiples_of_2:
    print(i)

# prints again because a new iterator gets created
for i in multiples_of_2:
    print(i)
A generator expression is equivalent to an invoked generator and can therefore only be iterated over once.
The real issue, as I found out, is about single- versus multiple-pass iterables and the fact that there is currently no standard mechanism to determine whether an iterable is single- or multi-pass. See: Single- vs. Multi-pass iterability.
If you want to convert a generator expression to a multipass iterable, then it can be done in a fairly routine fashion. For example:
class MultiPass(object):
    def __init__(self, initfunc):
        self.initfunc = initfunc

    def __iter__(self):
        return self.initfunc()

multiples_of_3 = MultiPass(lambda: (i*3 for i in range(20)))
multiples_of_7 = MultiPass(lambda: (i*7 for i in range(20)))
print(list((i, j) for i in multiples_of_3 for j in multiples_of_7))
From the point of view of defining the thing it's a similar amount of work to typing:
def multiples_of_3():
    return (i*3 for i in range(20))
but from the point of view of the user, they write multiples_of_3 rather than multiples_of_3(), which means the object multiples_of_3 is polymorphic with any other iterable, such as a tuple or list.
The need to type lambda: is a bit inelegant, true. I don't suppose there would be any harm in introducing "iterable comprehensions" to the language, to give you what you want while maintaining backward compatibility. But there are only so many punctuation characters, and I doubt this would be considered worth one.
