Related
I have a list of strings that print out just fine using a normal loop:
for x in listing:
print(x)
I thought it should be pretty simple to use a lambda to reduce the loop syntax, and kickstart my learning of lambdas in Python (I'm pretty new to Python).
Given that the syntax for map is map(function, iterable, ...) I tried:
map(lambda x: print(x), listing)
But this does not print anything out (it also does not produce an error). I've done some searching through material online but everything I have found to date is based on Python 2, namely mentioning that with Python 2 this isn't possible but that it should be in Python 3, without explicitly mentioning how so.
What am I doing wrong?
In python 3, map returns an iterator:
>>> map(print, listing)
<map object at 0x7fabf5d73588>
This iterator is lazy, which means that it won't do anything until you iterate over it. Once you do iterate over it, you get the values of your list printed:
>>> listing = [1, 2, 3]
>>> for _ in map(print, listing):
... pass
...
1
2
3
What this also means is that map isn't the right tool for the job. map creates an iterator, so it should only be used if you're planning to iterate over that iterator. It shouldn't be used for side effects, like printing values. See also When should I use map instead of a for loop.
I wouldn't recommend using map here, as you don't really care about the iterator. If you want to simplify the basic "for loop", you could instead use str.join():
>>> mylist = ['hello', 'there', 'everyone']
>>> '\n'.join(mylist)
hello
there
everyone
Or if you have a non-string list:
>>> mylist = [1,2,3,4]
>>> '\n'.join(map(str, mylist))
1
2
3
4
I am reading Joel Grus's data science from scratch book and found something a bit mysterious. Basically, in some sample code, he wrote
a = [1, 2 ,3 ,4]
xs = [i for i,_ in enumerate(a)]
Why would he prefer to do this way? Instead of
xs = range(len(a))
Answer: personal preference of the author. I find
[i for i, _ in enumerate(xs)]
clearer and more readable than
list(range(len(xs)))
which feels clunky to me. (I don't like reading the nested functions.) Your mileage may vary (and apparently does!).
That said, I am pretty sure I didn't say not to do the second, I just happen to prefer the first.
Source: I am the author.
P.S. If you're the commenter who had no intention of reading anything I write about Python, I apologize if you read this answer by accident.
I looked at the code available on github and frankly, I do not see any other reason for this except the personal preference of the author.
However, the result needs to be a list in places like this:
indexes = [i for i, _ in enumerate(data)] # create a list of indexes
random.shuffle(indexes) # shuffle them
for i in indexes: # return the data in that order
yield data[i]
Using bare range(len(data)) in that part on Python 3 would be wrong, because random.shuffle() requires a mutable sequence as the argument, and the range objects in Python 3 are immutable sequences.
I personally would use list(range(len(data))) on Python 3 in the case that I linked to, as it is guaranteed to be more efficient and would fail if a generator/iterator was passed in by accident, instead of a sequence.
Without being the author, I would have to guess, but my guess is that it's for Python 2 and 3 compatibility.
In Python 2:
>>> a = [1,2,3,4]
>>> xs = range(len(a))
>>> xs
[0, 1, 2, 3]
>>> type(xs)
<type 'list'>
In Python 3:
>>> a = [1,2,3,4]
>>> xs = range(len(a))
>>> xs
range(0, 4)
>>> type(xs)
<class 'range'>
Now, that doesn't make a difference when you're directly iterating over the range, but if you're planning to use the index list for something else later on, the author may feel that the enumerate is simpler to understand than list(range(len(a)))
Both are ok.
When I started coding in python I was more list(range(len(a))) .
Now I am more in pythonic way .
Both are readable.
If I write
for i in range(5):
print i
Then it gives 0, 1, 2, 3, 4
Does that mean Python assigned 0, 1, 2, 3, 4 to i at the same time?
However if I wrote:
for i in range(5):
a=i+1
Then I call a, it only gives 5
But if I add ''print a'' it gives 1, 2, 3, 4, 5
So my question is what is the difference here?
Is i a string or a list or something else?
Or maybe can anyone help me to sort out:
for l in range(5):
#vs,fs,rs are all m*n matrixs,got initial values in,i.e vs[0],fs[0],rs[0] are known
#want use this foor loop to update them
vs[l+1]=vs[l]+fs[l]
fs[l+1]=((rs[l]-re[l])
rs[l+1]=rs[l]+vs[l]
#then this code gives vs,fs,rs
If I run this kind of code, then I will get the answer only when l=5
How can I make them start looping?
i.e l=0 got values for vs[1],fs[1],rs[1],
then l=1 got values for vs[2],rs[2],fs[2]......and so on.
But python gives different arrays of fs,vs,rs, correspond to different value of l
How can I make them one piece?
A "for loop" in most, if not all, programming languages is a mechanism to run a piece of code more than once.
This code:
for i in range(5):
print i
can be thought of working like this:
i = 0
print i
i = 1
print i
i = 2
print i
i = 3
print i
i = 4
print i
So you see, what happens is not that i gets the value 0, 1, 2, 3, 4 at the same time, but rather sequentially.
I assume that when you say "call a, it gives only 5", you mean like this:
for i in range(5):
a=i+1
print a
this will print the last value that a was given. Every time the loop iterates, the statement a=i+1 will overwrite the last value a had with the new value.
Code basically runs sequentially, from top to bottom, and a for loop is a way to make the code go back and something again, with a different value for one of the variables.
I hope this answered your question.
When I'm teaching someone programming (just about any language) I introduce for loops with terminology similar to this code example:
for eachItem in someList:
doSomething(eachItem)
... which, conveniently enough, is syntactically valid Python code.
The Python range() function simply returns or generates a list of integers from some lower bound (zero, by default) up to (but not including) some upper bound, possibly in increments (steps) of some other number (one, by default).
So range(5) returns (or possibly generates) a sequence: 0, 1, 2, 3, 4 (up to but not including the upper bound).
A call to range(2,10) would return: 2, 3, 4, 5, 6, 7, 8, 9
A call to range(2,12,3) would return: 2, 5, 8, 11
Notice that I said, a couple times, that Python's range() function returns or generates a sequence. This is a relatively advanced distinction which usually won't be an issue for a novice. In older versions of Python range() built a list (allocated memory for it and populated with with values) and returned a reference to that list. This could be inefficient for large ranges which might consume quite a bit of memory and for some situations where you might want to iterate over some potentially large range of numbers but were likely to "break" out of the loop early (after finding some particular item in which you were interested, for example).
Python supports more efficient ways of implementing the same semantics (of doing the same thing) through a programming construct called a generator. Instead of allocating and populating the entire list and return it as a static data structure, Python can instantiate an object with the requisite information (upper and lower bounds and step/increment value) ... and return a reference to that.
The (code) object then keeps track of which number it returned most recently and computes the new values until it hits the upper bound (and which point it signals the end of the sequence to the caller using an exception called "StopIteration"). This technique (computing values dynamically rather than all at once, up-front) is referred to as "lazy evaluation."
Other constructs in the language (such as those underlying the for loop) can then work with that object (iterate through it) as though it were a list.
For most cases you don't have to know whether your version of Python is using the old implementation of range() or the newer one based on generators. You can just use it and be happy.
If you're working with ranges of millions of items, or creating thousands of different ranges of thousands each, then you might notice a performance penalty for using range() on an old version of Python. In such cases you could re-think your design and use while loops, or create objects which implement the "lazy evaluation" semantics of a generator, or use the xrange() version of range() if your version of Python includes it, or the range() function from a version of Python that uses the generators implicitly.
Concepts such as generators, and more general forms of lazy evaluation, permeate Python programming as you go beyond the basics. They are usually things you don't have to know for simple programming tasks but which become significant as you try to work with larger data sets or within tighter constraints (time/performance or memory bounds, for example).
[Update: for Python3 (the currently maintained versions of Python) the range() function always returns the dynamic, "lazy evaluation" iterator; the older versions of Python (2.x) which returned a statically allocated list of integers are now officially obsolete (after years of having been deprecated)].
for i in range(5):
is the same as
for i in [0,1,2,3,4]:
range(x) returns a list of numbers from 0 to x - 1.
>>> range(1)
[0]
>>> range(2)
[0, 1]
>>> range(3)
[0, 1, 2]
>>> range(4)
[0, 1, 2, 3]
for i in range(x): executes the body (which is print i in your first example) once for each element in the list returned by range().
i is used inside the body to refer to the “current” item of the list.
In that case, i refers to an integer, but it could be of any type, depending on the objet on which you loop.
The range function wil give you a list of numbers, while the for loop will iterate through the list and execute the given code for each of its items.
for i in range(5):
print i
This simply executes print i five times, for i ranging from 0 to 4.
for i in range(5):
a=i+1
This will execute a=i+1 five times. Since you are overwriting the value of a on each iteration, at the end you will only get the value for the last iteration, that is 4+1.
Useful links:
http://www.network-theory.co.uk/docs/pytut/rangeFunction.html
http://www.ibiblio.org/swaroopch/byteofpython/read/for-loop.html
It is looping, probably the problem is in the part of the print...
If you can't find the logic where the system prints, just add the folling where you want the content out:
for i in range(len(vs)):
print vs[i]
print fs[i]
print rs[i]
Both my professor and this guy claim that range creates a list of values.
"Note: The range function simply returns a list containing the numbers
from x to y-1. For example, range(5, 10) returns the list [5, 6, 7, 8,
9]."
I believe this is to be inaccurate because:
type(range(5, 10))
<class 'range'>
Furthermore, the only apparent way to access the integers created by range is to iterate through them, which leads me to believe that labeling range as a lists is incorrect.
In Python 2.x, range returns a list, but in Python 3.x range returns an immutable sequence, of type range.
Python 2.x:
>>> type(range(10))
<type 'list'>
>>> type(xrange(10))
<type 'xrange'>
Python 3.x:
>>> type(range(10))
<class 'range'>
In Python 2.x, if you want to get an iterable object, like in Python 3.x, you can use xrange function, which returns an immutable sequence of type xrange.
Advantage of xrange over range in Python 2.x:
The advantage of xrange() over range() is minimal (since xrange() still has to create the values when asked for them) except when a very large range is used on a memory-starved machine or when all of the range’s elements are never used (such as when the loop is usually terminated with break).
Note:
Furthermore, the only apparent way to access the integers created by range() is to iterate through them,
Nope. Since range objects in Python 3 are immutable sequences, they support indexing as well. Quoting from the range function documentation,
Ranges implement all of the common sequence operations except concatenation and repetition
...
Range objects implement the collections.abc.Sequence ABC, and provide features such as containment tests, element index lookup, slicing and support for negative indices.
For example,
>>> range(10, 20)[5]
15
>>> range(10, 20)[2:5]
range(12, 15)
>>> list(range(10, 20)[2:5])
[12, 13, 14]
>>> list(range(10, 20, 2))
[10, 12, 14, 16, 18]
>>> 18 in range(10, 20)
True
>>> 100 in range(10, 20)
False
All these are possible with that immutable range sequence.
Recently, I faced a problem and I think it would be appropriate to include here. Consider this Python 3.x code
from itertools import islice
numbers = range(100)
items = list(islice(numbers, 10))
while items:
items = list(islice(numbers, 10))
print(items)
One would expect this code to print every ten numbers as a list, till 99. But, it would run infinitely. Can you reason why?
Solution
Because the range returns an immutable sequence, not an iterator object. So, whenever islice is done on a range object, it always starts from the beginning. Think of it as a drop-in replacement for an immutable list. Now the question comes, how will you fix it? Its simple, you just have to get an iterator out of it. Simply change
numbers = range(100)
to
numbers = iter(range(100))
Now, numbers is an iterator object and it remembers how long it has been iterated before. So, when the islice iterates it, it just starts from the place where it previously ended.
It depends.
In python-2.x, range actually creates a list (which is also a sequence) whereas xrange creates an xrange object that can be used to iterate through the values.
On the other hand, in python-3.x, range creates an iterable (or more specifically, a sequence)
range creates a list if the python version used is 2.x .
In this scenario range is to be used only if its referenced more than once otherwise use xrange which creates a generator there by redusing the memory usage and sometimes time as it has lazy approach.
xrange is not there in python 3.x rather range stands for what xrange is for python 2.x
refer to question
What is the difference between range and xrange functions in Python 2.X?
Python3
for loop with range function
To understand what for i in range() means in Python3, we need first to understand the working of the range() function.
The range() function uses the generator to produce numbers. It doesn’t generate all numbers at once.
As you know range() returns the range object. A range object uses the same (small) amount of memory, no matter the size of the range it represents. It only stores the start, stop and step values and calculates individual items and subranges as needed.
I.e., It generates the next value only when for loop iteration asked for it. In each loop iteration, It generates the next value and assigns it to the iterator variable i.
So it means range() produces numbers one by one as the loop moves to the next iteration. It saves lots of memory, which makes range() faster and efficient.
Reference: PyNative - for loop with range()
I have a list that I want to filter by an attribute of the items.
Which of the following is preferred (readability, performance, other reasons)?
xs = [x for x in xs if x.attribute == value]
xs = filter(lambda x: x.attribute == value, xs)
It is strange how much beauty varies for different people. I find the list comprehension much clearer than filter+lambda, but use whichever you find easier.
There are two things that may slow down your use of filter.
The first is the function call overhead: as soon as you use a Python function (whether created by def or lambda) it is likely that filter will be slower than the list comprehension. It almost certainly is not enough to matter, and you shouldn't think much about performance until you've timed your code and found it to be a bottleneck, but the difference will be there.
The other overhead that might apply is that the lambda is being forced to access a scoped variable (value). That is slower than accessing a local variable and in Python 2.x the list comprehension only accesses local variables. If you are using Python 3.x the list comprehension runs in a separate function so it will also be accessing value through a closure and this difference won't apply.
The other option to consider is to use a generator instead of a list comprehension:
def filterbyvalue(seq, value):
for el in seq:
if el.attribute==value: yield el
Then in your main code (which is where readability really matters) you've replaced both list comprehension and filter with a hopefully meaningful function name.
This is a somewhat religious issue in Python. Even though Guido considered removing map, filter and reduce from Python 3, there was enough of a backlash that in the end only reduce was moved from built-ins to functools.reduce.
Personally I find list comprehensions easier to read. It is more explicit what is happening from the expression [i for i in list if i.attribute == value] as all the behaviour is on the surface not inside the filter function.
I would not worry too much about the performance difference between the two approaches as it is marginal. I would really only optimise this if it proved to be the bottleneck in your application which is unlikely.
Also since the BDFL wanted filter gone from the language then surely that automatically makes list comprehensions more Pythonic ;-)
Since any speed difference is bound to be miniscule, whether to use filters or list comprehensions comes down to a matter of taste. In general I'm inclined to use comprehensions (which seems to agree with most other answers here), but there is one case where I prefer filter.
A very frequent use case is pulling out the values of some iterable X subject to a predicate P(x):
[x for x in X if P(x)]
but sometimes you want to apply some function to the values first:
[f(x) for x in X if P(f(x))]
As a specific example, consider
primes_cubed = [x*x*x for x in range(1000) if prime(x)]
I think this looks slightly better than using filter. But now consider
prime_cubes = [x*x*x for x in range(1000) if prime(x*x*x)]
In this case we want to filter against the post-computed value. Besides the issue of computing the cube twice (imagine a more expensive calculation), there is the issue of writing the expression twice, violating the DRY aesthetic. In this case I'd be apt to use
prime_cubes = filter(prime, [x*x*x for x in range(1000)])
Although filter may be the "faster way", the "Pythonic way" would be not to care about such things unless performance is absolutely critical (in which case you wouldn't be using Python!).
I thought I'd just add that in python 3, filter() is actually an iterator object, so you'd have to pass your filter method call to list() in order to build the filtered list. So in python 2:
lst_a = range(25) #arbitrary list
lst_b = [num for num in lst_a if num % 2 == 0]
lst_c = filter(lambda num: num % 2 == 0, lst_a)
lists b and c have the same values, and were completed in about the same time as filter() was equivalent [x for x in y if z]. However, in 3, this same code would leave list c containing a filter object, not a filtered list. To produce the same values in 3:
lst_a = range(25) #arbitrary list
lst_b = [num for num in lst_a if num % 2 == 0]
lst_c = list(filter(lambda num: num %2 == 0, lst_a))
The problem is that list() takes an iterable as it's argument, and creates a new list from that argument. The result is that using filter in this way in python 3 takes up to twice as long as the [x for x in y if z] method because you have to iterate over the output from filter() as well as the original list.
An important difference is that list comprehension will return a list while the filter returns a filter, which you cannot manipulate like a list (ie: call len on it, which does not work with the return of filter).
My own self-learning brought me to some similar issue.
That being said, if there is a way to have the resulting list from a filter, a bit like you would do in .NET when you do lst.Where(i => i.something()).ToList(), I am curious to know it.
EDIT: This is the case for Python 3, not 2 (see discussion in comments).
I find the second way more readable. It tells you exactly what the intention is: filter the list.
PS: do not use 'list' as a variable name
generally filter is slightly faster if using a builtin function.
I would expect the list comprehension to be slightly faster in your case
Filter is just that. It filters out the elements of a list. You can see the definition mentions the same(in the official docs link I mentioned before). Whereas, list comprehension is something that produces a new list after acting upon something on the previous list.(Both filter and list comprehension creates new list and not perform operation in place of the older list. A new list here is something like a list with, say, an entirely new data type. Like converting integers to string ,etc)
In your example, it is better to use filter than list comprehension, as per the definition. However, if you want, say other_attribute from the list elements, in your example is to be retrieved as a new list, then you can use list comprehension.
return [item.other_attribute for item in my_list if item.attribute==value]
This is how I actually remember about filter and list comprehension. Remove a few things within a list and keep the other elements intact, use filter. Use some logic on your own at the elements and create a watered down list suitable for some purpose, use list comprehension.
Here's a short piece I use when I need to filter on something after the list comprehension. Just a combination of filter, lambda, and lists (otherwise known as the loyalty of a cat and the cleanliness of a dog).
In this case I'm reading a file, stripping out blank lines, commented out lines, and anything after a comment on a line:
# Throw out blank lines and comments
with open('file.txt', 'r') as lines:
# From the inside out:
# [s.partition('#')[0].strip() for s in lines]... Throws out comments
# filter(lambda x: x!= '', [s.part... Filters out blank lines
# y for y in filter... Converts filter object to list
file_contents = [y for y in filter(lambda x: x != '', [s.partition('#')[0].strip() for s in lines])]
It took me some time to get familiarized with the higher order functions filter and map. So i got used to them and i actually liked filter as it was explicit that it filters by keeping whatever is truthy and I've felt cool that I knew some functional programming terms.
Then I read this passage (Fluent Python Book):
The map and filter functions are still builtins
in Python 3, but since the introduction of list comprehensions and generator ex‐
pressions, they are not as important. A listcomp or a genexp does the job of map and
filter combined, but is more readable.
And now I think, why bother with the concept of filter / map if you can achieve it with already widely spread idioms like list comprehensions. Furthermore maps and filters are kind of functions. In this case I prefer using Anonymous functions lambdas.
Finally, just for the sake of having it tested, I've timed both methods (map and listComp) and I didn't see any relevant speed difference that would justify making arguments about it.
from timeit import Timer
timeMap = Timer(lambda: list(map(lambda x: x*x, range(10**7))))
print(timeMap.timeit(number=100))
timeListComp = Timer(lambda:[(lambda x: x*x) for x in range(10**7)])
print(timeListComp.timeit(number=100))
#Map: 166.95695265199174
#List Comprehension 177.97208347299602
In addition to the accepted answer, there is a corner case when you should use filter instead of a list comprehension. If the list is unhashable you cannot directly process it with a list comprehension. A real world example is if you use pyodbc to read results from a database. The fetchAll() results from cursor is an unhashable list. In this situation, to directly manipulating on the returned results, filter should be used:
cursor.execute("SELECT * FROM TABLE1;")
data_from_db = cursor.fetchall()
processed_data = filter(lambda s: 'abc' in s.field1 or s.StartTime >= start_date_time, data_from_db)
If you use list comprehension here you will get the error:
TypeError: unhashable type: 'list'
In terms of performance, it depends.
filter does not return a list but an iterator, if you need the list 'immediately' filtering and list conversion it is slower than with list comprehension by about 40% for very large lists (>1M). Up to 100K elements, there is almost no difference, from 600K onwards there starts to be differences.
If you don't convert to a list, filter is practically instantaneous.
More info at: https://blog.finxter.com/python-lists-filter-vs-list-comprehension-which-is-faster/
Curiously on Python 3, I see filter performing faster than list comprehensions.
I always thought that the list comprehensions would be more performant.
Something like:
[name for name in brand_names_db if name is not None]
The bytecode generated is a bit better.
>>> def f1(seq):
... return list(filter(None, seq))
>>> def f2(seq):
... return [i for i in seq if i is not None]
>>> disassemble(f1.__code__)
2 0 LOAD_GLOBAL 0 (list)
2 LOAD_GLOBAL 1 (filter)
4 LOAD_CONST 0 (None)
6 LOAD_FAST 0 (seq)
8 CALL_FUNCTION 2
10 CALL_FUNCTION 1
12 RETURN_VALUE
>>> disassemble(f2.__code__)
2 0 LOAD_CONST 1 (<code object <listcomp> at 0x10cfcaa50, file "<stdin>", line 2>)
2 LOAD_CONST 2 ('f2.<locals>.<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_FAST 0 (seq)
8 GET_ITER
10 CALL_FUNCTION 1
12 RETURN_VALUE
But they are actually slower:
>>> timeit(stmt="f1(range(1000))", setup="from __main__ import f1,f2")
21.177661532000116
>>> timeit(stmt="f2(range(1000))", setup="from __main__ import f1,f2")
42.233950221000214
I would come to the conclusion: Use list comprehension over filter since its
more readable
more pythonic
faster (for Python 3.11, see attached benchmark, also see )
Keep in mind that filter returns a iterator, not a list.
python3 -m timeit '[x for x in range(10000000) if x % 2 == 0]'
1 loop, best of 5: 270 msec per loop
python3 -m timeit 'list(filter(lambda x: x % 2 == 0, range(10000000)))'
1 loop, best of 5: 432 msec per loop
Summarizing other answers
Looking through the answers, we have seen a lot of back and forth, whether or not list comprehension or filter may be faster or if it is even important or pythonic to care about such an issue. In the end, the answer is as most times: it depends.
I just stumbled across this question while optimizing code where this exact question (albeit combined with an in expression, not ==) is very relevant - the filter + lambda expression is taking up a third of my computation time (of multiple minutes).
My case
In my case, the list comprehension is much faster (twice the speed). But I suspect that this varies strongly based on the filter expression as well as the Python interpreter used.
Test it for yourself
Here is a simple code snippet that should be easy to adapt. If you profile it (most IDEs can do that easily), you will be able to easily decide for your specific case which is the better option:
whitelist = set(range(0, 100000000, 27))
input_list = list(range(0, 100000000))
proximal_list = list(filter(
lambda x: x in whitelist,
input_list
))
proximal_list2 = [x for x in input_list if x in whitelist]
print(len(proximal_list))
print(len(proximal_list2))
If you do not have an IDE that lets you profile easily, try this instead (extracted from my codebase, so a bit more complicated). This code snippet will create a profile for you that you can easily visualize using e.g. snakeviz:
import cProfile
from time import time
class BlockProfile:
def __init__(self, profile_path):
self.profile_path = profile_path
self.profiler = None
self.start_time = None
def __enter__(self):
self.profiler = cProfile.Profile()
self.start_time = time()
self.profiler.enable()
def __exit__(self, *args):
self.profiler.disable()
exec_time = int((time() - self.start_time) * 1000)
self.profiler.dump_stats(self.profile_path)
whitelist = set(range(0, 100000000, 27))
input_list = list(range(0, 100000000))
with BlockProfile("/path/to/create/profile/in/profile.pstat"):
proximal_list = list(filter(
lambda x: x in whitelist,
input_list
))
proximal_list2 = [x for x in input_list if x in whitelist]
print(len(proximal_list))
print(len(proximal_list2))
Your question is so simple yet interesting. It just shows how flexible python is, as a programming language. One may use any logic and write the program according to their talent and understandings. It is fine as long as we get the answer.
Here in your case, it is just an simple filtering method which can be done by both but i would prefer the first one my_list = [x for x in my_list if x.attribute == value] because it seems simple and does not need any special syntax. Anyone can understands this command and make changes if needs it.
(Although second method is also simple, but it still has more complexity than the first one for the beginner level programmers)