Which one of these is considered the more pythonic, taking into account scalability and readability?
Using enumerate:
group = ['A','B','C']
tag = ['a','b','c']
for idx, x in enumerate(group):
    print(x, tag[idx])
or using zip:
for x, y in zip(group, tag):
    print(x, y)
The reason I ask is that I have been using a mix of both. I should keep to one standard approach, but which should it be?
No doubt, zip is more pythonic. It doesn't require that you use a variable to store an index (which you don't otherwise need), and using it allows handling the lists uniformly, while with enumerate, you iterate over one list, and index the other list, i.e. non-uniform handling.
However, you should be aware of the caveat that zip runs only up to the shorter of the two lists. To avoid duplicating someone else's answer I'd just include a reference here: someone else's answer.
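A minimal sketch of that caveat, assuming Python 3. The extra 'D' element is added here purely for illustration; zip_longest comes from the standard itertools module.

```python
from itertools import zip_longest

group = ['A', 'B', 'C', 'D']   # one element longer than tag (illustrative)
tag = ['a', 'b', 'c']

# zip silently stops at the shorter input, so 'D' is dropped.
pairs = list(zip(group, tag))

# zip_longest pads the shorter input with a fill value instead.
padded = list(zip_longest(group, tag, fillvalue=None))

print(pairs)
print(padded)
```

On Python 3.10 or newer, zip(group, tag, strict=True) raises a ValueError instead of truncating, which may be preferable when a length mismatch indicates a bug.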
@user3100115 aptly points out that in Python 2 you should prefer itertools.izip over zip, due to its lazy nature (faster and more memory efficient). In Python 3, zip already behaves like Python 2's izip.
While others have pointed out that zip is in fact more pythonic than enumerate, I came here to see if it was any more efficient. According to my tests, zip is around 10 to 20% faster than enumerate when simply accessing and using items from multiple lists in parallel.
Here I have three lists of (the same) increasing length being accessed in parallel. When the lists are more than a couple of items in length, the time ratio of zip/enumerate is below one, meaning zip is faster.
Code I used:
import timeit
setup = \
"""
import random
size = {}
a = [ random.randint(0,i+1) for i in range(size) ]
b = [ random.random()*i for i in range(size) ]
c = [ random.random()+i for i in range(size) ]
"""
code_zip = \
"""
data = []
for x,y,z in zip(a,b,c):
    data.append(x+z+y)
"""
code_enum = \
"""
data = []
for i,x in enumerate(a):
    data.append(x+c[i]+b[i])
"""
runs = 10000
sizes = [ 2**i for i in range(16) ]
data = []
for size in sizes:
    formatted_setup = setup.format(size)
    time_zip = timeit.timeit(code_zip, formatted_setup, number=runs)
    time_enum = timeit.timeit(code_enum, formatted_setup, number=runs)
    ratio = time_zip/time_enum
    row = (size,time_zip,time_enum,ratio)
    data.append(row)
with open("testzipspeed.csv", 'w') as csv_file:
    csv_file.write("size,time_zip,time_enumerate,ratio\n")
    for row in data:
        csv_file.write(",".join([ str(i) for i in row ])+"\n")
The answer to the question asked in your title, "Which is more pythonic; zip or enumerate...?" is: they both are. enumerate is just a special case of zip.
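To illustrate the "special case" remark, here is a small sketch (assuming Python 3) that rebuilds enumerate from zip and itertools.count. The list contents are just the ones from the question.

```python
from itertools import count

group = ['A', 'B', 'C']

# enumerate(group) pairs a running counter with each item ...
via_enumerate = list(enumerate(group))

# ... which is what zipping an endless counter with the list produces.
via_zip = list(zip(count(), group))

print(via_enumerate == via_zip)
```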
The answer to your more specific question about that for loop is: use zip, but not for the reasons you've seen so far.
The biggest advantage of zip in that loop has nothing to do with zip itself. It has to do with avoiding the assumptions made in your enumerate loop. To explain, I'll make two different generators based on your two examples:
def process_items_and_tags(items, tags):
    "Do something with two iterables: items and tags."
    for item, tag in zip(items, tags):
        yield process(item, tag)
def process_items_and_list_of_tags(items, tags_list):
    "Do something with an iterable of items and an indexable collection of tags."
    for idx, item in enumerate(items):
        yield process(item, tags_list[idx])
Both generators can take any iterable as their first argument (items), but they differ in how they handle their second argument. The enumerate-based approach can only process tags in a list-like collection with [] indexing. That rules out a huge number of iterables, like file streams and generators, for no good reason.
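A quick way to see the difference is to pass a generator as the second argument. This is only a sketch: process is a hypothetical stand-in for the real work, and the function names with_zip and with_enumerate are made up for this example.

```python
def process(item, tag):
    # Hypothetical stand-in for the real work: just pair the values up.
    return (item, tag)

def with_zip(items, tags):
    for item, tag in zip(items, tags):
        yield process(item, tag)

def with_enumerate(items, tags_list):
    for idx, item in enumerate(items):
        yield process(item, tags_list[idx])

items = ['A', 'B', 'C']

# A generator of tags works with the zip-based version ...
zipped = list(with_zip(items, (t for t in 'abc')))

# ... but a generator cannot be indexed, so the enumerate-based version fails.
try:
    list(with_enumerate(items, (t for t in 'abc')))
    enumerate_error = None
except TypeError as exc:
    enumerate_error = exc

print(zipped)
print(type(enumerate_error).__name__)
```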
Why is one parameter more tightly constrained than the other? The restriction isn't inherent in the problem the user is trying to solve, since the generator could just as easily have been written the other way 'round:
def process_list_of_items_and_tags(items_list, tags):
    "Do something with an indexable collection of items and an iterable of tags."
    for idx, tag in enumerate(tags):
        yield process(items_list[idx], tag)
Same result, different restriction on the inputs. Why should your caller have to know or care about any of that?
As an added penalty, anything of the form some_list[some_index] could raise an IndexError, which you would have to either catch or prevent in some way. That's not normally a problem when your loop both enumerates and accesses the same list-like collection, but here you're enumerating one and then accessing items from another. You'd have to add more code to handle an error that could not have happened in the zip-based version.
Avoiding the unnecessary idx variable is also nice, but hardly the deciding difference between the two approaches.
For more on the subject of iterables, generators, and functions that use them, see Ned Batchelder's PyCon US 2013 talk, "Loop Like a Native" (text, 30-minute video).
zip is more Pythonic, as already said, because it doesn't require another variable. You could also use:
import sys
from collections import deque
deque(map(lambda x, y: sys.stdout.write(x+" "+y+"\n"), group, tag), maxlen=0)
Since we are only printing here, the return values of sys.stdout.write have to be thrown away (which is what the deque with maxlen=0 does), and this assumes that your lists are of the same length.
Update: in this case it may not be as good, because you are printing the group and tag values and map collects the return values of sys.stdout.write, which are of no use. In practice, if you needed to fetch values, it would be the better choice.
zip might be more Pythonic, but it has a gotcha. If you want to change elements in place, you need to use indexing. Iterating over the elements will not work. For example:
x = [1,2,3]
for elem in x:
    elem *= 10
print(x)
Output: [1,2,3]
y = [1,2,3]
for index in range(len(y)):
    y[index] *= 10
print(y)
Output: [10,20,30]
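If you need both the paired values and the ability to write back, enumerate and zip can be combined. A minimal sketch, with made-up list names and values:

```python
prices = [1, 2, 3]
factors = [10, 20, 30]

# enumerate supplies the index for the write-back,
# zip supplies the paired values from both lists.
for i, (price, factor) in enumerate(zip(prices, factors)):
    prices[i] = price * factor

print(prices)
```

The index is only used for the assignment, so the read side still handles both lists uniformly.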
This is a basic starter question. I think range(len(some_list)) isn't Pythonic; it is what you try when coming from a non-Python background.
Thinking about it and reading the excellent Python documentation (I really like docs in the numpy format style for simple Pythonic code), enumerate is the solution for iterables if you need a for loop, because building an iterable is what a comprehension does.
list_a = ['a', 'b', 'c']
list_2 = ['1', '2', '3']
[print(a) for a in list_a]
runs print for each item; perhaps better is a generator,
item = generator_item = (print(i, a) for i, a in enumerate(list_a) if a.find('a') == 0)
next(item)
For multi-line bodies and more complex for loops, we can use enumerate(zip(...)).
for i, (arg1, arg2) in enumerate(zip(list_a, list_2)):
    print('multiline') # do complex code
But perhaps in more elaborate Pythonic code we can use another, more complex form with itertools; note idx at the end, which takes the place of slicing with len(list_a[:]).
from itertools import count as idx
for arg1, arg2, i in zip(list_a, list_2, idx(start=1)):
    print(f'multiline {i}: {arg1}, {arg2}') # do complex code
Now I am trying to make a two-dimensional array with a double loop.
In my code:
for t in range(0,150):
    for z in range(0,279):
        QC1 = QC[t,z,:,:]
        SUMQ =1000*np.mean(QC1)
        QRAIN1.append(SUMQ)
    print len(QRAIN1)
    QRAIN.append(QRAIN1)
QR = np.array(QRAIN)
I would like to get a 150x279 array, but that is not the result. I think this is because on every pass of the outer loop the results are appended to the same QRAIN1.
I would like to keep each pass's list of 279 numbers separate and accumulate them in QRAIN, resulting in a 150x279 array.
Any help or idea would be really appreciated.
Thank you,
Isaac
Just make a new empty list each time through the loop:
for t in range(0,150):
    QRAIN1 = []
    for z in range(0,279):
        QC1 = QC[t,z,:,:]
        SUMQ =1000*np.mean(QC1)
        QRAIN1.append(SUMQ)
    print len(QRAIN1)
    QRAIN.append(QRAIN1)
QR = np.array(QRAIN)
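To see why the fresh list matters, here is a small pure-Python sketch (Python 3, no numpy). The sizes 3 and 4 stand in for 150 and 279, and the appended numbers are placeholders for the computed means.

```python
rows, cols = 3, 4   # small stand-ins for 150 and 279

# Buggy shape: one inner list shared by every outer iteration.
shared = []
buggy = []
for t in range(rows):
    for z in range(cols):
        shared.append(t * cols + z)
    buggy.append(shared)          # the same list object is appended each time

# Fixed shape: a fresh inner list per outer iteration.
fixed = []
for t in range(rows):
    inner = []
    for z in range(cols):
        inner.append(t * cols + z)
    fixed.append(inner)

buggy_lengths = [len(row) for row in buggy]
fixed_lengths = [len(row) for row in fixed]
print(buggy_lengths, fixed_lengths)
```

In the buggy version every row refers to the same ever-growing list, so each row ends up with rows * cols entries instead of cols.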
BTW, any time you find yourself starting with an empty list and then appending to it in a for loop, consider the stylish alternative of a list comprehension:
for t in range(150):
    QRAIN1 = [1000*np.mean(QC[t,z,:,:]) for z in range(279)]
    print len(QRAIN1)
    QRAIN.append(QRAIN1)
QR = np.array(QRAIN)
I'm also removing the redundant 0, in the range calls -- again just a matter of style, but I like Tufte's principle, "no wasted pixels":-)
Of course you could also build all of QRAIN with a nested list comprehension, but I understand that's starting to be a bit of a stretch, and the "middle way" of a listcomp inside, a for loop outside, may be considered more readable. Anyway, just in case you want to try...:
QRAIN = [ [1000*np.mean(QC[t,z,:,:]) for z in range(279)]
for t in range(150) ]
QR = np.array(QRAIN)
This one doesn't have the prints but I suspect you were only using them as a debugging aid, so their loss shouldn't be a big problem, I hope:-).
I have a list that I want to filter by an attribute of the items.
Which of the following is preferred (readability, performance, other reasons)?
xs = [x for x in xs if x.attribute == value]
xs = filter(lambda x: x.attribute == value, xs)
It is strange how much beauty varies for different people. I find the list comprehension much clearer than filter+lambda, but use whichever you find easier.
There are two things that may slow down your use of filter.
The first is the function call overhead: as soon as you use a Python function (whether created by def or lambda) it is likely that filter will be slower than the list comprehension. It almost certainly is not enough to matter, and you shouldn't think much about performance until you've timed your code and found it to be a bottleneck, but the difference will be there.
The other overhead that might apply is that the lambda is being forced to access a scoped variable (value). That is slower than accessing a local variable and in Python 2.x the list comprehension only accesses local variables. If you are using Python 3.x the list comprehension runs in a separate function so it will also be accessing value through a closure and this difference won't apply.
The other option to consider is to use a generator instead of a list comprehension:
def filterbyvalue(seq, value):
    for el in seq:
        if el.attribute==value: yield el
Then in your main code (which is where readability really matters) you've replaced both list comprehension and filter with a hopefully meaningful function name.
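A short usage sketch of that generator. The Item record type and its sample values are hypothetical; any object with an attribute field would do.

```python
from collections import namedtuple

# Hypothetical record type, used only to have something to filter.
Item = namedtuple('Item', ['name', 'attribute'])

def filterbyvalue(seq, value):
    for el in seq:
        if el.attribute == value:
            yield el

xs = [Item('a', 1), Item('b', 2), Item('c', 1)]

# The generator is lazy; wrap it in list() when a concrete list is needed.
matches = list(filterbyvalue(xs, 1))
print([m.name for m in matches])
```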
This is a somewhat religious issue in Python. Even though Guido considered removing map, filter and reduce from Python 3, there was enough of a backlash that in the end only reduce was moved from built-ins to functools.reduce.
Personally I find list comprehensions easier to read. It is more explicit what is happening from the expression [i for i in list if i.attribute == value] as all the behaviour is on the surface not inside the filter function.
I would not worry too much about the performance difference between the two approaches as it is marginal. I would really only optimise this if it proved to be the bottleneck in your application which is unlikely.
Also since the BDFL wanted filter gone from the language then surely that automatically makes list comprehensions more Pythonic ;-)
Since any speed difference is bound to be minuscule, whether to use filters or list comprehensions comes down to a matter of taste. In general I'm inclined to use comprehensions (which seems to agree with most other answers here), but there is one case where I prefer filter.
A very frequent use case is pulling out the values of some iterable X subject to a predicate P(x):
[x for x in X if P(x)]
but sometimes you want to apply some function to the values first:
[f(x) for x in X if P(f(x))]
As a specific example, consider
primes_cubed = [x*x*x for x in range(1000) if prime(x)]
I think this looks slightly better than using filter. But now consider
prime_cubes = [x*x*x for x in range(1000) if prime(x*x*x)]
In this case we want to filter against the post-computed value. Besides the issue of computing the cube twice (imagine a more expensive calculation), there is the issue of writing the expression twice, violating the DRY aesthetic. In this case I'd be apt to use
prime_cubes = filter(prime, [x*x*x for x in range(1000)])
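If you would rather stay with comprehensions, the double computation can also be avoided by nesting a generator expression, or with an assignment expression on Python 3.8 or newer. The sketch below uses a hypothetical f (x*x + 1 rather than a cube, so that the result is not empty) and a simple trial-division prime; both are assumptions made for illustration.

```python
def prime(n):
    # Simple trial division; fine for small illustrative inputs.
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def f(x):
    # Hypothetical "expensive" transformation.
    return x * x + 1

# Inner generator computes f(x) once; outer comprehension filters the result.
nested = [y for y in (f(x) for x in range(10)) if prime(y)]

# Same idea with an assignment expression (Python 3.8+).
walrus = [y for x in range(10) if prime(y := f(x))]

print(nested)
```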
Although filter may be the "faster way", the "Pythonic way" would be not to care about such things unless performance is absolutely critical (in which case you wouldn't be using Python!).
I thought I'd just add that in python 3, filter() is actually an iterator object, so you'd have to pass your filter method call to list() in order to build the filtered list. So in python 2:
lst_a = range(25) #arbitrary list
lst_b = [num for num in lst_a if num % 2 == 0]
lst_c = filter(lambda num: num % 2 == 0, lst_a)
Lists b and c have the same values, and were completed in about the same time, as filter() was equivalent to [x for x in y if z]. However, in 3, this same code would leave lst_c containing a filter object, not a filtered list. To produce the same values in 3:
lst_a = range(25) #arbitrary list
lst_b = [num for num in lst_a if num % 2 == 0]
lst_c = list(filter(lambda num: num %2 == 0, lst_a))
The problem is that list() takes an iterable as its argument, and creates a new list from that argument. The result is that using filter in this way in Python 3 takes up to twice as long as the [x for x in y if z] method because you have to iterate over the output from filter() as well as the original list.
An important difference is that list comprehension will return a list while the filter returns a filter, which you cannot manipulate like a list (ie: call len on it, which does not work with the return of filter).
My own self-learning brought me to some similar issue.
That being said, if there is a way to have the resulting list from a filter, a bit like you would do in .NET when you do lst.Where(i => i.something()).ToList(), I am curious to know it.
EDIT: This is the case for Python 3, not 2 (see discussion in comments).
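For the .ToList() part of the question: in Python 3, passing the filter object to list() does that. A minimal sketch with made-up data:

```python
numbers = list(range(10))

evens = filter(lambda n: n % 2 == 0, numbers)

# A filter object has no length ...
try:
    len(evens)
    has_len = True
except TypeError:
    has_len = False

# ... but list() materialises it, much like .ToList() in .NET.
evens_list = list(evens)

print(has_len, evens_list, len(evens_list))
```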
I find the second way more readable. It tells you exactly what the intention is: filter the list.
PS: do not use 'list' as a variable name
generally filter is slightly faster if using a builtin function.
I would expect the list comprehension to be slightly faster in your case
Filter is just that. It filters out elements of a list. You can see the definition says the same (in the official docs link I mentioned before). A list comprehension, on the other hand, produces a new list after acting on the elements of the previous list. (Both filter and list comprehensions create a new list rather than operating in place on the old one. By "new list" I mean here something like a list with, say, an entirely new data type, such as integers converted to strings.)
In your example, it is better to use filter than list comprehension, as per the definition. However, if you want, say other_attribute from the list elements, in your example is to be retrieved as a new list, then you can use list comprehension.
return [item.other_attribute for item in my_list if item.attribute==value]
This is how I actually remember about filter and list comprehension. Remove a few things within a list and keep the other elements intact, use filter. Use some logic on your own at the elements and create a watered down list suitable for some purpose, use list comprehension.
Here's a short piece I use when I need to filter on something after the list comprehension. Just a combination of filter, lambda, and lists (otherwise known as the loyalty of a cat and the cleanliness of a dog).
In this case I'm reading a file, stripping out blank lines, commented out lines, and anything after a comment on a line:
# Throw out blank lines and comments
with open('file.txt', 'r') as lines:
    # From the inside out:
    # [s.partition('#')[0].strip() for s in lines]... Throws out comments
    # filter(lambda x: x!= '', [s.part... Filters out blank lines
    # y for y in filter... Converts filter object to list
    file_contents = [y for y in filter(lambda x: x != '', [s.partition('#')[0].strip() for s in lines])]
It took me some time to get familiar with the higher-order functions filter and map. Once I got used to them I actually liked filter, as it was explicit that it filters by keeping whatever is truthy, and I felt cool knowing some functional programming terms.
Then I read this passage (Fluent Python Book):
The map and filter functions are still builtins
in Python 3, but since the introduction of list comprehensions and generator
expressions, they are not as important. A listcomp or a genexp does the job of map and
filter combined, but is more readable.
And now I think: why bother with the concepts of filter and map if you can achieve the same with an already widespread idiom like list comprehensions? Furthermore, map and filter are themselves functions, and when I need a function in this context I prefer an anonymous lambda.
Finally, just for the sake of having it tested, I've timed both methods (map and listComp) and I didn't see any relevant speed difference that would justify making arguments about it.
from timeit import Timer
timeMap = Timer(lambda: list(map(lambda x: x*x, range(10**7))))
print(timeMap.timeit(number=100))
timeListComp = Timer(lambda:[(lambda x: x*x)(x) for x in range(10**7)])
print(timeListComp.timeit(number=100))
#Map: 166.95695265199174
#List Comprehension 177.97208347299602
In addition to the accepted answer, there is a corner case where you should use filter instead of a list comprehension. If the list is unhashable, you cannot directly process it with a list comprehension. A real-world example is using pyodbc to read results from a database. The fetchall() results from the cursor are an unhashable list. In this situation, to manipulate the returned results directly, filter should be used:
cursor.execute("SELECT * FROM TABLE1;")
data_from_db = cursor.fetchall()
processed_data = filter(lambda s: 'abc' in s.field1 or s.StartTime >= start_date_time, data_from_db)
If you use list comprehension here you will get the error:
TypeError: unhashable type: 'list'
In terms of performance, it depends.
filter does not return a list but an iterator. If you need the list immediately, filtering plus list conversion is slower than a list comprehension by about 40% for very large lists (>1M elements). Up to 100K elements there is almost no difference; from 600K onwards differences start to appear.
If you don't convert to a list, filter is practically instantaneous.
More info at: https://blog.finxter.com/python-lists-filter-vs-list-comprehension-which-is-faster/
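The "practically instantaneous" part follows from filter being lazy in Python 3: creating the filter object does not call the predicate at all. A small sketch that counts predicate calls (the is_even helper and calls list are made up for the demonstration):

```python
calls = []

def is_even(n):
    calls.append(n)          # record every time the predicate actually runs
    return n % 2 == 0

lazy = filter(is_even, range(5))
calls_before = len(calls)    # nothing has been tested yet

result = list(lazy)          # consuming the iterator does the real work
calls_after = len(calls)

print(calls_before, calls_after, result)
```

So a timing that stops before the result is consumed measures only the creation of the iterator, not the filtering itself.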
Curiously on Python 3, I see filter performing faster than list comprehensions.
I always thought that the list comprehensions would be more performant.
Something like:
[name for name in brand_names_db if name is not None]
The bytecode generated is a bit better.
>>> def f1(seq):
...     return list(filter(None, seq))
>>> def f2(seq):
...     return [i for i in seq if i is not None]
>>> disassemble(f1.__code__)
2 0 LOAD_GLOBAL 0 (list)
2 LOAD_GLOBAL 1 (filter)
4 LOAD_CONST 0 (None)
6 LOAD_FAST 0 (seq)
8 CALL_FUNCTION 2
10 CALL_FUNCTION 1
12 RETURN_VALUE
>>> disassemble(f2.__code__)
2 0 LOAD_CONST 1 (<code object <listcomp> at 0x10cfcaa50, file "<stdin>", line 2>)
2 LOAD_CONST 2 ('f2.<locals>.<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_FAST 0 (seq)
8 GET_ITER
10 CALL_FUNCTION 1
12 RETURN_VALUE
But they are actually slower:
>>> timeit(stmt="f1(range(1000))", setup="from __main__ import f1,f2")
21.177661532000116
>>> timeit(stmt="f2(range(1000))", setup="from __main__ import f1,f2")
42.233950221000214
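One detail worth keeping in mind when comparing f1 and f2: filter(None, seq) drops every falsy value, not only None, so the two functions do not always return the same result. A small sketch with made-up input:

```python
seq = [0, 1, None, '', 'x']

# filter(None, ...) keeps only truthy items.
via_filter = list(filter(None, seq))

# The comprehension removes None only; 0 and '' survive.
via_comprehension = [i for i in seq if i is not None]

print(via_filter)
print(via_comprehension)
```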
I would come to the conclusion: use a list comprehension over filter since it is
more readable
more pythonic
faster (for Python 3.11, see the benchmark below)
Keep in mind that filter returns an iterator, not a list.
python3 -m timeit '[x for x in range(10000000) if x % 2 == 0]'
1 loop, best of 5: 270 msec per loop
python3 -m timeit 'list(filter(lambda x: x % 2 == 0, range(10000000)))'
1 loop, best of 5: 432 msec per loop
Summarizing other answers
Looking through the answers, we have seen a lot of back and forth, whether or not list comprehension or filter may be faster or if it is even important or pythonic to care about such an issue. In the end, the answer is as most times: it depends.
I just stumbled across this question while optimizing code where this exact question (albeit combined with an in expression, not ==) is very relevant - the filter + lambda expression is taking up a third of my computation time (of multiple minutes).
My case
In my case, the list comprehension is much faster (twice the speed). But I suspect that this varies strongly based on the filter expression as well as the Python interpreter used.
Test it for yourself
Here is a simple code snippet that should be easy to adapt. If you profile it (most IDEs can do that easily), you will be able to easily decide for your specific case which is the better option:
whitelist = set(range(0, 100000000, 27))
input_list = list(range(0, 100000000))
proximal_list = list(filter(
lambda x: x in whitelist,
input_list
))
proximal_list2 = [x for x in input_list if x in whitelist]
print(len(proximal_list))
print(len(proximal_list2))
If you do not have an IDE that lets you profile easily, try this instead (extracted from my codebase, so a bit more complicated). This code snippet will create a profile for you that you can easily visualize using e.g. snakeviz:
import cProfile
from time import time
class BlockProfile:
    def __init__(self, profile_path):
        self.profile_path = profile_path
        self.profiler = None
        self.start_time = None
    def __enter__(self):
        self.profiler = cProfile.Profile()
        self.start_time = time()
        self.profiler.enable()
    def __exit__(self, *args):
        self.profiler.disable()
        exec_time = int((time() - self.start_time) * 1000)
        self.profiler.dump_stats(self.profile_path)
whitelist = set(range(0, 100000000, 27))
input_list = list(range(0, 100000000))
with BlockProfile("/path/to/create/profile/in/profile.pstat"):
    proximal_list = list(filter(
        lambda x: x in whitelist,
        input_list
    ))
    proximal_list2 = [x for x in input_list if x in whitelist]
print(len(proximal_list))
print(len(proximal_list2))
Your question is simple yet interesting. It shows how flexible Python is as a programming language. You can use whatever logic you like and write the program according to your own skill and understanding. It is fine as long as we get the answer.
In your case it is just a simple filtering task that can be done either way, but I would prefer the first one, my_list = [x for x in my_list if x.attribute == value], because it looks simple and does not need any special syntax. Anyone can understand this line and change it if needed.
(The second method is also simple, but it is still more complex than the first one for beginner-level programmers.)
I'm trying to solve a codechef beginner problem - Enormous Input Test. My code
a,b = [ int(i) for i in raw_input().split()]
print [input()%b==0 for i in range(a)].count(True)
gets timed out. Another solution, which uses basic for-loops, seems to be working fine.
I believe that list comprehension is quicker than basic for - loops. Then why is the former slower? Also will using generators in this case reduce the memory used and perform the computation faster, if so how can I do it?
Why do you believe that list comprehension is quicker than basic for loops? (Hint: they are both implemented using the same underlying instructions.)
Your code will be executed in some manner like this:
a, b = ...
temp = []
for i in range(a):
    temp.append(int(raw_input()) % b == 0)
print temp.count(True)
As you can see, it creates a large list in memory, iterates over it to create a second list, and then iterates over the second list to create a count. The list does not ever need to be created.
a, b = ...
count = 0
for i in xrange(a):
    if int(raw_input()) % b == 0:
        count += 1
print count
Some compilers are capable of optimizing hylomorphisms to remove the intermediate list, but I know of no Python implementation capable of this. So you are stuck optimizing by hand.
Note: Do not use input in Python 2.x, unless you know what you are doing. I have changed the code to use int(raw_input()) because that is safe, whereas input() is dangerous.
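As for the generator part of the question, a generator expression fed to sum() counts the matches without building any list. This is a sketch in Python 3 (the answer above uses Python 2); count_divisible is a hypothetical helper name, and an io.StringIO with made-up input stands in for standard input.

```python
import io

def count_divisible(stream):
    # First line holds n and k; the next n lines hold one integer each.
    n, k = (int(part) for part in stream.readline().split())
    # The generator expression hands sum() one value at a time,
    # so no intermediate list is ever created.
    return sum(1 for _ in range(n) if int(stream.readline()) % k == 0)

sample = io.StringIO("7 3\n1\n51\n966369\n7\n9\n999996\n11\n")
answer = count_divisible(sample)
print(answer)
```

For real input, pass sys.stdin instead of the StringIO object.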
I see this kind of thing sometimes:
(k for k in (j for j in (i for i in xrange(10))))
Now this really bends my brain, and I would rather it wasn't presented in this way.
Are there any use-cases, or examples of having used these nested expressions where it was more elegant and more readable than if it had been a nested loop?
Edit: Thanks for the examples of ways to simplify this. It's not actually what I asked for, I was wondering if there were any times when it was elegant.
Check PEP 202, which is where list comprehension syntax was introduced to the language.
For understanding your example, there is a simple rule from Guido himself:
The form [... for x... for y...] nests, with the last index
varying fastest, just like nested for loops.
Also from PEP 202, which serves to answer your question:
Rationale
List comprehensions provide a more concise way to create lists in
situations where map() and filter() and/or nested loops would
currently be used.
If you had a situation like that, you could find it to be more elegant. IMHO, though, multiple nested list comprehensions may be less clear in your code than nested for loops, since for loops are easily parsed visually.
If you're worried about too much complexity on one line, you could split it:
(k for k in
(j for j in
(i for i in xrange(10))))
I've always found line continuations to look a little weird in Python, but this does make it easier to see what each one is looping over. Since an extra assignment/lookup is not going to make or break anything, you could also write it like this:
gen1 = (i for i in xrange(10))
gen2 = (j for j in gen1)
gen3 = (k for k in gen2)
In practice, I don't think I've ever nested a comprehension more than 2-deep, and at that point it was still pretty easy to understand.
In the case of your example, I would probably write it as:
foos = (i for i in xrange(10))
bars = (j for j in foos)
bazs = (k for k in bars)
Given more descriptive names, I think this would probably be quite clear, and I can't imagine there being any measurable performance difference.
Perhaps you're thinking more of expressions like:
(x for x in xs for xs in ys for ys in lst)
-- actually, that's not even valid. You have to put things in the other order:
(x for ys in lst for xs in ys for x in xs)
I might write that as a quick way of flattening a list, but in general I think you're right: the time you save by typing less is usually balanced by the extra time you spend getting the generator expression right.
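A small sketch of the valid ordering, next to the nested loops it corresponds to. The nested list is made-up sample data.

```python
lst = [[[1, 2], [3]], [[4], [5, 6]]]   # illustrative two-level nesting

# Clauses read left to right, outermost loop first.
flat = list(x for ys in lst for xs in ys for x in xs)

# The equivalent nested loops, for comparison.
expected = []
for ys in lst:
    for xs in ys:
        for x in xs:
            expected.append(x)

print(flat == expected)
```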
Since they are generator expressions, you can bind each to its own name to make it more readable without any change in performance. Changing it to a nested loop would likely be detrimental to performance.
irange = (i for i in xrange(10))
jrange = (j for j in irange)
krange = (k for k in jrange)
It really doesn't matter which you choose, I think the multi-line example is more readable, in general.
Caveat: elegance is partly a matter of taste.
List comprehensions are never more clear than the corresponding expanded for loop. For loops are also more powerful than list comprehensions. So why use them at all?
What list comprehensions are is concise -- they allow you to do something in a single line.
The time to use a list comprehension is when you need a certain list, it can be created on the fly fairly easily, and you don't want or need intermediate objects hanging around. This might happen when you need to package some objects in the current scope into a single object that you can feed into a function, like below:
list1 = ['foo', 'bar']
list2 = ['-ness', '-ity']
return filterRealWords([str1+str2 for str1 in list1 for str2 in list2])
This code is about as readable as the expanded version, but it is far shorter. It avoids creating/naming an object that is only used once in the current scope, which is arguably more elegant.
The expression:
(k for k in (j for j in (i for i in xrange(10))))
is equivalent to:
(i for i in xrange(10))
which is almost the same as:
xrange(10)
The last variant is more elegant than the first one.
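A quick check of that equivalence, written for Python 3, where range plays the role of xrange:

```python
nested = (k for k in (j for j in (i for i in range(10))))
plain = range(10)

# Both yield the same values in the same order.
same = list(nested) == list(plain)

# "Almost" the same: the generator is exhausted after one pass,
# while the range object can be iterated again.
second_pass_nested = list(nested)
second_pass_plain = list(plain)

print(same, second_pass_nested, len(second_pass_plain))
```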
I find it can be useful and elegant in situations like these where you have code like this:
output = []
for item in items:
    if item >= 1:
        new = item + 1
        output.append(new)
You can make it a one-liner like this:
output = [item + 1 for item in items if item >= 1]