Why is my code slower in pypy than the default python interpreter? - python

Given a number of players n, I need to find H, the list of all tuples where each tuple is a combination of coalitions of the players (e.g. (1,2,3) is the coalition of players 1, 2 and 3, and ((1,2,3),(4,5),(6,)) is a combination of coalitions, which are themselves tuples) that respects this rule: each player appears exactly once, i.e. in only one coalition.
P.S. Each combination of coalitions is called layout in the code.
At the beginning I wrote a snippet in which I computed all combinations of all coalitions and for each combination I checked the rule. Problem is that for 5-6 players the number of combinations of coalitions was already so big that my computer went phut.
In order to avoid a big part of the computation (all possible combinations, the loop and the ifs) I wrote the following (which I tested and it's equivalent to the previous snippet):
from itertools import combinations, combinations_with_replacement, product, permutations
players = range(1,n+1)
coalitions = [[coal for coal in list(combinations(players,length))] for length in players]
H = [tuple(coalitions[0]),(coalitions[-1][0],)]
combs = [comb for length in xrange(2,n) for comb in combinations_with_replacement(players,length) if sum(comb) == n]
perms = list(permutations(players))
layouts = set(frozenset(frozenset(perm[i:i+x]) for (i,x) in zip([0]+[sum(comb[:y]) for y in xrange(1,len(comb))],comb)) for comb in combs for perm in perms)
H.extend(tuple(tuple(tuple(coal) for coal in layout) for layout in layouts))
print H
EXPLANATION: say n = 3
First I create all possible coalitions:
coalitions = [[(1,),(2,),(3,)],[(1,2),(1,3),(2,3)],[(1,2,3)]]
Then I initialize H with the obvious combinations: each player in his own coalition and every player in the biggest coalition.
H = [((1,),(2,),(3,)),((1,2,3),)]
Then I compute all the possible forms of the layouts:
combs = [(1,2)] #(1,2) represents a layout in which there is
#one 1-player coalition and one 2-player coalition.
I compute the permutations (perms).
Finally for each perm and for each comb I calculate the different possible layouts. I set the result (layouts) in order to delete duplicates and add to H.
H = [((1,),(2,),(3,)),((1,2,3),),((1,2),(3,)),((1,3),(2,)),((2,3),(1,))]
Here's the comparison:
python script.py
4: 0.000520944595337 seconds
5: 0.0038321018219 seconds
6: 0.0408189296722 seconds
7: 0.431486845016 seconds
8: 6.05224680901 seconds
9: 76.4520540237 seconds
pypy script.py
4: 0.00342392921448 seconds
5: 0.0668039321899 seconds
6: 0.311077833176 seconds
7: 1.13124799728 seconds
8: 11.5973010063 seconds
9: went phut
Why is PyPy that much slower? What should I change?

First, I want to point out that you are studying the Bell numbers, which might ease the next part of your work, after you're done generating all the subsets. For example, it's easy to know how large each Bell set will be; OEIS has the sequence of Bell numbers already.
I hand-wrote the loops to generate the Bell sets; here is my code:
cache = {0: (), 1: ((set([1]),),)}

def bell(x):
    # Change these lines to alter memoization.
    if x in cache:
        return cache[x]
    previous = bell(x - 1)
    new = []
    for sets in previous:
        r = []
        for mark in range(len(sets)):
            l = [s | set([x]) if i == mark else s for i, s in enumerate(sets)]
            r.append(tuple(l))
        new.extend(r)
        new.append(sets + (set([x]),))
    cache[x] = tuple(new)
    return new
I included some memoization here for practical purposes. However, by commenting out some code, and writing some other code, you can obtain the following un-memoized version, which I used for benchmarks:
def bell(x):
    if x == 0:
        return ()
    if x == 1:
        return ((set([1]),),)
    previous = bell(x - 1)
    new = []
    for sets in previous:
        r = []
        for mark in range(len(sets)):
            l = [s | set([x]) if i == mark else s for i, s in enumerate(sets)]
            r.append(tuple(l))
        new.extend(r)
        new.append(sets + (set([x]),))
    return new
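A quick sanity check I would add here (assuming the bell() function above): the number of partitions of {1, ..., n} it produces should match the Bell numbers from OEIS A000110.

expected = [1, 2, 5, 15, 52, 203, 877, 4140, 21147]  # B(1) .. B(9), from OEIS A000110
for n, count in enumerate(expected, start=1):
    assert len(bell(n)) == count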
My numbers are based on a several-year-old Thinkpad that I do most of my work on. Most of the smaller cases are way too fast to measure reliably (not even a single millisecond per trial for the first few), so my benchmarks are testing bell(9) through bell(11).
Benchmarks for CPython 2.7.11, using the standard timeit module:
$ python -mtimeit -s 'from derp import bell' 'bell(9)'
10 loops, best of 3: 31.5 msec per loop
$ python -mtimeit -s 'from derp import bell' 'bell(10)'
10 loops, best of 3: 176 msec per loop
$ python -mtimeit -s 'from derp import bell' 'bell(11)'
10 loops, best of 3: 1.07 sec per loop
And on PyPy 4.0.1, also using timeit:
$ pypy -mtimeit -s 'from derp import bell' 'bell(9)'
100 loops, best of 3: 14.3 msec per loop
$ pypy -mtimeit -s 'from derp import bell' 'bell(10)'
10 loops, best of 3: 90.8 msec per loop
$ pypy -mtimeit -s 'from derp import bell' 'bell(11)'
10 loops, best of 3: 675 msec per loop
So, the conclusion that I've come to is that itertools is not very fast when you try to use it outside of its intended idioms. Bell numbers are interesting combinatorially, but they do not naturally arise from any simple composition of itertools widgets that I can find.
In response to your original query of what to do to make it faster: Just open-code it. Hope this helps!
~ C.

Here's a Pypy issue on itertools.product.
https://bitbucket.org/pypy/pypy/issues/1677/itertoolsproduct-slower-than-nested-fors
Note that our goal is to ensure that itertools is not massively slower than
plain Python, but we don't really care about making it exactly as fast (or
faster) as plain Python. As long as it's not massively slower, it's fine. (At
least I don't agree with you about whether a) or b) is easier to read :-)
Without studying your code in detail, it looks like it makes heavy use of the itertools combinations, permutations and product functions. In regular CPython those are written in compiled C code, with the intention of making them fast. PyPy does not use that C code, so it shouldn't be surprising that these functions are slower there.
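For illustration, the kind of rewrite the linked issue is about is replacing itertools.product with plain nested loops; a minimal toy sketch of the two equivalent forms (my own example, not taken from the question's code):

from itertools import product

# the itertools version
pairs_a = [(x, y) for x, y in product(range(3), range(3))]

# the open-coded equivalent, which is the form the issue reports as faster on PyPy
pairs_b = [(x, y) for x in range(3) for y in range(3)]

assert pairs_a == pairs_b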

Related

Why is Collections.counter so slow?

I'm trying to solve a Rosalind basic problem of counting nucleotides in a given sequence, and returning the results in a list. For those not familiar with bioinformatics, it's just counting the number of occurrences of four different characters ('A', 'C', 'G', 'T') inside a string.
I expected collections.Counter to be the fastest method (first because they claim to be high-performance, and second because I saw a lot of people using it for this specific problem).
But to my surprise this method is the slowest!
I compared three different methods, using timeit and running two types of experiments:
Running a long sequence few times
Running a short sequence a lot of times.
Here is my code:
import timeit
from collections import Counter

# Method1: using count
def method1(seq):
    return [seq.count('A'), seq.count('C'), seq.count('G'), seq.count('T')]

# method 2: using a loop
def method2(seq):
    r = [0, 0, 0, 0]
    for i in seq:
        if i == 'A':
            r[0] += 1
        elif i == 'C':
            r[1] += 1
        elif i == 'G':
            r[2] += 1
        else:
            r[3] += 1
    return r

# method 3: using Collections.counter
def method3(seq):
    counter = Counter(seq)
    return [counter['A'], counter['C'], counter['G'], counter['T']]

if __name__ == '__main__':
    # Long dummy sequence
    long_seq = 'ACAGCATGCA' * 10000000
    # Short dummy sequence
    short_seq = 'ACAGCATGCA' * 1000

    # Test 1: Running a long sequence once
    print timeit.timeit("method1(long_seq)", setup='from __main__ import method1, long_seq', number=1)
    print timeit.timeit("method2(long_seq)", setup='from __main__ import method2, long_seq', number=1)
    print timeit.timeit("method3(long_seq)", setup='from __main__ import method3, long_seq', number=1)

    # Test2: Running a short sequence lots of times
    print timeit.timeit("method1(short_seq)", setup='from __main__ import method1, short_seq', number=10000)
    print timeit.timeit("method2(short_seq)", setup='from __main__ import method2, short_seq', number=10000)
    print timeit.timeit("method3(short_seq)", setup='from __main__ import method3, short_seq', number=10000)
Results:
Test1:
Method1: 0.224009990692
Method2: 13.7929501534
Method3: 18.9483819008
Test2:
Method1: 0.224207878113
Method2: 13.8520510197
Method3: 18.9861831665
Method 1 is way faster than method 2 and 3 for both experiments!!
So I have a set of questions:
Am I doing something wrong, or is it indeed slower than the other two approaches? Could someone run the same code and share the results?
In case my results are correct, (and maybe this should be another question) is there a faster method to solve this problem than using method 1?
If count is faster, then what's the deal with collections.Counter?
It's not because collections.Counter is slow, it's actually quite fast, but it's a general purpose tool, counting characters is just one of many applications.
On the other hand str.count just counts characters in strings and it's heavily optimized for its one and only task.
That means that str.count can work on the underlying C char array, avoiding the creation of new (or lookup of existing) length-1 Python strings during the iteration (which is what for loops and Counter have to do).
Just to add some more context to this statement.
A string is stored as a C array wrapped in a Python object. str.count knows that the string is a contiguous array, so it converts the character you want to count to a C "character", then iterates over the array in native C code, checks for equality, and finally wraps and returns the number of occurrences found.
On the other hand, for and Counter use the Python iteration protocol. Each character of your string is wrapped as a Python object and then (hashed and) compared within Python.
So the slowdown is because:
Each character has to be converted to a Python object (this is the major reason for the performance loss)
The loop is done in Python (not applicable to Counter in python 3.x because it was rewritten in C)
Each comparison has to be done in Python (instead of just comparing numbers in C - characters are represented by numbers)
The counter needs to hash the values and your loop needs to index your list.
Note the reason for the slowdown is similar to the question about Why are Python's arrays slow?.
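If you want to see the per-character object overhead for yourself, a rough sketch along these lines works (no numbers claimed here; run it on your own machine):

import timeit

seq = 'ACAGCATGCA' * 100000

# Pure-Python scan: every character becomes a temporary length-1 string object
py_scan = timeit.timeit(lambda: sum(1 for c in seq if c == 'A'), number=10)

# str.count: the scan happens in C over the underlying character buffer
c_scan = timeit.timeit(lambda: seq.count('A'), number=10)

print(py_scan, c_scan)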
I did some additional benchmarks to find out at which point collections.Counter is to be preferred over str.count. To this end I created random strings containing differing numbers of unique characters and plotted the performance:
from collections import Counter
import random
import string

characters = string.printable  # 100 different printable characters

results_counter = []
results_count = []
nchars = []
for i in range(1, 110, 10):
    chars = characters[:i]
    string = ''.join(random.choice(chars) for _ in range(10000))
    res1 = %timeit -o Counter(string)
    res2 = %timeit -o {char: string.count(char) for char in chars}
    nchars.append(len(chars))
    results_counter.append(res1)
    results_count.append(res2)
and the result was plotted using matplotlib:
import matplotlib.pyplot as plt
plt.figure()
plt.plot(nchars, [i.best * 1000 for i in results_counter], label="Counter", c='black')
plt.plot(nchars, [i.best * 1000 for i in results_count], label="str.count", c='red')
plt.xlabel('number of different characters')
plt.ylabel('time to count the chars in a string of length 10000 [ms]')
plt.legend()
Results for Python 3.5
The results for Python 3.6 are very similar so I didn't list them explicitly.
So if you want to count roughly 80 or more different characters, Counter becomes comparable or faster, because it traverses the string only once rather than multiple times like str.count. This is weakly dependent on the length of the string (testing showed only a very weak difference of +/-2%).
Results for Python 2.7
In Python 2.7, collections.Counter is implemented in Python (instead of C) and is much slower. The break-even point for str.count and Counter can only be estimated by extrapolation, because even with 100 different characters str.count is still 6 times faster.
The time difference here is pretty simple to explain. It all comes down to what runs within Python and what runs as native code. The latter will always be faster since it does not come with lots of evaluation overhead.
Now that’s already the reason why calling str.count() four times is faster than anything else. Although this iterates the string four times, these loops run in native code. str.count is implemented in C, so this has very little overhead, making this very fast. It’s really difficult to beat this, especially when the task is that simple (looking only for simple character equality).
Your second method, of collecting the counts in an array is actually a less performant version of the following:
def method4 (seq):
a, c, g, t = 0, 0, 0, 0
for i in seq:
if i == 'A':
a += 1
elif i == 'C':
c += 1
elif i == 'G':
g += 1
else:
t += 1
return [a, c, g, t]
Here, all four values are individual variables, so updating them is very fast. This is actually a bit faster than mutating list items.
The overall performance "problem" here is however that this iterates the string within Python. So this creates a string iterator and then produces every character individually as an actual string object. That's a lot of overhead and the main reason why every solution that works by iterating the string in Python will be slower.
The same problem exists with collections.Counter. It's implemented in Python, so even though it's very efficient and flexible, it suffers from the same issue: it's just never near native in terms of speed.
As others have already noted, you are comparing fairly specific code against fairly general code.
Consider that something as trivial as spelling out the loop over the characters you are interested in is already buying you a factor of 2, i.e.
def char_counter(text, chars='ACGT'):
    return [text.count(char) for char in chars]
%timeit method1(short_seq)
# 100000 loops, best of 3: 18.8 µs per loop
%timeit char_counter(short_seq)
# 10000 loops, best of 3: 40.8 µs per loop
%timeit method1(long_seq)
# 10 loops, best of 3: 172 ms per loop
%timeit char_counter(long_seq)
# 1 loop, best of 3: 374 ms per loop
Your method1() is the fastest but not the most efficient, as the input is looped through entirely for each char you are inspecting, thereby not taking advantage of the fact that you could easily short-circuit your looping as soon as a character gets assigned to one of the character classes.
Unfortunately, Python does not offer a fast method to take advantage of the specific conditions of your problem.
However, you could use Cython for this, and you would then be able to outperform your method1():
%%cython -c-O3 -c-march=native -a
#cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True, infer_types=True
import numpy as np

cdef void _count_acgt(
        const unsigned char[::1] text,
        unsigned long len_text,
        unsigned long[::1] counts):
    for i in range(len_text):
        if text[i] == b'A':
            counts[0] += 1
        elif text[i] == b'C':
            counts[1] += 1
        elif text[i] == b'G':
            counts[2] += 1
        else:
            counts[3] += 1

cpdef ascii_count_acgt(text):
    counts = np.zeros(4, dtype=np.uint64)
    bin_text = text.encode()
    # fill the counts buffer in C, then return it (the helper itself returns void)
    _count_acgt(bin_text, len(bin_text), counts)
    return counts
%timeit ascii_count_acgt(short_seq)
# 100000 loops, best of 3: 12.6 µs per loop
%timeit ascii_count_acgt(long_seq)
# 10 loops, best of 3: 140 ms per loop

Anaphoric list comprehension in Python

Consider the following toy example:
>>> def square(x): return x*x
...
>>> [square(x) for x in range(12) if square(x) > 50]
[64, 81, 100, 121]
I have to call square(x) twice in the list comprehension. The duplication is ugly, bug-prone (it's easy to change only one of the two calls when modifying the code), and inefficient.
Of course I can do this:
>>> squares = [square(x) for x in range(12)]
>>> [s for s in squares if s > 50]
[64, 81, 100, 121]
or this:
[s for s in [square(x) for x in range(12)] if s > 50]
These are both livable, but it feels as though there might be a way to do it all in a single statement without nesting the two list comprehensions, which I know I'll have to stare at for a while next time I'm reading the code just to figure out what's going on. Is there a way?
I think a fair question to ask of me would be what I imagine such syntax could look like. Here are two ideas, but neither feels idiomatic in Python (nor do they work). They are inspired by anaphoric macros in Lisp.
[square(x) for x in range(12) if it > 50]
[it=square(x) for x in range(12) if it > 50]
You should use a generator:
[s for s in (square(x) for x in range(12)) if s > 50]
This avoids creating an intermediate unfiltered list of squares.
Here is a comparison of nested generator vs "chained" list comps vs calculating twice
$ python -m timeit "[s for n in range(12) for s in [n * n] if s > 50]"
100000 loops, best of 3: 2.48 usec per loop
$ python -m timeit "[s for s in (x * x for x in range(12)) if s > 50]"
1000000 loops, best of 3: 1.89 usec per loop
$ python -m timeit "[n * n for n in range(12) if n * n > 50]"
1000000 loops, best of 3: 1.1 usec per loop
$ pypy -m timeit "[s for n in range(12) for s in [n * n] if s > 50]"
1000000 loops, best of 3: 0.211 usec per loop
$ pypy -m timeit "[s for s in (x * x for x in range(12)) if s > 50]"
1000000 loops, best of 3: 0.359 usec per loop
$ pypy -m timeit "[n * n for n in range(12) if n * n > 50]"
10000000 loops, best of 3: 0.0834 usec per loop
I used n * n instead of square(n) because it was convenient and removes the function call overhead from the benchmark.
TLDR: For simple cases it may be best to just duplicate the calculation.
Another alternative, using "chained" list comps rather than nested ones:
[s for n in range(12) for s in [square(n)] if s > 50]
Might be a weird read, though.
[square(s) for s in range(12) if s >= 8]  # sqrt(50) = 7.071..., so 8 is the first integer whose square exceeds 50
Or even simpler (no branching, woo!)
[square(s) for s in range(8, 12)]  # sqrt(50) = 7.071...
EDIT: I'm blind, duplicated Eevee's answer.
It is possible to abuse iteration over a 1-element list to "bind" intermediate variables:
[s for x in range(12) for s in [square(x)] if s > 50]
I'm hesitant to recommend this as a readable solution though.
Pro: Compared to the nested comprehension, I prefer the order here — having for x in range(12) outside. You can just read it sequentially instead of zooming in then back out...
Con: The for s in [...] is a non-idiomatic hack and might give readers a pause. The nested comprehension while arguably harder to decipher at least uses language features in an "obvious" way.
Idea: Renaming the intermediate variable something like tmp could I think make it clearer.
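For instance, the same trick with the intermediate variable renamed (purely illustrative):

[tmp for x in range(12) for tmp in [square(x)] if tmp > 50]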
The bottom line is that I'm not happy with either.
Probably the most readable is naming the intermediate generator:
squares = (square(x) for x in range(12))
result = [s for s in squares if s > 50]
[Side note: naming results of generator expressions is a bit rare. But read David Beazley's lecture and it might grow on you.]
OTOH if you're going to write such constructs a lot, go for the for tmp in [expr(x)] pattern — it will become "locally idiomatic" within your code and once familiar, its compactness will pay off. My readability concern is more about one-off use...

Fastest possible way to iterate through a specific list?

Let's say I have a list:
list=['plu;ean;price;quantity','plu1;ean1;price1;quantity1']
I want to iterate over the list + split the list by ";" and put an if clause, like this:
for item in list:
    split_item = item.split(";")
    if split_item[0] == "string_value" or split_item[1] == "string_value":
        do something.....
I was wondering, if this is the fastest way possible? Let's say my initial list is a lot bigger (has a lot more list items). I tried with list comprehensions:
item=[item.split(";") for item in list if item.split(";")[0] == "string_value" or item.split(";")[1] == "string_value"]
But this is actually giving me slower results. The first case is giving me an average of 90ms, while the second one is giving me an average of 130ms.
Am I doing the list comprehension wrong? Is there a faster solution?
I was wondering, if this is the fastest way possible?
No, of course not. You can implement it a lot faster in hand-coded assembly than in Python. So what?
If the "do something..." is not trivial, and there are many matches, the cost to do something 100000 times is going to be a lot more expensive than the cost of looping 500000 times, so finding the fastest way to loop doesn't matter at all.
In fact, just calling split two to three times each loop, instead of remembering and reusing the result, is going to swamp the cost of iteration, and so may not passing a maxsplit argument when you only care about the first two results.
So, you're trying to optimize the wrong thing. But what if, after you fix everything else, it turns out that the cost of iteration really does matter here?
Well, you can't use a comprehension directly to speed things up, because comprehensions are for expressions that return values, not statements to do things.
But, if you look at your code, you'll realize you're actually doing three things: splitting each string, then filtering out the ones that don't match, then doing the "do something". So, you can use a comprehension for the first two parts, and then you're only using a slow for loop for the much smaller list of values that passed the filter.
It looks like you tried this, but you made two mistakes.
First, you're better off with a generator expression than a list comprehension—you don't need a list here, just something to iterate over, so don't pay to build one.
Second, you don't want to split the string three times. You can probably find some convoluted way to get the split done once in a single comprehension, but why bother? Just write each step as its own step.
So:
split_items = (item.split(';') for item in items)
filtered_items = (item for item in split_items
                  if item[0] == "string_value" or item[1] == "string_value")
for item in filtered_items:
    do something...
Will this actually be faster? If you can get some real test data, and "do something..." code, that shows that the iteration is a bottleneck, you can test on that real data and code. Until then, there's nothing to test.
Split the whole string only when the first two items retrieved from str.split(';', 2) satisfy the conditions:
>>> strs = 'plu;ean;price;quantity'
>>> strs.split(';', 2)
['plu', 'ean', 'price;quantity']
Here, the third item ('price;quantity') is split only if the first two items satisfy the condition:
>>> lis = ['plu;ean;price;quantity'*1000, 'plu1;ean1;price1;quantity1'*1000]*1000
Normal for-loop, single split of whole string for each item of the list.
>>> %%timeit
for item in lis:
    split_item = item.split(";")
    if split_item[0] == "plu" or split_item[1] == "ean": pass
...
1 loops, best of 3: 952 ms per loop
List comprehension equivalent to the for-loop above:
>>> %timeit [x for x in (item.split(';') for item in lis) if x[0]== "plu" or x[1]=="ean"]
1 loops, best of 3: 961 ms per loop
Split on-demand:
>>> %timeit [[x] + [y] + z.split(';') for x, y, z in (item.split(';', 2) for item in lis) if x== "plu" or y=="ean"]
1 loops, best of 3: 508 ms per loop
Of course, if the list and strings are small then such optimisation doesn't matter.
EDIT: It turns out that the Regex cache was being a bit unfair to the competition. My bad. Regex is only a small percentage faster.
If you're looking for speed, hcwhsa's answer should be good enough. If you need slightly more, look to re.
import re
from itertools import chain
lis = ['plu;ean;price;quantity'*1000, 'plu1;ean1;price1;quantity1'*100]*1000
matcher = re.compile('^(?:plu(?:;|$)|[^;]*;ean(?:;|$))').match
[l.split(';') for l in lis if matcher(l)]
Timings, for mostly positive results (aka. split is the major cause of slowness):
SETUP="
import re
from itertools import chain
matcher = re.compile('^(?:plu(?:;|$)|[^;]*;ean(?:;|$))').match
lis = ['plu1;ean1;price1;quantity1'+chr(i) for i in range(10000)] + ['plu;ean;price;quantity' for i in range(10000)]
"
python -m timeit -s "$SETUP" "[[x] + [y] + z.split(';') for x, y, z in (item.split(';', 2) for item in lis) if x== 'plu' or y=='ean']"
python -m timeit -s "$SETUP" "[l.split(';') for l in lis if matcher(l)]"
We see mine's a little faster.
10 loops, best of 3: 55 msec per loop
10 loops, best of 3: 49.5 msec per loop
For mostly negative results (most things are filtered):
SETUP="
import re
from itertools import chain
matcher = re.compile('^(?:plu(?:;|$)|[^;]*;ean(?:;|$))').match
lis = ['plu1;ean1;price1;quantity1'+chr(i) for i in range(1000)] + ['plu;ean;price;quantity' for i in range(10000)]
"
python -m timeit -s "$SETUP" "[[x] + [y] + z.split(';') for x, y, z in (item.split(';', 2) for item in lis) if x== 'plu' or y=='ean']"
python -m timeit -s "$SETUP" "[l.split(';') for l in lis if matcher(l)]"
The lead's a touch higher.
10 loops, best of 3: 40.9 msec per loop
10 loops, best of 3: 35.7 msec per loop
If the result will always be unique, use
next([x] + [y] + z.split(';') for x, y, z in (item.split(';', 2) for item in lis) if x== 'plu' or y=='ean')
or the faster Regex version
next(filter(matcher, lis)).split(';')
(use itertools.ifilter on Python 2).
Timings:
SETUP="
import re
from itertools import chain
matcher = re.compile('^(?:plu(?:;|$)|[^;]*;ean(?:;|$))').match
lis = ['plu1;ean1;price1;quantity1'+chr(i) for i in range(10000)] + ['plu;ean;price;quantity'] + ['plu1;ean1;price1;quantity1'+chr(i) for i in range(10000)]
"
python -m timeit -s "$SETUP" "[[x] + [y] + z.split(';') for x, y, z in (item.split(';', 2) for item in lis) if x== 'plu' or y=='ean']"
python -m timeit -s "$SETUP" "next([x] + [y] + z.split(';') for x, y, z in (item.split(';', 2) for item in lis) if x== 'plu' or y=='ean')"
python -m timeit -s "$SETUP" "[l.split(';') for l in lis if matcher(l)]"
python -m timeit -s "$SETUP" "next(filter(matcher, lis)).split(';')"
Results:
10 loops, best of 3: 31.3 msec per loop
100 loops, best of 3: 15.2 msec per loop
10 loops, best of 3: 28.8 msec per loop
100 loops, best of 3: 14.1 msec per loop
So this gives a substantial boost to both methods.
I found a good alternative here.
You can use a combination of map and filter. Try this:
>>>import itertools
>>>splited_list = itertools.imap(lambda x: x.split(";"), your_list)
>>>result = filter(lambda x: x[0] == "plu" or x[1] == "string_value", splited_list)
The first line will create an iterator of elements, and the second one will filter it.
I run a small benchmark in my IPython Notebook shell, and got the following results:
1st test:
With small sizes, the one-line solution works better
2nd test:
With a bigger list, the map/filter solution is slightly better
3rd test:
With a big list and bigger elements, the map/filter solution is way better.
I guess the difference in performance keeps increasing as the list grows, until it peaks at about 66% more time (in a 10000-element list trial).
The difference between the map/filter solution and the list comprehension solution is the number of calls to .split(): one calls it three times for each item, the other just once, because list comprehensions are just a Pythonic way to do map/filter together. I used to use list comprehensions a lot without really knowing what lambda was all about, until I discovered that map and list comprehensions are essentially the same thing.
If you don't care about memory usage, you can use regular map instead of imap. It will create the list with splits at once. It will use more memory to store it, but it's slightly faster.
Actually, if you don't care about memory usage, you can write the map/filter solution using 2 list comprehensions, and get the exact same result. Check it out:
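Something along these lines, mirroring the imap/filter version above (a sketch; splited_list, your_list and the condition are reused from the earlier snippet):

splited_list = [x.split(";") for x in your_list]
result = [x for x in splited_list if x[0] == "plu" or x[1] == "string_value"]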

Rationale behind Python's preferred for syntax

What is the rationale behind the advocated use of the for i in xrange(...)-style looping constructs in Python? For simple integer looping, the difference in overheads is substantial. I conducted a simple test using two pieces of code:
File idiomatic.py:
#!/usr/bin/env python
M = 10000
N = 10000

if __name__ == "__main__":
    x, y = 0, 0
    for x in xrange(N):
        for y in xrange(M):
            pass
File cstyle.py:
#!/usr/bin/env python
M = 10000
N = 10000

if __name__ == "__main__":
    x, y = 0, 0
    while x < N:
        while y < M:
            y += 1
        x += 1
Profiling results were as follows:
bash-3.1$ time python cstyle.py
real 0m0.109s
user 0m0.015s
sys 0m0.000s
bash-3.1$ time python idiomatic.py
real 0m4.492s
user 0m0.000s
sys 0m0.031s
I can understand why the Pythonic version is slower -- I imagine it has a lot to do with calling xrange N times, perhaps this could be eliminated if there was a way to rewind a generator. However, with this deal of difference in execution time, why would one prefer to use the Pythonic version?
Edit: I conducted the tests again using the code Mr. Martelli provided, and the results were indeed better now:
I thought I'd enumerate the conclusions from the thread here:
1) Lots of code at the module scope is a bad idea, even if the code is enclosed in an if __name__ == "__main__": block.
2) Curiously enough, modifying the code that belonged to thebadone to my incorrect version (letting y grow without resetting) produced little difference in performance, even for larger values of M and N.
Here's the proper comparison, e.g. in loop.py:
M = 10000
N = 10000

def thegoodone():
    for x in xrange(N):
        for y in xrange(M):
            pass

def thebadone():
    x = 0
    while x < N:
        y = 0
        while y < M:
            y += 1
        x += 1
All substantial code should always be in functions -- putting a hundred million loops at a module's top level shows reckless disregard for performance and makes a mockery of any attempts at measuring said performance. (At module level, x and y are globals, so every read and write of them is a dictionary lookup; inside a function they are fast local variables.)
Once you've done that, you see:
$ python -mtimeit -s'import loop' 'loop.thegoodone()'
10 loops, best of 3: 3.45 sec per loop
$ python -mtimeit -s'import loop' 'loop.thebadone()'
10 loops, best of 3: 10.6 sec per loop
So, properly measured, the bad way that you advocate is about 3 times slower than the good way which Python promotes. I hope this makes you reconsider your erroneous advocacy.
You forgot to reset y to 0 after the inner loop.
#!/usr/bin/env python
M = 10000
N = 10000

if __name__ == "__main__":
    x, y = 0, 0
    while x < N:
        while y < M:
            y += 1
        x += 1
        y = 0
ed: 20.63s after fix vs. 6.97s using xrange
good for iterating over data structures
The for i in ... syntax is great for iterating over data structures. In a lower-level language, you would generally be iterating over an array indexed by an int, but with the python syntax you can eliminate the indexing step.
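A small illustration of the difference (toy data, just to show the shape of the two loops):

data = ['plu', 'ean', 'price', 'quantity']

# C-style: loop over indices, then index into the structure
for i in xrange(len(data)):
    print data[i]

# Pythonic: iterate over the data structure directly, no index bookkeeping
for item in data:
    print item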
This is not a direct answer to the question, but I want to open the dialog a bit more on xrange(). Two things:
A. There is something wrong with one of the OP's statements that no one has corrected yet (yes, in addition to the bug in the code of not resetting y):
"I imagine it has a lot to do with calling xrange N times...."
Unlike traditional counting for loops, Python's is more like a shell's foreach: it loops over an iterable. Therefore, xrange() is called exactly once, not "N times."
B. xrange() is the name of this function in Python 2; it is renamed to range() (and replaces the old range()) in Python 3, so keep this in mind when porting. If you didn't know already, xrange() returns an iterator(-like object) while range() returns a full list. Since the latter is less memory-efficient, it was superseded by xrange(), which is more memory-friendly. The workaround in Python 3, for all those who need a list, is list(range(N)).
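To make the porting note concrete, here is a small sketch of how the two names behave (my own example, not from the answer above):

import sys

if sys.version_info[0] == 2:
    lazy = xrange(5)         # xrange: lazy, memory-friendly
    eager = range(5)         # range: builds the whole list [0, 1, 2, 3, 4] up front
else:
    lazy = range(5)          # Python 3's range is the old xrange, renamed
    eager = list(range(5))   # wrap in list() if you really need a list

print(eager)                 # [0, 1, 2, 3, 4] either way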
I've repeated the test from #Alex Martelli's answer. The idiomatic for loop is 5 times faster than the while loop:
python -mtimeit -s'from while_vs_for import while_loop as loop' 'loop(10000)'
10 loops, best of 3: 9.6 sec per loop
python -mtimeit -s'from while_vs_for import for_loop as loop' 'loop(10000)'
10 loops, best of 3: 1.83 sec per loop
while_vs_for.py:
def while_loop(N):
    x = 0
    while x < N:
        y = 0
        while y < N:
            pass
            y += 1
        x += 1

def for_loop(N):
    for x in xrange(N):
        for y in xrange(N):
            pass
At module level:
$ time -p python for.py
real 4.38
user 4.37
sys 0.01
$ time -p python while.py
real 14.28
user 14.28
sys 0.01

Python tabstop-aware len() and padding functions

Python's len() and padding functions like string.ljust() are not tabstop-aware, i.e. they treat '\t' like any other single-width character, and don't round len() up to the nearest multiple of tabstop. Example:
len('Bear\tnecessities\t')
is 17 instead of 24 ( i.e. 4+(8-4)+11+(8-3) )
and say I also want a function pad_with_tabs(s) such that
pad_with_tabs('Bear', 15) = 'Bear\t\t'
Looking for simple implementations of these - compactness and readability first, efficiency second.
This is a basic but irritating question.
#gnibbler - can you show a purely Pythonic solution, even if it's say 20x less efficient?
Sure you could convert back and forth using str.expandtabs(TABWIDTH), but that's clunky.
Importing math to get TABWIDTH * int( math.ceil(len(s)*1.0/TABWIDTH) ) also seems like massive overkill.
I couldn't manage anything more elegant than the following:
TABWIDTH = 8

def pad_with_tabs(s, maxlen):
    s_len = len(s)
    while s_len < maxlen:
        s += '\t'
        s_len += TABWIDTH - (s_len % TABWIDTH)
    return s
and since Python strings are immutable, unless we want to monkey-patch our function into the string module to add it as a method, we must also assign the result of the function:
s = pad_with_tabs(s, ...)
In particular I couldn't get clean approaches using list-comprehension or string.join(...):
''.join([s, '\t' * ntabs])
without special-casing the cases where len(s) falls short of an exact multiple of TABWIDTH, or where len(s) >= maxlen already.
Can anyone show better len() and pad_with_tabs() functions?
TABWIDTH = 8

def my_len(s):
    return len(s.expandtabs(TABWIDTH))

def pad_with_tabs(s, maxlen):
    return s + "\t" * ((maxlen - len(s) - 1) / TABWIDTH + 1)
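As a quick check against the examples from the question (note that the integer division in pad_with_tabs above assumes Python 2):

>>> my_len('Bear\tnecessities\t')
24
>>> pad_with_tabs('Bear', 15)
'Bear\t\t'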
Why did I use expandtabs()?
Well it's fast
$ python -m timeit '"Bear\tnecessities\t".expandtabs()'
1000000 loops, best of 3: 0.602 usec per loop
$ python -m timeit 'for c in "Bear\tnecessities\t":pass'
100000 loops, best of 3: 2.32 usec per loop
$ python -m timeit '[c for c in "Bear\tnecessities\t"]'
100000 loops, best of 3: 4.17 usec per loop
$ python -m timeit 'map(None,"Bear\tnecessities\t")'
100000 loops, best of 3: 2.25 usec per loop
Anything that iterates over your string is going to be slower, because just the iteration is ~4 times slower than expandtabs even when you do nothing in the loop.
$ python -m timeit '"Bear\tnecessities\t".split("\t")'
1000000 loops, best of 3: 0.868 usec per loop
Even just splitting on tabs takes longer. You'd still need to iterate over the split and pad each item to the tabstop
I believe gnibbler's is the best for most practical cases. But anyway, here is a naive solution (without accounting for CR, LF, etc.) that computes the length of a string without creating an expanded copy:
def tab_aware_len(s, tabstop=8):
    pos = -1
    extra_length = 0
    while True:
        pos = s.find('\t', pos + 1)
        if pos < 0:
            return len(s) + extra_length
        extra_length += tabstop - (pos + extra_length) % tabstop - 1
It could probably be useful for some huge strings or even memory-mapped files. And here is a slightly optimized padding function:
def pad_with_tabs(s, max_len, tabstop=8):
    length = tab_aware_len(s, tabstop)
    if length < max_len:
        s += '\t' * ((max_len - 1) // tabstop + 1 - length // tabstop)
    return s
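And a quick check that this version reproduces the question's examples (assuming the two functions above):

>>> tab_aware_len('Bear\tnecessities\t')
24
>>> pad_with_tabs('Bear', 15)
'Bear\t\t'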
TABWIDTH * int( math.ceil(len(s)*1.0/TABWIDTH) ) is indeed a massive over-kill; you can get the same result much more simply. For positive i and n, use:
def round_up_positive_int(i, n):
    return ((i + n - 1) // n) * n
This procedure works in just about any language I've ever used, after appropriate translation.
Then you can do next_pos = round_up_positive_int(len(s), TABWIDTH)
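A couple of quick examples of the rounding behaviour (my own numbers; 17 is the raw length from the question):

>>> round_up_positive_int(17, 8)
24
>>> round_up_positive_int(16, 8)
16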
For a slight increase in the elegance of your code, instead of
while(s_len < maxlen):
use this:
while s_len < maxlen:
Unfortunately I was unable to use the accepted answer as-is, so here is a slightly modified version, in case someone runs into the same problem and discovers this post via search:
from decimal import Decimal, ROUND_HALF_UP

TABWIDTH = 4

def pad_with_tabs(src, max_len):
    return src + "\t" * int(
        Decimal((max_len - len(src.expandtabs(TABWIDTH))) / TABWIDTH + 1).quantize(0, ROUND_HALF_UP))

def pad_fields(input):
    result = []
    longest = max(len(x) for x in input)
    for row in input:
        result.append(pad_with_tabs(row, longest))
    return result
The output list contains properly padded rows with the tab count rounded, so the resulting data keeps the same indentation level even in the corner .5 cases where the original answer would add no tab.
