I see many people using format strings like this:
root = "sample"
output = "output"
path = "{}/{}".format(root, output)
Instead of simply concatenating strings like this:
path = root + '/' + output
Do format strings have better performance or is this just for looks?
It's just for looks. You can see at a glance what the format is. Many of us prefer readability over micro-optimization.
Let's see what IPython's %timeit says:
Python 3.7.2 (default, Jan 3 2019, 02:55:40)
IPython 5.8.0
Intel(R) Core(TM) i5-4590T CPU @ 2.00GHz
In [1]: %timeit root = "sample"; output = "output"; path = "{}/{}".format(root, output)
The slowest run took 12.44 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 223 ns per loop
In [2]: %timeit root = "sample"; output = "output"; path = root + '/' + output
The slowest run took 13.82 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 5: 101 ns per loop
In [3]: %timeit root = "sample"; output = "output"; path = "%s/%s" % (root, output)
The slowest run took 27.97 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 5: 155 ns per loop
In [4]: %timeit root = "sample"; output = "output"; path = f"{root}/{output}"
The slowest run took 19.52 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 5: 77.8 ns per loop
I agree that formatting is mostly used for readability, but since the release of f-strings in 3.6, the tables have turned in terms of performance. It is also my opinion that f-strings are more readable/maintainable, since 1) they can be read left to right like most regular text and 2) the spacing-related disadvantages of concatenation are avoided because the variables sit inside the string.
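For instance, a missing space is easy to overlook when concatenating, while in an f-string the spacing is part of the literal text (a made-up two-liner just for illustration):
name = "Ada"
greeting = "Hello," + name       # gives "Hello,Ada" -- the missing space is easy to miss
greeting = f"Hello, {name}"      # the space is visibly part of the template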
Running this code:
from timeit import timeit

runs = 1000000

def print_results(time, start_string):
    print(f'{start_string}\n'
          f'Total: {time:.4f}s\n'
          f'Avg: {(time/runs)*1000000000:.4f}ns\n')

t1 = timeit('"%s, %s" % (greeting, loc)',
            setup='greeting="hello";loc="world"',
            number=runs)
t2 = timeit('f"{greeting}, {loc}"',
            setup='greeting="hello";loc="world"',
            number=runs)
t3 = timeit('greeting + ", " + loc',
            setup='greeting="hello";loc="world"',
            number=runs)
t4 = timeit('"{}, {}".format(greeting, loc)',
            setup='greeting="hello";loc="world"',
            number=runs)

print_results(t1, '% replacement')
print_results(t2, 'f strings')
print_results(t3, 'concatenation')
print_results(t4, '.format method')
yields this result on my machine:
% replacement
Total: 0.3044s
Avg: 304.3638ns
f strings
Total: 0.0991s
Avg: 99.0777ns
concatenation
Total: 0.1252s
Avg: 125.2442ns
.format method
Total: 0.3483s
Avg: 348.2690ns
A similar comparison for a different question is given in this answer.
As with most things, there will be a performance difference, but ask yourself "does it really matter if this is a few ns faster?". The root + '/' + output method is quick and easy to type out, but it can get hard to read very quickly once you have several variables to print out:
foo = "X = " + myX + " | Y = " + someY + " | Z = " + str(Z)
vs
foo = "X = {} | Y = {} | Z = {}".format(myX, someY, Z)
Which one makes it easier to see what is going on? Unless you really need to eke out performance, choose the way that will be easiest for people to read and understand.
It's not just for "looks", or for powerful lexical type conversions; it's also a must for internationalisation.
You can swap out the format string depending on what language is selected.
With a long line of string concatenations baked into the source code, this becomes effectively impossible to do properly.
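As a minimal sketch (not from the original post; the translation table and function name are made up for illustration), only the format string changes per language while the call site stays the same:
templates = {
    "en": "{user} uploaded {count} files",
    "de": "{count} Dateien wurden von {user} hochgeladen",  # different word order
}

def describe(lang, user, count):
    # Look up the translated template; the values and the call site are unchanged.
    return templates[lang].format(user=user, count=count)

print(describe("en", "maria", 3))  # maria uploaded 3 files
print(describe("de", "maria", 3))  # 3 Dateien wurden von maria hochgeladen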
As of Python 3.6 you can do literal string interpolation by prepending f to the string:
foo = "foo"
bar = "bar"
path = f"{foo}/{bar}"
String formatting converts values to strings for you while binding the data, whereas with concatenation we have to cast or convert non-string data ourselves.
For example:
a = 10
b = "foo"
c = str(a) + " " + b
print(c)
> 10 foo
It could be done via string formatting as:
a = 10
b = "foo"
c = "{} {}".format(a, b)
print(c)
> 10 foo
The two placeholders {} {} indicate that two values are expected to follow, which in this case are a and b.
It's for looks and for maintainability of the code. It's much easier to edit your code if you use format. Also, when you use + you may miss details like spaces. Use format for your own sake and for the sake of future maintainers.
Related
I'm currently working on a small side project in which I want to sort a 20 GB file on my machine as fast as possible. The idea is to chunk the file, sort the chunks, and merge the chunks. I just used pyenv to time the radixsort code with different Python versions and saw that 2.7.18 is way faster than 3.6.10, 3.7.7, 3.8.3 and 3.9.0a. Can anybody explain why Python 3.x is slower than 2.7.18 in this simple example? Were new features added?
import os

def chunk_data(filepath, prefixes):
    """
    Pre-sort and chunk the content of filepath according to the prefixes.

    Parameters
    ----------
    filepath : str
        Path to a text file which should get sorted. Each line contains
        a string which has at least 2 characters and the first two
        characters are guaranteed to be in prefixes
    prefixes : List[str]
    """
    prefix2file = {}
    for prefix in prefixes:
        chunk = os.path.abspath("radixsort_tmp/{:}.txt".format(prefix))
        prefix2file[prefix] = open(chunk, "w")

    # This is where most of the execution time is spent:
    with open(filepath) as fp:
        for line in fp:
            prefix2file[line[:2]].write(line)
Execution times (multiple runs):
2.7.18: 192.2s, 220.3s, 225.8s
3.6.10: 302.5s
3.7.7: 308.5s
3.8.3: 279.8s, 279.7s (binary mode), 295.3s (binary mode), 307.7s, 380.6s (wtf?)
3.9.0a: 292.6s
The complete code is on GitHub, along with a minimal complete version.
Unicode
Yes, I know that Python 3 and Python 2 deal differently with strings. I tried opening the files in binary mode (rb / wb), see the "binary mode" comments. They are a tiny bit faster on a couple of runs. Still, Python 2.7 is WAY faster on all runs.
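For clarity, the binary-mode variant is roughly the following change to chunk_data (a sketch, not the exact code from the repository; the function name is made up, and the key point is that the prefix keys must become bytes because lines read with "rb" are bytes):
import os

def chunk_data_binary(filepath, prefixes):
    # Same logic as chunk_data, but files are opened in binary mode and the
    # dictionary keys are bytes, since line[:2] is bytes when reading with "rb".
    prefix2file = {}
    for prefix in prefixes:
        chunk = os.path.abspath("radixsort_tmp/{:}.txt".format(prefix))
        prefix2file[prefix.encode()] = open(chunk, "wb")
    with open(filepath, "rb") as fp:
        for line in fp:
            prefix2file[line[:2]].write(line)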
Try 1: Dictionary access
When I phrased this question, I thought that dictionary access might be a reason for this difference. However, I think the total execution time is way less for dictionary access than for I/O. Also, timeit did not show anything important:
import timeit
import numpy as np

durations = timeit.repeat(
    'a["b"]',
    repeat=10 ** 6,
    number=1,
    setup="a = {'b': 3, 'c': 4, 'd': 5}"
)

mul = 10 ** -7
print(
    "mean = {:0.1f} * 10^-7, std={:0.1f} * 10^-7".format(
        np.mean(durations) / mul,
        np.std(durations) / mul
    )
)
print("min = {:0.1f} * 10^-7".format(np.min(durations) / mul))
print("max = {:0.1f} * 10^-7".format(np.max(durations) / mul))
Try 2: Copy time
As a simplified experiment, I tried to copy the 20GB file:
cp via shell: 230s
Python 2.7.18: 237s, 249s
Python 3.8.3: 233s, 267s, 272s
The Python stuff is generated by the following code.
My first thought was that the variance is quite high, so that could be the reason. But the variance of the chunk_data execution time is also high, and the mean is still noticeably lower for Python 2.7 than for Python 3.x. So the difference does not seem to be explained by an I/O scenario as simple as this one.
import time
import sys
import os

version = sys.version_info
version = "{}.{}.{}".format(version.major, version.minor, version.micro)

if os.path.isfile("numbers-tmp.txt"):
    os.remove("numbers-tmp.txt")

t0 = time.time()
with open("numbers-large.txt") as fin, open("numbers-tmp.txt", "w") as fout:
    for line in fin:
        fout.write(line)
t1 = time.time()

print("Python {}: {:0.0f}s".format(version, t1 - t0))
My System
Ubuntu 20.04
Thinkpad T460p
Python through pyenv
This is a combination of multiple effects, mostly the fact that Python 3 needs to perform unicode decoding/encoding when working in text mode, and that in binary mode it sends the data through dedicated buffered IO implementations.
First of all, measuring execution time with time.time uses the wall time and hence includes all sorts of things unrelated to Python, such as OS-level caching and buffering, as well as buffering of the storage medium. It also reflects any interference from other processes that need the storage medium. That's why you are seeing these wild variations in the timing results. Here are the results for my system, from seven consecutive runs for each version:
py3 = [660.9, 659.9, 644.5, 639.5, 752.4, 648.7, 626.6] # 661.79 +/- 38.58
py2 = [635.3, 623.4, 612.4, 589.6, 633.1, 613.7, 603.4] # 615.84 +/- 15.09
Despite the large variation, these results do seem to indicate genuinely different timings, as can be confirmed, for example, by a statistical test:
>>> from scipy.stats import ttest_ind
>>> ttest_ind(py2, py3)[1]
0.018729004515179636
i.e. if the timings came from the same distribution, a difference this large would be expected only about 2% of the time.
We can get a more precise picture by measuring the process time rather than the wall time. In Python 2 this can be done via time.clock while Python 3.3+ offers time.process_time. These two functions report the following timings:
py3_process_time = [224.4, 226.2, 224.0, 226.0, 226.2, 223.7, 223.8] # 224.90 +/- 1.09
py2_process_time = [171.0, 171.1, 171.2, 171.3, 170.9, 171.2, 171.4] # 171.16 +/- 0.16
Now there's much less spread in the data since the timings reflect the Python process only.
This data suggests that Python 3 takes about 53.7 seconds longer to execute. Given the large number of lines in the input file (550_000_000), this amounts to about 97.7 nanoseconds per iteration.
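For reference, here is a minimal sketch of how such a process-time measurement can be wrapped around the copy loop (time.process_time is the Python 3.3+ API and time.clock the Python 2 counterpart; the file names are the ones from the question's copy script):
import sys
import time

# Both timers report CPU time of this process only, so OS caching and other
# processes don't leak into the measurement the way they do with time.time.
get_time = time.process_time if sys.version_info >= (3, 3) else time.clock

t0 = get_time()
with open("numbers-large.txt") as fin, open("numbers-tmp.txt", "w") as fout:
    for line in fin:
        fout.write(line)
t1 = get_time()
print("process time: {:0.0f}s".format(t1 - t0))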
The first effect causing increased execution time is unicode strings in Python 3. The binary data is read from the file, decoded, and then encoded again when it is written back. In Python 2 all strings are stored as binary strings right away, so this doesn't introduce any encoding/decoding overhead. You don't see this effect clearly in your tests because it disappears in the large variation introduced by the various external resources reflected in the wall time difference. For example, we can measure the time it takes for a roundtrip from binary to unicode to binary:
In [1]: %timeit b'000000000000000000000000000000000000'.decode().encode()
162 ns ± 2 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
This does include two attribute lookups as well as two function calls, so the actual time needed is smaller than the value reported above. To see the effect on execution time, we can change the test script to use binary modes "rb" and "wb" instead of text modes "r" and "w". This reduces the timing results for Python 3 as follows:
py3_binary_mode = [200.6, 203.0, 207.2] # 203.60 +/- 2.73
That reduces the process time by about 21.3 seconds or 38.7 nanoseconds per iteration. This is in agreement with timing results for the roundtrip benchmark minus timing results for name lookups and function calls:
In [2]: class C:
...: def f(self): pass
...:
In [3]: x = C()
In [4]: %timeit x.f()
82.2 ns ± 0.882 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [5]: %timeit x
17.8 ns ± 0.0564 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)
Here %timeit x measures the additional overhead of resolving the global name x, and hence the attribute lookup and function call take 82.2 - 17.8 == 64.4 nanoseconds. Subtracting this overhead twice from the above roundtrip data gives 162 - 2*64.4 == 33.2 nanoseconds.
Now there's still a difference of 32.4 seconds between Python 3 using binary mode and Python 2. This comes from the fact that all the IO in Python 3 goes through the (quite complex) implementation of io.BufferedWriter.write, while in Python 2 the file.write method proceeds fairly directly to fwrite.
We can check the types of the file objects in both implementations:
$ python3.8
>>> type(open('/tmp/test', 'wb'))
<class '_io.BufferedWriter'>
$ python2.7
>>> type(open('/tmp/test', 'wb'))
<type 'file'>
Here we also need to note that the above timing results for Python 2 were obtained using text mode, not binary mode. Binary mode aims to support all objects implementing the buffer protocol, which results in additional work being performed for strings as well (see also this question). If we switch to binary mode for Python 2 as well, then we obtain:
py2_binary_mode = [212.9, 213.9, 214.3] # 213.70 +/- 0.59
which is actually a bit larger than the Python 3 results (18.4 ns / iteration).
The two implementations also differ in other details such as the dict implementation. To measure this effect we can create a corresponding setup:
from __future__ import print_function
import timeit

N = 10**6
R = 7

results = timeit.repeat(
    "d[b'10'].write",
    setup="d = dict.fromkeys((str(i).encode() for i in range(10, 100)), open('test', 'rb'))",  # requires file 'test' to exist
    repeat=R, number=N
)
results = [x/N for x in results]
print(['{:.3e}'.format(x) for x in results])
print(sum(results) / R)
This gives the following results for Python 2 and Python 3:
Python 2: ~ 56.9 nanoseconds
Python 3: ~ 78.1 nanoseconds
This additional difference of about 21.2 nanoseconds amounts to about 12 seconds for the full 550M iterations.
The above timing code checks the dict lookup for only one key, so we also need to verify that there are no hash collisions:
$ python3.8 -c "print(len({str(i).encode() for i in range(10, 100)}))"
90
$ python2.7 -c "print len({str(i).encode() for i in range(10, 100)})"
90
Using a list's insert function is much slower than achieving the same effect using slice assignment:
> python -m timeit -n 100000 -s "a=[]" "a.insert(0,0)"
100000 loops, best of 5: 19.2 usec per loop
> python -m timeit -n 100000 -s "a=[]" "a[0:0]=[0]"
100000 loops, best of 5: 6.78 usec per loop
(Note that a=[] is only the setup, so a starts empty but then grows to 100,000 elements.)
At first I thought it might be the attribute lookup or function call overhead, but inserting near the end shows that that's negligible:
> python -m timeit -n 100000 -s "a=[]" "a.insert(-1,0)"
100000 loops, best of 5: 79.1 nsec per loop
Why is the presumably simpler dedicated "insert single element" function so much slower?
I can also reproduce it at repl.it:
from timeit import repeat
for _ in range(3):
    for stmt in 'a.insert(0,0)', 'a[0:0]=[0]', 'a.insert(-1,0)':
        t = min(repeat(stmt, 'a=[]', number=10**5))
        print('%.6f' % t, stmt)
    print()
# Example output:
#
# 4.803514 a.insert(0,0)
# 1.807832 a[0:0]=[0]
# 0.012533 a.insert(-1,0)
#
# 4.967313 a.insert(0,0)
# 1.821665 a[0:0]=[0]
# 0.012738 a.insert(-1,0)
#
# 5.694100 a.insert(0,0)
# 1.899940 a[0:0]=[0]
# 0.012664 a.insert(-1,0)
I use Python 3.8.1 32-bit on Windows 10 64-bit.
repl.it uses Python 3.8.1 64-bit on Linux 64-bit.
I think it's probably just that they forgot to use memmove in list.insert. If you take a look at the code list.insert uses to shift elements, you can see it's just a manual loop:
for (i = n; --i >= where; )
    items[i+1] = items[i];
while list.__setitem__ on the slice assignment path uses memmove:
memmove(&item[ihigh+d], &item[ihigh],
        (k - ihigh)*sizeof(PyObject *));
memmove typically has a lot of optimization put into it, such as taking advantage of SSE/AVX instructions.
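To confirm that the gap really comes from how the existing elements are shifted, you can start from a large list so that both statements move roughly the same ~100,000 elements per call (a small sketch, not from the original post):
from timeit import repeat

# Every insert at index 0 shifts all existing elements; the slice-assignment
# path does that shift with memmove, list.insert with the manual C loop.
setup = "a = [0] * 100000"
for stmt in ("a.insert(0, 0)", "a[0:0] = [0]"):
    t = min(repeat(stmt, setup, number=1000))
    print("{:<16} {:.4f}s".format(stmt, t))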
I'm doing a simple Monte Carlo simulation exercise, using the ipcluster engines of IPython. I've noticed a huge difference in execution time based on how I define my function, and I'm wondering what the reason is. Here are the details:
When I define the task as below, it is fast:
def sample(n):
    return (rand(n)**2 + rand(n)**2 <= 1).sum()
When run in parallel:
from IPython.parallel import Client
rc = Client()
v = rc[:]
with v.sync_imports():
    from numpy.random import rand
n = 1000000
timeit -r 1 -n 1 print 4.* sum(v.map_sync(sample, [n]*len(v))) / (n*len(v))
3.141712
1 loops, best of 1: 53.4 ms per loop
But if I change the function to:
def sample(n):
    return sum(rand(n)**2 + rand(n)**2 <= 1)
I get:
3.141232
1 loops, best of 1: 3.81 s per loop
...which is 71 times slower. What could be the reason for this?
I can't go too in-depth, but the reason it is slower is that sum(<array>) is the built-in CPython sum function, whereas <numpy array>.sum() uses the numpy sum function, which is substantially faster than the built-in Python version.
I imagine you would get similar results if you replaced sum(<array>) with numpy.sum(<array>).
see numpy sum docs here: http://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html
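A quick way to see the gap on its own, outside the parallel setup (a sketch, not the original benchmark):
import numpy as np
from timeit import timeit

a = np.random.rand(1000000)
# Built-in sum iterates element by element, boxing each value into a Python
# float; a.sum() and np.sum() loop in C over the underlying buffer.
print("builtin sum :", timeit(lambda: sum(a), number=10))
print("a.sum()     :", timeit(lambda: a.sum(), number=10))
print("np.sum(a)   :", timeit(lambda: np.sum(a), number=10))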
I've been told os.path.join is horribly slow in python and I should use string concatenation ('%s/%s' % (x, y)) instead. Is there really that big a difference and if so how can I track it?
$ python -mtimeit -s 'import os.path' 'os.path.join("/root", "file")'
1000000 loops, best of 3: 1.02 usec per loop
$ python -mtimeit '"/root" + "file"'
10000000 loops, best of 3: 0.0223 usec per loop
So yes, it's nearly 50 times slower. 1 microsecond is still nothing though, so I really wouldn't factor the difference in. Use os.path.join: it's cross-platform, more readable and less bug-prone.
EDIT: Two people have now commented that the import explains the difference. This is not true: -s is a setup flag, so the import is not factored into the reported runtime. Read the docs.
I don't know who told you not to use it, but they're wrong.
Even if it were slow, it would never be slow to a program-breaking extent. I've never noticed it being remotely slow.
It's key to cross-platform programming. Path separators differ by platform, and os.path.join will always join paths correctly regardless of platform (see the short demonstration after these points).
Readability. Everyone knows what join is doing. People might have to do a double take for string concatenation for paths.
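A quick way to see the cross-platform point without switching machines is to call the platform-specific implementations directly; os.path is posixpath on Linux/macOS and ntpath on Windows:
import ntpath
import posixpath

# The same join call produces the right separator for each platform.
print(posixpath.join("root", "file"))   # root/file
print(ntpath.join("root", "file"))      # root\file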
Also be aware that dotted name lookups in function calls add measurable overhead. Compare:
python -mtimeit -s "import os.path;x=range(10)" "os.path.join(x)"
1000000 loops, best of 3: 0.405 usec per loop
python -mtimeit -s "from os.path import join;x=range(10)" "join(x)"
1000000 loops, best of 3: 0.29 usec per loop
So that's a slowdown of about 40% just from the extra attribute lookups in the dotted call.
Curiously, these two are different speeds:
$ python -mtimeit -s "from os.path import sep;join=sep.join;x=map(str,range(10))" "join(x)"
1000000 loops, best of 3: 0.253 usec per loop
$ python -mtimeit -s "from os.path import join;x=map(str,range(10))" "join(x)"
1000000 loops, best of 3: 0.285 usec per loop
It may be nearly 50 times faster, but unless you're doing it in a CPU-bound tight inner loop, the speed difference isn't going to matter at all. The portability difference, on the other hand, will determine whether or not your program can be easily ported to a non-Unix platform.
So, please use os.path.join unless you've profiled and discovered that it really is a major impediment to your program's performance.
You should use os.path.join simply for portability.
I don't get the point of comparing os.path.join (which works for any number of parts, on any platform) with something as trivial as string formatting two paths.
To answer the question in the title, "Is Python's os.path.join slow?" you have to at least compare it with a remotely similar function to find out what speed you can expect from a function like this.
As you can see below, compared to a similar function, there is nothing slow about os.path.join:
python -mtimeit -s "x = tuple(map(str, range(10)))" "'/'.join(x)"
1000000 loops, best of 3: 0.26 usec per loop
python -mtimeit -s "from os.path import join;x = tuple(range(10))" "join(x)"
1000000 loops, best of 3: 0.27 usec per loop
python -mtimeit -s "x = tuple(range(3))" "('/%s'*len(x)) % x"
1000000 loops, best of 3: 0.456 usec per loop
python -mtimeit -s "x = tuple(map(str, range(3)))" "'/'.join(x)"
10000000 loops, best of 3: 0.178 usec per loop
In this hot controversy, I dare to propose:
(I know, I know, there is timeit, but I'm not so practiced with timeit, and clock() seems to me to be sufficient for this case.)
import os
from time import clock
separ = os.sep
ospath = os.path
ospathjoin = os.path.join
A,B,C,D,E,F,G,H = [],[],[],[],[],[],[],[]
n = 1000
for essays in xrange(100):

    te = clock()
    for i in xrange(n):
        xa = os.path.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
    A.append(clock()-te)

    te = clock()
    for i in xrange(n):
        xb = ospath.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
    B.append(clock()-te)

    te = clock()
    for i in xrange(n):
        xc = ospathjoin('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
    C.append(clock()-te)

    te = clock()
    for i in xrange(n):
        xd = 'C:\WINNT\system32'+os.sep+'Microsoft\Crypto'+os.sep+'RSA\MachineKeys'
    D.append(clock()-te)

    te = clock()
    for i in xrange(n):
        xe = '%s\\%s\\%s' % ('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
    E.append(clock()-te)

    te = clock()
    for i in xrange(n):
        xf = 'C:\WINNT\system32'+separ+'Microsoft\Crypto'+separ+'RSA\MachineKeys'
    F.append(clock()-te)

    te = clock()
    for i in xrange(n):
        xg = os.sep.join(('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys'))
    G.append(clock()-te)

    te = clock()
    for i in xrange(n):
        xh = separ.join(('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys'))
    H.append(clock()-te)
print min(A), "os.path.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print min(B), "ospath.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print min(C), "ospathjoin('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print min(D), "'C:\WINNT\system32'+os.sep+'Microsoft\Crypto'+os.sep+'RSA\MachineKeys'"
print min(E), "'%s\\%s\\%s' % ('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print min(F), "'C:\WINNT\system32'+separ+'Microsoft\Crypto'+separ+'RSA\MachineKeys'"
print min(G), "os.sep.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print min(H), "separ.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print 'xa==xb==xc==xd==xe==xf==xg==xh==',xa==xb==xc==xd==xe==xf==xg==xh
result
0.0284533369465 os.path.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
0.0277652606686 ospath.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
0.0272489939364 ospathjoin('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
0.00398598145854 'C:\WINNT\system32'+os.sep+'Microsoft\Crypto'+os.sep+'RSA\MachineKeys'
0.00375075603184 '%s\%s\%s' % ('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
0.00330824168994 'C:\WINNT\system32'+separ+'Microsoft\Crypto'+separ+'RSA\MachineKeys'
0.00292467338726 os.sep.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
0.00261401937956 separ.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
True
with
separ = os.sep
ospath = os.path
ospathjoin = os.path.join
Everyone should know one non-obvious feature of os.path.join():
os.path.join( 'a', 'b' ) == 'a/b'
os.path.join( 'a', '/b' ) == '/b'
Given the string below (or any string made up of a whole number of word pairs):
"12345678"
How would I swap adjacent "words"?
The result I want is
"34127856"
Then, when that's done, I also need to swap the longs.
The result I want is:
"78563412"
A regex approach:
import re
twopairs = re.compile(r'(..)(..)')
stringwithswappedwords = twopairs.sub(r'\2\1', basestring)
twoquads = re.compile(r'(....)(....)')
stringwithswappedlongs = twoquads.sub(r'\2\1', stringwithswappedwords)
Edit:
However, this is definitely not the fastest approach in Python -- here's how one finds out about such things: first, write all "competing" approaches into a module, here I'm calling it 'swa.py'...:
import re
twopairs = re.compile(r'(..)(..)')
twoquads = re.compile(r'(....)(....)')
def withre(basestring, twopairs=twopairs, twoquads=twoquads):
    stringwithswappedwords = twopairs.sub(r'\2\1', basestring)
    return twoquads.sub(r'\2\1', stringwithswappedwords)

def withoutre(basestring):
    asalist = list(basestring)
    asalist.reverse()
    for i in range(0, len(asalist), 2):
        asalist[i+1], asalist[i] = asalist[i], asalist[i+1]
    return ''.join(asalist)
s = '12345678'
print withre(s)
print withoutre(s)
Note that I set s and try out the two approaches for a fast sanity check that they're actually computing the same result -- good practice, in general, for this kind of head-to-head performance race!
Then, at the shell prompt, you use timeit, as follows:
$ python -mtimeit -s'import swa' 'swa.withre(swa.s)'
78563412
78563412
10000 loops, best of 3: 42.2 usec per loop
$ python -mtimeit -s'import swa' 'swa.withoutre(swa.s)'
78563412
78563412
100000 loops, best of 3: 9.84 usec per loop
...and you find that in this case the RE-less approach is about 4 times faster, a worthwhile optimization. Once you have such a "measurement harness" in place, it's also easy to experiment with further alternatives and tweaks, if there is any need for "really blazing speed" in this operation, of course.
Edit: for example, here's an even faster approach (add to the same swa.py, with a final line of print faster(s) of course;-):
def faster(basestring):
    asal = [basestring[i:i+2]
            for i in range(0, len(basestring), 2)]
    asal.reverse()
    return ''.join(asal)
This gives:
$ python -mtimeit -s'import swa' 'swa.faster(swa.s)'
78563412
78563412
78563412
100000 loops, best of 3: 5.58 usec per loop
About 5.6 microseconds, down from about 9.8 for the simplest RE-less approach, is another possibly-worthwhile micro-optimization.
And so on, of course -- there's an old folk (pseudo)theorem that says that any program can be made at least one byte shorter and at least one nanosecond faster...;-)
Edit: and to "prove" the pseudotheorem, here's a completely different approach (replace the end of swa.py)...:
import array

def witharray(basestring):
    a2 = array.array('H', basestring)
    a2.reverse()
    return a2.tostring()

s = '12345678'
# print withre(s)
# print withoutre(s)
print faster(s)
print witharray(s)
This gives:
$ python -mtimeit -s'import swa' 'swa.witharray(swa.s)'
78563412
78563412
100000 loops, best of 3: 3.01 usec per loop
which is a further, possibly worthwhile, speedup.
import re
re.sub(r'(..)(..)', r'\2\1', '12345678')
re.sub(r'(....)(....)', r'\2\1', '34127856')
Just for the string "12345678":
from textwrap import wrap
s="12345678"
t=wrap(s,len(s)/2)
a,b=wrap(t[0],len(t[0])/2)
c,d=wrap(t[1],len(t[1])/2)
a,b=b,a
c,d=d,c
print a+b+c+d
You can turn this into a generic function that handles variable-length strings (see the sketch after the output below).
output
$ ./python.py
34127856
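A possible generic version (just a sketch; the function name is made up, and it assumes the string length is a multiple of twice the chunk size):
def swap_adjacent_chunks(s, size):
    # Split s into size-character chunks and swap each adjacent pair of chunks.
    chunks = [s[i:i + size] for i in range(0, len(s), size)]
    swapped = []
    for a, b in zip(chunks[0::2], chunks[1::2]):
        swapped += [b, a]
    return ''.join(swapped)

print(swap_adjacent_chunks("12345678", 2))                           # 34127856
print(swap_adjacent_chunks(swap_adjacent_chunks("12345678", 2), 4))  # 78563412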
>>> import re
>>> re.sub("(..)(..)","\\2\\1","12345678")
'34127856'
>>> re.sub("(....)(....)","\\2\\1","34127856")
'78563412'
If you want to do endianness conversion, use Python's struct module on the original binary data.
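For example, if the eight characters are the hex digits of a 32-bit value, a byte-order swap with struct could look like this (a Python 3 sketch, assuming hex input):
import struct

raw = bytes.fromhex("12345678")      # four bytes: 0x12 0x34 0x56 0x78
# Read them as one big-endian unsigned 32-bit int, write it back little-endian.
(value,) = struct.unpack(">I", raw)
swapped = struct.pack("<I", value)
print(swapped.hex())                 # 78563412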
If that is not your goal, here's simple sample code to rearrange one 8-character string:
def wordpairswapper(s):
    return s[6:8] + s[4:6] + s[2:4] + s[0:2]
I'm using the following approach:
data = "deadbeef"
if len(data) == 4: #2 bytes, 4 characters
value = socket.ntohs(int(data, 16))
elif len(data) >= 8:
value = socket.ntohl(int(data, 16))
else:
value = int(data, 16)
works for me!