Is Python's os.path.join slow?

I've been told os.path.join is horribly slow in python and I should use string concatenation ('%s/%s' % (x, y)) instead. Is there really that big a difference and if so how can I track it?

$ python -mtimeit -s 'import os.path' 'os.path.join("/root", "file")'
1000000 loops, best of 3: 1.02 usec per loop
$ python -mtimeit '"/root" + "file"'
10000000 loops, best of 3: 0.0223 usec per loop
So yes, it's nearly 50 times slower. But 1 microsecond is still nothing, so I really wouldn't let the difference factor into your decision. Use os.path.join: it's cross-platform, more readable, and less bug-prone.
EDIT: Two people have now commented that the import explains the difference. This is not true: -s is a setup flag, so the import is not factored into the reported timing. Read the docs.

I don't know who told you not to use it, but they're wrong.
Even if it were slow, it would never be slow to a program-breaking extent. I've never noticed it being remotely slow.
It's key to cross-platform programming. Path separators differ by platform, and os.path.join will always join paths correctly regardless of the platform.
Readability. Everyone knows what join is doing; people might have to do a double take when string concatenation is used to build paths.
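As a quick illustration of the bug-prone point above (a minimal sketch; the paths are made up), naive formatting happily produces a doubled separator when a component already ends with one, while os.path.join normalizes the join:

import os.path

root = "/var/log/"                # note the trailing separator
name = "syslog"

print("%s/%s" % (root, name))     # /var/log//syslog  (doubled separator)
print(os.path.join(root, name))   # /var/log/syslog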

Also be aware that attribute lookups (the dots in os.path.join) add measurable overhead. Compare:
python -mtimeit -s "import os.path;x=range(10)" "os.path.join(x)"
1000000 loops, best of 3: 0.405 usec per loop
python -mtimeit -s "from os.path import join;x=range(10)" "join(x)"
1000000 loops, best of 3: 0.29 usec per loop
So that's roughly a 40% slowdown caused purely by the attribute lookups in the call.
Curiously, these two are different speeds:
$ python -mtimeit -s "from os.path import sep;join=sep.join;x=map(str,range(10))" "join(x)"
1000000 loops, best of 3: 0.253 usec per loop
$ python -mtimeit -s "from os.path import join;x=map(str,range(10))" "join(x)"
1000000 loops, best of 3: 0.285 usec per loop

It may be nearly 50 times faster, but unless you're doing it in a CPU-bound tight inner loop, the speed difference isn't going to matter at all. The portability difference, on the other hand, will determine whether or not your program can easily be ported to a non-Unix platform.
So, please use os.path.join unless you've profiled and discovered that it really is a major impediment to your program's performance.
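If you do want to check this in the context of a real program rather than a micro-benchmark, a minimal cProfile sketch along these lines (build_paths is a hypothetical stand-in for your own path-heavy code) will show whether joining even registers as a hotspot:

import cProfile
import os.path

def build_paths(n):
    # Hypothetical stand-in for the path-heavy part of a program.
    return [os.path.join("/root", "dir%d" % i, "file") for i in range(n)]

cProfile.run("build_paths(100000)", sort="cumulative")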

You should use os.path.join simply for portability.
I don't get the point of comparing os.path.join (which works for any number of parts, on any platform) with something as trivial as string-formatting two path components.
To answer the question in the title, "Is Python's os.path.join slow?" you have to at least compare it with a remotely similar function to find out what speed you can expect from a function like this.
As you can see below, compared to a similar function, there is nothing slow about os.path.join:
python -mtimeit -s "x = tuple(map(str, range(10)))" "'/'.join(x)"
1000000 loops, best of 3: 0.26 usec per loop
python -mtimeit -s "from os.path import join;x = tuple(range(10))" "join(x)"
1000000 loops, best of 3: 0.27 usec per loop
python -mtimeit -s "x = tuple(range(3))" "('/%s'*len(x)) % x"
1000000 loops, best of 3: 0.456 usec per loop
python -mtimeit -s "x = tuple(map(str, range(3)))" "'/'.join(x)"
10000000 loops, best of 3: 0.178 usec per loop

In this hot controversy, I dare to propose:
(I know, I know, there is timeit, but I'm not so well versed in timeit, and clock() seems sufficient for this case.)
import os
from time import clock
separ = os.sep
ospath = os.path
ospathjoin = os.path.join
A,B,C,D,E,F,G,H = [],[],[],[],[],[],[],[]
n = 1000
for essays in xrange(100):

    te = clock()
    for i in xrange(n):
        xa = os.path.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
    A.append(clock()-te)

    te = clock()
    for i in xrange(n):
        xb = ospath.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
    B.append(clock()-te)

    te = clock()
    for i in xrange(n):
        xc = ospathjoin('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
    C.append(clock()-te)

    te = clock()
    for i in xrange(n):
        xd = 'C:\WINNT\system32'+os.sep+'Microsoft\Crypto'+os.sep+'RSA\MachineKeys'
    D.append(clock()-te)

    te = clock()
    for i in xrange(n):
        xe = '%s\\%s\\%s' % ('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
    E.append(clock()-te)

    te = clock()
    for i in xrange(n):
        xf = 'C:\WINNT\system32'+separ+'Microsoft\Crypto'+separ+'RSA\MachineKeys'
    F.append(clock()-te)

    te = clock()
    for i in xrange(n):
        xg = os.sep.join(('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys'))
    G.append(clock()-te)

    te = clock()
    for i in xrange(n):
        xh = separ.join(('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys'))
    H.append(clock()-te)
print min(A), "os.path.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print min(B), "ospath.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print min(C), "ospathjoin('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print min(D), "'C:\WINNT\system32'+os.sep+'Microsoft\Crypto'+os.sep+'RSA\MachineKeys'"
print min(E), "'%s\\%s\\%s' % ('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print min(F), "'C:\WINNT\system32'+separ+'Microsoft\Crypto'+separ+'RSA\MachineKeys'"
print min(G), "os.sep.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print min(H), "separ.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print 'xa==xb==xc==xd==xe==xf==xg==xh==',xa==xb==xc==xd==xe==xf==xg==xh
result
0.0284533369465 os.path.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
0.0277652606686 ospath.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
0.0272489939364 ospathjoin('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
0.00398598145854 'C:\WINNT\system32'+os.sep+'Microsoft\Crypto'+os.sep+'RSA\MachineKeys'
0.00375075603184 '%s\%s\%s' % ('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
0.00330824168994 'C:\WINNT\system32'+separ+'Microsoft\Crypto'+separ+'RSA\MachineKeys'
0.00292467338726 os.sep.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
0.00261401937956 separ.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
True
with
separ = os.sep
ospath = os.path
ospathjoin = os.path.join

Everyone should know one non-obvious feature of os.path.join():
os.path.join( 'a', 'b' ) == 'a/b'
os.path.join( 'a', '/b' ) == '/b'
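A quick interactive check (assuming POSIX-style separators) makes it concrete -- an absolute component discards everything joined before it:

>>> import os.path
>>> os.path.join('a', 'b', 'c')
'a/b/c'
>>> os.path.join('a', '/b', 'c')
'/b/c'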

Related

TypeError: 'dict' object is not callable in Python3 OR Python2, resulting in a memory error [duplicate]

Given:
>>> d = {'a': 1, 'b': 2}
Which of the following is the best way to check if 'a' is in d?
>>> 'a' in d
True
>>> d.has_key('a')
True
in is definitely more pythonic.
In fact has_key() was removed in Python 3.x.
in wins hands-down, not just in elegance (and not being deprecated;-) but also in performance, e.g.:
$ python -mtimeit -s'd=dict.fromkeys(range(99))' '12 in d'
10000000 loops, best of 3: 0.0983 usec per loop
$ python -mtimeit -s'd=dict.fromkeys(range(99))' 'd.has_key(12)'
1000000 loops, best of 3: 0.21 usec per loop
While the following observation is not always true, you'll notice that usually, in Python, the faster solution is more elegant and Pythonic; that's why -mtimeit is SO helpful -- it's not just about saving a hundred nanoseconds here and there!-)
According to the Python docs:
has_key() is deprecated in favor of
key in d.
Use dict.has_key() if (and only if) your code is required to be runnable by Python versions earlier than 2.3 (when key in dict was introduced).
There is one case where in can actually hurt your performance.
If you use in on an O(1) container that only implements __getitem__ and has_key() but not __contains__, you will turn an O(1) search into an O(N) search (since in falls back to a linear scan via __getitem__).
Fix is obviously trivial:
def __contains__(self, x):
    return self.has_key(x)
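For example, here is a minimal sketch of the kind of container being described (the Roster class is made up), with the fix applied at the end:

class Roster(object):
    """Container with O(1) membership via has_key and indexed access via
    __getitem__.  Without __contains__, `name in roster` falls back to
    iterating __getitem__(0), __getitem__(1), ... -- an O(N) scan."""
    def __init__(self, names):
        self._names = list(names)
        self._index = set(names)

    def __getitem__(self, i):
        return self._names[i]        # raises IndexError past the end

    def has_key(self, name):
        return name in self._index   # O(1)

    # The trivial fix: route `in` straight to the O(1) lookup.
    def __contains__(self, name):
        return self.has_key(name)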
Solution to "dict.has_key() is deprecated, use 'in'" (as reported by, e.g., Sublime Text 3):
Here is an example using a dictionary named ages:
ages = {}

# Add a couple of names to the dictionary
ages['Sue'] = 23
ages['Peter'] = 19
ages['Andrew'] = 78
ages['Karren'] = 45

# Use 'in' in the if condition instead of ages.has_key(key_name).
if 'Sue' in ages:
    print "Sue is in the dictionary. She is", ages['Sue'], "years old"
else:
    print "Sue is not in the dictionary"
Expanding on Alex Martelli's performance tests with Adam Parkin's comments...
$ python3.5 -mtimeit -s'd=dict.fromkeys(range( 99))' 'd.has_key(12)'
Traceback (most recent call last):
File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/timeit.py", line 301, in main
x = t.timeit(number)
File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/timeit.py", line 178, in timeit
timing = self.inner(it, self.timer)
File "<timeit-src>", line 6, in inner
d.has_key(12)
AttributeError: 'dict' object has no attribute 'has_key'
$ python2.7 -mtimeit -s'd=dict.fromkeys(range( 99))' 'd.has_key(12)'
10000000 loops, best of 3: 0.0872 usec per loop
$ python2.7 -mtimeit -s'd=dict.fromkeys(range(1999))' 'd.has_key(12)'
10000000 loops, best of 3: 0.0858 usec per loop
$ python3.5 -mtimeit -s'd=dict.fromkeys(range( 99))' '12 in d'
10000000 loops, best of 3: 0.031 usec per loop
$ python3.5 -mtimeit -s'd=dict.fromkeys(range(1999))' '12 in d'
10000000 loops, best of 3: 0.033 usec per loop
$ python3.5 -mtimeit -s'd=dict.fromkeys(range( 99))' '12 in d.keys()'
10000000 loops, best of 3: 0.115 usec per loop
$ python3.5 -mtimeit -s'd=dict.fromkeys(range(1999))' '12 in d.keys()'
10000000 loops, best of 3: 0.117 usec per loop
has_key is a dictionary method, but in works on any collection; even when __contains__ is missing, in falls back to iterating the collection (via __iter__ or indexed __getitem__) to find out.
If you have something like this:
t.has_key(ew)
change it to the equivalent membership test so it runs on Python 3.x and above:
ew in t

Why is a.insert(0,0) much slower than a[0:0]=[0]?

Using a list's insert function is much slower than achieving the same effect using slice assignment:
> python -m timeit -n 100000 -s "a=[]" "a.insert(0,0)"
100000 loops, best of 5: 19.2 usec per loop
> python -m timeit -n 100000 -s "a=[]" "a[0:0]=[0]"
100000 loops, best of 5: 6.78 usec per loop
(Note that a=[] is only the setup, so a starts empty but then grows to 100,000 elements.)
At first I thought maybe it's the attribute lookup or function call overhead, but inserting near the end shows that that's negligible:
> python -m timeit -n 100000 -s "a=[]" "a.insert(-1,0)"
100000 loops, best of 5: 79.1 nsec per loop
Why is the presumably simpler dedicated "insert single element" function so much slower?
I can also reproduce it at repl.it:
from timeit import repeat

for _ in range(3):
    for stmt in 'a.insert(0,0)', 'a[0:0]=[0]', 'a.insert(-1,0)':
        t = min(repeat(stmt, 'a=[]', number=10**5))
        print('%.6f' % t, stmt)
    print()
# Example output:
#
# 4.803514 a.insert(0,0)
# 1.807832 a[0:0]=[0]
# 0.012533 a.insert(-1,0)
#
# 4.967313 a.insert(0,0)
# 1.821665 a[0:0]=[0]
# 0.012738 a.insert(-1,0)
#
# 5.694100 a.insert(0,0)
# 1.899940 a[0:0]=[0]
# 0.012664 a.insert(-1,0)
I use Python 3.8.1 32-bit on Windows 10 64-bit.
repl.it uses Python 3.8.1 64-bit on Linux 64-bit.
I think it's probably just that they forgot to use memmove in list.insert. If you take a look at the code list.insert uses to shift elements, you can see it's just a manual loop:
for (i = n; --i >= where; )
    items[i+1] = items[i];
while list.__setitem__ on the slice assignment path uses memmove:
memmove(&item[ihigh+d], &item[ihigh],
        (k - ihigh)*sizeof(PyObject *));
memmove typically has a lot of optimization put into it, such as taking advantage of SSE/AVX instructions.
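To make the two access patterns concrete, here is a pure-Python model of each (illustrative only; the real work happens in C, where only the slice path is a single bulk memmove):

def insert_like_loop(items, where, value):
    # Mimics list.insert: shift elements one store at a time.
    items.append(None)                          # grow by one slot
    for i in range(len(items) - 2, where - 1, -1):
        items[i + 1] = items[i]
    items[where] = value

def insert_like_slice(items, where, value):
    # Mimics the slice-assignment path: one bulk move.
    items[where:where] = [value]

a, b = [1, 2, 3], [1, 2, 3]
insert_like_loop(a, 0, 0)
insert_like_slice(b, 0, 0)
assert a == b == [0, 1, 2, 3]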

Format strings vs concatenation

I see many people using format strings like this:
root = "sample"
output = "output"
path = "{}/{}".format(root, output)
Instead of simply concatenating strings like this:
path = root + '/' + output
Do format strings have better performance or is this just for looks?
It's just for the looks. You can see at one glance what the format is. Many of us like readability better than micro-optimization.
Let's see what IPython's %timeit says:
Python 3.7.2 (default, Jan 3 2019, 02:55:40)
IPython 5.8.0
Intel(R) Core(TM) i5-4590T CPU @ 2.00GHz
In [1]: %timeit root = "sample"; output = "output"; path = "{}/{}".format(root, output)
The slowest run took 12.44 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 223 ns per loop
In [2]: %timeit root = "sample"; output = "output"; path = root + '/' + output
The slowest run took 13.82 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 5: 101 ns per loop
In [3]: %timeit root = "sample"; output = "output"; path = "%s/%s" % (root, output)
The slowest run took 27.97 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 5: 155 ns per loop
In [4]: %timeit root = "sample"; output = "output"; path = f"{root}/{output}"
The slowest run took 19.52 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 5: 77.8 ns per loop
I agree that formatting is mostly used for readability, but since the release of f-strings in 3.6 the tables have turned in terms of performance. It is also my opinion that f-strings are more readable/maintainable since 1) they can be read left to right like most regular text and 2) the spacing-related disadvantages of concatenation are avoided since the variables sit inside the string.
Running this code:
from timeit import timeit

runs = 1000000

def print_results(time, start_string):
    print(f'{start_string}\n'
          f'Total: {time:.4f}s\n'
          f'Avg: {(time/runs)*1000000000:.4f}ns\n')

t1 = timeit('"%s, %s" % (greeting, loc)',
            setup='greeting="hello";loc="world"',
            number=runs)
t2 = timeit('f"{greeting}, {loc}"',
            setup='greeting="hello";loc="world"',
            number=runs)
t3 = timeit('greeting + ", " + loc',
            setup='greeting="hello";loc="world"',
            number=runs)
t4 = timeit('"{}, {}".format(greeting, loc)',
            setup='greeting="hello";loc="world"',
            number=runs)

print_results(t1, '% replacement')
print_results(t2, 'f strings')
print_results(t3, 'concatenation')
print_results(t4, '.format method')
yields this result on my machine:
% replacement
Total: 0.3044s
Avg: 304.3638ns
f strings
Total: 0.0991s
Avg: 99.0777ns
concatenation
Total: 0.1252s
Avg: 125.2442ns
.format method
Total: 0.3483s
Avg: 348.2690ns
A similar comparison is given in an answer to a related question.
As with most things, there will be a performance difference, but ask yourself "Does it really matter if this is a few nanoseconds faster?". The root + '/' + output method is quick and easy to type out, but it gets hard to read very quickly once you have multiple variables to print:
foo = "X = " + str(myX) + " | Y = " + str(someY) + " | Z = " + str(Z)
vs
foo = "X = {} | Y = {} | Z = {}".format(myX, someY, Z)
Which is easier to understand at a glance? Unless you really need to eke out performance, choose the way that will be easiest for people to read and understand.
It's not just for "looks", or for powerful lexical type conversions; it's also a must for internationalisation.
You can swap out the format string depending on what language is selected.
With a long line of string concatenations baked into the source code, this becomes effectively impossible to do properly.
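A minimal sketch of that idea (the translation strings are made up): the format string becomes data looked up per language, and named placeholders let word order differ between languages:

GREETINGS = {
    "en": "{name} has {count} new messages",
    "de": "{name} hat {count} neue Nachrichten",
}

def greeting(lang, name, count):
    return GREETINGS[lang].format(name=name, count=count)

print(greeting("de", "Anna", 3))   # Anna hat 3 neue Nachrichten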
As of Python 3.6 you can do literal string interpolation by prepending f to the string:
foo = "foo"
bar = "bar"
path = f"{foo}/{bar}"
String formatting handles the type conversion for you while binding the data, whereas with concatenation you have to cast or convert the values yourself.
For example:
a = 10
b = "foo"
c = str(a) + " " + b
print c
> 10 foo
It could be done via string formatting as:
a = 10
b = "foo"
c = "{} {}".format(a, b)
print c
> 10 foo
Within the placeholders {} {}, the two values passed to format -- here a and b -- are substituted in order.
It's for looks and for the maintainability of the code. Code that uses format is genuinely easier to edit, and with + it is easy to miss details like spaces. Use format for your own sake and that of future maintainers.

Execution Time Difference based on sum() on IPython

I'm doing a simple Monte Carlo simulation exercise, using ipcluster engines in IPython. I've noticed a huge difference in execution time based on how I define my function, and I'd like to know the reason for it. Here are the details:
When I define the task as below, it is fast:
def sample(n):
    return (rand(n)**2 + rand(n)**2 <= 1).sum()
When run in parallel:
from IPython.parallel import Client
rc = Client()
v = rc[:]
with v.sync_imports():
    from numpy.random import rand
n = 1000000
timeit -r 1 -n 1 print 4.* sum(v.map_sync(sample, [n]*len(v))) / (n*len(v))
3.141712
1 loops, best of 1: 53.4 ms per loop
But if I change the function to:
def sample(n):
    return sum(rand(n)**2 + rand(n)**2 <= 1)
I get:
3.141232
1 loops, best of 1: 3.81 s per loop
...which is 71 times slower. What can be the reason for this?
I can't go too in-depth, but the reason it is slower is because sum(<array>) is the built-in CPython sum function, whereas your <numpy array>.sum() is using the numpy sum function, which is substantially faster than the built-in python version.
I imagine you would get similar results if you replaced sum(<array>) with numpy.sum(<array>).
see numpy sum docs here: http://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html
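A quick way to see the gap outside the parallel setup (a minimal sketch; exact numbers will vary by machine) is to time the variants directly:

from timeit import timeit
from numpy.random import rand
import numpy as np

x = rand(1000000)

print(timeit(lambda: sum(x), number=10))     # built-in sum: one Python-level iteration per element
print(timeit(lambda: np.sum(x), number=10))  # numpy's C loop
print(timeit(lambda: x.sum(), number=10))    # same C loop, as a method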

How to swap adjacent bytes in a string of hex bytes (with or without regular expressions)

Given the string (or any string containing an even number of word pairs):
"12345678"
How would I swap adjacent "words"?
The result I want is
"34127856"
And once that's done, I also need to swap the longs.
The result I want is:
"78563412"
A regex approach:
import re
twopairs = re.compile(r'(..)(..)')
stringwithswappedwords = twopairs.sub(r'\2\1', basestring)
twoquads = re.compile(r'(....)(....)')
stringwithswappedlongs = twoquads.sub(r'\2\1', stringwithswappedwords)
Edit:
However, this is definitely not the fastest approach in Python -- here's how one finds out about such things: first, write all "competing" approaches into a module, here I'm calling it 'swa.py'...:
import re

twopairs = re.compile(r'(..)(..)')
twoquads = re.compile(r'(....)(....)')

def withre(basestring, twopairs=twopairs, twoquads=twoquads):
    stringwithswappedwords = twopairs.sub(r'\2\1', basestring)
    return twoquads.sub(r'\2\1', stringwithswappedwords)

def withoutre(basestring):
    asalist = list(basestring)
    asalist.reverse()
    for i in range(0, len(asalist), 2):
        asalist[i+1], asalist[i] = asalist[i], asalist[i+1]
    return ''.join(asalist)

s = '12345678'
print withre(s)
print withoutre(s)
Note that I set s and try out the two approaches for a fast sanity check that they're actually computing the same result -- good practice, in general, for this kind of head-to-head performance race!
Then, at the shell prompt, you use timeit, as follows:
$ python -mtimeit -s'import swa' 'swa.withre(swa.s)'
78563412
78563412
10000 loops, best of 3: 42.2 usec per loop
$ python -mtimeit -s'import swa' 'swa.withoutre(swa.s)'
78563412
78563412
100000 loops, best of 3: 9.84 usec per loop
...and you find that in this case the RE-less approach is about 4 times faster, a worthwhile optimization. Once you have such a "measurement harness" in place, it's also easy to experiment with further alternatives and tweaks for further optimization, if there is any need for "really blazing speed" in this operation, of course.
Edit: for example, here's an even faster approach (add to the same swa.py, with a final line of print faster(s) of course;-):
def faster(basestring):
    asal = [basestring[i:i+2]
            for i in range(0, len(basestring), 2)]
    asal.reverse()
    return ''.join(asal)
This gives:
$ python -mtimeit -s'import swa' 'swa.faster(swa.s)'
78563412
78563412
78563412
100000 loops, best of 3: 5.58 usec per loop
About 5.6 microseconds, down from about 9.8 for the simplest RE-less approach, is another possibly-worthwhile micro-optimization.
And so on, of course -- there's an old folk (pseudo)theorem that says that any program can be made at least one byte shorter and at least one nanosecond faster...;-)
Edit: and to "prove" the pseudotheorem, here's a completely different approach (replace the end of swa.py)...:
import array

def witharray(basestring):
    a2 = array.array('H', basestring)
    a2.reverse()
    return a2.tostring()

s = '12345678'
# print withre(s)
# print withoutre(s)
print faster(s)
print witharray(s)
This gives:
$ python -mtimeit -s'import swa' 'swa.witharray(swa.s)'
78563412
78563412
100000 loops, best of 3: 3.01 usec per loop
for a further possibly-worthwhile speedup.
import re
re.sub(r'(..)(..)', r'\2\1', '12345678')
re.sub(r'(....)(....)', r'\2\1', '34127856')
Just for the string "12345678":
from textwrap import wrap
s="12345678"
t=wrap(s,len(s)/2)
a,b=wrap(t[0],len(t[0])/2)
c,d=wrap(t[1],len(t[1])/2)
a,b=b,a
c,d=d,c
print a+b+c+d
You can turn this into a generic function to handle variable-length strings.
output
$ ./python.py
34127856
>>> import re
>>> re.sub("(..)(..)","\\2\\1","12345678")
'34127856'
>>> re.sub("(....)(....)","\\2\\1","34127856")
'78563412'
If you want to do endianness conversion, use Python's struct module on the original binary data.
If that is not your goal, here's a simple sample code to rearrange one 8 character string:
def wordpairswapper(s):
    return s[6:8] + s[4:6] + s[2:4] + s[0:2]
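If the hex string really stands for binary data and the goal is byte-order conversion, here is a small sketch of the struct route mentioned above (assuming the eight hex digits encode a single 32-bit value):

import binascii
import struct

raw = binascii.unhexlify("12345678")    # the 4 raw bytes 0x12 0x34 0x56 0x78
value, = struct.unpack(">I", raw)       # read as a big-endian unsigned int
swapped = struct.pack("<I", value)      # re-pack little-endian
print(binascii.hexlify(swapped))        # 78563412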
I'm using the following approach:
data = "deadbeef"
if len(data) == 4: #2 bytes, 4 characters
value = socket.ntohs(int(data, 16))
elif len(data) >= 8:
value = socket.ntohl(int(data, 16))
else:
value = int(data, 16)
works for me!
