Store Values through Array or Use Function? (to optimize) - python

In Python, I have a variable x (dependent on another variable i) that takes the following values:
x = 0.5 for i=11 or i=111
x = 1 for 12<=i<=100 or 112<=i<=200
x = 0 for the rest of the values of i (i takes integer values from 1 to 300)
I wish to use the value of x many times inside a loop (where i is NOT the iterator). Which would be the better (computation-time-saving) way to store the values of x: an array or a function?
I can store them in an array of length 300 and assign the above values, or I can have a function get_x that takes the value of i as input and returns the above values according to if conditions.
I want to optimize my code for time. Which way would be better? (I am trying to implement this in Python and in MATLAB as well.)

The answer is entirely dependent on your application, and anyway I would tend towards the philosophy of avoiding premature optimization. Probably just implement it in whichever way looks cleanest or makes the most sense to you, and if it ends up being too slow try something different.
But if you really do insist upon seeing real results, let's take a look. Here's the code I used to run this:
import time
import random

def get_x(i):
    if i == 11 or i == 111:
        return 0.5
    if (12 <= i and i <= 100) or (112 <= i and i <= 200):
        return 1
    return 0

x_array = [get_x(i) for i in range(300)]
i_array = [i % 300 for i in range(1000000)]

print "Sequential access"
start = time.time()
for i in i_array:
    x = get_x(i)
end = time.time()
print "get_x method:", end-start

start = time.time()
for i in i_array:
    x = x_array[i]
end = time.time()
print "Array method:", end-start
print

random.seed(123)
i_array = [random.randint(0,299) for i in range(1000000)]

print "Random access"
start = time.time()
for i in i_array:
    x = get_x(i)
end = time.time()
print "get_x method:", end-start

start = time.time()
for i in i_array:
    x = x_array[i]
end = time.time()
print "Array method:", end-start
Here's what it prints out:
Sequential access
get_x method: 0.264999866486
Array method: 0.108999967575
Random access
get_x method: 0.263000011444
Array method: 0.108999967575
Overall, neither method is very slow: this is for 10^6 accesses, and both easily complete within a quarter of a second. The get_x method does appear to be about twice as slow as the array method. However, this will not be the slow part of your loop logic; whatever else you put in that loop will almost certainly dominate your program's execution time. You should choose the method that makes your code easier to maintain, which is probably the get_x method.

I would prefer the function. It might be a little slower, since each call creates a new stack frame, but it will be more readable, and as the Zen of Python says: "Simple is better than complex."

Precomputing x and then doing a lookup will be faster per lookup, but it only pays for itself if enough lookups are done to outweigh the cost of precomputing all the values each time the program runs. The break-even point could be computed from benchmarks, but it is probably not worth the effort. Another strategy is to skip precomputation and instead memoize or cache each result as it is computed. Some caching back ends, such as Redis and memcached, even allow persistence between program runs.
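For the in-process variant of that idea, here is a minimal sketch using functools.lru_cache (assumes Python 3.2+; persistent stores like Redis are beyond the scope of this example). The value ranges come from the question; everything else is illustration only.
from functools import lru_cache  # assumption: Python 3.2+ is available

@lru_cache(maxsize=None)         # cache each distinct i the first time it is computed
def get_x(i):
    if i == 11 or i == 111:
        return 0.5
    if 12 <= i <= 100 or 112 <= i <= 200:
        return 1
    return 0

# The first call for a given i runs the if-chain; repeated calls are dictionary lookups.
print(get_x(11))   # 0.5, computed
print(get_x(11))   # 0.5, served from the cache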
Based on testing with timeit, list access is between 3.5 and 7.3 times faster than computing the value, depending on i. Below are some test results.
def f(i):
    if i == 11 or i == 111:
        return .5
    else:
        if i >= 12 and i <= 100 or i >= 112 and i <= 200:
            return 1
    return 0
timeit f(11)
10000000 loops, best of 3: 139 ns per loop
timeit f(24)
1000000 loops, best of 3: 215 ns per loop
timeit f(150)
1000000 loops, best of 3: 249 ns per loop
timeit f(105)
1000000 loops, best of 3: 267 ns per loop
timeit f(237)
1000000 loops, best of 3: 289 ns per loop
x = range(300)
timeit x[150]
10000000 loops, best of 3: 39.5 ns per loop
timeit x[1]
10000000 loops, best of 3: 39.7 ns per loop
timeit x[299]
10000000 loops, best of 3: 39.7 ns per loop

Related

Python In Operator - Short Circuit

I was reading an interesting post on short-circuiting in Python and wondered whether this also applies to the in operator. My simple testing would suggest that it does not:
%%timeit -n 1000
0 in list(range(10))
1000 loops, best of 3: 639 ns per loop
%%timeit -n 1000
0 in list(range(1000))
1000 loops, best of 3: 23.7 µs per loop
# The larger the list, the longer it takes; I also notice that a higher
# value takes longer.
%%timeit -n 1000
999 in list(range(1000))
1000 loops, best of 3: 45.1 µs per loop
Is there a detailed explanation of why 999 takes longer than 0? Is the in operator like a loop?
Also, is there a way to tell the in operator to "stop the loop" once the value is found (or is that already the default behavior and I'm just not seeing it)?
Lastly, is there another operator/function that I am overlooking that does what I'm describing with regard to "short-circuiting" in?
Short-circuiting does occur. The in operator calls the __contains__ method, which is implemented differently per class (in your case, list). Searching for 999 takes around twice as long as searching for 0, since half of the work is creating the list and the other half is iterating through it, and that iteration is short-circuited in the case of 0.
The implementation of in for list objects is found in list_contains. It scans the list and exits early as soon as a comparison finds the element; there is no point in continuing after that.
The loop involved is:
for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
    cmp = PyObject_RichCompareBool(el, PyList_GET_ITEM(a, i),
                                   Py_EQ);
If cmp is 1 (the value PyObject_RichCompareBool returns for a match), the for-loop condition (cmp == 0 && i < Py_SIZE(a)) becomes false and the loop terminates.
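In Python terms, that loop behaves roughly like the sketch below: an any() over a generator expression, which likewise stops at the first successful comparison (this is an illustration, not the actual CPython code).
def list_contains(a, el):
    # Stop as soon as one comparison succeeds, just like the C scan above.
    return any(item == el for item in a)

print(list_contains([1, 2, 3], 2))   # True, stops after the second element
print(list_contains([1, 2, 3], 9))   # False, has to scan the whole list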
For built-in list objects, what in calls is a C function (in CPython). In other Python implementations it may be written in a different language, using different constructs.
For user-defined classes in Python, what gets called is defined in the Membership test operations section of the Reference Manual; take a look there for a run-down of what gets called.
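To make that concrete, here is a minimal sketch of a user-defined class: in calls its __contains__ method, which can answer immediately instead of scanning anything (the class and its names are made up for the example).
class EvenNumbersUpTo(object):
    """Membership test that decides without scanning a container."""
    def __init__(self, limit):
        self.limit = limit

    def __contains__(self, value):
        # `x in obj` invokes this method; it returns as soon as it can decide.
        return isinstance(value, int) and 0 <= value <= self.limit and value % 2 == 0

evens = EvenNumbersUpTo(100)
print(4 in evens)    # True
print(7 in evens)    # False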
You could also come to this conclusion by timing:
l = [*range(1000)]
%timeit 1 in l
85.8 ns ± 11.9 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit 999 in l
22 µs ± 221 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
The further away the element, the more you need to scan. If it didn't short-circuit, all in operations would result in similar timings.
Here's another look with a hashed object, set:
from time import time

qlist = list(range(1000))
qset = set(qlist)

start = time()
for i in range(1000):
    0 in qlist
print time() - start

start = time()
for i in range(1000):
    999 in qlist
print time() - start

start = time()
for i in range(1000):
    0 in qset
print time() - start

start = time()
for i in range(1000):
    999 in qset
print time() - start
Output:
0.000172853469849 0 in list
0.0399038791656 999 in list
0.000147104263306 0 in set
0.000195980072021 999 in set
As others have said, the list implementation must do a sequential search. Set inclusion uses a hashed value, and is on par with finding the item in the first element checked.
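The practical consequence: if you are going to do many membership tests against the same data, pay the one-time cost of building a set and do the lookups against that. A minimal sketch:
haystack = list(range(1000000))
needles = [0, 999999, 123456]

haystack_set = set(haystack)          # one-time O(n) conversion
for n in needles:
    # each test is an average O(1) hash lookup instead of a linear scan
    print("%d in haystack: %s" % (n, n in haystack_set))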

Is it faster to compare int values or strings?

I have a series of values in a file and I'm iterating over them.
Is it faster to run:
if FTB == "0":
    do something
or
if int(FTB) > 0:
    do something
Simply using the %timeit magic in IPython:
FTB = 0
%timeit if FTB == 0: pass
10000000 loops, best of 3: 47 ns per loop
FTB = '0'
%timeit if int(FTB) == 0: pass
The slowest run took 9.47 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 231 ns per loop
If you're planning to convert string -> integer on the fly using int(), then it looks like you're losing out on (relatively) quite a bit of speed: comparisons where FTB is an int to begin with are almost 80% faster than comparisons that coerce a string FTB to an integer.
Perhaps your original question was whether comparing already-typed objects (something that is already an int or str and needs no conversion) differs in speed between strings and integers. In that case, for completeness:
FTB = '0'
%timeit if FTB == '0': pass
10000000 loops, best of 3: 49.9 ns per loop
FTB = 0
%timeit if str(FTB) == '0': pass
The slowest run took 8.62 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 233 ns per loop
More sampling may be required, but naively it is hard to say there is a significant speed difference between comparing str to str and comparing int to int. The biggest cost is calling int() or str() to change types.
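If the values really do arrive from a file as strings and you want numeric comparisons, the usual way to avoid paying the int() cost on every comparison is to convert each value once when you read it and then compare plain ints inside the loop. A minimal sketch (the file name is hypothetical):
# values.txt: one integer per line (hypothetical input file)
with open("values.txt") as f:
    values = [int(line) for line in f]   # pay the conversion cost once per value

positives = 0
for ftb in values:
    if ftb > 0:                          # cheap int comparison inside the loop
        positives += 1
print(positives)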
I am late to the party, but here is some simple code showing that there is almost no difference in Python.
What also comes to mind is that integers are limited in length while a string can be any size (until you run out of memory), which is why I keep increasing the size.
import time

theinteger = 2147483647
thestring = "2147483647"

stringtime = []
integertime = []

for i in range(0, 99999):
    t0 = time.time()
    if thestring == "2147483647":
        print("hello")
    t1 = time.time()
    stringtime.append(t1 - t0)

    t0 = time.time()
    if theinteger == 2147483647:
        print("hello")
    t1 = time.time()
    integertime.append(t1 - t0)

    theinteger = theinteger + 1
    thestring = str(theinteger)

print("time for string: " + str(sum(stringtime)/len(stringtime)))
print("time for integer: " + str(sum(integertime)/len(integertime)))
Integers are faster to compare because on the CPU it is just one operation. A string is represented as an array of characters; to compare a string you have to compare each item in the array until you find a difference, so comparing the whole string takes many more operations.
But the difference is in the range of a couple of nanoseconds.
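You can see the character-by-character behaviour directly with timeit: two long strings that differ only in the last character take noticeably longer to compare than two that differ in the first character. A rough sketch (assumes Python 3.5+ for the globals argument; exact numbers will vary by machine):
import timeit

a = "x" * 10000 + "a"
b = "x" * 10000 + "b"   # differs from a only in the last character
c = "y" + "x" * 10000   # differs from a in the first character

# Comparing a and b has to scan the whole common prefix before it can answer.
print(timeit.timeit("a == b", globals={"a": a, "b": b}, number=100000))
# Comparing a and c can give up at the very first character.
print(timeit.timeit("a == c", globals={"a": a, "c": c}, number=100000))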

map string sequence with a condition

I am trying to use map to avoid a loop in Python in order to get better performance. My code is:
def fun(s):
    result = []
    for i in range(len(s)-1):
        if (s[i:i+2]=="ab"):
            result.append(s[:i]+"cd"+s[i+2:])
    return result
My guess for the function is:
def fun(s):
    return map(lambda s : s[:i]+"cd"+s[i+2:] if s[i:i+2]=="ab", s)
However, I do not know how to associate i with s in this case... and the function above is not even syntactically valid.
Could anyone help?
Edit (added explanation):
A lot of people are confused about why I am doing this. The idea simply comes from the Python performance documentation (see the Loops section) and Guido's article. I am just learning.
Big thanks to @gboffi for a perfect and neat answer!
A Possible Solution
I've written the function using two auxiliary definitions, but if you want you can write it as a one-liner:
def fun(s):
    substitute = lambda i: s[:i]+'cd'+s[i+2:]
    match = lambda i: s[i:i+2]=='ab'
    return map(substitute, filter(match, range(len(s)-1)))
It works by using filter to build the list of indices at which s[i:i+2] matches 'ab', and then mapping the string-substitution function only over the indices that matched.
Timings
It is apparent that there is a large overhead due to creating the lambdas on each invocation, but fortunately it is easy to test this hypothesis:
In [41]: def fun(s):
             result = []
             for i in range(len(s)-1):
                 if (s[i:i+2]=="ab"):
                     result.append(s[:i]+"cd"+s[i+2:])
             return result
   ....:

In [42]: def fun2(s):
             substitute = lambda i: s[:i]+'cd'+s[i+2:]
             match = lambda i: s[i:i+2]=='ab'
             return map(substitute, filter(match, range(len(s)-1)))
   ....:

In [43]: %timeit fun('aaaaaaabaaaabaaabaaab')
100000 loops, best of 3: 2.38 µs per loop

In [44]: %timeit fun2('aaaaaaabaaaabaaabaaab')
100000 loops, best of 3: 3.74 µs per loop

In [45]: %timeit fun('aaaaaaabaaaabaaabaaab'*1000)
10 loops, best of 3: 33.7 ms per loop

In [46]: %timeit fun2('aaaaaaabaaaabaaabaaab'*1000)
10 loops, best of 3: 33.8 ms per loop
For a short string the map version is about 50% slower, while for a very long string the timings are asymptotically equal.
First, I don't think that map has a performance advantage over a for loop.
If s is large, you may use xrange instead of range: https://docs.python.org/2/library/functions.html#xrange
Second, map cannot filter elements; it can only map them to new values.
You may use a comprehension instead of a for loop (see the sketch below), but I don't think you get a performance advantage there either.
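For reference, a list-comprehension version of the original function would look like the following sketch; it reads much like the explicit loop, and in CPython it usually performs about the same as the loop-and-append version.
def fun_comp(s):
    return [s[:i] + "cd" + s[i+2:]
            for i in range(len(s) - 1)
            if s[i:i+2] == "ab"]

print(fun_comp("aaabab"))   # ['aacdab', 'aaabcd']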

Is it faster to make a variable for the length of a string?

I am implementing a reverse(s) function in Python 2.7 and I wrote code like this:
# iterative version 1
def reverse(s):
    r = ""
    for c in range(len(s)-1, -1, -1):
        r += s[c]
    return r

print reverse("Be sure to drink your Ovaltine")
But it looks as though it gets the length of the string on every iteration, even though that value has already been determined. So I made another version:
# iterative version 2
def reverse(s):
    r = ""
    l = len(s)-1
    for c in range(l, -1, -1):
        r += s[c]
    return r

print reverse("Be sure to drink your Ovaltine")
This version remembers the length of the string and doesn't ask for it on every iteration. Is this faster than the first version for longer strings (say, a string of length 1024), or does it have no effect at all?
In [12]: %timeit reverse("Be sure to drink your Ovaltine")
100000 loops, best of 3: 2.53 µs per loop
In [13]: %timeit reverse1("Be sure to drink your Ovaltine")
100000 loops, best of 3: 2.55 µs per loop
reverse is your first method, reverse1 is the second.
As you can see from the timings, there is very little difference in performance.
You can use IPython to time your code with the above syntax: just define your functions, then run %timeit followed by the function call with whatever parameters you like.
In the line
for c in range(len(s)-1, -1, -1):
len(s) is evaluated only once, and the result (minus one) is passed as an argument to range. Therefore the two versions are almost identical; if anything, the latter may be (very) slightly slower, as it creates an extra name to hold the result of the subtraction.
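If you want to convince yourself of that, a small sketch with a counting wrapper around len shows that the expression inside the range() call is evaluated exactly once, not once per iteration (the wrapper is made up purely for the demonstration):
calls = 0

def counted_len(s):
    # stand-in for len() that counts how often it is called
    global calls
    calls += 1
    return len(s)

s = "Be sure to drink your Ovaltine"
for c in range(counted_len(s) - 1, -1, -1):
    pass

print(calls)   # 1 -- the length is computed once, before the loop starts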

Two similar implementations, quite dramatic difference in run times

I've tried the basic Cython tutorial here to see how significant the speed-up is.
I've also made two different Python implementations which differ quite significantly in runtime. I've timed the individual differences between them and, as far as I can see, they do not explain the overall runtime difference.
The code is calculating the first kmax primes:
def pyprimes1(kmax):
    p = []
    result = []
    if kmax > 1000:
        kmax = 1000
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p.append(n)
            k = k + 1
            result.append(n)
        n = n + 1
    return result

from numpy import zeros  # needed for pyprimes2

def pyprimes2(kmax):
    p = zeros(kmax)
    result = []
    if kmax > 1000:
        kmax = 1000
        p = zeros(kmax)
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result
As you can see, the only difference between the two implementations is in how the p variable is used: in the first it is a Python list, in the second it is a numpy array. I used the IPython %timeit magic to test the timings. Which do you think performed better? Here is what I got:
%timeit pyprimes1(1000)
10 loops, best of 3: 79.4 ms per loop
%timeit pyprimes2(1000)
1 loops, best of 3: 1.14 s per loop
That was strange and surprising, as I thought a pre-allocated, C-implemented numpy array would be much faster.
I also tested:
array assignment:
%timeit p[100]=5
10000000 loops, best of 3: 116 ns per loop
array selection:
%timeit p[100]
1000000 loops, best of 3: 252 ns per loop
which was twice as slow... I also didn't expect that.
array initialization:
%timeit zeros(1000)
1000000 loops, best of 3: 1.65 µs per loop
list appending:
%timeit p.append(1)
10000000 loops, best of 3: 164 ns per loop
list selection:
%timeit p[100]
10000000 loops, best of 3: 56 ns per loop
So it seems list selection is about 5 times faster than array selection.
I can't see how these numbers add up to the more-than-10x overall time difference: although we do a selection on each iteration, selection is only 5 times faster.
I would appreciate an explanation of the timing differences between arrays and lists, and also of the overall time difference between the two implementations. Or am I using %timeit wrongly by measuring times on a list whose length keeps growing?
By the way, the Cython code did best, at 3.5 ms.
The 1000th prime number is 7919. So if on average the inner loop iterates roughly kmax/2 times, your program performs approximately 7919 * (1000/2) ≈ 4×10^6 selections from the array/list. If a single selection from a list in the first version really took 56 ns, even the selections alone wouldn't fit into 79 ms (0.056 µs × 4×10^6 ≈ 0.22 s).
Probably these nanosecond timings are not very accurate.
By the way, the performance of append depends on the size of the list. Occasionally an append triggers a reallocation, but most of the time the list has enough spare capacity and the append is lightning fast.
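You can watch that over-allocation with sys.getsizeof: the reported size of the list jumps occasionally (a reallocation) and stays flat in between, while the intermediate appends reuse the spare capacity. A small sketch:
import sys

lst = []
last_size = sys.getsizeof(lst)
print("len=%d size=%d bytes" % (len(lst), last_size))
for i in range(64):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last_size:
        # a reallocation happened on this append; most appends don't trigger one
        print("len=%d size=%d bytes" % (len(lst), size))
        last_size = size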
Numpy's main use case is performing operations on whole arrays and slices, not on single elements. Those operations are implemented in C and are therefore much faster than the equivalent Python code. For example,
c = a + b
will be much faster than
for i in xrange(len(a)):
    c[i] = a[i] + b[i]
even if the variables are numpy arrays in both cases.
However, single-element operations like the ones you are testing may well be slower than with Python lists. A Python list is essentially a plain C array of pointers to its items, which is quite simple to index.
Accessing an element of a numpy array, on the other hand, comes with a lot of overhead to support multiple raw data formats and advanced indexing options, among other things.
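A minimal sketch of that trade-off, comparing the same element-wise addition done as a whole-array operation versus an explicit Python loop over the elements (the absolute numbers will vary, but the vectorised version is typically orders of magnitude faster):
import time
import numpy as np

n = 1000000
a = np.arange(n, dtype=np.float64)
b = np.arange(n, dtype=np.float64)

start = time.time()
c = a + b                        # whole-array operation, runs in C
print("vectorised: %.4f s" % (time.time() - start))

start = time.time()
c = np.empty_like(a)
for i in range(n):               # per-element access pays numpy's indexing overhead
    c[i] = a[i] + b[i]
print("python loop: %.4f s" % (time.time() - start))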
