Use of bitwise operations instead of testing for even/odd - python

I'm trying to understand this particular solution to prime decomposition (taken from http://rosettacode.org/wiki/Prime_decomposition#Python:_Using_floating_point ), and I'm a bit puzzled by the usage of bitwise operators in the definition of step:
from math import floor, sqrt

def fac(n):
    step = lambda x: 1 + (x<<2) - ((x>>1)<<1)
    maxq = long(floor(sqrt(n)))
    d = 1
    q = n % 2 == 0 and 2 or 3
    while q <= maxq and n % q != 0:
        q = step(d)
        d += 1
    return q <= maxq and [q] + fac(n//q) or [n]
I understand what it does (multiply x by 3 and then add 1 if x is even and 2 if x is odd), but I don't quite see why one would resort to bitwise operations in this context. Is there a reason, besides the obvious succinctness of this formulation, for the use of bitwise operators instead of a more explicit solution:
mystep = lambda x: (3 * x) + 1 if (x % 2 == 0) else (3 * x) + 2
If there is a good reason (say, (x>>1)<<1 being more efficient than modulo arithmetic, as suggested here), is there a general strategy for extracting the underlying logic from an expression with several bitwise operators?
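For what it's worth, rewriting the sub-expressions arithmetically and checking the two formulations against each other confirms they agree:

step = lambda x: 1 + (x << 2) - ((x >> 1) << 1)
mystep = lambda x: (3 * x) + 1 if (x % 2 == 0) else (3 * x) + 2

# both give 3*x + 1 for even x and 3*x + 2 for odd x
assert all(step(x) == mystep(x) for x in range(10000))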
UPDATE
Following the suggestions in the answers, I timed both the version with step and the one with mystep, and the difference is imperceptible:
%timeit fac(600851475143)
1000 loops, best of 3: 306 µs per loop
%timeit fac2(600851475143)
1000 loops, best of 3: 307 µs per loop

This could be an attempt to optimize around branch misprediction. Modern CPUs are massively pipelined; they speculatively execute 10 or more instructions ahead. A conditional branch that near-randomly goes one way half the time and the other way half the time means the CPU will have to throw out 10 instructions worth of work half the time, making your work 5x as slow. At least with CPython, much of the cost of branch mispredictions is hidden in the overhead, but you can still easily find cases where they increase time by at least 12%, if not the 500% you can expect in C.
The alternative is that the author is optimizing for something even less relevant. On 70s and 80s hardware, replacing arithmetic operations with bitwise operations often led to huge speedups, just because the ALUs were simple and the compilers didn't optimize much. Even people who don't actually expect to get the same speedups today have internalized all the standard bit-twiddling hacks and use them without thinking. (Or, of course, the author could have just ported some code over from C or Scheme or some other language without really thinking about it, and that code could have been written decades ago when this optimization made a big difference.)
At any rate, this code is almost certainly optimizing in the wrong place. Defining a function to call every time in your inner loop, instead of just inlining the one-liner expression there, is adding far more overhead than 12%. And the fact that the code uses step = lambda x: … instead of def step(x): … implies pretty strongly that the author isn't comfortable in Python and doesn't know how to optimize for it. If you really want to make this go faster, there are almost certainly a lot of things that would make a whole lot more difference than which implementation you use for step.
That being said, the right thing to do with any optimization that you're not sure about is to test it. Implement it both ways, use timeit to see the difference, and if you don't understand the results, use a Python-level profiler or hardware-level performance counters (e.g., via cachegrind) or something else to get more information. From a very quick test of the original code against your alternative, throwing various numbers at it with IPython's %timeit, I got results ranging from .92x to 1.08x time for your version. In other words, it seems to be a wash…
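A minimal sketch of that kind of test, assuming fac is the function from the question and fac2 is a copy that uses mystep instead of step (as in the question's update); exact numbers will vary:

import timeit

n = 600851475143
for f in (fac, fac2):
    # average over 1000 calls, as in the question's %timeit runs
    print(f.__name__, timeit.timeit(lambda: f(n), number=1000))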

In theory, three bit shifts are more efficient than a single multiplication and a single division. In practice, such code should be profiled to ensure that the resulting optimization provides a sufficient speed boost to justify the loss of readability.
Any code that resorts to such optimizations should clearly document what the code does along with why the optimization was deemed useful, if only for the sake of future maintainers who may be tempted to replace the code with something more readable.
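For instance, a documented version of the question's step helper might look something like the following sketch (not the original author's code):

def step(x):
    # Bitwise form of: 3*x + 1 if x is even else 3*x + 2
    # (x << 2 is 4*x; (x >> 1) << 1 clears the lowest bit of x).
    # Keep the bitwise version only if profiling shows it matters;
    # otherwise prefer the explicit arithmetic for readability.
    return 1 + (x << 2) - ((x >> 1) << 1)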

Related

scala slower than python in constructing a set

I am learning scala by converting some of my python code to scala code. I just encountered an issue where the python code is significantly outperforming the scala code. The code is supposed to construct a set of candidate pairs based on some conditions. Scala has comparable runtime performance with python for all previous parts.
id_map is an array of maps from Long to sets of strings. The average number of key-value pairs per map is 1942.
The scala code snippet is below:
// id_map: Array[mutable.Map[Long, Set[String]]]
val candidate_pairs = id_map
  .flatMap(hashmap => hashmap.values)
  .filter(_.size >= 2)
  .flatMap(strset => strset.toList.combinations(2))
  .map(_.sorted)
  .toSet
and the corresponding python code is
candidate_pairs = set()
for hashmap in id_map.values():
    for strset in hashmap.values():
        if len(strset) >= 2:
            for pair in combinations(strset, 2):
                candidate_pairs.add(tuple(sorted(pair)))
The scala code snippet takes 80 seconds while python version takes 10 seconds.
I am wondering how I can optimize the above code to make it faster. What I have been trying is updating the set using a for loop:
var candidate_pairs = Set.empty[List[String]]
for (
  hashmap: mutable.Map[Long, Set[String]] <- id_map;
  setstr: Set[String] <- hashmap.values if setstr.size >= 2;
  pair <- setstr.toList.combinations(2)
)
  candidate_pairs += pair.sorted
Although candidate_pairs is updated many times and each update creates a new set, this version is actually faster than the previous Scala version, taking about 50 seconds, though still worse than Python. I also tried a mutable set, but the result is about the same as with the immutable version.
Any help would be appreciated! Thanks!
Being slower than python sounds ... surprising.
First of all, make sure you have adequate memory settings, and it is not spending half of those 80 seconds in GC.
Also, be sure to "warm up" the JVM (run your function a few times before doing actual measurement), use the same exact data for runs in python and scala (not just same statistics, exactly the same data), and do not include the time spent acquiring/generating data into measurement. Make several runs and compare average time, not how much a single run took.
Having said that, a few ways to make your code faster:
Adding .view (or .iterator) after id_map in your implementation cuts the execution time by about factor of 4 in my experiments.
(.view makes your chained transformations apply "lazily" – essentially making a single pass through a single instance of the array instead of multiple passes over multiple copies).
- Replacing .map(_.sorted) with
.map {
  case List(a,b) if a < b => (a,b)
  case List(a,b) => (b, a)
}
Shaves off about another 75% (sorting two element lists is mostly overhead).
This changes the return type to tuples rather than lists (constructing lots of tiny lists also adds up), but this seems even more appropriate in this case actually.
- Removing .filter(_.size >= 2) (it is redundant anyway, and computing the size of a collection can get expensive) yields a further, fairly small improvement that I did not bother to measure exactly.
Additionally, it may be cheaper to get rid of the separate sort step altogether, and just add .sorted before .combinations. I have not tested it, because it would be futile without knowing more details about your data profile.
These are some general improvements that should help either way, though it is hard to be sure you'll see the same effect as I do, since I don't really know anything about your data beyond the average map size; the improvement you see might be even better than mine, or somewhat smaller ... but you should see some.
I ran this version with some test Scala code I created. On a list of 1944 elements, it completed in about 15 ms on my laptop.
id_map
  .flatMap(hashmap => hashmap.values)
  .flatMap { strset =>
    if (strset.size >= 2) {
      strset.toIndexedSeq.combinations(2)
    } else IndexedSeq.empty
  }.map(_.sorted).toSet
The main changes are to use an IndexedSeq instead of a List (which is a linked list) and to do the filter on the fly.
I assume you didn't want to hyper-optimize; otherwise you could still remove a lot of the intermediate collections created by the flatMap, combinations, conversion to IndexedSeq, and toSet calls.

Check if float is an integer: is_integer() vs. modulo 1

I've seen a number of questions asking how to check if a float is an integer. The majority of answers seem to recommend using is_integer():
(1.0).is_integer()
(1.55).is_integer()
I have also occasionally seen math.floor() being used:
import math
1.0 == math.floor(1.0)
1.55 == math.floor(1.55)
I'm wondering why % 1 is rarely used or recommended?
1.0 % 1 == 0
1.55 % 1 == 0
Is there a problem with using modulo for this purpose? Are there edge cases that this doesn't catch? Performance issues for really large numbers?
If % 1 is a fine alternative, then I'm also wondering why is_integer() was introduced to the standard library?
It seems that % is much more flexible. For example, it's common to use % 2 to check if a number is odd/even, or % n to check if something is a multiple of n. Given this flexibility, why introduce a new method (is_integer) that does the same thing, or use math.floor, both of which require knowing/remembering that they exist and how to use them? I know that math.floor has uses beyond just integer checking, but still...
All are valid for the purpose. The math.floor option requires exact matching between a specific value and the result of the floor function, which is not very convenient if you want to encapsulate it in a generic method. So it boils down to the first and third options. Both are valid and will do the job, so the key difference is simple: performance.
from timeit import Timer

def with_isint(num):
    return num.is_integer()

def with_mod(num):
    return num % 1 == 0

Timer(lambda: with_isint(10.0)).timeit(number=10000000)
#output: 2.0617980659008026
Timer(lambda: with_mod(10.0)).timeit(number=10000000)
#output: 2.6560597440693527
Naturally this is a simple operation so you'd need a lot of calls in order to see a considerable difference, as you can see in the example.
One soft reason is definitely: readability
If a function called is_integer() returns True, it is obvious what you have been testing.
However, using the modulo solution, one has to think through the process to see that it is actually testing whether a float is an integer. If you wrap your modulo formalism in a function with an obvious name such as simon_says_its_an_integer(), I think it's just as fine (apart from needlessly reimplementing an already existing function).
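A minimal sketch of that wrapping idea (the function name here is just an example):

def is_whole_number(num):
    """Same test as num.is_integer(), spelled with modulo."""
    return num % 1 == 0

print(is_whole_number(1.0))   # True
print(is_whole_number(1.55))  # False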

Python :: Iteration vs Recursion on string manipulation

In the examples below, both functions have roughly the same number of procedures.
def lenIter(aStr):
    count = 0
    for c in aStr:
        count += 1
    return count
or
def lenRecur(aStr):
    if aStr == '':
        return 0
    return 1 + lenRecur(aStr[1:])
Is picking between the two techniques a matter of style, or is there a more efficient method here?
Python does not perform tail call optimization, so the recursive solution can hit a stack overflow on long strings. The iterative method does not have this flaw.
That said, len(str) is faster than both methods.
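A quick way to check that on your own machine (a sketch, assuming lenIter and lenRecur from the question are defined in the same script; exact numbers will vary):

import timeit

s = "a" * 500   # short enough that lenRecur stays well below the recursion limit

for stmt in ("lenIter(s)", "lenRecur(s)", "len(s)"):
    print(stmt, timeit.timeit(stmt, globals=globals(), number=10000))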
This is not quite correct: 'functions have roughly the same number of procedures'. You probably mean 'these procedures require the same number of operations', or, more formally, 'they have the same computational time complexity'.
While both have the same computational time complexity, the recursive one requires additional CPU instructions to create a new stack frame for each call, to switch contexts, and to clean up after every return. While these operations do not increase the theoretical computational complexity, in most real-life implementations they add significant overhead.
The recursive method also has higher space complexity, as each new instance of the recursively-called procedure needs new storage for its data.
The first approach is certainly more efficient: the recursive version makes Python perform a function call and a string slice at every step, each of which carries extra interpreter overhead, and it can also cause problems when dealing with long strings.
The more Pythonic way is to use the built-in len() function to get the length of a string.
You can also inspect the code object to see the required stack size for each function:
>>> lenRecur.__code__.co_stacksize
4
>>> lenIter.__code__.co_stacksize
3
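And to see the stack limitation mentioned above in practice (a sketch; the threshold depends on sys.getrecursionlimit(), typically 1000):

import sys

print(sys.getrecursionlimit())        # usually 1000

long_str = "a" * 100000
print(lenIter(long_str))              # 100000, no problem

try:
    print(lenRecur(long_str))
except RecursionError:
    print("lenRecur exceeded the maximum recursion depth")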

Numpy: vectorizing two-branch test (ternary-operator like)

I am vectorizing a test in Numpy based on the following idea: perform some test elementwise and pick expr1 or expr2 according to the test. This is like the ternary operator in C: test ? expr1 : expr2
I see two major ways for performing that; I would like to know if there is a good reason to choose one rather than the other one; maybe also other tricks are available and I would be very happy to know about them. Main goal is speed; for that reason I don't want to use np.vectorize with an if-else statement.
For my example, I will re-build the min function; please, don't tell me about some Numpy function for computing that; this is a mere example!
Idea 1: Use the arithmetic value of the booleans in a multiplication:
# a and b have similar shape
test = a < b
ntest = np.logical_not(test)
out = test*a + ntest*b
Idea 2: More or less following the APL/J style of coding (by using the conditional expression as an index for an array made with one dimension more than initial arrays).
# a and b have similar shape
np.choose(a<b, np.array([b,a]))
This is a better way to use choose
np.choose(a<b, [b,a])
In my small timings it is faster. Also the choose doc says "If choices is itself an array (not recommended), ...".
(a<b).choose([b,a])
saves one level of function redirection.
Another option:
out = b.copy(); out[test] = a[test]
In quick tests this is actually faster. masked.filled uses np.copyto for this sort of 'where' copy, though it doesn't seem to be any faster.
A variation on the choose is where:
np.where(test,a,b)
Or use where (or np.nonzero) to convert boolean index to a numeric one:
I = np.where(test); out = b.copy(); out[I] = a[I]
For some reason this times faster than the one-piece where.
I've used the multiplication approach in the past, if I recall correctly even with APL (though that's decades ago). An old trick to avoid division by 0 was to add b==0 to the denominator: a/(b+(b==0)). But it's not as generally applicable: a*0 and a*1 have to make sense.
choose looks nice, but with the mode parameter it may be more powerful (and hence more complicated) than needed.
I'm not sure there is a 'best' way. Timing tests can evaluate certain situations, but I don't know whether they generalize across all cases.
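For reference, a small sketch putting the variants mentioned above side by side on random data, just to confirm they agree (timings are machine-dependent, so none are claimed here):

import numpy as np

a = np.random.rand(1000)
b = np.random.rand(1000)
test = a < b

out1 = test * a + np.logical_not(test) * b   # boolean arithmetic
out2 = np.choose(test, [b, a])               # choose with a list of choices
out3 = np.where(test, a, b)                  # where
out4 = b.copy(); out4[test] = a[test]        # boolean-mask assignment

assert all(np.array_equal(out1, o) for o in (out2, out3, out4))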

Which programming language or a library can process Infinite Series?

Which programming language or library is able to process infinite series (like geometric or harmonic)? It would presumably need a database of well-known series, automatically give the proper value in case of convergence, and maybe raise an exception in case of divergence.
For example, in Python it could look like:
sum = 0
sign = -1.0
for i in range(1, Infinity, 2):
    sign = -sign
    sum += sign / i
then, sum must be math.pi/4 without doing any computations in the loop (because it's a well-known sum).
Most functional languages which evaluate lazily can simulate the processing of infinite series. Of course, on a finite computer it is not possible to process infinite series, as I am sure you are aware. Off the top of my head, I guess Mathematica can do most of what you might want, I suspect that Maple can too, maybe Sage and other computer-algebra systems and I'd be surprised if you can't find a Haskell implementation that suits you.
EDIT to clarify for OP: I do not propose generating infinite loops. Lazy evaluation allows you to write programs (or functions) which simulate infinite series, programs which themselves are finite in time and space. With such languages you can determine many of the properties, such as convergence, of the simulated infinite series with considerable accuracy and some degree of certainty. Try Mathematica or, if you don't have access to it, try Wolfram Alpha to see what one system can do for you.
One place to look might be the Wikipedia category of Computer Algebra Systems.
There are two tools available in Haskell for this beyond simply supporting infinite lists.
First, there is a module that supports looking up sequences in the OEIS. This can be applied to the first few terms of your series and can help you identify a series for which you don't know the closed form, etc. The other is the 'CReal' library of computable reals. If you have the ability to generate an ever-improving bound on your value (e.g. by summing over a prefix), you can declare that as a computable real number, which admits a partial ordering, etc. In many ways this gives you a value that you can use like the sum above.
However in general computing the equality of two streams requires an oracle for the halting problem, so no language will do what you want in full generality, though some computer algebra systems like Mathematica can try.
Maxima can calculate some infinite sums, but in this particular case it doesn't seem to find the answer :-s
(%i1) sum((-1)^k/(2*k), k, 1, inf), simpsum;
                             inf
                             ====       k
                             \     (- 1)
                              >    ------
                             /       k
                             ====
                             k = 1
(%o1)                        ------------
                                  2
but for example, those work:
(%i2) sum(1/(k^2), k, 1, inf), simpsum;
                                        2
                                     %pi
(%o2)                                ----
                                      6
(%i3) sum((1/2^k), k, 1, inf), simpsum;
(%o3) 1
You can solve the series problem in Sage (a free Python-based math software system) exactly as follows:
sage: k = var('k'); sum((-1)^k/(2*k+1), k, 1, infinity)
1/4*pi - 1
Behind the scenes, this is really using Maxima (a component of Sage).
For Python, check out SymPy, a clone of Mathematica and Matlab.
There is also a heavier Python-based math-processing tool called Sage.
You need something that can do a symbolic computation like Mathematica.
You can also consider querying WolframAlpha: sum((-1)^i*1/i, i, 1, inf)
There is a library called mpmath (Python), a module of sympy, which provides the series support for sympy (I believe it also backs Sage).
More specifically, all of the series stuff can be found here: Series documentation
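For example, a minimal mpmath sketch for the sum in the question (numeric rather than symbolic; nsum accelerates the slowly converging alternating series):

from mpmath import nsum, inf, pi

# 1 - 1/3 + 1/5 - 1/7 + ...
s = nsum(lambda k: (-1)**int(k) / (2*k + 1), [0, inf])
print(s, pi / 4)   # both print approximately 0.785398163397448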
The C++ iRRAM library performs real arithmetic exactly. Among other things it can compute limits exactly using the limit function. The homepage for iRRAM is here. Check out the limit function in the documentation. Note that I'm not talking about arbitrary precision arithmetic. This is exact arithmetic, for a sensible definition of exact. Here's their code to compute e exactly, pulled from the example on their web site:
//---------------------------------------------------------------------
// Compute an approximation to e=2.71.. up to an error of 2^p
REAL e_approx (int p)
{
  if ( p >= 2 ) return 0;

  REAL y=1,z=2;
  int i=2;
  while ( !bound(y,p-1) ) {
    y=y/i;
    z=z+y;
    i+=1;
  }
  return z;
};

//---------------------------------------------------------------------
// Compute the exact value of e=2.71..
REAL e()
{
  return limit(e_approx);
};
Clojure and Haskell off the top of my head.
Sorry I couldn't find a better link to Haskell's sequences; if someone else has one, please let me know and I'll update.
Just install sympy on your computer, then run the following code:
from sympy.abc import i, k, m, n, x
from sympy import Sum, factorial, oo, IndexedBase, Function
Sum((-1)**k/(2*k+1), (k, 0, oo)).doit()
Result will be: pi/4
I have worked with a couple of huge data series for research purposes, using Matlab. I don't know for sure whether it can process infinite series, but I think there is a possibility. You can try. :)
This can be done in, for instance, sympy and Sage (among open-source alternatives). In the following, a few examples using sympy:
In [10]: summation(1/k**2,(k,1,oo))
Out[10]:
  2
 π
 ──
 6

In [11]: summation(1/k**4, (k,1,oo))
Out[11]:
  4
 π
 ───
 90
In [12]: summation( (-1)**k/k, (k,1,oo))
Out[12]: -log(2)
In [13]: summation( (-1)**(k+1)/k, (k,1,oo))
Out[13]: log(2)
Behind the scenes, this is using the theory of hypergeometric series; a nice introduction is the book "A=B" by Marko Petkovšek, Herbert S. Wilf and Doron Zeilberger, which you can find by googling. What is a hypergeometric series?
Everybody knows what a geometric series is: $x_1, x_2, x_3, \dots, x_k, \dots$ is geometric if the consecutive-terms ratio $x_{k+1}/x_k$ is constant. It is hypergeometric if the consecutive-terms ratio is a rational function in $k$! sympy can handle basically all infinite sums where this last condition is fulfilled, but only very few others.
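As a small illustration of that condition, a sympy sketch checking the consecutive-terms ratio for one of the sums above (the choice of 1/k**2 is just an example):

from sympy import symbols, simplify

k = symbols('k', positive=True)
a_k = 1/k**2                                  # general term of sum_{k>=1} 1/k^2
ratio = simplify(a_k.subs(k, k + 1) / a_k)    # equals k**2/(k + 1)**2
print(ratio)                                  # a rational function of k, so summable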
