Performance difference in alternative switches in Python

I have read a few articles on alternatives to the switch statement in Python, mainly using dicts instead of lots of ifs and elifs. However, none really answer the question: is one of them actually better in performance or efficiency? I have read a few arguments that an if/elif chain has to check each condition in turn, which becomes inefficient with many branches. Supposedly a dict gets around that, but you end up having to create new modules to call, which (so the argument goes) cancels out the performance gain anyway, leaving readability as the only real difference.
Can anyone comment on this: is there really any difference in the long run? Does anyone regularly use the alternative? I only ask because I am going to end up having 30-40 elif/ifs, and possibly more in the future. Any input is appreciated. Thanks.

dict's performance is typically going to be unbeatable, because a lookup into a dict is going to be O(1) except in rare and practically never-observed cases (where the key involves user-coded types with lousy hashing;-). You don't have to "create new modules" as you say, just arbitrary callables, and that creation, which is performed just once to prep the dict, is not particularly costly anyway -- during operation, it's just one lookup and one call, greased lightning time.
As others have suggested, try timeit to experiment with a few micro-benchmarks of the alternatives. My prediction: with a few dozen possibilities in play, as you mention you have, you'll be slapping your forehead about ever considering anything but a dict of callables!-)
If you find it too hard to run your own benchmarks and can supply some specs, I guess we can benchmark the alternatives for you, but it would be really more instructive if you tried to do it yourself before you ask SO for help!-)
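To make the comparison concrete, here is a minimal sketch (the handler names are made up, not from the question's code) of a dict of callables next to an if/elif chain, timed with timeit:

import timeit

# Hypothetical handlers -- stand-ins for whatever the 30-40 branches would do.
def handle_a(): return "a"
def handle_b(): return "b"

# Built once up front; every dispatch afterwards is one lookup plus one call.
dispatch = {"a": handle_a, "b": handle_b}

def with_dict(key):
    return dispatch[key]()

def with_elif(key):
    if key == "a":
        return handle_a()
    elif key == "b":
        return handle_b()

print(timeit.timeit('with_dict("b")', globals=globals()))
print(timeit.timeit('with_elif("b")', globals=globals()))

With only two keys the elif chain may even win; the dict pulls ahead as the number of cases grows, since its cost stays flat while the chain's grows linearly.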

Your concern should be about the readability and maintainability of the code, rather than its efficiency. This applies in most scenarios, and particularly in the one you describe now. The efficiency difference is likely to be negligible (you can easily check it with a small amount of benchmarking code), but 30-40 elifs are a warning sign - perhaps something can be abstracted away to make the code more readable. Describe your case, and perhaps someone can come up with a better design.

With all performance/profiling questions, the right answer is "test each case yourself for your specific needs."
One great tool for this is timeit, which you can learn about in the Python docs.
In general I have seen no performance issues related to using a dictionary in place of other languages' switch statements. My guess is that the performance comparison would depend on the number of alternatives. Who knows, there may be a tipping point where one becomes better than the other.
If you (or anyone else) tests it, feel free to post your results.

In cases where you'd use a switch in many languages, you would use a dict in Python. A switch statement, if added to Python (it's been considered), would not be able to give any real performance gain anyways.
dicts are used ubiquitously in Python. CPython dicts are an insanely efficient, robust hashtable implementation. Lookup is O(1), as opposed to traversing an elif chain, which is O(n). (30-40 branches probably isn't big enough for this to matter tons anyways.) I am not sure what you mean about creating new modules to call, but using dicts is very scalable and easy.
As for actual performance gain, that is impossible to really tackle effectively abstractly. Write your code in the most straightforward and maintainable way (you're using Python forgoshsakes!) and then see if it's too slow. If it is, profile it and find out what places it needs to be sped up to make a real difference.

I think a dict will gain an advantage over a sequence of if statements as the number of cases goes up, since the key lookup only requires one hash operation. If you only have a few cases, a few if statements are better. A dict is probably a more elegant solution for what you are doing. Either way, the performance difference won't really be noticeable in your case.

I ran a few benchmarks (see here). Using lists of function pointers is fastest, if your keys are sequential integers. For the general case: Alex Martelli is right, dictionaries are fastest.
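For illustration, a minimal sketch of the list-of-callables variant (the case functions are invented): when the keys are the integers 0..n-1, indexing a list avoids hashing entirely.

def case0(): return "zero"
def case1(): return "one"

by_index = [case0, case1]        # index -> function: one bounds check, no hash
by_key = {0: case0, 1: case1}    # general mapping: hash + probe

print(by_index[1]())
print(by_key[1]())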

Related

Bundling functions into classes vs functions as they are. Which option is more memory efficient? [duplicate]

When I started learning Python, I created a few applications just using functions and procedural code. However, now I know classes and have realized that the code can be much more readable (and subjectively easier to understand) if I rewrite it with classes.
How much slower might the equivalent classes be, compared to plain functions, in general? Will the initializer and the methods in the classes make any considerable difference in speed?
To answer the question: yes, it is likely to be a little slower, all else being equal. Some things that used to be variables (including functions) are now going to be object attributes, and self.foo is always going to be slightly slower than foo regardless of whether foo was a global or local originally. (Local variables are accessed by index, and globals by name, but an attribute lookup on an object is either a local or a global lookup, plus an additional lookup by name for the attribute, possibly in multiple places.) Calling a method is also slightly slower than calling a function -- not only is it slower to get the attribute, it is also slower to make the call, because a method is a wrapper object that calls the function you wrote, adding an extra function call overhead.
Will this be noticeable? Usually not. In rare cases it might be, say if you are accessing an object attribute a lot (thousands or millions of times) in a particular method. But in that case you can just assign self.foo to a local variable foo at the top of the method, and reference it by the local name throughout, to regain 99.44% of the local variable's performance advantage.
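A small sketch of that aliasing trick (the class and attribute names are illustrative):

class Processor:
    def __init__(self):
        self.factor = 3

    def slow(self, data):
        # self.factor is looked up on every iteration
        return [self.factor * x for x in data]

    def fast(self, data):
        factor = self.factor  # hoist the attribute into a local once
        return [factor * x for x in data]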
Beyond that there will be some overhead for allocating memory for instances that you probably didn't have before, but unless you are constantly creating and destroying instances, this is likely a one-time cost.
In short: there will be a likely-minor performance hit, and where the performance hit is more than minor, it is easy to mitigate. On the other hand, you could save hours in writing and maintaining the code, assuming your problem lends itself to an object-oriented solution. And saving time is likely why you're using a language like Python to begin with.
No.
In general you will not notice any difference in performance based on using classes or not. The different code structures implied may mean that one is faster than the other, but it's impossible to say which.
Always write code to be read, then if, and only if, it's not fast enough make it faster. Remember: Premature optimization is the root of all evil.
Donald Knuth, one of the grand old minds of computing, is credited with the observation that "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." Deciding to use procedural techniques rather than object-oriented ones on the basis of speed gains that may well not be realized anyway is not a sensible strategy.
If your code works and doesn't need to be modified then feel free to leave it alone. If it needs to be modified then you should consider a judicious refactoring to include classes, since program readability is far more important than speed during development. You will also see benefits in improved maintainability. An old saw from Kernighan and Plauger's "Elements of Programming Style" still applies:
First, make it work. Then (if it doesn't work fast enough) make it work faster.
But, first and foremost, go for readability. Seriously.
You probably don't care as much as you think you do.
Really.
Sure, code with classes might be a little slower through indirection. Maybe. That is what JIT compilation is for, right? I can never remember which versions of Python do this and which don't, because:
Performance doesn't matter.
At least not constant-factor performance differences like this. Unless you are doing a hell of a lot of computations (you aren't!), you will spend more time developing/debugging/maintaining your code. Optimize for that.
Really. Because you will never ever be able to measure the difference, unless you are in a tight loop. And you don't want to be doing that in Python anyway, unless you don't really care about time. It's not like you're trying to balance your Segway in Python, right? You just want to compute some numbers, right? Your computer is really good at this. Trust it.
That said, this doesn't mean classes are the way to go. Just that speed isn't the question you should be asking. Instead, try to figure out what representation will be best for your code. It seems that, now that you know classes, you will write clean code in an OO fashion. Go ahead. Learn. Iterate.

absolute fastest lookup in python / cython

I'd like to do a lookup mapping 32bit integer => 32bit integer.
The input keys aren't necessarily contiguous, nor do they cover 2^32 - 1 (nor do I want the in-memory table to consume that much space!).
The use case is for a poker evaluator, so doing a lookup must be as fast as possible. Perfect hashing would be nice, but that might be a bit out of scope.
I feel like the answer is some kind of Cython solution, but I'm not sure about the underpinnings of Cython and whether it really does any good with Python's dict type. Of course a flat array with just a simple offset jump would be super fast, but then I'm allocating 2^32 - 1 places in memory for the table, which I don't want.
Any tips / strategies? Absolute speed with minimal memory footprint is the goal.
You aren't smart enough to write something faster than dict. Don't feel bad; 99.99999% of the people on the planet aren't. Use a dict.
First, you should actually define what "fast enough" means to you, before you do anything else. You can always make something faster, so you need to set a target so you don't go insane. It is perfectly reasonable for this target to be dual-headed - say something like "Mapping lookups must execute in these parameters (min/max/mean), and when/if we hit those numbers we're willing to spend X more development hours to optimize even further, but then we'll stop."
Second, the very first thing you should do to make this faster is to copy the code in Objects/dictobject.c in the CPython source tree (make something new like intdict.c or something) and then modify it so that the keys are not Python objects. Chasing after a better hash function is not likely to be a good use of your time for integers, but eliminating the INCREF/DECREF and PyObject_RichCompareBool calls for your keys will be a huge win. Since you're not deleting keys you could also elide any checks for dummy values (which exist to preserve the collision traversal for deleted entries), although it's possible that you'll get most of that win for free simply by having better branch prediction for your new object.
You are describing a perfect use case for a hash indexed collection. You are also describing a perfect scenario for the strategy of write it first, optimise it second.
So start with the Python dict. It's fast and it absolutely will do the job you need.
Then benchmark it (see the sketch after the list below). Figure out how fast it needs to go, and how near you are. Then 3 choices.
It's fast enough. You're done.
It's nearly fast enough, say within about a factor of two. Write your own hash indexing, paying attention to the hash function and the collision strategy.
It's much too slow. You're dead. There is nothing simple that will give you a 10x or 100x improvement. At least you didn't waste any time on a better hash index.
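As promised, a minimal benchmarking sketch (the sizes and values are made up): time a plain dict on int -> int lookups before concluding it's too slow.

import random
import timeit

keys = random.sample(range(2**32), 100_000)     # sparse 32-bit keys
table = {k: k ^ 0xDEADBEEF for k in keys}       # arbitrary 32-bit values
probe = keys[len(keys) // 2]

n = 1_000_000
t = timeit.timeit("table[probe]", globals=globals(), number=n)
print(f"{t / n * 1e9:.1f} ns per lookup")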

Converting lists to dictionaries to check existence?

If I instantiate/update a few lists very, very few times, in most cases only once, but I check for the existence of an object in that list a bunch of times, is it worth it to convert the lists into dictionaries and then check by key existence?
Or in other words is it worth it for me to convert lists into dictionaries to achieve possible faster object existence checks?
Dictionary lookups are faster than list searches. A set would also be an option. That said:
If "a bunch of times" means "it would be a 50% performance increase" then go for it. If it doesn't but makes the code better to read then go for it. If you would have fun doing it and it does no harm then go for it. Otherwise it's most likely not worth it.
You should be using a set, since from your description I am guessing you wouldn't have a value to associate. See Python: List vs Dict for look up table for more info.
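A rough comparison sketch (the sizes are illustrative) of membership testing in a list versus a set:

import timeit

setup = "items = list(range(10_000)); s = set(items)"
print(timeit.timeit("9999 in items", setup=setup, number=10_000))  # O(n) scan
print(timeit.timeit("9999 in s", setup=setup, number=10_000))      # O(1) average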
Usually it's not important to tune every line of code for utmost performance.
As a rule of thumb, if you need to look up more than a few times, creating a set is usually worthwhile.
However, consider that PyPy might do the linear search 100 times faster than CPython, so a "few" times might become "dozens". In other words, sometimes the constant part of the complexity matters.
It's probably safest to go ahead and use a set there. You're less likely to find that a bottleneck as the system scales than the other way around.
If you really need to microtune everything, keep in mind that the implementation, cpu cache, etc... can affect it, so you may need to remicrotune differently for different platforms, and if you need performance that badly, Python was probably a bad choice - although maybe you can pull the hotspots out into C. :)
Random access (lookup) in a dictionary is faster, but building the hash table consumes more memory.
More performance = more memory usage.
It depends on how many items are in your list.

copy.deepcopy or create a new object?

I'm developing a real-time application, and sometimes I need to create new instances of objects, always with the same data.
At first I just instantiated them, but then I thought maybe copy.deepcopy would be faster. Now I find people saying deepcopy is horribly slow.
I can't do a simple copy.copy because my object contains lists.
My question is: do you know a faster way, or do I just need to give up and instantiate them again?
Thank you for your time
I believe copy.deepcopy() is still pure Python, so it's unlikely to give you any speed boost.
It sounds to me a little like a classic case of early optimisation. I would suggest writing your code to be intuitive which, in my opinion, is simply instantiating each object. Then you can profile it and see where savings need to be made, if anywhere. It may well be in your real-world use-case that some completely different piece of code will be a bottleneck.
EDIT: One thing I forgot to mention in my original answer - if you're copying a list, make sure you use slice notation (new_list = old_list[:]) rather than iterating through it in Python, which will be slower. This won't do a deep copy, however, so if your lists have other lists or dictionaries you'll need to use deepcopy(). For dict objects, use the copy() method.
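A small sketch of those options (the sample data is made up):

import copy

old_list = [1, 2, [3, 4]]

shallow = old_list[:]           # fast shallow copy; the inner list is shared
deep = copy.deepcopy(old_list)  # recursive copy; the inner list is duplicated

old_list[2].append(5)
print(shallow[2])  # [3, 4, 5] -- the shared inner list was mutated
print(deep[2])     # [3, 4]    -- independent copy

d = {"a": 1}
d2 = d.copy()                   # shallow copy for dicts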
If you still find constructing your objects is what's taking the time, you can then consider how to speed it up. You could experiment with __slots__, although they're typically about saving memory more than CPU time so I doubt they'll buy you much. In the extreme case, you can push your object out to a C extension module, which is likely to be a lot faster at the expense of increased complexity. This is always the approach I've taken in the past, where I use native C data structures under the hood and use Python's special methods to wrap a "list-like" or "dict-like" interface on top. This does rely on you being happy with coding in C, of course.
(As an aside I would avoid C++ unless you have a compelling reason, C++ Python extensions are slightly more fiddly to get building than plain C - it's entirely possible if you have a good motivation, however)
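For reference, a minimal sketch of the __slots__ idea mentioned above (the class is invented):

class Point:
    __slots__ = ("x", "y")  # fixed attribute storage, no per-instance __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y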
If your object has, for example, very long lists then you might get some mileage out of a sort of copy-on-write approach, where clones of objects just keep the same references instead of copying the lists. Every time you access them you could use sys.getrefcount() to see if it's safe to update in-place or whether you need to take a copy. This approach is likely to be error-prone and overly complex, but I thought I'd mention it for interest.
You could also look at your object hierarchy and see whether you can break objects up such that parts which don't need to be duplicated can be shared between other objects. Again, you'd need to take care when modifying such shared objects.
The important point, however, is that you first want to get your code correct and then make your code fast once you understand the best ways to do that from real world usage.

FSharp runs my algorithm slower than Python

Years ago, I solved a problem via dynamic programming:
https://www.thanassis.space/fillupDVD.html
The solution was coded in Python.
As part of expanding my horizons, I recently started learning OCaml/F#. What better way to test the waters, than by doing a direct port of the imperative code I wrote in Python to F# - and start from there, moving in steps towards a functional programming solution.
The results of this first, direct port... are disconcerting:
Under Python:
bash$ time python fitToSize.py
....
real 0m1.482s
user 0m1.413s
sys 0m0.067s
Under FSharp:
bash$ time mono ./fitToSize.exe
....
real 0m2.235s
user 0m2.427s
sys 0m0.063s
(in case you noticed the "mono" above: I tested under Windows as well, with Visual Studio - same speed).
I am... puzzled, to say the least. Python runs code faster than F# ? A compiled binary, using the .NET runtime, runs SLOWER than Python's interpreted code?!?!
I know about startup costs of VMs (mono in this case) and how JITs improve things for languages like Python, but still... I expected a speedup, not a slowdown!
Have I done something wrong, perhaps?
I have uploaded the code here:
https://www.thanassis.space/fsharp.slower.than.python.tar.gz
Note that the F# code is more or less a direct, line-by-line translation of the Python code.
P.S. There are of course other gains, e.g. the static type safety offered by F# - but if the resulting speed of an imperative algorithm is worse under F# ... I am disappointed, to say the least.
EDIT: Direct access, as requested in the comments:
the Python code: https://gist.github.com/950697
the FSharp code: https://gist.github.com/950699
Dr Jon Harrop, whom I contacted over e-mail, explained what is going on:
The problem is simply that the program has been optimized for Python. This is common when the programmer is more familiar with one language than the other, of course. You just have to learn a different set of rules that dictate how F# programs should be optimized...
Several things jumped out at me such as the use of a "for i in 1..n do" loop rather than a "for i=1 to n do" loop (which is faster in general but not significant here), repeatedly doing List.mapi on a list to mimic an array index (which allocated intermediate lists unnecessarily) and your use of the F# TryGetValue for Dictionary which allocates unnecessarily (the .NET TryGetValue that accepts a ref is faster in general but not so much here)
... but the real killer problem turned out to be your use of a hash table to implement a dense 2D matrix. Using a hash table is ideal in Python because its hash table implementation has been extremely well optimized (as evidenced by the fact that your Python code is running as fast as F# compiled to native code!) but arrays are a much better way to represent dense matrices, particularly when you want a default value of zero.
The funny part is that when I first coded this algorithm, I DID use a table -- I changed the implementation to a dictionary for reasons of clarity (avoiding the array boundary checks made the code simpler - and much easier to reason about).
Jon transformed my code (back :-)) into its array version, and it runs at 100x speed.
Moral of the story:
F# Dictionary needs work... when using tuples as keys, compiled F# is slower than interpreted Python's hash tables!
Obvious, but no harm in repeating: Cleaner code sometimes means... much slower code.
Thank you, Jon -- much appreciated.
EDIT: the fact that replacing Dictionary with Array makes F# finally run at the speed a compiled language is expected to run at doesn't negate the need for a fix in Dictionary's speed (I hope F# people from MS are reading this). Other algorithms depend on dictionaries/hashes and can't easily be switched to arrays; making programs suffer "interpreter speeds" whenever one uses a Dictionary is, arguably, a bug. If, as some have said in the comments, the problem is not with F# but with the .NET Dictionary, then I'd argue that this... is a bug in .NET!
EDIT2: The clearest solution, that doesn't require the algorithm to switch to arrays (some algorithms simply won't be amenable to that) is to change this:
let optimalResults = new Dictionary<_,_>()
into this:
let optimalResults = new Dictionary<_,_>(HashIdentity.Structural)
This change makes the F# code run 2.7x faster, thus finally beating Python (1.6x faster). The weird thing is that tuples by default use structural comparison, so in principle, the comparisons done by the Dictionary on the keys are the same (with or without Structural). Dr Harrop theorizes that the speed difference may be attributed to virtual dispatch: "AFAIK, .NET does little to optimize virtual dispatch away and the cost of virtual dispatch is extremely high on modern hardware because it is a "computed goto" that jumps the program counter to an unpredictable location and, consequently, undermines branch prediction logic and will almost certainly cause the entire CPU pipeline to be flushed and reloaded".
In plain words, and as suggested by Don Syme (look at the bottom 3 answers), "be explicit about the use of structural hashing when using reference-typed keys in conjunction with the .NET collections". (Dr. Harrop in the comments below also says that we should always use Structural comparisons when using .NET collections).
Dear F# team in MS, if there is a way to automatically fix this, please do.
As Jon Harrop has pointed out, simply constructing the dictionaries using Dictionary(HashIdentity.Structural) gives a major performance improvement (a factor of 3 on my computer). This is almost certainly the minimally invasive change you need to make to get better performance than Python, and keeps your code idiomatic (as opposed to replacing tuples with structs, etc.) and parallel to the Python implementation.
Edit: I was wrong, it's not a question of value type vs reference type. The performance problem was related to the hash function, as explained in other comments. I keep my answer here because there's an interesting discussion. My code partially fixed the performance issue, but this is not the clean and recommended solution.
--
On my computer, I made your sample run twice as fast by replacing the tuple with a struct. This means the equivalent F# code should run faster than your Python code. I don't agree with the comments saying that .NET hashtables are slow; I believe there's no significant difference with Python's or other languages' implementations. Also, I don't agree with "you can't 1-to-1 translate code and expect it to be faster": F# code will generally be faster than Python for most tasks (static typing is very helpful to the compiler). In your sample, most of the time is spent doing hashtable lookups, so it's fair to imagine that both languages should be almost as fast.
I think the performance issue is related to garbage collection (but I haven't checked with a profiler). The reason why using tuples can be slower here than structures has been discussed in a SO question (Why is the new Tuple type in .Net 4.0 a reference type (class) and not a value type (struct)) and a MSDN page (Building tuples):
If they are reference types, this means there can be lots of garbage generated if you are changing elements in a tuple in a tight loop. [...]
F# tuples were reference types, but there was a feeling from the team that they could realize a performance improvement if two, and perhaps three, element tuples were value types instead. Some teams that had created internal tuples had used value instead of reference types, because their scenarios were very sensitive to creating lots of managed objects.
Of course, as Jon said in another comment, the obvious optimization in your example is to replace hashtables with arrays. Arrays are obviously much faster (integer index, no hashing, no collision handling, no reallocation, more compact), but this is very specific to your problem, and it doesn't explain the performance difference with Python (as far as I know, Python code is using hashtables, not arrays).
To reproduce my 50% speedup, here is the full code: http://pastebin.com/nbYrEi5d
In short, I replaced the tuple with this type:
type Tup = {x: int; y: int}
Also, it seems like a detail, but you should move the List.mapi (fun i x -> (i,x)) fileSizes out of the enclosing loop. I believe Python enumerate does not actually allocate a list (so it's fair to allocate the list only once in F#, or use the Seq module, or use a mutable counter).
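A quick check of that last claim: enumerate is lazy, yielding pairs on demand rather than building a list up front.

e = enumerate([10, 20, 30])
print(e)        # <enumerate object at 0x...>
print(next(e))  # (0, 10)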
Hmm... if the hashtable is the major bottleneck, then it is probably the hash function itself. I haven't looked at the specific hash function, but one of the most common hash functions is
((a * x + b) % p) % q
The modulus operation % is painfully slow; if p and q are of the form 2^k - 1, we can do the modulus with an and, an add and a shift operation.
Dietzfelbinger's universal hash function h_a : [2^w] -> [2^l] is
h_a(x) = floor(((a * x) mod 2^w) / 2^(w - l))
where a is a random odd w-bit seed.
It can be computed as (a * x) >> (w - l) in w-bit arithmetic, which is far faster than the first hash function. I once had to implement a hash table with a linked list for collision handling; it took 10 minutes to implement and test, and we tested it with both functions and analysed the difference in speed. As I remember, the second hash function gave around a 4-10x speed gain, depending on the size of the table.
But the thing to learn here is: if your program's bottleneck is hashtable lookup, the hash function has to be fast too.
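For illustration, a sketch of that multiply-shift hash in Python (the parameters are chosen arbitrarily): w-bit keys mapped into l-bit buckets.

import random

w, l = 32, 16
a = random.randrange(1, 2**w, 2)  # random odd w-bit multiplier

def h(x):
    # ((a * x) mod 2**w) >> (w - l): the mask replaces %, the shift replaces /
    return ((a * x) & (2**w - 1)) >> (w - l)

print(h(123456789))  # bucket index in [0, 2**l)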
