FSharp runs my algorithm slower than Python

FSharp runs my algorithm slower than Python - python

Years ago, I solved a problem via dynamic programming:
https://www.thanassis.space/fillupDVD.html
The solution was coded in Python.
As part of expanding my horizons, I recently started learning OCaml/F#. What better way to test the waters, than by doing a direct port of the imperative code I wrote in Python to F# - and start from there, moving in steps towards a functional programming solution.
The results of this first, direct port... are disconcerting:
Under Python:
bash$ time python fitToSize.py
....
real 0m1.482s
user 0m1.413s
sys 0m0.067s
Under FSharp:
bash$ time mono ./fitToSize.exe
....
real 0m2.235s
user 0m2.427s
sys 0m0.063s
(in case you noticed the "mono" above: I tested under Windows as well, with Visual Studio - same speed).
I am... puzzled, to say the least. Python runs code faster than F# ? A compiled binary, using the .NET runtime, runs SLOWER than Python's interpreted code?!?!
I know about startup costs of VMs (mono in this case) and how JITs improve things for languages like Python, but still... I expected a speedup, not a slowdown!
Have I done something wrong, perhaps?
I have uploaded the code here:
https://www.thanassis.space/fsharp.slower.than.python.tar.gz
Note that the F# code is more or less a direct, line-by-line translation of the Python code.
P.S. There are of course other gains, e.g. the static type safety offered by F# - but if the resulting speed of an imperative algorithm is worse under F# ... I am disappointed, to say the least.
EDIT: Direct access, as requested in the comments:
the Python code: https://gist.github.com/950697
the FSharp code: https://gist.github.com/950699

Dr Jon Harrop, whom I contacted over e-mail, explained what is going on:
The problem is simply that the program has been optimized for Python. This is common when the programmer is more familiar with one language than the other, of course. You just have to learn a different set of rules that dictate how F# programs should be optimized...
Several things jumped out at me such as the use of a "for i in 1..n do" loop rather than a "for i=1 to n do" loop (which is faster in general but not significant here), repeatedly doing List.mapi on a list to mimic an array index (which allocated intermediate lists unnecessarily) and your use of the F# TryGetValue for Dictionary which allocates unnecessarily (the .NET TryGetValue that accepts a ref is faster in general but not so much here)
... but the real killer problem turned out to be your use of a hash table to implement a dense 2D matrix. Using a hash table is ideal in Python because its hash table implementation has been extremely well optimized (as evidenced by the fact that your Python code is running as fast as F# compiled to native code!) but arrays are a much better way to represent dense matrices, particularly when you want a default value of zero.
The funny part is that when I first coded this algorithm, I DID use a table -- I changed the implementation to a dictionary for reasons of clarity (avoiding the array boundary checks made the code simpler - and much easier to reason about).
Jon transformed my code (back :-)) into its array version, and it runs at 100x speed.
Moral of the story:
F# Dictionary needs work... when using tuples as keys, compiled F# is slower than interpreted Python's hash tables!
Obvious, but no harm in repeating: Cleaner code sometimes means... much slower code.
Thank you, Jon -- much appreciated.
EDIT: the fact that replacing Dictionary with Array makes F# finally run at the speeds a compiled language is expected to run, doesn't negate the need for a fix in Dictionary's speed (I hope F# people from MS are reading this). Other algorithms depend on dictionaries/hashes, and can't be easily switched to using arrays; making programs suffer "interpreter-speeds" whenever one uses a Dictionary, is arguably, a bug. If, as some have said in the comments, the problem is not with F# but with .NET Dictionary, then I'd argue that this... is a bug in .NET!
EDIT2: The clearest solution, that doesn't require the algorithm to switch to arrays (some algorithms simply won't be amenable to that) is to change this:
let optimalResults = new Dictionary<_,_>()
into this:
let optimalResults = new Dictionary<_,_>(HashIdentity.Structural)
This change makes the F# code run 2.7x times faster, thus finally beating Python (1.6x faster). The weird thing is that tuples by default use structural comparison, so in principle, the comparisons done by the Dictionary on the keys are the same (with or without Structural). Dr Harrop theorizes that the speed difference may be attributed to virtual dispatch: "AFAIK, .NET does little to optimize virtual dispatch away and the cost of virtual dispatch is extremely high on modern hardware because it is a "computed goto" that jumps the program counter to an unpredictable location and, consequently, undermines branch prediction logic and will almost certainly cause the entire CPU pipeline to be flushed and reloaded".
In plain words, and as suggested by Don Syme (look at the bottom 3 answers), "be explicit about the use of structural hashing when using reference-typed keys in conjunction with the .NET collections". (Dr. Harrop in the comments below also says that we should always use Structural comparisons when using .NET collections).
Dear F# team in MS, if there is a way to automatically fix this, please do.

As Jon Harrop has pointed out, simply constructing the dictionaries using Dictionary(HashIdentity.Structural) gives a major performance improvement (a factor of 3 on my computer). This is almost certainly the minimally invasive change you need to make to get better performance than Python, and keeps your code idiomatic (as opposed to replacing tuples with structs, etc.) and parallel to the Python implementation.

Edit: I was wrong, it's not a question of value type vs reference type. The performance problem was related to the hash function, as explained in other comments. I keep my answer here because there's an interessant discussion. My code partially fixed the performance issue, but this is not the clean and recommended solution.
--
On my computer, I made your sample run twice as fast by replacing the tuple with a struct. This means, the equivalent F# code should run faster than your Python code. I don't agree with the comments saying that .NET hashtables are slow, I believe there's no significant difference with Python or other languages implementations. Also, I don't agree with the "You can't 1-to-1 translate code expect it to be faster": F# code will generally be faster than Python for most tasks (static typing is very helpful to the compiler). In your sample, most of the time is spent doing hashtable lookups, so it's fair to imagine that both languages should be almost as fast.
I think the performance issue is related to gabage collection (but I haven't checked with a profiler). The reason why using tuples can be slower here than structures has been discussed in a SO question ( Why is the new Tuple type in .Net 4.0 a reference type (class) and not a value type (struct)) and a MSDN page (Building tuples):
If they are reference types, this
means there can be lots of garbage
generated if you are changing elements
in a tuple in a tight loop. [...]
F# tuples were reference types, but
there was a feeling from the team that
they could realize a performance
improvement if two, and perhaps three,
element tuples were value types
instead. Some teams that had created
internal tuples had used value instead
of reference types, because their
scenarios were very sensitive to
creating lots of managed objects.
Of course, as Jon said in another comment, the obvious optimization in your example is to replace hashtables with arrays. Arrays are obviously much faster (integer index, no hashing, no collision handling, no reallocation, more compact), but this is very specific to your problem, and it doesn't explain the performance difference with Python (as far as I know, Python code is using hashtables, not arrays).
To reproduce my 50% speedup, here is the full code: http://pastebin.com/nbYrEi5d
In short, I replaced the tuple with this type:
type Tup = {x: int; y: int}
Also, it seems like a detail, but you should move the List.mapi (fun i x -> (i,x)) fileSizes out of the enclosing loop. I believe Python enumerate does not actually allocate a list (so it's fair to allocate the list only once in F#, or use Seq module, or use a mutable counter).

Hmm.. if the hashtable is the major bottleneck, then it is properly the hash function itself. Havn't look at the specific hash function but For one of the most common hash functions namely
((a * x + b) % p) % q
The modulus operation % is painfully slow, if p and q is of the form 2^k - 1, we can do modulus with an and, add and a shift operation.
Dietzfelbingers universal hash function h_a : [2^w] -> [2^l]
lowerbound(((a * x) % 2^w)/2^(w-l))
Where is a random odd seed of w-bit.
It can be computed by (a*x) >> (w-l), which is magnitudes of speed faster than the first hash function. I had to implement a hash table with linked list as collision handling. It took 10 minutes to implement and test, we had to test it with both functions, and analyse the differens of speed. The second hash function had as I remember around 4-10 times of speed gain dependend on the size of the table.
But the thing to learn here is if your programs bottleneck is hashtable lookup the hash function has to be fast too

Related

How to improve Python code speed

I was solving this python challenge http://coj.uci.cu/24h/problem.xhtml?abb=2634 and this is my answer
c = int(input())
l = []
for j in range(c) :
i = raw_input().split()[1].split('/')
l.append(int(i[1]))
for e in range(1,13) :
print e , l.count(e)
But it was not the fastest python solution, so i tried to find how to improve the speed and i found that xrange was faster than range. But when i tried the following code it was actually slower
c = int(input())
l = []
for j in xrange(c):
i = raw_input().split()[1].split('/')[1]
l.append(i)
for e in xrange(1,13) :
print e , l.count(`e`)
so i have 2 questions :
How can i improve the speed of my script
Where can i find information on how to improve python speed
When i was looking for this info i found sites like this one https://wiki.python.org/moin/PythonSpeed/PerformanceTips
but it doesn't specify for example, if it is faster/slower to split a string multiple times in a single line or in multiple lines, for example using part of the script mentioned above :
i = raw_input().split()[1].split('/')[1]
vs
i = raw_input().split()
i = i[1].split('/')
i = i[1]
Edit : I have tried all your suggestions but my first answer is still the fastest and i don't know why. My firs answer was 151ms and #Bakuriu's answer was 197ms and my answer using collections.Counter was 188ms.
Edit 2 : Please disregard my last edit, i just found out that the method for checking your code performance in the site mentioned above does not work, if you upload the same code more times the performance is different each time, some times it's slower and sometimes faster

Assuming you are using CPython, the golden rule is to push as much work as possible into built-in functions, since these are written in C and thus avoid the interpreter overhead.
This means that you should:
Avoid explicit loops when there is a function/method that already does what you want
Avoid expensive lookups in inner loops. In rare circumstances you may go as far as use local variables to store built-in functions.
Use the right data structures. Don't simply use lists and dicts. The standard library contains other data types, and there are many libraries out there. Consider which should be the efficient operations to solve your problem and choose the correct data structure
Avoid meta-programming. If you need speed you don't want a simple attribute lookup to trigger 10 method calls with complex logic behind the scenes. (However where you don't really need speed metaprogramming is really cool!)
Profile your code to find the bottleneck and optimize the bottleneck. Often what we think about performance of some concrete code is completely wrong.
Use the dis module to disassemble the bytecode. This gives you a simple way to see what the interpreter will really do. If you really want to know how the interpreter works you should try to read the source for PyEval_EvalFrameEx which contains the mainloop of the interpreter (beware: hic sunt leones!).
Regarding CPython you should read An optimization anecdote by Guido Van Rossum. It gives many insights as to how performance can change with various solutions. An other example could be this answer (disclaimer: it's mine) where the fastest solution is probably very counter intuitive for someone not used to CPython workings.
An other good thing to do is to study all most used built-in and stdlib data types, since each one has both positive and negative proporties. In this specific case calling list.count() is an heavy operation, since it has to scan the whole list every time it is performed. That's probably were a lot of the time is consumed in your solution.
One way to minimize interpreter overhead is to use collections.Counter, which also avoids scanning the data multiple times:
from collections import Counter
counts = Counter(raw_input().split('/')[-2] for _ in range(int(raw_input())))
for i in range(1, 13):
print(i, counts[str(i)])
Note that there is no need to convert the month to an integer, so you can avoid those function calls (assuming the months are always written in the same way. No 07 and 7).
Also I don't understand why you are splitting on whitespace and then on the / when you can simply split by the / and take the one-to-last element from the list.
An other (important) optimization could be to read all stdin to avoid multiple IO calls, however this may not work in this situation since the fact that they tell you how many employees are there probably means that they are not sending an EOF.
Note that different versions of python have completely different ways of optimizing code. For example PyPy's JIT works best when you perform simply operations in loops that the JIT is able to analyze and optimize. So it's about the opposite of what you would do in CPython.

absolute fastest lookup in python / cython

I'd like to do a lookup mapping 32bit integer => 32bit integer.
The input keys aren't necessary contiguous nor cover 2^32 -1 (nor do I want this in-memory to consume that much space!).
The use case is for a poker evaluator, so doing a lookup must be as fast as possible. Perfect hashing would be nice, but that might be a bit out of scope.
I feel like the answer is some kind of cython solution, but I'm not sure about the underpinnings of cython and if it really does any good with Python's dict() type. Of course a flat array with just a simple offset jump would be super fast, but then I'm allocating 2^32 - 1 places in memory for the table, which I don't want.
Any tips / strategies? Absolute speed with minimal memory footprint is the goal.

You aren't smart enough to write something faster than dict. Don't feel bad; 99.99999% of the people on the planet aren't. Use a dict.

First, you should actually define what "fast enough" means to you, before you do anything else. You can always make something faster, so you need to set a target so you don't go insane. It is perfectly reasonable for this target to be dual-headed - say something like "Mapping lookups must execute in these parameters (min/max/mean), and when/if we hit those numbers we're willing to spend X more development hours to optimize even further, but then we'll stop."
Second, the very first thing you should do to make this faster is to copy the code in Objects/dictobject.c in the Cpython source tree (make something new like intdict.c or something) and then modify it so that the keys are not python objects. Chasing after a better hash function will not likely be a good use of your time for integers, but eliminating INCREF/DECREF and PyObject_RichCompareBool calls for your keys will be a huge win. Since you're not deleting keys you could also elide any checks for dummy values (which exist to preserve the collision traversal for deleted entries), although it's possible that you'll get most of that win for free simply by having better branch prediction for your new object.

You are describing a perfect use case for a hash indexed collection. You are also describing a perfect scenario for the strategy of write it first, optimise it second.
So start with the Python dict. It's fast and it absolutely will do the job you need.
Then benchmark it. Figure out how fast it needs to go, and how near you are. Then 3 choices.
It's fast enough. You're done.
It's nearly fast enough, say within about a factor of two. Write your own hash indexing, paying attention to the hash function and the collision strategy.
It's much too slow. You're dead. There is nothing simple that will give you a 10x or 100x improvement. At least you didn't waste any time on a better hash index.

Can I improve python runtime by compiling?

I'm writing a small toy simulation in python. Granted, this simulations are slow. To my understanding, the major reason that python codes are slow is the fact that python is in interpreted language. I don't want to give up python since the clear syntax and the available library cut the writing time significantly. So is there a simple way for me to "compile" my python code?
Edit
I answer some questions:
Yes, I'm using numpy. It greatly simplify the code and I don't think I can improve performance writing the functions on my own. I use numpy for all my lists and and I add all of the beads together. Namely. I invoke
pos += V*dt + forces*0.5*dt**2
where ''pos'', 'V', and 'forces' are all np.array of (2000,3) dimensions.
I'm quite certain that the slow part in the forces calculation. This is logical as I have to iterate over all my particles and check their position. For my real project (Ph.D. stuff) I have code of about roughly the same level of complexity, and I know that this is the expensive stuff.

If none of the solutions in the comment suffice, you can also take a look at cython.
For a quick tutorial & example check:
http://docs.cython.org/src/tutorial/cython_tutorial.html
Used at the correct spots (e.g. around frequently called functions) it can easily speed things up by a factor of 10 - 100.

Python is a slightly odd language in that it is both interpreted and compiled. Well sort of. When you run it is compiled to ".pyc" bytecode - so we can quickly get bogged down in semantic details here. Hell I don't even know if what I just said is strictly accurate. But at the end of the day you want to speed things up so...
First, use the profiler and timeit to work out where all the time is going
Second, rewrite your pure python code to improve the slow bits you've discovered
Third, see how it goes when optimised
Now, depends on your scenario, but seriously think "Can I run it on a bigger CPU/memory"
Ok, try rewriting those slow sections in C++
Screw it, write it all in C++
If you get so far as the last option I dare say you're screwed and the savings aren't going to be significant.

Why program functionally in Python?

At work we used to program our Python in a pretty standard OO way. Lately, a couple guys got on the functional bandwagon. And their code now contains lots more lambdas, maps and reduces. I understand that functional languages are good for concurrency but does programming Python functionally really help with concurrency? I am just trying to understand what I get if I start using more of Python's functional features.

Edit: I've been taken to task in the comments (in part, it seems, by fanatics of FP in Python, but not exclusively) for not providing more explanations/examples, so, expanding the answer to supply some.
lambda, even more so map (and filter), and most especially reduce, are hardly ever the right tool for the job in Python, which is a strongly multi-paradigm language.
lambda main advantage (?) compared to the normal def statement is that it makes an anonymous function, while def gives the function a name -- and for that very dubious advantage you pay an enormous price (the function's body is limited to one expression, the resulting function object is not pickleable, the very lack of a name sometimes makes it much harder to understand a stack trace or otherwise debug a problem -- need I go on?!-).
Consider what's probably the single most idiotic idiom you sometimes see used in "Python" (Python with "scare quotes", because it's obviously not idiomatic Python -- it's a bad transliteration from idiomatic Scheme or the like, just like the more frequent overuse of OOP in Python is a bad transliteration from Java or the like):
inc = lambda x: x + 1
by assigning the lambda to a name, this approach immediately throws away the above-mentioned "advantage" -- and doesn't lose any of the DISadvantages! For example, inc doesn't know its name -- inc.__name__ is the useless string '<lambda>' -- good luck understanding a stack trace with a few of these;-). The proper Python way to achieve the desired semantics in this simple case is, of course:
def inc(x): return x + 1
Now inc.__name__ is the string 'inc', as it clearly should be, and the object is pickleable -- the semantics are otherwise identical (in this simple case where the desired functionality fits comfortably in a simple expression -- def also makes it trivially easy to refactor if you need to temporarily or permanently insert statements such as print or raise, of course).
lambda is (part of) an expression while def is (part of) a statement -- that's the one bit of syntax sugar that makes people use lambda sometimes. Many FP enthusiasts (just as many OOP and procedural fans) dislike Python's reasonably strong distinction between expressions and statements (part of a general stance towards Command-Query Separation). Me, I think that when you use a language you're best off using it "with the grain" -- the way it was designed to be used -- rather than fighting against it; so I program Python in a Pythonic way, Scheme in a Schematic (;-) way, Fortran in a Fortesque (?) way, and so on:-).
Moving on to reduce -- one comment claims that reduce is the best way to compute the product of a list. Oh, really? Let's see...:
$ python -mtimeit -s'L=range(12,52)' 'reduce(lambda x,y: x*y, L, 1)'
100000 loops, best of 3: 18.3 usec per loop
$ python -mtimeit -s'L=range(12,52)' 'p=1' 'for x in L: p*=x'
100000 loops, best of 3: 10.5 usec per loop
so the simple, elementary, trivial loop is about twice as fast (as well as more concise) than the "best way" to perform the task?-) I guess the advantages of speed and conciseness must therefore make the trivial loop the "bestest" way, right?-)
By further sacrificing compactness and readability...:
$ python -mtimeit -s'import operator; L=range(12,52)' 'reduce(operator.mul, L, 1)'
100000 loops, best of 3: 10.7 usec per loop
...we can get almost back to the easily obtained performance of the simplest and most obvious, compact, and readable approach (the simple, elementary, trivial loop). This points out another problem with lambda, actually: performance! For sufficiently simple operations, such as multiplication, the overhead of a function call is quite significant compared to the actual operation being performed -- reduce (and map and filter) often forces you to insert such a function call where simple loops, list comprehensions, and generator expressions, allow the readability, compactness, and speed of in-line operations.
Perhaps even worse than the above-berated "assign a lambda to a name" anti-idiom is actually the following anti-idiom, e.g. to sort a list of strings by their lengths:
thelist.sort(key=lambda s: len(s))
instead of the obvious, readable, compact, speedier
thelist.sort(key=len)
Here, the use of lambda is doing nothing but inserting a level of indirection -- with no good effect whatsoever, and plenty of bad ones.
The motivation for using lambda is often to allow the use of map and filter instead of a vastly preferable loop or list comprehension that would let you do plain, normal computations in line; you still pay that "level of indirection", of course. It's not Pythonic to have to wonder "should I use a listcomp or a map here": just always use listcomps, when both appear applicable and you don't know which one to choose, on the basis of "there should be one, and preferably only one, obvious way to do something". You'll often write listcomps that could not be sensibly translated to a map (nested loops, if clauses, etc), while there's no call to map that can't be sensibly rewritten as a listcomp.
Perfectly proper functional approaches in Python often include list comprehensions, generator expressions, itertools, higher-order functions, first-order functions in various guises, closures, generators (and occasionally other kinds of iterators).
itertools, as a commenter pointed out, does include imap and ifilter: the difference is that, like all of itertools, these are stream-based (like map and filter builtins in Python 3, but differently from those builtins in Python 2). itertools offers a set of building blocks that compose well with each other, and splendid performance: especially if you find yourself potentially dealing with very long (or even unbounded!-) sequences, you owe it to yourself to become familiar with itertools -- their whole chapter in the docs makes for good reading, and the recipes in particular are quite instructive.
Writing your own higher-order functions is often useful, especially when they're suitable for use as decorators (both function decorators, as explained in that part of the docs, and class decorators, introduced in Python 2.6). Do remember to use functools.wraps on your function decorators (to keep the metadata of the function getting wrapped)!
So, summarizing...: anything you can code with lambda, map, and filter, you can code (more often than not advantageously) with def (named functions) and listcomps -- and usually moving up one notch to generators, generator expressions, or itertools, is even better. reduce meets the legal definition of "attractive nuisance"...: it's hardly ever the right tool for the job (that's why it's not a built-in any more in Python 3, at long last!-).

FP is important not only for concurrency; in fact, there's virtually no concurrency in the canonical Python implementation (maybe 3.x changes that?). in any case, FP lends itself well to concurrency because it leads to programs with no or fewer (explicit) states. states are troublesome for a few reasons. one is that they make distributing the computation hard(er) (that's the concurrency argument), another, far more important in most cases, is the tendency to inflict bugs. the biggest source of bugs in contemporary software is variables (there's a close relationship between variables and states). FP may reduce the number of variables in a program: bugs squashed!
see how many bugs can you introduce by mixing the variables up in these versions:
def imperative(seq):
p = 1
for x in seq:
p *= x
return p
versus (warning, my.reduce's parameter list differs from that of python's reduce; rationale given later)
import operator as ops
def functional(seq):
return my.reduce(ops.mul, 1, seq)
as you can see, it's a matter of fact that FP gives you fewer opportunities to shoot yourself in the foot with a variables-related bug.
also, readability: it may take a bit of training, but functional is way easier to read than imperative: you see reduce ("ok, it's reducing a sequence to a single value"), mul ("by multiplication"). wherease imperative has the generic form of a for cycle, peppered with variables and assignments. these for cycles all look the same, so to get an idea of what's going on in imperative, you need to read almost all of it.
then there's succintness and flexibility. you give me imperative and I tell you I like it, but want something to sum sequences as well. no problem, you say, and off you go, copy-pasting:
def imperative(seq):
p = 1
for x in seq:
p *= x
return p
def imperative2(seq):
p = 0
for x in seq:
p += x
return p
what can you do to reduce the duplication? well, if operators were values, you could do something like
def reduce(op, seq, init):
rv = init
for x in seq:
rv = op(rv, x)
return rv
def imperative(seq):
return reduce(*, 1, seq)
def imperative2(seq):
return reduce(+, 0, seq)
oh wait! operators provides operators that are values! but.. Alex Martelli condemned reduce already... looks like if you want to stay within the boundaries he suggests, you're doomed to copy-pasting plumbing code.
is the FP version any better? surely you'd need to copy-paste as well?
import operator as ops
def functional(seq):
return my.reduce(ops.mul, 1, seq)
def functional2(seq):
return my.reduce(ops.add, 0, seq)
well, that's just an artifact of the half-assed approach! abandoning the imperative def, you can contract both versions to
import functools as func, operator as ops
functional = func.partial(my.reduce, ops.mul, 1)
functional2 = func.partial(my.reduce, ops.add, 0)
or even
import functools as func, operator as ops
reducer = func.partial(func.partial, my.reduce)
functional = reducer(ops.mul, 1)
functional2 = reducer(ops.add, 0)
(func.partial is the reason for my.reduce)
what about runtime speed? yes, using FP in a language like Python will incur some overhead. here i'll just parrot what a few professors have to say about this:
premature optimization is the root of all evil.
most programs spend 80% of their runtime in 20% percent of their code.
profile, don't speculate!
I'm not very good at explaining things. Don't let me muddy the water too much, read the first half of the speech John Backus gave on the occasion of receiving the Turing Award in 1977. Quote:
5.1 A von Neumann Program for Inner Product
c := 0
for i := I step 1 until n do
c := c + a[i] * b[i]
Several properties of this program are
worth noting:
Its statements operate on an invisible "state" according to complex
rules.
It is not hierarchical. Except for the right side of the assignment
statement, it does not construct
complex entities from simpler ones.
(Larger programs, however, often do.)
It is dynamic and repetitive. One must mentally execute it to
understand it.
It computes word-at-a-time by repetition (of the assignment) and by
modification (of variable i).
Part of the data, n, is in the program; thus it lacks generality and
works only for vectors of length n.
It names its arguments; it can only be used for vectors a and b.
To become general, it requires a
procedure declaration. These involve
complex issues (e.g., call-by-name
versus call-by-value).
Its "housekeeping" operations are represented by symbols in
scattered places (in the for statement
and the subscripts in the assignment).
This makes it impossible to
consolidate housekeeping operations,
the most common of all, into single,
powerful, widely useful operators.
Thus in programming those operations
one must always start again at square
one, writing "for i := ..." and
"for j := ..." followed by
assignment statements sprinkled with
i's and j's.

I program in Python everyday, and I have to say that too much 'bandwagoning' toward OO or functional could lead toward missing elegant solutions. I believe that both paradigms have their advantages to certain problems - and I think that's when you know what approach to use. Use a functional approach when it leaves you with a clean, readable, and efficient solution. Same goes for OO.
And that's one of the reasons I love Python - the fact that it is multi-paradigm and lets the developer choose how to solve his/her problem.

This answer is completely re-worked. It incorporates a lot of observations from the other answers.
As you can see, there is a lot of strong feelings surrounding the use of functional programming constructs in Python. There are three major groups of ideas here.
First, almost everybody but the people who are most wedded to the purest expression of the functional paradigm agree that list and generator comprehensions are better and clearer than using map or filter. Your colleagues should be avoiding the use of map and filter if you are targeting a version of Python new enough to support list comprehensions. And you should be avoiding itertools.imap and itertools.ifilter if your version of Python is new enough for generator comprehensions.
Secondly, there is a lot of ambivalence in the community as a whole about lambda. A lot of people are really annoyed by a syntax in addition to def for declaring functions, especially one that involves a keyword like lambda that has a rather strange name. And people are also annoyed that these small anonymous functions are missing any of the nice meta-data that describes any other kind of function. It makes debugging harder. Lastly the small functions declared by lambda are often not terribly efficient as they require the overhead of a Python function call each time they are invoked, which is frequently in an inner loop.
Lastly, most (meaning > 50%, but most likely not 90%) people think that reduce is a little strange and obscure. I myself admit to having print reduce.__doc__ whenever I want to use it, which isn't all that often. Though when I see it used, the nature of the arguments (i.e. function, list or iterator, scalar) speak for themselves.
As for myself, I fall in the camp of people who think the functional style is often very useful. But balancing that thought is the fact that Python is not at heart a functional language. And overuse of functional constructs can make programs seem strangely contorted and difficult for people to understand.
To understand when and where the functional style is very helpful and improves readability, consider this function in C++:
unsigned int factorial(unsigned int x)
{
int fact = 1;
for (int i = 2; i <= n; ++i) {
fact *= i;
}
return fact
}
This loop seems very simple and easy to understand. And in this case it is. But its seeming simplicity is a trap for the unwary. Consider this alternate means of writing the loop:
unsigned int factorial(unsigned int n)
{
int fact = 1;
for (int i = 2; i <= n; i += 2) {
fact *= i--;
}
return fact;
}
Suddenly, the loop control variable no longer varies in an obvious way. You are reduced to looking through the code and reasoning carefully about what happens with the loop control variable. Now this example is a bit pathological, but there are real-world examples that are not. And the problem is with the fact that the idea is repeated assignment to an existing variable. You can't trust the variable's value is the same throughout the entire body of the loop.
This is a long recognized problem, and in Python writing a loop like this is fairly unnatural. You have to use a while loop, and it just looks wrong. Instead, in Python you would write something like this:
def factorial(n):
fact = 1
for i in xrange(2, n):
fact = fact * i;
return fact
As you can see, the way you talk about the loop control variable in Python is not amenable to fooling with it inside the loop. This eliminates a lot of the problems with 'clever' loops in other imperative languages. Unfortunately, it's an idea that's semi-borrowed from functional languages.
Even this lends itself to strange fiddling. For example, this loop:
c = 1
for i in xrange(0, min(len(a), len(b))):
c = c * (a[i] + b[i])
if i < len(a):
a[i + 1] = a[a + 1] + 1
Oops, we again have a loop that is difficult to understand. It superficially resembles a really simple and obvious loop, and you have to read it carefully to realize that one of the variables used in the loop's computation is being messed with in a way that will effect future runs of the loop.
Again, a more functional approach to the rescue:
from itertools import izip
c = 1
for ai, bi in izip(a, b):
c = c * (ai + bi)
Now by looking at the code we have some strong indication (partly by the fact that the person is using this functional style) that the lists a and b are not modified during the execution of the loop. One less thing to think about.
The last thing to be worried about is c being modified in strange ways. Perhaps it is a global variable and is being modified by some roundabout function call. To rescue us from this mental worry, here is a purely function approach:
from itertools import izip
c = reduce(lambda x, ab: x * (ab[0] + ab[1]), izip(a, b), 1)
Very concise, and the structure tells us that x is purely an accumulator. It is a local variable everywhere it appear. The final result is unambiguously assigned to c. Now there is much less to worry about. The structure of the code removes several classes of possible error.
That is why people might choose a functional style. It is concise and clear, at least if you understand what reduce and lambda do. There are large classes of problems that could afflict a program written in a more imperative style that you know won't afflict your functional style program.
In the case of factorial, there is a very simple and clear way to write this function in Python in a functional style:
import operator
def factorial(n):
return reduce(operator.mul, xrange(2, n+1), 1)

The question, which seems to be mostly ignored here:
does programming Python functionally really help with concurrency?
No. The value FP brings to concurrency is in eliminating state in computation, which is ultimately responsible for the hard-to-grasp nastiness of unintended errors in concurrent computation. But it depends on the concurrent programming idioms not themselves being stateful, something that doesn't apply to Twisted. If there are concurrency idioms for Python that leverage stateless programming, I don't know of them.

Here's a short summary of positive answers when/why to program functionally.
List comprehensions were imported from Haskell, a FP language. They are Pythonic. I'd prefer to write
y = [i*2 for i in k if i % 3 == 0]
than to use an imperative construct (loop).
I'd use lambda when giving a complicated key to sort, like list.sort(key=lambda x: x.value.estimate())
It's cleaner to use higher-order functions than to write code using OOP's design patterns like visitor or abstract factory
People say that you should program Python in Python, C++ in C++ etc. That's true, but certainly you should be able to think in different ways at the same thing. If while writing a loop you know that you're really doing reducing (folding), then you'll be able to think on a higher level. That cleans your mind and helps to organize. Of course lower-level thinking is important too.
You should NOT overuse those features - there are many traps, see Alex Martelli's post. I'd subjectively say the most serious danger is that excessive use of those features will destroy readability of your code, which is a core attribute of Python.

The standard functions filter(), map() and reduce() are used for various operations on a list and all of the three functions expect two arguments: A function and a list
We could define a separate function and use it as an argument to filter() etc., and its probably a good idea if that function is used several times, or if the function is too complex to be written in a single line. However, if it's needed only once and it's quite simple, it's more convenient to use a lambda construct to generate a (temporary) anonymous function and pass it to filter().
This helps in readability and compact code.
Using these function, would also turn out to be efficient, because the looping on the elements of the list is done in C, which is a little bit faster than looping in python.
And object oriented way is forcibly needed when states are to be maintained, apart from abstraction, grouping, etc., If the requirement is pretty simple, I would stick with functional than to Object Oriented programming.

Map and Filter have their place in OO programming. Right next to list comprehensions and generator functions.
Reduce less so. The algorithm for reduce can rapidly suck down more time than it deserves; with a tiny bit of thinking, a manually-written reduce-loop will be more efficient than a reduce which applies a poorly-thought-out looping function to a sequence.
Lambda never. Lambda is useless. One can make the argument that it actually does something, so it's not completely useless. First: Lambda is not syntactic "sugar"; it makes things bigger and uglier. Second: the one time in 10,000 lines of code that think you need an "anonymous" function turns into two times in 20,000 lines of code, which removes the value of anonymity, making it into a maintenance liability.
However.
The functional style of no-object-state-change programming is still OO in nature. You just do more object creation and fewer object updates. Once you start using generator functions, much OO programming drifts in a functional direction.
Each state change appears to translate into a generator function that builds a new object in the new state from old object(s). It's an interesting world view because reasoning about the algorithm is much, much simpler.
But that's no call to use reduce or lambda.

Performance difference in alternative switches in Python

I have read a few articles around alternatives to the switch statement in Python. Mainly using dicts instead of lots of if's and elif's. However none really answer the question: is there one with better performance or efficiency? I have read a few arguments that if's and elifs would have to check each statement and becomes inefficient with many ifs and elif's. However using dicts gets around that, but you end up having to create new modules to call which cancels the performance gain anyways. The only difference in the end being readability.
Can anyone comment on this, is there really any difference in the long run? Does anyone regularly use the alternative? Only reason I ask is because I am going to end up having 30-40 elif/if's and possibly more in the future. Any input is appreciated. Thanks.

dict's perfomance is typically going to be unbeatable, because a lookup into a dict is going to be O(1) except in rare and practically never-observed cases (where they key involves user-coded types with lousy hashing;-). You don't have to "create new modules" as you say, just arbitrary callables, and that creation, which is performed just once to prep the dict, is not particularly costly anyway -- during operation, it's just one lookup and one call, greased lightning time.
As others have suggested, try timeit to experiment with a few micro-benchmarks of the alternatives. My prediction: with a few dozen possibilities in play, as you mention you have, you'll be slapping your forehead about ever considering anything but a dict of callables!-)
If you find it too hard to run your own benchmarks and can supply some specs, I guess we can benchmark the alternatives for you, but it would be really more instructive if you tried to do it yourself before you ask SO for help!-)

Your concern should be about the readability and maintainability of the code, rather than its efficiency. This applies in most scenarios, and particularly in the one you describe now. The efficiency difference is likely to be negligible (you can easily check it with a small amount of benchmarking code), but 30-40 elif's are a warning sign - perhaps something can be abstracted away and make the code more readable. Describe your case, and perhaps someone can come up with a better design.

With all performance/profiling questions, the right answer is "test each case yourself for your specific needs."
One great tool for this is timeit which you can learn about in the python docs.
In general I have seen no performance issues related to using a dictionary in place of other languages switch statement. My guess would be that the comparison in performance would depend on the number of alternatives. Who knows, there may be a tipping point where one becomes better than the other.
If you (or anyone else) tests it, feel free to post your results.

Times when you'd use a switch in many languages you would use a dict in Python. A switch statement, if added to Python (it's been considered), would not be able to give any real performance gain anyways.
dicts are used ubiquitously in Python. CPython dicts are an insanely-efficient, robust hashtable implementation. Lookup is O(1), as opposed to traversing an elif chain, which is O(n). (30-40 probably doesn't qualify as big enough for this to matter tons anyways). I am not sure what you mean about creating new modules to call, but using dicts is very scalable and easy.
As for actual performance gain, that is impossible to really tackle effectively abstractly. Write your code in the most straightforward and maintainable way (you're using Python forgoshsakes!) and then see if it's too slow. If it is, profile it and find out what places it needs to be sped up to make a real difference.

I think a dict will gain advantage over the alternative sequence of if statements as the number of cases goes up, since the key lookup only requires one hash operation. Otherwise if you only have a few cases, a few if statements are better. A dict is probably a more elegant solution for what you are doing. Either way, the performance difference wont really be noticeable in your case.

I ran a few benchmarks (see here). Using lists of function pointers is fastest,if your keys are sequential integers. For the general case: Alex Martelli is right, dictionary are fastest.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.