Most efficient way to sum a list in Python [duplicate] - python

This question already has answers here:
Summing with a for loop faster than with reduce?
(3 answers)
Closed 7 years ago.
Among the ways to obtain the sum of a list A of ints in Python are the following two:
Built-in sum function: sum(A)
Reduce function with adder lambda: reduce(lambda x, y: x + y, A)
Is there any speed advantage to using either of these, or are their performances roughly the same?

Among the ways to obtain the sum of a list A of ints in Python are the
following two:
Built-in sum function: sum(A)
Reduce function with adder lambda: reduce(lambda x, y: x + y, A)
Is there any speed advantage to using either of these, or are their
performances roughly the same?
On my machine the "sum" function appears to be way faster than the "reduce" version (at least for summing 5000 arrays of size 1000).
See:
$ cat doit.py
from timeit import timeit
print timeit('reduce(lambda x, y: x + y, range(1000))',number=5000)
print timeit('sum(range(1000))',number=5000)
$ python2 doit.py
0.460000038147
0.0599999427795
Update:
To address the comment, I've updated my answer to also include a 'setup' for creating the array to be summed:
$ cat doit2.py
from timeit import timeit
print timeit('reduce(lambda x, y: x + y, a)',setup='a=range(1000)',number=5000)
print timeit('sum(a)',setup='a=range(1000)',number=5000)
$ python2 doit2.py
0.530030012131
0.0320019721985
Again, the "sum" version appears to be the clear winner.

The answer almost certainly varies depending on the implementation you're using. However, as a best practice, you should assume that the built-in function has the best performance unless you have (a) proved otherwise and (b) shown that the difference is impacting performance in your specific application.
There are two complementary reasons for this. First, it's safe to assume that the people who implement the language are concerned with performance, and that they hear every complaint (justified or not) about performance. Therefore, if there's a better implementation, it's safe to assume that they will change to that implementation as soon as possible.
And if an even better one comes on line, you can assume that they'll change to that. This means that you get speed improvements for free, as they're discovered.
Second, it's safe to assume that the built-in function is going to communicate better in your codebase than an inline lambda. It's just simpler to read "sum" and understand "sum" than to parse the lambda. Since programmer time is in general vastly more expensive than CPU time, it makes sense to always optimize for the former over the latter, unless there is a clear and specific reason to do otherwise.

Related

Functional programming in Python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 months ago.
Improve this question
Functional programming is one of the programming paradigms in Python. It treats computation as the evaluation of mathematical functions and avoids state and mutable data. I am trying to understand how Python incorporates functional programming.
Consider the following factorial program (factorial.py):
def factorial(n, total):
if n == 0:
return total
else:
return factorial(n-1, total*n)
num = raw_input("Enter a natural number: ")
print factorial(int(num), 1)
That code avoids mutable data, because we are not changing the value of any variable. We are only recursively calling the factorial function with a new value.
If the example given above for functional programming is correct, then what does avoiding state mean?
Does functional programming only mean that I must use only functions whenever I have computations (as given in the above example)?
If the given example is wrong, what is a simple example with an explanation?
The example is correct for functional programming. And a good example of what not to do in Python because it is inefficient and doesn't scale. Python doesn't have any tail-call optimisation, so recursive calls should not be used solely to avoid imperative loops. If you really start programming in this style in Python, your programs will end with runtime errors eventually.
You are describing pure functional programming which is not something Python could be used for.
Python supports functional programming to some degree in the sense that functions are first class values. That means functions can be passed to other functions and returned as results from functions. And the standard library contains functions also found in most functional programming languages standard libraries, like map(), filter(), reduce(), and the stuff in the functools and itertools modules.
Borrowing from http://ua.pycon.org/static/talks/kachayev/#/8 where he makes a comparison between the way one thinks of imperative and functional programs. The example is borrowed.
Imperative:
expr, res = "28+32+++32++39", 0
for t in expr.split("+"):
if t != "":
res += int(t)
print res
Functional:
from operator import add
expr = "28+32+++32++39"
print reduce(add, map(int, filter(bool, expr.split("+"))))
If the given example is wrong, then kindly provide another simple example with an explanation.
It’s not, and it shows a realistic problem to start with, which you’ll see if you call factorial with a huge number. The maximum recursion depth will be reached. Python doesn’t have tail call optimization.
It the example given above for functional programming is correct, then what does avoiding state mean.
It means (in Python) that once a variable had been assigned, you should not reassign a new value, or change the value that you assigned to that variable.
Secondly, does functional programming only mean that, I must use only functions whenever I have computations (as given in the above example)
Functional programming is quite broad. Python is a multi-paradigm language that supports some functional programming concepts.
Functional programming means that all computations should be seen as mathematical functions.
I wrote a post about it that explains all the above in greater detail: Use Functional Programming In Python
Basics
Functional programming is a paradigm where we use only expressions and no statements. Statements do something like an if, whereas expressions evaluate mathematical stuff. We try to avoid mutability (values getting changed), since we only want pure functions. Pure functions don't have side effects, meaning that a function, given the same input, always produces the same output. We want functions to be pure, since they are easier to debug. By doing this, we are describing what a function is, as opposed to giving steps about how to do something and we write a lot less code. Functional programming is often called declarative, while other approaches are called imperative.
Utilities
Know about built-in functions, since they are so useful. For starters: abs, round, pow, max & min, sum, len, sorted, reversed, zip, and range.
Lambdas
Anonymous functions or lambdas are pure functions that take input and produce a value. There isn't any return statement, since we are returning what we are evaluating. You can also give them a name by just declaring a variable.
(lambda x, y: x + y)(1, 1)
add = lambda x, y: x + y
add(1, 1)
Ternary operator
Since we are working with expressions, instead of using if statements for logic, we use the ternary operator.
(expression) if (condition) else (expression2)
Map, filter & reduce
We also need a way of looping. This comes with list comprehension.
In this example we add one to each array element.
inc = lambda x: [i + 1 for i in x]
We can also operate on elements that meet a condition.
evens = lambda x: [i for i in x if (i % 2) == 0]
Some people say that that is the correct Pythonic way of doing things. But people who are more serious use a different approach: map, filter, and reduce. These are the building blocks of functional programming. While map and filter are built-in functions, reduce used to be, but now is in the functools module. To use it, import it:
from functools import reduce
Map is a function that takes a function and calls it on the elements of an array. The result of a map function is unfortunately unreadable, so you need to turn it into a tuple or a collection.
inc = lambda x: tuple(map(lambda y: y + 1, x))
Filter is a function that calls a function on an array, keeps the elements that output True and removes the ones that are False. Like map, it's not readable.
evens = lambda x: tuple(filter(lambda y: (y % 2) == 0, x))
Reduce takes a function that has two parameters. One is the result of the last time it was called & one is the new value. It keeps doing this until it reduces down to a value.
from functools import reduce
m_sum = lambda x: reduce(lambda y, z: y + z, x)
Let and do
A lot of functional programming languages have do. Do is a function that takes n arguments, evaluates all of them and returns the value of the last one. Python doesn't have do, but I created one myself.
do = lambda *args: tuple(map(lambda y: y, args))[-1]
Here's an example using do that prints something and exits.
print_err = lambda x: do(print(x), exit())
We use do to get imperative advantages.
Let allows some expression to be equal to something in a expression. In Python, most people do that as follows.
def something(x):
y = x + 10
return y * 3
Python 3.8 adds the expression assignment operators :=. So that code can now be written as a lambda.
something = lambda x: do(y := x + 10, y * 3)
Working with impure systems
The console, the file system, the web, etc. are immutable and impure and we need a way to work with them. In some languages like Haskell you have monads, which are wrappers for functions. Clojure allows the use of impure functions. In Python, a multi-paradigm language, we don't need anything, since it's not just functional. Print is an impure function.
Recursion
Your example uses recursion. Recursion uses the call stack, which is a small chunk of memory that is used for calling functions. It can get full and crash. A lot of functional programming languages use something known as lazy evaluation to help with the problem of recursion, but Python doesn't have it. So avoid recursion as much as possible.
Answers
Avoiding state is being immutable, meaning not changing the value of something.
In functional programming, everything is an expression, which is a function. We don't have that in Python though, since it's multi-paradigm.
In a more functional way, your code would be written as follows. Also it's not functional due to using an if statement instead of a ternary expression.
from functools import reduce
factorial = lambda x: reduce(lambda y, z: y*z, range(1,x+1))
This produces a range of 1 to x and multiplies all the values using range.

Why is python's built in multiplication so fast [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
So the other day I was trying something in python, I was trying to write a custom multiplication function in python
def multi(x, y):
z = 0
while y > 0:
z = z + x
y = y - 1
return z
However, when I ran it with extremely large numbers like (1 << 90) and (1 << 45) which is (2 ^ 90) * (2 ^ 45). It took forever to compute.
So I tried looking into different types of multiplication, like the russian peasant multiplication technique, implemented down there, which was extremely fast but not as readable as multi(x, y)
def russian_peasant(x, y):
z = 0
while y > 0:
if y % 2 == 1: z = z + x
x = x << 1
y = y >> 1
return z
What I want you to answer is how do programming languages like python multiply numbers ?
Your multi version runs in O(N) whereas russian_peasant version runs in O(logN), which is far better than O(N).
To realize how fast your russian_peasant version is, check this out
from math import log
print round(log(100000000, 2)) # 27.0
So, the loop has to be executed just 27 times, but your multi version's while loop has to be executed 100000000 times, when y is 100000000.
To answer your other question,
What I want you to answer is how do programming languages like python
multiply numbers ?
Python uses O(N^2) grade school multiplication algorithm for small numbers, but for big numbers it uses Karatsuba algorithm.
Basically multiplication is handled in C code, which can be compiled to machine code and executed faster.
Programming languages like Python use the multiplication instruction provided by your computer's CPU.
In addition, you have to remember that Python is a very high-level programming language, which runs on a virtual machine which itself runs on your computer. As such, it is, inherently, a few order of magnitudes slower than native code. Translating your algorithm to assembly (or even to C) would result in a massive speedup -- although it'd still be slower than the CPU's multiplication operation.
On the plus side, unlike naive assembly/C, Python auto-promotes integers to bignums instead of overflowing when your numbers are bigger than 2**32.
The basic answer to your question is this, multiplication using * is handled through C code. In essence if you write something in pure python its going to be slower than the C implementation, let me give you an example.
The operator.mul function is implemented in C, but a lambda is implemented in Python, we're going to try to find the product of all the numbers in an array using functools.reduce and we are going to use two cases, one using operator.mul and another using a lambda which both do the same thing (on the surface):
from timeit import timeit
setup = """
from functools import reduce
from operator import mul
"""
print(timeit('reduce(mul, range(1, 10))', setup=setup))
print(timeit('reduce(lambda x, y: x * y, range(1, 10))', setup=setup))
Output:
1.48362842561
2.67425475375
operator.mul takes less time, as you can see.
Usually, functional programming involving many computations is best made to take less time using memoization -- the basic idea is that if you feed a true function (something that always evaluates the same result for a given argument) the same thing twice or more, you're wasting time, time that could easily be saved by identifying common calls and storing whatever they evaluate down to into a hash table or other quickly-accessible object. See https://en.wikipedia.org/wiki/Memoization for basic theory. It is well-implemented in Common Lisp.

Efficiency difference between dict.has_key and key in dict in Python [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
'has_key()' or 'in'?
In Python, there're two ways of deciding whether a key is in a dict:
if dict.has_key(key) and if key in dict
Someone tells me that the second one is slower than the first one since the in keyword makes the expression an iteration over the dict, so it will be slower than the has_key alternative, which apparently uses hash to make the decision.
As I highly doubt the difference, since I think Python is smart enough to translate an in keyword before a dict to some hash way, I can't find any formal claim about this.
So is there really any efficiency difference between the two?
Thanks.
Both of these operations do the same thing: examine the hash table implemented in the dict for the key. Neither will iterate the entire dictionary. Keep in mind that for x in dict is different than if x in dict. They both use the in keyword, but are different operations.
The in keyword becomes a call on dict.__contains__, which dict can implement however it likes.
If there is a difference in the timings of these operations, it will be very small, and will have to do with the function call overhead of has_key.
BTW, the general preference is for key in dict as a clearer expression of the intent than dict.has_key(key). Note that speed has nothing to do with the preference. Readability is more important than speed unless you know you are in the critical path.
has_key isn't an alternative. It's deprecated. Don't use it. (It's slower anyhow)
D.has_key is actually slower due to the function call:
>>> D = dict((x, y) for x, y in zip(range(1000000), range(1000000)))
>>> from timeit import Timer
>>> t = Timer("1700 in D", "from __main__ import D")
>>> t.timeit()
0.10631704330444336
>>> t = Timer("D.has_key(1700)", "from __main__ import D")
>>> t.timeit()
0.18113207817077637

Does this function have to use reduce() or is there a more pythonic way?

If I have a value, and a list of additional terms I want multiplied to the value:
n = 10
terms = [1,2,3,4]
Is it possible to use a list comprehension to do something like this:
n *= (term for term in terms) #not working...
Or is the only way:
n *= reduce(lambda x,y: x*y, terms)
This is on Python 2.6.2. Thanks!
reduce is the best way to do this IMO, but you don't have to use a lambda; instead, you can use the * operator directly:
import operator
n *= reduce(operator.mul, terms)
n is now 240. See the docs for the operator module for more info.
Reduce is not the only way. You can also write it as a simple loop:
for term in terms:
n *= term
I think this is much more clear than using reduce, especially when you consider that many Python programmers have never seen reduce and the name does little to convey to people who see it for the first time what it actually does.
Pythonic does not mean write everything as comprehensions or always use a functional style if possible. Python is a multi-paradigm language and writing simple imperative code when appropriate is Pythonic.
Guido van Rossum also doesn't want reduce in Python:
So now reduce(). This is actually the one I've always hated most, because, apart from a few examples involving + or *, almost every time I see a reduce() call with a non-trivial function argument, I need to grab pen and paper to diagram what's actually being fed into that function before I understand what the reduce() is supposed to do. So in my mind, the applicability of reduce() is pretty much limited to associative operators, and in all other cases it's better to write out the accumulation loop explicitly.
There aren't a whole lot of associative operators. (Those are operators X for which (a X b) X c equals a X (b X c).) I think it's just about limited to +, *, &, |, ^, and shortcut and/or. We already have sum(); I'd happily trade reduce() for product(), so that takes care of the two most common uses. [...]
In Python 3 reduce has been moved to the functools module.
Yet another way:
import operator
n = reduce(operator.mul, terms, n)

Most useful list-comprehension construction?

What Python's user-made list-comprehension construction is the most useful?
I have created the following two quantifiers, which I use to do different verification operations:
def every(f, L): return not (False in [f(x) for x in L])
def some(f, L): return True in [f(x) for x in L]
an optimized versions (requres Python 2.5+) was proposed below:
def every(f, L): return all(f(x) for x in L)
def some(f, L): return any(f(x) for x in L)
So, how it works?
"""For all x in [1,4,9] there exists such y from [1,2,3] that x = y**2"""
answer = every([1,4,9], lambda x: some([1,2,3], lambda y: y**2 == x))
Using such operations, you can easily do smart verifications, like:
"""There exists at least one bot in a room which has a life below 30%"""
answer = some(bots_in_this_room, lambda x: x.life < 0.3)
and so on, you can answer even very complicated questions using such quantifiers. Of course, there is no infinite lists in Python (hey, it's not Haskell :) ), but Python's lists comprehensions are very practical.
Do you have your own favourite lists-comprehension constructions?
PS: I wonder, why most people tend not to answer questions but criticize presented examples? The question is about favourite list-comprehension construction actually.
anyand all are part of standard Python from 2.5. There's no need to make your own versions of these. Also the official version of any and all short-circuit the evaluation if possible, giving a performance improvement. Your versions always iterate over the entire list.
If you want a version that accepts a predicate, use something like this that leverages the existing any and all functions:
def anyWithPredicate(predicate, l): return any(predicate(x) for x in l)
def allWithPredicate(predicate, l): return all(predicate(x) for x in l)
I don't particularly see the need for these functions though, as it doesn't really save much typing.
Also, hiding existing standard Python functions with your own functions that have the same name but different behaviour is a bad practice.
There aren't all that many cases where a list comprehension (LC for short) will be substantially more useful than the equivalent generator expression (GE for short, i.e., using round parentheses instead of square brackets, to generate one item at a time rather than "all in bulk at the start").
Sometimes you can get a little extra speed by "investing" the extra memory to hold the list all at once, depending on vagaries of optimization and garbage collection on one or another version of Python, but that hardly amounts to substantial extra usefulness of LC vs GE.
Essentially, to get substantial extra use out of the LC as compared to the GE, you need use cases which intrinsically require "more than one pass" on the sequence. In such cases, a GE would require you to generate the sequence once per pass, while, with an LC, you can generate the sequence once, then perform multiple passes on it (paying the generation cost only once). Multiple generation may also be problematic if the GE / LC are based on an underlying iterator that's not trivially restartable (e.g., a "file" that's actually a Unix pipe).
For example, say you are reading a non-empty open text file f which has a bunch of (textual representations of) numbers separated by whitespace (including newlines here and there, empty lines, etc). You could transform it into a sequence of numbers with either a GE:
G = (float(s) for line in f for s in line.split())
or a LC:
L = [float(s) for line in f for s in line.split()]
Which one is better? Depends on what you're doing with it (i.e, the use case!). If all you want is, say, the sum, sum(G) and sum(L) will do just as well. If you want the average, sum(L)/len(L) is fine for the list, but won't work for the generator -- given the difficulty in "restarting f", to avoid an intermediate list you'll have to do something like:
tot = 0.0
for i, x in enumerate(G): tot += x
return tot/(i+1)
nowhere as snappy, fast, concise and elegant as return sum(L)/len(L).
Remember that sorted(G) does return a list (inevitably), so L.sort() (which is in-place) is the rough equivalent in this case -- sorted(L) would be supererogatory (as now you have two lists). So when sorting is needed a generator may often be preferred simply due to conciseness.
All in all, since L is identically equivalent to list(G), it's hard to get very excited about the ability to express it via punctuation (square brackets instead of round parentheses) instead of a single, short, pronounceable and obvious word like list;-). And that's all a LC is -- punctuation-based syntax shortcut for list(some_genexp)...!
This solution shadows builtins which is generally a bad idea. However the usage feels fairly pythonic, and it preserves the original functionality.
Note there are several ways to potentially optimize this based on testing, including, moving the imports out into the module level and changing f's default into None and testing for it instead of using a default lambda as I did.
def any(l, f=lambda x: x):
from __builtin__ import any as _any
return _any(f(x) for x in l)
def all(l, f=lambda x: x):
from __builtin__ import all as _all
return _all(f(x) for x in l)
Just putting that out there for consideration and to see what people think of doing something so potentially dirty.
For your information, the documentation for module itertools in python 3.x lists some pretty nice generator functions.

Categories