I'm looking for an efficient way to compute the result of a (complex) mathematical function.
Right now it looks something like this:

import concurrent.futures

def f(x):
    return x**2

def g(x):
    if not x:
        return 1
    return f(x) * 5

def h(x):
    return g(x)

# params is the iterable of input values (defined elsewhere)
with concurrent.futures.ProcessPoolExecutor() as executor:
    print(list(executor.map(h, params)))
Since every function call is costly in Python, the code should already run faster if f(x) were merged into g(x). Unfortunately, in that case the return line of g(x) becomes very long. Furthermore, there are currently 6 functions defined in total, so the complete formula occupies several lines.
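Merged by hand, it would look roughly like this (just to illustrate what I mean):

def g(x):
    if not x:
        return 1
    return x**2 * 5  # f(x) == x**2 folded directly into g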
So, what's a clever way to compute the result of a physics formula?
EDIT:
Thank you so far, but my question is not really about this specific snippet of code; it is more about how to implement physics formulas in Python in general. For example, one could also define the expression as a string and evaluate it using eval(), but that is obviously slower.
To be more specific: I have a potential and want to evaluate it in parallel. Therefore I call my version of "h(x)" using the map function of a ProcessPoolExecutor (with a different value each time). But is it best practice to define the function as one that calls other functions or uses variables? Is there a more efficient way?
def formula(x):
    if not x:
        return 1
    return x * x * 5
I don't think the line is in danger of being problematically long, but if you're concerned about the length of the return ... line, you could use intermediate values, e.g.:
def g(x):
    if x == 0:
        return 1
    x2 = x ** 2
    return x2 * 5
As an aside, it is incorrect to use the is operator in this context, as in x is 0. It does not check for numerical equality; that is what == does. The is operator checks that the two operands refer to exactly the same object in memory, which happens to behave like == here only because the Python interpreter caches and reuses small integer objects. Relying on it can lead to confusing errors, for example:
a = 1234
b = 1233
a == (b + 1) # True
a is (b + 1) # False
In practice, is is mainly used only to check if a value is None.
Related
I have a function that computes a large expression based on Sympy symbols passed as arguments. A very simplified version for readability is:
def expr(a):
    return (1 + a) / a
This generally works, but if the passed argument is the infinity symbol then the result becomes nan, whereas I'd prefer the expression to be evaluated as a limit and return 1 (in this simplified case).
Since in my actual code there are many arguments which could be infinite and the expression is quite large, I'd rather avoid an if-else sequence covering every possible combination of infinite-valued arguments.
I've tried using unevaluated expressions but that doesn't seem to work either. Is there a good workaround for this?
You could write the expression as a limit:
from sympy import oo, limit
from sympy.abc import x

def expr(a):
    return limit((1 + x) / x, x, a)

print(expr(oo))  # 1
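If several arguments can be infinite, one possible approach (a sketch, with a made-up two-argument expression expr2) is to take the limits successively, one variable at a time; note that iterated limits can depend on the order in which they are taken:

from sympy import oo, limit, symbols

x, y = symbols('x y')

def expr2(a, b):
    e = (1 + x) / x + (1 + y) / y  # hypothetical two-variable expression
    e = limit(e, x, a)             # limit in x first...
    e = limit(e, y, b)             # ...then in y
    return e

print(expr2(oo, oo))  # 2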
In Python, suppose I have a function

f(x) = (g(x) + 1) * g(x)

where g(x) is defined beforehand and takes time to calculate. Something like this:

def g(x):
    return value  # placeholder for the expensive computation

def f(x):
    return (g(x) + 1) * (g(x))
When I calculate f(x), will Python calculate g(x) twice when substituting g(x) into the equation?
If it does calculate it twice, how do people usually handle this in Python?
Depending on the function g(x), yes, it will be calculated again each time it is called. A memoized function (Wikipedia) will only need to calculate the value once for a given value of x. For example:
cache = {}

def g(x):
    if x in cache:
        return cache[x]
    result = ...  # calculate g(x) here
    cache[x] = result
    return result
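In current Python you could also let the standard library do the caching with functools.lru_cache; a minimal sketch, with a placeholder body:

from functools import lru_cache

@lru_cache(maxsize=None)  # caches the result for every distinct x
def g(x):
    return x ** 2  # placeholder for the real, expensive computation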
It's easy to check - print something in g and see how many times it gets printed.
Anyway, yes. It will call g twice.
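For instance, a throwaway sketch with a placeholder g:

def g(x):
    print("g called with", x)  # one line printed per call
    return x * x               # placeholder body

def f(x):
    return (g(x) + 1) * (g(x))

f(3)  # prints two lines, confirming that g runs twice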
The best way I know to handle it is to store the returned value in a local variable and then use that instead:
def f(x):
    g_result = g(x)
    return (g_result + 1) * g_result
will Python calculate g(x) two times when substituting g(x) into the equation

    return (g(x) + 1) * (g(x))
            ^^^^         ^^^^
Yes. Since the function is called twice, it will be calculated twice.
how do people usually handle it in Python
You call the function and store the value to be used later.
def f(x):
    value_g = g(x)
    return (value_g + 1) * value_g
I feel that the first part of your question has been well answered, but as for:
how do people usually handle it in Python?
If the output of the function g(x) is needed twice, it makes sense (at least to me) to just store it in a variable. By the nature of how a function should work (without global influences such as time), the same input should always give the same output.
So let's say this function takes 1 minute to compute. If we wanted to use the output twice, it wouldn't make sense to call it twice, as that would now take twice as long: 2 minutes. As mentioned above, you could use memoization or simply store the output of g(x) in a variable.
E.g.
def f(x):
    gx = g(x)
    return (gx + 1) * gx
Let us say I have an equation
x + 2*cos(x) = 0
and I want to solve it. Then I can program the following:
from numpy import cos
from scipy.optimize import fsolve

def func1(x):
    out = x + 2 * cos(x)
    return out

Solution = fsolve(func1, StartValue)
StartValue can have an arbitrary value in this example. So far, so good! I am programming a simulation that creates a nonlinear system of equations, which I want to solve with fsolve().
The challenge is that (!) the size of the nonlinear system of equations is not known before run time (!). This means that I can have, for example,
x + 2*cos(x) = 0
in the same way as I can have
2*x + 4*y = 0
18*y -18 = 0
In order to solve the last-mentioned system of equations (which in my program will generally be a nonlinear one) I have found the following solution:
def func2(x):
    out = [2 * x[0] + 4 * x[1]]
    out.append(18 * x[1] - 18)
    return out

Solution = fsolve(func2, [1, 1])
This works quite well, too. But there is a specific reason why I cannot use the solution shown for func2(x): it makes my program very slow!
The function fsolve() calls func2(x) iteratively, several times, until it has found the solution [-2, 1]. But my program will handle a system of equations with several hundred to a thousand rows. That would mean that in every iteration step all those thousands of rows are appended, as shown in func2(x). Therefore I am looking for a solution that ONCE creates the system of equations as a function func3(x), and afterwards fsolve() only calls the ready-built func3(x). Here is a PSEUDO CODE example:
func3 = lambda x: 2*x[0] + 4*x[1]
func3.append(lambda x: 18*x[1] - 18)
Solution = fsolve(func3, [1, 1])
Unfortunately, functions cannot be appended as I show above in my PSEUDO CODE. Therefore my question: how can I dynamically build my function func3 and then pass the (!) ONCE READY BUILT (!) function func3 to fsolve()?
Thank you very much in advance
Solution: "outsource the command into a string concetanation"
One can build a string outside the function as follows:
StringCommand = "f = [2*x[0] + 4*x[1], 18*x[1] - 18]"
Afterwards, this StringCommand is used as an input parameter of the function to be called, as follows:

def func(x, StringCommand):
    namespace = {"x": x}            # exec() needs a namespace to write f into (Python 3)
    exec(StringCommand, namespace)
    return namespace["f"]
Thus, the StringCommand is executed via exec(). Finally, one just needs to call the function within fsolve() as follows:

fsolve(func, [1, 1], args=(StringCommand,))

That's it. Done this way, the StringCommand is built once, outside the function func(), and therefore much time is saved when fsolve() does its iterations with func(). Note that [1, 1] are the start values for the iteration!
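As an aside, a sketch of an alternative that avoids exec() entirely: build the equations once, as a list of closures, and wrap them in a single function that fsolve() can call:

from scipy.optimize import fsolve

def make_system(equations):
    # equations: list of callables, each mapping the full vector x to one residual
    def system(x):
        return [eq(x) for eq in equations]
    return system

equations = [lambda x: 2 * x[0] + 4 * x[1],
             lambda x: 18 * x[1] - 18]

func3 = make_system(equations)    # built ONCE, before the solver runs
Solution = fsolve(func3, [1, 1])  # -> array([-2.,  1.])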
For some bizarre reason, I am struggling to get my head around iterators/generators as used in Python (I write/use them with no problem in C++ - but somehow I can't seem to grok how to write one using Python).
I have a mathematical function of the form:
f(a,b) = ( v1(a) - v1(b) ) / ( v2(a) - v2(b) )
Where v1 and v2 are equal length 1D vectors.
I want to write a function (actually, a generator) that generates the output of f() as defined above.
Can anyone help?
[[Edit]]
My notation may have been confusing; I hope to clarify it here. The function described above returns a set of values, with the argument b taking on each value in the interval [a, b].
So for example if we call f(1,5), the function will return the following values (not functions -in case my clarification below causes further confusion):
f(1,1)
f(1,2)
f(1,3)
f(1,4)
f(1,5)
You can use a generator expression:

def f(a, b):
    return ((v1[a] - v1[i]) / (v2[a] - v2[i]) for i in range(a, b + 1))
Or a generator function:

def f(a, b):
    for i in range(a, b + 1):
        yield (v1[a] - v1[i]) / (v2[a] - v2[i])
The generator may look like this, since there is no iteration (as kindall correctly noted):
def f(a, b):
    yield (v1[a] - v1[b]) / (v2[a] - v2[b])  # be careful about division!
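To see the laziness in action, a quick sketch with made-up vectors:

v1 = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]  # hypothetical sample data
v2 = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]

gen = f(1, 5)     # creates the generator object; the body has not run yet
print(next(gen))  # the body runs now: (1.0 - 25.0) / (2.0 - 10.0) == 3.0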
A couple of notes:
there is nothing to iterate over here, and generators are generally used in iteration;
be careful about division: in Python 2.x, a / b returns an integer if both a and b are integers (so 4 / 3 == 1) - you can avoid this by using floats or from __future__ import division.
I'm trying to use Python and NumPy/SciPy to implement an image processing algorithm. The profiler tells me that a lot of time is being spent in the following function (called often), which computes the sum of squared differences between two images:
def ssd(A, B):
    s = 0
    for i in range(3):
        s += sum(pow(A[:,:,i] - B[:,:,i], 2))
    return s
How can I speed this up? Thanks.
Just
s = numpy.sum((A[:,:,0:3]-B[:,:,0:3])**2)
(which I expect is likely just sum((A-B)**2) if the shape is always (m, n, 3))
You can also use the sum method: ((A-B)**2).sum()
Right?
Just to mention that one can also use np.dot:

import numpy as np

def ssd(A, B):
    dif = A.ravel() - B.ravel()
    return np.dot(dif, dif)
This might be a bit faster and possibly more accurate than alternatives using np.sum and **2, but doesn't work if you want to compute ssd along a specified axis. In that case, there might be a magical subscript formula using np.einsum.
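For example, a hedged sketch of what such an einsum could look like when summing over only the channel axis of (m, n, 3) images (ssd_per_pixel is a made-up name):

import numpy as np

def ssd_per_pixel(A, B):
    d = A - B
    # 'ijk,ijk->ij' multiplies elementwise and sums over the last axis only,
    # leaving one squared-difference total per pixel
    return np.einsum('ijk,ijk->ij', d, d)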
I am confused why you are taking i in range(3). Is that supposed to be the whole array, or just part?
Overall, you can replace most of this with operations defined in numpy:
def ssd(A, B):
    squares = (A[:,:,:3] - B[:,:,:3]) ** 2
    return numpy.sum(squares)
This way you can do one operation instead of three, and using numpy.sum may be able to optimize the addition better than the builtin sum.
Further to Ritsaert Hornstra's answer that got 2 negative marks (admittedly I didn't see it in its original form...):
This is actually true.
For a large number of iterations it can often take twice as long to use the ** operator or the pow(x, y) function as to just manually multiply the pairs together. If necessary, use the math.fabs() function if it's throwing out NaNs (which it sometimes does, especially when using int16s etc.), and it still takes only approximately half the time of the two functions given.
Not that important to the original question, I know, but definitely worth knowing.
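A rough way to check this claim on your own machine is a quick timeit comparison (the array size here is arbitrary):

import timeit

setup = "import numpy as np; a = np.random.rand(1000, 1000)"
print(timeit.timeit("a ** 2", setup=setup, number=100))  # power operator
print(timeit.timeit("a * a", setup=setup, number=100))   # manual multiplication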
I do not know whether the pow() function with power 2 will be fast. Try:

def ssd(A, B):
    s = 0
    for i in range(3):
        diff = A[:,:,i] - B[:,:,i]
        s += sum(diff * diff)  # manual multiplication instead of pow(..., 2)
    return s
You can try this one:
dist_sq = np.sum((A[:, np.newaxis, :] - B[np.newaxis, :, :]) ** 2, axis=-1)
More details can be found here (the 'k-Nearest Neighbors' example):
https://jakevdp.github.io/PythonDataScienceHandbook/02.08-sorting.html
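For context, this line computes all pairwise squared distances between two point sets; a sketch of the shapes involved, with made-up arrays:

import numpy as np

A = np.random.rand(4, 3)  # hypothetical: 4 points in 3-D
B = np.random.rand(5, 3)  # hypothetical: 5 points in 3-D

# (4, 1, 3) - (1, 5, 3) broadcasts to (4, 5, 3); summing over axis -1 gives (4, 5)
dist_sq = np.sum((A[:, np.newaxis, :] - B[np.newaxis, :, :]) ** 2, axis=-1)
print(dist_sq.shape)  # (4, 5)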
In Ruby you can achieve this in the following way:

def diff_btw_sum_of_squares_and_square_of_sum(from = 1, to = 100)  # defaults: 1..100
  ((from..to).inject(:+) ** 2) - (from..to).map { |num| num ** 2 }.inject(:+)
end

diff_btw_sum_of_squares_and_square_of_sum  # call the method above