I'm looking for some help understanding best practices regarding dictionaries in Python.
I have an example below:
def convert_to_celsius(temp, source):
    conversion_dict = {
        'kelvin': temp - 273.15,
        'romer': (temp - 7.5) * 40 / 21
    }
    return conversion_dict[source]
def convert_to_celsius_lambda(temp, source):
    conversion_dict = {
        'kelvin': lambda x: x - 273.15,
        'romer': lambda x: (x - 7.5) * 40 / 21
    }
    return conversion_dict[source](temp)
Obviously, the two functions achieve the same goal, but via different means. Could someone help me understand the subtle difference between the two, and what the 'best' way to go about this would be?
If you have both dictionaries being created inside the function, then the former will be more efficient: although the former performs two calculations when only one is needed, there is more overhead in the latter for creating the lambdas each time it's called:
>>> import timeit
>>> setup = "from __main__ import convert_to_celsius, convert_to_celsius_lambda, convert_to_celsius_lambda_once"
>>> timeit.timeit("convert_to_celsius(100, 'kelvin')", setup=setup)
0.5716437913429102
>>> timeit.timeit("convert_to_celsius_lambda(100, 'kelvin')", setup=setup)
0.6484164544288618
However, if you move the dictionary of lambdas outside the function:
CONVERSION_DICT = {
    'kelvin': lambda x: x - 273.15,
    'romer': lambda x: (x - 7.5) * 40 / 21
}

def convert_to_celsius_lambda_once(temp, source):
    return CONVERSION_DICT[source](temp)
then the latter is more efficient, as the lambda objects are only created once, and the function only does the necessary calculation on each call:
>>> timeit.timeit("convert_to_celsius_lambda_once(100, 'kelvin')", setup=setup)
0.3904035060131186
Note that this will only be a benefit where the function is being called a lot (in this case, 1,000,000 times), so that the overhead of creating the two lambda function objects is less than the time wasted in calculating two results when only one is needed.
The dictionary is totally pointless, since you need to re-create it on each call but all you ever do is a single look-up. Just use an if:

def convert_to_celsius(temp, source):
    if source == "kelvin":
        return temp - 273.15
    elif source == "romer":
        return (temp - 7.5) * 40 / 21
    raise KeyError("unknown temperature source '%s'" % source)
Even though both achieve the same thing, the first version is more readable, and in this case faster.
In your first example, both dictionary values are simple arithmetic expressions that are evaluated as soon as convert_to_celsius is called, even though only one result is needed.
In the second example, only the required temperature is calculated.
If the conversions involved expensive calculations, it would probably make sense to use functions as the values instead, but for this particular example it's not required.
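To make that concrete, here is a minimal sketch (expensive_kelvin is a made-up name, and time.sleep stands in for genuinely costly work): with callables as values, only the conversion you actually look up runs.

import time

def expensive_kelvin(temp):
    time.sleep(1)  # stand-in for a genuinely costly computation
    return temp - 273.15

def convert_to_celsius(temp, source):
    conversion_dict = {
        'kelvin': expensive_kelvin,              # stored, not called
        'romer': lambda x: (x - 7.5) * 40 / 21,
    }
    return conversion_dict[source](temp)         # only this one runs

print(convert_to_celsius(25, 'romer'))  # returns immediately; the sleep never happens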
As others have pointed out, neither of your options is ideal. The first one does both calculations every time and has an unnecessary dict. The second one has to create the lambdas every time through. If this example is the goal, then I agree with unwind: just use an if statement. If the goal is to learn something that can be expanded to other uses, I like this approach:
convert_to_celsius = {
    'kelvin': lambda temp: temp - 273.15,
    'romer': lambda temp: (temp - 7.5) * 40 / 21,
}

newtemp = convert_to_celsius[source](temp)
Your calculation definitions are all stored together, and your function call is uncluttered and meaningful.
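If you'd rather not leak a bare KeyError to callers, a thin wrapper over the same dictionary (to_celsius is just an illustrative name) keeps the call site clean:

def to_celsius(temp, source):
    try:
        return convert_to_celsius[source](temp)
    except KeyError:
        raise ValueError("unknown temperature source %r" % source)

newtemp = to_celsius(300.0, 'kelvin')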
sympy seems to evaluate expressions by default, which is problematic in scenarios where automatic evaluation negatively impacts numerical stability. I need a way to control what gets evaluated to preserve that stability.
The only official mechanism I'm aware of is the UnevaluatedExpr class, but this solution is problematic for my purpose. Users of my code are not supposed to be burdened by any numerical stability considerations. They simply want to enter an expression and the code needs to do all the rest. Making them analyze the numerical stability of their own expressions is not an option. It needs to be done automatically.
First I tried to gain control of sympify() by monkeypatching it, as it seems to be the main culprit behind most calls that lead to unwanted evaluation, but I only came as far as catching all the calls, without being able to really control them the way I wanted. I bumped against so many walls there that I wouldn't even know where to start.
Modifying sympy itself, as you can probably imagine, is not an option either as I can't possibly require users to make some exotic patches of their local sympy installations.
Next I discovered that it's possible to say
with evaluate(False):
    doSomeStuffToExpression(expr)
This seems to violently shove evaluate=False down the throat of sympy no matter what.
However that means it radically deactivates all evaluation and does not allow any fine control.
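For illustration, a minimal demo of that global switch (evaluate lives in sympy.core.parameters in recent sympy versions):

import sympy as sp
from sympy.core.parameters import evaluate

x = sp.Symbol('x')
with evaluate(False):
    e = x + x   # stays as x + x instead of collapsing to 2*x
print(e)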
Specifically, I want to deactivate evaluation when there is an Add inside a sympy.exp.
So the third attempt was to modify the expression tree. Basically developing a method that takes the expression, traverses it and automatically wraps args with UnevaluatedExpr where needed (remember: I can't bother the user with doing that manually)
So I wrote the following code to test the new approach:
from sympy.core.expr import UnevaluatedExpr
from sympy.core.symbol import Symbol
import sympy as sp
from sympy.core.numbers import Float

x, z = sp.symbols('x z')

#expr = (x + 2.*x)/4. + sp.exp((x+sp.UnevaluatedExpr(32.))/6.)
expr = sp.sympify('(x + 2.*x)/4. + exp((x+32.)/6.)', evaluate=False)

expr_ = expr.subs(x, z)
print(expr)
print(expr_)
print('///////////\n')

def prep(expr, exp_depth=0):
    # once we are inside UnevaluatedExpr, we need to continue to traverse
    # down to the Symbol and also wrap it with UnevaluatedExpr
    if isinstance(expr, UnevaluatedExpr):
        for arg in expr.args:
            newargs = []
            for arg_inside in arg.args:
                if isinstance(arg_inside, Symbol) or isinstance(arg_inside, Float):
                    newargs.append(UnevaluatedExpr(arg_inside))
                else:
                    newargs.append(arg_inside)
            arg._args = tuple(newargs)
            for arg_inside in arg.args:
                prep(arg_inside, exp_depth=exp_depth + 1)
        return

    original_args = expr.args
    # if args empty
    if not original_args:
        return

    # check if we just entered exp
    is_exp = (expr.func == sp.exp)

    print('\n-----')
    print('expression\t\t-->', expr)
    print('func || args\t\t-->', expr.func, ' || ', original_args)
    print('is it exp right now?\t-->', is_exp)
    print('inside exp?\t-->', exp_depth > 0)

    # if we just received exp or if we are inside exp
    if is_exp or exp_depth > 0:
        newargs = []
        for arg in original_args:
            if isinstance(arg, sp.Add):
                newargs.append(UnevaluatedExpr(arg))
            else:
                newargs.append(arg)
        expr._args = tuple(newargs)
        for arg in expr.args:
            prep(arg, exp_depth=exp_depth + 1)
    else:
        for arg in original_args:
            prep(arg, exp_depth)

prep(expr)

print('///////////\n')
print(expr)

substituted = expr.subs(x, z)
print("substitution after prep still does not work:\n", substituted)

wewantthis = expr.subs(x, UnevaluatedExpr(z))
print("we want:\n", wewantthis)
print('///////////\n')
However the output was disappointing, as subs() triggers the dreaded evaluation again, despite wrapping args in UnevaluatedExpr where needed. Or let's say where I understood wrapping would be needed.
For some reason subs() completely ignores my changes.
So the question is: is there any hope in this last approach (maybe I still missed something when traversing the tree)? And if there is no hope in my approach, how else should I achieve the goal of disabling evaluation of a specific Symbol when encountering an Add inside a sympy.exp (the exponential function)?
PS:
I should probably also mention that for reasons that seem puzzling, the following works (but as I mentioned it's a manual solution that I don't desire)
expr = (x + 2.*x)/4. + sp.exp((x+sp.UnevaluatedExpr(32.))/6.)
expr_ = expr.subs(x, z)
print(expr)
print(expr_)
Here we successfully prevented the evaluation of the Add inside sp.exp
Output:
0.75*x + exp(0.166666666666667*(x + 32.0))
0.75*z + exp(0.166666666666667*(z + 32.0))
Edit 0:
The solution should permit the usage of floats. For example some of the values may describe physical properties, measured beyond the accuracy of an integer. I need to be able to allow those.
Substituting Floats with Symbols is also problematic as it substantially complicates handling of the expressions or the usage of those expressions at a later time
I'm not sure but I think that the problem you are having is to do with automatic distribution of a Number over an Add which is controlled by the distribute context manager:
In [326]: e1 = 2*(x + 1)

In [327]: e1
Out[327]: 2⋅x + 2

In [328]: from sympy.core.parameters import distribute

In [329]: with distribute(False):
     ...:     e2 = 2*(x + 1)
     ...:

In [330]: e2
Out[330]: 2⋅(x + 1)
The automatic distribution behaviour is something that would be good to change in sympy. It's just not easy to change because it is such a low-level operation and it has been this way for a long time (it would break a lot of people's code).
Other parts of the evaluation that you see are specific to the fact that you are using floats and would not happen for Rational or for a symbol e.g.:
In [337]: exp(2*(x + 1))
Out[337]:
 2⋅x + 2
ℯ

In [338]: exp(2.0*(x + 1))
Out[338]:
                  2.0⋅x
7.38905609893065⋅ℯ

In [339]: exp(y*(x + 1))
Out[339]:
 y⋅(x + 1)
ℯ
You could convert the floats to Rational with nsimplify to avoid that e.g.:
In [340]: parse_expr('exp(2.0*(x + 1))', evaluate=False)
Out[340]:
 2.0⋅(x + 1)
ℯ

In [341]: parse_expr('exp(2.0*(x + 1))', evaluate=False).subs(x, z)
Out[341]:
                  2.0⋅z
7.38905609893065⋅ℯ

In [342]: nsimplify(parse_expr('exp(2.0*(x + 1))', evaluate=False))
Out[342]:
 2⋅x + 2
ℯ

In [343]: nsimplify(parse_expr('exp(2.0*(x + 1))', evaluate=False)).subs(x, z)
Out[343]:
 2⋅z + 2
ℯ
Another possibility is to use symbols and delay substitution of any values until numerical evaluation. This is the way to get the most accurate result from evalf:
In [344]: exp(y*(z + 1)).evalf(subs={y:1, z:2})
Out[344]: 20.0855369231877
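Applied to your example, that last suggestion might look like the following sketch, where c is a stand-in symbol for the measured value 32.0 and only becomes a number inside evalf:

import sympy as sp

x, z, c = sp.symbols('x z c')
expr = (x + 2*x)/4 + sp.exp((x + c)/6)     # exact rationals, no floats yet
expr = expr.subs(x, z)                     # structural substitution stays safe
print(expr.evalf(subs={z: 1.0, c: 32.0}))  # floats enter only at evaluation time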
SciPy can solve ODEs with scipy.integrate.odeint or other routines, but it only gives the result after the equations have been solved completely. However, if the ODE function is very complex, the program will take a lot of time (one or two days) to give the whole result. So how can I monitor the steps as it solves the equations (print out intermediate results before the equations have been solved completely)?
When I was googling for an answer, I couldn't find a satisfactory one. So I made a simple gist with a proof-of-concept solution using the tqdm project. Hope that helps you.
Edit: Moderators asked me to give an explanation of what is going on in the link above.
First of all, I am using scipy's newer solver interface (solve_ivp), but you could adapt it back to odeint. Say you want to integrate from time T0 to T1 and you want to show progress in 0.1% increments. You can modify your ode function to take two extra parameters, a pbar (progress bar) and a state (current state of integration). Like so:
import time

def fun(t, y, omega, pbar, state):
    # state is a list containing the last updated time and the time step:
    # state = [last_t, dt]
    # I used a list because its values can be carried between function
    # calls throughout the ODE integration
    last_t, dt = state

    # let's subdivide t_span into 1000 parts and call update(n)
    # where n = (t - last_t) / dt
    time.sleep(0.1)  # stands in for an expensive right-hand side
    n = int((t - last_t) / dt)
    pbar.update(n)

    # we need this to take into account that n is a rounded number
    state[0] = last_t + dt * n

    # YOUR CODE HERE
    dydt = 1j * y * omega
    return dydt
This is necessary because the function itself must know where it is located, but scipy's odeint doesn't really give this context to the function. Then, you can integrate fun with the following code:
import numpy as np
from scipy.integrate import solve_ivp
from tqdm import tqdm

T0 = 0
T1 = 1
t_span = (T0, T1)
omega = 20
y0 = np.array([1], dtype=complex)  # plain complex: np.complex is deprecated
t_eval = np.arange(*t_span, 0.25/omega)

with tqdm(total=1000, unit="‰") as pbar:
    sol = solve_ivp(
        fun,
        t_span,
        y0,
        t_eval=t_eval,
        args=[omega, pbar, [T0, (T1 - T0) / 1000]],
    )
Note that anything mutable (like a list) in the args is instantiated once and can be changed from within the function. I recommend doing this rather than using a global variable.
This will show a progress bar that looks like this:
100%|█████████▉| 999/1000 [00:13<00:00, 71.69‰/s]
You could split the integration domain and integrate the segments, taking the last value of the previous as initial condition of the next segment. In-between, print out whatever you want. Use numpy.concatenate to assemble the pieces if necessary.
In a standard example of a 3-body solar system simulation, replacing the code
u0 = solsys.getState0();
t = np.arange(0, 100*365.242*day, 0.5*day);
%timeit u_res = odeint(lambda u,t: solsys.getDerivs(u), u0, t, atol = 1e11*1e-8, rtol = 1e-9)
output: 1 loop, best of 3: 5.53 s per loop
with a progress-reporting code
def progressive(t, N):
    nk = [int(n + 0.5) for n in np.linspace(0, len(t), N + 1)]
    u0 = solsys.getState0()
    u_seg = [np.array([u0])]
    for k in range(N):
        # integrate this segment, re-using the last point of the previous
        # one as initial condition, then drop that duplicated first row
        t_seg = t[max(nk[k] - 1, 0):nk[k+1]]
        u_seg.append(odeint(lambda u, t: solsys.getDerivs(u), u0, t_seg,
                            atol=1e11*1e-8, rtol=1e-9)[1:])
        u0 = u_seg[-1][-1]  # last state of this segment seeds the next one
        print(t[nk[k]] / day)
        for b in solsys.bodies:
            print("%10s %s" % (b.name, b.x))
    return np.concatenate(u_seg)
%timeit u_res = progressive(t,20)
output: 1 loop, best of 3: 5.96 s per loop
shows only a slight 8% overhead for console printing. With a more substantive ODE function, the fraction of the reporting overhead will reduce significantly.
That said, python, at least with its standard packages, is not the tool for industrial-scale number-crunching. Always use compiled versions with strong typing of variables to reduce interpretative overhead as much as possible.
Use some heavily developed and tested package like Sundials or the julia-lang framework differentialequations.jl, directly coding the ODE function in an appropriate compiled language. Use higher-order methods, which allow larger step sizes and thus fewer steps. Test whether implicit or exponential/Rosenbrock methods further reduce the number of steps or ODE function evaluations per fixed interval. The difference can be a factor of 10 to 100 in speedup.
Use a python wrapper of the above with some acceleration-friendly implementation of your ODE function.
Use the quasi-source-translating tool JiTCODE to translate your python ODE function to a spaghetti list of C instructions, which then gives a compiled function that can be (almost) directly called from the compiled FORTRAN kernel of odeint.
Simple and Clear.
If you want to integrate an ODE from T0 to T1: in the last line of your ODE function, just before the return, you can add print((t/T1)*100, end='') and then call sys.stdout.flush() so that the printing stays on the same line.
Here is an example; my integration interval is [0, 0.2]:
    # ... the end of the ODE function (Ap, Ap2, L, x, beta, Qgap and u
    # come from the surrounding model code)
    ddt[-2] = (beta/(Ap2*(L - x)))*(-Qgap + Ap*u)
    ddt[-1] = (beta/(Ap2*(L + x)))*(Qgap - Ap*u)
    print("\rCompletion percentage " + str(format(((t/0.2)*100), ".4f")), end='')
    sys.stdout.flush()
    return ddt
It slows the solving process a bit, by fractions of a second, but it serves the purpose perfectly without creating new functions.
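For reference, a self-contained toy version of the same trick (the ODE here is a made-up placeholder; only the print and flush lines matter):

import sys
import numpy as np
from scipy.integrate import odeint

T1 = 0.2  # end of the integration interval

def rhs(y, t):
    dydt = -y  # placeholder ODE
    print("\rCompletion percentage " + format((t / T1) * 100, ".4f"), end='')
    sys.stdout.flush()
    return dydt

sol = odeint(rhs, [1.0], np.linspace(0.0, T1, 101))
print()  # move past the progress line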
In Scala, since it is a functional programming language, I can sequentially iterate a function from a starting value to create an array of [f(initial), f(f(initial)), f(f(f(initial))), ...].
For example, if I want to predict the future temperature based on the current temperature, I can do something like this in Python:
import random as rnd
def estimateTemp(previousTemp):
    # function to estimate the temperature; for simplicity assume it is as follows:
    return previousTemp * rnd.uniform(0.8, 1.2) + rnd.uniform(-1.0, 1.0)

Temperature = [0.0 for i in range(100)]
for i in range(1, 100):
    Temperature[i] = estimateTemp(Temperature[i-1])
The problem with the previous code is that it uses a for loop and requires a predefined array for the temperature; in many languages you can replace the for loop with an iterator. For example, in Scala you can easily do the previous example by using the iterate method to create a list:
val Temperature = List.iterate(0.0,100)( n =>
(n * (scala.util.Random.nextDouble()*0.4+0.8)) +
(scala.util.Random.nextDouble()*2-1)
)
Such an implementation is easy to follow and clearly written.
Python has implemented the itertools module to imitate some functional programming languages. Are there any methods in the itertools module which imitate the Scala iterate method?
You could turn your function into an infinite generator and take an appropriate slice:
import random as rnd
from itertools import islice
def estimateTemp(startTemp):
    while 1:
        yield startTemp
        startTemp = startTemp * rnd.uniform(0.8, 1.2) + rnd.uniform(-1.0, 1.0)
temperature = list(islice(estimateTemp(0.0), 0, 100))
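If you want the Scala shape more literally, the generator can be factored into a reusable iterate helper (an illustrative name, not a stdlib function), used with the original pure estimateTemp from the question:

from itertools import islice

def iterate(f, x):
    # yields x, f(x), f(f(x)), ... indefinitely
    while True:
        yield x
        x = f(x)

temperature = list(islice(iterate(estimateTemp, 0.0), 100))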
An equivalent program can be produced by using itertools.accumulate:

from itertools import accumulate

temperature = list(accumulate(range(0, 100), lambda x, y: estimateTemp(x)))
So here we have an accumulator x that is updated on each step; the y parameter (the next element of the iterable) is ignored, and the range is just a way to iterate 100 times. Note that this uses the original, non-generator estimateTemp from the question.
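On Python 3.8+, accumulate also takes an initial keyword, which makes the intent a bit clearer than smuggling the seed in as the first element of the range:

from itertools import accumulate, repeat

# 0.0 is the seed; the 99 dummy elements just drive 99 updates
temperature = list(accumulate(repeat(None, 99),
                              lambda x, _: estimateTemp(x),
                              initial=0.0))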
Unfortunately, itertools does not have this functionality built-in. Haskell and Scala both have this function, and it bothered me too. An itertools wrapper called Alakazam that I am developing has some additional helper functions, including the aforementioned iterate function.
Runnable example using Alakazam:
import random as rnd
import alakazam as zz
def estimateTemp(previousTemp):
    return previousTemp * rnd.uniform(0.8, 1.2) + rnd.uniform(-1.0, 1.0)
Temperature = zz.iterate(estimateTemp, 0.0).take(100).list()
print(Temperature)
To preface this question, I understand that it could be done better. But this is a question in a class of mine and I must approach it this way. We cannot use any built in functions or packages.
I need to write a function to approximate the numerical value of the second derivative of a given function using finite differences. The formula we are using is the central difference

f''(x) ≈ (f(x + h) - 2 f(x) + f(x - h)) / h^2

(I lost the login info to my old account, so pardon my lack of points and not being able to include images.)
My question is this:
I don't understand how to make the python function accept the input function it is to be differentiating. If someone puts in the input 2nd_deriv(2x**2 + 4, 6), I don't understand how to evaluate 2x^2 at 6.
If this is unclear, let me know and I can try again to describe. Python is new to me so I am just getting my feet wet.
Thanks
you can pass the function as any other "variable":
def f(x):
    return 2*x*x + 4

def d2(fn, x0, h):
    # central finite difference approximation of the second derivative
    return (fn(x0 + h) - 2*fn(x0) + fn(x0 - h)) / (h*h)

print(d2(f, 6, 0.1))
you can't pass a literal expression, you need a function (or a lambda).
def d2(f, x0, h=1e-5):
    # h small, but not so small that floating-point cancellation dominates
    func = f
    if isinstance(f, str):
        # quite insecure, use only with controlled input
        func = eval("lambda x: %s" % (f,))
    # central finite difference for the second derivative
    return (func(x0 + h) - 2*func(x0) + func(x0 - h)) / (h*h)
Then to use it
def g(x):
    return 2*x**2 + 4

# using an explicit function, forcing the h value
print(d2(g, 6, 1e-4))
Or directly:
# using a lambda and the default value for h
print(d2(lambda x: 2*x**2 + 4, 6))
EDIT
updated to take into account that f can be a string or a function
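For example, the string form could be exercised like this (remember the eval caveat above; the analytic second derivative of 2*x**2 + 4 is 4 everywhere):

print(d2("2*x**2 + 4", 6))  # prints approximately 4.0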
I've been working on making my python more pythonic and toying with runtimes of short snippets of code. My goal is to improve readability, and additionally, to speed up execution.
This example conflicts with the best practices I've been reading about, and I'm interested in finding where the flaw in my thought process is.
The problem is to compute the hamming distance on two equal length strings. For example the hamming distance of strings 'aaab' and 'aaaa' is 1.
The most straightforward implementation I could think of is as follows:
def hamming_distance_1(s_1, s_2):
    dist = 0
    for x in range(len(s_1)):
        if s_1[x] != s_2[x]:
            dist += 1
    return dist
Next I wrote two "pythonic" implementations:
import itertools as i
import operator

def hamming_distance_2(s_1, s_2):
    return sum(i.imap(operator.countOf, s_1, s_2))
and
def hamming_distance_3(s_1, s_2):
    return sum(i.imap(lambda s: int(s[0] != s[1]), i.izip(s_1, s_2)))
In execution:
import random
import timeit

s_1 = ''.join(random.choice('ABCDEFG') for i in range(10000))
s_2 = ''.join(random.choice('ABCDEFG') for i in range(10000))

print 'ham_1 ', timeit.timeit('hamming_distance_1(s_1, s_2)', "from __main__ import s_1, s_2, hamming_distance_1", number=1000)
print 'ham_2 ', timeit.timeit('hamming_distance_2(s_1, s_2)', "from __main__ import s_1, s_2, hamming_distance_2", number=1000)
print 'ham_3 ', timeit.timeit('hamming_distance_3(s_1, s_2)', "from __main__ import s_1, s_2, hamming_distance_3", number=1000)
returning:
ham_1 1.84980392456
ham_2 3.26420593262
ham_3 3.98718094826
I expected that ham_3 would run slower than ham_2, due to the fact that calling a lambda is treated as a function call, which is slower than calling the built-in operator.countOf.
I was surprised that I couldn't find a way to get a more pythonic version to run faster than ham_1, however. I have trouble believing that ham_1 is the lower bound for pure python.
Thoughts anyone?
The key is making fewer method lookups and function calls:
from itertools import izip  # Python 2

def hamming_distance_4(s_1, s_2):
    return sum(c1 != c2 for c1, c2 in izip(s_1, s_2))
runs at ham_4 1.10134792328 on my system.
ham_2 and ham_3 make lookups inside the loops, so they are slower.
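A variant along the same lines that may be worth timing binds a builtin comparison once instead of using a per-pair lambda (operator.ne returns the booleans directly):

import operator
from itertools import imap  # Python 2, matching the question's setup

def hamming_distance_5(s_1, s_2):
    # operator.ne is looked up once; imap then calls it per character pair
    return sum(imap(operator.ne, s_1, s_2))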
I wonder if this might be a bit more Pythonic, in some broader sense: what if you used scipy.spatial.distance.hamming (http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.hamming.html), a module that already implements what you're looking for?
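One note if you try it: scipy.spatial.distance.hamming returns the fraction of mismatching positions, not the count, so scale by the length. A small sketch (converting the strings to byte arrays so the elementwise comparison is well defined):

import numpy as np
from scipy.spatial.distance import hamming

s_1, s_2 = 'aaab', 'aaaa'
u = np.frombuffer(s_1.encode(), dtype=np.uint8)
v = np.frombuffer(s_2.encode(), dtype=np.uint8)
print(int(hamming(u, v) * len(u)))  # 1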