Using sympy cxxcode on Heaviside function fails - python

I am using sympy version 1.10.1 and numpy version 1.20.0. Can someone please explain why the following simple code results in an error?
import sympy as sp
from sympy.printing import cxxcode
T = sp.symbols('T', real=True, finite=True)
func = sp.Heaviside(T - 0.01)
func_cxx = cxxcode(func)
The error is
ValueError: All Piecewise expressions must contain an (expr, True) statement to be used as a default condition. Without one, the generated expression may not evaluate to anything under some condition.
I understand that sympy converts Heaviside to a Piecewise function, but I'd imagine the corresponding Piecewise is also defined for all real & finite T:
>>> func.rewrite(sp.Piecewise)
Piecewise((0, T - 0.01 < 0), (1/2, Eq(T - 0.01, 0)), (1, T - 0.01 > 0))

If I were you I would open an issue on SymPy.
To work around it, you can rewrite the function to a Piecewise and modify it so that it contains the (expr, True) default that cxxcode expects:
func = func.rewrite(sp.Piecewise)
# replace the last (expr, cond) pair with (expr, True) to provide the default condition
args = list(func.args)
args[-1] = (args[-1][0], True)
# recreate the Piecewise
func = sp.Piecewise(*args)
print(cxxcode(func))
# out: '((T - 0.01 < 0) ? (\n 0\n)\n: ((T - 0.01 == 0) ? (\n 1.0/2.0\n)\n: (\n 1\n)))'
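Alternatively (a minimal sketch of the same idea, not the only possible fix), you can skip the rewrite and build the Piecewise with an explicit default branch yourself; the conditions below just restate Heaviside(T - 0.01):

import sympy as sp
from sympy.printing import cxxcode

T = sp.symbols('T', real=True, finite=True)
# hand-written equivalent of Heaviside(T - 0.01) with (1, True) as the default branch
func = sp.Piecewise((0, T < 0.01), (sp.Rational(1, 2), sp.Eq(T, 0.01)), (1, True))
print(cxxcode(func))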

Related

conversion of function from Matlab to python

I have a MATLAB function:
Bits = 30
NBits = ceil(fzero(@(x) 2^(x) - x - 1 - Bits, max(log2(Bits), 1)))
I want to convert it to Python; I wrote something like this so far:
from numpy import log, log2
from scipy.optimize import root_scalar
def func(x, Bits):
    return ((x)2^(x)-x-1-Bits, max(log2(Bits)))
However, it says that it needs to be (x)*2^
Does anybody know, first, whether the conversion from MATLAB to Python is correct, and second, whether the * has to be added?
Upon suggestion I wrote this lambda function:
lambda x: (2^(x) -x -1 -Bits) , max(log2(Bits))
but I get this error:
TypeError: 'numpy.float64' object is not iterable
I don't have numpy or scipy on this computer so here is my best attempt at an answer.
import math
from scipy.optimize import root_scalar
def YourFunc(Bits):
    # root_scalar's secant method needs x0 and x1; the numeric answer is in .root
    sol = root_scalar(lambda x: (2**x) - x - 1 - Bits, x0=max(log2(Bits), 1), x1=max(log2(Bits), 1) + 1)
    return math.ceil(sol.root)
Bits = 30
NBits = YourFunc(Bits)
print(NBits)
I used this function for log2 rather than the one from numpy. Try it:
def log2(x):
    return math.log(x, 2)
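If scipy really is available, a bracketing call is another option. This is only a sketch and assumes the sign change lies between 1 and Bits, which holds for Bits = 30 (the function is negative at 1 and positive at 30):

import math
from scipy.optimize import root_scalar

Bits = 30
# bracketing: f(1) < 0 and f(30) > 0, so a root lies in between
sol = root_scalar(lambda x: 2**x - x - 1 - Bits, bracket=[1, Bits])
NBits = math.ceil(sol.root)
print(NBits)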

Sympy simplification of maximum

I don't understand why Sympy won't return the expression below to me simplified (not sure if it's a bug in my code or a feature of Sympy).
import sympy as sp
a = sp.Symbol('a',finite = True, real = True)
b = sp.Symbol('b',finite = True, real = True)
sp.assumptions.assume.global_assumptions.add(sp.Q.positive(b))
sp.assumptions.assume.global_assumptions.add(sp.Q.negative(a))
sp.simplify(sp.Max(a-b,a+b))
I would expect the output to be a + b, but Sympy still gives me Max(a - b, a + b).
Thanks; as you can see I am a beginner in Sympy so any hints/help are appreciated.
Surely the result should be a + b...
You can do this by setting the assumptions on the symbol as in:
In [2]: a = Symbol('a', negative=True)
In [3]: b = Symbol('b', positive=True)
In [4]: Max(a - b, a + b)
Out[4]: a + b
You are trying to use the new assumptions system but that system is still experimental and is not widely used within sympy. The new assumptions are not used in core evaluation so e.g. the Max function has no idea that you have declared global assumptions on a and b unless those assumptions are declared on the symbols as I show above.
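For completeness, here is a minimal runnable version of that fix written in the question's sp.* style; the real/finite flags from the question are kept, and only negative/positive is added:

import sympy as sp

a = sp.Symbol('a', real=True, finite=True, negative=True)
b = sp.Symbol('b', real=True, finite=True, positive=True)
# Max can now compare the arguments, since (a + b) - (a - b) = 2*b is known to be positive
print(sp.Max(a - b, a + b))  # a + b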

solving non linear problems in python

In the last equation I need to solve for q. Here is the problem from Miranda and Fackler; I need to develop equivalent Python code. If my function depends on many variables and I need to solve a nonlinear root-finding problem for only one of them, how should I write it?
When I pass all three variables, I get the following error:
TypeError: 'numpy.ndarray' object is not callable
and when I pass only one of the variables, I get this error:
TypeError: resid() missing 2 required positional arguments: 'p' and 'phi'
Can anyone tell me my mistake and suggest better code for this?
broyden1(resid(co, p_node, q), co)
breaks because the term resid(co, p_node, q) gets evaluated (returning an array) before it is passed to broyden1.
broyden1(resid, co)
breaks because when broyden1 evaluates it, it calls resid(co), which is clearly not well defined. You want to be able to pass the initial guess to broyden1 as a single object (e.g. a tuple), so a simple solution is to redefine resid to take a tuple instead of three separate arguments, like so:
import numpy as np
import scipy.optimize

def resid(arg):
    # unpack the single argument; eta and alpha come from your existing setup
    c, p, phi = arg
    return p + (phi * c) * ((-1 / eta) * (p ** (eta + 1))) \
        - alpha * (np.sqrt(np.abs(phi * c))) - (phi * c) ** 2

c1 = scipy.optimize.broyden1(resid, (co, p_node, q))
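If, as the question says, only q is really unknown while co and p_node are fixed numbers, another pattern is to close over the fixed values and hand broyden1 a single unknown. The following is only a sketch: eta, alpha, co and p_node are placeholder values for illustration, not the ones from Miranda and Fackler:

import numpy as np
from scipy.optimize import broyden1

eta, alpha = 1.5, 1.0    # placeholder parameters
co, p_node = 1.0, 1.0    # placeholder fixed values

def resid_q(q):
    # only q varies; co, p_node, eta and alpha are captured from the enclosing scope
    return p_node + (q * co) * ((-1 / eta) * (p_node ** (eta + 1))) \
        - alpha * np.sqrt(np.abs(q * co)) - (q * co) ** 2

q_sol = broyden1(resid_q, 0.5)
print(q_sol)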

Python - In-line boolean evaluation without IF statements

I am trying to assess the value of a column of a dataframe to determine the value of another column. I did this by using an if statement and .apply() function successfully. I.e.
if Col x < 0.3:
    return y
elif Col x > 0.6:
    return z
Etc. The problem is this takes quite a while to run with a lot of data. Instead I am trying to use the following logic to determine the new column value:
(x<0.3)*y + (x>0.6)*z
So Python evaluates TRUE/FALSE and applies the correct value. This seems to work much faster; the only thing is that Python says:
UserWarning: evaluating in Python space because the '*' operator is not supported by numexpr for the bool dtype, use '&' instead
  unsupported[op_str]))
Is this a problem? Should I be using "&"? I feel using "&" would be incorrect when multiplying.
Thank you!
From what I have read so far, the performance gap is caused by the parser backend chosen by pandas: there is the regular Python parser as a backend and, additionally, a pandas parsing backend.
The docs say that there is no performance gain from using plain old Python over pandas here: Pandas eval Backends
However, you obviously hit a gap in the pandas backend; i.e. you formed an expression that cannot be evaluated using pandas. The result is that pandas falls back to the original Python parsing backend, as stated in the resulting UserWarning:
UserWarning: evaluating in Python space because the '*' operator is not supported by numexpr for the bool dtype, use '&' instead
unsupported[op_str]))
(More on this topic)
Timing evaluations
So, as we now know about different parsing backends, it's time to check a few options provided by pandas that are suitable for your desired dataframe operation (complete script below):
expr_a = '''(a < 0.3) * 1 + (a > 0.6) * 3 + (a >= 0.3) * (a <= 0.6) * 2'''
(1) Evaluate the expression as a string using the pandas backend
(2) Evaluate the same string using the python backend
(3) Evaluate the expression string with external variable reference (@) using the pandas backend
(4) Solve the problem using df.apply()
(5) Solve the problem using df.applymap()
(6) Direct submission of the expression (no string evaluation)
The results on my machine for a dataframe with 10,000,000 random float values in one column are:
(1) Eval (pd) 0.240498406269
(2) Eval (py) 0.197919774926
(3) Eval @ (pd) 0.200814546686
(4) Apply 3.242620778595
(5) ApplyMap 6.542354086152
(6) Direct 0.140075372736
The major points explaining the performance differences are most likely the following:
Using a python function (as in apply() and applymap()) is (of course!) much slower than using functionality completely implemented in C
String evaluation is expensive (see (6) vs (2))
The overhead that (1) has over (2) probably comes from the backend choice and the fallback to the python backend, because pandas does not evaluate bool * int.
Nothing new, eh?
How to proceed
We basically just proved what our gut feeling was telling us before (namely: pandas chooses the right backend for a task).
As a consequence, I think it is totally okay to ignore the UserWarning, as long as you know the underlying hows and whys.
Thus: Keep going and have pandas use the fastest of all implementations, which is, as usual, the C functions.
The Test Script
from __future__ import print_function
import sys
import random
import pandas as pd
import numpy as np
from timeit import default_timer as timer

def conditional_column(val):
    if val < 0.3:
        return 1
    elif val > 0.6:
        return 3
    return 2

if __name__ == '__main__':
    nr = 10000000
    df = pd.DataFrame({
        'a': [random.random() for _ in range(nr)]
    })
    print(nr, 'rows')
    expr_a = '''(a < 0.3) * 1 + (a > 0.6) * 3 + (a >= 0.3) * (a <= 0.6) * 2'''
    expr_b = '''(@df.a < 0.3) * 1 + (@df.a > 0.6) * 3 + (@df.a >= 0.3) * (@df.a <= 0.6) * 2'''
    fmt = '{:16s} {:.12f}'
    # Evaluate the string expression using the pandas parser
    t0 = timer()
    b = df.eval(expr_a, parser='pandas')
    print(fmt.format('(1) Eval (pd)', timer() - t0))
    # Evaluate the string expression using the python parser
    t0 = timer()
    c = df.eval(expr_a, parser='python')
    print(fmt.format('(2) Eval (py)', timer() - t0))
    # Evaluate the string expression using the pandas parser with external variable access (@)
    t0 = timer()
    d = df.eval(expr_b, parser='pandas')
    print(fmt.format('(3) Eval @ (pd)', timer() - t0))
    # Use apply to map the if/else function to each row of the df
    t0 = timer()
    d = df['a'].apply(conditional_column)
    print(fmt.format('(4) Apply', timer() - t0))
    # Use element-wise applymap (WARNING: requires a dataframe and walks ALL cols AND rows)
    t0 = timer()
    e = df.applymap(conditional_column)
    print(fmt.format('(5) ApplyMap', timer() - t0))
    # Directly use the pandas series objects returned by boolean expressions on the column
    t0 = timer()
    f = (df['a'] < 0.3) * 1 + (df['a'] > 0.6) * 3 + (df['a'] >= 0.3) * (df['a'] <= 0.6) * 2
    print(fmt.format('(6) Direct', timer() - t0))

Porting IDL code, lindgen function to Python

Afternoon everyone. I'm currently porting an IDL code to Python and it's been plain sailing up until this point. I'm stuck on this section of IDL code:
nsteps = 266
ind2 = ((lindgen(nsteps+1,nsteps+1)) mod (nsteps+1))
dk2 = (k2arr((ind2+1) < nsteps) - k2arr((ind2-1) > 0)) / 2.
My version of this includes a rewritten lindgen function as follows:
import numpy
def pylindgen(shape):
    nelem = numpy.prod(numpy.array(shape))
    out = numpy.arange(nelem, dtype=int)
    return numpy.reshape(out, shape)
... and the ported code where k2arr is an array of shape (267,):
ind2 = pylindgen((nsteps+1,nsteps+1)) % (nsteps+1)
dk2 = (k2arr[ (ind2+1) < nsteps ] - k2arr[ (ind2-1) > 0. ]) / 2.
Now, the problem is that my code makes ind2 an array where, by looking at the IDL code and the errors thrown in the python script, I'm sure it's meant to be a scalar. Am I missing some feature of these IDL functions?
Any thoughts would be greatly appreciated.
Cheers.
My knowledge of IDL is not what it used to be; I had to research a little. The ">" operator in IDL is not the equivalent of the one in Python (or other languages). It sets a floor: a > b returns the larger of the two, so anything below b is replaced by b. Same goes for "<", which sets a ceiling by returning the smaller value.
dk2 = (k2arr((ind2+1) < nsteps) - k2arr((ind2-1) > 0))
where k2arr has shape (267,) and ind2 has shape (267, 267), is equivalent to saying:
- (ind2+1 < nsteps): take ind2+1 and, anywhere ind2+1 is greater than nsteps, replace it by nsteps.
- (ind2-1 > 0): take ind2-1 and, anywhere ind2-1 is less than zero, put zero instead.
Now the tricky part: k2arr, of shape (267,), is indexed with each row of (ind2+1) and (ind2-1), meaning that if (ind2+1 < nsteps) = [1, 2, 3, ..., nsteps-1, nsteps, nsteps], then k2arr is evaluated at exactly those positions row after row, with the result being a (267, 267) array.
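In numpy that clamping is spelled with element-wise minimum/maximum (or np.clip) rather than comparison operators. A sketch of the port, assuming k2arr is the (267,) array from the question and pylindgen is the helper defined above:

import numpy

nsteps = 266
ind2 = pylindgen((nsteps + 1, nsteps + 1)) % (nsteps + 1)
# IDL's "< nsteps" caps the indices at nsteps, "> 0" floors them at 0
dk2 = (k2arr[numpy.minimum(ind2 + 1, nsteps)] - k2arr[numpy.maximum(ind2 - 1, 0)]) / 2.0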
And NOW I remember why I stopped programming in IDL!
The code for pylindgen works perfectly for me. It produces a (267, 267) array, though. If k2arr is a (267,) array, you should be getting an error like:
ValueError: boolean index array should have 1 dimension
Is that your problem?
Cheers
