I'm trying to compile an expression that contains an UndefinedFunction which has an implementation provided (alternatively: an expression that contains a Symbol representing a call to an external numerical function).
Is there a way to do this? Either with autowrap or codegen or perhaps with some manual editing of the generated files?
The following naive example does not work:
import sympy as sp
import numpy as np
from sympy.abc import *
from sympy.utilities.lambdify import implemented_function
from sympy.utilities.autowrap import autowrap, ufuncify
def g_implementation(a):
    """represents some numerical function"""
    return a*5
# sympy wrapper around the above function
g = implemented_function('g', g_implementation)
# some random expression using the above function
e = (x+g(a))**2+100
# try to compile said expression
f = autowrap(e, backend='cython')
# fails with "undefined reference to `g'"
EDIT:
I have several large Sympy expressions
The expressions are generated automatically (via differentiation and such)
The expressions contain some "implemented UndefinedFunctions" that call some numerical functions (i.e. NOT Sympy expressions)
The final script/program that has to evaluate the expressions for some input will be called quite often. That means that evaluating the expression in Sympy (via evalf) is definitely not feasible. Even compiling just in time (lambdify, autowrap, ufuncify, numba.jit) produces too much of an overhead.
Basically I want to create a binary python extension for those expressions without having to implement them by hand in C, which I consider too error prone.
OS is Windows 7 64bit
You may want to take a look at this answer about serialization of SymPy lambdas (generated by lambdify).
It's not exactly what you asked for, but it may alleviate your startup-performance problem: once deserialized, the lambdified functions mostly cost only their execution time.
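For reference, here is a minimal sketch of that idea using the dill package (the file name and expression are placeholders, and this assumes dill is installed; it is one common way to serialize lambdified functions):

import dill
import sympy as sp
from sympy.abc import x, y

dill.settings['recurse'] = True              # let dill serialize the globals the generated function needs

f = sp.lambdify((x, y), (x - y)**3)          # pay the lambdify cost once...
with open("cached_lambda.pkl", "wb") as fh:
    dill.dump(f, fh)

with open("cached_lambda.pkl", "rb") as fh:  # ...and in later runs just reload it
    f_cached = dill.load(fh)
print(f_cached(3.0, 1.0))                    # 8.0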
You may also take a look at Theano. It has nice integration with SymPy.
OK, I hope this does the trick; if not, let me know and I'll try again.
Below I compare a version of the expression compiled with Cython (via autowrap) against the lambdified expression.
from sympy.utilities.autowrap import autowrap
from sympy import symbols, lambdify
def wraping(expression):
    return autowrap(expression, backend='cython')

def lamFunc(expression, x, y):
    return lambdify([x, y], expression)

x, y = symbols('x y')

expr = ((x - y)**25).expand()
print(expr)

binary_callable = wraping(expr)
print(binary_callable(1, 2))

lamb = lamFunc(expr, x, y)
print(lamb(1, 2))
which outputs:
x**25 - 25*x**24*y + 300*x**23*y**2 - 2300*x**22*y**3 + 12650*x**21*y**4 - 53130*x**20*y**5 + 177100*x**19*y**6 - 480700*x**18*y**7 + 1081575*x**17*y**8 - 2042975*x**16*y**9 + 3268760*x**15*y**10 - 4457400*x**14*y**11 + 5200300*x**13*y**12 - 5200300*x**12*y**13 + 4457400*x**11*y**14 - 3268760*x**10*y**15 + 2042975*x**9*y**16 - 1081575*x**8*y**17 + 480700*x**7*y**18 - 177100*x**6*y**19 + 53130*x**5*y**20 - 12650*x**4*y**21 + 2300*x**3*y**22 - 300*x**2*y**23 + 25*x*y**24 - y**25
-1.0
-1
If I time the execution, the autowrapped function is about 10x faster (depending on the problem, I have also observed cases where the factor was as little as two):
%timeit binary_callable(12, 21)
100000 loops, best of 3: 2.87 µs per loop
%timeit lamb(12, 21)
100000 loops, best of 3: 28.7 µs per loop
So here wraping(expr) wraps your expression expr and returns a wrapped object binary_callable, which you can use at any time for numerical evaluation.
EDIT: I have done this on Linux/Ubuntu and Windows OS, both seem to work fine!
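As a side note on the implemented_function part of the question: lambdify does use the numerical implementation attached via implemented_function (autowrap, as you found, does not), so combined with serializing the lambdified function this may cover your use case. A minimal sketch along the lines of the question's example (the values are illustrative):

import sympy as sp
from sympy.abc import x, a
from sympy.utilities.lambdify import implemented_function

def g_implementation(v):
    """The external numerical routine from the question."""
    return v * 5

g = implemented_function('g', g_implementation)
e = (x + g(a))**2 + 100

# lambdify notices the attached implementation and calls it directly
f = sp.lambdify((x, a), e)
print(f(1.0, 2.0))   # (1.0 + 2.0*5)**2 + 100 = 221.0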
Related
I don't understand why Sympy won't return the expression below in simplified form (I'm not sure whether it's a bug in my code or a feature of Sympy).
import sympy as sp
a = sp.Symbol('a', finite=True, real=True)
b = sp.Symbol('b', finite=True, real=True)
sp.assumptions.assume.global_assumptions.add(sp.Q.positive(b))
sp.assumptions.assume.global_assumptions.add(sp.Q.negative(a))
sp.simplify(sp.Max(a - b, a + b))
I would expect the output to be a + b, but Sympy still gives me Max(a - b, a + b).
Thanks; as you can see I am a beginner in Sympy so any hints/help are appreciated.
Surely the result should be a + b...
You can do this by setting the assumptions on the symbol as in:
In [2]: a = Symbol('a', negative=True)
In [3]: b = Symbol('b', positive=True)
In [4]: Max(a - b, a + b)
Out[4]: a + b
You are trying to use the new assumptions system, but that system is still experimental and is not widely used within sympy. The new assumptions are not used in core evaluation, so e.g. the Max function has no idea that you have declared global assumptions on a and b unless those assumptions are declared on the symbols, as shown above.
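Applied to the snippet from the question, a minimal sketch with the assumptions moved onto the symbols themselves:

import sympy as sp

# old-style assumptions attached directly to the symbols instead of the
# global assumptions registry, so Max can use them during evaluation
a = sp.Symbol('a', negative=True)
b = sp.Symbol('b', positive=True)

print(sp.simplify(sp.Max(a - b, a + b)))   # a + b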
The purpose of using numexpr.evaluate() is to speed up the computation, but in my case it runs even slower than NumPy and eval(). I would like to know why.
Example code:
import datetime
import numpy as np
import numexpr as ne
expr = '11808000.0*1j*x**2*exp(2.5e-10*1j*x) + 1512000.0*1j*x**2*exp(5.0e-10*1j*x)'
# use eval
start_eval = datetime.datetime.now()
namespace = dict(x=np.array([m+3j for m in range(1, 1001)]), exp=np.exp)
result_eval = eval(expr, namespace)
end_eval = datetime.datetime.now()
# print(result)
print("time by using eval : %s" % (end_eval- start_eval))
# use numexpr
# ne.set_num_threads(8)
start_ne = datetime.datetime.now()
x = np.array([n+3j for n in range(1, 1001)])
result_ne = ne.evaluate(expr)
end_ne = datetime.datetime.now()
# print(result_ne)
print("time by using numexpr: %s" % (end_ne- start_ne))
This returns:
time by using eval : 0:00:00.002998
time by using numexpr: 0:00:00.052969
Thank you all
I got the answer from robbmcleod on GitHub:
for NumExpr 2.6 you'll need arrays around 128 - 256 kElements to see a speed-up. NumPy always starts faster as it doesn't have to synchronize at a thread barrier and otherwise spin-up the virtual machine
Also, once you call numexpr.evaluate() the second time, it should be faster as it will have already compiled everything. Compilation takes around 0.5 ms for simple expressions, more for longer expressions. The expression is stored as a hash, so the next time that computational expense is gone.
related url: https://github.com/pydata/numexpr/issues/301#issuecomment-388097698
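To illustrate both points, here is a hedged sketch (the array size is chosen to be well above that threshold; actual timings will vary by machine) that also times a second evaluate() call, which should skip compilation:

import numpy as np
import numexpr as ne
from time import perf_counter

# ~1e6 elements, well above the 128-256 kElement threshold mentioned above
x = np.array([n + 3j for n in range(1, 10**6 + 1)])
expr = '11808000.0*1j*x**2*exp(2.5e-10*1j*x) + 1512000.0*1j*x**2*exp(5.0e-10*1j*x)'

t0 = perf_counter(); ne.evaluate(expr); t1 = perf_counter()   # first call: compiles, then runs
t2 = perf_counter(); ne.evaluate(expr); t3 = perf_counter()   # second call: cached bytecode
print("first  numexpr call: %.4f s" % (t1 - t0))
print("second numexpr call: %.4f s" % (t3 - t2))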
I was trying to find a fast way to sort strings in Python, and the locale is a non-concern, i.e. I just want to sort the array lexically according to the underlying bytes. This is a perfect fit for something like radix sort. Here is my MWE:
import numpy as np
import timeit
# randChar is workaround for MemoryError in mtrand.RandomState.choice
# http://stackoverflow.com/questions/25627161/how-to-solve-memory-error-in-mtrand-randomstate-choice
def randChar(f, numGrp, N):
    things = [f % x for x in range(numGrp)]
    return [things[x] for x in np.random.choice(numGrp, N)]

N = int(1e7)
K = 100
id3 = randChar("id%010d", N//K, N)  # small groups (char)
timeit.Timer("id3.sort()", "from __main__ import id3").timeit(1)  # 6.8 seconds
As you can see, it took 6.8 seconds, which is almost 10x slower than R's radix sort below.
N = 1e7
K = 100
id3 = sample(sprintf("id%010d",1:(N/K)), N, TRUE)
system.time(sort(id3,method="radix"))
I understand that Python's .sort() doesn't use radix sort; is there an implementation somewhere that allows me to sort strings as performantly as R does?
AFAIK both R and Python "intern" strings so any optimisations in R can also be done in Python.
The top Google result for "radix sort strings python" is this gist, which produced an error when sorting my test array.
It is true that R interns all strings, meaning it has a "global character cache" which serves as a central dictionary of all strings ever used by your program. This has its advantages: the data takes less memory, and certain algorithms (such as radix sort) can take advantage of this structure to achieve higher speed. This is particularly true for the scenarios such as in your example, where the number of unique strings is small relative to the size of the vector. On the other hand it has its drawbacks too: the global character cache prevents multi-threaded write access to character data.
In Python, afaik, only string literals are interned. For example:
>>> 'abc' is 'abc'
True
>>> x = 'ab'
>>> (x + 'c') is 'abc'
False
In practice it means that, unless you've embedded data directly into the text of the program, nothing will be interned.
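For completeness, CPython does let you intern strings explicitly via sys.intern, though that by itself will not give you a radix sort; a small illustration:

import sys

x = 'ab'
y = x + 'c'                                   # built at runtime, so not interned automatically
print(y is sys.intern('abc'))                 # False: y is a distinct object
print(sys.intern(y) is sys.intern('abc'))     # True: both map to the single interned copy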
Now, for your original question: "what is the fastest way to sort strings in Python?" You can achieve very good speeds, comparable with R, with the Python datatable package. Here's a benchmark that sorts N = 10⁸ strings, randomly selected from a set of 1024:
import datatable as dt
import pandas as pd
import random
from time import time
n = 10**8
src = ["%x" % random.getrandbits(10) for _ in range(n)]
f0 = dt.Frame(src)
p0 = pd.DataFrame(src)
f0.to_csv("test1e8.csv")
t0 = time(); f1 = f0.sort(0); print("datatable: %.3fs" % (time()-t0))
t0 = time(); src.sort(); print("list.sort: %.3fs" % (time()-t0))
t0 = time(); p1 = p0.sort_values(0); print("pandas: %.3fs" % (time()-t0))
Which produces:
datatable: 1.465s / 1.462s / 1.460s (multiple runs)
list.sort: 44.352s
pandas: 395.083s
The same dataset in R (v3.4.2):
> require(data.table)
> DT = fread("test1e8.csv")
> system.time(sort(DT$C1, method="radix"))
user system elapsed
6.238 0.585 6.832
> system.time(DT[order(C1)])
user system elapsed
4.275 0.457 4.738
> system.time(setkey(DT, C1)) # sort in-place
user system elapsed
3.020 0.577 3.600
Jeremy Mets posted in the comments of this blog post that NumPy can sort strings fairly fast if the list is converted to an np.array. This indeed improves performance; however, it is still slower than Julia's implementation.
import numpy as np
import timeit
# randChar is workaround for MemoryError in mtrand.RandomState.choice
# http://stackoverflow.com/questions/25627161/how-to-solve-memory-error-in-mtrand-randomstate-choice
def randChar(f, numGrp, N):
    things = [f % x for x in range(numGrp)]
    return [things[x] for x in np.random.choice(numGrp, N)]

N = int(1e7)
K = 100
id3 = np.array(randChar("id%010d", N//K, N))  # small groups (char)
timeit.Timer("id3.sort()", "from __main__ import id3").timeit(1)
I'm working with Python Sympy, solving a quadratic, and want to print the result using LaTeX. For example, if the result is x = (1 + sqrt(3))/2, I would like it to print via LaTeX as \frac{1 + \sqrt{3}}{2}. However, Sympy either splits this into two fractions, as in \frac{1}{2} + \frac{\sqrt{3}}{2}, or factors out the half, as in \frac{1}{2}(1 + \sqrt{3}). I have tried to return the numerator via sympy.fraction(expr) and have looked at other articles (Sympy - fraction manipulation and others), but none produced the desired result.
Check out how to override the default printers.
import sympy
from sympy.printing.latex import LatexPrinter  # necessary because latex is both a function and a module

class CustomLatexPrinter(LatexPrinter):
    def _print_Add(self, expr):
        n, d = expr.as_numer_denom()
        if d == sympy.S.One:
            # defer to the default printing mechanism
            return super()._print_Add(expr)
        return r'\frac{%s}{%s}' % (sympy.latex(n), sympy.latex(d))

# doing this should override the default latex printer globally
# adapted from "Examples of overloading StrPrinter" in the sympy documentation
sympy.printing.latex = lambda expr: CustomLatexPrinter().doprint(expr)

print(sympy.printing.latex((1 + sympy.sqrt(3)) / 2))  # \frac{1 + \sqrt{3}}{2}
I will have data which uses degrees for the trigonometric functions.
I will use sympy, and since it uses radians for the trigonometric functions, I want to be able to convert and use degrees.
But I will not know which symbol variable is being used as input to the trigonometric function.
So, I will have something like:
import sympy
a = 2
b = 30
a, b = sympy.symbols('a b')
expr = 'sqrt(a + cos(b))'
expres = sympy.sympify(expr)
print(expres.atoms(sympy.cos))
The b is 30 and it is in degrees (so in the cos(b) computation it must be treated as degrees, not radians as sympy expects).
But, as I said, the next time the program runs it may use a different symbol variable in the trigonometric function, or more than one trigonometric function.
I thought of using atoms to detect whether the expression contains trigonometric functions, and then, somehow, using degrees instead of radians wherever the corresponding symbol variables are used.
Is there a way to do this?
Arguably the cleanest way to avoid this problem is to tell sympify to use custom definitions for selected symbols/strings, e.g., like this:
from sympy.abc import *
from sympy import sin, pi, Lambda, sympify
sin_degree = Lambda(x, sin(x*pi/180))
degree_trigs = {"sin": sin_degree}
expr_string = "sin(a) + sin(b**d + exp(c))"
expr = sympify(expr_string, locals=degree_trigs)
print(expr)
This returns:
sin(pi*a/180) + sin(pi*(b**d/180 + exp(c)/180))
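Since the expression in the question uses cos rather than sin, the same dictionary can be extended in the obvious way (the cos and tan entries below are my additions, following the same pattern):

from sympy.abc import x
from sympy import sin, cos, tan, pi, Lambda, sympify

degree_trigs = {
    "sin": Lambda(x, sin(x*pi/180)),
    "cos": Lambda(x, cos(x*pi/180)),
    "tan": Lambda(x, tan(x*pi/180)),
}

expr = sympify('sqrt(a + cos(b))', locals=degree_trigs)
print(expr)   # sqrt(a + cos(pi*b/180))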
You can use subs with functions, e.g., like this:
from sympy.abc import *
from sympy import sin, exp, pi, Lambda
sin_degree = Lambda(x, sin(x*pi/180))
expr = sin(a) + sin(b**d + exp(c))
print( expr.subs(sin,sin_degree) )
This returns:
sin(pi*a/180) + sin(pi*(b**d/180 + exp(c)/180))
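To then plug in the values from the question (a = 2, b = 30 degrees), ordinary substitution plus evalf finishes the job; the number in the final comment is just sqrt(2 + cos(30°)):

from sympy.abc import a, b, x
from sympy import sqrt, cos, pi, Lambda

cos_degree = Lambda(x, cos(x*pi/180))
expr = sqrt(a + cos(b))

expr_deg = expr.subs(cos, cos_degree)
print(expr_deg)                              # sqrt(a + cos(pi*b/180))
print(expr_deg.subs({a: 2, b: 30}).evalf())  # approximately 1.693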