Sympy get vector from vector field - python

I'm using sympy (which is awesome) and I just made a vector field like this
> import sympy
> from sympy.vector import CoordSys3D
> from sympy import *
> R = CoordSys3D('R')
> x, y, z, t = symbols('x y z t')
> v = x*R.i + 4*z*R.j + y*R.k
x*R.i + 4*z*R.j + y*R.k
> v.evalf(subs={x:6, y:5, z:2})
6.00000000000000*R.i + 8.00000000000000*R.j + 5.00000000000000*R.k
and what I need is to get a vector or list of the form [6.0,8.0,5.0], so is there a way to get a list from v.evalf()? I could use split or something on "6.00000000000000*R.i + 8.00000000000000*R.j + 5.00000000000000*R.k" but that seems ugly, and maybe there is a built-in method for that?

In [252]: vector = v.evalf(subs={x:6, y:5, z:2}); vector
Out[252]: 6.00000000000000*R.i + 8.00000000000000*R.j + 5.00000000000000*R.k
In [253]: list(vector.to_matrix(R))
Out[253]: [6.00000000000000, 8.00000000000000, 5.00000000000000]
Other possibilities include
In [256]: vector.as_poly().coeffs()
Out[256]: [6.00000000000000, 8.00000000000000, 5.00000000000000]
In [257]: list(vector.components.values())
Out[257]: [5.00000000000000, 8.00000000000000, 6.00000000000000]
but I think they suffer a fatal flaw which is exposed when one or more of the components equal 0. For example, if z is set to 0:
In [258]: vector = v.evalf(subs={x:6, y:5, z:0}); vector
Out[258]: 6.00000000000000*R.i + 5.00000000000000*R.k
Then list(vector.to_matrix(R)) still returns 3 components:
In [259]: list(vector.to_matrix(R))
Out[259]: [6.00000000000000, 0, 5.00000000000000]
while these other two expressions omit the zero-component:
In [260]: vector.as_poly().coeffs()
Out[260]: [6.00000000000000, 5.00000000000000]
In [261]: list(vector.components.values())
Out[261]: [5.00000000000000, 6.00000000000000]
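A minimal sketch of the to_matrix approach wrapped in a small helper, assuming plain Python floats are wanted. The name vector_to_floats is made up for this illustration and is not part of SymPy:

```python
from sympy import symbols
from sympy.vector import CoordSys3D

R = CoordSys3D('R')
x, y, z = symbols('x y z')
v = x*R.i + 4*z*R.j + y*R.k

def vector_to_floats(vec, system):
    # Hypothetical helper: to_matrix yields one entry per base vector,
    # so zero components are kept instead of being dropped.
    return [float(c) for c in vec.to_matrix(system)]

full = vector_to_floats(v.evalf(subs={x: 6, y: 5, z: 2}), R)
with_zero = vector_to_floats(v.evalf(subs={x: 6, y: 5, z: 0}), R)
print(full, with_zero)
```

Because the list always has three entries in i, j, k order, it can be passed to code that expects a fixed-length vector.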

Related

How can I convert a SymPy expression to NumPy?

How can I convert a sympy expression to numpy code? For example, say this was the code for the expression:
expression = 2 * x/y + 10 * sympy.exp(x) # Assuming that x and y are predefined from sympy.symbols
I would want to go from expression to this:
np_expression = "np.dot(2, np.dot(x, np.linalg.pinv(y))) + np.dot(10, np.exp(x))"
Note that x and y are matrices, but we can assume the shapes will match
An example with real numbers would go like this:
a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])
expression = 2 * a/b + 10 # These would be sympy symbols rather than numbers
and the result would be this:
np_expression = "np.dot(2, np.dot(5, np.linalg.pinv(9))) + 10"
In [1]: expr = 2 *x/y + 10 * exp(x)
In [3]: f = lambdify((x,y), expr)
In [4]: help(f)
_lambdifygenerated(x, y)
Created with lambdify. Signature:
func(x, y)
Expression:
2*x/y + 10*exp(x)
Source code:
def _lambdifygenerated(x, y):
    return 2*x/y + 10*exp(x)
Which for specific inputs, array or otherwise:
In [5]: f(np.arange(1,5)[:,None], np.arange(1,4))
Out[5]:
array([[ 29.18281828, 28.18281828, 27.84948495],
[ 77.89056099, 75.89056099, 75.22389432],
[206.85536923, 203.85536923, 202.85536923],
[553.98150033, 549.98150033, 548.648167 ]])
In [6]: f(1,1)
Out[6]: 29.18281828459045
In [7]: f(2,3)
Out[7]: 75.22389432263984
In [8]: f(np.arange(1,4),np.arange(1,4))
Out[8]: array([ 29.18281828, 75.89056099, 202.85536923])
Normal array broadcasting rules apply. Note that x/y is element-wise. I'm not sure what lambdify will translate into dot and inv code.
Trying your numpy code:
In [9]: np.dot(2, np.dot(2,np.linalg.pinv(3)))+10*np.exp(2)
---------------------------------------------------------------------------
LinAlgError Traceback (most recent call last)
<ipython-input-9-6cae91f0e0f8> in <module>
----> 1 np.dot(2, np.dot(2,np.linalg.pinv(3)))+10*np.exp(2)
....
LinAlgError: 0-dimensional array given. Array must be at least two-dimensional
We have to change the y into a 2d array, e.g. [[3]]:
In [10]: np.dot(2, np.dot(2,np.linalg.pinv([[3]])))+10*np.exp(2)
Out[10]: array([[75.22389432]])
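To make the element-wise point concrete, here is a small sketch comparing what the lambdified function computes with the matrix-style expression from the question. The 2x2 arrays are illustrative values, and writing the matrix version by hand with np.linalg.pinv is my own assumption about what the asker intended, not something lambdify generates:

```python
import numpy as np
import sympy

x, y = sympy.symbols("x y")
expr = 2 * x / y + 10 * sympy.exp(x)
f = sympy.lambdify((x, y), expr, modules="numpy")

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0], [7.0, 8.0]])

# lambdify output: every operation is applied element by element
elementwise = f(a, b)

# Hand-written matrix interpretation: "division" as multiplication
# by the pseudo-inverse of b
matrix_version = 2 * a @ np.linalg.pinv(b) + 10 * np.exp(a)

print(elementwise)
print(matrix_version)
```

The two results differ, so if matrix semantics are needed the expression probably has to be written in terms of matrix operations rather than scalar symbols.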

Find the point of intersection of two linear equations using Numpy

The objective is to find the point of intersection of two linear equations. These two linear equations are derived using the Numpy polyfit function.
Given two time series (xLeft, yLeft) and (xRight, yRight), the linear least squares fit to each of them was calculated using polyfit as shown below:
xLeft = [
6168, 6169, 6170, 6171, 6172, 6173, 6174, 6175, 6176, 6177,
6178, 6179, 6180, 6181, 6182, 6183, 6184, 6185, 6186, 6187
]
yLeft = [
0.98288751, 1.3639959, 1.7550986, 2.1539073, 2.5580614,
2.9651523, 3.3727503, 3.7784295, 4.1797948, 4.5745049,
4.9602985, 5.3350167, 5.6966233, 6.0432272, 6.3730989,
6.6846867, 6.9766307, 7.2477727, 7.4971657, 7.7240791
]
xRight = [
6210, 6211, 6212, 6213, 6214, 6215, 6216, 6217, 6218, 6219,
6220, 6221, 6222, 6223, 6224, 6225, 6226, 6227, 6228, 6229,
6230, 6231, 6232, 6233, 6234, 6235, 6236, 6237, 6238, 6239,
6240, 6241, 6242, 6243, 6244, 6245, 6246, 6247, 6248, 6249,
6250, 6251, 6252, 6253, 6254, 6255, 6256, 6257, 6258, 6259,
6260, 6261, 6262, 6263, 6264, 6265, 6266, 6267, 6268, 6269,
6270, 6271, 6272, 6273, 6274, 6275, 6276, 6277, 6278, 6279,
6280, 6281, 6282, 6283, 6284, 6285, 6286, 6287, 6288]
yRight=[
7.8625913, 7.7713094, 7.6833806, 7.5997391, 7.5211883,
7.4483986, 7.3819046, 7.3221073, 7.2692747, 7.223547,
7.1849418, 7.1533613, 7.1286001, 7.1103559, 7.0982385,
7.0917811, 7.0904517, 7.0936642, 7.100791, 7.1111741,
7.124136, 7.1389918, 7.1550579, 7.1716633, 7.1881566,
7.2039142, 7.218349, 7.2309117, 7.2410989, 7.248455,
7.2525721, 7.2530937, 7.249711, 7.2421637, 7.2302341,
7.213747, 7.1925621, 7.1665707, 7.1356878, 7.0998487,
7.0590014, 7.0131001, 6.9621005, 6.9059525, 6.8445964,
6.7779589, 6.7059474, 6.6284504, 6.5453324, 6.4564347,
6.3615761, 6.2605534, 6.1531439, 6.0391097, 5.9182019,
5.7901659, 5.6547484, 5.5117044, 5.360805, 5.2018456,
5.034656, 4.8591075, 4.6751242, 4.4826899, 4.281858,
4.0727611, 3.8556159, 3.6307325, 3.3985188, 3.1594861,
2.9142516, 2.6635408, 2.4081881, 2.1491354, 1.8874279,
1.6242117,1.3607255,1.0982931,0.83831298
]
left_line = np.polyfit(xLeft, yLeft, 1)
right_line = np.polyfit(xRight, yRight, 1)
In this case, polyfit outputs the coefficients m and b for y = mx + b, respectively.
The intersection of the two linear equations then can be calculated as follows:
x0 = -(left_line[1] - right_line[1]) / (left_line[0] - right_line[0])
y0 = x0 * left_line[0] + left_line[1]
However, I wonder whether there exists a Numpy built-in approach to calculate the last two steps?
Not exactly a built-in approach, but you can simplify the problem. Say I have lines given by y = m1 * x + b1 and y = m2 * x + b2. You can trivially find an equation for the difference, which is also a line:
y = (m1 - m2) * x + (b1 - b2)
Notice that this line will have a root at the intersection of the two original lines, if they intersect. You can use the numpy.polynomial.Polynomial class to perform these operations:
>>> (np.polynomial.Polynomial(left_line[::-1]) - np.polynomial.Polynomial(right_line[::-1])).roots()
array([6192.0710885])
Notice that I had to swap the order of the coefficients, since Polynomial expects smallest to largest, while np.polyfit returns the opposite. In fact, np.polyfit is not recommended. Instead, you can get Polynomial objects directly using the np.polynomial.Polynomial.fit class method. Your code would then look like:
left_line = np.polynomial.Polynomial.fit(xLeft, yLeft, 1, domain=[-1, 1])
right_line = np.polynomial.Polynomial.fit(xRight, yRight, 1, domain=[-1, 1])
x0 = (left_line - right_line).roots()
y0 = left_line(x0)
The domain is mapped to the window [-1, 1]. If you do not specify a domain, the peak-to-peak of the x-values will be used instead. You do not want this, since it will result in a mapping of the input values. Instead, we explicitly specify that the domain [-1, 1] maps to the same window. An alternative would be to use the default domain and set e.g. window=[xLeft.min(), xLeft.max()]. The problem with this approach is that it would then create different domains for the polynomials, preventing the operation left_line - right_line.
See https://numpy.org/doc/stable/reference/routines.polynomials.classes.html for more information.
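A small sketch of the Polynomial.fit route on synthetic data, so the result can be checked by hand. The two lines y = 2x + 1 and y = -x + 7 are made up for this illustration and cross at (2, 5). The second variant, fitting with the default domain and then calling convert() to get back to unscaled coefficients, is an alternative I am adding, not something from the answer above:

```python
import numpy as np
from numpy.polynomial import Polynomial

# Synthetic, noise-free samples of two known lines (illustrative only)
x_left = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_left = 2.0 * x_left + 1.0
x_right = np.array([5.0, 6.0, 7.0, 8.0])
y_right = -1.0 * x_right + 7.0

# Variant 1: identity domain, as in the answer above
left = Polynomial.fit(x_left, y_left, 1, domain=[-1, 1])
right = Polynomial.fit(x_right, y_right, 1, domain=[-1, 1])
x0 = (left - right).roots()[0]
y0 = left(x0)

# Variant 2 (assumption): fit on the scaled default domain, then convert
left_c = Polynomial.fit(x_left, y_left, 1).convert()
right_c = Polynomial.fit(x_right, y_right, 1).convert()
x0_c = (left_c - right_c).roots()[0]

print(x0, y0, x0_c)
```

Both variants give polynomials that share the same domain and window, which is what allows the subtraction.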
You can model it as a linear system and use simple linear algebra:
def get_intersection(m1, b1, m2, b2):
    A = np.array([[-m1, 1], [-m2, 1]])
    b = np.array([[b1], [b2]])
    # Solve the linear system A X = b, where X = [x y]'
    X = np.linalg.pinv(A) @ b
    x, y = np.round(np.squeeze(X), 4)
    return x, y  # point of intersection (x, y), rounded to 4 decimals
# left_line and right_line are the [m, b] arrays returned by np.polyfit
m1, b1, m2, b2 = left_line[0], left_line[1], right_line[0], right_line[1]
print(get_intersection(m1, b1, m2, b2))
As an example, for lines y - x = 1, and y + x = 1, we expect the intersection as (0,1):
m1,b1,m2,b2 = 1, 1, -1, 1
print(get_intersection(m1,b1,m2,b2))
Output: (0.0, 1.0) as expected.
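For a square, well-conditioned system like this one, np.linalg.solve is a possible substitute for the pseudo-inverse. The sketch below is a variation on the linear-system idea and not the answerer's code. The function name intersect and the choice to return None for parallel lines are assumptions made for the example:

```python
import numpy as np

def intersect(m1, b1, m2, b2):
    # Parallel (or identical) lines have no unique intersection
    if np.isclose(m1, m2):
        return None
    # Rewrite y = m*x + b as -m*x + y = b and solve for [x, y]
    A = np.array([[-m1, 1.0], [-m2, 1.0]])
    rhs = np.array([b1, b2], dtype=float)
    x, y = np.linalg.solve(A, rhs)
    return float(x), float(y)

point = intersect(1, 1, -1, 1)     # lines y = x + 1 and y = -x + 1
parallel = intersect(2, 0, 2, 3)   # same slope, different offsets
print(point, parallel)
```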

Derivative of patsy dmatrix with respect to a specific variable

Edit: I now have a candidate solution to my question (see toy example below) -- if you can think of something more robust, please let me know.
I just found out about python's patsy package for creating design matrices from R-style formulas, and it looks great. My question is this: given a patsy formula, e.g. "(x1 + x2 + x3)**2", is there an easy way to create a design matrix containing the derivative with respect to a particular variable, e.g. "x1"?
Here's a toy example:
import numpy as np
import pandas as pd
import patsy
import sympy
import sympy.parsing.sympy_parser as sympy_parser
n_obs = 200
df = pd.DataFrame(np.random.uniform(size=(n_obs, 3)), columns=["x1", "x2", "x3"])
df.describe()
design_matrix = patsy.dmatrix("(I(7*x1) + x2 + x3)**2 + I(x1**2) + I(x1*x2*x3)", df)
design_matrix.design_info.column_names
## ['Intercept', 'I(7 * x1)', 'x2', 'x3', 'I(7 * x1):x2', 'I(7 * x1):x3', 'x2:x3', 'I(x1 ** 2)', 'I(x1 * x2 * x3)']
x1, x2, x3 = sympy.symbols("x1 x2 x3")
def diff_wrt_x1(string):
    return str(sympy.diff(sympy_parser.parse_expr(string), x1))
colnames_to_differentiate = [colname.replace(":", "*").replace("Intercept", "1").replace("I", "")
                             for colname in design_matrix.design_info.column_names]
derivatives_wrt_x1 = [diff_wrt_x1(colname) for colname in colnames_to_differentiate]
def get_column(string):
    try:
        return float(string) * np.ones((len(df), 1))  # For cases like string == "7"
    except ValueError:
        return patsy.dmatrix("0 + I(%s)" % string, df)  # For cases like string == "x2*x3"
derivative_columns = tuple(get_column(derivative_string) for derivative_string in derivatives_wrt_x1)
design_matrix_derivative = np.hstack(derivative_columns)
design_matrix_derivative[0] # Contains [0, 7, 0, 0, 7*x2, 7*x3, 0, 2*x1, x2*x3]
design_matrix_derivative_manual = np.zeros_like(design_matrix_derivative)
design_matrix_derivative_manual[:, 1] = 7.0
design_matrix_derivative_manual[:, 4] = 7*df["x2"]
design_matrix_derivative_manual[:, 5] = 7*df["x3"]
design_matrix_derivative_manual[:, 7] = 2*df["x1"]
design_matrix_derivative_manual[:, 8] = df["x2"] * df["x3"]
np.all(np.isclose(design_matrix_derivative, design_matrix_derivative_manual)) # True!
The code generates a design matrix with columns [1, 7*x1, x2, x3, 7*x1*x2, 7*x1*x3, x2*x3, x1^2, x1*x2*x3].
Suppose I want a new formula which differentiates design_matrix with respect to x1. The desired result is a matrix of the same shape as design_matrix, but whose columns are [0, 7, 0, 0, 7*x2, 7*x3, 0, 2*x1, x2*x3]. Is there a programmatic way to do that? I've tried searching the patsy docs as well as stackoverflow and I don't see anything. Of course I can create the derivative matrix manually, but it would be great to have a function that does it (and that doesn't have to be updated when I change the formula to, say, "(x1 + x2 + x3 + x4)**2 + I(x1**3)").
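One possible way to sidestep the string round-trip is to keep the column terms as sympy expressions and evaluate the derivatives with lambdify. This is only a sketch under a strong assumption: the column expressions are typed in by hand to mirror the design matrix above, and nothing here reads them from patsy. The names columns, evaluate and derivative_matrix are invented for the example:

```python
import numpy as np
import sympy

x1, x2, x3 = sympy.symbols("x1 x2 x3")

# Hand-written mirror of the design matrix columns (assumption)
columns = [sympy.Integer(1), 7*x1, x2, x3, 7*x1*x2, 7*x1*x3,
           x2*x3, x1**2, x1*x2*x3]
derivatives = [sympy.diff(col, x1) for col in columns]
evaluate = sympy.lambdify((x1, x2, x3), derivatives, modules="numpy")

rng = np.random.default_rng(0)
data = rng.uniform(size=(5, 3))
n_obs = data.shape[0]

# Constant derivatives come back as scalars, so broadcast each
# column to the number of observations before stacking
raw = evaluate(data[:, 0], data[:, 1], data[:, 2])
derivative_matrix = np.column_stack(
    [np.broadcast_to(col, (n_obs,)) for col in raw])
print(derivative_matrix.shape)
```

The trade-off is that the formula has to be maintained in two places unless the sympy expressions are generated from the patsy column names, as the candidate solution above does.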

how to write symbol for sum over a variable's subscript in sympy

I want to write a sympy symbol for a summation, but the index summed over also appears as the subscript of a variable name in the summand. For example,
import numpy as np
import sympy
sympy.init_printing()
r = sympy.Symbol('r')
a = sympy.Matrix(sympy.symbols('a:4'))
rpowers = sympy.Matrix([r**i for i in range(len(a))])
long_expr = a.dot(rpowers)
n = sympy.Symbol('n')
a_n = sympy.Symbol('a_n')
short_expr = sympy.Sum(a_n * r**n, (n, 0, 3))
long_expr and short_expr denote the same thing mathematically. But with long_expr, I can substitute in the values for the a's and then lambdify that expression into a numpy function:
coeffed_long_expr = long_expr.subs(zip(a, [-1, 3, 23, 8]))
func_long_expr = sympy.lambdify([r], coeffed_long_expr, 'numpy')
How can I do the same with short_expr? Or is short_expr only useful for displaying the expression with a summation sign in this case? I would like to be able to display using the summation sign, especially for large ns.
You can accomplish this by using sympy.Function:
import sympy
a_seq = [-1, 3, 23, 8]
n, r = sympy.symbols('n, r')
a_n = sympy.Function('a')(n)
terms = 4
short_expr = sympy.Sum(a_n * r**n, (n, 0, terms - 1))
coeffed_short_expr = short_expr.doit().subs(
    (a_n.subs(n, i), a_seq[i]) for i in range(terms))  # 8*r**3 + 23*r**2 + 3*r - 1
func_short_expr = sympy.lambdify(r, coeffed_short_expr, 'numpy')
If you wish for a cleaner, more efficient implementation, I suspect you may be able to define a subclass of sympy.Symbol that implements subs() properly for summations.
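Another option that may read closer to the subscript notation is sympy.IndexedBase, where a[n] stands for the n-th coefficient. This is a different technique from the Function-based answer above, sketched here with the same coefficient values. Declaring n as an integer symbol is my own choice:

```python
import numpy as np
import sympy

coeffs = [-1, 3, 23, 8]
r = sympy.Symbol("r")
n = sympy.Symbol("n", integer=True)
a = sympy.IndexedBase("a")

# Displays with a summation sign and an indexed coefficient a[n]
short_expr = sympy.Sum(a[n] * r**n, (n, 0, len(coeffs) - 1))

# Expand the finite sum, then replace each a[i] with its value
expanded = short_expr.doit()
coeffed = expanded.subs({a[i]: c for i, c in enumerate(coeffs)})

func = sympy.lambdify(r, coeffed, "numpy")
values = func(np.array([0.0, 1.0, 2.0]))
print(coeffed, values)
```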

Interval containing specified percent of values

With numpy or scipy, is there any existing method that will return the endpoints of an interval which contains a specified percent of the values in a 1D array? I realize that this is simple to write myself, but it seems like the kind of thing that might be built in, although I can't find it.
E.g:
>>> import numpy as np
>>> x = np.random.randn(100000)
>>> print(np.bounding_interval(x, 0.68))
Would give approximately (-1, 1)
You can use np.percentile:
In [29]: x = np.random.randn(100000)
In [30]: p = 0.68
In [31]: lo = 50*(1 - p)
In [32]: hi = 50*(1 + p)
In [33]: np.percentile(x, [lo, hi])
Out[33]: array([-0.99206523, 1.0006089 ])
There is also scipy.stats.scoreatpercentile:
In [34]: scoreatpercentile(x, [lo, hi])
Out[34]: array([-0.99206523, 1.0006089 ])
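The percentile recipe above can be wrapped in a small function. This is a sketch: the name central_interval is made up, and it returns the central interval (equal tail mass on both sides), which is one reasonable reading of the question:

```python
import numpy as np

def central_interval(values, fraction):
    # Percentiles that leave (1 - fraction) / 2 of the data in each tail
    lo = 50 * (1 - fraction)
    hi = 50 * (1 + fraction)
    lower, upper = np.percentile(values, [lo, hi])
    return float(lower), float(upper)

# Deterministic check: the middle 50% of 0..100 spans 25 to 75
lower, upper = central_interval(np.arange(101), 0.5)
print(lower, upper)
```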
I don't know of a built-in function to do it, but you can write one using the math package to specify approximate indices like this:
from __future__ import division
import math
import numpy as np
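def bound_interval(arr_in, interval):
    lhs = (1 - interval) / 2  # Fraction to exclude on the left-hand side
    rhs = 1 - lhs  # and the cut-off on the right-hand side
    sorted_arr = np.sort(arr_in)  # avoid shadowing the built-in sorted()
    last = len(sorted_arr) - 1  # clamp so interval=1 does not index past the end
    lower = sorted_arr[min(int(math.floor(lhs * len(arr_in))), last)]  # use floor to get index
    upper = sorted_arr[min(int(math.floor(rhs * len(arr_in))), last)]
    return (lower, upper)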
On your specified array, I got the interval (-0.99072237819851039, 0.98691691784955549). Pretty close to (-1, 1)!
