I am computing these derivatives using the Monte Carlo approach for a generic call option. I am interested in the mixed derivative (with respect to both S and sigma). When I compute it with algorithmic differentiation, I get the error shown at the end of the page. What could be a possible solution? To help explain the code, I am attaching the formula used to compute the "X" in the code below:
import numpy as np
from jax import jit, grad, vmap
import jax.numpy as jnp
from jax import random
Underlying_asset = jnp.linspace(1.1,1.4,100)
volatilities = jnp.linspace(0.5,0.6,100)
def second_derivative_mc(S,vol):
    N = 100
    j,T,q,r,k = 10000,1.,0,0,1.
    S0 = jnp.array([S]).T #(Nx1) vector underlying asset
    C = jnp.identity(N)*vol #matrix of volatilities with 0 outside diagonal
    e = jnp.array([jnp.full(j,1.)])#(1xj) vector of "1"
    Rand = np.random.RandomState()
    Rand.seed(10)
    U= Rand.normal(0,1,(N,j)) #Random number for Brownian Motion
    sigma2 = jnp.array([vol**2]).T #Vector of variance Nx1
    first = jnp.dot(sigma2,e) #First part equation
    second = jnp.dot(C,U) #Second part equation
    X = -0.5*first+jnp.sqrt(T)*second
    St = jnp.exp(X)*S0
    P = jnp.maximum(St-k,0)
    payoff = jnp.average(P, axis=-1)*jnp.exp(-q*T)
    return payoff
greek = vmap(grad(grad(second_derivative_mc, argnums=1), argnums=0))(Underlying_asset,volatilities)
This is the error message:
> UnfilteredStackTrace Traceback (most recent call
> last) <ipython-input-78-0cc1da97ae0c> in <module>()
> 25
> ---> 26 greek = vmap(grad(grad(second_derivative_mc, argnums=1), argnums=0))(Underlying_asset,volatilities)
>
> 18 frames UnfilteredStackTrace: TypeError: Gradient only defined for
> scalar-output functions. Output had shape: (100,).
The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.
The above exception was the direct cause of the following exception:
> TypeError Traceback (most recent call
> last) /usr/local/lib/python3.7/dist-packages/jax/_src/api.py in
> _check_scalar(x)
> 894 if isinstance(aval, ShapedArray):
> 895 if aval.shape != ():
> --> 896 raise TypeError(msg(f"had shape: {aval.shape}"))
> 897 else:
> 898 raise TypeError(msg(f"had abstract value {aval}"))
> TypeError: Gradient only defined for scalar-output functions. Output had shape: (100,).
As the error message indicates, gradients can only be computed for functions that return a scalar. Your function returns a vector:
print(len(second_derivative_mc(1.1, 0.5)))
# 100
For vector-valued functions, you can compute the Jacobian (the generalization of the gradient to functions with vector outputs). Is this what you had in mind?
from jax import jacobian
greek = vmap(jacobian(jacobian(second_derivative_mc, argnums=1), argnums=0))(Underlying_asset,volatilities)
Also, this is not what you asked about, but the function above will probably not work as you intend even if you solve the issue in the question. NumPy RandomState objects are stateful, and thus will generally not work correctly with JAX transforms like grad, jit, vmap, etc., which require side-effect-free code (see Stateful Computations In JAX). You might try using jax.random instead; see JAX: Random Numbers for more information.
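To make both points concrete, here is a minimal sketch of a scalar-valued pricer that draws its random numbers with jax.random, so that grad can be nested and then mapped over the inputs with vmap. It is not the original function: the name call_price_mc, the single-asset setup, and the default parameters are assumptions made for illustration. It only shows the mechanics of the transforms; whether a pathwise second derivative is an accurate estimator for a payoff with a kink is a separate question.

```python
import jax
import jax.numpy as jnp
from jax import grad, vmap

def call_price_mc(S, vol, key, n_paths=10_000, T=1.0, r=0.0, k=1.0):
    """Scalar Monte Carlo price of a call for one spot S and one volatility vol."""
    # Standard normal draws from an explicit key: no hidden state, so the
    # function stays side-effect free under grad and vmap.
    z = jax.random.normal(key, (n_paths,))
    # Terminal price under geometric Brownian motion (an assumed model).
    ST = S * jnp.exp((r - 0.5 * vol**2) * T + vol * jnp.sqrt(T) * z)
    payoff = jnp.maximum(ST - k, 0.0)
    # Averaging over all paths yields a scalar, which is what grad requires.
    return jnp.exp(-r * T) * jnp.mean(payoff)

key = jax.random.PRNGKey(10)

# Differentiate with respect to vol (argnums=1), then with respect to S (argnums=0).
cross = grad(grad(call_price_mc, argnums=1), argnums=0)

spots = jnp.linspace(1.1, 1.4, 100)
vols = jnp.linspace(0.5, 0.6, 100)

# Map over (spot, vol) pairs; in_axes=None shares the same key, and so the
# same random draws, across all pairs.
greek = vmap(cross, in_axes=(0, 0, None))(spots, vols, key)
print(greek.shape)  # (100,)
```

Because each call returns a scalar, grad is applicable and vmap produces one value per (S, vol) pair, which is the shape the question was after.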
Related
I am trying to create a minimum variance portfolio based on 1 year of data. I then want to rebalance the portfolio every month, recomputing the covariance matrix each time (my dataset starts in 1992 and ends in 2017).
I wrote the following code, which works when it is not in a loop. But when I put it in the loop, the covariance matrix is reported as singular when I try to invert it. I don't understand why this problem arises, since I reset every variable at the end of the loop.
### Importing the necessary libraries ###
import pandas as pd
import numpy as np
from numpy.linalg import inv
### Importing the dataset ###
df = pd.read_csv("UK_Returns.csv", sep = ";")
df.set_index('Date', inplace = True)
### Define variables ###
stocks = df.shape[1]
returns = []
vol = []
weights_p =[]
### for loop to compute portfolio and rebalance every 30 days ###
for i in range (0,288):
    a = i*30
    b = i*30 + 252
    portfolio = df[a:b]
    mean_ret = ((1+portfolio.mean())**252)-1
    var_cov = portfolio.cov()*252
    inv_var_cov = inv(var_cov)
    doit = 0
    weights = np.dot(np.ones((1,stocks)),inv_var_cov)/(np.dot(np.ones((1,stocks)),np.dot(inv_var_cov,np.ones((stocks,1)))))
    ret = np.dot(weights, mean_ret)
    std = np.sqrt(np.dot(weights, np.dot(var_cov, weights.T)))
    returns.append(ret)
    vol.append(std)
    weights_p.append(weights)
    weights = []
    var_cov = np.zeros((stocks,stocks))
    inv_var_cov = np.zeros((stocks,stocks))
    i+=1
Does anyone have an idea how to solve this issue?
The error it yields is the following:
---------------------------------------------------------------------------
LinAlgError Traceback (most recent call last)
<ipython-input-17-979efdd1f5b2> in <module>()
21 mean_ret = ((1+portfolio.mean())**252)-1
22 var_cov = portfolio.cov()*252
---> 23 inv_var_cov = inv(var_cov)
24 doit = 0
25 weights = np.dot(np.ones((1,stocks)),inv_var_cov)/(np.dot(np.ones((1,stocks)),np.dot(inv_var_cov,np.ones((stocks,1)))))
<__array_function__ internals> in inv(*args, **kwargs)
1 frames
/usr/local/lib/python3.6/dist-packages/numpy/linalg/linalg.py in _raise_linalgerror_singular(err, flag)
95
96 def _raise_linalgerror_singular(err, flag):
---> 97 raise LinAlgError("Singular matrix")
98
99 def _raise_linalgerror_nonposdef(err, flag):
LinAlgError: Singular matrix
Thank you so much for any help you can provide me with!
The data is shared in the following google drive: https://drive.google.com/file/d/1-Bw7cowZKCNU4JgNCitmblHVw73ORFKR/view?usp=sharing
It would be better to identify what is causing the singularity of the matrix,
but there are ways of living with singular matrices.
Try using the pseudoinverse via np.linalg.pinv(). It is guaranteed to always exist.
See pinv
Another way around it is to avoid computing the inverse matrix at all.
Just find the least-squares solution of the system. See lstsq
Because var_cov is symmetric, you can replace np.dot(X,inv_var_cov), where X is a row vector, with
np.linalg.lstsq(var_cov, X.T, rcond=None)[0].T
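As a rough illustration of the least-squares route, the sketch below computes minimum variance weights without ever forming an explicit inverse. The function name min_variance_weights, the synthetic return data, and the rcond cutoff are assumptions made for the example; they do not come from the question's dataset.

```python
import numpy as np

def min_variance_weights(var_cov, rcond=1e-10):
    """Minimum variance weights from a (possibly singular) covariance matrix."""
    var_cov = np.asarray(var_cov, dtype=float)
    ones = np.ones(var_cov.shape[0])
    # Solve var_cov @ x = ones in the least-squares sense instead of
    # computing inv(var_cov) @ ones. The rcond cutoff is an assumed value.
    x = np.linalg.lstsq(var_cov, ones, rcond=rcond)[0]
    # Normalise so that the weights sum to one.
    return x / x.sum()

# Synthetic daily returns for 4 assets; asset 3 duplicates asset 0, which
# makes the covariance matrix rank deficient (singular).
rng = np.random.default_rng(0)
rets = rng.normal(scale=0.01, size=(252, 4))
rets[:, 3] = rets[:, 0]
var_cov = np.cov(rets, rowvar=False) * 252

weights = min_variance_weights(var_cov)
print(weights, weights.sum())
```

This keeps the loop running on degenerate windows, but it does not explain why a window is degenerate, so checking each slice (for example its number of rows, or columns with constant or missing values) is still worthwhile.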
I am playing with the cvxpy library in order to solve some particular optimisation problem
import cvxpy as cp
import numpy as np
(...)
prob = cp.Problem(
cp.Minimize(max(M*theta-b)) <= 45,
[-48 <= theta, theta <= 48])
(Here M and b are certain numpy matrices.)
Interestingly, it screams:
NotImplementedError Traceback (most recent call last)
<ipython-input-62-0296c965b1ff> in <module>
1 prob = cp.Problem(
----> 2 cp.Minimize(max(M*theta-b)) <= 45,
3 [-10 <= theta, theta <= 10])
~\Anaconda3\lib\site-packages\cvxpy\expressions\expression.py in __gt__(self, other)
595 """Unsupported.
596 """
--> 597 raise NotImplementedError("Strict inequalities are not allowed.")
NotImplementedError: Strict inequalities are not allowed.
however, to me, they do not look strict at all...
Same reason as in your earlier question (although things like that are hard to analyze).
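You need to ask cvxpy for its max function explicitly. This is always required / recommended. Python's built-in max compares the elements with >, which is the strict inequality cvxpy complains about.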
cp.Minimize(max(M*theta-b))
should be
cp.Minimize(cp.max(M*theta-b))
You basically have to use only functions from cvxpy, except for the following:
The CVXPY function sum sums all the entries in a single expression. The built-in Python sum should be used to add together a list of expressions.
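The traceback ends in __gt__, which fits how the built-in max works: it compares elements with >. The toy class below is a hypothetical stand-in for a cvxpy expression (it is not cvxpy code) that rejects > in the same way, just to show where the strict inequality comes from even though none appears in the source.

```python
class ToyExpr:
    """Hypothetical stand-in for a symbolic expression that forbids '>'."""

    def __init__(self, name):
        self.name = name

    def __le__(self, other):
        # Non-strict comparisons build a constraint (here just a string).
        return f"{self.name} <= {other}"

    def __gt__(self, other):
        # Strict comparisons are rejected, mirroring the error in the question.
        raise NotImplementedError("Strict inequalities are not allowed.")

entries = [ToyExpr("a"), ToyExpr("b")]

# A non-strict constraint works as expected.
constraint = entries[0] <= 45

# The built-in max compares elements with '>', so it triggers __gt__
# although no '>' is written anywhere in this code.
try:
    max(entries)
    message = None
except NotImplementedError as exc:
    message = str(exc)

print(constraint)
print(message)
```

A library-provided function such as cp.max avoids this because it builds a single symbolic expression instead of comparing elements one by one.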
I am trying to do a correlated fit of both x and y data, however when I pass in covariance matrices for my x and y measurements, I get the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-173-273ef42c6f27> in <module>()
----> 1 odrout = theodr.run()
/Users/anaconda/lib/python2.7/site-packages/scipy/odr/odrpack.pyc in run(self)
1098 for attr in kwd_l:
1099 obj = getattr(self, attr)
-> 1100 if obj is not None:
1101 kwds[attr] = obj
1102
ValueError: could not convert we to a suitable array
Here is a minimal NOT working example that triggers this error on my machine:
import numpy as np
import scipy.odr as spodr
# make x and y data for a function
xx = np.linspace(0, 2*np.pi, 100)
yy = 2.*np.sin(3*xx) - 1
# randomize both variables a bit, and make 10 measurements
# of each data point
xdat = xx + np.random.normal(scale=0.3, size=(10,100))
ydat = yy + np.random.normal(scale=0.3, size=(10, 100))
# the function I will fit to
sin = lambda beta, x: beta[0]*np.sin(beta[1] * x) + beta[2]
# the covariance matrices for both data sets, here I summed over
# the 10 measurements I made for both my x and y data
xcov = np.cov(xdat.transpose())
ycov = np.cov(ydat.transpose())
# setup the odr data
odrdat = spodr.RealData(np.mean(xdat, axis=0),
                        np.mean(ydat, axis=0), covx=xcov, covy=ycov)
# set up the odr model
model = spodr.Model(sin)
# make the odr object
theodr = spodr.ODR(odrdat, model, beta0=[2,3,-1])
# run the odr object
odrout = theodr.run()
I can't seem to see why the matrices I'm passing are not suitable arrays. From the docs:
Covariance of x covx is an array of covariance matrices of x and are converted to weights by performing a matrix inversion on each observation’s covariance matrix.
This makes me think I should be passing a covariance matrix for each data point, but I don't have that type of information, and I don't think I need it. For a correlated fit it should be enough to have the covariances between all the data. For instance, in scipy.optimize.curve_fit you can pass in a 2d-array as a covariance matrix for the y-data; you don't need one for every single point.
Is there a particular way I should be passing these covariance matrices?
I have some data which I try to interpolate using scipy.interpolate.griddata. In my use-case I marked some of the numpy arrays read-only, which apparently breaks the interpolation:
import numpy as np
from scipy import interpolate
x0 = 10 * np.random.randn(100, 2)
y0 = np.random.randn(100)
x1 = np.random.randn(3, 2)
x0.flags.writeable = False
# x1.flags.writeable = False
interpolate.griddata(x0, y0, x1)
yields the following exception:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-14-a6e09dbdd371> in <module>()
6 # x1.flags.writeable = False
7
----> 8 interpolate.griddata(x0, y0, x1)
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/interpolate/ndgriddata.pyc in griddata(points, values, xi, method, fill_value, rescale)
216 ip = LinearNDInterpolator(points, values, fill_value=fill_value,
217 rescale=rescale)
--> 218 return ip(xi)
219 elif method == 'cubic' and ndim == 2:
220 ip = CloughTocher2DInterpolator(points, values, fill_value=fill_value,
scipy/interpolate/interpnd.pyx in scipy.interpolate.interpnd.NDInterpolatorBase.__call__ (scipy/interpolate/interpnd.c:3930)()
scipy/interpolate/interpnd.pyx in scipy.interpolate.interpnd.LinearNDInterpolator._evaluate_double (scipy/interpolate/interpnd.c:5267)()
scipy/interpolate/interpnd.pyx in scipy.interpolate.interpnd.LinearNDInterpolator._do_evaluate (scipy/interpolate/interpnd.c:6006)()
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/interpolate/interpnd.so in View.MemoryView.memoryview_cwrapper (scipy/interpolate/interpnd.c:17829)()
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/interpolate/interpnd.so in View.MemoryView.memoryview.__cinit__ (scipy/interpolate/interpnd.c:14104)()
ValueError: buffer source array is read-only
Clearly, the interpolation function doesn't like that the arrays are write-protected. However, I don't understand why they want to change this – I certainly don't expect my input to be mutated by a call to the interpolation function and this is also not mentioned in the documentation as far as I can tell. Why would the function behave like this?
Note that setting x1 readonly instead of x0 leads to a similar error.
The relevant code is written in Cython, and when Cython requests a memoryview of the input array, it always asks for a writeable one, even if you don't need it.
Since an array flagged as non-writeable will refuse to provide a writeable memoryview, the code fails, even though it didn't need to write to the array in the first place.
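One possible workaround, sketched below under the assumption that an extra copy of the data is affordable, is to keep the original arrays read-only and hand writeable copies to the interpolation routine. The variable name x0_writeable and the fixed seed are illustrative choices.

```python
import numpy as np
from scipy import interpolate

rng = np.random.default_rng(0)
x0 = 10 * rng.standard_normal((100, 2))
y0 = rng.standard_normal(100)
x1 = rng.standard_normal((3, 2))

# Protect the original data against accidental modification.
x0.flags.writeable = False

# A copy owns fresh memory and is writeable again, so code that insists on
# a writeable buffer can use it; x0 itself stays read-only.
x0_writeable = x0.copy()

result = interpolate.griddata(x0_writeable, y0, x1)
print(x0.flags.writeable, x0_writeable.flags.writeable, result.shape)
```

The copy costs memory proportional to the input, so this is a stopgap rather than a fix for the underlying behaviour.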
How can I use scipy.stats.kde.gaussian_kde and scipy.stats.kstest together correctly?
For example, the code:
from numpy import inf
import scipy.stats
my_pdf = scipy.stats.kde.gaussian_kde(sample)
scipy.stats.kstest(sample, lambda x: my_pdf.integrate_box_1d(-inf, x))
Gives the following answer:
(0.5396735893479544, 0.0)
This cannot be right, because the sample obviously belongs to the distribution that was constructed from it.
First of all, the right test to use for testing if two samples may have come from the same distribution is the two-sample KS test, implemented in scipy.stats.ks_2samp, which directly compares the empirical CDFs. KDE is density estimation, which smooths out the CDF, and is therefore a bunch of unnecessary work that also makes your estimate worse, statistically speaking.
But the reason you're seeing this problem is that the signature for your CDF parameter isn't quite right. kstest calls cdf(vals) (source), where vals is the sorted samples, to get out the CDF value for each of your samples. In your code, this ends up calling my_pdf.integrate_box_1d(-np.inf, samps), but integrate_box_1d wants both arguments to be scalars. The signature is wrong, and if you tried this with most arrays it'd crash with a ValueError:
>>> my_pdf.integrate_box_1d(-np.inf, samp[:10])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-38-81d0253a33bf> in <module>()
----> 1 my_pdf.integrate_box_1d(-np.inf, samp[:10])
/Library/Python/2.7/site-packages/scipy-0.12.0.dev_ddd617d_20120725-py2.7-macosx-10.8-x86_64.egg/scipy/stats/kde.pyc in integrate_box_1d(self, low, high)
311
312 normalized_low = ravel((low - self.dataset) / stdev)
--> 313 normalized_high = ravel((high - self.dataset) / stdev)
314
315 value = np.mean(special.ndtr(normalized_high) - \
ValueError: operands could not be broadcast together with shapes (10) (1,1000)
but unfortunately, when the second argument is samp, it can broadcast just fine since the arrays are the same shape, and then everything goes to hell. Presumably integrate_box_1d should check the shape of its arguments, but here's one way to do it correctly:
>>> my_cdf = lambda ary: np.array([my_pdf.integrate_box_1d(-np.inf, x) for x in ary])
>>> scipy.stats.kstest(sample, my_cdf)
(0.015597917205996903, 0.96809912578616597)
You could also use np.vectorize if you felt like it.
(But again, you probably actually want to use ks_2samp.)
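For reference, a self-contained version of the elementwise CDF approach might look like the sketch below. The synthetic normal sample and the name kde_cdf are assumptions for illustration, and gaussian_kde is imported from scipy.stats directly.

```python
import numpy as np
from scipy import stats

# Synthetic sample standing in for the question's data.
rng = np.random.default_rng(0)
sample = rng.normal(size=500)

kde = stats.gaussian_kde(sample)

def kde_cdf(values):
    """Evaluate the KDE's CDF one point at a time.

    integrate_box_1d expects scalar bounds, so loop over the values
    instead of passing the whole array as the upper bound.
    """
    values = np.atleast_1d(values)
    return np.array([kde.integrate_box_1d(-np.inf, v) for v in values])

statistic, p_value = stats.kstest(sample, kde_cdf)
print(statistic, p_value)
```

With the CDF evaluated per point, the statistic should come out small, as in the corrected result above, instead of the spurious value near 0.54.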