Python NumPy Logistic Regression

I'm trying to implement vectorized logistic regression in Python using NumPy. My cost function (CF) seems to work OK. However, there is a problem with the gradient calculation: it returns a 3x100 array, whereas it should return 3x1. I think there is a problem with the (hypo-y) part.
def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def CF(theta, X, y):
    m = len(y)
    hypo = sigmoid(np.matmul(X, theta))
    J = (-1./m) * (np.matmul(y.T, np.log(hypo)) + np.matmul((1 - y).T, np.log(1 - hypo)))
    return J

def gr(theta, X, y):
    m = len(y)
    hypo = sigmoid(np.matmul(X, theta))
    grad = (1/m) * np.matmul(X.T, (hypo - y))
    return grad
X is a 100x3 array, y is 100x1, and theta is a 3x1 array. Both functions seem to work individually; however, this optimization call gives an error:
optim = minimize(CF, theta, method='BFGS', jac=gr, args=(X,y))
The error: "ValueError: shapes (3,100) and (3,100) not aligned: 100 (dim 1) != 3 (dim 0)"

I think there is a problem with the (hypo-y) part.
Spot on!
hypo is of shape (100,) and y is of shape (100, 1). In the element-wise subtraction, hypo is broadcast to shape (1, 100) according to numpy's broadcasting rules. This results in a (100, 100) array, which in turn makes the matrix multiplication produce a (3, 100) array.
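To see the broadcasting at work in isolation, here is a small illustration with dummy arrays of the same shapes (not the question's data):

import numpy as np

hypo = np.zeros(100)       # shape (100,), as returned by sigmoid(np.matmul(X, theta))
y = np.zeros((100, 1))     # shape (100, 1)
X = np.zeros((100, 3))

print((hypo - y).shape)                  # (100, 100): (100,) broadcasts against (100, 1)
print(np.matmul(X.T, hypo - y).shape)    # (3, 100): the gradient shape reported in the question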
Fix this by bringing hypo into the same shape as y:
hypo = sigmoid(np.matmul(X, theta)).reshape(-1, 1) # -1 means automatic size on first dimension
There is one more issue: scipy.optimize.minimize (which I assume you are using) expects the gradient to be an array of shape (k,) but the function gr returns a vector of shape (k, 1). This is easy to fix:
return grad.reshape(-1)
The final function becomes
def gr(theta, X, y):
    m = len(y)
    hypo = sigmoid(np.matmul(X, theta)).reshape(-1, 1)
    grad = (1/m) * np.matmul(X.T, (hypo - y))
    return grad.reshape(-1)
and running it with toy data works (I have not checked the math or the plausibility of the results):
theta = np.array([1., 2., 3.])   # initial guess, shape (3,)
X = np.random.randn(100, 3)
y = np.round(np.random.rand(100, 1))
optim = minimize(CF, theta, method='BFGS', jac=gr, args=(X, y))
print(optim)
# fun: 0.6830931976615066
# hess_inv: array([[ 4.51307367, -0.13048255, 0.9400538 ],
# [-0.13048255, 3.53320257, 0.32364498],
# [ 0.9400538 , 0.32364498, 5.08740428]])
# jac: array([ -9.20709950e-07, 3.34459058e-08, 2.21354905e-07])
# message: 'Optimization terminated successfully.'
# nfev: 15
# nit: 13
# njev: 15
# status: 0
# success: True
# x: array([-0.07794477, 0.14840167, 0.24572182])

Related

scipy.sparse for numpy.random.multivariate_normal

I wanted to save a bit of memory, and thought I'd create a scipy.sparse identity matrix (dim is in the thousands, not terrible, but also not frugal). Notice its shape passes the assert:
cov = sigma_0 * sparse.identity(dim, dtype=np.float32)
assert (dim, dim) == cov.shape
result = np.random.multivariate_normal(mu, cov)
which fails with:
ValueError: cov must be 2 dimensional and square
The following, however, works fine:
cov = sigma_0 * np.identity(dim, dtype=np.float32)
assert (dim, dim) == cov.shape
result = np.random.multivariate_normal(mu, cov)
Did I miss something in the docs saying that sparse covariance matrices are expected to fail with a ValueError?
What's happening here is that np.random.multivariate_normal casts the covariance input to an array:
cov = np.array(cov)
which ends up creating a 0-d (scalar) array of dtype object, since numpy doesn't know anything about sparse matrices.
In [3]: cov = sparse.identity(100, dtype=np.float32)
In [4]: cov.shape
Out[4]: (100, 100)
In [5]: np.array(cov)
Out[5]:
array(<100x100 sparse matrix of type '<type 'numpy.float32'>'
with 100 stored elements (1 diagonals) in DIAgonal format>, dtype=object)
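One way around this (a sketch, not from the answer above) is to densify the sparse matrix only at the call site with .toarray(); note that this gives up the memory saving for the duration of the call:

from scipy import sparse
import numpy as np

dim = 1000
sigma_0 = 0.5
mu = np.zeros(dim)

cov = sigma_0 * sparse.identity(dim, dtype=np.float32)
sample = np.random.multivariate_normal(mu, cov.toarray())   # densify just for this call

# For a scaled identity covariance specifically, the dense matrix can be avoided
# entirely: mu + np.sqrt(sigma_0) * np.random.randn(dim) draws from the same distribution.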

scipy.integrate.solve_ivp vectorized

I'm trying to use the vectorized option for solve_ivp, and strangely it throws an error saying that y0 must be 1-dimensional.
MWE:
from scipy.integrate import solve_ivp
import numpy as np
import math

def f(t, y):
    theta = math.pi/4
    ham = np.array([[1, 0], [1, np.exp(-1j*theta*t)]])
    return -1j * np.dot(ham, y)

def main():
    y0 = np.eye(2, dtype=np.complex128)
    t0 = 0
    tmax = 10**(-6)
    sol = solve_ivp(lambda t, y: f(t, y), (t0, tmax), y0, method='RK45', vectorized=True)
    print(sol.y)

if __name__ == '__main__':
    main()
From the solve_ivp documentation: The calling signature is fun(t, y). Here t is a scalar, and there are two options for the ndarray y: it can either have shape (n,); then fun must return array_like with shape (n,). Alternatively, it can have shape (n, k); then fun must return an array_like with shape (n, k), i.e. each column corresponds to a single column in y. The choice between the two options is determined by the vectorized argument (see below). The vectorized implementation allows a faster approximation of the Jacobian by finite differences (required for stiff solvers).
Error:
ValueError: y0 must be 1-dimensional.
Python 3.6.8, scipy 1.2.1
The meaning of vectorized here is a bit confusing. It doesn't mean that y0 can be 2-d, but rather that the y passed to your function can be 2-d. In other words, the function may be evaluated at multiple points at once, if the solver so desires. How many points is up to the solver, not you.
Change f to print the shape of y at each call:
def f(t, y):
    print(y.shape)
    theta = math.pi/4
    ham = np.array([[1, 0], [1, np.exp(-1j*theta*t)]])
    return -1j * np.dot(ham, y)
A sample call:
In [47]: integrate.solve_ivp(f,(t0,tmax),[1j,0],method='RK45',vectorized=False)
(2,)
(2,)
(2,)
(2,)
(2,)
(2,)
(2,)
(2,)
Out[47]:
message: 'The solver successfully reached the end of the integration interval.'
nfev: 8
njev: 0
nlu: 0
sol: None
status: 0
success: True
t: array([0.e+00, 1.e-06])
t_events: None
y: array([[0.e+00+1.e+00j, 1.e-06+1.e+00j],
[0.e+00+0.e+00j, 1.e-06-1.e-12j]])
Same call, but with vectorized=True:
In [48]: integrate.solve_ivp(f,(t0,tmax),[1j,0],method='RK45',vectorized=True)
(2, 1)
(2, 1)
(2, 1)
(2, 1)
(2, 1)
(2, 1)
(2, 1)
(2, 1)
Out[48]:
message: 'The solver successfully reached the end of the integration interval.'
nfev: 8
njev: 0
nlu: 0
sol: None
status: 0
success: True
t: array([0.e+00, 1.e-06])
t_events: None
y: array([[0.e+00+1.e+00j, 1.e-06+1.e+00j],
[0.e+00+0.e+00j, 1.e-06-1.e-12j]])
With False, the y passed to f is (2,), 1d; with True it is (2,1). I'm guessing it could be (2,2) or even (2,3) if the solver method so desires. That could speed up the execution, with fewer calls to f. In this case, it doesn't matter.
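As for the original goal of propagating the 2x2 matrix initial condition: since y0 must be 1-d, one option (a sketch, not from the answer above) is to flatten y0 and reshape inside the right-hand side:

from scipy.integrate import solve_ivp
import numpy as np

def f_flat(t, y):
    theta = np.pi / 4
    ham = np.array([[1, 0], [1, np.exp(-1j * theta * t)]])
    y_mat = y.reshape(2, 2)                  # recover the 2x2 state matrix
    return (-1j * ham @ y_mat).reshape(-1)   # flatten the derivative back to 1-d

y0 = np.eye(2, dtype=np.complex128).reshape(-1)   # shape (4,)
sol = solve_ivp(f_flat, (0, 1e-6), y0, method='RK45')
# each column of sol.y can be reshaped back to 2x2 with sol.y[:, i].reshape(2, 2)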
scipy.integrate.quadrature has a similar vec_func boolean parameter:
Numerical Quadrature of scalar valued function with vector input using scipy
A related bug/issue discussion:
https://github.com/scipy/scipy/issues/8922

Numpy Rowwise Addition with a (Nx1) Matrix and a Vector with Length N

I am trying to update the weights in a neural network with this line:
self.l1weights[0] = self.l1weights[0] + self.learning_rate * l1error
And this results in a value error:
ValueError: could not broadcast input array from shape (7,7) into shape (7)
Printing the learning_rate*error and the weights returns something like this:
[[-0.00657573]
[-0.01430752]
[-0.01739463]
[-0.00038115]
[-0.01563393]
[-0.02060908]
[-0.01559269]]
[ 4.17022005e-01 7.20324493e-01 1.14374817e-04 3.02332573e-01
1.46755891e-01 9.23385948e-02 1.86260211e-01]
It is clear the weights are initialized as a vector of length 7 in this example and the error is initialized as a 7x1 matrix. I would expect addition to return a 7x1 matrix or a vector as well, but instead it generates a 7x7 matrix like this:
[[ 4.10446271e-01 7.13748760e-01 -6.46135890e-03 2.95756839e-01
1.40180157e-01 8.57628611e-02 1.79684478e-01]
[ 4.02714481e-01 7.06016970e-01 -1.41931487e-02 2.88025049e-01
1.32448367e-01 7.80310713e-02 1.71952688e-01]
[ 3.99627379e-01 7.02929868e-01 -1.72802505e-02 2.84937947e-01
1.29361266e-01 7.49439695e-02 1.68865586e-01]
[ 4.16640855e-01 7.19943343e-01 -2.66775370e-04 3.01951422e-01
1.46374741e-01 9.19574446e-02 1.85879061e-01]
[ 4.01388075e-01 7.04690564e-01 -1.55195551e-02 2.86698643e-01
1.31121961e-01 7.67046648e-02 1.70626281e-01]
[ 3.96412924e-01 6.99715412e-01 -2.04947062e-02 2.81723492e-01
1.26146810e-01 7.17295137e-02 1.65651130e-01]
[ 4.01429313e-01 7.04731801e-01 -1.54783174e-02 2.86739880e-01
1.31163199e-01 7.67459026e-02 1.70667519e-01]]
Numpy.sum also returns the same 7x7 matrix. Is there a way to solve this without explicit reshaping? Output size is variable and this is an issue specific to when the output size is one.
When adding a (7,) array (call it a) to a (7, 1) array (call it b), broadcasting kicks in and produces a (7, 7) array. If you just want element-by-element addition, bring them to the same shape first.
a + b.flatten() gives (7,). flatten collapses all dimensions into one; this keeps the result a 1-d row.
a.reshape(-1, 1) + b gives (7, 1). The -1 in reshape tells numpy to infer that dimension from the remaining ones; this keeps the result a column.
a = np.arange(7) # row
b = a.reshape(-1, 1) # column
print((a + b).shape) # (7, 7)
print((a + b.flatten()).shape) # (7,)
print((a.reshape(-1, 1) + b).shape) # (7, 1)
In your case, a and b would be self.l1weights[0] and self.learning_rate * l1error respectively.
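Applied to the update from the question (a sketch with stand-in arrays; the actual self.l1weights and self.learning_rate attributes are not reproduced here), flattening the error term keeps the weights a length-7 vector:

import numpy as np

weights = np.random.rand(7)          # stands in for self.l1weights[0], shape (7,)
l1error = np.random.rand(7, 1)       # shape (7, 1), as printed in the question
learning_rate = 0.01                 # illustrative value

weights = weights + (learning_rate * l1error).flatten()   # result keeps shape (7,)
print(weights.shape)                 # (7,)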

Solve a nonlinear equation system with constraints on the variables

Some hypothetical example solving a nonlinear equation system with fsolve:
from scipy.optimize import fsolve
import math
def equations(p):
    x, y = p
    return (x + y**2 - 4, math.exp(x) + x*y - 3)
x, y = fsolve(equations, (1, 1))
print(equations((x, y)))
Is it somehow possible to solve it using scipy.optimize.brentq with some interval, e.g. [-1,1]? How does the unpacking work in that case?
As sascha suggested, constrained optimization is the easiest way to proceed. The least_squares method is convenient here: you can directly pass your equations to it, and it will minimize the sum of squares of its components.
from scipy.optimize import least_squares
res = least_squares(equations, (1, 1), bounds = ((-1, -1), (2, 2)))
The structure of bounds is ((min_first_var, min_second_var), (max_first_var, max_second_var)), or similarly for more variables.
The resulting object has a bunch of fields, shown below. The most relevant ones are: res.cost is essentially zero, which means a root was found; and res.x says what the root is: [ 0.62034453, 1.83838393]
active_mask: array([0, 0])
cost: 1.1745369255773682e-16
fun: array([ -1.47918522e-08, 4.01353883e-09])
grad: array([ 5.00239352e-11, -5.18964300e-08])
jac: array([[ 1. , 3.67676787],
[ 3.69795254, 0.62034452]])
message: '`gtol` termination condition is satisfied.'
nfev: 7
njev: 7
optimality: 8.3872972696740977e-09
status: 1
success: True
x: array([ 0.62034453, 1.83838393])
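For completeness, a quick self-contained run (same equations as above) that also checks the residuals at the returned point:

from scipy.optimize import least_squares
import math

def equations(p):
    x, y = p
    return (x + y**2 - 4, math.exp(x) + x*y - 3)

res = least_squares(equations, (1, 1), bounds=((-1, -1), (2, 2)))
print(res.x)               # approximately [0.62034453, 1.83838393]
print(equations(res.x))    # both residuals close to zero (order 1e-8)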

Input dimensions for distance function for nearest neighbors

In the context of unsupervised nearest neighbors with scikit-learn, I have implemented my own distance function to deal with my uncertain points (i.e. a point is represented as a normal distribution):
def my_mahalanobis_distance(x, y):
    '''
    x: array of shape (4,)  x[0]: mu_x_1,   x[1]: mu_x_2,
                            x[2]: cov_x_11, x[3]: cov_x_22
    y: array of shape (4,)  y[0]: mu_y_1,   y[1]: mu_y_2,
                            y[2]: cov_y_11, y[3]: cov_y_22
    '''
    cov_inv = np.linalg.inv(np.diag(x[2:]) + np.diag(y[2:]))
    return sp.spatial.distance.mahalanobis(x[:2], y[:2], cov_inv)
However, when I set my nearest neighbors:
nnbrs = NearestNeighbors(n_neighbors=1, metric='pyfunc', func=my_mahalanobis_distance)
nearest_neighbors = nnbrs.fit(X)
where X is a (N, 4) (n_samples, n_features) array, if I print x and y in my my_mahalanobis_distance, I get shapes of (10,) instead of (4,) as I would expect.
Example:
I add the following line to my_mahalanobis_distance:
print(x.shape)
Then in my main:
n_features = 4
n_samples = 10
# generate X array:
X = np.random.rand(n_samples, n_features)
nnbrs = NearestNeighbors(n_neighbors=1, metric='pyfunc', func=my_mahalanobis_distance)
nearest_neighbors = nnbrs.fit(X)
The result is:
(10,)
ValueError: shapes (2,) and (8,8) not aligned: 2 (dim 0) != 8 (dim 0)
I perfectly understand the error, but I do not understand why my x.shape is (10,) while my number of features is 4 in X.
I am using Python 2.7.10 and scikit-learn 0.16.1.
EDIT:
Replacing return sp.spatial.distance.mahalanobis(x[:2], y[:2], cov_inv) with return 1, just for testing, gives:
(10,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
So only the first call to my_mahalanobis_distance is wrong. Looking at the x and y values at this first call, my observations are:
x and y are identical;
if I run my code multiple times, x and y are still identical, but their values have changed compared to the previous run;
these values seem to come from a numpy.random function.
I would conclude that this first call is a leftover debugging call that has not been removed.
This is not an answer, but it is too long for a comment: I cannot reproduce the error.
Using Python 3.5.2 and scikit-learn 0.18.1 with the code:
from sklearn.neighbors import NearestNeighbors
import numpy as np
import scipy as sp
import scipy.spatial    # makes sp.spatial available

def my_mahalanobis_distance(x, y):
    cov_inv = np.linalg.inv(np.diag(x[2:]) + np.diag(y[2:]))
    print(x.shape)
    return sp.spatial.distance.mahalanobis(x[:2], y[:2], cov_inv)

n_features = 4
n_samples = 10
# generate X array:
X = np.random.rand(n_samples, n_features)
nnbrs = NearestNeighbors(n_neighbors=1, metric=my_mahalanobis_distance)
nearest_neighbors = nnbrs.fit(X)
The output is
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
(4,)
I customized my_mahalanobis_distance to handle this issue:
import warnings
import numpy as np
import scipy as sp
import scipy.spatial

def my_mahalanobis_distance(x, y):
    '''
    x: array of shape (4,)  x[0]: mu_x_1,   x[1]: mu_x_2,
                            x[2]: cov_x_11, x[3]: cov_x_22
    y: array of shape (4,)  y[0]: mu_y_1,   y[1]: mu_y_2,
                            y[2]: cov_y_11, y[3]: cov_y_22
    '''
    if (x.size, y.size) == (4, 4):
        return sp.spatial.distance.mahalanobis(
            x[:2], y[:2],
            np.linalg.inv(np.diag(x[2:]) + np.diag(y[2:])))
    # handle the buggy first call made by NearestNeighbors.fit()
    else:
        warnings.warn('x and y are respectively of size %i and %i' % (x.size, y.size))
        return sp.spatial.distance.euclidean(x, y)
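A quick usage sketch with toy data (assumes the patched my_mahalanobis_distance above; the data and the kneighbors query are illustrative, not from the question):

from sklearn.neighbors import NearestNeighbors
import numpy as np

X = np.random.rand(10, 4)   # 10 uncertain points: [mu_1, mu_2, cov_11, cov_22]
nnbrs = NearestNeighbors(n_neighbors=1, metric=my_mahalanobis_distance).fit(X)
dist, idx = nnbrs.kneighbors(X[:1])   # nearest neighbor of the first point
print(dist, idx)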
