I have been trying, to no avail, for a week to solve a system of coupled differential equations and reproduce the results shown in the attached image, but I keep getting weird results (also shown) and can't figure out what I am doing wrong. The set of coupled differential equations was solved using Newman's BAND. Here's a link to the Python implementation: python solution using BAND. And here is another link to the original image of the problem, in case the attached one is not clear enough: here you find a clearer image of the problem.
What I am now trying to do is solve the same problem by creating a sparse array directly from the discretized equations, using a combination of sympy and numpy, and then solving with scipy's spsolve. Here is my code below. I need some help figuring out what I am doing wrong.
I have represented the variables as c1 = cA, c2 = cB, c3 = cC, c4 = cD in my code. Equation 2 has been linearized and phi10 and phi20 are the trial values of the variables cC and cD.
# import modules
import numpy as np
import sympy
from sympy.core.function import _mexpand
import scipy as sp
import scipy.sparse as ss
import scipy.sparse.linalg as ssl
import matplotlib.pyplot as plt
# define functions
def flatten(t):
    """
    function to flatten lists
    """
    return [item for sublist in t for item in sublist]
def get_coeffs(coeff_dict, func_vars):
    """
    function to extract coefficients from variables
    and form the sparse symbolic array
    """
    c = coeff_dict
    for i in list(c.keys()):
        b, _ = i.as_base_exp()
        if b == i:
            continue
        if b in c:
            c[i] = 0
        if any(k.has(b) for k in c):
            c[i] = 0
    return [coeff_dict[val] for val in func_vars]
# Constants for the problem
I = 0.1 # A/cm2
L = 1.0 # distance (x) in cm
m = 100 # grid spacing
h = L / (m-1)
a = 23300 # 1/cm
io = 2e-7 # A/cm2
n = 1
F = 96500 # C/mol
R = 8.314 # J/mol-K
T = 298 # K
sigma = 20 # S/cm
kappa = 0.06 # S/cm
alpha = 0.5
beta = -(1-alpha)*n*F/R/T
phi10 , phi20 = 5, 0.5 # these are just guesses
P = a*io*np.exp(beta*(phi10-phi20))
j = sympy.symbols('j',integer = True)
cA = sympy.IndexedBase('cA')
cB = sympy.IndexedBase('cB')
cC = sympy.IndexedBase('cC')
cD = sympy.IndexedBase('cD')
# write the boundary conditions at x = 0
bc = [cA[1], cB[1],
      (4/3) * cC[2] - (1/3)*cC[3],  # use a three point approximation for cC_prime
      cD[1]
      ]
# form a list of expressions from the boundary conditions and equations
expr = flatten([bc, flatten([[
    -cA[j-1] - cB[j-1] + cA[j+1] + cB[j+1],
    cB[j-1] - 2*h*P*beta*cC[j] + 2*h*P*beta*cD[j] - cB[j+1],
    -sigma*cC[j-1] + 2*h*cA[j] + sigma * cC[j+1],
    -kappa * cD[j-1] + 2*h * cB[j] + kappa * cD[j+1]] for j in range(2, m)])])
vars = [cA[j], cB[j], cC[j], cD[j]]
# flatten the list of variables
unknowns = flatten([[cA[j], cB[j], cC[j], cD[j]] for j in range(1,m)])
var_len = len(unknowns)
# # # substitute in the boundary conditions at x = L while getting the coefficients
A = sympy.SparseMatrix([get_coeffs(_mexpand(i.subs({cA[m]:I}))\
.as_coefficients_dict(), unknowns) for i in expr])
# convert to a numpy array
mat_temp = np.array(A).astype(np.float64)
# you can view the sparse array with this
fig = plt.figure(figsize=(6,6))
ax = fig.add_axes([0,0, 1,1])
cmap = plt.cm.binary
plt.spy(mat_temp, cmap = cmap, alpha = 0.8)
def solve_sparse(b0, error):
    # create the b column vector
    b = np.copy(b0)
    b[0:4] = np.array([0.0, I, 0.0, 0.0])
    b[var_len-4] = I
    b[var_len-3] = 0
    b[var_len-2] = 0
    b[var_len-1] = 0
    print(b.shape)
    old = np.copy(b0)
    mat = np.copy(mat_temp)
    b_2 = np.copy(b)
    resid = 10
    lss = 0
    while lss < 100:
        mat_2 = np.copy(mat)
        for j in range(3, var_len - 3, 4):
            # update the forcing term of equation 2
            b_2[j+2] = 2*h*(1-beta*old[j+3]+beta*old[j+4])*a*io*np.exp(beta*(old[j+3]-old[j+4]))
            # update the sparse array at every iteration for variables cC and cD in equation2
            mat_2[j+2, j+3] += 2*h*beta*a*io*np.exp(beta*(old[j+3]-old[j+4]))
            mat_2[j+2, j+4] += 2*h*beta*a*io*np.exp(beta*(old[j+3]-old[j+4]))
        # form the column sparse matrix
        A_s = ss.csc_matrix(mat_2)
        new = ssl.spsolve(A_s, b_2).flatten()
        resid = np.sum((new - old)**2)/var_len
        lss += 1
        old = np.copy(new)
    return new
val0 = np.array([[0.0, 0.0, 0.0, 0.0] for _ in range(m-1)]).flatten() # form an array of initial values
error = 1e-7
## Run the code
conc = solve_sparse(val0, error).reshape(m-1, len(vars))
conc.shape # gives (99, 4)
# Plot result for cA:
plt.plot(conc[:,0], marker = 'o', linestyle = '')
What happens seems pretty clear now that we have seen that the plotted solution indeed oscillates between upper and lower values. You are using the central Euler method as discretization; for u' = F(u) this reads as
u[j+1]-u[j-1] = 2*h*F(u[j])
This method is only weakly stable and allows the sub-sequences of odd and even indices to evolve rather independently. As an equation, this means that the computed solution may actually approximate the system ue' = F(uo), uo' = F(ue), with independent functions ue, uo that follow the paths of the even and odd sub-sequences.
These even and odd parts are only tied together by the treatment of the boundary points, two or three points deep. So avoiding or reducing the oscillation requires very careful handling of the boundary conditions and also of the differential equations at the boundary points.
But one can avoid all this unpleasantness by using the trapezoidal method
u[j+1]-u[j] = 0.5*h*(F(u[j+1])+F(u[j]))
This also reduces the bandwidth of the system matrix.
To implement the implied Newton method correctly (linearizing via a Taylor expansion and solving the linearized equation is exactly what the Newton-Kantorovich method does), you need to replace F(u[j]) with F(u_old[j]) + F'(u_old[j])*(u[j] - u_old[j]). This gives a linear system of equations in u for each iteration step.
For the trapezoidal method this gives
(I-0.5*h*F'(u_old[j+1]))*u[j+1] - (I+0.5*h*F'(u_old[j]))*u[j]
= 0.5*h*(F(u_old[j+1])-F'(u_old[j+1])*u_old[j+1] + F(u_old[j])-F'(u_old[j])*u_old[j])
In general, the derivative values, and thus the system matrix, need not be updated in every step; only the function values must be (otherwise the iteration does not move forward).
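For a generic scalar problem u' = F(u), a minimal sketch of this linearized trapezoidal iteration might look like the following; F, dF, the grid and the initial condition are placeholders, not the four-variable system from the question:
import numpy as np

def F(u):            # placeholder right-hand side
    return -u**2

def dF(u):           # its derivative, needed for the linearization
    return -2.0*u

m, h, u0 = 101, 0.01, 1.0
u_old = np.full(m, u0)            # initial guess for the whole grid

for sweep in range(50):
    A = np.zeros((m, m))
    rhs = np.zeros(m)
    A[0, 0] = 1.0                 # boundary condition u(0) = u0
    rhs[0] = u0
    for j in range(m - 1):
        # (I - 0.5*h*F'(u_old[j+1]))*u[j+1] - (I + 0.5*h*F'(u_old[j]))*u[j] = ...
        A[j + 1, j + 1] = 1.0 - 0.5*h*dF(u_old[j + 1])
        A[j + 1, j] = -(1.0 + 0.5*h*dF(u_old[j]))
        rhs[j + 1] = 0.5*h*(F(u_old[j + 1]) - dF(u_old[j + 1])*u_old[j + 1]
                            + F(u_old[j]) - dF(u_old[j])*u_old[j])
    u_new = np.linalg.solve(A, rhs)
    if np.max(np.abs(u_new - u_old)) < 1e-12:
        break
    u_old = u_new
Each row couples only u[j] and u[j+1], so the matrix is bidiagonal here (banded in the full multi-variable case), which is why the bandwidth shrinks compared to the three-point central scheme.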
I have a CSV file of various persons, with 8 parameters, to determine whether the person is diabetic or not.
You can get the CSV file from here.
I am making a model that will train and predict whether a person is diabetic or not, without using third-party libraries like TensorFlow or scikit-learn; I am making it from scratch.
Here is my code:
from numpy import genfromtxt
import numpy as np

my_data = genfromtxt('E:/diabaties.csv', delimiter=',')
X, Y = my_data[1:, :-1], my_data[1:, -1:]  # split the inputs and the output label out of my_data (skipping the header row)

def sigmoid(x):
    return (1/(1+np.exp(-x)))

m = X.shape[0]

def propagate(W, b, X, Y):
    # forward propagation
    A = sigmoid(np.dot(X, W) + b)
    cost = (- 1 / m) * np.sum(Y * np.log(A) + (1 - Y) * (np.log(1 - A)))
    print(cost)
    # backward propagation
    dw = (1 / m) * np.dot(X.T, (A - Y))
    db = (1 / m) * np.sum(A - Y)
    return (dw, db, cost)

def optimizer(W, b, X, Y, number_of_iterration, learning_rate):
    for i in range(number_of_iterration):
        dw, db, cost = propagate(W, b, X, Y)
        W = W - learning_rate*dw
        b = b - learning_rate*db
    return (W, b)

W = np.zeros((X.shape[1], 1))
b = 0

W, b = optimizer(W, b, X, Y, 100, 0.05)
The output which is getting generated is in this link, please take a look.
I have tried to:
- initialize the value of W with random numbers
- spend a lot of time debugging, but I cannot find what I have done wrong
The short answer is that your learning rate is about 500x too big for this problem. Think about it like you're trying to pilot your W vector into a canyon in the cost function. At each step, the gradient tells you which way is downhill, but the steps you take in that direction are so big that you jump over the canyon and end up on the other side. Each time this happens, your cost goes up because you're getting farther and farther out of the canyon, until after 2 iterations it blows up.
If you replace the line
W,b = optimizer(W, b, X, Y, 100, 0.05)
with
W,b = optimizer(W, b, X, Y, 100, 0.0001)
It will converge, though still not at a reasonable speed. (Side note, there's no good way to know the learning rate you need for a given problem. You just try lower and lower values until your cost value doesn't diverge.)
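If you want to automate that search a little, a rough sketch (reusing the propagate and optimizer functions from the question; the candidate values are arbitrary) could look like:
best_lr, best_cost = None, np.inf
for lr in [0.05, 0.01, 0.001, 0.0001, 0.00001]:
    W_try, b_try = np.zeros((X.shape[1], 1)), 0
    W_try, b_try = optimizer(W_try, b_try, X, Y, 100, lr)
    _, _, final_cost = propagate(W_try, b_try, X, Y)   # cost after the last update
    if np.isfinite(final_cost) and final_cost < best_cost:
        best_lr, best_cost = lr, final_cost
print('best learning rate:', best_lr, 'final cost:', best_cost)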
The longer answer is that your features are all on different scales.
col_means = X.mean(axis=0)
col_stds = X.std(axis=0)
print('column means: ', col_means)
print('column stdevs: ', col_stds)
yields
column means: [ 3.84505208 120.89453125 69.10546875 20.53645833 79.79947917
31.99257812 0.4718763 33.24088542]
column stdevs: [ 3.36738361 31.95179591 19.34320163 15.94182863 115.16894926
7.87902573 0.33111282 11.75257265]
This means that the variations in the second feature are about 100x as large as the variations in the second-to-last feature, which in turn means that the second value in your W vector has to be tuned to about 100x the precision of the second-to-last value in your W vector.
There are two ways to deal with this in practice. First, you could use a fancier optimizer. Instead of basic gradient descent, you could use gradient descent with momentum, but that would change all your code. The second, simpler, way is just to scale your features so they're all about the same size.
col_means = X.mean(axis=0)
col_stds = X.std(axis=0)
print('column means: ', col_means)
print('column stdevs: ', col_stds)
X -= col_means
X /= col_stds
W, b = optimizer(W, b, X, Y, 100, 1.0)
Here we subtract the mean value of each feature and divide each feature's value by its standard deviation. Sometimes newbies are thrown off by this -- "you can't change your data values, that changes the problem" -- but it makes sense if you realize that it's just another mathematical transformation, just like multiplying by W, adding b, taking the sigmoid, etc. The only catch is that you've got to make sure you do the same thing for any future data. Just like the values of your W vector are learned parameters of your model, the values of the col_means and col_stds are too, so you've got to save them like W and b and use them if you want to perform inference with this model on new data in the future.
That lets us use a much bigger learning rate of 1.0 because now all the features are approximately the same size.
Now if you try, you'll get the following output:
column means: [ 3.84505208 120.89453125 69.10546875 20.53645833 79.79947917
31.99257812 0.4718763 33.24088542]
column stdevs: [ 3.36738361 31.95179591 19.34320163 15.94182863 115.16894926
7.87902573 0.33111282 11.75257265]
0.6931471805599452
0.5902957589079032
0.5481784378158732
0.5254804089153315
...
0.4709931321295562
0.4709931263193595
0.47099312122176273
0.4709931167488006
0.470993112823447
This is what you want. Your cost function is going down at each step, and at the end of your 100 iterations the cost is stable to ~8 significant figures, so further iterations probably won't change much.
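Since col_means and col_stds are now part of the model, a minimal sketch of reusing them on new, unseen data might look like this (predict here is a hypothetical helper, not part of the original code):
import numpy as np

def predict(X_new, W, b, col_means, col_stds, threshold=0.5):
    """Hypothetical helper: apply the saved scaling, then the trained model."""
    X_scaled = (X_new - col_means) / col_stds          # same transform as during training
    probs = 1.0 / (1.0 + np.exp(-(np.dot(X_scaled, W) + b)))
    return (probs >= threshold).astype(int)            # 1 = predicted diabetic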
Welcome to machine learning!
The problem is with your initialization of the weights and bias. It's important not to initialize (at least) the weights to zero; initialize them with small random numbers instead. The value of A is coming out to be zero, making your cost function undefined.
Update:
Try something like this:
from numpy import genfromtxt
import numpy as np

# my_data = genfromtxt('E:/diabaties.csv', delimiter=',')
# X,Y = my_data[1: ,:-1], my_data[1: ,-1:] #striping data and output from my_data

# Using random data
n_points = 100
n_neurons = 5
X = np.random.rand(n_points, n_neurons)  # 5 dimensional data from uniform distribution [0, 1)
Y = np.random.randint(low=0, high=2, size=(n_points, 1))  # Binary labels

def sigmoid(x):
    return (1/(1+np.exp(-x)))

m = X.shape[0]

def propagate(W, b, X, Y):
    # forward propagation
    A = sigmoid(np.dot(X, W) + b)
    cost = (- 1 / m) * np.sum(Y * np.log(A) + (1 - Y) * (np.log(1 - A)))
    print(cost)
    # backward propagation
    dw = (1 / m) * np.dot(X.T, (A - Y))
    db = (1 / m) * np.sum(A - Y)
    return (dw, db, cost)

def optimizer(W, b, X, Y, number_of_iterration, learning_rate):
    for i in range(number_of_iterration):
        dw, db, cost = propagate(W, b, X, Y)
        W = W - learning_rate*dw
        b = b - learning_rate*db
    return (W, b)

W = np.random.normal(loc=0, scale=0.01, size=(n_neurons, 1))  # Drawing random initialization from gaussian
b = 0

W, b = optimizer(W, b, X, Y, 100, 0.05)
Your NaN problem is simply due to np.log encountering a zero value. You always want to scale your X values. Statistical (mean, std) normalization will work, but I find min-max scaling works best. Here is code for that:
def minmax_scaler(x):
    min = np.nanmin(x, axis=0)
    max = np.nanmax(x, axis=0)
    return (x - min) / (max - min)
Also, your neural net has only one neuron. When you call np.dot(X, W) these should be matrices of shape (cases, features) and (features, neurons) respectively. So, now your initialization code looks like this:
X = minmax_scaler(X)
neurons = 10
learning_rate = 0.05
W = np.random.random((X.shape[1], neurons))
b = np.zeros((1, neurons)) # b width to match W
I got decent convergence without needing to change the learning rate. See chart:
This is such a small dataset that, even with 10-20 neurons, you are in danger of overfitting it. Ordinarily, you would code a predict() method and an accuracy check, and then set aside some of the data to test for overfitting.
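A rough sketch of that last part, written for a single output unit (with several neurons as above, you would also need an output layer that combines them into one prediction), might look like:
import numpy as np

# Rough sketch: hold out 20% of the rows and check accuracy on them.
# Assumes X, Y and sigmoid from the code above; W has shape (features, 1) here.
def predict(X, W, b):
    return (sigmoid(np.dot(X, W) + b) >= 0.5).astype(int)

def accuracy(X, Y, W, b):
    return np.mean(predict(X, W, b) == Y)

idx = np.random.permutation(X.shape[0])
split = int(0.8 * len(idx))
train_idx, test_idx = idx[:split], idx[split:]
# ... train W, b on X[train_idx], Y[train_idx] as above, then:
print('train accuracy:', accuracy(X[train_idx], Y[train_idx], W, b))
print('test accuracy:', accuracy(X[test_idx], Y[test_idx], W, b))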
I need to fit a function
z(u,v) = C u v^p
That is, I have a two-dimensional data set, and I have to find two parameters, C and p. Is there something in numpy or scipy that can do this in a straightforward manner? I took a look at scipy.optimize.leastsq, but it's not clear to me how I would use it here.
import scipy.optimize

def f(x, u, v, z_data):
    C = x[0]
    p = x[1]
    modelled_z = C*u*v**p
    diffs = modelled_z - z_data
    return diffs.flatten()  # leastsq expects a 1D array out;
                            # it doesn't matter that it's conceptually 2D,
                            # provided you flatten it consistently

result = scipy.optimize.leastsq(f, [1.0, 1.0],        # initial guess at the starting point
                                args=(u, v, z_data))  # alternatively you can do this with
                                                      # closure variables in f if you like
# result is the best-fit point
For your specific function you might be able to do even better: for any given value of p there is a single best value of C that can be determined by straightforward linear algebra.
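For instance, for a fixed p the model is linear in C, so the optimal C has a closed form; a minimal sketch (assuming u, v and z_data are flat numpy arrays):
import numpy as np

def best_C(p, u, v, z_data):
    """Closed-form least-squares value of C for a fixed p in z = C*u*v**p."""
    w = u * v**p
    return np.dot(w, z_data) / np.dot(w, w)
With that, the fit reduces to a one-dimensional search over p alone (for example with scipy.optimize.minimize_scalar).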
You can transform the problem into a simple linear least squares problem, and then you don't need leastsq() at all.
z[i] == C * u[i] * v[i]**p
becomes
z[i]/u[i] == C * v[i]**p
And then
log(z[i]/u[i]) == log(C) + p * log(v[i])
Change variables and you can solve as a simple linear problem:
Z[i] == L + p * V[i]
Using numpy and assuming you have the data in arrays z, u and v, this is rendered as:
import numpy as np

Z = np.log(z/u)
V = np.log(v)
p, L = np.polyfit(V, Z, 1)
C = np.exp(L)
You probably ought to put a try: and except: around it in case some of the u values are zero or there are negative values.
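For instance, a minimal end-to-end sketch that implements that guard by simply dropping the points where the logarithms would be undefined:
import numpy as np

def fit_C_p(u, v, z):
    """Sketch: log-linear fit for z = C*u*v**p, skipping points where the
    logarithms would be undefined (u == 0, v <= 0, or z/u <= 0)."""
    ok = (u != 0) & (v > 0)
    ratio = z[ok] / u[ok]
    keep = ratio > 0
    Z = np.log(ratio[keep])
    V = np.log(v[ok][keep])
    p, L = np.polyfit(V, Z, 1)
    return np.exp(L), p          # C, p

# usage: C, p = fit_C_p(u, v, z)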
I'm trying to implement Bayesian PCA using the PyMC library for Python. But I'm stuck where I define the lower-dimensional coordinates...
The model is
x = Wz + e
where x is the observation vector, W is the transformation matrix, and z is the lower-dimensional coordinate vector.
First I define a distribution for the transformation matrix W. Each column is drawn from a normal distribution (zero mean and identity covariance for simplicity):
def W_logp(value):
    logLikes = np.array([multivariate_normal.logpdf(value[:, i], mean=np.zeros(dimX), cov=1)
                         for i in range(0, dimZ)])
    return logLikes.sum()

def W_random():
    W = np.zeros([dimX, dimZ])
    for i in range(0, dimZ):
        W[:, i] = multivariate_normal.rvs(mean=np.zeros(dimX), cov=1)
    return W

w0 = np.random.randn(dimX, dimZ)

W = pymc.Stochastic(
    logp = W_logp,
    doc = 'Transformation',
    name = 'W',
    parents = {},
    random = W_random,
    trace = True,
    value = w0,
    dtype = float,
    rseed = 116.,
    observed = False,
    cache_depth = 2,
    plot = False,
    verbose = 0)
Then I want to define a distribution for z, which is again a multivariate normal (zero mean and identity covariance). However, I need to draw a z for each observation separately, while W is common for all of them. So I tried
z = pymc.MvNormal('z', np.zeros(dimZ), np.eye(dimZ), size=N)
However, pymc.MvNormal does not have a size parameter, so it raises an error. The next step would be
m = Data.mean(axis=0) + np.dot(W, z)
obs = pymc.MvNormal('Obs', m, C, value=Data, observed=True)
I did not give the specification for C above since it is irrelevant for now. Any ideas on how to implement this?
Thanks
EDIT
After Chris Fonnesbeck's answer, I changed my code as follows:
numD, dimX = Data.shape
dimZ = 3
mm = Data.mean(axis=0)

tau = pymc.Gamma('tau', alpha=10, beta=2)
tauW = pymc.Gamma('tauW', alpha=20, beta=2, size=dimZ)

@pymc.deterministic(dtype=float)
def C(tau=tau):
    return (tau)*np.eye(dimX)

@pymc.deterministic(dtype=float)
def CW(tau=tauW):
    return np.diag(tau)

W = [pymc.MvNormal('W%i' % i, np.zeros(dimZ), CW) for i in range(dimX)]
z = [pymc.MvNormal('z%i' % i, np.zeros(dimZ), np.eye(dimZ)) for i in range(numD)]
mu = [pymc.Lambda('mu%i' % i, lambda W=W, z=z: mm + np.dot(np.array(W), np.array(z[i]))) for i in range(numD)]
obs = [pymc.MvNormal('Obs%i' % i, mu[i], C, value=Data[i, :], observed=True) for i in range(numD)]

model = pymc.Model([tau, tauW] + obs + W + z)
mcmc = pymc.MCMC(model)
But this time, it tries to allocate a huge amount of memory (more than 8GB) when running pymc.MCMC(model), with numD=45 and dimX=504. Even when I try it with only numD=1 (so creating only 1 z, mu, and obs), it does the same. Any idea why?
Unfortunately, PyMC does not easily let you define vectors of multivariate stochastics. Hopefully we can make this happen in PyMC 3. For now, you would have to specify this using a container. For example:
z = [pymc.MvNormal('z_%i' % i, np.zeros(dimZ), np.eye(dimZ)) for i in range(N)]
Regarding the memory issue, try using a different backend for the traces. The default ("ram") keeps everything in RAM. You can try something like "pickle" or "sqlite" instead.
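For example, with PyMC 2 something along these lines should keep the traces on disk (the file name and sample counts below are just placeholders):
# Use the sqlite backend so traces are written to disk instead of kept in RAM
mcmc = pymc.MCMC(model, db='sqlite', dbname='bpca_traces.sqlite')
mcmc.sample(iter=10000, burn=5000)
mcmc.db.close()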
Regarding the plate notation, it might be something we could pursue for PyMC 3. Feel free to create an issue suggesting this in our issue tracker.
I'm trying to get the vector coordinates of the polynomial p in the following code, assuming that x, y and z belong to GF(2), but I get the error
TypeError: can't initialize vector from nonzero non-list.
How can I fix that?
reset()
var("x")
var("y")
var("z")
pp = 2
k.<t>=GF(2^pp)
VS = k.vector_space()
p = z*x*t^2 + t*y + 1
print VS.coordinates(p)
Maybe you can use the coefficient list of the polynomial as its vector coordinates, and then convert this list to a vector. But in that case, it is better to define GF(2^2) as GF(4,'a') = {0, 1, a, a+1}.
For example you may do something like this:
K = GF(4,'a')
R = PolynomialRing(GF(4,'a'),"x")
x = R.gen()
a = K.gen()
p = (a+1)*x^3 + x^2 + a
p.list()
If you need to fix the dimension n to a bigger value than the degree of p, then you may do the following:
n = 6
L = p.list()
l = len(L)
i = n - l
L_ = [0]*i
L.extend(L_)
L
gives you the 6-dimensional coordinates of p.
If you need to use this coefficient list as a vector afterwards, you may just use vector(L) instead of L.
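For instance, continuing the example above (a short Sage sketch):
# Sage: turn the padded coefficient list into a vector over GF(4)
v = vector(K, L)
v        # (a, 0, 1, a + 1, 0, 0) for the example polynomial above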