Why is the non-linear response to random values always positive? - python

I'm creating a non-linear response to a series of random values from {-1, +1} using a simple second-order Volterra kernel:
r(k) = \sum_{i=1}^{M} \sum_{j=i}^{M} w_{ij} a(k-i) a(k-j)
With a zero mean for the a(k) values I would expect r(k) to have a zero mean as well, for arbitrary w values. However, I get an r(k) whose mean is always positive, while the mean of a(k) behaves as expected: it is close to zero and changes sign from run to run.
Why don't I get similar behavior for r(k)? Is it because the a(k) are pseudo-random, so two different values from a are not actually independent?
Here is the code that I use:
import numpy as np
import matplotlib.pyplot as plt
import itertools

# array of random values {-1, 1}
A = np.random.randint(2, size=10000)
A = [x*2 - 1 for x in A]

# array of random weights
M = 3
w = np.random.rand(int(M*(M+1)/2))

# non-linear response to random values
R = []
for i in range(M, len(A)):
    vals = np.asarray([np.prod(x) for x in itertools.combinations_with_replacement(A[i-M:i], 2)])
    R.append(np.dot(vals, w))

print(np.mean(A), np.var(A))
print(np.mean(R), np.var(R))
Edit:
A check of whether the quadratic form employed by the kernel is positive definite fails (i.e. there are negative leading principal minors). The code for the check:
import scipy.linalg as lin

# check Sylvester's criterion:
# reconstruct the weight matrix of the quadratic form
wm = np.zeros((M, M))
w_index = 0
for r in range(0, M):
    for c in range(r, M):
        wm[r, c] += w[w_index]/2
        wm[c, r] += w[w_index]/2
        w_index += 1

# check the leading principal minors
for r in range(0, M):
    if lin.det(wm[:r+1, :r+1]) < 0:
        print('found negative principal minor of order', r + 1)
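For reference, an equivalent and simpler check, as a sketch using the wm matrix built above: a symmetric matrix is positive definite exactly when all of its eigenvalues are positive.
# eigenvalue-based cross-check of the Sylvester loop above
eigvals = np.linalg.eigvalsh(wm)  # eigenvalues of the symmetric matrix wm, ascending
print('wm positive definite:', eigvals[0] > 0)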

I'm not certain whether this holds for Volterra kernels, but many kernels are positive definite, and some kernels, such as covariance functions (e.g. Squared Exponential/RBF, Rational Quadratic, and Matérn kernels), do not admit values less than zero.
If that is not the case for the Volterra kernel, you can also try seeding the RNG differently to check whether the behavior persists. Here is a looped version of your code that iterates over different random seeds:
import numpy as np
import matplotlib.pyplot as plt
import itertools

# Loop over random seeds
for seed in range(10):
    # Seed the RNG
    np.random.seed(seed)
    # array of random values {-1, 1}
    A = np.random.randint(2, size=10000)
    A = [x*2 - 1 for x in A]
    # array of random weights
    M = 3
    w = np.random.rand(int(M*(M+1)/2))
    # non-linear response to random values
    R = []
    for i in range(M, len(A)):
        vals = np.asarray([np.prod(x) for x in itertools.combinations_with_replacement(A[i-M:i], 2)])
        R.append(np.dot(vals, w))
    # Convert R to a numpy array to allow boolean masking
    R = np.array(R)
    print("A: ", np.mean(A), np.var(A))
    print("R <= 0: ", R[R <= 0])
    print("R: ", np.mean(R), np.var(R))
Running this, I get the following values:
A: 0.017 0.9997109999999997
R <= 0: []
R: 1.487637375177384 0.14880206863520892
A: -0.0012 0.9999985600000002
R <= 0: []
R: 2.28108226352669 0.5926651729251319
A: 0.0104 0.9998918400000001
R <= 0: []
R: 1.6138015284426408 0.9526360372883802
A: -0.0064 0.9999590399999999
R <= 0: []
R: 0.988332642595828 0.9650456000380685
A: 0.0026 0.9999932399999998
R <= 0: [-0.75835076 -0.75835076 -0.75835076 ... -0.75835076 -0.75835076
-0.75835076]
R: 0.7352258581171865 1.2668744674748733
A: -0.0048 0.9999769599999996
R <= 0: [-0.02201476 -0.29894937 -0.29894937 ... -0.02201476 -0.29894937
-0.02201476]
R: 0.7396699663779303 1.3844391355510492
A: -0.0012 0.9999985600000002
R <= 0: []
R: 2.4343947709617475 1.6377776468054106
A: -0.0052 0.99997296
R <= 0: []
R: 0.8778918601676095 0.07656607914368625
A: 0.0086 0.99992604
R <= 0: []
R: 2.3490174001719937 0.059871902764070624
A: 0.0046 0.9999788399999996
R <= 0: []
R: 1.7699147798471178 1.8049209966313247
So, as you can see, R still takes some negative values for certain seeds. My guess is that the consistently positive mean occurs because your kernel is positive definite.

This question ended up being about math rather than programming. Nevertheless, here is my own answer.
Simply put, when the indices of the a(k-i) factors are equal, the variables in the resulting product are not independent (because they are identical). Such a product does not have a zero mean, hence the mean value of the whole sum is shifted into the positive range.
Formally, the implemented function is a quadratic form, whose mean value can be calculated as
E[A^T W A] = tr(W \Sigma) + \mu^T W \mu
where \mu and \Sigma are the vector of expected values and the covariance matrix of the vector A, respectively.
Having a zero vector \mu leaves only the first (trace) term of this equation. The resulting estimate can be computed with the following code, and it indeed gives values close to the statistical results in the question.
# Estimate R mean:
# sum the weights on the main diagonal of the quadratic form (the matrix trace)
w_sum = 0
w_index = 0
for r in range(0, M):
    for c in range(r, M):
        if r == c:
            w_sum += w[w_index]
        w_index += 1

Rmean_est = np.var(A) * w_sum
print(Rmean_est)
This estimate relies on the assumption that elements of a with different indices are independent. Any implicit dependency due to the nature of the pseudo-random generator, if present, probably changes the resulting estimate only slightly.
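As a quick cross-check, here is a sketch that assumes the symmetric matrix wm built in the question's edit: with \mu = 0 and \Sigma = I (unit variance and assumed independence of the ±1 values), the formula reduces to the trace of wm.
# cross-check against the trace formula, assuming wm from the question's edit
Sigma = np.eye(M)                  # covariance of independent ±1 values
Rmean_est2 = np.trace(wm @ Sigma)  # equals w_sum, since diag(wm) holds the diagonal weights
print(Rmean_est2)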

Related

Minimise a multivariate function using scipy.optimize.fmin_bfgs

I am trying to minimize the multivariate function
f(x₁, …, xₙ) = α₁x₁² + α₂x₂² + ⋯ + αₙxₙ²,
where the αi are constants (which could be either positive or negative) and n is fixed,
using the scipy.optimize.fmin_bfgs function.
Conditions:
Test the code for a random natural number n between 5 and 10
A random starting point (all of the form m.dddd)
Iterate until successive iterates differ by less than 2% in absolute value, in the l∞ norm.
The coefficients αi (of the form m.dddd) should be chosen randomly so
that at least 40% of them are negative and at least 25% of them are
positive.
This is what I have tried (for the custom callback I referred to https://stackoverflow.com/a/30365576/7906671):
import numpy as np
from scipy.optimize import minimize
from scipy.optimize import fmin_bfgs

# Generate a list of random positive and negative numbers
random_list = np.random.uniform(-1, 1, size=(1, 10))[0].tolist()
p = []
n, npr = [], []
for r in range(len(random_list)):
    if random_list[r] < 0:
        n.append(random_list[r])
        npr.append((str(random_list[r]), 0.4))
    else:
        p.append(random_list[r])
        npr.append((str(random_list[r]), 0.25))

# Function to pick negative numbers with weight 0.4 and positive numbers with weight 0.25
def w_choice(seq):
    total_prob = sum(item[1] for item in seq)
    chosen = np.random.uniform(0, total_prob)
    cumulative = 0
    for item, probability in seq:
        cumulative += probability
        if cumulative > chosen:
            return item

# Random start values of the form m.dddd; the size n of the input array is between 5 and 10
n = np.random.randint(5, 10)
x0 = np.round(np.random.randn(n, 1), 4)
alpha = []
for i in range(n):
    alpha.append(np.round(float(w_choice(npr)), 4))
print("alpha: ", alpha)

def func(x):
    return sum(alpha*(x**2.0))

class StopOptimizingException(Exception):
    pass

class CallbackCollector:
    def __init__(self, f, thresh):
        self._f = f
        self._thresh = thresh

    def __call__(self, xk):
        if self._f(xk) < self._thresh:
            self.x_opt = xk

cb = CallbackCollector(func, thresh=0.02)
x, _, _ = fmin_bfgs(func, x0, callback=cb)
But this does not converge and gives the following warning:
Warning: Desired error not necessarily achieved due to precision loss.
I am not able to figure out why this fails. Any help is appreciated!
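One thing worth checking: the linked answer stops the optimizer by raising the exception from the callback and catching it at the call site, whereas in the code above StopOptimizingException is defined but never raised. A sketch of that pattern, assuming func, x0, and StopOptimizingException as defined above:
class CallbackCollector:
    def __init__(self, f, thresh):
        self._f = f
        self._thresh = thresh
        self.x_opt = None
    def __call__(self, xk):
        if self._f(xk) < self._thresh:
            self.x_opt = xk
            raise StopOptimizingException()  # abort fmin_bfgs early

cb = CallbackCollector(func, thresh=0.02)
try:
    x = fmin_bfgs(func, x0, callback=cb)
except StopOptimizingException:
    x = cb.x_opt  # best point seen when the threshold was met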

Search of interval optimization

My goal is to find the probability density function for a certain distribution, using a given algorithm.
This algorithm requires that I find which interval a given float falls in. Even though the code runs correctly, it takes too long; I was looking for a way to optimize it, but nothing came to mind.
In each iteration I check whether the float is in the interval: if it is, I'd like to add one to the corresponding position in the array p.
This is my code:
import numpy as np
import pylab as plt
import random as rd

n = [10, 100, 1000]
N = [10**6]
dy = 0.005
k_max = int(1/dy - 1)
y = np.array([(j + 0.5)*dy for j in range(k_max + 1)])
intervals = np.linspace(0, 1, k_max + 2)

def p(y, n, N):
    p = np.zeros(len(y))
    Y = np.array([sum(np.array([rd.random() for k in range(n)]))/n for j in range(N)])
    z = np.array([sum(np.array([rd.random() for k in range(n)])) for l in range(N)])
    for j in Y:
        for i in range(len(y)-1):
            if intervals[i] <= j < intervals[i+1]:
                p[i] += 1
    return p/(dy*N)

for a in n:
    pi = p(y, a, N[0])
    plt.plot(y, pi, label='n = ' + str(a))
plt.legend()
plt.title('Probability Density Function')
plt.xlabel('x')
plt.ylabel('p(x)')
plt.show()
Edit: I've added the full code, as requested.
Edit 2: Fixed an error in intervals.
A quick and simple optimization can be made here:
for j in Y:
    for i in range(len(y)-1):
        if intervals[i] <= j < intervals[i+1]:
            p[i] += 1
Since intervals consists of evenly spaced numbers over the interval [0, 1] (with spacing 1/len(y)), which is also the range of the Y values, we need not search for the position of j in intervals; we can compute it directly:
for j in Y: p[int(j * len(y))] += 1
Also, we can remove the unused
z = np.array([sum(np.array([rd.random() for k in range(n)])) for l in range(N)])
The greatest part of the remaining execution time is taken by
Y = np.array([sum(np.array([rd.random() for k in range(n)]))/n for j in range(N)])
Here the inner conversions to np.array are very time-consuming; it is better to leave them all out:
Y = [sum([rd.random() for k in range(n)])/n for j in range(N)]
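Going further, NumPy can also do the binning itself and generate the sample means without the inner Python loop. A sketch, assuming the same n, N, dy, and intervals as above (for the largest n, the sample matrix may need to be generated in chunks to limit memory):
# vectorized variant: generate all samples at once, then bin with np.histogram
rng = np.random.default_rng()
Y = rng.random((N, n)).mean(axis=1)          # N means of n uniform samples
counts, _ = np.histogram(Y, bins=intervals)  # one count per interval
p = counts / (dy * N)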

Selecting two numbers from a list in python with a probability that decays as the relative distance between them

I am trying to take a list and randomly choose a number i from it. Following that, I want to select a second element j, with a probability that decays as 1/|i-j|. For example, the relative probability of choosing a j four steps away from my initial i is 1/4 of the probability of selecting the j immediately next to my i.
So far I have populated my list, chosen my i first, and then calculated the weights using 1/|i-j| for all the other elements in the list.
import numpy as np
import random as random

list = [1,2,3,4,5,6,7,8,9,10]
a = 1
n1 = random.choice(range(len(list)))
n1_coord = (n1, list[n1])
print(n1_coord)

prob_weights = []
for i in range(0, n1):
    wt = 1/((np.absolute(n1-i)))
    #print(wt)
    prob_weights.append(wt)
for i in range(n1+1, len(list)):
    wt = 1/((np.absolute(n1-i)))
    prob_weights.append(wt)
Is there a function built into Python that I can feed these weights into, and which will choose a j with this probability distribution? Can I feed my array of weights into
numpy.random.choice(a, size=None, replace=True, p=None)
by letting p=prob_weights? Here is my full attempt:
import numpy as np
import random as random

list = [1,2,3,4,5,6,7,8,9,10]
a = 1
n1 = random.choice(range(len(list)))
n1_coord = (n1, list[n1])
print(n1_coord)

prob_weights = []
for i in range(0, n1):
    wt = 1/((np.absolute(n1-i)))
    #print(wt)
    prob_weights.append(wt)
for i in range(n1+1, len(list)):
    wt = 1/((np.absolute(n1-i)))
    prob_weights.append(wt)

n2 = np.random.choice(range(len(list)), p=prob_weights)
n2_coord = (n2, list[n2])
Running the above with np.random.choice gives me an error. I am not even sure whether this does what I want in the first place. Is there an alternative way to do this?
There is a built-in function for this: random.choices, which accepts a weights argument.
Given your first selected index n1, you can do something like
indices = range(len(mylist))
weights = [0 if i == n1 else 1 / abs(i - n1) for i in indices]
n2 = random.choices(indices, weights=weights, k=1)[0]
By setting the weight of the first item to zero, you prevent it from being selected. Note that random.choices returns a list, hence the [0].
Numerical operations tend to be faster with numpy, so you can instead use np.random.choice, which accepts a p argument; note that p must be a proper probability vector, so the weights have to be normalized:
values = np.array([...])
indices = np.arange(values.size)
n1 = np.random.choice(indices)
i = values[n1]
delta = np.abs(indices - n1)
weights = np.zeros(values.size)
np.divide(1.0, delta, out=weights, where=delta != 0)  # leave weight 0 at n1
weights /= weights.sum()  # p must sum to 1
n2 = np.random.choice(indices, p=weights)
j = values[n2]
As minor nitpicks, don't call a variable list, since that shadows a built-in, and import x as x is just import x.
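To convince yourself that the decay comes out right, here is a quick empirical check, sketched with a hypothetical mylist and a fixed n1:
from collections import Counter
import random

mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical example list
n1 = 4                                     # fixed first index for the check
indices = range(len(mylist))
weights = [0 if i == n1 else 1 / abs(i - n1) for i in indices]
counts = Counter(random.choices(indices, weights=weights, k=100000))
# the frequency at distance d from n1 should scale roughly as 1/d
for idx in sorted(counts):
    print(idx, abs(idx - n1), counts[idx])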

generating random matrices in python

In the following code I have implemented Gaussian elimination with partial pivoting for a general square linear system Ax=b. I have tested my code and it produces the right output. However, I am now trying to do the following, and I am not quite sure how to code it; I am looking for some help with this!
I want to test my implementation by solving Ax=b where A is a random 100x100 matrix and b is a random 100x1 vector.
In my code I have put in the matrices
A = np.array([[3.,2.,-4.],[2.,3.,3.],[5.,-3.,1.]])
b = np.array([[3.],[15.],[14.]])
and gotten the following correct output:
[3. 1. 2.]
[3. 1. 2.]
But how do I change the code to generate random matrices?
Here is my code:
import numpy as np

def GEPP(A, b, doPricing=True):
    '''
    Gaussian elimination with partial pivoting.

    input:  A is an n x n numpy matrix
            b is an n x 1 numpy array
    output: x is the solution of Ax=b
            with the entries permuted in
            accordance with the pivoting
            done by the algorithm
    post-condition: A and b have been modified.
    '''
    n = len(A)
    if b.size != n:
        raise ValueError("Invalid argument: incompatible sizes between "
                         "A & b.", b.size, n)
    # k represents the current pivot row. Since GE traverses the matrix in the
    # upper right triangle, we also use k for indicating the k-th diagonal
    # column index.
    # Elimination
    for k in range(n-1):
        if doPricing:
            # Pivot
            maxindex = abs(A[k:, k]).argmax() + k
            if A[maxindex, k] == 0:
                raise ValueError("Matrix is singular.")
            # Swap
            if maxindex != k:
                A[[k, maxindex]] = A[[maxindex, k]]
                b[[k, maxindex]] = b[[maxindex, k]]
        else:
            if A[k, k] == 0:
                raise ValueError("Pivot element is zero. Try setting doPricing to True.")
        # Eliminate
        for row in range(k+1, n):
            multiplier = A[row, k]/A[k, k]
            A[row, k:] = A[row, k:] - multiplier*A[k, k:]
            b[row] = b[row] - multiplier*b[k]
    # Back Substitution
    x = np.zeros(n)
    for k in range(n-1, -1, -1):
        x[k] = (b[k] - np.dot(A[k, k+1:], x[k+1:]))/A[k, k]
    return x

if __name__ == "__main__":
    A = np.array([[3., 2., -4.], [2., 3., 3.], [5., -3., 1.]])
    b = np.array([[3.], [15.], [14.]])
    print(GEPP(np.copy(A), np.copy(b), doPricing=False))
    print(GEPP(A, b))
You're already using numpy. Have you considered np.random.rand?
np.random.rand(m, n) will get you an m x n matrix of random values in [0, 1). You can process it further by scaling or rounding.
EDIT: Something like this:
if __name__ == "__main__":
    A = np.round(np.random.rand(100, 100)*10)
    b = np.round(np.random.rand(100)*10)
    print(GEPP(np.copy(A), np.copy(b), doPricing=False))
    print(GEPP(A, b))
I would use np.random.randint for this:
numpy.random.randint(low, high=None, size=None, dtype='l')
This outputs a size-shaped array of random integers from the appropriate distribution, or a single such random int if size is not provided.
low is the lower bound of the ints you want in your range
high is one greater than the upper bound of your desired range
size is the dimensions of your output array
dtype is the dtype of the result
So if I were you, I would write:
A = np.random.randint(0, 11, (100, 100))
b = np.random.randint(0, 11, 100)
Basically, you could create the desired matrix filled with ones and then iterate over it, setting each value to random.randint(0, 100), for example.
A matrix of ones:
one_array = np.ones((100, 100))
EDIT: Filling it would look like:
import random
for x in range(one_array.shape[0]):
    for y in range(one_array.shape[1]):
        one_array[x][y] = random.randint(0, 100)
A = np.random.normal(size=(100, 100))
b = np.random.normal(size=(100, 1))
x = np.linalg.solve(A, b)
assert np.max(np.abs(A @ x - b)) < 1e-12
Clearly, you can use distributions other than normal, such as uniform.
You can use numpy's native rand function:
np.random.rand()
In your code just define A and b as:
A = np.random.rand(100, 100)
b = np.random.rand(100)
This will generate a 100x100 matrix and a 100-element vector (both numpy arrays) filled with random values between 0 and 1.
See the docs for this function to learn more.
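Whichever generator you pick, a quick sanity check is to compare the result against numpy's own solver; a sketch, assuming the GEPP function from the question is in scope:
# sanity check of GEPP on a random 100x100 system
A = np.random.rand(100, 100)
b = np.random.rand(100)
x = GEPP(np.copy(A), np.copy(b))
assert np.allclose(A @ x, b)                  # residual is small
assert np.allclose(x, np.linalg.solve(A, b))  # agrees with numpy's solver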

How to create an array that can be accessed according to its indices in Numpy?

I am trying to solve the following problem via a Finite Difference Approximation in Python using NumPy:
$u_t = k \, u_{xx}$, on $0 < x < L$ and $t > 0$;
$u(0,t) = u(L,t) = 0$;
$u(x,0) = f(x)$.
I take $u(x,0) = f(x) = x^2$ for my problem.
Programming is not my forte so I need help with the implementation of my code. Here is my code (I'm sorry it is a bit messy, but not too bad I hope):
## This program is to implement a Finite Difference method approximation
## to solve the Heat Equation, u_t = k * u_xx,
## in 1D w/out sources & on a finite interval 0 < x < L. The PDE
## is subject to B.C: u(0,t) = u(L,t) = 0,
## and the I.C: u(x,0) = f(x).
import numpy as np
import matplotlib.pyplot as plt

# definition of initial condition function
def f(x):
    return x^2

# parameters
L = 1
T = 10
N = 10
M = 100
s = 0.25

# uniform mesh
x_init = 0
x_end = L
dx = float(x_end - x_init) / N
#x = np.zeros(N+1)
x = np.arange(x_init, x_end, dx)
x[0] = x_init

# time discretization
t_init = 0
t_end = T
dt = float(t_end - t_init) / M
#t = np.zeros(M+1)
t = np.arange(t_init, t_end, dt)
t[0] = t_init

# Boundary Conditions
for m in xrange(0, M):
    t[m] = m * dt

# Initial Conditions
for j in xrange(0, N):
    x[j] = j * dx

# definition of solution to u_t = k * u_xx
u = np.zeros((N+1, M+1)) # NxM array to store values of the solution

# finite difference scheme
for j in xrange(0, N-1):
    u[j][0] = x**2 #initial condition

for m in xrange(0, M):
    for j in xrange(1, N-1):
        if j == 1:
            u[j-1][m] = 0 # Boundary condition
        else:
            u[j][m+1] = u[j][m] + s * ( u[j+1][m] - #FDM scheme
                        2 * u[j][m] + u[j-1][m] )
    else:
        if j == N-1:
            u[j+1][m] = 0 # Boundary Condition

print u, t, x
#plt.plot(t, u)
#plt.show()
So the first issue I am having is that I am trying to create an array/matrix to store the values of the solution. I wanted it to be an NxM matrix, but in my code I made it (N+1)x(M+1) because I kept getting an error that the index was going out of bounds. Anyway, how can I create such a matrix using numpy.array without needlessly taking up memory with an (N+1)x(M+1) matrix filled with zeros?
Second, how can I "access" such an array? The real solution u(x,t) is approximated by u(x[j], t[m]), where j is the jth spatial value and m is the mth time value. The finite difference scheme is given by:
u(x[j],t[m+1]) = u(x[j],t[m]) + s * ( u(x[j+1],t[m]) - 2 * u(x[j],t[m]) + u(x[j-1],t[m]) )
(See here for the formulation)
I want to implement the initial condition u(x[j], t[0]) = x**2 for all values of j = 0,...,N-1. I also need to implement the boundary conditions u(x[0], t[m]) = 0 = u(x[N], t[m]) for all values of t = 0,...,M. Is the nested loop I created the best way to do this? Originally I tried implementing the I.C. and B.C. under the two separate for loops that I used to calculate the values of the arrays x and t (in my code I still have comments placed where I tried to do this).
I think I am just not using the right notation, but I cannot find anywhere in the NumPy documentation how to "call" such an array so as to iterate through each value in the proposed scheme. Can anyone shed some light on what I am doing wrong?
Any help is greatly appreciated. This is not homework, but rather an attempt to understand how to program the FDM for the Heat Equation, because later I will use similar methods to solve the Black-Scholes PDE.
EDIT: When I run my code, on line 60 (the last "else" that I use) I get an error that says invalid syntax, and on line 51 (u[j][0] = x**2 #initial condition) I get an error that reads "setting an array element with a sequence." What do these mean?
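For illustration, here is a minimal sketch of how the same scheme can be written with NumPy indexing; it is one possible arrangement, not the asker's code, and it reuses the question's parameter names. u[j, m] plays the role of u(x[j], t[m]), and the "setting an array element with a sequence" error above comes from assigning the whole array x**2 to the single element u[j][0].
import numpy as np

L, T, N, M, s = 1.0, 10.0, 10, 100, 0.25
x = np.linspace(0, L, N + 1)  # N+1 spatial nodes, including both endpoints
u = np.zeros((N + 1, M + 1))  # u[j, m] approximates u(x[j], t[m])

u[:, 0] = x**2                # initial condition u(x, 0) = f(x), set as a whole column
u[0, :] = 0.0                 # boundary condition u(0, t) = 0
u[N, :] = 0.0                 # boundary condition u(L, t) = 0

for m in range(M):
    # vectorized update of the interior nodes j = 1, ..., N-1
    u[1:N, m + 1] = u[1:N, m] + s * (u[2:, m] - 2 * u[1:N, m] + u[:N-1, m])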
