Minimizing a function of a linear combination of data with Scipy - python

Suppose I have some matrix X where each row represents a time-series. For example, X could be a matrix of size 3 x 1000, which would mean that there are 3 time-series each consisting of 1000 time-points. In addition to X, I have one scalar for each time-series in X. I would like to find a linear combination
a[0] * X[0, :] + a[1] * X[1, :] + ... + a[n-1] * X[n-1, :]
that has the minimum value for some function F.
So, I attempted the following
import numpy as np
from scipy.optimize import minimize

def f(x):
    return 0  # for testing purposes

def obj(a, x):
    y = a * x
    return f(y)

minimize(obj, np.array([1, 1]), args=np.array([[1, 1], [2, 2]]), method='nelder-mead')
So the second argument is the initial guess x0 (the coefficients a). The data given by args should get mapped to x (if I understand it correctly) and remains constant during the optimization.
However, I get the error
ValueError: setting an array element with a sequence.
I guess my problem is a pretty general one, so I hope someone will be able to help!

Something like this?
import scipy.optimize as opt

def f(val):
    return val**2

def obj(a, series):
    s = 0
    for row in series:
        for t in range(len(row)):
            s += f(a[t] * row[t])
    return s

ll_x = [[2, 3, 2, 6], [3, 5, 2, 7]]  # 2 series
l_a = [1 for _ in ll_x[0]]  # initial coeffs.
res = opt.minimize(obj, l_a, args=ll_x, method='nelder-mead')
for elem in sorted(res.items()):
    print(*elem)
(works for me with Python 3.4.3)
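If the goal is one coefficient per series (as in the question) rather than one per time point, a minimal sketch could look like the following. The objective F used here (squared distance of the combined series from a constant level of 1) and the random data are assumptions made purely for illustration; substitute your own function.

```python
import numpy as np
from scipy.optimize import minimize

def F(y):
    # Hypothetical objective: squared distance of the combined series from 1.
    return np.sum((y - 1.0) ** 2)

def obj(a, X):
    # a has one coefficient per row of X; a @ X is the linear combination
    # a[0]*X[0, :] + a[1]*X[1, :] + ... as a single 1-D series.
    return F(a @ X)

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 50))   # 3 series, 50 time points (made-up data)
a0 = np.ones(X.shape[0])       # initial guess for the coefficients

# Passing args as a tuple makes it explicit that X is one extra argument.
res = minimize(obj, a0, args=(X,), method='Nelder-Mead')
print(res.x, res.fun)
```

The data matrix stays constant during the optimization; only the coefficient vector a is varied.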

Related

Find values for which matrix becomes singular in Python

Let's take the following square matrix:
import numpy as np
A = np.array([[10.0, -498.0],
              [-2.0, 100.0]])
A will be singular if its determinant (A[0,0]*A[1,1]-A[0,1]*A[1,0]) is zero. For example, A will be singular if A[0,1] takes the value -500.0 (all else unchanged):
from sympy import symbols, Eq, solve
y = symbols('y')
eq = Eq(A[0,0]*A[1,1] - y*A[1,0], 0)
sol = solve(eq)
sol
How can I efficiently find all values (A[0,0], A[0,1], ...) for which A (or any given square matrix) becomes singular? I work with large matrices. Many thanks in advance.
The trick is to use Laplace expansion to calculate the determinant. Expanding along a fixed row i, the formula is
det(A) = sum_j (-1)^(i+j) * a_ij * M_ij
where M_ij is the minor obtained by deleting row i and column j. So to make a matrix singular by changing a single entry, set det(A) = 0 in the above formula and solve for a_ij. It can be done like this:
import numpy as np
def cofactor(A, i, j):
    A = np.delete(A, (i), axis=0)
    A = np.delete(A, (j), axis=1)
    return (-1)**(i+j) * np.linalg.det(A)

def make_singular(A, I, J):
    n = A.shape[0]
    s = 0
    for i in range(n):
        if i != J:
            s += A[I, i] * cofactor(A, I, i)
    M = cofactor(A, I, J)
    if M == 0:
        return 'No solution'
    else:
        return -s / M
Testing:
>>> M = np.array([[10.0, -498.0],
...               [-2.0, 100.0]])
>>> make_singular(M, 0, 1)
-500.0000000000002
>>> M = np.array([[10.0, -498.0],
...               [0, 100.0]])
>>> make_singular(M, 0, 1)
'No solution'
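As a cross-check, here is a minimal sketch of a related approach. It relies on the fact that the determinant is an affine function of any single entry, so two determinant evaluations give the intercept and the slope. The function name singular_value_for_entry is made up for this illustration.

```python
import numpy as np

def singular_value_for_entry(A, i, j):
    """Value of A[i, j] that makes det(A) zero, or None if there is none."""
    A0 = np.array(A, dtype=float)
    A0[i, j] = 0.0
    d0 = np.linalg.det(A0)        # determinant with the entry set to 0
    A1 = A0.copy()
    A1[i, j] = 1.0
    c = np.linalg.det(A1) - d0    # slope of det in that entry (its cofactor)
    if np.isclose(c, 0.0):
        return None               # det does not depend on this entry
    return -d0 / c

A = np.array([[10.0, -498.0],
              [-2.0, 100.0]])
t = singular_value_for_entry(A, 0, 1)
B = A.copy()
B[0, 1] = t
print(t, np.linalg.det(B))        # det(B) should be numerically close to 0
```

Substituting the returned value back into the matrix and recomputing the determinant is a cheap way to confirm the result.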
This works for square matrices.
It brute-forces through candidate matrices and checks whether each one is singular, so there is a lot of messy output; use it if that suits you.
Also, very importantly, it is a recursive function that prints a matrix whenever it is singular, and it never stops calling itself, so it eventually raises RecursionError.
This is the code I came up with; you can use it if it works for you:
import numpy as np
def is_singular(_temp_int: str, matrix_size: int):
    kwargs = [int(i) for i in _temp_int]
    arr = []  # Creates the matrix from the given size
    temp_count = 0
    for i in range(matrix_size):
        arr.append([])
        m = arr[i]
        for j in range(matrix_size):
            m.append(int(_temp_int[temp_count]))
            temp_count += 1
    n_array = np.array(arr)
    if int(np.linalg.det(n_array)) == 0:
        print(n_array)  # print(n_array) for a pretty output or print(arr) for single line output of the determinant matrix
    _temp_int = str(_temp_int[:-len(str(int(_temp_int)+1))] + str(int(_temp_int)+1))
    is_singular(_temp_int, matrix_size)

# Only square matrices, so only one-digit integer as input
print("List of singular matrices in the size of '3x3': ")
is_singular('112278011', 3)
# Just give a temporary integer string which will be converted to matrix like [[1, 1, 2], [2, 7, 8], [0, 1, 1]]
# From the provided integer string, it adds up 1 after every iteration
I think this is the code you want; let me know if it's not working.
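The same brute-force idea can be written without recursion. This is only a sketch: the helper name singular_digit_matrices and the tolerance 1e-9 are assumptions for illustration, and the search is only practical for tiny sizes (a 3x3 matrix with digits 0 to 9 already has 10**9 candidates).

```python
import itertools
import numpy as np

def singular_digit_matrices(size, digits):
    """Yield every size x size matrix with entries from `digits` whose determinant is zero."""
    for entries in itertools.product(digits, repeat=size * size):
        m = np.array(entries, dtype=float).reshape(size, size)
        # Compare against a small tolerance instead of truncating to int.
        if abs(np.linalg.det(m)) < 1e-9:
            yield m.astype(int)

# All singular 2x2 matrices with entries 0 or 1.
found = list(singular_digit_matrices(2, range(2)))
for m in found:
    print(m.tolist())
```

Because it is a generator driven by itertools.product, it terminates once all candidates are exhausted and does not hit the recursion limit.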

Computing covariance matrix of complex array with defined function is not matching while comparing with np.cov

I am trying to write a simple covariance matrix function in Python.
import numpy as np
def manual_covariance(x):
    mean = x.mean(axis=1)
    print(x.shape[1])
    cov = np.zeros((len(x), len(x)), dtype='complex64')
    for i in range(len(mean)):
        for k in range(len(mean)):
            s = 0
            for j in range(len(x[1])):  # 5 Col
                s += np.dot((x[i][j] - mean[i]), (x[k][j] - mean[i]))
            cov[i, k] = s / ((x.shape[1]) - 1)
    return cov
With this function if I compute the covariance of:
A = np.array([[1, 2], [1, 5]])
man_cov = manual_covariance(A)
num_cov = np.cov(A)
My answer matches np.cov() and there is no problem. But when I use complex numbers instead, my answer does not match np.cov():
A = np.array([[1+1j, 1+2j], [1+4j, 5+5j]])
man_cov = manual_covariance(A)
num_cov = np.cov(A)
Manual result:
[[-0.5+0.j -0.5+2.j]
[-0.5+2.j 7.5+4.j]]
Numpy cov result:
[[0.5+0.j 0.5+2.j]
[0.5-2.j 8.5+0.j]]
I have tried printing every statement, to check where it can go wrong, but I am not able to find a fault.
This is because the dot product of two complex vectors z1 and z2 is defined as z1 · z2*, where * means complex conjugation. If you use s += np.dot((x[i,j] - mean[i]), np.conj(x[k,j] - mean[i])), which applies NumPy's conjugate function to the second factor, you should get the correct result.
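A vectorised sketch of the same fix, compared against np.cov on the complex example from the question. The function name complex_covariance is made up for this illustration.

```python
import numpy as np

def complex_covariance(x):
    # Rows are variables, columns are observations (np.cov's default layout).
    d = x - x.mean(axis=1, keepdims=True)   # deviations from each row mean
    # Conjugate the second factor, as in the fix described above.
    return d @ d.conj().T / (x.shape[1] - 1)

A = np.array([[1+1j, 1+2j], [1+4j, 5+5j]])
print(complex_covariance(A))
print(np.allclose(complex_covariance(A), np.cov(A)))
```

With the conjugate in place the result is Hermitian, so the diagonal entries are real, as in the np.cov output.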

How can I compute the Pearson correlation matrix and retain only significant values?

I have a 4-by-3 matrix, X, and wish to form the 3-by-3 Pearson correlation matrix, C, obtained by computing correlations between all 3 possible column combinations of X. However, entries of C that correspond to correlations that aren't statistically significant should be set to zero.
I know how to get pair-wise correlations and significance values using pearsonr in scipy.stats. For example,
import numpy as np
from scipy.stats import pearsonr
X = np.array([[1, 1, -2], [0, 0, 0], [0, .2, 1], [5, 3, 4]])
pearsonr(X[:, 0], X[:, 1])
returns (0.9915008164289165, 0.00849918357108348), a correlation of about .9915 between columns one and two of X, with p-value .0085.
I could easily get my desired matrix using nested loops:
Pre-populate C as a 3-by-3 matrix of zeros.
Each pass of the nested loop will correspond to two columns of X. The entry of C corresponding to this pair of columns will be set to the pairwise correlation provided the p-value is less than or equal to my threshold, say .01.
I'm wondering if there's a simpler way. I know in Pandas, I can create the correlation matrix, C, in basically one line:
import pandas as pd
df = pd.DataFrame(data=X)
C_frame = df.corr(method='pearson')
C = C_frame.to_numpy()
Is there a way to get the matrix or data frame of p-values, P, without a loop? If so, how could I set each entry of C to zero should the corresponding p-value in P exceed my threshold?
Looking through the docs for pearsonr reveals the formulae used to compute the correlations. It should not be too difficult to get the correlations between each column of a matrix using vectorization.
While you could compute the value of C using pandas, I will show a pure NumPy implementation of the entire process.
First, compute the r-values:
X = np.array([[1, 1, -2],
              [0, 0, 0],
              [0, .2, 1],
              [5, 3, 4]])
n = X.shape[0]
X -= X.mean(axis=0)
s = (X**2).sum(axis=0)
r = (X[..., None] * X[..., None, :]).sum(axis=0) / np.sqrt(s[:, None] * s[None, :])
Computing the p values is made simple given the existence of the beta distribution in scipy. Taken directly from the docs:
import scipy.stats
dist = scipy.stats.beta(n/2 - 1, n/2 - 1, loc=-1, scale=2)
p = 2 * dist.cdf(-abs(r))
You can trivially make a mask from p with your threshold, and apply it to r to make C:
mask = (p <= 0.01)
C = np.zeros_like(r)
C[mask] = r[mask]
A better option would probably be to modify your r in-place:
r[p > 0.01] = 0
In function form:
def non_trivial_correlation(X, threshold=0.1):
    n = X.shape[0]
    X = X - X.mean(axis=0)  # Don't modify the original
    s = (X**2).sum(axis=0)
    r = (X[..., None] * X[..., None, :]).sum(axis=0) / np.sqrt(s[:, None] * s[None, :])
    p = 2 * scipy.stats.beta(n/2 - 1, n/2 - 1, loc=-1, scale=2).cdf(-abs(r))
    r[p > threshold] = 0
    return r
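To check the vectorised formulas, here is a sketch that returns both matrices and compares every off-diagonal pair against scipy.stats.pearsonr. The function name correlation_with_pvalues and the clipping step are additions for this illustration, not part of the answer above.

```python
import numpy as np
import scipy.stats

def correlation_with_pvalues(X):
    # Same formulas as above, returning both the r matrix and the p matrix.
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    s = (Xc**2).sum(axis=0)
    r = (Xc.T @ Xc) / np.sqrt(s[:, None] * s[None, :])
    r = np.clip(r, -1.0, 1.0)   # guard against rounding slightly outside [-1, 1]
    p = 2 * scipy.stats.beta(n/2 - 1, n/2 - 1, loc=-1, scale=2).cdf(-np.abs(r))
    return r, p

X = np.array([[1, 1, -2], [0, 0, 0], [0, .2, 1], [5, 3, 4]])
r, p = correlation_with_pvalues(X)

# Compare each pair of columns with the pairwise SciPy routine.
for i in range(X.shape[1]):
    for j in range(i + 1, X.shape[1]):
        r_ij, p_ij = scipy.stats.pearsonr(X[:, i], X[:, j])
        assert np.isclose(r[i, j], r_ij) and np.isclose(p[i, j], p_ij)

C = np.where(p <= 0.01, r, 0.0)   # keep only the significant correlations
print(C)
```

Using np.where builds C without modifying r, which is convenient if both the raw and the masked correlations are needed.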

How to write a function, that generates a vector recursively in Python?

How can I write a recursive function to generate a vector X of size (1,n) as follows, where X_i is the i-th entry:
X_1 = Z_1 * E_1
X_i = max{B_(1,i) * X_1, ... , B_((i-1),i) * X_(i-1), Z_i} * E_i, i = 2,...,n,
where
Z = np.random.normal(0, 1,size = n)
E = np.random.lognormal(0, 1, size = n)
B = np.random.uniform(0,1,(n,n))
I do not have any experience with recursive functions, that is why I can not present any code with which I tried to solve this.
If you're working with numpy, then use all the power of numpy, not just the random module ;)
And if you work with vectors, then forget about recursion and use numpy's vectorised operations. For example, np.max gives you the maximum over an axis, and np.multiply (or the * operator) gives you element-wise multiplication. You also have np.prod for the product of array elements over a given axis... Those are just examples that might fit your problem well. For the full documentation, see https://docs.scipy.org/doc/numpy/
I got it: no recursion is needed, as @meowgoesthedog stated in the first comment.
import numpy as np
s=1000 # sample size
n=5
Z = np.random.normal(0, 1,size = (s,n))
B = np.random.uniform(0,1,(n,n))
E = np.random.lognormal(0, 1, size = (s,n))
X = np.zeros((s,n))
X[:,0] = Z[:,0]*E[:,0]
for k in range(s):
    for l in range(1,n):
        X[k,l] = max(np.max(X[k,:(l)] * B[:(l),l]), Z[k,l]) * E[k,l]
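The loop over samples can itself be vectorised, since each sample is processed independently; only the loop over the index l has to stay sequential. A sketch, assuming the same shapes as above and using a seeded generator so that runs are reproducible:

```python
import numpy as np

rng = np.random.default_rng(0)
s, n = 1000, 5                       # sample size and vector length
Z = rng.normal(0, 1, size=(s, n))
B = rng.uniform(0, 1, size=(n, n))
E = rng.lognormal(0, 1, size=(s, n))

X = np.zeros((s, n))
X[:, 0] = Z[:, 0] * E[:, 0]
for l in range(1, n):
    # For every sample at once: max over B[i, l] * X[:, i] for i < l ...
    best_prev = (X[:, :l] * B[:l, l]).max(axis=1)
    # ... then compare with Z[:, l] and scale by E[:, l].
    X[:, l] = np.maximum(best_prev, Z[:, l]) * E[:, l]
```

Each column X[:, l] depends on all earlier columns, which is why that loop cannot be removed in the same way.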

2d sum using an array - Python

I'm trying to sum a two-dimensional function using an array, but even a plain for loop is not giving the correct answer. I want to find (in LaTeX) $$\sum_{i=1}^{M}\sum_{j=1}^{M_2}\cos(i)\cos(j)$$ where, according to Mathematica, the answer for M=5 is 1.52725. According to the for loop:
def f(N):
    s1=0
    for p1 in range(N):
        for p2 in range(N):
            s1+=np.cos(p1+1)*np.cos(p2+1)
            return s1
print(f(4))
is 0.291927.
I have thus been trying to use some code of the form:
def f1(N):
    mat3=np.zeros((N,N),complex)
    for i in range(0,len(mat3)):
        for j in range(0,len(mat3)):
            mat3[i][j]=np.cos(i+1)*np.cos(j+1)
            return sum(mat3)
which again
print(f1(4))
outputs 0.291927. Looking at the array we should find for each value of i and j a matrix of the form
mat3=[[np.cos(1)*np.cos(1),np.cos(2)*np.cos(1),...],[np.cos(2)*np.cos(1),...]...[np.cos(N+1)*np.cos(N+1)]]
so for N=4 we should have
mat3=[[np.cos(1)*np.cos(1) np.cos(2)*np.cos(1) ...] [np.cos(2)*np.cos(1) ...]...[... np.cos(5)*np.cos(5)]]
but what I actually get is the following
mat3=[[0.29192658+0.j 0.+0.j 0.+0.j ... 0.+0.j] ... [... 0.+0.j]]
or a matrix of all zeros apart from the mat3[0][0] element.
Does anybody know a correct way to do this and get the correct answer? I chose this as an example because the problem I'm trying to solve involves plotting a function which has been summed over two indices and the function that python outputs is not the same as Mathematica (i.e., a function of the form $$f(E)=\sum_{i=1}^{M}\sum_{j=1}^{M_2}F(i,j,E)$$).
The return statement is not indented correctly in your sample code. It returns immediately in the first loop iteration. Indent it at the level of the function body instead, so that both for loops finish:
def f(N):
    s1=0
    for p1 in range(N):
        for p2 in range(N):
            s1+=np.cos(p1+1)*np.cos(p2+1)
    return s1
>>> print(f(5))
1.527247272700347
I have moved your code to a more numpy-ish version:
import numpy as np
N = 5
x = np.arange(N) + 1
y = np.arange(N) + 1
x = x.reshape((-1, 1))
y = y.reshape((1, -1))
mat = np.cos(x) * np.cos(y)
print(mat.sum()) # 1.5272472727003474
The trick here is to reshape x to a column and y to a row vector. If you multiply them, they are matched up like in your loop.
This should be more performant, since cos() is only evaluated on 2*N values rather than N*N times. It also avoids explicit loops, which are slow in Python.
UPDATE (regarding your comment):
This pattern can be extended to any number of dimensions. Basically, you get something like an outer product, where every instance of x is matched up with every instance of y, z, u, k, ... along the corresponding dimensions.
It's a bit confusing to describe, so here is some more code:
import numpy as np
N = 5
x = np.arange(N) + 1
y = np.arange(N) + 1
z = np.arange(N) + 1
x = x.reshape((-1, 1, 1))
y = y.reshape((1, -1, 1))
z = z.reshape((1, 1, -1))
mat = z**2 * np.cos(x) * np.cos(y)
# x along first axis
# y along second, z along third
# mat[0, 0, 0] == 1**2 * np.cos(1) * np.cos(1)
# mat[0, 4, 2] == 3**2 * np.cos(1) * np.cos(5)
If you use this for many dimensions, and big values for N, you will run into memory problems, though.
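For the specific sum in the question the full array is not needed at all, because the summand is a product of a factor that depends only on i and a factor that depends only on j. A sketch of that shortcut follows; it applies only to summands that factor this way, and a general F(i, j, E) will not.

```python
import numpy as np

N = 5
c = np.cos(np.arange(1, N + 1))

# sum_i sum_j cos(i)*cos(j) factorises into (sum_i cos(i)) * (sum_j cos(j)),
# so no N x N array is needed.
total_factored = c.sum() ** 2

# Same sum via the broadcast matrix, for comparison.
total_matrix = (c[:, None] * c[None, :]).sum()
print(total_factored, total_matrix)
```

Both values should agree with the 1.52725 reported by Mathematica for M=5, up to floating-point rounding.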
