I am looking to find, or rather build, a common eigenvector matrix X between 2 matrices A and B such that:
AX=aX with "a" the diagonal matrix corresponding to the eigenvalues
BX=bX with "b" the diagonal matrix corresponding to the eigenvalues
where A and B are square and diagonalizable matrices.
I took a look at a similar post but did not manage to reach a conclusion, i.e. to get valid results when I build the final desired endomorphism F defined by F = P D P^-1.
I have also read the Wikipedia topic and this interesting paper, but couldn't extract a method that is easy to implement.
In particular, I am interested in the eig(A,B) Matlab function.
I tried to use it like this:
% Search for common build eigen vectors between FISH_sp and FISH_xc
[V,D] = eig(FISH_sp,FISH_xc);
% Diagonalize the matrix (A B^-1) to compute Lambda since we have AX=Lambda B X
[eigenv, eigen_final] = eig(inv(FISH_xc)*FISH_sp);
% Compute the final endomorphism : F = P D P^-1
FISH_final = V*eye(7).*eigen_final*inv(V)
But the matrix FISH_final doesn't give good results: when I do further computations from this FISH_final matrix (it is actually a Fisher matrix), the results are not valid.
So surely I must have made an error in my code snippet above. For now, I prefer to work in Matlab as a prototype and, once it works, look into doing this synthesis with MKL or with Python functions. Hence I am also tagging python.
How can I build these common eigenvectors and also find the associated eigenvalues? I am a little lost among all the potential methods that exist to carry this out.
A screen capture shows that the kernel of the commutator has to be different from the null vector.
EDIT 1: On Math Stack Exchange, someone advises using the Singular Value Decomposition (SVD) of the commutator [A,B]:
"If v is a common eigenvector, then ||(AB - BA)v|| = 0. The SVD approach gives you a unit vector v that minimizes ||(AB - BA)v|| (with the constraint that ||v|| = 1)."
So, in Matlab, I extract the approximate eigenvectors V from:
[U,S,V] = svd(A*B-B*A)
Is there a way to increase the accuracy, i.e. to minimize ||(AB - BA)v|| as much as possible?
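Since the question also tags python, here is a minimal NumPy sketch of the same SVD idea (variable names are mine; A and B stand for the two Fisher matrices, assumed already loaded as square arrays). The candidate common eigenvector is the right singular vector associated with the smallest singular value:
import numpy as np

# A, B: the two square matrices (e.g. the Fisher matrices), assumed already loaded
C = A @ B - B @ A                  # commutator [A, B]
U, S, Vh = np.linalg.svd(C)        # singular values in S come out in decreasing order
v = Vh[-1]                         # right singular vector for the smallest singular value
residual = np.linalg.norm(C @ v)   # equals S[-1]; the minimal ||(AB-BA)v|| over unit vectors v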
IMPORTANT REMARK: Maybe some of you didn't fully understand my goal.
Concerning the common basis of eigenvectors, I am looking for a combination (vector or matrix) of V1 and V2, or a direct use of the null operator on the 2 input Fisher matrices, to build this new basis "P" in which, with eigenvalues other than the known D1 and D2 (noted D1a and D2a), we could have:
F = P (D1a+D2a) P^-1
To compute the new Fisher matrix F, I need to know P, assuming that D1a and D2a are respectively equal to the diagonal matrices D1 and D2 (coming from the diagonalization of the A and B matrices).
If I knew the common basis of eigenvectors P, I could deduce D1a and D2a from D1 and D2, couldn't I?
The 2 Fisher matrices are available on these links :
Matrix A
Matrix B
I don't think there is a built-in facility in Matlab for computing common eigenvectors of two matrices. I'll just outline a brute force way and do it in Matlab in order to highlight some of its eigenvector-related methods. We will assume the matrices A and B are square and diagonalizable.
Outline of steps:
Get eigenvectors/values for A and B respectively.
Group the resultant eigenvectors by their eigenspaces.
Check for intersection of the eigenspaces by checking linear dependency among the eigenvectors of A and B, one pair of eigenspaces at a time.
Matlab does provide methods for (efficiently) completing each step! Except of course step 3 involves checking linear dependency many, many times, which in turn means we are likely doing unnecessary computation. Not to mention that finding common eigenvectors may not require finding all eigenvectors. So this is not meant to be a general numerical recipe.
How to get eigenvector/values
The syntax is
[V,D] = eig(A)
where D(i,i), V(:,i) are the corresponding eigenpairs.
Just be wary of numerical errors. In other words, if you check
tol=sum(abs(A*V(:,i)-D(i,i)*V(:,i)));
then tol<n*eps should be true for some small n for a smallish matrix A, but it's probably not true for n = 0 or 1.
Example:
>> A = gallery('lehmer',4);
>> [V,D] = eig(A);
>> sum(abs(A*V(:,1)-D(1)*V(:,1)))<eps
ans =
logical
0
>> sum(abs(A*V(:,1)-D(1)*V(:,1)))<10*eps
ans =
logical
1
How to group eigenvectors by their eigenspaces
In Matlab, eigenvalues are not automatically sorted in the output of [V,D] = eig(A). So you need to do that.
Get diagonal entries of matrix: diag(D)
Sort and keep track of the required permutation for sorting: [d,I]=sort(diag(D))
Identify repeating elements in d: [~,ia,~]=unique(d,'stable')
ia(i) tells you the beginning index of the ith eigenspace. So you can expect d(ia(i):ia(i+1)-1) to be identical eigenvalues and thus the eigenvectors belonging to the ith eigenspace are the columns W(:,ia(i):ia(i+1)-1) where W=V(:,I). Of course, for the last one, the index is ia(end):end
The last step happens to be answered here in true generality. Here, unique is sufficient at least for small A.
(Feel free to ask a separate question on how to do this whole step of "shuffling columns of one matrix based on another diagonal matrix" efficiently. There are probably other efficient methods using built-in Matlab functions.)
For example,
>> A=[1,2,0;1,2,2;3,6,1];
>> [V,D] = eig(A),
V =
0 0 0.7071
1.0000 -0.7071 0
0 0.7071 -0.7071
D =
3 0 0
0 5 0
0 0 3
>> [d,I]=sort(diag(D));
>> W=V(:,I),
W =
0 0.7071 0
1.0000 0 -0.7071
0 -0.7071 0.7071
>> [~,ia,~]=unique(d,'stable'),
ia =
1
3
which makes sense because the 1st eigenspace is the one with eigenvalue 3, comprising the span of columns 1 and 2 of W, and similarly for the 2nd space.
How to get linear intersect of (the span of) two sets
To complete the task of finding common eigenvectors, you do the above for both A and B. Next, for each pair of eigenspaces, you check for linear dependency. If there is linear dependency, the linear intersect is an answer.
There are a number of ways for checking linear dependency. One is to use other people's tools. Example: https://www.mathworks.com/matlabcentral/fileexchange/32060-intersection-of-linear-subspaces
One is to get the RREF of the matrix formed by concatenating the column vectors column-wise.
Let's say you did the computation in step 2 and arrived at V1, D1, d1, W1, ia1 for A and V2, D2, d2, W2, ia2 for B. You need to do
for i=1:numel(ia1)
    for j=1:numel(ia2)
        check_linear_dependency(col1,col2);
    end
end
where col1 is W1(:,ia1(i):ia1(i+1)-1) as mentioned in step 2, but with the caveat for the last space, and similarly for col2; and by check_linear_dependency we mean the following. First we get the RREF:
[R,p] = rref([col1,col2]);
You are looking for, first, rank([col1,col2])<size([col1,col2],2). If you have computed rref anyway, you already have the rank. You can check the Matlab documentation for details. You will need to profile your code for selecting the more efficient method. I shall refrain from guess-estimating what Matlab does in rank(). Although whether doing rank() implies doing the work in rref can make a good separate question.
In cases where rank([col1,col2])<size([col1,col2],2) is true, some rows don't have leading 1s and I believe p will help you trace back to which columns are dependent on which other columns. And you can build the intersect from here. As usual, be alert of numerical errors getting in the way of == statements. We are getting to the point of a different question -- ie. how to get linear intersect from rref() in Matlab, so I am going to leave it here.
There is yet another way, using the fundamental theorem of linear algebra (*sigh* at that unfortunate naming):
null( [null(col1.').' ; null(col2.').'] )
I got the formula from here. I think the FTLA is why it should work. If that's not why, or if you want to be sure that the formula works (which you probably should), please ask a separate question. Just beware that purely math questions should go on a different Stack Exchange site.
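For the eventual Python port the question mentions, a hedged SciPy translation of that formula might look like the sketch below (col1 and col2 are the eigenspace bases from step 2; the variable names are mine):
import numpy as np
from scipy.linalg import null_space

# Orthonormal bases of the orthogonal complements of span(col1) and span(col2)
n1 = null_space(col1.T)
n2 = null_space(col2.T)
# Columns of this matrix span the intersection of the two column spans (empty if no intersection)
intersection = null_space(np.vstack([n1.T, n2.T]))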
Now I guess we are done!
EDIT 1:
Let's be extra clear with how ia works with an example. Let's say we named everything with a trailing 1 for A and 2 for B. We need
for i=1:numel(ia1)
    for j=1:numel(ia2)
        if i==numel(ia1)
            col1 = W1(:,ia1(end):end);
        else
            col1 = W1(:,ia1(i):ia1(i+1)-1);
        end
        if j==numel(ia2)
            col2 = W2(:,ia2(end):end);
        else
            col2 = W2(:,ia2(j):ia2(j+1)-1);
        end
        check_linear_dependency(col1,col2);
    end
end
EDIT 2:
I should mention the observation that common eigenvectors should be those in the nullspace of the commutator. Thus, perhaps null(A*B-B*A) yields the same result.
But still be alert of numerical errors. With the brute force method, we started with eigenpairs with low tol (see definition in earlier sections) and so we already verified the "eigen" part in the eigenvectors. With null(A*B-B*A), the same should be done as well.
Of course, with multiple methods at hand, it's a good idea to compare results across methods.
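As a cross-check, here is a small NumPy/SciPy sketch of this null(A*B-B*A) idea (names are mine; it uses the Rayleigh quotient from the other answer to estimate the eigenvalues and then verifies the residuals):
import numpy as np
from scipy.linalg import null_space

N = null_space(A @ B - B @ A)          # candidate common eigenvectors, one per column
for v in N.T:
    lam_a = v @ (A @ v) / (v @ v)      # Rayleigh-quotient estimate of the eigenvalue for A
    lam_b = v @ (B @ v) / (v @ v)      # ... and for B
    res_a = np.linalg.norm(A @ v - lam_a * v)
    res_b = np.linalg.norm(B @ v - lam_b * v)
    print(lam_a, lam_b, res_a, res_b)  # both residuals should be small for a true common eigenvector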
I suspect this is rather a delicate matter.
First off, mathematically, A and B are simultaneously diagonalisable iff they commute, that is iff
A*B - B*A == 0 (often A*B-B*A is written [A,B])
(for if A*X = X*a and B*X = X*b with a, b diagonal then
A = X*a*inv(X), B = X*b*inv(X)
[A,B] = X*[a,b]*inv(X) = 0 since a and b, being diagonal, commute)
So I'd say the first thing to check is that your A and B do commute, and here is the first awkward issue: since [A,B] as computed is unlikely to be all zeroes due to rounding error, you'll need to decide if [A,B] being non-zero is just due to rounding error or if, actually, A and B don't commute.
Now suppose x is an eigenvector of A, with eigenvalue e. Then
A*B*x = B*A*x = B*e*x = e*B*x
And so we have, mathematically, two possibilities: either Bx is 0, or Bx is also an eigenvector of A with eigenvalue e.
A nice case is when all the elements of a are different, that is when each eigenspace of A is one dimensional. In that case:
if AX = Xa for diagonal a, then BX = Xb for diagonal b (which you'll need to compute). If you diagonalize A, and all the eigenvalues are sufficiently different, then you can assume each eigenspace is of dimension 1, but what does 'sufficiently' mean? Another delicate question, alas. If two computed eigenvalues are very close, are the eigenvalues different or is the difference rounding error?
Anyway, to compute the eigenvalue of B for each eigenvector x of A, compute Bx. If ||Bx|| is small enough compared to ||x|| then the eigenvalue of B is 0; otherwise it's
x'*B*x/(x'*x)
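In Python terms, that check for a single eigenvector x might look like this small sketch (x, B and the threshold tol are placeholders, not names from the post):
import numpy as np

Bx = B @ x
if np.linalg.norm(Bx) <= tol * np.linalg.norm(x):   # tol: small threshold of your choosing
    eig_b = 0.0                                     # x is (numerically) in the null space of B
else:
    eig_b = (x @ B @ x) / (x @ x)                   # eigenvalue of B for this shared eigenvector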
In the general case, some of the eigenspaces may have dimension greater than 1. The one dimensional eigen spaces can be dealt with as above, but more computation is required for the higher dimensional ones.
Suppose that m eigenvectors x[1].. x[m] of A correspond to the eigenvalue e. Since A and B commute, it is easy to see that B maps the space spanned by the xs to itself. Let C be the mxm matrix
C[i,j] = x[j]'*B*x[i]
Then C is symmetric and so we can diagonalize it, ie find orthogonal V and diagonal c with
C = V'*c*V
If we define
y[l] = Sum{ k | V[l,k]*x[k] } l=1..m
then a little algebra shows that y[l] is an eigenvector of B, with eigenvalue c[l]. Moreover, since each x[i] is an eigenvector of A with the same eigenvalue e, each y[l] is also an eigenvector of A with eigenvalue e.
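A hedged NumPy sketch of that construction (names are mine; it assumes the columns of X form an orthonormal basis of one eigenspace of A, which eig/eigh give you for a symmetric A, and that B is symmetric):
import numpy as np

def split_eigenspace(X, B):
    # X: n x m orthonormal basis of one eigenspace of A; B: symmetric n x n matrix
    C = X.T @ B @ X              # m x m restriction of B to the eigenspace (the C above)
    c, V = np.linalg.eigh(C)     # eigh, since C is symmetric
    Y = X @ V                    # columns of Y are eigenvectors of B, and still eigenvectors of A
    return Y, c
Each column Y[:, l] then plays the role of y[l] above, with B-eigenvalue c[l] and the original A-eigenvalue e.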
So all in all, I think a method would be:
Compute [A,B] and if it's really not 0, give up.
Diagonalise A, and sort the eigenvalues to be increasing (and sort the eigenvectors!)
Identify the eigenspaces of A. For the 1 dimensional spaces the corresponding eigenvector of A is an eigenvector of B, and all you need to compute is the eigenvalue of B. For higher dimensional ones, proceed as in the previous paragraph.
A relatively expensive (in computational effort) but reasonably reliable way to test whether the commutator is zero would be to compute the SVD of the commutator and take its largest singular value, say c, and also to take the largest singular value (or largest absolute value of the eigenvalues) a of A and b of B. Unless c is a lot smaller (e.g. 1e-10 times) than the lesser of a and b, you should conclude the commutator is not zero.
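For instance, a short NumPy version of that test might be (the 1e-10 factor is only the illustrative threshold mentioned above):
import numpy as np

c = np.linalg.norm(A @ B - B @ A, 2)   # largest singular value of the commutator
a = np.linalg.norm(A, 2)               # largest singular value of A
b = np.linalg.norm(B, 2)               # largest singular value of B
commutes = c <= 1e-10 * min(a, b)      # treat [A, B] as numerically zero only if this holds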
I'm trying to solve this task.
I wrote a function for this purpose which uses itertools.product() for the Cartesian product of the input iterables:
def probability(dice_number, sides, target):
    from itertools import product
    from decimal import Decimal

    FOUR_PLACES = Decimal('0.0001')
    total_number_of_experiment_outcomes = sides ** dice_number
    target_hits = 0
    sides_combinations = product(range(1, sides+1), repeat=dice_number)
    for side_combination in sides_combinations:
        if sum(side_combination) == target:
            target_hits += 1
    p = Decimal(str(target_hits / total_number_of_experiment_outcomes)).quantize(FOUR_PLACES)
    return float(p)
When calling probability(2, 6, 3) the output is 0.0556, so it works fine.
But calling probability(10, 10, 50) takes a very long time to calculate (hours?), and there must be a better way :)
The for side_combination in sides_combinations: loop takes too long to iterate through the huge number of sides_combinations.
Please, can you help me find out how to speed up the calculation of the result? I want to sleep tonight...
I guess the problem is to find the distribution of the sum of dice. An efficient way to do that is via discrete convolution. The distribution of the sum of variables is the convolution of their probability mass functions (or densities, in the continuous case). Convolution is binary but associative, so you can compute it conveniently just two pmf's at a time (the current distribution of the total so far, and the next one in the list). Then from the final result, you can read off the probabilities for each possible total. The first element in the result is the probability of the smallest possible total, and the last element is the probability of the largest possible total. In between you can figure out which one corresponds to the particular sum you're looking for.
The hard part of this is the convolution, so work on that first. It's just a simple summation, but it's just a little tricky to get the limits of the summation correct. My advice is to work with integers or rationals so you can do exact arithmetic.
After that you just need to construct an appropriate pmf for each input die. The input is just [1, 1, 1, ... 1] if you're using integers (you'll have to normalize eventually) or [1/n, 1/n, 1/n, ..., 1/n] if rationals, where n = number of faces. Also you'll need to label the indices of the output correctly -- again this is just a little tricky to get it right.
Convolution is a very general approach for summations of variables. It can be made even more efficient by implementing convolution via the fast Fourier transform, since FFT(conv(A, B)) = FFT(A) FFT(B). But at this point I don't think you need to worry about that.
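To make the convolution idea concrete, here is a rough NumPy sketch (the function name and the four-decimal rounding mirror the question; numpy.convolve stands in for the hand-written summation described above):
import numpy as np

def probability(dice_number, sides, target):
    single = np.ones(sides, dtype=np.int64)      # 1 way to roll each face of one die
    counts = single
    for _ in range(dice_number - 1):
        counts = np.convolve(counts, single)     # counts for the sum after adding one more die
    # counts[k] is the number of ways to roll a total of dice_number + k
    k = target - dice_number
    if k < 0 or k >= len(counts):
        return 0.0
    return round(counts[k] / sides**dice_number, 4)

probability(2, 6, 3)     # 0.0556
probability(10, 10, 50)  # returns almost instantly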
If someone is still interested in a solution which avoids the very, very, very long iteration through all the itertools.product Cartesian products, here it is:
def probability(dice_number, sides, target):
    if dice_number == 1:
        return (1 <= target <= sides**dice_number) / sides
    return sum([probability(dice_number-1, sides, target-x)
                for x in range(1, sides+1)]) / sides
But you should add caching of the probability function's results; if you don't, the calculation will also take a very, very, very long time.
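One possible way to add that caching is functools.lru_cache on an inner helper, roughly like this (a sketch, not the original code):
from functools import lru_cache

def probability(dice_number, sides, target):
    @lru_cache(maxsize=None)
    def p(n, t):
        if n == 1:
            return (1 <= t <= sides) / sides
        return sum(p(n - 1, t - x) for x in range(1, sides + 1)) / sides
    return round(p(dice_number, target), 4)
With the cache, probability(10, 10, 50) only evaluates each (n, t) pair once instead of re-expanding the whole recursion tree.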
P.S. This code is 100% not mine, I took it from the internet; I'm not smart enough to produce it by myself. Hope you'll enjoy it as much as I did.
I'm constructing a Naive Bayes text classifier from scratch in Python and I am aware that, upon encountering a product of very small probabilities, using a logarithm over the probabilities is a good choice.
The issue now is that the mathematical function that I'm using has a summation OVER a product of these extremely small probabilities.
To be specific, I'm trying to calculate the total word probabilities given a mixture component (class) over all classes.
Just plainly adding up the logs of these total probabilities is incorrect, since the log of a sum is not equal to the sum of logs.
To give an example, let's say that I have 3 classes, 2000 words and 50 documents.
Then I have a word probability matrix called wordprob with 2000 rows and 3 columns.
The algorithm for the total word probability in this example would look like this:
sum = 0
for j in range(0,3):
    prob_product = 1
    for i in words: # just the index of words from my vocabulary in this document
        prob_product = prob_product*wordprob[i,j]
    sum = sum + prob_product
What ends up happening is that prob_product becomes 0 on many iterations due to many small probabilities multiplying with each other.
Since I can't easily solve this with logs (because of the summation in front) I'm totally clueless.
Any help will be much appreciated.
I think you may be best off keeping everything in logs. The first part of this, computing the log of the product, is just adding up the logs of the terms. The second bit, computing the log of the sum of the exponentials of the logs, is a bit trickier.
One way would be to store each of the logs of the products in an array, and then you need a function that, given an array L with n elements, will compute
S = log( sum { i=1..n | exp( L[i])})
One way to do this is to find the maximum, M say, of the L's; a little algebra shows
S = M + log( sum { i=1..n | exp( L[i]-M)})
Each of the terms L[i]-M is non-positive, so overflow can't occur. Underflow is not a problem, as for those terms exp will return 0. At least one of them (the one where L[i] is M) will be zero, so its exp will be one and we'll end up with something we can pass to log. In other words the evaluation of the formula will be trouble free.
If you have the function log1p (log1p(x) = log(1+x)) then you could gain some accuracy by omitting the (just one!) i where L[i] == M from the sum, and passing the sum to log1p instead of log.
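A small Python sketch of that computation (scipy.special.logsumexp implements the same idea, if SciPy is available; the classifier-specific names in the trailing comments are assumptions):
import numpy as np

def log_sum_exp(L):
    # log(sum_i exp(L[i])) computed without overflow/underflow
    L = np.asarray(L, dtype=float)
    M = L.max()
    return M + np.log(np.sum(np.exp(L - M)))

# In the classifier: one log-product per class, then the log of their sum, e.g.
# log_products[j] = sum(log(wordprob[i, j]) for i in words)
# log_total = log_sum_exp(log_products)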
Your question seems to be on the math side of things rather than the coding of it.
I haven't quite figured out what your issue is, but the sum of logs equals the log of the product. Don't know if that helps...
Also, you are calculating one prob_product for every j but you are just using the last one (and you are re-initializing it). You meant to do one of two things: either initialize it before the j-loop or use it before you increment j. Finally, it doesn't look like you need to initialize sum unless this is part of yet another loop you are not showing here.
That's all I have for now.
Sorry for the long post and no code.
High school algebra tells you this:
log(A*B*....*Z) = log(A) + log(B) + ... + log(Z) != log(A + B + .... + Z)
I have a 1D array of data and wish to extract the spatial variation. The standard way to do this, which I wish to pythonize, is to perform a moving linear regression on the data and save the gradient...
def nssl_kdp(phidp, distance, fitlen):
    kdp = zeros(phidp.shape, dtype=float)
    myshape = kdp.shape
    for swn in range(myshape[0]):
        print "Sweep ", swn+1
        for rayn in range(myshape[1]):
            print "ray ", rayn+1
            small = [polyfit(distance[a:a+2*fitlen], phidp[swn, rayn, a:a+2*fitlen], 1)[0] for a in xrange(myshape[2]-2*fitlen)]
            kdp[swn, rayn, :] = array((list(itertools.chain(*[fitlen*[small[0]], small, fitlen*[small[-1]]]))))
    return kdp
This works well but is SLOW... I need to do this 17*360 times...
I imagine the overhead is in the iterator in the [... for a in xrange(...)] line... Is there an implementation of a moving fit in numpy/scipy?
The calculation for linear regression is based on sums of various values, so you could write a more efficient routine that modifies those sums as the window moves (adding one point and subtracting an earlier one).
This will be much more efficient than repeating the whole process every time the window shifts, but it is open to rounding errors, so you would need to restart the running sums occasionally.
You can probably do better than this for equally spaced points by pre-calculating all the x dependencies, but I don't understand your example in detail, so I am unsure whether that's relevant. I'll just assume that it is.
The slope is (NΣXY - (ΣX)(ΣY)) / (NΣX² - (ΣX)²) - http://easycalculation.com/statistics/learn-regression.php
For evenly spaced data the denominator is fixed (since you can shift the x axis to the start of the window without changing the gradient). The (ΣX) in the numerator is also fixed (for the same reason). So you only need to be concerned with ΣXY and ΣY. The latter is trivial: just add and subtract a value. The former decreases by the ΣY of the points that stay in the window (each X weighting decreases by 1, and the dropped point had weight 0) and increases by (N-1) times the newly added Y (assuming x_0 is 0 and x_{N-1} is N-1) at each step.
I suspect that's not clear. What I am saying is that the formula for the slope does not need to be completely recalculated at each step, particularly because, at each step, you can rename the X values as 0, 1, ..., N-1 without changing the slope. So almost everything in the formula is the same. All that changes are two terms, which depend on Y as Y_0 "drops out" of the window and Y_N "moves in".
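Here is a rough NumPy sketch of that incremental update for evenly spaced data (names are mine; the occasional restart against rounding drift mentioned above is omitted for brevity):
import numpy as np

def moving_slope(y, window):
    # Least-squares slope over each length-`window` segment of evenly spaced data,
    # updating the running sums instead of refitting every window.
    y = np.asarray(y, dtype=float)
    n = window
    x = np.arange(n)
    sx, sxx = x.sum(), (x * x).sum()   # fixed, because x is renamed 0..n-1 in every window
    denom = n * sxx - sx * sx
    sy = y[:n].sum()
    sxy = (x * y[:n]).sum()
    slopes = np.empty(len(y) - n + 1)
    slopes[0] = (n * sxy - sx * sy) / denom
    for t in range(1, len(slopes)):
        y_old, y_new = y[t - 1], y[t + n - 1]
        sxy = sxy - (sy - y_old) + (n - 1) * y_new   # every weight drops by 1; new point enters at x = n-1
        sy = sy - y_old + y_new
        slopes[t] = (n * sxy - sx * sy) / denom
    return slopes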
I've used these moving window functions from the somewhat old scikits.timeseries module with some success. They are implemented in C, but I haven't managed to use them in a situation where the moving window varies in size (not sure if you need that functionality).
http://pytseries.sourceforge.net/lib.moving_funcs.html
Head here for downloads (if using Python 2.7+, you'll probably need to compile the extension itself -- I did this for 2.7 and it works fine):
http://sourceforge.net/projects/pytseries/files/scikits.timeseries/0.91.3/
I/we might be able to help you more if you clean up your example code a bit. I'd consider defining some of the arguments/objects in lines 7 and 8 (where you're defining 'small') as variables, so that you don't end row 8 with so many hard-to-follow parentheses.
Ok.. I have what seems to be a solution.. not an answer per se, but a way of doing a moving, multi-point differential... I have tested this and the result looks very, very similar to a moving regression... I used a 1D Sobel filter (a ramp from -1 to 1 convolved with the data):
def KDP(phidp, dx, fitlen):
    kdp = np.zeros(phidp.shape, dtype=float)
    myshape = kdp.shape
    for swn in range(myshape[0]):
        #print "Sweep ", swn+1
        for rayn in range(myshape[1]):
            #print "ray ", rayn+1
            kdp[swn, rayn, :] = sobel(phidp[swn, rayn, :], window_len=fitlen)/dx
    return kdp

def sobel(x, window_len=11):
    """Sobel differential filter for calculating KDP
    output:
        differential signal (unscaled for gate spacing)
    """
    s = np.r_[x[window_len-1:0:-1], x, x[-1:-window_len:-1]]
    #print(len(s))
    w = 2.0*np.arange(window_len)/(window_len-1.0) - 1.0
    #print w
    w = w/(abs(w).sum())
    y = np.convolve(w, s, mode='valid')
    return -1.0*y[window_len/2:len(x)+window_len/2]/(window_len/3.0)
this runs QUICK!