Linear Dependence of Set of Vectors in numpy - python

I want to use numpy to check whether some vectors are linearly dependent on each other or not. I found some good suggestions for checking linear dependency of the rows of a matrix in the link below:
How to find linearly independent rows from a matrix
I could not follow the 'Cauchy-Schwarz inequality' method, which I think is due to a gap in my knowledge, but I tried the eigenvalue method to check linear dependency among the columns, and here is my code:
import numpy as np

A = np.array([
[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 1, 1, 0],
[1, 0, 0, 1]
])
lambdas, V = np.linalg.eig(A)
print(lambdas)
print(V)
and I get:
[ 1. 0. 1.61803399 -0.61803399]
[[ 0. 0.70710678 0.2763932 -0.7236068 ]
[ 0. 0. 0.4472136 0.4472136 ]
[ 0. 0. 0.7236068 -0.2763932 ]
[ 1. -0.70710678 0.4472136 0.4472136 ]]
My question is: how do these eigenvalues and eigenvectors relate to the dependency of the columns of my matrix? How can I tell from these values which columns are dependent on each other and which are independent?

The second column of V is the eigenvector corresponding to the eigenvalue 0.
Just take a look at the API documentation when you get confused.
v : (…, M, M) array
The normalized (unit “length”) eigenvectors, such that the column
v[:,i] is the eigenvector corresponding to the eigenvalue w[i].
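To make this concrete: an eigenvector with eigenvalue 0 satisfies A @ v == 0, so its entries are the coefficients of a combination of the columns of A that sums to the zero vector. A small sketch of how you could pull that out (this only works for square matrices, and np.isclose's tolerance is an arbitrary choice):

import numpy as np

A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 0, 0, 1]])

lambdas, V = np.linalg.eig(A)
for lam, v in zip(lambdas, V.T):
    # eigenvalue ~ 0  =>  A @ v ~ 0, i.e. v gives a vanishing combination of A's columns
    if np.isclose(lam, 0):
        print("dependency coefficients:", np.round(v, 4))

For your matrix this prints roughly [0.7071, 0, 0, -0.7071], i.e. column 0 and column 3 are equal.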

You can find the linearly independent columns by QR decomposition as described here.
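A rough sketch of that idea with plain numpy (the 1e-10 tolerance is an arbitrary choice, and this simple test can be fooled in degenerate cases; scipy.linalg.qr with pivoting=True is a more robust alternative):

import numpy as np

A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 0, 0, 1]], dtype=float)

Q, R = np.linalg.qr(A)
# a (numerically) zero diagonal entry of R flags a column that is a
# linear combination of the columns to its left
independent = np.abs(np.diag(R)) > 1e-10
print(np.flatnonzero(independent))   # indices of the independent columns

Here column 3 is flagged as dependent (it is identical to column 0).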

Related

How to turn an affine transform matrix into a perspective transform matrix?

From the code
rotation = cv2.getRotationMatrix2D((0, 0), 47.65, 1.0)
I got a rotation transform matrix like:
[[ 0.67365771 0.7390435 0. ]
[-0.7390435 0.67365771 0. ]]
Since rotation is a special case of affine transform, I think this is a valid affine transform matrix, am I right?
Since affine transform is a special case of perspective transform, I also think this matrix will be a valid perspective transform matrix, if I make some modification based on it.
So I tried to add 1 more row to make it shape as 3 x 3.
newrow = numpy.array([numpy.array([1, 1, 1])]) # [[1 1 1]]
rotation3 = numpy.append(rotation, newrow, axis=0)
print(rotation3):
[[ 0.67365771 0.7390435 0. ]
[-0.7390435 0.67365771 0. ]
[ 1. 1. 1. ]]
But rotation3 does not seem to work properly as a perspective matrix, here is how I tested it:
rotated_points = cv2.perspectiveTransform(points, rotation3)
rotated_points does not look like a rotation of points.
Is [1, 1, 1] the correct third row? Should I also change rows 1 and 2, and if so, how can I do it?
Basically you are right: the affine transform is a special case of the perspective transform.
A perspective transform with an identity matrix leaves the input unchanged:
(identity 3x3 matrix)
[1,0,0]
[0,1,0]
[0,0,1]
So if you want to grow an affine transformation matrix into a perspective one, you append the last row of this identity matrix.
Your example would look like:
[ 0.67365771 0.7390435 0. ]
[-0.7390435 0.67365771 0. ]
[ 0. 0. 1. ]
Applying the above perspective matrix has the same effect as applying the affine transform:
[ 0.67365771 0.7390435 0. ]
[-0.7390435 0.67365771 0. ]
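A minimal sketch of that fix (the test points here are made up, and cv2.transform is only used as a cross-check against the plain affine result):

import numpy as np
import cv2

rotation = cv2.getRotationMatrix2D((0, 0), 47.65, 1.0)   # 2x3 affine matrix
rotation3 = np.vstack([rotation, [0.0, 0.0, 1.0]])        # append the identity's last row

# cv2.perspectiveTransform expects points of shape (N, 1, 2)
points = np.array([[[10.0, 0.0]], [[0.0, 10.0]], [[5.0, 5.0]]])
print(cv2.perspectiveTransform(points, rotation3))
print(cv2.transform(points, rotation))   # same result, via the 2x3 affine matrix

Because the last row is [0, 0, 1], the perspective division divides by 1 and the result is exactly the affine transform.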
Have a look at:
Affine transformation (Wikipedia)
OpenCV: generate identity matrix
Identity matrix (Wikipedia)

How to solve a large set of multivariate compound linear inequality in Python?

I'm trying to implement the Dinur-Nissim algorithm and am stuck on how to solve a set of linear inequalities with multiple unknowns, a large number of inequalities, and additional constraints.
Example:
0.2<=c4<=0.66
0.66<=c3<=1.56
0.96<=c3+c4<=2.26
Constraints:
0<=ci<=1
and many more such inequalities, with the number of unknowns going up to cn, where n is the size of the database, so I need a solution that works for a large number of inequalities.
I've looked for libraries, but most of them solve maximization or minimization problems, so I'm not sure whether it's possible to convert these inequalities into one of those problems.
A simple approach uses scipy's linprog (linear programming; LP is probably the most specific, and therefore most powerful, optimization problem type usable here):
Code
from scipy.optimize import linprog

c = [0, 0, 0, 0]                      # empty objective: we only need a feasible point
A = [[0, 0, 0, -1], [0, 0, 0, 1],     # 0.2  <= c4       <= 0.66
     [0, 0, -1, 0], [0, 0, 1, 0],     # 0.66 <= c3       <= 1.56
     [0, 0, -1, -1], [0, 0, 1, 1]]    # 0.96 <= c3 + c4  <= 2.26
b = [-0.2, 0.66, -0.66, 1.56, -0.96, 2.26]
result = linprog(c, A, b, bounds=(0, 1))
print(result)
Output
fun: -0.0
message: 'Optimization terminated successfully.'
nit: 3
slack: array([ 0.1 , 0.9 , 1.3 , 1. , 1. , 0.34, 0.7 , 0. , 0. , 0. ])
status: 0
success: True
x: array([ 0. , 0. , 0.66, 0.3 ])
The above is a basic usage of linprog:
we don't need an objective, so all coefficients are kept at zero (see c)
we need the form Ax <= b for our inequalities, so lower bounds get negated, e.g.:
0.96 <= c3 + c4  <=>  c3 + c4 >= 0.96  <=>  -c3 - c4 <= -0.96
Keep in mind that linprog is not as stable as commercial solvers. You could also solve this problem with SLSQP.
The above, in combination with your description:
basically the last step involves making a decision based on the value of c, if ci>1/2 then xi=1 else xi=0, so I just need to find the region of c's which satisfy the inequalities
does not make much sense in the general case: the optimization described above returns some feasible point, and without additional modelling the solver does not care about your threshold of 0.5. So you should check your theory again (I did not look into the algorithm you are implementing; maybe the nature of the problem allows this approach).
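If you need to generate many of these constraints automatically (your c1..cn case), here is a hedged sketch of how the A/b rows could be built programmatically; compound_rows is a made-up helper, not part of scipy:

import numpy as np
from scipy.optimize import linprog

def compound_rows(idx, lo, hi, n):
    # encode  lo <= sum(c[i] for i in idx) <= hi  as two rows of A_ub x <= b_ub
    row = np.zeros(n)
    row[idx] = 1.0
    return np.vstack([-row, row]), np.array([-lo, hi])

n = 4
constraints = [([3], 0.2, 0.66),       # 0.2  <= c4       <= 0.66
               ([2], 0.66, 1.56),      # 0.66 <= c3       <= 1.56
               ([2, 3], 0.96, 2.26)]   # 0.96 <= c3 + c4  <= 2.26

blocks = [compound_rows(idx, lo, hi, n) for idx, lo, hi in constraints]
A_ub = np.vstack([a for a, _ in blocks])
b_ub = np.concatenate([b for _, b in blocks])

res = linprog(np.zeros(n), A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * n)
print(res.success, res.x)   # a feasible point, if one exists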

affinity propagation in python

I am seeing something strange while using AffinityPropagation from sklearn. I have a 4 x 4 numpy ndarray, which is basically the affinity scores: sim[i, j] holds the affinity score of (i, j). Now, when I feed it into the AffinityPropagation function, I get a total of 4 labels.
Here is a similar example with a smaller matrix:
In [215]: x = np.array([[1, 0.2, 0.4, 0], [0.2, 1, 0.8, 0.3], [0.4, 0.8, 1, 0.7], [0, 0.3, 0.7, 1]]
.....: )
In [216]: x
Out[216]:
array([[ 1. , 0.2, 0.4, 0. ],
[ 0.2, 1. , 0.8, 0.3],
[ 0.4, 0.8, 1. , 0.7],
[ 0. , 0.3, 0.7, 1. ]])
In [217]: clusterer = cluster.AffinityPropagation(affinity='precomputed')
In [218]: f = clusterer.fit(x)
In [219]: f.labels_
Out[219]: array([0, 1, 1, 1])
This says (according to Kevin) that the first sample (0th-indexed row) is a cluster (cluster #0) on its own and the rest of the samples are in another cluster (cluster #1). But I still do not understand this output. What is a sample here? What are the members? I want one set of pairs (i, j) assigned to one cluster, another set of pairs assigned to another cluster, and so on.
It looks like a 4-sample x 4-feature matrix, which I do not want. Is this the problem? If so, how do I convert this to a proper 4-sample x 4-sample affinity matrix?
The documentation (http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AffinityPropagation.html) says
fit(X, y=None)
Create affinity matrix from negative euclidean distances, then apply affinity propagation clustering.
Parameters:
X: array-like, shape (n_samples, n_features) or (n_samples, n_samples) :
Data matrix or, if affinity is precomputed, matrix of similarities / affinities.
Thanks!
From your description it sounds like you are working with a pairwise similarity matrix x (although your example data does not show that). If this is the case, your matrix should be symmetric, so that sim[i,j] == sim[j,i], with the diagonal values equal to 1. Example similarity data S:
S
array([[ 1. , 0.08276253, 0.16227766, 0.47213595, 0.64575131],
[ 0.08276253, 1. , 0.56776436, 0.74456265, 0.09901951],
[ 0.16227766, 0.56776436, 1. , 0.47722558, 0.58257569],
[ 0.47213595, 0.74456265, 0.47722558, 1. , 0.87298335],
[ 0.64575131, 0.09901951, 0.58257569, 0.87298335, 1. ]])
Typically when you already have a distance matrix you should use affinity='precomputed'. But in your case, you are working with similarities. In this specific example you can transform to a pseudo-distance using 1 - S. (The reason to do this is that I don't know whether Affinity Propagation will give you the expected results if you give it a similarity matrix as input):
1 - S
array([[ 0. , 0.91723747, 0.83772234, 0.52786405, 0.35424869],
[ 0.91723747, 0. , 0.43223564, 0.25543735, 0.90098049],
[ 0.83772234, 0.43223564, 0. , 0.52277442, 0.41742431],
[ 0.52786405, 0.25543735, 0.52277442, 0. , 0.12701665],
[ 0.35424869, 0.90098049, 0.41742431, 0.12701665, 0. ]])
With that being said, I think this is where your interpretation was off:
This says that the first 3-rows are similar, 4th row is a cluster on its own, and the 5th row is also a cluster on its own. Totally of 3 clusters.
The f.labels_ array:
array([0, 1, 1, 1, 0])
is telling you that samples (not rows) 0 and 4 are in cluster 0, and that samples 1, 2, and 3 are in cluster 1. You don't need 25 different labels for a 5-sample problem; that wouldn't make sense. Hope this helps a little. Try the demo (inspect the variables along the way and compare them with your data), which starts with raw data; it should help you decide whether Affinity Propagation is the right clustering algorithm for you.
According to this page https://scikit-learn.org/stable/modules/clustering.html
you can use a similarity matrix for AffinityPropagation.
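For reference, a minimal sketch of feeding a precomputed similarity matrix straight to AffinityPropagation, using the 4 x 4 example from the question (whether the resulting clustering is meaningful still depends on how your similarities were computed):

import numpy as np
from sklearn.cluster import AffinityPropagation

S = np.array([[1.0, 0.2, 0.4, 0.0],
              [0.2, 1.0, 0.8, 0.3],
              [0.4, 0.8, 1.0, 0.7],
              [0.0, 0.3, 0.7, 1.0]])

# with affinity='precomputed', fit() treats S directly as the similarity matrix
ap = AffinityPropagation(affinity='precomputed').fit(S)
print(ap.labels_)   # one cluster label per sample, i.e. per row/column of S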

numpy mean of rows when speed is a concern

I want to compute the mean of each row of a numpy matrix. So for the input:
array([[ 1, 1, -1],
[ 2, 0, 0],
[ 3, 1, 1],
[ 4, 0, -1]])
my output will be:
array([[ 0.33333333],
[ 0.66666667],
[ 1.66666667],
[ 1. ]])
I came up with the solution result = np.array([[x] for x in np.mean(my_matrix, axis=1)]), but this function will be called many times on matrices of 40 rows by 10-300 columns, so I would like to make it faster, and this implementation seems slow.
You can do something like this:
>>> my_matrix.mean(axis=1)[:,np.newaxis]
array([[ 0.33333333],
[ 0.66666667],
[ 1.66666667],
[ 1. ]])
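If your numpy is recent enough, keepdims gives the same shape without the extra indexing step (an alternative spelling, not necessarily faster):

import numpy as np

my_matrix = np.array([[1, 1, -1],
                      [2, 0, 0],
                      [3, 1, 1],
                      [4, 0, -1]])

# keepdims=True keeps the reduced axis as length 1, so the result is already (4, 1)
print(my_matrix.mean(axis=1, keepdims=True))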
If the matrices are fresh and independent there isn't much you can save because the only way to compute the mean is to actually sum the numbers.
If however the matrices are obtained from partial views of a single fixed dataset (e.g. you're computing a moving average), then you can use a sum table. For example, after:
st = data.cumsum(0)
you can compute the average of the elements between index x0 and x1 with
avg = (st[x1] - st[x0]) / (x1 - x0)
in O(1) (i.e. the computing time doesn't depend on how many elements you are averaging).
You can even use numpy to compute an array with the moving averages directly with:
res = (st[n:] - st[:-n]) / n
This approach can even be extended to higher dimensions like computing the average of the values in a rectangle in O(1) with
st = data.cumsum(0).cumsum(1)
rectsum = (st[y1][x1] + st[y0][x0] - st[y0][x1] - st[y1][x0])
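A minimal runnable sketch of the sum-table idea for the 1-D (rows) case; the data here is made up, and a zero row is prepended to the cumulative sum so the indexing convention is explicit:

import numpy as np

data = np.random.rand(1000, 5)   # hypothetical fixed dataset
n = 40                           # window size

# st[k] = sum of the first k rows of data
st = np.vstack([np.zeros((1, data.shape[1])), data.cumsum(axis=0)])

# mean of rows x0 .. x1-1 in O(1), independent of the window length
x0, x1 = 100, 140
avg = (st[x1] - st[x0]) / (x1 - x0)
print(np.allclose(avg, data[x0:x1].mean(axis=0)))   # True

# all moving averages over n consecutive rows at once
moving = (st[n:] - st[:-n]) / n   # moving[i] == data[i:i+n].mean(axis=0)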

Covariance Matrix calculated by Python Numpy change every time

I have a 1043*261 matrix with very small numbers between 0 and 1, and I calculated a 1043*1043 covariance matrix using the numpy.cov() function. I ran the code a few times and got similar (not exactly the same) covariance matrices, but the elements of the covariance matrices differed slightly, on the order of 1e-7. This sometimes makes the covariance matrix non-PSD, which causes serious problems for me.
Does anyone know why the differences would exist and how to solve it?
Attached are two covariance matrices I got by running the same code twice. If you compare them by element, you will see slight differences:
No. 1
[[ 5.05639177e-06 2.44041401e-06 3.30187175e-06 ..., 1.66634014e-06
4.03972183e-06 1.18433575e-06]
[ 2.44041401e-06 9.67277658e-06 9.04356309e-06 ..., 2.50668884e-06
5.43371939e-06 4.74297546e-06]
[ 3.30187175e-06 9.04356309e-06 2.09334309e-05 ..., 3.13977728e-06
8.69946165e-06 6.15981652e-06]
...,
[ 1.66634014e-06 2.50668884e-06 3.13977728e-06 ..., 4.20175297e-06
4.16076781e-06 1.59827406e-06]
[ 4.03972183e-06 5.43371939e-06 8.69946165e-06 ..., 4.16076781e-06
2.58010941e-05 3.02797946e-06]
[ 1.18433575e-06 4.74297546e-06 6.15981652e-06 ..., 1.59827406e-06
3.02797946e-06 6.60805238e-06]]
No.2
[[ 5.05997030e-06 2.42187179e-06 3.30788097e-06 ..., 1.66495376e-06
4.03676937e-06 1.17413702e-06]
[ 2.42187179e-06 9.60677140e-06 9.05219266e-06 ..., 2.50338648e-06
5.42679569e-06 4.75547515e-06]
[ 3.30788097e-06 9.05219266e-06 2.04172017e-05 ..., 3.13058624e-06
8.67976701e-06 6.28137859e-06]
...,
[ 1.66495376e-06 2.50338648e-06 3.13058624e-06 ..., 4.20175297e-06
4.16076781e-06 1.59827884e-06]
[ 4.03676937e-06 5.42679569e-06 8.67976701e-06 ..., 4.16076781e-06
2.58010941e-05 3.02810307e-06]
[ 1.17413702e-06 4.75547515e-06 6.28137859e-06 ..., 1.59827884e-06
3.02810307e-06 6.63834973e-06]]
Thank you very much!
numpy.cov seems to be deterministic:
import numpy
randoms = numpy.random.random((1043, 261))
covs = [numpy.cov(randoms) for _ in range(10)]
all((c==covs[0]).all() for c in covs)
#>>> True
I'd imagine the problem is elsewhere.
Also note that this result holds with numbers a thousandth of the size:
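small = randoms / 1000.0
covs_small = [numpy.cov(small) for _ in range(10)]
all((c == covs_small[0]).all() for c in covs_small)
#>>> True

(This just repeats the determinism check above on the scaled-down inputs.)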
