Efficient Implementation of Gaussian Elimination in Python [duplicate]

Is there, somewhere in the cosmos of scipy/numpy/..., a standard method for Gaussian elimination of a matrix?
One finds many snippets via Google, but I would prefer to use a "trusted" module if possible.

I finally found that it can be done using LU decomposition, where the U matrix represents the reduced form of the linear system.
from numpy import array
from scipy.linalg import lu
a = array([[2., 4., 4., 4.],
           [1., 2., 3., 3.],
           [1., 2., 2., 2.],
           [1., 4., 3., 4.]])
pl, u = lu(a, permute_l=True)
Then u reads
array([[ 2.,  4.,  4.,  4.],
       [ 0.,  2.,  1.,  2.],
       [ 0.,  0.,  1.,  1.],
       [ 0.,  0.,  0.,  0.]])
Depending on the solvability of the system, this matrix has an upper triangular or trapezoidal structure. In the above case a row of zeros arises, as the matrix only has rank 3.
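As a quick check (a minimal sketch, assuming the same a as above), numpy can confirm the rank directly, matching the single zero row in u:

import numpy as np
# The number of nonzero rows in u equals the rank of a.
print(np.linalg.matrix_rank(a))  # prints 3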

One function worth checking is scipy.optimize._remove_redundancy, if you wish to remove repeated or redundant equations (the leading underscore marks it as a private SciPy function, so its interface may change between releases):
import numpy as np
import scipy.optimize
a = np.array([[1., 1., 1., 1.],
              [0., 0., 0., 1.],
              [0., 0., 0., 2.],
              [0., 0., 0., 3.]])
print(scipy.optimize._remove_redundancy._remove_redundancy(a, np.zeros_like(a[:, 0]))[0])
which gives:
[[1. 1. 1. 1.]
 [0. 0. 0. 3.]]
As a note to @flonk's answer, using an LU decomposition might not always give the desired reduced row echelon matrix. Example:
import numpy as np
import scipy.linalg
a = np.array([[1., 1., 1., 1.],
              [0., 0., 0., 1.],
              [0., 0., 0., 2.],
              [0., 0., 0., 3.]])
_,_, u = scipy.linalg.lu(a)
print(u)
gives the same matrix:
[[1. 1. 1. 1.]
 [0. 0. 0. 1.]
 [0. 0. 0. 2.]
 [0. 0. 0. 3.]]
even though the last 3 rows are linearly dependent.

You can use the symbolic mathematics Python library SymPy:
import sympy as sp
m = sp.Matrix([[ 1,  2, 1],
               [-2, -3, 1],
               [ 3,  5, 0]])
m_rref, pivots = m.rref() # Compute reduced row echelon form (rref).
print(m_rref, pivots)
This will output the matrix in reduced row echelon form, as well as a tuple of the pivot column indices:
Matrix([[1, 0, -5],
        [0, 1,  3],
        [0, 0,  0]])
(0, 1)
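If you need the result for further numeric work, a small sketch (assuming the m_rref and pivots from above) converts the SymPy matrix back to a numpy array:

import numpy as np
# sympy matrices hold exact symbolic entries; cast to float for numpy use.
m_np = np.array(m_rref).astype(np.float64)
print(m_np)    # [[ 1.  0. -5.], [ 0.  1.  3.], [ 0.  0.  0.]]
print(pivots)  # (0, 1) -- column 2 corresponds to a free variable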


I'd like to know how to calculate the similarity (numerical accuracy) of two numpy arrays in Python

I'm a student who just started deep learning with Python.
First of all, my native language is not English, so my phrasing may be rough; I'm writing through a translator.
I used time-series data in deep learning to create a model that predicts the likelihood of certain situations in the future, and I have completed visualizations using graphs.
But rather than judging the result visually through graphs, I want to quantify the similarity between the train data and the test data as a number.
The two arrays are in the following format:
In [51]: train_r
Out[51]:
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...
Note: this data is composed of 0s and 1s.
In [52]: test_r
Out[52]:
array([[0.        , 0.        , 0.        , ..., 0.03657577, 0.06709877, 0.0569071 ],
       [0.        , 0.        , 0.        , ..., 0.04707848, 0.07826   , 0.0819832 ],
       [0.        , 0.        , 0.        , ..., 0.04467918, 0.07355513, 0.08117414],
       ...
I used the cosine similarity method to measure the agreement between these two arrays, but an error occurred:
from numpy import dot
from numpy.linalg import norm
cos_sim = dot(train_r, test_r)/(norm(train_r)*norm(test_r))
ValueError: shapes (100,24) and (100,24) not aligned: 24 (dim 1) != 100 (dim 0)
So I searched the Internet for a different way, but it didn't help, because most of the results were about string analysis.
How can I calculate the similarity between the two arrays and describe it as a number?
Found the cause.
The error occurs because train_r and test_r each hold 24 columns, and I tried to compute the cosine similarity of all 24 at once.
The solution is simple: select a single column from train_r and test_r and compute the cosine similarity on that pair.
train_c = train_r[:, 12]  # pick a single column (here: column 12)
test_c = test_r[:, 12]
from numpy import dot
from numpy.linalg import norm
cos_sim = dot(train_c, test_c) / (norm(train_c) * norm(test_c)) * 100
print(cos_sim)
95.18094658851624
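If you want the similarity of all 24 columns at once rather than one column at a time, a vectorized sketch (assuming train_r and test_r both have shape (100, 24); an all-zero column would produce nan) is:

import numpy as np
from numpy.linalg import norm

# One cosine similarity per column, computed without a Python loop.
num = np.sum(train_r * test_r, axis=0)            # column-wise dot products
den = norm(train_r, axis=0) * norm(test_r, axis=0)
cos_sim_all = num / den * 100                     # 24 similarity percentages
print(cos_sim_all)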

one-hot encoding of an array of floats using just keras

First off, I am new to Stack Overflow, so if there is a way to improve how I formulate my question, or if I missed something obvious, please do point it out to me!
I am building a classification convolutional network in Keras, where the network is asked to predict which parameter was used to generate the image. The classes are encoded as 5 float values, e.g. a list of the classes may look like this:
[[0.], [0.76666665], [0.5], [0.23333333], [1.]]
I want to one-hot encode these classes, using the keras.utils.to_categorical(y, num_classes=5, dtype='float32') function.
However, it returns the following:
array([[1., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.]], dtype=float32)
It only takes integers as input, and thus it maps all values < 1.0 to 0.
I could circumvent this by multiplying all values by a constant so that they all become integers, and I think the problem can also be solved within scikit-learn, but that sounds like a huge work-around for something that should be trivial to do in Keras alone, which makes me believe I am missing something obvious.
I hope somebody is able to point out a simple alternative using just Keras.
Another option is to use OneHotEncoder from sklearn:
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder(categories='auto')
values = [[0.], [0.76666665], [0.5], [0.23333333], [1.]]  # renamed from `input` to avoid shadowing the built-in
output = encoder.fit_transform(values)
print(values)
print(output.toarray())
Outputs:
[[0.0], [0.76666665], [0.5], [0.23333333], [1.0]]
[[1. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 1. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 1.]]
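As a follow-up (assuming scikit-learn 0.20 or later), the fitted encoder also remembers the category order, so the original float values can be recovered from one-hot rows:

# Map the one-hot rows back to the original float classes.
decoded = encoder.inverse_transform(output)
print(decoded)  # recovers [[0.], [0.76666665], [0.5], [0.23333333], [1.]]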
Due to the continuous nature of floating-point values, it's not advisable to try to one-hot encode them directly. Instead, you should try something like this:
a = {}
classes = []
for item, i in zip(your_array, range(len(your_array))):
    a[str(i)] = item
    classes.append(str(i))
encoded_classes = to_categorical(classes)
The dictionary is there so that you can refer to the actual values later.
EDIT: Updated after comment from nuric.
from keras.utils import to_categorical

your_array = [[0.], [0.76666665], [0.5], [0.23333333], [1.]]
class_values = {}  # keeps the mapping from integer label back to the float value
classes = []
for i, item in enumerate(your_array):
    class_values[str(i)] = item
    classes.append(i)
encoded_classes = to_categorical(classes)
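A quick usage sketch of the above (with your_array from the question): classes ends up as [0, 1, 2, 3, 4], so to_categorical yields a 5x5 identity matrix, and class_values maps each integer label back to its float:

print(encoded_classes)
# [[1. 0. 0. 0. 0.]
#  [0. 1. 0. 0. 0.]
#  [0. 0. 1. 0. 0.]
#  [0. 0. 0. 1. 0.]
#  [0. 0. 0. 0. 1.]]
print(class_values['1'])  # [0.76666665] -- the float behind class label 1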

norm parameters in sklearn.preprocessing.normalize

The sklearn documentation says that "norm" can be either
norm : ‘l1’, ‘l2’, or ‘max’, optional (‘l2’ by default)
The norm to use to normalize each non zero sample (or each non-zero feature if axis is 0).
The documentation about normalization doesn't clearly state how ‘l1’, ‘l2’, or ‘max’ are calculated.
Can anyone clarify these?
Informally speaking, the norm is a generalization of the concept of (vector) length; from the Wikipedia entry:
In linear algebra, functional analysis, and related areas of mathematics, a norm is a function that assigns a strictly positive length or size to each vector in a vector space.
The L2-norm is the usual Euclidean length, i.e. the square root of the sum of the squared vector elements.
The L1-norm is the sum of the absolute values of the vector elements.
The max-norm (sometimes also called infinity norm) is simply the maximum absolute vector element.
As the docs say, normalization here means making our vectors (i.e. data samples) have unit length, so specifying which length (i.e. which norm) is also required.
You can easily verify the above adapting the examples from the docs:
from sklearn import preprocessing
import numpy as np
X = [[ 1., -1.,  2.],
     [ 2.,  0.,  0.],
     [ 0.,  1., -1.]]
X_l1 = preprocessing.normalize(X, norm='l1')
X_l1
# array([[ 0.25, -0.25,  0.5 ],
#        [ 1.  ,  0.  ,  0.  ],
#        [ 0.  ,  0.5 , -0.5 ]])
You can verify by simple visual inspection that, in each row of X_l1, the absolute values of the elements sum up to 1.
X_l2 = preprocessing.normalize(X, norm='l2')
X_l2
# array([[ 0.40824829, -0.40824829,  0.81649658],
#        [ 1.        ,  0.        ,  0.        ],
#        [ 0.        ,  0.70710678, -0.70710678]])
np.sqrt(np.sum(X_l2**2, axis=1)) # verify that L2-norm is indeed 1
# array([ 1., 1., 1.])
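The same check works for norm='max' (a sketch continuing the example above), where each row is divided by its largest absolute element:

X_max = preprocessing.normalize(X, norm='max')
X_max
# array([[ 0.5, -0.5,  1. ],
#        [ 1. ,  0. ,  0. ],
#        [ 0. ,  1. , -1. ]])
np.max(np.abs(X_max), axis=1)  # verify that the max-norm is indeed 1
# array([ 1., 1., 1.])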

implementing euclidean distance based formula using numpy

I am trying to implement this formula in Python using numpy:
fitness = Σ_i min_j dist(C_j, x_i)
(the formula was shown as an image in the original post). Here X is a numpy matrix whose rows x_i are n-dimensional vectors, C is also a numpy matrix whose rows C_j are n-dimensional vectors, and dist(C_j, x_i) is the Euclidean distance between these two vectors.
I implemented it in Python:
import math
import numpy as np

value = 0
for i in range(X.shape[0]):
    min_value = math.inf
    # this inner loop iterates k times (once per row of C)
    for j in range(C.shape[0]):
        distance = np.dot(X[i] - C[j], X[i] - C[j]) ** .5
        min_value = min(min_value, distance)
    value += min_value
fitnessValue = value
But my code's performance is not good enough. I am looking for a faster way to calculate this formula in Python; any idea would be appreciated.
Generally, loops that run a large number of times should be avoided when possible in Python.
Here, there is a SciPy function, scipy.spatial.distance.cdist(C, X), which computes the pairwise distance matrix between C and X. That is to say, if you call distance_matrix = scipy.spatial.distance.cdist(C, X), you have distance_matrix[i, j] = dist(C_i, X_j).
Then, for each j, you want to compute the minimum of dist(C_i, X_j) over all i. You do not need a loop for this either: numpy.min does it for you, if you pass an axis argument.
And finally, the summation of all these minima is done by calling numpy.sum.
This gives code that is much more readable and faster:
import scipy.spatial.distance
import numpy as np

def your_function(C, X):
    distance_matrix = scipy.spatial.distance.cdist(C, X)
    minimum = np.min(distance_matrix, axis=0)
    return np.sum(minimum)
Which returns the same results as your function :)
Hope this helps!
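A small sketch to sanity-check the vectorized version against the loop from the question (the shapes here are arbitrary assumptions):

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1000, 8))  # 1000 points x_i in 8 dimensions
C = rng.random((10, 8))    # 10 centers C_j

fast = your_function(C, X)
# Loop version, directly transcribing the formula for comparison.
slow = sum(min(np.sqrt(np.dot(x - c, x - c)) for c in C) for x in X)
print(np.isclose(fast, slow))  # True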
einsum can also be called into play. Here is a simple small example of a pairwise distance calculation for a small set. Useful if you don't have scipy installed and/or wish to use numpy solely.
>>> a
array([[ 0.,  0.],
       [ 1.,  1.],
       [ 2.,  2.],
       [ 3.,  3.],
       [ 4.,  4.]])
>>> b = a.reshape(np.prod(a.shape[:-1]), 1, a.shape[-1])
>>> b
array([[[ 0.,  0.]],
       [[ 1.,  1.]],
       [[ 2.,  2.]],
       [[ 3.,  3.]],
       [[ 4.,  4.]]])
>>> diff = a - b
>>> dist_arr = np.sqrt(np.einsum('ijk,ijk->ij', diff, diff)).squeeze()
>>> dist_arr
array([[ 0.     ,  1.41421,  2.82843,  4.24264,  5.65685],
       [ 1.41421,  0.     ,  1.41421,  2.82843,  4.24264],
       [ 2.82843,  1.41421,  0.     ,  1.41421,  2.82843],
       [ 4.24264,  2.82843,  1.41421,  0.     ,  1.41421],
       [ 5.65685,  4.24264,  2.82843,  1.41421,  0.     ]])
Array a is a simple 2D array of shape (5, 2); b is just a reshaped to (5, 1, 2) to facilitate the difference calculations for the cdist-style array. The terms are written verbosely since they were extracted from other code. The diff variable is the difference array, and the dist_arr shown is the Euclidean distance. Should you need the squared Euclidean distance for 'closest' determinations, simply remove the np.sqrt term; the final squeeze just removes any dimensions of size 1 from the shape.
cdist is faster for much larger arrays (on the order of thousands of origins and destinations), but einsum is a nice alternative and well documented by others on this site.

How to operate elementwise on a matrix of type scipy.sparse.csr_matrix?

In numpy, if you want to calculate the sine of each entry of a matrix (elementwise), then
a = numpy.arange(0, 27, 3).reshape(3, 3)
numpy.sin(a)
will get the job done! If you want, say, the second power of each entry,
a**2
will do it.
But if you have a sparse matrix, things seem more difficult. At least I haven't figured out a way to do that besides iterating over each entry of a lil_matrix format and operating on it.
I've found this question on SO and tried to adapt this answer, but I was not successful.
The goal is to calculate, elementwise, the square root (or the power 1/2) of a scipy.sparse matrix in CSR format.
What would you suggest?
The following trick works for any operation which maps zero to zero, and only for those operations, because it only touches the non-zero elements. I.e., it will work for sin and sqrt but not for cos.
Let X be some CSR matrix...
>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> X = csr_matrix(np.arange(10).reshape(2, 5), dtype=float)  # note: np.float is deprecated in recent numpy
>>> X.A
array([[ 0.,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.]])
The non-zero elements' values are X.data:
>>> X.data
array([ 1., 2., 3., 4., 5., 6., 7., 8., 9.])
which you can update in-place:
>>> X.data[:] = np.sqrt(X.data)
>>> X.A
array([[ 0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ],
       [ 2.23606798,  2.44948974,  2.64575131,  2.82842712,  3.        ]])
Update: in recent versions of SciPy, you can do things like X.sqrt(), where X is a sparse matrix, to get a new copy with the square roots of the elements of X.
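One caveat worth sketching: for operations that do not map zero to zero (cos, exp, ...), the .data trick silently gives wrong results, because the untouched implicit zeros should have become nonzero. In that case the result is inherently dense, so the honest fallback is to densify first; and for elementwise powers specifically, sparse matrices provide a built-in .power() method:

# cos(0) == 1, so every implicit zero becomes nonzero: the result is dense.
dense_result = np.cos(X.toarray())

# Elementwise power on a sparse matrix, staying sparse:
X_sqrt = X.power(0.5)  # equivalent to the .data trick, applied to the original X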
