Sorry if this post is a duplicate; I couldn't find an answer... I have the following code:
import numpy as np
V = np.array([[6, 10, 0],
              [2, 5, 0],
              [0, 0, 0]])
subarr = np.array([[V[0][0], V[0][1]], [V[1][0], V[1][1]]])
det = np.linalg.det(subarr)
cross = np.cross(V[0], V[1])
print(f"Det: {det}")
print(f"Cross: {cross}")
I would expect det to be 10.0 and the cross product to be [0, 0, 10], with the last component equal to the determinant. For some reason, Python returns
Det: 10.000000000000002
Cross: [ 0  0 10]
Can someone please explain why?
What you're seeing is floating-point inaccuracy.
And in case you're wondering how you end up with floats when finding the determinant of a matrix made up of integers (where the usual calculation method is just 6*5 - 2*10 = 10), np.linalg.det uses LU decomposition to find the determinant. This isn't very efficient for 2x2 matrices, but is much more efficient when you have bigger matrices.
For your 2x2, you get:
scipy.linalg.lu(subarr, permute_l=True)
Out:
(array([[1.        , 0.        ],
        [0.33333333, 1.        ]]),
 array([[ 6.        , 10.        ],
        [ 0.        ,  1.66666667]]))
The determinant is just the product of the diagonal of the upper-triangular factor (the lower-triangular factor has a unit diagonal, and no row swap happens here), i.e. 6.0 * 1.66666667.... Because 5/3 has no exact binary floating-point representation, that product comes out as 10.000000000000002 rather than exactly 10.
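To see where the stray digits come from, the determinant can be reproduced from the LU factors directly. A small sketch, assuming SciPy is installed (the exact trailing digits can vary with the platform and LAPACK build):
import numpy as np
from scipy.linalg import lu

subarr = np.array([[6, 10],
                   [2, 5]])
pl, u = lu(subarr, permute_l=True)  # pl is P @ L, u is upper triangular
det_from_lu = np.prod(np.diag(u))   # det(P) is +1 here, since no rows are swapped
print(det_from_lu)                  # ~10.000000000000002 rather than exactly 10.0
print(np.linalg.det(subarr))        # carries the same kind of tiny error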
I'm implementing the Nearest Centroid Classification algorithm and I'm kind of blocked on how to use numpy.mean in my case.
So suppose I have some spherical datasets X:
[[ 0.39151059  3.48203037]
 [-0.68677876  1.45377717]
 [ 2.30803493  4.19341503]
 [ 0.50395297  2.87076658]
 [ 0.06677012  3.23265678]
 [-0.24135103  3.78044279]
 [-0.05660036  2.37695381]
 [ 0.74210998 -3.2654815 ]
 [ 0.05815341 -2.41905942]
 [ 0.72126958 -1.71081388]
 [ 1.03581142 -4.09666955]
 [ 0.23209714 -1.86675298]
 [-0.49136284 -1.55736028]
 [ 0.00654881 -2.22505305]]
and the labeled vector Y:
[0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 1.]
An example with 100 2D data points gives a scatter plot with two clusters, class 0 in blue and class 1 in red.
The NCC algorithm consists of first calculating the class mean of each class (0 and 1: that's blue and red) and then assigning each new data point to the nearest class centroid.
This is my current function:
def mean_ncc(X, Y):
    # find unique classes
    cids = np.unique(Y)  # [0. 1.]
    # compute class means
    mu = np.zeros((len(cids), X.shape[1]))  # [[0. 0.] [0. 0.]] (when Y has 2 unique labels, 0 and 1)
    for class_idx, class_label in enumerate(cids):
        mu[class_idx, :] = # problem here
    return mu
So here I want an array containing the class means of the '0' (blue) points and the '1' (red) points.
How can I specify which elements of X the mean should be computed over?
I would like to do something like this:
for class_idx, class_label in enumerate(cids):
    mu[class_idx, :] = np.mean(X[<only the elements that have the same class_label>], axis=0)
Is it possible or is there another way to implement this?
You could use something like this:
import numpy as np
tags = [0, 0, 1, 1, 0, 1]
values = [5, 4, 2, 5, 9, 8]
tags_np = np.array(tags)
values_np = np.array(values)
print(values_np[tags_np == 1].mean())  # mean of the values whose tag is 1: (2 + 5 + 8) / 3 = 5.0
EDIT: You will surely need to look more into the axis parameter for the mean function:
import numpy as np
values = [[5, 4],
          [5, 4],
          [4, 3],
          [4, 3]]
values_np = np.array(values)
tags_np = np.array([0, 0, 1, 1])
print(values_np[tags_np == 0].mean(axis=0))  # column-wise mean of the rows tagged 0: [5. 4.]
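Applied to the original mean_ncc function, the same boolean-mask idea could fill in the missing line along these lines (a sketch, assuming X is an (n, 2) array of points and Y a length-n label vector):
import numpy as np

def mean_ncc(X, Y):
    # find the unique class labels
    cids = np.unique(Y)
    # one centroid (class mean) per class
    mu = np.zeros((len(cids), X.shape[1]))
    for class_idx, class_label in enumerate(cids):
        # keep only the rows of X whose label matches, then average column-wise
        mu[class_idx, :] = X[Y == class_label].mean(axis=0)
    return mu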
I have a 10*10 2D numpy array/list where some values are 0 and others are 1. No two 1s can be set diagonally, which means if array[3][4] is 1, then array[2][4], array[4][4], array[3][3] and array[3][5] can't be 1. So I wrote this code:
if arr[i,j]:
    if arr[i+1,j] or arr[i-1,j] or arr[i,j-1] or arr[i,j+1]:
        return False
But the problem is that I can't loop through this code all the way from i = 0 to i = 9, because at i = 0 and i = 9 there will be an index-out-of-range error.
So I had to rewrite the code:
if arr[i,j]:
    if (i>0 and i<9) and (j>0 and j<9):
        if arr[i+1,j] or arr[i-1,j] or arr[i,j-1] or arr[i,j+1]:
            return False
And then I have to write an if-else for i == 0 and j == 0, then for i == 0 and j == 9, then for i == 0 and 0 < j < 9, and some more.
Can anybody suggest a shorter way to solve the problem in one if-else condition, without getting an index-out-of-range error?
You can apply a convolution across your matrix with 0.5 on the kernel's diagonal; wherever the result is 1, there are two 1s on a diagonal at that position in the original array.
example:
array = np.array([[1, 1, 1],
                  [1, 0, 0],
                  [0, 0, 0]])
applying a convolution to the main diagonal direction (\):
from scipy.signal import convolve
convolve(array, np.array([[0.5, 0],[0,0.5]]), mode="valid")
output:
array([[0.5, 0.5],
       [0.5, 0. ]])
There are no 1s, so this direction passes.
Now applying in the other direction (the anti-diagonal, /)
convolve(array, np.array([[0, 0.5],[0.5,0]]), mode="valid")
output:
array([[1. , 0.5],
       [0. , 0. ]])
There is a 1 on the top left square of this convolution, so there are two 1's in the anti-diagonal of the original array in the top left corner.
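Putting both directions together, a check over the whole grid could look like this (a sketch, assuming SciPy is available and arr is a 2D array of 0s and 1s; the function name is illustrative):
import numpy as np
from scipy.signal import convolve

def no_two_ones_on_a_diagonal(arr):
    # convolve with 0.5 on each diagonal direction; a result of exactly 1
    # means two neighbouring cells on that diagonal are both 1
    main = convolve(arr, np.array([[0.5, 0.0], [0.0, 0.5]]), mode="valid")
    anti = convolve(arr, np.array([[0.0, 0.5], [0.5, 0.0]]), mode="valid")
    return not (np.any(main == 1) or np.any(anti == 1))

array = np.array([[1, 1, 1],
                  [1, 0, 0],
                  [0, 0, 0]])
print(no_two_ones_on_a_diagonal(array))  # False: the 1s at [0, 1] and [1, 0] share an anti-diagonal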
Related to this question: How to have negative zero always formatted as positive zero in a python string?
I have the following function that implements Matlab's orth.m using Numpy. I have a docstring test that relies on np.array2string using suppress_small=True, which will make small values round to zero. However, sometimes they round to positive zero and sometimes they round to negative zero, depending on whether the answer comes out as 1e-16 or -1e-17 or similar. Which case happens is based on the SVD decomposition, and can vary from platform to platform or across Python versions depending on which underlying linear algebra solver is used (BLAS, Lapack, etc.)
What's the best way to design the docstring test to account for this?
In the final doctest, sometimes the Q[0, 1] term is -0. and sometimes it's 0.
import doctest
import numpy as np
def orth(A):
    r"""
    Orthogonalization basis for the range of A.

    That is, Q.T @ Q = I, the columns of Q span the same space as the columns of A, and the number
    of columns of Q is the rank of A.

    Parameters
    ----------
    A : 2D ndarray
        Input matrix

    Returns
    -------
    Q : 2D ndarray
        Orthogonalization matrix of A

    Notes
    -----
    #. Based on the Matlab orth.m function.

    Examples
    --------
    >>> import numpy as np

    Full rank matrix

    >>> A = np.array([[1, 0, 1], [-1, -2, 0], [0, 1, -1]])
    >>> r = np.linalg.matrix_rank(A)
    >>> print(r)
    3

    >>> Q = orth(A)
    >>> with np.printoptions(precision=8):
    ...     print(Q)
    [[-0.12000026 -0.80971228  0.57442663]
     [ 0.90175265  0.15312282  0.40422217]
     [-0.41526149  0.5664975   0.71178541]]

    Rank deficient matrix

    >>> A = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]])
    >>> r = np.linalg.matrix_rank(A)
    >>> print(r)
    2

    >>> Q = orth(A)
    >>> print(np.array2string(Q, precision=8, suppress_small=True))  # Sometimes this fails
    [[-0.70710678 -0.        ]
     [ 0.          1.        ]
     [-0.70710678  0.        ]]

    """
    # compute the SVD
    (Q, S, _) = np.linalg.svd(A, full_matrices=False)
    # calculate a tolerance based on the largest singular value (instead of just using a small number)
    tol = np.max(A.shape) * S[0] * np.finfo(float).eps
    # count the singular values that are greater than the calculated tolerance
    r = np.sum(S > tol, axis=0)
    # return the columns corresponding to the non-negligible singular values
    Q = Q[:, np.arange(r)]
    return Q

if __name__ == '__main__':
    doctest.testmod(verbose=False)
You can print the rounded array plus 0.0 to eliminate the -0:
A = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]])
Q = orth(A)
Q[0,1] = -1e-16 # simulate a small floating point deviation
print(np.array2string(Q.round(8)+0.0, precision=8, suppress_small=True))
# [[-0.70710678  0.        ]
#  [ 0.          1.        ]
#  [-0.70710678  0.        ]]
So your docstring should be:
>>> Q = orth(A)
>>> print(np.array2string(Q.round(8)+0.0, precision=8, suppress_small=True)) # guarantee non-negative zeros
[[-0.70710678  0.        ]
 [ 0.          1.        ]
 [-0.70710678  0.        ]]
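The reason the +0.0 trick works is an IEEE 754 rule: adding positive zero to negative zero yields positive zero, so any entry that rounds away to -0. gets its sign normalized before formatting. A quick check with illustrative values:
import numpy as np

x = np.array([-1e-16, 1e-16])
print(x.round(8))        # [-0.  0.]  the negative zero survives the rounding
print(x.round(8) + 0.0)  # [0. 0.]    -0.0 + 0.0 == +0.0 under IEEE 754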
Here's another alternative I came up with, although I think I like rounding the array to the given precision better. In this method, you shift the whole array by some amount that is bigger than the round-off error but smaller than the displayed precision. That way the small numbers will always end up slightly positive.
>>> Q = orth(A)
>>> print(np.array2string(Q + np.full(Q.shape, 1e-14), precision=8, suppress_small=True))
[[-0.70710678  0.        ]
 [ 0.          1.        ]
 [-0.70710678  0.        ]]
What is the most idiomatic way to produce a cumulative sum which "fades" out as it moves along? Let me explain with an example.
>>> np.array([1,0,-1,0,0]).cumsum()
array([1, 1, 0, 0, 0], dtype=int32)
But I would like to provide a factor <1 and produce something like:
>>> np.array([1,0,-1,0,0]).cumsum_with_factor(0.5)
array([1.0, 0.5, -0.75, -0.375, -0.1875], dtype=float64)
It's a big plus if it's fast!
Your result can be obtained by linear convolution:
signal = np.array([1,0,-1,0,0])
kernel = 0.5**np.arange(5)
np.convolve(signal, kernel, mode='full')
# array([ 1.    ,  0.5   , -0.75  , -0.375 , -0.1875, -0.125 , -0.0625,
#         0.    ,  0.    ])
If performance is a consideration, use scipy.signal.fftconvolve, which is a faster implementation of the same logic.
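If you want this wrapped up as a reusable function, here is a minimal sketch (cumsum_with_factor is the hypothetical name from the question, not an existing NumPy method); it simply truncates the full convolution back to the input length:
import numpy as np

def cumsum_with_factor(x, factor):
    # fading cumulative sum: convolve with a geometric kernel and keep
    # only the first len(x) entries of the full convolution
    x = np.asarray(x, dtype=float)
    kernel = factor ** np.arange(len(x))
    return np.convolve(x, kernel)[:len(x)]

print(cumsum_with_factor([1, 0, -1, 0, 0], 0.5))
# [ 1.      0.5    -0.75   -0.375  -0.1875]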
Consider two urns, E and U. There are holy grails and crappy grails in each of these. Denote the holy ones with H.
Say we draw out of both urns, xe times out of E and xu times out of U - how many holy grails are we going to find? This is easily solvable for any pair (xe, xu). But I'd like to do this over grids of values of xe and xu.
What is the most efficient way to do this in Python using standard packages?
Here is my approach.
import numpy as np
import scipy.stats as stats
binomial = stats.binom.pmf
# define the grids of E, U to search
numberOfE = np.arange(3)
numberOfHolyE = np.arange(3)
numberOfU = np.arange(5)
numberOfHolyU = np.arange(5)
# mesh it
E, U, EH, UH = np.meshgrid(numberOfE, numberOfU, numberOfHolyE, numberOfHolyU, indexing='ij')
# independent draws from both urns. Probabilities are 0.9 and 0.1
drawsE = binomial(EH, E, 0.9)
drawsU = binomial(UH, U, 0.1)
# joint probability of being at a specific grid point
prob = drawsE * drawsU
totalHigh = EH + UH
This is how far I've come:
In [77]: prob[1,1,:]
Out[77]:
array([[ 0.09,  0.01,  0.  ,  0.  ,  0.  ],
       [ 0.81,  0.09,  0.  ,  0.  ,  0.  ],
       [ 0.  ,  0.  ,  0.  ,  0.  ,  0.  ]])

In [78]: totalHigh[1,1,:]
Out[78]:
array([[0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6]])
I think these matrices mean the following:
Take a look at where totalHigh has value 1: if I draw once from both urns, I have a 0.81 probability of drawing one holy grail from E and zero from U, and 0.01 the other way around. That means the total probability of drawing one holy grail, conditional on drawing once from each urn, is 0.82.
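That 0.82 can be checked directly for the single pair (1, 1): the total count is the sum of two independent binomials, so its pmf is the convolution of the two individual pmfs. A quick sketch using the same 0.9 and 0.1 probabilities:
import numpy as np
import scipy.stats as stats

xe, xu = 1, 1
pmf_e = stats.binom.pmf(np.arange(xe + 1), xe, 0.9)  # [0.1 0.9]
pmf_u = stats.binom.pmf(np.arange(xu + 1), xu, 0.1)  # [0.9 0.1]
# distribution of the total number of holy grails drawn from both urns
print(np.convolve(pmf_e, pmf_u))                     # [0.09 0.82 0.09]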
Which brings me to my second question:
Conditional on doing it this way, how do I sum up these probabilities efficiently, conditional on the first two dimensions? I effectively want to transform these 4D arrays into 3D arrays.
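For what it's worth, one possible way to do that grouping (a sketch that continues from the prob and totalHigh arrays above) is to sum, for each (E, U) grid point, the joint probabilities that share the same totalHigh value, e.g. with np.bincount:
# collapse the last two axes by grouping on the total number of holy grails
n_totals = totalHigh.max() + 1
summed = np.zeros(prob.shape[:2] + (n_totals,))
for e in range(prob.shape[0]):
    for u in range(prob.shape[1]):
        summed[e, u] = np.bincount(totalHigh[e, u].ravel(),
                                   weights=prob[e, u].ravel(),
                                   minlength=n_totals)
print(summed[1, 1])  # [0.09 0.82 0.09 0.   0.   0.   0.  ] -- the 0.82 from above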