What is the most idiomatic way to produce a cumulative sum which "fades out" as it moves along? Let me explain with an example.
>>> np.array([1,0,-1,0,0]).cumsum()
array([1, 1, 0, 0, 0], dtype=int32)
But I would like to provide a factor <1 and produce something like:
>>> np.array([1,0,-1,0,0]).cumsum_with_factor(0.5)
array([1.0, 0.5, -0.75, -0.375, -0.1875], dtype=float64)
It's a big plus if it's fast!
Your result can be obtained by linear convolution:
signal = np.array([1,0,-1,0,0])
kernel = 0.5**np.arange(5)
np.convolve(signal, kernel, mode='full')
# array([ 1.    ,  0.5   , -0.75  , -0.375 , -0.1875, -0.125 , -0.0625,
#         0.    ,  0.    ])
If performance is a consideration, use scipy.signal.fftconvolve, which implements the same operation via the FFT and is faster for long inputs.
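For the original example, the trimmed result looks like this (a small sketch; the slice back to len(signal) is my own addition so the output lines up with the desired "fading" cumulative sum):

import numpy as np
from scipy.signal import fftconvolve

signal = np.array([1, 0, -1, 0, 0], dtype=float)
kernel = 0.5 ** np.arange(len(signal))   # fading weights 1, 0.5, 0.25, ...

# keep only the first len(signal) entries of the full convolution
faded = np.convolve(signal, kernel, mode='full')[:len(signal)]
# array([ 1.    ,  0.5   , -0.75  , -0.375 , -0.1875])

# same result via the FFT-based implementation (useful for long signals)
faded_fft = fftconvolve(signal, kernel, mode='full')[:len(signal)]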
Sorry if this post is a duplicate, I couldn't find an answer... I have the following code:
import numpy as np
V = np.array([[6, 10, 0],
              [2, 5, 0],
              [0, 0, 0]])
subarr = np.array([[V[0][0], V[0][1]], [V[1][0], V[1][1]]])
det = np.linalg.det(subarr)
cross = np.cross(V[0], V[1])
print(f"Det: {det}")
print(f"Cross: {cross}")
I would expect det to return 10.0 and cross to return [0, 0, 10], the last number being equal to the determinant. For some reason, Python returns
Det: 10.000000000000002
Cross: [ 0 0 10]
Can someone please explain why?
What you're seeing is floating-point inaccuracy.
And in case you're wondering how you end up with floats when taking the determinant of a matrix of integers (where the usual hand calculation is just 6*5 - 2*10 = 10): np.linalg.det uses LU decomposition to find the determinant. That isn't particularly efficient for a 2x2 matrix, but it is much more efficient for larger ones.
For your 2x2 submatrix A = [[6, 10], [2, 5]], you get:
scipy.linalg.lu(A, 1)
Out:
(array([[ 1.        ,  0.        ],
        [ 0.33333333,  1.        ]]),
 array([[ 6.        , 10.        ],
        [ 0.        ,  1.66666667]]))
The determinant is then just the product of the diagonal entries of U, roughly 6 * 1.66666667, which comes out as 10.000000000000002 rather than exactly 10 because of floating-point rounding.
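As a quick sanity check (my own sketch, not part of the original answer), you can reproduce the determinant from the LU factors and compare it with np.linalg.det:

import numpy as np
from scipy.linalg import lu

A = np.array([[6, 10],
              [2, 5]])
pl, u = lu(A, permute_l=True)    # returns (P @ L, U)
print(np.prod(np.diag(u)))       # product of U's diagonal: close to 10, but not exactly 10
print(np.linalg.det(A))          # 10.000000000000002, as reported in the question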
I read this:
https://matthew-brett.github.io/transforms3d/gimbal_lock.html
It gives an example of gimbal lock:
>>> import numpy as np
>>> np.set_printoptions(precision=3, suppress=True) # neat printing
>>> from transforms3d.euler import euler2mat, mat2euler
>>> x_angle = -0.2
>>> y_angle = -np.pi / 2
>>> z_angle = -0.2
>>> R = euler2mat(x_angle, y_angle, z_angle, 'sxyz')
>>> R
array([[ 0.   ,  0.389, -0.921],
       [-0.   ,  0.921,  0.389],
       [ 1.   , -0.   ,  0.   ]])
Then I tried this:
http://kieranwynn.github.io/pyquaternion/
from pyquaternion import Quaternion
import numpy as np

q1 = Quaternion(axis=[1, 0, 0], angle=-0.2)
q2 = Quaternion(axis=[0, 1, 0], angle=-np.pi / 2)
q3 = Quaternion(axis=[0, 0, 1], angle=-0.2)
q4 = q3 * q2 * q1
q4.rotation_matrix
array([[ 0.        ,  0.38941834, -0.92106099],
       [ 0.        ,  0.92106099,  0.38941834],
       [ 1.        ,  0.        ,  0.        ]])
It gives the same gimbal lock.
So why can quaternions prevent gimbal lock?
Gimbal lock can occur when you perform three separate rotations about three separate axes, which is exactly what every Euler-angle rotation does. For any sequence of rotations about several axes, there is always an equivalent single rotation about a single axis. Quaternions avoid gimbal lock by letting you work with that single equivalent rotation directly, rather than a set of three rotations that, applied in the wrong order, can lock up.
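As a rough illustration (a sketch reusing the pyquaternion objects from above, not something taken from the linked pages), you can ask the composed quaternion for its single equivalent axis and angle:

from pyquaternion import Quaternion
import numpy as np

q1 = Quaternion(axis=[1, 0, 0], angle=-0.2)
q2 = Quaternion(axis=[0, 1, 0], angle=-np.pi / 2)
q3 = Quaternion(axis=[0, 0, 1], angle=-0.2)
q4 = q3 * q2 * q1

# q4 is one rotation about one axis; there are no intermediate axes to line up,
# so composing further rotations onto q4 never collapses a degree of freedom.
print(q4.axis, q4.angle)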
I hope this helps, if you need me to clarify anything or have any further questions feel free to ask!
I have a NumPy CSR matrix and I want to get its mean, but it contains a lot of zeros, because I eliminated all values on and below the main diagonal, keeping only the upper-triangle values. Converted to an array, my CSR matrix now looks like this:
[0.         0.         0.         0.         0.         0.         0.
 0.         0.         0.         0.         0.         0.         0.
 0.         0.         0.         0.         0.63646664 0.34827262
 0.24316454 0.1362165  0.63646664 0.15762204 0.31692202 0.12114576
 0.35917146]
As far as I understand, these zeros need to be there for the CSR matrix to work and display things like this:
(0,5) 0.5790418
(3,10) 0.578210
(5,20) 0.912370
(67,5) 0.1093109
I saw that the CSR matrix has its own mean function, but does this mean function take all the zeros into account, i.e. divide by the number of elements in the array including the zeros? I need the mean of only the non-zero values. My matrix contains the similarities between multiple vectors and is more like a list of matrices, something like this:
[[ 0. 0.63646664 0.48492084 0.42134077 0.14366401 0.10909745
0.06172853 0.08116201 0.19100626 0.14517247 0.23814955 0.1899649
0.20181049 0.25663533 0.21003358 0.10436352 0.2038447 1.
0.63646664 0.34827262 0.24316454 0.1362165 0.63646664 0.15762204
0.31692202 0.12114576 0.35917146]
[ 0. 0. 0.58644824 0.4977052 0.15953415 0.46110612
0.42580993 0.3236768 0.48874263 0.44671607 0.59153001 0.57868948
0.27357541 0.51645488 0.43317846 0.50985032 0.37317457 0.63646664
1. 0.51529235 0.56963948 0.51218525 1. 0.38345582
0.55396192 0.32287605 0.46700191]
[ 0. 0. 0. 0.6089113 0.53873289 0.3367261
0.29264493 0.13232082 0.43288206 0.80079927 0.37842518 0.33658945
0.61990095 0.54372307 0.49982101 0.23555037 0.39283379 0.48492084
0.58644824 0.64524906 0.31279271 0.39476181 0.58644824 0.39028705
0.43856802 0.32296735 0.5541861 ]]
So how can I take the mean of only the non-zero values?
My other question is how I can remove all values that are equal to (or bigger than) some value. As I pointed out above, I probably have to turn those values into zeros, but how do I do that? For example, I want to get rid of all values that are equal to 1.0 or bigger.
Here is the code I have up to this point to build my matrix:
vectorized_words = sparse.csr_matrix(vectorize_words(nostopwords, glove_dict))
# calculate the distance/similarity between each pair of vectors in the matrix
cos_similiarity = cosine_similarity(vectorized_words, dense_output=False)
# since there are duplicates like (5,0) and (0,5) which should be removed, use scipy's triu function
coo_cossim = cos_similiarity.tocoo()
vector_similarities = sparse.triu(coo_cossim, k=1).tocsr()
Yes, csr_matrix.mean() does include all of the zeros when calculating the mean. As a simple example:
from scipy.sparse import csr_matrix
m = csr_matrix(([1,1], ([2,3],[3,3])), shape=(5,5))
m.toarray()
# returns:
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 0, 0]], dtype=int32)
# test the mean method
m.mean(), m.mean(axis=0), m.mean(axis=1)
# returns:
0.080000000000000002,
matrix([[ 0. , 0. , 0. , 0.4, 0. ]]),
matrix([[ 0. ],
[ 0. ],
[ 0.2],
[ 0.2],
[ 0. ]])
If you need to perform a calculation that does not include zeros, you will have to build the result with other methods. It is not terribly hard to do though:
nonzero_mean = m.sum() / m.count_nonzero()
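For the second part of the question (getting rid of values at or above a threshold such as 1.0), one possible sketch, assuming the matrix stays in CSR format, is to edit the stored .data array directly and then drop the explicit zeros:

# zero out every stored value >= 1.0, then remove them from the sparse structure
vector_similarities.data[vector_similarities.data >= 1.0] = 0
vector_similarities.eliminate_zeros()

# mean over the remaining non-zero values
nonzero_mean = vector_similarities.sum() / vector_similarities.count_nonzero()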
Consider two urns, E and U. There are holy grails and crappy grails in each of these. Denote the holy ones with H.
Say we draw from both urns, xe times out of E and xu times out of U - how many holy grails are we going to find? This is easy to solve for any single pair (xe, xu), but I'd like to do it for whole grids of values of xe and xu.
What is the most efficient way to do this in Python using standard packages?
Here is my approach.
import numpy as np
import scipy.stats as stats
binomial = stats.binom.pmf
# define the grids of E, U to search
numberOfE = np.arange(3)
numberOfHolyE = np.arange(3)
numberOfU = np.arange(5)
numberOfHolyU = np.arange(5)
# mesh it
E, U, EH, UH = np.meshgrid(numberOfE, numberOfU, numberOfHolyE, numberOfHolyU, indexing='ij')
# independent draws from both urns. Probabilities are 0.9 and 0.1
drawsE = binomial(EH, E, 0.9)
drawsU = binomial(UH, U, 0.1)
# joint probability of being at a specific grid point
prob = drawsE * drawsU
totalHigh = EH + UH
This is how far I've come:
In [77]: prob[1,1,:]
Out[77]:
array([[ 0.09, 0.01, 0. , 0. , 0. ],
[ 0.81, 0.09, 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ]])
In [78]: totalHigh[1,1,:]
Out[78]:
array([[0, 1, 2, 3, 4],
[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6]])
I think these matrices mean the following:
Look at where totalHigh has the value 1: if I draw once from each urn, I have a 0.81 probability of drawing one holy grail from E and none from U, and a 0.01 probability the other way around. That means the total probability of drawing exactly one holy grail, conditional on drawing once from each urn, is 0.82.
Which brings me to my second question:
Conditional on doing it this way, how do I sum up these probabilities efficiently, conditional on the first two dimensions? I effectively want to transform these 4D arrays into 3D arrays.
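One possible way to do that grouping (my own sketch, not from the post): for each (E, U) grid point, use np.bincount with the probabilities as weights, so each value of totalHigh collects the probability mass that lands on it.

n_totals = totalHigh.max() + 1
summed = np.zeros(E.shape[:2] + (n_totals,))
for i in range(E.shape[0]):
    for j in range(E.shape[1]):
        # group the joint probabilities at this grid point by their total count
        summed[i, j] = np.bincount(totalHigh[i, j].ravel(),
                                   weights=prob[i, j].ravel(),
                                   minlength=n_totals)

# summed[1, 1, 1] is then approximately 0.82, matching the hand calculation above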
Hi, I have to increase the number of points inside a vector in order to enlarge it to a fixed size. For example,
for this simple vector
>>> a = np.array([0, 1, 2, 3, 4, 5])
>>> len(a)
# 6
now I want to get a vector of size 11, taking the vector a as the base; the result will be
# array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. ])
EDIT 1
What I need is a function that takes the base vector and the number of values the resulting vector must have, and returns a new vector whose size equals that parameter. Something like
def enlargeVector(vector, size):
.....
return newVector
to use like:
>>> a = np.array([0, 1, 2, 3, 4, 5])
>>> b = enlargeVector(a, 200)
>>> len(b)
# 200
and b contains the result of linear, cubic, or whatever interpolation method.
There are many methods to do this within scipy.interpolate. My favourite is UnivariateSpline, which produces a spline of degree k (continuously differentiable up to order k-1).
To use it:
import numpy as np
from scipy.interpolate import UnivariateSpline

a = np.array([0, 1, 2, 3, 4, 5])  # the base vector from the question
old_indices = np.arange(0, len(a))
new_length = 11
new_indices = np.linspace(0, len(a) - 1, new_length)
spl = UnivariateSpline(old_indices, a, k=3, s=0)
new_array = spl(new_indices)
The s is a smoothing factor that you should set to 0 in this case (since the data are exact).
Note that for the exact problem you specified (since a just increases monotonically by 1), this is overkill: the np.linspace call already gives the desired output on its own.
EDIT: clarified that the length is arbitrary
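Wrapped up in the enlargeVector signature from the question (just a sketch; the function name and the k default are the question's hypothetical interface):

import numpy as np
from scipy.interpolate import UnivariateSpline

def enlargeVector(vector, size, k=3):
    # resample `vector` onto `size` evenly spaced points with a degree-k spline
    old_indices = np.arange(len(vector))
    new_indices = np.linspace(0, len(vector) - 1, size)
    spl = UnivariateSpline(old_indices, vector, k=k, s=0)
    return spl(new_indices)

a = np.array([0, 1, 2, 3, 4, 5])
b = enlargeVector(a, 200)
len(b)  # 200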
As AGML pointed out, there are tools to do this, but how about a pure NumPy solution:
In [20]: a = np.arange(6)
In [21]: temp = np.dstack((a[:-1], a[:-1] + np.diff(a) / 2.0)).ravel()
In [22]: temp
Out[22]: array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])
In [23]: np.hstack((temp, [a[-1]]))
Out[23]: array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. ])