Python - Scipy Multivariate normal generalized to 1 dimension - python

When running y = multivariate_normal(np.zeros(d), np.eye(d)).rvs() we obtain a sample of shape (d,). However, when d=1 we obtain a scalar, which makes sense since it's 1-dimensional. Unfortunately, I have a piece of code that must work for any number of dimensions, including d=1, and that takes the dot product of a d-dimensional vector x with y. This breaks for d=1. How can I fix it?
import numpy as np
from scipy.stats import multivariate_normal as MVN
def mwe_function(d, x):
    """Minimal Working Example"""
    y = MVN(np.zeros(d), np.eye(d)).rvs()
    return x @ y
mwe_function(2, np.ones(2)) # This works
mwe_function(1, np.ones(1)) # This doesn't
IMPORTANT: I want to avoid if statements. One could simply use scipy.stats.norm in that case, but I want to avoid if statements as they would slow down the code.

You can use np.reshape to fix the shape of your sample. Passing -1 as the only dimension lets NumPy infer the length, so the result is always a 1-dimensional array and never a scalar.
import numpy as np
from scipy.stats import multivariate_normal as MVN
def mwe_function(d, x):
    """Minimal Working Example"""
    y = MVN(np.zeros(d), np.eye(d)).rvs().reshape([-1])
    return x @ y
v0 = mwe_function(2, np.ones(2)) # This works
print(v0) # -0.5718013906409207
v1 = mwe_function(1, np.ones(1)) # This works as well :-)
print(v1) # -0.20196038784485093
where .reshape([-1]) does the job.
Personally, I prefer reshaping over using np.atleast_1d, since the effect is directly visible - but in the end it is a matter of taste.
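For comparison, the two approaches can be checked side by side. The sketch below uses a plain NumPy scalar as a stand-in for the d=1 sample (an assumption, made to keep the example independent of SciPy). The names to_vector_reshape and to_vector_atleast are illustrative and not part of any library.

```python
import numpy as np

def to_vector_reshape(sample):
    """Flatten a scalar or array sample into a 1-D array via reshape."""
    return np.asarray(sample).reshape(-1)

def to_vector_atleast(sample):
    """Same idea with np.atleast_1d, which leaves 1-D input unchanged."""
    return np.atleast_1d(sample)

scalar_sample = np.float64(0.3)          # stand-in for a d=1 draw
vector_sample = np.array([0.3, -1.2])    # stand-in for a d=2 draw

for sample in (scalar_sample, vector_sample):
    a = to_vector_reshape(sample)
    b = to_vector_atleast(sample)
    x = np.ones(a.shape[0])
    # Both helpers give a 1-D array, so the dot product works for any d.
    print(a.shape, b.shape, x @ a)
```

Either helper can be applied to the result of .rvs() before the dot product.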

Related

Problem using integration scheme in numpy

I am trying to integrate a function in order to find the mean position. However, when I perform the integration with quad I get problems with dimensions not matching. When I run the function on its own it works without a problem, but when the function is used by the quad integration scheme it gives a dimension mismatch error. I will post the complete code and the error message below, and I hope someone can tell me what's going wrong and how I can fix it.
Please let me know if anything is unclear so I can add more information.
import numpy as np
import time
from scipy.sparse.linalg import eigsh
from scipy.sparse.linalg import spsolve
from scipy.sparse import diags
from scipy.sparse import identity
import scipy.sparse as sp
from scipy.integrate import quad
x0 = 15
v = 16
sigma2 = 5
tmax = 4
def my_gaussian(x, mu=x0, var=5):
    return np.exp(-((x - mu)**2 / (2*var))+(1j*v*x/2))
L = 100
N = 3200
dx = 1/32
x = np.linspace(0, L, N)
func = lambda x: my_gaussian(x)*my_gaussian(x).conjugate()
C,e = quad(func, 0, L)
def task3(x):
    psi_0 = (C**-(1/2))*my_gaussian(x)
    H = (dx**-2)*diags([-1, 2, -1], [-1, 0, 1], shape=(N, N))
    H = sp.lil_matrix(H)
    H[0,0]=0.5*H[0,0]
    H[N-1,N-1]=0.5*H[N-1,N-1]
    lam, phi = eigsh(H, 400, which="SM")
    a = phi.T.dot(psi_0)
    psi = phi.dot(a*np.exp(-1j*lam*tmax))
    return psi*psi.conjugate()
func = lambda x: task3(x)*x
N1,e = quad(func, 0, L)
Initially this seems like a pretty straightforward shape mismatch. If you multiply a 2-d array by a 1-d array, numpy will only broadcast them automatically if the last dimension of the 2-d array is the same size as the 1-d array. Otherwise you have to explicitly reshape the 1-d array to be broadcastable with the 2-d array.
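The broadcasting rule described above can be seen on a small example. This is only an illustrative sketch; the array names and sizes are made up and much smaller than those in the question.

```python
import numpy as np

grid = np.ones((4, 3))          # 2-D array: 4 rows, 3 columns
per_column = np.arange(3)       # length matches the last axis: broadcasts
per_row = np.arange(4)          # length matches the first axis: does not

ok = grid * per_column          # shape (4, 3)

try:
    grid * per_row              # (4, 3) * (4,) cannot be broadcast
    failed = False
except ValueError:
    failed = True

# Reshaping to a column vector makes the shapes compatible.
fixed = grid * per_row.reshape(-1, 1)   # (4, 3) * (4, 1) -> (4, 3)
print(ok.shape, failed, fixed.shape)
```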
To fix the problem we need to know which arrays have the mismatched shapes. I added this to your code to see:
try:
    psi = phi.dot(a * np.exp(-1j*lam*tmax))
except ValueError:
    print('phi.shape:', phi.shape)
    print('a.shape:', a.shape)
    print('lam.shape:', lam.shape)
    raise
The result was
phi.shape: (3200, 400)
a.shape: (400, 3200)
lam.shape: (400,)
So we need to reshape the lam term to be a column vector:
psi = phi.dot(a * np.exp(-1j*lam*tmax).reshape(-1, 1))
This fixes the shape problem. But it doesn't fix the whole problem, because... the output of task3 is then a (3200,3200) array! This is, of course, not useful for an integration routine that expects a function that returns a single scalar.
This last problem is something you'll have to work out on your own, since I have no way of knowing what your goal is.
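Since the question states that the goal is the mean position, one possible route is to skip quad and integrate directly on the grid. The sketch below is an assumption about that goal, not the original poster's method: it uses the plain Gaussian from the question in place of the time-evolved psi, and a simple Riemann sum on the uniform grid.

```python
import numpy as np

x0, v, var = 15, 16, 5
L, N = 100, 3200
x = np.linspace(0, L, N)
dx = x[1] - x[0]                        # actual spacing produced by linspace

psi = np.exp(-((x - x0) ** 2) / (2 * var) + 1j * v * x / 2)
density = (psi * psi.conjugate()).real  # |psi|^2, one value per grid point

norm = np.sum(density) * dx             # normalisation integral
mean_position = np.sum(x * density) * dx / norm
print(mean_position)
```

The same two sums could be applied to a psi vector computed once on the grid, which avoids calling the eigensolver inside the integrand.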

How to find roots for a numpy array

I am wondering how to find roots for an array. What I have now is:
import numpy as np
from scipy.optimize import brentq as find_root
t = np.linspace(0, 100)
def f(x):
    return x ** 2 - t
a = find_root(f, -400, 400)
print(a)
It gives me a TypeError saying:
TypeError: only size-1 arrays can be converted to Python scalars.
I know the reason is that find_root can only take a scalar in its argument. What I want is to make “a” a NumPy array that holds the root of the function for each possible value of t. Does that mean I need to write a loop for find_root? Or do I need to write a loop before I define the function? What’s the easiest way to do it?
Thank you very much for helping.
Yes, in this case it might be easiest to just loop over the arguments.
import numpy as np
from scipy.optimize import brentq as find_root
def f(x, t):
    return x ** 2 - t
a = [find_root(f, 0, 400, args=(i,)) for i in np.linspace(1, 10, 10)]
print(a)
Note that I introduced an argument t to your function f to which you can pass the value using the args parameter of find_root.
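The bracket also changed from [-400, 400] to [0, 400]. brentq needs f to have opposite signs at the two ends of the bracket, and x**2 - t is positive at both -400 and 400. The sketch below, which assumes SciPy is available, collects the roots into an array and compares them with the closed form sqrt(t).

```python
import numpy as np
from scipy.optimize import brentq

def f(x, t):
    return x ** 2 - t

ts = np.linspace(1, 10, 10)

# f(0, t) < 0 and f(400, t) > 0, so [0, 400] brackets the positive root.
roots = np.array([brentq(f, 0, 400, args=(t,)) for t in ts])

# With the original bracket both endpoint values are positive,
# so brentq raises a ValueError instead of returning a root.
try:
    brentq(f, -400, 400, args=(1.0,))
    bracket_ok = True
except ValueError:
    bracket_ok = False

print(roots)
print(bracket_ok)
```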

Newton method in python for multivariables (system of equations)

My code is running fine for first iteration but after that it outputs the following error:
ValueError: matrix must be 2-dimensional
To the best of my knowledge (which is not much in Python), my code is correct, but I don't know why it is not running correctly for all the given iterations. Could anyone help me with this problem?
from __future__ import division
import numpy as np
import math
import matplotlib.pylab as plt
import sympy as sp
from numpy.linalg import inv
#initial guesses
x = -2
y = -2.5
i1 = 0
while i1<5:
    F= np.matrix([[(x**2)+(x*y**3)-9],[(3*y*x**2)-(y**3)-4]])
    theta = np.sum(F)
    J = np.matrix([[(2*x)+y**3, 3*x*y**2],[6*x*y, (3*x**2)-(3*y**2)]])
    Jinv = inv(J)
    xn = np.array([[x],[y]])
    xn_1 = xn - (Jinv*F)
    x = xn_1[0]
    y = xn_1[1]
    #~ print theta
    print xn
    i1 = i1+1
I believe xn_1 is a 2D matrix. Try printing it and you will see [[something], [something]].
Therefore to get the x and y, you need to use multidimensional indexing. Here is what I did
x = xn_1[0,0]
y = xn_1[1,0]
This works because within the 2D matrix xn_1 are two single element arrays. Therefore we need to further index 0 to get that single element.
Edit: To clarify, xn_1[1,0] means to index 1 and then take that subarray and index 0 on that. And although according to Scipy it may seem that it should be functionally equivalent to xn_1[1][0], that only applies to the general np.array type and not the np.matrix type. Here is an excellent thread on SO that explains this.
So you should use the xn_1[1,0] way to get the element you want.
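The difference between the indexing styles can be checked on a small matrix. This is an illustrative sketch with made-up values, not output from the question's code.

```python
import numpy as np

xn_1 = np.matrix([[1.5], [-2.0]])   # 2x1 matrix, like the Newton update

row = xn_1[1]          # still a matrix, shape (1, 1)
chained = xn_1[1][0]   # indexing the 1x1 matrix again: still shape (1, 1)
element = xn_1[1, 0]   # a plain scalar
same = xn_1.item(1)    # item() also returns a plain Python scalar

print(row.shape, chained.shape, element, same)
```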
xn_1 is a numpy matrix, so its elements can be accessed with the item() method; unlike with an array, plain xn_1[0] returns another matrix.
So just change
x = xn_1[0]
y = xn_1[1]
to
x = xn_1.item(0)
y = xn_1.item(1)
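Another way around the issue, offered here as a sketch rather than as either answerer's suggestion, is to keep everything in plain ndarrays and solve the linear system with np.linalg.solve instead of inverting J. The helper names residual, jacobian and newton are illustrative, and the iteration count of 10 is an arbitrary choice.

```python
import numpy as np

def residual(p):
    """The two equations from the question, written as F(x, y) = 0."""
    x, y = p
    return np.array([x**2 + x * y**3 - 9,
                     3 * y * x**2 - y**3 - 4])

def jacobian(p):
    """Matrix of partial derivatives of residual, as in the question."""
    x, y = p
    return np.array([[2 * x + y**3, 3 * x * y**2],
                     [6 * x * y, 3 * x**2 - 3 * y**2]])

def newton(p0, iterations=10):
    p = np.asarray(p0, dtype=float)
    for _ in range(iterations):
        # Solve J * step = F rather than forming the inverse explicitly.
        step = np.linalg.solve(jacobian(p), residual(p))
        p = p - step
    return p

solution = newton([-2.0, -2.5])
print(solution, residual(solution))
```

Because p stays a 1-D array of shape (2,), unpacking it into x and y always yields scalars, so the shape problem from np.matrix does not arise.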

find 2d elements in a 3d array which are similar to 2d elements in another 3d array

I have two 3D arrays and want to identify 2D elements in one array, which have one or more similar counterparts in the other array.
This works in Python 3:
import numpy as np
import random
np.random.seed(123)
A = np.round(np.random.rand(25000,2,2),2)
B = np.round(np.random.rand(25000,2,2),2)
a_index = np.zeros(A.shape[0])
for a in range(A.shape[0]):
    for b in range(B.shape[0]):
        if np.allclose(A[a,:,:].reshape(-1, A.shape[1]), B[b,:,:].reshape(-1, B.shape[1]),
                       rtol=1e-04, atol=1e-06):
            a_index[a] = 1
            break
np.nonzero(a_index)[0]
But of course this approach is awfully slow. Please tell me that there is a more efficient way (and what it is). Thanks.
You are trying to do an all-nearest-neighbor type query. There are special O(n log n) algorithms for this, but I'm not aware of a Python implementation. However, you can use a regular nearest-neighbor search, which is also O(n log n), just a bit slower: for example scipy.spatial.KDTree or cKDTree.
import numpy as np
import random
np.random.seed(123)
A = np.round(np.random.rand(25000,2,2),2)
B = np.round(np.random.rand(25000,2,2),2)
import scipy.spatial
tree = scipy.spatial.cKDTree(A.reshape(25000, 4))
results = tree.query_ball_point(B.reshape(25000, 4), r=1e-04, p=1)
print([r for r in results if r != []])
# [[14252], [1972], [7108], [13369], [23171]]
query_ball_point() is not an exact equivalent to allclose() but it is close enough, especially if you don't care about the rtol parameter to allclose(). You also get a choice of metric (p=1 for city block, or p=2 for Euclidean).
P.S. Consider using query_ball_tree() for very large data sets. Both A and B have to be indexed in that case.
P.S. I'm not sure what effect the 2d-ness of the elements should have; the sample code I gave treats them as 1d and that is identical at least when using city block metric.
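A minimal sketch of the query_ball_tree variant, assuming SciPy is available. It uses the Chebyshev metric (p=np.inf), under which "distance <= r" means every element differs by at most r, which is the allclose test with rtol=0 and atol=r. The small random arrays and the two planted duplicates are made up for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
A = rng.random((200, 2, 2))
B = rng.random((150, 2, 2))
B[5] = A[17]      # plant two exact matches so there is something to find
B[90] = A[120]

# Each 2x2 element becomes one point in 4-D space.
tree_a = cKDTree(A.reshape(len(A), -1))
tree_b = cKDTree(B.reshape(len(B), -1))

# neighbours[i] lists the indices of B that lie within r of A[i].
neighbours = tree_a.query_ball_tree(tree_b, r=1e-6, p=np.inf)
a_index = np.array([i for i, hits in enumerate(neighbours) if hits])
print(a_index)
```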
From the docs of np.allclose, we have :
If the following equation is element-wise True, then allclose returns
True.
absolute(a - b) <= (atol + rtol * absolute(b))
Using that criterion, we can write a vectorized implementation using broadcasting, customized for the stated problem, like so:
# Setup parameters
rtol,atol = 1e-04, 1e-06
# Use np.allclose criteria to detect true/false across all pairwise elements
mask = np.abs(A[:, None] - B) <= (atol + rtol * np.abs(B))
# Use the problem context to get final output
out = np.nonzero(mask.all(axis=(2,3)).any(1))[0]
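For the array sizes in the question, the full pairwise mask has 25000 x 25000 x 2 x 2 entries, which may not fit in memory. One possible workaround, sketched here under that assumption, is to apply the same criterion to A in chunks. The function name matching_rows and the chunk parameter are illustrative, and the example arrays are deliberately small.

```python
import numpy as np

def matching_rows(A, B, rtol=1e-4, atol=1e-6, chunk=256):
    """Indices i such that A[i] is allclose to at least one B[j]."""
    tol = atol + rtol * np.abs(B)                # right-hand side of the allclose test
    found = np.zeros(A.shape[0], dtype=bool)
    for start in range(0, A.shape[0], chunk):
        block = A[start:start + chunk, None]     # shape (c, 1, 2, 2)
        close = np.abs(block - B) <= tol         # shape (c, nB, 2, 2)
        found[start:start + chunk] = close.all(axis=(2, 3)).any(axis=1)
    return np.nonzero(found)[0]

rng = np.random.default_rng(1)
A = np.round(rng.random((120, 2, 2)), 1)
B = np.round(rng.random((120, 2, 2)), 1)
B[0] = A[3]                                      # guarantee at least one match
print(matching_rows(A, B, chunk=32))
```

A smaller chunk lowers peak memory at the cost of more Python-level loop iterations.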

root mean square in numpy and complications of matrix and arrays of numpy

Can anyone direct me to the section of the NumPy manual that covers functions for root mean square calculations?
(I know this can be accomplished using np.mean and np.abs. Isn't there a built-in? If not, why not? Just curious, no offense.)
Can anyone also explain the complications of matrices and arrays, just in the following case?
U is a matrix (T-by-N) and Ue is another matrix (T-by-N).
U[ind,:] is still a matrix, so I define k as a NumPy array in the following fashion:
k = np.array(U[ind,:])
When I print k or type k in IPython, it displays the following:
K = array ([[2,.3 .....
......
9]])
You see the double square brackets (which make it multi-dimensional, I guess), which give it the shape (1,N).
But I can't assign it to an array defined in this way:
l = np.zeros(N)
whose shape is (,N) or perhaps (N,), something like that:
l[:] = k[:]
error:
matrix dimensions incompatible
Is there a way to accomplish the vector assignment I intend to do? Please don't tell me to do l = k; that defeats the purpose, and I get different errors in my program. I know the reasons, and I can attach the piece of code if you need it.
Writing a loop is the dumb way, which I'm using for the time being.
I hope I was able to explain the problems I'm facing.
Regards
For the RMS, I think this is the clearest:
from numpy import mean, sqrt, square, arange
a = arange(10) # For example
rms = sqrt(mean(square(a)))
The code reads like you say it: "root-mean-square".
For rms, the fastest expression I have found for small x.size (~ 1024) and real x is:
import numpy as np
def rms(x):
    return np.sqrt(x.dot(x)/x.size)
This seems to be around twice as fast as the linalg.norm version (ipython %timeit on a really old laptop).
If you want complex arrays handled more appropriately then this also would work:
def rms(x):
    return np.sqrt(np.vdot(x, x)/x.size)
However, this version is nearly as slow as the norm version and only works for flat arrays.
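One detail worth noting: np.vdot conjugates its first argument, so vdot(x, x) is the sum of |x_i|^2, but for complex input it comes back as a complex number with zero imaginary part. A hedged variant (the name rms_complex is illustrative) flattens the input and takes the real part so that the result is a real number:

```python
import numpy as np

def rms_complex(x):
    """RMS magnitude of a real or complex array of any shape."""
    flat = np.ravel(x)
    # vdot conjugates its first argument, giving sum(|x_i|**2).
    return np.sqrt(np.vdot(flat, flat).real / flat.size)

z = np.array([[3 + 4j, 0], [0, 0]])
print(rms_complex(z))   # sqrt(25 / 4) = 2.5
```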
For the RMS, how about
norm(V)/sqrt(V.size)
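The snippet above presumably refers to numpy.linalg.norm (an assumption, since the import is not shown). Under that reading, a quick check that the three expressions suggested so far agree on a real 1-D array:

```python
import numpy as np

V = np.arange(10, dtype=float)

rms_mean = np.sqrt(np.mean(np.square(V)))        # "root-mean-square" spelled out
rms_dot = np.sqrt(V.dot(V) / V.size)             # dot-product version
rms_norm = np.linalg.norm(V) / np.sqrt(V.size)   # 2-norm divided by sqrt(n)

print(rms_mean, rms_dot, rms_norm)
```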
I don't know why it's not built in. I like
from numpy import sqrt, mean
def rms(x, axis=None):
    return sqrt(mean(x**2, axis=axis))
If you have nans in your data, you can do
from numpy import sqrt, nanmean
def nanrms(x, axis=None):
    return sqrt(nanmean(x**2, axis=axis))
Try this:
U = np.zeros((N,N))
ind = 1
k = np.zeros(N)
k[:] = U[ind,:]
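If U really is an np.matrix, as in the question, one way to make the shapes match explicitly is to flatten the row before assigning it. The sketch below uses a small made-up matrix as a stand-in for the T-by-N matrix; the values and sizes are assumptions for illustration.

```python
import numpy as np

N = 4
U = np.matrix(np.arange(12.0).reshape(3, N))   # stand-in for the T-by-N matrix
ind = 1

k = np.array(U[ind, :])        # shape (1, N): the row keeps its 2-D shape
l = np.zeros(N)                # shape (N,)
l[:] = k.ravel()               # flatten to (N,) before assigning

l_alt = U[ind, :].A1           # np.matrix shortcut for a flattened ndarray
print(k.shape, l, l_alt)
```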
I use this for RMS, all using NumPy, and let it also have an optional axis similar to other NumPy functions:
import numpy as np
rms = lambda V, axis=None: np.sqrt(np.mean(np.square(V), axis))
If you have complex vectors and are using PyTorch, the vector norm is the fastest approach on CPU and GPU:
import torch
batch_size, length = 512, 4096
batch = torch.randn(batch_size, length, dtype=torch.complex64)
scale = 1 / torch.sqrt(torch.tensor(length))
rms_power = batch.norm(p=2, dim=-1, keepdim=True)
batch_rms = batch / (rms_power * scale)
Using a batched vdot, as in goodboy's approach, is 60% slower than the above. Using a naïve method similar to deprecated's approach is 85% slower than the above.
