I would like to evaluate a 4d Gaussian / normal distribution on a 4d grid. Let's call the variables (x1,y1,x2,y2). If I have means = (x1=1,y1=0,x2=2,y2=0), then when I make a 2d contour plot in the x1, x2 direction at y1=y2=0, I expect to see a Gaussian centered at (x1=1, x2=2). However, I see the mean/center at (x1=2,x2=0) instead.
What am I missing here? Is it how I define the grid to begin with?
For a 2d normal distribution it works as expected.
import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import multivariate_normal
xy_min = -5
xy_max = 5
npoints = 50
x = np.linspace(xy_min, xy_max, npoints)
dim = 4
xx1,yy1,xx2,yy2 = np.meshgrid(x, x,x,x)
points = np.concatenate([xx1[:, :,:, :,None], yy1[:, :, :,:,None],xx2[:, :, :,:,None],yy2[:, :, :,:,None]], axis=-1)
cov = np.diag(np.ones(4))
mean=np.array([1,0,2,0])
rv = multivariate_normal.pdf(points , mean=mean, cov=cov)
plt.figure()
plt.contourf(x, x, rv[:,0,:,0])
I tried to manually reshape the evaluation points first, but it gives the same results. So I think I am missing something conceptually here?
points_resh = np.reshape(points,[npoints**4,dim],order='C')
rv_resh = multivariate_normal.pdf(points_resh , mean=mean, cov=cov)
rv2 = np.reshape(rv_resh,[npoints,npoints,npoints,npoints],order='C')
plt.figure()
plt.contourf(x, x, rv2[:,0,:,0])
** EDIT: SOLVED **
Using ij indexing for meshgrid, everything works as expected. The only thing to keep in mind is that the matrix needs to be transposed for contour plotting. See the example below:
#%% Instead use ij indexing
x = np.linspace(-5, 5, 50)
y = np.linspace(-3, 3, 30)
z= np.linspace(-2, 2, 20)
w= np.linspace(-1, 1, 10)
x4d,y4d,z4d,w4d= np.meshgrid(x, y,z,w,indexing='ij')
points4d= np.concatenate([x4d[:, :,:,:,None], y4d[:, :,:,:,None], z4d[:, :,:,:,None],w4d[:, :,:,:,None]], axis=-1)
rv4d = multivariate_normal.pdf(points4d , mean=[1,0.0,2,0.0], cov=[0.1,0.1,0.1,0.1])
fig,ax=plt.subplots()
ax.contourf(x,z,rv4d[:,0,:,0].T)
ax.set(xlabel='x',ylabel='z')
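The effect of the two indexing modes can be checked on a small grid. The following is a minimal sketch, not the original poster's code: it uses an unnormalized unit-covariance Gaussian written directly in NumPy as a stand-in for multivariate_normal.pdf, and the helper name peak_location is hypothetical.

```python
import numpy as np

# Small grid so the 4d array stays tiny; the integer means fall exactly on grid points.
x = np.linspace(-5, 5, 11)
mean = np.array([1.0, 0.0, 2.0, 0.0])

def peak_location(indexing):
    """Return the grid indices of the maximum of an unnormalized Gaussian (hypothetical helper)."""
    grids = np.meshgrid(x, x, x, x, indexing=indexing)
    points = np.stack(grids, axis=-1)  # shape (11, 11, 11, 11, 4)
    density = np.exp(-0.5 * np.sum((points - mean) ** 2, axis=-1))
    flat_index = np.argmax(density)
    return tuple(int(i) for i in np.unravel_index(flat_index, density.shape))

peak_ij = peak_location("ij")
peak_xy = peak_location("xy")

# Coordinates of the peak along array axes 0..3
print([float(x[i]) for i in peak_ij])  # axes follow (x1, y1, x2, y2)
print([float(x[i]) for i in peak_xy])  # first two axes are swapped: (y1, x1, x2, y2)
```

With the default 'xy' indexing the first two array axes are swapped, so a slice such as rv[:, 0, :, 0] runs over (y1, x2) rather than (x1, x2), which is consistent with the shifted center seen in the question.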
I have a problem with scipy.interpolate.interp2d.
For sorted input, the interpolation is OK.
When I ask to get the interpolation values for an unsorted array, I get output as if it is sorted internally by SciPy. Why is that?
The way around is to get interpolation values in a loop.
Here is my demonstration code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
'''
SciPy interp2d test.
Why is the input of 2D interpolation sorted internally?
'''
import matplotlib.pyplot as plt
import scipy.interpolate as itp
import numpy as np
def fMain():
    nx=11
    ny=21
    ax=np.linspace(0,1,nx)
    mx=np.empty((nx,ny))
    for i in range(ny):
        mx[:,i] = ax
        pass
    ay=np.linspace(0,1,ny)
    my=np.empty((nx,ny))
    for i in range(nx): # can I do this without loop?
        my[i,:] = ay
        pass
    mz=np.empty((nx,ny))
    mz=mx**2 + my**3
    f2Di = itp.interp2d( mx, my, mz, kind='linear')
    #this provides identical results, ok
    #f2Di = itp.interp2d( ax, ay, mz.transpose(), kind='linear')
    if True :
        # just to check the interpolation
        mzi = f2Di(ax,ay)
        fig = plt.figure()
        axis = fig.add_subplot(projection='3d')
        axis.plot_wireframe( mx, my, mz )
        axis.scatter(mx, my, mzi.transpose(), marker="o",color="red")
        axis.set_xlabel("x")
        axis.set_ylabel("y")
        axis.set_zlabel("z")
        plt.tight_layout()
        plt.show()
        plt.close()
        pass
    if True:
        y = 0.5
        az = f2Di( ax , y )
        axf=np.flip(ax)
        azf1 = f2Di( axf , y )
        azf2 = np.empty(nx)
        for i in range(nx):
            azf2[i] = f2Di( axf[i] , y )
            pass
        plt.plot(ax,az,label="Normal",linewidth=3,linestyle="dashed")
        plt.plot(axf,azf1,label="Reversed")
        plt.plot(axf,azf2,label="Reversed loop")
        plt.legend()
        plt.xlabel("x")
        plt.ylabel("z")
        plt.tight_layout()
        plt.show()
        plt.close()
        pass
    pass
if __name__ == "__main__":
    fMain()
    pass
To answer another question (from a comment in the code):
ax=np.linspace(0,1,nx)
mx=np.empty((nx,ny))
for i in range(ny):
    mx[:,i] = ax
(and similar for ay).
Can I do this without a loop?
Yes (technically no, since there'll be a C loop under the hood, but practically, yes). Use numpy.tile:
ax = np.linspace(0, 1, nx)
mx = np.tile(ax, (ny, 1)).T
And the np.empty doesn't make sense below: you allocate memory, but immediately (re)assign the variable to another value:
#mz = np.empty((nx, ny)) # This line is redundant
mz = mx**2 + my**3
This is also why np.empty has disappeared from the for-replacement code.
There's a builtin function for making grids like your mx,my:
In [68]: I,J = np.meshgrid(ay,ax)
In [69]: I.shape
Out[69]: (11, 21)
In [71]: np.allclose(I,my)
Out[71]: True
In [72]: np.allclose(J,mx)
Out[72]: True
Alternatively, you could have assigned the values with broadcasting:
In [76]: my = np.empty((nx,ny)); my[:]=ay
In [78]: mx = np.empty((nx,ny)); mx[:]=ax[:,None]
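Broadcasting can also skip the intermediate mx and my arrays entirely. The following is a small sketch under the same nx, ny as the question; it compares against the np.tile construction only to confirm that the values agree.

```python
import numpy as np

nx, ny = 11, 21
ax = np.linspace(0, 1, nx)
ay = np.linspace(0, 1, ny)

# A column of shape (nx, 1) combined with a row of shape (1, ny) broadcasts to (nx, ny)
mz = ax[:, None] ** 2 + ay[None, :] ** 3

# Explicit full grids, built only to confirm the values agree
mx = np.tile(ax, (ny, 1)).T
my = np.tile(ay, (nx, 1))
print(mz.shape, np.allclose(mz, mx**2 + my**3))
```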
The interp2d docs say that input arrays are flattened, even if input as 2d. And that the x,y can be the coordinates as in ax,ay; they don't have to be constructed from the full grid. So the 2 ways of setting up the f2Di are equivalent.
Full documentation for the use of f2Di(x,y) is
https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp2d.__call__.html#scipy.interpolate.interp2d.__call__
It explicitly states that the inputs x,y have to be sorted, or it will sort them for you.
One interpolation:
In [86]: mzi = f2Di(ax,ay)
In [87]: mzi.shape
Out[87]: (21, 11)
Another with the inputs reversed:
In [89]: azf1 = f2Di(ax[::-1], ay[::-1] )
In [90]: azf1.shape
Out[90]: (21, 11)
In [91]: np.allclose(mzi, azf1)
Out[91]: True
As you note, and attempt to show with a lot of plotting code, the results are the same - inputs have been sorted before they are used to interpolate.
If I falsely tell it that the coordinates are sorted:
In [94]: azf1 = f2Di(ax[::-1], ay[::-1] , assume_sorted=True)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
...
File ~\anaconda3\lib\site-packages\scipy\interpolate\_fitpack_impl.py:1054, in bisplev(x, y, tck, dx, dy)
1052 z, ier = _fitpack._bispev(tx, ty, c, kx, ky, x, y, dx, dy)
1053 if ier == 10:
-> 1054 raise ValueError("Invalid input data")
1055 if ier:
1056 raise TypeError("An error occurred")
ValueError: Invalid input data
Note that the error was raised by a function in _fitpack. The name implies that this is using some sort of compiled interpolation code, a library that is probably written in C or Fortran. I'm not a developer, but I can imagine that it's easiest to write such code assuming that the inputs are sorted. Such shared libraries work best when they have clear, and relatively simple, expectations regarding the inputs.
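If values are needed in the order of the query points, without looping, one option is a different interpolator. The sketch below swaps in scipy.interpolate.RegularGridInterpolator, which is not what the question uses; it assumes the data lie on a rectangular grid with strictly ascending axes, as they do here. It evaluates a list of (x, y) points and returns one value per point, in the order given.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

nx, ny = 11, 21
ax = np.linspace(0, 1, nx)
ay = np.linspace(0, 1, ny)
mz = ax[:, None] ** 2 + ay[None, :] ** 3  # shape (nx, ny), same function as in the question

# The grid axes must be ascending; the query points may come in any order
interp = RegularGridInterpolator((ax, ay), mz, method="linear")

axf = np.flip(ax)  # reversed x values
queries = np.column_stack([axf, np.full_like(axf, 0.5)])  # rows of (x, y) with y = 0.5
azf = interp(queries)  # one value per query row, in query order
print(azf[0], azf[-1])
```

Note the difference in calling convention: interp2d evaluates on the outer product of the x and y vectors, while this interpolator evaluates at explicit points.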
I have a problem with my code.
I am trying to plot the sampled values of the function 'sin(t^3)/2^tan(t)' for
t between 0 and 1.5 at a sampling frequency fs=50Hz.
I have created a function 'sampleFunction' which takes as parameters the string that represents the trigonometric function, the beginning of the interval, the end of the interval and the frequency.
I create tVector (0, 0.02, 0.04, ..., 1.48).
Then I take the elements of tVector, use them to evaluate the string, and put the results in another vector y.
I return both y and tVector.
But when I run it I get an error saying 'y' is not defined.
This is the code:
import numpy as np
import matplotlib.pyplot as plt
import math
def sampleFunction(functionString,t0,t1,fs):
    tVector=np.arange(start=t0, stop=t1, step=1/fs, dtype='float')
    t=t0
    for i in range(0,len(tVector)):
        t=tVector[i]
        y[i]=eval(functionString)
    return y,tVector
t0=0
t1 =1.5
fs=50
thold=.1
functionString='math.sin(t**3)/2**math.tan(t)'
y,t=sampleFunction(functionString,t0,t1,fs)
plt.plot(t,y)
plt.xlabel('time')
plt.ylabel('Amplitude')
You can change your code in the following way:
def sampleFunction(functionString,t0,t1,fs):
    tVector=np.arange(start=t0, stop=t1, step=1/fs, dtype='float')
    t=t0
    y = np.zeros( tVector.shape )
    for i in range(0,len(tVector)):
        t=tVector[i]
        y[i]=eval(functionString)
    return y,tVector
However, this is not good Python. There are a couple of issues:
You should use vectorized operations.
You should avoid eval like the plague. This has security implications.
For vectorized operations, simply do:
def sampleFunction(functionString,t0,t1,fs):
    t = np.arange(start=t0, stop=t1, step=1/fs, dtype='float')
    y = eval(functionString)
    return y, t
and call it as:
sampleFunction('np.sin(t**3)/2**np.tan(t)', 0, 10, 100)
This is much faster (especially for large arrays)
Finally, the vectorized form is only a single line long. You probably don't need the extra function.
You have a problem with the allocation of the 'y' variable, as Harold says.
However, there are multiple ways of achieving what you are doing and the eval function is, unless you have a very good reason, the absolute worst. Maybe consider one of the possible examples below:
import numpy as np
import matplotlib.pyplot as plt
import math
def sampleFunction(functionString,t0,t1,fs):
    tVector=np.arange(start=t0, stop=t1, step=1/fs, dtype='float')
    t=t0
    y = [float]*len(tVector) # <------------------- Allocate 'y' variable
    for i in range(0,len(tVector)):
        t = tVector[i]
        y[i]=eval(functionString)
    return y,tVector
t0=0
t1 =1.5
fs=50
thold=.1
# Your code
functionString = 'math.sin(t**3)/2**math.tan(t)'
y, t = sampleFunction(functionString,t0,t1,fs)
plt.plot(t, y, color='cyan')
# Using the 'map' built-in function
t = np.arange(start=t0, stop=t1, step=1./fs, dtype='float')
y = list(map(lambda ti: 0.9*math.sin(ti**3)/2**math.tan(ti), t))  # list() needed in Python 3, where map returns an iterator
plt.plot(t, y, color='magenta')
# Using Numpy's 'sin' and 'tan'
t = np.arange(start=t0, stop=t1, step=1./fs, dtype='float')
y = 0.8*np.sin(t**3)/2**np.tan(t)
plt.plot(t, y, color='darkorange')
# Using 'list comprehensions'
t = np.arange(start=t0, stop=t1, step=1./fs, dtype='float')
y = [ 0.7*math.sin(ti**3)/2**math.tan(ti) for ti in t]
plt.plot(t, y, color='darkgreen')
plt.xlabel('time')
plt.ylabel('Amplitude')
plt.show()
The result is:
When running the above code, you should have gotten an error message ending with "name 'y' is not defined". If you look at your function definition, you will see that it really isn't. You cannot assign a value to y[i] without defining y first! The following line before the "for" loop fixes that particular problem:
y = [None] * len(tVector)
The code will run fine after that correction.
But: why do you want to pass a function string when you can pass a function? Functions, in Python, are first-class objects!
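Passing the function itself might look like the sketch below. The names sample_function and signal are illustrative and not from the question; the NumPy versions of sin and tan are used so the whole time vector is evaluated at once, with no eval and no loop.

```python
import numpy as np

def sample_function(func, t0, t1, fs):
    """Sample func on [t0, t1) at sampling frequency fs; func must accept a NumPy array."""
    t = np.arange(t0, t1, 1.0 / fs)
    return func(t), t

def signal(t):
    # Same expression as the string in the question, written with NumPy functions
    return np.sin(t**3) / 2**np.tan(t)

y, t = sample_function(signal, 0.0, 1.5, 50)
print(len(t), y[:3])
```

A lambda works just as well for short expressions, e.g. sample_function(lambda t: np.sin(t**3) / 2**np.tan(t), 0.0, 1.5, 50).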
I am currently stuck on a problem in which I am required to generate a curve of best fit, evaluated on a more precise x array from 250 to 1000 in steps of 10. Here is my code so far:
import numpy as np
from numpy import polyfit, polyval
import matplotlib.pyplot as plt
x = [250,300,350,400,450,500,550,600,700,750,800,900,1000]
x = np.array(x)
y = [0.791, 0.846, 0.895, 0.939, 0.978, 1.014, 1.046, 1.075, 1.102, 1.148, 1.169, 1.204, 1.234]
y= np.array(y)
r = polyfit(x,y,3)
fit = polyval(r, x)
plt.plot(x, fit, 'b')
plt.plot(x,y, color = 'r', marker = 'x')
plt.show()
If I understand correctly, you are trying to create an array of numbers from a to b by steps of c.
With pure python you can use:
list(range(a, b, c)) #in your case list(range(250, 1000, 10))
Or, since you are using numpy you can directly make the numpy array:
np.arange(a, b, c)
To create an array in steps you can use numpy.arange([start,] stop[, step]):
import numpy as np
x = np.arange(250,1000,10)
To generate values from 250-1000, use range(start, stop, step):
x = range(250,1001,10)
x = np.array(x)
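Putting the pieces together for the fit in the question, a minimal sketch (assuming the endpoint 1000 should be included, hence the stop value of 1001) evaluates the fitted polynomial on the finer array instead of on the original x. The names coeffs, x_fine and fit_fine are illustrative.

```python
import numpy as np

x = np.array([250, 300, 350, 400, 450, 500, 550, 600, 700, 750, 800, 900, 1000])
y = np.array([0.791, 0.846, 0.895, 0.939, 0.978, 1.014, 1.046, 1.075,
              1.102, 1.148, 1.169, 1.204, 1.234])

# Fit the cubic to the measured points
coeffs = np.polyfit(x, y, 3)

# The stop value is exclusive, so 1001 is used to include 1000
x_fine = np.arange(250, 1001, 10)

# Evaluate the fitted polynomial on the finer grid
fit_fine = np.polyval(coeffs, x_fine)
print(x_fine[0], x_fine[-1], fit_fine.shape)
```

Plotting then uses the fine array for the curve, e.g. plt.plot(x_fine, fit_fine, 'b'), and the original x, y for the data markers.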
I will try to be specific. First, here is the code:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1977)
x, y = np.random.random((2, 1000))
xbins = np.linspace(0, 1, 10)
ybins = np.linspace(0, 1, 10)
counts, _, _ = np.histogram2d(x, y, bins=(xbins, ybins))
print(counts)
This code gives a two-dimensional array. Now suppose I have another array
z = np.random.random((2, 1000))
How do I get a three-dimensional array of the distribution from these three arrays? I tried:
zbins = np.linspace(0, 1, 10)
counts, _,_,_ = np.histogramdd(x, y, z, bins=(xbins, ybins, zbins))
But it does not work.
What's more, the real data file is too big to use a loop, which would take hours to run and would not be easy for me to check.
Thanks for thinking about the question!
I made the following code according to your last comment
import numpy as np
data = np.random.random((1000, 3))
nbins = 10
H, [bx, by, bz] = np.histogramdd(data, bins=(nbins, nbins, nbins),
                                 range=((0, 1), (0, 1), (0, 1)))
H holds the number of points in each grid cell. In your previous code, histogramdd was not used correctly. The input data is the first argument, which should be an N x 3 array in your case.
You can see the documentation of histogramdd here.
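If the data start out as three separate 1d arrays, as in the question, they can be stacked into the N x 3 layout first. The following is a minimal sketch that assumes x, y and z are 1d arrays of equal length and reuses the question's bin edges; the names edges and sample are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1977)
x, y, z = rng.random((3, 1000))  # three 1d arrays of 1000 samples each

edges = np.linspace(0, 1, 10)  # 10 edges give 9 bins per axis, as in the question
sample = np.column_stack([x, y, z])  # shape (1000, 3): one row per point

counts, bin_edges = np.histogramdd(sample, bins=(edges, edges, edges))
print(counts.shape, counts.sum())
```

No Python-level loop is involved, so this should scale to large inputs as long as the stacked array fits in memory.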