What is a good way to produce a numpy array containing the values of a function evaluated on an n-dimensional grid of points?
For example, suppose I want to evaluate the function defined by
def func(x, y):
    return <some function of x and y>
Suppose I want to evaluate it on a two dimensional array of points with the x values going from 0 to 4 in ten steps, and the y values going from -1 to 1 in twenty steps. What's a good way to do this in numpy?
P.S. This has been asked in various forms on StackOverflow many times, but I couldn't find a concisely stated question and answer. I posted this to provide a concise simple solution (below).
A shorter, faster and clearer answer that avoids meshgrid:
import numpy as np
def func(x, y):
    return np.sin(y * x)
xaxis = np.linspace(0, 4, 10)
yaxis = np.linspace(-1, 1, 20)
result = func(xaxis[:,None], yaxis[None,:])
This is faster and uses less memory for a function such as x^2+y, because x^2 is then computed on a 1D array (instead of a 2D one), and the increase in dimension only happens when you do the "+". With meshgrid, x^2 is computed on a 2D array in which essentially every row is the same, which costs a lot of extra time on large grids.
Edit: "x[:,None]" turns x into a 2D array whose second dimension is "empty" (it has length 1). "None" here is the same as "numpy.newaxis", so this is equivalent to "x[:,numpy.newaxis]". The same thing is done with y, but with the empty dimension placed first.
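As a quick check of the broadcasting approach, here is a minimal sketch that compares it with a dense meshgrid evaluation. The names x_col, y_row and dense are only illustrative and do not come from the answer. Passing indexing="ij" to meshgrid makes its axis order match the broadcast result.

```python
import numpy as np

def func(x, y):
    return np.sin(y * x)

xaxis = np.linspace(0, 4, 10)
yaxis = np.linspace(-1, 1, 20)

# Each 1D axis gets one extra length-1 dimension.
x_col = xaxis[:, np.newaxis]   # shape (10, 1)
y_row = yaxis[np.newaxis, :]   # shape (1, 20)

# Broadcasting expands both operands to (10, 20) only inside func.
result = func(x_col, y_row)

# Dense reference: meshgrid materialises two full (10, 20) arrays first.
X, Y = np.meshgrid(xaxis, yaxis, indexing="ij")
dense = func(X, Y)

print(result.shape, np.allclose(result, dense))
```

Both paths should give the same values; the difference is only in how large the intermediate arrays are.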
Edit: in 3 dimensions:
def func2(x, y, z):
    return np.sin(y * x)+z
xaxis = np.linspace(0, 4, 10)
yaxis = np.linspace(-1, 1, 20)
zaxis = np.linspace(0, 1, 20)
result2 = func2(xaxis[:,None,None], yaxis[None,:,None],zaxis[None,None,:])
This way you can easily extend to n dimensions if you wish, using as many None or : as you have dimensions. Each : keeps a full dimension, and each None adds an "empty" dimension (an axis of length 1). As the next example shows, indexing with None changes the shape, so the result really is a 3D object, but the empty dimensions only get filled up when you combine the array with an object that actually has something in those dimensions.
In [1]: import numpy
In [2]: a = numpy.linspace(-1,1,20)
In [3]: a.shape
Out[3]: (20,)
In [4]: a[None,:,None].shape
Out[4]: (1, 20, 1)
In [5]: b = a[None,:,None] # this is a 3D array, but with the first and third dimension being "empty"
In [6]: c = a[:,None,None] # same, but last two dimensions are "empty" here
In [7]: d=b*c
In [8]: d.shape # only the last dimension is "empty" here
Out[8]: (20, 20, 1)
Edit: without needing to type the None yourself:
def ndm(*args):
    return [x[(None,)*i+(slice(None),)+(None,)*(len(args)-i-1)] for i, x in enumerate(args)]
x2,y2,z2 = ndm(xaxis,yaxis,zaxis)
result3 = func2(x2,y2,z2)
This builds the None-slicing that creates the extra empty dimensions for you: the first argument you give to ndm becomes the first full dimension, the second argument the second full dimension, and so on. It does the same as the 'hardcoded' None syntax used before.
Short explanation: doing x2, y2, z2 = ndm(xaxis, yaxis, zaxis) is the same as doing
x2 = xaxis[:,None,None]
y2 = yaxis[None,:,None]
z2 = zaxis[None,None,:]
but the ndm method should also work for more dimensions, without needing to hardcode the None-slices over multiple lines as just shown. It also works in numpy versions before 1.8, while numpy.meshgrid only works for more than 2 dimensions if you have numpy 1.8 or higher.
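For comparison, NumPy itself offers two ways to build the same "open" grid: np.ix_ and np.meshgrid with sparse=True. The sketch below assumes a reasonably recent NumPy and repeats ndm so that it runs on its own; the names manual, via_ix and via_sparse are only illustrative.

```python
import numpy as np

def ndm(*args):
    # Put each 1D array on its own axis; every other axis gets length 1.
    return [x[(None,) * i + (slice(None),) + (None,) * (len(args) - i - 1)]
            for i, x in enumerate(args)]

xaxis = np.linspace(0, 4, 10)
yaxis = np.linspace(-1, 1, 20)
zaxis = np.linspace(0, 1, 20)

manual = ndm(xaxis, yaxis, zaxis)
via_ix = np.ix_(xaxis, yaxis, zaxis)
via_sparse = np.meshgrid(xaxis, yaxis, zaxis, indexing="ij", sparse=True)

shapes = [a.shape for a in manual]
print(shapes)
```

All three should produce arrays of shape (10, 1, 1), (1, 20, 1) and (1, 1, 20), so any of them can feed func2 directly.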
import numpy as np
def func(x, y):
    return np.sin(y * x)
xaxis = np.linspace(0, 4, 10)
yaxis = np.linspace(-1, 1, 20)
x, y = np.meshgrid(xaxis, yaxis)
result = func(x, y)
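One detail worth checking with meshgrid is the axis order. With the default indexing="xy" the result has shape (len(yaxis), len(xaxis)), whereas indexing="ij" gives (len(xaxis), len(yaxis)). A minimal sketch, with variable names chosen only for illustration:

```python
import numpy as np

def func(x, y):
    return np.sin(y * x)

xaxis = np.linspace(0, 4, 10)
yaxis = np.linspace(-1, 1, 20)

x_xy, y_xy = np.meshgrid(xaxis, yaxis)                  # default indexing="xy"
x_ij, y_ij = np.meshgrid(xaxis, yaxis, indexing="ij")

result_xy = func(x_xy, y_xy)   # rows follow y, columns follow x
result_ij = func(x_ij, y_ij)   # rows follow x, columns follow y

print(result_xy.shape, result_ij.shape)
```

The two results are transposes of each other, so which one you want depends on how you index or plot the grid afterwards.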
I use this function to get X, Y, Z values ready for plotting:
def npmap2d(fun, xs, ys, doPrint=False):
    Z = np.empty(len(xs) * len(ys))
    i = 0
    for y in ys:
        for x in xs:
            Z[i] = fun(x, y)
            if doPrint: print([i, x, y, Z[i]])
            i += 1
    X, Y = np.meshgrid(xs, ys)
    Z.shape = X.shape
    return X, Y, Z
Usage:
def f(x, y):
    # ...some function that can't handle numpy arrays
X, Y, Z = npmap2d(f, np.linspace(0, 0.5, 21), np.linspace(0.6, 0.4, 41))
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_wireframe(X, Y, Z)
The same result can be achieved using map:
xs = np.linspace(0, 4, 10)
ys = np.linspace(-1, 1, 20)
X, Y = np.meshgrid(xs, ys)
Z = np.fromiter(map(f, X.ravel(), Y.ravel()), X.dtype).reshape(X.shape)
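Another option for a function that only accepts scalars is np.vectorize, which wraps the same element-by-element loop behind an array interface. It is a convenience, not a speed-up. In this sketch f is a hypothetical scalar-only function built on math.sin, chosen just so the example runs.

```python
import math
import numpy as np

def f(x, y):
    # Hypothetical scalar-only function: math.sin does not accept arrays.
    return math.sin(x * y)

xs = np.linspace(0, 4, 10)
ys = np.linspace(-1, 1, 20)
X, Y = np.meshgrid(xs, ys)

# np.vectorize calls f once per element and assembles the results.
f_vec = np.vectorize(f, otypes=[float])
Z_vec = f_vec(X, Y)

# The map-based version from above, for comparison.
Z_map = np.fromiter(map(f, X.ravel(), Y.ravel()), X.dtype).reshape(X.shape)

print(Z_vec.shape, np.allclose(Z_vec, Z_map))
```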
In the case your function actually takes a tuple of d elements, i.e. f((x1,x2,x3,...xd)) (for example the scipy.stats.multivariate_normal function), and you want to evaluate f on N^d combinations/grid of N variables, you could also do the following (2D case):
x=np.arange(-1,1,0.2) # each variable is instantiated N=10 times
y=np.arange(-1,1,0.2)
Z=f(np.dstack(np.meshgrid(x,y))) # result is an NxN (10x10) matrix, whose entries are f((xi,yj))
Here np.dstack(np.meshgrid(x,y)) creates a 10x10 "matrix" (technically a 10x10x2 numpy array) whose entries are the 2-dimensional tuples to be evaluated by f.
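To make the shapes concrete, here is a small sketch with a hypothetical f that, like scipy.stats.multivariate_normal, expects the coordinates of each point along the last axis. The two axes deliberately have different lengths so the axis order is visible.

```python
import numpy as np

def f(p):
    # Hypothetical stand-in for a function of a point (x1, x2):
    # an unnormalised Gaussian bump, with coordinates on the last axis.
    p = np.asarray(p)
    return np.exp(-0.5 * np.sum(p ** 2, axis=-1))

x = np.arange(-1, 1, 0.2)
y = np.arange(-1, 1, 0.25)

# One (x, y) pair per grid cell, stacked along a new last axis.
points = np.dstack(np.meshgrid(x, y))   # shape (len(y), len(x), 2)
Z = f(points)                            # shape (len(y), len(x))

print(points.shape, Z.shape)
```

With the default meshgrid indexing, Z[i, j] corresponds to the point (x[j], y[i]).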
My two cents:
import numpy as np
x = np.linspace(0, 4, 10)
y = np.linspace(-1, 1, 20)
X, Y = np.meshgrid(x, y, indexing='ij', sparse=True)
def func(x, y):
    return x*y/(x**2 + y**2 + 4)
# I have defined a function of x and y.
func(X, Y)
I have 2 numpy arrays with data, say x and y, and when I apply plt.step() I get a continuous (step) curve from them.
I would like to be able to create this function on my own, meaning I want a (zero-order hold) step approximation to the value of y for values of x that do not actually exist in the original x array.
For example, for the plot at the following link I want to obtain the actual 'new' rectangular sine values, and not only plot them:
https://matplotlib.org/gallery/lines_bars_and_markers/step_demo.html#sphx-glr-gallery-lines-bars-and-markers-step-demo-py
You can use scipy's interp1d to create a step function. By default the interpolation is 'linear', but you can change it to 'next', 'previous' or 'nearest' for a step function.
A standard step function is obtained from step_fun = interp1d(x, y, kind='previous') and then calling it as step_fun(new_x).
The following code compares different types of "interpolation":
from matplotlib import pyplot as plt
import numpy as np
from scipy.interpolate import interp1d
x = np.random.uniform(0.1, 0.7, 20).cumsum()
y = np.sin(x)
kinds = ['linear', 'previous', 'next', 'nearest', 'cubic']
for i, kind in enumerate(kinds):
    function_from_points = interp1d(x, y + i, kind=kind)
    x_detailed = np.linspace(x[0], x[-1], 1000)
    plt.plot(x_detailed, function_from_points(x_detailed), color='dodgerblue')
    plt.scatter(x, y + i, color='crimson')
plt.yticks(range(len(kinds)), kinds)
plt.show()
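If you would rather not depend on scipy, a zero-order hold can also be sketched with plain numpy using np.searchsorted. This swaps in a different technique from interp1d; step_previous is a hypothetical helper name, and the sketch assumes x is sorted in increasing order. Points left of x[0] are clamped to the first value here, which is a choice made for this example.

```python
import numpy as np

def step_previous(x, y, new_x):
    """Zero-order hold: return the y of the closest sample at or before new_x."""
    x = np.asarray(x)
    y = np.asarray(y)
    # Index of the last sample with x[idx] <= new_x.
    idx = np.searchsorted(x, new_x, side="right") - 1
    # Clamp so that points left of x[0] reuse the first sample.
    idx = np.clip(idx, 0, len(x) - 1)
    return y[idx]

x = np.array([0.0, 1.0, 2.5, 4.0])
y = np.array([10.0, 20.0, 30.0, 40.0])
new_x = np.array([0.0, 0.5, 1.0, 2.4, 2.5, 3.9, 4.0])

held = step_previous(x, y, new_x)
print(held)   # [10. 10. 20. 20. 30. 30. 40.]
```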
You can choose whichever tick values and corresponding function values you want. This is an example with arguments that are not equally spaced, and their values:
x = np.arange(20) + np.random.random(20)/2
y = np.sin(x / 2)**2 + np.random.random(20)/5
Remark: these two arrays must have equal size. If you want your own custom function, you can use np.vectorize:
x = np.arange(20) + np.random.random(20)/2
func = np.vectorize(lambda x: np.sin(x) + np.random.random()/5)
y = func(x)
Usually I use scipy.optimize.curve_fit to fit custom functions to data.
The data in these cases was always a 1-dimensional array.
Is there a similar function for a two-dimensional array?
So, for example, I have a 10x10 numpy array. Then I have a function that does some stuff and creates a 10x10 numpy array, and I want to fit the function, so that the resulting 10x10 array has the best fit to the input array.
Maybe an example is better :)
data = pyfits.getdata('data.fits') #fits is an image format, this gives me a NxM numpy array
mod1 = pyfits.getdata('mod1.fits')
mod2 = pyfits.getdata('mod2.fits')
mod3 = pyfits.getdata('mod3.fits')
mod1_1D = numpy.ravel(mod1)
mod2_1D = numpy.ravel(mod2)
mod3_1D = numpy.ravel(mod3)
def dostuff(a,b): #originally this is a function for 2D arrays
    newdata = (mod1_1D*12)+(mod2_1D)**a - mod3_1D/b
    return newdata
Now a and b should be fitted, so that newdata is as close as possible to data.
What I got so far:
data1D = numpy.ravel(data)
data_X = numpy.arange(data1D.size)
fit = curve_fit(dostuff,data_X,data1D)
But print fit only gives me
(array([ 1.]), inf)
I do have some NaNs in the arrays; maybe that's a problem?
The goal is to express the 2D function as a 1D function: g(x, y, ...) --> f(xy, ...)
Converting the coordinate pair (x, y) into a single number xy may seem tricky at first, but it's actually quite simple. Just enumerate all data points and you have a single number that uniquely defines each coordinate pair. The fitted function simply has to reconstruct the original coordinates, do its calculations and return the result.
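The enumeration and its inverse can be sketched in a few lines. This assumes a row-major n x m grid; the variable names are only illustrative. np.unravel_index does the same reconstruction as the integer division and modulo.

```python
import numpy as np

n, m = 10, 20                 # n rows, m columns
xy = np.arange(n * m)         # one integer per grid point, row-major order

# Recover the row index i and column index j from the flat index.
i, j = np.divmod(xy, m)

# NumPy's built-in inverse gives the same answer.
rows, cols = np.unravel_index(xy, (n, m))

print(i[-1], j[-1])           # 9 19: the last flat index is the bottom right point
```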
Example that fits a 2D linear gradient in a 20x10 image (10 rows by 20 columns):
import scipy as sp
import scipy.optimize  # make sure the optimize submodule is loaded
import numpy as np
import matplotlib.pyplot as plt
n, m = 10, 20
# noisy example data
x = np.arange(m).reshape(1, m)
y = np.arange(n).reshape(n, 1)
z = x + y * 2 + np.random.randn(n, m) * 3
def f(xy, a, b):
    i = xy // m  # reconstruct y coordinates
    j = xy % m  # reconstruct x coordinates
    out = i * a + j * b
    return out
xy = np.arange(z.size)  # 0 is the top left pixel and 199 is the bottom right pixel
res = sp.optimize.curve_fit(f, xy, np.ravel(z))
z_est = f(xy, *res[0])
z_est2d = z_est.reshape(n, m)
plt.subplot(2, 1, 1)
plt.plot(np.ravel(z), label='original')
plt.plot(z_est, label='fitted')
plt.legend()
plt.subplot(2, 2, 3)
plt.imshow(z)
plt.xlabel('original')
plt.subplot(2, 2, 4)
plt.imshow(z_est2d)
plt.xlabel('fitted')
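Because this particular model is linear in a and b, the fit can be cross-checked with ordinary least squares in plain numpy. This is a different technique from curve_fit and only applies to linear models. The sketch below regenerates similar noisy data with a fixed seed (an assumption made so the result is reproducible).

```python
import numpy as np

n, m = 10, 20
rng = np.random.default_rng(0)   # fixed seed, chosen only for reproducibility

# Same kind of noisy gradient as above: z = 1*x + 2*y + noise.
x = np.arange(m).reshape(1, m)
y = np.arange(n).reshape(n, 1)
z = x + y * 2 + rng.normal(size=(n, m)) * 3

# Reconstruct row (i) and column (j) indices from the flat index.
i, j = np.divmod(np.arange(z.size), m)

# Solve z ~ a*i + b*j in the least-squares sense.
design = np.column_stack([i, j])
coeffs, *_ = np.linalg.lstsq(design, z.ravel(), rcond=None)
a_est, b_est = coeffs

print(a_est, b_est)   # should land near 2 and 1
```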
I would recommend using symfit for this. I wrote it to take care of all of the magic for you automatically.
In symfit you would just write the equation pretty much as you would on paper, and then you can run the fit.
I would do something like this:
from symfit import parameters, variables, Fit
# Assuming all this data is in the form of NxM arrays
data = pyfits.getdata('data.fits')
mod1 = pyfits.getdata('mod1.fits')
mod2 = pyfits.getdata('mod2.fits')
mod3 = pyfits.getdata('mod3.fits')
a, b = parameters('a, b')
x, y, z, u = variables('x, y, z, u')
model = {u: (x * 12) + y**a - z / b}
fit = Fit(model, x=mod1, y=mod2, z=mod3, u=data)
fit_result = fit.execute()
print(fit_result)
Unfortunately I have not yet included examples of the kind you need in the docs, but if you look at the docs I think you can figure it out in case this doesn't work out of the box.
I have two arrays with values:
x = np.array([100, 123, 123, 118, 123])
y = np.array([12, 1, 14, 13])
I want to evaluate for example the function:
def func(a, b):
    return a*0.8 * (b/2)
So, I want to fill in the missing y values.
I am using:
import numpy as np
from scipy import interpolate
def func(a, b):
    return a*0.8 * (b/2)
x = np.array([100, 123, 123, 118, 123])
y = np.array([12, 1, 14, 13])
X, Y = np.meshgrid(x, y)
Z = func(X, Y)
f = interpolate.interp2d(x, y, Z, kind='cubic')
Now, I am not sure how to continue from here. If I try:
xnew = np.linspace(0,150,10)
ynew = np.linspace(0,150,10)
Znew = f(xnew, ynew)
Znew is filled with nan values.
Also, I want to do the opposite.
If x is shorter than y, I still want to interpolate based on the x values.
So, for example:
x = np.array([1,3,4])
y = np.array([1,2,3,4,5,6,7])
I want to remove values from y now.
How can I proceed with this?
To interpolate from a 1D array you can use np.interp as follows:
np.interp(np.linspace(0,1,len(x)), np.linspace(0,1,len(y)),y)
You can have a look at the documentation for full details, but in short:
Consider that the values of your array y sit at reference positions from 0 to 1 (for example, [5,2,6,3,9] has indexes [0,0.25,0.5,0.75,1]).
The second and third arguments of the function are these indexes and the vector y.
The first argument holds the indexes at which you want the interpolated values of y.
As an example:
>>> y = [0,5]
>>> indexes = [0,1]
>>> new_indexes = [0,0.5,1]
>>> np.interp(new_indexes, indexes, y)
array([0. , 2.5, 5. ])
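Applied to the arrays from the question, the same idea can be wrapped in a small helper that stretches or shrinks an array to any length. resample is a hypothetical name, and this is only a sketch of the linear case.

```python
import numpy as np

def resample(values, new_len):
    """Linearly resample a 1D array to new_len points over the same 0..1 span."""
    values = np.asarray(values, dtype=float)
    old_pos = np.linspace(0, 1, len(values))   # positions of the existing samples
    new_pos = np.linspace(0, 1, new_len)       # positions where values are wanted
    return np.interp(new_pos, old_pos, values)

# Stretch the 4-element y to match a 5-element x.
y_up = resample(np.array([12, 1, 14, 13]), 5)

# Shrink a 7-element y to match a 3-element x.
y_down = resample(np.array([1, 2, 3, 4, 5, 6, 7]), 3)

print(y_up)
print(y_down)
```

The first and last values are always preserved; everything in between is a linear blend of the neighbouring samples.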
I have experimental observations in a volume:
import numpy as np
# observations are not uniformly spaced
x = np.random.normal(0, 1, 10)
y = np.random.normal(5, 2, 10)
z = np.random.normal(10, 3, 10)
xx, yy, zz = np.meshgrid(x, y, z, indexing='ij')
# fake temperatures at those coords
tt = xx*2 + yy*2 + zz*2
# sample distances
dx = np.diff(x)
dy = np.diff(y)
dz = np.diff(z)
grad = np.gradient(tt, [dx, dy, dz]) # returns error
This gives me the error:
ValueError: operands could not be broadcast together with shapes (10,10,10) (3,9) (10,10,10).
EDIT: according to @jay-kominek in the comments below:
np.gradient won't work for you, it simply doesn't handle unevenly sampled data.
I've updated the question. Is there any function which can do my computation?
Two things to note: First, scalars are single values, not arrays. Second, the signature of the function is numpy.gradient(f, *varargs, **kwargs). Note the * before varargs. That means if varargs is a list, you pass *varargs. Or you can just provide the elements of varargs as separate arguments.
So, np.gradient wants a single value for the distance along each dimension, like:
np.gradient(tt, np.diff(x)[0], np.diff(y)[0], np.diff(z)[0])
or:
distances = [np.diff(x)[0], np.diff(y)[0], np.diff(z)[0]]
np.gradient(tt, *distances)
The required dx ... to be passed to np.gradient aren't grids of differences, but just one scalar each. So grad = np.gradient(tt, 0.1, 0.1, 0.1) appears to work.
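For unevenly spaced samples, newer NumPy releases (1.13 and later) also accept one 1D coordinate array per axis in place of a scalar spacing. The sketch below assumes such a version and sorts the coordinates first, which the original data did not do; the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed for reproducibility

# Unevenly spaced but sorted coordinates along each axis.
x = np.sort(rng.normal(0, 1, 10))
y = np.sort(rng.normal(5, 2, 10))
z = np.sort(rng.normal(10, 3, 10))
xx, yy, zz = np.meshgrid(x, y, z, indexing="ij")

# Fake temperatures: linear in each coordinate, so every partial derivative is 2.
tt = xx * 2 + yy * 2 + zz * 2

# Pass the coordinate arrays themselves, one per axis.
dt_dx, dt_dy, dt_dz = np.gradient(tt, x, y, z)

print(dt_dx.shape, np.allclose(dt_dx, 2.0))
```

For this linear field the finite differences are exact, which makes it a convenient sanity check before trying real data.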
I have two 2D arrays, x(ni, nj) and y(ni, nj), that I need to interpolate over one axis. I want to interpolate along the last axis for every ni.
I wrote
import numpy as np
from scipy.interpolate import interp1d
z = np.asarray([200,300,400,500,600])
out = []
for i in range(ni):
    f = interp1d(x[i,:], y[i,:], kind='linear')
    out.append(f(z))
out = np.asarray(out)
However, I think this method is inefficient and slow because of the loop when the array size is large. What is the fastest way to interpolate a multi-dimensional array like this? Is there any way to perform linear and cubic interpolation without a loop? Thanks.
The method you propose does have a python loop, so for large values of ni it is going to get slow. That said, unless you are going to have large ni you shouldn't worry much.
I have created sample input data with the following code:
def sample_data(n_i, n_j, z_shape):
    x = np.random.rand(n_i, n_j) * 1000
    x.sort()
    x[:, 0] = 0
    x[:, -1] = 1000
    y = np.random.rand(n_i, n_j)
    z = np.random.rand(*z_shape) * 1000
    return x, y, z
And have tested them with these two versions of linear interpolation:
def interp_1(x, y, z):
    rows, cols = x.shape
    out = np.empty((rows,) + z.shape, dtype=y.dtype)
    for j in range(rows):  # xrange in the original Python 2 code
        out[j] = interp1d(x[j], y[j], kind='linear', copy=False)(z)
    return out
def interp_2(x, y, z):
    rows, cols = x.shape
    row_idx = np.arange(rows).reshape((rows,) + (1,) * z.ndim)
    # index of the left edge of the bin that contains each value of z
    col_idx = np.argmax(x.reshape(x.shape + (1,) * z.ndim) > z, axis=1) - 1
    ret = y[row_idx, col_idx + 1] - y[row_idx, col_idx]
    ret /= x[row_idx, col_idx + 1] - x[row_idx, col_idx]
    ret *= z - x[row_idx, col_idx]
    ret += y[row_idx, col_idx]
    return ret
interp_1 is an optimized version of your code, following Dave's answer. interp_2 is a vectorized implementation of linear interpolation that avoids any python loop whatsoever. Coding something like this requires a sound understanding of broadcasting and indexing in numpy, and some things are going to be less optimized than what interp1d does. A prime example being finding the bin in which to interpolate a value: interp1d will surely break out of loops early once it finds the bin, the above function is comparing the value to all bins.
So the result is going to be very dependent on what n_i and n_j are, and even how long your array z of values to interpolate is. If n_j is small and n_i is large, you should expect an advantage from interp_2, and from interp_1 if it is the other way around. Smaller z should be an advantage to interp_2, longer ones to interp_1.
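Before timing anything, it is worth confirming that the loop and the vectorized version agree. The sketch below is a Python 3 rewrite for that purpose only: it uses np.interp as the per-row reference instead of interp1d (a substitution made to keep the example numpy-only), and the function names and seed are illustrative.

```python
import numpy as np

def interp_rows_loop(x, y, z):
    # Reference: one np.interp call per row.
    return np.array([np.interp(z, x[r], y[r]) for r in range(x.shape[0])])

def interp_rows_vectorized(x, y, z):
    rows = x.shape[0]
    row_idx = np.arange(rows).reshape((rows,) + (1,) * z.ndim)
    # Left edge of the bin containing each z, found by comparing against all bins.
    col_idx = np.argmax(x.reshape(x.shape + (1,) * z.ndim) > z, axis=1) - 1
    x0, x1 = x[row_idx, col_idx], x[row_idx, col_idx + 1]
    y0, y1 = y[row_idx, col_idx], y[row_idx, col_idx + 1]
    # Standard linear interpolation between the two bin edges.
    return y0 + (y1 - y0) * (z - x0) / (x1 - x0)

rng = np.random.default_rng(0)
x = np.sort(rng.random((50, 30)) * 1000, axis=1)
x[:, 0], x[:, -1] = 0, 1000          # make sure every z falls inside the range
y = rng.random((50, 30))
z = rng.random(5) * 1000

loop_result = interp_rows_loop(x, y, z)
vec_result = interp_rows_vectorized(x, y, z)
print(loop_result.shape, np.allclose(loop_result, vec_result))
```

The vectorized version builds a temporary array of shape (n_i, n_j, len(z)), which is the memory cost behind the timing trade-off described above.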
I have actually timed both approaches with a variety of n_i and n_j, for z of shape (5,) and (50,), here are the graphs:
So it seems that for z of shape (5,) you should go with interp_2 whenever n_j < 1000, and with interp_1 elsewhere. Not surprisingly, the threshold is different for z of shape (50,), now being around n_j < 100. It seems tempting to conclude that you should stick with your code if n_j * len(z) > 5000, but change it to something like interp_2 above if not, but there is a great deal of extrapolating in that statement! If you want to further experiment yourself, here's the code I used to produce the graphs.
import timeit
import matplotlib.pyplot as plt
n_s = np.logspace(1, 3.3, 25)
int_1 = np.empty((len(n_s),) * 2)
int_2 = np.empty((len(n_s),) * 2)
z_shape = (5,)
for i, n_i in enumerate(n_s):
    print(int(n_i))
    for j, n_j in enumerate(n_s):
        x, y, z = sample_data(int(n_i), int(n_j), z_shape)
        int_1[i, j] = min(timeit.repeat('interp_1(x, y, z)',
                                        'from __main__ import interp_1, x, y, z',
                                        repeat=10, number=1))
        int_2[i, j] = min(timeit.repeat('interp_2(x, y, z)',
                                        'from __main__ import interp_2, x, y, z',
                                        repeat=10, number=1))
cs = plt.contour(n_s, n_s, np.transpose(int_1-int_2))
plt.clabel(cs, inline=1, fontsize=10)
plt.xlabel('n_i')
plt.ylabel('n_j')
plt.title('timeit(interp_2) - timeit(interp_1), z.shape=' + str(z_shape))
plt.show()
One optimization is to allocate the result array once like so:
import numpy as np
from scipy.interpolate import interp1d
z = np.asarray([200,300,400,500,600])
out = np.zeros( [ni, len(z)], dtype=np.float32 )
for i in range(ni):
    f = interp1d(x[i,:], y[i,:], kind='linear')
    out[i,:] = f(z)
This will save you some of the memory copying that occurs in your implementation in the calls to out.append(...).