I have two arrays with values:
x = np.array([100, 123, 123, 118, 123])
y = np.array([12, 1, 14, 13])
I want to evaluate for example the function:
def func(a, b):
    return a*0.8 * (b/2)
So I want to fill in the missing values of y.
I am using:
import numpy as np
from scipy import interpolate
def func(a, b):
    return a*0.8 * (b/2)
x = np.array([100, 123, 123, 118, 123])
y = np.array([12, 1, 14, 13])
X, Y = np.meshgrid(x, y)
Z = func(X, Y)
f = interpolate.interp2d(x, y, Z, kind='cubic')
Now I am not sure how to continue from here. If I try:
xnew = np.linspace(0,150,10)
ynew = np.linspace(0,150,10)
Znew = f(xnew, ynew)
Znew is filled with nan values.
I also want to do the opposite: if x is shorter than y, I still want to interpolate based on the x values.
So, for example:
x = np.array([1,3,4])
y = np.array([1,2,3,4,5,6,7])
I want to remove values from y now.
How can I proceed with this?
To interpolate a 1D array you can use np.interp as follows:
np.interp(np.linspace(0,1,len(x)), np.linspace(0,1,len(y)),y)
You can have a look at the documentation for full details, but in short:
Consider that the values of your array y sit at reference positions running from 0 to 1 (for example, [5,2,6,3,9] has positions [0,0.25,0.5,0.75,1]).
The second and third arguments of the function are those positions and the vector y.
The first argument holds the positions at which you want the interpolated values of y.
As an example:
>>> y = [0,5]
>>> indexes = [0,1]
>>> new_indexes = [0,0.5,1]
>>> np.interp(new_indexes, indexes, y)
array([0. , 2.5, 5. ])
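Applied to the arrays from the question, the same call can stretch or shrink y to the length of x. This is only a minimal sketch: the helper name resample_to_length is made up for illustration, and it assumes the samples of y are evenly spaced.

```python
import numpy as np

def resample_to_length(values, new_len):
    """Linearly resample a 1-D array to new_len points (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    old_pos = np.linspace(0, 1, len(values))   # positions of the existing samples
    new_pos = np.linspace(0, 1, new_len)       # positions of the wanted samples
    return np.interp(new_pos, old_pos, values)

# Case 1: y is shorter than x, so y is stretched from 4 to 5 values.
x = np.array([100, 123, 123, 118, 123])
y = np.array([12, 1, 14, 13])
y_stretched = resample_to_length(y, len(x))

# Case 2: y is longer than x, so y is shrunk from 7 to 3 values.
x2 = np.array([1, 3, 4])
y2 = np.array([1, 2, 3, 4, 5, 6, 7])
y_shrunk = resample_to_length(y2, len(x2))
```

Note that shrinking this way only samples the piecewise-linear curve through y; it does not average the values that are dropped.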
Related
I have this function, which computes the average of the y values for the same x, but it doesn't work when the x values only agree to within a tolerance (x +/- eps).
import numpy as np
import matplotlib.pyplot as plt
from uncertainties import ufloat
from uncertainties.umath import *
x = np.array([0, 0,1,1,2,2,2], float)
y = np.array([1, 2, 3,5,4, 4, 6.8], float)
def avg_group(x, y):
    A, ind, counts = np.unique(x, return_index=True, return_counts=True)
    B = y[ind]
    for dup in A[counts>1]:
        B[(A==dup)] = np.average(y[(x==dup)] )
    return A, B
new_x, new_y = avg_group(x, y)
plt.plot(new_x,new_y,'o')
plt.show()
How can I add a condition to avg_group to get the average of y for x values within +/- eps of each other?
I don't think there is a built-in function for doing this. Even if there were, there is a bit of ambiguity in the question. Consider a sequence like this:
x = x1, x1+eps, x1+2eps, ...
x1 is close to x1+eps (i.e. within eps) but not close to x1+2eps, while x1+eps is close to both x1 and x1+2eps. So which x's do we treat as "the same"? x1 and x1+eps can be treated as the same, but so can x1+eps and x1+2eps. For the sake of further discussion, I will assume that we have two different x values in this case: x1 and x1+2eps.
Assuming the above, we can iterate over a sorted copy of x and, for each value, check whether it is close enough to the previous value. If it is, we group it with the previous value; otherwise we create a new entry.
I extended your code to implement the above.
import numpy as np
import matplotlib.pyplot as plt
eps = 1e-5
x = np.array([0, 0,1,1,2,2,2], float)
y = np.array([1, 2, 3,5,4, 4, 6.8], float)
def avg_group(x, y):
    # sort by x-values
    sorted_indices = np.argsort(x)
    x = x[sorted_indices]
    y = y[sorted_indices]
    sum_and_count = [(x[0], y[0], 1)]
    # we are maintaining a list of tuples of (x, sum of y, count of y)
    for index, (current_x, current_y) in enumerate(zip(x[1:], y[1:]), 1):
        # check if this x value is eps close to the previous x value
        previous_x = x[index-1]
        if current_x - previous_x <= eps:
            # This entry belongs to the previous tuple
            prev_tuple = sum_and_count[-1]
            new_tuple = (prev_tuple[0], prev_tuple[1]+current_y, prev_tuple[2]+1)
            sum_and_count[-1] = new_tuple
        else:
            # insert a new tuple
            new_tuple = (current_x, current_y, 1)
            sum_and_count.append(new_tuple)
    x, sum_y, count_y = zip(*sum_and_count)
    return np.asarray(x), np.asarray(sum_y) / np.asarray(count_y)
new_x, new_y = avg_group(x, y)
plt.plot(new_x,new_y,'o')
plt.show()
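One thing to keep in mind: the loop above compares each x with its immediate predecessor, so a chain such as x1, x1+eps, x1+2eps ends up in a single group. If you would rather get the two groups described earlier (x1 and x1+2eps), a possible variant compares each x with the first x of the current group instead. This is only a sketch; the name avg_group_anchor and the sample data are made up for illustration.

```python
import numpy as np

def avg_group_anchor(x, y, eps):
    """Average y over runs of x that lie within eps of the run's first x."""
    order = np.argsort(x)
    x = np.asarray(x, dtype=float)[order]
    y = np.asarray(y, dtype=float)[order]
    group_x, sums, counts = [x[0]], [y[0]], [1]
    for cx, cy in zip(x[1:], y[1:]):
        if cx - group_x[-1] <= eps:
            # still within eps of the first x of the current group
            sums[-1] += cy
            counts[-1] += 1
        else:
            # too far from the group's first x: start a new group
            group_x.append(cx)
            sums.append(cy)
            counts.append(1)
    return np.array(group_x), np.array(sums) / np.array(counts)

# 0.0 and 0.4 are grouped; 0.8 is more than eps away from 0.0, so it starts a new group.
gx, gy = avg_group_anchor([0.0, 0.4, 0.8, 3.0], [1.0, 2.0, 3.0, 4.0], eps=0.5)
```

Which of the two behaviours is right depends on what "the same x" should mean for your data.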
A colab notebook (with the code) is linked here.
Let me know if this helps or if you have any followup questions :)
I'd suggest you use a pandas.DataFrame instead of writing your own functions. By calling groupby(...).mean() on a new column that labels close values, you can get the expected result.
import numpy as np, pandas as pd
df = pd.DataFrame({'x':np.r_[0.0, 0.05, 0.93, 1, 2.1, 1.95, 2 ], #added small values
'y':np.r_[1, 2, 3, 5, 4, 4, 6.8]})
dist = .3
df['close_ind'] = df['x'].sort_values().diff().gt(dist).cumsum()
x_new = df.groupby('close_ind')['x'].mean().tolist()
y_new = df.groupby('close_ind')['y'].mean().tolist()
x_new: [0.025, 0.965, 2.017]
y_new: [1.5, 4.0, 4.933]
How do I evaluate a function in n variables in numpy? For simplicity, let n = 3. Consider the following example:
x, y, z = numpy.linspace(0, 1, 100), numpy.linspace(0, 1, 100), numpy.linspace(0, 1, 100)
def F(a, b, c): # Test function in 3 variables
return a + b + c
F_over_xyz = ... # How to get an array that contains F evaluated at all points in [0;1]³?
Somehow, I am also having a hard time wrapping my head around what shape the generated array would have.
A general way to get the Cartesian product of any number of arrays is:
np.stack(np.meshgrid(*arrays), axis=-1).reshape(-1, len(arrays))
So you could list all the points in [0;1]³:
import numpy as np
arrays = np.linspace(0, 1, 100), np.linspace(0, 1, 100), np.linspace(0, 1, 100)
list_of_points = np.stack(np.meshgrid(*arrays), axis=-1).reshape(-1, len(arrays))
The shape of list_of_points is (1000000, 3): 1M points, 3 coordinates each.
Then you can calculate the sum of the coordinates like so:
np.sum(list_of_points, axis=1)
You could also try:
import numpy as np
nn = 4
x,y,z=np.linspace(0,1,nn),np.linspace(0,1,nn),np.linspace(0,1,nn)
def F(a, b, c): # Test function in 3 variables
    return a + b + c
# this creates your grid
xgrid,ygrid,zgrid = np.meshgrid(x,y,z)
# output[i,j,k] will be F(xgrid[i,j,k],ygrid[i,j,k],zgrid[i,j,k])
output = F(xgrid,ygrid,zgrid)
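On the question of the resulting shape: with the default indexing="xy", meshgrid swaps the first two axes, which can be confusing when x, y and z have different lengths. The following sketch uses deliberately different lengths (4, 5 and 6, chosen only for illustration) to show the difference from indexing="ij".

```python
import numpy as np

def F(a, b, c):  # test function in 3 variables
    return a + b + c

x = np.linspace(0, 1, 4)
y = np.linspace(0, 1, 5)
z = np.linspace(0, 1, 6)

# indexing="ij" keeps the axis order (x, y, z): out_ij[i, j, k] == F(x[i], y[j], z[k])
xg, yg, zg = np.meshgrid(x, y, z, indexing="ij")
out_ij = F(xg, yg, zg)

# the default indexing="xy" swaps the first two axes: out_xy[j, i, k] == F(x[i], y[j], z[k])
out_xy = F(*np.meshgrid(x, y, z))

print(out_ij.shape, out_xy.shape)
```

So for the 100-point axes in the question, either variant gives an array of shape (100, 100, 100); only the meaning of the first two indices differs.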
I have the following data:
x = [0, 2, 4, 8, 30]
y = [1.2e-3, 3.5e-4, 5.1e-5, 1.6e-5, 2e-7]
I'm trying to interpolate to get y from a given x value.
When plotted the data looks like:
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1)
ax.semilogy(x, y, 'o-')
plt.show()
So say I'm interested in finding the value of x for a y value of 3.5e-5.
I can get the x value of a given y by:
z = np.linspace(0, 30, 10000)
logy = np.log10(y)
yy = np.power(10.0, np.interp(z, x, logy))
z[np.isclose(3.5e-5, yy, atol=1e-8)]
Out:
array([5.29852985])
But I have to adjust atol whenever I change the value in order to get a single match, and I also have to create a lot more data points to get the resolution.
Is there a simpler way to do this? Thanks.
Let's say you want to find the x_f corresponding to y_f. Assuming the entries in your original list y are in strictly decreasing order and the entries in x are increasing, you find the first entry in y that is less than or equal to your y_f. Say it is the one at index i, so the (x, y) pairs that define the relevant linear segment of your piecewise function are (x[i-1], y[i-1]) and (x[i], y[i]).
Using the formula for the line through two points, we can get x_f:
x_f = x[i-1] + (x[i]-x[i-1])/(y[i]-y[i-1])*(y_f-y[i-1])
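The same lookup can be sketched with np.interp by swapping the roles of x and y. Since np.interp expects its sample points in increasing order, both arrays are reversed. The first call follows the straight-line formula above; the second interpolates log10(y) instead, which corresponds to the straight segments of the semilog plot in the question. The variable names x_linear and x_semilog are just illustrative.

```python
import numpy as np

x = np.array([0, 2, 4, 8, 30], dtype=float)
y = np.array([1.2e-3, 3.5e-4, 5.1e-5, 1.6e-5, 2e-7])
y_f = 3.5e-5

# y is strictly decreasing, so reverse both arrays to give np.interp increasing sample points.
x_linear = np.interp(y_f, y[::-1], x[::-1])

# Same idea on log10(y): linear in x versus log(y), like the semilogy plot.
x_semilog = np.interp(np.log10(y_f), np.log10(y)[::-1], x[::-1])

print(x_linear, x_semilog)
```

The two results differ (roughly 5.83 versus 5.30 here) because they assume different shapes for the curve between the data points; the semilog one is close to the value found with np.isclose in the question.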
I have a 2d matrix (1800*600) with many NaN values.
I would like to conduct a 2d interpolation, which is very simple in matlab.
But if scipy.interpolate.interp2d is used, the result is a NaN matrix. I know the NaN values could be filled using scipy.interpolate.griddata, but I don't want to fill in the NaNs. What other functions can I use to conduct a 2D interpolation?
A workaround using interp2d is to perform two interpolations: one on the filled data (replace the NaNs with an arbitrary value) and one to keep track of the undefined areas. It is then possible to re-assign NaN to these areas:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.interpolate import interp2d
# Generate some test data:
x = np.linspace(-2, 2, 40)
y = np.linspace(-2, 2, 41)
xx, yy = np.meshgrid(x, y)
z = xx**2+yy**2
z[ xx**2+yy**2<1 ] = np.nan
# Interpolation functions:
nan_map = np.zeros_like( z )
nan_map[ np.isnan(z) ] = 1
filled_z = z.copy()
filled_z[ np.isnan(z) ] = 0
f = interp2d(x, y, filled_z, kind='linear')
f_nan = interp2d(x, y, nan_map, kind='linear')
# Interpolation on new points:
xnew = np.linspace(-2, 2, 20)
ynew = np.linspace(-2, 2, 21)
z_new = f(xnew, ynew)
nan_new = f_nan( xnew, ynew )
z_new[ nan_new>0.5 ] = np.nan
plt.pcolor(xnew, ynew, z_new);
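In recent SciPy releases interp2d is deprecated (and, as far as I know, removed as of SciPy 1.14), so here is a sketch of the same two-interpolation trick using scipy.interpolate.RegularGridInterpolator instead. It assumes the data sit on a regular grid, as in the test data above. Note that this interpolator takes the axes in array order, (y, x), and is evaluated at a list of points rather than at two 1D axes.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Same test data as above: z has shape (len(y), len(x)).
x = np.linspace(-2, 2, 40)
y = np.linspace(-2, 2, 41)
xx, yy = np.meshgrid(x, y)
z = xx**2 + yy**2
z[xx**2 + yy**2 < 1] = np.nan

# One interpolator for the filled data, one for the NaN mask.
nan_map = np.isnan(z).astype(float)
filled_z = np.where(np.isnan(z), 0.0, z)
f = RegularGridInterpolator((y, x), filled_z, method="linear")
f_nan = RegularGridInterpolator((y, x), nan_map, method="linear")

# Evaluate on the new grid, passed as an array of (y, x) points.
xnew = np.linspace(-2, 2, 20)
ynew = np.linspace(-2, 2, 21)
yq, xq = np.meshgrid(ynew, xnew, indexing="ij")
pts = np.stack([yq.ravel(), xq.ravel()], axis=-1)

z_new = f(pts).reshape(yq.shape)
z_new[f_nan(pts).reshape(yq.shape) > 0.5] = np.nan   # restore the undefined region
```

The 0.5 threshold on the interpolated mask is the same arbitrary choice as in the interp2d version; a lower threshold makes the NaN region larger.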
What is a good way to produce a numpy array containing the values of a function evaluated on an n-dimensional grid of points?
For example, suppose I want to evaluate the function defined by
def func(x, y):
    return <some function of x and y>
Suppose I want to evaluate it on a two dimensional array of points with the x values going from 0 to 4 in ten steps, and the y values going from -1 to 1 in twenty steps. What's a good way to do this in numpy?
P.S. This has been asked in various forms on StackOverflow many times, but I couldn't find a concisely stated question and answer. I posted this to provide a concise simple solution (below).
A shorter, faster and clearer answer, avoiding meshgrid:
import numpy as np
def func(x, y):
    return np.sin(y * x)
xaxis = np.linspace(0, 4, 10)
yaxis = np.linspace(-1, 1, 20)
result = func(xaxis[:,None], yaxis[None,:])
This will be faster and lighter on memory if your function is something like x^2+y, since then x^2 is computed on a 1D array (instead of a 2D one), and the increase in dimension only happens when you do the "+". With meshgrid, x^2 is computed on a 2D array in which essentially every row is the same, which wastes a lot of time.
Edit: "x[:,None]" turns x into a 2D array with an "empty" (length-1) second dimension. "None" here is the same as numpy.newaxis, as in "x[:,numpy.newaxis]". The same thing is done with y, but with an empty first dimension.
Edit: in 3 dimensions:
def func2(x, y, z):
    return np.sin(y * x)+z
xaxis = np.linspace(0, 4, 10)
yaxis = np.linspace(-1, 1, 20)
zaxis = np.linspace(0, 1, 20)
result2 = func2(xaxis[:,None,None], yaxis[None,:,None],zaxis[None,None,:])
This way you can easily extend to n dimensions if you wish, using as many None or : entries as you have dimensions. Each : makes a full dimension, and each None makes an "empty" (length-1) dimension. The next example shows how these empty dimensions work: the shape changes when you use None, so the result is a 3D object, but the empty dimensions only get filled up when you combine the array with an object that actually has something in those dimensions.
In [1]: import numpy
In [2]: a = numpy.linspace(-1,1,20)
In [3]: a.shape
Out[3]: (20,)
In [4]: a[None,:,None].shape
Out[4]: (1, 20, 1)
In [5]: b = a[None,:,None] # this is a 3D array, but with the first and third dimension being "empty"
In [6]: c = a[:,None,None] # same, but last two dimensions are "empty" here
In [7]: d=b*c
In [8]: d.shape # only the last dimension is "empty" here
Out[8]: (20, 20, 1)
Edit: without needing to type the None yourself:
def ndm(*args):
    return [x[(None,)*i+(slice(None),)+(None,)*(len(args)-i-1)] for i, x in enumerate(args)]
x2,y2,z2 = ndm(xaxis,yaxis,zaxis)
result3 = func2(x2,y2,z2)
This way, the None-slicing that creates the extra empty dimensions is generated for you: the first argument you give to ndm becomes the first full dimension, the second argument the second full dimension, and so on. It does the same as the 'hardcoded' None syntax used before.
Short explanation: doing x2, y2, z2 = ndm(xaxis, yaxis, zaxis) is the same as doing
x2 = xaxis[:,None,None]
y2 = yaxis[None,:,None]
z2 = zaxis[None,None,:]
but the ndm method also works for more dimensions, without needing to hardcode the None-slices line by line as just shown. It will also work in numpy versions before 1.8, while numpy.meshgrid only works for more than 2 dimensions if you have numpy 1.8 or higher.
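NumPy also ships a helper that, as far as I can tell, builds the same open grid as ndm: np.ix_. A sparse meshgrid with indexing="ij" should give the same shapes as well. The sketch below re-uses func2 and the axes from above and compares the two; the names result_ix and result_sparse are just illustrative.

```python
import numpy as np

def func2(x, y, z):
    return np.sin(y * x) + z

xaxis = np.linspace(0, 4, 10)
yaxis = np.linspace(-1, 1, 20)
zaxis = np.linspace(0, 1, 20)

# np.ix_ reshapes each 1-D axis so that it only extends along its own dimension.
x2, y2, z2 = np.ix_(xaxis, yaxis, zaxis)   # shapes (10,1,1), (1,20,1), (1,1,20)
result_ix = func2(x2, y2, z2)

# A sparse "ij" meshgrid produces arrays with the same shapes.
xs, ys, zs = np.meshgrid(xaxis, yaxis, zaxis, indexing="ij", sparse=True)
result_sparse = func2(xs, ys, zs)

print(result_ix.shape)
```

Both variants rely on broadcasting inside func2, so the full 3D array is only created when the function combines the axes.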
import numpy as np
def func(x, y):
    return np.sin(y * x)
xaxis = np.linspace(0, 4, 10)
yaxis = np.linspace(-1, 1, 20)
x, y = np.meshgrid(xaxis, yaxis)
result = func(x, y)
I use this function to get X, Y, Z values ready for plotting:
def npmap2d(fun, xs, ys, doPrint=False):
    Z = np.empty(len(xs) * len(ys))
    i = 0
    for y in ys:
        for x in xs:
            Z[i] = fun(x, y)
            if doPrint: print([i, x, y, Z[i]])
            i += 1
    X, Y = np.meshgrid(xs, ys)
    Z.shape = X.shape
    return X, Y, Z
Usage:
def f(x, y):
    # ...some function that can't handle numpy arrays
    ...
X, Y, Z = npmap2d(f, np.linspace(0, 0.5, 21), np.linspace(0.6, 0.4, 41))
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_wireframe(X, Y, Z)
The same result can be achieved using map:
xs = np.linspace(0, 4, 10)
ys = np.linspace(-1, 1, 20)
X, Y = np.meshgrid(xs, ys)
Z = np.fromiter(map(f, X.ravel(), Y.ravel()), X.dtype).reshape(X.shape)
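Another option along the same lines is np.vectorize, which wraps a scalar-only function so it can be called on arrays. It is a convenience wrapper around a Python-level loop, so it is not expected to be faster than the map version. The function f below is a made-up stand-in for "some function that can't handle numpy arrays".

```python
import math
import numpy as np

def f(x, y):
    # scalar-only stand-in: math.sin does not accept multi-element arrays
    return math.sin(x * y)

xs = np.linspace(0, 4, 10)
ys = np.linspace(-1, 1, 20)
X, Y = np.meshgrid(xs, ys)

f_vec = np.vectorize(f)   # calls f once per element of the broadcast inputs
Z = f_vec(X, Y)           # same shape as X and Y

print(Z.shape)
```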
In case your function actually takes a tuple of d elements, i.e. f((x1,x2,x3,...xd)) (for example the scipy.stats.multivariate_normal function), and you want to evaluate f on the grid of N^d combinations of d variables, you could also do the following (2D case):
x=np.arange(-1,1,0.2) # each variable is instantiated N=10 times
y=np.arange(-1,1,0.2)
Z=f(np.dstack(np.meshgrid(x,y))) # result is an NxN (10x10) matrix, whose entries are f((xi,yj))
Here np.dstack(np.meshgrid(x,y)) creates a 10x10 "matrix" (technically a 10x10x2 numpy array) whose entries are the 2-dimensional tuples to be evaluated by f.
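As a concrete sketch of this pattern, here is the scipy.stats.multivariate_normal case mentioned above. The mean and covariance (a standard 2D normal) are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

x = np.arange(-1, 1, 0.2)
y = np.arange(-1, 1, 0.2)

# Last axis holds the (x, y) pair for each grid node.
grid = np.dstack(np.meshgrid(x, y))   # shape (len(y), len(x), 2)

# pdf() treats the last axis as the components of each point.
rv = multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2))
Z = rv.pdf(grid)                      # shape (len(y), len(x))

print(grid.shape, Z.shape)
```

This works because pdf is written to broadcast over all leading axes; a function that only accepts a single tuple would still need a loop or np.apply_along_axis.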
My two cents:
import numpy as np
x = np.linspace(0, 4, 10)
y = np.linspace(-1, 1, 20)
[X, Y] = np.meshgrid(x, y, indexing='ij', sparse=True)
def func(x, y):
    return x*y/(x**2 + y**2 + 4)
# I have defined a function of x and y.
func(X, Y)