How do I compute the derivative of an array, y (say), with respect to another array, x (say) - both arrays from a certain experiment?
e.g.
y = [1,2,3,4,4,5,6] and x = [.1,.2,.5,.6,.7,.8,.9];
I want to get dy/dx!
Use numpy.diff
If dx is constant
from numpy import diff
dx = 0.1
y = [1, 2, 3, 4, 4, 5, 6]
dy = diff(y)/dx
print(dy)
[10. 10. 10.  0. 10. 10.]
If dx is not constant (your example)
from numpy import diff
x = [.1, .2, .5, .6, .7, .8, .9]
y = [1, 2, 3, 4, 4, 5, 6]
dydx = diff(y)/diff(x)
print(dydx)
[10., 3.33333, 10. , 0. , 10. , 10.]
Note that this approximate "derivative" has n-1 elements, where n is the size of your array/list.
I don't know what you are trying to achieve, but here are some ideas:
If you are doing numerical differentiation, a finite-difference formulation may serve you better.
The solution above is a first-order accurate forward finite-difference scheme on a non-uniform grid/array.
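As a rough illustration of the accuracy remark (a sketch, not part of the original answer): diff(y)/diff(x) is only first-order accurate when each slope is attributed to the left end of its interval, but it behaves like a central difference when attributed to the interval midpoint. The test function sin and the names x_mid, err_left and err_mid are assumptions chosen for the demonstration.

```python
import numpy as np

# Non-uniform sample points (the x values from the question) and a known function
x = np.array([.1, .2, .5, .6, .7, .8, .9])
y = np.sin(x)

dydx = np.diff(y) / np.diff(x)      # n-1 slopes, one per interval
x_mid = (x[:-1] + x[1:]) / 2        # midpoint of each interval (hypothetical name)

# Largest error if the slopes are attributed to the left end of each interval
err_left = np.abs(dydx - np.cos(x[:-1])).max()
# Largest error if the slopes are attributed to the interval midpoints
err_mid = np.abs(dydx - np.cos(x_mid)).max()

print(err_left, err_mid)
```

If you plot the result, plotting dydx against x_mid rather than x[:-1] is therefore usually the better choice.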
Use numpy.gradient()
Please be aware that there are more advanced ways to calculate a numerical derivative than simply using diff. I would suggest using numpy.gradient, as in this example.
import numpy as np
from matplotlib import pyplot as plt
# we sample a sin(x) function
dx = np.pi/10
x = np.arange(0,2*np.pi,np.pi/10)
# we calculate the derivative, with np.gradient
plt.plot(x,np.gradient(np.sin(x), dx), '-*', label='approx')
# we compare it with the exact first derivative, i.e. cos(x)
plt.plot(x,np.cos(x), label='exact')
plt.legend()
plt.show()
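For unevenly spaced data such as the arrays in the question, recent NumPy versions (1.13 or later, as far as I know) also accept the coordinate array in place of a scalar spacing. The following is a minimal sketch under that assumption; the quadratic check at the end is only there to show that the interior values are computed with a second-order formula.

```python
import numpy as np

x = np.array([.1, .2, .5, .6, .7, .8, .9])
y = np.array([1, 2, 3, 4, 4, 5, 6], dtype=float)

# Passing the coordinates instead of a scalar spacing handles the uneven steps
dydx = np.gradient(y, x)
print(dydx.shape)   # same length as the input, unlike np.diff

# Sanity check on a function with a known derivative, d(x**2)/dx = 2*x
g = np.gradient(x**2, x)
print(np.allclose(g[1:-1], 2 * x[1:-1]))
```

Unlike diff(y)/diff(x), the result has one value per sample point, so it can be plotted directly against x.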
I'm assuming this is what you meant:
>>> x = [.1,.2,.5,.6,.7,.8,.9]
>>> y = [1,2,3,4,4,5,6]
>>> def pairwise(iterable): # question 5389507
...     "s -> (s0,s1), (s2,s3), (s4, s5), ..."
...     a = iter(iterable)
...     return zip(a, a)
...
>>> for ((a, b), (c, d)) in zip(pairwise(x), pairwise(y)):
...     print('%.1f' % ((d - c) / (b - a)))
...
10.0
10.0
10.0
>>>
question 5389507 link
That is, define dx as the difference between adjacent elements in x.
numpy.diff(x) computes the difference between adjacent elements in x, just like in the answer by #tsm.
As a result you get an array which is one element shorter than the original one. This makes sense, as you can only start computing the differences from index 1 (one "history element" is needed).
>>> x = [1,3,4,6,7,8]
>>> dx = numpy.diff(x)
>>> dx
array([2, 1, 2, 1, 1])
>>> y = [1,2,4,2,3,1]
>>> dy = numpy.diff(y)
>>> dy
array([ 1, 2, -2, 1, -2])
Now you can divide those 2 resulting arrays to get the desired derivative.
>>> d = dy / dx
>>> d
array([ 0.5, 2. , -1. , 1. , -2. ])
If for some reason, you need a relative (to the y-values) growth, you can do it the following way:
>>> d / y[:-1]
array([ 0.5 , 1. , -0.25 , 0.5 , -0.66666667])
Interpret as 50% growth, 100% growth, -25% growth, etc.
Full code:
import numpy
x = [1,3,4,6,7,8]
y = [1,2,4,2,3,1]
dx = numpy.diff(x)
dy = numpy.diff(y)
d = dy/dx
Related
Let's say I have a simple array, like this one:
import numpy as np
a = np.array([1,2,3])
Which returns me, obviously:
array([1, 2, 3])
I'm trying to add calculated values between consecutive values in this array. The calculation should return n equally spaced values between its bounds.
To express myself in numbers, let's say I want to add 1 value between each pair of consecutive values, so the function should return an array like this:
array([1, 1.5, 2, 2.5, 3])
Another example, now with 2 values between each pair:
array([1, 1.33, 1.66, 2, 2.33, 2.66, 3])
I know the logic and I can create myself a function which will do the work, but I feel numpy has specific functions that would make my code so much cleaner!
If your array is not necessarily evenly spaced, you can interpolate with np.interp:
import numpy as np
n = 2
a = np.array([1,2,5])
new_size = a.size + (a.size - 1) * n
x = np.linspace(a.min(), a.max(), new_size)
xp = np.linspace(a.min(), a.max(), a.size)
fp = a
result = np.interp(x, xp, fp)
returns: array([1. , 1.33333333, 1.66666667, 2. , 3. , 4. , 5. ])
If your array is always evenly spaced, you can just use
new_size = a.size + (a.size - 1) * n
result = np.linspace(a.min(), a.max(), new_size)
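The same idea can be wrapped in a small helper that interpolates over element positions rather than over the range a.min() to a.max(). This is only a sketch: the function name insert_between is hypothetical, and working on positions is my own variation, chosen so that the helper does not depend on the values being sorted or distinct.

```python
import numpy as np

def insert_between(a, n):
    """Insert n linearly interpolated values between consecutive elements of a."""
    a = np.asarray(a, dtype=float)
    # The original elements sit at integer positions 0 .. a.size-1
    positions = np.arange(a.size)
    new_size = a.size + (a.size - 1) * n
    # Evenly spaced query positions; every (n+1)-th one lands on an original element
    query = np.linspace(0, a.size - 1, new_size)
    return np.interp(query, positions, a)

print(insert_between([1, 2, 3], 1))
print(insert_between([1, 2, 5], 2))
```

For the two examples in the question this reproduces the requested arrays, and for an unevenly spaced input each gap is subdivided separately.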
Using linspace should do the trick:
a = np.array([1,2,3])
n = 1
temps = []
for i in range(1, len(a)):
    temps.append(np.linspace(a[i-1], a[i], num=n+1, endpoint=False))
# Add last final ending point
temps.append(np.array([a[-1]]))
new_a = np.concatenate(temps)
print(new_a)
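Try with np.arange (this assumes consecutive values differ by 1, and the stop value a.max() is excluded from the result):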
a = np.array([1,2,3])
n = 2
print(np.arange(a.min(), a.max(), 1 / (n + 1)))
Output:
[1. 1.33333333 1.66666667 2. 2.33333333 2.66666667]
I am aware of the scipy.spatial.distance.pdist function and how to compute the mean from the resulting matrix/ndarray.
>>> import numpy as np
>>> from scipy.spatial.distance import pdist
>>> x = np.random.rand(10000, 2)
>>> y = pdist(x, metric='euclidean')
>>> y.mean()
0.5214255824176626
In the example above y gets quite large (nearly 2,500 times as large as the input array):
>>> y.shape
(49995000,)
>>> from sys import getsizeof
>>> getsizeof(x)
160112
>>> getsizeof(y)
399960096
>>> getsizeof(y) / getsizeof(x)
2498.0019986009793
But since I am only interested in the mean pairwise distance, the distance matrix doesn't have to be kept in memory. Instead, the mean of each row (or column) can be computed separately. The final mean value can then be computed from the row mean values.
Is there already a function which exploits this property, or is there an easy way to extend/combine existing functions to do so?
If you use the squared Euclidean distance, the mean pairwise distance is equivalent to twice the sum of the per-column variances computed with n-1 (ddof=1):
from scipy.spatial.distance import pdist, squareform
import numpy as np
x = np.random.rand(10000, 2)
y = np.array([[1,1], [0,0], [2,0]])
print(pdist(x, 'sqeuclidean').mean())
print(np.var(x, 0, ddof=1).sum()*2)
Output:
0.331474285845873
0.33147428584587346
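A short derivation of why the two printed numbers agree (my own working, not from the original answer), writing $\bar{x}$ for the mean of the rows $x_i$ and $s_d^2$ for the sample variance (ddof=1) of column $d$:

```latex
\begin{aligned}
\frac{1}{\binom{n}{2}} \sum_{i<j} \lVert x_i - x_j \rVert^2
  &= \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1}^{n} \lVert x_i - x_j \rVert^2 \\
  &= \frac{1}{n(n-1)} \Bigl( 2n \sum_{i} \lVert x_i \rVert^2
       - 2 \Bigl\lVert \sum_{i} x_i \Bigr\rVert^2 \Bigr) \\
  &= \frac{2}{n-1} \sum_{i} \lVert x_i - \bar{x} \rVert^2
   = 2 \sum_{d} s_d^2 .
\end{aligned}
```

Note that this covers the squared distance only; as far as I know there is no comparable shortcut for the mean of the unsquared Euclidean distances.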
You will have to weight each row by the number of observations that make up the mean. For example the pdist of a 3 x 2 matrix is the flattened upper triangle (offset of 1) of the squareform 3 x 3 distance matrix.
arr = np.arange(6).reshape(3,2)
arr
array([[0, 1],
[2, 3],
[4, 5]])
pdist(arr)
array([2.82842712, 5.65685425, 2.82842712])
from sklearn.metrics import pairwise_distances
square = pairwise_distances(arr)
square
array([[0. , 2.82842712, 5.65685425],
[2.82842712, 0. , 2.82842712],
[5.65685425, 2.82842712, 0. ]])
square[np.triu_indices(square.shape[0], 1)]
array([2.82842712, 5.65685425, 2.82842712])
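There is the pairwise_distances_chunked function that can be used to iterate over the distance matrix row by row, but you will need to keep track of the row index to make sure you only take the mean of values in the upper/lower triangle of the matrix (the distance matrix is symmetric). This isn't complicated, but I imagine it will introduce a significant slowdown.
from sklearn.metrics import pairwise_distances_chunked
# Assumption: gen was not defined in the original snippet. working_memory=0 makes
# scikit-learn fall back to one row per chunk (it emits a warning when doing so).
gen = pairwise_distances_chunked(arr, working_memory=0)
tot = ((arr.shape[0]**2) - arr.shape[0]) / 2  # number of unique pairs
weighted_means = 0
r = 0  # index of the current row
for i in gen:
    if r < arr.shape[0]:
        sm = i[0, r:].mean()
        wgt = (i.shape[1] - r) / tot
        weighted_means += sm * wgt
    r += 1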
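The row-by-row idea can also be sketched with SciPy alone by processing blocks of rows with cdist, so that only one block of the distance matrix is in memory at a time. This is a rough sketch rather than an existing library function: the name mean_pairwise_distance and the chunk_size parameter are hypothetical, and it computes every distance twice for the sake of simplicity.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist

def mean_pairwise_distance(x, chunk_size=1000):
    """Mean Euclidean distance over all unordered pairs of rows of x."""
    n = x.shape[0]
    total = 0.0
    for start in range(0, n, chunk_size):
        # Distances from this block of rows to every row: shape (<= chunk_size, n)
        block = cdist(x[start:start + chunk_size], x)
        total += block.sum()
    # The full square matrix counts every pair twice and has a zero diagonal
    return total / (n * (n - 1))

rng = np.random.default_rng(0)
x = rng.random((500, 2))
print(mean_pairwise_distance(x, chunk_size=64), pdist(x).mean())
```

Peak memory is roughly chunk_size * n floats instead of n * (n - 1) / 2, at the cost of about twice the arithmetic.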
In pyomo, I have a piece-wise linear constraint defined through pyomo.environ.Piecewise. I keep getting a warning along the lines of
Piecewise component '<component name>' has detected slopes of consecutive piecewise segments to be within <tolerance> of one another. Refer to the Piecewise help documentation for information on how to disable this warning.
I know I could increase the tolerance and get rid of the warning, but I'm wondering if there is a general approach (through Pyomo or numpy) to reduce the number of "segments" if two consecutive slopes are below a given tolerance.
I could obviously implement this myself, but I'd like to avoid reinventing the wheel.
OK, this is what I came up with. It is definitely not optimized for performance, but my use case involves only a few points. It also lacks further validation of the inputs (e.g. that x is sorted and unique).
import numpy as np

def reduce_piecewise(x, y, abs_tol):
    """
    Remove unnecessary points from piece-wise curve.

    Points are removed if the slopes of consecutive segments
    differ by less than `abs_tol`.

    x points must be sorted and unique.
    Consecutive y points can be the same though!

    Parameters
    ----------
    x : List[float]
        Points along x-axis.
    y : List[float]
        Points along y-axis.
    abs_tol : float
        Tolerance between consecutive segments.

    Returns
    -------
    (np.array, np.array)
        x and y points - reduced.
    """
    if not len(x) == len(y):
        raise ValueError("x and y must have same shape")
    x_reduced = [x[0]]
    y_reduced = [y[0]]
    for i in range(1, len(x) - 1):
        # slope from the last kept point to the current point
        left_slope = (y[i] - y_reduced[-1])/(x[i] - x_reduced[-1])
        # slope from the current point to the next point
        right_slope = (y[i+1] - y[i])/(x[i+1] - x[i])
        if abs(right_slope - left_slope) > abs_tol:
            x_reduced.append(x[i])
            y_reduced.append(y[i])
    x_reduced.append(x[-1])
    y_reduced.append(y[-1])
    return np.array(x_reduced), np.array(y_reduced)
And here are some examples:
>>> x = np.array([0, 1, 2, 3])
>>> y = np.array([0, 1, 2, 3])
>>> reduce_piecewise(x, y, 0.01)
(array([0, 3]), array([0, 3]))
>>> x = np.array([0, 1, 2, 3, 4, 5])
>>> y = np.array([0, 2, -1, 3, 4.001, 5]) # 4.001 should be removed
>>> reduce_piecewise(x, y, 0.01)
(array([0, 1, 2, 3, 5]), array([ 0., 2., -1., 3., 5.]))
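If a loop is undesirable, a vectorized variant can be sketched with np.diff. Note that this is not the same algorithm: it compares the slopes of adjacent original segments, whereas the function above measures the left slope from the last kept point, so the two can disagree when the slope drifts slowly over many points. The name reduce_piecewise_vectorized is hypothetical.

```python
import numpy as np

def reduce_piecewise_vectorized(x, y, abs_tol):
    """Keep the end points plus interior points where the slope changes by more than abs_tol."""
    x = np.asarray(x)
    y = np.asarray(y)
    slopes = np.diff(y) / np.diff(x)        # slope of each original segment
    keep = np.ones(len(x), dtype=bool)      # the two end points are always kept
    # Interior point i is kept if the slopes on either side of it differ enough
    keep[1:-1] = np.abs(np.diff(slopes)) > abs_tol
    return x[keep], y[keep]

x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([0, 2, -1, 3, 4.001, 5])
print(reduce_piecewise_vectorized(x, y, 0.01))
```

On the two examples above it gives the same reduced points as the loop version.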
I am doing a cubic spline interpolation using scipy.interpolate.splrep as follows:
import numpy as np
import scipy.interpolate
x = np.linspace(0, 10, 10)
y = np.sin(x)
tck = scipy.interpolate.splrep(x, y, task=0, s=0)
F = scipy.interpolate.PPoly.from_spline(tck)
I print F.x and F.c:
print(F.x)
array([ 0. , 0. , 0. , 0. ,
2.22222222, 3.33333333, 4.44444444, 5.55555556,
6.66666667, 7.77777778, 10. , 10. ,
10. , 10. ])
print(F.c)
array([[ -1.82100357e-02, -1.82100357e-02, -1.82100357e-02,
-1.82100357e-02, 1.72952212e-01, 1.26008293e-01,
-4.93704109e-02, -1.71230879e-01, -1.08680287e-01,
1.00658224e-01, 1.00658224e-01, 1.00658224e-01,
1.00658224e-01],
[ -3.43151441e-01, -3.43151441e-01, -3.43151441e-01,
-3.43151441e-01, -4.64551679e-01, 1.11955696e-01,
5.31983340e-01, 3.67415303e-01, -2.03354294e-01,
-5.65621916e-01, 1.05432909e-01, 1.05432909e-01,
1.05432909e-01],
[ 1.21033389e+00, 1.21033389e+00, 1.21033389e+00,
1.21033389e+00, -5.84561936e-01, -9.76335250e-01,
-2.60847433e-01, 7.38484392e-01, 9.20774403e-01,
6.63563923e-02, -9.56285846e-01, -9.56285846e-01,
-9.56285846e-01],
[ -4.94881722e-18, -4.94881722e-18, -4.94881722e-18,
-4.94881722e-18, 7.95220057e-01, -1.90567963e-01,
-9.64317117e-01, -6.65101515e-01, 3.74151231e-01,
9.97097891e-01, -5.44021111e-01, -5.44021111e-01,
-5.44021111e-01]])
So I had supplied the x array as :
array([ 0. , 1.11111111, 2.22222222, 3.33333333,
4.44444444, 5.55555556, 6.66666667, 7.77777778,
8.88888889, 10. ])
Q.1: The F.x values (knots) are not the same as the original x array and contain duplicates (possibly to force the first derivative to zero?). Also, some values in x (1.11111111, 8.88888889) are missing from F.x. Any ideas?
Q.2: The shape of F.c is (4, 13). I understand that the 4 comes from the fact that it is a cubic spline fit, but I do not know how to select the coefficients for each of the 9 sections that I want (from x = 0 to x = 1.11111, x = 1.11111 to x = 2.22222, and so on). Any help in extracting the coefficients for the different segments would be appreciated.
If you want to have the knots in specific locations along the curves you need to use the argument task=-1 of splrep and give an array of interior knots as the t argument.
The knots in t must satisfy the following condition:
If provided, knots t must satisfy the Schoenberg-Whitney conditions, i.e., there must be a subset of data points x[j] such that t[j] < x[j] < t[j+k+1], for j=0, 1,...,n-k-2.
See the documentation here.
Then you should get F.c of the following size (4, <length of t> + 2*(k+1)-1) corresponding to the consecutive intervals along the curve (k+1 knots are added at either end of the curve by splrep).
Try the following:
import numpy as np
import scipy.interpolate
x = np.linspace(0, 10, 20)
y = np.sin(x)
t = np.linspace(0, 10, 10)
tck = scipy.interpolate.splrep(x, y, t=t[1:-1])
F = scipy.interpolate.PPoly.from_spline(tck)
print(F.x)
print(F.c)
# Accessing coeffs of nth segment: index = k + n - 1
# Eg. for second segment:
print(F.c[:,4])
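To check which column of F.c belongs to which interval, the zero-width intervals created by the repeated end knots can be filtered out and one segment evaluated by hand. The sketch below uses the data from the question; it relies on the PPoly convention that column i holds the coefficients of powers of (x - F.x[i]), highest power first. The names nonzero, breaks, coeffs and by_hand are my own.

```python
import numpy as np
from scipy.interpolate import splrep, PPoly

x = np.linspace(0, 10, 10)
y = np.sin(x)
F = PPoly.from_spline(splrep(x, y, s=0))

# Drop the zero-width intervals created by the repeated end knots
nonzero = np.diff(F.x) > 0
breaks = F.x[:-1][nonzero]      # left edge of each remaining segment
coeffs = F.c[:, nonzero]        # one column of 4 coefficients per segment

# Evaluate segment i by hand: c[0]*d**3 + c[1]*d**2 + c[2]*d + c[3], with d = xq - left edge
i = 1
xq = breaks[i] + 0.3
by_hand = np.polyval(coeffs[:, i], xq - breaks[i])
print(by_hand, F(xq))
print(coeffs.shape)
```

With the question's data this leaves 7 segments rather than 9, because the automatically chosen knots skip the second and second-to-last data points.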
I'm very new to Python and am currently trying to replicate plots etc. that I previously made with GrADS. I want to calculate the divergence at each grid box using u and v wind fields (which are just scaled by specific humidity, q) from a netCDF climate model file.
From endless searching I know I need to use some combination of np.gradient and np.sum, but can't find the right combination. I just know that to do it 'by hand', the calculation would be
divg = dqu/dx + dqv/dy
I know the below is wrong, but it's the best I've got so far...
nc = Dataset(ifile)
q = np.array(nc.variables['hus'][0,:,:])
u = np.array(nc.variables['ua'][0,:,:])
v = np.array(nc.variables['va'][0,:,:])
lon=nc.variables['lon'][:]
lat=nc.variables['lat'][:]
qu = q*u
qv = q*v
dqu/dx, dqu/dy = np.gradient(qu, [dx, dy])
dqv/dx, dqv/dy = np.gradient(qv, [dx, dy])
divg = np.sum(dqu/dx, dqv/dy)
This gives the error 'SyntaxError: can't assign to operator'.
Any help would be much appreciated.
try something like:
dqu_dx, dqu_dy = np.gradient(qu, [dx, dy])
dqv_dx, dqv_dy = np.gradient(qv, [dx, dy])
You cannot assign to the result of an operation in Python; all of these are syntax errors:
a + b = 3
a * b = 7
# or, in your case:
a / b = 9
UPDATE
following Pinetwig's comment: a/b is not a valid identifier name; it is (the return value of) an operator.
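Once the names are valid identifiers, the divergence itself could be sketched as below. This assumes a regular Cartesian grid with constant spacings dx and dy and arrays indexed as [y, x]; the synthetic fields are made up for the check. A real latitude/longitude grid would additionally need the spherical metric factors, which this sketch ignores.

```python
import numpy as np

# Hypothetical regular grid: dx, dy and the fields below are made up for illustration
dx, dy = 0.5, 0.25
x = np.arange(0, 5, dx)
y = np.arange(0, 3, dy)
X, Y = np.meshgrid(x, y)        # arrays of shape (len(y), len(x))

qu = 3.0 * X                    # d(qu)/dx = 3 everywhere
qv = 2.0 * Y                    # d(qv)/dy = 2 everywhere

dqu_dx = np.gradient(qu, dx, axis=1)   # derivative along x (the second axis)
dqv_dy = np.gradient(qv, dy, axis=0)   # derivative along y (the first axis)
divg = dqu_dx + dqv_dy                 # a plain sum; np.sum is not needed

print(divg.min(), divg.max())
```

Requesting one axis at a time avoids having to remember the order in which np.gradient returns the derivatives for a 2-D array (axis 0 first, then axis 1).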
Try removing the [dx, dy].
[dqu_dx, dqu_dy] = np.gradient(qu)
[dqv_dx, dqv_dy] = np.gradient(qv)
Also, a point to note if you are recreating plots: np.gradient changed between NumPy 1.8.2 and 1.9. This had an effect when recreating MATLAB plots in Python, as the 1.8.2 behaviour was the MATLAB method. I am not sure how this relates to GrADS. Here is the docstring wording for both.
1.8.2
"The gradient is computed using central differences in the interior
and first differences at the boundaries. The returned gradient hence has
the same shape as the input array."
1.9
"The gradient is computed using second order accurate central differences in the interior and either first differences or second order accurate one-sides (forward or backwards) differences at the boundaries. The returned gradient hence has the same shape as the input array."
The gradient function for 1.8.2 is here.
def gradient(f, *varargs):
    """
    Return the gradient of an N-dimensional array.

    The gradient is computed using central differences in the interior
    and first differences at the boundaries. The returned gradient hence has
    the same shape as the input array.

    Parameters
    ----------
    f : array_like
        An N-dimensional array containing samples of a scalar function.
    `*varargs` : scalars
        0, 1, or N scalars specifying the sample distances in each direction,
        that is: `dx`, `dy`, `dz`, ... The default distance is 1.

    Returns
    -------
    gradient : ndarray
        N arrays of the same shape as `f` giving the derivative of `f` with
        respect to each dimension.

    Examples
    --------
    >>> x = np.array([1, 2, 4, 7, 11, 16], dtype=np.float)
    >>> np.gradient(x)
    array([ 1. , 1.5, 2.5, 3.5, 4.5, 5. ])
    >>> np.gradient(x, 2)
    array([ 0.5 , 0.75, 1.25, 1.75, 2.25, 2.5 ])
    >>> np.gradient(np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float))
    [array([[ 2., 2., -1.],
           [ 2., 2., -1.]]),
    array([[ 1. , 2.5, 4. ],
           [ 1. , 1. , 1. ]])]
    """
    f = np.asanyarray(f)
    N = len(f.shape) # number of dimensions
    n = len(varargs)
    if n == 0:
        dx = [1.0]*N
    elif n == 1:
        dx = [varargs[0]]*N
    elif n == N:
        dx = list(varargs)
    else:
        raise SyntaxError(
            "invalid number of arguments")

    # use central differences on interior and first differences on endpoints
    outvals = []

    # create slice objects --- initially all are [:, :, ..., :]
    slice1 = [slice(None)]*N
    slice2 = [slice(None)]*N
    slice3 = [slice(None)]*N

    otype = f.dtype.char
    if otype not in ['f', 'd', 'F', 'D', 'm', 'M']:
        otype = 'd'

    # Difference of datetime64 elements results in timedelta64
    if otype == 'M' :
        # Need to use the full dtype name because it contains unit information
        otype = f.dtype.name.replace('datetime', 'timedelta')
    elif otype == 'm' :
        # Needs to keep the specific units, can't be a general unit
        otype = f.dtype

    for axis in range(N):
        # select out appropriate parts for this dimension
        out = np.empty_like(f, dtype=otype)
        slice1[axis] = slice(1, -1)
        slice2[axis] = slice(2, None)
        slice3[axis] = slice(None, -2)
        # 1D equivalent -- out[1:-1] = (f[2:] - f[:-2])/2.0
        out[slice1] = (f[slice2] - f[slice3])/2.0
        slice1[axis] = 0
        slice2[axis] = 1
        slice3[axis] = 0
        # 1D equivalent -- out[0] = (f[1] - f[0])
        out[slice1] = (f[slice2] - f[slice3])
        slice1[axis] = -1
        slice2[axis] = -1
        slice3[axis] = -2
        # 1D equivalent -- out[-1] = (f[-1] - f[-2])
        out[slice1] = (f[slice2] - f[slice3])

        # divide by step size
        outvals.append(out / dx[axis])

        # reset the slice object in this dimension to ":"
        slice1[axis] = slice(None)
        slice2[axis] = slice(None)
        slice3[axis] = slice(None)

    if N == 1:
        return outvals[0]
    else:
        return outvals
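In current NumPy releases the boundary treatment is, as far as I know, selected with the edge_order argument of np.gradient (default 1, i.e. first differences at the ends). A small sketch on a quadratic, where the exact derivative is known, shows that the two settings differ only at the end points:

```python
import numpy as np

x = np.arange(5.0)
f = x**2                                # exact derivative is 2*x

first = np.gradient(f)                  # first differences at the two ends
second = np.gradient(f, edge_order=2)   # second-order one-sided differences at the ends

print(first)
print(second)
```

The interior values are identical, so any mismatch against plots made with other software should show up only along the edges of the domain.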
If your grid is Gaussian and the wind variables in the file are named "u" and "v", you can also calculate the divergence directly using cdo:
cdo uv2dv in.nc out.nc
See https://code.mpimet.mpg.de/projects/cdo/embedded/index.html#x1-6850002.13.2 for more details.