Difference of spline interpolation in IDL and Python

I wrote IDL code:
zz= [ 0, 5, 10, 15, 30, 50, 90, 100, 500]
uz= [ 20, 20, 20, 30, 60, 90, 30, -200, -200]*(-1.)
zp= findgen(120)*500+500
up= spline((zz-10.),uz,(zp/1000.0))
print, up
and IDL gave me values of the up array ranging from about -20 to 500.
I did the same in Python:
import numpy as npy
zz = npy.array([ 0, 5, 10, 15, 30, 50, 90, 100, 500])
uz = npy.array([ 20, 20, 20, 30, 60, 90, 30, -200, -200])*(-1.)
zp = npy.arange(0,120)*500+500
from scipy.interpolate import interp1d
cubic_interp_u = interp1d(zz-10., uz, kind='cubic')
up = cubic_interp_u(zp/1000)
print(up)
and it gave me up with values from about -20 to -160. Any idea? Thanks in advance!

Actually, I don't see a problem. I'm using UnivariateSpline here instead of interp1d and cubic_interp_u, but the underlying routines are essentially the same, as far as I can tell:
import numpy as npy
import matplotlib.pyplot as pl
from scipy.interpolate import UnivariateSpline
zz = npy.array([ 0, 5, 10, 15, 30, 50, 90, 100, 500])
uz = npy.array([ 20, 20, 20, 30, 60, 90, 30, -200, -200])*(-1.)
zp = npy.arange(0,120)*500+500
pl.plot(zz, uz, 'ro')
pl.plot(zp/100, UnivariateSpline(zz, uz, s=1, k=3)(zp/100), 'k-.')
pl.plot(zp/1000, UnivariateSpline(zz, uz, s=1, k=3)(zp/1000), 'b-')
The only problem I see is that you limited the interpolation range by using zp/1000. Using zp/100, I get lots of values outside that [-160, -20] range, which you can also see on the graph: compare the dot-dashed line (zp/100) with the blue line (zp/1000):
It looks like scipy is doing a fine job.
By the way, if you want to (spline-)fit such outlying values, you may want to consider working in log-log space instead, or roughly normalizing your data (log-log space kind of does that). Most fitting problems work best if the values are of the same order of magnitude.
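A minimal sketch of the normalization idea (my own example, not from the original answer): rescale both axes to order one before fitting, then undo the scaling on the evaluated values.

import numpy as np
from scipy.interpolate import UnivariateSpline

zz = np.array([0, 5, 10, 15, 30, 50, 90, 100, 500], dtype=float)
uz = np.array([20, 20, 20, 30, 60, 90, 30, -200, -200]) * (-1.0)

# Illustrative scale factors: bring both axes to order one.
z_scale = zz.max()
u_scale = np.abs(uz).max()
spl = UnivariateSpline(zz / z_scale, uz / u_scale, s=1, k=3)

zp = np.arange(0, 120) * 500 + 500
up = spl(zp / 1000.0 / z_scale) * u_scale
print(up.min(), up.max())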

Related

numpy interpolation with period

Can someone explain to me the code that is in the documentation, specifically this:
Interpolation with periodic x-coordinates:
x = [-180, -170, -185, 185, -10, -5, 0, 365]
xp = [190, -190, 350, -350]
fp = [5, 10, 3, 4]
np.interp(x, xp, fp, period=360)
array([7.5 , 5.  , 8.75, 6.25, 3.  , 3.25, 3.5 , 3.75])
I did a trial like this
import matplotlib.pyplot as plt
import numpy as np
x = [-180, -170, -185, 185, -10, -5, 0, 365]
xp = [190, -190, 350, -350]
fp = [5, 10, 3, 4]
y=np.interp(x, xp, fp, period=360)
print(x)
print(y)
plt.grid()
plt.plot(xp, fp)
#plt.scatter(x,y,marker="o",color="green")
plt.plot(x,y,'o')
plt.show()
and it shows like this
How the orange points can be considered "interpolations" is beyond me. They are not even on the curve.
EDIT: Thanks to Warren Weckesser for the detailed explanation!
A plot to see it better
The numbers used in the example that demonstrates the use of period in the interp docstring can be a bit difficult to interpret in a plot. Here's what is happening...
The period is 360, and the given "known" points are
xp = [190, -190, 350, -350]
fp = [ 5, 10, 3, 4]
Note that the values in xp span an interval longer than 360. Let's consider the interval [0, 360) to be the fundamental domain of the interpolator. If we map the given points to the fundamental domain, they are:
xp1 = [190, 170, 350, 10]
fp1 = [ 5, 10, 3, 4]
Now for a periodic interpolator, we can imagine this data being extended periodically in the positive and negative directions, e.g.
xp_ext = [..., 190-360, 170-360, 350-360, 10-360, 190, 170, 350, 10, 190+360, 170+360, 350+360, 10+360, ...]
fp_ext = [..., 5, 10, 3, 4, 5, 10, 3, 4, 5, 10, 3, 4, ...]
It is this extended data that interp is interpolating.
Here's a script that replaces the array x from the example with a dense set of points. With this dense set, the plot of y = np.interp(x, xp, fp, period=360) should make clearer what is going on:
import numpy as np
import matplotlib.pyplot as plt

xp = [190, -190, 350, -350]
fp = [5, 10, 3, 4]
x = np.linspace(-360, 720, 1200)
y = np.interp(x, xp, fp, period=360)
plt.plot(x, y, '--')
plt.plot(xp, fp, 'ko')
plt.grid(True)
plt.show()
Each "corner" in the plot is at a point in the periodically extended version of (xp, fp).

scipy.stats.binned_statistic_dd() bin numbering has lots of extra bins

I'm struggling to deal with a scipy.stats.binned_statistic_dd() result. I have an array of positions and another array of ids that I'm binning in 3 directions. I'm providing a list of the bin edges as input rather than a number of bins in each direction coupled with a range option. I have 3 bins in x, 2 in y, and 3 in z, or 18 bins.
However, when I check the binnumbers listed, they are all in a range greater than 20. How do I get the bin numbers to reflect the number of bins provided and get rid of all the extra bins?
I've tried to follow what was suggested in this post (Output in scipy.stats.binned_statistic_dd()) which deals with something similar, but I can't understand how to apply this to my case. As usual, the documentation is as cryptic as ever.
Any help on getting my binnumbers between 1 and 18 in this example would be greatly appreciated!
import numpy as np
from scipy import stats

pos = np.array([[-0.02042167, -0.0223282 , 0.00123734],
[-0.0420364 , 0.01196078, 0.00694259],
[-0.09625651, -0.00311446, 0.06125461],
[-0.07693234, -0.02749618, 0.03617278],
[-0.07578646, 0.01199925, 0.02991888],
[-0.03258293, -0.00371765, 0.04245596],
[-0.06765955, 0.02798434, 0.07075846],
[-0.02431445, 0.02774102, 0.06719837],
[ 0.02798265, -0.01096739, -0.01658691],
[-0.00584252, 0.02043389, -0.00827088],
[ 0.00623063, -0.02642285, 0.03232817],
[ 0.00884222, 0.01498996, 0.02912483],
[ 0.07189474, -0.01541584, 0.01916607],
[ 0.07239394, 0.0059483 , 0.0740187 ],
[-0.08519159, -0.02894125, 0.10923724],
[-0.10803509, 0.01365444, 0.09555333],
[-0.0442866 , -0.00845725, 0.10361843],
[-0.04246779, 0.00396127, 0.1418258 ],
[-0.08975861, 0.02999023, 0.12713186],
[ 0.01772454, -0.0020405 , 0.08824418]])
ids = np.array([16, 9, 6, 19, 1, 4, 10, 5, 18, 11, 2, 12, 13, 8, 3, 17, 14,
15, 20, 7])
xbinEdges = np.array([-0.15298488, -0.05108961, 0.05080566, 0.15270093])
ybinEdges = np.array([-0.051, 0. , 0.051])
zbinEdges = np.array([-0.053, 0.049, 0.151, 0.253])
ret = stats.binned_statistic_dd(pos, ids, bins=[xbinEdges, ybinEdges, zbinEdges],
statistic='count', expand_binnumbers=False)
bincounts = ret.statistic
binnumber = ret.binnumber.T
>>> binnumber = array([46, 51, 27, 26, 31, 46, 32, 52, 46, 51, 46, 51, 66, 72, 27, 32, 47,
52, 32, 47], dtype=int64)
ranges = [[-0.15298488071, 0.15270092971],
[-0.051000000000000004, 0.051000000000000004],
[-0.0530000000000001, 0.25300000000000006]]
ret3 = stats.binned_statistic_dd(pos, ids, bins=(3,2,3), statistic='count', expand_binnumbers=False, range=ranges)
bincounts = ret3.statistic
binnumber = ret3.binnumber.T
>>> binnumber = array([46, 51, 27, 26, 31, 46, 32, 52, 46, 51, 46, 51, 66, 72, 27, 32, 47,
52, 32, 47], dtype=int64)
Ok, after several days of background thinking and a quick scour through the binned_statistic_dd() source code, I think I've come to the correct answer, and it's pretty simple.
It seems binned_statistic_dd() adds outlier bins on either side of each dimension in the binning phase and then removes these when returning the histogram results, but leaves the bin numbers untouched (I think this is in case you want to reuse the result for further stats outputs). That would explain why the raw binnumbers here index a (3+2) x (2+2) x (3+2) = 100-bin grid rather than the 18 inner bins.
So it seems that if you export the expanded binnumbers (expand_binnumbers=True) and then subtract 1 from each binnumber to re-adjust the bin indices, you can calculate the "correct" bin ids.
numX, numY, numZ = 3, 2, 3   # number of bins in each direction
ret2 = stats.binned_statistic_dd(pos, ids, bins=[xbinEdges, ybinEdges, zbinEdges],
                                 statistic='count', expand_binnumbers=True)
bincounts2 = ret2.statistic
binnumber2 = ret2.binnumber
indxnum2 = binnumber2 - 1
corrected_bin_ids = np.ravel_multi_index(indxnum2, (numX, numY, numZ))
Quick and simple in the end!
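As a quick sanity check (my addition, not part of the original answer): with 3 x 2 x 3 bins the corrected ids should fall in range(18), and counting them should reproduce the statistic array.

import numpy as np
from scipy import stats

# Assumes pos, ids and the *binEdges arrays defined in the question.
numX, numY, numZ = 3, 2, 3
ret2 = stats.binned_statistic_dd(pos, ids, bins=[xbinEdges, ybinEdges, zbinEdges],
                                 statistic='count', expand_binnumbers=True)
corrected = np.ravel_multi_index(ret2.binnumber - 1, (numX, numY, numZ))
print(corrected.min(), corrected.max())                        # between 0 and 17
counts = np.bincount(corrected, minlength=numX * numY * numZ)
print(np.array_equal(counts.reshape(numX, numY, numZ), ret2.statistic))   # True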

How to define t as a time variable

import numpy as np
import matplotlib.pyplot as plt
from numpy import cos, pi, sin

fm = 200
fc = 500
t = np.arange(0, 5, 0.5)
y1 = cos(2*pi*fc*t + (fc - fm)/fm*sin(2*pi*fm*t))
It's just a line with value one in the figure. How can I define this t as a time variable?
Thank you
Something like this:
(np.arange(0, 5, 0.5) * 60).astype('timedelta64[s]')
It gives you:
array([ 0, 30, 60, 90, 120, 150, 180, 210, 240, 270], dtype='timedelta64[s]')
You can choose the units: 's' for seconds, 'm' for minutes, etc.
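If the goal is a time axis for a plot, a minimal sketch (my addition; the start timestamp is made up) is to add these offsets to a datetime64 start time and plot against that:

import numpy as np
import matplotlib.pyplot as plt

t = (np.arange(0, 5, 0.5) * 60).astype('timedelta64[s]')
start = np.datetime64('2024-01-01T00:00:00')   # arbitrary start time
times = start + t                              # datetime64 time axis
y = np.sin(np.arange(len(t)))                  # placeholder signal
plt.plot(times, y, 'o-')
plt.xlabel('time')
plt.show()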

Blockproc like function for Python image processing

edit: it's an image, so the suggested question (How can I efficiently process a numpy array in blocks similar to Matlab's blkproc (blockproc) function) isn't really working for me
I have the following MATLAB code:
fun = @(block_struct) ...
      std2(block_struct.data) * ones(size(block_struct.data));
B = blockproc(im2double(Icorrected), [4 4], fun);
I want to remake my code, but this time in Python. I have installed scikit-image and I'm trying to work around it like this:
b = np.std(a, axis = 2)
The problem, of course, is that I'm not applying the std over blocks, like the code above does.
How can I do something like this? Start a loop and call the function for each X*X block? Then I wouldn't keep the size it had.
Is there another more efficient way?
If there is no overlap in the windows you can reshape the data to suit your needs:
Find the mean of 3x3 windows of a 9x9 array.
import numpy as np
>>> a = np.arange(81).reshape(9, 9)
>>> a
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26],
[27, 28, 29, 30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53],
[54, 55, 56, 57, 58, 59, 60, 61, 62],
[63, 64, 65, 66, 67, 68, 69, 70, 71],
[72, 73, 74, 75, 76, 77, 78, 79, 80]])
Find the new shape
>>> window_size = (3,3)
>>> tuple(np.array(a.shape) // window_size) + window_size
(3, 3, 3, 3)
>>> b = a.reshape(3,3,3,3)
Find the mean along the first and third axes.
>>> b.mean(axis = (1,3))
array([[ 10., 13., 16.],
[ 37., 40., 43.],
[ 64., 67., 70.]])
>>>
2x2 windows of a 4x4 array:
>>> a = np.arange(16).reshape((4,4))
>>> window_size = (2,2)
>>> tuple(np.array(a.shape) // window_size) + window_size
(2, 2, 2, 2)
>>> b = a.reshape(2,2,2,2)
>>> b.mean(axis = (1,3))
array([[ 2.5, 4.5],
[ 10.5, 12.5]])
>>>
It won't work if the window size doesn't divide the array size evenly. In that case, or if you just want overlapping windows, numpy.lib.stride_tricks.as_strided is the way to go; a generic N-D function can be found at Efficient Overlapping Windows with Numpy.
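In newer NumPy versions (1.20+), numpy.lib.stride_tricks.sliding_window_view wraps the same strides trick; a quick sketch for overlapping windows (my addition, not from the original answer):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(81).reshape(9, 9)
windows = sliding_window_view(a, (3, 3))   # all overlapping 3x3 windows
print(windows.shape)                       # (7, 7, 3, 3)
print(windows.mean(axis=(2, 3)).shape)     # (7, 7): mean of each window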
Another option for 2D arrays is sklearn.feature_extraction.image.extract_patches_2d, and for ndarrays, sklearn.feature_extraction.image.extract_patches. Each manipulates the array's strides to produce the patches/windows.
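For example, a minimal sketch of the sklearn route (my addition):

import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d

a = np.arange(16).reshape(4, 4)
patches = extract_patches_2d(a, (2, 2))    # every overlapping 2x2 patch
print(patches.shape)                       # (9, 2, 2)
print(patches.mean(axis=(1, 2)))           # one statistic per patch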
I did the following:

from skimage import io
import numpy as np

io.use_plugin('pil', 'imread')
a = io.imread(r'C:\Users\Dimitrios\Desktop\polimesa\arizona.jpg')
B = np.zeros((len(a) // 2 + 1, len(a[0]) // 2 + 1))
x = []
for i in range(0, len(a), 2):
    for j in range(0, len(a[0]), 2):
        x.append(a[i][j])
        if i + 1 < len(a):
            x.append(a[i+1][j])
        if j + 1 < len(a[0]):
            x.append(a[i][j+1])
        if i + 1 < len(a) and j + 1 < len(a[0]):
            x.append(a[i+1][j+1])
        B[i // 2][j // 2] = np.std(x)
        x[:] = []
and I think it's correct: iterating over the image in steps of 2, taking each neighbouring pixel, adding them to a list and calculating the std.
edit: later edited for 4x4 blocks.
We can implement blockproc() in Python the following way:

def blockproc(im, block_sz, func):
    h, w = im.shape
    m, n = block_sz
    for x in range(0, h, m):
        for y in range(0, w, n):
            block = im[x:x+m, y:y+n]
            block[:,:] = func(block)
    return im
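For the original MATLAB call, a rough sketch of the equivalent with this helper (my addition; it assumes the image is already a float array, i.e. the result of im2double, and Icorrected here is just a placeholder):

import numpy as np
Icorrected = np.random.rand(64, 64)   # placeholder float image
B = blockproc(Icorrected.copy(), (4, 4),
              lambda block: np.std(block) * np.ones_like(block))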
Now, let's apply it to implement contrast enhancement with local histogram equalization, with the low-contrast moon image (of size 512x512) as input, using 64x64 blocks:
from skimage import data, exposure
img = data.moon()
img = img / img.max()
m, n = 64, 64
img_eq = blockproc(img.copy(), (m, n), exposure.equalize_hist)
Display the input and output images:
Note that the function does in-place modification to the image, hence a copy of the input image is passed instead.

Variable amount of dimensions in slice

I have a multidimensional array called resultsten, with the following shape
print np.shape(resultsten)
(3, 3, 6, 10, 1, 9)
On some occasions, I use a part of this array in a program called cleanup, which then further tears this array apart into x, y, and z arrays:
x,y,z = cleanup(resultsten[0,:,:,:,:,:])

def cleanup(resultsmat):
    x = resultsmat[:,:,:,:,2]
    y = resultsmat[:,:,:,:,1]
    z = resultsmat[:,:,:,:,4]
    return x,y,z
However, it might also occur that I do not want to put the entire matrix of resultsten in my program cleanup, thus:
x,y,z = cleanup(resultsten[0,0,:,:,:,:])
This, of course, gives an error, as the indices given to cleanup do not match the indices expected.
I was wondering if it is possible to have a variable amount of dimensions included in your slice.
I would like to know a command that takes all the entries for every dimension, up until the last dimension, where it only takes one index.
I've seen that it is possible to do this for all dimensions except the first, e.g.
resultsten[1,:,:,:,:,:]
gives the same result as:
resultsten[1,:]
I tried this:
resultsten[:,1]
but it does not give the required result; Python interprets it like this:
resultsten[:,1,:,:,:,:]
MWE:
import numpy as np

def cleanup(resultsmat):
    x = resultsmat[:,:,:,0,2]
    y = resultsmat[:,:,:,0,1]
    z = resultsmat[:,:,:,0,4]
    return x,y,z

resultsten = np.arange(3*3*6*10*1*9).reshape(3,3,6,10,1,9)
x0,y0,z0 = cleanup(resultsten[0,:,:,:,:,:])  # works
x0,y0,z0 = cleanup(resultsten[0,0,:,:,:,:])  # does not work
I would use a list of slice objects:
import numpy as np
A = np.arange(2*3*4*5).reshape(2,3,4,5)
# [:] <-> slice(None, None, None)
sliceList = [slice(None, None, None)] * (len(A.shape) - 1)
a,b,c,d,e = [A[tuple(sliceList + [i])] for i in range(A.shape[-1])]
Output:
>>> A[:,:,:,0]
array([[[ 0, 5, 10, 15],
[ 20, 25, 30, 35],
[ 40, 45, 50, 55]],
[[ 60, 65, 70, 75],
[ 80, 85, 90, 95],
[100, 105, 110, 115]]])
>>> a
array([[[ 0, 5, 10, 15],
[ 20, 25, 30, 35],
[ 40, 45, 50, 55]],
[[ 60, 65, 70, 75],
[ 80, 85, 90, 95],
[100, 105, 110, 115]]])
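As a side note (my addition, not part of the original answer), NumPy's Ellipsis covers exactly this case: A[..., i] keeps every leading dimension whole and indexes only the last one, however many dimensions A has.

import numpy as np
A = np.arange(2*3*4*5).reshape(2, 3, 4, 5)
a, b, c, d, e = [A[..., i] for i in range(A.shape[-1])]
print(np.array_equal(a, A[:, :, :, 0]))   # True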
