scipy splrep() with weights not fitting the given curve - python

Using scipy's splrep I can easily fit a test sinewave:
import numpy as np
from scipy.interpolate import splrep, splev
import matplotlib.pyplot as plt
plt.style.use("ggplot")
# Generate test sinewave
x = np.arange(0, 20, .1)
y = np.sin(x)
# Interpolate
tck = splrep(x, y)
x_spl = x + 0.05 # Just to show it works
y_spl = splev(x_spl, tck)
plt.plot(x_spl, y_spl)
The splrep documentation states that the default value for the weight parameter w is np.ones(len(x)). However, passing that value explicitly results in a totally different plot:
tck = splrep(x, y, w=np.ones(len(x_spl)))
y_spl = splev(x_spl, tck)
plt.plot(x_spl, y_spl)
The documentation also states that the smoothing condition s is different when a weight array is given, but even when setting s=len(x_spl) - np.sqrt(2*len(x_spl)) (the default value without a weight array) the result does not strictly correspond to the original curve, as shown in the plot.
What do I need to change in the code listed above in order to make the interpolation with weight array (as listed above) output the same result as the interpolation without the weights?
I have tested this with scipy 0.17.0. Gist with a test IPython notebook

You only have to change one line of your code to get the identical output:
tck = splrep(x, y, w=np.ones(len(x_spl)))
should become
tck = splrep(x, y, w=np.ones(len(x_spl)), s=0)
So the only difference is that you have to specify s explicitly instead of relying on its default.
When you look at the source code of splrep you will see why that is necessary:
if w is None:
    w = ones(m, float)
    if s is None:
        s = 0.0
else:
    w = atleast_1d(w)
    if s is None:
        s = m - sqrt(2*m)
which means that if neither weights nor s are provided, s is set to 0, whereas if you provide weights but no s, then s = m - sqrt(2*m), where m = len(x).
So, in your example above you compare outputs with the same weights but with different s (which are 0 and m - sqrt(2*m), respectively).
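To confirm, here is a minimal sketch (my addition) comparing the two calls on the sine data from the question; with s=0 supplied alongside the weights, the evaluated splines agree:
import numpy as np
from scipy.interpolate import splrep, splev
x = np.arange(0, 20, .1)
y = np.sin(x)
tck_default = splrep(x, y)                           # w=None, so s defaults to 0
tck_weighted = splrep(x, y, w=np.ones(len(x)), s=0)  # explicit weights and explicit s=0
print(np.allclose(splev(x, tck_default), splev(x, tck_weighted)))  # True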

Related

Numpy Polynomial class not printing correct coefficients from fit

As recommended in the numpy documentation, I am trying to move from the old polyfit and polyval functions to the Polynomial class. Below is a minimal example that demonstrates my confusion with this.
import numpy as np
x = np.array([-2.3, -2.8, -2.9, -3.1])
y = np.array([2.4, 3.1, 3.3, 3.5])
poly = np.polynomial.Polynomial.fit(x, y, 1)
print(poly)
print(poly.coef)
print(poly(0))
print(poly(1) - poly(0))
which gives the output, running numpy 1.23.3,
2.9697841726618703 - 0.5611510791366912·x¹
[ 2.96978417 -0.56115108]
-0.8179856115107942
-1.4028776978417268
Judging from the first two lines, the fitted polynomial is something like 2.969 - 0.561x. But evaluating it at x=0 gives -0.817, and evaluating the slope by f(1) - f(0) gives -1.40. The latter is what I would expect given the points that I am fitting, but what's going on with the first two lines of output?
If you check out the documentation you'll see that the polynomial object has two more attributes: poly.domain and poly.window.
To get better numerical properties, fit() renormalizes the range of the independent variable to [-1, 1] (by default at least; this is the poly.window, which you can set yourself), so the coefficients you get from poly.coef are valid on that renormalized domain. To get back the "original" coefficients you have to undo that normalization, as done in the following snippet. You can also go in the other direction and normalize the x-values yourself, which is also included in the snippet:
import numpy as np
import matplotlib.pyplot as plt

# Continuing from the question's snippet: x, y and poly are already defined.
# The next three lines restore context missing from the original snippet.
d = poly.domain              # [x.min(), x.max()] of the fitted data
c = poly.coef                # coefficients valid on the window [-1, 1]
t = np.linspace(-1, 1, 100)  # spans the normalized window

plt.plot(t, c[0] + c[1]*t)  # line in the normalized domain
# x_normalized = (x - d[0])/(d[1] - d[0]) * 2 - 1  # points in the normalized domain
a = 2/(d[1] - d[0])
b = -2*d[0]/(d[1] - d[0]) - 1
x_normalized = a*x + b
print('original coefficients')
print(a*c[1], c[1]*b + c[0])  # slope and intercept in the original x
plt.plot(x_normalized, y, 'o')
plt.show()
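As a side note (my addition, not part of the original answer): numpy can also do this conversion for you with Polynomial.convert(), which maps the fit back to the default domain so that the coefficients apply to the unscaled x:
unscaled = poly.convert()
print(unscaled.coef)  # [intercept, slope] in the original x; compare with the values printed above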

How to maintain the size of image while interpolating in python?

I have interpolated my NIfTI image (MRI data) with linear interpolation, but the problem is that the field of view of the output changes (the input is 400*400 but the output is 4000*3000). I am searching for a way to keep the dimensions while interpolating. Do you have a solution?
Thank you
import numpy as np
import nibabel as nib
from scipy.interpolate import interp2d

t1_ = "name.nii.gz"
img_t1_ = nib.load(t1_)
img_t1_ = np.double(img_t1_.get_fdata())
slice_ = 300
img_t1_ = np.rot90(img_t1_[:, :, slice_, 0])
x = np.linspace(0, img_t1_.shape[1], img_t1_.shape[1])
y = np.linspace(0, img_t1_.shape[0], img_t1_.shape[0])
X, Y = np.meshgrid(x, y)
Z = img_t1_
x2 = np.linspace(0, img_t1_.shape[1], 9*img_t1_.shape[1])
y2 = np.linspace(0, img_t1_.shape[0], 9*img_t1_.shape[0])
print(x2.shape[0], y2.shape[0])
tmp_z_ = np.zeros((x2.shape[0], y2.shape[0]))
f_linear = interp2d(x, y, Z, kind='linear')
Z2 = f_linear(x2, y2)
I assume that by keeping the dimensions you mean keeping the aspect ratio (i.e. input: 400x400 -> output: 4000x4000). Also, I'm not exactly sure what you are trying to achieve with your interpolation. However, depending on the use case, the zoom function in scipy's ndimage module could do the job. If you choose the parameter order=1, this corresponds to linear interpolation.
I'm not familiar with the packages you use to import your images. The following example assumes that the image data has been loaded into img_data, a two-dimensional float array. Just adapt it to your use case.
from scipy.ndimage import zoom
interpolated_img_data = zoom(img_data, 10, order=1)
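A quick check (a sketch with a random stand-in array, since I'm not familiar with your imaging packages) that zoom keeps the aspect ratio of a square input:
import numpy as np
from scipy.ndimage import zoom
img_data = np.random.rand(400, 400)     # stand-in for the 400x400 MRI slice
out = zoom(img_data, 10, order=1)       # order=1 means linear interpolation
print(img_data.shape, '->', out.shape)  # (400, 400) -> (4000, 4000)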

Using FFT for 3D array representation of 2D field

I need to obtain the fourier transform of a complex field. I'm using python.
My input is a 2D snapshot of the electric field in the xy-plane.
I currently have a 3D array F[x][y][z] where F[x][y][0] contains the real component and F[x][y][1] contains the imaginary component of the field.
My current code is very simple and does this:
result=np.fft.fftn(F)
result=np.fft.fftshift(result)
I have the following questions:
1) Does this correctly compute the fourier transform of the field, or should the field be entered as a 2D matrix with each element containing both the real and imaginary component instead?
2) I entered the imaginary component values of the field using the real factor only (i.e. if the complex value is 6i I entered 6). Is this correct, or should this be entered as a complex value instead (i.e. entered as '6j')?
3) As this is technically a 2D input field, should I use np.fft.fft2 instead? Doing this means the output is not centered in the middle.
4) The output does not look like what I'd expect the fourier transform of F to look like, and I'm unsure what I'm doing wrong.
Full example code:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

x, y = np.meshgrid(np.linspace(-1, 1, 100), np.linspace(-1, 1, 100))
d = np.sqrt(x*x + y*y)
sigma, mu = .35, 0.0
g1 = np.exp(-((d - mu)**2 / (2.0 * sigma**2)))

F = np.empty(shape=(300, 300, 2), dtype=complex)
for x in range(0, 300):
    for y in range(0, 300):
        if y < 50 or x < 100 or y > 249 or x > 199:
            F[x][y][0] = g1[0][0]
            F[x][y][1] = 0j
        elif y < 150:
            F[x][y][0] = g1[x-100][y-50]
            F[x][y][1] = 0j
        else:
            F[x][y][0] = g1[x-100][y-150]
            F[x][y][1] = 0j

F_2D = np.empty(shape=(300, 300))
for x in range(0, 300):
    for y in range(0, 300):
        F_2D[x][y] = np.absolute(F[x][y][0]) + np.absolute(F[x][y][1])
plt.imshow(F_2D)
plt.show()

result = np.fft.fftn(F)
result = np.fft.fftshift(result)
result_2D = np.empty(shape=(300, 300))
for x in range(0, 300):
    for y in range(0, 300):
        result_2D[x][y] = np.absolute(result[x][y][0]) + np.absolute(result[x][y][1])
plt.imshow(result_2D)
plt.show()
plotting F gives this:
With np.fft.fftn, the image shown at the end is:
And with np.fft.fft2:
Neither of these look like what I would expect the fourier transform of F to look like.
I add another answer here, suited to the added code.
The answer is still np.fft.fft2(). Here's an example. I modified the code slightly. To verify that we need fft2, I discarded one of the blobs; then we know that a single Gaussian blob should transform into a Gaussian blob (with a certain phase, which is not shown when plotting the absolute value). I also decreased the standard deviation so that the frequency response widens a little.
Code:
import numpy as np
import matplotlib.pyplot as plt

x, y = np.meshgrid(np.linspace(-1, 1, 100), np.linspace(-1, 1, 100))
d = np.sqrt(x**2 + y**2)
sigma, mu = .1, 0.0
g1 = np.exp(-((d - mu)**2 / (2.0 * sigma**2)))

N = 300
positions = [[150, 100]]  # , [150, 200] ]
sz2 = [int(n/2) for n in g1.shape]  # half-size of the blob
F_2D = np.zeros([N, N], dtype=complex)  # complex, so the imaginary part is preserved
for x0, y0 in positions:
    F_2D[x0-sz2[0]:x0+sz2[0], y0-sz2[1]:y0+sz2[1]] = g1 + 1j*0.

result = np.fft.fftshift(np.fft.fft2(F_2D))
plt.subplot(211); plt.imshow(np.absolute(F_2D))
plt.subplot(212); plt.imshow(np.absolute(result))
plt.title(r'$\sigma$=.1')
plt.show()
Result:
To get back to the original problem, we need only change
positions = [ [150,100] , [150,200] ]
and sigma=.35 instead of sigma=.1.
You should use complex numpy variables (by using 1j) and use fft2. For example:
N = 16
x0 = np.random.randn(N,N,2)
x = x0[:,:,0] + 1j*x0[:,:,1]
X = np.fft.fft2(x)
Using fftn on x0 will do a 3D FFT, and using fft will do vector-wise 1D FFT.
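A quick numerical illustration of that difference (a sketch on random data):
import numpy as np
N = 16
x0 = np.random.randn(N, N, 2)
x = x0[:, :, 0] + 1j*x0[:, :, 1]
X2 = np.fft.fft2(x)    # 2D FFT of the complex field: shape (16, 16)
X3 = np.fft.fftn(x0)   # 3D FFT over all three axes of the stacked array: shape (16, 16, 2)
print(X2.shape, X3.shape)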

Fitting a single gaussian to 'noisy' data yields a poor fit in some cases

I have some noisy data that can contain anywhere from 0 to n gaussian shapes. I am trying to implement an algorithm that takes the highest data points and fits a gaussian to them, as per the following scheme:
New attempt, steps:
1. fit a spline through all data points
2. get the first derivative of the spline function
3. get both data points (left/right) where f'(x) = 0 around the data point with maximum intensity
4. fit a gaussian through the data points returned from step 3
4a. plot the gaussian (stopping at the baseline) in the PDF
5. calculate the area under the gaussian curve
6. calculate the area under the raw data points
7. calculate the percentage of the total area explained by the gaussian area
I have implemented this concept using the following code (minimal working example):
#! /usr/bin/env python
from scipy.interpolate import InterpolatedUnivariateSpline
from scipy.optimize import curve_fit
from scipy.signal import argrelextrema
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
data = [(9.60380153195,187214),(9.62028167623,181023),(9.63676350256,174588),(9.65324602212,169389),(9.66972824591,166921),(9.68621215187,167597),(9.70269675106,170838),(9.71918105436,175816),(9.73566703995,181552),(9.75215371878,186978),(9.76864010158,191718),(9.78512816681,194473),(9.80161692526,194169),(9.81810538757,191203),(9.83459553243,186603),(9.85108637051,180273),(9.86757691233,171996),(9.88406913682,163653),(9.90056205454,156032),(9.91705467586,149928),(9.93354897998,145410),(9.95004397733,141818),(9.96653867816,139042),(9.98303506191,137546),(9.99953213889,138724)]
data2 = [(9.60476933166,163571),(9.62125990879,156662),(9.63775225872,150535),(9.65424539203,146960),(9.67073831905,146794),(9.68723301904,149326),(9.70372850238,152616),(9.72022377931,155420),(9.73672082933,156151),(9.75321866271,154633),(9.76971628954,151549),(9.78621568961,148298),(9.80271587303,146333),(9.81921584976,146734),(9.83571759987,150351),(9.85222013334,156612),(9.86872245996,164192),(9.88522656011,171199),(9.90173144362,175697),(9.91823612015,176867),(9.93474257034,175029),(9.95124980389,171762),(9.96775683032,168449),(9.98426563055,165026)]
def gaussFunction(x, *p):
    """Gaussian with amplitude A, mean mu and standard deviation sigma."""
    A, mu, sigma = p
    return A*np.exp(-(x-mu)**2/(2.*sigma**2))

def quantify(data):
    """Fit a gaussian to the highest peak in data and plot the result."""
    backGround = 105000  # Normally this is dynamically determined but this value is fine for testing on the provided data
    time, intensity = zip(*data)
    x_data = np.array(time)
    y_data = np.array(intensity)
    newX = np.linspace(x_data[0], x_data[-1], int(2500*(x_data[-1]-x_data[0])))
    f = InterpolatedUnivariateSpline(x_data, y_data)
    fPrime = f.derivative()
    newY = f(newX)
    newPrimeY = fPrime(newX)
    maxm = argrelextrema(newPrimeY, np.greater)
    minm = argrelextrema(newPrimeY, np.less)
    breaks = maxm[0].tolist() + minm[0].tolist()
    maxPoint = 0
    for index, j in enumerate(breaks):
        try:
            if max(newY[breaks[index]:breaks[index+1]]) > maxPoint:
                maxPoint = max(newY[breaks[index]:breaks[index+1]])
                xData = newX[breaks[index]:breaks[index+1]]
                yData = [x - backGround for x in newY[breaks[index]:breaks[index+1]]]
        except:  # silently skip empty or out-of-range intervals
            pass
    # Gaussian fit on main points
    newGaussX = np.linspace(x_data[0], x_data[-1], int(2500*(x_data[-1]-x_data[0])))
    p0 = [np.max(yData), xData[np.argmax(yData)], 0.1]
    try:
        coeff, var_matrix = curve_fit(gaussFunction, xData, yData, p0)
        newGaussY = gaussFunction(newGaussX, *coeff)
        newGaussY = [x + backGround for x in newGaussY]
        # Generate plot for visual confirmation
        fig = plt.figure()
        ax = fig.add_subplot(111)
        plt.plot(x_data, y_data, 'b*')
        plt.plot((newX[0], newX[-1]), (backGround, backGround), 'red')
        plt.plot(newX, newY, color='blue', linestyle='dashed')
        plt.plot(newGaussX, newGaussY, color='green', linestyle='dashed')
        plt.title("Test")
        plt.xlabel("rt [m]")
        plt.ylabel("intensity [au]")
        plt.savefig("Test.pdf", bbox_inches="tight")
        plt.close(fig)
    except:  # silently skip fits that fail to converge
        pass

# Call the test
#quantify(data)
quantify(data2)
where normally the background (the red line in the pictures below) is dynamically determined, but for the sake of this example I have set it to a fixed number. The problem I have is that for some data it works really well:
Corresponding f'(x):
However, for some other data it fails horrendously:
Corresponding f'(x):
Therefore, I would like to hear suggestions or ideas on why this happens and potential approaches to fix it. The data shown in the pictures is included above (as data and data2), in case anyone wants to try it.
The error lay in the following bit:
breaks = maxm[0].tolist() + minm[0].tolist()
for index,j in enumerate(breaks):
The breaks list contains both the maxima and minima, but they are not sorted by time, which results in the list yielding the following data points for the poor fit: 9.78, 9.62 and 9.86.
The program would then examine data from 9.78 to 9.62 and 9.62 to 9.86, which meant that 9.62 to 9.86 contained the highest intensity data point yielding the fit that is shown in the second graph.
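A tiny sketch of that failure mode (the index values here are invented for illustration):
breaks = [445, 42, 640]            # maxima and minima indices, concatenated but unsorted
print(list(range(1000))[445:42])   # [] -- an empty, backwards slice; max() on it raises and the bare except silently skips it
print(sorted(breaks))              # [42, 445, 640]: consecutive entries now bound real intervals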
The fix was rather simple by just adding a sort on the breaks in between, as follows:
breaks = maxm[0].tolist() + minm[0].tolist()
breaks = sorted(breaks)
for index,j in enumerate(breaks):
The program then yielded a fit more closely resembling what I would expect:

To do a spline surface fit using scipy's RectBivariateSpline and SmoothBivariateSpline on noisy data

I am trying to do a 2D-surface fit on some imaging data. I attached an example of such data, which is basically a 1014 x 1014 array with a substantial amount of noise (Example_image). Some patches of this array are invalid data, which I masked and set to NaN values, as shown in yellow in the example image. As you can see in the image, there is a background gradient from left (brighter) to right (dimmer), which I am trying to remove. The gradient cannot be well fitted by a polynomial, hence my goal is to do a 2D-surface bivariate spline fit and subtract the gradient off.
I have tried a number of tasks in scipy, but most of them do not return ideal result.
To start with, I tried RectBivariateSpline (see Bivariate structured interpolation of large array with NaN values or mask), but since my image has NaNs in it, running RectBivariateSpline gives only an output of NaNs.
I also tried SmoothBivariateSpline, which is the irregular-grid version of the task. I omitted the pixels that have NaN values and converted the rest into 1D arrays as input. But it failed because the array size is too big. I then tried to chop my array up and run it on smaller chunks, but it gives the following error and quits with a segmentation fault, which I do not understand:
fitpack2.py:1044: UserWarning:
Error on entry, no approximation returned. The following conditions
must hold:
xb<=x[i]<=xe, yb<=y[i]<=ye, w[i]>0, i=0..m-1
If iopt==-1, then
xb
I then tried to first fill in the NaN patches in my image using griddata with linear interpolation. Since the patches are huge, the interpolation is not ideal, but at least it gave me an array without NaNs. I then used this array to run RectBivariateSpline again. But the output array is still all NaNs.
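A minimal sketch on synthetic data of what I suspect is happening (and this is an assumption): NaN patches touching the image border fall outside the convex hull of the valid pixels, where griddata's linear method cannot fill them:
import numpy as np
from scipy.interpolate import griddata
img = np.random.rand(50, 50)            # synthetic stand-in for the image
img[0:10, 0:10] = np.nan                # NaN patch touching the border
yy, xx = np.mgrid[0:50, 0:50]
good = ~np.isnan(img)
filled = griddata((xx[good], yy[good]), img[good], (xx, yy), method='linear')
print(np.isnan(filled).sum())           # > 0: NaNs outside the convex hull survive
hole = np.isnan(filled)
filled[hole] = griddata((xx[good], yy[good]), img[good],
                        (xx[hole], yy[hole]), method='nearest')
print(np.isnan(filled).sum())           # 0: 'nearest' has no convex-hull restriction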
I suspect that the noise in my image is screwing up the behaviour of both tasks, so I also tried to first run a Gaussian kernel on my image to smooth it, then filled in the NaN patches with griddata, then run RectBivariateSpline or SmoothBivariateSpline, but they still give me arrays with NaN values as output.
I am not sure that I understand the manual of both tasks correctly, so I attach the following script:
#!/usr/bin/python
import matplotlib
matplotlib.use('qt5agg')
#matplotlib.rc('font',**{'family':'sans-serif','sans-serif':['Helvetica']})
#matplotlib.rc('text.latex', preamble=r'\usepackage{cmbright}')
#matplotlib.rc('text.latex', preamble=r'\usepackage[scaled]{helvet} \renewcommand\familydefault{\sfdefault} \usepackage[T1]{fontenc}')
#matplotlib.rc('text', usetex=True)
import matplotlib.pyplot as plt
import numpy as np
import astropy.io.fits as pyfits
import scipy.interpolate as sp
from astropy.convolution import convolve
from astropy.convolution import Gaussian2DKernel
#------------------------------------------------------------
#Read in the arrays
hdulistorg = pyfits.open('icmj01jrq_flt.fits')
hdulistorg.info()
errarrorg = np.swapaxes(hdulistorg[1].data, 0,1)
hdulist = pyfits.open('jrq_sci_nan_deep.fits')
hdulist.info()
dataarrorg = np.swapaxes(hdulist[0].data, 0,1) #image array
errarrorg = np.swapaxes(hdulistorg[1].data, 0,1) #error array
#Flag some of the problematic values, turn NaNs into 0 for easier handling
dataarr = np.copy(dataarrorg)
w=np.isnan(dataarr)
ww=np.where(dataarr == 0)
www=np.where(dataarr > 100)
wwww=np.where(dataarr < 0)
errarr = 1.0 / (np.copy(errarrorg)+1e-5) # Try to use 1/error as the estimate for weight below
errarr[w] = 0
errarr[ww] = 0
errarr[www] = 0
errarr[wwww]=0
dataarr[w]= 0
dataarr[ww]= 0
dataarr[www]=0
dataarr[wwww]=0
#Make a gaussian kernel smoothed data
maskarr = np.copy(errarr) #For masking the nan regions so they dun get smoothed
maskarr[:]=0
maskarr[w]=1
maskarr[ww]=1
maskarr[www]=1
maskarr[wwww]=1
gauss = Gaussian2DKernel(stddev=5)
condataarr = convolve(dataarr,gauss,normalize_kernel=True,boundary='extend',mask=maskarr)
condataarr[w]=0
conerrarr = np.copy(errarr)
#Setting x,y arrays for the Spline functions
nx, ny = (1014,1014)
x = np.linspace(0, 1013, nx)
y = np.linspace(0, 1013, ny)
xv, yv = np.meshgrid(x, y)
#Make an 1D version of these 2D arrays
dataarrflat = np.ravel(condataarr[0:200,0:200]) #Try only a small chunk!
xvflat = np.ravel(xv[0:200,0:200])
yvflat = np.ravel(yv[0:200,0:200])
errarrflat = np.ravel(conerrarr[0:200,0:200])
notnanloc = np.where(dataarrflat != 0) #Not NaNs
#SmoothBivariateSpline!
rect_S_spline = sp.SmoothBivariateSpline(xvflat[notnanloc], yvflat[notnanloc], dataarrflat[notnanloc],w=errarrflat[notnanloc], kx=3, ky=3)
#Also try using grid data to fix the grid?
gddataarr = np.copy(condataarr)
gddataarrflat = np.ravel(gddataarr)
gdloc = np.where(gddataarrflat != 0) #Not NaNs
gdxvflat = np.ravel(xv)
gdyvflat = np.ravel(yv)
xyarr = np.c_[gdxvflat[gdloc],gdyvflat[gdloc]]
x_grid, y_grid = np.mgrid[0:1013:1014j,0:1013:1014j]
grid_z2 = sp.griddata(xyarr, gddataarrflat[gdloc], (x_grid, y_grid), method='linear')
plt.imshow(grid_z2.T)
#plt.show()
#RectBivariatSpline
rect_B_spline = sp.RectBivariateSpline(x, y, grid_z2.T)
#Result grid (same as input for now)
xnew = np.arange(0, 1013, 1)
ynew = np.arange(0, 1013, 1)
znewS = rect_S_spline(xnew, ynew)
znewB = rect_B_spline(xnew, ynew)
print('znewS', znewS)
print('znewB', znewB)
#Write FITS files
condataarr = np.swapaxes(condataarr, 0, 1)
hdu2 = pyfits.PrimaryHDU(condataarr)
hdulist2 = pyfits.HDUList([hdu2])
hdulist2.writeto('contest.fits',overwrite=True)
hdulist2.close()
hdu3 = pyfits.PrimaryHDU(znewS)
hdulist3 = pyfits.HDUList([hdu3])
hdulist3.writeto('Stest.fits',overwrite=True)
hdulist3.close()
I cannot exactly solve your problem, but I have some code that interfaces a Fortran interpolation routine with Python. You can call the routines directly from Python; no Fortran knowledge is needed.
You can find the code and a description of it at this github page
https://github.com/haakoan/inter
