Interpolating a 2D data grid in Python

I have a 2D grid of radioactive beta-decay rates. Each value corresponds to the rate at a specific pair of temperature and density (both on a logarithmic scale). What I would like to do, when I have a temperature and density pair (after taking their logarithms), is to look up the matching value in the table. I tried using scipy's interpolate interpn function, but I got a little confused; I would be grateful for help.
What I have so far:
import numpy as np

pointsx = np.array([7 + 0.2*i for i in range(0, 16)])  # temperature range
pointsy = np.array([i for i in range(0, 11)])          # rho_el range
data = np.loadtxt(filename)   # getting data from file
logT = np.log10(T)            # wanted temperature, logarithmic
logrho = np.log10(rho)        # wanted rho, logarithmic
The interpn function has the following arguments: points, values, xi, method='linear', bounds_error=True, fill_value=nan. I figure that points will be the pointsx and pointsy I have, values is obviously the data, and xi will be the (T, rho) pair I'm looking for. But I'm not sure what dimensions they should have. Does points have to be the same size as the data? Do I have to build an array of all the corresponding (T, rho) pairs, which would be the points argument, and then pass a single (T, rho) pair as xi?

When you aren't certain about how a function works, it's always a good idea to open up a REPL and test it yourself. In this case, the function works exactly as expected, given your understanding of the documentation.
>>> points = [[1, 2, 3, 4], [1, 2, 3, 4]] # Input values for each grid dimension
>>> values = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7]] # The grid itself
>>> xi = (1, 1.5)
>>> scipy.interpolate.interpn(points, values, xi)
array([ 1.5])
>>> xi = [[1, 1.5], [2, 1.5], [2, 2.5], [3, 2.5], [3, 3.5], [4, 3.5]]
>>> scipy.interpolate.interpn(points, values, xi)
array([ 1.5, 2.5, 3.5, 4.5, 5.5, 6.5])
The only thing you missed is that points is supposed to be a tuple. But as you can see from the above, it works even if points isn't a tuple.
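Applied to the grid in the question, a minimal sketch could look like the following (the file name and the query values for T and rho are placeholders; it assumes data is stored with shape (16, 11), rows following pointsx and columns following pointsy):

import numpy as np
from scipy.interpolate import interpn

pointsx = 7 + 0.2 * np.arange(16)   # log10(T) grid points
pointsy = np.arange(11)             # log10(rho_el) grid points
data = np.loadtxt("rates.txt")      # placeholder file name; expected shape (16, 11)

logT, logrho = 7.3, 4.2             # example query point, already in log10
rate = interpn((pointsx, pointsy), data, (logT, logrho))
print(rate)                         # one-element array with the interpolated rate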

Related

Calculate Similarity multi dimensions array Using fastdtw

I have been trying to use fastdtw to calculate similarity.
Here is a working example; it reports a similarity of 0.916%.
import numpy as np
from scipy.spatial.distance import euclidean
from fastdtw import fastdtw

dataSetI = [1, 0.5, 2, 2]
dataSetII = [1, 1, 1, 0.51, 2, 1]
x = np.array(dataSetI)
y = np.array(dataSetII)
distance, path = fastdtw(x, y, dist=euclidean)
print("{:.3f}%".format(similarity))  # 0.916%; `similarity` is derived from `distance` elsewhere in the asker's code
But the datasets I actually want to compare are multidimensional arrays with rows of varying length.
Example:
dataSetI = [[1, 0.5], [2, 2], []]
dataSetII = [[1, 1,3,5], [1, 0.51], [2, 1,5,6,7]]
x = np.array(dataSetI)
y = np.array(dataSetII)
distance, path = fastdtw(x, y, dist=euclidean)
#error here
ValueError: setting an array element with a sequence.
So my question is: Am I able to do this using fastdtw? Or is there any library able to do this? Please let me know. Thx.
Nooo! Don't use FastDTW.
"FastDTW is approximate and Generally Slower than the Algorithm it Approximates"
Renjie Wu, Eamonn J. Keogh: ICDE 2021: 2327-2328
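If an exact distance is what you need, plain dynamic programming is short enough to write directly. This is only a sketch for 1-D sequences with an absolute-difference cost; for the ragged multi-dimensional rows in the question you would have to supply a dist function that is defined for elements of different lengths:

import numpy as np

def dtw_distance(x, y, dist=lambda a, b: abs(a - b)):
    # exact DTW: D[i, j] = cost(i, j) + min of the three predecessor cells
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(x[i - 1], y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw_distance([1, 0.5, 2, 2], [1, 1, 1, 0.51, 2, 1]))  # 1.01 for these two example sequences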

Creating a contour plot from three data columns

I have two columns of input data that I want as my x and y axes, and a third column of results relating to those inputs. I have 36 combinations of inputs and therefore 36 results.
I want to achieve something like this plot
I have tried using a cmap, but I am told the z data is 1D and needs to be 2D, and I don't understand how to get around this issue.
Also attached another method below
data = excel[['test','A_h','f_h','fore C_T','hind C_T','fore eff','hind eff','hind C_T ratio','hind eff ratio']]
x = data['A_h']
y = data['f_h']
z = data['hind C_T ratio']
X,Y = np.meshgrid(x,y)
Z = z
plt.pcolor(x,y,z)
If you have arrays [1, 2, 3] and [4, 5, 6], then meshgrid (with indexing='ij') will give you two 3x3 arrays: [[1, 1, 1], [2, 2, 2], [3, 3, 3]] and [[4, 5, 6], [4, 5, 6], [4, 5, 6]]. In your case, this already seems to be taken care of, since you have 36 values each of x, y and z. So meshgrid won't be necessary.
If your arrays are well defined (already in the 11122233 and 456456456 format above), then you can just reshape them:
x = np.reshape(data['A_h'], (6,6))
y = np.reshape(data['f_h'], (6,6))
z = np.reshape(data['hind C_T ratio'], (6,6))
plt.contourf(x, y, z)
See the contourf documentation for more details.
On the other hand, if your data are irregular (the 36 points do not form a grid), then you will have to use griddata, as @obchardon suggested above.
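A minimal griddata sketch for that irregular case might look like this, assuming x, y and z are the 1-D columns pulled from the DataFrame in the question:

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import griddata

xi = np.linspace(x.min(), x.max(), 100)   # regular grid to interpolate onto
yi = np.linspace(y.min(), y.max(), 100)
Xi, Yi = np.meshgrid(xi, yi)

Zi = griddata((x, y), z, (Xi, Yi), method='linear')  # scattered (x, y, z) -> regular grid
plt.contourf(Xi, Yi, Zi)
plt.colorbar()
plt.show()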

Phase correlation for rotation registration using opencv

I'm trying to register two images that are a rotated and translated version of one another using opencv. Generally speaking, the procedure is (pseudo code):
a. IF1 = FFT2(I1); IF2 = FFT2(I2)
b. R_translation = (IF1).*(IF2_conjugate)
c. R_translation = R_translation./abs(R_translation)
d. r_translation = IFFT2(R_translation)
where the maximum of r_translation corresponds to the translation. Moving on to calculate the rotation, the abs value removes the translation part,
e. IF1_abs = abs(IF1); IF2_abs = abs(IF2)
Converting to Linear-Polar coordinates,
f. IF1_abs_pol = LINPOL(IF1_abs); IF2_abs_pol = LINPOL(IF2_abs)
g. IFF1 = FFT2(IF1_abs_pol); IFF2 = FFT2(IF2_abs_pol)
h. R_rot = (IFF1).*(IFF2_conjugate)
i. R_rot = R_rot./abs(R_rot)
j. r_rot = IFFT2(R_rot)
where the maximum of r_rot corresponds to the rotation. While for translation alone the cv2.phaseCorrelate function returns the expected results, for rotation it returns odd results. So I tried the following.
I took two 5x5 numpy arrays which are rotated versions of one another, like so:
import numpy
import cv2

a = numpy.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])
a = a.astype('float')/a.astype('float').max()
b = numpy.array([[5, 5, 5, 5, 5], [4, 4, 4, 4, 4], [3, 3, 3, 3, 3], [2, 2, 2, 2, 2], [1, 1, 1, 1, 1]])
b = b.astype('float') / b.astype('float').max()
First I calculated the phase correlation myself:
center_x = numpy.floor(a.shape[0] / 2.0)  # the x center of rotation (= x center of image)
center_y = numpy.floor(a.shape[1] / 2.0)  # the y center of rotation (= y center of image)
Mvalue = a.shape[1] / numpy.sqrt(((a.shape[0] / 2.0) ** 2.0) + ((a.shape[1] / 2.0) ** 2.0))  # rotation radius
Calculating the FFT, taking the absolute value (discarding any translation difference, if one existed), then switching to linear-polar coordinates and normalizing:
a_polar = cv2.linearPolar(numpy.abs(numpy.fft.fft2(a)), (center_x, center_y), Mvalue, cv2.WARP_FILL_OUTLIERS)
b_polar = cv2.linearPolar(numpy.abs(numpy.fft.fft2(b)), (center_x, center_y), Mvalue, cv2.WARP_FILL_OUTLIERS)
a_polar = a_polar/a_polar.max()
b_polar = b_polar / b_polar.max()
Another FFT step, multiplying pointwise, and an IFFT back:
aff = numpy.fft.fft2(a_polar)
bff = numpy.fft.fft2(b_polar)
R = aff * numpy.ma.conjugate(bff)
R = R / numpy.absolute(R)
r = numpy.fft.ifft2(R).real
r = r/r.max()
yields,
[Figure: phase correlation for rotation, b with respect to a]
According to cv2.linearPolar(), the rows span the angle (here with a step size of 360/5 = 72 degrees) and the columns span the radius (from 0 up to the maximum radius given in Mvalue). The maximum is evident at the last row (corresponding to approximately a -90 degree shift). So far so good.
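Reading the rotation off r can be done by locating the peak; a small sketch, assuming r from the code above and that the rows cover the full 0-360 degree range:

import numpy

row, col = numpy.unravel_index(numpy.argmax(r), r.shape)  # location of the correlation peak
angle_step = 360.0 / r.shape[0]                           # 72 degrees per row in the 5x5 example
angle = row * angle_step
if angle > 180.0:                                         # wrap into (-180, 180]
    angle -= 360.0
print(angle)                                              # -72 here, the bin closest to -90

With only five angular bins the resolution is 72 degrees, which is why the peak lands at -72 rather than exactly at -90.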
The second method is using cv2.phaseCorrelate() directly,
r_direct = cv2.phaseCorrelate(a_polar, b_polar)
which yields,
[Figure: phase correlation for rotation, b with respect to a, direct method]
The first two numbers are the X and Y correlation shifts (in pixels?), and the third number is the fit grade. When it is close to unity, the correlation result represents the data better (the blob around the maximum is more distinct).
Apart from the result not being distinct enough (why?), it is also confusing...
Generally, the first FFT step in this 5x5 example was not necessary. If rotation is the only difference, one can switch to linear-polar coordinates immediately and use cv2.phaseCorrelate. In that case the result is also confusing.
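For reference, a minimal sketch of that direct route (no Fourier magnitude step), building on a and b from above; treating the vertical shift reported by cv2.phaseCorrelate as an angular offset over the full 360 degree range is an assumption:

import cv2
import numpy

center = (a.shape[1] / 2.0, a.shape[0] / 2.0)
radius = min(center)
a_pol = cv2.linearPolar(a.astype(numpy.float64), center, radius, cv2.WARP_FILL_OUTLIERS)
b_pol = cv2.linearPolar(b.astype(numpy.float64), center, radius, cv2.WARP_FILL_OUTLIERS)

(shift_x, shift_y), response = cv2.phaseCorrelate(a_pol, b_pol)
angle = 360.0 * shift_y / a_pol.shape[0]  # rows of the polar image span the angle
print(angle, response)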
Any help would be appreciated :)
Thanks!
David

Binning a numpy array

I have a numpy array which contains time series data. I want to bin that array into equal partitions of a given length (it is fine to drop the last partition if it is not the same size) and then calculate the mean of each of those bins.
I suspect there is numpy, scipy, or pandas functionality to do this.
example:
data = [4,2,5,6,7,5,4,3,5,7]
for a bin size of 2:
bin_data = [(4,2),(5,6),(7,5),(4,3),(5,7)]
bin_data_mean = [3,5.5,6,3.5,6]
for a bin size of 3:
bin_data = [(4,2,5),(6,7,5),(4,3,5)]
bin_data_mean = [3.67,6,4]
Just use reshape and then mean(axis=1).
As the simplest possible example:
import numpy as np
data = np.array([4, 2, 5, 6, 7, 5, 4, 3, 5, 7])
print(data.reshape(-1, 2).mean(axis=1))
More generally, we'd need to do something like this to drop the last bin when it's not an even multiple:
import numpy as np
width = 3
data = np.array([4, 2, 5, 6, 7, 5, 4, 3, 5, 7])
result = data[:(data.size // width) * width].reshape(-1, width).mean(axis=1)
print(result)
Since you already have a numpy array, to avoid for loops you can use reshape and treat the new axis as the bin:
In [33]: data.reshape(-1, 2)
Out[33]:
array([[4, 2],
       [5, 6],
       [7, 5],
       [4, 3],
       [5, 7]])
In [34]: data.reshape(-1, 2).mean(axis=1)
Out[34]: array([ 3. , 5.5, 6. , 3.5, 6. ])
Actually this will just work if the size of data is divisible by n. I'll edit a fix.
Looks like Joe Kington has an answer that handles that.
Try this, using standard Python (NumPy isn't necessary for this). Assuming Python 2.x is in use:
data = [ 4, 2, 5, 6, 7, 5, 4, 3, 5, 7 ]
# example: for n == 2
n=2
partitions = [data[i:i+n] for i in xrange(0, len(data), n)]
partitions = partitions if len(partitions[-1]) == n else partitions[:-1]
# the above produces a list of lists
partitions
=> [[4, 2], [5, 6], [7, 5], [4, 3], [5, 7]]
# now the mean
[sum(x)/float(n) for x in partitions]
=> [3.0, 5.5, 6.0, 3.5, 6.0]
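Since the question also mentions pandas, a groupby-based sketch (the width variable for the bin size is just an illustrative name) would be:

import numpy as np
import pandas as pd

data = np.array([4, 2, 5, 6, 7, 5, 4, 3, 5, 7])
width = 3
s = pd.Series(data[:(data.size // width) * width])   # drop the incomplete last bin
bin_data_mean = s.groupby(s.index // width).mean()
print(bin_data_mean.tolist())                        # [3.6666..., 6.0, 4.0]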
I just wrote a function that applies this to any array size or dimension you want.
data is your array
axis is the axis you want to bin
binstep is the number of points between each bin (allows overlapping bins)
binsize is the size of each bin
func is the function you want to apply to the bin (np.max for max pooling, np.mean for an average, ...)
import numpy as np

def binArray(data, axis, binstep, binsize, func=np.nanmean):
    data = np.array(data)
    dims = np.array(data.shape)
    argdims = np.arange(data.ndim)
    argdims[0], argdims[axis] = argdims[axis], argdims[0]
    data = data.transpose(argdims)  # move the binned axis to the front
    data = [func(np.take(data, np.arange(int(i*binstep), int(i*binstep+binsize)), 0), 0)
            for i in np.arange(dims[axis] // binstep)]
    data = np.array(data).transpose(argdims)  # move the axis back to its original place
    return data
In your case it will be:
data = [4,2,5,6,7,5,4,3,5,7]
bin_data_mean = binArray(data, 0, 2, 2, np.mean)
or for the bin size of 3:
bin_data_mean = binArray(data, 0, 3, 3, np.mean)

Matrix Approximation and Predicting Timeseries in Python/R with SVD

I have an excel file that is 126 rows and 5 columns full of numbers, and I have to use that data and SVD methods to predict 5-10 more rows of data. I have implemented SVD in Python successfully using numpy:
import numpy as np
from numpy import genfromtxt
my_data = genfromtxt('data.csv', delimiter=',')
U, s, V = np.linalg.svd(my_data)
print ("U:")
print (U)
print ("\nSigma:")
print (s)
print ("\nVT:")
print (V)
which outputs:
U:
[[-0.03339497 0.10018171 0.01013636 ..., -0.10076323 -0.09740801
-0.08901366]
[-0.02881809 0.0992715 -0.01239945 ..., -0.02920558 -0.04133748
-0.06100236]
[-0.02501102 0.10637736 -0.0528663 ..., -0.0885227 -0.05408083
-0.01678337]
...,
[-0.02418483 0.10993637 0.05200962 ..., 0.9734676 -0.01866914
-0.00870467]
[-0.02944344 0.10238372 0.02009676 ..., -0.01948701 0.98455034
-0.00975614]
[-0.03109401 0.0973963 -0.0279125 ..., -0.01072974 -0.0109425
0.98929811]]
Sigma:
[ 252943.48015512 74965.29844851 15170.76769244 4357.38062076
3934.63212778]
VT:
[[-0.16143572 -0.22105626 -0.93558846 -0.14545156 -0.16908786]
[ 0.5073101 0.40240734 -0.34460639 0.45443181 0.50541365]
[-0.11561044 0.87141558 -0.07426656 -0.26914744 -0.38641073]
[ 0.63320943 -0.09361249 0.00794671 -0.75788695 0.12580436]
[-0.54977724 0.14516905 -0.01849291 -0.35426346 0.74217676]]
But I am not sure how to use this data to predict my values. I am using this link http://datascientistinsights.com/2013/02/17/single-value-decomposition-a-golfers-tutotial/ as a reference, but that is in R. At the end they use R to predict values, with this command:
approxGolf_1 <- golfSVD$u[,1] %*% t(golfSVD$v[,1]) * golfSVD$d[1]
Here is the IdeOne link to the entire R code: http://ideone.com/Yj3y6j
I'm not really familiar with R, so can anyone let me know if there is a similar function in Python for the command above, or explain what that command is doing exactly?
Thanks.
I will use the golf course example data you linked, to set the stage:
import numpy as np
A=np.matrix((4,4,3,4,4,3,4,2,5,4,5,3,5,4,5,4,4,5,5,5,2,4,4,4,3,4,5))
A=A.reshape((3,9)).T
This gives you the original 9 rows, 3 columns table with scores of 9 holes for 3 players:
matrix([[4, 4, 5],
[4, 5, 5],
[3, 3, 2],
[4, 5, 4],
[4, 4, 4],
[3, 5, 4],
[4, 4, 3],
[2, 4, 4],
[5, 5, 5]])
Now the singular value decomposition:
U, s, V = np.linalg.svd(A)
The most important thing to investigate is the vector s of singular values:
array([ 21.11673273, 2.0140035 , 1.423864 ])
It shows that the first value is much bigger than the others, indicating that the corresponding Truncated SVD with only one value represents the original matrix A quite well. To calculate this representation, you take column 1 of U multiplied by the first row of V, multiplied by the first singular value. This is what the last cited command in R does. Here is the same in Python:
U[:,0]*s[0]*V[0,:]
And here is the result of this product:
matrix([[ 3.95411864, 4.64939923, 4.34718814],
[ 4.28153222, 5.03438425, 4.70714912],
[ 2.42985854, 2.85711772, 2.67140498],
[ 3.97540054, 4.67442327, 4.37058562],
[ 3.64798696, 4.28943826, 4.01062464],
[ 3.69694905, 4.3470097 , 4.06445393],
[ 3.34185528, 3.92947728, 3.67406114],
[ 3.09108399, 3.63461111, 3.39836128],
[ 4.5599837 , 5.36179782, 5.0132808 ]])
Concerning the vector factors U[:,0] and V[0,:]: Figuratively speaking, U can be seen as a representation of a hole's difficulty, while V encodes a player's strength.
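Written with plain arrays instead of np.matrix, the same rank-1 approximation applied to the 126x5 data from the question would look roughly like this (data.csv is the file from the question):

import numpy as np
from numpy import genfromtxt

my_data = genfromtxt('data.csv', delimiter=',')
U, s, V = np.linalg.svd(my_data, full_matrices=False)

# rank-1 approximation: first left singular vector * first singular value * first right singular vector
approx_1 = s[0] * np.outer(U[:, 0], V[0, :])

# how much of the original matrix the rank-1 model misses, relative to its overall size
rel_error = np.linalg.norm(my_data - approx_1) / np.linalg.norm(my_data)
print(rel_error)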
