How to unravel data interpolated with griddata - python

I have interpolated a function on a grid with scipy.interpolate.griddata like so:
interpolated_quantity = scipy.interpolate.griddata(old_points, old_array, (grid_x, grid_y, grid_z), method='nearest')
What I would like to do is obtain a set of 4 1-D arrays: 3 with the position of each cell and one with the corresponding value of the interpolated quantity in each cell.
So far I'm using a very slow and time-consuming loop:
from numpy import linspace, arange
arrays = {"x": [], "y": [], "z": [], "prop": []}
base_gridx = linspace(xmin, xmax, abs(ngridx) + 1)
base_gridy = linspace(ymin, ymax, abs(ngridy) + 1)
base_gridz = linspace(zmin, zmax, abs(ngridz) + 1)
cx = (base_gridx[1:] + base_gridx[:-1]) / 2.
cy = (base_gridy[1:] + base_gridy[:-1]) / 2.
cz = (base_gridz[1:] + base_gridz[:-1]) / 2.
data_len = len(cx) * len(cy) * len(cz)
for ii in arange(0, len(cx)):
    for jj in arange(0, len(cy)):
        for kk in arange(0, len(cz)):
            arrays["x"].append(cx[ii])
            arrays["y"].append(cy[jj])
            arrays["z"].append(cz[kk])
            arrays["prop"].append(interpolated_quantity[ii][jj][kk])
This works, but it just takes a huge amount of time. Do you think there might be a faster way to do this? Maybe using ravel?

It is as simple as you suggest. The four arrays are:
grid_x.ravel()
grid_y.ravel()
grid_z.ravel()
interpolated_quantity.ravel()
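For completeness, a minimal sketch of why this works (the toy names below are assumed, not from the question): ravel flattens each 3-D grid in the same C order, so the i-th entries of the four flat arrays all describe the same cell.
import numpy as np
# Toy stand-ins for grid_x, grid_y, grid_z (assumed built with meshgrid)
cx, cy, cz = np.linspace(0, 1, 3), np.linspace(0, 1, 4), np.linspace(0, 1, 5)
grid_x, grid_y, grid_z = np.meshgrid(cx, cy, cz, indexing='ij')
vals = grid_x + grid_y + grid_z  # stand-in for interpolated_quantity
# Row i holds (x, y, z, value) for cell i
table = np.column_stack([grid_x.ravel(), grid_y.ravel(), grid_z.ravel(), vals.ravel()])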

Related

What is the best way/method to digitize the data of a 3D surface into a grid of pixels with smaller resolution in Python?

I want to digitize (i.e., average out over cells) photon count data into pixels given by a grid that tells how they are aligned. The photon count data is stored in a 2D array. I want to split that data into cells, each of which corresponds to a pixel. The idea is basically the same as downscaling an HD image to a smaller resolution. I'd like to achieve this in Python.
The digitizing function I've written:
import numpy as np
def digitize(function_data, grid_shape):
    """
    function_data: 2D array of function values of some 3D shape,
    e.g. exp(-(x^2 + y^2)) -> want to digitize this
    grid_shape: an array of length 2 which contains the dimensions of the smaller resolution
    """
    l = len(function_data)
    pixel_len_x = int(l / grid_shape[0])
    pixel_len_y = int(l / grid_shape[1])
    digitized_data = np.empty((grid_shape[0], grid_shape[1]))
    for i in range(grid_shape[0]):      # row-index of pixel in smaller-resolution grid
        for j in range(grid_shape[1]):  # column-index of pixel in smaller-resolution grid
            hd_pixel = []
            for k in range(pixel_len_y):
                hd_pixel.append(function_data[k][j:j*pixel_len_x])
            hd_pixel = np.ravel(hd_pixel)  # turn 2D array into 1D to compute the average
            pixel_avg = np.average(hd_pixel)
            digitized_data[i][j] = pixel_avg
    return digitized_data
In theory, this function should do what I want to achieve, but when tested it doesn't yield the expected results. Either a completed version of my function or any other method that achieves my goal would be extremely helpful.
You could also use an interpolation function, if you can use SciPy. Here we use one of the gridded-data interpolators, RectBivariateSpline, to upsample your function, but you can find numerous examples on this and other sites.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import RectBivariateSpline as rbs
# Sampling coordinates
x = np.linspace(-2, 2, 20)
y = np.linspace(-2, 2, 30)
# Your function
f = np.exp(-(x[:,None]**2 + y**2))
# Interpolator
interp = rbs(x, y, f)
# Higher resolution coordinates
x_hd = np.linspace(x.min(), x.max(), x.size * 5)
y_hd = np.linspace(y.min(), y.max(), y.size * 5)
# New higher res function
f_hd = interp(x_hd, y_hd, grid=True)
# Some plots
fig, ax = plt.subplots(ncols=2)
ax[0].imshow(f)
ax[1].imshow(f_hd)
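If you do want the lower-resolution (block-averaged) result rather than a resampled one, a reshape-based block mean is the usual NumPy idiom. A minimal sketch, assuming the data dimensions divide evenly by the target grid shape:
import numpy as np
def digitize_blocks(function_data, grid_shape):
    # Split the array into grid_shape[0] x grid_shape[1] blocks and average each block
    rows, cols = function_data.shape
    r, c = grid_shape
    return function_data.reshape(r, rows // r, c, cols // c).mean(axis=(1, 3))
# e.g. digitize_blocks(np.ones((100, 100)), (20, 20)) -> a 20x20 array of ones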

Incorrect results for simple 2D transformation

I'm attempting a 2D transformation using the nudged package.
The code is really simple:
import numpy as np
import nudged
# Domain data
x_d = [2538.87, 1294.42, 3002.49, 2591.56, 2881.37, 891.906, 1041.24, 2740.13, 1928.55, 3335.12, 3771.76, 1655.0, 696.772, 583.242, 2313.95, 2422.2]
y_d = [2501.89, 4072.37, 2732.65, 2897.21, 808.969, 1760.97, 992.531, 1647.57, 2407.18, 2868.68, 724.832, 1938.11, 1487.66, 1219.14, 672.898, 145.059]
# Range data
x_r = [3.86551776277075, 3.69693290266126, 3.929110096606081, 3.8731112887391532, 3.9115924127798536, 3.6388068074815862, 3.6590261077461577, 3.892482104449016, 3.781816183438835, 3.97464058821231, 4.033173444601999, 3.743901522907265, 3.6117470568340906, 3.5959585708147728, 3.8338853650390945, 3.8487836817639334]
y_r = [1.6816478101135388, 1.8732008327428353, 1.7089144628920678, 1.729386055302033, 1.4767657611559102, 1.5933812675900505, 1.5003232598807479, 1.5781629182153942, 1.670867507106891, 1.7248363641300841, 1.4654588884234485, 1.6143557610354264, 1.5603626129237362, 1.5278835570641824, 1.4609066190929916, 1.397111300807424]
# Random domain data
x, y = np.random.uniform(0., 4000., (2, 1000))
# Define domain and range points
dom, ran = (x_d, y_d), (x_r, y_r)
# Obtain transformation dom --> ran
trans = nudged.estimate(dom, ran)
# Apply the transformation to the (x, y) points
x_t, y_t = trans.transform((x, y))
where (x_d, y_d) and (x_r, y_r) are the one-to-one correlated "domain" and "range" points, and (x, y) are all the points in the (x_d, y_d) (domain) system that I want to transform to the (x_r, y_r) (range) system.
This is the result I get, where:
trans.get_matrix()
[[-0.0006459232439068067, -0.0007947429558548157, 6.534164085946009], [0.0007947429558548157, -0.0006459232439068067, 2.515279819707991], [0, 0, 1]]
trans.get_rotation()
2.2532603497070713
trans.get_scale()
0.0010241255796531702
trans.get_translation()
[6.534164085946009, 2.515279819707991]
These are the final transformed dom values with the original ran points overlaid:
This is clearly not right and I can't figure out what I'm doing wrong.
I was able to figure out your issue. It is simply that nudged has a somewhat problematic interface, which is poorly documented.
The estimate function accepts a list of coordinate pairs. You effectively have to transpose dom and ran to get this to work. I suggest either switching to numpy arrays, or using list(map(list, zip(...))) to do the transpose.
The Transform.transform method is extremely restrictive: it requires that the inner pairs be of type list. Not tuple, not any other sequence, but specifically list. Your attempt to call trans.transform((x, y)) only happened to work by pure luck. transform assessed that the first element is not a list, and attempted to transform (x, y) as a single coordinate pair. Luckily for you, numpy operations are vectorized, so you can process an entire array as a single unit.
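For illustration, the transpose idiom on toy data (values hypothetical):
# zip pairs up the coordinate lists; map(list, ...) makes each pair a list, as transform requires
pairs = list(map(list, zip([1, 2, 3], [10, 20, 30])))
# pairs == [[1, 10], [2, 20], [3, 30]]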
Here is a working version of your code that generates the correct plots using mostly plain Python:
import numpy as np
import matplotlib.pyplot as plt
from nudged import estimate
# Domain data
x_d = [2538.87, 1294.42, 3002.49, 2591.56, 2881.37, 891.906, 1041.24, 2740.13, 1928.55, 3335.12, 3771.76, 1655.0, 696.772, 583.242, 2313.95, 2422.2]
y_d = [2501.89, 4072.37, 2732.65, 2897.21, 808.969, 1760.97, 992.531, 1647.57, 2407.18, 2868.68, 724.832, 1938.11, 1487.66, 1219.14, 672.898, 145.059]
# Range data
x_r = [3.86551776277075, 3.69693290266126, 3.929110096606081, 3.8731112887391532, 3.9115924127798536, 3.6388068074815862, 3.6590261077461577, 3.892482104449016, 3.781816183438835, 3.97464058821231, 4.033173444601999, 3.743901522907265, 3.6117470568340906, 3.5959585708147728, 3.8338853650390945, 3.8487836817639334]
y_r = [1.6816478101135388, 1.8732008327428353, 1.7089144628920678, 1.729386055302033, 1.4767657611559102, 1.5933812675900505, 1.5003232598807479, 1.5781629182153942, 1.670867507106891, 1.7248363641300841, 1.4654588884234485, 1.6143557610354264, 1.5603626129237362, 1.5278835570641824, 1.4609066190929916, 1.397111300807424]
# Random domain data
uni = np.random.uniform(0., 4000., (2, 1000))
# Define domain and range points
dom = list(map(list, zip(x_d, y_d)))
ran = list(map(list, zip(x_r, y_r)))
# Obtain transformation dom --> ran
trans = estimate(dom, ran)
# Apply the transformation to the (x, y) points
tra = trans.transform(uni)
fig, ax = plt.subplots(2, 2)
ax[0][0].scatter(x_d, y_d)
ax[0][0].set_title('dom')
ax[0][1].scatter(x_r, y_r)
ax[0][1].set_title('ran')
ax[1][0].scatter(*uni)
ax[1][1].scatter(*tra)
I left in your hack with uni, since I did not feel like converting the array of random values to a nested list. The resulting plot looks like this:
My overall recommendation is to submit a number of bug reports to the nudged library based on these findings.

Using FFT for 3D array representation of 2D field

I need to obtain the Fourier transform of a complex field. I'm using Python.
My input is a 2D snapshot of the electric field in the xy-plane.
I currently have a 3D array F[x][y][z] where F[x][y][0] contains the real component and F[x][y][1] contains the imaginary component of the field.
My current code is very simple and does this:
result=np.fft.fftn(F)
result=np.fft.fftshift(result)
I have the following questions:
1) Does this correctly compute the Fourier transform of the field, or should the field be entered as a 2D matrix with each element containing both the real and imaginary components instead?
2) I entered the imaginary component values of the field using the real coefficient only (i.e. if the complex value is 6i, I entered 6). Is this correct, or should it be entered as a complex value instead (i.e. as '6j')?
3) As this is technically a 2D input field, should I use np.fft.fft2 instead? Doing this means the output is not centered in the middle.
4) The output does not look like what I'd expect the Fourier transform of F to look like, and I'm unsure what I'm doing wrong.
Full example code:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
x, y = np.meshgrid(np.linspace(-1, 1, 100), np.linspace(-1, 1, 100))
d = np.sqrt(x*x + y*y)
sigma, mu = .35, 0.0
g1 = np.exp(-((d - mu)**2 / (2.0 * sigma**2)))
F = np.empty(shape=(300, 300, 2), dtype=complex)
for x in range(0, 300):
    for y in range(0, 300):
        if y < 50 or x < 100 or y > 249 or x > 199:
            F[x][y][0] = g1[0][0]
            F[x][y][1] = 0j
        elif y < 150:
            F[x][y][0] = g1[x-100][y-50]
            F[x][y][1] = 0j
        else:
            F[x][y][0] = g1[x-100][y-150]
            F[x][y][1] = 0j
F_2D = np.empty(shape=(300, 300))
for x in range(0, 300):
    for y in range(0, 300):
        F_2D[x][y] = np.absolute(F[x][y][0]) + np.absolute(F[x][y][1])
plt.imshow(F_2D)
plt.show()
result = np.fft.fftn(F)
result = np.fft.fftshift(result)
result_2D = np.empty(shape=(300, 300))
for x in range(0, 300):
    for y in range(0, 300):
        result_2D[x][y] = np.absolute(result[x][y][0]) + np.absolute(result[x][y][1])
plt.imshow(result_2D)
plt.show()
Plotting F gives this:
With np.fft.fftn, the image shown at the end is:
And with np.fft.fft2:
Neither of these looks like what I would expect the Fourier transform of F to look like.
I'm adding another answer here, suited to the added code.
The answer is still np.fft.fft2(). Here's an example; I modified the code slightly. To verify that we need fft2, I discarded one of the blobs: we know that a single Gaussian blob should transform into a Gaussian blob (with a certain phase, which is not shown when plotting the absolute value). I also decreased the standard deviation so that the frequency response will widen a little.
Code:
import numpy as np
import matplotlib.pyplot as plt
x, y = np.meshgrid(np.linspace(-1, 1, 100), np.linspace(-1, 1, 100))
d = np.sqrt(x**2 + y**2)
sigma, mu = .1, 0.0
g1 = np.exp(-((d - mu)**2 / (2.0 * sigma**2)))
N = 300
positions = [[150, 100]]  # , [150, 200]] -- second blob disabled for the test
sz2 = [int(s/2) for s in g1.shape]
F_2D = np.zeros([N, N])
for x0, y0 in positions:
    # place the (real-valued) Gaussian blob into the field
    F_2D[x0-sz2[0]:x0+sz2[0], y0-sz2[1]:y0+sz2[1]] = g1
result = np.fft.fftshift(np.fft.fft2(F_2D))
plt.subplot(211); plt.imshow(F_2D)
plt.subplot(212); plt.imshow(np.absolute(result))
plt.title(r'$\sigma$=.1')
plt.show()
Result:
To get back to the original problem, we need only change
positions = [ [150,100] , [150,200] ]
and sigma=.35 instead of sigma=.1.
You should use complex numpy variables (by using 1j) and use fft2. For example:
N = 16
x0 = np.random.randn(N,N,2)
x = x0[:,:,0] + 1j*x0[:,:,1]
X = np.fft.fft2(x)
Using fftn on x0 will do a 3D FFT, and using fft will do a vector-wise 1D FFT.
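Applied to the question's (300, 300, 2) array F, a minimal sketch (assuming channel 0 holds the real part and channel 1 the imaginary coefficient, as described):
import numpy as np
# Combine the two channels into a single complex-valued 2D field
field = F[:, :, 0].real + 1j * F[:, :, 1].real
result = np.fft.fftshift(np.fft.fft2(field))
magnitude = np.absolute(result)  # what you would pass to imshow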

Applying circular filter to image in Python / Applying function to each element of numpy array

I have a Python code that works, but it's quite slow and I believe there has to be a way of doing this more efficiently.
The idea is to apply a filter to an image. The filter is an average of the points which fall within a specified radius. The input is an m×2 array of x, y coordinates and an m×1 array z of the values at the m observation points.
The program that works is the following
import numpy as np
def haversine(point, xy_list):
    earth_radius = 6378137.0
    dlon = np.radians(xy_list[:,0]) - np.radians(point[0])
    dlat = np.radians(xy_list[:,1]) - np.radians(point[1])
    a = np.square(np.sin(dlat/2.0)) + np.cos(np.radians(point[0])) * np.cos(np.radians(xy_list[:,0])) * np.square(np.sin(dlon/2.0))
    return 2 * earth_radius * np.arcsin(np.sqrt(a))
def circular_filter(xy, z, radius):
    filtered = np.zeros(xy.shape[0])
    for q in range(xy.shape[0]):
        dist = haversine(xy[q,:], xy)
        masked_z = np.ma.masked_where(dist > radius, z)
        filtered[q] = masked_z.mean()
    return filtered
x = np.random.uniform(low=-90, high=0, size=(1000,1)) # x represents longitude
y = np.random.uniform(low=0, high=90, size=(1000,1)) # y represents latitude
xy = np.hstack((x,y))
z = np.random.rand(1000,)
filtered_z = circular_filter(xy, z, radius=100.)
The problem is that I have 6 million points per data set, and the code is horribly slow. There must be a way to do this more efficiently. I thought of using scipy.spatial.distance.cdist() which is fast, but then I'd have to reproject the data to UTM, and I'd like to avoid reprojection. Any suggestions?
Thanks,
Reniel
After a lot of reading and searching, I finally found the reason my code took forever to run: I needed to understand and apply the concept of a filter kernel. Basically, I realized there was a connection between my problem and this post:
Local Maxima with circular window
The downside: the user needs to provide a proper EPSG code, but I think I can find workarounds for this later.
The upside: it is very fast and efficient.
What worked for me was converting the lat/long to UTM so that we can create a circular kernel and apply generic_filter from scipy.
import time
import numpy as np
from pyproj import Proj, transform
from scipy.ndimage.filters import generic_filter
def circular_filter(tile, radius):
    x, y = np.meshgrid(tile['lon'], tile['lat'])
    x = x.reshape(x.size)
    y = np.flipud(y.reshape(y.size))
    z = tile['values'].reshape(tile['values'].size)
    wgs84 = Proj(init='epsg:4326')
    utm18N = Proj(init='epsg:26918')
    x, y = transform(wgs84, utm18N, x, y)
    dem_res = np.abs(x[1]-x[0])  # raster resolution (original data is a GeoTIFF read using gdal)
    radius = int(np.ceil(radius/dem_res))  # user gives meters, but we figure out the number of cells
    print(radius)
    kernel = np.zeros((2*radius+1, 2*radius+1))
    y, x = np.ogrid[-radius:radius+1, -radius:radius+1]
    mask = x**2 + y**2 <= radius**2
    kernel[mask] = 1
    print('Commence circular filter.'); start = time.time()
    tile['values'] = generic_filter(tile['values'], np.mean, footprint=kernel)
    print('Took {:.3f} seconds'.format(time.time()-start))
I also took a look at clustering techniques from here: http://geoffboeing.com/2014/08/clustering-to-reduce-spatial-data-set-size/
But I realized these clustering techniques serve a completely different purpose.
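For reference, a spatial index is another way to avoid the all-pairs distance computation for scattered points, without reprojecting. A minimal sketch (not the method above) using scikit-learn's BallTree, which supports the haversine metric directly, assuming xy holds lon/lat in degrees as in the question:
import numpy as np
from sklearn.neighbors import BallTree
def circular_filter_balltree(xy, z, radius):
    earth_radius = 6378137.0
    latlon = np.radians(xy[:, [1, 0]])  # haversine metric expects (lat, lon) in radians
    tree = BallTree(latlon, metric='haversine')
    neighbors = tree.query_radius(latlon, r=radius / earth_radius)  # meters -> radians
    return np.array([z[idx].mean() for idx in neighbors])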

Integrated for loop to calculate values over a grid/mesh

I am fairly new to Python, and I am trying to plot a contour plot of the water surface over a 2D mesh.
At the moment the code runs but I am not getting the right solution. I have checked the formula carefully and I am fairly confident that the issue is with my loops.
I want the code to run for each point on my mesh based on their x and y coordinates.
The mesh is 100 x 100 resulting in 10000 nodes. I have posted my code below, I believe the problem is with the integrated for loops. Any advice on what I might be able to try would be great.
Apologies for the length of code...
import numpy as np
import matplotlib.pyplot as plt
import math
import sys
from math import sqrt
import decimal
t = 0
n = 5
l = 100000
d = 100
g = 9.81
nx, ny = (100, 100)
x5 = np.linspace(-100000, 100000, nx)
y5 = np.linspace(-100000, 100000, ny)
xv, yv = np.meshgrid(x5, y5)
x = np.arange(-100000, 100000, 2000)
y = np.arange(-100000, 100000, 2000)
c = np.arange(len(x))
x2 = np.arange(len(x))
y2 = np.arange(len(x))
t59 = np.arange(1, 10001, 1)
h = np.arange(len(t59))
om2 = 1.458*(10**-4.0)
phi = 52
phirad = phi*(math.pi/180)
f = om2*math.sin(phirad)
A = (((d+n)**2.0)-(d**2.0))/(((d+n)**2.0)+(d**2.0))
w = (((8*g*d)/(l**2))+(f**2))**0.5
a = ((1-(A**2.0))**0.5)/(1-(A*math.cos(w*t)))
b = (((1-(A**2.0))/(1-(A*math.cos(w*t)))**2.0)-1)
l2 = l**2.0
for i in range(len(x)):
    for j in range(len(y)):
        h[i] = d*(a-1-((((x[i]**2.0)+(y[j]**2.0))/l2)*b))
h5 = np.reshape(h, (100, 100))
plt.figure(1)
plt.contourf(x5, y5, h5)
plt.colorbar()
plt.show()
OK, apologies, I didn't make myself very clear. I'm hoping to achieve a parabolic basin output with h values varying between roughly -10 and 10. Instead I am getting enormous values and a completely wrong shape. I thought the for loop needed to be more like:
for i in range(len(x)):
    for j in range(len(y)):
        h[i][j] = d*(a-1-((((x[i][j]**2.0)+(y[i][j]**2.0))/l2)*b))
Is that clearer? Let me know if not.
The first thing is that the complete loop is not necessary.
h = d * (a - 1 - (x[None,:]**2 + y[:,None]**2) / l2 * b)
Here the magic comes with the None in indexing. x[None, :] means "x as a row vector copied to as many rows as needed", and y[:, None] means "y as a column vector copied to as many columns as needed".
This might be easiest to understand with an example:
import numpy as np
x = np.arange(5)
y = np.arange(0, 50, 10)
print(x)
print(y)
print(x[None,:] + y[:,None])
This prints:
[0 1 2 3 4]
[ 0 10 20 30 40]
[[ 0  1  2  3  4]
 [10 11 12 13 14]
 [20 21 22 23 24]
 [30 31 32 33 34]
 [40 41 42 43 44]]
The one-liner above gives the full h field. Some manual calculations show this should be rather ok:
d = 100
a = 1.05
b = 0.1025
For a corner point at (1e5, 1e5), we have 2e10 in the addition, so the values do not look badly off.
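For reference, the full vectorized version on the question's own mesh is just a few lines (a sketch, using xv, yv, x5, y5 and the constants exactly as defined in the question):
# h computed on the whole 100x100 mesh by broadcasting; no loop needed
h5 = d * (a - 1 - ((xv**2 + yv**2) / l2) * b)
plt.contourf(x5, y5, h5)
plt.colorbar()
plt.show()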
