python scipy 3D interpolation / look up table

I have data produced from COMSOL which I would like to use as a look-up table in a Python/SciPy program I am building. The output from COMSOL looks like B(ri, thick, L) and will contain approximately 20,000 entries. An example of the output is shown below for a reduced 3x3x3 version.
While I have found many good solutions for 3D interpolation using e.g. RegularGridInterpolator (first link below), I am still looking for a solution using the look-up table style. The second link below seems close, but I am unsure how that method interpolates over all three dimensions.
I am having a hard time believing that a look-up table requires such an elaborate implementation, so any suggestions are most appreciated!
COMSOL data example
interpolate 3D volume with numpy and or scipy
Interpolating data from a look up table

I was able to figure this out and wanted to pass my solution on to the next person. I found that merely averaging the two closest points found via a cKDTree yielded errors as large as 10%.
Instead, I used the cKDTree to find the appropriate entry in the scattered look-up table / data file and assign it to the correct entry of a 3D NumPy array (you can save this array to file if you like). Then I used RegularGridInterpolator on this array. Errors were on the order of 0.5%, an order of magnitude better than the cKDTree alone.
import numpy as np
from scipy.spatial import cKDTree
from scipy.interpolate import RegularGridInterpolator
l_data = np.linspace(0.125, 0.5, 16)  # np.linspace(0.01, 0.1, 10)  # Range for "short L"
ri_data = np.linspace(0.005, 0.075, 29)
thick_data = np.linspace(0.0025, 0.1225, 25)
# x/y/z grid axes with the known bounds above
F = np.zeros((np.size(l_data), np.size(ri_data), np.size(thick_data)))
LUT = np.genfromtxt('a_data_file.csv', delimiter=',')
F_val = LUT[:, 3]
tree_small_l = cKDTree(LUT[:, :3])  # xyz coords
for ri_iter in np.arange(np.size(ri_data)):
    for thick_iter in np.arange(np.size(thick_data)):
        for l_iter in np.arange(np.size(l_data)):
            dist, ind = tree_small_l.query((l_data[l_iter], ri_data[ri_iter], thick_data[thick_iter]))
            F[l_iter, ri_iter, thick_iter] = F_val[ind]
interp_F_func = RegularGridInterpolator((l_data, ri_data, thick_data), F)
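For reference, the resulting interpolator can then be queried at arbitrary (L, ri, thick) points inside the grid bounds; the coordinates below are just placeholder values within the ranges defined above, not values from my data set.
# Query the interpolator at a single (L, ri, thick) point (placeholder coordinates).
print(interp_F_func(np.array([0.3, 0.05, 0.06])))
# Several points can be evaluated at once by passing an (n, 3) array.
points = np.array([[0.2, 0.01, 0.01],
                   [0.4, 0.06, 0.10]])
print(interp_F_func(points))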


How can I plot only particular values in xarray?

I am using data from cdasws to plot dynamic spectra. I am following the example found here: https://cdaweb.gsfc.nasa.gov/WebServices/REST/jupyter/CdasWsExample.html
This is my code, which I have modified to obtain a dynamic spectrum for STEREO.
from cdasws import CdasWs
from cdasws.datarepresentation import DataRepresentation
import matplotlib.pyplot as plt
cdas = CdasWs()
import numpy as np
datasets = cdas.get_datasets(observatoryGroup='STEREO')
for index, dataset in enumerate(datasets):
    print(dataset['Id'], dataset['Label'])
variables = cdas.get_variables('STEREO_LEVEL2_SWAVES')
for variable_1 in variables:
    print(variable_1['Name'], variable_1['LongDescription'])
data = cdas.get_data('STEREO_LEVEL2_SWAVES', ['avg_intens_ahead'],
                     '2020-07-11T02:00:00Z', '2020-07-11T03:00:00Z',
                     dataRepresentation=DataRepresentation.XARRAY)[1]
print(data)
plt.figure(figsize = (15,7))
# plt.ylim(100,1000)
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
plt.yscale('log')
data.avg_intens_ahead.transpose().plot()
plt.xlabel("Time",size=18)
plt.ylabel("Frequency (kHz)",size=18)
plt.show()
Using this code gives a plot that looks something like this:
My question is: is there any way of plotting this spectrum only for a particular frequency? For example, I want to plot just the intensity values at 636 kHz. Is there any way I can do that?
Any help is greatly appreciated. I don't understand xarray; I have never worked with it before.
Edit -
Using the command,
data_stereo.avg_intens_ahead.loc[:,625].plot()
generates a plot that looks like this:
While this is useful, what I need is the following:
for the dynamic spectrum, if I choose a particular frequency like 600 kHz, can it display something like this (I have just added white boxes to clarify what I mean)?
If you still want the plot to be 2D, but to include a subset of your data along one of the dimensions, you can provide an array of indices or a slice object. For example:
data_stereo.avg_intens_ahead.sel(
    frequency=[625]
).plot()
Or
# include a 10% band on either side
data_stereo.avg_intens_ahead.sel(
    frequency=slice(625*0.9, 625*1.1)
).plot()
Alternatively, if you would actually like your plot to show white space outside this selected area, you could mask your data with where:
data_stereo.avg_intens_ahead.where(
    data_stereo.frequency==625
).plot()
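If it helps to experiment, these selection patterns can be tried on a small synthetic DataArray first; the dimension names and values below are placeholders rather than the real STEREO dataset.
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
# Toy dynamic spectrum: 24 time steps x 5 frequency channels with random intensities.
da = xr.DataArray(
    np.random.rand(24, 5),
    coords={"time": np.arange(24), "frequency": [580.0, 600.0, 625.0, 650.0, 700.0]},
    dims=("time", "frequency"),
    name="avg_intens",
)
for selection in (
    da.sel(frequency=[625.0]),                      # just the 625 channel
    da.sel(frequency=slice(625 * 0.9, 625 * 1.1)),  # a 10% band around 625
    da.where(da.frequency == 625.0),                # full grid, NaN outside 625
):
    plt.figure()
    selection.plot()
plt.show()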

zonal_stats: width and height must be > 0 error

I am trying to use the zonal_stats function from the rasterstats Python package to get the raster statistics from a .tif file for each shape in a .shp file. I manage to do it in QGIS without any problems, but I have to do the same with more than 200 files, which would take a lot of time, so I'm trying the Python way. Both files and replication code are in my Google Drive.
My script is:
import rasterio
import geopandas as gpd
import numpy as np
from rasterio.plot import show
from rasterstats import zonal_stats
from rasterio.transform import Affine
# Import .tif file
raster = rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Arroz_2019-03.tif')
# Read the raster values
array = raster.read(1)
# Get the affine
affine = raster.transform
# Import shape file
shapefile = gpd.read_file(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Setores_Censit_SP_WGS84.shp')
# Zonal stats
zs_shapefile = zonal_stats(shapefile, array, affine=affine,
                           stats=['min', 'max', 'mean', 'median', 'majority'])
I get the following error:
Input In [1] in <cell line: 22>
zs_shapefile = zonal_stats(shapefile, array, affine = affine,
File ~\Anaconda3\lib\site-packages\rasterstats\main.py:32 in zonal_stats
return list(gen_zonal_stats(*args, **kwargs))
File ~\Anaconda3\lib\site-packages\rasterstats\main.py:164 in gen_zonal_stats
rv_array = rasterize_geom(geom, like=fsrc, all_touched=all_touched)
File ~\Anaconda3\lib\site-packages\rasterstats\utils.py:41 in rasterize_geom
rv_array = features.rasterize(
File ~\Anaconda3\lib\site-packages\rasterio\env.py:387 in wrapper
return f(*args, **kwds)
File ~\Anaconda3\lib\site-packages\rasterio\features.py:353 in rasterize
raise ValueError("width and height must be > 0")
I have found this question about the same problem, but I can't make its solution work: I tried to reverse the sign of the items in the Affine of my raster data, but without success:
''' Trying to use the same solution of question: https://stackoverflow.com/questions/62010050/from-zonal-stats-i-get-this-error-valueerror-width-and-height-must-be-0 '''
old_tif = rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Arroz_2019-03.tif')
print(old_tif.profile) # copy & paste the output and change signs
new_tif_profile = old_tif.profile
# Affine(0.004611149999999995, 0.0, -46.828504575,
# 0.0, 0.006521380000000008, -24.01169169)
new_tif_profile['transform'] = Affine(0.004611149999999995, 0.0, -46.828504575,
                                      0.0, -0.006521380000000008, 24.01169169)
new_tif_array = old_tif.read(1)
new_tif_array = np.fliplr(np.flip(new_tif_array))
with rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\tentativa.tif', "w", **new_tif_profile) as dest:
    dest.write(new_tif_array, indexes=1)
dem = rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\tentativa.tif')
# Read the raster values
array = dem.read(1)
# Get the affine
affine = dem.transform
# Import shape file
shapefile = gpd.read_file(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Setores_Censit_SP_WGS84.shp')
# Zonal stats
zs_shapefile = zonal_stats(shapefile, array, affine=affine,
                           stats=['min', 'max', 'mean', 'median', 'majority'])
Doing it this way, I don't get the "width and height must be > 0" error! But every stat in zs_shapefile is NoneType, so it doesn't solve my problem.
Does anyone understand why this error happens, and which sign I have to reverse to make it work? Thanks in advance!
I would be careful with overriding the geotransform of your raster like this, unless you are really convinced the original metadata is incorrect. I'm not too familiar with Affine, but it looks like you're now setting the latitude as positive, placing the raster in the northern hemisphere. My guess would be that this lack of intersection between the vector and the raster causes the NoneType results.
I'm also not familiar with rasterstats, but I'm guessing it boils down to GDAL & NumPy at its core. So something you could try as a test is to add the all_touched=True keyword:
https://pythonhosted.org/rasterstats/manual.html#rasterization-strategy
If that works, it might indicate that the rasterization fails because your polygons are so small compared to the pixels, that the default rasterization method results in a rasterized polygon of size 0 (in at least one of the dimensions). And that's what the error also hints at (my guess).
Keep in mind that all_touched=True changes the stats you get in result, so I would only do it for testing, or if you're comfortable with this difference.
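A minimal sketch of that test, reusing the variables from your script (the only change is the extra keyword):
zs_test = zonal_stats(shapefile, array, affine=affine, all_touched=True,
                      stats=['min', 'max', 'mean', 'median', 'majority'])
print(zs_test[:5])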
If you really need a valid value for these (too) small polygons, there are a few workarounds you could try. Something I've done is to simply take the centroid for these polygons, and take the value of the pixel where this centroid falls on.
A potential way to identify these polygons would be to use all_touched with the "count" statistic; every polygon with a count of only 1 might be too small to get rasterized correctly. To really find this out you would probably have to do the rasterization yourself using GDAL, given that rasterstats doesn't seem to allow it.
Note that, due to the shape of some of the polygons you use, the centroid might fall outside of the polygon. But given how coarse your raster data is relative to the vector, I don't think it would impact the result all that much.
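A rough sketch of that fallback, using the variable names from your script (the count-of-1 threshold is just a heuristic, and the centroid sampling reuses the raster dataset you already opened with rasterio):
# Identify polygons that may be too small to rasterize reliably.
counts = zonal_stats(shapefile, array, affine=affine, all_touched=True, stats=['count'])
suspect = [i for i, c in enumerate(counts) if c['count'] <= 1]
# For those polygons, fall back to the pixel value under the centroid.
coords = [(pt.x, pt.y) for pt in shapefile.geometry.centroid.iloc[suspect]]
centroid_values = [val[0] for val in raster.sample(coords)]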
An alternative is, instead of modifying the vector, to significantly increase the resolution of your raster. You could use gdal_translate to output this to a VRT, with some form of resampling, and avoid having to write the resampled data to disk. Once the resolution is high enough that all polygons rasterize to at least a 1x1 array, it should work. But your polygons are tiny compared to the pixels, so it will take a lot of upsampling. You could guess the factor, or analyze the envelopes of all polygons; for example, take the smallest edge of an envelope as more or less the resolution necessary for a correct rasterization.
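For the resampling route, something like the following could work through GDAL's Python bindings; this is only a sketch, assuming I remember the gdal.Translate keywords correctly (widthPct/heightPct mirror gdal_translate's -outsize percentages), and the 1000 percent factor (10x per axis) is a placeholder you would need to tune to your polygon sizes.
from osgeo import gdal
# Build an upsampled virtual raster (VRT) without writing the resampled data to disk.
gdal.Translate(
    'Arroz_2019-03_upsampled.vrt',
    r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Arroz_2019-03.tif',
    format='VRT',
    widthPct=1000, heightPct=1000,
    resampleAlg='nearest',
)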
Edit: To clarify the above a bit further.
The default rasterization strategy of GDAL (all_touched=False) is to consider a pixel "within" the polygon if the centroid of the pixel intersects with the polygon.
Using QGIS you can, for example, convert the pixels to points and then do a spatial join with your vector. If you remove the polygons that can't be joined (there's a checkbox), you'll get a different vector that most likely will work with rasterstats, given your current raster.
You could perhaps use that in the normal way (all_touched=False), and get the stats for the small polygons using all_touched=True.
In the image below, the green polygons are the ones that intersect with the centroid of a pixel; the red ones don't (and those are probably the ones rasterstats "tries" to rasterize to a size-0 array).

Accessing (the right) data when using holoviews/bokeh

I am having difficulties accessing (the right) data when using holoviews/bokeh, either for connected plots showing different aspects of the dataset, or just for customising a plot with dynamic access to the data as plotted (say a tooltip).
TL;DR: How do I add a projection plot of my dataset (a different set of dimensions, linked to the main plot, like a marginal distribution but, you know, not restricted to a histogram or distribution)? A related question I asked here on SO probably has a similar solution.
Let me exemplify (straight from an ipynb, should be quite reproducible):
import numpy as np
import random, pandas as pd
import bokeh
import datashader as ds
import holoviews as hv
from holoviews import opts
from holoviews.operation.datashader import datashade, shade, dynspread, spread, rasterize
hv.extension('bokeh')
With imports set up, let's create a dataset (N target 10e12 ;) to use with datashader. Besides the key dimensions, I really need some value dimensions (here z and z2).
import numpy as np
import pandas as pd
N = int(10e6)
x_r = (0,100)
y_r = (100,2000)
z_r = (0,10e8)
x = np.random.randint(x_r[0]*1000,x_r[1]*1000,size=(N, 1))
y = np.random.randint(y_r[0]*1000,y_r[1]*1000,size=(N, 1))
z = np.random.randint(z_r[0]*1000,z_r[1]*1000,size=(N, 1))
z2 = np.ones((N,1)).astype(int)
df = pd.DataFrame(np.column_stack([x,y,z,z2]), columns=['x','y','z','z2'])
df[['x','y','z']] = df[['x','y','z']].div(1000, axis=0)
df
Now I plot the data, rasterised, and also activate the tooltip to see the defaults. Sure, x/y is trivial, but as I said, I care about the value dimensions. It shows z2 as x_y z2. I have a related question here on SO about value-dimension access for the tooltips with the same sort of data.
from matplotlib.cm import get_cmap
palette = get_cmap('viridis')
# palette_inv = palette.reversed()
p=hv.Points(df,['x','y'], ['z','z2'])
P=rasterize(p, aggregator=ds.sum("z2"),x_range=(0,100)).opts(cmap=palette)
P.opts(tools=["hover"]).opts(height=500, width=500,xlim=(0,100),ylim=(100,2000))
Now I can add a histogram or a marginal distribution, which is pretty close to what I want, but there are issues with this soon past the trivial defaults. (E.g.: P << hv.Distribution(p, kdims=['y']) or P.hist(dimension='y', weight_dimension='x_y z', num_bins=2000, normed=True))
Both are close approaches, but neither gives me the other value dimension I'd like to visualise. If I try to access the other value dimension ('x_y z'), this fails. Also, the 'x_y z2' way seems very clumsy; is there a better way?
And when I do something like the following, my browser/notebook extension blows up, of course:
transformed = p.transform(x=hv.dim('z'))
P << hv.Curve(transformed)
So how do I access all my data in the right way?

Find 2D transformation to merge/stitch 2 datasets (tuples) with overlap in Python with sklearn

Let's say I have 2 datasets / tuples, "Left" and "Right". Some values are present in both datasets due to the overlap. How can I find the best transformation for "Right" to combine it with "Left"?
I have only found geometric transformations for matrices and images, not for datasets.
In this example the data is identical apart from the transformation. What if the data differs slightly with noise? Would the best transformation be the result of a RANSAC fit, i.e. a homography? This should be possible with sklearn (scikit-learn), shouldn't it? Again, most of the results I looked at were based on matrices and images, not datasets.
I have the feeling this problem has occurred many times and a nice solution is definitely out there somewhere, but unfortunately I couldn't find any.
Thank you very much for your help!
import numpy as np
import matplotlib.pyplot as plt
#Load data from txt file
dataleft = np.genfromtxt(r'C:\Data2MergeLeft.txt',delimiter="\t")
dataright = np.genfromtxt(r'C:\Data2MergeRight.txt',delimiter="\t")
#Find overlap
overlapleft = np.where(dataleft[:,0]>=dataright[1,0])[0]
overlapright = np.where(dataright[:,0]<dataleft[-1,0])[0]
#Trim data (overlap only)
olleft = dataleft[overlapleft]
olright = dataright[overlapright]
#Transformed data -> new array (use a copy so the original is not modified in place)
olrightnew = olright.copy()
#Initial values of transformation
offsetx = 10
linyconstant = 5
linyfactor = 1.2
#Loop for optimization
#Transformation
olrightnew[:,0] = olright[:,0]-offsetx
olrightnew[:,1] = (olright[:,1]-linyconstant)/linyfactor
#Residual
residuals = olright[:,1]-olrightnew[:,1]
sqresidual = residuals*residuals
residual = np.sum(sqresidual)
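For reference, here is a minimal sketch of the kind of fit I have in mind, on synthetic data rather than my files: scipy.optimize.least_squares estimating offsetx, linyconstant and linyfactor, assuming a one-to-one correspondence between the overlapping points.
import numpy as np
from scipy.optimize import least_squares
# Synthetic overlap: "left" points and a shifted/scaled "right" version of them.
rng = np.random.default_rng(0)
left = np.column_stack([np.linspace(0, 50, 200), rng.normal(100, 5, 200)])
right = left.copy()
right[:, 0] = left[:, 0] + 10.0         # offsetx
right[:, 1] = left[:, 1] * 1.2 + 5.0    # linyfactor, linyconstant
right[:, 1] += rng.normal(0, 0.1, 200)  # a little noise
def residuals(params, right, left):
    offsetx, linyconstant, linyfactor = params
    x_fit = right[:, 0] - offsetx
    y_fit = (right[:, 1] - linyconstant) / linyfactor
    return np.concatenate([x_fit - left[:, 0], y_fit - left[:, 1]])
fit = least_squares(residuals, x0=[0.0, 0.0, 1.0], args=(right, left))
print(fit.x)  # should come out close to (10, 5, 1.2)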

Median filter produces unexpected result on FITS file

This is based on a couple of other questions that haven't quite been answered, so I've started a new post. I'm working on finding the median of a masked array in 50-pixel patches. The image and the mask are both 901x877 telescope images.
import numpy as np
import matplotlib.pyplot as plt
from astropy.io import fits
# Use the fits files as input image and mask
hdulist = fits.open('xbulge-w1.fits')
w1data = hdulist[0].data
hdulist3 = fits.open('xbulge-mask.fits')
mask = 1 - hdulist3[0].data
w1masked = np.ma.array(w1data, mask = mask)
# Use general arrays as input image and mask
#w1data = np.arange(790177).reshape(901,877)
#w1masked = np.ma.masked_inside(w1data, 30000, 60000)
side = 50
w, h = w1data.shape
width_index = np.array(range(w//side)) * side
height_index = np.array(range(h//side)) * side
def assign_patch(patch, median, side):
    """Break this loop out to prevent 4 nested 'for' loops"""
    for j in range(side):
        for i in range(side):
            patch[i,j] = median
    return patch

for width in width_index:
    for height in height_index:
        patch = w1masked[width:width+side, height:height+side]
        median = np.median(patch)
        assign_patch(patch, median, side)
plt.imshow(w1masked)
plt.show()
The problem is, when I use the general arrays as input image and mask (the commented out section), it works fine, but when I use the FITS files, it produces 'side'-sized patches on the output image. I can't figure out what's going on with this.
I don't know what your FITS files look like, but several things stand out:
np.median doesn't take the mask into account. In fact, recent NumPy releases (correctly) print a warning if this is attempted. You should be using np.ma.median instead. If you update your NumPy you'll likely see this:
UserWarning: Warning: 'partition' will ignore the 'mask' of the MaskedArray.
The assign_patch function is unnecessary when you know that you can use slice assignment:
w1masked[width:width+side, height:height+side] = median
# instead of "assign_patch(patch, median, side)"
That's also much faster than doing a double loop to replace each value.
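Putting both points together, a minimal corrected version of the patch loop could look like this (same variable names as in the question):
for width in width_index:
    for height in height_index:
        patch = w1masked[width:width+side, height:height+side]
        median = np.ma.median(patch)  # mask-aware median
        w1masked[width:width+side, height:height+side] = median  # slice assignment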
I assume the issue is in fact that you use np.median instead of np.ma.median. There are lots of values a masked pixel could have, including nan, 0, inf, ...; if these are taken into account (when they should be ignored), they could produce all kinds of problems, especially if the median starts returning nans or similar.
More generally, if you really want a median filter, you can't just calculate the median of a patch and replace all values in the patch with that median. You should use a median filter that takes the mask into account. Unfortunately I've never seen such a filter implemented in any widespread Python package. But if you have numba you could check out a (very experimental!) package of mine, numbamisc, which contains a median_filter that takes masks into account.
