Efficient 3D interpolation/approximation in SciPy (Python)

Below is a small code example that tries to interpolate EEG cap signals. In the example, the EEG cap has 44 channels/electrodes and 1125 timestamps for each channel. There are 800 samples, each containing 1125 timestamps for the 44 channels/electrodes.
I tried RBF interpolation from scipy, but it is very slow.
Please note that the electrode coordinates only need to be rotated once.
How can I improve the code so that the interpolation is faster? I am open to other interpolation/approximation methods.
import numpy as np
from scipy.interpolate import Rbf
x = np.random.rand(44,1)
y = np.random.rand(44,1)
z = np.random.rand(44,1)
xR = np.random.rand(44,1)
yR = np.random.rand(44,1)
zR = np.random.rand(44,1)
time_series = np.random.rand(800,44,1125)
time_series_rotated = np.zeros((800,44,1125))
total_time_steps = time_series.shape[2]
total_samples = time_series.shape[0]
for s in range(total_samples):
    for t in range(total_time_steps):
        rbfi = Rbf(x, y, z, time_series[s,:,t], function="quintic")
        time_series_rotated[s,:,t] = np.squeeze(rbfi(xR, yR, zR))

griddata accepts multidimensional arrays as values, so you can directly write:
import numpy as np
from scipy.interpolate import griddata
nbr_electrodes = 44
nbr_samples = 800
nbr_timestamps = 125 # reduced from 1125 so the example runs quickly
xyz = np.random.rand(nbr_electrodes, 3)
xyz_rotated = np.random.rand(nbr_electrodes, 3)
time_series = np.random.rand(nbr_electrodes, nbr_timestamps, nbr_samples)
time_series_rotated = griddata(xyz, time_series, xyz_rotated, method='linear')
Note that the points (electrodes) are on the first dimension now. It takes less than 100 ms on my computer versus more than 1s for the loop method.
time_series_rotated.shape gives (44, 125, 800)
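If you would rather keep a radial basis function than switch to linear interpolation, the same "all values at once" idea can be sketched with scipy.interpolate.RBFInterpolator (SciPy 1.7 or later), which accepts values of shape (n_points, ...). This is only a sketch: the sizes are reduced and the coordinates are random placeholders. Its quintic kernel adds a polynomial term by default, so the numbers may not match Rbf(..., function="quintic") exactly.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
# Reduced sizes (assumption) so the sketch runs quickly
n_samples, n_electrodes, n_timestamps = 8, 44, 25

xyz = rng.random((n_electrodes, 3))          # original electrode positions
xyz_rotated = rng.random((n_electrodes, 3))  # rotated electrode positions
time_series = rng.random((n_samples, n_electrodes, n_timestamps))

# Put the electrode axis first: (electrodes, samples, timestamps)
values = np.moveaxis(time_series, 1, 0)

# Fit once for every (sample, timestamp) column instead of looping
rbf = RBFInterpolator(xyz, values.reshape(n_electrodes, -1), kernel="quintic")
rotated = rbf(xyz_rotated).reshape(n_electrodes, n_samples, n_timestamps)

# Restore the original (samples, electrodes, timestamps) layout
time_series_rotated = np.moveaxis(rotated, 0, 1)
print(time_series_rotated.shape)
```

The point is that the electrode geometry is the same for every sample and timestamp, so the fit only has to be set up once and is then solved for all value columns together.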

Related

Dimensions of C (1801, 3600) are incompatible with X (3600) and/or Y (1801); see help(pcolormesh)

I'm trying to plot surface wind data from "https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-land?tab=form".
I'm using the code below, from "https://confluence.ecmwf.int/display/CKB/How+to+plot+GRIB+files+with+Python+and+matplotlib", to plot the data.
However, I get the error 'TypeError: Dimensions of C (1801, 3600) are incompatible with X (3600) and/or Y (1801); see help(pcolormesh)'. I expected this to work, since C is (rows, columns), X represents columns and Y represents rows, but the error suggests the data does not fit.
The code is below. Any suggestions would be massively appreciated, thanks!
import pygrib
import matplotlib.pyplot as plt
import matplotlib.colors as colors
from mpl_toolkits.basemap import Basemap
from mpl_toolkits.basemap import shiftgrid
import numpy as np
plt.figure(figsize=(12,8))
grib = 'adaptor.mars.internal-1669570066.0148444-9941-17-702abb2d-e37e-4ef7-a19e-a04fb24e5a20.grib' # Set the file name of your input GRIB file
grbs = pygrib.open(grib)
grb = grbs.select()[0]
data = grb.values
# need to shift data grid longitudes from (0..360) to (-180..180)
lons = np.linspace(float(grb['longitudeOfFirstGridPointInDegrees']),
                   float(grb['longitudeOfLastGridPointInDegrees']), int(grb['Ni']))
lats = np.linspace(float(grb['latitudeOfFirstGridPointInDegrees']),
                   float(grb['latitudeOfLastGridPointInDegrees']), int(grb['Nj']))
data, lons = shiftgrid(180., data, lons, start=False)
grid_lon, grid_lat = np.meshgrid(lons, lats) #regularly spaced 2D grid
m = Basemap(projection='cyl', llcrnrlon=-180,
            urcrnrlon=180., llcrnrlat=lats.min(), urcrnrlat=lats.max(),
            resolution='c')
m.drawcoastlines()
m.drawmapboundary()
m.drawparallels(np.arange(-90.,120.,30.),labels=[1,0,0,0])
m.drawmeridians(np.arange(-180.,180.,60.),labels=[0,0,0,1])
x, y = m(grid_lon, grid_lat)
#x = np.transpose(x)
#y = np.transpose(y)
cs = m.pcolormesh(x,y,data,shading='flat',cmap=plt.cm.gist_stern_r)
plt.colorbar(cs,orientation='vertical', shrink=0.5)
plt.title('CAMS AOD forecast') # Set the name of the variable to plot
plt.savefig(grib+'.png') # Set the output file name
However, even if I transpose the x and y data, I get the same type of error, but with X (1801) and/or Y (3600). Ideally, the result should be a map showing the North-South and East-West surface wind speeds.
A similar question is "Numpy pcolormesh: TypeError: Dimensions of C are incompatible with X and/or Y". However, the solution suggested there, transposing the data, did not work here.
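One likely cause, assuming a recent Matplotlib version: with shading='flat', pcolormesh treats X and Y as cell corners, so they need one more row and one more column than C. In the code above x, y and data all have the same shape (1801, 3600), which would trigger this error no matter how they are transposed. The sketch below uses plain Matplotlib and a small synthetic grid (both are assumptions, there is no Basemap or GRIB file involved) to show two ways around it. Basemap's m.pcolormesh passes its keyword arguments on to Matplotlib, so the same choice should apply there.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Small synthetic stand-in for the GRIB field: 19 latitudes x 36 longitudes
lons = np.linspace(-180.0, 170.0, 36)
lats = np.linspace(-90.0, 90.0, 19)
grid_lon, grid_lat = np.meshgrid(lons, lats)
data = np.cos(np.deg2rad(grid_lat)) * np.sin(np.deg2rad(grid_lon))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Option 1: treat the coordinates as cell centres (same shape as the data)
mesh_nearest = ax1.pcolormesh(grid_lon, grid_lat, data, shading="nearest")

# Option 2: keep shading='flat' and drop the last row and column of the data
mesh_flat = ax2.pcolormesh(grid_lon, grid_lat, data[:-1, :-1], shading="flat")

fig.canvas.draw()  # force rendering to confirm both calls are valid
```

Option 1 keeps every data value; option 2 discards the last row and column, which is usually invisible on a grid this fine.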

Assign 3D data value based on a 2D index profile

I have a 3D numpy array:
data0 = np.random.rand(30, 50, 50)
I have a 2D surface:
surf = np.random.rand(50, 50) * 30
surf = surf.astype(int)
Now I want to assign 0 to data0 along the surface profile. I know a for loop can achieve this:
for xx in range(50):
    for yy in range(50):
        data0[0:surf[xx, yy], xx, yy] = 0
data0 is a 3D volume of size 30 * 50 * 50, and surf is a 2D surface profile of size 50 * 50. What I am trying to do is fill the volume with 0 from the top down to the surface (along axis=0).
The for loop is very slow and inefficient when data0 is very large. Could someone advise how to assign the values efficiently based on the surf profile?
If you want to use numpy, you can create a mask with z-index values below your surf values set to True, then fill those cells with zeros:
import numpy as np
np.random.seed(123)
x, y, z = 4, 5, 3
data0 = np.random.rand(z, x, y)
surf = np.random.rand(x, y) * z
surf = surf.astype(int)
#your attempt
#we copy the data just for the comparison
data_loop = data0.copy()
for xx in range(x):
    for yy in range(y):
        data_loop[0:surf[xx, yy], xx, yy] = 0
#again, we copy the data just for the comparison
data_np = data0.copy()
#masking the cells according to your index comparison criteria
mask = np.broadcast_to(np.arange(data0.shape[0])[:,None, None], data0.shape) < surf[None, :]
#set masked values to zero
data_np[mask] = 0
#check for equivalence of resulting arrays
print((data_np==data_loop).all())
I am sure there is a better, numpier way to generate the index number mask. As it is, this version is not necessarily faster. This depends on the shape of your array.
For x=500, y=200, and z=3000, your loop takes 1.42 s and my numpy approach 1.94 s.
For the same array size but with shape x=5000, y=2000, and z=30, your loop approach takes 7.06 s and the numpy approach 1.95 s.
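As for a "numpier" mask: the explicit np.broadcast_to is not needed, because the comparison broadcasts on its own. A minimal sketch, with variable names following the code above; np.where is used here to return a new array rather than modify one in place. Whether this beats the loop will still depend on the array shape, as in the timings above.

```python
import numpy as np

rng = np.random.default_rng(123)
z, x, y = 3, 4, 5
data0 = rng.random((z, x, y))
surf = (rng.random((x, y)) * z).astype(int)

# Reference result from the double loop
data_loop = data0.copy()
for xx in range(x):
    for yy in range(y):
        data_loop[0:surf[xx, yy], xx, yy] = 0

# Depth indices of shape (z, 1, 1) broadcast against the (x, y) surface,
# giving a (z, x, y) Boolean mask without materialising an index array
mask = np.arange(z)[:, None, None] < surf
data_np = np.where(mask, 0.0, data0)

print(np.array_equal(data_np, data_loop))
```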

boxplot structure disappears when pandas contains nan [duplicate]

I am using matplotlib to draw a box plot, but the data contain some missing values (NaN). No box is displayed for the columns that contain NaN values.
Do you know how to solve this problem?
Here is the code.
import numpy as np
import matplotlib.pyplot as plt
#==============================================================================
# open data
#==============================================================================
filename='C:\\Users\\liren\\OneDrive\\Data\\DATA in the first field-final\\ks.csv'
AllData=np.genfromtxt(filename,delimiter=";",skip_header=0,dtype='str')
TreatmentCode = AllData[1:,0]
RepCode = AllData[1:,1]
KsData= AllData[1:,2:].astype('float')
DepthHeader = AllData[0,2:].astype('float')
TreatmentUnique = np.unique(TreatmentCode)[[3,1,4,2,8,6,9,7,0,5,10],]
nT = TreatmentUnique.size  # nT = number of treatments
# nD = number of depths; nR = number of replications; nT = number of treatments; iT = index of the treatment
nD = 5
nR = 6
KsData_3D = np.zeros((nT,nD,nR))
for iT in range(nT):
    Treatment = TreatmentUnique[iT]
    TreatmentFilter = TreatmentCode == Treatment
    KsData_Filtered = KsData[TreatmentFilter,:]
    KsData_3D[iT,:,:] = KsData_Filtered.transpose()
iD = 4
fig=plt.figure()
ax = fig.add_subplot(111)
plt.boxplot(KsData_3D[:,iD,:].transpose())
ax.set_xticks(range(1,nT+1))
ax.set_xticklabels(TreatmentUnique)
ax.set_title(DepthHeader[iD])
Here is the final figure; the boxes for some of the treatments are missing.
You can remove the NaNs from the data first, then plot the filtered data.
To do that, first find the NaNs with np.isnan(data), then invert that Boolean array with the bitwise inversion operator ~. Indexing the data array with the result filters out the NaNs.
filtered_data = data[~np.isnan(data)]
In a complete example (adapted from here)
Tested in python 3.10, matplotlib 3.5.1, seaborn 0.11.2, numpy 1.21.5, pandas 1.4.2
For 1D data:
import matplotlib.pyplot as plt
import numpy as np
# fake up some data
np.random.seed(2022) # so the same data is created each time
spread = np.random.rand(50) * 100
center = np.ones(25) * 50
flier_high = np.random.rand(10) * 100 + 100
flier_low = np.random.rand(10) * -100
data = np.concatenate((spread, center, flier_high, flier_low), 0)
# Add a NaN
data[40] = np.nan  # np.NaN was removed in NumPy 2.0; np.nan works in all versions
# Filter data using np.isnan
filtered_data = data[~np.isnan(data)]
# basic plot
plt.boxplot(filtered_data)
plt.show()
For 2D data:
For 2D data, you can't simply use the mask above, since then each column of the data array would have a different length. Instead, we can create a list, with each item in the list being the filtered data for each column of the data array.
A list comprehension can do this in one line: [d[m] for d, m in zip(data.T, mask.T)]
import matplotlib.pyplot as plt
import numpy as np
# fake up some data
np.random.seed(2022) # so the same data is created each time
spread = np.random.rand(50) * 100
center = np.ones(25) * 50
flier_high = np.random.rand(10) * 100 + 100
flier_low = np.random.rand(10) * -100
data = np.concatenate((spread, center, flier_high, flier_low), 0)
data = np.column_stack((data, data * 2., data + 20.))
# Add a NaN
data[30, 0] = np.nan  # np.NaN was removed in NumPy 2.0; np.nan works in all versions
data[20, 1] = np.nan
# Filter data using np.isnan
mask = ~np.isnan(data)
filtered_data = [d[m] for d, m in zip(data.T, mask.T)]
# basic plot
plt.boxplot(filtered_data)
plt.show()
I'll leave it as an exercise to the reader to extend this to 3 or more dimensions, but you get the idea.
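For the 3D layout in the question, one way to do it is to slice out the 2D layer you want to plot and filter it row by row. The sketch below uses random data; the array name ks_3d, the sizes and the NaN positions are assumptions, chosen to mirror the question's (treatments, depths, replications) layout.

```python
import numpy as np

rng = np.random.default_rng(2022)
nT, nD, nR = 4, 5, 6  # treatments, depths, replications (assumed sizes)
ks_3d = rng.random((nT, nD, nR))

# Scatter a few NaNs into the depth layer that will be plotted
ks_3d[1, 4, 2] = np.nan
ks_3d[2, 4, 0] = np.nan
ks_3d[2, 4, 5] = np.nan

iD = 4
layer = ks_3d[:, iD, :]  # shape (nT, nR): one row per treatment

# One 1D array per treatment with the NaNs dropped
filtered = [row[~np.isnan(row)] for row in layer]

print([len(f) for f in filtered])
```

The resulting list can then be passed to plt.boxplot(filtered) in place of KsData_3D[:,iD,:].transpose().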
Use seaborn, which is a high-level API for matplotlib
seaborn.boxplot filters NaN under the hood
import seaborn as sns
sns.boxplot(data=data)
NaN is also ignored if plotting from df.plot(kind='box') for pandas, which uses matplotlib as the default plotting backend.
import pandas as pd
df = pd.DataFrame(data)
df.plot(kind='box')

Is there a way to resize 3D data without the data becoming messy?

I use this code to resize 3D NIfTI data, but when I check the result it looks messy and the axes have changed.
import numpy as np
import nibabel as nib
import itertools
initial_size_x = 560
initial_size_y = 560
initial_size_z = 240
new_size_x = 512
new_size_y = 512
new_size_z = 216
initial_data = nib.load("id001-512x512x216.nii.gz-pred.nii").get_data()
print('helooooooooooooooooooo')
delta_x = initial_size_x/new_size_x
delta_y = initial_size_y/new_size_y
delta_z = initial_size_z/new_size_z
new_data = np.zeros((new_size_x,new_size_y,new_size_z))
for x, y, z in itertools.product(range(new_size_x),
                                 range(new_size_y),
                                 range(new_size_z)):
    new_data[x][y][z] = initial_data[int(x*delta_x)][int(y*delta_y)][int(z*delta_z)]
img = nib.Nifti1Image(new_data, np.eye(4))
img.to_filename("test_"+str(new_size_x)+""+str(new_size_y)+""+str(new_size_z)+".nii")
In this question, I believe you want to slightly change the resolution of 3D data. The solution I am presenting only works for enlarging or shrinking the data an integer number of times.
For enlarging the data, you can use np.repeat and for shrinking it you can use slicing. For example, here we can write:
import numpy as np
import nibabel as nib
import itertools
initial_size_x = 560
initial_size_y = 560
initial_size_z = 240
new_size_x = 1120
new_size_y = 1120
new_size_z = 720
initial_data = nib.load("id001-512x512x216.nii.gz-pred.nii").get_fdata()  # get_data() was removed in nibabel 5
# np.repeat needs integer repeat counts, so use integer division
rep_x = new_size_x // initial_size_x # 2
rep_y = new_size_y // initial_size_y # 2
rep_z = new_size_z // initial_size_z # 3
new_data = np.repeat(initial_data, rep_x, axis=0)
new_data = np.repeat(new_data, rep_y, axis=1)
new_data = np.repeat(new_data, rep_z, axis=2)
Possible improvements
I am sure this answer can be improved. However I am not sure what you have in mind from a floating point repetition.
For instance, should my_repeat(data, 0.9, axis=axis) skip every 10th element?
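For non-integer factors such as 560 to 512, one option not covered above is scipy.ndimage.zoom, which resamples by an arbitrary factor per axis. The sketch below uses a small synthetic volume with the same ratios as in the question (the sizes and the fake label data are assumptions). It also assumes the file is a label/prediction map, hence order=0 (nearest neighbour), which does not invent new values; for continuous intensities order=1 may be a better fit.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)

# Small stand-in with the same ratios as 560x560x240 -> 512x512x216
initial_shape = (70, 70, 30)
new_shape = (64, 64, 27)

# Fake label volume with values 0, 1, 2
initial_data = rng.integers(0, 3, size=initial_shape).astype(np.uint8)

# Per-axis zoom factor = target size / source size
zoom_factors = [n / o for n, o in zip(new_shape, initial_shape)]

# order=0 is nearest-neighbour resampling, so label values stay intact
new_data = ndimage.zoom(initial_data, zoom_factors, order=0)
print(new_data.shape)
```

Separately, saving the result with np.eye(4) as the affine, as in the question, discards the orientation and voxel size stored in the original file, which may be part of why the axes appear changed.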

Regridding regular netcdf data

I have a netcdf file containing global sea-surface temperatures. Using matplotlib and Basemap, I've managed to make a map of this data, with the following code:
from netCDF4 import Dataset
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
filename = '/Users/Nick/Desktop/SST/SST.nc'
fh = Dataset(filename, mode='r')
lons = fh.variables['LON'][:]
lats = fh.variables['LAT'][:]
sst = fh.variables['SST'][:].squeeze()
fig = plt.figure()
m = Basemap(projection='merc', llcrnrlon=80.,llcrnrlat=-25.,urcrnrlon=150.,urcrnrlat=25.,lon_0=115., lat_0=0., resolution='l')
lon, lat = np.meshgrid(lons, lats)
xi, yi = m(lon, lat)
cs = m.pcolormesh(xi,yi,sst, vmin=18, vmax=32)
m.drawmapboundary(fill_color='0.3')
m.fillcontinents(color='0.3', lake_color='0.3')
cbar = m.colorbar(cs, location='bottom', pad="10%", ticks=[18., 20., 22., 24., 26., 28., 30., 32.])
cbar.set_label('January SST (' + u'\u00b0' + 'C)')
plt.savefig('SST.png', dpi=300)
The problem is that the data is very high resolution (9km grid) which makes the resulting image quite noisy. I would like to put the data onto a lower resolution grid (e.g. 1 degree), but I'm struggling to work out how this could be done. I followed a worked solution to try and use the matplotlib griddata function by inserting the code below into my above example, but it resulted in 'ValueError: condition must be a 1-d array'.
xi, yi = np.meshgrid(lons, lats)
X = np.arange(min(x), max(x), 1)
Y = np.arange(min(y), max(y), 1)
Xi, Yi = np.meshgrid(X, Y)
Z = griddata(xi, yi, z, Xi, Yi)
I'm a relative beginner to Python and matplotlib, so I'm not sure what I'm doing wrong (or what a better approach might be). Any advice appreciated!
If you regrid your data to a coarser lat/lon grid using e.g. bilinear interpolation, this will result in a smoother field.
The NCAR ClimateData guide has a nice introduction to regridding (general, not Python-specific).
The most powerful implementation of regridding routines available for Python is, to my knowledge, the Earth System Modeling Framework (ESMF) Python interface (ESMPy). If this is a bit too involved for your application, you should look into one of the following:
- EarthPy tutorials on regridding (e.g. using Pyresample, cKDTree, or Basemap).
- Turning your data into an Iris cube and using Iris' regridding functions.
Perhaps start by looking at the EarthPy regridding tutorial using Basemap, since you are using it already.
The way to do this in your example would be
import numpy as np
from mpl_toolkits import basemap
from netCDF4 import Dataset
filename = '/Users/Nick/Desktop/SST/SST.nc'
with Dataset(filename, mode='r') as fh:
    lons = fh.variables['LON'][:]
    lats = fh.variables['LAT'][:]
    sst = fh.variables['SST'][:].squeeze()
# target grid: every fourth point of the original grid
lons_sub, lats_sub = np.meshgrid(lons[::4], lats[::4])
# order=1 selects bilinear interpolation
sst_coarse = basemap.interp(sst, lons, lats, lons_sub, lats_sub, order=1)
This performs bilinear interpolation (order=1) on your SST data onto a sub-sampled grid (every fourth point). Your plot will look more coarse-grained afterwards. If you do not like that, interpolate back onto the original grid with e.g.
sst_smooth = basemap.interp(sst_coarse, lons_sub[0,:], lats_sub[:,0], *np.meshgrid(lons, lats), order=1)
I usually run my data through a Laplace filter for smoothing. Perhaps you could try the function below and see if it helps with your data. The function can be called with or without a mask (e.g. a land/ocean mask for ocean data points). Hope this helps.
# Laplace filter for 2D field with/without mask
# M = 1 on - cells used
# M = 0 off - grid cells not used
# Default is without masking
import numpy as np

def laplace_X(F, M):
    jmax, imax = F.shape
    # Add strips of land
    F2 = np.zeros((jmax, imax+2), dtype=F.dtype)
    F2[:, 1:-1] = F
    M2 = np.zeros((jmax, imax+2), dtype=M.dtype)
    M2[:, 1:-1] = M
    MS = M2[:, 2:] + M2[:, :-2]
    FS = F2[:, 2:]*M2[:, 2:] + F2[:, :-2]*M2[:, :-2]
    return np.where(M > 0.5, (1-0.25*MS)*F + 0.25*FS, F)

def laplace_Y(F, M):
    jmax, imax = F.shape
    # Add strips of land
    F2 = np.zeros((jmax+2, imax), dtype=F.dtype)
    F2[1:-1, :] = F
    M2 = np.zeros((jmax+2, imax), dtype=M.dtype)
    M2[1:-1, :] = M
    MS = M2[2:, :] + M2[:-2, :]
    FS = F2[2:, :]*M2[2:, :] + F2[:-2, :]*M2[:-2, :]
    return np.where(M > 0.5, (1-0.25*MS)*F + 0.25*FS, F)

# The mask may cause laplace_X and laplace_Y to not commute
# Take average of both directions
def laplace_filter(F, M=None):
    # "is None" avoids an elementwise comparison when M is an array
    if M is None:
        M = np.ones_like(F)
    return 0.5*(laplace_X(laplace_Y(F, M), M) +
                laplace_Y(laplace_X(F, M), M))
To answer your original question regarding scipy.interpolate.griddata, too:
Have a close look at the parameter specs for that function (e.g. in the SciPy documentation) and make sure that your input arrays have the right shapes. You might need to do something like
import numpy as np
points = np.vstack([a.flat for a in np.meshgrid(lons,lats)]).T # (n,D)
values = sst.ravel() # (n)
etc.
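Putting those shapes together, a complete call could look like the sketch below. It uses a small synthetic field instead of the real SST file; the grid spacing, extent and field values are all assumptions. The target grid is kept inside the source domain, because linear griddata returns NaN outside the convex hull of the input points.

```python
import numpy as np
from scipy.interpolate import griddata

# Synthetic stand-in for the fine SST grid (assumed 0.25-degree spacing)
lons = np.arange(100.0, 110.25, 0.25)
lats = np.arange(-5.0, 5.25, 0.25)
lon2d, lat2d = np.meshgrid(lons, lats)
sst = 25.0 + 0.1 * lon2d - 0.2 * lat2d

# griddata expects scattered points of shape (n, D) and values of shape (n,)
points = np.column_stack([lon2d.ravel(), lat2d.ravel()])
values = sst.ravel()

# Coarser 1-degree target grid, strictly inside the source domain
lon_c, lat_c = np.meshgrid(np.arange(100.5, 110.0, 1.0),
                           np.arange(-4.5, 5.0, 1.0))
sst_coarse = griddata(points, values, (lon_c, lat_c), method="linear")
print(sst_coarse.shape)
```

Because griddata treats the input as unstructured points, it can be slow on a full-resolution global grid; the grid-aware routines mentioned above are likely to scale better.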
If you are working on Linux, you can achieve this using nctoolkit (https://nctoolkit.readthedocs.io/en/latest/).
You have not stated the latlon extent of your data, so I will assume it is a global dataset. Regridding to 1 degree resolution would require the following:
import nctoolkit as nc
filename = '/Users/Nick/Desktop/SST/SST.nc'
data = nc.open_data(filename)
data.to_latlon(lon = [-179.5, 179.5], lat = [-89.5, 89.5], res = [1,1])
# visualize the data
data.plot()
Look at this example with xarray...
use the ds.interp method and specify the new latitude and longitude values.
http://xarray.pydata.org/en/stable/interpolation.html#example
