Coordinate offsets in xarray and dask

I'm making use of xarray since its coordinates and automatic alignment are really useful, and I've been using Dask since the datasets I'm dealing with are generally on the order of terabytes.
I have a 3D source array that is generated (or loaded) and depends on wavelength (wl) and on x and y position, with its origin at zero.
I also have a 2D output array, dependent only on x and y, which accumulates all of the wavelengths from the source array. Ideally the output would be:
output = source.sum('wl')
However, the wavelength dependence means that each wavelength offsets the source origin by a certain amount. The best (and ugliest) solution I could come up with is to loop through each wavelength, reassign the coordinates, interpolate up to the output coordinates, stack the results into a new array, and then sum.
Here is example code that shows what I'm trying to do:
from dask.distributed import Client
import xarray as xr
import dask.array as da
import numpy as np

client = Client(n_workers=2, threads_per_worker=2, memory_limit='2GB')
client

# Generate some offset data here
wavelengths = np.linspace(0.1, 10, 1000)
x_offsets = np.linspace(100, 400, 1000)
y_offsets = np.linspace(100, 400, 1000)

# Coordinate offsets for each wavelength
offset = xr.Dataset(
    {
        'x': (['wl'], x_offsets),
        'y': (['wl'], y_offsets)
    },
    coords={
        'wl': wavelengths
    })

# Our example source function
source_shape = (1000, 10000, 10000,)
wl_source = np.linspace(0.4, 5, source_shape[0])
x_source = np.linspace(-6, 6, source_shape[1])
y_source = np.linspace(-6, 6, source_shape[2])
source = xr.DataArray(da.random.random(size=source_shape, chunks=(10, 400, 400)),
                      coords=[wl_source, x_source, y_source],
                      dims=['wl', 'x', 'y'])

# Our final output array
out_shape = (10000, 10000,)
x_out = np.linspace(-1000, 1000, out_shape[0])
y_out = np.linspace(-1000, 1000, out_shape[1])
out = xr.DataArray(da.random.random(size=out_shape, chunks=(4000, 4000)),
                   coords=[x_out, y_out], dims=['x', 'y'])

accum = []
for wl in source.wl:
    # Build our map from source -> output space
    x_map = offset.interp(wl=wl).x + source.x
    y_map = offset.interp(wl=wl).y + source.y
    # Remap coordinates
    source_mapped = source.sel(wl=wl).assign_coords({'x': x_map,
                                                     'y': y_map})
    # Interp up to the output coordinates
    # (interp_like unchunks the result, so we rechunk here)
    accum.append(
        source_mapped.interp_like(out, method='nearest', kwargs={'fill_value': 0})
                     .chunk({'x': 4000, 'y': 4000})
    )

# Accumulate and add to the output
out += xr.concat(accum, dim='wl').sum('wl')
out
This solution ends up with over a million tasks. Because of that, building the task graph takes a long time, and during computation garbage collection takes a long time, memory is exhausted, or I spill so much to disk that I run out of storage. Manually slicing has the same issue.
Additionally, this can't scale if I have more than one source. I've been racking my brain trying to figure out a better solution.
I'm wondering if there's a more efficient way of doing this, either through dask, xarray, or some other library? I'm fairly new to dask and xarray, so I'm still trying to get to grips with how they work and how to better chunk and distribute tasks.
Sorry for the long-winded question!
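One direction worth sketching (under the assumption that xarray's vectorized "advanced" interpolation semantics apply here, i.e. indexer dimensions that match an existing dimension are matched pointwise): fold the per-wavelength offset into the interpolation coordinates, so all wavelengths are handled in a single interp call instead of a Python loop. The names x_src and y_src are hypothetical; source, offset, out, x_out, y_out and wl_source come from the code above.
import numpy as np
import xarray as xr

# For each output x, sample the source at (x - x_offset(wl)); likewise for y.
# The indexers share the 'wl' dimension with `source`, which xarray should
# match pointwise, so one interp call covers every wavelength.
x_src = xr.DataArray(
    x_out[np.newaxis, :] - offset.x.interp(wl=wl_source).values[:, np.newaxis],
    dims=['wl', 'x'], coords={'wl': wl_source, 'x': x_out})
y_src = xr.DataArray(
    y_out[np.newaxis, :] - offset.y.interp(wl=wl_source).values[:, np.newaxis],
    dims=['wl', 'y'], coords={'wl': wl_source, 'y': y_out})

remapped = source.interp(x=x_src, y=y_src, method='nearest',
                         kwargs={'fill_value': 0})
out = out + remapped.sum('wl')
Whether this actually shrinks the task graph at terabyte scale is untested, but it at least removes the 1000-iteration Python loop from graph construction.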

Related

dask-image ndfilter.maximum_filter memory error issue when using a circle footprint

When I try to run a neighborhood analysis with dask_image.ndfilters.maximum_filter() on a quite big GeoTIFF file (~7 GB / 30000x50000 px), but also on small ones (800x800 px), using a circle pattern for the footprint argument, I get a MemoryError. I get the memory error when computing (.compute()) the dask array back to a numpy array.
This occurs when using a large radius for the circle pattern (> 100). When I use smaller circle patterns, or size instead of footprint, the system does not run out of memory, so I think handling the circle pattern floods the RAM.
I have tried chunking both the input array as well as the circle array (I converted the circle array to a dask array), but I'm still getting MemoryErrors.
import numpy as np
import dask.array as da
from dask.distributed import Client, LocalCluster
import dask_image.imread
import dask_image.ndfilters

cluster = LocalCluster(n_workers=1,
                       threads_per_worker=1,
                       memory_target_fraction=0.98,
                       memory_limit='32GB')
client = Client(cluster)
client

# Import raster file as a dask array
raster_path = "data_file.tif"
raster_data = dask_image.imread.imread(raster_path)
raster_data = np.squeeze(raster_data, axis=0)

radius = 300

# Create a circle filter for the neighborhood analysis
circle_filter = np.zeros((2*radius+1, 2*radius+1), dtype=int)
y, x = np.ogrid[-radius:radius+1, -radius:radius+1]
mask = x**2 + y**2 <= radius**2
circle_filter[mask] = 1
circle_filter = da.from_array(circle_filter)

# Calculate max value in the neighborhood
max_value = dask_image.ndfilters.maximum_filter(raster_data, footprint=circle_filter, mode='nearest')

# Convert dask array to numpy array
output = max_value.compute()
My overall goal is to export the array back to a GeoTIFF.
Any suggestions on how to handle neighborhood analysis with large circular footprints?
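A sketch of one possible approach, under assumptions and not a verified fix: keep the footprint as a plain numpy array (scipy.ndimage operates on in-memory blocks, so wrapping the footprint in a dask array gains nothing), and apply scipy's maximum_filter per chunk with dask's map_overlap, sharing a halo of radius pixels between neighbouring chunks. The chunk size of 2048 is an arbitrary choice; chunks must be larger than the halo depth.
import numpy as np
import scipy.ndimage as ndi

radius = 300
footprint = np.zeros((2*radius + 1, 2*radius + 1), dtype=bool)
yy, xx = np.ogrid[-radius:radius+1, -radius:radius+1]
footprint[xx**2 + yy**2 <= radius**2] = True

# `raster_data` is the 2D dask array from the question above
raster_data = raster_data.rechunk((2048, 2048))
max_value = raster_data.map_overlap(
    ndi.maximum_filter,
    depth=radius,        # the halo must cover the filter radius
    boundary='nearest',  # roughly matches scipy's mode='nearest' at the edges
    footprint=footprint,
    mode='nearest',
)
output = max_value.compute()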

xarray: best way to "insert" a time slice into a dataset or dataarray

I have a 3-dimensional xarray dataset with the dimensions x, y, and time. Assuming I know that there's a missing observation at timestep n, what would be the best way to insert a timeslice with no-data values?
Here's a working example:
import xarray as xr
import pandas as pd
x = xr.tutorial.load_dataset("air_temperature")
# assuming this is the missing point in time (currently not in the dataset)
missing = "2014-12-31T07:00:00"
# create an "empty" time slice with fillvalues
empty = xr.full_like(x.isel(time=0), -3000)
# fix the time coordinate of the timeslice
empty['time'] = pd.date_range(missing, periods=1)[0]
# before insertion
print(x.time[-5:].values)
# ['2014-12-30T18:00:00.000000000' '2014-12-31T00:00:00.000000000'
#  '2014-12-31T06:00:00.000000000' '2014-12-31T12:00:00.000000000'
#  '2014-12-31T18:00:00.000000000']
# concat and sort time
x2 = xr.concat([x, empty], "time").sortby("time")
# after insertion
print(x2.time[-5:].values)
# ['2014-12-31T00:00:00.000000000' '2014-12-31T06:00:00.000000000'
# '2014-12-31T07:00:00.000000000' '2014-12-31T12:00:00.000000000'
# '2014-12-31T18:00:00.000000000']
The example works fine, but I'm not sure if that's the best (or even the correct) approach.
My concern is using this with bigger datasets, specifically with dask-array-backed datasets.
Is there a better way to fill a missing 2d array?
Would it be better to use a dask-backed "fill array" when inserting into a dask-backed dataset?
You might consider using xarray's reindex method with a constant fill_value for this purpose:
import numpy as np
import xarray as xr
x = xr.tutorial.load_dataset("air_temperature")
missing_time = np.datetime64("2014-12-31T07:00:00")
missing_time_da = xr.DataArray([missing_time], dims=["time"], coords=[[missing_time]])
full_time = xr.concat([x.time, missing_time_da], dim="time")
full = x.reindex(time=full_time, fill_value=-3000.0).sortby("time")
I think both your method and the reindex method will automatically use dask-backed arrays if x is dask-backed.
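As a quick check (a sketch, reusing the tutorial dataset and reindexed time axis from above; the 500-step chunking is an arbitrary choice): chunk the dataset first and confirm the reindexed result stays lazy, so the fill slice is only materialized at compute time.
x_dask = x.chunk({"time": 500})
full_dask = x_dask.reindex(time=full_time, fill_value=-3000.0).sortby("time")
print(type(full_dask.air.data))  # a dask array type, i.e. still lazy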

Xarray: out-of-memory error (when calling the stack method of a Dask-backed Xarray the task is being killed by the system)

I was trying to solve a question from an SO post (Writing xarray multiindex data in chunks) and I ended up with the code I provide below.
The problem is that in this code the call to the stack method of the DataArray results in an out-of-memory error.
My question is:
I'm wondering why the stack method of the concatenated object (which is an xarray.DataArray) can't complete successfully, even though this DataArray is backed by Dask (so it should not run out of memory)? Why does the test machine run out of memory and the task get killed by the system?
A little summary of what happens when running this code:
The code runs smoothly until the line stacked = concatenated.stack(sample=('y','x','time')).
At this moment, the memory usage keeps increasing until it reaches almost 100% and the task is killed by the system.
The code was executed on a machine with 8 GB of RAM.
I thought it would not run out of memory because concatenated is a Dask-backed xarray.DataArray. But it does.
I made various changes to this code, like using delayed operations, changing chunk sizes, using Dask methods instead of xarray's, etc., but without success.
I can think of two possibilities for what is happening:
The stack operation is NOT Dask-backed
The stack operation is Dask-backed, but even Dask requires a minimum amount of memory (for each chunk) and this amount can't fit in memory
Does someone know what is happening here and how to solve this problem?
END NOTES:
You can vary nrows and ncols in order to change the size of concatenated.
For example setting nrows = 10000 instead of nrows = 20000 will reduce its size by half.
To be sure that the DataArray is Dask-backed, I also tried saving concatenated to netCDF and loading it back with the chunks parameter. I tried different values for chunks, but again without success:
concatenated.to_netcdf("concatenated.nc")
concatenated = xr.open_dataarray("concatenated.nc", chunks=10)
Using smaller values for the chunks parameter only results in taking more time to run out of memory.
This is the code:
import numpy as np
import dask.array as da
import xarray as xr
from numpy.random import RandomState

nrows = 20000
ncols = 20000
row_chunks = 500
col_chunks = 500

# Create a reproducible random numpy array
prng = RandomState(1234567890)
numpy_array = prng.rand(1, nrows, ncols)
data = da.from_array(numpy_array, chunks=(1, row_chunks, col_chunks))

def create_band(data, x, y, band_name):
    return xr.DataArray(data,
                        dims=('band', 'y', 'x'),
                        coords={'band': [band_name],
                                'y': y,
                                'x': x})

def create_coords(data, left, top, celly, cellx):
    nrows = data.shape[-2]
    ncols = data.shape[-1]
    right = left + cellx*ncols
    bottom = top - celly*nrows
    x = np.linspace(left, right, ncols) + cellx/2.0
    y = np.linspace(top, bottom, nrows) - celly/2.0
    return x, y

x, y = create_coords(data, 1000, 2000, 30, 30)

bands = ['blue', 'green', 'red', 'nir']
times = ['t1', 't2', 't3']

bands_list = [create_band(data, x, y, band) for band in bands]

src = []
for time in times:
    src_t = xr.concat(bands_list, dim='band')\
        .expand_dims(dim='time')\
        .assign_coords({'time': [time]})
    src.append(src_t)

concatenated = xr.concat(src, dim='time')
print(concatenated)
# computed = concatenated.compute()  # "computed" is ~35.8 GB

# All is fine until here
stacked = concatenated.stack(sample=('y', 'x', 'time'))

# After stack we'd like to transpose
transposed = stacked.T
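A likely explanation and a possible workaround (a sketch, assuming a reasonably recent xarray, roughly 2022.03 or later): stack() eagerly builds a pandas MultiIndex with one entry per y * x * time combination, about 1.2e9 here, so the index alone needs tens of GB even though the reshaped data itself stays lazy. Passing create_index=False skips building that index:
stacked = concatenated.stack(sample=('y', 'x', 'time'), create_index=False)
transposed = stacked.T  # still lazy; compute or persist piecewise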

interpolation periodic boundaries with xarray

I would like to interpolate many xarray datasets containing global climate data onto one common grid. xarray actually has an interp() method which works fine, but as far as I can tell it does not take any periodic boundaries into account, although this is necessary when interpolating on a sphere. Instead, data points which are outside of the old grid are extrapolated or filled with NaNs. The interpolation is based on the scipy package, and I think other interpolation methods from scipy also do not support periodic boundaries.
I am considering using xesmf, but was wondering if there is an easier solution for this just using xarray?
I would prefer linear interpolation but am flexible in this regard.
This is possible if you are willing to wrap your data in the longitudinal direction. With some assumptions (the DataArray has coords 'lon' and 'lat', 'lon' spans almost 0-360 and doesn't quite go to the boundaries), and borrowing some ideas from this answer, this should work:
import numpy as np
import xarray as xr

data = np.arange(360 * 180).reshape(360, 180)
lon = np.linspace(0.5, 359.5, 360)
lat = np.linspace(-89.5, 89.5, 180)

da = xr.DataArray(
    coords=dict(
        lon=lon,
        lat=lat,
    ),
    data=data,
)

# These will both print 'nan' as lon is outside 0.5-359.5
print(da.interp(lon=0.3, lat=32).values)
print(da.interp(lon=359.7, lat=32).values)

def xr_add_cyclic_points(da):
    """
    Add cyclic points at the start and end of the `lon` dimension of a data array.

    Inputs
    da: xr.DataArray including dimensions (lat, lon)
    """
    # Borrows heavily from cartopy.util.add_cyclic_point, but adds at start and end.
    lon_idx = da.dims.index('lon')
    start_slice = [slice(None)] * da.ndim
    end_slice = [slice(None)] * da.ndim
    start_slice[lon_idx] = slice(0, 1)
    end_slice[lon_idx] = slice(-1, None)
    wrap_data = np.concatenate([da.values[tuple(end_slice)], da.values, da.values[tuple(start_slice)]], axis=lon_idx)
    wrap_lon = np.concatenate([da.lon.values[-1:] - 360, da.lon.values, da.lon.values[0:1] + 360])
    # Generate output DataArray with new data but same structure as input
    outp_da = xr.DataArray(data=wrap_data,
                           coords=dict(lat=da.lat, lon=wrap_lon),
                           dims=da.dims,
                           attrs=da.attrs)
    return outp_da

da_wrapped = xr_add_cyclic_points(da)

# These will print interpolated values.
print(da_wrapped.interp(lon=0.3, lat=32).values)
print(da_wrapped.interp(lon=359.7, lat=32).values)
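A hedged alternative sketch: xarray's pad method with mode='wrap' (passed through to np.pad) can add the cyclic points in one call and also works on dask-backed arrays. The padded lon coordinate still has to be overwritten by hand, since pad does not know about the 360-degree wrap:
da_p = da.pad(lon=1, mode='wrap')
da_p = da_p.assign_coords(lon=np.concatenate(([da.lon.values[-1] - 360],
                                              da.lon.values,
                                              [da.lon.values[0] + 360])))
# Same interpolated values as with da_wrapped above
print(da_p.interp(lon=0.3, lat=32).values)
print(da_p.interp(lon=359.7, lat=32).values)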

Efficiently using 1-D pyfftw on small slices of a 3-D numpy array

I have a 3D data cube of values with a size on the order of 10,000x512x512. I want to repeatedly take a window of vectors (say 6) along dim[0] and generate the Fourier transforms efficiently. I think I'm doing an array copy into the pyfftw package and it's giving me massive overhead. I'm going over the documentation now, since I think there is an option I need to set, but I could use some extra help on the syntax.
This code was originally written by another person, using numpy.fft.rfft and accelerated with numba. But that implementation wasn't working on my workstation, so I re-wrote everything and opted for pyfftw instead.
import numpy as np
import pyfftw as ftw
from tkinter import simpledialog
from math import ceil
import multiprocessing

ftw.config.NUM_THREADS = multiprocessing.cpu_count()
ftw.interfaces.cache.enable()

def runme():
    # normally I would load a file, but for Stack Overflow, I'm just going to
    # generate a 3D data cube, so I'll delete references to the binary
    # saving/loading functions:
    # load the file
    dataChunk = np.random.random((1000, 512, 512))
    numFrames = dataChunk.shape[0]
    # select the window size
    windowSize = int(simpledialog.askstring('Window Size',
        'How many frames to demodulate a single time point?'))
    numChannels = windowSize//2 + 1
    # create fftw arrays
    ftwIn = ftw.empty_aligned(windowSize, dtype='complex128')
    ftwOut = ftw.empty_aligned(windowSize, dtype='complex128')
    fftObject = ftw.FFTW(ftwIn, ftwOut)
    # perform DFT on the data chunk
    demodFrames = dataChunk.shape[0]//windowSize
    channelChunks = np.zeros([numChannels, demodFrames,
                              dataChunk.shape[1], dataChunk.shape[2]])
    channelChunks = getDFT(dataChunk, channelChunks,
                           ftwIn, ftwOut, fftObject, windowSize, numChannels)
    return channelChunks

def getDFT(data, channelOut, ftwIn, ftwOut, fftObject,
           windowSize, numChannels):
    frameLen = data.shape[0]
    demodFrames = frameLen//windowSize
    for yy in range(data.shape[1]):
        for xx in range(data.shape[2]):
            index = 0
            for i in range(0, frameLen-windowSize+1, windowSize):
                ftwIn[:] = data[i:i+windowSize, yy, xx]
                fftObject()
                channelOut[:, index, yy, xx] = 2*np.abs(ftwOut[:numChannels])/windowSize
                index += 1
    return channelOut

if __name__ == '__main__':
    runme()
What happens is I get a 4D array, the variable channelChunks. I am saving out each channel to a binary file (not included in the code above, but the saving part works fine).
This process is for a demodulation project we have; the 4D data cube channelChunks is then parsed into numChannels 3D data cubes (movies), and from that we are able to separate a movie by color given our experimental setup. I was hoping I could circumvent writing a C++ function that calls the FFT on the matrix via pyfftw.
Effectively, I am taking windowSize=6 elements along the 0 axis of dataChunk at a given index of the 1 and 2 axes and performing a 1D FFT. I need to do this throughout the entire 3D volume of dataChunk to generate the demodulated movies. Thanks.
The FFTW advanced plans can be automatically built by pyfftw.
The code could be modified in the following way:
Real-to-complex transforms can be used instead of complex-to-complex transforms. With pyfftw, this typically reads:
ftwIn = ftw.empty_aligned(windowSize, dtype='float64')
ftwOut = ftw.empty_aligned(windowSize//2+1, dtype='complex128')
fftObject = ftw.FFTW(ftwIn,ftwOut)
Add a few flags to the FFTW planner. For instance, FFTW_MEASURE will time different algorithms and pick the best. FFTW_DESTROY_INPUT signals that the input array can be modified: some implementation tricks can be used.
fftObject = ftw.FFTW(ftwIn,ftwOut, flags=('FFTW_MEASURE','FFTW_DESTROY_INPUT',))
Limit the number of divisions. A division costs more than a multiplication.
scale = 1.0/windowSize
for ...
    for ...
        2*np.abs(ftwOut[:,:,:])*scale  # instead of /windowSize
Avoid multiple for loops by making use of FFTW advanced plan through pyfftw.
nbwindow = numFrames//windowSize
# create fftw arrays
ftwIn = ftw.empty_aligned((nbwindow, windowSize, dataChunk.shape[2]), dtype='float64')
ftwOut = ftw.empty_aligned((nbwindow, windowSize//2+1, dataChunk.shape[2]), dtype='complex128')
fftObject = ftw.FFTW(ftwIn, ftwOut, axes=(1,), flags=('FFTW_MEASURE','FFTW_DESTROY_INPUT',))
...
for yy in range(data.shape[1]):
    ftwIn[:] = np.reshape(data[0:nbwindow*windowSize, yy, :], (nbwindow, windowSize, data.shape[2]), order='C')
    fftObject()
    channelOut[:, :, yy, :] = np.transpose(2*np.abs(ftwOut[:, :, :])*scale, (1, 0, 2))
Here is the modified code. I also decreased the number of frames to 100, set the seed of the random generator to check that the outcome is not modified, and commented out the tkinter dialog. The size of the window can be set to a power of two, or to a number made by multiplying 2, 3, 5, or 7, so that the Cooley-Tukey algorithm can be applied efficiently. Avoid large prime numbers.
import numpy as np
import pyfftw as ftw
#from tkinter import simpledialog
from math import ceil
import multiprocessing
import time

ftw.config.NUM_THREADS = multiprocessing.cpu_count()
ftw.interfaces.cache.enable()
ftw.config.PLANNER_EFFORT = 'FFTW_MEASURE'

def runme():
    # normally I would load a file, but for Stack Overflow, I'm just going to
    # generate a 3D data cube, so I'll delete references to the binary
    # saving/loading functions:
    # load the file
    np.random.seed(seed=42)
    dataChunk = np.random.random((100, 512, 512))
    numFrames = dataChunk.shape[0]
    # select the window size
    #windowSize = int(simpledialog.askstring('Window Size',
    #    'How many frames to demodulate a single time point?'))
    windowSize = 32
    numChannels = windowSize//2 + 1
    nbwindow = numFrames//windowSize
    # create fftw arrays
    ftwIn = ftw.empty_aligned((nbwindow, windowSize, dataChunk.shape[2]), dtype='float64')
    ftwOut = ftw.empty_aligned((nbwindow, windowSize//2+1, dataChunk.shape[2]), dtype='complex128')
    #ftwIn = ftw.empty_aligned(windowSize, dtype='complex128')
    #ftwOut = ftw.empty_aligned(windowSize, dtype='complex128')
    fftObject = ftw.FFTW(ftwIn, ftwOut, axes=(1,), flags=('FFTW_MEASURE','FFTW_DESTROY_INPUT',))
    # perform DFT on the data chunk
    demodFrames = dataChunk.shape[0]//windowSize
    channelChunks = np.zeros([numChannels, demodFrames,
                              dataChunk.shape[1], dataChunk.shape[2]])
    channelChunks = getDFT(dataChunk, channelChunks,
                           ftwIn, ftwOut, fftObject, windowSize, numChannels)
    return channelChunks

def getDFT(data, channelOut, ftwIn, ftwOut, fftObject,
           windowSize, numChannels):
    frameLen = data.shape[0]
    demodFrames = frameLen//windowSize
    printed = 0
    nbwindow = data.shape[0]//windowSize
    scale = 1.0/windowSize
    for yy in range(data.shape[1]):
        #for xx in range(data.shape[2]):
        index = 0
        ftwIn[:] = np.reshape(data[0:nbwindow*windowSize, yy, :], (nbwindow, windowSize, data.shape[2]), order='C')
        fftObject()
        channelOut[:, :, yy, :] = np.transpose(2*np.abs(ftwOut[:, :, :])*scale, (1, 0, 2))
        #for i in range(nbwindow):
        #    channelOut[:,i,yy,xx] = 2*np.abs(ftwOut[i,:])*scale
        if printed == 0:
            for j in range(channelOut.shape[0]):
                print(j, channelOut[j, 0, yy, 0])
            printed = 1
    return channelOut

if __name__ == '__main__':
    seconds = time.time()
    runme()
    print("time: ", time.time() - seconds)
Let us know how much it speeds up your computations! I went from 24s to less than 2s on my computer...
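As a small sanity check (a sketch; the window size and input are arbitrary), the real-to-complex pyfftw plan should reproduce numpy's rfft, which the original numpy.fft.rfft code relied on:
import numpy as np
import pyfftw as ftw

windowSize = 32
window = np.random.random(windowSize)

ftwIn = ftw.empty_aligned(windowSize, dtype='float64')
ftwOut = ftw.empty_aligned(windowSize//2 + 1, dtype='complex128')
plan = ftw.FFTW(ftwIn, ftwOut)

ftwIn[:] = window  # fill after planning: FFTW_MEASURE may scribble on the arrays
plan()
assert np.allclose(ftwOut, np.fft.rfft(window))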
