Create a raster stack from an xarray dataset in Python

I am trying to make a raster stack from an xarray dataset that I built from multiple netCDF files. There are 365 netCDF files, each containing a 2D Sea Surface Temperature (SST) grid with a height of 3600 and a width of 7200. I need a raster stack for further operations.
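import netCDF4 as nc
import rasterio as rio
import numpy as np
import xarray as xr
import os
import fnmatch
fpath = '/home/sst/2015'
pattern = '*.nc'
filelist = []
for root, dirs, files in os.walk(fpath):
    # keep only files matching the pattern, joined with the directory they were found in
    for name in fnmatch.filter(files, pattern):
        filelist.append(os.path.join(root, name))
filelist.sort()  # os.walk order is arbitrary; sort so days concatenate in order
ds = xr.open_mfdataset(filelist, combine='nested', concat_dim='time', parallel=True)  # netCDF data
ds_data = ds.sel(time='2015')['SST']  # DataArray with dimensions 365 x 3600 x 7200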
The raster stack built from this xarray will be used to extract data values at point locations. I am currently using numpy and rasterio as described in the rasterio documentation. The following code iterates over the 3D xarray and writes 365 files to the HDD, which I can later read and stack.
from rasterio.transform import from_origin
transform = from_origin(-180, 90, 0.05, 0.05)
fpath = '/home/sst/sst_tif'
fname = 'sst_array_'
extname = '.tiff'
timedf = ds.time  # time dimension to loop over
for i in range(len(timedf)):
    np_array = np.asarray(ds_data[i])  # load one daily 3600 x 7200 grid
    # os.path.join adds the missing '/' between the folder and the file name
    fwname = os.path.join(fpath, fname + str(i) + extname)
    # the context manager closes (and flushes) each GeoTIFF after writing
    with rio.open(fwname, 'w', driver='GTiff',
                  height=3600, width=7200,
                  dtype=np_array.dtype, count=1,
                  crs='EPSG:4326', transform=transform) as sst_tif:
        sst_tif.write(np_array, 1)
However, this takes a very long time for the entire dataset. I also tried converting the entire xarray to a NumPy array and writing all 365 layers to a single file, but that freezes the Python kernel.
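A quick back-of-the-envelope check shows why materializing the whole cube at once is risky. This sketch assumes the SST variable is stored as 32-bit floats; if it is float64, the figure doubles.

```python
import numpy as np

def cube_nbytes(n_time, height, width, dtype="float32"):
    """Bytes needed to hold an (n_time, height, width) array of the given dtype in RAM."""
    return n_time * height * width * np.dtype(dtype).itemsize

# 365 daily grids of 3600 x 7200 cells; float32 is an assumption about the SST dtype
size = cube_nbytes(365, 3600, 7200, "float32")
print(f"{size / 1e9:.1f} GB")  # 37.8 GB
```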
Is there any way to create this raster stack in memory and do further processing without writing it to one or more files on the HDD? I am looking for functionality similar to the stack function in R's raster package.
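The DataArray returned by `ds.sel(...)['SST']` already behaves like a lazy raster stack (time x lat x lon), so point values can often be pulled out directly with xarray's vectorized (pointwise) indexing instead of writing GeoTIFFs. A minimal sketch, assuming the coordinates are named `lat` and `lon` (check `ds_data.dims`; some SST products use `latitude`/`longitude`) and that a nearest-cell lookup is acceptable. The helper `extract_points` and the synthetic data are illustrative only.

```python
import numpy as np
import pandas as pd
import xarray as xr

def extract_points(da, lats, lons, lat_name="lat", lon_name="lon"):
    """Return values at the nearest grid cell for each (lat, lon) pair.

    Both indexers share the new dimension "point", so xarray picks one
    cell per pair (pointwise) instead of the full lats x lons grid.
    """
    lat_idx = xr.DataArray(np.asarray(lats), dims="point")
    lon_idx = xr.DataArray(np.asarray(lons), dims="point")
    return da.sel({lat_name: lat_idx, lon_name: lon_idx}, method="nearest")

# Tiny synthetic stand-in for the 365 x 3600 x 7200 SST cube
time = pd.date_range("2015-01-01", periods=3, freq="D")
lat = np.array([10.0, 20.0, 30.0])
lon = np.array([100.0, 110.0, 120.0, 130.0])
values = np.arange(3 * 3 * 4, dtype="float32").reshape(3, 3, 4)
sst = xr.DataArray(values, coords={"time": time, "lat": lat, "lon": lon},
                   dims=("time", "lat", "lon"), name="SST")

pts = extract_points(sst, lats=[19.0, 31.0], lons=[101.0, 128.0])
print(pts.transpose("time", "point").values)
```

With the real, dask-backed `ds_data` the result stays lazy until you call `.compute()`; how much data is actually read depends on how the files are chunked.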

Related

Change dimension and values of netcdf file in Python

I'm trying to change the dimension of a variable in a netCDF file.
First, I read a netCDF file and interpolate the data.
import numpy as np
import netCDF4
from scipy.interpolate import interp1d
def interpolation(a, b, c):
    f = interp1d(a, b, kind='linear')
    return f(c)
file = 'directory/test.nc'
data = netCDF4.Dataset(file)  # module name is netCDF4, not netcdf4
lon = data.variables['lon'][:]  # size = 10
lat = data.variables['lat'][:]  # size = 10
lev = data.variables['lev'][:]  # size = 100
values = data.variables['values'][:]  # size = (100, 10, 10)
new_lev = np.linspace(0, 1, 200)  # new vertical grid, size = 200
new_values = np.empty((len(new_lev), len(lat), len(lon)))  # size = (200, 10, 10)
### interpolation ###
for loop_lat in range(len(lat)):
    for loop_lon in range(len(lon)):
        new_values[:, loop_lat, loop_lon] = interpolation(lev, values[:, loop_lat, loop_lon], new_lev)
## how can I save these new_lev and new_values in the netcdf file ?
Using the interpolation, I converted the values from dimension A to dimension B.
Let's say the original dimension A has size 100 and the interpolated dimension B has size 200.
After changing the dimension, how can I save these values and the new dimension to the netCDF file?
Could you please give me some advice?
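You can open a netCDF file for editing in place. See:
https://unidata.github.io/netcdf4-python/#creatingopeningclosing-a-netcdf-file
Rather than:
data = netCDF4.Dataset(file)
Try:
data = netCDF4.Dataset(file, 'r+')
(The clobber argument only affects mode 'w', where it controls whether an existing file may be overwritten, so it is not needed here.)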

Fitness tracking using Python Numpy and Matplotlib module. Takes latest data from a specified folder and stores them in a numpy array

I have a folder fitness_tracker that contains subfolders such as Location GPS 2021-10-23; the only difference between the subfolders is the date. Each subfolder contains a CSV file named Raw Data.
Raw Data has time, velocity, latitude and longitude in separate columns. I want to write a program that goes into fitness_tracker, picks the latest folders (say 5 out of 10) by reading the folder names, reads the Raw Data CSV file in each of them, and stores the time data in a single matrix. Right now I can do this for a single file using NumPy.
I want to read the time values from the Raw Data file in each folder and store them in a matrix:
time = np.array([t1, t2, t3, t4, t5])
and then use this data to make a graph with Matplotlib.
This is the program I am running now:
import numpy as np
import matplotlib.pyplot as plt
bus_data = np.loadtxt('Raw Data.csv',delimiter=',',skiprows=1) # 1a. Import GPS File
time = bus_data[:,0]/60 # Second to minute
latitude = bus_data[:,1]
longitude = bus_data[:,2]
altitude = bus_data[:,3] # Unit = Meter
speed = bus_data[:,5] # Unit = Meter / second
distance = bus_data[:,7] # Unit = kilometer
fig1, axs1 = plt.subplots(1, 1)
axs1.plot(distance, speed, 'k.',markersize = 1, label='data')
axs1.set(xlabel='Distance (km)', ylabel='Speed (m/s)')
axs1.set_title('Speed over Distance')
axs1.legend()
plt.savefig('Speed over Distance.png',dpi=200)
plt.show()
This is how I'd get the time data from a list of directories:
import numpy as np
from os import listdir
from os.path import isfile, join
directories = [dir1, dir2, ...]
files = []
for directory in directories:
    # keep full paths so each CSV can be opened from any working directory
    files.extend(join(directory, f) for f in listdir(directory)
                 if isfile(join(directory, f)) and f.endswith(".csv"))
data = []
for file in files:
    data.append(np.loadtxt(file, delimiter=',', skiprows=1))  # load each found file
time = []
for d in data:
    time.append(d[:, 0] / 60)  # seconds to minutes
If you want to get the list of directories based on their names (and thus their dates), you'll have to do some more work parsing dates from the folder names, but I guess that deserves a new question, as this one is already asked way too broadly imho.
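For the date-parsing step, a minimal sketch, assuming every relevant folder name ends in a YYYY-MM-DD date as in "Location GPS 2021-10-23"; the helper `latest_folders` is hypothetical.

```python
import os
import re
import tempfile
from datetime import datetime

# Matches a trailing YYYY-MM-DD, e.g. "Location GPS 2021-10-23"
DATE_RE = re.compile(r"(\d{4}-\d{2}-\d{2})$")

def latest_folders(root, n=5):
    """Return the paths of the n most recent dated subfolders of root, oldest first."""
    dated = []
    for name in os.listdir(root):
        path = os.path.join(root, name)
        match = DATE_RE.search(name)
        if match and os.path.isdir(path):
            day = datetime.strptime(match.group(1), "%Y-%m-%d").date()
            dated.append((day, path))
    dated.sort()  # sorts by date first
    return [path for _, path in dated[-n:]]

# Demo on a throwaway directory tree standing in for fitness_tracker
root = tempfile.mkdtemp()
for day in ["2021-10-20", "2021-10-23", "2021-09-30", "2021-10-01"]:
    os.mkdir(os.path.join(root, f"Location GPS {day}"))
newest = latest_folders(root, n=2)
print([os.path.basename(p) for p in newest])
# ['Location GPS 2021-10-20', 'Location GPS 2021-10-23']
```

The returned paths could then be used as `directories` in the loop above.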

Average thousands of partially overlapping rasters with GDAL

I have thousands of partially overlapping rasters that are each about 6500 x 6500 px, with a 500 px overlap on each of the four edges.
I am trying to merge them into one geotiff and average the overlapping regions. I have followed the guidance in this question, which came to the following conclusions.
gdal_merge and gdal_translate -r average options do not actually average overlapping regions
Using a Pixel Function and GDALBuildVRT will work
Based on this, I have set up the following code to create a VRT, add a pixel function, and create a single TIF:
from osgeo import gdal
gdal.SetConfigOption('GDAL_VRT_ENABLE_PYTHON', 'YES')  # Python pixel functions must be explicitly enabled
out_name = "output"
tifs = ['tif1.tif', .... 'tif1235.tif']
gdal.BuildVRT(f'{out_name}.vrt', tifs, options=gdal.BuildVRTOptions(srcNodata=255, VRTNodata=255))
add_pixel_fn(f'{out_name}.vrt')
ds = gdal.Open(f'{out_name}.vrt')
translateoptions = gdal.TranslateOptions(gdal.ParseCommandLine("-ot Byte -co COMPRESS=LZW -a_nodata 255"))
ds = gdal.Translate(f'{out_name}.tif', ds, options=translateoptions)
ds = None  # close the output so GDAL flushes it to disk
Where the pixel function is defined as
def add_pixel_fn(filename: str) -> None:
    """Insert an averaging pixel function into the VRT file named 'filename'.

    Args:
        filename (:obj:`string`): name of the VRT file into which the function will be inserted
    """
    header = """  <VRTRasterBand dataType="Byte" band="1" subClass="VRTDerivedRasterBand">"""
    contents = """
    <PixelFunctionType>average</PixelFunctionType>
    <PixelFunctionLanguage>Python</PixelFunctionLanguage>
    <PixelFunctionCode><![CDATA[
from numba import jit
import numpy as np
# @jit(nogil=True)  (decorator left disabled)
def average_jit(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt):
    # average in float64 (NumPy's default for integer input); a uint8 accumulator would overflow
    out_ar[:] = np.round(np.mean(in_ar, axis=0))
def average(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt):
    average_jit(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt)
]]>
    </PixelFunctionCode>"""
    with open(filename, 'r') as f:
        lines = f.readlines()
    lines[3] = header  # assumes the <VRTRasterBand> tag is the 4th line of the VRT
    lines.insert(4, contents)
    with open(filename, 'w') as f:
        f.write("".join(lines))
This works, but it:
a) loads all the TIFs into RAM (about 30+ GB, which is rough)
b) takes days to run.
I'm curious whether anyone knows a faster way to approach this problem!
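A different technique from the pixel function above (named plainly: a sum-and-count accumulator) is to add each tile's valid pixels into a running sum and a per-cell count, then divide once at the end, so only one tile needs to be in memory at a time. A minimal NumPy sketch, assuming each tile's pixel offset in the output grid is already known (in practice it would come from the tile's geotransform) and that 255 is nodata as in the question; `mosaic_average` is a hypothetical helper. The full output grid still has to fit in memory here; for thousands of 6500 x 6500 tiles you would apply the same idea window by window.

```python
import numpy as np

def mosaic_average(tiles, out_shape, nodata=255):
    """Average overlapping uint8 tiles into one grid.

    tiles: iterable of (array, row_off, col_off), offsets in output pixel coordinates.
    Pixels equal to nodata are ignored; cells with no valid pixel stay nodata.
    """
    total = np.zeros(out_shape, dtype=np.float64)  # running sum of valid values
    count = np.zeros(out_shape, dtype=np.uint32)   # number of valid values per cell
    for arr, row, col in tiles:
        h, w = arr.shape
        valid = arr != nodata
        total[row:row + h, col:col + w] += np.where(valid, arr, 0)
        count[row:row + h, col:col + w] += valid
    out = np.full(out_shape, nodata, dtype=np.uint8)
    has_data = count > 0
    out[has_data] = np.round(total[has_data] / count[has_data]).astype(np.uint8)
    return out

# Two 2 x 3 tiles overlapping by one column
a = np.full((2, 3), 10, dtype=np.uint8)
b = np.full((2, 3), 20, dtype=np.uint8)
result = mosaic_average([(a, 0, 0), (b, 0, 2)], out_shape=(2, 5))
print(result[0])  # [10 10 15 20 20]
```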

Iterating over files and stitching together 3D arrays

I have several netCDF files, each containing an array of shape (365, 585, 1386), and I'm trying to read each array and stitch them together along axis 0, i.e., append the days of each year (365) one after another. The other two dimensions are latitude and longitude, so ideally I end up with several years of data for each lat/lon point (each netCDF file holds one calendar year of data).
import glob
from netCDF4 import Dataset
import numpy as np
data = '/Users/sjakober/Documents/ResearchSpring2020/erc_1979.nc'
files = sorted(glob.glob('erc*'))
for x, f in enumerate(files):
    nc = Dataset(data, mode='r')
    print(f)
    if x == 0:
        a = nc.variables['energy_release_component-g'][:]
    else:
        b = nc.variables['energy_release_component-g'][:]
        np.hstack((a, b))
    nc.close()

Convert Longitude, latitude to Pixel values using GDAL

I have a satellite GeoTIFF image and a corresponding OSM file containing only the highways. I want to convert the longitude/latitude values in the OSM file to pixel coordinates and highlight the highways on the satellite image.
I have tried several methods explained on StackExchange, but I get negative pixel values, and the same value for every longitude/latitude pair. Could somebody explain what I am missing?
Here is the information about the image that I gathered using the OTB application.
Here is the code that I am using.
from osgeo import gdal, osr
import numpy as np
import xml.etree.ElementTree as xml
src_filename = 'image.tif'
dst_filename = 'foo.tiff'
def readLongLat(path):
    lonlatList = []
    latlongtuple = ()
    root = xml.parse(path).getroot()
    for i in root:
        if i.tag == "node":
            latlong = []
            lat = float(i.attrib["lat"])
            long = float(i.attrib["lon"])
            latlong.append(lat)
            latlong.append(long)
            lonlatList.append(latlong)
    return lonlatList
# Opens source dataset
src_ds = gdal.Open(src_filename)
format = "GTiff"
driver = gdal.GetDriverByName(format)
# Open destination dataset
dst_ds = driver.CreateCopy(dst_filename, src_ds, 0)
# Get raster projection
epsg = 4269 # http://spatialreference.org/ref/sr-org/lambert_conformal_conic_2sp/
srs = osr.SpatialReference()
srs.ImportFromEPSG(epsg)
# Make WGS84 lon lat coordinate system
world_sr = osr.SpatialReference()
world_sr.SetWellKnownGeogCS('WGS84')
transform = src_ds.GetGeoTransform()
gt = [transform[0],transform[1],0,transform[3],0,-transform[5]]
#Reading the osm file
lonlat = readLongLat("highways.osm")
# Transform lon lats into XY
coord_transform = osr.CoordinateTransformation(world_sr, srs)
newpoints = coord_transform.TransformPoints(lonlat) # list of XYZ tuples
# Make Inverse Geotransform (try:except due to gdal version differences)
try:
    success, inverse_gt = gdal.InvGeoTransform(gt)
except:
    inverse_gt = gdal.InvGeoTransform(gt)
# [Note 1] Set pixel values
marker_array_r = np.array([[255]], dtype=np.uint8)
marker_array_g = np.array([[0]], dtype=np.uint8)
marker_array_b = np.array([[0]], dtype=np.uint8)
for x, y, z in newpoints:
    pix_x = int(inverse_gt[0] + inverse_gt[1] * x + inverse_gt[2] * y)
    pix_y = int(inverse_gt[3] + inverse_gt[4] * x + inverse_gt[5] * y)
    dst_ds.GetRasterBand(1).WriteArray(marker_array_r, pix_x, pix_y)
    dst_ds.GetRasterBand(2).WriteArray(marker_array_g, pix_x, pix_y)
    dst_ds.GetRasterBand(3).WriteArray(marker_array_b, pix_x, pix_y)
# Close files
dst_ds = None
src_ds = None
Something I have tried recently is the xarray module. I think of xarray as a hybrid between pandas and NumPy that lets you store information as an array but access it with simple .sel requests. Docs here.
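UPDATE: It seems that rasterio and xarray must both be installed for the method below to work. See link. In recent xarray versions xr.open_rasterio has been removed; the rioxarray package provides rioxarray.open_rasterio as its replacement.
It is a much simpler way of turning a GeoTIFF file into a user-friendly array. See my example below:
import rioxarray  # replaces xr.open_rasterio, which newer xarray versions no longer provide
ds = rioxarray.open_rasterio("/path/to/image.tif")
# The spatial dimensions are named x (longitude) and y (latitude), not lon/lat
ds.sel(band=2, y=19.9, x=39.5, method='nearest').values
>>> [10.3]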
This does not answer your question directly, but may help you identify a different (and probably simpler) approach that I've recently switched to.
Note: obviously care needs to be taken to ensure that your lat/lon pairs are in the same coordinate system as the GeoTiff file, but I think you're handling that anyway.
I was able to do that using the geoio library.
import geoio
img = geoio.GeoImage(src_filename)
pix_x, pix_y = img.proj_to_raster(lon,lat)
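For reference, the mapping the question's loop is trying to compute can be written out directly from a GDAL-style geotransform. A minimal sketch, assuming the points are already in the raster's CRS and the geotransform is used exactly as returned by `GetGeoTransform()`. The question rebuilds it with `-transform[5]`; for a typical north-up image the pixel height is already negative, so that sign flip likely yields negative row indices. `lonlat_to_pixel` is a hypothetical helper and does the same inversion `gdal.InvGeoTransform` performs, spelled out by hand.

```python
import math

def lonlat_to_pixel(gt, x, y):
    """Map georeferenced (x, y) to integer (col, row) using a GDAL-style geotransform.

    gt = (origin_x, pixel_width, row_rotation, origin_y, col_rotation, pixel_height), so
    x = gt[0] + col*gt[1] + row*gt[2] and y = gt[3] + col*gt[4] + row*gt[5].
    """
    det = gt[1] * gt[5] - gt[2] * gt[4]
    dx, dy = x - gt[0], y - gt[3]
    col = (gt[5] * dx - gt[2] * dy) / det   # invert the 2x2 affine part
    row = (-gt[4] * dx + gt[1] * dy) / det
    return math.floor(col), math.floor(row)

# North-up example: origin (100, 50), 0.5-unit pixels, negative pixel height
gt = (100.0, 0.5, 0.0, 50.0, 0.0, -0.5)
print(lonlat_to_pixel(gt, 101.2, 48.9))  # (2, 2)
```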
