convert tiff to netcdf - python

I'm trying to convert a TIFF to a netCDF file, but I get an IndexError:
import numpy as np
from netCDF4 import Dataset
import rasterio

with rasterio.drivers():
    src = rasterio.open(r"ia.tiff", "r")
    dst_transform = src.transform
    dst_width = src.width
    dst_height = src.height
    print(dst_transform)
    xmin = dst_transform[0]
    xmax = dst_transform[0] + dst_transform[1] * dst_width
    print(xmax)
    ymin = dst_transform[3] + dst_transform[5] * dst_height
    print(ymin)
    ymax = dst_transform[3]
    dst_width = dst_width + 1
    dst_height = dst_height + 1
    outf = Dataset(r'ia.nc', 'w', format='NETCDF4_CLASSIC')
    lats = np.linspace(ymin, ymax, dst_width)
    lons = np.linspace(xmin, xmax, dst_height)
    lat = outf.createDimension('lon', len(lats))
    lon = outf.createDimension('lat', len(lons))
    longitude = outf.createVariable('longitude', np.float64, ('lon',))
    latitude = outf.createVariable('latitude', np.float64, ('lat',))
    SHIA = outf.createVariable('SHIA', np.int8, ('lon', 'lat'))
    outf.variables['longitude'][:] = lons
    outf.variables['latitude'][:] = lats
    im = src.read()
    SHIA[:, :] = im
    outf.description = "IA for"
    longitude.units = "degrees east"
    latitude.units = 'degrees north'
    print("created empty array")
    outf.close()
The error is "IndexError: size of the data array does not conform to slice". Can somebody take a look and help me find where I went wrong? Much appreciated!

I use xarray for this kind of thing. Create an xarray DataArray for each variable you have (SHIA, in your case), build a Dataset from it, and make sure the coordinate variables are set as coordinates on the Dataset.
See:
http://xarray.pydata.org/en/stable/io.html
You can also convert your netCDF/TIFF into a DataFrame and back again, but I wouldn't recommend that unless you have to, because netCDF holds multidimensional data while a DataFrame flattens everything into a single table.
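The xarray route described above can be sketched roughly like this. Synthetic data stands in for the TIFF contents; the variable name SHIA and the units follow the question, while the grid bounds and shape are invented for illustration:

```python
# Minimal sketch: build a DataArray with coordinate variables, promote it
# to a Dataset, and write netCDF. Shapes and bounds here are illustrative.
import numpy as np
import xarray as xr

height, width = 100, 120                       # raster shape (rows, cols)
lats = np.linspace(40.0, 43.0, height)         # one value per row
lons = np.linspace(-96.0, -90.0, width)        # one value per column
im = np.zeros((height, width), dtype=np.int8)  # stand-in for src.read(1)

da = xr.DataArray(
    im,
    dims=('latitude', 'longitude'),
    coords={'latitude': lats, 'longitude': lons},
    name='SHIA',
)
da.latitude.attrs['units'] = 'degrees_north'
da.longitude.attrs['units'] = 'degrees_east'

ds = da.to_dataset()   # latitude/longitude become Dataset coordinates
ds.to_netcdf('ia.nc')
```

Because the coordinates are attached to the DataArray up front, the dimension sizes can never disagree with the data, which is what triggers the IndexError in the question.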

The easiest way I can think of is to use the GDAL tools.
# Convert TIF to netCDF
gdal_translate -of netCDF -co "FORMAT=NC4" ia.tif ia.nc
# Convert SHP to netCDF
gdal_rasterize -of netCDF -burn 1 -tr 0.01 0.01 input.shp output.nc


Regrid Netcdf file in python

I'm trying to regrid a NetCDF file from a 0.125-degree to a 0.083-degree spatial scale. The NetCDF contains 224 latitudes and 464 longitudes, with daily data for one year.
I tried xarray, but it produces this memory error:
MemoryError: Unable to allocate 103. GiB for an array with shape (13858233841,) and data type float64
How can I regrid the file with Python?
A Python option, using CDO as a backend, is my package nctoolkit (https://nctoolkit.readthedocs.io/en/latest/), installable via pip (https://pypi.org/project/nctoolkit/).
It has a built-in method called to_latlon which will regrid to a specified lat/lon grid.
In your case, you would need to do:
import nctoolkit as nc

data = nc.open_data(infile)
data.to_latlon(lon=[lon_min, lon_max], lat=[lat_min, lat_max], res=[0.083, 0.083])
Another option is cf-python, which can (in general) regrid larger-than-memory datasets in both spherical polar coordinates and Cartesian coordinates. It uses the ESMF regridding engine, so linear, first- and second-order conservative, nearest-neighbour, etc. regridding methods are available.
Here is an example of the kind of regridding you need:
import cf
import numpy

f = cf.example_field(2)  # Use cf.read to read your own data
print('Source field:')
print(f)

# Define the output grid
lat = cf.DimensionCoordinate(
    data=cf.Data(numpy.arange(-90, 90.01, 0.083), 'degreesN'))
lon = cf.DimensionCoordinate(
    data=cf.Data(numpy.arange(0, 360, 0.083), 'degreesE'))

# Regrid the field
g = f.regrids({'latitude': lat, 'longitude': lon}, method='linear')
print('\nRegridded field:')
print(g)
which produces:
Source field:
Field: air_potential_temperature (ncvar%air_potential_temperature)
------------------------------------------------------------------
Data            : air_potential_temperature(time(36), latitude(5), longitude(8)) K
Cell methods    : area: mean
Dimension coords: time(36) = [1959-12-16 12:00:00, ..., 1962-11-16 00:00:00]
                : latitude(5) = [-75.0, ..., 75.0] degrees_north
                : longitude(8) = [22.5, ..., 337.5] degrees_east
                : air_pressure(1) = [850.0] hPa

Regridded field:
Field: air_potential_temperature (ncvar%air_potential_temperature)
------------------------------------------------------------------
Data            : air_potential_temperature(time(36), latitude(2169), longitude(4338)) K
Cell methods    : area: mean
Dimension coords: time(36) = [1959-12-16 12:00:00, ..., 1962-11-16 00:00:00]
                : latitude(2169) = [-90.0, ..., 89.94399999999655] degreesN
                : longitude(4338) = [0.0, ..., 359.971] degreesE
                : air_pressure(1) = [850.0] hPa
There are plenty of options to derive the destination grid from other fields, as well as defining it explicitly; more details can be found in the documentation.
cf-python will infer which axes are X and Y, etc., from the CF metadata attached to the dataset, but if that is missing there are always ways to set it manually or work around it.
The easiest way to do this is to use command-line operators like CDO and NCO.
For example:
cdo remapbil,target_grid infile.nc ofile.nc
The target_grid can be a grid descriptor file, or you can use a NetCDF file with your desired grid resolution. Take note of the other regridding methods that might suit your needs; the example above uses bilinear interpolation.
Xarray uses something called 'lazy loading' to try and avoid using too much memory. Somewhere in your code, you are using a command which loads the entirety of the data into memory, which it cannot do. Instead, you should specify the calculation, then save the result directly to file. Xarray will perform the calculation a chunk at a time without loading everything into memory.
An example of regridding might look something like this:
import numpy as np
import xarray as xr

da_input = xr.open_dataarray('input.nc')      # the file the data will be loaded from
regrid_axis = np.arange(-90, 90, 0.083)       # new coordinates
da_output = da_input.interp(lat=regrid_axis)  # specify the calculation
da_output.to_netcdf('output.nc')              # save directly to file
Doing da_input.load(), or da_output.compute(), for example, would cause all the data to be loaded into memory - which you want to avoid.
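For completeness, here is a self-contained sketch of the lazy workflow described above; the file names, array sizes, and chunk sizes are all illustrative, and it assumes dask and scipy are installed alongside xarray:

```python
# Sketch: write a small example NetCDF, then reopen it with dask chunks so
# that downstream computation streams through memory chunk-by-chunk.
import numpy as np
import xarray as xr

# build and save a small dataset standing in for the real input file
da = xr.DataArray(
    np.random.rand(8, 16, 32),
    dims=('time', 'lat', 'lon'),
    coords={'lat': np.linspace(-90, 90, 16),
            'lon': np.linspace(0, 360, 32, endpoint=False)},
    name='t2m',
)
da.to_netcdf('small.nc')

# reopen lazily: dask loads each chunk only when it is needed
lazy = xr.open_dataarray('small.nc', chunks={'time': 2})
result = lazy.interp(lat=np.arange(-90, 90, 10.0))  # still lazy at this point
result.to_netcdf('regridded.nc')  # computed chunk-by-chunk as it is written
```

The key point is that nothing between `open_dataarray` and `to_netcdf` forces the full array into memory.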
Another way to access the cdo functionality from within python is to make use of the Pypi cdo project:
pip install cdo
Then you can do
from cdo import Cdo
cdo=Cdo()
cdo.remapbil("target_grid",input="in.nc",output="out.nc")
where target_grid is one of the usual options:
a netCDF file to use the grid from
a regular grid specifier, e.g. r360x180
a text file with a grid descriptor (see below)
There are several methods built in for the regridding:
remapbic : bicubic interpolation
remapbil : bilinear interpolation
remapnn : nearest neighbour interpolation
remapcon : first order conservative remapping
remapcon2 : 2nd order conservative remapping
You can use a grid descriptor file to define the area you need to interpolate to...
in the file grid.txt
gridtype=lonlat
xfirst=X (here X is the longitude of the left hand point)
xinc=0.083
xsize=NX (here put the number of points in domain)
yfirst=Y
yinc=0.083
ysize=NY
For more details you can refer to my video guide on interpolation.

How to preprocess NIfTI data format using NiBabel (Python)

After converting a NIfTI file to an array using NiBabel, the array has three dimensions and the numbers look like this:
[-9.35506855e-42 -1.78675141e-35 1.18329136e-30 -1.58892995e-24
5.25227377e-24 1.11677966e-23 -2.41237451e-24 -1.51333104e-25
6.79829196e-30 -9.84047188e-36 1.23314265e-43 -0.00000000e+00]
How can I preprocess this array for machine-learning? When choosing only the exponent, most of the information gets lost when plotting the image, so maybe the base is also important?
Any help is appreciated.
This will help you load a NIfTI image and access it as a numpy array:
import nibabel as nib

img = nib.load(example_filename)
data = img.get_fdata()
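Once the array is loaded, a common minimal preprocessing step for machine learning is to clip extreme intensities and rescale to [0, 1]. This is a sketch only; the synthetic array below stands in for the `data` returned by `get_fdata()`, and the percentile bounds are an illustrative choice:

```python
# Illustrative preprocessing of a 3-D intensity volume for ML:
# clip outliers with robust percentile bounds, then min-max scale to [0, 1].
import numpy as np

data = np.random.randn(16, 16, 16)     # stand-in for the NIfTI volume

lo, hi = np.percentile(data, [1, 99])  # robust intensity bounds
clipped = np.clip(data, lo, hi)        # suppress extreme voxels
scaled = (clipped - lo) / (hi - lo)    # rescale into [0, 1]
```

Rescaling this way preserves the relative structure in the volume, which is usually what matters for plotting and for model input, rather than the raw exponents of the stored floats.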

plotting a large matrix in python

I have a data file in Excel (.xlsx). The data represents a 100 micrometer by 100 micrometer area. The number of steps was set to 50 for x and 50 for y, meaning each pixel is 2 micrometers in size. How can I create a 2D image from this data?
Getting data from xlsx files can be achieved using the openpyxl Python module. After installing the module, a simple example is (assuming you have an xlsx laid out as in the image attached):
from openpyxl import load_workbook

wb = load_workbook("/path/to/matrix.xlsx")
cell_range = wb['Sheet1']['B2:G16']
for row in cell_range:
    for cell in row:
        print(str(cell.value) + " ", end='')
    print("")
This would print all the values in the range; you could also read them into a numpy array and plot.
If you are willing to plot pixels instead of points with matplotlib, you can convert your dataframe into a numpy array and plot that array with matplotlib's imshow() method, after manipulating the array as needed.
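A sketch of the imshow() approach described above, using a synthetic 50x50 array in place of the spreadsheet data; the file name, colormap, and axis extents are assumptions:

```python
# Render a 50x50 measurement grid as a 100x100 micrometer image,
# so each array cell becomes a 2-micrometer pixel.
import numpy as np
import matplotlib
matplotlib.use('Agg')               # headless backend, render straight to file
import matplotlib.pyplot as plt

grid = np.random.rand(50, 50)       # stand-in for the values read from Excel

fig, ax = plt.subplots()
im = ax.imshow(grid, extent=[0, 100, 0, 100], origin='lower', cmap='viridis')
ax.set_xlabel('x (micrometer)')
ax.set_ylabel('y (micrometer)')
fig.colorbar(im, ax=ax)
fig.savefig('matrix.png', dpi=150)
```

The `extent` argument maps the array indices onto physical coordinates, so the axes read in micrometers rather than pixel counts.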

Convert raster time series of multiple GeoTIFF images to NetCDF

I have a raster time series stored in multiple GeoTIFF files (*.tif) that I'd like to convert to a single NetCDF file. The data is uint16.
I could probably use gdal_translate to convert each image to netcdf using:
gdal_translate -of netcdf -co FORMAT=NC4 20150520_0164.tif foo.nc
and then do some scripting with NCO to extract dates from the filenames and concatenate, but I was wondering whether I might do this more effectively in Python using xarray and its new rasterio backend.
I can read a file easily:
import glob
import xarray as xr
f = glob.glob('*.tif')
da = xr.open_rasterio(f[0])
da
which returns
<xarray.DataArray (band: 1, y: 5490, x: 5490)>
[30140100 values with dtype=uint16]
Coordinates:
* band (band) int64 1
* y (y) float64 5e+05 5e+05 5e+05 5e+05 5e+05 4.999e+05 4.999e+05 ...
* x (x) float64 8e+05 8e+05 8e+05 8e+05 8.001e+05 8.001e+05 ...
Attributes:
crs: +init=epsg:32620
and I can write one of these to NetCDF:
da.to_netcdf('foo.nc')
but ideally I would be able to use something like xr.open_mfdataset, write the time values (extracted from the filenames) and then write the entire aggregation to netCDF. And have dask handle the out-of-core memory issues. :-)
Can something like this be done with xarray and dask?
Xarray should be able to do the concat step for you. I have adapted your example a bit below. It will be up to you to parse the filenames into something useful.
import glob
import pandas as pd
import xarray as xr

def time_index_from_filenames(filenames):
    '''helper function to create a pandas DatetimeIndex
    Filename example: 20150520_0164.tif'''
    return pd.DatetimeIndex([pd.Timestamp(f[:8]) for f in filenames])

filenames = glob.glob('*.tif')
time = xr.Variable('time', time_index_from_filenames(filenames))
chunks = {'x': 5490, 'y': 5490, 'band': 1}
da = xr.concat([xr.open_rasterio(f, chunks=chunks) for f in filenames], dim=time)
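To complete the aggregation, the concatenated array can then be written out in a single call. A self-contained sketch under stated assumptions: synthetic arrays replace the opened rasters, the variable name 'stack' is invented, and int16 is used here so the example also writes with the scipy backend (with netCDF4 installed, the original uint16 works as well):

```python
# Stack per-date arrays along a new time dimension and write one netCDF.
import numpy as np
import pandas as pd
import xarray as xr

# filenames follow the question's pattern; the date is the first 8 characters
filenames = ['20150520_0164.tif', '20150521_0164.tif']
time = xr.Variable('time',
                   pd.DatetimeIndex([pd.Timestamp(f[:8]) for f in filenames]))

# synthetic stand-ins for xr.open_rasterio(...) results
arrays = [xr.DataArray(np.zeros((1, 4, 4), dtype='int16'),
                       dims=('band', 'y', 'x'))
          for _ in filenames]
da = xr.concat(arrays, dim=time)

# one call writes the whole aggregation; with dask-backed inputs this
# streams to disk rather than loading everything into memory
da.to_dataset(name='stack').to_netcdf('stack.nc')
```

Concatenating along an `xr.Variable` both creates the new time dimension and attaches the parsed dates as its coordinate.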

How do I import tif using gdal?

I'm trying to get my tif file into a usable format in Python so I can analyze the data. However, every time I import it, I just get an empty list. Here's my code:
xValues = [447520.0, 432524.0, 451503.0]
yValues = [4631976.0, 4608827.0, 4648114.0]

gdal.AllRegister()
dataset = gdal.Open('final_snow.tif', GA_ReadOnly)
if dataset is None:
    print 'Could not open image'
    sys.exit(1)
data = np.array([gdal.Open(name, gdalconst.GA_ReadOnly).ReadAsArray() for name, descr in dataset.GetSubDatasets()])
print 'this is data ', data
It always prints an empty list, but it doesn't throw an error. I checked out other questions, such as this one (Create shapefile from tif file using GDAL). What might be the problem?
For osgeo.gdal, it should look like this:
from osgeo import gdal
gdal.UseExceptions() # not required, but a good idea
dataset = gdal.Open('final_snow.tif', gdal.GA_ReadOnly)
data = dataset.ReadAsArray()
Where data is either a 2D array for 1-banded rasters, or a 3D array for multiband.
An alternative with rasterio looks like:
import rasterio

with rasterio.open('final_snow.tif', 'r') as r:
    data = r.read()
Where data is always a 3D array, with the first dimension as band index.
