Writing interpolated grib2 data with pygrib leads to unusable grib file - python

I'm trying to use pygrib to read data from a grib2 file, interpolate it in Python, and write it to another file. I've tried both pygrib and eccodes, and both produce the same problem. The output file size increases by a factor of 3, and when I view the data in applications like the Weather and Climate Toolkit (WCT), all the variables are listed but they plot as "No Data". If I use the same script and just write the data to the new file without interpolating it, the result works fine in WCT. wgrib2 lists all the grib messages in both files, but wgrib2 -V works on the unaltered data and produces the error "*** FATAL ERROR: unsupported: code table 5.6=0 ***" for the interpolated data. Am I doing something wrong in my Python script? Here is an example of what I'm doing to write the file (same result with pygrib 2.05 and 2.1.3). I used a basic HRRR file for the example.
import pygrib
import numpy as np
import sys

def writeNoChange():
    # This produces a useable grib file.
    filename = 'hrrr.t00z.wrfprsf06.grib2'
    outfile = 'test.grib2'

    grbs = pygrib.open(filename)
    with open(outfile, 'wb') as outgrb:
        for grb in grbs:
            msg = grb.tostring()
            outgrb.write(msg)
    outgrb.close()
    grbs.close()

def writeChange():
    # This method produces a grib file that isn't recognized by WCT
    filename = 'hrrr.t00z.wrfprsf06.grib2'
    outfile = 'testChange.grib2'

    grbs = pygrib.open(filename)
    with open(outfile, 'wb') as outgrb:
        for grb in grbs:
            vals = grb.values * 1
            grb['values'] = vals
            msg = grb.tostring()
            outgrb.write(msg)
    outgrb.close()
    grbs.close()

#-------------------------------
if __name__ == "__main__":
    writeNoChange()
    writeChange()

Table 5.6 for GRIB2 (https://www.nco.ncep.noaa.gov/pmb/docs/grib2/grib2_doc/) is related to "ORDER OF SPATIAL DIFFERENCING".
For some reason, when you modify grb['values'], it sets grb['orderOfSpatialDifferencing'] = 0, which "wgrib2 -V" doesn't like. So, after changing 'values', change 'orderOfSpatialDifferencing' to what it was initially:
orderOfSpatialDifferencing = grb['orderOfSpatialDifferencing']
grb['values'] = [new values]
grb['orderOfSpatialDifferencing'] = orderOfSpatialDifferencing
This worked for me in terms of getting wgrib2 -V to work, but messed up the data. Possibly some other variables in Section 5 also need to be modified.
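A minimal sketch of the writeChange loop with that workaround applied (only the key named above is restored; whether other Section 5 keys also need to be preserved is left as an open question):
import pygrib

filename = 'hrrr.t00z.wrfprsf06.grib2'
outfile = 'testChange.grib2'

grbs = pygrib.open(filename)
with open(outfile, 'wb') as outgrb:
    for grb in grbs:
        # remember the packing key before touching the values
        order = grb['orderOfSpatialDifferencing']
        grb['values'] = grb.values * 1
        # restore it so wgrib2 -V no longer trips over code table 5.6 = 0
        grb['orderOfSpatialDifferencing'] = order
        outgrb.write(grb.tostring())
grbs.close()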

Related

Converting Shapefile to Geojson

In short, I am trying to convert a shapefile to geojson using gdal. Here is the idea:
from osgeo import gdal

def shapefile2geojson(infile, outfile):
    options = gdal.VectorTranslateOptions(format="GeoJSON", dstSRS="EPSG:4326")
    gdal.VectorTranslate(outfile, infile, options=options)
Okay then here is my input & output locations:
infile = r"C:\Users\clay\Desktop\Geojson Converter\arizona.shp"
outfile = r"C:\Users\clay\Desktop\Geojson Converter\arizona.geojson"
Then I call the function:
shapefile2geojson(infile, outfile)
It never saves where I can find it, if it is working at all. It would be nice if it would pull from a file and put the newly converted geojson in the same folder. I am not receiving any errors. I am using Windows and Jupyter Notebook and am a noob. I don't know if I am using this right:
r"C:\Users\clay\Desktop\Geojson Converter\arizona.shp"

Managing HDF5 Object Reference

I am trying to load a mat file for the Street View House Numbers (SVHN) Dataset http://ufldl.stanford.edu/housenumbers/ in Python with the following code
import h5py
labels_file = './sv/train/digitStruct.mat'
f = h5py.File(labels_file)
struct= f.values()
names = struct[1].values()
print(names[1][1].value)
I get [<HDF5 object reference>], but I need to know the actual string.
To get an idea of the data layout you could execute
h5dump ./sv/train/digitStruct.mat
but there are also other methods like visit or visititems.
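For instance, a quick sketch that dumps the layout from Python itself with visititems (assuming the same file path as in the question):
import h5py

def show(name, obj):
    # print every group/dataset name in the file together with its h5py object
    print(name, obj)

with h5py.File('./sv/train/digitStruct.mat', 'r') as f:
    f.visititems(show)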
A good reference that can help you and that seems to have already addressed a very similar problem (if not the same) recently is the following SO post:
h5py, access data in Datasets in SVHN
For example the snippet:
import h5py
import numpy

def get_name(index, hdf5_data):
    name = hdf5_data['/digitStruct/name']
    print(''.join([chr(v[0]) for v in hdf5_data[name[index][0]].value]))

labels_file = 'train/digitStruct.mat'
f = h5py.File(labels_file, 'r')
for j in range(33402):
    get_name(j, f)
will print the name of the files. I get for example:
7459.png
7460.png
7461.png
7462.png
7463.png
7464.png
7465.png
You can generalize from here.

Work with multiple netCDF files/variables in python

I have around 4 TB of MERIS time series data, which comes in netCDF format.
So I have a lot of netCDF files, each containing several 'variables'.
The netCDF format is new to me, and although I've read a lot about netCDF processing I don't have a clear idea of how to do it. The question 'Combining a large amount of netCDF files' deals somewhat with my problem, but I didn't get far with it. My approach was to first mosaic, then stack, and finally take the mean of every pixel.
One file contains 32 variables; additionally, here's the ncdump output of one .nc file for one day:
http://www.filedropper.com/ncdumpoutput
I managed to read the files, extract the variables I want (variable # 32) and put them into a list using the following code
import netCDF4 as nc

l = list()
for i in files_in:
    # read netCDF file
    dset = nc.Dataset(i, mode='r')
    # save variable
    var = dset.variables['vegetation_index_mean'][:]
    # write all temp loop outputs in a list
    l.append(var)
    # close netCDF file
    dset.close()
The list now contains 24 masked arrays for different locations on the same date.
Every time I try to print the contents of the list, Spyder freezes; every command I run afterwards makes Spyder freeze for five seconds before starting.
My goal is to make a time series analysis for a specific time frame (every date is stored in a single .nc file). So my plan was to mosaic (is this possible?) the variables in the list (treating them as raster bands), process additional dates, and take the mean for every pixel (1800 x 1800).
Maybe my whole approach is wrong? Can I treat these 'variables' like raster bands?
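A minimal sketch of the stack-then-mean step described above (assuming every per-date mosaic is a masked array on the same 1800 x 1800 grid; date_mosaics is a hypothetical list of those mosaics):
import numpy as np

# date_mosaics: hypothetical list of 2-D masked arrays, one 1800 x 1800 mosaic per date
stack = np.ma.stack(date_mosaics)   # shape: (n_dates, 1800, 1800)
pixel_mean = stack.mean(axis=0)     # per-pixel mean over time; masked cells are ignored
print(pixel_mean.shape)             # (1800, 1800)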
I'm not sure whether the following answer meets your needs, since this procedure is designed to process time series, is pretty manual, and furthermore you have 4 TB of data, so I apologize if this doesn't help.
This is for Python 2.7:
First import all the modules needed:
import tkFileDialog
from netCDF4 import Dataset
import matplotlib.pyplot as plt
Second, parse multiple nc files:
n = []
filename = {}
filename = tkFileDialog.askopenfilenames()
filename = list(filename)
n = len(filename)
Third, read the nc files and classify data and metadata within dictionaries using a loop:
wtr_tem = {}  # empty dict for the sea water temperature variable
fh = {}       # empty dict for the nc file handles
vars = {}     # empty dict for each file's variable list
for i in range(n):
    filename[i] = filename[i].decode('unicode_escape').encode('ascii', 'ignore')  # remove unicode in order to execute the following command
    filename1 = ''.join(filename[i])  # converts list to string
    fh[i] = Dataset(filename1, mode='r')  # create the file handle
    vars[i] = fh[i].variables.keys()  # returns a list with the variables of the file
    wtr_tem[i] = fh[i].variables['WTR_TEM']
    # plot variables in different figures
    plt.plot(wtr_tem[i], 'r-')
    plt.xlabel(fh[i].title)  # add specific title from each nc file
    plt.show()
I hope it helps somebody.

Exporting data to .mat file in python

I am working on a script in Python where I pipe ls output to the script and open all the files I want to work with using scipy.io; I then want to take all the imported data and write it into a .mat file (again using scipy.io). I have had success importing the data and assigning it to a dictionary to export, but when I load my output file in MATLAB none of the data looks at all the same.
The data I am importing all has a lat/lon coordinate attached to it so I will use that data as an example. Data is coming from a netCDF (.nc) file:
#!/usr/bin/python
import sys
import scipy.io as sio
import numpy as np

# script takes absolute path inputs (or absolute paths are desirable)
# to access an absolute path on a linux system use the command:
#   ls -d $PWD/*    OR, if the files are not in PWD:    ls -d /absolute_file_path/*
# and then pipe the standard output of that to the input of this script

# initialize dictionary (param) and cycle number (i.e. datafile number)
cycle = []
param = {}

# read each file from stdin
for filename in sys.stdin:
    filename = filename.strip()  # drop the trailing newline that comes with each stdin line
    fh = sio.netcdf.netcdf_file(filename, 'r')
    # load in standard variables (coordinates)
    latitude = fh.variables['LATITUDE'][:]
    longitude = fh.variables['LONGITUDE'][:]
    # close file
    fh.close()
    # get file number (cycle number)
    cycle = filename[-7:-4]
    # add the latest imported coordinates to the dictionary
    latvar = 'lat' + cycle
    lonvar = 'lon' + cycle
    param.update({latvar: latitude})
    param.update({lonvar: longitude})

# export dictionary to .mat file
sio.savemat('test.mat', param)
When I print the values to check if they are imported correctly I get reasonable values - but when I open my exported values in MATLAB this is an example of my output:
>> lat001
lat001 =
-3.5746e-133
and other variables have similarly extreme exponents (sometimes very small, as in this example, sometimes extremely large, ~1e100).
I have tried looking for similar problems but all I have come across is that some have had issues assigning large amounts of data to a single .mat file (ex. an array exceeding 2**32-1 bytes).
EDIT: Some example outputs (loading file in python, datatypes)
print latitude
[ 43.091]
print type(latitude)
<type 'numpy.ndarray'>
data = sio.loadmat('test.mat')
print data['lat001']
array([[ -3.57459142e-133]])
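One thing that may be worth checking (an assumption on my part, not something established above): scipy's netcdf_file memory-maps the file when reading, so the arrays returned by fh.variables[...] are views into that mapping and can turn into garbage once the file is closed. A sketch that forces real in-memory copies before closing, using the filename from the loop above:
import numpy as np
import scipy.io as sio

fh = sio.netcdf.netcdf_file(filename, 'r', mmap=False)  # or keep mmap and copy explicitly below
latitude = np.array(fh.variables['LATITUDE'][:])    # np.array(...) forces a copy out of the mmap
longitude = np.array(fh.variables['LONGITUDE'][:])
fh.close()  # safe now; latitude/longitude no longer reference the closed file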

Python: Saving / loading large array using numpy

I have saved a large array of complex numbers using python,
numpy.save(file_name, eval(variable_name))
that worked without any trouble. However, loading,
variable_name=numpy.load(file_name)
yields the following error,
ValueError: total size of new array must be unchanged
Using: Python 2.7.9 64-bit and the file is 1.19 GB large.
There is no problem with the size of your array; you likely didn't open your file the right way. Try this:
import numpy as np

with open(file_name, "rb") as file_:
    variable_name = np.load(file_)
Alternatively you can use pickle:
import pickle

# Saving (note the binary mode):
data_file = open('filename.bi', 'wb')
pickle.dump(your_data, data_file)
data_file.close()

# Loading:
data_file = open('filename.bi', 'rb')
data = pickle.load(data_file)
data_file.close()
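For what it's worth, a small self-contained round trip with a complex array, just to confirm that np.save / np.load handle complex data when the file is opened in binary mode:
import numpy as np

data = np.arange(10, dtype=np.complex128) * (1 + 2j)
np.save('demo.npy', data)               # np.save appends .npy if it is missing

with open('demo.npy', 'rb') as file_:
    loaded = np.load(file_)

print(np.array_equal(data, loaded))     # True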
