How to iterate over latitude longitude values of a grib2 file? - python

I'm trying to get the uvIndex for all the lat/lon pairs present in a grib2 file.
This is the link from where I'm getting the file. The problem is that I don't understand the structure of the file well enough to get the data out. I'm using pygrib to read the file.
Here's the code I've tried:
grbs = pygrib.open('uv.t12z.grbf01.grib2')
grb = grbs.select(name='UV index')[0]
print grb.data(23.5,55.5)
What I'm trying to achieve is to either iterate over all the lat/lon pairs and print the corresponding uvIndex value, or enter a lat/lon and get the corresponding value. I've read the pygrib docs but couldn't find any suitable call that serves this purpose. Please help.

You have to iterate through the GRIB file and find the desired record, then get its data, like here:
for g in grbs:
    print g.shortName, g.typeOfLevel, g.level  # print info about all GRIB records and check names
    if (g.shortName == shortName and g.typeOfLevel == typeOfLevel and g.level == level):
        tmp = np.array(g.values)
        # now work with tmp as a numpy array
To get the lat and lon arrays use lt, ln = g.latlons(), where g is an element of grbs.
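For the original question (print the UV index for every grid point), a minimal sketch along these lines should work, assuming grb is the 'UV index' message selected above:
import numpy as np

lats, lons = grb.latlons()   # 2-D arrays of grid latitudes and longitudes
vals = grb.values            # 2-D array of UV index values on the same grid
for (i, j), uv in np.ndenumerate(vals):
    print lats[i, j], lons[i, j], uv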
Read the examples in the Python section at https://software.ecmwf.int/wiki/display/GRIB/GRIB+API+examples (pygrib uses this library to read GRIB files).
The fastest way to get data from a large GRIB file is to build an index:
# use the attributes you want to build the index on
indx = pygrib.index(gribfile, 'typeOfLevel', 'level', 'parameterName')

# important: msg is an array and may contain more than one record
# get the U wind component at 10 m above ground
msg = indx.select(level=10, typeOfLevel="heightAboveGround",
                  parameterName="U U-component of wind m s**-1")
u10 = np.array(msg[0].values)

# get the V wind component at 10 m above ground
msg = indx.select(level=10, typeOfLevel="heightAboveGround",
                  parameterName="V V-component of wind m s**-1")
v10 = np.array(msg[0].values)
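To look up the value nearest to a single lat/lon (the second thing the question asks for), one hedged approach is a plain nearest-grid-point search; the coordinates and variable names below are illustrative, not from the original post:
import numpy as np

lats, lons = grb.latlons()
# squared distance to every grid point (fine for a quick lookup; not a great-circle distance)
dist2 = (lats - 23.5) ** 2 + (lons - 55.5) ** 2
iy, ix = np.unravel_index(dist2.argmin(), dist2.shape)
print grb.values[iy, ix]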

Related

Python good practice with NetCDF4 shared dimensions across groups

This question is conceptual rather than about a specific error.
I am working with the Python netCDF4 API to translate and store binary datagram packets from multiple sensors packaged in a single file. My question concerns the scope of dimensions and best-use practices.
According to the NetCDF4 conventions and metadata docs, dimension scope is such that all child groups have access to a dimension defined in the parent group (http://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#_scope).
Context:
Data packets from the multiple sensors are written to a binary file. Timing adjustments are handled prior to writing the binary file, so we can trust the time stamp of a data packet. The sampling rates are not synchronous: sensor 1 samples at, say, 1 Hz, sensor 2 at 100 Hz, and sensors 1 and 2 measure a number of different variables.
Questions:
Do I define a single, unlimited time dimension at the root level and create multiple variables using that dimension, or do I create individual time dimensions at the group level? Pseudo-code below.
In setting up the netCDF file I would use the following code:
import netCDF4

data_set = netCDF4.Dataset(file_name, mode='w')
dim_time = data_set.createDimension(dimname='time', size=None)

grp_1 = data_set.createGroup('grp1')
var_time_1 = grp_1.createVariable(varname='sensor_1_time', datatype='f8', dimensions=('time',))
var_time_1[:] = sensor_1_time  # timestamps of data values from sensor 1
var_1 = grp_1.createVariable(varname='sensor_1_data', datatype='f8', dimensions=('time',))
var_1[:] = sensor_1_data  # data values from sensor 1

grp_2 = data_set.createGroup('grp2')
var_time_2 = grp_2.createVariable(varname='sensor_2_time', datatype='f8', dimensions=('time',))
var_time_2[:] = sensor_2_time
var_2 = grp_2.createVariable(varname='sensor_2_data', datatype='f8', dimensions=('time',))
var_2[:] = sensor_2_data  # data values from sensor 2
The group separation is not necessarily by sensor but by logical data grouping. In the case that data from two sensors falls into multiple groups, is it best to replicate the time array in each group, or is it acceptable to reference other groups using the scope mechanism?
import netCDF4

data_set = netCDF4.Dataset(file_name, mode='w')
dim_time = data_set.createDimension(dimname='time', size=None)

grp_env = data_set.createGroup('env_data')
sensor_time_1 = grp_env.createVariable(varname='sensor_1_time', datatype='f8', dimensions=('time',))
sensor_time_1[:] = sensor_1_time  # timestamps of data values from sensor 1
env_1 = grp_env.createVariable(varname='sensor_1_data', datatype='f8', dimensions=('time',))
env_1[:] = sensor_1_data  # data values from sensor 1
env_2 = grp_env.createVariable(varname='sensor_2_data', datatype='f8', dimensions=('time',))
env_2.coordinates = "/platform/sensor_2_time"  # reference the time variable held in the other group

grp_platform = data_set.createGroup('platform')
sensor_time_2 = grp_platform.createVariable(varname='sensor_2_time', datatype='f8', dimensions=('time',))
sensor_time_2[:] = sensor_2_time
plt_2 = grp_platform.createVariable(varname='sensor_2_data', datatype='f8', dimensions=('time',))
plt_2[:] = sensor_2_data  # data values from sensor 2
Most examples do not deal with this cross-group functionality, and I can't seem to find the best practices. I'd love some advice, or even a push in the right direction.
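For what it's worth, a minimal sketch of the first option (a single unlimited time dimension at the root, used by variables in child groups) looks like the following; the file name and the dummy data are illustrative only:
import numpy as np
import netCDF4

ds = netCDF4.Dataset('sensors.nc', mode='w')   # hypothetical output file
ds.createDimension('time', None)               # unlimited, defined once at the root

grp = ds.createGroup('grp1')
# child groups can use a dimension defined in a parent group simply by naming it
t = grp.createVariable('sensor_1_time', 'f8', ('time',))
v = grp.createVariable('sensor_1_data', 'f8', ('time',))
t[:] = np.arange(10, dtype='f8')               # dummy timestamps
v[:] = np.random.rand(10)                      # dummy data
ds.close()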

How to extract data from netCDF to CSV using simple python script?

This question is related to data extraction from NetCDF files (*.nc) to CSV, to be further used by researchers for other applications, such as statistical analysis along time series that are not stored as array-like data. There are two alternatives for beginners in Python, namely:
Script 1:
I need your help extracting climate data from an nc file to CSV using Python. The data provider, Copernicus, gives some advice on how to extract data from nc to CSV using a simple Python script. However, I ran into OverflowError: int too big to convert.
I will briefly describe the steps and provide all necessary info regarding data shape and content.
# this is for reading the .nc files in the working folder
import glob
# this is required to read the netCDF4 data
from netCDF4 import Dataset
# required to read and write the csv files
import pandas as pd
# required for using the array functions
import numpy as np
from matplotlib.dates import num2date

data = Dataset('prAdjust_tmean.nc')
And data looks like this:
print(data)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
CDI: Climate Data Interface version 1.8.2 (http://mpimet.mpg.de/cdi)
frequency: year
CDO: Climate Data Operators version 1.8.2 (http://mpimet.mpg.de/cdo)
creation_date: 2020-02-12T15:00:49ZCET+0100
Conventions: CF-1.6
institution_url: www.smhi.se
invar_platform_id: -
invar_rcm_model_driver: MPI-M-MPI-ESM-LR
time_coverage_start: 1971
time_coverage_end: 2000
domain: EUR-11
geospatial_lat_min: 23.942343
geospatial_lat_max: 72.641624
geospatial_lat_resolution: 0.04268074 degree
geospatial_lon_min: -35.034023
geospatial_lon_max: 73.937675
geospatial_lon_resolution: 0.009246826 degree
geospatial_bounds: -
NCO: netCDF Operators version 4.7.7 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)
acknowledgements: This work was performed within Copernicus Climate Change Service - C3S_424_SMHI, https://climate.copernicus.eu/operational-service-water-sector, on behalf of ECMWF and EU.
contact: Hydro.fou#smhi.se
keywords: precipitation
license: Copernicus License V1.2
output_frequency: 30 year average value
summary: Calculated as the mean annual values of daily precipitation averaged over a 30 year period.
comment: The Climate Data Operators (CDO) software was used for the calculation of climate impact indicators (https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf, https://code.mpimet.mpg.de/projects/cdo/embedded/cdo_eca.pdf).
history: CDO commands (last cdo command first and separated with ;): timmean; yearmean
invar_bc_institution: Swedish Meteorological and Hydrological Institute
invar_bc_method: TimescaleBC, Description in deliverable C3S_D424.SMHI.1.3b
invar_bc_method_id: TimescaleBC v1.02
invar_bc_observation: EFAS-Meteo, https://ec.europa.eu/jrc/en/publication/eur-scientific-and-technical-research-reports/efas-meteo-european-daily-high-resolution-gridded-meteorological-data-set-1990-2011
invar_bc_observation_id: EFAS-Meteo
invar_bc_period: 1990-2018
data_quality: Testing of EURO-CORDEX data performed by ESGF nodes. Additional tests were performed when producing CII and ECVs in C3S_424_SMHI.
institution: SMHI
project_id: C3S_424_SMHI
references:
source: The RCM data originate from EURO-CORDEX (Coordinated Downscaling Experiment - European Domain, EUR-11) https://euro-cordex.net/.
invar_experiment_id: rcp45
invar_realisation_id: r1i1p1
invar_rcm_model_id: MPI-CSC-REMO2009-v1
variable_name: prAdjust_tmean
dimensions(sizes): x(1000), y(950), time(1), bnds(2)
variables(dimensions): float32 lon(y,x), float32 lat(y,x), float64 time(time), float64 time_bnds(time,bnds), float32 prAdjust_tmean(time,y,x)
groups:
Extract the variable:
t2m = data.variables['prAdjust_tmean']
Get dimensions assuming 3D: time, latitude, longitude
time_dim, lat_dim, lon_dim = t2m.get_dims()
time_var = data.variables[time_dim.name]
times = num2date(time_var[:], time_var.units)
latitudes = data.variables[lat_dim.name][:]
longitudes = data.variables[lon_dim.name][:]
output_dir = './'
And the error:
OverflowError Traceback (most recent call last)
<ipython-input-9-69e10e41e621> in <module>
2 time_dim, lat_dim, lon_dim = t2m.get_dims()
3 time_var = data.variables[time_dim.name]
----> 4 times = num2date(time_var[:], time_var.units)
5 latitudes = data.variables[lat_dim.name][:]
6 longitudes = data.variables[lon_dim.name][:]
C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\dates.py in num2date(x, tz)
509 if tz is None:
510 tz = _get_rc_timezone()
--> 511 return _from_ordinalf_np_vectorized(x, tz).tolist()
512
513
C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py in __call__(self, *args, **kwargs)
2106 vargs.extend([kwargs[_n] for _n in names])
2107
-> 2108 return self._vectorize_call(func=func, args=vargs)
2109
2110 def _get_ufunc_and_otypes(self, func, args):
C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py in _vectorize_call(self, func, args)
2190 for a in args]
2191
-> 2192 outputs = ufunc(*inputs)
2193
2194 if ufunc.nout == 1:
C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\dates.py in _from_ordinalf(x, tz)
329
330 dt = (np.datetime64(get_epoch()) +
--> 331 np.timedelta64(int(np.round(x * MUSECONDS_PER_DAY)), 'us'))
332 if dt < np.datetime64('0001-01-01') or dt >= np.datetime64('10000-01-01'):
333 raise ValueError(f'Date ordinal {x} converts to {dt} (using '
OverflowError: int too big to convert
Last part of the script is:
import os
# Path
path = "/home"
# Join various path components
print(os.path.join(path, "User/Desktop", "file.txt"))
# Path
path = "User/Documents"
# Join various path components
print(os.path.join(path, "/home", "file.txt"))
filename = os.path.join(output_dir, 'table.csv')
print(f'Writing data in tabular form to {filename} (this may take some time)...')
times_grid, latitudes_grid, longitudes_grid = [
x.flatten() for x in np.meshgrid(times, latitudes, longitudes, indexing='ij')]
df = pd.DataFrame({
'time': [t.isoformat() for t in times_grid],
'latitude': latitudes_grid,
'longitude': longitudes_grid,
't2m': t2m[:].flatten()})
df.to_csv(filename, index=False)
print('Done')
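As an aside (not part of the original script): the overflow happens because matplotlib's num2date interprets the numbers as Matplotlib date ordinals with its own epoch, while the file's time values follow the units attribute of the time variable. A hedged sketch that decodes the time axis with netCDF4's own num2date instead:
from netCDF4 import Dataset, num2date

data = Dataset('prAdjust_tmean.nc')
time_var = data.variables['time']
# netCDF4.num2date honours the variable's units (and calendar, if present)
times = num2date(time_var[:], units=time_var.units,
                 calendar=getattr(time_var, 'calendar', 'standard'))
print(times)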
Script 2:
The second script has a more useful approach, because it gives the user the opportunity to select the exact data from the NetCDF file rather than extracting everything. This script was written by GeoDelta Labs on YouTube. However, as in most cases, the script is "written" for a specific format of nc files and needs further tweaking to read an arbitrary nc file.
The code is like this:
import netCDF4
import glob
from netCDF4 import Dataset
import pandas as pd
import numpy as np

# Record all the years of the netCDF files into a Python list
all_years = []
for file in glob.glob('name_of_the_file.nc'):
    print(file)
    data = Dataset(file, 'r')
    data.variables.keys()
    print(data.__dict__)

for file in glob.glob('name_of_the_file.nc'):
    print(file)
    data = Dataset(file, 'r')
    time = data.variables['time']
    year = time.units[14:18]
    all_years.append(year)
all_years

# Creating an empty Pandas DataFrame covering the whole range of data
year_start = min(all_years)
end_year = max(all_years)
date_range = pd.date_range(start=str(year_start) + '-01-01',
                           end=str(end_year) + '-12-31',
                           freq='Y')
df = pd.DataFrame(0.0, columns=['Temperature'], index=date_range)
Now the interesting part of this second Python script. Usually NetCDF files cover vast geographical areas (for example the Northern Hemisphere or entire continents), which is not so useful for particular statistical analyses. Let's say the user wants to extract climate data from nc files for particular locations - countries, regions, cities, etc. GeoDelta Labs comes with a very useful approach.
It requires a small CSV file, created separately by the user, with three columns: Name (name of the region/country/location), Longitude and Latitude.
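For illustration only (this sample is not from the original post), a countries.csv matching the column names the code below actually reads - names, lat and long - could look like:
names,lat,long
Stockholm,59.33,18.07
Bucharest,44.43,26.10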
# Defining the location, lat, lon based on the csv data
cities = pd.read_csv('countries.csv')
cities
all_years

for index, row in cities.iterrows():
    location = row['names']
    location_latitude = row['lat']
    location_longitude = row['long']

    # Sorting the all_years python list
    all_years.sort()
    for yr in all_years:
        # Reading-in the data
        data = Dataset('prAdjust_tmean_abs_QM-EFAS-Meteo-EUR-11_MPI-M-MPI-ESM-LR_historical_r1i1p1_MPI-CSC-REMO2009-v1_na_1971-2000_grid5km_v1.nc', 'r')
        # Storing the lat and lon data of the netCDF file into variables
        lat = data.variables['lat'][:]
        lon = data.variables['lon'][:]
        # Squared difference between the specified lat,lon and the lat,lon of the netCDF
        sq_diff_lat = (lat - location_latitude)**2
        sq_diff_lon = (lon - location_longitude)**2
        # Identify the index of the min value for lat and lon
        min_index_lat = sq_diff_lat.argmin()
        min_index_lon = sq_diff_lon.argmin()
        # Accessing the average temperature data
        temp = data.variables['prAdjust_tmean']
        # Creating the date range for each year during each iteration
        start = str(yr) + '-01-01'
        end = str(yr) + '-12-31'
        d_range = pd.date_range(start=start, end=end, freq='D')

        for t_index in np.arange(0, len(d_range)):
            print('Recording the value for ' + location + ': ' + str(d_range[t_index]))
            df.loc[d_range[t_index], 'Temperature'] = temp[t_index, min_index_lat, min_index_lon]
    df.to_csv(location + '.csv')
The last part combines the data from the user-defined CSV (with exact locations) and the data extracted from the nc file, to create a new CSV with exactly the data needed by the user.
However, as always, the second script fails to "read" the years from my nc file and finds only the starting year 1970. Can someone explain what I am doing wrong? Could it be that the nc file contains only data for 1970? In the description of the data it is clearly specified that the file contains annual averages from 1970 until 2000.
If someone can help me solve this issue, I guess this script can be useful for all future users of netCDF files. Many thanks for your time.
Edit: @Robert Davy requested sample data. I've added a link to Google Drive, where the original nc files are stored and can be downloaded/viewed by anyone who accesses it: https://drive.google.com/drive/folders/11q9dKCoQWqPE5EMcM5ye1bNPqELGMzOA?usp=sharing
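A possible reason (an observation, not an answer from the original thread): year = time.units[14:18] just slices four characters out of the time variable's units string, which for a string like "seconds since 1970-01-01" is the reference (epoch) year, not a year the data actually covers. A hedged sketch that decodes the real dates from the time values instead:
from netCDF4 import Dataset, num2date

data = Dataset('name_of_the_file.nc', 'r')
time_var = data.variables['time']
dates = num2date(time_var[:], units=time_var.units,
                 calendar=getattr(time_var, 'calendar', 'standard'))
years = sorted({d.year for d in dates})
print(years)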

Need help writing code that will automatically write more code?

I need help with writing code for a work project. I have written a script that uses pandas to read an Excel file. I have a while-loop written to iterate through each row and append latitude/longitude data from the Excel file onto a map (Folium, OpenStreetMap).
The issue I've run into has to do with the GPS data. I download a CSV file with vehicle coordinates. On some of the vehicles I'm tracking, the GPS loses signal for whatever reason and doesn't come back online for hundreds of miles. This causes issues when I'm using line plots to track the vehicle movement on the map: I end up getting long straight lines running across cities, since Folium tries to connect the last GPS coordinate before the vehicle went offline with the next GPS coordinate available once the vehicle is back online, which could be hundreds of miles away, as shown here. I think if, every time the script finds a gap in GPS coords, I can have a new loop generated that basically starts a completely new line plot and appends it to the existing map, I should still see the entire vehicle route on the map but without the long lines trying to connect broken points together.
My idea is to have my script calculate the absolute value of the difference between each iteration of longitude data. If the difference between two points is greater than 0.01, I want my program to end the loop and start a new loop. This new loop would then need new variables initialized. I will not know how many new loops need to be created, since there's no way to predict how many times the GPS will go offline/online for each vehicle.
https://gist.github.com/tapanojum/81460dd89cb079296fee0c48a3d625a7
import folium
import pandas as pd

# Pulls CSV file from this location and adds headers to the columns
df = pd.read_csv('Example.CSV', names=['Longitude', 'Latitude'])

lat = (df.Latitude / 10 ** 7)  # Converting Lat/Lon into decimal degrees
lon = (df.Longitude / 10 ** 7)

zoom_start = 17  # Zoom level and starting location when map is opened
mapa = folium.Map(location=[lat[1], lon[1]], zoom_start=zoom_start)

i = 0
j = (lat[i] - lat[i - 1])
location = []
while i < len(lat):
    if abs(j) < 0.01:
        location.append((lat[i], lon[i]))
        i += 1
    else:
        break

# This section is where additional loops would ideally be generated

# Line plot settings
c1 = folium.MultiPolyLine(locations=[location], color='blue', weight=1.5, opacity=0.5)
c1.add_to(mapa)

mapa.save(outfile="Example.html")
Here's pseudocode for how I want to accomplish this:
1) Python reads the CSV
2) Converts Long/Lat into decimal degrees
3) Init location1
4) Runs while loop to append coords
5) If abs(j) >= 0.01, break loop
6) Init location(2,3,...)
7) Generates a new while i < len(lat): loop using location(2,3,...)
8) Repeats steps 5-7 while i < len(lat) (repeat as many times as there are instances of abs(j) >= 0.01)
9) Creates (c1, c2, c3,...) = folium.MultiPolyLine(locations=[location], color='blue', weight=1.5, opacity=0.5) for each location variable
10) Creates c1.add_to(mapa) for each c1, c2, c3... listed above
11) mapa.save
Any help would be tremendously appreciated!
UPDATE:
Working Solution
import folium
import pandas as pd

# Pulls CSV file from this location and adds headers to the columns
df = pd.read_csv('EXAMPLE.CSV', names=['Longitude', 'Latitude'])

lat = (df.Latitude / 10 ** 7)  # Converting Lat/Lon into decimal degrees
lon = (df.Longitude / 10 ** 7)

zoom_start = 17  # Zoom level and starting location when map is opened
mapa = folium.Map(location=[lat[1], lon[1]], zoom_start=zoom_start)

i = 1
location = []
while i < (len(lat) - 1):
    location.append((lat[i], lon[i]))
    i += 1
    j = (lat[i] - lat[i - 1])
    if abs(j) > 0.01:
        c1 = folium.MultiPolyLine(locations=[location], color='blue', weight=1.5, opacity=0.5)
        c1.add_to(mapa)
        location = []

# plot whatever is left in the final segment
if location:
    folium.MultiPolyLine(locations=[location], color='blue', weight=1.5, opacity=0.5).add_to(mapa)

mapa.save(outfile="Example.html")
Your while loop looks wonky: you only set j once, outside the loop. Also, I think you want a list of line segments. Did you want something like this?
i = 0
segment = 0
locations = []
while i < len(lat):
    locations.append([])  # start a new segment
    # add points to the current segment until all are
    # consumed or a disconnect is detected
    while i < len(lat):
        locations[segment].append((lat[i], lon[i]))
        i += 1
        if i < len(lat) and abs(lat[i] - lat[i - 1]) > 0.01:
            break
    segment += 1
When this is done, locations will be a list of segments, e.g.:
[ segment0, segment1, ..... ]
and each segment will be a list of points, e.g.:
[ (lat, lon), (lat, lon), ..... ]
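To finish the job (this part is not from the original answer; folium.PolyLine is used purely as an illustration), each segment can then be added to the map in a short loop so that gaps are never bridged:
# draw one polyline per contiguous segment
for seg in locations:
    if len(seg) > 1:
        folium.PolyLine(seg, color='blue', weight=1.5, opacity=0.5).add_to(mapa)
mapa.save(outfile="Example.html")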

Parsing CSVs for only one value

I am trying to parse data from CSV files. The files are in a folder, and I want to extract the data and write it to the DB. However, the CSVs are not set up in a table format. I know how to import CSVs into the DB with the Foreach Loop Container, adding Data Flow Tasks, and importing with OLE DB Destination.
The problem is getting just one value out of these CSVs. The format of the file is as follows:
Title Title 2
Date saved ##/##/#### ##:## AM
Comment
[ Main ]
No. Measure Output Unit of measure
1 Name 8 µm
Count 0 pcs
[ XY Measure ]
X
Y
D
[ Area ]
No. Area Unit Perimeter Unit
All I want is the output, which is "8", plus the name of the file (to use as the name of the result or to add as a column), and the date and time to add to their own columns.
I am not sure which direction to head in, and I hope someone has some things for me to look into. Originally, I wasn't sure if I should do the parsing externally (Python) before using SQL Server. If anyone knows another way I should use to get this done, please let me know. Sorry for the unclear post earlier.
The expected outcome:
Filename Date Time Outcome
jnnnnnnn ##/##/#### ##:## 8
I'd try this:
filename = ...  # fill in from the path of the file you're parsing
# define appropriate vars
for row in csv_file:
    if row.find('Date saved') != -1:
        row = row.replace('Date saved ', '')
        date_saved = row[0:row.find(' ')]
        row = row.replace(date_saved + ' ', '')
        time = row[0:row.find(' ')]
    elif row.find(u"\u03BC") != -1:
        split_row = row.split(' ')
        outcome = split_row[2]
# add filename, date_saved, time, outcome to the data that will go in the DB
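A slightly fuller sketch of the same idea (illustrative only; the file layout and field positions are assumptions based on the sample above):
import os
import io

def parse_measurement_file(path):
    # returns (filename, date_saved, time_saved, outcome) for one file
    filename = os.path.splitext(os.path.basename(path))[0]
    date_saved = time_saved = outcome = None
    with io.open(path, encoding='utf-8') as fh:
        for row in fh:
            if row.startswith('Date saved'):
                parts = row.split()
                # assumed layout: "Date saved ##/##/#### ##:## AM"
                date_saved, time_saved = parts[2], parts[3] + ' ' + parts[4]
            elif u"\u03BC" in row:           # the measure row contains the µm unit
                outcome = row.split()[2]     # assumed layout: "1 Name 8 µm" -> "8"
    return filename, date_saved, time_saved, outcome

print(parse_measurement_file('jnnnnnnn.csv'))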

Loop through netcdf files and run calculations - Python or R

This is my first time using netCDF and I'm trying to wrap my head around working with it.
I have multiple version 3 netCDF files (NOAA NARR air.2m daily averages for an entire year). Each file spans a year between 1979 and 2012. They are 349 x 277 grids with approximately 32 km resolution. The data was downloaded from here.
The dimension is time (hours since 1/1/1800) and my variable of interest is air. I need to calculate accumulated days with a temperature < 0. For example:
Day 1 = +4 degrees, accumulated days = 0
Day 2 = -1 degrees, accumulated days = 1
Day 3 = -2 degrees, accumulated days = 2
Day 4 = -4 degrees, accumulated days = 3
Day 5 = +2 degrees, accumulated days = 0
Day 6 = -3 degrees, accumulated days = 1
I need to store this data in a new netCDF file. I am familiar with Python and somewhat with R. What is the best way to loop through each day, check the previous day's value, and based on that, output a value to a new netCDF file with the exact same dimensions and variable... or perhaps just add another variable to the original netCDF file with the output I'm looking for?
Is it best to leave all the files separate or combine them? I combined them with ncrcat and it worked fine, but the file is 2.3 GB.
Thanks for the input.
My current progress in Python:
import numpy
import netCDF4

# change my working dir
f = netCDF4.Dataset('air7912.nc', 'r')
for a in f.variables:
    print(a)

# output:
lat
long
x
y
Lambert_Conformal
time
time_bnds
air

f.variables['air'][1, 1, 1]
# output:
298.37473
To help me understand this better, what type of data structure am I working with? Is ['air'] the key in the above example, and are [1, 1, 1] also keys used to get the value 298.37473? How can I then loop through [1, 1, 1]?
You can use the very nice MFDataset feature in netCDF4 to treat a bunch of files as one aggregated file, without the need to use ncrcat. So your code would look like this:
from pylab import *
import netCDF4

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.19??.nc')
# print variables
f.variables.keys()
atemp = f.variables['air']
print atemp
ntimes, ny, nx = shape(atemp)
cold_days = zeros((ny, nx), dtype=int)
for i in xrange(ntimes):
    cold_days += atemp[i, :, :].data - 273.15 < 0
pcolormesh(cold_days)
colorbar()
And here's one way to write the file (there might be easier ways):
# create NetCDF file
nco = netCDF4.Dataset('/usgs/data2/notebook/cold_days.nc', 'w', clobber=True)
nco.createDimension('x', nx)
nco.createDimension('y', ny)

cold_days_v = nco.createVariable('cold_days', 'i4', ('y', 'x'))
cold_days_v.units = 'days'
cold_days_v.long_name = 'total number of days below 0 degC'
cold_days_v.grid_mapping = 'Lambert_Conformal'

lono = nco.createVariable('lon', 'f4', ('y', 'x'))
lato = nco.createVariable('lat', 'f4', ('y', 'x'))
xo = nco.createVariable('x', 'f4', ('x'))
yo = nco.createVariable('y', 'f4', ('y'))
lco = nco.createVariable('Lambert_Conformal', 'i4')

# copy all the variable attributes from the original file
for var in ['lon', 'lat', 'x', 'y', 'Lambert_Conformal']:
    for att in f.variables[var].ncattrs():
        setattr(nco.variables[var], att, getattr(f.variables[var], att))

# copy variable data for lon, lat, x and y
lono[:] = f.variables['lon'][:]
lato[:] = f.variables['lat'][:]
xo[:] = f.variables['x'][:]
yo[:] = f.variables['y'][:]

# write the cold_days data
cold_days_v[:, :] = cold_days

# copy global attributes from the original file
for att in f.ncattrs():
    setattr(nco, att, getattr(f, att))
nco.Conventions = 'CF-1.6'

nco.close()
If I look at the resulting file in the Unidata NetCDF-Java Tools-UI GUI, it seems to be okay.
Also note that here I just downloaded two of the datasets for testing, so I used
f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.19??.nc')
as an example. For all the data, you could use
f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.????.nc')
or
f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.*.nc')
Here is an R solution.
infiles <- list.files("data", pattern = "nc", full.names = TRUE, include.dirs = TRUE)
outfile <- "data/air.colddays.nc"
library(raster)
r <- raster::stack(infiles)
r <- sum((r - 273.15) < 0)
plot(r)
I know this is rather late for this thread from 2013, but I just want to point out that the accepted solution doesn't answer the exact question posed. The question seems to want the length of each continuous period of temperatures below zero (note that in the question the counter resets if the temperature exceeds zero), which can be important for climate applications (e.g. farming), whereas the accepted solution only gives the total number of days in a year that the temperature is below zero. If the latter is really what mkmitchell wants (it has been accepted as the answer), then it can be done from the command line with CDO without having to worry about netCDF input/output:
cdo timsum -lec,273.15 in.nc out.nc
so a looped script would be:
files=`ls *.nc`  # pick up all the netcdf files in a directory
for file in $files ; do
    # I use 273.15 since from the question it seems T is in Kelvin
    cdo timsum -lec,273.15 $file ${file%???}_numdays.nc
done
If you then want the total number over the whole period, you can cat the much smaller _numdays files instead:
cdo cat *_numdays.nc total.nc
cdo timsum total.nc total_below_zero.nc
But again, the question seems to want accumulated days per event, which is different and not provided by the accepted answer.
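For completeness, a minimal numpy sketch of the per-event counter described in the question (the count resets whenever the temperature goes above freezing); this is an illustration, not code from the thread, and assumes atemp is the (time, y, x) air temperature array in Kelvin from the accepted answer:
import numpy as np

ntimes, ny, nx = atemp.shape
streak = np.zeros((ny, nx), dtype=int)              # current run of consecutive cold days
accumulated = np.zeros((ntimes, ny, nx), dtype=int)
for i in range(ntimes):
    cold = (atemp[i, :, :] - 273.15) < 0
    streak = np.where(cold, streak + 1, 0)          # extend the run, or reset where it warmed up
    accumulated[i] = streak
# accumulated[i] now matches the "accumulated days" column in the question for day i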
