Read .nc (netcdf) files using python

Read .nc (netcdf) files using python - python

I am trying to learn how to read .nc (netcdf) files using Python in the most easiest/fastest way. I heard that it can be done with 3 lines of code but I really don't know how.
I am running the MITgcm numerical model. I'm trying to get an easy way to visualize the output data in the same way as programs like NCview does but with Python, so I can customise the parameters to read and everything.
I found this:
from matplotlib import pyplot as plt
import pandas as pd
import netCDF4
fp='uwstemp.nc'
nc = netCDF4.Dataset(fp)
plt.imshow(nc['Temp'][1,:,0,:])
plt.show()
It worked roughly like I want it, but I would like to understand word by word what is it doing. I guess 'Temp' is one of my variables, but I don't know how to figure out what all my variables are.
Specially, I don't understand plt.imshow(nc['Temp'][1,:,0,:]) thhat [1,:,0,:] I tried to change it and does not compile; but I don't understand what is it doing and why this numbers.

I use the MITgcm too. Say you have your state.nc output.
First of all make sure you import all you need:
from scipy.io import netcdf
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
The easiest way to read the data is:
file2read = netcdf.NetCDFFile(path+'state.nc','r')
temp = file2read.variables[var] # var can be 'Theta', 'S', 'V', 'U' etc..
data = temp[:]*1
file2read.close()
Then a quick way to plot say layer z at time t is:
plt.contourf(data[t,z,:,:])
To answer your question I commented the code:
from matplotlib import pyplot as plt # import libraries
import pandas as pd # import libraries
import netCDF4 # import libraries
fp='uwstemp.nc' # your file name with the eventual path
nc = netCDF4.Dataset(fp) # reading the nc file and creating Dataset
""" in this dataset each component will be
in the form nt,nz,ny,nx i.e. all the variables will be flipped. """
plt.imshow(nc['Temp'][1,:,0,:])
""" imshow is a 2D plot function
according to what I have said before this will plot the second
iteration of the vertical slize with y = 0, one of the vertical
boundaries of your model. """
plt.show() # this shows the plot
If you want to check the various dimensions of your data so you know what you can plot simply do print(nc['Temp'].shape)

For netCDF4 files (with python 3), use:
import netCDF4
file2read = netCDF4.Dataset(cwd+'\filename.nc','r')
var1 = file2read.variables['var1'] # access a variable in the file
where cwd is my current working directory for getting the file path for the .nc file in order to read it:
import os
cwd = os.getcwd()
I am using Windows, so file directory will be different than for Mac or Linux.
To look at all of the variable keys:
print(file2read.variables.keys())
Which will give an output like this:
dict_keys(['ap', 'ap_bnds', 'b', 'b_bnds', 'bnds', 'ch4', 'lat', 'lat_bnds', 'lev', 'lev_bnds', 'lon', 'lon_bnds', 'time', 'time_bnds'])
Or to look at all of the variables in your netcfd4 file, you can just print 'file2read':
print(file2read)
And the output will include something like this (look at the end specifically):
source_id: GFDL-ESM4
source_type: AOGCM AER CHEM BGC
sub_experiment: none
sub_experiment_id: none
title: NOAA GFDL GFDL-ESM4 model output prepared for CMIP6 update of RCP8.5 based on SSP5
variable_id: ch4
variant_info: N/A
references: see further_info_url attribute
variant_label: r1i1p1f1
dimensions(sizes): lev(49), bnds(2), time(1032), lat(180), lon(288)
variables(dimensions): float64 ap(lev), float64 ap_bnds(lev, bnds), float64 b(lev), float64 b_bnds(lev, bnds), float64 bnds(bnds), float32 ch4(time, lev, lat, lon), float64 lat(lat), float64 lat_bnds(lat, bnds), float64 lev(lev), float64 lev_bnds(lev, bnds), float64 lon(lon), float64 lon_bnds(lon, bnds), float64 time(time), float64 time_bnds(time, bnds)
You can notice that the last part includes the dimensions of the variables along with the type and name of the variables.
Check out this website for more info and examples:
https://www.earthinversion.com/utilities/reading-NetCDF4-data-in-python/

If you are working in Linux, my package nctoolkit (toolkit.readthedocs.io/en/latest/) offers similar functionality to ncview, but within Python. It can autoplot the contents of NetCDF files either within a Jupyter notebook or a web browser. The following should work for the data given:
import nctoolkit as nc
fp='uwstemp.nc'
data = nc.open_data(fp)
data.plot()

Related

Interactive Xarray dataset raster visualisation app using Panel and hvplot

I am trying to replicate the Glaciers Demo using an Xarray of geospatial data. I am able to create pretty much exactly what I want but I am trying to create a Panel app that allows the user to select the data_vars, each of which has different dimensions that I want make interactable, and visualize on an interactive map with at least the continents contour. Here is what my Xarray Dataset looks like :
def plot(field):
return xds[field].hvplot.image().opts(cmap='jet',height=650,width=1300,data_aspect=1)
interact(plot, field = list(xds.data_vars))
and here is what the code above produces in a notebook :
I would like to integrate the selector for the data_vars and then depending on its dimensions have interactive maps with controls for all its dimensions (ES has (time, pres1, lat, lon) while P0 has only (time, lat, lon)) and I would like to have the controls in the sidebar and the plots in the main of the following template :
from turtle import width
from matplotlib.pyplot import title
import panel as pn
import numpy as np
import holoviews as hv
from panel.template import DefaultTheme
from pathlib import Path
import fstd2nc
import hvplot.xarray
import xarray as xr
from unicodedata import name
import hvplot
import param
from panel.interact import interact
pn.extension(sizing_mode='stretch_width')
bootstrap = pn.template.MaterialTemplate(title='Material Template', theme=DefaultTheme, )
glob_path = Path(r"C:\Users\spart\Documents\Anaconda-Work-Dir")
file_list = [str(pp).split('\\')[-1] for pp in glob_path.glob("2022*")]
phase = pn.widgets.FloatSlider(name="Phase", start=0, end=np.pi)
fileSel = pn.widgets.Select(name='Select File', options=file_list)
#pn.depends(fileSel=fileSel)
def selectedFile(fileSel):
base_path = r"C:\Users\spart\Documents\Anaconda-Work-Dir\{}".format(fileSel)
return pn.widgets.StaticText(name='Selected', value=base_path)
#pn.depends(fileSel=fileSel)
def dataXArray(fileSel):
base_path = r"C:\Users\spart\Documents\Anaconda-Work-Dir\{}".format(fileSel)
xds = fstd2nc.Buffer(base_path).to_xarray()
return xds.ES.hvplot( width=500)
bootstrap.sidebar.append(fileSel)
bootstrap.sidebar.append(selectedFile)
bootstrap.main.append(
pn.Row(
pn.Card(hv.DynamicMap(dataXArray), title='Plot'),
)
)
bootstrap.show()
EDIT : Here is a link to an example dataset which can be loaded with the following code
xds = fstd2nc.Buffer(PATH_TO_FILE).to_xarray()

Without the data file I can't easily run the code, but some observations:
If using bare functions like this rather than classes, I'd recommend using pn.bind rather than pn.depends; it really helps get the code organized better.
For a simple application like this, I'd use hvPlot .interactive: https://hvplot.holoviz.org/user_guide/Interactive.html
I can't seem to find this in the docs, but you can pull out the widgets from the result of dataXArray (or any other hvplot or holoviews object) using .widgets(), and you can then put that in the sidebar. You can then pull out just the plot using .panel(), and put that in the main area.
If that helps you get it done, then great; if not please post a sample data file or two so that it's runnable, and I can look into it further. And please submit a PR to the docs once you get it working so that future users have less trouble!

Preserving the WCS information of a FITS file when rebinned

Aim : Rebin an existing image (FITS file) and write the new entries into a new rebinned image (also a FITS file).
Issue : Rebinned FITS file and the original FITS file seem to have mismatched co-ordinates (figure shown later in the question).
Process : I will briefly describe my process to shed more light. The first step is to read the existing fits file and define numpy arrays
from math import *
import numpy as np
import matplotlib.pyplot as plt
from astropy.visualization import astropy_mpl_style
from astropy.io import fits
import matplotlib.pyplot as plt
%matplotlib notebook
import aplpy
from aplpy import FITSFigure
file = 'F0621_HA_POL_0700471_HAWDHWPD_PMP_070-199Augy20.fits'
hawc = fits.open(file)
stokes_i = np.array(hawc[0].data)
stokes_i_rebinned = congrid(stokes_i,newdim,method="neighbour", centre=False, minusone=False)
Here "congrid" is a function I have used for near-neigbhour rebinning that rebins the original array to a new dimension given by "newdim". Now the goal is to write this rebinned array back into the FITS file format as a new file. I have several more such arrays but for brevity, I just include one array as an example. To keep the header information same, I read the header information of that array from the existing FITS file and use that to write the new array into a new FITS file. After writing, the rebinned file can be read just like the original :-
header_0= hawc[0].header
fits.writeto("CasA_HAWC+_rebinned_congrid.fits", rebinned_stokes_i, header_0, overwrite=True)
rebinned_file = 'CasA_HAWC+_rebinned_congrid.fits'
hawc_rebinned= fits.open(rebinned_file)
To check how the rebinned image looks now I plot them
cmap = 'rainbow'
stokes_i = hawc[0]
stokes_i_rebinned = hawc_rebinned[0]
axi = FITSFigure(stokes_i, subplot=(1,2,1)) # generate FITSFigure as subplot to have two axes together
axi.show_colorscale(cmap=cmap) # show I
axi_rebinned = FITSFigure(stokes_i_rebinned, subplot=(1,2,2),figure=plt.gcf())
axi_rebinned.show_colorscale(cmap=cmap) # show I rebinned
# FORMATTING
axi.set_title('Stokes I (146 x 146)')
axi_rebinned.set_title('Rebinned Stokes I (50 x 50)')
axi_rebinned.axis_labels.set_yposition('right')
axi_rebinned.tick_labels.set_yposition('right')
axi.tick_labels.set_font(size='small')
axi.axis_labels.set_font(size='small')
axi_rebinned.tick_labels.set_font(size='small')
axi_rebinned.axis_labels.set_font(size='small')
As you see for the original and rebinned image, the X,Y co-ordinates seem mismatched and my best guess was that WCS (world co-ordinate system) for the original FITS file wasn't properly copied for the new FITS file, thus causing any mismatch. So how do I align these co-ordinates ?
Any help will be deeply appreciated ! Thanks

I'm posting my answer in an astropy slack channel here should this be useful for others.
congrid will not work because it doesn't include information about the WCS. For example, your CD matrix is tied to the original image, not the re-binned set. There are a number of way to re-bin data with proper WCS. You might consider reproject although this often requires another WCS header to re-bin to.
Montage (though not a Python tool but has Python wrappers) is potentially another way.

As #astrochun already said, your re-binning function does not adjust the WCS of the re-binned image. In addition to reproject and Montage, astropy.wcs.WCSobject has slice() method. You could try using it to "re-bin" the WCS like this:
from astropy.wcs import WCS
import numpy as np
wcs = WCS(hawc[0].header, hawc)
wcs_rebinned = wcs.slice((np.s_[::2], np.s_[::2]))
wcs_hdr = wcs_rebinned.to_header()
header_0.update(wcs_hdr) # but watch out for CD->PC conversion
You should also make a "real" copy of hawc[0].header in header_0= hawc[0].header, for example as header_0= hawc[0].header.copy() or else header_0.update(wcs_hdr) will modify hawc[0].header as well.

How to plot the free energy landscape of protein structure?

I understand the question is not appropriate for this platform, but I can try if I can get some hints,
I've been trying to plot the free energy landscape of a protein structure ("Chignolin"). I'm completely run out of ideas how to do that!! I've MD simulation trajectory file Trajectory file and using pyemma to plot the energy landscape. But I'm getting the error
""
TypeError: plot_free_energy() takes from 2 to 20 positional arguments but 28 were given
""
Could someone figure out where the problem lies?
Here is my code
import numpy as np
import matplotlib.pyplot as plt
import mdtraj as md
from itertools import combinations
from simtk.openmm.app.topology import Topology
from simtk.openmm.app.simulation import Simulation
from simtk.openmm.app.dcdreporter import DCDReporter
from simtk.openmm.app.statedatareporter import StateDataReporter
import simtk.unit as u
import simtk.openmm as mm
import simtk.openmm.openmm as openmm
import pyemma.coordinates as coor
import pyemma
pdb = md.load('1uao_Calpha.pdb')
feat = pyemma.coordinates.data.MDFeaturizer(pdb)
feat.add_distances_ca(periodic=False)
files = pyemma.coordinates.load('traj/DESRES/CLN025-0-c-alpha/CLN025-0-c-alpha-005.dcd', features = feat)
pyemma.plots.plot_free_energy(*files.T)
plt.show()
Here is the another pdb file.

I recommend you start reading the documentation, especially the "learn PyEMMA" section containing Jupyter notebooks teaching you the work-flow to extract properly weighted "pseudo" free-energy surfaces. Usually these surfaces are drawn into the dimensions of the first two slowest dynamical processes, but you can think of any other combination as well. These dimensions are defined by a TICA or VAMP projection, which are basically methods to extract the slow modes from your data, in case of proteins this contains folding and rare events.
As a primer I suggest reading this tutorial first, as it gives you a brief overview how to load and process your data to extract the slow modes. Note that this not yet contain Markov state modelling, so read further in the other examples to learn about that.

Error reading Kepler FITS file using astropy

I was trying to read fits files from Kepler FITS files (Received from this URL https://archive.stsci.edu/pub/kepler/lightcurves/0007/000757076/) using astropy. Below are the set of commands I was trying to read the file:
from astropy.io import fits
fits_image_filename = fits.util.get_testdata_filepath(r'O:\MyWorks\keplar-test\kplr100000925-2009166043257_llc.fits')
But the above command produced this error:
I am not sure how to solve this error. My target is to read keplar data then plot this and/or convert this to CSV.

This: fits.util.get_testdata_filepath(r'O:\MyWorks\keplar-test\kplr100000925-2009166043257_llc.fits') is not the correct function for opening a file.
You should use fits.open('file.fits'), or if this is table data, as you imply, Table.read('file.fits')
See the note at the top of the FITS documentation

%matplotlib inline
from astropy.io import fits
import matplotlib
import matplotlib.pyplot as plt
#My required file has been downloaded in the following path of my HD,
"~/projects/eclipsing_binary/A/mastDownload/HLSP/hlsp_qlp_tess_ffi_s0018-0000000346784049_tess_v01_llc/hlsp_qlp_tess_ffi_s0018-000000346784049_tess_v01_llc.fits". Using linux command open and see the list
of the files in the directory.
%cd ~/projects/eclipsing_binary/A/mastDownload/HLSP/
hlsp_qlp_tess_ffi_s0018-0000000346784049_tess_v01_llc/
%ls
#Now plot the required file in simple way,
import lightkurve as lk
file_r = 'hlsp_qlp_tess_ffi_s0018-0000000346784049_tess_v01_llc.fits'
lr = lk.read(file_r)
lr.plot()

How do I plot GFS grib2 data with Python?

I would like to have a chart with the temperatures for the following days on my website, and the Global Forecasting System meets my needs the most. How do I plot the GRIB2 data in matplotlib and create a PNG image from the plot?
I've spend hours of searching on the internet, asking people who do know how to do this (they where not helpfull at all) and I don't know where to start.
GFS data can be found here: ftp://ftp.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/
If possible, I'd like it to be lightweight and without loosing too much server space.

When you think lightweight about data usage and storage, you may consider to use other data forms than GRIB. GRIB-files usually contain worldwide data, which is pretty useless when you only want to plot for a specific domain.
I can strongly recommend to use data from the NOAA-NCEP opendap data server. You can gain data from this server using netCDF4. Unfortunately, this server is known to be unstable at some times which may causes delays in refreshing runs and/or malformed datasets. Although, in 95% of the time, I have acces to all the data I need.
Note: This data server may be slow due to high trafficking after a release of a new run. Acces to the data server can be found here: http://nomads.ncdc.noaa.gov/data.php?name=access#hires_weather_datasets
Plotting data is pretty easy with Matplotlib and Basemap toolkits. Some examples, including usage of GFS-datasets, can be found here: http://matplotlib.org/basemap/users/examples.html

Basically, there are 2 steps:
use wgrib to extract selected variables from grib2 data, and save into NetCDF file. Although there are some API such as pygrib, yet I found it less buggy to use the command line tool directly. some useful links:
install: http://www.cpc.ncep.noaa.gov/products/wesley/wgrib2/compile_questions.html
tricks: http://www.ftp.cpc.ncep.noaa.gov/wd51we/wgrib2/tricks.wgrib2
For example, extract temperature and humidity:
wgrib2 test.grb2 -s | egrep '(:RH:2 m above ground:|:TMP:2 m above ground:)'|wgrib2 -i test.grb2 -netcdf test.nc
use Python libraries to process NetCDF files, example code may look like this:
import warnings
warnings.filterwarnings("ignore")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
% matplotlib inline
from netCDF4 import Dataset
from mpl_toolkits.basemap import Basemap
from pyproj import Proj
import matplotlib.cm as cm
import datetime
file = "test.nc"
rootgrp = Dataset(file, "r")
x = rootgrp['longitude'][:] # 0-359, step = 1
y = rootgrp['latitude'][:] # -90~90, step =1
tmp = rootgrp['TMP_2maboveground'][:][0] # shape(181,360)
dt = datetime.datetime(1970,1,1) + datetime.timedelta(seconds = rootgrp['time'][0])
fig = plt.figure(dpi=150)
m = Basemap(projection='mill',lat_ts=10,llcrnrlon=x.min(),
urcrnrlon=x.max(),llcrnrlat=y.min(),urcrnrlat=y.max(), resolution='c')
xx, yy = m(*np.meshgrid(x,y))
m.pcolormesh(xx,yy,tmp-273.15,shading='flat',cmap=plt.cm.jet)
m.colorbar(location='right')
m.drawcoastlines()
m.drawparallels(np.arange(-90.,120.,30.), labels=[1,0,0,0], fontsize=10)
m.drawmeridians(np.arange(0.,360.,60.), labels=[0,0,0,1], fontsize=10)
plt.title("{}, GFS, Temperature (C) ".format(dt.strftime('%Y-%m-%d %H:%M UTC')))
plt.show()

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Read .nc (netcdf) files using python - python

Related

Interactive Xarray dataset raster visualisation app using Panel and hvplot

Preserving the WCS information of a FITS file when rebinned

How to plot the free energy landscape of protein structure?

Error reading Kepler FITS file using astropy

How do I plot GFS grib2 data with Python?

Categories

Resources