I am just a Python beginner and I am having some trouble extracting data from a netCDF file.
For example, in this code I was trying to create a variable with the temperature, but it is not appearing in the Variable Explorer. Does anyone know why?
import netCDF4 as nc
import numpy as np
fn = 'C:/Users/Public/Documents/Python Scripts/MERRA2_300.tavgM_2d_slv_Nx.201001.nc4'
ds = nc.Dataset(fn)
Use print(ds.variables.keys()) to see all the variables in the netCDF file.
Printing that statement will also show the key for the temperature variable in the file.
Assign it to a variable as shown: temp_variable = ds.variables["#temp#"]
Note: replace #temp# with the key for the temperature variable.
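Putting it all together, a minimal sketch (here 'T2M' is an assumption; MERRA-2 single-level files usually store 2-metre air temperature under that key, so substitute whatever key your print shows):

import netCDF4 as nc
import numpy as np

fn = 'C:/Users/Public/Documents/Python Scripts/MERRA2_300.tavgM_2d_slv_Nx.201001.nc4'
ds = nc.Dataset(fn)

# list every variable key in the file
print(ds.variables.keys())

# 'T2M' is an assumed key -- use the temperature key printed above
temp_variable = ds.variables['T2M']

# pull the values into a plain numpy array; Spyder's Variable Explorer may
# hide netCDF4 Variable objects, but it does display regular arrays
temp = np.array(temp_variable[:])
print(temp.shape)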
I am trying to load a large number of data files from the same folder in Python. The ultimate goal here is to simply choose which file I would like to use in calculations, rather than individually opening files.
Here is what I have. This seems to work in opening the data in the files, but I am having a hard time choosing a specific file I want to work with (and assigning a value to each column in each file).
import astropy
import numpy as np
import matplotlib.pyplot as plt
dir = '/S34_east_tfa/'
import glob, os
os.chdir(dir)
for file in glob.glob("*.data"):
    data = np.loadtxt(file)
    print(data)
    Time = data[:, 0]
Use a Python dictionary instead of overwriting the results in the data variable inside your loop.
data_dict = dict()
for file in glob.glob("*.data"):
    data_dict[file] = np.loadtxt(file)
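Then you can choose whichever file you want by name and slice out its columns, e.g. (the file name below is a hypothetical placeholder):

chosen = data_dict['S34_east_001.data']  # hypothetical file name
Time = chosen[:, 0]    # first column
values = chosen[:, 1]  # second column, if the file has one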
Is this what you were looking for?
I have a one-column Excel file. I want to import all its values into a variable x (something like x = [1, 2, 3, 4.5, -6, ...]), then use this variable to run numpy.correlate(x, x, mode='full') to get the autocorrelation, after importing numpy.
When I manually enter x = [1, 2, 3, ...] it works fine, but when I try to copy-paste all the values into x = [], it gives me a NameError: name 'NO' is not defined.
Can someone tell me how to go around doing this?
You can use Pandas to import a CSV file with the pd.read_csv function.
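A minimal sketch, assuming the Excel sheet was saved as data.csv with a single column and no header row (both the file name and the no-header assumption are placeholders; adjust to your file):

import numpy as np
import pandas as pd

# header=None because the file has no header row (assumption)
df = pd.read_csv('data.csv', header=None)

# take the first (and only) column as a plain numpy array
x = df[0].to_numpy()

# autocorrelation, as in the question
autocorr = np.correlate(x, x, mode='full')

pd.read_excel('data.xlsx', header=None) works the same way if you would rather read the Excel file directly.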
I have a .dat file and I want to read its content in Python, so I use the following code:
import numpy as np
bananayte=np.fromfile("U04_banana-ytest.dat",dtype=float)
print(bananayte)
However, my initial data should look like "1.0000000e+00", while the output looks like "1.39804066e-76". What happened, and what should I do to get the correct values? Thanks!
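If the .dat file is actually plain text (which values like "1.0000000e+00" suggest), np.fromfile with dtype=float reinterprets the text bytes as raw binary doubles, which produces garbage such as 1.39804066e-76. A minimal sketch of reading it as text instead, assuming a whitespace-separated file:

import numpy as np

# np.loadtxt parses ASCII numbers rather than reinterpreting raw bytes
bananayte = np.loadtxt("U04_banana-ytest.dat")
print(bananayte)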
I have around 4 TB of MERIS time series data, which comes in netCDF format.
So I have a lot of netCDF files containing several 'variables'.
The netCDF format is new to me, and although I've read a lot about netCDF processing, I don't have a clear idea of how to do it. The question 'Combining a large amount of netCDF files' deals somewhat with my problem, but I didn't get far with it. My approach was to first mosaic, then stack, and finally take the mean of every pixel.
One file contains the following 32 variables.
Here's also the ncdump output of one .nc file for one day:
http://www.filedropper.com/ncdumpoutput
I managed to read the files, extract the variables I want (variable #32), and put them into a list using the following code:
import netCDF4 as nc

# files_in is the list of input .nc file paths
l = list()
for i in files_in:
    # read netCDF file
    dset = nc.Dataset(i, mode='r')
    # save variable
    var = dset.variables['vegetation_index_mean'][:]
    # write all loop outputs into a list
    l.append(var)
    # close netCDF file
    dset.close()
The list now contains 24 masked arrays for different locations on the same date.
Every time I try to print the contents of the list, Spyder freezes. After every subsequent command, Spyder first freezes for five seconds before responding.
My goal is to make a time series analysis over a specific time frame (every date is stored in a single .nc file). So my plan was to mosaic the variables in the list (is this possible?), treating them as raster bands, then process additional dates and take the mean for every pixel (1800 x 1800).
Maybe my whole approach is wrong? Can I treat these 'variables' like raster bands?
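For the stack-then-mean part of that plan, a minimal sketch (per_date_mosaics is a hypothetical list holding one mosaicked 1800 x 1800 masked array per date):

import numpy as np

# stack along a new time axis: shape becomes (n_dates, 1800, 1800)
stacked = np.ma.stack(per_date_mosaics)

# per-pixel mean over time; masked (no-data) cells are ignored
pixel_mean = stacked.mean(axis=0)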
I'm not sure whether the following answer meets your needs, as this procedure is designed to process time series, is fairly manual, and you have 4 TB of data...
So I apologize if this doesn't help.
This is for Python 2.7:
First, import all the modules needed:
import tkFileDialog
from netCDF4 import Dataset
import matplotlib.pyplot as plt
Second, parse multiple nc files:
n = []
filename = {}
filename = tkFileDialog.askopenfilenames()
filename = list(filename)
n = len(filename)
Third, read the nc files and classify data and metadata within dictionaries, using a loop:
wtr_tem = {}  # empty dict for the sea water temperature variable
fh = {}       # empty dict for the nc file handles
vars = {}     # empty dict for each file's variable keys
for i in range(n):
    # remove unicode so the following commands can run
    filename[i] = filename[i].decode('unicode_escape').encode('ascii', 'ignore')
    filename1 = ''.join(filename[i])  # converts list to string
    fh[i] = Dataset(filename1, mode='r')  # create the file handle
    vars[i] = fh[i].variables.keys()  # returns a list of the file's variables
    wtr_tem[i] = fh[i].variables['WTR_TEM']
    # plot each variable in a different figure
    plt.plot(wtr_tem[i], 'r-')
    plt.xlabel(fh[i].title)  # add the specific title from each nc file
    plt.show()
I hope it helps somebody.
I am working on a script in Python to which I can pipe ls output and open all the files I want to work with using scipy.io; then I want to take all the imported data and save it into a .mat file (again using scipy.io). I have had success importing the data and assigning it to a dictionary for export, but when I load my output file in MATLAB, none of the data looks at all the same.
The data I am importing all has a lat/lon coordinate attached to it so I will use that data as an example. Data is coming from a netCDF (.nc) file:
#!/usr/bin/python
import sys
import scipy.io as sio
import numpy as np

# script takes absolute path inputs (or absolute paths are desirable)
# to list absolute paths on a Linux system use the command:
#   ls -d $PWD/*   or, if the files are not in $PWD:   ls -d /absolute_file_path/*
# and then pipe the standard output of that to the input of this script

# initialize dictionary (param) and cycle number (i.e. datafile number)
cycle = []
param = {}

# read each file name from stdin
for filename in sys.stdin:
    filename = filename.strip()  # drop the trailing newline from the piped name
    fh = sio.netcdf.netcdf_file(filename, 'r')
    # load in standard variables (coordinates)
    latitude = fh.variables['LATITUDE'][:]
    longitude = fh.variables['LONGITUDE'][:]
    # close file
    fh.close()
    # get file number (cycle number)
    cycle = filename[-7:-4]
    # add the latest imported coordinates to the dictionary
    latvar = 'lat' + cycle
    lonvar = 'lon' + cycle
    param.update({latvar: latitude})
    param.update({lonvar: longitude})

# export dictionary to a .mat file
sio.savemat('test.mat', param)
When I print the values to check whether they are imported correctly, I get reasonable values, but when I open my exported values in MATLAB, this is an example of the output:
>> lat001
lat001 =
-3.5746e-133
and other variables have similarly extreme exponents (sometimes very small, as in this example, and sometimes extremely large, ~1e100).
I have tried looking for similar problems, but all I have come across is that some people have had issues assigning large amounts of data to a single .mat file (e.g. an array exceeding 2**32-1 bytes).
EDIT: Some example outputs (loading the file in Python, and the datatypes):
print latitude
[ 43.091]
print type(latitude)
<type 'numpy.ndarray'>
data = sio.loadmat('test.mat')
print data['lat001']
array([[ -3.57459142e-133]])
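A likely culprit, for what it's worth: scipy.io's netcdf_file memory-maps the file by default, so the variable arrays are views into the mapped file; once fh.close() runs, the memory behind them is no longer valid, and savemat then writes junk like the values above. Two usual workarounds, sketched with a placeholder file name:

import scipy.io as sio

# option 1: copy the data out of the mmapped array before closing
fh = sio.netcdf.netcdf_file('example.nc', 'r')  # 'example.nc' is a placeholder
latitude = fh.variables['LATITUDE'][:].copy()
fh.close()

# option 2: disable memory mapping entirely
fh = sio.netcdf.netcdf_file('example.nc', 'r', mmap=False)
latitude = fh.variables['LATITUDE'][:]
fh.close()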