Exporting data to .mat file in python

I am working on a Python script that takes ls output piped to it, opens all the files I want to work with using scipy.io, and then writes all of the imported data into a .mat file (again using scipy.io). I have had success importing the data and assigning it to a dictionary for export, but when I load the output file in MATLAB none of the data looks anything like the original.
All of the data I am importing has a lat/lon coordinate attached to it, so I will use those coordinates as the example. The data comes from netCDF (.nc) files:
#!/usr/bin/python
import sys
import scipy.io as sio
import numpy as np
# script takes absolute path inputs (or absolute paths are desirable)
# to get an absolute path on a Linux system use the command:
# ls -d $PWD/* OR if files not in PWD then ls -d /absolute_file_path/*
# and then pipe the standard output of that to input of this script
# initialize dictionary (param) and cycle number (i.e. datafile number)
cycle = []
param = {}
# read each file from stdin
for filename in sys.stdin:
    fh = sio.netcdf.netcdf_file(filename, 'r')
    # load in standard variables (coordinates)
    latitude = fh.variables['LATITUDE'][:]
    longitude = fh.variables['LONGITUDE'][:]
    # close file
    fh.close()
    # get file number (cycle number)
    cycle = filename[-7:-4]
    # add latest imported coordinate to dictionary
    latvar = 'lat' + cycle
    lonvar = 'lon' + cycle
    param.update({latvar: latitude})
    param.update({lonvar: longitude})

# export dictionary to .mat file
sio.savemat('test.mat', param)
When I print the values to check that they are imported correctly, I get reasonable values, but when I open the exported file in MATLAB this is an example of my output:
>> lat001
lat001 =
-3.5746e-133
and other variables have similarly extreme exponents (sometimes very small, as in this example, and sometimes extremely large, ~1e100).
I have tried looking for similar problems, but all I have come across is that some people have had issues writing large amounts of data to a single .mat file (e.g. an array exceeding 2**32-1 bytes).
EDIT: Some example outputs (loading file in python, datatypes)
print latitude
[ 43.091]
print type(latitude)
<type 'numpy.ndarray'>
data = sio.loadmat('test.mat')
print data['lat001']
array([[ -3.57459142e-133]])
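One possible culprit worth checking (a guess, not something confirmed in the question): scipy.io.netcdf_file memory-maps the file by default when reading, so the variable arrays are views into the mapped file and can become invalid once fh.close() is called; the data is also stored big-endian. A minimal sketch of a workaround under that assumption, reading without mmap and copying to a native-endian array before saving (the file name is a hypothetical placeholder):
import scipy.io as sio
import numpy as np

filename = 'profile_001.nc'  # hypothetical example file
# mmap=False reads the data into memory instead of returning memory-mapped views
fh = sio.netcdf.netcdf_file(filename, 'r', mmap=False)
# astype copies into a native-endian float64 array, so nothing references the closed file
latitude = fh.variables['LATITUDE'][:].astype(np.float64)
longitude = fh.variables['LONGITUDE'][:].astype(np.float64)
fh.close()

sio.savemat('test.mat', {'lat001': latitude, 'lon001': longitude})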

Related

Extracting data from a NETCDF file

I am just a Python beginner and I am having some trouble extracting data from a netCDF file.
For example, in this code I was trying to create a variable with the temperature, but it does not appear in the variable explorer console. Does anyone know why?
import netCDF4 as nc
import numpy as np
fn ='C:/Users/Public/Documents/Python Scripts/MERRA2_300.tavgM_2d_slv_Nx.201001.nc4'
ds = nc.Dataset(fn)
Use print(ds.variables.keys()) to see all the variables in the netCDF file.
Printing that will also give you the key for the temperature variable in the file.
Assign it to a variable as shown: temp_variable = ds.variables["#temp#"]
Note: replace #temp# with the key for the temperature variable.
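A minimal sketch that puts those steps together; the key 'T' is only an assumed placeholder for the temperature variable, so substitute whatever key the keys() listing actually shows:
import netCDF4 as nc

fn = 'C:/Users/Public/Documents/Python Scripts/MERRA2_300.tavgM_2d_slv_Nx.201001.nc4'
ds = nc.Dataset(fn)

print(ds.variables.keys())        # list every variable key in the file

# 'T' is an assumed key; replace it with the temperature key printed above
temp_variable = ds.variables['T']
temp_data = temp_variable[:]      # read the values into a (masked) numpy array
print(temp_data.shape)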

Writing interpolated grib2 data with pygrib leads to unusable grib file

I'm trying to use pygrib to read data from a grib2 file, interpolate it using Python, and write it to another file. I've tried both pygrib and eccodes and both produce the same problem. The output file size increased by a factor of 3, but when I try to view the data in applications like Weather and Climate Toolkit, all the variables are listed yet show "No Data" when plotted. If I use the same script and don't interpolate the data, but just write it to the new file, it works fine in WCT. If I use wgrib2 it lists all the grib messages, but wgrib2 -V works on the unaltered data and produces the error "*** FATAL ERROR: unsupported: code table 5.6=0 ***" for the interpolated data. Am I doing something wrong in my python script? Here is an example of what I'm doing to write the file (same result using pygrib 2.05 and 2.1.3). I used a basic hrrr file for the example.
import pygrib
import numpy as np
import sys

def writeNoChange():
    # This produces a useable grib file.
    filename = 'hrrr.t00z.wrfprsf06.grib2'
    outfile = 'test.grib2'
    grbs = pygrib.open(filename)
    with open(outfile, 'wb') as outgrb:
        for grb in grbs:
            msg = grb.tostring()
            outgrb.write(msg)
    outgrb.close()
    grbs.close()

def writeChange():
    # This method produces a grib file that isn't recognized by WCT
    filename = 'hrrr.t00z.wrfprsf06.grib2'
    outfile = 'testChange.grib2'
    grbs = pygrib.open(filename)
    with open(outfile, 'wb') as outgrb:
        for grb in grbs:
            vals = grb.values * 1
            grb['values'] = vals
            msg = grb.tostring()
            outgrb.write(msg)
    outgrb.close()
    grbs.close()

#-------------------------------
if __name__ == "__main__":
    writeNoChange()
    writeChange()
Table 5.6 for GRIB2 (https://www.nco.ncep.noaa.gov/pmb/docs/grib2/grib2_doc/) is related to "ORDER OF SPATIAL DIFFERENCING".
For some reason, when you modify grb['values'], it sets grb['orderOfSpatialDifferencing'] = 0, which "wgrib2 -V" doesn't like. So, after changing 'values', change 'orderOfSpatialDifferencing' to what it was initially:
orderOfSpatialDifferencing = grb['orderOfSpatialDifferencing']
grb['values']= [new values]
grb['orderOfSpatialDifferencing'] = orderOfSpatialDifferencing
This worked for me in terms of getting wgrib2 -V to work, but messed up the data. Possibly some other variables in Section 5 also need to be modified.
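A sketch of that workaround folded back into the question's writeChange() loop, assuming the key names from the answer; as noted above, it satisfies wgrib2 -V but may still not reproduce the data exactly:
import pygrib

filename = 'hrrr.t00z.wrfprsf06.grib2'
outfile = 'testChange.grib2'

grbs = pygrib.open(filename)
with open(outfile, 'wb') as outgrb:
    for grb in grbs:
        # remember the packing-related key before touching the values
        order = grb['orderOfSpatialDifferencing']
        grb['values'] = grb.values * 1   # stand-in for the real interpolated field
        # restore the key that assigning 'values' appears to reset to 0
        grb['orderOfSpatialDifferencing'] = order
        outgrb.write(grb.tostring())
grbs.close()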

Loading Multiple Data files from same folder in Python

I am trying to load a large number of data files from the same folder in Python. The ultimate goal here is to simply choose which file I would like to use in calculations, rather than individually opening files.
Here is what I have. This seems to work in opening the data in the files, but I am having a hard time choosing a specific file I want to work with (and assigning a value to each column in each file).
import astropy
import numpy as np
import matplotlib.pyplot as plt

dir = '/S34_east_tfa/'

import glob, os
os.chdir(dir)

for file in glob.glob("*.data"):
    data = np.loadtxt(file)
    print (data)
    Time = data[:,0]
Use a Python dictionary instead of overwriting the results in the data variable inside your loop.
data_dict = dict()
for file in glob.glob("*.data"):
    data_dict[file] = np.loadtxt(file)
Is this what you were looking for?
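A short usage sketch of that dictionary, picking one file by name and assigning its columns to variables; the file name 'star42.data' and the second column are hypothetical examples:
import glob, os
import numpy as np

os.chdir('/S34_east_tfa/')            # same folder as in the question

data_dict = dict()
for file in glob.glob("*.data"):
    data_dict[file] = np.loadtxt(file)

# pick one file by name and assign each column to its own variable
chosen = data_dict['star42.data']     # hypothetical file name
time = chosen[:, 0]                   # first column
flux = chosen[:, 1]                   # second column, assuming the files have one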

How to loop through multiple csv files and output their contents into one array?

I am working in Python and trying to take x, y, z coordinates from multiple LAZ files and put them into one array that can be used for another analysis. I am trying to automate this task as I have about 2000 files to turn into one or even 10 arrays. The example involves two files, but I can't get the loop to work properly. I think I am not naming my variables correctly. Below is the example code I have been trying to write (note that I am extremely new to programming, so apologies if this is horrible code).
Create list of las files, then turn them into an array--attempt at better automation
import numpy as np
from laspy.file import File
import glob as glob

# create list of vegetation files to be opened
VegList = sorted(glob.glob('/Users/sophiathompson/Desktop/copys/Clips/*.las'))
for f in VegList:
    print(f)
    Veg = File(filename = f, mode = "r") # Open the file
    points = Veg.get_points() # Grab all of the points from the file.
    print points # this is a check that the number of rows changes at the end
    print ("array shape:")
    print points.shape
    VegListCoords = np.vstack((Veg.x, Veg.y, Veg.z)).transpose()
print VegListCoords
This block reads both files but fills VegListCoords with the results of the second file in the file list. I need it to hold the records from both. If this is a horrible way to go about it, I am very open to a new way.
You keep overwriting VegListCoords by assigning it the values from your last opened file.
Instead, initialize it at the beginning:
VegListCoords = []
and inside the loop do:
VegListCoords.append(np.vstack((Veg.x, Veg.y, Veg.z)).transpose())
If you want them in one numpy array at the end, use np.concatenate (see the sketch below).
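A sketch of the corrected loop with those two changes applied, using the same laspy 1.x File API as the question:
import glob
import numpy as np
from laspy.file import File

VegList = sorted(glob.glob('/Users/sophiathompson/Desktop/copys/Clips/*.las'))

VegListCoords = []                                  # one (n_points, 3) block per file
for f in VegList:
    Veg = File(filename=f, mode="r")
    VegListCoords.append(np.vstack((Veg.x, Veg.y, Veg.z)).transpose())
    Veg.close()

# stack every file's points into a single (total_points, 3) array
all_coords = np.concatenate(VegListCoords, axis=0)
print(all_coords.shape)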

Work with multiple netCDF files/variables in python

I have around 4 TB of MERIS time series data, which comes in netCDF format.
So I have a lot of netCDF files, each containing several 'variables'.
The netCDF format is new to me, and although I've read a lot about netCDF processing I don't have a clear idea of how to do it. The question 'Combining a large amount of netCDF files' deals somewhat with my problem, but I did not get there. My approach was to first mosaic, then stack, and finally take the mean of every pixel.
One file contains 32 variables. Here is additionally the ncdump output of one .nc file for one day:
http://www.filedropper.com/ncdumpoutput
I managed to read the files, extract the variable I want (variable #32), and put it into a list using the following code
l = list()
for i in files_in:
    # read netCDF file
    dset = nc.Dataset(i, mode = 'r')
    # save variables
    var = dset.variables['vegetation_index_mean'][:]
    # write all temp loop outputs in a list
    l.append(var)
    # close netCDF file
    dset.close()
The list now contains 24 masked arrays covering different locations on the same date.
Every time I try to print the contents of the list, Spyder freezes, and for every command I run afterwards Spyder first hangs for about five seconds before responding.
My goal is to do a time series analysis for a specific time frame (every date is stored in a single .nc file). So my plan was to mosaic the variables in the list (is this possible?), treating them as raster bands, process additional dates, and take the mean for every pixel (1800 x 1800).
Maybe my whole approach is wrong? Can I treat these 'variables' like raster bands?
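A small sketch of the stack-and-average step the question describes, assuming each date's data has already been mosaicked onto the common 1800 x 1800 grid (the mosaicking itself is not shown); numpy's masked-array routines skip the missing pixels:
import glob
import numpy as np
import netCDF4 as nc

# hypothetical: one already-mosaicked file per date
files_in = sorted(glob.glob('/data/meris/*.nc'))

layers = []
for path in files_in:
    dset = nc.Dataset(path, mode='r')
    layers.append(dset.variables['vegetation_index_mean'][:])  # masked array, read into memory
    dset.close()

# stack the dates into a (n_dates, 1800, 1800) masked array and take the
# per-pixel mean over time, ignoring masked (missing) values
stack = np.ma.stack(layers)
pixel_mean = stack.mean(axis=0)
print(pixel_mean.shape)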
I'm not sure whether the following answer responds to your needs, since this procedure is designed to process time series, is pretty manual, and furthermore you have 4 TB of data, so I apologize if this doesn't help.
This is for Python 2.7:
First import all the modules needed:
import tkFileDialog
from netCDF4 import Dataset
import matplotlib.pyplot as plt
Second parse multiple nc files:
n = []
filename = {}
filename = tkFileDialog.askopenfilenames()
filename = list(filename)
n = len(filename)
Third read nc files and classify data and metadata within dictionaries using a loop:
wtr_tem = {} # create empty arrays for variable sea water temperature
fh = {} # create empty arrays for filehandler and variables nc file
vars = {}
for i in range(n):
    filename[i] = filename[i].decode('unicode_escape').encode('ascii','ignore') # remove unicode in order to execute the following command
    filename1 = ''.join(filename[i]) # converts list to string
    fh[i] = Dataset(filename1, mode='r') # create the file handle
    vars[i] = fh[i].variables.keys() # returns a list with the variables of the file
    wtr_tem[i] = fh[i].variables['WTR_TEM']
    # plot variables in different figures
    plt.plot(wtr_tem[i],'r-')
    plt.xlabel(fh[i].title) # add specific title from each nc file
    plt.show()
I hope it may help somebody.
