Why is the NETcdf data I created masked?

Why is the NETcdf data I created masked? - python

I create a netcdf file with some data, and when I import the data in another script, it is masked :
>>> type(Data[:])
<class 'numpy.ma.core.MaskedArray'>
Here is how I create the data :
# Put in a grid
print 'Putting the data in a grid...'
LatRange = range( int(min(Lat)), int(max(Lat)), 1 )
LonRange = np.arange( int(min(Lon)), int(max(Lon)), 1 )
dRange = range(0,200,10) + range(200,4000,100)
dateRange = np.arange( float(min(Dates).year)+min(Dates).month/12., float(max(Dates).year)+max(Dates).month/12., 1./12. )
dataset = Dataset('gridded_data/DataAveraged.nc','w', format='NETCDF4_CLASSIC')
zD = dataset.createDimension('zD',len(dRange))
latD = dataset.createDimension('latD',len(LatRange))
lonD = dataset.createDimension('lonD',len(LonRange))
timeD = dataset.createDimension('timeD',len(dateRange))
tempAve = dataset.createVariable('tempAve', np.float32, ('zD','latD','lonD','timeD'), fill_value=-9999)
tempAve.units = 'psu'
tempAve[:] = Tgrid_ave
Where Tgrid_ave is a numpy array.
Then, I import the data this way in another script :
dataset = Dataset('gridded_data/DataAveraged.nc', 'r')
LatRange = dataset.variables['lat'][:]
LonRange = dataset.variables['lon'][:-1]
Tgrid_ave = dataset.variables['tempAve']
And my Lat and Lon data are not masked, but my Tgrid_ave data is.
How can I avoid this!?

The netCDF4 library used to return either a masked array or a regular Numpy array, depending on if the data you request from the array (or array slice) contains fill values or not. This is unfortunate behavior but it seems to be fixed in PR 787. So I think that, from version 1.4 onward, the default behavior is always to return a masked array if a fill value is defined (I haven't tested it).
Anyway, you can ensure that you always get a regular numpy array by setting the set_auto_mask to False.

Related

Change dimension and values of netcdf file in Python

I'm trying to change the dimension of values in netcdf file.
First I read a netcdf file and interpolated the data.
import numpy as np
import netCDF4
from scipy.interpolate import interp1d
def interpolation(a,b,c):
f = interp1d(a,b,kind='linear')
return f(c)
file = 'directory/test.nc'
data = netcdf4.Dataset(file)
lon = data.variables['lon'] # size = 10
lat = data.variables['lat'] # size = 10
lev = data.variables['lev'] # size = 100
values = data.variables['values'] # size = (100,10,10)
new_lev = np.linspace(0,1,200) # new vertical grid size = 200
new_values = np.full(len(new_lev), len(lat), len(lon)) # size = (200,10,10)
### interpolation ###
for loop_lat in range(len(lat)):
for loop_lon in range(len(lon)):
new_values[:, loop_lat, loop_lon] = interpolation(lev, values[:,loop_lat,loop_lon], new_lev)
## how can I save these new_lev and new_values in the netcdf file ?
Using the interpolation, I converted the values of dimension A to dimension B.
Let say the original dimension A is 100 and interpolated dimension B is 200.
After the changing dimension, how can I save this values and dimension into netcdf file?
Could you please give me some advise?

You can open a NetCDF file for editing in place. See:
https://unidata.github.io/netcdf4-python/#creatingopeningclosing-a-netcdf-file
Rather than:
data = netcdf4.Dataset(file)
Try:
data = netCDF4.Dataset(file,'r+',clobber=True).

Re-distributing 2d data with max in middle

Hey all I have a set up seemingly random 2D data that I want to reorder. This is more for an image with specific values at each pixel but the concept will be the same.
I have large 2d array that looks very random, say:
x = 100
y = 120
np.random.random((x,y))
and I want to re-distribute the 2d matrix so that the maximum value is in the center and the values from the maximum surround it giving it sort of a gaussian fall off from the center.
small example:
output = [[0.0,0.5,1.0,1.0,1.0,0.5,0.0]
[0.0,1.0,1.0,1.5,1.0,0.5,0.0]
[0.5,1.0,1.5,2.0,1.5,1.0,0.5]
[0.0,1.0,1.0,1.5,1.0,0.5,0.0]
[0.0,0.5,1.0,1.0,1.0,0.5,0.0]]
I know it wont really be a gaussian but just trying to give a visualization of what I would like. I was thinking of sorting the 2d array into a list from max to min and then using that to create a new 2d array but Im not sure how to distribute the values down to fill the matrix how I want.
Thank you very much!

If anyone looks at this in the future and needs help, Here is some advice on how to do this effectively for a lot of data. Posted below is the code.
def datasort(inputarray,spot_in_x,spot_in_y):
#get the data read
center_of_y = spot_in_y
center_of_x = spot_in_x
M = len(inputarray[0])
N = len(inputarray)
l_list = list(itertools.chain(*inputarray)) #listed data
l_sorted = sorted(l_list,reverse=True) #sorted listed data
#Reorder
to_reorder = list(np.arange(0,len(l_sorted),1))
x = np.linspace(-1,1,M)
y = np.linspace(-1,1,N)
centerx = int(M/2 - center_of_x)*0.01
centery = int(N/2 - center_of_y)*0.01
[X,Y] = np.meshgrid(x,y)
R = np.sqrt((X+centerx)**2 + (Y+centery)**2)
R_list = list(itertools.chain(*R))
values = zip(R_list,to_reorder)
sortedvalues = sorted(values)
unzip = list(zip(*sortedvalues))
unzip2 = unzip[1]
l_reorder = zip(unzip2,l_sorted)
l_reorder = sorted(l_reorder)
l_unzip = list(zip(*l_reorder))
l_unzip2 = l_unzip[1]
sorted_list = np.reshape(l_unzip2,(N,M))
return(sorted_list)
This code basically takes your data and reorders it in a sorted list. Then zips it together with a list based on a circular distribution. Then using the zip and sort commands you can create the distribution of data you wish to have based on your distribution function, in my case its a circle that can be offset.

How can I optimize this data smoothing python loop?

I am trying to make a data smoothng function on a set of data I am using savitzky golay filter in order to do that, I am collecting an array of data and call the function by Scipy.
But since I am looping through a spcific element in a different frame I dont have spatial locality nor time locality.
dataobj.body.data[j][0][i]
holds (x,y) and I am only collecting the ys.
Here's the following loop :
def smooth_data(dataobj):
number_of_frames = len(dataobj.body.data)
for i in range(0, 137):
arr = []
for j in range(0, number_of_frames):
arr.append(dataobj.body.data[j][0][i][1])
newdata = scipy.signal.savgol_filter(arr, 25, 3)
for k in range(0, number_of_frames):
dataobj.body.data[k][0][i][1] = newdata[k]
return dataobj
I'd like to make it work faster, right now when the number of frames is over 1000 it takes a considerable amount of time, something like 30 seconds.
Thanks alot to all of the helpers !

If the input data is a multi-dimensional numpy array, then you can pass in a slice of the numpy array to the scipy method, and then insert the resulting array back into the original data object:
def smooth_data(dataobj):
number_of_frames = len(dataobj[:,0,0,1])
number_of_records = len(dataobj[0,0,:,1])
for i in range(0, number_of_records):
newdata = scipy.signal.savgol_filter(dataobj[:,0,i,1], 3, 1)
dataobj[:][0][i][1] = newdata
return dataobj

What about training a Krige model (of just a polynomial interpolation ) with 50 % of your x and y datas, and then taking the ^y evaluation of the model on your whole set x ?
Krige model example of code (using smt module) :
from smt.surrogate_models import KRG
t= KRG(theta0=[1e-2]*ndim,print_prediction = False)
t.set_training_values(xt,yt) #training inputs, outputs
t.train()
# Prediction of the other points
y = t.predict_values(xtest)

netCDF grid file: Extracting information from 1D array using 2D values

I am trying to work in Python 3 with topography/bathymetry-information (basically a grid containing x [longitude in decimal degrees], y [latitude in decimal degrees] and z [meter]).
The grid file has the extension .nc and is therefore a netCDF-file. Normally I would use it in mapping tools like Generic Mapping Tools and don't have to bother with how a netCDF file works, but I need to extract specific information in a Python script. Right now this is only limiting the dataset to certain longitude/latitude ranges.
However, right now I am a bit lost on how to get to the z-information for specific x and y values. Here's what I know about the data so far
import netCDF4
#----------------------
# Load netCDF file
#----------------------
bathymetry_file = 'C:/Users/te279/Matlab/data/gebco_08.nc'
fh = netCDF4.Dataset(bathymetry_file, mode='r')
#----------------------
# Getting information about the file
#----------------------
print(fh.file_format)
NETCDF3_CLASSIC
print(fh)
root group (NETCDF3_CLASSIC data model, file format NETCDF3):
title: GEBCO_08 Grid
source: 20100927
dimensions(sizes): side(2), xysize(933120000)
variables(dimensions): float64 x_range(side), float64 y_range(side), int16 z_range(side), float64 spacing(side), int32 dimension(side), int16 z(xysize)
groups:
print(fh.dimensions.keys())
odict_keys(['side', 'xysize'])
print(fh.dimensions['side'])
: name = 'side', size = 2
print(fh.dimensions['xysize'])
: name = 'xysize', size = 933120000
#----------------------
# Variables
#----------------------
print(fh.variables.keys()) # returns all available variable keys
odict_keys(['x_range', 'y_range', 'z_range', 'spacing', 'dimension', 'z'])
xrange = fh.variables['x_range'][:]
print(xrange)
[-180. 180.] # contains the values -180 to 180 for the longitude of the whole world
yrange = fh.variables['y_range'][:]
print(yrange)
[-90. 90.] # contains the values -90 to 90 for the latitude of the whole world
zrange = fh.variables['z_range'][:]
[-10977 8685] # contains the depths/topography range for the world
spacing = fh.variables['spacing'][:]
[ 0.00833333 0.00833333] # spacing in both x and y. Equals the dimension, if multiplied with x and y range
dimension = fh.variables['dimension'][:]
[43200 21600] # corresponding to the shape of z if it was the 2D array I would've hoped for (it's currently an 1D array of 9333120000 - which is 43200*21600)
z = fh.variables['z'][:] # currently an 1D array of the depth/topography/z information I want
fh.close
Based on this information I still don't know how to access z for specific x/y (longitude/latitude) values. I think basically I need to convert the 1D array of z into a 2D array corresponding to longitude/latitude values. I just have not a clue how to do that. I saw in some posts where people tried to convert a 1D into a 2D array, but I have no means to know in what corner of the world they start and how they progress.
I know there is a 3 year old similar post, however, I don't know how to find an analogue "index of the flattened array" for my problem - or how to exactly work with that. Can somebody help?

You need to first read in all three of z's dimensions (lat, lon, depth) and then extract values across each of those dimensions. Here are a few examnples.
# Read in all 3 dimensions [lat x lon x depth]
z = fh.variables['z'][:,:,:]
# Topography at a single lat/lon/depth (1 value):
z_1 = z[5,5,5]
# Topography at all depths for a single lat/lon (1D array):
z_2 = z[5,5,:]
# Topography at all latitudes and longitudes for a single depth (2D array):
z_3 = z[:,:,5]
Note that the number you enter for lat/lon/depth is the index in that dimension, not an actual latitude, for instance. You'll need to determine the indices of the values you are looking for beforehand.

I just found the solution in this post. Sorry that I didn't see that before. Here's what my code looks like now. Thanks to Dave (he answered his own question in the post above). The only thing I had to work on was that the dimensions have to stay integers.
import netCDF4
import numpy as np
#----------------------
# Load netCDF file
#----------------------
bathymetry_file = 'C:/Users/te279/Matlab/data/gebco_08.nc'
fh = netCDF4.Dataset(bathymetry_file, mode='r')
#----------------------
# Extract variables
#----------------------
xrange = fh.variables['x_range'][:]
yrange = fh.variables['y_range'][:]
zz = fh.variables['z'][:]
fh.close()
#----------------------
# Compute Lat/Lon
#----------------------
nx = (xrange[-1]-xrange[0])/spacing[0] # num pts in x-dir
ny = (yrange[-1]-yrange[0])/spacing[1] # num pts in y-dir
nx = nx.astype(np.integer)
ny = ny.astype(np.integer)
lon = np.linspace(xrange[0],xrange[-1],nx)
lat = np.linspace(yrange[0],yrange[-1],ny)
#----------------------
# Reshape the 1D to an 2D array
#----------------------
bathy = zz[:].reshape(ny, nx)
So, now when I look at the shape of both zz and bathy (following code), the former is a 1D array with a length of 933120000, the latter the 2D array with dimensions of 43200x21600.
print(zz.shape)
print(bathy.shape)
The next step is to use indices to access the bathymetry/topography data correctly, just as N1B4 described in his post

combining/merging multiple 2d arrays into single array by using python

I have four 2 dimensional np arrays. Shape of each array is (203 , 135). Now I want join all these arrays into one single array with respect to latitude and longitude.
I have used code below to read data
import pandas as pd
import numpy as np
import os
import glob
from pyhdf import SD
import datetime
import mpl_toolkits.basemap.pyproj as pyproj
DATA = ({})
files = glob.glob('MOD04*')
files.sort()
for n, f in enumerate(files):
SDS_NAME='Deep_Blue_Aerosol_Optical_Depth_550_Land'
hdf=SD.SD(f)
lat = hdf.select('Latitude')
latitude = lat[:]
min_lat=latitude.min()
max_lat=latitude.max()
lon = hdf.select('Longitude')
longitude = lon[:]
min_lon=longitude.min()
max_lon=longitude.max()
sds=hdf.select(SDS_NAME)
data=sds.get()
p = pyproj.Proj(proj='utm', zone=45, ellps='WGS84')
x,y = p(longitude, latitude)
def set_element(elements, x, y, data):
# Set element with two coordinates.
elements[x + (y * 10)] = data
elements = []
set_element(elements,x,y,data)
But I got error: only integer arrays with one element can be converted to an index
you can find the data: https://drive.google.com/open?id=0B2rkXkOkG7ExMElPRDd5YkNEeDQ
I have created toy datasets for this problem as per requested.
what I want is to get one single array from four (a,b,c,d) arrays. whose dimension should be something like (406, 270)
a = (np.random.rand(27405)).reshape(203,135)
b = (np.random.rand(27405)).reshape(203,135)
c = (np.random.rand(27405)).reshape(203,135)
d = (np.random.rand(27405)).reshape(203,135)
a_x = (np.random.uniform(10,145,27405)).reshape(203,135)
a_y = (np.random.uniform(204,407,27405)).reshape(203,135)
d_x = (np.random.uniform(150,280,27405)).reshape(203,135)
d_y = (np.random.uniform(204,407,27405)).reshape(203,135)
b_x = (np.random.uniform(150,280,27405)).reshape(203,135)
b_y = (np.random.uniform(0,202,27405)).reshape(203,135)
c_x = (np.random.uniform(10,145,27405)).reshape(203,135)
c_y = (np.random.uniform(0,202,27405)).reshape(203,135)
any help?

This should be a comment, yet the comment space is not enough for these questions. Therefore I am posting here:
You say that you have 4 input arrays (a,b,c,d) which are somehow to be intergrated into an output array. As far as is understood, two of these arrays contain positional information (x,y) such as longitude and latitude. The only line in your code, where you combine several input arrays is here:
def set_element(elements, x, y, data):
# Set element with two coordinates.
elements[x + (y * 10)] = data
Here you have four input variables (elements, x, y, data) which I assume to be your input arrays (a,b,c,d). In this operation yet you do not combine them, but you overwrite an element of elements (index: x + 10y) with a new value (data).
Therefore, I do not understand your target output.
When I was asking for toy data, I had something like this in mind:
a = [[1,2]]
b = [[3,4]]
c = [[5,6]]
d = [[7,8]]
This would be such an easy example that you could easily say:
What I want is this:
res = [[[1,2],[3,4]],[[5,6],[7,8]]]
Then we could help you to find an answer.
Please, thus, provide more information about the operation that you want to conduct either mathematically notated ( such as x = a +b*c +d) or with toy data so that we can deduce the function you ask for.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Why is the NETcdf data I created masked? - python

Related

Change dimension and values of netcdf file in Python

Re-distributing 2d data with max in middle

How can I optimize this data smoothing python loop?

netCDF grid file: Extracting information from 1D array using 2D values

combining/merging multiple 2d arrays into single array by using python

Categories

Resources