Adding 2D arrays to specific indexes in an empty 4D array - python

I currently have ~40 years worth of daily ozone measurement datasets (which were 3D arrays with dimensions time (24), latitude (361), and longitude (576) respectively). Each day has its own data file.
I then created a 2D array (361, 576) for each day, averaging all of the data from each hour.
My next goal is to create one plot for each day of the calendar year (January 1st, January 2nd, etc.) that ranges through all of the years in my dataset. I'm trying to show the trend of ozone on each day through each respective year. For example, my first plot would be the trend of the daily average on January 1st from the first year to the last year in my dataset.
dims = np.shape(TO3) #Dimensions of original data (24, 361, 576)
avgTO3 = np.arange(dims[1]*dims[2], dtype=float).reshape(dims[1], dims[2]) #Creates new 2D array for daily averages
avgTO3[:,:] = 0.0
for i in range(TO3.shape[0]):
    np.add(TO3[i], avgTO3, out=avgTO3)
avgozone = avgTO3 / 24.0 #Final 2D array of daily average
dailyavgdims = np.shape(avgozone)
dailyavgyear = np.arange(dailyavgdims[0]*dailyavgdims[1], dtype=float).reshape(dailyavgdims[0], dailyavgdims[1])
dailyavgyear[:,:] = 0.0
dailyavgbyyear = dailyavgyear[..., np.newaxis, np.newaxis]
dailyavgbyyear[:,:,:,:] = 0.0 #Intended as an empty 4D array with dimensions (361, 576, 365, 40); as written its shape is (361, 576, 1, 1)
Within the 4D array, the third dimension represents the calendar day (so it would likely go to 365), and the 4th dimension represents the year (which would be around 40).
My question is how I can add each of the 2D arrays to specific dimensions in the 4D array. Like how can I add the daily average of January 1st, 1980 (the first possible day) to the 4D array with dimensions (361, 576, 0, 0), and then January 2nd, 1980's 2D array to (361, 576, 1, 0) and so on? I'm finding it to be difficult, especially since I can't necessarily store these arrays anywhere else because of Linux. Any help is appreciated!
Sidenote: I know my code isn't very condensed, but that's something I'm not terribly worried about at the moment. I'm still trying to learn the ins and outs of Python and Linux.

import numpy as np

years = 40
days = 365
# random data for the example. You'd load an array, or hardcode in the dimensions
TO3 = np.random.randn(24, 361, 576)
# get average over first dimension
avgozone = TO3.mean(0)
# Create the empty array
dailyavgbyyear = np.zeros((*avgozone.shape, days, years))
for y in range(years):
    for d in range(days):
        # Load 3d array
        TO3 = np.random.randn(24, 361, 576) # random data for the example
        avgozone = TO3.mean(0) # get mean over first dimension
        dailyavgbyyear[:, :, d, y] = avgozone
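With real files rather than random data, each file's date has to be turned into a (day, year) index pair. Below is a minimal sketch, assuming the date of each file is known (for example parsed from its filename). The names `slot_for_date` and `first_year` are hypothetical, the grid is tiny so the example runs quickly, and giving 29 February its own slot (366 slots in total) is just one possible way to handle leap years:

```python
import datetime as dt
import numpy as np

first_year = 1980      # assumed first year of the record
n_years = 3            # small for the example; about 40 in the real data
n_days = 366           # 366 slots so that 29 February has somewhere to go
nlat, nlon = 4, 5      # tiny grid for the example; (361, 576) in the real data

def slot_for_date(date, first_year):
    """Return zero-based (day_index, year_index) for a date.

    The day index is taken in a fixed leap year (2000), so a given
    calendar date always lands in the same slot regardless of year.
    """
    day_index = date.replace(year=2000).timetuple().tm_yday - 1
    year_index = date.year - first_year
    return day_index, year_index

# NaN marks slots that never receive data (e.g. 29 February in non-leap years)
dailyavgbyyear = np.full((nlat, nlon, n_days, n_years), np.nan, dtype=np.float32)

date = dt.date(1980, 1, 2)
TO3 = np.random.randn(24, nlat, nlon)      # stand-in for one day's file
d, y = slot_for_date(date, first_year)
dailyavgbyyear[:, :, d, y] = TO3.mean(axis=0)

# Trend for one calendar day across all years: shape (nlat, nlon, n_years)
jan2_trend = dailyavgbyyear[:, :, 1, :]
```

At full size, a (361, 576, 366, 40) array of float32 values takes roughly 12 GB, so if memory is tight it may be easier to fill and plot one calendar day at a time.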

Related

Mapping data above 90th percentile with cartopy

I am still fairly new to programming and python in particular.
I have spatial dust optical depth data, with dimensions of lat, lon, and time.
I have managed to make daily plots of a dust plume as it moves across the Atlantic, and now I am trying to make similar plots masking out any cell below the 90th percentile value.
I have an array of 90th percentile values created with the following code:
#====== select and average over study region ======#
tropatl_mjjas = dod_2.sel(latitude=slice(25, 8), longitude=slice(270, 342)).mean(dim=('latitude', 'longitude'))
#====== resample to daily then drop NAN ======#
tropatl_mjjas_daily = tropatl_mjjas.resample(time='D').mean(dim='time').dropna(dim='time')
#====== reshape to 17 rows x 153 columns ======#
tropatl_2d = np.reshape(tropatl_mjjas_daily.values,(17, 153))
percvalues=np.zeros(92)
k=0
for i in range(24,152):
    if i==116:
        break
    ravel_1 = np.ravel(tropatl_2d[:, i:i+15])
    percvalues[k]=np.percentile(ravel_1, 90)
    k+=1
print(percvalues)
What I would like to do is somehow check each cell against the 90th percentile for that day in the study region. This particular dust event is 12 days long. I started with trying it for a single day:
data = dod.sel(time='2015-06-18').resample(time='D').mean(dim='time')
valid_data = data[data>percvalues[24]]
But I get this error: IndexError: 3-dimensional boolean indexing is not supported.
Here is what the data array looks like:
[data array][1]; you can see that the variable name is 'duaod550'.
I tried this instead:
valid_data = data[dod>percvalues[24]]
IndexError: Boolean array size 49672 is used to index array with shape (1,).
and
valid_data = data[duaod550>percvalues[24]]
NameError: name 'duaod550' is not defined
So how do I work around the dimension issues to make the code compare each cell to the 90th percentile value?
Thank you in advance for any help.
[1]: https://i.stack.imgur.com/m1FHV.png
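The errors above come from trying to select cells with a multi-dimensional boolean mask. One possible way around this is to keep the array's shape and blank out the cells below the threshold instead of selecting the cells above it. Below is a minimal NumPy sketch with made-up values, where `data` and `threshold` are stand-ins for `data.values` and `percvalues[24]`. xarray's `DataArray.where(cond)` follows the same keep-the-shape idea:

```python
import numpy as np

# Stand-in for one day of data with dimensions (time, lat, lon)
data = np.array([[[0.1, 0.5, 0.9],
                  [0.7, 0.2, 0.8]]])
threshold = 0.6   # stand-in for that day's 90th percentile value

# Keep cells above the threshold, replace everything else with NaN
masked = np.where(data > threshold, data, np.nan)
```

Because `masked` has the same shape as `data`, it can be passed to the same plotting code; NaN cells are generally left blank.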

Obtain mean value of specific area in netCDF

I am trying to plot a time series of the sea surface temperature (SST) for a specific region from a .nc file. The SST is a three-dimensional variable (lat, lon, time) that has mean daily values for a specific region from 1982 to 2016. I want my plot to reflect the seasonal SST variability over the entire period. I assume that what I need to do first is obtain a mean SST value over my lat/lon region for each day, which I can then work with later on. So far, I assume that I need to read the .nc file and the variables:
import netCDF4 as nc
f = nc.Dataset('cmems_SST_MED_SST_L4_REP_OBSERVATIONS_010_021_1639073212518.nc')
sst = f.variables['analysed_sst'][:]
lon = f.variables['longitude'][:]
lat = f.variables['latitude'][:]
Next, following the code suggested here, I tried to reshape and obtain the mean, but an error pops up:
global_average= np.nanmean(sst[:,:,:],axis=(1,2))
annual_temp = np.nanmean(np.reshape(global_average, (34,12)), axis = 1)
#34 years between 1982 and 2016, and 12 months per year.
ERROR cannot reshape array of size 14008 into shape (34,12)
From here I found different approaches, like using cdo or nco (which didn't work due to installation problems), among others that were not suitable for my case. I used nanmean because I know that in MATLAB this is done using the nanmean function. I am quite new to this topic and I would like to ask for some hints, such as where I should focus or which path is more suitable for this case. Thank you!!
Handling daily data with pure Python is difficult because you have to account for leap years, and subsetting a region requires tedious index arithmetic.
As steTATO mentioned, since the data you are working with has daily temporal resolution, you need to consider the following:
You would need to reshape global_average into the shape (34,365) or (34,366), depending on whether the year is a leap year (1984, 1988, 1992, 1996, 2000, 2004, 2008, 2012, 2016). So your code above should look something like
annual_temp = np.nanmean(np.reshape(global_average, (34,365)), axis = 1)
But, like I said, because of the leap years, you can't do the things you want by simply reshaping the global_average.
If I had no choice but to use python only, I'd do the following
import numpy as np
def days_in_year(in_year):
    leap_years = [1984,1988,1992,1996,2000,2004,2008,2012,2016]
    if (in_year in leap_years):
        out_days = 366
    else:
        out_days = 365
    return out_days
# some of your code, importing netcdf data
year = np.arange(1982,2017)
global_avg= np.nanmean(sst[:,:,:],axis=(1,2))
annual_avgs = []
i = 0
for yr in range(35):
    i = i + days_in_year(year[yr])   # index one past the last day of this year
    f = i - days_in_year(year[yr])   # index of the first day of this year
    annual_avg = np.nanmean(global_avg[f:i])
    annual_avgs.append(annual_avg)
The code above steps through global_avg one year at a time, taking leap years into account, and stores each year's mean in annual_avgs.
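The same year-by-year slicing can be written without a hard-coded list of leap years. This is a minimal sketch, assuming the daily series starts on 1 January 1982 and has no gaps; the random `global_avg` is a stand-in for the real regional means:

```python
import calendar
import numpy as np

years = np.arange(1982, 2017)   # 1982..2016 inclusive
days_per_year = np.array(
    [366 if calendar.isleap(int(y)) else 365 for y in years]
)
# edges[k] is the index of the first day of year k; edges[-1] is the total length
edges = np.concatenate(([0], np.cumsum(days_per_year)))

# Stand-in for the daily regional means; replace with the real global_avg
global_avg = np.random.rand(int(edges[-1]))

annual_avgs = np.array([
    np.nanmean(global_avg[start:stop])
    for start, stop in zip(edges[:-1], edges[1:])
])
```

The real series must have exactly `edges[-1]` entries (12784 for 1982 to 2016) for the slices to line up. The 14008 reported in the error message does not match that, so it is worth checking the file's time axis first.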

Insert values at a specific location for each time series of pixels using Numpy Array

Let's say I have a 3-dimensional Numpy Array. It is daily data of a year and 1-degree pixels of the globe, resulting in a shape of (365, 180, 360).
Now, I want to insert the value for 16th January a second time at its own position, so that each time series becomes:
...val_0114, val_0115, val_0116, val_0116, val_0117, val_0118, ...
I could do it like:
arr_new = np.full((arr_old.shape[0] + 1, 180, 360), np.nan)
for _lat in range(arr_new.shape[1]):
    for _lon in range(arr_new.shape[2]):
        arr_new[:, _lat, _lon] = np.insert(arr_old[:, _lat, _lon], 15, arr_old[15, _lat, _lon])
But I would like to find a more fancy way without the loop.
In NumPy you can index an array directly, much as you would with nested lists:
# If you want to place the same temperature in the whole world/pixels for one day
arr_old[16] = 15
# If you want to place the same temperature in a certain place the whole year
arr_old[:, lat, lon] = 15
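For the insertion itself, `np.insert` accepts an `axis` argument, so the per-pixel loop in the question may not be needed. Below is a minimal sketch with a small random array standing in for the (365, 180, 360) data:

```python
import numpy as np

# Small grid for the example; (365, 180, 360) in the question
arr_old = np.random.rand(365, 4, 5)

# Insert a copy of the slice at index 15 (16 January) before index 15,
# for every pixel at once
arr_new = np.insert(arr_old, 15, arr_old[15], axis=0)
```

The result has one more time step than the input, with the 16 January slice appearing twice in a row.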

netCDF grid file: Extracting information from 1D array using 2D values

I am trying to work in Python 3 with topography/bathymetry-information (basically a grid containing x [longitude in decimal degrees], y [latitude in decimal degrees] and z [meter]).
The grid file has the extension .nc and is therefore a netCDF-file. Normally I would use it in mapping tools like Generic Mapping Tools and don't have to bother with how a netCDF file works, but I need to extract specific information in a Python script. Right now this is only limiting the dataset to certain longitude/latitude ranges.
However, right now I am a bit lost on how to get to the z-information for specific x and y values. Here's what I know about the data so far
import netCDF4
#----------------------
# Load netCDF file
#----------------------
bathymetry_file = 'C:/Users/te279/Matlab/data/gebco_08.nc'
fh = netCDF4.Dataset(bathymetry_file, mode='r')
#----------------------
# Getting information about the file
#----------------------
print(fh.file_format)
NETCDF3_CLASSIC
print(fh)
root group (NETCDF3_CLASSIC data model, file format NETCDF3):
title: GEBCO_08 Grid
source: 20100927
dimensions(sizes): side(2), xysize(933120000)
variables(dimensions): float64 x_range(side), float64 y_range(side), int16 z_range(side), float64 spacing(side), int32 dimension(side), int16 z(xysize)
groups:
print(fh.dimensions.keys())
odict_keys(['side', 'xysize'])
print(fh.dimensions['side'])
: name = 'side', size = 2
print(fh.dimensions['xysize'])
: name = 'xysize', size = 933120000
#----------------------
# Variables
#----------------------
print(fh.variables.keys()) # returns all available variable keys
odict_keys(['x_range', 'y_range', 'z_range', 'spacing', 'dimension', 'z'])
xrange = fh.variables['x_range'][:]
print(xrange)
[-180. 180.] # contains the values -180 to 180 for the longitude of the whole world
yrange = fh.variables['y_range'][:]
print(yrange)
[-90. 90.] # contains the values -90 to 90 for the latitude of the whole world
zrange = fh.variables['z_range'][:]
[-10977 8685] # contains the depths/topography range for the world
spacing = fh.variables['spacing'][:]
[ 0.00833333 0.00833333] # spacing in both x and y. Equals the dimension, if multiplied with x and y range
dimension = fh.variables['dimension'][:]
[43200 21600] # corresponding to the shape of z if it was the 2D array I would've hoped for (it's currently a 1D array of 933120000 - which is 43200*21600)
z = fh.variables['z'][:] # currently an 1D array of the depth/topography/z information I want
fh.close()
Based on this information I still don't know how to access z for specific x/y (longitude/latitude) values. I think basically I need to convert the 1D array of z into a 2D array corresponding to longitude/latitude values. I just have not a clue how to do that. I saw in some posts where people tried to convert a 1D into a 2D array, but I have no means to know in what corner of the world they start and how they progress.
I know there is a 3 year old similar post, however, I don't know how to find an analogue "index of the flattened array" for my problem - or how to exactly work with that. Can somebody help?
You need to first read in all three of z's dimensions (lat, lon, depth) and then extract values across each of those dimensions. Here are a few examples.
# Read in all 3 dimensions [lat x lon x depth]
z = fh.variables['z'][:,:,:]
# Topography at a single lat/lon/depth (1 value):
z_1 = z[5,5,5]
# Topography at all depths for a single lat/lon (1D array):
z_2 = z[5,5,:]
# Topography at all latitudes and longitudes for a single depth (2D array):
z_3 = z[:,:,5]
Note that the number you enter for lat/lon/depth is the index in that dimension, not an actual latitude, for instance. You'll need to determine the indices of the values you are looking for beforehand.
I just found the solution in this post. Sorry that I didn't see that before. Here's what my code looks like now. Thanks to Dave (he answered his own question in the post above). The only thing I had to work on was that the dimensions have to stay integers.
import netCDF4
import numpy as np
#----------------------
# Load netCDF file
#----------------------
bathymetry_file = 'C:/Users/te279/Matlab/data/gebco_08.nc'
fh = netCDF4.Dataset(bathymetry_file, mode='r')
#----------------------
# Extract variables
#----------------------
xrange = fh.variables['x_range'][:]
yrange = fh.variables['y_range'][:]
spacing = fh.variables['spacing'][:] # needed below, so read it before closing the file
zz = fh.variables['z'][:]
fh.close()
#----------------------
# Compute Lat/Lon
#----------------------
nx = int(round((xrange[-1]-xrange[0])/spacing[0])) # num pts in x-dir
ny = int(round((yrange[-1]-yrange[0])/spacing[1])) # num pts in y-dir
lon = np.linspace(xrange[0],xrange[-1],nx)
lat = np.linspace(yrange[0],yrange[-1],ny)
#----------------------
# Reshape the 1D to an 2D array
#----------------------
bathy = zz[:].reshape(ny, nx)
So, now when I look at the shape of both zz and bathy (following code), the former is a 1D array with a length of 933120000, and the latter is a 2D array with shape (21600, 43200), i.e. (ny, nx).
print(zz.shape)
print(bathy.shape)
The next step is to use indices to access the bathymetry/topography data correctly, just as N1B4 described in his post.
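Going from an actual longitude/latitude to those indices can be sketched as below. This is only an illustration on a tiny made-up grid: `nearest_index` is a hypothetical helper, and the code assumes that row 0 of `bathy` corresponds to `lat[0]`. Whether the flattened array really starts at the southern or the northern edge is not established here, so check it against a known location first (and reverse `lat` if needed):

```python
import numpy as np

# Tiny stand-in grid; the real one has shape (21600, 43200)
lon = np.linspace(-180, 180, 9)
lat = np.linspace(-90, 90, 5)
bathy = np.arange(lat.size * lon.size).reshape(lat.size, lon.size)

def nearest_index(axis_values, target):
    """Index of the grid coordinate closest to target."""
    return int(np.abs(axis_values - target).argmin())

# Value at the grid point nearest to a given position
i_lat = nearest_index(lat, 44.0)     # row index
i_lon = nearest_index(lon, -47.0)    # column index
depth = bathy[i_lat, i_lon]

# Limit the dataset to a longitude/latitude box using boolean masks
lat_mask = (lat >= -45) & (lat <= 45)
lon_mask = (lon >= -90) & (lon <= 90)
subset = bathy[np.ix_(lat_mask, lon_mask)]
```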

Numpy Reshape to obtain monthly means from data

I'm trying to obtain monthly means from an observed precipitation data set for the period 1901-2015. The current shape of my prec variable is (1380(time), 360(lon), 720(lat)), with 1380 being the number of months over a 115 year period. I have been informed that to calculate monthly means, the most effective way is to conduct an np.reshape command on the prec variable to split the array up into months and years. However I am not sure what the best way to do this is. I was also wondering if there was a way in Python to select specific months of the year, as I will be producing plots for each month of the year.
I have been attempting to reshape the prec variable with the code below. However I am not sure how to do this correctly:
import sys
import numpy as np
from netCDF4 import Dataset
#Set Source Folder
sys.path.append('../../..')
SrcFld = ("/export/silurian/array-01/obs/CRU/")
#Retrieve Data
data_path = ''
example = (str(SrcFld) + 'cru_ts4.00.1901.2015.pre.dat.nc')
Data = Dataset(example)
#Create Prec Mean Array and reshape to get monthly means
Prec_mean = np.zeros((360,720))
#Retrieve Variables
Prec = Data.variables['pre'][:]
lats = Data.variables['lat'][:]
lons = Data.variables['lon'][:]
np.reshape(Prec, ())
#Get Annual/Monthly Average
Prec_mean =np.mean(Prec,axis=0)
Any guidance on this issue would be appreciated.
The following snippet will first dice the precipitation array year-wise. We can then use that array to get the monthly average of precipitation.
>>> prec = np.random.rand(1380,360,720)
>>> ind = np.arange(12,1380,12)
>>> yearly_split = np.array(np.split(prec, ind, axis=0))
>>> yearly_split.shape
(115, 12, 360, 720)
>>> monthly_mean = yearly_split.mean(axis=0)
>>> monthly_mean.shape
(12, 360, 720)
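Since the time axis holds a whole number of years, the same split can also be done with a single reshape, which additionally makes it easy to pick out one month of the year. Below is a minimal sketch on a small random grid, assuming the record starts in January:

```python
import numpy as np

n_years, nlat, nlon = 115, 3, 4     # small grid for the example
prec = np.random.rand(n_years * 12, nlat, nlon)

# Split the time axis into (year, month); requires the record to start in January
by_year = prec.reshape(n_years, 12, nlat, nlon)

monthly_mean = by_year.mean(axis=0)   # (12, nlat, nlon): mean of each calendar month
annual_mean = by_year.mean(axis=1)    # (n_years, nlat, nlon): mean of each year
all_januaries = by_year[:, 0]         # (n_years, nlat, nlon): every January
```

Here `monthly_mean[0]` is the January average over all 115 years, `monthly_mean[1]` the February average, and so on.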
