Iterating over files and stitching together 3D arrays - python

I have several netCDF files that each contain an array of shape (365, 585, 1386), and I'm trying to read in each new array and stitch them together along axis=0, i.e., concatenating all of the days of the year (365). The other two dimensions are latitude and longitude, so ideally I would have several years of data for each lat/long point (each netCDF file is one calendar year of data).
import glob
from netCDF4 import Dataset
import numpy as np

data = '/Users/sjakober/Documents/ResearchSpring2020/erc_1979.nc'
files = sorted(glob.glob('erc*'))

for x, f in enumerate(files):
    nc = Dataset(data, mode='r')
    print(f)
    if x == 0:
        a = nc.variables['energy_release_component-g'][:]
    else:
        b = nc.variables['energy_release_component-g'][:]
        np.hstack((a, b))
    nc.close()
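Two likely problems stand out: the loop opens the same file (data) on every iteration instead of the current filename f, and np.hstack stacks along axis 1 while its return value is discarded. A minimal sketch of a fix, assuming the variable name and file pattern above (the shape comment assumes 365-day years):

import glob
from netCDF4 import Dataset
import numpy as np

arrays = []
for f in sorted(glob.glob('erc*')):
    # open the current file in the loop, not the same file every time
    with Dataset(f, mode='r') as nc:
        arrays.append(nc.variables['energy_release_component-g'][:])

# concatenate along the time axis (axis=0) and keep the result
stacked = np.concatenate(arrays, axis=0)  # shape roughly (365 * n_years, 585, 1386)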


Change dimension and values of netcdf file in Python

I'm trying to change the dimension of the values in a netCDF file.
First I read the netCDF file and interpolated the data.
import numpy as np
import netCDF4
from scipy.interpolate import interp1d

def interpolation(a, b, c):
    f = interp1d(a, b, kind='linear')
    return f(c)

file = 'directory/test.nc'
data = netCDF4.Dataset(file)
lon = data.variables['lon']        # size = 10
lat = data.variables['lat']        # size = 10
lev = data.variables['lev']        # size = 100
values = data.variables['values']  # size = (100,10,10)

new_lev = np.linspace(0, 1, 200)  # new vertical grid, size = 200
new_values = np.full((len(new_lev), len(lat), len(lon)), np.nan)  # size = (200,10,10)

### interpolation ###
for loop_lat in range(len(lat)):
    for loop_lon in range(len(lon)):
        new_values[:, loop_lat, loop_lon] = interpolation(lev[:], values[:, loop_lat, loop_lon], new_lev)
## how can I save these new_lev and new_values in the netcdf file ?
Using the interpolation, I converted the values from dimension A to dimension B.
Let's say the original dimension A is 100 and the interpolated dimension B is 200.
After changing the dimension, how can I save these values and the new dimension into the netCDF file?
Could you please give me some advice?
You can open a NetCDF file for editing in place. See:
https://unidata.github.io/netcdf4-python/#creatingopeningclosing-a-netcdf-file
Rather than:
data = netcdf4.Dataset(file)
Try:
data = netCDF4.Dataset(file, 'r+', clobber=True)
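Note that an existing dimension cannot be resized, so the interpolated field needs its own dimension. A minimal sketch, assuming the variable names from the question, that appends a new vertical dimension and the interpolated variables to the same file in 'r+' mode:

import netCDF4

data = netCDF4.Dataset(file, 'r+')

# the original 'lev' dimension (size 100) cannot be resized,
# so create a separate dimension for the 200-level grid
data.createDimension('new_lev', len(new_lev))

lev_var = data.createVariable('new_lev', 'f4', ('new_lev',))
lev_var[:] = new_lev

val_var = data.createVariable('new_values', 'f4', ('new_lev', 'lat', 'lon'))
val_var[:] = new_values

data.close()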

Create a raster stack from an xarray dataset in Python

I am trying to make a raster stack from an xarray dataset which I obtained from multiple netCDF files. There are 365 netCDF files, each containing 2D Sea Surface Temperature (SST) data with height 3600 and width 7200. To perform further operations I need to prepare a raster stack.
import netCDF4 as nc
import rasterio as rio
import numpy as np
import xarray as xr
import os

fpath = '/home/sst/2015'
pattern = '*.nc'
filelist = []
for root, dirs, files in os.walk(fpath):
    for name in files:
        filelist.append(os.path.join(fpath, name))

ds = xr.open_mfdataset(filelist, concat_dim='time', parallel=True)  # netCDF data
ds_data = ds.sel(time='2015')['SST']  # xarray DataArray with dimensions 365x3600x7200
A raster stack of this xarray will be used to extract data values at point locations. I am currently using numpy and rasterio as described in the rasterio documentation. By iterating over the 3D xarray, the following code writes 365 files to disk, which I can later read back and stack.
from rasterio.transform import from_origin

transform = from_origin(-180, 90, 0.05, 0.05)
fpath = '/home/sst/sst_tif/'
fname = 'sst_array_'
extname = '.tiff'
timedf = ds.time  # time dimension to loop over

for i in range(len(timedf)):
    np_array = np.array(ds_data[i])
    fwname = fpath + fname + str(i) + extname
    sst_tif = rio.open(fwname,
                       'w',
                       driver='GTiff',
                       height=3600,
                       width=7200,
                       dtype=np_array.dtype,
                       count=1,
                       crs='EPSG:4326',
                       transform=transform)
    sst_tif.write(np_array, 1)
    sst_tif.close()
However, this takes a very long time to process the entire dataset. I also attempted converting the entire xarray to a numpy array and writing all 365 layers to a single file, but that freezes the Python kernel.
Is there any way I can create this raster stack in memory and do further processing without having to write it to files on disk? I am trying to obtain functionality similar to that of the stack function in R's raster package.
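Since the end goal is extracting values at point locations, one option (a suggestion that assumes the coordinates are named lon and lat; adjust to your dataset) is to skip the GeoTIFF round trip entirely and sample the lazy xarray stack directly:

import xarray as xr

# hypothetical point locations; replace with your own coordinates
points_lon = xr.DataArray([10.5, -72.3], dims='points')
points_lat = xr.DataArray([43.1, -15.8], dims='points')

# nearest-neighbour sample of every time step at each point; dask evaluates
# this lazily without materializing the full 365 x 3600 x 7200 array
samples = ds_data.sel(lon=points_lon, lat=points_lat, method='nearest')
result = samples.compute()  # dimensions: (time, points)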

Comparing two lists of stellar x-y coordinates to find matching objects

I have two .txt files that contain the x and y pixel coordinates of thousands of stars in an image. These two different coordinate lists were the products of different data processing methods, which result in slightly different x and y values for the same object.
File 1 (the id_out is arbitrary):
id_out x_out y_out m_out
0 803.6550 907.0910 -8.301
1 700.4570 246.7670 -8.333
2 802.2900 894.2130 -8.344
3 894.6710 780.0040 -8.387
File 2:
xcen ycen mag merr
31.662 37.089 22.759 0.387
355.899 37.465 19.969 0.550
103.079 37.000 20.839 0.847
113.500 38.628 20.966 0.796
The objects listed in the .txt files are not organized in a way that allows me to identify the same object in both files. So, I thought that for every object in file 1, which has fewer objects than file 2, I would impose a test to find its match in file 2. For every star in file 1, I want to find the star in file 2 with the closest x-y coordinates using the distance formula:
distance = sqrt((x1 - x2)^2 + (y1 - y2)^2), within some tolerance that I can change. Then I want to print the x1, y1, x2, y2, m_out, mag, and merr parameters for each match to a master list.
Here is the code I have so far, but I am not sure how to arrive at a working solution.
#!/usr/bin/python
import pandas
import numpy as np

xcen_1 = np.genfromtxt('file1.txt', dtype=float, usecols=1)
ycen_1 = np.genfromtxt('file1.txt', dtype=float, usecols=2)
mag1 = np.genfromtxt('file1.txt', dtype=float, usecols=3)
xcen_2 = np.genfromtxt('file2.txt', dtype=float, usecols=0)
ycen_2 = np.genfromtxt('file2.txt', dtype=float, usecols=1)
mag2 = np.genfromtxt('file2.txt', dtype=float, usecols=2)
merr2 = np.genfromtxt('file2.txt', dtype=float, usecols=3)

tolerance = 10.0
i = 0
file = open('results.txt', 'w+')
file.write("column names")
for i in len(xcen_1):
    dist = np.sqrt((xcen_1[i]-xcen_2[*])^2 + (ycen_1[i]-ycen_2[*]^2))
    if dist < tolerance:
        f.write(i, xcen_1, ycen_1, xcen_2, ycen_2, mag1, mag2, merr2)
    else:
        pass
    i = i + 1
file.close
The code doesn't work because I don't know how to express that every star in file 2 must be run through the test, as indicated by the * index (which comes from IDL, in which I am more versed). Is there a solution for this logic, as opposed to the approach in this question:
To compare two independent image coordinate lists with same scale but coordinate-grid having some rotation and shift
Thanks in advance!
You can use pandas DataFrames. Here's how:

import pandas as pd

# files containing the x and y pixel coordinates and other information
df_file1 = pd.read_csv('file1.txt', sep=r'\s+')
df_file2 = pd.read_csv('file2.txt', sep=r'\s+')

join = []
for i in range(len(df_file1)):
    for j in range(len(df_file2)):
        dis = ((df_file1['x_out'][i] - df_file2['xcen'][j])**2 +
               (df_file1['y_out'][i] - df_file2['ycen'][j])**2)**0.5
        if dis < 10:
            join.append({'id_out': df_file1['id_out'][i], 'x_out': df_file1['x_out'][i],
                         'y_out': df_file1['y_out'][i], 'm_out': df_file1['m_out'][i],
                         'xcen': df_file2['xcen'][j], 'ycen': df_file2['ycen'][j],
                         'mag': df_file2['mag'][j], 'merr': df_file2['merr'][j]})

df_join = pd.DataFrame(join)
df_join.to_csv('results.txt', sep='\t')
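With thousands of stars in each list, the nested loop does a full O(n*m) comparison; a faster alternative (my suggestion, not part of the original answer) is a vectorized nearest-neighbour query with scipy's cKDTree:

import pandas as pd
from scipy.spatial import cKDTree

df_file1 = pd.read_csv('file1.txt', sep=r'\s+')
df_file2 = pd.read_csv('file2.txt', sep=r'\s+')

# build a KD-tree on the file 2 positions, then query the nearest
# neighbour of every file 1 star in one call
tree = cKDTree(df_file2[['xcen', 'ycen']].to_numpy())
dist, idx = tree.query(df_file1[['x_out', 'y_out']].to_numpy(), k=1)

matched = dist < 10.0  # same tolerance as above
df_join = pd.concat([df_file1[matched].reset_index(drop=True),
                     df_file2.iloc[idx[matched]].reset_index(drop=True)], axis=1)
df_join.to_csv('results.txt', sep='\t')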

Numpy Masked Array changes zeros to a huge number when uploaded in another script

I have a script where I create masks on a gridded array for certain distances from the coastline and save them to a netcdf file. I have a grid file with latitude, longitude and a land mask (netcdf files). My script (simplified here) does the following:
import numpy as np
from netCDF4 import Dataset

grd = Dataset('gridfile', 'r')
mask_rho = grd.variables['mask_rho'][:]
lat = grd.variables['lat'][:]
lon = grd.variables['lon'][:]

full_region = np.copy(mask_rho)  # copy land mask from grid file
full_region[lat < 5] = 0.   # mask latitudes below region of interest
full_region[lat > 35] = 0.  # mask latitudes above region of interest
mask_full = np.ma.masked_where(full_region == 0., full_region)  # numpy mask

filename = directory + 'mask.nc'
ncfile = Dataset(filename, 'w', format='NETCDF4')
ncfile.createDimension('lat', full_region.shape[0])
ncfile.createDimension('lon', full_region.shape[1])
dims = ('lat', 'lon')
data = ncfile.createVariable('mask_full', 'f4', dims)
data[:] = mask_full
ncfile.close()
However, when I open the netCDF file I created in another script, the data seems to come in three forms:
'masked data' (masked-out values are blank, values within the mask are 1),
'data' (values within the mask are 1, and masked-out values are some huge number, 996920996838.....),
and 'mask', which is a True/False boolean array.
Why is the 'data' part giving me such a huge number? And why are there three data types?
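What is likely happening (my explanation, since no answer is included above): netCDF has no native mask concept, so netCDF4-python writes masked elements as the variable's fill value, and the default float32 fill value is about 9.97e+36, which matches the huge number quoted. On reading, the library reassembles a numpy masked array, whose .data attribute holds the raw values (fill included) and whose .mask attribute holds the boolean mask. A sketch, assuming the mask.nc file written above:

from netCDF4 import Dataset

nc = Dataset('mask.nc', 'r')
m = nc.variables['mask_full'][:]  # returned as a numpy masked array by default

print(type(m))       # <class 'numpy.ma.core.MaskedArray'>
print(m.data.max())  # the raw fill value, ~9.97e+36 for float32
print(m.mask)        # the True/False boolean mask

# to work with a plain array, choose an explicit fill instead:
plain = m.filled(0.0)  # masked cells become 0.0
nc.close()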

Extract data from NETCDF (.NC file) based on time

I am currently working on extracting data from a .NC file to create a .cur file for usage in GNOME. I am doing this in Python.
I extracted the following variables:
water_u(time, y, x)
water_v(time, y, x)
x(x):
y(y):
time(time): time
SEP(time, y, x)
The cur file should contain the following:
[x][y][velocity x][velocity y]
This should happen for each time step present. In this case I have 10 time steps extracted, but I have thousands and thousands of [x][y] and velocity values.
My question is: how do I extract the velocities based on the time variable?
import numpy as np
from netCDF4 import Dataset
volcgrp = Dataset('file_1.nc', 'r')
var = volcgrp.variables['water_v'][:]
print(var)
newList = var.tolist()
file = open('text.txt', 'w')
file.write('%s\n' % newList)
print("Done")
volcgrp.close()
The key here is to read in water_u and water_v with all three of their dimensions; then you can index those variables along the time dimension.
import netCDF4
ncfile = netCDF4.Dataset('file_1.nc', 'r')
time = ncfile.variables['time'][:] #1D
water_u = ncfile.variables['water_u'][:,:,:] #3D (time x lat x lon)
water_v = ncfile.variables['water_v'][:,:,:]
To access data at each grid point for the first time in this file:
water_u_first = water_u[0,:,:]
To store this 3D data into a text file as you describe in the comments, you'll need to (1) loop over time, (2) access water_u and water_v at that time, (3) flatten those 2D arrays to 1D, (4) convert to strings if using the standard file.write technique (can be avoided using Pandas to_csv for example), and (5) write-out the 1D arrays as rows in the text file.
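A minimal sketch of those five steps (file and variable names taken from the question; the plain-text layout is illustrative, not the exact .cur specification):

import numpy as np
import netCDF4

ncfile = netCDF4.Dataset('file_1.nc', 'r')
x = ncfile.variables['x'][:]
y = ncfile.variables['y'][:]
time = ncfile.variables['time'][:]
water_u = ncfile.variables['water_u'][:, :, :]  # 3D: (time, y, x)
water_v = ncfile.variables['water_v'][:, :, :]

# every (x, y) pair on the grid, flattened to match the flattened velocities
xx, yy = np.meshgrid(x, y)

with open('text.txt', 'w') as out:
    for t in range(len(time)):        # (1) loop over time
        u = water_u[t].flatten()      # (2)+(3) slice this time, flatten 2D -> 1D
        v = water_v[t].flatten()
        out.write('time: %s\n' % time[t])
        for row in zip(xx.flatten(), yy.flatten(), u, v):
            out.write('%s %s %s %s\n' % row)  # (4)+(5) write rows as strings
ncfile.close()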
