Extract data from a NetCDF (.nc) file based on time - Python

I am currently working on extracting data from a .nc file to create a .cur file for use in GNOME. I am doing this in Python.
I extracted the following variables:
water_u(time, y, x)
water_v(time, y, x)
x(x)
y(y)
time(time)
SEP(time, y, x)
The .cur file should contain the following:
[x][y][velocity x][velocity y]
This should happen for each time step present. In this case I have 10 time steps extracted, but I have thousands and thousands of [x][y] and velocity values.
My question is: how do I extract the velocities based on the time variable?
import numpy as np
from netCDF4 import Dataset

volcgrp = Dataset('file_1.nc', 'r')
var = volcgrp.variables['water_v'][:]   # read the full 3D water_v array
print(var)

newList = var.tolist()
file = open('text.txt', 'w')
file.write('%s\n' % newList)            # dumps the whole array on one line
file.close()
print("Done")
volcgrp.close()

The key here is to read in water_u and water_v across all three of their dimensions; you can then slice those variables along the time dimension.
import netCDF4
ncfile = netCDF4.Dataset('file_1.nc', 'r')
time = ncfile.variables['time'][:] #1D
water_u = ncfile.variables['water_u'][:,:,:] # 3D: (time, y, x)
water_v = ncfile.variables['water_v'][:,:,:]
To access data at each grid point for the first time in this file:
water_u_first = water_u[0,:,:]
To store this 3D data in a text file as you describe in the comments, you'll need to (1) loop over time, (2) access water_u and water_v at that time, (3) flatten those 2D arrays to 1D, (4) convert the values to strings if using the standard file.write technique (this can be avoided with, for example, pandas to_csv), and (5) write out the 1D arrays as rows in the text file.
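A minimal sketch of those five steps might look like the following. The header line and column layout that a GNOME .cur file actually expects are assumptions here, so treat the output format as a placeholder:

import numpy as np
from netCDF4 import Dataset

ncfile = Dataset('file_1.nc', 'r')
x = ncfile.variables['x'][:]                 # 1D coordinate arrays
y = ncfile.variables['y'][:]
time = ncfile.variables['time'][:]
water_u = ncfile.variables['water_u'][:]     # 3D: (time, y, x)
water_v = ncfile.variables['water_v'][:]

xx, yy = np.meshgrid(x, y)                   # 2D grids so every (x, y) pair lines up with a velocity

with open('file_1.cur', 'w') as out:
    for t in range(len(time)):                                 # (1) loop over time
        u = np.ma.filled(water_u[t, :, :], 0.0).flatten()      # (2)+(3) slice and flatten; masked (land) points filled with 0
        v = np.ma.filled(water_v[t, :, :], 0.0).flatten()
        out.write('[TIME] %s\n' % time[t])                     # placeholder time header, not the official .cur syntax
        for xi, yi, ui, vi in zip(xx.flatten(), yy.flatten(), u, v):
            out.write('%f %f %f %f\n' % (xi, yi, ui, vi))      # (4)+(5) one row per grid point

ncfile.close()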

Related

Fitness tracking using the Python NumPy and Matplotlib modules: take the latest data from a specified folder and store it in a NumPy array

I have a folder fitness_tracker, in which I have more folders like Location GPS 2021-10-23; the only difference between the folders is the date. In all these subfolders there is a CSV file named Raw Data.
Raw Data includes time, velocity, latitude and longitude in different columns. I want to write a program that goes into fitness_tracker, takes the latest folders (let's say 5 out of 10 folders) by reading the folder names, goes into those folders, reads the Raw Data CSV files, and stores the time data in a single matrix array. Right now I can do it for a single file using NumPy.
I want to read the time values from Raw Data in each separate folder and store them in a matrix
time = np.array([t1, t2, t3, t4, t5])
and then use these data to make a graph using Matplotlib.
This is the program I am running now:
import numpy as np
import matplotlib.pyplot as plt
bus_data = np.loadtxt('Raw Data.csv',delimiter=',',skiprows=1) # 1a. Import GPS File
time = bus_data[:,0]/60 # Second to minute
latitude = bus_data[:,1]
longitude = bus_data[:,2]
altitude = bus_data[:,3] # Unit = Meter
speed = bus_data[:,5] # Unit = Meter / second
distance = bus_data[:,7] # Unit = kilometer
fig1, axs1 = plt.subplots(1, 1)
axs1.plot(distance, speed, 'k.',markersize = 1, label='data')
axs1.set(xlabel='Distance (km)', ylabel='Speed (m/s)')
axs1.set_title('Speed over Distance')
axs1.legend()
plt.savefig('Speed over Distance.png',dpi=200)
plt.show()
This is how I'd get the time data from a list of directories:
import numpy as np
from os import listdir
from os.path import isfile, join

directories = [dir1, dir2, ...]

files = []
for directory in directories:
    # Collect the full path of every CSV file in each directory.
    files.extend(join(directory, f) for f in listdir(directory)
                 if isfile(join(directory, f)) and f.endswith(".csv"))

data = []
for file in files:
    data.append(np.loadtxt(file, delimiter=',', skiprows=1))

time = []
for d in data:
    time.append(d[:, 0] / 60)   # seconds to minutes
If you want to get the list of directories based on their names (and thus their dates), you'll have to do some more work parsing the dates out of the folder names, but I guess that deserves a new question, as this one is already asked way too broadly imho.
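If you do go that route, a rough sketch could look like the one below. It assumes the folder names end with an ISO date such as Location GPS 2021-10-23, as in the question, and that you want the newest 5:

from datetime import datetime
from os import listdir
from os.path import isdir, join

base = 'fitness_tracker'

def folder_date(name):
    # Assumes the folder name ends with a date, e.g. "Location GPS 2021-10-23".
    return datetime.strptime(name.rsplit(' ', 1)[-1], '%Y-%m-%d')

folders = [f for f in listdir(base) if isdir(join(base, f))]
latest = sorted(folders, key=folder_date, reverse=True)[:5]   # newest 5 folders
directories = [join(base, f) for f in latest]                 # feeds into the loop above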

Comparing two lists of stellar x-y coordinates to find matching objects

I have two .txt files that contain the x and y pixel coordinates of thousands of stars in an image. These two different coordinate lists were the products of different data processing methods, which result in slightly different x and y values for the same object.
File 1 (the id_out is arbitrary):
id_out x_out y_out m_out
0 803.6550 907.0910 -8.301
1 700.4570 246.7670 -8.333
2 802.2900 894.2130 -8.344
3 894.6710 780.0040 -8.387
File 2
xcen ycen mag merr
31.662 37.089 22.759 0.387
355.899 37.465 19.969 0.550
103.079 37.000 20.839 0.847
113.500 38.628 20.966 0.796
The objects listed in the .txt files are not organized in a way that allows me to identify the same object in both files. So, for every object in file 1, which has fewer objects than file 2, I thought I would impose a test to find its match in file 2. For every star in file 1, I want to find the star in file 2 with the closest x-y coordinates using the distance formula distance = sqrt((x1 - x2)^2 + (y1 - y2)^2), within some distance tolerance that I can change. Then I want to write the x1, y1, x2, y2, m_out, mag, and merr parameters for each match to a master list in the file.
Here is the code I have so far, but I am not sure how to arrive at a working solution.
#!/usr/bin/python
import pandas
import numpy as np

xcen_1 = np.genfromtxt('file1.txt', dtype=float, usecols=1)
ycen_1 = np.genfromtxt('file1.txt', dtype=float, usecols=2)
mag1 = np.genfromtxt('file1.txt', dtype=float, usecols=3)
xcen_2 = np.genfromtxt('file2.txt', dtype=float, usecols=0)
ycen_2 = np.genfromtxt('file2.txt', dtype=float, usecols=1)
mag2 = np.genfromtxt('file2.txt', dtype=float, usecols=2)
merr2 = np.genfromtxt('file2.txt', dtype=float, usecols=3)

tolerance = 10.0
i = 0
file = open('results.txt', 'w+')
file.write("column names")
for i in len(xcen_1):
    dist = np.sqrt((xcen_1[i]-xcen_2[*])^2 + (ycen_1[i]-ycen_2[*])^2)
    if dist < tolerance:
        f.write(i, xcen_1, ycen_1, xcen_2, ycen_2, mag1, mag2, merr2)
    else:
        pass
    i = i+1
file.close
The code doesn't work because I don't know how to implement the requirement that every star in file 2 be run through the test, as indicated by the * index (which comes from IDL, in which I am more versed). Is there a solution for this logic, as opposed to the approach taken in this case:
To compare two independent image coordinate lists with same scale but coordinate-grid having some rotation and shift
Thanks in advance!
You can use pandas DataFrames. Here's how:
import pandas as pd

# Files containing the x and y pixel coordinates and other information
df_file1 = pd.read_csv('file1.txt', sep=r'\s+')
df_file2 = pd.read_csv('file2.txt', sep=r'\s+')

join = []
for i in range(len(df_file1)):
    for j in range(len(df_file2)):
        dis = ((df_file1['x_out'][i]-df_file2['xcen'][j])**2 +
               (df_file1['y_out'][i]-df_file2['ycen'][j])**2)**0.5
        if dis < 10:
            join.append({'id_out': df_file1['id_out'][i], 'x_out': df_file1['x_out'][i],
                         'y_out': df_file1['y_out'][i], 'm_out': df_file1['m_out'][i],
                         'xcen': df_file2['xcen'][j], 'ycen': df_file2['ycen'][j],
                         'mag': df_file2['mag'][j], 'merr': df_file2['merr'][j]})

df_join = pd.DataFrame(join)
df_join.to_csv('results.txt', sep='\t')
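As a side note (not part of the answer above), the nested loop is O(n*m) and can get slow with thousands of stars in each list. If scipy is available, a KD-tree lookup does the same within-tolerance nearest-neighbour match in one vectorised call; this sketch assumes the same column names and a 10-pixel tolerance:

import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

df1 = pd.read_csv('file1.txt', sep=r'\s+')
df2 = pd.read_csv('file2.txt', sep=r'\s+')

tree = cKDTree(df2[['xcen', 'ycen']].values)
# For each star in file 1, find the distance to and index of the nearest star in file 2.
dist, idx = tree.query(df1[['x_out', 'y_out']].values, k=1)

matched = dist < 10.0                                   # tolerance in pixels
result = pd.concat([df1.loc[matched].reset_index(drop=True),
                    df2.iloc[idx[matched]].reset_index(drop=True)], axis=1)
result.to_csv('results.txt', sep='\t', index=False)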

Iterating over files and stitching together 3D arrays

I have several netCDF files that each hold an array with shape (365, 585, 1386), and I'm trying to read in each new array and stitch them together along axis=0, i.e. appending all of the days of the year (365). The other two dimensions are latitude and longitude, so ideally I end up with several years of data for each lat/long point (each netCDF file is one calendar year of data).
import glob
from netCDF4 import Dataset
import numpy as np

data = '/Users/sjakober/Documents/ResearchSpring2020/erc_1979.nc'
files = sorted(glob.glob('erc*'))

for x, f in enumerate(files):
    nc = Dataset(data, mode='r')
    print(f)
    if x == 0:
        a = nc.variables['energy_release_component-g'][:]
    else:
        b = nc.variables['energy_release_component-g'][:]
        np.hstack((a, b))
    nc.close()
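A sketch of the intended stitching (assuming each yearly file really holds a (365, 585, 1386) array under the same variable name) would open each file from the loop variable, collect the arrays, and concatenate once along axis 0. Note that in the attempt above the loop always opens the fixed data path, and np.hstack joins 3D arrays along axis 1 and its return value is discarded:

import glob
import numpy as np
from netCDF4 import Dataset

files = sorted(glob.glob('erc*'))
yearly = []
for f in files:
    with Dataset(f, mode='r') as nc:              # open the file from the loop, not a fixed path
        yearly.append(nc.variables['energy_release_component-g'][:])

# Shape becomes (365 * n_years, 585, 1386): days stacked along axis 0.
stacked = np.concatenate(yearly, axis=0)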

How can I append features column-wise and start a new row for each sample

I am trying to create a training data file which is structured as follows:
[Rows = Samples, Columns = Features]
So if I have 100 samples and 2 features, the shape of my np.array would be (100, 2).
The list below contains path strings to the .nrrd 3D sample patch data files.
['/Users/FK/Documents/image/0128/subject1F_200.nrrd',
'/Users/FK/Documents/image/0128/subject2F_201.nrrd']
This is the code I have so far:
training_file = []
# For each sample in my image folder
for patches in dir_0128_list:
    # Reads the 64x64x64 numpy array
    data, options = nrrd.read(patches)
    # Calculates the median and sum of the 3D array file. 2 features per sample
    f_median = np.median(data)
    training_file.append(f_median)
    f_sum = np.sum(data)
    training_file.append(f_sum)
    # Calculates a numpy array with shape (169,) containing 169 features per sample.
    f_mof = my_own_function(data)
    training_file.append(f_mof)

training_file = np.array((training_file), dtype=np.float32)
# training_file = np.column_stack((core_training_list))
If I don't use the np.column_stack function I get a (173, 1) matrix, and (1, 173) if I run it. In this scenario it should have a (2, 171) shape.
I want to calculate the sum and median and append them to a list or numpy array column-wise. At the end of each loop iteration I want to jump down one row and append the features column-wise for the next sample, and so on...
Very simple solution
Instead of
f_median = np.median(data)
training_file.append(f_median)
f_sum = np.sum(data)
training_file.append(f_sum)
you could do
training_file.append((np.median(data), np.sum(data)))
Slightly longer solution
You would still have one piece of consecutive code that is not easy to reuse and test individually.
I would structure the script into different parts:
Iterate over the files to read the patches
Calculate the mean and sum
Aggregate to the requested format
Read the patches
def read_patches(files):
    for file in files:
        yield nrrd.read(file)
This makes a generator that yields the patch info lazily.
Calculate
def parse_patch(patch):
    data, options = patch
    return np.median(data), np.sum(data)
Putting it together
from pathlib import Path
file_dir = Path(<my_filedir>)
files = file_dir.glob('*.nrrd')
patches = read_patches(files)
training_file = np.array([parse_patch(patch) for patch in patches], dtype=np.float32)
This might look convoluted, but it allows for easy testing of each of the sub-blocks.
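For example, parse_patch can be checked on a small synthetic patch without any .nrrd files on disk; the tuple below just mimics the (data, options) pair that nrrd.read returns:

import numpy as np

fake_patch = (np.arange(8).reshape(2, 2, 2), {})   # stand-in for nrrd.read output
print(parse_patch(fake_patch))                     # -> (3.5, 28)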

Parallel process to read a bunch of files and save them into one pandas DataFrame

Background
For some place of interest, we want to extract useful information from an open data source. Take meteorology data for example: we just want to recognize the long-term temporal pattern at one point, but the data files often cover the whole world.
Here, I use Python to extract the vertical velocity for one spot from the FNL (ds083.2) files, every 6 hours over one year.
In other words, I want to read the original data and save the target variable along the timeline.
My attempt
import numpy as np
from netCDF4 import Dataset
import pygrib
import pandas as pd
import os, time, datetime

# Find the corresponding grid box
def find_nearest(array, value):
    idx = (np.abs(array - value)).argmin()
    return array[idx]

## Obtain the X, Y indices
site_x, site_y = 116.4074, 39.9042           ## The location of interest.
grib = './fnl_20140101_06_00.grib2'          ## Any file for obtaining lat-lon
grbs = pygrib.open(grib)
grb = grbs.select(name='Vertical velocity')[8]
lon_list, lat_list = grb.latlons()[1][0], grb.latlons()[0].T[0]
x_indice = np.where(lon_list == find_nearest(lon_list, site_x))[0]
y_indice = np.where(lat_list == find_nearest(lat_list, site_y))[0]

def extract_vm():
    files = os.listdir('.')                  ### All files are already saved in one path
    files.sort()
    dict_vm = {"V": []}
    ### Traversing the files
    for file in files[1:]:
        if file[-5:] == "grib2":
            grib = file
            grbs = pygrib.open(grib)
            grb = grbs.select(name='Vertical velocity')[4]   ## Select a certain Z level
            data = grb.values
            data = data[y_indice, x_indice]
            dict_vm['V'].append(data)
    ff = pd.DataFrame(dict_vm)
    return ff

extract_vm()
My thought
How can I speed up the reading process? At the moment I read the files linearly, so the run time grows linearly with the length of the period being processed.
Can we split those files into several chunks and handle them separately on a multi-core processor? Are there any other suggestions on my code to improve the speed?
Any comments will be appreciated!
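One possible sketch of the multi-core idea (not tested against the real FNL archive) hands each worker one GRIB2 file with multiprocessing.Pool. It reuses the x_indice / y_indice lookup computed above and assumes the workers inherit them, as with the fork start method on Linux:

import os
from multiprocessing import Pool
import pandas as pd
import pygrib

def read_one(path):
    # Extract the vertical velocity at the target grid box from one GRIB2 file.
    grbs = pygrib.open(path)
    grb = grbs.select(name='Vertical velocity')[4]   # same Z level as above
    value = grb.values[y_indice, x_indice]
    grbs.close()
    return value

if __name__ == '__main__':
    paths = sorted(f for f in os.listdir('.') if f.endswith('grib2'))
    with Pool(processes=4) as pool:                  # worker count is arbitrary
        values = pool.map(read_one, paths)
    ff = pd.DataFrame({'V': values})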
