I have downloaded climate model output in the form of netCDF files, with one variable (pr) for the whole world at a daily time-step. My final goal is to have monthly data for Europe.
I have never used netCDF files before, and all the netCDF-specific software I could find doesn't seem to work on Windows. Since I program in R, I tried the ncdf4 package but ran into memory problems (my files are around 2 GB)... I am now trying the netCDF4 module in Python (first time I am using Python, so go easy on me).
I have managed to install everything and found some code online to import the dataset:
from netCDF4 import Dataset

nc_fid = Dataset(nc_f, 'r')  # nc_f is the path to the netCDF file
# Extract data from NetCDF file
lats = nc_fid.variables['lat'][:]
lons = nc_fid.variables['lon'][:]
time = nc_fid.variables['time'][:]
pp = nc_fid.variables['pr'][:]
However, all the tutorials I found are about how to make a netCDF file... I have no idea how to aggregate this daily rainfall (variable pr) into monthly values. I also have different types of calendar in different files, but I don't even know how to access that information:
time.calendar
AttributeError: 'numpy.ndarray' object has no attribute 'calendar'
Please help, I don't want to have to learn Linux just so I can sort out some data :(
Why not avoid programming entirely and use NCO, which supplies the ncrcat command to aggregate data like so:
ncrcat day*.nc month.nc
Voilà. See the NCO documentation for more ncrcat examples.
Added 2016-06-28: if you want a monthly average rather than a month-long timeseries, use the same command with ncra instead of ncrcat. The NCO manual covers this.
If you have a daily timestep and want to calculate the monthly mean, you can do
cdo monmean input_yyyy.nc output_yyyy.nc
It sounds as if you have several of these files, so you will need to merge them with
cdo mergetime file_*.nc timeseries.nc
where the * is a wildcard for the years.
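If you would rather stay in Python on Windows, xarray can do the same monthly aggregation, and it decodes the time units and calendar attributes for you (non-standard calendars come back as cftime dates). A minimal sketch on synthetic data; with a real file you would start from xr.open_dataset('your_file.nc') instead, and the variable and coordinate names here ('pr', 'lat', 'lon') are assumptions taken from the question:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the real file: one year of daily "pr" on a 2x2 grid.
time = pd.date_range("2000-01-01", "2000-12-31", freq="D")
ds = xr.Dataset(
    {"pr": (("time", "lat", "lon"), np.random.rand(len(time), 2, 2))},
    coords={"time": time, "lat": [45.0, 46.0], "lon": [5.0, 6.0]},
)

# Subset to a (here, tiny) European box, then aggregate daily -> monthly mean.
europe = ds.sel(lat=slice(44.0, 47.0), lon=slice(4.0, 7.0))
monthly = europe.resample(time="MS").mean()
print(monthly.pr.shape)  # (12, 2, 2): 12 months on the 2x2 grid
```

For the 2 GB files, opening with chunks (xr.open_dataset(..., chunks={"time": 365})) makes xarray process the data lazily via dask, so the whole file never has to fit in memory at once.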
I currently have a number of csv files, each for a different location. Within each file there are two columns: one is a datetime and the other is the hourly maximum wind gust in knots. I also have a separate csv that contains the coordinates for each of these file locations.
Initially I want to create a netCDF from 12 locations in a 3 x 4 grid with a spacing of 0.25 degrees.
All of the examples I have read online about creating netCDF files from csv start with files that contain lat, lon and then the variable, whereas I am starting with a timeseries and the variable, with the lat/lon for each point kept separately.
On top of this, all the examples I've seen load each timestep in manually, one at a time. Since I am using hourly data from 1979 onwards this is unfeasible, and if possible I would like to load all the data in one go. Failing that, it would still be quicker to load the data per grid point rather than per time step. Any help with these problems would be much appreciated.
I have been following the example from
https://www.esri.com/arcgis-blog/products/arcgis/data-management/creating-netcdf-files-for-analysis-and-visualization-in-arcgis/
if this is of any use to those providing assistance
I am also familiar with CDO, but I'm not sure whether it has any useful functionality here.
cheers
There are a variety of ways of doing this. The simplest is possibly to use pandas and xarray. The code below shows how to create a simple dataframe and save it to netCDF using pandas/xarray.
import pandas as pd
import xarray as xr
df = pd.DataFrame({"lon": range(0, 10), "lat": range(0, 10),
                   "value": range(0, 10)})
df = df.set_index(["lat", "lon"])
df.to_xarray().to_netcdf("outfile.nc")
You haven't specified how the time is stored, etc., so I will leave it to you to work out how to read the csvs and get the times into the necessary format.
I've got a GRIB file with ECMWF forecast, and I'm keen to pull data from it based on coordinate inputs. As in, provide coordinates, and get the forecast for the next 5 days, for a specific time (wind speed, gust speed, wind direction, wave height..).
I think Python is probably the best option to accomplish this. Can someone point me in the right direction?
I'm guessing the binary needs to be converted to JSON (or another readable format), and then I can parse through it and look up the data for the coordinates provided?
One way of doing this in native Python is using xarray and cfgrib. Here is a tutorial. Here is the key code from the tutorial:
import xarray as xr
ds = xr.open_dataset('<your_grib>.grib', engine='cfgrib')
Once you have done this, all the fields in the grib file will be available. The general form is
ds.<field_name>[<index>].values
Be warned that this code is very slow compared to using the GRIB tools provided by the US National Weather Service. Check out degrib. Most of the weather processing code is written in C and Fortran, because it is so much faster than Python. Depending on your available compute resources and data size, you may not be able to process a whole grib file in Python before the forecast it contains expires.
Finally, this topic is discussed more extensively on the GIS stack exchange. "grib" is a tag over there.
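For the coordinate-based lookup specifically, xarray's .sel with method='nearest' picks the grid point closest to the requested position. A sketch on a synthetic dataset, since the field and coordinate names ('ws', 'latitude', 'longitude') depend on your grib file; print(ds) on the real dataset to see what cfgrib actually gives you:

```python
import numpy as np
import xarray as xr

# Stand-in for the dataset cfgrib would return; names are assumptions.
ds = xr.Dataset(
    {"ws": (("latitude", "longitude"), np.arange(9.0).reshape(3, 3))},
    coords={"latitude": [50.0, 51.0, 52.0], "longitude": [0.0, 1.0, 2.0]},
)

# Pull the value at the grid point nearest the requested coordinates.
point = ds.ws.sel(latitude=51.2, longitude=0.9, method="nearest")
print(float(point))  # 4.0, the value at (51.0, 1.0)
```

With a real forecast file you would also select along the time/step dimension, so no JSON conversion is needed at all.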
I am new to Python and have been searching for this but can't find any questions on it. I have stock price data for hundreds of stocks, all in .txt files. I am trying to load them all into a Jupyter notebook to analyze them, ideally with charts and mathematical analysis (specifically mean-reversion analysis).
I am wondering how I can load so many files at once? I need to be able to analyze each of them to see if they are reverting to their mean price. Then I would like to create a chart of the five stocks furthest from their mean.
Also, should I convert them to .csv files and then load them into pandas? What are some good libraries to use? I know pandas, matplotlib, and the math library, as well as numpy.
Thank you.
Use glob to list the directory and pandas to read the files, then concatenate them all:
from glob import glob
import pandas as pd

dir_containing_files = 'path_to_csv_files'
df = pd.concat([pd.read_csv(i) for i in glob(dir_containing_files + '/*.txt')])
I'm guessing your text files contain columns of data separated by some delimiter, in which case you can use pd.read_csv (even without changing the file extension to .csv):
data = pd.read_csv('stock_data.txt', sep=",")
# change `sep` to whatever delimiter is in your files
You could put the line above into a loop to load many files at once. I can't say exactly how to loop through them without knowing the pattern in your file names.
In addition to Pandas, libraries that I would reach for to do mean reversion analysis are:
statsmodels for model fitting
matplotlib for drawing graphs
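As a starting point for the "top 5 furthest from the mean" part, here is a hedged pandas sketch on random data; the ticker names and the wide price layout are assumptions, and in practice you would build `prices` from your concatenated files:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one price column per ticker, rows are trading days.
rng = np.random.default_rng(0)
prices = pd.DataFrame(
    rng.normal(100, 5, size=(250, 8)),
    columns=[f"TICK{i}" for i in range(8)],
)

# Percentage deviation of the latest price from each stock's own mean,
# then the five stocks whose latest price is furthest from that mean.
deviation = (prices.iloc[-1] - prices.mean()) / prices.mean()
top5 = deviation.abs().nlargest(5)
print(top5)
```

From there, `top5.plot.bar()` gives the comparison chart, and statsmodels can test whether each series is actually mean-reverting (e.g. a stationarity test).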
I wanted to get some minute history data by using the following:
hist_minutes = data.history(context.aapl, 'price', 50, '1m')
This gave me the following error:
NoDataForSid:No minute data for sid 2509.
This is strange, because when I used 1d instead of 1m it did work. Why is that, and how can this be fixed so I can also get minute data?
What data are you using? If you ingested the default Quandl bundle, that dataset in particular only has daily prices, and does not have minutely prices. You'll need to gather your own minutely data and either write a new bundle or ingest csv files.
If you want to use anything that's not US equity daily pricing, you'll need your own data.
Source / Disclaimer: I'm a Zipline maintainer.
I know there is software like wgrib2 that will convert files in grib and grib2 format to NetCDF files, but I need to go the other way: from NetCDF to grib2, because the local weather offices here can only consume gridded data in grib2 format.
It appears that one solution could be in Python, using the NetCDF4-Python library (or other) to read the NetCDF files and using pygrib to write grib2.
Is there a better way?
After some more research, I ended up using the British Met Office "Iris" package (http://scitools.org.uk/iris/docs/latest/index.html), which can read NetCDF as well as OPeNDAP, GRIB and several other formats, and allows saving as NetCDF or GRIB.
Basically the code looks like:
import iris
cubes = iris.load('input.nc')  # each variable in the netCDF file is a cube
iris.save(cubes[0], 'output.grib2')  # save a specific variable to grib
But if your netcdf file doesn't contain sufficient metadata, you may need to add it, which you can also do with Iris. Here's a full working example:
https://github.com/rsignell-usgs/ipython-notebooks/blob/master/files/Iris_CFSR_wave_wind.ipynb
One can also use the Climate Data Operators (CDO) for the task (https://code.zmaw.de/projects/cdo/wiki), but you need to install the software along with its additional libraries.
I know CDO is mentioned above, but I thought it would be useful to give the full command
cdo -f grb2 copy in.nc out.grb
ECMWF has a command-line tool to do just this: https://software.ecmwf.int/wiki/display/GRIB/grib_to_netcdf