xarray: compute daily anomalies from monthly resampled average (not the climatology)

xarray's documentation explains how to compute anomalies relative to the monthly climatology. Here I am trying to do something slightly different: from a daily timeseries, I would like to compute each day's anomaly relative to that month's average (not relative to the monthly climatology).
I managed to do it using groupby and a manually created monthly stamp (code below). Is there a better, less hacky way to obtain the same result?
import xarray as xr
import numpy as np
import pandas as pd
# Create a data array
t = pd.date_range('2001', '2003', freq='D')
da = xr.DataArray(np.arange(len(t)), coords={'time':t}, dims='time')
# Monthly time stamp for groupby
da.coords['stamp'] = ('time', [str(y) + '-' + str(m) for (y, m) in
                               zip(da['time.year'].values,
                                   da['time.month'].values)])
# Anomaly
da_ano = da.groupby('stamp') - da.groupby('stamp').mean()
da_ano.plot();

You could explicitly resample the monthly time-series of means into a daily time-series. Example:
monthly = da.resample(time='1MS').mean()
upsampled_monthly = monthly.resample(time='1D').ffill()
anomalies = da - upsampled_monthly
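Another option that avoids the hand-built stamp coordinate is to group on a year-month string computed with the dt accessor. A minimal sketch (it should give the same anomalies as the stamp approach in the question):
# build the 'YYYY-MM' grouping key on the fly instead of storing a coordinate
stamp = da['time'].dt.strftime('%Y-%m')
da_ano = da.groupby(stamp) - da.groupby(stamp).mean()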

Related

How to calculate rolling annualized returns over 1/2/3 years?

I have a dataframe with S&P weekly returns as defined below. I'm looking to compute the rolling annualized return on a 1- and 2-year basis with a rolling window stepped 1 week at a time. I'm not an expert in performance metrics: do I compute the annualized return for each date in the time series? Any guidance is appreciated.
import pandas as pd
rtn = [['12/25/2004','1.371495'],['1/1/2005','0.169999'],['1/8/2005','-2.079667'],['1/15/2005','-0.121435'],['1/22/2005','-1.395396'],['1/29/2005','0.319125'],['2/5/2005','2.748624'],['2/12/2005','0.264128'],['2/19/2005','-0.272399'],['2/26/2005','0.867309'],['3/5/2005','0.926353'],['3/12/2005','-1.750495'],['3/19/2005','-0.854506'],['3/26/2005','-1.527771'],['4/2/2005','0.162125'],['4/9/2005','0.746782'],['4/16/2005','-3.248578'],['4/23/2005','0.844543'],['4/30/2005','0.455245'],['5/7/2005','1.282951'],['5/14/2005','-1.400242'],['5/21/2005','3.097504'],['5/28/2005','0.829419'],['6/4/2005','-0.19604'],['6/11/2005','0.205592'],['6/18/2005','1.608322'],['6/25/2005','-2.062291'],['7/2/2005','0.284588'],['7/9/2005','1.490092'],['7/16/2005','1.343825'],['7/23/2005','0.482289'],['7/30/2005','0.08428'],['8/6/2005','-0.603821'],['8/13/2005','0.387027'],['8/20/2005','-0.81305'],['8/27/2005','-1.188282'],['9/3/2005','1.134381'],['9/10/2005','1.948857'],['9/17/2005','-0.248249'],['9/24/2005','-1.80248'],['10/1/2005','1.137008'],['10/8/2005','-2.629315'],['10/15/2005','-0.760918'],['10/22/2005','-0.575239'],['10/29/2005','1.622626'],['11/5/2005','1.860981'],['11/12/2005','1.272326'],['11/19/2005','1.150783'],['11/26/2005','1.614237'],['12/3/2005','-0.185336'],['12/10/2005','-0.415427'],['12/17/2005','0.656002'],['12/24/2005','0.148925'],['12/31/2005','-1.58025'],['1/7/2006','3.023067'],['1/14/2006','0.186268'],['1/21/2006','-2.016371'],['1/28/2006','1.780614'],['2/4/2006','-1.484149'],['2/11/2006','0.287199'],['2/18/2006','1.657959'],['2/25/2006','0.220439'],['3/4/2006','-0.122329'],['3/11/2006','-0.400158'],['3/18/2006','2.043408'],['3/25/2006','-0.324942'],['4/1/2006','-0.599207'],['4/8/2006','0.097749'],['4/15/2006','-0.471862'],['4/22/2006','1.733836'],['4/29/2006','-0.009851'],['5/6/2006','1.180809'],['5/13/2006','-2.531495'],['5/20/2006','-1.830585'],['5/27/2006','1.067124'],['6/3/2006','0.678422'],['6/10/2006','-2.754486'],['6/17/2006','-0.025335'],['6/24/2006','-0.53429'],['7/1/2006','2.090994'],['7/8/2006','-0.32387'],['7/15/2006','-2.293133'],['7/22/2006','0.344589'],['7/29/2006','3.109117'],['8/5/2006','0.11205'],['8/12/2006','-0.915976'],['8/19/2006','2.853287'],['8/26/2006','-0.532922'],['9/2/2006','1.29413'],['9/9/2006','-0.897293'],['9/16/2006','1.65329'],['9/23/2006','-0.358821'],['9/30/2006','1.626733'],['10/7/2006','1.081837'],['10/14/2006','1.20317'],['10/21/2006','0.233226'],['10/28/2006','0.656253'],['11/4/2006','-0.896976'],['11/11/2006','1.268202'],['11/18/2006','1.533185'],['11/25/2006','0.007764'],['12/2/2006','-0.237287'],['12/9/2006','0.976213'],['12/16/2006','1.254487'],['12/23/2006','-1.099541'],['12/30/2006','0.562431'],['1/6/2007','-0.587128']]
df = pd.DataFrame(rtn, columns=['date', 'weekly_rtn'])
df['date'] = pd.to_datetime(df['date'])
df['weekly_rtn'] = pd.to_numeric(df['weekly_rtn'])
df.head()
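No answer was posted for this one, but here is a minimal sketch of one common approach (an assumption, not the asker's method): treat weekly_rtn as a percentage, compound it over 52- and 104-week windows, and annualize the multi-year window by taking the annual root.
import numpy as np
# growth factors from percent returns
growth = 1 + df['weekly_rtn'] / 100
# 1-year rolling return: compound 52 weekly returns
df['rtn_1y'] = growth.rolling(52).apply(np.prod, raw=True) - 1
# 2-year rolling return, annualized: compound 104 weeks, then take the annual root
df['rtn_2y_ann'] = growth.rolling(104).apply(np.prod, raw=True) ** (52 / 104) - 1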

How can I use a dataframe and DatetimeIndex to return rolling 12-month measures?

Imagine a pandas dataframe with 2 columns (“Manager Returns” and “Benchmark Returns”) and a DatetimeIndex of monthly frequency. Please write a function to calculate the rolling 12-month manager alpha and rolling 12-month tracking error (both annualized).
So far I have this, but I am confused about the rolling 12-month part:
import pandas as pd
import numpy as np
#define dummy dataframe with monthly returns
df = pd.DataFrame(1 + np.random.rand(20), columns=['returns'])
#compute 12-month rolling returns
df_roll = df.rolling(window=12).apply(np.prod) - 1
So, you want to calculate the excess return of 'Manager Returns' compared to 'Benchmark Returns'. First, we create some random data for these two values.
import pandas as pd
import numpy as np
n=20
df = pd.DataFrame(dict(
    Manager=np.random.randint(2, 9, size=n),
    Benchmark=np.random.randint(1, 7, size=n),
    index=pd.date_range("20180101", freq='MS', periods=n)))
df.set_index('index', inplace=True)
To calculate the excess return (Alpha), the rolling mean of Alpha, and the rolling Tracking Error (the rolling standard deviation of Alpha), we create a new column for each value.
# Create Alpha
df['Alpha'] = df['Manager'] - df['Benchmark']
# Rolling mean of Alpha
df['Alpha_rolling'] = df['Alpha'].rolling(12).mean()
# Rolling mean of Tracking error
df['TrackingError_rolling'] = df['Alpha'].rolling(12).std()
Edit: I see that the values should be annualized, so you would have to scale the monthly figures; my finance lingo is not currently up to date.
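A common convention (an assumption here, not part of the original answer) is to annualize monthly figures by scaling the mean by 12 and the standard deviation by sqrt(12):
# annualize: scale the monthly mean by 12 and the monthly std by sqrt(12)
df['Alpha_rolling_ann'] = df['Alpha'].rolling(12).mean() * 12
df['TrackingError_rolling_ann'] = df['Alpha'].rolling(12).std() * np.sqrt(12)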

mask NetCDF using shapefile and calculate average and anomaly for all polygons within the shapefile

There are several tutorials (example 1, example 2, example 3) about masking NetCDF files using a shapefile and calculating average measures. However, I was confused by those workflows for masking NetCDF and extracting measures such as the average, and those tutorials did not cover extracting anomalies (for example, the difference between the temperature in 2019 and a baseline average temperature).
I make an example here. I have downloaded monthly temperature (download temperature file) from 2000 to 2019 and the state-level US shapefile (download shapefile). I want to get the state-level average temperature based on the monthly averages from 2000 to 2019, and the temperature anomaly of year 2019 relative to the baseline temperature from 2000 to 2010. Specifically, the final dataframe looks as follows:
state  avg_temp  anom_temp2019
AL     xx        xx
AR     xx        xx
...    ...       ...
WY     xx        xx
# Load libraries
%matplotlib inline
import regionmask
import numpy as np
import xarray as xr
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
# Read shapefile
us = gpd.read_file('./shp/state_cus.shp')
# Read gridded data
ds = xr.open_mfdataset('./temp/monthly_mean_t2m_*.nc')
......
I would really appreciate an explicit workflow that could do the above task. Thanks a lot.
This can be achieved using regionmask. I don't use your files here, but the xarray tutorial data and Natural Earth data for the US states.
import numpy as np
import regionmask
import xarray as xr
# load polygons of US states
us_states_50 = regionmask.defined_regions.natural_earth.us_states_50
# load an example dataset
air = xr.tutorial.load_dataset("air_temperature")
# turn into monthly time resolution
air = air.resample(time="M").mean()
# create a mask
mask3D = us_states_50.mask_3D(air)
# latitude weights
wgt = np.cos(np.deg2rad(air.lat))
# calculate regional averages
reg_ave = air.weighted(mask3D * wgt).mean(("lat", "lon"))
# calculate the average temperature (over 2013-2014)
avg_temp = reg_ave.sel(time=slice("2013", "2014")).mean("time")
# calculate the anomaly (w.r.t. 2013-2014)
reg_ave_anom = reg_ave - avg_temp
# select a single timestep (January 2013)
reg_ave_anom_ts = reg_ave_anom.sel(time="2013-01")
# remove the time dimension
reg_ave_anom_ts = reg_ave_anom_ts.squeeze(drop=True)
# convert to a pandas dataframe so it's in tabular form
df = reg_ave_anom_ts.air.to_dataframe()
# set the state codes as index
df = df.set_index("abbrevs")
# remove other columns
df = df.drop(columns="names")
You can find info on how to use your own shapefile in the regionmask docs (Working with geopandas); see the sketch below.
disclaimer: I am the main author of regionmask.
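For reference, a minimal sketch of plugging in the question's own shapefile via regionmask.from_geopandas (the attribute column names 'NAME' and 'STUSPS' are assumptions about the shapefile and may need adjusting):
import geopandas as gpd
import regionmask
# build a Regions object from the question's shapefile
us = gpd.read_file('./shp/state_cus.shp')
# 'NAME' and 'STUSPS' are assumed column names in the attribute table
us_states = regionmask.from_geopandas(us, names='NAME', abbrevs='STUSPS', name='us_states')
# the rest of the workflow above is unchanged
mask3D = us_states.mask_3D(ds)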

How to calculate daily evapotranspiration with the Hargreaves-Samani equation using Python?

I have ten years of weather data including maximum temperature (Tmax), minimum temperature (Tmin), rainfall, and solar radiation (Ra) for each day.
First, I would like to calculate evapotranspiration (ETo) for each day using the following equation:
ETo=0.0023*(((Tmax+Tmin)/2)+17.8)*sqrt(Tmax-Tmin)*Ra
Then, I would like to calculate the monthly and yearly averages of all parameters (Tmax, Tmin, Rainfall, Ra and ETo) and export them in Excel format.
I have written some parts. Could you possibly help me with completing it? I think it may need a loop.
import numpy as np
import pandas as pd
import math as mh
# load the weather data file
data_file = pd.read_excel(r'weather data.xlsx', sheet_name='city_1')
# defining time
year = data_file['Year']
month = data_file['month']
day = data_file['day']
# defining weather parameters
Tmax = data_file.loc[:,'Tmax']
Tmin = data_file.loc[:,'Tmin']
Rainfall = data_file.loc[:,'Rainfall']
Ra = data_file.loc[:,'Ra']
# adjusting time to start at zero
year = year-year[0]
month=month-month[0]
day=day-day[0]
#calculation process for estimation of evapotranspiration
ET0 = 0.0023*(((Tmax+Tmin)/2)+17.8)*(mh.sqrt(Tmax-Tmin))*Ra
Looks like you've got one data row (record) per day.
Since you already have Tmax, Tmin, Rainfall, and Ra in each row, you could add an ET0 column with the calculation like this:
data_file['ET0'] = data_file.apply(lambda x: 0.0023*(((x.Tmax+x.Tmin)/2)+17.8)*(mh.sqrt(x.Tmax-x.Tmin))*x.Ra, axis=1)
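Since np.sqrt works element-wise on whole columns, a vectorized sketch (using the column names from the question) can skip apply entirely, and the monthly and yearly averages the question asks for can be computed with groupby and exported with to_excel:
# vectorized alternative: np.sqrt operates element-wise on Series
data_file['ET0'] = (0.0023 * ((data_file['Tmax'] + data_file['Tmin']) / 2 + 17.8)
                    * np.sqrt(data_file['Tmax'] - data_file['Tmin']) * data_file['Ra'])
# monthly and yearly averages of all parameters
cols = ['Tmax', 'Tmin', 'Rainfall', 'Ra', 'ET0']
monthly_avg = data_file.groupby(['Year', 'month'])[cols].mean()
yearly_avg = data_file.groupby('Year')[cols].mean()
# export to Excel, as requested in the question
monthly_avg.to_excel('monthly_averages.xlsx')
yearly_avg.to_excel('yearly_averages.xlsx')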

Resample time series data to find tail characteristics

I have daily time series data. I am able to convert it to a monthly (or quarterly) time series and obtain the monthly mean using the resample function, as described in this link.
df.Date = pd.to_datetime(df.Date)
df.set_index('Date', inplace=True)
df.resample('MS').mean()
Instead of the monthly mean, I am interested in obtaining the monthly skewness (or kurtosis).
You could try something like this with scipy.stats.skew:
from scipy.stats import skew
df.resample('MS').agg(skew)
Or with scipy.stats.kurtosis:
from scipy.stats import kurtosis
df.resample('MS').agg(kurtosis)
Or, as @Ben.T suggests, you can use the methods pandas provides (pd.Series.skew, pd.Series.kurtosis); note that the pandas versions are bias-corrected, so they can differ slightly from the scipy defaults:
df.resample('MS').agg([pd.Series.skew, pd.Series.kurtosis])
#Same as:
#df.resample('MS').skew()
#or:
#df.resample('MS').kurtosis()
