How to calculate daily evapotranspiration with the Hargreaves-Samani equation in Python? - python

I have a ten-year weather data including maximum temperature (Tmax), minimum temperature (Tmin), rainfall and solar radiation (Ra) for each day.
At first, I would like to calculate evapotranspiration (ETo) for each day using the following equation:
ETo=0.0023*(((Tmax+Tmin)/2)+17.8)*sqrt(Tmax-Tmin)*Ra
Then, I would like to calculate the monthly and yearly averages of all parameters (Tmax, Tmin, Rainfall, Ra and ETo) and write them to an Excel file.
I have written some parts. Could you possibly help me complete it? I think it may need a loop.
import numpy as np
import pandas as pd
import math as mh
# load the weather data file
data_file = pd.read_excel(r'weather data.xlsx', sheet_name='city_1')
# defining time
year = data_file['Year']
month = data_file['month']
day = data_file['day']
# defining weather parameters
Tmax = data_file.loc[:,'Tmax']
Tmin = data_file.loc[:,'Tmin']
Rainfall = data_file.loc[:,'Rainfall']
Ra = data_file.loc[:,'Ra']
# adjusting time to start at zero
year = year-year[0]
month=month-month[0]
day=day-day[0]
#calculation process for estimation of evapotranspiration
ET0 = 0.0023*(((Tmax+Tmin)/2)+17.8)*(mh.sqrt(Tmax-Tmin))*Ra

Looks like you've got one data row (record) per day.
Since you already have Tmax, Tmin, Rainfall, and Ra in each row, you could add an ET0 column with the calculation like this:
data_file['ET0'] = data_file.apply(lambda x: 0.0023*(((x.Tmax+x.Tmin)/2)+17.8)*(mh.sqrt(x.Tmax-x.Tmin))*x.Ra, axis=1)
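For the rest of what the question asks (monthly/yearly averages and Excel output), here is a minimal sketch, assuming the column names from your snippet (Year, month, Tmax, Tmin, Rainfall, Ra); the output file name is just an example:
import numpy as np
import pandas as pd

data_file = pd.read_excel(r'weather data.xlsx', sheet_name='city_1')

# vectorized Hargreaves-Samani: no loop or apply needed,
# np.sqrt works element-wise on a whole column
data_file['ET0'] = (0.0023 * ((data_file['Tmax'] + data_file['Tmin']) / 2 + 17.8)
                    * np.sqrt(data_file['Tmax'] - data_file['Tmin'])
                    * data_file['Ra'])

cols = ['Tmax', 'Tmin', 'Rainfall', 'Ra', 'ET0']

# monthly and yearly averages of all parameters
monthly_avg = data_file.groupby(['Year', 'month'])[cols].mean()
yearly_avg = data_file.groupby('Year')[cols].mean()

# write both tables to one Excel file (file name is arbitrary)
with pd.ExcelWriter('averages.xlsx') as writer:
    monthly_avg.to_excel(writer, sheet_name='monthly')
    yearly_avg.to_excel(writer, sheet_name='yearly')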

Related

mask NetCDF using shapefile and calculate average and anomaly for all polygons within the shapefile

There are several tutorials (example 1, example 2, example 3) about masking NetCDF using a shapefile and calculating average measures. However, I was confused by those workflows for masking NetCDF data and extracting measures such as the average, and those tutorials did not cover extracting an anomaly (for example, the difference between the temperature in 2019 and a baseline average temperature).
I made an example here. I have downloaded monthly temperature (download temperature file) from 2000 to 2019 and the state-level US shapefile (download shapefile). I want to get the state-level average temperature based on the monthly average temperature from 2000 to 2019, and the temperature anomaly of year 2019 relative to the baseline temperature from 2000 to 2010. Specifically, the final dataframe looks as follows:
state   avg_temp   anom_temp2019
AL      xx         xx
AR      xx         xx
...     ...        ...
WY      xx         xx
# Load libraries
%matplotlib inline
import regionmask
import numpy as np
import xarray as xr
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
# Read shapefile
us = gpd.read_file('./shp/state_cus.shp')
# Read gridded data
ds = xr.open_mfdataset('./temp/monthly_mean_t2m_*.nc')
......
I would really appreciate help with an explicit workflow that can do the above task. Thanks a lot.
This can be achieved using regionmask. I don't use your files but the xarray tutorial data and naturalearth data for the US states.
import numpy as np
import regionmask
import xarray as xr
# load polygons of US states
us_states_50 = regionmask.defined_regions.natural_earth.us_states_50
# load an example dataset
air = xr.tutorial.load_dataset("air_temperature")
# turn into monthly time resolution
air = air.resample(time="M").mean()
# create a mask
mask3D = us_states_50.mask_3D(air)
# latitude weights
wgt = np.cos(np.deg2rad(air.lat))
# calculate regional averages
reg_ave = air.weighted(mask3D * wgt).mean(("lat", "lon"))
# calculate the average temperature (over 2013-2014)
avg_temp = reg_ave.sel(time=slice("2013", "2014")).mean("time")
# calculate the anomaly (w.r.t. 2013-2014)
reg_ave_anom = reg_ave - avg_temp
# select a single timestep (January 2013)
reg_ave_anom_ts = reg_ave_anom.sel(time="2013-01")
# remove the time dimension
reg_ave_anom_ts = reg_ave_anom_ts.squeeze(drop=True)
# convert to a pandas dataframe so it's in tabular form
df = reg_ave_anom_ts.air.to_dataframe()
# set the state codes as index
df = df.set_index("abbrevs")
# remove other columns
df = df.drop(columns="names")
You can find info on how to use your own shapefile in the regionmask docs (Working with geopandas).
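For completeness, a hedged sketch of turning the shapefile from the question into regions with regionmask.from_geopandas; the column names "NAME" and "STUSPS" are assumptions about the shapefile's attribute table and should be adapted to your data:
import geopandas as gpd
import regionmask

us = gpd.read_file('./shp/state_cus.shp')
# column names below are assumptions; check us.columns for the real ones
us_regions = regionmask.from_geopandas(us, names="NAME", abbrevs="STUSPS", name="us_states")
# then proceed as above, using your dataset instead of the tutorial data
mask3D = us_regions.mask_3D(ds)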
disclaimer: I am the main author of regionmask.

xarray: compute daily anomalies from monthly resampled average (not the climatology)

xarray's documentation explains how to compute anomalies relative to the monthly climatology. Here I am trying to do something slightly different: from a daily timeseries, I would like to compute the daily anomaly relative to that month's average (not relative to the monthly climatology).
I managed to do it using groupby and a manually created monthly stamp (code below). Is there a better, less hacky way to obtain the same result?
import xarray as xr
import numpy as np
import pandas as pd
# Create a data array
t = pd.date_range('2001', '2003', freq='D')
da = xr.DataArray(np.arange(len(t)), coords={'time':t}, dims='time')
# Monthly time stamp for groupby
da.coords['stamp'] = ('time', [str(y) + '-' + str(m) for (y, m) in
                               zip(da['time.year'].values,
                                   da['time.month'].values)])
# Anomaly
da_ano = da.groupby('stamp') - da.groupby('stamp').mean()
da_ano.plot();
You could explicitly resample the monthly time-series of means into a daily time-series. Example:
monthly = da.resample(time='1MS').mean()
upsampled_monthly = monthly.resample(time='1D').ffill()
anomalies = da - upsampled_monthly
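A small variant of the question's groupby approach (not part of the original answer, just a sketch assuming a reasonably recent xarray) builds the monthly stamp from the datetime accessor instead of a manual list comprehension:
# year-month label per timestep, then group on it as before
stamp = da['time'].dt.strftime('%Y-%m')
da_ano = da.groupby(stamp) - da.groupby(stamp).mean()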

Pandas: parallel plots using groupby

I was wondering if anyone could help me with parallel coordinate plotting.
First, this is what the data looks like:
It's data manipulated from: https://data.cityofnewyork.us/Transportation/2016-Yellow-Taxi-Trip-Data/k67s-dv2t
So I'm trying to normalise some features and use that to compute the mean of trip distance, passenger count and payment amount for each day of the week.
from pandas.tools.plotting import parallel_coordinates
features = ['trip_distance','passenger_count','payment_amount']
#normalizing data (min-max scaling)
for feature in features:
    df[feature] = (df[feature]-df[feature].min())/(df[feature].max()-df[feature].min())
#change format to datetime
pickup_time = pd.to_datetime(df['pickup_datetime'], format ='%d/%m/%y %H:%M')
#fill dayofweek column with 0~6 0:Monday and 6:Sunday
df['dayofweek'] = pickup_time.dt.weekday
mean_trip = df.groupby('dayofweek').trip_distance.mean()
mean_passanger = df.groupby('dayofweek').passenger_count.mean()
mean_payment = df.groupby('dayofweek').payment_amount.mean()
#parallel_coordinates('notsurewattoput')
So if I print mean_trip:
It shows the mean of each day of the week but I'm not sure how I would use this to draw a parallel coordinate plot with all 3 means on the same plot.
Does anyone know how to implement this?
I think you can replace aggregating the mean 3 times with a single aggregation that outputs one DataFrame instead of 3 Series:
mean_trip = df.groupby('dayofweek').trip_distance.mean()
mean_passanger = df.groupby('dayofweek').passenger_count.mean()
mean_payment = df.groupby('dayofweek').payment_amount.mean()
to:
from pandas.tools.plotting import parallel_coordinates
cols = ['trip_distance','passenger_count','payment_amount']
df1 = df.groupby('dayofweek', as_index=False)[cols].mean()
#https://stackoverflow.com/a/45082022
parallel_coordinates(df1, class_column='dayofweek', cols=cols)
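As a side note (not part of the original answer): in pandas 0.20+ the helper moved out of pandas.tools, so the import comes from pandas.plotting instead; a minimal usage sketch:
from pandas.plotting import parallel_coordinates
import matplotlib.pyplot as plt

cols = ['trip_distance', 'passenger_count', 'payment_amount']
df1 = df.groupby('dayofweek', as_index=False)[cols].mean()
parallel_coordinates(df1, class_column='dayofweek', cols=cols)
plt.show()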

Numpy Reshape to obtain monthly means from data

I'm trying to obtain monthly means from an observed precipitation data set for the period 1901-2015. The current shape of my prec variable is (1380(time), 360(lon), 720(lat)), with 1380 being the number of months over a 115 year period. I have been informed that to calculate monthly means, the most effective way is to conduct an np.reshape command on the prec variable to split the array up into months and years. However I am not sure what the best way to do this is. I was also wondering if there was a way in Python to select specific months of the year, as I will be producing plots for each month of the year.
I have been attempting to reshape the prec variable with the code below. However I am not sure how to do this correctly:
import sys
import numpy as np
from netCDF4 import Dataset

#Set Source Folder
sys.path.append('../../..')
SrcFld = ("/export/silurian/array-01/obs/CRU/")
#Retrieve Data
data_path = ''
example = (str(SrcFld) + 'cru_ts4.00.1901.2015.pre.dat.nc')
Data = Dataset(example)
#Create Prec Mean Array and reshape to get monthly means
Prec_mean = np.zeros((360,720))
#Retrieve Variables
Prec = Data.variables['pre'][:]
lats = Data.variables['lat'][:]
lons = Data.variables['lon'][:]
np.reshape(Prec, ())
#Get Annual/Monthly Average
Prec_mean =np.mean(Prec,axis=0)
Any guidance on this issue would be appreciated.
The following snippet will first dice the precipitation array year-wise. We can then use that array to get the monthly average of precipitation.
>>> prec = np.random.rand(1380,360,720)
>>> ind = np.arange(12,1380,12)
>>> yearly_split = np.array(np.split(prec, ind, axis=0))
>>> yearly_split.shape
(115, 12, 360, 720)
>>> monthly_mean = yearly_split.mean(axis=0)
>>> monthly_mean.shape
(12, 360, 720)
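Since the question explicitly asked about np.reshape, an equivalent sketch (assuming the record starts in January and contains only complete years, so 1380 = 115 x 12) would be:
>>> yearly = prec.reshape(115, 12, 360, 720)   # (year, month, 360, 720)
>>> monthly_mean = yearly.mean(axis=0)         # average over the 115 years
>>> monthly_mean.shape
(12, 360, 720)
>>> january_mean = monthly_mean[0]             # select a single calendar month
>>> january_mean.shape
(360, 720)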

Questions on pandas moving average

I am a beginner in Python and pandas. I am having difficulty with making a volatility-adjusted moving average, so I need your help.
A volatility-adjusted moving average is a moving average whose period is not static but dynamically adjusted according to volatility.
What I'd like to code is:
Get stock data from Yahoo Finance (monthly close)
Calculate monthly volatility x some constant --> use as the dynamic moving average period
Calculate the dynamic moving average
I've tried this code, but it fails. I don't know what the problem is. If you know the problem, or have any better code suggestion, please let me know.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import pandas_datareader.data as web
def price(stock, start):
    price = web.DataReader(name=stock, data_source='yahoo', start=start)['Adj Close']
    price = price / price[0]
    a = price.resample('M').last().to_frame()
    a.columns = ['price']
    return a

a = price('SPY','2000-01-01')
a['volperiod'] = round(a.rolling(12).std()*100)*2
for i in range(len(a.index)):
    k = a['price'].rolling(int(a['volperiod'][i])).mean()
    a['ma'][i] = k[i]
print(a)
First of all: you need to calculate pct_change on the price to get the volatility of returns.
My solution:
def price(stock, start):
    price = web.DataReader(name=stock, data_source='yahoo', start=start)['Adj Close']
    return price.div(price.iat[0]).resample('M').last().to_frame('price')

a = price('SPY','2000-01-01')
v = a.pct_change().rolling(12).std().dropna().mul(200).astype(int)

def dyna_mean(x):
    end = a.index.get_loc(x.name)
    start = end - x.price
    return a.price.iloc[start:end].mean()

pd.concat([a.price, v.price, v.apply(dyna_mean, axis=1)],
          axis=1, keys=['price', 'vol', 'mean'])
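A short usage sketch (not part of the original answer) that stores the result and plots the price against the dynamic mean; matplotlib is already imported in the question:
result = pd.concat([a.price, v.price, v.apply(dyna_mean, axis=1)],
                   axis=1, keys=['price', 'vol', 'mean'])
# compare the raw price with its volatility-adjusted moving average
result[['price', 'mean']].plot()
plt.show()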
