For a machine learning problem, I want to derive the hourly PV power of a specific system given various weather parameters, including hourly GHI and DHI, but no DNI. If I use one of the pvlib DNI estimation models, I always need the zenith angle. Since I only have hourly values for irradiance, I cannot be very specific about the angle. Would you take an hourly average? There is always the problem that angles close to 90° result in extremely high DNI values.
So far I have tried to manually calculate hourly DNI = (GHI - DHI)/cos(zenith), taking the mean of the 5-minute-resolution zenith angles as the hourly zenith. Sunrise at the location is almost always before 7 am, so I should get some very small PV power in hour 6 of the day. However, because the averaged angle is almost always above 90°, I get 0 kW AC power, or, on the few days when the mean angle is just below 90°, I get 40 kW AC power, which is the system's maximum as limited by the inverters and is even more unrealistic in these early hours.
ModelChain Parameters:
pvsys_ref=pvsyst
loc_ref=loc
orient_strat_ref=None
sky_mod_ref='ineichen'
transp_mod_ref='haydavies'
sol_pos_mod_ref='nrel_numpy'
airm_mod_ref='kastenyoung1989'
dc_mod_ref='cec'
ac_mod_ref=None
aoi_mod_ref='physical'
spec_mod_ref='no_loss'
temp_mod_ref='sapm'
loss_mod_ref='no_loss'
The required weather DataFrame (pandas) consists of the hourly simulated ghi, dhi, temperature and wind speed, as well as the manually calculated dni.
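For reference, a minimal sketch of how these reference parameters might map onto a pvlib ModelChain call; pvsyst and loc are the PVSystem and Location objects assumed to exist already, and the exact keyword set varies slightly between pvlib versions:

from pvlib.modelchain import ModelChain

mc = ModelChain(pvsyst, loc,
                clearsky_model='ineichen',
                transposition_model='haydavies',
                solar_position_method='nrel_numpy',
                airmass_model='kastenyoung1989',
                dc_model='cec',
                ac_model=None,
                aoi_model='physical',
                spectral_model='no_loss',
                temperature_model='sapm',
                losses_model='no_loss')
mc.run_model(weather)  # weather: hourly ghi, dhi, dni, temp_air, wind_speed
                       # (older pvlib versions also expect a times argument)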
Usually the midpoint of the hour is used to calculate the sun position/sun zenith, and for the sunset and sunrise hours, the midpoint of the period when the sun is above the horizon.
To calculate DNI from GHI and DHI, try using the function dni in pvlib.irradiance:
https://pvlib-python.readthedocs.io/en/latest/generated/pvlib.irradiance.dni.html#pvlib.irradiance.dni
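A rough sketch of that approach, where the coordinates are hypothetical and weather stands for the hourly weather DataFrame described above; pvlib.irradiance.dni internally limits non-physical DNI values for zenith angles close to 90°, which avoids the huge early-morning spikes:

import pandas as pd
import pvlib

lat, lon = 48.1, 11.6                      # hypothetical site, replace with the real coordinates

times = weather.index                      # hourly, left-labelled timestamps
midpoints = times + pd.Timedelta('30min')  # solar position at the hour midpoint

solpos = pvlib.solarposition.get_solarposition(midpoints, lat, lon)
zenith = pd.Series(solpos['zenith'].values, index=times)

# replaces the manual (GHI - DHI)/cos(zenith) calculation
weather['dni'] = pvlib.irradiance.dni(weather['ghi'], weather['dhi'], zenith)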
I have my cumulative bike riding time for the first 14 days
time = [2.29,2.29,3.15,3.89,4.72,5.21,5.21,5.55,5.8,6.18,6.44,6.9,7.11,7.32]
I know these values are described by the equation
y(t) = a*ln(t+b)
Where:
t - day of my riding
a, b - coefficients to be found
I need to find the correct coefficients a and b for the 14-day values by minimizing the sum of squared deviations (pretty simple with Solver in Excel), and then predict the day-30 value. How do I find these coefficients in Python?
Thanks for helping me!
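A minimal sketch using scipy.optimize.curve_fit, which performs the same least-squares minimization as Excel's Solver:

import numpy as np
from scipy.optimize import curve_fit

# cumulative riding time for the first 14 days (from the question)
time = [2.29, 2.29, 3.15, 3.89, 4.72, 5.21, 5.21, 5.55, 5.8, 6.18, 6.44, 6.9, 7.11, 7.32]
t = np.arange(1, len(time) + 1)  # day numbers 1..14

def model(t, a, b):
    return a * np.log(t + b)

# p0 is only a rough starting guess; curve_fit minimizes the sum of squared residuals
popt, pcov = curve_fit(model, t, time, p0=(1.0, 1.0))
a, b = popt
print('a =', a, 'b =', b)
print('predicted value for day 30:', model(30, a, b))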
I have around 23,300 hourly datapoints in my dataset and I am trying to forecast using Facebook Prophet.
To fine-tune the hyperparameters one can use cross validation:
from fbprophet.diagnostics import cross_validation
The whole procedure is shown here:
https://facebook.github.io/prophet/docs/diagnostics.html
Using cross_validation one needs to specify initial, period and horizon:
df_cv = cross_validation(m, initial='xxx', period='xxx', horizon = 'xxx')
I am now wondering how to choose these three values in my case. As stated, I have about 23,300 hourly datapoints. Should I take a fixed fraction of the data as the horizon, or is the exact fraction not that important, so that I can take whatever value seems appropriate?
Furthermore, cutoffs can also be defined, as below:
cutoffs = pd.to_datetime(['2013-02-15', '2013-08-15', '2014-02-15'])
df_cv2 = cross_validation(m, cutoffs=cutoffs, horizon='365 days')
Should these cutoffs be equally spaced, as above, or can they be set at arbitrary dates as one likes?
initial is the first training period. It is the minimum
amount of data needed to begin training on.
horizon is the length of time you want to evaluate your forecast
over. Let's say that a retail outlet is building their model so
that they can predict sales over the next month. A horizon set to 30
days would make sense here, so that they are evaluating their model
on the same parameter setting that they wish to use it on.
period is the amount of time between each fold. It can be greater
than, less than, or equal to the horizon.
cutoffs are the dates where each horizon will begin.
You can understand these terms by looking at this image (image credit: Forecasting Time Series Data with Facebook Prophet, by Greg Rafferty):
Let's imagine that a retail outlet wants a model that is able to predict the next month
of daily sales, and they plan on running the model at the beginning of each quarter. They
have 3 years of data.
They would set their initial training data to be 2 years, then. They want to predict the
next month of sales, and so would set horizon to 30 days. They plan to run the model
each business quarter, and so would set the period to be 90 days.
This is also shown in the image above.
Let's apply these parameters to our model:
df_cv = cross_validation(model,
                         horizon='30 days',
                         period='90 days',
                         initial='730 days')
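As a follow-up, and assuming the same fbprophet version as the import above, the diagnostics module can then summarize the cross-validation results per horizon:

from fbprophet.diagnostics import performance_metrics

df_p = performance_metrics(df_cv)  # MSE, RMSE, MAE, MAPE, coverage by horizon
print(df_p.head())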
Thank you for taking a look at this. I have failure data for tires over a 5-year period. For each tire, I have the start date (day 0), the end date (day n), and the number of miles driven for each day. I used the total miles each car drove to create 2 distributions, one Weibull, one ECDF. My hope is to be able to use those distributions to predict the probability that a tire will fail 50 miles in the future during the life of the tire. So as an example, if it's 2 weeks into the life of a tire, the total mileage is currently 100 miles and the average is 50 miles per week, I want to predict the probability that it will fail at 150 miles, i.e. within a week.
My thinking is that if I can get the probabilities for all tires active on a given day, I can sum the probabilities of each tire's failure to get a prediction of how many tires will need to be replaced over a given time period following that day.
My current methodology is to fit a distribution using 3 years of failure data with scipy.stats.weibull_min and statsmodels' ECDF. Then, if a tire is currently at 100 miles and we expect the next week to add 50 miles to that, I take the CDF at 150.
However, after I run this across all tires that are on the road on the date I am predicting from and sum their respective probabilities, I get a prediction that is ~50% higher than the actual number of tire replacements. My first thought is that it is an issue with my methodology. Does it sound valid, or am I doing something dumb?
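For what it's worth, a minimal sketch of the difference between the unconditional CDF used above and a probability conditioned on the miles already survived, using scipy.stats.weibull_min with made-up parameters and mileages:

from scipy.stats import weibull_min

# hypothetical fitted parameters; in practice they come from something like
# weibull_min.fit(total_miles_at_failure, floc=0)
shape, scale = 1.8, 5000.0

current_miles = 100.0     # miles driven so far on this tire
next_week_miles = 50.0    # expected additional miles in the next week

F = lambda m: weibull_min.cdf(m, shape, loc=0, scale=scale)

# unconditional probability of failing by 150 miles (what summing raw CDFs counts)
p_unconditional = F(current_miles + next_week_miles)

# probability of failing in the next 50 miles given it has already survived 100 miles
p_conditional = (F(current_miles + next_week_miles) - F(current_miles)) / (1.0 - F(current_miles))

print(p_unconditional, p_conditional)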
This might be too late of a reply but perhaps it will help someone in the future reading this.
If you are looking to make predictions, you need to fit a parametric model (like the Weibull Distribution). The ecdf (Empirical CDF / Nonparametric model) will give you an indication of how well the parametric model fits but it will not allow you to make any future predictions.
To fit the parametric model, I recommend you use the Python reliability library.
This library makes it fairly straightforward to fit a parametric model (especially if you have right censored data) and then use the fitted model to make the kind of predictions you are trying to make. Scipy won't handle censored data.
If you have failure data for a population of tires then you will be able to fit a model. The question you asked (about the probability of failure in the next week given that it has survived 2 weeks) is called conditional survival. Essentially you want CS(1|2) which means the probability it will survive 1 more week given that it has survived to week 2. You can find this as the ratio of the survival functions (SF) at week 3 and week 2: CS(1|2) = SF(2+1)/SF(2).
Let's take a look at some code using the Python reliability library. I'll assume we have 10 failure times that we will use to fit our distribution and from that I'll find CS(1|2):
from reliability.Fitters import Fit_Weibull_2P
data = [113, 126, 91, 110, 146, 147, 72, 83, 57, 104] # failure times (in weeks) of some tires from our vehicle fleet
fit = Fit_Weibull_2P(failures=data, show_probability_plot=False)
CS_1_2 = fit.distribution.SF([3])[0] / fit.distribution.SF([2])[0] # conditional survival
CF_1_2 = 1 - CS_1_2 # conditional failure
print('Probability of failure in the next week given it has survived 2 weeks:', CF_1_2)
'''
Results from Fit_Weibull_2P (95% CI):
Point Estimate Standard Error Lower CI Upper CI
Parameter
Alpha 115.650803 9.168086 99.008075 135.091084
Beta 4.208001 1.059183 2.569346 6.891743
Log-Likelihood: -47.5428956288772
Probability of failure in the next week given it has survived 2 weeks: 1.7337430857633507e-07
'''
Let's now assume you have 250 vehicles in your fleet, each with 4 tires (1000 tires in total). The probability of 1 tire failing is CF_1_2 = 1.7337430857633507e-07
We can find the probability of X tires failing (throughout the fleet of 1000 tires) like this:
X = [0, 1, 2, 3, 4, 5]
from scipy.stats import poisson
print('n failed probability')
for x in X:
    PF = poisson.pmf(k=x, mu=CF_1_2 * 1000)
    print(x, ' ', PF)
'''
n failed probability
0 0.9998266407198806
1 0.00017334425253100934
2 1.502671996412269e-08
3 8.684157279833254e-13
4 3.764024409898102e-17
5 1.305170259061071e-21
'''
These numbers make sense because I generated the data from a Weibull distribution with a characteristic life (alpha) of 100 weeks, so we'd expect the probability of failure during week 3 to be very low.
If you have further questions, feel free to email me directly.
I am working on a time-series classification problem using a CNN. The dataset is financial stock market data (like Yahoo Finance). I am using some technical indicators calculated from the raw values high, low, volume, open, close.
One of the technical indicators is MACD (Moving Average Convergence Divergence), computed using the TA library. However, in most places it is written that MACD is calculated with n_fast = 12 and n_slow = 26 periods, with the RSI (Relative Strength Index) being calculated over 14 days and n_sign = 9 (a parameter of macd_diff() in the ta library).
So, if I am calculating RSI over a 5-day period, how do I set the n_fast and n_slow values accordingly? Should these be n_fast = 3 and n_slow = 8? Also, what should the value of n_sign be then? I am new to the finance domain.
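For illustration, a minimal sketch of what macd_diff computes under the hood (exponential moving averages via pandas), so any scaled window lengths can be plugged in directly; the 3/8/4 values in the usage comment are only a hypothetical scaling, not a recommendation:

import pandas as pd

def macd_diff(close, n_fast=12, n_slow=26, n_sign=9):
    # MACD line: fast EMA minus slow EMA of the closing price
    ema_fast = close.ewm(span=n_fast, adjust=False).mean()
    ema_slow = close.ewm(span=n_slow, adjust=False).mean()
    macd = ema_fast - ema_slow
    # signal line: EMA of the MACD line; the "diff" (histogram) is their difference
    signal = macd.ewm(span=n_sign, adjust=False).mean()
    return macd - signal

# example with hypothetical shortened windows, e.g. alongside a 5-day RSI:
# diff = macd_diff(df['close'], n_fast=3, n_slow=8, n_sign=4)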
This is likely as much a math problem as a programming problem, but I seem to be encountering severe oscillations in temperature in my class method update() when warp is set to a high value (1000+) in the code below. All temperatures are in Kelvin for simplicity.
(I am not a programmer by profession. This formatting is likely unpleasant.)
import math

# Stefan-Boltzmann constant, otherwise known as sigma (W/(m^2*K^4))
BOLTZMANN_CONSTANT = 5.67e-8

class GeneratorObject(object):
    """Create a new object to run a thermal simulation on."""

    def __init__(self, mass, emissivity, surfaceArea, material, temp=0, power=5000, warp=1):
        self.tK = temp                  # temperature of the object, K
        self.mass = mass                # mass of the object, kg
        self.emissivity = emissivity    # emissivity of the object, always between 0 and 1
        self.surfaceArea = surfaceArea  # emissive surface area of the object, m^2
        self.material = material        # store the material name for reference
        self.specificHeat = (0.45 * 1000) * self.mass  # heat capacity in J/K (iron: 450 J/(kg*K) times mass)
        self.power = power              # heating power input, J/s (W)
        self.warp = warp                # warp multiplier, as in KSP's time warp

    def update(self):
        """Update the object's temperature according to its properties."""
        # Radiative loss (Stefan-Boltzmann) minus internal heating, divided by the
        # heat capacity, then scaled by the warp multiplier.
        self.tK -= (((self.emissivity * BOLTZMANN_CONSTANT * self.surfaceArea
                      * (math.pow(self.tK, 4) - math.pow(30 + 273.15, 4)))
                     / self.specificHeat)
                    - (self.power / self.specificHeat)) * self.warp
The law used is the Stefan-Boltzmann law for calculating black-body heat losses:
Temp -= (Emissivity * Sigma * SurfaceArea * (Temp^4 - Amb^4)) / SpecificHeat
This was ported from a KSP plugin for quicker debugging. Object.update() is called 50 times per second.
Would there be a solution to preventing these extreme oscillations that doesn't involve executing the code multiple times per step?
Your integration scheme is bad, as already hinted by @Beta and @tom10. The integration timestep is self.warp units of time, i.e. self.warp seconds, since you work with physical units. This is not the way things are done. You should first convert the equation to a dimensionless form by expressing each term in some sort of computational units. For example, the Stefan-Boltzmann constant and self.power could be measured in units in which the constant is 1. Then you should determine the characteristic time for the object, e.g. the time by which the temperature gets reasonably close to the equilibrium value. If there are many such objects, you should find the smallest of all characteristic times and use it as the unit of measurement for time. The integration timestep should then be about an order of magnitude less than the characteristic time, otherwise you completely miss the correct solution of the differential equation and end up with wild oscillations.
Example of what happens now: let's take a 1 kg iron sphere. With a surface area of 3.05e-3 m^2, the radiative heating/cooling power is up to 1.73e-10 W/K^4. With self.power equal to 5 kW, the radiative power equals the internal one when the temperature reaches 2319 K, and that's the equilibrium temperature. At low temperatures the radiative heating/cooling is negligible, and with the internal heating alone you end up with a temperature rate of 11.1 K/s. If warp is 1000+, your first integration step results in a temperature of 11100 K or more, which overshoots the equilibrium by about 5 times. Now the radiative energy is orders of magnitude higher than the internal heating and leads to a huge cool-down rate; multiply that by 1000+ and you end up with a negative temperature. Then the cycle repeats with higher and higher absolute temperatures until you run outside the range of floating-point arithmetic.
Here is a hint for you: if self.power is kept constant, then the equation has an analytical solution. Find it (or use a tool like Maple or Mathematica to find it for you) and then plot the solution. See how your timestep of 1000+ units compares to the timescale of the solution, i.e. the time it takes for the system to reach an almost equilibrium state.
I guess KSP = Kerbal Space Program, so I gather this is a problem in game physics. If so, maybe an approximation with the same qualitative behavior is sufficient. Maybe an exponential curve which starts at the initial temperature and falls to the ambient temperature is enough. Pick the decay constant by matching the heat transfer at the initial time.
Sometimes an approximation is good enough. I don't know if this is one of those situations.
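A rough sketch of that idea, as a variant that relaxes toward the equilibrium temperature set by the constant power input (rather than the ambient temperature), with the decay constant matched to the current heating/cooling rate on each call; the update_stable name and the dt of 0.02 s (50 calls per second) are assumptions:

import math

BOLTZMANN_CONSTANT = 5.67e-8
T_AMB = 30 + 273.15  # ambient temperature, K

def update_stable(obj, dt=0.02):
    """Unconditionally stable update: relax exponentially toward the equilibrium temperature."""
    k_rad = obj.emissivity * BOLTZMANN_CONSTANT * obj.surfaceArea
    # equilibrium temperature where radiative loss balances the constant power input
    t_eq = (obj.power / k_rad + T_AMB ** 4) ** 0.25
    # current net heating rate in K/s, same expression as in update()
    rate = (obj.power - k_rad * (obj.tK ** 4 - T_AMB ** 4)) / obj.specificHeat
    if t_eq == obj.tK:
        return  # already at equilibrium
    decay = rate / (t_eq - obj.tK)  # positive whenever tK != t_eq
    obj.tK = t_eq + (obj.tK - t_eq) * math.exp(-decay * dt * obj.warp)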