I am trying to find the best set of parameters for a model of coupled partial differential equations, i.e. the objective function is not analytical. The underlying rate equations are analytical and must be integrated so the results depend on the history sent to the set of equations. There are up to 16 parameters. They are bounded but there are interdependencies that are unknown (otherwise, I would make some constraints). I have done my best to come up with constant bounds but there are instances where the optimizer chooses parameters that result in division by zero or infinity values.
I have already tried "try:/except:" to no avail. Does anyone know of a way I can get scipy.optimize.minimize to reject/ignore a run if these numerical issues show up?
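One common workaround (my suggestion, not part of the original question) is to wrap the objective so that any raised exception or non-finite result is converted into a large finite penalty, which steers the optimizer away from that region instead of crashing it. A minimal sketch, where simulate() is a hypothetical stand-in for integrating the coupled rate equations, and x0/bounds are your starting point and box bounds:

import numpy as np
from scipy.optimize import minimize

def safe_objective(params):
    try:
        value = simulate(params)  # hypothetical: integrates the rate equations
    except (ZeroDivisionError, FloatingPointError, OverflowError, ValueError):
        return 1e12               # large finite penalty instead of a crash
    if not np.isfinite(value):
        return 1e12               # reject runs that produced inf or NaN
    return value

result = minimize(safe_objective, x0, method="Nelder-Mead", bounds=bounds)

A derivative-free method such as Nelder-Mead or Powell tends to cope better with these flat penalty plateaus than gradient-based methods.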
I am pretty new to OR-TOOLS in Python. I have made several tutorial examples, but I am facing issues trying to model my problem.
Let's say we have a bin packing problem, in which I need to find the fewest bins that will hold all the items given their weights. In this typical problem we would want to minimize the number of bins used. But let's say we have an additional objective: to maximize the "quality" of the bin. Here's the problem: to evaluate the quality of that bin, we need to call a non-linear function that takes the items in that bin and returns a quality. I guess I cannot use a multi-objective approach with CP-SAT, so we could model it by weighting both objectives.
The problem I am facing is thus the following:
I cannot set the 'quality' as a variable because it depends on the current solution (the items associated with a bin).
How can I do that? By assigning a callback? Is it possible?
Depending on the "current" solution is not a problem. You could add a "quality" variable, which depends on the values of the variables representing the bins and their contents, and uses the solver's primitives to calculate the desired quantity.
This might not be possible for just any function, but the solver's primitives do allow some forms of non-linear calculations (just as an example, you can calculate abs(x), or x^2, (ref)).
So, for instance, you could have a quality variable which calculates the (number of bins used)^2.
Once you get to a form of quality calculation that works within the solver, you can go back to using one of the approaches for handling more than a single objective, like a weighted sum.
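As a rough illustration of this (my sketch, not from the answer; capacity constraints and real weights are omitted, and num_items, max_bins, and the objective weights are placeholders), the quality variable can be tied to the bin-usage variables with AddMultiplicationEquality and folded into a weighted-sum objective:

from ortools.sat.python import cp_model

num_items, max_bins = 10, 5
model = cp_model.CpModel()

# x[i][b] == 1 if item i goes into bin b; used[b] == 1 if bin b holds anything
x = [[model.NewBoolVar(f"x_{i}_{b}") for b in range(max_bins)] for i in range(num_items)]
used = [model.NewBoolVar(f"used_{b}") for b in range(max_bins)]
for i in range(num_items):
    model.AddExactlyOne(x[i])
    for b in range(max_bins):
        model.AddImplication(x[i][b], used[b])

num_bins_used = model.NewIntVar(0, max_bins, "num_bins_used")
model.Add(num_bins_used == sum(used))

# "Quality" built from solver primitives, e.g. (number of bins used)^2
quality = model.NewIntVar(0, max_bins * max_bins, "quality")
model.AddMultiplicationEquality(quality, [num_bins_used, num_bins_used])

# Weighted sum of the two objectives (weights chosen arbitrarily here)
model.Minimize(10 * num_bins_used - quality)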
I'm currently scratching my head about how I might implement a classic ARIMA(X) model using base TensorFlow (and optionally Keras). The equation I am attempting to set up has the following form:
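Presumably something like the standard ARIMAX form, where y'_t is the d-times-differenced series and x_{k,t} are the exogenous regressors:

y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \sum_{j=1}^{q} \theta_j \, \epsilon_{t-j} + \sum_{k} \beta_k \, x_{k,t} + \epsilon_t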
Where d represents the level of differencing applied to the observed input time series, p is the auto-regressive order, and q is the moving average order. The part which is stumping me currently is the calculation/estimation of the residuals epsilon. The auto-regression portion is a simple linear regression on the lagged samples, and the same is true for the terms involving the exogenous series (X). When I am estimating the residuals, should I simply feed the q-many previous steps into the current estimated parameters, and compute the residuals as y_true - y_predict? This also raises the question: how does one estimate the residuals for observations where there are no previous observations? Do we simply draw residuals 0 through q from a chosen random distribution of set variance (e.g. Normal, Poisson, etc.) with a mean of 0?
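To make the residual recursion concrete, here is a plain-NumPy sketch (my example, not TensorFlow, and with the exogenous terms left out): residuals with insufficient history are initialized to zero, and each subsequent residual is y_true minus the one-step prediction from the current parameter estimates.

import numpy as np

def arma_residuals(y, phi, theta, c=0.0):
    # One-step-ahead residuals for an ARMA(p, q) on an already-differenced series y.
    # phi: AR coefficients (length p), theta: MA coefficients (length q).
    p, q = len(phi), len(theta)
    eps = np.zeros(len(y))  # residuals with no available history start at 0
    for t in range(len(y)):
        ar = sum(phi[i] * y[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        eps[t] = y[t] - (c + ar + ma)  # epsilon_t = y_true - y_predict
    return eps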
I have looked at the source for the statsmodels package to try to understand it, but it is quite opaque. Part of the reason for implementing the model this way is that it needs to fit into a fairly standard ecosystem at the company I work for, and we need control over what slices of data the model is fitted to at a given time step. This is because some data may arrive (much) later than the time stamp it relates to, due to lag at the source etc.
Thank you for any help you might be able to offer.
I'm trying to make a model for a very simple data set using spline regression, but so far I couldn't find any Python implementation that lets me choose the knot positions. The picture below shows where I want to put my knot; I want my function to consist of only 2 linear segments and nothing more.
So far I've tried pyearth and scipy splines, but I couldn't find in either of them a parameter responsible for setting the knot positions, and even when I tweak other parameters I can't get a result that satisfies me.
patsy.dmatrix and scipy.interpolate.splrep both have knot selection features
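For example (my sketch, not from the answer above), a degree-1 spline with a single user-chosen interior knot gives exactly two connected linear pieces; the knot value 3.0 and the toy data are placeholders:

import numpy as np
from scipy.interpolate import LSQUnivariateSpline

x = np.linspace(0, 10, 50)
y = np.where(x < 3, x, 3 + 0.2 * (x - 3)) + np.random.normal(scale=0.05, size=x.size)

knots = [3.0]                                     # interior knot position you choose
spline = LSQUnivariateSpline(x, y, t=knots, k=1)  # k=1 -> piecewise linear
y_fit = spline(x)

The same basis can be built with patsy, e.g. patsy.dmatrix("bs(x, knots=(3.0,), degree=1)", {"x": x}), and then fitted with ordinary least squares.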
I have a list of many float numbers, representing the duration of an operation performed several times.
For each type of operation, I have a different trend in numbers.
I'm aware of many random generators presented in some python modules, like in numpy.random
For example, I have binomial, exponential, normal, Weibull, and so on...
I'd like to know if there's a way to find the random generator, given a list of values, that best fits each list of numbers that I have.
I.e., the generator (with its params) that best fits the trend of the numbers in the list.
That's because I'd like to automate the generation of time lengths for each operation, so that I can simulate it over n years, without having to find by hand which method best fits each list of numbers.
EDIT: In other words, trying to clarify the problem:
I have a list of numbers. I'm trying to find the probability distribution that best fits the array of numbers I already have. The only problem I see is that each probability distribution has input params that may affect the result. So I'll have to figure out how to choose these params automatically, trying to best fit the list.
Any idea?
You might find it better to think about this in terms of probability distributions, rather than thinking about random number generators. You can then think in terms of testing goodness of fit for your different distributions.
As a starting point, you might try constructing probability plots for your samples. Probably the easiest in terms of the math behind it would be to consider a Q-Q plot. Using the random number generators, create a sample of the same size as your data. Sort both of these, and plot them against one another. If the distributions are the same, then you should get a straight line.
Edit: To find appropriate parameters for a statistical model, maximum likelihood estimation is a standard approach. Depending on how many samples of numbers you have and the precision you require, you may well find that just playing with the parameters by hand will give you a "good enough" solution.
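A sketch of this with scipy (my example, on made-up data): each candidate distribution's .fit() method returns maximum-likelihood parameter estimates, and a simple goodness-of-fit score such as the Kolmogorov-Smirnov statistic can then be compared across candidates:

import numpy as np
from scipy import stats

data = np.random.exponential(scale=2.0, size=500)   # stand-in for the measured durations

candidates = {"expon": stats.expon, "gamma": stats.gamma,
              "weibull_min": stats.weibull_min, "norm": stats.norm}

for name, dist in candidates.items():
    params = dist.fit(data)                 # maximum-likelihood estimates
    ks = stats.kstest(data, name, args=params)
    print(name, params, ks.statistic)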
Why using random numbers for this is a bad idea has already been explained. It seems to me that what you really need is to fit the distributions you mentioned to your points (for example, with a least squares fit), then check which one fits the points best (for example, with a chi-squared test).
EDIT: Adding a reference to a numpy least squares fitting example.
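As a rough illustration of that idea (my sketch, not the referenced example): fit a candidate density to a normalized histogram with scipy.optimize.curve_fit, then compare observed and expected bin counts with a chi-squared statistic:

import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import chisquare

data = np.random.exponential(scale=2.0, size=1000)     # stand-in sample
density, edges = np.histogram(data, bins=30, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

def expon_pdf(x, lam):
    return lam * np.exp(-lam * x)

(lam_hat,), _ = curve_fit(expon_pdf, centers, density, p0=[1.0])

obs, _ = np.histogram(data, bins=edges)                # raw counts per bin
exp = expon_pdf(centers, lam_hat)
exp = exp / exp.sum() * obs.sum()                      # rescale so totals match
stat, pvalue = chisquare(obs, exp)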
Given a parameterized univariate distribution (e.g. exponential depends on lambda, or gamma depends on theta and k), the way to find the parameter values that best fit a given sample of numbers is called the Maximum Likelihood procedure. It is not a least squares procedure, which would require binning and thus losing information! Some Wikipedia distribution articles give expressions for the maximum likelihood estimates of parameters, but many do not, and even the ones that do are missing expressions for error bars and covariances. If you know calculus, you can derive these results yourself by expressing the log likelihood of your data set in terms of the parameters, setting its first derivative to zero to maximize it, and using the inverse of the negative curvature matrix (the observed information) at the maximum as the covariance matrix of your parameters.
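In symbols (a generic restatement of the procedure above), for a sample x_1, ..., x_n with density f(x; \theta):

\ell(\theta) = \sum_{i=1}^{n} \log f(x_i; \theta), \qquad \frac{\partial \ell}{\partial \theta}\Big|_{\hat{\theta}} = 0, \qquad \mathrm{Cov}(\hat{\theta}) \approx \Big[ -\frac{\partial^2 \ell}{\partial \theta \, \partial \theta^T}\Big|_{\hat{\theta}} \Big]^{-1}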
Given two different fits to two different parameterized distributions, the way to compare them is called the likelihood ratio test. Basically, you just pick the one with the larger log likelihood.
Gabriel, if you have access to Mathematica, parameter estimation is built in:
In[43]:= data = RandomReal[ExponentialDistribution[1], 10]
Out[43]= {1.55598, 0.375999, 0.0878202, 1.58705, 0.874423, 2.17905, \
0.247473, 0.599993, 0.404341, 0.31505}
In[44]:= EstimatedDistribution[data, ExponentialDistribution[la],
ParameterEstimator -> "MaximumLikelihood"]
Out[44]= ExponentialDistribution[1.21548]
In[45]:= EstimatedDistribution[data, ExponentialDistribution[la],
ParameterEstimator -> "MethodOfMoments"]
Out[45]= ExponentialDistribution[1.21548]
However, it is also easy to work out by hand what the maximum likelihood method says the parameter should be.
In[48]:= Simplify[
D[LogLikelihood[ExponentialDistribution[la], {x}], la], x > 0]
Out[48]= 1/la - x
Hence the maximum likelihood estimate of the exponential distribution's parameter is obtained by setting the sum of (1/la - x_i) over the data to zero, from which la = 1/Mean[data]. Similar equations can be worked out for other distribution families and coded in the language of your choice.
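For example, the same estimate in Python (my sketch; data stands in for the sample above):

import numpy as np

data = np.random.exponential(scale=1.0, size=10)  # numpy's scale is 1/la, so this matches la = 1
la_hat = 1.0 / np.mean(data)                      # maximum-likelihood estimate of la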