Scipy optimize: Set maximum error - python

I'm trying to optimize a 4 dimensional function with scipy. Everything works so far, except that I'm not satisfied with the quality of the solution. Right now I have ground truth data, which I use to verify my code. What I get so far is:
End error: 1.52606896507e-05
End Gradient: [ -1.17291295e-05 2.60362493e-05 5.15347856e-06 -2.72388430e-05]
Ground Truth: [0.07999999..., 0.0178329..., 0.9372903878..., 1.7756283966...]
Reconstructed: [ 0.08375729 0.01226504 1.13730592 0.21389899]
The error itself sounds good, but as the values are totally wrong I want to force the optimization algorithm (BFGS) to do more steps.
In the documentation I found the options 'gtol' and 'norm' and I tried to set both to pretty small values (like 0.0000001) but it did not seem to change anything.
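For reference, this is roughly how those options are passed to scipy.optimize.minimize; f, grad_f and x0 below are just placeholders standing in for my actual objective, gradient and brute-force starting point:

import numpy as np
from scipy.optimize import minimize

# Placeholder objective and gradient; the real 4-D function is not shown here.
def f(x):
    return np.sum((x - np.array([0.08, 0.018, 0.937, 1.776]))**2)

def grad_f(x):
    return 2.0 * (x - np.array([0.08, 0.018, 0.937, 1.776]))

x0 = np.zeros(4)  # stand-in for the brute-force starting point
result = minimize(f, x0, jac=grad_f, method='BFGS',
                  options={'gtol': 1e-7, 'norm': np.inf})
print(result.x, result.nit)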
Background:
The problem is that I am trying to demodulate waves, so I have sin and cos terms and potentially many local (or global) minima. I use a brute-force search to find a good starting point, which helps a lot, but it currently seems that most of the work is done by that brute-force search, as the optimization often uses only one iteration step. So I'm trying to improve that part of the calculation somehow.

Many local minima + hardly any improvement after brute search, that sounds bad. It's hard to say something very specific with the level of detail you provide in the question, so here are vague ideas to try (basically, what I'd do if I suspect my minimizer gets stuck):
try manually starting the minimizer from a bunch of different initial guesses.
try using a stochastic minimizer. You tagged the question scipy, so try basinhopping (see the sketch below).
if worst comes to worst, just throw random points in a loop, leave it to work over the lunch break (or overnight)
Also, waves, sines and cosines --- it might be useful to think about whether you can reformulate your problem in Fourier space.
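A minimal basinhopping sketch; the toy objective and gradient below just stand in for whatever you are actually minimizing:

import numpy as np
from scipy.optimize import basinhopping

def f(x):
    # toy multi-minimum objective, only for illustration
    return np.sum(np.sin(x)**2) + 0.1 * np.sum(x**2)

def grad_f(x):
    return 2.0 * np.sin(x) * np.cos(x) + 0.2 * x

x0 = np.ones(4)  # e.g. the point found by your brute-force search
minimizer_kwargs = {'method': 'BFGS', 'jac': grad_f}
result = basinhopping(f, x0, minimizer_kwargs=minimizer_kwargs, niter=200)
print(result.x, result.fun)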

I found out that the gradient at the starting point is already very flat (values around 10^-5), so I tried to scale the gradient function which I had already provided. This seemed to be pretty effective; I could force the algorithm to do many more steps and my results are far better now.
They are not perfect though, but a complete discussion of this is outside the scope of this question, so I might start a new one where I describe the whole problem from the bottom up.
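Roughly, the rescaling looks like this (sketch only; f, grad_f and x0 again stand in for the real objective, gradient and starting point, as in the sketch in the question, and the factor is just whatever compensates for the ~1e-5 gradient magnitude):

from scipy.optimize import minimize

scale = 1e4  # hypothetical factor chosen from the observed gradient size

def f_scaled(x):
    return scale * f(x)

def grad_scaled(x):
    return scale * grad_f(x)

result = minimize(f_scaled, x0, jac=grad_scaled, method='BFGS',
                  options={'gtol': 1e-7})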


Python GEKKO Unexpected Behavior with Constraints

I've been playing around with GEKKO for solving flow optimizations and I have come across behavior that is confusing me.
Context:
Sources --> [mixing and delivery] --> Sinks
I have multiple sources (where my flow is coming from) and multiple sinks (where my flow goes to). For a given source (e.g., SOURCE_1), the total flow to the resulting sinks must equal the volume from SOURCE_1. This is my idea of conservation of mass, where the 'mixing' plant blends all the source volumes together.
Constraint Example (DOES NOT WORK AS INTENDED):
When I try to create a constraint for the two SINK volumes, and the one SOURCE volume:
m.Equation(volume_sink_1[i] + volume_sink_2[i] == max_volumes_for_source_1)
I end up with weird results. By that I mean it's not actually optimal; it ends up assigning values very poorly. I am off from the optimum by at least 10% (I tried with different max volumes).
Constraint Example (WORKS BUT I DON'T GET WHY):
When I try to create a constraint for the two SINK volumes, and the one SOURCE volume like this:
m.Equation(volume_sink_1[i] + volume_sink_2[i] <= max_volumes_for_source_1 * 0.999999)
With this, I get MUCH closer to the actual optimum, to the point where I can just treat it as the optimum. Please note that I had to change it to a less-than-or-equal-to and also multiply by 0.999999, which I arrived at by messing around with it nonstop.
Also, please note that this uses practically all of the source (up to 99.9999% of it) as I would expect. So both formulations make sense to me but the first approach doesn't work.
The only thing I can think of for this behavior is that it's stricter to solve for == than <=. That doesn't explain to me why I have to multiply by 0.999999 though.
Why is this the case? Also, is there a way for me to debug occurrences like this easier?
This same improvement occurs with complementary constraints for conditional statements when using s1*s2<=0 (easier to solve) versus s1*s2==0 (harder to solve).
From the research papers I've seen, the justification is that the solver has more room to search to find the optimal solution even if it always ends up at s1*s2==0. It sounds like your problem may have multiple local minima as well if it converges to a solution, but it isn't the global optimum.
If you can post a complete and minimal problem that demonstrates the issue, we can give more specific suggestions.
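In the meantime, here is a minimal sketch of the two formulations; all names and numbers are made up rather than taken from your model, and the objective is just a toy:

from gekko import GEKKO

m = GEKKO(remote=False)
max_volume_source_1 = 100.0
volume_sink_1 = m.Var(value=0, lb=0, ub=100)
volume_sink_2 = m.Var(value=0, lb=0, ub=100)

# Equality formulation (reported to behave poorly):
m.Equation(volume_sink_1 + volume_sink_2 == max_volume_source_1)
# Inequality formulation (reported to work; uncomment to compare):
# m.Equation(volume_sink_1 + volume_sink_2 <= max_volume_source_1 * 0.999999)

m.Maximize(3 * volume_sink_1 + 2 * volume_sink_2)  # toy objective
m.solve(disp=False)
print(volume_sink_1.value[0], volume_sink_2.value[0])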

How to programmatically report reasonable local maxima and minima in a data set?

I have an array of y-values which are evenly spaced along the x-axis and I need to programmatically find the "troughs". I think either Octave or Python3 are good language choices for this problem as I know both have strong math capabilities.
I thought to interpolate the function and look for where the derivatives are 0, but that would require my human eyes to first analyze the resulting graph to know where the maxima and minima already were. I need this entire thing to be automatic, so it can work with an arbitrary dataset.
It dawned on me that this problem likely had an existing solution in a Python3 or Octave function or library, but I could not find one. Does there exist a library to automatically report local maxima and minima within a dataset?
More Info
My current planned approach is to implement a sort of "n-day moving average" with a threshold. After initializing the first day moving average, I'll watch for the next moving average to move above or below it by a threshold. If it moves higher then I'll consider myself in a "rising" period. If it moves lower then I'm in a "falling" period. While I'm in a rising period, I'll update the maximum observed moving average until the current moving average is sufficiently below the previous maximum.
At this point, I'll consider myself in a "falling" period. I'll lock in the point where the moving average was previously highest, and then repeat except using inverse logic for the "falling" period.
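Roughly, in code, the idea looks like this (a sketch only; the window size and threshold are made-up tuning parameters):

import numpy as np

def find_troughs_moving_average(y, window=5, threshold=0.1):
    # Return indices of approximate troughs (indices refer to the moving-average array).
    avg = np.convolve(y, np.ones(window) / window, mode='valid')
    troughs = []
    state = 'rising'
    extreme_idx = 0
    for i in range(1, len(avg)):
        if state == 'rising':
            if avg[i] > avg[extreme_idx]:
                extreme_idx = i                      # new running maximum
            elif avg[extreme_idx] - avg[i] > threshold:
                state = 'falling'                    # dropped far enough: start tracking a minimum
                extreme_idx = i
        else:
            if avg[i] < avg[extreme_idx]:
                extreme_idx = i                      # new running minimum
            elif avg[i] - avg[extreme_idx] > threshold:
                troughs.append(extreme_idx)          # rose far enough: lock in the trough
                state = 'rising'
                extreme_idx = i
    return troughs

(Maxima could be collected symmetrically when the state flips from rising to falling.)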
It seemed to me that this is probably a pretty common problem though, so I'm sure there's an existing solution.
Python answer:
This is a common problem, with existing solutions.
Examples include:
peakutils
scipy find_peaks
see also this question
in all cases, you'll have to tune the parameters to get what you want (see the sketch below).
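A small sketch with scipy: find_peaks locates local maxima, so negate the signal to get the troughs; the prominence value is just an example of a parameter you would tune:

import numpy as np
from scipy.signal import find_peaks

y = np.sin(np.linspace(0, 6 * np.pi, 300)) + 0.1 * np.random.randn(300)
peaks, _ = find_peaks(y, prominence=0.5)     # indices of local maxima
troughs, _ = find_peaks(-y, prominence=0.5)  # indices of local minima
print(peaks, troughs)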
Octave answer:
I believe immaximas and imregionalmax do exactly what you are looking for (depending on which of the two it is exactly that you are looking for - have a look at their documentation to see the difference).
These are part of the image package, but will obviously work on 1D signals too.
For more 'functional' zero-finding functions, there is also fzero etc.

How do you know when memoization should be used?

I'm learning about memoization and although I have no trouble implementing it when it's needed in Python, I still don't know how to identify situations where it will be needed.
I know it's implemented when there are overlapping subcalls, but how do you identify if this is the case? Some recursive functions seem to go deep before an overlapping call is made. How would you identify this in a coding interview? Would you draw out all the calls (even if some of them go 5-6 levels deep in an O(2^n) complexity brute force solution)?
Cases like the Fibonacci sequence make sense because the overlap happens immediately (the 'fib(i-1)' call will almost immediately overlap with the 'fib(i-2)' call). But for other cases, like the knapsack problem, I still can't wrap my head around how anyone can identify that memoization should be used while at an interview. Is there a quick way to check for an overlap?
I hope my question makes sense. If someone could point me to a good resource, or give me clues to look for, I would really appreciate it. Thanks in advance.
In order to reach the "memoization" solution, you first need to identify the following two properties in the problem:
Overlapping subproblems
Optimal substructure
Looks like you do understand (1). For (2):
Optimal Substructure: if the optimal solution to a problem S of size n can be calculated by looking at JUST the optimal solution of a subproblem s of size < n (and NOT ALL solutions to that subproblem), and this still yields an optimal solution for S, then problem S is considered to have optimal substructure.
For more detail, please take a look at the following answer:
https://stackoverflow.com/a/59461413/12346373
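For illustration, a minimal sketch of memoizing the Fibonacci example from the question; the overlapping calls make the plain recursion exponential, while caching the results makes it linear:

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(80))  # fast; without the cache this call would be infeasible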

How do I determine the accuracy of this two-dimensional potential field?

To begin with, this might be more of a conceptual question than a programming one. I will not post my code below, but if you really want to see it you can take a look at this similar code link. The code is not the important thing here, so don't get confused trying to understand all the physics (electrostatics).
Let's say I want to solve this problem. I know the values on the boundaries and I guess an initial solution on the rectangular grid inside these boundaries, see the figure below.
Potential=10 on boundaries; potential=0 inside
I know that eventually the field will converge to the same potential as the boundaries. If I let the program iterate long enough the whole area on the inside will also become yellow. The figure below shows an intermediate step towards equilibrium:
Now, in this project I am working on, I am supposed to stop the simulation when the accuracy of the simulation is 1%. Is there a general definition of accuracy in these cases when working with a two-dimensional grid? There are several grid nodes, all with different values; are they all supposed to be within 1% of equilibrium (all yellow)?
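For concreteness, one possible reading of that criterion, sketched with a Jacobi-style update (the grid size and the interpretation of the 1% are my own assumptions, not from the assignment), would be to stop once every interior node is within 1% of the boundary value:

import numpy as np

V_boundary = 10.0
V = np.zeros((50, 50))
V[0, :] = V[-1, :] = V[:, 0] = V[:, -1] = V_boundary

tolerance = 0.01  # 1%, interpreted as the maximum relative deviation from the boundary value
for step in range(100000):
    V_new = V.copy()
    # Jacobi update: each interior node becomes the average of its four neighbours
    V_new[1:-1, 1:-1] = 0.25 * (V[:-2, 1:-1] + V[2:, 1:-1] + V[1:-1, :-2] + V[1:-1, 2:])
    V = V_new
    if np.max(np.abs(V[1:-1, 1:-1] - V_boundary)) / V_boundary < tolerance:
        break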
(This might be the wrong forum, I am aware of that.)
Best regards

Comparing Root-finding (of a function) algorithms in Python

I would like to compare different methods of finding roots of functions in Python (like Newton's method or other simple calculus-based methods). I don't think I will have too much trouble writing the algorithms.
What would be a good way to make the actual comparison? I read up a little bit about Big-O. Would this be the way to go?
The answer from @sarnold is right -- it doesn't make sense to do a Big-O analysis.
The principal differences between root finding algorithms are:
rate of convergence (number of iterations)
computational effort per iteration
what is required as input (e.g. do you need to know the first derivative, do you need to set lo/hi limits for bisection, etc.)
what functions it works well on (e.g. works fine on polynomials but fails on functions with poles)
what assumptions does it make about the function (e.g. a continuous first derivative, or being analytic, etc.)
how simple the method is to implement
I think you will find that each of the methods has some good qualities, some bad qualities, and a set of situations where it is the most appropriate choice.
Big O notation is ideal for expressing the asymptotic behavior of algorithms as the inputs to the algorithms "increase". This is probably not a great measure for root finding algorithms.
Instead, I would think the number of iterations required to bring the actual error below some ε would be a better measure. Another measure would be the number of iterations required to bring the difference between successive iterates below some ε. (The difference between successive iterates is probably a better choice if you don't have exact root values at hand for your inputs. You would use a criterion such as successive differences to know when to terminate your root finders in practice, so you could, or should, use it here too.)
While you can characterize the number of iterations required for different algorithms by the ratios between them (one algorithm may take roughly ten times more iterations to reach the same precision as another), there often isn't "growth" in the iterations as inputs change.
Of course, if your algorithms take more iterations with "larger" inputs, then Big O notation makes sense.
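As a sketch of the "iterations until successive differences drop below epsilon" measure, assuming a simple test function with a known root (everything here is illustrative):

def f(x):
    return x**2 - 4.0

def fprime(x):
    return 2.0 * x

def newton_iterations(x0, eps=1e-10, max_iter=100):
    # Count Newton steps until successive iterates differ by less than eps.
    x = x0
    for k in range(1, max_iter + 1):
        x_next = x - f(x) / fprime(x)
        if abs(x_next - x) < eps:
            return x_next, k
        x = x_next
    return x, max_iter

root, n_iter = newton_iterations(x0=10.0)
print(root, n_iter)  # converges to 2.0 in a handful of iterations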
Big-O notation is designed to describe how an algorithm behaves in the limit, as n goes to infinity. This is a much easier thing to work with in a theoretical study than in a practical experiment. I would pick things to study that you can easily measure and that people care about, such as accuracy and computer resources (time/memory) consumed.
When you write and run a computer program to compare two algorithms, you are performing a scientific experiment, just like somebody who measures the speed of light, or somebody who compares the death rates of smokers and non-smokers, and many of the same factors apply.
Try to choose an example problem or problems to solve that are representative, or at least interesting to you, because your results may not generalise to situations you have not actually tested. You may be able to increase the range of situations to which your results apply if you sample at random from a large set of possible problems and find that all your random samples behave in much the same way, or at least follow much the same trend. You can get unexpected results even when theoretical studies show that there should be a nice n log n trend, because theoretical studies rarely account for suddenly running out of cache or memory, or usually even for things like integer overflow.
Be alert for sources of error, and try to minimise them, or have them apply to the same extent to all the things you are comparing. Of course you want to use exactly the same input data for all of the algorithms you are testing. Make multiple runs of each algorithm, and check to see how variable things are - perhaps a few runs are slower because the computer was doing something else at the time. Be aware that caching may make later runs of an algorithm faster, especially if you run them immediately after each other. Which time you want depends on what you decide you are measuring. If you have a lot of I/O to do, remember that modern operating systems and computers cache huge amounts of disk I/O in memory. I once ended up powering the computer off and on again after every run, as the only way I could find to be sure that the device I/O cache was flushed.
You can get wildly different answers for the same problem just by changing starting points. Pick an initial guess that's close to the root and Newton's method will give you a result that converges quadratically. Choose another in a different part of the problem space and the root finder will diverge wildly.
What does this say about the algorithm? Good or bad?
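A classic illustration of this (my example, not the answerer's): Newton's method on f(x) = x^3 - 2x + 2 converges from x0 = -2 but cycles forever between 0 and 1 when started at x0 = 0.

def newton(x0, steps=20):
    x = x0
    for _ in range(steps):
        x = x - (x**3 - 2*x + 2) / (3*x**2 - 2)
    return x

print(newton(-2.0))  # approaches the real root near -1.7693
print(newton(0.0))   # stuck in the 0 <-> 1 cycle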
I would suggest you to have a look at the following Python root finding demo.
It is a simple code, with some different methods and comparisons between them (in terms of the rate of convergence).
http://www.math-cs.gordon.edu/courses/mat342/python/findroot.py
I just finished a project comparing bisection, Newton, and secant root-finding methods. Since this is a practical case, I don't think you need to use Big-O notation; Big-O notation is more suitable for an asymptotic view. What you can do is compare them in terms of:
Speed - for example, Newton's method is the fastest here if good conditions are met
Number of iterations - for example, bisection takes the most iterations here
Accuracy - how often it converges to the right root when there is more than one root, or whether it converges at all
Input - what information it needs to get started; for example, Newton needs an X0 near the root in order to converge, and it also needs the first derivative, which is not always easy to find
Other - rounding errors
For the sake of visualization you can store the value of each iteration in arrays and plot them. Use a function whose roots you already know.
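For example, a sketch that records the error of each bisection iterate for a function with a known root and plots the convergence history (all values here are illustrative):

import numpy as np
import matplotlib.pyplot as plt

def f(x):
    return x**2 - 2.0

true_root = np.sqrt(2.0)   # known root, so the error can be computed exactly
lo, hi = 0.0, 2.0
errors = []
for _ in range(40):
    mid = 0.5 * (lo + hi)
    errors.append(abs(mid - true_root))
    if f(lo) * f(mid) <= 0:
        hi = mid
    else:
        lo = mid

plt.semilogy(errors, marker='o')
plt.xlabel('iteration')
plt.ylabel('absolute error')
plt.title('Bisection convergence on f(x) = x^2 - 2')
plt.show()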
Although this is a very old post, my 2 cents :)
Once you've decided how you will compare them (your "evaluation protocol", so to speak), you might be interested in ways to run your challengers on actual datasets.
This tutorial explains how to do it, based on an example (comparing polynomial fitting algorithms on several datasets).
(I'm the author, feel free to provide feedback on the github page!)
