Restrict SciPy optimization to integers - python

I'm using SciPy's optimization functions (in particular shgo()) in order to optimize my problem. Right now I'm managing to get a valid solution, however I would like to improve this a little bit.
My function is solving an NLU problem. Basically, I have a tokenized sentence, and for each word I have a set of potential interpretations. For each combination I can apply black-box grammar rules, which result in a score.
The problem with this is that in terms of complexity it can be disastrous, since it's O(exp(n)).
For this reason I'm using the shgo() optimization algorithm (or similar things), which so far gives me good results. The only issue is that the minimizer works with real values, whereas my parameters are integers (word 1 = interpretation 2, word 2 = interpretation 1, ..., word N = interpretation I).
In the end, for some inputs that are fairly obvious (at most one interpretation for each word) it takes 170 runs, because it's trying to find the exact value while it's actually exploring points in the range [0, 1), which are all the same thing for me.
I would like to have integer steps but after playing with the different parameters a bit I couldn't find how to tell the minimizer to have smaller steps. Even if it's not strictly integers, just have the thing to stop when it's 0.5 away from a solution would already be a wonderful improvement.
Edit: you can have a look at the code if you want.
Thanks!
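For the "everything in [0, 1) is the same thing" idea in the question, here is a minimal sketch of one possible workaround: wrap the objective so that every real-valued point is floored to an integer choice and the expensive score is cached per choice. The names make_integer_objective, toy_score and n_choices are hypothetical and not part of SciPy, and the toy score only stands in for the black-box grammar rules.

```python
import math
from functools import lru_cache


def make_integer_objective(score, n_choices):
    """Wrap `score` (takes a tuple of ints) so it accepts real-valued points.

    Each coordinate is floored and clamped to 0 .. n_choices[i] - 1, and the
    score of every integer combination is computed only once.
    """
    @lru_cache(maxsize=None)
    def cached(choice):
        return score(choice)

    def objective(x):
        choice = tuple(
            min(max(math.floor(v), 0), n - 1) for v, n in zip(x, n_choices)
        )
        return cached(choice)

    return objective


# Hypothetical stand-in for the grammar scoring; records each real evaluation.
calls = []

def toy_score(choice):
    calls.append(choice)
    return -sum(choice)

objective = make_integer_objective(toy_score, n_choices=[3, 2])
a = objective([0.1, 1.9])   # floors to (0, 1)
b = objective([0.7, 1.2])   # same integer cell, served from the cache
print(a, b, len(calls))
```

The wrapped function can be passed to a minimizer as usual. It does not stop the minimizer from sampling inside one cell, but repeated samples in the same cell no longer cost a real evaluation.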

Related

How do I write code to distribute different weights as evenly as possible among 4 boxes?

I was given a problem in which you are supposed to write a python code that distributes a number of different weights among 4 boxes.
Logically, we can't expect a perfect distribution: given weights like 10, 65, 30, 40, 50 and 60 kilograms, there is no way of grouping those numbers without making one box heavier than another. But we can aim for the most homogeneous distribution, e.g. ((60),(40,30),(65),(50,10)).
I can't even think of an algorithm to complete this task let alone turn it into python code. Any ideas about the subject would be appreciated.
The problem you're describing is similar to the "fair teams" problem, so I'd suggest looking there first.
Because a simple greedy algorithm that adds each weight to the lightest box is not guaranteed to find the best distribution, the most straightforward solution would be a brute force recursive backtracking algorithm that keeps track of the best solution it has found while iterating over all possible combinations.
As stated in @j_random_hacker's response, this is not going to be something easily done. My best idea right now is to find some baseline. I describe a baseline as the object with the largest value, since it cannot be subdivided. Using that, you can start trying to match the rest of the data to that value, which would only take about three passes. The first and second would create a list of every possible combination, and the third can go over that list and compare the different options by taking the average of each group and storing the average value closest to your baseline.
Using your example, 65 is the baseline, and since you cannot subdivide it you know it has to be the minimum bound on your data grouping, so you would try to match all of the rest of the values to that. It won't be great, but it does give you something to start with.
As j_random_hacker notes, the partition problem is NP-complete. This problem is also NP-complete by a reduction from the 4-partition problem (the article also contains a link to a paper by Garey and Johnson that proves that 4-partition itself is NP-complete).
In particular, given a list to 4-partition, you could feed that list as an input to a function that solves your box distribution problem. If each box had the same weight in it, a 4-partition would exist, otherwise not.
Your best bet would be to create an exponential time algorithm that uses backtracking to iterate over the 4^n possible assignments, because unless P = NP (highly unlikely), no polynomial time algorithm exists for this problem.
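The backtracking search described above can be sketched as follows. This is only an illustration under assumptions: "most homogeneous" is interpreted here as the smallest difference between the heaviest and the lightest box, and the function name distribute is made up for the example.

```python
def distribute(weights, n_boxes=4):
    """Try every assignment of weights to boxes; keep the most even one."""
    weights = sorted(weights, reverse=True)   # place big items first
    boxes = [[] for _ in range(n_boxes)]
    loads = [0] * n_boxes
    best = {"spread": float("inf"), "boxes": None}

    def place(i):
        if i == len(weights):
            spread = max(loads) - min(loads)
            if spread < best["spread"]:
                best["spread"] = spread
                best["boxes"] = [list(b) for b in boxes]
            return
        tried = set()
        for b in range(n_boxes):
            # Boxes that currently weigh the same are interchangeable,
            # so only one of them needs to be tried.
            if loads[b] in tried:
                continue
            tried.add(loads[b])
            boxes[b].append(weights[i])
            loads[b] += weights[i]
            place(i + 1)
            loads[b] -= weights[i]
            boxes[b].pop()

    place(0)
    return best["spread"], best["boxes"]


spread, boxes = distribute([10, 65, 30, 40, 50, 60])
print(spread, boxes)
```

For the weights from the question this finds box totals of 60, 60, 65 and 70, matching the grouping given there. The running time still grows exponentially with the number of weights.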

Optimizing/learning function in Python

I would like to create a function that, given a list of integers as input, returns a boolean based on that number. I would like it to use an algorithm to find the optimum cut-off value that optimizes the number of correct returns.
Is there some tool built-in with Python for this? Otherwise, how would I approach such a problem using Python? Preferably, I would want to learn how to do both.
This appears to be something that a linear machine learning algorithm could solve. In fact, the Ordinary Least Squares linear model seems to follow the outline you provide: it uses an algorithm to attempt to match its output with your examples based on the numerical input, with the quantity it minimizes being the squared error between its output and your examples, which serves as a stand-in for the number of answers it gets wrong. If this is indeed the case, I believe scikit-learn will be the library you want. As to learning how this is done, the document linked above will at least get you started.
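To learn how this works without a library, here is a minimal sketch that assumes you have example inputs together with the boolean each one should produce (the question does not say so explicitly). It tries every sensible cut-off and keeps the one with the most correct answers; best_cutoff and the sample data are made up for illustration.

```python
def best_cutoff(values, labels):
    """Return (cutoff, n_correct) for the rule: value >= cutoff -> True."""
    candidates = sorted(set(values))
    candidates.append(candidates[-1] + 1)   # cutoff above everything: always False
    best = None
    for cutoff in candidates:
        n_correct = sum((v >= cutoff) == label for v, label in zip(values, labels))
        if best is None or n_correct > best[1]:
            best = (cutoff, n_correct)
    return best


values = [1, 3, 4, 6, 8, 9]
labels = [False, False, True, False, True, True]
cutoff, n_correct = best_cutoff(values, labels)
print(cutoff, n_correct)

def classify(x):
    return x >= cutoff
```

With this toy data no cut-off gets all six examples right; the search settles on the first cut-off that gets five of them.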

Time complexity for matching and counting between two lists

n_dicwords = [np.sum([c.lower().count(w.decode('utf-8')) for w in dictionary])
              for c in documents]
Here I am trying to determine my feature engineering computation time:
This line of code goes through every document and counts how many times the words in my dictionary appear in it; the result is a feature called n_dicwords. Sorry, I am new to complexity theory. I think the time complexity for generating this feature is O(n*m*w), where n is the number of documents, m is the number of words in each document and w is the number of words in the dictionary. Am I right? And if so, is there any way to improve this?
Thank you so much! I really appreciate your help!
Unless the code underneath your code does any clever stuff your complexity analysis should be correct.
If performance in this part is important you should use a multiple-pattern string search algorithm, which attempts to solve pretty much the exact problem you are doing.
To start with, have a look at Aho-Corasick, which is the most commonly used one and runs in linear time. Googling "Aho-Corasick python" turned up a few different implementations, so while I have not used any of them personally, I would think you would not have to implement the algorithm itself to use it.
If you just need your code to run a little faster, and don't need the best performance you could possibly get, you could just use a set for the dictionary. In Python a normal set is a hash set, so it has constant time lookup on average. Then, for each word, you could just check whether it is in the dictionary.
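A minimal sketch of the set-based variant follows. Note the assumption: it splits each document on whitespace and counts whole words, which is not the same as the substring counting done by str.count in the question. The function name and the sample data are made up.

```python
def count_dictionary_words(document, dictionary):
    """Count how many whitespace-separated words of `document` are in the set."""
    words = document.lower().split()
    return sum(1 for word in words if word in dictionary)


dictionary = {"cat", "dog"}          # a set gives average O(1) membership tests
documents = ["The cat saw a dog", "No pets here", "Dog dog DOG"]
n_dicwords = [count_dictionary_words(doc, dictionary) for doc in documents]
print(n_dicwords)
```

Each document is now processed in time proportional to its number of words, independent of the dictionary size.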
I'm slightly surprised to note that the "x in s" construction in Python is O(n) for a list, where n is the number of items in the list. So, your estimate is correct. A slightly more correct way of looking at it: since your document and word counts aren't changing at all, the important numbers are the total number of words which must be checked, and the length of the dictionary against which they are being checked. Obviously, this doesn't change the number of computations at all, it just gets us to a quickly recognizable form of O(m*n).
You could conceivably store your dictionary in a binary search tree, which would reduce each lookup to O(log(n)).
Search for "binary tree python" on Google; I saw a few interesting things out there, like a package called "bintrees".
However, Erik Vesteraas points out that the Python 'set' data structure is a hash-based collection, with a lookup complexity of O(1) in the average case and O(n) in the worst, and highly rare, case.
See https://docs.python.org/2/library/stdtypes.html#set

Scipy optimize: Set maximum error

I'm trying to optimize a 4 dimensional function with scipy. Everything works so far, except that I'm not satisfied with the quality of the solution. Right now I have ground truth data, which I use to verify my code. What I get so far is:
End error: 1.52606896507e-05
End Gradient: [ -1.17291295e-05 2.60362493e-05 5.15347856e-06 -2.72388430e-05]
Ground Truth: [0.07999999..., 0.0178329..., 0.9372903878..., 1.7756283966...]
Reconstructed: [ 0.08375729 0.01226504 1.13730592 0.21389899]
The error itself sounds good, but as the values are totally wrong I want to force the optimization algorithm (BFGS) to do more steps.
In the documentation I found the options 'gtol' and 'norm' and I tried to set both to pretty small values (like 0.0000001) but it did not seem to change anything.
Background:
The problem is that I am trying to demodulate waves, so I have sin and cos terms and potentially many local (or global) minima. I use a brute force search to find a good starting point, which helps a lot, but it currently seems that most of the work is done by that brute force search, as the optimization often uses only one iteration step. So I'm trying to improve that part of the calculation somehow.
Many local minima + hardly any improvement after brute search, that sounds bad. It's hard to say something very specific with the level of detail you provide in the question, so here are vague ideas to try (basically, what I'd do if I suspect my minimizer gets stuck):
try manually starting the minimizer from a bunch of different initial guesses.
try using a stochastic minimizer. You tagged the question scipy, so try basinhopping
if worst comes to worst, just throw random points in a loop, leave it to work over the lunch break (or overnight)
Also, waves, sines and cosines --- it might be useful to think if you can reformulate your problem in the Fourier space.
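The first suggestion, restarting the minimizer from several initial guesses, can be sketched like this. The 1-D objective is only a toy with two local minima, not the demodulation problem from the question, and the number and range of starting points are arbitrary choices.

```python
import numpy as np
from scipy.optimize import minimize


def objective(x):
    # Toy function with two local minima (illustrative only).
    return x[0] ** 2 + 10.0 * np.sin(x[0])


# Run a local BFGS minimization from each starting point and keep the best.
starts = np.linspace(-10.0, 10.0, 9)
results = [minimize(objective, x0=[s], method="BFGS") for s in starts]
best = min(results, key=lambda r: r.fun)
print(best.x, best.fun)
```

Starts that fall in the wrong basin end up in the shallower minimum near x = 3.8; taking the minimum over all runs picks the deeper one near x = -1.3.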
I found out that the gradient at the starting point is already very flat (values around 10^-5), so I tried to scale the gradient function which I already provided. This seemed to be pretty effective: I could force the algorithm to take many more steps, and my results are far better now.
They are not perfect though, but a complete discussion of this is outside of the bounds of this question, so I might start a new one, where I describe the whole problem from bottom up.
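A minimal sketch of why scaling can help, using a made-up quadratic rather than the real demodulation objective: BFGS stops as soon as the gradient norm drops below gtol (1e-5 by default), so a function whose gradient is already that small at the starting point is never improved. Multiplying the function and its gradient by the same constant moves the gradient above that threshold without moving the minimum. The names tiny, scale and target are assumptions for the example.

```python
import numpy as np
from scipy.optimize import minimize

target = np.array([1.0, -2.0])
tiny = 1e-7   # makes the gradient at x0 smaller than the default gtol


def f(x):
    return tiny * np.sum((x - target) ** 2)


def grad(x):
    return 2.0 * tiny * (x - target)


x0 = np.zeros(2)
scale = 1.0 / tiny

# Unscaled: the gradient test passes immediately, so BFGS returns x0.
res_raw = minimize(f, x0, jac=grad, method="BFGS")

# Scaled: same minimum, but the gradient is large enough to keep iterating.
res_scaled = minimize(lambda x: scale * f(x), x0,
                      jac=lambda x: scale * grad(x), method="BFGS")
print(res_raw.x, res_scaled.x)
```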

Comparing Root-finding (of a function) algorithms in Python

I would like to compare different methods of finding roots of functions in Python (like Newton's method or other simple calculus-based methods). I don't think I will have too much trouble writing the algorithms.
What would be a good way to make the actual comparison? I read up a little bit about Big-O. Would this be the way to go?
The answer from @sarnold is right -- it doesn't make sense to do a Big-Oh analysis.
The principal differences between root finding algorithms are:
rate of convergence (number of iterations)
computational effort per iteration
what is required as input (e.g. do you need to know the first derivative, do you need to set lo/hi limits for bisection, etc.)
what functions it works well on (e.g. works fine on polynomials but fails on functions with poles)
what assumptions it makes about the function (e.g. a continuous first derivative or being analytic, etc.)
how simple the method is to implement
I think you will find that each of the methods has some good qualities, some bad qualities, and a set of situations where it is the most appropriate choice.
Big O notation is ideal for expressing the asymptotic behavior of algorithms as the inputs to the algorithms "increase". This is probably not a great measure for root finding algorithms.
Instead, I would think the number of iterations required to bring the actual error below some ε would be a better measure. Another measure would be the number of iterations required to bring the difference between successive iterations below some ε. (The difference between successive iterations is probably a better choice if you don't have exact root values at hand for your inputs. You would use a criterion such as successive differences to know when to terminate your root finders in practice, so you could or should use it here, too.)
While you can characterize the number of iterations required for different algorithms by the ratios between them (one algorithm may take roughly ten times more iterations to reach the same precision as another), there often isn't "growth" in the iterations as inputs change.
Of course, if your algorithms take more iterations with "larger" inputs, then Big O notation makes sense.
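As a minimal sketch of the successive-difference measure described above, the following counts how many iterations bisection and Newton's method need before two consecutive estimates differ by less than ε. The test function x^2 - 2, the bracket [1, 2] and the starting point 1.5 are arbitrary choices for illustration.

```python
def bisection(f, lo, hi, eps=1e-10, max_iter=200):
    """Return (estimate, iterations) once successive midpoints differ by < eps."""
    prev = lo
    for n in range(1, max_iter + 1):
        mid = (lo + hi) / 2.0
        if f(lo) * f(mid) <= 0:
            hi = mid          # root is in the left half
        else:
            lo = mid          # root is in the right half
        if abs(mid - prev) < eps:
            return mid, n
        prev = mid
    return mid, max_iter


def newton(f, df, x, eps=1e-10, max_iter=200):
    """Return (estimate, iterations) once successive iterates differ by < eps."""
    for n in range(1, max_iter + 1):
        x_new = x - f(x) / df(x)
        if abs(x_new - x) < eps:
            return x_new, n
        x = x_new
    return x, max_iter


f = lambda x: x * x - 2.0
df = lambda x: 2.0 * x

root_b, iters_b = bisection(f, 1.0, 2.0)
root_n, iters_n = newton(f, df, 1.5)
print(iters_b, iters_n)
```

On this example Newton's method needs far fewer iterations than bisection, which is the kind of ratio mentioned above.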
Big-O notation is designed to describe how an algorithm behaves in the limit, as n goes to infinity. This is a much easier thing to work with in a theoretical study than in a practical experiment. I would pick things to study that you can easily measure and that people care about, such as accuracy and computer resources (time/memory) consumed.
When you write and run a computer program to compare two algorithms, you are performing a scientific experiment, just like somebody who measures the speed of light, or somebody who compares the death rates of smokers and non-smokers, and many of the same factors apply.
Try to choose an example problem or problems to solve that is representative, or at least interesting to you, because your results may not generalise to situations you have not actually tested. You may be able to increase the range of situations to which your results apply if you sample at random from a large set of possible problems and find that all your random samples behave in much the same way, or at least follow much the same trend. You can have unexpected results even when the theoretical studies show that there should be a nice n log n trend, because theoretical studies rarely account for suddenly running out of cache, or out of memory, or usually even for things like integer overflow.
Be alert for sources of error, and try to minimise them, or have them apply to the same extent to all the things you are comparing. Of course you want to use exactly the same input data for all of the algorithms you are testing. Make multiple runs of each algorithm, and check to see how variable things are - perhaps a few runs are slower because the computer was doing something else at the time. Be aware that caching may make later runs of an algorithm faster, especially if you run them immediately after each other. Which time you want depends on what you decide you are measuring. If you have a lot of I/O to do, remember that modern operating systems and computers cache huge amounts of disk I/O in memory. I once ended up powering the computer off and on again after every run, as the only way I could find to be sure that the device I/O cache was flushed.
You can get wildly different answers for the same problem just by changing starting points. Pick an initial guess that's close to the root and Newton's method will give you a result that converges quadratically. Choose another in a different part of the problem space and the root finder will diverge wildly.
What does this say about the algorithm? Good or bad?
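A minimal sketch of this sensitivity, using arctan(x) as a test function (my choice, not from the answer): its only root is 0, yet Newton's method converges from a nearby start and blows up from one further away. The thresholds tol and blowup are arbitrary.

```python
import math


def newton_atan(x0, tol=1e-12, max_iter=50, blowup=1e6):
    """Run Newton's method on f(x) = atan(x); report what happened."""
    x = x0
    for n in range(1, max_iter + 1):
        # f(x) / f'(x) = atan(x) * (1 + x^2)
        x = x - math.atan(x) * (1.0 + x * x)
        if abs(x) < tol:
            return "converged", n
        if abs(x) > blowup:
            return "diverged", n
    return "undecided", max_iter


near = newton_atan(0.5)
far = newton_atan(2.0)
print(near, far)
```

The same algorithm on the same function gives opposite outcomes, so a comparison should report results over a range of starting points rather than a single one.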
I would suggest having a look at the following Python root finding demo.
It is simple code, with several different methods and comparisons between them (in terms of the rate of convergence).
http://www.math-cs.gordon.edu/courses/mat342/python/findroot.py
I just finished a project comparing the bisection, Newton, and secant root finding methods. Since this is a practical case, I don't think you need to use Big-O notation. Big-O notation is more suitable for an asymptotic view. What you can do is compare them in terms of:
Speed - for example, here Newton is the fastest if good conditions are met
Number of iterations - for example, here bisection takes the most iterations
Accuracy - how often it converges to the right root if there is more than one root, or whether it converges at all
Input - what information it needs to get started. For example, Newton needs an x0 near the root in order to converge, and it also needs the first derivative, which is not always easy to find
Other - rounding errors
For the sake of visualization you can store the value of each iteration in arrays and plot them. Use a function whose roots you already know.
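A minimal sketch of storing the value of each iteration, here for the secant and Newton methods on x^2 - 2, whose root sqrt(2) is known. The function names, starting points and iteration counts are arbitrary choices; the resulting lists can be handed to any plotting library.

```python
import math


def secant_history(f, x0, x1, n_iter=8):
    """Return the list of all secant iterates, including the two starts."""
    history = [x0, x1]
    for _ in range(n_iter):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:              # flat secant line: cannot continue
            break
        x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
        history.append(x1)
    return history


def newton_history(f, df, x, n_iter=6):
    """Return the list of all Newton iterates, including the start."""
    history = [x]
    for _ in range(n_iter):
        x = x - f(x) / df(x)
        history.append(x)
    return history


f = lambda x: x * x - 2.0
df = lambda x: 2.0 * x
root = math.sqrt(2.0)

secant_errors = [abs(x - root) for x in secant_history(f, 1.0, 2.0)]
newton_errors = [abs(x - root) for x in newton_history(f, df, 2.0)]
print(secant_errors)
print(newton_errors)
```

Plotting these error lists on a logarithmic axis makes the different convergence rates visible at a glance.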
Although this is a very old post, my 2 cents :)
Once you've decided which algorithmic method to use to compare them (your "evaluation protocol", so to say), then you might be interested in ways to run your challengers on actual datasets.
This tutorial explains how to do it, based on an example (comparing polynomial fitting algorithms on several datasets).
(I'm the author, feel free to provide feedback on the github page!)
