The Problem
I've been doing a bit of research on Particle Swarm Optimization, so I said I'd put it to the test.
The problem I'm trying to solve is the Balanced Partition Problem - or reduced simply to the Subset Sum Problem (where the sum is half of all the numbers).
It seems the generic formula for updating velocities for particles is
but I won't go into too much detail for this question.
Since there's no PSO attempt online for the Subset Sum Problem, I looked at the Travelling Salesman Problem instead.
They're approach for updating velocities involved taking sets of visited towns, subtracting one from another and doing some manipulation on that.
I saw no relation between that and the formula above.
My Approach
So I scrapped the formula and tried my own approach to the Subset Sum Problem.
I basically used gbest and pbest to determine the probability of removing or adding a particular element to the subset.
i.e - if my problem space is [1,2,3,4,5] (target is 7 or 8), and my current particle (subset) has [1,None,3,None,None], and the gbest is [None,2,3,None,None] then there is a higher probability of keeping 3, adding 2 and removing 1, based on gbest
I can post code but don't think it's necessary, you get the idea (I'm using python btw - hence None).
So basically, this worked to an extent, I got decent solutions out but it was very slow on larger data sets and values.
My Question
Am I encoding the problem and updating the particle "velocities" in a smart way?
Is there a way to determine if this will converge correctly?
Is there a resource I can use to learn how to create convergent "update" formulas for specific problem spaces?
Thanks a lot in advance!
Encoding
Yes, you're encoding this correctly: each of your bit-maps (that's effectively what your 5-element lists are) is a particle.
Concept
Your conceptual problem with the equation is because your problem space is a discrete lattice graph, which doesn't lend itself immediately to the update step. For instance, if you want to get a finer granularity by adjusting your learning rate, you'd generally reduce it by some small factor (say, 3). In this space, what does it mean to take steps only 1/3 as large? That's why you have problems.
The main possibility I see is to create 3x as many particles, but then have the transition probabilities all divided by 3. This still doesn't satisfy very well, but it does simulate the process somewhat decently.
Discrete Steps
If you have a very large graph, where a high velocity could give you dozens of transitions in one step, you can utilize a smoother distance (loss or error) function to guide your model. With something this small, where you have no more than 5 steps between any two positions, it's hard to work with such a concept.
Instead, you utilize an error function based on the estimated distance to the solution. The easy one is to subtract the particle's total from the nearer of 7 or 8. A harder one is to estimate distance based on that difference and the particle elements "in play".
Proof of Convergence
Yes, there is a way to do it, but it requires some functional analysis. In general, you want to demonstrate that the error function is convex over the particle space. In other words, you'd have to prove that your error function is a reliable distance metric, at least as far as relative placement goes (i.e. prove that a lower error does imply you're closer to a solution).
Creating update formulae
No, this is a heuristic field, based on shape of the problem space as defined by the particle coordinates, the error function, and the movement characteristics.
Extra recommendation
Your current allowable transitions are "add" and "delete" element.
Include "swap elements" to this: trade one present member for an absent one. This will allow the trivial error function to define a convex space for you, and you'll converge in very little time.
Related
I have a rather simple problem to define but I did not find a simple answer so far.
I have two graphs (ie sets of vertices and edges) which are identical. Each of them has independently labelled vertices. Look at the example below:
How can the computer detect, without prior knowledge of it, that 1 is identical to 9, 2 to 10 and so on?
Note that in the case of symmetry, there may be several possible one to one pairings which give complete equivalence, but just finding one of them is sufficient to me.
This is in the context of a Python implementation. Does someone have a pointer towards a simple algorithm publicly available on the Internet? The problem sounds simple but I simply lack the mathematical knowledge to come up to it myself or to find proper keywords to find the information.
EDIT: Note that I also have atom types (ie labels) for each graphs, as well as the full distance matrix for the two graphs to align. However the positions may be similar but not exactly equal.
This is known as the graph isomorphism problem, and probably very hard; although the exactly details of how hard are still subject of research.
(But things look better if you graphs are planar.)
So, after searching for it a bit, I think that I found a solution that works most of the time for moderate computational cost. This is a kind of genetic algorithm which uses a bit of randomness, but it is practical enough for my purposes it seems. I didn't have any aberrant configuration with my samples so far even if it is theoretically possible that this happens.
Here is how I proceeded:
Determine the complete set of 2-paths, 3-paths and 4-paths
Determine vertex types using both atom type and surrounding topology, creating an "identity card" for each vertex
Do the following ten times:
Start with a random candidate set of pairings complying with the allowed vertex types
Evaluate how much of 2-paths, 3-paths and 4-paths correspond between the two pairings by scoring one point for each corresponding vertex (also using the atom type as an additional descriptor)
Evaluate all other shortlisted candidates for a given vertex by permuting the pairings for this candidate with its other positions in the same way
Sort the scores in descending order
For each score, check if the configuration is among the excluded configurations, and if it is not, take it as the new configuration and put it into the excluded configurations.
If the score is perfect (ie all of the 2-paths, 3-paths and 4-paths correspond), then stop the loop and calculate the sum of absolute differences between the distance matrices of the two graphs to pair using the selected pairing, otherwise go back to 4.
Stop this process after it has been done 10 times
Check the difference between distance matrices and take the pairings associated with the minimal sum of absolute differences between the distance matrices.
I was given a problem in which you are supposed to write a python code that distributes a number of different weights among 4 boxes.
Logically we can't expect a perfect distribution as in case we are given weights like 10, 65, 30, 40, 50 and 60 kilograms, there is no way of grouping those numbers without making one box heavier than another. But we can aim for the most homogenous distribution. ((60),(40,30),(65),(50,10))
I can't even think of an algorithm to complete this task let alone turn it into python code. Any ideas about the subject would be appreciated.
The problem you're describing is similar to the "fair teams" problem, so I'd suggest looking there first.
Because a simple greedy algorithm where weights are added to the lightest box won't work, the most straightforward solution would be a brute force recursive backtracking algorithm that keeps track of the best solution it has found while iterating over all possible combinations.
As stated in #j_random_hacker's response, this is not going to be something easily done. My best idea right now is to find some baseline. I describe a baseline as an object with the largest value since it cannot be subdivided. Using that you can start trying to match the rest of the data to that value which would only take about three iterations to do. The first and second would create a list of every possible combination and then the third can go over that list and compare the different options by taking the average of each group and storing the closest average value to your baseline.
Using your example, 65 is the baseline and since you cannot subdivide it you know that has to be the minimum bound on your data grouping so you would try to match all of the rest of the values to that. It wont be great, but it does give you something to start with.
As j_random_hacker notes, the partition problem is NP-complete. This problem is also NP-complete by a reduction from the 4-partition problem (the article also contains a link to a paper by Garey and Johnson that proves that 4-partition itself is NP-complete).
In particular, given a list to 4-partition, you could feed that list as an input to a function that solves your box distribution problem. If each box had the same weight in it, a 4-partition would exist, otherwise not.
Your best bet would be to create an exponential time algorithm that uses backtracking to iterate over the 4^n possible assignments. Because unless P = NP (highly unlikely), no polynomial time algorithm exists for this problem.
This Question is more Theoretical, and not specifically trying to problem-solve.
I recently was introduced to the K-Means Clustering algorithm, and unsupervised machine learning algorithm, and I was intrigued by the though that one some sets of data, even if completely random, the average centroids drawn could keep changing through each iteration.
Example:
What I am trying to show here, is, imagine if the program flipped between iteration 6, to iteration 9, and kept doing this forever.
I have had my code randomly hang before using K-Means, so I don't believe this is impossible, but please let me know if this is a known occurrence, or if it is impossible due to the nature of the algorithm.
If you need more information just ask me in a comment. Using Python 3.7
tl;dr No, a K-means algorithm always has an end point if the algorithm is coded correctly.
Explanation:
The ideal way to think about this is not in the sense of what datapoints would cause issues, but rather about how kmeans is working in the broader sense of things. The k-means algorithm is always working in a finite space. For N data points, there are only N ^ k distinct arrangements for the data points. (This number can be pretty large, but is still finite)
Secondly, a k-means algorithm is always optimizing a loss function, based on the sum of squared distances between each data point and it's assigned cluster center. This means two very important things: Each of the N ^ k distinct arrangements can be arranged in an ascending/descending order of minimum loss to maximum loss. Also, the K-means algorithm will never go from a state of lower net loss to a higher net loss.
These two conditions guarantee that the algorithm will always tend towards the minimum loss arrangement in a finite space, thus ensuring that it has an end.
The last edge case: What if more than one minimum state has equal loss? This is a highly unlikely scenario, but can cause issues if and only if the algorithm is coded poorly for tie breakers. Essentially, the only way this can cause a cycle is if a data point has equal distance for two clusters, and is allowed to change clusters away from it's current cluster even on equal distance. Suffice to say, the algorithms are generally coded so that the data points never swap on a tie, or in some other deterministic manner, thus avoiding this scenario entirely.
I am a PhD student who is trying to use the NEAT algorithm as a controller for a robot and I am having some accuracy issues with it. I am working with Python 2.7 and for it and am using two NEAT python implementations:
The NEAT which is in this GitHub repository: https://github.com/CodeReclaimers/neat-python
Searching in Google, it looks like it has been used in some projects with succed.
The multiNEAT library developed by Peter Chervenski and Shane Ryan: http://www.multineat.com/index.html.
Which appears in the "official" software web page of NEAT software catalog.
While testing the first one, I've found that my program converges quickly to a solution, but this solution is not precise enough. As lack of precision I want to say a deviation of a minimum of 3-5% in the median and average related to the "perfect" solution at the end of the evolution (Depending on the complexity of the problem, an error around 10% is normal for my solutions. Furthermore, I could said that I've "never" seen an error value under the 1% between the solution given by the NEAT and the solution that it is the correct one). I must said that I've tried a lot of different parameter combinations and configurations (this is an old problem for me).
Due to that, I tested the second library. The MultiNEAT library converges quickly and easier that the previous one. (I assume that is due to the C++ implementation instead the pure Python) I get similar results, but I still have the same problem; lack of accuracy. This second library has different configuration parameters too, and I haven't found a proper combination of them to improve the performance of the problem.
My question is:
Is it normal to have this lack of accuracy in the NEAT results? It achieves good solutions, but not good enough for controlling a robot arm, which is what I want to use it for.
I'll write what I am doing in case someone sees some conceptual or technical mistake in the way I set out my problem:
To simplify the problem, I'll show a very simple example: I have a very simple problem to solve, I want a NN that may calculate the following function: y = x^2 (similar results are found with y=x^3 or y = x^2 + x^3 or similar functions)
The steps that I follow to develop the program are:
"Y" are the inputs to the network and "X" the outputs. The
activation functions of the neural net are sigmoid functions.
I create a data set of "n" samples given values to "X" between the
xmin = 0.0 and the xmax = 10.0
As I am using sigmoid functions, I make a normalization of the "Y"
and "X" values:
"Y" is normalized linearly between (Ymin, Ymax) and (-2.0, 2.0) (input range of sigmoid).
"X" is normalized linearly between (Xmin, Xmax) and (0.0, 1.0) (the output range of sigmoid).
After creating the data set, I subdivide in in a train sample (70%
percent of the total amount), a validation sample and a test sample
(15% each one).
At this point, I create a population of individuals for doing
evolution. Each individual of the population is evaluated in all the
train samples. Each position is evaluated as:
eval_pos = xmax - abs(xtarget - xobtained)
And the fitness of the individual is the average value of all the train positions (I've selected the minimum too but it gives me worse performance).
After the whole evaluation, I test the best obtained individual
against the test sample. And here is where I obtained those
"un-precise values". Moreover, during the evaluation process, the
maximum value where "abs(xtarget - xobtained) = 0" is never
obtained.
Furthermore, I assume that how I manipulate the data is right because, I use the same data set for training a neural network in Keras and I get much better results than with NEAT (an error less than a 1% is achievable after 1000 epochs in a layer with 5 neurons).
At this point, I would like to know if what is happened is normal because I shouldn't use a data set of data for developing the controller, it must be learned "online" and NEAT looks like a suitable solution for my problem.
Thanks in advance.
EDITED POST:
Firstly, Thanks for comment nick.
I'll answer your questions below::
I am using the NEAT algorithm.
Yes, I've carried out experiments increasing the number of individuals in the population and the generations number. A typical graph that I get is like this:
Although the population size in this example is not such big, I've obtained similar results in experiments incrementing the number of individuals or the number of generations. Populations of 500 in individuals and 500 generations, for example. In this experiments, the he algorithm converge fast to a solution, but once there, the best solution is stucked and it does not improve any more.
As I mentioned in my previous post, I've tried several experiments with many different parameters configurations... and the graphics are more or less similar to the previous showed.
Furthermore, other two experiments that I've tried were: once the evolution reach the point where the maximum value and the median converge, I generate other population based on that genome with new configuration parameters where:
The mutation parameters change with a high probability of mutation (weight and neuron probability) in order to find new solutions with the aim to "jumping" from the current genome to other better.
The neuron mutation is reduced to 0, while the weight "mutation probability" increase for "mutate weight" in a lower range in order to get slightly modifications with the aim to get a better adjustment of the weights. (trying to get a "similar" functionality as backprop. making slighty changes in the weights)
This two experiments didn't work as I expected and the best genome of the population was also the same of the previous population.
I am sorry, but I do not understand very well what do you want to say with "applying your own weighted penalties and rewards in your fitness function". What do you mean with including weight penalities in the fitness function?
Regards!
Disclaimer: I have contributed to these libraries.
Have you tried increasing the population size to speed up the search and increasing the number of generations? I use it for a trading task, and by increasing the population size my champions were found much sooner.
Another thing to think about is applying your own weighted penalties and rewards in your fitness function, so that anything that doesn't get very close right away is "killed off" sooner and the correct genome is found faster. It should be noted that neat uses a fitness function to learn as a opposed to gradient descent so it wont converge in the same way and its possible you may have to train a bit longer.
Last question, are you using the neat or hyperneat algo from multineat?
I am studying physics and ran into a really interesting problem. I'm not an expert on programming so please take this into account while reading this.
I really hope that someone can help me with this problem because I struggle with this matter for about 2 months now and don't see any success.
So here is my Problem:
I got a bunch of data sets (more than 2 less than 20) from numerical calculations. The set is given by x against measurement values. I have a set of sensors and want to find the best positions x for my sensors such that the integral of the interpolation comes as close as possible to the integral of the numerical data set.
As this sounds like a typical mathematical problem I started to look for some theorems but I did not find anything.
So I started to write a python program based on the SLSQP minimizer. I chose this because it can handle bounds and constraints. (Note there is always a sensor at 0 and one at 1)
Constraints: the sensor array must stay sorted all the time such that x_i smaller than x_i+1 and the interval of x is normalized to [0,1].
Before doing an overall optimization I started to look for good starting points and searched for maximums, minimums and linear areas of my given data sets. But an optimization over 40 values turned out to deliver bad results.
In my second try I started to search for these points and defined certain areas. So I optimized each area with 1 to 40 sensors. Then I compared the results and decided which area is worth putting more sensors in. I the last step I wanted to do an overall optimization again. But these idea didn't seem to be the proper solution, too, because the optimization had convergence problem as well.
The big problem was, that my optimizer broke the boundaries. I covered this by interrupting the optimization, because once this boundaries were broken the result was not correct in the end. If this happens I reset my initial setup and a homogeneous distribution. After this there are normally no violence of boundaries but the results seems to be a homogeneous distribution, too, often this is obviously not the perfect distribution.
As my algorithm works for simple examples and dies for more complex data I think there is a general problem and not just some error in my coding. Does anyone have an idea how to move on or knows some theoretical stuff about this matter?
The attached plot show the areas in different colors. The function is shown at the bottom and the sensor positions are represented as dots. Dots at value y=1 are from the optimization with one sensors 2 represents the results of optimization with 2 variables. So as the program reaches higher sensor numbers the whole thing gets more and more homogeneous.
It is easy to see that if n is the number of sensors and n goes to infinity you have a total homogeneous distribution. But as far as I see this this should not happen for just 10 sensors.