I have a 2D list; call it possibleVals. Each sublist within this 2D list contains the possible values for a given variable.
i.e. if the 2D list contains m 1D lists, this implies there are m variables, with the i-th variable taking its possible values from possibleVals[i].
I want to find the optimal combination of all m variables, i.e. the one that maximizes a certain function of the inputs. I understand we could enumerate all the combinations and evaluate the function one by one, but that is very time consuming, especially as m grows. I was wondering whether there is a solution to this using machine learning (neural nets), since it has a similar flavour (of course involving minimizing a certain value).
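For concreteness, a minimal sketch of the exhaustive approach mentioned above might look like this; the data and the score function are placeholders, not the real objective:

# Brute-force baseline over possibleVals: enumerate every combination with
# itertools.product and keep the one with the highest score.
from itertools import product

possibleVals = [[1, 2, 3], [10, 20], [0.5, 1.5]]  # m = 3 variables (toy example)

def score(combo):
    # stand-in for the real function to be maximized
    return sum(combo)

best = max(product(*possibleVals), key=score)
print(best, score(best))

The number of combinations is the product of the sublist lengths, which is exactly the blow-up the question wants to avoid.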
Python or R (preferably Python, since I am better with arrays and loops there)
Suppose we have a two-dimensional array A (of n^2 rows and n^2 columns) whose entries we have already filled in with for loops, using some complicated rules.
Now I want to create a Markov chain on n^2 states with transition matrix A and then compute the following hitting probability: the probability that, starting from (2,3), say, I ever reach a state corresponding to an integer which is a multiple of n.
How is this possible? Are there some nice library/package functions for this?
Update: I am also fine with not formally creating a chain, but just setting up a (possibly complicated) system of equations that gives the hitting probability I am chasing.
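For the system-of-equations route, a minimal numpy sketch could look like the following. This assumes A is row-stochastic, states are indexed 0..n^2-1, the target set contains whichever indices you consider multiples of n, and I - Q is non-singular; the function name and the flattening of (2,3) to an index are my own illustrative choices.

# Hitting probabilities h satisfy h[i] = 1 for target states and
# h[i] = sum_j A[i, j] * h[j] otherwise; restrict to the non-target states and
# solve the resulting linear system.
import numpy as np

def hitting_probabilities(A, targets):
    S = A.shape[0]
    targets = np.asarray(sorted(set(targets)))
    others = np.setdiff1d(np.arange(S), targets)
    Q = A[np.ix_(others, others)]               # transitions among non-target states
    b = A[np.ix_(others, targets)].sum(axis=1)  # one-step probability of entering the target set
    h = np.ones(S)
    h[others] = np.linalg.solve(np.eye(len(others)) - Q, b)
    return h

# Example usage: if state (2, 3) is flattened to index 2 * n + 3,
# h = hitting_probabilities(A, targets); print(h[2 * n + 3])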
Can you suggest a module function from numpy/scipy that can find local maxima/minima in data read from a text file? I was trying to use the nearest-neighbours approach, but fluctuations in the data cause false identifications. Is it possible to use the neighbours approach but with 20 data points as the sample_len?
scipy.signal.argrelmax looks for relative maxima in an array (there is also argrelmin for minima). It has the order keyword argument, which allows you to compare against e.g. 20 neighbours on each side.
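A short sketch of that suggestion, assuming the text file holds one value per line (the filename is a placeholder):

import numpy as np
from scipy.signal import argrelmax, argrelmin

data = np.loadtxt("data.txt")            # one value per line (placeholder filename)
maxima = argrelmax(data, order=20)[0]    # indices of maxima w.r.t. 20 neighbours on each side
minima = argrelmin(data, order=20)[0]    # same for minima
print(maxima, data[maxima])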
I have an array of N elements and a function func() that takes as input M unique elements from the array, with M<N. I need to find the subset M* that maximizes my func().
I can't use an exhaustive search (i.e. test every possible subset of size M that can be created from the N elements in the array) because the total number of combinations is too large even for modest values of N and M.
I can't use any of the usual scipy optimization algorithms (at least none that I am aware of) since I'm not working with a continuous, or even a discrete, parameter; rather, I'm trying to find the subset of elements that maximizes my function.
Is there some Python algorithm in any package that could help me with this?
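For reference, the exhaustive baseline ruled out above is just a few lines with itertools.combinations; it is shown here only to make the combinatorial blow-up concrete, not as a recommendation:

# Exhaustive search over all size-M subsets; there are C(N, M) of them, which is
# exactly why this is infeasible for anything but small N and M.
from itertools import combinations

def best_subset(arr, M, func):
    return max(combinations(arr, M), key=func)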
I have a function that takes 4 variables and returns a single float value in the range [0, 1].
I want to know which inputs will maximize the function's output. However, this function runs slowly, so I just drew 1000 random samples, i.e. 1000 (input, output) tuples.
Is there any good method to predict the inputs that maximize my function from these tuples? I don't mind a few more function evaluations, but not many.
Thanks in advance.
No, there is no general method to do what you're asking.
Global optimization is a collection of techniques (and a whole field of study) used to minimize a function based on some of its general properties. Without more information about the underlying function, naive random sampling (as you're doing) is a 'reasonable' approach.
Your best bet is to find additional information about the character of your function mapping (is the output spiky or does it vary smoothly with the input? Are there lots of minima, or just a few?), or just keep sampling.
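As a concrete version of the "keep sampling" advice, a plain random search would look like the sketch below; the bounds are placeholder assumptions and f stands for the slow function.

# Random search: draw inputs uniformly within assumed bounds, evaluate the slow
# function, and keep the best point seen so far.
import random

def random_search(f, bounds, n_samples=1000):
    best_x, best_val = None, float("-inf")
    for _ in range(n_samples):
        x = tuple(random.uniform(lo, hi) for lo, hi in bounds)
        val = f(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

# e.g. random_search(my_func, bounds=[(0, 1)] * 4, n_samples=1000)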
Imagine I have a dataset as follows:
[{"x":20, "y":50, "attributeA":90, "attributeB":3849},
{"x":34, "y":20, "attributeA":86, "attributeB":5000},
etc.
There could be a bunch of other attributes in addition to these; this is just an example. What I am wondering is: how can I cluster these points based on all of the factors, with control over the maximum separation allowed per variable for two points to be considered linked? (i.e. the Euclidean distance must be within 10 points, attributeA within 5 points, and attributeB within 1000 points)
Any ideas on how to do this in Python? As I implied above, I would like to use Euclidean distance to compare the positions of the two points if possible, not just compare x and y as separate attributes. For the rest of the attributes it would be a one-dimensional comparison each... if that makes sense.
Edit: To add some clarity in case this doesn't make sense: basically I am looking for an algorithm to compare all objects with each other (or some more efficient scheme). If all of object A's attribute differences and its Euclidean distance are within the specified thresholds when compared to object B, then those two are considered similar and linked. This procedure continues until eventually all the linked clusters can be returned; some clusters will have no points that satisfy the conditions with any point in another cluster, which is what keeps the clusters separate.
The simplest approach is to build a binary "connectivity" matrix.
Let a[i,j] be 0 exactly if your conditions are fulfilled, and 1 otherwise.
Then run hierarchical agglomerative clustering with complete linkage on this matrix. If you don't need every pair of objects in every cluster to satisfy your threshold, then you can also use other linkages.
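A hedged sketch of that recipe with scipy, using the thresholds from the question; the point list and field names are placeholders:

# Build the binary "distance" matrix (0 = all thresholds satisfied, 1 = not),
# then cluster it with complete linkage and cut below 1 so that only all-zero
# groups merge.
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

points = [{"x": 20, "y": 50, "attributeA": 90, "attributeB": 3849},
          {"x": 34, "y": 20, "attributeA": 86, "attributeB": 5000}]

n = len(points)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        p, q = points[i], points[j]
        linked = (np.hypot(p["x"] - q["x"], p["y"] - q["y"]) <= 10
                  and abs(p["attributeA"] - q["attributeA"]) <= 5
                  and abs(p["attributeB"] - q["attributeB"]) <= 1000)
        D[i, j] = D[j, i] = 0.0 if linked else 1.0

Z = linkage(squareform(D), method="complete")   # "single" would give the transitive linking from the question
labels = fcluster(Z, t=0.5, criterion="distance")
print(labels)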
This isn't the best solution (the distance matrix needs O(n²) memory and time, and the clustering even O(n³)), but it is the easiest to implement. Computing the distance matrix in Python code will be really slow unless you can avoid all loops and have e.g. numpy do most of the work. To improve scalability, you should consider DBSCAN and a data index.
It's fairly straightforward to replace the three different thresholds with weights, so that you can get a continuous distance; likely even a metric. Then you could use data indexes, and try out OPTICS.