How to make user-defined callable weight function for sklearn knn? - python

I am trying to define custom weights for the scikit-learn KNN classifier, similar to what is done here.
The documentation only briefly mentions that you can set custom weights with a user-defined function which accepts an array of distances and returns an array of the same shape containing the weights (here).
How can I write such a function for squared-distance or linear weights?
I went through countless pages on SO, but without any luck.
Is there a walkthrough or correct example?
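A minimal sketch of two such callables, assuming the classifier passes a 2-D array of neighbor distances of shape (n_queries, n_neighbors), which is what KNeighborsClassifier does. The function names, the eps guard against division by zero, and the toy data are illustrative assumptions, not part of scikit-learn.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def inverse_squared_weights(distances, eps=1e-12):
    """Weight each neighbor by 1 / d^2 (eps avoids division by zero)."""
    return 1.0 / (distances ** 2 + eps)


def linear_weights(distances, eps=1e-12):
    """Weight falls linearly from 1 at distance 0 to ~0 at the farthest neighbor of each query."""
    max_dist = distances.max(axis=1, keepdims=True)
    return 1.0 - distances / (max_dist + eps)


# Toy data (assumption): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(5.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf_squared = KNeighborsClassifier(n_neighbors=5, weights=inverse_squared_weights).fit(X, y)
clf_linear = KNeighborsClassifier(n_neighbors=5, weights=linear_weights).fit(X, y)

queries = np.array([[0.0, 0.0], [5.0, 5.0]])
pred_squared = clf_squared.predict(queries)
pred_linear = clf_linear.predict(queries)
```

The callable is passed as the function object itself (no parentheses), and it must return an array with the same shape as its input.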

Related

Force centroid initialization to array in k-means with sklearn

I am working on an anomaly detection project, and for that I have embeddings which contain features from images (adversarial autoencoder model). Now I want to interpret these embeddings with PCA and k-means. I need to find certain embeddings of features in the images that allow me to distinguish between two datasets of images.
With scikit-learn, k-means is pretty easy, but the centroid initialization is almost always done randomly. I already know the means of the datasets I want to apply k-means to, so no random initialization is needed. How can I force the sklearn k-means function to initialize the centroids to an array of means?
The initialization can be controlled through the init parameter, but the only examples on the sklearn documentation site use init='k-means++'. The library source code doesn't have an example either.
The documentation states that you can pass an array-like as the init argument when calling the function:
"[...] init : {'k-means++', 'random'}, callable or array-like of shape (n_clusters, n_features), default='k-means++'
Method for initialization:" (https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html), meaning something like
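import numpy as np
import sklearn.cluster
your_centroids = np.random.randn(8, 3)  # placeholder; use your known means, shape (n_clusters, n_features)
k_means = sklearn.cluster.KMeans(n_clusters=8, init=your_centroids)
should work (not tested).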
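A small runnable sketch of the same idea, assuming two known cluster means and synthetic data drawn around them (both are made up for illustration). Passing n_init=1 makes explicit that only a single run from the given centroids is wanted.

```python
import numpy as np
from sklearn.cluster import KMeans

# Known means (assumption): one row per cluster, one column per feature.
known_means = np.array([[0.0, 0.0], [10.0, 10.0]])

# Synthetic data scattered around each known mean.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(100, 2)) for m in known_means])

# init takes the array directly; its shape must be (n_clusters, n_features).
kmeans = KMeans(n_clusters=2, init=known_means, n_init=1).fit(X)

centers = kmeans.cluster_centers_
labels = kmeans.labels_
```

Because the run starts from the supplied centroids, cluster index i corresponds to row i of known_means as long as the data really is grouped around those means.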

How to write a scikit-learn estimator with different predict results

I'm trying to wrap a new method called "GEMSEC: Embedding with Self Clustering" in a model class that conforms to scikit-learn's conventions.
I read about the predict function here, and it seems that predict must return an array of shape [n_samples,] or [n_samples, n_outputs].
The model I'm implementing does two different things, learning embeddings (a representation) and clustering, and I don't know what my predict function should return to be compatible with predict as defined by scikit-learn.
Thanks in advance
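One possible layout, sketched under assumptions: expose the embeddings through transform and the cluster labels through predict, which is the split scikit-learn's own KMeans uses. The class name EmbedAndCluster is hypothetical, and PCA plus KMeans are only stand-ins for the GEMSEC embedding and clustering steps, which are not implemented here.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClusterMixin, TransformerMixin
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


class EmbedAndCluster(TransformerMixin, ClusterMixin, BaseEstimator):
    """Hypothetical estimator: transform() returns embeddings, predict() returns cluster labels."""

    def __init__(self, n_components=2, n_clusters=3, random_state=0):
        self.n_components = n_components
        self.n_clusters = n_clusters
        self.random_state = random_state

    def fit(self, X, y=None):
        # Stand-in for learning the embedding.
        self.embedder_ = PCA(n_components=self.n_components).fit(X)
        Z = self.embedder_.transform(X)
        # Stand-in for the clustering step, run in embedding space.
        self.clusterer_ = KMeans(
            n_clusters=self.n_clusters, n_init=10, random_state=self.random_state
        ).fit(Z)
        self.labels_ = self.clusterer_.labels_
        return self

    def transform(self, X):
        # Embeddings, shape (n_samples, n_components).
        return self.embedder_.transform(X)

    def predict(self, X):
        # Cluster labels, shape (n_samples,).
        return self.clusterer_.predict(self.transform(X))


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
model = EmbedAndCluster().fit(X)
embeddings = model.transform(X)
cluster_labels = model.predict(X)
```

With this split, predict keeps the [n_samples,] shape, and callers who want the representation ask for it explicitly via transform or fit_transform.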

Custom Estimator Head for scoring vector of probability distributions

I'm using TensorFlow 1.4 and the Estimator framework.
I'd like to use tf.contrib.estimator.multi_head to create one head (in a model that has one other head) that summarizes the result of N softmax_cross_entropy_with_logits. The N probability distributions are each defined over the same set of classes, but are independent distributions. The summary loss score I'd like to compute is simply the sum of squares of the softmax cross entropy losses.
I can almost use a tf.contrib.estimator.regression_head to compute the summary if I fake a labels vector of N zeros, as mean squared error with a zero vector is equivalent to summing the squares of the softmax losses. But this seems kludgy and I'd like a more direct approach.
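The equivalence mentioned above can be checked numerically. This is a NumPy-only sketch, not TensorFlow code: the helper softmax_cross_entropy and the random logits and labels are assumptions for illustration. Note that with a mean reduction the two quantities differ by a constant factor of N.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Per-row softmax cross entropy for one-hot labels, computed stably."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)


rng = np.random.default_rng(0)
n_dists, n_classes = 4, 3  # N independent distributions over the same classes
logits = rng.normal(size=(n_dists, n_classes))
labels = np.eye(n_classes)[rng.integers(0, n_classes, size=n_dists)]

losses = softmax_cross_entropy(logits, labels)  # shape (N,)

# Direct summary: sum of squared cross-entropy losses.
sum_of_squares = np.sum(losses ** 2)

# The "fake labels" trick: mean squared error against a vector of N zeros.
mse_vs_zeros = np.mean((losses - np.zeros(n_dists)) ** 2)

# Same quantity up to the factor N introduced by the mean.
scaled_mse = n_dists * mse_vs_zeros
```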
It seems that I will need to create my own subclass of _Head, which is implemented in tensorflow.python.estimator.canned.head, and implement the method create_loss, which is documented as intended for use by framework developers.
Before I start down this path, I'd like to hear if there are alternate approaches I should consider.
I originally started development with Keras and at one time had a multihead model using the functional API. I wonder if perhaps I should return to using Keras, and then create my model_fn using tf.keras.estimator.model_to_estimator. All things being equal I would prefer to code in pure TensorFlow idioms, but perhaps Keras is the easiest path forward.

TensorFlow - How to minimize function of one variable?

I've been given a fully trained model by another researcher that has inputs as placeholders. Regarding it as a function f(x), I would like to find x to minimize my distance metric (loss function) dist(x, f(x)). This could be something like the euclidean distance between the two points.
I tried to use TensorFlow's built-in optimizer functions. The issue is that tf.train.AdamOptimizer(1e-4).minimize(loss, var_list=[input_placeholder]) fails, complaining that input_placeholder isn't of a supported type. Thus, I cannot get gradients for my input.
How can I optimize a function in TensorFlow when the inputs have to be specified in this way? Unfortunately, these placeholders are not passed through a Variable first, and I have to treat that model as a black box.
Using the Keras functional API detailed in this question, I created a dense layer with no bias to sit right before the model I was given. Holding its input constant as an all-ones vector, I optimized the joined model using only the Variable in the dense layer, which gives the optimal vector as the output of that layer.
All TensorFlow Optimizer subclasses allow you to minimize while only modifying a particular set of Variables, which I got out of Keras fairly simply.
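The reparametrization trick can be illustrated without TensorFlow. In this NumPy sketch the model f is a made-up linear map standing in for the black box, and the gradient is written by hand, where TensorFlow would obtain it by automatic differentiation. The point is only that feeding a constant vector of ones through a bias-free dense layer turns the layer's kernel W into the trainable input x.

```python
import numpy as np

# Stand-in for the given model (assumption): f(x) = x A^T + b.
A = np.array([[0.5, 0.1], [0.0, 0.3]])
b = np.array([1.0, -2.0])


def f(x):
    return x @ A.T + b


ones = np.ones((1, 1))   # constant input to the extra dense layer
W = np.zeros((1, 2))     # kernel of the bias-free dense layer: the only trainable variable
M = np.eye(2) - A        # x - f(x) = x M^T - b for this particular f
learning_rate = 0.1

for _ in range(2000):
    x = ones @ W                   # output of the dense layer = input to the model
    residual = x - f(x)            # loss is the squared Euclidean distance ||x - f(x)||^2
    grad_x = 2.0 * residual @ M    # gradient of the loss with respect to x
    grad_W = ones.T @ grad_x       # chain rule through the dense layer
    W -= learning_rate * grad_W    # update only the dense-layer kernel

x_opt = ones @ W
final_loss = float(np.sum((x_opt - f(x_opt)) ** 2))
```

For this toy f the minimizer is a fixed point with x equal to f(x), so the final loss is close to zero.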

How to analyse 3D mesh data (in .stl) with TensorFlow

I am trying to write a Python script that analyses an .stl data file (3D geometry) and reports whether the model is convex or concave, whether it is watertight, and other properties.
I would like to use TensorFlow, scikit-learn or another machine learning library: create a database of tagged example objects, add more examples in the future, and simply re-train the model for better results.
But my problem is that I don't know how to recalculate or restructure 3D data so that it works with ML libraries. I have no idea.
Thank you for your help.
You have to first extract "features" out of your dataset. These are fixed-dimension vectors. Then you have to define labels which define the prediction. Then, you have to define a loss function and a neural network. Put that all together and you can train a classifier.
In your example, you would first need to extract a fixed dimension vector out of an object. For instance, you could extract the object and project it on a fixed support on the x, y, and z dimensions. That defines the features.
For each object, you'll need to label whether it's convex or concave. You can do that by hand, analytically, or by creating objects analytically that are known to be concave or convex. Now you have a dataset with a lot of sample pairs (object, is-concave).
For the loss function, you can simply use the negative log-probability.
Finally, a feed-forward network with some convolutional layers at the bottom is probably a good idea.
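The analytic labelling and feature extraction steps can be sketched with NumPy alone, assuming the mesh has already been read into a vertex array and an array of triangle indices (reading the .stl file is not shown). The helper names, the choice of features, and the convexity test (every face plane must have all vertices on one side, which is meant for closed meshes) are illustrative assumptions.

```python
from collections import Counter

import numpy as np


def is_watertight(faces):
    """True if every undirected edge is shared by exactly two triangles."""
    edge_counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edge_counts[(min(u, v), max(u, v))] += 1
    return all(count == 2 for count in edge_counts.values())


def is_convex(vertices, faces, tol=1e-9):
    """True if, for every face plane, all vertices lie on one side of it."""
    for a, b, c in faces:
        normal = np.cross(vertices[b] - vertices[a], vertices[c] - vertices[a])
        signed = (vertices - vertices[a]) @ normal
        if signed.max() > tol and signed.min() < -tol:
            return False
    return True


def mesh_features(vertices, faces):
    """Fixed-length feature vector: bounding-box extents (3 values) and surface area."""
    tri = vertices[faces]  # shape (n_faces, 3, 3)
    cross = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    area = 0.5 * np.linalg.norm(cross, axis=1).sum()
    extents = vertices.max(axis=0) - vertices.min(axis=0)
    return np.concatenate([extents, [area]])


# A tetrahedron (convex) and a copy whose slanted face is pushed inward (concave).
tetra_vertices = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
tetra_faces = np.array([[0, 2, 1], [0, 1, 3], [0, 3, 2], [1, 2, 3]])

dented_vertices = np.vstack([tetra_vertices, [0.1, 0.1, 0.1]])
dented_faces = np.array([[0, 2, 1], [0, 1, 3], [0, 3, 2], [1, 2, 4], [2, 3, 4], [3, 1, 4]])

tetra_label = is_convex(tetra_vertices, tetra_faces)
dented_label = is_convex(dented_vertices, dented_faces)
tetra_features = mesh_features(tetra_vertices, tetra_faces)
```

Pairs of mesh_features output and the analytic label would then form the (object, is-convex) dataset described above.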
