Novelty detection using one-class SVM in Python

I'm working on novelty detection using machine learning. I have tried the one-class SVM in scikit-learn:
from sklearn import svm
train_data = [[0, 0, 0, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 1], [0, 3, 0, 0, 0, 1, 0, 0], [0, 11, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 4]]
test_data = [[0, 0, 0, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 1, 0, 0]]
clf = svm.OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)
clf.fit(train_data)
pred_test = clf.predict(test_data)
I'm new to this area. How can I tell whether there is novelty in my test data?

Inliers are labeled 1 and outliers (i.e., the novelties in your case) are labeled -1 in the result of the predict function.
Note that the current documentation incorrectly states that outliers are labeled 1 and inliers are labeled 0. Check the latest updates in the GitHub repo for the correct information.

check = clf.predict(test_data)
For each sample, check is 1 if the sample is not an anomaly and -1 if it is an anomaly, i.e. the sample is an outlier.
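To make the labeling rule above concrete, here is a minimal sketch that lists which test samples were flagged as novelties. The helper name novelty_indices and the prediction values are hypothetical; the list simply stands in for whatever clf.predict(test_data) returns.

```python
def novelty_indices(predictions):
    """Return the positions of samples labeled -1 (novelties/outliers)."""
    return [i for i, label in enumerate(predictions) if label == -1]


# Hypothetical predictions, standing in for clf.predict(test_data)
pred_test = [1, -1, 1, 1, -1]
novel = novelty_indices(pred_test)
print(novel)  # [1, 4]
print("novelty present" if novel else "no novelty detected")
```

The helper only compares each label with -1, so it should work on a plain list as well as on the array returned by predict.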

Labeling via majority vote of connected clusters in python3

I have a tensor with three dimensions and three classes (0: background, 1: first class, 2: second class). I would like to find connected clusters and reassign the labels of outliers within each cluster by majority vote. A 2D example:
import numpy as np
data = np.array([[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 1, 2],
[1, 2, 0, 0, 2, 2, 2],
[0, 1, 0, 0, 0, 2, 0],
[0, 0, 0, 0, 0, 0, 0],])
should be changed to
data = np.array([[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 2, 2],
[1, 1, 0, 0, 2, 2, 2],
[0, 1, 0, 0, 0, 2, 0],
[0, 0, 0, 0, 0, 0, 0],])
It is enough to treat each connected region as one cluster and count the occurrences of the labels. I am not looking for a machine learning method.
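You can use scipy.ndimage.label to find the connected components and then np.bincount to count the class labels within each component:
from scipy.ndimage import label
# lbl gives every nonzero cell a component id in 1..ncl; background cells get 0
lbl, ncl = label(data)
# data + 2*lbl puts each (component, class) pair in its own bin; after dropping bin 0,
# each row of the reshaped counts holds the two class counts of one component
lut = np.bincount((data + 2*lbl).ravel(), minlength=2*ncl + 3)[1:].reshape(-1, 2).argmax(1) + 1
lut[0] = 0  # background stays 0
lut[lbl]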
# array([[0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 0],
# [0, 1, 0, 0, 0, 0, 0],
# [1, 1, 1, 0, 0, 2, 2],
# [1, 1, 0, 0, 2, 2, 2],
# [0, 1, 0, 0, 0, 2, 0],
# [0, 0, 0, 0, 0, 0, 0]])
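For readers who find the lookup-table one-liner hard to follow, the same idea can be spelled out without SciPy. This is only a sketch under stated assumptions: it handles the 2D example as nested lists, uses 4-connectivity (cells sharing an edge), and resolves ties in favour of the label encountered first. The function name majority_vote is hypothetical.

```python
from collections import Counter, deque


def majority_vote(grid):
    """Relabel every connected nonzero region with its most common label."""
    rows, cols = len(grid), len(grid[0])
    result = [row[:] for row in grid]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first search collects one connected region
            queue = deque([(r, c)])
            seen[r][c] = True
            cells = []
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] != 0 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Majority vote over the labels found in this region
            winner = Counter(grid[y][x] for y, x in cells).most_common(1)[0][0]
            for y, x in cells:
                result[y][x] = winner
    return result


data = [[0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0, 0],
        [1, 1, 1, 0, 0, 1, 2],
        [1, 2, 0, 0, 2, 2, 2],
        [0, 1, 0, 0, 0, 2, 0],
        [0, 0, 0, 0, 0, 0, 0]]
for row in majority_vote(data):
    print(row)
```

Extending this to three dimensions would mean adding a third coordinate and the two extra neighbour offsets; for large arrays the vectorized approach is likely to be much faster.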

Create boolean mask on TensorFlow

Suppose I have a list
x = [0, 1, 3, 5]
And I want to get a tensor with dimensions
s = (10, 7)
Such that, in the rows whose indices are listed in x, the first column is 1, and every other entry is 0.
For this particular example, I want to obtain the tensor containing:
T = [[1, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]]
Using numpy, this would be the equivalent:
t = np.zeros(s)
t[x, 0] = 1
I found this related answer, but it doesn't really solve my problem.
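Try this (the code uses the TensorFlow 1.x API; in TensorFlow 2.x the corresponding functions are tf.sparse.reorder and tf.sparse.to_dense, and no Session is needed):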
import tensorflow as tf
indices = tf.constant([[0, 1],[3, 5]], dtype=tf.int64)
values = tf.constant([1, 1])
s = (10, 7)
st = tf.SparseTensor(indices, values, s)
st_ordered = tf.sparse_reorder(st)
result = tf.sparse_tensor_to_dense(st_ordered)
sess = tf.Session()
sess.run(result)
Here is the output:
array([[0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]], dtype=int32)
I slightly modified your indices so you can see their (row, column) format.
To obtain what you originally asked for, set:
indices = tf.constant([[0, 0], [1, 0],[3, 0], [5, 0]], dtype=tf.int64)
Output:
array([[1, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]], dtype=int32)
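The step from the list x to the (row, column) pairs can be sketched in plain Python. The helper names first_column_indices and dense_from_indices are hypothetical; the second one only mimics, with nested lists, what the sparse-to-dense conversion produces, so the result can be checked without TensorFlow.

```python
def first_column_indices(rows):
    """Turn row numbers into [row, column] pairs that all point at column 0."""
    # Sorting keeps the pairs in row-major order
    return [[r, 0] for r in sorted(rows)]


def dense_from_indices(indices, shape):
    """Build a nested-list mask with 1 at every listed position, 0 elsewhere."""
    n_rows, n_cols = shape
    dense = [[0] * n_cols for _ in range(n_rows)]
    for r, c in indices:
        dense[r][c] = 1
    return dense


x = [0, 1, 3, 5]
indices = first_column_indices(x)
print(indices)  # [[0, 0], [1, 0], [3, 0], [5, 0]]
mask = dense_from_indices(indices, (10, 7))
for row in mask:
    print(row)
```

The indices list has the same form as the one passed to tf.constant(..., dtype=tf.int64) in the answer, so it could presumably be generated this way instead of being written by hand.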

In python solve for a matrix with restrictions

Here's the problem: I have two vectors A (1×n) and B (1×m), where n > m. I'm looking for a matrix T (n×m) such that AT = B. T has the following properties: all elements of T are either 1 or 0, and the elements in each row of T sum to 1. If there is no perfect solution, I would ideally like the program to return the best one, i.e. the T for which as many elements of AT − B as possible are 0.
Here's an example:
import numpy as np
A = np.array([-1.051, 1.069, 0.132, -0.003, -0.001, 0.066, -0.28,
-0.121, 0.075, 0.006, 0.229, -0.018, -0.213, -0.11])
B = np.array([-1.051, 1.201, -0.003, -0.001, 0.066, -0.121, 0.075,
-0.045,-0.231, -0.11])
T = np.array([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]])
# This should equal a vector of 0's
print(A.dot(T) - B)
I've come up with something, but I don't think it's totally satisfactory. I'd prefer not to go this route because it's clunky. I first try a one-to-one mapping, because that solves many of the mappings. Anything left over is handled by iterating over all possibilities, which gets messy quickly. You'll also notice I'm fairly new to numpy. I'd appreciate any feedback on how to improve this. Thanks.
def solver(B, A):
    m = B.size
    n = A.size
    # First pass: look for one-to-one matches between elements of B and A
    start = np.ones((1, n))
    start = np.concatenate((start, np.zeros((m - 1, n))), axis=0).astype(int)
    for i in range(0, m):
        T = np.roll(start, i, axis=0)
        test = B.dot(T) - A
        if i == 0:
            matches = np.absolute(test) < .0001
        else:
            matches = np.vstack((matches, np.absolute(test) < .0001))
    # Collect whatever the one-to-one pass left unmatched
    rA = (A - B.dot(matches))[np.absolute(A - B.dot(matches)) > .0001]
    Amissing = A - B.dot(matches)
    rB = (B - B*np.sum(matches, axis=1))[np.absolute(B - B*np.sum(matches, axis=1)) > .0001]
    Bmissing = B - B*np.sum(matches, axis=1)
    rm = rB.size
    rn = rA.size
    rmxrn = np.arange(rm*rn).reshape(rm, rn)
    dif = np.absolute(rA)
    best = np.zeros(shape=(rm, rn))
    # Second pass: brute force over all 0/1 matrices for the leftovers
    for i in range(0, 2**(rm*rn)):
        arr = (i >> rmxrn) % 2
        if np.amax(np.sum(arr, axis=1)) > 1 or np.sum(arr) > rm:
            continue
        else:
            diftemp = rB.dot(arr) - rA
            besttemp = arr
        if np.sum(np.absolute(diftemp)) < np.sum(np.absolute(dif)):
            dif = diftemp
            best = besttemp
        if np.sum(np.absolute(dif) < .0001) == rn:
            break
    best = best.astype(bool)
    matchesT = matches.T
    bestT = best.T
    newbestT = np.zeros(shape=(m, rn)).astype(bool).T
    # Write the brute-force result back into the full-size match matrix
    for x in range(0, rn):
        counter = 0
        for i, value in enumerate(Bmissing):
            if abs(Bmissing[i]) > .0001:
                newbestT[x, i] = bestT[x, counter]
                counter = counter + 1
    for x in range(0, rn):
        counter = 0
        for i, value in enumerate(Amissing):
            if abs(Amissing[i]) > .0001:
                matchesT[i] = newbestT[counter]
                counter = counter + 1
    return matchesT.T
A = np.array([-1.051, 1.201, -0.003, -0.001, 0.066, -0.121, 0.075, -0.045, -0.231, -0.11])
B = np.array([-1.051, 1.069, 0.132, -0.003, -0.001, 0.066, -0.28, -0.121, 0.075, 0.006, 0.229, -0.018, -0.213, -0.11])
print(solver(B, A))
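Because each row of T contains exactly one 1, T is fully determined by choosing one column of B for every element of A. For very small inputs this can be searched exhaustively, as in the sketch below. It is only an illustration, not a practical solver: the function name best_assignment and the four-element example are hypothetical, the tolerance 1e-4 mirrors the .0001 used above, and the search visits up to m**n assignments, which is far too many for the 14-by-10 example in the question.

```python
from itertools import product


def best_assignment(A, B, tol=1e-4):
    """Try every way of sending each element of A to one column of B.

    Returns the 0/1 matrix T (as nested lists) that makes the most
    elements of A.T - B vanish, and the number of vanishing elements.
    """
    n, m = len(A), len(B)
    best, best_hits = None, -1
    for assign in product(range(m), repeat=n):
        # Column sums of A.T for this assignment
        sums = [0.0] * m
        for value, col in zip(A, assign):
            sums[col] += value
        hits = sum(abs(s - b) < tol for s, b in zip(sums, B))
        if hits > best_hits:
            best, best_hits = assign, hits
            if hits == m:  # perfect solution, stop early
                break
    T = [[1 if j == col else 0 for j in range(m)] for col in best]
    return T, best_hits


# Hypothetical small example: 1 + 2 = 3 and 3 + 4 = 7
A_small = [1.0, 2.0, 3.0, 4.0]
B_small = [3.0, 7.0]
T_small, hits = best_assignment(A_small, B_small)
print(T_small, hits)
```

When several assignments are equally good, this sketch keeps the first one it encounters.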

Finding all legal moves in a simple board game

I have an 8x8 board represented by a numpy.ndarray:
array([[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0]])
# 0 = free space
# 1 = player1's figure
A figure can either move forward and left, forward and right or just forward (forward means down the board in this case).
Right now I am using nested for loops in order to look through the board indexes. When I find a figure I append the board states that can be achieved by making moves with that figure to a list and then keep searching for figures.
For this example the output of my function looks like this:
[array([[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0]],
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0]],
[[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0]]])]
Is there a faster way in which I can find all the possible moves for a given board state?
For both memory and performance, wouldn't it be easier to keep only the positions of the player's figures in memory rather than a complete board? In your example, the figure's location is:
player1 = (1, 4)
Assume a figure's position is denoted by (x, y), where x is the row and y is the column. You can then compute the moves for that figure at runtime (no need to keep them in memory). The possible moves are:
(x+1,y)
(x+1,y+1)
(x+1,y-1)
If a figure can wrap around the board (that is, from the bottommost row its next moves land on the topmost row), take each coordinate modulo the number of rows nr and the number of columns nc. For (x, y), the next moves would then be:
((x+1) % nr,y % nc)
((x+1) % nr,(y+1) % nc)
((x+1) % nr,(y-1) % nc)
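The two variants above can be put into one small function. This is a sketch with assumptions beyond the answer: without wrap-around it discards moves that would leave the board, and it does not check whether a target square is occupied. The name legal_moves and its parameters are hypothetical.

```python
def legal_moves(pos, n_rows=8, n_cols=8, wrap=False):
    """Return the squares reachable from pos = (row, column) in one move."""
    x, y = pos
    # Forward, forward-right, forward-left (forward means down the board)
    candidates = [(x + 1, y), (x + 1, y + 1), (x + 1, y - 1)]
    if wrap:
        # Python's % always returns a non-negative result here, so -1 maps to n_cols - 1
        return [(r % n_rows, c % n_cols) for r, c in candidates]
    # Assumption: without wrap-around, moves off the board are dropped
    return [(r, c) for r, c in candidates if 0 <= r < n_rows and 0 <= c < n_cols]


print(legal_moves((1, 4)))             # [(2, 4), (2, 5), (2, 3)]
print(legal_moves((7, 0)))             # []
print(legal_moves((7, 0), wrap=True))  # [(0, 0), (0, 1), (0, 7)]
```

For the figure at (1, 4), the three results correspond to the three board states listed in the question.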

Drawing a directed graph using a link matrix with networkx

I am working on PageRank for a school project, and I have a matrix in which row i represents the links from site j (line) to site i. (If this is still unclear, I'll explain more.)
The current part is:
Z=[[0,1,1,1,1,0,1,0,0,0,0,0,0,0],[1,0,0,0,1,0,0,0,0,0,0,0,0,0], [1,1,0,0,0,0,0,0,0,0,0,0,0,0],[1,0,1,0,0,0,0,0,0,0,0,0,0,0],[1,0,0,1,0,0,0,0,0,0,0,0,0,0],[1,0,0,0,0,0,0,1,0,1,0,0,0,0],[0,0,0,0,0,1,0,0,0,0,0,0,0,0],[0,0,0,0,0,1,1,0,1,0,0,0,0,0],[0,0,0,0,0,1,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,1,0,1,1,1,1],[0,0,0,0,0,0,0,0,0,1,0,0,0,1],[0,0,0,0,0,0,0,0,0,1,1,0,0,0],[0,0,0,0,0,0,0,0,0,1,0,1,0,0],[0,0,0,0,0,0,0,0,0,1,0,0,1,0]]
A=np.matrix(Z)
G=nx.from_numpy_matrix(A,create_using=nx.MultiDiGraph())
pos=nx.circular_layout(G)
labels={}
for i in range(N):
    labels[i]=i+1
nx.draw_circular(G)
nx.draw_networkx_labels(G,pos,labels,font_size=15)
The problem I have is that the labels are not where they are supposed to be; it seems that networkx is just placing them clockwise...
Also, how could I easily direct the graph, so that a link from j to i is not treated as a link from i to j?
Thanks!
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
Z = [[0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0]]
# from_numpy_array replaces from_numpy_matrix, which was removed in networkx 3.0
G = nx.from_numpy_array(np.array(Z), create_using=nx.MultiDiGraph())
pos = nx.circular_layout(G)
nx.draw_circular(G)
labels = {i : i + 1 for i in G.nodes()}
nx.draw_networkx_labels(G, pos, labels, font_size=15)
plt.show()
The resulting plot appears correct to me. Notice, for example, that the node labeled 1 has directed edges pointing to 2, 3, 4, 5 and 7. This corresponds to the ones on the first row in the array, Z[0]:
[0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
since the first row corresponds to node 1, and the ones in this row occur in the columns corresponding to nodes 2, 3, 4, 5 and 7.
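The reading above treats row i as the source of each edge, whereas the question describes row i as holding the links that point to site i. The sketch below shows how the edge direction changes between the two conventions. The function name edges_from_link_matrix, its row_is_target flag, and the 3-by-3 matrix are hypothetical, chosen only to keep the output short.

```python
def edges_from_link_matrix(Z, row_is_target=True):
    """List directed edges (source, target) with 1-based site numbers.

    row_is_target=True:  Z[i][j] == 1 means a link from site j to site i.
    row_is_target=False: Z[i][j] == 1 means a link from site i to site j.
    """
    edges = []
    for i, row in enumerate(Z):
        for j, value in enumerate(row):
            if value:
                edges.append((j + 1, i + 1) if row_is_target else (i + 1, j + 1))
    return edges


# Hypothetical 3-site example
Z_small = [[0, 1, 1],
           [1, 0, 0],
           [0, 1, 0]]
print(edges_from_link_matrix(Z_small))
# [(2, 1), (3, 1), (1, 2), (2, 3)]
print(edges_from_link_matrix(Z_small, row_is_target=False))
# [(1, 2), (1, 3), (2, 1), (3, 2)]
```

Either list could be added to an nx.DiGraph with add_edges_from; transposing the array before building the graph should have the same effect as switching the convention.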
