Find all 100% event combinations in correlation matrix - python

I have a correlation matrix which correlates a list of events with each other.
So, if the entry for events A and B is 1, then every time A happens B also happens, and so on: the typical correlation behaviour.
And if an event G also correlates at 1 with A and B, all three events always happen at the same time.
I have been trying for some time to figure out how to get a list containing all the event combinations in which every event correlates at 1 with every other event in the combination.
How do I get a list of all 100% dependent event combinations?
I managed to get the coordinates of all entries equal to 1 in the matrix and tried for some time to compare them with the positions that would be expected in the case of 100% dependence, but I have not quite figured it out yet and am wondering whether there is an easier way.
I got the correlation matrix via pandas .corr(), which yields a DataFrame, which I then transformed into a NumPy array.
A subset of such a correlation matrix looks, for example, like this:
import numpy as np

matrix = np.array([
    [1, -0.5, -0.5, 1, 1, -0.5, -0.5, -0.5, -0.5],
    [-0.5, 1, 1, -0.5, -0.5, 1, 1, 1, 1],
    [-0.5, 1, 1, -0.5, -0.5, 1, 1, 1, 1],
    [1, -0.5, -0.5, 1, 1, -0.5, -0.5, -0.5, -0.5],
    [1, -0.5, -0.5, 1, 1, -0.5, -0.5, -0.5, -0.5],
    [-0.5, 1, 1, -0.5, -0.5, 1, 1, 1, 1],
    [-0.5, 1, 1, -0.5, -0.5, 1, 1, 1, 1],
    [-0.5, 1, 1, -0.5, -0.5, 1, 1, 1, 1],
    [-0.5, 1, 1, -0.5, -0.5, 1, 1, 1, 1],
])
As seen in this example, among the events with indices 0 to 8, the events at indices 1, 2, 5, 6, 7 and 8 all correlate with each other at 1 (100%), and so do indices 0, 3 and 4.
As output I would like some sort of list, like:
result = [
    [1, 2, 5, 6, 7, 8],
    [0, 3, 4],
]
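One possible way to get there (a minimal sketch, not an answer from the thread): treat every off-diagonal entry equal to 1 as a link between two events and collect the connected groups with a small hand-written union-find:
import numpy as np

# matrix is the correlation matrix from above
n = matrix.shape[0]
parent = list(range(n))

def find(i):
    # follow parent pointers to the representative of i's group
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

for i in range(n):
    for j in range(i + 1, n):
        if np.isclose(matrix[i, j], 1.0):
            parent[find(i)] = find(j)   # merge the two groups

groups = {}
for i in range(n):
    groups.setdefault(find(i), []).append(i)

result = [g for g in groups.values() if len(g) > 1]
print(result)   # [[0, 3, 4], [1, 2, 5, 6, 7, 8]] for the example above
Note that this collects connected components, which only coincides with "every event correlates with every other event" when the 1-entries really form complete blocks, as they do in the example matrix.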

Related

how to use pandas list to solve linear equation using numpy

I was hoping to find out how to take two pandas lists and solve for x.
list_a = [-1, 1, 1, -1, 1, 1, 1, -1, 0, 1]
list_b = [0.0, 0.0, 1.75, -1.125, 1.0, 0.5, 0.0, -1.25, 1.375, -0.125]
For each entry in the lists I would like to compute the following:
x + list_b = list_a
which would then return a list of 10 elements with the result for x.
Any help is appreciated!
If you convert list_a and list_b to numpy.array, you can just solve for x using subtraction, which NumPy will perform elementwise:
>>> import numpy as np
>>> list_a = np.array([-1, 1, 1, -1, 1, 1, 1, -1, 0, 1])
>>> list_b = np.array([0.0, 0.0, 1.75, -1.125, 1.0, 0.5, 0.0, -1.25, 1.375, -0.125])
>>> x = list_a - list_b
>>> x
array([-1.   ,  1.   , -0.75 ,  0.125,  0.   ,  0.5  ,  1.   ,  0.25 , -1.375,  1.125])

Add a list to combinations

I have a list, A, containing 8 values. I'd like to make combinations first, then add the two points in list B to each combination. Here is my code:
import itertools

def combination(arr, r):
    return list(itertools.combinations(arr, r))

A = [[0.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.5, 0.0, 0.5], [0.5, 0.5, 0.5], [0.0, 0.25, 0.0], [0.0, 0.7499999819999985, 0.0], [0.5, 0.25, 0.5], [0.5, 0.7499999819999985, 0.5]]
B = [[0.4950293947410103, 0.5021785267638279, 0.4935703740156043], [1, 1, 1]]
n = 1  # can be changed
com = combination(A, n)
for item in com:
    item.extend(B)
    print(item)
But I got an error:
AttributeError: 'tuple' object has no attribute 'extend'
Expected results:
[[0.0, 0.0, 0.0], [0.4950293947410103, 0.5021785267638279, 0.4935703740156043], [1, 1, 1]]
[[0.0, 0.5, 0.0], [0.4950293947410103, 0.5021785267638279, 0.4935703740156043], [1, 1, 1]]
[[0.5, 0.0, 0.5], [0.4950293947410103, 0.5021785267638279, 0.4935703740156043], [1, 1, 1]]
[[0.5, 0.5, 0.5], [0.4950293947410103, 0.5021785267638279, 0.4935703740156043], [1, 1, 1]]
[[0.0, 0.25, 0.0], [0.4950293947410103, 0.5021785267638279, 0.4935703740156043], [1, 1, 1]]
[[0.0, 0.7499999819999985, 0.0], [0.4950293947410103, 0.5021785267638279, 0.4935703740156043], [1, 1, 1]]
[[0.5, 0.25, 0.5], [0.4950293947410103, 0.5021785267638279, 0.4935703740156043], [1, 1, 1]]
[[0.5, 0.7499999819999985, 0.5], [0.4950293947410103, 0.5021785267638279, 0.4935703740156043], [1, 1, 1]]
The tuple type (returned by combinations) is immutable; you can build a list instead and populate it with the item and B:
com = combination(A, n)
com = [[*item, *B] for item in com]
Or return a list of lists from your combination function:
def combination(arr, r):
    return [list(c) for c in itertools.combinations(arr, r)]
# ...
for item in com:
    item.extend(B)
You can also use the map function to get a list of lists.
The map() function returns a map object (which is an iterator) of the results of applying the given function to each item of a given iterable (in your case a list).
Instead of:
com = combination(A, n)
use:
com = map(list, combination(A, n))
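With that change the original loop then works unchanged, since each item is a list rather than a tuple (a small usage sketch using the question's A, B and combination):
com = map(list, combination(A, n))
for item in com:
    item.extend(B)   # works now: item is a list, not a tuple
    print(item)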

tf.metrics.mean_iou returns weird values when fed with a tf.constant

I am trying to understand why tf.metrics.mean_iou returns different values in every trial. If the input is the same in every loop iteration, I would expect the output to be the same as well. I expected miou = 0.061, but the following code sometimes returns unexpected values even though the inputs in every loop are the same. Could someone help me please?
import numpy as np
import tensorflow as tf

"""
iou = TP / (TP + FP + FN)
"""
num_iteration = 10
num_classes = 3
l = np.array([[0, 1, 2, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 1],
              [0, 2, 2, 0]])
tf_label = tf.constant(l, dtype=tf.int32)
p0 = np.array([[0.0, 0.1, 0.7, 0.3],
               [0.6, 0.3, 0.4, 0.9],
               [0.3, 0.6, 0.3, 0.1],
               [0.1, 0.2, 0.3, 0.4]])
p1 = np.array([[0.6, 0.1, 0.1, 0.3],
               [0.6, 0.3, 0.4, 0.9],
               [0.3, 0.6, 0.3, 0.1],
               [0.1, 0.5, 0.5, 0.4]])
p2 = 1 - p0 - p1
p = np.stack((p0, p1, p2), axis=2)
tf_logit = tf.constant(p, dtype=tf.float32)
tf_prediction = tf.argmax(tf_logit, axis=2)
miou = tf.metrics.mean_iou(labels=tf_label, predictions=tf_prediction, num_classes=num_classes)
sv = tf.train.Supervisor(logdir=None, summary_op=None)
with sv.managed_session() as sess:
    for i in range(num_iteration):
        label, prediction = sess.run([tf_label, tf_prediction])
        mean_iou, update_op = sess.run(miou)
        print(mean_iou)
The above code returns different values across trials.
There seems to be a bug in tensorflow when using tf.constants with running metrics; see this post.
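Independent of that, a pattern that often behaves more predictably with TF1 running metrics is to run the update op on its own and only then read the metric value, rather than fetching both in the same sess.run call. A sketch based on the snippet above (not part of the original answer):
miou, update_op = tf.metrics.mean_iou(labels=tf_label,
                                      predictions=tf_prediction,
                                      num_classes=num_classes)
sv = tf.train.Supervisor(logdir=None, summary_op=None)
with sv.managed_session() as sess:
    for i in range(num_iteration):
        sess.run(update_op)    # accumulate the running confusion matrix first
        print(sess.run(miou))  # then read the metric value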

python numpy: evaluate list of probabilities to randomly generated binary values

I have an array of probabilities, and I want to use those probabilities to generate an array of picked values, each value picked with the corresponding probability.
Example:
in: [ 0.5, 0.5, 0.5, 0.5, 0.01, 0.01, 0.99, 0.99]
out: [ 0, 1, 1, 0, 0, 0, 1, 1]
I'd like to use numpy native functions for this, rather than the following loop:
array_of_probs = [0.5, 0.5, 0.5, 0.5, 0.01, 0.01, 0.99, 0.99]
results = np.zeros(len(array_of_probs))
for i, probs in enumerate(array_of_probs):
    results[i] = np.random.choice([1, 0], 1, p=[probs, 1-probs])
You can easily calculate this by comparing the array with a randomly generated array, since the probability that a uniform random number between 0 and 1 is smaller than 0.3 is exactly 0.3.
E.g.
np.random.rand(len(odds)) < odds
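Applied to the question's variable names and cast to integers, that looks roughly like this (each entry comes out as 1 with the corresponding probability):
import numpy as np

array_of_probs = np.array([0.5, 0.5, 0.5, 0.5, 0.01, 0.01, 0.99, 0.99])
results = (np.random.rand(len(array_of_probs)) < array_of_probs).astype(int)
print(results)   # e.g. [1 0 1 1 0 0 1 1]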

Python, issue with ties using argsort

I have the following problem with sorting a 2D array using the function argsort.
More precisely, let's assume I have 5 points and have calculated the Euclidean distances between them, which are stored in the 2D array D:
import numpy as np

D = np.array([[0, 0.3, 0.4, 0.2, 0.5],
              [0.3, 0, 0.2, 0.6, 0.1],
              [0.4, 0.2, 0, 0.5, 0],
              [0.2, 0.6, 0.5, 0, 0.7],
              [0.5, 0.1, 0, 0.7, 0]])
D
array([[ 0. ,  0.3,  0.4,  0.2,  0.5],
       [ 0.3,  0. ,  0.2,  0.6,  0.1],
       [ 0.4,  0.2,  0. ,  0.5,  0. ],
       [ 0.2,  0.6,  0.5,  0. ,  0.7],
       [ 0.5,  0.1,  0. ,  0.7,  0. ]])
Each element D[i,j] (i,j=0,...,4) shows the distance between point i and point j. The diagonal entries are of course equal to zero, as they show the distance of a point to itself. However, two or more points can overlap. For instance, in this particular case, point 4 is located in the same position as point 2, so that the distances D[2,4] and D[4,2] are equal to zero.
Now I want to sort this array D: for each point i I want to know the indices of its neighbouring points, from the closest to the furthest one. Of course, for a given point i the first point/index in the sorted array should be i itself, i.e. the closest point to point i is i. I used the function argsort:
N = np.argsort(D)
N
array([[0, 3, 1, 2, 4],
[1, 4, 2, 0, 3],
[2, 4, 1, 0, 3],
[3, 0, 2, 1, 4],
[2, 4, 1, 0, 3]])
This function sorts the distances properly until it gets to point 4: the first entry of the 4th row (counting from zero) is not 4 (D[4,4]=0) as I would like. I would like the 4th row to be [4, 2, 1, 0, 3]. The first entry is 2 because points 2 and 4 overlap, so that D[2,4]=D[4,2]=0, and between entries with the same value argsort always selects the first one.
Is there a way to fix this so that the sorted array N[i,j] of D[i,j] always starts with the indices corresponding to the diagonal entries D[i,i]=0?
Thank you for your help,
MarcoC
One way would be to fill the diagonal elements with something smaller than the global minimum and then use argsort -
In [286]: np.fill_diagonal(D,D.min()-1) # Or use -1 for filling
# if we know beforehand that the global minimum is 0
In [287]: np.argsort(D)
Out[287]:
array([[0, 3, 1, 2, 4],
[1, 4, 2, 0, 3],
[2, 4, 1, 0, 3],
[3, 0, 2, 1, 4],
[4, 2, 1, 0, 3]])
If you don't want the input array to be changed, make a copy and then do the diagonal filling.
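For example (a small sketch of that copy-first variant):
D_work = D.copy()                            # leave the original distances untouched
np.fill_diagonal(D_work, D_work.min() - 1)
N = np.argsort(D_work)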
How about this:
import numpy as np

D = np.array([[0. , 0.3, 0.4, 0.2, 0.5],
              [0.3, 0. , 0.2, 0.6, 0.1],
              [0.4, 0.2, 0. , 0.5, 0. ],
              [0.2, 0.6, 0.5, 0. , 0.7],
              [0.5, 0.1, 0. , 0.7, 0. ]])
s = np.argsort(D)
line = np.argwhere(s[:, 0] != np.arange(D.shape[0]))[0, 0]
column = np.argwhere(s[line, :] == line)[0, 0]
s[line, 0], s[line, column] = s[line, column], s[line, 0]
Just find the lines that don't have the diagonal element in front using numpy.argwhere, then find the column to swap, and swap the elements. Then s contains what you want in the end.
This works for your example. In the general case, where numpy.argwhere can return several elements, one would have to run a loop over those elements instead of just taking [0,0] at the end of the two lines of code above.
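A sketch of that generalisation, looping over every row whose first entry is not its own index (assuming the same D and s as above):
s = np.argsort(D)
for line in np.argwhere(s[:, 0] != np.arange(D.shape[0])).ravel():
    column = np.argwhere(s[line, :] == line)[0, 0]
    s[line, 0], s[line, column] = s[line, column], s[line, 0]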
Hope I could help.
