Getting non-singleton cluster ids in scipy hierarchical clustering [duplicate] - python

According to this we can get labels for non-singleton clusters.
I tried this with a simple example.
import numpy as np
import scipy.cluster.hierarchy
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
mat = np.array([[ 0. , 1. , 3. ,0. ,2. ,3. ,1.],
[ 1. , 0. , 3. , 1., 1. , 2. , 2.],
[ 3., 3. , 0., 3. , 3., 3. , 4.],
[ 0. , 1. , 3., 0. , 2. , 3., 1.],
[ 2. , 1., 3. , 2., 0. , 1., 3.],
[ 3. , 2., 3. , 3. , 1. , 0. , 3.],
[ 1. , 2., 4. , 1. , 3., 3. , 0.]])
def llf(id):
    if id < n:
        return str(id)
    else:
        return '[%d %d %1.2f]' % (id, count, R[n-id,3])
linkage_matrix = linkage(mat, "complete")
dendrogram(linkage_matrix,
p=4,
leaf_label_func=llf,
color_threshold=1,
truncate_mode='lastp',
distance_sort='ascending')
plt.show()
What are n and count here? In a diagram like the one this produces, I need to know which observations are listed under (3) and (2).

I think the documentation is not very clear on this point, and its sample code does not even run. But it is clear that the label 1 means the 2nd observation (labels are 0-based indices) and (3) means there are 3 observations in that node.
If your question is which 3 observations are in the 2nd node, you can find out like this:
In [51]:
D4=dendrogram(linkage_matrix,
color_threshold=1,
p=4,
truncate_mode='lastp',
distance_sort='ascending')
D7=dendrogram(linkage_matrix,
color_list=['g',]*7,
p=7,
truncate_mode='lastp',
distance_sort='ascending', no_plot=True)
from itertools import groupby
[list(group) for key, group in groupby(D7['ivl'],lambda x: x in D4['ivl'])]
Out[51]:
[['1'], ['6', '0', '3'], ['2'], ['4', '5']]
The 2nd node contains the 7th, 1st and 4th observations (indices 6, 0 and 3), and the 4th node contains the 5th and 6th observations (indices 4 and 5).
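The groupby trick works here because the two merged nodes happen to be separated by singleton leaves; two adjacent merged nodes would end up in the same group. A more direct route, sketched below assuming a reasonably recent SciPy, uses the fact that row k of the linkage matrix Z describes the cluster with id n + k (n = number of observations), with the cluster size in column 3 and the merge distance in column 2. So in the docs' llf, n is presumably the number of observations and count the cluster size, read with index id - n. The dict returned by dendrogram also has a 'leaves' entry holding the node ids shown in the plot (ids >= n are merged clusters), and to_tree expands each id into its observations. Note that linkage treats the rows of mat as observation vectors (pass squareform(mat) to treat it as distances instead), and this data contains tied distances, so the exact grouping may differ between SciPy versions.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage, to_tree

mat = np.array([[0., 1., 3., 0., 2., 3., 1.],
                [1., 0., 3., 1., 1., 2., 2.],
                [3., 3., 0., 3., 3., 3., 4.],
                [0., 1., 3., 0., 2., 3., 1.],
                [2., 1., 3., 2., 0., 1., 3.],
                [3., 2., 3., 3., 1., 0., 3.],
                [1., 2., 4., 1., 3., 3., 0.]])

Z = linkage(mat, "complete")   # rows of mat are treated as observation vectors
n = Z.shape[0] + 1             # number of original observations

def llf(node_id):
    """Singletons: their index; merged nodes: [id, size, merge distance]."""
    if node_id < n:
        return str(node_id)
    row = Z[node_id - n]       # row k of Z describes the cluster with id n + k
    return '[%d %d %1.2f]' % (node_id, int(row[3]), row[2])

d = dendrogram(Z, p=4, truncate_mode='lastp', distance_sort='ascending',
               leaf_label_func=llf, no_plot=True)

# d['leaves'] lists the node ids shown in the plot, left to right;
# to_tree lets us expand each of them into its original observations.
_, nodes = to_tree(Z, rd=True)
members = [nodes[j].pre_order() for j in d['leaves']]

for label, obs in zip(d['ivl'], members):
    print(label, obs)
```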

Related

How to create a specific upper triangular matrix?

I would like to create in python (using numpy) an upper triangular matrix in the form:
[[ 1, c, c^2],
 [ 0, 1, c  ],
 [ 0, 0, 1  ]]
where c is a rational number and the size of the (square) matrix may vary (2, 3, 4, ...). Is there any smart way to do it other than creating rows and stacking them?
import numpy as np
r = 3
c = 3
i,j = np.indices((r,r))
np.triu(float(c)**(j-i))
Result:
array([[1., 3., 9.],
[0., 1., 3.],
[0., 0., 1.]])
There are probably more straightforward solutions but this is what I came up with:
import numpy as np
c=5
m=np.triu(c**np.triu(np.ones((3,3)), 1).cumsum(axis =1))
print(m)
output:
[[ 1. 5. 25.]
[ 0. 1. 5.]
[ 0. 0. 1.]]
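Both answers hard-code the size 3. A possible generalization of the first one to any size n is sketched below; upper_power_matrix is a hypothetical helper name. Clamping the exponent at 0 before taking the power means no negative power is ever evaluated, so c = 0 works without a divide-by-zero warning, and np.triu then zeroes everything below the diagonal. Since the result is a float array, a rational c such as 1/3 is stored as a rounded float.

```python
import numpy as np

def upper_power_matrix(c, n):
    """n x n upper triangular matrix with entry (i, j) = c**(j - i) for j >= i, 0 below."""
    i, j = np.indices((n, n))
    # Clamp negative exponents (below the diagonal) to 0, then zero that part with triu.
    return np.triu(float(c) ** np.maximum(j - i, 0))

print(upper_power_matrix(3, 3))
print(upper_power_matrix(0.5, 4))
```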

How to put multiple matrices together into a single matrix?

I need to put multiple matrices together into a single matrix, like so:
I have the values for the matrix, but I can't get it to appear the way it does in the image; instead, my values end up stacked on top of each other in an array. How can I get my matrices to look like the image above?
My code:
import numpy as np
w_estimate = [0.656540, 7.192304, 2.749036]
F = [np.identity(3) * -w_estimate[1:4], -np.identity(3)], [np.identity(3)*0, np.identity(3)*0]
It's supposed to look like:
F = [[np.identity(3) * -w_estimate[1:4], -np.identity(3)],
     [np.identity(3) * 0, np.identity(3) * 0]]
but instead it looks like:
[[np.identity(3) * -w_estimate[1:4]],
[-np.identity(3)],
[np.identity(3) * 0],
[np.identity(3) * 0]]
Help is very much appreciated.
The first correction to your code concerns -w_estimate[1:4].
Since w_estimate is a plain Python list, you cannot apply the
unary minus operator to it.
You can, however, apply the minus operator to a NumPy array.
Another correction is to avoid -0 in the result (negating np.identity(3),
or multiplying it by a negative number, produces -0. off the diagonal).
To get an array whose diagonal is filled from some other array
and whose other elements are all zero, you can use np.fill_diagonal, which
fills in place the diagonal of an array created earlier
(using np.zeros).
So, to construct the 2 "upper" blocks of your result, you can write:
a1 = np.zeros((3,3))
a2 = a1.copy()
np.fill_diagonal(a1, -np.array(w_estimate)[1:4])
np.fill_diagonal(a2, -1)
Note that -np.array(w_estimate)[1:4] returns the last 2 elements of
w_estimate, negated, i.e. [-7.192304, -2.749036]. Since the target array
is 3 by 3, np.fill_diagonal repeats the source sequence (in this case, for the
last diagonal element only).
If your intention is different, change -np.array(w_estimate)[1:4]
accordingly.
And to construct the whole intended array, run:
F = np.vstack((np.hstack((a1, a2)), np.zeros((3,6))))
The result is:
array([[-7.192304, 0. , 0. , -1. , 0. , 0. ],
[ 0. , -2.749036, 0. , 0. , -1. , 0. ],
[ 0. , 0. , -7.192304, 0. , 0. , -1. ],
[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. ]])
You should definitely take a look at numpy.block.
>>> A = np.eye(2) * 2
>>> B = np.eye(3) * 3
>>> np.block([
... [A, np.zeros((2, 3))],
... [np.ones((3, 2)), B ]
... ])
array([[2., 0., 0., 0., 0.],
[0., 2., 0., 0., 0.],
[1., 1., 3., 0., 0.],
[1., 1., 0., 3., 0.],
[1., 1., 0., 0., 3.]])
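Applied to the question's data, np.block can replace the vstack/hstack combination from the first answer. The sketch below keeps that answer's assumption about the diagonal (the 2 values of -w_estimate[1:4], repeated by np.fill_diagonal) and builds the -1 block with np.diag, which writes +0.0 off the diagonal, so no -0 entries appear.

```python
import numpy as np

w_estimate = [0.656540, 7.192304, 2.749036]

# Upper-left block: diagonal filled (with repetition) from -w_estimate[1:4]
a1 = np.zeros((3, 3))
np.fill_diagonal(a1, -np.array(w_estimate)[1:4])

# Upper-right block: -1 on the diagonal, +0.0 elsewhere (unlike -np.identity(3))
a2 = np.diag(np.full(3, -1.0))

zeros = np.zeros((3, 3))
F = np.block([[a1, a2],
              [zeros, zeros]])
print(F)
```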

specific kind of slicing over tensor object in tensorflow

Summary of the question: is this kind of slicing followed by assignment supported in tensorflow?
out[tf_a2[y],x[:,None]] = tf_a1[tf_a2[y],x[:,None]]
final = out[:-1]
Let me give an example. I have a tensor like this:
tf_a1 = tf.Variable([ [9.968594, 8.655439, 0., 0. ],
[0., 8.3356, 0., 8.8974 ],
[0., 0., 6.103182, 7.330564 ],
[6.609862, 0., 3.0614321, 0. ],
[9.497023, 0., 3.8914037, 0. ],
[0., 8.457685, 8.602337, 0. ],
[0., 0., 5.826657, 8.283971 ],
[0., 0., 0., 0. ]])
and I have this one:
tf_a2 = tf.constant([[1, 2, 5],
[1, 4, 6],
[0, 7, 7],
[2, 3, 6],
[2, 4, 7]])
Now I want to keep only those elements of tf_a1 for which a combination of n of their row indices (here n = 2) appears in a row of tf_a2. What does that mean?
For example, in the first column of tf_a1, the rows with non-zero values are (0, 3, 4). Is there any row in tf_a2 that contains one of the pairs (0, 3), (0, 4) or (3, 4)? There is no such row, so all the elements in that column become zero.
For the second column of tf_a1, the pairs are (0, 1), (0, 5) and (1, 5). As you can see, the pair (1, 5) appears in the first row of tf_a2. That's why we keep those elements of tf_a1.
This is the correct numpy code (a1p and a2 are numpy arrays holding the same values as tf_a1 and tf_a2):
n = 2
y,x = np.where(np.count_nonzero(a1p[a2], axis=1) >= n)
out = np.zeros_like(a1p)
out[a2[y],x[:,None]] = a1p[a2[y],x[:,None]]
final = out[:-1]
This is the expected output of this numpy code (but I need this in tensorflow):
[[0. 0. 0. 0. ]
[0. 8.3356 0. 8.8974 ]
[0. 0. 6.103182 7.330564 ]
[0. 0. 3.0614321 0. ]
[0. 0. 3.8914037 0. ]
[0. 8.457685 8.602337 0. ]
[0. 0. 5.826657 8.283971 ]]
The tensorflow code should be something like this:
y, x = tf.where(tf.count_nonzero(tf.gather(tf_a1, tf_a2, axis=0), axis=1) >= n)
out = tf.zeros_like(tf_a1)
out[tf_a2[y],x[:,None]] = tf_a1[tf_a2[y],x[:,None]]
final = out[:-1]
The part tf.gather(tf_a1, tf_a2, axis=0) does the numpy-like slicing tf_a1[tf_a2].
Update 1
The only lines that do not work are:
out[tf_a2[y],x[:,None]] = tf_a1[tf_a2[y],x[:,None]]
final = out[:-1]
Any idea how I can accomplish this in tensorflow? Is this kind of slicing supported on tensor objects at all?
Any help is appreciated:)
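TensorFlow tensors are immutable, so the item assignment has no direct equivalent. One way around it, sketched here assuming TensorFlow 2.x with eager execution (the question's code uses the 1.x-era tf.count_nonzero, available in 2.x as tf.math.count_nonzero), is to build the (row, column) pairs that the numpy fancy index out[tf_a2[y], x[:, None]] would touch and write the corresponding values of tf_a1 into a zero tensor with tf.tensor_scatter_nd_update. Some pairs repeat, but every repeat carries the same value, so the undefined update order does not matter.

```python
import numpy as np
import tensorflow as tf

tf_a1 = tf.constant([[9.968594, 8.655439, 0., 0.],
                     [0., 8.3356, 0., 8.8974],
                     [0., 0., 6.103182, 7.330564],
                     [6.609862, 0., 3.0614321, 0.],
                     [9.497023, 0., 3.8914037, 0.],
                     [0., 8.457685, 8.602337, 0.],
                     [0., 0., 5.826657, 8.283971],
                     [0., 0., 0., 0.]])
tf_a2 = tf.constant([[1, 2, 5],
                     [1, 4, 6],
                     [0, 7, 7],
                     [2, 3, 6],
                     [2, 4, 7]])
n = 2

# counts[r, c]: non-zeros in column c among the rows listed in tf_a2[r]
counts = tf.math.count_nonzero(tf.gather(tf_a1, tf_a2, axis=0), axis=1)  # (5, 4)
yx = tf.where(counts >= n)                        # (k, 2) int64 coordinates
y = yx[:, 0]
x = tf.cast(yx[:, 1], tf_a2.dtype)

# Same (row, col) pairs as numpy's (tf_a2[y], x[:, None]) index
rows = tf.gather(tf_a2, y)                        # (k, 3)
cols = tf.broadcast_to(x[:, None], tf.shape(rows))  # (k, 3)
indices = tf.reshape(tf.stack([rows, cols], axis=-1), [-1, 2])  # (3k, 2)

# Scatter the selected values of tf_a1 into a zero tensor
out = tf.tensor_scatter_nd_update(tf.zeros_like(tf_a1), indices,
                                  tf.gather_nd(tf_a1, indices))
final = out[:-1]
print(final.numpy())
```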

Change values of a tensorflow tensor based on condition

I'm trying to recreate a numpy code snippet I wrote in tensorflow, but I'm struggling to find the correct/best tensorflow operations.
Consider the following numpy solution:
import numpy as np
# Initialize a random numpy array:
my_dummy = np.random.random((6, 2, 2, 10))
print(my_dummy)
> [[[[0.6715164 0.58915908 0.36607568 0.73404715 0.69455375 0.52177771
0.91810873 0.85010461 0.37485212 0.35634401]
[0.55885052 0.13041019 0.89774818 0.3363019 0.66634638 0.32054576
0.46174629 0.59975141 0.02283781 0.02997967]]
....
]]]]
# Create random floats, based on channel 0 of my dummy:
random_floats = np.random.random(my_dummy.shape[0])
print(random_floats)
> [0.89351759 0.76734892 0.36810602 0.08513434 0.65511941 0.61297472]
# Create a mask with ones and a shape based on my_dummy:
my_mask = np.ones((my_dummy.shape[0], 1, 1, my_dummy.shape[-1]))
print(my_mask)
> [[[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]]
[[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]]
[[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]]
[[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]]
[[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]]
[[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]]]
# Initialize a rate parameter:
my_rate = 0.5
# Based on my_rate, change the array accordingly:
my_mask[my_rate > random_floats] = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
print(my_mask)
[[[[1. 0. 1. 0. 1. 0. 1. 0. 1. 0.]]]
[[[1. 0. 1. 0. 1. 0. 1. 0. 1. 0.]]]
[[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]]
[[[1. 0. 1. 0. 1. 0. 1. 0. 1. 0.]]]
[[[1. 0. 1. 0. 1. 0. 1. 0. 1. 0.]]]
[[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]]]
# Multiply my_dummy with the new mask:
np.multiply(my_dummy, my_mask)
array([[[[0.6715164 , 0.58915908, 0.36607568, 0.73404715, 0.69455375,
0.52177771, 0.91810873, 0.85010461, 0.37485212, 0.35634401],
[0.55885052, 0.13041019, 0.89774818, 0.3363019 , 0.66634638,
0.32054576, 0.46174629, 0.59975141, 0.02283781, 0.02997967]],
[[0.22358676, 0.74959561, 0.11109368, 0.56021714, 0.2767754 ,
0.55156506, 0.15488703, 0.25738564, 0.18588607, 0.57593545],
[0.15804289, 0.87858207, 0.12890992, 0.78828551, 0.52467083,
0.45117698, 0.2605117 , 0.46659721, 0.855278 , 0.29630581]]],
[[[0.381445 , 0. , 0.48308211, 0. , 0.5136352 ,
0. , 0.84428703, 0. , 0.20532641, 0. ],
[0.696645 , 0. , 0.84184568, 0. , 0.01369105,
0. , 0.27683334, 0. , 0.59356542, 0. ]],
[[0.5281193 , 0. , 0.82336821, 0. , 0.63435181,
0. , 0.12824084, 0. , 0.35045286, 0. ],
[0.02205884, 0. , 0.22927706, 0. , 0.45538199,
0. , 0.81220918, 0. , 0.46427429, 0. ]]],
.....
]]]])
In tensorflow, I did this (warning: many imports; I tried a lot of things and am no longer sure whether all of them are necessary, I just want to make sure you can reproduce it immediately):
from keras.engine.base_layer import InputSpec
from tensorflow.python.util import deprecation
from tensorflow.python.framework import ops
from tensorflow.python.eager import context
from tensorflow.python.framework import tensor_shape
from tensorflow.python.framework import tensor_util
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import random_ops
from tensorflow.python.ops import math_ops
from tensorflow.python.platform import tf_logging as logging
import numbers
import numpy as np
import tensorflow as tf
from tensorflow.python.framework import ops
from tensorflow.python.ops import random_ops
from tensorflow.python.ops import math_ops
from keras import backend as K
# Create my_dummy and convert to tensor object:
my_dummy = np.random.random((6, 2, 2, 4))
my_dummy = ops.convert_to_tensor(my_dummy)
my_dummy.get_shape()
> TensorShape([Dimension(6), Dimension(2), Dimension(2), Dimension(4)])
# Create random floats, like before and inspect tensor with Keras (instead of running a tf session):
random_floats = random_ops.random_uniform([my_dummy.get_shape().as_list()[0]], dtype=my_dummy.dtype)
K.eval(random_floats)
> array([0.74018297, 0.76996447, 0.52047441, 0.28215968, 0.91457724,
0.64637448])
# As before, create a mask of ones with a shape (almost completely) based on my_dummy:
my_mask = tf.ones([my_dummy.get_shape()[0], 1, 1, my_dummy.get_shape()[-1]], dtype=my_dummy.dtype)
K.eval(my_mask)
> array([[[[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]]],
[[[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]]],
[[[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]]],
[[[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]]],
[[[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]]],
[[[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]]]])
Unfortunately, this is where I'm stuck. I did not find a way to alter the entries of the my_mask Tensor object based on a rate value. One thing I tried was tf.where:
tf.where(my_rate > random_floats, my_mask, tf.constant([1, 0, 1, 0, 1, 0, 1, 0, 1, 0], dtype = my_dummy.dtype))
but get the error:
ValueError: Shapes must be equal rank, but are 4 and 1 for 'Select_1' (op: 'Select') with input shapes: [6], [6,1,1,10], [10].
Thankful for any advice/help :)
It is more or less the same in tensorflow. Here it is with smaller data for convenience:
import tensorflow as tf
value_to_assign = tf.constant([[1., 0., 1., 0., 1.]])
rate = tf.constant(.5)
dummy = tf.random_normal(shape=(4, 1, 1, 5))
# random_floats = tf.random_normal(shape=(tf.shape(dummy)[0], ))
random_floats = tf.constant([0.4, 0.6, .7, .2]) # <--using const values to illustrate
init_val = tf.ones((tf.shape(dummy)[0], 1, 1, tf.shape(dummy)[-1]))
mask = tf.Variable(init_val,
trainable=False)
indices = tf.where(tf.equal(True, rate > random_floats))
tiled = tf.tile(value_to_assign,
multiples=[tf.shape(indices)[0], 1])[:, tf.newaxis, tf.newaxis, :]
mask = tf.scatter_nd_update(mask,
indices=indices,
updates=tiled)
res = mask * dummy
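with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # dummy is re-sampled on every sess.run call, so the DUMMY and RESULT
    # printouts below come from different random draws
    print('MASK')
    print(sess.run(mask))
    print('DUMMY')
    print(sess.run(dummy))
    print('RESULT')
    print(sess.run(res))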
MASK
[[[[1. 0. 1. 0. 1.]]]
[[[1. 1. 1. 1. 1.]]]
[[[1. 1. 1. 1. 1.]]]
[[[1. 0. 1. 0. 1.]]]]
DUMMY
[[[[-1.2031308 -1.6657363 -1.5552464 0.8540495 0.37618718]]]
[[[-0.4468031 0.46417323 -0.3764856 1.1906835 -1.4670093 ]]]
[[[ 1.2066191 -1.4767337 -0.9487017 -0.49180242 -0.33098853]]]
[[[-0.1621628 0.61168176 0.10006899 0.7585997 -0.23903783]]]]
RESULT
[[[[ 1.7753109 0. -0.5451439 -0. -0.53782284]]]
[[[ 0.08024058 -1.8178499 1.183356 1.0895957 -0.9272436 ]]]
[[[-0.5266396 -2.0316153 -1.0043124 -1.1657876 0.6106227 ]]]
[[[-0.46503183 0. 0.01983969 -0. 0.58563703]]]]
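In TensorFlow 2.x, tf.where broadcasts its condition and both branches, so the question's original tf.where idea works once the condition is reshaped to (6, 1, 1, 1); the shape error above comes from the TF 1.x tf.where, which requires matching shapes. A sketch assuming TensorFlow 2.x with eager execution; instead of random values, random_floats is fixed to the values printed in the numpy part of the question so the outcome is reproducible (rows 2 and 3 fall below the rate of 0.5).

```python
import numpy as np
import tensorflow as tf

my_dummy = tf.random.uniform((6, 2, 2, 10), dtype=tf.float64)
# Fixed values (from the question's printout) instead of random ones
random_floats = tf.constant([0.89351759, 0.76734892, 0.36810602,
                             0.08513434, 0.65511941, 0.61297472], dtype=tf.float64)
my_rate = 0.5
pattern = tf.constant([1, 0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=my_dummy.dtype)

# One condition per sample, shaped (6, 1, 1, 1) so it broadcasts against (10,)
cond = tf.reshape(random_floats < my_rate, (-1, 1, 1, 1))
my_mask = tf.where(cond, pattern, tf.ones_like(pattern))  # (6, 1, 1, 10)
result = my_dummy * my_mask                               # (6, 2, 2, 10)
print(my_mask.numpy())
```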
