I'm trying to use the scipy.cluster.hierarchy module to hierarchically cluster some text. I've done the following:
from scipy.cluster.hierarchy import dendrogram, linkage, to_tree

l = linkage(model.wv.syn0, method='complete', metric='cosine')
den = dendrogram(
    l,
    leaf_rotation=0.,
    leaf_font_size=16.,
    orientation='left',
    leaf_label_func=lambda v: str(model.wv.index2word[v])
)
The dendrogram function returns a dict containing a representation of the tree where:
den['ivl'] is a list of labels corresponding to the leaves:
['politics', 'protest', 'characterfirstvo', 'machine', 'writing', 'learning', 'healthcare', 'climate', 'of', 'rights', 'activism', 'resistance', 'apk', 'week', 'challenge', 'water', 'obamacare', 'colorado', 'change', 'voiceovers', '52', 'acting', 'android']
den['leaves'] is a list of the original observation indices of the leaves, in the order they appear in the left-to-right traversal of the leaves: [0, 18, 5, 6, 2, 7, 12, 16, 21, 20, 22, 3, 10, 14, 15, 19, 11, 1, 17, 4, 13, 8, 9]
I know that scipy's to_tree() function converts a hierarchical clustering represented by a linkage matrix into a tree object by returning a reference to the root node (a ClusterNode object), but I'm not sure how this root node corresponds to my leaves/labels. For example, the ids returned by the get_id() method in this case are root = 44, left = 41, right = 43:
rootnode, nodelist = to_tree(l, rd=True)
rootID = rootnode.get_id()
leftID = rootnode.get_left().get_id()
rightID = rootnode.get_right().get_id()
My question essentially is, how can I traverse this tree and get the corresponding position in den['leaves'] and label in den['ivl'] for each ClusterNode?
Thank you in advance for any help!
For reference, this is the linkage matrix l:
[[20. 22. 0.72081252 2. ]
[12. 16. 0.78620636 2. ]
[ 3. 10. 0.79635815 2. ]
[ 0. 18. 0.80193474 2. ]
[15. 19. 0.82297097 2. ]
[ 2. 7. 0.84152483 2. ]
[ 1. 17. 0.84453892 2. ]
[ 4. 13. 0.86098654 2. ]
[ 8. 9. 0.88163748 2. ]
[14. 27. 0.91252009 3. ]
[11. 29. 0.92034739 3. ]
[21. 23. 0.92406542 3. ]
[ 5. 6. 0.93213108 2. ]
[25. 32. 0.98555722 5. ]
[26. 35. 0.99214198 4. ]
[30. 31. 1.05624908 4. ]
[24. 34. 1.0606247 5. ]
[28. 39. 1.06322889 7. ]
[37. 40. 1.1455562 11. ]
[33. 38. 1.15171714 7. ]
[36. 42. 1.17330334 12. ]
[41. 43. 1.25056073 23. ]]
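One way to connect each ClusterNode to den['leaves'] and den['ivl'] is sketched below. It relies on two SciPy conventions: a leaf's get_id() is its original observation index, and an internal node with id n + i corresponds to row i of the linkage matrix (which is why the root above is 44 = 23 + 21). The random data, the labels list, and the helper names position_of and describe are hypothetical stand-ins, not part of the original code.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage, to_tree

# Hypothetical stand-ins for model.wv.syn0 and model.wv.index2word.
rng = np.random.default_rng(0)
X = rng.random((6, 4))
labels = ["w%d" % i for i in range(len(X))]

Z = linkage(X, method="complete", metric="cosine")
den = dendrogram(Z, no_plot=True, leaf_label_func=lambda v: labels[v])

# den["leaves"][pos] is the observation id drawn at position pos,
# so invert it to look up a position from a leaf id.
position_of = {leaf_id: pos for pos, leaf_id in enumerate(den["leaves"])}

def describe(node):
    """Return (node id, leaf positions, leaf labels) for node and all its descendants."""
    leaf_ids = node.pre_order(lambda leaf: leaf.get_id())
    positions = sorted(position_of[i] for i in leaf_ids)
    info = [(node.get_id(), positions, [den["ivl"][p] for p in positions])]
    if not node.is_leaf():
        info += describe(node.get_left()) + describe(node.get_right())
    return info

root = to_tree(Z)
tree_info = describe(root)
for node_id, positions, names in tree_info:
    print(node_id, positions, names)
```

For a leaf node the position list has a single entry; for an internal node it lists the positions of every leaf beneath it.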
I am trying to write a recursive function in Python. I have this matrix W:
[[ 13. 14. ]
[ 12. 15. ]
[ 0. 4. ]
[ 3. 6. ]
[ 7. 8. ]
[ 11. 18. ]
[ 10. 17. ]
[ 2. 23. ]
[ 5. 22. ]
[ 16. 19. ]
[ 1. 27. ]
[ 9. 21. ]
[ 25. 29. ]
[ 24. 28. ]
[ 20. 26. ]
[ 31. 32. ]
[ 30. 33. ]
[ 34. 35. ]
[ 36. 37. ]]
The principle is this: for each row, I take the values in the two columns. If a value is < 20 I return it; otherwise I take it modulo 20 and look up that row, repeating until I reach values lower than 20. For example, 35 is > 20, so 35 % 20 = 15 and I go to row 15 and read its values. If I find 11, for example, I return 11; if I find 23, I take the modulo again, 23 % 20 = 3, go to row 3, and so on. This is my code:
def modulo(entier):
    if entier < 20:
        return(entier)
    else:
        c = (entier % 20)
        if int(W[c,0]) < 20:
            return(int(W[c,0]))
        else:
            a = modulo(int(W[c,0]))
            return(a)
        if int(W[c,1]) < 20:
            return(int(W[c,1]))
        else:
            e = modulo(int(W[c,1]))
            return(e)

i = 12
print(modulo(int(W[i,0])), modulo(int(W[i,1])))
Here I tried row 12 of the matrix, which holds the values 25 and 29. Following the principle, the function should return 11 and 18 for the value 25, and 16 and 19 for the value 29. But when I run it, the program only displays two values, 11 and 16. So I have the impression that it only looks at the first column of the matrix and never reaches the second if condition. I hope I explained the problem well and that there is a solution. Thank you
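The second if is never reached because both branches of the first if/else end in a return, which exits the function. A minimal sketch of one possible fix is to collect the results of both columns in a list and return that. The function name leaves and the parameter n are hypothetical; the sketch also assumes every value is below 2 * n, so that value % n is the same as value - n.

```python
import numpy as np

# The matrix W from the question (19 rows, 20 original items).
W = np.array([
    [13, 14], [12, 15], [0, 4], [3, 6], [7, 8],
    [11, 18], [10, 17], [2, 23], [5, 22], [16, 19],
    [1, 27], [9, 21], [25, 29], [24, 28], [20, 26],
    [31, 32], [30, 33], [34, 35], [36, 37],
], dtype=float)

def leaves(value, n=20):
    """Return every value below n reachable from `value`, as a list."""
    value = int(value)
    if value < n:
        return [value]
    row = value % n
    # Recurse into BOTH columns and concatenate, instead of returning early.
    return leaves(W[row, 0], n) + leaves(W[row, 1], n)

i = 12
print(leaves(W[i, 0]), leaves(W[i, 1]))
```

Returning a list from every branch keeps the recursion uniform: a base case yields a one-element list, and each recursive step joins the lists from the two columns.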
[[ 208.47 26. ]
[ 202.84 17. ]
[ 143.37 10. ]
...,
[ 45.99 3. ]
[ 159.31 10. ]
[ 34.12 4. ]]
[[ 58.64 1. ]
[ 44.31 19. ]
[ 37.89 14. ]
...,
[ 46.86 4. ]
[ 60.73 5. ]
[ 41.91 6. ]]
[[ 36.6 4. ]
[ 219.29 17. ]
[ 64.77 5. ]
...,
[ 51.85 37. ]
[ 161.26 10. ]
[ 53.63 20. ]]
[[ 52.97 32. ]
[ 51.32 3. ]
[ 196.23 4. ]
...,
[ 41.39 8. ]
[ 47.49 5. ]
[ 34.34 3. ]]
I have this numpy array entering my function:
import numpy as np

def initialize_centroids(points, k):
    """returns k centroids from the initial points"""
    centroids = points.copy()
    np.random.shuffle(centroids)
    print(centroids)
    return centroids[:k]
Currently the function shuffles the points and returns the first k of them. Instead, I want to draw random values: the first column between 0 and 300 and the second between 0 and 100. How would I do this?
This is part of my work on building a K-Means algorithm using Python.
As @kazemakase has commented, the answer is simply to use:
np.random.rand(k, 2) * [300, 100]
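Wrapped into a function, that one-liner might look like the sketch below. np.random.rand draws uniform values in [0, 1), and multiplying by the per-column bounds scales each column through broadcasting. The upper_bounds parameter is a hypothetical addition, and this variant no longer takes the points array, since it does not sample from it.

```python
import numpy as np

def initialize_centroids(k, upper_bounds=(300, 100)):
    """Return k random centroids; column j is uniform in [0, upper_bounds[j])."""
    # rand gives values in [0, 1); broadcasting scales each column separately.
    return np.random.rand(k, len(upper_bounds)) * upper_bounds

centroids = initialize_centroids(5)
print(centroids)
```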
So I've a 2d array, that when sorted by the second column using a[np.argsort(-a[:,1])] looks like this:
array([[ 30. , 98.7804878 ],
[ 24. , 98.7804878 ],
[ 21. , 98.7804878 ],
[ 26. , 98.7804878 ],
[ 20. , 98.70875179],
[ 4. , 98.27833572],
[ 1. , 7.10186514]])
Now I want rows with equal percentages to be ordered by the "id" column, lowest first, so it looks like this:
array([[ 21. , 98.7804878 ],
[ 24. , 98.7804878 ],
[ 26. , 98.7804878 ],
[ 30. , 98.7804878 ],
[ 20. , 98.70875179],
[ 4. , 98.27833572],
[ 1. , 7.10186514]])
I can't figure out how to do it, even if I take the top highest percentages from the first and then order them.
You can use np.lexsort for this:
>>> a[np.lexsort((a[:, 0], -a[:, 1]))]
array([[ 21. , 98.7804878 ],
[ 24. , 98.7804878 ],
[ 26. , 98.7804878 ],
[ 30. , 98.7804878 ],
[ 20. , 98.70875179],
[ 4. , 98.27833572],
[ 1. , 7.10186514]])
This sorts by -a[:, 1], then by a[:, 0], returning an array of indices that you can use to index a. Note that np.lexsort treats the last key in the tuple as the primary sort key.
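For a version that can be run as is, here is the same call applied to the array from the question, with the rows entered in the order shown there:

```python
import numpy as np

a = np.array([
    [30.0, 98.7804878],
    [24.0, 98.7804878],
    [21.0, 98.7804878],
    [26.0, 98.7804878],
    [20.0, 98.70875179],
    [4.0, 98.27833572],
    [1.0, 7.10186514],
])

# Last key is the primary one: descending percentage first,
# then ascending id to break ties.
order = np.lexsort((a[:, 0], -a[:, 1]))
result = a[order]
print(result)
```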
Say that I have a sparse matrix in scipy.sparse format. How can I extract a diagonal other than the main diagonal? For a numpy array, you can use numpy.diag. Is there a scipy sparse equivalent?
For example:
import numpy as np
from scipy import sparse
A = sparse.diags(np.ones(5), 1)
How would I get back the vector of ones without converting to a numpy array?
When the sparse array is in dia format, the data along the diagonals is recorded in the offsets and data attributes:
import scipy.sparse as sparse
import numpy as np

def make_sparse_array():
    # Build a fully populated nrow x ncol array and convert it to DIA format.
    A = np.arange(ncol*nrow).reshape(nrow, ncol)
    row, col = zip(*np.ndindex(nrow, ncol))
    val = A.ravel()
    A = sparse.coo_matrix(
        (val, (row, col)), shape=(nrow, ncol), dtype='float')
    A = A.todia()
    # A = sparse.diags(np.ones(5), 1)
    # A = sparse.diags([np.ones(4),np.ones(3)*2,], [2,3])
    print(A.toarray())
    return A

nrow, ncol = 10, 5
A = make_sparse_array()
# Trim the padding that DIA storage adds to each stored diagonal.
diags = {offset:(diag[offset:nrow+offset] if 0<=offset<=ncol else
                 diag if offset+nrow-ncol>=0 else
                 diag[:offset+nrow-ncol])
         for offset, diag in zip(A.offsets, A.data)}

for offset, diag in sorted(diags.items()):
    print('{o}: {d}'.format(o=offset, d=diag))
Thus for the array
[[ 0. 1. 2. 3. 4.]
[ 5. 6. 7. 8. 9.]
[ 10. 11. 12. 13. 14.]
[ 15. 16. 17. 18. 19.]
[ 20. 21. 22. 23. 24.]
[ 25. 26. 27. 28. 29.]
[ 30. 31. 32. 33. 34.]
[ 35. 36. 37. 38. 39.]
[ 40. 41. 42. 43. 44.]
[ 45. 46. 47. 48. 49.]]
the code above yields
-9: [ 45.]
-8: [ 40. 46.]
-7: [ 35. 41. 47.]
-6: [ 30. 36. 42. 48.]
-5: [ 25. 31. 37. 43. 49.]
-4: [ 20. 26. 32. 38. 44.]
-3: [ 15. 21. 27. 33. 39.]
-2: [ 10. 16. 22. 28. 34.]
-1: [ 5. 11. 17. 23. 29.]
0: [ 0. 6. 12. 18. 24.]
1: [ 1. 7. 13. 19.]
2: [ 2. 8. 14.]
3: [ 3. 9.]
4: [ 4.]
The output above is printing the offset followed by the diagonal at that offset.
The code above should work for any sparse array. I used a fully populated sparse array only to make it easier to check that the output is correct.
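As a shorter alternative, and assuming SciPy 1.0 or later, the diagonal() method of sparse matrices accepts an offset k, so a single diagonal can be read without touching the DIA internals. The sketch below uses the example from the question plus the 10 x 5 array shown above; the variable names are illustrative.

```python
import numpy as np
from scipy import sparse

# The example from the question: ones on the first superdiagonal.
A = sparse.diags(np.ones(5), 1)
superdiag = A.diagonal(k=1)

# The fully populated 10 x 5 array used above, stored as CSR this time.
B = sparse.csr_matrix(np.arange(50, dtype=float).reshape(10, 5))
lowest = B.diagonal(k=-9)
upper = B.diagonal(k=3)

print(superdiag, lowest, upper)
```

The result is a dense 1-D numpy array holding only that diagonal; the sparse matrix itself is never converted to a dense 2-D array.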
Hi there
I need to convert a matrix to a list, as in the example below
Matrix:
[[ 1. 6. 13. 10. 2.]
[ 2. 9. 10. 13. 15.]
[ 3. 15. 13. 14. 16.]
[ 4. 5. 14. 13. 6.]
[ 5. 18. 16. 4. 3.]
[ 6. 7. 12. 18. 3.]
[ 7. 1. 8. 17. 11.]
[ 8. 14. 5. 4. 16.]
[ 9. 16. 18. 17. 15.]
[ 10. 8. 9. 15. 17.]
[ 11. 11. 17. 18. 12.]]
List:
[(1, 6, 13, 10, 2), (2, 9, 10, 13, 15), (3, 15, 13, 14, 16),
(4, 5, 14, 13, 6), (5, 18, 16, 4, 3), (6, 7, 12, 18, 3),
(7, 1, 8, 17, 11), (8, 14, 5, 4, 16), (9, 16, 18, 17, 15),
(10, 8, 9, 15, 17), (11, 11, 17, 18, 12)]
Thx in advance
Is this a numpy matrix? If so, just use the tolist() method. E.g.:
import numpy as np
x = np.matrix([[1,2,3],
               [7,1,3],
               [9,4,3]])
y = x.tolist()
This yields:
y --> [[1, 2, 3], [7, 1, 3], [9, 4, 3]]
If you are using numpy and just want to traverse the matrix element by element, you can iterate over its flat attribute:
from numpy import array
m = array([[1., 6., 13., 10., 2.],
           [2., 9., 10., 13., 15.],
           [3., 15., 13., 14., 16.],
           [4., 5., 14., 13., 6.],
           [5., 18., 16., 4., 3.],
           [6., 7., 12., 18., 3.],
           [7., 1., 8., 17., 11.],
           [8., 14., 5., 4., 16.],
           [9., 16., 18., 17., 15.],
           [10., 8., 9., 15., 17.],
           [11., 11., 17., 18., 12.]])
for x in m.flat:
    print(x)
Iterating over flat does not build an intermediate list, so it does not consume extra memory.
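A simple way to get a list of tuples, one per row, is:
result = list(map(tuple, Matrix))
In Python 3, map returns an iterator, hence the list() call.
If you want a single flat list instead, you can use one of these:
1- li = [i for j in yourMatrix for i in j]
2- li = sum(yourMatrix, [])  # only works when yourMatrix is a list of lists, not a numpy array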
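To reproduce the exact output asked for in the question (tuples of integers rather than floats), one possible sketch is to cast to int before converting. It assumes the matrix is a numpy array of whole-number floats and uses only the first three rows from the question for brevity; the variable names are illustrative.

```python
import numpy as np

matrix = np.array([
    [1., 6., 13., 10., 2.],
    [2., 9., 10., 13., 15.],
    [3., 15., 13., 14., 16.],
])

# astype(int) drops the trailing ".", tolist() yields plain Python ints,
# and tuple() turns each row into a tuple.
rows_as_tuples = [tuple(row) for row in matrix.astype(int).tolist()]
print(rows_as_tuples)
```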