I am trying to use a recursive function in Python. I have this matrix W:
[[ 13. 14. ]
[ 12. 15. ]
[ 0. 4. ]
[ 3. 6. ]
[ 7. 8. ]
[ 11. 18. ]
[ 10. 17. ]
[ 2. 23. ]
[ 5. 22. ]
[ 16. 19. ]
[ 1. 27. ]
[ 9. 21. ]
[ 25. 29. ]
[ 24. 28. ]
[ 20. 26. ]
[ 31. 32. ]
[ 30. 33. ]
[ 34. 35. ]
[ 36. 37. ]]
The principle is that, for each row, I take the values of the two columns. If a value is < 20 I return it; otherwise I take it modulo 20 and look up the row with that index, repeating until I reach a value lower than 20. For example, 35 is not below 20, so 35 % 20 = 15 and I go to row 15 and read its values. If I find 11, say, I return 11; if I find 23, I take the modulo again, 23 % 20 = 3, go to row 3, and so on. This is my code:
def modulo(entier):
    if entier < 20:
        return(entier)
    else:
        c = (entier % 20)
        if int(W[c,0]) < 20:
            return(int(W[c,0]))
        else:
            a = modulo(int(W[c,0]))
            return(a)
        if int(W[c,1]) < 20:
            return(int(W[c,1]))
        else:
            e = modulo(int(W[c,1]))
            return(e)
i = 12
print(modulo(int(W[i,0])), modulo(int(W[i,1])))
Here I tried row 12 of the matrix, which holds the values 25 and 29. Following the principle, the function should return 11 and 18 for the value 25, and 16 and 19 for the value 29. When I run it, however, the program only displays two values, 11 and 16. My impression is that it only reads the first column of the matrix and never reaches the second if condition. I hope I have explained the problem well and that someone can suggest a solution. Thank you.
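A possible fix, as a minimal sketch: in modulo, the first if/else returns on every path, so the check on the second column is never reached. The version below assumes the goal is to collect every value below 20 reachable through both columns, and returns a list rather than a single number. The name resolve and its limit parameter are hypothetical, not part of the original code.

```python
import numpy as np

# The matrix W from the question (19 rows, 2 columns).
W = np.array([
    [13, 14], [12, 15], [0, 4], [3, 6], [7, 8], [11, 18], [10, 17],
    [2, 23], [5, 22], [16, 19], [1, 27], [9, 21], [25, 29], [24, 28],
    [20, 26], [31, 32], [30, 33], [34, 35], [36, 37],
], dtype=float)


def resolve(value, W, limit=20):
    """Return every value below `limit` reachable from `value`.

    `resolve` is a hypothetical helper. A value below `limit` is returned
    as is; otherwise both columns of row `value % limit` are followed.
    """
    value = int(value)
    if value < limit:
        return [value]
    row = value % limit
    # Combine the results of BOTH columns instead of returning after the first.
    return resolve(W[row, 0], W, limit) + resolve(W[row, 1], W, limit)


i = 12
print(resolve(W[i, 0], W), resolve(W[i, 1], W))  # [11, 18] [16, 19]
```

Concatenating the two recursive results is what lets the second column contribute; an early return in the first branch would hide it again.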
I've been working with dendrograms to determine the optimal number of clusters for hierarchical clustering.
Given this dendrogram:
[Figure: dendrogram]
and the linkage array that defines it:
[[ 1. 2. 5.83095189 2. ]
[ 3. 10. 9.21954446 3. ]
[ 6. 7. 11.18033989 2. ]
[ 0. 11. 13. 4. ]
[ 9. 12. 14.2126704 3. ]
[ 5. 14. 17.20465053 4. ]
[ 4. 13. 20.88061302 5. ]
[ 8. 15. 21.21320344 5. ]
[16. 17. 47.16990566 10. ]]
Comparing those values to the graph, column 2 holds the y value (the height of each merge) and column 3 the number of observations in the merged cluster. To determine the optimal number of clusters I need to calculate the maximum distance.
How could I do this, knowing that I would need to subtract 21.21 from 47.1699 and 20.88 from 47.1699 (working down the branches in descending order)?
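One way to approach this, as a sketch only: it assumes "max distance" means the largest gap between consecutive merge heights in column 2, which is a common heuristic but not the only possible reading of the question. The variable names are illustrative.

```python
import numpy as np

# Linkage array from the question: [cluster_a, cluster_b, height, count].
Z = np.array([
    [1, 2, 5.83095189, 2],
    [3, 10, 9.21954446, 3],
    [6, 7, 11.18033989, 2],
    [0, 11, 13.0, 4],
    [9, 12, 14.2126704, 3],
    [5, 14, 17.20465053, 4],
    [4, 13, 20.88061302, 5],
    [8, 15, 21.21320344, 5],
    [16, 17, 47.16990566, 10],
])

heights = Z[:, 2]
# gaps[i] is the jump from merge i to merge i + 1.
gaps = np.diff(heights)
widest = int(np.argmax(gaps))

# Cutting the tree inside the widest gap keeps merges 0..widest,
# so the number of clusters left is n_obs - (widest + 1).
n_obs = Z.shape[0] + 1
n_clusters = n_obs - (widest + 1)

print(f"widest gap: {gaps[widest]:.4f} between merges {widest} and {widest + 1}")
print(f"suggested number of clusters: {n_clusters}")
```

To compare every height against the top merge instead, as the question describes, heights[-1] - heights[:-1] gives those differences directly.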
I am writing a program to find Isis rectangles based on a user input n. The program runs through the equation for that input and outputs an array of dimensions for rectangles where area == perimeter. I am new to NumPy and I'm struggling to find an answer anywhere else.
Below is a chunk of my code that is responsible for outputting the array:
def choice_2():
    n = int(input("Please enter a positive integer for n: "))
    a1 = 2 * n + 1
    a2 = 4 * n
    a = np.array(list(range(a1, a2+1)))
    for j in range(a1, a2+1):
        b = (2 * n * a)/(a - 2 * n)
    print(f"\nIsis rectangles of type {n}")
    print("----------------------------")
    print(np.array(list(zip(a,b))))
And this is what my output is:
Isis rectangles of type 10
----------------------------
[[ 21. 420. ]
[ 22. 220. ]
[ 23. 153.33333333]
[ 24. 120. ]
[ 25. 100. ]
[ 26. 86.66666667]
[ 27. 77.14285714]
[ 28. 70. ]
[ 29. 64.44444444]
[ 30. 60. ]
[ 31. 56.36363636]
[ 32. 53.33333333]
[ 33. 50.76923077]
[ 34. 48.57142857]
[ 35. 46.66666667]
[ 36. 45. ]
[ 37. 43.52941176]
[ 38. 42.22222222]
[ 39. 41.05263158]
[ 40. 40. ]]
The math is working properly and it is zipping correctly, but I want to remove the rectangles that have non-integer values. For example, the first rectangle with sides 21 and 420 is good, but the third rectangle with sides 23 and 153.33333333 is not something I want in the final array.
Find the indices where b holds an integer value, then filter a and b using those indices:
idx = (b == b.astype(int)).nonzero()
print(np.array(list(zip(a[idx],b[idx]))))
Testcase:
import numpy as np

n = 10
a1 = 2 * n + 1
a2 = 4 * n
a = np.arange(a1, a2 + 1)
# b is computed for the whole array at once, so no loop over a is needed
b = (2 * n * a) / (a - 2 * n)
print(f"\nIsis rectangles of type {n}")
print("----------------------------")
# keep only the positions where b is a whole number
idx = (b == b.astype(int)).nonzero()
print(np.array(list(zip(a[idx], b[idx]))))
Output:
Isis rectangles of type 10
----------------------------
[[ 21. 420.]
[ 22. 220.]
[ 24. 120.]
[ 25. 100.]
[ 28. 70.]
[ 30. 60.]
[ 36. 45.]
[ 40. 40.]]
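An alternative sketch that avoids comparing floats altogether: since a and n are integers, b is a whole number exactly when 2*n*a is divisible by a - 2*n, so the filter can be done with integer arithmetic. The function name isis_rectangles is hypothetical, and the result is an integer array rather than the float array shown above.

```python
import numpy as np


def isis_rectangles(n):
    """Return integer (a, b) pairs with b = 2*n*a / (a - 2*n), a in [2n+1, 4n].

    `isis_rectangles` is a hypothetical name used for illustration.
    """
    a = np.arange(2 * n + 1, 4 * n + 1)
    numerator = 2 * n * a
    denominator = a - 2 * n
    # Keep only the sides a for which the division is exact.
    keep = numerator % denominator == 0
    b = numerator[keep] // denominator[keep]
    return np.column_stack((a[keep], b))


print(isis_rectangles(10))
```

For n = 10 this keeps the same eight rectangles as the float-based filter, from (21, 420) down to (40, 40).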
I'm trying to use the scipy.cluster.hierarchy module to hierarchically cluster some text. I've done the following:
l = linkage(model.wv.syn0, method='complete', metric='cosine')
den = dendrogram(
    l,
    leaf_rotation=0.,
    leaf_font_size=16.,
    orientation='left',
    leaf_label_func=lambda v: str(model.wv.index2word[v])
)
The dendrogram function returns a dict containing a representation of the tree where:
den['ivl'] is a list of labels corresponding to the leaves:
['politics', 'protest', 'characterfirstvo', 'machine', 'writing', 'learning', 'healthcare', 'climate', 'of', 'rights', 'activism', 'resistance', 'apk', 'week', 'challenge', 'water', 'obamacare', 'colorado', 'change', 'voiceovers', '52', 'acting', 'android']
den['leaves'] is a list of the position of each leaf in the left-to-right traversal of the leaves:
[0, 18, 5, 6, 2, 7, 12, 16, 21, 20, 22, 3, 10, 14, 15, 19, 11, 1, 17, 4, 13, 8, 9]
I know that scipy's to_tree() method converts a hierarchical clustering represented by a linkage matrix into a tree object by returning a reference to the root node (a ClusterNode object) - but I'm not sure how this root node corresponds to my leaves/labels. For example, the ids returned by the get_id() method in this case are root = 44, left = 41, right = 43:
rootnode, nodelist = to_tree(l, rd=True)
rootID = rootnode.get_id()
leftID = rootnode.get_left().get_id()
rightID = rootnode.get_right().get_id()
My question essentially is, how can I traverse this tree and get the corresponding position in den['leaves'] and label in den['ivl'] for each ClusterNode?
Thank you in advance for any help!
For reference, this is the linkage matrix l:
[[20. 22. 0.72081252 2. ]
[12. 16. 0.78620636 2. ]
[ 3. 10. 0.79635815 2. ]
[ 0. 18. 0.80193474 2. ]
[15. 19. 0.82297097 2. ]
[ 2. 7. 0.84152483 2. ]
[ 1. 17. 0.84453892 2. ]
[ 4. 13. 0.86098654 2. ]
[ 8. 9. 0.88163748 2. ]
[14. 27. 0.91252009 3. ]
[11. 29. 0.92034739 3. ]
[21. 23. 0.92406542 3. ]
[ 5. 6. 0.93213108 2. ]
[25. 32. 0.98555722 5. ]
[26. 35. 0.99214198 4. ]
[30. 31. 1.05624908 4. ]
[24. 34. 1.0606247 5. ]
[28. 39. 1.06322889 7. ]
[37. 40. 1.1455562 11. ]
[33. 38. 1.15171714 7. ]
[36. 42. 1.17330334 12. ]
[41. 43. 1.25056073 23. ]]
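A possible approach, sketched with the data posted above: the leaf ids under any ClusterNode can be read with its pre_order() method, and den['leaves'] together with den['ivl'] gives a lookup from leaf id to position and label. The two lists are hard-coded here in place of the real dendrogram() result, and the helper name describe is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import to_tree

# Linkage matrix l from the question.
Z = np.array([
    [20, 22, 0.72081252, 2], [12, 16, 0.78620636, 2], [3, 10, 0.79635815, 2],
    [0, 18, 0.80193474, 2], [15, 19, 0.82297097, 2], [2, 7, 0.84152483, 2],
    [1, 17, 0.84453892, 2], [4, 13, 0.86098654, 2], [8, 9, 0.88163748, 2],
    [14, 27, 0.91252009, 3], [11, 29, 0.92034739, 3], [21, 23, 0.92406542, 3],
    [5, 6, 0.93213108, 2], [25, 32, 0.98555722, 5], [26, 35, 0.99214198, 4],
    [30, 31, 1.05624908, 4], [24, 34, 1.0606247, 5], [28, 39, 1.06322889, 7],
    [37, 40, 1.1455562, 11], [33, 38, 1.15171714, 7], [36, 42, 1.17330334, 12],
    [41, 43, 1.25056073, 23],
])
# den['leaves'] and den['ivl'] as posted in the question.
leaves = [0, 18, 5, 6, 2, 7, 12, 16, 21, 20, 22, 3, 10, 14, 15, 19, 11, 1,
          17, 4, 13, 8, 9]
ivl = ['politics', 'protest', 'characterfirstvo', 'machine', 'writing',
       'learning', 'healthcare', 'climate', 'of', 'rights', 'activism',
       'resistance', 'apk', 'week', 'challenge', 'water', 'obamacare',
       'colorado', 'change', 'voiceovers', '52', 'acting', 'android']

# Lookups from an original observation id to its position and label.
position_of = {leaf_id: pos for pos, leaf_id in enumerate(leaves)}
label_of = dict(zip(leaves, ivl))


def describe(node):
    """Hypothetical helper: positions and labels of all leaves under `node`."""
    leaf_ids = node.pre_order()  # ids of the leaves below this node
    return {
        "id": node.get_id(),
        "positions": [position_of[i] for i in leaf_ids],
        "labels": [label_of[i] for i in leaf_ids],
    }


root, nodelist = to_tree(Z, rd=True)
for child in (root.get_left(), root.get_right()):
    info = describe(child)
    print(info["id"], info["positions"], info["labels"])
```

The same describe call works for any entry of nodelist, so looping over it covers every internal node as well as the individual leaves.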
[[ 208.47 26. ]
[ 202.84 17. ]
[ 143.37 10. ]
...,
[ 45.99 3. ]
[ 159.31 10. ]
[ 34.12 4. ]]
[[ 58.64 1. ]
[ 44.31 19. ]
[ 37.89 14. ]
...,
[ 46.86 4. ]
[ 60.73 5. ]
[ 41.91 6. ]]
[[ 36.6 4. ]
[ 219.29 17. ]
[ 64.77 5. ]
...,
[ 51.85 37. ]
[ 161.26 10. ]
[ 53.63 20. ]]
[[ 52.97 32. ]
[ 51.32 3. ]
[ 196.23 4. ]
...,
[ 41.39 8. ]
[ 47.49 5. ]
[ 34.34 3. ]]
I have this numpy array entering my function:
def initialize_centroids(points, k):
    """returns k centroids from the initial points"""
    centroids = points.copy()
    np.random.shuffle(centroids)
    print(centroids)
    return centroids[:k]
Currently the function shuffles the points and returns the first k of them. Instead, I want to randomize the values of the first column between 0 and 300 and those of the second column between 0 and 100. How would I do this?
This is part of my work on building a K-Means algorithm using Python.
As @kazemakase commented, the answer is simply:
np.random.rand(k, 2) * [300, 100]
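The same idea wrapped in a function, as a sketch: it uses NumPy's newer Generator API instead of np.random.rand, and the name random_centroids and its x_max, y_max and seed parameters are assumptions added for illustration.

```python
import numpy as np


def random_centroids(k, x_max=300.0, y_max=100.0, seed=None):
    """Return k random 2-D centroids (hypothetical helper).

    The first column is drawn uniformly from [0, x_max) and the second
    from [0, y_max); `seed` makes the draw reproducible.
    """
    rng = np.random.default_rng(seed)
    # low and high are broadcast across the k rows, one bound per column.
    return rng.uniform(low=[0.0, 0.0], high=[x_max, y_max], size=(k, 2))


centroids = random_centroids(5, seed=0)
print(centroids)
```

Note that these centroids no longer depend on the input points, unlike the shuffle-based initialize_centroids.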
Say that I have a sparse matrix in scipy.sparse format. How can I extract a diagonal other than the main diagonal? For a numpy array, you can use numpy.diag. Is there a scipy sparse equivalent?
For example:
import numpy as np
from scipy import sparse
A = sparse.diags(np.ones(5), 1)
How would I get back the vector of ones without converting to a numpy array?
When the sparse array is in dia format, the data along the diagonals is recorded in the offsets and data attributes:
import scipy.sparse as sparse
import numpy as np

def make_sparse_array():
    # build a fully populated nrow x ncol array and convert it to dia format
    A = np.arange(ncol*nrow).reshape(nrow, ncol)
    row, col = zip(*np.ndindex(nrow, ncol))
    val = A.ravel()
    A = sparse.coo_matrix(
        (val, (row, col)), shape=(nrow, ncol), dtype='float')
    A = A.todia()
    # A = sparse.diags(np.ones(5), 1)
    # A = sparse.diags([np.ones(4),np.ones(3)*2,], [2,3])
    print(A.toarray())
    return A

nrow, ncol = 10, 5
A = make_sparse_array()
# trim the padding that dia format stores around each diagonal
diags = {offset:(diag[offset:nrow+offset] if 0<=offset<=ncol else
                 diag if offset+nrow-ncol>=0 else
                 diag[:offset+nrow-ncol])
         for offset, diag in zip(A.offsets, A.data)}
for offset, diag in sorted(diags.items()):
    print('{o}: {d}'.format(o=offset, d=diag))
Thus for the array
[[ 0. 1. 2. 3. 4.]
[ 5. 6. 7. 8. 9.]
[ 10. 11. 12. 13. 14.]
[ 15. 16. 17. 18. 19.]
[ 20. 21. 22. 23. 24.]
[ 25. 26. 27. 28. 29.]
[ 30. 31. 32. 33. 34.]
[ 35. 36. 37. 38. 39.]
[ 40. 41. 42. 43. 44.]
[ 45. 46. 47. 48. 49.]]
the code above yields
-9: [ 45.]
-8: [ 40. 46.]
-7: [ 35. 41. 47.]
-6: [ 30. 36. 42. 48.]
-5: [ 25. 31. 37. 43. 49.]
-4: [ 20. 26. 32. 38. 44.]
-3: [ 15. 21. 27. 33. 39.]
-2: [ 10. 16. 22. 28. 34.]
-1: [ 5. 11. 17. 23. 29.]
0: [ 0. 6. 12. 18. 24.]
1: [ 1. 7. 13. 19.]
2: [ 2. 8. 14.]
3: [ 3. 9.]
4: [ 4.]
Each output line shows the offset followed by the diagonal at that offset.
The code above should work for any sparse array. I used a fully populated sparse array only to make it easier to check that the output is correct.
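A shorter route, sketched under the assumption that SciPy 1.0 or later is installed: sparse matrices there have a diagonal() method that accepts an offset k, so a single diagonal can be read without touching the dia internals.

```python
import numpy as np
from scipy import sparse

# The matrix from the question: five ones on the first superdiagonal.
A = sparse.diags(np.ones(5), 1)
print(A.diagonal(k=1))

# The fully populated 10 x 5 example from the answer above.
B = sparse.coo_matrix(np.arange(50, dtype=float).reshape(10, 5))
print(B.diagonal(k=1))   # first superdiagonal
print(B.diagonal(k=-9))  # lowest subdiagonal
```

Positive k selects diagonals above the main one and negative k those below it, matching the offsets printed by the loop above.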