setting an array element with a sequence matplotlib 3d - python

I have the following strange problem. I am trying to make a 3D plot, and that part works fine. Now I want to add the projections onto the walls of the plot. My code currently looks like this:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, registers the '3d' projection on older Matplotlib
fig = plt.figure(figsize = (10,8))
ax = fig.add_subplot(111, projection='3d')
ax.plot_trisurf(xarr, yarr, zarr, cmap=cm.coolwarm, linewidth=50)
ax.set_xlabel('\nMAE', fontsize = 14, linespacing = 1.5)
ax.set_ylabel('\nDIFF', fontsize = 14)
ax.set_zlabel('\nCounts', fontsize = 14, linespacing=1.5)
cset = ax.contour(np.array(xx), np.array(yy),
                  np.array(zz), zdir='z', offset=-100, cmap=cm.coolwarm)
cset = ax.contour(xx, yy, np.array(zz), zdir='x', offset=-40, cmap=cm.coolwarm)
cset = ax.contour(xx, yy, np.array(zz), zdir='y', offset=40, cmap=cm.coolwarm)
plt.show()
The line that does not work is:
cset = ax.contour(np.array(xx), np.array(yy),
                  np.array(zz), zdir='z', offset=-100, cmap=cm.coolwarm)
Here are the vectors
np.array(yy)
array([ 21, 6, 30, 3, 27, 61, 56, 52, 38, 14, 33, 12, 93,
129, 36, 11, 59, 9, 113, 18, 26, 8, 17, 10, 29, 2,
4, 16, 85, 55, 58, 45, 7, 15, 19, 5, 69, 57, 20,
158, 86, 118, 31, 107, 34, 92, 32, 28, 66, 54, 87, 25,
13, 99, 23, 60, 81, 24, 72, 123, 49, 63, 64, 71, 67,
40, 46, 48, 47, 95, 43, 159, 22, 37, 35, 105, 104, 42,
128, 53, 76, 75, 103, 65, 136, 144, 68, 77, 278, 98, 111,
114, 41, 84, 154, 62, 214, 124, 210, 1, 155, 79, 74, 80,
83, 318, 70, 120, 78, 44, 88, 73, 50, 110, 178, 51, 134,
106, 189, 91, 411, 135, 138, 143, 127, 122, 160, 94, 109, 226,
140, 117, 100, 133, 191, 141, 89, 288, 126, 97, 653, 121, 172,
161, 39, 96, 90, 130, 169, 142, 82, 132, 156, 137, 119, 102,
112, 188, 610, 115, 146, 234, 108, 150, 182, 170, 116, 223, 139,
197, 194, 241, 131, 181, 183, 152, 147, 250, 203, 165, 199, 218,
334, 167, 151, 384, 163, 162, 125, 148, 233, 354, 184, 168, 186,
180, 166, 369, 192, 101, 201, 157, 164, 419, 239], dtype=int64)
and
np.array(xx)
array([ 500., 1500., 2500., 3500., 4500., 5500., 6500.,
7500., 8500., 9500., 10500., 11500., 12500., 13500.,
14500., 15500., 16500., 17500., 18500., 19500., 20500.,
21500., 22500., 23500., 24500., 25500., 26500., 27500.,
28500.])
and zz has the shape
np.array(zz).shape
(205,29)
as it should. Can anyone guess what is wrong? The complete error is:
ValueError: setting an array element with a sequence.
Unfortunately I cannot publish the data, but I suspect the error is related to how the data are structured.
Thanks in advance, Umberto

If you check the shapes of X, Y and Z in the contour3d example, you will find that they are all the same (2-D arrays).
So, to make your code work, you should expand xx and yy into 2-D arrays with np.meshgrid before creating the plot:
xx, yy = np.meshgrid(xx, yy)
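Here is a minimal sketch of that fix. It uses made-up stand-in data with the same shapes as in the question: 29 x values, 205 y values, and a (205, 29) grid of counts. The random data are hypothetical.

With the default indexing='xy', np.meshgrid(xx, yy) returns arrays of shape (len(yy), len(xx)), which is exactly the layout of zz. Matplotlib's contour documentation also expects monotonically ordered coordinates. The question's yy is unsorted, so the sketch sorts yy and reorders the rows of zz to match before building the grid. That extra step is an assumption, not something confirmed to be required for this error.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib import cm

# Hypothetical stand-in data with the question's shapes.
rng = np.random.default_rng(0)
xx = np.arange(500.0, 29000.0, 1000.0)        # 29 values: 500, 1500, ..., 28500
yy = rng.permutation(np.arange(1, 206))       # 205 unsorted values, like the question's yy
zz = rng.poisson(5, size=(len(yy), len(xx)))  # row i of zz belongs to yy[i]

# Sort yy and reorder the rows of zz the same way, so y is monotonic.
order = np.argsort(yy)
yy_sorted, zz_sorted = yy[order], zz[order]

# Expand the 1-D coordinate vectors into 2-D grids matching zz's shape.
X, Y = np.meshgrid(xx, yy_sorted)             # both have shape (205, 29)

fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
cset = ax.contour(X, Y, zz_sorted, zdir='z', offset=-100, cmap=cm.coolwarm)
```

The same X, Y and zz_sorted can then be passed to the zdir='x' and zdir='y' contour calls.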

Related

Graph contraction not working as expected

My input is a graph G, written as an adjacency list (each row of G starts with a vertex and all further values are the vertices adjacent to it).
There are 200 vertices in total.
I'm trying to contract the graph randomly until only two vertices are left (part of Karger's algorithm).
My issue is that after several iterations, v2's index can't be found in G.
It appears that my code for merging v2 into v1 doesn't work, since the same vertex is picked as v2 multiple times, but I have no idea why.
import random

def contract(G):  # wrapper name assumed; the original snippet ends with a return
    removed = []
    n = len(G)  # number of vertices in G
    while n > 2:
        # Randomly choosing two vertices (and their index)
        iv1 = random.randint(0, n - 1)
        v1 = G[iv1][0]
        v2 = random.choice(G[iv1][1:])
        iv2 = None
        for index, sublist in enumerate(G):
            if sublist[0] is v2:
                iv2 = index
        # Debug code
        removed.append(v2)
        if iv2 is None:
            print("===")
            print("removed=", removed)
            print("len set=", len(set(removed)), " len list=", len(removed))
            print("G[iv1]=", G[iv1])
            print("v1=", v1, " iv1=", iv1, " v2=", v2, " iv2=", iv2, "n=", n)
            print("===")
            break
        # Graph Contraction (v1 and v2 merged, v1 becomes the merged vertex)
        G[iv2].remove(v1)  # Removing self-loops
        G[iv1] += G[iv2][1:]  # Vertices adjacent to v2 now adjacent to v1 (1/2)
        G[iv1].remove(v2)  # Removing self-loops
        del G[iv2]
        n -= 1
        for i in range(n):
            if G[i][0] is not v1:
                G[i] = [v1 if vert is v2 else vert for vert in G[i]]  # (2/2)
    return len(G[0])
Here's an example of the output:
===
removed= [91, 98, 173, 23, 169, 179, 85, 54, 89, 110, 180, 2, 37, 17, 73, 43, 77, 34, 66, 19, 51, 178, 61, 99, 26, 52, 162, 111, 22, 149, 57, 118, 120, 30, 4, 28, 5, 27, 147, 188, 75, 136, 32, 40, 156, 145, 70, 138, 36, 12, 41, 140, 55, 152, 105, 60, 81, 64, 142, 45, 7, 148, 164, 49, 183, 165, 78, 74, 158, 160, 24, 146, 141, 182, 97, 116, 86, 96, 177, 186, 65, 135, 76, 9, 108, 3, 88, 151, 115, 42, 167, 185, 8, 190, 189, 175, 194, 184, 153, 196, 126, 195, 197, 107, 58, 6, 104, 117, 56, 199, 82, 168, 130, 29, 87, 121, 109, 90, 18, 132, 163, 198, 125, 13, 21, 154, 103, 72, 174, 187, 171, 80, 161, 191, 150, 137, 106, 79, 192, 1, 50, 155, 159, 35, 172, 176, 139, 20, 63, 38, 84, 119, 69, 94, 68, 193, 10, 95, 130]
len set= 158 len list= 159
G[iv1]= [123, 92, 92, 92, 129, 92, 11, 92, 92, 92, 92, 33, 47, 92, 92, 129, 92, 92, 92, 92, 69, 69, 33, 92, 129, 13, 128, 134, 92, 69, 92, 134, 92, 13, 114, 47, 13, 13, 128, 44, 134, 33, 123, 44, 181, 69, 33, 92, 16, 69, 134, 33, 157, 44, 83, 47, 181, 33, 92, 44, 92, 92, 181, 134, 129, 170, 92, 47, 129, 47, 44, 16, 181, 92, 44, 134, 157, 92, 11, 33, 181, 33, 92, 48, 92, 33, 13, 134, 130, 47, 92, 69, 92, 92, 134, 134, 92, 47, 123, 69, 92, 129, 130, 92, 114, 69, 69, 92, 44, 129, 157, 123, 92, 44, 134, 13, 11, 47, 13, 47, 92, 181, 134, 123, 47, 128, 92, 181, 92, 44, 48, 123, 134, 69, 33, 92, 129, 33, 123, 16, 130, 33, 92, 44, 92, 13, 44, 92, 157, 129, 114, 181, 47, 69, 92, 92]
v1= 123 iv1= 27 v2= 130 iv2= None n= 42
===
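In the output, G[iv1] still contains v1 (123) itself and also v2 (130), which had already been merged away. A likely cause, offered as a reading of the code rather than a confirmed diagnosis, has two parts:

- list.remove() deletes only the first occurrence. When v1 and v2 share parallel edges, the extra copies survive.
- The relabelling loop skips v1's own row, so leftover v2 entries there are never renamed.

The sketch below changes four things. The function name contract is hypothetical.

- It filters out every v1/v2 entry when merging.
- It compares labels with == instead of is, since is on integers only works because CPython caches small ints.
- It returns len(G[0]) - 1 so that the vertex label itself is not counted as an edge.
- It picks the first endpoint with probability proportional to its degree, which makes the chosen edge uniform over all edges, as Karger's algorithm assumes.

It also assumes a connected input graph with no self-loops.

```python
import random

def contract(G):
    """One run of Karger's random contraction on an adjacency-list graph.

    Each row of G is [vertex, neighbour, neighbour, ...]; parallel edges are
    repeated labels. Returns the number of edges crossing the final 2-way cut.
    Assumes a connected graph without self-loops. G itself is not modified.
    """
    G = [list(row) for row in G]  # work on a copy
    while len(G) > 2:
        # Weight each vertex by its degree so that the (v1, v2) edge picked
        # below is uniform over all edges.
        degrees = [len(row) - 1 for row in G]
        iv1 = random.choices(range(len(G)), weights=degrees)[0]
        v1 = G[iv1][0]
        v2 = random.choice(G[iv1][1:])
        iv2 = next(i for i, row in enumerate(G) if row[0] == v2)

        # Merge: keep every neighbour of v1 and v2 except v1 and v2 themselves,
        # which drops ALL parallel v1-v2 edges instead of just one.
        merged = [v for v in G[iv1][1:] + G[iv2][1:] if v != v1 and v != v2]
        G[iv1] = [v1] + merged
        del G[iv2]

        # Every remaining row that pointed at v2 now points at v1.
        for row in G:
            row[1:] = [v1 if v == v2 else v for v in row[1:]]
    return len(G[0]) - 1
```

A single run only finds the minimum cut with some probability. It is therefore normally repeated many times and the smallest result kept, for example min(contract(G) for _ in range(1000)).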

Using Agglomerative Hierarchical Clustering on a high-dimensional dataset with categorical and continuous variables

My group and I are working on a high-dimensional dataset with a mix of categorical (binary and integer) and continuous variables. We are wondering which distance metric and linkage method would be best for agglomerative hierarchical clustering. We started with Euclidean distance and Ward's linkage, but because of the issues that arise when Euclidean distance is applied to categorical variables, we need a new strategy. We have tried the Heterogeneous Euclidean-Overlap Metric (HEOM) and Gower's distance with average, centroid, and single linkage, but have not gotten the clear results we were hoping for. Are there better methods or metrics we should use for our analysis?
Here is an example of the code we have already:
import matplotlib.pyplot as plt
from distython import HEOM
from sklearn.neighbors import DistanceMetric
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from scipy.spatial.distance import squareform
categorical_ix = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 28, 34, 37, 39, 142, 41, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 59, 60, 61, 62, 63, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 213, 217, 218, 219, 220, 221, 222, 223, 224, 225]
nan_eqv = 12345
heom_metric = HEOM(features, categorical_ix, nan_equivalents = [nan_eqv])
dist = DistanceMetric.get_metric(heom_metric.heom)
distance = dist.pairwise(features)  # square (n x n) distance matrix
# linkage() expects a condensed (1-D) distance vector; a square matrix would be
# treated as raw observation vectors, so convert it first
linkage_matrix = linkage(squareform(distance, checks=False), 'average')
plt.figure(figsize=(10, 7))
plt.title("Test")
dendrogram(linkage_matrix)
plt.axhline(y=8, color='r', linestyle='--')
plt.show()
k = 4
clusters = fcluster(linkage_matrix, k, criterion='maxclust')
clusters
If Gower's distance or HEOM is the preferred method, we would also appreciate any advice on how to implement these metrics better in our code. Thank you!
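Since the question asks how to implement Gower's distance, here is a minimal NumPy sketch. It assumes there are no missing values. Each feature contributes a term, and the terms are averaged:

- Categorical columns contribute 0 when equal and 1 otherwise.
- Numeric columns contribute |x_i - x_j| divided by the column's range.

The function name gower_matrix and the toy data are hypothetical. The square matrix is converted with squareform before it is passed to SciPy's linkage, which expects a condensed distance vector. Average linkage is used because SciPy's documentation notes that the centroid, median and ward methods are only correctly defined for Euclidean distances.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def gower_matrix(X, categorical):
    """Pairwise Gower distances for a 2-D array X (rows = samples).

    `categorical` is a boolean mask over the columns. Assumes no NaNs.
    """
    X = np.asarray(X, dtype=float)
    categorical = np.asarray(categorical, dtype=bool)
    n, p = X.shape
    D = np.zeros((n, n))
    for k in range(p):
        col = X[:, k]
        diff = col[:, None] - col[None, :]  # all pairwise differences
        if categorical[k]:
            D += diff != 0                  # simple matching: 0 or 1
        else:
            span = col.max() - col.min()
            if span > 0:                    # constant columns add nothing
                D += np.abs(diff) / span
    return D / p

# Toy data: column 0 is categorical, column 1 is continuous.
X = np.array([[0, 0.0], [0, 0.1], [0, 0.2],
              [1, 5.0], [1, 5.1], [1, 4.9]])
D = gower_matrix(X, [True, False])
Z = linkage(squareform(D, checks=False), method='average')
labels = fcluster(Z, 2, criterion='maxclust')  # two groups of three rows
```

For the question's data, the mask could be built from categorical_ix, for example mask = np.zeros(features.shape[1], dtype=bool); mask[categorical_ix] = True.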

Remove noise using logarithmic binning

I'm interested in plotting the probability distribution of a set of points that follow a power law. I would also like to use logarithmic binning to smooth out the large fluctuations, especially those in the tail. I wrote this code:
import matplotlib.pyplot as plt
plt.figure()
plt.grid(True)
plt.loglog(x, y, 'bo')
plt.savefig('distribution.png', dpi=400)
plt.show()
plt.close()
where x and y are lists containing the data. I know I should use numpy.logspace, but I'm not sure how to do it.
I attach the lists and an image of the graph:
[Figure 1: log-log plot of y against x]
x=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
38, 39, 40, 41, 44, 45, 46, 48, 50, 53, 54, 55, 56, 57, 58, 59, 63, 64,
66, 71, 72, 73, 76, 79, 81, 84, 85, 86, 90, 95, 97, 99, 100, 101, 103,
105, 114, 117, 118, 120, 122, 129, 141, 159, 166, 168, 172, 199, 201,
206, 218, 226, 243, 260, 262, 263, 265, 273, 274, 278, 281, 292, 300,
390, 404, 420, 443, 491, 849, 939, 1036, 1156, 1191, 1389, 1551, 1742,
2082]
y=[0.0, 0.3508771929824561, 0.4259259259259261, 0.4400278940027895,
0.439337474120083, 0.43933333333333335, 0.4165445665445665,
0.4361247947454843, 0.4325877825877826, 0.4820728291316526,
0.42828042328042315, 0.35761299632267374, 0.3491461529923068,
0.4079423222280365, 0.43694194694194693, 0.34069215098626865,
0.3449795896319961, 0.3633688071188071, 0.30852671293847767,
0.4242381075714409, 0.20068791049183207, 0.24466260863319686,
0.12237645395540135, 0.37624875124875123, 0.28918557997841887,
0.25374977395437753, 0.4761346678013344, 0.41219336219336217,
0.19267411510058569, 0.30895915678524377, 0.18104998922645982,
0.2407892107892108, 0.23937740965604742, 0.3727204759813455,
0.23712669683257917, 0.2567023619655199, 0.33474793703626654,
0.3520767731294047, 0.2475947884643537, 0.3738888888888889,
0.5274725274725275, 0.33489003749873314, 0.18518518518518517,
0.15181358496575886, 0.3152953084067635, 0.17919413919413918,
0.20858299108299105, 0.21746880570409982, 0.1915602105707053,
0.2972972972972973, 0.18115942028985507, 0.25, 0.32707722385141735,
0.33894302848575714, 0.21774193548387097, 0.34782608695652173,
0.27608756290137165, 0.17296320127462694, 0.2727272727272727,
0.2879728132387707, 0.06535947712418301, 0.083710407239819,
0.28118393234672306, 0.1951219512195122, 0.09254361251031618,
0.3062211259885678, 0.002663622526636225, 0.27311522048364156,
0.0506558118498417, 0.1044776119402985, 0.06284153005464481,
0.18588399720475193, 0.2129032258064516, 0.14903846153846154,
0.021532091097308487, 0.3089430894308943, 0.301010101010101,
0.3761904761904762, 0.10466269841269842, 0.07138047138047138,
0.21709633649932158, 0.019401589527816735, 0.017575757575757574,
0.15817805383022773, 0.025306629405371837, 0.20850040096230954,
0.0001638001638001638, 0.04357084357084357, 0.09221213569039656,
0.14047410008779632, 0.002560163850486431, 0.0031680440771349864,
0.12334152334152335, 0.6428571428571429, 0.012745098039215686,
0.0058073399287151255, 0.0012413644214162348, 0.013532269257460098,
0.04368752313957793, 0.20265151515151514, 0.0018470281790196543,
0.023099982366425676, 0.03265807243707796, 0.00695970695970696,
0.003737745098039216, 0.009634076615208691, 0.024085079762277136,
0.0062196422224854876, 0.030849549121974372, 0.01636020744931636,
0.003922512815882666, 0.005677708965459911, 0.04833570605382686,
0.014331723027375202]
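One way to use numpy.logspace here has three steps:

1. Build logarithmically spaced bin edges over x.
2. Average the y values that fall into each bin.
3. Plot one point per bin at the bin's geometric centre.

This minimal sketch treats y as a value measured at each x, which is how the lists above are structured. The function name log_binned_mean and the default of 15 bins are arbitrary, hypothetical choices. If you instead had raw samples, np.histogram(samples, bins=edges, density=True) would give the log-binned probability density directly.

```python
import numpy as np

def log_binned_mean(x, y, n_bins=15):
    """Average y inside logarithmically spaced bins of x (x must be > 0)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.logspace(np.log10(x.min()), np.log10(x.max()), n_bins + 1)
    # Pin the outer edges exactly so rounding in logspace cannot drop min(x) or max(x).
    edges[0], edges[-1] = x.min(), x.max()
    sums, _ = np.histogram(x, bins=edges, weights=y)  # sum of y per bin
    counts, _ = np.histogram(x, bins=edges)           # number of points per bin
    centers = np.sqrt(edges[:-1] * edges[1:])         # geometric bin centres
    keep = counts > 0                                  # skip empty bins
    return centers[keep], sums[keep] / counts[keep]
```

With the lists above, call xb, yb = log_binned_mean(x, y). Then plt.loglog(x, y, 'bo', alpha=0.3) followed by plt.loglog(xb, yb, 'rs-') overlays the smoothed curve on the raw points. Fewer bins give a smoother curve.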

python - subtracting ranges from bigger ranges

I would like to find out whether there are any free subnets between a BGP aggregate-address (e.g. 10.76.32.0 255.255.240.0) and all the network commands on the same router (e.g. 10.76.32.0 255.255.255.0 and 10.76.33.0 255.255.255.0).
In the above example, 10.76.34.0 -> 10.76.47.255 would be free.
I'm thinking of tackling this problem by converting the IP addresses and subnet masks to binary and subtracting that way.
To keep it simple I will use decimal numbers in this example, but this approach leaves me with the following problem. Say I have a range from 1 to 250 and I subtract from it a smaller range from 20 to 23. I would like to end up with the ranges 1 to 19 and 24 to 250.
Using range() doesn't really give me the expected results. I could create a list with every item in the range and subtract another list containing a subset of the items, but having lists with possibly tens of thousands of elements seems like a bad idea.
Hunor
If you are trying to create a "range" with a gap in it, i.e. with 1-19 and 24-250, you could use filterfalse (or ifilterfalse if you are using Python 2.x) from the itertools module. It takes a predicate and a sequence as its arguments and returns the elements of the sequence for which the predicate returns False. For example, if you do:
from itertools import filterfalse
new_range = filterfalse(lambda x: 20 <= x <= 23, range(1,251))
new_range will be an iterator over the numbers 1-19 and 24-250. It can be used much like range(), except that it can only be iterated once:
for i in new_range:
    do_things()
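filterfalse still walks through every number one at a time. If only the boundaries matter, the leftover pieces can be computed directly as range objects, which take constant memory however large the range is. Here is a minimal sketch for step-1 ranges using Python's half-open convention, so 1 to 250 is range(1, 251). The helper name subtract_range is hypothetical.

```python
def subtract_range(outer, inner):
    """Return the parts of `outer` not covered by `inner`, as range objects.

    Both arguments are step-1 ranges using Python's half-open convention.
    """
    pieces = [range(outer.start, min(outer.stop, inner.start)),  # part before inner
              range(max(outer.start, inner.stop), outer.stop)]   # part after inner
    return [r for r in pieces if len(r) > 0]                     # drop empty pieces
```

For example, subtract_range(range(1, 251), range(20, 24)) returns [range(1, 20), range(24, 251)], i.e. 1-19 and 24-250.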
This question was asked long ago, but I want to add a NumPy answer:
import numpy as np
aa = np.arange(1, 251)
bb = aa[(aa < 20) | (aa > 23)]  # keep everything outside 20..23
print(repr(bb))
output
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
14, 15, 16, 17, 18, 19, 24, 25, 26, 27, 28, 29, 30,
31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56,
57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82,
83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108,
109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121,
122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147,
148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160,
161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173,
174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186,
187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199,
200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212,
213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225,
226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238,
239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250])
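For the original subnet problem, Python 3's standard ipaddress module can subtract networks directly, so no manual binary conversion is needed.

- ip_network accepts the "address/netmask" form.
- address_exclude returns the networks left after removing a subnet. It raises ValueError if the argument is not contained in the network, hence the subnet_of check, which is available since Python 3.7.

The helper name free_subnets is hypothetical. It assumes IPv4 networks given as strings in the same format.

```python
import ipaddress

def free_subnets(aggregate, used):
    """Return the parts of `aggregate` not covered by any network in `used`."""
    free = [ipaddress.ip_network(aggregate)]
    for u in map(ipaddress.ip_network, used):
        remaining = []
        for net in free:
            if u.subnet_of(net):
                # u lies inside net (or equals it): keep what is left around it.
                remaining.extend(net.address_exclude(u))
            elif not net.subnet_of(u):
                # CIDR blocks are either nested or disjoint, so this one is untouched.
                remaining.append(net)
            # Otherwise net lies entirely inside u and is fully used.
        free = remaining
    return sorted(free)

print(free_subnets('10.76.32.0/255.255.240.0',
                   ['10.76.32.0/255.255.255.0', '10.76.33.0/255.255.255.0']))
```

For the example from the question, the result covers 10.76.34.0 to 10.76.47.255 as three blocks: 10.76.34.0/23, 10.76.36.0/22 and 10.76.40.0/21.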

How to find differences of mat files in Python in human readable format?

.mat files can be loaded into Python with:
import scipy.io
matdata1 = scipy.io.loadmat('file1.mat')
matdata2 = scipy.io.loadmat('file2.mat')
The loaded data can then be saved as text by calling the following Python function and redirecting its printed output to a file:
def mat2txt(matdata):
    for k, v in matdata.items():  # Python 3 specific
        if isinstance(v, dict):
            mat2txt(v)  # recurse into nested dicts
        else:
            print(k, v)
The two .mat files that are being compared are of the same structure and type with different values.
I would like to be able to identify the differing values in a human-readable format, not just their location.
I have tried:
diff matdata1.txt matdata2.txt
diff matdata1.txt matdata2.txt | grep "<" | sed 's/^<//g'
grep -v -F -x -f matdata1.txt matdata2.txt
These do not point to the specific differences in values, and they are not done within Python. I hoped to store the .mat files as .txt to create a static snapshot. That would let me compare each file's data at different dates, both against itself and against other files, and also store the snapshots in git for future comparisons.
A toy example of the resulting data files:
matdata1.txt
b [[([[(array([[0]], dtype=uint8),)]],)]]
a [[([[(array([[0]], dtype=uint8),)]],)]]
c [[ ([[(array([[ ([[122, 139, 156, 173, 190, 207, 224, 1, 18, 35, 52, 69, 86, 103, 120], [138, 155, 172, 189, 206, 223, 15, 17, 34, 51, 68, 85, 102, 119, 121], [154, 171, 188, 205, 222, 14, 16, 33, 50, 67, 84, 101, 118, 135, 137], [170, 187, 204, 221, 13, 30, 32, 49, 66, 83, 100, 117, 134, 136, 153], [186, 203, 220, 12, 29, 31, 48, 65, 82, 99, 116, 133, 150, 152, 169], [202, 22, 11, 28, 45, 47, 64, 81, 98, 115, 132, 149, 151, 168, 185], [218, 10, 27, 33, 46, 63, 80, 97, 114, 131, 148, 165, 167, 184, 201], [9, 26, 43, 60, 62, 11, 96, 113, 130, 147, 164, 166, 183, 200, 217], [25, 42, 59, 61, 78, 95, 112, 99, 146, 163, 180, 182, 199, 216, 8], [41, 58, 75, 77, 94, 111, 128, 145, 162, 179, 181, 198, 215, 7, 24], [57, 74, 76, 93, 110, 127, 144, 161, 178, 195, 197, 214, 6, 23, 40], [73, 90, 92, 109, 126, 143, 160, 177, 194, 196, 213, 5, 22, 39, 56], [89, 91, 108, 125, 142, 159, 176, 193, 210, 212, 4, 21, 38, 55, 72], [105, 107, 124, 141, 158, 175, 192, 209, 211, 3, 20, 37, 54, 71, 88], [106, 123, 140, 157, 174, 191, 208, 225, 2, 19, 36, 53, 70, 87, 104]],)]],
dtype=[('c', 'O')]),)]],)]]
__header__ b'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Mon Jun 27 20:55:29 2016'
d [[1]]
__globals__ []
f ['string']
__version__ 1.0
e [[2]]
matdata2.txt
e [[2]]
d [[1]]
__globals__ []
f ['string']
__header__ b'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Mon Jun 27 20:54:48 2016'
c [[ ([[(array([[ ([[122, 139, 156, 173, 190, 207, 224, 1, 18, 35, 52, 69, 86, 103, 120], [138, 155, 172, 189, 206, 223, 15, 17, 34, 51, 68, 85, 102, 119, 121], [154, 171, 188, 205, 222, 14, 16, 33, 50, 67, 84, 101, 118, 135, 137], [170, 187, 204, 221, 13, 30, 32, 49, 66, 83, 100, 117, 134, 136, 153], [186, 203, 220, 12, 29, 31, 48, 65, 82, 99, 116, 133, 150, 152, 169], [202, 219, 11, 28, 45, 47, 64, 81, 98, 115, 132, 149, 151, 168, 185], [218, 10, 27, 44, 46, 63, 80, 97, 114, 131, 148, 165, 167, 184, 201], [9, 26, 43, 60, 62, 79, 96, 113, 130, 147, 164, 166, 183, 200, 217], [25, 42, 59, 61, 78, 95, 112, 129, 146, 163, 180, 182, 199, 216, 8], [41, 58, 75, 77, 94, 111, 128, 145, 162, 179, 181, 198, 215, 7, 24], [57, 74, 76, 93, 110, 127, 144, 161, 178, 195, 197, 214, 6, 23, 40], [73, 90, 92, 109, 126, 143, 160, 177, 194, 196, 213, 5, 22, 39, 56], [89, 91, 108, 125, 142, 159, 176, 193, 210, 212, 4, 21, 38, 55, 72], [105, 107, 124, 141, 158, 175, 192, 209, 211, 3, 20, 37, 54, 71, 88], [106, 123, 140, 157, 174, 191, 208, 225, 2, 19, 36, 53, 70, 87, 104]],)]],
dtype=[('c', 'O')]),)]],)]]
a [[([[(array([[1]], dtype=uint8),)]],)]]
b [[([[(array([[0]], dtype=uint8),)]],)]]
__version__ 1.0
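To get the differing values from within Python rather than from diff, one option is to compare the loaded dictionaries directly. The function prints one line per differing element, giving its path and both values.

This minimal sketch assumes SciPy 1.5 or later, so that scipy.io.loadmat(..., simplify_cells=True) turns MATLAB structs and cell arrays into plain dicts and lists. The function name diff_values is hypothetical. NaNs would be reported as differences because NaN != NaN.

```python
import numpy as np

def diff_values(a, b, path=""):
    """Yield human-readable lines describing where two loaded structures differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(set(a) | set(b)):
            if key.startswith("__"):
                continue  # skip loadmat metadata (__header__, __version__, __globals__)
            sub = f"{path}.{key}" if path else key
            if key not in a or key not in b:
                yield f"{sub}: only in {'first' if key in a else 'second'} file"
            else:
                yield from diff_values(a[key], b[key], sub)
        return
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        if len(a) != len(b):
            yield f"{path}: length {len(a)} != {len(b)}"
        else:
            for i, (x, y) in enumerate(zip(a, b)):
                yield from diff_values(x, y, f"{path}[{i}]")
        return
    a_arr, b_arr = np.asarray(a), np.asarray(b)
    if a_arr.shape != b_arr.shape:
        yield f"{path}: shape {a_arr.shape} != {b_arr.shape}"
    elif a_arr.ndim == 0:
        # Scalars (squeezed 1x1 MATLAB values, strings, numbers).
        if a_arr != b_arr:
            yield f"{path}: {a_arr.item()} -> {b_arr.item()}"
    elif a_arr.dtype == object or b_arr.dtype == object:
        # Arrays of nested structs/cells: recurse element by element.
        for idx in np.ndindex(a_arr.shape):
            yield from diff_values(a_arr[idx], b_arr[idx], f"{path}{list(idx)}")
    else:
        # Numeric or string arrays: report every differing element.
        for raw in np.argwhere(a_arr != b_arr):
            idx = tuple(int(i) for i in raw)
            yield f"{path}{list(idx)}: {a_arr[idx]} -> {b_arr[idx]}"
```

Typical use: m1 = scipy.io.loadmat('file1.mat', simplify_cells=True), the same for file2.mat, then print("\n".join(diff_values(m1, m2))). The printed lines can also be written to a text file and kept in git.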
