I have this code, which produces a random matrix of 1s and 0s. I want to create a graph from this matrix where the 1s in the matrix represent a node and each node has a maximum of 3 edges. How can I implement this?
import numpy as np
from random import sample
N = int(input("Enter the number of nodes:"))
my_matrix = np.zeros((N,N), dtype='int8')
If your matrix is just random, you probably don't need it. Instead, you can create the graph from a list of edges:
import random
import networkx as nx
import numpy as np
from random import sample
from numpy.random import randint

n = 7 # number of nodes in graph
max_connections = int(input("Enter max connections per node:")) # input: 3
nodes = np.arange(n)
# create graph based on list of edges [(0, 1), (0, 4), ...]
gr = nx.Graph([
    # for each node select <= 'max_connections' nodes as connections
    (i, j) for i in range(n) for j in sample(nodes[nodes != i].tolist(), int(randint(1, max_connections+1)))
])
# check number of connections and drop random edges above the limit
for node in list(gr.nodes()):
    nei = list(gr.neighbors(node))
    while len(nei) > max_connections:
        gr.remove_edge(node, random.choice(nei))
        nei = list(gr.neighbors(node))
nx.draw_networkx(gr, with_labels=True, node_color='#7d99f5')
Graph:
You can also get the adjacency matrix using nx.adjacency_matrix():
nx.adjacency_matrix(gr, nodelist=sorted(gr.nodes())).todense()
matrix([[0, 1, 1, 0, 1, 0, 0],
[1, 0, 0, 0, 0, 1, 1],
[1, 0, 0, 1, 0, 1, 0],
[0, 0, 1, 0, 0, 0, 1],
[1, 0, 0, 0, 0, 1, 1],
[0, 1, 1, 0, 1, 0, 0],
[0, 1, 0, 1, 1, 0, 0]])
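If the degree cap should hold by construction rather than by pruning afterwards, one option is to shuffle all candidate node pairs and accept a pair only while both endpoints still have spare capacity. Below is a minimal sketch using only the standard library. The name random_capped_graph and its parameters are made up for illustration, the result tends to be as dense as the cap allows, and it is not guaranteed to be connected.

```python
import random
from itertools import combinations

def random_capped_graph(n_nodes, max_degree, seed=None):
    """Return a symmetric 0/1 adjacency matrix (list of lists) in which
    no node has more than max_degree edges. Hypothetical helper."""
    rng = random.Random(seed)
    # every possible undirected edge, self-loops excluded
    pairs = list(combinations(range(n_nodes), 2))
    rng.shuffle(pairs)
    degree = [0] * n_nodes
    adj = [[0] * n_nodes for _ in range(n_nodes)]
    for i, j in pairs:
        # accept the edge only if both endpoints are still below the cap
        if degree[i] < max_degree and degree[j] < max_degree:
            adj[i][j] = adj[j][i] = 1
            degree[i] += 1
            degree[j] += 1
    return adj

adj = random_capped_graph(7, 3, seed=0)
for row in adj:
    print(row)
```

The matrix can then be handed to networkx, for example with nx.from_numpy_array(np.array(adj)).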
Complete update:
If bi-directionality is not important, then adjM is upper-triangular.
If no more than 3 edges per node are allowed, then each row and each column of adjM has 3 or fewer "1"s.
The code follows the required triangular structure:
Step 1: fill in 3 edges starting from each node (follow the rows of adjM).
Step 2: some nodes may then receive more than 3 edges, so remove some of them until only 3 are left (follow the columns of adjM).
Step 3: remove the self-adjacencies.
Yes, the graph might end up not being connected because of the random process. In that case, repeat the run until you are happy with the result.
The structure of adjM looks reasonable now, at least. However, the graph shows more connections at node 12 than can be seen in the adjM matrix, so an improvement is still necessary.
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

Nnodes = 16
Nedges = 3
#---- 0. to initialize: generate a random adjacency matrix
rng = np.random.default_rng()
adjM = rng.integers(1, size=(Nnodes,Nnodes)) # random adjacency matrix / with upper=1 it is a zero matrix
#---- 1. for each node generate randomly Nedges edges ("sending" connections along the rows of adjM)
for node in range(Nnodes):
    dd = Nnodes-node
    rand_ind = np.random.choice(np.arange(node,Nnodes), size=min(dd,Nedges), replace=False, p=None) # generate random indices
    # you might use replace=True too, with different results
    adjM[node, rand_ind] = 1 # insert the connections
#---- 2. for each node eliminate randomly edges that are more than Nedges ("receiving" connections along the columns of adjM)
for node in range(Nnodes): # run through the columns of adjM
    a = adjM[:,node] # select a column = receiving connections
    jnz = np.array(a.nonzero()) # indices of the non-zero elements
    Nnz = jnz.shape[1] # number of non-zero elements
    if Nnz > Nedges: # ...then randomly select Nedges edges only
        jchoice = np.random.choice(jnz.ravel(), size=Nedges, replace=False, p=None)
        #print(' jchoice', jchoice)
        adjM[:,node] = 0
        adjM[jchoice, node] = 1
#---- 3. remove self-adjacency
jDiag = np.arange(Nnodes)
adjM[jDiag, jDiag] = 0 # set the diagonals to zero
print(adjM)
#---- graphics
plt.spy(adjM, precision=0, marker=None, markersize=12, aspect='equal', origin='upper')
plt.show()
gr = nx.from_numpy_array(adjM) # from_numpy_matrix was removed in NetworkX 3.0
nx.draw_networkx(gr, with_labels=True, node_size=400, node_color='#7d99f5', edge_color='orange', width=2, font_weight='bold')
plt.show()
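One likely reason for the extra connections noted above: an undirected nx.Graph treats an entry at (i, j) as the edge i-j regardless of direction, so a node's degree is the number of 1s in its row plus the number of 1s in its column. A triangular matrix with at most 3 ones per row and per column can therefore still give a node up to 6 edges. The sketch below checks this on a small made-up matrix with a zero diagonal; the variable names other than adjM are assumptions.

```python
import numpy as np

# made-up upper-triangular example: at most 2 ones per row and per column
adjM = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
])

# symmetrise: an undirected graph has the edge i-j if either entry is set
sym = ((adjM + adjM.T) > 0).astype(int)

row_counts = adjM.sum(axis=1)   # "sending" connections
col_counts = adjM.sum(axis=0)   # "receiving" connections
degree = sym.sum(axis=1)        # degree of each node in the undirected graph

print(row_counts)  # [2 2 1 0]
print(col_counts)  # [0 1 2 2]
print(degree)      # [2 3 3 2]
```

To enforce the limit on the drawn graph, the cap would have to be applied to the degree of the symmetrised matrix rather than to rows and columns separately.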
An addition to your code: the function check_graph() fixes the two problems we have discussed. Also, given the way you generate the adjacency matrix, you won't run into the second problem, so you can comment out the second section of the function.
def check_graph(graph, max_conn):
    # 1) remove self loops (materialise the list first, since the graph is modified)
    graph.remove_edges_from(list(nx.selfloop_edges(graph)))
    # 2) remove random edge(s) if limit of edges per node has been exceeded
    for i in graph.nodes():
        # list of connections - nodes that are connected to the selected node 'i'
        nei = list(graph.neighbors(i))
        if len(nei) > max_conn:
            # like if len(nei) - max_conn = 5 - 4 = 1, then one random edge will be selected
            drop = np.random.choice(nei, size=len(nei)-max_conn, replace=False)
            # remove_edges_from expects (u, v) pairs, not bare neighbour nodes
            graph.remove_edges_from([(i, j) for j in drop])

# <-- insert your code here --> N = 20
gr = nx.from_numpy_array(my_matrix) # from_numpy_matrix was removed in NetworkX 3.0
check_graph(gr, max_conn=N)
nx.draw_networkx(gr, with_labels=True, node_color='#7d99f5')
The result looks a bit strange to me, but I don't know the purpose of your graph, so it may be okay.
I have a set of xyz points and a set of tetrahedrons, where each node of a tetrahedron points to an index in the points table.
I need to plot the tetrahedrons with a corresponding color based on the tag attribute.
points

Index  x    y    z
0      x_1  y_1  z_1
1      x_2  y_2  z_2
...    ...  ...  ...

tetrahedrons

Index  a       b       c       d       tag
0      a_1.pt  b_1.pt  c_1.pt  d_1.pt  9
1      a_2.pt  b_2.pt  c_2.pt  d_2.pt  0
...    ...     ...     ...     ...     ...
I have tried using the Mesh3d api but it does not allow for a 4th vertex.
I can plot something like the code below but it does not have all the faces of the tetrahedron.
go.Figure(data=[
    go.Mesh3d(
        x=mesh_pts.x, y=mesh_pts.y, z=mesh_pts.z,
        i=tagged_th.a, j=tagged_th.b, k=tagged_th.c,
    ),
]).show()
I think the Volume or Isosurface plots might work but I'm not sure how to convert my data into a format to be consumed by those apis.
I can't hide the fact that, a few minutes ago, I wasn't even aware of the i, j, k parameters. Still, I know that Mesh3d draws triangles, not tetrahedrons. You need to use the i, j, k parameters to control which triangles are drawn, but it is still your job to say which triangles must be drawn so that the result looks like tetrahedrons.
Yes, there are 4 triangles per tetrahedron. If you want to draw all four, you need to pass i, j, k explicitly for each of them; you cannot just pass i, j, k plus a nonexistent l and expect plotly to understand that this means 4 triangles.
If a, b, c and d are the 4 vertices of a tetrahedron, then the 4 triangles you need to draw are the 4 combinations of 3 vertices taken from those, that is bcd, acd, abd and abc.
Let's write this in 4 rows
bcd
acd
abd
abc
^^^
|||
||\------k
|\------ j
\------- i
So if a, b, c and d are now lists of n vertices each, then i, j, k must be lists 4 times longer:
i=b + a + a + a
j=c + c + b + b
k=d + d + d + c
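The same expansion can be generated instead of written out by hand. The sketch below uses itertools.combinations to produce the 4 faces of each tetrahedron; the helper name tetra_faces is made up, and plain Python lists stand in for the dataframe columns.

```python
from itertools import combinations

def tetra_faces(a, b, c, d):
    """Expand per-tetrahedron vertex index lists into Mesh3d-style i, j, k lists.
    Hypothetical helper; a, b, c, d are equal-length lists of point indices."""
    i, j, k = [], [], []
    for tet in zip(a, b, c, d):
        # the 4 faces are the 4 ways of picking 3 of the 4 vertices
        for p, q, r in combinations(tet, 3):
            i.append(p)
            j.append(q)
            k.append(r)
    return i, j, k

# two tetrahedrons sharing point 3
i, j, k = tetra_faces([0, 3], [1, 4], [2, 5], [3, 6])
print(list(zip(i, j, k)))
```

Note that this groups the triangles per tetrahedron rather than per face, so a per-triangle attribute such as a tag would have to be repeated 4 times in a row for each tetrahedron. The set of triangles is the same as with the concatenation above.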
Application: let's define 2 tetrahedrons, one sitting on the spike of the other, using your dataframes format
import plotly.graph_objects as go
import pandas as pd
mesh_pts = pd.DataFrame({'x':[0, 1, 0, 0, 1, 0, 0],
'y':[0, 0, 1, 0, 0, 1, 0],
'z':[0, 0, 0, 1, 1, 1, 2]})
tagged_th = pd.DataFrame({'a':[0,3],
'b':[1,4],
'c':[2,5],
'd':[3,6],
'tag':[0,1]})
# And from there, just create a list of triangles, made of 4 combinations
# of 3 points taken from list of tetrahedron vertices
go.Figure(data=[
    go.Mesh3d(
        x=mesh_pts.x,
        y=mesh_pts.y,
        z=mesh_pts.z,
        i=pd.concat([tagged_th.a, tagged_th.a, tagged_th.a, tagged_th.b]),
        j=pd.concat([tagged_th.b, tagged_th.b, tagged_th.c, tagged_th.c]),
        k=pd.concat([tagged_th.c, tagged_th.d, tagged_th.d, tagged_th.d]),
        intensitymode='cell',
        intensity=pd.concat([tagged_th.tag, tagged_th.tag, tagged_th.tag, tagged_th.tag])
    )
]).show()
I don't see what you mean by "does not allow for a 4th vertex". Here is an example with two tetrahedra:
import plotly.graph_objects as go
import plotly.io as pio
import numpy as np
i = np.array([0, 0, 0, 1])
j = np.array([1, 2, 3, 2])
k = np.array([2, 3, 1, 3])
fig = go.Figure(data = [
    go.Mesh3d(
        x = [0,1,2,0, 4,5,6,4],
        y = [0,0,1,2, 0,0,1,2],
        z = [0,2,2,3, 4,2,4,1],
        i = np.concatenate((i, i+4)),
        j = np.concatenate((j, j+4)),
        k = np.concatenate((k, k+4)),
        facecolor = ["red","red","red","red", "green","green","green","green"]
    )
])
pio.write_html(fig, file = "tetrahedra.html", auto_open = True)
I would like to have a random list of length 1000 in which 10% of the items are ones and the rest are zeros. The values should be in random order, with an adjustable minimum distance between ones. For example, if I choose a value of 3, the list would look something like this:
[0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, ...]
What is the most elegant way to achieve this?
Edit. I was asked for more information and to show some effort.
This is for a study where 0 signifies one type of stimulus and 1 another type, and we want a minimum distance between stimuli of type 1.
So far I have achieved this with:
import random

trials = [0]*400
trials.extend([1]*100)
random.shuffle(trials)
#Make sure a fixed minimum number of standard runs follow each deviant
i = 0
while i < len(trials):
    if trials[i] == 1:
        trials[i+1:i+1] = 5*[0]
        i = i + 6
    else:
        i = i + 1
This gives me a list of length 1000 but to me seems a little clumsy so out of curiosity I was wondering if there is a better way to do this.
You have essentially a binomial random variable. The waiting time between successes for a binomial random variable is given by the negative binomial distribution. Using this distribution, we can get a random sequence of intervals between successes for a binomial variable with the specified success rate. Then we simply add your "refractory period" to all intervals and create a binary representation.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import nbinom
min_failures = 3 # refractory period
total_successes = 100
total_time = 1000
# create a negative binomial distribution to model the waiting times to the next success for a Bernoulli RV;
rv = nbinom(1, total_successes / float(total_time))
# get interval lengths between successes;
intervals = rv.rvs(size=total_successes)
# get event times
events = np.cumsum(intervals)
# rescale event times to fit into the total time - refractory time
total_refractory = total_successes * min_failures
remaining_time = total_time - total_refractory
events = events.astype(float) / np.max(events) * remaining_time  # np.float was removed in NumPy 1.24
# add refractory periods
intervals = np.diff(np.r_[0, events])
intervals += min_failures
events = np.r_[0, np.cumsum(intervals[:-1])] # series starts with success
# create binary representation
binary = np.zeros((total_time), dtype=np.uint8)
binary[events.astype(int)] = 1  # np.int was removed in NumPy 1.24
To check that the inter-event intervals match your expectations, plot a histogram:
# check that intervals match our expectations
fig, ax = plt.subplots(1,1)
ax.hist(intervals, bins=20, density=True)  # 'normed' was replaced by 'density' in newer matplotlib
ax.set_xlabel('Interval length')
ax.set_ylabel('Normalised frequency')
xticks = ax.get_xticks()
ax.set_xticks(np.r_[xticks, min_failures])
plt.show()
My approach to this problem is to maintain a list of candidate positions from which the next position is chosen randomly. Then, the surrounding range of positions is checked to be empty. If so, this position is chosen and the whole range around it in which no future position is allowed is removed from the list of available candidates. This ensures a minimum number of loops.
It may happen (if mindist is big compared to the number of positions) that fewer positions than required are returned. In that case, the function needs to be called again, as shown.
import random

def array_ones(mindist, length_array, numones):
    result = [0]*length_array
    candidates = list(range(length_array))
    while sum(result) < numones and len(candidates) > 0:
        # choose one position randomly from candidates
        pos = candidates[random.randint(0, len(candidates)-1)]
        L = pos-mindist if pos >= mindist else 0
        U = pos+mindist if pos <= length_array-1-mindist else length_array-1
        if sum(result[L:U+1]) == 0: # no taken positions around
            result[pos] = 1
            # remove all candidates around this position
            no_candidates = set(range(L, U+1))
            candidates = list(set(candidates).difference(no_candidates))
    return result, sum(result)

def main():
    numones = 5
    numtests = 50
    mindist = 4
    while True:
        arr, ones = array_ones(mindist, numtests, numones)
        if ones == numones:
            break
    print(arr)

if __name__ == '__main__':
    main()
The function returns the array and its number of ones. Set difference is used to remove a range of candidate positions without iterating.
Seems that there wasn't a very simple one-line answer to this problem. I finally came up with this:
import numpy as np

def construct_list(n_zeros, n_ones, min_distance):
    if min_distance > (n_zeros + n_ones) / n_ones:
        raise ValueError("Minimum distance too high.")
    initial_zeros = n_zeros - min_distance * n_ones
    block = np.random.permutation(np.array([0]*initial_zeros + [1]*n_ones))
    ones = np.where(block == 1)[0].repeat(min_distance)
    #Insert min_distance number of 0s after each 1
    block = np.insert(block, ones+1, 0)
    return block.tolist()
This seems simpler than the other answers, although Paul's answer was slightly faster with n_zeros=900, n_ones=100, min_distance=3.
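Whichever generator is used, it is worth checking the output, since "distance" can be counted either as the number of zeros between two ones or as the difference between their indices. A small standard-library sketch follows; the helper name min_gap is made up, and it counts the zeros in between.

```python
def min_gap(seq):
    """Smallest number of zeros between two consecutive ones,
    or None if the sequence holds fewer than two ones."""
    positions = [idx for idx, v in enumerate(seq) if v == 1]
    if len(positions) < 2:
        return None
    return min(b - a - 1 for a, b in zip(positions, positions[1:]))

# the example list from the question
example = [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
print(min_gap(example))  # 3
```

For a generated list, something like assert min_gap(result) >= 3 and assert sum(result) == 100 would confirm both requirements.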
I have pulled the following data from a .csv file(databoth.csv) and performed a k-means clustering utilising matplotlib. The data is 3 columns(Country, birthrate, life expectancy).
I need help to output:
The number of countries belonging to each cluster.
The list of countries belonging to each cluster.
The mean Life Expectancy and Birth Rate for each cluster.
Here is my code:
import csv
import matplotlib.pyplot as plt
import sys
import pylab as plt
import numpy as np

plt.ion()

#K-Means clustering implementation
# data = set of data points
# k = number of clusters
# maxIters = maximum number of iterations executed k-means
def kMeans(data, K, maxIters = 10, plot_progress = None):
    centroids = data[np.random.choice(np.arange(len(data)), K), :]
    for i in range(maxIters):
        # Cluster Assignment step
        C = np.array([np.argmin([np.dot(x_i-y_k, x_i-y_k) for y_k in
                                 centroids]) for x_i in data])
        # Move centroids step
        centroids = [data[C == k].mean(axis = 0) for k in range(K)]
        if plot_progress is not None: plot_progress(data, C, np.array(centroids))
    return np.array(centroids) , C

# Calculates euclidean distance between
# a data point and all the available cluster
# centroids.
def euclidean_dist(data, centroids, clusters):
    for instance in data:
        mu_index = min([(i[0], np.linalg.norm(instance-centroids[i[0]])) \
                        for i in enumerate(centroids)], key=lambda t:t[1])[0]
        try:
            clusters[mu_index].append(instance)
        except KeyError:
            clusters[mu_index] = [instance]
    # If any cluster is empty then assign one point
    # from data set randomly so as to not have empty
    # clusters and 0 means.
    for cluster in clusters:
        if not cluster:
            cluster.append(data[np.random.randint(0, len(data), size=1)].flatten().tolist())
    return clusters

# this function reads the data from the specified file
def csvRead(file):
    return np.genfromtxt(file, delimiter=',')

# function to show the results on the screen in form of 3 clusters
def show(X, C, centroids, keep = False):
    import time
    time.sleep(0.5)
    plt.cla()
    plt.plot(X[C == 0, 0], X[C == 0, 1], '*b',
             X[C == 1, 0], X[C == 1, 1], '*r',
             X[C == 2, 0], X[C == 2, 1], '*g')
    plt.plot(centroids[:,0],centroids[:,1],'*m',markersize=20)
    plt.draw()
    if keep :
        plt.ioff()
        plt.show()

# generate 3 cluster data
data = csvRead('dataBoth.csv')
m1, cov1 = [9, 8], [[1.5, 2], [1, 2]]
m2, cov2 = [5, 13], [[2.5, -1.5], [-1.5, 1.5]]
m3, cov3 = [3, 7], [[0.25, 0.5], [-0.1, 0.5]]
data1 = np.random.multivariate_normal(m1, cov1, 250)
data2 = np.random.multivariate_normal(m2, cov2, 180)
data3 = np.random.multivariate_normal(m3, cov3, 100)
X = np.vstack((data1,np.vstack((data2,data3))))
np.random.shuffle(X)

# calls to the functions
# first to find centroids using k-means
centroids, C = kMeans(X, K = 3, plot_progress = show)
#second to show the centroids on the graph
show(X, C, centroids, True)
Maybe you can use annotate:
http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.annotate
More examples:
http://matplotlib.org/users/annotations.html#plotting-guide-annotation
This lets you place a text label near each point.
Alternatively, you can use colours as in this post.
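For the three outputs asked for in the question (count, member list and mean values per cluster), boolean masks on the label array C may be enough. This is a minimal sketch under stated assumptions: the countries array and the values in X are made-up stand-ins for the CSV columns, and the rows of countries and X are assumed to stay aligned, which means they must not be shuffled independently.

```python
import numpy as np

# made-up stand-ins for the CSV columns and the k-means output
countries = np.array(["A", "B", "C", "D", "E"])
X = np.array([[10.0, 80.0],
              [12.0, 78.0],
              [35.0, 55.0],
              [40.0, 50.0],
              [20.0, 70.0]])      # columns: birth rate, life expectancy
C = np.array([0, 0, 1, 1, 2])     # cluster index per row, as returned by kMeans
K = 3

for k in range(K):
    members = C == k              # boolean mask selecting the rows of cluster k
    mean_birth, mean_life = X[members].mean(axis=0)
    print(f"Cluster {k}: {members.sum()} countries")
    print("  countries:", ", ".join(countries[members]))
    print(f"  mean birth rate: {mean_birth:.2f}, mean life expectancy: {mean_life:.2f}")

# the counts alone can also be obtained in one call
counts = np.bincount(C, minlength=K)
```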
I've got two music files: a lossless one with a short gap at the beginning (right now it's just silence, but it could be anything: a sinusoid or just some noise) and an mp3:
In [1]: plt.plot(y[:100000])
Out[1]:
In [2]: plt.plot(y2[:100000])
Out[2]:
These lists are similar but not identical, so I need to cut this gap, i.e. find the first occurrence of one list in the other with the lowest delta error.
And here's my solution (5.7065 sec.):
error = []
for i in range(25000):
    y_n = y[i:100000]
    y2_n = y2[:100000-i]
    error.append(abs(y_n - y2_n).mean())
start = np.array(error).argmin()
print(start, error[start]) #23057 0.0100046
Is there any pythonic way to solve this?
Edit:
After calculating the mean distance between special points (e.g. where data == 0.5) I reduce the area of search from 25000 to 2000. This gives me reasonable time of 0.3871s:
a = np.where(y[:100000].round(1) == 0.5)[0]
b = np.where(y2[:100000].round(1) == 0.5)[0]
mean = int((a - b[:len(a)]).mean())
delta = 1000
error = []
for i in range(mean - delta, mean + delta):
...
What you are trying to do is a cross-correlation of the two signals.
This can be done easily using signal.correlate from the scipy library:
import scipy.signal
import numpy as np
# limit your signal length to speed things up
lim = 25000
# do the actual correlation
corr = scipy.signal.correlate(y[:lim], y2[:lim], mode='full')
# The offset is the maximum of your correlation array,
# itself being offset by (lim - 1):
offset = np.argmax(corr) - (lim - 1)
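To see which sign the offset takes, it can help to run the same calculation on synthetic data with a known gap. The sketch below swaps in np.correlate (which follows the same lag convention in 'full' mode) so that it needs only NumPy; the signal length, gap and noise level are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n, gap = 2000, 137                               # made-up length and leading gap
clean = rng.standard_normal(n)
y2 = clean + 0.05 * rng.standard_normal(n)       # "mp3": same signal with small errors
y = np.concatenate([np.zeros(gap), clean])[:n]   # "lossless": silence first, then the signal

corr = np.correlate(y, y2, mode="full")
# index 0 of the 'full' output corresponds to a lag of -(len(y2) - 1)
offset = int(np.argmax(corr)) - (len(y2) - 1)
print(offset)  # 137
```

A positive offset here means the first argument starts late by that many samples, so y[offset:] lines up with y2.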
You might want to take a look at this answer to a similar problem.
Let's generate some data first
import numpy as np
import matplotlib.pyplot as plt

N = 1000
y1 = np.random.randn(N)
y2 = y1 + np.random.randn(N) * 0.05
y2[0:int(N / 10)] = 0
In these data, y1 and y2 are almost the same (note the small added noise), but the first 10% of y2 are empty (similarly to your example)
We can now calculate the absolute difference between the two vectors and find the first element for which the absolute difference is below a sensitivity threshold:
abs_delta = np.abs(y1 - y2)
THRESHOLD = 1e-2
sel = abs_delta < THRESHOLD
ix_start = np.where(sel)[0][0]
fig, axes = plt.subplots(3, 1)
ax = axes[0]
ax.plot(y1, '-')
ax.set_title('y1')
ax.axvline(ix_start, color='red')
ax = axes[1]
ax.plot(y2, '-')
ax.axvline(ix_start, color='red')
ax.set_title('y2')
ax = axes[2]
ax.plot(abs_delta)
ax.axvline(ix_start, color='red')
ax.set_title('abs diff')
This method works if the overlapping parts are indeed "almost identical". You will have to think of smarter alignment ways if the similarity is low.
I think what you are looking for is correlation. Here is a small example.
import numpy as np
equal_part = [0, 1, 2, 3, -2, -4, 5, 0]
y1 = equal_part + [0, 1, 2, 3, -2, -4, 5, 0]
y2 = [1, 2, 4, -3, -2, -1, 3, 2]+y1
np.argmax(np.correlate(y1, y2, 'same'))
Out:
7
So this returns the time-difference, where the correlation between both signals is at its maximum. As you can see, in the example the time difference should be 8, but this depends on your data...
Also note that both signals have the same length.
Is there any way to count the number of edges that separate 2 nodes? For example, if we have the nodes "a", "b", "c" and "d" in the form "a"-"b"-"c"-"d" (where "-" is an edge), I need to count the edges between "a" and "d".
The real example is as follows. I have a big graph; you can see an image at this link: https://drive.google.com/file/d/0B7GayK8MGGtCcVhRMncyM0VMc2c/view?usp=sharing
The graph has 2806 nodes in this case, and I need to know, for example, how many edges separate node 608 from node 616. I thought the number_of_edges function could help me, but I now think it only tells whether 2 nodes are connected or not (because it returns 1 or 0 in code like this):
for k in id1: #id1 is a list of nodes
    for l in id2: #id2 is a list of nodes
        print(graph.number_of_edges(k,l))
Without knowing what you have attempted and without an example graph I will give you a simple example. It might clear things up for you.
I will make a 4-node, 4-edge graph from an adjacency matrix using networkx and numpy.
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
adjacency_matrix = np.array([[0,1,0,1], [1,0,1,0], [0,1,0,1], [1,0,1,0]])
print(adjacency_matrix)
This prints our graph:
[[0 1 0 1]
[1 0 1 0]
[0 1 0 1]
[1 0 1 0]]
Now feed this adjacency matrix into networkx as such:
rows, cols = np.where(adjacency_matrix == 1)
edges = zip(rows.tolist(), cols.tolist())
gr = nx.Graph()
gr.add_edges_from(edges)
Plot it:
nx.draw_networkx(gr)
plt.show()
Now we can look at which nodes are connected to one another by:
print(gr.number_of_edges(0, 1)) # clearly has an edge
print(gr.number_of_edges(2, 0)) # no edge between 2 and 0
And as expected this prints:
1
0
So if you are getting 0 from number_of_edges(a, b), then a and b are not adjacent (there is no edge between them).
[EDIT: If we want to find all paths between 2 and 0 you can do the following
for path in nx.all_simple_paths(gr, source=2, target=0):
    print(path)
# prints
# [2, 1, 0]
# [2, 3, 0]
Or to find the shortest path:
p = nx.shortest_path(gr,source=2, target=0)
# [2, 1, 0]
In which case you could say:
num_edges = len(p) - 1 # num_edges = 2
]
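What len(p) - 1 computes is the length of a shortest path, which a breadth-first search finds directly. The sketch below is a plain standard-library version for the "a"-"b"-"c"-"d" chain from the question; the helper name edges_between and the dict-of-neighbours input format are made up for illustration.

```python
from collections import deque

def edges_between(adj, source, target):
    """Number of edges on a shortest path from source to target,
    or None if target cannot be reached. Hypothetical helper;
    adj maps each node to an iterable of its neighbours."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nb in adj[node]:
            if nb not in dist:          # first visit is via a shortest route
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return None

chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(edges_between(chain, "a", "d"))  # 3
```

With networkx itself, nx.shortest_path_length(gr, source=608, target=616) returns the same count without building the path list first, and raises an exception if the two nodes are not connected.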