Creating the node-edge triangle adjacency graph in Python/R

How can I write an R or Python program that creates a node-edge adjacency matrix, in which rows denote nodes and columns denote edges, and an entry is one if the edge is part of a triangle and the node is part of that same triangle? I would prefer to use igraph or linkcomm for this, but I wouldn't mind seeing a different package or program.
I know I can use maximal.clique(g) to locate the triangles, but I am not sure how to use that output to create the node-edge triangle adjacency matrix.
> g <- erdos.renyi.game(15, 45, type="gnm", dir=TRUE)
> triad.census(g)
[1] 113 168 38 16 13 49 23 17 7 2
[11] 2 1 2 2 2 0
> str(g)
IGRAPH D--- 15 45 -- Erdos renyi (gnm) graph
+ attr: name (g/c), type (g/c), loops
(g/x), m (g/n)
+ edges:
1 -> 3 4 6 12 13 2 -> 1 3 7
3 -> 2 5 10 15 4 -> 5 12 14
5 -> 6 7 9 6 -> 4 8 12
7 -> 5 9 12 8 -> 2 7 15
9 -> 1 4 11 13 10 -> 4 5 8
11 -> 1 2 9 12 -> 1 4 14 15
13 -> 15 14 -> 11 12
15 -> 3
> maximal.cliques(g)
[[1]]
[1] 13 15
[[2]]
[1] 13 1 9
[[3]]
[1] 2 8 7
[[4]]
[1] 2 1 3
[[5]]
[1] 2 1 11
[[6]]
[1] 3 5 10
[[7]]
[1] 3 15
[[8]]
[1] 4 14 12
[[9]]
[1] 4 10 5
[[10]]
[1] 4 5 6
[[11]]
[1] 4 5 9
[[12]]
[1] 4 1 9
[[13]]
[1] 4 1 12 6
[[14]]
[1] 5 7 9
[[15]]
[1] 6 8
[[16]]
[1] 7 12
[[17]]
[1] 8 15
[[18]]
[1] 8 10
[[19]]
[1] 9 1 11
[[20]]
[1] 11 14
[[21]]
[1] 12 15
Warning message:
In maximal.cliques(g) :
At maximal_cliques_template.h:203 :Edge directions are ignored for maximal clique calculation
According to Vincent's answer below, I am doubtful whether the following finds cliques of exactly size 3 or cliques of size 3 and greater (I just need the triangles). One problem is that this code is super slow. Any idea how to speed it up?
library(igraph)
set.seed(1)
g <- erdos.renyi.game(100, .6)
#print(g)
plot(g)
ij <- get.edgelist(g)
print(ij)
library(Matrix)
m <- sparseMatrix(
  i = rep(seq(nrow(ij)), each=2),
  j = as.vector(t(ij)),
  x = 1
)
print(m)
# Maximal cliques of size at least 3
cl <- maximal.cliques(g)
print(cl)
cl <- cl[ sapply(cl, length) > 2 ]
print(cl)
# Function to test if an edge is part of a triangle
triangle <- function(e) {
  any( sapply( cl, function(u) all( e %in% u ) ) )
}
print(triangle)
# Only keep those edges
kl <- ij[ apply(ij, 1, triangle), ]
print(kl)
# Same code as before
m <- sparseMatrix(
  i = rep(seq(nrow(kl)), each=2),
  j = as.vector(t(kl)),
  x = 1
)
print(m)
Also, for some reason the function cocluster tells me that the output m is not a matrix. Any idea what I should do to use the sparse matrix m with the cocluster function?
> library("blockcluster")
> out<-cocluster(m,datatype="binary",nbcocluster=c(2,3))
Error in cocluster(m, datatype = "binary", nbcocluster = c(2, 3)) :
Data should be matrix.

The following gives you an edge/vertex adjacency matrix,
but for all edges, not just those included in triangles.
library(igraph)
set.seed(1)
g <- erdos.renyi.game(6, .6)
plot(g)
ij <- get.edgelist(g)
library(Matrix)
m <- sparseMatrix(
  i = rep(seq(nrow(ij)), each=2),
  j = as.vector(t(ij)),
  x = 1
)
As you suggest, you can use maximal.cliques to identify the edges that are part of a triangle: an edge lies in a triangle exactly when it lies in some maximal clique of size at least 3, so it is enough to filter the maximal cliques down to those with more than two vertices.
# Maximal cliques of size at least 3
cl <- maximal.cliques(g)
cl <- cl[ sapply(cl, length) > 2 ]
# Function to test if an edge is part of a triangle
triangle <- function(e) {
  any( sapply( cl, function(u) all( e %in% u ) ) )
}
# Only keep those edges
kl <- ij[ apply(ij, 1, triangle), ]
# Same code as before
m <- sparseMatrix(
  i = rep(seq(nrow(kl)), each=2),
  j = as.vector(t(kl)),
  x = 1
)
m
# 5 x 5 sparse Matrix of class "dgCMatrix"
# [1,] 1 1 . . .
# [2,] . 1 1 . .
# [3,] 1 . . . 1
# [4,] . 1 . . 1
# [5,] . . 1 . 1
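On the speed concern raised in the question: testing every edge against every maximal clique does a lot of redundant work, because an edge lies in a triangle exactly when its two endpoints share a common neighbor, and that can be checked edge by edge. Below is a minimal Python sketch of that test, assuming networkx and scipy are acceptable (this is my illustration, not part of the original answer); it builds the node-edge matrix the question asks for, with rows as nodes and columns as triangle edges:
import networkx as nx
import numpy as np
from scipy.sparse import coo_matrix

# random G(n, m) graph, analogous to erdos.renyi.game(15, 45, type="gnm")
g = nx.gnm_random_graph(15, 45, seed=1)

# an edge (u, v) lies in a triangle iff u and v share a common neighbor
tri_edges = [(u, v) for u, v in g.edges() if set(g[u]) & set(g[v])]

# node-edge incidence matrix: rows = nodes, columns = triangle edges
rows = [x for e in tri_edges for x in e]               # both endpoints of each edge
cols = [k for k in range(len(tri_edges)) for _ in (0, 1)]
m = coo_matrix((np.ones(len(rows)), (rows, cols)),
               shape=(g.number_of_nodes(), len(tri_edges)))
print(m.toarray())
The common-neighbor test costs roughly deg(u) + deg(v) per edge, independent of how many maximal cliques the graph has.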

Related

Create dynamic nested for loops

I have some arrays of m rows by 2 columns (like series of coordinates), and I want to automate my code so that I do not write a nested loop for every coordinate array. Here is my code; it runs well and gives the right coordinates, but I want to make the loop dynamic:
import numpy as np
A = np.array([[1,5,7,4,6,2,2,6,7,2],
              [2,8,2,9,3,9,8,5,6,2],
              [3,4,0,2,4,3,0,2,6,7],
              [1,5,7,3,4,5,2,7,9,7],
              [6,2,8,8,6,7,9,6,9,7],
              [0,2,0,3,3,5,2,3,5,5],
              [5,5,5,0,6,6,8,5,9,0],
              [0,5,7,6,0,6,9,9,6,7],
              [5,5,8,5,0,8,5,3,5,5],
              [0,0,6,3,3,3,9,5,9,9]])
number = 8292
number = np.asarray([int(i) for i in str(number)])  # split number into an array of digits
# the coordinates of every single value contained in the required number
coord1 = np.asarray(np.where(A == number[0])).T
coord2 = np.asarray(np.where(A == number[1])).T
coord3 = np.asarray(np.where(A == number[2])).T
coord4 = np.asarray(np.where(A == number[3])).T
coordinates = np.array([[0,0]])  # initialize the array that will collect all the desired coordinates
solutions = 0                    # initialize the counter for the number of solutions
for j in coord1:
    j = j.reshape(1, -1)
    for i in coord2:
        i = i.reshape(1, -1)
        if (i[0,0]==j[0,0]+1 and i[0,1]==j[0,1]) or (i[0,0]==j[0,0]-1 and i[0,1]==j[0,1]) or (i[0,0]==j[0,0] and i[0,1]==j[0,1]+1) or (i[0,0]==j[0,0] and i[0,1]==j[0,1]-1):
            for ii in coord3:
                ii = ii.reshape(1, -1)
                if (np.array_equal(ii,j)==0 and ii[0,0]==i[0,0]+1 and ii[0,1]==i[0,1]) or (np.array_equal(ii,j)==0 and ii[0,0]==i[0,0]-1 and ii[0,1]==i[0,1]) or (np.array_equal(ii,j)==0 and ii[0,0]==i[0,0] and ii[0,1]==i[0,1]+1) or (np.array_equal(ii,j)==0 and ii[0,0]==i[0,0] and ii[0,1]==i[0,1]-1):
                    for iii in coord4:
                        iii = iii.reshape(1, -1)
                        if (np.array_equal(iii,i)==0 and iii[0,0]==ii[0,0]+1 and iii[0,1]==ii[0,1]) or (np.array_equal(iii,i)==0 and iii[0,0]==ii[0,0]-1 and iii[0,1]==ii[0,1]) or (np.array_equal(iii,i)==0 and iii[0,0]==ii[0,0] and iii[0,1]==ii[0,1]+1) or (np.array_equal(iii,i)==0 and iii[0,0]==ii[0,0] and iii[0,1]==ii[0,1]-1):
                            point = np.concatenate((j, i, ii, iii))
                            coordinates = np.append(coordinates, point, axis=0)
                            solutions += 1
coordinates = np.delete(coordinates, (0), axis=0)
import itertools
A = [1, 2, 3]
B = [4, 5, 6]
C = [7, 8, 9]
for (a, b, c) in itertools.product(A, B, C):
    print(a, b, c)
outputs:
1 4 7
1 4 8
1 4 9
1 5 7
1 5 8
1 5 9
1 6 7
1 6 8
1 6 9
2 4 7
2 4 8
2 4 9
2 5 7
2 5 8
2 5 9
2 6 7
2 6 8
2 6 9
3 4 7
3 4 8
3 4 9
3 5 7
3 5 8
3 5 9
3 6 7
3 6 8
3 6 9
See documentation for details.
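To tie this back to the question: itertools.product can replace the hand-written nesting over any number of coordinate lists. Below is a sketch under the question's rules (consecutive cells must be orthogonally adjacent, and a cell must differ from the cell two steps back); the function and helper names are mine, not from the original answer:
import numpy as np
from itertools import product

def adjacent(p, q):
    # orthogonally adjacent: Manhattan distance exactly 1
    return abs(p[0] - q[0]) + abs(p[1] - q[1]) == 1

def find_paths(A, number):
    digits = [int(d) for d in str(number)]
    # one coordinate list per digit, generalizing coord1..coord4
    coords = [np.argwhere(A == d) for d in digits]
    solutions = []
    for path in product(*coords):
        ok = all(adjacent(path[k], path[k + 1]) for k in range(len(path) - 1)) \
             and all(not np.array_equal(path[k], path[k + 2])
                     for k in range(len(path) - 2))
        if ok:
            solutions.append(np.vstack(path))
    return solutions
find_paths(A, 8292) should then reproduce the coordinate blocks collected by the nested loops, one 4x2 array per solution, and it works for any number of digits.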

Writing a program for Ulam spiral diagonal numbers in Python

I am writing a program to compute the Ulam spiral diagonal numbers, and this is the code I typed myself:
t = 1
i = 2
H = [1]
while i < 25691:
    for n in range(4):
        t += i
        H.append(t)
    i += 2
print(H)
The number "25691" in the code is the side lenght of the spiral.If it was 7 then the spiral would contain 49 numbers etc.
Here H will give you the all numbers in diagonal. But I wonder is there a much faster way to do this.
For example if I increase the side lenght large amount it really takes forever to calculate the next H.
Code Example:
t = 1
i = 2
H = [1]
for j in range(25000, 26000):
    while i < j:
        for n in range(4):
            t += i
            H.append(t)
        i += 2
My computer cannot finish this calculation, so is there a faster way to do it?
You don't need to calculate the intermediate values:
Diagonal, horizontal, and vertical lines in the number spiral correspond to polynomials of the form f(n) = 4n^2 + bn + c, where b and c are integer constants (see the Wikipedia article on the Ulam spiral).
You can find b and c by solving a linear system of equations for two numbers.
17 16 15 14 13
18 5 4 3 12 ..
19 6 1 2 11 28
20 7 8 9 10 27
21 22 23 24 25 26
E.g. for the diagonal 1, 2, 11, 28, ...:
f(0) = 4*0*0+0*b+c = 1 => c = 1
f(1) = 4*1*1+1*b+1 = 2 => 5+b = 2 => b = -3
f(2) = 4*2*2+2*(-3)+1 = 11
f(3) = 4*3*3+3*(-3)+1 = 28
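Building on that formula, the whole list can be generated without the incremental loop: ring k of the spiral (side length 2k+1) contributes the four diagonal values (2k+1)^2 - 6k, (2k+1)^2 - 4k, (2k+1)^2 - 2k, and (2k+1)^2, in exactly the order the original loop appends them. A vectorized numpy sketch (my own illustration, matching the H produced above under that assumption):
import numpy as np

def spiral_diagonals(limit):
    # rings k = 1 .. (limit - 1) // 2; the original loop runs i = 2k while i < limit
    k = np.arange(1, (limit - 1) // 2 + 1)
    sq = (2 * k + 1) ** 2                      # the odd squares on one diagonal
    # four corner values per ring, ascending, as the loop appends them
    corners = np.stack([sq - 6 * k, sq - 4 * k, sq - 2 * k, sq], axis=1)
    return np.concatenate(([1], corners.ravel()))

H = spiral_diagonals(25691)   # should equal the H from the while-loop version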

Creating a subarray with the number of subarrays passed as an argument in Python

I have a large 100x15 array like this:
[a b c d e f g h i j k l m n o]
[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
.
.
.(Up to 100 rows)
I want to select portions of this data into subsets using a function with an argument 'k' that controls the split; say k=5 means the 15 data attributes are divided into 3 subsets of 5 columns each, like below:
[a b c d e] [f g h i j] [k l m n o]
[1 2 3 4 5] [6 7 8 9 10] [11 12 13 14 15]
[1 2 3 4 5] [6 7 8 9 10] [11 12 13 14 15]
[1 2 3 4 5] [6 7 8 9 10] [11 12 13 14 15]
[1 2 3 4 5] [6 7 8 9 10] [11 12 13 14 15]
.
.
.(Up to 100 rows)
and they are stored in a different array. I want to implement this in Python. I have implemented it partially; can anyone complete it and provide the code in an answer?
Partial logic for the inner loop
given k
increment = length of array / k
start_index = 0
end_index = increment
for j from start_index to end_index:
    start_index = end_index + 1
    end_index = end_index + increment
    # newarray[][] (I'm not sure about this part)
Thank You.
This returns an array of matrices each with 2 columns, which works for k=2:
import numpy as np

def portion(mtx, k):
    array = []
    array.append(mtx[:, :k])
    for i in range(1, mtx.shape[1] - 1):
        array.append(mtx[:, k*i:k*(i+1)])
    return array[:k+1]

mtx = np.matrix([[1,2,3,10,13,14], [4,5,6,11,15,16], [7,8,9,12,17,18]])
k = 2
print(portion(mtx, k))
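For equal-width column splits, numpy already provides this directly; a simpler alternative (my suggestion, not from the original answers) is np.hsplit, or np.array_split with axis=1 when the pieces may be unequal:
import numpy as np

a = np.arange(100 * 15).reshape(100, 15)   # stand-in for the 100x15 array
subsets = np.hsplit(a, 3)                  # three 100x5 blocks, as in the question
print([s.shape for s in subsets])          # [(100, 5), (100, 5), (100, 5)]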
Unfortunately I had to do it myself, and this is the Python code for the logic. Anyway, thanks to @astaning for the attempt.
import numpy as np

def build_rotationtree_model(k):
    mtx = np.array([[2.95,6,63,23],[2,53,7,79],[3.57,5,65,32],[3.16,5,47,34],
                    [21,2.58,4,46],[3.1,2.16,6,22],[3.5,3.27,3,52],[12,2.56,4,42]])
    # Length of attributes (width of matrix)
    a = mtx.shape[1]
    newArray = [[0 for x in range(k)] for y in range(len(mtx))]
    # Height of matrix (total rows)
    b = mtx.shape[0]
    # Separation limit
    limit = a / k
    # Start of sub matrix
    start = 0
    # End of sub matrix
    end = a / k
    print(end)
    print(a)
    # Loop
    while end != a:
        for i in range(0, b - 1):
            for j in range(start, int(end)):
                newArray[i][j] = mtx[i][j]
            print(newArray[i])
        # Call LDA function and add the result to a sparse matrix
        # sparseMat = LDA(newArray)  # should be inside the loop
        start = int(end) + 1   # int() so range() accepts it on later passes
        end = end + limit
a = list(input())
for i in range(0, len(a)):
    for j in range(i, len(a)):
        for k in range(i, j + 1):
            print(a[k], end=" ")
        print("\n", end="")

Identify clusters linked by delta to the left and different delta to the right

Consider the sorted array a:
a = np.array([0, 2, 3, 4, 5, 10, 11, 11, 14, 19, 20, 20])
If I specified left and right deltas,
delta_left, delta_right = 1, 1
Then this is how I'd expect the clusters to be assigned:
# a = [ 0 . 2 3 4 5 . . . . 10 11 . . 14 . . . . 19 20
# 11 20
#
# [10--|-12] [19--|-21]
# [1--|--3] [10--|-12] [19--|-21]
# [-1--|--1] [3--|--5] [9--|-11] [18--|-20]
# +--+--|--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--|
# [2--|--4] [13--|-15]
#
# │ ╰──┬───╯ ╰┬─╯ │ ╰┬─╯
# │ cluster 2 Cluster 3 │ Cluster 5
# Cluster 1 Cluster 4
NOTE: Despite the interval [-1, 1] sharing an edge with [1, 3], neither interval includes a point adjacent to the other's, so the respective clusters are not joined.
Assuming the cluster assignments were stored in an array named clusters, I'd expect the results to look like this
print(clusters)
[1 2 2 2 2 3 3 3 4 5 5 5]
However, suppose I change the left and right deltas to be different:
delta_left, delta_right = 2, 1
This means that a value x should be combined with any other point in the interval [x - 2, x + 1].
# a = [ 0 . 2 3 4 5 . . . . 10 11 . . 14 . . . . 19 20
# 11 20
#
# [9-----|-12] [18-----|-21]
# [0-----|--3] [9-----|-12] [18-----|-21]
# [-2-----|--1][2-----|--5] [8-----|-11] [17-----|-20]
# +--+--|--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--|
# [1 ----|--4] [12-----|-15]
#
# ╰─────┬─────╯ ╰┬─╯ │ ╰┬─╯
# cluster 1 Cluster 2 │ Cluster 4
# Cluster 3
NOTE: Despite the interval [9, 12] sharing an edge with [12, 15], neither interval includes a point adjacent to the other's, so the respective clusters are not joined.
Assuming the cluster assignments were stored in an array named clusters, I'd expect the results to look like this:
print(clusters)
[1 1 1 1 1 2 2 2 3 4 4 4]
We will leverage np.searchsorted and logic to find cluster edges.
First, let's take a closer look at what np.searchsorted does:
Find the indices into a sorted array a such that, if the corresponding elements in v were inserted before the indices, the order of a would be preserved.
What I'll do is call np.searchsorted on a with the values a - delta_left. Let's look at that for delta_left = 1:
# a =
# [ 0 2 3 4 5 10 11 11 14 19 20 20]
#
# a - delta_left
# [-1 1 2 3 4 9 10 10 13 18 19 19]
-1 would get inserted at position 0 to maintain order
1 would get inserted at position 1 to maintain order
2 would get inserted at position 1 as well, indicating that 2 might be in the same cluster as 1
3 would get inserted at position 2 indicating that 3 might be in the same cluster as 2
so on and so forth
What we notice is that a new cluster can only start at an element whose value, less delta, would be inserted at that element's own current position.
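A quick numeric check of that left-hand rule on the a above (my own illustration):
import numpy as np

a = np.array([0, 2, 3, 4, 5, 10, 11, 11, 14, 19, 20, 20])
left = a.searchsorted(a - 1)             # delta_left = 1
print(left)                              # [0 1 1 2 3 5 5 5 8 9 9 9]
print(left == np.arange(len(a)))         # True at indices 0, 1, 5, 8, 9
# those five positions are exactly where the five clusters start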
We do this again for the right side, with one difference: by default, if a bunch of elements are equal, np.searchsorted inserts before them. To identify the ends of clusters, I want to insert after the identical elements, so I'll use the parameter side='right'.
If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index. If there is no suitable index, return either 0 or N (where N is the length of a).
Now the logic. A cluster can only begin if a prior cluster has ended, with the exception of the first cluster. We'll then consider a shifted version of the results of our second np.searchsorted
Let's now define our function
def delta_cluster(a, dleft, dright):
    # used to check whether searchsorted results land at their own positions
    rng = np.arange(len(a))
    edge_left = a.searchsorted(a - dleft)
    starts = edge_left == rng
    # prepend 0 to shift the right edges by one position
    edge_right = np.append(0, a.searchsorted(a + dright, side='right')[:-1])
    ends = edge_right == rng
    return (starts & ends).cumsum()
demonstration
with left, right deltas equal to 1 and 1
print(delta_cluster(a, 1, 1))
[1 2 2 2 2 3 3 3 4 5 5 5]
with left, right deltas equal to 2 and 1
print(delta_cluster(a, 2, 1))
[1 1 1 1 1 2 2 2 3 4 4 4]
Extra Credit
What if a isn't sorted?
I'll utilize information learned from this post
def delta_cluster(a, dleft, dright):
    s = a.argsort()
    size = s.size
    if size > 1000:
        # build the inverse permutation in O(n) rather than a second argsort
        y = np.empty(s.size, dtype=np.int64)
        y[s] = np.arange(s.size)
    else:
        y = s.argsort()
    a = a[s]
    rng = np.arange(len(a))
    edge_left = a.searchsorted(a - dleft)
    starts = edge_left == rng
    edge_right = np.append(0, a.searchsorted(a + dright, side='right')[:-1])
    ends = edge_right == rng
    return (starts & ends).cumsum()[y]
demonstration
b = np.random.permutation(a)
print(b)
[14 10 3 11 20 0 19 20 4 11 5 2]
print(delta_cluster(a, 2, 1))
[1 1 1 1 1 2 2 2 3 4 4 4]
print(delta_cluster(b, 2, 1))
[3 2 1 2 4 1 4 4 1 2 1 1]
print(delta_cluster(b, 2, 1)[b.argsort()])
[1 1 1 1 1 2 2 2 3 4 4 4]
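A side note on the size > 1000 branch above: y[s] = np.arange(s.size) builds the inverse of the permutation s in linear time, avoiding a second argsort. A tiny check (my own illustration):
import numpy as np

s = np.array([2, 0, 3, 1])                 # e.g. the result of an argsort
y = np.empty(s.size, dtype=np.int64)
y[s] = np.arange(s.size)                   # inverse permutation, O(n)
print(y)                                   # [1 3 0 2]
print(np.array_equal(y, s.argsort()))      # True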

Sampling a matrix with conditions (no zeros or repeated columns)

In case you are interested in the background of this question: I'm thinking about how to solve this post; incidentally, if you solve it there, I'll just erase this question. Ideally, I'd like an analytical or algebraic solution (a constrained non-capturing rook problem), but short of that I'd like a simulation. I have also posted a related question with less detail, in case that is easier to tackle.
But you don't have to leave this page. Basically, there are pairings between two lists of soccer teams, and some pairings are allowed while others are forbidden by the rules. The matrix is constructed as chessboard below.
So, to generate multiple samplings that match the teams in the row names (on the left) with the opposing teams in the column names (at the top), I have to come up with a conditional sampling procedure, but I have no clue how to do that.
This is what I have attempted so far:
BCN = c(0,2,3,4,0,0,7,8)
ATL = c(0,0,3,4,5,0,7,8)
DOR = c(0,0,3,4,5,6,7,0)
MON = c(1,2,3,0,5,6,7,0)
ARS = c(1,2,3,0,0,6,7,8)
LEI = c(1,2,3,4,0,6,0,8)
JUV = c(1,2,3,4,5,0,7,8)
NAP = c(1,2,0,4,5,6,7,8)
chessboard = t(as.matrix(data.frame(BCN, ATL, DOR, MON, ARS, LEI, JUV, NAP)))
colnames(chessboard) = c("MAD", "BYN", "BEN", "PSG", "MCY", "SEV", "OPO", "LEV")
chessboard
MAD BYN BEN PSG MCY SEV OPO LEV
BCN 0 2 3 4 0 0 7 8
ATL 0 0 3 4 5 0 7 8
DOR 0 0 3 4 5 6 7 0
MON 1 2 3 0 5 6 7 0
ARS 1 2 3 0 0 6 7 8
LEI 1 2 3 4 0 6 0 8
JUV 1 2 3 4 5 0 7 8
NAP 1 2 0 4 5 6 7 8
match = function() {
  vec = rep(0, 8)
  for (i in 1:8) {
    tryCatch({
      vec[i] = as.numeric(sample(as.character(
        chessboard[i, ][!(chessboard[i, ] %in% vec) & chessboard[i, ] > 0]), 1))
      last = chessboard[8, ][!(chessboard[8, ] %in% vec) & chessboard[i, ] > 0]
    }, error = function(e) {})
  }
  vec
}
match()
set.seed(0)
nsim = 100000
matches = t(replicate(nsim, match()))
matches = subset(matches, matches[,8]!=0)
colnames(matches) = c("BCN", "ATL", "DOR", "MON", "ARS", "LEI", "JUV", "NAP")
head(matches)
table = apply(matches, 2, function(x) table(x)/nrow(matches))
table
$BCN
x
2 3 4 7 8
0.1969821 0.2125814 0.1967272 0.1967166 0.1969927
$ATL
x
3 4 5 7 8
0.2016226 0.1874462 0.2357732 0.1875737 0.1875843
$DOR
x
3 4 5 6 7
0.1773264 0.1686188 0.2097673 0.2787270 0.1655605
$MON
x
1 2 3 5 6 7
0.2567882 0.2031199 0.1172017 0.1341921 0.1789617 0.1097365
$ARS
x
1 2 3 6 7 8
0.2368882 0.1907169 0.1104480 0.1651358 0.1026112 0.1941999
$LEI
x
1 2 3 4 6 8
0.2129743 0.1717302 0.1019210 0.1856410 0.1511081 0.1766255
$JUV
x
1 2 3 4 5 7 8
0.15873252 0.12940289 0.07889902 0.14203948 0.22837179 0.12845781 0.13409648
$NAP
x
1 2 4 5 6 7 8
0.1346168 0.1080481 0.1195272 0.1918956 0.2260675 0.1093436 0.1105011
Maybe try this:
# 'mat' is the availability matrix (the 'chessboard' above, zero where a pairing is forbidden)
matches = setNames(as.list(rep(NA, 8)), rownames(mat))
set.seed(1)
# For each row, sample a column, then drop that column.
# 'sample.int' will automatically renormalize the probabilities.
for (i in sample.int(8)) {
  team_i = rownames(mat)[i]
  j = sample.int(ncol(mat), 1, prob = mat[i, ])
  matches[[team_i]] = colnames(mat)[j]
  mat = mat[, -j, drop = FALSE]
}
> matches
# $Barcelona
# [1] "Oporto"
#
# $Atletico
# [1] "Benfica"
#
# $Dortmund
# [1] "Paris"
#
# $Juventus
# [1] "City"
#
# $Arsenal
# [1] "Sevilla"
#
# $Napoli
# [1] "Leverkusen"
#
# $Monaco
# [1] "Bayern"
#
# $Leicester
# [1] "Madrid"
It might be a good idea to add restrictions so you don't end up with a row that has only zero-probability columns left, which would make the sampling fail.
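For completeness, a Python sketch of the same sequential scheme (my own illustration; the function name and the retry-on-dead-end handling are assumptions, not part of the original answer):
import numpy as np

def sample_matching(mat, row_names, col_names, rng=None):
    # Visit rows in random order; in each row, pick one of the still-available
    # columns with probability proportional to its weight, then retire it.
    rng = np.random.default_rng() if rng is None else rng
    remaining = list(range(mat.shape[1]))
    matches = {}
    for i in rng.permutation(mat.shape[0]):
        weights = mat[i, remaining].astype(float)
        if weights.sum() == 0:
            return None                      # dead end: caller should retry
        j = rng.choice(len(remaining), p=weights / weights.sum())
        matches[row_names[i]] = col_names[remaining.pop(j)]
    return matches

# tiny example: three teams, same-name pairings forbidden
avail = 1 - np.eye(3, dtype=int)
result = None
while result is None:                        # retry on dead ends
    result = sample_matching(avail, ["A", "B", "C"], ["X", "Y", "Z"])
print(result)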
