Python: Slow algorithm with communities

I have two different community structures over the same set of nodes. Both community structures are stored in a dictionary (key: name of the community (string); value: nodes in this community (list of ints)) like this:
communities_map_friendship:
C0:[0, 20, 48, 55, 60, 68, 79, 81, 85, ..., 78190]
C1:[1, 6, 10, 13, 18, 19, 22, 24, 26, ..., 78180]
C2:[7, 21, 25, 29, 36, 37, 42, 49, 70, ..., 78146]
C3:[40, 86, 103, 123, 129, 143, 154, 167, ..., 78172]
C4:[66, 83, 133, 169, 174, 175, 205, 237, ..., 78166]
C5:[179, 182, 188, 219, 228, 248, 265, 286, ..., 77981]
communities_map_uservotes:
C0:[0, 20, 41, 48, 55, 60, 68, 79, 81, 85, ..., 78190]
C1:[1, 6, 10, 13, 18, 19, 24, 26, 28, 30, 31, ..., 78173]
C2:[22, 39, 43, 47, 53, 61, 69, 73, 97, 102, ..., 78180]
C3:[7, 21, 25, 29, 36, 37, 42, 49, 70, 80, 83, ..., 78166]
C4:[183, 483, 608, 1453, 2205, 2957, 3090, 3378, ..., 78149]
My goal is to count the cases where two different nodes are in one of the community lists in both structures (e.g.: (0,20), (0,48), (20,48), ..., (1,6), (1,10), (6,10), ..., (7,21), ...). It is important that it does not have to be the same community in both. For example, the nodes 7 and 21 are in community C2 in the first structure but in community C3 in the second structure, and this pair should still be counted.
What I have already tried:
# Return True if the two nodes are in the same community, otherwise False
def Is_In_Same_Community(node1, node2, community_map):
    for community in community_map.values():
        if((node1 in community) and (node2 in community)):
            return True
        elif(((node1 in community) and (node2 not in community)) or ((node1 not in community) and (node2 in community))):
            return False
    return False
# The algorithm, which counts the appropriate value:
TP=0
for community in communities_map_friendship.values():
    res = [Is_In_Same_Community(x,y,communities_map_uservotes)
           for i,x in enumerate(community) for j,y in enumerate(community) if i != j]
    TP = TP + res.count(True)
The algorithm is correct, but the problem is that I have around 30,000 nodes, so it would run for days before I got the result.
Does anyone have an idea how to speed up this algorithm?

This shouldn't take days for 30000 nodes, and the while loop could still be optimized some:
def count_pairs( cm1, cm2 ):
    count = 0
    for k,l1 in cm1.items():
        if k not in cm2:
            continue
        l2 = cm2[k]
        i1 = i2 = 0
        common = []
        # merge-style walk over the two sorted lists to collect the shared nodes
        while i1 < len(l1) and i2 < len(l2):
            v1 = l1[i1]
            v2 = l2[i2]
            if v1 < v2:
                i1 += 1
            elif v1 > v2:
                i2 += 1
            else:
                common.append(v1)
                i1 += 1
                i2 += 1
        # unordered pairs of two different shared nodes: n*(n-1)/2
        count += len(common)*(len(common)-1)//2
    return count
Taking the latest version of the question into account:
def count_pairs( cm1, cm2 ):
    count = 0
    for k,l1 in cm1.items():
        for k2,l2 in cm2.items():
            i1 = i2 = 0
            common = []
            # merge-style walk over the two sorted lists to collect the shared nodes
            while i1 < len(l1) and i2 < len(l2):
                v1 = l1[i1]
                v2 = l2[i2]
                if v1 < v2:
                    i1 += 1
                elif v1 > v2:
                    i2 += 1
                else:
                    common.append(v1)
                    i1 += 1
                    i2 += 1
            # unordered pairs of two different shared nodes: n*(n-1)/2
            count += len(common)*(len(common)-1)//2
    return count

There's a different way to approach this. Consider two lists:
l1 = [0, 20, 48, 55]
l2 = [0, 20, 41, 48, 60]
The shared pairs between these lists are just the 2-element permutations (or combinations, if you don't want (0, 20) and (20, 0) to be distinct) of the shared members. For example, the intersection of these lists is:
set(l1) & set(l2)
# {0, 20, 48}
So the shared pairs are (0, 20), (20, 0), (0, 48), (48, 0), (20, 48), (48, 20).
If you only care about the count, then you don't even need to enumerate those pairs, because the number of ordered pairs of n shared members is given by the formula:
n! / (n - 2)! = n * (n - 1)
With that in mind, you can just take the product of the communities from each map and add up the permutation counts of the shared nodes:
from itertools import product
import math
mf = {
    "C0":[0, 20, 48, 55],
    "C1":[1, 6, 10, 13],
    "C2":[7, 21, 25, 55],
}
mu = {
    "C0":[0, 20, 41, 48, 60],
    "C1":[1, 6, 10, 13, 18],
    "C3":[7, 21, 25, 29],
}
TP = 0
for p1, p2 in product(mf.values(), mu.values()):
    num_common = len(set(p1) & set(p2))
    if num_common >= 2:
        TP += math.factorial(num_common)//math.factorial((num_common - 2))
print(TP) # 24
Which is the same answer you get with your code.
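If every node belongs to at most one community per structure (an assumption; the question does not say so explicitly), the pairwise product over communities can be avoided by grouping nodes on the pair of labels they carry. The sketch below is only one possible way to do that; the function name count_shared_pairs and the small sample dictionaries are made up for illustration. It counts ordered pairs, like the code in the question.

```python
from collections import Counter

def count_shared_pairs(cm1, cm2):
    """Count ordered pairs of different nodes that share a community in both maps.

    Assumes each node appears in at most one community of cm2.
    """
    # node -> community label in the second structure
    label2 = {node: name for name, nodes in cm2.items() for node in nodes}
    groups = Counter()
    for name1, nodes in cm1.items():
        for node in nodes:
            if node in label2:
                # nodes with the same (label1, label2) key are together in both structures
                groups[(name1, label2[node])] += 1
    # n nodes in one group give n * (n - 1) ordered pairs
    return sum(n * (n - 1) for n in groups.values())

# hypothetical sample data with disjoint communities
friendship = {"C0": [0, 20, 48, 55], "C1": [1, 6, 10, 13], "C2": [7, 21, 25]}
uservotes = {"C0": [0, 20, 41, 48, 60], "C1": [1, 6, 10, 13, 18], "C3": [7, 21, 25, 29]}
print(count_shared_pairs(friendship, uservotes))  # 24
```

This makes a single pass over the nodes, so the running time grows with the number of nodes rather than with the number of node pairs.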

Related

Modifying alternate indices of 3d numpy array

I have a numpy array with shape (140, 23, 2): 140 frames, 23 objects, and x,y locations. The data has been generated by a GAN, and when I animate the movement it's very jittery. I want to smooth it by converting the coordinates for each object so that every odd-numbered index becomes the midpoint between the even-numbered indices on either side of it, e.g.
x[1] = (x[0] + x[2]) / 2
x[3] = (x[2] + x[4]) / 2
Below is my code:
def smooth_coordinates(df):
    # df shape is (140, 23, 2)
    # iterate through each object (23)
    for j in range(len(df[0])):
        # iterate through 140 frames
        for i in range(len(df)):
            # if it's an odd index and at least one index follows it
            if (i%2 != 0) and (i < (len(df[0])-2)):
                df[i][j][0] = ( (df[i-1][j][0]+df[i+1][j][0]) /2 )
                df[i][j][1] = ( (df[i-1][j][1]+df[i+1][j][1]) /2 )
    return df
Aside from it being very inefficient, my input df and output df are identical. Any suggestions for how to achieve this more efficiently?
import numpy as np
a = np.random.randint(100, size=[140, 23, 2])  # input array
b = a.copy()
i = np.ogrid[1: a.shape[0]-1: 2]  # odd indices
i
# [  1,   3,   5,   7,   9,  11,  13,  15,  17,  19,  21,  23,  25,
#   27,  29,  31,  33,  35,  37,  39,  41,  43,  45,  47,  49,  51,
#   53,  55,  57,  59,  61,  63,  65,  67,  69,  71,  73,  75,  77,
#   79,  81,  83,  85,  87,  89,  91,  93,  95,  97,  99, 101, 103,
#  105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129,
#  131, 133, 135, 137]
(a == b).all()  # testing for equality
# True
a[i] = (a[i-1] + a[i+1]) / 2  # averaging positions across frames
(a == b).all()  # testing for equality again
# False
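The same averaging can also be written with plain slices instead of an index array. The following is a minimal sketch under a few assumptions: the names frames and smoothed are made up, random floats stand in for the GAN output, and the result is written to a copy so it can be compared with the input. A likely reason the question's function appeared to change nothing is that it modifies df in place and returns the same object, so "input" and "output" are one array; its bound len(df[0])-2 also refers to the 23 objects rather than the 140 frames.

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.random((140, 23, 2))   # stand-in for the GAN output (float coordinates)
smoothed = frames.copy()            # work on a copy so the input stays untouched

# odd frames 1, 3, ..., 137 become the mean of their even neighbours
smoothed[1:-1:2] = (frames[:-2:2] + frames[2::2]) / 2

print(np.array_equal(smoothed, frames))              # False: odd frames changed
print(np.array_equal(smoothed[::2], frames[::2]))    # True: even frames unchanged
```

Using a float array avoids the truncation that happens when averaged values are assigned back into an integer array.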

Problem with adding elements from functions to list (too much memory is using?)

Starting from this code:
import matplotlib.pyplot as plt
# parameters for Romeo and Juliet; for their feelings to stay constant they must be greater than 0
aR = 0.5
aL = 0.7
# pR, pL: Romeo's/Juliet's responses to love
pR = 0.2
pL = 0.5
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
rom = []
jul = []
def Romeo(n):
    if n == 0:
        return 1
    return Romeo(n - 1)*aR
def Julia(n):
    if n == 0:
        return 1
    return Julia(n - 1)*aL
def alfa(n):
    if n == 0:
        return 1
    return aR*Romeo(n - 1) + pR*Julia(n - 1)
def beta(n):
    if n == 0:
        return 1
    return aL*Julia(n - 1) + pL*Romeo(n - 1)
j = 0
while j < 100:
    rom.append(alfa(j))
    j+=1
j = 0
while j < 100:
    jul.append(beta(j))
    j+=1
plt.plot(x, rom, label = "Romeo love")
plt.plot(x, jul, label = "Julia love")
plt.xlabel("Days")
plt.ylabel("Romeo love")
plt.title("Some graph")
plt.legend()
plt.show()
I replaced only the alfa and beta functions, which gives this:
import matplotlib.pyplot as plt
# parameters for Romeo and Juliet; for their feelings to stay constant they must be greater than 0
aR = 0.5
aL = 0.7
# pR, pL: Romeo's/Juliet's responses to love
pR = 0.2
pL = 0.5
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, ]
rom = []
jul = []
def Romeo(n):
    if n == 0:
        return 1
    return Romeo(n - 1)*aR
def Julia(n):
    if n == 0:
        return 1
    return Julia(n - 1)*aL
def alfa(n):
    if n == 0:
        return 1
    return round(aR*alfa(n - 1) + pR*beta(n - 1), 3)
def beta(n):
    if n == 0:
        return 1
    return round(aL*beta(n-1) + pL*alfa(n - 1), 3)
j = 0
while j < 100:
    rom.append(alfa(j))
    j+=1
j = 0
while j < 100:
    jul.append(beta(j))
    j+=1
plt.plot(x, rom, label = "Romeo love")
plt.plot(x, jul, label = "Julia love")
plt.xlabel("Days")
plt.ylabel("Romeo love")
plt.title("Some graph")
plt.legend()
plt.show()
Now PyCharm does not finish running it (it does not draw the graph), or it would take a very long time. Earlier this was not a problem.
I thought that the many digits after the decimal point could be the reason, so I rounded every number in the list, but that didn't solve the problem.
What did I change by replacing these functions? How can I fix it?
I'm pretty sure the problem is in appending the values from the functions to the lists (the two while loops), but I don't know why.
The current recursive approach is wasteful.
For example, computing alfa(1) requires alfa(0) and beta(0).
When you move on to alfa(2), the code first computes alfa(1) and beta(1). Then alfa(1) calls alfa(0) and beta(0), while beta(1) separately calls alfa(0) and beta(0) again, without reusing what was computed before. So you need 6 calls for alfa(2).
At alfa(3), you compute alfa(2) and beta(2), each of which needs 6 further calls, so you need 14 calls.
Imagine how many calls you would need at n == 100; the answer is 2535301200456458802993406410750 (that is, 2^101 - 2). Cumulatively, i.e., since you want to plot alfa(1), ..., alfa(100), you need 5070602400912917605986812821300 calls in total, only to produce the single list rom.
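The call counts quoted above can be checked with a short recurrence. This is only an illustrative sketch; the helper name calls_needed is made up, and it counts the calls made below alfa(n) when nothing is cached.

```python
def calls_needed(n):
    """Number of recursive calls triggered by alfa(n) without caching."""
    if n == 0:
        return 0
    # alfa(n) calls alfa(n-1) and beta(n-1); each is one call plus its own subtree,
    # and by symmetry beta(n-1) costs the same as alfa(n-1)
    return 2 * (1 + calls_needed(n - 1))

print(calls_needed(2))    # 6
print(calls_needed(3))    # 14
print(calls_needed(100))  # 2535301200456458802993406410750
print(sum(calls_needed(n) for n in range(1, 101)))  # 5070602400912917605986812821300
```

The closed form is 2^(n+1) - 2, which is why the running time roughly doubles with every additional day.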
You can use memoization to remember the previously calculated results and recycle them.
In Python, you can achieve this with functools.lru_cache (see the Python documentation); put
from functools import lru_cache
at the beginning of your code and then put
@lru_cache()
before each function; e.g.,
@lru_cache()
def Romeo(n):
    if n == 0:
        return 1
    return Romeo(n - 1)*aR
You will see the graph almost immediately now.
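Applied to the two mutually recursive functions from the question, the decorator might look like the sketch below. The parameter values are the ones from the question; maxsize=None is a choice made here (not something the answer specifies) so that no cached result is ever evicted, and the plotting part is left out.

```python
from functools import lru_cache

aR, aL = 0.5, 0.7   # parameters taken from the question
pR, pL = 0.2, 0.5

@lru_cache(maxsize=None)
def alfa(n):
    if n == 0:
        return 1
    # alfa(n - 1) and beta(n - 1) are looked up in the cache after their first computation
    return round(aR * alfa(n - 1) + pR * beta(n - 1), 3)

@lru_cache(maxsize=None)
def beta(n):
    if n == 0:
        return 1
    return round(aL * beta(n - 1) + pL * alfa(n - 1), 3)

rom = [alfa(j) for j in range(100)]
jul = [beta(j) for j in range(100)]
print(rom[:3], jul[:3])
```

With the cache in place each value of n is computed once per function, so building both lists takes on the order of a few hundred calls.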

Multiple Traveling Salesmen Problem with MIP

I've been trying to turn a normal TSP into an mTSP, starting from the existing TSP example code from MIP.
This is the code I have so far in Python, and it throws an error that I don't understand:
from mip import Model, xsum, minimize, BINARY
places = ['Antwerp', 'Bruges', 'C-Mine', 'Dinant', 'Ghent',
          'Grand-Place de Bruxelles', 'Hasselt', 'Leuven',
          'Mechelen', 'Mons', 'Montagne de Bueren', 'Namur',
          'Remouchamps', 'Waterloo']
salesman=['Salesman1','Salesman2']
# distances in an upper triangular matrix
dists = [[83, 81, 113, 52, 42, 73, 44, 23, 91, 105, 90, 124, 57],
         [161, 160, 39, 89, 151, 110, 90, 99, 177, 143, 193, 100],
         [90, 125, 82, 13, 57, 71, 123, 38, 72, 59, 82],
         [123, 77, 81, 71, 91, 72, 64, 24, 62, 63],
         [51, 114, 72, 54, 69, 139, 105, 155, 62],
         [70, 25, 22, 52, 90, 56, 105, 16],
         [45, 61, 111, 36, 61, 57, 70],
         [23, 71, 67, 48, 85, 29],
         [74, 89, 69, 107, 36],
         [117, 65, 125, 43],
         [54, 22, 84],
         [60, 44],
         [97],
         []]
# number of nodes, set of vertices, set of salesmen
n, V, S = len(dists), set(range(len(dists))), set(range(len(salesman)))
# distances matrix
c = [[0 if i == j
      else dists[i][j-i-1] if j > i
      else dists[j][i-j-1]
      for j in V] for i in V]
model = Model()
# binary variables indicating if arc (i,j) is used on the route or not
x = [[[model.add_var(var_type=BINARY) for j in V] for i in V] for s in S]
# objective function: minimize the distance
model.objective = minimize(xsum(c[i][j]*x[i][j][s] for i in V for j in V for s in S))
The error is:
IndexError Traceback (most recent call last)
<ipython-input-52-8550246fcd90> in <module>
48
49 # objective function: minimize the distance
---> 50 model.objective = minimize(xsum(c[i][j]*x[i][j][s] for i in V for j in V for s in S))
51
52
~/opt/anaconda3/lib/python3.7/site-packages/mip/model.py in xsum(terms)
1453 """
1454 result = mip.LinExpr()
-> 1455 for term in terms:
1456 result.add_term(term)
1457 return result
<ipython-input-52-8550246fcd90> in <genexpr>(.0)
48
49 # objective function: minimize the distance
---> 50 model.objective = minimize(xsum(c[i][j]*x[i][j][s] for i in V for j in V for s in S))
51
52
IndexError: list index out of range
This doesn't make sense to me, since I only added one more summation.
Thank you very much in advance.
This is a Python issue, not a solver issue: it comes from how nested list comprehensions are indexed.
We can reproduce this with:
x = [[["x%s%s%s" % (i,j,k) for i in range(2)] for j in range(2)] for k in range(3)]
i = 0
j = 0
k = 2
x[i][j][k]
The outermost comprehension supplies the first index, so an element of x is addressed as x[k][j][i], not x[i][j][k]. So in the above example we see:
[[['x000', 'x100'], ['x010', 'x110']], [['x001', 'x101'], ['x011', 'x111']], [['x002', 'x102'], ['x012', 'x112']]]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-13-a2e90cc605ea> in <module>()
4 j = 0
5 k = 2
----> 6 x[i][j][k]
IndexError: list index out of range
If we would have entered:
i = 0
j = 0
k = 2
print(x[k][j][i])
we would see:
x002
Conclusion: try:
x = [[[model.add_var(var_type=BINARY) for s in S] for j in V] for i in V]
...
x[i][j][s]
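The index order can be checked without a solver. In the sketch below the sizes of V and S and the names x_s_first and x_s_last are invented for illustration, and tuples stand in for the model variables.

```python
V = range(3)   # hypothetical: 3 cities
S = range(2)   # hypothetical: 2 salesmen

# The outermost "for" is s, so this list has to be indexed as x_s_first[s][i][j].
x_s_first = [[[(i, j, s) for j in V] for i in V] for s in S]

# The outermost "for" is i, so this list is indexed as x_s_last[i][j][s].
x_s_last = [[[(i, j, s) for s in S] for j in V] for i in V]

print(x_s_first[1][2][0])  # (2, 0, 1)
print(x_s_last[2][0][1])   # (2, 0, 1)

try:
    x_s_first[2][0][1]     # first index runs over S, which has only 2 entries
except IndexError as err:
    print("IndexError:", err)
```

The dimensions of the nested list follow the comprehension clauses from right to left, so the rightmost for gives the first index.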

Sum of multiple of 3+5 < 100

New to Jupyter Notebook. I'm writing this code to return the sum of the values that are multiples of both 3 and 5 and less than 100 in my range(1, 100). I have a feeling that I'm oversimplifying the code by leaving 3 and 5 out of the condition. Not sure how/where to include them.
print(list(range(1, 100)))
multiple35 = 0
for i in range (1, 100):
    if i % 15 == 0 and multiple35 <= 100:
        multiple35 += i
        print(multiple35)
My print line returns the range, Plus the 3 correct multiples less than 100. BUT ALSO prints 150, which is greater than and should be excluded from the result.
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
15
45
90
150
Appreciate your help here.
"BUT ALSO prints 150, which is greater than and should be excluded from the result."
The reason is simple. You are testing multiple35 <= 100 before the addition (multiple35 += i). So the sum is printed first and then tested in the next round. Therefore the output ends after the first occurrence that is bigger than 100.
By the way, it is useless to go through all natural numbers and only do anything on each 15th element (because of i % 15 == 0). You can use a tailored range instead:
>>> list(range(15,100,15))
[15, 30, 45, 60, 75, 90]
So a simplified loop which would stop printing when reaching 100, could look like:
multiple35 = 0
for i in range (15, 100, 15):
    multiple35 += i
    if multiple35 > 100:
        break # no reason to continue the loop, the sum will never go back below 100
    print(multiple35)
You have to check whether the new total would exceed the threshold. In your loop, when i=60, it satisfies the condition 60%15==0, and the running sum 90 also satisfies 90<=100; since both hold, we enter the block, which adds 60 and gives 150, exceeding the threshold. The solution is to not let a number into the adding part if adding it would cross the threshold.
There are simpler solutions, like the one above by @Melebius, but I wanted to explain this in the way the OP has written it:
multiple35 = 0
for i in range (1, 100):
    if i % 15 == 0 and multiple35+i<=100:
        multiple35 += i
        print(multiple35)
There are multiple flaws in your logic. I am not going to address them all, but instead suggest an alternate way to solve your issue.
Simply notice that multiples of 3 and 5 are exactly the multiples of 15. So there is no need to range over all numbers from 0 to 100.
for x in range(15, 100, 15):
    print(x)
# 15
# 30
# 45
# 60
# 75
# 90
You also mention that you want to sum all numbers. In python, you can sum over any iterator with sum, including a range.
print(sum(range(15, 100, 15)))
# 315
You can also use a list comprehension here:
values = [i for i in range(1,100) if i%5==0 if i%3==0]
print("Numbers divisible by 3 and 5:",values)
sum_of_numbers = 0
for i,items in enumerate(values):
    sum_of_numbers = sum_of_numbers+items
    if sum_of_numbers>100:
        break
print(values[:i])
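If the goal is the running sums that stay at or below 100, the loop-and-break pattern used in the answers can also be expressed with the standard library. This is one possible sketch; the names multiples and running_sums are made up here.

```python
from itertools import accumulate, takewhile

multiples = range(15, 100, 15)            # 15, 30, 45, 60, 75, 90

# accumulate yields the running sums; takewhile stops at the first one above 100
running_sums = list(takewhile(lambda total: total <= 100, accumulate(multiples)))
print(running_sums)  # [15, 45, 90]
```

Because the multiples are positive, the running sum only grows, so stopping at the first value above 100 loses nothing.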

Is there any way to condense a for-else loop in Python?

I have made a piece of code that prints the prime numbers up to the 10001st one. It currently takes up 4 lines of code, and I was wondering whether I could condense it further. Here it is:
for i in range(3,104744,2):
    for x in range(3,int(i/2),2):
        if i % x == 0 and i != x: break
    else: print(i)
I am aware that condensing code too much is usually not a good thing, but was wondering if it was possible.
Thanks.
You can use a list comprehension and any to get a one-liner solution:
>>> [p for p in range(2, 100) if not any (p % d == 0 for d in range(2, int(p**0.5) + 1))]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
It uses the fact that a composite number always has a divisor no larger than its square root.
It seems to work fine:
>>> len([p for p in range(2, 104744) if not any (p % d == 0 for d in range(2,int(p**0.5)+1))])
10001
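For readability, the same square-root test can be moved into a small helper. The sketch below is a variation rather than the answer's exact code: the name is_prime is made up, and it uses math.isqrt (available from Python 3.8) in place of int(p**0.5) to stay in integer arithmetic.

```python
from math import isqrt

def is_prime(p):
    """Trial division by every d with 2 <= d <= floor(sqrt(p))."""
    if p < 2:
        return False
    return all(p % d for d in range(2, isqrt(p) + 1))

primes = [p for p in range(2, 100) if is_prime(p)]
print(primes)
print(sum(1 for p in range(2, 104744) if is_prime(p)))  # 10001
```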
Using a list comprehension:
>>> r=range(2,100)
>>> [p for p in r if [p%d for d in r].count(0)<2]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
Try this one:
for i in range(3,100,2):
    if all( i%x for x in range(3, i//2, 2) ):
        print(i)
