How to generate random integer pairs between 0 and 31? - python

As the title says, I want to generate 100 pairs of random integers, such as (1, 0), (1, 31), (5, 7), (3, 19) and so on, where each value ranges from 0 to 31. At the same time, each pair should appear only once within the 100 pairs, i.e., every pair must be distinct from all the others.
How can I achieve this in Python?
Supplement:
To be precise, what I want is a 2D array, and its shape is (100,2). Each row is required to be unique.

You can use random.sample:
import random
pool = range(32)
random.sample(pool, 2)
# [7, 28]
random.sample(pool, 2)
# [15, 3]
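Note that this draws one pair at a time and does not, by itself, prevent the same pair from coming up twice across calls. To get all 100 distinct pairs in one call, you can let random.sample draw from the full set of candidate pairs (a minimal sketch; itertools.product enumerates all 32×32 combinations):
import itertools
import random

# All 1024 possible (i, j) pairs with 0 <= i, j <= 31.
all_pairs = list(itertools.product(range(32), repeat=2))
# Sampling without replacement guarantees the 100 pairs are distinct.
pairs = random.sample(all_pairs, 100)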

import numpy as np
pool_ = [(i, j) for i in range(32) for j in range(32)]
# List of all pairs
pool = np.zeros(len(pool_), dtype=object)  # create an np.array that holds the tuples as objects
pool[:] = pool_  # fill it with pool_
selected = np.random.choice(pool, replace=False, size=100)
# select 100 random choices from the pool, without replacement
print(selected)
# [(0, 22) (4, 30) (2, 25) (4, 19) (6, 6) (17, 22) (18, 14) (12, 27) (30, 6)
# (22, 18) (13, 5) (23, 22) (27, 17) (17, 26) (26, 22) (7, 15) (15, 27)
# (4, 31) (15, 1) (28, 22) (25, 16) (25, 15) (7, 12) (7, 21) (26, 14)
# (9, 9) (8, 0) (26, 27) (14, 14) (22, 0) (4, 18) (12, 3) (25, 9) (22, 31)
# (11, 6) (23, 7) (18, 19) (19, 25) (23, 19) (25, 5) (5, 19) (3, 24)
# (30, 0) (18, 10) (20, 4) (24, 11) (13, 28) (10, 5) (6, 7) (11, 7)
# (25, 24) (23, 18) (15, 10) (14, 7) (11, 11) (9, 23) (13, 8) (3, 28)
# (28, 3) (21, 3) (24, 31) (29, 27) (24, 28) (17, 6) (30, 19) (25, 28)
# (12, 17) (13, 15) (3, 11) (14, 1) (12, 6) (17, 17) (23, 2) (24, 18)
# (25, 11) (3, 26) (6, 2) (0, 28) (5, 12) (4, 1) (23, 17) (29, 23) (22, 17)
# (24, 15) (2, 5) (28, 11) (19, 27) (9, 20) (1, 11) (30, 5) (30, 21)
# (30, 28) (18, 31) (5, 27) (30, 11) (16, 0) (24, 16) (12, 30) (25, 25)
# (16, 22)]
I haven't thoroughly tested it, but with replace=False in np.random.choice each selection should be unique.
To return a 2D array:
import numpy as np
pool_ = [(i, j) for i in range(32) for j in range(32)]
# List of all pairs
pool_ix = np.arange(len(pool_))  # index by row
pool = np.array(pool_)  # 2D pool, shape (1024, 2)
selected_ix = np.random.choice(pool_ix, replace=False, size=100)
pool[selected_ix, :]  # select all of each selected row
# array([[12, 19],
# [ 6, 23],
# [ 2, 3],
# [ 5, 20],
# :::
# [20, 3],
# [24, 20],
# [ 1, 28]])
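As a side note (not part of the original answer): with NumPy's newer Generator API (NumPy >= 1.17), choice can sample rows of a 2-D array directly via its axis argument, which removes the index indirection entirely:
import numpy as np

rng = np.random.default_rng()
pool = np.array([(i, j) for i in range(32) for j in range(32)])  # shape (1024, 2)
# axis=0 samples whole rows; replace=False keeps them distinct.
selected = rng.choice(pool, size=100, replace=False, axis=0)  # shape (100, 2)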

Related

How can I effectively use reduce() to optimize my runtime here?

I'm trying to optimize my critical path. source_and_dest is a list of coordinates, state is a translation from one coordinate to another, and distance_squared is a two-dimensional list describing a relationship between two coordinates. My current approach, which is correct, is the following:
e = 0
for source, dest in self.source_and_dest:
    e += self.distance_squared[self.state[source]][self.state[dest]]
However, I run these lines millions of times and I need to improve my runtime (even if only slightly). My cumulative runtime for this is around 6.2s (with other benchmarks, this will grow to be much larger).
My understanding is that using functools.reduce() could improve runtime. My attempt, however, gave me much worse runtime (2x worse).
def do_sum(self, x1, x2):
    if type(x1) == int:
        return x1 + self.distance_squared[self.state[x2[0]]][self.state[x2[1]]]
    else:
        return self.distance_squared[self.state[x1[0]]][self.state[x1[1]]] + self.distance_squared[self.state[x2[0]]][self.state[x2[1]]]
...
e = functools.reduce(self.do_sum, self.source_and_dest)
I imagine there may be a way to use reduce() more effectively here, without having to check x1's type and without so many nested list accesses.
Here is some runnable code. Both approaches are included:
import functools
import cProfile

def torus_min_distance_xy(sourcexy, sinkxy, Nx, Ny):
    source_x = sourcexy % Nx
    source_y = sourcexy // Nx
    sink_x = sinkxy % Nx
    sink_y = sinkxy // Nx
    if source_x <= sink_x and source_y <= sink_y:
        return ((sink_x - source_x), (sink_y - source_y))
    elif source_x > sink_x and source_y <= sink_y:
        return ((Nx + sink_x - source_x), (sink_y - source_y))
    elif source_x <= sink_x and source_y > sink_y:
        return ((sink_x - source_x), (Ny + sink_y - source_y))
    else:
        return ((Nx - source_x + sink_x), (Ny - source_y + sink_y))

class Energy:
    def __init__(self):
        self.Nx = 8
        self.Ny = 9
        self.memoized_source_pe_and_dest_partitions = [(56, 5), (12, 33), (68, 14), (53, 15), (28, 33), (7, 24), (57, 5), (14, 22), (22, 28), (32, 19), (1, 28), (66, 17), (58, 0), (69, 14), (55, 7), (63, 12), (52, 15), (17, 22), (62, 12), (59, 0), (54, 7), (8, 29), (65, 1), (33, 29), (0, 32), (31, 70), (67, 17), (19, 24), (61, 8), (60, 8), (64, 1), (29, 32), (15, 31), (5, 19), (24, 31), (38, 16), (3, 26), (50, 9), (35, 4), (20, 26), (10, 23), (39, 16), (9, 18), (18, 20), (21, 25), (11, 20), (48, 2), (40, 6), (51, 9), (37, 10), (45, 3), (34, 4), (2, 18), (44, 3), (41, 6), (36, 10), (13, 30), (47, 11), (26, 30), (6, 21), (27, 71), (49, 2), (25, 23), (43, 13), (42, 13), (46, 11), (30, 21), (4, 27), (16, 25), (23, 27)]
        self.memoized_distance_squared = [[0 for a in range(self.Nx*self.Ny)] for b in range(self.Nx*self.Ny)]
        for source in range(self.Nx*self.Ny):
            for dest in range(self.Nx*self.Ny):
                tmp = torus_min_distance_xy(source, dest, self.Nx, self.Ny)
                self.memoized_distance_squared[source][dest] = (tmp[0]+1)*(tmp[1]+1)
        self.state = [59, 1, 2, 44, 4, 5, 6, 7, 28, 18, 37, 21, 62, 13, 14, 15, 38, 66, 51, 61, 46, 47, 22, 23, 69, 50, 45, 39, 60, 63, 30, 31, 67, 68, 34, 35, 36, 10, 27, 16, 40, 41, 42, 43, 26, 3, 20, 11, 48, 49, 25, 9, 52, 53, 54, 55, 56, 57, 58, 0, 33, 8, 29, 12, 64, 65, 32, 17, 24, 19, 70, 71]

    def do_sum(self, x1, x2):
        if type(x1) == int:
            return x1 + self.memoized_distance_squared[self.state[x2[0]]][self.state[x2[1]]]
        else:
            return self.memoized_distance_squared[self.state[x1[0]]][self.state[x1[1]]] + self.memoized_distance_squared[self.state[x2[0]]][self.state[x2[1]]]

    def loop(self, iterations):
        for i in range(iterations):
            # comment out one approach at a time
            # first approach
            e0 = 0
            for source_partition, dest_partition in self.memoized_source_pe_and_dest_partitions:
                e0 += self.memoized_distance_squared[self.state[source_partition]][self.state[dest_partition]]
            # second approach
            # e1 = functools.reduce(self.do_sum, self.memoized_source_pe_and_dest_partitions)

energy = Energy()
cProfile.runctx('energy.loop(100000)', globals(), locals())
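For what it's worth, one alternative worth timing (my own sketch, not from the original post): replace both the explicit loop and reduce() with sum() over a generator expression, and hoist the self.* attribute lookups into locals so they are resolved once per call instead of once per pair:
def loop_sum(self, iterations):
    # Hypothetical variant of Energy.loop: binding the attributes to
    # locals avoids repeated self.* lookups inside the hot loop.
    dist = self.memoized_distance_squared
    state = self.state
    pairs = self.memoized_source_pe_and_dest_partitions
    for _ in range(iterations):
        e = sum(dist[state[s]][state[d]] for s, d in pairs)
reduce() adds a Python-level function call per element, which plausibly explains why the do_sum version came out slower rather than faster.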

Unexpected output when applying LDA trained model to given corpus

I have trained an LDA model using the following parameters:
>> model = gensim.models.ldamodel.LdaModel(corpus=corpus,
                                           id2word=id2word,
                                           num_topics=25,
                                           passes=10,
                                           minimum_probability=0)
Then, I applied this model to a given corpus:
>> lda_corpus = model[corpus]
I was expecting lda_corpus to be a list of lists or a 2D matrix, where the number of rows is the number of docs, the number of columns is the number of topics, and each element is a tuple of the form (topic_index, probability). However, I am getting this odd result where some elements are themselves lists:
>> print(lda_model_1[corpus[0]])
>> ([(0, 0.012841966), (3, 0.073988825), (4, 0.05184835), (8, 0.38537887), (10, 0.022958927), (11, 0.24562633), (13, 0.05168812), (17, 0.06522224), (21, 0.024792604)], [(0, [11]), (1, [8, 3, 17, 13]), (2, [3, 17, 8, 13]), (3, [8, 3]), (4, [11]), (5, [8, 17, 3]), (6, [4]), (7, [4, 8]), (8, [8, 13, 3]), (9, [11]), (10, [8, 0]), (11, [8, 13, 0]), (12, [21]), (13, [11]), (14, [11]), (15, [8]), (16, [8, 11, 13, 0]), (17, [11]), (18, [11, 17]), (19, [8, 13, 17, 3]), (20, [17, 13, 8]), (21, [17, 11, 8]), (22, [11]), (23, [8]), (24, [8, 13]), (25, [8, 3, 13])], [(0, [(11, 1.0)]), (1, [(3, 0.15384258), (8, 0.71774876), (13, 0.011975089), (17, 0.11643356)]), (2, [(3, 0.45133045), (8, 0.21692151), (13, 0.09479065), (17, 0.23232804)]), (3, [(3, 0.24423833), (8, 0.75576156)]), (4, [(11, 1.0)]), (5, [(3, 0.02001735), (8, 1.6895359), (17, 0.2904468)]), (6, [(4, 1.0)]), (7, [(4, 1.2565874), (8, 0.7367453)]), (8, [(3, 0.05150538), (8, 0.8553984), (13, 0.07775658)]), (9, [(11, 2.0)]), (10, [(0, 0.13937186), (8, 0.8588695)]), (11, [(0, 0.023420962), (8, 0.7131521), (13, 0.263427)]), (12, [(21, 1.0)]), (13, [(11, 0.99124163)]), (14, [(11, 2.0)]), (15, [(8, 1.0)]), (16, [(0, 0.011193657), (8, 1.7189965), (11, 0.23104382), (13, 0.029387457)]), (17, [(11, 1.9989293)]), (18, [(11, 0.9135094), (17, 0.08400644)]), (19, [(3, 0.07146881), (8, 2.1837764), (13, 0.38799366), (17, 0.352704)]), (20, [(8, 0.22638415), (13, 0.24114841), (17, 0.52740365)]), (21, [(8, 0.02224951), (11, 0.24574266), (17, 0.7231928)]), (22, [(11, 1.0)]), (23, [(8, 1.0)]), (24, [(8, 0.972818), (13, 0.027181994)]), (25, [(3, 0.16742931), (8, 0.7671518), (13, 0.05224549)])])
I would appreciate any help.
The problem was related to model parameters. I was using the following config:
lda_model = gensim.models.ldamodel.LdaModel(corpus=corpus,
                                            id2word=id2word,
                                            num_topics=ntopics,
                                            random_state=100,
                                            update_every=1,
                                            chunksize=100,
                                            passes=10,
                                            alpha='auto',
                                            per_word_topics=True)
However, some of those parameters were unnecessary and were causing the trouble. In particular, per_word_topics=True makes the model return a 3-tuple per document (the topic distribution, the per-word topic assignments, and the per-word phi values) instead of a plain topic list, which is exactly the nested output shown above. The config I am using now is the following:
lda_model = gensim.models.ldamodel.LdaModel(corpus=corpus, id2word=id2word,
                                            num_topics=ntopics, update_every=1,
                                            chunksize=10000, passes=1)
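Alternatively, if you want to keep per_word_topics=True for the extra information it provides, the three parts can simply be unpacked (a sketch matching the output shown above):
# With per_word_topics=True, indexing the model with a document
# returns a 3-tuple rather than a flat topic list.
doc_topics, word_topics, phi_values = lda_model[corpus[0]]
print(doc_topics)  # just the (topic_index, probability) pairs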

Restart nested loop in python after condition is met

I have two ranges:
range_1 = range(0, 10)
range_2 = range(11, 40)
I want to create a list of tuples from the two ranges above, taking one element from each, whenever the sum of the two elements is an even number.
Thus 0 from range_1 plus 12 from range_2 gives 12, which is even; likewise 1 from range_1 plus 13 from range_2 gives 14, which is even.
However, I don't want to go through all the elements in range_2: only 5 matches are needed per element of range_1, after which I should immediately move on to the next iteration of range_1.
Thus for the first iteration:
(0, 12, 12), (0, 14, 14), (0, 16, 16), (0, 18, 18), (0, 20, 20)
then we go to the second iteration:
(1, 11, 12), (1, 13, 14), (1, 15, 16), (1, 17, 18), (1, 19, 20)
and so on till 9 in range_1:
(9, 11, 20), (9, 13, 22), (9, 15, 24), (9, 17, 26), (9, 19, 28)
Here is my code, which is wrong because it goes through all the elements in range_2:
list_1 = []
for i in range(10):
    for j in range(11, 40):
        if (i + j) % 2 == 0:
            list_1.append((i, j, i + j))
Just keep a counter so that once it reaches five you break out of the inner for-loop:
list_1 = []
for i in range(10):
    counter = 0
    for j in range(11, 40):
        if (i + j) % 2 == 0:
            list_1.append((i, j, i + j))
            counter += 1
            if counter == 5:
                break
which gives list_1 as:
[(0, 12, 12), (0, 14, 14), (0, 16, 16), (0, 18, 18), (0, 20, 20),
(1, 11, 12), (1, 13, 14), (1, 15, 16), (1, 17, 18), (1, 19, 20),
(2, 12, 14), (2, 14, 16), (2, 16, 18), (2, 18, 20), (2, 20, 22),
(3, 11, 14), (3, 13, 16), (3, 15, 18), (3, 17, 20), (3, 19, 22),
(4, 12, 16), (4, 14, 18), (4, 16, 20), (4, 18, 22), (4, 20, 24),
(5, 11, 16), (5, 13, 18), (5, 15, 20), (5, 17, 22), (5, 19, 24),
(6, 12, 18), (6, 14, 20), (6, 16, 22), (6, 18, 24), (6, 20, 26),
(7, 11, 18), (7, 13, 20), (7, 15, 22), (7, 17, 24), (7, 19, 26),
(8, 12, 20), (8, 14, 22), (8, 16, 24), (8, 18, 26), (8, 20, 28),
(9, 11, 20), (9, 13, 22), (9, 15, 24), (9, 17, 26), (9, 19, 28)]
It should be noted that this is not the most efficient way to build your data structure: only every other j-value generated in the inner for-loop is ever used, which is wasteful.
You could therefore give the j for-loop a step of 2 so that only every other j-value is considered. However, you must then be careful with the starting value: if you always started at 11 and stepped in 2s you would only get odd j-values, which can never combine with an even i-value to give an even sum. So the j for-loop should start at 12 when i is even and at 11 when i is odd.
As others have commented, you can simplify the problem quite a bit by constructing ranges that accommodate what you're trying to do. Here it is as a nested comprehension:
[(i, j, i+j) for i in range(0, 10) for j in range((11 if i % 2 else 12), 21, 2)]
If you're smart about creating your range, you can simplify the algorithm. Use a step to make the range skip every other j, adjust the start based on whether i is even, and set the end to 21 since it will only ever get that high.
list_1 = []
for i in range(10):
    start = 12 if i % 2 == 0 else 11
    for j in range(start, 21, 2):
        list_1.append((i, j, i + j))
    print(list_1[-5:])  # for testing: show the five tuples just added
First two lines of output:
[(0, 12, 12), (0, 14, 14), (0, 16, 16), (0, 18, 18), (0, 20, 20)]
[(1, 11, 12), (1, 13, 14), (1, 15, 16), (1, 17, 18), (1, 19, 20)]
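For completeness, the five-matches-then-move-on pattern can also be written with itertools.islice over a filtered generator (a sketch of my own, not from the original answers):
from itertools import islice

list_1 = [t for i in range(10)
          for t in islice(((i, j, i + j)
                           for j in range(11, 40) if (i + j) % 2 == 0), 5)]
# islice cuts each inner generator off after its first 5 matches,
# producing the same 50 tuples as the loop versions above.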

Joining two sets of tuples at a common value

Given:
setA = [(1, 25), (2, 24), (3, 23), (4, 22), (5, 21), (6, 20),
(7, 19), (8, 18), (9, 17), (10, 16), (11, 15), (12, 14),
(13, 13),(14, 12), (15, 11), (16, 10), (17, 9), (18, 8),
(19, 7),(20, 6), (21, 5), (22, 4), (23, 3), (24, 2), (25, 1)]
setB = [(1, 19), (2, 18), (3, 17), (4, 16), (5, 15), (6, 14), (7, 13),
(8, 12), (9, 11), (10, 10), (11, 9), (12, 8), (13, 7), (14, 6),
(15, 5), (16, 4), (17, 3), (18, 2), (19, 1)]
How can I combine the two lists, using the first element of each tuple as a common key value? For the tuples at position 1 in each list, (1, 25) and (1, 19), joining them would yield (25, 1, 19):
(25, 1, 19)
(24, 2, 18)
(23, 3, 17)
...
(7, 19, 1)
(6, 20, None)
...
(2, 24, None)
(1, 25, None)
Note: the order within each output tuple must be maintained, i.e.
(setA value, common value, setB value)
Note: must use only the Python 2.7.x standard library.
I'm trying to do something like [(a,b,c) for (a,b),(b,c) in zip(setA,setB)] but I don't fully understand the proper syntax and logic.
Thank you.
Seems like what you want can be implemented as easily as a dictionary lookup on setB inside a list comprehension.
mapping = dict(setB)
out = [(b, a, mapping.get(a)) for a, b in setA]
print(out)
[(25, 1, 19),
(24, 2, 18),
(23, 3, 17),
(22, 4, 16),
(21, 5, 15),
(20, 6, 14),
(19, 7, 13),
(18, 8, 12),
(17, 9, 11),
(16, 10, 10),
(15, 11, 9),
(14, 12, 8),
(13, 13, 7),
(12, 14, 6),
(11, 15, 5),
(10, 16, 4),
(9, 17, 3),
(8, 18, 2),
(7, 19, 1),
(6, 20, None),
(5, 21, None),
(4, 22, None),
(3, 23, None),
(2, 24, None),
(1, 25, None)]
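Note that dict(setB) builds the lookup table once, and mapping.get(a) returns None for any key missing from setB, which is exactly the padding behavior the question asks for. Everything used here is available in the Python 2.7 standard library.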
Since the two lists have different lengths, plain zip is not a solution.
One option is zip_longest from the itertools module (izip_longest in Python 2.7), which pads the shorter iterable with None:
from itertools import zip_longest  # izip_longest on Python 2.7

finalSet = [(b, a, c[1] if c is not None else c)
            for (a, b), c in zip_longest(setA, setB)]
Output
(25, 1, 19)
(24, 2, 18)
(23, 3, 17)
(22, 4, 16)
(21, 5, 15)
(20, 6, 14)
(19, 7, 13)
(18, 8, 12)
(17, 9, 11)
(16, 10, 10)
(15, 11, 9)
(14, 12, 8)
(13, 13, 7)
(12, 14, 6)
(11, 15, 5)
(10, 16, 4)
(9, 17, 3)
(8, 18, 2)
(7, 19, 1)
(6, 20, None)
(5, 21, None)
(4, 22, None)
(3, 23, None)
(2, 24, None)
(1, 25, None)
setA = [(1, 25), (2, 24), (3, 23), (4, 22), (5, 21), (6, 20),
(7, 19), (8, 18), (9, 17), (10, 16), (11, 15), (12, 14),
(13, 13),(14, 12), (15, 11), (16, 10), (17, 9), (18, 8),
(19, 7),(20, 6), (21, 5), (22, 4), (23, 3), (24, 2), (25, 1)]
setB = [(1, 19), (2, 18), (3, 17), (4, 16), (5, 15), (6, 14), (7, 13),
(8, 12), (9, 11), (10, 10), (11, 9), (12, 8), (13, 7), (14, 6),
(15, 5), (16, 4), (17, 3), (18, 2), (19, 1)]
la, lb = len(setA), len(setB)
temp = [[setA[i][1] if i < la else None, i + 1, setB[i][1] if i < lb else None]
        for i in range(max(la, lb))]
[[25, 1, 19],
[24, 2, 18],
[23, 3, 17],
[22, 4, 16],
[21, 5, 15],
[20, 6, 14],
[19, 7, 13],
[18, 8, 12],
[17, 9, 11],
[16, 10, 10],
[15, 11, 9],
[14, 12, 8],
[13, 13, 7],
[12, 14, 6],
[11, 15, 5],
[10, 16, 4],
[9, 17, 3],
[8, 18, 2],
[7, 19, 1],
[6, 20, None],
[5, 21, None],
[4, 22, None],
[3, 23, None],
[2, 24, None],
[1, 25, None]]
If you want setC in the same format as setA and setB, I think this workaround will do. Tuples are immutable, so each joined row is built as a fresh tuple and appended to a list:
setC = []
i = 0
while i < min(len(setA), len(setB)) and setA[i][0] == setB[i][0]:
    setC.append((setA[i][1], setA[i][0], setB[i][1]))
    i += 1
Note that this only joins the leading rows whose keys match in both lists; unlike the answers above, it does not pad the tail of the longer list with None.

Find Longest Weighted Path from DAG with Networkx in Python?

I need an algorithm to find the longest weighted path in a directed acyclic networkx.MultiDiGraph(). The graph has weighted edges, and many edges have zero as the weight. I have found nothing in the networkx documentation that solves this problem. My graph has the following structure:
>>> print graph.nodes()
[0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16, 17, 20, 21, 22, 25, 26, 'end']
>>> print graph.edges()
[(0, 'end'), (1, 0), (1, 10), (1, 5), (2, 1), (2, 11), (2, 6), (3, 2), (3, 12), (3, 7), (4, 8), (4, 3), (4, 13), (5, 'end'), (6, 5), (6, 15), (7, 16), (7, 6), (8, 17), (8, 7), (10, 'end'), (11, 10), (11, 20), (11, 15), (12, 16), (12, 11), (12, 21), (13, 17), (13, 12), (13, 22), (15, 'end'), (16, 25), (16, 15), (17, 16), (17, 26), (20, 'end'), (21, 25), (21, 20), (22, 26), (22, 21), (25, 'end'), (26, 25)]
>>> print graph.edge[7][16]
{1: {'weight': 100.0, 'time': 2}}
>>> print graph.edge[7][6]
{0: {'weight': 0, 'time': 2}}
I found these, but I have problems with the implementations:
networkx: efficiently find absolute longest path in digraph: this solution does not use edge weights.
How to find the longest path with Python NetworkX? This solution turns the weights into negative values, but my graph has zero weights… and nx.dijkstra_path() does not support negative weights.
Does anyone have an idea, or has anyone found a solution to a similar problem?
Take the solution from the first link and change the line:
pairs = [[dist[v][0] + 1, v] for v in G.pred[node]]  # incoming pairs
to something like:
pairs = [[dist[v][0] + edge['weight'], v]
         for v in G.pred[node]
         for edge in G[v][node].values()]  # one candidate per parallel edge
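Since the linked answer is not reproduced here, here is a minimal self-contained sketch of that dynamic-programming idea (my own, assuming a DAG stored as a MultiDiGraph whose edge data dicts carry a 'weight' key):
import networkx as nx

def longest_weighted_path(G, weight='weight'):
    # dist maps node -> (weight of heaviest path ending at node, predecessor)
    dist = {}
    for node in nx.topological_sort(G):
        # one candidate per incoming parallel edge
        pairs = [(dist[v][0] + edge.get(weight, 0), v)
                 for v in G.pred[node]
                 for edge in G[v][node].values()]
        dist[node] = max(pairs, key=lambda p: p[0]) if pairs else (0, node)
    # backtrack from the node with the heaviest path
    node = max(dist, key=lambda n: dist[n][0])
    total = dist[node][0]
    path = [node]
    while dist[node][1] != node:
        node = dist[node][1]
        path.append(node)
    return list(reversed(path)), total
Newer networkx versions also ship nx.dag_longest_path(G, weight='weight'), which may already be all you need.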
