How to generate 20 million unrepeatable random numbers - python

I need to generate 20 million unique (non-repeating) random numbers, each 8 digits long, and save them in an array.
I tried multiprocessing and threading, but it stays slow.
My attempt with multiprocessing:
from numpy.random import default_rng
from multiprocessing import Process, Queue
import os, time
import numpy as np

rng = default_rng()
f = np.array([], dtype=np.int64)

def generate(q, start, stop):
    numbers = [rng.choice(range(start, stop), replace=False) for _ in range(1000)]
    q.put(numbers)

if __name__ == '__main__':
    timeInit = time.time()
    for x in range(20000):
        q = Queue()
        p = Process(target=generate, args=(q, 11111111, 99999999,))
        p.start()
        f = np.append(f, q.get())
        p.join()
    print(f)
    timeStop = time.time()
    print('[TIME EXECUTED] ' + str(timeStop - timeInit) + ' secs')

This took less than 30 seconds on my personal laptop, if it works for you:
import random

candidates = list(range(10**7, 10**8))  # all numbers from 10000000 to 99999999
random.shuffle(candidates)
result = candidates[:20 * 10**6]  # take the first 20 million

You haven't explained why you're doing all of that overhead. I simply took a random sample from the candidate numbers:
from random import sample

result = sample(
    list(range(10**7, 10**8)),
    2 * 10**7
)
51 seconds on my laptop, with interference from other jobs.
I just ran a more controlled test on both solutions. The one in this post took 48.5 seconds; the one from naicolas took 81.6 seconds, likely due to the extra list creation.
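If the intermediate list really is the cost, note that random.sample accepts any sequence, and range is a sequence, so the 90-million-element list can be skipped entirely. A minimal sketch (my own tweak, not from either answer; whether it is faster depends on which internal strategy sample picks):
from random import sample

# range is a sequence, so sample() can draw from it directly
# without first materializing all 90 million candidates.
result = sample(range(10**7, 10**8), 2 * 10**7)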

I hope I got your idea. The random numbers that you are trying to generate are actually a bit tricky. Basically, we are looking for a set of unique (non-repeating) but random numbers. In this case, we cannot simply draw from a uniform distribution, because there is no guarantee the numbers are unique.
There are 2 possible algorithms. The first one is to generate A LOT of candidate random numbers and remove the repeated ones. For instance,
import numpy as np

N = 20_000_000
L0 = 11_111_111  # underscores are legal in Python int literals
L1 = L0 * 9

not_enough_unique = True
while not_enough_unique:
    X = np.random.uniform(L0, L1, int(N * 2)).astype(int)  # oversample by 2x
    X_unique = np.unique(X)  # remove repeated numbers
    not_enough_unique = len(X_unique) < N
random_numbers = X_unique[:N]  # np.unique returns sorted output...
np.random.shuffle(random_numbers)  # ...so shuffle to restore randomness
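As a rough sanity check on the 2x oversampling (my own arithmetic): drawing 2N = 40 million values uniformly from a pool of M ≈ 88.9 million, the expected number of distinct values is about M(1 - e^(-2N/M)) ≈ 88.9M × (1 - e^(-0.45)) ≈ 32 million, comfortably above N = 20 million, so the while loop usually finishes in a single pass.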
There is also another, more "physics" approach. We can start with equally spaced numbers and move each number a little bit. The result will not be as random as the first one, but it is much faster and pure fun.
import numpy as np

N = 20_000_000
L0 = 11_111_111
L1 = L0 * 9

lattice = np.linspace(L0, L1, N)  # N numbers with equal spacing
perturbation = np.random.normal(0, 0.4, N)  # move each number left/right a little
random_numbers = (lattice + perturbation).astype(int)

# Check the minimum distance between two successive numbers,
# i.e. verify that all numbers are unique
min_dist = np.abs(np.diff(random_numbers)).min()
print(f"generating random numbers with minimum separation of {min_dist}")
print("(if it is > 1 you are good)")
np.random.shuffle(random_numbers)
(Both algorithms generate the result within 10s on my laptop)
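For comparison (my own sketch, not from the answer above): NumPy can also do the whole job with a single permutation of the candidate range. It allocates the full 90-million-element array (roughly 720 MB as int64), so memory is the trade-off:
import numpy as np

rng = np.random.default_rng()

# Every 8-digit number appears exactly once, so uniqueness is guaranteed.
candidates = np.arange(10**7, 10**8, dtype=np.int64)
rng.shuffle(candidates)
result = candidates[:20_000_000]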

Related

Is there a faster way to get the index of the maximum value and keep count?

Is there a faster way to get the count of each team having the highest score?
Input
# Random scores per player per round (each player corresponds to an integer from 0 - 99)
scores_per_round = np.random.rand(10_000, 100)
# 100,000 random teams of 8
teams = np.array([random.sample(list(range(100)), 8) for _ in range(100_000)])
Desired Output
# Count of top scores per team: key is the index of the team in the teams array,
# value is the number of wins.
{
    0: 20,
    1: 12,
    ...
}
Currently I loop through the rounds, add up each team's score, then grab the index of the maximum value using np.argmax and store the count in a dictionary.
import random
from collections import defaultdict

import numpy as np  # needed for np.random / np.argmax below

win_count = defaultdict(int)

# Random scores
scores_per_round = np.random.rand(10_000, 100)
# 100,000 random teams of 8
teams = np.array([random.sample(list(range(100)), 8) for _ in range(100_000)])

# Loop through and keep track of team wins
for round in range(10_000):
    win_count[np.argmax(np.sum(np.take(scores_per_round[round], teams), axis=1))] += 1
The initial code is slow because it allocates pretty big temporary arrays. Iterating over them 10_000 times is expensive because RAM and the last-level cache are relatively slow (compared to the L1 cache or registers). Numba can fix this problem by computing the values on the fly in a more cache-friendly way.
Here is a simple parallel implementation:
import numpy as np
import numba as nb
from collections import defaultdict

@nb.njit('int32[::1](float64[:,::1], int32[:,::1])', parallel=True, fastmath=True)
def computeTeamWins(scores_per_round, teams):
    roundCount = scores_per_round.shape[0]
    result = np.empty(roundCount, dtype=np.int32)
    n, m = teams.shape
    assert m == 8  # see the comment below
    for r in nb.prange(roundCount):
        iMax, sMax = -1, -1.0
        for i in range(n):
            s = 0.0
            # Faster if the size is known, as the loop can be unrolled
            for j in range(8):
                s += scores_per_round[r, teams[i, j]]
            if s > sMax:
                iMax, sMax = i, s
        result[r] = iMax
    return result

win_count = defaultdict(int)
for v in computeTeamWins(scores_per_round, teams):
    win_count[v] += 1
On my 6-core machine, it takes 0.8 seconds while the initial code takes 54.3 seconds. This means the Numba implementation is about 68 times faster. If you cast teams to an np.uint8 array, the computation takes only 0.57 seconds, a roughly 95x speedup (because of the caches). Note that this bounds the maximum player index to 255 (inclusive). The final loop takes only a few milliseconds.
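As a side note (my own sketch, not part of the answer above): the final dictionary loop can also be replaced with np.bincount, which tallies the winner indices in one vectorized call:
import numpy as np

winners = computeTeamWins(scores_per_round, teams)
# Entry i of counts is the number of rounds won by team i;
# minlength ensures teams with zero wins still get an entry.
counts = np.bincount(winners, minlength=len(teams))
win_count = {team: int(c) for team, c in enumerate(counts) if c > 0}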

Extracting total value of a tuple

I'm very new to Python, so please forgive my ignorance. I'm trying to calculate the total number of energy units in a system. For example, the Omega here will output both (0,0,0,1) and (2,2,2,1) along with a whole lot of other tuples. I want to extract from Omega how many tuples have a total value of 1 (like the first example) and how many have a total value of 7 (like the second example). How do I achieve this?
import numpy as np
import matplotlib.pyplot as plt
from itertools import product
N = 4 ##The number of Oscillators
q = range(3) ## Range of number of possible energy units per oscillator
Omega = product(q, repeat = N)
print(list(product(q, repeat = N)))
try this:
Omega = product(q, repeat = N)
l = list(product(q, repeat = N))
l1 = [i for i in l if sum(i)==1]
l2 = [i for i in l if sum(i)==7]
print(l1,l2)
I believe you can use sum() on tuples as well as on lists of integers/numbers.
Now, you say Omega is a list of tuples, is that correct? Something like
Omega = [(0,0,0,1), (2,2,2,1), ...]
In that case I think you can do
sums_to_1 = [int_tuple for int_tuple in Omega if sum(int_tuple) == 1]
If you want some default value for the tuples that don't sum to one, you can put the if statement at the beginning of the list comprehension and do
sums_to_1 = [int_tuple if sum(int_tuple) == 1 else 'SomeDefaultValue' for int_tuple in Omega]
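If you want the counts for every possible total at once, a compact alternative is collections.Counter (my own sketch, using the question's N and q):
from collections import Counter
from itertools import product

N = 4         # number of oscillators
q = range(3)  # possible energy units per oscillator

# Tally how many states there are for each total energy, in one pass
totals = Counter(sum(state) for state in product(q, repeat=N))
print(totals[1])  # states with total energy 1 -> 4
print(totals[7])  # states with total energy 7 -> 4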

Implementing probability function involving heavy combinatorics in python

This is regarding the answer to a question I asked on the math stack.
I'm looking to convert this question/solution into python, but I'm having trouble interpreting all of the notation used here.
I realize this post is a bit too 'gimme the code' to be a great question, but I ask with the intention of understanding the math involved. I don't understand the mathematical notation used here very well, but I can interpret Python well enough to conceptualize the answer if I see it.
The problem can be set up like this:
import numpy as np

bag = np.hstack((
    np.repeat(0, 80),
    np.repeat(1, 21),
    np.repeat(3, 5),
    np.repeat(7, 1)
))
I'm not sure if this is exactly what you're after, but this is how I would calculate, for example, the probability of getting a sum == 6.
It's more practical than mathematical and just addresses this particular problem, so I'm not sure it will help you understand the maths.
import numpy as np
import itertools
from collections import Counter
import pandas as pd

bag = np.hstack((
    np.repeat(0, 80),
    np.repeat(1, 21),
    np.repeat(3, 5),
    np.repeat(7, 1)
))

# 107*106*105*104*103*102*101*100*99*98
# Out[176]: 127506499163211168000  # permutations
## Need to reduce the number to sample from without changing the number of possible combinations
reduced_bag = np.hstack((
    np.repeat(0, 10),  # 0 can be chosen all 10 times
    np.repeat(1, 10),  # 1 can be chosen all 10 times
    np.repeat(3, 5),   # 3 can be chosen up to 5 times
    np.repeat(7, 1)    # 7 can be chosen once
))

## There are 96 unique combinations
number_unique_combinations = len(set(itertools.combinations(reduced_bag, 10)))
### sorted list of all combinations
unique_combinations = sorted(set(itertools.combinations(reduced_bag, 10)))
### sum of each unique combination
sums_list = [sum(uc) for uc in unique_combinations]

### probability for each unique combination
probability_dict = {0: 80, 1: 21, 3: 5, 7: 1}  # dictionary to refer to
n = 107  # number of items in the bag
probability_list = []

## This part is VERY slow to run because of the itertools.permutations
for x in unique_combinations:
    print(x)
    p = 1     # start the probability over
    n = 107   # start with a full bag for each combination
    count_x = Counter(x)
    for i in x:
        i_left = probability_dict[i] - (Counter(x)[i] - count_x[i])  # number of that type left in the bag
        p *= i_left / n  # multiply the probability
        n = n - 1  # non-replacement
        count_x[i] = count_x[i] - 1  # one fewer of this value still to draw
    p *= len(set(itertools.permutations(x, 10)))  # multiply by the number of orderings per combination
    probability_list.append(p)

## sum(probability_list)  # has a rounding error
## Out[57]: 1.0000000000000002

## Put the combinations into a dataframe
ar = np.array((unique_combinations, sums_list, probability_list))
df = pd.DataFrame(ar).T
## Name the columns
df.columns = ["combination", "sum", "probability"]

## probability that the sum is >= 6
df[df["sum"] >= 6]['probability'].sum()
## 0.24139909236232826
## probability that the sum is == 6
df[df["sum"] == 6]['probability'].sum()
## 0.06756408790812335
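For what it's worth, the slow itertools.permutations call is only computing the number of distinct orderings of a multiset, which is a multinomial coefficient. A sketch of a drop-in replacement (my own helper, not from the answer):
import math
from collections import Counter

def distinct_orderings(x):
    """Number of distinct permutations of the multiset x:
    len(x)! divided by the factorial of each value's multiplicity."""
    result = math.factorial(len(x))
    for count in Counter(x).values():
        result //= math.factorial(count)
    return result

# e.g. distinct_orderings((0,)*8 + (1, 3)) == 10! / (8! * 1! * 1!) == 90
Replacing len(set(itertools.permutations(x, 10))) with distinct_orderings(x) should give the same number without enumerating all 10! orderings.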

Trying to optimize my complex function to execute in polynomial time

I have code that generates all 2**40 possible binary numbers, and from these binary numbers I am trying to get all the vectors that match my objective function's conditions, which are:
1- each vector in the matrix must have exactly 20 ones (1).
2- the sum s = s + (the index of the one + 1) * the rank of the one must equal 4970.
I wrote this code, but it would take a very long time, maybe months, to give the results. Now I am looking for an alternative approach or an optimization of this code, if possible.
import time
from multiprocessing import Process
from multiprocessing import Pool
import numpy as np
import itertools

CC = 20

# test if there are exactly 20 ones
def test1numebers(v, x=1, x_l=CC):
    c = 0
    for i in range(len(v)):
        if v[i] == x:
            c += 1
    if c == x_l:
        return True
    else:
        return False

# s = s + (index+1) * the nth occurrence of 1
def objectif_function(v, x=1):
    s = 0
    for i in range(len(v)):
        if v[i] == x:
            s = s + ((i + 1) * nthi(v, i))
    return s

# count the occurrences of 1 in a vector up to index i
def nthi(v, i):
    c = 0
    for j in range(0, i + 1):
        if v[j] == 1:
            c += 1
    return c

# generate all 2**40 possible binary vectors
def generateMatrix(N):
    l = itertools.product([0, 1], repeat=N)
    return l

# count the vectors that match our objective function
def main_algo(N=40, S=4970):
    t_start = time.time()  # was missing in the original; needed by the print below
    m = generateMatrix(N)
    c = 0
    ii = 0
    for i in m:
        ii += 1
        print("\n count:", ii)
        xx = i
        if test1numebers(xx):
            if objectif_function(xx) == S:
                c += 1
                print('found one')
                print('\n', xx, '\n')
        if ii >= 1000000:
            break
    t_end = time.time()
    print('time taken for 10**6 is: ', t_end - t_start)
    print(c)

#main_algo()
if __name__ == '__main__':
    '''p = Process(target=main_algo, args=(40, 4970,))
    p.start()
    p.join()'''
    p = Pool(150)
    print(p.map(main_algo, [40, 4970]))  # note: this calls main_algo(40) and main_algo(4970), not main_algo(40, 4970)
While you could make a lot of improvements in readability and make your code more Pythonic, I recommend that you use numpy, which is the fastest way of working with matrices.
Avoid working with matrices in an element-by-element loop. With numpy you can make those calculations faster and on all the data at once.
numpy also has support for generating matrices really fast. I think you could build a random [0, 1] matrix in fewer lines of code and quite a bit faster.
I also recommend installing OpenBLAS, ATLAS, and LAPACK, which make linear algebra calculations quite a bit faster.
I hope this helps you.
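To make that advice concrete, here is a minimal sketch (my own illustration, not from the answer) of the objective function evaluated with numpy instead of per-element loops; v is assumed to be a 0/1 numpy array of length 40:
import numpy as np

def objectif_function_np(v):
    # Indices (0-based) of the ones in v
    idx = np.flatnonzero(v)
    # rank of each one = 1, 2, 3, ... in order of appearance;
    # s = sum of (index + 1) * rank, matching the loop-based version
    ranks = np.arange(1, len(idx) + 1)
    return int(np.sum((idx + 1) * ranks))

# Example: vector with ones at positions 0 and 2
v = np.zeros(40, dtype=np.int8)
v[[0, 2]] = 1
print(objectif_function_np(v))  # (0+1)*1 + (2+1)*2 = 7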

Python Random Function without using random module

I need to write the function
random_number(minimum, maximum)
without using the random module, and I did this:
import time

def random_number(minimum, maximum):
    now = str(time.clock())
    rnd = float(now[::-1][:3:]) / 1000
    return minimum + rnd * (maximum - minimum)
I am not sure this is fine... is there a known way to do it with the time? The thing is, I need to do something that somehow uses the time.
You could generate randomness based on a clock drift:
import struct
import time

def lastbit(f):
    return struct.pack('!f', f)[-1] & 1

def getrandbits(k):
    "Return k random bits using a relative drift of two clocks."
    # assume time.sleep() and time.clock() use different clocks
    # though it might work even if they use the same clock
    # note: time.clock() was removed in Python 3.8; time.perf_counter() is the modern equivalent
    # XXX it does not produce "good" random bits, see below for details
    result = 0
    for _ in range(k):
        time.sleep(0)
        result <<= 1
        result |= lastbit(time.clock())
    return result
Once you have getrandbits(k), it is straightforward to get a random integer in range [a, b], including both end points. Based on CPython's Lib/random.py:
def randint(a, b):
    "Return random integer in range [a, b], including both end points."
    return a + randbelow(b - a + 1)

def randbelow(n):
    "Return a random int in the range [0, n). Raises ValueError if n <= 0."
    # from Lib/random.py
    if n <= 0:
        raise ValueError
    k = n.bit_length()  # don't use (n-1) here because n can be 1
    r = getrandbits(k)  # 0 <= r < 2**k
    while r >= n:  # avoid skew
        r = getrandbits(k)
    return r
For example, to generate 20 random numbers from 10 to 110 inclusive:
print(*[randint(10, 110) for _ in range(20)])
Output:
11 76 66 58 107 102 73 81 16 58 43 107 108 98 17 58 18 107 107 77
If getrandbits(k) returns k random bits then randint(a, b) should work as is (no skew due to modulo, etc).
To test the quality of getrandbits(k), dieharder utility could be used:
$ python3 random-from-time.py | dieharder -a -g 200
where random-from-time.py generates an infinite (random) binary stream:
#!/usr/bin/env python3
def write_random_binary_stream(write):
    while True:
        write(getrandbits(32).to_bytes(4, 'big'))

if __name__ == "__main__":
    import sys
    write_random_binary_stream(sys.stdout.buffer.write)
where getrandbits(k) is defined above.
The above assumes that you are not allowed to use os.urandom() or ssl.RAND_bytes(), or some known PRNG algorithm such as Mersenne Twister to implement getrandbits(k).
getrandbits(n) implemented using "time.sleep() + time.clock()" fails dieharder tests (too many failures to be a coincidence).
The idea is still sound: a clock drift may be used as a source of randomness (entropy), but you can't use it directly (the distribution is not uniform and/or some bits are dependent); instead, the bits could be passed as a seed to a PRNG that accepts an arbitrary entropy source. See the "Mixing" section.
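To illustrate that mixing step (my own sketch, not from the answer; the helper name is hypothetical): the biased, correlated drift bits can be whitened by hashing before use, assuming k <= 256 here:
import hashlib

def mixed_randbits(k, raw_source=getrandbits):
    """Whiten weak clock-drift bits by hashing them with SHA-256.
    Oversamples the raw source, then returns k bits of the digest (k <= 256)."""
    raw = raw_source(512)  # collect more raw bits than we need
    digest = hashlib.sha256(raw.to_bytes(64, 'big')).digest()
    return int.from_bytes(digest, 'big') >> (256 - k)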
Are you allowed to read random data from some special file? Under Linux, the file /dev/urandom provides a convenient way to get random bytes. You could write:
import struct

f = open("/dev/urandom", "rb")  # binary mode, so read() returns bytes for struct
n = struct.unpack("i", f.read(4))[0]
But this will not work under Windows.
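If os.urandom() is allowed, the same idea works portably, Windows included; a minimal sketch:
import os
import struct

# os.urandom returns OS-provided random bytes on Linux, macOS and Windows alike
n = struct.unpack("i", os.urandom(4))[0]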
The idea is to get a number between 0 and 1 using the time module, and use that to get a number in the desired range. The following will print 20 numbers, randomly chosen in the range 20 to 60:
from time import time

def time_random():
    return time() - float(str(time()).split('.')[0])

def gen_random_range(min, max):
    return int(time_random() * (max - min) + min)

if __name__ == '__main__':
    for i in range(20):
        print(gen_random_range(20, 60))
Here we need to understand one thing: a random value can be derived from values that change at run time, and for that we use the time module.
time.time() gives you a float with many varying digits (nearly 17).
We want milliseconds, so we multiply by 1000.
If I need values from 0-10, I need a value less than 10, which means:
time.time() % 10 (but it is a float, so convert it to int)
int(time.time() % 10)
import time

def rand_val(x):
    random = int(time.time() * 1000)
    random %= x
    return random

x = int(input())
print(rand_val(x))
Use an API, if allowed:
from urllib.request import urlopen  # the original answer used Python 2's urllib2

def get_random(x, y):
    url = 'http://www.random.org/integers/?num=1&min=[min]&max=[max]&col=1&base=10&format=plain&rnd=new'
    url = url.replace("[min]", str(x))
    url = url.replace("[max]", str(y))
    response = urlopen(url)
    num = response.read()
    return num.strip()

print(get_random(1, 1000))
import datetime

def rand(s, n):
    '''
    Create a random number in the given range; the maximum range is 6 digits.
    '''
    s = int(s)
    n = int(n)
    list_sec = datetime.datetime.now()
    last_el = str(list_sec).split('.')[-1]  # microseconds part of the timestamp
    len_str = len(str(n))
    get_number_elements = last_el[-int(len_str):]
    try:
        if int(get_number_elements) <= n and int(get_number_elements) >= s:
            return get_number_elements
        else:
            max_value = int('9' * len_str)
            res = s + int(get_number_elements) * (n - s) / max_value
            return res
    except Exception as e:
        print(e)
To find random values in a range (x, y):
subtract the low end from the high end and store the difference,
find a random value from 0 to that difference,
then add the random value back to the low end: lowrange + x (where x is random).
import time

def rand_val(x, y):
    sub = y - x
    random = int(time.time() * 1000)
    random %= sub
    random += x
    return random

x = int(input())
y = int(input())
print(rand_val(x, y))
