I have a program that dispatches messages to separate processes. I need to balance the load, though not very precisely: roughly the same number of messages per process is fine. Since every message has a uuid field, I want to balance by uuid value. When I tested the randomness of the uuids, I found them to be less random than I expected: the last and the first entries of my count differ by about 80%. This is unacceptable, so I want to know whether there is an algorithm that can make the distribution more random.
Here is my test code.
import uuid
from collections import Counter

COUNT = 3000

def b(length):
    holder = []
    for i in xrange(COUNT):
        holder.append(str(uuid.uuid4())[:length])
    return Counter(holder)

def num(part_count):
    sep = 0xffffffffffffffffffffffffffffffff / part_count
    parts = []
    for i in xrange(COUNT):
        # str_hex = str(uuid.uuid4())[:4]
        num = int(uuid.uuid4().hex,16)
        divide = num/sep
        if divide == part_count:
            divide = part_count - 1
        parts.append(divide)
    return Counter(parts)

if __name__ == "__main__":
    print num(200)
I get output like this:
Counter({127L: 29, 198L: 26, 55L: 25, 178L: 24, 184L: 24, 56L: 23, 132L: 23, 143L: 23, 148L: 23, 195L: 23, 16L: 21, 30L: 21, 44L: 21, 53L: 21, 97L: 21, 158L: 21, 185L: 21, 13L: 20, 146L: 20, 149L: 20, 196L: 20, 2L: 19, 11L: 19, 15L: 19, 19L: 19, 46L: 19, 58L: 19, 64L: 19, 68L: 19, 70L: 19, 89L: 19, 112L: 19, 118L: 19, 128L: 19, 144L: 19, 156L: 19, 192L: 19, 27L: 18, 41L: 18, 42L: 18, 51L: 18, 54L: 18, 85L: 18, 87L: 18, 88L: 18, 93L: 18, 94L: 18, 104L: 18, 106L: 18, 115L: 18, 4L: 17, 22L: 17, 45L: 17, 59L: 17, 79L: 17, 81L: 17, 105L: 17, 125L: 17, 138L: 17, 150L: 17, 159L: 17, 167L: 17, 194L: 17, 3L: 16, 18L: 16, 28L: 16, 31L: 16, 33L: 16, 62L: 16, 65L: 16, 83L: 16, 111L: 16, 123L: 16, 126L: 16, 133L: 16, 145L: 16, 147L: 16, 163L: 16, 166L: 16, 183L: 16, 188L: 16, 190L: 16, 5L: 15, 6L: 15, 9L: 15, 23L: 15, 26L: 15, 34L: 15, 35L: 15, 38L: 15, 69L: 15, 73L: 15, 74L: 15, 77L: 15, 82L: 15, 86L: 15, 107L: 15, 108L: 15, 109L: 15, 110L: 15, 114L: 15, 136L: 15, 141L: 15, 142L: 15, 153L: 15, 160L: 15, 169L: 15, 176L: 15, 180L: 15, 186L: 15, 0L: 14, 1L: 14, 36L: 14, 39L: 14, 43L: 14, 60L: 14, 71L: 14, 72L: 14, 76L: 14, 92L: 14, 113L: 14, 131L: 14, 135L: 14, 157L: 14, 171L: 14, 172L: 14, 181L: 14, 189L: 14, 7L: 13, 17L: 13, 20L: 13, 24L: 13, 25L: 13, 32L: 13, 47L: 13, 49L: 13, 101L: 13, 102L: 13, 117L: 13, 121L: 13, 122L: 13, 124L: 13, 130L: 13, 151L: 13, 152L: 13, 165L: 13, 179L: 13, 14L: 12, 21L: 12, 29L: 12, 50L: 12, 63L: 12, 67L: 12, 80L: 12, 84L: 12, 90L: 12, 91L: 12, 96L: 12, 120L: 12, 129L: 12, 139L: 12, 140L: 12, 182L: 12, 193L: 12, 197L: 12, 52L: 11, 75L: 11, 78L: 11, 103L: 11, 116L: 11, 119L: 11, 134L: 11, 137L: 11, 161L: 11, 173L: 11, 12L: 10, 37L: 10, 66L: 10, 98L: 10, 100L: 10, 162L: 10, 170L: 10, 175L: 10, 177L: 10, 187L: 10, 191L: 10, 199L: 10, 48L: 9, 155L: 9, 164L: 9, 174L: 9, 10L: 8, 95L: 8, 99L: 8, 168L: 8, 8L: 7, 40L: 7, 57L: 7, 61L: 7, 154L: 6})
The last count is 6 and the first is 29, nearly a fivefold difference.
UUIDs are not meant to be random, just unique. If your balancer needs to be keyed off of them, it should run them through a hash function first to get the randomness you want:
import hashlib
# message_uuid: the message's uuid as a byte string (illustrative name)
actually_random = hashlib.sha1(message_uuid).digest()
Your testing methodology doesn't make any sense (see below). But first, this is the implementation of uuid4:
def uuid4():
    """Generate a random UUID."""
    # When the system provides a version-4 UUID generator, use it.
    if _uuid_generate_random:
        _buffer = ctypes.create_string_buffer(16)
        _uuid_generate_random(_buffer)
        return UUID(bytes=_buffer.raw)
    # Otherwise, get randomness from urandom or the 'random' module.
    try:
        import os
        return UUID(bytes=os.urandom(16), version=4)
    except:
        import random
        bytes = [chr(random.randrange(256)) for i in range(16)]
        return UUID(bytes=bytes, version=4)
And the randomness returned by libuuid (the ctypes call), os.urandom and random.randrange should be good enough for most non-crypto stuff.
Edit: Ok, my guess as to why your testing methodology is broken: the number you're counting (divide) is biased in two ways: first, it's the result of dividing by a number which isn't a power of two (in this case, 200), which introduces modulo bias. Second, the if divide == part_count: divide = part_count - 1 introduces more bias.
Additionally, you'll need to figure out what the confidence interval is for any random number generator test before you can interpret the results. My stats-foo isn't great here, though, so I can't really help you with that…
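For a rough sense of the expected spread (a sketch, not a rigorous test): if 3000 ids are split uniformly over 200 buckets, each bucket's count is binomial with mean 15 and a standard deviation of about 3.9, so a few buckets landing 2 to 4 standard deviations away from 15 is plausible even for a perfectly uniform source. The helper names expected_spread and chi_square are made up for this illustration, and the code is Python 3 rather than the Python 2 of the question.

```python
import math
import uuid
from collections import Counter


def expected_spread(count, part_count):
    """Mean and standard deviation of one bucket's count under a uniform split."""
    p = 1.0 / part_count
    mean = count * p
    std = math.sqrt(count * p * (1.0 - p))
    return mean, std


def chi_square(counts, count, part_count):
    """Pearson chi-square statistic of observed bucket counts against uniform."""
    expected = count / float(part_count)
    return sum((counts.get(i, 0) - expected) ** 2 / expected
               for i in range(part_count))


if __name__ == "__main__":
    COUNT, PARTS = 3000, 200
    # Bucket each random UUID by its 128-bit integer value modulo PARTS.
    counts = Counter(uuid.uuid4().int % PARTS for _ in range(COUNT))
    mean, std = expected_spread(COUNT, PARTS)
    print("expected per bucket: %.1f +/- %.2f" % (mean, std))
    print("observed min/max:", min(counts.values()), max(counts.values()))
    # For a uniform source the statistic is typically within a few tens of
    # PARTS - 1 = 199 (its mean under the uniform hypothesis).
    print("chi-square:", chi_square(counts, COUNT, PARTS))
```

A chi-square value far above 199 would be evidence of real non-uniformity; a minimum of 6 and a maximum of 29 on their own are not.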
Well, a UUID is not supposed to be random, it's supposed to be unique: usually it's based on the computer name/IP, the date, and similar inputs. The goal is not to make it random; the goal is to make sure that two successive calls will provide two different values and that IDs from different computers won't collide. If you want more details, you can look at the official spec (RFC 4122).
Now, if your load balancer wants to use that as a criterion for balancing, I think your design is flawed. If you want better randomness out of it, you can hash it (with SHA-256, for example), which spreads what little randomness there is across all the bits (that's what a hash does).
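A minimal sketch of the hashing idea, assuming Python 3 and assuming each message's uuid is available as a uuid.UUID object. The function name bucket_for and the choice to use the first 8 bytes of the digest are illustrative, not something from the question.

```python
import hashlib
import uuid


def bucket_for(message_uuid, part_count):
    """Map a UUID to a bucket index in range(part_count) via SHA-256."""
    digest = hashlib.sha256(message_uuid.bytes).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    value = int.from_bytes(digest[:8], "big")
    return value % part_count


if __name__ == "__main__":
    u = uuid.UUID("12345678-1234-5678-1234-567812345678")
    # The same UUID always maps to the same bucket.
    print(bucket_for(u, 200))
```

Because the mapping is deterministic, every message with a given uuid goes to the same process, while different uuids are spread across buckets regardless of how the uuids themselves were generated.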
Just because something doesn't look random doesn't mean it isn't.
To the human eye (and mind) some sequences may look less random than others, but they are not.
When you roll a die 10 times, the probability of rolling 2-5-1-3-5-1-3-5-2-6 is the same as that of rolling 1-1-1-1-1-1-1-1-1-1 or 1-2-3-4-5-6-1-2-3-4. Although the latter two examples seem less random, they are not.
Do not try to improve random generators, as you will most probably only worsen the output.
For instance: you want to generate a random sequence, and it doesn't look random enough to you because one byte appears more frequently than another. Hence you dismiss all sequences with repeated bytes (or bytes repeated more than n times) in order to ensure more randomness. In doing so, you are actually making your sequences less random.
from random import *
day = list(range(1, 29))
day = day[3:29]
shuffleday = shuffle(day)
print(shuffleday)
The result is None. What's wrong?
random.shuffle does not return anything. It modifies the input list.
import random
day=list(range(1,29))
print(day)
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]
day=day[3:29]
print(day)
# [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]
shuffleday=random.shuffle(day)
print(shuffleday)
# None
print(day)
# [27, 9, 14, 15, 7, 17, 28, 10, 23, 21, 16, 12, 6, 11, 22, 25, 24, 20, 5, 19, 13, 4, 18, 8, 26]
random.shuffle modifies day. It does not return anything.
This is per convention in Python, where functions which modify the arguments given to them return None.
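The same convention can be seen with list.sort versus sorted. This is only an illustration of the general pattern using standard-library calls; the variable names are arbitrary.

```python
import random

numbers = [3, 1, 2]

# In-place methods mutate their target and return None.
result = numbers.sort()
print(result)   # None
print(numbers)  # [1, 2, 3]

# Functions that build a new object return it and leave the input alone.
original = [3, 1, 2]
ordered = sorted(original)
print(ordered)   # [1, 2, 3]
print(original)  # [3, 1, 2]

# random.shuffle follows the in-place convention.
deck = [1, 2, 3, 4]
shuffle_result = random.shuffle(deck)
print(shuffle_result)  # None; deck itself is now reordered
```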
If you wish to generate a new list shuffled, without modifying the original, you may wish to use random.sample.
from random import *
day = list(range(1, 29))
day2 = day[3:29]
shuffleday = sample(day2, len(day2))
print(shuffleday)
If you simply wish to get a random element from a list, you can use random.choice.
from random import *
day = list(range(1, 29))
randomday = choice(day[3:29])
print(randomday)
To get a random number from the day list, use choice instead of shuffle; it returns a random element from day[3:29]:
from random import *
day = list(range(1, 29))
day = day[3:29]
shuffleday = choice(day)
print(shuffleday)
The value of shuffleday is None because the shuffle function does not return a new list, it just changes the order of the elements in the list supplied as an argument.
Any of these will return a single random number from the same values as day[3:29] (4 through 28). Choose one:
shuffleday = choice(range(4,29))
or
shuffleday = randint(4,28)
or
shuffleday = randrange(4,29)
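The three calls treat the upper bound differently, which is easy to get wrong. Since day[3:29] holds the values 4 through 28, the quick check below (a sketch; the seed is there only to make it repeatable) draws many numbers from each call and looks at the range each one covers.

```python
import random

rng = random.Random(0)  # seeded only so the sketch is repeatable

# randint includes both endpoints; randrange and range exclude the stop value.
from_randint = {rng.randint(4, 28) for _ in range(10000)}
from_randrange = {rng.randrange(4, 29) for _ in range(10000)}
from_choice = {rng.choice(range(4, 29)) for _ in range(10000)}

print(min(from_randint), max(from_randint))
print(min(from_randrange), max(from_randrange))
print(min(from_choice), max(from_choice))
```

All three lines should report the same bounds, 4 and 28.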
In fact, the shuffle function does not return anything; it shuffles the elements in place (in day, in your case).
Try print(day) instead.
random.shuffle modifies the list in place; it doesn't return a value. So you can just replace your second-to-last line with
shuffle(day)
and then simply print day, without making a new variable.
Additionally I would say it's better practice to just import what you need from random, rather than doing a * import and cluttering your namespace (especially considering you are new to the language and unlikely to know the name of every function in the random library). In this case you can do:
from random import shuffle
I've got a function that simulates a stochastic system of chemical reactions. I now want to use the Process class from Python's multiprocessing library to run the stochastic simulation function several times.
I tried the following:
v = range(1, 51)

def parallelfunc(v):
    gillespie_tau_leaping(start_state, LHS, stoch_rate, state_change_array)

if __name__ == '__main__':
    start = datetime.utcnow()
    p = Process(target=parallelfunc, args=(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50))
    p.start()
    p.join()
    end = datetime.utcnow()
    sim_time = end - start
    print(f"Simulation utc time:\n{sim_time}")
but this results in the error TypeError: parallelfunc() takes 1 positional argument but 50 were given
Then I tried just passing range(1, 51) to both parallelfunc and the args parameter of Process, but then I just get SyntaxError: invalid syntax on the declaration of parallelfunc.
Using a function like parallelfunc in this way works with pool.map: there I just pass parallelfunc followed by a list from 1 to 50.
But I can't figure out what's going wrong here.
Any suggestions?
Cheers.
That's because you have given too many arguments.
Try
def parallelfunc(*v):
    gillespie_tau_leaping(start_state, LHS, stoch_rate, state_change_array)
This allows the function to take multiple arguments.
I think the error is because you are passing 50 values as arguments to parallelfunc, which you defined to take only one argument. Maybe try wrapping them in a list so that it is only one argument containing the 50 numbers. However, I don't know whether that many numbers is a problem.
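Process calls target(*args), so each element of the args tuple becomes one positional argument; that is where "50 were given" comes from. Note also that a single Process runs the target once, so running the simulation several times in parallel takes several Process objects. The sketch below assumes Python 3; simulate is a hypothetical stand-in for the real simulation function, and worker and the Queue are only one possible way to collect results.

```python
import random
from multiprocessing import Process, Queue


def simulate(run_id):
    """Hypothetical stand-in for the real simulation: a seeded random walk."""
    rng = random.Random(run_id)
    return sum(rng.choice((-1, 1)) for _ in range(1000))


def worker(run_id, results):
    # Process calls target(*args), so two items in args means two parameters.
    results.put((run_id, simulate(run_id)))


if __name__ == "__main__":
    results = Queue()
    # One Process per run. args must be a tuple; for a one-parameter target
    # it would be written with a trailing comma, e.g. args=(run_id,).
    procs = [Process(target=worker, args=(run_id, results))
             for run_id in range(1, 5)]
    for p in procs:
        p.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    outcomes = dict(results.get() for _ in procs)
    for p in procs:
        p.join()
    print(outcomes)
```

For many runs, a Pool is usually more convenient than managing Process objects by hand, since it reuses a fixed number of worker processes.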
I'm trying to run a stochastic simulation in parallel using Python's multiprocessing library 50 times, and compare the parallel execution time to the sequential execution time.
I've written a function, parallelfunc, that calls my stochastic simulation function; it is passed as the first argument to the multiprocessing map command.
def parallelfunc(v):
    gillespie_tau_leaping(start_state, LHS, stoch_rate, state_change_array)
At the moment the second argument to map is as follows:
if __name__ == '__main__':
    #para_proc = list(range(50))
    start = datetime.utcnow()
    with Pool() as p:
        pool_results = p.map(parallelfunc, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50])
    end = datetime.utcnow()
    sim_time = end - start
    print(f"Simulation utc time:\n{sim_time}")
I've tried a few ways to pass an iterable of length 50 as the second argument to map, including a list in the form of para_proc; this didn't work.
I then tried a list comprehension, [x for x in range(50)], only I think the for loop really slowed down execution: it wasn't much faster than running the simulation 50 times sequentially, which isn't what I want.
Is there a more Pythonic way to do this that won't heavily impact the speed?
Cheers
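One option, sketched here under assumptions: Pool.map accepts any iterable, so range(50) can be passed directly. Building a 50-item list or range takes microseconds, so it is unlikely to be what limits the speed-up; process start-up cost and the number of available cores are more probable factors, though that is a guess without seeing the real simulation. In the sketch, simulate is a hypothetical stand-in for gillespie_tau_leaping, and time.perf_counter is swapped in for datetime.utcnow as the timer.

```python
import random
import time
from multiprocessing import Pool


def simulate(run_id):
    """Hypothetical stand-in for the real simulation; returns a number."""
    rng = random.Random(run_id)
    return sum(rng.random() for _ in range(20000))


if __name__ == "__main__":
    start = time.perf_counter()
    with Pool() as pool:
        # map accepts any iterable; range(50) yields the run ids 0..49.
        parallel_results = pool.map(simulate, range(50))
    print(f"Parallel time: {time.perf_counter() - start:.3f} s")

    start = time.perf_counter()
    sequential_results = [simulate(run_id) for run_id in range(50)]
    print(f"Sequential time: {time.perf_counter() - start:.3f} s")
```

With a workload this small the parallel version may well be slower, because starting the worker processes costs more than the work itself; the comparison only becomes meaningful when each run takes substantially longer.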
I have a Python list like this one: [2, 5, 26, 37, 45, 12, 23, 37, 45, 12, 23, 37, 45, 12, 23, 37]. The real list is really long. The list repeats itself after a certain point, in this case after 37. I have no problem finding the number at which it repeats, but I need to truncate the list at the second occurrence. In this case the result would be [2, 5, 26, 37, 45, 12, 23, 37]. To find the number (37 in this case) I use a function firstDuplicate() found on Stack Overflow. Can someone help me?
def firstDuplicate(a):
    aset = set()
    for i in a:
        if i in aset:
            return i
        else:
            aset.add(i)

LIST = LIST[1:firstDuplicate(LIST)]
You can use the same basic idea of firstDuplicate() and create a generator that yields values until the dupe is found. Then pass it to list(), a loop, etc.
l = [2, 5, 26, 37, 45, 12, 23, 37, 45, 12, 23, 37, 45, 12, 23, 37]

def partitionAtDupe(l):
    seen = set()
    for n in l:
        yield n
        if n in seen:
            break
        seen.add(n)

list(partitionAtDupe(l))
# [2, 5, 26, 37, 45, 12, 23, 37]
It's not clear what should happen if there are no dupes. The code above will yield the whole list in that case.
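An index-based variant may be closer to what the question attempted: a slice needs the position of the second occurrence, whereas firstDuplicate() returns the repeated value itself. The sketch below tracks the position with enumerate; the name truncate_at_first_repeat is illustrative, and returning an unchanged copy when nothing repeats is an assumption about the desired behaviour.

```python
def truncate_at_first_repeat(seq):
    """Return seq up to and including the first value seen twice."""
    seen = set()
    for index, value in enumerate(seq):
        if value in seen:
            return seq[:index + 1]
        seen.add(value)
    return seq[:]  # no repeat found: return an unchanged copy


data = [2, 5, 26, 37, 45, 12, 23, 37, 45, 12, 23, 37, 45, 12, 23, 37]
print(truncate_at_first_repeat(data))
# [2, 5, 26, 37, 45, 12, 23, 37]
```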
A function to find the period size and length of repeated numbers should start from the end of the sequence of numbers. This will make it easier to ensure that there is a cycle up to the end of the list and avoid any concerns over non-periodic repetitions at the beginning of the list.
For example:
def getPeriod(seq):
    lastPos = { n:p for p,n in enumerate(seq) }
    prevPos = { n:p for p,n in enumerate(seq) if p<lastPos[n] }
    period = 1
    for n in reversed(seq):
        if n not in prevPos: break
        delta = lastPos[n] - prevPos[n]
        if delta%period == 0 or period%delta == 0:
            period = max(delta,period)
        else: break
    nonPeriodic = (i for i,(n,p) in enumerate(zip(seq[::-1],seq[-period-1::-1])) if n != p)
    periodLength = next(nonPeriodic,0)
    return period, periodLength
output:
seq = [2, 5, 26, 37, 45, 12, 23, 37, 45, 12, 23, 37, 45, 12, 23, 37]
period, periodLength = getPeriod(seq)
print(period,periodLength) # 4 9
print(seq[:-periodLength]) # [2, 5, 26, 37, 45, 12, 23]
If I have to generate natural numbers, I can use 'range' as follows:
list(range(5))
[0, 1, 2, 3, 4]
Is there any way to achieve this without using range function or looping?
You could use recursion to print the first n natural numbers:
def printNos(n):
    if n > 0:
        printNos(n-1)
        print(n)

printNos(100)
Based on Nihal's solution, but returns a list instead:
def recursive_range(n):
    if n == 0:
        return []
    return recursive_range(n-1) + [n-1]
Looping will be required in some form or another to generate a list of numbers, whether you do it yourself, use library functions, or use recursive methods.
If you're not opposed to looping in principle (but just don't want to implement it yourself), there are many practical and esoteric ways to do it (a number have been mentioned here already).
A similar question was posted here: How to fill a list. Although it has interesting solutions, they're all still looping, or using range type functions.
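One example of letting the library do the iteration, using only the standard itertools module: there is no explicit loop, range call, or recursion in the code itself, although the iteration still happens inside islice and list.

```python
from itertools import count, islice

# count() is an unbounded iterator 0, 1, 2, ...; islice takes the first n items.
first_five = list(islice(count(), 5))
print(first_five)  # [0, 1, 2, 3, 4]

# The same idea starting at 1 instead of 0.
from_one = list(islice(count(1), 5))
print(from_one)  # [1, 2, 3, 4, 5]
```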
Well, yes, you can do this without using range, loop or recursion:
>>> num = 10
>>> from subprocess import call
>>> call(["seq", str(num)])
You can even have a list (or a generator, of course):
>>> num = 10
>>> from subprocess import check_output
>>> ls = check_output(["seq", str(num)]).decode()
>>> [int(n) for n in ls.split()]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
But...what's the purpose?