python random binary list, evenly distributed

I have code to make a binary list of any length I want, with a random number of bits turned on:
rand_binary_list = lambda n: [random.randint(0,1) for b in range(1,n+1)]
rand_binary_list(10)
this returns something like this:
[0,1,1,0,1,0,1,0,0,0]
and if you run it a million times you'll get a bell-curve distribution: sum(rand_binary_list(10)) comes out around 5 far more often than it comes out 1 or 10.
What I'd prefer is for having 1 bit turned on out of 10 to be just as likely as having half of them turned on. In other words, the number of bits turned on should be uniformly distributed.
I'm not sure how this can be done without compromising the integrity of the randomness. Any ideas?
EDIT:
I wanted to show this bell curve phenomenon explicitly so here it is:
>>> import random
>>> rand_binary_list = lambda n: [random.randint(0,1) for b in range(1,n+1)]
>>> counts = {0:0,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0}
>>> for i in range(10000):
...     x = sum(rand_binary_list(10))
...     counts[x] = counts[x] + 1
...
>>> counts[0]
7
>>> counts[1]
89
>>> counts[2]
454
>>> counts[3]
1217
>>> counts[4]
2017
>>> counts[5]
2465
>>> counts[6]
1995
>>> counts[7]
1183
>>> counts[8]
460
>>> counts[9]
107
>>> counts[10]
6
see how the chances of getting 5 turned on are much higher than the chances of getting 1 bit turned on?

Something like this:
def randbitlist(n=10):
    n_on = random.randint(0, n)
    n_off = n - n_on
    result = [1]*n_on + [0]*n_off
    random.shuffle(result)
    return result
The number of bits "on" should be uniformly distributed in [0, n] inclusive, and then those bits selected will be uniformly distributed throughout the list.
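As a quick check (mirroring the counting loop from the question), the sum of the returned list should now be roughly uniform over 0..10:
import random
from collections import Counter

def randbitlist(n=10):
    n_on = random.randint(0, n)         # number of on-bits, uniform over [0, n]
    result = [1]*n_on + [0]*(n - n_on)
    random.shuffle(result)              # spread the on-bits uniformly through the list
    return result

counts = Counter(sum(randbitlist(10)) for _ in range(10000))
for k in range(11):
    print(k, counts[k])                 # each count should land near 10000/11, roughly 909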

Related

A while loop time complexity

I'm interested in determining the big O time complexity of the following:
def f(x):
    r = x / 2
    d = 1e-10
    while abs(x - r**2) > d:
        r = (r + x/r) / 2
    return r
I believe this is O(log n). To arrive at this, I merely collected empirical data via the timeit module, plotted the results, and saw a plot that looked logarithmic, using the following code:
import timeit
import numpy as np
import matplotlib.pyplot as plt

ns = np.linspace(1, 50_000, 100, dtype=int)
ts = [timeit.timeit('f({})'.format(n),
                    number=100,
                    globals=globals())
      for n in ns]
plt.plot(ns, ts, 'or')
But this seems like a corny way to go about figuring it out. Intuitively, I understand that the body of the while loop divides an expression by 2 some number of times, k, until the while condition's expression falls below d. This repeated division by 2 gives something like 1/2^k, from which I can see where a log comes in when solving for k. I can't seem to write down a more explicit derivation, though. Any help?
This is Heron's (or Babylonian) method for calculating the square root of a number: https://en.wikipedia.org/wiki/Methods_of_computing_square_roots
Big-O notation for this requires a numerical-analysis approach. For more details on the analysis you can check the Wikipedia page listed, or look up Heron's error convergence or fixed-point iteration (or look here: https://mathcirclesofchicago.org/wp-content/uploads/2015/08/johnson.pdf).
Broad strokes: if we can write the error e_n = x - r_n**2 in terms of the previous error, so that e_{n+1} = e_n**2 / (2*(e_n + 1)),
then we can see that e_{n+1} <= min(e_n**2 / 2, e_n / 2), so the error decreases quadratically, with the number of digits of accuracy effectively doubling each iteration.
What's different between this analysis and Big-O is that the running time does NOT depend on the size of the input, but rather on the desired accuracy. So in terms of the input, this while loop is O(1), because its number of iterations is bounded by the accuracy, not the input.
In terms of accuracy, the error is bounded above by e_n < 2**(-n), so we need n such that 2**(-n) < d, i.e. n > log_2(1/d). Assuming d < 1, taking n = ceil(log_2(1/d)) iterations is enough, so in terms of d the loop is O(log(1/d)).
EDIT: Some more info on error analysis of fixed point iteration http://www.maths.lth.se/na/courses/FMN050/media/material/part3_1.pdf
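For intuition on the accuracy side, here is a small sketch (a hypothetical helper that takes the tolerance d as a parameter, whereas the original f hard-codes 1e-10) that counts iterations as the tolerance shrinks:
def heron_iterations(x, d):
    # same loop as f, but counting iterations and taking d as a parameter
    r = x / 2
    steps = 0
    while abs(x - r**2) > d:
        r = (r + x / r) / 2
        steps += 1
    return steps

for d in (1e-2, 1e-4, 1e-6, 1e-8, 1e-10):
    print(d, heron_iterations(100.0, d))
The iteration count grows only slowly as d shrinks, consistent with the error bound above.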
I believe you're correct that it's O(log n).
Here you can see the successive values of r when x = 100000:
1 50000
2 25001
3 12502
4 6255
5 3136
6 1584
7 823
8 472
9 342
10 317
11 316
12 316
(I've rounded them off because the fractions are not interesting).
What you can see is that it goes through two phases.
Phase 1 is when r is large. During these first few iterations, x/r is tiny compared to r. As a result, r + x/r is close to r, so (r + x/r) / 2 is approximately r/2. You can see this in the first 8 iterations.
Phase 2 is when it gets close to the final result. During the last few iterations, x/r is close to r, so r + x/r is close to 2 * r, so (r + x/r) / 2 is close to r. At this point we're just improving the approximation by small amounts. These iterations are not really very dependent on the magnitude of x.
Here's the succession for x = 1000000 (10x the above):
1 500000
2 250001
3 125002
4 62505
5 31261
6 15646
7 7855
8 3991
9 2121
10 1296
11 1034
12 1001
13 1000
14 1000
This time there are 10 iterations in Phase 1, then we again have 4 iterations in Phase 2.
The complexity of the algorithm is dominated by Phase 1, which is logarithmic because it's approximately dividing by 2 each time.
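To reproduce these traces, a small sketch that instruments the same loop and prints r each iteration (the name f_traced is just for illustration) might look like:
def f_traced(x):
    r = x / 2
    d = 1e-10
    i = 1
    print(i, round(r))          # initial guess x/2
    while abs(x - r**2) > d:
        r = (r + x/r) / 2
        i += 1
        print(i, round(r))      # successive Heron updates
    return r

f_traced(100000)    # roughly the first table above
f_traced(1000000)   # roughly the second table above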

find if a number divisible by the input numbers

Given two numbers a and b, we have to find the nth number which is divisible by a or b.
The input format is as follows:
Input:
The first line consists of an integer T, denoting the number of test cases.
The second line contains three integers: a, b and N.
Output:
For each test case, print the Nth number on a new line.
Constraints:
1 ≤ T ≤ 10^5
1 ≤ a, b ≤ 10^4
1 ≤ N ≤ 10^9
Sample Input
1
2 3 10
Sample Output
15
Explanation
The numbers which are divisible by 2 or 3 are: 2, 3, 4, 6, 8, 9, 10, 12, 14, 15, and the 10th number is 15.
My code
test_case = input()
if int(test_case) <= 100000 and int(test_case) >= 1:
    for p in range(int(test_case)):
        count = 1
        j = 1
        inp = list(map(int, input().strip('').split()))
        if inp[0] <= 10000 and inp[0] >= 1 and inp[1] <= 10000 and inp[1] >= 1 and inp[1] <= 1000000000 and inp[1] >= 1:
            while True:
                if count <= inp[2]:
                    k = j
                    if j % inp[0] == 0 or j % inp[1] == 0:
                        count = count + 1
                        j = j + 1
                    else:
                        j = j + 1
                else:
                    break
            print(k)
        else:
            break
Problem statement:
For the single test case input 2000 3000 100000, it takes more than one second to complete. I want to get the result in less than 1 second. Is there a more time-efficient approach to this problem, maybe using some other data structure or algorithm?
For any two numbers a and b there is a number k = a*b, and there are only so many multiples of a or b up to k. This set can be created like so:
s = {a*1, b*1, ... a*(b-1), b*(a-1), a*b}
Say we take the values a=2, b=3; then s = (2, 3, 4, 6). For the index c (the N in the problem), the results follow this pattern:
[1 - 4] => (2,3,4,6)
[5 - 8] => 6 + (2,3,4,6)
[9 - 12] => 6*2 + (2,3,4,6)
...
Notice that the values repeat in a predictable pattern. To get the row, divide c by the length of the set s (call it n); the index into s is c mod n. Subtract 1 from c first to account for the 1-based indexing used in the problem:
row = floor((c - 1) / n)
column = (c - 1) % n
result = (a*b)*row + s[column]
Python implementation:
a = 2000
b = 3000
c = 100000
s = sorted(set([a*i for i in range(1, b+1)] + [b*i for i in range(1, a+1)]))  # must be sorted before indexing
print((((c - 1)//len(s)) * (a*b)) + s[(c - 1) % len(s)])
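As a sanity check, the closed-form lookup can be compared against a direct scan (the same approach as the loop in the question) for small inputs; the helper names here are just for illustration:
def nth_divisible(a, b, c):
    # multiples of a or b within one period [1, a*b], in sorted order
    s = sorted(set([a*i for i in range(1, b+1)] + [b*i for i in range(1, a+1)]))
    return ((c - 1)//len(s)) * (a*b) + s[(c - 1) % len(s)]

def nth_divisible_brute(a, b, c):
    n, k = 0, 0
    while n < c:
        k += 1
        if k % a == 0 or k % b == 0:
            n += 1
    return k

assert nth_divisible(2, 3, 10) == nth_divisible_brute(2, 3, 10) == 15
assert all(nth_divisible(4, 6, c) == nth_divisible_brute(4, 6, c) for c in range(1, 200))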
I'm not certain I grasp exactly what you're trying to accomplish. But if I understand it right, isn't the answer simply b*(N/2)? Since you are listing the multiples of both numbers, the Nth will always be the second number listed times N/2.
In your initial example that would be 3*10/2 = 15.
In the code example, it would be 3000*100000/2 = 150,000,000.
Update:
Here is code to compute the desired values using sets and lists to speed up the calculation. I'm still wondering what the recurrence for the odd indexes could be, if anyone happens to stumble upon it...
a = 2000
b = 3000
c = 100000
a_list = [a*x for x in range(1, c)]
b_list = [b*x for x in range(1, c)]
nums = set(a_list)
nums.update(b_list)
nums = sorted(nums)
print(nums[c-1])
This code runs in 0.14s on my laptop, which is significantly below the requested threshold. Nonetheless, these timings will depend on the machine the code is run on.

Python Random Selection

I have code which randomly generates either 0 or 9. This code is run 289 times...
import random

track = 0
if track < 35:
    val = random.choice([0, 9])
    if val == 9:
        track += 1
else:
    val = 0
According to this code, once 9 has been generated 35 times, only 0 is generated from then on. So there is a heavy bias towards 9 at the start, and towards the end mostly 0 is output.
Is there a way to reduce this bias so that the 9's are spread out fairly evenly across the 289 runs?
Thanks for any help in advance
Apparently you want 9 to occur 35 times, and 0 to occur for the remainder - but you want the 9's to be evenly distributed. This is easy to do with a shuffle.
values = [9] * 35 + [0] * (289 - 35)
random.shuffle(values)
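A quick follow-up check that the shuffled list always has exactly the right composition (only the positions of the 9's are random):
import random

values = [9] * 35 + [0] * (289 - 35)
random.shuffle(values)

assert len(values) == 289
assert values.count(9) == 35   # always exactly 35 nines, just in random positions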
It sounds like you want to add some bias to the numbers that are generated by your script. Accordingly, you'll want to think about how you can use probability to assign a correct bias to the numbers being assigned.
For example, let's say you want to generate a list of 289 integers where there is a maximum of 35 nines. 35 is approximately 12% of 289, and as such, you would assign a probability of .12 to the number 9. From there, you could assign some other (relatively small) probability to the numbers 1 - 8, and some relatively large probability to the number 0.
Walker's Alias Method appears to be able to do what you need for this problem.
General Example (strings A B C or D with probabilities .1 .2 .3 .4):
abcd = dict( A=1, D=4, C=3, B=2 )
# keys can be any immutables: 2d points, colors, atoms ...
wrand = Walkerrandom( abcd.values(), abcd.keys() )
wrand.random() # each call -> "A" "B" "C" or "D"
# fast: 1 randint(), 1 uniform(), table lookup
Specific Example:
numbers = {1: 725, 2: 725, 3: 725, 4: 725, 5: 725, 6: 725, 7: 725, 8: 725, 9: 12, 0: 3}
wrand = Walkerrandom( numbers.values(), numbers.keys() )
# Add looping logic + counting logic to keep track of 9's here
track = 0
i = 0
while i < 290:
    if track < 35:
        val = wrand.random()
        if val == 9:
            track += 1
    else:
        val = 0
    i += 1
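The same weighted-selection idea can also be sketched with the standard library's random.choices (Python 3.6+) as an alternative to the Walkerrandom recipe; the weights below are just the illustrative ones from the example above:
import random

population = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
weights = [725]*8 + [12, 3]     # illustrative weights from the example above

track = 0
results = []
for _ in range(289):
    if track < 35:
        val = random.choices(population, weights=weights, k=1)[0]
        if val == 9:
            track += 1
    else:
        val = 0
    results.append(val)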

Distributing integers using weights? How to calculate?

I need to distribute a value based on some weights. For example, if my weights are 1 and 2, then I would expect the column weighted as 2 to have twice the value as the column weighted 1.
I have some Python code to demonstrate what I'm trying to do, and the problem:
def distribute(total, distribution):
    distributed_total = []
    for weight in distribution:
        weight = float(weight)
        p = weight/sum(distribution)
        weighted_value = round(p*total)
        distributed_total.append(weighted_value)
    return distributed_total

for x in xrange(100):
    d = distribute(x, (1,2,3))
    if x != sum(d):
        print x, sum(d), d
There are many cases shown by the code above where distributing a value results in the sum of the distribution being different than the original value. For example, distributing 3 with weights of (1,2,3) results in (1,1,2), which totals 4.
What is the simplest way to fix this distribution algorithm?
UPDATE:
I expect the distributed values to be integer values. It doesn't matter exactly how the integers are distributed as long as they total to the correct value, and they are "as close as possible" to the correct distribution.
(By correct distribution I mean the non-integer distribution, and I haven't fully defined what I mean by "as close as possible." There are perhaps several valid outputs, so long as they total the original value.)
Distribute the first share as expected. Now you have a simpler problem, with one fewer participant and a reduced amount available for distribution. Repeat until there are no more participants.
>>> def distribute2(available, weights):
...     distributed_amounts = []
...     total_weights = sum(weights)
...     for weight in weights:
...         weight = float(weight)
...         p = weight / total_weights
...         distributed_amount = round(p * available)
...         distributed_amounts.append(distributed_amount)
...         total_weights -= weight
...         available -= distributed_amount
...     return distributed_amounts
...
>>> for x in xrange(100):
...     d = distribute2(x, (1,2,3))
...     if x != sum(d):
...         print x, sum(d), d
...
>>>
You have to distribute the rounding errors somehow:
(Diagram: "Actual" shows the true block boundaries at fractional positions; "Pixel grid" shows the same boundaries snapped to whole pixels.)
The simplest would be to round each true value to the nearest pixel, for both the start and end position. So, when you round up block A 0.5 to 1, you also change the start position of the block B from 0.5 to 1. This decreases the size of B by 0.5 (in essence, "stealing" the size from it). Of course, this leads you to having B steal size from C, ultimately resulting in having:
(Diagram: the three blocks after rounding, each exactly one unit wide.)
but how else did you expect to divide 3 into 3 integral parts?
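A minimal sketch of that boundary-rounding idea: round the running (cumulative) boundaries instead of the individual shares, so each block absorbs its neighbour's rounding error and the total is preserved. The function name is just for illustration:
import math

def distribute_by_boundaries(total, weights):
    scale = float(sum(weights))
    result, prev, running = [], 0, 0.0
    for w in weights:
        running += w
        boundary = math.floor(total * running / scale + 0.5)   # round half up, as in the example above
        result.append(boundary - prev)
        prev = boundary
    return result

print(distribute_by_boundaries(3, (1, 2, 3)))   # [1, 1, 1], which totals 3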
The easiest approach is to calculate the normalization scale, which is the factor by which the sum of the weights exceeds the total you are aiming for, then divide each item in your weights by that scale.
def distribute(total, weights):
    scale = float(sum(weights))/total
    return [x/scale for x in weights]
If you expect distributing 3 with weights of (1,2,3) to be equal to (0.5, 1, 1.5), then the rounding is your problem:
weighted_value = round(p*total)
You want:
weighted_value = p*total
EDIT: Solution to return integer distribution
from math import modf

def distribute(total, distribution):
    leftover = 0.0
    distributed_total = []
    distribution_sum = sum(distribution)
    for weight in distribution:
        weight = float(weight)
        leftover, weighted_value = modf(weight*total/distribution_sum + leftover)
        distributed_total.append(weighted_value)
    distributed_total[-1] = round(distributed_total[-1] + leftover)  # mitigate round-off errors
    return distributed_total
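Running the question's own check against this version (in Python 3, with the modf import included) should print no mismatches, since the leftover fractions are carried forward and folded into the last element:
for x in range(100):
    d = distribute(x, (1, 2, 3))
    if x != sum(d):
        print(x, sum(d), d)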

Poisson simulation not working as expected?

I have a simple script to set up a Poisson distribution: it constructs an array of "events" with probability 0.1 each, then counts the number of successes in each group of 10. It almost works, but the distribution is not quite right (P(0) should equal P(1), but is instead about 90% of P(1)). It's like there's an off-by-one kind of error, but I can't figure out what it is. The script uses the Counter class from here (because I have Python 2.6 and not 2.7), and the grouping uses itertools as discussed here. It's not a stochastic issue: repeats give pretty tight results, the overall mean looks good, and the group size looks good. Any ideas where I've messed up?
from itertools import izip_longest
import numpy as np
import Counter

def groups(iterable, n=3, padvalue=0):
    "groups('abcde', 3, 'x') --> ('a','b','c'), ('d','e','x')"
    return izip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

def event():
    f = 0.1
    r = np.random.random()
    if r < f: return 1
    return 0

L = [event() for i in range(100000)]
rL = [sum(g) for g in groups(L, n=10)]
print len(rL)
print sum(list(L))
C = Counter.Counter(rL)
for i in range(max(C.keys())+1):
    print str(i).rjust(2), C[i]
$ python script.py
10000
9949
0 3509
1 3845
2 1971
3 555
4 104
5 15
6 1
$ python script.py
10000
10152
0 3417
1 3879
2 1978
3 599
4 115
5 12
I did a combinatorial reality check on your math, and it looks like your results are actually correct: P(0) should not be roughly equal to P(1).
.9^10 = 0.34867844 = probability of 0 events
.1 * .9^9 * (10 choose 1) = .1 * .9^9 * 10 = 0.387420489 = probability of 1 event
I wonder if you accidentally did your math thusly:
.1 * .9^10 * (10 choose 1) = 0.34867844 = incorrect probability of 1 event
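The same check in code (using math.comb, available in Python 3.8+), which lines up with the observed counts of roughly 3500 and 3850 out of 10000 groups:
from math import comb

n, p = 10, 0.1
for k in (0, 1):
    print(k, comb(n, k) * p**k * (1 - p)**(n - k))
# 0 0.34867844...
# 1 0.38742048...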
