Binary Search vs. Linear Search Strange Times

Binary Search vs. Linear Search Strange Times - python

I have a list of about 100k integers and I am testing out concepts of linear vs binary search. It looks like my linear search is actually faster than my binary search. Im not sure I see why the that is though. Any insights?
It also looks like my cpu takes a major hit with the linear search.
def linear_search(big_list, element):
"""
Iterate over an array and return the index of the first occurance of the item.
Use to determine the time it take a standard seach to complate.
Compare to binary search to see the difference in speed.
% time ./searchalgo.py
Results: time ./main.py 0.35s user 0.09s system 31% cpu 1.395 total
"""
for i in range(len(big_list)):
if big_list[i] == element:
# Returning the element in the list.
return i
return -1
print(linear_search(big_list, 999990))
def linear_binary_search(big_list, value):
"""
Divide and concuer approach
- Sort the array.
- Is the value I am searching for equal to the middle element of the array.
- Compare is the middle element smaller orlarger than the element you are looking for?
- If smaller then perform linear search to the right.
- If larger then perform linear seach to the left.
% time ./searchalgo.py
Results: 0.57s user 0.18s system 32% cpu 2.344 total
"""
big_list = sorted(big_list)
left, right = 0, len(big_list) - 1
index = (left + right) // 2
while big_list[index] != value:
if big_list[index] == value:
return index
if big_list[index] < value:
index = index + 1
elif big_list[index] > value:
index = index - 1
print(big_list[index])
# linear_binary_search(big_list, 999990)
Output of the linear search time:
./main.py 0.26s user 0.08s system 94% cpu 0.355 total
Output of the binary search time:
./main.py 0.39s user 0.11s system 45% cpu 1.103 total

your first algorithm time complexity is O(n) because of single traversal. But in case of your second traversal, first you sort the elements which takes O(nlogn) time which and then for searching it takes O(logn). So your second algorithm time complexity will be O(nlogn) + O(logn) which is greater than time complexity of your first algorithm.

def binary_search(sorted_list,target):
if not sorted_list:
return None
mid = len(sorted_list)//2
if sorted_list[mid] == target:
return target
if sorted_list[mid] < target:
return binary_search(sorted_list[mid+1:],target)
return binary_search(sorted_list[:mid],target)
Im pretty sure will actually correctly implement binary search recursively (which is easier for my brain to process) ... there is also the builtin bisect library that basically implements binary_search for you

Related

Why is this genetic algorithm taking too many iterations?

I'm learning about genetic algorithms and in order to better understand the concepts I tried to build genetic algorithm from scratch using python without using any external module (just the standard library and a little bit of numpy)
The goal is to find a target string, so if I give it the string hello and define 26 chars + a space, there are 26^5 possibilities which is huge. Thus the need to use a GA to solve this probem.
I defined the following functions:
Generate population : we generate the population given size n and a target we generate n string having len(target) of random chars, we return the population as a list of str
Compute a fitness score: if the char at position i is equal to the char at position i of target we increment the score, here's the code:
def fitness(indiv,target):
score = 0
#print(indiv," vs ",target)
for idx,char in enumerate(list(target)):
if char == indiv[idx]:
score += 1
else:
score = 0
return score
Select parents, crossing between parents and generating a new population of children
Here are the function responsible for that:
from numpy.random import choice
def crossover(p1,p2):
# we define a crossover between p1 and p2 (single point cross over)
point = random.choice([i for i in range (len(target))])
#print("Parents:",p1,p2)
# C1 and C2 are the new children, before the cross over point they are equalt to their prantes, after that we swap
c = [p1[i] for i in range(point)]
#print("Crossover point: ",point)
for i in range(point,len(p1)):
c.append(p2[i])
#print("Offsprings:", c1," and ", c2)
c = "".join(c)
# we mutate c too
c = mutate(c)
return c
def mutate(ind):
point = random.choice([i for i in range (len(target))])
new_ind = list(ind)
new_ind[point] = random.choice(letters)
return "".join(new_ind)
def select_parent(new_pop,fit_scores):
totale = sum(fit_scores)
probs = [score/totale for score in fit_scores]
parent = choice(new_pop,1,p=probs)[0]
return parent
I'm selecting parents by computing the probabilities of each individual (individual score/ total score of population), then using a weighted random choice function to select a parent (this is a numpy function).
For the crossover, I'm generating a child c and a random splitting point, all chars before this random point are the first parent chars, and all chars after the splitting point are chars from the parent.
besides that I defined a function called should_stop which check whether we found the target, and print_best which gets the best individuals out of a population (highest fitness score).
Then I created a find function that use all the functions defined above:
def find(size,target,pop):
scores = [fitness(ind,target) for ind in pop]
#print("len of scores is ", len(scores))
#good_indiv = select_individuals(pop,scores)
#print("Length of good indivs is", len(good_indiv))
new_pop = []
# corssover good individuals
for ind in pop:
pa = select_parent(pop,scores)
pb = select_parent(pop,scores)
#print(pa,pb)
child = crossover(pa,pb)
#print(type(child))
new_pop.append(child)
best = print_best(new_pop,scores)
print("********** The best individual is: ", best, " ********")
return (new_pop,best)
n = 200
target = "hello"
popu = generate_pop(n,target)
#find(n,target,popu)
for i in range(1000):
print(len(popu))
data = find(n,target,popu)
popu = data[0]
print("iteration number is ", i)
if data[1] == target:
break
The Problem The problem is that it's taking too many iterations than it shoud be to generate hello (more than 200 iterations most of the time), while in this example, it only takes few iterations: https://jbezerra.github.io/The-Shakespeare-and-Monkey-Problem/index.html
Sure the problem is not coded in the same way, I used python and a procedural way to code things but the logic is the same. So what I'm doing wrong ?

You mutate 100% of the time. You select 'suitable' parents which are likely to produce a fit offspring, but then you apply a mutation that's more likely than not to "throw it off". The example link your provided behaves the same way if you increase mutation rate to 100%.
The purpose of mutation is to "nudge" the search in a different direction if you appear to be stuck in a local optimum, applying it all the time turns this from an evolutionary algorithm to something much closer to random search.

The idea of genetic algorithms supports that best ones survive and create new generations
First off all you should keep best ones in the every generation for the next generation (for example best 40% of every generation keep living on the next generatio) and you should breed those 40 percent with each other and mutate only limited number of individual in every generation those numbers should be low like lower than 5% of the individuals mutates i believe this will reduce the number of generations

I would suggest define your strings in a dictionary and give a number to them
then analyse this arrays
example
my dictionary is
I : 1
eat : 23
want : 12
to : 2
so I want to eat
convert to [ 1 , 12, 2, 23]
so the randomness is reduce by a factor.
here the words are inferred from dictionary
so the only variable is the order and which words appear in your string.
re-write you algorithm with a dictionary
your algo run time will improve by a factor.
with regards
teja

Fast algorithm of multiple numbers sequential decrements until limit of sum

I have:
an ordered list of dc objects that have a float field result in it.
limit value for sum of result.
pack (not a better name ever) is a value of decreasing.
Problem:
I need to sequentially decrease results for each dc until sum of all results will be less or equal limit (without assigning result values below 0).
After some profiling I got this code:
while(self.sum > self.limit):
for dc in self.dc:
if dc.result > 0:
# max() too slow here
result = (
dc.result - self.pack
if dc.result - self.pack > 0
else 0
)
# Prevent sum() count for all list on each iteration
self.sum -= dc.result - result
dc.result = result
if self.sum <= self.limit:
break
But it has a low performance for small self.pack values (the code is doing too many iterations).
Is there a way to make this method faster?

If you are not too concerned how much you remove from the pack as long as it ensures it is less than sum, then you could just implement DC as a max heap (priority queue), and pop it every time until sum is <= self.limit. That would significantly speed up processing time especially in big data sets.
Edit:
Since dc is an ordered list, just treat it like a stack and pop from the back and remove from the pack (since the "heaviest" things are at the back).

Testing a vector without scanning it [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 9 years ago.
Improve this question
For a series of algorithms I'm implementing I need to simulate things like sets of coins being weighed or pooled blood samples. The overriding goal is to identify a sparse set of interesting items in a set of otherwise identical items. This identification is done by testing groups of items together. For example the classic problem is to find a light counterfeit coin in a group of 81 (identical) coins, using as few weightings of a pan balance as possible. The trick is to split the 81 coins into three groups and weigh two groups against each other. You then do this on the group which doesn't balance until you have 2 coins left.
The key point in the discussion above is that the set of interesting items is sparse in the wider set - the algorithms I'm implementing all outperform binary search etc for this type of input.
What I need is a way to test the entire vector that indicates the presence of a single, or more ones, without scanning the vector componentwise.
I.e. a way to return the Hamming Weight of the vector in an O(1) operation - this will accurately simulate pooling blood samples/weighing groups of coins in a pan balance.
It's key that the vector isn't scanned - but the output should indicate that there is at least one 1 in the vector. By scanning I mean looking at the vector with algorithms such as binary search or looking at each element in turn. That is need to simulate pooling groups of items (such as blood samples) and s single test on the group which indicates the presence of a 1.
I've implemented this 'vector' as a list currently, but this needn't be set in stone. The task is to determine, by testing groups of the sublist, where the 1s in the vector are. An example of the list is:
sparselist = [0]*100000
sparselist[1024] = 1
But this could equally well be a long/set/something else as suggested below.
Currently I'm using any() as the test but it's been pointed out to me that any() will scan the vector - defeating the purpose of what I'm trying to achieve.
Here is an example of a naive binary search using any to test the groups:
def binary_search(inList):
low = 0
high = len(inList)
while low < high:
mid = low + (high-low) // 2
upper = inList[mid:high]
lower = inList[low:mid]
if any(lower):
high = mid
elif any(upper):
low = mid+1
else:
# Neither side has a 1
return -1
return mid
I apologise if this code isn't production quality. Any suggestions to improve it (beyond the any() test) will be appreciated.
I'm trying to come up with a better test than any() as it's been pointed out that any() will scan the list - defeating the point of what I'm trying to do. The test needn't return the exact Hamming weight - it merely needs to indicate that there is (or isn't!) a 1 in the group being tested (i.e. upper/lower in the code above).
I've also thought of using a binary xor, but don't know how to use it in a way that isn't componentwise.

Here is a sketch:
class OrVector (list):
def __init__(self):
self._nonzero_counter = 0
list.__init__(self)
def append(self, x):
list.append(self, x)
if x:
self._nonzero_counter += 1
def remove(self, x):
if x:
self._nonzero_counter -= 1
list.remove(self, x)
def hasOne(self):
return self._nonzero_counter > 0
v = OrVector()
v.append(0)
print v
print v.hasOne()
v.append(1);
print v
print v.hasOne()
v.remove(1);
print v
print v.hasOne()
Output:
[0]
False
[0, 1]
True
[0]
False
The idea is to inherit from list, and add a single variable which stores the number of nonzero entries. While the crucial functionality is delegated to the base list class, at the same time you monitor the number of nonzero entries in the list, and can query it in O(1) time using hasOne() member function.
HTH.

any will only scan the whole vector if does not find you you're after before the end of the "vector".
From the docs it is equivalent to
def any(iterable):
for element in iterable:
if element:
return True
return False
This does make it O(n). If you have things sorted (in your "binary vector") you can use bisect.
e.g. position = index(myVector, value)

Ok, maybe I will try an alternative answer.
You cannot do this with out any prior knowledge of your data. The only thing you can do it to make a test and cache the results. You can design a data structure that will help you determine a result of any subsequent tests in case your data structure is mutable, or a data structure that will be able to determine answer in better time on a subset of your vector.
However, your question does not indicate this. At least it did not at the time of writing the answer. For now you want to make one test on a vector, for a presence of a particular element, giving no prior knowledge about the data, in time complexity less than O(log n) in average case or O(n) in worst. This is not possible.
Also keep in mind you need to load a vector at some point which takes O(n) operations, so if you are interested in performing one test over a set of elements you wont loose much. On the average case with more elements, the loading time will take much more than testing.
If you want to perform a set of tests you can design an algorithm that will "build up" some knowledge during the subsequent test, that will help it determine results in better times. However, that holds only if you want make more than one test!

Unexpected performance curve from CPython merge sort

I have implemented a naive merge sorting algorithm in Python. Algorithm and test code is below:
import time
import random
import matplotlib.pyplot as plt
import math
from collections import deque
def sort(unsorted):
if len(unsorted) <= 1:
return unsorted
to_merge = deque(deque([elem]) for elem in unsorted)
while len(to_merge) > 1:
left = to_merge.popleft()
right = to_merge.popleft()
to_merge.append(merge(left, right))
return to_merge.pop()
def merge(left, right):
result = deque()
while left or right:
if left and right:
elem = left.popleft() if left[0] > right[0] else right.popleft()
elif not left and right:
elem = right.popleft()
elif not right and left:
elem = left.popleft()
result.append(elem)
return result
LOOP_COUNT = 100
START_N = 1
END_N = 1000
def test(fun, test_data):
start = time.clock()
for _ in xrange(LOOP_COUNT):
fun(test_data)
return time.clock() - start
def run_test():
timings, elem_nums = [], []
test_data = random.sample(xrange(100000), END_N)
for i in xrange(START_N, END_N):
loop_test_data = test_data[:i]
elapsed = test(sort, loop_test_data)
timings.append(elapsed)
elem_nums.append(len(loop_test_data))
print "%f s --- %d elems" % (elapsed, len(loop_test_data))
plt.plot(elem_nums, timings)
plt.show()
run_test()
As much as I can see everything is OK and I should get a nice N*logN curve as a result. But the picture differs a bit:
Things I've tried to investigate the issue:
PyPy. The curve is ok.
Disabled the GC using the gc module. Wrong guess. Debug output showed that it doesn't even run until the end of the test.
Memory profiling using meliae - nothing special or suspicious.
`
I had another implementation (a recursive one using the same merge function), it acts the similar way. The more full test cycles I create - the more "jumps" there are in the curve.
So how can this behaviour be explained and - hopefully - fixed?
UPD: changed lists to collections.deque
UPD2: added the full test code
UPD3: I use Python 2.7.1 on a Ubuntu 11.04 OS, using a quad-core 2Hz notebook. I tried to turn of most of all other processes: the number of spikes went down but at least one of them was still there.

You are simply picking up the impact of other processes on your machine.
You run your sort function 100 times for input size 1 and record the total time spent on this. Then you run it 100 times for input size 2, and record the total time spent. You continue doing so until you reach input size 1000.
Let's say once in a while your OS (or you yourself) start doing something CPU-intensive. Let's say this "spike" lasts as long as it takes you to run your sort function 5000 times. This means that the execution times would look slow for 5000 / 100 = 50 consecutive input sizes. A while later, another spike happens, and another range of input sizes look slow. This is precisely what you see in your chart.
I can think of one way to avoid this problem. Run your sort function just once for each input size: 1, 2, 3, ..., 1000. Repeat this process 100 times, using the same 1000 inputs (it's important, see explanation at the end). Now take the minimum time spent for each input size as your final data point for the chart.
That way, your spikes should only affect each input size only a few times out of 100 runs; and since you're taking the minimum, they will likely have no impact on the final chart at all.
If your spikes are really really long and frequent, you of course might want to increase the number of repetitions beyond the current 100 per input size.
Looking at your spikes, I notice the execution slows down exactly 3 times during a spike. I'm guessing the OS gives your python process one slot out of three during high load. Whether my guess is correct or not, the approach I recommend should resolve the issue.
EDIT:
I realized that I didn't clarify one point in my proposed solution to your problem.
Should you use the same input in each of your 100 runs for the given input size? Or should use 100 different (random) inputs?
Since I recommended to take the minimum of the execution times, the inputs should be the same (otherwise you'll be getting incorrect output, as you'll measuring the best-case algorithm complexity instead of the average complexity!).
But when you take the same inputs, you create some noise in your chart since some inputs are simply faster than others.
So a better solution is to resolve the system load problem, without creating the problem of only one input per input size (this is obviously pseudocode):
seed = 'choose whatever you like'
repeats = 4
inputs_per_size = 25
runtimes = defaultdict(lambda : float('inf'))
for r in range(repeats):
random.seed(seed)
for i in range(inputs_per_size):
for n in range(1000):
input = generate_random_input(size = n)
execution_time = get_execution_time(input)
if runtimes[(n, i)] > execution_time:
runtimes[(n,i)] = execution_time
for n in range(1000):
runtimes[n] = sum(runtimes[(n,i)] for i in range(inputs_per_size))/inputs_per_size
Now you can use runtimes[n] to build your plot.
Of course, depending if your system is super-noisy, you might change (repeats, inputs_per_size) from (4,25) to say, (10,10), or even (25,4).

I can reproduce the spikes using your code:
You should choose an appropriate timing function (time.time() vs. time.clock() -- from timeit import default_timer), number of repetitions in a test (how long each test takes), and number of tests to choose the minimal time from. It gives you a better precision and less external influence on the results. Read the note from timeit.Timer.repeat() docs:
It’s tempting to calculate mean and standard deviation from the result
vector and report these. However, this is not very useful. In a
typical case, the lowest value gives a lower bound for how fast your
machine can run the given code snippet; higher values in the result
vector are typically not caused by variability in Python’s speed, but
by other processes interfering with your timing accuracy. So the min()
of the result is probably the only number you should be interested in.
After that, you should look at the entire vector and apply common
sense rather than statistics.
timeit module can choose appropriate parameters for you:
$ python -mtimeit -s 'from m import testdata, sort; a = testdata[:500]' 'sort(a)'
Here's timeit-based performance curve:
The figure shows that sort() behavior is consistent with O(n*log(n)):
|------------------------------+-------------------|
| Fitting polynom | Function |
|------------------------------+-------------------|
| 1.00 log2(N) + 1.25e-015 | N |
| 2.00 log2(N) + 5.31e-018 | N*N |
| 1.19 log2(N) + 1.116 | N*log2(N) |
| 1.37 log2(N) + 2.232 | N*log2(N)*log2(N) |
To generate the figure I've used make-figures.py:
$ python make-figures.py --nsublists 1 --maxn=0x100000 -s vkazanov.msort -s vkazanov.msort_builtin
where:
# adapt sorting functions for make-figures.py
def msort(lists):
assert len(lists) == 1
return sort(lists[0]) # `sort()` from the question
def msort_builtin(lists):
assert len(lists) == 1
return sorted(lists[0]) # builtin
Input lists are described here (note: the input is sorted so builtin sorted() function shows expected O(N) performance).

Weighted random selection with and without replacement

Recently I needed to do weighted random selection of elements from a list, both with and without replacement. While there are well known and good algorithms for unweighted selection, and some for weighted selection without replacement (such as modifications of the resevoir algorithm), I couldn't find any good algorithms for weighted selection with replacement. I also wanted to avoid the resevoir method, as I was selecting a significant fraction of the list, which is small enough to hold in memory.
Does anyone have any suggestions on the best approach in this situation? I have my own solutions, but I'm hoping to find something more efficient, simpler, or both.

One of the fastest ways to make many with replacement samples from an unchanging list is the alias method. The core intuition is that we can create a set of equal-sized bins for the weighted list that can be indexed very efficiently through bit operations, to avoid a binary search. It will turn out that, done correctly, we will need to only store two items from the original list per bin, and thus can represent the split with a single percentage.
Let's us take the example of five equally weighted choices, (a:1, b:1, c:1, d:1, e:1)
To create the alias lookup:
Normalize the weights such that they sum to 1.0. (a:0.2 b:0.2 c:0.2 d:0.2 e:0.2) This is the probability of choosing each weight.
Find the smallest power of 2 greater than or equal to the number of variables, and create this number of partitions, |p|. Each partition represents a probability mass of 1/|p|. In this case, we create 8 partitions, each able to contain 0.125.
Take the variable with the least remaining weight, and place as much of it's mass as possible in an empty partition. In this example, we see that a fills the first partition. (p1{a|null,1.0},p2,p3,p4,p5,p6,p7,p8) with (a:0.075, b:0.2 c:0.2 d:0.2 e:0.2)
If the partition is not filled, take the variable with the most weight, and fill the partition with that variable.
Repeat steps 3 and 4, until none of the weight from the original partition need be assigned to the list.
For example, if we run another iteration of 3 and 4, we see
(p1{a|null,1.0},p2{a|b,0.6},p3,p4,p5,p6,p7,p8) with (a:0, b:0.15 c:0.2 d:0.2 e:0.2) left to be assigned
At runtime:
Get a U(0,1) random number, say binary 0.001100000
bitshift it lg2(p), finding the index partition. Thus, we shift it by 3, yielding 001.1, or position 1, and thus partition 2.
If the partition is split, use the decimal portion of the shifted random number to decide the split. In this case, the value is 0.5, and 0.5 < 0.6, so return a.
Here is some code and another explanation, but unfortunately it doesn't use the bitshifting technique, nor have I actually verified it.

A simple approach that hasn't been mentioned here is one proposed in Efraimidis and Spirakis. In python you could select m items from n >= m weighted items with strictly positive weights stored in weights, returning the selected indices, with:
import heapq
import math
import random
def WeightedSelectionWithoutReplacement(weights, m):
elt = [(math.log(random.random()) / weights[i], i) for i in range(len(weights))]
return [x[1] for x in heapq.nlargest(m, elt)]
This is very similar in structure to the first approach proposed by Nick Johnson. Unfortunately, that approach is biased in selecting the elements (see the comments on the method). Efraimidis and Spirakis proved that their approach is equivalent to random sampling without replacement in the linked paper.

Here's what I came up with for weighted selection without replacement:
def WeightedSelectionWithoutReplacement(l, n):
"""Selects without replacement n random elements from a list of (weight, item) tuples."""
l = sorted((random.random() * x[0], x[1]) for x in l)
return l[-n:]
This is O(m log m) on the number of items in the list to be selected from. I'm fairly certain this will weight items correctly, though I haven't verified it in any formal sense.
Here's what I came up with for weighted selection with replacement:
def WeightedSelectionWithReplacement(l, n):
"""Selects with replacement n random elements from a list of (weight, item) tuples."""
cuml = []
total_weight = 0.0
for weight, item in l:
total_weight += weight
cuml.append((total_weight, item))
return [cuml[bisect.bisect(cuml, random.random()*total_weight)] for x in range(n)]
This is O(m + n log m), where m is the number of items in the input list, and n is the number of items to be selected.

I'd recommend you start by looking at section 3.4.2 of Donald Knuth's Seminumerical Algorithms.
If your arrays are large, there are more efficient algorithms in chapter 3 of Principles of Random Variate Generation by John Dagpunar. If your arrays are not terribly large or you're not concerned with squeezing out as much efficiency as possible, the simpler algorithms in Knuth are probably fine.

It is possible to do Weighted Random Selection with replacement in O(1) time, after first creating an additional O(N)-sized data structure in O(N) time. The algorithm is based on the Alias Method developed by Walker and Vose, which is well described here.
The essential idea is that each bin in a histogram would be chosen with probability 1/N by a uniform RNG. So we will walk through it, and for any underpopulated bin which would would receive excess hits, assign the excess to an overpopulated bin. For each bin, we store the percentage of hits which belong to it, and the partner bin for the excess. This version tracks small and large bins in place, removing the need for an additional stack. It uses the index of the partner (stored in bucket[1]) as an indicator that they have already been processed.
Here is a minimal python implementation, based on the C implementation here
def prep(weights):
data_sz = len(weights)
factor = data_sz/float(sum(weights))
data = [[w*factor, i] for i,w in enumerate(weights)]
big=0
while big<data_sz and data[big][0]<=1.0: big+=1
for small,bucket in enumerate(data):
if bucket[1] is not small: continue
excess = 1.0 - bucket[0]
while excess > 0:
if big==data_sz: break
bucket[1] = big
bucket = data[big]
bucket[0] -= excess
excess = 1.0 - bucket[0]
if (excess >= 0):
big+=1
while big<data_sz and data[big][0]<=1: big+=1
return data
def sample(data):
r=random.random()*len(data)
idx = int(r)
return data[idx][1] if r-idx > data[idx][0] else idx
Example usage:
TRIALS=1000
weights = [20,1.5,9.8,10,15,10,15.5,10,8,.2];
samples = [0]*len(weights)
data = prep(weights)
for _ in range(int(sum(weights)*TRIALS)):
samples[sample(data)]+=1
result = [float(s)/TRIALS for s in samples]
err = [a-b for a,b in zip(result,weights)]
print(result)
print([round(e,5) for e in err])
print(sum([e*e for e in err]))

The following is a description of random weighted selection of an element of a
set (or multiset, if repeats are allowed), both with and without replacement in O(n) space
and O(log n) time.
It consists of implementing a binary search tree, sorted by the elements to be
selected, where each node of the tree contains:
the element itself (element)
the un-normalized weight of the element (elementweight), and
the sum of all the un-normalized weights of the left-child node and all of
its children (leftbranchweight).
the sum of all the un-normalized weights of the right-child node and all of
its chilren (rightbranchweight).
Then we randomly select an element from the BST by descending down the tree. A
rough description of the algorithm follows. The algorithm is given a node of
the tree. Then the values of leftbranchweight, rightbranchweight,
and elementweight of node is summed, and the weights are divided by this
sum, resulting in the values leftbranchprobability,
rightbranchprobability, and elementprobability, respectively. Then a
random number between 0 and 1 (randomnumber) is obtained.
if the number is less than elementprobability,
remove the element from the BST as normal, updating leftbranchweight
and rightbranchweight of all the necessary nodes, and return the
element.
else if the number is less than (elementprobability + leftbranchweight)
recurse on leftchild (run the algorithm using leftchild as node)
else
recurse on rightchild
When we finally find, using these weights, which element is to be returned, we either simply return it (with replacement) or we remove it and update relevant weights in the tree (without replacement).
DISCLAIMER: The algorithm is rough, and a treatise on the proper implementation
of a BST is not attempted here; rather, it is hoped that this answer will help
those who really need fast weighted selection without replacement (like I do).

This is an old question for which numpy now offers an easy solution so I thought I would mention it. Current version of numpy is version 1.2 and numpy.random.choice allows the sampling to be done with or without replacement and with given weights.

Suppose you want to sample 3 elements without replacement from the list ['white','blue','black','yellow','green'] with a prob. distribution [0.1, 0.2, 0.4, 0.1, 0.2]. Using numpy.random module it is as easy as this:
import numpy.random as rnd
sampling_size = 3
domain = ['white','blue','black','yellow','green']
probs = [.1, .2, .4, .1, .2]
sample = rnd.choice(domain, size=sampling_size, replace=False, p=probs)
# in short: rnd.choice(domain, sampling_size, False, probs)
print(sample)
# Possible output: ['white' 'black' 'blue']
Setting the replace flag to True, you have a sampling with replacement.
More info here:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.choice.html#numpy.random.choice

We faced a problem to randomly select K validators of N candidates once per epoch proportionally to their stakes. But this gives us the following problem:
Imagine probabilities of each candidate:
0.1
0.1
0.8
Probabilities of each candidate after 1'000'000 selections 2 of 3 without replacement became:
0.254315
0.256755
0.488930
You should know, those original probabilities are not achievable for 2 of 3 selection without replacement.
But we wish initial probabilities to be a profit distribution probabilities. Else it makes small candidate pools more profitable. So we realized that random selection with replacement would help us – to randomly select >K of N and store also weight of each validator for reward distribution:
std::vector<int> validators;
std::vector<int> weights(n);
int totalWeights = 0;
for (int j = 0; validators.size() < m; j++) {
int value = rand() % likehoodsSum;
for (int i = 0; i < n; i++) {
if (value < likehoods[i]) {
if (weights[i] == 0) {
validators.push_back(i);
}
weights[i]++;
totalWeights++;
break;
}
value -= likehoods[i];
}
}
It gives an almost original distribution of rewards on millions of samples:
0.101230
0.099113
0.799657

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.