Task
Given a NumPy or PyTorch matrix, find the indices of the cells whose values are larger than a given threshold.
My implementation
# abs_cosine is the input matrix
# sim_vec is the desired output: a list of (row, column) index pairs
sim_vec = []
for m in range(abs_cosine.shape[0]):
    for n in range(abs_cosine.shape[1]):
        # exclude diagonal cells
        if m != n and abs_cosine[m][n] >= threshold:
            sim_vec.append((m, n))
Concerns
Speed. All other computations are built on PyTorch; using NumPy is already a compromise because it moves the computation from the GPU to the CPU. Pure Python for loops make the whole process even worse (already 5 times slower on a small data set). Can the whole computation be moved to NumPy (or PyTorch) without invoking any for loops?
An improvement I can think of (but got stuck...)
bool_cosine = abs_cosine > threshold
which returns a Boolean matrix of True and False values, but I cannot find a way to quickly retrieve the indices of the True cells.
The following is for PyTorch (fully on GPU)
# abs_cosine should be a Tensor of shape (m, m)
mask = torch.ones(abs_cosine.size(0), device=abs_cosine.device)  # build the mask on the same device as abs_cosine
mask = 1 - mask.diag()
sim_vec = torch.nonzero((abs_cosine >= threshold) * mask)
# sim_vec is a tensor of shape (?, 2) where the first column is the row index and second is the column index
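For reference, a tiny usage sketch of the masking approach above (the 5x5 size, random values and threshold are made up for illustration):

import torch

abs_cosine = torch.rand(5, 5)      # stand-in similarity matrix
threshold = 0.5
mask = 1 - torch.ones(abs_cosine.size(0), device=abs_cosine.device).diag()
sim_vec = torch.nonzero((abs_cosine >= threshold) * mask)
print(sim_vec)                     # each row is an (i, j) pair with i != j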
The following works in numpy
mask = 1 - np.diag(np.ones(abs_cosine.shape[0]))
sim_vec = np.nonzero((abs_cosine >= threshold) * mask)
# sim_vec is a tuple of two arrays, where the first array holds the row indices and the second the column indices
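If you need the same list of (m, n) pairs that the original loop produced, you can zip the two index arrays, or use np.argwhere, which returns the pairs directly:

rows, cols = sim_vec
pairs = list(zip(rows.tolist(), cols.tolist()))   # [(m, n), ...] as in the loop version
# or: np.argwhere((abs_cosine >= threshold) * mask) gives the same pairs as a (k, 2) array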
This is about twice as fast as np.where
import numpy as np
import numba as nb

@nb.njit(fastmath=True)
def get_threshold(abs_cosine, threshold):
    idx = 0
    sim_vec = np.empty((abs_cosine.shape[0] * abs_cosine.shape[1], 2), dtype=np.uint32)
    for m in range(abs_cosine.shape[0]):
        for n in range(abs_cosine.shape[1]):
            # exclude diagonal cells
            if m != n and abs_cosine[m, n] >= threshold:
                sim_vec[idx, 0] = m
                sim_vec[idx, 1] = n
                idx += 1
    return sim_vec[0:idx, :]
The first call takes about 0.2 s longer (compilation overhead). If the array is on the GPU, there may also be a way to do the whole computation on the GPU.
Nevertheless I am not really satisfied with the performance, since a simple Boolean operation is about 5 times faster than the solution shown above and 10 times faster than np.where. If the order of the indices doesn't matter, this problem can also be parallelized.
I am trying to find an efficient replacement for the following piece of code (which is only one part of my code) to increase the speed:
for pr in some_list:
    Tp = T[partition[pr]].sum(0)
    Tpx = np.dot(Tp, xhat)
    hp = h[partition[pr]].sum(0)
    up = uk[partition[pr]].sum(0) / len(partition[pr])
    hpu = hpu + np.dot(hp.T, up)
    Tpu = Tpu + np.dot(Tp.T, up)
I have at least two more similar blocks of code. As you can see, I used fancy indexing three times (I really couldn't find another way). In my algorithm, I need this part to be done very quickly, but it's not happening now. I would really appreciate any suggestions.
Thank you all.
If your partitions are few and each has many elements, you should consider swapping around the indices of your objects. Summing an array of shape (30,1000) along its second dimension should be faster than summing an array of shape (1000,30) along its first dimension, since in the former case you are always summing contiguous blocks of memory (i.e. arr[k,:] for each k). So if you put the summation index last (and get rid of that trailing singleton dimension while you're at it), you might get a speed-up.
As hpaulj noted in a comment, it's not clear how your loop could be vectorized. However, since it's performance-critical, you could still try vectorizing some of the work.
I suggest that you store hp, up and Tp for each partition (following pre-allocation), then perform the scalar/matrix products in a single vectorized step. Also note that Tpx is unused in your example, so I omitted it here (whatever you're doing with it, you can do it similarly to the other examples):
part_len = len(some_list) # number of partitions, N
Tpshape = (part_len,) + T.shape[1:] # (N,30,100) if T was (1000,30,100)
hpshape = (part_len,) + h.shape[1:] # (N,30,1) if h was (1000,30,1)
upshape = (part_len,) + uk.shape[1:] # (N,30,1) if uk was (1000,30,1)
Tp = np.zeros(Tpshape)
hp = np.zeros(hpshape)
up = np.zeros(upshape)
for ipr, pr in enumerate(some_list):
    Tp[ipr, :, :] = T[partition[pr]].sum(0)
    hp[ipr, :, :] = h[partition[pr]].sum(0)
    up[ipr, :, :] = uk[partition[pr]].sum(0) / len(partition[pr])
# compute vectorized dot products:
#Tpx unclear in original, omitted
# sum over second index (dot), sum over first index (sum in loop)
hpu = np.einsum('abc,abd->cd',hp,up) # shape (1,1)
Tpu = np.einsum('abc,abd->cd',Tp,up) # shape (100,1)
Clearly the key player is numpy.einsum. And of course if hpu and Tpu had some prior values before the loop, you have to increment those values with the results from einsum above.
As for einsum, it performs summations and contractions of arrays of arbitrary dimensions. The pattern appearing above, 'abc,abd->cd', when applied to 3d arrays A and B, will return a 2d array C, with the following definition (math pseudocode):
C(c,d) = sum_a sum_b A(a,b,c)*B(a,b,d)
For a fixed summation index a, what's inside is
sum_b A(a,b,c)*B(a,b,d)
which, if the c and d indices are kept, is equivalent to np.dot(A(a,:,:).T, B(a,:,:)). Since we're also summing these matrices with respect to a, we end up doing exactly what your loopy version does, adding up each np.dot() contribution to the total sums.
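To make the equivalence concrete, here is a small numerical check (the shapes are made up for illustration) comparing the einsum contraction with the loop of np.dot calls it replaces:

import numpy as np

A = np.random.rand(4, 3, 5)   # plays the role of Tp, shape (N, 30, 100)
B = np.random.rand(4, 3, 1)   # plays the role of up, shape (N, 30, 1)

einsum_result = np.einsum('abc,abd->cd', A, B)
loop_result = sum(np.dot(A[a].T, B[a]) for a in range(A.shape[0]))

print(np.allclose(einsum_result, loop_result))   # True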
I am continuously calculating correlation matrices where each time the order of the underlying data is randomized. When a correlation score with randomized data is greater than or equal to the original correlation determined with ordered data, I would like to update the corresponding cell in a scoring matrix with +1. (All cells begin as zeroes in the scoring matrix).
Due to the size of the matrices I am dealing with, shape = (3681, 12709), I would like to find an efficient way of doing this. So far, what I have is inefficient and takes too long. I wonder if there is a matrix-operation style approach to this rather than iterating, as I am currently doing below:
from itertools import product

for i, j in product(data_sorted.index, data_sorted.columns):
    # if the random correlation is as good as or better than the sorted correlation
    if data_random.loc[i, j] >= data_sorted.loc[i, j]:
        # update the scoring matrix
        scoring_matrix[sorted_index_list.index(i)][sorted_column_list.index(j)] += 1
I have crudely timed this approach and found that doing this for a single row of my matrix takes roughly 4.2 seconds, which seems excessive.
Any help would be much appreciated.
Assuming everything has the same indices, this should work as expected and be pretty quick.
scoring_matrix += (data_random >= data_sorted).astype(int)
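For illustration, a tiny made-up example of this vectorized update, assuming the two DataFrames share the same index and column order as the scoring matrix:

import numpy as np
import pandas as pd

data_sorted = pd.DataFrame([[0.2, 0.8], [0.5, 0.1]], index=['r1', 'r2'], columns=['c1', 'c2'])
data_random = pd.DataFrame([[0.3, 0.7], [0.5, 0.0]], index=['r1', 'r2'], columns=['c1', 'c2'])
scoring_matrix = np.zeros(data_sorted.shape, dtype=int)

scoring_matrix += (data_random.values >= data_sorted.values).astype(int)
print(scoring_matrix)   # [[1 0]
                        #  [1 0]]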
I'm trying to implement PageRank. I'm reading the description here: http://nlp.stanford.edu/IR-book/html/htmledition/markov-chains-1.html
Everything is very clear to me; however, I'm concerned about the construction of the matrix $P$. I find that constructing $P$ the naive way would be very expensive. For example: to implement step 1, one would need to check every row of $A$ and then check every element of that row to see if all elements are zero. For step 2 one would need to compute the number of ones in each row. I can imagine my code having nasty slow loops. I was wondering if there are smart linear algebra techniques that could efficiently construct $P$. I will be using Python/NumPy for my coding.
EDIT: one way I'm now thinking of solving this is by summing element-wise over the columns of $A$, which gives a column vector. I can then go through each element of this vector to check which elements are zero; this tells me which rows have no 1s, and I can replace those rows with $1/N$.
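For what it's worth, here is a rough dense NumPy sketch of that idea (the function name and the teleport probability alpha are illustrative; the steps refer to the linked IR-book construction, and a web-scale graph would of course need a sparse representation instead):

import numpy as np

def build_P(A, alpha=0.15):
    N = A.shape[0]
    row_sums = A.sum(axis=1)                      # number of 1s in each row of A
    # step 1: rows with no 1s become uniform 1/N; step 2: divide the other rows by their sums
    P = np.where(row_sums[:, None] == 0,
                 1.0 / N,
                 A / np.maximum(row_sums, 1)[:, None])
    # steps 3 and 4: scale by (1 - alpha) and add alpha/N to every entry
    return (1 - alpha) * P + alpha / N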
Your concern is correct. Since the number of web pages (vertices in the representing graph) is huge, it is infeasible to actually generate such an $A$ and work on it directly.
The matrix calculation of page rank can be much more efficiently calculated using sparse matrix implementations, since the matrix is very sparse. Most webpages are not actually connected to each other, so most entries in the matrix are 0.
The sparse matrix is built as follows:
Build the matrix $A$ as described: $A_{ij} = 1$ if $(i,j)$ is an edge, otherwise $A_{ij} = 0$.
Step 1 is usually not performed; instead, 'sinks' are removed iteratively. This is done to prevent the matrix from becoming dense. Alternatives are to link 'sinks' back to the nodes that linked to them, or to link each sink to itself.
Divide each 1 in $A$ by the number of 1s in its row, as described in step 2.
Let's denote the resulting matrix as $M$; this is the matrix we will work on in order to get a column vector $p$ (which is initialized with $1/n$ for each entry).
x = [1/n, 1/n, ..., 1/n]^T   // a column vector
p = [1/n, 1/n, ..., 1/n]^T   // a column vector with the initial ranks
M = genSparseMatrix()        // as described above
do until p converges:
    p = (1 - \alpha) * M * p + \alpha * x
return p
In the end, this yields p, the column vector that holds the page rank value for each node.
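A minimal Python sketch of that loop using scipy.sparse (genSparseMatrix itself is not shown; M is assumed to be a column-stochastic sparse matrix and alpha the teleport probability):

import numpy as np
from scipy import sparse

def pagerank(M, alpha=0.15, tol=1e-10):
    n = M.shape[0]
    x = np.full(n, 1.0 / n)          # teleportation vector
    p = np.full(n, 1.0 / n)          # initial ranks
    while True:                      # iterate until p converges
        p_next = (1 - alpha) * (M @ p) + alpha * x
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next

# tiny example: 3 pages with edges 0->1, 0->2, 1->2, 2->0 (columns sum to 1)
M = sparse.csr_matrix([[0.0, 0.0, 1.0],
                       [0.5, 0.0, 0.0],
                       [0.5, 1.0, 0.0]])
print(pagerank(M))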
Recently I needed to do weighted random selection of elements from a list, both with and without replacement. While there are well known and good algorithms for unweighted selection, and some for weighted selection without replacement (such as modifications of the reservoir algorithm), I couldn't find any good algorithms for weighted selection with replacement. I also wanted to avoid the reservoir method, as I was selecting a significant fraction of the list, which is small enough to hold in memory.
Does anyone have any suggestions on the best approach in this situation? I have my own solutions, but I'm hoping to find something more efficient, simpler, or both.
One of the fastest ways to draw many samples with replacement from an unchanging list is the alias method. The core intuition is that we can create a set of equal-sized bins for the weighted list that can be indexed very efficiently through bit operations, avoiding a binary search. It will turn out that, done correctly, we will only need to store two items from the original list per bin, and thus can represent the split with a single percentage.
Let us take the example of five equally weighted choices, (a:1, b:1, c:1, d:1, e:1).
To create the alias lookup:
Normalize the weights such that they sum to 1.0. (a:0.2 b:0.2 c:0.2 d:0.2 e:0.2) This is the probability of choosing each weight.
Find the smallest power of 2 greater than or equal to the number of variables, and create this number of partitions, |p|. Each partition represents a probability mass of 1/|p|. In this case, we create 8 partitions, each able to contain 0.125.
Take the variable with the least remaining weight, and place as much of its mass as possible in an empty partition. In this example, we see that a fills the first partition. (p1{a|null,1.0},p2,p3,p4,p5,p6,p7,p8) with (a:0.075, b:0.2 c:0.2 d:0.2 e:0.2)
If the partition is not filled, take the variable with the most weight, and fill the partition with that variable.
Repeat steps 3 and 4 until all of the weight from the original distribution has been assigned to partitions.
For example, if we run another iteration of 3 and 4, we see
(p1{a|null,1.0},p2{a|b,0.6},p3,p4,p5,p6,p7,p8) with (a:0, b:0.15 c:0.2 d:0.2 e:0.2) left to be assigned
At runtime:
Get a U(0,1) random number, say binary 0.001100000
bit-shift it by lg2(|p|), finding the index of the partition. Thus, we shift it by 3, yielding 001.1, or position 1, and thus partition 2.
If the partition is split, use the decimal portion of the shifted random number to decide the split. In this case, the value is 0.5, and 0.5 < 0.6, so return a.
Here is some code and another explanation, but unfortunately it doesn't use the bitshifting technique, nor have I actually verified it.
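Since the linked code skips the bit-shift trick, here is a minimal sketch of just the runtime lookup, assuming the table has already been built as described and is stored as a list of (primary, alias, split) triples whose length is a power of two (the names are illustrative, not from the linked code):

import random

def alias_sample(table):
    n = len(table)                    # a power of two, e.g. 8 in the example above
    u = random.random() * n           # scaling by n plays the role of the bit shift
    idx = int(u)                      # the integer part selects the partition
    primary, alias, split = table[idx]
    return primary if (u - idx) < split else alias   # the fractional part decides the split

For the worked example above, partition 2 would be stored as ('a', 'b', 0.6).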
A simple approach that hasn't been mentioned here is one proposed in Efraimidis and Spirakis. In Python you could select m items from n >= m weighted items with strictly positive weights stored in weights, returning the selected indices, with:
import heapq
import math
import random
def WeightedSelectionWithoutReplacement(weights, m):
    elt = [(math.log(random.random()) / weights[i], i) for i in range(len(weights))]
    return [x[1] for x in heapq.nlargest(m, elt)]
This is very similar in structure to the first approach proposed by Nick Johnson. Unfortunately, that approach is biased in selecting the elements (see the comments on the method). Efraimidis and Spirakis proved that their approach is equivalent to random sampling without replacement in the linked paper.
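A quick illustrative call (made-up weights); across repeated runs the heavier items should show up among the selected indices more often:

weights = [1.0, 2.0, 3.0, 4.0]
print(WeightedSelectionWithoutReplacement(weights, 2))   # e.g. [3, 1]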
Here's what I came up with for weighted selection without replacement:
import random

def WeightedSelectionWithoutReplacement(l, n):
    """Selects without replacement n random elements from a list of (weight, item) tuples."""
    l = sorted((random.random() * x[0], x[1]) for x in l)
    return l[-n:]
This is O(m log m) on the number of items in the list to be selected from. I'm fairly certain this will weight items correctly, though I haven't verified it in any formal sense.
Here's what I came up with for weighted selection with replacement:
import bisect
import random

def WeightedSelectionWithReplacement(l, n):
    """Selects with replacement n random elements from a list of (weight, item) tuples."""
    cuml = []       # cumulative weights
    items = []
    total_weight = 0.0
    for weight, item in l:
        total_weight += weight
        cuml.append(total_weight)
        items.append(item)
    # bisect on the cumulative weights, then return the corresponding item
    return [items[bisect.bisect(cuml, random.random() * total_weight)] for _ in range(n)]
This is O(m + n log m), where m is the number of items in the input list, and n is the number of items to be selected.
I'd recommend you start by looking at section 3.4.2 of Donald Knuth's Seminumerical Algorithms.
If your arrays are large, there are more efficient algorithms in chapter 3 of Principles of Random Variate Generation by John Dagpunar. If your arrays are not terribly large or you're not concerned with squeezing out as much efficiency as possible, the simpler algorithms in Knuth are probably fine.
It is possible to do Weighted Random Selection with replacement in O(1) time, after first creating an additional O(N)-sized data structure in O(N) time. The algorithm is based on the Alias Method developed by Walker and Vose, which is well described here.
The essential idea is that each bin in a histogram would be chosen with probability 1/N by a uniform RNG. So we walk through the bins, and for any underpopulated bin which would receive excess hits, assign the excess to an overpopulated bin. For each bin, we store the percentage of hits which belong to it, and the partner bin for the excess. This version tracks small and large bins in place, removing the need for an additional stack. It uses the index of the partner (stored in bucket[1]) as an indicator that the bucket has already been processed.
Here is a minimal Python implementation, based on the C implementation here:
import random

def prep(weights):
    data_sz = len(weights)
    factor = data_sz / float(sum(weights))
    data = [[w * factor, i] for i, w in enumerate(weights)]   # [scaled weight, partner index]
    big = 0
    while big < data_sz and data[big][0] <= 1.0: big += 1
    for small, bucket in enumerate(data):
        if bucket[1] != small: continue   # already assigned a partner earlier in the chain
        excess = 1.0 - bucket[0]
        while excess > 0:
            if big == data_sz: break
            bucket[1] = big               # record the partner for the underfull bucket
            bucket = data[big]
            bucket[0] -= excess
            excess = 1.0 - bucket[0]
            if excess >= 0:
                big += 1
                while big < data_sz and data[big][0] <= 1: big += 1
    return data

def sample(data):
    r = random.random() * len(data)
    idx = int(r)
    return data[idx][1] if r - idx > data[idx][0] else idx
Example usage:
TRIALS = 1000
weights = [20, 1.5, 9.8, 10, 15, 10, 15.5, 10, 8, .2]
samples = [0] * len(weights)
data = prep(weights)
for _ in range(int(sum(weights) * TRIALS)):
    samples[sample(data)] += 1
result = [float(s) / TRIALS for s in samples]
err = [a - b for a, b in zip(result, weights)]
print(result)
print([round(e, 5) for e in err])
print(sum([e * e for e in err]))
The following is a description of random weighted selection of an element of a
set (or multiset, if repeats are allowed), both with and without replacement in O(n) space
and O(log n) time.
It consists of implementing a binary search tree, sorted by the elements to be
selected, where each node of the tree contains:
the element itself (element)
the un-normalized weight of the element (elementweight), and
the sum of all the un-normalized weights of the left-child node and all of
its children (leftbranchweight).
the sum of all the un-normalized weights of the right-child node and all of
its children (rightbranchweight).
Then we randomly select an element from the BST by descending down the tree. A
rough description of the algorithm follows. The algorithm is given a node of
the tree. Then the values of leftbranchweight, rightbranchweight,
and elementweight of the node are summed, and the weights are divided by this
sum, resulting in the values leftbranchprobability,
rightbranchprobability, and elementprobability, respectively. Then a
random number between 0 and 1 (randomnumber) is obtained.
if the number is less than elementprobability,
remove the element from the BST as normal, updating leftbranchweight
and rightbranchweight of all the necessary nodes, and return the
element.
else if the number is less than (elementprobability + leftbranchprobability),
recurse on leftchild (run the algorithm using leftchild as node)
else
recurse on rightchild
When we finally find, using these weights, which element is to be returned, we either simply return it (with replacement) or we remove it and update relevant weights in the tree (without replacement).
DISCLAIMER: The algorithm is rough, and a treatise on the proper implementation
of a BST is not attempted here; rather, it is hoped that this answer will help
those who really need fast weighted selection without replacement (like I do).
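With that caveat, here is a rough Python sketch of the descent described above for the with-replacement case (the node fields follow the naming in the description; removal and weight updates for the without-replacement case are left out):

import random

def subtree_weight(node):
    if node is None:
        return 0.0
    return node.leftbranchweight + node.elementweight + node.rightbranchweight

class Node:
    def __init__(self, element, elementweight, left=None, right=None):
        self.element = element
        self.elementweight = elementweight
        self.left = left
        self.right = right
        self.leftbranchweight = subtree_weight(left)
        self.rightbranchweight = subtree_weight(right)

def select(node):
    # choose this node's element, or descend left/right, proportionally to the weights
    total = node.leftbranchweight + node.elementweight + node.rightbranchweight
    r = random.random() * total
    if r < node.elementweight:
        return node.element
    elif r < node.elementweight + node.leftbranchweight:
        return select(node.left)
    else:
        return select(node.right)

tree = Node('b', 2.0, Node('a', 1.0), Node('c', 3.0))   # weights 1, 2, 3
print(select(tree))   # 'a', 'b' or 'c' with probability 1/6, 2/6 and 3/6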
This is an old question for which NumPy now offers an easy solution, so I thought I would mention it. numpy.random.choice (available since NumPy 1.7.0) allows the sampling to be done with or without replacement and with given weights.
Suppose you want to sample 3 elements without replacement from the list ['white','blue','black','yellow','green'] with the probability distribution [0.1, 0.2, 0.4, 0.1, 0.2]. Using the numpy.random module, it is as easy as this:
import numpy.random as rnd
sampling_size = 3
domain = ['white','blue','black','yellow','green']
probs = [.1, .2, .4, .1, .2]
sample = rnd.choice(domain, size=sampling_size, replace=False, p=probs)
# in short: rnd.choice(domain, sampling_size, False, probs)
print(sample)
# Possible output: ['white' 'black' 'blue']
Setting the replace flag to True, you have a sampling with replacement.
More info here:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.choice.html#numpy.random.choice
We faced the problem of randomly selecting K validators out of N candidates once per epoch, proportionally to their stakes. But this gives us the following problem:
Imagine probabilities of each candidate:
0.1
0.1
0.8
The probabilities of each candidate after 1,000,000 selections of 2 out of 3 without replacement became:
0.254315
0.256755
0.488930
You should know that the original probabilities are not achievable for a 2-of-3 selection without replacement.
But we want the initial probabilities to be the profit-distribution probabilities; otherwise it makes small candidate pools more profitable. So we realized that random selection with replacement would help us: randomly select >K of N and also store the weight of each validator for reward distribution:
// likehoods[i] holds the stake-proportional weight of candidate i, likehoodsSum is their sum,
// n is the number of candidates and m the number of validators to select (defined elsewhere)
std::vector<int> validators;
std::vector<int> weights(n);
int totalWeights = 0;
for (int j = 0; validators.size() < m; j++) {
    int value = rand() % likehoodsSum;
    for (int i = 0; i < n; i++) {
        if (value < likehoods[i]) {
            if (weights[i] == 0) {
                validators.push_back(i);   // first time this candidate is drawn
            }
            weights[i]++;                  // count how often the candidate was drawn
            totalWeights++;
            break;
        }
        value -= likehoods[i];
    }
}
It reproduces the original distribution of rewards almost exactly over millions of samples:
0.101230
0.099113
0.799657