I have two ordered lists of consecutive integers m=0, 1, ... M and n=0, 1, 2, ... N. Each value of m has a probability pm, and each value of n has a probability pn. I am trying to find the ordered list of unique values r=n/m and their probabilities pr. I am aware that r is infinite if n=0 and can even be undefined if m=n=0.
In practice, I would like to run for M and N each be of the order of 2E4, meaning up to 4E8 values of r - which would mean 3 GB of floats (assuming 8 Bytes/float).
For this calculation, I have written the python code below.
The idea is to iterate over m and n, and for each new m/n, insert it in the right place with its probability if it isn't there yet, otherwise add its probability to the existing number. My assumption is that it is easier to sort things on the way instead of waiting until the end.
The cases related to 0 are added at the end of the loop.
I am using the Fraction class since we are dealing with fractions.
The code also tracks the multiplicity of each unique value of m/n.
I have tested up to M=N=100, and things are quite slow. Are there better approaches to the question, or more efficient ways to tackle the code?
Timing:
M=N=30: 1 s
M=N=50: 6 s
M=N=80: 30 s
M=N=100: 82 s
import numpy as np
from fractions import Fraction
import time # For timiing
start_time = time.time() # Timing
M, N = 6, 4
mList, nList = np.arange(1, M+1), np.arange(1, N+1) # From 1 to M inclusive, deal with 0 later
mProbList, nProbList = [1/(M+1)]*(M), [1/(N+1)]*(N) # Probabilities, here assumed equal (not general case)
# Deal with mn=0 later
pmZero, pnZero = 1/(M+1), 1/(N+1) # P(m=0) and P(n=0)
pNaN = pmZero * pnZero # P(0/0) = P(m=0)P(n=0)
pZero = pmZero * (1 - pnZero) # P(0) = P(m=0)P(n!=0)
pInf = pnZero * (1 - pmZero) # P(inf) = P(m!=0)P(n=0)
# Main list of r=m/n, P(r) and mult(r)
# Start with first line, m=1
rList = [Fraction(mList[0], n) for n in nList[::-1]] # Smallest first
rProbList = [mProbList[0] * nP for nP in nProbList[::-1]] # Start with first line
rMultList = [1] * len(rList) # Multiplicity of each element
# Main loop
for m, mP in zip(mList[1:], mProbList[1:]):
for n, nP in zip(nList[::-1], nProbList[::-1]): # Pick an n value
r, rP, rMult = Fraction(m, n), mP*nP, 1
for i in range(len(rList)-1): # See where it fits in existing list
if r < rList[i]:
rList.insert(i, r)
rProbList.insert(i, rP)
rMultList.insert(i, 1)
break
elif r == rList[i]:
rProbList[i] += rP
rMultList[i] += 1
break
elif r < rList[i+1]:
rList.insert(i+1, r)
rProbList.insert(i+1, rP)
rMultList.insert(i+1, 1)
break
elif r == rList[i+1]:
rProbList[i+1] += rP
rMultList[i+1] += 1
break
if r > rList[-1]:
rList.append(r)
rProbList.append(rP)
rMultList.append(1)
break
# Deal with 0
rList.insert(0, Fraction(0, 1))
rProbList.insert(0, pZero)
rMultList.insert(0, N)
# Deal with infty
rList.append(np.Inf)
rProbList.append(pInf)
rMultList.append(M)
# Deal with undefined case
rList.append(np.NAN)
rProbList.append(pNaN)
rMultList.append(1)
print(".... done in %s seconds." % round(time.time() - start_time, 2))
print("************** Final list\nr", 'Prob', 'Mult')
for r, rP, rM in zip(rList, rProbList, rMultList): print(r, rP, rM)
print("************** Checks")
print("mList", mList, 'nList', nList)
print("Sum of proba = ", np.sum(rProbList))
print("Sum of multi = ", np.sum(rMultList), "\t(M+1)*(N+1) = ", (M+1)*(N+1))
Based on the suggestion of #Prune, and on this thread about merging lists of tuples, I have modified the code as below. It's a lot easier to read, and runs about an order of magnitude faster for N=M=80 (I have omitted dealing with 0 - would be done same way as in original post). I assume there may be ways to tweak the merge and conversion back to lists further yet.
# Do calculations
data = [(Fraction(m, n), mProb(m) * nProb(n)) for n in range(1, N+1) for m in range(1, M+1)]
data.sort()
# Merge duplicates using a dictionary
d = {}
for r, p in data:
if not (r in d): d[r] = [0, 0]
d[r][0] += p
d[r][1] += 1
# Convert back to lists
rList, rProbList, rMultList = [], [], []
for k in d:
rList.append(k)
rProbList.append(d[k][0])
rMultList.append(d[k][1])
I expect that "things are quite slow" because you've chosen a known inefficient sort. A single list insertion is O(K) (later list elements have to be bumped over, and there is added storage allocation on a regular basis). Thus a full-list insertion sort is O(K^2). For your notation, that is O((M*N)^2).
If you want any sort of reasonable performance, research and use the best-know methods. The most straightforward way to do this is to make your non-exception results as a simple list comprehension, and use the built-in sort for your penultimate list. Simply append your n=0 cases, and you're done in O(K log K) time.
I the expression below, I've assumed functions for m and n probabilities.
This is a notational convenience; you know how to directly compute them, and can substitute those expressions if you wish.
data = [ (mProb(m) * nProb(n), Fraction(m, n))
for n in range(1, N+1)
for m in range(0, M+1) ]
data.sort()
data.extend([ # generate your "zero" cases here ])
Related
Consider the following problem: Given a set of n intervals and a set of m floating-point numbers, determine, for each floating-point number, the subset of intervals that contain the floating-point number.
This problem has been addressed by constructing an interval tree (or called range tree or segment tree). Implementations have been done for the one-dimensional case, e.g. python's intervaltree package. Usually, these implementations consider one or few floating-point numbers, namely a small "m" above.
In my problem setting, both n and m are extremely large numbers (from solving an image processing problem). Further, I need to consider the N-dimensional intervals (called cuboid when N=3, because I was modeling human brains with the Finite Element Method). I have implemented a simple N-dimensional interval tree in python, but it run in a loop and can only take one floating-point number at a time. Can anyone help improve the implementation in terms of efficiency? You can change data structure freely.
import sys
import time
import numpy as np
# find the index of a satisfying x > a in one dimension
def find_index_smaller(a, x):
idx = np.argsort(a)
ss = np.searchsorted(a, x, sorter=idx)
res = idx[0:ss]
return res
# find the index of a satisfying x < a in one dimension
def find_index_larger(a, x):
return find_index_smaller(-a, -x)
# find the index of a satisfing amin < x < amax in one dimension
def find_intv_at(amin, amax, x):
idx = find_index_smaller(amin, x)
idx2 = find_index_larger(amax[idx], x)
res = idx[idx2]
return res
# find the index of a satisfying amin < x < amax in N dimensions
def find_intv_at_nd(amin, amax, x):
dim = amin.shape[0]
res = np.arange(amin.shape[-1])
for i in range(dim):
idx = find_intv_at(amin[i, res], amax[i, res], x[i])
res = res[idx]
return res
I also have two test examples for sanity check and performance testing:
def demo1():
print ("By default, we do a correctness test")
n_intv = 2
n_point = 2
# generate the test data
point = np.random.rand(3, n_point)
intv_min = np.random.rand(3, n_intv)
intv_max = intv_min + np.random.rand(3, n_intv)*8
print ("point ")
print (point)
print ("intv_min")
print (intv_min)
print ("intv_max")
print (intv_max)
print ("===Indexes of intervals that contain the point===")
for i in range(n_point):
print (find_intv_at_nd(intv_min,intv_max, point[:, i]))
def demo2():
print ("Performance:")
n_points=100
n_intv = 1000000
# generate the test data
points = np.random.rand(n_points, 3)*512
intv_min = np.random.rand(3, n_intv)*512
intv_max = intv_min + np.random.rand(3, n_intv)*8
print ("point.shape = "+str(points.shape))
print ("intv_min.shape = "+str(intv_min.shape))
print ("intv_max.shape = "+str(intv_max.shape))
starttime = time.time()
for point in points:
tmp = find_intv_at_nd(intv_min, intv_max, point)
print("it took this long to run {} points, with {} interva: {}".format(n_points, n_intv, time.time()-starttime))
My idea would be:
Remove np.argsort() from the algo, because the interval tree does not change, so sorting could have been done in pre-processing.
Vectorize x. The algo runs a loop for each x. It would be nice if we can get rid of the loop over x.
Any contribution would be appreciated.
Within a loop that collects some samples, I need to obtain some statistics about their sorted indices every now and then, for which argsort returns exactly what I need. However, each iteration adds only a single sample, and it is a huge waste of resources to keep passing the whole samples array to the argsort function, especially since the samples array is very huge. Is not there an incremental efficient technique equivalent to argsort?
I believe an efficient incremental argsort function can be implemented by maintaining an ordered list of samples, which can be searched for the proper insertion indices once a new sample arrives. Such indices can be then utilized to both maintain the order of the samples list as well as to generate the incremental argsort-like desired output.
So far, I have utilized the searchsorted2d function by #Divakar, with slight modifications to obtain the insertion indices, and built some routine that can get the desired output if it is called after each sample insertion (b = 1).
Yet, this is inefficient, and I would like to call the routine after the collection of kth samples (e.g. b = 10). In the case of bulk insertions, searchsorted2d seems to return incorrect indices, and that is were I stopped!
import time
import numpy as np
# By Divakar
# See https://stackoverflow.com/a/40588862
def searchsorted2d(a, b):
m, n = a.shape
max_num = np.maximum(a.max() - a.min(), b.max() - b.min()) + 1
r = max_num * np.arange(m)[:,np.newaxis]
p = np.searchsorted((a + r).ravel(), (b + r).ravel()).reshape(b.shape)
return p #- n * (np.arange(m)[:,np.newaxis])
# The following works with batch size b = 1,
# but that is not efficient ...
# Can we make it work for any b > 0 value?
class incremental(object):
def __init__(self, shape):
# Express each row offset
self.ranks_offset = np.tile(np.arange(shape[1]).reshape(1, -1),
(shape[0], 1))
# Storage for sorted samples
self.a_sorted = np.empty((shape[0], 0))
# Storage for sort indices
self.a_ranks = np.empty((shape[0], 0), np.int)
def argsort(self, a):
if self.a_sorted.shape[1] == 0: # Use np.argsort for initialization
self.a_ranks = a.argsort(axis=1)
self.a_sorted = np.take_along_axis(a, self.a_ranks, 1)
else: # In later itterations,
# searchsorted the input increment
indices = searchsorted2d(self.a_sorted, a)
# insert the stack pos to track the sorting indices
self.a_ranks = np.insert(self.a_ranks, indices.ravel(),
self.ranks_offset.ravel() +
self.a_ranks.shape[1]).reshape((n, -1))
# insert the increments to maintain a sorted input array
self.a_sorted = np.insert(self.a_sorted, indices.ravel(),
a.ravel()).reshape((n, -1))
return self.a_ranks
M = 1000 # number of iterations
n = 16 # vector size
b = 10 # vectors batch size
# Storage for samples
samples = np.zeros((n, M)) * np.nan
# The proposed approach
inc = incremental((n, b))
c = 0 # iterations counter
tick = time.time()
while c < M:
if c % b == 0: # Perform batch computations
#sample_ranks = samples[:,:c].argsort(axis=1)
sample_ranks = inc.argsort(samples[:,max(0,c-b):c]) # Incremental argsort
######################################################
# Utilize sample_ranks in some magic statistics here #
######################################################
samples[:,c] = np.random.rand(n) # collect a sample
c += 1 # increment the counter
tock = time.time()
last = ((c-1) // b) * b
sample_ranks_GT = samples[:,:last].argsort(axis=1) # Ground truth
print('Compatibility: {0:.1f}%'.format(
100 * np.count_nonzero(sample_ranks == sample_ranks_GT) / sample_ranks.size))
print('Elapsed time: {0:.1f}ms'.format(
(tock - tick) * 1000))
I would expect 100% compatibility with the argsort function, yet it need to be more efficient than calling argsort. As for execution time with an incremental approach, it seems that 15ms or so should be more than enough for the given example.
So far, only one condition of these two can be met with any of the explored techniques.
To make a long story short, the shown above algorithm seems to be a variant of an order-statistic tree to estimate the data ranks, but it fails to do so when samples are added in bulk (b > 1). So far, it only works when inserting samples one by one (b = 1). However, the arrays are copied every time insert is called, which causes a huge overhead and forms a bottleneck, therefore samples shall be added in bulks rather than individually.
Can you introduce more efficient incremental argsort algorithm, or at least figure out how to support bulk insertion (b > 1) in the above one?
If you choose to start from where I stopped, then the problem can be reduced to fixing the bug in following snapshot:
import numpy as np
# By Divakar
# See https://stackoverflow.com/a/40588862
def searchsorted2d(a, b):
m, n = a.shape
max_num = np.maximum(a.max() - a.min(), b.max() - b.min()) + 1
r = max_num * np.arange(m)[:,np.newaxis]
p = np.searchsorted((a + r).ravel(), (b + r).ravel()).reshape(b.shape)
# It seems the bug is around here...
#return p - b.shape[0] * np.arange(b.shape[1])[np.newaxis]
#return p - b.shape[1] * np.arange(b.shape[0])[:,np.newaxis]
return p
n = 16 # vector size
b = 2 # vectors batch size
a = np.random.rand(n, 1) # Samples array
a_ranks = a.argsort(axis=1) # Initial ranks
a_sorted = np.take_along_axis(a, a_ranks, 1) # Initial sorted array
new_data = np.random.rand(n, b) # New block to append into the samples array
a = np.hstack((a, new_data)) #Append new block
indices = searchsorted2d(a_sorted, new_data) # Compute insertion indices
ranks_offset = np.tile(np.arange(b).reshape(1, -1), (a_ranks.shape[0], 1)) + a_ranks.shape[1] # Ranks to insert
a_ranks = np.insert(a_ranks, indices.ravel(), ranks_offset.ravel()).reshape((n, -1)) # Insert ransk according to their indices
a_ransk_GT = a.argsort(axis=1) # Ranks ground truth
mask = (a_ranks == a_ransk_GT)
print(mask) #Why they are not all True?
assert(np.all(mask)), 'Oops!' #This should not fail, but it does :(
It seems the bulk insertion is more involved that what I initially thought, and that searchsorted2d is not to be blamed. Take the case of a sorted array a = [ 1, 2, 5 ], and two new elements block = [3, 4] to be inserted. If we iterate and insert, then np.searchsorted(a, block[i]) would return [2] and [3], and that is just OK. However, if we call np.searchsorted(a, block) (the desired behavior - equivalent to iteration without insertion), we would get [2, 2]. This is problematic for implementing an incremental argsort, since even np.searchsorted(a, block[::-1]) would result in the same. Any idea?
It turned out that the returned indices by searchsorted are not enough to ensure a sorted array when dealing with batch inputs. If the being inserted block contains two entries that are out of order, yet they will end up being placed adjacently in the target array, then they will receive the exact same insertion index, thus get inserted in their current order, causing the glitch. Accordingly, the input block itself needs to be sorted before insertion. See the last paragraph of the question for a numerical example.
By sorting the input block and adapting the remaining parts, a 100.0% compatible solution with argsort is obtained, and it is very efficient (elapsed time is 15.6ms for inserting 1000 entries in ten by ten blocks b = 10). This can be reproduced by replacing the buggy incremental class found in the question with the following one:
# by Hamdi Sahloul
class incremental(object):
def __init__(self, shape):
# Storage for sorted samples
self.a_sorted = np.empty((shape[0], 0))
# Storage for sort indices
self.a_ranks = np.empty((shape[0], 0), np.int)
def argsort(self, block):
# Compute the ranks of the input block
block_ranks = block.argsort(axis=1)
# Sort the block accordingly
block_sorted = np.take_along_axis(block, block_ranks, 1)
if self.a_sorted.shape[1] == 0: # Initalize using the block data
self.a_ranks = block_ranks
self.a_sorted = block_sorted
else: # In later itterations,
# searchsorted the input block
indices = searchsorted2d(self.a_sorted, block_sorted)
# update the global ranks
self.a_ranks = np.insert(self.a_ranks, indices.ravel(),
block_ranks.ravel() +
self.a_ranks.shape[1]).reshape((block.shape[0], -1))
# update the overall sorted array
self.a_sorted = np.insert(self.a_sorted, indices.ravel(),
block_sorted.ravel()).reshape((block.shape[0], -1))
return self.a_ranks
I'm trying to make a Z3 program (in Python) that generates boolean circuits that do certain tasks (e.g. adding two n-bit numbers) but the performance is terrible to the point where a brute-force search of the entire solution space would be faster. This is my first time using Z3 so I could be doing something that impacts my performance, but my code seems fine.
The following is copied from my code here:
from z3 import *
BITLEN = 1 # Number of bits in input
STEPS = 1 # How many steps to take (e.g. time)
WIDTH = 2 # How many operations/values can be stored in parallel, has to be at least BITLEN * #inputs
# Input variables
x = BitVec('x', BITLEN)
y = BitVec('y', BITLEN)
# Define operations used
op_list = [BitVecRef.__and__, BitVecRef.__or__, BitVecRef.__xor__, BitVecRef.__xor__]
unary_op_list = [BitVecRef.__invert__]
for uop in unary_op_list:
op_list.append(lambda x, y : uop(x))
# Chooses a function to use by setting all others to 0
def chooseFunc(i, x, y):
res = 0
for ind, op in enumerate(op_list):
res = res + (ind == i) * op(x, y)
return res
s = Solver()
steps = []
# First step is just the bits of the input padded with constants
firststep = Array("firststep", IntSort(), BitVecSort(1))
for i in range(BITLEN):
firststep = Store(firststep, i * 2, Extract(i, i, x))
firststep = Store(firststep, i * 2 + 1, Extract(i, i, y))
for i in range(BITLEN * 2, WIDTH):
firststep = Store(firststep, i, BitVec("const_0_%d" % i, 1))
steps.append(firststep)
# Generate remaining steps
for i in range(1, STEPS + 1):
this_step = Array("step_%d" % i, IntSort(), BitVecSort(1))
last_step = steps[-1]
for j in range(WIDTH):
func_ind = Int("func_%d_%d" % (i,j))
s.add(func_ind >= 0, func_ind < len(op_list))
x_ind = Int("x_%d_%d" % (i,j))
s.add(x_ind >= 0, x_ind < WIDTH)
y_ind = Int("y_%d_%d" % (i,j))
s.add(y_ind >= 0, y_ind < WIDTH)
node = chooseFunc(func_ind, Select(last_step, x_ind), Select(last_step, y_ind))
this_step = Store(this_step, j, node)
steps.append(this_step)
# Set the result to the first BITLEN bits of the last step
if BITLEN == 1:
result = Select(steps[-1], 0)
else:
result = Concat(*[Select(steps[-1], i) for i in range(BITLEN)])
# Set goal
goal = x | y
s.add(ForAll([x, y], goal == result))
print(s)
print(s.check())
print(s.model())
The code basically lays out the inputs as individual bits, then at each "step" one of 5 boolean functions can operate on the values from the previous step, where the final step represents the end result.
In this example, I generate a circuit to calculate the boolean OR of two 1-bit inputs, and an OR function is available in the circuit, so the solution is trivial.
I have a solution space of only 5*5*2*2*2*2=400:
5 Possible functions (two function nodes)
2 Inputs for each function, each of which has two possible values
This code takes a few seconds to run and provides a correct answer, but I feel like it should run instantaneously as there are only 400 possible solutions, of which quite a few are valid. If I increase the inputs to be two bits long, the solution space has a size of (5^4)*(4^8)=40,960,000 and never finishes on my computer, though I feel this should be easily doable with Z3.
I also tried effectively the same code but substituted Arrays/Store/Select for Python lists and "selected" the variables by using the same trick I used in chooseFunc(). The code is here and it runs in around the same time the original code does, so no speedup.
Am I doing something that would drastically slow down the solver? Thanks!
You have a duplicated __xor__ in your op_list; but that's not really the major problem. The slowdown is inevitable as you increase bit-size, but on a first look you can (and should) avoid mixing integer reasoning with booleans here. I'd code your chooseFunc as follows:
def chooseFunc(i, x, y):
res = False;
for ind, op in enumerate(op_list):
res = If(ind == i, op (x, y), res)
return res
See if that improves run-times in any meaningful way. If not, the next thing to do would be to get rid of arrays as much as possible.
I had an interview with a hedge fund company in New York a few months ago and unfortunately, I did not get the internship offer as a data/software engineer. (They also asked the solution to be in Python.)
I pretty much screwed up on the first interview problem...
Question: Given a string of a million numbers (Pi for example), write
a function/program that returns all repeating 3 digit numbers and number of
repetition greater than 1
For example: if the string was: 123412345123456 then the function/program would return:
123 - 3 times
234 - 3 times
345 - 2 times
They did not give me the solution after I failed the interview, but they did tell me that the time complexity for the solution was constant of 1000 since all the possible outcomes are between:
000 --> 999
Now that I'm thinking about it, I don't think it's possible to come up with a constant time algorithm. Is it?
You got off lightly, you probably don't want to be working for a hedge fund where the quants don't understand basic algorithms :-)
There is no way to process an arbitrarily-sized data structure in O(1) if, as in this case, you need to visit every element at least once. The best you can hope for is O(n) in this case, where n is the length of the string.
Although, as an aside, a nominal O(n) algorithm will be O(1) for a fixed input size so, technically, they may have been correct here. However, that's not usually how people use complexity analysis.
It appears to me you could have impressed them in a number of ways.
First, by informing them that it's not possible to do it in O(1), unless you use the "suspect" reasoning given above.
Second, by showing your elite skills by providing Pythonic code such as:
inpStr = '123412345123456'
# O(1) array creation.
freq = [0] * 1000
# O(n) string processing.
for val in [int(inpStr[pos:pos+3]) for pos in range(len(inpStr) - 2)]:
freq[val] += 1
# O(1) output of relevant array values.
print ([(num, freq[num]) for num in range(1000) if freq[num] > 1])
This outputs:
[(123, 3), (234, 3), (345, 2)]
though you could, of course, modify the output format to anything you desire.
And, finally, by telling them there's almost certainly no problem with an O(n) solution, since the code above delivers results for a one-million-digit string in well under half a second. It seems to scale quite linearly as well, since a 10,000,000-character string takes 3.5 seconds and a 100,000,000-character one takes 36 seconds.
And, if they need better than that, there are ways to parallelise this sort of stuff that can greatly speed it up.
Not within a single Python interpreter of course, due to the GIL, but you could split the string into something like (overlap indicated by vv is required to allow proper processing of the boundary areas):
vv
123412 vv
123451
5123456
You can farm these out to separate workers and combine the results afterwards.
The splitting of input and combining of output are likely to swamp any saving with small strings (and possibly even million-digit strings) but, for much larger data sets, it may well make a difference. My usual mantra of "measure, don't guess" applies here, of course.
This mantra also applies to other possibilities, such as bypassing Python altogether and using a different language which may be faster.
For example, the following C code, running on the same hardware as the earlier Python code, handles a hundred million digits in 0.6 seconds, roughly the same amount of time as the Python code processed one million. In other words, much faster:
#include <stdio.h>
#include <string.h>
int main(void) {
static char inpStr[100000000+1];
static int freq[1000];
// Set up test data.
memset(inpStr, '1', sizeof(inpStr));
inpStr[sizeof(inpStr)-1] = '\0';
// Need at least three digits to do anything useful.
if (strlen(inpStr) <= 2) return 0;
// Get initial feed from first two digits, process others.
int val = (inpStr[0] - '0') * 10 + inpStr[1] - '0';
char *inpPtr = &(inpStr[2]);
while (*inpPtr != '\0') {
// Remove hundreds, add next digit as units, adjust table.
val = (val % 100) * 10 + *inpPtr++ - '0';
freq[val]++;
}
// Output (relevant part of) table.
for (int i = 0; i < 1000; ++i)
if (freq[i] > 1)
printf("%3d -> %d\n", i, freq[i]);
return 0;
}
Constant time isn't possible. All 1 million digits need to be looked at at least once, so that is a time complexity of O(n), where n = 1 million in this case.
For a simple O(n) solution, create an array of size 1000 that represents the number of occurrences of each possible 3 digit number. Advance 1 digit at a time, first index == 0, last index == 999997, and increment array[3 digit number] to create a histogram (count of occurrences for each possible 3 digit number). Then output the content of the array with counts > 1.
A million is small for the answer I give below. Expecting only that you have to be able to run the solution in the interview, without a pause, then The following works in less than two seconds and gives the required result:
from collections import Counter
def triple_counter(s):
c = Counter(s[n-3: n] for n in range(3, len(s)))
for tri, n in c.most_common():
if n > 1:
print('%s - %i times.' % (tri, n))
else:
break
if __name__ == '__main__':
import random
s = ''.join(random.choice('0123456789') for _ in range(1_000_000))
triple_counter(s)
Hopefully the interviewer would be looking for use of the standard libraries collections.Counter class.
Parallel execution version
I wrote a blog post on this with more explanation.
The simple O(n) solution would be to count each 3-digit number:
for nr in range(1000):
cnt = text.count('%03d' % nr)
if cnt > 1:
print '%03d is found %d times' % (nr, cnt)
This would search through all 1 million digits 1000 times.
Traversing the digits only once:
counts = [0] * 1000
for idx in range(len(text)-2):
counts[int(text[idx:idx+3])] += 1
for nr, cnt in enumerate(counts):
if cnt > 1:
print '%03d is found %d times' % (nr, cnt)
Timing shows that iterating only once over the index is twice as fast as using count.
Here is a NumPy implementation of the "consensus" O(n) algorithm: walk through all triplets and bin as you go. The binning is done by upon encountering say "385", adding one to bin[3, 8, 5] which is an O(1) operation. Bins are arranged in a 10x10x10 cube. As the binning is fully vectorized there is no loop in the code.
def setup_data(n):
import random
digits = "0123456789"
return dict(text = ''.join(random.choice(digits) for i in range(n)))
def f_np(text):
# Get the data into NumPy
import numpy as np
a = np.frombuffer(bytes(text, 'utf8'), dtype=np.uint8) - ord('0')
# Rolling triplets
a3 = np.lib.stride_tricks.as_strided(a, (3, a.size-2), 2*a.strides)
bins = np.zeros((10, 10, 10), dtype=int)
# Next line performs O(n) binning
np.add.at(bins, tuple(a3), 1)
# Filtering is left as an exercise
return bins.ravel()
def f_py(text):
counts = [0] * 1000
for idx in range(len(text)-2):
counts[int(text[idx:idx+3])] += 1
return counts
import numpy as np
import types
from timeit import timeit
for n in (10, 1000, 1000000):
data = setup_data(n)
ref = f_np(**data)
print(f'n = {n}')
for name, func in list(globals().items()):
if not name.startswith('f_') or not isinstance(func, types.FunctionType):
continue
try:
assert np.all(ref == func(**data))
print("{:16s}{:16.8f} ms".format(name[2:], timeit(
'f(**data)', globals={'f':func, 'data':data}, number=10)*100))
except:
print("{:16s} apparently crashed".format(name[2:]))
Unsurprisingly, NumPy is a bit faster than #Daniel's pure Python solution on large data sets. Sample output:
# n = 10
# np 0.03481400 ms
# py 0.00669330 ms
# n = 1000
# np 0.11215360 ms
# py 0.34836530 ms
# n = 1000000
# np 82.46765980 ms
# py 360.51235450 ms
I would solve the problem as follows:
def find_numbers(str_num):
final_dict = {}
buffer = {}
for idx in range(len(str_num) - 3):
num = int(str_num[idx:idx + 3])
if num not in buffer:
buffer[num] = 0
buffer[num] += 1
if buffer[num] > 1:
final_dict[num] = buffer[num]
return final_dict
Applied to your example string, this yields:
>>> find_numbers("123412345123456")
{345: 2, 234: 3, 123: 3}
This solution runs in O(n) for n being the length of the provided string, and is, I guess, the best you can get.
As per my understanding, you cannot have the solution in a constant time. It will take at least one pass over the million digit number (assuming its a string). You can have a 3-digit rolling iteration over the digits of the million length number and increase the value of hash key by 1 if it already exists or create a new hash key (initialized by value 1) if it doesn't exists already in the dictionary.
The code will look something like this:
def calc_repeating_digits(number):
hash = {}
for i in range(len(str(number))-2):
current_three_digits = number[i:i+3]
if current_three_digits in hash.keys():
hash[current_three_digits] += 1
else:
hash[current_three_digits] = 1
return hash
You can filter down to the keys which have item value greater than 1.
As mentioned in another answer, you cannot do this algorithm in constant time, because you must look at at least n digits. Linear time is the fastest you can get.
However, the algorithm can be done in O(1) space. You only need to store the counts of each 3 digit number, so you need an array of 1000 entries. You can then stream the number in.
My guess is that either the interviewer misspoke when they gave you the solution, or you misheard "constant time" when they said "constant space."
Here's my answer:
from timeit import timeit
from collections import Counter
import types
import random
def setup_data(n):
digits = "0123456789"
return dict(text = ''.join(random.choice(digits) for i in range(n)))
def f_counter(text):
c = Counter()
for i in range(len(text)-2):
ss = text[i:i+3]
c.update([ss])
return (i for i in c.items() if i[1] > 1)
def f_dict(text):
d = {}
for i in range(len(text)-2):
ss = text[i:i+3]
if ss not in d:
d[ss] = 0
d[ss] += 1
return ((i, d[i]) for i in d if d[i] > 1)
def f_array(text):
a = [[[0 for _ in range(10)] for _ in range(10)] for _ in range(10)]
for n in range(len(text)-2):
i, j, k = (int(ss) for ss in text[n:n+3])
a[i][j][k] += 1
for i, b in enumerate(a):
for j, c in enumerate(b):
for k, d in enumerate(c):
if d > 1: yield (f'{i}{j}{k}', d)
for n in (1E1, 1E3, 1E6):
n = int(n)
data = setup_data(n)
print(f'n = {n}')
results = {}
for name, func in list(globals().items()):
if not name.startswith('f_') or not isinstance(func, types.FunctionType):
continue
print("{:16s}{:16.8f} ms".format(name[2:], timeit(
'results[name] = f(**data)', globals={'f':func, 'data':data, 'results':results, 'name':name}, number=10)*100))
for r in results:
print('{:10}: {}'.format(r, sorted(list(results[r]))[:5]))
The array lookup method is very fast (even faster than #paul-panzer's numpy method!). Of course, it cheats since it isn't technicailly finished after it completes, because it's returning a generator. It also doesn't have to check every iteration if the value already exists, which is likely to help a lot.
n = 10
counter 0.10595780 ms
dict 0.01070654 ms
array 0.00135370 ms
f_counter : []
f_dict : []
f_array : []
n = 1000
counter 2.89462101 ms
dict 0.40434612 ms
array 0.00073838 ms
f_counter : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_dict : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_array : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
n = 1000000
counter 2849.00500992 ms
dict 438.44007806 ms
array 0.00135370 ms
f_counter : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_dict : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_array : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
Image as answer:
Looks like a sliding window.
Here is my solution:
from collections import defaultdict
string = "103264685134845354863"
d = defaultdict(int)
for elt in range(len(string)-2):
d[string[elt:elt+3]] += 1
d = {key: d[key] for key in d.keys() if d[key] > 1}
With a bit of creativity in for loop(and additional lookup list with True/False/None for example) you should be able to get rid of last line, as you only want to create keys in dict that we visited once up to that point.
Hope it helps :)
-Telling from the perspective of C.
-You can have an int 3-d array results[10][10][10];
-Go from 0th location to n-4th location, where n being the size of the string array.
-On each location, check the current, next and next's next.
-Increment the cntr as resutls[current][next][next's next]++;
-Print the values of
results[1][2][3]
results[2][3][4]
results[3][4][5]
results[4][5][6]
results[5][6][7]
results[6][7][8]
results[7][8][9]
-It is O(n) time, there is no comparisons involved.
-You can run some parallel stuff here by partitioning the array and calculating the matches around the partitions.
inputStr = '123456123138276237284287434628736482376487234682734682736487263482736487236482634'
count = {}
for i in range(len(inputStr) - 2):
subNum = int(inputStr[i:i+3])
if subNum not in count:
count[subNum] = 1
else:
count[subNum] += 1
print count
We have some large binary number N (large means millions of digits). We also have binary mask M where 1 means that we must remove digit in this position in number N and move all higher bits one position right.
Example:
N = 100011101110
M = 000010001000
Res 1000110110
Is it possible to solve this problem without cycle with some set of logical or arithmetical operations? We can assume that we have access to bignum arithmetic in Python.
Feels like it should be something like this:
Res = N - (N xor M)
But it doesn't work
UPD: My current solution with cycle is following:
def prepare_reduced_arrays(dict_of_N, mask):
'''
mask: string '0000011000'
each element of dict_of_N - big python integer
'''
capacity = len(mask)
answer = dict()
for el in dict_of_N:
answer[el] = 0
new_capacity = 0
for i in range(capacity - 1, -1, -1):
if mask[i] == '1':
continue
cap2 = (1 << new_capacity)
pos = (capacity - i - 1)
for el in dict_of_N:
current_bit = (dict_of_N[el] >> pos) & 1
if current_bit:
answer[el] |= cap2
new_capacity += 1
return answer, new_capacity
While this may not be possible without a loop in python, it can be made extremely fast with numba and just in time compilation. I went on the assumption that your inputs could be easily represented as boolean arrays, which would be very simple to construct from a binary file using struct. The method I have implemented involves iterating a few different objects, however these iterations were chosen carefully to make sure they were compiler optimized, and never doing the same work twice. The first iteration is using np.where to locate the indices of all the bits to delete. This specific function (among many others) is optimized by the numba compiler. I then use this list of bit indices to build the slice indices for slices of bits to keep. The final loop copies these slices to an empty output array.
import numpy as np
from numba import jit
from time import time
def binary_mask(num, mask):
num_nbits = num.shape[0] #how many bits are in our big num
mask_bits = np.where(mask)[0] #which bits are we deleting
mask_n_bits = mask_bits.shape[0] #how many bits are we deleting
start = np.empty(mask_n_bits + 1, dtype=int) #preallocate array for slice start indexes
start[0] = 0 #first slice starts at 0
start[1:] = mask_bits + 1 #subsequent slices start 1 after each True bit in mask
end = np.empty(mask_n_bits + 1, dtype=int) #preallocate array for slice end indexes
end[:mask_n_bits] = mask_bits #each slice ends on (but does not include) True bits in the mask
end[mask_n_bits] = num_nbits + 1 #last slice goes all the way to the end
out = np.empty(num_nbits - mask_n_bits, dtype=np.uint8) #preallocate return array
for i in range(mask_n_bits + 1): #for each slice
a = start[i] #use local variables to reduce number of lookups
b = end[i]
c = a - i
d = b - i
out[c:d] = num[a:b] #copy slices
return out
jit_binary_mask = jit("b1[:](b1[:], b1[:])")(binary_mask) #decorator without syntax sugar
###################### Benchmark ########################
bignum = np.random.randint(0,2,1000000, dtype=bool) # 1 million random bits
bigmask = np.random.randint(0,10,1000000, dtype=np.uint8)==9 #delete about 1 in 10 bits
t = time()
for _ in range(10): #10 cycles of just numpy implementation
out = binary_mask(bignum, bigmask)
print(f"non-jit: {time()-t} seconds")
t = time()
out = jit_binary_mask(bignum, bigmask) #once ahead of time to compile
compile_and_run = time() - t
t = time()
for _ in range(10): #10 cycles of compiled numpy implementation
out = jit_binary_mask(bignum, bigmask)
jit_runtime = time()-t
print(f"jit: {jit_runtime} seconds")
print(f"estimated compile_time: {compile_and_run - jit_runtime/10}")
In this example, I execute the benchmark on a boolean array of length 1,000,000 a total of 10 times for both the compiled and un-compiled version. On my laptop, the output is:
non-jit: 1.865583896636963 seconds
jit: 0.06370806694030762 seconds
estimated compile_time: 0.1652850866317749
As you can see with a simple algorithm like this, very significant performance gains can be seen from compilation. (in my case about 20-30x speedup)
As far as I know, this can be done without the use of loops if and only if M is a power of 2.
Let's take your example, and modify M so that it is a power of 2:
N = 0b100011101110 = 2286
M = 0b000000001000 = 8
Removing the fourth lowest bit from N and shifting the higher bits to the right would result in:
N = 0b10001110110 = 1142
We achieved this using the following algorithm:
Begin with N = 0b100011101110 = 2286
Iterate from the most-significant bit to the least-significant bit in M.
If the current bit in M is set to 1, then store the lower bits in some variable, x:
x = 0b1101110
Then, subtract every bit up to and including the current bit in M from N, so that we end up with the following:
N - (0b10000000 + x) = N - (0b10000000 + 0b1101110) = 0b100011101110 - 0b11101110 = 0b100000000000
This step can also be achieved by and-ing the bits with 0, which may be more efficient.
Next, we shift the result once to the right:
0b100000000000 >> 1 = 0b10000000000
Finally, we add back x to the shifted result:
0b10000000000 + x = 0b10000000000 + 0b1101110 = 0b10001101110 = 1142
There may be a possibility that this can somehow be done without loops, but it would actually be efficient if you were to simply iterate over M (from the most-significant bit to the least-significant bit) and performed this process on every set bit, as the time complexity would be O(M.bit_length()).
I wrote up the code for this algorithm as well, and I believe it's relatively efficient, but I don't have any big binary numbers to test it with:
def remove_bits(N, M):
bit = 2 ** (M.bit_length() - 1)
while bit != 0:
if M & bit:
ones = bit - 1
# Store lower `bit` bits.
temp = N & ones
# Clear lower `bit` bits.
N &= ~ones
# Shift once to the right.
N >>= 1
# Set stored lower `bit` bits.
N |= temp
bit >>= 1
return N
if __name__ == '__main__':
N = 0b100011101110
M = 0b000010001000
print(bin(remove_bits(N, M)))
Using your example, this returns your result: 0b1000110110
I don't think there's any way to do this in a constant number of calls to the built-in bitwise operators. Python would have to provide something like PEXT for that to be possible.
For literally millions of digits, you may actually get best performance by working in terms of sequences of bits, sacrificing the space advantages of Python ints and the time advantages of bitwise operations in favor of more flexibility in the operations you can perform. I don't know where the break-even point would be:
import itertools
bits = bin(N)[2:]
maskbits = bin(M)[2:].zfill(len(bits))
bits = bits.zfill(len(maskbits))
chosenbits = itertools.compress(bits, map('0'.__eq__, maskbits))
result = int(''.join(chosenbits), 2)