I have a dataframe with a lot of rows and numerical columns, such as:
 A  B  C  D
12  7  1  0
 7  1  2  0
 1  1  1  1
 2  2  0  0
I need to reduce the size of the dataframe by removing every row that is dominated by another row, i.e. some other row has a value greater than or equal to it in every column.
In the previous example I need to remove the last row, because the first row has all values greater than or equal to it (in case of duplicate rows I need to keep one of them).
And return this:
 A  B  C  D
12  7  1  0
 7  1  2  0
 1  1  1  1
My fastest solution so far is the following:
import numpy

def complete_reduction(df, columns):
    def _single_reduction(row):
        df["check"] = True
        for col in columns:
            df["check"] = df["check"] & (df[col] >= row[col])
        drop_index.append(df["check"].sum() == 1)

    df = df.drop_duplicates(subset=columns)
    drop_index = []
    df.apply(lambda x: _single_reduction(x), axis=1)
    df = df[numpy.array(drop_index).astype(bool)]
    return df
Any better ideas?
Update:
A new solution has been found here
https://stackoverflow.com/a/68528943/11327160
but I hope for something faster.
A more memory-efficient and faster solution than the ones proposed so far is to use Numba. There is no need to create huge temporary arrays with Numba. Moreover, it is easy to write a parallel implementation that makes use of all CPU cores. Here is the implementation:
import numba as nb
import numpy as np

@nb.njit
def is_dominated(arr, k):
    # True if some other row is >= row k in every column
    n, m = arr.shape
    for i in range(n):
        if i != k:
            dominated = True
            for j in range(m):
                if arr[i, j] < arr[k, j]:
                    dominated = False
            if dominated:
                return True
    return False

# Precompile the function to native code for the most common types
@nb.njit(['(i4[:,::1],)', '(i8[:,::1],)'], parallel=True, cache=True)
def dominated_rows(arr):
    n, m = arr.shape
    toRemove = np.empty(n, dtype=np.bool_)
    for i in nb.prange(n):
        toRemove[i] = is_dominated(arr, i)
    return toRemove

# Special case: exact duplicates dominate each other, so keep one of them first
df2 = df.drop_duplicates()

# Main computation
result = df2[~dominated_rows(np.ascontiguousarray(df2.values))]
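For reference, a minimal usage sketch on the question's sample data (my addition, not part of the original answer; it assumes pandas is imported as pd and the two functions above are already defined):

import pandas as pd

df = pd.DataFrame(
    [[12, 7, 1, 0], [7, 1, 2, 0], [1, 1, 1, 1], [2, 2, 0, 0]],
    columns=["A", "B", "C", "D"],
)
df2 = df.drop_duplicates()
kept = df2[~dominated_rows(np.ascontiguousarray(df2.values))]
print(kept)  # expected: only the last row (2, 2, 0, 0) is removed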
Benchmark
The input test is two random dataframes of shape 20000x5 and 5000x100 containing small integers (i.e. [0;100[). Tests have been done on a (6-core) i5-9600KF processor with 16 GiB of RAM on Windows. The version of @BingWang is the updated one of 2022-05-24. Here are performance results of the proposed approaches so far:
Dataframe with shape 5000x100
- Initial code: 114_340 ms
- BENY: 2_716 ms (consumes a few GiB of RAM)
- Bing Wang: 2_619 ms
- Numba: 303 ms <----
Dataframe with shape 20000x5
- Initial code: (too long)
- BENY: 8_775 ms (consumes a few GiB of RAM)
- Bing Wang: 578 ms
- Numba: 21 ms <----
This solution is respectively about 9 to 28 times faster than the fastest one so far (@BingWang's). It also has the benefit of consuming far less memory. Indeed, the @BENY implementation consumes a few GiB of RAM while this one (and @BingWang's) consumes no more than a few MiB for this use-case. The speed gain over the @BingWang implementation is due to the early stop, parallelism and native execution.
One can see that this Numba implementation and @BingWang's are quite efficient when the number of columns is small. This makes sense for @BingWang's since the complexity should be O(N(logN)^(d-2)) where d is the number of columns. As for Numba, it is significantly faster because most rows are dominated on the second random dataset, causing the early stop to be very effective in practice. I think the @BingWang algorithm might be faster when most rows are not dominated. However, this case should be very uncommon on dataframes with few columns and a lot of rows (at least, clearly on uniformly random ones).
We can do a NumPy broadcast:
s = df.values
out = df[np.sum(np.all(s>=s[:,None],-1),1)==1]
Out[44]:
A B C D
0 12 7 1 0
1 7 1 2 0
2 1 1 1 1
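One caveat worth adding (my note, not from the answer): the comparison s >= s[:, None] materializes an (N, N, M) boolean array, which is why the benchmark above reports a few GiB of RAM for this approach. A rough, hypothetical estimate for the 20000x5 case:

# Back-of-envelope memory for the broadcasted comparison (illustrative only)
N, M = 20000, 5                       # rows and columns, as in the benchmark
bytes_needed = N * N * M              # one byte per element of the bool array
print(bytes_needed / 1024**3, 'GiB')  # roughly 1.9 GiB before the reductions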
Here is a try based on Kung et al. 1975:
http://www.eecs.harvard.edu/~htk/publication/1975-jacm-kung-luccio-preparata.pdf
The brute-force solution is from https://stackoverflow.com/a/68528943/11327160
I didn't test it robustly, but with these parameters it appears to give the same answer.
There is no guarantee it is correct, or that I am even following the paper. Please test thoroughly. In addition, there is very likely a commercial solution for calculating this.
D=5 #dimension, or number of columns
N=2000 #number of data rows
M=1000 #upper bound for random integers
Changing to D=20 and N=20000, you can see that Kung75 completes in <1 minute while brute force takes more than 10x the time.
Even at Dimension=1000, Rows=20000, value range 0~999, it can still complete in slightly over 1 minute.
This can be revised similarly to merge sort (compute small chunks by brute force, then merge up with Filter), which is easier to switch to parallel computing.
Another way of speeding up is to turn off array bounds checking once you are comfortable with the code, since there is heavy array indexing here. I would recommend C# if you want to try this path.
import pandas as pd
import numpy as np
import datetime

# generate fake data
D = 1000  # dimension, or number of columns
N = 20000  # number of data rows
M = 1000  # upper bound for random integers
np.random.seed(12345)  # set seed so this is reproducible
data = np.random.randint(0, M, (N, D))
for i in range(0, 12):
    print(i, data[i])

# Compare w and v starting at dimension d
def Compare(w, v, d=0):
    cmp = 3  # 0x11, low bit is GE, high bit is LE, together means EQ
    while d < D:
        if w[d] > v[d]:
            cmp &= 1
        elif w[d] < v[d]:
            cmp &= 2
        if cmp > 0:
            d += 1
        else:
            break
    return cmp  # 0=uncomparable, 1=GT, 2=LT, 3=EQ

# unit test:
# print(Compare(data[0],data[1]))
# print(Compare(data[0],data[1],4))
# print(Compare(data[1],data[11]))
# print(Compare(data[11],data[1]))
# print(Compare(data[1],data[1]))

def AuxSort(d, ndxArray):  # stable sort desc by dimension d
    return [x[1] for x in sorted([(-data[n][d], n) for n in ndxArray])]

# unit test
# print(AuxSort(data,0,[0,4,3]))
# print(AuxSort(data,2,[0,1,2]))

# cumulatively find the pareto front. Time O(N^2), space O(N)
def N2BrutalForce(data, ndxArray=None, d=0):
    if len(data) == 0:
        return []
    if not ndxArray:  # by default check the entire data
        ndxArray = list(range(len(data)))
    # up to this point ndxArray is not empty
    result = {ndxArray[0]: data[ndxArray[0]]}
    for i in range(1, len(ndxArray)):
        dominated = []
        j = ndxArray[i]
        for k, v in result.items():
            c = Compare(data[j], v, d)
            if c > 1:
                break
            elif c == 1:
                dominated.append(k)
        else:
            for o in dominated:
                del result[o]
            result[j] = data[j]
    return [r for r in result]

def resultPrinter(res, ShowCountOnly=False):
    if not ShowCountOnly:
        for r in sorted(res):
            print(r, data[r])
    print(len(res), 'results found', datetime.datetime.today())

# unit test
# resultPrinter(N2BrutalForce(data),True)
# resultPrinter(N2BrutalForce(data,list(range(15))))

def FindT(R1, R2, S1, S2, d):
    S1R1 = set(Filter(data, d, R1, S1))
    T1 = [s for s in S1 if s in S1R1]
    S2R1 = Filter(data, d+1, R1, S2)
    S2R2 = set(Filter(data, d, R2, S2))
    T2 = [s for s in S2R1 if s in S2R2]
    return T1 + T2

def BreakAtPseudoMedian(sArray, d):
    sArray = AuxSort(d, sArray)  # this could be sped up by moving the sort to the caller to avoid redoing it
    if data[sArray[0]][d] == data[sArray[-1]][d]:
        return [], sArray
    L = len(sArray)
    mHigh = mLow = L//2
    while mLow > 0 and data[sArray[mLow]][d] == data[sArray[mLow-1]][d]:
        mLow -= 1
    if mLow > 0:
        return sArray[:mLow], sArray[mLow:]
    while mHigh < L-1 and data[sArray[mHigh]][d] == data[sArray[mHigh+1]][d]:
        mHigh += 1
    return sArray[:mHigh], sArray[mHigh:]

def Filter(data, d, rArray, sArray):
    L = len(rArray) + len(sArray)
    if d == D-1 and rArray:
        R = max(data[r][d] for r in rArray)
        return [s for s in sArray if data[s][d] > R]
    elif len(rArray)*len(sArray) <= 30 or len(rArray) <= 2 or len(sArray) <= 2:
        nonDominated = []
        for s in sArray:
            for r in rArray:
                c = Compare(data[s], data[r], d)
                if c > 1:
                    break
            else:
                nonDominated.append(s)
        return nonDominated
    S1, S2 = BreakAtPseudoMedian(sArray, d)
    R1, R2 = BreakAtRefValue(rArray, d, data[S2[0]][d])
    if not S1 and not R1:
        return Filter(data, d+1, rArray, sArray)
    return FindT(R1, R2, S1, S2, d)

# Filter(data,0,[0,1,2,3,4,5,6,7,8,9],[11])

def BreakAtRefValue(rArray, d, br):
    rArray = AuxSort(d, rArray)
    if data[rArray[0]][d] <= br:
        return [], rArray
    if data[rArray[-1]][d] > br:
        return rArray, []
    mLow, mHigh = 0, len(rArray)-1
    while mLow < mHigh-1 and data[rArray[mLow]][d] > br and data[rArray[mHigh]][d] < br:
        mid = (mLow+mHigh)//2
        if data[rArray[mid]][d] > br:
            mLow = mid
        elif data[rArray[mid]][d] < br:
            mHigh = mid
        else:
            mLow = mid
            break
    if data[rArray[mLow]][d] > br and data[rArray[mHigh]][d] < br:
        return rArray[:mHigh], rArray[mHigh:]
    if data[rArray[mLow]][d] == br:
        while data[rArray[mLow-1]][d] == br:
            mLow -= 1
        return rArray[:mLow], rArray[mLow:]
    while data[rArray[mHigh-1]][d] == br:
        mHigh -= 1
    return rArray[:mHigh], rArray[mHigh:]

def Kung75(data, d, ndxArray):
    L = len(ndxArray)
    if L < 10:
        return N2BrutalForce(data, ndxArray, d)
    elif d == D-1:
        x, y = -1, -1
        for n in ndxArray:
            if y < 0 or data[n][d] > x:
                x, y = data[n][d], n
        return [y]
    if data[ndxArray[0]][d] == data[ndxArray[-1]][d]:
        return Kung75(data, d+1, AuxSort(d+1, ndxArray))
    R, S = BreakAtPseudoMedian(ndxArray, d)
    R = Kung75(data, d, R)
    S = Kung75(data, d, S)
    T = Filter(data, d+1, R, S)
    return R + T

print('started at', datetime.datetime.today())
resultPrinter(Kung75(data, 0, AuxSort(0, list(range(len(data))))), True)
We take the cumulative maximum value per column in the dataframe.
We keep all rows that have at least one column value equal to that cumulative maximum. We then drop duplicates using pandas drop_duplicates:
In [14]: df = pd.DataFrame(
...: [[12, 7, 1, 0], [7, 1, 2, 0], [1, 1, 1, 1], [2, 2, 0, 0]],
...: columns=["A", "B", "C", "D"],
...: )
In [15]: df[(df == df.cummax(axis=0)).any(axis=1)].drop_duplicates()
Out[15]:
A B C D
0 12 7 1 0
1 7 1 2 0
2 1 1 1 1
df.sort_values(by=['A', 'B', 'C', 'D'], ascending=False, inplace=True)
df = df.iloc[:cutoff]
If this takes too long you could do it on subsets of the df until
it is small enough.
Given an nxn array A of real positive numbers, I'm trying to find the minimum of the maximum of the element-wise minimum of all combinations of three rows of the 2-d array. Using for-loops, that comes out to something like this:
import numpy as np
n = 100
np.random.seed(2)
A = np.random.rand(n,n)
global_best = np.inf
for i in range(n-2):
    for j in range(i+1, n-1):
        for k in range(j+1, n):
            # find the maximum of the element-wise minimum of the three vectors
            local_best = np.amax(np.array([A[i,:], A[j,:], A[k,:]]).min(0))
            # if local_best is lower than global_best, update global_best
            if local_best < global_best:
                global_best = local_best
                save_rows = [i, j, k]

print(global_best, save_rows)
In the case for n = 100, the output should be this:
Out[]: 0.492652949593 [6, 41, 58]
I have a feeling though that I could do this much faster using Numpy vectorization, and would certainly appreciate any help on doing this. Thanks.
This solution is 5x faster for n=100:
import itertools  # A, n and np are assumed to come from the question's setup

coms = np.fromiter(itertools.combinations(np.arange(n), 3), 'i,i,i').view(('i', 3))
best = A[coms].min(1).max(1)
at = best.argmin()
global_best = best[at]
save_rows = coms[at]
The first line is a bit convoluted but turns the result of itertools.combinations into a NumPy array which contains all possible [i,j,k] index combinations.
From there, it's a simple matter of indexing into A using all the possible index combinations, then reducing along the appropriate axes.
This solution consumes a lot more memory as it builds the concrete array of all possible combinations A[coms]. It saves time for smallish n, say under 250, but for large n the memory traffic will be very high and it may be slower than the original code.
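To put a rough number on that memory cost (my back-of-envelope sketch, not from the answer): A[coms] has shape (C(n,3), 3, n) of float64, so the temporary grows roughly as n^4.

# Hypothetical estimate of the temporary built by A[coms] for float64 data
from math import comb

n = 250
n_triples = comb(n, 3)                  # number of (i, j, k) combinations
bytes_needed = n_triples * 3 * n * 8    # shape (C(n,3), 3, n), 8 bytes per float
print(bytes_needed / 1024**3, 'GiB')    # about 14 GiB already at n = 250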
Working in chunks combines the speed of vectorized computation while avoiding memory errors. Below is an example of converting the nested loops to vectorization by chunks.
Starting from the same variables as the question, a chunk length is defined, in order to vectorize calculations inside each chunk and loop only over chunks instead of over combinations.
import itertools  # plus A, n and global_best from the question's setup

chunk = 2000  # define chunk length: if too small, the code won't take advantage
              # of vectorization; if too large, excessive memory usage will
              # slow down execution, or a MemoryError will be raised

combinations = itertools.combinations(range(n), 3)  # generate iterator containing
                                                    # all possible combinations of 3 rows
N = n*(n-1)*(n-2)//6  # number of combinations (length of combinations cannot be
                      # retrieved because it is an iterator)

# generate a list containing how many elements of combinations will be retrieved
# per iteration
n_chunks, remainder = divmod(N, chunk)
counts_list = [chunk for _ in range(n_chunks)]
if remainder:
    counts_list.append(remainder)

# Iterate one chunk at a time, using vectorized code to treat the chunk
for counts in counts_list:
    # retrieve combinations in current chunk
    current_comb = np.fromiter(combinations, dtype='i,i,i', count=counts)\
                     .view(('i', 3))
    # maximum of element-wise minimum in current chunk
    chunk_best = np.minimum(np.minimum(A[current_comb[:,0],:], A[current_comb[:,1],:]),
                            A[current_comb[:,2],:]).max(axis=1)
    ravel_save_row = chunk_best.argmin()  # minimum of maximums in current chunk
    # check if current chunk contains global minimum
    if chunk_best[ravel_save_row] < global_best:
        global_best = chunk_best[ravel_save_row]
        save_rows = current_comb[ravel_save_row]

print(global_best, save_rows)
I ran some performance comparisons with the nested loops, obtaining the following results (chunk_length = 1000):
n=100
Nested loops: 1.13 s ± 16.6 ms
Work by chunks: 108 ms ± 565 µs
n=150
Nested loops: 4.16 s ± 39.3 ms
Work by chunks: 523 ms ± 4.75 ms
n=500
Nested loops: 3min 18s ± 3.21 s
Work by chunks: 1min 12s ± 1.6 s
Note
After profiling the code, I found that np.min was what took longest, by calling np.maximum.reduce. I converted it directly to np.maximum, which improved performance a bit.
Don't try to vectorize loops that are not simple to vectorize. Instead, use a JIT compiler like Numba, or use Cython. Vectorized solutions are good if the resulting code is more readable, but in terms of performance a compiled solution is usually faster or, in the worst case, as fast as a vectorized solution (except for BLAS routines).
Single-threaded example
import numba as nb
import numpy as np

# Min and max library calls may be costly for only 3 values
@nb.njit()
def max_min_3(A, B, C):
    max_of_min = -np.inf
    for i in range(A.shape[0]):
        loc_min = A[i]
        if B[i] < loc_min:
            loc_min = B[i]
        if C[i] < loc_min:
            loc_min = C[i]
        if max_of_min < loc_min:
            max_of_min = loc_min
    return max_of_min

@nb.njit()
def your_func(A):
    n = A.shape[0]
    save_rows = np.zeros(3, dtype=np.uint64)
    global_best = np.inf
    for i in range(n):
        for j in range(i+1, n):
            for k in range(j+1, n):
                # find the maximum of the element-wise minimum of the three vectors
                local_best = max_min_3(A[i,:], A[j,:], A[k,:])
                # if local_best is lower than global_best, update global_best
                if local_best < global_best:
                    global_best = local_best
                    save_rows[0] = i
                    save_rows[1] = j
                    save_rows[2] = k
    return global_best, save_rows
Performance of single-threaded version
n=100
your_version: 1.56s
compiled_version: 0.0168s (92x speedup)
n=150
your_version: 5.41s
compiled_version: 0.08122s (66x speedup)
n=500
your_version: 283s
compiled_version: 8.86s (31x speedup)
The first call has a constant overhead of about 0.3-1s. For performance measurement of the calculation time itself, call it once and then measure performance.
With a few code changes this task can also be parallelized.
Multi-threaded example
@nb.njit(parallel=True)
def your_func(A):
    n = A.shape[0]
    all_global_best = np.inf
    rows = np.empty((3), dtype=np.uint64)
    save_rows = np.empty((n, 3), dtype=np.uint64)
    global_best_Temp = np.empty((n), dtype=A.dtype)
    global_best_Temp[:] = np.inf

    for i in range(n):
        for j in nb.prange(i+1, n):
            row_1 = 0
            row_2 = 0
            row_3 = 0
            global_best = np.inf
            for k in range(j+1, n):
                # find the maximum of the element-wise minimum of the three vectors
                local_best = max_min_3(A[i,:], A[j,:], A[k,:])
                # if local_best is lower than global_best, update global_best
                if local_best < global_best:
                    global_best = local_best
                    row_1 = i
                    row_2 = j
                    row_3 = k
            save_rows[j, 0] = row_1
            save_rows[j, 1] = row_2
            save_rows[j, 2] = row_3
            global_best_Temp[j] = global_best

        ind = np.argmin(global_best_Temp)
        if global_best_Temp[ind] < all_global_best:
            rows[0] = save_rows[ind, 0]
            rows[1] = save_rows[ind, 1]
            rows[2] = save_rows[ind, 2]
            all_global_best = global_best_Temp[ind]

    return all_global_best, rows
Performance of multi-threaded version
n=100
your_version: 1.56s
compiled_version: 0.0078s (200x speedup)
n=150
your_version: 5.41s
compiled_version: 0.0282s (191x speedup)
n=500
your_version: 283s
compiled_version: 2.95s (96x speedup)
Edit
In a newer Numba Version (installed through the Anaconda Python Distribution) I have to manually install tbb to get a working parallelization.
You can use combinations from itertools, which is part of the Python standard library, and it will help you remove all those nested loops.
from itertools import combinations
import numpy as np

n = 100
np.random.seed(2)
A = np.random.rand(n, n)
global_best = 1000000000000000.0
for i, j, k in combinations(range(n), 3):
    local_best = np.amax(np.array([A[i,:], A[j,:], A[k,:]]).min(0))
    if local_best < global_best:
        global_best = local_best
        save_rows = [i, j, k]

print(global_best, save_rows)
I had an interview with a hedge fund company in New York a few months ago and unfortunately, I did not get the internship offer as a data/software engineer. (They also asked the solution to be in Python.)
I pretty much screwed up on the first interview problem...
Question: Given a string of a million numbers (Pi for example), write a function/program that returns all repeating 3-digit numbers and the number of repetitions, if greater than 1.
For example: if the string was: 123412345123456 then the function/program would return:
123 - 3 times
234 - 3 times
345 - 2 times
They did not give me the solution after I failed the interview, but they did tell me that the time complexity for the solution was constant of 1000 since all the possible outcomes are between:
000 --> 999
Now that I'm thinking about it, I don't think it's possible to come up with a constant time algorithm. Is it?
You got off lightly, you probably don't want to be working for a hedge fund where the quants don't understand basic algorithms :-)
There is no way to process an arbitrarily-sized data structure in O(1) if, as in this case, you need to visit every element at least once. The best you can hope for is O(n) in this case, where n is the length of the string.
Although, as an aside, a nominal O(n) algorithm will be O(1) for a fixed input size so, technically, they may have been correct here. However, that's not usually how people use complexity analysis.
It appears to me you could have impressed them in a number of ways.
First, by informing them that it's not possible to do it in O(1), unless you use the "suspect" reasoning given above.
Second, by showing your elite skills by providing Pythonic code such as:
inpStr = '123412345123456'

# O(1) array creation.
freq = [0] * 1000

# O(n) string processing.
for val in [int(inpStr[pos:pos+3]) for pos in range(len(inpStr) - 2)]:
    freq[val] += 1

# O(1) output of relevant array values.
print([(num, freq[num]) for num in range(1000) if freq[num] > 1])
This outputs:
[(123, 3), (234, 3), (345, 2)]
though you could, of course, modify the output format to anything you desire.
And, finally, by telling them there's almost certainly no problem with an O(n) solution, since the code above delivers results for a one-million-digit string in well under half a second. It seems to scale quite linearly as well, since a 10,000,000-character string takes 3.5 seconds and a 100,000,000-character one takes 36 seconds.
And, if they need better than that, there are ways to parallelise this sort of stuff that can greatly speed it up.
Not within a single Python interpreter of course, due to the GIL, but you could split the string into something like (overlap indicated by vv is required to allow proper processing of the boundary areas):
    vv
123412  vv
    123451
        5123456
You can farm these out to separate workers and combine the results afterwards.
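As a rough illustration of that idea (my sketch, not the answer's code; the chunking and worker count are arbitrary), overlapping chunks can be counted in separate processes and the per-chunk histograms summed afterwards:

# Sketch: count 3-digit windows in overlapping chunks with multiprocessing
from multiprocessing import Pool

def count_chunk(chunk):
    freq = [0] * 1000
    for pos in range(len(chunk) - 2):
        freq[int(chunk[pos:pos + 3])] += 1
    return freq

def parallel_count(s, n_workers=4):
    step = max(3, len(s) // n_workers)
    # each chunk overlaps the next by 2 digits so boundary triplets are counted exactly once
    chunks = [s[i:i + step + 2] for i in range(0, len(s), step)]
    with Pool(n_workers) as pool:
        totals = [0] * 1000
        for freq in pool.map(count_chunk, chunks):
            totals = [a + b for a, b in zip(totals, freq)]
    return [(num, c) for num, c in enumerate(totals) if c > 1]

if __name__ == '__main__':
    print(parallel_count('123412345123456'))  # [(123, 3), (234, 3), (345, 2)]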
The splitting of input and combining of output are likely to swamp any saving with small strings (and possibly even million-digit strings) but, for much larger data sets, it may well make a difference. My usual mantra of "measure, don't guess" applies here, of course.
This mantra also applies to other possibilities, such as bypassing Python altogether and using a different language which may be faster.
For example, the following C code, running on the same hardware as the earlier Python code, handles a hundred million digits in 0.6 seconds, roughly the same amount of time as the Python code processed one million. In other words, much faster:
#include <stdio.h>
#include <string.h>
int main(void) {
    static char inpStr[100000000+1];
    static int freq[1000];

    // Set up test data.
    memset(inpStr, '1', sizeof(inpStr));
    inpStr[sizeof(inpStr)-1] = '\0';

    // Need at least three digits to do anything useful.
    if (strlen(inpStr) <= 2) return 0;

    // Get initial feed from first two digits, process others.
    int val = (inpStr[0] - '0') * 10 + inpStr[1] - '0';
    char *inpPtr = &(inpStr[2]);
    while (*inpPtr != '\0') {
        // Remove hundreds, add next digit as units, adjust table.
        val = (val % 100) * 10 + *inpPtr++ - '0';
        freq[val]++;
    }

    // Output (relevant part of) table.
    for (int i = 0; i < 1000; ++i)
        if (freq[i] > 1)
            printf("%3d -> %d\n", i, freq[i]);

    return 0;
}
Constant time isn't possible. All 1 million digits need to be looked at at least once, so that is a time complexity of O(n), where n = 1 million in this case.
For a simple O(n) solution, create an array of size 1000 that represents the number of occurrences of each possible 3 digit number. Advance 1 digit at a time, first index == 0, last index == 999997, and increment array[3 digit number] to create a histogram (count of occurrences for each possible 3 digit number). Then output the content of the array with counts > 1.
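A minimal sketch of that histogram approach (my code, not the answerer's):

def repeated_triples(s):
    # histogram of all 3-digit windows, then report those seen more than once
    counts = [0] * 1000
    for i in range(len(s) - 2):
        counts[int(s[i:i + 3])] += 1
    return [('%03d' % num, c) for num, c in enumerate(counts) if c > 1]

print(repeated_triples('123412345123456'))  # [('123', 3), ('234', 3), ('345', 2)]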
A million is small for the answer I give below. Assuming only that you have to be able to run the solution in the interview, without a pause, the following works in less than two seconds and gives the required result:
from collections import Counter

def triple_counter(s):
    # range ends at len(s) + 1 so the final three-digit window is included
    c = Counter(s[n-3: n] for n in range(3, len(s) + 1))
    for tri, n in c.most_common():
        if n > 1:
            print('%s - %i times.' % (tri, n))
        else:
            break

if __name__ == '__main__':
    import random
    s = ''.join(random.choice('0123456789') for _ in range(1_000_000))
    triple_counter(s)
Hopefully the interviewer would be looking for use of the standard library's collections.Counter class.
Parallel execution version
I wrote a blog post on this with more explanation.
The simple O(n) solution would be to count each 3-digit number:
for nr in range(1000):
    cnt = text.count('%03d' % nr)
    if cnt > 1:
        print('%03d is found %d times' % (nr, cnt))
This would search through all 1 million digits 1000 times.
Traversing the digits only once:
counts = [0] * 1000
for idx in range(len(text)-2):
    counts[int(text[idx:idx+3])] += 1

for nr, cnt in enumerate(counts):
    if cnt > 1:
        print('%03d is found %d times' % (nr, cnt))
Timing shows that iterating only once over the index is twice as fast as using count.
Here is a NumPy implementation of the "consensus" O(n) algorithm: walk through all triplets and bin as you go. The binning is done by, upon encountering say "385", adding one to bin[3, 8, 5], which is an O(1) operation. The bins are arranged in a 10x10x10 cube. As the binning is fully vectorized, there is no loop in the code.
def setup_data(n):
    import random
    digits = "0123456789"
    return dict(text = ''.join(random.choice(digits) for i in range(n)))

def f_np(text):
    # Get the data into NumPy
    import numpy as np
    a = np.frombuffer(bytes(text, 'utf8'), dtype=np.uint8) - ord('0')
    # Rolling triplets
    a3 = np.lib.stride_tricks.as_strided(a, (3, a.size-2), 2*a.strides)
    bins = np.zeros((10, 10, 10), dtype=int)
    # Next line performs O(n) binning
    np.add.at(bins, tuple(a3), 1)
    # Filtering is left as an exercise
    return bins.ravel()

def f_py(text):
    counts = [0] * 1000
    for idx in range(len(text)-2):
        counts[int(text[idx:idx+3])] += 1
    return counts

import numpy as np
import types
from timeit import timeit

for n in (10, 1000, 1000000):
    data = setup_data(n)
    ref = f_np(**data)
    print(f'n = {n}')
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        try:
            assert np.all(ref == func(**data))
            print("{:16s}{:16.8f} ms".format(name[2:], timeit(
                'f(**data)', globals={'f':func, 'data':data}, number=10)*100))
        except:
            print("{:16s} apparently crashed".format(name[2:]))
Unsurprisingly, NumPy is a bit faster than @Daniel's pure Python solution on large data sets. Sample output:
# n = 10
# np 0.03481400 ms
# py 0.00669330 ms
# n = 1000
# np 0.11215360 ms
# py 0.34836530 ms
# n = 1000000
# np 82.46765980 ms
# py 360.51235450 ms
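The filtering that the comment above leaves as an exercise could look something like this (my sketch; filter_repeats is a hypothetical helper operating on the flattened bins returned by f_np):

def filter_repeats(bins):
    # bins is the raveled 10x10x10 histogram, so the flat index is the 3-digit number
    return [('%03d' % idx, int(cnt)) for idx, cnt in enumerate(bins) if cnt > 1]

# filter_repeats(f_np('123412345123456')) -> [('123', 3), ('234', 3), ('345', 2)]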
I would solve the problem as follows:
def find_numbers(str_num):
    final_dict = {}
    buffer = {}
    # use len(str_num) - 2 so the final three-digit window is included
    for idx in range(len(str_num) - 2):
        num = int(str_num[idx:idx + 3])
        if num not in buffer:
            buffer[num] = 0
        buffer[num] += 1
        if buffer[num] > 1:
            final_dict[num] = buffer[num]
    return final_dict
Applied to your example string, this yields:
>>> find_numbers("123412345123456")
{345: 2, 234: 3, 123: 3}
This solution runs in O(n) for n being the length of the provided string, and is, I guess, the best you can get.
As per my understanding, you cannot have the solution in constant time. It will take at least one pass over the million-digit number (assuming it's a string). You can have a 3-digit rolling iteration over the digits of the million-length number and increase the value of the hash key by 1 if it already exists, or create a new hash key (initialized to 1) if it doesn't already exist in the dictionary.
The code will look something like this:
def calc_repeating_digits(number):
    hash = {}
    for i in range(len(str(number))-2):
        current_three_digits = number[i:i+3]
        if current_three_digits in hash.keys():
            hash[current_three_digits] += 1
        else:
            hash[current_three_digits] = 1
    return hash
You can filter down to the keys which have item value greater than 1.
As mentioned in another answer, you cannot do this algorithm in constant time, because you must look at at least n digits. Linear time is the fastest you can get.
However, the algorithm can be done in O(1) space. You only need to store the counts of each 3 digit number, so you need an array of 1000 entries. You can then stream the number in.
My guess is that either the interviewer misspoke when they gave you the solution, or you misheard "constant time" when they said "constant space."
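A minimal sketch of that streaming idea (mine, not the answerer's): keep a rolling 3-digit value plus a fixed 1000-entry table, so memory stays constant no matter how long the digit stream is.

def count_triples_streaming(digit_iter):
    counts = [0] * 1000   # fixed-size state
    window = 0            # rolling 3-digit value
    seen = 0
    for ch in digit_iter:                        # digits arrive one at a time
        window = (window * 10 + int(ch)) % 1000
        seen += 1
        if seen >= 3:                            # a full 3-digit window is available
            counts[window] += 1
    return [('%03d' % n, c) for n, c in enumerate(counts) if c > 1]

print(count_triples_streaming(iter('123412345123456')))  # [('123', 3), ('234', 3), ('345', 2)]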
Here's my answer:
from timeit import timeit
from collections import Counter
import types
import random

def setup_data(n):
    digits = "0123456789"
    return dict(text = ''.join(random.choice(digits) for i in range(n)))

def f_counter(text):
    c = Counter()
    for i in range(len(text)-2):
        ss = text[i:i+3]
        c.update([ss])
    return (i for i in c.items() if i[1] > 1)

def f_dict(text):
    d = {}
    for i in range(len(text)-2):
        ss = text[i:i+3]
        if ss not in d:
            d[ss] = 0
        d[ss] += 1
    return ((i, d[i]) for i in d if d[i] > 1)

def f_array(text):
    a = [[[0 for _ in range(10)] for _ in range(10)] for _ in range(10)]
    for n in range(len(text)-2):
        i, j, k = (int(ss) for ss in text[n:n+3])
        a[i][j][k] += 1
    for i, b in enumerate(a):
        for j, c in enumerate(b):
            for k, d in enumerate(c):
                if d > 1: yield (f'{i}{j}{k}', d)

for n in (1E1, 1E3, 1E6):
    n = int(n)
    data = setup_data(n)
    print(f'n = {n}')
    results = {}
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        print("{:16s}{:16.8f} ms".format(name[2:], timeit(
            'results[name] = f(**data)', globals={'f':func, 'data':data, 'results':results, 'name':name}, number=10)*100))
    for r in results:
        print('{:10}: {}'.format(r, sorted(list(results[r]))[:5]))
The array lookup method is very fast (even faster than @paul-panzer's numpy method!). Of course, it cheats since it isn't technically finished after it completes, because it's returning a generator. It also doesn't have to check on every iteration whether the value already exists, which is likely to help a lot.
n = 10
counter 0.10595780 ms
dict 0.01070654 ms
array 0.00135370 ms
f_counter : []
f_dict : []
f_array : []
n = 1000
counter 2.89462101 ms
dict 0.40434612 ms
array 0.00073838 ms
f_counter : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_dict : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_array : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
n = 1000000
counter 2849.00500992 ms
dict 438.44007806 ms
array 0.00135370 ms
f_counter : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_dict : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_array : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
Image as answer (the image itself is not reproduced here).
Looks like a sliding window.
Here is my solution:
from collections import defaultdict

string = "103264685134845354863"
d = defaultdict(int)
for elt in range(len(string)-2):
    d[string[elt:elt+3]] += 1
d = {key: d[key] for key in d.keys() if d[key] > 1}
With a bit of creativity in the for loop (and an additional lookup structure with True/False/None values, for example) you should be able to get rid of the last line, as you only want to create keys in the dict once they have been seen more than once up to that point, e.g. as in the sketch below.
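One possible way to do that (my sketch, not the answerer's code):

from collections import defaultdict

string = "103264685134845354863"
counts = defaultdict(int)
repeated = {}                       # built on the fly, no final filtering pass
for elt in range(len(string) - 2):
    key = string[elt:elt + 3]
    counts[key] += 1
    if counts[key] > 1:             # record only keys seen more than once
        repeated[key] = counts[key]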
Hope it helps :)
-Telling from the perspective of C.
-You can have an int 3-d array results[10][10][10];
-Go from 0th location to n-4th location, where n being the size of the string array.
-On each location, check the current, next and next's next.
-Increment the counter as results[current][next][next's next]++;
-Print the values of
results[1][2][3]
results[2][3][4]
results[3][4][5]
results[4][5][6]
results[5][6][7]
results[6][7][8]
results[7][8][9]
-It is O(n) time; there are no comparisons involved.
-You can run some parallel stuff here by partitioning the array and calculating the matches around the partitions.
inputStr = '123456123138276237284287434628736482376487234682734682736487263482736487236482634'

count = {}
for i in range(len(inputStr) - 2):
    subNum = int(inputStr[i:i+3])
    if subNum not in count:
        count[subNum] = 1
    else:
        count[subNum] += 1

print(count)
The Kinect camera returns a depth image for the whole view. Due to the way the image is captured, some small areas are invisible to the camera. For those areas 2047 is returned.
I want to fill those areas with the value that is left of them - which is the most likely value for that area. I have the depth as numpy uint16 array. The trivial solution would be:
for x in xrange(depth.shape[1]):
    for y in xrange(depth.shape[0]):
        if depth[y,x] == 2047 and x > 0:
            depth[y,x] = depth[y,x-1]
This takes around 16 seconds to execute (Raspberry 2) per 640 x 480 frame.
I came up with a solution using indexes:
w = numpy.where(depth == 2047)
w = zip(w[0], w[1])
for index in w:
    if depth[index] == 2047 and index[1] > 0:
        depth[index] = depth[index[0],index[1] - 1]
This takes around 0.6 seconds to execute for a test frame. Much faster but still far from perfect. Index computation and zip only take 0.04 seconds, so the main performance killer is the loop.
I reduced it to 0.3 seconds by using item():
for index in w:
    if depth.item(index) == 2047 and index[1] > 0:
        depth.itemset(index, depth.item(index[0],index[1] - 1))
Can this be improved further using only python (+numpy/opencv)? Compared to how fast simple filtering is, it should be possible to be faster than 0.05s
You have islands going behind the places where the elements in the input array are 2047. The job here is to fill the shadows with the element right before the start of the shadow. So, one way would be to find the start and stop places of those islands and put x and -x at those places respectively, where x is the element right before the start of each island. Then, do cumsum along the rows, which would effectively fill the shadow-islands with x. That's all there is to the vectorized solution! Here's the implementation -
# Get mask of places to be updated
mask = np.zeros(np.array(depth.shape) + [0,1],dtype=bool)
mask[:,1:-1] = depth[:,1:] == 2047
# Get differentiation along the second axis and thus island start and stops
diffs = np.diff(mask.astype(int),axis=1)
start_mask = diffs == 1
stop_mask = diffs == -1
# Get a mapping array that has island places filled with the start-1 element
map_arr = np.zeros_like(diffs)
map_arr[start_mask] = depth[start_mask]
map_arr[stop_mask] = -depth[start_mask]
map_filled_arr = map_arr.cumsum(1)[:,:-1]
# Use mask created earlier to selectively set elements from map array
valid_mask = mask[:,1:-1]
depth[:,1:][valid_mask] = map_filled_arr[valid_mask]
Benchmarking
Define functions :
def fill_depth_original(depth):
    for x in xrange(depth.shape[1]):
        for y in xrange(depth.shape[0]):
            if depth[y,x] == 2047 and x > 0:
                depth[y,x] = depth[y,x-1]

def fill_depth_original_v2(depth):
    w = np.where(depth == 2047)
    w = zip(w[0], w[1])
    for index in w:
        if depth[index] == 2047 and index[1] > 0:
            depth[index] = depth[index[0],index[1] - 1]

def fill_depth_vectorized(depth):
    mask = np.zeros(np.array(depth.shape) + [0,1], dtype=bool)
    mask[:,1:-1] = depth[:,1:] == 2047
    diffs = np.diff(mask.astype(int), axis=1)
    start_mask = diffs == 1
    stop_mask = diffs == -1
    map_arr = np.zeros_like(diffs)
    map_arr[start_mask] = depth[start_mask]
    map_arr[stop_mask] = -depth[start_mask]
    map_filled_arr = map_arr.cumsum(1)[:,:-1]
    valid_mask = mask[:,1:-1]
    depth[:,1:][valid_mask] = map_filled_arr[valid_mask]
Runtime tests and verify outputs :
In [303]: # Create a random array and get a copy for profiling vectorized method
...: depth = np.random.randint(2047-150,2047+150,(500,500))
...: depthc1 = depth.copy()
...: depthc2 = depth.copy()
...:
In [304]: fill_depth_original(depth)
...: fill_depth_original_v2(depthc1)
...: fill_depth_vectorized(depthc2)
...:
In [305]: np.allclose(depth,depthc1)
Out[305]: True
In [306]: np.allclose(depth,depthc2)
Out[306]: True
In [307]: # Create a random array and get a copy for profiling vectorized method
...: depth = np.random.randint(2047-150,2047+150,(500,500))
...: depthc1 = depth.copy()
...: depthc2 = depth.copy()
...:
In [308]: %timeit fill_depth_original(depth)
...: %timeit fill_depth_original_v2(depthc1)
...: %timeit fill_depth_vectorized(depthc2)
...:
10 loops, best of 3: 89.6 ms per loop
1000 loops, best of 3: 1.47 ms per loop
100 loops, best of 3: 10.3 ms per loop
So, the second approach listed in the question still looks like winning!
I have two matrices. Both are filled with zeros and ones. One is a big one (3000 x 2000 elements), and the other is smaller ( 20 x 20 ) elements. I am doing something like:
newMatrix = (size of bigMatrix), filled with zeros
l = (a constant)

for y in xrange(0, len(bigMatrix[0])):
    for x in xrange(0, len(bigMatrix)):
        for b in xrange(0, len(smallMatrix[0])):
            for a in xrange(0, len(smallMatrix)):
                if (bigMatrix[x, y] == smallMatrix[x + a - l, y + b - l]):
                    newMatrix[x, y] = 1
Which is being painfully slow. Am I doing anything wrong? Is there a smart way to make this work faster?
edit: Basically I am, for each (x,y) in the big matrix, checking all the pixels of both big matrix and the small matrix around (x,y) to see if they are 1. If they are 1, then I set that value on newMatrix. I am doing a sort of collision detection.
I can think of a couple of optimisations there -
As you are using 4 nested python "for" statements, you are about as slow as you can be.
I can't figure out exactly what you are looking for -
but for one thing, if your big matrix's "1"s density is low, you can certainly use python's "any" function on bigMatrix's slices to quickly check if there are any set elements there -- you could get a several-fold speed increase there:
step = len(smallMatrix[0])
for y in xrange(0, len(bigMatrix[0]), step):
    for x in xrange(0, len(bigMatrix), step):
        if not any(bigMatrix[x: x+step, y: y + step]):
            continue
        (...)
At this point, if you still need to iterate over each element, you add another pair of indexes to walk each position inside the step - but I think you get the idea.
Apart from using inner Numeric operations like this "any" usage, you could certainly add some control-flow code to break off the (b, a) loop when the first matching pixel is found.
(Like inserting a "break" statement inside your last "if", and another if..break pair for the "b" loop.)
I really can't figure out exactly what your intent is - so I can't give you more specific code.
Your example code makes no sense, but the description of your problem sounds like you are trying to do a 2d convolution of a small bitarray over the big bitarray. There's a convolve2d function in scipy.signal package that does exactly this. Just do convolve2d(bigMatrix, smallMatrix) to get the result. Unfortunately the scipy implementation doesn't have a special case for boolean arrays so the full convolution is rather slow. Here's a function that takes advantage of the fact that the arrays contain only ones and zeroes:
import numpy as np

def sparse_convolve_of_bools(a, b):
    if a.size < b.size:
        a, b = b, a
    # coordinates of the nonzero elements of the smaller array
    offsets = list(zip(*np.nonzero(b)))
    n = len(offsets)
    dtype = np.byte if n < 128 else np.short if n < 32768 else np.int_
    result = np.zeros(np.array(a.shape) + b.shape - (1,1), dtype=dtype)
    for o in offsets:
        result[o[0]:o[0] + a.shape[0], o[1]:o[1] + a.shape[1]] += a
    return result
On my machine it runs in less than 9 seconds for a 3000x2000 by 20x20 convolution. The running time depends on the number of ones in the smaller array, being 20ms per each nonzero element.
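A hedged usage sketch (my addition, with made-up sizes and densities): the counts returned by the function above can be thresholded to get a boolean hit map; the exact alignment with the question's l constant is left out, so treat this as illustrative only.

import numpy as np

rng = np.random.default_rng(0)
bigMatrix = (rng.random((3000, 2000)) < 0.01).astype(int)   # sparse 0/1 data
smallMatrix = (rng.random((20, 20)) < 0.2).astype(int)

full = sparse_convolve_of_bools(bigMatrix, smallMatrix)     # shape (3019, 2019)
hits = full > 0   # True wherever the small pattern overlaps at least one 1
print(full.shape, int(hits.sum()))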
If your bits are really packed 8 per byte / 32 per int,
and you can reduce your smallMatrix to 20x16,
then try the following, here for a single row.
(newMatrix[x, y] = 1 when any bit of the 20x16 around x,y is 1 ??
What are you really looking for ?)
python -m timeit -s '
""" slide 16-bit mask across 32-bit pairs bits[j], bits[j+1] """
import numpy as np

bits = np.zeros( 2000 // 16, np.uint16 )  # 2000 bits
bits[::8] = 1
mask = 32+16
nhit = 16 * [0]

def hit16( bits, mask, nhit ):
    """
    slide 16-bit mask across 32-bit pairs bits[j], bits[j+1]
    bits: long np.array( uint16 )
    mask: 16 bits, int
    out: nhit[j] += 1 where pair & mask != 0
    """
    left = bits[0]
    for b in bits[1:]:
        pair = (left << 16) | b
        if pair:  # np idiom for non-0 words ?
            m = mask
            for j in range(16):
                if pair & m:
                    nhit[j] += 1
                    # hitposition = jb*16 + j
                m <<= 1
        left = b
    # if any(nhit): print "hit16:", nhit
' \
'
hit16( bits, mask, nhit )
'
# 15 msec per loop, bits[::4] = 1
# 11 msec per loop, bits[::8] = 1
# mac g4 ppc