Numpy `searchsorted` far slower than my binary search function

Numpy `searchsorted` far slower than my binary search function - python

I was experimenting with binary search, and when I got my version working I figured I would compare its speed to that of NumPy's. I was fairly surprised at the results, for two reasons.
I know that binary search should grow as log n, which mine did, but NumPy grew linearly.
Not only that, but NumPy was just plain slower -- at the start and certainly at the end.
I attached a graph of the results. Orange is NumPy and blue is mine. To the left is the time in milliseconds it took to find the last item in the list (items[-1]) and the bottom shows the length of the list. I have also checked to make sure that my code is returning the correct value and it is.
In case I wasn't clear, my questions are basically "why" two #1 and #2
#binary_search.py
from typing import Iterable
from numba import njit
from numba.typed import List
def _find(items: Iterable[int], to_find: int):
min = -1
max = len(items)
while True:
split = int((max+min)/2)
item = items[split]
if item == to_find:
return split
elif max == min:
print(min, max)
print(items)
print(to_find)
print(split)
exit()
elif item > to_find:
max = split - 1
elif item < to_find:
min = split + 1
def findsorted(_items: Iterable[int], to_find: int):
items = _items
return _find(items, to_find)
#graph_results.py
import binary_search as bs
import sys
import time
import numpy as np
from matplotlib import pyplot as plt
iterations = int(sys.argv[1])
items = [0, 1]
lx = []
ly = []
nx = []
ny = []
for i in range(2, iterations):
l_avg_times = []
n_avg_times = []
items.append(items[-1] + 1)
for _ in range(0, 100):
to_find = items[-1]
lstart = time.time()
bs.findsorted(items, to_find)
lend = time.time()
nstart = lend
np.searchsorted(items, to_find)
nend = time.time()
ltotal = lend-lstart
ntotal = nend-nstart
l_avg_times.append(ltotal)
n_avg_times.append(ntotal)
ly.append(
round(
sum(l_avg_times)/len(l_avg_times),
1000
)*1000
)
lx.append(i)
ny.append(
round(
sum(n_avg_times)/len(n_avg_times),
1000
)*1000
)
nx.append(i)
plt.plot(lx, ly)
plt.plot(nx, ny)
plt.show()

Related

Python iterate over each 100 elements

I don't know if this is a good way to optimize, but basically I am using python inside a 3D app to create random colors per object. And the code I have works well with objects within 10k polygons. But it crashes in 100k polygons. Is there a way to do it by chunks in the loop, basically I have the for loop and using an if statement to filter the first 100. But then I need another 100, and another 100, etc. How can I write that? Maybe with a time sleep between each. It's not going to be faster but at least won't possible crash the program. Thanks.
for i, n in enumerate(uvShellIds):
#code can only perform well within sets of 100 elements
limit = 100 #?
if 0 <= i <= 100:
#do something
print(n)
# now I need it to work on a new set of 100 elements
#if 101 <= i <= 200:
#(...keep going between sets of 100...)
My current code :
import maya.OpenMaya as om
import maya.cmds as cmds
import random
def getUvShelList(name):
selList = om.MSelectionList()
selList.add(name)
selListIter = om.MItSelectionList(selList, om.MFn.kMesh)
pathToShape = om.MDagPath()
selListIter.getDagPath(pathToShape)
meshNode = pathToShape.fullPathName()
uvSets = cmds.polyUVSet(meshNode, query=True, allUVSets =True)
allSets = []
for uvset in uvSets:
shapeFn = om.MFnMesh(pathToShape)
shells = om.MScriptUtil()
shells.createFromInt(0)
# shellsPtr = shells.asUintPtr()
nbUvShells = shells.asUintPtr()
uArray = om.MFloatArray() #array for U coords
vArray = om.MFloatArray() #array for V coords
uvShellIds = om.MIntArray() #The container for the uv shell Ids
shapeFn.getUVs(uArray, vArray)
shapeFn.getUvShellsIds(uvShellIds, nbUvShells, uvset)
# shellCount = shells.getUint(shellsPtr)
shells = {}
for i, n in enumerate(uvShellIds):
#print(i,n)
limit = 100
if i <= limit:
if n in shells:
# shells[n].append([uArray[i],vArray[i]])
shells[n].append( '%s.map[%i]' % ( name, i ) )
else:
# shells[n] = [[uArray[i],vArray[i]]]
shells[n] = [ '%s.map[%i]' % ( name, i ) ]
allSets.append({uvset: shells})
for shell in shells:
selection_shell = shells.get(shell)
cmds.select(selection_shell)
#print(shells.get(shell))
facesSel = cmds.polyListComponentConversion(fromUV=True, toFace=True)
cmds.select(facesSel)
r = [random.random() for i in range(3)]
cmds.polyColorPerVertex(facesSel,rgb=(r[0], r[1], r[2]), cdo=1 )
cmds.select(deselect=1)
getUvShelList( 'polySurface359' )

You can use islice from itertools to chunk.
from itertools import islice
uvShellIds = list(range(1000))
iterator = iter(uvShellIds)
while True:
chunk = list(islice(iterator, 100))
if not chunk:
break
print(chunk) # chunk contains 100 elements you can process
I don't know how well it fits in your current code but, below is how you can process the chunks:
from itertools import islice
uvShellIds = list(range(1000))
iterator = iter(uvShellIds)
offset = 0
while True:
chunk = list(islice(iterator, 100))
if not chunk:
break
# Processing chunk items
for i, n in enumerate(chunk):
# offset + i will give you the right index referring to the uvShellIds variable
# Then , perform your actions
if n in shells:
# shells[n].append([uArray[i],vArray[i]])
shells[n].append( '%s.map[%i]' % ( name, offset + i ) )
else:
# shells[n] = [[uArray[i],vArray[i]]]
shells[n] = [ '%s.map[%i]' % ( name, offset + i ) ]
offset += 100
# Your sleep can come here
The snippet above should replace your for i, n in enumerate(uvShellIds): block.
As #David Culbreth's answer stated, I'm not sure the sleep will be of help, but I left a comment on where you can place it.

I use this generator to "chunkify" my long-running operations in python into smaller batches:
def chunkify_list(items, chunk_size):
for i in range(0, len(items), chunk_size):
yield items[i:i+chunk_size]
With this defined, you can write your program something like this:
items = [1,2,3,4,5 ...]
for chunk in chunkify_list(items, 100):
for item in chunk:
process_item(item)
sleep(delay)
Now, I'm not going to guarantee that sleep will actually solve your problems, but this lets you handle your data one chunk at a time.

Any easy way to transform a missing number sequence to its range?

Suppose I have a list that goes like :
'''
[1,2,3,4,9,10,11,20]
'''
I need the result to be like :
'''
[[4,9],[11,20]]
'''
I have defined a function that goes like this :
def get_range(lst):
i=0
seqrange=[]
for new in lst:
a=[]
start=new
end=new
if i==0:
i=1
old=new
else:
if new - old >1:
a.append(old)
a.append(new)
old=new
if len(a):
seqrange.append(a)
return seqrange
Is there any other easier and efficient way to do it? I need to do this in the range of millions.

You can use numpy arrays and the diff function that comes along with them. Numpy is so much more efficient than looping when you have millions of rows.
Slight aside:
Why are numpy arrays so fast? Because they are arrays of data instead of arrays of pointers to data (which is what Python lists are), because they offload a whole bunch of computations to a backend written in C, and because they leverage the SIMD paradigm to run a Single Instruction on Multiple Data simultaneously.
Now back to the problem at hand:
The diff function gives us the difference between consecutive elements of the array. Pretty convenient, given that we need to find where this difference is greater than a known threshold!
import numpy as np
threshold = 1
arr = np.array([1,2,3,4,9,10,11,20])
deltas = np.diff(arr)
# There's a gap wherever the delta is greater than our threshold
gaps = deltas > threshold
gap_indices = np.argwhere(gaps)
gap_starts = arr[gap_indices]
gap_ends = arr[gap_indices + 1]
# Finally, stack the two arrays horizontally
all_gaps = np.hstack((gap_starts, gap_ends))
print(all_gaps)
# Output:
# [[ 4 9]
# [11 20]]
You can access all_gaps like a 2D matrix: all_gaps[0, 1] would give you 9, for example. If you really need the answer as a list-of-lists, simply convert it like so:
all_gaps_list = all_gaps.tolist()
print(all_gaps_list)
# Output: [[4, 9], [11, 20]]
Comparing the runtime of the iterative method from #happydave's answer with the numpy method:
import random
import timeit
import numpy
def gaps1(arr, threshold):
deltas = np.diff(arr)
gaps = deltas > threshold
gap_indices = np.argwhere(gaps)
gap_starts = arr[gap_indices]
gap_ends = arr[gap_indices + 1]
all_gaps = np.hstack((gap_starts, gap_ends))
return all_gaps
def gaps2(lst, thr):
seqrange = []
for i in range(len(lst)-1):
if lst[i+1] - lst[i] > thr:
seqrange.append([lst[i], lst[i+1]])
return seqrange
test_list = [i for i in range(100000)]
for i in range(100):
test_list.remove(random.randint(0, len(test_list) - 1))
test_arr = np.array(test_list)
# Make sure both give the same answer:
assert np.all(gaps1(test_arr, 1) == gaps2(test_list, 1))
t1 = timeit.timeit('gaps1(test_arr, 1)', setup='from __main__ import gaps1, test_arr', number=100)
t2 = timeit.timeit('gaps2(test_list, 1)', setup='from __main__ import gaps2, test_list', number=100)
print(f"t1 = {t1}s; t2 = {t2}s; Numpy gives ~{t2 // t1}x speedup")
On my laptop, this gives:
t1 = 0.020834800001466647s; t2 = 1.2446780000027502s; Numpy gives ~59.0x speedup
My word that's fast!

There is iterator based solution. It'is allow to get intervals one by one:
flist = [1,2,3,4,9,10,11,20]
def get_range(lst):
start_idx = lst[0]
for current_idx in flist[1:]:
if current_idx > start_idx+1:
yield [start_idx, current_idx]
start_idx = current_idx
for inverval in get_range(flist):
print(inverval)

I don't think there's anything inefficient about the solution, but you can clean up the code quite a bit:
seqrange = []
for i in range(len(lst)-1):
if lst[i+1] - lst[i] > 1:
seqrange.append([lst[i], lst[i+1]])

I think this could be more efficient and a bit cleaner.
def func(lst):
ans=0
final=[]
sol=[]
for i in range(1,lst[-1]+1):
if(i not in lst):
ans+=1
final.append(i)
elif(i in lst and ans>0):
final=[final[0]-1,i]
sol.append(final)
ans=0
final=[]
else:
final=[]
return(sol)

Faster implementation for shuffling buckets in a dataset? Python

Is there a faster way to implement this? Each row is about 1024 buckets and it's not as fast as I wish it was..
I'd like to generate quite a lot but it needs a few hours to complete as it is. It's quite the bottleneck at this point.. Any suggestions or ideas for how to optimize it would be greatly appreciated!
Edit*
Apologies for not having a minimal working example before. Now it's posted. If optimization could be made for Python 2.7 it would be very appreciated.
import math
import numpy as np
import copy
import random
def number_to_move(n):
l=math.exp(-n)
k=0
p=1.0
while p>l:
k += 1
p *= random.random()
return k-n
def createShuffledDataset(input_data, shuffle_indexes_dict, shuffle_quantity):
shuffled = []
for key in shuffle_indexes_dict:
for values in shuffle_indexes_dict[key]:
temp_holder = copy.copy(input_data[values[0] - 40: values[1]]) #may need to increase 100 padding
for line in temp_holder:
buckets = range(1,1022)
for bucket in buckets:
bucket_value = line[bucket]
proposed_number = number_to_move(bucket_value)
moving_amount = abs(proposed_number) if bucket_value - abs(proposed_number) >= 0 else bucket_value
line[bucket] -= moving_amount
if proposed_number > 0:
line[bucket + 1] += moving_amount
else:
line[bucket - 1] += moving_amount
shuffled.extend(temp_holder)
return np.array(shuffled)
example_data = np.ones((100,1024))
shuffle_indexes = {"Ranges to Shuffle 1" : [[10,50], [53, 72]]}
shuffle_quantity = 1150
shuffled_data = createShuffledDataset(example_data, shuffle_indexes,
shuffle_quantity)

Some minors things you might try:
merge key and values into single call:
for key, values in shuffle_indexes_dict.iteritems():
use xrange rather range
bucket values seem to be integer - try caching them.
_cache = {}
def number_to_move(n):
v = _cache.get(n, None)
if v is not None: return v
l=math.exp(-n)
k=0
p=1.0
while p>l:
k += 1
p *= random.random()
v = k-n
_cache[n] = v
return v
If shuffle_indexes_dict ranges are exclusive, then you can - probably - avoid coping values from input_data.
Otherwise i'd say you're out of luck.

Parallelization/multiprocessing of conditional for loop

I want to use multiprocessing in Python to speed up a while loop.
More specifically:
I have a matrix (samples*features). I want to select x subsets of samples whose values at a random subset of features is unequal to a certain value (-1 in this case).
My serial code:
np.random.seed(43)
datafile = '...'
df = pd.read_csv(datafile, sep=" ", nrows = 89)
no_feat = 500
no_samp = 5
no_trees = 5
i=0
iter=0
samples = np.zeros((no_trees, no_samp))
features = np.zeros((no_trees, no_feat))
while i < no_trees:
rand_feat = np.random.choice(df.shape[1], no_feat, replace=False)
iter_order = np.random.choice(df.shape[0], df.shape[0], replace=False)
samp_idx = []
a=0
#--------------
#how to run in parallel?
for j in iter_order:
pot_samp = df.iloc[j, rand_feat]
if len(np.where(pot_samp==-1)[0]) == 0:
samp_idx.append(j)
if len(samp_idx) == no_samp:
print a
break
a+=1
#--------------
if len(samp_idx) == no_samp:
samples[i,:] = samp_idx
features[i, :] = rand_feat
i+=1
iter+=1
if iter>1000: #break if subsets cannot be found
break
Searching for fitting samples is the potentially expensive part (the j for loop), which in theory can be run in parallel. In some cases, it is not necessary to iterate over all samples to find a large enough subset, which is why I am breaking out of the loop as soon as the subset is large enough.
I am struggling to find an implementation that would allow for checks of how many valid results are generated already. Is it even possible?
I have used joblib before. If I understand correctly this uses the pool methods of multiprocessing as a backend which only works for separate tasks? I am thinking that queues might be helpful but thus far I failed at implementing them.

I found a working solution. I decided to run the while loop in parallel and have the different processes interact over a shared counter. Furthermore, I vectorized the search for suitable samples.
The vectorization yielded a ~300x speedup and running on 4 cores speeds up the computation ~twofold.
First I tried to implement separate processes and put the results into a queue. Turns out these aren't made to store large amounts of data.
If someone sees another bottleneck in that code I would be glad if someone pointed it out.
With my basically nonexistent knowledge about parallel computing I found it really hard to puzzle this together, especially since the example on the internet are all very basic. I learnt a lot though =)
My code:
import numpy as np
import pandas as pd
import itertools
from multiprocessing import Pool, Lock, Value
from datetime import datetime
import settings
val = Value('i', 0)
worker_ID = Value('i', 1)
lock = Lock()
def findSamp(no_trees, df, no_feat, no_samp):
lock.acquire()
print 'starting worker - {0}'.format(worker_ID.value)
worker_ID.value +=1
worker_ID_local = worker_ID.value
lock.release()
max_iter = 100000
samp = []
feat = []
iter_outer = 0
iter = 0
while val.value < no_trees and iter_outer<max_iter:
rand_feat = np.random.choice(df.shape[1], no_feat, replace=False
#get samples with random features from dataset;
#find and select samples that don't have missing values in the random features
samp_rand = df.iloc[:,rand_feat]
nan_idx = np.unique(np.where(samp_rand == -1)[0])
all_idx = np.arange(df.shape[0])
notnan_bool = np.invert(np.in1d(all_idx, nan_idx))
notnan_idx = np.where(notnan_bool == True)[0]
if notnan_idx.shape[0] >= no_samp:
#if enough samples for random feature subset, select no_samp samples randomly
notnan_idx_rand = np.random.choice(notnan_idx, no_samp, replace=False)
rand_feat_rand = rand_feat
lock.acquire()
val.value += 1
#x = val.value
lock.release()
#print 'no of trees generated: {0}'.format(x)
samp.append(notnan_idx_rand)
feat.append(rand_feat_rand)
else:
#increase iter_outer counter if no sample subset could be found for random feature subset
iter_outer += 1
iter+=1
if iter >= max_iter:
print 'exiting worker{0} because iter >= max_iter'.format(worker_ID_local)
else:
print 'worker{0} - finished'.format(worker_ID_local)
return samp, feat
def initialize(*args):
global val, worker_ID, lock
val, worker_ID, lock = args
def star_findSamp(i_df_no_feat_no_samp):
return findSamp(*i_df_no_feat_no_samp)
if __name__ == '__main__':
np.random.seed(43)
datafile = '...'
df = pd.read_csv(datafile, sep=" ", nrows = 89)
df = df.fillna(-1)
df = df.iloc[:, 6:]
no_feat = 700
no_samp = 10
no_trees = 5000
startTime = datetime.now()
print 'starting multiprocessing'
ncores = 4
p = Pool(ncores, initializer=initialize, initargs=(val, worker_ID, lock))
args = itertools.izip([no_trees]*ncores, itertools.repeat(df), itertools.repeat(no_feat), itertools.repeat(no_samp))
result = p.map(star_findSamp, args)#, callback=log_result)
p.close()
p.join()
print '{0} sample subsets for tree training have been found'.format(val.value)
samples = [x[0] for x in result if x != None]
samples = np.vstack(samples)
features = [x[1] for x in result if x != None]
features = np.vstack(features)
print datetime.now() - startTime

comparing large vectors in python

I have two large vectors (~133000 values) of different length. They are each sortet from small to large values. I want to find values that are similar within a given tolerance. This is my solution but it is very slow. Is there a way to speed this up?
import numpy as np
for lv in range(np.size(vector1)):
for lv_2 in range(np.size(vector2)):
if np.abs(vector1[lv_2]-vector2[lv])<.02:
print(vector1[lv_2],vector2[lv],lv,lv_2)
break

Your algorithm is far from optimal. You compare way too much values. Assume you are at a certain position in vector1 and the current value in vector2 is already more than 0.02 bigger. Why would you compare the rest of vector2?
Start with something like
pos1 = 0
pos2 = 0
Now compare the values at those postions in your vectors. If the difference is too big, move the position of the smaller one fowared and check again. Continue until you reach the end of one vector.

haven't tested it, but the following should work. The idea is to exploit the fact that the vectors are sorted
lv_1, lv_2 = 0,0
while lv_1 < len(vector1) and lv_2 < len(vector2):
if np.abs(vector1[lv_2]-vector2[lv_1])<.02:
print(vector1[lv_2],vector2[lv_1],lv_1,lv_2)
lv_1 += 1
lv_2 += 1
elif vector1[lv_1] < vector2[lv_2]: lv_1 += 1
else: lv_2 += 1

The following code gives a nice increase in performance that depends upon how dense the numbers are. Using a set of 1000 random numbers, sampled uniformly between 0 and 100, it runs about 30 times faster than your implementation.
pos_1_start = 0
for i in range(np.size(vector1)):
for j in range(pos1_start, np.size(vector2)):
if np.abs(vector1[i] - vector2[j]) < .02:
results1 += [(vector1[i], vector2[j], i, j)]
else:
if vector2[j] < vector1[i]:
pos1_start += 1
else:
break
The timing:
time new method: 0.112464904785
time old method: 3.59720897675
Which is produced by the following script:
import random
import numpy as np
import time
# initialize the vectors to be compared
vector1 = [random.uniform(0, 40) for i in range(1000)]
vector2 = [random.uniform(0, 40) for i in range(1000)]
vector1.sort()
vector2.sort()
# the arrays that will contain the results for the first method
results1 = []
# the arrays that will contain the results for the second method
results2 = []
pos1_start = 0
t_start = time.time()
for i in range(np.size(vector1)):
for j in range(pos1_start, np.size(vector2)):
if np.abs(vector1[i] - vector2[j]) < .02:
results1 += [(vector1[i], vector2[j], i, j)]
else:
if vector2[j] < vector1[i]:
pos1_start += 1
else:
break
t1 = time.time() - t_start
print "time new method:", t1
t = time.time()
for lv1 in range(np.size(vector1)):
for lv2 in range(np.size(vector2)):
if np.abs(vector1[lv1]-vector2[lv2])<.02:
results2 += [(vector1[lv1], vector2[lv2], lv1, lv2)]
t2 = time.time() - t_start
print "time old method:", t2
# sort the results
results1.sort()
results2.sort()
print np.allclose(results1, results2)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Numpy `searchsorted` far slower than my binary search function - python

Related

Python iterate over each 100 elements

Any easy way to transform a missing number sequence to its range?

Faster implementation for shuffling buckets in a dataset? Python

Parallelization/multiprocessing of conditional for loop

comparing large vectors in python

Categories

Resources