I have two ordered lists of consecutive integers m = 0, 1, ..., M and n = 0, 1, 2, ..., N. Each value of m has a probability pm, and each value of n has a probability pn. I am trying to find the ordered list of unique values r = m/n and their probabilities pr. I am aware that r is infinite if n = 0 and can even be undefined if m = n = 0.
In practice, I would like M and N to each be of the order of 2E4, meaning up to 4E8 values of r, which would mean about 3 GB of floats (assuming 8 bytes per float).
For this calculation, I have written the python code below.
The idea is to iterate over m and n and, for each new m/n, insert it in the right place with its probability if it isn't there yet, otherwise add its probability to the existing entry. My assumption is that it is easier to sort things on the way rather than waiting until the end.
The cases related to 0 are added at the end of the loop.
I am using the Fraction class since we are dealing with fractions.
The code also tracks the multiplicity of each unique value of m/n.
I have tested up to M=N=100, and things are quite slow. Are there better approaches to the question, or more efficient ways to tackle the code?
Timing:
M=N=30: 1 s
M=N=50: 6 s
M=N=80: 30 s
M=N=100: 82 s
import numpy as np
from fractions import Fraction
import time # For timing
start_time = time.time() # Timing
M, N = 6, 4
mList, nList = np.arange(1, M+1), np.arange(1, N+1) # From 1 to M inclusive, deal with 0 later
mProbList, nProbList = [1/(M+1)]*(M), [1/(N+1)]*(N) # Probabilities, here assumed equal (not general case)
# Deal with mn=0 later
pmZero, pnZero = 1/(M+1), 1/(N+1) # P(m=0) and P(n=0)
pNaN = pmZero * pnZero # P(0/0) = P(m=0)P(n=0)
pZero = pmZero * (1 - pnZero) # P(0) = P(m=0)P(n!=0)
pInf = pnZero * (1 - pmZero) # P(inf) = P(m!=0)P(n=0)
# Main list of r=m/n, P(r) and mult(r)
# Start with first line, m=1
rList = [Fraction(mList[0], n) for n in nList[::-1]] # Smallest first
rProbList = [mProbList[0] * nP for nP in nProbList[::-1]] # Start with first line
rMultList = [1] * len(rList) # Multiplicity of each element
# Main loop
for m, mP in zip(mList[1:], mProbList[1:]):
    for n, nP in zip(nList[::-1], nProbList[::-1]): # Pick an n value
        r, rP, rMult = Fraction(m, n), mP*nP, 1
        for i in range(len(rList)-1): # See where it fits in existing list
            if r < rList[i]:
                rList.insert(i, r)
                rProbList.insert(i, rP)
                rMultList.insert(i, 1)
                break
            elif r == rList[i]:
                rProbList[i] += rP
                rMultList[i] += 1
                break
            elif r < rList[i+1]:
                rList.insert(i+1, r)
                rProbList.insert(i+1, rP)
                rMultList.insert(i+1, 1)
                break
            elif r == rList[i+1]:
                rProbList[i+1] += rP
                rMultList[i+1] += 1
                break
            if r > rList[-1]:
                rList.append(r)
                rProbList.append(rP)
                rMultList.append(1)
                break
# Deal with 0
rList.insert(0, Fraction(0, 1))
rProbList.insert(0, pZero)
rMultList.insert(0, N)
# Deal with infty
rList.append(np.inf)
rProbList.append(pInf)
rMultList.append(M)
# Deal with undefined case
rList.append(np.nan)
rProbList.append(pNaN)
rMultList.append(1)
print(".... done in %s seconds." % round(time.time() - start_time, 2))
print("************** Final list\nr", 'Prob', 'Mult')
for r, rP, rM in zip(rList, rProbList, rMultList): print(r, rP, rM)
print("************** Checks")
print("mList", mList, 'nList', nList)
print("Sum of proba = ", np.sum(rProbList))
print("Sum of multi = ", np.sum(rMultList), "\t(M+1)*(N+1) = ", (M+1)*(N+1))
Based on the suggestion of @Prune, and on this thread about merging lists of tuples, I have modified the code as below. It's a lot easier to read and runs about an order of magnitude faster for N=M=80 (I have omitted dealing with 0; it would be done the same way as in the original post). I assume there may be ways to tweak the merge and the conversion back to lists further still.
# Do calculations
data = [(Fraction(m, n), mProb(m) * nProb(n)) for n in range(1, N+1) for m in range(1, M+1)]
data.sort()
# Merge duplicates using a dictionary
d = {}
for r, p in data:
    if r not in d: d[r] = [0, 0]
    d[r][0] += p
    d[r][1] += 1
# Convert back to lists
rList, rProbList, rMultList = [], [], []
for k in d:
    rList.append(k)
    rProbList.append(d[k][0])
    rMultList.append(d[k][1])
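One way the conversion back to lists might be tightened further (a sketch, not benchmarked): dicts preserve insertion order in Python 3.7+, and data was sorted before the merge, so d is already in ascending r order and can be unpacked directly.
rList = list(d)  # keys are already in ascending order
rProbList, rMultList = (list(t) for t in zip(*d.values()))  # split the [prob, mult] pairs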
I expect that "things are quite slow" because you've chosen a sort that is known to be inefficient. A single list insertion is O(K) (later list elements have to be bumped over, and storage gets reallocated regularly), so a full insertion sort is O(K^2). In your notation, that is O((M*N)^2).
If you want any sort of reasonable performance, research and use the best-known methods. The most straightforward way here is to build your non-exception results with a simple list comprehension and use the built-in sort on that penultimate list. Simply append your n=0 cases, and you're done in O(K log K) time.
In the expression below, I've assumed functions for the m and n probabilities.
This is a notational convenience; you know how to directly compute them, and can substitute those expressions if you wish.
data = [ (mProb(m) * nProb(n), Fraction(m, n))
for n in range(1, N+1)
for m in range(0, M+1) ]
data.sort()
data.extend([ # generate your "zero" cases here ])
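One possible continuation (not part of the answer) is to collapse duplicate ratios once the list is built, before appending the special cases; keying groupby on the Fraction puts equal values next to each other:
from itertools import groupby
data.sort(key=lambda pair: pair[1])     # group equal ratios together
merged = [(r, sum(p for p, _ in grp))   # total probability per unique r
          for r, grp in groupby(data, key=lambda pair: pair[1])]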
I have an extremely large list of coordinates in the form of a list of tuples.
data = [(1,1),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21),(1,2),(2,1)]
The list of tuples is actually being formed by a for loop with an append command, like so:
data = []
for i in source:  # where i is a tuple of the form (x, y)
    data.append(i)
Is there an approach to ensure the Euclidean distance between all tuples is above a certain threshold? In this example there is a very small distance between (1,1), (1,2) and (2,1), and I would like to keep only one of those three tuples, resulting in any one of these new lists of tuples:
data = [(1,1),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21)]
data = [(2,1),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21)]
data = [(1,2),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21)]
I have a brute-force algorithm that iterates through the list, but there should be a more elegant or quicker way to do this. Are there any other methods to speed up this operation? I am expecting lists of ~70k up to 500k tuples.
My method:
from scipy.spatial.distance import euclidean
data = [(1,1),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21),(1,2),(2,1)]
new_data = []
while len(data) > 0:
    check = data.pop()
    flag = True
    for i in data:
        if euclidean(check, i) < 5:
            flag = False
            break
        else:
            pass
    if flag == True:
        new_data.append(check)
    else:
        flag = True
Additional points:
Although the list of tuples is coming from some iterative function, the order of the tuples is uncertain.
The actual number of tuples is unknown until the end of the for loop.
I would rather avoid multiprocessing/multithreading to speed this up.
I can put up some timings if necessary, but I don't think it is needed.
The solution I have right now is O(n(n-1)/2) in time and O(n) in space, I think, so any improvement would be welcome.
You can organize your 2D data/tuples using a Quadtree.
Quadtrees are the two-dimensional analog of octrees and are most often used to partition a two-dimensional space by recursively subdividing it into four quadrants or regions.
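If you would rather not implement a quadtree yourself, a k-d tree gives the same kind of spatial pruning. Below is a rough sketch (using scipy.spatial.cKDTree rather than a hand-rolled quadtree); query_pairs returns every pair of indices at distance <= r, and the loop keeps the first member of each close pair:
import numpy as np
from scipy.spatial import cKDTree

data = [(1,1),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21),(1,2),(2,1)]
points = np.array(data)
tree = cKDTree(points)
keep = np.ones(len(points), dtype=bool)
for i, j in sorted(tree.query_pairs(r=5)):  # all pairs with distance <= 5, i < j
    if keep[i] and keep[j]:
        keep[j] = False                     # drop the later member of each close pair
filtered = [tuple(map(int, p)) for p in points[keep]]
print(filtered)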
You can use NumPy. Try this:
import time
import numpy as np

data = [(1,1),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21),(1,2),(2,1)]
start_time = time.time()
# transform to a numpy array
a = np.array(data)
subs = a[:,None] - a
# calculate the Euclidean distance between all elements
dist = np.sqrt(np.einsum('ijk,ijk->ij', subs, subs))
# flag elements that lie within distance 5 of an earlier element
# (k=1 keeps only the upper triangle, so self-distances of 0 are ignored)
too_close = np.any(np.triu(dist < 5, k=1), axis=0)
# keep only the elements that are not too close to an earlier one
a = a[~too_close]
print("--- %s seconds ---" % (time.time() - start_time))  # got --- 0.00020575523376464844 seconds ---
Compared to your solution:
from scipy.spatial.distance import euclidean
start_time = time.time()
new_data = []
while len(data) > 0:
    check = data.pop()
    flag = True
    for i in data:
        if euclidean(check, i) < 5:
            flag = False
            break
        else:
            pass
    if flag == True:
        new_data.append(check)
    else:
        flag = True
print("--- %s seconds ---" % (time.time() - start_time))  # got --- 0.001013040542602539 seconds ---
Within a loop that collects some samples, I need to obtain some statistics about their sorted indices every now and then, for which argsort returns exactly what I need. However, each iteration adds only a single sample, and it is a huge waste of resources to keep passing the whole samples array to the argsort function, especially since that array is very large. Isn't there an efficient incremental technique equivalent to argsort?
I believe an efficient incremental argsort function can be implemented by maintaining an ordered list of samples, which can be searched for the proper insertion indices once a new sample arrives. Such indices can then be utilized both to maintain the order of the samples list and to generate the incremental argsort-like desired output.
So far, I have utilized the searchsorted2d function by @Divakar, with slight modifications to obtain the insertion indices, and built a routine that can get the desired output if it is called after each sample insertion (b = 1).
Yet this is inefficient, and I would like to call the routine after every batch of k samples (e.g. b = 10). In the case of bulk insertions, searchsorted2d seems to return incorrect indices, and that is where I stopped!
import time
import numpy as np
# By Divakar
# See https://stackoverflow.com/a/40588862
def searchsorted2d(a, b):
    m, n = a.shape
    max_num = np.maximum(a.max() - a.min(), b.max() - b.min()) + 1
    r = max_num * np.arange(m)[:,np.newaxis]
    p = np.searchsorted((a + r).ravel(), (b + r).ravel()).reshape(b.shape)
    return p #- n * (np.arange(m)[:,np.newaxis])
# The following works with batch size b = 1,
# but that is not efficient ...
# Can we make it work for any b > 0 value?
class incremental(object):
    def __init__(self, shape):
        # Express each row offset
        self.ranks_offset = np.tile(np.arange(shape[1]).reshape(1, -1),
                                    (shape[0], 1))
        # Storage for sorted samples
        self.a_sorted = np.empty((shape[0], 0))
        # Storage for sort indices
        self.a_ranks = np.empty((shape[0], 0), int)
    def argsort(self, a):
        if self.a_sorted.shape[1] == 0: # Use np.argsort for initialization
            self.a_ranks = a.argsort(axis=1)
            self.a_sorted = np.take_along_axis(a, self.a_ranks, 1)
        else: # In later iterations,
            # searchsorted the input increment
            indices = searchsorted2d(self.a_sorted, a)
            # insert the stack pos to track the sorting indices
            self.a_ranks = np.insert(self.a_ranks, indices.ravel(),
                                     self.ranks_offset.ravel() +
                                     self.a_ranks.shape[1]).reshape((n, -1))
            # insert the increments to maintain a sorted input array
            self.a_sorted = np.insert(self.a_sorted, indices.ravel(),
                                      a.ravel()).reshape((n, -1))
        return self.a_ranks
M = 1000 # number of iterations
n = 16 # vector size
b = 10 # vectors batch size
# Storage for samples
samples = np.zeros((n, M)) * np.nan
# The proposed approach
inc = incremental((n, b))
c = 0 # iterations counter
tick = time.time()
while c < M:
    if c % b == 0: # Perform batch computations
        #sample_ranks = samples[:,:c].argsort(axis=1)
        sample_ranks = inc.argsort(samples[:,max(0,c-b):c]) # Incremental argsort
        ######################################################
        # Utilize sample_ranks in some magic statistics here #
        ######################################################
    samples[:,c] = np.random.rand(n) # collect a sample
    c += 1 # increment the counter
tock = time.time()
last = ((c-1) // b) * b
sample_ranks_GT = samples[:,:last].argsort(axis=1) # Ground truth
print('Compatibility: {0:.1f}%'.format(
100 * np.count_nonzero(sample_ranks == sample_ranks_GT) / sample_ranks.size))
print('Elapsed time: {0:.1f}ms'.format(
(tock - tick) * 1000))
I would expect 100% compatibility with the argsort function, yet it needs to be more efficient than calling argsort directly. As for execution time with an incremental approach, 15 ms or so should be more than enough for the given example.
So far, only one condition of these two can be met with any of the explored techniques.
To make a long story short, the algorithm shown above seems to be a variant of an order-statistic tree used to estimate the data ranks, but it fails when samples are added in bulk (b > 1). So far, it only works when inserting samples one by one (b = 1). However, the arrays are copied every time insert is called, which causes a huge overhead and forms a bottleneck, so samples should be added in bulk rather than individually.
Can you suggest a more efficient incremental argsort algorithm, or at least figure out how to support bulk insertion (b > 1) in the one above?
If you choose to start from where I stopped, then the problem can be reduced to fixing the bug in the following snippet:
import numpy as np
# By Divakar
# See https://stackoverflow.com/a/40588862
def searchsorted2d(a, b):
    m, n = a.shape
    max_num = np.maximum(a.max() - a.min(), b.max() - b.min()) + 1
    r = max_num * np.arange(m)[:,np.newaxis]
    p = np.searchsorted((a + r).ravel(), (b + r).ravel()).reshape(b.shape)
    # It seems the bug is around here...
    #return p - b.shape[0] * np.arange(b.shape[1])[np.newaxis]
    #return p - b.shape[1] * np.arange(b.shape[0])[:,np.newaxis]
    return p
n = 16 # vector size
b = 2 # vectors batch size
a = np.random.rand(n, 1) # Samples array
a_ranks = a.argsort(axis=1) # Initial ranks
a_sorted = np.take_along_axis(a, a_ranks, 1) # Initial sorted array
new_data = np.random.rand(n, b) # New block to append into the samples array
a = np.hstack((a, new_data)) #Append new block
indices = searchsorted2d(a_sorted, new_data) # Compute insertion indices
ranks_offset = np.tile(np.arange(b).reshape(1, -1), (a_ranks.shape[0], 1)) + a_ranks.shape[1] # Ranks to insert
a_ranks = np.insert(a_ranks, indices.ravel(), ranks_offset.ravel()).reshape((n, -1)) # Insert ranks according to their indices
a_ranks_GT = a.argsort(axis=1) # Ranks ground truth
mask = (a_ranks == a_ranks_GT)
print(mask) # Why are they not all True?
assert(np.all(mask)), 'Oops!' # This should not fail, but it does :(
It seems the bulk insertion is more involved than what I initially thought, and that searchsorted2d is not to blame. Take the case of a sorted array a = [1, 2, 5] and a block of two new elements block = [3, 4] to be inserted. If we iterate and insert, then np.searchsorted(a, block[i]) returns [2] and then [3], which is just fine. However, if we call np.searchsorted(a, block) (the desired behavior, equivalent to iterating without inserting), we get [2, 2]. This is problematic for implementing an incremental argsort, since even np.searchsorted(a, block[::-1]) results in the same thing. Any idea?
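A minimal snippet reproducing that behaviour:
import numpy as np
a = np.array([1, 2, 5])
block = np.array([3, 4])
print(np.searchsorted(a, block))        # [2 2]
print(np.searchsorted(a, block[::-1]))  # [2 2] as well -- the order within the block is lost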
It turned out that the indices returned by searchsorted are not enough to ensure a sorted array when dealing with batch inputs. If the block being inserted contains two entries that are out of order yet end up being placed adjacently in the target array, they receive the exact same insertion index and thus get inserted in their current order, causing the glitch. Accordingly, the input block itself needs to be sorted before insertion. See the last paragraph of the question for a numerical example.
By sorting the input block and adapting the remaining parts, a solution 100% compatible with argsort is obtained, and it is very efficient (elapsed time is 15.6 ms for inserting 1000 entries in blocks of ten, b = 10). This can be reproduced by replacing the buggy incremental class found in the question with the following one:
# by Hamdi Sahloul
class incremental(object):
    def __init__(self, shape):
        # Storage for sorted samples
        self.a_sorted = np.empty((shape[0], 0))
        # Storage for sort indices
        self.a_ranks = np.empty((shape[0], 0), int)
    def argsort(self, block):
        # Compute the ranks of the input block
        block_ranks = block.argsort(axis=1)
        # Sort the block accordingly
        block_sorted = np.take_along_axis(block, block_ranks, 1)
        if self.a_sorted.shape[1] == 0: # Initialize using the block data
            self.a_ranks = block_ranks
            self.a_sorted = block_sorted
        else: # In later iterations,
            # searchsorted the input block
            indices = searchsorted2d(self.a_sorted, block_sorted)
            # update the global ranks
            self.a_ranks = np.insert(self.a_ranks, indices.ravel(),
                                     block_ranks.ravel() +
                                     self.a_ranks.shape[1]).reshape((block.shape[0], -1))
            # update the overall sorted array
            self.a_sorted = np.insert(self.a_sorted, indices.ravel(),
                                      block_sorted.ravel()).reshape((block.shape[0], -1))
        return self.a_ranks
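A quick usage sketch (assuming numpy is imported and the searchsorted2d helper defined earlier is in scope, with the same n and b as in the question):
n, b = 16, 10
inc = incremental((n, b))
block1, block2 = np.random.rand(n, b), np.random.rand(n, b)
inc.argsort(block1)                      # first call initializes the internal state
ranks = inc.argsort(block2)              # second call updates it incrementally
reference = np.hstack((block1, block2)).argsort(axis=1)
print(np.array_equal(ranks, reference))  # expected: True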
I want to make changes to the coefficients in an existing model. Currently (with the Python API) I'm looping through the constraints and calling model.chgCoeff but it's quite slow. Is there a faster way, perhaps accessing the constraint matrix directly, in the Python and/or C API?
Example code below. The reason it's slow seems to be mostly because of the loop itself; replacing chgCoeff with any other operation is still slow. Normally I would get around this by using vector operations rather than for loops, but without access to the constraint matrix I don't think I can do that.
from __future__ import division
import gurobipy as gp
import numpy as np
import time
N = 300
M = 2000
m = gp.Model()
m.setParam('OutputFlag', False)
masks = [np.random.rand(N) for i in range(M)]
p = 1/np.random.rand(N)
rets = [p * masks[i] - 1 for i in range(M)]
v = np.random.rand(N)*10000 * np.round(np.random.rand(N))
t = m.addVar()
x = [m.addVar(vtype=gp.GRB.SEMICONT, lb=1000, ub=v[i]) for i in range(N)]
m.update()
cons = [m.addConstr(t <= gp.LinExpr(ret, x)) for ret in rets]
m.setObjective(t, gp.GRB.MAXIMIZE)
m.update()
start_time = time.time()
m.optimize()
solve_ms = int(((time.time() - start_time)*1000))
print('First solve took %s ms' % solve_ms)
p = 1/np.random.rand(N)
rets = [p * masks[i] - 1 for i in range(M)]
start_time = time.time()
for i in range(M):
    for j in range(N):
        if rets[i][j] != -1:
            m.chgCoeff(cons[i], x[j], -rets[i][j])
m.update()
update_ms = int(((time.time() - start_time)*1000))
print('Model update took %s ms' % update_ms)
start_time = time.time()
m.optimize()
solve_ms = int(((time.time() - start_time)*1000))
print('Second solve took %s ms' % solve_ms)
k = 2
start_time = time.time()
for i in range(M):
    for j in range(N):
        if rets[i][j] != -1:
            k *= rets[i][j]
solve_ms = int(((time.time() - start_time)*1000))
print('Plain loop took %s ms' % solve_ms)
R = np.array(rets)
start_time = time.time()
S = np.copy(R)
copy_ms = int(((time.time() - start_time)*1000))
print('np.copy() took %s ms' % copy_ms)
Output:
First solve took 1767 ms
Model update took 2051 ms
Second solve took 1872 ms
Plain loop took 1103 ms
np.copy() took 3 ms
A call to np.copy on the size (2000, 300) constraint matrix takes 3ms. Is there a fundamental reason I'm missing that the whole model update can't be that fast?
You can't access the constraint matrix directly in Gurobi using the Python interface. Even if you could, you couldn't do an np.copy operation, because the matrix is stored in CSR format, not a dense format. To make this sort of wholesale change to constraints, it is better to remove them and add new ones. In your case, the changes to each constraint are so significant that you aren't going to get much benefit from a warm start, so you won't lose anything by not keeping the same constraint objects.
Assuming you adjust the rets array in the code above for the -1 special case, the following code will do what you want and be much faster.
for con in cons:
    m.remove(con)
new_cons = [m.addConstr(t <= gp.LinExpr(ret, x)) for ret in rets]
I'm trying to sort two large four dimensional arrays in numpy.
I want to sort based on the values along axis 2 of the first array, and sort the second array by the same indices. All other axes should remain in the same order for both arrays.
The following code does what I want, but relies on looping in python, so it's slow. The arrays are quite large, so I'd really like to get this working using compiled numpy operations for performance reasons. Or some other means of getting this block of code to be compiled (Cython?).
import numpy as np
data = np.random.rand(10,6,4,1)
data2 = np.random.rand(10,6,4,3)
print(data[0,0,:,:])
print(data2[0,0,:,:])
for n in range(data.shape[0]):
    for m in range(data.shape[1]):
        sort_ids = np.argsort(data[n,m,:,0])
        data[n,m,:,:] = data[n,m,sort_ids,:]
        data2[n,m,:,:] = data2[n,m,sort_ids,:]
print(data[0,0,:,:])
print(data2[0,0,:,:])
Maybe there is a better solution but this should work:
sort_ids = np.argsort(data,axis=2)
s1 = data.shape
s2 = data2.shape
d1 = data[np.arange(s1[0])[:,None,None,None],np.arange(s1[1])[None,:,None,None],sort_ids,np.arange(s1[3])[None,None,None,:]]
d2 = data2[np.arange(s2[0])[:,None,None,None],np.arange(s2[1])[None,:,None,None],sort_ids,np.arange(s2[3])[None,None,None,:]]
At least the output is identical to your code.
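On NumPy 1.15 or newer, np.take_along_axis expresses the same gather more directly (a sketch; the index array broadcasts across the trailing axis of data2):
sort_ids = np.argsort(data, axis=2)               # shape (10, 6, 4, 1)
d1 = np.take_along_axis(data,  sort_ids, axis=2)
d2 = np.take_along_axis(data2, sort_ids, axis=2)  # indices broadcast over the last axis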
Found a way to make this work. It requires storing an index array, which may cause some memory issues for me, but it's way faster. Example code with timing comparison:
import numpy as np
import time
loops = 1000
data = np.random.rand(100,6,4,1)
data2 = np.random.rand(100,6,4,3)
start = time.time()
for n in range(loops):
    idxs = np.indices(data.shape)
    idxs2 = np.indices(data2.shape)
    sort_ids = np.argsort(data, 2)
    sorted_data = data[idxs[0], idxs[1], sort_ids, idxs[3]]
    sorted_data2 = data2[idxs2[0], idxs2[1], np.repeat(sort_ids, data2.shape[3], 3), idxs2[3]]
print('Time Elapsed: %5.2f seconds' % (time.time() - start))
start = time.time()
for n in range(loops):
    sorted_data = np.zeros(data.shape)
    sorted_data2 = np.zeros(data2.shape)
    for n in range(data.shape[0]):
        for m in range(data.shape[1]):
            sort_ids = np.argsort(data[n,m,:,0])
            data[n,m,:,:] = data[n,m,sort_ids,:]
            data2[n,m,:,:] = data2[n,m,sort_ids,:]
print('Time Elapsed: %5.2f seconds' % (time.time() - start))