I am trying to build a binomial lattice model in Python. There are multiple binomial lattices and, based on the values in one lattice, a series of operations is performed in the other lattices.
These operations are similar to an option pricing model (cf. Black-Scholes models) in that the calculations start at the last column of the lattice and proceed backwards to the previous column, one step at a time.
For example,
If I have a binomial lattice with n columns,
1. I calculate the values in nth column for a single or multiple lattices.
2. Based on these values, I update the values in (n-1)th column in same or other binomial lattices
3. This process continues until I reach the first column.
In short, I cannot process all columns of a lattice simultaneously, because the values in each column depend on the values in the next column, and so on.
From a coding perspective,
I have written a function that performs the calculations for one column of a lattice and returns the values that are used as input for the next column in the process.
def column_calc(StockPrices_col, ConvertProb_col, y_col, ContinuationValue_col, ConversionValue_col, coupon_dates_index, convert_dates_index,
                call_dates_index, put_dates_index, ConvertProb_col_new, ContinuationValue_col_new, y_col_new, tau, r, cs, dt, call_trigger,
                putPrice, callPrice):
    for k in range(1, n+1-tau):
        ConvertProb_col_new[n-k] = 0.5*(ConvertProb_col[n-1-k] + ConvertProb_col[n-k])
        y_col_new[n-k] = ConvertProb_col_new[n-k]*r + (1- ConvertProb_col_new[n-k]) *(r + cs)
        # Calculate the holding value
        ContinuationValue_col_new[n-k] = 0.5*(ContinuationValue_col[n-1-k]/(1+y_col[n-1-k]*dt) + ContinuationValue_col[n-k]/(1+y_col[n-k]*dt))
        # Coupon payment date
        if np.isin(n-1-tau, coupon_dates_index) == True:
            ContinuationValue_col_new[n-k] = ContinuationValue_col_new[n-k] + Principal*(1/2*c)
        # check put/call schedule
        callflag = (np.isin(n-1-tau, call_dates_index)) & (StockPrices_col[n-k] >= call_trigger)
        putflag = np.isin(n-1-tau, put_dates_index)
        convertflag = np.isin(n-1-tau, convert_dates_index)
        # if t is in call date
        if (np.isin(n-1-tau, call_dates_index) == True) & (StockPrices_col[n-k] >= call_trigger):
            node_val = max([putPrice * putflag, ConversionValue_col[n-k] * convertflag, min(callPrice, ContinuationValue_col_new[n-k])] )
        # if t is not call date
        else:
            node_val = max([putPrice * putflag, ConversionValue_col[n-k] * convertflag, ContinuationValue_col_new[n-k]] )
        # 1. if Conversion happens
        if node_val == ConversionValue_col[n-k]*convertflag:
            ContinuationValue_col_new[n-k] = node_val
            ConvertProb_col_new[n-k] = 1
        # 2. if put happens
        elif node_val == putPrice*putflag:
            ContinuationValue_col_new[n-k] = node_val
            ConvertProb_col_new[n-k] = 0
        # 3. if call happens
        elif node_val == callPrice*callflag:
            ContinuationValue_col_new[n-k] = node_val
            ConvertProb_col_new[n-k] = 0
        else:
            ContinuationValue_col_new[n-k] = node_val
    return ConvertProb_col_new, ContinuationValue_col_new, y_col_new
I call this function for every column in the lattice from a for loop,
so in effect all the calculations run in a nested for loop.
My issue is that this is very slow.
A single call to the function doesn't take much time, but the outer loop that calls it for every column is very time consuming (on average the function is called about 1000 to 1500 times). It takes almost 2.5 minutes to run the complete model, which is very slow by normal modelling standards.
As mentioned above, most of the time is spent in the nested for loop shown below:
temp_mat = np.full((n, 3), np.nan)
temp_mat[:,0] = ConvertProb[:, n-1]
temp_mat[:,1] = ContinuationValue[:, n-1]
temp_mat[:,2] = y[:, n-1]
ConvertProb_col_new = np.full((n, 1), np.nan)
ContinuationValue_col_new = np.full((n, 1), np.nan)
y_col_new = np.full((n, 1), np.nan)
for tau in range(1,n):
    ConvertProb_col = temp_mat[:,0]
    ContinuationValue_col = temp_mat[:,1]
    y_col = temp_mat[:,2]
    ConversionValue_col = ConversionValue[:, n-tau-1]
    StockPrices_col = StockPrices[:, n-tau-1]
    out = column_calc(StockPrices_col, ConvertProb_col, y_col, ContinuationValue_col, ConversionValue_col, coupon_dates_index, convert_dates_index ,call_dates_index, put_dates_index, ConvertProb_col_new, ContinuationValue_col_new, y_col_new, tau, r, cs, dt,call_trigger,putPrice,callPrice)
    temp_mat[:,0] = out[0].reshape(np.shape(out[0])[0],)
    temp_mat[:,1] = out[1].reshape(np.shape(out[1])[0],)
    temp_mat[:,2] = out[2].reshape(np.shape(out[2])[0],)
#Final value
print(temp_mat[-1][1])
Is there any way to reduce the time spent in the nested for loop, or an alternative I can use instead of it?
Please let me know. Thanks a lot!
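One cheap thing to try, sketched below under simplifying assumptions: in column_calc the schedule checks (np.isin(n-1-tau, ...)) depend only on the column, not on the node k, so they can be evaluated once per column instead of once per node. The function backward_induction and its arguments (terminal_values, coupon_steps, coupon, discount) are hypothetical names for a stripped-down lattice, not the convertible-bond model above.

```python
def backward_induction(terminal_values, coupon_steps, coupon, discount):
    """Roll the last column of a binomial lattice back to the root node.

    Hypothetical, simplified model: each node is the discounted average of
    its two successors, plus a coupon on scheduled steps.
    """
    coupon_steps = set(coupon_steps)      # set membership is O(1)
    column = list(terminal_values)        # values in the last column
    n_steps = len(column) - 1
    for step in range(n_steps - 1, -1, -1):
        # The schedule test depends only on the column, so do it once here
        # rather than once per node inside the inner loop.
        extra = coupon if step in coupon_steps else 0.0
        column = [0.5 * discount * (lower + upper) + extra
                  for lower, upper in zip(column[:-1], column[1:])]
    return column[0]


# Four terminal nodes, i.e. three time steps, with a coupon paid at step 0
print(backward_induction([100.0, 100.0, 100.0, 100.0], [0], 5.0, 1.0))
```

The columns still have to be processed one after another, but each column is handled in a single pass with no repeated schedule lookups; the same idea carries over to NumPy slices such as col[:-1] and col[1:].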
I am attempting to use a for loop to run a function f(x) over a range of values. I run into an issue when I want to recalculate my input for the next step based on the previous step. Below is some pseudocode that might explain my issue better, and below that is the actual code I am attempting to write. The goal is a loop that calculates the results for one step and then recalculates the input for the next step from those results, i.e.:
for i in range(10):
    H = i - 1
    result_up = f(H)
    H_delta = (nsolve(Eq(result_up,A(H1)),H1,1))
    i += H_delta
The idea is that the next iteration starts from the updated i (the previous i plus H_delta), and so on.
from sympy import *
import math
import numpy as np
USHead = np.linspace(1,3,25)
PHeight = 1.5
TwH1 = Symbol('TwH1')  # the unknown that nsolve solves for below
# Cd is assumed to be defined earlier in the script
for i in USHead:
    if i <= PHeight:
        Oh = i-1
        OD = ((Cd * 0.738 * math.sqrt(2 * 32.2 * (Oh)))* 2)
        i_delta = (nsolve(Eq(OD,(0.0612*(TwH1/1)+3.173)*(6-0.003)*(TwH1+.003)**(1.5)), TwH1,1))
        i += i_delta
My understanding of for loops is that you can recalculate i as the iteration proceeds, but I suspect the issue arises because I have defined my range as a list?
The step size of the list is .083, starting at 1 and ending at 3.
You can't use a for loop if you want to update the iteration variable yourself, because the loop overwrites it with the next value from the range on every iteration. Use a while loop instead.
i = 0
while i < 10:
    H = i - 1
    result_up = f(H)
    H_delta = (nsolve(Eq(result_up,A(H1)),H1,1))
    i += H_delta
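As a self-contained illustration of the same pattern, here is a minimal sketch in which the step size is recomputed from the current value on every pass. The function f is a hypothetical stand-in for the f(H) / nsolve calculation, and the bounds 1.0 and 3.0 simply mirror the range in the question.

```python
def f(h):
    # Hypothetical stand-in for the per-step calculation; always positive,
    # so the loop below is guaranteed to terminate.
    return 0.5 + 0.1 * h


def run(start=1.0, stop=3.0):
    i = start
    visited = []
    while i < stop:
        visited.append(i)
        step = f(i - 1)   # the step depends on the current value of i
        i += step         # the update survives, unlike in a for loop
    return visited


print(run())
```

Unlike the for loop, nothing resets i between passes, so the values visited are determined entirely by the computed steps.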
I have a dataset in which I want to create a new column based on the division of two other columns, using a for loop with if-conditions.
This is the dataset, with the empty 'solo_fare' column created beforehand.
The task is to loop through each row and divide 'Fare' by 'relatives' to get the per-passenger fare. However, there are certain if-conditions to follow (passengers in this category should see per-passenger prices of between 3 and 8)
The code I have tried here doesn't seem to fill in the 'solo_fare' rows at all. It returns an empty column (same as above df).
for i in range(0, len(fare_result)):
    p = fare_result.iloc[i]['Fare']/fare_result.iloc[i]['relatives']
    q = fare_result.iloc[i]['Fare']
    r = fare_result.iloc[i]['relatives']
    # if relatives == 0, return original Fare amount
    if (r == 0):
        fare_result.iloc[i]['solo_fare'] = q
    # if the divided fare is below 3 or more than 8, return original Fare amount again
    elif (p < 3) or (p > 8):
        fare_result.iloc[i]['solo_fare'] = q
    # else, return the divided fare to get solo_fare
    else:
        fare_result.iloc[i]['solo_fare'] = p
How can I get this to work?
You should probably not use a loop for this; use loc instead.
If you first create the 'solo_fare' column and give every row the default value from Fare, you can then overwrite the value for the rows that meet the conditions you set out:
fare_result['solo_fare'] = fare_result['Fare']
per_passenger = fare_result.Fare / fare_result.relatives
fare_result.loc[(per_passenger >= 3) & (per_passenger <= 8), 'solo_fare'] = per_passenger
Did you try to initialize the new column first?
By that I mean that the statement fare_result.iloc[i]['solo_fare'] = q
only assigns the value q to the field solo_fare of row i.
The issue is that, at that moment, row i does not have a solo_fare key. Hence, you are only filling the last value of your table here.
To solve this issue, try declaring the solo_fare column before the for loop, like:
fare_result['solo_fare'] = np.nan
One way to do this is to define a row-wise function and apply it to the dataframe:
# row-wise function (mockup)
def foo(fare, relative):
    # your logic here. Mine just serves as example
    if relative > 100:
        res = fare/relative
    elif (relative < 10):
        res = fare
    else:
        res = 10
    return res
Then apply it to the dataframe (row-wise):
fare_result['solo_fare'] = fare_result.apply(lambda row: foo(row['Fare'], row['relatives']) , axis=1)
I have two ordered lists of consecutive integers, m = 0, 1, ..., M and n = 0, 1, 2, ..., N. Each value of m has a probability pm, and each value of n has a probability pn. I am trying to find the ordered list of unique values r = m/n and their probabilities pr. I am aware that r is infinite if n = 0 and undefined if m = n = 0.
In practice, I would like to run this with M and N each of the order of 2E4, meaning up to 4E8 values of r, which would mean about 3 GB of floats (assuming 8 bytes per float).
For this calculation, I have written the python code below.
The idea is to iterate over m and n, and for each new m/n, insert it in the right place with its probability if it isn't there yet, otherwise add its probability to the existing number. My assumption is that it is easier to sort things on the way instead of waiting until the end.
The cases related to 0 are added at the end of the loop.
I am using the Fraction class since we are dealing with fractions.
The code also tracks the multiplicity of each unique value of m/n.
I have tested up to M=N=100, and things are quite slow. Are there better approaches to the question, or more efficient ways to tackle the code?
Timing:
M=N=30: 1 s
M=N=50: 6 s
M=N=80: 30 s
M=N=100: 82 s
import numpy as np
from fractions import Fraction
import time # For timing
start_time = time.time() # Timing
M, N = 6, 4
mList, nList = np.arange(1, M+1), np.arange(1, N+1) # From 1 to M inclusive, deal with 0 later
mProbList, nProbList = [1/(M+1)]*(M), [1/(N+1)]*(N) # Probabilities, here assumed equal (not general case)
# Deal with mn=0 later
pmZero, pnZero = 1/(M+1), 1/(N+1) # P(m=0) and P(n=0)
pNaN = pmZero * pnZero # P(0/0) = P(m=0)P(n=0)
pZero = pmZero * (1 - pnZero) # P(0) = P(m=0)P(n!=0)
pInf = pnZero * (1 - pmZero) # P(inf) = P(m!=0)P(n=0)
# Main list of r=m/n, P(r) and mult(r)
# Start with first line, m=1
rList = [Fraction(mList[0], n) for n in nList[::-1]] # Smallest first
rProbList = [mProbList[0] * nP for nP in nProbList[::-1]] # Start with first line
rMultList = [1] * len(rList) # Multiplicity of each element
# Main loop
for m, mP in zip(mList[1:], mProbList[1:]):
    for n, nP in zip(nList[::-1], nProbList[::-1]): # Pick an n value
        r, rP, rMult = Fraction(m, n), mP*nP, 1
        for i in range(len(rList)-1): # See where it fits in existing list
            if r < rList[i]:
                rList.insert(i, r)
                rProbList.insert(i, rP)
                rMultList.insert(i, 1)
                break
            elif r == rList[i]:
                rProbList[i] += rP
                rMultList[i] += 1
                break
            elif r < rList[i+1]:
                rList.insert(i+1, r)
                rProbList.insert(i+1, rP)
                rMultList.insert(i+1, 1)
                break
            elif r == rList[i+1]:
                rProbList[i+1] += rP
                rMultList[i+1] += 1
                break
            if r > rList[-1]:
                rList.append(r)
                rProbList.append(rP)
                rMultList.append(1)
                break
# Deal with 0
rList.insert(0, Fraction(0, 1))
rProbList.insert(0, pZero)
rMultList.insert(0, N)
# Deal with infty
rList.append(np.inf)
rProbList.append(pInf)
rMultList.append(M)
# Deal with undefined case
rList.append(np.nan)
rProbList.append(pNaN)
rMultList.append(1)
print(".... done in %s seconds." % round(time.time() - start_time, 2))
print("************** Final list\nr", 'Prob', 'Mult')
for r, rP, rM in zip(rList, rProbList, rMultList): print(r, rP, rM)
print("************** Checks")
print("mList", mList, 'nList', nList)
print("Sum of proba = ", np.sum(rProbList))
print("Sum of multi = ", np.sum(rMultList), "\t(M+1)*(N+1) = ", (M+1)*(N+1))
Based on the suggestion of @Prune, and on this thread about merging lists of tuples, I have modified the code as below. It is a lot easier to read, and runs about an order of magnitude faster for N=M=80 (I have omitted dealing with 0; it would be done the same way as in the original post). I assume there may be ways to tweak the merge and the conversion back to lists further.
# Do calculations
data = [(Fraction(m, n), mProb(m) * nProb(n)) for n in range(1, N+1) for m in range(1, M+1)]
data.sort()
# Merge duplicates using a dictionary
d = {}
for r, p in data:
    if not (r in d): d[r] = [0, 0]
    d[r][0] += p
    d[r][1] += 1
# Convert back to lists
rList, rProbList, rMultList = [], [], []
for k in d:
    rList.append(k)
    rProbList.append(d[k][0])
    rMultList.append(d[k][1])
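One possible tweak, offered as a sketch rather than a benchmarked improvement: accumulate straight into dictionaries keyed by the Fraction and sort only the unique keys afterwards, so duplicates are never sorted. The function name ratio_distribution and the dict-based inputs (m_probs, n_probs, mapping each non-zero m or n to its probability) are assumptions made for this example; the zero cases are left out, as above.

```python
from collections import defaultdict
from fractions import Fraction


def ratio_distribution(m_probs, n_probs):
    """Return sorted unique r = m/n with their probabilities and multiplicities.

    m_probs and n_probs map each non-zero integer to its probability.
    """
    prob = defaultdict(float)
    mult = defaultdict(int)
    for m, pm in m_probs.items():
        for n, pn in n_probs.items():
            r = Fraction(m, n)      # equal ratios hash to the same key
            prob[r] += pm * pn
            mult[r] += 1
    r_list = sorted(prob)           # sort the unique values only
    return r_list, [prob[r] for r in r_list], [mult[r] for r in r_list]


r_list, r_prob, r_mult = ratio_distribution({1: 0.5, 2: 0.5}, {1: 0.5, 2: 0.5})
for r, p, k in zip(r_list, r_prob, r_mult):
    print(r, p, k)
```

Memory is still proportional to the number of unique ratios, so this does not by itself make M = N = 2E4 cheap.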
I expect that "things are quite slow" because you've chosen a known inefficient sort. A single list insertion is O(K) (later list elements have to be shifted over, and storage is reallocated on a regular basis). Thus a full-list insertion sort is O(K^2). In your notation, that is O((M*N)^2).
If you want reasonable performance, research and use the best-known methods. The most straightforward way to do this is to build your non-exception results with a simple list comprehension, and use the built-in sort on that list. Then simply append your n=0 cases, and you're done in O(K log K) time.
In the expression below, I've assumed functions for the m and n probabilities.
This is a notational convenience; you know how to compute them directly, and can substitute those expressions if you wish.
# Put the ratio first in each tuple so that sort() orders by r
data = [ (Fraction(m, n), mProb(m) * nProb(n))
         for n in range(1, N+1)
         for m in range(0, M+1) ]
data.sort()
data.extend([ ])  # generate your "zero" cases here
I have started learning reinforcement learning, using the book by Sutton. I was trying to understand the nonstationary environment, which the book describes as:
suppose the bandit task were nonstationary, that is, that the true
values of the actions changed over time. In this case exploration is
needed even in the deterministic case to make sure one of the
nongreedy actions has not changed to become better than the greedy one.
This tells me that the true expected reward for a given action changes with time. But does this mean at every time step? I understand how we track the rewards in such a case, i.e. by weighting recent rewards more heavily than older ones at every time step. However, does this also mean, or indicate, that the target or true values change at every time step? I am trying to simulate the 10-armed bandit problem from the figure given below, which compares Upper-Confidence-Bound action selection and epsilon-greedy methods, using the sample-average method for estimating action values in a stationary environment.
If I have to simulate the same in a nonstationary environment, how can I do that? Below is my code:
import numpy as np
from tqdm import trange

class NArmedBandit:
    #10-armed bandit testbed with sample averages
    def __init__(self,k=10,step_size = 0.1,eps = 0,UCB_c = None, sample_avg_flag = False,
                 init_estimates = 0.0,mu = 0, std_dev = 1):
        self.k = k
        self.step_size = step_size
        self.eps = eps
        self.init_estimates = init_estimates
        self.mu = mu
        self.std_dev = std_dev
        self.actions = np.zeros(k)
        self.true_reward = 0.0
        self.UCB_c = UCB_c
        self.sample_avg_flag = sample_avg_flag
        self.re_init()

    def re_init(self):
        #true values of rewards for each action
        self.actions = np.random.normal(self.mu,self.std_dev,self.k)
        # estimation for each action
        self.Q_t = np.zeros(self.k) + self.init_estimates
        # num of chosen times for each action
        self.N_t = np.zeros(self.k)
        #best action chosen
        self.optim_action = np.argmax(self.actions)
        self.time_step = 0

    def act(self):
        val = np.random.rand()
        if val < self.eps:
            action = np.random.choice(np.arange(self.k))
            #print('action 1:',action)
        elif self.UCB_c is not None:
            #1e-5 is added so as to avoid division by zero
            ucb_estimates = self.Q_t + self.UCB_c * np.sqrt(np.log(self.time_step + 1) / (self.N_t + 1e-5))
            A_t = np.max(ucb_estimates)
            action = np.random.choice(np.where(ucb_estimates == A_t)[0])
        else:
            A_t = np.max(self.Q_t)
            action = np.random.choice(np.where(self.Q_t == A_t)[0])
            #print('action 2:',action)
        return action

    def step(self,action):
        # generating the reward under N(real reward, 1)
        reward = np.random.randn() + self.actions[action]
        self.time_step += 1
        self.N_t[action] += 1
        # estimation with sample averages
        if self.sample_avg_flag == True:
            self.Q_t[action] += (reward - self.Q_t[action]) / self.N_t[action]
        else:
            # non-stationary with constant step size
            self.Q_t[action] += self.step_size * (reward - self.Q_t[action])
        return reward

    def play(self,tasks,num_time_steps):
        rewards = np.zeros((tasks, num_time_steps))
        optim_action_counts = np.zeros(rewards.shape)
        for task in trange(tasks):
            self.re_init()
            for t in range(num_time_steps):
                action = self.act()
                reward = self.step(action)
                rewards[task, t] = reward
                if action == self.optim_action:
                    optim_action_counts[task, t] = 1
        avg_optim_action_counts = optim_action_counts.mean(axis=0)
        avg_rewards = rewards.mean(axis=0)
        return avg_optim_action_counts, avg_rewards
Should I change the actions array (which holds the assumed true values) defined in re_init() by calling re_init() after every time step in play(), which amounts to changing the true expected reward of every action at each time step? I have already incorporated the code for estimating the rewards in a nonstationary environment in the act() and step() functions, which use a constant step size alpha = 0.1. The only thing I don't know is how to set up or simulate the nonstationary environment here, and whether I have understood it correctly.
You understand nonstationarity correctly. As you say, it means "that the true values of the actions changed over time."
But how do they change?
Actually, that is not clearly defined. Your re_init approach is correct from my perspective. What you need to decide is when they change. But one thing is clear: if you change the rewards at every time step, there is nothing to be learned, since you are changing all the to-be-learned rewards at every step.
I can offer two solutions that satisfy the nonstationary definition.
1. Call re_init with a small probability eps every 100 or 1000 steps.
2. Start with the initial values and add small random +/- values to them. Your rewards will then drift in a positive or negative direction.
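The second suggestion can be sketched as a random walk on the true action values. The class below is a hypothetical, standard-library-only illustration, not a drop-in change to NArmedBandit; the names DriftingBandit, q_true and walk_std, and the default drift size of 0.01, are assumptions chosen for the example. The drift is applied after every pull, but because each increment is small the values stay learnable, unlike a full re-draw at every step.

```python
import random


class DriftingBandit:
    """k-armed bandit whose true action values take a small random walk."""

    def __init__(self, k=10, walk_std=0.01, seed=None):
        self.rng = random.Random(seed)
        # Initial true values, drawn from N(0, 1)
        self.q_true = [self.rng.gauss(0.0, 1.0) for _ in range(k)]
        self.walk_std = walk_std    # assumed size of the per-step drift

    def optimal_action(self):
        # Must be recomputed whenever needed, because the best arm can change
        return max(range(len(self.q_true)), key=self.q_true.__getitem__)

    def pull(self, action):
        # Reward is drawn around the current true value of the chosen arm
        reward = self.rng.gauss(self.q_true[action], 1.0)
        # Every true value then drifts by a small zero-mean increment
        self.q_true = [q + self.rng.gauss(0.0, self.walk_std)
                       for q in self.q_true]
        return reward


bandit = DriftingBandit(k=10, walk_std=0.01, seed=0)
rewards = [bandit.pull(bandit.optimal_action()) for _ in range(5)]
print(len(rewards), bandit.optimal_action())
```

With a drifting environment the optimal action has to be looked up at each step when counting optimal choices, rather than being cached once as in re_init().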
Within a loop that collects samples, I need to obtain some statistics about their sorted indices every now and then, for which argsort returns exactly what I need. However, each iteration adds only a single sample, and it is a huge waste of resources to keep passing the whole samples array to argsort, especially since the samples array is very large. Is there an efficient incremental technique equivalent to argsort?
I believe an efficient incremental argsort function can be implemented by maintaining an ordered list of samples, which can be searched for the proper insertion indices once a new sample arrives. Such indices can then be used both to maintain the order of the samples list and to generate the desired argsort-like output incrementally.
So far, I have used the searchsorted2d function by @Divakar, with slight modifications to obtain the insertion indices, and built a routine that produces the desired output if it is called after each sample insertion (b = 1).
Yet this is inefficient, and I would like to call the routine only after every k samples have been collected (e.g. b = 10). In the case of bulk insertions, searchsorted2d seems to return incorrect indices, and that is where I stopped!
import time
import numpy as np
# By Divakar
# See https://stackoverflow.com/a/40588862
def searchsorted2d(a, b):
    m, n = a.shape
    max_num = np.maximum(a.max() - a.min(), b.max() - b.min()) + 1
    r = max_num * np.arange(m)[:,np.newaxis]
    p = np.searchsorted((a + r).ravel(), (b + r).ravel()).reshape(b.shape)
    return p #- n * (np.arange(m)[:,np.newaxis])
# The following works with batch size b = 1,
# but that is not efficient ...
# Can we make it work for any b > 0 value?
class incremental(object):
    def __init__(self, shape):
        # Express each row offset
        self.ranks_offset = np.tile(np.arange(shape[1]).reshape(1, -1),
                                    (shape[0], 1))
        # Storage for sorted samples
        self.a_sorted = np.empty((shape[0], 0))
        # Storage for sort indices
        self.a_ranks = np.empty((shape[0], 0), int)
    def argsort(self, a):
        if self.a_sorted.shape[1] == 0: # Use np.argsort for initialization
            self.a_ranks = a.argsort(axis=1)
            self.a_sorted = np.take_along_axis(a, self.a_ranks, 1)
        else: # In later iterations,
            # searchsorted the input increment
            indices = searchsorted2d(self.a_sorted, a)
            # insert the stack pos to track the sorting indices
            self.a_ranks = np.insert(self.a_ranks, indices.ravel(),
                                     self.ranks_offset.ravel() +
                                     self.a_ranks.shape[1]).reshape((n, -1))
            # insert the increments to maintain a sorted input array
            self.a_sorted = np.insert(self.a_sorted, indices.ravel(),
                                      a.ravel()).reshape((n, -1))
        return self.a_ranks
M = 1000 # number of iterations
n = 16 # vector size
b = 10 # vectors batch size
# Storage for samples
samples = np.zeros((n, M)) * np.nan
# The proposed approach
inc = incremental((n, b))
c = 0 # iterations counter
tick = time.time()
while c < M:
    if c % b == 0: # Perform batch computations
        #sample_ranks = samples[:,:c].argsort(axis=1)
        sample_ranks = inc.argsort(samples[:,max(0,c-b):c]) # Incremental argsort
        ######################################################
        # Utilize sample_ranks in some magic statistics here #
        ######################################################
    samples[:,c] = np.random.rand(n) # collect a sample
    c += 1 # increment the counter
tock = time.time()
last = ((c-1) // b) * b
sample_ranks_GT = samples[:,:last].argsort(axis=1) # Ground truth
print('Compatibility: {0:.1f}%'.format(
    100 * np.count_nonzero(sample_ranks == sample_ranks_GT) / sample_ranks.size))
print('Elapsed time: {0:.1f}ms'.format(
    (tock - tick) * 1000))
I would expect 100% compatibility with the argsort function, yet it needs to be more efficient than calling argsort. As for execution time, it seems that 15 ms or so should be more than enough for the given example with an incremental approach.
So far, only one of these two conditions can be met with any of the explored techniques.
To make a long story short, the algorithm shown above seems to be a variant of an order-statistic tree for estimating the data ranks, but it fails when samples are added in bulk (b > 1). So far, it only works when samples are inserted one by one (b = 1). However, the arrays are copied every time insert is called, which causes a huge overhead and forms a bottleneck, so samples should be added in bulk rather than individually.
Can you suggest a more efficient incremental argsort algorithm, or at least figure out how to support bulk insertion (b > 1) in the one above?
If you choose to start from where I stopped, then the problem can be reduced to fixing the bug in the following snippet:
import numpy as np
# By Divakar
# See https://stackoverflow.com/a/40588862
def searchsorted2d(a, b):
    m, n = a.shape
    max_num = np.maximum(a.max() - a.min(), b.max() - b.min()) + 1
    r = max_num * np.arange(m)[:,np.newaxis]
    p = np.searchsorted((a + r).ravel(), (b + r).ravel()).reshape(b.shape)
    # It seems the bug is around here...
    #return p - b.shape[0] * np.arange(b.shape[1])[np.newaxis]
    #return p - b.shape[1] * np.arange(b.shape[0])[:,np.newaxis]
    return p
n = 16 # vector size
b = 2 # vectors batch size
a = np.random.rand(n, 1) # Samples array
a_ranks = a.argsort(axis=1) # Initial ranks
a_sorted = np.take_along_axis(a, a_ranks, 1) # Initial sorted array
new_data = np.random.rand(n, b) # New block to append into the samples array
a = np.hstack((a, new_data)) #Append new block
indices = searchsorted2d(a_sorted, new_data) # Compute insertion indices
ranks_offset = np.tile(np.arange(b).reshape(1, -1), (a_ranks.shape[0], 1)) + a_ranks.shape[1] # Ranks to insert
a_ranks = np.insert(a_ranks, indices.ravel(), ranks_offset.ravel()).reshape((n, -1)) # Insert ranks according to their indices
a_ranks_GT = a.argsort(axis=1) # Ranks ground truth
mask = (a_ranks == a_ranks_GT)
print(mask) #Why they are not all True?
assert(np.all(mask)), 'Oops!' #This should not fail, but it does :(
It seems that bulk insertion is more involved than I initially thought, and that searchsorted2d is not to blame. Take the case of a sorted array a = [ 1, 2, 5 ] and a block of two new elements, block = [3, 4], to be inserted. If we iterate and insert, then np.searchsorted(a, block[i]) returns [2] and [3], which is fine. However, if we call np.searchsorted(a, block) (the desired behavior, equivalent to iterating without inserting), we get [2, 2]. This is problematic for implementing an incremental argsort, since even np.searchsorted(a, block[::-1]) gives the same result. Any idea?
It turned out that the indices returned by searchsorted are not enough to ensure a sorted array when dealing with batch inputs. If the block being inserted contains two entries that are out of order but end up adjacent in the target array, they receive exactly the same insertion index and are therefore inserted in their current order, causing the glitch. Accordingly, the input block itself needs to be sorted before insertion. See the last paragraph of the question for a numerical example.
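The effect can be reproduced in plain Python with the standard bisect module, which is used here as a stand-in for np.searchsorted. The helper insert_block is a hypothetical name for this illustration; it computes every insertion index against the original array and places entries that share an index in the order given, which is the behaviour described above.

```python
from bisect import bisect_left


def insert_block(sorted_vals, block, presort):
    """Insert block into sorted_vals using indices computed up front."""
    if presort:
        block = sorted(block)
    # All indices refer to positions in the original, un-grown array
    idx = [bisect_left(sorted_vals, x) for x in block]
    out = []
    for pos in range(len(sorted_vals) + 1):
        # Entries sharing an insertion index keep their order within block
        out.extend(x for x, i in zip(block, idx) if i == pos)
        if pos < len(sorted_vals):
            out.append(sorted_vals[pos])
    return out


print(insert_block([1, 2, 5], [4, 3], presort=False))  # [1, 2, 4, 3, 5]
print(insert_block([1, 2, 5], [4, 3], presort=True))   # [1, 2, 3, 4, 5]
```

Both 4 and 3 get insertion index 2, so only the pre-sorted block yields a sorted result.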
By sorting the input block and adapting the remaining parts, a solution that is 100.0% compatible with argsort is obtained, and it is very efficient (elapsed time is 15.6 ms for inserting 1000 entries in blocks of ten, b = 10). This can be reproduced by replacing the buggy incremental class in the question with the following one:
# by Hamdi Sahloul
class incremental(object):
    def __init__(self, shape):
        # Storage for sorted samples
        self.a_sorted = np.empty((shape[0], 0))
        # Storage for sort indices
        self.a_ranks = np.empty((shape[0], 0), int)
    def argsort(self, block):
        # Compute the ranks of the input block
        block_ranks = block.argsort(axis=1)
        # Sort the block accordingly
        block_sorted = np.take_along_axis(block, block_ranks, 1)
        if self.a_sorted.shape[1] == 0: # Initialize using the block data
            self.a_ranks = block_ranks
            self.a_sorted = block_sorted
        else: # In later iterations,
            # searchsorted the input block
            indices = searchsorted2d(self.a_sorted, block_sorted)
            # update the global ranks
            self.a_ranks = np.insert(self.a_ranks, indices.ravel(),
                                     block_ranks.ravel() +
                                     self.a_ranks.shape[1]).reshape((block.shape[0], -1))
            # update the overall sorted array
            self.a_sorted = np.insert(self.a_sorted, indices.ravel(),
                                      block_sorted.ravel()).reshape((block.shape[0], -1))
        return self.a_ranks