Given n points, choose a point from the given list such that the sum of distances from it to all the others is minimum, compared to every other choice.
Distance is measured in the following manner: for a point (x, y), all 8 adjacent points have distance 1:
(x+1,y), (x+1,y+1), (x+1,y-1), (x,y+1), (x,y-1), (x-1,y), (x-1,y+1), (x-1,y-1)
EDIT
A clearer explanation:
A function foo is defined as
foo(point_a, point_b) = max(abs(point_a.x - point_b.x), abs(point_a.y - point_b.y))
Find a point x such that sum([foo(x,y) for y in list_of_points]) is minimum.
Example
Input:
12 -14
-3 3
-14 7
-14 -3
2 -12
-1 -6
Output:
-1 -6
E.g.: the distance between (4, 5) and (6, 7) is 2.
This can be done in O(n^2) time by computing, for each point, the sum of its distances to all the other points.
Is there any better algorithm to do it?
Update: it sometimes fails to find the optimum; I'll leave this here until I find the problem.
This is O(n): nth is O(n) (expected, not worst case) and iterating over the list is O(n). If you need a strict worst-case bound, pick the middle element by sorting instead, but then it's O(n*log(n)).
Note: it's easy to modify it to return all the optimal points (see the sketch after the code).
import sys

def nth(sample, n):
    pivot = sample[0]
    below = [s for s in sample if s < pivot]
    above = [s for s in sample if s > pivot]
    i, j = len(below), len(sample) - len(above)
    if n < i: return nth(below, n)
    elif n >= j: return nth(above, n - j)
    else: return pivot

def getbest(li):
    ''' li is a list of tuples (x,y) '''
    l = len(li)
    lix = [x[0] for x in li]
    liy = [x[1] for x in li]
    mid_x1 = nth(lix, l / 2) if l % 2 == 1 else nth(lix, l / 2 - 1)
    mid_x2 = nth(lix, l / 2)
    mid_y1 = nth(liy, l / 2) if l % 2 == 1 else nth(liy, l / 2 - 1)
    mid_y2 = nth(liy, l / 2)
    mindist = sys.maxint
    minp = None
    for p in li:
        dist = 0 if mid_x1 <= p[0] <= mid_x2 else min(abs(p[0] - mid_x1), abs(p[0] - mid_x2))
        dist += 0 if mid_y1 <= p[1] <= mid_y2 else min(abs(p[1] - mid_y1), abs(p[1] - mid_y2))
        if dist < mindist:
            minp, mindist = p, dist
    return minp
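For instance, a minimal sketch of that modification (my own variant, with the same caveat as the update above): keep a list of best points and append on ties.

def getallbest(li):
    ''' like getbest, but returns every point achieving the minimal distance '''
    l = len(li)
    lix = [x[0] for x in li]
    liy = [x[1] for x in li]
    mid_x1 = nth(lix, l / 2) if l % 2 == 1 else nth(lix, l / 2 - 1)
    mid_x2 = nth(lix, l / 2)
    mid_y1 = nth(liy, l / 2) if l % 2 == 1 else nth(liy, l / 2 - 1)
    mid_y2 = nth(liy, l / 2)
    best, mindist = [], sys.maxint
    for p in li:
        dist = 0 if mid_x1 <= p[0] <= mid_x2 else min(abs(p[0] - mid_x1), abs(p[0] - mid_x2))
        dist += 0 if mid_y1 <= p[1] <= mid_y2 else min(abs(p[1] - mid_y1), abs(p[1] - mid_y2))
        if dist < mindist:
            best, mindist = [p], dist
        elif dist == mindist:
            best.append(p)  # keep ties instead of discarding them
    return best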
It's based on the solution of the one-dimensional problem: for a list of numbers, find a number for which the sum of distances is minimum.
The solution to this is the middle element of the (sorted) list, or, if there is an even number of elements, any number between the two middle elements (including those two elements).
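As a quick sanity check of that one-dimensional claim, here is a minimal demo (my own, with a made-up sample):

# the median minimizes the sum of absolute distances in one dimension
pts = [7, 1, 3, 10, 13]
median = sorted(pts)[len(pts) // 2]  # 7
best = min(pts, key=lambda m: sum(abs(m - p) for p in pts))
assert best == median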
Update: my nth algorithm seems to be very slow; there is probably a better way to rewrite it, since sorting outperforms it below 100000 elements. So if you do a speed comparison, just add lix.sort(); liy.sort(); and

def nth(sample, n):
    return sample[n]
For anyone out there who wants to test their solution, here is what I use: just run a loop, generate input, and compare your solution with the output of bruteforce (a sketch of such a loop follows the code).
import random
import sys

def example(length):
    l = []
    for x in range(length):
        l.append((random.randint(-100, 100), random.randint(-100, 100)))
    return l

def bruteforce(li):
    bestsum = sys.maxint
    bestp = None
    for p in li:
        total = 0
        for p1 in li:
            total += max(abs(p[0] - p1[0]), abs(p[1] - p1[1]))
        if total < bestsum:
            bestp, bestsum = p, total
    return bestp
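The comparison loop itself isn't shown above, so here is a minimal sketch of one (my own; it compares distance sums rather than points, since several points can be equally optimal):

def sumdist(p, li):
    return sum(max(abs(p[0] - p1[0]), abs(p[1] - p1[1])) for p1 in li)

for trial in range(1000):
    li = example(20)
    if sumdist(getbest(li), li) != sumdist(bruteforce(li), li):
        print li  # a counterexample where the fast version misses the optimum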
I can imagine a scheme better than O(n^2), at least in the common case.
Build a quadtree out of your input points. For each node in the tree, compute the number and average position of the points within that node. Then for each point, you can use the quadtree to compute its distance to all other points in less than O(n) time. If you're computing the distance from a point p to a distant quadtree node v, and v doesn't overlap the 45-degree diagonals from p, then the total distance from p to all the points in v is easy to compute: for v which are more horizontally than vertically separated from p, it is just v.num_points * |p.x - v.average.x|, and similarly using y coordinates if v is predominantly vertically separated. If v overlaps one of the 45-degree diagonals, recurse on its components.
That should beat O(n^2), at least when you can find a balanced quadtree to represent your points.
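Here is a rough, untested sketch of that scheme (the class and function names are my own, not from the answer; interior nodes store coordinate sums, which is equivalent to count times average, and leaves fall back to the exact sum):

class QNode(object):
    ''' quadtree node holding the count and coordinate sums of its points '''
    def __init__(self, pts, leaf_size=4):
        self.pts = pts
        self.n = len(pts)
        self.sx = sum(p[0] for p in pts)  # sum of x coordinates
        self.sy = sum(p[1] for p in pts)  # sum of y coordinates
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        self.box = (min(xs), min(ys), max(xs), max(ys))
        self.children = []
        if self.n > leaf_size:
            mx = (self.box[0] + self.box[2]) / 2.0
            my = (self.box[1] + self.box[3]) / 2.0
            quads = {}
            for p in pts:
                quads.setdefault((p[0] >= mx, p[1] >= my), []).append(p)
            if len(quads) > 1:  # avoid infinite recursion on coincident points
                self.children = [QNode(sub, leaf_size) for sub in quads.values()]

def sum_chebyshev(p, node):
    ''' sum of max(|dx|, |dy|) from p to every point under node '''
    px, py = p
    x0, y0, x1, y1 = node.box
    corners = [(x0, y0), (x0, y1), (x1, y0), (x1, y1)]
    # each wedge between the 45 degree diagonals through p is convex,
    # so the box lies entirely in a wedge iff all four corners do
    if all(cx - px >= abs(cy - py) for cx, cy in corners):
        return node.sx - node.n * px  # right wedge: every distance is dx
    if all(px - cx >= abs(cy - py) for cx, cy in corners):
        return node.n * px - node.sx  # left wedge
    if all(cy - py >= abs(cx - px) for cx, cy in corners):
        return node.sy - node.n * py  # top wedge
    if all(py - cy >= abs(cx - px) for cx, cy in corners):
        return node.n * py - node.sy  # bottom wedge
    if node.children:  # box straddles a diagonal: recurse
        return sum(sum_chebyshev(p, c) for c in node.children)
    return sum(max(abs(px - qx), abs(py - qy)) for qx, qy in node.pts)

pts = [(12, -14), (-3, 3), (-14, 7), (-14, -3), (2, -12), (-1, -6)]
root = QNode(pts)
print min(pts, key=lambda p: sum_chebyshev(p, root))  # (-1, -6)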
Related
What I'm looking to do:
I need to make a function that, given a list of positive integers (there can be duplicate integers), counts all triples (in the list) in which the third number is a multiple of the second and the second is a multiple of the first:
(The same number cannot be used twice in one triple, but can be used by all other triples)
For example, [3, 6, 18] is one because 3 goes evenly into 6, which goes evenly into 18.
So given [1, 2, 3, 4, 5, 6] it should find:
[1, 2, 4] [1, 2, 6] [1, 3, 6]
and return 3 (the number of triples it found)
What I've tried:
I made a couple of functions that work but are not efficient enough. Is there some math concept I don't know about that would help me find these triples faster? A module with a function that does better? I don't know what to search for...
def foo(q):
    l = sorted(q)
    ln = range(len(l))
    ans = 0
    for x in ln:
        if len(l[x:]) > 1:
            for y in ln[x + 1:]:
                if (len(l[y:]) > 0) and (l[y] % l[x] == 0):
                    for z in ln[y + 1:]:
                        if l[z] % l[y] == 0:
                            ans += 1
    return ans
This one is a bit faster:
def bar(q):
    l = sorted(q)
    ans = 0
    for x2, x in enumerate(l):
        pool = l[x2 + 1:]
        if len(pool) > 1:
            for y2, y in enumerate(pool):
                pool2 = pool[y2 + 1:]
                if pool2 and (y % x == 0):
                    for z in pool2:
                        if z % y == 0:
                            ans += 1
    return ans
Here's what I've come up with, with help from y'all, but I must be doing something wrong because it gets the wrong answer (it's really fast, though):
def function4(numbers):
    ans = 0
    num_dict = {}
    index = 0
    for x in numbers:
        index += 1
        num_dict[x] = [y for y in numbers[index:] if y % x == 0]
    for x in numbers:
        for y in num_dict[x]:
            for z in num_dict[y]:
                print(x, y, z)
                ans += 1
    return ans
(39889 instead of 40888) - oh, I accidentally made the index var start at 1 instead of 0. It works now.
Final Edit
I've found the best way to find the number of triples by reevaluating what I needed it to do. This method doesn't actually find the triples, it just counts them.
def foo(l):
    llen = len(l)
    total = 0
    cache = {}
    for i in range(llen):
        cache[i] = 0
    for x in range(llen):
        for y in range(x + 1, llen):
            if l[y] % l[x] == 0:
                cache[y] += 1
                total += cache[x]
    return total
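To see why this counts triples, trace l = [1, 2, 4]: the pair (1, 2) sets cache[1] = 1 and adds cache[0] = 0; the pair (1, 4) sets cache[2] = 1 and adds 0; the pair (2, 4) adds cache[1] = 1 to the total. In general, cache[y] holds the number of valid pairs ending at index y, so whenever l[y] % l[x] == 0, each of the cache[x] pairs ending at x extends to a triple ending at y.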
And here's a version of the function that explains the thought process as it goes (not good for huge lists though because of spam prints):
def bar(l):
    list_length = len(l)
    total_triples = 0
    cache = {}
    for i in range(list_length):
        cache[i] = 0
    for x in range(list_length):
        print("\n\nfor index[{}]: {}".format(x, l[x]))
        for y in range(x + 1, list_length):
            print("\n\ttry index[{}]: {}".format(y, l[y]))
            if l[y] % l[x] == 0:
                print("\n\t\t{} can be evenly divided by {}".format(l[y], l[x]))
                cache[y] += 1
                total_triples += cache[x]
                print("\t\tcache[{0}] is now {1}".format(y, cache[y]))
                print("\t\tcount is now {}".format(total_triples))
                print("\t\t(+{} from cache[{}])".format(cache[x], x))
            else:
                print("\n\t\tfalse")
    print("\ntotal number of triples:", total_triples)
Right now your algorithm has O(N^3) running time, meaning that every time you double the length of the initial list the running time goes up by 8 times.
In the worst case, you cannot improve this. For example, if your numbers are all successive powers of 2, meaning that every number divides every number greater than it, then every triple of numbers is a valid solution, so just printing out all the solutions is going to be just as slow as what you are doing now.
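(Concretely, n successive powers of two admit C(n, 3) = n(n-1)(n-2)/6 valid triples, e.g. 4060 for n = 30, so merely listing them is cubic in n.)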
If you have a lower "density" of numbers that divide other numbers, one thing you can do to speed things up is to search for pairs of numbers instead of triples. This will take time that is only O(N^2), meaning the running time goes up by 4 times when you double the length of the input list. Once you have a list of pairs of numbers you can use it to build a list of triples.
# For simplicity, I assume that a number can't occur more than once in the list.
# You will need to tweak this algorithm to be able to deal with duplicates.

# This dictionary will map each number `n` to the list of other numbers
# that appear on the list that are multiples of `n`.
multiples = {}
for n in numbers:
    multiples[n] = []

# Going through each combination takes time O(N^2)
for x in numbers:
    for y in numbers:
        if x != y and y % x == 0:
            multiples[x].append(y)

# The speed of this last step will depend on how many numbers
# are multiples of other numbers. In the worst case this will
# be just as slow as your current algorithm. In the fastest case
# (when no numbers divide other numbers) it will be just an
# O(N) scan for the outermost loop.
for x in numbers:
    for y in multiples[x]:
        for z in multiples[y]:
            print(x, y, z)
There might be even faster algorithms that also take advantage of algebraic properties of division, but in your case I think O(N^2) is probably going to be fast enough.
The key insight is:
if a divides b, it means a "fits into" b.
if a doesn't divide c, then a "doesn't fit into" c.
And if a can't fit into c, then b cannot fit into c (imagine b fitted into c: since a fits into b, a would fit into all the b's that fit into c, so a would have to fit into c too; think of prime factorisation etc.).
This means we can optimise: sort the numbers smallest to largest and start with the smaller numbers first. On the first iteration, take the smallest number as a.
If we partition the numbers into two groups, group 1 being the numbers which a divides and group 2 the numbers which a doesn't divide, then we know that no number in group 1 can divide a number in group 2, because no number in group 2 has a as a factor.
so if we had [2,3,4,5,6,7], we would start with 2 and get:
[2,4,6] and [3,5,7]
We can repeat the process on each group, splitting into ever smaller groups. This suggests an algorithm that could count the triples more efficiently: the groups get really small really quickly, which means its efficiency should be fairly close to the size of the output.
This is the best answer that I was able to come up with so far. It's fast, but not quite fast enough. I'm still posting it because I'm probably going to abandon this question and don't want to leave out any progress I've made.
def answer(l):
    num_dict = {}
    ans_set = set()
    for a2, a in enumerate(l):
        num_dict[(a, a2)] = []
    for x2, x in enumerate(l):
        for y2, y in enumerate(l):
            if (y, y2) != (x, x2) and y % x == 0:
                pair = (y, y2)
                num_dict[(x, x2)].append(pair)
    for x in num_dict:
        for y in num_dict[x]:
            for z in num_dict[y]:
                ans_set.add((x[0], y[0], z[0]))
    return len(ans_set)
I'm trying to compute a certain subset of the full outer product of k vectors. The computation of the full outer product is described in this question.
Formally: Let v1,v2,...,vk be vectors of some length n, and K be a positive constant. I want a list containing all the products v1[i1]v2[i2]...vk[ik] for which i1*i2*...*ik <= K (indices start at one). Note: For example, if K = n ** k, the list would contain every combination.
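For example, with k = 2, n = 4 and K = 4 the admissible (one-based) index pairs are (1,1), (1,2), (1,3), (1,4), (2,1), (2,2), (3,1) and (4,1), so the list would hold those eight products v1[i1]*v2[i2].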
My current approach is to create a hierarchical list of the indices fulfilling the condition above and then calculating the products recursively, which has the advantage of reusing some factors.
This implementation is a lot slower than the computation of the full outer product using NumPy (for the same n and k). I want to achieve a better performance than the computation of the full product. I'm interested in larger values of k and small K (this problem comes from function approximation with sparse bases, i.e. hyperbolic cross).
Does anyone know a more performant way to get this list? Maybe by using more NumPy or another algorithm? I will try a C implementation next.
Here is my current implementation:
import numpy as np

def get_cross_indices(n, k, K):
    """
    Assume k > 0.
    Returns a hierarchical list containing elements of type
    (i1, list) with
    - i1 being an index (zero based!)
    - list being again a list (possibly empty) with all indices i2, such
      that (i1+1) * (i2+1) * ... * (ik+1) <= K (going down the hierarchy)
    """
    if k == 1:
        num = min(n, K)
        return (num, [(x, []) for x in range(num)])
    else:
        indices = []
        nums = 0
        for i in xrange(min(n, K)):
            (num, tail) = get_cross_indices(n, k - 1, K // (i + 1))
            indices.append((i, tail))
            nums += num
        return (nums, indices)
def calc_cross_outer_product(vectors, result, factor, indices, pos):
    """
    Fills the result list recursively with all products
    vectors[0][i1] * ... * vectors[k-1][ik]
    such that i1,...,ik is a feasible index sequence
    from `indices` (they are in there hierarchically,
    also see `get_cross_indices`).
    """
    for (x, tail) in indices:
        if not tail:
            result[pos] = factor * vectors[0][x]
            pos += 1
        else:
            pos = calc_cross_outer_product(vectors[1:], result,
                                           factor * vectors[0][x], tail, pos)
    return pos
k = 3  # number of vectors
n = 4  # vector length
K = 3

# using random values here just for demonstration purposes
vectors = np.random.rand(k, n)

# get all indices which meet the condition
(count, indices) = get_cross_indices(n, k, K)
result = np.ones(count)
calc_cross_outer_product(vectors, result, 1, indices, 0)

## Equivalent version ##
alt_result = np.ones(count)
# create full outer products
outer_product = reduce(np.multiply, np.ix_(*vectors))
pos = 0
for inds in np.ndindex((n,) * k):
    # current index set is feasible?
    if np.product(np.array(inds) + 1) <= K:
        # compute [ vectors[0][inds[0]], ..., vectors[k-1][inds[k-1]] ]
        values = map(lambda x: vectors[x[0]][x[1]],
                     np.dstack((np.arange(k), inds))[0])
        alt_result[pos] = np.product(values)
        pos += 1
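As an extra cross-check (my own sketch, not part of the original code; it still materializes the full outer product, so it doesn't address the performance question), the feasibility mask can be built with NumPy directly:

# idx_prod[i1, ..., ik] == (i1+1) * ... * (ik+1)
idx_prod = reduce(np.multiply.outer, [np.arange(1, n + 1)] * k)
# boolean masking flattens in C order, which matches the np.ndindex order above
vectorized_result = outer_product[idx_prod <= K]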
To get a visual idea of the indices I'm interested in: for k=3 and K=n, the feasible index set forms a hyperbolic cross. (The picture that illustrated this is omitted here.)
So I'm working on some practice problems and having trouble reducing the complexity. I am given an array of distinct integers a[] and a threshold value T. I need to find the number of triplets i,j,k such that a[i] < a[j] < a[k] and a[i] + a[j] + a[k] <= T. I've gotten this down from O(n^3) to O(n^2 log n) with the following python script. I'm wondering if I can optimize this any further.
import sys
import bisect

first_line = sys.stdin.readline().strip().split(' ')
num_numbers = int(first_line[0])
threshold = int(first_line[1])
count = 0
if num_numbers < 3:
    print count
else:
    numbers = sys.stdin.readline().strip().split(' ')
    numbers = map(int, numbers)
    numbers.sort()
    for i in xrange(num_numbers - 2):
        for j in xrange(i + 1, num_numbers - 1):
            k_1 = threshold - (numbers[i] + numbers[j])
            if k_1 < numbers[j]:
                break
            else:
                cross_thresh = bisect.bisect(numbers, k_1) - (j + 1)
                if cross_thresh > 0:
                    count += cross_thresh
    print count
In the above example, the first input line simply provides the number of numbers and the threshold. The next line is the full list. If the list has fewer than 3 elements, no triplet can exist, so we return 0. Otherwise we read in the full list of integers, sort it, and process it as follows: we iterate over every pair (i, j) with i < j and compute the highest value of k that would not break numbers[i] + numbers[j] + numbers[k] <= T. We then find the index (s) of the first element in the list that violates this condition and add all the elements between j and s to the count. For 30,000 elements in a list, this takes about 7 minutes to run. Is there any way to make it faster?
You are performing binary search for each (i,j) pair to find the corresponding value for k. Hence O(n^2 log(n)).
I can suggest an algorithm that will have the worst case time complexity of O(n^2).
Assume the list is sorted from left to right and elements are numbered from 1 to n. Then the pseudo code is:
for i = 1 to n - 2:
    j = i + 1
    find maximal k with binary search
    while j < k:
        j = j + 1
        find maximal k with linear search to the left, starting from last k position
The reason this has the worst case time complexity of O(n^2) and not O(n^3) is because the position k is monotonically decreasing. Thus even with linear scanning, you are not spending O(n) for each (i,j) pair. Rather, you are spending a total of O(n) time to scan for k for each distinct i value.
O(n^2) version implemented in Python (based on wookie919's answer):
def triplets(N, T):
    N = sorted(N)
    result = 0
    for i in xrange(len(N) - 2):
        k = len(N) - 1
        for j in xrange(i + 1, len(N) - 1):
            while k >= 0 and N[i] + N[j] + N[k] > T:
                k -= 1
            result += max(k, j) - j
    return result

import random
sample = random.sample(xrange(1000000), 30000)
print triplets(sample, 500000)
After analyzing the fastest subset sum algorithm which runs in 2^(n/2) time, I noticed a slight optimization that can be done. I'm not sure if it really counts as an optimization and if it does, I'm wondering if it can be improved by recursion.
Basically from the original algorithm: http://en.wikipedia.org/wiki/Subset_sum_problem (see part with title Exponential time algorithm)
it takes the list and splits it into two
then it generates the sorted power sets of both in 2^(n/2) time
then it does a linear search in both lists to see if one value from each list sums to x, using a clever trick
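(The "clever trick" is a two-pointer sweep: walk one sorted list upward from its smallest element and the other downward from its largest, advancing whichever pointer moves the running sum toward x, as in checkListRegular below.)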
In my version with the optimization:
it takes the list and removes the last element (call it last)
then it splits the remaining list in two
then it generates the sorted power sets of both in 2^((n-1)/2) time
then it does a linear search in both lists to see if one value from each list sums to x or to x - last (at the same time, with the same running time), using the same clever trick
If it finds either, then I know it worked. I tried using Python's time functions to test with lists of size 22, and my version apparently comes out about twice as fast.
After running the below code, it shows
0.050999879837 <- the original algorithm
0.0250000953674 <- my algorithm
My logic for the recursion part is: well, if it works for a size-n list in 2^((n-1)/2) time, can we not repeat this again and again?
Does any of this make sense, or am I totally wrong?
Thanks
I created this python code:
from math import log, ceil, floor
import helper  # my own code
from random import randint, uniform
import time

# gets a list of unique random floats
# s = how many random numbers
# l = smallest float can be
# h = biggest float can be
def getRandomList(s, l, h):
    lst = []
    while len(lst) != s:
        r = uniform(l, h)
        if not r in lst:
            lst.append(r)
    return lst

# This just generates the two powerset sorted lists that the 2^(n/2) algorithm makes.
# This is just a lazy way of doing it; this running time is way worse, but since
# this can be done in 2^(n/2) time, I just pretend it's that running time lol
def getSortedPowerSets(lst):
    n = len(lst)
    l1 = lst[:n/2]
    l2 = lst[n/2:]
    xs = range(2**(n/2))
    ys1 = helper.getNums(l1, xs)
    ys2 = helper.getNums(l2, xs)
    return ys1, ys2

# this just checks using the regular 2^(n/2) algorithm to see if two values
# sum to the specified value
def checkListRegular(lst, x):
    lst1, lst2 = getSortedPowerSets(lst)
    left = 0
    right = len(lst2) - 1
    while left < len(lst1) and right >= 0:
        s = lst1[left] + lst2[right]
        if s < x:
            left += 1
        elif s > x:
            right -= 1
        else:
            return True
    return False

# this is my improved version of the above
def checkListSmaller(lst, x):
    last = lst.pop()
    x1, x2 = x, x - last
    return checkhelper(lst, x1, x2)

# this is the same as the function 'checkListRegular', but it checks 2 values
# at the same time
def checkhelper(lst, x1, x2):
    lst1, lst2 = getSortedPowerSets(lst)
    left = [0, 0]
    right = [len(lst2) - 1, len(lst2) - 1]
    while 1:
        check = 0
        if left[0] < len(lst1) and right[0] >= 0:
            check += 1
            s = lst1[left[0]] + lst2[right[0]]
            if s < x1:
                left[0] += 1
            elif s > x1:
                right[0] -= 1
            else:
                return True
        if left[1] < len(lst1) and right[1] >= 0:
            check += 1
            s = lst1[left[1]] + lst2[right[1]]
            if s < x2:
                left[1] += 1
            elif s > x2:
                right[1] -= 1
            else:
                return True
        if check == 0:
            return False

n = 22
lst = getRandomList(n, 1, 3000)

startTime = time.time()
print checkListRegular(lst, -50)  # -50 so it does the worst case scenario
startTime2 = time.time()
print checkListSmaller(lst, -50)  # -50 so it does the worst case scenario
startTime3 = time.time()

print (startTime2 - startTime)
print (startTime3 - startTime2)
This is the helper library which I just use to generate the powerset list.
def dec_to_bin(x):
    return int(bin(x)[2:])

def getNums(lst, xs):
    sums = []
    n = len(lst)
    for i in xs:
        bits = str(dec_to_bin(i))
        bits = (n - len(bits)) * "0" + bits
        chosen_items = getList(bits, lst)
        sums.append(sum(chosen_items))
    sums.sort()
    return sums

def getList(binary, lst):
    s = []
    for i in range(len(binary)):
        if binary[i] == "1":
            s.append(float(lst[i]))
    return s
then it generates the sorted power sets of both in 2^((n-1)/2) time
OK, since now the list has one less element. However, this is not a big deal; it's just a constant-factor improvement of 2^(1/2)...
then it does a linear search in both lists to see if one value from each list sums to x or to x - last (at the same time, with the same running time), using the same clever trick
... and this improvement will go away because now you do twice as many operations to check for both x and x-last sums instead of only for x
can we not repeat this again and again?
No you can't, for the same reason you couldn't split the original algorithm again and again: the trick only works once, because once you start looking for values in more than two lists you can't use the sorting trick anymore.
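Put differently, the two-pointer sweep works because with exactly two sorted lists the candidate sum changes monotonically whichever pointer you advance; with three or more lists there is no single pointer move that orders the remaining combinations, so you are back to searching over combinations, which is what the 2^(n/2) split already handles.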
Given n arrays of integers, is there a good algorithm with which to determine whether there is a path across those arrays such that the minimum (Euclidean) distance of each "step" along that path falls below a certain threshold value? That is, the path across all arrays will include only one member from each array, and the distance of each step of that path will be determined by the absolute distance between the values from the two arrays being compared during the given step. For instance, say you have the following arrays:
a = [1,3,7]
b = [10,13]
c = [13,24]
and
threshold = 3
In that case, you would want to determine whether any elements of a and b have a distance of 3 or less between them, and for all pairs of elements from a and b that do in fact have a distance of three or less between them, you would want to determine whether either the given member from a or the given member from b has a distance of 3 or less from any member of c. (In the example above, the only path across the integers for which the distance of each step falls below the threshold condition is 7-->10-->13.)
Here's how I'm approaching such a problem when the number of arrays is three:
from numpy import *

a = [1, 3, 7]
b = [10, 13]
c = [13, 24]
d = [45]

def find_path_across_three_arrays_with_threshold_value_three(a, b, c):
    '''this function takes three lists as input, and it determines whether
    there is a path across those lists for which each step of that path
    has a distance of three or less'''
    threshold = 3
    # start with a, b
    for i in a:
        for j in b:
            # if the absolute value of i-j is less than or equal to the
            # threshold parameter (user-specified proximity value)
            if abs(i - j) <= threshold:
                for k in c:
                    if abs(i - k) <= threshold:
                        return i, j, k
                    elif abs(j - k) <= threshold:
                        return i, j, k
    # now start with a, c
    for i in a:
        for k in c:
            if abs(i - k) <= threshold:
                for j in b:
                    if abs(i - j) <= threshold:
                        return i, j, k
                    elif abs(j - k) <= threshold:
                        return i, j, k
    # finally, start with b, c
    for j in b:
        for k in c:
            if abs(j - k) <= threshold:
                for i in a:
                    if abs(i - j) <= threshold:
                        return i, j, k
                    elif abs(i - k) <= threshold:
                        return i, j, k

if find_path_across_three_arrays_with_threshold_value_three(a, b, c):
    print "ok"
If you didn't know in advance, though, how many arrays there were, what would be the most efficient way of calculating whether there is a path across all n arrays, such that the distance of each "step" in the path fell below the desired threshold value? Would something like Dijkstra's algorithm be the best way to generalize this problem for n arrays?
EDIT:
@Falko's method works for me:
import numpy as np
import itertools

my_list = [[1, 3, 7], [10, 13], [13, 24], [19], [16]]

def isPath(A, threshold):
    for i in range(len(A) - 1):
        #print "Finding edges from layer", i, "to", i + 1, "..."
        diffs = np.array(A[i]).reshape((-1, 1)) - np.array(A[i + 1]).reshape((1, -1))
        reached = np.any(np.abs(diffs) <= threshold, axis=0)
        A[i + 1] = [A[i + 1][j] for j in range(len(reached)) if reached[j]]
        #print "Reachable nodes of next layer:", A[i + 1]
    return any(reached)

for perm in itertools.permutations(my_list):
    # isPath prunes its argument in place, so each call gets a fresh copy
    new_list = list(perm)
    if isPath(new_list, 3):
        print "threshold 3 match for ", new_list
    new_list = list(perm)
    if isPath(new_list, 10):
        print "threshold 10 match for ", new_list
I found a much simpler solution (maybe related to the one from JohnB; I'm not sure):
import numpy as np

def isPath(A, threshold):
    for i in range(len(A) - 1):
        print "Finding edges from layer", i, "to", i + 1, "..."
        diffs = np.array(A[i]).reshape((-1, 1)) - np.array(A[i + 1]).reshape((1, -1))
        reached = np.any(np.abs(diffs) <= threshold, axis=0)
        A[i + 1] = [A[i + 1][j] for j in range(len(reached)) if reached[j]]
        print "Reachable nodes of next layer:", A[i + 1]
    return any(reached)

print isPath([[1, 3, 7], [10, 13], [13, 24]], 3)
print isPath([[1, 3, 7], [10, 13], [13, 24]], 10)
Output:
Finding edges from layer 0 to 1 ...
Reachable nodes of next layer: [10]
Finding edges from layer 1 to 2 ...
Reachable nodes of next layer: [13]
True
Finding edges from layer 0 to 1 ...
Reachable nodes of next layer: [10, 13]
Finding edges from layer 1 to 2 ...
Reachable nodes of next layer: [13]
True
It steps from one layer to the next and checks which nodes can still be reached given the predefined threshold. Unreachable nodes are removed from the array, so when the loop continues, those nodes are not considered anymore.
I guess it's pretty efficient and easy to implement.
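Complexity-wise, each step builds a |A[i]| x |A[i+1]| difference matrix, so the whole check costs on the order of the sum of products of adjacent layer sizes, roughly O(n*m^2) for n layers of m nodes each.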
First I'd build up an undirected graph: Each number in your array is a node and nodes of neighboring rows are connected if and only if their distance is smaller than your threshold.
Then you can use a standard algorithm to determine connected components of the graph. You'll probably find many references and code examples about this common problem.
Finally you need to check if one component contains nodes from a as well as nodes from your last row, c in this case.
(short answer: Floyd-Warshall is more efficient in this case than naive application of Dijkstra's)
I'm not 100% clear from your example, but it seems that you have to advance through the arrays in increasing order, and that you cannot backtrack.
i.e.:
A = [1, 300]
B = [2, 11]
C = [12, 301]
You go A(1) -> B(2), but there is no path to C because you can't jump to B(11) -> C(12). Similarly you can't jump A(300) -> C(301).
You could create, as you suggest, an adjacency matrix of size NM x NM, where N is the number of arrays and M is the number of elements in each array. You would want to use a sparse array implementation, as most of the values are nil.
For each increasing pair (ai,bj), (bi,cj) you perform the pairwise calculations and store the connection if it is <= your threshold value.
The runtime for this would be N * M^2, which is smaller than the cost of finding paths (in the worst case), and so is probably acceptable. For cases where threshold << M and the arrays do not contain repetitions, this can be reduced to N*M*lg(M) by sorting first, since at most threshold*M comparisons are then needed for each array pair comparison.
To use Dijkstra's you'd have to run it M*M times, once for each pair of elements from the first and last arrays. That works out to O(M^2 * E*lg(V)) (E is the number of edges, V the number of vertices); in the worst case E = (N-1)*M^2 and V = N*M, giving O(N*M^4 * lg(N*M)). The Floyd-Warshall algorithm for all-pairs shortest paths runs in V^3, or (N*M)^3, which is smaller.
To use Dijkstra's you'd have to run it M*M times, once for each pair of elements in arrays a and n. Which works out to O(M^2 * ElgV) (E is number of edges, V is number of vertexes) Which in the worst case E = (N-1)*M^2, and V is N*M. or N*M^4 * lg(N*M). Floyd-Warshall algorithm for all pairs of shortest paths runs in V^3, or (N*M)^3, which is smaller.