List of minimal pairs from a pair of lists - python

Given two lists of integers, generate the shortest list of pairs where every value in both lists is present. The first of each pair must be a value from the first list, and the second of each pair must be a value from the second list. The first of each pair must be less than the second of the pair.
A simple zip will not work if the lists are different lengths, or if the same integer exists at the same position in each list.
def gen_min_pairs(uplist, downlist):
    for pair in zip(uplist, downlist):
        yield pair
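To make the two failure modes concrete, here is a small sketch using the second test case from this question. The variable names `pairs` and `invalid` are only illustrative.

```python
up = [1, 13, 15, 16]
down = [6, 7, 15]

# zip stops at the shorter list, so 16 from `up` never appears.
pairs = list(zip(up, down))

# Collect the pairs that break the "first < second" rule.
invalid = [(a, b) for a, b in pairs if not a < b]

print(pairs)
print(invalid)
```

Here both `(13, 7)` and `(15, 15)` end up in `invalid`, and the value 16 is silently dropped.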
Here is what I can come up with so far:
def gen_min_pairs(uplist, downlist):
    up_gen = iter(uplist)
    down_gen = iter(downlist)
    last_up = None
    last_down = None
    while True:
        # when a list is exhausted, reuse its last value
        next_up = next(up_gen, last_up)
        next_down = next(down_gen, last_down)
        if (next_up == last_up and
                next_down == last_down):
            return
        # skip down values that are not greater than the current up value
        while not next_up < next_down:
            next_down = next(down_gen, None)
            if next_down is None:
                return
        yield next_up, next_down
        last_up = next_up
        last_down = next_down
And here is a simple test routine:
if __name__ == '__main__':
    from pprint import pprint
    datalist = [
        {
            'up': [1,7,8],
            'down': [6,7,13]
        },
        {
            'up': [1,13,15,16],
            'down': [6,7,15]
        }
    ]
    for dates in datalist:
        min_pairs = [pair for pair in
                     gen_min_pairs(dates['up'], dates['down'])]
        pprint(min_pairs)
The program produces the expected output for the first set of dates, but fails for the second.
Expected:
[(1, 6), (7, 13), (8, 13)]
[(1, 6), (1, 7), (13, 15)]
Actual:
[(1, 6), (7, 13), (8, 13)]
[(1, 6), (13, 15)]
I think this can be done while looking at each element of each list only once, i.e. in O(len(up) + len(down)) time. I think it depends on the number of elements unique to each list.
EDIT: I should add that we can expect these lists to be sorted with the smallest integer first.
EDIT: uplist and downlist were just arbitrary names. Less confusing arbitrary ones might be A and B.
Also, here is a more robust test routine:
from random import uniform, sample
from pprint import pprint
def random_sorted_sample(maxsize=6, pop=31):
    size = int(round(uniform(1,maxsize)))
    # range works in both Python 2 and 3 (the original used xrange)
    li = sample(range(1,pop), size)
    return sorted(li)
if __name__ == '__main__':
    A = random_sorted_sample()
    B = random_sorted_sample()
    min_pairs = list(gen_min_pairs(A, B))
    pprint(A)
    pprint(B)
    pprint(min_pairs)
This generates random realistic inputs, calculates the output, and displays all three lists. Here is an example of what a correct implementation would produce:
[11, 13]
[1, 13, 28]
[(11, 13), (13, 28)]
[5, 15, 24, 25]
[3, 13, 21, 22]
[(5, 13), (15, 21), (15, 22)]
[3, 28]
[4, 6, 15, 16, 30]
[(3, 4), (3, 6), (3, 15), (3, 16), (28, 30)]
[2, 5, 20, 24, 26]
[8, 12, 16, 21, 23, 28]
[(2, 8), (5, 12), (5, 16), (20, 21), (20, 23), (24, 28), (26, 28)]
[3, 4, 5, 6, 7]
[1, 2]
[]
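One way to sanity-check a candidate implementation against listings like these is a small validator. The sketch below is not from the original post: `check_pairs` is a hypothetical helper that verifies the ordering rule and reports which values were left unpaired. Values with no valid partner (such as 1 in the first example, which has nothing smaller in A) will legitimately show up as unpaired.

```python
def check_pairs(a, b, pairs):
    """Hypothetical helper: validate pairs and report values left unpaired."""
    for first, second in pairs:
        if first not in a or second not in b:
            raise ValueError("pair %r does not draw from A and B" % ((first, second),))
        if not first < second:
            raise ValueError("pair %r violates first < second" % ((first, second),))
    # Values that never appear in any pair.
    missing_a = sorted(set(a) - {first for first, _ in pairs})
    missing_b = sorted(set(b) - {second for _, second in pairs})
    return missing_a, missing_b

print(check_pairs([3, 28], [4, 6, 15, 16, 30],
                  [(3, 4), (3, 6), (3, 15), (3, 16), (28, 30)]))
print(check_pairs([11, 13], [1, 13, 28], [(11, 13), (13, 28)]))
```

This only checks validity and coverage; it does not check that the list of pairs is the shortest possible.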

I had many ideas to solve this (see edit history ;-/) but none of them quite worked out or did it in linear time. It took me a while to see it, but I had a similar problem before so I really wanted to figure this out ;-)
Anyways, in the end the solution came when I gave up on doing it directly and started drawing graphs about the matchings. I think your first list simply defines intervals and you're looking for the items that fall into them:
def intervals(seq):
    # assumes seq is non-empty
    seq = iter(seq)
    current = next(seq)
    for s in seq:
        yield current, s
        current = s
    # the last interval is open-ended
    yield current, float("inf")
def gen_min_pairs(fst, snd):
    snd = iter(snd)
    s = next(snd, None)
    for low, up in intervals(fst):
        # s is None once snd is exhausted
        while s is not None:
            # does it fall in the current interval
            if low < s <= up:
                yield low, s
                # try with the next
                s = next(snd, None)
            else:
                # nothing in this interval, go to the next
                break
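To see what "the first list defines intervals" means in practice, here is a minimal sketch. The helpers `consecutive_intervals` and `interval_of` are hypothetical names introduced only for this illustration; they use the same `low < s <= up` test as the answer.

```python
def consecutive_intervals(values):
    """Pair each value with its successor; the last interval is open-ended."""
    values = list(values)
    uppers = values[1:] + [float("inf")]
    return list(zip(values, uppers))

def interval_of(s, intervals):
    """Return the first interval (low, up) with low < s <= up, or None."""
    for low, up in intervals:
        if low < s <= up:
            return (low, up)
    return None

ivs = consecutive_intervals([1, 7, 8])
print(ivs)
print(interval_of(6, ivs))
print(interval_of(13, ivs))
print(interval_of(1, ivs))
```

An item of the second list that is not greater than the smallest item of the first list falls into no interval at all, which is why `interval_of(1, ivs)` returns `None`.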

zip_longest is called izip_longest in python 2.x.
import itertools
def MinPairs(up,down):
    # both lists must be non-empty, since down[-1] and up[0] are read below
    if not (up and down):
        return []
    up=list(itertools.takewhile(lambda x:x<down[-1],up))
    if not up:
        return []
    down=list(itertools.dropwhile(lambda x:x<up[0],down))
    if not down:
        return []
    for i in range(min(len(up),len(down))):
        if up[i]>=down[i]:
            up.insert(i,up[i-1])
    return tuple(itertools.zip_longest(up,down,fillvalue=(up,down)[len(up)>len(down)][-1]))

While not a complete answer (i.e. no code), have you tried looking at the numpy "where" function?

Related

Accessing left and right side indices in an array, where the elements differ by 1

I have a one-dimensional array like this:
data=np.array([1,1,1,1,0,0,0,0,0,1,1,1,4,4,4,4,4,4,1,1,1,0,0,0])
With the function below, I am able to access the index pairs where the absolute difference between adjacent elements is 1.
Current code:
result = [(i, i + 1) for i in np.where(np.abs(np.diff(data)) == 1)[0]]
Current output:
[(3, 4), (8, 9), (20, 21)]
How would I modify this code so that for each place where the difference is 1, I get not only the pair of indices, but also two indices to the left and two to the right of the transition?
Required output:
[2,3,(3, 4),4,5,7,8,(8, 9),9,10,19,20, (20, 21),21,22]
Ignore my variable names. I didn't try to be professional. Also, this could probably be done in a "shorter" way. Just wanted to provide a solution.
Code:
import numpy as np
data=np.array([1,1,1,1,0,0,0,0,0,1,1,1,4,4,4,4,4,4,1,1,1,0,0,0])
result = [(i - 1, i, (i, i + 1), i + 1, i + 2) for i in np.where(np.abs(np.diff(data)) == 1)[0]]
new_result = []
for r in result:
    for r1 in r:
        new_result.append(r1)
new_result = np.array(new_result, dtype=object)
print(new_result)
Output:
[2, 3, (3, 4), 4, 5, 7, 8, (8, 9), 9, 10, 19, 20, (20, 21), 21, 22]
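As for the "shorter" way mentioned above, one possible sketch uses a plain Python list instead of numpy and builds the flat result directly. The name `flat` is just illustrative. Like the answer's code, it does not clamp indices near the ends of the data, so a transition at index 0 would produce -1.

```python
data = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 4, 4, 4, 4, 4, 4, 1, 1, 1, 0, 0, 0]

flat = []
# Walk over adjacent elements; i is the index of the left element.
for i, (left, right) in enumerate(zip(data, data[1:])):
    if abs(right - left) == 1:
        # one index to the left, the transition pair, one index to the right
        flat.extend([i - 1, i, (i, i + 1), i + 1, i + 2])

print(flat)
```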

Return tuple with biggest increase of second value in a list of tuples

Like the title says, I have a list of tuples: [(3, 20), (9, 21), (18, 19)]. I need to find the tuple that has a positive y-increase with respect to its predecessor. In this case 21-20 = 1, so tuple (9,21) should be returned. 19-21 = -2, so tuple (18,19) shouldn't be returned. The very first tuple in the list should never be returned. I've tried putting all the values in a list and then trying to figure it out, but I'm clueless. It should work for lists of tuples of any length. I hope you guys can help me out, thanks in advance.
You could compare the second element of each tuple with the previous one, while iterating over the list:
data = [(3, 20), (9, 21), (18, 19), (1, 35), (4, 37), (1, 2)]
maxIncrease = [0, 0]  # store max increase value and its index
for i in range(1, len(data)):
    lst = data[i - 1]
    cur = data[i]
    diff = cur[1] - lst[1]
    if diff > maxIncrease[0]:
        maxIncrease = [diff, i]
print(
    f'Biggest increase of {maxIncrease[0]} found at index {maxIncrease[1]}: {data[maxIncrease[1]]}'
)
Out:
Biggest increase of 16 found at index 3: (1, 35)
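Note that when no tuple increases at all, the loop above leaves `maxIncrease` at `[0, 0]` and would report the first tuple, which the question says should never be returned. A variant sketch that returns `None` in that case could look like this (`biggest_increase` is a hypothetical name, not from the answer):

```python
def biggest_increase(data):
    """Return the tuple with the largest positive rise in its second value, or None."""
    best = None
    best_diff = 0
    # Pair every tuple with its predecessor.
    for prev, cur in zip(data, data[1:]):
        diff = cur[1] - prev[1]
        if diff > best_diff:
            best, best_diff = cur, diff
    return best

print(biggest_increase([(3, 20), (9, 21), (18, 19), (1, 35), (4, 37), (1, 2)]))
print(biggest_increase([(3, 20), (1, 5)]))
```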
I think something like that can solve your problem:
import numpy as np
data = [(3, 20), (9, 21), (18, 19), (10, 22)]
diff_with_previous = []
for i in range(len(data)):
    if i == 0:
        diff_with_previous.append(-np.inf)
    else:
        diff_with_previous.append(data[i][1] - data[i-1][1])
indices = np.where(np.array(diff_with_previous) > 0)
print([data[i] for i in indices[0]])
[EDIT]
Without numpy:
data = [(3, 20), (9, 21), (18, 19), (10, 22)]
indices = []
for i in range(1, len(data)):
    if (data[i][1] - data[i-1][1]) > 0:
        indices.append(i)
print([data[i] for i in indices])
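The same index-free idea can probably be written as a single comprehension by zipping the list with itself shifted by one. This is only a sketch; `increases` is an illustrative name.

```python
data = [(3, 20), (9, 21), (18, 19), (10, 22)]

# zip(data, data[1:]) yields (predecessor, current) pairs,
# so the first tuple can never be selected.
increases = [cur for prev, cur in zip(data, data[1:]) if cur[1] > prev[1]]

print(increases)
```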

Concatenating returned elements to list in a recursive function

This one has been giving me a headache for too long.
I am trying to create a list of tuples from a recursion, but I can't quite figure out whether my approach is going to work or not.
Below, foo() and A are aliases for more complicated methods and structures, but I'd like foo() below to return the following:
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8)]
First attempt
When I try adding them together as lists, it nests all the lists.
A = [(num, num) for num in np.arange(9)]
def foo(A):
    if len(A)==1:
        return(A[0])
    else:
        return([A[0]] + [foo(A[1:])])
print(foo(A))
output: [(0, 0), [(1, 1), [(2, 2), [(3, 3), [(4, 4), [(5, 5), [(6, 6), [(7, 7), (8, 8)]]]]]]]]
Second attempt
I can understand why this is wrong, so I tried appending the returned values to the list at the higher level, but nothing is returned:
A = [(num, num) for num in np.arange(9)]
def foo(A):
    if len(A)==1:
        return(A[0])
    else:
        return([A[0]].append(foo(A[1:])))
print(foo(A))
output: None
Current solution (there's got to be a better way)
def foo(A):
    if len(A)==1:
        return(A[0])
    else:
        return(A[0] + foo(A[1:]))
output: (0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8)
...then:
temp = np.array(foo(A)).reshape(-1,2)
output = [tuple(temp[i, :]) for i in range(np.shape(temp)[0])]
print(output)
which gives the desired output...
Can someone give some advice on how to do this correctly using recursion?
I'm not sure what you are asking for, since A is already the structure you want. But in your first attempt you mix up the return types: the base case returns a single tuple, while the recursive branch returns a list. Make the base case return a list as well, and drop the extra list wrapper around the recursive call. It should look like this:
import numpy as np
A = [(num, num) for num in np.arange(9)]
def foo(A):
    if len(A)==1:
        return([A[0]])
    else:
        return([A[0]] + foo(A[1:]))
print(foo(A))
You pretty much had it on your first try. By adding an unpacking and altering your original base case, we can get your desired result.
def foo(A):
    if len(A)==1:
        return([A[0]])
    else:
        return([A[0]] + [*foo(A[1:])])
print(foo(A))
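A slightly different sketch, assuming an empty list is also a valid input, uses the empty list as the base case so that every branch returns a list. It also shows why the second attempt printed `None`: `list.append` changes the list in place and returns `None`. The parameter name `items` and the variable `appended` are illustrative only.

```python
def foo(items):
    # Base case: an empty list, so foo([]) works too.
    if not items:
        return []
    # Both operands are lists, so + concatenates without nesting.
    return [items[0]] + foo(items[1:])

A = [(num, num) for num in range(9)]
result = foo(A)
print(result)

# list.append mutates in place and returns None.
appended = [A[0]].append(A[1])
print(appended)
```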

How can I find a tuple from a group of tuples with the values closest to a given tuple?

I'm working with python and have a dict where the keys are tuples with 3 values each.
I'm computing another tuple with 3 values, and I want to find the tuple in the keys of the dict with the closest values to this newly computed tuple.
How should I go about doing this?
You are probably looking for the abs() of the difference, e.g.:
>>> from random import randint
>>> d = [tuple(randint(1, 20) for _ in range(3)) for _ in range(5)]
>>> d
[(4, 13, 10), (12, 18, 19), (11, 18, 8), (16, 17, 4), (2, 4, 10)]
>>> k = tuple(randint(1, 20) for _ in range(3))
>>> k
(14, 13, 1)
>>> min(d, key=lambda x: sum(abs(m-n) for m, n in zip(k, x)))
(16, 17, 4)
You could do something like this:
def euclid2(x,y):
    return sum((xi-yi)**2 for xi,yi in zip(x,y))
def closestTuple(target,tuples, dist = euclid2):
    return min((dist(t,target),t) for t in tuples)[1]
#test:
target = (3,5,1)
tuples = [(3,1,2), (4,1,5), (6,1,7), (4,4,2), (1,5,7)]
print(closestTuple(target,tuples)) #prints (4,4,2)
This finds the tuple which is closest to the target tuple in the Euclidean metric. You could of course pass another function for the dist parameter.
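As an illustration of swapping the distance function, here is a sketch with two other common metrics, applied to dict keys as in the question. The names `manhattan`, `chebyshev`, `closest_key`, and the sample dict `points` are assumptions made for this example.

```python
def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def chebyshev(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(xi - yi) for xi, yi in zip(x, y))

def closest_key(target, mapping, dist):
    """Hypothetical helper: key of `mapping` nearest to `target` under `dist`."""
    # Iterating over a dict yields its keys.
    return min(mapping, key=lambda k: dist(k, target))

points = {(3, 1, 2): "a", (4, 1, 5): "b", (6, 1, 7): "c", (4, 4, 2): "d", (1, 5, 7): "e"}
target = (3, 5, 1)

print(closest_key(target, points, manhattan))
print(closest_key(target, points, chebyshev))
```

Using `key=` keeps the comparison on the distance alone; on ties, `min` returns the first key encountered instead of falling back to comparing the tuples themselves.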
You could try the following (I used a list): iterate over each tuple, sum the absolute differences between its elements and those of the target, and then pick the tuple with the smallest sum.
a = [(1, 2, 3), (3, 4, 5), (5, 6, 7)]
b = (2, 4, 5)
c = []
for x in a:
    # total absolute difference between x and the target b
    c.append(sum(abs(x[i] - b[i]) for i in range(len(x))))
# the smallest total marks the closest tuple
print(a[c.index(min(c))])

Find duplicates in a list of lists with tuples

I am trying to find duplicates within tuples that are nested within a list. This whole construction is a list too. If there is a better way to organize this data so that my problem can be solved, I'd be glad to know, because this is something I built along the way.
pairsList = [
    [1, (11, 12), (13, 14)], #list1
    [2, (21, 22), (23, 24)], #list2
    [3, (31, 32), (13, 14)], #list3
    [4, (43, 44), (21, 22)], #list4
]
The first element in each list uniquely identifies each list.
From this object pairsList, I want to find out which lists have identical tuples. So I want to report that list1 has the same tuple as list3 (because both have (13,14)). Likewise, list2 and list4 have the same tuple (both have (21,22)) and need to be reported. The position of tuples within the list doesn't matter (list2 and list4 both have (21,22) even though its position in each list is different).
The output result could be anything iterable later on such as (1,3),(2,4) or [1,3],[2,4]. It is the pairs I am interested in.
I am aware of sets and have used them to delete duplicates within the lists in other situations, but cannot understand how to solve this problem. I can check like this if one list contains any element from the other list:
list1 = [1, (11, 12), (13, 14)]
list2 = [3, (31, 32), (13, 14)]
print(not set(list1).isdisjoint(list2))
>>>True
So, the code below lets me know what lists have same tuple(s) as the first one. But what is the correct way to perform this on all the lists?
counter = 0
for pair in pairsList:
    list0 = pairsList[0]
    iterList = pairsList[counter]
    if not set(list0).isdisjoint(iterList):
        print(iterList[0])  # print list ID
    counter += 1
The first element in each list uniquely identifies each list.
Great, then let's convert it to a dict first:
d = {x[0]: x[1:] for x in pairsList}
# d:
{1: [(11, 12), (13, 14)],
2: [(21, 22), (23, 24)],
3: [(31, 32), (13, 14)],
4: [(43, 44), (21, 22)]}
Let's index the whole data structure:
index = {}
# .items() works in Python 2 and 3 (the original used the Python 2-only .iteritems())
for k, vv in d.items():
    for v in vv:
        index.setdefault(v, []).append(k)
Now index is:
{(11, 12): [1],
(13, 14): [1, 3],
(21, 22): [2, 4],
(23, 24): [2],
(31, 32): [3],
(43, 44): [4]}
The output result could be anything iterable later on such as (1,3),(2,4) or [1,3],[2,4]. It is the pairs I am interested in.
pairs = [v for v in index.values() if len(v) == 2]
returns [[1,3],[2,4]].
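The `len(v) == 2` filter skips any tuple shared by three or more lists. If that case can occur, one possible sketch is to expand each group of IDs into all of its pairs with `itertools.combinations`. The function name `shared_tuple_pairs` is hypothetical and not part of the answer above.

```python
from collections import defaultdict
from itertools import combinations

pairsList = [
    [1, (11, 12), (13, 14)],
    [2, (21, 22), (23, 24)],
    [3, (31, 32), (13, 14)],
    [4, (43, 44), (21, 22)],
]

def shared_tuple_pairs(rows):
    """Return sorted ID pairs of rows that have at least one tuple in common."""
    index = defaultdict(list)
    for row in rows:
        row_id, tuples = row[0], row[1:]
        # set() avoids counting a tuple repeated within one row twice
        for t in set(tuples):
            index[t].append(row_id)
    pairs = set()
    for ids in index.values():
        # every two IDs sharing this tuple form a pair
        pairs.update(combinations(sorted(ids), 2))
    return sorted(pairs)

print(shared_tuple_pairs(pairsList))
```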
