Complex heapq lazy merge (minimize space used) - python

Given a list of sorted lists, I wish to produce a sorted list of output.
This would be easy:
nums : List[List[int]]
h = heapq.merge(*nums)
However, I also wish to tag each element of the output, with the index of the inner list from which it originated. Eg.
nums = [[1,3,5], [2,6], [4]]
h = _ # ???
for x in h:
    print(x)
# Outputs
# (1,0)
# (2,1)
# (3,0)
# (4,2)
# (5,0)
# (6,1)
I have written a version that works,
h = heapq.merge(*map(lambda l: map(lambda x: (x,l[0]), l[1]), enumerate(nums)))
But I'm afraid I might have lost a desirable space-complexity guarantee. How can I know whether the (transformed) inner lists are being materialized in memory or not? (And what exactly does the * do in my attempt?)

Since default tuple comparison is lexicographic, you can convert the innermost-element x to (x, i) where i is the index of the list containing x in nums. For example,
import heapq
from itertools import repeat
nums = [[1,3,5], [2,6], [4]]
nums_with_inds = (zip(lst, repeat(i)) for i, lst in enumerate(nums))
res = heapq.merge(*nums_with_inds)
for tup in res:
    print(tup)
# (1, 0)
# (2, 1)
# (3, 0)
# (4, 2)
# (5, 0)
# (6, 1)
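On the space question: heapq.merge returns a lazy iterator, and the * only unpacks the outer generator into one positional argument per inner list, so the zip objects themselves are not consumed up front. A minimal sketch to check this, assuming a hypothetical tracked() generator (not part of heapq) that logs every item pulled from an input:

```python
import heapq

pulled = []  # log of the items actually requested from the inputs


def tracked(lst, i):
    """Hypothetical helper: yield (x, i) pairs and record each pull."""
    for x in lst:
        pulled.append((x, i))
        yield (x, i)


nums = [[1, 3, 5], [2, 6], [4]]
# '*' unpacks the outer generator: merge receives three lazy generators
merged = heapq.merge(*(tracked(lst, i) for i, lst in enumerate(nums)))

pulled_before = list(pulled)       # nothing has been consumed yet
first = next(merged)               # (1, 0)
pulled_after_first = list(pulled)  # at most one item per input so far
rest = list(merged)

print(pulled_before)            # []
print(first)                    # (1, 0)
print(len(pulled_after_first))  # no more than len(nums)
```

So the extra memory held by the merge is proportional to the number of inner lists, not to the total number of elements.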

Related

Searching for intersections in two tuples of tuples in python

I have the following problem. I'm reading data from stdin into a list, which I then convert to a tuple in the following way:
x = int(input())
f = []
for i in range(x):
    a, b = map(int, input().split())
    f.append([a, b])

def to_tuple(lst):
    return tuple(to_tuple(i) if isinstance(i, list) else i for i in lst)
After this I have two tuples of tuples that look something like this:
f = ((0, 1), (1, 2), (0, 2), (0, 3))
s = (((0,), (1, 2, 3)), ((0, 1), (2, 3)), ((0, 1, 2), (3,)))
What I'm trying to do is find the number of intersections between all inner tuples of f and each tuple of s. In my case an "intersection" should be read as an "edge" between tuples: f holds all possible edges, and for each tuple of s I want to count the edges that run between its two inner tuples. So for the example it should print [3,3,1].
Basically, I know how to do the simple case of intersection: one can just use set() and then apply a.intersection(b). But how should I proceed in my case?
Many thanks, and sorry if the question was already asked before :=)
I am sure this can be solved in different ways, but I believe this is the easiest.
out = set()  # holds the output
for ff in f:  # loop through f tuple
    ff = set(ff)  # convert to set
    for ss1, ss2 in s:  # loop through s tuple
        # you can select which tuple to do the intersection on.
        # here I am doing the intersection on both inner tuples in the s tuple.
        ss1 = set(ss1)  # convert to set
        ss2 = set(ss2)
        out.update(ff.intersection(ss1))  # intersection and add to out
        out.update(ff.intersection(ss2))  # intersection and add to out
# if you want your output to be in list format
out = list(out)
This is an example of how you can proceed
a = ((1,1),(1,2))
b = (((1,2),(3,1)),((3,2),(1,2)),((1,4),))
l = []
for t in b:
    c = [i for i in a for j in t if i == j]
    l.append(c)
print(l)
General answer for overall amount of edges:
def cnt_edges(a, b):
    edge_cnt = 0
    for i in range(len(a)):
        node1 = a[i][0]
        node2 = a[i][1]
        for j in range(len(b)):
            inner_node1 = b[j][0]
            inner_node2 = b[j][1]
            # an edge counts if its endpoints lie in different inner tuples
            if (node1 in inner_node1 and node2 in inner_node2) or (node1 in inner_node2 and node2 in inner_node1):
                edge_cnt += 1
    return edge_cnt

a = ((0, 1), (0, 2), (0, 3))
# one-element tuples need a trailing comma: (0) is just the int 0
b = (((0,), (1, 2, 3)), ((0, 1), (2, 3)), ((0, 1, 2), (3,)))
print(cnt_edges(a, b))  # 6
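The question asks for one count per tuple of s rather than a single total. A minimal sketch of that variant, assuming each element of s is a pair of node groups; the name edges_per_partition is made up for illustration:

```python
def edges_per_partition(edges, partitions):
    """Count, for each (left, right) pair, the edges with one end in each side."""
    counts = []
    for left, right in partitions:
        left, right = set(left), set(right)  # sets give O(1) membership tests
        crossing = sum(
            1
            for u, v in edges
            if (u in left and v in right) or (u in right and v in left)
        )
        counts.append(crossing)
    return counts


f = ((0, 1), (1, 2), (0, 2), (0, 3))
s = (((0,), (1, 2, 3)), ((0, 1), (2, 3)), ((0, 1, 2), (3,)))
result = edges_per_partition(f, s)
print(result)  # [3, 3, 1]
```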

Order dictionary with x and y coordinates in python

I have this problem.
I need to order these points, numbered 1-7:
1(4,2), 2(3, 5), 3(1,4), 4(1,1), 5(2,2), 6(1,3), 7(1,5)
and get this result:
4 , 6 , 3 , 5 , 2 , 1 , 7.
I am using a Python script to sort by x, which works, but the sort by y is wrong.
I have tried sorted(dicts,key=itemgetter(1,2))
Can someone help me, please?
Try this:
sorted(dicts,key=itemgetter(1,0))
Indexing in Python starts at 0, so itemgetter(1,0) sorts by the second element and then by the first element.
This sorts the points by the first coordinate of the tuple, and then sub-orders them by the second coordinate. That is, like alphabetical order, where "Aa" comes first, then "Ab", then "Ba", then "Bb". More literally: (1,1), (1,2), (2,1), (2,2), etc.
This will work IF (and only if) the tuple value pair associated with #7 is actually out of order in your question (and should actually be between #3 and #5).
If this is NOT the case, see my other answer.
# Make it a dictionary, with the VALUE TUPLES as the KEYS, and the designator as the value
d = {(1,1):4, (1,3):6, (1,4):3, (2,2):5, (3,5):2, (4,2):1, (1,5):7}
# ALSO make a list of just the value tuples
l = [(1,1), (1,3), (1,4), (2,2), (3,5), (4,2), (1,5)]
# Sort the list by the first element in each tuple, ignoring the second
new = sorted(l, key=lambda x: x[0])
# Create a new dictionary, basically for temp sorting
new_d = {}
# This iterates through the first sorted list "new"
# and creates a dictionary where the key is the first number of the value tuples
count = 0
# The extended range is because we don't know if any of the tuple values share any same numbers
for r in range(0, len(new)+1, 1):
    count += 1
    new_d[r] = []
    for item in new:
        if item[0] == r:
            new_d[r].append(item)
print(new_d)  # So it makes sense
# Make a final list to capture the ordered TUPLE VALUES
final_list = []
# Go through the same range as above
for r in range(0, len(new)+1, 1):
    _list = new_d[r]  # Grab the list stored under this key. Order does not matter here
    if len(_list) > 0:  # If the list has any values...
        # Sort that list now by the SECOND tuple value
        _list = sorted(_list, key=lambda x: x[1])
        # Lists are ordered. So we can now just tack that ordered list onto the final list.
        # The order remains
        for item in _list:
            final_list.append(item)
# This is all the tuple values in order
print(final_list)
# If you need them correlated to their original numbers
by_designator_num = []
for i in final_list:  # Each tuple value, in sorted order
    by_designator_num.append(d[i])  # Use the tuple value as the key, to get the original designator number from the original "d" dictionary
print(by_designator_num)
OUTPUT:
[(1, 1), (1, 3), (1, 4), (1, 5), (2, 2), (3, 5), (4, 2)]
[4, 6, 3, 7, 5, 2, 1]
Since you're searching visually from top to bottom, then left to right, this code is much simpler and provides the correct result. It basically does the equivalent of a visual scan: it collects all tuples at each "y=n" position, and then sorts those "y=n" tuples by their x coordinate (left to right).
Just to be more consistent with the Cartesian number system, I've converted the points on the graph to (x,y) coordinates, with X-positive (increasing to the right) and y-negative (decreasing as they go down).
d = {(2,-4):1, (5,-3):2, (4,-1):3, (1,-1):4, (2,-2):5, (3,-1):6, (1,-5):7}
l = [(2,-4), (5,-3), (4,-1), (1,-1), (2,-2), (3,-1), (1,-5)]
results = []
# Use the length of the list. It's more than needed, but guarantees enough loops
for y in range(0, -len(l), -1):
    # For ONLY the items found at the specified y coordinate
    temp_list = []
    for i in l:  # Loop through ALL the items in the list
        if i[1] == y:  # If tuple is at this "y" coordinate then...
            temp_list.append(i)  # ... append it to the temp list
    # Now sort the list based on the "x" position of the coordinate
    temp_list = sorted(temp_list, key=lambda x: x[0])
    results += temp_list  # And just append it to the final result list
# Final TUPLES in order
print(results)
# If you need them correlated to their original numbers
by_designator_num = []
for i in results:  # Each tuple value, in sorted order
    by_designator_num.append(d[i])  # Use the tuple value as the key, to get the original designator number from the original "d" dictionary
print(by_designator_num)
OR if you want it faster and more compact
d = {(2,-4):1, (5,-3):2, (4,-1):3, (1,-1):4, (2,-2):5, (3,-1):6, (1,-5):7}
l = [(2,-4), (5,-3), (4,-1), (1,-1), (2,-2), (3,-1), (1,-5)]
results = []
for y in range(0, -len(l), -1):
    results += sorted([i for i in l if i[1] == y], key=lambda x: x[0])
print(results)
by_designator_num = [d[i] for i in results]
print(by_designator_num)
OUTPUT:
[(1, -1), (3, -1), (4, -1), (2, -2), (5, -3), (2, -4), (1, -5)]
[4, 6, 3, 5, 2, 1, 7]
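The same top-to-bottom, left-to-right scan can also be written as a single sort with a compound key. This is only a sketch built on the converted (x, y) coordinates used in this answer, not on the original numbers from the question:

```python
# Converted (x, y) coordinates from the answer above, mapped to designators
d = {(2, -4): 1, (5, -3): 2, (4, -1): 3, (1, -1): 4,
     (2, -2): 5, (3, -1): 6, (1, -5): 7}

# Sort by descending y (top to bottom), then ascending x (left to right)
ordered = sorted(d, key=lambda p: (-p[1], p[0]))
by_designator_num = [d[p] for p in ordered]

print(ordered)
# [(1, -1), (3, -1), (4, -1), (2, -2), (5, -3), (2, -4), (1, -5)]
print(by_designator_num)
# [4, 6, 3, 5, 2, 1, 7]
```

Unlike the loop over range(0, -len(l), -1), this does not depend on the y values falling inside a fixed range.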

Find indexes of repeated elements in an array (Python, NumPy)

Assume, I have a NumPy-array of integers, as:
[34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
I want to find the start and end indices in the array where a value is repeated more than x times (say 5 times). In the case above, those are the values 22 and 6. The start index of the repeated 22 is 3 and the end index is 8. The same goes for the repeated 6.
Is there a special tool in Python that is helpful?
Otherwise, I would loop through the array index by index and compare the current value with the previous one.
Regards.
Using np.diff and the method given here by @WarrenWeckesser for finding runs of zeros in an array:
import numpy as np
def zero_runs(a):  # from link
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges

a = [34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
zero_runs(np.diff(a))
Out[87]:
array([[ 3,  8],
       [15, 22]], dtype=int32)
This can then be filtered on the difference between the start & end of the run:
runs = zero_runs(np.diff(a))
runs[runs[:, 1]-runs[:, 0]>5] # runs of 7 or more, to illustrate filter
Out[96]: array([[15, 22]], dtype=int32)
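For comparison, here is a sketch that finds the run boundaries directly from the positions where the value changes, instead of going through zero_runs. The function name find_runs and its min_len parameter are made up for illustration:

```python
import numpy as np


def find_runs(values, min_len):
    """Return (start, end) index pairs, end inclusive, of runs of at least min_len equal values."""
    arr = np.asarray(values)
    if arr.size == 0:
        return []
    # Indices at which a new run begins (the value differs from its predecessor)
    change = np.flatnonzero(arr[1:] != arr[:-1]) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [arr.size])) - 1
    keep = (ends - starts + 1) >= min_len
    return [(int(s), int(e)) for s, e in zip(starts[keep], ends[keep])]


a = [34, 2, 3, 22, 22, 22, 22, 22, 22, 18, 90, 5, -55, -19, 22,
     6, 6, 6, 6, 6, 6, 6, 6, 23, 53, 1, 5, -42, 82]
print(find_runs(a, 5))  # [(3, 8), (15, 22)]
print(find_runs(a, 7))  # [(15, 22)]
```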
Here is a solution using Python's native itertools.
Code
import itertools as it
def find_ranges(lst, n=2):
    """Return ranges for `n` or more repeated values."""
    groups = ((k, tuple(g)) for k, g in it.groupby(enumerate(lst), lambda x: x[-1]))
    repeated = (idx_g for k, idx_g in groups if len(idx_g) >= n)
    return ((sub[0][0], sub[-1][0]) for sub in repeated)

lst = [34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
list(find_ranges(lst, 5))
# [(3, 8), (15, 22)]
Tests
import nose.tools as nt
def test_ranges(f):
    """Verify list results identifying ranges."""
    nt.eq_(list(f([])), [])
    nt.eq_(list(f([0, 1,1,1,1,1,1, 2], 5)), [(1, 6)])
    nt.eq_(list(f([1,1,1,1,1,1, 2,2, 1, 3, 1,1,1,1,1,1], 5)), [(0, 5), (10, 15)])
    nt.eq_(list(f([1,1, 2, 1,1,1,1, 2, 1,1,1], 3)), [(3, 6), (8, 10)])
    nt.eq_(list(f([1,1,1,1, 2, 1,1,1, 2, 1,1,1,1], 3)), [(0, 3), (5, 7), (9, 12)])

test_ranges(find_ranges)
This example captures (index, element) pairs in lst, and then groups them by element. Only repeated pairs are retained. Finally, first and last pairs are sliced, yielding (start, end) indices from each repeated group.
See also this post for finding ranges of indices using itertools.groupby.
There really isn't a great short-cut for this. You can do something like:
mult = 5
found_at = None
for i in range(len(val_list) - mult + 1):
    # list.index() cannot search for a sub-list, so compare slices directly
    if val_list[i:i + mult] == [val_list[i]] * mult:
        found_at = i
        break
I leave the not-found case and longer sequence detection to you.
If you're looking for value repeated n times in list L, you could do something like this:
def find_repeat(value, n, L):
    look_for = [value for _ in range(n)]
    for i in range(len(L)):
        if L[i] == value and L[i:i+n] == look_for:
            return i, i+n
Here is a relatively quick solution which also tells you how many copies were in the run. Some of this code was borrowed from KAL's solution.
# Return the start and (1-past-the-end) indices of the first instance of
# at least min_count copies of element value in container l
def find_repeat(value, min_count, l):
    for i in range(len(l)):
        count = 0
        # the bounds check avoids an IndexError when a run reaches the end of l
        while i + count < len(l) and l[i + count] == value:
            count += 1
        if count >= min_count:
            return i, i + count
I had a similar requirement. This is what I came up with, using NumPy and a generator expression:
import numpy as np
A=[34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
Find the unique values and the index of the first occurrence of each:
_, ind = np.unique(A,return_index=True)
np.unique sorts by value, so sort the indices to get them back in the original order:
ind = np.sort(ind)
ind now contains the index of the first element of each repeating group, which shows up as a gap between consecutive indices.
Their diff gives the number of elements in a group. Filtering with np.diff(ind)>5 gives a boolean array that is True at the position of each long group's start index in ind. The group's end index is one less than the entry of ind that follows. Note that a run at the very end of the array is not detected, because there is no following entry to compare against.
Create a dict with the repeating element as the key and a tuple of the start and end indices of that group as the value:
rep_groups = dict((A[ind[i]], (ind[i], ind[i+1]-1)) for i,v in enumerate(np.diff(ind)>5) if v)

top n keys with highest values in dictionary with tuples as keys

I want to get the top n keys of a dictionary with tuples as keys, where the first value of the tuple is a particular number (1 in the example below):
a = {}
a[1,2] = 3
a[1,0] =4
a[1,5] = 1
a[2,3] = 9
I want [1,0] and [1,2] to be returned, where the first element of the tuple/key = 1
this
import heapq
k = heapq.nlargest(2, a, key=a.get(1,))
returns [1,4] and [1,3], the highest keys/tuples with first element = 1, though if I make it
k = heapq.nlargest(2, a, key=a.get(2,))
it returns the same thing?
First you should take only the keys with first coordinate 1. Otherwise, there is the chance if there are a few elements with 1 as first coordinate, to get other tuples also. Then you can use heapq normally. For example:
a = {
    (1, 2): 3,
    (1, 0): 4,
    (1, 5): 1,
    (2, 3): 9
}
import heapq
print(heapq.nlargest(2, (k for k in a if k[0] == 1), key=lambda k: a[k]))
print(heapq.nlargest(2, (k for k in a if k[0] == 2), key=lambda k: a[k]))
Output:
[(1, 0), (1, 2)]
[(2, 3)]
The key parameter should be a function, but you are passing in a.get(1,). That calls a.get(1,) immediately, which is the same as a.get(1), which is the same as a.get(1, None).
The dictionary doesn't have a 1 key, so the call returns None. You are therefore effectively passing key=None, which is the same as not passing a key at all: the elements themselves are compared.
heapq.nlargest then returns the 2 largest keys under ordinary tuple comparison, ignoring the values stored in the dictionary.
This explains why using a.get(1,) and a.get(2,) does the same thing. The above reasoning works for both values and you end up with key=None in both cases.
To achieve what you want use something like:
key=lambda x: (x[0] == 1, a[x])
If you find yourself using this kind of keys often you can create a key maker function:
def make_key(value, container):
    def key(x):
        return x[0] == value, container[x]
    return key
using it as:
heapq.nlargest(2, a, key=make_key(1, a))
heapq.nlargest(2, a, key=make_key(2, a))
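A short sketch that puts the two answers side by side. The helper name top_keys is hypothetical; it filters first and then ranks by value, while the tuple-valued key only ranks matching keys ahead of the others without excluding them:

```python
import heapq

a = {(1, 2): 3, (1, 0): 4, (1, 5): 1, (2, 3): 9}

# The trailing comma inside a call does not build a tuple: a.get(1,) is a.get(1),
# and 1 is not a key of the dictionary.
assert a.get(1,) is None


def top_keys(d, first, n):
    """Hypothetical helper: n keys with the largest values among keys whose first item is `first`."""
    candidates = (k for k in d if k[0] == first)
    return heapq.nlargest(n, candidates, key=d.get)


print(top_keys(a, 1, 2))  # [(1, 0), (1, 2)]
print(top_keys(a, 2, 2))  # [(2, 3)]

# Ranking without filtering: a non-matching key fills the second slot
mixed = heapq.nlargest(2, a, key=lambda x: (x[0] == 2, a[x]))
print(mixed)  # [(2, 3), (1, 0)]
```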

getting all the tiles with the maximum value in a 2d grid (a multidimensional list)

I have a 2d grid (multidimensional list), for example [[1,2,3],[1,3,3],[1,1,1]], and I want to get the positions of all elements that hold the max value, so in this case that would be: (0,2), (1,1) and (1,2).
I now have a very clumsy way which works, like this:
max_val = -999999
for i in range(board.get_dim()):
    for j in range(board.get_dim()):
        if scores[i][j] > max_val:
            max_val = scores[i][j]
max_coords = []
for i in range(board.get_dim()):
    for j in range(board.get_dim()):
        if scores[i][j] == max_val:
            max_coords.append((i,j))
But I was hoping someone could point me to a more concise solution?
One way is to use itertools.chain with max() to get the max value and then use a list comprehension with enumerate() to get the indices:
>>> from itertools import chain
>>> lst = [[1,2,3], [1,3,3], [1,1,1]]
## find the max value on a flattened version of the list
>>> max_val = max(chain.from_iterable(lst))
>>> [(i, j) for i, x in enumerate(lst) for j, y in enumerate(x) if y == max_val]
[(0, 2), (1, 1), (1, 2)]
NumPy makes it very easy though:
>>> import numpy as np
>>> arr = np.array(lst)
>>> [tuple(ij) for ij in np.argwhere(arr == arr.max()).tolist()]
[(0, 2), (1, 1), (1, 2)]
