top n keys with highest values in dictionary with tuples as keys - python

I want to get the top n keys of a dictionary with tuples as keys, where the first value of the tuple is a particular number (1 in the example below):
a = {}
a[1,2] = 3
a[1,0] =4
a[1,5] = 1
a[2,3] = 9
I want (1, 0) and (1, 2) to be returned, i.e. the keys with the highest values among those whose first element is 1.
This:
import heapq
k = heapq.nlargest(2, a, key=a.get(1,))
returns [1,4] and [1,3], the highest keys/tuples with first element = 1, though if I make it
k = heapq.nlargest(2, a, key=a.get(2,))
it returns the same thing?

First, keep only the keys whose first coordinate is 1. Otherwise, if fewer than n keys have 1 as their first coordinate, other tuples can end up in the result as well. Then you can use heapq as usual. For example:
a = {
    (1, 2): 3,
    (1, 0): 4,
    (1, 5): 1,
    (2, 3): 9
}
import heapq
print(heapq.nlargest(2, (k for k in a if k[0] == 1), key=lambda k: a[k]))
print(heapq.nlargest(2, (k for k in a if k[0] == 2), key=lambda k: a[k]))
Output:
[(1, 0), (1, 2)]
[(2, 3)]

The key parameter should be a function, but you are passing a.get(1,). That expression calls a.get(1,) right away, which is the same as a.get(1), which is the same as a.get(1, None).
The dictionary doesn't have a key 1, so the call returns None. You are therefore passing key=None, which is the same as not passing a key at all: the elements are compared directly, as if the identity function were the key.
heapq.nlargest then returns the 2 largest keys under ordinary tuple comparison, which is the result you saw.
This explains why a.get(1,) and a.get(2,) do the same thing. The reasoning above applies to both values, and you end up with key=None in both cases.
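A quick way to check this reasoning, as a minimal sketch using the dictionary from the question (the names with_get and without_key are only illustrative):

```python
import heapq

a = {(1, 2): 3, (1, 0): 4, (1, 5): 1, (2, 3): 9}

# a.get(1,) is evaluated immediately; there is no key 1, so it is None.
print(a.get(1,))  # None

# key=None means "compare the elements themselves", so this call is
# equivalent to calling nlargest without any key.
with_get = heapq.nlargest(2, a, key=a.get(1,))
without_key = heapq.nlargest(2, a)
print(with_get == without_key)  # True
```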
To achieve what you want use something like:
key=lambda x: (x[0] == 1, a[x])
If you use this kind of key often, you can create a key-maker function:
def make_key(value, container):
    def key(x):
        return x[0] == value, container[x]
    return key
using it as:
heapq.nlargest(2, a, key=make_key(1, a))
heapq.nlargest(2, a, key=make_key(2, a))
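One thing worth checking with this kind of key: it ranks the matching keys first but does not filter anything out. The sketch below reuses the dictionary from the question to show what happens when fewer than n keys match, and compares it with filtering first (the names ranked and filtered are only illustrative):

```python
import heapq

a = {(1, 2): 3, (1, 0): 4, (1, 5): 1, (2, 3): 9}

def make_key(value, container):
    def key(x):
        # Matching keys sort above non-matching ones (True > False);
        # ties are broken by the stored value.
        return x[0] == value, container[x]
    return key

# Only one key starts with 2, so the second slot is filled by the
# best non-matching key.
ranked = heapq.nlargest(2, a, key=make_key(2, a))
print(ranked)  # [(2, 3), (1, 0)]

# Filtering before ranking avoids that.
filtered = heapq.nlargest(2, (k for k in a if k[0] == 2), key=a.get)
print(filtered)  # [(2, 3)]
```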

Related

Dictionary with multiple values per key via for loop

Given a list in Python, I want to create a dictionary that stores all possible two-element sums as keys and the corresponding index pairs as values, e.g.
list = [1,0,-1, 0]
Then I would like to compute the dictionary {1: [{0,1}, {0,3}], 0: [{1,3}, {0,2}], -1: [{1,2}, {2,3}]}.
I am having trouble working out how to have a dictionary where one key corresponds to multiple values. If I use dict[sum] = {i,j}, I keep replacing the entries in my dictionary, whereas I would like to add to them.
Does anyone know if there exists a solution?
IIUC, use a dictionary with setdefault to add the results and itertools.combinations to generate the combinations of indices:
lst = [1,0,-1, 0]
from itertools import combinations
out = {}
for i,j in combinations(range(len(lst)), 2):
    a = lst[i] # first value
    b = lst[j] # second value
    S = a+b # sum of values
    # if the key is missing, add empty list
    # append combination of indices as value
    out.setdefault(S, []).append((i,j))
print(out)
Condensed variant:
out = {}
for i,j in combinations(range(len(lst)), 2):
    out.setdefault(lst[i]+lst[j], []).append((i,j))
output:
{ 1: [(0, 1), (0, 3)],
0: [(0, 2), (1, 3)],
-1: [(1, 2), (2, 3)]}
Try this:
arr = [1, 0, -1, 0]
pair_map = {}  # renamed from "map" so the built-in map() is not shadowed
for i in range(len(arr)):
    for j in range(i + 1, len(arr)):
        s = arr[i] + arr[j]
        if s not in pair_map:
            pair_map[s] = []
        pair_map[s].append((i, j))
print(pair_map)
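Another common way to get "one key, many values" is collections.defaultdict, which creates the empty list the first time a key is used. A minimal sketch, with pairs_by_sum as an illustrative name:

```python
from collections import defaultdict
from itertools import combinations

lst = [1, 0, -1, 0]

# defaultdict(list) builds the empty list on first access, so no
# explicit "is the key there yet?" check is needed.
pairs_by_sum = defaultdict(list)
for i, j in combinations(range(len(lst)), 2):
    pairs_by_sum[lst[i] + lst[j]].append((i, j))

print(dict(pairs_by_sum))
# {1: [(0, 1), (0, 3)], 0: [(0, 2), (1, 3)], -1: [(1, 2), (2, 3)]}
```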

Finding value in dict given an integer that can be found in between dictionary's tuple key

Given an x dictionary of tuple keys and string values:
x = {(0, 4): 'foo', (4,9): 'bar', (9,10): 'sheep'}
The task is to write a function that finds the value for a given number, e.g. if the user inputs 3, it should return 'foo'. We can assume that the keys do not overlap.
Another e.g., if user inputs 9, it should return 'bar'.
I've tried converting the x dict to a list and write the function as follows, but it's suboptimal if the range of values in the keys is extremely huge:
from itertools import chain
mappings = [None] * max(chain(*x))
for k in x:
    for i in range(k[0], k[1]):
        mappings[i] = x[k]
def myfunc(num):
    return mappings[num]
How else can the myfunc function be written?
Is there a better data structure to keep the mapping?
You can convert your keys to a NumPy array and use numpy.searchsorted to look up a query. Since the intervals are open on the left, I have incremented the lower bound of each key by 1 in the array.
Each query is O(log(n)).
Create an array:
import numpy as np
A = np.array([[k1+1, k2] for k1, k2 in x])
>>> A
array([[ 1,  4],
       [ 5,  9],
       [10, 10]])
Function to search query:
vals = list(x.values())  # values in the same order as the rows of A
def myfunc(num):
    ind1 = np.searchsorted(A[:, 0], num, 'right')
    ind2 = np.searchsorted(A[:, 1], num, 'left')
    if ind1 == 0 or ind2 == A.shape[0] or ind1 <= ind2: return None
    return vals[ind2]
Prints:
>>> myfunc(3)
'foo'
Iterate over the dictionary comparing to the keys:
x = {(0, 4): 'foo', (4, 9): 'bar', (9, 10): 'sheep'}
def find_tuple(dct, num):
    for tup, val in dct.items():
        if tup[0] <= num < tup[1]:
            return val
    return None
print(find_tuple(x, 3))
# foo
print(find_tuple(x, 9))
# sheep
print(find_tuple(x, 11))
# None
A better data structure would be a dictionary with just the left boundaries of the intervals (as keys) and the corresponding values. Then you can use bisect as the other answers mention.
import bisect
import math
x = {
    -math.inf: None,
    0: 'foo',
    4: 'bar',
    9: 'sheep',
    10: None,
}
def find_tuple(dct, num):
    idx = bisect.bisect_right(list(dct.keys()), num)
    return list(dct.values())[idx-1]
print(find_tuple(x, 3))
# foo
print(find_tuple(x, 9))
# sheep
print(find_tuple(x, 11))
# None
You could simply iterate through keys and compare the values (rather than creating a mapping). This is a bit more efficient than creating a mapping first, since you could have a key like (0, 100000) which will create needless overhead.
Edited answer based on comments from OP
x = {(0, 4): 'foo', (4,9): 'bar', (9,10): 'sheep'}
def find_value(k):
    for t1, t2 in x:
        if k > t1 and k <= t2: # edited based on comments
            return x[(t1, t2)]
    # if we end up here, we can't find a match
    # do whatever appropriate, e.g. return None or raise exception
    return None
Note: it's unclear from your tuple keys whether the ranges are inclusive. E.g. if a user inputs 4, should they get 'foo' or 'bar'? This affects the comparison in the function above (see the edit above, which should fulfill your requirement).
In the example above, an input of 4 returns 'foo', since it fulfills the condition k > 0 and k <= 4, so the function returns before continuing the loop.
Here's one solution using pandas.IntervalIndex and pandas.cut. Note, I "tweaked" the last key to (10, 11), because I'm using closed="left" in my IntervalIndex. You can change this if you want the intervals closed on different sides (or both):
import pandas as pd
x = {(0, 4): "foo", (4, 9): "bar", (10, 11): "sheep"}
bins = pd.IntervalIndex.from_tuples(x, closed="left")
result = pd.cut([3], bins)[0]
print(x[(result.left, result.right)])
Prints:
foo
Other solution using bisect module (assuming the ranges are continuous - so no "gaps"):
from bisect import bisect_left
x = {(0, 4): "foo", (4, 9): "bar", (10, 10): "sheep"}
bins, values = [], []
for k in sorted(x):
    bins.append(k[1]) # intervals are closed "right", eg. (0, 4]
    values.append(x[k])
idx = bisect_left(bins, 4)
print(values[idx])
Prints:
foo
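If the ranges may have gaps, the bisect approach can be extended with a check on the lower bound. The following is only a sketch under the assumptions that the intervals do not overlap and are open on the left and closed on the right, as in the question's examples (3 gives 'foo', 9 gives 'bar'); the names intervals, uppers and lookup are hypothetical:

```python
from bisect import bisect_left

x = {(0, 4): "foo", (4, 9): "bar", (9, 10): "sheep"}

# Build the sorted lookup lists once, not on every query.
intervals = sorted(x)
uppers = [hi for _, hi in intervals]

def lookup(num):
    # Index of the first interval whose upper bound is >= num.
    idx = bisect_left(uppers, num)
    if idx == len(intervals):
        return None  # num is above every interval
    lo, hi = intervals[idx]
    # The lower-bound check rejects numbers that fall into a gap.
    if lo < num <= hi:
        return x[(lo, hi)]
    return None

print(lookup(3))   # foo
print(lookup(9))   # bar
print(lookup(11))  # None
```

Each query costs one binary search, so the size of the ranges themselves does not matter.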

Sparse matrix subtraction

I need to write a function which gets a list of dictionaries (every dictionary represents a sparse matrix) and returns a dictionary of the subtraction matrix.
For example: for the list [{(1, 3): 2, (2, 7): 1}, {(1, 3): 6}] it needs to return {(1, 3): -4, (2, 7): 1} .
The matrices don't have to be the same size, the list can have more than two matrices and if the subtraction is 0 then it should not appear in the final dictionary.
I succeeded in getting the -4 but no matter what I write after defining x I get x == -6 and I can't tell why. I want to insert the -4 as the new value for the element.
lst = [{(1, 3): 2, (2, 7): 1}, {(1, 3): 6}]
def diff_sparse_matrices(lst):
    result = {}
    for dictionary in lst:
        for element in dictionary:
            if element not in result:
                result[element] = dictionary[element]
            if element in result:
                x = result[element] - dictionary[element]
def diff_sparse_matrices(lst):
    result = lst[0].copy()
    for matrix in lst[1:]:
        for coordinates, value in matrix.items():
            result[coordinates] = result.get(coordinates, 0) - value
            if result[coordinates] == 0:
                del result[coordinates]
    return result
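def diff_sparse_matrices(lst):
    result = lst[0].copy()
    for d in lst[1:]:
        for tup in d:
            if tup in result:
                result[tup] -= d[tup]
            else:
                result[tup] = -d[tup]
    # drop entries whose difference is 0, as the question requires
    return {tup: val for tup, val in result.items() if val != 0}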
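To see the requirements from the question in action (more than two matrices, zero results dropped), here is a small self-contained sketch. The function name diff_sparse and the three-matrix input are made up for illustration:

```python
def diff_sparse(matrices):
    # Start from a copy of the first matrix so the input is not modified.
    result = dict(matrices[0])
    for matrix in matrices[1:]:
        for position, value in matrix.items():
            # A position missing from the running result counts as 0.
            result[position] = result.get(position, 0) - value
    # Zero entries are dropped once, after all subtractions are done.
    return {pos: val for pos, val in result.items() if val != 0}

print(diff_sparse([{(1, 3): 2, (2, 7): 1}, {(1, 3): 6}]))
# {(1, 3): -4, (2, 7): 1}
print(diff_sparse([{(0, 0): 5}, {(0, 0): 2}, {(0, 0): 3, (1, 1): 4}]))
# {(1, 1): -4}
```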

Order dictionary with x and y coordinates in python

I have this problem.
I need to order these points, numbered 1 to 7:
1(4,2), 2(3, 5), 3(1,4), 4(1,1), 5(2,2), 6(1,3), 7(1,5)
and get this result:
4 , 6 , 3 , 5 , 2 , 1 , 7.
I am using a Python script that sorts by x correctly, but the sort by y is wrong.
I have tried sorted(dicts,key=itemgetter(1,2))
Can someone help me, please?
Try this:
sorted(dicts,key=itemgetter(1,0))
Indexing in Python starts at 0. itemgetter(1,0) sorts by the second element and then by the first element.
This sorts by the first coordinate of the tuple, and then sub-orders by the second coordinate. I.e. like alphabetical order, where "Aa" comes first, then "Ab", then "Ba", then "Bb". More literally: (1,1), (1,2), (2,1), (2,2), etc.
This will work IF (and only if) the tuple associated with #7 is actually out of order in your question (and should actually be between #3 and #5).
If this is NOT the case, see my other answer.
# Make it a dictionary, with the VALUE TUPLES as the KEYS, and the designator as the value
d = {(1,1):4, (1,3):6, (1,4):3, (2,2):5, (3,5):2, (4,2):1,(1,5):7}
# ALSO make a list of just the value tuples
l = [ (1,1), (1,3), (1,4), (2,2), (3,5), (4,2), (1,5)]
# Sort the list by the first element in each tuple, ignoring the second
new = sorted(l, key=lambda x: x[0])
# Create a new dictionary, basically for temp sorting
new_d = {}
# This iterates through the first sorted list "new"
# and creates a dictionary where the key is the first number of the value tuples
count = 0
# The extended range is because we don't know if any of the tuple values share any same numbers
for r in range(0, len(new)+1,1):
    count += 1
    new_d[r] = []
    for item in new:
        if item[0] == r:
            new_d[r].append(item)
print(new_d) # So it makes sense
# Make a final list to capture the ordered TUPLE VALUES
final_list = []
# Go through the same range as above
for r in range(0, len(new)+1,1):
    _list = new_d[r] # Grab the list for this first coordinate from the dict. Order does not matter here
    if len(_list) > 0: # If the list has any values...
        # Sort that list now by the SECOND tuple value
        _list = sorted(_list, key=lambda x: x[1])
        # Lists are ordered. So we can now just tack that ordered list onto the final list.
        # The order remains
        for item in _list:
            final_list.append(item)
# This is all the tuple values in order
print(final_list)
# If you need them correlated to their original numbers
by_designator_num = []
for i in final_list: # Take each tuple value in order
    by_designator_num.append(d[i]) # Use the tuple value as the key, to get the original designator number from the original "d" dictionary
print(by_designator_num)
OUTPUT:
[(1, 1), (1, 3), (1, 4), (1, 5), (2, 2), (3, 5), (4, 2)]
[4, 6, 3, 7, 5, 2, 1]
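For comparison, the same ordering can be obtained with a single sorted call, because Python compares tuples element by element. A minimal sketch using the same dictionary (ordered_points and ordered_ids are illustrative names):

```python
d = {(1, 1): 4, (1, 3): 6, (1, 4): 3, (2, 2): 5, (3, 5): 2, (4, 2): 1, (1, 5): 7}

# Tuples compare by first coordinate, then by second, so the default
# sort order is already "x first, then y".
ordered_points = sorted(d)
ordered_ids = [d[p] for p in ordered_points]

print(ordered_points)
# [(1, 1), (1, 3), (1, 4), (1, 5), (2, 2), (3, 5), (4, 2)]
print(ordered_ids)
# [4, 6, 3, 7, 5, 2, 1]
```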
Since you're searching visually from top-to-bottom, then left-to-right, this code is much simpler and provides the correct result. It basically does the equivalent of a visual scan, by checking for all tuples that are at each "y=n" position, and then sorting any "y=n" tuples based on the second number (left-to-right).
Just to be more consistent with the Cartesian number system, I've converted the points on the graph to (x,y) coordinates, with X-positive (increasing to the right) and y-negative (decreasing as they go down).
d = {(2,-4):1, (5,-3):2, (4,-1):3, (1,-1):4, (2,-2):5, (3,-1):6, (1,-5):7}
l = [(2,-4), (5,-3), (4,-1), (1,-1), (2,-2), (3,-1), (1,-5)]
results = []
# Use the length of the list. It's more than needed, but guarantees enough loops
for y in range(0, -len(l), -1):
    # For ONLY the items found at the specified y coordinate
    temp_list = []
    for i in l: # Loop through ALL the items in the list
        if i[1] == y: # If tuple is at this "y" coordinate then...
            temp_list.append(i) # ... append it to the temp list
    # Now sort the list based on the "x" position of the coordinate
    temp_list = sorted(temp_list, key=lambda x: x[0])
    results += temp_list # And just append it to the final result list
# Final TUPLES in order
print(results)
# If you need them correlated to their original numbers
by_designator_num = []
for i in results: # Take each tuple in order
    by_designator_num.append(d[i]) # Use the tuple value as the key, to get the original designator number from the original "d" dictionary
print(by_designator_num)
OR if you want it faster and more compact
d = {(2,-4):1, (5,-3):2, (4,-1):3, (1,-1):4, (2,-2):5, (3,-1):6, (1,-5):7}
l = [(2,-4), (5,-3), (4,-1), (1,-1), (2,-2), (3,-1), (1,-5)]
results = []
for y in range(0, -len(l), -1):
    results += sorted([i for i in l if i[1] == y ], key=lambda x: x[0])
print(results)
by_designator_num = [d[i] for i in results]
print(by_designator_num)
OUTPUT:
[(1, -1), (3, -1), (4, -1), (2, -2), (5, -3), (2, -4), (1, -5)]
[4, 6, 3, 5, 2, 1, 7]
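The top-to-bottom, then left-to-right scan can also be written as one sort with a composite key, which avoids looping over candidate y values. A sketch using the same (x, y) data:

```python
d = {(2, -4): 1, (5, -3): 2, (4, -1): 3, (1, -1): 4, (2, -2): 5, (3, -1): 6, (1, -5): 7}

# Sort by y descending (top to bottom), then by x ascending (left to right).
results = sorted(d, key=lambda p: (-p[1], p[0]))
by_designator_num = [d[p] for p in results]

print(results)
# [(1, -1), (3, -1), (4, -1), (2, -2), (5, -3), (2, -4), (1, -5)]
print(by_designator_num)
# [4, 6, 3, 5, 2, 1, 7]
```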

Better ways to find pairs that sum to N

Is there a faster way to write this? The function takes a list and a value, and finds the pairs of numeric values in that list that sum to N, without duplicates. I tried to make it faster by using sets instead of the list itself (however, I used count(), which I know is linear time). Any suggestions? I know there is probably a way.
def pairsum_n(list1, value):
    set1 = set(list1)
    solution = {(min(i, value - i) , max(i, value - i)) for i in set1 if value - i in set1}
    solution.remove((value/2,value/2)) if list1.count(value/2) < 2 else None
    return solution
"""
Example: value = 10, list1 = [1,2,3,4,5,6,7,8,9]
pairsum_n = { (1,9), (2,8), (3,7), (4,6) }
Example: value = 10, list2 = [5,6,7,5,7,5,3]
pairsum_n = { (5,5), (3,7) }
"""
Your approach is quite good, it just needs a few tweaks to make it more efficient. itertools is convenient, but it's not really suitable for this task because it produces so many unwanted pairs. It's ok if the input list is small, but it's too slow if the input list is large.
We can avoid producing duplicates by looping over the numbers in order, stopping when i >= value/2, after using a set to get rid of dupes.
def pairsum_n(list1, value):
    set1 = set(list1)
    list1 = sorted(set1)
    solution = []
    maxi = value / 2
    for i in list1:
        if i >= maxi:
            break
        j = value - i
        if j in set1:
            solution.append((i, j))
    return solution
Note that the original list1 is not modified. The assignment in this function creates a new local list1. If you do actually want (value/2, value/2) in the output, just change the break condition.
Here's a slightly more compact version.
def pairsum_n(list1, value):
    set1 = set(list1)
    solution = []
    for i in sorted(set1):
        j = value - i
        if i >= j:
            break
        if j in set1:
            solution.append((i, j))
    return solution
It's possible to condense this further, eg using itertools.takewhile, but it will be harder to read and there won't be any improvement in efficiency.
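If the (value/2, value/2) pair should appear only when that number occurs at least twice, as in the second example from the question, one possible sketch counts occurrences with collections.Counter. The function name pairsum_with_doubles is hypothetical, and this is a variation on the idea above, not the code from the answer:

```python
from collections import Counter

def pairsum_with_doubles(nums, value):
    counts = Counter(nums)  # number -> how often it occurs
    pairs = []
    for i in sorted(counts):
        j = value - i
        if i > j:
            break  # every remaining pair would be a mirrored duplicate
        if i == j:
            # A number can pair with itself only if it occurs twice.
            if counts[i] >= 2:
                pairs.append((i, j))
        elif j in counts:
            pairs.append((i, j))
    return pairs

print(pairsum_with_doubles([1, 2, 3, 4, 5, 6, 7, 8, 9], 10))
# [(1, 9), (2, 8), (3, 7), (4, 6)]
print(pairsum_with_doubles([5, 6, 7, 5, 7, 5, 3], 10))
# [(3, 7), (5, 5)]
```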
Try this, running time O(nlogn):
v = [1, 2, 3, 4, 5, 6, 7, 8, 9]
l = 0
r = len(v)-1
def myFunc(v, value):
    ans = []
    # this block searches for the pair (value//2, value//2)
    if value % 2 == 0:
        c = [i for i in v if i == value // 2]
        if len(c) >= 2:
            ans.append((c[0], c[1]))
    v = list(set(v))
    l = 0
    r = len(v)-1
    v.sort()
    while l<len(v) and r >= 0 and l < r:
        if v[l] + v[r] == value:
            ans.append((v[l], v[r]))
            l += 1
            r -= 1
        elif v[l] + v[r] < value:
            l += 1
        else:
            r -= 1
    return list(set(ans))
This is called the two-pointer technique, and it works as follows. First of all, sort the array. This imposes a minimum running time of O(nlogn). Then set two pointers, l pointing at the start of the array and r pointing at its last element (the names stand for left and right).
Now, look at the list. If the sum of the values at positions l and r is lower than the value we are looking for, then we need to increment l. If it's greater, we need to decrement r.
If v[l] + v[r] == value, then we can advance both pointers (increment l and decrement r), since in any case we want to skip the combination of values (v[l], v[r]) as we don't want duplicates.
Timings: this is actually slower than the other 2 solutions. Because it produces many combinations that are not actually needed, it gets worse the bigger the lists are.
You can use itertools.combinations to produce the 2-tuple-combinations for you.
Put them into a set if they match your value, then return as set/list:
from itertools import combinations
def pairsum_n(list1, value):
    """Returns the unique list of pairs of combinations of numbers from
    list1 that sum up to `value`. Reorders the values to (min_value,max_value)."""
    result = set()
    for n in combinations(list1, 2):
        if sum(n) == value:
            result.add( (min(n),max(n)) )
    return list(result)
    # more ugly one-liner:
    # return list(set(((min(n),max(n)) for n in combinations(list1,2) if sum(n)==value)))
data = [1,2,3,4,5,6,6,5,4,3,2,1]
print(pairsum_n(data,7))
Output:
[(1, 6), (2, 5), (3, 4)]
Fun little thing, with some sorting overhead you can get all at once:
def pairsum_n2(data, count_nums=2):
    """Generate a dict with all count_nums-tuples from data. Key into the
    dict is the sum of all tuple-values."""
    d = {}
    for n in (tuple(sorted(p)) for p in combinations(data,count_nums)):
        d.setdefault(sum(n),set()).add(n)
    return d
get_all = pairsum_n2(data,2) # 2 == number of numbers to combine
for k in get_all:
    print(k," -> ", get_all[k])
Output:
3 -> {(1, 2)}
4 -> {(1, 3), (2, 2)}
5 -> {(2, 3), (1, 4)}
6 -> {(1, 5), (2, 4), (3, 3)}
7 -> {(3, 4), (2, 5), (1, 6)}
2 -> {(1, 1)}
8 -> {(2, 6), (4, 4), (3, 5)}
9 -> {(4, 5), (3, 6)}
10 -> {(5, 5), (4, 6)}
11 -> {(5, 6)}
12 -> {(6, 6)}
And then just access the one you need via:
print(get_all.get(7,"Not possible")) # {(3, 4), (2, 5), (1, 6)}
print(get_all.get(17,"Not possible")) # Not possible
Here is another solution. It's a lot faster than the one I just wrote, though not as fast as PM 2Ring's answer:
from itertools import filterfalse
def pairsum_n(list1, value):
    set1 = set(list1)
    if list1.count(value/2) < 2:
        # discard() does nothing if value/2 is absent; remove() would raise KeyError
        set1.discard(value/2)
    return set((min(x, value - x) , max(x, value - x)) for x in filterfalse(lambda x: (value - x) not in set1, set1))
