Accessing value in a dict ignoring the key's second value - python

I have a dictionary L whose keys are tuples of length 2: the first element is an index, the second element is either 0 or 1. I'm defining several functions. In one of them, I need to consider the second element of the tuple, so I need it to stay there.
But now I have trouble in another function, in which I do not care about it at all. I have to retrieve the dict value for a given index (the first element of the tuple), but I have no idea whether the second value is a 0 or a 1.
Is there a wildcard, or something that I can pass that means "either 0 or 1"?
To make things clearer, I would like to have something like:
needed_value = L.get((given_index, either))
where "either" could be 0 or 1.
For now, I wrote an if/else, but it seems silly, because both branches just assign the value.
Thank you very much,
I hope I didn't miss a preexisting solution for this problem!
Edit:
My dict is something like:
L = {(7, 1): 0, (2, 0): 1, (5, 1): 4, (1, 1): 2, (11, 0): 3}
So, I know for sure that the second value in the keys is 0 or (exclusive) 1. Moreover, the first value is unique (if it is relevant).
The code with if/else was (more or less, I do not have it anymore):
if (i, 1) in L:
    tau = L[(i, 1)]
elif (i, 0) in L:
    tau = L[(i, 0)]

No, there's no way to do this. If you need to retrieve elements by the first part only, then you should make that the key and store the other part in the value.
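A minimal sketch of that restructuring, assuming (as the question's edit states) that the first element of each key is unique; the name by_index is only illustrative:

```python
L = {(7, 1): 0, (2, 0): 1, (5, 1): 4, (1, 1): 2, (11, 0): 3}

# Re-key by the first element and keep the 0/1 flag next to the value
# (only safe because each first element appears in exactly one key).
by_index = {i: (flag, value) for (i, flag), value in L.items()}

flag, tau = by_index[5]   # no need to know the flag in advance
print(flag, tau)          # 1 4
```

The function that still needs the 0/1 flag can read it from by_index[i][0].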

No. Dictionary keys are hashed and you can't query them database-style with partial tuple matches.
What you can do, if you are sure that at least one of the 0 or 1 keys exists, is to use the optional default argument of dict.get (if both exist, the (index, 0) entry wins):
L = {(10, 1): 5, (10, 2): 6, (10, 3): 7}
val = L.get((10, 0), L.get((10, 1)))
print(val) # 5
Alternatively, and to account for the case when both (10, 0) and (10, 1) exist, you can use a custom function:
L = {(10, 1): 5, (10, 2): 6, (10, 3): 7, (10, 0): 8}
def get_val(x, key):
    try:
        return x[(key, 0)]
    except KeyError:
        return x[(key, 1)]
val = get_val(L, 10)
print(val) # 8

Assuming you have something like this:
L = {(10, 1): 5, (10, 0): 6, (20, 0): 7}
And you want to get all the values that correspond to keys starting with, e.g., 10, i.e. (10, 0) and (10, 1):
to_match = 10
You can do:
res = [v for k, v in L.items() if any(k == (to_match, x) for x in (0, 1))]
which returns:
[5, 6]

As everyone has already mentioned, you can't do this out of the box. Now, if you need both ways to get the value(s), one from the full key and another from a "partial" key, a solution might be to maintain a key[0] => [keys...] mapping in parallel, i.e.:
from collections import defaultdict
L = {(10, 1): 5, (10, 0): 6, (20, 0): 7}
index = defaultdict(list)
for key in L:
    index[key[0]].append(key)
Then you can get the list of "complete" keys for a partial one:
keys = index.get(10)
values = [L[k] for k in keys]
Of course, this means you have to keep your index up to date, so you would need to wrap all of this in a class to make sure every update to L is reflected in index.
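A minimal sketch of such a wrapper, assuming tuple keys. PartialKeyDict and values_for are hypothetical names, and only item assignment and deletion keep the index in sync here; update(), pop(), setdefault() and similar methods would also need overriding in real use:

```python
from collections import defaultdict


class PartialKeyDict(dict):
    """Hypothetical dict subclass with a key[0] -> [full keys] side index."""

    def __init__(self, *args, **kwargs):
        super().__init__()
        self._index = defaultdict(list)
        for key, value in dict(*args, **kwargs).items():
            self[key] = value  # route through __setitem__ so the index is built

    def __setitem__(self, key, value):
        if key not in self:  # only new keys need indexing
            self._index[key[0]].append(key)
        super().__setitem__(key, value)

    def __delitem__(self, key):
        super().__delitem__(key)
        self._index[key[0]].remove(key)
        if not self._index[key[0]]:
            del self._index[key[0]]  # drop empty buckets

    def values_for(self, first):
        """All values whose key starts with `first` (empty list if none)."""
        return [self[k] for k in self._index.get(first, [])]


L = PartialKeyDict({(10, 1): 5, (10, 0): 6, (20, 0): 7})
print(L.values_for(10))                    # [5, 6]
L[(20, 1)] = 8
del L[(10, 1)]
print(L.values_for(10), L.values_for(20))  # [6] [7, 8]
```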

Related

How to create a list of all values within a certain range from each other?

I have a list of tuples:
x = [(2, 10), (4, 5), (8, 10), (9, 11), (10, 15)]
I'm trying to compare the first values in all the tuples to see if they are within 1 of each other. If they are within 1, I want to aggregate (sum) the second values of the tuples, and take the mean of the first values.
The output list would look like this:
[(2, 10), (4, 5), (9, 36)]
Notice that 8 and 10 have a difference of 2, but they're both only 1 away from 9, so all three get aggregated.
I have been trying something along these lines, but it's not capturing runs of values like 8, 9, and 10. It also still keeps the original values, even when they have been aggregated together.
import numpy as np
tuple_list = [(2, 10), (4, 5), (8, 10), (9, 11), (10, 15)]
output_list = []
for x1, y1 in tuple_list:
    for x2, y2 in tuple_list:
        if x1 == x2:
            continue
        if np.abs(x1 - x2) <= 1:
            output_list.append((np.mean([x1, x2]), y1 + y2))
        else:
            output_list.append((x1, y1))
output_list = list(set(output_list))
You can do it in a list comprehension using groupby (from itertools). The grouping key will be the difference between the first value and the tuple's index in the list. When the values are 1 apart, this difference will be constant and the tuples will be part of the same group.
For example: [2, 4, 8, 9, 10] minus their indexes [0, 1, 2, 3, 4] gives [2, 3, 6, 6, 6], forming 3 groups: [2], [4] and [8, 9, 10].
from itertools import groupby
x = [(2, 10), (4, 5), (8, 10), (9, 11), (10, 15)]
y = [ (sum(k)/len(k),sum(v)) # output tuple
for i in [enumerate(x)] # sequence iterator
for _,g in groupby(x,lambda t:t[0]-next(i)[0]) # group by sequence
for k,v in [list(zip(*g))] ] # lists of keys & values
print(y)
[(2.0, 10), (4.0, 5), (9.0, 36)]
The for k,v in [list(zip(*g))] part is a bit tricky, but what it does is transform a list of tuples (in a group) into two tuples (k and v), with k containing the first item of each tuple and v containing the second items.
e.g. if g is ((8,10),(9,11),(10,15)) then k will be (8,9,10) and v will be (10,11,15)
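For readers who find the next(i) trick hard to follow, here is a sketch of the same grouping idea written as a plain function. merge_consecutive is a hypothetical name; like the answer above, it assumes the first values are distinct integers, and it sorts a copy of the input first:

```python
from itertools import groupby


def merge_consecutive(pairs):
    """Merge tuples whose first values form runs of consecutive integers.

    Each run becomes (mean of first values, sum of second values).
    Assumes the first values are distinct integers.
    """
    merged = []
    ordered = sorted(pairs)
    # value minus position stays constant along a run of consecutive integers
    runs = groupby(enumerate(ordered), key=lambda ip: ip[1][0] - ip[0])
    for _, run in runs:
        firsts, seconds = zip(*(t for _, t in run))
        merged.append((sum(firsts) / len(firsts), sum(seconds)))
    return merged


print(merge_consecutive([(2, 10), (4, 5), (8, 10), (9, 11), (10, 15)]))
# [(2.0, 10), (4.0, 5), (9.0, 36)]
```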
By sorting the list first, and then using itertools.pairwise to iterate over each pair of previous and current days, this problem becomes much easier. On sequential days, instead of adding a new item to the final list, we modify the last item added to it. Computing the new sum is easy enough, and computing the new average is also easy because we're averaging sequential numbers: we just need to keep track of how many sequential days have passed and use that to get the average.
import itertools
def on_neighboring_days_sum_occurrances(tuple_list):
    tuple_list.sort()
    ret = []
    sequential_days = 1
    # We add the first item now
    # And then when we start looping we begin looping on the second item
    # This way the loop will always be able to modify ret[-1]
    ret.append(tuple_list[0])
    # Python 3.10+ only, in older versions do
    # for prev, current in zip(tuple_list, tuple_list[1:]):
    for prev, current in itertools.pairwise(tuple_list):
        day = current[0]
        prev_day = prev[0]
        is_sequential_day = day - prev_day <= 1
        if is_sequential_day:
            sequential_days += 1
            # mean of the run (day - n + 1), ..., day is day - (n - 1) / 2
            avg_day = day - (sequential_days - 1) / 2
            summed_vals = ret[-1][1] + current[1]
            ret[-1] = (avg_day, summed_vals)
        else:
            sequential_days = 1
            ret.append(current)
    return ret
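The average shortcut in the function above relies on the merged days being consecutive integers. If the first values can repeat or be non-integers (any gap of at most 1 still chains), a sketch that groups first and then computes the real mean avoids that assumption; merge_within_one is a hypothetical name:

```python
def merge_within_one(pairs):
    """Chain-merge tuples whose sorted first values are within 1 of the previous one.

    Each group becomes (mean of first values, sum of second values).
    """
    ordered = sorted(pairs)
    if not ordered:
        return []
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[0] - prev[0] <= 1:
            groups[-1].append(cur)  # still chained to the previous tuple
        else:
            groups.append([cur])    # gap larger than 1: start a new group
    result = []
    for group in groups:
        firsts = [t[0] for t in group]
        result.append((sum(firsts) / len(firsts), sum(t[1] for t in group)))
    return result


print(merge_within_one([(2, 10), (4, 5), (8, 10), (9, 11), (10, 15)]))
# [(2.0, 10), (4.0, 5), (9.0, 36)]
```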
You can sort the list and walk through it, tracking one tuple at a time. From the tracked tuple, scan forward as long as the difference between the first elements equals the difference in indices (i.e. the first elements are consecutive integers), summing both the first and the second elements. When this condition breaks, divide the sum of the first elements by the length of the run to get their average, append the result, and jump to the index where the condition broke so the same tuples are not considered again, like this:
x = [(2, 10), (4, 5), (8, 10), (9, 11), (10, 15)]
x.sort()
res, i = [], 0
while i < len(x):
    sum2, sum1 = x[i][1], x[i][0]
    j = i + 1
    # extend the run while x[j] is exactly (j - i) above the tracked tuple
    while j < len(x) and x[j][0] - x[i][0] == j - i:
        sum2 += x[j][1]
        sum1 += x[j][0]
        j += 1
    run_length = j - i
    if run_length == 1:
        res.append(x[i])                       # no neighbours: keep as-is
    else:
        res.append((sum1 / run_length, sum2))  # mean of 1st, sum of 2nd
    i = j                                      # skip the tuples just merged
print(res)
Here the outer while loop visits each tuple that starts a new run, and sum1 and sum2 accumulate the 1st and 2nd elements of that run, starting from the current tuple. The inner while loop checks the condition and keeps adding elements while it holds. When it stops, a single tuple is appended to res unchanged, while a longer run is appended as (average of the 1st elements, sum of the 2nd elements). Finally, i skips ahead to the tuple that broke the condition, which also makes sure the last tuple is handled when it does not belong to a run.

Finding value in dict given an integer that can be found in between dictionary's tuple key

Given a dictionary x with tuple keys and string values:
x = {(0, 4): 'foo', (4,9): 'bar', (9,10): 'sheep'}
The task is to write a function that finds the value for a given number, e.g. if the user inputs 3, it should return 'foo'. We can assume that there are no overlapping numbers in the keys.
As another example, if the user inputs 9, it should return 'bar'.
I've tried converting the x dict to a list and write the function as follows, but it's suboptimal if the range of values in the keys is extremely huge:
from itertools import chain
mappings = [None] * max(chain(*x))
for k in x:
    for i in range(k[0], k[1]):
        mappings[i] = x[k]
def myfunc(num):
    return mappings[num]
How else can the myfunc function be written?
Is there a better data structure to keep the mapping?
You can convert your keys into a numpy array and use numpy.searchsorted to answer a query. Since the keys are open on the left, I have incremented the left (open) bound of each key by 1 in the array.
Each query is of order O(log(n)).
Create an array:
import numpy as np
A = np.array([[k1+1, k2] for k1, k2 in x])
vals = list(x.values())  # same (sorted) order as the rows of A
>>> A
array([[ 1, 4],
[ 5, 9],
[10, 10]])
Function to search query:
def myfunc(num):
    ind1 = np.searchsorted(A[:, 0], num, 'right')
    ind2 = np.searchsorted(A[:, 1], num, 'left')
    if ind1 == 0 or ind2 == A.shape[0] or ind1 <= ind2:
        return None
    return vals[ind2]
Prints:
>>> myfunc(3)
'foo'
Iterate over the dictionary comparing to the keys:
x = {(0, 4): 'foo', (4, 9): 'bar', (9, 10): 'sheep'}
def find_tuple(dct, num):
    for tup, val in dct.items():
        if tup[0] <= num < tup[1]:
            return val
    return None
print(find_tuple(x, 3))
# foo
print(find_tuple(x, 9))
# sheep
print(find_tuple(x, 11))
# None
A better data structure would be a dictionary with just the left boundaries of the intervals (as keys) and the corresponding values. Then you can use bisect as the other answers mention.
import bisect
import math
x = {
-math.inf: None,
0: 'foo',
4: 'bar',
9: 'sheep',
10: None,
}
def find_tuple(dct, num):
    idx = bisect.bisect_right(list(dct.keys()), num)
    return list(dct.values())[idx-1]
print(find_tuple(x, 3))
# foo
print(find_tuple(x, 9))
# sheep
print(find_tuple(x, 11))
# None
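Rebuilding list(dct.keys()) on every call makes each lookup O(n), which gives away most of the benefit of bisect. A sketch that sorts the intervals once and then answers each query in O(log n), assuming the question's (lo, hi] semantics (so 9 maps to 'bar') and non-overlapping keys, possibly with gaps; build_lookup is a hypothetical helper:

```python
import bisect


def build_lookup(intervals):
    """Return a lookup function for non-overlapping (lo, hi] interval keys."""
    items = sorted(intervals.items())          # sort once by (lo, hi)
    los = [lo for (lo, hi), _ in items]
    his = [hi for (lo, hi), _ in items]        # also sorted: no overlaps
    vals = [v for _, v in items]

    def lookup(num):
        i = bisect.bisect_left(his, num)       # first interval with hi >= num
        if i < len(his) and los[i] < num:      # check the open left bound
            return vals[i]
        return None                            # num is in a gap or out of range

    return lookup


find = build_lookup({(0, 4): 'foo', (4, 9): 'bar', (9, 10): 'sheep'})
print(find(3), find(4), find(9), find(10), find(0), find(11))
# foo foo bar sheep None None
```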
You could simply iterate through keys and compare the values (rather than creating a mapping). This is a bit more efficient than creating a mapping first, since you could have a key like (0, 100000) which will create needless overhead.
Edited answer based on comments from OP
x = {(0, 4): 'foo', (4,9): 'bar', (9,10): 'sheep'}
def find_value(k):
    for t1, t2 in x:
        if t1 < k <= t2:  # edited based on comments
            return x[(t1, t2)]
    # if we end up here, we can't find a match
    # do whatever is appropriate, e.g. return None or raise an exception
    return None
Note: it's unclear from your tuple keys whether they are inclusive ranges for the input number. E.g. if a user inputs 4, should they get 'foo' or 'bar'? This affects the comparison in the function above (see the edit above; this should fulfill your requirement).
In the example above, an input of 4 returns 'foo', since it fulfills the condition 0 < k <= 4, and the function returns before continuing the loop.
Here's one solution using pandas.IntervalIndex and pandas.cut. Note, I "tweaked" the last key to (10, 11), because I'm using closed="left" in my IntervalIndex. You can change this if you want the intervals closed on different sides (or both):
import pandas as pd
x = {(0, 4): "foo", (4, 9): "bar", (10, 11): "sheep"}
bins = pd.IntervalIndex.from_tuples(x, closed="left")
result = pd.cut([3], bins)[0]
print(x[(result.left, result.right)])
Prints:
foo
Other solution using bisect module (assuming the ranges are continuous - so no "gaps"):
from bisect import bisect_left
x = {(0, 4): "foo", (4, 9): "bar", (10, 10): "sheep"}
bins, values = [], []
for k in sorted(x):
    bins.append(k[1])  # intervals are closed "right", e.g. (0, 4]
    values.append(x[k])
idx = bisect_left(bins, 4)
print(values[idx])
Prints:
foo

Sum possibilities, one loop

Earlier, a lot of wonderful programmers helped me get a function done. However, the instructor wanted it in a single loop, and all the working solutions used multiple loops.
I wrote another program that almost solves the problem. Instead of using a loop to compare all the values, you have to use the function has_key to see if a specific key exists. That removes the need to iterate through the dictionary to find matching values, because you can check directly whether they match.
Again, charCount is just a function that counts the elements of its argument into a dictionary and returns the dictionary.
def sumPair(theList, n):
    for a, b in level5.charCount(theList).iteritems():
        x = n - a
        if level5.charCount(theList).get(a):
            if a == x:
                # checks that the number occurs more than once, so a single 6
                # is not paired with itself (e.g. 6+6=12 from only one 6)
                if b > 1:
                    return a, x
            else:
                if level5.charCount(theList).get(a) != x:
                    return a, x
print sumPair([6,3,8,3,2,8,3,2], 9)
I just need to make this code find the sum without an extra loop, by checking whether the current element exists in the list of elements.
You can use collections.Counter instead of level5.charCount.
Also, I don't see why you need to check if level5.charCount(theList).get(a):. I don't think it is needed, since a is a key you got from level5.charCount(theList) in the first place. What you do need to check is whether the complement n - a is also a key.
So I simplified your code:
from collections import Counter
def sumPair(the_list, n):
    counts = Counter(the_list)
    for a, b in counts.iteritems():
        x = n - a
        if a == x and b > 1:
            return a, x
        if a != x and x in counts:
            return a, x
print sumPair([6, 3, 8, 3, 2, 8, 3, 2], 9) #output>>> (3, 6)
You can also use a list comprehension like this:
>>> the_list, n = [6, 3, 8, 3, 2, 8, 3, 2], 9
>>> counts = Counter(the_list)
>>> result = [(a, n-a) for a, b in counts.iteritems() if (a == n-a and b > 1) or (a != n-a and n-a in counts)]
>>> print result
[(3, 6), (6, 3)]
>>> print result[0] #this is the result you want
(3, 6)
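Since the instructor asked for a single loop, a common alternative (not the asker's charCount-based code) is to remember the values seen so far in a set and check whether the complement n - a has already appeared. A sketch in Python 3; sum_pair is a hypothetical name:

```python
def sum_pair(numbers, n):
    """Return the first pair (b, a) with b + a == n, using a single pass."""
    seen = set()        # values encountered so far
    for a in numbers:
        b = n - a
        if b in seen:   # average O(1) membership test instead of a second loop
            return b, a
        seen.add(a)     # added after the check, so a lone 6 never pairs with itself
    return None         # no pair sums to n


print(sum_pair([6, 3, 8, 3, 2, 8, 3, 2], 9))  # (6, 3)
print(sum_pair([6, 1], 12))                   # None
```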

Python: is index() buggy at all?

I'm working through this thing on pyschools and it has me mystified.
Here's the code:
def convertVector(numbers):
    totes = []
    for i in numbers:
        if i!= 0:
            totes.append((numbers.index(i),i))
    return dict((totes))
It's supposed to take a 'sparse vector' as input (e.g. [1, 0, 1, 0, 2, 0, 1, 0, 0, 1, 0])
and return a dict mapping the index of each non-zero entry to its value,
i.e. a dict with 0: 1, 2: 1, etc., where the key is the index and the value is the non-zero item in the list.
So for the example number it wants this: {0: 1, 9: 1, 2: 1, 4: 2, 6: 1}
but instead gives me this: {0: 1, 4: 2}. Before it's turned into a dict, it looks like this:
[(0, 1), (0, 1), (4, 2), (0, 1), (0, 1)]
My plan is for i to iterate through numbers, create a tuple of that number and its index, and then turn that into a dict. The code seems straightforward, I'm at a loss.
It just looks to me like numbers.index(i) is not returning the index, but instead some other, unexpected number.
Is my understanding of index() defective? Are there known index issues?
Any ideas?
index() only returns the first match:
>>> a = [1,2,3,3]
>>> help(a.index)
Help on built-in function index:
index(...)
L.index(value, [start, [stop]]) -> integer -- return first index of value.
Raises ValueError if the value is not present.
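The optional start argument shown in that signature is what lets index() move past the first match. A small sketch (all_indices is a hypothetical helper) that collects every position of a value by restarting the search just after each hit:

```python
def all_indices(values, target):
    """Collect every index of target by restarting list.index() after each hit."""
    indices = []
    start = 0
    while True:
        try:
            pos = values.index(target, start)  # search only from `start` onward
        except ValueError:                     # no further occurrences
            return indices
        indices.append(pos)
        start = pos + 1


print(all_indices([1, 0, 1, 0, 2, 0, 1, 0, 0, 1, 0], 1))  # [0, 2, 6, 9]
```

In practice the enumerate approach is simpler and makes a single pass over the list.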
If you want both the number and the index, you can take advantage of enumerate:
>>> for i, n in enumerate([10,5,30]):
... print i,n
...
0 10
1 5
2 30
and modify your code appropriately:
def convertVector(numbers):
    totes = []
    for i, number in enumerate(numbers):
        if number != 0:
            totes.append((i, number))
    return dict((totes))
which produces
>>> convertVector([1, 0, 1 , 0, 2, 0, 1, 0, 0, 1, 0])
{0: 1, 9: 1, 2: 1, 4: 2, 6: 1}
[Although, as someone pointed out (I can't find it now), it'd be easier to write totes = {} and assign to it directly using totes[i] = number than to go via a list.]
What you're trying to do, it could be done in one line:
>>> dict((index,num) for index,num in enumerate(numbers) if num != 0)
{0: 1, 2: 1, 4: 2, 6: 1, 9: 1}
Yes, your understanding of list.index is incorrect. It finds the position of the first item in the list that compares equal to the argument.
To get the index of the current item, you want to iterate over it with enumerate:
for index, item in enumerate(iterable):
    pass  # do something with index and item
The problem is that .index() looks for the first occurrence of its argument. So for your example it always returns 0 when called with argument 1.
You could make use of the built-in enumerate function like this:
for index, value in enumerate(numbers):
    if value != 0:
        totes.append((index, value))
Check the documentation for index:
Return the index in the list of the first item whose value is x. It is
an error if there is no such item.
According to this definition, the following code appends, for each non-zero value in numbers, a tuple made of the first position of this value in the whole list and the value itself.
totes = []
for i in numbers:
    if i!= 0:
        totes.append((numbers.index(i),i))
The result in the totes list is correct: [(0, 1), (0, 1), (4, 2), (0, 1), (0, 1)].
When turning it into a dict, the result is again correct, since for each possible value, you get the position of its first occurrence in the original list.
You would get the result you want using i as the index instead:
result = {}
for i in range(len(numbers)):
    if numbers[i] != 0:
        result[i] = numbers[i]
index() returns the index of the first occurrence of the item in the list. Your list has duplicates which is the cause of your confusion. So index(1) will always return 0. You can't expect it to know which of the many instances of 1 you are looking for.
I would write it like this:
totes = {}
for i, num in enumerate(numbers):
    if num != 0:
        totes[i] = num
and avoid the intermediate list altogether.
Riffing on @DSM:
def convertVector(numbers):
    return dict((i, number) for i, number in enumerate(numbers) if number)
Or, on re-reading, as @Rik Poggi actually suggests.

Summing Consecutive Ranges Pythonically

I have a sumranges() function, which counts all the ranges of consecutive numbers found in a tuple of tuples. To illustrate:
def sumranges(nums):
    return sum([sum([1 for j in range(len(nums[i])) if
                     nums[i][j] == 0 or
                     nums[i][j - 1] + 1 != nums[i][j]]) for
                i in range(len(nums))])
>>> nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
>>> print sumranges(nums)
7
As you can see, it returns the number of ranges of consecutive numbers within the tuples, that is: len([(1, 2, 3, 4), (1,), (5, 6), (19, 20), (24,), (29,), (400,)]) = 7. The tuples are always sorted.
My problem is that my sumranges() is terrible. I hate looking at it. I'm currently just iterating through the tuple and each subtuple, assigning a 1 if the number is not (1 + previous number), and summing the total. I feel like I am missing a much easier way to accomplish my stated objective. Does anyone know a more pythonic way to do this?
Edit: I have benchmarked all the answers given thus far. Thanks to all of you for your answers.
The benchmarking code is as follows, using a sample size of 100K:
from time import time
from random import randrange
nums = [sorted(list(set(randrange(1, 10) for i in range(10)))) for
        j in range(100000)]
for func in sumranges, alex, matt, redglyph, ephemient, ferdinand:
    start = time()
    result = func(nums)
    end = time()
    print ', '.join([func.__name__, str(result), str(end - start) + ' s'])
Results are as follows. Actual answer shown to verify that all functions return the correct answer:
sumranges, 250281, 0.54171204567 s
alex, 250281, 0.531121015549 s
matt, 250281, 0.843333005905 s
redglyph, 250281, 0.366822004318 s
ephemient, 250281, 0.805964946747 s
ferdinand, 250281, 0.405596971512 s
RedGlyph does edge out in terms of speed, but the simplest answer is probably Ferdinand's, and probably wins for most pythonic.
My 2 cents:
>>> sum(len(set(x - i for i, x in enumerate(t))) for t in nums)
7
It's basically the same idea as described in Alex's post, but using a set instead of itertools.groupby, resulting in a shorter expression. Since sets are implemented in C and len() of a set runs in constant time, this should also be pretty fast.
Consider:
>>> nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
>>> flat = [[(x - i) for i, x in enumerate(tu)] for tu in nums]
>>> print flat
[[1, 1, 1, 1], [1, 4, 4], [19, 19, 22, 26, 396]]
>>> import itertools
>>> print sum(1 for tu in flat for _ in itertools.groupby(tu))
7
>>>
We "flatten" the "increasing ramps" of interest by subtracting the index from the value, turning them into consecutive "runs" of identical values; then we identify and count the "runs" with the precious itertools.groupby. This seems to be a pretty elegant (and speedy) solution to your problem.
Just to show something closer to your original code:
def sumranges(nums):
    return sum((1 for i in nums
                for j, v in enumerate(i)
                if j == 0 or v != i[j-1] + 1))
The idea here was to:
avoid building intermediate lists and use a generator instead, which saves some resources
avoid using indices when you have already selected a subelement (i and v above).
The remaining sum() is still necessary with my example though.
Here's my attempt:
def ranges(ls):
    for l in ls:
        if not l:
            continue
        yield 1  # the first number always starts a range
        for (a, b) in zip(l, l[1:]):
            if b != a + 1:
                yield 1  # a gap: b starts a new range
'''
>>> nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
>>> print sum(ranges(nums))
7
'''
It yields 1 for the first number of each tuple, then looks at the numbers pairwise: each time a pair is not consecutive, a new range begins, so it yields another 1.
This could probably be put together in a more compact form, but I think clarity would suffer:
def pairs(seq):
    for i in range(1,len(seq)):
        yield (seq[i-1], seq[i])
def isadjacent(pair):
    return pair[0]+1 == pair[1]
def sumrange(seq):
    return 1 + sum([1 for pair in pairs(seq) if not isadjacent(pair)])
def sumranges(nums):
    return sum([sumrange(seq) for seq in nums])
nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
print sumranges(nums) # prints 7
You could probably do this better if you had an IntervalSet class because then you would scan through your ranges to build your IntervalSet, then just use the count of set members.
Some tasks don't always lend themselves to neat code, particularly if you need to write the code for performance.
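As a sketch of the IntervalSet idea, assuming sorted tuples of integers, each tuple can be collapsed into inclusive (start, end) runs, and the number of runs is then the count being asked for. to_intervals is a hypothetical helper, not an existing IntervalSet class:

```python
def to_intervals(seq):
    """Collapse a sorted sequence of ints into inclusive (start, end) runs."""
    intervals = []
    for v in seq:
        if intervals and v == intervals[-1][1] + 1:
            intervals[-1] = (intervals[-1][0], v)  # extend the current run
        else:
            intervals.append((v, v))               # start a new run
    return intervals


nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
print([to_intervals(t) for t in nums])
# [[(1, 4)], [(1, 1), (5, 6)], [(19, 20), (24, 24), (29, 29), (400, 400)]]
print(sum(len(to_intervals(t)) for t in nums))  # 7
```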
There is a formula for the sum of the first n numbers: 1 + 2 + ... + n = n(n+1)/2. If you want the sum of the numbers from i to j, it is j(j+1)/2 - (i-1)i/2, which simplifies to (i+j)(j-i+1)/2. It might not be pythonic, but it is what I would use.
