Python: Efficient lookup by interval - python

I have a large lookup table where the key is an interval:
| min | max | value |
|-----|-----|---------|
| 0 | 3 | "Hello" |
| 4 | 5 | "World" |
| 6 | 6 | "!" |
| ... | ... | ... |
The goal is to create a lookup structure my_lookup that returns a value for each integer, depending on the range the integer is in.
For example: 2 -> "Hello", 3 -> "Hello", 4 -> "World".
Here is an implementation that does what I want:
d = {
(0, 3): "Hello",
(4, 5): "World",
(6, 6): "!"
}
def my_lookup(i: int) -> str:
for key, value in d.items():
if key[0] <= i <= key[1]:
return value
But looping over all entries seems inefficient (the actual lookup table contains 400.000 lines). Is there a faster way?

If your intervals are sorted (in ascending order), you can use bisect module (doc). The search is O(log n) instead of O(n):
min_lst = [0, 4, 6]
max_lst = [3, 5, 6]
values = ['Hello', 'World', '!']
import bisect
val = 2
idx = bisect.bisect_left(max_lst, val)
if idx < len(max_lst) and min_lst[idx] <= val <= max_lst[idx]:
print('Value found ->', values[idx])
else:
print('Value not found')
Prints:
Value found -> Hello

Related

How should I solve logic error in timestamp using Python?

I have written a code to calculate a, b, and c. They were initialized at 0.
This is my input file
-------------------------------------------------------------
| Line | Time | Command | Data |
-------------------------------------------------------------
| 1 | 0015 | ACTIVE | |
| 2 | 0030 | WRITING | |
| 3 | 0100 | WRITING_A | |
| 4 | 0115 | PRECHARGE | |
| 5 | 0120 | REFRESH | |
| 6 | 0150 | ACTIVE | |
| 7 | 0200 | WRITING | |
| 8 | 0314 | PRECHARGE | |
| 9 | 0318 | ACTIVE | |
| 10 | 0345 | WRITING_A | |
| 11 | 0430 | WRITING_A | |
| 12 | 0447 | WRITING | |
| 13 | 0503 | WRITING | |
and the timestamps and commands are used to process the calculation for a, b, and c.
import re
count = {}
timestamps = {}
with open ("page_stats.txt", "r") as f:
for line in f:
m = re.split(r"\s*\|\s*", line)
if len(m) > 3 and re.match(r"\d+", m[1]):
count[m[3]] = count[m[3]] + 1 if m[3] in count else 1
#print(m[2])
if m[3] in timestamps:
timestamps[m[3]].append(m[2])
#print(m[3], m[2])
else:
timestamps[m[3]] = [m[2]]
#print(m[3], m[2])
a = b = c = 0
for key in count:
print("%-10s: %2d, %s" % (key, count[key], timestamps[key]))
if timestamps["ACTIVE"] > timestamps["PRECHARGE"]: #line causing logic error
a = a + 1
print(a)
Before getting into the calculation, I assign the timestamps with respect to the commands. This is the output for this section.
ACTIVE : 3, ['0015', '0150', '0318']
WRITING : 4, ['0030', '0200', '0447', '0503']
WRITING_A : 3, ['0100', '0345', '0430']
PRECHARGE : 2, ['0115', '0314']
REFRESH : 1, ['0120']
To get a, the timestamps of ACTIVE must be greater than PRECHARGE and WRITING must be greater than ACTIVE. (Line 4, 6, 7 will contribute to the first a and Line 8, 9, and 12 contributes to the second a)
To get b, the timestamps of WRITING must be greater than ACTIVE. For the lines that contribute to a such as Line 4, 6, 7, 8, 9, and 12, they cannot be used to calculate b. So, Line 1 and 2 contribute to b.
To get c, the rest of the unused lines containing WRITING will contribute to c.
The expected output:
a = 2
b = 1
c = 1
However, in my code, when I print a, it displays 0, which shows the logic has some error. Any suggestion to amend my code to achieve the goal? I have tried for a few days and the problem is not solved yet.
I made a function that will return the commands in order that match a pattern with gaps allowed.
I also made a more compact version of your file reading.
There is probably a better version to divide the list into two parts, the problem was to only allow elements in that match the whole pattern. In this one I iterate over the elements twice.
import re
commands = list()
with open ("page_stats.txt", "r") as f:
for line in f:
m = re.split(r"\s*\|\s*", line)
if len(m) > 3 and re.match(r"\d+", m[1]):
_, line, time, command, data, _ = m
commands.append((line,time,command))
def search_pattern(pattern, iterable, key=None):
iter = 0
count = 0
length = len(pattern)
results = []
sentinel = object()
for elem in iterable:
original_elem = elem
if key is not None:
elem = key(elem)
if elem == pattern[iter]:
iter += 1
results.append((original_elem,sentinel))
if iter >= length:
iter = iter % length
count += length
else:
results.append((sentinel,original_elem))
matching = []
nonmatching = []
for res in results:
first,second = res
if count > 0:
if second is sentinel:
matching.append(first)
count -= 1
elif first is sentinel:
nonmatching.append(second)
else:
value = first if second is sentinel else second
nonmatching.append(value)
return matching, nonmatching
pattern_a = ['PRECHARGE','ACTIVE','WRITING']
pattern_b = ['ACTIVE','WRITING']
pattern_c = ['WRITING']
matching, nonmatching = search_pattern(pattern_a, commands, key=lambda t: t[2])
a = len(matching)//len(pattern_a)
matching, nonmatching = search_pattern(pattern_b, nonmatching, key=lambda t: t[2])
b = len(matching)//len(pattern_b)
matching, nonmatching = search_pattern(pattern_c, nonmatching, key=lambda t: t[2])
c = len(matching)//len(pattern_c)
print(f'{a=}')
print(f'{b=}')
print(f'{c=}')
Output:
a=2
b=1
c=1

Print item of list and another item of list

Suppose I have two lists:
list1=[1, 2, 2, 12, 23]
list2=[2, 3, 5, 3, 4]
I want to arrange them side by side list1 and list2 using python
so:
| list1 | list2 |
|---|---|
| 1 | 2 |
| 2 | 3 |
| 2 | 5 |
| 12 | 3 |
| 23 | 4 |
but I want to remove the third row (2,5) and make it:
| list1 | list2 |
|---|---|
| 1 | 2 |
| 2 | 3,5 |
| 12 | 3 |
| 23 | 4 |
Using itertools.groupby:
from itertools import groupby
list1=[1, 2, 2, 12, 23]
list2=[2, 3, 5, 3, 4]
for key, value in groupby(zip(list1, list2), key=lambda x: x[0]):
print(f"{key} {','.join(str(y) for x, y in value)}")
or if you prefer using collections.defaultdict:
from collections import defaultdict
list1=[1,2,2,12,23]
list2=[2, 3, 5, 3, 4]
result = defaultdict(list)
for key, value in zip(list1, list2):
result[key].append(value)
for key, value in result.items():
print(f"{key} {','.join(map(str, value))}")
using only loops and built-in functions:
list1=[1,2,2,12,23]
list2=[2, 3, 5, 3, 4]
result={}
for key, value in zip(list1, list2):
result.setdefault(key, []).append(value)
for key, value in result.items():
print(f"{key} {','.join(map(str, value))}")
In all three cases the output is
1 2
2 3,5
12 3
23 4
You could zip the two lists together and append them to a dictionary. If one value of list 1 already exists as a key in the dictionary, you format the value as a string and append the next one to the same key. I just posted this since it might be easier to follow, however i really like the answer above from #buran.
list1 = [1, 2, 2, 12, 23]
list2 = [2, 3, 5, 3, 4]
combo_dict = {}
# zip both lists together and iterate over them
for key, val in zip(list1, list2):
"""if an item in list1 already exists in combo_dict,
format the respective overlapping value in list2 as a
string and append to the same key to the dict"""
if key in combo_dict.keys():
combo_dict[key] = ",".join([f"{combo_dict[key]}", f"{val}"])
"""if not, simply append the value of list 2 with the key of
list 1 to the dict (could also format this to a string)"""
else:
combo_dict[key] = val
# this is just for printing the dict to the console
no_out=[print(f"{key} {val}") for key, val in combo_dict.items()]

How to collect results of recursive backtracking?

I'm teaching myself recursive backtracking. For a dice summing problem I can't figure out how to elegantly collect the results.
For reference here's my code that just prints any dice roll which meets the criteria. Ideally I want to change it so instead of printing the output, I can build up a list of those chosen dice and return it.
Below here is code that does not do what I want
def dice_sum(num_dice: int, target_sum: int) -> None:
dice_sum_helper(num_dice, target_sum, [])
def dice_sum_helper(num_dice: int, target_sum: int, chosen: List[int]) -> None:
if num_dice == 0 and sum(chosen) == target_sum:
print(chosen)
elif num_dice == 0:
pass
else:
for i in range(1, 7):
chosen.append(i)
dice_sum_helper(num_dice - 1, target_sum, chosen)
chosen.pop()
Instead I want it to do something like this
from typing import List
DiceResult = List[List[int]]
def dice_sum(num_dice: int, target_sum: int) -> DiceResult:
return dice_sum_helper(num_dice, target_sum, [])
def dice_sum_helper(num_dice: int, target_sum: int, chosen: List[int]) -> DiceResult:
if num_dice == 0 and sum(chosen) == target_sum:
# Return the value that meets the constraints
return chosen
elif num_dice == 0:
pass
else:
for i in range(1, 7):
chosen.append(i)
# Return the result of my recursive call and build the list of lists?
result = dice_sum_helper(num_dice - 1, target_sum, chosen)
return result.append(result)
# End of that logic
chosen.pop()
I'm looking more for the theory or pattern to use than the exact code. I can't quite get the code to collect and append each result without using an external list working.
You can pass a 'results' list in which to store the results:
from typing import List
DiceResult = List[List[int]]
def dice_sum(num_dice: int, target_sum: int) -> DiceResult:
results = []
dice_sum_helper(num_dice, target_sum, [], results)
return results
def dice_sum_helper(num_dice: int, target_sum: int, chosen: List[int], results: DiceResult):
if num_dice == 0 and sum(chosen) == target_sum:
# Store the value that meets the constraints
results.append(chosen.copy())
elif num_dice == 0:
pass
else:
for i in range(1, 7):
chosen.append(i)
dice_sum_helper(num_dice - 1, target_sum, chosen, results)
chosen.pop()
Note this will be returning many duplicates, if ordering does not matter. You may want to investigate changing this to calculate a smaller target sum each recursion, and memoizing that in some way so it save on a lot of work. See also e.g. Calculate the number of ways to roll a certain number
You could utilize yield and yield from to return results from your functions:
from typing import List
def dice_sum(num_dice: int, target_sum: int) -> None:
yield from dice_sum_helper(num_dice, target_sum, [])
def dice_sum_helper(num_dice: int, target_sum: int, chosen: List[int]) -> None:
if num_dice == 0 and sum(chosen) == target_sum:
yield chosen[:]
elif num_dice == 0:
pass
else:
for i in range(1, 7):
chosen.append(i)
yield from dice_sum_helper(num_dice - 1, target_sum, chosen)
chosen.pop()
# you can store the results e.g. to list:
# results = list(dice_sum(3, 12))
for dices in dice_sum(3, 12):
for d in dices:
print('{: ^4}'.format(d), end='|')
print()
Prints:
1 | 5 | 6 |
1 | 6 | 5 |
2 | 4 | 6 |
2 | 5 | 5 |
2 | 6 | 4 |
3 | 3 | 6 |
3 | 4 | 5 |
3 | 5 | 4 |
3 | 6 | 3 |
4 | 2 | 6 |
4 | 3 | 5 |
4 | 4 | 4 |
4 | 5 | 3 |
4 | 6 | 2 |
5 | 1 | 6 |
5 | 2 | 5 |
5 | 3 | 4 |
5 | 4 | 3 |
5 | 5 | 2 |
5 | 6 | 1 |
6 | 1 | 5 |
6 | 2 | 4 |
6 | 3 | 3 |
6 | 4 | 2 |
6 | 5 | 1 |
For the stated goal of elegance, I would go with a simpler design without a helper function nor passing extra arguments:
def dice_sum(num_dice, target_sum):
solutions = []
for i in range(1, 7):
if target_sum < i:
continue
if target_sum == i:
if num_dice == 1:
solutions.append([i])
continue
for solution in dice_sum(num_dice - 1, target_sum - i):
solutions.append([i] + solution)
return solutions
print(dice_sum(3, 5))
OUTPUT
> python3 test.py
[[1, 1, 3], [1, 2, 2], [1, 3, 1], [2, 1, 2], [2, 2, 1], [3, 1, 1]]
>
For added efficiency, you could add:
if target_sum - i < num_dice - 1:
continue
before the inner for loop to avoid the recursion if there's no hope.
A recursive generator with early abort conditions, so you don't get duplicate solutions, with regards to the order of die rolls. I.e. I assume that your dice are not identifiable and therefor the order of the numbers shouldn't matter.
def dice_sum (num_dice, target_sum, start=1):
if num_dice == 1 and 0 < target_sum < 7:
yield (target_sum, )
return
if num_dice * 6 < target_sum or num_dice > target_sum:
return
for i in xrange(start, 7):
for option in dice_sum(num_dice - 1, target_sum - i, i):
if i > option[0]:
return
yield (i, ) + option
>>> tuple(dice_sum(3, 12))
((1, 5, 6), (2, 4, 6), (2, 5, 5), (3, 3, 6), (3, 4, 5), (4, 4, 4))

Counting choice streaks in Python

I have a dataset that looks like the following:
Subject | Session | Trial | Choice
--------+---------+-------+-------
1 | 1 | 1 | A
1 | 1 | 2 | B
1 | 1 | 3 | B
1 | 1 | 4 | B
1 | 1 | 5 | B
1 | 1 | 6 | A
2 | 1 | 1 | A
2 | 1 | 2 | A
2 | 1 | 3 | A
I would like to use a Python script to generate the following table:
Subject | Session | streak_count
--------+---------+-------------
1 | 1 | 3
2 | 1 | 1
Where streak_count is a count of the total number of choice streaks made by a given subject during a given session, and a streak is any number of choices of one particular item in a row (>0).
I've tried using some of the suggestions to similar questions here, but I'm having trouble figuring out how to count these instances, rather than measure their length, etc., which seem to be more common queries.
def count():
love = []
love1 = []
streak = -1
k = 0
session = 1
subject = raw_input("What is your subject? ")
trials = raw_input("How many trials do you wish to do? ")
trial = 0
for i in range(int(trials)):
choice = raw_input("What was the choice? ")
love.append(choice)
love1.append(choice)
trial += 1
print subject, trial, choice
if love[i] == love1[i-1]:
streak += 1
print subject, session, streak
This may be what you want it takes in how many trials you wish to do and whatever your subject is and if there is a streak it adds one. The reason streak starts at -1 is because when you put your first answer it adds one because of the negative index going back to its self.
I think this is what you are asking for;
import itertools
data = [
[1, 1, 1, 'A'],
[1, 1, 2, 'B'],
[1, 1, 3, 'B'],
[1, 1, 4, 'B'],
[1, 1, 5, 'B'],
[1, 1, 6, 'A'],
[2, 1, 1, 'A'],
[2, 1, 2, 'A'],
[2, 1, 3, 'A']
]
grouped = itertools.groupby(data, lambda x: x[0])
results = dict()
this, last = None, None
for key, group in grouped:
results[key] = 0
for c, d in enumerate(group):
this = d
streak = c == 0 or this[3] != last[3]]
if streak:
results[key] += 1
last = this
print results
This yields;
{1: 3, 2: 1}

Python - Huffman bit compression

I need help on this. I made a Huffman compression program. For example I'll input "google", I can just print it like this:
Character | Frequency | Huffman Code
------------------------------------
'g' | 2 | 0
'o' | 2 | 11
'l' | 1 | 101
'e' | 1 | 100
I also need to print the combined bits so if I use that it will be:
011101100
Which is wrong because as far as I understand Huffman compression it should be:
011110101100
So my output should be:
Character | Frequency | Huffman Code
------------------------------------
'g' | 2 | 0
'o' | 2 | 11
'o' | 2 | 11
'g' | 2 | 0
'l' | 1 | 101
'e' | 1 | 100
Basically I need to display it based on what I input. So if I input "test" it should also print test vertically with their corresponding bits since I'm just appending the bits and displaying them. How do I achieve this? Here's the printing part:
freq = {}
for c in word:
if c in freq:
freq[c] += 1
else:
freq[c] = 1
freq = sorted(freq.items(), key=lambda x: x[1], reverse=True)
if check:
print (" Char | Freq ")
for key, c in freq:
print (" %4r | %d" % (key, c))
nodes = freq
while len(nodes) > 1:
key1, c1 = nodes[-1]
key2, c2 = nodes[-2]
nodes = nodes[:-2]
node = NodeTree(key1, key2)
nodes.append((node, c1 + c2))
nodes = sorted(nodes, key=lambda x: x[1], reverse=True)
if check:
print ("left: %s" % nodes[0][0].nodes()[0])
print ("right: %s" % nodes[0][0].nodes()[1])
huffmanCode = huffman(nodes[0][0])
print ("\n")
print (" Character | Frequency | Huffman code ")
print ("---------------------------------------")
for char, frequency in freq:
print (" %-9r | %10d | %12s" % (char, frequency, huffmanCode[char]))
P.S. I know I shouldn't be sorting them I'll remove the sorting part don't worry

Categories