I have two lists:
lookup_list = [1,2,3]
my_list = [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
I want to count how many times the lookup_list appeared in my_list with the following logic:
The order should be 1 -> 2 -> 3
In my_list, the lookup_list items don't have to be next to each other: 1,4,2,1,5,3 should count as a match, since a 2 comes after a 1 and a 3 comes after that 2.
The matches based on this logic:
1st match: [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
2nd match: [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
3rd match: [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
4th match: [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
The lookup_list is dynamic; it could be defined as [1,2] or [1,2,3,4], etc. How can I solve this? All the answers I've found are about finding matches where 1,2,3 appear next to each other in order, like this one: Find matching sequence of items in a list
I can find the count of consecutive sequences with the below code but it doesn't count the nonconsecutive sequences:
from collections import Counter
from nltk import ngrams
lookup_list = [1,2,3]
my_list = [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
# Count every consecutive window of len(lookup_list) items
all_counts = Counter(ngrams(my_list, len(lookup_list)))
counts = {k: all_counts[k] for k in [tuple(lookup_list)]}
counts
>>> {(1, 2, 3): 2}
I tried using pandas rolling window functions but they don't have a custom reset option.
def find_all_sequences(source, sequence):
    def find_sequence(source, sequence, index, used):
        for i in sequence:
            while True:
                index = source.index(i, index + 1)
                if index not in used:
                    break
            yield index
    first, *rest = sequence
    index = -1
    used = set()
    while True:
        try:
            index = source.index(first, index + 1)
            indexes = index, *find_sequence(source, rest, index, used)
        except ValueError:
            break
        else:
            used.update(indexes)
            yield indexes
Usage:
lookup_list = [1,2,3]
my_list = [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
print(*find_all_sequences(my_list, lookup_list), sep="\n")
Output:
(0, 1, 2)
(6, 7, 11)
(9, 10, 15)
(14, 16, 17)
The generator function find_all_sequences() yields tuples of indexes, one tuple per match. Its outer loop runs until a list.index() call raises ValueError, which means no further match can be completed. The inner generator find_sequence() yields the index of each remaining sequence item, skipping indexes already used by an earlier match.
According to this benchmark, my method is about 60% faster than one from Andrej Kesely's answer.
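The loop that ends on ValueError is the core of this approach. As a minimal sketch of that pattern in isolation (all_positions is a hypothetical helper, not part of the answer), repeated list.index() calls with a moving start offset walk through every occurrence of a value:

```python
def all_positions(source, value):
    """Collect every index at which value occurs in source."""
    positions = []
    index = -1
    while True:
        try:
            # Resume the search just after the previous hit
            index = source.index(value, index + 1)
        except ValueError:
            # list.index() raises ValueError once nothing is left to find
            break
        positions.append(index)
    return positions


print(all_positions([1, 2, 1, 3, 1], 1))  # [0, 2, 4]
```

The answer's generator applies the same idea twice: once for the first item of the sequence and once per remaining item.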
The function find_matches() returns indices where the matches from lookup_list are:
def find_matches(lookup_list, lst):
    # Each bucket holds the indices of one partially built match
    buckets = []
    def _find_bucket(i, v):
        for b in buckets:
            if lst[b[-1]] == lookup_list[len(b) - 1] and v == lookup_list[len(b)]:
                b.append(i)
                if len(b) == len(lookup_list):
                    buckets.remove(b)
                    return b
                break
        else:
            # No bucket could use v, so it may start a new match
            if v == lookup_list[0]:
                buckets.append([i])
    rv = []
    # Iterate over the lst argument, not the global my_list
    for i, v in enumerate(lst):
        b = _find_bucket(i, v)
        if b:
            rv.append(b)
    return rv
lookup_list = [1, 2, 3]
my_list = [1, 2, 3, 4, 5, 2, 1, 2, 2, 1, 2, 3, 4, 5, 1, 3, 2, 3, 1]
print(find_matches(lookup_list, my_list))
Prints:
[[0, 1, 2], [6, 7, 11], [9, 10, 15], [14, 16, 17]]
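If only the number of matches is needed, the bucket idea can probably be reduced to plain counters. The following is a sketch under the assumption that the items of lookup_list are distinct; count_ordered_matches and waiting are hypothetical names, not part of the answer above.

```python
def count_ordered_matches(lookup, items):
    """Count non-overlapping, in-order (not necessarily adjacent) matches."""
    if not lookup:
        return 0
    n = len(lookup)
    # waiting[k] = number of partial matches that have consumed lookup[:k];
    # waiting[n] therefore counts the completed matches
    waiting = [0] * (n + 1)
    for v in items:
        # Try to extend a partial match that is waiting for v
        for k in range(n - 1, 0, -1):
            if waiting[k] and v == lookup[k]:
                waiting[k] -= 1
                waiting[k + 1] += 1
                break
        else:
            # Nothing was waiting for v, so it may start a new match
            if v == lookup[0]:
                waiting[1] += 1
    return waiting[n]


my_list = [1, 2, 3, 4, 5, 2, 1, 2, 2, 1, 2, 3, 4, 5, 1, 3, 2, 3, 1]
print(count_ordered_matches([1, 2, 3], my_list))  # 4
```

Each item is used by at most one match, which mirrors the "used" bookkeeping in the index-based answers.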
Here is a recursive solution:
lookup_list = [1,2,3]
my_list = [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
def find(my_list, continue_from_index):
    if continue_from_index > (len(my_list) - 1):
        return 0
    last_found_index = 0
    found_indizes = []
    first_occuring_index = 0
    found = False
    for l in lookup_list:
        for m_index in range(continue_from_index, len(my_list)):
            # Compare values with ==; "is" tests identity and only works for small ints by accident
            if my_list[m_index] == l and m_index >= last_found_index:
                if not found:
                    found = True
                    first_occuring_index = m_index
                last_found_index = m_index
                found += 1
                found_indizes.append(str(m_index))
                break
    if len(found_indizes) == len(lookup_list):
        return find(my_list, first_occuring_index+1) + 1
    return 0
print(find(my_list, 0))
my_list = [5, 6, 3, 8, 2, 1, 7, 1]
lookup_list = [8, 2, 7]
counter =0
result =False
for i in my_list:
if i in lookup_list:
counter+=1
if(counter==len(lookup_list)):
result=True
print (result)
I'm new to dynamic programming in Python, and this is the code I have so far to get the numbers that give the maximum sum from the array. However, my code doesn't work for input array A.
Test cases:
A = [7,2,-3,5,-4,8,6,3,1]
B = [7,2,5,8,6]
C = [-2,3,1,10,3,-7]
Output:
A = [7,5,8,3]
B = [7,5,6]
C = [3,10]
My output works for B and C but not for array A. The output I get is this:
[7,6,1]
And here is my code:
def max_sum(nums):
#Get the size of the array
size = len(nums)
list = []
cache = [[0 for i in range(3)] for j in range(size)]
if(size == 0):
return 0
if (size == 1):
return nums[0]
for i in range(0, size):
if(nums[i] < 0):
validate = i
if(size == validate + 1):
return []
#Create array 'cache' to store non-consecutive maximum values
#cache = [0]*(size + 1)
#base case
cache[0][2] = nums[0]
#temp = nums[0]
cache[0][1] = nums[0]
for i in range(1, size):
#temp1 = temp
cache[i][2] = nums[i] # I store the array numbers at index [i][2]
cache[i][1] = cache[i - 1][0] + nums[i] # the max sum is stored here
cache[i][0] = max(cache[i - 1][1], cache[i - 1][0]) # the current sum is stored here
maxset = 0
for i in range(0, size): #I get the max sum
if(cache[i][1] > maxset):
maxset = cache[i][1]
for i in range(0, size): #I get the first element here
if(cache[i][1] == maxset):
temp = cache[i][2]
count = 0
for i in range(0, size): # I check at what index in the nums array the index 'temp' is store
if(nums[i] != temp):
count += 1
if(size - 1 == count): #iterate through the nums array to apend the non-adjacent elements
if(count % 2 == 0):
for i in range(0, size):
if i % 2 == 0 and i < size:
list.append(nums[i])
else:
for i in range(0, size):
if i % 2 != 0 and i < size:
list.append(nums[i])
list[:]= [item for item in list if item >= 0]
return list
if __name__ == '__main__':
A = [7,2,-3,5,-4,8,6,3,1]
B = [7,2,5,8,6]
C = [-2,3,1,10,3,-7]
'''
Also, I came up with the idea to create another array to store the elements that added to the max sum, but I don't know how to do that.
Any guidance would be appreciated and thanks beforehand!
Probably not the best solution, but what about trying recursion?
tests = [([7, 2, -3, 5, -4, 8, 6, 3, 1], [7, 5, 8, 3]),
([7, 2, 5, 8, 6], [7, 5, 6]),
([-2, 3, 1, 10, 3, -7], [3, 10]),
([7, 2, 9, 10, 1], [7, 9, 1]),
([7, 2, 5, 18, 6], [7, 18]),
([7, 20, -3, -5, -4, 8, 60, 3, 1], [20, 60, 1]),
([-7, -20, -3, 5, -4, 8, 60, 3, 1], [5, 60, 1])]
def bigest(arr, cache, final=[0]):
if len(arr) == 0:
return cache
for i in range(len(arr)):
result = bigest(arr[i + 2:], [*cache, arr[i]], final)
if sum(cache) > sum(final):
final[:] = cache[:]
if sum(result) > sum(final):
final[:] = result[:]
return result
if __name__ == "__main__":
print("has started")
for test, answer in tests:
final = [0]
bigest(test, [], final)
assert final == answer, "not matching"
print(f"for {test} , answer: {final} ")
Here is a dynamic programming approach.
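def best_skips(data):
    # answers[i] = [best sum of a selection ending at data[i], linked path]
    answers = []
    for i in range(len(data)):
        x = data[i]
        best = [0, None]
        # Only answers ending at least two positions back are non-adjacent
        for prev in answers[0:i-1]:
            if best[0] < prev[0]:
                best = prev
        max_sum, path = best
        answers.append([max_sum + x, [x, path]])
    answers.append([0, None]) # Add empty set as an answer.
    # Compare on the sum only; comparing whole entries can fail on ties
    path = max(answers, key=lambda answer: answer[0])[1]
    final = []
    # The linked path runs from the last chosen element back to the first
    while path is not None:
        final.append(path[0])
        path = path[1]
    return final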
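The approach above rescans all earlier answers for every element. As an alternative sketch (a different, linear-time formulation, not the answer's own code), the best sum can be tabulated over prefixes and the chosen elements recovered by walking the table backwards. The name max_nonadjacent_subset is hypothetical, and the sketch assumes the empty selection (sum 0) is allowed.

```python
def max_nonadjacent_subset(nums):
    """Return (best_sum, chosen) where no two chosen elements are adjacent."""
    n = len(nums)
    # best[i] = best sum using only nums[:i]; choosing nothing gives 0
    best = [0] * (n + 1)
    for i in range(1, n + 1):
        take = nums[i - 1] + (best[i - 2] if i >= 2 else 0)
        best[i] = max(best[i - 1], take)
    # Walk backwards to recover which elements were taken
    chosen = []
    i = n
    while i >= 1:
        if best[i] == best[i - 1]:
            i -= 1  # nums[i - 1] was skipped
        else:
            chosen.append(nums[i - 1])
            i -= 2  # its left neighbour cannot be used
    chosen.reverse()
    return best[n], chosen


print(max_nonadjacent_subset([7, 2, -3, 5, -4, 8, 6, 3, 1]))  # (23, [7, 5, 8, 3])
print(max_nonadjacent_subset([7, 2, 5, 8, 6]))                # (18, [7, 5, 6])
print(max_nonadjacent_subset([-2, 3, 1, 10, 3, -7]))          # (13, [3, 10])
```

The three calls reproduce the expected outputs for A, B and C from the question, in original order.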
I want to group runs of consecutive values that appear more than once in the list, with each value belonging to only one group. See my examples below:
Note: results contains the indices of the values in test_list.
test_list = ["1","2","1","2","1","1","5325235","2","62623","1","1"]
--->results = [[[0, 1], [2, 3]],
[[4, 5], [9, 10]]]
test_list = ["1","2","1","1","2","1","5325235","2","62623","1","2","1","236","2388","626236437","1","2","1","236","2388"]
--->results = [[[9, 10, 11, 12, 13], [15, 16, 17, 18, 19]],
[[0, 1, 2], [3, 4, 5]]]
I built a recursive function:
def group_duplicate_continuous_value(list_label_group):
# how to know which continuous value is duplicate, I implement take next number minus the previous number
list_flag_grouping = [str(int(j.split("_")[0]) - int(i.split("_")[0])) +f"_{j}_{i}" for i,j in zip(list_label_group,list_label_group[1:])]
# I find duplicate value in list_flag_grouping
counter_elements = Counter(list_flag_grouping)
list_have_duplicate = [k for k,v in counter_elements.items() if v > 1]
if len(list_have_duplicate) > 0:
list_final_index = group_duplicate_continuous_value(list_flag_grouping)
# To return exactly value, I use index to define
for k, v in list_final_index.items():
temp_list = [v[i] + [v[i][-1] + 1] for i in range(0,len(v))]
list_final_index[k] = temp_list
check_first_cursive = list_label_group[0].split("_")
# If we have many list grouping duplicate countinous value with different length, we need function below to return exactly results
if len(check_first_cursive) > 1:
list_temp_index = find_index_duplicate(list_label_group)
list_duplicate_index = list_final_index.values()
list_duplicate_index = [val for sublist in list_duplicate_index for val1 in sublist for val in val1]
for k,v in list_temp_index.items():
list_index_v = [val for sublist in v for val in sublist]
if any(x in list_index_v for x in list_duplicate_index) is False:
list_final_index[k] = v
return list_final_index
else:
if len(list_label_group) > 0:
check_first_cursive = list_label_group[0].split("_")
if len(check_first_cursive) > 1:
list_final_index = find_index_duplicate(list_label_group)
return list_final_index
list_final_index = None
return list_final_index
Support function:
def find_index_duplicate(list_data):
dups = defaultdict(list)
for i, e in enumerate(list_data):
dups[e].append([i])
new_dict = {key:val for key, val in dups.items() if len(val) >1}
return new_dict
But when I run it with test_list = [5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,1,2,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,1,2,5,5,5], it is very slow and runs out of memory (~6GB). I know one cause is my recursive function group_duplicate_continuous_value overflowing the stack, but I don't know how to fix it.
You can create a dict of lists, where every item from the original list is a key in the dict, and every key is mapped to the list of its indices in the original list. For instance, your list ["1","3","5","5","7","1","3","5"] would result in the dict {"1": [0, 5], "3": [1, 6], "5": [2, 3, 7], "7": [4]}.
Creating a dict of lists in this way is very idiomatic in Python, and fast, too: it takes a single pass over the list.
def build_dict(l):
    d = {}
    for i, x in enumerate(l):
        d.setdefault(x, []).append(i)
    return d
l = ["1","3","5","5","7","1","3","5"]
d = build_dict(l)
print(d)
# {'1': [0, 5], '3': [1, 6], '5': [2, 3, 7], '7': [4]}
Then you can iterate on the dict to build two lists of indices:
def build_index_results(l):
    d = build_dict(l)
    idx1, idx2 = [], []
    for v in d.values():
        if len(v) > 1:
            idx1.append(v[0])
            idx2.append(v[1])
    return idx1, idx2
print(build_index_results(l))
# ([0, 1, 2], [5, 6, 3])
Or using zip:
from operator import itemgetter
def build_index_results(l):
    d = build_dict(l)
    return list(zip(*map(itemgetter(0,1), (v for v in d.values() if len(v) > 1))))
print(build_index_results(l))
# [(0, 1, 2), (5, 6, 3)]
I can't resist showcasing more_itertools.map_reduce for this:
from more_itertools import map_reduce
from operator import itemgetter
def build_index_results(l):
    d = map_reduce(enumerate(l),
        keyfunc=itemgetter(1),
        valuefunc=itemgetter(0),
        reducefunc=lambda v: v[:2] if len(v) > 1 else None
    )
    return list(zip(*filter(None, d.values())))
print(build_index_results(l))
# [(0, 1, 2), (5, 6, 3)]
The Goal
I would like to get the ranges where values are not None in a list, so for example:
test1 = [None, 0, None]
test2 = [2,1,None]
test3 = [None,None,3]
test4 = [1,0,None,0,0,None,None,1,None,0]
res1 = [[1,1]]
res2 = [[0,1]]
res3 = [[2,2]]
res4 = [[0,1],[3,4],[7,7],[9,9]]
What I have tried
This is my super lengthy implementation, which does not perfectly work...
def get_not_None_ranges(list_):
# Example [0, 2, None, 1, 4] -> [[0, 1], [3, 4]]
r = []
end_i = len(list_)-1
if list_[0] == None:
s = None
else:
s = 0
for i, elem in enumerate(list_):
if s != None:
if elem == None and end_i != i:
r.append([s,i-1])
s = i+1
if end_i == i:
if s > i:
r=r
elif s==i and elem == None:
r=r
else:
r.append([s,i])
else:
if elem != None:
s = i
if end_i == i:
if s > i:
r=r
else:
r.append([s,i])
return r
As you can see the results are sometimes wrong:
print(get_not_None_ranges(test1))
print(get_not_None_ranges(test2))
print(get_not_None_ranges(test3))
print(get_not_None_ranges(test4))
[[1, 2]]
[[0, 2]]
[[2, 2]]
[[0, 1], [3, 4], [6, 5], [7, 7], [9, 9]]
So, I was wondering if you guys know a much better way to achieve this?
Use itertools.groupby:
from itertools import groupby
test1 = [None, 0, None]
test2 = [2, 1, None]
test3 = [None, None, 3]
test4 = [1, 0, None, 0, 0, None, None, 1, None, 0]
def get_not_None_ranges(lst):
    result = []
    for key, group in groupby(enumerate(lst), key=lambda x: x[1] is not None):
        if key:
            index, _ = next(group)
            result.append([index, index + sum(1 for _ in group)])
    return result
print(get_not_None_ranges(test1))
print(get_not_None_ranges(test2))
print(get_not_None_ranges(test3))
print(get_not_None_ranges(test4))
Output
[[1, 1]]
[[0, 1]]
[[2, 2]]
[[0, 1], [3, 4], [7, 7], [9, 9]]
A non-groupby solution that doesn't need extra treatment for the last group:
def get_not_None_ranges(lst):
    result = []
    it = enumerate(lst)
    for i, x in it:
        if x is not None:
            first = last = i
            for i, x in it:
                if x is None:
                    break
                last = i
            result.append([first, last])
    return result
Whenever I find the first of a non-None streak, I use an inner loop to right away run to the last of that streak. To allow both loops to use the same iterator, I store it in a variable.
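To see the shared-iterator behaviour on its own, here is a small sketch (split_by_loops is a hypothetical name used only for illustration). Because both loops pull from the same iterator object, every item is consumed exactly once, by whichever loop happens to be running:

```python
def split_by_loops(values, stop):
    """Record which loop consumed each item of one shared iterator."""
    it = iter(values)
    outer, inner = [], []
    for x in it:
        outer.append(x)
        # The inner loop continues from where the outer loop left off
        for y in it:
            inner.append(y)
            if y == stop:
                break  # hand control back to the outer loop
    return outer, inner


print(split_by_loops([1, 2, 3, 4, 5], stop=3))  # ([1, 4], [2, 3, 5])
```

If the inner loop exhausts the iterator, the outer loop simply ends as well, which is why the answer needs no special handling for a streak that runs to the end of the list.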
You just need to iterate over the list, and check for two conditions:
If the previous element is None and the current element is not None, start a new "range".
If the previous element is not None and the current element is None, end the currently active range at the previous index.
def gnnr(lst):
    all_ranges = []
    current_range = []
    prev_item = None
    for index, item in enumerate(lst):
        # Condition 1
        if prev_item is None and item is not None:
            current_range.append(index)
        # Condition 2
        elif prev_item is not None and item is None:
            current_range.append(index - 1) # Close current range at the previous index
            all_ranges.append(current_range) # Add to all_ranges
            current_range = [] # Reset current_range
        prev_item = item
    # If current_range isn't closed, close it at the last index of the list
    if current_range:
        current_range.append(index)
        all_ranges.append(current_range)
    return all_ranges
Calling this function with your test cases gives the expected output:
[[1, 1]]
[[0, 1]]
[[2, 2]]
[[0, 1], [3, 4], [7, 7], [9, 9]]
We can solve this with a classic sliding-window approach.
Here is a solution that works:
def getRanges(nums):
    left = right = 0
    ranges, n = [], len(nums)
    while right < n:
        # Skip leading None values
        while left < n and nums[left] is None:
            left += 1
            right += 1
        # Extend the window over the non-None run
        while right < n and nums[right] is not None:
            right += 1
        if right >= n:
            break
        ranges.append([left, right - 1])
        left = right = right + 1
    return ranges + [[left, right - 1]] if right - 1 >= left else ranges
Let's test it:
test = [
    [1, 0, None, 0, 0, None, None, 1, None, 0],
    [None, None, 3],
    [2, 1, None],
    [None, 0, None],
]
for i in test:
    print(getRanges(i))
Output:
[[0, 1], [3, 4], [7, 7], [9, 9]]
[[2, 2]]
[[0, 1]]
[[1, 1]]
Give it a try. The code uses type hints and a named tuple to improve readability.
from typing import NamedTuple, List, Any
class Range(NamedTuple):
    left: int
    right: int
def get_ranges(lst: List[Any]) -> List[Range]:
    ranges: List[Range] = []
    left = None
    right = None
    for i, x in enumerate(lst):
        is_none = x is None
        if is_none:
            if left is not None:
                right = right if right is not None else left
                ranges.append(Range(left, right))
                left = None
                right = None
        else:
            if left is None:
                left = i
            else:
                right = i
    if left is not None:
        right = right if right is not None else left
        ranges.append(Range(left, right))
    return ranges
data = [[1,0,None,0,0,None,None,1,None,0],[None,None,3],[2,1,None],[None, 0, None]]
for entry in data:
    print(get_ranges(entry))
Output
[Range(left=0, right=1), Range(left=3, right=4), Range(left=7, right=7), Range(left=9, right=9)]
[Range(left=2, right=2)]
[Range(left=0, right=1)]
[Range(left=1, right=1)]
Using first and last of each group of not nones:
from itertools import groupby
def get_not_None_ranges(lst):
    result = []
    for nones, group in groupby(enumerate(lst), lambda x: x[1] is None):
        if not nones:
            first = last = next(group)
            for last in group:
                pass
            result.append([first[0], last[0]])
    return result
Here's my example. It is definitely NOT the most efficient way, but I think it is more intuitive and you can optimize it later.
def get_not_None_ranges(list_: list):
    res = []
    start_index = -1
    for i in range(len(list_)):
        e = list_[i]
        if e is not None:
            if start_index < 0:
                start_index = i
        else:
            if start_index >= 0:
                res.append([start_index, i - 1])
                start_index = -1
    if start_index >= 0:
        res.append([start_index, len(list_) - 1])
    return res
The main idea of this function:
start_index is initialized to -1.
When we meet a non-None element while start_index is -1, set start_index to its index.
When we meet None while start_index >= 0, save [start_index, i - 1] (the previous element is the end of the run), then set start_index back to -1.
When we meet None while start_index is -1, do nothing, since no run is open. Likewise, do nothing when we meet a non-None element while start_index >= 0.
When the loop ends with start_index still >= 0, the last run has not been recorded yet, so we record it manually.
This may seem a little complex; it may help to paste the code above and step through it line by line in a debugger.
How about:-
test1 = [None, 0, None]
test2 = [2, 1, None]
test3 = [None, None, 3]
test4 = [1, 0, None, 0, 0, None, None, 1, None, 0]
def goal(L):
    r = []
    _r = None
    for i, e in enumerate(L):
        if e is not None:
            if _r:
                _r[1] = i
            else:
                _r = [i, i]
        else:
            if _r:
                r.append(_r)
                _r = None
    if _r:
        r.append(_r)
    return r
for _l in [test1, test2, test3, test4]:
    print(goal(_l))
Another solution (one-liner with itertools.groupby):
from itertools import groupby
out = [[(v := list(g))[0][1], v[-1][1]] for _, g in groupby(enumerate(i for i, v in enumerate(testX) if v is not None), lambda k: k[0] - k[1])]
Tests:
test1 = [None, 0, None]
test2 = [2, 1, None]
test3 = [None, None, 3]
test4 = [1, 0, None, 0, 0, None, None, 1, None, 0]
tests = [test1, test2, test3, test4]
for t in tests:
    out = [
        [(v := list(g))[0][1], v[-1][1]]
        for _, g in groupby(
            enumerate(i for i, v in enumerate(t) if v is not None),
            lambda k: k[0] - k[1],
        )
    ]
    print(out)
Prints:
[[1, 1]]
[[0, 1]]
[[2, 2]]
[[0, 1], [3, 4], [7, 7], [9, 9]]
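The one-liner is dense, so here is an unrolled sketch of what it appears to rely on (ranges_by_offset, kept and run are hypothetical names). Within a run of consecutive indices, the difference between an index's position in the filtered list and the index itself stays constant, so grouping by that difference separates the runs:

```python
from itertools import groupby


def ranges_by_offset(lst):
    """Return [first, last] index pairs for each run of non-None values."""
    # Indices of the non-None entries, e.g. [0, 1, 3, 4, 7, 9]
    kept = [i for i, v in enumerate(lst) if v is not None]
    result = []
    # (position - index) is constant inside a run of consecutive indices
    for _, run in groupby(enumerate(kept), key=lambda pair: pair[0] - pair[1]):
        run = [index for _, index in run]
        result.append([run[0], run[-1]])
    return result


test4 = [1, 0, None, 0, 0, None, None, 1, None, 0]
print(ranges_by_offset(test4))  # [[0, 1], [3, 4], [7, 7], [9, 9]]
```

For test4 the differences are 0, 0, -1, -1, -3, -4, which yields the four groups shown in the output.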
In the following code:
def pascal_row(row):
    if row == 0:
        return [1]
    previous_row = pascal_row(row - 1)
    pairs = zip(previous_row[:-1], previous_row[1:])
    # list() is needed on Python 3, where map() returns an iterator
    return [1] + list(map(sum, pairs)) + [1]
if I print (pascal_row(5)), it returns [1, 5, 10, 10, 5, 1] which is the correct solution.
This is a homework assignment where we need to use recursion, and cannot use any loops or zip.
Could someone please help me convert it accordingly? Thank you!
You can use a different recursive function sliding_sum in order to calculate the pairwise sum for the previous row. Then, just append [1] on either end.
def sliding_sum(someList):
    # Base case: a single item leaves no pair to add
    if len(someList) == 1:
        return []
    return [someList[0] + someList[1]] + sliding_sum(someList[1:])
def pascal_row(row):
    if row == 0:
        return [1]
    previous_row = pascal_row(row-1)
    new_row = [1] + sliding_sum(previous_row) + [1]
    return new_row
for i in range(6):
    print(pascal_row(i))
Output
[1]
[1, 1]
[1, 2, 1]
[1, 3, 3, 1]
[1, 4, 6, 4, 1]
[1, 5, 10, 10, 5, 1]
Here's another solution involving a helper function:
def pascal_row(row):
    if row == 0:
        return [1]
    return _pascal_row(row, 0, [1])
def _pascal_row(target_row, current_row, res):
    if target_row == current_row:
        return res
    else:
        res = [1] + [res[i] + res[i+1] for i in range(len(res) - 1)] + [1]
        return _pascal_row(target_row, current_row + 1, res)
print(pascal_row(5)) # [1, 5, 10, 10, 5, 1]