The Goal
I would like to get the ranges where values are not None in a list, so for example:
test1 = [None, 0, None]
test2 = [2,1,None]
test3 = [None,None,3]
test4 = [1,0,None,0,0,None,None,1,None,0]
res1 = [[1,1]]
res2 = [[0,1]]
res3 = [[2,2]]
res4 = [[0,1],[3,4],[7,7],[9,9]]
What I have tried
This is my super lengthy implementation, which does not perfectly work...
def get_not_None_ranges(list_):
    # Example [0, 2, None, 1, 4] -> [[0, 1], [3, 4]]
    r = []
    end_i = len(list_)-1
    if list_[0] == None:
        s = None
    else:
        s = 0
    for i, elem in enumerate(list_):
        if s != None:
            if elem == None and end_i != i:
                r.append([s,i-1])
                s = i+1
            if end_i == i:
                if s > i:
                    r=r
                elif s==i and elem == None:
                    r=r
                else:
                    r.append([s,i])
        else:
            if elem != None:
                s = i
            if end_i == i:
                if s > i:
                    r=r
                else:
                    r.append([s,i])
    return r
As you can see the results are sometimes wrong:
print(get_not_None_ranges(test1))
print(get_not_None_ranges(test2))
print(get_not_None_ranges(test3))
print(get_not_None_ranges(test4))
[[1, 2]]
[[0, 2]]
[[2, 2]]
[[0, 1], [3, 4], [6, 5], [7, 7], [9, 9]]
So, I was wondering if you guys know a much better way to achieve this?
Use itertools.groupby:
from itertools import groupby
test1 = [None, 0, None]
test2 = [2, 1, None]
test3 = [None, None, 3]
test4 = [1, 0, None, 0, 0, None, None, 1, None, 0]
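def get_not_None_ranges(lst):
    result = []
    # Group (index, value) pairs into alternating runs of None / not None
    for key, group in groupby(enumerate(lst), key=lambda x: x[1] is not None):
        if key:
            index, _ = next(group)
            # The remaining items in the group give the length of the run
            result.append([index, index + sum(1 for _ in group)])
    return result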
print(get_not_None_ranges(test1))
print(get_not_None_ranges(test2))
print(get_not_None_ranges(test3))
print(get_not_None_ranges(test4))
Output
[[1, 1]]
[[0, 1]]
[[2, 2]]
[[0, 1], [3, 4], [7, 7], [9, 9]]
A non-groupby solution that doesn't need extra treatment for the last group:
def get_not_None_ranges(lst):
    result = []
    it = enumerate(lst)
    for i, x in it:
        if x is not None:
            first = last = i
            for i, x in it:
                if x is None:
                    break
                last = i
            result.append([first, last])
    return result
Whenever I find the first of a non-None streak, I use an inner loop to right away run to the last of that streak. To allow both loops to use the same iterator, I store it in a variable.
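To make the shared-iterator behaviour concrete, here is a minimal sketch that is separate from the answer's function. The variable names are made up for illustration. It only shows that an inner loop over the same iterator resumes where the outer loop stopped, and that the item on which the inner loop breaks is consumed and never seen by the outer loop.

```python
it = iter([10, 20, 30, 40, 50])   # one iterator object shared by both loops

seen_by_outer = []
seen_by_inner = []

for x in it:                      # outer loop pulls the next unconsumed item
    seen_by_outer.append(x)
    for y in it:                  # inner loop continues from the same position
        seen_by_inner.append(y)
        if y == 30:
            break                 # 30 is already consumed; the outer loop resumes at 40

print(seen_by_outer)
print(seen_by_inner)
```

In the range-finding function, the item consumed by the break is the None that ends a streak, which is exactly why no extra bookkeeping is needed.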
You just need to iterate over the list, and check for two conditions:
If the previous element is None and the current element is not None, start a new "range".
If the previous element is not None and the current element is None, end the currently active range at the previous index.
def gnnr(lst):
    all_ranges = []
    current_range = []
    prev_item = None
    for index, item in enumerate(lst):
        # Condition 1
        if prev_item is None and item is not None:
            current_range.append(index)
        # Condition 2
        elif prev_item is not None and item is None:
            current_range.append(index - 1)   # Close current range at the previous index
            all_ranges.append(current_range)  # Add to all_ranges
            current_range = []                # Reset current_range
        prev_item = item
    # If current_range isn't closed, close it at the last index of the list
    if current_range:
        current_range.append(index)
        all_ranges.append(current_range)
    return all_ranges
Calling this function with your test cases gives the expected output:
[[1, 1]]
[[0, 1]]
[[2, 2]]
[[0, 1], [3, 4], [7, 7], [9, 9]]
We can solve this with the classic sliding-window (two-pointer) approach.
Here is a solution that works:
def getRanges(nums):
    left = right = 0
    ranges, n = [], len(nums)
    while right < n:
        # Skip over None values to the start of the next range
        while left < n and nums[left] is None:
            left += 1
            right += 1
        # Extend the window while the values are not None
        while right < n and nums[right] is not None:
            right += 1
        if right >= n:
            break
        ranges.append([left, right - 1])
        left = right = right + 1
    # Add the final range if the list ended inside one
    return ranges + [[left, right - 1]] if right - 1 >= left else ranges
Let's test it:
test = [
    [1, 0, None, 0, 0, None, None, 1, None, 0],
    [None, None, 3],
    [2, 1, None],
    [None, 0, None],
]
for i in test:
    print(getRanges(i))
Output:
[[0, 1], [3, 4], [7, 7], [9, 9]]
[[2, 2]]
[[0, 1]]
[[1, 1]]
Give it a try. The code uses type hints and a named tuple to improve readability.
from typing import NamedTuple, List, Any

class Range(NamedTuple):
    left: int
    right: int

def get_ranges(lst: List[Any]) -> List[Range]:
    ranges: List[Range] = []
    left = None
    right = None
    for i, x in enumerate(lst):
        is_none = x is None
        if is_none:
            if left is not None:
                right = right if right is not None else left
                ranges.append(Range(left, right))
                left = None
                right = None
        else:
            if left is None:
                left = i
            else:
                right = i
    if left is not None:
        right = right if right is not None else left
        ranges.append(Range(left, right))
    return ranges

data = [[1,0,None,0,0,None,None,1,None,0],[None,None,3],[2,1,None],[None, 0, None]]
for entry in data:
    print(get_ranges(entry))
Output
[Range(left=0, right=1), Range(left=3, right=4), Range(left=7, right=7), Range(left=9, right=9)]
[Range(left=2, right=2)]
[Range(left=0, right=1)]
[Range(left=1, right=1)]
Using first and last of each group of not nones:
from itertools import groupby
def get_not_None_ranges(lst):
    result = []
    for nones, group in groupby(enumerate(lst), lambda x: x[1] is None):
        if not nones:
            first = last = next(group)
            for last in group:
                pass
            result.append([first[0], last[0]])
    return result
Here's my example. It is definitely NOT the most efficient way, but I think it is more intuitive and you can optimize it later.
def get_not_None_ranges(list_: list):
    res = []
    start_index = -1
    for i in range(len(list_)):
        e = list_[i]
        if e is not None:
            if start_index < 0:
                start_index = i
        else:
            if start_index >= 0:
                res.append([start_index, i - 1])
                start_index = -1
    if start_index >= 0:
        res.append([start_index, len(list_) - 1])
    return res
The main idea of this function:
start_index is initialized to -1, meaning no range is currently open.
When we meet a non-None element and no range is open, set start_index to its index.
When we meet None while a range is open, save [start_index, i - 1] (the previous element is the end of the range), then set start_index back to -1.
When we meet None but start_index is -1, we do nothing, since no range is open. For the same reason, we do nothing when we meet a non-None element while start_index >= 0.
When the loop ends but start_index is still >= 0, the last range has not been recorded yet, so we record it after the loop.
This may look a little complex; it may help to paste the code above and step through it line by line in a debugger.
How about:-
test1 = [None, 0, None]
test2 = [2, 1, None]
test3 = [None, None, 3]
test4 = [1, 0, None, 0, 0, None, None, 1, None, 0]
def goal(L):
    r = []
    _r = None
    for i, e in enumerate(L):
        if e is not None:
            if _r:
                _r[1] = i
            else:
                _r = [i, i]
        else:
            if _r:
                r.append(_r)
                _r = None
    if _r:
        r.append(_r)
    return r
for _l in [test1, test2, test3, test4]:
    print(goal(_l))
Another solution (one-liner with itertools.groupby):
from itertools import groupby
out = [[(v := list(g))[0][1], v[-1][1]] for _, g in groupby(enumerate(i for i, v in enumerate(testX) if not v is None), lambda k: k[0] - k[1],)]
Tests:
test1 = [None, 0, None]
test2 = [2, 1, None]
test3 = [None, None, 3]
test4 = [1, 0, None, 0, 0, None, None, 1, None, 0]
tests = [test1, test2, test3, test4]
for t in tests:
    out = [
        [(v := list(g))[0][1], v[-1][1]]
        for _, g in groupby(
            enumerate(i for i, v in enumerate(t) if not v is None),
            lambda k: k[0] - k[1],
        )
    ]
    print(out)
Prints:
[[1, 1]]
[[0, 1]]
[[2, 2]]
[[0, 1], [3, 4], [7, 7], [9, 9]]
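The one-liner is dense, so here is an unrolled sketch of what it appears to do. The function name ranges_by_offset and the intermediate list kept are hypothetical names introduced only for this illustration. The idea: collect the indices of the non-None entries, pair each with its position in that list, and note that position minus index stays constant within a run of consecutive indices, so groupby can split on that difference.

```python
from itertools import groupby

def ranges_by_offset(lst):
    # Indices of the entries that are not None, e.g. [0, 1, 3, 4, 7, 9]
    kept = [i for i, v in enumerate(lst) if v is not None]
    result = []
    # enumerate(kept) yields (position, index); the difference position - index
    # only changes when there is a gap between consecutive indices.
    for _, group in groupby(enumerate(kept), key=lambda pair: pair[0] - pair[1]):
        run = [index for _, index in group]
        result.append([run[0], run[-1]])
    return result

print(ranges_by_offset([1, 0, None, 0, 0, None, None, 1, None, 0]))
```

This version makes two passes and builds a temporary list, so it trades a little efficiency for readability compared with the single-pass loops above.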
Related
I want to group runs of consecutive values that occur more than once in the list, with each value belonging to only one group. See my examples below:
Note: results holds indices into test_list.
test_list = ["1","2","1","2","1","1","5325235","2","62623","1","1"]
--->results = [[[0, 1], [2, 3]],
[[4, 5], [9, 10]]]
test_list = ["1","2","1","1","2","1","5325235","2","62623","1","2","1","236","2388","626236437","1","2","1","236","2388"]
--->results = [[[9, 10, 11, 12, 13], [15, 16, 17, 18, 19]],
[[0, 1, 2], [3, 4, 5]]]
I build a recursive function:
from collections import Counter

def group_duplicate_continuous_value(list_label_group):
    # how to know which continuous value is duplicate, I implement take next number minus the previous number
    list_flag_grouping = [str(int(j.split("_")[0]) - int(i.split("_")[0])) +f"_{j}_{i}" for i,j in zip(list_label_group,list_label_group[1:])]
    # I find duplicate value in list_flag_grouping
    counter_elements = Counter(list_flag_grouping)
    list_have_duplicate = [k for k,v in counter_elements.items() if v > 1]
    if len(list_have_duplicate) > 0:
        list_final_index = group_duplicate_continuous_value(list_flag_grouping)
        # To return exactly value, I use index to define
        for k, v in list_final_index.items():
            temp_list = [v[i] + [v[i][-1] + 1] for i in range(0,len(v))]
            list_final_index[k] = temp_list
        check_first_cursive = list_label_group[0].split("_")
        # If we have many list grouping duplicate countinous value with different length, we need function below to return exactly results
        if len(check_first_cursive) > 1:
            list_temp_index = find_index_duplicate(list_label_group)
            list_duplicate_index = list_final_index.values()
            list_duplicate_index = [val for sublist in list_duplicate_index for val1 in sublist for val in val1]
            for k,v in list_temp_index.items():
                list_index_v = [val for sublist in v for val in sublist]
                if any(x in list_index_v for x in list_duplicate_index) is False:
                    list_final_index[k] = v
        return list_final_index
    else:
        if len(list_label_group) > 0:
            check_first_cursive = list_label_group[0].split("_")
            if len(check_first_cursive) > 1:
                list_final_index = find_index_duplicate(list_label_group)
                return list_final_index
        list_final_index = None
        return list_final_index
Support function:
from collections import defaultdict

def find_index_duplicate(list_data):
    dups = defaultdict(list)
    for i, e in enumerate(list_data):
        dups[e].append([i])
    new_dict = {key:val for key, val in dups.items() if len(val) >1}
    return new_dict
But when I run it with test_list = [5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,1,2,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,1,2,5,5,5], it is very slow and runs out of memory (~6 GB). I believe the cause is the deep recursion in my function group_duplicate_continuous_value, but I don't know how to fix it.
You can create a dict of lists, where every item from the original list is a key in the dict, and every key is mapped to the list of its indices in the original list. For instance, your list ["1","3","5","5","7","1","3","5"] would result in the dict {"1": [0, 5], "3": [1, 6], "5": [2, 3, 7], "7": [4]}.
Creating a dict of lists in this way is very idiomatic in python, and fast, too: it can be done by iterating just once on the list.
def build_dict(l):
    d = {}
    for i, x in enumerate(l):
        d.setdefault(x, []).append(i)
    return d
l = ["1","3","5","5","7","1","3","5"]
d = build_dict(l)
print(d)
# {'1': [0, 5], '3': [1, 6], '5': [2, 3, 7], '7': [4]}
Then you can iterate on the dict to build two lists of indices:
def build_index_results(l):
    d = build_dict(l)
    idx1, idx2 = [], []
    for v in d.values():
        if len(v) > 1:
            idx1.append(v[0])
            idx2.append(v[1])
    return idx1, idx2
print(build_index_results(l))
# ([0, 1, 2], [5, 6, 3])
Or using zip:
from operator import itemgetter
def build_index_results(l):
    d = build_dict(l)
    return list(zip(*map(itemgetter(0,1), (v for v in d.values() if len(v) > 1))))
print(build_index_results(l))
# [(0, 1, 2), (5, 6, 3)]
I can't resist showcasing more_itertools.map_reduce for this:
from more_itertools import map_reduce
from operator import itemgetter
def build_index_results(l):
    d = map_reduce(enumerate(l),
        keyfunc=itemgetter(1),
        valuefunc=itemgetter(0),
        reducefunc=lambda v: v[:2] if len(v) > 1 else None
    )
    return list(zip(*filter(None, d.values())))
print(build_index_results(l))
# [(0, 1, 2), (5, 6, 3)]
The problem is: given an array nums, find the unique subarrays (need not be continuous) that add up to k. If I comment out the first if statement that checks for inclusion in the memo, everything works. With that line, it will not work. Can anyone help me figure out why? Thank you
class Solution:
    def unique_subarrays_sum(self, nums, k):
        self.memo = {}
        self.pool = sorted(nums)
        return self.helper([], 0, k)

    def first_idx(self, nums, start):
        seen = set()
        return [(idx, n) for idx, n in enumerate(nums, start) if n not in seen and not seen.add(n)]

    def helper(self, used, idx, k, call_num=0):
        if (idx, k) in self.memo:
            return self.memo[(idx, k)]
        elif k == 0:
            self.memo[(idx, k)] = [used]
            return self.memo[(idx, k)]
        elif k < 0:
            return []
        else:
            res = []
            # print(' '*call_num, self.first_idx(self.pool[idx:], idx), sep='')
            for i, e in self.first_idx(self.pool[idx:], idx):
                # print(f'{" "*call_num}going to: used={used+[e]}, pool={self.pool[i+1:]}, k={k-e}')
                r = self.helper(used+[e], i+1, k-e, call_num+1)
                res.extend(r)
            self.memo[(idx, k)] = res
            # print(f'{" "*call_num}found: {res}', end='\n')
            return res
An example where this doesn't work:
input = [4,1,1,4,4,4,4,2,3,5]
output, memoized (wrong) = [[1, 1, 3, 5], [1, 1, 4, 4], [1, 2, 3, 4], [1, 4, 5], [1, 1, 3, 5], [1, 1, 4, 4]]
output, no memo (correct) = [[1, 1, 3, 5], [1, 1, 4, 4], [1, 2, 3, 4], [1, 4, 5], [2, 3, 5], [2, 4, 4]]
The issue is that used is passed down into the memoized call, so the cached result for (idx, k) includes whatever prefix happened to be in use when it was first computed. The prefix should only be prepended to the results after the recursive call returns:
from collections import defaultdict

class Solution:
    def unique_subarrays_sum(self, nums, k):
        self.memo = defaultdict(list)
        self.pool = sorted(nums)
        return self.helper([], 0, k)

    def first_idx(self, nums, start):
        seen = set()
        return [(idx, n) for idx, n in enumerate(nums, start) if n not in seen and not seen.add(n)]

    def helper(self, used, idx, k):
        if (idx, k) in self.memo:
            return self.memo[(idx, k)]
        elif k == 0:
            return [used]
        elif k < 0:
            return []
        else:
            res = []
            for i, e in self.first_idx(self.pool[idx:], idx):
                # Recurse with only [e]; prepend this call's prefix afterwards
                r = [used + l for l in self.helper([e], i+1, k-e)]
                res.extend(r)
            self.memo[(idx, k)] = res
            return res
I have a list of lists and I want to remove the zero values that lie between numbers in each list. All the lists inside my list have the same length.
For example:
List1=[[0,1,0,2,3,0,0],[0,5,6,0,0,9,0]]
desired output:
list2=[[0,1,2,3,0,0],[0,5,6,9,0]]
I was thinking about using indices to identify the first non zero value and last non zero value, but then I don't know how I can remove zeros between them.
You have the right idea, I think, with finding the first and last indices of nonzeroes and removing zeroes between them. Here's a function that does that:
def remove_enclosed_zeroes(lst):
    try:
        first_nonzero = next(
            i
            for (i, e) in enumerate(lst)
            if e != 0
        )
        last_nonzero = next(
            len(lst) - i - 1
            for (i, e) in enumerate(reversed(lst))
            if e != 0
        )
    except StopIteration:
        # No nonzero element at all: nothing is enclosed, return a copy
        return lst[:]
    return lst[:first_nonzero] + \
        [e for e in lst[first_nonzero:last_nonzero] if e != 0] + \
        lst[last_nonzero:]
list1 = [[0,1,0,2,3,0,0],[0,5,6,0,0,9,0]]
list2 = [remove_enclosed_zeroes(sublist) for sublist in list1]
# [[0, 1, 2, 3, 0, 0], [0, 5, 6, 9, 0]]
Inspired by @python_user, I thought about this a bit more and came up with this simpler solution:
def remove_internal_zeros(lst):
    return [v for i, v in enumerate(lst) if v or not any(lst[i+1:]) or not any(lst[:i])]
This works by passing any value from the original list which is either
not zero (v); or
zero and not preceded by a non-zero value (not any(lst[:i])); or
zero and not followed by a non-zero value (not any(lst[i+1:]))
To apply it to every sublist at once, it can also be written as a nested list comprehension:
list2 = [[v for i, v in enumerate(lst) if v or not any(lst[:i]) or not any(lst[i+1:])] for lst in list1]
Original Answer
Here's another brute force approach, this pops all the zeros off either end of the list into start and end lists, then filters the balance of the list for non-zero values:
def remove_internal_zeros(l):
    start_zeros = []
    # get starting zeros
    v = l.pop(0)
    while v == 0 and len(l) > 0:
        start_zeros.append(0)
        v = l.pop(0)
    if len(l) == 0:
        return start_zeros + [v]
    l = [v] + l
    # get ending zeros
    end_zeros = []
    v = l.pop()
    while v == 0 and len(l) > 0:
        end_zeros.append(0)
        v = l.pop()
    # filter balance of list
    if len(l) == 0:
        return start_zeros + [v] + end_zeros
    return start_zeros + list(filter(bool, l)) + [v] + end_zeros
print(remove_internal_zeros([0,1,0,2,3,0,0]))
print(remove_internal_zeros([0,5,6,0,0,9,0]))
print(remove_internal_zeros([0,0]))
print(remove_internal_zeros([0,5,0]))
Output:
[0, 1, 2, 3, 0, 0]
[0, 5, 6, 9, 0]
[0, 0]
[0, 5, 0]
I think this has to be done with brute force.
new = []
for sub in List1:
    # Find last non-zero.
    for j in range(len(sub)):
        if sub[-1-j]:
            lastnonzero = len(sub)-j
            break
    print(j)
    newsub = []
    firstnonzero = False
    for i,j in enumerate(sub):
        if j:
            firstnonzero = True
            newsub.append(j)
        elif i >= lastnonzero or not firstnonzero:
            newsub.append(j)
    new.append(newsub)
print(new)
Please try this; it removes all zeros between the numbers in each list:
list1=[[0,1,0,2,3,0,0],[0,5,6,0,0,9,0]]
rowIndex=len(list1) # count of rows
colIndex=len(list1[0]) # count of columns
for i in range(0, rowIndex):
    noZeroFirstIndex = 1
    noZeroLastIndex = colIndex - 2
    for j in range(1, colIndex - 1):
        if(list1[i][j] != 0):
            noZeroFirstIndex = j
            break
    for j in range(colIndex -2, 0, -1):
        if(list1[i][j] != 0):
            noZeroLastIndex = j
            break
    # Delete from the right so earlier indices stay valid
    for j in range(noZeroLastIndex, noZeroFirstIndex, -1):
        if(list1[i][j] == 0 ):
            del list1[i][j]
print(list1)
Result:
[[0, 1, 2, 3, 0, 0], [0, 5, 6, 9, 0]]
I wrote a pretty straight-forward approach. Try this.
def removeInnerZeroes(list):
    listHold=[]
    listNew = []
    firstNonZeroFound = False
    for item in list:
        if item==0:
            if firstNonZeroFound:
                listHold.append(item)
            else:
                listNew.append(item)
        else:
            firstNonZeroFound=True
            listHold.clear()
            listNew.append(item)
    listNew.extend(listHold)
    return listNew
complexList = [[0,1,0,2,3,0,0],[0,5,6,0,0,9,0]]
print(complexList)
complexListNew = []
for listi in complexList:
    complexListNew.append(removeInnerZeroes(listi))
print(complexListNew)
One way to do it is to treat each sublist as 3 sections:
Zeros at the front, if any
Zeros at the end, if any
Numbers in the middle from which zeros are to be purged
itertools.takewhile is handy for the front and end bits.
from itertools import takewhile
List1=[[0,1,0,2,3,0,0],[0,5,6,0,0,9,0]]
def purge_middle_zeros(numbers):
    is_zero = lambda x: x==0
    leading_zeros = list(takewhile(is_zero, numbers))
    n_lead = len(leading_zeros)
    trailing_zeros = list(takewhile(is_zero, reversed(numbers[n_lead:])))
    n_trail = len(trailing_zeros)
    mid_numbers = numbers[n_lead:-n_trail] if n_trail else numbers[n_lead:]
    mid_non_zeros = [x for x in mid_numbers if x]
    return leading_zeros + mid_non_zeros + trailing_zeros
list2 = [purge_middle_zeros(sub_list) for sub_list in List1]
list2
[[0, 1, 2, 3, 0, 0], [0, 5, 6, 9, 0]]
Other notes:
the lambda function is_zero tells takewhile what the criteria are for continuing, in this case "keep taking while it's a zero"
for the mid_non_zeros section the list comprehension [x for.... ] takes all the numbers except for the zeros (the if x at the end applies the filter)
slicing notation to pick out the middle of the list, numbers[from_start:-from_end] with the negative -from_end meaning 'except for this many elements at the end'. The case where there are no trailing zeros requires a different slice expression, i.e. numbers[from_start:]
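The reason the no-trailing-zeros case needs its own slice can be seen in a small sketch (the values below are made up for illustration): -0 is just 0, so numbers[n_lead:-0] is numbers[n_lead:0], which is empty. One way to avoid the special case, offered here as an alternative and not as part of the answer above, is to compute the end index from the length.

```python
numbers = [0, 5, 6, 9]        # one leading zero, no trailing zeros
n_lead, n_trail = 1, 0

# -n_trail is -0 == 0, so the slice stops at index 0 and selects nothing
wrong = numbers[n_lead:-n_trail]

# An explicit end index behaves the same whether n_trail is 0 or not
right = numbers[n_lead:len(numbers) - n_trail]

print(wrong)
print(right)
```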
I wrote the following code out of curiosity, that is somewhat intuitive and mimics a "looking item-by-item" approach.
def remove_zeros_inbetween(list_):
    new_list = list_.copy()
    for j, l in enumerate(list_): # loop through the inner lists
        checking = False
        start = end = None
        i = 0
        deleted = 0
        while i < len(l): # loop through the values of an inner list
            if l[i] == 0: # ignore
                i += 1
                continue
            if l[i] != 0 and not checking: # non-zero value found
                checking = True # start checking for zeros
                start = i
            elif l[i] != 0 and checking: # if we got here while checking, then finish checking
                checking = False
                end = i
            if start and end: # if both values have been set, i.e., different from None
                # delete values in-between
                new_list[j] = new_list[j][:(start+1-deleted)] + new_list[j][(end-deleted):]
                deleted += end - start - 1
                if l[i] != 0: # for the case of two non-zero values
                    start = i
                    checking = True
                else:
                    i = end # ignore everything up to end
                end = None # restart check
            i += 1
    return new_list
>>> remove_zeros_inbetween([[0, 1, 0, 2, 3, 0, 5], [0, 5, 6, 0, 0, 9, 4]])
[[0, 1, 2, 3, 5], [0, 5, 6, 9, 4]]
>>> remove_zeros_inbetween([[0, 0], [0, 3, 0], [0]])
[[0, 0], [0, 3, 0], [0]]
>>> remove_zeros_inbetween([[0, 0, 0, 0]])
[[0, 0, 0, 0]]
You start by replacing 0 by "0" - which is not necessary. Secondly your filter call does not save the resulting list; try:
list1[i] = list(filter(lambda a: a !=0, list1[1:-1])) # changed indexing , I suppose this could work
I have a list such as
[0,0,0,12,34,86,0,0,0,95,20,1,6,0,0,0,11,24,67,0,0,0]
I want to find the start and end positions of each run of positive elements:
[[3,5],[9,12],[16,18]]
What is the best way to do this in Python?
(Preferably with built-in functions or tools such as find, lambda, itemgetter and so on.)
And lastly, regex version. ;)
input = [0,0,0,12,34,86,0,0,0,95,20,1,6,0,0,0,11,24,67,0,0,0]
input3 = str(list(
    map(lambda i_x: i_x[0] * (i_x[1] and (1, -1)[i_x[1] < 0]), enumerate(input))
))
import re
s = re.sub(r'([\[ ]0[\],])+', ' ', input3)
s = s.replace(',  ', '], [')  # a comma followed by two spaces marks the gap between runs
if s[-1:] != ']':
    s = s[:-2] + ']'
s = '[' + s[2:]
s = re.sub(r' [0-9]+,', '', s)
output = list(eval(s))
print(output) # [[3, 5], [9, 12], [16, 18]]
Crude for solution. :(
input = [0,0,0,12,34,86,0,0,0,95,20,1,6,0,0,0,11,24,67,0,0,0]
output = []
pair = []
for i in range(len(input)):
    if input[i] > 0:
        if len(pair) > 1:
            pair.pop()
        pair.append(i)
    else:
        if pair:
            output.append(pair)
            pair = []
print(output) # [[3, 5], [9, 12], [16, 18]]
Not sure if ranges can go off the ends of the array or not.
def get_positive_ranges(a):
    in_range = False
    result = []
    for i in range(len(a)):
        if not in_range:
            if a[i] > 0:
                in_range = True
                first = i
        else: # Inside a range
            if a[i] <= 0: # End of range
                in_range = False
                result.append([first, i - 1])
    if in_range: # Tidy
        result.append([first, i])
    return result
print(get_positive_ranges([0,0,0,12,34,86,0,0,0,95,20,1,6,0,0,0,11,24,67,0,0,0]))
print(get_positive_ranges([]))
print(get_positive_ranges([1]))
print(get_positive_ranges([0, 1]))
print(get_positive_ranges([0, 1, 0]))
Here is a numpy solution, not sure if this is better than a naive for-loop though; see inline comments for explanation.
import numpy as np
a = np.array([0,0,0,12,34,86,0,0,0,95,20,1,6,0,0,0,11,24,67,0,0,0])
# get indices of non-zero elements in a
nze = a.nonzero()[0]
# check where the differences of these indices are unequal to one; there you have a jump to/from 0
nze_diff = np.where(np.diff(nze) > 1)[0] + 1
# nze_diff holds the start position (within nze) of every run except the first, so prepend 0
if nze_diff.size == 0 or nze_diff[0] != 0:
    nze_diff = np.insert(nze_diff, 0, 0)
# store output
res = []
# loop through the indices and add the desired slices
for ix, i in enumerate(nze_diff):
    try:
        sl = nze[i:nze_diff[ix + 1]]
        res.append([sl[0], sl[-1]])
    # means we reached the end of nze_diff
    except IndexError:
        sl = nze[i:]
        res.append([sl[0], sl[-1]])
If you run it for your a, you receive the desired output:
[[3, 5], [9, 12], [16, 18]]
There are probably smarter solutions than this, but this might get you started.
If you want to get the entire range, it simplifies a bit:
res2 = []
for ix, i in enumerate(nze_diff):
    try:
        res2.append(nze[i:nze_diff[ix + 1]])
    except IndexError:
        res2.append(nze[i:])
Then res2 would be:
[array([3, 4, 5]), array([ 9, 10, 11, 12]), array([16, 17, 18])]
This works
lst = [0, 0, 0, 12, 34, 86, 0, 0, 0, 95, 20, 1, 6, 0, 0, 0, 11, 24, 67, 0, 0, 0]
n = len(lst)
starting_points = [i for i in range(n) if lst[i] > 0 and (lst[i - 1] == 0 or i == 0)]
end_points = [next((i for i in range(j + 1, n) if lst[i] == 0), n) - 1 for j in starting_points]
print(list(zip(starting_points, end_points)))
output
[(3, 5), (9, 12), (16, 18)]
If performance is key, you should test which implementation is fastest on your very long list. Anyway, this is a version without array access by index, which will hopefully be faster. It uses map, lambda and index (find), if that pleases you, though of course it also uses while.
input = [0,0,0,12,34,86,0,0,0,95,20,1,6,0,0,0,11,24,67,0,0,0]
output = []
input2 = list(map(lambda x: x and (1, -1)[x < 0], input)) # mapping by 'math.sign'-like func
start = end = 0
while end < len(input2):
    try:
        start = input2.index(1, end + 1)
        end = input2.index(0, start) - 1
        output.append([start, end])
    except ValueError:
        break
if start >= end:
    output.append([start, len(input2) - 1])
print(output) # [[3, 5], [9, 12], [16, 18]]
I want to replace missing values (None) with the last known value. This is my code, but it doesn't work. Any suggestions for a better algorithm?
t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
def treat_missing_values(table):
    for line in table:
        for value in line:
            if value == None:
                value = line[line.index(value)-1]
    return table
print(treat_missing_values(t))
This is probably how I'd do it:
>>> def treat_missing_values(table):
...     for line in table:
...         prev = None
...         for i, value in enumerate(line):
...             if value is None:
...                 line[i] = prev
...             else:
...                 prev = value
...     return table
...
>>> treat_missing_values([[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]])
[[1, 3, 3, 5, 5], [2, 2, 2, 3, 1], [4, 4, 2, 1, 1]]
>>> treat_missing_values([[None, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]])
[[None, 3, 3, 5, 5], [2, 2, 2, 3, 1], [4, 4, 2, 1, 1]]
When you do an assignment in python, you are merely creating a reference on an object in memory. You can't use value to set the object in the list because you're effectively making value reference another object in memory.
To do what you want, you need to set directly in the list at the right index.
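A minimal sketch of that difference, using a made-up one-row example: assigning to the loop variable only rebinds the local name, while assigning through the index changes the list itself.

```python
line = [1, None, 3]

# Rebinding the loop variable: the list is not affected
for value in line:
    if value is None:
        value = 0
after_rebinding = list(line)

# Assigning by index: the list element itself is replaced
for i, value in enumerate(line):
    if value is None:
        line[i] = 0
after_index_assignment = list(line)

print(after_rebinding)
print(after_index_assignment)
```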
As stated, your algorithm won't work if one of the inner lists has None as the first value.
So you can do it like this:
t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
def treat_missing_values(table, default_value):
    last_value = default_value
    for line in table:
        for index in range(len(line)):
            if line[index] is None:
                line[index] = last_value
            else:
                last_value = line[index]
    return table
print(treat_missing_values(t, 0))
Looking up the index from the value won't work if the list starts with None or if there's a duplicate value. Try this:
def treat(v):
    p = None
    r = []
    for n in v:
        # Keep the previous value when n is missing, otherwise remember n
        p = p if n is None else n
        r.append(p)
    return r
def treat_missing_values(table):
    return [ treat(v) for v in table ]
t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
print(treat_missing_values(t))
This better not be your homework, dude.
EDIT A functional version for all you FP fans out there:
def treat(l):
    def e(first, remainder):
        return [ first ] + ([] if len(remainder) == 0 else e(first if remainder[0] == None else remainder[0], remainder[1:]))
    return l if len(l) == 0 else e(l[0], l[1:])
That's because the index method returns the first occurrence of the argument you pass to it. In the first line, for example, line.index(None) will always return 2, because that's the first occurrence of None in that list.
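A quick sketch of this with the first row of the question's table (the two result names are just for illustration): looking positions up with index gives the same answer for every None, whereas enumerate reports where each one actually is.

```python
line = [1, 3, None, 5, None]

# index() always finds the first None, so both lookups return the same position
positions_via_index = [line.index(v) for v in line if v is None]

# enumerate() carries the real position of each element
positions_via_enumerate = [i for i, v in enumerate(line) if v is None]

print(positions_via_index)
print(positions_via_enumerate)
```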
Try this instead:
def treat_missing_values(table):
    for line in table:
        for i in range(len(line)):
            if line[i] == None:
                if i != 0:
                    line[i] = line[i - 1]
                else:
                    #This line deals with your other problem: What if your FIRST value is None?
                    line[i] = 0 #Some default value here
    return table
I'd use a global variable to keep track of the most recent valid value. And I'd use map() for the iteration.
t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
prev = 0
def vIfNone(x):
    global prev
    if x:
        prev = x
    else:
        x = prev
    return x
print(list(map(lambda line: list(map(vIfNone, line)), t)))
EDIT: Malvolio, here. Sorry to be writing in your answer, but there were too many mistakes to correct in a comment.
if x: will fail for all falsy values (notably 0 and the empty string).
Mutable global values are bad. They aren't thread-safe and produce other peculiar behaviors (in this case, if a list starts with None, it is set to the last value that happened to be processed by your code).
The re-writing of x is unnecessary; prev always has the right value.
In general, things like this should be wrapped in functions, for naming and for scoping.
So:
def treat(n):
    prev = [ None ]
    def vIfNone(x):
        if x is not None:
            prev[0] = x
        return prev[0]
    return map( vIfNone, n )
(Note the weird use of prev as a closed variable. It will be local to each invocation of treat, and global across all invocations of vIfNone from the same treat invocation, exactly what you need. For dark and probably disturbing Python reasons I don't understand, it has to be an array.)
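The one-element list is needed because, in Python 2, an inner function can mutate an object from the enclosing scope but cannot rebind an enclosing name. Assuming Python 3 is available, the nonlocal statement expresses the same idea directly. The following is only a sketch of that variant; the helper name fill is illustrative and not from the answer above.

```python
def treat(values):
    prev = None                  # last non-None value seen so far in this call

    def fill(x):
        nonlocal prev            # rebind the variable of the enclosing treat() call
        if x is not None:
            prev = x
        return prev

    return [fill(x) for x in values]

print(treat([1, 3, None, 5, None]))
print(treat([None, 0, None]))
```

Because prev is created afresh on every call to treat, a row that starts with None keeps None instead of inheriting a value from a previous row.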
EDIT1
# your algorithm won't work if the line starts with None
t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
def treat_missing_values(table):
    for line in table:
        for index in range(len(line)):
            if line[index] == None:
                line[index] = line[index-1]
    return table
print(treat_missing_values(t))