Python split list if sequence of numbers is found


I've been trying to find a relevant question, though I can't seem to search for the right words and all I'm finding is how to check if a list contains an intersection.
Basically, I need to split a list once a certain sequence of numbers is found, similar to doing str.split(sequence)[0], but with lists instead. I have working code, though it doesn't seem very efficient (also no idea if raising an error was the right way to go about it), and I'm sure there must be a better way to do it.
For the record, long_list could potentially have a length of a few million values, which is why I think iterating through them all might not be the best idea.
long_list = [2,6,4,2,7,98,32,5,15,4,2,6,43,23,95,10,31,5,1,73]
end_marker = [6,43,23,95]
end_marker_len = len(end_marker)
class SuccessfulTruncate(Exception):
    pass
try:
    counter = 0
    for i in range(len(long_list)):
        if long_list[i] == end_marker[counter]:
            counter += 1
        else:
            counter = 0
        if counter == end_marker_len:
            raise SuccessfulTruncate()
except SuccessfulTruncate:
    # i is the index of the marker's last element, so the marker starts at i + 1 - end_marker_len
    long_list = long_list[:i + 1 - end_marker_len]
else:
    raise IndexError('sequence not found')
>>> long_list
[2,6,4,2,7,98,32,5,15,4,2]
Ok, timing a few answers with a big list of 1 million values (the marker is very near the end):
Tim: 3.55 seconds
Mine: 2.7 seconds
Dan: 0.55 seconds
Andrey: 0.28 seconds
Kasramvd: still executing :P

I have working code, though it doesn't seem very efficient (also no idea if raising an error was the right way to go about it), and I'm sure there must be a better way to do it.
I commented on the exception raising under the question:
Instead of raising an exception and catching it in the same try/except, you can omit the try/except, truncate long_list directly inside the loop once counter == end_marker_len, and then break. "Successful" is not a fitting word for an exception name, because exceptions are meant to indicate that something failed.
Anyway, here is a shorter way:
>>> long_list = [2,6,4,2,7,98,32,5,15,4,2,6,43,23,95,10,31,5,1,73]
>>> end_marker = [6,43,23,95]
>>> index = [i for i in range(len(long_list)) if long_list[i:i+len(end_marker)] == end_marker][0]
>>> long_list[:index]
[2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2]
List comprehension inspired by this post

As a more Pythonic way, instead of multiple slicing you can use itertools.islice within a generator expression:
>>> from itertools import islice
>>> M,N=len(long_list),len(end_marker)
>>> long_list[:next((i for i in range(0,M) if list(islice(long_list,i,i+N))==end_marker),M)]
[2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2]
Note that because the default value passed to next is M, the expression returns the whole of long_list if no match is found.

My solution uses the list index method:
input = [2,6,4,2,7,98,32,5,15,4,2,6,43,23,95,10,31,5,1,73]
brk = [6,43,23,95]
brk_len = len(brk)
brk_idx = 0
brk_offset = brk_idx + brk_len
try:
    while input[brk_idx:brk_offset] != brk:
        brk_idx = input.index(brk[0], brk_idx + 1)
        brk_offset = brk_idx + brk_len
except ValueError:
    print("Not found")
else:
    print(input[:brk_idx])
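The same index-based idea can be wrapped in a reusable function. This is only a sketch: the name truncate_before and its choice to raise ValueError when the marker is missing are assumptions for illustration, not part of the answer above.

```python
def truncate_before(values, marker):
    """Return the part of `values` that precedes the first occurrence of `marker`.

    Raises ValueError if `marker` never occurs (hypothetical design choice).
    """
    if not marker:
        # An empty marker matches at position 0, so nothing precedes it.
        return []
    width = len(marker)
    start = 0
    while True:
        # list.index scans in C and raises ValueError once no candidate is left.
        start = values.index(marker[0], start)
        if values[start:start + width] == marker:
            return values[:start]
        start += 1


long_list = [2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2, 6, 43, 23, 95, 10, 31, 5, 1, 73]
end_marker = [6, 43, 23, 95]
print(truncate_before(long_list, end_marker))
# [2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2]
```

Only positions where the first marker value occurs are compared in full, so most of the scanning is left to list.index.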

If the values are of limited range, say fit in bytes (this can also be adapted to larger types), why not then encode the lists so that the string method find could be used:
long_list = [2,6,4,2,7,98,32,5,15,4,2,6,43,23,95,10,31,5,1,73]
end_marker = [6,43,23,95]
import struct
long_list_p = struct.pack('B'*len(long_list), *long_list)
end_marker_p = struct.pack('B'*len(end_marker), *end_marker)
print long_list[:long_list_p.find(end_marker_p)]
Prints:
[2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2]
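I tried using bytes directly, as below, but find did not work that way (in Python 2, bytes is an alias of str, so bytes(long_list) is just the list's printed representation):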
print long_list[:bytes(long_list).find(bytes(end_marker))]
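On Python 3 the bytes constructor does accept a list of integers, so the direct variant should work there. The sketch below assumes every value lies in range(256), and it handles the case where find returns -1, which the slicing above would otherwise turn into "drop the last element".

```python
long_list = [2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2, 6, 43, 23, 95, 10, 31, 5, 1, 73]
end_marker = [6, 43, 23, 95]

# bytes(list_of_ints) raises ValueError if any value is outside range(256).
position = bytes(long_list).find(bytes(end_marker))

# find returns -1 when the marker is absent; long_list[:-1] would silently
# drop the last element, so that case is handled explicitly here.
truncated = long_list[:position] if position != -1 else long_list
print(truncated)
# [2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2]
```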

Related

How to append to second value of dictionary value after underscore in +1 manner

Imagine I have a dictionary as such:
barcodedict={"12_20":[10,15,20], "12_21":[5, "5_1","5_2",6]}
Then I have a number that corresponds to a date, let's say 12_21, and I append it to the values of this date if it is not already there:
if 8 not in barcodedict["12_21"]:
    barcodedict["12_21"].append(8)
{'12_20': [10, 15, 20], '12_21': [5, "5_1", "5_2", 6, 8]}
However, if this number is already present in the value list, I want to add it to the value list with an extra integer that states that it's a new occurrence:
if 5 not in barcodedict["12_21"]:
    barcodedict["12_21"].append(5)
else: #which is now the case
    barcodedict["12_21"].append(5_(2+1))
Desired output:
{"12_20":[10,15,20], "12_21":[5, "5_1","5_2","5_3",6, 8]}
As can be seen from the second example, I am not allowed to put underscore in list numbers and they are removed (5_1 becomes 51). And how can I achieve adding a new listing with +1 to the last number? I tried iterating over them and then splitting them but this seems unpythonic and didn't work because the underscore is ignored.
Edit 7/19/2022 10:46AM,
I found a bit of a hackish way around but it seems to hold for now:
placeholder=[]
for i in barcodedict["12_21"]:
    if "5" in str(i):
        try:
            placeholder.append(str(i).split("_")[1])
        except IndexError:
            print("this is for the first 5 occurence, that has no _notation")
print(placeholder)
if len(placeholder) == 0 :
    placeholder=[0]
occurence=max(list(map(int, placeholder)))+1
# occurence is an int, so it has to be converted before concatenating
barcodedict["12_21"].append("5_"+str(occurence))
prints {'12_20': [10, 15, 20], '12_21': [5, '5_1', '5_2', 6, '5_3']}

With the requested number/string mixture it can be done with:
if 5 not in barcodedict["12_21"]:
    barcodedict["12_21"].append(5)
else: #which is now the case
    i = 1
    while True:
        if f"5_{i}" not in barcodedict["12_21"]:
            barcodedict["12_21"].append(f"5_{i}")
            break
        i += 1

Underscores used like that do not show up when printed, because in numeric literals they are only a convenience for writing big numbers; once the literal is interpreted, the underscores are gone. You should use string manipulation if the way the values are displayed matters, or plain numbers if you actually want to use them as numbers and only want a convenient way to write them.
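A short illustration of that point, assuming Python 3.6 or later (where underscores in numeric literals are allowed). The variable name label is only for illustration.

```python
print(5_1)          # 51: the underscore is only a visual separator
print(5_1 == 51)    # True
print(1_000_000)    # 1000000

label = f"{5}_{3}"  # a string keeps the underscore
print(label)        # 5_3
```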

Another solution:
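def fancy_append(dct, key, val):
    # Highest existing "<val>_<n>" suffix, or 0 if there is none yet.
    last_num = max(
        (
            int(s[1])
            for v in dct[key]
            if isinstance(v, str) and (s := v.split("_"))[0] == str(val)
        ),
        default=0,
    )
    # The first occurrence is stored as the bare value; repeats get the next suffix.
    dct[key].append(f"{val}_{last_num+1}" if val in dct[key] else val)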
barcodedict = {"12_20": [10, 15, 20], "12_21": [5, "5_1", "5_2", 6]}
fancy_append(barcodedict, "12_21", 5)
print(barcodedict)
Prints:
{'12_20': [10, 15, 20], '12_21': [5, '5_1', '5_2', 6, '5_3']}

Count all sequences in a list

My self-learning task is to find how many sequences are in a list. A sequence is a group of numbers where each one is 1 bigger than the previous one. So, in the list:
[1,2,3,5,8,10,12,13,14,15,17,19,21,23,24,25,26]
there are 3 sequences:
1,2,3
12,13,14,15
23,24,25,26
I've spent a few hours and got a solution, which I think is a workaround rather than a real solution.
My solution is to keep a separate list for the sequences and count the attempts to update this list. I count the very first append, and every new append except those that extend a sequence that already exists.
I believe there is a solution without an additional list, one that counts the sequences themselves rather than the list manipulation attempts.
numbers = [1,2,3,5,8,10,12,13,14,15,17,19,21,23,24,25,26]
goods = []
count = 0
for i in range(len(numbers)-1):
    if numbers[i] + 1 == numbers[i+1]:
        if goods == []:
            goods.append(numbers[i])
            count = count + 1
        elif numbers[i] != goods[-1]:
            goods.append(numbers[i])
            count = count + 1
        if numbers[i+1] != goods[-1]:
            goods.append(numbers[i+1])
The output from my debugging:
Number 1 added to: [1]
First count change: 1
Number 12 added to: [1, 2, 3, 12]
Normal count change: 2
Number 23 added to: [1, 2, 3, 12, 13, 14, 15, 23]
Normal count change: 3

Thanks everyone for your help!
Legman suggested the original solution that I had failed to implement before I ended up with the other solution in this post.
MSeifert helped me find the right way to use the lists:
numbers = [1,2,3,5,8,10,12,13,14,15,17,19,21,23,24,25,26]
print("Numbers:", numbers)
goods = []
count = 0
for i in range(len(numbers)-1):
    if numbers[i] + 1 == numbers[i+1]:
        if goods == []:
            goods.append([numbers[i]])
            count = count + 1
        elif numbers[i] != goods[-1][-1]:
            goods.append([numbers[i]])
            count = count + 1
        if numbers[i+1] != goods[-1][-1]:
            goods[-1].extend([numbers[i+1]])
print("Sequences:", goods)
print("Number of sequences:", len(goods))

One way would be to iterate over pairwise elements:
l = [1,2,3,5,8,10,12,13,14,15,17,19,21,23,24,25,26]
res = [[]]
for item1, item2 in zip(l, l[1:]): # pairwise iteration
    if item2 - item1 == 1:
        # The difference is 1, if we're at the beginning of a sequence add both
        # to the result, otherwise just the second one (the first one is already
        # included because of the previous iteration).
        if not res[-1]: # index -1 means "last element".
            res[-1].extend((item1, item2))
        else:
            res[-1].append(item2)
    elif res[-1]:
        # The difference isn't 1 so add a new empty list in case it just ended a sequence.
        res.append([])
# In case "l" doesn't end with a "sequence" one needs to remove the trailing empty list.
if not res[-1]:
    del res[-1]
>>> res
[[1, 2, 3], [12, 13, 14, 15], [23, 24, 25, 26]]
>>> len(res) # the amount of these sequences
3
A solution without zip only requires small changes (the loop header and the beginning of the loop body) compared to the approach above:
l = [1,2,3,5,8,10,12,13,14,15,17,19,21,23,24,25,26]
res = [[]]
for idx in range(1, len(l)):
    item1 = l[idx-1]
    item2 = l[idx]
    if item2 - item1 == 1:
        if not res[-1]:
            res[-1].extend((item1, item2))
        else:
            res[-1].append(item2)
    elif res[-1]:
        res.append([])
if not res[-1]:
    del res[-1]

Taken from python itertools documentation, as demonstrated here you can use itemgetter and groupby to do that using only one list, like so:
>>> from itertools import groupby
>>> from operator import itemgetter
>>>
>>> l = [1, 2, 3, 5, 8, 10, 12, 13, 14, 15, 17, 19, 21, 23, 24, 25, 26]
>>>
>>> counter = 0
>>> for k, g in groupby(enumerate(l), lambda (i,x):i-x):
...     seq = map(itemgetter(1), g)
...     if len(seq)>1:
...         print seq
...         counter+=1
...
[1, 2, 3]
[12, 13, 14, 15]
[23, 24, 25, 26]
>>> counter
3
Notice: As correctly mentioned by @MSeifert, tuple unpacking in the signature is only possible in Python 2 and it will fail on Python 3, so this is a Python 2.x solution.
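For Python 3, the same grouping idea might be written as follows. This is a sketch rather than the answerer's code: the lambda takes the (index, value) pair as a single argument, and the name runs is an assumption for illustration.

```python
from itertools import groupby

l = [1, 2, 3, 5, 8, 10, 12, 13, 14, 15, 17, 19, 21, 23, 24, 25, 26]

runs = []
# Within a run of consecutive numbers, value - index stays constant,
# so it can serve as the grouping key.
for _, group in groupby(enumerate(l), key=lambda pair: pair[1] - pair[0]):
    seq = [value for _, value in group]
    if len(seq) > 1:
        runs.append(seq)

print(runs)       # [[1, 2, 3], [12, 13, 14, 15], [23, 24, 25, 26]]
print(len(runs))  # 3
```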

This could be solved with dynamic programming. If you only want to know the number of sequences and don't actually need to know what the sequences are, you should be able to do this with only a couple of variables. As you go through the list, you only need to know whether you are currently in a sequence. If you are not, check whether the next value is larger by exactly 1, which makes this the beginning of a sequence; if you are, check whether the next value differs by something other than 1, which marks the exit of the sequence. Finally, end the loop one cell before the end of the list, since the last cell can't form a sequence by itself and the check on numbers[i+1] would otherwise raise an error. Below is example code:
count = 0
isSeq = False
for i in range(len(numbers)-1):
    if not isSeq:
        if numbers[i]+1 == numbers[i+1]:
            # a new sequence starts here
            isSeq = True
            count = count+1
    elif numbers[i]+1 != numbers[i+1]:
        # the current sequence ends here
        isSeq = False
Here is a link to a dynamic programming tutorial.
https://www.codechef.com/wiki/tutorial-dynamic-programming

Cumulative product of a list

I have implemented a list of all prime numbers from a set amount.
What I'm trying to do is hard to explain so I'll just show it with some hard code:
euclst = []
euclst.append((primelst[0]) + 1)
euclst.append((primelst[0] * primelst[1]) + 1)
euclst.append((primelst[0] * primelst[1] * primelst[2]) + 1)
....
So essentially I'm trying to take a single element in order from my prev list and multiplying it exponentially I guess and appending it to my other list.
I realized that I could just do this, which is probably easier:
euclst = []
euclst.append(primelst[0])
euclst.append(primelst[0] * primelst[1])
euclst.append(primelst[0] * primelst[1] * primelst[2])
....
#then add one to each element in the list later
I need some ideas to do this in a loop of some sort.

You want a list of the cumulative product. Here's a simple recipe:
>>> primelist = [2, 3, 5, 7, 11, 13, 17, 19, 23]
>>> euclist = []
>>> current = 1
>>> for p in primelist:
...     current *= p
...     euclist.append(current)
...
>>> euclist
[2, 6, 30, 210, 2310, 30030, 510510, 9699690, 223092870]
>>>
Another way, using itertools:
>>> import itertools
>>> import operator
>>> list(itertools.accumulate(primelist, operator.mul))
[2, 6, 30, 210, 2310, 30030, 510510, 9699690, 223092870]
>>>
OR, perhaps this is what you mean:
>>> [x + 1 for x in itertools.accumulate(primelist, operator.mul)]
[3, 7, 31, 211, 2311, 30031, 510511, 9699691, 223092871]
With the equivalent for-loop:
>>> euclist = []
>>> current = 1
>>> for p in primelist:
...     current = current*p
...     euclist.append(current + 1)
...
>>> euclist
[3, 7, 31, 211, 2311, 30031, 510511, 9699691, 223092871]
>>>

You could do it like this:
euclst = [primelst[0]]
for p in primelst[1:]:
    euclst.append(euclst[-1]*p)
initialize your list with the first element
loop and append the current element multiplied by the previously appended element
(since the current result depends on previous results, it's not easily doable in a list comprehension)
To solve the more complex one with a +1 on it:
euclst = [primelst[0]+1]
for p in primelst[1:]:
    euclst.append((euclst[-1]-1)*p+1)
(the previous result is the product plus one, so to reuse it, just subtract one)
EDIT: other answers make me realize that I'm overcomplicating things. A temp variable to store the cumulative product would probably be cleaner.

It might be clearest to use an intermediate variable to keep track of the product, and then add the 1 as you put it in the list.
euclst = []
running_prod = 1
for p in primelst:
    running_prod *= p
    euclst.append(running_prod + 1)

You can use something like this:
euclst = [primelst[0]]
for i in range(1, len(primelst)):
    euclst.append(euclst[-1]*primelst[i])

Searching for key string within target string in Python recursively

The following is for Python 3.2.3.
I would like to write a function that takes two arguments, a key string and a target string. This function is to recursively determine (it must be recursive) the positions of the key string in the target string.
Currently, my code is as follows.
def posSubStringMatchRecursive(target,key):
    import string
    index=str.rfind(target, key)
    if index !=-1:
        print (index)
        target=target[:(index+len(key)-1)]
        posSubStringMatchRecursive(target,key)
The issue with this is that there is no way to store all the locations of the key string in the target string in a list as the numbers indicating the location will just be printed out.
So, my question is, is there any way to change the code such that the positions of the key string in the target string can be stored in a list?
Example Output
posSubStringMatchRecursive ('aatcgdaaaggraaa', 'aa')
13
12
7
6
0
Edit
The following code seems to work without the issue in Ashwini's code. Thanks, Lev.
def posSubStringMatchRecursive(target,key):
    import string
    index=str.rfind(target, key)
    if index ==-1:
        return []
    else:
        target=target[:(index+len(key)-1)]
        return ([index] + posSubStringMatchRecursive(target,key))

def posSubStringMatchRecursive(target,key,res):
    import string
    index=str.rfind(target, key)
    if index !=-1:
        target=target[:(index+len(key)-1)]
        res.append(index) #append the index to the list res,
        return posSubStringMatchRecursive(target,key,res) #Use return here when calling recursively else your program will return None, and also pass res to the function
    else:
        return res
print(posSubStringMatchRecursive('aatcgdaaaggraaa', 'aa',[]))#pass an empty list to the function
print(posSubStringMatchRecursive('aatcgdaaaggraaa', 'a',[]))
output:
[13, 12, 7, 6, 0]
[14, 13, 12, 8, 7, 6, 1, 0]

Since it suspiciously resembles a homework question, here's an example of a recursive function that returns a list:
In [1]: def range_rec(limit):
   ...:     if limit == 0:
   ...:         return []
   ...:     else:
   ...:         return ([limit-1] + range_rec(limit-1)[::-1])[::-1]
   ...:
In [2]: range_rec(10)
Out[2]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
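Applying the same "return a list from each call" pattern to the original problem, one possible sketch recurses forward with str.find and a start offset, so the positions come out in ascending order. The name find_positions and the start parameter are assumptions for illustration, not taken from the posts above.

```python
def find_positions(target, key, start=0):
    """Recursively collect every index at which `key` occurs in `target`."""
    index = target.find(key, start)
    if index == -1:
        return []
    # Resume one character later so that overlapping matches are found too.
    return [index] + find_positions(target, key, index + 1)


print(find_positions('aatcgdaaaggraaa', 'aa'))
# [0, 6, 7, 12, 13]
```

Each match adds one level of recursion, so a target with a very large number of matches could hit Python's recursion limit.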

pythonic format for indices

I am after a string format to efficiently represent a set of indices.
For example "1-3,6,8-10,16" would produce [1,2,3,6,8,9,10,16]
Ideally I would also be able to represent infinite sequences.
Is there an existing standard way of doing this? Or a good library? Or can you propose your own format?
thanks!
Edit: Wow! - thanks for all the well considered responses. I agree I should use ':' instead. Any ideas about infinite lists? I was thinking of using "1.." to represent all positive numbers.
The use case is for a shopping cart. For some products I need to restrict product sales to multiples of X, for others any positive number. So I am after a string format to represent this in the database.

You don't need a string for that, This is as simple as it can get:
from types import SliceType
class sequence(object):
    def __getitem__(self, item):
        for a in item:
            if isinstance(a, SliceType):
                i = a.start
                step = a.step if a.step else 1
                while True:
                    if a.stop and i > a.stop:
                        break
                    yield i
                    i += step
            else:
                yield a
print list(sequence()[1:3,6,8:10,16])
Output:
[1, 2, 3, 6, 8, 9, 10, 16]
I'm using Python slice type power to express the sequence ranges. I'm also using generators to be memory efficient.
Please note that I treat the slice stop as inclusive (the loop only breaks once i > a.stop); otherwise the ranges would be different, because the stop in ordinary slices is not included.
It supports steps:
>>> list(sequence()[1:3,6,8:20:2])
[1, 2, 3, 6, 8, 10, 12, 14, 16, 18, 20]
And infinite sequences:
sequence()[1:3,6,8:]
1, 2, 3, 6, 8, 9, 10, ...
If you have to give it a string, then you can combine @ilya n.'s parser with this solution. I'll extend @ilya n.'s parser to support indexes as well as ranges:
def parser(input):
    ranges = [a.split('-') for a in input.split(',')]
    return [slice(*map(int, a)) if len(a) > 1 else int(a[0]) for a in ranges]
Now you can use it like this:
>>> print list(sequence()[parser('1-3,6,8-10,16')])
[1, 2, 3, 6, 8, 9, 10, 16]

If you're into something Pythonic, I think 1:3,6,8:10,16 would be a better choice, as x:y is a standard notation for index range and the syntax allows you to use this notation on objects. Note that the call
z[1:3,6,8:10,16]
gets translated into
z.__getitem__((slice(1, 3, None), 6, slice(8, 10, None), 16))
Even though this is a TypeError if z is a built-in container, you're free to create the class that will return something reasonable, e.g. as NumPy's arrays.
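A tiny class makes that translation visible. ShowKey is a hypothetical helper for illustration only; it simply returns whatever key the indexing expression passes to __getitem__.

```python
class ShowKey:
    """Hypothetical helper: returns the key it receives, unchanged."""

    def __getitem__(self, key):
        return key


z = ShowKey()
key = z[1:3, 6, 8:10, 16]
print(key)
# (slice(1, 3, None), 6, slice(8, 10, None), 16)
```

Note that a single index such as z[5] arrives as the bare value 5 rather than as a one-element tuple, so a real implementation has to handle both shapes.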
You might also say that by convention 5: and :5 represent infinite index ranges (this is a bit stretched as Python has no built-in types with negative or infinitely large positive indexes).
And here's the parser (a beautiful one-liner that suffers from slice(16, None, None) glitch described below):
def parse(s):
    return [slice(*map(int, x.split(':'))) for x in s.split(',')]
There's one pitfall, however: 8:10 by definition includes only indices 8 and 9 -- without upper bound. If that's unacceptable for your purposes, you certainly need a different format and 1-3,6,8-10,16 looks good to me. The parser then would be
def myslice(start, stop=None, step=None):
    return slice(start, (stop if stop is not None else start) + 1, step)
def parse(s):
    return [myslice(*map(int, x.split('-'))) for x in s.split(',')]
Update: here's the full parser for a combined format:
from sys import maxsize as INF
def indices(s: 'string with indices list') -> 'indices generator':
    for x in s.split(','):
        splitter = ':' if (':' in x) or (x[0] == '-') else '-'
        ix = x.split(splitter)
        # compare with != rather than "is not": identity checks against literals are unreliable
        start = int(ix[0]) if ix[0] != '' else -INF
        if len(ix) == 1:
            stop = start + 1
        else:
            stop = int(ix[1]) if ix[1] != '' else INF
        step = int(ix[2]) if len(ix) > 2 else 1
        for y in range(start, stop + (splitter == '-'), step):
            yield y
This handles negative numbers as well, so
print(list(indices('-5, 1:3, 6, 8:15:2, 20-25, 18')))
prints
[-5, 1, 2, 6, 7, 8, 10, 12, 14, 20, 21, 22, 23, 24, 25, 18, 19]
Yet another alternative is to use ... (which Python recognizes as the built-in constant Ellipsis so you can call z[...] if you want) but I think 1,...,3,6, 8,...,10,16 is less readable.

This is probably about as lazily as it can be done, meaning it will be okay for even very large lists:
def makerange(s):
    for nums in s.split(","): # whole list comma-delimited
        range_ = nums.split("-") # number might have a dash - if not, no big deal
        start = int(range_[0])
        for i in xrange(start, start + 1 if len(range_) == 1 else int(range_[1]) + 1):
            yield i
s = "1-3,6,8-10,16"
print list(makerange(s))
output:
[1, 2, 3, 6, 8, 9, 10, 16]
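A Python 3 variant of this lazy approach could also cover the infinite sequences asked about in the question. The sketch below makes two assumptions that are not in the original posts: a trailing dash such as "20-" means "20 and everything above it", and all numbers are non-negative. The name iter_indices is hypothetical.

```python
from itertools import count, islice


def iter_indices(spec):
    """Lazily yield the integers described by a spec such as "1-3,6,8-10,16".

    Assumed extension: "20-" (trailing dash) means 20, 21, 22, ... without end.
    Negative numbers are not supported by this sketch.
    """
    for part in spec.split(","):
        start, dash, stop = part.strip().partition("-")
        if not dash:
            yield int(start)                              # single index
        elif stop:
            yield from range(int(start), int(stop) + 1)   # inclusive range
        else:
            yield from count(int(start))                  # open-ended range


print(list(iter_indices("1-3,6,8-10,16")))
# [1, 2, 3, 6, 8, 9, 10, 16]
print(list(islice(iter_indices("1-3,20-"), 6)))
# [1, 2, 3, 20, 21, 22]
```

Because an open-ended part never finishes, it only makes sense as the last part of a spec, and callers need something like islice to take a finite prefix.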

import sys
class Sequencer(object):
    def __getitem__(self, items):
        if not isinstance(items, (tuple, list)):
            items = [items]
        for item in items:
            if isinstance(item, slice):
                for i in xrange(*item.indices(sys.maxint)):
                    yield i
            else:
                yield item
>>> s = Sequencer()
>>> print list(s[1:3,6,8:10,16])
[1, 2, 6, 8, 9, 16]
Note that I am using the xrange builtin to generate the sequence. That seems awkward at first because it doesn't include the upper number of sequences by default, however it proves to be very convenient. You can do things like:
>>> print list(s[1:10:3,5,5,16,13:5:-1])
[1, 4, 7, 5, 5, 16, 13, 12, 11, 10, 9, 8, 7, 6]
Which means you can use the step part of xrange.

This looked like a fun puzzle to go with my coffee this morning. If you settle on your given syntax (which looks okay to me, with some notes at the end), here is a pyparsing converter that will take your input string and return a list of integers:
from pyparsing import *
integer = Word(nums).setParseAction(lambda t : int(t[0]))
intrange = integer("start") + '-' + integer("end")
def validateRange(tokens):
    # the results names defined above are "start" and "end"
    if tokens.start > tokens.end:
        raise Exception("invalid range, start must be <= end")
intrange.setParseAction(validateRange)
intrange.addParseAction(lambda t: list(range(t.start, t.end+1)))
indices = delimitedList(intrange | integer)
def mergeRanges(tokens):
    ret = set()
    for item in tokens:
        if isinstance(item,int):
            ret.add(item)
        else:
            # sets do not support +=, so merge with |=
            ret |= set(item)
    return sorted(ret)
indices.setParseAction(mergeRanges)
test = "1-3,6,8-10,16"
print(indices.parseString(test))
This also takes care of any overlapping or duplicate entries, such as "3-8,4,6,3,4", and returns a list of just the unique integers.
The parser takes care of validating that ranges like "10-3" are not allowed. If you really wanted to allow this, and have something like "1,5-3,7" return 1,5,4,3,7, then you could tweak the intrange and mergeRanges parse actions to get this simpler result (and discard the validateRange parse action altogether).
You are very likely to get whitespace in your expressions, I assume that this is not significant. "1, 2, 3-6" would be handled the same as "1,2,3-6". Pyparsing does this by default, so you don't see any special whitespace handling in the code above (but it's there...)
This parser does not handle negative indices, but if that were needed too, just change the definition of integer to:
integer = Combine(Optional('-') + Word(nums)).setParseAction(lambda t : int(t[0]))
Your example didn't list any negatives, so I left it out for now.
Python uses ':' for a ranging delimiter, so your original string could have looked like "1:3,6,8:10,16", and Pascal used '..' for array ranges, giving "1..3,6,8..10,16" - meh, dashes are just as good as far as I'm concerned.
