How to split a list at a certain value - python

Given a unique-valued list such as [5,4,9,2,1,7,'dog',9], is there a way to split it at a certain value? i.e.
[5,4,9,2,7,'dog'].split(4)
= [5,4],[9,2,7,'dog']
[5,4,9,2,7,'dog'].split(2)
= [5,4,9,2], [7,'dog']
?

>>> mylist = [5,4,9,2,7,'dog']
>>> def list_split(l, element):
...     if l[-1] == element:
...         return l, []
...     delimiter = l.index(element)+1
...     return l[:delimiter], l[delimiter:]
...
>>> list_split(mylist, 4)
([5, 4], [9, 2, 7, 'dog'])
>>> list_split(mylist, 2)
([5, 4, 9, 2], [7, 'dog'])
>>> list_split(mylist, 'dog')
([5, 4, 9, 2, 7, 'dog'], [])

Here is a function that will split a list into an arbitrary number of sub-lists and return a list of them, much like a string split.
def split_list(input_list, separator):
    outer = []
    inner = []
    for elem in input_list:
        if elem == separator:
            # close the current sub-list, skipping empty ones
            if inner:
                outer.append(inner)
            inner = []
        else:
            inner.append(elem)
    if inner:
        outer.append(inner)
    return outer
>>> split_list([1,2,3,4,1,2,3,4,4,3,4,3],3)
[[1, 2], [4, 1, 2], [4, 4], [4]]
If the separator element is to be retained on the sub-list to the left, as specifically posed in the question, then this variation will do it:
def split_list(input_list, separator):
    outer = []
    inner = []
    for elem in input_list:
        if elem == separator:
            # keep the separator at the end of the left sub-list
            inner.append(elem)
            outer.append(inner)
            inner = []
        else:
            inner.append(elem)
    if inner:
        outer.append(inner)
    return outer
>>> split_list([1,2,3,4,1,2,3,4,4,3,4,3],3)
[[1, 2, 3], [4, 1, 2, 3], [4, 4, 3], [4, 3]]

def listsplit(l, e):
    try:
        i = l.index(e) + 1
        return l[:i], l[i:]
    except ValueError:
        return l
If the element is at the last index in the list, the second returned list will be empty, but I'm not entirely sure if this is a problem. In the case of ValueError from the list.index call, you'll have only the list itself returned, but if another behavior is desired, it's pretty simple to change.
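As one possible change, here is a minimal sketch that always returns a pair, so callers can unpack the result without checking its type. The name listsplit_pair and the choice to put the whole list on the left when the element is missing are assumptions for illustration, not part of the answer above.

```python
def listsplit_pair(l, e):
    """Split l after the first occurrence of e; always return two lists."""
    try:
        i = l.index(e) + 1
    except ValueError:
        # Assumption: a missing element means there is nothing to split off.
        return l[:], []
    return l[:i], l[i:]


left, right = listsplit_pair([5, 4, 9, 2, 7, 'dog'], 2)
missing = listsplit_pair([5, 4, 9, 2, 7, 'dog'], 42)
print(left, right)
print(missing)
```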

Not as such, but you can pretty easily slice the list you're making using the index method and get the results you show in your example output.
>>> l1 = [5, 4, 9, 2, 7, 'dog']
>>> l1[:l1.index(4) + 1], l1[l1.index(4) + 1:]
([5, 4], [9, 2, 7, 'dog'])
The '+1' is there because you wanted the needle (4) to appear in the first list.
Note that this splits on the first occurrence, which is pretty typical. If you really want a split method as in your question, you can subclass list and write your own using something like this.
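A rough sketch of such a subclass might look like the following. The class name SplittableList is made up for this example, and leaving the ValueError from list.index unhandled is an assumption about the desired behaviour when the value is missing.

```python
class SplittableList(list):
    """Hypothetical list subclass with a str.split-like helper."""

    def split(self, value):
        # Split after the first occurrence of value, keeping it on the left.
        # list.index raises ValueError if value is absent; that is left
        # unhandled here on purpose.
        i = self.index(value) + 1
        return self[:i], self[i:]


l1 = SplittableList([5, 4, 9, 2, 7, 'dog'])
first, second = l1.split(4)
print(first, second)
```

Note that slicing a list subclass gives back plain lists, so the two halves would need wrapping again if they should be splittable too.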

On one line:
[x[k+1:x[k+1:len(x)].index(4)+len(x[:k+1])] for k in [c for c in [-1]+[i for i, a in enumerate(x) if a == 4]][:-1] ] + [x[(len(x) - 1) - x[::-1].index(4)+1:len(x)]]
The function:
def split_list(x, value):
return [x[k+1:x[k+1:len(x)].index(value)+len(x[:k+1])] for k in [c for c in [-1]+[i for i, a in enumerate(x) if a == value]][:-1] ] + [x[(len(x) - 1) - x[::-1].index(value)+1:len(x)]]
It will work in case the separator exists several times, and in case it exists at the end or the beginning of the list.
Details:
Part 1: [c for c in [-1]+[i for i, a in enumerate(x) if a == value]][:-1]
Get the indexes of the occurrences of value, prepend -1 at the beginning, and remove the index of the last occurrence.
x[k+1:x[k+1:len(x)].index(value)+len(x[:k+1])]
Make a slice (let's call it Slice A) of the list, starting the first time from 0 (that's why I prepended -1: -1+1=0) and afterwards from occurrence index + 1 (to skip the current occurrence), for each occurrence index from Part 1. Then I take another slice starting from the current occurrence index + 1 and ending at the end of the list, and look for the first occurrence of value in it. Because that index is relative to the slice, I add the current offset (the length of the part from the beginning up to the current occurrence index + 1) to it. This gives the position to stop at, which is the end of Slice A. I obtain one split section each time.
I removed the index of the last occurrence because it isn't needed to calculate anything after that index, so I just append the remaining elements at the end: + [x[(len(x) - 1) - x[::-1].index(value)+1:len(x)]]
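The same index-based idea can be spelled out in a few named steps, which may be easier to follow than the one-liner. This is only a sketch: the name split_list_by_index is invented here, and unlike the one-liner it returns the whole list as a single piece when value is absent (the one-liner would raise ValueError).

```python
def split_list_by_index(x, value):
    """Split x on every occurrence of value, dropping the separators."""
    # Positions of every separator, in order.
    positions = [i for i, a in enumerate(x) if a == value]
    # Each piece starts just after the previous separator...
    starts = [0] + [p + 1 for p in positions]
    # ...and stops at the next separator, or at the end of the list.
    stops = positions + [len(x)]
    return [x[start:stop] for start, stop in zip(starts, stops)]


pieces = split_list_by_index([1, 2, 3, 4, 1, 2, 3, 4, 4, 3, 4, 3], 3)
print(pieces)
```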

Use slicing:
def split_list(li, item):
    if item not in li:
        return li
    else:
        return [li[0:li.index(item)+1], li[li.index(item)+1:]]
Testing:
for item in (5,4,2,'dog'):
    print(item, split_list([5,4,9,2,7,'dog'], item))
Prints:
5 [[5], [4, 9, 2, 7, 'dog']]
4 [[5, 4], [9, 2, 7, 'dog']]
2 [[5, 4, 9, 2], [7, 'dog']]
dog [[5, 4, 9, 2, 7, 'dog'], []]

Related

Grouping numbers in a list of floats in ascending order [duplicate]

Assume no consecutive integers are in the list.
I've tried using NumPy (np.diff) for the difference between each element, but haven't been able to use that to achieve the answer. Two examples of the input (first line) and expected output (second line) are below.
[6, 0, 4, 8, 7, 6]
[[6], [0, 4, 8], [7], [6]]
[1, 4, 1, 2, 4, 3, 5, 4, 0]
[[1, 4], [1, 2, 4], [3, 5], [4], [0]]
You could use itertools.zip_longest to enable iteration over sequential element pairs in your list along with enumerate to keep track of index values where the sequences are not increasing in order to append corresponding slices to your output list.
from itertools import zip_longest
nums = [1, 4, 1, 2, 4, 3, 5, 4, 0]
results = []
start = 0
for i, (a, b) in enumerate(zip_longest(nums, nums[1:])):
    if b is None or b <= a:
        results.append(nums[start:i+1])
        start = i + 1
print(results)
# [[1, 4], [1, 2, 4], [3, 5], [4], [0]]
Here's a simple way to do what you're asking without any extra libraries:
result_list = []
sublist = []
previous_number = None
for current_number in inp:
    if previous_number is None or current_number > previous_number:
        # still ascending, add to the current sublist
        sublist.append(current_number)
    else:
        # no longer ascending, add the current sublist
        result_list.append(sublist)
        # start a new sublist
        sublist = [current_number]
    previous_number = current_number
if sublist:
    # add the last sublist, if there's anything there
    result_list.append(sublist)
Just cause I feel kind, this will also work with negative numbers.
seq = [6, 0, 4, 8, 7, 6]
seq_by_incr_groups = [] # Will hold the result
incr_seq = [] # Needed to create groups of increasing values.
previous_value = 0 # Needed to assert whether or not it's an increasing value.
for curr_value in seq: # Iterate over the list
    if curr_value > previous_value: # It's an increasing value and belongs to the group of increasing values.
        incr_seq.append(curr_value)
    else: # It was lower, lets append the previous group of increasing values to the result and reset the group so that we can create a new one.
        if incr_seq: # It could be that it's empty, in the case that the first number in the input list is a negative.
            seq_by_incr_groups.append(incr_seq)
        incr_seq = []
        incr_seq.append(curr_value)
    previous_value = curr_value # Needed so that we in the next iteration can assert that the value is increasing compared to the prior one.
if incr_seq: # Check if we have to add any more increasing number groups.
    seq_by_incr_groups.append(incr_seq) # Add them.
print(seq_by_incr_groups)
The code below should help you. However, I would recommend that you use proper nomenclature and consider handling corner cases:
li1 = [6, 0, 4, 8, 7, 6]
li2 = [1, 4, 1, 2, 4, 3, 5, 4, 0]
def inc_seq(li1):
    lix = []
    li_t = []
    for i in range(len(li1)):
        if i < (len(li1) - 1) and li1[i] >= li1[i + 1]:
            # next value does not increase: close the current group
            li_t.append(li1[i])
            lix.append(li_t)
            li_t = []
        else:
            li_t.append(li1[i])
    if li_t:
        # add the final group, which the loop never closes
        lix.append(li_t)
    print(lix)
inc_seq(li1)
inc_seq(li2)
You can write a simple script and you don't need numpy as far as I have understood your problem statement. Try the script below. I have tested it using Python 3.6.7 and Python 2.7.15+ on my Ubuntu machine.
def breakIntoList(inp):
    if not inp:
        return []
    sublist = [inp[0]]
    output = []
    for a in inp[1:]:
        if a > sublist[-1]:
            sublist.append(a)
        else:
            output.append(sublist)
            sublist = [a]
    output.append(sublist)
    return output
nums = [1, 4, 1, 2, 4, 3, 5, 4, 0]  # avoid shadowing the built-in list
print(nums)
print(breakIntoList(nums))
Explanation:
The script first checks if input List passed to it has one or more elements.
It then initialises a sublist (variable name) to hold elements in increasing order, starting with the input list's first element.
We iterate through the input list beginning from its second element (index 1). We keep checking whether the current element of the input list is greater than the last element of sublist (sublist[-1]). If yes, we append the current element to the end of sublist. If not, the current element can't go into that sublist, so we append sublist to the output list, then start a new sublist containing the current element.
At the end, we append the remaining sublist to the output List.
Here's an alternative using dict, list comprehensions, and zip:
seq = [1, 4, 1, 2, 4, 3, 5, 4, 0]
dict_seq = {i:j for i,j in enumerate(seq)}
# Get the index where numbers start to decrease
idx = [0] # Adding a zero seems counter-intuitive now; we'll see the benefit later.
for k, v in dict_seq.items():
    if k>0:
        if dict_seq[k]<dict_seq[k-1]:
            idx.append(k)
# Using zip, slice and handling the last entry
inc_seq = [seq[i:j] for i, j in zip(idx, idx[1:])] + [seq[idx[-1]:]]
Output
>>> print(inc_seq)
[[1, 4], [1, 2, 4], [3, 5], [4], [0]]
By initiating idx = [0] and creating 2 sublists idx, idx[1:], we can zip these sublists to form [0:2], [2:5], [5:7] and [7:8] with the list comprehension.
>>> print(idx)
[0, 2, 5, 7, 8]
>>> for i, j in zip(idx, idx[1:]):
...     print('[{}:{}]'.format(i,j))
...
[0:2]
[2:5]
[5:7]
[7:8] # <-- need to add the last slice [8:]
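One way to avoid special-casing the last slice, sketched here as an assumption rather than as part of the answer above, is to add len(seq) as a final boundary so that zip covers every run. The name bounds is introduced only for this example.

```python
seq = [1, 4, 1, 2, 4, 3, 5, 4, 0]

# Indices where a new run starts: position 0, plus every drop in value.
idx = [0] + [k for k in range(1, len(seq)) if seq[k] < seq[k - 1]]

# Appending len(seq) as a closing boundary lets zip produce the last slice too.
bounds = idx + [len(seq)]
inc_seq = [seq[i:j] for i, j in zip(bounds, bounds[1:])]
print(inc_seq)
```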

python : group elements together in list

I'm currently working with itertools to create and return a list whose elements are lists that contain the consecutive runs of equal elements of the original list.
import itertools
it = [1, 1, 5, 5, 5, 'test', 'test', 5]
result = [list(k) for a, k in itertools.groupby(it)]
For the above example the result is:
[[1, 1], [5, 5, 5], ['test', 'test'], [5]]
Can I achieve this without using itertools?
You can pair adjacent items by zipping the list with itself but with a padding of float('nan') since it can't be equal to any object, and then iterate through the zipped pairs to append items to last sub-list of the output list, and add a new sub-list when the adjacent items are different:
output = []
for a, b in zip([float('nan')] + it, it):
    if a != b:
        output.append([])
    output[-1].append(b)
output becomes:
[[1, 1], [5, 5, 5], ['test', 'test'], [5]]
To be honest a simple for loop could make this work, you don't even have to import itertools.
The simplest way to do this is by using this:
it = [1, 1, 5, 5, 5, 'test', 'test', 5]
result = []
for (i, x) in enumerate(it):
    if i < 1 or type(x) != type(it[i - 1]) or x != it[i - 1]:
        result.append([x])
    else:
        result[-1].append(x)
print(result)
Or, in function form:
def type_chunk(it):
    result = []
    for (i, x) in enumerate(it):
        if i < 1 or type(x) != type(it[i - 1]) or x != it[i - 1]:
            result.append([x])
        else:
            result[-1].append(x)
    return result
You would then use the function like this:
print(type_chunk([1, 1, 5, 5, 5, 'test', 'test', 5]))
You could even skip the type checking and only look for equal values:
def type_chunk(it):
    result = []
    for (i, x) in enumerate(it):
        if i < 1 or x != it[i - 1]:
            result.append([x])
        else:
            result[-1].append(x)
    return result
Good luck.
You could have a look at the function in itertools to see how they are doing it.
Here is one way which shows the logic clearly (can be further reduced):
def i_am_itertool():
    it = [1, 1, 5, 5, 5, 'test', 'test', 5]
    ret = []
    temp = []
    last = it[0]
    for e in it:
        if e == last:
            temp.append(e)
        else:
            ret.append(temp) # Add previous group
            temp = [e] # Start next group
            last = e
    ret.append(temp) # Add final group
    return ret
print(i_am_itertool())
Output:
[[1, 1], [5, 5, 5], ['test', 'test'], [5]]
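As a sketch of how that logic could be reduced, the version below takes the list as a parameter and compares each element with the last element of the most recent group, which also makes an empty input safe. The name chunk_equal is made up for this example.

```python
def chunk_equal(items):
    """Group consecutive equal elements into sub-lists (illustrative name)."""
    groups = []
    for e in items:
        # Extend the current group while the value repeats,
        # otherwise start a new group.
        if groups and groups[-1][-1] == e:
            groups[-1].append(e)
        else:
            groups.append([e])
    return groups


print(chunk_equal([1, 1, 5, 5, 5, 'test', 'test', 5]))
```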

Find smallest repeated piece of a list

I've got some list with integers like:
l1 = [8,9,8,9,8,9,8]
l2 = [3,4,2,4,3]
My goal is to slice it into the smallest repeated piece. So:
output_l1 = [8,9]
output_l2 = [3,4,2,4]
The biggest problem is that the sequences are not always complete. So not
'abcabcabc'
just
'abcabcab'.
def shortest_repeating_sequence(inp):
    for i in range(1, len(inp)):
        if all(inp[j] == inp[j % i] for j in range(i, len(inp))):
            return inp[:i]
    # inp doesn't have a repeating pattern if we got this far
    return inp[:]
This code is O(n^2). The worst case is one element repeated a lot of times followed by something that breaks the pattern at the end, for example [1, 1, 1, 1, 1, 1, 1, 1, 1, 8].
You start with 1, and then iterate over the entire list checking that each inp[i] is equal to inp[i % 1]. Any number % 1 is equal to 0, so you're checking if each item in the input is equal to the first item in the input. If all items are equal to the first element then the repeating pattern is a list with just the first element so we return inp[:1].
If at some point you hit an element that isn't equal to the first element (all() stops as soon as it finds a False), you try with 2. So now you're checking if each element at an even index is equal to the first element (4 % 2 is 0) and if every odd index is equal to the second item (5 % 2 is 1). If you get all the way through this, the pattern is the first two elements so return inp[:2], otherwise try again with 3 and so on.
You could do range(1, len(inp)+1) and then the for loop will handle the case where inp doesn't contain a repeating pattern, but then you have to needlessly iterate over the entire inp at the end. And you'd still have to have return [] at the end to handle inp being the empty list.
I return a copy of the list (inp[:]) instead of the list to have consistent behavior. If I returned the original list with return inp and someone called that function on a list that didn't have a repeating pattern (ie their repeating pattern is the original list) and then did something with the repeating pattern, it would modify their original list as well.
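To illustrate the point about returning a copy, here is a small check. The function is repeated from above so the snippet runs on its own, and the variable names original and pattern are just for this example.

```python
def shortest_repeating_sequence(inp):
    for i in range(1, len(inp)):
        if all(inp[j] == inp[j % i] for j in range(i, len(inp))):
            return inp[:i]
    # No repeating pattern: return a copy, not the caller's list.
    return inp[:]


original = [4, 2, 7, 4, 6]   # no repeating pattern
pattern = shortest_repeating_sequence(original)
pattern.append(99)           # mutate only the returned list

print(original)              # the input list is unchanged
print(pattern is original)   # the two are distinct objects
```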
shortest_repeating_sequence([4, 2, 7, 4, 6]) # no pattern
[4, 2, 7, 4, 6]
shortest_repeating_sequence([2, 3, 1, 2, 3]) # pattern doesn't repeat fully
[2, 3, 1]
shortest_repeating_sequence([2, 3, 1, 2]) # pattern doesn't repeat fully
[2, 3, 1]
shortest_repeating_sequence([8, 9, 8, 9, 8, 9, 8])
[8, 9]
shortest_repeating_sequence([1, 1, 1, 1, 1])
[1]
shortest_repeating_sequence([])
[]
The following code is a rework of your solution that addresses some issues:
Your solution as posted doesn't handle your own 'abcabcab' example.
Your solution keeps processing even after it's found a valid result, and then filters through both the valid and non-valid results. Instead, once a valid result is found, we process and return it. Additional valid results, and non-valid results, are simply ignored.
@Boris' issue regarding returning the input if there is no repeating pattern.
CODE
def repeated_piece(target):
    target = list(target)
    length = len(target)
    for final in range(1, length):
        result = []
        while len(result) < length:
            for i in target[:final]:
                result.append(i)
        if result[:length] == target:
            return result[:final]
    return target
l1 = [8, 9, 8, 9, 8, 9, 8]
l2 = [3, 4, 2, 4, 3]
l3 = 'abcabcab'
l4 = [1, 2, 3]
print(*repeated_piece(l1), sep='')
print(*repeated_piece(l2), sep='')
print(*repeated_piece(l3), sep='')
print(*repeated_piece(l4), sep='')
OUTPUT
% python3 test.py
89
3424
abc
123
%
You can still use:
print(''.join(map(str, repeated_piece(l1))))
if you're uncomfortable with the simpler Python 3 idiom:
print(*repeated_piece(l1), sep='')
SOLUTION
target = [8,9,8,9,8,9,8]
length = len(target)
result = []
results = [] * length
for j in range(1, length):
    result = []
    while len(result) < length:
        for i in target[:j]:
            result.append(i)
    results.append(result)
final = []
for i in range(0, len(results)):
    if results[i][:length] == target:
        final.append(1)
    else:
        final.append(0)
if 1 in final:
    solution = results[final.index(1)][:final.index(1)+1]
else:
    solution = target
int(''.join(map(str, solution)))
'result: [8, 9]'.
Simple Solution:
def get_unique_items_list(some_list):
    new_list = []
    for i in range(len(some_list)):
        if not some_list[i] in new_list:
            new_list.append(some_list[i])
    return new_list
l1 = [8,9,8,9,8,9,8]
l2 = [3,4,2,4,3]
print(get_unique_items_list(l1))
print(get_unique_items_list(l2))
#### Output ####
# [8, 9]
# [3, 4, 2]

Loop from a specific point in a list of lists Python

I would like to append to a new list all elements of an existing list of lists after a specific point
m = [[1,2,3],[4,5,10],[6,2,1]]
specific_point = m[0][2]
newlist = [3,4,5,10,6,2,1]
You can directly slice off the remainder of the first target list and then add on all subsequent elements, eg:
m = [[1,2,3],[4,5,10],[6,2,1]]
y, x = 0, 2
new_list = m[y][x:] + [v for el in m[y+1:] for v in el]
# [3, 4, 5, 10, 6, 2, 1]
Here's a couple of functional approaches for efficiently iterating over your data.
If sublists are evenly sized, and you know the index from where to begin extracting elements, use chain + islice:
from itertools import chain, islice
n = 3 # Sublist size.
i,j = 0,2
newlist = list(islice(chain.from_iterable(m), i*n + j, None))
If you don't know the size of your sublists in advance, you can use next to discard the first portion of your data.
V = chain.from_iterable(m)
next(v for v in V if v == m[i][j])
newlist = list(V)
newlist.insert(0, m[i][j])  # list.insert takes (index, value)
This assumes there is no identical value earlier in the sequence.
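If an identical value might appear earlier, one alternative is to count positions instead of searching by value. This is a sketch under the assumption that the start point is given as indices (i, j); the sample data and the name offset are made up for the example, and the sublists are deliberately of different sizes.

```python
from itertools import chain, islice

m = [[3, 1, 3], [4, 5, 10, 7], [6]]   # ragged sublists, with a repeated 3
i, j = 0, 2

# Number of elements that come before m[i][j], counted by position.
offset = sum(len(sub) for sub in m[:i]) + j

# Skip exactly that many elements of the flattened data.
newlist = list(islice(chain.from_iterable(m), offset, None))
print(newlist)
```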
You can put a conditional in your iteration and only add based on that condition. Once you hit that specific index, make your condition true. Something like this:
m = [[1,2,3],[4,5,10],[6,2,1]]
specific_point = (0,2)
newlist = [3,4,5,10,6,2,1]
output = []
for i in range(len(m)):
    for j in range(len(m[i])):
        if (i,j) < specific_point:
            continue
        output.append(m[i][j])
output:
[3, 4, 5, 10, 6, 2, 1]
Why not flatten the initial list and go from there?
flat_list = [item for sublist in m for item in sublist]
This would return [1,2,3,4,5,10,6,2,1], so now you really just need flat_list[2:].
Most of the answers only work for this specific shape of nested list, but it's also possible to create a solution that works with any shape of nested list.
def flatten_from(sequence, path=[]):
    start = path.pop(0) if path else 0
    for item in sequence[start:]:
        if isinstance(item, (list, tuple)):
            yield from flatten_from(item, path)
        else:
            yield item
With the example from the question
>>> list(flatten_from([[1, 2, 3], [4, 5, 10], [6, 2, 1]], [0, 2]))
[3, 4, 5, 10, 6, 2, 1]
It also works with any shape and level of nesting of the input data
m = [[1], [[2], [3, 4, 5, 6, 7]], 8, [9, [10, 11]]]
flatten_from(m, []) # 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
flatten_from(m, [2]) # 8, 9, 10, 11
flatten_from(m, [1, 1, 3]) # 6, 7, 8, 9, 10, 11
This is a bit of a bastard algorithm, though. On one hand, it uses nice functional programming concepts: recursion and yield.
On the other hand it relies on the side effect of mutating the path argument with list.pop, so it's not a pure function.
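For comparison, here is a sketch of a variant that leaves the caller's path untouched by passing the remaining path down explicitly. The name flatten_from_pure is invented here, and it assumes the path should only apply along the first item visited at each level, so in edge cases (for example a path longer than the nesting it points into) it may not behave exactly like the version above.

```python
def flatten_from_pure(sequence, path=()):
    """Variant of flatten_from that never mutates the caller's path."""
    start = path[0] if path else 0
    rest = tuple(path[1:])
    for offset, item in enumerate(sequence[start:]):
        if isinstance(item, (list, tuple)):
            # Only the first item at this level continues along the path;
            # later siblings are flattened from their beginning.
            yield from flatten_from_pure(item, rest if offset == 0 else ())
        else:
            yield item


m = [[1], [[2], [3, 4, 5, 6, 7]], 8, [9, [10, 11]]]
path = [1, 1, 3]
result = list(flatten_from_pure(m, path))
print(result)
print(path)   # still [1, 1, 3]
```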
The solution below will work for your case, where the array is restricted to a list of lists and the size of each sublist is consistent throughout, i.e. 3 in your case.
m = [[1,2,3],[4,5,10],[6,2,1]] #input 2D array
a, b = 0, 2 #user input --> specific point a and b
flat_list_m = [item for firstlist in m for item in firstlist] #flat the 2D list
print (flat_list_m[len(m[0])*a+b:]) #print from specific position a and b, considering your sublist length is consistent throughout.
I hope this helps! :)

Remove sublist from list

I want to do the following in Python:
A = [1, 2, 3, 4, 5, 6, 7, 7, 7]
C = A - [3, 4] # Should be [1, 2, 5, 6, 7, 7, 7]
C = A - [4, 3] # Should not be removing anything, because sequence 4, 3 is not found
So, I simply want to remove the first appearance of a sublist (as a sequence) from another list. How can I do that?
Edit: I am talking about lists, not sets. Which implies that ordering (sequence) of items matter (both in A and B), as well as duplicates.
Use sets:
C = list(set(A) - set(B))
In case you want to maintain duplicates and/or order:
filter_set = set(B)
C = [x for x in A if x not in filter_set]
If you want to remove exact sequences, here is one way:
Find the bad indices by checking to see if the sublist matches the desired sequence:
bad_ind = [list(range(i,i+len(B))) for i,x in enumerate(A) if A[i:i+len(B)] == B]
print(bad_ind)
#[[2, 3]]
Since this returns a list of lists, flatten it and turn it into a set:
bad_ind_set = set([item for sublist in bad_ind for item in sublist])
print(bad_ind_set)
#{2, 3}
Now use this set to filter your original list, by index:
C = [x for i,x in enumerate(A) if i not in bad_ind_set]
print(C)
#[1, 2, 5, 6, 7, 7, 7]
The above bad_ind_set will remove all matches of the sequence. If you only want to remove the first match, it's even simpler. You just need the first element of bad_ind (no need to flatten the list):
bad_ind_set = set(bad_ind[0])
Update: Here is a way to find and remove the first matching sub-sequence using a short circuiting for loop. This will be faster because it will break out once the first match is found.
start_ind = None
for i in range(len(A)):
    if A[i:i+len(B)] == B:
        start_ind = i
        break
C = [x for i, x in enumerate(A)
     if start_ind is None or not(start_ind <= i < (start_ind + len(B)))]
print(C)
#[1, 2, 5, 6, 7, 7, 7]
I see this question as a kind of substring search, so substring search algorithms such as KMP, BM etc. could be applied here. If you'd like to support multiple patterns, there are multiple-pattern algorithms like Aho-Corasick, Wu-Manber etc.
Below is the KMP algorithm implemented in Python, taken from a GitHub Gist.
PS: the author is not me. I just want to share my idea.
class KMP:
    def partial(self, pattern):
        """ Calculate partial match table: String -> [Int]"""
        ret = [0]
        for i in range(1, len(pattern)):
            j = ret[i - 1]
            while j > 0 and pattern[j] != pattern[i]:
                j = ret[j - 1]
            ret.append(j + 1 if pattern[j] == pattern[i] else j)
        return ret
    def search(self, T, P):
        """
        KMP search main algorithm: String -> String -> [Int]
        Return all the matching positions of pattern string P in T
        """
        partial, ret, j = self.partial(P), [], 0
        for i in range(len(T)):
            while j > 0 and T[i] != P[j]:
                j = partial[j - 1]
            if T[i] == P[j]: j += 1
            if j == len(P):
                ret.append(i - (j - 1))
                j = 0
        return ret
Then use it to calculate the matched positions, and finally remove the first match:
A = [1, 2, 3, 4, 5, 6, 7, 7, 7, 3, 4]
B = [3, 4]
result = KMP().search(A, B)
print(result)
#assuming at least one match is found
print(A[:result[0]] + A[result[0]+len(B):])
Output:
[2, 9]
[1, 2, 5, 6, 7, 7, 7, 3, 4]
PS: You can try other algorithms too. And @Pault's answer is good enough unless you care a lot about performance.
Here is another approach:
# Returns the starting and ending point (index) of the sublist, if it exists, otherwise 'None'.
def findSublist(subList, inList):
    subListLength = len(subList)
    # +1 so that a match at the very end of inList is also checked
    for i in range(len(inList)-subListLength+1):
        if subList == inList[i:i+subListLength]:
            return (i, i+subListLength)
    return None
# Removes the sublist, if it exists and returns a new list, otherwise returns the old list.
def removeSublistFromList(subList, inList):
    indices = findSublist(subList, inList)
    if indices is not None:
        return inList[0:indices[0]] + inList[indices[1]:]
    else:
        return inList
A = [1, 2, 3, 4, 5, 6, 7, 7, 7]
s1 = [3,4]
B = removeSublistFromList(s1, A)
print(B)
s2 = [4,3]
C = removeSublistFromList(s2, A)
print(C)
