Group elements in a list by a pairwise criterion in Python

I am implementing a local optimization that fuses objects together. In the simplest form, given a list:
[0, 3, 5, 8, 1, 2, 9, 0, 3, 5]
I would like to group into:
[[0, 3], 5, 8, 1, 2, 9, [0, 3], 5]
which is based on a provided criterion:
def is_group(a, b):
    return a == 0 and b == 3
My current solution seems a bit convoluted, and I am looking for the most Pythonic approach:
def pairwise(iterable):
    for (a, b) in zip(iterable, iterable[1:]):
        yield (a, b)
    yield (iterable[-1], None)  # handle edge case

def fuse(ops):
    ops_new = []
    skip_next = False
    for (a, b) in pairwise(ops):
        if is_group(a, b):
            ops_new.append([a, b])
            skip_next = True
        elif skip_next:
            skip_next = False
        else:
            ops_new.append(a)
    return ops_new
I've looked at groupby, which is the closest, but I'm not quite sure how to make it work since here the criterion depends on pairwise arguments.
Edit: Another way to ask the question is I am basically trying to do pattern search and replace with lists (e.g. regex for lists).

Custom isolate_group function:
def isolate_group(pair, l):
    result = []
    idx_skip = -1
    for i in range(len(l)):
        if i == idx_skip:
            continue
        if l[i:i+2] == pair:
            result.append(l[i:i+2])
            idx_skip = i+1
        else:
            result.append(l[i])
    return result
Test 1:
print(isolate_group([0,3], [0, 3, 5, 8, 1, 2, 9, 0, 3, 5]))
The output:
[[0, 3], 5, 8, 1, 2, 9, [0, 3], 5]
Test 2:
print(isolate_group([0,3], [0, 3, 5, 8, 0, 3, 9, 5, 0, 3]))
The output:
[[0, 3], 5, 8, [0, 3], 9, 5, [0, 3]]
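If the criterion has to stay an arbitrary predicate such as is_group from the question, rather than a fixed pair, an index-based loop is one possible way to write it. This is only a sketch: the two-argument fuse signature is an assumption made for illustration, not the asker's original interface.

```python
def fuse(ops, is_group):
    """Group adjacent items (a, b) into [a, b] whenever is_group(a, b) is true."""
    fused = []
    i = 0
    while i < len(ops):
        # Only test a pair when a right-hand neighbour exists
        if i + 1 < len(ops) and is_group(ops[i], ops[i + 1]):
            fused.append([ops[i], ops[i + 1]])
            i += 2  # both items were consumed by the group
        else:
            fused.append(ops[i])
            i += 1
    return fused


def is_group(a, b):
    return a == 0 and b == 3


print(fuse([0, 3, 5, 8, 1, 2, 9, 0, 3, 5], is_group))
# [[0, 3], 5, 8, 1, 2, 9, [0, 3], 5]
```

Because the index jumps past a fused pair, the second element of a group is never reused as the first element of the next one.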

Related

Finding Indices for Repeat Sequences in NumPy Array

This is a follow-up to a previous question. If I have a NumPy array [0, 1, 2, 2, 3, 4, 2, 2, 5, 5, 6, 5, 5, 2, 2], for each repeat sequence (starting at each index), is there a fast way to then find all matches of that repeat sequence and return the indices of those matches?
Here, the repeat sequences are [2, 2] and [5, 5] (note that the length of the repeat is specified by the user but will be the same length and can be much greater than 2). The repeats can be found at [2, 6, 8, 11, 13] via:
import numpy as np

def consec_repeat_starts(a, n):
    N = n-1
    m = a[:-1]==a[1:]
    return np.flatnonzero(np.convolve(m,np.ones(N, dtype=int))==N)-N+1
But for each unique type of repeat sequence (i.e., [2, 2] and [5, 5]) I want to return something like the repeat followed by the indices for where the repeat is located:
[([2, 2], [2, 6, 13]), ([5, 5], [8, 11])]
Update
Additionally, given the repeat sequence, can you return the results from a second array. So, look for [2, 2] and [5, 5] in:
[2, 2, 5, 5, 1, 4, 9, 2, 5, 5, 0, 2, 2, 2]
And the function would return:
[([2, 2], [0, 11, 12]), ([5, 5], [2, 8])]
Here's a way to do so -
def group_consec(a, n):
    idx = consec_repeat_starts(a, n)
    b = a[idx]
    sidx = b.argsort()
    c = b[sidx]
    cut_idx = np.flatnonzero(np.r_[True, c[:-1]!=c[1:],True])
    idx_s = idx[sidx]
    indices = [idx_s[i:j] for (i,j) in zip(cut_idx[:-1],cut_idx[1:])]
    return c[cut_idx[:-1]], indices
# Perform lookup in another array, b
n = 2
v_a,indices_a = group_consec(a, n)
v_b,indices_b = group_consec(b, n)
idx = np.searchsorted(v_a, v_b)
idx[idx==len(v_a)] = 0
valid_mask = v_a[idx]==v_b
common_indices = [j for (i,j) in zip(valid_mask,indices_b) if i]
common_val = v_b[valid_mask]
Note that for simplicity and ease of use, the first output of group_consec holds the unique value of each sequence. If you need them in (val, val, ...) format, simply replicate them at the end. The same applies to common_val.
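For comparison, the output format asked for in the question can also be produced with plain Python lists. This is a slower, non-vectorized sketch, not the NumPy approach above, and the name group_repeat_starts is made up for illustration.

```python
from collections import defaultdict


def group_repeat_starts(seq, n):
    """Return [([v] * n, [start indices])] for every run of n equal values."""
    groups = defaultdict(list)
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        # A window is a repeat when all of its items equal the first one
        if all(x == window[0] for x in window):
            groups[window[0]].append(i)
    return [([value] * n, starts) for value, starts in sorted(groups.items())]


a = [0, 1, 2, 2, 3, 4, 2, 2, 5, 5, 6, 5, 5, 2, 2]
print(group_repeat_starts(a, 2))
# [([2, 2], [2, 6, 13]), ([5, 5], [8, 11])]
```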

Slicing a list at integer of when an element is different from the previous

I need to be able to slice a list of points into multiple sublists, to act as a guide for slicing another list.
a = 1 # just an example
b = 2 # just an example
c = 3 # just an example
# My list right now
y_vals = [a, a, a, a, a, a, b, b, b, b, b, b, b, b, b, b, b, c, c, c, c, c]
and I need it to slice every time the number changes. a, b and c are actual numbers, but the numbers are rather long, so I typed it out in a,b,c.
I wanted to use the slicing method of [:x], but it's a list of over 5000 numbers, and I'm not sure how to slice a list. Thank you in advance!
If you simply want to get sublists with only the same number then don't bother with slicing. A good approach is itertools.groupby:
from itertools import groupby
li = [3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 2, 2]
for _, group in groupby(li):
    print(list(group))
Outputs
[3, 3, 3, 3, 3, 3]
[1, 1, 1, 1, 1]
[2, 2]
EDIT: Getting from this to your required list of lists is a straightforward one-liner:
output = [list(group) for _, group in groupby(li)]
print(output)
Outputs
[[3, 3, 3, 3, 3, 3], [1, 1, 1, 1, 1], [2, 2]]
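Since the question mentions using the result as a guide for slicing another list, the run lengths from groupby can be turned into slice objects. The sketch below assumes both lists have the same length; run_slices and other are hypothetical names.

```python
from itertools import groupby


def run_slices(values):
    """Return one slice per run of equal consecutive values."""
    slices = []
    start = 0
    for _, group in groupby(values):
        length = sum(1 for _ in group)  # size of the current run
        slices.append(slice(start, start + length))
        start += length
    return slices


y_vals = [3, 3, 3, 1, 1, 2]
other = ["a", "b", "c", "d", "e", "f"]
parts = [other[s] for s in run_slices(y_vals)]
print(parts)
# [['a', 'b', 'c'], ['d', 'e'], ['f']]
```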
This algorithm isn't really beautiful, but it should work:
a = 1
b = 2
c = 3
y_vals = [a,a,a,a,a,a,b,b,b,b,c,c,c,c,c]
last_break = 0
for i in range(1, len(y_vals)):
    if y_vals[i - 1] != y_vals[i]:
        print(y_vals[last_break: i])
        last_break = i
    if i == len(y_vals) - 1:
        print(y_vals[last_break: i + 1])
Result:
[1, 1, 1, 1, 1, 1]
[2, 2, 2, 2]
[3, 3, 3, 3, 3]
Edit: It will also work for lists like that:
y_vals = [1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,1,1,1]
The algorithm by @GotCubes won't.
A solution without slicing:
y_vals = [1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,1,1,1]
sub_list = []
for i in range(0, len(y_vals)):
    if ((i < len(y_vals)-1 ) and (y_vals[i] == y_vals[i+1])):
        sub_list.append(y_vals[i])
    else:
        sub_list.append(y_vals[i])
        print(sub_list)
        sub_list=[]
Output:
[1, 1, 1, 1, 1, 1]
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
[3, 3, 3, 3, 3]
[1, 1, 1]
@DeepSpace has the answer you're most likely looking for. However, if you insist on slicing, or otherwise need the indices at which to slice, this might help:
# Six 1's, Eleven 2's, Five 3's
y_vals = [1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3]
split_points = [y_vals.count(v) for v in set(y_vals)]
print(split_points)
ind = 0
for i in split_points:
    segment = y_vals[ind:ind+i]
    ind = ind + i
    print(segment)
Which gives you:
[6, 11, 5]
[1, 1, 1, 1, 1, 1]
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
[3, 3, 3, 3, 3]

Test whether list A is contained in list B

I have two lists, A & B, and I would like to test whether A is contained in B. By "contained" I mean that the elements of A appear in the exact same order within B with no other elements between them. What I'm looking for is very similar to the behavior of A in B if they were strings.
Some elements of A will be repeated. We can assume A will be shorter than B.
There are many answers to similar questions on SO, but most answer a different question:
Is A an element of B? (Not my question: B is a flat list, not a list of lists.)
Are all the elements of A contained in B? (Not my question: I'm concerned about order as well.)
Is A a sublist of B? (Not my question: I don't want to know whether the elements of A appear in the same order in B, I want to know if they appear exactly as they are somewhere in B.)
If the operation were implemented as the keyword containedin, it would behave like this.
>>> [2, 3, 4] containedin [1, 2, 3, 4, 5]
True
>>> [2, 3, 4] containedin [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
False
>>> [2, 3, 4] containedin [5, 4, 3, 2, 1]
False
>>> [2, 2, 2] containedin [1, 2, 3, 4, 5]
False
>>> [2, 2, 2] containedin [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
False
>>> [2, 2, 2] containedin [1, 1, 1, 2, 2, 2, 3, 3, 3]
True
Is there a concise way to perform this operation in Python? Am I missing some important terminology that would have led me to the answer more quickly?
Use any with list slicing:
def contained_in(lst, sub):
    n = len(sub)
    return any(sub == lst[i:i+n] for i in range(len(lst)-n+1))
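Or join both lists into strings and use the in operator (note that this can give false positives, e.g. [2, 3] would be found in [12, 3], because '2,3' is a substring of '12,3'):
def contained_in(lst, sub):
    return ','.join(map(str, sub)) in ','.join(map(str, lst))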
Usage:
>>> contained_in([1, 2, 3, 4, 5], [2, 3, 4])
True
>>> contained_in([1, 2, 2, 4, 5], [2, 3, 4])
False
Many people have posted their answers, but I want to post my effort anyway ;)
This is my code:
def containedin(a,b):
    for j in range(len(b)-len(a)+1):
        if a==b[j:j+len(a)]:
            return True
    return False
print(containedin([2, 3, 4],[1, 2, 3, 4, 5]))
print(containedin([2, 3, 4],[1, 1, 2, 2, 3, 3, 4, 4, 5, 5]))
print(containedin([2, 3, 4],[5, 4, 3, 2, 1]))
print(containedin([2, 2, 2],[1, 2, 3, 4, 5]))
print(containedin([2, 2, 2],[1, 1, 1, 2, 2, 2, 3, 3, 3]))
this is the output:
True
False
False
False
True
Assuming a is always shorter than b, you can do the following:
any(a == b[i:i+len(a)] for i in range(len(b)-len(a)+1))
Considering you need to preserve order:
def contains(sub_array, array):
    for i in range(len(array)-len(sub_array)+1):
        for j in range(len(sub_array)):
            if array[i+j] != sub_array[j]:
                break
        else:
            return i, i+len(sub_array)
    return False
Use this function
I tried to not make it complex
def contains(list1,list2):
    str1=""
    for i in list1:
        str1+=str(i)
    str2=""
    for j in list2:
        str2+=str(j)
    if str1 in str2:
        return True
    else:
        return False
Hope it works :)
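One caveat with concatenating elements into a string without a separator: multi-digit numbers can run together, so different lists may produce the same string. The sketch below compares that idea with a slice-based check; both function names are hypothetical and only serve the illustration.

```python
def contains_by_str(sub, seq):
    """Separator-free string comparison, as in the approach above."""
    return "".join(map(str, sub)) in "".join(map(str, seq))


def contains_by_slice(sub, seq):
    """Compare sub against every window of seq with the same length."""
    n = len(sub)
    return any(seq[i:i + n] == sub for i in range(len(seq) - n + 1))


# [1, 23] and [12, 3] both become the string "123"
print(contains_by_str([1, 23], [12, 3]))    # True (a false positive)
print(contains_by_slice([1, 23], [12, 3]))  # False
```

The slice-based check compares the elements themselves, so it also works for items that are not numbers.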
Something like this?
class myList(list):
    def in_other(self, other_list):
        # +1 so that a match at the very end of other_list is also checked
        for i in range(0, len(other_list)-len(self)+1):
            if other_list[i:i+len(self)] == self:
                return True
        return False

if __name__ == "__main__":
    x = myList([1, 2, 3])
    b = [0, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
    print(x.in_other(b))
No need to slice for every element:
def contains(seq, sub):
    sub_length = len(sub)
    sub_first = sub[0]
    return any(sub == seq[index:index+sub_length]
               for index, element in enumerate(seq)
               if element == sub_first)
Usage:
>>> seq = [1, 2, 3, 4, 5]
>>> sub = [2, 3, 4]
>>> contains(seq, sub)
True
You can concatenate the 2 lists into two different strings. Then, write a function to check if one string is in another.
def containedin(a, b):
    if b in a:
        return True
    return False

Fastest way to count identical sub-arrays in a nd-array?

Let's consider a 2d-array A
2 3 5 7
2 3 5 7
1 7 1 4
5 8 6 0
2 3 5 7
The first, second and last lines are identical. The algorithm I'm looking for should return the number of identical rows for each different row (=number of duplicates of each element). If the script can be easily modified to also count the number of identical column also, it would be great.
I use an inefficient naive algorithm to do that:
import numpy
A=numpy.array([[2, 3, 5, 7],[2, 3, 5, 7],[1, 7, 1, 4],[5, 8, 6, 0],[2, 3, 5, 7]])
i=0
end = len(A)
while i<end:
    print(i, end=' ')
    j=i+1
    numberID = 1
    while j<end:
        print(j)
        if numpy.array_equal(A[i,:] ,A[j,:]):
            numberID+=1
        j+=1
    i+=1
print(A, len(A))
Expected result:
array([3,1,1]) # number identical arrays per line
My algo looks like using native python within numpy, thus inefficient. Thanks for help.
In numpy >= 1.9.0, np.unique has a return_counts keyword argument you can combine with the solution here to get the counts:
b = np.ascontiguousarray(A).view(np.dtype((np.void, A.dtype.itemsize * A.shape[1])))
unq_a, unq_cnt = np.unique(b, return_counts=True)
unq_a = unq_a.view(A.dtype).reshape(-1, A.shape[1])
>>> unq_a
array([[1, 7, 1, 4],
[2, 3, 5, 7],
[5, 8, 6, 0]])
>>> unq_cnt
array([1, 3, 1])
In an older numpy, you can replicate what np.unique does, which would look something like:
a_view = np.array(A, copy=True)
a_view = a_view.view(np.dtype((np.void,
a_view.dtype.itemsize*a_view.shape[1]))).ravel()
a_view.sort()
a_flag = np.concatenate(([True], a_view[1:] != a_view[:-1]))
a_unq = a_view[a_flag].view(A.dtype).reshape(-1, A.shape[1])
a_idx = np.concatenate(np.nonzero(a_flag) + ([a_view.size],))
a_cnt = np.diff(a_idx)
>>> a_unq
array([[1, 7, 1, 4],
[2, 3, 5, 7],
[5, 8, 6, 0]])
>>> a_cnt
array([1, 3, 1])
You can lexsort on the row entries, which will give you the indices for traversing the rows in sorted order, making the search O(n) rather than O(n^2). Note that by default, the elements in the last column sort last, i.e. the rows are 'alphabetized' right to left rather than left to right.
In [9]: a
Out[9]:
array([[2, 3, 5, 7],
[2, 3, 5, 7],
[1, 7, 1, 4],
[5, 8, 6, 0],
[2, 3, 5, 7]])
In [10]: lexsort(a.T)
Out[10]: array([3, 2, 0, 1, 4])
In [11]: a[lexsort(a.T)]
Out[11]:
array([[5, 8, 6, 0],
[1, 7, 1, 4],
[2, 3, 5, 7],
[2, 3, 5, 7],
[2, 3, 5, 7]])
You can use the Counter class from the collections module for this.
It works like this:
x = [2, 2, 1, 5, 2]
from collections import Counter
c=Counter(x)
print(c)
Output : Counter({2: 3, 1: 1, 5: 1})
The only issue in your case is that every value of x is itself a list, which is not hashable.
If you convert every value of x to a tuple, it works:
x = [(2, 3, 5, 7),(2, 3, 5, 7),(1, 7, 1, 4),(5, 8, 6, 0),(2, 3, 5, 7)]
from collections import Counter
c=Counter(x)
print(c)
Output : Counter({(2, 3, 5, 7): 3, (1, 7, 1, 4): 1, (5, 8, 6, 0): 1})
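The same idea extends to counting identical columns, which the question also asks about. A minimal sketch, assuming the data is available as a plain nested list (a NumPy array could first be converted with A.tolist()); row_counts and col_counts are names chosen here for illustration.

```python
from collections import Counter

A = [[2, 3, 5, 7],
     [2, 3, 5, 7],
     [1, 7, 1, 4],
     [5, 8, 6, 0],
     [2, 3, 5, 7]]

# Rows must be converted to tuples so that they are hashable
row_counts = Counter(tuple(row) for row in A)

# zip(*A) transposes the nested list and yields each column as a tuple
col_counts = Counter(zip(*A))

print(row_counts[(2, 3, 5, 7)])  # 3
print(len(col_counts))           # 4, every column is distinct here
```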

Python list issue

I need some hints or an example: how can I locate list b inside list a, then replace it with list c?
a=[1,3,6,2,6,7,3,4,5,6,6,7,8]
Input the list b (this is the sublist the program searches for in list a).
b=[6,7]
When it is found, return the indexes where the sublist was found and replace it each time with c=[0,0], so the result will be
[1,3,6,2,0,0,3,4,5,6,0,0,8]
Here's a more efficient approach than my first, using list-slicing:
>>> for i in range(len(a) - len(b) + 1):
...     if a[i:i+len(b)] == b:
...         a[i:i+len(b)] = c
...
>>> a
[1, 3, 6, 2, 0, 0, 3, 4, 5, 6, 0, 0, 8]
First attempt, for posterity....
If you don't need the intermediate indices, here's one approach, using string functions and taking a functional approach, not modifying your list in-place.
>>> a_as_str = ','.join(str(i) for i in a)
>>> print(a_as_str)
1,3,6,2,6,7,3,4,5,6,6,7,8
>>> b_as_str = ','.join(str(i) for i in b)
>>> b_as_str
'6,7'
>>> c_as_str = ','.join(str(i) for i in c)
>>> c_as_str
'0,0'
>>> replaced = a_as_str.replace(b_as_str, c_as_str)
>>> replaced
'1,3,6,2,0,0,3,4,5,6,0,0,8'
>>> [int(i) for i in replaced.split(',')]
[1, 3, 6, 2, 0, 0, 3, 4, 5, 6, 0, 0, 8]
This can be refactored as:
>>> def as_str(l):
...     return ','.join(str(i) for i in l)
...
>>> def as_list_of_ints(s):
...     return [int(i) for i in s.split(',')]
...
>>> as_list_of_ints(as_str(a).replace(as_str(b), as_str(c)))
[1, 3, 6, 2, 0, 0, 3, 4, 5, 6, 0, 0, 8]
You can do something similar to this (written in Python 3.2; use xrange in Python 2.x):
for i in range(0, len(a)):
    if a[i:i+len(b)] == b:
        a[i:i+len(b)] = c
This will account for lists of all sizes.
This assumes that b and c have the same length. I don't know if that is what you want; please say so if it is not.
Output for lists:
a = [1,2,3,4,5,6,7,8,9,0]
b = [1,2]
c = [0,0]
Output:
[0, 0, 3, 4, 5, 6, 7, 8, 9, 0]
Here is an example:
li=[1,3,6,2,6,7,3,4,5,6,6,7,8]
for i in range(len(li)):
    if li[i:i + 2] == [3, 4]:
        li[i:i + 2] = [0, 0]
I think this code should work. If you want a more robust script, I suggest finding the occurrences of the sublist in the original list and editing a copy (to avoid side effects).
It is important also to consider what happens when the given pattern is created by the substitution.
I think this function should treat all cases as intended:
def replace(a, b, c):
    ii = 0
    # Stop at the last index where b could still fit
    while ii <= len(a) - len(b):
        if a[ii:ii+len(b)] == b:
            a[ii:ii+len(b)] = c
            # Skip past the replacement so it is not matched again
            ii += len(c)
        else:
            ii += 1
    return a
The output using the original example:
[1, 3, 6, 2, 0, 0, 3, 4, 5, 6, 0, 0, 8]
Here is an example where the substitution creates the search pattern:
a = [1,1,1,1,1,1,1,1,1,6,6,7,7,1]
b = [6,7]
c = [0,6]
Output is as expected:
[1, 1, 1, 1, 1, 1, 1, 1, 1, 6, 0, 6, 7, 1]
Any ideas on how to do this a bit more concisely?
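One possible variant builds a new list instead of modifying a in place and also reports where the matches were found, which the original question asked for. It is only a sketch; replace_sublist and its return shape are assumptions made for illustration.

```python
def replace_sublist(seq, old, new):
    """Return (new_list, match_indices) without modifying seq."""
    if not old:
        raise ValueError("old must be a non-empty list")
    result, found = [], []
    i = 0
    while i < len(seq):
        if seq[i:i + len(old)] == old:
            found.append(i)      # index of the match in the original list
            result.extend(new)
            i += len(old)        # continue after the matched items
        else:
            result.append(seq[i])
            i += 1
    return result, found


a = [1, 3, 6, 2, 6, 7, 3, 4, 5, 6, 6, 7, 8]
print(replace_sublist(a, [6, 7], [0, 0]))
# ([1, 3, 6, 2, 0, 0, 3, 4, 5, 6, 0, 0, 8], [4, 10])
```

Because matching always runs against the original list, a replacement that happens to create the search pattern is never matched again, and old and new may have different lengths.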
