Find items and repetitions in list - python

I am working in Python on the following problem: given a list such as [1, 0, -2, 0, 0, 4, 5, 0, 3], which contains the integer 0 several times, I would like to get the index at which each run of 0s starts and, for each run, the number of times 0 is repeated until a different element appears or the list ends.
Given l = [1, 0, -2, 0, 0, 4, 5, 0], the function would return [(1, 1), (3, 2), (7, 1)]. The result is a list of tuples. The first element of each tuple is the index (in the list) of the given element and the second is the number of times it is repeated until a different element appears or the list ends.
Naively, I would write something like this:
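def myfun(l, x):
    if x not in l:
        print("The given element is not in list.")
    else:
        j = 0
        n = len(l)
        r = list()
        while j <= (n-2):
            count = 0
            if l[j] == x:
                # check the bound first so l[j + count] never runs past the end
                while j + count <= (n-1) and l[j + count] == x:
                    count += 1
                r.append((j, count))
                j += count
            else:
                j += 1
        # a lone x in the last position is not reached by the loop above
        if j == n-1 and l[-1] == x:
            r.append((n-1, 1))
        return r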
But I was wondering whether there would be a nicer (shorter?) way of doing the same thing.

Not the prettiest, but a one-liner:
>>> import itertools
>>> l=[1, 0, -2, 0, 0, 4, 5, 0]
>>> [(k[0][0],len(k)) for k in [list(j) for i,j in itertools.groupby(enumerate(l), lambda x: x[1]) if i==0]]
[(1, 1), (3, 2), (7, 1)]
First, itertools.groupby(enumerate(l), lambda x: x[1]) groups the (index, value) pairs produced by enumerate(l) by their second item, the value, while keeping the index of each item.
Then [list(j) for i,j in itertools.groupby(enumerate(l), lambda x: x[1]) if i==0] keeps only the groups whose value is 0.
Finally, the outer list comprehension is needed because each group j is an iterator that can only be consumed once: list(j) materializes it so that both the first index and the length can be read.
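The three steps described above can also be unrolled into a small function. This is only a sketch: the name zero_runs and its target parameter are made up for illustration and are not part of the answer.

```python
from itertools import groupby

def zero_runs(seq, target=0):
    """Return (start_index, run_length) for each run of `target` in seq."""
    runs = []
    # Group (index, value) pairs by value, so consecutive equal values share a group.
    for value, group in groupby(enumerate(seq), key=lambda pair: pair[1]):
        if value != target:
            continue
        # Materialise the group: it is an iterator and can only be read once.
        pairs = list(group)
        runs.append((pairs[0][0], len(pairs)))
    return runs

print(zero_runs([1, 0, -2, 0, 0, 4, 5, 0]))  # [(1, 1), (3, 2), (7, 1)]
```

Spelling the steps out costs a few lines but makes it easy to change the target value or the shape of the result.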

Another oneliner with groupby, without using intermediate lists:
>>> from itertools import groupby
>>> l = [1, 0, -2, 0, 0, 4, 5, 0, 3]
>>> [(next(g)[0], 1 + sum(1 for _ in g)) for k, g in groupby(enumerate(l), key=lambda x: x[1]) if k == 0]
[(1, 1), (3, 2), (7, 1)]
In the code above, enumerate returns (index, value) tuples, which are then grouped by value. groupby returns (key, iterable) tuples, and if the key is nonzero the group is discarded. For the groups that are kept, next pulls out the first item of the group to get its index, while the rest of the items are consumed by the generator expression passed to sum in order to get the count.

This is how I would do it:
l=[1, 0, -2, 0, 0, 4, 5, 0]
lis=[]
t=0
for m in range(len(l)):
    if l[m]==0:
        if t==0:
            k=m
            j=1
            t=1
        else:
            j=j+1
            t=1
        if m==len(l)-1:
            lis.append((k,j))
    else:
        if t==1:
            t=0
            lis.append((k,j))

Another solution, using itertools.takewhile:
from itertools import takewhile
L = [1, 0, -2, 0, 0, 4, 5, 0]
res = []
i = 0
while i < len(L):
    if L[i] == 0:
        t = len(list(takewhile(lambda k: k == 0, L[i:])))
        res.append((i, t))
        i += t
    else:
        i += 1
print(res)
The line
t = len(list(takewhile(lambda k: k == 0, L[i:])))
counts the number of zeroes there are from the current position to the right.
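While clear enough, the disadvantage of this solution is that it needs the whole list in memory before processing it (it cannot work on an arbitrary iterator), and each slice L[i:] copies the rest of the list.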
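If that limitation matters, a single-pass generator avoids both the slicing and the need for the full list. The following is a sketch only; runs_of and its target parameter are hypothetical names, not part of the answer above.

```python
def runs_of(iterable, target=0):
    """Yield (start_index, run_length) for each run of `target`."""
    start = None  # index where the current run began, or None if not in a run
    index = -1    # stays -1 if the iterable is empty
    for index, value in enumerate(iterable):
        if value == target:
            if start is None:
                start = index
        elif start is not None:
            # A different value closes the current run.
            yield (start, index - start)
            start = None
    # A run that reaches the end of the input is still open here.
    if start is not None:
        yield (start, index + 1 - start)

# Works on any iterable, including a one-shot iterator.
print(list(runs_of(iter([1, 0, -2, 0, 0, 4, 5, 0]))))  # [(1, 1), (3, 2), (7, 1)]
```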

Related

Struggling to write a function that finds 'clusters' of the same data in a list

I am struggling to write a Python function that finds the indices of 'clusters' of the same data within a list. I want it to return a dictionary with keys as the repeating data and values as a list containing the start and end index of each cluster. NOTE: If there are multiple clusters with the same data, I would like a 2D list as the value for that key. To give an example, say I have the list [1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 3, 3, 3]. My function find_clusters(x) should take the list as input and return the following dictionary: {1: [[0, 5], [9, 11]], 2: [5, 9], 3: [11, 14]}
Before worrying about multiple clusters of the same data, I tried to code a function that could handle single clusters, but it's stuck in an infinite loop:
def find_clusters(x):
    cluster_dict = {}
    start_ind = 0
    end_ind = 0
    while end_ind < len(x):
        start_ind = end_ind
        current_data = x[start_ind]
        while x[end_ind] == current_data:
            if end_ind + 1 == len(x):
                break
            else:
                end_ind += 1
        cluster_dict[current_data] = [start_ind, end_ind]
    return cluster_dict
There's no need for (nested) while loops here - you just need to iterate over the list once, keep track of the last value you've seen, and mark the end of a cluster whenever you see a different value. Since you want to return a dict of lists, you can use a defaultdict to store the indices.
from collections import defaultdict
def find_clusters(xs: list):
    clusters = defaultdict(list)
    current_value = xs[0]
    start_idx = 0
    for i, value in enumerate(xs):
        if value != current_value:
            clusters[current_value].append((start_idx, i))
            current_value = value
            start_idx = i
    # Handle final cluster after the loop completes
    clusters[current_value].append((start_idx, len(xs)))
    return clusters
>>> xs = [1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 3, 3, 3]
>>> find_clusters(xs)
defaultdict(<class 'list'>, {1: [(0, 5), (9, 11)], 2: [(5, 9)], 3: [(11, 14)]})
Side note: In your version, the second index of each cluster corresponds to the first occurrence of a different value, not the last occurrence of the current value. This is useful for slicing, but I find it more intuitive to store the last occurrence of the current value and then use xs[start_idx:end_idx+1] when slicing:
defaultdict(<class 'list'>, {1: [(0, 4), (9, 10)], 2: [(5, 8)], 3: [(11, 13)]})
To achieve this, just append (start_idx, i-1) instead of (start_idx, i) (and len(xs)-1 instead of len(xs) at the end).
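For reference, here is a sketch of that inclusive variant with both changes applied. The function name find_clusters_inclusive and the empty-input guard are additions made for illustration.

```python
from collections import defaultdict

def find_clusters_inclusive(xs):
    """Map each value to a list of (first_index, last_index) pairs, both inclusive."""
    clusters = defaultdict(list)
    if not xs:  # assumption: an empty input yields an empty mapping
        return clusters
    current_value = xs[0]
    start_idx = 0
    for i, value in enumerate(xs):
        if value != current_value:
            # i is the first index of the next cluster, so the last one ends at i - 1
            clusters[current_value].append((start_idx, i - 1))
            current_value = value
            start_idx = i
    # Handle the final cluster after the loop completes
    clusters[current_value].append((start_idx, len(xs) - 1))
    return clusters

xs = [1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 3, 3, 3]
clusters = find_clusters_inclusive(xs)
print(dict(clusters))  # {1: [(0, 4), (9, 10)], 2: [(5, 8)], 3: [(11, 13)]}
start, end = clusters[2][0]
print(xs[start:end + 1])  # [2, 2, 2, 2]
```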
The infinite loop happens because, when end_ind + 1 == len(x) is True, you break without incrementing end_ind, so the outer condition end_ind < len(x) never becomes false. Try:
from collections import defaultdict
def find_clusters(x):
    cluster_dict = defaultdict(list)
    start_ind = 0
    end_ind = 0
    while end_ind < len(x):
        start_ind = end_ind
        while x[end_ind] == x[start_ind]:
            # note that we always increment end_ind here
            end_ind += 1
            if end_ind == len(x):
                break
        cluster_dict[x[start_ind]].append([start_ind, end_ind])
    return cluster_dict
arr = [1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 3, 3, 3]
find_clusters(arr)
Output :
defaultdict(list, {1: [[0, 5], [9, 11]], 2: [[5, 9]], 3: [[11, 14]]})

More elegant way to find a range of repeating elements

I have this problem.
Let l be a list containing only 0s and 1s; find all tuples that represent the start and end of each run of consecutive 1s.
Example:
l=[1,1,0,0,0,1,1,1,0,1]
Answer:
[(0,2),(5,8),(9,10)]
I solved the problem with the following code, but I think it is pretty messy. I would like to know if there is a cleaner way to solve this problem (maybe using map/reduce?).
from collections import deque
def find_range(l):
    pairs=deque((i,i+1) for i,e in enumerate(l) if e==1)
    ans=[]
    p=[0,0]
    while(len(pairs)>1):
        act=pairs.popleft()
        nex=pairs[0]
        if p==[0,0]:
            p=list(act)
        if act[1]==nex[0]:
            p[1]=nex[1]
        else:
            ans.append(tuple(p))
            p=[0,0]
    if(len(pairs)==1):
        if p==[0,0]:
            ans.append(pairs.pop())
        else:
            ans.append((p[0],pairs.pop()[1]))
    return ans
With itertools.groupby magic:
from itertools import groupby
lst = [1, 1, 0, 0, 0, 1, 1, 1, 0, 1]
indices, res = range(len(lst)), []
for k, group in groupby(indices, key=lambda i: lst[i]):
    if k == 1:
        group = list(group)
        sl = group[0], group[-1] + 1
        res.append(sl)
print(res)
The output:
[(0, 2), (5, 8), (9, 10)]
Or with a more efficient generator function:
def get_ones_coords(lst):
    indices = range(len(lst))
    for k, group in groupby(indices, key=lambda i: lst[i]):
        if k == 1:
            group = list(group)
            yield group[0], group[-1] + 1
lst = [1, 1, 0, 0, 0, 1, 1, 1, 0, 1]
print(list(get_ones_coords(lst))) # [(0, 2), (5, 8), (9, 10)]
As a short bonus, here's an alternative numpy approach, somewhat more sophisticated, based on the discrete difference between consecutive numbers (numpy.diff) and extracting the indices of non-zero items (numpy.flatnonzero):
In [137]: lst = [1,1,0,0,0,1,1,1,0,1]
In [138]: arr = np.array(lst)
In [139]: np.flatnonzero(np.diff(np.r_[0, arr, 0]) != 0).reshape(-1, 2)
Out[139]:
array([[ 0,  2],
       [ 5,  8],
       [ 9, 10]])
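The same boundary idea can be sketched without NumPy: pad the list with a 0 on each side and record every position where two neighbours differ. The helper name ones_ranges is hypothetical, and the sketch assumes the input contains only 0s and 1s.

```python
def ones_ranges(bits):
    """Return half-open (start, stop) pairs for each run of 1s in a 0/1 list."""
    padded = [0] + list(bits) + [0]
    # Position i is a boundary when padded[i] != padded[i + 1]; boundaries
    # alternate between the start of a run and the index just past its end.
    boundaries = [i for i in range(len(padded) - 1) if padded[i] != padded[i + 1]]
    return list(zip(boundaries[0::2], boundaries[1::2]))

print(ones_ranges([1, 1, 0, 0, 0, 1, 1, 1, 0, 1]))  # [(0, 2), (5, 8), (9, 10)]
```

The padding guarantees that every run has both an opening and a closing boundary, which is why the boundaries can be paired up two at a time.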
Code:
a = [[l.index(1)]]
[l[i] and len(a[-1])==2 and a.append([i]) or l[i] or len(a[-1])==1 and a[-1].append(i) for i in range(len(l))]
Output:
[[0, 2], [5, 8], [9]]
Code:
l=[1,1,0,0,0,1,1,1,0,1]
indices = [ind for ind, elem in enumerate(l) if elem == 1]
diff = [0]+[x - indices[i - 1] for i, x in enumerate(indices)][1:]
change_ind = [0]+[i for i, change in enumerate(diff) if change > 1]+[len(indices)]
split_indices = [tuple(indices[i:j]) for i,j in zip(change_ind,change_ind[1:])]
proper_tuples = [(tup[0], tup[-1]) if len(tup)>2 else tup for tup in split_indices]
print(proper_tuples)
Logic:
indices is the list of indices where l elements = 1 => [0, 1, 5, 6, 7, 9]
diff calculates the difference between the indices found above and appends a 0 at the start to keep their lengths the same => [0, 1, 4, 1, 1, 2]
change_ind indicates the locations where a split needs to happen which corresponds to where diff is greater than 1. Also append the first index and last index for later use or else you will only have the middle tuple => [0, 2, 5, 6]
split_indices creates tuples based on the range indicated in consecutive elements in change_ind (using zip which creates the combination of ranges) => [(0, 1), (5, 6, 7), (9,)]
Lastly, proper_tuples loops through the tuples created in split_indices and ensures that if their length is greater than 2, only the first and last elements are kept; otherwise the tuple is kept as is => [(0, 1), (5, 7), (9,)]
Output:
[(0, 1), (5, 7), (9,)]
Final Comments:
Although this does not match what OP suggested in the original question:
[(0,2),(5,8),(9,10)]
It does make more logical sense and seems to follow what OP indicated in the comments.
For example, at the start of l there are two ones - so the tuple should be (0, 1) not (0, 2) to match the proposed (start, end) notation.
Likewise at the end there is only a single one - so the tuple corresponding to this is (9,) not (9, 10)
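The two notations differ only in the end index, so one can be derived from the other. A minimal sketch, with a hypothetical helper name; note that it writes a single-element run as (9, 9) rather than the (9,) used above:

```python
def to_inclusive(ranges):
    """Convert half-open (start, stop) pairs to inclusive (first, last) pairs."""
    return [(start, stop - 1) for start, stop in ranges]

half_open = [(0, 2), (5, 8), (9, 10)]
print(to_inclusive(half_open))  # [(0, 1), (5, 7), (9, 9)]
```

The half-open form can be passed straight to a slice (l[0:2]), while the inclusive form names the positions of the first and last 1.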

Python - Finds the index of the smallest element in the list A from index k onwards

I am stuck on how to take "k" into account to solve the following problem. Basically, the search should start at index k and look for the lowest value in the range from k until the end of the list.
def find_min_index(A, k):
    """
    Finds the index of the smallest element in the list A from index k onwards
    Parameters:
    A (list)
    k: index from which start search
    Example use:
    >>> find_min_index([1, 2, 5, -1], 0)
    3
    >>> find_min_index([1, 1, 1, 5, 9], 2)
    2
    """
    minpos = A.index(min(A))
    return minpos
A one-liner solution is this:
    return A[k:].index(min(A[k:])) + k
You select the minimal element from A[k:], find its index in A[k:], and add k to it to compensate for the offset of the search area.
A slightly neater solution, which slices only once and avoids shadowing the built-in slice, is this:
    sliced = A[k:]
    return sliced.index(min(sliced)) + k
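If copying the tail of the list is a concern, the minimum can also be taken over the indices themselves. This is a sketch, not part of the answer above, and it assumes 0 <= k < len(A):

```python
def find_min_index(A, k):
    """Index of the smallest element of A at position k or later."""
    # Assumption: 0 <= k < len(A); otherwise the range is empty and min() raises ValueError.
    # min() compares the values A[i] but returns the index i itself.
    return min(range(k, len(A)), key=A.__getitem__)

print(find_min_index([1, 2, 5, -1], 0))    # 3
print(find_min_index([1, 1, 1, 5, 9], 2))  # 2
```

When several positions hold the same minimal value, min returns the first one it encounters, so the earliest index from k onwards wins, which matches the behaviour of list.index.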
You can use enumerate to keep track of the original index before you slice the list with k as the starting index:
from operator import itemgetter
def find_min_index(A, k):
    return min(list(enumerate(A))[k:], key=itemgetter(1))[0]
so that:
print(find_min_index([1, 2, 5, -1], 0))
print(find_min_index([1, 1, 1, 5, 9], 2))
would output:
3
2
You could use enumerate to find the index of min:
def find_min_index(A, k):
    """
    Finds the index of the smallest element in the list A from index k onwards
    Parameters:
    A (list)
    k: index from which start search
    Example use:
    >>> find_min_index([1, 2, 5, -1], 0)
    3
    >>> find_min_index([1, 1, 1, 5, 9], 2)
    2
    """
    o, _ = min(enumerate(A[k:]), key=lambda i: i[1])
    minpos = k + o
    return minpos
print(find_min_index([1, 2, 3, 4], 1))
print(find_min_index([4, 3, 2, 1], 1))
Output
1
3
You can add k to the index calculated from a sliced input list:
def find_min_index(A, k):
    sliced = A[k:]
    return k + sliced.index(min(sliced))
find_min_index([1, 2, 5, -1], 2) # 3
find_min_index([1, 1, 1, 5, 9], 2) # 2

Replace when n or less consecutive values are found

This is perhaps a very simple question.
I have a list that looks like this:
a=[0,1,1,2,3,2,1,2,0,3,4,1,1,1,1,0,0,0,0,4,5,1,1,1,3,2,0,2,1,1,3,4,1]
I am struggling to find simple Python code that replaces every run of n or fewer consecutive 1s with 0s and creates a new list with the new values.
So if
n = 2
b = [0,0,0,2,3,2,0,2,0,3,4,1,1,1,1,0,0,0,0,4,5,1,1,1,3,2,0,2,0,0,3,4,0]
and if
n = 3
b = [0,0,0,2,3,2,0,2,0,3,4,1,1,1,1,0,0,0,0,4,5,0,0,0,3,2,0,2,0,0,3,4,0]
In each example, the replaced values are the former 1s that belonged to runs of at most n.
You can try this:
import itertools
a=[0,1,1,2,3,2,1,2,0,3,4,1,1,1,1,0,0,0,0,4,5,1,1,1,3,2,0,2,1,1,3,4,1]
n = 3
new_list = list(itertools.chain(*[[0]*len(b) if a == 1 and len(b) <= n else b for a, b in [(c, list(d)) for c, d in itertools.groupby(a)]]))
Output:
[0, 0, 0, 2, 3, 2, 0, 2, 0, 3, 4, 1, 1, 1, 1, 0, 0, 0, 0, 4, 5, 0, 0, 0, 3, 2, 0, 2, 0, 0, 3, 4, 0]
"One"-liner, using some itertools:
from itertools import groupby, chain
a=[0,1,1,2,3,2,1,2,0,3,4,1,1,1,1,0,0,0,0,4,5,1,1,1,3,2,0,2,1,1,3,4,1]
n = 2
list(
    chain.from_iterable(
        ([0] * len(lst) if x == 1 and len(lst) <= n else lst
         for x, lst in ((k, list(g)) for k, g in groupby(a)))
    )
)
# [0,0,0,2,3,2,0,2,0,3,4,1,1,1,1,0,0,0,0,4,5,1,1,1,3,2,0,2,0,0,3,4,0]
groupby groups the initial list into groups of identical consecutive objects. Its output is an iterator of pairs (k, g), where k is the element that is the grouping key and g is an iterator producing the actual elements in the group.
Since you cannot call len on an iterator, this listifies the groups and chains the resulting lists, except that lists of 1s of the appropriate lengths are replaced by lists of 0s of the same length.
In single steps (using intermediate lists instead of generators):
grouped_lists_by_key = [(k, list(g)) for k, g in groupby(a)]
# [(0, [0]), (1, [1, 1]), ...]
grouped_lists = [[0] * len(lst) if x == 1 and len(lst) <= n else lst for x, lst in grouped_lists_by_key]
# [[0], [0, 0], [2], [3], ...]
flattened = list(chain.from_iterable(grouped_lists))
# [0, 0, 0, 2, 3, ...]
Non-oneliner using itertools.groupby():
from itertools import groupby
a = [0,1,1,2,3,2,1,2,0,3,4,1,1,1,1,0,0,0,0,4,5,1,1,1,3,2,0,2,1,1,3,4,1]
n = 2
b = []
for k, g in groupby(a):
    l = list(g)
    if k == 1 and len(l) <= n:
        b.extend([0]*len(l))
    else:
        b.extend(l)
print(b)
Try this:
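def replacer(array, n):
    # note: modifies array in place and also returns it
    i, consec = 0, 0
    # go one step past the end so that a trailing run of 1s is handled too
    while i <= len(array):
        if i < len(array) and array[i] == 1:
            consec += 1
        else:
            # a run of 1s just ended; zero it out if it has n or fewer items
            if 0 < consec <= n:
                for x in range(i-consec, i):
                    array[x] = 0
            consec = 0
        i += 1
    return array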
Longer than others, but arguably straightforward:
a = [0,1,1,2,3,2,1,2,0,3,4,1,1,1,1,0,0,0,0,4,5,1,1,1,3,2,0,2,1,1,3,4,1]
def suppress_consecutive_generator(consecutive=2, toreplace=1, replacement=0):
    def gen(l):
        length = len(l)
        i = 0
        while i < length:
            if l[i] != toreplace:
                yield l[i]
                i += 1
                continue
            j = i
            count = 0
            while j < length:
                if l[j] != toreplace:
                    break
                count += 1
                j += 1
            i += count
            if count <= consecutive:
                for _ in range(count):
                    yield replacement
            else:
                for _ in range(count):
                    yield toreplace
    return gen
print(list(suppress_consecutive_generator()(a)))

How do you calculate the greatest number of repetitions in a list?

If I have a list in Python like
[1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1]
How do I calculate the greatest number of repeats for any element? In this case 2 is repeated a maximum of 4 times and 1 is repeated a maximum of 3 times.
Is there a way to do this but also record the index at which the longest run began?
Use groupby; it groups consecutive elements by value:
from itertools import groupby
group = groupby([1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1])
print(max(group, key=lambda k: len(list(k[1]))))
And here is the code in action:
>>> group = groupby([1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1])
>>> print(max(group, key=lambda k: len(list(k[1]))))
(2, <itertools._grouper object at 0xb779f1cc>)
>>> group = groupby([1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1, 3, 3, 3, 3, 3])
>>> print(max(group, key=lambda k: len(list(k[1]))))
(3, <itertools._grouper object at 0xb7df95ec>)
From python documentation:
The operation of groupby() is similar to the uniq filter in Unix. It generates a break or new group every time the value of the key function changes
# [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
# [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
If you also want the index of the longest run you can do the following:
group = groupby([1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1, 3, 3, 3, 3, 3])
result = []
index = 0
for k, g in group:
    length = len(list(g))
    # (value, run length, index where the run starts)
    result.append((k, length, index))
    index += length
print(max(result, key=lambda a:a[1]))
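The question also asks for the longest run of each value, not only the overall winner. A possible sketch along the same lines follows; the function name longest_runs and the choice to keep the earliest run on ties are assumptions made for illustration.

```python
from itertools import groupby

def longest_runs(seq):
    """Map each value to (length, start_index) of its longest consecutive run."""
    best = {}
    index = 0
    for value, group in groupby(seq):
        length = sum(1 for _ in group)
        # Strict '>' keeps the earliest run when two runs tie in length.
        if value not in best or length > best[value][0]:
            best[value] = (length, index)
        index += length
    return best

print(longest_runs([1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1]))
# {1: (3, 5), 2: (4, 1)}
```

For the list from the question this reports that 1 is repeated at most 3 times starting at index 5, and 2 at most 4 times starting at index 1.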
Loop through the list, keeping track of the current number and how many times it has been repeated, and compare that to the most times you've seen that number repeated.
Counts={}
Current=0
Current_Count=0
LIST = [1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1]
for i in LIST:
    if Current == i:
        Current_Count += 1
    else:
        Current_Count=1
        Current=i
    # .get avoids a KeyError the first time a value is seen
    if Current_Count>Counts.get(i, 0):
        Counts[i]=Current_Count
print(Counts)
If you want it for just any element (i.e. the element with the most repetitions), you could use:
from functools import reduce
def f(acc, x):
    # acc = (previous value, current run length, longest run so far)
    v, n, m = acc
    nl = n+1 if x==v else 1
    return (x, nl, max(m,nl))
maxrep = reduce(f, l, (0,0,0))[2]
This only counts continuous repetitions (the result for [1,2,2,2,1,2] would be 3) and only records the maximum count, not the element it belongs to.
This is my solution:
def longest_repetition(l):
    if l == []:
        return None
    element = l[0]
    new = []
    lar = []
    for e in l:
        if e == element:
            new.append(e)
        else:
            if len(new) > len(lar):
                lar = new
            new = []
            new.append(e)
            element = e
    if len(new) > len(lar):
        lar = new
    return lar[0]
- You can make a new copy of the list containing only the unique values, together with a corresponding list of hits.
- Then take the max of the hits list and use its index to get your most repeated item.
oldlist = ["A", "B", "E", "C","A", "C","D","A", "E"]
newlist=[]
hits=[]
for i in range(len(oldlist)):
    if oldlist[i] in newlist:
        hits[newlist.index(oldlist[i])]+= 1
    else:
        newlist.append(oldlist[i])
        hits.append(1)
#find the most repeated item
temp_max_hits=max(hits)
temp_max_hits_index=hits.index(temp_max_hits)
print(newlist[temp_max_hits_index])
print(temp_max_hits)
But I don't know whether this is the fastest way to do it or whether there are faster solutions.
If you think there is a faster or more efficient solution, kindly let us know.
I'd use a hashmap of item to counter.
Every time you see a 'key' succession, increment its counter value. If you hit a new element, set the counter to 1 and keep going. At the end of this linear search, you should have the maximum succession count for each number.
This code seems to work:
l = [1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1]
previous = None
# value/repetition pair
greatest = (-1, -1)
# start at 0 so the placeholder None is never kept as the best run
reps = 0
for e in l:
    if e == previous:
        reps += 1
    else:
        if reps > greatest[1]:
            greatest = (previous, reps)
        previous = e
        reps = 1
if reps > greatest[1]:
    greatest = (previous, reps)
print(greatest)
I wrote this code and it works easily:
lst = [4,7,2,7,7,7,3,12,57]
maximum=0
for i in lst:
    count = lst.count(i)
    if count>maximum:
        maximum=count
        indexx = lst.index(i)
print(lst[indexx])
