Find indexes of repeated elements in an array (Python, NumPy)

Assume I have a NumPy array of integers, such as:
[34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
I want to find the start and end indices of every stretch of the array where a value is repeated consecutively more than x times (say, 5 times). In the case above, those values are 22 and 6. The start index of the repeated 22 is 3 and the end index is 8. The same applies to the repeated 6.
Is there a special tool in Python that helps with this?
Otherwise, I would loop through the array index by index and compare each value with the previous one.
Regards.

Using np.diff and the method given here by @WarrenWeckesser for finding runs of zeros in an array:
import numpy as np
def zero_runs(a): # from link
    # Pad the zero mask with a 0 on both sides so that runs touching either end are detected
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    # Every run produces one rising and one falling edge, giving rows of (start, end)
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges
a = [34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
zero_runs(np.diff(a))
Out[87]:
array([[ 3,  8],
       [15, 22]], dtype=int32)
This can then be filtered on the difference between the start & end of the run:
runs = zero_runs(np.diff(a))
runs[runs[:, 1]-runs[:, 0]>5] # runs of 7 or more, to illustrate filter
Out[96]: array([[15, 22]], dtype=int32)
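As a hedged variant of the same idea, the sketch below finds the run boundaries by comparing neighbouring elements directly instead of going through np.diff and a zero mask. The function name find_runs and the min_len parameter are assumptions made for this illustration; the end indices are inclusive, as in the question.

```python
import numpy as np

def find_runs(a, min_len):
    """Return an array of [start, end] rows (end inclusive) for runs of at least min_len equal values."""
    a = np.asarray(a)
    if a.size == 0:
        return np.empty((0, 2), dtype=int)
    # Indices at which the value differs from its predecessor: each one starts a new run
    change = np.flatnonzero(a[1:] != a[:-1]) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [a.size])) - 1
    keep = (ends - starts + 1) >= min_len
    return np.column_stack((starts[keep], ends[keep]))

a = [34, 2, 3, 22, 22, 22, 22, 22, 22, 18, 90, 5, -55, -19, 22,
     6, 6, 6, 6, 6, 6, 6, 6, 23, 53, 1, 5, -42, 82]
print(find_runs(a, 5).tolist())  # [[3, 8], [15, 22]]
```

Because the run length is computed explicitly, the threshold is expressed as a minimum length rather than as a difference between start and end.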

Here is a solution using Python's native itertools.
Code
import itertools as it
def find_ranges(lst, n=2):
    """Return ranges for `n` or more repeated values."""
    groups = ((k, tuple(g)) for k, g in it.groupby(enumerate(lst), lambda x: x[-1]))
    repeated = (idx_g for k, idx_g in groups if len(idx_g) >= n)
    return ((sub[0][0], sub[-1][0]) for sub in repeated)
lst = [34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
list(find_ranges(lst, 5))
# [(3, 8), (15, 22)]
Tests
def test_ranges(f):
    """Verify list results identifying ranges."""
    assert list(f([])) == []
    assert list(f([0, 1,1,1,1,1,1, 2], 5)) == [(1, 6)]
    assert list(f([1,1,1,1,1,1, 2,2, 1, 3, 1,1,1,1,1,1], 5)) == [(0, 5), (10, 15)]
    assert list(f([1,1, 2, 1,1,1,1, 2, 1,1,1], 3)) == [(3, 6), (8, 10)]
    assert list(f([1,1,1,1, 2, 1,1,1, 2, 1,1,1,1], 3)) == [(0, 3), (5, 7), (9, 12)]
test_ranges(find_ranges)
This example captures (index, element) pairs in lst, and then groups consecutive pairs by element. Only groups with at least n pairs are retained. Finally, the first and last pairs of each retained group are indexed, yielding (start, end) indices for each repeated group.
See also this post for finding ranges of indices using itertools.groupby.
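To make the intermediate step concrete, here is a small sketch that materializes the groups for a short list. The sample list and the names groups and ranges are chosen for this illustration only.

```python
import itertools as it

lst = [7, 7, 7, 4, 4]

# groupby only merges *consecutive* pairs whose element (pair[1]) is equal
groups = [(k, list(g)) for k, g in it.groupby(enumerate(lst), key=lambda pair: pair[1])]
print(groups)
# [(7, [(0, 7), (1, 7), (2, 7)]), (4, [(3, 4), (4, 4)])]

# Keep groups with at least 3 pairs and read the index from the first and last pair
ranges = [(g[0][0], g[-1][0]) for _, g in groups if len(g) >= 3]
print(ranges)
# [(0, 2)]
```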

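There really isn't a great shortcut for this. Assuming val_list is a plain Python list, you can do something like:
mult = 5
found_at = None
for i in range(len(val_list) - mult + 1):
    # Compare a window of mult items against mult copies of its first item
    if val_list[i:i + mult] == [val_list[i]] * mult:
        found_at = i
        break
I leave the not-found case (found_at stays None) and longer sequence detection to you.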

If you're looking for a value repeated n times in a list L, you could do something like this:
def find_repeat(value, n, L):
    look_for = [value for _ in range(n)]
    for i in range(len(L)):
        if L[i] == value and L[i:i+n] == look_for:
            return i, i+n

Here is a relatively quick solution which also tells you how many copies were in the run. Some of this code was borrowed from KAL's solution.
# Return the start and (1-past-the-end) indices of the first instance of
# at least min_count copies of element value in container l
def find_repeat(value, min_count, l):
    for i in range(len(l)):
        count = 0
        # The bounds check keeps the scan from running past the end of l
        while i + count < len(l) and l[i + count] == value:
            count += 1
        if count >= min_count:
            return i, i + count

I had a similar requirement. This is what I came up with, using NumPy and a single comprehension:
import numpy as np
A=[34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
Find the unique values and the index of the first occurrence of each:
_, ind = np.unique(A,return_index=True)
np.unique returns the values sorted, so sort the indices to get them back in the original order:
ind = np.sort(ind)
ind now contains the index of the first element of each repeating group; a group is visible as a gap between consecutive indices.
The diff of ind gives the number of elements in a group. Filtering with np.diff(ind)>5 gives a boolean array with True at the starting indices of the groups, and the end index of each group is one less than the entry of ind that follows each True. This assumes that values seen earlier do not reappear inside a later stretch, and a group at the very end of the array has no following index, so it is not reported.
Create a dict with the repeating element as the key and a tuple of the start and end indices of that group as the value:
rep_groups = dict((A[ind[i]], (ind[i], ind[i+1]-1)) for i,v in enumerate(np.diff(ind)>5) if v)

Related

How to create a list of all values within a certain range from each other?

I have a list of tuples:
x = [(2, 10), (4, 5), (8, 10), (9, 11), (10, 15)]
I'm trying to compare the first values of all the tuples to see if they are within 1 of each other. If they are, I want to aggregate (sum) the second values of those tuples and take the mean of the first values.
The output list would look like this:
[(2, 10), (4, 5), (9, 36)]
Notice that 8 and 10 differ by 2, but they are both only 1 away from 9, so all three get aggregated.
I have been trying something along these lines, but it's not capturing sequences of values like 8, 9, and 10. It's also still preserving the original values, even if they've been aggregated together.
import numpy as np
tuple_list = [(2, 10), (4, 5), (8, 10), (9, 11), (10, 15)]
output_list = []
for x1,y1 in tuple_list:
    for x2,y2 in tuple_list:
        if x1==x2:
            continue
        if np.abs(x1-x2) <= 1:
            output_list.append((np.mean([x1,x2]), y1+y2))
        else:
            output_list.append((x1,y1))
output_list = list(set(output_list))

You can do it in a list comprehension using groupby (from itertools). The grouping key will be the difference between the first value and the tuple's index in the list. When the values are 1 apart, this difference will be constant and the tuples will be part of the same group.
For example: [2, 4, 8, 9, 10] minus their indexes [0, 1, 2, 3, 4] will give [2, 3, 6, 6, 6] forming 3 groups: [2], [4] and [8 ,9, 10].
from itertools import groupby
x = [(2, 10), (4, 5), (8, 10), (9, 11), (10, 15)]
y = [ (sum(k)/len(k),sum(v))                           # output tuple
      for i in [enumerate(x)]                          # sequence iterator
      for _,g in groupby(x,lambda t:t[0]-next(i)[0])   # group by sequence
      for k,v in [list(zip(*g))] ]                     # lists of keys & values
print(y)
[(2.0, 10), (4.0, 5), (9.0, 36)]
The for k,v in [list(zip(*g))] part is a bit tricky, but what it does is transform the tuples in a group into two tuples (k and v), with k containing the first item of each tuple and v containing the second items.
E.g. if g is ((8,10),(9,11),(10,15)), then k will be (8,9,10) and v will be (10,11,15).
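The same value-minus-index key can be written as an ordinary loop, which may be easier to follow than the nested comprehension. This is a sketch under the assumption that the first values are integers; the function name merge_consecutive is made up for the illustration, and the input is sorted first so that the index trick also works on unordered lists.

```python
from itertools import groupby

def merge_consecutive(pairs):
    """Merge tuples whose first values form a run of consecutive integers."""
    result = []
    indexed = enumerate(sorted(pairs))
    # For consecutive integers, (first value - position) stays constant within a run
    for _, group in groupby(indexed, key=lambda item: item[1][0] - item[0]):
        firsts, seconds = zip(*(pair for _, pair in group))
        result.append((sum(firsts) / len(firsts), sum(seconds)))
    return result

x = [(2, 10), (4, 5), (8, 10), (9, 11), (10, 15)]
print(merge_consecutive(x))
# [(2.0, 10), (4.0, 5), (9.0, 36)]
```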

By sorting the list first, and then using itertools.pairwise to iterate over each previous and current day, this problem becomes much easier. On sequential days, instead of adding a new item to our final list, we modify the last item added to it. Figuring out the new sum is easy enough, and figuring out the new average is also easy because we're averaging sequential numbers: it is the current day minus half the distance back to the first day of the run. We just need to keep track of how many sequential days have passed.
import itertools
def on_neighboring_days_sum_occurrances(tuple_list):
    tuple_list.sort()
    ret = []
    sequential_days = 1
    # We add the first item now
    # And then when we start looping we begin looping on the second item
    # This way the loop will always be able to modify ret[-1]
    ret.append(tuple_list[0])
    # Python 3.10+ only, in older versions do
    # for prev, current in zip(tuple_list, tuple_list[1:]):
    for prev, current in itertools.pairwise(tuple_list):
        day = current[0]
        prev_day = prev[0]
        is_sequential_day = day - prev_day <= 1
        if is_sequential_day:
            sequential_days += 1
            # Mean of a run of consecutive days ending at day
            avg_day = day - (sequential_days - 1) / 2
            summed_vals = ret[-1][1] + current[1]
            ret[-1] = (avg_day, summed_vals)
        else:
            sequential_days = 1
            ret.append(current)
    return ret

You can sort the list and walk through it with two indices. The index i marks the first tuple of the current run, and j moves forward for as long as the difference between the first elements equals the difference between the indices, which means the first elements keep increasing by exactly 1. While the run continues, sum up the first as well as the second elements. When the condition breaks (or the list ends), divide the sum of the first elements by the length of the run to get their average and append the result to the result list. To make sure the program doesn't consider the same tuples again, jump to the index where the condition broke. Like this:
x = [(2, 10), (4, 5), (8, 10), (9, 11), (10, 15)]
x.sort()
res, i = [], 0
while i < len(x):
    sum1, sum2 = x[i][0], x[i][1]
    j = i + 1
    # Extend the run while the first elements keep increasing by exactly 1
    while j < len(x) and x[j][0] - x[i][0] == j - i:
        sum1 += x[j][0]
        sum2 += x[j][1]
        j += 1
    res.append((sum1 / (j - i), sum2))
    i = j
print(res)
Here the outer while loop visits the start of every run, including a run that consists of a single tuple. sum1 and sum2 keep track of the first and second elements of the run respectively and are initialized with the elements of the tuple at i. The inner while loop iterates from the tuple after the current one and adds its elements for as long as the condition is met. When it stops, j is either the index of the tuple that broke the condition or len(x). The tuple (sum1 / (j - i), sum2) is then appended to res, and i = j skips to the tuple that broke the condition.

Complex heapq lazy merge (minimize space used)

Given a list of sorted lists, I wish to produce a sorted list of output.
This would be easy:
nums: List[List[int]]
h = heapq.merge(*nums)
However, I also wish to tag each element of the output with the index of the inner list from which it originated. E.g.
nums = [[1,3,5], [2,6], [4]]
h = _ # ???
for x in h:
    print(x)
# Outputs
# (1,0)
# (2,1)
# (3,0)
# (4,2)
# (5,0)
# (6,1)
I have written a version that works,
h = heapq.merge(*map(lambda l: map(lambda x: (x,l[0]), l[1]), enumerate(nums)))
But I'm afraid I might have lost a desirable space-complexity guarantee; how can I know whether the (transformed) inner lists are being manifested or not? (and what exactly does the * do in my attempt?)

Since default tuple comparison is lexicographic, you can convert the innermost-element x to (x, i) where i is the index of the list containing x in nums. For example,
import heapq
from itertools import repeat
nums = [[1,3,5], [2,6], [4]]
nums_with_inds = (zip(lst, repeat(i)) for i, lst in enumerate(nums))
res = heapq.merge(*nums_with_inds)
for tup in res:
    print(tup)
# (1, 0)
# (2, 1)
# (3, 0)
# (4, 2)
# (5, 0)
# (6, 1)
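On the space question: zip, map and generator expressions are lazy, and the * only unpacks the outer iterable, so it creates one small iterator object per inner list without copying any inner list. The sketch below tries to make that observable; the tagged helper and the consumed log are hypothetical names added for the demonstration, and the exact consumption pattern shown is that of CPython's heapq.merge, which holds one pending item per input.

```python
import heapq

def tagged(lst, index, log):
    """Yield (value, index) pairs lazily, recording each value as it is pulled."""
    for value in lst:
        log.append(value)
        yield (value, index)

nums = [[1, 3, 5], [2, 6], [4]]
consumed = []

# The * turns the three generators into three positional arguments;
# nothing is read from the inner lists until merge asks for it
merged = heapq.merge(*(tagged(lst, i, consumed) for i, lst in enumerate(nums)))

first = next(merged)
consumed_after_first = list(consumed)
print(first)                 # (1, 0)
print(consumed_after_first)  # [1, 2, 4]: one pending item per inner list

rest = list(merged)
print(rest)                  # [(2, 1), (3, 0), (4, 2), (5, 0), (6, 1)]
```

The extra memory therefore grows with the number of inner lists rather than with the total number of elements.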

Order dictionary with x and y coordinates in python

I have this problem.
I need to order these points, numbered 1-7:
1(4,2), 2(3, 5), 3(1,4), 4(1,1), 5(2,2), 6(1,3), 7(1,5)
and get this result:
4, 6, 3, 5, 2, 1, 7.
I am using a Python script that sorts with x as the reference, and that works, but the sort in y is wrong.
I have tried sorted(dicts,key=itemgetter(1,2)).
Can someone help me, please?

Try this:
sorted(dicts,key=itemgetter(1,0))
Indexing in Python starts at 0. itemgetter(1,0) sorts by the second element and then by the first element.

This sorts the points by the first coordinate of the tuple, and then sub-orders them by the second coordinate. I.e., like alphabetical order, where "Aa" comes first, then "Ab", then "Ba", then "Bb". More literally: (1,1), (1,2), (2,1), (2,2), etc.
This will work IF (and only if) the tuple value pair associated with #7 is actually out of order in your question (and should actually be between #3 and #5).
If this is NOT the case, see my other answer.
# Make it a dictionary, with the VALUE TUPLES as the KEYS, and the designator as the value
d = {(1,1):4, (1,3):6, (1,4):3, (2,2):5, (3,5):2, (4,2):1,(1,5):7}
# ALSO make a list of just the value tuples
l = [ (1,1), (1,3), (1,4), (2,2), (3,5), (4,2), (1,5)]
# Sort the list by the first element in each tuple, ignoring the second
new = sorted(l, key=lambda x: x[0])
# Create a new dictionary, basically for temp sorting
new_d = {}
# This iterates through the first sorted list "new"
# and creates a dictionary where the key is the first number of the value tuples
count = 0
# The extended range is because we don't know if any of the tuple values share any numbers
for r in range(0, len(new)+1,1):
    count += 1
    new_d[r] = []
    for item in new:
        if item[0] == r:
            new_d[r].append(item)
print(new_d) # So it makes sense
# Make a final list to capture the ordered TUPLE VALUES
final_list = []
# Go through the same range as above
for r in range(0, len(new)+1,1):
    _list = new_d[r] # Grab the list stored for this first coordinate. Order does not matter here
    if len(_list) > 0: # If the list has any values...
        # Sort that list now by the SECOND tuple value
        _list = sorted(_list, key=lambda x: x[1])
        # Lists are ordered. So we can now just tack that ordered list onto the final list.
        # The order remains
        for item in _list:
            final_list.append(item)
# This is all the tuple values in order
print(final_list)
# If you need them correlated to their original numbers
by_designator_num = []
for i in final_list: # Each tuple value, in sorted order
    by_designator_num.append(d[i]) # Use the tuple value as the key, to get the original designator number from the original "d" dictionary
print(by_designator_num)
OUTPUT:
[(1, 1), (1, 3), (1, 4), (1, 5), (2, 2), (3, 5), (4, 2)]
[4, 6, 3, 7, 5, 2, 1]
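For comparison, Python's built-in tuple comparison is already lexicographic (first coordinate, then second), so the same ordering can be sketched with a single sorted call. The names ordered_points and ordered_numbers are introduced for this illustration.

```python
d = {(1, 1): 4, (1, 3): 6, (1, 4): 3, (2, 2): 5, (3, 5): 2, (4, 2): 1, (1, 5): 7}

# Iterating a dict yields its keys; tuples compare element by element
ordered_points = sorted(d)
ordered_numbers = [d[p] for p in ordered_points]

print(ordered_points)
# [(1, 1), (1, 3), (1, 4), (1, 5), (2, 2), (3, 5), (4, 2)]
print(ordered_numbers)
# [4, 6, 3, 7, 5, 2, 1]
```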

Since you're searching visually from top to bottom, then left to right, this code is much simpler and provides the correct result. It basically does the equivalent of a visual scan, by collecting all tuples that are at each "y=n" position, and then sorting those "y=n" tuples by their x coordinate (left to right).
Just to be more consistent with the Cartesian coordinate system, I've converted the points on the graph to (x,y) coordinates, with x positive (increasing to the right) and y negative (decreasing as they go down).
d = {(2,-4):1, (5,-3):2, (4,-1):3, (1,-1):4, (2,-2):5, (3,-1):6, (1,-5):7}
l = [(2,-4), (5,-3), (4,-1), (1,-1), (2,-2), (3,-1), (1,-5)]
results = []
# Use the length of the list. It's more than needed, but guarantees enough loops
for y in range(0, -len(l), -1):
    # For ONLY the items found at the specified y coordinate
    temp_list = []
    for i in l: # Loop through ALL the items in the list
        if i[1] == y: # If tuple is at this "y" coordinate then...
            temp_list.append(i) # ... append it to the temp list
    # Now sort the list based on the "x" position of the coordinate
    temp_list = sorted(temp_list, key=lambda x: x[0])
    results += temp_list # And just append it to the final result list
# Final TUPLES in order
print(results)
# If you need them correlated to their original numbers
by_designator_num = []
for i in results: # Each tuple value, in scan order
    by_designator_num.append(d[i]) # Use the tuple value as the key, to get the original designator number from the original "d" dictionary
print(by_designator_num)
OR if you want it faster and more compact
d = {(2,-4):1, (5,-3):2, (4,-1):3, (1,-1):4, (2,-2):5, (3,-1):6, (1,-5):7}
l = [(2,-4), (5,-3), (4,-1), (1,-1), (2,-2), (3,-1), (1,-5)]
results = []
for y in range(0, -len(l), -1):
    results += sorted([i for i in l if i[1] == y ], key=lambda x: x[0])
print(results)
by_designator_num = [d[i] for i in results]
print(by_designator_num)
OUTPUT:
[(1, -1), (3, -1), (4, -1), (2, -2), (5, -3), (2, -4), (1, -5)]
[4, 6, 3, 5, 2, 1, 7]
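The top-to-bottom, then left-to-right scan can also be expressed as a single sort with a compound key, which avoids looping over every candidate y value. This is a sketch using the same (x, y) data as above; the key function is an assumption of this illustration rather than part of the answer.

```python
d = {(2, -4): 1, (5, -3): 2, (4, -1): 3, (1, -1): 4, (2, -2): 5, (3, -1): 6, (1, -5): 7}

# Sort by y descending (top first, hence the negation), then by x ascending
results = sorted(d, key=lambda p: (-p[1], p[0]))
by_designator_num = [d[p] for p in results]

print(results)
# [(1, -1), (3, -1), (4, -1), (2, -2), (5, -3), (2, -4), (1, -5)]
print(by_designator_num)
# [4, 6, 3, 5, 2, 1, 7]
```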

How to check if sum of some contiguous subarray is equal to N?

So say I have a list of sequences such as the one below.
I want to remove every sequence whose total sum equals N and/or that has a contiguous subarray with sum equal to N.
For example, if N = 4, then (1,1,2) is not valid since its total is 4. (1,1,3) is also not valid since the sum of (1,3) is 4. (1,3,1) is also not valid for the same reason.
lst = [
(1,1,1), (1,1,2), (1,1,3),
(1,2,1), (1,2,2), (1,2,3),
(1,3,1), (1,3,2), (1,3,3),
(2,1,1), (2,1,2), (2,1,3),
(2,2,1), (2,2,2), (2,2,3),
(2,3,1), (2,3,2), (2,3,3),
(3,1,1), (3,1,2), (3,1,3),
(3,2,1), (3,2,2), (3,2,3),
(3,3,1), (3,3,2), (3,3,3)
]
What are some ways to do this?
I'm currently trying to remove sequences based on whether their total is a multiple of N, but I have been unable to extend this to their contiguous subarrays:
for elements in list(product(range(1,n), repeat=n-1)):
    lst.append(elements)
for val in lst:
    if np.cumsum(val).any() %n != 0:
        lst2.append(val) # append value to a filtered list

You can use itertools.combinations to generate all combinations of slice indices to test for sums of subsequences:
from itertools import combinations
[t for t in lst if not any(sum(t[l:h+1]) == 4 for l, h in combinations(range(len(t)), 2))]
This returns:
[(1, 1, 1), (1, 2, 3), (2, 1, 2), (2, 3, 2), (2, 3, 3), (3, 2, 1), (3, 2, 3), (3, 3, 2), (3, 3, 3)]
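Note that pairing indices from range(len(t)) only tests slices of length two or more. A hedged generalization is to pair slice bounds from range(len(t) + 1) instead, which also covers single elements; that makes no difference for N = 4 with values 1 to 3, but it would for N = 3. The function name any_slice_sums_to is made up for this sketch.

```python
from itertools import combinations, product

def any_slice_sums_to(seq, target):
    """Brute force: test every non-empty contiguous slice seq[lo:hi]."""
    bounds = combinations(range(len(seq) + 1), 2)
    return any(sum(seq[lo:hi]) == target for lo, hi in bounds)

# Same 27 sequences as in the question, generated instead of typed out
lst = list(product(range(1, 4), repeat=3))
kept = [t for t in lst if not any_slice_sums_to(t, 4)]
print(kept)
# [(1, 1, 1), (1, 2, 3), (2, 1, 2), (2, 3, 2), (2, 3, 3), (3, 2, 1), (3, 2, 3), (3, 3, 2), (3, 3, 3)]
```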

You can split your problem into two subproblems:
The elements in your list sum up to N. Then you can simply test:
if sum(myList) == N:
    pass  # do fancy stuff
The elements in your list do not sum up to N. In this case, there might be a subsequence that sums up to N. To find it, let's define two pointers, l and r. Their names stand for left and right, and they define the boundaries of your subsequence. Then, the solution is the following:
r = 1
l = 0
while r <= len(myList):
    sum_ = sum(myList[l:r])
    if sum_ < N:
        r += 1
    elif sum_ > N:
        l += 1
    else:
        # do fancy stuff and exit from the loop
        break
It works as follows. First you initialize l and r so that you consider the subsequence consisting of only the first element of myList. Then, you sum the elements of the subsequence, and if the sum is lower than N, you enlarge the subsequence by adding 1 to r. If it is greater than N, you shrink the subsequence by adding 1 to l.
Note thanks to eozd:
The above algorithm works only if the elements of the list are non-negative.
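The two-pointer idea can be wrapped in a function that keeps a running window sum instead of calling sum() on every step. This is a sketch under the same assumption of non-negative elements; the name has_window_sum and its return convention are choices made for this illustration.

```python
from itertools import product

def has_window_sum(seq, target):
    """Return True if a non-empty contiguous slice of seq sums to target.

    Only valid when every element of seq is non-negative.
    """
    left = 0
    window = 0  # sum of seq[left:right + 1]
    for right, value in enumerate(seq):
        window += value
        # Shrink from the left while the window overshoots the target
        while window > target and left <= right:
            window -= seq[left]
            left += 1
        # left <= right guards against counting an empty window
        if window == target and left <= right:
            return True
    return False

lst = list(product(range(1, 4), repeat=3))
kept = [t for t in lst if not has_window_sum(t, 4)]
print(len(kept))  # 9
```

Because the window covers the whole sequence at some point, the separate check of the total is handled by the same loop.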

Python Distribute List of Tuple Items by Matching Recursively

I'm trying to write a recursive function that will look for matching items in a list of tuples, and distribute them into groups. An example list could be:
l = [(1,2),(2,3),(4,6),(3,5),(7,9),(6,8)]
And the intent is to end with three sublists in the form of:
l = [[(1,2),(2,3),(3,5)],[(4,6),(6,8)],[(7,9)]]
This appears to be an ideal situation for a recursive function, but I haven't made it work yet. Here is what I have written so far:
count = 0
def network(index_list, tuples_list, groups, count):
    if len(index_list) > 0:
        i = 0
        for j in range (len(index_list)):
            match = set(tuples_list[groups[count][i]]) & set(tuples_list[index_list[j]])
            if len(match) > 0:
                groups[count].append(index_list[j])
                index_list.pop(j)
                i += 1
            else:
                count += 1
                groups.append([])
                groups[count].append(index_list[0])
                network(index_list, tuples_list, groups, count)
    else:
        return groups
Also I'm pretty certain this question is different than the one marked duplicate. I'm looking for a recursive solution that keeps all of the tuples intact, in this case by pushing around their indices with append and pop. I'm positive there's an elegant and recursive solution to this problem.

I'm not sure recursion is helpful here. It was an interesting thought experiment, so I tried it out, but my recursive function only ever needs one run-through.
Modified based on comment to only match the first two data points of a tuple
import copy
# Initial 1-item groups
def groupTuples(tupleList):
    groups = []
    for t in tupleList:
        groups.append([t])
    return groups
# Merge two groups
def splice(list1, list2, index):
    for item in list2:
        list1.insert(index, item)
        index += 1
    return list1
def network(groups = []):
    # Groups get modified by reference, need a shallow copy for reference
    oldGroups = copy.copy(groups)
    for group in groups:
        for index, secondGroup in enumerate(groups):
            if group == secondGroup:
                # Splice can get caught in a loop if passed the same item twice.
                continue
            # Last item of First Group matches First item of Second Group
            if group[-1][1] == secondGroup[0][0]:
                splice(group, secondGroup, len(group))
                groups.pop(index)
            # First item of First Group matches Last item of Second Group
            elif group[0][0] == secondGroup[-1][1]:
                splice(group, secondGroup, 0)
                groups.pop(index)
    if groups == oldGroups:
        return groups # No change.
    else:
        return network(groups)
l = [(1,2,0),(2,3,1),(4,6,2),(3,5,3),(1,7,4),(7,9,5),(6,8,6),(12, 13,7),(5, 12,8)]
result = network(groupTuples(l))
print(result)
Result: [[(1, 2, 0), (2, 3, 1), (3, 5, 3), (5, 12, 8), (12, 13, 7)], [(1, 7, 4), (7, 9, 5)], [(4, 6, 2), (6, 8, 6)]]
It does run the matching algorithm over the dataset multiple times, but I've yet to see a configuration of data where the second iteration yields any changes, so not sure recursion is necessary here.
