I'm trying to write a recursive function that will look for matching items in a list of tuples, and distribute them into groups. An example list could be:
l = [(1,2),(2,3),(4,6),(3,5),(7,9),(6,8)]
And the intent is to end with three sublists in the form of:
l = [[(1,2),(2,3),(3,5)],[(4,6),(6,8)],[(7,9)]]
This appears to be an ideal situation for a recursive function, but I haven't made it work yet. Here is what I have written so far:
count = 0
def network(index_list, tuples_list, groups, count):
if len(index_list) > 0:
i = 0
for j in range (len(index_list)):
match = set(tuples_list[groups[count][i]]) & set(tuples_list[index_list[j]])
if len(match) > 0:
groups[count].append(index_list[j])
index_list.pop(j)
i += 1
else:
count += 1
groups.append([])
groups[count].append(index_list[0])
network(index_list, tuples_list, groups, count)
else:
return groups
Also I'm pretty certain this question is different than the one marked duplicate. I'm looking for a recursive solution that keeps all of the tuples intact, in this case by pushing around their indices with append and pop. I'm positive there's an elegant and recursive solution to this problem.
I'm not sure recursion is helpful here. It was an interesting thought experiment, so I tried it out, but my recursive function only ever needs one run-through.
Modified based on comment to only match the first two data points of a tuple
import copy

# Initial 1-item groups
def groupTuples(tupleList):
    groups = []
    for t in tupleList:
        groups.append([t])
    return groups

# Merge two groups
def splice(list1, list2, index):
    for item in list2:
        list1.insert(index, item)
        index += 1
    return list1

def network(groups = []):
    # Groups get modified by reference, need a shallow copy for reference
    oldGroups = copy.copy(groups)
    for group in groups:
        for index, secondGroup in enumerate(groups):
            if group == secondGroup:
                # Splice can get caught in a loop if passed the same item twice.
                continue
            # Last item of First Group matches First item of Second Group
            if group[-1][1] == secondGroup[0][0]:
                splice(group, secondGroup, len(group))
                groups.pop(index)
            # First item of First Group matches Last item of Second Group
            elif group[0][0] == secondGroup[-1][1]:
                splice(group, secondGroup, 0)
                groups.pop(index)
    if groups == oldGroups:
        return groups  # No change.
    else:
        return network(groups)

l = [(1,2,0),(2,3,1),(4,6,2),(3,5,3),(1,7,4),(7,9,5),(6,8,6),(12, 13,7),(5, 12,8)]
result = network(groupTuples(l))
print(result)
Result: [[(1, 2, 0), (2, 3, 1), (3, 5, 3), (5, 12, 8), (12, 13, 7)], [(1, 7, 4), (7, 9, 5)], [(4, 6, 2), (6, 8, 6)]]
It does run the matching algorithm over the dataset multiple times, but I've yet to see a configuration of data where the second iteration yields any changes, so not sure recursion is necessary here.
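For comparison, the grouping asked for in the question (tuples end up together when they share a value, directly or through a chain of other tuples) can also be sketched without recursion. This is only an illustration, not the code from the answer above: the name group_connected is made up here, it matches on every element of a tuple rather than only the first two, and it makes no promise about the order of the groups.

```python
def group_connected(tuples_list):
    """Group tuples that share a value, directly or through a chain (hypothetical helper)."""
    groups = []  # each entry: (set of values seen in the group, list of member tuples)
    for t in tuples_list:
        values, members = set(t), [t]
        remaining = []
        for seen, tuples in groups:
            if seen & values:
                # The new tuple touches this group: absorb the group.
                values |= seen
                members = tuples + members
            else:
                remaining.append((seen, tuples))
        remaining.append((values, members))
        groups = remaining
    return [members for _, members in groups]


l = [(1, 2), (2, 3), (4, 6), (3, 5), (7, 9), (6, 8)]
for group in sorted(group_connected(l)):
    print(group)
```

The tuples themselves are never modified, only moved between lists, which is what the question asked to preserve.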
Related
I have a list of tuples:
x = [(2, 10), (4, 5), (8, 10), (9, 11), (10, 15)]
I'm trying to compare the first values in all the tuples to see if they are within 1 from each other. If they are within 1, I want to aggregate (sum) the second value of the tuple, and take the mean of the first value.
The output list would look like this:
[(2, 10), (4, 5), (9, 36)]
Notice that the 8 and 10 have a difference of 2, but they're both only 1 away from 9, so they all 3 get aggregated.
I have been trying something along these lines, but It's not capturing the sequenced values like 8, 9, and 10. It's also still preserving the original values, even if they've been aggregated together.
import numpy as np

tuple_list = [(2, 10), (4, 5), (8, 10), (9, 11), (10, 15)]
output_list = []
for x1,y1 in tuple_list:
    for x2,y2 in tuple_list:
        if x1==x2:
            continue
        if np.abs(x1-x2) <= 1:
            output_list.append((np.mean([x1,x2]), y1+y2))
        else:
            output_list.append((x1,y1))
output_list = list(set(output_list))
You can do it in a list comprehension using groupby (from itertools). The grouping key will be the difference between the first value and the tuple's index in the list. When the values are 1 apart, this difference will be constant and the tuples will be part of the same group.
For example: [2, 4, 8, 9, 10] minus their indexes [0, 1, 2, 3, 4] will give [2, 3, 6, 6, 6] forming 3 groups: [2], [4] and [8 ,9, 10].
from itertools import groupby
x = [(2, 10), (4, 5), (8, 10), (9, 11), (10, 15)]
y = [ (sum(k)/len(k),sum(v)) # output tuple
for i in [enumerate(x)] # sequence iterator
for _,g in groupby(x,lambda t:t[0]-next(i)[0]) # group by sequence
for k,v in [list(zip(*g))] ] # lists of keys & values
print(y)
[(2.0, 10), (4.0, 5), (9.0, 36)]
The for k,v in [list(zip(*g))] part is a bit tricky, but what it does is transform a list of tuples (in a group) into two sequences (k and v), with k containing the first item of each tuple and v containing the second items.
e.g. if g is ((8,10),(9,11),(10,15)) then k will be (8,9,10) and v will be (10,11,15)
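The same idea can be unrolled into an explicit loop, which may be easier to follow than the nested comprehension. This is a sketch under the same assumptions as the one-liner (input sorted by first value, no duplicate first values); merge_consecutive is a name invented for this illustration.

```python
from itertools import groupby


def merge_consecutive(pairs):
    """Merge runs of tuples whose first values increase by exactly 1 (hypothetical helper)."""
    result = []
    # Key = first value minus position; it stays constant across a consecutive run.
    runs = groupby(enumerate(pairs), key=lambda item: item[1][0] - item[0])
    for _, run in runs:
        # Split the run into its first items (keys) and second items (values).
        keys, values = zip(*(pair for _, pair in run))
        result.append((sum(keys) / len(keys), sum(values)))
    return result


print(merge_consecutive([(2, 10), (4, 5), (8, 10), (9, 11), (10, 15)]))
# [(2.0, 10), (4.0, 5), (9.0, 36)]
```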
By sorting the list first, and then using itertools.pairwise to iterate over the next and previous days, this problem starts to become much easier. On sequential days, instead of adding a new item to our final list, we modify the last item added to it. Figuring out the new sum is easy enough, and figuring out the new average is actually super easy because we're averaging sequential numbers. We just need to keep track of how many sequential days have passed and we can use that to get the average.
import itertools

def on_neighboring_days_sum_occurrances(tuple_list):
    tuple_list.sort()
    ret = []
    sequential_days = 1
    # We add the first item now
    # And then when we start looping we begin looping on the second item
    # This way the loop will always be able to modify ret[-1]
    ret.append(tuple_list[0])
    # Python 3.10+ only, in older versions do
    # for prev, current in zip(tuple_list, tuple_list[1:]):
    for prev, current in itertools.pairwise(tuple_list):
        day = current[0]
        prev_day = prev[0]
        is_sequential_day = day - prev_day <= 1
        if is_sequential_day:
            sequential_days += 1
            # Mean of a run of consecutive days is its midpoint:
            # the last day minus half of (run length - 1)
            avg_day = day - (sequential_days - 1) / 2
            summed_vals = ret[-1][1] + current[1]
            ret[-1] = (avg_day, summed_vals)
        else:
            sequential_days = 1
            ret.append(current)
    return ret
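The shortcut for the average relies on the days in a run being consecutive integers: their mean is the midpoint of the run, that is, the last day minus half of (run length - 1). A small check of that identity, using a helper name made up for this illustration:

```python
def mean_of_consecutive_run(last_day, run_length):
    """Mean of run_length consecutive integers ending at last_day (hypothetical helper)."""
    return last_day - (run_length - 1) / 2


# Compare the shortcut with the plain average for runs of length 1, 2 and 3.
for days in ([8], [8, 9], [8, 9, 10]):
    shortcut = mean_of_consecutive_run(days[-1], len(days))
    assert shortcut == sum(days) / len(days)
    print(days, shortcut)
```

The identity no longer holds if two entries share the same day, so duplicates would need to be merged first.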
You can sort the list and then walk through it while tracking a single tuple. Starting from the tuple after the tracked one, check whether the difference between the first elements equals the difference between the indices; as long as it does, the tuples form a run of consecutive values, so add up both their first and their second elements. When the condition breaks (or the list ends), divide the sum of the first elements by the number of tuples in the run to get their average, and append the average together with the summed second elements to the result list. To make sure the program doesn't consider the same tuples again, jump to the index where the condition broke, like this:
x = [(2, 10), (4, 5), (8, 10), (9, 11), (10, 15)]
x.sort()
res, i = [], 0
while i < len(x):
    sum2, sum1 = x[i][1], x[i][0]
    j = i + 1
    # Extend the run while the gap in first elements equals the gap in indices
    while j < len(x) and x[j][0] - x[i][0] == j - i:
        sum2 += x[j][1]
        sum1 += x[j][0]
        j += 1
    # j - i tuples were merged: store their (truncated) mean and the summed values
    res.append((int(sum1 / (j - i)), sum2))
    i = j  # jump to the tuple that broke the run
print(res)
Here the outer while loop moves through the list one run at a time, and sum1 and sum2 start out as the first and second elements of the current tuple. The inner while loop advances j over the following tuples for as long as the condition holds, adding their elements to the running sums. When it stops, j - i tuples belong to the run (just one if the very next tuple already broke the condition), so the code appends the tuple (average of the first elements, sum2) to res and continues from index j, the tuple that broke the condition.
I'm doing Advent of Code, and until now I had no trouble solving the puzzles on my own, but day 18 got me. I want to solve it myself, but I feel like I can't even get started, even though the task itself is not difficult. It would take long to explain in full, but the point is the following:
I have to implement nested pairs, where every item of a pair can itself be another pair, and so on. For example:
[[[[[9,8],1],2],3],4]
[[3,[2,[1,[7,3]]]],[6,[5,[4,[3,2]]]]]
I have to do different operations on these pairs, such as deleting pairs at a very deep level and using their values at a higher level. I know Python handles tuples really well, but I have yet to find a solution for recursive (?) traversal, deletion and "saving" values from the "deep". Deleting items while iterating is not a wise solution. Should I use a different approach or some custom data structure for a task like this? I don't need an exact solution, just some generic guidance.
I haven't read problem 18 from advent of code, but here is an example answer to your question with a list version and a tuple version.
Imagine I have nested pairs, and I want to write a function that deletes all the deepest pairs and replaces each of them with a single number, the sum of the two numbers in the pair. For example:
input -> output
(1, 2) -> 3
((1, 2), 3) -> (3, 3)
(((1,2), (3,4)), ((5,6), (7,8))) -> ((3, 7), (11, 15))
((((((1, 2), 3), 4), 5), 6), 7) -> (((((3, 3), 4), 5), 6), 7)
Here is a version with immutable tuples, that returns a new tuple:
def updated(p):
    result, _ = updated_helper(p)
    return result

def updated_helper(p):
    if isinstance(p, tuple) and len(p) == 2:
        a, b = p
        new_a, a_is_number = updated_helper(a)
        new_b, b_is_number = updated_helper(b)
        if a_is_number and b_is_number:
            return a+b, False
        else:
            return (new_a, new_b), False
    else:
        return p, True
Here is a version with mutable lists, that returns nothing useful but mutates the list:
def update(p):
    if isinstance(p, list) and len(p) == 2:
        a, b = p
        a_is_number = update(a)
        b_is_number = update(b)
        if isinstance(a, list) and len(a) == 1:
            p[0] = a[0]
        if isinstance(b, list) and len(b) == 1:
            p[1] = b[0]
        if a_is_number and b_is_number:
            p[:] = [a+b]
        return False
    else:
        return True
Note how I used an adjective, updated, and a verb, update, to highlight the different logic of these two similar functions. The function update performs an action (modifying a list), whereas updated(p) doesn't perform any action, but evaluates to the updated pair.
Testing:
print( updated( (((1,2), (3,4)), ((5,6), (7,8))) ) )
# ((3, 7), (11, 15))
l = [[[1,2], [3,4]], [[5,6], [7,8]]]
update(l)
print(l)
# [[3, 7], [11, 15]]
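For the more general question of how to walk nested pairs and know how deep each value sits, a recursive generator is one possible starting point. The sketch below is an assumption about what might be useful for this kind of task, not part of the answer above and not a solution to the puzzle; leaves_with_depth is a made-up name.

```python
def leaves_with_depth(pair, depth=0):
    """Yield (value, depth) for every number in a nested pair, left to right."""
    if isinstance(pair, (list, tuple)):
        for item in pair:
            # Each level of nesting adds one to the depth.
            yield from leaves_with_depth(item, depth + 1)
    else:
        yield pair, depth


print(list(leaves_with_depth([[[[[9, 8], 1], 2], 3], 4])))
# [(9, 5), (8, 5), (1, 4), (2, 3), (3, 2), (4, 1)]
```

Because the generator only reads the structure, nothing is deleted while iterating; any rebuilding can happen in a separate step.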
I have a list of lists, which each consists of three tuples of four-value combinations of the integers 0 to 11, e.g.:
[(0, 1, 4, 7), (2, 3, 6, 9), (5, 8, 10, 11)]
I want to be able to find if this list (and others) contains certain integers in at least one of the tuples, and contains other integers in at least one of the other tuples.
It's simple enough if I want to check if at least one tuple contains certain integers. For instance, you could have something like:
def func_1(lst):
    for tup in lst:
        if {0,1}.issubset(tup):
            return True
    return False
which returns True for [(0, 1, 4, 7), (2, 3, 6, 9), (5, 8, 10, 11)]
...but False for [(0, 4, 7, 9), (1, 2, 3, 6), (5, 8, 10, 11)]
But what if I want to find if one tuple contains {0,1} and another separate tuple contains {2, 3}? Something like this won't work:
def func_2(lst):
    for tup in lst:
        if {0,1}.issubset(tup) or {2,3}.issubset(tup):
            return True
    return False
That would return True for both of the above lists, when it is only True for the first one.
You'd have to iterate over the tuples and test each against both sets. If one of the sets matches, remove that set from consideration, and only return True once all of the sets have been matched:
def func_2(lst):
    sets_to_test = {frozenset({0,1}), frozenset({2,3})}
    for tup in lst:
        # first remaining set contained in this tuple, or None
        matched = next((s for s in sets_to_test if s.issubset(tup)), None)
        if matched is not None:
            sets_to_test.remove(matched)
            if not sets_to_test:
                # all sets have matches to separate tuples
                return True
    return False
This approach will extend to more sets too.
Note that this could miss matches if a tuple contains more than one of the sets, because the set it gets paired with is picked arbitrarily. If this is a possibility you need to account for, I'd use recursion to handle it:
def func_2(lst, sets_to_test=None):
    if sets_to_test is None:
        sets_to_test = {frozenset({0,1}), frozenset({2,3})}
    if not sets_to_test:
        # all sets have matched
        return True
    if not lst:
        # no tuples remain
        return False
    for i, tup in enumerate(lst):
        for set_to_test in sets_to_test:
            if set_to_test.issubset(tup):
                if func_2(lst[i + 1:], sets_to_test - {set_to_test}):
                    return True
    return False
So whenever one of the sets matches, recursion is used to test the remaining sets and list of tuples; if that recursive call doesn't return True, further sets are tested.
Use 2 functions. The first checks for your first match. Then call the second on the remaining elements to check for your second match. If there is none, continue your previous search.
def sub_search(lst):
    for tup in lst:
        if {2,3}.issubset(tup):
            return True
    return False

def search(lst):
    for i,tup in enumerate(lst):
        if {0,1}.issubset(tup):
            if sub_search(lst[0:i]+lst[i+1:]):
                return True
    return False
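The requirement that each set must be found in a different tuple can also be written as a brute-force search over ordered choices of tuples. This is a sketch, not code from either answer: sets_in_separate_tuples is a hypothetical name, and trying every permutation is only reasonable for short lists such as the three-tuple lists in the question.

```python
from itertools import permutations


def sets_in_separate_tuples(lst, required_sets):
    """True if each required set is contained in a different tuple of lst."""
    required_sets = list(required_sets)
    # Try every ordered choice of distinct tuples, one per required set.
    return any(
        all(required.issubset(tup) for required, tup in zip(required_sets, chosen))
        for chosen in permutations(lst, len(required_sets))
    )


print(sets_in_separate_tuples([(0, 1, 4, 7), (2, 3, 6, 9), (5, 8, 10, 11)], [{0, 1}, {2, 3}]))  # True
print(sets_in_separate_tuples([(0, 4, 7, 9), (1, 2, 3, 6), (5, 8, 10, 11)], [{0, 1}, {2, 3}]))  # False
```

Since every pairing of sets with tuples is tried, a tuple that happens to contain both sets cannot hide a valid match.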
Assume, I have a NumPy-array of integers, as:
[34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
I want to find the start and end indices of the runs in the array where a value is repeated more than x times (say 5 times). In the case above, those are the values 22 and 6. The start index of the repeated 22 is 3 and the end index is 8. The same goes for the repeated 6.
Is there a special tool in Python that is helpful?
Otherwise, I would loop through the array index for index and compare the actual value with the previous.
Regards.
Using np.diff and the method given here by @WarrenWeckesser for finding runs of zeros in an array:
import numpy as np

def zero_runs(a):  # from link
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges

a = [34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
zero_runs(np.diff(a))
Out[87]:
array([[ 3, 8],
[15, 22]], dtype=int32)
This can then be filtered on the difference between the start & end of the run:
runs = zero_runs(np.diff(a))
runs[runs[:, 1]-runs[:, 0]>5] # runs of 7 or more, to illustrate filter
Out[96]: array([[15, 22]], dtype=int32)
Here is a solution using Python's native itertools.
Code
import itertools as it
def find_ranges(lst, n=2):
    """Return ranges for `n` or more repeated values."""
    groups = ((k, tuple(g)) for k, g in it.groupby(enumerate(lst), lambda x: x[-1]))
    repeated = (idx_g for k, idx_g in groups if len(idx_g) >=n)
    return ((sub[0][0], sub[-1][0]) for sub in repeated)

lst = [34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
list(find_ranges(lst, 5))
# [(3, 8), (15, 22)]
Tests
import nose.tools as nt
def test_ranges(f):
    """Verify list results identifying ranges."""
    nt.eq_(list(f([])), [])
    nt.eq_(list(f([0, 1,1,1,1,1,1, 2], 5)), [(1, 6)])
    nt.eq_(list(f([1,1,1,1,1,1, 2,2, 1, 3, 1,1,1,1,1,1], 5)), [(0, 5), (10, 15)])
    nt.eq_(list(f([1,1, 2, 1,1,1,1, 2, 1,1,1], 3)), [(3, 6), (8, 10)])
    nt.eq_(list(f([1,1,1,1, 2, 1,1,1, 2, 1,1,1,1], 3)), [(0, 3), (5, 7), (9, 12)])

test_ranges(find_ranges)
This example captures (index, element) pairs in lst, and then groups them by element. Only repeated pairs are retained. Finally, first and last pairs are sliced, yielding (start, end) indices from each repeated group.
See also this post for finding ranges of indices using itertools.groupby.
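To make the grouping step more concrete, here is a small illustration of what groupby produces when it is fed (index, element) pairs and keyed on the element. The short list and the variable names are chosen only for this example.

```python
import itertools as it

lst = [3, 7, 7, 7, 5]

# Group (index, element) pairs by element and keep only the indices of each run.
grouped = [
    (value, [index for index, _ in pairs])
    for value, pairs in it.groupby(enumerate(lst), key=lambda pair: pair[1])
]
print(grouped)
# [(3, [0]), (7, [1, 2, 3]), (5, [4])]
```

The first and last index of each group are the start and end of that run, and the length of the index list is the number of repeats.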
There really isn't a great short-cut for this. You can do something like:
mult = 5
for i, elem in enumerate(val_list):
    target = [elem] * mult
    # list.index() only finds single elements, so compare a slice instead
    if val_list[i:i + mult] == target:
        found_at = i
        break
I leave the not-found case and longer sequence detection to you.
If you're looking for value repeated n times in list L, you could do something like this:
def find_repeat(value, n, L):
    look_for = [value for _ in range(n)]
    for i in range(len(L)):
        if L[i] == value and L[i:i+n] == look_for:
            return i, i+n
Here is a relatively quick, error-free solution which also tells you how many copies were in the run. Some of this code was borrowed from KAL's solution.
# Return the start and (1-past-the-end) indices of the first instance of
# at least min_count copies of element value in container l
def find_repeat(value, min_count, l):
    for i in range(len(l)):
        count = 0
        # the bounds check keeps a run at the end of l from raising IndexError
        while i + count < len(l) and l[i + count] == value:
            count += 1
        if count >= min_count:
            return i, i + count
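The plain loop the question mentions (walk the sequence and compare each value with the start of the current run) can be sketched as follows. It reports every run rather than only the first and needs no target value; find_all_runs is a name invented for this sketch, and the end index is inclusive to match the question.

```python
def find_all_runs(values, min_count):
    """Return (start, end) index pairs, end inclusive, for runs of at least min_count equal values."""
    runs = []
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the sequence or when the value changes.
        if i == len(values) or values[i] != values[start]:
            if i - start >= min_count:
                runs.append((start, i - 1))
            start = i
    return runs


a = [34, 2, 3, 22, 22, 22, 22, 22, 22, 18, 90, 5, -55, -19, 22,
     6, 6, 6, 6, 6, 6, 6, 6, 23, 53, 1, 5, -42, 82]
print(find_all_runs(a, 5))
# [(3, 8), (15, 22)]
```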
I had a similar requirement. This is what I came up with, using only NumPy and a comprehension:
A=[34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
Find unique and return their indices
_, ind = np.unique(A,return_index=True)
np.unique sorts the array, sort the indices to get the indices in the original order
ind = np.sort(ind)
ind contains the indices of the first element in the repeating group, visible by non-consecutive indices
Their diff gives the number of elements in a group. Filtering using np.diff(ind)>5 shall give a boolean array with True at the starting indices of groups. The ind array contains the end indices of each group just after each True in the filtered list
Create a dict with the key as the repeating element and the values as a tuple of start and end indices of that group
rep_groups = dict((A[ind[i]], (ind[i], ind[i+1]-1)) for i,v in enumerate(np.diff(ind)>5) if v)
Earlier, a lot of wonderful programmers helped me get a function done. However, the instructor wanted it done in a single loop, and all the working solutions used multiple loops.
I wrote another program that almost solves the problem. Instead of using a loop to compare all the values, you have to use the function has_key to see if that specific key exists. That removes the need to iterate through the dictionary to find matching values, because you can just check whether they match or not.
Again, charCount is just a function that enters the contents of its argument into a dictionary and returns the dictionary.
def sumPair(theList, n):
for a, b in level5.charCount(theList).iteritems():
x = n - a
if level5.charCount(theList).get(a):
if a == x:
if b > 1: #this checks that the frequency of the number is greater then one so the program wouldn't try to multiply a single possibility by itself and use it (example is 6+6=12. there could be a single 6 but it will return 6+6
return a, x
else:
if level5.charCount(theList).get(a) != x:
return a, x
print sumPair([6,3,8,3,2,8,3,2], 9)
I need to just make this code find the sum without iteration by seeing if the current element exists in the list of elements.
You can use collections.Counter instead of level5.charCount.
I also don't see why you need the check if level5.charCount(theList).get(a): since a is a key taken from level5.charCount(theList), it is unnecessary.
So I simplified your code:
from collections import Counter

def sumPair(the_list, n):
    counts = Counter(the_list)
    for a, b in counts.items():
        x = n - a
        # a can pair with itself only if it occurs at least twice
        if a == x and b > 1:
            return a, x
        # otherwise the partner x must itself be in the list
        if a != x and x in counts:
            return a, x

print(sumPair([6, 3, 8, 3, 2, 8, 3, 2], 9))  # output (Python 3.7+): (6, 3)
You can also use a list comprehension like this:
>>> the_list, n = [6, 3, 8, 3, 2, 8, 3, 2], 9
>>> counts = Counter(the_list)
>>> result = [(a, n - a) for a, b in counts.items() if (a == n - a and b > 1) or (a != n - a and n - a in counts)]
>>> print(result)
[(6, 3), (3, 6)]
>>> print(result[0])  # this is the result you want
(6, 3)
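If the goal is a single loop that only asks whether the needed partner has already been seen, a set of seen values is one common way to sketch it. This is an alternative technique, not the Counter approach above, and sum_pair is a hypothetical name.

```python
def sum_pair(numbers, target):
    """Return the first pair (earlier, later) of list elements that adds up to target, or None."""
    seen = set()
    for number in numbers:
        partner = target - number
        # Membership tests on a set take constant time on average,
        # so no second loop is needed.
        if partner in seen:
            return partner, number
        seen.add(number)
    return None


print(sum_pair([6, 3, 8, 3, 2, 8, 3, 2], 9))
# (6, 3)
```

Because a value is added to the set only after it has been checked, a single 6 can never be paired with itself to make 12, while two 6s can.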