I have a list of tuples:
x = [(2, 10), (4, 5), (8, 10), (9, 11), (10, 15)]
I'm trying to compare the first values in all the tuples to see if they are within 1 from each other. If they are within 1, I want to aggregate (sum) the second value of the tuple, and take the mean of the first value.
The output list would look like this:
[(2, 10), (4, 5), (9, 36)]
Notice that 8 and 10 have a difference of 2, but they're both only 1 away from 9, so all 3 get aggregated.
I have been trying something along these lines, but it's not capturing sequenced values like 8, 9, and 10. It's also still preserving the original values, even if they've been aggregated together.
import numpy as np

tuple_list = [(2, 10), (4, 5), (8, 10), (9, 11), (10, 15)]
output_list = []
for x1,y1 in tuple_list:
    for x2,y2 in tuple_list:
        if x1==x2:
            continue
        if np.abs(x1-x2) <= 1:
            output_list.append((np.mean([x1,x2]), y1+y2))
        else:
            output_list.append((x1,y1))
output_list = list(set(output_list))
You can do it in a list comprehension using groupby (from itertools). The grouping key will be the difference between the first value and the tuple's index in the list. When the values are 1 apart, this difference will be constant and the tuples will be part of the same group.
For example: [2, 4, 8, 9, 10] minus their indexes [0, 1, 2, 3, 4] will give [2, 3, 6, 6, 6] forming 3 groups: [2], [4] and [8 ,9, 10].
from itertools import groupby
x = [(2, 10), (4, 5), (8, 10), (9, 11), (10, 15)]
y = [ (sum(k)/len(k), sum(v))                           # output tuple
      for i in [enumerate(x)]                           # sequence iterator
      for _,g in groupby(x, lambda t: t[0]-next(i)[0])  # group by sequence
      for k,v in [list(zip(*g))] ]                      # lists of keys & values
print(y)
[(2.0, 10), (4.0, 5), (9.0, 36)]
The for k,v in [list(zip(*g))] part is a bit tricky, but what it does is transform a list of tuples (in a group) into two sequences (k and v), with k containing the first item of each tuple and v containing the second items.
e.g. if g is ((8,10),(9,11),(10,15)) then k will be (8,9,10) and v will be (10,11,15)
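The same grouping key can be spelled out with an explicit loop, which may make the index trick easier to follow. This is only a sketch: the helper name group_runs is made up for illustration, and it assumes the list is already sorted with distinct integer first values.

```python
from itertools import groupby

def group_runs(pairs):
    """Merge tuples whose first values form a run of consecutive integers.

    Hypothetical helper; assumes `pairs` is sorted by distinct first values.
    """
    result = []
    # value - index stays constant within a run of consecutive first values
    keyed = groupby(enumerate(pairs), key=lambda item: item[1][0] - item[0])
    for _, group in keyed:
        members = [pair for _, pair in group]
        firsts = [first for first, _ in members]
        seconds = [second for _, second in members]
        result.append((sum(firsts) / len(firsts), sum(seconds)))
    return result

x = [(2, 10), (4, 5), (8, 10), (9, 11), (10, 15)]
print(group_runs(x))  # [(2.0, 10), (4.0, 5), (9.0, 36)]
```

Using enumerate inside groupby avoids the shared iterator and the next(i) call, at the cost of a slightly longer key function.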
By sorting the list first, and then using itertools.pairwise to iterate over the next and previous days, this problem starts to become much easier. On sequential days, instead of adding a new item to our final list, we modify the last item added to it. Figuring out the new sum is easy enough, and figuring out the new average is actually super easy because we're averaging sequential numbers. We just need to keep track of how many sequential days have passed and we can use that to get the average.
import itertools

def on_neighboring_days_sum_occurrances(tuple_list):
    tuple_list.sort()
    ret = []
    sequential_days = 1
    # We add the first item now
    # And then when we start looping we begin looping on the second item
    # This way the loop will always be able to modify ret[-1]
    ret.append(tuple_list[0])
    # Python 3.10+ only, in older versions do
    # for prev, current in zip(tuple_list, tuple_list[1:]):
    for prev, current in itertools.pairwise(tuple_list):
        day = current[0]
        prev_day = prev[0]
        is_sequential_day = day - prev_day <= 1
        if is_sequential_day:
            sequential_days += 1
            # Mean of n consecutive days ending at `day` is day - (n - 1) / 2
            avg_day = day - (sequential_days - 1) / 2
            summed_vals = ret[-1][1] + current[1]
            ret[-1] = (avg_day, summed_vals)
        else:
            sequential_days = 1
            ret.append(current)
    return ret
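The shortcut for the average relies on the values in a run being consecutive integers: the mean of n consecutive integers ending at some last value is that value minus (n - 1) / 2. A quick check of that arithmetic, where mean_of_run is a hypothetical helper name used only for this illustration:

```python
def mean_of_run(last_day, run_length):
    """Mean of `run_length` consecutive integers ending at `last_day`."""
    return last_day - (run_length - 1) / 2

# Compare the shortcut against the plain definition of the mean.
for run in ([8], [8, 9], [8, 9, 10], [3, 4, 5, 6]):
    assert mean_of_run(run[-1], len(run)) == sum(run) / len(run)

print(mean_of_run(10, 3))  # 9.0
```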
You can iterate through the sorted list while tracking a single starting tuple, and scan forward from the tuple after it. As long as the difference between the first elements equals the difference between the indices, the tuples belong to the same run, so add up both the first and the second elements. When this condition breaks, divide the sum of the first elements by the number of tuples in the run to get their mean, and append the result to the output list. To make sure the same tuples are not considered again, jump to the index where the condition broke, like this:
x = [(2, 10), (4, 5), (8, 10), (9, 11), (10, 15)]
x.sort()
res, i = [], 0
while i < len(x):
    sum2, avg1 = x[i][1], x[i][0]
    for j in range(i + 1, len(x)):
        if x[j][0] - x[i][0] == j - i:  # still part of the run starting at i
            sum2 += x[j][1]
            avg1 += x[j][0]
        else:
            break                       # the run ended just before index j
    else:
        j = len(x)                      # the run reached the end of the list
    avg1 /= j - i                       # j - i tuples were summed
    res.append((avg1, sum2))
    i = j                               # resume at the tuple that broke the run
print(res)  # [(2.0, 10), (4.0, 5), (9.0, 36)]
Here the while loop walks through the list one run at a time; sum2 and avg1 start out as the 2nd and 1st elements of the current tuple. The for loop iterates from the tuple after the current one to the end of the list. If the condition is met, the elements of the tuple from the for loop are added to the running sums; otherwise the loop breaks, and j is the index of the tuple that broke the run. When the for loop finishes without a break, the run reached the end of the list, so j is set to len(x). In both cases j - i tuples were summed, so avg1 is divided by j - i, the tuple (avg1, sum2) is appended to res, and the while loop skips ahead to index j.
from collections import Counter
test_list = [(6, 5), (2, 7), (2, 5), (8, 7), (9, 8), (3, 7)]
freq_2ndEle=Counter(val for key,val in test_list)
res=sorted(test_list,key=lambda ele:freq_2ndEle[ele[1]],reverse=True)
print(res)
Input : test_list = [(6, 5), (1, 7), (2, 5), (8, 7), (9, 8), (3, 7)]
Output : [(1, 7), (8, 7), (3, 7), (6, 5), (2, 5), (9, 8)]
Explanation : 7 occurs 3 times as 2nd element, hence all tuples with 7, are aligned first.
Please clarify how the code works, especially this part:
res=sorted(test_list,key=lambda ele:freq_2ndEle[ele[1]],reverse=True)
I am confused by ele:freq_2ndEle[ele[1]].
Here is an explanation - in the future, you should try following similar steps, including reading the documentation:
Counter takes an iterable or a mapping as an argument. In your case, val for key,val in test_list is an iterable (a generator expression). You fetch values from test_list and feed them to Counter.
You don't need the key, val naming; it is confusing in this context, as it suggests you are looping through a dictionary. Instead, you are looping through a list of tuples, so freq_2ndEle=Counter(tp[1] for tp in test_list) is much clearer: here you access the second tuple element, indexed with 1.
Counter gives you number of occurrences of each of the second tuple elements. If you print freq_2ndEle, you will see this:
Counter({7: 3, 5: 2, 8: 1}), which maps each second element to the number of times it appears in the list.
In the last step you're sorting the original list by the frequency of the second element using sorted,
res=sorted(test_list,key=lambda ele:freq_2ndEle[ele[1]],reverse=True)
So you pass test_list as the argument to sorted, and then you specify the key by which you want to sort: in your case the key is the number of times the second tuple element occurred.
freq_2ndEle stores key-value pairs of the form second element:number of times it occurred in test_list. It is a dictionary in a way (Counter is a dict subclass), so you access it as you would access a dictionary: you get the value that corresponds to ele[1], the second tuple element. The value you fetch with freq_2ndEle[ele[1]] is exactly the number of times ele[1] occurred in test_list.
Lastly, you sort the keys, but in reverse order - that is, descending, highest to lowest, [(2, 7), (8, 7), (3, 7), (6, 5), (2, 5), (9, 8)] with the values that have the same keys (like 7 and 5) grouped together. Note, according to the documentation sorted is stable, meaning it will preserve the order of elements from input, and this is why when the keys are the same, you get them in the order as in test_list i.e. (2,7) goes first and (3,7) last in the "7" group.
freq_2ndEle is a dictionary that contains the second elements of the tuple as keys, and their frequencies as values. Passing this frequency as a return value of lambda in the key argument of the function sorted will sort the list by this return value of lambda (which is the frequency).
If your question is about how lambda works, you can refer to this brief explanation which is pretty simple.
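To see what the key function hands to sorted, it can help to compute the keys separately first. A small sketch using the list from the question (the names freq and keys are chosen here only for illustration):

```python
from collections import Counter

test_list = [(6, 5), (2, 7), (2, 5), (8, 7), (9, 8), (3, 7)]
freq = Counter(second for _, second in test_list)

# The key function is called once per tuple; sorted orders by these values.
keys = [freq[t[1]] for t in test_list]
print(keys)  # [2, 3, 2, 3, 1, 3]

# reverse=True puts the highest counts first; ties keep their input order.
res = sorted(test_list, key=lambda t: freq[t[1]], reverse=True)
print(res)  # [(2, 7), (8, 7), (3, 7), (6, 5), (2, 5), (9, 8)]
```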
So the problem is essentially this: I have a list of tuples made up of n ints that have to be eliminated if they don't fit certain criteria. The criterion boils down to this: each element of the tuple must be less than or equal to the int at the same position in another list (let's call this list f).
So, an example:
Assuming I have a list of tuples called wk, made up of tuples of ints of length 3, and a list f composed of 3 ints. Like so:
wk = [(1,3,8),(8,9,1),(1,1,1)]
f = [2,5,8]
=== After applying the function ===
wk_result = [(1,3,8),(1,1,1)]
The rationale would be that when looking at the first tuple of wk ((1,3,8)), its first element is smaller than the first element of f. The second element also complies with the rule, and the same applies to the third. This does not hold for the second tuple, though, given that its first and second elements (8 and 9) are bigger than the first and second elements of f (2 and 5).
Here's the code I have:
for i,z in enumerate(wk):
    for j,k in enumerate(z):
        if k <= f[j]:
            pass
        else:
            del wk[i]
When I run this it is not eliminating the tuples from wk. What could I be doing wrong?
EDIT
One of the answers provided by user #James actually made it a whole lot simpler to do what I need to do:
[t for t in wk if t<=tuple(f)]
#returns:
[(1, 3, 8), (1, 1, 1)]
The thing is in my particular case it is not getting the job done, so I assume it might have to do with the previous steps of the process which I will post below:
max_i = max(f)
siz = len(f)
flist = [i for i in range(1,max_i +1)]
def cartesian_power(seq, p):
    if p == 0:
        return [()]
    else:
        result = []
        for x1 in seq:
            for x2 in cartesian_power(seq, p - 1):
                result.append((x1,) + x2)
        return result
wk = cartesian_power(flist, siz)
wk = [i for i in wk if i <= tuple(f) and max(i) == max_i]
What is happening is the following: I cannot use the itertools library to do permutations, that is why I am using a function that gets the job done. Once I produce a list of tuples (wk) with all possible permutations, I filter this list using two parameters: the one that brought me here originally and another one not relevant for the discussion.
I'll show an example of the results with numbers, given f = [2,5,8]:
[(1, 1, 8), (1, 2, 8), (1, 3, 8), (1, 4, 8), (1, 5, 8), (1, 6, 8), (1, 7, 8), (1, 8, 1), (1, 8, 2), (1, 8, 3), (1, 8, 4), (1, 8, 5), (1, 8, 6), (1, 8, 7), (1, 8, 8), (2, 1, 8), (2, 2, 8), (2, 3, 8), (2, 4, 8), (2, 5, 8)]
As you can see, there are instances where the ints in the tuple are bigger than the corresponding position in the f list, like (1,6,8) where the second position of the tuple (6) is bigger than the number in the second position of f (5).
You can use list comprehension with a (short-circuiting) predicate over each tuple zipped with the list f.
wk = [(1, 3, 8), (8, 9, 1), (1, 1, 1), (1, 9, 1)]
f = [2, 5, 8] # In this contrived example, f could preferably be a 3-tuple as well.
filtered = [t for t in wk if all(a <= b for (a, b) in zip(t, f))]
print(filtered) # [(1, 3, 8), (1, 1, 1)]
Here, all() has been used to specify a predicate that all tuple members must be less or equal to the corresponding element in the list f; all() will short-circuit its testing of a tuple as soon as one of its members does not pass the tuple member/list member <= sub-predicate.
Note that I added a (1, 9, 1) tuple for an example where the first tuple element passes the sub-predicate (<= corresponding element in f) whereas the 2nd tuple element does not (9 > 5).
You can do this with a list comprehension. It iterates over the list of tuples and checks that all of the elements of the tuple are less than or equal to the corresponding elements in f. Note that comparing the tuples directly (t <= tuple(f)) is not an element-wise test: Python compares tuples lexicographically, so the first differing position decides the result.
[t for t in wk if all(x<=y for x,y in zip(t,f))]
# returns:
[(1, 3, 8), (1, 1, 1)]
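The difference between comparing whole tuples and comparing element by element can be seen on (1, 6, 8), one of the unexpected tuples in the question's output. A minimal sketch:

```python
f = [2, 5, 8]
t = (1, 6, 8)

# Tuple comparison is lexicographic: 1 < 2 decides the result immediately,
# so the later elements are never looked at.
print(t <= tuple(f))  # True

# Element-wise check: 6 > 5 fails, so the tuple is rejected.
print(all(a <= b for a, b in zip(t, f)))  # False
```

This is likely why the filter in the question's EDIT lets tuples such as (1, 6, 8) through.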
Here is a solution without an explicit loop (it uses recursion instead), which compares each element of the tuple:
wk_1 = [(1,3,8),(8,9,1),(1,1,1)]
f = [2,5,8]
final_input=[]
def comparison(wk, target):
    if not wk:
        return 0
    else:
        data=wk[0]
        if data[0]<=target[0] and data[1]<=target[1] and data[2]<=target[2]:
            final_input.append(data)
        comparison(wk[1:],target)
comparison(wk_1,f)
print(final_input)
output:
[(1, 3, 8), (1, 1, 1)]
P.S.: I don't know whether you want a less-than-or-equal or a strictly-less-than condition, so modify it according to your needs.
I'm using the product method from the itertools python library to calculate all permutations of items in a list of lists. As an example:
>> elems = [[1,2],[4,5],[7,8]]
>> permutations = list(itertools.product(*elems))
>> print permutations
# this prints [(1, 4, 7), (1, 4, 8), (1, 5, 7), (1, 5, 8), (2, 4, 7), (2, 4, 8), (2, 5, 7), (2, 5, 8)]
How can I check each permutation as it is calculated, rather than returning the entire set of permutations at once? The problem I am currently facing is that I run into a python Memory Error while running my script because too many permutations are being generated. I only care about a single one of the permutations. If I can check each permutation as it is generated, I can store just a single value rather than storing every possible permutation. Is this possible, and if so, how would I go about implementing this?
You can just do it in a for loop, one at a time:
for a_perm in itertools.product(*elems):
    print(a_perm)
itertools.product() gives you an iterator, which you can iterate over one item at a time.
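If only a single matching tuple is needed, the iteration can stop as soon as it is found, so the full list of results is never stored. A sketch under the assumption that the wanted tuple can be recognised by some test; is_wanted and its condition are hypothetical and only stand in for the real check:

```python
import itertools

elems = [[1, 2], [4, 5], [7, 8]]

def is_wanted(perm):
    # Hypothetical condition, for illustration only
    return sum(perm) == 15

# next() pulls tuples from the generator one at a time and stops at the
# first match; None is returned if nothing matches.
match = next((p for p in itertools.product(*elems) if is_wanted(p)), None)
print(match)  # (2, 5, 8)
```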
I have the following list of tuples:
a = [(1, 2), (2, 4), (3, 1), (4, 4), (5, 2), (6, 8), (7, -1)]
I would like to select the elements whose second value is greater than that of the previous tuple. For example, I would select (2, 4) because 4 is greater than 2; in the same manner I would select (4, 4) and (6, 8).
Can this be done in a more elegant way than a loop starting explicitly on the second element?
To clarify, I want to select the tuples whose second element is greater than the second element of the prior tuple.
>>> [right for left, right in pairwise(a) if right[1] > left[1]]
[(2, 4), (4, 4), (6, 8)]
Where pairwise is an itertools recipe you can find here.
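For reference, the pairwise recipe from the itertools documentation looks roughly like this (on Python 3.10+ the built-in itertools.pairwise can be used instead):

```python
from itertools import tee

def pairwise(iterable):
    """s -> (s0, s1), (s1, s2), (s2, s3), ..."""
    first, second = tee(iterable)
    next(second, None)  # advance the second iterator by one item
    return zip(first, second)

a = [(1, 2), (2, 4), (3, 1), (4, 4), (5, 2), (6, 8), (7, -1)]
selected = [right for left, right in pairwise(a) if right[1] > left[1]]
print(selected)  # [(2, 4), (4, 4), (6, 8)]
```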
You can use a list comprehension to do this fairly easily:
a = [a[i] for i in range(1, len(a)) if a[i][1] > a[i-1][1]]
This uses range(1, len(a)) to start from the second element in the list, then compares the second value in each tuple with the second value from the preceding tuple to determine whether it should be in the new list.
Alternatively, you could use zip to generate pairs of neighbouring tuples:
a = [two for one, two in zip(a, a[1:]) if two[1] > one[1]]
You can use enumerate to derive indices and then make list comprehension:
a = [t[1] for t in enumerate(a[1:], 1) if t[1][1] > a[t[0]-1][1]]
You can use list comprehension
[i for i in a if (i[0] < i[1])]
Returns
[(1, 2), (2, 4), (6, 8)]
Edit: I was incorrect in my understanding of the question. The above code will return all tuples in which the second element is greater than the first. This is not the answer to the question OP asked.