Accumulate items in a list of tuples - python

I have a list of tuples that looks like this:
lst = [(0, 0), (2, 3), (4, 3), (5, 1)]
What is the best way to accumulate the sum of the first and secound tuple elements? Using the example above, I'm looking for the best way to produce this list:
new_lst = [(0, 0), (2, 3), (6, 6), (11, 7)]
I am looking for a solution in Python 2.6

I would argue the best solution is itertools.accumulate() to accumulate the values, and using zip() to split up your columns and merge them back. This means the generator just handles a single column, and makes the method entirely scalable.
>>> from itertools import accumulate
>>> lst = [(0, 0), (2, 3), (4, 3), (5, 1)]
>>> list(zip(*map(accumulate, zip(*lst))))
[(0, 0), (2, 3), (6, 6), (11, 7)]
We use zip() to take the columns, then apply itertools.accumulate() to each column, then use zip() to merge them back into the original format.
This method will work for any iterable, not just sequences, and should be relatively efficient.
Prior to 3.2, accumulate can be defined as:
def accumulate(iterator):
total = 0
for item in iterator:
total += item
yield total
(The docs page gives a more generic implementation, but for this use case, we can use this simple implementation).

How about this generator:
def accumulate_tuples(iterable):
accum_a = accum_b = 0
for a, b in iterable:
accum_a += a
accum_b += b
yield accum_a, accum_b
If you need a list, just call list(accumulate_tuples(your_list)).
Here's a version that works for arbitrary length tuples:
def accumulate_tuples(iterable):
it = iter(iterable):
accum = next(it) # initialize with the first value
yield accum
for val in it: # iterate over the rest of the values
accum = tuple(a+b for a, b in zip(accum, val))
yield accum

>> reduce(lambda x,y: (x[0] + y[0], x[1] + y[1]), lst)
(11, 7)
EDIT. I can see your updated question. To get the running list you can do:
>> [reduce(lambda x,y: (x[0]+y[0], x[1]+y[1]), lst[:i]) for i in range(1,len(lst)+1)]
[(0, 0), (2, 3), (6, 6), (11, 7)]
Not super efficient, but at least it works and does what you want :)

This works for any length of tuples or other iterables.
from collections import defaultdict
def accumulate(lst):
sums = defaultdict(int)
for item in lst:
for index, subitem in enumerate(item):
sums[index] += subitem
yield [sums[index] for index in xrange(len(sums))]
print [tuple(x) for x in accumulate([(0, 0), (2, 3), (4, 3), (5, 1)])]
In Python 2.7+ you would use a Counter instead of defaultdict(int).

This is a really poor way (in terms of performance) to do this because list.append is expensive, but it works.
last = lst[0]
new_list = [last]
for t in lst[1:]:
last += t
new_list.append(last)

Simple method:
>> x = [(0, 0), (2, 3), (4, 3), (5, 1)]
>>> [(sum(a for a,b in x[:t] ),sum(b for a,b in x[:t])) for t in range(1,len(x)+1)]
[(0, 0), (2, 3), (6, 6), (11, 7)]

lst = [(0, 0), (2, 3), (4, 3), (5, 1)]
lst2 = [lst[0]]
for idx in range(1, len(lst)):
newItem = [0,0]
for idx2 in range(0, idx + 1):
newItem[0] = newItem[0] + lst[idx2][0]
newItem[1] = newItem[1] + lst[idx2][1]
lst2.append(newItem)
print(lst2)

You can use the following function
>>> def my_accumulate(lst):
new_lst = [lst[0]]
for x, y in lst[1:]:
new_lst.append((new_lst[-1][0]+x, new_lst[-1][1]+y))
return new_lst
>>> lst = [(0, 0), (2, 3), (4, 3), (5, 1)]
>>> my_accumulate(lst)
[(0, 0), (2, 3), (6, 6), (11, 7)]

Changed my code to a terser version:
lst = [(0, 0), (2, 3), (4, 3), (5, 1)]
def accumulate(the_list):
the_item = iter(the_list)
accumulator = next(the_item)
while True:
yield accumulator
accumulator = tuple(x+y for (x,y) in zip (accumulator, next(the_item)))
new_lst = list(accumulate(lst))

Related

How to sort a list of tuples which contain two integer element firstly by its second, then by its first element descending in Python?

I've been using loops in order to solve my problem but my brain has stopped. I have a input as a list which contains couple of tuples like this:
input_list = [(1, 2), (2, 2), (3, 2), (5, 3), (7, 3), (4, 4)]
I would like to mix this tuples. Here are the steps:
Tuples have to be sorted in ascending by its second element for every time. ((X, 2), (X, 3), (X, 4), (X, 2), (X, 3), (X, 2))
First element of the tuple must be the smallest value. ((1, 2), (5, 3), (4, 4), (2, 2), (7, 3), (3, 2))
Here is the output I would like to achieve:
output_list = [(1, 2), (5, 3), (4, 4), (2, 2), (7, 3), (3, 2)]
Can someone help me to find the algorithm to mix and convert to input list to the output list?
Sort the input by the second, then the first element. For each tuple in the sorted list, iterate a list of "slots" (lists) and add the tuple to the first slot whose last item is less than the tuple. If there's no such slot, create a new slot. Finally, join the slots into one list:
slots = []
for p in sorted(INPUT, key=lambda p: p[::-1]):
for s in slots:
if s[-1][1] < p[1]:
s.append(p)
break
else:
slots.append([p])
result = sum(slots, [])
You could use roundrobin from the itertools recipes. The function group_values sorts the data so that it can be used as an input for roundrobin.
from collections import defaultdict
from itertools import cycle, islice
def roundrobin(*iterables):
"roundrobin('ABC', 'D', 'EF') --> A D E B F C"
# Recipe credited to George Sakkis
num_active = len(iterables)
nexts = cycle(iter(it).__next__ for it in iterables)
while num_active:
try:
for next_ in nexts:
yield next_()
except StopIteration:
# Remove the iterator we just exhausted from the cycle.
num_active -= 1
nexts = cycle(islice(nexts, num_active))
def group_values(data):
result = defaultdict(list)
# sort by the second element first
for entry in sorted(data, key=lambda x: (x[1], x[0])):
result[entry[1]].append(entry)
return result
def main():
input_list = [(1, 2), (2, 2), (3, 2), (5, 3), (7, 3), (4, 4)]
grouped_lists = group_values(input_list)
result = list(roundrobin(*grouped_lists.values()))
print(result)
if __name__ == '__main__':
main()
group_values creates a dictionary that looks like this: {2: [(1, 2), (2, 2), (3, 2)], 3: [(5, 3), (7, 3)], 4: [(4, 4)]}. The function takes advantage of the fact that in modern versions of Python, entries in the dictionaries retain the order in which they were added.
The result is [(1, 2), (5, 3), (4, 4), (2, 2), (7, 3), (3, 2)], even if you shuffle the input list.
You might want to use a debugger or add some print functions in the code if you want to see what's happening.

How to remove duplicate from list of tuple when order is important

I have seen some similar answers, but I can't find something specific for this case.
I have a list of tuples:
[(5, 0), (3, 1), (3, 2), (5, 3), (6, 4)]
What I want is to remove tuples from this list only when first element of tuple has occurred previously in the list and the tuple which remains should have the smallest second element.
So the output should look like this:
[(5, 0), (3, 1), (6, 4)]
Here's a linear time approach that requires two iterations over your original list.
t = [(5, 0), (3, 1), (3, 2), (5, 3), (6, 4)] # test case 1
#t = [(5, 3), (3, 1), (3, 2), (5, 0), (6, 4)] # test case 2
smallest = {}
inf = float('inf')
for first, second in t:
if smallest.get(first, inf) > second:
smallest[first] = second
result = []
seen = set()
for first, second in t:
if first not in seen and second == smallest[first]:
seen.add(first)
result.append((first, second))
print(result) # [(5, 0), (3, 1), (6, 4)] for test case 1
# [(3, 1), (5, 0), (6, 4)] for test case 2
Here is a compact version I came up with using OrderedDict and skipping replacement if new value is larger than old.
from collections import OrderedDict
a = [(5, 3), (3, 1), (3, 2), (5, 0), (6, 4)]
d = OrderedDict()
for item in a:
# Get old value in dictionary if exist
old = d.get(item[0])
# Skip if new item is larger than old
if old:
if item[1] > old[1]:
continue
#else:
# del d[item[0]]
# Assign
d[item[0]] = item
list(d.values())
Returns:
[(5, 0), (3, 1), (6, 4)]
Or if you use the else-statement (commented out):
[(3, 1), (5, 0), (6, 4)]
Seems to me that you need to know two things:
The tuple that has the smallest second element for each first element.
The order to index each first element in the new list
We can get #1 by using itertools.groupby and a min function.
import itertools
import operator
lst = [(3, 1), (5, 3), (5, 0), (3, 2), (6, 4)]
# I changed this slightly to make it harder to accidentally succeed.
# correct final order should be [(3, 1), (5, 0), (6, 4)]
tmplst = sorted(lst, key=operator.itemgetter(0))
groups = itertools.groupby(tmplst, operator.itemgetter(0))
# group by first element, in this case this looks like:
# [(3, [(3, 1), (3, 2)]), (5, [(5, 3), (5, 0)]), (6, [(6, 4)])]
# note that groupby only works on sorted lists, so we need to sort this first
min_tuples = {min(v, key=operator.itemgetter(1)) for _, v in groups}
# give the best possible result for each first tuple. In this case:
# {(3, 1), (5, 0), (6, 4)}
# (note that this is a set comprehension for faster lookups later.
Now that we know what our result set looks like, we can re-tackle lst to get them in the right order.
seen = set()
result = []
for el in lst:
if el not in min_tuples: # don't add to result
continue
elif el not in seen: # add to result and mark as seen
result.append(el)
seen.add(el)
This will do what you need:
# I switched (5, 3) and (5, 0) to demonstrate sorting capabilities.
list_a = [(5, 3), (3, 1), (3, 2), (5, 0), (6, 4)]
# Create a list to contain the results
list_b = []
# Create a list to check for duplicates
l = []
# Sort list_a by the second element of each tuple to ensure the smallest numbers
list_a.sort(key=lambda i: i[1])
# Iterate through every tuple in list_a
for i in list_a:
# Check if the 0th element of the tuple is in the duplicates list; if not:
if i[0] not in l:
# Add the tuple the loop is currently on to the results; and
list_b.append(i)
# Add the 0th element of the tuple to the duplicates list
l.append(i[0])
>>> print(list_b)
[(5, 0), (3, 1), (6, 4)]
Hope this helped!
Using enumerate() and list comprehension:
def remove_if_first_index(l):
return [item for index, item in enumerate(l) if item[0] not in [value[0] for value in l[0:index]]]
Using enumerate() and a for loop:
def remove_if_first_index(l):
# The list to store the return value
ret = []
# Get the each index and item from the list passed
for index, item in enumerate(l):
# Get the first number in each tuple up to the index we're currently at
previous_values = [value[0] for value in l[0:index]]
# If the item's first number is not in the list of previously encountered first numbers
if item[0] not in previous_values:
# Append it to the return list
ret.append(item)
return ret
Testing
some_list = [(5, 0), (3, 1), (3, 2), (5, 3), (6, 4)]
print(remove_if_first_index(some_list))
# [(5, 0), (3, 1), (6, 4)]
I had this idea without seeing the #Anton vBR's answer.
import collections
inp = [(5, 0), (3, 1), (3, 2), (5, 3), (6, 4)]
od = collections.OrderedDict()
for i1, i2 in inp:
if i2 <= od.get(i1, i2):
od.pop(i1, None)
od[i1] = i2
outp = list(od.items())
print(outp)

Is it possible to include mutiple equations in a lambda function

I'm trying to include multiple operations in the lambda function with variables that have different lengths, i.e. something like:
$ serial_result = map(lambda x,y:(x**2,y**3), range(20), range(10))
but this doesn't work. Could someone tell me how to get around this?
I understand that:
$ serial_result = map(lambda x,y:(x**2,y**3), range(0,20,2), range(10))
works because the arrays of "x" and "y" have the same length.
If you want the product of range items you can use itertools.product :
>>> from itertools import product
>>> serial_result = map(lambda x:(x[0]**2,x[1]**3), product(range(20), range(10)))
If you want to pass the pairs to lambda like second case you can use itertools.zip_longest (in python 2 use izip_longest)and pass a fillvalue to fill the missed items,
>>> from itertools import zip_longest
>>> serial_result = map(lambda x:(x[0]**2,x[1]**3), zip_longest(range(20), range(10),fillvalue=1))
Note that if you are in python 2 you can pass multiple argument to lambda as a tuple :
>>> serial_result = map(lambda (x,y):(x**2,y**3), product(range(20), range(10)))
See the difference of izip_longest and product in following example :
>>> list(product(range(5),range(3)))
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2), (3, 0), (3, 1), (3, 2), (4, 0), (4, 1), (4, 2)]
>>> list(zip_longest(range(5),range(3)))
[(0, 0), (1, 1), (2, 2), (3, None), (4, None)]
>>> list(zip_longest(range(5),range(3),fillvalue=1))
[(0, 0), (1, 1), (2, 2), (3, 1), (4, 1)]
It sounds like you may be confused as to how exactly you want to use all the values in these two variables. There are several ways combine them...
If you want a result for every combination of an element in a and an element in b: itertools.product(a, b).
If you want to stop once you get to the end of the shorter: zip(a, b)
If you want to continue on until you've used all of the longest: itertools.zip_longer(a, b) (izip_longer in python 2). Once a runs out of elements it will be filled in with None, or a default you provide.

How to generate list of tuples relating records

I need to generate a list from the list of tuples:
a = [(1,2), (1,3), (2,3), (2,5), (2,6), (3,4), (3,6), (4,7), (5 6), (5,9), (5,10), (6,7)
(6.10) (6.11) (7.8) (7.12) (8.12) (9.10) (10.11)]
The rule is:
- I have a record from any (begin = random.choice (a))
- Items from the new list must have the following relationship:
the last item of each tuple in the list must be equal to the first item of the next tuple to be inserted.
Example of a valid output (starting by the tuple (3.1)):
[(3, 1), (1, 2), (2, 3), (3, 4), (4, 7), (7, 8), (8, 12), (12, 7), (7, 6), (6, 2), (2, 5), (5, 6), (6, 10), (10, 5) (5, 9), (9, 10), (10, 11), (11, 6), (6, 3)]
How can I do this? Its make using list comprehensions?
Thanks!
Here, lisb will be populated with tuples in the order that you seek. This is, of course, if lisa provides appropriate tuples (ie, each tuple has a 1th value matching another tuple's 0th value). Your sample list will not work, regardless of the implementation, because all the values don't match up (for example, there is no 0th element with 12, so that tuple can't be connected forward to any other tuple)...so you should come up with a better sample list.
Tested, working.
import random
lisa = [(1, 2), (3, 4), (2, 3), (4, 0), (0, 9), (9, 1)]
lisb = []
current = random.choice(lisa)
while True:
lisa.remove(current)
lisb.append(current)
current = next((y for y in lisa if y[0] == current[1]), None)
if current == None:
break
print lisb
If you don't want to delete items from lisa, just slice a new list.
As a generator function:
def chained_tuples(x):
oldlist = x[::]
item = random.choice(oldlist)
oldlist.remove(item)
yield item
while oldlist:
item = next(next_item for next_item in oldlist if next_item[0] == item[1])
oldlist.remove(item)
yield item
As noted, you'll get an incomplete response if your list isn't actually chainable all the way through, like your example list.
Just to add another way of solving this problem:
import random
from collections import defaultdict
lisa = [(1, 2), (3, 4), (2, 3), (4, 0), (0, 9), (9, 1)]
current_start, current_end = lisa[random.randint(0, len(lisa) - 1)]
starts = defaultdict(list)
lisb = [(current_start, current_end)]
for start, end in lisa:
starts[start].append(end)
while True:
if not starts[current_end]:
break
current_start, current_end = current_end, starts[current_end].pop()
lisb.append((current_start, current_end))
Note: You have to make sure lisa is not empty.
I think all of the answers so far are missing the requirement (at least based on your example output) that the longest chain be found.
My suggested solution is to recursively parse all possible chains that can be constructed, and return the longest result. The function looks like this:
def generateTuples(list, offset, value = None):
if value == None: value = list[offset]
list = list[:offset]+list[offset+1:]
res = []
for i,(a,b) in enumerate(list):
if value[1] in (a,b):
if value[1] == a:
subres = generateTuples(list, i, (a,b))
else:
subres = generateTuples(list, i, (b,a))
if len(subres) > len(res):
res = subres
return [value] + res
And you would call it like this:
results = generateTuples(a, 1, (3,1))
Producing the list:
[(3, 1), (1, 2), (2, 3), (3, 4), (4, 7), (7, 8), (8, 12), (12, 7), (7, 6),
(6, 2), (2, 5), (5, 6), (6, 10), (10, 5), (5, 9), (9, 10), (10, 11),
(11, 6), (6, 3)]
The first parameter of the function is the source list of tuples, the second parameter is the offset of the first element to use, the third parameter is optional, but allows you to override the value of the first element. The latter is useful when you want to start with a tuple in its reversed order as you have done in your example.

How to enumerate a range of numbers starting at 1

I am using Python 2.5, I want an enumeration like so (starting at 1 instead of 0):
[(1, 2000), (2, 2001), (3, 2002), (4, 2003), (5, 2004)]
I know in Python 2.6 you can do: h = enumerate(range(2000, 2005), 1) to give the above result but in python2.5 you cannot...
Using Python 2.5:
>>> h = enumerate(range(2000, 2005))
>>> [x for x in h]
[(0, 2000), (1, 2001), (2, 2002), (3, 2003), (4, 2004)]
Does anyone know a way to get that desired result in Python 2.5?
As you already mentioned, this is straightforward to do in Python 2.6 or newer:
enumerate(range(2000, 2005), 1)
Python 2.5 and older do not support the start parameter so instead you could create two range objects and zip them:
r = xrange(2000, 2005)
r2 = xrange(1, len(r) + 1)
h = zip(r2, r)
print h
Result:
[(1, 2000), (2, 2001), (3, 2002), (4, 2003), (5, 2004)]
If you want to create a generator instead of a list then you can use izip instead.
Just to put this here for posterity sake, in 2.6 the "start" parameter was added to enumerate like so:
enumerate(sequence, start=1)
Python 3
Official Python documentation: enumerate(iterable, start=0)
You don't need to write your own generator as other answers here suggest. The built-in Python standard library already contains a function that does exactly what you want:
>>> seasons = ['Spring', 'Summer', 'Fall', 'Winter']
>>> list(enumerate(seasons))
[(0, 'Spring'), (1, 'Summer'), (2, 'Fall'), (3, 'Winter')]
>>> list(enumerate(seasons, start=1))
[(1, 'Spring'), (2, 'Summer'), (3, 'Fall'), (4, 'Winter')]
The built-in function is equivalent to this:
def enumerate(sequence, start=0):
n = start
for elem in sequence:
yield n, elem
n += 1
Easy, just define your own function that does what you want:
def enum(seq, start=0):
for i, x in enumerate(seq):
yield i+start, x
Simplest way to do in Python 2.5 exactly what you ask about:
import itertools as it
... it.izip(it.count(1), xrange(2000, 2005)) ...
If you want a list, as you appear to, use zip in lieu of it.izip.
(BTW, as a general rule, the best way to make a list out of a generator or any other iterable X is not [x for x in X], but rather list(X)).
from itertools import count, izip
def enumerate(L, n=0):
return izip( count(n), L)
# if 2.5 has no count
def count(n=0):
while True:
yield n
n+=1
Now h = list(enumerate(xrange(2000, 2005), 1)) works.
enumerate is trivial, and so is re-implementing it to accept a start:
def enumerate(iterable, start = 0):
n = start
for i in iterable:
yield n, i
n += 1
Note that this doesn't break code using enumerate without start argument. Alternatively, this oneliner may be more elegant and possibly faster, but breaks other uses of enumerate:
enumerate = ((index+1, item) for index, item)
The latter was pure nonsense. #Duncan got the wrapper right.
>>> list(enumerate(range(1999, 2005)))[1:]
[(1, 2000), (2, 2001), (3, 2002), (4, 2003), (5, 2004)]
h = [(i + 1, x) for i, x in enumerate(xrange(2000, 2005))]
Ok, I feel a bit stupid here... what's the reason not to just do it with something like
[(a+1,b) for (a,b) in enumerate(r)] ? If you won't function, no problem either:
>>> r = range(2000, 2005)
>>> [(a+1,b) for (a,b) in enumerate(r)]
[(1, 2000), (2, 2001), (3, 2002), (4, 2003), (5, 2004)]
>>> enumerate1 = lambda r:((a+1,b) for (a,b) in enumerate(r))
>>> list(enumerate1(range(2000,2005))) # note - generator just like original enumerate()
[(1, 2000), (2, 2001), (3, 2002), (4, 2003), (5, 2004)]
>>> h = enumerate(range(2000, 2005))
>>> [(tup[0]+1, tup[1]) for tup in h]
[(1, 2000), (2, 2001), (3, 2002), (4, 2003), (5, 2004)]
Since this is somewhat verbose, I'd recommend writing your own function to generalize it:
def enumerate_at(xs, start):
return ((tup[0]+start, tup[1]) for tup in enumerate(xs))
I don't know how these posts could possibly be made more complicated then the following:
# Just pass the start argument to enumerate ...
for i,word in enumerate(allWords, 1):
word2idx[word]=i
idx2word[i]=word

Categories