Smart way to delete tuples

Smart way to delete tuples - python

I having a list of tuple as describes below (This tuple is sorted in decreasing order of the second value):
from string import ascii_letters
myTup = zip (ascii_letters, range(10)[::-1])
threshold = 5.5
>>> myTup
[('a', 9), ('b', 8), ('c', 7), ('d', 6), ('e', 5), ('f', 4), ('g', 3), ('h', 2), \
('i', 1), ('j', 0)]
Given a threshold, what is the best possible way to discard all tuples having the second value less than this threshold.
I am having more than 5 million tuples and thus don't want to perform comparison tuple by tuple basis and consequently delete or add to another list of tuples.

Since the tuples are sorted, you can simply search for the first tuple with a value lower than the threshold, and then delete the remaining values using slice notation:
index = next(i for i, (t1, t2) in enumerate(myTup) if t2 < threshold)
del myTup[index:]
As Vaughn Cato points out, a binary search would speed things up even more. bisect.bisect would be useful, except that it won't work with your current data structure unless you create a separate key sequence, as documented here. But that violates your prohibition on creating new lists.
Still, you could use the source code as the basis for your own binary search. Or, you could change your data structure:
>>> myTup
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f'),
(6, 'g'), (7, 'h'), (8, 'i'), (9, 'j')]
>>> index = bisect.bisect(myTup, (threshold, None))
>>> del myTup[:index]
>>> myTup
[(6, 'g'), (7, 'h'), (8, 'i'), (9, 'j')]
The disadvantage here is that the deletion may occur in linear time, since Python will have to shift the entire block of memory back... unless Python is smart about deleting slices that start from 0. (Anyone know?)
Finally, if you're really willing to change your data structure, you could do this:
[(-9, 'a'), (-8, 'b'), (-7, 'c'), (-6, 'd'), (-5, 'e'), (-4, 'f'),
(-3, 'g'), (-2, 'h'), (-1, 'i'), (0, 'j')]
>>> index = bisect.bisect(myTup, (-threshold, None))
>>> del myTup[index:]
>>> myTup
[(-9, 'a'), (-8, 'b'), (-7, 'c'), (-6, 'd')]
(Note that Python 3 will complain about the None comparison, so you could use something like (-threshold, chr(0)) instead.)
My suspicion is that the linear time search I suggested at the beginning is acceptable in most circumstances.

Here's an exotic approach that wraps the list in a list-like object before performing bisect.
import bisect
def revkey(items):
class Items:
def __getitem__(self, index):
assert 0 <= index < _len
return items[_max-index][1]
def __len__(self):
return _len
def bisect(self, value):
return _len - bisect.bisect_left(self, value)
_len = len(items)
_max = _len-1
return Items()
tuples = [('a', 9), ('b', 8), ('c', 7), ('d', 6), ('e', 5), ('f', 4), ('g', 3), ('h', 2), ('i', 1), ('j', 0)]
for x in range(-2, 12):
assert len(tuples) == 10
t = tuples[:]
stop = revkey(t).bisect(x)
del t[stop:]
assert t == [item for item in tuples if item[1] >= x]

Maybe a bit faster code than of #Curious:
newTup=[]
for tup in myTup:
if tup[1]>threshold:
newTup.append(tup)
else:
break
Because the tuples are ordered, you do not have to go through all of them.
Another possibility would also be, to use bisection, and find the index i of last element, which is above threshold. Then you would do:
newTup=myTup[:i]
I think the last method would be the fastest.

Given the number of tuples you're dealing with, you may want to consider using NumPy.
Define a structured array like
my_array= np.array(myTup, dtype=[('f0',"|S10"), ('f1',float)])
You can access the second elements of your tuples with myarray['f1'] which gives you a float array. Youcan know use fancy indexing techniques to filter the elements you want, like
my_array[myarray['f1'] < threshold]
keeping only the entries where your f1 is less than your threshold..

You can also use itertools e.g.
from itertools import ifilter
iterable_filtered = ifilter(lambda x : x[1] > threshold, myTup)
If you wanted an iterable filtered list or just:
filtered = filter(lambda x: x[1] > threshold, myTup)
to go straight to a list.
I'm not too familiar with the relative performance of these methods and would have to test them (e.g. in IPython using %timeit).

Related

Python combinations with tuples

I have a list of tuples, one of the two items in the tuples is a number, I'm trying to get the list of all the tuples for which the sum of the numbers will be a specific sum. With itertools combinations I only get the list of numbers, not the list of tuples:
listInit= [("A",1), ("B",2),("C",3),("D",4)]
resultExpected=[ [("A",1),("D",4)], [("B",2),("C",3)] ] #target sum=5
With the following code:
listInit=[1,2,3,4]
result=[seq for i in range(len(listInit), 0, -1) for seq in itertools.combinations(listInit, i) if sum(seq) == 5]
print(result)
I'm getting:
result=[ (1,4), (2,3) ]
Which is the right results with numbers only, I'm not sure how to get a similar result with tuples.
Thank you in advance

I would break out the list comprehension to make it easier to follow the flow of your code.
for seq in itertools.combinations(listInit, 2):
print(seq)
=== Output: ===
(('A', 1), ('B', 2))
(('A', 1), ('C', 3))
(('A', 1), ('D', 4))
(('B', 2), ('C', 3))
(('B', 2), ('D', 4))
(('C', 3), ('D', 4))
So we can see that each seq has what we want - the pairs of tuples. Now you need to test whether the sum of the second elements is five.
for seq in itertools.combinations(listInit, 2):
# seq is a tuple of tuples
# so you want the second element of both outer tuples
if seq[0][1] + seq[1][1] == 5:
print(seq)
=== Output: ===
(('A', 1), ('D', 4))
(('B', 2), ('C', 3))

Compare List of Tuples and Return Indices of Matched Values

I'm new to programming and am having some trouble with this exercise. The goal is to write a function that returns a list of matching items.
Items are defined by a tuple with a letter and a number and we consider item 1 to match item 2 if:
Both their letters are vowels (aeiou), or both are consonants
AND
The sum of their numbers is a multiple of 3
NOTE: The return list should not include duplicate matches --> (1,2) contains the same information as (2,1), the output list should only contain one of them.
Here's an example:
***input:*** [('a', 4), ('b', 5), ('c', 1), ('d', 3), ('e', 2), ('f',6)]
***output:*** [(0,4), (1,2), (3,5)]
Any help would be much appreciated!

from itertools import combinations
lst = [('a', 4), ('b', 5), ('c', 1), ('d', 3), ('e', 2), ('f',6)]
vowels = 'aeiou'
matched = [(i[0],j[0]) for (i,j) in combinations(enumerate(lst),2) if (i[1][0] in vowels) == (j[1][0] in vowels) and ((i[1][1] + j[1][1]) % 3 == 0)]
print(matched)

Sorry, I'm high enough rep to comment, but i'll edit / update once I can.
Im a little confused about the question, what is the purpose of the letters, should we be using their positon in the alphabet as their value? i.e a=0, b=1?
what are we comparing one tuple to?
Thanks

You can use itertools.combinations with enumerate to iterate all combinations and output indices. Combinations do not include permutations, so you will not see duplicates.
from itertools import combinations
lst = [('a', 4), ('b', 5), ('c', 1), ('d', 3), ('e', 2), ('f',6)]
def checker(lst):
vowels = set('aeiou')
for (idx_i, i), (idx_j, j) in combinations(enumerate(lst), 2):
if ((i[0] in vowels) == (j[0] in vowels)) and ((i[1] + j[1]) % 3 == 0):
yield idx_i, idx_j
res = list(checker(lst))
# [(0, 4), (1, 2), (3, 5)]

iterate till the end of second array using python

I have done something like this:
d = [('e', 0), ('f', 1), ('e', 0), ('f', 1)]
e = ['a']
d = [(n,j) for n,(i,j) in zip(e,d)]
d
[('a',0)]
I was just tryig to replace the equivalent tuple value with the array value, without changing the associated numbers. But the list only goes till the len of array e and not d. What I want to get as output is something like this:
d
[('a', 0), ('f', 1), ('e', 0), ('f', 1)]

Just add the unprocessed tail of d to the processed part:
[(n,j) for n,(i,j) in zip(e,d)] + d[len(e):]
#[('a', 0), ('f', 1), ('e', 0), ('f', 1)]

You can use itertools.zip_longest:
[(n or i, j) for n,(i,j) in itertools.zip_longest(e, d)]
Check the doc

If it's acceptable to mutate the original d list, I'd simply replace the first d tuples by iterating on e:
d = [('e', 0), ('f', 1), ('e', 0), ('f', 1)]
e = ['a']
for i, new_letter in enumerate(e):
d[i] = (new_letter, d[i][1])
print(d)
# [('a', 0), ('f', 1), ('e', 0), ('f', 1)]
Note that Python tuples are immutable. d[i][0] = new_letter would fail with the error:
TypeError: 'tuple' object does not support item assignment
The above code modifies the d list in place by replacing old tuples with new ones. It cannot modify the old tuples.

I think the problem is the zip function. The documentation says that (zip) "Returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The iterator stops when the shortest input iterable is exhausted."
[(n,j) for n,(i,j) in zip(e,d)] + d[len(e):] should do the trick

filtering a list of tuples in python

Am looking for a clean pythonic way of doing the following
I have a list of tuples say :
[(1,'a'), (1,'b'), (1,'c'), (2, 'd'), (5, 'e'), (5, 'f')]
I want to make a new list which discards tuples whose first key has been seen before. So the o/p for the above would be:
[(1,'c'), (2,'d'), (5, 'f')]
Thanks!

A simple way would be creating a dictionary, since it will only keep the last element with the same key:
In [1]: l = [(1,'a'), (1,'b'), (1,'c'), (2, 'd'), (5, 'e'), (5, 'f')]
In [2]: dict(l).items()
Out[2]: [(1, 'c'), (2, 'd'), (5, 'f')]
Update: As #Tadeck mentions in his comment, since the order of dictionary items is not guaranteed, you probably want to use an ordered dictionary:
from collections import OrderedDict
newl = OrderedDict(l).items()
If you actually want to keep the first tuple with the same key (and not the last, your question is ambiguous), then you could reverse the list first, add it do the dictionary and reverse the output of .items() again.
Though in that case there are probably better ways to accomplish this.

Using unique_everseen from itertools docs
from itertools import ifilterfalse
def unique_everseen(iterable, key=None):
"List unique elements, preserving order. Remember all elements ever seen."
# unique_everseen('AAAABBBCCDAABBB') --> A B C D
# unique_everseen('ABBCcAD', str.lower) --> A B C D
seen = set()
seen_add = seen.add
if key is None:
for element in ifilterfalse(seen.__contains__, iterable):
seen_add(element)
yield element
else:
for element in iterable:
k = key(element)
if k not in seen:
seen_add(k)
yield element
a = [(1,'a'), (1,'b'), (1,'c'), (2, 'd'), (5, 'e'), (5, 'f')]
print list(unique_everseen(a,key=lambda x: x[0]))
Yielding
[(1, 'a'), (2, 'd'), (5, 'e')]

a nifty trick for one liner fetishists that keeps the order in place (I admit it's not very readable but you know...)
>>> s = [(1,'a'), (1,'b'), (1,'c'), (2, 'd'), (5, 'e'), (5, 'f')]
>>> seen = set()
>>> [seen.add(x[0]) or x for x in s if x[0] not in seen]
[(1, 'a'), (2, 'd'), (5, 'e')]

List multiplication [duplicate]

This question already has answers here:
Operation on every pair of element in a list
(5 answers)
Closed 8 months ago.
I have a list L = [a, b, c] and I want to generate a list of tuples :
[(a,a), (a,b), (a,c), (b,a), (b,b), (b,c)...]
I tried doing L * L but it didn't work. Can someone tell me how to get this in python.

You can do it with a list comprehension:
[ (x,y) for x in L for y in L]
edit
You can also use itertools.product as others have suggested, but only if you are using 2.6 onwards. The list comprehension will work will all versions of Python from 2.0. If you do use itertools.product bear in mind that it returns a generator instead of a list, so you may need to convert it (depending on what you want to do with it).

The itertools module contains a number of helpful functions for this sort of thing. It looks like you may be looking for product:
>>> import itertools
>>> L = [1,2,3]
>>> itertools.product(L,L)
<itertools.product object at 0x83788>
>>> list(_)
[(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]

Take a look at the itertools module, which provides a product member.
L =[1,2,3]
import itertools
res = list(itertools.product(L,L))
print(res)
Gives:
[(1,1),(1,2),(1,3),(2,1), .... and so on]

Two main alternatives:
>>> L = ['a', 'b', 'c']
>>> import itertools
>>> list(itertools.product(L, L))
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'b'), ('b', 'c'), ('c', 'a'), ('c', 'b'), ('c', 'c')]
>>> [(one, two) for one in L for two in L]
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'b'), ('b', 'c'), ('c', 'a'), ('c', 'b'), ('c', 'c')]
>>>
the former one needs Python 2.6 or better -- the latter works in just about any Python version you might be tied to.

x = [a,b,c]
y = []
for item in x:
for item2 in x:
y.append((item, item2))
Maybe not the Pythonic way but working

Ok I tried :
L2 = [(x,y) for x in L for x in L] and this got L square.
Is this the best pythonic way to do this? I would expect L * L to work in python.

The most old fashioned way to do it would be:
def perm(L):
result = []
for i in L:
for j in L:
result.append((i,j))
return result
This has a runtime of O(n^2) and is therefore quite slow, but you could consider it to be "vintage" style code.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Smart way to delete tuples - python

Related

Python combinations with tuples

Compare List of Tuples and Return Indices of Matched Values

iterate till the end of second array using python

filtering a list of tuples in python

List multiplication [duplicate]

Categories

Resources