Python: efficient way to compare an item in a list of tuples

Is there an efficient way, without for loops, to check whether the item at a given index is the same across all tuples in a list of tuples in Python?
lst_tups = [('Hello', 1, 'Name:'), ('Goodbye', 1, 'Surname:'), ('See you!', 1, 'Time:')]
The goal is to return all unique values of the item at index 1 of each tuple. With a for loop, that looks like this:
unique = list()
for i in lst_tups:
    item = i[1]
    unique.append(item)
set(unique)
Expected output:
Unique values: [1]
True if all are equal, otherwise False

I think the set comprehension is an acceptable way:
>>> unique = {i[1] for i in lst_tups}
>>> unique
{1}
If you want to avoid the for loop anyway, you can use operator.itemgetter and map (for large lists this is slightly more efficient than the set comprehension, but less readable):
>>> from operator import itemgetter
>>> unique = set(map(itemgetter(1), lst_tups))
>>> unique
{1}
Then you can check whether the elements are all the same by testing whether the set has length 1:
>>> len(unique) == 1
True
If you only need the boolean result, or the items you want to compare are unhashable (such as dicts), you can use itertools.pairwise (Python 3.10+) to compare adjacent elements (though this is not necessarily faster):
>>> from itertools import pairwise, starmap
>>> from operator import itemgetter, eq
>>> all(i[1] == j[1] for i, j in pairwise(lst_tups))
True
>>> all(starmap(eq, pairwise(map(itemgetter(1), lst_tups))))
True
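itertools.pairwise only exists in Python 3.10 and later. On older versions, a minimal stand-in can be built from itertools.tee, following the recipe the itertools documentation used before pairwise was added; the name pairwise_compat is just an illustrative choice, not a library function.

```python
from itertools import starmap, tee
from operator import eq, itemgetter


def pairwise_compat(iterable):
    """Yield overlapping pairs (s0, s1), (s1, s2), ... like itertools.pairwise."""
    first, second = tee(iterable)
    next(second, None)  # advance the second copy by one element
    return zip(first, second)


lst_tups = [('Hello', 1, 'Name:'), ('Goodbye', 1, 'Surname:'), ('See you!', 1, 'Time:')]

# Same adjacent-comparison check as above, without requiring Python 3.10
print(all(starmap(eq, pairwise_compat(map(itemgetter(1), lst_tups)))))  # True
```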
Following up on questions raised in the comments: when the item to compare is at another position, or is the element of the sequence itself, the methods above need only slight changes, so here are two more general solutions:
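from itertools import pairwise, starmap  # pairwise needs Python 3.10+
from operator import eq

def all_equal_by_set(iterable):
    return len(set(iterable)) == 1  # items must be hashable

def all_equal_by_compare(iterable):
    return all(starmap(eq, pairwise(iterable)))  # works for unhashable items too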
Then you just need to call them like this:
>>> all_equal_by_set(map(itemgetter(1), lst_tups))
True
>>> all_equal_by_set(tup[1] for tup in lst_tups)  # Note: this is a generator expression, not a comprehension.
True
>>> all_equal_by_compare(map(itemgetter(1), lst_tups))
True
>>> all_equal_by_compare(tup[1] for tup in lst_tups)
True
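The two helpers above disagree on an empty input: len(set([])) == 1 is False, while all() over no pairs is True. Another option that short-circuits, accepts unhashable items and does not need Python 3.10 is adapted from the all_equal recipe in the itertools documentation, which relies on groupby: all elements are equal exactly when groupby produces at most one group. The key parameter is an addition here for convenience, so the tuple index can be passed directly.

```python
from itertools import groupby
from operator import itemgetter


def all_equal(iterable, key=None):
    """Return True if every element (or key(element)) compares equal.

    groupby() starts a new group whenever the value changes, so all elements
    are equal exactly when there is at most one group. An empty input counts
    as all-equal here (unlike len(set(...)) == 1).
    """
    groups = groupby(iterable, key)
    return next(groups, True) and not next(groups, False)


lst_tups = [('Hello', 1, 'Name:'), ('Goodbye', 1, 'Surname:'), ('See you!', 1, 'Time:')]
print(all_equal(lst_tups, key=itemgetter(1)))      # True
print(all_equal([{'a': 1}, {'a': 1}, {'a': 2}]))  # False; dicts are unhashable but comparable
```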

Solution without using a for loop:
import operator
lst_tups = [('Hello', 1, 'Name:'), ('Goodbye', 1, 'Surname:'), ('See you!', 1, 'Time:')]
unique = set(map(operator.itemgetter(1),lst_tups))
print(unique) # {1}
Please consider the above code and judge whether it is efficient by your standards.

You can use chain.from_iterable and slicing with a step of three, [1::3]. This relies on every tuple having exactly three elements:
from itertools import chain
res = list(chain.from_iterable(lst_tups))[1::3]
print(set(res))
# If you want to print True if all are equal, otherwise False
if len(set(res)) == 1:
    print('True')
else:
    print('False')
Output:
{1}
True

Related

Is there a Don't Care value for lists in Python

Is there a way to use count() where you are looking for a specific value in the nested list and not caring about the rest?
lst = [[1,6],[1,4],[3,4],[1,2]]
X = 1
lst.count([X, _ ])
This would return a count of 3, since there are three nested lists that have a 1 in the first index.
Is there a way to do this?
Use some sneaky sum() hacks:
sum(k[0] == X for k in your_list)
I.e.
>>> X = 1
>>> your_list = [[1,6],[1,4],[3,4],[1,2]]
>>> sum(k[0] == X for k in your_list)
3
Why?
The part k[0] == X for k in your_list is a generator expression that yields True for each element of your_list whose first element equals X, and False otherwise. The sum() function adds these values up, treating True as 1 and False as 0.
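As a quick check of that explanation, the intermediate booleans can be materialized in a list before summing. This only illustrates the mechanism; the generator version above avoids building the list.

```python
X = 1
your_list = [[1, 6], [1, 4], [3, 4], [1, 2]]

matches = [k[0] == X for k in your_list]
print(matches)       # [True, True, False, True]
print(sum(matches))  # 3: True counts as 1, False as 0
print(True + True)   # 2: bool is a subclass of int
```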
Look at the length of a filtered list:
my_list = [[1,6],[1,4],[3,4],[1,2]]
X = 1
len([q for q in my_list if q[0] == X])
Or, if you prefer to use count, then make a list of the items you do care about:
[q[0] for q in my_list].count(X)
You can do len(list(filter(lambda x: x[0] == 1, lst))) (in Python 3, filter returns an iterator, so it has to be turned into a list before calling len).
Be careful: if your list contains an element that is not a list, or an empty list, this raises an exception. You can guard against that with two additional conditions:
len(list(filter(lambda x: isinstance(x, list) and len(x) > 0 and x[0] == 1, lst)))
Counting how often one value occurs in the first position requires a full pass over the list, so if you plan to call such a counting function (say, countfunction(inputlist, target)) more than once on the same list, it's more efficient to build a dictionary holding all the counts (also one pass), which you can then query in O(1) time.
>>> from collections import Counter
>>> from operator import itemgetter
>>>
>>> lst = [[1,6],[1,4],[3,4],[1,2]]
>>> c = Counter(map(itemgetter(0), lst))
>>> c[1]
3
>>> c[3]
1
>>> c[512]
0
Others have shown good ways to approach this problem using Python built-ins, but you can use numpy if what you're actually after is fancy indexing.
For example:
import numpy as np
lst = np.array([[1,6],[1,4],[3,4],[1,2]])
print(lst)
#[[1 6]
# [1 4]
# [3 4]
# [1 2]]
In this case lst is a numpy.ndarray with shape (4,2) (4 rows and 2 columns). If you want to count the number of rows where the first column (index 0) is equal to X, you can write:
X = 1
print((lst[:,0] == X).sum())
#3
The first part, lst[:,0], means: take all rows, but only the first column (index 0).
print(lst[:,0])
#[1 1 3 1]
Then you check which of these is equal to X:
print(lst[:,0]==X)
#[ True  True False  True]
Finally sum the resultant array to get the count. (There is an implicit conversion from bool to int for the sum.)
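To answer the literal question: Python has no built-in "don't care" value, but list.count() compares with ==, and when int.__eq__ returns NotImplemented for an unfamiliar type, Python tries the other operand's __eq__. So an object whose __eq__ always returns True behaves as a wildcard inside the pattern list. The DontCare class and the ANY instance below are hypothetical helpers written for illustration; the standard library's unittest.mock.ANY compares equal to everything in the same way.

```python
class DontCare:
    """Hypothetical wildcard: compares equal to any object."""

    def __eq__(self, other):
        return True

    def __ne__(self, other):
        return False

    def __repr__(self):
        return "ANY"


ANY = DontCare()

lst = [[1, 6], [1, 4], [3, 4], [1, 2]]
X = 1

# Each element is compared to the pattern with ==, item by item;
# the wildcard position matches whatever value is there.
print(lst.count([X, ANY]))  # 3
print(lst.count([ANY, 4]))  # 2
```

Because DontCare defines __eq__ without __hash__, its instances are unhashable, so the wildcard works with ==-based operations such as count(), index() and in, but not as a set member or dict key.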

Check whether a list starts with the elements of another list

What is the easiest (most Pythonic) way to check whether the beginning of a list consists of exactly the elements of another list? Consider the following examples:
li = [1,4,5,3,2,8]
#Should return True
startsWithSublist(li, [1,4,5])
#Should return False
startsWithSublist(li, [1,4,3])
#Should also return False, although it is contained in the list
startsWithSublist(li, [4,5,3])
Sure, I could iterate over the lists, but I guess there is an easier way. Neither list will ever contain the same element twice, and the second list will always be shorter than or equal in length to the first. The length of the list to match is variable.
How to do this in Python?
Use list slicing:
>>> li = [1,4,5,3,2,8]
>>> sublist = [1,4,5]
>>> li[:len(sublist)] == sublist
True
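Wrapped in the function name from the question, the slicing approach handles all three examples. Slicing never raises an IndexError, so a sublist longer than the list simply produces a shorter slice and the comparison is False. Note that == between a list and a tuple is always False, so both arguments should be the same sequence type.

```python
def startsWithSublist(lst, sub):
    """Return True if lst begins with exactly the elements of sub."""
    # lst[:len(sub)] is at most len(sub) long; if sub is longer than lst
    # the slice is shorter than sub and the comparison is simply False.
    return lst[:len(sub)] == sub


li = [1, 4, 5, 3, 2, 8]
print(startsWithSublist(li, [1, 4, 5]))  # True
print(startsWithSublist(li, [1, 4, 3]))  # False
print(startsWithSublist(li, [4, 5, 3]))  # False: contained, but not at the start
```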
You can do it using all without slicing and creating another list:
def startsWithSublist(l, sub):
    return len(sub) <= len(l) and all(l[i] == ele for i, ele in enumerate(sub))
It short-circuits as soon as it finds a non-matching element and returns True if all elements match. You can also pair the elements with zip (in Python 3 the built-in zip is already lazy; in Python 2, use itertools.izip):
def startsWithSublist(l, sub):
    return len(sub) <= len(l) and all(a == b for a, b in zip(l, sub))

Dropping values from a list of tuples

I have a list of tuples from which I would like to return only the second column of data, and only the unique values:
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
Desired output:
['Andrew#gmail.com','Jim#gmail.com','Sarah#gmail.com']
My idea would be to iterate through the list, append the item from the second column to a new list, and then use the following code. Before I go too far down that path, I suspect there is a better way to do this.
from collections import Counter
cnt = Counter(mytuple_new)
unique_mytuple_new = [k for k, v in cnt.iteritems() if v > 1]
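You can use the zip function (this is Python 2; in Python 3 zip returns an iterator, so use set(list(zip(*mytuple))[1]) instead):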
>>> set(zip(*mytuple)[1])
set(['Sarah#gmail.com', 'Jim#gmail.com', 'Andrew#gmail.com'])
Or, less efficiently, you can use map with operator.itemgetter and a set to get the unique values:
>>> from operator import itemgetter
>>> tuple(set(map(lambda x:itemgetter(1)(x),mytuple)))
('Sarah#gmail.com', 'Jim#gmail.com', 'Andrew#gmail.com')
A benchmark of some of the answers (Python 2 code, run after import timeit):
My answer:
s = """\
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
set(zip(*mytuple)[1])
"""
print timeit.timeit(stmt=s, number=100000)
0.0740020275116
iCodez's answer:
s = """\
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
seen = set()
[x[1] for x in mytuple if x[1] not in seen and not seen.add(x[1])]
"""
print timeit.timeit(stmt=s, number=100000)
0.0938332080841
Hasan's answer:
s = """\
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
set([k[1] for k in mytuple])
"""
print timeit.timeit(stmt=s, number=100000)
0.0699651241302
Adem's answer:
s = """
from itertools import izip
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
set(map(lambda x: x[1], mytuple))
"""
print timeit.timeit(stmt=s, number=100000)
0.237300872803 !!!
unique_emails = set(item[1] for item in mytuple)
The generator expression produces only the second-column values, and passing them to set() removes the duplicates.
Try:
>>> unique_mytuple_new = set([k[1] for k in mytuple])
>>> unique_mytuple_new
set(['Sarah#gmail.com', 'Jim#gmail.com', 'Andrew#gmail.com'])
You can use a list comprehension and a set to keep track of seen values:
>>> mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
>>> seen = set()
>>> [x[1] for x in mytuple if x[1] not in seen and not seen.add(x[1])]
['Andrew#gmail.com', 'Jim#gmail.com', 'Sarah#gmail.com']
>>>
The most important part of this solution is that order is preserved like in your example. Doing just set(x[1] for x in mytuple) or something similar will get you the unique items, but their order will be lost.
Also, the if x[1] not in seen and not seen.add(x[1]) may seem a little strange, but it is actually a neat trick that allows you to add items to the set inside the list comprehension (otherwise, we would need to use a for-loop).
Because and performs short-circuit evaluation in Python, not seen.add(x[1]) will only be evaluated if x[1] not in seen returns True. So, the condition sees if x[1] is in the set and adds it if not.
The not operator is placed before seen.add(x[1]) so that the condition evaluates to True if x[1] needed to be added to the set (set.add returns None, which is treated as False. not False is True).
How about the obvious, simple loop? There is no need to create a list and then convert it to a set; just don't add duplicates:
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
result = []
for item in mytuple:
    if item[1] not in result:
        result.append(item[1])
print(result)
Output:
['Andrew#gmail.com', 'Jim#gmail.com', 'Sarah#gmail.com']
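The loop above tests item[1] not in result, and a membership test on a list scans the whole list, so the loop is quadratic in the worst case. Keeping a separate set of already-seen values makes each check roughly constant time, while the list still records first-occurrence order. This is a sketch of that variant; the function name unique_in_order is only illustrative, and it assumes the values are hashable.

```python
def unique_in_order(rows, index):
    """Return the distinct values of row[index], in first-seen order."""
    seen = set()    # fast membership tests (values must be hashable)
    result = []     # preserves the order in which values first appear
    for row in rows:
        value = row[index]
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
print(unique_in_order(mytuple, 1))
# ['Andrew#gmail.com', 'Jim#gmail.com', 'Sarah#gmail.com']
```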
Is the order of the items important? Many of the proposed answers use set to unique-ify the list. That's good, proper, and performant if the order is unimportant. If order does matter, you can use an OrderedDict to perform set-like unique-ification while preserving order.
# test data
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
from collections import OrderedDict
emails = list(OrderedDict((t[1], 1) for t in mytuple).keys())
print(emails)
Yielding:
['Andrew#gmail.com', 'Jim#gmail.com', 'Sarah#gmail.com']
Update
Based on iCodez's suggestion, restating the answer as:
from collections import OrderedDict
emails = list(OrderedDict.fromkeys(t[1] for t in mytuple).keys())
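Since Python 3.7, regular dicts are guaranteed to preserve insertion order, so on modern Python the same idea works with a plain dict and no import; OrderedDict remains the portable choice for older versions.

```python
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]

# dict.fromkeys keeps one entry per distinct key; list() iterates the keys
# in insertion order (guaranteed on Python 3.7+), i.e. first-seen order.
emails = list(dict.fromkeys(t[1] for t in mytuple))
print(emails)  # ['Andrew#gmail.com', 'Jim#gmail.com', 'Sarah#gmail.com']
```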

Efficiently test if two lists have the same elements and length

I have two lists:
alist = ['A','B','C','D']
anotherList = ['A','C','B','D']
I would like to write a function that returns True if both lists contain exactly the same elements and are the same length. I'm kind of new to this, so I came up with the following, which I'm pretty sure is terrible, and I'm trying to find a more efficient way. Thanks!
def smyFunction(aList,anotherList):
    n = 0
    for element in aList:
        if element in anotherList:
            n = n+1
    if n == len(aList):
        return True
    else:
        return False
The two ways that come to mind are:
1) Use collections.Counter
>>> from collections import Counter
>>> Counter(alist) == Counter(anotherList)
True
2) Compare the sorted lists
>>> sorted(alist) == sorted(anotherList)
True
Sort the lists with sorted and then compare them with ==:
>>> alist = ['A','B','C','D']
>>> anotherList = ['A','C','B','D']
>>> def smyFunction(aList,anotherList):
... return sorted(aList) == sorted(anotherList)
...
>>> smyFunction(alist, anotherList)
True
>>>
You need to sort them first in case the elements are out of order:
>>> alist = ['A','B','C','D']
>>> anotherList = ['D','A','C','B']
>>> alist == anotherList
False
>>> sorted(alist) == sorted(anotherList)
True
>>>
Actually, it would probably be better to test the length of the lists first and then use sorted:
return len(alist) == len(anotherList) and sorted(alist) == sorted(anotherList)
That way, we can avoid the sorting operations if the lengths of the lists differ to begin with (len on a list is O(1), i.e. constant time, so it is very cheap).
If there are no duplicates, use a set, since sets have no order:
set(alist) == set(anotherList)
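The set comparison only checks which distinct values occur: duplicates and length are ignored, so it can report True for lists the question would consider different. Counter and sorted both take multiplicities into account. A quick check with made-up example lists:

```python
from collections import Counter

a = ['A', 'A', 'B']
b = ['A', 'B', 'B']

print(set(a) == set(b))          # True: both are {'A', 'B'}
print(Counter(a) == Counter(b))  # False: 'A' occurs twice in a, once in b
print(sorted(a) == sorted(b))    # False: ['A', 'A', 'B'] != ['A', 'B', 'B']

# If the lists are known to have no duplicates, a length check plus a set
# comparison is enough.
c = ['A', 'B', 'C', 'D']
d = ['A', 'C', 'B', 'D']
print(len(c) == len(d) and set(c) == set(d))  # True
```

The two multiplicity-aware options have different requirements: Counter needs hashable elements, while sorted needs mutually comparable ones (sorting a mix of int and str raises TypeError in Python 3).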
Try it like this:
def check(a,b):
    return sorted(a) == sorted(b)

How to find the number of instances of an item in a list of lists

I want part of a script I am writing to do something like this.
x=0
y=0
list=[["cat","dog","mouse",1],["cat","dog","mouse",2],["cat","dog","mouse",3]]
row=list[y]
item=row[x]
print(list.count(item))
The problem is that this prints 0 because it isn't searching the individual lists. How can I make it return the total number of instances instead?
Search per sublist, adding up results per contained list with sum():
sum(sub.count(item) for sub in lst)
Demo:
>>> lst = [["cat","dog","mouse",1],["cat","dog","mouse",2],["cat","dog","mouse",3]]
>>> item = 'cat'
>>> sum(sub.count(item) for sub in lst)
3
sum() is a built-in function that adds up the items of an iterable.
The expression x.count(item) for x in list is a "generator expression" (similar to a list comprehension): a handy way to produce values one at a time without building an intermediate list.
item_count = sum(x.count(item) for x in list)
That should do it
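Applied to the snippet from the question, the sum() approach looks like this. The variable is renamed from list to lst so it no longer shadows the built-in list type; the indices x and y are unchanged.

```python
lst = [["cat", "dog", "mouse", 1], ["cat", "dog", "mouse", 2], ["cat", "dog", "mouse", 3]]
x = 0
y = 0

row = lst[y]
item = row[x]  # "cat"

# lst.count(item) would be 0: it compares item with each whole sublist.
# Counting inside each sublist and summing gives the total.
total = sum(sub.count(item) for sub in lst)
print(total)  # 3
```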
Using collections.Counter and itertools.chain.from_iterable:
>>> from collections import Counter
>>> from itertools import chain
>>> lst = [["cat","dog","mouse",1],["cat","dog","mouse",2],["cat","dog","mouse",3]]
>>> count = Counter(item for item in chain.from_iterable(lst) if not isinstance(item, int))
>>> count
Counter({'mouse': 3, 'dog': 3, 'cat': 3})
>>> count['cat']
3
I filtered out the ints because I didn't see why you had them in the first place.
