I have a list of tuples which I would like to only return the second column of data from and only unique values
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
Desired output:
['Andrew#gmail.com','Jim#gmail.com','Sarah#gmail.com']
My idea would be to iterate through the list and append the item from the second column into a new list then use the following code. Before I go down that path too far I know there is a better way to do this.
from collections import Counter
cnt = Counter(mytuple_new)
unique_mytuple_new = [k for k, v in cnt.iteritems() if v > 1]
You can use zip function :
>>> set(zip(*mytuple)[1])
set(['Sarah#gmail.com', 'Jim#gmail.com', 'Andrew#gmail.com'])
Or as a less performance way you can use map and operator.itemgetter and use set to get the unique tuple :
>>> from operator import itemgetter
>>> tuple(set(map(lambda x:itemgetter(1)(x),mytuple)))
('Sarah#gmail.com', 'Jim#gmail.com', 'Andrew#gmail.com')
a benchmarking on some answers :
my answer :
s = """\
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
set(zip(*mytuple)[1])
"""
print timeit.timeit(stmt=s, number=100000)
0.0740020275116
icodez answer :
s = """\
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
seen = set()
[x[1] for x in mytuple if x[1] not in seen and not seen.add(x[1])]
"""
print timeit.timeit(stmt=s, number=100000)
0.0938332080841
Hasan's answer :
s = """\
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
set([k[1] for k in mytuple])
"""
print timeit.timeit(stmt=s, number=100000)
0.0699651241302
Adem's answer :
s = """
from itertools import izip
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
set(map(lambda x: x[1], mytuple))
"""
print timeit.timeit(stmt=s, number=100000)
0.237300872803 !!!
unique_emails = set(item[1] for item in mytuple)
The list comprehension will help you generate a list containing only the second column data, and converting that list to set() removes duplicated values.
try:
>>> unique_mytuple_new = set([k[1] for k in mytuple])
>>> unique_mytuple_new
set(['Sarah#gmail.com', 'Jim#gmail.com', 'Andrew#gmail.com'])
You can use a list comprehension and a set to keep track of seen values:
>>> mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
>>> seen = set()
>>> [x[1] for x in mytuple if x[1] not in seen and not seen.add(x[1])]
['Andrew#gmail.com', 'Jim#gmail.com', 'Sarah#gmail.com']
>>>
The most important part of this solution is that order is preserved like in your example. Doing just set(x[1] for x in mytuple) or something similar will get you the unique items, but their order will be lost.
Also, the if x[1] not in seen and not seen.add(x[1]) may seem a little strange, but it is actually a neat trick that allows you to add items to the set inside the list comprehension (otherwise, we would need to use a for-loop).
Because and performs short-circuit evaluation in Python, not seen.add(x[1]) will only be evaluated if x[1] not in seen returns True. So, the condition sees if x[1] is in the set and adds it if not.
The not operator is placed before seen.add(x[1]) so that the condition evaluates to True if x[1] needed to be added to the set (set.add returns None, which is treated as False. not False is True).
How about the obvious and simple loop? There is no need to create a list and then convert to set, just don't add dupliates.
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
result = []
for item in mytuple:
if item[1] not in result:
result.append(item[1])
print result
Output:
['Andrew#gmail.com', 'Jim#gmail.com', 'Sarah#gmail.com']
Is the order of the items important? A lot of the proposed answers use set to unique-ify the list. That's good, proper, and performant if the order is unimportant. If order does matter, you can used an OrderedDict to perform set-like unique-ification while preserving order.
# test data
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
from collections import OrderedDict
emails = list(OrderedDict((t[1], 1) for t in mytuple).keys())
print emails
Yielding:
['Andrew#gmail.com', 'Jim#gmail.com', 'Sarah#gmail.com']
Update
Based on iCodez's suggestion, restating answer to:
from collections import OrderedDict
emails = list(OrderedDict.fromkeys(t[1] for t in mytuple).keys())
Related
Is there an efficient way without for loops to compare if an item inside a list of tuples is the same across all tuples in Python?
lst_tups = [('Hello', 1, 'Name:'), ('Goodbye', 1, 'Surname:'), ('See you!', 1, 'Time:')]
The expected output is Return all unique values for item in index 1 of the tuple
unique = list()
for i in lst_tups:
item = i[1]
unique.append(item)
set(unique)
Expected Output:
>>
Unique values: [1]
True if all are equal, otherwise False
I think the set comprehension is an acceptable way:
>>> unique = {i[1] for i in lst_tups}
>>> unique
{1}
If you want to avoid the for loop anyway, you can use operator.itemgetter and map (for large lists, it will be slightly more efficient than set comprehension, but the readability is worse):
>>> from operator import itemgetter
>>> unique = set(map(itemgetter(1), lst_tups))
>>> unique
{1}
Then you can confirm whether the elements are all the same by judging whether the length of the set is 1:
>>> len(unique) == 1
True
If you only want to get the result or the item you want to compare is unhashable (such as dict), you can use itertools.pairwise (in Python3.10+) to compare adjacent elements to judge (but that doesn't mean it will be faster):
>>> from itertools import pairwise, starmap
>>> from operator import itemgetter, eq
>>> all(i[1] == j[1] for i, j in pairwise(lst_tups))
True
>>> all(starmap(eq, pairwise(map(itemgetter(1), lst_tups))))
True
According to the questions raised in the comment area, when your unique item is in another position or the element itself in the sequence, the above method only needs to be slightly modified to achieve the purpose, so here are two more general solutions:
def all_equal_by_set(iterable):
return len(set(iterable)) == 1
def all_equal_by_compare(iterable):
return all(starmap(eq, pairwise(iterable)))
Then you just need to call them like this:
>>> all_equal_by_set(map(itemgetter(1), lst_tups))
True
>>> all_equal_by_set(tup[1] for tup in lst_tups) # Note that here is a generator expression, which is no longer comprehension.
True
>>> all_equal_by_compare(map(itemgetter(1), lst_tups))
True
>>> all_equal_by_compare(tup[1] for tup in lst_tups)
True
Solution without using for loop.
import operator
lst_tups = [('Hello', 1, 'Name:'), ('Goodbye', 1, 'Surname:'), ('See you!', 1, 'Time:')]
unique = set(map(operator.itemgetter(1),lst_tups))
print(unique) # {1}
Please consider above code and write if is an efficient way according to your standards.
You can use chain.from_iterable and slicing with three step : [1::3].
from itertools import chain
res = list(chain.from_iterable(lst_tups))[1::3]
print(set(res))
# If you want to print True if all are equal, otherwise False
if len(set(res)) == 1:
print('True')
else:
print('False')
{1}
I have the following list and nested list:
first_var = ["id1","id2","id3"]
second_var = [("id1","name1"),("id2","name2"),("id3","name3"),("id4","name4"),]
I want to check for each first element in 'second_var' that doesn't exist in 'first_var' and print the second element in the 'second_var'.
my code is:
for x in [x[0] for x in second_var]:
if x not in first_var:
print(...)
For now if i execute print(x) it prints:
id4
but i need it to print
name4
How can i achieve that?
The problem with your code is you are not iterating the original list. You are only iterating the first entry of each tuple within the list.
This is how you can adapt your code:
first_var = ["id1","id2","id3"]
second_var = [("id1","name1"),("id2","name2"),("id3","name3"),("id4","name4"),]
for x in second_var:
if x[0] not in first_var:
print(x[1])
The Pythonic solution is to convert this to a list comprehension:
values = set(first_var)
res = [x[1] for x in second_var if x[0] not in values]
for item in res:
print(item)
Or the functional version; not recommended, but another way of seeing the logic:
from operator import itemgetter
values = set(first_var)
res = map(itemgetter(1), filter(lambda x: x[0] not in values, second_var))
>>> [v[1] for v in second_var if v[0] not in first_var]
['name4']
You can use list comprehension feature.
ids = [tuple[1] for tuple in second_var if tuple[0] not in first_var]
print(ids)
Output
['name4']
The list comprehension statement above is equivalent to:
>>> result = []
for tuple in second_var:
if tuple[0] not in first_var:
result.append(tuple[1])
>>> result
['name4']
If you have a lot of data, you need to build a dictionary & use set & all those nice hashing/difference techniques already existing in Python instead of linear lookup in lists (O(1) vs O(n)).
first_var = ["id1","id2","id3"]
second_var = [("id1","name1"),("id2","name2"),("id3","name3"),("id4","name4"),]
second_d = dict(second_var) # create a dict directly from tuples
missing = set(second_d).difference(first_var)
for m in missing:
print(second_d[m])
this prints name4
missing is the difference between the dict keys and the list.
I have a list of 3-tuples in a Python program that I'm building while looking through a file (so one at a time), with the following setup:
(feature,combination,durationOfTheCombination),
such that if a unique combination of feature and combination is found, it will be added to the list. The list itself holds a similar setup, but the durationOfTheCombination is the sum of all duration that share the unique combination of (feature,combination). Therefore, when deciding if it should be added to the list, I need to only compare the first two parts of the tuple, and if a match is found, the duration is added to the corresponding list item.
Here's an example for clarity. If the input is
(ABC,123,10);(ABC,123,10);(DEF,123,5);(ABC,123,30);(EFG,456,30)
The output will be (ABC,123,50);(DEF,123,5);(EFG,456,30).
Is there any way to do this comparison?
You can do this with Counter,
In [42]: from collections import Counter
In [43]: lst = [('ABC',123,10),('ABC',123,10),('DEF',123,5)]
In [44]: [(i[0],i[1],i[2]*j) for i,j in Counter(lst).items()]
Out[44]: [('DEF', 123, 5), ('ABC', 123, 20)]
As per the OP suggestion if it's have different values, use groupby
In [26]: lst = [('ABC',123,10),('ABC',123,10),('ABC',123,25),('DEF',123,5)]
In [27]: [tuple(list(n)+[sum([i[2] for i in g])]) for n,g in groupby(sorted(lst,key = lambda x:x[:2]), key = lambda x:x[:2])]
Out[27]: [('ABC', 123, 45), ('DEF', 123, 5)]
If you don't want to use Counter, you can use a dict instead.
setOf3Tuples = dict()
def add3TupleToSet(a):
key = a[0:2]
if key in setOf3Tuples:
setOf3Tuples[a[0:2]] += a[2]
else:
setOf3Tuples[a[0:2]] = a[2]
def getRaw3Tuple():
for k in setOf3Tuples:
yield k + (setOf3Tuples[k],)
if __name__ == "__main__":
add3TupleToSet(("ABC",123,10))
add3TupleToSet(("ABC",123,10))
add3TupleToSet(("DEF",123,5))
print([i for i in getRaw3Tuple()])
It seems a dict is more suited than a list here, with the first 2 fields as key. And to avoid checking each time if the key is already here you can use a defaultdict.
from collections import defaultdict
d = defaultdict(int)
for t in your_list:
d[t[:2]] += t[-1]
Assuming your input is collected in a list as below, you can use pandas groupby to accomplish this quickly:
import pandas as pd
input = [('ABC',123,10),('ABC',123,10),('DEF',123,5),('ABC',123,30),('EFG',456,30)]
output = [tuple(x) for x in pd.DataFrame(input).groupby([0,1])[2].sum().reset_index().values]
I have a list populated from entries of a log; for sake of simplicity, something like
listlog = ["entry1:abcde", "entry2:abbds", "entry1:eorieo", "entry3:orieqor", "entry2:iroewiow"......]
This list can have an undefined number of entry, which may or may not be in sequence, since I run multiple operations in async fashion.
Then I have another list, which I use as reference to get only the list of entries; which may be like
list_template = ["entry1", "entry2", "entry3"]
I am trying to use the second list, to get sequences of entries, so I can isolate the single sequence, taking only the first instance found of each entry.
Since I am not dealing with numbers, I can't use set, so I did try with a loop inside a loop, comparing values in each list
This does not work, because it is possible that another entry may happen before what I am looking for (say, I want entry1, entry2, entry3, and the loop find entry1, but then find entry3, and since I compare every element of each list, it will be happy to find an element)
for item in listlog:
entry, value = item.split(":")
for reference_entry in list_template:
if entry == reference_entry:
print item
break
I have to, in a nutshell, find a sequence as in the template list, while these items are not necessarily in order. I am trying to parse the list once, otherwise I could do a very expensive multi-pass for each element of the template list, until I find the first occurrence and bail out. I thought that doing the loop in the loop is more efficient, since my reference list is always smaller than the log list, which is usually few elements.
How would you approach this problem, in the most efficient and pythonic way? All that I can think of, is multiple passes on the log list
you can use dict:
>>> listlog
['entry1:abcde', 'entry2:abbds', 'entry1:eorieo', 'entry3:orieqor', 'entry2:iroewiow']
>>> list_template
['entry1', 'entry2', 'entry3']
>>> for x in listlog:
... key, value = x.split(":")
... if key not in my_dict and key in list_template:
... my_dict[key] = value
...
>>> my_dict
{'entry2': 'abbds', 'entry3': 'orieqor', 'entry1': 'abcde'}
Disclaimer : This answer could use someone's insight on performance. Sure, list/dict comprehensions and zip are pythonic but the following may very well be a poor use of those tools.
You could use zip :
>>> data = ["a:12", "b:32", "c:54"]
>>> ref = ['c', 'b']
>>> matches = zip(ref, [val for key,val in [item.split(':') for item in data] if key in ref])
>>> for k, v in matches:
>>> print("{}:{}".format(k, v))
c:32
b:54
Here's another (worse? I'm not sure, performance-wise) way to get around this :
>>> data = ["a:12", "b:32", "c:54"]
>>> data_dict = {x:y for x,y in [item.split(':') for item in data]}
>>> ["{}:{}".format(key, val) for key,val in md.items() if key in ref]
['b:32', 'c:54']
Explanation :
Convert your initial list into a dict using a dict
For each pair of (key, val) found in the dict, join both in a string if the key is found in the 'ref' list
You can use a list comprehension something like this:
import re
listlog = ["entry1:abcde", "entry2:abbds", "entry1:eorieo", "entry3:orieqor", "entry2:iroewiow"]
print([item for item in listlog if re.search('entry', item)])
# ['entry1:abcde', 'entry2:abbds', 'entry1:eorieo', 'entry3:orieqor', 'entry2:iroewiow']
Than u can split 'em as u wish and create a dictonary if u want:
import re
listlog = ["entry1:abcde", "entry2:abbds", "entry1:eorieo", "entry3:orieqor", "entry2:iroewiow"]
mylist = [item for item in listlog if re.search('entry', item)]
def create_dict(string, dict_splitter=':'):
_dict = {}
temp = string.split(dict_splitter)
key = temp[0]
value = temp[1]
_dict[key] = value
return _dict
mydictionary = {}
for x in mylist:
x = str(x)
mydictionary.update(create_dict(x))
for k, v in mydictionary.items():
print(k, v)
# entry1 eorieo
# entry2 iroewiow
# entry3 orieqor
As you see this method need an update, cause we have changing the dictionary value. That's bad. Most better to update value for the same key. But it's much easier as u can think
I'm trying to compare multiple lists. However the lists aren't label...normally. I'm using a while loop to make a new list each time and label them accordingly. So for example, if the while loop runs 3 times it will make a List1 a List2 and List3. Here is then snippet of the code to create the list.
for link in links:
print('*', link.text)
locals()['list{}'.format(str(i))].append(link.text)
So I want to compare each list for the strings that are in them but I want to compare all the lists at once then print out the common strings.
I feel like I'll be using something like this, but I'm not 100% sure.
lists = [list1, list2, list3, list4, list5, list6, list7, list8, list9, list10]
common = list(set().union(*lists).intersection(Keyword))
Rather than directly modifying locals() (generally not a good idea), use a defaultdict as a container. This data structure allows you to create new key-value pairs on the fly rather than relying on a method which is sure to lead to a NameError at some point.
from collections import defaultdict
i = ...
link_lists = defaultdict(list)
for link in links:
print('*', link.text)
link_lists[i].append(link.text)
To find the intersection of all of the lists:
all_lists = list(link_lists.values())
common_links = set(all_lists[0]).intersection(*all_lists[1:])
In Python 2.6+, you can pass multiple iterables to set.intersection. This is what the star-args do here.
Here's an example of how the intersection will work:
>>> from collections import defaultdict
>>> c = defaultdict(list)
>>> c[9].append("a")
>>> c[0].append("b")
>>> all = list(c.values())
>>> set(all[0]).intersection(*all[1:])
set()
>>> c[0].append("a")
>>> all = list(c.values())
>>> set(all[0]).intersection(*all[1:])
{'a'}
You have several options,
option a)
use itertools to get a cartesian product, this is quite nice because its an iterator
a = ["A", "B", "C"]
b = ["A","C"]
c = ["C","D","E"]
for aval,bval,cval in itertools.product(a,b,c):
if aval == bval and bval == cval:
print aval
option b)
Use sets (recommended):
all_lists = []
# insert your while loop X times
for lst in lists: # This is my guess of your loop running.
currentList = map(lambda x: x.link, links)
all_lists.append(currentList) # O(1) operation
result_set = set()
if len(all_lists)>1:
result_set = set(all_lists[0]).intersection(*all_lists[1:])
else:
result_set = set(all_lists[0])
Using the sets, however, will be faster