I wish to compare two nested lists. If there is a match between the first element of each sublist, I wish to add the matched element to a new list for further operations. Below is an example and what I've tried so far:
Example:
x = [['item1','somethingelse1'], ['item2', 'somethingelse2']...]
y = [['item1','somethingelse3'], ['item3','somethingelse4']...]
What I've I tried so far:
match = []
for itemx in x:
for itemy in y:
if itemx[0] == itemy[0]:
match.append(itemx)
The above of what I tried did the job of appending the matched item into the new list, but I have two very long nested lists, and what I did above is very slow for operating on very long lists. Are there any more efficient ways to get out the matched item between two nested lists?
Yes, use a data structure with constant-time membership testing. So, using a set, for example:
seen = set()
for first,_ in x:
seen.add(first)
matched = []
for first,_ in y:
if first in seen:
matched.append(first)
Or, more succinctly using set/list comprehensions:
seen = {first for first,_ in x}
matched = [first for first,_ in y if first in seen]
(This was before the OP changed the question from append(itemx[0]) to append(itemx)...)
>>> {a[0] for a in x} & {b[0] for b in y}
{'item1'}
Or if the inner lists are always pairs:
>>> dict(x).keys() & dict(y)
{'item1'}
IIUC using numpy:
import numpy as np
y=[l[0] for l in y]
x=np.array(x)
x[np.isin(x[:, 0], y)]
I have a Python list like:
['user#gmail.com', 'someone#hotmail.com'...]
And I want to extract only the strings after # into another list directly, such as:
mylist = ['gmail.com', 'hotmail.com'...]
Is it possible? split() doesn't seem to be working with lists.
This is my try:
for x in range(len(mylist)):
mylist[x].split("#",1)[1]
But it didn't give me a list of the output.
You're close, try these small tweaks:
Lists are iterables, which means its easier to use for-loops than you think:
for x in mylist:
#do something
Now, the thing you want to do is 1) split x at '#' and 2) add the result to another list.
#In order to add to another list you need to make another list
newlist = []
for x in mylist:
split_results = x.split('#')
# Now you have a tuple of the results of your split
# add the second item to the new list
newlist.append(split_results[1])
Once you understand that well, you can get fancy and use list comprehension:
newlist = [x.split('#')[1] for x in mylist]
That's my solution with nested for loops:
myl = ['user#gmail.com', 'someone#hotmail.com'...]
results = []
for element in myl:
for x in element:
if x == '#':
x = element.index('#')
results.append(element[x+1:])
I have a list:
list1 = [1,2,3]
I'm looking up info for each item in the list via some arbitrary function, and want to add the results so the list becomes:
list1 = [(1,a),(2,b),(3,x)]
How to best accomplish this in Python3?
for item in list1:
newinfo = some_arbitrary_function(item)
item = (item, newinfo)
Does not appear to work.
You need a list comprehension:
lst = [1,2,3]
result = [(item, function(item)) for item in lst]
Using list as a name is not a good idea, you'll shadow the original list builtin making it inaccessible later in your code.
In case you want to keep the reference to the original list:
lst[:] = [(item, function(item)) for item in lst]
It is not possible to assign a list item with the iteration variable (item in your case). In fact, there is nothing special about the variable item compared to an assignment with =. If the operation is intended to be in-place, you should do
for i, item in enumerate(list):
list[i] = (item, func(item))
Also, you should not name your list list, because it will hide the built-in type list.
It looks like you just want to change the values in the list to go from a single to a tuple containing the original value and the result of some lookup on that value. This should do what you want.
zzz = [1,2,3]
i = 0
for num in zzz:
zzz[i] = (num, somefunc(num))
i += 1
running this
zzz = [1,2,3]
i = 0
for num in zzz:
zzz[i] = (num, 8)
i += 1
gives the results zzz = [(1,8), (2,8), (3,8)]
If you have list1 = [1,2,3] and list2 = [x,y,z], zip(list1, list2) will give you what you're looking for. You will need to iterate over the first list with the function to find x, y, z and put it in a list.
zip(list1, list2)
[(1, x), (2, y), (3, z)]
I have a list of tuples which I would like to only return the second column of data from and only unique values
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
Desired output:
['Andrew#gmail.com','Jim#gmail.com','Sarah#gmail.com']
My idea would be to iterate through the list and append the item from the second column into a new list then use the following code. Before I go down that path too far I know there is a better way to do this.
from collections import Counter
cnt = Counter(mytuple_new)
unique_mytuple_new = [k for k, v in cnt.iteritems() if v > 1]
You can use zip function :
>>> set(zip(*mytuple)[1])
set(['Sarah#gmail.com', 'Jim#gmail.com', 'Andrew#gmail.com'])
Or as a less performance way you can use map and operator.itemgetter and use set to get the unique tuple :
>>> from operator import itemgetter
>>> tuple(set(map(lambda x:itemgetter(1)(x),mytuple)))
('Sarah#gmail.com', 'Jim#gmail.com', 'Andrew#gmail.com')
a benchmarking on some answers :
my answer :
s = """\
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
set(zip(*mytuple)[1])
"""
print timeit.timeit(stmt=s, number=100000)
0.0740020275116
icodez answer :
s = """\
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
seen = set()
[x[1] for x in mytuple if x[1] not in seen and not seen.add(x[1])]
"""
print timeit.timeit(stmt=s, number=100000)
0.0938332080841
Hasan's answer :
s = """\
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
set([k[1] for k in mytuple])
"""
print timeit.timeit(stmt=s, number=100000)
0.0699651241302
Adem's answer :
s = """
from itertools import izip
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
set(map(lambda x: x[1], mytuple))
"""
print timeit.timeit(stmt=s, number=100000)
0.237300872803 !!!
unique_emails = set(item[1] for item in mytuple)
The list comprehension will help you generate a list containing only the second column data, and converting that list to set() removes duplicated values.
try:
>>> unique_mytuple_new = set([k[1] for k in mytuple])
>>> unique_mytuple_new
set(['Sarah#gmail.com', 'Jim#gmail.com', 'Andrew#gmail.com'])
You can use a list comprehension and a set to keep track of seen values:
>>> mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
>>> seen = set()
>>> [x[1] for x in mytuple if x[1] not in seen and not seen.add(x[1])]
['Andrew#gmail.com', 'Jim#gmail.com', 'Sarah#gmail.com']
>>>
The most important part of this solution is that order is preserved like in your example. Doing just set(x[1] for x in mytuple) or something similar will get you the unique items, but their order will be lost.
Also, the if x[1] not in seen and not seen.add(x[1]) may seem a little strange, but it is actually a neat trick that allows you to add items to the set inside the list comprehension (otherwise, we would need to use a for-loop).
Because and performs short-circuit evaluation in Python, not seen.add(x[1]) will only be evaluated if x[1] not in seen returns True. So, the condition sees if x[1] is in the set and adds it if not.
The not operator is placed before seen.add(x[1]) so that the condition evaluates to True if x[1] needed to be added to the set (set.add returns None, which is treated as False. not False is True).
How about the obvious and simple loop? There is no need to create a list and then convert to set, just don't add dupliates.
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
result = []
for item in mytuple:
if item[1] not in result:
result.append(item[1])
print result
Output:
['Andrew#gmail.com', 'Jim#gmail.com', 'Sarah#gmail.com']
Is the order of the items important? A lot of the proposed answers use set to unique-ify the list. That's good, proper, and performant if the order is unimportant. If order does matter, you can used an OrderedDict to perform set-like unique-ification while preserving order.
# test data
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
from collections import OrderedDict
emails = list(OrderedDict((t[1], 1) for t in mytuple).keys())
print emails
Yielding:
['Andrew#gmail.com', 'Jim#gmail.com', 'Sarah#gmail.com']
Update
Based on iCodez's suggestion, restating answer to:
from collections import OrderedDict
emails = list(OrderedDict.fromkeys(t[1] for t in mytuple).keys())