Comparing Multiple Lists Python

I'm trying to compare multiple lists. However, the lists aren't labeled normally: I'm using a while loop to make a new list each time and label them accordingly. For example, if the while loop runs 3 times, it will make list1, list2 and list3. Here is the snippet of code that creates the lists:
for link in links:
    print('*', link.text)
    locals()['list{}'.format(str(i))].append(link.text)
I want to find the strings the lists have in common, comparing all of the lists at once, and then print out those common strings.
I feel like I'll be using something like this, but I'm not 100% sure.
lists = [list1, list2, list3, list4, list5, list6, list7, list8, list9, list10]
common = list(set().union(*lists).intersection(Keyword))

Rather than modifying locals() directly (generally not a good idea: the Python docs warn that changes to the dictionary it returns may not affect the actual local variables), use a defaultdict as a container. This data structure creates new key-value pairs on the fly, rather than relying on dynamically named variables, an approach that is sure to lead to a NameError at some point.
from collections import defaultdict
i = ...
link_lists = defaultdict(list)
for link in links:
    print('*', link.text)
    link_lists[i].append(link.text)
To find the intersection of all of the lists:
all_lists = list(link_lists.values())
common_links = set(all_lists[0]).intersection(*all_lists[1:])
In Python 2.6+, set.intersection accepts multiple iterables; the star-args unpack all_lists[1:] into separate arguments.
Here's an example of how the intersection will work:
>>> from collections import defaultdict
>>> c = defaultdict(list)
>>> c[9].append("a")
>>> c[0].append("b")
>>> all = list(c.values())
>>> set(all[0]).intersection(*all[1:])
set()
>>> c[0].append("a")
>>> all = list(c.values())
>>> set(all[0]).intersection(*all[1:])
{'a'}
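A self-contained sketch of the defaultdict-plus-intersection approach follows. The batches data is hypothetical and stands in for the scraped link texts, one inner list per pass of the while loop. The common_strings helper is also illustrative; it adds a guard because all_lists[0] raises IndexError when no lists were collected at all.

```python
from collections import defaultdict

# Hypothetical data: each inner list plays the role of the link texts
# collected on one pass of the while loop.
batches = [
    ["home", "about", "contact"],
    ["about", "contact", "blog"],
    ["contact", "about", "shop"],
]

link_lists = defaultdict(list)
for i, batch in enumerate(batches):
    for text in batch:
        link_lists[i].append(text)  # the key i is created on first use

def common_strings(lists):
    """Return the set of strings present in every list (empty set if none)."""
    lists = list(lists)
    if not lists:
        return set()
    # set.intersection accepts any number of iterables, so unpack the rest
    return set(lists[0]).intersection(*lists[1:])

common = common_strings(link_lists.values())
print(sorted(common))  # ['about', 'contact']
```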

You have several options.
option a)
Use itertools.product to get a Cartesian product. This is quite nice because it's an iterator, but it checks every combination of items:
import itertools
a = ["A", "B", "C"]
b = ["A", "C"]
c = ["C", "D", "E"]
for aval, bval, cval in itertools.product(a, b, c):
    if aval == bval and bval == cval:
        print(aval)
option b)
Use sets (recommended):
all_lists = []
# insert your while loop X times
for links in lists:  # my guess at your loop: each pass yields one batch of links
    current_list = [link.text for link in links]
    all_lists.append(current_list)  # O(1) operation
result_set = set()
if len(all_lists) > 1:
    result_set = set(all_lists[0]).intersection(*all_lists[1:])
elif all_lists:
    result_set = set(all_lists[0])
Using sets will be faster, though: set membership tests take O(1) time on average, whereas itertools.product checks every combination (len(a) * len(b) * len(c) of them).
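A quick sketch comparing both options on the sample lists from option a): the product walks all 3 * 2 * 3 = 18 combinations, while the set version intersects once. The names via_product and via_sets are illustrative only, and the chained comparison x == y == z is equivalent to aval == bval and bval == cval.

```python
import itertools

a = ["A", "B", "C"]
b = ["A", "C"]
c = ["C", "D", "E"]

# Option a: test every (a, b, c) combination -> len(a) * len(b) * len(c) checks
via_product = [x for x, y, z in itertools.product(a, b, c) if x == y == z]

# Option b: intersect sets; intersection() accepts several iterables at once
via_sets = set(a).intersection(b, c)

print(via_product)  # ['C']
print(via_sets)     # {'C'}
```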

Related

How to merge n lists together item by item for each list

I want to make one large list, for entering into a database, with values from 4 different lists. I want it to look like this:
[[list1[0], list2[0], list3[0], list4[0]], [list1[1], list2[1], list3[1], list4[1]], etc.....]
Another issue is that the data is currently received like this:
[ [ [list1[0], list1[1], [list1[3]]], [[list2[0]]], etc.....]
I've tried looping through each list using indexes and adding the values to a new list, but it hasn't worked. I'm pretty sure it failed because some of the lists have different lengths (they're not meant to, but it's automated data, so sometimes there's a mistake).
Does anyone know the best way to go about this? Thanks.

The first structure can be built with the zip function, as follows (for 4 lists; each row comes out as a tuple):
list1 = [1,2,3,4]
list2 = [5,6,7,8]
list3 = [9,10,11,12]
list4 = [13,14,15,16]
res = list(zip(list1,list2,list3,list4))
For an arbitrary number of lists stored in another list, you can use *-notation to unpack the outer list:
lists = [...]
res = list(zip(*lists))
To build the list of lists to zip from the nested data in your second issue, flatten each element first and then zip:
def flatten(l):
    res = []
    for el in l:
        if isinstance(el, list):
            res += flatten(el)  # recurse into nested lists
        else:
            res.append(el)
    return res
auto_data = [...]
res = list(zip(*[flatten(el) for el in auto_data]))
Some clarification at the end:
zip stops at the shortest of its inputs, so if the flattened lists differ in length, you need to pad them to one common length in the list comprehension on the last line, or you will lose data.
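A minimal sketch of the padding step, assuming hypothetical nested data in place of the elided auto_data. It swaps in itertools.zip_longest, which pads shorter inputs with a fillvalue (None here) instead of truncating; whether None is an acceptable placeholder for your database is an assumption.

```python
from itertools import zip_longest

def flatten(l):
    """Recursively flatten nested lists into one flat list."""
    res = []
    for el in l:
        if isinstance(el, list):
            res += flatten(el)
        else:
            res.append(el)
    return res

# Hypothetical auto-generated data: one nested structure per source list,
# with the first two lists one value short of the third.
auto_data = [
    [[1, 2], [3]],
    [[5], [6, 7]],
    [9, [10, [11, 12]]],
]

columns = [flatten(el) for el in auto_data]

# zip() stops at the shortest input, so the row containing 12 is lost
truncated = [list(row) for row in zip(*columns)]

# zip_longest() keeps every row and fills the gaps with fillvalue
padded = [list(row) for row in zip_longest(*columns, fillvalue=None)]

print(truncated)  # [[1, 5, 9], [2, 6, 10], [3, 7, 11]]
print(padded)     # [[1, 5, 9], [2, 6, 10], [3, 7, 11], [None, None, 12]]
```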

So if I understand correctly, this is your input:
l = [[1.1,1.2,1.3,1.4],[2.1,2.2,2.3,2.4],[3.1,3.2,3.3,3.4],[4.1,4.2,4.3,4.4]]
and you would like to have this output
[[1.1,2.1,3.1,4.1],...]
If so, this can be done with zip (in Python 3, zip returns an iterator of tuples, so convert each row to a list):
[list(row) for row in zip(*l)]

Make a for loop that only gives you the counter variable, and use that variable to index the lists. Make a temporary list, fill it with the values from the other lists, and add that list to the final one. This gives you the desired structure.
nestedlist = []
for counter in range(0, x):  # x: number of rows; must not exceed the shortest list's length
    temporarylist = []
    temporarylist.append(firstlist[counter])
    temporarylist.append(secondlist[counter])
    temporarylist.append(thirdlist[counter])
    temporarylist.append(fourthlist[counter])
    nestedlist.append(temporarylist)
If all 4 lists are the same length, you can use this code to make it even nicer:
nestedlist = []
for counter in range(0, len(firstlist)):  # changed line
    temporarylist = []
    temporarylist.append(firstlist[counter])
    temporarylist.append(secondlist[counter])
    temporarylist.append(thirdlist[counter])
    temporarylist.append(fourthlist[counter])
    nestedlist.append(temporarylist)
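A small check, using hypothetical sample lists, that the index-based loop builds the same rows as zip. Setting x to the shortest list's length is an assumption made here so that uneven lists cannot raise IndexError.

```python
firstlist = [1, 2, 3]
secondlist = [4, 5, 6]
thirdlist = [7, 8, 9]
fourthlist = [10, 11, 12]

# Index-based loop, bounded by the shortest list
x = min(len(firstlist), len(secondlist), len(thirdlist), len(fourthlist))
nestedlist = []
for counter in range(x):
    nestedlist.append([firstlist[counter], secondlist[counter],
                       thirdlist[counter], fourthlist[counter]])

# zip() does the same pairing and also stops at the shortest list
via_zip = [list(row) for row in zip(firstlist, secondlist, thirdlist, fourthlist)]

print(nestedlist == via_zip)  # True
```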

This comprehension should work, with a little help from zip:
mylist = [i for i in zip(list1, list2, list3, list4)]
But this assumes all the lists are the same length. If that's not the case (or you're not sure), you can "pad" them to the same length first:
def padlist(some_list, desired_length, pad_with):
    # note: appends to some_list in place, so the original lists are modified
    while len(some_list) < desired_length:
        some_list.append(pad_with)
    return some_list
list_of_lists = [list1, list2, list3, list4]
maxlength = len(max(list_of_lists, key=len))
list_of_lists = [padlist(l, maxlength, 0) for l in list_of_lists]
Then apply the comprehension from above; it worked well in my testing:
mylist = [i for i in zip(*list_of_lists)]
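Because padlist appends to the lists it receives, list1 to list4 are modified in place. If the originals must stay untouched, here is a sketch that pads copies instead (the sample values are hypothetical):

```python
list1 = [1, 2, 3]
list2 = [4, 5]
list3 = [6]
list4 = [7, 8, 9]

list_of_lists = [list1, list2, list3, list4]
maxlength = max(len(l) for l in list_of_lists)

# l + [...] builds a new list, so the originals are not changed
padded = [l + [0] * (maxlength - len(l)) for l in list_of_lists]
mylist = [list(row) for row in zip(*padded)]

print(mylist)  # [[1, 4, 6, 7], [2, 5, 0, 8], [3, 0, 0, 9]]
print(list2)   # [4, 5]
```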

If the flatten approach doesn't work, try this:
import numpy as np
myArray = np.array([[list1[0], list2[0], list3[0], list4[0]], [list1[1], list2[1], list3[1], list4[1]]])
np.hstack(myArray)
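This should also work (a 2-D array is treated as a sequence of its 1-D rows, so they are concatenated along axis 0; axis=1 would be out of bounds for those rows):
np.concatenate(myArray, axis=0)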
For those searching for a solution to this problem when the lists are the same length:
def flatten(lists):
    results = []
    for numbers in lists:
        for output in numbers:
            results.append(output)
    return results
print(flatten(n))  # n: your list of lists

How do I convert multiple lists inside a list using Python? [duplicate]

This question already has answers here:
How do I make a flat list out of a list of lists?
(34 answers)
Closed 7 years ago.
I want to convert multiple lists inside a list into one comma-separated string. I am doing it with a loop, but there is no comma between the last item of one sublist and the first item of the next.
myList = [['a','b','c','d'],['a','b','c','d']]
myString = ''
for x in myList:
    myString += ",".join(x)
print(myString)
output:
a,b,c,da,b,c,d
desired output:
a,b,c,d,a,b,c,d

This can be done with a list comprehension that "flattens" your list of lists into a single list, followed by the join method to turn that list into a string. The ',' is the separator placed between the items.
','.join([item for sub_list in myList for item in sub_list])
Note: see my analysis below; when I timed this against the other proposed solutions, it turned out to be the slowest.
Demo:
myList = [['a','b','c','d'],['a','b','c','d']]
result = ','.join([item for sub_list in myList for item in sub_list])
output of result -> a,b,c,d,a,b,c,d
To break this down further and show how it works, here is the same logic written out as explicit loops:
# create a new list called my_new_list
my_new_list = []
# Next we want to iterate over the outer list
for sub_list in myList:
    # Now go over each item of the sublist
    for item in sub_list:
        # append it to our new list
        my_new_list.append(item)
So at this point, outputting my_new_list will yield:
['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']
So now all we have to do is turn this into a string. This is where ','.join() comes into play. We simply make this call:
myString = ','.join(my_new_list)
Outputting that will give us:
a,b,c,d,a,b,c,d
Further Analysis
Looking at this further really piqued my interest: I suspected the other solutions might in fact be faster, so why not test it?
I took each of the proposed solutions and timed it on a larger sample set. Running the code yielded the following results (total seconds for 1,000,000 runs each, the Timer.timeit() default, fastest first):
map: 3.8023074030061252
chain: 7.675725881999824
comprehension: 8.73164687899407
So, the clear winner here is in fact the map implementation. If anyone is interested, here is the code used to time the results:
from timeit import Timer
def comprehension(l):
    return ','.join([i for sub_list in l for i in sub_list])
def chain(l):
    from itertools import chain  # this import runs on every call, adding a little overhead to this timing
    return ','.join(chain.from_iterable(l))
def a_map(l):
    return ','.join(map(','.join, l))
myList = [[str(i) for i in range(10)] for j in range(10)]
print(Timer(lambda: comprehension(myList)).timeit())
print(Timer(lambda: chain(myList)).timeit())
print(Timer(lambda: a_map(myList)).timeit())

from itertools import chain
myList = [['a','b','c','d'],['a','b','c','d']]
print(','.join(chain.from_iterable(myList)))
a,b,c,d,a,b,c,d

You could also just join at both levels:
>>> ','.join(map(','.join, myList))
'a,b,c,d,a,b,c,d'
It's shorter and significantly faster than the other solutions:
>>> myList = [['a'] * 1000] * 1000
>>> from timeit import timeit
>>> timeit(lambda: ','.join(map(','.join, myList)), number=10)
0.18380278121490046
>>> from itertools import chain
>>> timeit(lambda: ','.join(chain.from_iterable(myList)), number=10)
0.6535200733309843
>>> timeit(lambda: ','.join([item for sub_list in myList for item in sub_list]), number=10)
1.0301431917067738
I also tried [['a'] * 10] * 10, [['a'] * 10] * 100000 and [['a'] * 100000] * 10 and it was always the same picture.
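The nested join and the flattening approaches are not interchangeable in every case. A short sketch on hypothetical data: an empty sublist makes the nested join emit an extra comma, and str.join only accepts strings, so non-string items need converting first.

```python
from itertools import chain

data = [['a', 'b'], [], ['c']]

# Inner join turns the empty sublist into '', which the outer join still separates
nested = ','.join(map(','.join, data))      # 'a,b,,c'

# Flattening first skips empty sublists entirely
flat = ','.join(chain.from_iterable(data))  # 'a,b,c'

# str.join raises TypeError on non-strings, so convert them with str first
numbers = [[1, 2], [3]]
as_text = ','.join(map(str, chain.from_iterable(numbers)))  # '1,2,3'
```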

myList = [['a','b','c','d'],['a','b','c','d']]
smyList = myList[0] + myList[1]  # concatenates the two sublists (assumes exactly two)
str1 = ','.join(str(x) for x in smyList)
print(str1)
output
a,b,c,d,a,b,c,d

Dropping values from a list of tuples

I have a list of tuples from which I would like to return only the second column of data, with duplicates removed:
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
Desired output:
['Andrew#gmail.com','Jim#gmail.com','Sarah#gmail.com']
My idea would be to iterate through the list, append the item from the second column to a new list, and then use the following code. Before I go too far down that path, I suspect there is a better way to do this.
from collections import Counter
cnt = Counter(mytuple_new)
unique_mytuple_new = [k for k, v in cnt.iteritems() if v > 1]

You can use the zip function (Python 2; in Python 3, zip returns an iterator, so use set(list(zip(*mytuple))[1]) instead):
>>> set(zip(*mytuple)[1])
set(['Sarah#gmail.com', 'Jim#gmail.com', 'Andrew#gmail.com'])
Or, less efficiently, you can use map with operator.itemgetter and then set to get the unique values:
>>> from operator import itemgetter
>>> tuple(set(map(itemgetter(1), mytuple)))
('Sarah#gmail.com', 'Jim#gmail.com', 'Andrew#gmail.com')
Benchmarks of some of the answers (Python 2; each snippet assumes import timeit):
My answer:
s = """\
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
set(zip(*mytuple)[1])
"""
print timeit.timeit(stmt=s, number=100000)
0.0740020275116
iCodez's answer:
s = """\
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
seen = set()
[x[1] for x in mytuple if x[1] not in seen and not seen.add(x[1])]
"""
print timeit.timeit(stmt=s, number=100000)
0.0938332080841
Hasan's answer:
s = """\
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
set([k[1] for k in mytuple])
"""
print timeit.timeit(stmt=s, number=100000)
0.0699651241302
Adem's answer:
s = """
from itertools import izip
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
set(map(lambda x: x[1], mytuple))
"""
print timeit.timeit(stmt=s, number=100000)
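0.237300872803 (note that this statement also re-runs the unused izip import on every iteration, which adds to its time)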

unique_emails = set(item[1] for item in mytuple)
The generator expression yields only the second-column values, and passing them to set() removes the duplicates without building an intermediate list.

try:
>>> unique_mytuple_new = set([k[1] for k in mytuple])
>>> unique_mytuple_new
set(['Sarah#gmail.com', 'Jim#gmail.com', 'Andrew#gmail.com'])

You can use a list comprehension and a set to keep track of seen values:
>>> mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
>>> seen = set()
>>> [x[1] for x in mytuple if x[1] not in seen and not seen.add(x[1])]
['Andrew#gmail.com', 'Jim#gmail.com', 'Sarah#gmail.com']
>>>
The most important part of this solution is that order is preserved like in your example. Doing just set(x[1] for x in mytuple) or something similar will get you the unique items, but their order will be lost.
Also, the if x[1] not in seen and not seen.add(x[1]) may seem a little strange, but it is actually a neat trick that allows you to add items to the set inside the list comprehension (otherwise, we would need to use a for-loop).
Because and performs short-circuit evaluation in Python, not seen.add(x[1]) will only be evaluated if x[1] not in seen returns True. So, the condition sees if x[1] is in the set and adds it if not.
The not operator is placed before seen.add(x[1]) so that the condition evaluates to True if x[1] needed to be added to the set (set.add returns None, which is treated as False. not False is True).

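How about the obvious, simple loop? There is no need to create a list and then convert it to a set; just don't add duplicates. (Note that "in" on a list is a linear scan, so this is quadratic in the worst case, which is fine for small inputs.)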
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
result = []
for item in mytuple:
    if item[1] not in result:
        result.append(item[1])
print(result)
Output:
['Andrew#gmail.com', 'Jim#gmail.com', 'Sarah#gmail.com']

Is the order of the items important? Many of the proposed answers use set to de-duplicate the list. That's good, proper, and performant if order is unimportant. If order does matter, you can use an OrderedDict to perform set-like de-duplication while preserving order.
# test data
mytuple = [('Andrew','Andrew#gmail.com','20'),('Jim',"Jim#gmail.com",'12'),("Sarah","Sarah#gmail.com",'43'),("Jim","Jim#gmail.com",'15'),("Andrew","Andrew#gmail.com",'56')]
from collections import OrderedDict
emails = list(OrderedDict((t[1], 1) for t in mytuple).keys())
print(emails)
Yielding:
['Andrew#gmail.com', 'Jim#gmail.com', 'Sarah#gmail.com']
Update
Based on iCodez's suggestion, the answer can be restated as:
from collections import OrderedDict
emails = list(OrderedDict.fromkeys(t[1] for t in mytuple).keys())
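On Python 3.7 and later, plain dicts also preserve insertion order, so dict.fromkeys can stand in for OrderedDict. A sketch, cross-checked against the seen-set comprehension from the earlier answer:

```python
mytuple = [('Andrew', 'Andrew#gmail.com', '20'), ('Jim', 'Jim#gmail.com', '12'),
           ('Sarah', 'Sarah#gmail.com', '43'), ('Jim', 'Jim#gmail.com', '15'),
           ('Andrew', 'Andrew#gmail.com', '56')]

# Python 3.7+: dict keys keep insertion order, so fromkeys de-duplicates
# while preserving the order in which each email first appears
emails = list(dict.fromkeys(t[1] for t in mytuple))

# Same result with the seen-set comprehension
seen = set()
via_seen = [t[1] for t in mytuple if t[1] not in seen and not seen.add(t[1])]

print(emails)  # ['Andrew#gmail.com', 'Jim#gmail.com', 'Sarah#gmail.com']
```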

How to find the number of instances of an item in a list of lists

I want part of a script I am writing to do something like this.
x=0
y=0
list=[["cat","dog","mouse",1],["cat","dog","mouse",2],["cat","dog","mouse",3]]
row=list[y]
item=row[x]
print list.count(item)
The problem is that this prints 0 because it doesn't search the individual lists. How can I make it return the total number of instances instead?

Search per sublist, adding up results per contained list with sum():
sum(sub.count(item) for sub in lst)
Demo:
>>> lst = [["cat","dog","mouse",1],["cat","dog","mouse",2],["cat","dog","mouse",3]]
>>> item = 'cat'
>>> sum(sub.count(item) for sub in lst)
3

sum() is a built-in function that adds up the items of an iterable.
The x.count(item) for x in list part is a "generator expression" (similar to a list comprehension, but it produces its values lazily instead of building a list).
item_count = sum(x.count(item) for x in list)
That should do it

Using collections.Counter and itertools.chain.from_iterable:
>>> from collections import Counter
>>> from itertools import chain
>>> lst = [["cat","dog","mouse",1],["cat","dog","mouse",2],["cat","dog","mouse",3]]
>>> count = Counter(item for item in chain.from_iterable(lst) if not isinstance(item, int))
>>> count
Counter({'mouse': 3, 'dog': 3, 'cat': 3})
>>> count['cat']
3
I filtered out the ints because I didn't see why you had them in the first place.
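A sketch contrasting the two approaches above on the question's data: sum() over list.count() for a single item, and a Counter over the flattened list for all items at once (here without the isinstance filter, so the ints are counted too). Counter returns 0 for items it has never seen.

```python
from collections import Counter
from itertools import chain

lst = [["cat", "dog", "mouse", 1], ["cat", "dog", "mouse", 2], ["cat", "dog", "mouse", 3]]

# One item: add up list.count() over the sublists
cat_total = sum(sub.count("cat") for sub in lst)

# Every item at once: flatten one level, then tally
counts = Counter(chain.from_iterable(lst))

print(cat_total)        # 3
print(counts["mouse"])  # 3
print(counts[1])        # 1
print(counts["horse"])  # 0
```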

converting a simple list to a dictionary (in python)

I'm learning Python. I have a list of simple entries, and I want to convert it into a dictionary where the first element of the list is the key of the second element, the third is the key of the fourth, and so on. How can I do it?
list = ['first_key', 'first_value', 'second_key', 'second_value']
Thanks in advance!

The most concise way is
some_list = ['first_key', 'first_value', 'second_key', 'second_value']
d = dict(zip(*[iter(some_list)] * 2))
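The one-liner works because [iter(some_list)] * 2 is a list containing the same iterator twice, so zip takes the key and then the value from a single stream. A sketch making that explicit; the extra 'orphan_key' entry is hypothetical and shows that, with an odd-length list, the last key is silently dropped:

```python
some_list = ['first_key', 'first_value', 'second_key', 'second_value', 'orphan_key']

it = iter(some_list)
# [it] * 2 holds the SAME iterator twice, so each zip step pulls one item
# for the key and the following item for the value
pairs = list(zip(*[it] * 2))
d = dict(pairs)

print(pairs)  # [('first_key', 'first_value'), ('second_key', 'second_value')]
```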

myDict = dict(zip(myList[::2], myList[1::2]))
Please do not use 'list' as a variable name: it shadows the built-in list type, so you can no longer call list().
If there is a lot of data, we can do it more efficiently with iterator functions:
from itertools import islice
myList = ['first_key', 'first_value', 'second_key', 'second_value']
# Python 3's built-in zip is already lazy (in Python 2, use itertools.izip)
myDict = dict(zip(islice(myList, 0, None, 2), islice(myList, 1, None, 2)))

If the list is large, you end up wasting memory by building slices or eager zips. One way to convert the list more lazily is to (ab)use the list iterator with zip:
lst = ['first_key', 'first_value', 'second_key', 'second_value']
i = iter(lst)
# passing the same iterator twice makes zip pair each item with the next one;
# Python 3's zip is lazy (in Python 2, use itertools.izip)
d = dict(zip(i, i))

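The KISS way:
Use an iterator and an exception:
myList = ['first_key', 'first_value', 'second_key', 'second_value']
myDict = {}
it = iter(myList)
for key in it:
    try:
        # the for loop takes the key, next(it) takes the value after it;
        # in d[next(it)] = next(it) the right side runs first, swapping keys and values
        myDict[key] = next(it)
    except StopIteration:
        pass  # odd-length list: the last key has no value
print(myDict)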
