Python: check if item exists in variable amount of lists - python

I'm working on a small search engine and I'm lost at a certain point. I have multiple lists containing items, and I want to check which items exist in all lists. The number of lists can vary, since they are created based on the number of words in the search query, done with:
index_list = [[] for i in range((len(query)+1))]
I figured I'd start by finding the shortest list, since its length is the maximum number of items that need to be checked. So for example, with a three-word search query:
index_list[1]=[set(1,2,3,4,5)]
index_list[2]=[set(3,4,5,6,7)]
index_list[3]=[set(4,5,6,7)]
shortest_list = index_list[3]
(What the shortest list is, is figured out with a function, not relevant for now).
Now I want to check if the items of the shortest list, index_list[3], also exist in the other lists. In this case there are 3 lists in total, but when entering a longer search query, the number of lists increases. I thought of doing something with loops, like:
result = []
for element in shortest_list:
    for subelement in element:
        for element2 in index_list[1]:
            if subelement in element2:
                for element3 in index_list[2]:
                    if subelement in element3:
                        result.append(subelement)
So, the result should be:
[4, 5]
since these items exist in all lists.
But the loop above won't work when there are more lists. As described earlier, I don't know the number of lists beforehand because it depends on the number of words in the search query. So basically the depth of my loop depends on the number of lists I have.
When doing research I found some posts suggesting recursion may do the job. Unfortunately I'm not that skilled in Python.
Any suggestions?
Thanks in advance!

Just use sets throughout and use set.intersection to find the common elements. Also, {1,2,3,4,5} is how to create a set of ints, not set(1,2,3,4,5):
index_list = [set() for i in range(4)]
index_list[0].update({1,2,3,4,5})
index_list[1].update({3,4,5,6,7})
index_list[2].update({4,5,6,7})
shortest_list = index_list[2]
print(shortest_list.intersection(*index_list[:2]))
set([4, 5])
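The same idea can be wrapped so that it works for any number of lists. This is a minimal sketch, not the asker's actual code: the helper name `common_items` and the sample data are assumptions made for illustration.

```python
def common_items(index_lists):
    """Return the items present in every list (hypothetical helper)."""
    sets = [set(lst) for lst in index_lists]
    if not sets:
        return set()
    # Start from the smallest set so the intersection has the least work to do.
    smallest = min(sets, key=len)
    return smallest.intersection(*sets)


# Sample data mirroring the three-word query from the question.
index_lists = [[1, 2, 3, 4, 5], [3, 4, 5, 6, 7], [4, 5, 6, 7]]
result = sorted(common_items(index_lists))
print(result)  # [4, 5]
```

Because set.intersection accepts any number of arguments, no nesting or recursion is needed, however many words the query has.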

Try to go about it the opposite way: First make a list of all the index lists by doing something like
index_list_list = []
for ix_list in get_index_lists():  # Or whatever
    index_list_list.append(ix_list)
Then you can loop through all of these, removing the elements in your 'remaining_items' list if they are not contained in the others:
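from copy import copy

remaining_items = copy(shortest_list)  # copy so shortest_list itself is not modified
for index_list in index_list_list:
    # iterate over a snapshot, since we remove from remaining_items inside the loop
    curr_remaining_items = copy(remaining_items)
    for element in curr_remaining_items:
        if element not in index_list:
            remaining_items.remove(element)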
Your final 'remaining_items' list would then contain the elements that are common to all the lists.
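As a compact variation on the same filtering idea, the sketch below keeps the order of the shortest list and avoids removing items while looping. The sample data and the names `index_lists`, `shortest` and `others` are illustrative assumptions, not part of the question's code.

```python
# Hypothetical sample data standing in for the per-word index lists.
index_lists = [[1, 2, 3, 4, 5], [3, 4, 5, 6, 7], [4, 5, 6, 7]]

shortest = min(index_lists, key=len)
others = [lst for lst in index_lists if lst is not shortest]

# Keep an item only if every other list also contains it.
remaining_items = [item for item in shortest
                   if all(item in other for other in others)]
print(remaining_items)  # [4, 5]
```

Each `item in other` test scans a list, so for large inputs converting the other lists to sets first would likely be faster.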

I have written code following your approach. You can try the following code:
index_list=['1','2','3','4','5']
index_list1=['3','4','5','6','7']
index_list2=['4','5','6','7']
result = []
for element in index_list:
    for subelement in element:
        for element2 in index_list1:
            if subelement in element2:
                for element3 in index_list2:
                    if subelement in element3:
                        result.append(subelement)
print(result)
output:
['4', '5']

It is a little confusing that you appear to have something shadowing the built in type set, which happens to be built for precisely this type of job.
subset = set(shortest_list)
# Use map here to only look up the method once.
# We don't need the result, which will be a list of None.
# (Python 2 only: in Python 3 map is lazy, so nothing runs until it is consumed.)
map(subset.intersection_update, index_lists)
# Alternative: describe the reduction more directly
# Cost: builds a new set for each list
# (in Python 3, first: from functools import reduce)
subset = reduce(set.intersection, index_lists, set(shortest_list))
Note: As Padraic indicated in his answer, set.intersection and set.intersection_update both take an arbitrary number of arguments so it is unnecessary to use map or reduce in this case.
It is also by far preferable that all the lists already be sets, since the intersection can be optimized to the size of the smaller set, but a list intersection requires scanning the list.
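To illustrate the note about arbitrary numbers of arguments, here is a small sketch that replaces map and reduce with a single call. The sample values are assumptions chosen to match the question's example.

```python
# Hypothetical sample data matching the question's example.
shortest_list = [4, 5, 6, 7]
index_lists = [[1, 2, 3, 4, 5], [3, 4, 5, 6, 7]]

# In place: shrink subset against every list in one call.
subset = set(shortest_list)
subset.intersection_update(*index_lists)

# Non-mutating equivalent that returns a new set.
same = set(shortest_list).intersection(*index_lists)

print(sorted(subset), sorted(same))  # [4, 5] [4, 5]
```

Both methods accept any iterables as arguments, so the other lists do not have to be converted to sets first, although having them as sets is what allows the faster lookups mentioned above.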

Related

What is the fastest way of getting the list of lists containing a certain integer in a list of list of integers

for example I have a list of list of integers like
x = [[1,2,3,4], [4,5,6], [2,3,1,9]]
Assume the length of x is in the millions. In that case iterating through each element will be very slow.
Is there any faster way?
Without any prior knowledge or extra information about the list (e.g., whether it's sorted), you have no real choice but to iterate over the entire list. Note, however, that doing this with a generator could be much more performant than creating a filtered list, as the values are calculated only when you attempt to consume them, and not upfront:
search = 2
listGenerator = (i for i in x if search in i)
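The lazy behaviour can be seen by consuming the generator step by step. This is only a sketch with the question's sample data; the names `matches`, `first` and `rest` are made up for illustration.

```python
x = [[1, 2, 3, 4], [4, 5, 6], [2, 3, 1, 9]]
search = 2

# Nothing is scanned yet; the generator only records what to do.
matches = (sub for sub in x if search in sub)

# next() scans just far enough to find the first matching sublist.
first = next(matches, None)

# Consuming the rest continues from where the scan stopped.
rest = list(matches)

print(first)  # [1, 2, 3, 4]
print(rest)   # [[2, 3, 1, 9]]
```

If only the first few matches are needed, the remainder of the list is never examined, which is where the saving comes from.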

Find matching elements of two unordered Python lists of different sizes

I'm getting this error: index out of range, in if largerList[j] == smallerList[i]. I'm working on an assignment about binary search trees, I put the trees into lists and I'm just trying to compare the two lists:
def matchList(largerList, smallerList) :
    matches = []
    for i in smallerList:
        for j in largerList:
            if largerList[j] == smallerList[i] :
                matches[i] = smallerList[i]
    return matches
I'm assuming nested for loops should totally iterate all elements in each loop, so smallerList is the smaller list so smallerList doesn't make largerList go out of bounds. The inner for-loop should iterate over all of the larger list entirely, comparing each value to each element of the smaller list. Why doesn't it work?
You can't set a list value with matches[i] if that index does not exist in matches.
Try appending instead:
Change matches[i] = smallerList[i] to matches.append(smallerList[i]). Don't assign the result back to matches: list.append modifies the list in place and returns None.
Trying to find matching elements in lists like this is rather inefficient. One thing you could improve to make it arguably more pythonic is to use a list comprehension:
matches = [i for i in largerList if i in smallerList]
But then the more mathematically sensible approach still would be to realise that we have two sets of elements and we want to find an intersection of two sets so we can write something like:
matches = set(largerList).intersection(smallerList)
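For reference, a corrected version of the nested-loop function might look like the sketch below. Note that `for i in smallerList` yields the values themselves rather than indices, so using them as indices is what raises the index error. The snake_case names and the sample values are assumptions, not the asker's data.

```python
def match_list(larger_list, smaller_list):
    """Return the values of smaller_list that also occur in larger_list."""
    matches = []
    for small_item in smaller_list:       # iterates over values, not indices
        for large_item in larger_list:
            if large_item == small_item:
                matches.append(small_item)
                break                     # one match is enough for this value
    return matches


found = match_list([8, 3, 10, 1, 6], [6, 7, 3])
print(found)  # [6, 3]
```

Unlike the set intersection, this keeps the order of the smaller list and any duplicates it contains.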

Reduce a list in a specific way

I have a list of strings which looks like this:
['(num1, num2):1', '(num3, num4):1', '(num5, num6):1', '(num7, num8):1']
What I try to achieve is to reduce this list and combine every two elements and I want to do this until there is only one big string element left.
So the intermediate list would look like this:
['((num1, num2):1,(num3, num4):1)', '((num5, num6):1,(num7, num8):1)']
The complicated thing is (as you can see in the intermediate list) that each pair of strings needs to be wrapped in parentheses. So for the above-mentioned starting point the final result should look like this:
(((num_1,num_2):1,(num_3,num_4):1),((num_5,num_6):1,(num_7,num_8):1))
Of course this should also work in a generic way for 8, 16 or more string elements in the starting list. Or, to be more precise, it should work for a_n = 2^(n+1) elements.
Just to be very specific how the result should look with 8 elements:
'((((num_1,num_2):1,(num_3,num_4):1),((num_5,num_6):1,(num_7,num_8):1)),(((num_9,num_10):1,(num_11,num_12):1),((num_13,num_14):1,(num_15,num_16):1)))'
I already solved the problem using nested for loops but I thought there should be a more functional or short-cut solution.
I also found this solution on stackoverflow:
import itertools as it
l = [map( ",".join ,list(it.combinations(my_list, l))) for l in range(1,len(my_list)+1)]
Although the join isn't bad, I still need the parentheses. I tried to use:
"{},{}".format
instead of .join but this seems to be too easy to work :).
I also thought of using reduce, but obviously this is not the right function. Maybe one can implement one's own reduce function or so?
I hope some advanced Pythonistas can help me.
Sounds like a job for the zip clustering idiom: zip(*[iter(x)]*n) where you want to break iterable x into size n chunks. This will discard "leftover" elements that don't make up a full chunk. For x=[1, 2, 3], n=2 this would yield only (1, 2).
def reducer(l):
    while len(l) > 1:
        l = ['({},{})'.format(x, y) for x, y in zip(*[iter(l)]*2)]
    return l
reducer(['(num1, num2):1', '(num3, num4):1', '(num5, num6):1', '(num7, num8):1'])
# ['(((num1, num2):1,(num3, num4):1),((num5, num6):1,(num7, num8):1))']
This is an explanation of what is happening in zip(*[iter(l)]*2)
[iter(l)]*2 creates a list of length 2 that holds the same iterator twice or, to be more precise, two references to the same iter-object.
zip(*...) does the extracting. It pulls:
Loop 1:
  the first element from the first reference of the iter-object
  the second element from the second reference of the iter-object
Loop 2:
  the third element from the first reference of the iter-object
  the fourth element from the second reference of the iter-object
Loop 3:
  the fifth element from the first reference of the iter-object
  the sixth element from the second reference of the iter-object
and so on...
Therefore we have the extracted elements available in the for-loop and can use them as x and y for further processing.
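The shared-iterator behaviour described above can be checked with a tiny sketch. The sample list and variable names are assumptions for illustration only.

```python
items = [1, 2, 3, 4, 5]

# Two references to the SAME iterator: each pull advances both.
it = iter(items)
pairs = list(zip(it, it))

# The idiom from the answer, written in one expression.
idiom_pairs = list(zip(*[iter(items)] * 2))

print(pairs)        # [(1, 2), (3, 4)]
print(idiom_pairs)  # [(1, 2), (3, 4)]
```

The trailing 5 is dropped because zip stops as soon as one of its arguments is exhausted, which is the "leftover" behaviour mentioned earlier.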
This is really handy.
I also want to point to this thread since it helped me to understand the concept.

How to create a list of 100 strings whose names are in a series

A list of string objects looks like
nodes=["#A_CN1","#A_CN2","#A_CN3","#A_CN4","#A_CN5","#A_CN6","#A_CN7","#A_CN8","#A_CN9","#A_CN10"]
The list above has 10 elements, but I need to use around 100 elements, with the last element being something like #A_CN100.
Is there any way to write this concisely in Python, rather than typing out 100 entries?
Also, suppose there is a list of 100 elements where each element is itself a list (node1, node2, ... are all lists):
nodes=[node1,node2,node3,node4,node5,node6....node100]
if I express this as
nodes=[node{0}.format(i) for i in range(1,101)]
But this throws an error! How can I fix this?
A one liner with list comprehensions
nodes = ["#A_CN{0}".format(i) for i in range(1,101)]
There is also a suggestion in the comments that a generator version be demonstrated. It would look like this:
nodes = ("#A_CN{0}".format(i) for i in range(1,101))
But more commonly this is passed to list
nodes = list("#A_CN{0}".format(i) for i in range(1,101))
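So we end up with the same result as the list comprehension. However, the plain generator form (without list) is useful if you want to generate about a million items, since the items are produced one at a time instead of all being stored at once.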
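A short sketch of the difference between the two forms follows. The names `nodes_list`, `nodes_gen` and `first_three` are made up for this illustration.

```python
# List comprehension: all 100 strings exist in memory immediately.
nodes_list = ["#A_CN{0}".format(i) for i in range(1, 101)]

# Generator expression: strings are built only when requested.
nodes_gen = ("#A_CN{0}".format(i) for i in range(1, 101))
first_three = [next(nodes_gen) for _ in range(3)]

print(len(nodes_list))   # 100
print(nodes_list[-1])    # #A_CN100
print(first_three)       # ['#A_CN1', '#A_CN2', '#A_CN3']
```

A generator can be iterated only once and does not support indexing or len(), so the list form is the better fit when the names are needed repeatedly.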
You omitted quotes (or apostrophes). Instead of
nodes=[node{0}.format(i) for i in range(1,101)]
use
nodes=["node{0}".format(i) for i in range(1,101)]

How can I find the intersection of a list and a nested list?

I have a list of fruits:
fruits = ["apple","banana"]
I also have a nested list of baskets, in which each list contains a string (the name of the basket) and a list of fruits.
baskets = [["basket1",["apple","banana","pear","strawberry"]],["basket2",["strawberry","pear","peach"]],["basket3",["peach","apple","banana"]]]
I would like to know which baskets contain every fruit in the list fruits: the result I expect is a list with two elements, "basket1" and "basket3".
I figured that intersections would be the cleanest way of achieving that, and I tried the following:
myset = set(fruits).intersection(*map(set, set(baskets)))
But I'm getting a TypeError "unhashable type: 'list'". I understand that I can't map lists, but I thought that using the function "set" on both lists would convert them to sets... is there any other way I can find the intersection of a list and a list of lists?
You can loop over baskets and check if the fruits set is a subset of fruits in current basket, if yes store current basket's name.
>>> fruits = {"apple", "banana"} #notice the {}, or `set(["apple","banana"])` in Python 2.6 or earlier
>>> [b for b, f in baskets if fruits.issubset(f)]
['basket1', 'basket3']
You can't hash sets any more than you can hash lists. They both have the same problem: because they're mutable, a value can change its contents, making any set that contains it as a member or any dictionary that contains it as a key suddenly invalid.
You can hash the immutable equivalents of both, tuple and frozenset.
Meanwhile, your immediate problem is ironically created by your attempt to solve this problem. Break this line down into pieces:
myset = set(fruits).intersection(*map(set, set(baskets)))
The first piece is this:
baskets_set = set(baskets)
You've got a list of lists, so set(baskets) is trying to make a set of lists, which you can't do, because lists aren't hashable.
If you just removed that, and used map(set, baskets), you would then have an iterator of sets, which is a perfectly valid thing.
Of course as soon as you try to iterate it, it will try to make a set out of the first element of baskets, which is a list that itself contains a list, so you'll run into the error again.
Plus, even if you solve this, the logic still doesn't make any sense. What's the intersection of a set of, say, 3 strings with a set of, say, 3 (frozen)sets of strings? It's empty. The two sets don't have any elements in common. The fact that some elements of the second one may contain elements of the first doesn't mean that the second one itself contains any elements of the first.
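The point about the empty intersection can be checked directly. This is a minimal sketch using the question's data; the names `basket_sets`, `overlap` and `matching` are assumptions introduced for the example.

```python
fruits = {"apple", "banana"}
baskets = [["basket1", ["apple", "banana", "pear", "strawberry"]],
           ["basket2", ["strawberry", "pear", "peach"]],
           ["basket3", ["peach", "apple", "banana"]]]

# frozenset is hashable, so a set of frozensets can be built.
basket_sets = {frozenset(contents) for _, contents in baskets}

# Strings versus frozensets: no element is shared, so this is empty.
overlap = fruits & basket_sets

# A subset test per basket answers the actual question.
matching = [name for name, contents in baskets if fruits.issubset(contents)]

print(overlap)   # set()
print(matching)  # ['basket1', 'basket3']
```

The intersection compares the members of the two sets with each other, while the subset test compares the fruits with the contents of each basket, which is what the question asks for.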
You could do it this way using your approach:
fruits = ["apple","banana"]
baskets = [["basket1",["apple","banana","pear","strawberry"]],
           ["basket2",["strawberry","pear","peach"]],
           ["basket3",["peach","apple","banana"]]]
fruitset = set(fruits)
# fruitset <= s keeps only baskets that contain every fruit (s & fruitset would accept any overlap)
res = set(b for b, s in ((b, set(c)) for b, c in baskets) if fruitset <= s)
print(res) # --> set(['basket1', 'basket3']) in Python 2
