How to update a set? - python

It seems like using update should be pretty straightforward, and I think I'm using it correctly, so the problem must be an error with types or something else.
Anyway, here's the situation:
I'm doing coursework for a Coursera course (needless to say, answers that minimize or occlude code are the most helpful!) and I'm stuck on the last problem. The task is to return a set containing all the documents that contain every word in a query. The function takes an inverseIndex, a dictionary with words as keys and lists of the documents containing those words as values, e.g. {'a':[0,1],'be':[0,1,4].....}
The way I've attempted to implement this is pretty simple: build a collection of sets, where each set contains the document IDs for one word, and then call .intersection(*sets) to reduce them to a set containing only the IDs of docs that contain all the words in the query.
def andSearch(inverseIndex, query):
    sets = set()
    s = set()
    for word in query:
        s.update(inverseIndex[word])
        print(inverseIndex[word])
    print s
    s.intersection(*sets)
    return s
Unfortunately, this returns all the documents in the inverseIndex when it should only return the index '3'.
terminal output:
[0, 1, 2, 3, 4]
[0, 1, 2, 3]
[0, 1, 2, 3, 4]
[0, 1, 2, 3]
[0, 1, 3, 4]
[2, 3, 4]
set([0, 1, 2, 3, 4])
What's wrong?
Thanks so much!
Update: I also tried collecting the lists first:
sets = []
s = set()
for word in query:
    sets.append(inverseIndex[word])
print sets
s.intersection(*sets)
return s
Output:
[[0, 1, 2, 3, 4], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2, 3], [0, 1, 3, 4], [2, 3, 4]]
set([])
logout

You use update inside the loop, so on each iteration you add the new pages to s. That gives you the pages containing at least one of the words. What you need are the pages that each contain all the words, so you have to intersect on each iteration instead of updating.
Also, I'm not getting why you need sets at all.
This should work:
def andSearch(inverseIndex, query):
    return set.intersection(*(set(inverseIndex[word]) for word in query))
The inner expression just produces one set per query word (shown here as a list, with ii standing for the inverse index):
>>> [set(ii[word]) for word in query]
[set([0, 1]), set([0, 1, 4])]
And then I just call set.intersection to intersect them all.
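About the update to your question:
You get an empty set because s is empty, and intersecting an empty set with anything gives an empty set. Note also that s.intersection(...) returns a new set and leaves s unchanged, so its result has to be assigned or returned.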
Consider this example:
>>> s = set()
>>> s.intersection([1,2,3],[2,3,4])
set([])
To intersect sets, just use set.intersection. Called on the class like this, its first argument must be a set (the remaining arguments may be any iterables), so you should convert the lists of pages to sets of pages, or keep the pages as sets in the dictionary.
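For comparison, here is a minimal sketch of the "intersect on each iteration" idea described above, written as an explicit loop. The name and_search and the choice to return an empty set for an empty query are assumptions made for this illustration, not part of the original assignment.

```python
def and_search(inverse_index, query):
    """Return the set of document IDs that contain every word in query."""
    words = list(query)
    if not words:
        # Assumption for this sketch: an empty query matches nothing.
        return set()
    # Start from the documents of the first word, not from an empty set.
    result = set(inverse_index[words[0]])
    for word in words[1:]:
        # intersection_update modifies result in place and accepts any iterable.
        result.intersection_update(inverse_index[word])
    return result


index = {'a': [0, 1], 'be': [0, 1, 4]}
print(and_search(index, ['a', 'be']))
```

Starting from the first word's documents matters: starting from an empty set, as in the updated question, can only ever produce an empty result.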

Related

Apply list function on same itertools._grouper object give different value

The code below:
from itertools import groupby
for key, group in groupby(sorted([1, 1, 3, 3, 3, 3, 1, 3])):
    print(list(group))
    print(list(group))
prints:
[1, 1, 1]
[]
[3, 3, 3, 3, 3]
[]
Why does the second print(list(group)) output an empty list instead of the same result as the first one?
Because group is an iterator, which can be exhausted, and your first list(group) call does exactly that.
You can think of iterators as guns. When they are created they are fully loaded, but there is no way to re-fire a fired bullet. Your second list(group) call tries to fire an empty gun.
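If the contents of each group are needed more than once, one way around this is to consume the iterator a single time and keep the resulting list. The sketch below uses the data from the question; the names items and groups are chosen only for this example.

```python
from itertools import groupby

data = sorted([1, 1, 3, 3, 3, 3, 1, 3])
groups = []
for key, group in groupby(data):
    items = list(group)  # consume the iterator exactly once
    groups.append((key, items))
    print(items)
    print(items)  # the list can be reused as often as needed
```

Both print calls now show the same contents, because items is an ordinary list rather than a one-shot iterator.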

Ordering a list of items based on their second value

The function needs to be able to take a list such as:
[("Alice", [1, 2, 1, 1, 1, 1]), ("Bob", [3, 1, 5, 3, 2, 5]), ("Clare", [2, 3, 2, 2, 4, 2]), ("Dennis", [5, 4, 4, 4, 3, 4]), ("Eva", [4, 5, 3, 5, 5, 3])]
and process it so that the entries are ordered by the total of each person's results. The output should keep the original format, starting with the person with the lowest total and working down the list, and ties must be broken using the first value in each list (the first result for each person).
What I have written so far can take one entry and work out the total score:
def result(name, a):
    a.remove(max(a))
    score = 0
    for i in a:
        score = score + i
    return score
But I need to adapt this to take any number of entries and to output more than just the total.
I know the function needs to work out the total scores while keeping the original entries intact, but I don't know how to work with just one part of an entry, or how to iterate through all of the entries doing the same thing.
I'm using Python 3.4.
If I've understood your question properly, you'd like to sort the list, and have the sort order defined by the sum of the numbers provided in each tuple. So, Alice's numbers add up to 7, Clare's add up to 15, so Alice is before Clare.
sorted() can take a function to override the normal sort order. For example, you'd want:
sorted(data, key=lambda entry: sum(entry[1]))
This will mean that in order to work out what's bigger and smaller, sorted() will look at the sum of the list of numbers, and compare those. For example, when looking at Alice, the lambda (anonymous) function will be given ("Alice", [1, 2, 1, 1, 1, 1]), and so entry[1] is [1, 2, 1, 1, 1, 1], sum(entry[1]) is then 7, and that's the number that sorted() will use to put it in the right place.
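The question also asks for ties to be broken by each person's first result. A possible way to do that, sketched here on the assumption that the plain sum is the score (as in the answer above), is to return a tuple from the key function: tuples compare element by element, so the first result is only consulted when two sums are equal.

```python
data = [
    ("Alice", [1, 2, 1, 1, 1, 1]),
    ("Bob", [3, 1, 5, 3, 2, 5]),
    ("Clare", [2, 3, 2, 2, 4, 2]),
    ("Dennis", [5, 4, 4, 4, 3, 4]),
    ("Eva", [4, 5, 3, 5, 5, 3]),
]

# Sort by total score; break ties with the first result in each list.
ranked = sorted(data, key=lambda entry: (sum(entry[1]), entry[1][0]))

for name, results in ranked:
    print(name, sum(results), results)
```

sorted() returns a new list and leaves data untouched, so the original entries stay intact.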

Error when search values in Python

I have a list of lists with the following values:
switch_list = [[4, 1, 2, 2],
               [4, 2, 3, 2],
               [3, 1, 1, 3],
               [3, 2, 4, 2],
               [1, 3, 3, 1],
               [1, 2, 2, 1],
               [2, 1, 1, 2],
               [2, 2, 4, 1]]
My goal is to compare an integer with the first value of each sublist and then return a new list containing only the sublists that match.
Example: I want all the sublists in switch_list whose first number equals 4, so the function should return:
[[4, 1, 2, 2], [4, 2, 3, 2],]
Here is my program:
def bucket(value, input):
    output = []
    for i in input:
        if i[0][0] == value:
            output.append(i)
    return output
And here is the output error:
File "matrix_optimizate.py", line 63, in bucket
if i[0][0] == value:
TypeError: 'int' object has no attribute '__getitem__'
You're already iterating over the outer list, so there's no need for two index lookups.
for i in input:
    if i[0] == value:
        output.append(i)
Also, there's a much more elegant way to do this using filter:
def bucket(input, value):
    return filter(lambda x: x[0] == value, input)
(In Python 3, filter returns an iterator, so wrap the call in list() if you need a list back.)
At which point you probably don't even need to have it as its own function.
And lastly you could use a list comprehension as well:
[i for i in input if i[0] == value]
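Put together as a complete function, the list-comprehension version might look like the sketch below. The parameter name rows is an assumption made here to avoid shadowing the built-in input; the argument order follows the question's bucket(value, input).

```python
def bucket(value, rows):
    """Return the sublists of rows whose first element equals value."""
    return [row for row in rows if row[0] == value]


switch_list = [[4, 1, 2, 2],
               [4, 2, 3, 2],
               [3, 1, 1, 3],
               [3, 2, 4, 2],
               [1, 3, 3, 1],
               [1, 2, 2, 1],
               [2, 1, 1, 2],
               [2, 2, 4, 1]]

print(bucket(4, switch_list))
```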
I am assuming that input is going to be your list of lists. In that case,
for i in input:
will give you each sublist as i during each iteration. By trying to access
i[0][0]
you are trying to access the first element of the first element of the sublist. In your example,
i would give [4, 1, 2, 2],
i[0] would give 4, and
i[0][0] would therefore not make sense, because 4 is an int and cannot be indexed.
Please try i[0] instead.
Edit: Please note that this answer only serves to point out your current problem. dursk's answer provides other solutions to what you are trying to perform as a whole and those options are much more powerful (list comprehension is a fantastic tool, I would recommend looking into it).

python delete all entries of a value in list

Why isn't this for loop working? My goal is to delete every 1 from my list.
>>> s=[1,4,1,4,1,4,1,1,0,1]
>>> for i in s:
...     if i ==1: s.remove(i)
...
>>> s
[4, 4, 4, 0, 1]
Never change a list while iterating over it. The results are hard to predict, as you're seeing here. One simple alternative is to construct a new list:
s = [i for i in s if i != 1]
If for some reason you absolutely have to edit the list in place rather than constructing a new one, my off-the-cuff best answer is to traverse it finding the indices that must be deleted, then reverse traverse that list of indices removing them one by one:
indices_to_remove = [i for (i, val) in enumerate(s) if val == 1]
for i in reversed(indices_to_remove):
    del s[i]
Because that removes elements from the end of the list first, the original indices computed remain valid. But I would generally prefer computing the new list unless special circumstances apply.
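Another option for the in-place case, not part of the answer above and offered only as a sketch, is slice assignment. It builds the filtered list and then replaces the contents of the existing list object, so any other reference to the same list sees the change.

```python
s = [1, 4, 1, 4, 1, 4, 1, 1, 0, 1]
alias = s  # a second reference to the same list object

# Replace the contents of the existing list rather than rebinding the name.
s[:] = [value for value in s if value != 1]

print(s)
print(alias is s)
```

The comprehension on the right-hand side is fully evaluated before the assignment happens, so the list is never modified while it is being iterated over.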
Consider this code:
#!/usr/bin/env python
s=[1, 4, 1, 4, 1, 4, 1, 1, 0, 1]
list_size=len(s)
i=0
while i!=list_size:
    if s[i]==1:
        del s[i]
        list_size=len(s)
    else:
        i=i + 1
print s
Result:
[4, 4, 4, 0]
In short, your code gives an undesirable result because the size of the list and the index positions of its elements change every time you remove a 1, and a for loop in Python does not cope with a list whose size changes during iteration.
You should not change the contents of a list while iterating over it.
In your case, however, you could iterate over a copy of the list and change the original:
Code:
s=[1,4,1,4,1,4,1,1,0,1]
for i in s[:]:
    if i ==1: s.remove(i)
print s
Output:
[4, 4, 4, 0]
As @metatoaster stated, you could use filter.
Code:
s=[1,4,1,4,1,4,1,1,0,1]
s=list(filter(lambda x:x !=1,s))
print s
[4, 4, 4, 0]
You could also use filter to remove several values at once, for example:
Code:
s=[1,4,1,4,1,4,1,1,0,1,2,3,5,6,7,8,9,10,20]
remove_element=[1,2,3,5,6,7,8,9]
s=list(filter(lambda x:x not in remove_element,s))
print s
[4, 4, 4, 0, 10, 20]
This doesn't work because you are modifying the list while iterating over it, so the iterator's current position moves past one of the 1s you meant to check. We can illustrate this:
>>> for i in s:
...     print(s)
...     if i == 1:
...         s.remove(i)
...
[1_, 4, 1, 4, 1, 4, 1, 1, 0, 1]
[4, 1_, 4, 1, 4, 1, 1, 0, 1]
[4, 4, 1_, 4, 1, 1, 0, 1]
[4, 4, 4, 1_, 1, 0, 1]
[4, 4, 4, 1, 0_, 1]
[4, 4, 4, 1, 0, 1_]
I added _ to the element being compared. Note that there were only 6 passes in total, and that one of the 1s was skipped over and never looked at. That skipped 1 is the element that ends up being removed on the last pass, rather than the one being compared, because list.remove removes the first occurrence of the value specified. list.remove is also an O(n) operation on its own, which gets very expensive once your list gets big. It is O(n) even if the item is at the beginning, because every item after it has to be shifted one position forward: Python lists are more like C-style arrays than Java linked lists (if you want a linked-list-like structure, use collections.deque). It is O(n) towards the end as well, because it has to scan through the list doing its own comparisons to find the item. Calling remove inside the loop can therefore give a worst-case runtime complexity of O(n^2).
See Python's data structure time complexity
Peter's answer already covered the generation of a new list, I am only answering why and how your original code did not work exactly.

Returning a list of list elements

I need help writing a function that takes a single list and returns a different list in which equal elements are grouped together, each group in its own sublist.
I know that I'll have to iterate through the original list and, depending on whether the value is already in my result, either append it to its existing sublist or create a new sublist and add that to the final list.
an example would be:
input:[1, 2, 2, 2, 3, 1, 1, 3]
Output:[[1,1,1], [2,2,2], [3,3]]
I'd do this in two steps:
>>> import collections
>>> inputs = [1, 2, 2, 2, 3, 1, 1, 3]
>>> counts = collections.Counter(inputs)
>>> counts
Counter({1: 3, 2: 3, 3: 2})
>>> outputs = [[key] * count for key, count in counts.items()]
>>> outputs
[[1, 1, 1], [2, 2, 2], [3, 3]]
(The fact that these happen to be in sorted numerical order, and also in the order of first appearance, is just a coincidence here. Counters, like normal dictionaries, store their keys in arbitrary order in Python versions before 3.7, and there you should assume that [[3, 3], [1, 1, 1], [2, 2, 2]] would be just as possible a result. From Python 3.7 on, dictionaries preserve insertion order, so you get order of first appearance. If the order you get is not acceptable, you need a bit more work.)
So, how does it work?
The first step creates a Counter, which is just a special subclass of dict made for counting occurrences of each key. One of the many nifty things about it is that you can just pass it any iterable (like a list) and it will count up how many times each element appears. It's a trivial one-liner, it's obvious and readable once you know how Counter works, and it's even about as efficient as anything could possibly be.*
But that isn't the output format you wanted. How do we get that? Well, we have to get back from 1: 3 (meaning "3 copies of 1") to [1, 1, 1]. You can write that as [key] * count.** And the rest is just a bog-standard list comprehension.
If you look at the docs for the collections module, they start with a link to the source. Many modules in the stdlib are like this, because they're meant to serve as source code for learning from as well as usable code. So, you should be able to figure out how the Counter constructor works. (It's basically just calling that _count_elements function.) Since that's the only part of Counter you're actually using beyond a basic dict, you could just write that part yourself. (But really, once you've understood how it works, there's no good reason not to use it, right?)
* For each element, it's just doing a hash table lookup (and insert if needed) and a += 1. And in CPython, it all happens in reasonably-optimized C.
** Note that we don't have to worry about whether to use [key] * count vs. [key for _ in range(count)] here, because the values have to be immutable, or at least of an "equality is as good as identity" type, or they wouldn't be usable as keys.
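As one possible version of the "bit more work" needed when the order matters, the sketch below groups equal elements in order of first appearance on any Python version by using collections.OrderedDict. The function name group_equal_elements is made up for this example.

```python
from collections import OrderedDict


def group_equal_elements(values):
    """Group equal elements into sublists, ordered by first appearance."""
    groups = OrderedDict()
    for value in values:
        # Create the sublist on first sight of a value, then append to it.
        groups.setdefault(value, []).append(value)
    return list(groups.values())


print(group_equal_elements([1, 2, 2, 2, 3, 1, 1, 3]))
print(group_equal_elements([3, 1, 3, 2, 1]))
```

If sorted order is wanted instead, the input can be sorted before grouping, or the resulting sublists can be sorted by their first element.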
The most time efficient would be to use a dictionary:
collector = {}
for elem in inputlist:
    collector.setdefault(elem, []).append(elem)
output = collector.values()
The other, more costly option is to sort, then group using itertools.groupby():
from itertools import groupby
output = [list(g) for k, g in groupby(sorted(inputlist))]
Demo:
>>> inputlist = [1, 2, 2, 2, 3, 1, 1, 3]
>>> collector = {}
>>> for elem in inputlist:
...     collector.setdefault(elem, []).append(elem)
...
>>> collector.values()
[[1, 1, 1], [2, 2, 2], [3, 3]]
>>> from itertools import groupby
>>> [list(g) for k, g in groupby(sorted(inputlist))]
[[1, 1, 1], [2, 2, 2], [3, 3]]
What about this, as you said you wanted a function:
def makeList(user_list):
    user_list.sort()
    x = user_list[0]
    output = [[]]
    for i in user_list:
        if i == x:
            output[-1].append(i)
        else:
            output.append([i])
            x = i
    return output
>>> print makeList([1, 2, 2, 2, 3, 1, 1, 3])
[[1, 1, 1], [2, 2, 2], [3, 3]]
