why doesn't following Python removing duplicates function work? [duplicate]

why doesn't following Python removing duplicates function work? [duplicate] - python

This question already has answers here:
Strange result when removing item from a list while iterating over it
(8 answers)
Closed last month.
As an experiment, I did this:
letters=['a','b','c','d','e','f','g','h','i','j','k','l']
for i in letters:
letters.remove(i)
print letters
The last print shows that not all items were removed ? (every other was).
IDLE 2.6.2
>>> ================================ RESTART ================================
>>>
['b', 'd', 'f', 'h', 'j', 'l']
>>>
What's the explanation for this ? How it could this be re-written to remove every item ?

Some answers explain why this happens and some explain what you should've done. I'll shamelessly put the pieces together.
What's the reason for this?
Because the Python language is designed to handle this use case differently. The documentation makes it clear:
It is not safe to modify the sequence being iterated over in the loop (this can only happen for mutable sequence types, such as lists). If you need to modify the list you are iterating over (for example, to duplicate selected items) you must iterate over a copy.
Emphasis mine. See the linked page for more -- the documentation is copyrighted and all rights are reserved.
You could easily understand why you got what you got, but it's basically undefined behavior that can easily change with no warning from build to build. Just don't do it.
It's like wondering why i += i++ + ++i does whatever the hell it is it that line does on your architecture on your specific build of your compiler for your language -- including but not limited to trashing your computer and making demons fly out of your nose :)
How it could this be re-written to remove every item?
del letters[:] (if you need to change all references to this object)
letters[:] = [] (if you need to change all references to this object)
letters = [] (if you just want to work with a new object)
Maybe you just want to remove some items based on a condition? In that case, you should iterate over a copy of the list. The easiest way to make a copy is to make a slice containing the whole list with the [:] syntax, like so:
#remove unsafe commands
commands = ["ls", "cd", "rm -rf /"]
for cmd in commands[:]:
if "rm " in cmd:
commands.remove(cmd)
If your check is not particularly complicated, you can (and probably should) filter instead:
commands = [cmd for cmd in commands if not is_malicious(cmd)]

You cannot iterate over a list and mutate it at the same time, instead iterate over a slice:
letters=['a','b','c','d','e','f','g','h','i','j','k','l']
for i in letters[:]: # note the [:] creates a slice
letters.remove(i)
print letters
That said, for a simple operation such as this, you should simply use:
letters = []

You cannot modify the list you are iterating, otherwise you get this weird type of result. To do this, you must iterate over a copy of the list:
for i in letters[:]:
letters.remove(i)

It removes the first occurrence, and then checks for the next number in the sequence. Since the sequence has changed it takes the next odd number and so on...
take "a"
remove "a" -> the first item is now "b"
take the next item, "c"
-...

what you want to do is:
letters[:] = []
or
del letters[:]
This will preserve original object letters was pointing to. Other options like, letters = [], would create a new object and point letters to it: old object would typically be garbage-collected after a while.
The reason not all values were removed is that you're changing list while iterating over it.
ETA: if you want to filter values from a list you could use list comprehensions like this:
>>> letters=['a','b','c','d','e','f','g','h','i','j','k','l']
>>> [l for l in letters if ord(l) % 2]
['a', 'c', 'e', 'g', 'i', 'k']

Probably python uses pointers and the removal starts at the front. The variable „letters“ in the second line partially has a different value than tha variable „letters“ in the third line. When i is 1 then a is being removed, when i is 2 then b had been moved to position 1 and c is being removed. You can try to use „while“.

#!/usr/bin/env python
import random
a=range(10)
while len(a):
print a
for i in a[:]:
if random.random() > 0.5:
print "removing: %d" % i
a.remove(i)
else:
print "keeping: %d" % i
print "done!"
a=range(10)
while len(a):
print a
for i in a:
if random.random() > 0.5:
print "removing: %d" % i
a.remove(i)
else:
print "keeping: %d" % i
print "done!"
I think this explains the problem a little better, the top block of code works, whereas the bottom one doesnt.
Items that are "kept" in the bottom list never get printed out, because you are modifiying the list you are iterating over, which is a recipe for disaster.

OK, I'm a little late to the party here, but I've been thinking about this and after looking at Python's (CPython) implementation code, have an explanation I like. If anyone knows why it's silly or wrong, I'd appreciate hearing why.
The issue is moving through a list using an iterator, while allowing that list to change.
All the iterator is obliged to do is tell you which item in the (in this case) list comes after the current item (i.e. with the next() function).
I believe the way iterators are currently implemented, they only keep track of the index of the last element they iterated over. Looking in iterobject.c one can see what appears to be a definition of an iterator:
typedef struct {
PyObject_HEAD
Py_ssize_t it_index;
PyObject *it_seq; /* Set to NULL when iterator is exhausted */
} seqiterobject;
where it_seq points to the sequence being iterated over and it_index gives the index of the last item supplied by the iterator.
When the iterator has just supplied the nth item and one deletes that item from the sequence, the correspondence between subsequent list elements and their indices changes. The former (n+1)st item becomes the nth item as far as the iterator is concerned. In other words, the iterator now thinks that what was the 'next' item in the sequence is actually the 'current' item.
So, when asked to give the next item, it will give the former (n+2)nd item(i.e. the new (n+1)st item).
As a result, for the code in question, the iterator's next() method is going to give only the n+0, n+2, n+4, ... elements from the original list. The n+1, n+3, n+5, ... items will never be exposed to the remove statement.
Although the intended activity of the code in question is clear (at least for a person), it would probably require much more introspection for an iterator to monitor changes in the sequence it iterates over and, then, to act in a 'human' fashion.
If iterators could return prior or current elements of a sequence, there might be a general work-around, but as it is, you need to iterate over a copy of the list, and be certain not to delete any items before the iterator gets to them.

Intially i is reference of a as the loop runs the first position element deletes or removes and the second position element occupies the first position but the pointer moves to the second position this goes on so that's the reason we are not able to delete b,d,f,h,j,l
`

Related

why isn't len returning correct values?

def dbl_linear(n):
u=[1]
i=0
for a in u:
u.append((2*a+1))
u.append((3*a+1))
u=set(u)
u=list(u)
if len(u)>=n:
print(len(u))
break
return len(u)
i want this code to return n elements in list u.But that isn't happening.can someone help? i gave input n=20.the len(u) is coming as 15 or 7.different answers on every run

Modifying an object you're iterating over is basically undefined behaviour, you can't assume whether the iterations will or will not take the new items in account, especially in the face of resize (list is O(1) amortized append, because it's O(1) on reserved space but they regularly need to reallocate the entire thing to make more room for new elements). Not to mention here you're only modifying the initial list during the first iteration, afterwards you're updating a different unrelated list.
There's no reason to even use for a in u, just use an infinite loop (and probably remember the last element as your uniquification via set will scramble the list, alternatively just check before inserting if the element is already present, in is O(n) but so are set(a) and list(a)).

Even positioned numbers are escaped unfortunatly python3

I have a function:
def fun(l):
for i in l:
if len(i)==10:
l.append('+91 {} {}'.format(i[:5],i[5:]))
l.remove(i)
if len(i)==11:
j=list(''.join(i))
j.remove(i[0])
l.append('+91 {} {}'.format(''.join(j[:5]),''.join(j[5:])))
l.remove(i)
if len(i)==12:
j=list(''.join(i))
j.remove(i[0])
j.remove(i[1])
l.append('+91 {} {}'.format(''.join(j[:5]),''.join(j[5:])))
l.remove(i)
if len(i)==13:
j=list(''.join(i))
j.remove(i[0])
j.remove(i[1])
j.remove(i[2])
l.append('+91 {} {}'.format(''.join(j[:5]),''.join(j[5:])))
l.remove(i)
return l
say l=['9195969878','07895462130','919875641230']
I am getting the output as
['+91 91959 69878','7895462130','+91 98756 41230']
But i have suppose to get the output as:
['+91 91959 69878','+91 78954 62130,'+91 98756 41230']
Actually this function is escaping all that is positioned even no in 'l' list. Kindly suggest

The first problem is that you're mutating the list while iterating over it. In this particular case, this caused the loop to skip some items, as you deleted items that were earlier. In other Python versions it might trigger an error. But you're returning your result, so I don't see why you're mutating the list at all.
Secondly your code does some roundabout things, in particular ''.join(i) which is absolutely redundant (it literally rebuilds the same string), and series of remove() calls which almost certainly don't do what you expect. If you remove the first item from [1,2,3], the list becomes [2,3], and if you follow that by removing the second item (index 1) you end up with [2]. This is the same sort of issue your for loop has with the other remove.
I would also restructure the code a bit to avoid code duplication. I get something like:
def fun(l):
return ['+91 {} {}'.format(i[-10:-5],i[-5:])
for i in l]
This never alters l, makes one single pass, and joins all the different length behaviours by observing that we're using parts at a fixed distance from the end. There is one caveat: other lengths aren't handled separately. I don't know if those occur, or how you actually want them handled (the old code would leave them as is). We can easily enough specify other behaviour:
def fun(l):
return ['+91 {} {}'.format(i[-10:-5],i[-5:]) if 10<=len(i)<=13
else i
for i in l]
This still doesn't reproduce the behaviour that reformatted numbers were appended at the end, but I'm not sure you really wanted that. It made little sense for the loop to process its own output in the first place.

You are modifying the list l as you go - I would suggest to create a new list and add things to this list. Is there a reason you want to mutate in place?
If you are intent on mutating in place, why not just do something like this?
l[index] = '+91 {} {}'.format(i[:5],i[5:])
Also, here is the first google result for "python phone number library": https://github.com/daviddrysdale/python-phonenumbers as it may be of use to you. (Never used it, am not the maintainer.)

Python list.remove items present in second list

I've searched around and most of the errors I see are when people are trying to iterate over a list and modify it at the same time. In my case, I am trying to take one list, and remove items from that list that are present in a second list.
import pymysql
schemaOnly = ["table1", "table2", "table6", "table9"]
db = pymysql.connect(my connection stuff)
tables = db.cursor()
tables.execute("SHOW TABLES")
tablesTuple = tables.fetchall()
tablesList = []
# I do this because there is no way to remove items from a tuple
# which is what I get back from tables.fetchall
for item in tablesTuple:
tablesList.append(item)
for schemaTable in schemaOnly:
tablesList.remove(schemaTable)
When I put various print statements in the code, everything looks like proper and like it is going to work. But when it gets to the actual tablesList.remove(schemaTable) I get the dreaded ValueError: list.remove(x): x not in list.
If there is a better way to do this I am open to ideas. It just seemed logical to me to iterate through the list and remove items.
Thanks in advance!
** Edit **
Everyone in the comments and the first answer is correct. The reason this is failing is because the conversion from a Tuple to a list is creating a very badly formatted list. Hence there is nothing that matches when trying to remove items in the next loop. The solution to this issue was to take the first item from each Tuple and put those into a list like so: tablesList = [x[0] for x in tablesTuple] . Once I did this the second loop worked and the table names were correctly removed.
Thanks for pointing me in the right direction!

I assume that fetchall returns tuples, one for each database row matched.
Now the problem is that the elements in tablesList are tuples, whereas schemaTable contains strings. Python does not consider these to be equal.
Thus when you attempt to call remove on tablesList with a string from schemaTable, Python cannot find any such value.
You need to inspect the values in tablesList and find a way convert them to a strings. I suspect it would be by simply taking the first element out of the tuple, but I do not have a mySQL database at hand so I cannot test that.

Regarding your question, if there is a better way to do this: Yes.
Instead of adding items to the list, and then removing them, you can append only the items that you want. For example:
for item in tablesTuple:
if item not in schemaOnly:
tablesList.append(item)
Also, schemaOnly can be written as a set, to improve search complexity from O(n) to O(1):
schemaOnly = {"table1", "table2", "table6", "table9"}
This will only be meaningful with big lists, but in my experience it's useful semantically.
And finally, you can write the whole thing in one list comprehension:
tablesList = [item for item in tablesTuple if item not in schemaOnly]
And if you don't need to keep repetitions (or if there aren't any in the first place), you can also do this:
tablesSet = set(tablesTuple) - schemaOnly
Which is also has the best big-O complexity of all these variations.

What is the most pythonic way to pop a random element from a list?

Say I have a list x with unkown length from which I want to randomly pop one element so that the list does not contain the element afterwards. What is the most pythonic way to do this?
I can do it using a rather unhandy combincation of pop, random.randint, and len, and would like to see shorter or nicer solutions:
import random
x = [1,2,3,4,5,6]
x.pop(random.randint(0,len(x)-1))
What I am trying to achieve is consecutively pop random elements from a list. (i.e., randomly pop one element and move it to a dictionary, randomly pop another element and move it to another dictionary, ...)
Note that I am using Python 2.6 and did not find any solutions via the search function.

What you seem to be up to doesn't look very Pythonic in the first place. You shouldn't remove stuff from the middle of a list, because lists are implemented as arrays in all Python implementations I know of, so this is an O(n) operation.
If you really need this functionality as part of an algorithm, you should check out a data structure like the blist that supports efficient deletion from the middle.
In pure Python, what you can do if you don't need access to the remaining elements is just shuffle the list first and then iterate over it:
lst = [1,2,3]
random.shuffle(lst)
for x in lst:
# ...
If you really need the remainder (which is a bit of a code smell, IMHO), at least you can pop() from the end of the list now (which is fast!):
while lst:
x = lst.pop()
# do something with the element
In general, you can often express your programs more elegantly if you use a more functional style, instead of mutating state (like you do with the list).

You won't get much better than that, but here is a slight improvement:
x.pop(random.randrange(len(x)))
Documentation on random.randrange():
random.randrange([start], stop[, step])
Return a randomly selected element from range(start, stop, step). This is equivalent to choice(range(start, stop, step)), but doesn’t actually build a range object.

To remove a single element at random index from a list if the order of the rest of list elements doesn't matter:
import random
L = [1,2,3,4,5,6]
i = random.randrange(len(L)) # get random index
L[i], L[-1] = L[-1], L[i] # swap with the last element
x = L.pop() # pop last element O(1)
The swap is used to avoid O(n) behavior on deletion from a middle of a list.

despite many answers suggesting use random.shuffle(x) and x.pop() its very slow on large data. and time required on a list of 10000 elements took about 6 seconds when shuffle is enabled. when shuffle is disabled speed was 0.2s
the fastest method after testing all the given methods above was turned out to be written by #jfs
import random
L = [1,"2",[3],(4),{5:"6"},'etc'] #you can take mixed or pure list
i = random.randrange(len(L)) # get random index
L[i], L[-1] = L[-1], L[i] # swap with the last element
x = L.pop() # pop last element O(1)
in support of my claim here is the time complexity chart from this source
IF there are no duplicates in list,
you can achieve your purpose using sets too. once list made into set duplicates will be removed. remove by value and remove random cost O(1), ie very effecient. this is the cleanest method i could come up with.
L=set([1,2,3,4,5,6...]) #directly input the list to inbuilt function set()
while 1:
r=L.pop()
#do something with r , r is random element of initial list L.
Unlike lists which support A+B option, sets also support A-B (A minus B) along with A+B (A union B)and A.intersection(B,C,D). super useful when you want to perform logical operations on the data.
OPTIONAL
IF you want speed when operations performed on head and tail of list, use python dequeue (double ended queue) in support of my claim here is the image. an image is thousand words.

Here's another alternative: why don't you shuffle the list first, and then start popping elements of it until no more elements remain? like this:
import random
x = [1,2,3,4,5,6]
random.shuffle(x)
while x:
p = x.pop()
# do your stuff with p

I know this is an old question, but just for documentation's sake:
If you (the person googling the same question) are doing what I think you are doing, which is selecting k number of items randomly from a list (where k<=len(yourlist)), but making sure each item is never selected more than one time (=sampling without replacement), you could use random.sample like #j-f-sebastian suggests. But without knowing more about the use case, I don't know if this is what you need.

One way to do it is:
x.remove(random.choice(x))

While not popping from the list, I encountered this question on Google while trying to get X random items from a list without duplicates. Here's what I eventually used:
items = [1, 2, 3, 4, 5]
items_needed = 2
from random import shuffle
shuffle(items)
for item in items[:items_needed]:
print(item)
This may be slightly inefficient as you're shuffling an entire list but only using a small portion of it, but I'm not an optimisation expert so I could be wrong.

This answer comes courtesy of #niklas-b:
"You probably want to use something like pypi.python.org/pypi/blist "
To quote the PYPI page:
...a list-like type with better asymptotic performance and similar
performance on small lists
The blist is a drop-in replacement for the Python list that provides
better performance when modifying large lists. The blist package also
provides sortedlist, sortedset, weaksortedlist, weaksortedset,
sorteddict, and btuple types.
One would assume lowered performance on the random access/random run end, as it is a "copy on write" data structure. This violates many use case assumptions on Python lists, so use it with care.
HOWEVER, if your main use case is to do something weird and unnatural with a list (as in the forced example given by #OP, or my Python 2.6 FIFO queue-with-pass-over issue), then this will fit the bill nicely.

Python for loop skipping every other loop? [duplicate]

This question already has answers here:
Strange result when removing item from a list while iterating over it
(8 answers)
Closed 4 months ago.
I have a weird problem. Does anyone see anything wrong with my code?
for x in questions:
forms.append((SectionForm(request.POST, prefix=str(x.id)),x))
print "Appended " + str(x)
for (form, question) in forms:
print "Testing " + str(question)
if form.is_valid():
forms.remove((form,question))
print "Deleted " + str(question)
a = form.save(commit=False)
a.audit = audit
a.save()
else:
flag_error = True
Results in:
Appended Question 50
Appended Question 51
Appended Question 52
Testing Question 50
Deleted Question 50
Testing Question 52
Deleted Question 52
It seems to skip question 51. It gets appended to the list, but the for loop skips it. Any ideas?

You are modifying the contents of the object forms that you are iterating over, when you say:
forms.remove((form,question))
According to the Python documentation of the for statement, this is not safe (the emphasis is mine):
The for statement in Python differs a bit from what you may be used to in C or Pascal. Rather than always iterating over an arithmetic progression of numbers (like in Pascal), or giving the user the ability to define both the iteration step and halting condition (as C), Python’s for statement iterates over the items of any sequence (a list or a string), in the order that they appear in the sequence.
It is not safe to modify the sequence being iterated over in the loop (this can only happen for mutable sequence types, such as lists). If you need to modify the list you are iterating over (for example, to duplicate selected items) you must iterate over a copy. The slice notation makes this particularly convenient:
for x in a[:]: # make a slice copy of the entire list
... if len(x) > 6: a.insert(0, x)
See also this paragraph from the Python Language Reference which explains exactly what is going on:
There is a subtlety when the sequence is being modified by the loop (this can only occur for mutable sequences, i.e. lists). An internal counter is used to keep track of which item is used next, and this is incremented on each iteration. When this counter has reached the length of the sequence the loop terminates. This means that if the suite deletes the current (or a previous) item from the sequence, the next item will be skipped (since it gets the index of the current item which has already been treated). Likewise, if the suite inserts an item in the sequence before the current item, the current item will be treated again the next time through the loop.
There are a lot of solutions. You can follow their advice and create a copy. Another possibility is to create a new list as a result of your second for loop, instead of modifying forms directly. The choice is up to you...

You are removing objects from forms while iterating over it. This is supposed to lead to the behaviour you are seeing ( http://docs.python.org/reference/compound_stmts.html#the-for-statement ).
The solution is to either iterate over a copy of that list, or add the forms for removal to a separate collection, then perform the removal afterwards.

Using the remove method on forms (which I assume is a list) changes the size of the list. So think of it this way
[ 50, 51, 52 ]
Is your initial list, and you ask for the first item. You then remove that item from the list, so it looks like
[51, 52]
But now you ask for the second item, so you get 52.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.