Eliminating duplicated elements in a list


I was trying chapter 10.15 of the book Think Python and wrote the following code:
def turn_str_to_list(string):
    res = []
    for letter in string:
        res.append(letter)
    return res
def sort_and_unique (t):
    t.sort()
    for i in range (0, len(t)-2, 1):
        for j in range (i+1, len(t)-1, 1):
            if t[i]==t[j]:
                del t[j]
    return t
line=raw_input('>>>')
t=turn_str_to_list(line)
print t
print sort_and_unique(t)
I used a nested 'for' structure to eliminate any duplicated elements in a sorted list.
However, when I ran it, I kept getting wrong output.
If I input 'committee', the output is ['c', 'e', 'i', 'm', 'o', 't', 't'], which is wrong because it still contains a double 't'.
I tried different inputs. Sometimes the program fails to pick up duplicated letters in the middle of the list, and it never picks up the ones at the end.
What am I missing? Thanks, guys.

Your program isn't removing all the duplicate letters because calling del t[j] inside the nested for-loops makes it skip letters.
I added some prints to help illustrate this:
def sort_and_unique (t):
    t.sort()
    for i in range (0, len(t)-2, 1):
        print "i: %d" % i
        print t
        for j in range (i+1, len(t)-1, 1):
            print "\t%d %s len(t):%d" % (j, t[j], len(t))
            if t[i]==t[j]:
                print "\tdeleting %c" % t[j]
                del t[j]
    return t
Output:
>>>committee
['c', 'o', 'm', 'm', 'i', 't', 't', 'e', 'e']
i: 0
['c', 'e', 'e', 'i', 'm', 'm', 'o', 't', 't']
    1 e len(t):9
    2 e len(t):9
    3 i len(t):9
    4 m len(t):9
    5 m len(t):9
    6 o len(t):9
    7 t len(t):9
i: 1
['c', 'e', 'e', 'i', 'm', 'm', 'o', 't', 't']
    2 e len(t):9
    deleting e
    3 m len(t):8
    4 m len(t):8
    5 o len(t):8
    6 t len(t):8
    7 t len(t):8
i: 2
['c', 'e', 'i', 'm', 'm', 'o', 't', 't']
    3 m len(t):8
    4 m len(t):8
    5 o len(t):8
    6 t len(t):8
i: 3
['c', 'e', 'i', 'm', 'm', 'o', 't', 't']
    4 m len(t):8
    deleting m
    5 t len(t):7
    6 t len(t):7
i: 4
['c', 'e', 'i', 'm', 'o', 't', 't']
    5 t len(t):7
i: 5
['c', 'e', 'i', 'm', 'o', 't', 't']
i: 6
['c', 'e', 'i', 'm', 'o', 't', 't']
['c', 'e', 'i', 'm', 'o', 't', 't']
Whenever del t[j] is called, the list becomes one element shorter, but the inner loop variable j keeps advancing.
For example:
i=1, j=2, t = ['c', 'e', 'e', 'i', 'm', 'm', 'o', 't', 't']
It sees that t[1] == t[2] (both 'e') so it removes t[2].
Now t = ['c', 'e', 'i', 'm', 'm', 'o', 't', 't']
However, the code continues with i=1, j=3, which compares 'e' to 'm' and skips over 'i'.
Lastly, it does not catch the last two 't's because by the time i=5, len(t) is 7, so the inner for-loop runs over range(6,6,1), which is empty, and its body is never executed.
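One way to keep the in-place approach while avoiding both problems is to advance the index only when nothing was deleted. This is a minimal sketch, written for Python 3 (the code above is Python 2), and it relies on the fact that duplicates are adjacent once the list is sorted:

```python
def sort_and_unique(t):
    """Sort t in place and delete adjacent duplicates."""
    t.sort()
    i = 0
    # Re-check len(t) on every pass, because the list shrinks as we delete.
    while i < len(t) - 1:
        if t[i] == t[i + 1]:
            # Stay at i: the following element has shifted into position i + 1.
            del t[i + 1]
        else:
            i += 1
    return t


print(sort_and_unique(list("committee")))  # ['c', 'e', 'i', 'm', 'o', 't']
```

Using a while loop instead of a precomputed range means the loop bound always reflects the current length of the list.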

In Python you can make use of the built-in data structures and functions such as set() and list().
Your turn_str_to_list() can be done with list(). Maybe you know this but wanted to do it on your own.
Using list() and set():
line=raw_input('>>>')
print list(set(line))
Note that a set is unordered, so wrap it in sorted() if you need the letters in sorted order.
Your sort_and_unique() has O(n^2) complexity. One way to make it cleaner:
def sort_and_unique2(t):
    t.sort()
    res = []
    for i in t:
        if i not in res:
            res.append(i)
    return res
This is still O(n^2), since the lookup (i not in res) takes linear time, but the code looks a bit cleaner. Deletion from a list is O(n), so it is cheaper to append to a new list, since append is O(1). See this for the complexities of the list API: https://wiki.python.org/moin/TimeComplexity
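Since the list is sorted first, the linear lookup can be avoided: a duplicate can only be equal to the item that was appended last. The following Python 3 sketch illustrates that idea; the name sort_and_unique3 is made up for this example and it returns a new list rather than modifying its argument:

```python
def sort_and_unique3(t):
    """Return a sorted copy of t without duplicates."""
    res = []
    for item in sorted(t):
        # After sorting, equal items are neighbors, so checking the
        # last kept item is enough (no scan of res needed).
        if not res or item != res[-1]:
            res.append(item)
    return res


print(sort_and_unique3(list("committee")))  # ['c', 'e', 'i', 'm', 'o', 't']
```

With this change the cost is dominated by the sort, so the whole function is O(n log n).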

You can try the following code snippet:
s = "committee"
res = sorted((set(list(s))))

Solution explained:
>>> word = "committee"
Turn string to list of characters:
>>> clst = list(word)
>>> clst
['c', 'o', 'm', 'm', 'i', 't', 't', 'e', 'e']
Use set to get only unique items:
>>> unq_clst = set(clst)
>>> unq_clst
{'c', 'e', 'i', 'm', 'o', 't'}
It turns out (thanks Blckknght) that the list step is not necessary, and we can do it this way:
>>> unq_clst = set(word)
>>> unq_clst
{'c', 'e', 'i', 'm', 'o', 't'}
Both set and list take an iterable as their parameter, and iterating over a string yields one character after another.
Sort it:
>>> sorted(unq_clst)
['c', 'e', 'i', 'm', 'o', 't']
One line version:
>>> sorted(set("COMMITTEE"))
['C', 'E', 'I', 'M', 'O', 'T']

Here you go:
In [1]: word = 'committee'
In [3]: word_ = set(word)
In [4]: word_
Out[4]: {'c', 'e', 'i', 'm', 'o', 't'}
The standard way to get the unique elements in Python is to use a set. The set constructor takes any iterable. A string is a sequence of characters (ASCII codes or Unicode code points), so it qualifies.
If you have further problems, do leave a comment.

So you want an explanation of what is wrong in your code. Here you are:
Before we dive into coding, make test case(s)
Our coding will go faster if we have test cases at hand from the very beginning.
For testing I will make a small utility function:
def textinout(text):
    return "".join(sort_and_unique(list(text)))
This allows a quick test like:
>>> textinout("committee")
"ceimot"
and another helper function for readable error traces:
def checkit(textin, expected):
    msg = "For input '{textin}' we expect '{expected}', got '{result}'"
    result = textinout(textin)
    assert result == expected, msg.format(textin=textin, expected=expected, result=result)
And make the test case function:
def testit():
    checkit("abcd", 'abcd')
    checkit("aabbccdd", 'abcd')
    checkit("a", 'a')
    checkit("ddccbbaa", 'abcd')
    checkit("ddcbaa", 'abcd')
    checkit("committee", 'ceimot')
Let us make the first test with the existing function:
def sort_and_unique (t):
    t.sort()
    for i in range (0, len(t)-2, 1):
        for j in range (i+1, len(t)-1, 1):
            if t[i]==t[j]:
                del t[j]
    return t
Now we can test it:
testit()
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-11-291a15d81032> in <module>()
----> 1 testit()
<ipython-input-4-d8ad9abb3338> in testit()
1 def testit():
2 checkit("abcd", 'abcd')
----> 3 checkit("aabbccdd", 'abcd')
4 checkit("a", 'a')
5 checkit("ddccbbaa", 'abcd')
<ipython-input-10-620ac3b14f51> in checkit(textin, expected)
2 msg = "For input '{textin}' we expect '{expected}', got '{result}'"
3 result = textinout(textin)
----> 4 assert result == expected, msg.format(textin=textin, expected=expected, result=result)
AssertionError: For input 'aabbccdd' we expect 'abcd', got 'abcdd'
Reading the last line of the error trace, we know what is wrong.
General comments on your code
Accessing list members via index
In most cases this is not efficient and it makes the code hard to read.
Instead of:
lst = ["a", "b", "c"]
for i in range(len(lst)):
    itm = lst[i]
    # do something with the itm
You should use:
lst = ["a", "b", "c"]
for itm in lst:
    # do something with the itm
    print itm
If you need to access a subset of a list, use slicing.
Instead of:
for i in range (0, len(lst)-2, 1):
    itm = lst[i]
Use:
for itm in lst[:-2]:
    # do something with the itm
    print itm
If you really need to know the position of the processed item for inner loops, use enumerate.
Instead of:
lst = ["a", "b", "c", "d", "e"]
for i in range(0, len(lst)):
    for j in range (i+1, len(lst)-1, 1):
        itm_i = lst[i]
        itm_j = lst[j]
        # do something
Use enumerate, which turns each list item into a tuple (index, item):
lst = ["a", "b", "c", "d", "e"]
for i, itm_i in enumerate(lst):
    for itm_j in lst[i+1:-1]:
        print itm_i, itm_j
        # do something
Manipulating a list while it is being processed
You are looping over a list and deleting items from it at the same time. Modifying a list during iteration is generally best avoided. If you have to do it, think twice and take care, for example by iterating backward so that you do not modify the part that is about to be processed in a later iteration.
As an alternative to deleting an item from the list being iterated, you can record your findings (like duplicated items) in another list and use it after you are out of the loop.
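The "iterate backward" idea can be sketched as follows. This is only an illustration in Python 3, and the function name dedupe_sorted_in_place is made up for the example:

```python
def dedupe_sorted_in_place(lst):
    """Sort lst and delete duplicates while walking from the end to the front."""
    lst.sort()
    # Deleting lst[i] only shifts the items behind position i, and those
    # have already been handled, so the remaining indices stay valid.
    for i in range(len(lst) - 1, 0, -1):
        if lst[i] == lst[i - 1]:
            del lst[i]
    return lst


print(dedupe_sorted_in_place(list("committee")))  # ['c', 'e', 'i', 'm', 'o', 't']
```

Because the loop index only ever decreases, it never points past the end of the shrinking list.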
How your code could be rewritten
def sort_and_unique (lst):
    lst.sort()
    to_remove = []
    for i, itm_i in enumerate(lst[:-2]):
        for j, itm_j in enumerate(lst[i+1: -1]):
            if itm_i == itm_j:
                to_remove.append(itm_j)
    # now we are out of loop and can modify the lst
    # note, we loop over one list and modify another, this is safe
    for itm in to_remove:
        lst.remove(itm)
    return lst
Reading the code, the problem becomes apparent: you never touch the last item in the sorted list. That is why "t" is not removed, as it is alphabetically the last item after sorting.
So your code could be corrected this way:
def sort_and_unique (lst):
    lst.sort()
    to_remove = []
    for i, itm_i in enumerate(lst[:-1]):
        for j, itm_j in enumerate(lst[i+1:]):
            if itm_i == itm_j:
                to_remove.append(itm_j)
    for itm in to_remove:
        lst.remove(itm)
    return lst
Now the code passes the tests, which you can confirm by calling testit(). Note that the tests only cover items that occur at most twice: because every matching pair is recorded, an item that occurs three or more times is removed too often (for example, "aaa" ends up empty).
>>> testit()
Silent test output is what we were dreaming about.
Having the test function makes further code modification easy, as it is quick to check whether things still work as expected.
Anyway, the code can be shortened by getting the tuples (itm_i, itm_j) of neighboring items using zip. Since the list is sorted, duplicates are always neighbors, so comparing each item with its successor is enough:
def sort_and_unique (lst):
    lst.sort()
    to_remove = []
    for itm_i, itm_j in zip(lst[:-1], lst[1:]):
        if itm_i == itm_j:
            to_remove.append(itm_j)
    for itm in to_remove:
        lst.remove(itm)
    return lst
Test it:
>>> testit()
or using a list comprehension:
def sort_and_unique (lst):
    lst.sort()
    to_remove = [itm_j for itm_i, itm_j in zip(lst[:-1], lst[1:]) if itm_i == itm_j]
    for itm in to_remove:
        lst.remove(itm)
    return lst
Test it:
>>> testit()
Because the list comprehension (using []) builds the complete list before any of its values are used, we can remove another line:
def sort_and_unique (lst):
    lst.sort()
    for itm in [itm_j for itm_i, itm_j in zip(lst[:-1], lst[1:]) if itm_i == itm_j]:
        lst.remove(itm)
    return lst
Test it:
>>> testit()
Note that so far the code still follows your original approach; only two things were fixed:
- we no longer manipulate the list we are iterating over
- the last item of the list is now taken into account
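The test cases above only contain letters that occur at most twice. As a hedged extra check, the Python 3 sketch below runs the nested-loop correction and the zip-based variant on an input with a run of three. The function names are made up for this comparison, and both work on a sorted copy instead of modifying their argument:

```python
def unique_all_pairs(items):
    """Nested-loop variant: records a removal for every matching pair."""
    lst = sorted(items)
    to_remove = []
    for i, itm_i in enumerate(lst[:-1]):
        for itm_j in lst[i + 1:]:
            if itm_i == itm_j:
                to_remove.append(itm_j)
    for itm in to_remove:
        lst.remove(itm)
    return lst


def unique_neighbors(items):
    """zip variant: records a removal only for equal neighbors."""
    lst = sorted(items)
    to_remove = [b for a, b in zip(lst[:-1], lst[1:]) if a == b]
    for itm in to_remove:
        lst.remove(itm)
    return lst


print("".join(unique_all_pairs("aaab")))   # b
print("".join(unique_neighbors("aaab")))   # ab
```

With three equal items the nested loops record three removals, so nothing of that item is left; with four or more they record more removals than there are items and lst.remove raises ValueError. Comparing only neighbors records exactly one removal per surplus item.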

Related

Cannot find glitch in program using recursion for multiple nested for-loops

alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g',
            'h', 'i', 'j', 'k', 'l', 'm', 'n',
            'o', 'p', 'q', 'r', 's', 't', 'u',
            'v', 'w', 'x', 'y', 'z']
endlist = []
def loopfunc(n, lis):
    if n ==0:
        endlist.append(lis[0]+lis[1]+lis[2]+lis[3]+lis[4])
    for i in alphabet:
        if n >0:
            lis.append(i)
            loopfunc(n-1, lis )
loopfunc(5, [])
This program is supposed to make endlist be:
endlist = [aaaaa, aaaab, aaaac, ... zzzzy, zzzzz]
But it makes it:
endlist = [aaaaa, aaaaa, aaaaa, ... , aaaaa]
The length is right, but it won't make different words. Can anyone help me see why?

The only thing you ever add to endlist is the first 5 elements of lis, and since you have a single lis that is shared among all the recursive calls (note that you never create a new list in this code other than the initial values for endlist and lis, so every append to lis is happening to the same list), those first 5 elements are always the a values that you appended in your first 5 recursive calls. The rest of the alphabet goes onto the end of lis and is never reached by any of your other code.
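One way to keep the list-based design but avoid the shared list is to hand each recursive call its own new list. This is a sketch, not the asker's code: passing endlist and alphabet as parameters is a choice made here so the example is self-contained and can be tried on a small alphabet:

```python
def loopfunc(n, lis, endlist, alphabet):
    """Append every word of n more letters, prefixed by lis, to endlist."""
    if n == 0:
        endlist.append("".join(lis))
        return
    for ch in alphabet:
        # lis + [ch] builds a new list, so sibling calls never see
        # each other's letters.
        loopfunc(n - 1, lis + [ch], endlist, alphabet)


words = []
loopfunc(2, [], words, "ab")
print(words)  # ['aa', 'ab', 'ba', 'bb']
```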

Since you want strings in the end, it's a little easier just to use strings for collecting your items. This avoids the shared mutable references that are causing your issues. With that, the recursion becomes pretty concise:
alphabet = 'abcdefghijklmnopqrstuvwxyz'
def loopfunc(n, lis=""):
    if n < 1:
        return [lis]
    res = []
    for a in alphabet:
        res.extend(loopfunc(n-1, lis + a))
    return res
l = loopfunc(5)
print(l[0], l[1], l[-1], l[-2])
# aaaaa aaaab zzzzz zzzzy
Note that with n=5 you'll have almost 12 million combinations. If you plan on having larger n values, it may be worth rewriting this as a generator.
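A generator version could look like the sketch below. It swaps the recursion for itertools.product, which is a different technique from the answer's code but yields the words in the same order; the function name words is made up for this example:

```python
from itertools import islice, product
import string


def words(n, alphabet=string.ascii_lowercase):
    """Lazily yield every word of length n over alphabet."""
    for letters in product(alphabet, repeat=n):
        yield "".join(letters)


# Only the requested words are built, not all 26**5 of them.
print(list(islice(words(5), 3)))  # ['aaaaa', 'aaaab', 'aaaac']
```

Because nothing is stored, memory use stays constant no matter how large n gets.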

How do I recursively reverse a list in python without an aux fnc

Write a function that reverses a string. The input string is given as an array of characters char[].
Do not allocate extra space for another array, you must do this by modifying the input array in-place with O(1) extra memory.
You may assume all the characters consist of printable ascii characters.
Example:
Input: ["h","e","l","l","o"]
Output: ["o","l","l","e","h"]
I'm fairly new to recursion so I looked up some possible solutions but I'm not sure why this one isn't outputting the desired result as it looks like it runs.
class Solution(object):
    def reverseString(self, s):
        """
        :type s: List[str]
        :rtype: None Do not return anything, modify s in-place instead.
        """
        if not s:
            return []
        else:
            return [s[-1]] + self.reverseString(s[:-1])

This is not quite as elegant a solution, but it works. It swaps the elements in place and returns None, so inspect the list afterwards:
def rev(arr, i=0):
    if i >= len(arr) // 2:
        return
    arr[i], arr[-(i + 1)] = arr[-(i + 1)], arr[i]
    rev(arr, i + 1)
>>> test = ["t", "e", "s", "t"]
>>> test
['t', 'e', 's', 't']
>>> rev(test)
>>> test
['t', 's', 'e', 't']

Recursive implementation :
>>> L = ["h","e","l","l","o"]
>>> L
['h', 'e', 'l', 'l', 'o']
>>> reverse = lambda L: (reverse (L[1:]) + L[:1] if L else [])
>>> print(reverse(L))
['o', 'l', 'l', 'e', 'h']

List shuffling by range

I have a list full of strings. I want to take the first 10 values, shuffle them, then replace the first 10 values of the list, then with values 11-20, then 21-30, and so on.
For example:
input_list = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t']
and a function called:
shuffle10(input_list)
>>> ['d','b','c','f','j','i','h','a','e','g','m','n','s','r','k','p','l','q','o','t']
I thought it'd work if I defined an empty list and appended each shuffled group of 10 values:
newlist=[]
for i in range(int(len(input_list) / 10)):
    newlist.append(shuffle(input_list[(i*10):(i+1)*10]))
    print(newlist)
but all this returns is:
[None]
[None, None]

Use random.sample instead of shuffle:
>>> import random
>>> input_list = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t']
>>> sum((random.sample(input_list[n:n+10], 10) for n in range(0,len(input_list),10)), [])
['f', 'i', 'd', 'a', 'g', 'j', 'e', 'c', 'b', 'h', 'p', 'l', 'r', 'q', 'm', 't', 's', 'n', 'o', 'k']

shuffle works in place and returns None, so you are shuffling a temporary slice and then appending None instead of the shuffled values. You can pull out the relevant sublist, shuffle it, then extend a new list with it:
new_list=[]
for i in range(0, len(input_list), 10):
    list_slice = input_list[i:i + 10]
    shuffle(list_slice)
    new_list.extend(list_slice)
print(new_list)
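Wrapped up as the shuffle10 function from the question, this could look like the sketch below. The size parameter is an addition made for this example, and a final chunk shorter than size is simply shuffled as it is:

```python
import random


def shuffle10(items, size=10):
    """Return a new list in which each consecutive chunk of size items is shuffled."""
    result = []
    for start in range(0, len(items), size):
        chunk = items[start:start + size]  # slicing copies, so items is untouched
        random.shuffle(chunk)              # shuffles in place and returns None
        result.extend(chunk)
    return result


input_list = list("abcdefghijklmnopqrst")
print(shuffle10(input_list))
```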

Pop multiple items from the beginning and end of a list

Suppose I have a list of items like this:
mylist=['a','b','c','d','e','f','g','h','i']
I want to pop two items from the left (i.e. a and b) and two items from the right (i.e. h and i). I want the most concise and clean way to do this. I could do it this way myself:
for x in range(2):
    mylist.pop()
    mylist.pop(0)
Any other alternatives?

From a performance point of view:
- mylist = mylist[2:-2] and del mylist[:2]; del mylist[-2:] are equivalent
- they are around 3 times faster than the first solution, for _ in range(2): mylist.pop(0); mylist.pop()
Code (Python 2):
import timeit
iterations = 1000000
print timeit.timeit('''mylist=range(9)\nfor _ in range(2): mylist.pop(0); mylist.pop()''', number=iterations)/iterations
print timeit.timeit('''mylist=range(9)\nmylist = mylist[2:-2]''', number=iterations)/iterations
print timeit.timeit('''mylist=range(9)\ndel mylist[:2];del mylist[-2:]''', number=iterations)/iterations
Output:
1.07710313797e-06
3.44465017319e-07
3.49956989288e-07

You could slice out a new list, keeping the old list as is:
mylist=['a','b','c','d','e','f','g','h','i']
newlist = mylist[2:-2]
newlist is now:
['c', 'd', 'e', 'f', 'g']
You can overwrite the reference to the old list too:
mylist = mylist[2:-2]
Both of the above approaches will use more memory than the ones below.
What you're attempting to do yourself is memory friendly, with the downside that it mutates your old list. Note that popleft is not available for lists in Python; it's a method of the collections.deque object.
This works well in Python 3:
for x in range(2):
    mylist.pop(0)
    mylist.pop()
In Python 2, use xrange and pop only:
for _ in xrange(2):
    mylist.pop(0)
    mylist.pop()
The fastest way is to delete, as Martijn suggests (this only deletes the list's references to the items, not necessarily the items themselves):
del mylist[:2]
del mylist[-2:]

If you don't want to retain the values, you could delete the indices:
del myList[-2:], myList[:2]
This does still require that all remaining items are moved up to new spots in the list. Two .popleft() calls do require this too, but at least now the list object can handle the moves in one step.
No new list object is created.
Demo:
>>> myList = ['a','b','c','d','e','f','g','h','i']
>>> del myList[-2:], myList[:2]
>>> myList
['c', 'd', 'e', 'f', 'g']
However, from your use of popleft I strongly suspect you are, instead, working with a collections.deque object. If so, stick to using popleft(), as that is far more efficient than slicing or del on a list object.
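For comparison, here is a small sketch of the same trimming done on a collections.deque, where removing from either end is O(1). The helper name trim_ends is made up for this example, and it assumes the input holds at least 2 * count items:

```python
from collections import deque


def trim_ends(items, count=2):
    """Return a deque of items with count elements dropped from each end."""
    d = deque(items)
    for _ in range(count):
        d.popleft()  # remove from the left end
        d.pop()      # remove from the right end
    return d


print(list(trim_ends(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'])))
# ['c', 'd', 'e', 'f', 'g']
```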

To me, this is the prettiest way to do it using a list comprehension:
>>> mylist=['a','b','c','d','e','f','g','h','i']
>>> newlist1 = [mylist.pop(0) for idx in range(2)]
>>> newlist2 = [mylist.pop() for idx in range(2)]
That will pull the first two elements from the beginning and the last two elements from the end of the list. The remaining items stay in the list.

First 2 elements: mylist[:2]
Last 2 elements: mylist[-2:]
So what remains is mylist[2:-2]

mylist = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o']
new_list = [mylist.pop(0) for _ in range(6) if len(mylist) > 0]
>>> new_list
['a', 'b', 'c', 'd', 'e', 'f']
new_list = [mylist.pop(0) for _ in range(6) if len(mylist) > 0]
>>> new_list
['g', 'h', 'i', 'j', 'k', 'l']
new_list = [mylist.pop(0) for _ in range(6) if len(mylist) > 0]
>>> new_list
['m', 'n', 'o']

Python 3 has something cool, similar to rest in JS (but a pain if you need to pop out a lot of stuff):
mylist=['a','b','c','d','e','f','g','h','i']
_, _, *mylist, _, _ = mylist
mylist == ['c', 'd', 'e', 'f', 'g'] # True

Ordered Sets Python 2.7

I have a list that I'm attempting to remove duplicate items from. I'm using Python 2.7.1, so I can simply use the set() function. However, this reorders my list, which for my particular case is unacceptable.
Below is a function I wrote that does this. However, I'm wondering if there's a better/faster way. Any comments on it would also be appreciated.
def ordered_set(list_):
    newlist = []
    lastitem = None
    for item in list_:
        if item != lastitem:
            newlist.append(item)
            lastitem = item
    return newlist
The above function assumes that none of the items will be None, and that the items are in order (i.e., ['a', 'a', 'a', 'b', 'b', 'c', 'd']).
The above function returns ['a', 'a', 'a', 'b', 'b', 'c', 'd'] as ['a', 'b', 'c', 'd'].

Another very fast method with set:
def remove_duplicates(lst):
    dset = set()
    # relies on the fact that dset.add() always returns None.
    return [item for item in lst
            if item not in dset and not dset.add(item)]

Use an OrderedDict:
from collections import OrderedDict
l = ['a', 'a', 'a', 'b', 'b', 'c', 'd']
d = OrderedDict()
for x in l:
    d[x] = True
# prints a b c d
for x in d:
    print x,
print

Assuming the input sequence is unordered, here's an O(N) solution (both in space and time).
It produces a sequence with duplicates removed, while leaving unique items in the same relative order as they appeared in the input sequence.
>>> def remove_dups_stable(s):
...     seen = set()
...     for i in s:
...         if i not in seen:
...             yield i
...             seen.add(i)
>>> list(remove_dups_stable(['q', 'w', 'e', 'r', 'q', 'w', 'y', 'u', 'i', 't', 'e', 'p', 't', 'y', 'e']))
['q', 'w', 'e', 'r', 'y', 'u', 'i', 't', 'p']

I know this has already been answered, but here's a one-liner (plus import):
from collections import OrderedDict
def dedupe(_list):
    return OrderedDict((item,None) for item in _list).keys()
>>> dedupe(['q', 'w', 'e', 'r', 'q', 'w', 'y', 'u', 'i', 't', 'e', 'p', 't', 'y', 'e'])
['q', 'w', 'e', 'r', 'y', 'u', 'i', 't', 'p']
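The question targets Python 2.7, but for readers on Python 3.7 or later the same idea works with a plain dict, because dicts keep insertion order there. A sketch under that assumption:

```python
def dedupe(items):
    """Remove duplicates while keeping first-seen order (assumes Python 3.7+)."""
    # dict.fromkeys keeps one key per distinct item, in insertion order.
    return list(dict.fromkeys(items))


print(dedupe(['q', 'w', 'e', 'r', 'q', 'w', 'y', 'u', 'i', 't', 'e', 'p', 't', 'y', 'e']))
# ['q', 'w', 'e', 'r', 'y', 'u', 'i', 't', 'p']
```

Like the OrderedDict and set versions, this requires the items to be hashable.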

I think this is perfectly OK. You get O(n) performance which is the best you could hope for.
If the list were unordered, then you'd need a helper set to contain the items you've already visited, but in your case that's not necessary.

If your list isn't sorted then your question doesn't make sense.
E.g. [1,2,1] could become [1,2] or [2,1].
If your list is large you may want to write your result back into the same list using a slice to save on memory:
>>> x=['a', 'a', 'a', 'b', 'b', 'c', 'd']
>>> x[:]=[x[i] for i in range(len(x)) if i==0 or x[i]!=x[i-1]]
>>> x
['a', 'b', 'c', 'd']
For inline deleting see "Remove items from a list while iterating" or "Remove items from a list while iterating without using extra memory in Python".
One trick you can use: if you know x is sorted and you know x[i]==x[i+j], then you don't need to check anything between x[i] and x[i+j] (and if you don't need to delete these j values, you can just copy the values you want into a new list).
So while you can't beat n operations if everything in the list is unique, i.e. len(set(x))==len(x), there is probably an algorithm that has n comparisons as its worst case but can have n/2 comparisons as its best case (or fewer than n/2 if you somehow know in advance that len(x)/len(set(x))>2 because of the data you've generated).
The optimal algorithm would probably use binary search to find the maximum j for each minimum i in a divide-and-conquer type approach. Initial divisions would probably be of length len(x)/approximated(len(set(x))). Hopefully it could be carried out such that even if len(x)==len(set(x)) it still uses only n operations.
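A simple version of the binary search idea can be sketched with the standard bisect module. This is not the optimal algorithm speculated about above: it needs roughly k * log(n) comparisons for k distinct values, which pays off for long runs of equal items but is worse than a linear scan when nearly everything is unique. The function name unique_sorted is made up for this example:

```python
from bisect import bisect_right


def unique_sorted(xs):
    """Return the distinct values of an already sorted list xs."""
    result = []
    i = 0
    while i < len(xs):
        result.append(xs[i])
        # Jump straight past the whole run of values equal to xs[i],
        # searching only in xs[i:].
        i = bisect_right(xs, xs[i], i)
    return result


print(unique_sorted(['a', 'a', 'a', 'b', 'b', 'c', 'd']))  # ['a', 'b', 'c', 'd']
```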

There is a unique_everseen recipe described in the itertools documentation:
http://docs.python.org/2/library/itertools.html
from itertools import ifilterfalse  # Python 2; named filterfalse in Python 3
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in ifilterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

Looks OK to me. If you really want to use sets, do something like this:
def ordered_set (_list) :
    result = set()
    lastitem = None
    for item in _list :
        if item != lastitem :
            result.add(item)
            lastitem = item
    return sorted(tuple(result))
I don't know what performance you will get; you should test it. It is probably about the same because of the method call overhead!
If you really are paranoid, just like me, read here:
http://wiki.python.org/moin/HowTo/Sorting/
http://wiki.python.org/moin/PythonSpeed/PerformanceTips
Just remembered this (it contains the answer):
http://www.peterbe.com/plog/uniqifiers-benchmark
