Remove duplicates from list, including original matching item - python

I tried searching and couldn't find this exact situation, so apologies if it exists already.
I'm trying to remove duplicates from a list as well as the original item I'm searching for. If I have this:
ls = [1, 2, 3, 3]
I want to end up with this:
ls = [1, 2]
I know that using set will remove duplicates like this:
print set(ls) # set([1, 2, 3])
But it still retains that 3 element which I want removed. I'm wondering if there's a way to remove the duplicates and original matching items too.

Use a list comprehension and list.count:
>>> ls = [1, 2, 3, 3]
>>> [x for x in ls if ls.count(x) == 1]
[1, 2]
>>>
Here is a reference on both of those.
Edit:
#Anonymous made a good point below. The above solution is perfect for small lists but may become slow with larger ones.
For large lists, you can do this instead:
>>> from collections import Counter
>>> ls = [1, 2, 3, 3]
>>> c = Counter(ls)
>>> [x for x in ls if c[x] == 1]
[1, 2]
>>>
Here is a reference on collections.Counter.

If items are contigious, then you can use groupby which saves building an auxillary data structure in memory...:
from itertools import groupby, islice
data = [1, 2, 3, 3]
# could also use `sorted(data)` if need be...
new = [k for k, g in groupby(data) if len(list(islice(g, 2))) == 1]
# [1, 2]

Related

How to create a list so that when I append a variable the first element gets removed from list after a certain threshold

Let's say I want to create a list. The list need to have a MAX length of 5. The list would operate as such:
list = []
list.append(1)
list = [1]
list.append(2)
list = [1,2]
..
list.append(5)
list = [1,2,3,4,5]
But, when I append another number the first element is removed:
list.append(6)
list = [2,3,4,5,6]
This is super basic and I can't figure this one out.
I don't want to use classes - can this be done with basic functions such as slices?
You could use a collections.deque for this:
>>> import collections
>>> data = collections.deque(maxlen=5)
>>> data.append(1)
>>> data.append(2)
>>> data.append(3)
>>> data.append(4)
>>> data.append(5)
>>> data
deque([1, 2, 3, 4, 5], maxlen=5)
>>> data.append(6)
>>> data
deque([2, 3, 4, 5, 6], maxlen=5)
>>>
Note, this can pretty much work like a list object, however, it does have different performance characteristics, perhaps most importantly, no random access (basically, a linked list of arrays, but it won't have constant-time access like an array-list, which is what a list object is).
However, it is particularly meant for these types of operations, a "double-ended queue", it allows for efficient removal and addition at both ends. A list can only efficiently append and pop, but adding / removing from the beginning will be inefficient.
If you need it to be a list, you can use a slightly different approach to add the items (in-place):
L = []
L[:] = L[-4:] + [1] # [1]
L[:] = L[-4:] + [2] # [1, 2]
L[:] = L[-4:] + [3] # [1, 2, 3]
L[:] = L[-4:] + [4] # [1, 2, 3, 4]
L[:] = L[-4:] + [5] # [1, 2, 3, 4, 5]
L[:] = L[-4:] + [6] # [2, 3, 4, 5, 6]
Which you could place in a function to make it easier to use:
def appendMax(L,maxLen,value): # maxLen > 1
L[:] = L[1-maxLen:] + [value]
L = []
appendMax(L,5,1) # [1]
appendMax(L,5,2) # [1, 2]
appendMax(L,5,3) # [1, 2, 3]
appendMax(L,5,4) # [1, 2, 3, 4]
appendMax(L,5,5) # [1, 2, 3, 4, 5]
appendMax(L,5,6) # [2, 3, 4, 5, 6]
Use deque() from collections. It will hold certain amount of variables, after that it will delete the first item while appending.
If you don't want to use libraries or classes, you can simply create a list with N none values inside it, then loop inside the list to replace the values inside it.

Removing duplicates from a list but returning the same list

I need to remove the duplicates from a list but return the same list.
So options like:
return list(set(list))
will not work for me, as it creates a new list instead.
def remove_extras(lst):
for i in lst:
if lst.count(i)>1:
lst.remove(i)
return lst
Here is my code, it works for some cases, but I dont get why it does not work for remove_extras([1,1,1,1]), as it returns [1,1] when the count for 1 should be >1.
You can use slice assignment to replace the contents of the list after you have created a new list. In case order of the result doesn't matter you can use set:
def remove_duplicates(l):
l[:] = set(l)
l = [1, 2, 1, 3, 2, 1]
remove_duplicates(l)
print(l)
Output:
[1, 2, 3]
You can achieve this using OrderedDict which removes the duplicates while maintaining order of the list.
>>> from collections import OrderedDict
>>> itemList = [1, 2, 0, 1, 3, 2]
>>> itemList[:]=OrderedDict.fromkeys(itemList)
>>> itemList
[1, 2, 0, 3]
This has a Runtime : O(N)

Python list comprehension and do action

Is it possible to do an action with the item in a list comprehension?
Example:
list = [1, 2, 3]
list_filtered = [ i for i in list if i == 3 AND DO SOMETHING WITH I]
print (list_filtered)
For example if I want to remove the '3' how would I go about it? Logic says that it's something like:
list = [1, 2, 3]
list_filtered = [ i for i in list if i == 3 && list.remove(i) ]
print (list_filtered)
I can't seem to make Python perform an action with that 'i' with any syntax that I tried. Can someone please elaborate?
EDIT: Sorry, the explanation might not be clear enough. I know how to iterate and create the new list. I want to create "list_filtered" and remove that value from "list" if it fits the "IF" statement.
Practically I need the following:
list = [1, 2, 3]
list_filtered = [ i for i in list if i == 3 && list.remove(i) ]
print (list_filtered)
# output >> [3]
print (list)
# output >> [1, 2]
I hope the above makes it more clear. Also, please note that my question is if this can be done in the list comprehension specifically. I know how to do it with additional code. :)
EDIT2: Apparently what I wanted to do isn't possible and also isn't advisable (which is the reason it isn't possible). It seemed like a logical thing that I just didn't know how to do. Thanks guys :)
If you're just trying to remove 3 you could do:
list_filtered=[i for i in list if i != 3]
print(list_filtered) # [1,2]
This will remove all values that are not equal to 3.
Alternatively if you wanted to do something like increment all the items in the list you would do:
[i+1 for i in list]
>>> [2,3,4]
Using a function on every item of the list would look like:
[float(i) for i in list]
>>> [1.0, 2.0, 3.0]
You can do ternary statements:
[i if i<3 else None for i in list]
>>>[1, 2, None]
and a whole lot more...
Here is more documentation on list comprehensions.
Given your new updates, I would try something like:
list_filtered=[list.pop(list.index(3))]
Then list_filtered would be [3] and list would be [1,2] as your specified.
Your miss understand the purpose of list comprehension. List comprehension should be used to create a list, not to use its side effects. Further more, as Leijot has already mentioned, you should never modify a container while iterating over it.
If you want to filter out certain elements in a list using list comprehension, use an if statement:
>>> l = [1, 2, 3]
>>> l_filtered = [i for i in l if i != 3]
>>> l_filtered
[1, 2]
>>>
Alternatively, you can use the builtin function filter():
>>> l = [1, 2, 3]
>>> l_filtered = list(filter(lambda x: x != 3, l))
>>> l_filtered
[1, 2]
>>>
Edit: Based on your most recent edit, what your asking is absolutely possible. Just use two different list comprehensions:
>>> l = [1, 2, 3]
>>> l_filtered = [i for i in l if i == 3]
>>> l_filtered
[3]
>>> l = [i for i in l if i != 3] # reassigning the value of l to a new list
>>> l
[1, 2]
>>>
Maybe you want to provide a different example of what you're trying to do since you shouldn't remove elements from a list while you're iterating over it. Since list comprehensions create another list anyway, you could accomplish the same thing as your example by building a list that omits certain elements. If your goal is to create a list that excludes the value 3, to slightly modify your example:
l = [1, 2, 3]
list_filtered = [ i for i in l if i != 3 ]
l remains unaltered, but list_filtered now contains [1, 2].

Track value changes in a repetitive list in Python

I have a list with repeating values as shown below:
x = [1, 1, 1, 2, 2, 2, 1, 1, 1]
This list is generated from a pattern matching regular expression (not shown here). The list is guaranteed to have repeating values (many, many repeats - hundreds, if not thousands), and is never randomly arranged because that's what the regex is matching each time.
What I want is to track the list indices at which the entries change from the previous value. So for the above list x, I want to obtain a change-tracking list [3, 6] indicating that x[3] and x[6] are different from their previous entries in the list.
I managed to do this, but I was wondering if there was a cleaner way. Here's my code:
x = [1, 1, 1, 2, 2, 2, 1, 1, 1]
flag = []
for index, item in enumerate(x):
if index != 0:
if x[index] != x[index-1]:
flag.append(index)
print flag
Output: [3, 6]
Question: Is there a cleaner way to do what I want, in fewer lines of code?
It can be done using a list comprehension, with a range function
>>> x = [1, 1, 1, 2, 2, 2, 3, 3, 3]
>>> [i for i in range(1,len(x)) if x[i]!=x[i-1] ]
[3, 6]
>>> x = [1, 1, 1, 2, 2, 2, 1, 1, 1]
>>> [i for i in range(1,len(x)) if x[i]!=x[i-1] ]
[3, 6]
You can do something like this using itertools.izip, itertools.tee and a list-comprehension:
from itertools import izip, tee
it1, it2 = tee(x)
next(it2)
print [i for i, (a, b) in enumerate(izip(it1, it2), 1) if a != b]
# [3, 6]
Another alternative using itertools.groupby on enumerate(x). groupby groups similar items together, so all we need is the index of first item of each group except the first one:
from itertools import groupby
from operator import itemgetter
it = (next(g)[0] for k, g in groupby(enumerate(x), itemgetter(1)))
next(it) # drop the first group
print list(it)
# [3, 6]
If NumPy is an option:
>>> import numpy as np
>>> np.where(np.diff(x) != 0)[0] + 1
array([3, 6])
I'm here to add the obligatory answer that contains a list comprehension.
flag = [i+1 for i, value in enumerate(x[1:]) if (x[i] != value)]
instead multi-indexing that has O(n) complexity you can use an iterator to check for the next element in list :
>>> x =[1, 1, 1, 2, 2, 2, 3, 3, 3]
>>> i_x=iter(x[1:])
>>> [i for i,j in enumerate(x[:-1],1) if j!=next(i_x)]
[3, 6]
itertools.izip_longest is what you are looking for:
from itertools import islice, izip_longest
flag = []
leader, trailer = islice(iter(x), 1), iter(x)
for i, (current, previous) in enumerate(izip_longest(leader, trailer)):
# Skip comparing the last entry to nothing
# If None is a valid value use a different sentinel for izip_longest
if leader is None:
continue
if current != previous:
flag.append(i)

Returning a list of list elements

I need help writing a function that will take a single list and return a different list where every element in the list is in its own original list.
I know that I'll have to iterate through the original list that I pass through and then append the value depending on whether or not the value is already in my list or create a sublist and add that sublist to the final list.
an example would be:
input:[1, 2, 2, 2, 3, 1, 1, 3]
Output:[[1,1,1], [2,2,2], [3,3]]
I'd do this in two steps:
>>> import collections
>>> inputs = [1, 2, 2, 2, 3, 1, 1, 3]
>>> counts = collections.Counter(inputs)
>>> counts
Counter({1: 3, 2: 3, 3: 2})
>>> outputs = [[key] * count for key, count in counts.items()]
>>> outputs
[[1, 1, 1], [2, 2, 2], [3, 3]]
(The fact that these happen to be in sorted numerical order, and also in the order of first appearance, is just a coincidence here. Counters, like normal dictionaries, store their keys in arbitrary order, and you should assume that [[3, 3], [1, 1, 1], [2, 2, 2]] would be just as possible a result. If that's not acceptable, you need a bit more work.)
So, how does it work?
The first step creates a Counter, which is just a special subclass of dict made for counting occurrences of each key. One of the many nifty things about it is that you can just pass it any iterable (like a list) and it will count up how many times each element appears. It's a trivial one-liner, it's obvious and readable once you know how Counter works, and it's even about as efficient as anything could possibly be.*
But that isn't the output format you wanted. How do we get that? Well, we have to get back from 1: 3 (meaning "3 copies of 1") to [1, 1, 1]). You can write that as [key] * count.** And the rest is just a bog-standard list comprehension.
If you look at the docs for the collections module, they start with a link to the source. Many modules in the stdlib are like this, because they're meant to serve as source code for learning from as well as usable code. So, you should be able to figure out how the Counter constructor works. (It's basically just calling that _count_elements function.) Since that's the only part of Counter you're actually using beyond a basic dict, you could just write that part yourself. (But really, once you've understood how it works, there's no good reason not to use it, right?)
* For each element, it's just doing a hash table lookup (and insert if needed) and a += 1. And in CPython, it all happens in reasonably-optimized C.
** Note that we don't have to worry about whether to use [key] * count vs. [key for _ in range(count)] here, because the values have to be immutable, or at least of an "equality is as good as identity" type, or they wouldn't be usable as keys.
The most time efficient would be to use a dictionary:
collector = {}
for elem in inputlist:
collector.setdefault(elem, []).append(elem)
output = collector.values()
The other, more costly option is to sort, then group using itertools.groupby():
from itertools import groupby
output = [list(g) for k, g in groupby(sorted(inputlist))]
Demo:
>>> inputlist = [1, 2, 2, 2, 3, 1, 1, 3]
>>> collector = {}
>>> for elem in inputlist:
... collector.setdefault(elem, []).append(elem)
...
>>> collector.values()
[[1, 1, 1], [2, 2, 2], [3, 3]]
>>> from itertools import groupby
>>> [list(g) for k, g in groupby(sorted(inputlist))]
[[1, 1, 1], [2, 2, 2], [3, 3]]
What about this, as you said you wanted a function:
def makeList(user_list):
user_list.sort()
x = user_list[0]
output = [[]]
for i in user_list:
if i == x:
output[-1].append(i)
else:
output.append([i])
x = i
return output
>>> print makeList([1, 2, 2, 2, 3, 1, 1, 3])
[[1, 1, 1], [2, 2, 2], [3, 3]]

Categories