I have the list:
[1, 2, 3, 3, 4, 4, 5, 2, 3, 3]
and want to obtain:
[1, 2, 3, 3, 4, 5, 2, 3, 3]
That is, I don't want any 4s that are next to each other in my list.
How do I do this without importing any packages?
In [289]: L = [1, 2, 3, 3, 4, 4, 5, 2, 3, 3]

In [290]: answer = [L[0]]

In [291]: for n in L[1:]:
     ...:     if n == answer[-1] == 4:
     ...:         continue
     ...:     answer.append(n)
     ...:

In [292]: answer
Out[292]: [1, 2, 3, 3, 4, 5, 2, 3, 3]
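The loop above generalizes to a small helper that collapses consecutive runs of any chosen value (the function name is illustrative):

```python
def collapse_runs(seq, target):
    """Drop elements equal to target that immediately follow another target."""
    out = []
    for n in seq:
        if out and n == out[-1] == target:
            continue  # skip a repeated target
        out.append(n)
    return out

print(collapse_runs([1, 2, 3, 3, 4, 4, 5, 2, 3, 3], 4))
# [1, 2, 3, 3, 4, 5, 2, 3, 3]
```

Unlike the loop above, this version also handles an empty input, since it does not start from `seq[0]`.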
Use the built-in zip() function:
l = [1, 2, 3, 3, 4, 4, 5, 2, 3, 3]
l = [l[0]] + [j for i, j in zip(l, l[1:]) if not i == 4 == j]
print(l)
Output:
[1, 2, 3, 3, 4, 5, 2, 3, 3]
Explanation:
This line:
[j for i, j in zip(l, l[1:]) if not i == 4 == j]
is a list comprehension, where j iterates through all the values of l except the first, and i is the value immediately before j in each iteration.
j is added to the list unless both i and j equal 4, i.e. unless a 4 directly follows another 4.
Since j skips the first value, concatenate it back onto the front of the result with [l[0]] + .
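Another option, not shown in the answers above, is itertools.groupby, which groups runs of consecutive equal values, so a run of 4s can be collapsed to a single 4:

```python
from itertools import groupby

l = [1, 2, 3, 3, 4, 4, 5, 2, 3, 3]
out = []
for key, group in groupby(l):  # each group is a run of consecutive equal values
    run = list(group)
    # keep whole runs of other values, but only one element from a run of 4s
    out.extend(run if key != 4 else run[:1])
print(out)  # [1, 2, 3, 3, 4, 5, 2, 3, 3]
```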
My question is different from removing duplicates.
Removing duplicates:
list_1 = [1, 2, 3, 4, 4, 3, 3, 7]
becomes (you keep one copy of each duplicated value)
list_1 = [1, 2, 3, 4, 7]
My question:
list_1 = [1, 2, 3, 4, 4, 3, 3, 7]
becomes (you don't keep any of the duplicated values)
list_1 = [1, 2, 7]
The following will work (note that list.count scans the whole list on each call, so this approach is O(n²)):
list_1 = [1, 2, 3, 4, 4, 3, 3, 7]
res = [i for i in list_1 if list_1.count(i) == 1]
>>> print(res)
[1, 2, 7]
from collections import defaultdict

d = defaultdict(int)
for x in list_1:
    d[x] += 1
res = [k for k, v in d.items() if v == 1]
Complexity: O(n).
EDIT: even shorter, with the same complexity:
from collections import Counter
res = [k for k,v in Counter(list_1).items() if v == 1]
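A runnable version of the Counter approach; a minor variant is to filter the original list instead of the Counter, which keeps the original element order (the result is the same here):

```python
from collections import Counter

list_1 = [1, 2, 3, 4, 4, 3, 3, 7]
counts = Counter(list_1)  # one O(n) pass to tally occurrences
res = [k for k in list_1 if counts[k] == 1]  # keeps the list's original order
print(res)  # [1, 2, 7]
```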
If you have two lists,
a = [1, 2, 3, 4, 5]
b = [1, 3, 2, 4, 7]
how can you count the number of times elements at the same position coincide? In the above example the 1 and the 4 line up, giving 2 cases of elements coinciding.
sum(a_ == b_ for a_, b_ in zip(a, b))
zip can give you the elements that share a position, and you can use sum to count the number of times they match:
a = [1, 2, 3, 4, 5]
b = [1, 3, 2, 4, 7]
print(sum(x == y for x, y in zip(a, b))) # 2
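One caveat worth noting: zip silently truncates to the shorter list, so unequal lengths go unnoticed. A small sketch that guards against this (the function name is illustrative):

```python
def count_matches(a, b):
    # zip stops at the shorter input, so check the lengths explicitly
    if len(a) != len(b):
        raise ValueError("lists must have equal length")
    return sum(x == y for x, y in zip(a, b))

print(count_matches([1, 2, 3, 4, 5], [1, 3, 2, 4, 7]))  # 2
```

On Python 3.10+ you can get the same protection with `zip(a, b, strict=True)`.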
The code below gives you both the positions that coincide and their total count.
a = [1, 2, 3, 4, 5]
b = [1, 3, 2, 4, 7]
print(len([i for i,val in enumerate(zip(a,b)) if val[0]==val[1]]))
To get the positions themselves you can use
print([i for i,val in enumerate(zip(a,b)) if val[0]==val[1]])
one more version:
a = [1, 2, 3, 4, 5]
b = [1, 3, 2, 4, 7]
print(sum(a[i] == b[i] for i in range(len(a))))
How about this?
# lists:
a = [1, 2, 3, 4, 5]
b = [1, 3, 2, 4, 7]

# initialize variables:
number_of_collisions = 0

# iterate over each element:
for i in range(len(a)):
    if a[i] == b[i]:
        number_of_collisions += 1

print(number_of_collisions)
Suppose I have a list of values
l = [1, 1, 2, 5, 2, 3, 4, 2]
I would like to extract duplicated pairs/clusters with their indices, e.g., [(0, 1), (2, 4, 7)]. Is there a fast way to do so? The length of the list could be >100000.
Update: I tried to construct an n² boolean matrix, but that took too much memory.
Use defaultdict:
from collections import defaultdict

l = [1, 1, 2, 5, 2, 3, 4]
d = defaultdict(list)  # key - number, value - list of indexes
for i, n in enumerate(l):
    d[n].append(i)  # add index to list for this number n
print(dict(d))
Output:
{1: [0, 1], 2: [2, 4], 3: [5], 4: [6], 5: [3]}
Complexity will be O(n) here.
To filter only duplicated items use:
[v for v in d.values() if len(v) > 1]
Output:
[[0, 1], [2, 4]]
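The two steps can be combined into one helper that returns the tuples the question asks for, using the full list from the question (the function name is illustrative):

```python
from collections import defaultdict

def duplicate_index_groups(seq):
    """Return tuples of indices for each value occurring more than once."""
    positions = defaultdict(list)
    for i, value in enumerate(seq):
        positions[value].append(i)
    # keep only values with more than one occurrence
    return [tuple(idx) for idx in positions.values() if len(idx) > 1]

print(duplicate_index_groups([1, 1, 2, 5, 2, 3, 4, 2]))
# [(0, 1), (2, 4, 7)]
```

This stays O(n), so it remains practical for lists with >100000 elements.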
Since you tagged pandas:
import pandas as pd

s = pd.DataFrame(enumerate(l))
s[s[1].duplicated(keep=False)].groupby(1)[0].apply(list)
1
1 [0, 1]
2 [2, 4, 7]
Name: 0, dtype: object
You can use numpy.unique followed by a list comprehension to get the desired collections of indices:
In [29]: l = [1, 1, 2, 5, 2, 3, 4, 2]
In [30]: u, inv, counts = np.unique(l, return_inverse=True, return_counts=True)
In [31]: [np.nonzero(inv == k)[0] for k in np.where(counts > 1)[0]]
Out[31]: [array([0, 1]), array([2, 4, 7])]
Here's another method that works if the values in l are all relatively small integers:
In [40]: l = [1, 1, 2, 5, 2, 3, 4, 2]
In [41]: al = np.array(l)
In [42]: [np.nonzero(al == k)[0] for k in np.where(np.bincount(l) > 1)[0]]
Out[42]: [array([0, 1]), array([2, 4, 7])]
We want to match each value against the list using list.index. After each match is found, restart the search from the position just after that match.
Code
lst = [1, 1, 1, 3, 4, 5, 5, 6, 7]

def duplicate(seq, value):
    start = -1
    indices = []
    while True:
        try:
            x = seq.index(value, start + 1)
        except ValueError:
            break
        else:
            indices.append(x)
            start = x
    return indices

from functools import partial
new = partial(duplicate, lst)
for a in lst:
    print(a, new(a))
So if we want to repeatedly test various keys against the same source list, we can use functools.partial to create a new function with the list argument already filled in.
Output:
1 [0, 1, 2]
1 [0, 1, 2]
1 [0, 1, 2]
3 [3]
4 [4]
5 [5, 6]
5 [5, 6]
6 [7]
7 [8]
You may use np.unique, np.flatnonzero and a list comprehension as follows (convert l to an array first, so that the == comparison is elementwise rather than a plain list comparison):
u_val, dupcount = np.unique(l, return_counts=True)
dups = u_val[dupcount > 1]
al = np.asarray(l)
out = [tuple(np.flatnonzero(al == item)) for item in dups]
In [98]: out
Out[98]: [(0, 1), (2, 4, 7)]
you can use groupby:
from itertools import groupby
from operator import itemgetter
[gr for gr in (tuple(e for e, _ in v) for _, v in groupby(sorted(enumerate(l),key=itemgetter(1)), key=itemgetter(1))) if len(gr) > 1]
output:
[(0, 1), (2, 4, 7)]
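The same groupby idea, unpacked into readable steps:

```python
from itertools import groupby
from operator import itemgetter

l = [1, 1, 2, 5, 2, 3, 4, 2]

# sort (index, value) pairs by value so equal values become adjacent
pairs = sorted(enumerate(l), key=itemgetter(1))
# group adjacent pairs by value, keeping only the indices from each group
groups = (tuple(i for i, _ in grp) for _, grp in groupby(pairs, key=itemgetter(1)))
result = [g for g in groups if len(g) > 1]
print(result)  # [(0, 1), (2, 4, 7)]
```

Note that the sort makes this O(n log n), unlike the O(n) defaultdict approach.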
Does anyone know how I can get the last index position of duplicate items in a python list containing duplicate and non-duplicate items?
I have a list sorted in ascending order with [1, 1, 1, 2, 2, 3, 3, 4, 5]
I want it to print the last index of duplicate items and index on non-duplicate items like this
2
4
6
7
8
I tried doing it this way, but could only print the starting index of duplicate elements and missed the non-duplicate items.
id_list = [1, 1, 1, 2, 2, 3, 3, 4, 5]
for i in range(len(id_list)):
    for j in range(i + 1, len(id_list)):
        if id_list[i] == id_list[j]:
            print(i)
Loop over the list with enumerate to get indexes and values, and store each value's index in a dictionary, so the last index "wins" when there are duplicates. In the end the indexes come out in order (plain dictionaries weren't ordered before Python 3.7, so an OrderedDict is used here):
import collections

lst = [1, 1, 1, 2, 2, 3, 3, 4, 5]
d = collections.OrderedDict()
for i, v in enumerate(lst):
    d[v] = i
print(list(d.values()))
prints:
[2, 4, 6, 7, 8]
The advantage of this solution is that it works even if the duplicates aren't consecutive.
Python 3.7 guarantees insertion order for plain dicts, so a simple dict comprehension solves it:
{v:i for i,v in enumerate(lst)}.values()
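As a runnable version of the dict-comprehension idea (Python 3.7+, where dicts preserve insertion order):

```python
lst = [1, 1, 1, 2, 2, 3, 3, 4, 5]

# later occurrences overwrite earlier ones, so each value maps to its last index
last_indices = list({v: i for i, v in enumerate(lst)}.values())
print(last_indices)  # [2, 4, 6, 7, 8]
```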
You can use enumerate and check the next index in the list. If an element is not equal to the element in the next index, it is the last duplicate:
lst = [1, 1, 1, 2, 2, 3, 3, 4, 5]
result = [i for i, x in enumerate(lst) if i == len(lst) - 1 or x != lst[i + 1]]
print(result)
# [2, 4, 6, 7, 8]
You can use a list comprehension with enumerate and zip. The last value will always be in scope, so we can include this at the end explicitly.
L = [1, 1, 1, 2, 2, 3, 3, 4, 5]
res = [idx for idx, (i, j) in enumerate(zip(L, L[1:])) if i != j] + [len(L) - 1]
print(res)
# [2, 4, 6, 7, 8]
I am trying to remove elements from a list when they occur an even number of times, and keep a single copy of the rest.
For example, in the list [1, 2, 3, 3, 3, 5, 8, 1, 8]: 1 appears 2 times, 3 appears 3 times, and 8 appears 2 times. So the 1s and 8s should be dropped entirely, and the three 3s reduced to a single 3.
This is what I came up with:
def remove_odd_duplicates(arr):
    h = {}
    for i in arr:
        if i in h:
            h[i] += 1
        else:
            h[i] = 1
    arr = []
    for i in h:
        if h[i] % 2:
            arr.append(i)
    return arr
It returns everything correctly: [2, 3, 5], but I do believe that this can be written in a nicer way. Any ideas?
You can use collections.Counter and list comprehension, like this
data = [1, 2, 3, 3, 3, 5, 8, 1, 8]
from collections import Counter
print([item for item, count in Counter(data).items() if count % 2])
# [2, 3, 5]
The Counter gives a dictionary, with every element in the input iterable as the keys and their corresponding counts as the values. So, we iterate over that dict and check if the count is odd and filter only those items out.
Note: The complexity of this solution is still O(N), just like your original program.
If order doesn't matter:
>>> a = [1, 2, 3, 3, 3, 5, 8, 1, 8]
>>> list(set([x for x in a if a.count(x)%2 == 1]))
[2, 3, 5]
The list comprehension [x for x in a if a.count(x)%2 == 1] returns only the elements which appear an odd number of times in the list. list(set(...)) is a common way of removing duplicate entries from a list.
you can possibly use scipy.stats.itemfreq:
>>> from scipy.stats import itemfreq
>>> xs = [1, 2, 3, 3, 3, 5, 8, 1, 8]
>>> ifreq = itemfreq(xs)
>>> ifreq
array([[1, 2],
[2, 1],
[3, 3],
[5, 1],
[8, 2]])
>>> i = ifreq[:, 1] % 2 != 0
>>> ifreq[i, 0]
array([2, 3, 5])
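Note that scipy.stats.itemfreq has since been deprecated and removed from recent SciPy releases; numpy.unique with return_counts=True builds the same frequency table (a sketch of the equivalent):

```python
import numpy as np

xs = [1, 2, 3, 3, 3, 5, 8, 1, 8]
values, counts = np.unique(xs, return_counts=True)  # sorted values + counts
odd = values[counts % 2 != 0]  # values occurring an odd number of times
print(odd)  # [2 3 5]
```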