Remove duplicates without using list mutation - python

I am trying to remove adjacent duplicates from a list without using list mutations like del or remove. Below is the code I tried:
def remove_dups(L):
    L = [x for x in range(0,len(L)) if L[x] != L[x-1]]
    return L
print(remove_dups([1,2,2,3,3,3,4,5,1,1,1]))
This outputs:
[1, 3, 6, 7, 8]
Can anyone explain how this output occurred? I want to understand the flow, but I wasn't able to, even with debugging in VS Code.
Input:
[1,2,2,3,3,3,4,5,1,1,1]
Expected output:
[1,2,3,4,5,1]

I'll rename the variables to make this more readable:
def remove_dups(L):
    L = [x for x in range(0,len(L)) if L[x] != L[x-1]]
becomes:
def remove_dups(lst):
    return [index for index in range(len(lst)) if lst[index] != lst[index-1]]
As you can see, instead of looping over the items of the list, it loops over the indices, comparing the value at one index, lst[index], to the value at the previous index, lst[index-1], and only adding to the output if they don't match.
The two main issues are:
the first index is compared to index -1, which is the last item of the list, so the first item is dropped whenever it equals the last one (as it does here)
it returns the indices of the non-duplicated items, not the items themselves.
To make this work, I'd use the enumerate function, which returns each item together with its index:
def remove_dups(lst):
    return [item for index, item in enumerate(lst[:-1]) if item != lst[index+1]] + [lst[-1]]
Here I loop through all of the items except the last one (lst[:-1]) and check whether each item matches the next item, only adding it if it doesn't.
Finally, because the last value is never visited, it is appended to the output with + [lst[-1]]. Note that this version assumes a non-empty list.
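To see where [1, 3, 6, 7, 8] comes from, it can help to print every comparison the original comprehension makes. The sketch below is only an illustration: trace_original and remove_adjacent_dups are names made up here, and the second function is one possible fix (compare each item with its predecessor and always keep the first item), not the code from the question.

```python
def trace_original(lst):
    """Replay the original comprehension and return the indices it keeps."""
    kept = []
    for x in range(len(lst)):
        # For x == 0 this compares lst[0] with lst[-1], i.e. the last item.
        differs = lst[x] != lst[x - 1]
        print(f"x={x}: {lst[x]} vs previous {lst[x - 1]} -> keep index: {differs}")
        if differs:
            kept.append(x)
    return kept


def remove_adjacent_dups(lst):
    """Keep an item if it is the first one or differs from the item before it."""
    return [item for i, item in enumerate(lst) if i == 0 or item != lst[i - 1]]


data = [1, 2, 2, 3, 3, 3, 4, 5, 1, 1, 1]
print(trace_original(data))        # [1, 3, 6, 7, 8]
print(remove_adjacent_dups(data))  # [1, 2, 3, 4, 5, 1]
```

Because the first item is kept unconditionally, this variant also returns an empty list for empty input.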

This is a job for itertools.groupby:
from itertools import groupby
def remove_dups(L):
    return [k for k,g in groupby(L)]
L2 = remove_dups([1,2,2,3,3,3,4,5,1,1,1])
Output: [1, 2, 3, 4, 5, 1]

Related

Save list number within a list only if it contains elements in python

I have list of lists such as :
my_list_of_list=[['A','B','C','E'],['A','B','C','E','F'],['D','G','A'],['X','Z'],['D','M'],['B','G'],['X','Z']]
As you can see, lists 1 and 2 share the most elements (4). I want to keep a list from my_list_of_list only if at least one of those 4 shared elements (A, B, C or E) is present in it.
Here I would then save in list_shared_number only lists 1, 2, 3 and 6 (by index: 0, 1, 2 and 5), since the others do not contain A, B, C or E.
Expected output:
print(list_shared_number)
[0,1,2,5]
Probably suboptimal, because I need to iterate 3 times over the lists, but it gives the expected result:
from itertools import combinations
from functools import reduce
common_elements = [set(i).intersection(j)
                   for i, j in combinations(my_list_of_list, r=2)]
common_element = reduce(lambda i, j: i if len(i) >= len(j) else j, common_elements)
list_shared_number = [idx for idx, l in enumerate(my_list_of_list)
                      if common_element.intersection(l)]
print(list_shared_number)
# Output
[0, 1, 2, 5]
Alternative with 2 iterations:
common_element = set()  # set(), not {}: {} is an empty dict and has no .intersection
for i, j in combinations(my_list_of_list, r=2):
    c = set(i).intersection(j)
    common_element = c if len(c) > len(common_element) else common_element
list_shared_number = [idx for idx, l in enumerate(my_list_of_list)
                      if common_element.intersection(l)]
print(list_shared_number)
# Output
[0, 1, 2, 5]
You can find the shared elements with a list comprehension. Taking the lists at index 0 and index 1 (the pair that shares the most elements in this example):
share = [x for x in my_list_of_list[0] if x in my_list_of_list[1]]
print(share)
If j is each item of an inner list x, then [j for j in x if j in share] collects the elements that x has in common with share. If that list has more than 0 elements, the index of x belongs in the output.
So final code is like this:
share = [x for x in my_list_of_list[0] if x in my_list_of_list[1]]
my_list = [i for i, x in enumerate(my_list_of_list) if len([j for j in x if j in share]) > 0]
print(my_list)
You can use itertools.combinations and set operations.
In the first line, you find the intersection that is the longest among pairs of lists. In the second line, you iterate over my_list_of_list to identify the lists that contain elements from the set you found in the first line.
from itertools import combinations
comparison = max(map(lambda x: (len(set(x[0]).intersection(x[1])), set(x[0]).intersection(x[1])), combinations(my_list_of_list, 2)))[1]
out = [i for i, lst in enumerate(my_list_of_list) if comparison - set(lst) != comparison]
Output:
[0, 1, 2, 5]
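The two steps described above (find the largest pairwise intersection, then keep every list that touches it) can also be written with max and a key function. This is a minimal sketch, not any poster's code: the function name indices_sharing_largest_overlap is hypothetical, and it assumes that when several pairs tie for the largest overlap, taking the first one is acceptable.

```python
from itertools import combinations


def indices_sharing_largest_overlap(lists):
    """Return indices of lists that contain any element of the largest pairwise overlap."""
    # Intersection of every pair of lists, produced lazily.
    overlaps = (set(a) & set(b) for a, b in combinations(lists, 2))
    # Largest intersection; default covers inputs with fewer than two lists.
    best = max(overlaps, key=len, default=set())
    # A list qualifies if it has at least one element in common with `best`.
    return [i for i, lst in enumerate(lists) if best & set(lst)]


my_list_of_list = [['A', 'B', 'C', 'E'], ['A', 'B', 'C', 'E', 'F'], ['D', 'G', 'A'],
                   ['X', 'Z'], ['D', 'M'], ['B', 'G'], ['X', 'Z']]
print(indices_sharing_largest_overlap(my_list_of_list))  # [0, 1, 2, 5]
```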
Oh boy, so mine is a bit messy, however I did not use any imports AND I included the initial "finding" of the two lists which have the most in common with one another. This can easily be optimised but it does do exactly what you wanted.
my_list_of_list=[['A','B','C','E'],['A','B','C','E','F'],['D','G','A'],['X','Z'],['D','M'],['B','G'],['X','Z']]
my_list_of_list = list(map(set,my_list_of_list))
mostIntersects = [0, (None,)]
for i, IndSet in enumerate(my_list_of_list):
    for j in range(i+1,len(my_list_of_list)):
        intersects = len(IndSet.intersection(my_list_of_list[j]))
        if intersects > mostIntersects[0]: mostIntersects = [intersects, (i,j)]
FinalIntersection = set(my_list_of_list[mostIntersects[1][0]]).intersection(my_list_of_list[mostIntersects[1][1]])
skipIndexes = set(mostIntersects[1])
for i,sub_list in enumerate(my_list_of_list):
    [skipIndexes.add(i) for char in sub_list
     if i not in skipIndexes and char in FinalIntersection]
print(*map(list,(mostIntersects, FinalIntersection, skipIndexes)), sep = '\n')
The print provides this :
[4, (0, 1)]
['E', 'C', 'B', 'A']
[0, 1, 2, 5]
This works by first converting the lists to sets using the map function (the result has to be turned back into a list so I can use len and iterate properly). I then intersect each set with every later one in the list of lists and count how many elements each intersection holds. Each time I find one with a larger count, I set mostIntersects to that count and the pair of indexes. Once I have gone through them all, I take the sets at the two indexes (0 and 1 in this case) and intersect them to get the shared elements [A,B,C,E] (variable FinalIntersection). From there, I iterate over all the lists that are not already recorded and check whether any of their elements are found in FinalIntersection. If one is, the index of the list is added to skipIndexes. This results in the final collection of indexes that you were after. Technically the result is a set, but you can convert it back with list({0,1,2,5}), which will give you the value you were after.

Is there a preferred way of splitting a list into two based on indexes

For example, if I have a list containing integers
arr = [1,2,3,4,5,6]
I would like to split this list into two lists based on specific indexes.
If I specify the indexes 0, 1 and 3, it should return the old list with those items removed and a new list containing only the specified items.
arr = [1,2,3,4,5,6]
foo(arr, "013") # returns -> [3,5,6] and [1,2,4]
Here's one way using a generator function, which pops the elements from the input list as it yields them.
Because items are removed from the list while iterating, the indices must be processed in reverse sorted order; that way, removing an item never shifts the position of an item that still has to be removed.
def foo(l, ix):
    for i in sorted(list(ix), reverse=True):
        yield l.pop(int(i))
By calling the function we get the values that have been removed:
arr = [1,2,3,4,5,6]
list(foo(arr, "013"))[::-1]
# [1, 2, 4]
And these have been removed from the original list:
print(arr)
# [3, 5, 6]
Hi, you should look at the pop() function.
Using this function modifies the list directly.
The code should look like:
def foo(arr, indexes):
    res = []
    # process indexes in descending order so earlier pops don't shift the later ones
    for i in sorted(indexes, reverse=True):
        res.append(arr.pop(i))
    # res was filled back to front, so reverse it before returning
    return arr, res[::-1]
Thus foo(arr, [0,1,3]) returns: [3,5,6], [1,2,4]
Created one solution based on How to remove multiple indexes from a list at the same time?
which does not use yield.
arr = [1,2,3,4,5,6]
indexes = [0,1,3]
def foo(arr, indexes):
    temp = []
    for index in sorted(indexes, reverse=True):
        temp.append(arr.pop(index))
    return arr, temp # returns -> [3, 5, 6] and [4, 2, 1]
This is what you need:
def foo(arr,idxstr):
    out=[] # List which will contain elements according to string idxstr
    left=list(arr) # A copy of the main list, which ends up holding everything not in out
    for index in idxstr: # Iterates through every character in string idxstr
        out.append(arr[int(index)])
        left.remove(arr[int(index)]) # removes by value, so this assumes the values are unique
    return(out,left)
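The answers above either mutate the input with pop() or remove items by value. If leaving the original list untouched matters, the split can be done with two comprehensions over enumerate. This is only a sketch under assumptions: split_by_indexes is a made-up name, and it takes the indexes as an iterable of integers (a string such as "013" can be converted with map(int, ...), which only works for single-digit indexes).

```python
def split_by_indexes(items, indexes):
    """Return (rest, picked) without modifying `items`."""
    chosen = set(indexes)  # set membership keeps each lookup cheap
    picked = [x for i, x in enumerate(items) if i in chosen]
    rest = [x for i, x in enumerate(items) if i not in chosen]
    return rest, picked


arr = [1, 2, 3, 4, 5, 6]
print(split_by_indexes(arr, [0, 1, 3]))        # ([3, 5, 6], [1, 2, 4])
print(split_by_indexes(arr, map(int, "013")))  # ([3, 5, 6], [1, 2, 4])
print(arr)                                     # [1, 2, 3, 4, 5, 6]
```

Both result lists keep the original order of the items, and duplicate values in the input are handled correctly because the selection is by position rather than by value.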

Python: rearranging a list into alternating high and low elements?

I am new to Python and would like to rearrange a list into an alternating sequence of high and low values. For example, if the input is a=[1,2,3,4,5], then the output should be a=[5,1,4,2,3]. I solved it this way; does anyone have a better solution? Please guide me. Thank you in advance.
number=int(input("how many number do you want to input?"))
number_list=[]
for i in range(number):
    array=int(input())
    number_list.append(array)
print(number_list)
# empty array for adding value in list
tmp=[]
i=0
m=len(number_list)
while i<m:
    # when last element left in array it will add on our tmp list
    if i > len(number_list):
        tmp.extend(number_list)
        # once all value add in array it will break the program
        if len(tmp) == m:
            break
    else:
        high=max(number_list)
        low=min(number_list)
        tmp.append(high)
        tmp.append(low)
        #number_list.remove(high)
        #number_list.remove(low)
        # remove the element after added in the list
        number_list.remove(high)
        number_list.remove(low)
        #print(len(number_list))
        #print(tmp)
    i +=1
print(tmp)
It is really easy, using a basic for-loop and a helper list. You are overthinking it:
list1 = [1,2,3,4,5]
list1.sort()
resultList = []
for i in range(len(list1)):
    resultList.append(list1[len(list1)-1-i])
    resultList.append(list1[i])
resultList = resultList[:len(resultList)//2]
list1 = resultList
Now, if you try to print it:
print(list1) # [5, 1, 4, 2, 3]
Note: This only works for Python 3, you have to do minor adjustments for it to work with Python 2 as well
My idea is to combine two lists, b and c, alternately: the first is list a in descending order, and the second is a in ascending order.
import itertools
a = [1,2,3,4,5]
b=sorted(a, reverse=True)
c=sorted(a)
# Python 3 syntax; zip() is enough because b and c always have the same length
print(list(itertools.chain.from_iterable(zip(b,c)))[:len(a)])
Another approach using min(), max() and a generator:
# list passed in argument must be sorted
def get_new_list(a):
    # (len(a)+1)//2 rounds also cover even-length lists without hitting an empty slice
    for k in range((len(a)+1)//2):
        aa = max(a[k:len(a)-k])
        bb = min(a[k:len(a)-k])
        if aa != bb:
            yield aa
            yield bb
        else:
            yield aa
a = [1,2,3,4,5]
a.sort()
final = list(get_new_list(a))
print(final)
Another approach using for loop and list slicing. This approach is more efficient than using min() and max():
def get_new_list(a):
    # (len(a)+1)//2 rounds, so an even-length list never produces an empty slice
    for k in range((len(a)+1)//2):
        aa = a[k:len(a)-k]
        if len(aa) > 1:
            yield aa[-1]
            yield aa[0]
        else:
            yield aa[0]
a = [5,1,2,3,4]
# list must be sorted
a.sort()
final = list(get_new_list(a))
print(final)
Both will output:
[5, 1, 4, 2, 3]
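The same high/low interleaving can also be done without slicing, by walking two positions inward from both ends of the sorted list. This is a sketch rather than any poster's solution; the name high_low is hypothetical, and it sorts a copy so the caller's list is left as it was.

```python
def high_low(values):
    """Alternate largest, smallest, second largest, second smallest, ..."""
    ordered = sorted(values)          # sorted copy; the input is not modified
    lo, hi = 0, len(ordered) - 1
    result = []
    while lo < hi:
        result.append(ordered[hi])    # current largest remaining value
        result.append(ordered[lo])    # current smallest remaining value
        lo += 1
        hi -= 1
    if lo == hi:                      # odd length: one middle value is left over
        result.append(ordered[lo])
    return result


print(high_low([1, 2, 3, 4, 5]))  # [5, 1, 4, 2, 3]
print(high_low([1, 2, 3, 4]))     # [4, 1, 3, 2]
```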

Print a new line in between a list in python

I have a list of potentially unknown length like so:
list = [[1,2], [3,4,5], [6]]
I have a for loop that prints these items out, but I also want to be able to add an extra new line in between.
1
2

3
4
5

6
I don't want an additional new line after the final item or before the first. I have a for loop that prints out spaces in between the items in the line. However, there are instances where 1 or more indices are empty. In that case, I don't want to add an extra new line. I've managed to figure out if the first or last index is empty and how to deal with that, but not a middle index.
For example the above result should also be obtainable with this:
list = [[1, 2], [], [3, 4, 5], [6]]
I'm not sure what's the best way to determine this.
Use enumerate() like this:
for i, sub in enumerate(mylist):
    if i: print() # If you are using Python 2, remove the parentheses
    for x in sub:
        print(x)
Edit: I misunderstood your question a little bit. Since your second example list had invalid syntax, I assumed that meant just two sublists. The comment by PaulRooney has cleared that up, so you can do this:
should_print = False
for sub in mylist:
    if should_print: print()
    for x in sub:
        print(x)
    should_print = bool(sub)
Since you don't want extra space before or after, it sounds like str.join will probably be closer to what you want than the other answers, which print chunk by chunk. First you need a generator that formats each chunk on its own:
def parse_list(mylist):
    for seq in mylist:
        if seq: #is not empty
            yield "\n".join(map(str,seq))
#with print function
print(*parse_list(stuff), sep="\n\n")
#old print statement (but still forward compatible)
print ("\n\n".join(parse_list(stuff)))
You could also just use a generator expression if you only need to use this once:
each_block = ("\n".join(map(str,seq)) for seq in stuff if seq)
print("\n\n".join(each_block))
Or you could condense this to a single line but I wouldn't:
print("\n\n".join("\n".join(map(str,seq)) for seq in stuff if seq))
list = [[1,2],[], [3],[5,6,7],[8]]
first = True
for item in list:
    if first and item:
        first = False
        for number in item:
            print(number)
    else:
        if item:
            print('')
            for number in item:
                print(number)
list = [[1, 7, 3], [], [7, 10], None, [2, 3, 4]]
for i, v in enumerate(list):
    if v:
        if i != 0:
            print
        for x in v:
            print x
When you say an index is "empty" do you mean it contains None?
list1 = [[1,2], None, [3]]
Or it contains an empty list?
list1 = [[1,2], [], [3]]
Either way, I think this would work:
for i,j in enumerate(list1):
    if j:
        for k in j:
            print k
        if i+1 < len(list1):
            print
Edit: The above assumes that your input is always a list of lists, as your question seems to imply.
Comparing the current index to the length of the list to determine whether we're at the end will fail if the list has one or more empty elements at the end, e.g.:
list3 = [[1,2], [], [3], []]
This will output:
(start)
1
2

3

(end)
Note that there's a line break after the 3, which it sounds like you wouldn't want. You'd need some additional logic to account for that, if it's a possibility.
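One way to sidestep the trailing-separator problem is to drop the empty (or None) entries first and only then join the remaining groups, so a separator can only ever appear between two non-empty groups. The sketch below assumes Python 3 and uses a made-up function name, format_groups; it returns the text instead of printing it so the result is easy to check.

```python
def format_groups(groups):
    """Join items with newlines and non-empty groups with one blank line."""
    # Falsy entries ([] or None) are skipped, so they never add a separator.
    blocks = ["\n".join(str(x) for x in group) for group in groups if group]
    return "\n\n".join(blocks)


list3 = [[1, 2], [], [3], []]
print("(start)")
print(format_groups(list3))
print("(end)")
```

With list3 this prints 1 and 2, a blank line, then 3, with no blank line before (end).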
This should do the trick:
for idx, i in enumerate(l):
    if i:
        for j in i:
            print(j)
        if idx < len(l) - 1:
            print()

check for duplicates in a python list

I've seen a lot of variations of this question from things as simple as remove duplicates to finding and listing duplicates. Even trying to take bits and pieces of these examples does not get me my result.
My question is how am I able to check if my list has a duplicate entry? Even better, does my list have a non-zero duplicate?
I've had a few ideas -
#empty list
myList = [None] * 9
#all the elements in this list are None
#fill part of the list with some values
myList[0] = 1
myList[3] = 2
myList[4] = 2
myList[5] = 4
myList[7] = 3
#coming from C, I attempt to use a nested for loop
j = 0
k = 0
for j in range(len(myList)):
    for k in range(len(myList)):
        if myList[j] == myList[k]:
            print "found a duplicate!"
            return
If this worked, it would find the duplicate (None) in the list. Is there a way to ignore the None or 0 case? I do not care if two elements are 0.
Another solution I thought of was turn the list into a set and compare the lengths of the set and list to determine if there is a duplicate but when running set(myList) it not only removes duplicates, it orders it as well. I could have separate copies, but it seems redundant.
Try changing the actual comparison line to this (the j != k check keeps an element from matching itself):
if j != k and myList[j] == myList[k] and myList[j] not in [None, 0]:
I'm not certain whether you are trying to ascertain that a duplicate exists, or to identify the items that are duplicated (if any). Here is a Counter-based solution for the latter:
# Python 2.7
from collections import Counter
#
# Rest of your code
#
counter = Counter(myList)
dupes = [key for (key, value) in counter.iteritems() if value > 1 and key]
print dupes
The Counter object will automatically count occurrences of each item in your iterable. The list comprehension that builds dupes filters out all items appearing only once, and also items whose boolean evaluation is False (this filters out both 0 and None).
If your purpose is only to identify that duplication has taken place (without enumerating which items were duplicated), you could use the same method and test dupes:
if dupes: print "Something in the list is duplicated"
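For readers on Python 3, where iteritems() and the print statement no longer exist, the same Counter idea might look like the sketch below. The function name find_duplicates and the ignore parameter are assumptions added for illustration, not part of the answer above. Note that the membership test uses ==, so ignoring 0 also ignores False and 0.0.

```python
from collections import Counter


def find_duplicates(items, ignore=(None, 0)):
    """Return values that occur more than once, skipping anything in `ignore`."""
    counts = Counter(items)
    return [value for value, n in counts.items()
            if n > 1 and value not in ignore]


myList = [1, None, None, 2, 2, 4, None, 3, None]
dupes = find_duplicates(myList)
print(dupes)  # [2]
if dupes:
    print("Something in the list is duplicated")
```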
If you simply want to check whether the list contains duplicates: as soon as the function finds an element that occurs more than once, it returns 'Duplicate'.
my_list = [1, 2, 2, 3, 4]
def check_list(arg):
    for i in arg:
        if arg.count(i) > 1:
            return 'Duplicate'
print(check_list(my_list) == 'Duplicate') # prints True
To remove duplicates while keeping order and leaving 0 and None alone (this test leaves every falsy value alone; if you have other falsy values that should be handled differently, check for None and 0 explicitly):
print([ele for ind, ele in enumerate(lst) if ele not in lst[:ind] or not ele])
If you just want the first dup:
for ind, ele in enumerate(lst[:-1]):
    if ele in lst[ind+1:] and ele:
        print(ele)
        break
Or store seen in a set:
seen = set()
for ele in lst:
    if ele in seen:
        print(ele)
        break
    if ele:
        seen.add(ele)
You can use collections.defaultdict and specify a condition, such as non-zero / Truthy, and specify a threshold. If the count for a particular value exceeds the threshold, the function will return that value. If no such value exists, the function returns False.
from collections import defaultdict
def check_duplicates(it, condition, thresh):
    dd = defaultdict(int)
    for value in it:
        dd[value] += 1
        if condition(value) and dd[value] > thresh:
            return value
    return False
L = [1, None, None, 2, 2, 4, None, 3, None]
res = check_duplicates(L, condition=bool, thresh=1) # 2
Note in the above example the function bool will not consider 0 or None for threshold breaches. You could also use, for example, lambda x: x != 1 to exclude values equal to 1.
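In my opinion, this is the simplest solution I could come up with. It should work with any list. The only downside is that it does not count the number of duplicates, but instead just returns True or False:
from itertools import combinations
# has_duplicates is an illustrative name; it compares every pair of entries
def has_duplicates(mylist):
    return any(k == j for k, j in combinations(mylist, 2))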
Here's a bit of code that will show you how to remove None and 0 from the sets.
l1 = [0, 1, 1, 2, 4, 7, None, None]
l2 = set(l1)
l2.remove(None)
l2.remove(0)
