Python - Removing duplicate elements while keeping the index


I have two lists like:
x = ['A','A','A','B','B','C','C','C','D']
list_date = ['0101','0102','0103','0104','0105','0106','0107','0108','0109']
I want to remove the duplicate elements of the list, and this can be done with the answer in Removing elements that have consecutive duplicates.
However, the output I expect is:
['A','B','C','D']
['0101','0104','0106','0109']
That is:
For x, I want to remove the duplicate elements.
For list_date, I want to keep the dates that correspond to the remaining elements in x.
Is there any way to implement this?
Update (2020-06-14):
Thank you for the answers!
My data can also look like this:
y = ['A','A','A','B','B','C','C','C','D','A','A','C','C','B','B','B']
list_date = ['0101','0102','0103','0104','0105','0106','0107','0108','0109','0110','0111','0112','0113','0114','0115','0116']
The output should be
['A','B','C','D','A','C','B']
['0101','0104','0106','0109','0110','0112','0114']
How should I process a list like this?

You can use zip() to couple your data to your dates, use a loop and a set to remove dupes and zip() again to get single lists from it:
x = ['A','A','A','B','B','C','C','C','D']
list_date = ['0101','0102','0103','0104','0105','0106','0107','0108','0109']
r = []
k = zip(x,list_date)
s = set()
# go over the zipped values and remove dupes
for el in k:
    if el[0] in s:
        continue
    # not a dupe, add to result and set
    r.append(el)
    s.add(el[0])
data, dates = map(list, zip(*r))
print(data)
print(dates)
Output:
['A', 'B', 'C', 'D']
['0101', '0104', '0106', '0109']
See How to iterate through two lists in parallel?

Try this:
x = ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'D']
list_date = ['0101', '0102', '0103', '0104', '0105', '0106', '0107', '0108', '0109']
op = dict()
y = []
for i in range(len(x)):
    if x[i] not in y:
        y.append(x[i])
        op[x[i]] = list_date[i]
z = list(op.values())
print(y)
print(z)
Output
['A', 'B', 'C', 'D']
['0101', '0104', '0106', '0109']
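Because a dict keeps insertion order (guaranteed since Python 3.7), the op dict in this answer can do both jobs on its own: setdefault only stores a date the first time a label is seen. A minimal sketch reusing the x and list_date lists from the question; like the answers here, it drops every repeat, not just consecutive ones:

```python
x = ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'D']
list_date = ['0101', '0102', '0103', '0104', '0105', '0106', '0107', '0108', '0109']

# Map each label to the date of its first occurrence; later repeats are ignored
first_date = {}
for label, date in zip(x, list_date):
    first_date.setdefault(label, date)

labels = list(first_date)          # keys in first-seen order
dates = list(first_date.values())  # the matching first dates
print(labels)  # ['A', 'B', 'C', 'D']
print(dates)   # ['0101', '0104', '0106', '0109']
```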

You can use the zip function to do this:
l = ['A','A','A','B','B','C','C','C','D']
list_date = ['0101','0102','0103','0104','0105','0106','0107','0108','0109']
new_l = []
new_list_date = []
for i,j in zip(l,list_date):
    if i not in new_l:
        new_l.append(i)
        new_list_date.append(j)
print(new_l)
#['A', 'B', 'C', 'D']
print(new_list_date)
#['0101', '0104', '0106', '0109']
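The answers above drop every repeat, so for the updated data from 2020-06-14 they would return only ['A', 'B', 'C', 'D']. To drop only consecutive repeats, one option (a sketch, not taken from the answers above) is itertools.groupby, which groups runs of equal adjacent keys; keeping the first pair of each run gives the label together with its first date. The helper name dedupe_consecutive is hypothetical:

```python
from itertools import groupby
from operator import itemgetter

def dedupe_consecutive(labels, dates):
    """Keep the first (label, date) pair of every run of equal adjacent labels."""
    pairs = zip(labels, dates)
    # groupby starts a new group whenever the label changes; next() takes the run's first pair
    kept = [next(group) for _, group in groupby(pairs, key=itemgetter(0))]
    if not kept:
        return [], []
    out_labels, out_dates = map(list, zip(*kept))
    return out_labels, out_dates

y = ['A','A','A','B','B','C','C','C','D','A','A','C','C','B','B','B']
list_date = ['0101','0102','0103','0104','0105','0106','0107','0108',
             '0109','0110','0111','0112','0113','0114','0115','0116']
labels, dates = dedupe_consecutive(y, list_date)
print(labels)  # ['A', 'B', 'C', 'D', 'A', 'C', 'B']
print(dates)   # ['0101', '0104', '0106', '0109', '0110', '0112', '0114']
```

For the original x (where each letter forms a single run) this gives the same result as the answers above.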

Related

Remove NOT duplicates value from list

The scenario is something like this:
After joining several lists using:
list1 = ["A","B"]
list2 = ["A","B","C"]
list3 = ["C","D","E"]
mainlist = list1 + list2 + list3
mainlist.sort()
mainlist now looks like this:
mainlist = ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'E']
I would like to remove anything that is not a duplicate value. If a value appears more than once in the list, it must not be touched; if it appears only once in mainlist, I would like to delete it.
I tried this approach, but something isn't working:
for i in mainlist:
    if mainlist.count(i) <= 1:
        mainlist.remove(i)
    else:
        continue
but what I get back is a list that looks like this:
mainlist = ['A', 'A', 'B', 'B', 'C', 'C', 'E'] # value "D" is no longer present, but "E" is. Why?
What I would like to get is a list like this:
mainlist = ['A', 'A', 'B', 'B', 'C', 'C'] # all values that are NOT duplicates have been deleted
I can delete the duplicates with the code below:
for i in mainlist:
    if mainlist.count(i) > 1:
        mainlist.remove(i)
    else:
        continue
and then as a final result:
mainlist = ['A','B','C']
But the real question is: how can I delete the non-duplicates in a list?

You can find duplicates like this:
duplicates = [item for item in mainlist if mainlist.count(item) > 1]

You can use collections.Counter() to keep track of the frequencies of each item:
from collections import Counter
counts = Counter(mainlist)
[item for item in mainlist if counts[item] > 1]
This outputs:
['A', 'A', 'B', 'B', 'C', 'C']

Use collections.Counter to count the list elements. Use list comprehension to keep only the elements that occur more than once. Note that the list does not have to be sorted.
from collections import Counter
list1 = ["A","B"]
list2 = ["A","B","C"]
list3 = ["C","D","E"]
mainlist = list1 + list2 + list3
cnt = Counter(mainlist)
print(cnt)
# Counter({'A': 2, 'B': 2, 'C': 2, 'D': 1, 'E': 1})
dups = [x for x in mainlist if cnt[x] > 1]
print(dups)
# ['A', 'B', 'A', 'B', 'C', 'C']

Another solution, using numpy:
import numpy as np
u, c = np.unique(mainlist, return_counts=True)
out = np.repeat(u[c > 1], c[c > 1])
print(out)
Prints:
['A' 'A' 'B' 'B' 'C' 'C']

Your problem is that you are modifying the list while iterating over it. Removing "D" shifts "E" down to index 6, the position the loop has just visited, so the loop ends without ever checking "E".
Create a copy of the list and only modify the copy:
new_list = list(mainlist)
for i in mainlist:
    if mainlist.count(i) <= 1:
        new_list.remove(i)
    else:
        continue
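To see the skipping described above, a small sketch can record which elements the asker's original loop actually visits (the visited list is added purely for illustration):

```python
mainlist = ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'E']
visited = []
for i in mainlist:
    visited.append(i)  # record every element the loop reaches
    if mainlist.count(i) <= 1:
        mainlist.remove(i)  # shifts every later element one position to the left
print(visited)   # ['A', 'A', 'B', 'B', 'C', 'C', 'D']  ("E" is never visited)
print(mainlist)  # ['A', 'A', 'B', 'B', 'C', 'C', 'E']
```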

If you want to output only a list of duplicate elements in your lists, you can use sets and a comprehension to keep only the duplicates.
list1 = ["A","B"]
list2 = ["A","B","C"]
list3 = ["C","D","E"]
fulllist = list1 + list2 + list3
fullset = set(list1) | set(list2) | set(list3)
dups = [x for x in fullset if fulllist.count(x) > 1]
print(dups) # e.g. ['A', 'C', 'B']; the order can vary because sets are unordered
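If a stable order matters, collections.Counter is an alternative: it is a dict subclass, so it remembers the order in which values were first seen (Python 3.7+), and it counts everything in one pass instead of calling fulllist.count() once per value. A minimal sketch:

```python
from collections import Counter

list1 = ["A", "B"]
list2 = ["A", "B", "C"]
list3 = ["C", "D", "E"]
fulllist = list1 + list2 + list3

counts = Counter(fulllist)  # counts each value, keyed in first-seen order
dups = [x for x, n in counts.items() if n > 1]
print(dups)  # ['A', 'B', 'C']
```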

How to remove elements from a list that appear fewer than k = 2 times?

I am trying to keep elements of a list that appear at least twice, and remove the elements that appear less than twice.
For example, my list can look like:
letters = ['a', 'a', 'b', 'b', 'b', 'c']
I want to get a list with the letters that appear at least twice, i.e. the following list:
letters_appear_twice = ['a', 'b']
But since this is part of a bigger program, I don't know exactly what my lists look like, only that I want to keep the letters that are repeated at least twice. For the sake of understanding, though, we can assume I know what the list looks like!
I have tried the following:
letters = ['a', 'a', 'b', 'b', 'b', 'c']
for x in set(letters):
    if letters.count(x) > 2:
        while x in letters:
            letters.remove(x)
print(letters)
But this doesn't quite work the way I want it to...
Thank you in advance for any help!

letters = ['a', 'a', 'b', 'b', 'b', 'c']
res = []
for x in set(letters):
    if letters.count(x) >= 2:
        res.append(x)
print(res)
Prints (the order may vary, since it comes from a set):
['b', 'a']

Using your code above, you can make a new list and append to it:
new_list = []
for x in set(letters):
    if letters.count(x) >= 2:
        new_list.append(x)
print(new_list)
Output (the order may vary, since it comes from a set):
['b', 'a']

It is easier to create a new list than to manipulate the source list:
def letters_more_or_equal_to_k(letters, k):
    result = []
    for x in set(letters):
        if letters.count(x) >= k:
            result.append(x)
    result.sort()
    return result
def main():
    letters = ['a', 'a', 'b', 'b', 'b', 'c']
    k = 2
    result = letters_more_or_equal_to_k(letters, k)
    print(result) # prints ['a', 'b']
if __name__ == "__main__":
    main()

If you don't mind losing the original order of the values, here's one possible solution:
from collections import Counter
letters = ['a', 'a', 'b', 'b', 'b', 'c']
c = Counter(letters)
to_remove = {x for x, i in c.items() if i < 2}
result = list(set(letters) - to_remove)
print(result)
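Output (the order may vary):
['a', 'b']
You can always sort afterwards.
Because Counter tallies every element in a single pass, this scales better than calling letters.count() once per unique element, which rescans the whole list each time; the difference grows with the number of unique elements.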
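Combining the Counter idea with the k parameter from the letters_more_or_equal_to_k function above gives a single-pass version that keeps the letters in the order they first appear rather than in set order. The name letters_at_least_k is hypothetical:

```python
from collections import Counter

def letters_at_least_k(letters, k):
    """Return each distinct letter that occurs at least k times, in first-seen order."""
    counts = Counter(letters)  # one pass over the list
    return [letter for letter, n in counts.items() if n >= k]

letters = ['a', 'a', 'b', 'b', 'b', 'c']
print(letters_at_least_k(letters, 2))  # ['a', 'b']
print(letters_at_least_k(letters, 3))  # ['b']
```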

Putting column values from text file into a list in python

I have a text file like this:
a w
b x
c,d y
e,f z
And I want to get the values of the first column into a list without duplicates. So far, I get the values from the first column like this:
f=open("file.txt","r")
lines=f.readlines()
firstCol=[]
for x in lines:
    firstCol.append(x.split(' ')[0])
f.close()
In the next step I want to separate the values by a comma delimiter the same way I did before, but then I get an output like this:
[['a'], ['b'], ['c', 'd'], ['e', 'f']]
How can I convert this into a one-dimensional list so that I can remove the duplicates afterwards?
I am a beginner in python.

You can split it again immediately after the first split, using extend instead of append:
f=open("file.txt","r")
lines=f.readlines()
firstCol=[]
for x in lines:
    firstCol.extend(x.split(' ')[0].split(','))
f.close()
print(firstCol)
Result
['a', 'b', 'c', 'd', 'e', 'f']
Or, if you want to keep firstCol:
f=open("file.txt","r")
lines=f.readlines()
firstCol=[]
for x in lines:
    firstCol.append(x.split(' ')[0])
f.close()
one_dimension = []
for col in firstCol:
    one_dimension.extend(col.split(','))
print(firstCol)
print(one_dimension)
Result
['a', 'b', 'c,d', 'e,f']
['a', 'b', 'c', 'd', 'e', 'f']
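Neither version above removes the duplicates yet, which was the goal of the question. A sketch under a few assumptions: columns are separated by whitespace, blank lines should be skipped, and the first-seen order of the values should be kept; dict.fromkeys drops the repeats while preserving that order. The function name first_column_values is hypothetical, and io.StringIO stands in for the file here, with an extra "a q" line added only to show a duplicate being removed:

```python
import io

def first_column_values(lines):
    """Split each line's first column on commas and return the unique values in first-seen order."""
    values = []
    for line in lines:
        fields = line.split()  # split on any whitespace; also drops the trailing newline
        if not fields:  # skip blank lines
            continue
        values.extend(fields[0].split(','))
    # dict.fromkeys drops duplicates but keeps insertion order (Python 3.7+)
    return list(dict.fromkeys(values))

sample = io.StringIO("a w\nb x\nc,d y\ne,f z\na q\n")
print(first_column_values(sample))  # ['a', 'b', 'c', 'd', 'e', 'f']
```

With a real file, pass the open file object instead, e.g. `with open("file.txt") as f: print(first_column_values(f))`; the with block also closes the file automatically.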

You can use itertools.chain to flatten your list of lists and then use the built-in set class to remove the duplicates:
from itertools import chain
l = [['a'], ['b'], ['c', 'd'], ['e', 'f']]
set(chain.from_iterable(l))
# {'a', 'b', 'c', 'd', 'e', 'f'}
To flatten your list, you can also use a list comprehension:
my_l = [e for i in l for e in i]
# ['a', 'b', 'c', 'd', 'e', 'f']
The same with two simple for loops:
my_l = []
for i in l:
    for e in i:
        my_l.append(e)

Possible solution 1
If you are fine with your code, you can keep it as is and remove duplicates from the list of lists by executing the following:
import itertools
firstCol.sort()
firstCol = list(x for x,_ in itertools.groupby(firstCol))
Possible solution 2
If you want to convert the list of lists into one list of items:
firstCol = [x for y in firstCol for x in y]
If you want to also remove duplicates:
firstCol = list(set([x for y in firstCol for x in y]))

Split a list in python with an element as the delimiter? [duplicate]

This question already has answers here:
Python split for lists
(6 answers)
I want to create sub-lists from a list that has many repeating elements, i.e.
l = ['a', 'b', 'c', 'c', 'b', 'a', 'b', 'c', 'b', 'a']
Wherever the 'a' begins the list should be split. (preferably removing 'a' but not a must)
As such:
l = [ ['b', 'c', 'c', 'b'], ['b', 'c', 'b'] ]
I have tried new_list = [x.split('a')[-1] for x in l] but I am not getting the desired "New list" effect.

When you write,
new_list = [x.split('a')[-1] for x in l]
you are essentially performing,
result = []
for elem in l:
    result.append(elem.split('a')[-1])
That is, you're splitting each string contained in l on the letter 'a' and collecting the last piece of each split into the result.
Here's one possible implementation of the mechanic you're looking for:
def extract_parts(my_list, delim):
    # Locate the indices of all instances of ``delim`` in ``my_list``
    indices = [i for i, x in enumerate(my_list) if x == delim]
    # Collect each end-exclusive sublist bounded by each pair of consecutive indices
    sublists = []
    for i in range(len(indices)-1):
        part = my_list[indices[i]+1:indices[i+1]]
        sublists.append(part)
    return sublists
Using this function, we have
>>> l = ['a', 'b', 'c', 'c', 'b', 'a', 'b', 'c', 'b', 'a']
>>> extract_parts(l, 'a')
[['b', 'c', 'c', 'b'], ['b', 'c', 'b']]
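Note that extract_parts only returns the pieces between two delimiters, so anything before the first 'a' or after the last 'a' is dropped. A generator sketch (split_on is a hypothetical name) that also keeps those outer pieces, while skipping the empty pieces produced by leading, trailing, or adjacent delimiters:

```python
def split_on(items, delim):
    """Yield the runs of items between occurrences of delim, skipping empty runs."""
    chunk = []
    for item in items:
        if item == delim:
            if chunk:  # don't emit empty pieces
                yield chunk
            chunk = []  # start a new piece after each delimiter
        else:
            chunk.append(item)
    if chunk:  # the piece after the last delimiter, if any
        yield chunk

l = ['a', 'b', 'c', 'c', 'b', 'a', 'b', 'c', 'b', 'a']
print(list(split_on(l, 'a')))                # [['b', 'c', 'c', 'b'], ['b', 'c', 'b']]
print(list(split_on(['x', 'a', 'y'], 'a')))  # [['x'], ['y']]
```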

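You can use zip and enumerate to do that. Create a list of the positions just after each 'a' and cut the list at those points (each piece keeps its trailing 'a'):
l = ['a', 'b', 'c', 'c', 'b', 'a', 'b', 'c', 'b', 'a']
size = len(l)
id_list = [idx + 1 for idx, val in
           enumerate(l) if val == 'a']
result = [l[i:j] for i, j in zip([0] + id_list, id_list +
          ([size] if id_list[-1] != size else []))]
print(result)
# [['a'], ['b', 'c', 'c', 'b', 'a'], ['b', 'c', 'b', 'a']]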

This approach does not include the delimiter:
import itertools
lst = ['a', 'b', 'c', 'c', 'b', 'a', 'b', 'c', 'b', 'a']
delimiter = lst[0]
li=[list(value) for key,value in itertools.groupby(lst, lambda e: e == delimiter) if not key]
print(li)
Explanation: groupby starts a new group each time the key changes:
Key value
True itertools._grouper object pointing to group 'a'
False itertools._grouper object pointing to group 'b', 'c', 'c', 'b'
True itertools._grouper object pointing to group 'a'
False itertools._grouper object pointing to group 'b', 'c', 'b'
True itertools._grouper object pointing to group 'a'
The if clause keeps only the groups whose key is False (the runs without the delimiter), and list() turns each itertools._grouper object into a list.

Create a counters list with one counter per element you want to split at, then write a condition like this:
l = ['a', 'b', 'c', 'c', 'b', 'a', 'b', 'c', 'b', 'a']
counters = [0,0,0] #to count instances
index = 0 #index of l
startIndex = 0 #where to start split
endIndex = 0 #where to end split
splitLists = [] #container for splits
for element in l:
    if element == 'a': #find 'a'
        counters[0] += 1 #increase counter
        if counters[0] == 1: #if first instance
            startIndex = index + 1 #start split after
        if counters[0] == 2:
            endIndex = index #if second instance
            splitList = l[startIndex:endIndex] #end split here
            counters[0] = 1 #make second starting location
            startIndex = index + 1
            splitLists.append(splitList) #append to main list of lists
    index += 1
print(splitLists)
So basically you are finding the start and end index of the matching pattern within the list. You use these to split the list, and append this list into a main list of lists (2d list).

Splitting string values in list into individual values, Python

I have a list of values in which some values are words separated by commas, but are considered single strings as shown:
l = ["a",
"b,c",
"d,e,f"]
#end result should be
#new_list = ['a','b','c','d','e','f']
I want to split those strings and was wondering if there's a one-liner or something short to do such a transformation. So far, I was thinking of just iterating through l, calling .split(',') on each element and then merging the results, but that seems like it would take a while to run.

import itertools
new_list = list(itertools.chain(*[x.split(',') for x in l]))
print(new_list)
# ['a', 'b', 'c', 'd', 'e', 'f']
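itertools.chain.from_iterable takes the iterable of lists directly, so there is no need for * unpacking or a temporary list; a generator expression can feed it one split at a time. A minimal sketch:

```python
from itertools import chain

l = ["a", "b,c", "d,e,f"]
# from_iterable consumes the generator lazily, one split result at a time
new_list = list(chain.from_iterable(s.split(',') for s in l))
print(new_list)  # ['a', 'b', 'c', 'd', 'e', 'f']
```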

Kind of unusual, but you could join all your elements with ',' and then split them:
l = ["a",
"b,c",
"d,e,f"]
newList = ','.join(l).split(',')
print(newList)
Output:
['a', 'b', 'c', 'd', 'e', 'f']

Here's a one-liner using a (nested) list comprehension:
new_list = [item for csv in l for item in csv.split(',')]

Not exactly a one-liner, but 2 lines:
>>> l = ["a",
...      "b,c",
...      "d,e,f"]
>>> ll =[]
>>> [ll.extend(x.split(',')) for x in l]
[None, None, None]
>>> ll
['a', 'b', 'c', 'd', 'e', 'f']
The accumulator needs to be created separately because x.split(',') cannot be unpacked with * inside a comprehension (that is a SyntaxError).
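The comprehension above is only used for its side effect and builds a throwaway list of None values. A plain for loop does the same work without that extra list; a sketch (split_all is a hypothetical name):

```python
def split_all(strings, sep=','):
    """Split every string on sep and collect all the pieces in one flat list."""
    result = []
    for s in strings:
        result.extend(s.split(sep))  # extend appends the pieces one by one
    return result

l = ["a", "b,c", "d,e,f"]
print(split_all(l))  # ['a', 'b', 'c', 'd', 'e', 'f']
```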

