Python removing duplicates in a list


I want to remove duplicates from a Python list in a way that lets me alter another, corresponding list in the same way. In the example below, original is the list I want to de-duplicate. Each element of key corresponds to the element of original at the same index:
original = ['a', 'a', 'a', 3, 4, 5, 'b', 2, 'b']
key = ['h', 'f', 'g', 5, 'e', 6, 'u', 'z', 't']
So I want to remove duplicates from original such that whenever I delete an element from original, I also delete the corresponding element (at the same index) from key. The results I want:
deduplicated_original = ['a', 3, 4, 5, 'b', 2]
deduplicated_key = ['h', 5, 'e', 6, 'u', 'z']
I can get deduplicated_original using list(set(original)); however, I cannot get the corresponding deduplicated_key.

You can use a set to keep track of duplicates and enumerate() to iterate over the indexes and values of the original list:
seen = set()
lst = []
for i, v in enumerate(original):
    if v not in seen:
        lst.append(key[i])
        seen.add(v)
print(lst)
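The loop above only builds the de-duplicated key list. A minimal sketch that builds both result lists in one pass, assuming the items of original are hashable (the names dedup_original and dedup_key are illustrative, not from the answer):

```python
original = ['a', 'a', 'a', 3, 4, 5, 'b', 2, 'b']
key = ['h', 'f', 'g', 5, 'e', 6, 'u', 'z', 't']

seen = set()          # values of `original` encountered so far
dedup_original = []
dedup_key = []
for value, label in zip(original, key):
    if value not in seen:
        seen.add(value)
        dedup_original.append(value)   # keep the first occurrence
        dedup_key.append(label)        # and its partner from `key`

print(dedup_original)
print(dedup_key)
```

Unlike list(set(original)), this keeps the first occurrence of each item in its original order.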

This is maybe less elegant, and the list reversal and index slicing make it harder to follow.
The inner list comprehension walks the input list org backwards, asking whether there is an earlier matching element; if so, it records the index of that duplicate:
[len(org) - 1 - i
 for i, e in enumerate(org[::-1]) if e in org[:-i-1]]
The outer list comprehension then uses .pop() to modify org and ky as a side effect.
The nested list comprehension 'dups', a 'one-liner' (with line breaks):
org = ['a','a','a',3,4,5,'b',2,'b']
ky = ['h','f','g',5,'e',6,'u','z','t']
dups = [(org.pop(di), ky.pop(di))
        for di in [len(org) - 1 - i
                   for i, e in enumerate(org[::-1]) if e in org[:-i-1]]]
org, ky, dups
Out[208]:
(['a', 3, 4, 5, 'b', 2],
 ['h', 5, 'e', 6, 'u', 'z'],
 [('b', 't'), ('a', 'g'), ('a', 'f')])
Of course, you don't actually have to assign the list comprehension's result to anything to get the side effect of modifying the lists.

You can manually collect the indices of all duplicates like this:
indices = []
existing = set()
for i, item in enumerate(original):
    if item in existing:
        indices.append(i)
    else:
        existing.add(item)
and then remove those indices from your key list, in reverse order because deleting an item shifts the indices of all later items:
for i in reversed(indices):
    del key[i]
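The two steps above can be combined into one helper that trims both lists in place. This is only a sketch; the function name drop_duplicate_positions is hypothetical, and it assumes both lists have the same length:

```python
def drop_duplicate_positions(original, key):
    """Delete, in place, every position where `original` repeats an earlier item."""
    existing = set()
    indices = []
    for i, item in enumerate(original):
        if item in existing:
            indices.append(i)      # remember where the duplicate sits
        else:
            existing.add(item)
    # Delete from the end so earlier indices stay valid.
    for i in reversed(indices):
        del original[i]
        del key[i]
    return indices


original = ['a', 'a', 'a', 3, 4, 5, 'b', 2, 'b']
key = ['h', 'f', 'g', 5, 'e', 6, 'u', 'z', 't']
removed = drop_duplicate_positions(original, key)
print(original, key, removed)
```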

Related

Not able to delete string item with colon in python list

So I'm having the following problem while coding in python: I have a few string items in a list like so:
['X','Y','Z','A', 'B:C', 'D']
I want to delete everything past 'Z'. I use the following code to attempt this:
for item in lines:
    if ((item == "A")):
        lines.remove(item)
    if (item == "B:C"):
        lines.remove(item)
    if (item == "D"):
        lines.remove(item)
A and D get removed perfectly. However, B:C is not removed and stays in the list...
Mind you, A, D, B:C etc represent strings, not characters (e.g. A could be Transaction failed! and B:C can represent WRITE failure: cannot be done!)
How can this be solved?

Modifying a list while iterating over it is usually a bad thing. Some of the elements get skipped when you remove the current element. You may be able to fix it by iterating over reversed(lines), but it is better to create a new list that doesn't have the elements that you want to drop:
to_remove = {'A', 'B:C', 'D'}
new_lines = [line for line in lines if line not in to_remove]
Or, if you want to modify in-place:
to_remove = {'A', 'B:C', 'D'}
lines[:] = [line for line in lines if line not in to_remove]
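To see why an element gets skipped, here is a small sketch that reproduces the behaviour from the question on a copy of the list (the variable names buggy and fixed are illustrative):

```python
lines = ['X', 'Y', 'Z', 'A', 'B:C', 'D']
to_remove = {'A', 'B:C', 'D'}

buggy = list(lines)
for item in buggy:
    if item in to_remove:
        # Removing shifts every later item one slot to the left, so the
        # loop's internal index now points past the item that moved up.
        buggy.remove(item)

fixed = [line for line in lines if line not in to_remove]

print(buggy)  # ['X', 'Y', 'Z', 'B:C']
print(fixed)  # ['X', 'Y', 'Z']
```

After 'A' is removed, 'B:C' slides into the slot the loop has already visited, so it is never examined. The colon in the string has nothing to do with it.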

You can use the .index() method to find the index of a specific element in a list.
After finding z_index, you can create another list by slicing the first one.
Here's an example:
l1 = ['X','Y','Z','A', 'B:C', 'D']
# find the index of element 'Z'
z_index = l1.index('Z')
# slice the list from 0 up to and including z_index
l2 = l1[:z_index + 1]
print(l2)
Output:
['X', 'Y', 'Z']

Generally, it is not a good idea to delete elements from a list you are iterating. In your case, you may consider creating a new list with the result you want:
l = ['X','Y','Z','A', 'B:C', 'D']
clean_l = [i for i in l if i not in ('A', 'B:C', 'D')]
This is a good option if you know which elements you want to delete. However, if you know that you don't want anything after 'Z', regardless of the values, then just slice the list:
clean_l = l[:l.index('Z') + 1]

First, find the position of 'Z' using the index() method:
x = ['X','Y','Z','A', 'B:C', 'D']
position = x.index('Z')
Then, to delete everything after 'Z', I would do this:
del x[position+1:]
You have to add one to the position; otherwise it will delete 'Z' as well.
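One detail the index() approach leaves open is what happens when the marker is missing: list.index() raises ValueError in that case. A minimal sketch that handles it (the helper name keep_through and the choice to return the list unchanged are assumptions, not part of the answers above):

```python
def keep_through(items, marker):
    """Return a new list with everything after the first `marker` dropped."""
    try:
        position = items.index(marker)
    except ValueError:
        # Marker not present: keep everything (an assumed policy).
        return list(items)
    return items[:position + 1]   # +1 so the marker itself is kept


print(keep_through(['X', 'Y', 'Z', 'A', 'B:C', 'D'], 'Z'))
print(keep_through(['X', 'Y'], 'Z'))
```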

Finding duplicates in few lists

In my case, a duplicate is not just an item that reappears in one list; it must also reappear at the same positions in the other lists. For example:
list1 = [1,2,3,3,3,4,5,5]
list2 = ['a','b','b','c','b','d','e','e']
list3 = ['T1','T2','T3','T4','T3','T4','T5','T5']
So the positions of the real duplicates across all 3 lists are [2,4] and [6,7]: in list1, 3 is repeated; in list2, 'b' is repeated at the same positions as in list1; and in list3, 'T3' is as well. In the second case, 5, 'e', and 'T5' are duplicated items at the same positions in their lists. I am having a hard time producing the results "automatically" in one step.
1) I find the duplicates in the first list:
# Find duplicated part numbers (exact matches)
def list_duplicates(seq):
    seen = set()
    seen_add = seen.add
    # adds all elements it doesn't know yet to seen and all others to seen_twice
    seen_twice = set(x for x in seq if x in seen or seen_add(x))
    # turn the set into a list (as requested)
    return list(seen_twice)
# Lists of duplicated part numbers
D_list1 = list_duplicates(list1)
D_list2 = list_duplicates(list2)
2) Then I find the positions of a given duplicate and look at those positions in the second list:
# find the row positions of a duplicated part number
def list_position_duplicates(list1, n, D_list1):
    position = []
    gen = (i for i, x in enumerate(list1) if x == D_list1[n])
    for i in gen:
        position.append(i)
    return position
# Actual calculation: find the row positions of the duplicated part number, beginning and end
lpd_part = list_position_duplicates(list1, 1, D_list1)
start = lpd_part[0]
end = lpd_part[-1]
lpd_parent = list_position_duplicates(list2[start:end+1], 0, D_list2)
So in step 2 I have to supply n (the position of the found duplicate in the duplicates list) by hand. I would like to do this step automatically and get the positions of the elements that are duplicated at the same positions in all the lists, for all duplicates at the same time and not one by one "manually". I think it just needs a for loop or an if, but I'm new to Python; I tried many combinations and none of them worked.

You can use the items from all 3 lists at the same index as a key and store the corresponding indices as the value (in a list). If more than one index is stored for a key, it is a duplicate:
def solve(*lists):
    d = {}
    for i, k in enumerate(zip(*lists)):
        d.setdefault(k, []).append(i)
    for k, v in d.items():
        if len(v) > 1:
            print(k, v)
solve(list1, list2, list3)
# (3, 'b', 'T3') [2, 4]
# (5, 'e', 'T5') [6, 7]
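If the positions are needed for further processing rather than printed, the same idea can return a dictionary. A sketch using collections.defaultdict (the function name duplicate_positions is illustrative):

```python
from collections import defaultdict


def duplicate_positions(*lists):
    """Map each tuple of same-index items to its indices, keeping only repeats."""
    positions = defaultdict(list)
    for index, combo in enumerate(zip(*lists)):
        positions[combo].append(index)
    return {combo: idx for combo, idx in positions.items() if len(idx) > 1}


list1 = [1, 2, 3, 3, 3, 4, 5, 5]
list2 = ['a', 'b', 'b', 'c', 'b', 'd', 'e', 'e']
list3 = ['T1', 'T2', 'T3', 'T4', 'T3', 'T4', 'T5', 'T5']
result = duplicate_positions(list1, list2, list3)
print(result)
```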

Python: using a dict to speed sorting of a list of tuples

For some reason, I keep having 'how do I sort this list of tuples' questions. (A prior question of mine: sorting list of tuples by arbitrary key).
Here is some arbitrary raw input:
import random
number_of = 3 # or whatever
tuple_list = [(n, 'a', 'b', 'c') for n in range(number_of)] # [(0, 'a', 'b', 'c')...]
ordering_list = random.sample(range(number_of), number_of) # e.g. [1, 0, 2]
Sorting tuple_list by ordering_list using sorted:
ordered = sorted(tuple_list, key=lambda t: ordering_list.index(t[0]))
# ordered = [(1, 'a', 'b', 'c'), (0, 'a', 'b', 'c'), (2, 'a', 'b', 'c')]
I have a slightly awkward approach which seems to be much faster, especially as the number of elements in tuple_list grows. I create a dictionary, list_dict, breaking each tuple into a (tuple[0], tuple[1:]) item. I retrieve the dictionary items using ordering_list as keys, and then reassemble the sequence of (tuple[0], tuple[1:]) pairs into a list of tuples, using an idiom I'm still trying to wrap my head around completely: zip(*[iter(_list)] * x), where x is the length of each tuple composed of items from _list. So my question is: is there a version of this approach which manages the disassemble-reassemble part of the code better?
def gen_key_then_values(key_list, list_dict):
    for key in key_list:
        values = list_dict[key]
        yield key
        for n in values:
            yield n
list_dict = {t[0]: t[1:] for t in tuple_list}
ordered = list(zip(*[gen_key_then_values(ordering_list, list_dict)] * 4))
Note: better code, following a comment from Steve Jessop below:
list_dict = {t[0]: t for t in tuple_list}
ordered = [list_dict[k] for k in ordering_list]
My actual project code still requires assembling a tuple for each (k, ['a', 'b' ...]) item retrieved from the list_dict but there was no reason for me to include that part of the code here.

Breaking the elements of tuple_list apart in the dictionary doesn't really gain you anything and requires creating a bunch more tuples for the values. All you're doing is looking up elements in the list according to their first element, so it's probably not worth actually splitting them:
list_dict = { t[0] : t for t in tuple_list }
Note that this only works if the first element is unique, but then the ordering_list only makes sense if the first element is unique, so that's probably OK.
zip(*[iter(_list)] * 4) is just a way of grouping _list into fours, so give it a suitable name and you won't have to worry about it:
def fixed_size_groups(n, iterable):
    return zip(*[iter(iterable)] * n)
But all things considered you don't actually need it anyway:
ordered = list(list_dict[val] for val in ordering_list)
The reason your first code is slow is that ordering_list.index is slow: it searches through ordering_list for t[0], and it does this once for each t. So in total it inspects about (number_of ** 2) / 2 list elements on average.
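If you would still rather call sorted(), one way to avoid the repeated linear search is to precompute each key's position once. This is a sketch under the same assumption as above (first elements are unique); the name position is illustrative:

```python
import random

number_of = 1000
tuple_list = [(n, 'a', 'b', 'c') for n in range(number_of)]
ordering_list = random.sample(range(number_of), number_of)

# One pass to record where each key sits in ordering_list,
# so the sort key is a dictionary lookup instead of a list search.
position = {k: i for i, k in enumerate(ordering_list)}
ordered_by_sort = sorted(tuple_list, key=lambda t: position[t[0]])

# The direct lookup approach from above, for comparison.
list_dict = {t[0]: t for t in tuple_list}
ordered_by_lookup = [list_dict[k] for k in ordering_list]

print(ordered_by_sort == ordered_by_lookup)
```

Both variants produce the same ordering; the direct lookup skips the sort entirely, so it is the simpler choice when ordering_list already lists every key.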

Python - Arranging combinations of list into a list of tuples of various sizes

I have a list of strings:
l = ['a', 'b', 'c']
I want to create all possible combinations of the list elements in groups of different sizes. I would prefer this to be a list of tuples of tuples, but it could also be a list of lists of lists, etc. The order of the tuples, and of the tuples within the tuples, does not matter. No list element can be repeated in either the tuples or the tuples of tuples. For the above list, I would expect something like:
[(('a',), ('b',), ('c',)),
 (('a', 'b'), ('c',)),
 (('a', 'c'), ('b',)),
 (('b', 'c'), ('a',)),
 (('a', 'b', 'c'),)]
Any help is greatly appreciated.
EDIT:
I do require that each of the tuples in the list contain all of the elements of l.
senderle and Antimony, you are both correct regarding the omissions.

Here's one way to do things. I don't know if there are any more elegant methods. The itertools module has functions for combinations and permutations, but unfortunately, nothing for partitions.
Edit: My first version isn't correct, but fortunately, I already have this lying around from an old project I did.
You can also get a unique integer key that represents an edge bitset associated with each partition by returning d instead of d.values(). This is useful for efficiently testing whether one partition is a refinement of another.
def connectivityDictSub(num, d, setl, key, i):
    if i >= num:
        assert(key not in d)
        d[key] = setl
    else:
        for ni in range(len(setl)):
            nsetl, nkey = setl[:], key
            for other in nsetl[ni]:
                assert(other != i)
                x,y = sorted((i, other))
                ki = ((2*num-3-x)*x)//2 + y-1  # integer division keeps ki an int
                nkey |= 1<<ki
            nsetl[ni] = nsetl[ni] + [i] #not the same as += since it makes a copy
            connectivityDictSub(num, d, nsetl, nkey, i+1)
        nsetl = setl + [[i]]
        connectivityDictSub(num, d, nsetl, key, i+1)
def connectivityDict(groundSet):
    gset = sorted(set(groundSet))
    d = {}
    connectivityDictSub(len(gset), d, [], 0, 0)
    for setl in d.values():
        setl[:] = [tuple(gset[i] for i in x) for x in setl]
    return list(map(tuple, d.values()))
for x in connectivityDict('ABCD'):
    print(x)
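If the integer keys are not needed, a shorter recursive generator also enumerates the set partitions. This is a different technique from the code above, offered only as a sketch; the function name partitions is illustrative and the input is assumed to contain no repeated elements:

```python
def partitions(items):
    """Yield every way to split `items` into non-empty, disjoint groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing group in turn...
        for n, group in enumerate(smaller):
            yield smaller[:n] + [[first] + group] + smaller[n + 1:]
        # ...or give it a group of its own.
        yield [[first]] + smaller


result = [tuple(tuple(group) for group in p) for p in partitions(['a', 'b', 'c'])]
for p in result:
    print(p)
```

For three elements this yields five partitions, and every partition contains each element exactly once.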

itertools should do most of the job you want.
Example:
import itertools
stuff = [1, 2, 3]
for L in range(0, len(stuff)+1):
    for subset in itertools.combinations(stuff, L):
        print(subset)
The example is just to show itertools. You will have to adapt it to get the exact output you want.

Using an index to get an item

I have a list in python ('A','B','C','D','E'), how do I get which item is under a particular index number?
Example:
Say it was given 0, it would return A.
Given 2, it would return C.
Given 4, it would return E.

What you show, ('A','B','C','D','E'), is not a list; it's a tuple (the round parentheses instead of square brackets show that). Nevertheless, whether you index a list or a tuple (to get one item at an index), in either case you append the index in square brackets.
So:
thetuple = ('A','B','C','D','E')
print(thetuple[0])
prints A, and so forth.
Tuples (unlike lists) are immutable, so you couldn't assign to thetuple[0] etc. (as you could assign to an indexed position of a list). However, you can definitely just access ("get") the item by indexing in either case.
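A short sketch of that difference: reading by index works the same for both types, while assigning by index only works for the list (the variable names are illustrative):

```python
the_tuple = ('A', 'B', 'C', 'D', 'E')
the_list = ['A', 'B', 'C', 'D', 'E']

print(the_tuple[2])   # reading by index works for a tuple
print(the_list[-1])   # negative indices count from the end

the_list[0] = 'Z'     # lists are mutable, so assignment is fine
try:
    the_tuple[0] = 'Z'
except TypeError as exc:
    # Tuples do not support item assignment.
    print('tuple assignment failed:', exc)
```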

values = ['A', 'B', 'C', 'D', 'E']
values[0] # returns 'A'
values[2] # returns 'C'
# etc.

You can use the __getitem__(key) method.
>>> iterable = ('A', 'B', 'C', 'D', 'E')
>>> key = 4
>>> iterable.__getitem__(key)
'E'

Same as any other language, just pass index number of element that you want to retrieve.
#!/usr/bin/env python
x = [2,3,4,5,6,7]
print(x[5])

You can use pop(), but note that it also removes the item from the list:
x=[2,3,4,5,6,7]
print(x.pop(2))
The output is 4, and x is now [2, 3, 5, 6, 7].
