Replace duplicates in a list column - python

I have a list whose last column is a string of comma-separated items:
temp = ['AAA', 'BBB', 'CCC-DDD', 'EE,FFF,FFF,EE']
Now I want to remove the duplicates in that column.
I tried to make a list out of every column:
e = [s.split(',') for s in temp]
print e
Which gave me:
[['AAA'], ['BBB'], ['CCC-DDD'], ['EE', 'FFF', 'FFF', 'EE']]
Now I tried to remove the duplicates with:
y = list(set(e))
print y
Which ended in an error:
TypeError: unhashable type: 'list'
I'd appreciate any help.
Edit:
I didn't say exactly what the end result should be. The list should look like this:
temp = ['AAA', 'BBB', 'CCC-DDD', 'EE', 'FFF']
Only the duplicates in the last column should be removed.

Apply set to each element of the list, not to the list of lists. You want each set to contain the strings of one sublist, not the sublists themselves.
e = [list(set(x)) for x in e]
You can do it directly as well:
e = [list(set(s.split(','))) for s in temp]
>>> e
[['AAA'], ['BBB'], ['CCC-DDD'], ['EE', 'FFF']]
You may want sorted(set(s.split(','))) instead to ensure lexicographic order (sets aren't ordered, even in Python 3.7).
For a flat, ordered list, build a flat set comprehension and sort it:
e = sorted({x for s in temp for x in s.split(',')})
Result:
['AAA', 'BBB', 'CCC-DDD', 'EE', 'FFF']
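If first-seen order matters more than sorted order, one possible sketch is to deduplicate with dict.fromkeys. This assumes Python 3.7 or later, where dicts keep insertion order; the helper name dedupe_flat is hypothetical and not part of the answer above.

```python
def dedupe_flat(rows, sep=','):
    """Split each string on sep, flatten, and drop repeats, keeping first-seen order."""
    items = (x for s in rows for x in s.split(sep))
    # dict keys are unique and (in Python 3.7+) keep insertion order
    return list(dict.fromkeys(items))


temp = ['AAA', 'BBB', 'CCC-DDD', 'EE,FFF,FFF,EE']
result = dedupe_flat(temp)
print(result)  # ['AAA', 'BBB', 'CCC-DDD', 'EE', 'FFF']
```

Unlike the sorted-set version, this does not reorder the items, so it only matches the sorted result when the input already happens to be in that order.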

Here is a solution that uses itertools.chain:
import itertools
temp = ['AAA', 'BBB', 'CCC-DDD', 'EE,FFF,FFF,EE']
y = list(set(itertools.chain(*[s.split(',') for s in temp])))
# e.g. ['EE', 'FFF', 'AAA', 'BBB', 'CCC-DDD'] (set order is arbitrary)

a = ['AAA', 'BBB', 'CCC-DDD', 'EE,FFF,FFF,EE']
b = [s.split(',') for s in a]
c = []
for i in b:
    c = c + i
c = list(set(c))
# c is e.g. ['EE', 'FFF', 'AAA', 'BBB', 'CCC-DDD'] (set order is arbitrary)

Here is a pure functional way to do it in Python:
from functools import partial
split = partial(str.split, sep=',')
list(map(list, map(set, (map(split, temp)))))
[['AAA'], ['BBB'], ['CCC-DDD'], ['EE', 'FFF']]
Or, since the expected result doesn't need lists inside of a list:
from itertools import chain
list(chain(*map(set, (map(split, temp)))))
['AAA', 'BBB', 'CCC-DDD', 'EE', 'FFF']

Related

Split and Append a list

I have a list as follows:
['aaa > bbb', 'ccc > ddd', 'eee > ']
I am looking to split the list to get the following result, with an empty string element at the end
['aaa', 'bbb', 'ccc', 'ddd', 'eee','']
I tried the following code:
for element in list:
    list.append(element.split(' > '))
I am getting the following result:
[['aaa', 'bbb'], ['ccc', 'ddd'], ['eee','']]
After thinking about it, I see that this is how it is supposed to work. So how can I achieve what I am looking for?
You can use a list comprehension:
l = ['aaa > bbb', 'ccc > ddd', 'eee > ']
[item.strip() for sublist in [element.split(">") for element in l] for item in sublist]
The output is:
['aaa', 'bbb', 'ccc', 'ddd', 'eee', '']
I admit that this list comprehension is not especially easy to read, but it's a one-liner :)
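Try this:
new_list = []
for element in list:
    # extend a separate list: extending the list being iterated over would never terminate
    new_list.extend(element.split(' > '))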
From the docs:
list.extend(iterable)
Extend the list by appending all the items from the iterable. Equivalent to a[len(a):] = iterable.
You can try the following:
a = ['aaa > bbb', 'ccc > ddd', 'eee > ']
newlist = []
for element in a:
    newlist += element.split(' > ')
The append method adds its argument as a single new element, whereas += adds each item of the split result to the list, which solves your problem.
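As another option, the flattening step can be sketched with itertools.chain.from_iterable, which joins the pieces of every split into one sequence. This is only an illustration using the sample list from the question.

```python
from itertools import chain

a = ['aaa > bbb', 'ccc > ddd', 'eee > ']

# Each split yields a small list; chain.from_iterable concatenates them lazily.
flat = list(chain.from_iterable(element.split(' > ') for element in a))
print(flat)  # ['aaa', 'bbb', 'ccc', 'ddd', 'eee', '']
```

The trailing empty string is kept because 'eee > ' ends with the separator.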

Python: for-in loop is not checking the last string in a list, why?

I'm using a for-in loop to remove any strings from a list titled "words" that start with "x" as part of a function, but find that this loop will not check the last string in the list. Why is this?
After adding some print statements to figure out where things were going wrong I narrowed it down to the second for-in loop, but beyond that I'm not sure what to do...
def front_x(words):
    print '\n'
    words.sort()
    print words
    words2 = []
    for string in words:
        if string[0] == 'x':
            words2.append(string)
            #print 'added ' + string + ' to words2'
        #else:
            #print '(append)checked ' + string
    for string in words:
        if string[0] == 'x':
            words.remove(string)
            print 'removed ' + string
        else: print 'checked ' + string
    words2.extend(words)
    return words2
As you can see, in each case it will check all of the elements in the list printed above except for the last. Below that are what my program got vs what it is supposed to get.
['axx', 'bbb', 'ccc', 'xaa', 'xzz']
checked axx
checked bbb
checked ccc
removed xaa
X got: ['xaa', 'xzz', 'axx', 'bbb', 'ccc', 'xzz']
expected: ['xaa', 'xzz', 'axx', 'bbb', 'ccc']
['aaa', 'bbb', 'ccc', 'xaa', 'xcc']
checked aaa
checked bbb
checked ccc
removed xaa
X got: ['xaa', 'xcc', 'aaa', 'bbb', 'ccc', 'xcc']
expected: ['xaa', 'xcc', 'aaa', 'bbb', 'ccc']
['aardvark', 'apple', 'mix', 'xanadu', 'xyz']
checked aardvark
checked apple
checked mix
removed xanadu
X got: ['xanadu', 'xyz', 'aardvark', 'apple', 'mix', 'xyz']
expected: ['xanadu', 'xyz', 'aardvark', 'apple', 'mix']
You are mutating the list as you are iterating over it. Behind the scenes Python is stepping through the numeric index of each item in the list. When you remove an item, all of the items with a higher index shift down by one, so the item that moves into the current position is never visited.
Instead, build a list of the indices you want to remove, then remove them. Or use a list comprehension to build a new list.
def front_x(words):
    words2 = [w for w in words if w.startswith('x')]
    return words2
If you want to also mutate the original list (modify words) with the function you can do so using:
def drop_front_x(words):
    words2 = []
    indices = [i for i, w in enumerate(words) if w.startswith('x')]
    for ix in reversed(indices):
        words2.insert(0, words.pop(ix))
    return words2
I think you need this to get a list that looks like your expected result:
not_xes = [i for i in words if not i.startswith('x')]
xes = [i for i in words if i.startswith('x')]
expected_result = xes + not_xes
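The skipped last item can be reproduced in a few lines. The sketch below uses the first sample list from the question; the variable names are made up for illustration. It records which items the loop actually visits, then repeats the removal while iterating over a copy.

```python
words = ['axx', 'bbb', 'ccc', 'xaa', 'xzz']
visited = []
for w in words:
    visited.append(w)
    if w.startswith('x'):
        words.remove(w)  # shifts 'xzz' into the slot the loop has already passed

print(visited)  # ['axx', 'bbb', 'ccc', 'xaa']
print(words)    # ['axx', 'bbb', 'ccc', 'xzz']

# Iterating over a shallow copy leaves the loop unaffected by the removals.
words_fixed = ['axx', 'bbb', 'ccc', 'xaa', 'xzz']
for w in words_fixed[:]:
    if w.startswith('x'):
        words_fixed.remove(w)

print(words_fixed)  # ['axx', 'bbb', 'ccc']
```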

Join items in python list separated by delimiter [duplicate]

I have a list like the following
list_1 = ['>name', 'aaa', 'bbb', '>name_1', 'ccc', '>name_2', 'ddd', 'eee', 'fff']
I am trying to join the items that lie between the items starting with the '>' sign. So what I want is:
list_1 = ['>name', 'aaabbb', '>name_1', 'ccc', '>name_2', 'dddeeefff']
How can I do that in python?
Use a generator function; that lets you control when items are 'done' to yield:
def join_unescaped(it):
    tojoin = []
    for element in it:
        if element.startswith('>'):
            if tojoin:
                yield ''.join(tojoin)
                tojoin = []
            yield element
        else:
            tojoin.append(element)
    if tojoin:
        yield ''.join(tojoin)
To produce a new list from your input, pass the generator object to the list() function:
result = list(join_unescaped(list_1))
Demo:
>>> list_1 = ['>name', 'aaa', 'bbb', '>name_1', 'ccc', '>name_2', 'ddd', 'eee', 'fff']
>>> def join_unescaped(it):
...     tojoin = []
...     for element in it:
...         if element.startswith('>'):
...             if tojoin:
...                 yield ''.join(tojoin)
...                 tojoin = []
...             yield element
...         else:
...             tojoin.append(element)
...     if tojoin:
...         yield ''.join(tojoin)
...
>>> list(join_unescaped(list_1))
['>name', 'aaabbb', '>name_1', 'ccc', '>name_2', 'dddeeefff']
>>> from itertools import groupby
>>> list_1 = ['>name', 'aaa', 'bbb', '>name_1', 'ccc', '>name_2', 'ddd', 'eee', 'fff']
>>> [''.join(v) for k, v in groupby(list_1, key=lambda s: s.startswith('>'))]
['>name', 'aaabbb', '>name_1', 'ccc', '>name_2', 'dddeeefff']
The only case to watch for here is if you have no items between > signs, which requires a simple fix.
>>> list_1 = ['>name', '>name0', 'aaa', 'bbb', '>name_1', 'ccc', '>name_2', 'ddd', 'eee', 'fff']
>>> [''.join(v) for k,v in groupby(list_1,key=lambda s:s.startswith('>')and s)]
['>name', '>name0', 'aaabbb', '>name_1', 'ccc', '>name_2', 'dddeeefff']
Side note: in the extremely unlikely case that you can have consecutive duplicate > names, like ['>name', '>name', 'aaa', ...], just change "and s" to "and object()". Every object() is unique, so that handles every possible case.
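The object() variant from the note above could look roughly like this. The function name join_between_markers and its marker parameter are hypothetical, chosen only for this sketch.

```python
from itertools import groupby


def join_between_markers(items, marker='>'):
    """Join runs of non-marker items; every marker item stays separate."""
    # Marker items get a fresh object() as key, so no two of them compare equal
    # and each forms its own group. All other items share the key False.
    def key(s):
        return s.startswith(marker) and object()

    return [''.join(group) for _, group in groupby(items, key=key)]


list_1 = ['>name', '>name', 'aaa', 'bbb', '>name_1', 'ccc']
joined = join_between_markers(list_1)
print(joined)  # ['>name', '>name', 'aaabbb', '>name_1', 'ccc']
```

With "and s" as the key, the two identical '>name' items would share a key and be joined into '>name>name'.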

Split a list in sublists according to character length

I have a list of strings and I would like to split that list into different "sublists" based on the character length of the words in the list, e.g.:
List = [a, bb, aa, ccc, dddd]
Sublist1 = [a]
Sublist2 = [bb, aa]
Sublist3 = [ccc]
Sublist4 = [dddd]
How can I achieve this in Python?
Thank you
By using itertools.groupby:
values = ['a', 'bb', 'aa', 'ccc', 'dddd', 'eee']
from itertools import groupby
output = [list(group) for key,group in groupby(sorted(values, key=len), key=len)]
The result is:
[['a'], ['bb', 'aa'], ['ccc', 'eee'], ['dddd']]
If your list is already sorted by string length and you just need to do grouping, then you can simplify the code to:
output = [list(group) for key,group in groupby(values, key=len)]
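To see why the sort matters, here is a small comparison on the same values list. groupby only merges adjacent items with equal keys, so on unsorted input a length can show up in more than one group.

```python
from itertools import groupby

values = ['a', 'bb', 'aa', 'ccc', 'dddd', 'eee']

# Without sorting: 'ccc' and 'eee' are not adjacent, so they land in separate groups.
unsorted_groups = [list(group) for _, group in groupby(values, key=len)]
print(unsorted_groups)  # [['a'], ['bb', 'aa'], ['ccc'], ['dddd'], ['eee']]

# With sorting by length first: one group per length.
sorted_groups = [list(group) for _, group in groupby(sorted(values, key=len), key=len)]
print(sorted_groups)  # [['a'], ['bb', 'aa'], ['ccc', 'eee'], ['dddd']]
```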
I think you should use a dictionary:
>>> dict_sublist = {}
>>> for el in List:
...     dict_sublist.setdefault(len(el), []).append(el)
...
>>> dict_sublist
{1: ['a'], 2: ['bb', 'aa'], 3: ['ccc'], 4: ['dddd']}
>>> from collections import defaultdict
>>> l = ["a", "bb", "aa", "ccc", "dddd"]
>>> d = defaultdict(list)
>>> for elem in l:
...     d[len(elem)].append(elem)
...
>>> sublists = list(d.values())
>>> print(sublists)
[['a'], ['bb', 'aa'], ['ccc'], ['dddd']]
Assuming you're happy with a list of lists indexed by length, how about something like:
by_length = []
for word in List:
    wl = len(word)
    # grow the outer list until index wl exists
    while len(by_length) <= wl:
        by_length.append([])
    by_length[wl].append(word)
print("The words of length 3 are %s" % by_length[3])

python list different way than googles solution

I am working on the Python exercises from Google and I can't figure out why I am not getting the correct answer for a list problem. I saw the solution and they did it differently than I did, but I think the way I did it should also work.
# B. front_x
# Given a list of strings, return a list with the strings
# in sorted order, except group all the strings that begin with 'x' first.
# e.g. ['mix', 'xyz', 'apple', 'xanadu', 'aardvark'] yields
# ['xanadu', 'xyz', 'aardvark', 'apple', 'mix']
# Hint: this can be done by making 2 lists and sorting each of them
# before combining them.
def front_x(words):
    # +++your code here+++
    list = []
    xlist = []
    for word in words:
        list.append(word)
    list.sort()
    for s in list:
        if s.startswith('x'):
            xlist.append(s)
            list.remove(s)
    return xlist+list
The call is:
front_x(['bbb', 'ccc', 'axx', 'xzz', 'xaa'])
I get:
['xaa', 'axx', 'bbb', 'ccc', 'xzz']
when the answer should be:
['xaa', 'xzz', 'axx', 'bbb', 'ccc']
I don't understand why my solution does not work.
Thank you.
You shouldn't modify a list while iterating over it. See the for statement documentation.
for s in list:
    if s.startswith('x'):
        xlist.append(s)
        list.remove(s) # this line causes the bug
Try this:
def front_x(words):
    lst = []
    xlst = []
    for word in words:
        if word.startswith('x'):
            xlst.append(word)
        else:
            lst.append(word)
    return sorted(xlst)+sorted(lst)
>>> front_x(['bbb', 'ccc', 'axx', 'xzz', 'xaa'])
['xaa', 'xzz', 'axx', 'bbb', 'ccc']
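A more compact alternative, which is not the approach the exercise hint suggests, is a single sorted() call with a tuple key. The name front_x_sorted is hypothetical.

```python
def front_x_sorted(words):
    """Return words sorted, with those starting with 'x' grouped first."""
    # False sorts before True, so 'x' words (not startswith -> False) come first;
    # the second tuple element sorts alphabetically within each group.
    return sorted(words, key=lambda w: (not w.startswith('x'), w))


result = front_x_sorted(['bbb', 'ccc', 'axx', 'xzz', 'xaa'])
print(result)  # ['xaa', 'xzz', 'axx', 'bbb', 'ccc']
```

Because sorted() returns a new list, the input is never modified while it is being read.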
