Join items in python list separated by delimiter [duplicate] - python

This question already has answers here:
Combine elements of lists if some condition
(3 answers)
Closed 8 years ago.
I have a list like the following
list_1 = ['>name', 'aaa', 'bbb', '>name_1', 'ccc', '>name_2', 'ddd', 'eee', 'fff']
I am trying to join the items that fall between the items starting with the '>' sign. So what I want is:
list_1 = ['>name', 'aaabbb', '>name_1', 'ccc', '>name_2', 'dddeeefff']
How can I do that in python?

Use a generator function; that lets you control when items are 'done' to yield:
def join_unescaped(it):
    tojoin = []
    for element in it:
        if element.startswith('>'):
            if tojoin:
                yield ''.join(tojoin)
                tojoin = []
            yield element
        else:
            tojoin.append(element)
    if tojoin:
        yield ''.join(tojoin)
To produce a new list from your input, pass the generator object to the list() function:
result = list(join_unescaped(list_1))
Demo:
>>> list_1 = ['>name', 'aaa', 'bbb', '>name_1', 'ccc', '>name_2', 'ddd', 'eee', 'fff']
>>> def join_unescaped(it):
...     tojoin = []
...     for element in it:
...         if element.startswith('>'):
...             if tojoin:
...                 yield ''.join(tojoin)
...                 tojoin = []
...             yield element
...         else:
...             tojoin.append(element)
...     if tojoin:
...         yield ''.join(tojoin)
...
>>> list(join_unescaped(list_1))
['>name', 'aaabbb', '>name_1', 'ccc', '>name_2', 'dddeeefff']

>>> from itertools import groupby
>>> list_1 = ['>name', 'aaa', 'bbb', '>name_1', 'ccc', '>name_2', 'ddd', 'eee', 'fff']
>>> [''.join(v) for k, v in groupby(list_1, key=lambda s: s.startswith('>'))]
['>name', 'aaabbb', '>name_1', 'ccc', '>name_2', 'dddeeefff']
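The only case to watch for is consecutive '>' items with nothing between them: they all share the key True, so groupby would join them into one string. Using the item itself as the key is a simple fix: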
>>> list_1 = ['>name', '>name0', 'aaa', 'bbb', '>name_1', 'ccc', '>name_2', 'ddd', 'eee', 'fff']
>>> [''.join(v) for k,v in groupby(list_1,key=lambda s:s.startswith('>')and s)]
['>name', '>name0', 'aaabbb', '>name_1', 'ccc', '>name_2', 'dddeeefff']
Side note: in the extremely unlikely case that you can have consecutive duplicate '>' names, like ['>name', '>name', 'aaa'....], just change "and s" to "and object()". Every object() is unique, so no two '>' items ever share a key, and that handles every possible case.
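The same grouping idea can also be written without any key tricks. The following is a minimal sketch, not the answerer's code: the function name join_between_headers and its marker parameter are hypothetical, introduced only for illustration. It keeps every '>' item as its own element, so consecutive or duplicate names need no special handling.

```python
from itertools import groupby

def join_between_headers(items, marker=">"):
    """Join runs of non-header strings; keep each header as its own item."""
    result = []
    # groupby splits the list into alternating runs of headers and non-headers.
    for is_header, group in groupby(items, key=lambda s: s.startswith(marker)):
        if is_header:
            result.extend(group)            # headers are never merged
        else:
            result.append("".join(group))   # everything between headers is joined
    return result

list_1 = ['>name', '>name', 'aaa', 'bbb', '>name_1', 'ccc']
print(join_between_headers(list_1))
# ['>name', '>name', 'aaabbb', '>name_1', 'ccc']
```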

Related

counting 2D list in python

my 2D list just like:
log = [[time1, 'aaa', '123.123.123.123'], [time2, 'def', '123.123.123.123'], [time3, 'aaa', '123.123.123.123'], [time4, 'bbb', '123.123.123.123'], [time5, 'bbb', '123.123.123.123']]
what I want is, the output below by using for loop:
aaa: 2
def: 1
bbb: 2
how can I count the specific col in a 2D list by loop?
This should give you the solution:
from collections import Counter
for k, v in Counter([a[1] for a in log]).items():
    print(f"{k}: {v}")
Output:
aaa: 2
def: 1
bbb: 2
If you want to try with the regular dict:
log = [[time1, 'aaa', '123.123.123.123'], [time2, 'def', '123.123.123.123'], [time3, 'aaa', '123.123.123.123'], [time4, 'bbb', '123.123.123.123'], [time5, 'bbb', '123.123.123.123']]
# Keep track of the counts in the dictionary
counter = dict()
for item in log:
    key = item[1]
    counter[key] = counter.get(key, 0) + 1  # If the key doesn't exist, start its count at 0
print(counter)
This prints the counts as a dictionary:
{'aaa': 2, 'def': 1, 'bbb': 2}
This code should meet your requirements:
import numpy as np
from collections import Counter
Counter(np.array(log)[:,1])
from collections import Counter
ele = [r[1] for r in log]
ele_counts = Counter(ele)
print(dict(ele_counts))
OUTPUT
{'aaa': 2, 'def': 1, 'bbb': 2}
Check this code:
log = [[time1, 'aaa', '123.123.123.123'], [time2, 'def', '123.123.123.123'], [time3, 'aaa', '123.123.123.123'], [time4, 'bbb', '123.123.123.123'], [time5, 'bbb', '123.123.123.123']]
ans = [0, 0, 0]
for x in log:
    if x[1] == 'aaa':
        ans[0] += 1
    elif x[1] == 'def':
        ans[1] += 1
    else:
        ans[2] += 1
print(f'aaa: {ans[0]}\ndef: {ans[1]}\nbbb: {ans[2]}')
You must define time1 through time5 before running the code.

How to filter a list of tuples with another list of items

I have two lists: one list containing items which are reference numbers and a second list containing tuples which some include the reference numbers of the first list.
My list of reference numbers looks like this:
list1 = ['0101', '0202', '0303']
And my list of tuples like this:
list2 = [
('8578', 'aaa', 'bbb', 'ccc'),
('0101', 'ddd', 'eee', 'fff'),
('9743', 'ggg', 'hhh', 'iii'),
('2943', 'jjj', 'kkk', 'lll'),
('0202', 'mmm', 'nnn', 'ooo'),
('7293', 'ppp', 'qqq', 'rrr'),
('0303', 'sss', 'ttt', 'uuu'),
]
I want to filter the second list depending on whether a tuple contains one of the reference numbers from the first list: if a tuple includes a reference number, the script should remove that tuple from the list.
Here is the expected result:
newlist2 = [
('8578', 'aaa', 'bbb', 'ccc'),
('9743', 'ggg', 'hhh', 'iii'),
('2943', 'jjj', 'kkk', 'lll'),
('7293', 'ppp', 'qqq', 'rrr'),
]
How can I do that?
You can use the built-in filter function with a lambda:
list2 = filter(lambda a: a[0] not in list1, list2)
In Python 3 this turns list2 into a lazy iterator. If you need a list, not just an iterator, you can use a list comprehension instead:
list2 = [element for element in list2 if element[0] not in list1]
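When the list of reference numbers is long, each "not in list1" test scans the whole list. A minimal sketch of the same filter with a set for the membership test follows; the name excluded is an assumption introduced here, and the data is copied from the question.

```python
list1 = ['0101', '0202', '0303']
list2 = [
    ('8578', 'aaa', 'bbb', 'ccc'),
    ('0101', 'ddd', 'eee', 'fff'),
    ('9743', 'ggg', 'hhh', 'iii'),
    ('2943', 'jjj', 'kkk', 'lll'),
    ('0202', 'mmm', 'nnn', 'ooo'),
    ('7293', 'ppp', 'qqq', 'rrr'),
    ('0303', 'sss', 'ttt', 'uuu'),
]

# A set gives average constant-time lookups instead of a linear scan.
excluded = set(list1)

# Keep only the tuples whose first field is not a reference number.
newlist2 = [row for row in list2 if row[0] not in excluded]
print(newlist2)
```

This only checks the first field of each tuple, which matches the expected result in the question.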
list1 = ['0101', '0202', '0303']
list2 = [
('8578', 'aaa', 'bbb', 'ccc'),
('0101', 'ddd', 'eee', 'fff'),
('9743', 'ggg', 'hhh', 'iii'),
('2943', 'jjj', 'kkk', 'lll'),
('0202', 'mmm', 'nnn', 'ooo'),
('7293', 'ppp', 'qqq', 'rrr'),
('0303', 'sss', 'ttt', 'uuu'),
]
filtered = []
for i in list2:
    if i[0] not in list1:
        filtered.append(i)
print(filtered)
output
[('8578', 'aaa', 'bbb', 'ccc'),
('9743', 'ggg', 'hhh', 'iii'),
('2943', 'jjj', 'kkk', 'lll'),
('7293', 'ppp', 'qqq', 'rrr')]

Replace duplicates in a list column

I have a list in which one column (the last) is a string of comma-separated items:
temp = ['AAA', 'BBB', 'CCC-DDD', 'EE,FFF,FFF,EE']
Now I want to remove the duplicates in that column.
I tried to make a list out of every column:
e = [s.split(',') for s in temp]
print e
Which gave me:
[['AAA'], ['BBB'], ['CCC-DDD'], ['EE', 'FFF', 'FFF', 'EE']]
Now I tried to remove the duplicates with:
y = list(set(e))
print y
This ended in an error:
TypeError: unhashable type: 'list'
I'd appreciate any help.
Edit:
I didn't say exactly what the end result should be. The list should look like this:
temp = ['AAA', 'BBB', 'CCC-DDD', 'EE', 'FFF']
Just the duplicates should get removed in the last column.
Apply set on the elements of the list not on the list of lists. You want your set to contain the strings of each list, not the lists.
e = [list(set(x)) for x in e]
You can do it directly as well:
e = [list(set(s.split(','))) for s in temp]
>>> e
[['AAA'], ['BBB'], ['CCC-DDD'], ['EE', 'FFF']]
You may want sorted(set(s.split(','))) instead to ensure lexicographic order (sets aren't ordered, even in Python 3.7).
For a flat, ordered list, create a flat set comprehension and sort it:
e = sorted({x for s in temp for x in s.split(',')})
result:
['AAA', 'BBB', 'CCC-DDD', 'EE', 'FFF']
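Sorting happens to give the requested result here, but it would reorder items that were not already in alphabetical order. If the original order should be kept instead, one possible sketch, assuming Python 3.7 or later where dicts preserve insertion order, uses dict.fromkeys (the names flat and result are illustrative):

```python
temp = ['AAA', 'BBB', 'CCC-DDD', 'EE,FFF,FFF,EE']

# Flatten: split every item on commas and collect the pieces in order.
flat = [piece for s in temp for piece in s.split(',')]

# dict keys are unique and keep first-seen order, so this drops later duplicates.
result = list(dict.fromkeys(flat))
print(result)
# ['AAA', 'BBB', 'CCC-DDD', 'EE', 'FFF']
```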
Here is a solution that uses itertools.chain:
import itertools
temp = ['AAA', 'BBB', 'CCC-DDD', 'EE,FFF,FFF,EE']
y = list(set(itertools.chain(*[s.split(',') for s in temp])))
# ['EE', 'FFF', 'AAA', 'BBB', 'CCC-DDD']
a = ['AAA', 'BBB', 'CCC-DDD', 'EE,FFF,FFF,EE']
b = [s.split(',') for s in a]
c = []
for i in b:
    c = c + i
c = list(set(c))
['EE', 'FFF', 'AAA', 'BBB', 'CCC-DDD']
Here is a pure functional way to do it in Python:
from functools import partial
split = partial(str.split, sep=',')
list(map(list, map(set, (map(split, temp)))))
[['AAA'], ['BBB'], ['CCC-DDD'], ['EE', 'FFF']]
Or, since the expected result is a flat list rather than a list of lists:
from itertools import chain
list(chain(*map(set, (map(split, temp)))))
['AAA', 'BBB', 'CCC-DDD', 'EE', 'FFF']

Need to sort given strings (Strings starting with x first)

Code doesn't return last word when given ['mix', 'xyz', 'apple', 'xanadu', 'aardvark'] list.
def front_x(words):
    x_list = []
    no_x_list = []
    [x_list.append(i) for i in words if i[0] == "x"]
    [no_x_list.append(words[i2]) for i2 in range(len(words)-1) if words[i2] not in x_list]
    x_list.sort()
    no_x_list.sort()
    return x_list + no_x_list
print front_x(['mix', 'xyz', 'apple', 'xanadu', 'aardvark'])
Must be:
['xanadu', 'xyz', 'aardvark', 'apple', 'mix']
With the lists ['bbb', 'ccc', 'axx', 'xzz', 'xaa'] and ['ccc', 'bbb', 'aaa', 'xcc', 'xaa'] the results are correct: ['xaa', 'xzz', 'axx', 'bbb', 'ccc'] and ['xaa', 'xcc', 'aaa', 'bbb', 'ccc'].
Iterating over range(len(words)-1) is incorrect: it stops one short, so the last word is never examined. Moreover, using a list comprehension just to call append is unpythonic; list comprehensions are for building lists, not for making list appends, which is rather ironic here.
You can perform the sort once by sorting on a two-tuple key. The first item is False for words that start with 'x', which puts those ahead, because False sorts before True. The second item applies a lexicographical sort, breaking ties:
def front_x(words):
    return sorted(words, key=lambda y: (not y.startswith('x'), y))
print(front_x(['mix', 'xyz', 'apple', 'xanadu', 'aardvark']))
# ['xanadu', 'xyz', 'aardvark', 'apple', 'mix']
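To see why the two-tuple key works, it can help to look at the keys themselves. This is only an illustrative sketch; the variable names keys and ordered are not from the answer.

```python
words = ['mix', 'xyz', 'apple', 'xanadu', 'aardvark']

# One key per word: (False, word) for 'x' words, (True, word) for the rest.
keys = [(not w.startswith('x'), w) for w in words]
print(keys)
# [(True, 'mix'), (False, 'xyz'), (True, 'apple'), (False, 'xanadu'), (True, 'aardvark')]

# Tuples compare item by item, and False < True, so 'x' words come first;
# within each group the word itself decides the order.
ordered = [word for _, word in sorted(keys)]
print(ordered)
# ['xanadu', 'xyz', 'aardvark', 'apple', 'mix']
```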
Just remove the -1 from range(len(words)-1)
This will be the minimal change in your code.
You can try this:
words = ['mix', 'xyz', 'apple', 'xanadu', 'aardvark']
new_words = [i for i in words if i[0].lower() == "x"]
words = [i for i in words if i[0].lower() != "x"]
final_words = sorted(new_words)+sorted(words)
print(final_words)
Output:
['xanadu', 'xyz', 'aardvark', 'apple', 'mix']
You have to use len(words) instead of len(words)-1 to get the expected output.
So, try it this way:
def front_x(words):
    x_list = []
    no_x_list = []
    [x_list.append(i) for i in words if i[0] == "x"]
    [no_x_list.append(words[i2]) for i2 in range(len(words)) if words[i2] not in x_list]
    x_list.sort()
    no_x_list.sort()
    return x_list + no_x_list
print(front_x(['mix', 'xyz', 'apple', 'xanadu', 'aardvark']))
Output :
['xanadu', 'xyz', 'aardvark', 'apple', 'mix']

python list different way than googles solution

I am working on the Python exercises from Google and I can't figure out why I am not getting the correct answer for a list problem. I saw the solution and they did it differently than I did, but I think the way I did it should also work.
# B. front_x
# Given a list of strings, return a list with the strings
# in sorted order, except group all the strings that begin with 'x' first.
# e.g. ['mix', 'xyz', 'apple', 'xanadu', 'aardvark'] yields
# ['xanadu', 'xyz', 'aardvark', 'apple', 'mix']
# Hint: this can be done by making 2 lists and sorting each of them
# before combining them.
def front_x(words):
    # +++your code here+++
    list = []
    xlist = []
    for word in words:
        list.append(word)
    list.sort()
    for s in list:
        if s.startswith('x'):
            xlist.append(s)
            list.remove(s)
    return xlist+list
The call is:
front_x(['bbb', 'ccc', 'axx', 'xzz', 'xaa'])
I get:
['xaa', 'axx', 'bbb', 'ccc', 'xzz']
when the answer should be:
['xaa', 'xzz', 'axx', 'bbb', 'ccc']
I don't understand why my solution does not work.
Thank you.
You shouldn't modify a list while iterating over it. See the for statement documentation.
for s in list:
    if s.startswith('x'):
        xlist.append(s)
        list.remove(s)  # this line causes the bug
Try this:
def front_x(words):
    lst = []
    xlst = []
    for word in words:
        if word.startswith('x'):
            xlst.append(word)
        else:
            lst.append(word)
    return sorted(xlst)+sorted(lst)
>>> front_x(['bbb', 'ccc', 'axx', 'xzz', 'xaa'])
['xaa', 'xzz', 'axx', 'bbb', 'ccc']
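The effect of removing from a list while iterating over it can be made visible by recording which elements the loop actually visits. The sketch below is illustrative only; the function names visited_while_removing and front_x_copy are hypothetical. The second function shows one way to keep the question's approach by iterating over a copy, as an alternative to the two-list version above.

```python
def visited_while_removing(words):
    """Record what the loop sees when items are removed during iteration."""
    items = sorted(words)
    visited = []
    for s in items:
        visited.append(s)
        if s.startswith('x'):
            items.remove(s)  # shifts later items left, so the next one is skipped
    return visited, items

def front_x_copy(words):
    """Same idea as the question, but iterating over a copy of the list."""
    items = sorted(words)
    xs = []
    for s in items[:]:       # items[:] is a copy, so removals don't disturb the loop
        if s.startswith('x'):
            xs.append(s)
            items.remove(s)
    return xs + items

visited, leftover = visited_while_removing(['bbb', 'ccc', 'axx', 'xzz', 'xaa'])
print(visited)   # ['axx', 'bbb', 'ccc', 'xaa']  ('xzz' is never visited)
print(leftover)  # ['axx', 'bbb', 'ccc', 'xzz']
print(front_x_copy(['bbb', 'ccc', 'axx', 'xzz', 'xaa']))
# ['xaa', 'xzz', 'axx', 'bbb', 'ccc']
```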
