I know that a list can be joined to make one long string as in:
x = ['a', 'b', 'c', 'd']
print ''.join(x)
Obviously this would output:
'abcd'
However, what I am trying to do is simply join the first and second strings in the list, then join the third and fourth and so on. In short, from the above example instead achieve an output of:
['ab', 'cd']
Is there any simple way to do this? I should probably also mention that the lengths of the strings in the list will be unpredictable, as will the number of strings within the list, though the number of strings will always be even. So the original list could just as well be:
['abcd', 'e', 'fg', 'hijklmn', 'opq', 'r']
You can use slice notation with steps:
>>> x = "abcdefghijklm"
>>> x[0::2] #0. 2. 4...
'acegikm'
>>> x[1::2] #1. 3. 5 ..
'bdfhjl'
>>> [i+j for i,j in zip(x[::2], x[1::2])] # zip makes (0,1),(2,3) ...
['ab', 'cd', 'ef', 'gh', 'ij', 'kl']
The same logic applies to lists too. String length doesn't matter, because you're simply adding two strings together.
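As a minimal sketch, here is the slicing idea applied to the list from the question. It is written for Python 3, the name `pair_up` is only illustrative, and it assumes an even number of items, as the question states.

```python
def pair_up(items):
    """Concatenate neighbouring strings: items[0]+items[1], items[2]+items[3], ..."""
    # items[::2] holds the elements at even indices, items[1::2] those at odd indices.
    # zip stops at the shorter slice, so a trailing unpaired item would be dropped.
    return [first + second for first, second in zip(items[::2], items[1::2])]


print(pair_up(['abcd', 'e', 'fg', 'hijklmn', 'opq', 'r']))
# ['abcde', 'fghijklmn', 'opqr']
```

Note that the two slices are temporary copies of half the list each, which is usually fine for short lists.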
Use an iterator.
List comprehension:
>>> si = iter(['abcd', 'e', 'fg', 'hijklmn', 'opq', 'r'])
>>> [c+next(si, '') for c in si]
['abcde', 'fghijklmn', 'opqr']
This is very memory-efficient and traverses the list exactly once.
Generator expression:
>>> si = iter(['abcd', 'e', 'fg', 'hijklmn', 'opq', 'r'])
>>> pair_iter = (c+next(si, '') for c in si)
>>> pair_iter # can be used in a for loop
<generator object at 0x4ccaa8>
>>> list(pair_iter)
['abcde', 'fghijklmn', 'opqr']
The generator can be used anywhere an iterator is expected.
Using map, str.__add__, iter
>>> si = iter(['abcd', 'e', 'fg', 'hijklmn', 'opq', 'r'])
>>> map(str.__add__, si, si)
['abcde', 'fghijklmn', 'opqr']
next(iterator[, default]) is available starting in Python 2.6
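The sessions above appear to come from Python 2, where map returns a list. Assuming Python 3, map returns a lazy iterator instead, so a rough equivalent needs an explicit list() call. The sketch below also swaps in operator.add for str.__add__ (my substitution, not the answer's) and shows how the map form and the next(..., '') form differ on an odd number of items.

```python
from operator import add

words = ['abcd', 'e', 'fg', 'hijklmn', 'opq', 'r']

# Both arguments are the same iterator, so each call to add() consumes two items.
it = iter(words)
pairs = list(map(add, it, it))
print(pairs)  # ['abcde', 'fghijklmn', 'opqr']

# With an odd number of items, map stops early and the last item is dropped,
# whereas next(odd, '') pairs the last item with an empty string and keeps it.
odd = iter(['a', 'b', 'c'])
kept = [item + next(odd, '') for item in odd]
print(kept)  # ['ab', 'c']
```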
just to be pythonic :-)
>>> x = ['a1sd','23df','aaa','ccc','rrrr', 'ssss', 'e', '']
>>> [x[i] + x[i+1] for i in range(0,len(x),2)]
['a1sd23df', 'aaaccc', 'rrrrssss', 'e']
If you want to be alerted when the list length is odd, you can try the following, which fills the result with 'odd index' instead of joined pairs:
[x[i] + x[i+1] if not len(x) %2 else 'odd index' for i in range(0,len(x),2)]
Best of Luck
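If a placeholder value is not enough, one alternative (a sketch, not part of the answer above) is to raise an exception for odd-length input. The function name `join_pairs_strict` is hypothetical.

```python
def join_pairs_strict(items):
    """Join neighbouring strings, refusing lists with an odd number of items."""
    if len(items) % 2:
        raise ValueError("expected an even number of items, got %d" % len(items))
    # Step through the even indices and glue each item to its right-hand neighbour.
    return [items[i] + items[i + 1] for i in range(0, len(items), 2)]


print(join_pairs_strict(['a1sd', '23df', 'aaa', 'ccc']))
# ['a1sd23df', 'aaaccc']
```

Failing loudly avoids silently returning a list that mixes real results with marker strings.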
Without building temporary lists:
>>> import itertools
>>> s = 'abcdefgh'
>>> si = iter(s)
>>> [''.join(each) for each in itertools.izip(si, si)]
['ab', 'cd', 'ef', 'gh']
or:
>>> import itertools
>>> s = 'abcdefgh'
>>> si = iter(s)
>>> map(''.join, itertools.izip(si, si))
['ab', 'cd', 'ef', 'gh']
>>> lst = ['abcd', 'e', 'fg', 'hijklmn', 'opq', 'r']
>>> print [lst[2*i]+lst[2*i+1] for i in range(len(lst)/2)]
['abcde', 'fghijklmn', 'opqr']
Well, I would do it this way, as I am no good with regexes:
t = '1. eat, food\n\
7am\n\
2. brush, teeth\n\
8am\n\
3. crack, eggs\n\
1pm'.splitlines()
print [i+' '+j for i,j in zip(t[::2],t[1::2])]
output:
['1. eat, food 7am', '2. brush, teeth 8am', '3. crack, eggs 1pm']
Hope this helps :)
I came across this interesting page yesterday while trying to solve a similar problem. I wanted to join items first in pairs using one string in between, and then join the pairs together using another string. Based on the code above I came up with the following function:
def pairs(params, pair_str, join_str):
    """Complex string join where items are first joined in pairs
    """
    terms = iter(params)
    pairs = [pair_str.join(filter(len, [term, next(terms, '')])) for term in terms]
    return join_str.join(pairs)
This results in the following:
a = ['1','2','3','4','5','6','7','8','9']
print(pairs(a, ' plus ', ' and '))
>>1 plus 2 and 3 plus 4 and 5 plus 6 and 7 plus 8 and 9
The filter step prevents the '' which is produced in case of an odd number of terms from putting a final pair_str at the end.
I've seen many questions on getting all the possible substrings (i.e., adjacent sets of characters), but none on generating all possible strings including the combinations of its substrings.
For example, let:
x = 'abc'
I would like the output to be something like:
['abc', 'ab', 'ac', 'bc', 'a', 'b', 'c']
The main point is that we can remove multiple characters that are not adjacent in the original string (as well as the adjacent ones).
Here is what I have tried so far:
def return_substrings(input_string):
    length = len(input_string)
    return [input_string[i:j + 1] for i in range(length) for j in range(i, length)]

print(return_substrings('abc'))
However, this only removes runs of adjacent characters from the original string, and will not return the element 'ac' from the example above.
Another example is if we use the string 'abcde', the output list should contain the elements 'ace', 'bd' etc.
You can do this easily using itertools.combinations
>>> from itertools import combinations
>>> x = 'abc'
>>> [''.join(l) for i in range(len(x)) for l in combinations(x, i+1)]
['a', 'b', 'c', 'ab', 'ac', 'bc', 'abc']
If you want it in the reversed order, you can make the range function return its sequence in reversed order
>>> [''.join(l) for i in range(len(x),0,-1) for l in combinations(x, i)]
['abc', 'ab', 'ac', 'bc', 'a', 'b', 'c']
This is a fun exercise. I think other answers may use itertools.product or itertools.combinations. But just for fun, you can also do this recursively with something like
def subs(string, ret=['']):
    if len(string) == 0:
        return ret
    head, tail = string[0], string[1:]
    ret = ret + list(map(lambda x: x+head, ret))
    return subs(tail, ret)

subs('abc')
# returns ['', 'a', 'b', 'ab', 'c', 'ac', 'bc', 'abc']
@Sunitha's answer provided the right tool to use. I will just suggest an improvement that keeps your return_substrings method. Basically, my solution takes care of duplicates.
I will use "ABCA" to demonstrate my solution. Note that for this input the accepted answer would include a duplicate 'A' in the returned list.
Python 3.7+ solution:
from itertools import combinations

x = "ABCA"

def return_substrings(x):
    all_combnations = [''.join(l) for i in range(len(x)) for l in combinations(x, i+1)]
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(reversed(list(dict.fromkeys(all_combnations))))
    # return list(dict.fromkeys(all_combnations)) for non-reversed ordering

print(return_substrings(x))
# ['ABCA', 'BCA', 'ACA', 'ABA', 'ABC', 'CA', 'BA', 'BC', 'AA', 'AC', 'AB', 'C', 'B', 'A']
Python 2.7 solution:
You'll have to use collections.OrderedDict instead of a normal dict, because a plain dict does not preserve insertion order there. Therefore,
return list(reversed(list(dict.fromkeys(all_combnations))))
becomes
return list(reversed(list(OrderedDict.fromkeys(all_combnations))))
Is order irrelevant to you?
If so, you can simplify the code by using a set:
from itertools import combinations

x = "ABCA"
def return_substrings(x):
    all_combnations = [''.join(l) for i in range(len(x)) for l in combinations(x, i+1)]
    return list(set(all_combnations))
def return_substrings(s):
    all_sub = set()
    recent = {s}
    while recent:
        tmp = set()
        for word in recent:
            for i in range(len(word)):
                tmp.add(word[:i] + word[i + 1:])
        all_sub.update(recent)
        recent = tmp
    return all_sub
For an overkill / different version of the accepted answer (expressing combinations using https://docs.python.org/3/library/itertools.html#itertools.product ):
["".join(["abc"[y[0]] for y in x if y[1]]) for x in map(enumerate, itertools.product((False, True), repeat=3))]
For a more visual interpretation, consider all substrings as a mapping of all bitstrings of length n.
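The bitstring view can be sketched directly with integer masks: each number from 0 to 2**n - 1 is read as n bits, and bit i decides whether character i is kept. The function name `subsequences` is only illustrative, and the empty string is included because the all-zero mask selects nothing.

```python
def subsequences(s):
    """Return every subsequence of s, one per bitmask from 0 to 2**len(s) - 1."""
    n = len(s)
    result = []
    for mask in range(1 << n):
        # Bit i of mask decides whether character s[i] is kept.
        result.append(''.join(s[i] for i in range(n) if mask >> i & 1))
    return result


print(subsequences('abc'))
# ['', 'a', 'b', 'ab', 'c', 'ac', 'bc', 'abc']
```

Repeated characters in the input produce repeated entries, so wrap the result in a set if duplicates matter.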
I need to turn the string 'abcdef' into the list ['ab', 'cd', 'ef'] in Python.
I tried using list(), but it returns ['a', 'b', 'c', 'd', 'e', 'f'].
Can anyone help, please?
You can use zip:
s = "abcdef"
[''.join(x) for x in zip(s[::2], s[1::2])]
# ['ab', 'cd', 'ef']
Or
[s[i:i+2] for i in range(0, len(s), 2)]
# ['ab', 'cd', 'ef']
If you don't mind an extra function you could use something like this:
def grouper(string, size):
    i = 0
    while i < len(string):
        yield string[i:i+size]
        i += size
This is a generator so you need to collect all parts of it, for example by using list:
>>> list(grouper('abcdef', 3))
['abc', 'def']
>>> list(grouper('abcdef', 2)) # <-- that's what you want.
['ab', 'cd', 'ef']
Imagine I have the following strings:
['a','b','c_L1', 'c_L2', 'c_L3', 'd', 'e', 'e_L1', 'e_L2']
Where the "c" string has important sub-categories (L1, L2, L3). These indicate special data for our purposes that have been generated in a program based on a pre-designated string "L". In other words, I know that the special entries should have the form:
name_Lnumber
Knowing that I'm looking for this pattern, and that I am using "L" or more specifically "_L" as my designation of these objects, how could I return a list of entries that meet this condition? In this case:
['c', 'e']
Use a simple filter:
>>> l = ['a','b','c_L1', 'c_L2', 'c_L3', 'd', 'e', 'e_L1', 'e_L2']
>>> filter(lambda x: "_L" in x, l)
['c_L1', 'c_L2', 'c_L3', 'e_L1', 'e_L2']
Alternatively, use a list comprehension
>>> [s for s in l if "_L" in s]
['c_L1', 'c_L2', 'c_L3', 'e_L1', 'e_L2']
Since you need the prefix only, you can just split it:
>>> set(s.split("_")[0] for s in l if "_L" in s)
set(['c', 'e'])
You can use the following generator expression inside set():
>>> set(i.split('_')[0] for i in l if '_L' in i)
set(['c', 'e'])
Or, if you want to match only the elements that end with _L(digit) and not something like _Lm, you can use a regex:
>>> import re
>>> set(i.split('_')[0] for i in l if re.match(r'.*?_L\d$',i))
set(['c', 'e'])
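A Python 3 variant of the regex approach might capture the name directly instead of splitting on "_". This is only a sketch under two assumptions that go beyond the answers above: the number may have several digits (`\d+`), and the name itself may contain underscores. `PATTERN` and `prefixes` are hypothetical names, and 'f_Lm' is an extra entry added to show a non-match.

```python
import re

# Assumed shape: a name, then "_L", then one or more digits, and nothing else.
PATTERN = re.compile(r'(.+)_L\d+')


def prefixes(entries):
    """Collect the name part of every entry shaped like name_L<number>."""
    found = set()
    for entry in entries:
        match = PATTERN.fullmatch(entry)
        if match:
            found.add(match.group(1))
    return found


entries = ['a', 'b', 'c_L1', 'c_L2', 'c_L3', 'd', 'e', 'e_L1', 'e_L2', 'f_Lm']
print(sorted(prefixes(entries)))  # ['c', 'e']
```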
I am new to Python. When I added a string with add() function, it worked well. But when I tried to add multiple strings, it treated them as character items.
>>> set1 = {'a', 'bc'}
>>> set1.add('de')
>>> set1
set(['a', 'de', 'bc'])
>>> set1.update('fg', 'hi')
>>> set1
set(['a', 'g', 'f', 'i', 'h', 'de', 'bc'])
>>>
The results I wanted are set(['a', 'de', 'bc', 'fg', 'hi'])
Does this mean the update() function does not work for adding strings?
The version of Python used is: Python 2.7.1
update treats each of its arguments as an iterable and adds the elements it yields. The supplied string 'fg' is therefore consumed character by character, adding 'f' and 'g'.
You gave update() multiple iterables (strings are iterable) so it iterated over each of those, adding the items (characters) of each. Give it one iterable (such as a list) containing the strings you wish to add.
set1.update(['fg', 'hi'])
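The difference can be seen side by side in a short sketch (Python 3 syntax; the variable names are illustrative, and sorted() is used only to make the printed order predictable):

```python
letters = {'a', 'bc'}
letters.update('fg')            # a string is an iterable of characters
print(sorted(letters))          # ['a', 'bc', 'f', 'g']

words = {'a', 'bc'}
words.update(['fg', 'hi'])      # a list of strings adds each string whole
print(sorted(words))            # ['a', 'bc', 'fg', 'hi']
```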
Here's something fun using pipe equals ( |= )...
>>> set1 = {'a', 'bc'}
>>> set1.add('de')
>>> set1
set(['a', 'de', 'bc'])
>>> set1 |= set(['fg', 'hi'])
>>> set1
set(['a', 'hi', 'de', 'fg', 'bc'])
Try using set1.update( ['fg', 'hi'] ) or set1.update( {'fg', 'hi'} )
Each item in the passed in list or set of strings will be added to the set
I have a string in which every word is separated by a comma. I want to split the string at every other comma in Python. How should I do this?
E.g., "xyz,abc,jkl,pqr" should give "xyzabc" as one string and "jklpqr" as another string.
It's probably easier to split on every comma, and then rejoin pairs
>>> original = 'a,1,b,2,c,3'
>>> s = original.split(',')
>>> s
['a', '1', 'b', '2', 'c', '3']
>>> alternate = map(''.join, zip(s[::2], s[1::2]))
>>> alternate
['a1', 'b2', 'c3']
Is that what you wanted?
Split, and rejoin.
So, to split:
In [173]: "a,b,c,d".split(',')
Out[173]: ['a', 'b', 'c', 'd']
And to rejoin:
In [193]: z = iter("a,b,c,d".split(','))
In [194]: [a+b for a,b in zip(*([z]*2))]
Out[194]: ['ab', 'cd']
This works because ([z]*2) is a list of two elements, both of which are the same iterator z. Thus, zip takes the first, then second element from z to create each tuple.
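A small sketch can make the shared-iterator point concrete (Python 3, where zip is lazy, hence the list() call; the name `same` is illustrative):

```python
z = iter(['a', 'b', 'c', 'd'])
same = [z] * 2

# The list holds two references to one iterator object, not two copies.
print(same[0] is same[1])  # True

# zip pulls alternately from "both" arguments, which advances the single
# iterator twice per tuple and so yields consecutive pairs.
pairs = list(zip(*same))
print(pairs)  # [('a', 'b'), ('c', 'd')]
```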
This also works as a oneliner, because in [foo]*n foo is evaluated only once, whether or not it is a variable or a more complex expression:
In [195]: [a+b for a,b in zip(*[iter("a,b,c,d".split(','))]*2)]
Out[195]: ['ab', 'cd']
I've also cut out a pair of brackets, because unary * has lower precedence than binary *.
Thanks to @pillmuncher for pointing out that this can be extended with izip_longest to handle lists with an odd number of elements:
In [214]: from itertools import izip_longest
In [215]: [a+b for a,b in izip_longest(*[iter("a,b,c,d,e".split(','))]*2, fillvalue='')]
Out[215]: ['ab', 'cd', 'e']
(See: http://docs.python.org/library/itertools.html#itertools.izip_longest )
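In Python 3 the function is named itertools.zip_longest rather than izip_longest. A minimal sketch of the same idea, assuming Python 3 and using a hypothetical helper name `join_pairs`:

```python
from itertools import zip_longest


def join_pairs(text, sep=','):
    """Split text on sep and concatenate the pieces two at a time."""
    parts = iter(text.split(sep))
    # Passing the same iterator twice makes each tuple consume two pieces;
    # fillvalue='' pads the final tuple when the number of pieces is odd.
    return [a + b for a, b in zip_longest(parts, parts, fillvalue='')]


print(join_pairs("a,b,c,d,e"))  # ['ab', 'cd', 'e']
```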
parts = "a,b,c,d".split(",")  # avoid naming the list str, which would shadow the built-in
print ["".join(parts[i:i+2]) for i in range(0, len(parts), 2)]
Just split on every comma, then combine it back:
splitList = someString.split(",")
joinedString = ','.join([splitList[i - 1] + splitList[i] for i in range(1, len(splitList), 2)])
Like this
"a,b,c,d".split(',')