Split by regex without resulting empty strings in Python [duplicate] - python

This question already has answers here:
Why are there extra empty strings at the beginning and end of the list returned by re.split?
(4 answers)
Closed 7 years ago.
I want to split a string containing irregularly repeating delimiter, like method split() does:
>>> ' a b c de '.split()
['a', 'b', 'c', 'de']
However, when I apply split by regular expression, the result is different (empty strings sneak into the resulting list):
>>> re.split(r'\s+', ' a b c de ')
['', 'a', 'b', 'c', 'de', '']
>>> re.split(r'\.+', '.a.b...c..de..')
['', 'a', 'b', 'c', 'de', '']
And what I want to see:
>>> some_smart_split_method('.a.b...c..de..')
['a', 'b', 'c', 'de']

The empty strings are an inevitable result of the regex split when the string starts or ends with a delimiter (though there is good reasoning as to why that behavior might be desirable). To get rid of them you can call filter on the result.
results = re.split(...)
results = list(filter(None, results))
Note the list() transform is only necessary in Python 3 -- in Python 2 filter() returns a list, while in 3 it returns a filter object.
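As a minimal sketch of the filter approach, the two steps can be wrapped in a small helper. The name split_nonempty is hypothetical and not part of the re module.

```python
import re

def split_nonempty(pattern, text):
    """Split text on a regex and drop the empty strings re.split leaves behind."""
    # filter(None, ...) keeps only truthy items, i.e. the non-empty strings
    return list(filter(None, re.split(pattern, text)))

print(split_nonempty(r'\s+', ' a b c de '))      # ['a', 'b', 'c', 'de']
print(split_nonempty(r'\.+', '.a.b...c..de..'))  # ['a', 'b', 'c', 'de']
```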

>>> re.findall(r'\S+', ' a b c de ')
['a', 'b', 'c', 'de']
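The findall approach matches the runs between delimiters instead of splitting on the delimiters, so no empty strings are produced. A sketch of the same idea for the dot-delimited example, assuming a negated character class describes "anything that is not a delimiter":

```python
import re

# Match runs of characters that are NOT a dot, rather than splitting on dots.
tokens = re.findall(r'[^.]+', '.a.b...c..de..')
print(tokens)  # ['a', 'b', 'c', 'de']
```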

Related

Convert string of list of list to normal python list [duplicate]

This question already has answers here:
How to convert string representation of list to a list
(19 answers)
Closed 2 years ago.
I have a string in the format of
string= "['a', 'b', 'a e']['de']['a']['a']"
I want this string to get converted into a list of list as
lst= [['a', 'b', 'a e'], ['de'], ['a'], ['a']]
Try this:
import ast
string = "['a', 'b', 'a e']['de']['a']['a']"
string = string.replace("]", "],")
list_ = list(ast.literal_eval(string))
print(list_)
output:
[['a', 'b', 'a e'], ['de'], ['a'], ['a']]
Keep in mind that this will fail if any item in the list contains a ] character.
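One possible way to narrow that failure case, as a sketch: insert the comma only where a closing bracket is immediately followed by an opening one, then wrap the whole string in an outer list. The helper name parse_adjacent_lists is hypothetical, and this variant still fails if an item itself contains the two-character sequence ][.

```python
import ast

def parse_adjacent_lists(text):
    """Parse "[...][...]" into a list of lists (assumes no item contains "][")."""
    # Turn "][" into "],[" and add outer brackets so the result is one literal.
    return ast.literal_eval("[" + text.replace("][", "],[") + "]")

print(parse_adjacent_lists("['a', 'b', 'a e']['de']['a']['a']"))
# [['a', 'b', 'a e'], ['de'], ['a'], ['a']]
print(parse_adjacent_lists("['a', ']']['b']"))
# [['a', ']'], ['b']]
```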
Here is a simple way -
import ast
string = "['a', 'b', 'a e']['de']['a']['a']"
[ast.literal_eval(i+']') for i in string.split(']') if len(i)>0]
[['a', 'b', 'a e'], ['de'], ['a'], ['a']]
Explanation:
string.split(']') breaks the list by the ] bracket
i+']' puts back the ] bracket that split() removed from each element
len(i)>0 makes sure that the empty string left after the final ] is skipped
ast.literal_eval(i+']') converts the string to actual list.

How to print a list of tupled tuples in CSV-acceptable format? [duplicate]

This question already has answers here:
Flatten an irregular (arbitrarily nested) list of lists
(51 answers)
Closed 5 years ago.
I have a list of tuples I would like to print in CSV format without quotes or brackets.
[(('a','b','c'), 'd'), ... ,(('e','f','g'), 'h')]
Desired output:
a,b,c,d,e,f,g,h
I can get rid of some of the punctuation using chain, .join() or the *-operator, but my knowledge is not sophisticated enough to get rid of all of it for my particular use case.
Thank you.
So, in your case there is a pattern which makes this relatively easy:
>>> x = [(('a','b','c'), 'd') ,(('e','f','g'), 'h')]
>>> [c for a,b in x for c in (*a, b)]
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
Or, an itertools.chain solution:
>>> from itertools import chain
>>> list(chain.from_iterable((*a, b) for a,b in x))
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
>>>
And, in case you are on a version of Python older than 3.5, where (*a, b) is not allowed, you will need something like:
[c for a,b in x for c in a+(b,)]
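The answers above stop at a flat list. To get the CSV-style line the question asks for, the flat list can be joined with commas. This is a sketch that assumes none of the values contain commas or quotes; for data that might, the standard csv module would be the safer choice.

```python
x = [(('a', 'b', 'c'), 'd'), (('e', 'f', 'g'), 'h')]

# Flatten each (tuple, value) pair, then join everything with commas.
flat = [c for a, b in x for c in (*a, b)]
line = ','.join(flat)
print(line)  # a,b,c,d,e,f,g,h
```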

Split the word by each character python 3.5 [duplicate]

This question already has answers here:
How do I split a string into a list of characters?
(15 answers)
Closed 2 years ago.
I tried all the methods suggested by others, but none of them work.
Methods like str.split() and lst = list("abcd") throw an error saying [TypeError: 'list' object is not callable].
I want to convert a string to a list with one item per character in the word:
the input str = "abc" should give list = ['a','b','c'].
I want to get the characters of the string as a list:
the expected output is ['a','b','c','d','e','f'], but I get ['abcdef'].
str = "abcdef"
l = str.split()
print(l)
First, don't use list as a variable name. It will prevent you from doing what you want, because it will shadow the list class name.
You can do this by simply constructing a list from the string:
l = list('abcdef')
sets l to the list ['a', 'b', 'c', 'd', 'e', 'f']
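A small sketch of why the shadowing matters. The function name shadowing_demo is made up for illustration; the shadowing is confined to a function so the built-in stays usable afterwards.

```python
def shadowing_demo():
    list = ["abcdef"]  # this local name hides the built-in list type
    try:
        return list("abc")  # calls the list object, not the built-in
    except TypeError as exc:
        return str(exc)

message = shadowing_demo()
print(message)          # 'list' object is not callable

chars = list("abcdef")  # the built-in is untouched outside the function
print(chars)            # ['a', 'b', 'c', 'd', 'e', 'f']
```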
First of all, don't use list as the name of a variable in your program. It is the name of a built-in type in Python (not a keyword), so reusing it shadows the built-in and is not good practice.
If you had,
s = 'a b c d e f g'
then,
parts = s.split()
print(parts)
>>>['a', 'b', 'c', 'd', 'e', 'f', 'g']
Since split() with no arguments splits on whitespace, it will give what you need.
In your case (s = "abcdef"), you can just use,
print(list(s))
>>>['a', 'b', 'c', 'd', 'e', 'f']
Q. "I want to convert string to list for each character in the word"
A. You can use a simple list comprehension.
Input:
new_str = "abcdef"
[character for character in new_str]
Output:
['a', 'b', 'c', 'd', 'e', 'f']
Just use a for loop.
Input:
s = "abc"
li = []
for i in s:
    li.append(i)
print(li)
# use the list function instead of the for loop
print(list(s))
Output:
['a', 'b', 'c']
['a', 'b', 'c']

How to find double occurrence of a letter in a word [duplicate]

This question already has answers here:
RegExp match repeated characters
(6 answers)
Closed 8 years ago.
I have string :-
s = 'bubble'
how to use regular expression to get a list like:
['b', 'u', 'bb', 'l', 'e']
I want to filter single as well as double occurrence of a letter.
This should do it:
import re
[m.group(0) for m in re.finditer('(.)\\1*',s)]
For 'bubbles' this returns:
['b', 'u', 'bb', 'l', 'e', 's']
For 'bubblesssss' this returns:
['b', 'u', 'bb', 'l', 'e', 'sssss']
You really have two questions. The first question is how to split the string, the second is how to filter.
The splitting takes advantage of back references in a pattern. In this case we'll construct a pattern that will find one or two occurrences of a character, then construct a list from the search results. The \1 in the code block refers to the first parenthesized expression.
import re
pattern = re.compile(r'(.)\1?')
s = "bubble"
result = [x.group() for x in pattern.finditer(s)]
print(result)
To filter the list stored in result you could use a list comprehension that filters on length.
filtered_result = [x for x in result if len(x) == 2]
print(filtered_result)
You could just get the set of duplications directly by tweaking the regular expression.
pattern2 = re.compile(r'(.)\1')
result2 = [x.group() for x in pattern2.finditer(s)]
print(result2)
The output from running the above is:
['b', 'u', 'bb', 'l', 'e']
['bb']
['bb']
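The two answers use different quantifiers after the back reference, which only shows up on runs longer than two. A short comparison sketch on 'bubblesssss' (the variable names are illustrative):

```python
import re

s = 'bubblesssss'

# \1* : a character followed by any number of repeats (whole runs)
runs = [m.group() for m in re.finditer(r'(.)\1*', s)]
# \1? : a character followed by at most one repeat (chunks of 1 or 2)
at_most_two = [m.group() for m in re.finditer(r'(.)\1?', s)]
# \1  : a character followed by exactly one repeat (pairs only)
exact_pairs = [m.group() for m in re.finditer(r'(.)\1', s)]

print(runs)         # ['b', 'u', 'bb', 'l', 'e', 'sssss']
print(at_most_two)  # ['b', 'u', 'bb', 'l', 'e', 'ss', 'ss', 's']
print(exact_pairs)  # ['bb', 'ss', 'ss']
```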

Joining pairs of elements of a list [duplicate]

This question already has answers here:
How to iterate over a list in chunks
(39 answers)
Closed 8 months ago.
I know that a list can be joined to make one long string as in:
x = ['a', 'b', 'c', 'd']
print(''.join(x))
Obviously this would output:
abcd
However, what I am trying to do is simply join the first and second strings in the list, then join the third and fourth and so on. In short, from the above example instead achieve an output of:
['ab', 'cd']
Is there any simple way to do this? I should probably also mention that the lengths of the strings in the list will be unpredictable, as will the number of strings within the list, though the number of strings will always be even. So the original list could just as well be:
['abcd', 'e', 'fg', 'hijklmn', 'opq', 'r']
You can use slice notation with steps:
>>> x = "abcdefghijklm"
>>> x[0::2] #0. 2. 4...
'acegikm'
>>> x[1::2] #1. 3. 5 ..
'bdfhjl'
>>> [i+j for i,j in zip(x[::2], x[1::2])] # zip makes (0,1),(2,3) ...
['ab', 'cd', 'ef', 'gh', 'ij', 'kl']
The same logic applies to lists too. String length doesn't matter, because you're simply adding two strings together.
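For illustration, the same slicing applied to the list from the question (a sketch; the name joined is arbitrary):

```python
x = ['abcd', 'e', 'fg', 'hijklmn', 'opq', 'r']

# x[::2] holds the items at even indexes, x[1::2] the items at odd indexes;
# zip pairs them up as (0, 1), (2, 3), (4, 5).
joined = [a + b for a, b in zip(x[::2], x[1::2])]
print(joined)  # ['abcde', 'fghijklmn', 'opqr']
```

Note that zip stops at the shorter input, so with an odd number of items the last one would be silently dropped.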
Use an iterator.
List comprehension:
>>> si = iter(['abcd', 'e', 'fg', 'hijklmn', 'opq', 'r'])
>>> [c+next(si, '') for c in si]
['abcde', 'fghijklmn', 'opqr']
Very efficient for memory usage.
Exactly one traversal of the list.
Generator expression:
>>> si = iter(['abcd', 'e', 'fg', 'hijklmn', 'opq', 'r'])
>>> pair_iter = (c+next(si, '') for c in si)
>>> pair_iter # can be used in a for loop
<generator object at 0x4ccaa8>
>>> list(pair_iter)
['abcde', 'fghijklmn', 'opqr']
The generator can then be used like any other iterator.
Using map, str.__add__, iter
>>> si = iter(['abcd', 'e', 'fg', 'hijklmn', 'opq', 'r'])
>>> list(map(str.__add__, si, si))  # list() is needed on Python 3, where map() is lazy
['abcde', 'fghijklmn', 'opqr']
next(iterator[, default]) is available starting in Python 2.6
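The iterator versions work because the for loop and next() draw from the same iterator, so each loop step consumes two items. A sketch with an odd number of strings shows what the '' default is for:

```python
items = ['abcd', 'e', 'fg']  # odd number of strings
si = iter(items)

# The loop takes one item as c; next(si, '') takes the following one,
# or returns '' when the iterator is already exhausted.
joined = [c + next(si, '') for c in si]
print(joined)  # ['abcde', 'fg']
```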
just to be pythonic :-)
>>> x = ['a1sd','23df','aaa','ccc','rrrr', 'ssss', 'e', '']
>>> [x[i] + x[i+1] for i in range(0,len(x),2)]
['a1sd23df', 'aaaccc', 'rrrrssss', 'e']
In case you want to be alerted when the list length is odd, you can try:
[x[i] + x[i+1] if not len(x) %2 else 'odd index' for i in range(0,len(x),2)]
Best of Luck
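An alternative sketch for the odd-length case, which raises an error up front instead of filling the result with a marker string. The function name join_pairs_strict is hypothetical.

```python
def join_pairs_strict(items):
    """Join neighbouring items in pairs; refuse lists with an odd length."""
    if len(items) % 2:
        raise ValueError("expected an even number of items, got %d" % len(items))
    return [items[i] + items[i + 1] for i in range(0, len(items), 2)]

print(join_pairs_strict(['abcd', 'e', 'fg', 'hijklmn', 'opq', 'r']))
# ['abcde', 'fghijklmn', 'opqr']
```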
Without building temporary lists:
>>> s = 'abcdefgh'
>>> si = iter(s)
>>> [''.join(each) for each in zip(si, si)]  # on Python 2, use itertools.izip
['ab', 'cd', 'ef', 'gh']
or:
>>> s = 'abcdefgh'
>>> si = iter(s)
>>> list(map(''.join, zip(si, si)))  # on Python 2, use itertools.izip
['ab', 'cd', 'ef', 'gh']
>>> lst = ['abcd', 'e', 'fg', 'hijklmn', 'opq', 'r']
>>> print([lst[2*i] + lst[2*i+1] for i in range(len(lst) // 2)])
['abcde', 'fghijklmn', 'opqr']
Well, I would do it this way, as I am no good with regexes.
CODE
t = '1. eat, food\n\
7am\n\
2. brush, teeth\n\
8am\n\
3. crack, eggs\n\
1pm'.splitlines()
print([i + ' ' + j for i, j in zip(t[::2], t[1::2])])
output:
['1. eat, food 7am', '2. brush, teeth 8am', '3. crack, eggs 1pm']
Hope this helps :)
I came across this page yesterday while trying to solve a similar issue. I wanted to join items first in pairs using one string in between, and then join the pairs together using another string. Based on the code above I came up with the following function:
def pairs(params, pair_str, join_str):
    """Complex string join where items are first joined in pairs
    """
    terms = iter(params)
    pairs = [pair_str.join(filter(len, [term, next(terms, '')])) for term in terms]
    return join_str.join(pairs)
This results in the following:
a = ['1','2','3','4','5','6','7','8','9']
print(pairs(a, ' plus ', ' and '))
>>1 plus 2 and 3 plus 4 and 5 plus 6 and 7 plus 8 and 9
The filter step prevents the '' which is produced in case of an odd number of terms from putting a final pair_str at the end.
