This question already has answers here:
Splitting a string into words and punctuation
(11 answers)
Closed 3 years ago.
How can I split a string in Python so that punctuation marks appear as separate items in the result?
The following code:
s = "Hello, my name is Robert."
s_splitted = s.split()
gives this output:
["Hello,","my","name","is","Robert."]
How can I obtain the following result?
["Hello",",","my","name","is","Robert","."]
Regex can handle this.
import re
s = "Hello, my name is Robert."
s_splitted = [part for part in re.split(r'\b|\s', s) if part != '']
# ['Hello', ',', 'my', 'name', 'is', 'Robert', '.']  (Python 3.7+, where re.split can split on empty matches such as \b)
So in your case:
import re
s = "Hello, my name is Robert."
items = re.findall(r"[\w']+|[.,!?;]", s)
# ['Hello', ',', 'my', 'name', 'is', 'Robert', '.']
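The character class in the pattern above only recognises five punctuation marks. As a rough sketch, assuming every character that is neither a word character nor whitespace should become its own token, the pattern can be widened as follows. The helper name `tokenize` is made up for illustration and does not come from the answers.

```python
import re

def tokenize(text):
    """Split text into words and single punctuation characters.

    `tokenize` is an illustrative name, not part of the original answers.
    """
    # [\w']+   : a run of word characters or apostrophes (keeps "don't" whole)
    # [^\w\s]  : any single character that is neither a word character nor whitespace
    return re.findall(r"[\w']+|[^\w\s]", text)

print(tokenize("Hello, my name is Robert."))
# ['Hello', ',', 'my', 'name', 'is', 'Robert', '.']
```

Because the word alternative is tried first, an apostrophe inside a word stays attached to it, while any other symbol is emitted one character at a time.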
This question already has answers here:
How to split strings into text and number?
(11 answers)
Closed 2 years ago.
I have strings like 'S10', 'S11', and so on.
How can I split each one into ['S', '10'], ['S', '11']?
example:
import re
str = 'S10'
re.compile(...)
result = re.split(str)
result:
print(result)
# ['S', '10']
This was resolved in "How to split strings into text and number?"
This should do the trick:
I'm using capture groups (the parentheses) to match the alphabetical part as the first group and the digits as the second group.
Code:
import re
str_data = 'S10'
exp = r"([A-Za-z]+)(\d+)"  # raw string; letters in group 1, digits in group 2
match = re.match(exp, str_data)
result = match.groups()
Output:
('S', '10')
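To apply the same idea to several strings, something like the following sketch may help. It assumes every input is one or more ASCII letters followed by digits, and it uses `re.fullmatch` so that strings with a different shape are reported rather than partially matched. The names `PATTERN` and `split_text_number` are hypothetical.

```python
import re

# Letters first, digits second; fullmatch requires the whole string to fit.
PATTERN = re.compile(r"([A-Za-z]+)(\d+)")

def split_text_number(value):
    """Return [text, number] for strings like 'S10', or None if the shape differs."""
    match = PATTERN.fullmatch(value)
    return list(match.groups()) if match else None

for item in ["S10", "S11", "10S"]:
    print(item, "->", split_text_number(item))
# S10 -> ['S', '10']
# S11 -> ['S', '11']
# 10S -> None
```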
This question already has answers here:
How to use python regex to replace using captured group? [duplicate]
(4 answers)
Closed 3 years ago.
I have the following string: message = 'hi <#ABC> and <#DEF>'
and the following regex: exp = '<#(.*?)>', so that re.findall(exp, message) outputs ['ABC', 'DEF']. How can I replace the matches in the original message with those outputs so that I get 'hi ABC and DEF'?
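import re
message = 'hi <#ABC> and <#DEF>'
# \1 in the replacement refers to the text captured by the first group
message = re.sub(r"<#(.*?)>", r"\1", message)
print(message)
# hi ABC and DEF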
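Two variations on the same call can be sketched as follows: the `\g<1>` spelling of the backreference, and a function used as the replacement. The function name `lower_tag` and the lower-casing are only illustrative and are not part of the question.

```python
import re

message = 'hi <#ABC> and <#DEF>'

# \g<1> is the unambiguous spelling of \1 (useful when a digit follows the reference).
plain = re.sub(r"<#(.*?)>", r"\g<1>", message)
print(plain)  # hi ABC and DEF

# A callable receives the match object and returns the replacement text.
def lower_tag(match):
    return match.group(1).lower()

lowered = re.sub(r"<#(.*?)>", lower_tag, message)
print(lowered)  # hi abc and def
```

A callable is worth considering when the replacement depends on the captured text in a way a template string cannot express.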
This question already has answers here:
Decode HTML entities in Python string?
(6 answers)
Closed 6 years ago.
I'm trying to split on a lookahead, but it doesn't work for the last occurrence. How do I do this?
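my_str = 'HRC&#226;&#8364;&#8482;s'
import re
print(re.split(r'.(?=&)', my_str))
My output:
['HR', '&#226', '&#8364', '&#8482;s']
My desired output:
['HRC', '&#226', '&#8364', '&#8482', 's']
The solution using the re.findall() function:
import re
my_str = 'HRC&#226;&#8364;&#8482;s'
# \w+ grabs plain words; &#\d+(?=;) grabs each numeric reference without its trailing ';'
result = re.findall(r'\w+|&#\d+(?=;)', my_str)
print(result)
The output:
['HRC', '&#226', '&#8364', '&#8482', 's']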
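If the goal is to decode the references rather than keep them as separate tokens (which is what the linked duplicate covers), the standard library's `html.unescape` may be the simpler route. The sketch below assumes the input is 'HRC&#226;&#8364;&#8482;s', that is, the kind of numeric character references the `&#\d+` pattern targets. The final repair step rests on a further assumption: that the three characters are the UTF-8 bytes of a right single quote that were read as Windows-1252.

```python
import html

my_str = 'HRC&#226;&#8364;&#8482;s'

# html.unescape turns numeric character references into the characters they name.
decoded = html.unescape(my_str)
print(decoded)  # HRCâ€™s

# Assumption: the three characters are the UTF-8 bytes of a right single quote
# that were mis-read as Windows-1252. Re-encoding and decoding reverses that.
repaired = decoded.encode('cp1252').decode('utf-8')
print(repaired)  # HRC’s
```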
This question already has an answer here:
Detect repetitions in string
(1 answer)
Closed 8 years ago.
Let's suppose I have this string
s = '123123123'
Here the sub-string '123' is repeated.
s = '1234'
Here the sub-string would be '1234', with no repetitions.
s = '11111'
Here the sub-string would be '1'.
How can I get this with Python? Any hints?
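import re
strings = ['123123123', '1234', '11111']
# (.+?) lazily captures the shortest unit; \1+ requires it to repeat at least once
pattern, result = re.compile(r'(.+?)\1+'), []
for item in strings:
    # fall back to the whole string when nothing repeats
    result.extend(pattern.findall(item) or [item])
print(result)
# ['123', '1234', '1']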
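Note that the `findall` version reports a repeat anywhere in the string, so 'ab1212' yields '12'. If the unit is instead expected to account for the whole string, a variant built on `re.fullmatch` could look like the sketch below. The helper name `repeating_unit` is hypothetical, and returning the input unchanged for an empty string is an arbitrary choice.

```python
import re

def repeating_unit(text):
    """Return the shortest substring whose repetition makes up all of text.

    `repeating_unit` is an illustrative helper, not from the answer above.
    """
    # (.+?) grows lazily; \1* then has to cover the rest of the string exactly.
    match = re.fullmatch(r'(.+?)\1*', text, flags=re.DOTALL)
    return match.group(1) if match else text  # only an empty string fails to match

for s in ['123123123', '1234', '11111', 'ab1212']:
    print(s, '->', repeating_unit(s))
# 123123123 -> 123
# 1234 -> 1234
# 11111 -> 1
# ab1212 -> ab1212
```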
This question already has answers here:
Is there a simple way to remove multiple spaces in a string?
(27 answers)
Closed 6 years ago.
I want to know how to remove unwanted spaces inside a string. For example:
>>> a = "Hello    world"
and I want to print it with the extra spaces in the middle removed:
Hello world
This will work:
" ".join(a.split())
Without any arguments, a.split() splits on whitespace and treats each run of consecutive whitespace characters as a single separator, so no empty strings are produced. " ".join() then joins the resulting list into one string with single spaces.
Regular expressions also work:
>>> import re
>>> re.sub(r'\s+', ' ', 'Hello    World')
'Hello World'
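The two approaches differ at the edges of the string, which the small comparison below tries to show. The sample text and variable names are made up for illustration.

```python
import re

text = "  Hello    world \n"

# split() with no arguments drops leading and trailing whitespace as well.
joined = " ".join(text.split())
print(repr(joined))      # 'Hello world'

# re.sub only collapses each run, so edge whitespace survives as one space.
collapsed = re.sub(r"\s+", " ", text)
print(repr(collapsed))   # ' Hello world '

# Stripping first makes the regex version match the split/join result.
stripped = re.sub(r"\s+", " ", text.strip())
print(repr(stripped))    # 'Hello world'
```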