This question already has an answer here:
Detect repetitions in string
(1 answer)
Closed 8 years ago.
Let's suppose I have this string
s = '123123123'
Here the sub-string '123' is repeated.
s = '1234'
Here the sub-string would be '1234', with no repetitions.
s = '11111'
The sub-string would be '1'
How can I get this with Python? Any hints?
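import re

strings = ['123123123', '1234', '11111']
# Lazily capture the shortest run that is immediately followed by a copy of itself.
pattern, result = re.compile(r'(.+?)\1+'), []
for item in strings:
    # Fall back to the whole string when nothing repeats.
    result.extend(pattern.findall(item) or [item])
print(result)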
# ['123', '1234', '1']
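As a hedged alternative to the findall loop above, the same idea can be anchored to the whole string, so a unit is reported only when its copies make up the entire input. This is a minimal sketch; the helper name smallest_unit is hypothetical and not part of the original answer.

```python
import re

def smallest_unit(s):
    """Return the shortest substring whose repetition makes up all of s."""
    # (.+?) lazily tries the shortest candidate first; \1* then requires the
    # rest of the string to consist only of copies of that candidate.
    # re.DOTALL lets '.' match newlines too, so only '' fails to match.
    match = re.fullmatch(r'(.+?)\1*', s, flags=re.DOTALL)
    return match.group(1) if match else s

for text in ['123123123', '1234', '11111']:
    print(text, '->', smallest_unit(text))
```

Unlike the unanchored pattern, this returns the whole string for an input such as '12312', where '123' starts to repeat but does not tile the string.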
This question already has answers here:
How to split strings into text and number?
(11 answers)
Closed 2 years ago.
I have strings like 'S10', 'S11', etc.
How can I split them into ['S', '10'] and ['S', '11']?
Example:
import re
s = 'S10'
pattern = re.compile(...)  # the pattern is what I am asking for
result = pattern.split(s)
Desired result:
print(result)
# ['S', '10']
Resolved at: How to split strings into text and number?
This should do the trick:
The parentheses are capture groups: the first group matches the alphabetical part and the second group matches the digits.
Code:
import re
str_data = 'S10'
# Group 1: one or more letters; group 2: one or more digits.
exp = r"([A-Za-z]+)(\d+)"
match = re.match(exp, str_data)
result = match.groups()
print(result)
Output:
('S', '10')
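To get the list form the question asks for, and to apply it to several items, the match can be wrapped in a small helper. This is only a sketch: the name split_text_number is hypothetical, and it assumes every item is letters followed by digits, as in 'S10'.

```python
import re

# Assumption: each item is ASCII letters followed by digits, e.g. 'S10'.
TEXT_NUMBER = re.compile(r'([A-Za-z]+)(\d+)')

def split_text_number(item):
    """Split 'S10' into ['S', '10']; raise ValueError on any other shape."""
    match = TEXT_NUMBER.fullmatch(item)
    if match is None:
        raise ValueError(f'not in letters+digits form: {item!r}')
    return list(match.groups())

print([split_text_number(x) for x in ['S10', 'S11']])
# [['S', '10'], ['S', '11']]
```

Using fullmatch rather than match rejects trailing junk such as 'S10x' instead of silently ignoring it.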
This question already has answers here:
In Python, how do I split a string and keep the separators?
(19 answers)
Closed 2 years ago.
I can get the numeric parts with this:
>>> import re
>>> re.findall(r'\d+', '!"123%&654()')
['123', '654']
How can I get all the components?
['!"', '123', '%&', '654', '()']
For reference, with findall you can alternate between greedy runs of digits and greedy runs of non-digits:
re.findall(r'\d+|\D+', '!"123%&654()')
# ['!"', '123', '%&', '654', '()']
split is a little cleaner.
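The split-based variant is not shown here, so the following is a sketch of what it presumably looks like: re.split keeps the separators when the pattern is wrapped in a capture group.

```python
import re

text = '!"123%&654()'
# The parentheses make re.split return the digit runs as well as the
# text between them.
parts = re.split(r'(\d+)', text)
print(parts)
# ['!"', '123', '%&', '654', '()']

# If the string starts or ends with digits, split yields empty strings
# at the edges; filtering on truthiness removes them.
edge = [p for p in re.split(r'(\d+)', '12ab') if p]
print(edge)
# ['12', 'ab']
```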
This question already has answers here:
Splitting a string into words and punctuation
(11 answers)
Closed 3 years ago.
How can I split a string in Python so that punctuation marks are kept as separate items in the result?
The following code:
s = "Hello, my name is Robert."
s_splitted = s.split()
will give as output:
["Hello,","my","name","is","Robert."]
How can I obtain the following result?
["Hello",",","my","name","is","Robert","."]
Regex can handle this.
import re
s = "Hello, my name is Robert."
s_splitted = [part for part in re.split(r'\b|\s', s) if part != '']
# Python 3.7+: ['Hello', ',', 'my', 'name', 'is', 'Robert', '.']
Alternatively, re.findall can collect words and punctuation marks directly. In your case:
import re
s = "Hello, my name is Robert."
items = re.findall(r"[\w']+|[.,!?;]", s)
print(items)
# ['Hello', ',', 'my', 'name', 'is', 'Robert', '.']
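The findall pattern above lists the punctuation marks explicitly. A hedged generalisation is to treat any character that is neither a word character nor whitespace as punctuation; the function name tokenize is hypothetical.

```python
import re

def tokenize(text):
    """Split text into words and single punctuation characters."""
    # [\w']+  : a word, keeping apostrophes inside it (as in "don't")
    # [^\w\s] : any single character that is not a word char or whitespace
    return re.findall(r"[\w']+|[^\w\s]", text)

print(tokenize("Hello, my name is Robert."))
# ['Hello', ',', 'my', 'name', 'is', 'Robert', '.']
```

Each punctuation character becomes its own item, so an ellipsis is returned as three separate '.' entries.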
This question already has answers here:
Remove empty strings from a list of strings
(13 answers)
Closed 3 years ago.
In the code block below, I understand that s is the string. re.split() will generate a list of split results and the list comprehension will iterate through every result created.
I don't understand how "if i" will work here.
This is from the following Stack Overflow answer: https://stackoverflow.com/a/28290501/11292262
>>> import re
>>> s = '125km'
>>> [i for i in re.split(r'([A-Za-z]+)', s) if i]
['125', 'km']
>>> [i for i in re.split(r'(\d+)', s) if i]
['125', 'km']
Empty strings evaluate to False in a boolean context, so the if i clause filters them out. Note what happens when we take the if out:
import re
s = '125km'
print(re.split(r'([A-Za-z]+)', s))
print(re.split(r'(\d+)', s))
Output:
['125', 'km', '']
['', '125', 'km']
The if is used to remove the empty string, which is unwanted, per that question. Note that the capture groups in both expressions are needed to ensure that the part of the string split on (value or unit) is also returned.
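The truthiness rule can be checked directly. The sketch below shows the boolean value of each piece and an equivalent filter(None, ...) spelling; the variable name raw is just for illustration.

```python
import re

s = '125km'
raw = re.split(r'(\d+)', s)
print(raw)                      # ['', '125', 'km']

# bool('') is False, while any non-empty string is True.
print([bool(i) for i in raw])   # [False, True, True]

# "if i" keeps only the truthy items ...
print([i for i in raw if i])    # ['125', 'km']

# ... which is what filter(None, ...) does as well.
print(list(filter(None, raw)))  # ['125', 'km']
```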
This question already has answers here:
Decode HTML entities in Python string?
(6 answers)
Closed 6 years ago.
I'm trying to split on a lookahead, but it doesn't work for the last occurrence. How do I do this?
import re
my_str = 'HRC&#226;&#128;&#153;s'
print(re.split(r'.(?=&)', my_str))
My output:
['HR', '&#226', '&#128', '&#153;s']
My desired output:
['HRC', '&#226', '&#128', '&#153', 's']
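A solution using re.findall():
import re
my_str = 'HRC&#226;&#128;&#153;s'
# \w+ matches plain words; &#\d+(?=;) matches each character reference without its ';'.
result = re.findall(r'\w+|&#\d+(?=;)', my_str)
print(result)
The output:
['HRC', '&#226', '&#128', '&#153', 's']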
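If the goal is to decode the references rather than tokenize them, as the linked duplicate suggests, the standard library's html.unescape can be used. This is a sketch under two assumptions: that the input consists of the numeric character references the patterns above target, and that the three references are the UTF-8 bytes of one character that were mis-read as Windows-1252.

```python
import html

# Assumed input: numeric character references, as matched by &#\d+ above.
my_str = 'HRC&#226;&#128;&#153;s'

# html.unescape converts each reference to the character it names.
decoded = html.unescape(my_str)
print(decoded)

# Assumption: the three characters are mis-decoded UTF-8 bytes.
# Re-encoding as cp1252 and decoding as UTF-8 recovers a single character.
repaired = decoded.encode('cp1252').decode('utf-8')
print(repaired)
```

The repair step is specific to this kind of mojibake and raises an error if the text contains characters that cp1252 cannot encode.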