Splitting on a lookahead [duplicate] - python

This question already has answers here:
Decode HTML entities in Python string?
(6 answers)
Closed 6 years ago.
I'm trying to split on a lookahead, but it doesn't work for the last occurrence. How do I do this?
my_str = 'HRC&#226;&#128;&#153;s'
import re
print(re.split(r'.(?=&)', my_str))
My output:
['HR', '&#226', '&#128', '&#153;s']
My desired output:
['HRC', '&#226', '&#128', '&#153', 's']

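A solution using the re.findall() function:
import re
my_str = 'HRC&#226;&#128;&#153;s'
# Match runs of word characters, or an entity prefix that is followed by ';'
result = re.findall(r'\w+|&#\d+(?=;)', my_str)
print(result)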
The output:
['HRC', '&#226', '&#128', '&#153', 's']
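One possible explanation of why the split attempt loses characters, offered as a sketch: the `.` in `.(?=&)` is part of the match, so the character before each `&` is consumed as the delimiter (the `C` of `HRC`, then each `;`), and the final `;` is never followed by `&`. Assuming Python 3.7 or later, where re.split accepts patterns that can match the empty string, and assuming the input is the entity-encoded string implied by the desired output, splitting on `;` or on the zero-width position before `&` and dropping empty pieces gives the same list:

```python
import re

my_str = 'HRC&#226;&#128;&#153;s'

# The original pattern consumes one character before every '&'.
lossy = re.split(r'.(?=&)', my_str)

# Split on ';' or on the empty position just before '&' (Python 3.7+),
# then drop the empty strings produced where a ';' is directly followed by '&'.
parts = [p for p in re.split(r';|(?=&)', my_str) if p]

print(lossy)
print(parts)
```

The findall version above avoids the filtering step, which may be why it reads more simply.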

Related

about Regex in Python [duplicate]

This question already has answers here:
Python extract pattern matches
(10 answers)
Closed 1 year ago.
I made this code:
import re
match = re.search(r'[DER]\d+[Y]', 'DER1234Y' )
print(match.group())
and it prints this:
R1234Y
I want the code to print only the digits and nothing else. How can I do that?
This is a plain regex task, so one option is to strip everything that is not a digit: re.sub('[^0-9]+', '', 'DER1234Y') returns '1234'.
[^0-9]+ matches any run of characters that are not digits (0-9).
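As a hedged alternative to deleting the non-digits, a capture group can pull the digits out directly. The sketch below also shows why the original pattern printed R1234Y: [DER] is a character class that matches a single character, not the literal prefix DER. The variable names are illustrative only.

```python
import re

text = 'DER1234Y'

# [DER] matches exactly one of D, E or R, so the first position where
# "one such letter, digits, Y" fits is the 'R' right before the digits.
match = re.search(r'[DER]\d+[Y]', text)
print(match.group())

# Match the literal prefix and suffix, and capture only the digits.
digits = re.search(r'DER(\d+)Y', text).group(1)
print(digits)
```

Unlike re.sub, this only returns digits that sit between the expected prefix and suffix, which may or may not be what is wanted.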

Regex split numbers and letter groups without spaces with python [duplicate]

This question already has answers here:
How to split strings into text and number?
(11 answers)
Closed 2 years ago.
I have strings like 'S10', 'S11', etc.
How can I split them into ['S','10'], ['S','11']?
example:
import re
str = 'S10'
re.compile(...)
result = re.split(str)
result:
print(result)
# ['S','10']
Resolved at: How to split strings into text and number?
This should do the trick:
I'm using capture groups (the parentheses) to match the alphabetical part into the first group and the digits into the second group.
Code:
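import re
str_data = 'S10'
# Group 1: leading letters; group 2: trailing digits.
# A raw string keeps the backslash in \d from being treated as a string escape.
exp = r"([A-Za-z]+)(\d+)"
match = re.match(exp, str_data)
result = match.groups()
print(result)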
Output:
('S', '10')
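To apply the same idea to several strings and get lists rather than tuples, a small helper could look like the sketch below. The name split_text_number is hypothetical, and the pattern assumes each value is ASCII letters followed by digits and nothing else.

```python
import re

# Assumed input shape: one or more ASCII letters, then one or more digits.
_TEXT_NUMBER = re.compile(r'([A-Za-z]+)(\d+)')

def split_text_number(value):
    """Return [letters, digits] for values like 'S10', or None if the shape differs."""
    m = _TEXT_NUMBER.fullmatch(value)
    return list(m.groups()) if m else None

print([split_text_number(v) for v in ['S10', 'S11']])
```

Using fullmatch rather than match means a value with trailing characters, such as 'S10x', is rejected instead of being silently truncated.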

python string parsing with regular expression [duplicate]

This question already has answers here:
In Python, how do I split a string and keep the separators?
(19 answers)
Closed 2 years ago.
I can get numeric with this:
>>> import re
>>> re.findall(r'\d+', '!"123%&654()')
['123', '654']
How can I get all the components ?
['!"', '123', '%&', '654', '()']
For reference, with findall, you would greedily look for only digits, or only non-digits:
re.findall(r'\d+|\D+', '!"123%&654()')
# ['!"', '123', '%&', '654', '()']
re.split with a capturing group, as in the linked question, is a little cleaner.
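For comparison, a minimal sketch of the split approach: when the pattern passed to re.split contains a capturing group, the text matched by the group is kept in the result alongside the pieces between matches.

```python
import re

text = '!"123%&654()'

# The parentheses make re.split keep the digit runs it splits on.
parts = re.split(r'(\d+)', text)
print(parts)

# If the string starts or ends with digits, an empty string appears at that end.
edge = re.split(r'(\d+)', '12ab')
print(edge)
```

Filtering out empty strings afterwards handles inputs that begin or end with a digit run.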

How can I split a string maintaining the punctuation? (Python) [duplicate]

This question already has answers here:
Splitting a string into words and punctuation
(11 answers)
Closed 3 years ago.
How can I split a string in python taking into account the punctuation in the result?
The following code:
s = "Hello, my name is Robert."
s_splitted = s.split()
will give as output:
["Hello,","my","name","is","Robert."]
How can I obtain the following result?
["Hello",",","my","name","is","Robert","."]
Regex can handle this.
import re
s = "Hello, my name is Robert."
s_splitted = [part for part in re.split(r'\b|\s', s) if part != '']
# Python 3.7+ (where re.split can split on empty matches such as \b):
# ['Hello', ',', 'my', 'name', 'is', 'Robert', '.']
Does this answer your question?
So in your case:
import re
s = "Hello, my name is Robert."
items = re.findall(r"[\w']+|[.,!?;]", s)
# ['Hello', ',', 'my', 'name', 'is', 'Robert', '.']
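The findall pattern above only recognises the punctuation marks listed in its character class. A hedged variant, assuming any single non-word, non-space character should become its own token, is sketched here:

```python
import re

s = "Hello, my name is Robert."

# \w+ matches a word; [^\w\s] matches one character that is
# neither a word character nor whitespace.
tokens = re.findall(r"\w+|[^\w\s]", s)
print(tokens)

# Trade-off: apostrophes are split off too, unlike with [\w']+.
contraction = re.findall(r"\w+|[^\w\s]", "Don't stop!")
print(contraction)
```

Which variant fits better depends on whether contractions such as "Don't" should stay in one piece.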

Count how many time a string appears in a longer string [duplicate]

This question already has answers here:
String count with overlapping occurrences [closed]
(25 answers)
Closed 7 years ago.
I have a small problem:
I want to count how many times the string "aa" occurs in the longer string "aaatattgg", which looks like a DNA sequence.
Here, for example, I expect 2 (overlap is allowed).
There is the .count method, but it does not count overlapping occurrences.
PS: excuse my English, I'm French.
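Use the re module. Put the pattern in a capturing group inside a positive lookahead: the lookahead itself consumes no characters, so the next search starts one position later and overlapping occurrences are all found.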
>>> import re
>>> s = "aaatattgg"
>>> re.findall(r'(?=(aa))', s)
['aa', 'aa']
>>> len(re.findall(r'(?=(aa))', s))
2
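A regex is not strictly required. As a sketch without the re module, str.find can be called repeatedly, restarting one character after each hit so that overlapping occurrences are counted. The function name count_overlapping is hypothetical, and a non-empty search string is assumed.

```python
def count_overlapping(haystack, needle):
    """Count occurrences of needle in haystack, including overlapping ones."""
    count = 0
    start = haystack.find(needle)
    while start != -1:
        count += 1
        # Restart one character after the previous hit to allow overlap.
        start = haystack.find(needle, start + 1)
    return count

print(count_overlapping("aaatattgg", "aa"))
```

This also avoids having to escape special characters, which the lookahead version would need (via re.escape) if the search string contained regex metacharacters.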
