Split python string in a specific way [duplicate] - python

This question already has answers here:
Split a string by a delimiter in python
(5 answers)
Match text between two strings with regular expression
(3 answers)
Closed 5 months ago.
I have a string like a = 'This is an example string that has a code !3377! this is the code I want to extract'.
How can I extract 3377 from this string, i.e., the part surrounded by !?

There are multiple ways of doing what you are looking for. But the most optimal way of doing it would be by using regular expressions.
For example, in the case you gave:
import re
def subtract_code_from(sentence: str) -> str:
m = re.search(r'\w?!(\d+)!\w?', sentence)
return m.group(0)
Keep in mind that what I've done is a very quick and loose solution I implemented in five minutes. I don't know what other types of particular cases you could encounter for each sentence. So it is your job to implement the proper regex to match all the cases.
I encourage you to follow this tutorial. And you can use this website to build your regexes.
Good luck.

Related

Python: Using Regex to remove multiple occurrences of punctuation? [duplicate]

This question already has answers here:
strip punctuation with regex - python
(4 answers)
Closed 2 years ago.
I'm looking to remove reoccurring punctuation in a row.
E.g turn 'Hello...' into 'Hello.'
I've been reading some of the documentation on the matter, but am struggling to find a definitive method. (I personally find the docs on regex to a be a little overwhelming, and unclear at times).
I thought it may be something along the lines of:
re.sub('[!()-{};:,<>./?##$%^&*_~]+', '', input)
But this doesn't work. Any help? Thanks.
You can use this:
import re
input='Hello...'
re.sub(r'(\W)(?=\1)', '', input)
Output:
'Hello.'

re.findall return separate non-overlapping results [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
What do 'lazy' and 'greedy' mean in the context of regular expressions?
(13 answers)
Closed 4 years ago.
I am new to Python and I am struggling a bit with regular expressions. If I have an input like this:
text = <tag>xyz</tag>\n<tag>abc</tag>
Is it possible to get an output list with elements like:
matches = ['<tag>xyz</tag>','<tag>abc</tag>]
Right now I am using the following regex
matches = re.findall(r"<tag>[\w\W]*</tag>", text)
But instead of a list with two elements I am getting only one element with the whole input string like:
matches = ['<tag>xyz</tag>\n<tag>abc</tag>']
Could someone please guide me?
Thank you.
You just need to make your capture non-greedy.
Change this regex,
<tag>[\w\W]*</tag>
to
<tag>[\w\W]*?</tag>
import re
text = '<tag>xyz</tag>\n<tag>abc</tag>'
matches = re.findall(r"<tag>[\w\W]*?</tag>", text)
print(matches)
Prints,
['<tag>xyz</tag>', '<tag>abc</tag>']

Date regex in a sentence [duplicate]

This question already has answers here:
How to match a whole word with a regular expression?
(4 answers)
Closed 4 years ago.
I'm trying to use the date regex from this post:
^(?:(?:31(\/|-|\.)(?:0?[13578]|1[02]|(?:Jan|Mar|May|Jul|Aug|Oct|Dec)))\1|(?:(?:29|30)(\/|-|\.)(?:0?[1,3-9]|1[0-2]|(?:Jan|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:29(\/|-|\.)(?:0?2|(?:Feb))\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:(?:0?[1-9]|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep))|(?:1[0-2]|(?:Oct|Nov|Dec)))\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$
However, I want to find all matches that are also wrapped around white spaces.
For example in this sentence:
I went to Disney World on 11/11/1989 and once more on 12/12/2009
I want to get back:
11/11/1989
12/12/2009
How do I accomplish this? I'm using Python3 regex module if it matters.
If you want to tweak the regex you linked to work in a string like that, change the three ^ and $s to word boundaries (\b) instead:
\b(?:(?:31(\/|-|\.)(?:0?[13578]|1[02]|(?:Jan|Mar|May|Jul|Aug|Oct|Dec)))\1|(?:(?:29|30)(\/|-|\.)(?:0?[1,3-9]|1[0-2]|(?:Jan|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:29(\/|-|\.)(?:0?2|(?:Feb))\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))\b|\b(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:(?:0?[1-9]|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep))|(?:1[0-2]|(?:Oct|Nov|Dec)))\4(?:(?:1[6-9]|[2-9]\d)?\d{2})\b
https://regex101.com/r/WX5Itv/1

unable to match this regular expression in python [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 4 years ago.
I am trying to match regular expression using python in this code.
CDS_REGEX = re.compile(r'\+CDS:\s*"([^"]+)",\s*(\d+)$')
cdsiMatch = allLinesMatchingPattern(self.CDS_REGEX, notificationLine)
print cdsiMatch
Matching String:
['+CDS: 24', '079119890400202306A00AA17909913764514010106115225140101061452200']
Please help me i am not able to find my mistake,
As #Blckknght said, are you sure you really want to match that string?
What is ([^"]+) supposed to match?
You're looking for " instead of ' (you probably want ['"]).
You're only checking for numbers here: (\d+), but your long string clearly contains A's.

Python Function To Find String Between Two Markers [duplicate]

This question already has answers here:
How to extract the substring between two markers?
(22 answers)
Closed 4 years ago.
I'm looking to build a string function to extract the string contents between two markers. It returns an extraction list
def extract(raw_string, start_marker, end_marker):
... function ...
return extraction_list
I know this can be done using regex but is this fast? This will be called billions of times in my process. What is the fastest way to do this?
What happens if the markers are the same and appear and odd number of times?
The function should return multiple strings if the start and end markers appear more than once.
You probably can't go faster than:
def extract(raw_string, start_marker, end_marker):
start = raw_string.index(start_marker) + len(start_marker)
end = raw_string.index(end_marker, start)
return raw_string[start:end]
But if you want to try regex, just try to benchmark it. There's a good timeit module for it.

Categories