Python: Using Regex to remove multiple occurrences of punctuation? [duplicate]

This question already has answers here:
strip punctuation with regex - python
(4 answers)
Closed 2 years ago.
I'm looking to collapse punctuation that is repeated several times in a row.
E.g., turn 'Hello...' into 'Hello.'
I've been reading some of the documentation on the matter, but am struggling to find a definitive method. (I personally find the docs on regex to be a little overwhelming, and unclear at times.)
I thought it might be something along the lines of:
re.sub('[!()-{};:,<>./?##$%^&*_~]+', '', input)
But this doesn't work. Any help? Thanks.

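You can use a capture group with a lookahead. It deletes every non-word character that is immediately followed by a copy of itself, so only the last character of each run is kept:
import re
text = 'Hello...'  # avoid naming the variable input, which shadows the built-in
re.sub(r'(\W)(?=\1)', '', text)
Output:
'Hello.'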
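The attempt in the question fails for two reasons: it replaces every match with an empty string, and the `)-{` inside the brackets is read as a character range that also covers digits and letters. Note too that \W in the pattern above matches whitespace, so doubled spaces get collapsed as well. Below is a minimal sketch that limits the match to ASCII punctuation, assuming string.punctuation is the set you care about; collapse_punctuation is a hypothetical helper name, not part of the original answer.

```python
import re
import string

# Character class built from ASCII punctuation; re.escape keeps characters
# such as '-', ']' and '^' from being read as character-class syntax.
_PUNCT_RUN = re.compile(r'([' + re.escape(string.punctuation) + r'])\1+')


def collapse_punctuation(text: str) -> str:
    """Replace each run of one repeated punctuation character with a single copy."""
    return _PUNCT_RUN.sub(r'\1', text)


print(collapse_punctuation('Hello...'))         # Hello.
print(collapse_punctuation('Wait!!!  What??'))  # Wait!  What?
```

Unlike the lookahead version, this leaves the double space in the second example untouched, because a space is not in the punctuation set.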

Related

Split python string in a specific way [duplicate]

This question already has answers here:
Split a string by a delimiter in python
(5 answers)
Match text between two strings with regular expression
(3 answers)
Closed 5 months ago.
I have a string like a = 'This is an example string that has a code !3377! this is the code I want to extract'.
How can I extract 3377 from this string, i.e., the part surrounded by !?

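There are several ways to do this. A regular expression is one concise option.
For example, in the case you gave:
import re
def extract_code_from(sentence: str) -> str:
    # Capture the digits between the two exclamation marks.
    m = re.search(r'!(\d+)!', sentence)
    # group(1) is the captured code; group(0) would also include the '!' marks.
    # Returning '' when nothing matches is a choice made here for illustration.
    return m.group(1) if m else ''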
Keep in mind that this is a quick, loose solution. I don't know what other cases your sentences may contain, so you will need to adapt the regex to cover them.
I encourage you to follow this tutorial. And you can use this website to build your regexes.
Good luck.
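As a quick check against the string from the question, here is a small sketch, assuming the code is always a run of digits between two ! characters. It also shows a plain str.split alternative, which only works when the string contains exactly one such pair.

```python
import re

a = 'This is an example string that has a code !3377! this is the code I want to extract'

# re.findall returns the captured group for every non-overlapping match.
codes = re.findall(r'!(\d+)!', a)
print(codes)  # ['3377']

# Without a regex: the text between the first two '!' is the second piece.
parts = a.split('!')
print(parts[1])  # 3377
```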

re.findall return separate non-overlapping results [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
What do 'lazy' and 'greedy' mean in the context of regular expressions?
(13 answers)
Closed 4 years ago.
I am new to Python and I am struggling a bit with regular expressions. If I have an input like this:
text = '<tag>xyz</tag>\n<tag>abc</tag>'
Is it possible to get an output list with elements like:
matches = ['<tag>xyz</tag>', '<tag>abc</tag>']
Right now I am using the following regex:
matches = re.findall(r"<tag>[\w\W]*</tag>", text)
But instead of a list with two elements I am getting only one element with the whole input string like:
matches = ['<tag>xyz</tag>\n<tag>abc</tag>']
Could someone please guide me?
Thank you.

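You just need to make the * quantifier non-greedy (lazy) by adding ? after it, so that the match stops at the first </tag> instead of the last one.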
Change this regex,
<tag>[\w\W]*</tag>
to
<tag>[\w\W]*?</tag>
import re
text = '<tag>xyz</tag>\n<tag>abc</tag>'
matches = re.findall(r"<tag>[\w\W]*?</tag>", text)
print(matches)
Prints,
['<tag>xyz</tag>', '<tag>abc</tag>']
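To see the difference side by side, here is a short sketch that runs both forms on the same input. It uses . with the re.DOTALL flag instead of [\w\W]; that swap is my own choice for readability, and both spellings match any character including a newline.

```python
import re

text = '<tag>xyz</tag>\n<tag>abc</tag>'

# Greedy: .* runs to the end of the input and backtracks to the LAST </tag>.
greedy = re.findall(r'<tag>.*</tag>', text, flags=re.DOTALL)

# Lazy: .*? expands one character at a time and stops at the FIRST </tag>.
lazy = re.findall(r'<tag>.*?</tag>', text, flags=re.DOTALL)

print(greedy)  # ['<tag>xyz</tag>\n<tag>abc</tag>']
print(lazy)    # ['<tag>xyz</tag>', '<tag>abc</tag>']
```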

get strings between 2 delimiter in python [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 4 years ago.
I would like to get, from the following string "/path/to/%directory_1%/%directory_2%.csv"
the following list: [directory_1, directory_2]. I would like to avoid splitting my string on "%". I was hoping to find a regex that could help me, but I cannot find the correct one.
For now, I have the following:
re.findall('%(.*)%', dirty_arg)
which outputs ["directory_1%/%directory_2"]
Do you have any recommendation about that?
Thank you very much for your help.

Try this:
import re
regex = r"%(.*?)%"
dirty_arg = "/path/to/%directory_1%/%directory_2%.csv"
print(re.findall(regex, dirty_arg))
I've added ? after the * in your regex, which makes the quantifier lazy: it matches as few characters as possible, so each match stops at the next %. The output of this code is ['directory_1', 'directory_2']
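An alternative sketch, not from the original answer: instead of a lazy quantifier, a negated character class says directly that the captured text may not contain a %. It assumes names never contain a literal % themselves.

```python
import re

dirty_arg = "/path/to/%directory_1%/%directory_2%.csv"

# [^%]* matches any run of characters other than '%', so the capture
# cannot run past the closing delimiter.
names = re.findall(r'%([^%]*)%', dirty_arg)
print(names)  # ['directory_1', 'directory_2']
```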

Date regex in a sentence [duplicate]

This question already has answers here:
How to match a whole word with a regular expression?
(4 answers)
Closed 4 years ago.
I'm trying to use the date regex from this post:
^(?:(?:31(\/|-|\.)(?:0?[13578]|1[02]|(?:Jan|Mar|May|Jul|Aug|Oct|Dec)))\1|(?:(?:29|30)(\/|-|\.)(?:0?[1,3-9]|1[0-2]|(?:Jan|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:29(\/|-|\.)(?:0?2|(?:Feb))\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:(?:0?[1-9]|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep))|(?:1[0-2]|(?:Oct|Nov|Dec)))\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$
However, I want to find all matches that are surrounded by whitespace inside a longer string.
For example in this sentence:
I went to Disney World on 11/11/1989 and once more on 12/12/2009
I want to get back:
11/11/1989
12/12/2009
How do I accomplish this? I'm using Python3 regex module if it matters.

If you want to tweak the regex you linked so that it works inside a longer string like that, replace each of the three ^ and three $ anchors with a word boundary (\b):
\b(?:(?:31(\/|-|\.)(?:0?[13578]|1[02]|(?:Jan|Mar|May|Jul|Aug|Oct|Dec)))\1|(?:(?:29|30)(\/|-|\.)(?:0?[1,3-9]|1[0-2]|(?:Jan|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})\b|\b(?:29(\/|-|\.)(?:0?2|(?:Feb))\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))\b|\b(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:(?:0?[1-9]|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep))|(?:1[0-2]|(?:Oct|Nov|Dec)))\4(?:(?:1[6-9]|[2-9]\d)?\d{2})\b
https://regex101.com/r/WX5Itv/1
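The effect of swapping anchors for word boundaries can be sketched with a much shorter pattern. The regex below is a simplified stand-in that I am assuming for illustration only: it checks the shape of a date (day, separator, month, same separator, four-digit year) and does not validate month lengths or leap years like the full regex. Because the pattern contains a capture group, re.findall would return only the captured separators, so the sketch uses re.finditer and group(0).

```python
import re

sentence = 'I went to Disney World on 11/11/1989 and once more on 12/12/2009'

# Simplified stand-in for the full date regex (shape check only).
date_re = re.compile(r'\b\d{1,2}([/.-])\d{1,2}\1\d{4}\b')

# With ^ and $ the pattern must span the whole string, so nothing matches.
anchored = re.findall(r'^\d{1,2}([/.-])\d{1,2}\1\d{4}$', sentence)
print(anchored)  # []

# findall would return only the separators, so take group(0) of each match.
dates = [m.group(0) for m in date_re.finditer(sentence)]
print(dates)  # ['11/11/1989', '12/12/2009']
```

The same finditer and group(0) approach applies to the full regex, which has four capture groups.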

What are () (parentheses) for in Python regex? [duplicate]

This question already has answers here:
Python regex -- extraneous matchings
(5 answers)
Closed 6 years ago.
I searched all over the internet and didn't find a good answer to this.
What do parentheses stand for in a Python regex? It's very weird.
For example, if I do:
re.split(r'(\s*)', "ho from there")
it gives me a list of the separate words with the spaces between them. How does that happen?

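This isn't specific to Python: in regex, parentheses denote a capture group. When the pattern passed to re.split contains a capture group, the text matched by the group is also returned in the result list, which is why the spaces appear.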
Further information on how these are handled in re.split can be seen here
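A short sketch of the three cases, assuming the /s in the question was meant to be \s. It uses \s+ rather than \s*, because since Python 3.7 re.split also splits on empty matches, and \s* would then break the string between every pair of characters.

```python
import re

s = "ho from there"

# No group: the separators are dropped.
plain = re.split(r'\s+', s)
print(plain)  # ['ho', 'from', 'there']

# Capture group: the matched separators are kept in the result.
captured = re.split(r'(\s+)', s)
print(captured)  # ['ho', ' ', 'from', ' ', 'there']

# Non-capturing group (?:...): groups the pattern without keeping the separators.
non_capturing = re.split(r'(?:\s+)', s)
print(non_capturing)  # ['ho', 'from', 'there']
```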
