RegEx works in regexr but not in python re - python

I have this regex: If you don't want these messages, please [a-zA-Z0-9öäüÖÄÜ<>\n\-=#;&?_ "/:.#]+settings<\/a>. It works on regexr but not when I am using the re
library in Python:
data = "<my text (comes from a file)>"
search = "If you don't want these messages, please [a-zA-Z0-9öäüÖÄÜ<>\n\-=#;&?_ \"/:.#]+settings<\/a>" # this search string comes from a database, so it's not hardcoded into my script
print(re.search(search, data))
Is there something I don't see?
Thank you!

The pattern you are using on regexr contains \- but your example shows \\-, which may give an incorrect regex. (Also add the r in front of the string, as jupiterby said.)
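The exact cause can't be pinned down from the post alone, but a quick way to check is to print the repr() of the pattern and compare it character by character with what was typed into regexr. Here is a minimal sketch of that check. The sample text in data and the link inside it are hypothetical stand-ins for the file contents, and search stands in for the value fetched from the database.

```python
import re

# Hypothetical stand-in for the text read from the file.
data = ('If you don\'t want these messages, please '
        '<a href="https://example.com/prefs">change your settings</a>.')

# Hypothetical stand-in for the pattern fetched from the database.
# The r prefix only matters for literals written in source code; a string
# read at runtime already contains its backslashes unchanged.
search = r"""If you don't want these messages, please [a-zA-Z0-9öäüÖÄÜ<>\n\-=#;&?_ "/:.#]+settings</a>"""

# repr() shows exactly which characters reach the regex engine.
print(repr(search))

match = re.search(search, data)
print(match.group(0) if match else "no match")
```

If the repr() output shows doubled backslashes that are not in the regexr version, the pattern was escaped one time too many before it was stored.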

Related

Python Regex - How do I fetch a word after a specific word in a string using python regex?

I need to fetch the "repo-name" value, which is "sonar-repo", from the multi-line commit string below. Can this be achieved with regex? Expected output: sonar-repo
Here is the string which I need to read using regex:
commit_message=
"""repo-name=sonar-repo;repo-title=Sonar;repo-description=A little demo;repo-requester=Jack
"""
You should be able to use regex to look for repo-name= and then look for the ; right after and get what's in between. Something like this:
(?<=repo-name=).*?(?=;)
Tested with regex101.
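For reference, the lookaround pattern above can be used from Python roughly like this. It is a minimal sketch that uses the commit string from the question. Because the lookbehind and lookahead consume no characters, the whole match is already the value and no capture group is needed.

```python
import re

commit_message = """repo-name=sonar-repo;repo-title=Sonar;repo-description=A little demo;repo-requester=Jack
"""

# (?<=repo-name=) requires the key to sit just before the match;
# (?=;) requires a semicolon just after it. Neither is part of the match.
match = re.search(r'(?<=repo-name=).*?(?=;)', commit_message)
value = match.group(0) if match else None
print(value)
```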
Try this:
import re
commit_message= 'repo-name=sonar-repo;repo-title=Sonar;repo-description=A little demo;repo-requester=Jack'
print(re.search(r'repo-name=(.*?);', commit_message).group(1))
Output:
sonar-repo

Fetching specific data within the quotes using regex in python

I am trying to fetch the plugin version from the config XML file (fread) using regex.
I am using the following regex, but I get the entire line instead. I am only interested in the version, i.e. "4.3.0". Any help on how that can be achieved?
(Pdb) key
'plugin="git'
(Pdb) re.findall(key+".*",fread)
['plugin="git#4.2.2">\\n <configVersion>2</configVersion>\\n <userRemoteConfigs>\\n <hudson.plugins.git.UserRemoteConfig>\\n
This may not be the most optimal regex check, but it does work: Regexr Tester. The link explains how the capture works. From the captured string, just strip the # symbol.
This also allows for finding multiple sets if they are defined. If you want a more precise check, you can search for git# followed by the rest of the value, return all of those matches, and then strip the front part.
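A capture group can do the stripping in one step. The following is a minimal sketch, assuming the version is written right after plugin="git# and consists only of digits and dots, as in the output quoted in the question. The fread value here is a shortened, hypothetical stand-in for the real file contents.

```python
import re

# Shortened, hypothetical stand-in for the XML text read from the file.
fread = 'plugin="git#4.2.2">\n <configVersion>2</configVersion>\n'

key = 'plugin="git'

# re.escape() protects any regex metacharacters in the key.
# With one capture group, findall() returns only the captured version text.
versions = re.findall(re.escape(key) + r'#([\d.]+)"', fread)
print(versions)
```

Since findall() returns every match, this also covers the case where the plugin attribute appears more than once.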

how to find and replace special url patern (Markdown syntax to HTML) by re module in python

I have a string, and I want to search it for a special pattern containing a URL and its name, then change its format:
Input string:
'This is my [site](http://example.com/url) you can watch it.'
Output string:
'This is my <a href="http://example.com/url">site</a> you can watch it.'
The string may contain several URLs, and I need to change the format of every one. The site name is Unicode and can contain any character in any language.
What pattern should be used, and how can I do it?
This should help
import re
A = 'Thsi is my [site](http://example.com/url) you can watch it.'
# Greedy patterns: this only works when the string contains a single link.
site = re.compile(r"\[(.*)\]").search(A).group(1)
url = re.compile(r"\((.*)\)").search(A).group(1)
print(A.replace("[{0}]".format(site), "").replace("({0})".format(url), '<a href="{0}">{1}</a>'.format(url, site)))
Output:
Thsi is my <a href="http://example.com/url">site</a> you can watch it.
Update, as requested in the comments:
s = 'my [site](site.com) is about programing (python language)'
site, url = s[s.find("[")+1:s.find(")")].split("](")
print(s.replace("[{0}]".format(site), "").replace("({0})".format(url), '<a href="{0}">{1}</a>'.format(url, site)))
Output:
my <a href="site.com">site</a> is about programing (python language)
I'm not a Markdown expert, but if this is indeed Markdown that you're trying to replace, and not your own syntax, you should use an appropriate parser. Note that if you paste your string directly into Stack Overflow, which also uses Markdown, it will be transformed into a link, so it is clearly valid Markdown.
If it is indeed your own format, however, try the following to transform
'This is my [site](http://example.com/url) you can watch it.'
into
'This is my <a href="http://example.com/url">site</a> you can watch it.'
using the following match:
\[(.*?)\]\((.*?)\)
and the following replacement string:
<a href="\2">\1</a>
In Python, re.sub(match, replace, stringThatYouWantToReplaceStuffIn) should do the trick. Write both the match and the replacement as raw strings (r'...') so the backslashes reach re unchanged, and don't forget to assign the return value of re.sub to whatever variable should contain the new string.
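Put together, the re.sub approach might look like the following minimal sketch. The names LINK and markdown_links_to_html are made up for illustration. The lazy .*? quantifiers keep each match inside one pair of brackets, so several links in the same string are handled independently. It does not escape HTML or handle nested brackets, which is one reason a real Markdown parser is the safer choice.

```python
import re

# Group 1 is the link text, group 2 is the URL.
LINK = re.compile(r'\[(.*?)\]\((.*?)\)')

def markdown_links_to_html(text):
    """Replace every [text](url) with an HTML anchor."""
    return LINK.sub(r'<a href="\2">\1</a>', text)

print(markdown_links_to_html(
    'This is my [site](http://example.com/url) you can watch it.'))
```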

Parsing a message with various special characters and splitting into a list (re and regex) Python 2.7

I am trying to parse a message that receives the following delimiters (Without quotes):
Delimiter1: "###" - Followed by a message
Delimiter2: "!!!" - A signal
Delimiter3: "---" - Followed by a message
Delimiter4: "###" - Followed by a message
Delimiter5: "$$$" - Followed by a message
I have so far:
import re
mystring = '###useradd---userfirstadded###userremoved!!!$$$message'
result = re.split('\\#\#\#|\\!\!\!|\\---|\\#\#\#|\\$\$\$',mystring)
print result
My result so far:
['', 'useradd', 'userfirstadded', 'userremoved', '', 'message']
I want as a result printed to console:
['###useradd','---userfirstadded','###userremoved','!!!','$$$message']
Is this possible using re.split or do I need to use re.find or something a lot better? I have been playing with the re.split delimiters as you can see but maybe you guys have a lot more experience using this functionality within python.
EDITED Solution #1 Using re (From #thefourtheye):
Here is the code:
import re
mystring = '###useradd---userfirstadd%ed###this is my username#!!!$$$hey whats up how are you??###useradd$$$This is my email #gmail.com!!!'
result = re.findall(r'!!!|(?:#|-|#|\$){3}[\w ^]+', mystring)
print result
The result printed is as follows:
['###useradd', '---userfirstadd', '###this is my username', '!!!', '$$$hey whats up how are you', '###useradd', '$$$This is my email ', '!!!']
EDITED New specifications:
Everything works as specified above using the answer that #thefourtheye suggested. It would be better still if the function tolerated one or two delimiter characters inside a message: a user who types an email address in a message would use the # symbol, and a dollar amount would use $. If this isn't possible, I can always require a space before and after the delimiters, or use a different kind of separator so that delimiter characters can appear inside a message. What are your suggestions?
Summary: I would like to accept all characters until hitting exactly a delimiter pattern (e.g. ###). Every other character, including a single delimiter character such as #, should be accepted as part of the message and should not split the string. Is this possible?
EDITED Solution #2 Using regex (From #hwnd):
The regex module is a third-party package and is not included with Python 2.7. You need to download and install it. These are the exact steps I took, so you can do the same.
Go to https://pypi.python.org/pypi/regex; the download links are at the bottom of the page. Click on regex-2015.03.18-cp27-none-win32.whl for Windows operating systems running Python 2.7 (otherwise try the other files until one installs successfully).
Browse to the directory where you downloaded the .whl file. Shift+Right Click anywhere in that directory, click "Open command window here", then type "pip install regex-2015.03.18-cp27-none-win32.whl". It should say "Successfully installed!"
You will now be able to use regex!
Here is the code:
import regex
mystring = '###useradd---userfirstadd%ed###this is my username#!!!$$$hey whats up how are you??###useradd$$$This is my email #gmail.com!!!'
result = filter(None, regex.split(r'(?V1)(!!!)|\s*(?=(?:#|\$|#|-){3})', mystring))
print result
The result printed is as follows:
['###useradd', '---userfirstadd%ed', '###this is my username#', '!!!', '$$$hey whats up how are you??', '###useradd', '$$$This is my email #gmail.com', '!!!']
Edit: Since you want to retain all the characters between your pattern delimiters, you can do this using the regex module, splitting on "!!!" and using lookahead for other zero-width matches.
>>> import regex
>>> s = '###useradd---userfirstadd%ed###this is my username#!!!$$$hey whats up how are you??###useradd$$$This is my email #gmail.com!!!'
>>> filter(None, regex.split(r'(?V1)(!!!)|\s*(?=(?:#|\$|#|-){3})', s))
['###useradd', '---userfirstadd%ed', '###this is my username#', '!!!', '$$$hey whats up how are you??', '###useradd', '$$$This is my email #gmail.com', '!!!']
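If installing a third-party package is not an option, a similar result can be sketched with the standard re module by matching tokens instead of splitting. This is only a sketch under two assumptions: a delimiter is exactly three identical characters (###, --- or $$$) rather than any mix of three, and messages contain no newlines. The name TOKEN is made up for illustration.

```python
import re

# A token is either the bare signal "!!!" or a delimiter followed by the
# shortest run of text that ends right before the next delimiter, the next
# signal, or the end of the string.
TOKEN = re.compile(r'!!!|(?:###|---|\$\$\$).*?(?=###|---|\$\$\$|!!!|$)')

s = ('###useradd---userfirstadd%ed###this is my username#!!!'
     '$$$hey whats up how are you??###useradd'
     '$$$This is my email #gmail.com!!!')

tokens = TOKEN.findall(s)
print(tokens)
```

A single # or $ inside a message never satisfies the lookahead, so it stays part of the message text.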
Use this regexp; it will provide 5 matching groups:
(#{3}[a-z]+)(-{3}[a-z]+)(#{3}[a-z]+)(!{3})(\${3}[a-z]+)

RegEx in Python for WikiMarkup

I'm trying to create a re in python that will match this pattern in order to parse MediaWiki Markup:
<ref>*Any_Character_Could_Be_Here</ref>
But I'm totally lost when it comes to regex. Can someone help me, or point me to a tutorial or resource that might be of some help? Thanks!
Assuming that svick is correct that MediaWiki Markup is not valid xml (or html), then you could use re in this circumstance (although I will certainly defer to better solutions):
>>> import re
>>> test_string = '''<ref>*Any_Character_Could_Be_Here</ref>
<ref>other characters could be here</ref>'''
>>> re.findall(r'<ref>.*?</ref>', test_string)
['<ref>*Any_Character_Could_Be_Here</ref>', '<ref>other characters could be here</ref>'] # a list of matching strings
In any case, you will want to familiarize yourself with the re module (whether or not you use a regex to solve this particular problem).
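One thing to watch for: by default . does not match a newline, so a reference that spans several lines would be skipped by the pattern above. Here is a minimal sketch that adds re.DOTALL and a capture group so only the text between the tags is returned. The second <ref> in the sample text is made up to show the multi-line case.

```python
import re

# re.DOTALL lets "." match newlines; the lazy ".*?" stops at the first </ref>.
REF = re.compile(r'<ref>(.*?)</ref>', re.DOTALL)

text = '<ref>*Any_Character_Could_Be_Here</ref>\n<ref>spans\ntwo lines</ref>'

refs = REF.findall(text)
print(refs)
```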
srhoades28, this will match your pattern.
if re.search(r"<ref>\*[^<]*</ref>", subject):
    print("Successful match")
else:
    print("Match attempt failed")
Note that, based on your post, I assume that the * after <ref> always occurs, and that the only variable part is the blue text, in your example "Any_Character_Could_Be_Here".
If this is not the case let me know and I will tweak the expression.
