How to create a regular expression to replace a url? - python

I'm trying to use re.sub() with a regular expression to replace the URL inside a string such as:
tool(id='merge_tool', server='http://localhost:8080')
The regular expression I wrote returns the following:
a = "https:192.168.1.1:8080"
re.sub(r'http\S+', a, "tool(id='merge_tool', server='http://localhost:8080')")
Results:
"tool(id='merge_tool', server='https:192.168.1.1:8080"
Or if I provide this URL:
b = 'https:facebook.com'
re.sub(r'http\S+', b, "tool(id='merge_tool', server='http://localhost:8080')")
Results:
"tool(id='merge_tool', server='https:facebook.com"
How can I fix this so that the rest of the string after the URL (the closing ') ) is kept when the URL is replaced?

You can use
re.sub(r"http[^\s']+", b.replace('\\', '\\\\'), "tool(id='merge_tool', server='http://localhost:8080')")
Note that
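http[^\s']+ matches http and then one or more characters other than whitespace and a single quote, so the match stops before the closing ' and the trailing ') is preserved.
b.replace('\\', '\\\\') is needed when the replacement string is dynamic: re.sub processes backslash escapes in a string replacement, so every literal backslash in it must be doubled to be inserted as-is.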
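As an alternative to doubling the backslashes, the replacement can be passed as a callable, because re.sub uses a callable's return value without processing backslash escapes. The following is only a minimal sketch: the helper name replace_url and the second sample URL are hypothetical and not part of the original answer.

```python
import re

text = "tool(id='merge_tool', server='http://localhost:8080')"


def replace_url(source, new_url):
    """Replace every http(s) URL in source with new_url, inserted literally.

    replace_url is a hypothetical helper name used only for this sketch.
    """
    # A callable replacement is inserted as returned: no escape processing.
    return re.sub(r"http[^\s']+", lambda match: new_url, source)


print(replace_url(text, "https:192.168.1.1:8080"))
# tool(id='merge_tool', server='https:192.168.1.1:8080')

# A backslash in the replacement survives unchanged.
print(replace_url(text, r"https://host\1"))
# tool(id='merge_tool', server='https://host\1')
```

The pattern is the same one used in the answer; only the way the replacement is supplied differs.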

Related

Python string.rstrip() doesn't strip specified characters

string = "hi())("
string = string.rstrip("abcdefghijklmnoprstuwxyz")
print(string)
I want to remove every letter from the given string using the rstrip method, but it does not change the string at all.
Output:
'hi())('
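What I want:
'())('
I know that I can use regex, but I really don't understand why it doesn't work.
Note: this is part of the Valid Parentheses challenge on Codewars.
rstrip only removes characters from the right end of the string and stops at the first character that is not in the given set. Your string ends with "(", so nothing is removed. The letters are at the left end, so you have to use lstrip instead of rstrip: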
>>> string = "hi())("
>>> string = string.lstrip("abcdefghijklmnoprstuwxyz")
>>> string
'())('
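The difference between the two methods can be checked side by side. This is a small sketch; it uses string.ascii_lowercase for the character set, which is an assumption on my part (the set typed out in the question omits q and v).

```python
import string

s = "hi())("
letters = string.ascii_lowercase  # assumption: all 26 lowercase letters

# rstrip scans from the right; the last character "(" is not a letter,
# so it stops immediately and removes nothing.
right = s.rstrip(letters)

# lstrip scans from the left and removes the leading "h" and "i".
left = s.lstrip(letters)

print(right)  # hi())(
print(left)   # ())(
```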

How to combine regular expression and string format to get expected digits code?

I'd like to use a combination of regular expression and string format to match some string of the type, for example 'USA10Y1Y'.
code = 'USA'
b = 1
r'{}\d{{1,2}}Y{}Y'.format(code, b)
>>> 'USA\\d{1,2}Y1Y'
What I need is the following, because the pattern will be passed to re.search:
>>> 'USA\d{1,2}Y1Y'
How can I get rid of the extra slash before \d?
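One way to check whether the extra backslash is really in the string is to compare its repr with its printed form. The sketch below reuses the names from the question; the re.escape call is my addition and assumes code should be matched literally.

```python
import re

code = 'USA'
b = 1
# re.escape is an added precaution (assumption: code is literal text).
pattern = r'{}\d{{1,2}}Y{}Y'.format(re.escape(code), b)

# The interactive prompt shows repr(), which escapes the backslash.
print(repr(pattern))  # 'USA\\d{1,2}Y1Y'

# print() shows the actual characters: there is only one backslash.
print(pattern)        # USA\d{1,2}Y1Y
print(len(pattern))   # 13

print(bool(re.search(pattern, 'USA10Y1Y')))  # True
```

So the doubled backslash is only how the interpreter displays the string; the pattern can be handed to re.search unchanged.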

Python RE question - proper state initial formatting

I have a string that I need to edit, it looks something similar to this:
string = "Idaho Ave N,,Crystal,Mn,55427-1463,US,,610839124763,Expedited"
Notice that the state abbreviation "Mn" is not properly formatted. I'm trying to use a regular expression to change this:
re.sub("[A-Z][a-z],", "[A-Z][A-Z],", string)
However, re.sub treats the second part as a literal and will change Mn, to [A-Z][A-Z],. How would I use re.sub (or something similar and simple) to properly change Mn, to MN, in this string?
Thank you in advance!
Your re.sub might also modify parts of the string that you do not want to modify. Try processing the right element of the list explicitly:
input = "Idaho Ave N,,Crystal,Mn,55427-1463,US,,610839124763,Expedited"
elems = input.split(',')
elems[3] = elems[3].upper()
output = ','.join(elems)
returns
'Idaho Ave N,,Crystal,MN,55427-1463,US,,610839124763,Expedited'
You can pass a function as the replacement parameter to re.sub to generate the replacement string from the match object, e.g.:
import re
s = "Idaho Ave N,,Crystal,Mn,55427-1463,US,,610839124763,Expedited"
def upcase(match):
    # Return the matched text (e.g. "Mn,") in upper case.
    return match.group().upper()
print(re.sub("[A-Z][a-z],", upcase, s))
(This is ignoring the concern of whether you're genuinely finding state initials with this method.)
The relevant documentation for re.sub:
sub(pattern, repl, string, count=0)
Return the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in string by the
replacement repl. repl can be either a string or a callable;
if a string, backslash escapes in it are processed. If it is
a callable, it's passed the match object and must return
a replacement string to be used.
re.sub(r"[A-Z][a-z],", lambda m: m.group(0).upper(), myString)
I would avoid calling your variable string, since that is also the name of a standard-library module.
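You create a group by surrounding it in parentheses within your regex, then refer to it by its group number. A replacement string such as r"\1," can only re-insert the captured text unchanged (calling .upper() on the template does not affect the match), so uppercase the group in a callable:
re.sub(r"([A-Z][a-z]),", lambda m: m.group(1).upper() + ",", string)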
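To reduce the risk of touching other parts of the line, the pattern can be restricted to a whole comma-delimited field. The sketch below assumes that a state code is any field of exactly two letters between commas; that rule and the name state_field are my own assumptions, and a country code such as US matches as well (it is already upper case here).

```python
import re

line = "Idaho Ave N,,Crystal,Mn,55427-1463,US,,610839124763,Expedited"

# Assumption: a state code is a field of exactly two letters that has a
# comma directly before and after it.
state_field = re.compile(r"(?<=,)[A-Za-z]{2}(?=,)")

# The callable receives each match object and returns its upper-case text.
fixed = state_field.sub(lambda m: m.group().upper(), line)

print(fixed)
# Idaho Ave N,,Crystal,MN,55427-1463,US,,610839124763,Expedited
```

This heuristic would miss a two-letter field at the very start or end of the line; splitting on commas and fixing a known column stays the more explicit option.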

Why doesn't this regular expression match in this string?

I want to be able to replace a string in a file using regular expressions. But my function isn't finding a match. So I've mocked up a test to replicate what's happening.
I have defined the string I want to replace as follows:
string = 'buf = O_strdup("ONE=001&TYPE=PUZZLE&PREFIX=EXPRESS&");'
I want to replace the "TYPE=PUZZLE&PREFIX=EXPRESS&" part with something else. (NB: the string won't always contain exactly "PUZZLE" and "PREFIX" in the original file, but it will be of that format.)
So first I tried testing that I got the correct match.
obj = re.search(r'TYPE=([\^&]*)\&PREFIX=([\^&]*)\&', string)
if obj:
    print(obj.group())
else:
    print("No match!!")
My thinking was that ([\^&]*) would match any number of characters that are NOT an ampersand.
But I always get "No match!!".
However,
obj = re.search(r'TYPE=([\^&]*)', string)
gives me "TYPE=" as the match.
Why doesn't my first one work?
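Since the ^ sign is escaped with \, the part ([\^&]*) matches any sequence of these characters: ^, &. "PUZZLE" contains neither, so the group matches the empty string and the following \&PREFIX cannot match, which is also why the shorter pattern only returns "TYPE=".
Try replacing it with ([^&]*).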
In my regex tester, this does work: 'TYPE=(.*)\&PREFIX=(.*)\&'
Try this instead
obj = re.search(r'TYPE=(?P<type>[^&]*?)&PREFIX=(?P<prefix>[^&]*?)&', string)
The ?P<some_name> syntax creates a named capture group, which makes the captured text a little easier to access: obj.group("type") returns 'PUZZLE'.
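Since the goal in the question is to replace that part of the string, the corrected pattern can also be used with sub. This is a minimal sketch; the replacement values a and b are placeholders chosen only for illustration.

```python
import re

source = 'buf = O_strdup("ONE=001&TYPE=PUZZLE&PREFIX=EXPRESS&");'

# [^&]* (unescaped ^) matches any run of characters that are not "&".
pattern = re.compile(r'TYPE=(?P<type>[^&]*)&PREFIX=(?P<prefix>[^&]*)&')

match = pattern.search(source)
print(match.group("type"), match.group("prefix"))  # PUZZLE EXPRESS

# Placeholder replacement text (hypothetical values).
updated = pattern.sub("TYPE=a&PREFIX=b&", source)
print(updated)  # buf = O_strdup("ONE=001&TYPE=a&PREFIX=b&");
```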
It might be better to use parse_qsl() and urlencode() instead of regular expressions (in Python 3 both live in urllib.parse; in Python 2 they were urlparse.parse_qsl() and urllib.urlencode()). The code will be less error-prone:
from urllib.parse import parse_qsl, urlencode
s = "ONE=001&TYPE=PUZZLE&PREFIX=EXPRESS&"
a = parse_qsl(s)
d = dict(TYPE="a", PREFIX="b")
# Keep each key, but take the new value from d when one is given.
print(urlencode([(key, d.get(key, val)) for key, val in a]))
# ONE=001&TYPE=a&PREFIX=b

Regex to Split 1st Colon

I have a time in ISO 8601 format (2009-11-19T19:55:00) that is paired with a name, commence. I'm trying to parse this into two parts. I'm currently up to here:
import re
sColon = re.compile('[:]')
aString = sColon.split("commence:2009-11-19T19:55:00")
Obviously this returns:
>>> aString
['commence','2009-11-19T19','55','00']
What I'd like it to return is this:
>>> aString
['commence','2009-11-19T19:55:00']
How would I go about doing this in the original creation of sColon? Also, do you recommend any regular expression links or books that you have found useful, as I can see myself needing them in the future!
EDIT:
To clarify: I need a regular expression that splits only at the very first instance of :. Is this possible? The text (commence) before the colon can change, yes.
>>> first, colon, rest = "commence:2009-11-19T19:55:00".partition(':')
>>> (first, colon, rest)
('commence', ':', '2009-11-19T19:55:00')
You could pass the maxsplit parameter to the split method:
>>> "commence:2009-11-19T19:55:00".split(":",1)
['commence', '2009-11-19T19:55:00']
Official Docs
S.split([sep [,maxsplit]]) -> list of strings
Return a list of the words in the string S, using sep as the
delimiter string. If maxsplit is given, at most maxsplit
splits are done. If sep is not specified or is None, any
whitespace string is a separator and empty strings are removed
from the result.
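If a compiled regular expression is still wanted, as in the question, the pattern object's split method accepts a maxsplit argument too. A minimal sketch reusing the question's variable names:

```python
import re

sColon = re.compile(':')

# maxsplit=1 stops splitting after the first colon.
aString = sColon.split("commence:2009-11-19T19:55:00", maxsplit=1)

print(aString)  # ['commence', '2009-11-19T19:55:00']
```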
Looks like you need .IndexOf(":"), then .Substring()?
@OP, don't do the unnecessary. A regex is not needed for what you are doing. Python has very good string manipulation methods that you can use. All you need is split() with a maximum split count. Those are the very basics of Python.
>>> "commence:2009-11-19T19:55:00".split(":",1)
['commence', '2009-11-19T19:55:00']
