Using regex in Python

I have the following problem.
I want to escape all special characters in a Python string.
str='eFEx-x?k=;-'
re.sub("([^a-zA-Z0-9])",r'\\1', str)
'eFEx\\1x\\1k\\1\\1\\1'
str='eFEx-x?k=;-'
re.sub("([^a-zA-Z0-9])",r'\1', str)
'eFEx-x?k=;-'
re.sub("([^a-zA-Z0-9])",r'\\\1', str)
I can't seem to win here. '\1' refers to the matched special character, and I want to add a '\' before it. But using \1 alone just reproduces the character unchanged, and \\1 inserts a literal \1 instead of the character.

Use r'\\\1'. That's a backslash (escaped, so denoted \\) followed by \1.
To verify that this works, try:
import re
str = 'eFEx-x?k=;-'
print(re.sub("([^a-zA-Z0-9])", r'\\\1', str))
This prints:
eFEx\-x\?k\=\;\-
which I think is what you want. Don't be confused when the interpreter outputs 'eFEx\\-x\\?k\\=\\;\\-'; the double backslashes are there because the interpreter shows the quoted representation of its output unless you use print.
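To make the difference between the printed string and its quoted representation concrete, here is a minimal sketch. The variable names `s` and `escaped` are my own, chosen to avoid shadowing the built-in `str`.

```python
import re

s = 'eFEx-x?k=;-'  # sample string from the question

# Pattern: capture any single character that is not a letter or digit.
# Replacement r'\\\1': an escaped backslash (\\) followed by group 1 (\1).
escaped = re.sub(r'([^a-zA-Z0-9])', r'\\\1', s)

print(escaped)        # shows one backslash before each special character
print(repr(escaped))  # same string, displayed with doubled backslashes
print(len(escaped))   # 16: the 11 original characters plus 5 backslashes
```

The length check is a quick way to convince yourself that only one backslash per special character was added.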

Why don't you use re.escape()?
str = 'eFEx-x?k=;-'
re.escape(str)
'eFEx\\-x\\?k\\=\\;\\-'
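One caveat: re.escape() is meant for escaping regex metacharacters, and its exact output depends on the Python version. As far as I know, since Python 3.7 it only escapes characters that can be special in a regular expression, so = and ; are left alone there; the output above looks like it comes from an older version. A small sketch that should hold on either version (the names `s` and `escaped` are mine):

```python
import re

s = 'eFEx-x?k=;-'
escaped = re.escape(s)
print(escaped)  # exact output varies by Python version

# Whatever the version, the escaped text used as a pattern should match
# the original string literally.
matches_itself = re.fullmatch(escaped, s) is not None
print(matches_itself)
```

If you need a backslash before every non-alphanumeric character regardless of version, the re.sub approach is the more predictable choice.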

Try adding another backslash:
s = 'eFEx-x?k=;-'
print(re.sub("([^a-zA-Z0-9])", r'\\\1', s))

Python RegEx: find specific word followed by roman numeral [duplicate]

My test at http://regex101.com/r/oU6eI5/1 seems to work, but when I put it in Python, it matches the whole string.
str = "galley/files/tew/tewt/tweqt/"
re.sub('^.+/+([^/]+/$)', "\1", str)
I want to get "tweqt/".
You need to use a raw string for the replacement:
str = "galley/files/tew/tewt/tweqt/"
re.sub('^.+/+([^/]+/$)', r"\1", str)
#                        ^
Otherwise, Python interprets "\1" as the control character \x01 before the re module ever sees it. On my console, for instance, it shows up as a little smiley.
If for some reason you don't want to use a raw string, you'll have to escape the backslash:
re.sub('^.+/+([^/]+/$)', "\\1", str)
It's also good practice to use raw strings for regex patterns and to use consistent quotes, so I would advise using:
re.sub(r'^.+/+([^/]+/$)', r'\1', str)
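To see the three spellings of the replacement side by side, here is a minimal sketch. The variable names are my own; the pattern and input are the ones from the question.

```python
import re

s = "galley/files/tew/tewt/tweqt/"
pattern = r'^.+/+([^/]+/$)'

not_raw = re.sub(pattern, "\1", s)   # "\1" is the single character chr(1)
raw = re.sub(pattern, r"\1", s)      # r"\1" is a backslash followed by 1
doubled = re.sub(pattern, "\\1", s)  # the same two characters as r"\1"

print(repr(not_raw))  # '\x01'
print(repr(raw))      # 'tweqt/'
print(repr(doubled))  # 'tweqt/'
```

Only the last two reach the re module as a backreference to group 1.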
Other notes
It might be simpler to match (using re.search) instead of using re.sub:
re.search(r'[^/]+/$', str).group()
# => tweqt/
You might also want to use a variable name other than str, because that name shadows the built-in str().
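Putting both notes together, a hedged sketch of the re.search variant might look like this. The names `path`, `match` and `last_segment` are my own, and the None check is an addition for the case where nothing matches.

```python
import re

path = "galley/files/tew/tewt/tweqt/"  # named 'path' rather than 'str'

match = re.search(r'[^/]+/$', path)
# re.search returns None when nothing matches, so check before calling group().
last_segment = match.group() if match else None
print(last_segment)  # tweqt/
```

Without the check, calling .group() on a failed match raises AttributeError.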
It would be better to define the pattern as a raw string.
>>> import re
>>> s = "galley/files/tew/tewt/tweqt/"
>>> m = re.sub(r'^.+/+([^/]+/$)', r'\1', s)
               ^                  ^
>>> m
'tweqt/'

operate with python '\' special character in a string

I am trying to iterate over a string, find a character in it, and delete it.
For example, my string is "HowAre\youDoing" and I want the string "HowAreyouDoing" back (without the character '\'). My loop is:
for c in string:
    if c == '\':
The point is that '\' is a special character, so Python doesn't allow me to write it this way. Does anybody know how I can proceed?
Thanks.
In Python, as in most programming languages, the backslash character introduces an escape sequence, like \n for newline or \t for tab (and several more).
If you write \y in a string literal, Python keeps the backslash, since \y is not a recognized escape sequence. The string then contains the actual character \, which the interpreter displays as \\. Newer Python versions warn about such unrecognized escapes, so it is safer to write \\ explicitly:
>>> s = "HowAre\youDoing"
>>> s
'HowAre\\youDoing'
So, to replace it in your case, just do
>>> s.replace("\\", "")
'HowAreyouDoing'
If you'd like to replace special characters like the aforementioned, you would need to specify the respective special character with an unescaped "\":
>>> s = "HowAre\nyouDoing"
>>> s
'HowAre\nyouDoing'
>>> s.replace("\n", "")
'HowAreyouDoing'
You should escape the backslash:
for c in string:
    if c == '\\':
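For completeness, here is a small sketch of both approaches from this thread, str.replace and the explicit loop. The variable names are mine, and the input is written with an escaped backslash so it does not rely on \y being left alone.

```python
s = "HowAre\\youDoing"  # the literal \\ produces one backslash

# Option 1: str.replace removes every backslash in one call.
via_replace = s.replace("\\", "")

# Option 2: an explicit loop, comparing against the escaped literal '\\'.
kept = []
for c in s:
    if c != '\\':
        kept.append(c)
via_loop = "".join(kept)

print(via_replace)  # HowAreyouDoing
print(via_loop)     # HowAreyouDoing
```

Strings are immutable, so the loop has to build a new string rather than delete characters in place.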

Python re.match only characters, digits and some punctuations

I am using this re.match call to get only "proper" strings:
re.match('^[A-Za-z0-9\.\,\:\;\!\?\(\)]', str)
But I am getting some garbage too, like # and _. How is that possible? What am I doing wrong?
Thanks!
Use this to check all characters up to the end of your string; otherwise your pattern will only check the first character:
re.match(r'^[A-Za-z0-9.,:;!?()]+$', str)
Note that the character class doesn't contain spaces, newlines or tabs. You can add them like this:
re.match(r'^[A-Za-z0-9.,:;!?()\s]+$', str)
If you want to allow empty strings, you can replace the + quantifier with *.
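Here is a rough sketch of the difference between checking the first character and checking the whole string. Note that I have swapped in Pattern.fullmatch (available since Python 3.4) instead of ^...$; it also rejects a trailing newline, which $ would tolerate. The helper name `is_proper` is hypothetical.

```python
import re

first_char_only = re.compile(r'^[A-Za-z0-9.,:;!?()]')
whole_string = re.compile(r'[A-Za-z0-9.,:;!?()]+')

def is_proper(text):
    """Return True if every character of text is in the allowed set."""
    return whole_string.fullmatch(text) is not None

print(bool(first_char_only.match("ok#_")))  # True: only 'o' is checked
print(is_proper("ok#_"))                    # False
print(is_proper("Hello,world!"))            # True
print(is_proper(""))                        # False: + needs at least one character
```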

python regex re.compile match

I am trying to match (using regex in python):
http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg
in the following string:
http://www.mymaterialssite.com','http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg','Model Photo'
My code has something like this:
temp="http://www.mymaterialssite.com','http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg','Model Photo'"
dummy=str(re.compile(r'.com'',,''(.*?)'',,''Model Photo').search(str(temp)).group(1))
I do not think the "dummy" is correct & I am unsure how I "escape" the single and double quotes in the regex re.compile command.
I tried googling for the problem, but I couldn't find anything relevant.
Would appreciate any guidance on this.
Thanks.
The easiest way to deal with strings in Python that contain escape characters and quotes is to triple double-quote the string (""") and prefix it with r. For example:
my_str = r"""This string would "really "suck"" to write if I didn't
know how to tell Python to parse it as "raw" text with the 'r' character and
triple " quotes. Especially since I want \n to show up as a backslash followed
by n. I don't want \0 to be the null byte either!"""
The r means "take escape characters as literal". The triple double-quotes (""") prevent single-quotes, double-quotes, and double double-quotes from prematurely ending the string.
EDIT: I expanded the example to include things like \0 and \n. In a normal string (not a raw string) a \ (the escape character) signifies that the next character has special meaning. For example \n means "the newline character". If you literally wanted the character \ followed by n in your string you would have to write \\n, or just use a raw string instead, as I show in the example above.
You can also read about string literals in the Python documentation here:
For beginners: http://docs.python.org/tutorial/introduction.html#strings
Complex explanation: http://docs.python.org/reference/lexical_analysis.html#string-literals
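A quick way to check what a raw string actually contains is to look at its length. This is only a small sketch; the variable names and sample text are mine.

```python
normal = "a\nb"  # \n is a single newline character
raw = r"a\nb"    # backslash and n are two separate characters

# Raw triple-quoted string: both quote styles and backslashes stay as typed.
quoted = r"""He said "it's fine" and typed \0."""

print(len(normal))  # 3
print(len(raw))     # 4
print(quoted)       # He said "it's fine" and typed \0.
```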
Try triple quotes:
import re
tmp = r""".*http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg.*"""
s = """http://www.mymaterialssite.com','http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg','Model Photo'"""
x = re.match(tmp, s)
if x is not None:
    print(x.group())
Also you were missing the .* in the beginning of the pattern and at the end. I added that too.
If you use double quotes (which have the same meaning as single quotes in Python), you don't have to escape anything in this case. You can even use a string literal without the leading r, since there is no backslash in it:
re.compile(".com','(.*?)','Model Photo")
Commas don't need to be escaped, and single quotes don't need to be escaped if you use double quotes to create the string:
>>> dummy=re.compile(r".com','(.*?)','Model Photo").search(temp).group(1)
>>> print(dummy)
http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg
Note that I also removed some unnecessary str() calls, and for future reference if you do ever need to escape single or double quotes (say your string contains both), use a backslash like this:
'.com\',\'(.*?)\',\'Model Photo'
As mykhal pointed out in comments, this doesn't work very nicely with regex because you can no longer use the raw string (r'...') literal. A better solution would be to use triple quoted strings as other answers suggested.
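Pulling the suggestions in this thread together, a minimal sketch could look like the following. It uses a double-quoted raw string so the single quotes need no escaping. Escaping the dot as \. is my own addition (an unescaped . matches any character), and the names `pattern`, `match` and `url` are mine.

```python
import re

temp = "http://www.mymaterialssite.com','http://images.mymaterials.com/images/steel-images/small/steel/steel800/steel800-2.jpg','Model Photo'"

# Double quotes outside, so the single quotes inside are plain characters.
pattern = re.compile(r"\.com','(.*?)','Model Photo")

match = pattern.search(temp)
# search() returns None if the pattern is not found, so guard the group() call.
url = match.group(1) if match else None
print(url)
```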

How to delete () using re module in Python

I am having trouble processing XML text.
I want to delete () from my text as follows:
from <b>(apa-bhari(n))</b> to <b>apa-bhari(n)</b>
I wrote the following code:
name= re.sub('<b>\((.+)\)</b>','<b>\1</b>',name)
But this only returns
<b></b>
I do not understand escape sequences and backreference. Please tell me the solution.
You need to use raw strings, or escape the backslashes:
name = re.sub(r'<b>\((.+)\)</b>', r'<b>\1</b>', name)
In a normal (non-raw) Python string, a backslash followed by a digit is read as an octal escape, so you need to escape the backslash; the following assertions all hold:
assert '\1' == '\x01'
assert len('\\1') == 2
assert '\)' == '\\)'
So, your code would be
name = re.sub('<b>\\((.+)\\)</b>','<b>\\1</b>',name)
Alternatively, use a raw string literal:
name = re.sub(r'<b>\((.+)\)</b>', r'<b>\1</b>',name)
Try:
name= re.sub('<b>\((.+)\)</b>','<b>\\1</b>',name)
Or, if you do not want unreadable code with \\ everywhere you use a backslash, do not escape the backslashes manually but add an r before the string. For example, r"my\String" is the same as "my\\String". (A raw string cannot end in a single backslash, though.)
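To check the fix against the example from the question, here is a small sketch. The variable names `fixed` and `broken` are mine.

```python
import re

name = "<b>(apa-bhari(n))</b>"
pattern = r'<b>\((.+)\)</b>'

# Raw replacement: \1 reaches re as a backreference to group 1.
fixed = re.sub(pattern, r'<b>\1</b>', name)

# Non-raw replacement: '\1' is already chr(1) before re sees it.
broken = re.sub(pattern, '<b>\1</b>', name)

print(fixed)         # <b>apa-bhari(n)</b>
print(repr(broken))  # '<b>\x01</b>'
```

The control character is usually invisible when printed, which is why the original output looked like an empty <b></b>.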
