This question already has answers here:
Do regular expressions from the re module support word boundaries (\b)?
(5 answers)
Closed 4 years ago.
I have the following string, and I need to check if
the string contains App2 and iPhone,
but not App and iPhone
I wrote the following:
campaign_keywords = "App2 iPhone"
my_string = "[Love]App2 iPhone Argentina"
pattern = re.compile("r'\b" + campaign_keywords + "\b")
print pattern.search(my_string)
It prints None. Why?
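The raw string notation is wrong: the r prefix must go outside the quotes, not inside them. As written, "r'\b" is a literal r and ' followed by a backspace character. The second "\b" must also be a raw string.
Also note that the match function only tries to match at the start of the string, so you need to use search or findall here.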
Difference between re.search and re.match
Example
>>> pattern = re.compile(r"\b" + campaign_keywords + r"\b")
>>> pattern.findall(my_string)
['App2 iPhone']
>>> pattern.match(my_string)
>>> pattern.search(my_string)
<_sre.SRE_Match object at 0x10ca2fbf8>
>>> match = pattern.search(my_string)
>>> match.group()
'App2 iPhone'
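If the keywords can contain characters that are special in a regex, it is probably safer to escape them before adding the word boundaries. The following is a minimal Python 3 sketch; the helper name contains_phrase is made up for illustration and is not part of the original answer.

```python
import re

def contains_phrase(phrase, text):
    """Return True if `phrase` occurs in `text` as whole words."""
    # re.escape neutralises any regex metacharacters inside the phrase;
    # the raw-string r"\b" on each side is a word boundary.
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b")
    return pattern.search(text) is not None

my_string = "[Love]App2 iPhone Argentina"
print(contains_phrase("App2 iPhone", my_string))  # True
print(contains_phrase("App iPhone", my_string))   # False
```

search is used rather than match because the phrase does not sit at the start of the string.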
This question already has answers here:
Python split string without splitting escaped character
(10 answers)
Closed 5 years ago.
Is there any better way to split a string which contains an escaped delimiter in it?
string = "fir\&st_part&secon\&d_part"
print(string.split('&'))
# is giving me
>>> ['fir\\', 'st_part', 'secon\\', 'd_part']
# but not
>>> ['fir&st_part', 'secon&d_part']
I have added an escape character \ before & in fir&st_part and secon&d_part with the intention that the split function would not split on the escaped character.
Is there any better way to do this if not by using a string split?
You can use a regular expression!
Split on an ampersand (&) only when it is not preceded by a backslash. The negative lookbehind (?<!\\) performs that check; the backslash is doubled to escape it inside the pattern.
>>> import re
>>> re.split(r'(?<!\\)&', string)
['fir\\&st_part', 'secon\\&d_part']
With the resulting list, you can iterate and replace the escaped '\&' with '&' if necessary!
>>> import re
>>> print [each.replace("\\&","&") for each in re.split(r'(?<!\\)&', string)]
['fir&st_part', 'secon&d_part']
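The split and the unescape step can be wrapped together. This is only a sketch in Python 3: the function name split_unescaped and its parameters are hypothetical, and it assumes the escape character itself is never escaped (a doubled backslash right before the delimiter is not handled).

```python
import re

def split_unescaped(text, delimiter="&", escape="\\"):
    """Split on `delimiter` unless it is preceded by `escape`, then unescape."""
    # Negative lookbehind: match the delimiter only when the previous
    # character is not the escape character.
    pattern = r"(?<!" + re.escape(escape) + r")" + re.escape(delimiter)
    parts = re.split(pattern, text)
    # Turn each escaped delimiter back into a plain delimiter.
    return [part.replace(escape + delimiter, delimiter) for part in parts]

string = r"fir\&st_part&secon\&d_part"
print(split_unescaped(string))  # ['fir&st_part', 'secon&d_part']
```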
It's possible using a regular expression:
import re
string = "fir\&st_part&secon\&d_part"
re.split(r'[^\\]&', string)
# ['fir\\&st_par', 'secon\\&d_part']
# Note: [^\\] consumes the character before each matched &, so the final 't' of 'st_part' is lost.
This question already has answers here:
Why do backslashes appear twice?
(2 answers)
Closed 7 months ago.
Why does:
B = "The" + "\s"
and
B = "The" + r"\s"
yield:
"The\\s"
Is it possible to write the above, such that the output string is:
"The\s"
I have read similar questions on both the issue of backslashes, and their property for escaping, and the interpretation of regex characters in Python.
How to print backslash with Python?
Why can't Python's raw string literals end with a single backslash?
Does this mean there is no way to write what I want?
If it is useful, my end goal is to write a program that inserts the regex for whitespace (\s) into a string wherever there is a space:
For example, start with:
A = "The Cat and Dog"
After applying the function, this becomes:
B = "The\sCat\sand\sDog"
I believe this is related to Why does printing a tuple (list, dict, etc.) in Python double the backslashes?
The representation of the string and what it actually contains can differ.
Observe:
>>> B = "The" + "\s"
>>> B
'The\\s'
>>> print B
The\s
Furthermore
>>> A = "The Cat and Dog"
>>> B = str.replace(A, ' ', '\s')
>>> B
'The\\sCat\\sand\\sDog'
>>> print B
The\sCat\sand\sDog
From the docs:
all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the result
So while \s is not a proper escape sequence, Python forgives you your mistake and treats the backslash as if you had properly escaped it as \\. But when you then view the string's representation, it shows the backslash properly escaped. That said, the string only contains one backslash. It's only the representation that shows it as an escape sequence with two.
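One way to convince yourself that only a single backslash is stored is to compare the two spellings and check the length. This is a small Python 3 sketch; the helper spaces_to_regex is a made-up name for the asker's end goal.

```python
# An escaped backslash and a raw string produce the same four-letter-plus-backslash text.
escaped = "The" + "\\s"
raw = "The" + r"\s"

print(escaped == raw)  # True
print(len(escaped))    # 5: T, h, e, one backslash, s
print(repr(escaped))   # the representation doubles the backslash
print(escaped)         # the actual content shows a single backslash

def spaces_to_regex(text):
    """Replace every space with the two characters backslash and s."""
    return text.replace(" ", r"\s")

converted = spaces_to_regex("The Cat and Dog")
print(converted)
```

Writing the replacement as a raw string avoids relying on Python leaving the unrecognized escape \s unchanged.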
You must escape the "\"
B = "The" + "\\s"
>>> B = "The" + "\\s"
>>> print(B)
The\s
See the Escape Sequences part:
Python 3 - Lexical Analysis
This question already has answers here:
How to extract the substring between two markers?
(22 answers)
Closed 4 years ago.
I have a string - Python :
string = "/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/"
Expected output is :
"Atlantis-GPS-coordinates"
I know that the expected output is ALWAYS surrounded by "/bar/" on the left and "/" on the right :
"/bar/Atlantis-GPS-coordinates/"
Proposed solution would look like :
a = string.find("/bar/")
b = string.find("/",a+5)
output = string[a+5:b]
This works, but I don't like it.
Does someone know a beautiful function or tip ?
You can use split:
>>> string.split("/bar/")[1].split("/")[0]
'Atlantis-GPS-coordinates'
Some efficiency from adding a max split of 1 I suppose:
>>> string.split("/bar/", 1)[1].split("/", 1)[0]
'Atlantis-GPS-coordinates'
Or use partition:
>>> string.partition("/bar/")[2].partition("/")[0]
'Atlantis-GPS-coordinates'
Or a regex:
>>> re.search(r'/bar/([^/]+)', string).group(1)
'Atlantis-GPS-coordinates'
Depends on what speaks to you and your data.
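If the markers might be missing, the partition approach can be wrapped so that it reports the failure instead of returning a misleading slice. A minimal Python 3 sketch; the function name between is hypothetical, and returning None on a miss is a design choice rather than anything the answer prescribes.

```python
def between(text, left, right):
    """Return the text between the first `left` and the next `right`, or None."""
    # partition returns (before, separator, after); the separator is an
    # empty string when it was not found.
    _, found_left, rest = text.partition(left)
    if not found_left:
        return None
    middle, found_right, _ = rest.partition(right)
    return middle if found_right else None

string = "/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/"
print(between(string, "/bar/", "/"))  # Atlantis-GPS-coordinates
```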
What you have isn't all that bad. I'd write it as:
start = string.find('/bar/') + 5
end = string.find('/', start)
output = string[start:end]
as long as you know that /bar/WHAT-YOU-WANT/ is always going to be present. Otherwise, I would reach for the regular expression knife:
>>> import re
>>> PATTERN = re.compile('^.*/bar/([^/]*)/.*$')
>>> s = '/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/'
>>> match = PATTERN.match(s)
>>> match.group(1)
'Atlantis-GPS-coordinates'
import re
pattern = '(?<=/bar/).+?/'
string = "/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/"
result = re.search(pattern, string)
print string[result.start():result.end() - 1]
# "Atlantis-GPS-coordinates"
That is a Python 2.x example. Here is how the pattern works:
1. (?<=/bar/) is a lookbehind: the rest of the pattern only matches if /bar/ comes immediately before it
2. '.+?/' matches as few characters as possible up to and including the next '/' char, which is why the code trims the last character of the match
Hope that helps some.
If you need to do this kind of search many times, it is better to compile the pattern once for performance; if you only need to do it once, don't bother.
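As a rough Python 3 illustration of compiling once and reusing the pattern: the second path below is invented sample data, and the pattern uses a lookahead instead of trimming the trailing slash.

```python
import re

# Compile once; the lookbehind and lookahead keep the slashes out of the match.
BETWEEN_BAR = re.compile(r"(?<=/bar/)[^/]+(?=/)")

paths = [
    "/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/",
    "/x/bar/second-item/y/",  # hypothetical second input
    "/nothing/to/find/",      # no /bar/ segment, so no match
]

# Reuse the compiled pattern for every path and skip the ones without a match.
results = [m.group() for m in map(BETWEEN_BAR.search, paths) if m]
print(results)  # ['Atlantis-GPS-coordinates', 'second-item']
```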
Using re (slower than other solutions):
>>> import re
>>> string = "/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/"
>>> re.search(r'(?<=/bar/)[^/]+(?=/)', string).group()
'Atlantis-GPS-coordinates'
This question already has answers here:
Why does re.sub replace the entire pattern, not just a capturing group within it?
(4 answers)
Closed 2 years ago.
I am very new to Python.
I need to match all cases with one regular expression and do a replacement. This is a sample substring --> desired result:
<cross_sell id="123" sell_type="456"> --> <cross_sell>
I am trying to do this in my code:
myString = re.sub(r'\<[A-Za-z0-9_]+(\s[A-Za-z0-9_="\s]+)', "", myString)
Instead of replacing only what comes after <cross_sell, it replaces the entire match and just returns '>'.
Is there a way for re.sub to replace only the capturing group instead of the entire pattern?
You can use substitution groups:
>>> my_string = '<cross_sell id="123" sell_type="456"> --> <cross_sell>'
>>> re.sub(r'(\<[A-Za-z0-9_]+)(\s[A-Za-z0-9_="\s]+)', r"\1", my_string)
'<cross_sell> --> <cross_sell>'
Notice I put the first group (the one you want to keep) in parentheses and then kept it in the output by using the "\1" backreference (first group) in the replacement string.
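The same keep-one-group idea can be written with the explicit \g<1> form, which avoids ambiguity if a digit ever follows the reference. This is a Python 3 sketch under the assumption that attribute values never contain a '>' character.

```python
import re

tag = '<cross_sell id="123" sell_type="456">'

# Group 1 captures '<' plus the tag name; [^>]* swallows the attributes.
# re.sub replaces the whole match, so the replacement puts group 1 back.
stripped = re.sub(r'(<\w+)[^>]*', r'\g<1>', tag)
print(stripped)  # <cross_sell>
```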
You can use a group reference to match the first word and a negated character class to match the rest of the string between <> :
>>> s='<cross_sell id="123" sell_type="456">'
>>> re.sub(r'(\w+)[^>]+',r'\1',s)
'<cross_sell>'
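\w is equivalent to [A-Za-z0-9_] for ASCII text (in Python 3, str patterns also match Unicode word characters unless the re.ASCII flag is set).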
Since the input data is XML, you'd better parse it with an XML parser.
Built-in xml.etree.ElementTree is one option:
>>> import xml.etree.ElementTree as ET
>>> data = '<cross_sell id="123" sell_type="456"></cross_sell>'
>>> cross_sell = ET.fromstring(data)
>>> cross_sell.attrib = {}
>>> ET.tostring(cross_sell)
'<cross_sell />'
lxml.etree is another option.
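For Python 3, where ET.tostring returns bytes by default, a sketch of the same ElementTree approach could look like this. Passing encoding="unicode" to get a str back is the only change from the session above.

```python
import xml.etree.ElementTree as ET

data = '<cross_sell id="123" sell_type="456"></cross_sell>'

element = ET.fromstring(data)
element.attrib.clear()  # drop every attribute on the element

# encoding="unicode" asks for a str instead of bytes.
result = ET.tostring(element, encoding="unicode")
print(result)
```

Note that this needs a complete element (with its closing tag), unlike the regex answers, which operate on the opening tag alone.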
The code below was tested under Python 3.6 and does not use a group. The leading \s+ also removes the whitespace before each attribute:
import re
test = '<cross_sell id="123" sell_type="456">'
resp = re.sub(r'\s+\w+="\w+"', '', test)
print(resp)
<cross_sell>
I'm trying to search a string for numbers, and when finding them, wrap some chars around them, e.g.
a = "hello, i am 8 years old and have 12 toys"
a = method(a)
print a
"hello, i am \ref{8} years old and have \ref{12} toys"
I've looked at the re (regular expression) library, but cannot seem to find anything helpful... any cool ideas?
This is pretty basic usage of the .sub method:
numbers = re.compile(r'(\d+)')
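a = numbers.sub(r'\\ref{\1}', a)
The parentheses around the \d+ number pattern create a group, and the \1 reference is replaced with the contents of the group. The backslash before ref is doubled because re would otherwise read \r in the replacement as a carriage return.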
>>> import re
>>> a = "hello, i am 8 years old and have 12 toys"
>>> numbers = re.compile(r'(\d+)')
>>> a = numbers.sub(r'\\ref{\1}', a)
>>> print a
hello, i am \ref{8} years old and have \ref{12} toys
You need to use the re.sub function along these lines:
re.sub(r"(\d+)", my_sub_func, text)  # catch the numbers here (although this only catches integers, not real numbers)
where my_sub_func is defined like this:
def my_sub_func(match_obj):
    text = match_obj.group(0)  # get the digit text here
    new_text = "\\ref{" + text + "}"  # change the pattern here
    return new_text
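A self-contained Python 3 version of the callable approach might look like the sketch below. The names NUMBER and wrap_ref are illustrative, and the pattern is extended to accept an optional decimal part, which goes beyond the original answer.

```python
import re

# Digits with an optional fractional part, e.g. 8, 12 or 3.14.
NUMBER = re.compile(r"\d+(?:\.\d+)?")

def wrap_ref(match):
    """Wrap the matched number in \\ref{...}."""
    # A callable's return value is inserted literally, so the backslash
    # only needs ordinary string escaping here.
    return "\\ref{" + match.group(0) + "}"

a = "hello, i am 8 years old and have 12 toys"
wrapped = NUMBER.sub(wrap_ref, a)
print(wrapped)  # hello, i am \ref{8} years old and have \ref{12} toys
```

Because the replacement comes from a function, the \r pitfall of replacement templates does not apply.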