This question already has answers here:
Python split string without splitting escaped character
(10 answers)
Closed 5 years ago.
Is there a better way to split a string that contains an escaped delimiter?
string = "fir\&st_part&secon\&d_part"
print(string.split('&'))
# is giving me
>>> ['fir\\', 'st_part', 'secon\\', 'd_part']
# but not
>>> ['fir&st_part', 'secon&d_part']
I added an escape character \ before the & in fir&st_part and secon&d_part, expecting the split function to skip the escaped character.
Is there any better way to do this if not by using a string split?
You can use a regular expression.
The negative lookbehind (?<!\\) makes the pattern match an ampersand (&) only when it is not preceded by a backslash. The backslash is doubled to escape it inside the pattern.
>>> import re
>>> re.split(r'(?<!\\)&', string)
['fir\\&st_part', 'secon\\&d_part']
With the resulting list, you can iterate and replace each escaped '\&' with '&' if necessary:
>>> import re
>>> print([each.replace("\\&", "&") for each in re.split(r'(?<!\\)&', string)])
['fir&st_part', 'secon&d_part']
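The two steps above (split on unescaped delimiters, then unescape) can be wrapped in one helper. This is a minimal sketch, not part of the original answer: the function name split_unescaped and its parameters are hypothetical, and it assumes a single-character escape and that the escape character itself is never escaped.

```python
import re

def split_unescaped(text, delimiter="&", escape="\\"):
    """Split text on delimiter unless it is preceded by the escape character.

    Hypothetical helper for illustration; assumes a one-character escape.
    """
    # Negative lookbehind: match the delimiter only if no escape precedes it.
    pattern = "(?<!" + re.escape(escape) + ")" + re.escape(delimiter)
    parts = re.split(pattern, text)
    # Turn each escaped delimiter back into a plain delimiter.
    return [part.replace(escape + delimiter, delimiter) for part in parts]

# A raw string keeps the backslashes exactly as typed.
string = r"fir\&st_part&secon\&d_part"
result = split_unescaped(string)
print(result)
```

Building the pattern with re.escape keeps the helper usable with delimiters that are regex metacharacters, such as | or .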
It's possible using a regular expression:
import re
string = "fir\&st_part&secon\&d_part"
re.split(r'[^\\]&', string)
# ['fir\\&st_par', 'secon\\&d_part']  (the 't' is lost because [^\\] consumes the character before the delimiter)
This question already has answers here:
How do I escape curly-brace ({}) characters in a string while using .format (or an f-string)?
(23 answers)
Closed 2 years ago.
I'm trying to match this:
text = "111111"
reps = 2
f_pattern = re.compile(rf"(\w)(?=\1{{reps}})")
f_matches = re.findall(f_pattern, text)
## returns: []
r_pattern = re.compile(r"(\w)(?=\1{2})")
r_matches = re.findall(r_pattern, text)
## returns: ['1', '1', '1', '1']
How should the f-string pattern be written to return non-empty result?
Write rf"(\w)(?=\1{{{reps}}})" instead of rf"(\w)(?=\1{{reps}})".
In any f-string, {{ produces a single literal {, and }} produces a single literal }. The innermost pair of braces is then the replacement field for reps.
As mentioned in this answer:
How do I use format() in re.compile
Double braces are interpreted as literal braces in format strings. You need another, third set, to indicate that len is a formatted expression.
If you use double braces, the f-string treats them as a literal brace and does not substitute anything. Use three braces: two of them become a literal brace, and the third pair marks the format expression.
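You need to double the curly braces to get literal braces, and add a third pair around the variable. Do not put spaces inside the braces, because \1{ 2 } is not treated as a quantifier. Here is the solution:
import re
text = "111111"
reps = 2
f_pattern = re.compile(rf"(\w)(?=\1{{{reps}}})")
f_matches = re.findall(f_pattern, text)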
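To see what each brace combination actually produces, it helps to print the pattern text before compiling it. The sketch below reuses text and reps from the question; the other variable names are made up for illustration.

```python
import re

text = "111111"
reps = 2

# Double braces only: "reps" is literal text, nothing is substituted.
literal_only = rf"(\w)(?=\1{{reps}})"
# Triple braces: literal braces around the substituted value.
interpolated = rf"(\w)(?=\1{{{reps}}})"
# Spaces inside the braces end up in the pattern and break the quantifier.
spaced = rf"(\w)(?=\1{{ {reps} }})"

print(literal_only)   # (\w)(?=\1{reps})
print(interpolated)   # (\w)(?=\1{2})
print(spaced)         # (\w)(?=\1{ 2 })

matches = re.findall(interpolated, text)
print(matches)        # ['1', '1', '1', '1']
```

Only the second form yields a valid {2} quantifier. The other two patterns compile, but the braces are matched as literal characters, so they find nothing in this text.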
This question already has answers here:
Why do backslashes appear twice?
(2 answers)
Closed 4 years ago.
How can I represent a string that contains \" (a backslash followed by a double quote)?
I tried several ways:
date = 'xpto\"xpto'
'xpto"xpto'
date = 'xpto\\"xpto'
'xpto\\"xpto'
data='xpto\\' + '"xpto'
'xpto\\"xpto'
data= r'xpto\"xpto'
'xpto\\"xpto'
I need the string to be exactly like this:
'xpto\"xpto'
If someone knows how, I would really appreciate the help.
The following line works.
print(r"'xpto\"xpto'")
Output:
'xpto\"xpto'
The r prefix marks the literal as a raw string, so the backslash is kept as-is instead of starting an escape sequence.
Alternatively:
print("'xpto\\\"xpto'"), where \\ produces a single backslash and \" produces a double quote.
"'xpto\\\"xpto'" is correct. Part of the confusion is distinguishing the actual string from Python's textual representation of the string.
>>> date = "'xpto\\\"xpto'"
>>> date
'\'xpto\\"xpto\''
>>> print(date)
'xpto\"xpto'
A simpler solution (which came to mind after reading Elvir's answer) is to use a triple-quoted raw string:
date = r"""'xpto\"xpto'"""
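A quick way to check what a string really contains is to compare its length, its printed form, and its repr. The following sketch uses hypothetical variable names and the string without the surrounding single quotes:

```python
# Two spellings of the same 10-character string: xpto, backslash, quote, xpto.
escaped = 'xpto\\"xpto'   # \\ is one backslash in a normal literal
raw = r'xpto\"xpto'       # a raw literal keeps the backslash as typed

print(escaped == raw)     # True
print(len(escaped))       # 10
print(escaped)            # xpto\"xpto
print(repr(escaped))      # 'xpto\\"xpto'
```

The doubled backslash appears only in the repr, which is what the interactive prompt shows when you evaluate a variable without print.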
This question already has answers here:
Why do backslashes appear twice?
(2 answers)
Closed 7 months ago.
Why does:
B = "The" + "\s"
and
B = "The" + r"\s"
yield:
"The\\s"
Is it possible to write the above, such that the output string is:
"The\s"
I have read similar questions on both the issue of backslashes, and their property for escaping, and the interpretation of regex characters in Python.
How to print backslash with Python?
Why can't Python's raw string literals end with a single backslash?
Does this mean there is no way to write what I want?
If it is useful, my end goal is to write a program that replaces each space in a string with the regex token for whitespace (\s).
For example, start with:
A = "The Cat and Dog"
After applying the function, this becomes:
B = "The\sCat\sand\sDog"
I believe this is related to Why does printing a tuple (list, dict, etc.) in Python double the backslashes?
The representation of the string and what it actually contains can differ.
Observe:
>>> B = "The" + "\s"
>>> B
'The\\s'
>>> print(B)
The\s
Furthermore
>>> A = "The Cat and Dog"
>>> B = str.replace(A, ' ', '\s')
>>> B
'The\\sCat\\sand\\sDog'
>>> print(B)
The\sCat\sand\sDog
From the docs:
all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the result
So while \s is not a proper escape sequence, Python forgives you your mistake and treats the backslash as if you had properly escaped it as \\. But when you then view the string's representation, it shows the backslash properly escaped. That said, the string only contains one backslash. It's only the representation that shows it as an escape sequence with two.
You must escape the "\"
B = "The" + "\\s"
>>> B = "The" + "\\s"
>>> print(B)
The\s
See the Escape Sequences part:
Python 3 - Lexical Analysis
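Putting this together for the stated goal, here is a minimal sketch of a function that swaps spaces for \s. The name spaces_to_regex is hypothetical. A raw string is used for the replacement so that no unrecognized escape sequence is involved.

```python
import re

def spaces_to_regex(text):
    """Replace each space with the two characters backslash and 's'."""
    return text.replace(" ", r"\s")

A = "The Cat and Dog"
B = spaces_to_regex(A)

print(B)        # The\sCat\sand\sDog
print(repr(B))  # 'The\\sCat\\sand\\sDog' (repr doubles each backslash)
print(len(B))   # 18: 15 original characters plus one extra per space

# The result works as a regex that matches the original text.
print(re.fullmatch(B, A) is not None)  # True
```

The length check confirms that each space became exactly two characters, so the string holds single backslashes even though the repr shows them doubled.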
This question already has answers here:
Do regular expressions from the re module support word boundaries (\b)?
(5 answers)
Closed 4 years ago.
I have the following string, and I need to check whether it contains App2 and iPhone, but not App and iPhone.
I wrote the following:
campaign_keywords = "App2 iPhone"
my_string = "[Love]App2 iPhone Argentina"
pattern = re.compile("r'\b" + campaign_keywords + "\b")
print pattern.search(my_string)
It prints None. Why?
The raw string notation is wrong: the r should not be inside the quotes, and the second \b should also be in a raw string.
The match function only tries to match at the start of the string. You need to use search or findall.
Difference between re.search and re.match
Example
>>> pattern = re.compile(r"\b" + campaign_keywords + r"\b")
>>> pattern.findall(my_string)
['App2 iPhone']
>>> pattern.match(my_string)
>>> pattern.search(my_string)
<_sre.SRE_Match object at 0x10ca2fbf8>
>>> match = pattern.search(my_string)
>>> match.group()
'App2 iPhone'
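The reason the r prefix matters here is that "\b" in a normal string literal is the backspace character, not a word boundary. The sketch below shows this. It also wraps the keywords in re.escape, which goes beyond the original answer and is only a precaution in case the keywords ever contain regex metacharacters.

```python
import re

campaign_keywords = "App2 iPhone"
my_string = "[Love]App2 iPhone Argentina"

# Without the r prefix, \b is a single backspace character (0x08).
print("\b" == "\x08")   # True
print(len(r"\b"))       # 2: a backslash and the letter b

# re.escape is an added precaution, not part of the original answer.
pattern = re.compile(r"\b" + re.escape(campaign_keywords) + r"\b")

found = pattern.search(my_string)
print(found.group())                                  # App2 iPhone
print(pattern.search("[Love]App iPhone Argentina"))   # None
```

The second search returns None because "App iPhone" does not contain the literal text "App2 iPhone".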
This question already has answers here:
Python string.strip stripping too many characters [duplicate]
(3 answers)
Closed 6 years ago.
I have encountered a very odd behavior of built-in function lstrip.
I will explain with a few examples:
print 'BT_NAME_PREFIX=MUV'.lstrip('BT_NAME_PREFIX=') # UV
print 'BT_NAME_PREFIX=NUV'.lstrip('BT_NAME_PREFIX=') # UV
print 'BT_NAME_PREFIX=PUV'.lstrip('BT_NAME_PREFIX=') # UV
print 'BT_NAME_PREFIX=SUV'.lstrip('BT_NAME_PREFIX=') # SUV
print 'BT_NAME_PREFIX=mUV'.lstrip('BT_NAME_PREFIX=') # mUV
As you can see, the function trims one additional character sometimes.
I tried to model the problem, and noticed that it persisted if I:
Changed BT_NAME_PREFIX to BT_NAME_PREFIY
Changed BT_NAME_PREFIX to BT_NAME_PREFIZ
Changed BT_NAME_PREFIX to BT_NAME_PREF
Further attempts have made it even more weird:
print 'BT_NAME=MUV'.lstrip('BT_NAME=') # UV
print 'BT_NAME=NUV'.lstrip('BT_NAME=') # UV
print 'BT_NAME=PUV'.lstrip('BT_NAME=') # PUV - different than before!!!
print 'BT_NAME=SUV'.lstrip('BT_NAME=') # SUV
print 'BT_NAME=mUV'.lstrip('BT_NAME=') # mUV
Could someone please explain what on earth is going on here?
I know I might as well just use array-slicing, but I would still like to understand this.
Thanks
You're misunderstanding how lstrip works. It treats the characters you pass in as a bag and it strips characters that are in the bag until it finds a character that isn't in the bag.
Consider:
'abc'.lstrip('ba') # 'c'
It is not removing a substring from the start of the string. To do that, you need something like:
if s.startswith(prefix):
s = s[len(prefix):]
e.g.:
>>> s = 'foobar'
>>> prefix = 'foo'
>>> if s.startswith(prefix):
... s = s[len(prefix):]
...
>>> s
'bar'
Or, I suppose you could use a regular expression:
>>> s = 'foobar'
>>> import re
>>> re.sub('^foo', '', s)
'bar'
The argument given to lstrip is a set of characters to remove from the left of a string, on a character-by-character basis. The phrase as a whole is not considered, only the individual characters.
S.lstrip([chars]) -> string or unicode
Return a copy of the string S with leading whitespace removed. If
chars is given and not None, remove characters in chars instead. If
chars is unicode, S will be converted to unicode before stripping
You could solve this in a flexible way using regular expressions (the re module):
>>> import re
>>> re.sub('^BT_NAME_PREFIX=', '', 'BT_NAME_PREFIX=MUV')
'MUV'
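The difference between stripping a set of characters and removing a prefix can be seen side by side. The helper name strip_prefix below is hypothetical. On Python 3.9 and later, the built-in str.removeprefix does the same job.

```python
def strip_prefix(s, prefix):
    """Remove prefix from the start of s only if s begins with it."""
    if s.startswith(prefix):
        return s[len(prefix):]
    return s

prefix = "BT_NAME_PREFIX="
results = {}
for name in ("MUV", "NUV", "PUV", "SUV", "mUV"):
    line = prefix + name
    # lstrip keeps removing characters while they appear anywhere in prefix.
    results[name] = (line.lstrip(prefix), strip_prefix(line, prefix))
    print(name, results[name])
```

M, N and P all occur in BT_NAME_PREFIX=, so lstrip removes them too, while S and m do not, which explains the seemingly random results in the question. With the shorter BT_NAME= there is no P in the set, so PUV survives.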