I want to strip double quotes from:
string = '"" " " ""\\1" " "" ""'
to obtain:
string = '" " " ""\\1" " "" "'
I tried to use rstrip, lstrip and strip('[^\"]|[\"$]') but it did not work.
How can I do this?
If the quotes you want to strip are always going to be "first and last" as you said, then you could simply use:
string = string[1:-1]
If you can't assume that all the strings you process have double quotes you can use something like this:
if string.startswith('"') and string.endswith('"'):
    string = string[1:-1]
Edit:
I'm sure that you just used string as the variable name for illustration here and that in your real code it has a useful name, but I feel obliged to warn you that there is a module named string in the standard library. It's not loaded automatically, but if you ever use import string, make sure your variable doesn't shadow it.
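To make the shadowing warning concrete, here is a small sketch (the assigned value is just an example) of what happens to the module name after it is rebound:

```python
import string

# Before rebinding, the name refers to the standard-library module.
print(string.ascii_lowercase[:3])  # abc

# Rebinding the name hides the module: it is now an ordinary str.
string = '"hello"'
print(hasattr(string, "ascii_lowercase"))  # False
```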
IMPORTANT: I'm extending the question/answer to strip either single or double quotes. And I interpret the question to mean that BOTH quotes must be present, and matching, to perform the strip. Otherwise, the string is returned unchanged.
To "dequote" a string representation, that might have either single or double quotes around it (this is an extension of #tgray's answer):
def dequote(s):
    """
    If a string has single or double quotes around it, remove them.
    Make sure the pair of quotes match.
    If a matching pair of quotes is not found,
    or there are fewer than 2 characters, return the string unchanged.
    """
    if (len(s) >= 2 and s[0] == s[-1]) and s.startswith(("'", '"')):
        return s[1:-1]
    return s
Explanation:
startswith can take a tuple, to match any of several alternatives. The reason for the DOUBLED parentheses (( and )) is so that we pass ONE parameter ("'", '"') to startswith(), to specify the permitted prefixes, rather than TWO parameters "'" and '"', which would be interpreted as a prefix and an (invalid) start position.
s[-1] is the last character in the string.
Testing:
print( dequote("\"he\"l'lo\"") )
print( dequote("'he\"l'lo'") )
print( dequote("he\"l'lo") )
print( dequote("'he\"l'lo\"") )
=>
he"l'lo
he"l'lo
he"l'lo
'he"l'lo"
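The point about the doubled parentheses in the explanation above can be checked directly. This sketch shows that one tuple argument works, while two separate string arguments make the second one be read as a start index and rejected:

```python
s = "'hello'"

# One argument: a tuple listing the permitted prefixes.
print(s.startswith(("'", '"')))  # True

# Two arguments: the second is read as the start index, so a str is rejected.
try:
    s.startswith("'", '"')
    two_args_failed = False
except TypeError:
    two_args_failed = True
print(two_args_failed)  # True
```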
(For me, regular expressions are not obvious to read, so I didn't try to extend #Alex's answer.)
To remove the first and last characters, and in each case do the removal only if the character in question is a double quote:
import re
s = re.sub(r'^"|"$', '', s)
Note that the RE pattern is different from the one you had given, and the operation is sub ("substitute") with an empty replacement string. (strip is a string method, but it does something quite different from what you need, as other answers have indicated.)
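Applied to the string from the question, a quick sketch shows that the anchored pattern removes at most one quote from each end, and leaves strings without surrounding quotes alone:

```python
import re

s = '"" " " ""\\1" " "" ""'

# ^" only matches a quote at the very start and "$ only at the very end,
# so at most one quote is removed from each side.
result = re.sub(r'^"|"$', '', s)
print(result == '" " " ""\\1" " "" "')  # True

# A string without surrounding quotes is returned unchanged.
print(re.sub(r'^"|"$', '', 'no quotes'))  # no quotes
```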
If string is always as you show:
string[1:-1]
Almost done. Quoting from http://docs.python.org/library/stdtypes.html?highlight=strip#str.strip
The chars argument is a string specifying the set of characters to be removed.
[...]
The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped:
So the argument is not a regexp.
>>> string = '"" " " ""\\1" " "" ""'
>>> string.strip('"')
' " " ""\\1" " "" '
>>>
Note that this is not exactly what you requested, because it removes multiple quotes from both ends of the string!
To remove given characters from the start and end of a string:
s = '""Hello World""'
s.strip('""')
# 'Hello World'
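Because strip treats its argument as a set of characters (as the documentation quoted in another answer says), passing '""' is the same as passing '"', and the number of quotes removed from each end does not have to balance. A short sketch:

```python
s = '""Hello World""'

# The argument is a set of characters, so '""' behaves exactly like '"'.
print(s.strip('""') == s.strip('"'))  # True

# Every leading and trailing quote is removed, whether or not the ends balance.
print('"""Hello World"'.strip('"'))  # Hello World
```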
Starting in Python 3.9, you can use removeprefix and removesuffix:
'"" " " ""\\1" " "" ""'.removeprefix('"').removesuffix('"')
# '" " " ""\\1" " "" "'
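On Python versions older than 3.9, a minimal sketch of the same behaviour could look like this; remove_prefix and remove_suffix are hypothetical helper names, not standard-library functions:

```python
def remove_prefix(s, prefix):
    """Hypothetical stand-in for str.removeprefix (Python 3.9+)."""
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s


def remove_suffix(s, suffix):
    """Hypothetical stand-in for str.removesuffix (Python 3.9+)."""
    # The `suffix and` check matters: s[:-0] would return an empty string.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


s = '"" " " ""\\1" " "" ""'
print(remove_suffix(remove_prefix(s, '"'), '"'))
```

Unlike strip, each helper removes at most one copy of the given prefix or suffix.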
If you are sure there is a " at the beginning and at the end, which you want to remove, just do:
string = string[1:len(string)-1]
or
string = string[1:-1]
I have some code that needs to strip single or double quotes, and I can't simply ast.literal_eval it.
if len(arg) > 1 and arg[0] in '"\'' and arg[-1] == arg[0]:
    arg = arg[1:-1]
This is similar to ToolmakerSteve's answer, but it allows 0 length strings, and doesn't turn the single character " into an empty string.
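To check the edge cases mentioned (empty strings, a lone ", mismatched quotes), here is the same condition wrapped in a function; unquote is a hypothetical name used only for this illustration:

```python
def unquote(arg):
    # Strip one pair of matching single or double quotes, if present.
    if len(arg) > 1 and arg[0] in '"\'' and arg[-1] == arg[0]:
        arg = arg[1:-1]
    return arg


for sample in ['', '"', '""', '"abc"', "'abc'", "'abc\""]:
    print(repr(sample), '->', repr(unquote(sample)))
```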
In your example you could use strip, but you have to include the space as well:
string = '"" " " ""\\1" " "" ""'
string.strip('" ') # output '\\1'
Note that the ' around the output is just the standard way Python displays a string;
the value of your variable is '\\1', i.e. a backslash followed by the digit 1.
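The difference between the displayed form and the actual characters can be seen by comparing repr, print and len on the same value:

```python
string = '"" " " ""\\1" " "" ""'
value = string.strip('" ')

print(repr(value))  # '\\1'  (how the interactive prompt displays it)
print(value)        # \1     (the actual characters)
print(len(value))   # 2      (one backslash plus the digit 1)
```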
The function below strips surrounding whitespace and returns the string without its quotes. If there are no quotes, it returns the same (stripped) string.
import re

def removeQuote(s):
    # s is used instead of str to avoid shadowing the built-in str type
    s = s.strip()
    if re.search("^[\'\"].*[\'\"]$", s):
        s = s[1:-1]
        print("Removed Quotes", s)
    else:
        print("Same String", s)
    return s
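The regex in removeQuote also accepts mismatched pairs such as 'abc", and its . does not match newlines. A sketch of a variant that only removes a matching pair, using a backreference; remove_matching_quotes is a hypothetical name:

```python
import re

# Group 1 captures the opening quote; \1 requires the same character at the end.
# re.DOTALL lets .* span newlines inside the quotes.
_QUOTED = re.compile(r"(['\"])(.*)\1", re.DOTALL)


def remove_matching_quotes(s):
    s = s.strip()
    m = _QUOTED.fullmatch(s)
    return m.group(2) if m else s


print(remove_matching_quotes('  "hello"  '))  # hello
print(remove_matching_quotes("'hello\""))     # 'hello"
```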
Find the positions of the first and the last " in your string:
>>> s = '"" " " ""\\1" " "" ""'
>>> l = s.find('"')
>>> r = s.rfind('"')
>>> s[l+1:r]
'" " " ""\\1" " "" "'
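One caveat with find/rfind: if the string contains no " at all, both return -1 and s[l+1:r] becomes s[0:-1], silently dropping the last character; with exactly one " the slice is empty. A guarded sketch, using a hypothetical helper name between_outer_quotes:

```python
def between_outer_quotes(s, quote='"'):
    """Hypothetical helper: text between the first and last occurrence of quote."""
    l = s.find(quote)
    r = s.rfind(quote)
    # find/rfind return -1 when quote is absent; l == r means only one quote.
    if l == -1 or l == r:
        return s
    return s[l + 1:r]


print(between_outer_quotes('"" " " ""\\1" " "" ""'))
print(between_outer_quotes('abc'))  # abc
```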
Related
I need to change this string:
input_str = '{resourceType=Type, category=[{coding=[{system=http://google.com, code=item, display=Item}]}]}'
into JSON format:
output_str = '{"resourceType":"Type", "category":[{"coding":[{"system":"http://google.com", "code":"item", "display":"Item"}]}]}'
Changing the equals sign "=" to a colon ":" is easy with the replace function:
input_str.replace("=", ":")
But I can't find a way to add quotes before and after each value/word.
I suggest surrounding with double quotes every sequence of characters that are not reserved in your markup. I also made a provision for escaped double quotes, and you can add more escaped symbols to it:
import re
input_str = '{resourceType=Type, category=[{coding=[{system=http://google.com, code=item, display=Item}]}]}'
output_str = re.sub (r'(([^=([\]{},\s]|\")+)', r'"\1"', input_str).replace('=', ':')
print (output_str)
Output:
{"resourceType":"Type", "category":[{"coding":[{"system":"http://google.com", "code":"item", "display":"Item"}]}]}
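A minimal sketch of a sanity check for any of these conversions, assuming the goal is JSON that the standard json module can parse: feed the result to json.loads, which raises json.JSONDecodeError if the text is not valid JSON.

```python
import json
import re

input_str = '{resourceType=Type, category=[{coding=[{system=http://google.com, code=item, display=Item}]}]}'
output_str = re.sub(r'(([^=([\]{},\s]|\")+)', r'"\1"', input_str).replace('=', ':')

# json.loads raises json.JSONDecodeError if the conversion produced invalid JSON.
data = json.loads(output_str)
print(data["category"][0]["coding"][0]["system"])  # http://google.com
```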
You can use this function for the conversion.
def to_json(in_str):
    s = in_str.replace('{', '{"').replace('=', '":"').replace(',', '", "').replace('}', '"}')
    return s.replace('" ', '"').replace(':"[', ':[').replace(']"', ']')
This works correctly for the input you have mentioned.
print(to_json(input_str))
#output = {"resourceType":"Type", "category":[{"coding":[{"system":"http://google.com", "code":"item", "display":"Item"}]}]}
Regex is certainly more concise and efficient, but just for fun, it's also possible using replace:
input_str = input_str.replace("=", "\":\"")
input_str = input_str.replace("=[", "\":[")
input_str = input_str.replace(", ", "\", \"")
input_str = input_str.replace("{", "{\"")
input_str = input_str.replace("}", "\"}")
input_str = input_str.replace("]\"}", "]}")
input_str = input_str.replace("\"[", "[")
print(input_str) #=> '{"resourceType":"Type", "category":[{"coding":[{"system":"http://google.com", "code":"item", "display":"Item"}]}]}'
I am looking for a way to prefix strings in Python with a single backslash, e.g. "]" -> "\]". Since "\" is not a valid string in Python, the simple
mystring = '\' + mystring
won't work. What I am currently doing is something like this:
mystring = r'\###' + mystring
mystring = mystring.replace('###', '')
While this works most of the time, it is not elegant and can also cause problems for strings containing "###" or whatever the "filler" is set to. Is there a better way of doing this?
You need to escape the backslash with a second one, to make it a literal backslash:
mystring = "\\" + mystring
Otherwise it thinks you're trying to escape the ", which in turn means you have no quote to terminate the string.
Ordinarily, you can use raw string notation (r'string'), but that won't work when the backslash is the last character.
The difference between print a and just a:
>>> a = 'hello'
>>> a = '\\' + a
>>> a
'\\hello'
>>> print a
\hello
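A short sketch tying these points together: '\\' is a one-character string, a raw string works as long as the backslash is not its last character, and r'\' on its own does not compile.

```python
mystring = ']'

# '\\' in source code is a string of length 1: a single backslash.
prefixed = '\\' + mystring
print(len('\\'))           # 1
print(prefixed)            # \]

# A raw string is fine as long as the backslash is not its last character.
print(r'\]' == prefixed)   # True

# r'\' on its own is not a valid literal: the backslash still escapes the quote.
try:
    compile("r'\\'", "<example>", "eval")
except SyntaxError:
    print("r'\\' does not compile")
```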
Python strings have a feature called escape characters. These allow you to do special things inside a string, such as showing a quote (" or ') without closing the string you're typing.
See the table of escape sequences in the Python documentation on string literals.
So when you typed
mystring = '\' + mystring
the \' is an escaped apostrophe, so your string now contains an apostrophe and isn't actually closed, which you can see because the rest of that line is coloured.
To type a backslash, you must escape one, which is done like this:
>>> aBackSlash = '\\'
>>> print(aBackSlash)
\
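Each escape sequence is two characters in the source code but produces a single character in the string, which a quick loop can confirm:

```python
# Each escape sequence is two characters in the source but one character in the string.
escapes = ['\\', '\'', '\"', '\n', '\t']
for e in escapes:
    print(repr(e), len(e))
```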
You should escape the backslash as follows:
mystring = "\\" + mystring
This is because if you write '\', the backslash ends up escaping the closing quotation mark. Therefore, to treat the backslash literally, you must escape it.
Examples
>>> s = 'hello'
>>> s = '\\' + s
>>> print s
\hello
Your case
>>> mystring = 'it actually does work'
>>> mystring = '\\' + mystring
>>> print mystring
\it actually does work
As a different way of approaching the problem, have you considered string formatting?
r'\%s' % mystring
or:
r'\{}'.format(mystring)
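For comparison, a sketch showing that both formatting approaches, plus an f-string (Python 3.6+, an addition not in the answer), produce the same backslash-prefixed result:

```python
mystring = ']'

a = r'\%s' % mystring          # printf-style formatting
b = r'\{}'.format(mystring)    # str.format
c = f'\\{mystring}'            # f-string (Python 3.6+); \\ is one backslash
print(a == b == c == '\\]')    # True
```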
After replacing all word characters in a string with the character '^', using re.sub(r"\w", "^", stringorphrase), I'm left with:
>>> '^^^ ^^ ^^^^'
Is there any way to remove the single quotes so it looks cleaner?
>>> ^^^ ^^ ^^^^
Are you sure that's not just how it's displayed in the interactive prompt (and there aren't actually apostrophes in your string)?
If the ' is actually part of the string, and is first/last then either:
string = string.strip("'")
or:
string = string[1:-1] # lop ending characters off
Use the print statement. The quotes aren't actually part of the string.
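A small sketch of the difference, using a hypothetical input phrase: the prompt echoes the repr (with quotes), while print shows the characters themselves.

```python
import re

stringorphrase = 'The ox runs'  # hypothetical input
masked = re.sub(r"\w", "^", stringorphrase)

print(repr(masked))  # '^^^ ^^ ^^^^'  (what the prompt echoes)
print(masked)        # ^^^ ^^ ^^^^    (the string itself, no quotes)
```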
To remove all occurrences of single quotes:
mystr = some_string_with_single_quotes
answer = mystr.replace("'", '')
To remove single quotes ONLY at the ends of the string:
mystr = some_string_with_single_quotes
answer = mystr.strip("'")
Hope this helps
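The difference between the two approaches shows up when an apostrophe also appears inside the string:

```python
s = "'it's'"

print(s.replace("'", ''))  # its   (every apostrophe removed)
print(s.strip("'"))        # it's  (only the ones at the ends)
```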
I have a question regarding strip() in Python. I am trying to strip a semicolon from a string. I know how to do this when the semicolon is at the end of the string, but how would I do it if it is not the last element but, say, the second-to-last element?
For example:
1;2;3;4;\n
I would like to strip that last semi-colon.
Strip the other characters as well.
>>> '1;2;3;4;\n'.strip('\n;')
'1;2;3;4'
>>> "".join("1;2;3;4;\n".rpartition(";")[::2])
'1;2;3;4\n'
How about replace?
string1='1;2;3;4;\n'
string2=string1.replace(";\n","\n")
>>> string = "1;2;3;4;\n"
>>> string.strip().strip(";")
'1;2;3;4'
This will first strip any leading or trailing white space, and then remove any leading or trailing semicolon.
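Note that strip() with no argument also removes the trailing "\n". If the line ending should be kept, a minimal sketch (assuming "\n" or "\r\n" line endings) is to split it off first and add it back afterwards:

```python
line = '1;2;3;4;\n'

body = line.rstrip('\r\n')             # '1;2;3;4;'
ending = line[len(body):]              # '\n' (empty if there was no line ending)
result = body.rstrip(';') + ending
print(repr(result))                    # '1;2;3;4\n'
```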
Try this:
def remove_last(string):
    index = string.rfind(';')
    if index == -1:
        # Semi-colon doesn't exist
        return string
    return string[:index] + string[index+1:]
This should be able to remove the last semicolon of the line, regardless of what characters come after it.
>>> remove_last('Test')
'Test'
>>> remove_last('Test;abc')
'Testabc'
>>> remove_last(';test;abc;foobar;\n')
';test;abc;foobar\n'
>>> remove_last(';asdf;asdf;asdf;asdf')
';asdf;asdf;asdfasdf'
The other answers provided are probably faster since they're tailored to your specific example, but this one is a bit more flexible.
You could split the string on semicolons and then join the non-empty parts back together using ; as the separator:
parts = '1;2;3;4;\n'.split(';')
non_empty_parts = []
for s in parts:
    if s.strip() != "":
        non_empty_parts.append(s.strip())
print(';'.join(non_empty_parts))
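The same idea can be written more compactly with a comprehension; join_non_empty is a hypothetical name. One caveat worth knowing: genuinely empty fields in the middle of the line are dropped as well.

```python
def join_non_empty(line):
    # Split on ';', trim each part, and keep only non-empty parts.
    parts = [p.strip() for p in line.split(';')]
    return ';'.join(p for p in parts if p)


print(join_non_empty('1;2;3;4;\n'))  # 1;2;3;4
print(join_non_empty('1;;2;\n'))     # 1;2  (the empty middle field is lost)
```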
If you only want to use the strip function this is one method:
Using slice notation, you can limit the strip() function's scope to one part of the string and append the "\n" on at the end:
# create a var for later
str = "1;2;3;4;\n"
# format and assign to newstr
newstr = str[:8].strip(';') + str[8:]
Using the rfind() method (similar to Micheal0x2a's solution), you can make the statement applicable to many strings:
# create a var for later
str = "1;2;3;4;\n"
# format and assign to newstr
newstr = str[:str.rfind(';') + 1 ].strip(';') + str[str.rfind(';') + 1:]
re.sub(r';(\W*$)', r'\1', '1;2;3;4;\n') -> '1;2;3;4\n'
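How the pattern behaves on a few variations of the line, as a quick sketch: the semicolon is removed only when nothing but non-word characters follow it up to the end.

```python
import re

# ';' followed only by non-word characters (such as a newline) up to the end;
# the group keeps those trailing characters in the replacement.
pattern = re.compile(r';(\W*$)')

print(repr(pattern.sub(r'\1', '1;2;3;4;\n')))   # '1;2;3;4\n'
print(repr(pattern.sub(r'\1', '1;2;3;4; \n')))  # '1;2;3;4 \n'
print(repr(pattern.sub(r'\1', '1;2;3;4')))      # '1;2;3;4'
```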
I need a way to remove all whitespace from a string, except when that whitespace is between quotes.
result = re.sub('".*?"', "", content)
This will match anything between quotes, but now it needs to ignore that match and add matches for whitespace.
I don't think you're going to be able to do that with a single regex. One way to do it is to split the string on quotes, apply the whitespace-stripping regex to every other item of the resulting list, and then re-join the list.
import re

def stripwhite(text):
    lst = text.split('"')
    for i, item in enumerate(lst):
        # Even-indexed items are outside the quotes.
        if not i % 2:
            lst[i] = re.sub(r"\s+", "", item)
    return '"'.join(lst)

print(stripwhite('This is a string with some "text in quotes."'))
Here is a one-liner version, based on #kindall's idea, that does not use regex at all! First split on ", then split() every other item and join it back together, which takes care of the whitespace:
stripWS = lambda txt:'"'.join( it if i%2 else ''.join(it.split())
for i,it in enumerate(txt.split('"')) )
Usage example:
>>> stripWS('This is a string with some "text in quotes."')
'Thisisastringwithsome"text in quotes."'
You can use shlex.split for a quotation-aware split, and join the result using " ".join. E.g.
import shlex
print(" ".join(shlex.split('Hello "world this is" a test')))
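Note that this drops the quote characters and replaces runs of whitespace outside the quotes with single spaces rather than removing them. A sketch of what shlex actually returns, plus shlex.join (Python 3.8+), which re-adds quoting where needed:

```python
import shlex

tokens = shlex.split('Hello "world this is" a test')
print(tokens)            # ['Hello', 'world this is', 'a', 'test']
print(" ".join(tokens))  # Hello world this is a test

# shlex.join (Python 3.8+) re-adds quoting where needed.
print(shlex.join(tokens))  # Hello 'world this is' a test
```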
Oli, resurrecting this question because it had a simple regex solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)
Here's the small regex:
"[^"]*"|(\s+)
The left side of the alternation matches complete "quoted strings". We will ignore these matches. The right side matches and captures spaces to Group 1, and we know they are the right spaces because they were not matched by the expression on the left.
Here is working code:
import re

subject = 'Remove Spaces Here "But Not Here" Thank You'
regex = re.compile(r'"[^"]*"|(\s+)')

def myreplacement(m):
    if m.group(1):
        return ""
    else:
        return m.group(0)

replaced = regex.sub(myreplacement, subject)
print(replaced)
Reference
How to match pattern except in situations s1, s2, s3
How to match a pattern unless...
Here is a slightly longer version that checks for a quote without a pair. It only deals with one style of start and end string (adaptable, for example, with start, end = '()'):
start, end = '"', '"'
for test in ('Hello "world this is" atest',
             'This is a string with some " text inside in quotes."',
             'This is without quote.',
             'This is sentence with bad "quote'):
    result = ''
    while start in test:
        clean, _, test = test.partition(start)
        clean = clean.replace(' ', '') + start
        inside, tag, test = test.partition(end)
        if not tag:
            raise SyntaxError('Missing end quote %s' % end)
        else:
            clean += inside + tag  # whitespace inside the quotes is kept
        result += clean
    result += test.replace(' ', '')
    print(result)
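As written, the loop stops at the first malformed input because the exception is not caught. A sketch of the same logic as a reusable function that reports errors per input; strip_outside is a hypothetical name, and ValueError is swapped in for SyntaxError since the problem is bad data rather than bad Python code:

```python
def strip_outside(text, start='"', end='"'):
    """Hypothetical helper: remove spaces outside start/end pairs only."""
    result = ''
    while start in text:
        before, _, text = text.partition(start)
        inside, tag, text = text.partition(end)
        if not tag:
            raise ValueError('Missing end quote %s' % end)
        # Spaces before the opening delimiter are removed; the quoted part is kept.
        result += before.replace(' ', '') + start + inside + tag
    return result + text.replace(' ', '')


for test in ('Hello "world this is" atest', 'This is sentence with bad "quote'):
    try:
        print(strip_outside(test))
    except ValueError as exc:
        print('Error:', exc)

print(strip_outside('f (a b) c', '(', ')'))  # f(a b)c
```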