I have something like this:
text = 'This text is very very long.'
replace_words = ['very','word']
for word in replace_words:
    text = text.replace('very','not very')
I would like to only replace the first 'very' or choose which 'very' gets overwritten. I'm doing this on much larger amounts of text so I want to control how duplicate words are being replaced.
text = text.replace("very", "not very", 1)
>>> help(str.replace)
Help on method_descriptor:
replace(...)
S.replace (old, new[, count]) -> string
Return a copy of string S with all occurrences of substring
old replaced by new. If the optional argument count is
given, only the first count occurrences are replaced.
text = text.replace("very", "not very", 1)
The third parameter is the maximum number of occurrences that you want to replace.
From the documentation for Python:
string.replace(s, old, new[, maxreplace])
Return a copy of string s with all occurrences of substring old replaced by new. If the optional argument maxreplace is given, the first maxreplace occurrences are replaced.
From http://docs.python.org/release/2.5.2/lib/string-methods.html :
replace( old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the
first count occurrences are replaced.
I didn't try but I believe it works
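The answers above cover replacing the first occurrence; the question also asks about choosing which occurrence gets replaced. A minimal sketch of one way to do that follows. The helper name replace_nth is hypothetical (it is not part of the standard library), and it counts non-overlapping occurrences from left to right, 1-based.

```python
def replace_nth(text, old, new, n):
    """Replace only the n-th (1-based) non-overlapping occurrence of old.

    Hypothetical helper for illustration; returns text unchanged when
    there are fewer than n occurrences.
    """
    if n < 1 or not old:
        return text
    pos = 0
    idx = -1
    for _ in range(n):
        idx = text.find(old, pos)
        if idx == -1:
            return text  # fewer than n occurrences
        pos = idx + len(old)
    return text[:idx] + new + text[idx + len(old):]


text = 'This text is very very long.'

# Built-in: the count argument limits replacement to the first match.
first_only = text.replace('very', 'not very', 1)
print(first_only)   # This text is not very very long.

# Sketch: replace only the second match.
second_only = replace_nth(text, 'very', 'not very', 2)
print(second_only)  # This text is very not very long.
```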
In this assignment statement:
words = "This is a SENTENCE".split()
"This is a SENTENCE" is a string, since it is double-quoted. So why does CodeLens show that words is a list?
"This is a SENTENCE" is a string, however the .split() takes that string and returns a list (splitting it at the whitespace, hence why each element of the list is one of the words of the sentence).
From the documentation for split:
Return a list of the words in the string
Also relevant:
If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.
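A quick way to check this for yourself is to print the types. The sketch below also contrasts split() with split(" ") on a padded string (the padded variable is made up for illustration) to show the whitespace rule quoted above.

```python
sentence = "This is a SENTENCE"
words = sentence.split()

print(type(sentence).__name__)  # str
print(type(words).__name__)     # list
print(words)                    # ['This', 'is', 'a', 'SENTENCE']

# Illustrative input with leading, trailing and repeated spaces.
padded = "  This   is a SENTENCE "

# No separator: runs of whitespace count as one separator, no empty strings.
default_split = padded.split()
print(default_split)

# Explicit single-space separator: every space splits, so empty strings appear.
space_split = padded.split(" ")
print(space_split)
```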
I am trying to strip the characters '_ ' (underscore and space) away from my string. The first code fails to strip anything.
The code for word_1 works just as I intend. Could anyone enlighten me how to modify the first code to get output 'ale'?
word = 'a_ _ le'
word.strip('_ ')
word_1 = '_ _ le'
word_1.strip('_ ')
You need to replace() in this use case, not strip()
word.replace('_ ', '')
strip():
string.strip(s[, chars])
Return a copy of the string with leading and trailing characters removed. If chars is omitted or None, whitespace characters are removed. If given and not None, chars must be a string; the characters in the string will be stripped from the both ends of the string this method is called on.
replace():
string.replace(s, old, new[, maxreplace])
Return a copy of string s with all occurrences of substring old replaced by new. If the optional argument maxreplace is given, the first maxreplace occurrences are replaced.
.strip removes any of the given characters from the start and end of the source string only; it never touches the middle.
You want .replace.
>>> word = 'a_ _ le'
>>> word = word.replace("_ ", "")
>>> word
'ale'
.strip() only removes the given characters from the start and end of a string; it does not touch anything in the middle. For that, use .replace(): word.replace('_ ', '') returns 'ale'.
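To make the difference concrete, here is a small side-by-side sketch using the two strings from the question. The translate() line is an extra option not mentioned in the answers: it removes every underscore and space individually, regardless of how they are grouped.

```python
word = 'a_ _ le'
word_1 = '_ _ le'

# strip() treats its argument as a set of characters and only trims the ends.
stripped = word.strip('_ ')
print(repr(stripped))      # 'a_ _ le'  (starts with 'a', so nothing is trimmed)

stripped_1 = word_1.strip('_ ')
print(repr(stripped_1))    # 'le'

# replace() removes every occurrence of the exact substring '_ '.
replaced = word.replace('_ ', '')
print(repr(replaced))      # 'ale'

# Alternative: delete each '_' and ' ' character wherever it appears.
translated = word.translate(str.maketrans('', '', '_ '))
print(repr(translated))    # 'ale'
```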
I have a column in my pandas Dataframe df that contains a string with some trailing hex-encoded NULLs (\x00). At least I think that it's that. When I tried to replace them with:
df['SOPInstanceUID'] = df['SOPInstanceUID'].replace('\x00', '')
the column is not updated. When I do the same with
df['SOPInstanceUID'] = df['SOPInstanceUID'].str.replace('\x00', '')
it's working fine.
What's the difference here? (SOPInstanceUID is not an index.)
thanks
The former (Series.replace) only replaces values that match the search string exactly and in full; the latter (Series.str.replace) replaces matches anywhere inside each string, which is why the latter works for you.
The str methods are synonymous with the standard string equivalents but are vectorised
You did not specify a regex or require an exact match, hence str.replace worked
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
DataFrame.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad', axis=None)
parameter: to_replace : str, regex, list, dict, Series, numeric, or None
str or regex:
str: string exactly matching to_replace will be replaced with value
regex: regexs matching to_replace will be replaced with value
The literal characters \x00 are not actually in the string: it contains unescaped control characters, which Python displays using hexadecimal notation. You can remove all non-word characters in the following way:
re.sub(r'[^\w]', '', '\x00\x00\x00\x08\x01\x008\xe6\x7f')
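One caveat worth checking before using that pattern on identifiers: [^\w] also removes dots. The sketch below uses a made-up UID-like value to show the difference, with rstrip('\x00') as a narrower alternative that only trims trailing NUL characters.

```python
import re

# Made-up value: a dotted identifier followed by two NUL characters.
raw = '1.2.840.10008\x00\x00'

print(repr(raw))  # the NULs are displayed as \x00 escapes
print(len(raw))   # 15: each NUL is a single character

# Removing all non-word characters also strips the dots.
no_nonword = re.sub(r'[^\w]', '', raw)
print(no_nonword)  # 1284010008

# Trimming only trailing NULs keeps the dots intact.
cleaned = raw.rstrip('\x00')
print(cleaned)     # 1.2.840.10008
```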
So right now, re.sub does this:
>>> re.sub(r"DELETE THIS\d+ ", "", "I want to DELETE THIS472 go to DON'T DELETE THIS847 the supermarket")
"I want to go to DON'T the supermarket"
I want it to instead delete only the first instance of "DELETE THISXXX," where XXX is a number, so that the result is
"I want to go to DON'T DELETE THIS847 the supermarket"
The XXX is a number that varies, and so I actually do need a regex. How can I accomplish this?
As written in the documentation for re.sub(pattern, repl, string, count=0, flags=0) you can specify the count argument in:
re.sub(pattern, repl, string[, count, flags])
If you give a count of 1, only the first match is replaced.
From http://docs.python.org/library/re#re.sub:
The optional argument count is the maximum number of pattern occurrences to be replaced; count must be a non-negative integer. If omitted or zero, all occurrences will be replaced. Empty matches for the pattern are replaced only when not adjacent to a previous match, so sub('x*', '-', 'abc') returns '-a-b-c-'.
re.sub(pattern, repl, string, count=0, flags=0)
Set count = 1 to only replace the first instance.
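Applied to the string from the question, that looks roughly like the sketch below. The pattern r"DELETE THIS\d+ " is an assumption about what the asker wants to match (the phrase, its number, and one trailing space).

```python
import re

s = "I want to DELETE THIS472 go to DON'T DELETE THIS847 the supermarket"

# Assumed pattern: the phrase, one or more digits, then a single space.
pattern = r"DELETE THIS\d+ "

# count=0 (the default) replaces every match.
all_removed = re.sub(pattern, "", s)
print(all_removed)   # I want to go to DON'T the supermarket

# count=1 replaces only the first match.
first_removed = re.sub(pattern, "", s, count=1)
print(first_removed) # I want to go to DON'T DELETE THIS847 the supermarket
```

Passing count by keyword keeps it from being confused with flags, which is the next positional parameter.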
I think your phrasing, "first instance," caused everyone else to answer in the direction of count, but if you meant that you want to delete a phrase only if it fully matches a phrase you seek, then first you have to define what you mean by a "phrase", e.g. non-lower-case characters:
DON'T DELETE THIS
In which case, you can do something like this:
(?<![^a-z]+)\s+DELETE THIS\s+(?![^a-z]+)
Python's re module requires lookbehind assertions to be fixed-width, so the pattern above will not compile as written. Remove the first + to make it work.
You can use str.replace() for this:
In [9]: strs="I want to DELETE THIS go to DON'T DELETE THIS the supermarket"
In [10]: strs.replace("DELETE THIS ","",1) # here 1 is the count
Out[10]: "I want to go to DON'T DELETE THIS the supermarket"
I have a sample string. How can I replace the first occurrence of this string in a longer string with an empty string?
regex = re.compile('text')
match = regex.match(url)
if match:
    url = url.replace(regex, '')
The string replace() method solves this problem:
string.replace(s, old, new[, maxreplace])
Return a copy of string s with all occurrences of substring old replaced by new. If the optional argument maxreplace is given, the first maxreplace occurrences are replaced.
>>> u'longlongTESTstringTEST'.replace('TEST', '?', 1)
u'longlong?stringTEST'
Use re.sub directly; it lets you specify a count:
regex.sub('', url, 1)
(Note that the argument order is replacement first, then the original string, not the other way around as one might expect.)
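Put together as a runnable sketch, with a made-up url value since the question does not show one. It also illustrates why the match() check in the question can be misleading: match() only succeeds at the start of the string, whereas search() and sub() look anywhere in it.

```python
import re

regex = re.compile('text')

# Made-up example URL containing 'text' twice.
url = 'http://example.com/text/more-text'

# match() anchors at the beginning, so it finds nothing here.
print(regex.match(url))               # None
print(regex.search(url) is not None)  # True

# sub() on the compiled pattern: replacement first, then the string.
result = regex.sub('', url, count=1)
print(result)  # http://example.com//more-text
```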