I have a sample string. How can I replace the first occurrence of this string in a longer string with an empty string?
regex = re.compile('text')
match = regex.match(url)
if match:
url = url.replace(regex, '')
The string replace() function solves this problem:
string.replace(s, old, new[, maxreplace])
Return a copy of string s with all occurrences of substring old replaced by new. If the optional argument maxreplace is given, the first maxreplace occurrences are replaced.
>>> u'longlongTESTstringTEST'.replace('TEST', '?', 1)
u'longlong?stringTEST'
Use re.sub directly; it lets you specify a count:
regex.sub('', url, count=1)
(Note that the argument order is replacement first, then the original string, not the other way around, as one might expect.)
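A minimal sketch comparing the two approaches above. The value of `url` and the pattern are made up for illustration. Note that `regex.match` only matches at the start of the string, so the `if match:` guard in the question would skip strings where the text occurs later; neither `str.replace` nor `re.sub` needs that guard.

```python
import re

url = "http://example.com/text/more/text"  # hypothetical input

# Plain substring: str.replace with a count of 1 removes only the first hit.
first_removed = url.replace("text", "", 1)

# Regular expression: re.sub with count=1 does the same for a pattern.
regex = re.compile("text")
first_removed_re = regex.sub("", url, count=1)

print(first_removed)     # http://example.com//more/text
print(first_removed_re)  # http://example.com//more/text
```

If the text to remove is a literal and not a pattern, `str.replace` is probably the simpler choice, since no escaping is needed.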
Suppose I have a string:
exp = '"security_datafilter"."PRODUCT_CATEGORIES"."CATEGORY_NAME" IN ("CPU","Storage")'
I want to split the string on the word IN,
so my expected result is
['"security_datafilter"."PRODUCT_CATEGORIES"."CATEGORY_NAME"','IN','("CPU","Storage")']
but in my case it doesn't work.
This is what I have tried:
import re
exp_split = re.split(r'( in )',exp,re.I)
re documentation:
re.split(pattern, string, maxsplit=0, flags=0)
The split() function expects that the third positional argument is the maxsplit argument. Your code gives re.I to maxsplit and no flags. You should give flags as a keyword argument like so:
exp_split = re.split(r'( in )',exp, flags=re.I)
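The following sketch, using the string from the question, shows why the positional call fails quietly: `re.I` has the integer value 2, so it is accepted as `maxsplit` without complaint.

```python
import re

exp = '"security_datafilter"."PRODUCT_CATEGORIES"."CATEGORY_NAME" IN ("CPU","Storage")'

# re.I is an integer-valued flag, so as the third positional argument
# it is silently accepted as maxsplit.
print(int(re.I))  # 2

# Passing maxsplit explicitly makes the mistake visible: no flags are set,
# the lowercase pattern never matches, and the string comes back whole.
unsplit = re.split(r'( in )', exp, maxsplit=int(re.I))
print(len(unsplit))  # 1

# Passing the flag by keyword gives the intended case-insensitive split.
parts = re.split(r'( in )', exp, flags=re.I)
print(parts)
```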
You simply need to capitalize your delimiter; if you don't want the spaces in your result, keep them outside your capturing group:
exp_split = re.split(r'\s(IN)\s', exp, flags=re.I)
exp_split
Output
['"security_datafilter"."PRODUCT_CATEGORIES"."CATEGORY_NAME"', 'IN', '("CPU","Storage")']
I am trying to strip the characters '_ ' (underscore and space) away from my string. The first code fails to strip anything.
The code for word_1 works just as I intend. Could anyone enlighten me how to modify the first code to get output 'ale'?
word = 'a_ _ le'
word.strip('_ ')
word_1 = '_ _ le'
word_1.strip('_ ')
You need replace() in this use case, not strip():
word.replace('_ ', '')
strip():
string.strip(s[, chars])
Return a copy of the string with leading and trailing characters removed. If chars is omitted or None, whitespace characters are removed. If given and not None, chars must be a string; the characters in the string will be stripped from the both ends of the string this method is called on.
replace():
string.replace(s, old, new[, maxreplace])
Return a copy of string s with all occurrences of substring old replaced by new. If the optional argument maxreplace is given, the first maxreplace occurrences are replaced.
.strip removes the given characters from the start and end of the source string; it does not touch the middle.
You want .replace.
>>> word = 'a_ _ le'
>>> word = word.replace("_ ", "")
>>> word
'ale'
.strip() removes the given characters only from the start and end of a string; it does not touch the middle. For that, use .replace(): word.replace('_ ', '') returns 'ale'.
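A short sketch of the difference, using the word from the question. The last part is an addition not found in the answers above: `replace('_ ', '')` only removes the exact two-character sequence, so if underscores and spaces can appear in any order, `str.translate` with a deletion table is one option. The input `'a _le_'` is made up for illustration.

```python
word = 'a_ _ le'

# strip() only trims characters from the two ends; 'a' and 'e' are not
# in the set '_ ', so nothing is removed.
print(repr(word.strip('_ ')))        # 'a_ _ le'

# replace() removes every occurrence of the exact substring '_ '.
print(repr(word.replace('_ ', '')))  # 'ale'

# To drop underscores and spaces individually, wherever they occur,
# build a translation table that deletes both characters.
table = str.maketrans('', '', '_ ')
print(repr('a _le_'.translate(table)))  # 'ale'
```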
I can strip numerics but not alpha characters:
>>> text
'132abcd13232111'
>>> text.strip('123')
'abcd'
Why is the following not working?
>>> text.strip('abcd')
'132abcd13232111'
The reason is simple and stated in the documentation of strip:
str.strip([chars])
Return a copy of the string with the leading and trailing characters removed.
The chars argument is a string specifying the set of characters to be removed.
None of the characters 'a', 'b', 'c', 'd' is at the start or end of the string '132abcd13232111', so nothing is stripped.
Just to add a few examples to Jim's answer, according to .strip() docs:
Return a copy of the string with the leading and trailing characters removed.
The chars argument is a string specifying the set of characters to be removed.
If omitted or None, the chars argument defaults to removing whitespace.
The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped.
So it doesn't matter whether the characters are digits or not. The main reason your second call didn't work as you expected is that "abcd" is located in the middle of the string.
Example 1:
s = '132abcd13232111'
print(s.strip('123'))
print(s.strip('abcd'))
Output:
abcd
132abcd13232111
Example 2:
t = 'abcd12312313abcd'
print(t.strip('123'))
print(t.strip('abcd'))
Output:
abcd12312313abcd
12312313
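To round out the examples above, here is a small sketch showing that `chars` is treated as a set of characters, and that removing a substring from the middle needs `replace()` instead. It reuses the string from the question.

```python
text = '132abcd13232111'

# chars is a set: the order of the characters in it does not matter.
print(text.strip('321'))          # abcd

# Only the listed characters are trimmed, and only from the ends.
print(text.strip('1'))            # 32abcd13232

# To remove 'abcd' from the middle, use replace() instead of strip().
print(text.replace('abcd', ''))   # 13213232111
```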
I have a column in my pandas DataFrame df that contains strings with some trailing hex-encoded NULLs (\x00). At least I think that's what they are. When I tried to replace them with:
df['SOPInstanceUID'] = df['SOPInstanceUID'].replace('\x00', '')
the column is not updated. When I do the same with
df['SOPInstanceUID'] = df['SOPInstanceUID'].str.replace('\x00', '')
it's working fine.
What's the difference here? (SOPInstanceUID is not an index.)
thanks
The former looks for exact matches; the latter looks for matches in any part of the string, which is why the latter works for you.
The str methods are synonymous with the standard string equivalents but are vectorised.
You did not ask replace to treat the pattern as a regex, so it required an exact match of the whole value; str.replace matches substrings, hence it worked.
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
DataFrame.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad', axis=None)
parameter: to_replace : str, regex, list, dict, Series, numeric, or None
str or regex:
str: string exactly matching to_replace will be replaced with value
regex: regexs matching to_replace will be replaced with value
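A minimal sketch of the behaviour described above, assuming pandas is installed. The DataFrame contents are made up to stand in for the SOPInstanceUID column.

```python
import pandas as pd

# Hypothetical data standing in for the SOPInstanceUID column.
df = pd.DataFrame({'SOPInstanceUID': ['1.2.840\x00\x00', '1.2.841']})

# Series.replace with a plain string only replaces cells equal to that string,
# so a cell that merely contains '\x00' is left alone.
whole_cell = df['SOPInstanceUID'].replace('\x00', '')

# With regex=True, the pattern is applied inside each cell.
as_regex = df['SOPInstanceUID'].replace('\x00', '', regex=True)

# Series.str.replace works on substrings of each cell.
substring = df['SOPInstanceUID'].str.replace('\x00', '', regex=False)

print(whole_cell.tolist())  # ['1.2.840\x00\x00', '1.2.841']
print(as_regex.tolist())    # ['1.2.840', '1.2.841']
print(substring.tolist())   # ['1.2.840', '1.2.841']
```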
They're not actually in the string: you have unescaped control characters, which Python displays using hexadecimal notation.
You can remove all non-word characters in the following way:
re.sub(r'[^\w]', '', '\x00\x00\x00\x08\x01\x008\xe6\x7f')
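One caveat with the pattern above, sketched on a made-up value: `[^\w]` removes every non-word character, which includes the dots in a dotted identifier. If only trailing NUL characters need to go, `rstrip` with an explicit character is a narrower alternative.

```python
import re

raw = '1.2.840\x00\x00'  # hypothetical value with trailing NUL characters

# [^\w] removes every non-word character, including the dots.
print(repr(re.sub(r'[^\w]', '', raw)))  # '12840'

# rstrip with an explicit character removes only the trailing NULs.
print(repr(raw.rstrip('\x00')))         # '1.2.840'
```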
I have something like this:
text = 'This text is very very long.'
replace_words = ['very','word']
for word in replace_words:
    text = text.replace('very','not very')
I would like to only replace the first 'very' or choose which 'very' gets overwritten. I'm doing this on much larger amounts of text so I want to control how duplicate words are being replaced.
text = text.replace("very", "not very", 1)
>>> help(str.replace)
Help on method_descriptor:
replace(...)
S.replace (old, new[, count]) -> string
Return a copy of string S with all occurrences of substring
old replaced by new. If the optional argument count is
given, only the first count occurrences are replaced.
text = text.replace("very", "not very", 1)
The third parameter is the maximum number of occurrences that you want to replace.
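The count argument covers "replace only the first occurrence". For the other half of the question, choosing which occurrence gets replaced, `str.replace` has no built-in option. The sketch below uses a hypothetical helper, `replace_nth` (not part of the standard library), that locates the n-th non-overlapping occurrence with `str.find` and rebuilds the string around it.

```python
def replace_nth(text, old, new, n):
    """Replace only the n-th (1-based) non-overlapping occurrence of old."""
    if not old:
        raise ValueError("old must be a non-empty string")
    if n < 1:
        raise ValueError("n must be >= 1")
    pos = 0
    for _ in range(n):
        idx = text.find(old, pos)
        if idx == -1:
            return text  # fewer than n occurrences: leave text unchanged
        pos = idx + len(old)
    # idx is the start of the n-th occurrence, pos is just past its end.
    return text[:idx] + new + text[pos:]


text = 'This text is very very long.'
print(replace_nth(text, 'very', 'not very', 1))  # This text is not very very long.
print(replace_nth(text, 'very', 'not very', 2))  # This text is very not very long.
```

Returning the text unchanged when there are fewer than n occurrences mirrors how `str.replace` behaves when nothing matches; raising an error would be an equally reasonable choice.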
From the documentation for Python:
string.replace(s, old, new[, maxreplace])
Return a copy of string s with all occurrences of substring old replaced by new. If the optional argument maxreplace is given, the first maxreplace occurrences are replaced.
From http://docs.python.org/release/2.5.2/lib/string-methods.html :
replace( old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
I haven't tried it, but I believe it works.