I want to loop through a string and when it finds an uppercase letter, I want to replace it with #. Like this:
string = "hOw Are yOu?"
for x in string:
    if x.isupper():
        string.replace(x, "#")
        print(string)
    else:
        print(string)
However, it's not working as intended and is instead outputting the same string. Do tell me if there is a way to fix this or if you'd suggest another way.
Use list comprehension with join:
In [4]: ''.join([i if not i.isupper() else '#' for i in string])
Out[4]: 'h#w #re y#u?'
You just need to assign the result back to string; see below:
string = "hOw Are yOu?"
for x in string:
    if x.isupper():
        string = string.replace(x, "#")
        print(string)
    else:
        print(string)
Strings are immutable in Python. string.replace(x, "#") must thus be string = string.replace(x, "#") to have an effect on the string.
Note that currently your code has quadratic complexity as each replace operation has to loop over the entire string in linear time. A more efficient approach would be to perform the replacements yourself, as you're already looping over every character:
string = "".join(["#" if c.isupper() else c for c in "hOw Are yOu?"])
It would be even more concise (and possibly faster) to use a very simple regex for this:
import re
string = re.sub("[A-Z]", "#", "hOw Are yOu?")
This will fail for non-ASCII alphabets, however; there you'd need Unicode properties, which Python's built-in re module does not support (the third-party regex module does).
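As a rough illustration of the non-ASCII caveat, the sketch below compares the [A-Z] pattern with a per-character check based on str.isupper(), which is Unicode-aware. The function names and the sample text are made up for this example.

```python
import re

def mask_ascii_upper(text):
    # Only the 26 ASCII capitals A-Z are matched by this character class.
    return re.sub("[A-Z]", "#", text)

def mask_any_upper(text):
    # str.isupper() is Unicode-aware, so accented capitals are caught too.
    return "".join("#" if c.isupper() else c for c in text)

# "\u00c9" is the capital E with an acute accent.
sample = "hOw Are yOu, \u00c9mile?"
print(mask_ascii_upper(sample))  # the accented capital is left untouched
print(mask_any_upper(sample))    # the accented capital becomes "#"
```

If staying with the standard library is a goal, the join-based version is probably the simpler route for non-ASCII text.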
This should do the trick!
string = "hOw Are yOu?"
for x in string:
    if x.isupper():
        string = string.replace(x, "#")
    else:
        pass
print(string)
I'm a novice as well! But from what I learned: when doing loops or if statements, you have to assign the result back to the variable you are changing, as I did in line 4 with string = string.replace(x,'#'). If not, the change will not take effect!
Example:
my_list = [1,2,3,4]
for x in my_list:
    my_list[x-1] = x + 1
print(my_list)
This is a poor example coding-wise, but it exemplifies the concept. If you don't assign to the variable, the operation won't have any effect on it!
Hope this helps!
I searched a bit but couldn't find any questions addressing my problem. Sorry if my question is repetitive. I'm trying to edit python code say to replace all -/+/= operators that don't have white space on either side.
string = 'new_str=str+"this is a quoted string-having some operators+=- within the code."'
I would use '([^\s])(=|\+|-)([^\s])' to find such operators. The problem is, I want to exclude matches inside the quoted string. Is there any way to do this with regular expression substitution?
The output I'm trying to get is:
edited_string = 'new_str = str + "this is a quoted string-having some operators+=- within the code."'
This example is just to help to understand the issue. I'm looking for an answer working on general cases.
You can do it in two steps: first add a space before the operators that have no space before them, then add a space after the operators that have no space after them:
import re
string = 'new_str=str+"this is a quoted string-having some operators+=- within the code."'
new_string = re.sub(r"(?<!\s)(\+|\=)[^\+=-]", r" \g<0>", string)
new_string = re.sub(r"(\+|\=)(?=[^\s|=|-])", r"\g<0> ", new_string)
print(new_string)
>>> new_str = str + "this is a quoted string-having some operators+=- within the code."
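For the general case, one possible sketch is to let a single pattern match quoted strings first and return them untouched, so that only operators outside quotes are re-spaced. The names TOKEN and space_operators are hypothetical, and this is a simplification: it treats any run of -, + and = as one operator, and it ignores triple-quoted strings, comments and unary minus.

```python
import re

# Alternative 1 captures a whole quoted string; alternative 2 captures a run of
# operator characters together with any whitespace around it.
TOKEN = re.compile(r'''("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')|\s*([-+=]+)\s*''')

def space_operators(code):
    def fix(match):
        if match.group(1) is not None:
            # Quoted string: give it back exactly as it was.
            return match.group(1)
        # Operator outside quotes: surround it with single spaces.
        return " " + match.group(2) + " "
    return TOKEN.sub(fix, code)

string = 'new_str=str+"this is a quoted string-having some operators+=- within the code."'
print(space_operators(string))
```

Because the quoted-string alternative is tried first at every position, the operators inside the quotes are consumed as part of the string and never reach the operator branch.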
I have a word within two opening and closing parenthesis, like this ((word)).
I want to remove the first and the last parenthesis, so they are not duplicate, in order to obtain something like this: (word).
I have tried using strip('()') on the variable that contains ((word)). However, it removes ALL parentheses at the beginning and at the end. Result: word.
Is there a way to specify that I only want the first and last one removed?
For this you could slice the string and only keep from the second character until the second to last character:
word = '((word))'
new_word = word[1:-1]
print(new_word)
Produces:
(word)
For varying numbers of parentheses, you could first count how many there are and pass this to the slice (this leaves only one bracket on each side; if you want to remove only the first and last bracket, use the first suggestion):
word ='((((word))))'
quan = word.count('(')
new_word = word[quan-1:1-quan]
print(new_word)
Produces:
(word)
You can use regex.
import re
word = '((word))'
re.findall(r'(\(?\w+\)?)', word)[0]
This only keeps one pair of brackets.
Instead, use str.replace; you would call word.replace('(', '', 1).
Normally this would replace every '(' with '', but the third argument limits the replacement to the first n occurrences of the substring given as the first argument, so only the first '(' is replaced.
From the documentation:
replace(...)
S.replace (old, new[, count]) -> string
Return a copy of string S with all occurrences of substring
old replaced by new. If the optional argument count is
given, only the first count occurrences are replaced.
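A small sketch of how the count argument could be combined with str.rpartition to drop only the first '(' and the last ')'. The helper name strip_outer_once is hypothetical, and it assumes the parentheses to remove are the first opening and the last closing one in the string.

```python
def strip_outer_once(s):
    # Remove only the first "(" thanks to count=1.
    s = s.replace("(", "", 1)
    # rpartition splits around the LAST ")"; sep is "" when none is found.
    head, sep, tail = s.rpartition(")")
    return head + tail if sep else s

print(strip_outer_once("((word))"))  # (word)
```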
You can replace the double opening and double closing parentheses, setting the count parameter to 1 for both operations:
print('((word))'.replace('((', '(', 1).replace('))', ')', 1))
But this will not work if there are more occurrences of double closing parentheses.
Reversing the string before replacing the closing ones may help:
t = '((word))'
t = t.replace('((', '(', 1)
t = t[::-1] # see string reversion topic [https://stackoverflow.com/questions/931092/reverse-a-string-in-python]
t = t.replace('))', ')', 1)
t = t[::-1] # and reverse again
I used regular expressions for this, replacing each run of brackets with a single one using the re.sub function:
import re
s="((((((word)))))))))"
t=re.sub(r"\(+","(",s)
g=re.sub(r"\)+",")",t)
print(g)
Output
(word)
Try below:
>>> import re
>>> w = '((word))'
>>> re.sub(r'([()])\1+', r'\1', w)
'(word)'
>>> w = 'Hello My ((word)) into this world'
>>> re.sub(r'([()])\1+', r'\1', w)
'Hello My (word) into this world'
>>>
Try this one:
word = "((word))"
word = word[1:len(word)-1]
print(word)
The output is (word)
I'm trying to check and see if every character in a given string is alphabetic. If it isn't, then replace whatever the character is with a blank space. What I have so far is:
def removePunctuation(txt):
    for char in txt:
        if char.isalpha() == False:
            txt.replace(char, " ")
    return txt
I've tested some input but it's not removing anything.
The Python Way:
Use join() and a generator expression like:
new_txt = ''.join(c if c.isalpha() else ' ' for c in txt)
Why did my code not work?
The basic problem with your code is that in Python strings are immutable. Which is to say, once they are built, they never change. You can discard them because you are done with them, but you cannot change them. Because of this, .replace() does not change txt. What it does do is return a new string with the characters replaced, and your loop throws that new string away.
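A minimal demonstration of that point, using made-up variable values: the original string is untouched, and the replaced text only exists in whatever the return value is bound to.

```python
txt = "a.b"
result = txt.replace(".", " ")  # returns a new string

print(txt)     # a.b  (the original is unchanged)
print(result)  # a b
```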
Using str.translate and a defaultdict is a good way to approach translation problems.
from collections import defaultdict
from string import ascii_letters
translation = defaultdict(lambda: ' ', str.maketrans(ascii_letters, ascii_letters))
print('Hello.World!'.translate(translation))
The minimum change required to make your function work is:
def removePunctuation(txt):
    for char in txt:
        if char.isalpha() == False:
            txt = txt.replace(char, " ")
    return txt
new_txt = removePunctuation(txt)
Strings are immutable, so str.replace does not alter its argument, but returns a new string.
However, Stephen Rauch's one-liner is a more Pythonic way of accomplishing this.
def removePunctuation(txt):
    return ''.join(c if c.isalpha() else ' ' for c in txt)
new_txt = removePunctuation(txt)
I want to create a list of tags from a single user-supplied input box, separated by commas, and I'm looking for some expression(s) that can help automate this.
What I want is to take the input field and:
- remove all double+ whitespace, tabs and new lines (leaving just single spaces)
- remove ALL (singles and double+) quotation marks, except for commas, of which there can be only one
- in between each comma, I want Something Like Title Case, but excluding the first word and not at all for single words, so that when the last spaces are removed, the tag comes out as 'somethingLikeTitleCase' or just 'something' or 'twoWords'
- and finally, remove all remaining spaces
Here's what I have gathered around SO so far:
def no_whitespace(s):
    """Remove all whitespace & newlines. """
    return re.sub(r"(?m)\s+", "", s)
# remove spaces, newlines, all whitespace
# http://stackoverflow.com/a/42597/523051
tag_list = ''.join(no_whitespace(tags_input))
# split into a list at comma's
tag_list = tag_list.split(',')
# remove any empty strings (since I currently don't know how to remove double comma's)
# http://stackoverflow.com/questions/3845423/remove-empty-strings-from-a-list-of-strings
tag_list = filter(None, tag_list)
I'm lost, though, when it comes to modifying that regex to remove all the punctuation except commas, and I don't even know where to begin with the capitalizing.
Any thoughts to get me going in the right direction?
As suggested, here are some sample inputs = desired_outputs
form: 'tHiS iS a tAg, 'whitespace' !&#^ , secondcomment , no!punc$$, ifNOSPACESthenPRESERVEcaps' should come out as
['thisIsATag', 'secondcomment', 'noPunc', 'ifNOSPACESthenPRESERVEcaps']
Here's an approach to the problem (that doesn't use any regular expressions, although there's one place where it could). We split up the problem into two functions: one function which splits a string into comma-separated pieces and handles each piece (parseTags), and one function which takes a string and processes it into a valid tag (sanitizeTag). The annotated code is as follows:
# This function takes a string with commas separating raw user input, and
# returns a list of valid tags made by sanitizing the strings between the
# commas.
def parseTags(str):
    # First, we split the string on commas.
    rawTags = str.split(',')
    # Then, we sanitize each of the tags. If sanitizing gives us back None,
    # then the tag was invalid, so we leave those cases out of our final
    # list of tags. We can use None as the predicate because sanitizeTag
    # will never return '', which is the only falsy string. The list() call
    # is needed on Python 3, where filter returns an iterator.
    return list(filter(None, map(sanitizeTag, rawTags)))

# This function takes a single proto-tag---the string in between the commas
# that will be turned into a valid tag---and sanitizes it. It either
# returns an alphanumeric string (if the argument can be made into a valid
# tag) or None (if the argument cannot be made into a valid tag; i.e., if
# the argument contains only whitespace and/or punctuation).
def sanitizeTag(str):
    # First, we turn non-alphanumeric characters into whitespace. You could
    # also use a regular expression here; see below.
    str = ''.join(c if c.isalnum() else ' ' for c in str)
    # Next, we split the string on spaces, ignoring leading and trailing
    # whitespace.
    words = str.split()
    # There are now three possibilities: there are no words, there was one
    # word, or there were multiple words.
    numWords = len(words)
    if numWords == 0:
        # If there were no words, the string contained only spaces (and/or
        # punctuation). This can't be made into a valid tag, so we return
        # None.
        return None
    elif numWords == 1:
        # If there was only one word, that word is the tag, no
        # post-processing required.
        return words[0]
    else:
        # Finally, if there were multiple words, we camel-case the string:
        # we lowercase the first word, capitalize the first letter of all
        # the other words and lowercase the rest, and finally stick all
        # these words together without spaces.
        return words[0].lower() + ''.join(w.capitalize() for w in words[1:])
And indeed, if we run this code, we get:
>>> parseTags("tHiS iS a tAg, \t\n!&#^ , secondcomment , no!punc$$, ifNOSPACESthenPRESERVEcaps")
['thisIsATag', 'secondcomment', 'noPunc', 'ifNOSPACESthenPRESERVEcaps']
There are two points in this code that are worth clarifying. First is the use of str.split() in sanitizeTag. This will turn ' a b c ' into ['a','b','c'], whereas str.split(' ') would produce ['','a','b','c','']. This is almost certainly the behavior you want, but there's one corner case. Consider the string tAG$. The $ gets turned into a space, and is stripped out by the split; thus, this gets turned into tAG instead of tag. This is probably what you want, but if it isn't, you have to be careful. What I would do is change that line to words = re.split(r'\s+', str), which will split the string on whitespace but leave in the leading and trailing empty strings; however, I would also change parseTags to use rawTags = re.split(r'\s*,\s*', str). You must make both these changes; 'a , b , c'.split(',') becomes ['a ', ' b ', ' c'], which is not the behavior you want, whereas r'\s*,\s*' deletes the space around the commas too. If you ignore leading and trailing white space, the difference is immaterial; but if you don't, then you need to be careful.
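The splitting behaviors described above can be checked directly. This is just a small illustration; the variable names and the padded input ' a b c ' are assumptions chosen for the example.

```python
import re

padded = " a b c "
no_arg = padded.split()                 # whitespace runs collapse, edges are dropped
on_space = padded.split(" ")            # empty strings appear at both ends
on_ws_regex = re.split(r"\s+", padded)  # same edge behavior as split(" ") here

raw = "a , b , c"
on_comma = raw.split(",")                   # spaces around the commas survive
on_comma_regex = re.split(r"\s*,\s*", raw)  # spaces around the commas are consumed

print(no_arg)          # ['a', 'b', 'c']
print(on_space)        # ['', 'a', 'b', 'c', '']
print(on_ws_regex)     # ['', 'a', 'b', 'c', '']
print(on_comma)        # ['a ', ' b ', ' c']
print(on_comma_regex)  # ['a', 'b', 'c']
```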
Finally, there's the non-use of regular expressions, and instead the use of str = ''.join(c if c.isalnum() else ' ' for c in str). You can, if you want, replace this with a regular expression. (Edit: I removed some inaccuracies about Unicode and regular expressions here.) Ignoring Unicode, you could replace this line with
str = re.sub(r'[^A-Za-z0-9]', ' ', str)
This uses [^...] to match everything but the listed characters: ASCII letters and numbers. However, it's better to support Unicode, and it's easy, too. The simplest such approach is
str = re.sub(r'\W', ' ', str, flags=re.UNICODE)
Here, \W matches non-word characters; a word character is a letter, a number, or the underscore. With flags=re.UNICODE specified (not available before Python 2.7; you can instead use r'(?u)\W' for earlier versions and 2.7), letters and numbers are both any appropriate Unicode characters; without it, they're just ASCII. If you don't want the underscore, you can add |_ to the regex to match underscores as well, replacing them with spaces too:
str = re.sub(r'\W|_', ' ', str, flags=re.UNICODE)
This last one, I believe, matches the behavior of my non-regex-using code exactly.
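As a quick sanity check of that claim, the sketch below runs both variants on one sample containing a non-ASCII letter, an underscore and punctuation. The function names and the sample are made up here, and agreeing on one input is of course not a proof of equivalence. On Python 3, str patterns are Unicode-aware by default, so the flag is redundant there but harmless.

```python
import re

def spaces_via_isalnum(text):
    # Keep alphanumeric characters, turn everything else into a space.
    return "".join(c if c.isalnum() else " " for c in text)

def spaces_via_regex(text):
    # \W matches non-word characters; |_ also catches the underscore.
    return re.sub(r"\W|_", " ", text, flags=re.UNICODE)

# "\u00ef" is an i with a diaeresis.
sample = "na\u00efve_tag!42"
print(spaces_via_isalnum(sample))
print(spaces_via_regex(sample))
```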
Also, here's how I'd write the same code without those comments; this also allows me to eliminate some temporary variables. You might prefer the code with the variables present; it's just a matter of taste.
def parseTags(str):
    # list() keeps the result a list on Python 3, where filter is lazy.
    return list(filter(None, map(sanitizeTag, str.split(','))))

def sanitizeTag(str):
    words = ''.join(c if c.isalnum() else ' ' for c in str).split()
    numWords = len(words)
    if numWords == 0:
        return None
    elif numWords == 1:
        return words[0]
    else:
        return words[0].lower() + ''.join(w.capitalize() for w in words[1:])
To handle the newly-desired behavior, there are two things we have to do. First, we need a way to fix the capitalization of the first word: lowercase the whole thing if the first letter's lowercase, and lowercase everything but the first letter if the first letter's upper case. That's easy: we can just check directly. Secondly, we want to treat punctuation as completely invisible: it shouldn't uppercase the following words. Again, that's easy—I even discuss how to handle something similar above. We just filter out all the non-alphanumeric, non-whitespace characters rather than turning them into spaces. Incorporating those changes gives us
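def parseTags(str):
    # list() keeps the result a list on Python 3, where filter is lazy.
    return list(filter(None, map(sanitizeTag, str.split(','))))

def sanitizeTag(str):
    # Drop punctuation entirely; join() works on both Python 2 and 3,
    # whereas filter() on a string only returns a string on Python 2.
    words = ''.join(c for c in str if c.isalnum() or c.isspace()).split()
    numWords = len(words)
    if numWords == 0:
        return None
    elif numWords == 1:
        return words[0]
    else:
        words0 = words[0].lower() if words[0][0].islower() else words[0].capitalize()
        return words0 + ''.join(w.capitalize() for w in words[1:])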
Running this code gives us the following output
>>> parseTags("tHiS iS a tAg, AnD tHIs, \t\n!&#^ , se#%condcomment$ , No!pUnc$$, ifNOSPACESthenPRESERVEcaps")
['thisIsATag', 'AndThis', 'secondcomment', 'NopUnc', 'ifNOSPACESthenPRESERVEcaps']
You could use a white list of characters allowed to be in a word, everything else is ignored:
import re
def camelCase(tag_str):
    words = re.findall(r'\w+', tag_str)
    nwords = len(words)
    if nwords == 1:
        return words[0] # leave unchanged
    elif nwords > 1: # make it camelCaseTag
        return words[0].lower() + ''.join(map(str.title, words[1:]))
    return '' # no word characters
This example uses \w word characters.
Example
tags_str = """ 'tHiS iS a tAg, 'whitespace' !&#^ , secondcomment , no!punc$$,
ifNOSPACESthenPRESERVEcaps' """
print("\n".join(filter(None, map(camelCase, tags_str.split(',')))))
Output
thisIsATag
whitespace
secondcomment
noPunc
ifNOSPACESthenPRESERVEcaps
I think this should work
import re

def toCamelCase(s):
    # remove all punctuation
    # modify to include other characters you may want to keep
    s = re.sub(r"[^a-zA-Z0-9\s]", "", s)
    # remove leading spaces
    s = re.sub(r"^\s+", "", s)
    # camel case
    s = re.sub(r"\s[a-z]", lambda m: m.group(0)[1].upper(), s)
    # remove all punctuation and spaces
    s = re.sub(r"[^a-zA-Z0-9]", "", s)
    return s
tag_list = [s for s in (toCamelCase(s.lower()) for s in tag_list.split(',')) if s]
The key here is to use re.sub to make the replacements you want.
EDIT : Doesn't preserve caps, but does handle uppercase strings with spaces
EDIT : Moved "if s" after the toCamelCase call