def censor(text, word):
    final_text = ''
    new_text = ''
    items = text.split()
    for i in items:
        if i == word:
            new_text = "*" * len(word)
            final_text.join(new_text)
        else:
            new_text = items
            final_text.join(new_text)
    return final_text
print censor("this hack is wack hack", "hack")
The above function is intended to censor the word "hack" in the text by replacing it with asterisks. Where is the flaw in the code? Thank you in advance.
This should be it.
def censor(text, word):
    final_text = ''
    new_text = ''
    items = text.split()
    for index, w in enumerate(items): # 'index' is the position of w in the list
        if w == word:
            new_text = "*" * len(word)
            items[index] = new_text # substituting the '*'
    final_text = ' '.join(items) # the correct way to use join
    return final_text
print censor("this hack is wack hack", "hack")
The other way:
text = 'this hack is wack hack'
word = 'hack'
print text.replace(word, '*' * len(word))
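One caveat with str.replace is that it also matches inside longer words. The following is a minimal sketch of a whole-word variant, written for Python 3 and assuming that "word" means a run of characters delimited by regex word boundaries; the name censor_whole_words is hypothetical and not part of the original answer.

```python
import re

def censor_whole_words(text, word):
    # \b anchors the match at word boundaries; re.escape guards against
    # regex metacharacters in the word being censored.
    pattern = r'\b' + re.escape(word) + r'\b'
    return re.sub(pattern, '*' * len(word), text)

print(censor_whole_words("this hack is wack hackathon", "hack"))
# this **** is wack hackathon
print("this hack is wack hackathon".replace("hack", "****"))
# this **** is wack ****athon
```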
The way join() works in Python is that you call it on the separator string (e.g. ' ', '-', ','), and pass the list of strings as the argument: separator.join(list_of_strings). It returns a new string, so the result has to be assigned.
Simple example:
>>>'-'.join(['1','human','is','a','one'])
'1-human-is-a-one'
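This also explains why the code in the question always returns an empty string: final_text.join(...) uses final_text as the separator and returns a new string, which is then thrown away. A short Python 3 sketch of that behaviour (the variable name returned is only for illustration):

```python
final_text = ''
returned = final_text.join(['a', 'b'])  # '' acts as the separator here

print(repr(final_text))  # '' (unchanged, because strings are immutable)
print(repr(returned))    # 'ab'

# Assigning the result is what keeps it:
final_text = ' '.join(['this', '****', 'is'])
print(final_text)        # this **** is
```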
I came across below mentioned scenario:
Input:-
parselTongue
Expected Output:-
parsel_tongue
My code:-
empty_string = ""
word = input()
if word.islower() == 1:
    empty_string = empty_string + word
    print(empty_string)
else:
    for char in word:
        char = str(char)
        if char.isupper() == 1:
            x = char
            y = word.find(x)
            print(char.replace(char, word[0:y] + "_" + char.lower() + word[y:]))
My output:-
parsel_tTongue
Please advise where I am going wrong: my output is "parsel_tTongue" and not "parsel_tongue".
The more elegant solution would be just to implement the logic using comprehension.
word = input()
output= ''.join(c if not c.isupper() else f'_{c.lower()}' for c in word)
#output: 'parsel_tongue'
I believe this approach could be better.
It also handles words that contain special characters or digits in addition to letters.
word = "camelCaseWord"
res = "" # snake case word
# handle 1st upper character
if word[0].isupper():
    word = word[0].lower() + word[1:]
for w in word:
    # Only a letter can be uppercase
    if w.isupper():
        res += "_" + w.lower()
    else:
        res += w
print(res)
>>> camel_case_word
With word = "camelCase3Wor&" the result is camel_case3_wor&
No need for a loop; use a regex:
import re
name = 'parselTongue'
name = re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()
print(name) # parsel_tongue
Adjust the slice on word so that the original uppercase letter is skipped (word[y+1:] instead of word[y:]):
empty_string = ""
word = input()
if word.islower() == 1:
    empty_string = empty_string + word
    print(empty_string)
else:
    for char in word:
        char = str(char)
        if char.isupper() == 1:
            x = char
            y = word.find(x)
            print(char.replace(char, word[0:y] + "_" + char.lower() + word[y+1:]))
Running it with the input parselTongue gives the following (the first line is the typed input, the second is the output):
parselTongue
parsel_tongue
The best practice may be using regex:
fooBarBaz -> foo_bar_baz
re.sub(r'([A-Z])',lambda match:'_'+match.group(1).lower(),'fooBarBaz')
foo_bar_baz -> fooBarBaz
re.sub(r'_([a-z])',lambda match:match.group(1).upper(),'foo_bar_baz')
import re
camel_case = 'miaBau'
snake_case = re.sub(r'([A-Z])', r'_\1', camel_case).lower()
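The regex variants in these answers differ in how they treat a leading capital. A small sketch comparing them, assuming the input may also be PascalCase; the function names snake_plain and snake_guarded are hypothetical labels for the two patterns shown above:

```python
import re

def snake_plain(name):
    # No start-of-string guard: every capital letter gets an underscore.
    return re.sub(r'([A-Z])', r'_\1', name).lower()

def snake_guarded(name):
    # Negative lookbehind (?<!^): no underscore is inserted at position 0.
    return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()

for name in ('parselTongue', 'ParselTongue'):
    print(name, snake_plain(name), snake_guarded(name))
# parselTongue parsel_tongue parsel_tongue
# ParselTongue _parsel_tongue parsel_tongue
```

If inputs can start with an uppercase letter, the guarded pattern is probably the safer choice.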
While working on a script to correct formatting errors from documents produced by OCR, I ran into an issue where depending on which loop I run first, the program runs about 80% slower.
Here is a simplified version of the code. I have the following loops to check for uppercase errors (e.g. "posSible"):
def fixUppercase(doc):
    fixedText = ''
    for line in doc.split('\n'):
        fixedLine = ''
        for word in line.split():
            if (
                word.isalpha()
                and (
                    word.isupper()
                    or word.istitle()
                    or word.islower()
                )
            ):
                if word == line.split()[-1]:
                    fixedLine += word + '\n'
                else:
                    fixedLine += word + ' '
            elif (
                word.isalpha()
            ):
                lower = word.lower()
                if word == line.split()[-1]:
                    fixedLine += lower + '\n'
                else:
                    fixedLine += lower + ' '
            else:
                if word == line.split()[-1]:
                    fixedLine += word + '\n'
                else:
                    fixedLine += word + ' '
        fixedText += fixedLine
    return fixedText
The following loop checks for and removes headings:
def headingsFix(doc):
    fixedText = ''
    count = 0
    stopWords = ['on', 'and', 'of', 'as', 'for']
    for line in doc.split('\n'):
        tokenLine = ''
        for word in line.split():
            if word not in stopWords:
                tokenLine += word + " "
        if tokenLine.istitle() and (
            not line.endswith('.')
            and not line.endswith(',')
            and not line.endswith(')')
            and not line.endswith(';')
            and not line.endswith(':')
        ):
            count += 1
        else:
            fixedText += line
    return fixedText
It's the loop in the fixUppercase function that slows down massively. If I run any other function or loop before it, if I run it first, or if I remove it entirely, the program is quick. The behavior is the same if both loops are part of one function.
I thought maybe another function or loop was causing the error by expanding the length of the document, but a check with len() shows same doc size either way.
headingsFix strips out all the line endings, which you presumably did not intend. However, your question is about why changing the order of transformations results in slower execution, so I'll not discuss fixing that here.
fixUppercase is extremely inefficient at handling lines with many words. It calls line.split() again for every word on the line, so the work per line grows roughly quadratically with the number of words. That isn't terribly slow if each line has maybe a dozen words, but once headingsFix has stripped the line endings, the whole book is one enormous line with tens of thousands of words, and every call re-splits that entire book-length string. I found your program runs vastly faster with this change to only split each line once. (I note that I can't say whether your program is correct as it stands, just that this change should have the same behaviour while being a lot faster. I'm afraid I don't particularly understand why it's comparing each word to see if it's the same as the last word on the line.)
def fixUppercase(doc):
    fixedText = ''
    for line in doc.split('\n'):
        line_words = line.split() # Split the line once here.
        fixedLine = ''
        for word in line_words:
            if (
                word.isalpha()
                and (
                    word.isupper()
                    or word.istitle()
                    or word.islower()
                )
            ):
                if word == line_words[-1]: # No need to split here.
                    fixedLine += word + '\n'
                else:
                    fixedLine += word + ' '
            elif (
                word.isalpha()
            ):
                lower = word.lower()
                if word == line_words[-1]: # No need to split here.
                    fixedLine += lower + '\n'
                else:
                    fixedLine += lower + ' '
            else:
                if word == line_words[-1]: # No need to split here.
                    fixedLine += word + '\n'
                else:
                    fixedLine += word + ' '
        fixedText += fixedLine
    return fixedText
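As for the comparison against the last word: it appears to be a way of deciding where to put the newline, but it is also true for any earlier word that happens to equal the last one, so a line such as "the cat and the" gets a newline after the first "the". A possible alternative, sketched here under the assumption that the goal is simply one output line per input line, is to join the words instead. The name fix_uppercase_joined is hypothetical, and unlike the original this version keeps blank lines and does not add a trailing newline.

```python
def fix_uppercase_joined(doc):
    """Lowercase mixed-case alphabetic words, one output line per input line."""
    fixed_lines = []
    for line in doc.split('\n'):
        fixed_words = []
        for word in line.split():  # each line is split exactly once
            is_clean = word.isupper() or word.istitle() or word.islower()
            if word.isalpha() and not is_clean:
                word = word.lower()  # e.g. "posSible" becomes "possible"
            fixed_words.append(word)
        fixed_lines.append(' '.join(fixed_words))
    return '\n'.join(fixed_lines)

print(fix_uppercase_joined("This is posSible\nthe cat and the"))
# This is possible
# the cat and the
```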
Here you can see my timings. I download 'Alice in Wonderland' from Project Gutenberg to use as test input.
annette#DISSONANCE:~/scratch$ wget 'https://www.gutenberg.org/files/11/11-0.txt' -O alice.txt
--2021-06-13 02:06:33-- https://www.gutenberg.org/files/11/11-0.txt
Resolving www.gutenberg.org (www.gutenberg.org)... 152.19.134.47, 2610:28:3090:3000:0:bad:cafe:47
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 174313 (170K) [text/plain]
Saving to: ‘alice.txt’
alice.txt 100%[============================================================================================================================>] 170.23K 175KB/s in 1.0s
2021-06-13 02:06:35 (175 KB/s) - ‘alice.txt’ saved [174313/174313]
annette#DISSONANCE:~/scratch$ time python slow_ocr_cleanup.py --headings-last < alice.txt > alice1.txt
real 0m0.065s
user 0m0.047s
sys 0m0.016s
annette#DISSONANCE:~/scratch$ time python slow_ocr_cleanup.py --headings-first < alice.txt > alice2.txt
^CTraceback (most recent call last):
File "slow_ocr_cleanup.py", line 117, in <module>
main()
File "slow_ocr_cleanup.py", line 106, in main
doc = fixUppercase(doc)
File "slow_ocr_cleanup.py", line 17, in fixUppercase
if word == line.split()[-1]:
KeyboardInterrupt
real 0m16.856s
user 0m8.438s
sys 0m8.375s
annette#DISSONANCE:~/scratch!1$ time python slow_ocr_cleanup.py --fixed < alice.txt > alice3.txt
real 0m0.058s
user 0m0.047s
sys 0m0.000s
As you can see, running without the fix was taking a long time so I stopped it early.
Here's the full test program:
import sys

def fixUppercase(doc):
    fixedText = ''
    for line in doc.split('\n'):
        fixedLine = ''
        for word in line.split():
            if (
                word.isalpha()
                and (
                    word.isupper()
                    or word.istitle()
                    or word.islower()
                )
            ):
                if word == line.split()[-1]:
                    fixedLine += word + '\n'
                else:
                    fixedLine += word + ' '
            elif (
                word.isalpha()
            ):
                lower = word.lower()
                if word == line.split()[-1]:
                    fixedLine += lower + '\n'
                else:
                    fixedLine += lower + ' '
            else:
                if word == line.split()[-1]:
                    fixedLine += word + '\n'
                else:
                    fixedLine += word + ' '
        fixedText += fixedLine
    return fixedText

def fixUppercaseFast(doc):
    fixedText = ''
    for line in doc.split('\n'):
        line_words = line.split()
        fixedLine = ''
        for word in line_words:
            if (
                word.isalpha()
                and (
                    word.isupper()
                    or word.istitle()
                    or word.islower()
                )
            ):
                if word == line_words[-1]:
                    fixedLine += word + '\n'
                else:
                    fixedLine += word + ' '
            elif (
                word.isalpha()
            ):
                lower = word.lower()
                if word == line_words[-1]:
                    fixedLine += lower + '\n'
                else:
                    fixedLine += lower + ' '
            else:
                if word == line_words[-1]:
                    fixedLine += word + '\n'
                else:
                    fixedLine += word + ' '
        fixedText += fixedLine
    return fixedText

def headingsFix(doc):
    fixedText = ''
    count = 0
    stopWords = ['on', 'and', 'of', 'as', 'for']
    for line in doc.split('\n'):
        tokenLine = ''
        for word in line.split():
            if word not in stopWords:
                tokenLine += word + " "
        if tokenLine.istitle() and (
            not line.endswith('.')
            and not line.endswith(',')
            and not line.endswith(')')
            and not line.endswith(';')
            and not line.endswith(':')
        ):
            count += 1
        else:
            fixedText += line
    return fixedText

def main():
    doc = sys.stdin.read()
    if '--headings-last' in sys.argv[1:]:
        doc = fixUppercase(doc)
        doc = headingsFix(doc)
    elif '--headings-first' in sys.argv[1:]:
        doc = headingsFix(doc)
        doc = fixUppercase(doc)
    elif '--fixed' in sys.argv[1:]:
        doc = headingsFix(doc)
        doc = fixUppercaseFast(doc)
    else:
        print('Specify --headings-last, --headings-first or --fixed', file=sys.stderr)
        sys.exit(1)
    print(doc, end='')

if __name__ == '__main__':
    main()
You'll note that the string concatenation isn't the source of the problem, although it's still inadvisable. In some versions of Python there's an optimisation that makes it fast, but in general you can't rely on it always working. This question and answer explain the problem in more detail, but broadly speaking, repeatedly using + or += to build larger and larger strings in a loop is inefficient because every time the whole string needs to be copied, and it's getting longer and longer as the loop goes on. It's a notorious pitfall known as Schlemiel the Painter's Algorithm. Better alternatives are to use str.join or io.StringIO.
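The two alternatives mentioned above can be sketched as follows. This is only an illustration with made-up sample data, not a drop-in change to the program above:

```python
import io

words = ['alpha', 'beta', 'gamma']

# Option 1: collect the pieces in a list and join once at the end.
parts = []
for w in words:
    parts.append(w.upper())
joined = ' '.join(parts)

# Option 2: write the pieces into an in-memory text buffer.
buf = io.StringIO()
for w in words:
    buf.write(w.upper())
    buf.write(' ')
buffered = buf.getvalue().rstrip()

print(joined)    # ALPHA BETA GAMMA
print(buffered)  # ALPHA BETA GAMMA
```

Both avoid re-copying the growing result on every iteration.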
Your fixUppercase() function basically does this:
change all alphabetical words that are not all lowercase, a proper title or all uppercase to all lowercase
However, you assume a document contains only \n and spaces as whitespace, so tabs (for example) would break your code. You could instead use a regular expression to break the document up into single whitespace characters and runs of everything else.
Your main problem is caused by the inefficiency of fixUppercase, so a solution would be to fix that.
This would do the same, but more efficiently:
import re

example="""
This is an example.

It Has:\ta fEw examples of thIngs that should be FIXED and CHANGED!

Don't touch this: a123B or this_Is_finE

Did it woRk?
"""

def fixedUpper(doc):
    p = re.compile(r'\s|([^\s]+)')
    # go through all the matches and join them back together into a string when done
    return ''.join(
        # lowercase for any alphabetic substring that does not contain whitespace and isn't a title or all uppercase
        m.group(1).lower() if not (m.group(1) is None or m.group(1).istitle() or m.group(1).isupper()) and m.group(1).isalpha()
        # in all other cases, just leave the match untouched
        else m.group(0)
        for m in p.finditer(doc)
    )

print(repr(fixedUpper(example)))
Output (note how it preserved the whitespace):
"\nThis is an example.\n\nIt Has:\ta few examples of things that should be FIXED and CHANGED!\n\nDon't touch this: a123B or this_Is_finE\n\nDid it woRk?\n"
Also note that this still has the same problem your code has: if there's punctuation at the end of a word, the word isn't fixed, as with woRk?
This is better:
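def fixedUpper(doc):
    # group 2 is the word itself, group 3 any punctuation directly after it;
    # the last alternative keeps punctuation that does not follow a word (e.g. an
    # opening quote), which finditer would otherwise skip and drop from the output
    p = re.compile(r'\s|((\w+)([^\w\s]*))|[^\w\s]+')
    return ''.join(
        m.group(1).lower()
        if not (m.group(2) is None or m.group(2).istitle() or m.group(2).isupper()) and m.group(2).isalpha()
        else m.group(0)
        for m in p.finditer(doc)
    )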
Is there another way to make exceptions when capitalizing an entire sentence? I've heard of the skip-list method, but it didn't work for my code. See below:
string = input('Enter a string: ')
i = 0
tempString = ' '.join(s[0].upper() + s[1:] for s in string.split(' '))
result = ""
for word in tempString.split():
    if i == 0:
        result = result + word + " "
    elif (len(word) <= 2):
        result = result + word.lower() + " "
    elif (word == "And" or word == "The" or word == "Not"):
        result = result + word.lower() + " "
    else:
        result = result + word + " "
    i = i + 1
print ("\n")
print (result)
Sure. Write a complete list of words that should not be title-cased ("and", "the", "or", "not", etc), and title-case everything else.
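skipwords = {'and', 'the', 'or', 'not'}  # extend as needed
words = s.split(' ')
# keep skip words as they are and title-case everything else (filtering with "if" would drop them)
result = ' '.join([words[0].title()] + [w if w in skipwords else w.title() for w in words[1:]])
Of course this will still miss Mr. Not's last name, which should be capitalized, and some stranger things like "McFinnigan" will be wrong, but language is hard. If you want better than that, you'll probably have to look into NLTK.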
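You could rewrite it like this:
skip_words = {w.capitalize(): w for w in 'a in of or to and for the'.split()}
words = string.title().split()
result = ' '.join(skip_words.get(w, w) for w in words)
result = result[:1].upper() + result[1:]  # capitalize only the first letter; str.capitalize() would lowercase the rest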
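Putting the skip-list idea together, a minimal Python 3 sketch might look like the following. The function name title_case and the exact contents of SKIP_WORDS are assumptions for illustration, and the length-based rule from the question is left out:

```python
SKIP_WORDS = {'a', 'in', 'of', 'or', 'to', 'and', 'for', 'the', 'not'}

def title_case(sentence, skip_words=SKIP_WORDS):
    result = []
    for position, word in enumerate(sentence.split()):
        lowered = word.lower()
        if position > 0 and lowered in skip_words:
            result.append(lowered)  # skip words stay lowercase, except at the start
        else:
            result.append(word[:1].upper() + word[1:])
    return ' '.join(result)

print(title_case('the lord of the rings and not the hobbit'))
# The Lord of the Rings and not the Hobbit
```

Using word[:1].upper() + word[1:] rather than str.title() leaves the rest of each word untouched, so a name like "McFinnigan" survives if it was typed that way.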
Here is my code:
def fix_capitalization(usrStr):
    newString = ''
    wordList = []
    numLetters = 0
    for s in usrStr.split('. '):
        if s[0].isupper():
            s = s.capitalize()
            s = s.replace(' i ', " I ")
            wordList.append(s)
        if s.islower():
            s = s.capitalize()
            s = s.replace(' i ', " I ")
            wordList.append(s)
            numLetters += 1
        if s[0].islower():
            s = s.capitalize()
            s = s.replace(' i ', " I ")
            wordList.append(s)
            numLetters += 1
    newString = '. '.join(wordList)
    return newString, numLetters
The string being passed in is:
i want some water. he has some.    maybe he can give me some. i think I will ask.
Note that there are 4 spaces before maybe. The result that I want is:
I want some water. He has some.    Maybe he can give me some. I think I will ask.
but I get:
I want some water. He has some.    maybe he can give me some. I think I will ask.
I know that maybe isn't being capitalized because I split on . and that sentence has more than one space after the period, but I'm not sure how I can fix this or if there's a better way to go about what I'm doing. Any help would be greatly appreciated.
In the for loop:
First find the index of the first non-space character.
Then use s[index] instead of s[0].
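The two steps above can be sketched as follows. This is a minimal Python 3 illustration that only handles the capitalization, not the letter count or the ' i ' replacement; the helper name capitalize_first_letter is hypothetical:

```python
def capitalize_first_letter(sentence):
    # Index of the first non-whitespace character
    # (equal to the number of leading whitespace characters).
    index = len(sentence) - len(sentence.lstrip())
    if index == len(sentence):
        return sentence  # empty or all whitespace: nothing to capitalize
    return sentence[:index] + sentence[index].upper() + sentence[index + 1:]

text = 'he has some.    maybe he can give me some.'
fixed = '. '.join(capitalize_first_letter(s) for s in text.split('. '))
print(fixed)  # He has some.    Maybe he can give me some.
```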
A solution using the regex sub method:
import re
def fix_capitalization(mystr):
    numLetters = 0
    newstr = []
    for s in mystr.split('. '):
        # title-case the first word, keeping any leading whitespace
        tmp = re.sub(r'^(\s*\w+)', lambda x: x.group(1).title(), s)
        newstr.append(tmp)
        # count the sentences whose first letter changed
        if s.lstrip()[0] != tmp.lstrip()[0]:
            numLetters += 1
    return '. '.join(newstr).replace(' i ', ' I '), numLetters
fix_capitalization('i want some water. he has some.    maybe he can give me some. i think I will ask.')
# returns ('I want some water. He has some.    Maybe he can give me some. I think I will ask.', 4)
A simple fix to the original code:
def fix_capitalization(usrStr):
    newString = ''
    wordList = []
    numLetters = 0
    for s in usrStr.split('. '):
        # check for leading space
        lspace = len(s) - len(s.lstrip())
        s = s.lstrip()
        if s[0].isupper():
            s = s.capitalize()
            s = s.replace(' i ', " I ")
            wordList.append(' '*lspace + s)
        if s.islower():
            s = s.capitalize()
            s = s.replace(' i ', " I ")
            wordList.append(' '*lspace + s)
            numLetters += 1
        if s[0].islower():
            s = s.capitalize()
            s = s.replace(' i ', " I ")
            wordList.append(' '*lspace + s)
            numLetters += 1
    newString = '. '.join(wordList)
    return newString, numLetters
I'm trying to write a function that will translate the input into so-called "cow Latin." I want to return the values from the if statement but whenever I do I get a syntax error. I can print the value but I want to avoid the function returning None as well.
def cow_latinify_sentence(sentence):
    vowels = tuple('aeiou1234567890!##$%^&*()-_=+|\\][}{?/.\',><`~"')
    sentence = sentence.lower()
    sentence_list = sentence.split()
    for i in range(len(sentence_list)):
        cow_word = sentence_list[i][:]
        if cow_word.startswith(vowels):
            print('{0}moo'.format(cow_word), end=' ')
        else:
            cow_word = sentence_list[i][1:] + sentence_list[i][:1]
            print('{0}oo'.format(cow_word), end=' ')
cow_latin = cow_latinify_sentence("the quick red fox")
print(cow_latin)
In short, how can I get the function to return instead of print?
def cow_latinify_sentence(sentence):
    vowels = tuple('aeiou1234567890!##$%^&*()-_=+|\\][}{?/.\',><`~"')
    sentence = sentence.lower()
    sentence_list = sentence.split()
    result = ''
    for i in range(len(sentence_list)):
        cow_word = sentence_list[i][:]
        if cow_word.startswith(vowels):
            result += ('{0}moo'.format(cow_word) + ' ')
        else:
            result += '{0}oo'.format(sentence_list[i][1:] + sentence_list[i][:1]) + ' '
    return result.strip()
>>> cow_latinify_sentence('hello there i am a fish')
'ellohoo heretoo imoo ammoo amoo ishfoo'
Why not just replace the two instances of
print('{0}moo'.format(cow_word), end=' ')
with
return '{0}moo'.format(cow_word)+' '
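You have to get rid of end=; you don't have to replace the newline that would otherwise follow the output of print, but if you want a space at the end of the returned string you still have to append it yourself. Be aware, though, that a return inside the loop ends the function at the first word, so to get the whole sentence back the pieces have to be accumulated and returned after the loop.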
You need to create a list to accumulate your results.
result = []
The two print statements in your function would need to be changed to result.append(XXXX). Then, when you have processed the entire sentence, you can
return (result)
or, to re-form it into a sentence:
return " ".join(result) + '.'
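Put together, the list-accumulating approach might look like the sketch below. It is written for Python 3 and, as a simplifying assumption, treats only the letters a, e, i, o and u as vowels (the original tuple also includes digits and punctuation); no trailing period is appended here.

```python
VOWEL_STARTS = tuple('aeiou')  # simplified assumption, see the note above

def cow_latinify_sentence(sentence):
    result = []  # one translated word per entry
    for word in sentence.lower().split():
        if word.startswith(VOWEL_STARTS):
            result.append(word + 'moo')
        else:
            # move the first letter to the end, then add 'oo'
            result.append(word[1:] + word[:1] + 'oo')
    return ' '.join(result)

print(cow_latinify_sentence('the quick red fox'))
# hetoo uickqoo edroo oxfoo
```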
def cow_latinify_sentence(sentence):
    vowels = tuple('aeiou1234567890!##$%^&*()-_=+|\\][}{?/.\',><`~"')
    sentence = sentence.lower()
    sentence_list = sentence.split()
    result = ''
    for i in range(len(sentence_list)):
        cow_word = sentence_list[i][:]
        if cow_word.startswith(vowels):
            result += '{0}moo'.format(cow_word) + ' '
        else:
            result += '{0}oo'.format(sentence_list[i][1:] + sentence_list[i][:1]) + ' '
    return result.strip()