I like some aspects of how string.capwords() behaves and some aspects of how .title() behaves, but neither one does everything I need.
I need abbreviations capitalized, which .title() does but string.capwords() does not. On the other hand, string.capwords() does not capitalize letters after single quotes, while .title() does. So I need a combination of the two: I want to use .title() and then lowercase the single letter after an apostrophe, but only if there are no spaces in between.
For example, here's a user's input:
string="it's e.t.!"
And I want to convert it to:
>>> "It's E.T.!"
.title() would capitalize the 's', and string.capwords() would not capitalize the "e.t.".
You can use regular expression substitution (See re.sub):
>>> s = "it's e.t.!"
>>> import re
>>> re.sub(r"\b(?<!')[a-z]", lambda m: m.group().upper(), s)
"It's E.T.!"
[a-z] matches a lowercase ASCII letter. The negative look-behind assertion (?<!') rejects a letter that comes directly after '. The \b requires the letter to sit at a word boundary, so the t in "it" is not matched.
The second argument to re.sub is a lambda that returns the substitution string (the uppercase version of the matched letter), which is used as the replacement.
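To make the difference between the three approaches concrete, here is a minimal sketch that runs them side by side on the input from the question. The function name smart_title is only an illustrative choice, not something from the standard library.

```python
import re
import string

def smart_title(text):
    # Uppercase a lowercase ASCII letter at a word boundary,
    # unless an apostrophe comes directly before it.
    return re.sub(r"\b(?<!')[a-z]", lambda m: m.group().upper(), text)

sample = "it's e.t.!"
print(sample.title())           # It'S E.T.!
print(string.capwords(sample))  # It's E.t.!
print(smart_title(sample))      # It's E.T.!
```

Because the pattern uses [a-z], non-ASCII lowercase letters are left alone; that may or may not matter for your input.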
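def cap_first(word):
    # Uppercase only the first character; str.capitalize() would also lowercase the rest.
    return word[:1].upper() + word[1:]
a = ".".join(cap_first(word) for word in "it's e.t.!".split("."))
b = " ".join(cap_first(word) for word in a.split(" "))
print(b)
Edited to uppercase only the first character of each piece, because str.capitalize() also lowercases the rest and would turn "e.T.!" into "E.t.!". Now it's starting to look like something usable :). But this solution doesn't work with other whitespace characters. For that I would go with falsetru's solution.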
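If you don't want to use regex, you can always use this simple for loop. Note that it uppercases every character except the two directly next to the first apostrophe, so it only suits short inputs like this one:
s = "it's e.t.!"
capital_s = ''
pos_quote = s.index("'")
for pos, alpha in enumerate(s):
    if pos not in [pos_quote-1, pos_quote+1]:
        alpha = alpha.upper()
    capital_s += alpha
print(capital_s)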
hope this helps :)
I am trying to remove a specific word, 'you', from a sentence. The code is listed below:
out1.text_condition = out1.text_condition.replace('you','')
This works, but it also removes the word from inside other words that contain it. When 'your' appears, it removes the 'you' and leaves the 'r' standing. Can anyone help me figure out how to remove just the whole word, not the same letters inside another word?
Thanks!
In order to replace whole words and not substrings, you should use a regular expression (regex).
Here is how to replace a whole word with the module re:
import re
def replace_whole_word_from_string(word, string, replacement=""):
    regular_expression = rf"\b{word}\b"
    return re.sub(regular_expression, replacement, string)
string = "you you ,you your"
result = replace_whole_word_from_string("you", string)
print(result)
Output:
, your
Explanation:
The two \b are what we call "word boundaries". The advantage over str.replace is that it will take into account the punctuation too.
In order to create the regular expression, here we use Literal String Interpolation (also called "f-strings", https://www.python.org/dev/peps/pep-0498/).
To create an "f-string", we add the prefix f.
We also use the prefix r, in order to create a "raw string". We use a raw string in order to avoid escaping the backslash in \b.
Without the prefix r, we would have written regular_expression = f"\\b{word}\\b".
If you had used string.replace(' you ', ' '), you would have received this (wrong) output:
you ,you your
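One thing the function above does not handle is a word that contains regex metacharacters, and it also leaves doubled spaces behind. A minimal sketch of one way to deal with both, assuming you want runs of spaces collapsed (the name remove_whole_word is made up for this example):

```python
import re

def remove_whole_word(word, text):
    # re.escape keeps characters such as "." or "+" in `word`
    # from being interpreted as regex syntax.
    pattern = rf"\b{re.escape(word)}\b"
    without_word = re.sub(pattern, "", text)
    # Collapse the runs of spaces left behind by the removal.
    return re.sub(r" {2,}", " ", without_word).strip()

print(remove_whole_word("you", "Are you sure your code works, you?"))
# Are sure your code works, ?
```

Whether to collapse spaces or strip the ends is a design choice; drop the second re.sub if you need the original spacing preserved.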
A very simple solution is to replace the word with spaces around it with one space:
out1.text_condition = out1.text_condition.replace(' you ', ' ')
But note that it wouldn't remove, for example, "you." (at the end of a sentence) or "you,", etc.
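The easiest way is probably just to assume there are spaces before and after the word and replace all three with a single space (replacing them with an empty string would glue the neighbouring words together):
out1.text_condition = out1.text_condition.replace(' you ', ' ')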
Hello, dear helpful people at Stack Overflow,
I have a couple of questions about manipulating a string in Python.
First question:
If I have a string like:
"What's the use?"
and I want to locate the first letter after 'the'
(in "What's the use?" that letter is u),
what is the best way to do it?
Second question:
If I want to change something in this string based on the first letter I found in the first question,
how could I do it?
Thanks for helping!
You could use a regex replacement to remove all content up to and including the first the (along with any following whitespace). Then, just access the first character of that output.
import re
inp = "What's the use?"
inp = re.sub(r'^.*?\bthe\b\s*', '', inp)
print("First character after first 'the' is: " + inp[0])
This prints:
First character after first 'the' is: u
Another re take:
import re
sample = "What is the use?"
pattern = r"""
    (?<=\bthe\b)  # look-behind to ensure 'the' is there; it is not part of the match
    \s+           # one or more whitespace characters
    (\w)          # exactly one alphanumeric or underscore character, captured as group 1
"""
# re.X enables verbose mode, which allows whitespace and comments inside the pattern
m = re.search(pattern, sample, flags=re.X)
if m is not None:
    print(f"First character after first 'the' is: {m.group(1)}")
You can find the index of 'u' by using the str.index() method. Then you can extract string before and after using slice operation.
s = "What's the use?"
character_index = s.lower().index('the ') + 4
print(character_index)
# 11
print(s[:character_index] + '*' + s[character_index+1:])
# What's the *se?
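For the second question (changing the string based on that letter), re.sub with a function as the replacement is another option. This is only a sketch: uppercasing the letter is an arbitrary example of a "change", and upper_after_the is a made-up name.

```python
import re

def upper_after_the(text):
    # Group 1 keeps "the" plus the whitespace after it;
    # group 2 is the single character we want to change.
    return re.sub(
        r"(\bthe\s+)(\w)",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
        count=1,  # only touch the first occurrence of "the"
    )

print(upper_after_the("What's the use?"))  # What's the Use?
```

Swap m.group(2).upper() for whatever transformation you need; the lambda receives the match object, so it can decide based on the letter it found.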
How do I replace a repeated letter at the start of a word with a single letter?
For instance,
string = 'hhappy'
And I want to get
happy
I tried with
re.sub(r'(.)\1+', r'\1', string)
But, this gives
hapy
Thank you!
You need to add a caret (^) to match only the start of the line.
re.sub(r'^(.)\1+', r'\1', string)
Example:
import re
string = 'hhappy'
print(re.sub(r'^(.)\1+', r'\1', string))
Prints:
happy
The above works only for the start of the line. If you need this for each word you need to do this:
re.sub(r'\b(\w)\1+', r'\1', string)
The regex would be
\b(\w)\1+
\b checks for a word boundary.
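A quick sketch of the per-word version in action. The function name dedupe_word_starts is just for illustration. Note the second call: a word that legitimately starts with a doubled letter is collapsed too, so this is only safe if your data has no such words.

```python
import re

def dedupe_word_starts(text):
    # \b anchors at the start of each word, (\w) captures its first
    # character, and \1+ consumes any immediate repeats of it.
    return re.sub(r"\b(\w)\1+", r"\1", text)

print(dedupe_word_starts("hhappy bbbirthday"))  # happy birthday
print(dedupe_word_starts("llama"))              # lama
```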
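Or you could simply slice (this removes a single repeated character, and comparing slices instead of indexes also copes with strings shorter than two characters):
string = 'hhappy'
func = lambda s: s[1:] if s[:1] == s[1:2] else s
new_string = func(string)
# happy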
I want to find the first vowel in a word, remove all the letters before it, and return the rest of the word. I thought I could use a list to do that: first find 'a' in the word and take the part after it, then do the same for 'e', and so on. But I want to simplify this with a regular expression. Is there a way to look for all five vowels at the same time and get the index of the first one? Then the next step would be easy. I am new to regular expressions; does anyone have an idea?
I have run into problems again. This is the code I wrote following the suggestion made by @Martijin.
import re
def pigify():
    user_input=raw_input()
    sentence=re.sub(r'\b([aeiou])([a-z]*)\b',r'\1\2'+'hay',user_input,re.I)
    sentence1=re.sub(r'\b(qu)([a-z]*)\b',r'\2\1'+'ay',sentence,re.I)
    sentence2=re.sub(r'\b([^aeiou]*)(\w*)\b',r'\2\1'+'ay',sentence1,re.I)
    print sentence2
    return
pigify()
If I input:
quiet askhj a dhjsadf skdhyksj qdksdj y
I would like to get:
ietquay askhjhay ahay adfdhjsay yksjskdhay qdksdjay yay
But so far I have only accomplished the first two steps: 1. find each word that starts with a vowel and add 'hay' to the end of it; 2. find each word that starts with 'qu', move the 'qu' to the end, and add 'ay'. The third step is to take the remaining words in the sentence and, for each one, find the first vowel or 'y' (when 'y' is not the first letter), move all the letters before it to the end, and add 'ay'. When I run the code, the result looks like this:
ietquayayaskhjhay ay ahay dhjsadf skdhyksj qdksdj y
I guess I didn't use \b the right way, because re.sub replaces whole matched blocks. How can I get it right? By the way, I have written another version with a 'for' loop and 'if/else'. This is the code; I think there must be a way to simplify it.
def SieveWord(user_input):
    return user_input.split(' ')
def UpperToLower(user_input):
    return user_input.lower()
vowel=['a','e','i','o','u']
transform_input=UpperToLower(raw_input())
input_list=SieveWord(transform_input)
u=[]
for word in input_list:
    if len(word)!=1:
        if word[0] in vowel:
            word+='h'
        else:
            if word[0]+word[1]=='qu':
                word=word[2:]+'qu'
            else:
                for letter in word:
                    if letter in vowel or (letter=='y' and word[0]!='y'):
                        position=word.index(letter)
                        removepart=word[0:position]
                        word=word[position:]+removepart
                        break
    elif word in vowel:
        word+='h'
    u.append(word+'ay')
for d in u:
    print d,
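One possible way to simplify this, sketched here in Python 3, is a single re.sub call with a function as the replacement, so that each word is transformed exactly once. Worth knowing when reading the regex attempt above: the fourth positional parameter of re.sub is count, so a flag passed in that position is interpreted as a replacement limit rather than as a flag. The names pig_word and pigify are illustrative, and the handling of 'y' (any 'y' that is not the first letter) is my reading of the rules described in the question.

```python
import re

def pig_word(match):
    word = match.group()
    if word[0] in "aeiou":
        return word + "hay"
    if word.startswith("qu"):
        return word[2:] + "quay"
    # First vowel, or a 'y' that is not the first letter of the word.
    split = re.search(r"[aeiou]|(?<=.)y", word)
    if split is None:
        return word + "ay"
    return word[split.start():] + word[:split.start()] + "ay"

def pigify(sentence):
    # Lowercase first, as the loop version does, then rewrite each word once.
    return re.sub(r"[a-z]+", pig_word, sentence.lower())

print(pigify("quiet askhj a dhjsadf skdhyksj qdksdj y"))
# ietquay askhjhay ahay adfdhjsay yksjskdhay qdksdjay yay
```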
You can use a regular expression to remove all non-vowels at the start of a word:
re.sub(r'\b[^aeoui]*', '', inputstring, flags=re.I)
Demo:
>>> import re
>>> inputstring = 'School'
>>> re.sub(r'\b[^aeoui]*', '', inputstring, flags=re.I)
'ool'
The [^...] negative class matches anything that is not a vowel (with the re.I flag making sure it'll ignore case). The \b anchor matches the position in a string just before or after a word. In the example above, \b matches the start, and the negative class matches the Sch characters, as they are not in the class.
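Since the question also asks for the index of the first vowel, here is a minimal sketch that uses re.search and slices at match.start(). The function name is made up, and returning the word unchanged when it has no vowel is an assumption about what you would want.

```python
import re

def strip_before_first_vowel(word):
    # re.search finds the leftmost vowel; match.start() is its index.
    match = re.search(r"[aeiou]", word, flags=re.I)
    if match is None:
        return word  # assumption: a word with no vowel is returned unchanged
    return word[match.start():]

print(strip_before_first_vowel("School"))  # ool
```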
I would like to convert strings like 'HDMWhoSomeThing' to 'HDM Who Some Thing' with regex.
So I would like to extract words which start with an upper-case letter or consist of upper-case letters only. Notice that in the string 'HDMWho' the last upper-case letter is in fact the first letter of the word Who, and should not be included in the word HDM.
What is the correct regex to achieve this goal? I have tried many regexes similar to [A-Z][a-z]+ but without success. [A-Z][a-z]+ gives me 'Who Some Thing', without 'HDM' of course.
Any ideas?
Thanks,
Rukki
#! /usr/bin/env python
import re
from collections import deque
pattern = r'([A-Z]{2,}(?=[A-Z]|$)|[A-Z](?=[a-z]|$))'
chunks = deque(re.split(pattern, 'HDMWhoSomeMONKEYThingXYZ'))
result = []
while len(chunks):
    buf = chunks.popleft()
    if len(buf) == 0:
        continue
    # A lone capital is the first letter of a word; glue the rest of the word onto it
    if re.match(r'^[A-Z]$', buf) and len(chunks):
        buf += chunks.popleft()
    result.append(buf)
print(' '.join(result))
Output:
HDM Who Some MONKEY Thing XYZ
Judging by lines of code, this task is a much more natural fit with re.findall:
pattern = r'([A-Z]{2,}(?=[A-Z]|$)|[A-Z][a-z]*)'
print(' '.join(re.findall(pattern, 'HDMWhoSomeMONKEYThingX')))
Output:
HDM Who Some MONKEY Thing X
Try to split with this regular expression:
/(?=[A-Z][a-z])/
And if your regular expression engine does not support splitting empty matches, try this regular expression to put spaces between the words:
/([A-Z])(?![A-Z])/
Replace it with " $1" (space plus match of the first group). Then you can split at the space.
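In Python, both suggestions could look roughly like the sketch below. Splitting on an empty (lookahead-only) match needs Python 3.7 or later, and the re module writes the group reference as \1 rather than $1.

```python
import re

s = "HDMWhoSomeThing"

# Split at every position that is followed by an uppercase letter
# and then a lowercase letter (Python 3.7+).
words = re.split(r"(?=[A-Z][a-z])", s)
print(words)   # ['HDM', 'Who', 'Some', 'Thing']

# Substitution variant: put a space before each uppercase letter
# that is not followed by another uppercase letter.
spaced = re.sub(r"([A-Z])(?![A-Z])", r" \1", s).strip()
print(spaced)  # HDM Who Some Thing
```

The strip() call is there because an input that begins with a matching letter would otherwise gain a leading space.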
One-liner:
' '.join(a or b for a,b in re.findall(r'([A-Z][a-z]+)|(?:([A-Z]*)(?=[A-Z]))',s))
using regexp
([A-Z][a-z]+)|(?:([A-Z]*)(?=[A-Z]))
So 'words' in this case are:
Any number of uppercase letters - unless the last uppercase letter is followed by a lowercase letter.
One uppercase letter followed by any number of lowercase letters.
so try:
([A-Z]+(?![a-z])|[A-Z][a-z]*)
The first alternation includes a negative lookahead (?![a-z]), which handles the boundary between an all-caps word and an initial caps word.
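A short sketch of that pattern used with re.findall. The outer capture group is dropped here because findall already returns the whole match when the pattern has no groups; split_camel is just an illustrative name.

```python
import re

# All-caps run not followed by a lowercase letter, or one capital
# followed by any number of lowercase letters.
WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z][a-z]*")

def split_camel(text):
    return " ".join(WORD.findall(text))

print(split_camel("HDMWhoSomeThing"))           # HDM Who Some Thing
print(split_camel("HDMWhoSomeMONKEYThingXYZ"))  # HDM Who Some MONKEY Thing XYZ
```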
Maybe '[A-Z]*?[A-Z][a-z]+'?
Edit: This seems to work: [A-Z]{2,}(?![a-z])|[A-Z][a-z]+
import re
def find_stuff(text):
    # Parameter renamed from str so it no longer shadows the built-in
    p = re.compile(r'[A-Z]{2,}(?![a-z])|[A-Z][a-z]+')
    m = p.findall(text)
    print(' '.join(m))
find_stuff('HDMWhoSomeThing')
find_stuff('SomeHDMWhoThing')
Prints out:
HDM Who Some Thing
Some HDM Who Thing