s = 'the brown fox'
...do something here...
s should be:
'The Brown Fox'
What's the easiest way to do this?
The .title() method of a string (either ASCII or Unicode is fine) does this:
>>> "hello world".title()
'Hello World'
>>> u"hello world".title()
u'Hello World'
However, look out for strings with embedded apostrophes, as noted in the docs.
The algorithm uses a simple language-independent definition of a word as groups of consecutive letters. The definition works in many contexts but it means that apostrophes in contractions and possessives form word boundaries, which may not be the desired result:
>>> "they're bill's friends from the UK".title()
"They'Re Bill'S Friends From The Uk"
The .title() method doesn't handle apostrophes well:
>>> "they're bill's friends from the UK".title()
"They'Re Bill'S Friends From The Uk"
Try string.capwords() instead:
>>> import string
>>> string.capwords("they're bill's friends from the UK")
"They're Bill's Friends From The Uk"
From the Python documentation on capwords:
Split the argument into words using str.split(), capitalize each word using str.capitalize(), and join the capitalized words using str.join(). If the optional second argument sep is absent or None, runs of whitespace characters are replaced by a single space and leading and trailing whitespace are removed, otherwise sep is used to split and join the words.
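As a rough illustration of that description, the default behaviour (no sep argument) can be approximated as below. The name capwords_sketch is hypothetical, and this is only a sketch of the split/capitalize/join steps the documentation lists, not the library's source.

```python
import string

def capwords_sketch(s):
    # Hypothetical helper: split on runs of whitespace, capitalize each
    # word, and rejoin with single spaces (covers the sep=None case only).
    return " ".join(word.capitalize() for word in s.split())

sample = "  they're   bill's friends from the UK "
print(capwords_sketch(sample))
print(capwords_sketch(sample) == string.capwords(sample))
```

Note that str.capitalize() lowercases the rest of each word, which is why "UK" comes back as "Uk".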
Just because this sort of thing is fun for me, here are two more solutions.
Split into words, initial-cap each word from the split groups, and rejoin. This will change the white space separating the words into a single white space, no matter what it was.
s = 'the brown fox'
lst = [word[0].upper() + word[1:] for word in s.split()]
s = " ".join(lst)
EDIT: I don't remember what I was thinking back when I wrote the above code, but there is no need to build an explicit list; we can use a generator expression to do it in lazy fashion. So here is a better solution:
s = 'the brown fox'
s = ' '.join(word[0].upper() + word[1:] for word in s.split())
Use a regular expression to match the beginning of the string, or white space separating words, plus a single non-whitespace character; use parentheses to mark "match groups". Write a function that takes a match object, and returns the white space match group unchanged and the non-whitespace character match group in upper case. Then use re.sub() to replace the patterns. This one does not have the punctuation problems of the first solution, nor does it redo the white space like my first solution. This one produces the best result.
import re
def repl_func(m):
    """Process regular expression match groups for word upper-casing problem."""
    return m.group(1) + m.group(2).upper()
s = 'the brown fox'
s = re.sub(r"(^|\s)(\S)", repl_func, s)
>>> re.sub(r"(^|\s)(\S)", repl_func, "they're bill's friends from the UK")
"They're Bill's Friends From The UK"
I'm glad I researched this answer. I had no idea that re.sub() could take a function! You can do nontrivial processing inside re.sub() to produce the final result!
Here is a summary of different ways to do it, and some pitfalls to watch out for
They will work for all these inputs:
"" => ""
"a b c" => "A B C"
"foO baR" => "FoO BaR"
"foo    bar" => "Foo    Bar"
"foo's bar" => "Foo's Bar"
"foo's1bar" => "Foo's1bar"
"foo 1bar" => "Foo 1bar"
Splitting the sentence into words and capitalizing the first letter then join it back together:
# Be careful with multiple spaces, and empty strings
# for empty words w[0] would cause an index error,
# but with w[:1] we get an empty string as desired
def cap_sentence(s):
    return ' '.join(w[:1].upper() + w[1:] for w in s.split(' '))
Without splitting the string, checking blank spaces to find the start of a word:
def cap_sentence(s):
    return ''.join((c.upper() if i == 0 or s[i-1] == ' ' else c) for i, c in enumerate(s))
Or using generators:
# Iterate through each of the characters in the string
# and capitalize the first char and any char after a blank space
from itertools import chain
def cap_sentence(s):
    return ''.join((c.upper() if prev == ' ' else c) for c, prev in zip(s, chain(' ', s)))
Using regular expressions, from steveha's answer:
# match the beginning of the string or a space, followed by a non-space
import re
def cap_sentence(s):
    return re.sub(r"(^|\s)(\S)", lambda m: m.group(1) + m.group(2).upper(), s)
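The variants above can be checked against the listed inputs with a small sketch like the following. The names cap_split and cap_regex are hypothetical labels for the split-based and regex-based versions, and the multi-space case is written here with four spaces as an assumption.

```python
import re

def cap_split(s):
    # Split on single spaces so runs of spaces survive the round trip.
    return ' '.join(w[:1].upper() + w[1:] for w in s.split(' '))

def cap_regex(s):
    # Upper-case any non-space character at the start or after whitespace.
    return re.sub(r"(^|\s)(\S)", lambda m: m.group(1) + m.group(2).upper(), s)

# Inputs and expected outputs from the list of test cases.
cases = {
    "": "",
    "a b c": "A B C",
    "foO baR": "FoO BaR",
    "foo    bar": "Foo    Bar",
    "foo's bar": "Foo's Bar",
    "foo's1bar": "Foo's1bar",
    "foo 1bar": "Foo 1bar",
}

for text, expected in cases.items():
    assert cap_split(text) == expected
    assert cap_regex(text) == expected
print("all cases pass")
```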
Now, these are some other answers that were posted, and inputs for which they don't work as expected if we define a word as being the start of the sentence or anything after a blank space:
.title()
return s.title()
# Undesired outputs:
"foO baR" => "Foo Bar"
"foo's bar" => "Foo'S Bar"
"foo's1bar" => "Foo'S1Bar"
"foo 1bar" => "Foo 1Bar"
.capitalize() or .capwords()
return ' '.join(w.capitalize() for w in s.split())
# or
import string
return string.capwords(s)
# Undesired outputs:
"foO baR" => "Foo Bar"
"foo    bar" => "Foo Bar"
using ' ' for the split will fix the second output, but not the first
return ' '.join(w.capitalize() for w in s.split(' '))
# or
import string
return string.capwords(s, ' ')
# Undesired outputs:
"foO baR" => "Foo Bar"
.upper()
Be careful with multiple blank spaces, this gets fixed by using ' ' for the split (like shown at the top of the answer)
return ' '.join(w[0].upper() + w[1:] for w in s.split())
# Undesired outputs:
"foo    bar" => "Foo Bar"
Why do you complicate your life with joins and for loops when the solution is simple and safe??
Just do this:
string = "the brown fox"
string[0].upper()+string[1:]
Copy-paste-ready version of @jibberia's answer:
def capitalize(line):
    return ' '.join(s[:1].upper() + s[1:] for s in line.split(' '))
If you only want the first letter capitalized:
>>> 'hello world'.capitalize()
'Hello world'
But to capitalize each word:
>>> 'hello world'.title()
'Hello World'
If str.title() doesn't work for you, do the capitalization yourself.
Split the string into a list of words
Capitalize the first letter of each word
Join the words into a single string
One-liner:
>>> ' '.join([s[0].upper() + s[1:] for s in "they're bill's friends from the UK".split(' ')])
"They're Bill's Friends From The UK"
Clear example:
input = "they're bill's friends from the UK"
words = input.split(' ')
capitalized_words = []
for word in words:
    title_case_word = word[0].upper() + word[1:]
    capitalized_words.append(title_case_word)
output = ' '.join(capitalized_words)
An empty string raises an IndexError when you access [0] (slicing with [1:] is safe). Therefore I would use:
def my_uppercase(title):
    if not title:
        return ''
    return title[0].upper() + title[1:]
to uppercase the first letter only.
Although the existing answers are already satisfactory, I'll try to cover two extra cases along with all the previous ones.
If the spaces are not uniform and you want to keep them as they are:
string = hello world i am here.
If not every word starts with a letter:
string = 1 w 2 r 3g
Here you can use this:
def solve(s):
    a = s.split(' ')
    for i in range(len(a)):
        a[i] = a[i].capitalize()
    return ' '.join(a)
This will give you:
output = Hello World I Am Here
output = 1 W 2 R 3g
As Mark pointed out, you should use .title():
"MyAwesomeString".title()
However, if you would like to make the first letter uppercase inside a Django template, you could use this:
{{ "MyAwesomeString"|title }}
Or using a variable:
{{ myvar|title }}
The suggested method str.title() does not work in all cases.
For example:
string = "a b 3c"
string.title()
> "A B 3C"
instead of "A B 3c".
I think, it is better to do something like this:
def capitalize_words(string):
    words = string.split(" ")  # just change the split(" ") method
    return ' '.join([word.capitalize() for word in words])
capitalize_words(string)
>'A B 3c'
To capitalize words:
s = "this is string example.... wow!!!"
print("s.title() : ", s.title())
Following @Gary02127's comment, the solution below also handles words containing apostrophes:
import re
def titlecase(s):
    return re.sub(r"[A-Za-z]+('[A-Za-z]+)?", lambda mo: mo.group(0)[0].upper() + mo.group(0)[1:].lower(), s)
text = "He's an engineer, isn't he? SnippetBucket.com "
print(titlecase(text))
You can try this. It is simple and neat.
def cap_each(string):
    list_of_words = string.split(" ")
    # enumerate gives the position directly, instead of searching with index()
    for i, word in enumerate(list_of_words):
        list_of_words[i] = word.capitalize()
    return " ".join(list_of_words)
Don't overlook the preservation of white space. If you want to process 'fred  flinstone' and you get 'Fred Flinstone' instead of 'Fred  Flinstone', you've corrupted your white space. Some of the above solutions will lose white space. Here's a solution that's good for Python 2 and 3 and preserves white space.
import re
def propercase(s):
    return ''.join(map(str.capitalize, re.split(r'(\s+)', s)))
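To see the whitespace-preserving behaviour next to a split-and-join approach, here is a small sketch. The sample strings are made up for illustration, and collapse_case is a hypothetical name for the lossy variant.

```python
import re

def propercase(s):
    # The capturing group makes re.split keep the whitespace runs, so
    # they pass through str.capitalize unchanged and are rejoined as-is.
    return ''.join(map(str.capitalize, re.split(r'(\s+)', s)))

def collapse_case(s):
    # Hypothetical lossy variant: split() discards the original spacing.
    return ' '.join(w.capitalize() for w in s.split())

print(repr(propercase('fred   flinstone')))
print(repr(collapse_case('fred   flinstone')))
print(repr(propercase('the\tbrown  fox\n')))
```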
The .title() method won't work in all test cases. One alternative combines .capitalize(), .replace() and .split(). Be aware that .replace() substitutes every occurrence of a word, including occurrences inside longer words, so this only suits input where no word is part of another:
def caps(y):
    k = y.split()
    for i in k:
        y = y.replace(i, i.capitalize())
    return y
You can use title() method to capitalize each word in a string in Python:
string = "this is a test string"
capitalized_string = string.title()
print(capitalized_string)
Output:
This Is A Test String
A quick function worked for Python 3
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> capitalizeFirtChar = lambda s: s[:1].upper() + s[1:]
>>> print(capitalizeFirtChar('помните своих Предковъ. Сражайся за Правду и Справедливость!'))
Помните своих Предковъ. Сражайся за Правду и Справедливость!
>>> print(capitalizeFirtChar('хай живе вільна Україна! Хай живе Любовь поміж нас.'))
Хай живе вільна Україна! Хай живе Любовь поміж нас.
>>> print(capitalizeFirtChar('faith and Labour make Dreams come true.'))
Faith and Labour make Dreams come true.
Capitalize string with non-uniform spaces
I would like to add to @Amit Gupta's point about non-uniform spaces:
From the original question, we would like to capitalize every word in the string s = 'the brown fox'. What if the string were s = 'the brown      fox', with non-uniform spaces?
def solve(s):
    # If you want to maintain the spaces in the string, s = 'the brown      fox',
    # use s.split(' ') instead of s.split().
    # s.split() returns ['the', 'brown', 'fox']
    # while s.split(' ') returns ['the', 'brown', '', '', '', '', '', 'fox']
    capitalized_word_list = [word.capitalize() for word in s.split(' ')]
    return ' '.join(capitalized_word_list)
Easiest solution for your question, it worked in my case:
import string
def solve(s):
    return string.capwords(s, ' ')
s = input()
res = solve(s)
print(res)
Another oneline solution could be:
" ".join(map(lambda d: d.capitalize(), word.split(' ')))
In case you want to lowercase every field instead:
from csv import reader  # assumption: reader here is csv.reader
# Assuming input_file holds the path of the file you are opening
with open(input_file) as file:
    lines = [x for x in reader(file) if x]
# for loop to parse the file by line
for line in lines:
    name = [x.strip().lower() for x in line if x]
    print(name)  # Check the result
I really like this answer:
Copy-paste-ready version of @jibberia's answer:
def capitalize(line):
    return ' '.join([s[0].upper() + s[1:] for s in line.split(' ')])
But some of the lines that I was sending produced empty '' strings when split, which caused errors when evaluating s[0]. There is probably a better way to do this, but I had to add an if len(s)>0, as in
    return ' '.join([s[0].upper() + s[1:] for s in line.split(' ') if len(s)>0])
So I am trying to turn a string with multiple separators into a list, but dictated by where the separators are:
ex: ("Hooray! Finally, we're done.", "!,") to be converted to: ['Hooray', ' Finally', " we're done."] based upon the separators given.
As you can see, the string is split into a list based on the separators. My closest attempt:
for ch in separators:
    original = ' '.join(original.split(ch))
return(original.split())
when I do this I get the result:
['Hooray', 'Finally', "we're", 'done.']
but I need to have " we're done" as one element of the list, not separated.
I got a suggestion to use a string accumulator, but I don't see how it helps to solve the issue
Thanks
Just do this, using re.split:
>>> import re
>>> original = "Hooray! Finally, we're done."
>>> re.split('!|,', original)
['Hooray', ' Finally', " we're done."]
EDIT:
Without regular expressions, you need a custom function such as:
def multisplit(s, delims):
    pos = 0
    for i, c in enumerate(s):
        if c in delims:
            yield s[pos:i]
            pos = i + 1
    yield s[pos:]
And then use it like so (it is a generator, so wrap it in list()):
>>> original = "Hooray! Finally, we're done."
>>> list(multisplit(original, '!,'))
['Hooray', ' Finally', " we're done."]
You can use re.split with an appropriate expression:
>>> data=("Hooray! Finally, we're done.", "!,")
>>> re.split("[%s]" % re.escape(data[1]), data[0])
['Hooray', ' Finally', " we're done."]
This splits the first element of the tuple on every character in the tuple's second element.
The regex is built from the tuple's second string, with all regex special characters properly escaped. [some chars] means that every character inside the square brackets acts as a separator.
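A short sketch of why the escaping matters: separators such as ']', '^' or '-' have a special meaning inside a character class, and re.escape neutralises them. The helper name split_on_any and the second sample string are made up for illustration.

```python
import re

def split_on_any(text, separators):
    # Hypothetical helper: build a character class from the separators,
    # escaping anything that would otherwise change the class's meaning.
    pattern = "[%s]" % re.escape(separators)
    return re.split(pattern, text)

print(split_on_any("Hooray! Finally, we're done.", "!,"))
print(split_on_any("a-b]c^d", "-]^"))
```

An empty separators string would produce the invalid pattern "[]", so a caller would need to handle that case separately.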
A simple solution without regular expressions could be:
def split_on_separators(word, separators):
    word_list = [word]
    auxList = []
    for sep in separators:
        for w in word_list:
            auxList.extend(w.split(sep))
        word_list = auxList
        auxList = list()
    return word_list
example = "Hooray! Finally, we're done."
separators = '!,'
split_on_separators(example, separators)
Out[49]: ['Hooray', ' Finally', " we're done."]
Change the string so all the separators are the same, then split on that one:
def separate(words, separators):
    sep0 = separators[0]
    for sep in separators[1:]:
        words = words.replace(sep, sep0)
    return words.split(sep0)
I have a string s with nested brackets: s = "AX(p>q)&E((-p)Ur)"
I want to remove all characters between all pairs of brackets and store in a new string like this: new_string = AX&E
I tried doing this:
p = re.compile(r"\(.*?\)", re.DOTALL)
new_string = p.sub("", s)
It gives output: AX&EUr)
Is there any way to correct this, rather than iterating each element in the string?
Another simple option is removing the innermost parentheses at every stage, until there are no more parentheses:
p = re.compile(r"\([^()]*\)")
count = 1
while count:
    s, count = p.subn("", s)
Working example: http://ideone.com/WicDK
You can just use string manipulation without regular expression
>>> s = "AX(p>q)&E(qUr)"
>>> [ i.split("(")[0] for i in s.split(")") ]
['AX', '&E', '']
I leave it to you to join the strings up.
>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> re.compile("""\([^\)]*\)""").sub('', s)
'AX&E'
Yeah, it should be:
>>> import re
>>> s = "AX(p>q)&E(qUr)"
>>> p = re.compile("\(.*?\)", re.DOTALL)
>>> new_string = p.sub("", s)
>>> new_string
'AX&E'
Nested brackets (or tags, ...) cannot be handled in a general way using regex. See http://www.amazon.de/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124/ref=sr_1_1?ie=UTF8&s=gateway&qid=1304230523&sr=8-1-spell for details on why. You would need a real parser.
It's possible to construct a regex which can handle two levels of nesting, but it is already ugly, and three levels will be quite long. And you don't want to think about four levels. ;-)
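For this particular task a full parser may be more than is needed. As a minimal sketch, assuming the only goal is to drop everything inside parentheses, a single pass with a depth counter copes with arbitrary nesting. The function name strip_parenthesised is hypothetical.

```python
def strip_parenthesised(text):
    # Hypothetical helper: track nesting depth and keep only characters
    # seen while outside every pair of parentheses.
    depth = 0
    kept = []
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            if depth > 0:  # a stray ')' at depth 0 is simply dropped
                depth -= 1
        elif depth == 0:
            kept.append(ch)
    return ''.join(kept)

print(strip_parenthesised("AX(p>q)&E((-p)Ur)"))
```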
You can use PyParsing to parse the string:
from pyparsing import nestedExpr
import sys
s = "AX(p>q)&E((-p)Ur)"
expr = nestedExpr('(', ')')
result = expr.parseString('(' + s + ')').asList()[0]
s = ''.join(filter(lambda x: isinstance(x, str), result))
print(s)
Most code is from: How can a recursive regexp be implemented in python?
You could use re.subn():
import re
s = 'AX(p>q)&E((-p)Ur)'
while True:
    s, n = re.subn(r'\([^)(]*\)', '', s)
    if n == 0:
        break
print(s)
Output
AX&E
this is just how you do it:
# strings
# double and single quotes use in Python
"hey there! welcome to CIP"
'hey there! welcome to CIP'
"you'll understand python"
'i said, "python is awesome!"'
'i can\'t live without python'
# use of 'r' before string
print(r"\new code", "\n")
first = "code in"
last = "python"
first + last #concatenation
# slicing of strings
user = "code in python!"
print(user)
print(user[5]) # print an element
print(user[-3]) # print an element from rear end
print(user[2:6]) # slicing the string
print(user[:6])
print(user[2:])
print(len(user)) # length of the string
print(user.upper()) # convert to uppercase
print(user.lstrip())
print(user.rstrip())
print(max(user)) # max alphabet from user string
print(min(user)) # min alphabet from user string
print(user.join(["1", "2", "3", "4"]))  # join needs strings, not integers
input()
The parameter to the function satisfy these rules:
It does not have any leading whitespace
It might have trailing whitespaces
There might be interleaved whitespaces in the string.
Goal: remove duplicate whitespaces that are interleaved & strip trailing whitespaces.
This is how I am doing it now:
# toks - a priori no leading space
def squeeze(toks):
    import re
    p = re.compile(r'\W+')
    a = p.split( toks )
    for i in range(0, len(a)):
        if len(a[i]) == 0:
            del a[i]
    return ' '.join(a)
>>> toks( ' Mary Decker is hot ' )
Mary Decker is hot
Can this be improved ? Pythonic enough ?
This is how I would do it:
" ".join(toks.split())
PS. Is there a subliminal message in this question? ;-)
Can't you use rstrip()?
some_string.rstrip()
or strip() for stripping the string from both sides?
In addition: the strip() methods also accept an arbitrary set of characters to strip:
string.strip = strip(s, chars=None)
strip(s [,chars]) -> string
Related: if you need to strip whitespaces in-between: split the string, strip the terms and re-join it.
Reading the API helps!
To answer your questions literally:
Yes, it could be improved. The first improvement would be to make it work.
>>> squeeze('x ! y')
'x y' # oops
Problem 1: You are using \W+ (non-word characters) when you should be using \s+ (whitespace characters)
>>> toks = 'x ! y z '
>>> re.split('\W+', toks)
['x', 'y', 'z', '']
>>> re.split('\s+', toks)
['x', '!', 'y', 'z', '']
Problem 2: The loop to delete empty strings works, but only by accident. If you wanted a general-purpose loop to delete empty strings in situ, you would need to work backwards, otherwise your subscript i would get out of whack with the number of elements remaining. It works here because re.split() without a capturing group can produce empty elements only at the start and end. You have defined away the start problem, and the end case doesn't cause a problem because there have been no prior deletions. So you are left with a very ugly loop which could be replaced by two lines:
if a and not a[-1]:  # guard against empty list
    del a[-1]
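For comparison, a general-purpose in-place deletion that works backwards, as described above, might look like this sketch. The helper name drop_empty_in_place and the sample list are made up for illustration.

```python
def drop_empty_in_place(items):
    # Hypothetical helper: walk the indices from the end so that deleting
    # an element never shifts the positions still to be visited.
    for i in range(len(items) - 1, -1, -1):
        if not items[i]:
            del items[i]
    return items

parts = ['', 'x', '', '', 'y', '']
print(drop_empty_in_place(parts))
```

Iterating forwards over range(len(items)) while deleting would skip elements and eventually index past the end of the shortened list.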
However unless your string is very long and you are worried about speed (in which case you probably shouldn't be using re), you'd probably want to allow for leading whitespace (assertions like "my data doesn't have leading whitespace" are ignored by convention) and just do it in a loop on the fly:
a = [x for x in p.split(toks) if x]
Next step is to avoid building the list a:
return ' '.join(x for x in p.split(toks) if x)
Now you did mention "Pythonic" ... so let's throw out all that re import and compile overhead stuff, and the genxp and just do this:
return ' '.join(toks.split())
Well, I tend not to use the re module if I can do the job reasonably with
the built-in functions and features. For example:
def toks(s):
    return ' '.join([x for x in s.split(' ') if x])
... seems to accomplish the same goal with only the built-in split, join, and a list comprehension to filter out empty elements of the split string.
Is that more "Pythonic?" I think so. However my opinion is hardly authoritative.
This could be done as a lambda expression as well; and I think that would not be Pythonic.
Incidentally this assumes that you want to ONLY squeeze out duplicate spaces and trim leading and trailing spaces. If your intent is to munge all whitespace sequences into single spaces (and trim leading and trailing) then change s.split(' ') to s.split() -- passing no argument, or None, to the split() method is different than passing it a space.
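The difference between the two calls can be seen in a small sketch; the sample text, with its tab and newline, is made up for illustration.

```python
text = "  Mary \t Decker   is\nhot  "

# Splitting on a literal space leaves empty strings for repeated spaces
# and keeps tabs and newlines inside the pieces.
print(text.split(' '))

# Splitting with no argument treats any run of whitespace as one separator
# and drops leading and trailing whitespace.
print(text.split())

spaces_only = ' '.join(x for x in text.split(' ') if x)
all_whitespace = ' '.join(text.split())
print(repr(spaces_only))
print(repr(all_whitespace))
```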
To make your code more Pythonic, note that since each a[i] is a string, rather than deleting a[i] when a[i] == '', it is better to keep a[i] when a[i] != ''.
So, instead of
def squeeze(toks):
    import re
    p = re.compile(r'\W+')
    a = p.split( toks )
    for i in range(0, len(a)):
        if len(a[i]) == 0:
            del a[i]
    return ' '.join(a)
write
def squeeze(toks):
    import re
    p = re.compile(r'\W+')
    a = p.split( toks )
    a = [x for x in a if x]
    return ' '.join(a)
and then
def squeeze(toks):
    import re
    p = re.compile(r'\W+')
    return ' '.join([x for x in p.split( toks ) if x])
Then, taking account that a function can receive a generator as well as a list:
def squeeze(toks):
    import re
    p = re.compile(r'\W+')
    return ' '.join((x for x in p.split( toks ) if x))
and that doubling parentheses isn't obligatory:
def squeeze(toks):
    import re
    p = re.compile(r'\W+')
    return ' '.join(x for x in p.split( toks ) if x)
Additionally, instead of making Python execute the import statement each time squeeze() is called (which is what the code above does), it would be better to import re once at module level and pass it as a default argument:
import re
def squeeze(toks, re=re):
    p = re.compile(r'\W+')
    return ' '.join(x for x in p.split( toks ) if x)
and, even better:
import re
def squeeze(toks, p=re.compile(r'\W+')):
    return ' '.join(x for x in p.split( toks ) if x)
Remark: the if x part in the expression is only needed to drop the leading '' and the trailing '' that occur in the list p.split( toks ) when toks begins and ends with whitespace.
But instead of splitting, it is just as good to keep what is desired:
import re
def squeeze(toks, p=re.compile(r'\w+')):
    return ' '.join(p.findall(toks))
All that said, the pattern r'\W+' in your question is wrong for your purpose, as John Machin pointed out.
If you want to compress internal whitespace and remove trailing whitespace, with whitespace taken in its strict sense as the set of characters ' ' , '\f' , '\n' , '\r' , '\t' , '\v' (see \s in re), you must replace your splitting with this one:
import re
def squeeze(toks, p=re.compile(r'\s+')):
    return ' '.join(x for x in p.split( toks ) if x)
or, keeping the right substrings:
import re
def squeeze(toks, p=re.compile(r'\S+')):
    return ' '.join(p.findall(toks))
which is nothing else than the simpler and faster expression ' '.join(toks.split())
But if you in fact just want to compress internal, and remove trailing, ' ' and '\t' characters, leaving newlines untouched, you will use
import re
def squeeze(toks, p=re.compile(r'[^ \t]+')):
    return ' '.join(p.findall(toks))
and that can't be replaced by anything else.
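As a quick check of that last variant, the sketch below compares it with the plain split-and-join on a made-up sample containing a newline. The name squeeze_keep_newlines is hypothetical.

```python
import re

def squeeze_keep_newlines(toks, p=re.compile(r'[^ \t]+')):
    # Runs of anything other than space and tab are kept intact, so a
    # newline stays attached to its neighbouring characters.
    return ' '.join(p.findall(toks))

sample = "a  b\t\tc\nd   e  "
print(repr(squeeze_keep_newlines(sample)))
print(repr(' '.join(sample.split())))
```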
I know this question is old. But why not use regex?
import re
result = ' Mary Decker is hot '
print(f"=={result}==")
result = re.sub(r'\s+$', '', result)
print(f"=={result}==")
result = re.sub(r'^\s+', '', result)
print(f"=={result}==")
result = re.sub(r'\s+', ' ', result)
print(f"=={result}==")
The output is
== Mary Decker is hot ==
== Mary Decker is hot==
==Mary Decker is hot==
==Mary Decker is hot==
I'm creating a function to create all 26 combinations of words with a fixed suffix. The script works except for the JOIN in the second-to-last line.
def create_word(suffix):
    e = []
    letters = "abcefghijklmnopqrstuvwxyz"
    t = list(letters)
    for i in t:
        e.append(i)
        e.append(suffix)
    ' '.join(e)
    print(e)
Currently, it is printing ['a', 'suffix', 'b', 'suffix, ...etc]. And I want it to print out as one long string: 'aSuffixbSuffixcSuffix...etc.' Why isn't the join working in this? How can I fix this?
In addition, how would I separate the characters once I have the string? For example to translate "take the last character of the suffix and add a space to it every time ('aSuffixbSuffixcSuffix' --> 'aSuffix bSuffix cSuffix')". Or, more generally, to replace the x-nth character, where x is any integer (e.g., to replace the 3rd, 6th, 9th, etc. character some something I choose).
str.join returns a new string; it does not transform the existing list. Here's one way to accomplish it.
result = ' '.join(e)
print(result)
But if you're feeling clever, you can streamline a lot of the setup.
import string
def create_word(suffix):
    return ' '.join(i + suffix for i in string.ascii_lowercase)
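The point about the return value can be seen in a tiny sketch; the short list here is a made-up stand-in for the one built in the question.

```python
e = ['a', 'Suffix', 'b', 'Suffix']

''.join(e)           # the returned string is discarded here
print(e)             # the list itself is unchanged

result = ''.join(e)  # keep the return value instead
print(result)
```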
join doesn't change its arguments - it just returns a new string:
result = ' '.join(e)
return result
If you really want the output you specified (all of the results concatenated together):
>>> import string
>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'
>>> letters = string.ascii_lowercase
>>> suffix = 'Suffix'
>>> ''.join('%s%s' % (l, suffix) for l in letters)
'aSuffixbSuffixcSuffixdSuffixeSuffixfSuffixgSuffixhSuffixiSuffixjSuffixkSuffixlSuffixmSuffixnSuffixoSuffixpSuffixqSuffixrSuffixsSuffixtSuffixuSuffixvSuffixwSuffixxSuffixySuffixzSuffix'
Besides the problem already mentioned by rekursive, you should have a look at list comprehensions:
def create_word(suffix):
    return ''.join(
        # the question's alphabet string was missing 'd'; it is restored here
        [i + suffix for i in "abcdefghijklmnopqrstuvwxyz"]
    )
print(create_word('suffix'))