I would like to read through a file and capitalize the first letters in a string using Python, but some of the strings may contain numbers first. Specifically the file might look like this:
"hello world"
"11hello world"
"66645world hello"
I would like this to be:
"Hello world"
"11Hello world"
"66645World hello"
I have tried the following, but this only capitalizes if the letter is in the first position.
with open('input.txt') as input, open("output.txt", "a") as output:
    for line in input:
        output.write(line[0:1].upper()+line[1:-1].lower()+"\n")
Any suggestions? :-)
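Using regular expressions:
import re

with open('input.txt') as infile, open('output.txt', 'a') as outfile:
    for line in infile:
        m = re.search('[a-zA-Z]', line)  # first ASCII letter, if any
        if m is not None:
            index = m.start()
            line = line[:index] + line[index].upper() + line[index + 1:]
        outfile.write(line)  # lines without a letter are written unchanged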
You can use a regular expression to find the position of the first alphabetic character and then call upper() on the character at that index. Something like this should work:
import re
s = "66645hello world"
m = re.search(r'[a-zA-Z]', s)
index = m.start()
s = s[:index] + s[index].upper() + s[index + 1:]  # '66645Hello world'
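As a rough sketch of the same idea in a single substitution, re.sub can be limited to one replacement with count=1. The function name capitalize_first_letter and the character class [^\W\d_] (used here so that non-ASCII letters also match) are illustrative choices, not part of the answers above.

```python
import re

# [^\W\d_] matches any Unicode letter: a word character that is
# neither a digit nor an underscore.
FIRST_LETTER = re.compile(r"[^\W\d_]")

def capitalize_first_letter(text):
    """Uppercase the first letter in text, leaving everything else as is."""
    return FIRST_LETTER.sub(lambda m: m.group().upper(), text, count=1)

for sample in ("hello world", "11hello world", "66645world hello", "12345"):
    print(capitalize_first_letter(sample))
```

A string without any letter, such as "12345", comes back unchanged because the pattern never matches.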
You can write a function with a for loop:
x = "hello world"
y = "11hello world"
z = "66645world hello"
def capper(mystr):
    for idx, i in enumerate(mystr):
        if not i.isdigit():  # or if i.isalpha()
            return mystr[:idx] + mystr[idx:].capitalize()
    return mystr
print(list(map(capper, (x, y, z))))
['Hello world', '11Hello world', '66645World hello']
How about this?
import re
text = "1234hello"
index = re.search("[a-zA-Z]", text).start()
text_list = list(text)
text_list[index] = text_list[index].upper()
''.join(text_list)
The result is: 1234Hello
May be worth trying ...
>>> s = '11hello World'
>>> for i, c in enumerate(s):
...     if not c.isdigit():
...         break
...
>>> s[:i] + s[i:].capitalize()
'11Hello world'
You can find the first alpha character and capitalize it like this:
with open("input.txt") as in_file, open("output.txt", "w") as out_file:
    for line in in_file:
        pos = next((i for i, e in enumerate(line) if e.isalpha()), 0)
        line = line[:pos] + line[pos].upper() + line[pos + 1:]
        out_file.write(line)
Which outputs:
Hello world
11Hello world
66645World hello
Like this, for example:
import re
re_numstart = re.compile(r'^([0-9]*)(.*)')
def capfirst(s):
    ma = re_numstart.match(s)
    return ma.group(1) + ma.group(2).capitalize()
Try this:
with open('input.txt') as input, open("output.txt", "a") as output:
    for line in input:
        t_line = ""
        for c in line:
            if c.isalpha():
                t_line += c.capitalize()
                t_line += line[line.index(c)+1:]
                break
            else:
                t_line += c
        output.write(t_line)
Execution result:
Hello world
11Hello world
66645World hello
You can use a regular expression for that:
import re
line = "66645world hello"
regex = re.compile(r'\D')  # first non-digit character
tofind = regex.search(line)
pos = tofind.start()
line = line[:pos] + line[pos].upper() + line[pos + 1:]
print(line)
output: 66645World hello
There is probably a one-line regex approach, but using title() should also work:
def capitalise_first_letter(s):
    spl = s.split()
    return spl[0].title() + ' ' + ' '.join(spl[1:])
s = ['123hello world',
     "hello world",
     "11hello world",
     "66645world hello"]
for i in s:
    print(capitalise_first_letter(i))
Producing:
123Hello world
Hello world
11Hello world
66645World hello
Okay, there are already a lot of answers that should work.
I find them overly complicated, though...
Here is a simpler solution:
for s in ("hello world", "11hello world", "66645world hello"):
    first_letter = next(c for c in s if not c.isdigit())
    print(s.replace(first_letter, first_letter.upper(), 1))
The title() method capitalizes the first alphabetic character of a word such as 11hello and ignores the digits before it. It also works for non-ASCII characters, unlike the regex methods that use [a-zA-Z].
From the doc:
str.title()
Return a titlecased version of the string where words
start with an uppercase character and the remaining characters are
lowercase. [...] The algorithm uses a simple language-independent
definition of a word as groups of consecutive letters. The definition
works in many contexts but it means that apostrophes in contractions
and possessives form word boundaries, which may not be the desired
result:
We can take advantage of it this way:
def my_capitalize(s):
    first, rest = s.split(maxsplit=1)
    split_on_quote = first.split("'", maxsplit=1)
    split_on_quote[0] = split_on_quote[0].title()
    first = "'".join(split_on_quote)
    return first + ' ' + rest
A few tests:
tests = ["hello world", "11hello world", "66645world hello", "123ça marche!", "234i'm good"]
for s in tests:
    print(my_capitalize(s))
# Hello world
# 11Hello world
# 66645World hello
# 123Ça marche! # The non-ASCII ç was turned to uppercase
# 234I'm good # Words containing a quote are treated properly
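To see why the function above applies title() only to the part of the first word before any quote, it may help to look at what title() does on whole strings. The following is a small illustrative check; the names samples and titled are made up for this example.

```python
samples = ["11hello world", "123ça marche!", "234i'm good"]

# title() restarts a "word" after every non-letter, so digits and
# apostrophes both act as word boundaries.
titled = {text: text.title() for text in samples}

for original, result in titled.items():
    print(repr(original), "->", repr(result))
# '11hello world' -> '11Hello World'
# '123ça marche!' -> '123Ça Marche!'
# "234i'm good" -> "234I'M Good"
```

Every word is capitalized, and the letter after the apostrophe is uppercased too, which is the behaviour the documentation excerpt warns about.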
With re.sub and repl as a function:
If repl is a function, it is called for every non-overlapping
occurrence of pattern. The function takes a single match object
argument, and returns the replacement string.
import re
def capitalize(m):
    return m.group(1) + m.group(2).upper() + m.group(3)
lines = ["hello world", "11hello world", "66645world hello"]
for line in lines:
    print(re.sub(r'(\d*)(\D)(.*)', capitalize, line))
Output:
Hello world
11Hello world
66645World hello
Using isdigit() and title() for strings:
s = ['123hello world', "hello world", "11hello world", "66645world hello"]
print([each if each[0].isdigit() else each.title() for each in s])
# ['123hello world', 'Hello World', '11hello world', '66645world hello']
If you want to capitalize words that start with a letter but leave words that start with a digit unchanged, you can try this piece of code:
def solve(s):
    str1 = ""
    for i in s.split(' '):
        str1 = str1 + i.capitalize() + ' '  # capitalizes the first character of each word
    return str1
>>> solve('hello 5g')
'Hello 5g '
I know how to write strings in reverse
txt = "Hello World"[::-1]
print(txt)
but I don't know how to do it while keeping one character in the same place.
For example, when I type world, it should become wdlro.
Thanks
Just prepend the first character to the remainder of the string (reversed using slice notation, but stopping just before we reach index 0, which is the first character):
>>> s = "world"
>>> s[0] + s[:0:-1]
'wdlro'
word = 'w' + 'world'[1:][::-1]  # 'wdlro'; list.remove() returns None, so it cannot be passed to join()
If you want to reverse all the words in the text based on the criteria (skipping first character of each word):
txt = "Hello World"
result = []
for word in txt.split():
    result.append(word[0]+word[1:][::-1])
print (result)
This is a more generic answer that lets you pick an arbitrary position within the string whose character stays where it is:
txt = "Hello World"
position = 3
lock_char = txt[position]
new_string = list((txt[:position] + txt[position+1:])[::-1])
new_string.insert(position, lock_char)
listToStr = ''.join([str(elem) for elem in new_string])
print(listToStr)
Result: dlrloW oleH
The simple way, using only slicing:
txt = "Hello World"
position = 10
first, lock, last = txt[:position], txt[position], txt[position+1:]
new_string = (first + last)[::-1]
[first, last] = new_string[:position], new_string[position:]
new_txt = first + lock + last
print(new_txt)
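The two position-based answers above can be folded into one small helper. This is only a sketch; the name reverse_except is hypothetical, and it assumes position is a valid non-negative index into the string.

```python
def reverse_except(text, position):
    """Reverse text while keeping the character at `position` where it is."""
    locked = text[position]
    # Reverse everything except the locked character...
    rest = (text[:position] + text[position + 1:])[::-1]
    # ...then put the locked character back at its original index.
    return rest[:position] + locked + rest[position:]

print(reverse_except("world", 0))        # wdlro
print(reverse_except("Hello World", 3))  # dlrloW oleH
```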
I'm new to Python. I'm trying to reverse each word in a sentence. I wrote the following code for that, and it works perfectly.
My code:
[From answer]
import re
str = "I am Mike!"
def reverse_word(matchobj):
    return matchobj.group(1)[::-1]
res = re.sub(r"([A-Za-z]+)", reverse_word, str)
print(res)
But I want to add one condition: only words should be reversed, not symbols. Alphanumeric words and words containing a hyphen should be reversed as a whole.
Update:
Sample:
input: "I am Mike! and123 my-age is 12"
current output: "I ma ekiM! dna123 ym-ega si 12"
required output: "I ma ekiM! 321dna ega-ym si 21"
The Regex: ([A-Za-z]+)
The character class [A-Za-z] followed by + matches any run of one or more letters. Capture that run, then reverse group 1 in a function passed to re.sub.
import re
str = "I am Mike!"
def reverse_word(matchobj):
    return matchobj.group(1)[::-1]
res = re.sub(r"([A-Za-z]+)", reverse_word, str)
print(res)
Outputting:
'I ma ekiM!'
Update:
You can tweak the code a little to achieve your results:
import re
str = "I am Mike! and123 my-age is 12"
def reverse_word(matchobj):
    hyphen_word_pattern = r"([A-Za-z]+)\-([A-Za-z]+)"
    match = re.search(hyphen_word_pattern, matchobj.group(1))
    if match:
        return re.sub(hyphen_word_pattern, f"{match.group(2)[::-1]}-{match.group(1)[::-1]}", match.group(0))
    else:
        return matchobj.group(1)[::-1]
res = re.sub(r"([A-Za-z]+\-?[A-Za-z]+)", reverse_word, str)
print(res)
Outputting:
I ma ekiM! dna123 ega-ym si 12
Don't use re at all
def reverse_words_in_string(string):
    spl = string.split()
    for i, word in enumerate(spl):
        spl[i] = word[::-1]
    return ' '.join(spl)
gives
'I ma !ekiM 321dna ega-ym si 21'
One approach which might work would be to make an additional iteration over the list of words and use re.sub to move an optional leading punctuation character back to the end of the now reversed word:
import re
s = "I am Mike!"
split_s = s.split()
r_word = [word[::-1] for word in split_s]
r_word = [re.sub(r'^([^\s\w])(.*)$', '\\2\\1', i) for i in r_word]
new_s = " ".join(r_word)
print(new_s)
I ma ekiM!
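For the updated sample with digits and hyphens, one possible sketch is to treat a run of ASCII letters and digits, optionally joined by single hyphens, as one word and reverse each match in place. That definition of a word is an assumption read off the sample input, and the names WORD and reverse_words are illustrative.

```python
import re

# A "word" here is a run of ASCII letters/digits, optionally joined by
# single hyphens (an assumption based on the sample in the question).
WORD = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")

def reverse_words(text):
    """Reverse each word in place; other characters stay where they are."""
    return WORD.sub(lambda m: m.group()[::-1], text)

print(reverse_words("I am Mike! and123 my-age is 12"))
# I ma ekiM! 321dna ega-ym si 21
```

Because punctuation is never part of a match, the exclamation mark stays at the end of the word without a second pass.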
I want to add a space between Arabic/Farsi and English words in my text.
It should be done with a regular expression in Python.
For example:
input: "علیAli" output: "علی Ali"
input: "علیAliرضا" output: "علی Ali رضا"
input: "AliعلیRezaرضا" output: "Ali علی Reza رضا"
and so on for similar inputs.
You can do it using re.sub, like the following in Python 3:
import re
rx = r'[a-zA-Z]+'
output = re.sub(rx, r' \g<0> ', input).strip()  # strip() drops the space added at either end
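An alternative sketch inserts a space only at the boundary between the two scripts, using lookbehind and lookahead so that no extra space appears at the ends of the string. It assumes that the Arabic/Farsi letters of interest lie in the Unicode range U+0600 to U+06FF and that English words use ASCII letters; the names BOUNDARY and space_scripts are made up for this example.

```python
import re

# Assumption: Arabic/Farsi letters fall in the basic Arabic block
# (U+0600 to U+06FF) and English words use ASCII letters only.
BOUNDARY = re.compile(
    r"(?<=[A-Za-z])(?=[\u0600-\u06FF])"   # English followed by Arabic/Farsi
    r"|(?<=[\u0600-\u06FF])(?=[A-Za-z])"  # Arabic/Farsi followed by English
)

def space_scripts(text):
    """Insert one space wherever the script changes between adjacent letters."""
    return BOUNDARY.sub(" ", text)

print(space_scripts("علیAli"))
print(space_scripts("AliعلیRezaرضا"))
```

Text that already contains a space between the two scripts is left alone, since the pattern only matches where the letters are directly adjacent.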
Instead of a regular expression, I think this can be done by comparing Unicode code points. I tried to code it, but I didn't know how to split on \r\n again to get the required output. This code might be useful for someone.
import codecs,string
def detect_language(character):
    maxchar = max(character)
    if u'\u0041' <= maxchar <= u'\u007a':
        return 'eng'
with codecs.open('letters.txt', encoding='utf-8') as f:
    eng_list = []
    eng_var =0
    arab_list = []
    arab_var=0
    input = f.read()
    for i in input:
        isEng = detect_language(i)
        if isEng == "eng":
            eng_list.append(i)
            eng_var = eng_var + 1
        elif '\n' in i or '\r' in i:
            eng_list.append(i)
            arab_list.append(i)
        else:
            arab_list.append(i)
            arab_var =arab_var +1
    temp = str(eng_list)
    temp1 = temp.encode('ascii','ignore')
I am trying to build a string to be used as a regex.
In the following code:
_pattern is a pattern like abba, and I am trying to check whether _string follows the _pattern (e.g. catdogdogcat).
rxp is the regular expression that I am trying to create to match against _string (e.g. for the example above it will be (.+)(.+)\\2\\1). It is generated successfully, but re.match() returns None.
I want to understand why it is not working and how to correct it.
import re
_pattern = "abba" #raw_input().strip()
_string = "catdogdogcat" #raw_input().strip()
hm = {}
rxp = ""
c = 1
for x in _pattern:
    if hm.has_key(x):
        rxp += hm[x]
        continue
    else:
        rxp += "(.+)"
        hm[x]="\\\\"+str(c)
        c+=1
print rxp
#print re.match(rxp,_string) -> (Tried) Not working
#print re.match(r'rxp', _string) -> (Tried) Not working
print re.match(r'%s' %rxp, _string) # (Tried) Not working
Output
(.+)(.+)\\2\\1
None
Expected Output
(.+)(.+)\\2\\1
<_sre.SRE_Match object at 0x000000000278FE88>
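The thing is that your regex string variable contains a double backslash (\\) instead of a single one, so the regex looks for a literal backslash followed by a digit rather than a backreference.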
You can use
rxp.replace("\\\\", "\\")
in .match like this:
>>> print re.match(rxp.replace("\\\\", "\\"), _string)
<_sre.SRE_Match object at 0x10bf87c68>
>>> print re.match(rxp.replace("\\\\", "\\"), _string).groups()
('cat', 'dog')
EDIT:
You can also avoid getting double \\ like this:
import re
_pattern = "abba" #raw_input().strip()
_string = "catdogdogcat" #raw_input().strip()
hm = {}
rxp = ""
c = 1
for x in _pattern:
    if x in hm:
        rxp += hm[x]
        continue
    else:
        rxp += "(.+)"
        hm[x]="\\" + str(c)
        c+=1
print rxp
print re.match(rxp,_string)
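Note that r'rxp' is the literal three-character string rxp, not the contents of the variable. Use string formatting (or simply pass rxp itself) instead of hard-coding the name into the string:
print re.match(r'%s'%rxp, _string)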
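For reference, here is a Python 3 sketch of the same pattern-building idea with a single backslash in each backreference. The function name build_backref_regex is hypothetical, and re.fullmatch is used instead of re.match on the assumption that the whole string should follow the pattern.

```python
import re

def build_backref_regex(pattern):
    """Turn a letter pattern such as 'abba' into a regex with backreferences."""
    group_of = {}  # letter -> number of the group that captured it
    parts = []
    for letter in pattern:
        if letter in group_of:
            # "\\" is one backslash character, so this appends e.g. \2
            parts.append("\\" + str(group_of[letter]))
        else:
            group_of[letter] = len(group_of) + 1
            parts.append("(.+)")
    return "".join(parts)

rxp = build_backref_regex("abba")
print(rxp)  # (.+)(.+)\2\1
match = re.fullmatch(rxp, "catdogdogcat")
print(match.groups())  # ('cat', 'dog')
```

Note that this sketch does not stop two different letters from capturing the same text, which may or may not matter for the original problem.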
I have a user entered string and I want to search it and replace any occurrences of a list of words with my replacement string.
import re
prohibitedWords = ["MVGame","Kappa","DatSheffy","DansGame","BrainSlug","SwiftRage","Kreygasm","ArsonNoSexy","GingerPower","Poooound","TooSpicy"]
# word[1] contains the user entered message
themessage = str(word[1])
# would like to implement a foreach loop here but not sure how to do it in python
for themessage in prohibitedwords:
    themessage = re.sub(prohibitedWords, "(I'm an idiot)", themessage)
print themessage
The above code doesn't work, I'm sure I don't understand how python for loops work.
You can do that with a single call to sub:
big_regex = re.compile('|'.join(map(re.escape, prohibitedWords)))
the_message = big_regex.sub("repl-string", str(word[1]))
Example:
>>> import re
>>> prohibitedWords = ['Some', 'Random', 'Words']
>>> big_regex = re.compile('|'.join(map(re.escape, prohibitedWords)))
>>> the_message = big_regex.sub("<replaced>", 'this message contains Some really Random Words')
>>> the_message
'this message contains <replaced> really <replaced> <replaced>'
Note that using str.replace may lead to subtle bugs:
>>> words = ['random', 'words']
>>> text = 'a sample message with random words'
>>> for word in words:
...     text = text.replace(word, 'swords')
...
>>> text
'a sample message with sswords swords'
while using re.sub gives the correct result:
>>> big_regex = re.compile('|'.join(map(re.escape, words)))
>>> big_regex.sub("swords", 'a sample message with random words')
'a sample message with swords swords'
As thg435 points out, if you want to replace words and not every substring you can add the word boundaries to the regex:
big_regex = re.compile(r'\b%s\b' % r'\b|\b'.join(map(re.escape, words)))
this would replace 'random' in 'random words' but not in 'pseudorandom words'.
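Putting the pieces of this answer together, a minimal Python 3 sketch might look like the following. The function name censor and its default replacement are illustrative, and whole-word matching is assumed to be what is wanted.

```python
import re

def censor(message, banned_words, replacement="<replaced>"):
    """Replace every whole-word occurrence of a banned word in message."""
    if not banned_words:
        return message
    # Escape each word so regex metacharacters are treated literally;
    # longest first is a defensive ordering for overlapping words.
    alternatives = sorted(map(re.escape, banned_words), key=len, reverse=True)
    pattern = re.compile(r"\b(?:%s)\b" % "|".join(alternatives))
    return pattern.sub(replacement, message)

print(censor("Kappa and pseudoKappa", ["Kappa", "MVGame"]))
# <replaced> and pseudoKappa
```

Grouping the alternatives in (?:...) lets one pair of \b anchors cover every word, which is equivalent to the \b|\b join shown above.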
try this:
prohibitedWords = ["MVGame","Kappa","DatSheffy","DansGame","BrainSlug","SwiftRage","Kreygasm","ArsonNoSexy","GingerPower","Poooound","TooSpicy"]
themessage = str(word[1])
for bad_word in prohibitedWords:  # a separate name avoids clobbering word, which holds the message
    themessage = themessage.replace(bad_word, "(I'm an idiot)")
print(themessage)
Based on Bakariu's answer, a simpler way to use re.sub would be like this:
import re
words = ['random', 'words']
text = 'a sample message with random words'
new_sentence = re.sub("random|words", "swords", text)
The output is "a sample message with swords swords"
Code:
prohibitedWords =["MVGame","Kappa","DatSheffy","DansGame",
"BrainSlug","SwiftRage","Kreygasm",
"ArsonNoSexy","GingerPower","Poooound","TooSpicy"]
themessage = 'Brain'
self_criticism = '(I`m an idiot)'
final_message = [i.replace(themessage, self_criticism) for i in prohibitedWords]
print final_message
Result:
['MVGame', 'Kappa', 'DatSheffy', 'DansGame', '(I`m an idiot)Slug', 'SwiftRage',
'Kreygasm', 'ArsonNoSexy', 'GingerPower', 'Poooound','TooSpicy']