Input:
string = "My dear adventurer, do you understand the nature of the given discussion?"
expected output:
string = 'My dear ##########, do you ########## the nature ## the given ##########?'
How can I replace every third word in a string with a run of # characters of the same length, without counting (or replacing) special characters such as apostrophes ('), quotation marks ("), full stops (.), commas (,), exclamation marks (!), question marks (?), colons (:) and semicolons (;)?
I tried converting the string to a list of words, but I am having difficulty filtering out the special characters and replacing the words with their # equivalent. Is there a better way to go about it?
I solved it with:
s = "My dear adventurer, do you understand the nature of the given discussion?"
def replace_alphabet_with_char(word: str, replacement: str) -> str:
    new_word = []
    alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    for c in word:
        if c in alphabet:
            new_word.append(replacement)
        else:
            new_word.append(c)
    return "".join(new_word)
every_nth_word = 3
s_split = s.split(' ')
result = " ".join([replace_alphabet_with_char(s_split[i], '#') if i % every_nth_word == every_nth_word - 1 else s_split[i] for i in range(len(s_split))])
print(result)
Output:
My dear ##########, do you ########## the nature ## the given ##########?
There are more efficient ways to solve this question, but I hope this is the simplest!
My approach is:
Split the sentence into a list of the words
Using that, make a list of every third word.
Remove unwanted characters from this
Replace third words in original string with # times the length of the word.
Here's the code (explained in comments) :
# original line
line = "My dear adventurer, do you understand the nature of the given discussion?"
# printing original line
print(f'\n\nOriginal Line:\n"{line}"\n')
# printing something to indicate that the next few prints show what happens after each line
print('\n\nStages of parsing:')
# splitting by spaces, into list
wordList = line.split(' ')
# printing wordlist
print(wordList)
# making list of every third word
thirdWordList = [wordList[i-1] for i in range(1,len(wordList)+1) if i%3==0]
# printing third-word list
print(thirdWordList)
# characters that you don't want hashed
unwantedCharacters = ['.','/','|','?','!','_','"',',','-','#','\n','\\',':',';','(',')','<','>','{','}','[',']','%','*','&','+']
# replacing these characters by empty strings in the list of third-words
for unwantedchar in unwantedCharacters:
    for i in range(0,len(thirdWordList)):
        thirdWordList[i] = thirdWordList[i].replace(unwantedchar,'')
# printing third word list, now without punctuation
print(thirdWordList)
# replacing with #
for word in thirdWordList:
    line = line.replace(word,len(word)*'#')
# Voila! Printing the result:
print(f'\n\nFinal Output:\n"{line}"\n\n')
Hope this helps!
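One caveat with the `line.replace(word, ...)` step above: `str.replace` substitutes every occurrence of the substring, not only the word at that position. The sketch below uses a made-up sentence (not from the question) to show the effect, next to an index-based variant. The function names `mask_with_replace` and `mask_every_third` are hypothetical, and masking by `str.isalnum` is an assumption about what counts as a "special character".

```python
# Hypothetical input: the third word "the" also occurs inside "other".
line = "See how the cat and the other dog run"


def mask_with_replace(line):
    """Replace-based approach: hits every occurrence of each third word."""
    words = line.split(' ')
    for word in words[2::3]:  # every third word, by position
        line = line.replace(word, '#' * len(word))
    return line


def mask_every_third(line):
    """Index-based approach: only the word at each third position changes."""
    words = line.split(' ')
    for i in range(2, len(words), 3):
        # Mask letters and digits, keep punctuation as it is.
        words[i] = ''.join('#' if ch.isalnum() else ch for ch in words[i])
    return ' '.join(words)


print(mask_with_replace(line))  # See how ### cat and ### o###r dog ###
print(mask_every_third(line))   # See how ### cat and ### other dog ###
```

Working on the list by index and joining at the end avoids touching words that merely contain one of the third words.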
Following works and does not use regular expressions
special_chars = {'.','/','|','?','!','_','"',',','-','#','\n','\\'}
def format_word(w, fill):
    if w[-1] in special_chars:
        return fill*(len(w) - 1) + w[-1]
    else:
        return fill*len(w)
def obscure(string, every=3, fill='#'):
    return ' '.join(
        (format_word(w, fill) if (i+1) % every == 0 else w)
        for (i, w) in enumerate(string.split())
    )
Here is some example usage:
In [15]: obscure(string)
Out[15]: 'My dear ##########, do you ########## the nature ## the given ##########?'
In [16]: obscure(string, 4)
Out[16]: 'My dear adventurer, ## you understand the ###### of the given ##########?'
In [17]: obscure(string, 3, '?')
Out[17]: 'My dear ??????????, do you ?????????? the nature ?? the given ???????????'
With help of some regex. Explanation in the comments.
import re
imp = "My dear adventurer, do you understand the nature of the given discussion?"
every_nth = 3 # in case you want to change this later
out_list = []
# split the input at spaces, enumerate the parts for looping
for idx, word in enumerate(imp.split(' ')):
    # only do the special logic for multiples of n (0-indexed, thus +1)
    if (idx + 1) % every_nth == 0:
        # find how many special chars there are in the current segment
        len_special_chars = len(re.findall(r'[.,!?:;\'"]', word))
        # ^ add more special chars here if needed
        # subtract the number of special chars from the length of segment
        str_len = len(word) - len_special_chars
        # the special chars are assumed to sit at the end of the word;
        # computing the suffix separately keeps words without special
        # chars from being replaced by an empty string
        suffix = word[-len_special_chars:] if len_special_chars > 0 else ''
        # repeat '#' for every non-special char and add the special chars
        out_list.append('#' * str_len + suffix)
    else:
        # if the index is not a multiple of n, just add the word
        out_list.append(word)
print(' '.join(out_list))
A mix of regex and string manipulation:
import re
string = "My dear adventurer, do you understand the nature of the given discussion?"
new_string = []
for i, s in enumerate(string.split()):
    if (i+1) % 3 == 0:
        s = re.sub(r'[^\.:,;\'"!\?]', '#', s)
    new_string.append(s)
new_string = ' '.join(new_string)
print(new_string)
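For comparison, here is a sketch that does the whole thing with `re.sub` and a replacement function, so the original whitespace is kept exactly as it was. The name `obscure_nth` and its parameters are made up for illustration, and treating `\w` (letters, digits, underscore) as "characters to mask" is an assumption; adjust the inner pattern if your definition of special characters differs.

```python
import re
from itertools import count


def obscure_nth(text, every=3, fill='#'):
    """Mask the word characters of every `every`-th whitespace-separated token."""
    counter = count(1)  # running position of the token being visited

    def mask(match):
        token = match.group()
        if next(counter) % every == 0:
            # Replace word characters only; punctuation stays in place.
            return re.sub(r'\w', fill, token)
        return token

    # \S+ matches each token; re.sub calls mask() for them left to right.
    return re.sub(r'\S+', mask, text)


s = "My dear adventurer, do you understand the nature of the given discussion?"
print(obscure_nth(s))
# My dear ##########, do you ########## the nature ## the given ##########?
```

Because punctuation is matched per character, this also copes with punctuation in the middle of a word, not only at the end.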
I have a string that I want to split into parts wherever a word matches an entry from either of two lists. How can I do that? Here is what I have:
dummy_word = "I have a HTML file"
dummy_type = ["HTML","JSON","XML"]
dummy_file_type = ["file","document","paper"]
for e in dummy_type:
    if e in dummy_word:
        type_found = e
        print("type ->" , e)
        dum = dummy_word.split(e)
        complete_dum = "".join(dum)
        for c in dummy_file_type:
            if c in complete_dum:
                then = complete_dum.split("c")
                print("file type ->",then)
In the given scenario my expected output is ["I have a", "HTML","file"]
These sorts of tasks are handled pretty well by itertools.groupby(). Here the key is the word itself if the word is in the set of special words, or False if it is not. This allows all the non-special words to group together and each special word to become its own element:
from itertools import groupby
dummy_word = "I have a HTML file"
dummy_type = ["HTML","JSON","XML"]
dummy_file_type = ["file","document","paper"]
words = set(dummy_type).union(dummy_file_type)
[" ".join(g) for k, g in
 groupby(dummy_word.split(), key=lambda word: (word in words) and word)]
# ['I have a', 'HTML', 'file']
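To see why this works, it can help to look at the keys that groupby() receives. The sketch below reuses the data from the question; the helper name `key` and the variable names are only for illustration.

```python
from itertools import groupby

special = {"HTML", "JSON", "XML", "file", "document", "paper"}
sentence = "I have a HTML file"


def key(word):
    # False for ordinary words, the word itself for special ones.
    return (word in special) and word


keys = [key(w) for w in sentence.split()]
print(keys)
# [False, False, False, 'HTML', 'file']

# groupby() starts a new group whenever the key changes.
groups = [(k, list(g)) for k, g in groupby(sentence.split(), key=key)]
print(groups)
# [(False, ['I', 'have', 'a']), ('HTML', ['HTML']), ('file', ['file'])]
```

One thing to be aware of: consecutive identical special words share a key, so something like "HTML HTML" would end up in a single group.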
This worked for me:
dummy_word = "I have a HTML file"
dummy_type = ["HTML","JSON","XML"]
dummy_file_type = ["file","document","paper"]
temp = ""
dummy_list = []
for word in dummy_word.split():
    if word in dummy_type or word in dummy_file_type:
        if temp:
            dummy_list.append(temp)
            print(temp, "delete")
        print(temp)
        new_word = word + " "
        dummy_list.append(new_word)
        temp = ""
    else:
        temp += word + " "
        print(temp)
print(dummy_list)
One more way using re:
>>> list(map(str.strip, re.sub("|".join(dummy_type + dummy_file_type), lambda x: "," + x.group(), dummy_word).split(',')))
['I have a', 'HTML', 'file']
>>>
First, form a regex pattern by concatenating all the types using join. Using re.sub, the string is replaced where tokens are prepended with a comma, and then we split the string using comma separator. map is used to strip the whitespaces.
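A variation on the same idea, as a sketch: `re.split` keeps the separators in the result when the pattern has a capturing group, so no temporary comma is needed (which matters if the text itself contains commas). The function name `split_on_keywords` is made up, and the `\b` word boundaries are an assumption that keywords should only match as whole words.

```python
import re


def split_on_keywords(text, keywords):
    """Split text so that each keyword becomes its own element."""
    # re.escape guards against keywords containing regex metacharacters.
    pattern = r'\b(' + '|'.join(map(re.escape, keywords)) + r')\b'
    # The capturing group makes re.split return the keywords as well.
    parts = re.split(pattern, text)
    # Drop the empty or whitespace-only pieces left between matches.
    return [p.strip() for p in parts if p.strip()]


dummy_type = ["HTML", "JSON", "XML"]
dummy_file_type = ["file", "document", "paper"]

print(split_on_keywords("I have a HTML file", dummy_type + dummy_file_type))
# ['I have a', 'HTML', 'file']
```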
I have the following string:
string1 = "1/0/1/A1,A2"
string2 = "1/1/A1,A2"
string3 = "0/A1,A2"
In the above strings I have to insert a zero for any number that is missing. The default structure is "number/number/number/any_character"; if any of the numbers is missing, it has to be replaced with zero. The expected results are as follows:
print(string1) = "1/0/1/A1,A2"
print(string2) = "1/1/0/A1,A2"
print(string3) = "0/0/0/A1,A2"
You can use str.split:
def pad_string(_input, _add='0'):
    *_vals, _str = _input.split('/')
    return '/'.join([*_vals, *([_add]*(3-len(_vals))), _str])
results = list(map(pad_string, ['1/0/1/A1,A2', '1/1/A1,A2', '0/A1,A2']))
Output:
['1/0/1/A1,A2', '1/1/0/A1,A2', '0/0/0/A1,A2']
You can easily fill missing elements from the left:
def fillZeros(item):
    chunks = item.split('/')
    for inserts in range(0, 4 - len(chunks)):
        chunks.insert(0, '0')
    return '/'.join(chunks)
string1 = "1/0/1/A1,A2"
string2 = "1/1/A1,A2"
string3 = "0/A1,A2"
for myString in (string1, string2, string3):
    print(fillZeros(myString))
Prints:
1/0/1/A1,A2
0/1/1/A1,A2
0/0/0/A1,A2
But for your string2 example you need to identify which element is missing: in 1/1/A1,A2, is it the first or the third?
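Since the string alone cannot tell you which position is missing, one option is to make the padding side an explicit choice. A minimal sketch, assuming the last "/"-separated field is always the non-numeric part; the name `pad_numbers` and its `width`, `fill` and `side` parameters are made up for illustration.

```python
def pad_numbers(item, width=3, fill='0', side='right'):
    """Pad the numeric fields of 'n/n/n/tail' up to `width` entries."""
    *numbers, tail = item.split('/')
    # A negative count simply yields an empty list, so longer inputs pass through.
    missing = [fill] * (width - len(numbers))
    if side == 'right':
        numbers = numbers + missing   # zeros go after the given numbers
    else:
        numbers = missing + numbers   # zeros go before the given numbers
    return '/'.join(numbers + [tail])


for s in ("1/0/1/A1,A2", "1/1/A1,A2", "0/A1,A2"):
    print(pad_numbers(s), pad_numbers(s, side='left'))
# 1/0/1/A1,A2 1/0/1/A1,A2
# 1/1/0/A1,A2 0/1/1/A1,A2
# 0/0/0/A1,A2 0/0/0/A1,A2
```

Padding on the right reproduces the output the question asks for; padding on the left matches the fillZeros() behaviour.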
If you want to use just string manipulation and loops, try this:
strings_list = []
for string in [string1, string2, string3]: # make list containing all strings
    strings_list.append(string)
new_strings = [] # make list containing the new strings
for string in strings_list:
    count = string.count("0/") + string.count("1/")
    if count == 3:
        # identify the strings not missing a number
        new_strings.append(string)
    elif count == 2:
        # identify the strings missing 1 number
        string = string[:4] + "0/" + string[4:]
        new_strings.append(string)
    elif count == 1:
        # identify the strings missing 2 numbers
        string = string[:2] + "0/0/" + string[2:]
        new_strings.append(string)
print(new_strings)
This results in ['1/0/1/A1,A2', '1/1/0/A1,A2', '0/0/0/A1,A2']. Note that it only works when every number is a single 0 or 1.
I would like to read through a file and capitalize the first letters in a string using Python, but some of the strings may contain numbers first. Specifically the file might look like this:
"hello world"
"11hello world"
"66645world hello"
I would like this to be:
"Hello world"
"11Hello world"
"66645World hello"
I have tried the following, but this only capitalizes if the letter is in the first position.
with open('input.txt') as input, open("output.txt", "a") as output:
    for line in input:
        output.write(line[0:1].upper()+line[1:-1].lower()+"\n")
Any suggestions? :-)
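Using regular expressions (this goes inside the with block from the question and needs import re):
for line in input:
    m = re.search('[a-zA-Z]', line)
    if m is not None:
        index = m.start()
        output.write(line[0:index] + line[index].upper() + line[index + 1:])
    else:
        # no letter on this line, so write it unchanged
        output.write(line)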
You can use a regular expression to find the position of the first letter and then call upper() on the character at that index. Something like this should work:
import re
s = "66645hello world"
m = re.search(r'[a-zA-Z]', s)
index = m.start()
s = s[:index] + s[index].upper() + s[index + 1:]
print(s)  # 66645Hello world
You can write a function with a for loop:
x = "hello world"
y = "11hello world"
z = "66645world hello"
def capper(mystr):
    for idx, i in enumerate(mystr):
        if not i.isdigit():  # or if i.isalpha()
            return ''.join(mystr[:idx] + mystr[idx:].capitalize())
    return mystr
print(list(map(capper, (x, y, z))))
['Hello world', '11Hello world', '66645World hello']
How about this?
import re
text = "1234hello"
index = re.search("[a-zA-Z]", text).start()
text_list = list(text)
text_list[index] = text_list[index].upper()
''.join(text_list)
The result is: 1234Hello
May be worth trying ...
>>> s = '11hello World'
>>> for i, c in enumerate(s):
...     if not c.isdigit():
...         break
...
>>> s[:i] + s[i:].capitalize()
'11Hello world'
You can find the first alpha character and capitalize it like this:
with open("input.txt") as in_file, open("output.txt", "w") as out_file:
    for line in in_file:
        pos = next((i for i, e in enumerate(line) if e.isalpha()), 0)
        line = line[:pos] + line[pos].upper() + line[pos + 1:]
        out_file.write(line)
Which Outputs:
Hello world
11Hello world
66645World hello
Like this, for example:
import re
re_numstart = re.compile(r'^([0-9]*)(.*)')
def capfirst(s):
    ma = re_numstart.match(s)
    return ma.group(1) + ma.group(2).capitalize()
Try this:
with open('input.txt') as input, open("output.txt", "a") as output:
    for line in input:
        t_line = ""
        for c in line:
            if c.isalpha():
                t_line += c.capitalize()
                t_line += line[line.index(c)+1:]
                break
            else:
                t_line += c
        output.write(t_line)
Execution result:
Hello world
11Hello world
66645World hello
You can use a regular expression for that:
import re
line = "66645world hello"
regex = re.compile(r'\D')  # first non-digit character
tofind = regex.search(line)
pos = line.find(tofind.group(0))+1
# uppercase up to and including the first non-digit, lowercase the rest
line = line[0:pos].upper()+line[pos:].lower()
print(line)
output: 66645World hello
There is probably a one-line regex approach, but using title() should also work:
def capitalise_first_letter(s):
    spl = s.split()
    return spl[0].title() + ' ' + ' '.join(spl[1:])
s = ['123hello world',
     "hello world",
     "11hello world",
     "66645world hello"]
for i in s:
    print(capitalise_first_letter(i))
Producing:
123Hello world
Hello world
11Hello world
66645World hello
There are already a lot of answers that should work, but I find them overly complicated.
Here is a simpler solution:
for s in ("hello world", "11hello world", "66645world hello"):
    first_letter = next(c for c in s if not c.isdigit())
    print(s.replace(first_letter, first_letter.upper(), 1))
The title() method will capitalize the first alpha character of the string, and ignore the digits before it. It also works well for non-ASCII characters, contrary to the regex methods using [a-zA-Z].
From the doc:
str.title()
Return a titlecased version of the string where words
start with an uppercase character and the remaining characters are
lowercase. [...] The algorithm uses a simple language-independent
definition of a word as groups of consecutive letters. The definition
works in many contexts but it means that apostrophes in contractions
and possessives form word boundaries, which may not be the desired
result:
We can take advantage of it this way:
def my_capitalize(s):
    first, rest = s.split(maxsplit=1)
    split_on_quote = first.split("'", maxsplit=1)
    split_on_quote[0] = split_on_quote[0].title()
    first = "'".join(split_on_quote)
    return first + ' ' + rest
A few tests:
tests = ["hello world", "11hello world", "66645world hello", "123ça marche!", "234i'm good"]
for s in tests:
    print(my_capitalize(s))
# Hello world
# 11Hello world
# 66645World hello
# 123Ça marche! # The non-ASCII ç was turned to uppercase
# 234I'm good # Words containing a quote are treated properly
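To see why title() is applied to the first word only, here is a quick comparison of the two built-in methods on the strings from the question. This just exercises standard `str` methods; the variable names are for illustration.

```python
samples = ["hello world", "11hello world", "66645world hello"]

# capitalize() only looks at the very first character, so a leading digit
# means no letter gets uppercased.
capitalized = [s.capitalize() for s in samples]
print(capitalized)
# ['Hello world', '11hello world', '66645world hello']

# title() uppercases the first letter of every run of letters, so it handles
# the digits but also capitalizes the following words.
titled = [s.title() for s in samples]
print(titled)
# ['Hello World', '11Hello World', '66645World Hello']
```

Neither method alone gives the requested result, which is why the function above splits off the first word before calling title().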
With re.sub and repl as a function:
If repl is a function, it is called for every non-overlapping
occurrence of pattern. The function takes a single match object
argument, and returns the replacement string.
import re
def capitalize(m):
    return m.group(1) + m.group(2).upper() + m.group(3)
lines = ["hello world", "11hello world", "66645world hello"]
for line in lines:
    print(re.sub(r'(\d*)(\D)(.*)', capitalize, line))
Output:
Hello world
11Hello world
66645World hello
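A shorter variant of the same technique, as a sketch: match just the first letter and let the `count` argument of `sub` stop after one replacement. The class `[^\W\d_]` means "a word character that is neither a digit nor an underscore", which for Python 3 strings covers non-ASCII letters too. The names `FIRST_LETTER` and `capitalize_first_letter` are made up for illustration.

```python
import re

# A word character that is not a digit and not an underscore, i.e. roughly a letter.
FIRST_LETTER = re.compile(r'[^\W\d_]')


def capitalize_first_letter(s):
    """Uppercase the first letter in s, leaving everything else untouched."""
    # count=1 limits the substitution to the first match only.
    return FIRST_LETTER.sub(lambda m: m.group().upper(), s, count=1)


for line in ["hello world", "11hello world", "66645world hello", "123ça marche!"]:
    print(capitalize_first_letter(line))
# Hello world
# 11Hello world
# 66645World hello
# 123Ça marche!
```

Strings without any letter, including the empty string, come back unchanged because there is simply no match.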
Using isdigit() and title() for strings:
s = ['123hello world', "hello world", "11hello world", "66645world hello"]
print([each if each[0].isdigit() else each.title() for each in s])
# ['123hello world', 'Hello World', '11hello world', '66645world hello']
If you want to capitalize words that start with a letter but leave letters that follow a digit untouched, you can try this piece of code:
def solve(s):
    # capitalize() uppercases only the first character of each word,
    # so a word that starts with a digit keeps its letters in lowercase
    return ' '.join(word.capitalize() for word in s.split(' '))
>>> solve('hello 5g')
'Hello 5g'
import re
string = "is2 Thi1s T4est 3a"
def order(sentence):
    res = ''
    count = 1
    list = sentence.split()
    for i in list:
        for i in list:
            a = re.findall('\d+', i)
            if a == [str(count)]:
                res += " ".join(i)
                count += 1
    print(res)
order(string)
Above is the code I am having a problem with. The output I should get is:
"Thi1s is2 3a T4est"
Instead I'm getting the correct order but with spaces in the wrong places:
"T h i 1 si s 23 aT 4 e s t"
Any idea how to make it work with this code concept?
You are joining the characters of each word:
>>> " ".join('Thi1s')
'T h i 1 s'
You want to collect your words into a list and join that instead:
def order(sentence):
    number_words = []
    count = 1
    words = sentence.split()
    for word in words:
        for word in words:
            matches = re.findall(r'\d+', word)
            if matches == [str(count)]:
                number_words.append(word)
                count += 1
    result = ' '.join(number_words)
    print(result)
I used more verbose and clear variable names. I also removed the list variable; don't use list as a variable name if you can avoid it, as that masks the built-in list name.
What you implemented comes down to an O(N^2) (quadratic time) sort. You could instead use the built-in sorted() function to bring this down to O(N log N); you'd extract the digits and sort on their integer value:
def order(sentence):
    digit = re.compile(r'\d+')
    return ' '.join(
        sorted(sentence.split(),
               key=lambda w: int(digit.search(w).group())))
This differs a little from your version in that it'll only look at the first (consecutive) digits, it doesn't care about the numbers being sequential, and will break for words without digits. It also uses a return to give the result to the caller rather than print. Just use print(order(string)) to print the return value.
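If words without digits can occur, the key function needs a fallback. One possible sketch: sort on a tuple so that digit-less words go to the end in their original order. Putting them last is an assumption, not something the question specifies, and the names `DIGITS` and `key` are made up for illustration.

```python
import re

DIGITS = re.compile(r'\d+')


def order(sentence):
    """Sort words by the first number they contain; words without one go last."""
    def key(word):
        match = DIGITS.search(word)
        # (False, n) sorts before (True, 0), so numbered words come first.
        # sorted() is stable, so digit-less words keep their relative order.
        return (match is None, int(match.group()) if match else 0)

    return ' '.join(sorted(sentence.split(), key=key))


print(order("is2 Thi1s T4est 3a"))  # Thi1s is2 3a T4est
print(order("is2 extra Thi1s"))     # Thi1s is2 extra
```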
If you assume the words are numbered consecutively starting at 1, then you can sort them in O(N) time even:
def order(sentence):
    digit = re.compile(r'\d+')
    words = sentence.split()
    result = [None] * len(words)
    for word in words:
        index = int(digit.search(word).group())
        result[index - 1] = word
    return ' '.join(result)
This works by creating a list of the same length, then using the digits from each word to put the word into the correct index (minus 1, as Python lists start at 0, not 1).
I think the bug is simply in the misuse of join(). You want to concatenate the current sorted string. i is simply a token, hence simply add it to the end of the string. Code untested.
import re
string = "is2 Thi1s T4est 3a"
def order(sentence):
    res = ''
    count = 1
    list = sentence.split()
    for i in list:
        for i in list:
            a = re.findall(r'\d+', i)
            if a == [str(count)]:
                res = res + " " + i # your bug here
                count += 1
    print(res)
order(string)