Python: String manipulation difficulty

I have a function here which should change every letter in the string, apart from the first letter of each word, to an underscore. However, it does not seem to work.
def nameManipulate(title):
    positions = []
    for letter in range(len(title)):
        if title[letter] == " ":
            positions.append(letter+1)
    positions.insert(0, 0)
    print(positions) # Positions of the first letter of each word in the string
    for letter in range(len(title)):
        if letter not in positions: # If the letter is not in the list
            new_title = title.replace(str(title[letter]), "_") # Replace the letter with an underscore
    return new_title
displayTitle = str(nameManipulate(title))
(the title variable has already been declared and works fine)
The code however doesn't seem to work. It creates an array of positions of all the letters which are at the beginning of the word and changes all those not in that list to an underscore, or should, in theory.
However, when I run the code, this is the output.
(The title in this case was "Jonny B Good")
[0, 6, 8]
Jonny B Goo_
Any help would be greatly appreciated, thank you.

Just use regex.
import re
print( re.sub(r"((?<!\b)\w+)", lambda m: len(m.group(1))*"_", "Johnny B Goode") )
(?<!\b)\w+ uses a negative lookbehind: it matches one or more word characters (\w+) that are not preceded by a word boundary (\b), so the first letter of each word is skipped. The m in lambda m: ... is the re.Match object; m.group(1) is the text captured by the parentheses (the capturing group), and the lambda returns "_" repeated len(m.group(1)) times as the replacement.

You're only actually replacing the last letter. That's because of your final loop and return statement:
for letter in range(len(title)):
    if letter not in positions: # If the letter is not in the list
        new_title = title.replace(str(title[letter]), "_") # Replace the letter with an underscore
return new_title
You clearly intend new_title to collect all the changes in the loop - but you're actually assigning it to the result of replace on title, which is the original string. As a result, the only change you ever see in the final value is the last one.
The solution is simple: just assign the value in the title variable to new_title before the loop starts, and use that string's replace method. That way, new_title will accumulate all the changes:
new_title = title
for letter in range(len(title)):
    if letter not in positions: # If the letter is not in the list
        new_title = new_title.replace(str(new_title[letter]), "_") # Replace the letter with an underscore
return new_title
This actually still won't work as intended in all cases, because replace replaces every occurrence of the given letter, not just the one at the particular position you intend. I'll leave you to solve that yourself, but hopefully this helps you over that first hurdle.
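One way past that remaining hurdle is to avoid replace altogether and decide per index whether a character is kept. The following is only a sketch, assuming words are separated by single spaces as in the question; mask_title is a hypothetical name that does not appear in the original post.

```python
def mask_title(title):
    """Keep the first character of each word; mask every other character."""
    # Indices that start a word: 0, plus every index right after a space.
    starts = {0} | {i + 1 for i, ch in enumerate(title) if ch == " "}
    # Work by index, so a letter repeated elsewhere is never changed by accident.
    return "".join(
        ch if (i in starts or ch == " ") else "_"
        for i, ch in enumerate(title)
    )


print(mask_title("Jonny B Good"))  # J____ B G___
```

Because the decision is made per position, a word such as "dad" keeps its leading d even though the same letter is masked later in the word.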

Managed to fix it, it was a problem with the loop.
Rather than:
for letter in range(len(title)):
    if letter not in positions: # If the letter is not in the list
        new_title = title.replace(str(title[letter]), "_") # Replace the letter with an underscore
You should declare the new_title variable first, and have it in all instances of the .replace method.
def nameManipulate(title):
    positions = []
    new_title = title
    for letter in range(len(title)):
        if title[letter] == " ":
            positions.append(letter+1)
    positions.insert(0, 0)
    print(positions) # Positions of the first letter of each word in the string
    for letter in range(len(title)):
        if letter not in positions: # If the letter is not in the list
            if title[letter] != " ":
                new_title = new_title.replace(str(title[letter]), "_") # Replace the letter with an underscore
    return new_title

I would just use regex for this
import re
title = "Johnny B Goode."
print(re.sub("([a-zA-Z])([a-zA-Z]+)",lambda m:m.group(1)+"_"*len(m.group(2)),title))

Your algorithm does not work correctly if a replaced character also occurs as the first letter of a word, because replace changes every occurrence. You can build the result character by character instead:
def nameManipulate(title):
    result = []
    replace = False
    for character in title:
        if character == " ":
            replace = False
        elif not replace:
            replace = True
        else:
            character = "_"
        result.append(character)
    return "".join(result)

Related

Python String adjust

Hello, I think I should use a method like .isupper() in a loop, or look at string[i+1], to find the characters to lowercase, but I don't know how to do that.
input in function -> "ThisIsMyChar"
expected -> "This is my char"

I've done it with regex. It could be done with less code, but my aim was readability.
import re
def split_by_upper(input_string):
    pattern = r'[A-Z][a-z]*'
    matches = re.findall(pattern, input_string)
    if matches:
        output = matches[0]
        for word in matches[1:]:
            output += ' ' + word[0].lower() + word[1:]
        return output
    else:
        return input_string
print(split_by_upper("ThisIsMyChar"))
>> split_by_upper() -> "This is my char"

You could use re.findall and str.lower:
>>> import re
>>> s = 'ThisIsMyChar'
>>> ' '.join(w.lower() if i >= 1 else w for i, w in enumerate(re.findall('.[^A-Z]*', s)))
'This is my char'

You should first try by yourself. If you didn't get it done, you can do something like this:
# Insert a space before each uppercase letter after the first, and lowercase it
def parse(text):
    result = text[0]
    for ch in text[1:]:
        if ch.isupper():
            result += " "
        result += ch.lower()
    return result
# input string
text = "ThisIsMyChar"
print(parse(text))

First, run a for loop over the string and check for uppercase letters. When you find one, add a space in front of it, lowercase it, and append it to your new string. The rest is explained in the comments in the code itself.
def AddSpaceInTitleCaseString(string):
    NewStr = ""
    # Check the input string for uppercase letters, character by character.
    for i in string:
        # If one is found, append it to NewStr in lowercase, preceded by a space.
        if i.isupper():
            NewStr += f" {i.lower()}"
        # Otherwise append the character unchanged.
        else:
            NewStr += i
    # Before returning NewStr, remove all leading and trailing spaces from it.
    # As shown in your question, I'm assuming that you want the first letter of the new
    # sentence to be uppercase, so just use the 'capitalize' method for that.
    return NewStr.strip().capitalize()
# Test.
MyStr = AddSpaceInTitleCaseString("ThisIsMyChar")
print(MyStr)
# Output: "This is my char"
Hope it helped :)

Here is a concise regex solution:
import re
capital_letter_pattern = re.compile(r'(?!^)[A-Z]')
def add_spaces(string):
    return capital_letter_pattern.sub(lambda match: ' ' + match[0].lower(), string)
if __name__ == '__main__':
    print(add_spaces('ThisIsMyChar'))
The pattern searches for capital letters ([A-Z]), and the (?!^) is a negative lookahead that excludes the first character of the input ((?!foo) means "don't match foo", and ^ is "start of line", so (?!^) is "don't match at the start of the line").
The .sub(...) method of a pattern is usually used like pattern.sub('new text', 'my input string that I want changed'). You can also use a function in place of 'new text', in which case the function is called with the match object as an argument, and the value returned by the function is used as the replacement string.
The expression capital_letter_pattern.sub(lambda match: ' ' + match[0].lower(), string) replaces all matches (all capital letters except at the start of the line) using a lambda function to add a space before and make the letter lowercase. match[0] means "the entirety of the matched text", which in this case is the capital letter.

You can split it via Regex using r"(?<!^)(?=[A-Z])" pattern:
import re
txt = 'ThisIsMyChar'
c = re.compile(r"(?<!^)(?=[A-Z])")
first, *rest = map(str.lower, c.split(txt))
print(f'{first.title()} {" ".join(rest)}')
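Pattern explanation:
(?<!^) checks that the position is not at the beginning of the string.
(?=[A-Z]) checks that a capital letter follows.
Note: these are zero-width lookaround assertions rather than capturing groups, so they consume no characters. Splitting on such an empty match requires Python 3.7 or later.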
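On interpreters where re.split does not split on empty matches, the same lookaround pattern can be used with re.sub to insert the spaces instead. This is only a sketch of that alternative; add_spaces_portable is a hypothetical name, not part of the answer above.

```python
import re

# Zero-width position: not at the start, and followed by a capital letter.
boundary = re.compile(r"(?<!^)(?=[A-Z])")


def add_spaces_portable(text):
    # Insert a space at every such position, e.g. "This Is My Char".
    spaced = boundary.sub(" ", text)
    # Keep the first character as-is and lowercase everything after it.
    return spaced[:1] + spaced[1:].lower()


print(add_spaces_portable("ThisIsMyChar"))  # This is my char
```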

How do I send a character from a string that is NOT a letter or a number to the end of the string?

I am doing a Pig Latin code in which the following words are supposed to return the following responses:
"computer" == "omputercay"
"think" == "inkthay"
"algorithm" == "algorithmway"
"office" == "officeway"
"Computer" == "Omputercay"
"Science!" == "Iencescay!"
However, for the last word, my code does not push the '!' to the end of the string. What is the code that will make this happen?
All of them return the correct word apart from the last which returns "Ience!Scay!"
def pigLatin(word):
    vowel = ("a","e","i","o","u")
    first_letter = word[0]
    if first_letter in vowel:
        return word +'way'
    else:
        l = len(word)
        i = 0
        while i < l:
            i = i + 1
            if word[i] in vowel:
                x = i
                new_word = word[i:] + word[:i] + "ay"
                if word[0].isupper():
                    new_word = new_word.title()
                return new_word

For simplicity, you could check whether the word ends with an exclamation mark (!); if it does, remove it, and add it back when you are done. So instead of returning straight away, append the ! at the end (if you found one at the beginning).
def pigLatin(word):
    vowel = ("a","e","i","o","u")
    first_letter = word[0]
    if first_letter in vowel:
        return word +'way'
    else:
        hasExclamation = False
        if word[-1] == '!':
            word = word[:-1] # removes last letter
            hasExclamation = True
        l = len(word)
        i = 0
        while i < l:
            i = i + 1
            if word[i] in vowel:
                x = i
                new_word = word[i:] + word[:i] + "ay"
                if word[0].isupper():
                    new_word = new_word.title()
                break # do not return just break out of the `while` loop
        if hasExclamation:
            new_word += "!" # same as new_word = new_word + "!"
        return new_word
That way it does not treat ! as a normal letter and the output is Iencescay!. You can of course handle any other character in a similar way:
specialCharacters = ["!"] # define this outside the function
def pigLatin(word):
    # all of the code above
    if word[-1] in specialCharacters:
        hasSpecialCharacter = True
    # then you can continue the same way
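The same idea can be generalised by splitting off all trailing punctuation before translating and reattaching it afterwards. The sketch below is one possible way to do that, not the answerer's code: pig_latin and VOWELS are hypothetical names, string.punctuation is used as the set of "special" characters, and vowels are compared case-insensitively, which is an assumption the original function does not make.

```python
import string

VOWELS = "aeiou"


def pig_latin(word):
    # Split off trailing punctuation so it is not moved with the letters.
    core = word.rstrip(string.punctuation)
    tail = word[len(core):]
    if not core:
        return word  # nothing but punctuation
    if core[0].lower() in VOWELS:
        result = core + "way"
    else:
        # Index of the first vowel; if there is none, the whole word is moved.
        first_vowel = next(
            (i for i, ch in enumerate(core) if ch.lower() in VOWELS),
            len(core),
        )
        result = core[first_vowel:] + core[:first_vowel] + "ay"
        if core[0].isupper():
            result = result.capitalize()
    return result + tail


for w in ["computer", "think", "algorithm", "office", "Computer", "Science!"]:
    print(w, "->", pig_latin(w))
```

Using capitalize() on the letters only, before the punctuation is reattached, avoids the stray capital that title() produces after a non-letter character.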

Regular expressions to the rescue. A regex pattern with word boundaries will make your life much easier in this case. A word boundary is exactly what it sounds like: it indicates the start or end of a word, and is represented in the pattern with \b. In your case, there is such a word boundary between the final e of Science and the !. The "word" itself consists of any character in the set a-z, A-Z, 0-9 or underscore, and is represented by \w in the pattern. The + means one or more \w characters.
So, if the pattern is r"\b\w+\b", this will match any word (consisting of any of a-zA-Z0-9_), with leading or succeeding word boundaries.
import re
pattern = r"\b\w+\b"
sentence = "computer think algorithm office Computer Science!"
print(re.findall(pattern, sentence))
Output:
['computer', 'think', 'algorithm', 'office', 'Computer', 'Science']
>>>
Here, we're using re.findall to get a list of all substrings that matched the pattern. Notice, no whitespace or punctuation is included.
Let's introduce re.sub, which takes a pattern to look for, a string to look through, and another string with which to replace any match it finds. Instead of a replacement-string, you can instead pass in a function. This function must take a match object as a parameter, and must return a string with which to replace the current match.
import re
pattern = r"\b\w+\b"
sentence = "computer think algorithm office Computer Science!"
def replace(match):
    return "*" * len(match.group())
print(re.sub(pattern, replace, sentence))
Output:
******** ***** ********* ****** ******** *******!
>>>
That's just for demonstration purposes.
Let's change gears for a second:
from string import ascii_letters as alphabet
print(alphabet)
Output:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
>>>
That's handy for creating a string containing only consonants:
from string import ascii_letters as alphabet
consonants = "".join(set(alphabet) ^ set("aeiouAEIOU"))
print(consonants)
Output:
nptDPbHvsxKNWdYyrTqVQRlBCZShzgGjfkJMLmFXwc
>>>
We've taken the difference between the set of all alpha-characters and the set of only vowels. This yields the set of only consonants. Notice that the order of the characters is not preserved in a set, but it doesn't matter in our case, since we'll be effectively treating this string as a set and only testing for membership (if a character is in this string, it must be a consonant; the order does not matter).
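A side note on the operator used above: ^ is the symmetric difference, which coincides with the plain difference (-) here only because every vowel is also in the alphabet. A small sketch to illustrate this; the variable names are chosen for this example, and sorted() is used purely to get a stable, readable order:

```python
from string import ascii_letters

vowels = set("aeiouAEIOU")
letters = set(ascii_letters)

# Every vowel is also a letter, so both operators give the same set here.
assert letters ^ vowels == letters - vowels

# Sorting gives a reproducible string; uppercase sorts before lowercase.
consonants = "".join(sorted(letters - vowels))
print(consonants)  # BCDFGHJKLMNPQRSTVWXYZbcdfghjklmnpqrstvwxyz
```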
Let's take advantage of this, and modify our pattern from earlier. Let's add two capturing groups - the first will capture any leading consonants (if they exist), the second will capture all remaining alpha characters (consonants or vowels) before the terminating word boundary:
import re
from string import ascii_letters as alphabet
consonants = "".join(set(alphabet) ^ set("aeiouAEIOU"))
pattern = fr"\b([{consonants}]*)(\w+)\b"
word = "computer"
match = re.match(pattern, word)
if match is not None:
    print(f"Group one is \"{match.group(1)}\"")
    print(f"Group two is \"{match.group(2)}\"")
Output:
Group one is "c"
Group two is "omputer"
>>>
As you can see, the first group captured c, and the second group captured omputer. Separating the match into two groups will be useful later when we construct the pig-latin translation. We can get even cuter by naming our capturing groups. This isn't required, but it will make things a bit easier to read later on:
pattern = fr"\b(?P<prefix>[{consonants}]*)(?P<rest>\w+)\b"
Now, the first capturing group is named prefix, and can be accessed via match.group("prefix"), rather than match.group(1). The second capturing group is named rest, and can be accessed via match.group("rest") instead of match.group(2).
Putting it all together:
import re
from string import ascii_letters as alphabet
consonants = "".join(set(alphabet) ^ set("aeiouAEIOU"))
pattern = fr"\b(?P<prefix>[{consonants}]*)(?P<rest>\w+)\b"
sentence = "computer think algorithm office Computer Science!"
def to_pig_latin(match):
    rest = match.group("rest")
    prefix = match.group("prefix")
    result = rest + prefix
    if len(prefix) == 0:
        # if the 'prefix' capturing group was empty
        # the word must have started with a vowel
        # so, the suffix is 'way'
        result += "way"
        # that also means we need to check if the first character...
        # ... (which must be in 'rest') was upper-case.
        if rest[0].isupper():
            result = result.title()
    else:
        result += "ay"
        if prefix[0].isupper():
            result = result.title()
    return result
print(re.sub(pattern, to_pig_latin, sentence))
Output:
omputercay inkthay algorithmway officeway Omputercay Iencescay!
>>>
That was the verbose version. The definition of to_pig_latin can be shortened to:
def to_pig_latin(match):
    rest = match.group("rest")
    prefix = match.group("prefix")
    return (str, str.title)[(prefix or rest)[0].isupper()](rest + prefix + "way"[bool(prefix):])

Shuffle words' characters while maintaining sentence structure and punctuations

So, I want to be able to scramble words in a sentence, but:
Word order in the sentence(s) is left the same.
If the word started with a capital letter, the jumbled word must also start with a capital letter (i.e., the first letter gets capitalised).
Punctuation marks . , ; ! and ? need to be preserved.
For instance, for the sentence "Tom and I watched Star Wars in the cinema, it was fun!" a jumbled version would be "Mto nad I wachtde Tars Rswa ni het amecin, ti wsa fnu!".
from random import shuffle
def shuffle_word(word):
    word = list(word)
    if word.title():
        ???? #then keep first capital letter in same position in word?
    elif char == '!' or '.' or ',' or '?':
        ???? #then keep their position?
    else:
        shuffle(word)
    return''.join(word)
L = input('try enter a sentence:').split()
print([shuffle_word(word) for word in L])
I understand how to jumble each word in the sentence, but I am struggling with the if statements that should apply these rules. Please help!

Here is my code. It is a little different from your logic. Feel free to optimize it.
import random
def shuffle_word(words):
    words_new = words.split(" ")
    out=''
    for word in words_new:
        if word.istitle():
            result = ''.join(random.sample(word, len(word)))
            out = out + ' ' + result.title()
        elif any(i in word for i in ('!','.',',')):
            result = ''.join(random.sample(word[:-1], len(word)-1))
            out = out + ' ' + result+word[-1]
        else:
            result = ''.join(random.sample(word, len(word)))
            out = out +' ' + result
    return (out[1:])
L = "Tom and I watched Star Wars in the cinema, it was fun!"
print(shuffle_word(L))
Output of above code execution:
Mto nda I whaecdt Atsr Swra in hte ienamc, ti wsa nfu!
Hope it helps. Cheers!

Glad to see you've figured out most of the logic.
To maintain the capitalization of the first letter, you can check it beforehand and capitalize the "new" first letter later.
first_letter_is_cap = word[0].isupper()
shuffle(word)
if first_letter_is_cap:
    # Re-capitalize first letter
    word[0] = word[0].upper()
To maintain the position of a trailing punctuation, strip it first and add it back afterwards:
last_char = word[-1]
if last_char in ".,;!?":
    # Strip the punctuation
    word = word[:-1]
shuffle(word)
if last_char in ".,;!?":
    # Add it back
    word.append(last_char)
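Put together, the two fragments above might look like the sketch below. It is one possible combination rather than the answerer's own code: PUNCTUATION is a hypothetical name, and lowercasing the letters before shuffling (so the original capital does not end up in the middle of the word) is an added assumption.

```python
from random import shuffle

PUNCTUATION = ".,;!?"


def shuffle_word(word):
    # Split off one trailing punctuation mark, if present.
    tail = ""
    if word and word[-1] in PUNCTUATION:
        word, tail = word[:-1], word[-1]
    if not word:
        return tail
    starts_upper = word[0].isupper()
    # Assumption: lowercase everything so the old capital does not move mid-word.
    letters = list(word.lower())
    shuffle(letters)
    shuffled = "".join(letters)
    if starts_upper:
        # Re-capitalize whichever letter ended up first.
        shuffled = shuffled[0].upper() + shuffled[1:]
    return shuffled + tail


sentence = "Tom and I watched Star Wars in the cinema, it was fun!"
print(" ".join(shuffle_word(w) for w in sentence.split()))
```

The output differs from run to run because shuffle is random; only the word order, the leading capitals and the trailing punctuation stay fixed.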

Since this is a string processing algorithm I would consider using regular expressions. Regex gives you more flexibility, cleaner code and you can get rid of the conditions for edge cases. For example this code handles apostrophes, numbers, quote marks and special phrases like date and time, without any additional code and you can control these just by changing the pattern of regular expression.
from random import shuffle
import re
# Characters considered part of words
pattern = r"[A-Za-z']+"
# shuffle and lowercase word characters
def shuffle_word(word):
    w = list(word)
    shuffle(w)
    return ''.join(w).lower()
# function to shuffle word used in replace
def replace_func(match):
    return shuffle_word(match.group())
def shuffle_str(text):
    # replace words with their shuffled version
    shuffled_str = re.sub(pattern, replace_func, text)
    # find original uppercase letters
    uppercase_letters = re.finditer(r"[A-Z]", text)
    # make new characters in uppercase positions uppercase
    char_list = list(shuffled_str)
    for match in uppercase_letters:
        uppercase_index = match.start()
        char_list[uppercase_index] = char_list[uppercase_index].upper()
    return ''.join(char_list)
print(shuffle_str('''Tom and I watched "Star Wars" in the cinema's new 3D theater yesterday at 8:00pm, it was fun!'''))

This works with any sentence, even if it has several "special" characters in a row, preserving all the punctuation marks:
from random import sample
def shuffle_word(sentence):
    new_sentence=""
    word=""
    for i,char in enumerate(sentence+' '):
        if char.isalpha():
            word+=char
        else:
            if word:
                if len(word)==1:
                    new_sentence+=word
                else:
                    new_word=''.join(sample(word,len(word)))
                    if word==word.title():
                        new_sentence+=new_word.title()
                    else:
                        new_sentence+=new_word
                word=""
            new_sentence+=char
    return new_sentence
text="Tom and I watched Star Wars in the cinema, it was... fun!"
print(shuffle_word(text))
Output:
Mto nda I hctawed Rast Aswr in the animec, ti asw... fnu!

How add "." before each letter in string

As the title says, I simply need to add a . before each letter in my string, while removing the vowels and making it lowercase.
I got that part working; I just can't add the dots.
Here is my code:
s = str(input())
vowels = ('a','e','o','u','i','A','E','O','U','I')
for letter in s:
    if letter in vowels:
        s = s.replace(letter,'').replace()
print(s)

Use:
s = input()
vowels = set('aeoui')
print(''.join([f'.{x}' for x in s.lower() if x not in vowels]))
Sample run:
Hello
.h.l.l

All other answers will insert a . in front of every character in the string, but you specified that you want letters only. So I am assuming that you only want a-z to be prepended with a . for which I suggest re.sub:
import re
s = "This is some test string. It contains some symbols also ()!!"
result = re.sub('[aeoui]', '', s.lower()) # remove vowels and make lowercase
result = re.sub("([a-z])", r".\1", result) # prepend '.' to every letter
print(result)
Outputs:
.t.h.s .s .s.m .t.s.t .s.t.r.n.g. .t .c.n.t.n.s .s.m .s.y.m.b.l.s .l.s ()!!

You can do it step by step:
Replace all the vowels in the string with '':
for j in vowels:
    s=s.replace(j,'')
Convert the string to lowercase:
s=s.lower()
Add a '.' before each letter:
s='.' + '.'.join(s)
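The three steps can be wrapped into one function. This is a sketch only; dot_consonants is a hypothetical name. It also sidesteps a small edge case of the last step: when every character was a vowel, '.' + '.'.join(s) yields a lone '.', whereas prefixing each remaining character yields an empty string.

```python
def dot_consonants(text):
    vowels = "aeiou"
    # Lowercase first, then drop the vowels.
    kept = [ch for ch in text.lower() if ch not in vowels]
    # Prefix each remaining character with '.'.
    return "".join("." + ch for ch in kept)


print(dot_consonants("Hello"))  # .h.l.l
```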

How to get the first capital letter and then each that isn't followed by another capital letter in Python?

I am developing a script that creates abbreviations for a list of names that are too long for me to use. I need to split each name into parts divided by dots and then take each capital letter that is at the beginning of a word. Just like this:
InternetGatewayDevice.DeviceInfo.Description -> IGD.DI.D
However, if there are more consecutive capital letters (like in the following example), I only want to take the first one and then the one that is not followed by a capital letter. So, from "WANDevice" I want get "WD". Like this:
InternetGatewayDevice.WANDevice.1.WANConnectionDevice.1.WANIPConnection.1.PortMapping.7.ExternalPort -> IGD.WD1.WCD1.WC1.PM7.EP
So far I have written this script:
import json
data = json.load(open('./cwmp/tr069/test.json'))
def shorten(i):
    x = i.split(".")
    abbreviations = []
    for each in x:
        abbrev = ''
        for each_letter in each:
            if each_letter.isupper():
                abbrev = abbrev + each_letter
        abbreviations.append(abbrev)
    short_string = ".".join(abbreviations)
    return short_string
for i in data["mappings"]["cwmp_genieacs"]["properties"]:
    if "." in i:
        shorten(i)
    else:
        pass
It correctly "translates" the first example, but I am not sure how to do the rest. I think if I had to, I would probably think of some way to do it (like maybe splitting the strings into single characters), but I am looking for an efficient and smart way to do it. I will be grateful for any advice.
I am using Python 3.6.
EDIT:
I decided to try a different approach and iterate over single characters and I pretty easily achieved what I wanted. Nevertheless, thank you for your answers and suggestions, I will most certainly go through them.
def char_by_char(i):
    abbrev= ""
    for index, each_char in enumerate(i):
        # Define previous and next characters
        if index == 0:
            previous_char = None
        else:
            previous_char = i[index - 1]
        if index == len(i) - 1:
            next_char = None
        else:
            next_char = i[index + 1]
        # Character is uppercase
        if each_char.isupper():
            if next_char is not None:
                if next_char.isupper():
                    if (previous_char == ".") or (previous_char is None):
                        abbrev = abbrev + each_char
                    else:
                        pass
                else:
                    abbrev = abbrev + each_char
            else:
                pass
        # Character is "."
        elif each_char == ".":
            if next_char.isdigit():
                pass
            else:
                abbrev = abbrev + each_char
        # Character is a digit
        elif each_char.isdigit():
            abbrev = abbrev + each_char
        # Character is lowercase
        else:
            pass
    print(abbrev)
for i in data["mappings"]["cwmp_genieacs"]["properties"]:
    if "." in i:
        char_by_char(i)
    else:
        pass

You could use a regular expression for that. For instance, you could use capture groups for the characters that you want to keep, and perform a substitution where you only keep those captured characters:
import re
def shorten(s):
    return re.sub(r'([A-Z])(?:[A-Z]*(?=[A-Z])|[^A-Z.]*)|\.(\d+)[^A-Z.]*', r'\1\2', s)
Explanation:
([A-Z]): capture a capital letter
(?: ): this is a grouping to make clear what the scope is of the | operation inside of it. This is not a capture group like above (so this will be deleted)
[A-Z]*: zero or more capital letters (greedy)
(?=[A-Z]): one more capital letter should follow, but don't process it -- leave it for the next match
|: logical OR
[^A-Z.]*: zero or more non-capitals, non-point (following the captured capital letter): these will be deleted
\.(\d+): a literal point followed by one or more digits: capture the digits (in order to throw away the dot).
In the replacement argument, the captured groups are injected again:
\1: first capture group (this is the capital letter)
\2: second capture group (these are the digit(s) that followed a dot)
In one match, only one of the capture groups will have something, the other will just be the empty string. But the regular expression matching is repeated throughout the whole input string.
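To see the substitution at work, the pattern can be run against the two names from the question. This sketch only wraps the same regular expression; PATTERN and examples are names introduced for the illustration. It relies on unmatched groups in the replacement template being treated as empty strings, which Python does from version 3.5 onward.

```python
import re

# Same pattern as above, compiled once for reuse.
PATTERN = re.compile(r'([A-Z])(?:[A-Z]*(?=[A-Z])|[^A-Z.]*)|\.(\d+)[^A-Z.]*')


def shorten(s):
    # Keep only the captured capital (\1) or the digits after a dot (\2).
    return PATTERN.sub(r'\1\2', s)


examples = [
    'InternetGatewayDevice.DeviceInfo.Description',
    'InternetGatewayDevice.WANDevice.1.WANConnectionDevice.1'
    '.WANIPConnection.1.PortMapping.7.ExternalPort',
]
for name in examples:
    print(shorten(name))
# IGD.DI.D
# IGD.WD1.WCD1.WC1.PM7.EP
```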

Here is a non-regex solution.
def shorten(i):
    abr_list = []
    abrev = ''
    parts = i.split('.')
    for word in parts:
        for x in range(len(word)):
            if x == 0 and word[x].isupper() or word[x].isupper() and not word[x + 1].isupper() or word[x].isnumeric():
                abrev += word[x]
        abr_list.append(abrev)
        abrev = ''
    return join_parts(abr_list)
def join_parts(part_list):
    ret = part_list[0]
    for part in part_list[1:]:
        if not part.isnumeric():
            ret += '.%s' % part
        else:
            ret += part
    return ret

import re
def foo(s):
    print(''.join(list(map(
        lambda matchobj: matchobj[0], re.finditer(
            r'(?<![A-Z])[A-Z]|[A-Z](?![A-Z])|\.', s)))))
foo('InternetGatewayDevice.DeviceInfo.Description')
foo('WANDevice')
# output:
# IGD.DI.D
# WD
There are three major parts to the regex:
match if it's a capital letter with no capital letter in front of it, (?<![A-Z])[A-Z], or
match if it's a capital letter with no capital letter after it, [A-Z](?![A-Z]), or
match if it's a literal period, \.
https://docs.python.org/3.6/library/re.html
