Shuffle words' characters while maintaining sentence structure and punctuations - python

So, I want to be able to scramble words in a sentence, but:
Word order in the sentence(s) is left the same.
If the word started with a capital letter, the jumbled word must also start with a capital letter
(i.e., the first letter gets capitalised).
Punctuation marks . , ; ! and ? need to be preserved.
For instance, for the sentence "Tom and I watched Star Wars in the cinema, it was
fun!" a jumbled version would be "Mto nad I wachtde Tars Rswa ni het amecin, ti wsa
fnu!".
from random import shuffle
def shuffle_word(word):
word = list(word)
if word.title():
???? #then keep first capital letter in same position in word?
elif char == '!' or '.' or ',' or '?':
???? #then keep their position?
else:
shuffle(word)
return''.join(word)
L = input('try enter a sentence:').split()
print([shuffle_word(word) for word in L])
I am ok for understanding how to jumble each word in the sentence but... struggling with the if statement to apply specifics? please help!

Here is my code. Little different from your logic. Feel free to optimize the code.
import random
def shuffle_word(words):
words_new = words.split(" ")
out=''
for word in words_new:
l = list(word)
if word.istitle():
result = ''.join(random.sample(word, len(word)))
out = out + ' ' + result.title()
elif any(i in word for i in ('!','.',',')):
result = ''.join(random.sample(word[:-1], len(word)-1))
out = out + ' ' + result+word[-1]
else:
result = ''.join(random.sample(word, len(word)))
out = out +' ' + result
return (out[1:])
L = "Tom and I watched Star Wars in the cinema, it was fun!"
print(shuffle_word(L))
Output of above code execution:
Mto nda I whaecdt Atsr Swra in hte ienamc, ti wsa nfu!
Hope it helps. Cheers!

Glad to see you've figured out most of the logic.
To maintain the capitalization of the first letter, you can check it beforehand and capitalize the "new" first letter later.
first_letter_is_cap = word[0].isupper()
shuffle(word)
if first_letter_is_cap:
# Re-capitalize first letter
word[0] = word[0].upper()
To maintain the position of a trailing punctuation, strip it first and add it back afterwards:
last_char = word[-1]
if last_char in ".,;!?":
# Strip the punctuation
word = word[:-1]
shuffle(word)
if last_char in ".,;!?":
# Add it back
word.append(last_char)

Since this is a string processing algorithm I would consider using regular expressions. Regex gives you more flexibility, cleaner code and you can get rid of the conditions for edge cases. For example this code handles apostrophes, numbers, quote marks and special phrases like date and time, without any additional code and you can control these just by changing the pattern of regular expression.
from random import shuffle
import re
# Characters considered part of words
pattern = r"[A-Za-z']+"
# shuffle and lowercase word characters
def shuffle_word(word):
w = list(word)
shuffle(w)
return ''.join(w).lower()
# fucntion to shuffle word used in replace
def replace_func(match):
return shuffle_word(match.group())
def shuffle_str(str):
# replace words with their shuffled version
shuffled_str = re.sub(pattern, replace_func, str)
# find original uppercase letters
uppercase_letters = re.finditer(r"[A-Z]", str)
# make new characters in uppercase positions uppercase
char_list = list(shuffled_str)
for match in uppercase_letters:
uppercase_index = match.start()
char_list[uppercase_index] = char_list[uppercase_index].upper()
return ''.join(char_list)
print(shuffle_str('''Tom and I watched "Star Wars" in the cinema's new 3D theater yesterday at 8:00pm, it was fun!'''))

This works with any sentence, even if was "special" characters in a row, preserving all the punctuaction marks:
from random import sample
def shuffle_word(sentence):
new_sentence=""
word=""
for i,char in enumerate(sentence+' '):
if char.isalpha():
word+=char
else:
if word:
if len(word)==1:
new_sentence+=word
else:
new_word=''.join(sample(word,len(word)))
if word==word.title():
new_sentence+=new_word.title()
else:
new_sentence+=new_word
word=""
new_sentence+=char
return new_sentence
text="Tom and I watched Star Wars in the cinema, it was... fun!"
print(shuffle_word(text))
Output:
Mto nda I hctawed Rast Aswr in the animec, ti asw... fnu!

Related

All possible combinations add of letters

I greet you all I need help I have this code here
how can i achieve this if I want letters to be added between the letters
example
input
east
output
aeast
eaast
eaast
easat
easta
aeast
beast
ebast
eabst
easbt
eastb
ect...
leters = 'abcdefghijklmnopqrstuvwxyz'
words = ('east')
for letersinput in leters:
for rang in range(1):
print(letersinput+""+str(words)+"")
I'm looking for the exact opposite of this, how to do it?
def missing_char(word, n):
n = abs(n)
front = word[:n]
back = word[n+1:]
return front + back
Iterate through the words and letters, and for each position, slice the word at the position and insert the letter:
letters = 'abcdefghijklmnopqrstuvwxyz'
words = ['east']
for word in words:
for letter in letters:
for i in range(len(word)+1):
print(word[:i] + letter + word[i:])
print()
Output:
aeast
eaast
eaast
easat
easta
beast
ebast
eabst
easbt
eastb
...
zeast
ezast
eazst
easzt
eastz

How to reverse the words of a string considering the punctuation?

Here is what I have so far:
def reversestring(thestring):
words = thestring.split(' ')
rev = ' '.join(reversed(words))
return rev
stringing = input('enter string: ')
print(reversestring(stringing))
I know I'm missing something because I need the punctuation to also follow the logic.
So let's say the user puts in Do or do not, there is no try.. The result should be coming out as .try no is there , not do or Do, but I only get try. no is there not, do or Do. I use a straightforward implementation which reverse all the characters in the string, then do something where it checks all the words and reverses the characters again but only to the ones with ASCII values of letters.
Try this (explanation in comments of code):
s = "Do or do not, there is no try."
o = []
for w in s.split(" "):
puncts = [".", ",", "!"] # change according to needs
for c in puncts:
# if a punctuation mark is in the word, take the punctuation and add it to the rest of the word, in the beginning
if c in w:
w = c + w[:-1] # w[:-1] gets everthing before the last char
o.append(w)
o = reversed(o) # reversing list to reverse sentence
print(" ".join(o)) # printing it as sentence
#output: .try no is there ,not do or Do
Your code does exactly what it should, splitting on space doesn't separator a dot ro comma from a word.
I'd suggest you use re.findall to get all words, and all punctation that interest you
import re
def reversestring(thestring):
words = re.findall(r"\w+|[.,]", thestring)
rev = ' '.join(reversed(words))
return rev
reversestring("Do or do not, there is no try.") # ". try no is there , not do or Do"
You can use regular expressions to parse the sentence into a list of words and a list of separators, then reverse the word list and combine them together to form the desired string. A solution to your problem would look something like this:
import re
def reverse_it(s):
t = "" # result, empty string
words = re.findall(r'(\w+)', s) # just the words
not_s = re.findall(r'(\W+)', s) # everything else
j = len(words)
k = len(not_s)
words.reverse() # reverse the order of word list
if re.match(r'(\w+)', s): # begins with a word
for i in range(k):
t += words[i] + not_s[i]
if j > k: # and ends with a word
t += words[k]
else: # begins with punctuation
for i in range(j):
t += not_s[i] + words[i]
if k > j: # ends with punctuation
t += not_s[j]
return t #result
def check_reverse(p):
q = reverse_it(p)
print("\"%s\", \"%s\"" % (p, q) )
check_reverse('Do or do not, there is no try.')
Output
"Do or do not, there is no try.", "try no is there, not do or Do."
It is not a very elegant solution but sure does work!

How do I send a character from a string that is NOT a letter or a number to the end of the string?

I am doing a Pig Latin code in which the following words are supposed to return the following responses:
"computer" == "omputercay"
"think" == "inkthay"
"algorithm" == "algorithmway"
"office" == "officeway"
"Computer" == "Omputercay"
"Science!" == "Iencescay!"
However, for the last word, my code does not push the '!' to the end of the string. What is the code that will make this happen?
All of them return the correct word apart from the last which returns "Ience!Scay!"
def pigLatin(word):
vowel = ("a","e","i","o","u")
first_letter = word[0]
if first_letter in vowel:
return word +'way'
else:
l = len(word)
i = 0
while i < l:
i = i + 1
if word[i] in vowel:
x = i
new_word = word[i:] + word[:i] + "ay"
if word[0].isupper():
new_word = new_word.title()
return new_word
For simplicity, how about you check if the word contains an exlamation point ! at the end and if it does just remove it and when you are done add it back. So instead of returning just check place ! at the end (if you discovered it does at the beggining).
def pigLatin(word):
vowel = ("a","e","i","o","u")
first_letter = word[0]
if first_letter in vowel:
return word +'way'
else:
hasExlamation = False
if word[-1] == '!':
word = word[:-1] # removes last letter
hasExlamation = True
l = len(word)
i = 0
while i < l:
i = i + 1
if word[i] in vowel:
x = i
new_word = word[i:] + word[:i] + "ay"
if word[0].isupper():
new_word = new_word.title()
break # do not return just break out of the `while` loop
if hasExlamation:
new_word += "!" # same as new_word = new_word + "!"
return new_word
That way it does not treat ! as a normal letter and the output is Iencescay!. You can of course do this with any other character similarly
specialCharacters = ["!"] # define this outside the function
def pigLatin():
# all of the code above
if word in specialCharacters:
hasSpecialCharacter = True
# then you can continue the same way
Regular expressions to the rescue. A regex pattern with word boundaries will make your life much easier in this case. A word boundary is exactly what it sounds like - it indicates the start- or end of a word, and is represented in the pattern with \b. In your case, the ! would be such a word boundary. The "word" itself consists of any character in the set a-z, A-Z, 0-9 or underscore, and is represented by \w in the pattern. The + means, one or more \w characters.
So, if the pattern is r"\b\w+\b", this will match any word (consisting of any of a-zA-Z0-9_), with leading or succeeding word boundaries.
import re
pattern = r"\b\w+\b"
sentence = "computer think algorithm office Computer Science!"
print(re.findall(pattern, sentence))
Output:
['computer', 'think', 'algorithm', 'office', 'Computer', 'Science']
>>>
Here, we're using re.findall to get a list of all substrings that matched the pattern. Notice, no whitespace or punctuation is included.
Let's introduce re.sub, which takes a pattern to look for, a string to look through, and another string with which to replace any match it finds. Instead of a replacement-string, you can instead pass in a function. This function must take a match object as a parameter, and must return a string with which to replace the current match.
import re
pattern = r"\b\w+\b"
sentence = "computer think algorithm office Computer Science!"
def replace(match):
return "*" * len(match.group())
print(re.sub(pattern, replace, sentence))
Output:
******** ***** ********* ****** ******** *******!
>>>
That's just for demonstration purposes.
Let's change gears for a second:
from string import ascii_letters as alphabet
print(alphabet)
Output:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
>>>
That's handy for creating a string containing only consonants:
from string import ascii_letters as alphabet
consonants = "".join(set(alphabet) ^ set("aeiouAEIOU"))
print(consonants)
Output:
nptDPbHvsxKNWdYyrTqVQRlBCZShzgGjfkJMLmFXwc
>>>
We've taken the difference between the set of all alpha-characters and the set of only vowels. This yields the set of only consonants. Notice, that the order of the characters it not preserved in a set, but it doesn't matter in our case, since we'll be effectively treating this string as a set - testing for membership (if a character is in this string, it must be a consonant. The order does not matter).
Let's take advantage of this, and modify our pattern from earlier. Let's add two capturing groups - the first will capture any leading consonants (if they exist), the second will capture all remaining alpha characters (consonants or vowels) before the terminating word boundary:
import re
from string import ascii_letters as alphabet
consonants = "".join(set(alphabet) ^ set("aeiouAEIOU"))
pattern = fr"\b([{consonants}]*)(\w+)\b"
word = "computer"
match = re.match(pattern, word)
if match is not None:
print(f"Group one is \"{match.group(1)}\"")
print(f"Group two is \"{match.group(2)}\"")
Output:
Group one is "c"
Group two is "omputer"
>>>
As you can see, the first group captured c, and the second group captured omputer. Separating the match into two groups will be useful later when we construct the pig-latin translation. We can get even cuter by naming our capturing groups. This isn't required, but it will make things a bit easier to read later on:
pattern = fr"\b(?P<prefix>[{consonants}]*)(?P<rest>\w+)\b"
Now, the first capturing group is named prefix, and can be accessed via match.group("prefix"), rather than match.group(1). The second capturing group is named rest, and can be accessed via match.group("rest") instead of match.group(2).
Putting it all together:
import re
from string import ascii_letters as alphabet
consonants = "".join(set(alphabet) ^ set("aeiouAEIOU"))
pattern = fr"\b(?P<prefix>[{consonants}]*)(?P<rest>\w+)\b"
sentence = "computer think algorithm office Computer Science!"
def to_pig_latin(match):
rest = match.group("rest")
prefix = match.group("prefix")
result = rest + prefix
if len(prefix) == 0:
# if the 'prefix' capturing group was empty
# the word must have started with a vowel
# so, the suffix is 'way'
result += "way"
# that also means we need to check if the first character...
# ... (which must be in 'rest') was upper-case.
if rest[0].isupper():
result = result.title()
else:
result += "ay"
if prefix[0].isupper():
result = result.title()
return result
print(re.sub(pattern, to_pig_latin, sentence))
Output:
omputercay inkthay algorithmway officeway Omputercay Iencescay!
>>>
That was the verbose version. The definition of to_pig_latin can be shortened to:
def to_pig_latin(match):
rest = match.group("rest")
prefix = match.group("prefix")
return (str, str.title)[(prefix or rest)[0].isupper()](rest + prefix + "way"[bool(prefix):])

Convert the string to a string in which the words are separated by spaces and only the first word starts with an uppercase letter [duplicate]

This question already has answers here:
Split a string at uppercase letters
(22 answers)
Closed 2 years ago.
I am trying to make a script that will accept a string as input in which all of the words are run together, but the first character of each word is uppercase. It should convert the string to a string in which the words are separated by spaces and only the first word starts with an uppercase letter.
For Example (The Input):
"StopWhateverYouAreDoingInterestingIDontCare"
The expected output:
"Stop whatever you are doing interesting I dont care"
Here is the one I wrote so far:
string_input = "StopWhateverYouAreDoingInterestingIDontCare"
def organize_string():
start_sentence = string_input[0]
index_of_i = string_input.index("I")
for i in string_input[1:]:
if i == "I" and string_input[index_of_i + 1].isupper():
start_sentence += ' ' + i
elif i.isupper():
start_sentence += ' ' + i.lower()
else:
start_sentence += i
return start_sentence
While this takes care of some parts, I am struggling with differentiating if the letter "I" is single or a whole word. Here is my output:
"Stop whatever you are doing interesting i dont care"
Single "I" needs to be uppercased, while the "I" in the word "Interesting" should be lowercased "interesting".
I will really appreciate all the help!
A regular expression will do in this example.
import re
s = "StopWhateverYouAreDoingInterestingIDontCare"
t = re.sub(r'(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z])', ' ', s)
Explained:
(?<=[a-z])(?=[A-Z]) - a lookbehind for a lowercase letter followed by a lookahead uppercase letter
| - (signifies or)
(?<=[A-Z])(?=[A-Z]) - a lookbehind for a uppercase letter followed by a lookahead uppercase letter
This regex substitutes a space when there is a lowercase letter followed by an uppercase letter, OR, when there is an uppercase letter followed by an uppercase letter.
UPDATE: This doesn't correctly lowercase the words (with the exception of I and the first_word)
UPDATE2: The fix to this is:
import re
s = "StopWhateverYouAreDoingInterestingIDontCare"
first_word, *rest = re.split(r'(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z])', s)
rest = [word.lower() if word != 'I' else word for word in rest]
print(first_word, ' '.join(rest))
Prints:
Stop whatever you are doing interesting I dont care
Update 3: I looked at why your code failed to correctly form the sentence (which I should have done in the first place instead of posting my own solution :-)).
Here is the corrected code with some remarks about the changes.
string_input = "StopWhateverYouAreDoingInterestingIDontCare"
def organize_string():
start_sentence = string_input[0]
#index_of_i = string_input.index("I")
for i, char in enumerate(string_input[1:], start=1):
if char == "I" and string_input[i + 1].isupper():
start_sentence += ' ' + char
elif char.isupper():
start_sentence += ' ' + char.lower()
else:
start_sentence += char
return start_sentence
print(organize_string())
!. I commented out the line index_of_i = string_input.index("I") as it doesn't do what you need (it finds the index of the first capital I and not an I that should stand alone (it finds the index of the I in Interesting instead of the IDont further in the string_input string). It is not a correct statement.
for i, char in enumerate(string_input[1:], 1) enumerate states the index of the letters in the string starting at 1 (since string_input[1:] starts at index 1 so they are in sync). i is the index of a letter in string_input.
I changed the i's to char to make it clearer that char is the character. Other than these changes, the code stands as you wrote it.
Now the program gives the correct output.
string_input = "StopWhateverYouAreDoingInterestingIDontCare"
counter = 1
def organize_string():
global counter
start_sentence = string_input[0]
for i in string_input[1:]:
if i == "I" and string_input[counter+1].isupper():
start_sentence += ' ' + i
elif i.isupper():
start_sentence += ' ' + i.lower()
else:
start_sentence += i
counter += 1
print(start_sentence)
organize_string()
I made some changes to your program. I used a counter to check the index position. I get your expected output:
Stop whatever you are doing interesting I dont care
s = 'StopWhateverYouAreDoingInterestingIDontCare'
ss = ' '
res = ''.join(ss + x if x.isupper() else x for x in s).strip(ss).split(ss)
sr = ''
for w in res:
sr = sr + w.lower() + ' '
print(sr[0].upper() + sr[1:])
output
Stop whatever you are doing interesting i dont care
I hope this will work fine :-
string_input = "StopWhateverYouAreDoingInterestingIDontCare"
def organize_string():
i=0
while i<len(string_input):
if string_input[i]==string_input[i].upper() and i==0 :
print(' ',end='')
print(string_input[i].upper(),end='')
elif string_input[i]==string_input[i].upper() and string_input[i+1]==string_input[i+1].upper():
print(' ',end='')
print(string_input[i].upper(),end='')
elif string_input[i]==string_input[i].upper() and i!=0:
print(' ',end='')
print(string_input[i].lower(),end='')
if string_input[i]!=string_input[i].upper():
print(string_input[i],end='')
i=i+1
organize_string()
Here is one solution utilising the re package to split the string based on the upper case characters. [Docs]
import re
text = "StopWhateverYouAreDoingInterestingIDontCare"
# Split text by upper character
text_splitted = re.split('([A-Z])', text)
print(text_splitted)
As we see in the output below the separator (The upper case character) and the text before and after is kept. This means that the upper case character is always followed by the rest of the word. The empty first string originates from the first upper case character, which is the first separator.
# Output of print
[
'',
'S', 'top',
'W', 'hatever',
'Y', 'ou',
'A', 're',
'D', 'oing',
'I', 'nteresting',
'I', '',
'D', 'ont',
'C', 'are'
]
As we have seen the first character is always followed by the rest of the word. By combining the two we have the splitted words. This also allows us to easily handle your special case with the I
# Remove first character because it is always empty if first char is always upper
text_splitted = text_splitted[1:]
result = []
for i in range(0, len(text_splitted), 2):
word = text_splitted[i]+text_splitted[i+1]
if (i > 0) and (word != 'I') :
word = word.lower()
result.append(word)
result = ' '.join(result)
split the sentence into individual words. If you find the word "I" in this list, leave it alone. Leave the first word alone. All of the other words, you cast to lower case.
You have to use some string manipulation like this:
output=string_input[0]
for l in string_input[1:]:
if l.islower():
new_s+=l
else:
new_s+=' '+l.lower()
print(output)

How to separate a irregularly cased string to get the words? - Python

I have the following word list.
as my words are not all delimited by capital latter. the word list would consist words such as 'USA' , I am not sure how to do that. 'USA' should be as a one word. cannot be separated.
myList=[u'USA',u'Chancellor', u'currentRank', u'geolocDepartment', u'populationUrban', u'apparentMagnitude', u'Train', u'artery',
u'education', u'rightChild', u'fuel', u'Synagogue', u'Abbey', u'ResearchProject', u'languageFamily', u'building',
u'SnookerPlayer', u'productionCompany', u'sibling', u'oclc', u'notableStudent', u'totalCargo', u'Ambassador', u'copilote',
u'codeBook', u'VoiceActor', u'NuclearPowerStation', u'ChessPlayer', u'runwayLength', u'horseRidingDiscipline']
How to edit the element in the list.
I would like to get change the element in the list as below shows:
updatemyList=[u'USA',u'Chancellor', u'current Rank', u'geoloc Department', u'population Urban', u'apparent Magnitude', u'Train', u'artery',
u'education', u'right Child', u'fuel', u'Synagogue', u'Abbey', u'Research Project', u'language Family', u'building',
u'Snooker Player', u'production Company', u'sibling', u'oclc', u'notable Student', u'total Cargo', u'Ambassador', u'copilote',
u'code Book', u'Voice Actor', u'Nuclear Power Station', u'Chess Player', u'runway Length', u'horse Riding Discipline']
the word is able to separate
You could use re.sub
import re
first_cap_re = re.compile('(.)([A-Z][a-z]+)')
all_cap_re = re.compile('([a-z0-9])([A-Z])')
def convert(word):
s1 = first_cap_re.sub(r'\1 \2', word)
return all_cap_re.sub(r'\1 \2', s1)
updated_words = [convert(word) for word in myList]
Adapated from: Elegant Python function to convert CamelCase to snake_case?
Could do this using regex, but easier to comprehend with a small algorithm (ignoring corner cases like abbreviations e.g NLTK)
def split_camel_case(string):
new_words = []
current_word = ""
for char in string:
if char.isupper() and current_word:
new_words.append(current_word)
current_word = ""
current_word += char
return " ".join(new_words + [current_word])
old_words = ["HelloWorld", "MontyPython"]
new_words = [split_camel_case(string) for string in old_words]
print(new_words)
You can use a regular expression to prepend each upper-case letter that's not at the beginning of a word with a space:
re.sub(r"(?!\b)(?=[A-Z])", " ", your_string)
The bit in the first pair of parens means "not at the beginning of a word", and the bit in the second pair of parens means "followed by an uppercase letter". The regular expression matches the empty string at places where these two conditions hold, and replaces the empty string with a space, i.e. it inserts a space at these positions.
The following code snippet separates the words as you want:
myList=[u'Chancellor', u'currentRank', u'geolocDepartment', u'populationUrban', u'apparentMagnitude', u'Train', u'artery', u'education', u'rightChild', u'fuel', u'Synagogue', u'Abbey', u'ResearchProject', u'languageFamily', u'building', u'SnookerPlayer', u'productionCompany', u'sibling', u'oclc', u'notableStudent', u'totalCargo', u'Ambassador', u'copilote', u'codeBook', u'VoiceActor', u'NuclearPowerStation', u'ChessPlayer', u'runwayLength', u'managerYearsEndYear', 'horseRidingDiscipline']
updatemyList = []
for word in myList:
phrase = word[0]
for letter in word[1:]:
if letter.isupper():
phrase += " "
phrase += letter
updatemyList.append(phrase)
print updatemyList
Can you simply do a check to see if all letters in word are caps, and if so, to ignore them i.e. count them as a single word?
I've used similar code in the past, and it looks a bit hard-coded but it does the job right (in my case I wanted to capture abbreviations up to 4 letters long)
def CapsSumsAbbv():
for word in words:
for i,l in enumerate(word):
try:
if word[i] == word[i].upper() and word[i+1] == word[i+1].upper() and word[i+2] == word[i+2].upper() and word[i+3] == word[i+3].upper():
try:
word = int(word)
except:
if word not in allcaps:
allcaps.append(word)
except:
pass
To further expand, if you had entries such as u'USAMilitarySpending' you can adapt the above code so that if there are more than two Caps letters in a row, but there are also lower caps, the space is added between the last and last-1 caps letter so it becomes u'USA Military Spending'

Categories