How to reverse the words of a string considering the punctuation? - python

Here is what I have so far:
def reversestring(thestring):
words = thestring.split(' ')
rev = ' '.join(reversed(words))
return rev
stringing = input('enter string: ')
print(reversestring(stringing))
I know I'm missing something because I need the punctuation to also follow the logic.
So let's say the user puts in Do or do not, there is no try.. The result should be coming out as .try no is there , not do or Do, but I only get try. no is there not, do or Do. I use a straightforward implementation which reverse all the characters in the string, then do something where it checks all the words and reverses the characters again but only to the ones with ASCII values of letters.

Try this (explanation in comments of code):
s = "Do or do not, there is no try."
o = []
for w in s.split(" "):
puncts = [".", ",", "!"] # change according to needs
for c in puncts:
# if a punctuation mark is in the word, take the punctuation and add it to the rest of the word, in the beginning
if c in w:
w = c + w[:-1] # w[:-1] gets everthing before the last char
o.append(w)
o = reversed(o) # reversing list to reverse sentence
print(" ".join(o)) # printing it as sentence
#output: .try no is there ,not do or Do

Your code does exactly what it should, splitting on space doesn't separator a dot ro comma from a word.
I'd suggest you use re.findall to get all words, and all punctation that interest you
import re
def reversestring(thestring):
words = re.findall(r"\w+|[.,]", thestring)
rev = ' '.join(reversed(words))
return rev
reversestring("Do or do not, there is no try.") # ". try no is there , not do or Do"

You can use regular expressions to parse the sentence into a list of words and a list of separators, then reverse the word list and combine them together to form the desired string. A solution to your problem would look something like this:
import re
def reverse_it(s):
t = "" # result, empty string
words = re.findall(r'(\w+)', s) # just the words
not_s = re.findall(r'(\W+)', s) # everything else
j = len(words)
k = len(not_s)
words.reverse() # reverse the order of word list
if re.match(r'(\w+)', s): # begins with a word
for i in range(k):
t += words[i] + not_s[i]
if j > k: # and ends with a word
t += words[k]
else: # begins with punctuation
for i in range(j):
t += not_s[i] + words[i]
if k > j: # ends with punctuation
t += not_s[j]
return t #result
def check_reverse(p):
q = reverse_it(p)
print("\"%s\", \"%s\"" % (p, q) )
check_reverse('Do or do not, there is no try.')
Output
"Do or do not, there is no try.", "try no is there, not do or Do."
It is not a very elegant solution but sure does work!

Related

Python String adjust

Hello is use some method like .isupper() in a loop, or string[i+1] to find my lower char but i don't know how to do that
input in function -> "ThisIsMyChar"
expected -> "This is my char"
I´ve done it with regex, could be done with less code but my intention is readable
import re
def split_by_upper(input_string):
pattern = r'[A-Z][a-z]*'
matches = re.findall(pattern, input_string)
if (matches):
output = matches[0]
for word in matches[1:]:
output += ' ' + word[0].lower() + word[1:]
return output
else:
return input_string
print(split_by_upper("ThisIsMyChar"))
>> split_by_upper() -> "This is my char"
You could use re.findall and str.lower:
>>> import re
>>> s = 'ThisIsMyChar'
>>> ' '.join(w.lower() if i >= 1 else w for i, w in enumerate(re.findall('.[^A-Z]*', s)))
'This is my char'
You should first try by yourself. If you didn't get it done, you can do something like this:
# to parse input string
def parse(str):
result= "" + str[0];
for i in range(1, len(str)):
ch = str[i]
if ch.isupper():
result += " ";
result += ch.lower();
return result;
# input string
str = "ThisIsMyChar";
print(parse(str))
First you need to run a for loop and check for Uppercase words then when you find it just add a space at the starting, lower the word and increment it to your new string. Simple, more code is explained in comments in the code itself.
def AddSpaceInTitleCaseString(string):
NewStr = ""
# Check for Uppercase string in the input string char-by-char.
for i in string:
# If it found one, add it to the NewStr variable with a space and lowering it's case.
if i.isupper(): NewStr += f" {i.lower()}"
# Else just add it as usual.
else: NewStr += i
# Before returning the NewStr, remove all the leading and trailing spaces from it.
# And as shown in your question I'm assuming that you want the first letter or your new sentence,
# to be in uppercase so just use 'capitalize' function for it.
return NewStr.strip().capitalize()
# Test.
MyStr = AddSpaceInTitleCaseString("ThisIsMyChar")
print(MyStr)
# Output: "This is my char"
Hope it helped :)
Here is a concise regex solution:
import re
capital_letter_pattern = re.compile(r'(?!^)[A-Z]')
def add_spaces(string):
return capital_letter_pattern.sub(lambda match: ' ' + match[0].lower(), string)
if __name__ == '__main__':
print(add_spaces('ThisIsMyChar'))
The pattern searches for capital letters ([A-Z]), and the (?!^) is negative lookahead that excludes the first character of the input ((?!foo) means "don't match foo, ^ is "start of line", so (?!^) is "don't match start of line").
The .sub(...) method of a pattern is usually used like pattern.sub('new text', 'my input string that I want changed'). You can also use a function in place of 'new text', in which case the function is called with the match object as an argument, and the value returned by the function is used as the replacement string.
The expression capital_letter_pattern.sub(lambda match: ' ' + match[0].lower(), string) replaces all matches (all capital letters except at the start of the line) using a lambda function to add a space before and make the letter lowercase. match[0] means "the entirety of the matched text", which in this case is the captial letter.
You can split it via Regex using r"(?<!^)(?=[A-Z])" pattern:
import re
txt = 'ThisIsMyChar'
c = re.compile(r"(?<!^)(?=[A-Z])")
first, *rest = map(str.lower, c.split(txt))
print(f'{first.title()} {" ".join(rest)}')
Pattern explanation:
(?<!^) checks to see if it is not at the beginning.
(?=[A-Z]) checks to see there a capital letter after it.
note These are non-capturing groups.

Finding uppercase acronyms in a string

I'm trying to find uppercase acronyms in a string. For example, if the input is "I need to see you ASAP, because YOLO, you know" should return ["ASAP", "YOLO"].
#!/usr/bin/env python3
import string
def acronyms(s):
s.translate(string.punctuation)
for i, x in enumerate(s):
while x.upper():
print(x)
i += 1
def main():
print(acronyms("""I need to see you ASAP, because YOLO, you know."""))
if __name__ == "__main__":
main()
I tried to get rid of the punctuations, then loop through the string, and while it is uppercase print the letter out. It resulted in an infinite loop. I wanted to solve this using string manipulation, so no RegEx
Edits:
changes in removing punctuations for efficiency
From:
exclude = set(string.punctuation)
s = "".join(ch for ch in s if ch not in exclude)
To:
s.translate(string.punctuation)
A couple things I'd like to point out. One, you have end up with a hanging program because you have a while True and not a single break. Then, you kind of make the enumerate pretty pointless when you do n+=1.
for i, x in enumerate(s):
n+=1
This can all be easily simplified, no enumerate needed.
def acronyms(s):
exclude = set(string.punctuation)
s = "".join(ch for ch in s if ch not in exclude)
acro = [x for x in s.split() if x.isupper()]
return acro
output
['I', 'ASAP', 'YOLO']
Sadly, we do have an extra I which happens not to be an acronym, so one fix could be to make sure x is never one letter before being appended.
acro = [x for x in s.split() if x.isupper() and len(x) != 1]
Your while-loop iterates over the first character, but never breaks out to go to the next character.
You also want to filter out the 'I' as a single letter is not usually classified as an acronym.
the string.isupper() function checks for an entire string rather than a single character, so I recommend you use that. It would look like this:
def acronyms(s):
words = s.split()
acronyms = []
for word in words:
if word.isupper() and len(word) > 1:
acronyms.append(word)
return acronyms
I would highly recommend using nltk for its wonderful tokenisation package, it handles edge cases and punctuation superbly.
For a simplified approach where you define an acronym as:
all characters are alphabetical
all characters are upper case
The following should suffice:
from nltk.tokenize import word_tokenize
def get_acronyms(text):
return [
token for token in word_tokenize(text)
if token.isupper()
]
here this should work:
def acronyms(x):
ans = []
y = x.split(" ")
for i in y:
if i.isupper():
ans += [i]
return ans
isupper() returns True as long as there are no lowercase even if there is punctuation

Shuffle words' characters while maintaining sentence structure and punctuations

So, I want to be able to scramble words in a sentence, but:
Word order in the sentence(s) is left the same.
If the word started with a capital letter, the jumbled word must also start with a capital letter
(i.e., the first letter gets capitalised).
Punctuation marks . , ; ! and ? need to be preserved.
For instance, for the sentence "Tom and I watched Star Wars in the cinema, it was
fun!" a jumbled version would be "Mto nad I wachtde Tars Rswa ni het amecin, ti wsa
fnu!".
from random import shuffle
def shuffle_word(word):
word = list(word)
if word.title():
???? #then keep first capital letter in same position in word?
elif char == '!' or '.' or ',' or '?':
???? #then keep their position?
else:
shuffle(word)
return''.join(word)
L = input('try enter a sentence:').split()
print([shuffle_word(word) for word in L])
I am ok for understanding how to jumble each word in the sentence but... struggling with the if statement to apply specifics? please help!
Here is my code. Little different from your logic. Feel free to optimize the code.
import random
def shuffle_word(words):
words_new = words.split(" ")
out=''
for word in words_new:
l = list(word)
if word.istitle():
result = ''.join(random.sample(word, len(word)))
out = out + ' ' + result.title()
elif any(i in word for i in ('!','.',',')):
result = ''.join(random.sample(word[:-1], len(word)-1))
out = out + ' ' + result+word[-1]
else:
result = ''.join(random.sample(word, len(word)))
out = out +' ' + result
return (out[1:])
L = "Tom and I watched Star Wars in the cinema, it was fun!"
print(shuffle_word(L))
Output of above code execution:
Mto nda I whaecdt Atsr Swra in hte ienamc, ti wsa nfu!
Hope it helps. Cheers!
Glad to see you've figured out most of the logic.
To maintain the capitalization of the first letter, you can check it beforehand and capitalize the "new" first letter later.
first_letter_is_cap = word[0].isupper()
shuffle(word)
if first_letter_is_cap:
# Re-capitalize first letter
word[0] = word[0].upper()
To maintain the position of a trailing punctuation, strip it first and add it back afterwards:
last_char = word[-1]
if last_char in ".,;!?":
# Strip the punctuation
word = word[:-1]
shuffle(word)
if last_char in ".,;!?":
# Add it back
word.append(last_char)
Since this is a string processing algorithm I would consider using regular expressions. Regex gives you more flexibility, cleaner code and you can get rid of the conditions for edge cases. For example this code handles apostrophes, numbers, quote marks and special phrases like date and time, without any additional code and you can control these just by changing the pattern of regular expression.
from random import shuffle
import re
# Characters considered part of words
pattern = r"[A-Za-z']+"
# shuffle and lowercase word characters
def shuffle_word(word):
w = list(word)
shuffle(w)
return ''.join(w).lower()
# fucntion to shuffle word used in replace
def replace_func(match):
return shuffle_word(match.group())
def shuffle_str(str):
# replace words with their shuffled version
shuffled_str = re.sub(pattern, replace_func, str)
# find original uppercase letters
uppercase_letters = re.finditer(r"[A-Z]", str)
# make new characters in uppercase positions uppercase
char_list = list(shuffled_str)
for match in uppercase_letters:
uppercase_index = match.start()
char_list[uppercase_index] = char_list[uppercase_index].upper()
return ''.join(char_list)
print(shuffle_str('''Tom and I watched "Star Wars" in the cinema's new 3D theater yesterday at 8:00pm, it was fun!'''))
This works with any sentence, even if was "special" characters in a row, preserving all the punctuaction marks:
from random import sample
def shuffle_word(sentence):
new_sentence=""
word=""
for i,char in enumerate(sentence+' '):
if char.isalpha():
word+=char
else:
if word:
if len(word)==1:
new_sentence+=word
else:
new_word=''.join(sample(word,len(word)))
if word==word.title():
new_sentence+=new_word.title()
else:
new_sentence+=new_word
word=""
new_sentence+=char
return new_sentence
text="Tom and I watched Star Wars in the cinema, it was... fun!"
print(shuffle_word(text))
Output:
Mto nda I hctawed Rast Aswr in the animec, ti asw... fnu!

What would be the easiest way to search through a list?

It's actually a string but I just converted it to a list because the answer is supposed to be returned as a list. I've been looking at this problem for hours now and cannot get it. I'm supposed to take a string, like "Mary had a little lamb" for example and another string such as "ab" for example and search through string1 seeing if any of the letters from string2 occur. So if done correctly with the two example it would return
["a=4","b=1"]
I have this so far:
def problem3(myString, charString):
myList = list(myString)
charList = list(charString)
count = 0
newList = []
newString = ""
for i in range(0,len(myList)):
for j in range(0,len(charList)):
if charList[j] == myList[i]:
count = count + 1
newString = charList[j] + "=" + str(count)
newList.append(newString)
return newList
Which returns [a=5] I know it's something with the newList.append(string) and where it should be placed, anyone have any suggestions?
You can do this very easily with list comprehensions and the count function that strings (and lists!) have:
Split the search string into a list of chars.
For each character in the search string, loop over the input string and determine how much it occurs (via count).
Example:
string = 'Mary had a little lamb'
search_string = 'ab'
search_string_chars = [char for char in search_string]
result = []
for char in search_string_chars:
result.append('%s=%d' % (char, string.count(char)))
Result:
['a=4', 'b=1']
Note that you don't need to split the search_string ('ab') into a list of characters, as strings are already lists of characters - the above was done that way to illustrate the concept. Hence, a reduced version of the above could be (which also yields the same result):
string = 'Mary had a little lamb'
search_string = 'ab'
result = []
for char in search_string:
result.append('%s=%d' % (char, string.count(char)))
Here's a possible solution using Counter as mentioned by coder,
from collections import Counter
s = "Mary had a little lambzzz"
cntr = Counter(s)
test_str = "abxyzzz"
results = []
for letter in test_str:
if letter in s:
occurrances = letter + "=" + str(cntr.get(letter))
else:
occurrances = letter + "=" + "0"
if occurrances not in results:
results.append(occurrances)
print(results)
output
['a=4', 'b=1', 'x=0', 'y=1', 'z=3']
import collections
def count_chars(s, chars):
counter = collections.Counter(s)
return ['{}={}'.format(char, counter[char]) for char in set(chars)]
That's all. Let Counter do the work of actually counting the characters in the string. Then create a list comprehension of format strings using the characters in chars. (chars should be a set and not a list so that if there are duplicate characters in chars, the output will only show one.)

Python: Find the longest word in a string

I'm preparing for an exam but I'm having difficulties with one past-paper question. Given a string containing a sentence, I want to find the longest word in that sentence and return that word and its length. Edit: I only needed to return the length but I appreciate your answers for the original question! It helps me learn more. Thank you.
For example: string = "Hello I like cookies". My program should then return "Cookies" and the length 7.
Now the thing is that I am not allowed to use any function from the class String for a full score, and for a full score I can only go through the string once. I am not allowed to use string.split() (otherwise there wouldn't be any problem) and the solution shouldn't have too many for and while statements. The strings contains only letters and blanks and words are separated by one single blank.
Any suggestions? I'm lost i.e. I don't have any code.
Thanks.
EDIT: I'm sorry, I misread the exam question. You only have to return the length of the longest word it seems, not the length + the word.
EDIT2: Okay, with your help I think I'm onto something...
def longestword(x):
alist = []
length = 0
for letter in x:
if letter != " ":
length += 1
else:
alist.append(length)
length = 0
return alist
But it returns [5, 1, 4] for "Hello I like cookies" so it misses "cookies". Why? EDIT: Ok, I got it. It's because there's no more " " after the last letter in the sentence and therefore it doesn't append the length. I fixed it so now it returns [5, 1, 4, 7] and then I just take the maximum value.
I suppose using lists but not .split() is okay? It just said that functions from "String" weren't allowed or are lists part of strings?
You can try to use regular expressions:
import re
string = "Hello I like cookies"
word_pattern = "\w+"
regex = re.compile(word_pattern)
words_found = regex.findall(string)
if words_found:
longest_word = max(words_found, key=lambda word: len(word))
print(longest_word)
Finding a max in one pass is easy:
current_max = 0
for v in values:
if v>current_max:
current_max = v
But in your case, you need to find the words. Remember this quote (attribute to J. Zawinski):
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Besides using regular expressions, you can simply check that the word has letters. A first approach is to go through the list and detect start or end of words:
current_word = ''
current_longest = ''
for c in mystring:
if c in string.ascii_letters:
current_word += c
else:
if len(current_word)>len(current_longest):
current_longest = current_word
current_word = ''
else:
if len(current_word)>len(current_longest):
current_longest = current_word
A final way is to split words in a generator and find the max of what it yields (here I used the max function):
def split_words(mystring):
current = []
for c in mystring:
if c in string.ascii_letters:
current.append(c)
else:
if current:
yield ''.join(current)
max(split_words(mystring), key=len)
Just search for groups of non-whitespace characters, then find the maximum by length:
longest = len(max(re.findall(r'\S+',string), key = len))
For python 3. If both the words in the sentence is of the same length, then it will return the word that appears first.
def findMaximum(word):
li=word.split()
li=list(li)
op=[]
for i in li:
op.append(len(i))
l=op.index(max(op))
print (li[l])
findMaximum(input("Enter your word:"))
It's quite simple:
def long_word(s):
n = max(s.split())
return(n)
IN [48]: long_word('a bb ccc dddd')
Out[48]: 'dddd'
found an error in a previous provided solution, he's the correction:
def longestWord(text):
current_word = ''
current_longest = ''
for c in text:
if c in string.ascii_letters:
current_word += c
else:
if len(current_word)>len(current_longest):
current_longest = current_word
current_word = ''
if len(current_word)>len(current_longest):
current_longest = current_word
return current_longest
I can see imagine some different alternatives. Regular expressions can probably do much of the splitting words you need to do. This could be a simple option if you understand regexes.
An alternative is to treat the string as a list, iterate over it keeping track of your index, and looking at each character to see if you're ending a word. Then you just need to keep the longest word (longest index difference) and you should find your answer.
Regular Expressions seems to be your best bet. First use re to split the sentence:
>>> import re
>>> string = "Hello I like cookies"
>>> string = re.findall(r'\S+',string)
\S+ looks for all the non-whitespace characters and puts them in a list:
>>> string
['Hello', 'I', 'like', 'cookies']
Now you can find the length of the list element containing the longest word and then use list comprehension to retrieve the element itself:
>>> maxlen = max(len(word) for word in string)
>>> maxlen
7
>>> [word for word in string if len(word) == maxlen]
['cookies']
This method uses only one for loop, doesn't use any methods in the String class, strictly accesses each character only once. You may have to modify it depending on what characters count as part of a word.
s = "Hello I like cookies"
word = ''
maxLen = 0
maxWord = ''
for c in s+' ':
if c == ' ':
if len(word) > maxLen:
maxWord = word
word = ''
else:
word += c
print "Longest word:", maxWord
print "Length:", len(maxWord)
Given you are not allowed to use string.split() I guess using a regexp to do the exact same thing should be ruled out as well.
I do not want to solve your exercise for you, but here are a few pointers:
Suppose you have a list of numbers and you want to return the highest value. How would you do that? What information do you need to track?
Now, given your string, how would you build a list of all word lengths? What do you need to keep track of?
Now, you only have to intertwine both logics so computed word lengths are compared as you go through the string.
My proposal ...
import re
def longer_word(sentence):
word_list = re.findall("\w+", sentence)
word_list.sort(cmp=lambda a,b: cmp(len(b),len(a)))
longer_word = word_list[0]
print "The longer word is '"+longer_word+"' with a size of", len(longer_word), "characters."
longer_word("Hello I like cookies")
import re
def longest_word(sen):
res = re.findall(r"\w+",sen)
n = max(res,key = lambda x : len(x))
return n
print(longest_word("Hey!! there, How is it going????"))
Output : there
Here I have used regex for the problem. Variable "res" finds all the words in the string and itself stores them in the list after splitting them.
It uses split() to store all the characters in a list and then regex does the work.
findall keyword is used to find all the desired instances in a string. Here \w+ is defined which tells the compiler to look for all the words without any spaces.
Variable "n" finds the longest word from the given string which is now free of any undesired characters.
Variable "n" uses lambda expressions to define the key len() here.
Variable "n" finds the longest word from "res" which has removed all the non-string charcters like %,&,! etc.
>>>#import regular expressions for the problem.**
>>>import re
>>>#initialize a sentence
>>>sen = "fun&!! time zone"
>>>res = re.findall(r"\w+",sen)
>>>#res variable finds all the words and then stores them in a list.
>>>res
Out: ['fun','time','zone']
>>>n = max(res)
Out: zone
>>>#Here we get "zone" instead of "time" because here the compiler
>>>#sees "zone" with the higher value than "time".
>>>#The max() function returns the item with the highest value, or the item with the highest value in an iterable.
>>>n = max(res,key = lambda x:len(x))
>>>n
Out: time
Here we get "time" because lambda expression discards "zone" as it sees the key is for len() in a max() function.
list1 = ['Happy', 'Independence', 'Day', 'Zeal']
listLen = []
for i in list1:
listLen.append(len(i))
print list1[listLen.index(max(listLen))]
Output - Independence

Categories