How to make my code detect end of string in python? - python

I'm trying to write code to split a sentence without its punctuation. For example, if a user inputs "Hello, how are you?", I want to split the sentence into ['hello','how','are','you'].
userinput = str(raw_input("Enter your sentence: "))
def sentence_split(sentence):
    result = []
    current_word = ""
    for letter in sentence:
        if letter.isalnum():
            current_word += letter
        else: ## this is a symbol or punctuation, e.g. reach end of a word
            if current_word:
                result.append(current_word)
                current_word = "" ## reinitialise for creating a new word
    return result
print "Split of your sentence:", sentence_split(userinput)
So far my code works, but if I enter a sentence that does not end with punctuation, the last word doesn't show up in the result. For example, if the input were "Hello, how are you", the result would be ['hello','how','are']. I guess it's because there's no punctuation to tell the code that the string has ended. Is there a way to make the program detect the end of the string, so that even if the input were "Hello, how are you", the result would still be ['hello','how','are','you']?

I've not tried to adjust your algorithm myself, but I think the method below should achieve what you are after.
def sentence_split(sentence):
    new_sentence = sentence[:]
    for letter in sentence:
        if not letter.isalnum():
            new_sentence = new_sentence.replace(letter, ' ')
    return new_sentence.split()
Now with it running:
runfile(r'C:\Users\cat\test.py', wdir=r'C:\Users\cat')
['Hello', 'how', 'are', 'you']
Edit: Fixed a bug with initialisation of new_sentence.

You could try something like this:
def split_string(text, splitlist):
    for sep in splitlist:
        text = text.replace(sep, splitlist[0])
    return filter(None, text.split(splitlist[0])) if splitlist else [text]
If you set splitlist to "!?,." or whatever you need to split on, this will first replace every instance of punctuation with the first sep from splitlist, and finally will split the whole sentence on the first sep, while removing empty strings from the returned list (that's what filter(None, list) does).
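As a rough illustration, the same idea can be sketched for Python 3, where filter returns a lazy iterator rather than a list. The list comprehension and the sample separator string are assumptions of this sketch, not part of the answer above.

```python
def split_string(text, splitlist):
    """Split text on any character in splitlist, dropping empty pieces."""
    if not splitlist:
        return [text]
    first = splitlist[0]
    # Normalise every separator to the first one, then split once on it.
    for sep in splitlist:
        text = text.replace(sep, first)
    # A list comprehension stands in for filter(None, ...), which is lazy in Python 3.
    return [piece for piece in text.split(first) if piece]


print(split_string("Hello, how are you?", " ,?!."))
```

Including the space in splitlist matters here: without it, the words would stay glued together wherever only a space separates them.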
Or you could use this simple regex solution:
>>> import re
>>> s = "Hello, how are you?"
>>> re.findall(r'([A-Za-z]+)', s)
['Hello', 'how', 'are', 'you']

Since the algorithm expects every word to end with punctuation or a space, you could just add a space to the end of the input to make sure the algorithm terminates properly:
userinput = str(raw_input("Enter your sentence: ")) + " "
Result:
Enter your sentence: hello how are you
Split of your sentence: ['hello', 'how', 'are', 'you']

Method 1:
Why not just use re.split('[a list of chars i do not like]', s)?
https://docs.python.org/2/library/re.html
Method 2:
Sanitize string (remove unwanted characters):
http://pastebin.com/raw.php?i=1j7ACbyK
Then do s.split(' ').
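Method 1 might look roughly like the sketch below. The character class (anything that is not an ASCII letter or digit) and the function name split_words are assumptions chosen for illustration; substitute whatever list of unwanted characters you need.

```python
import re


def split_words(sentence):
    # Split on runs of characters that are not ASCII letters or digits.
    # re.split yields empty strings when the sentence starts or ends
    # with a separator, so those are dropped.
    return [word for word in re.split(r"[^A-Za-z0-9]+", sentence) if word]


print(split_words("Hello, how are you"))
```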

The problem with your code is that you don’t do anything with current_word at the end, unless you hit a non-alphanum character:
for letter in sentence:
    if letter.isalnum():
        current_word += letter
    else:
        if current_word:
            result.append(current_word)
            current_word = ""
return result
If the last character is alphanumeric, it will just be added to current_word, but current_word will never be appended to the result. You can fix this by simply duplicating the append logic after the loop:
for letter in sentence:
    if letter.isalnum():
        current_word += letter
    else:
        if current_word:
            result.append(current_word)
            current_word = ""
if current_word:
    result.append(current_word)
return result
So now, when current_word is non-empty after the loop, it will be appended to the result as well. And in case the last character was some punctuation, current_word will be empty again, so the condition of the if after the loop won’t be true.
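Put together as a complete function, the fix could look like the following sketch (written for Python 3; the two sample inputs are the ones from the question). Note that the words keep their original case.

```python
def sentence_split(sentence):
    result = []
    current_word = ""
    for letter in sentence:
        if letter.isalnum():
            current_word += letter
        elif current_word:
            # A non-alphanumeric character ends the current word.
            result.append(current_word)
            current_word = ""
    # Flush the last word if the sentence did not end with punctuation.
    if current_word:
        result.append(current_word)
    return result


print(sentence_split("Hello, how are you?"))
print(sentence_split("Hello, how are you"))
```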

Related

Shuffle words' characters while maintaining sentence structure and punctuations

So, I want to be able to scramble words in a sentence, but:
Word order in the sentence(s) is left the same.
If the word started with a capital letter, the jumbled word must also start with a capital letter (i.e., the first letter gets capitalised).
Punctuation marks . , ; ! and ? need to be preserved.
For instance, for the sentence "Tom and I watched Star Wars in the cinema, it was fun!" a jumbled version would be "Mto nad I wachtde Tars Rswa ni het amecin, ti wsa fnu!".
from random import shuffle
def shuffle_word(word):
    word = list(word)
    if word.title():
        ???? #then keep first capital letter in same position in word?
    elif char == '!' or '.' or ',' or '?':
        ???? #then keep their position?
    else:
        shuffle(word)
    return''.join(word)
L = input('try enter a sentence:').split()
print([shuffle_word(word) for word in L])
I understand how to jumble each word in the sentence, but I'm struggling with the if statements needed to apply these specific rules. Please help!
Here is my code. Little different from your logic. Feel free to optimize the code.
import random
def shuffle_word(words):
    words_new = words.split(" ")
    out=''
    for word in words_new:
        l = list(word)
        if word.istitle():
            result = ''.join(random.sample(word, len(word)))
            out = out + ' ' + result.title()
        elif any(i in word for i in ('!','.',',')):
            result = ''.join(random.sample(word[:-1], len(word)-1))
            out = out + ' ' + result+word[-1]
        else:
            result = ''.join(random.sample(word, len(word)))
            out = out +' ' + result
    return (out[1:])
L = "Tom and I watched Star Wars in the cinema, it was fun!"
print(shuffle_word(L))
Output of above code execution:
Mto nda I whaecdt Atsr Swra in hte ienamc, ti wsa nfu!
Hope it helps. Cheers!
Glad to see you've figured out most of the logic.
To maintain the capitalization of the first letter, you can check it beforehand and capitalize the "new" first letter later.
first_letter_is_cap = word[0].isupper()
shuffle(word)
if first_letter_is_cap:
    # Re-capitalize first letter
    word[0] = word[0].upper()
To maintain the position of a trailing punctuation, strip it first and add it back afterwards:
last_char = word[-1]
if last_char in ".,;!?":
    # Strip the punctuation
    word = word[:-1]
shuffle(word)
if last_char in ".,;!?":
    # Add it back
    word.append(last_char)
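The two fragments above can be combined into a single function along these lines. This is only a sketch: the constant name PUNCTUATION is hypothetical, and lowercasing the letters before shuffling is an added assumption so that the original capital cannot end up in the middle of the word.

```python
from random import shuffle

PUNCTUATION = ".,;!?"  # hypothetical name for the marks listed in the question


def shuffle_word(word):
    """Shuffle one word, keeping a leading capital and one trailing mark."""
    tail = ""
    if word and word[-1] in PUNCTUATION:
        # Strip the trailing punctuation mark and remember it.
        word, tail = word[:-1], word[-1]
    if not word:
        return tail
    first_letter_is_cap = word[0].isupper()
    # Assumption: lowercase everything so a stray capital cannot land mid-word.
    letters = list(word.lower())
    shuffle(letters)
    if first_letter_is_cap:
        # Re-capitalize whichever letter ended up first.
        letters[0] = letters[0].upper()
    return "".join(letters) + tail


sentence = "Tom and I watched Star Wars in the cinema, it was fun!"
print(" ".join(shuffle_word(w) for w in sentence.split()))
```

Because the shuffle is random, the output differs from run to run; only the word order, leading capitals and trailing marks stay fixed.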
Since this is a string processing algorithm, I would consider using regular expressions. Regex gives you more flexibility and cleaner code, and you can get rid of the conditions for edge cases. For example, this code handles apostrophes, numbers, quote marks and special phrases like dates and times without any additional code, and you can control these just by changing the regular expression pattern.
from random import shuffle
import re
# Characters considered part of words
pattern = r"[A-Za-z']+"
# shuffle and lowercase word characters
def shuffle_word(word):
    w = list(word)
    shuffle(w)
    return ''.join(w).lower()
# function to shuffle word used in replace
def replace_func(match):
    return shuffle_word(match.group())
def shuffle_str(str):
    # replace words with their shuffled version
    shuffled_str = re.sub(pattern, replace_func, str)
    # find original uppercase letters
    uppercase_letters = re.finditer(r"[A-Z]", str)
    # make new characters in uppercase positions uppercase
    char_list = list(shuffled_str)
    for match in uppercase_letters:
        uppercase_index = match.start()
        char_list[uppercase_index] = char_list[uppercase_index].upper()
    return ''.join(char_list)
print(shuffle_str('''Tom and I watched "Star Wars" in the cinema's new 3D theater yesterday at 8:00pm, it was fun!'''))
This works with any sentence, even if it has "special" characters in a row, preserving all the punctuation marks:
from random import sample
def shuffle_word(sentence):
    new_sentence=""
    word=""
    for i,char in enumerate(sentence+' '):
        if char.isalpha():
            word+=char
        else:
            if word:
                if len(word)==1:
                    new_sentence+=word
                else:
                    new_word=''.join(sample(word,len(word)))
                    if word==word.title():
                        new_sentence+=new_word.title()
                    else:
                        new_sentence+=new_word
                word=""
            new_sentence+=char
    return new_sentence
text="Tom and I watched Star Wars in the cinema, it was... fun!"
print(shuffle_word(text))
Output:
Mto nda I hctawed Rast Aswr in the animec, ti asw... fnu!

How to add strings to items in list that resulted from split?

I am building a function that accepts a string as input, splits it based on certain separator and ends on period. Essentially what I need to do is add certain pig latin words onto certain words within the string if they fit the criteria.
The criteria are:
if the word starts with a non-letter or contains no characters, do nothing to it
if the word starts with a vowel, add 'way' to the end
if the word starts with a consonant, place the first letter at the end and add 'ay'
For output example:
simple_pig_latin("i like this") → 'iway ikelay histay.'
--default sep(space) and end(dot)
simple_pig_latin("i like this", sep='.') → 'i like thisway.'
--separator is dot, so whole thing is a single “word”
simple_pig_latin("i.like.this",sep='.',end='!') → 'iway.ikelay.histay!'
--sep is '.' and end is '!'
simple_pig_latin(".") → '..'
--only word is '.', so do nothing to it and add a '.' to the end
It is now:
def simple_pig_latin(input, sep='', end='.'):
    words=input.split(sep)
    new_sentence=""
    Vowels= ('a','e','i','o','u')
    Digit= (0,1,2,3,4,5,6,7,8,9)
    cons=('b','c','d','f','g','h','j','k','l','m','n','p','q','r','s','t','v','w','x','y','z')
    for word in words:
        if word[0] in Vowels:
            new_word= word+"way"
        if word[0] in Digit:
            new_word= word
        if word[0] in cons:
            new_word= word+"ay"
        else:
            new_word= word
        new_sentence= new_sentence + new_word+ sep
    new_sentence= new_sentence.strip(sep) + sentenceEndPunctuation
    return new_sentence
Example error:
ERROR: test_simple_pig_latin_8 (__main__.AllTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "testerl8.py", line 125, in test_simple_pig_latin_8
    result = simple_pig_latin(input,sep='l',end='')
  File "/Users/kgreenwo/Desktop/student.py", line 8, in simple_pig_latin
    if word[0] in Vowels:
IndexError: string index out of range
You have the means of adding strings together correct: you use the + operator, as you have in new_string = new_string + "way".
You have two other major issues, however:
To determine whether a variable can be found in a list (in your case, a tuple), you’d probably want to use the in operator. Instead of if [i][0]==Vowels: you would use if [i][0] in Vowels:.
When you reconstruct the string with the new words, you will need to add the word to your new_string. Instead of new_string=new_string+"way" you might use new_string = new_string+word+"way". If you choose to do it this way, you’ll also need to decide when to add the sep back to each word.
Another way of joining smaller strings into larger ones with a known separator is to create a list of the new individual strings, and then join the strings back together using your known separator:
separator = ' '
words = sentence.split(separator)
newWords = []
for word in words:
    newWord = doSomething(word)
    newWords.append(newWord)
newSentence = separator.join(newWords)
In this way, you don’t have to worry about either the first or last word not needing a separator.
In your case, doSomething might look like:
def doSomething(word):
    if word[0] in Vowels:
        return word + "way"
    elif word[0] in Consonants:
        return word + "ay"
    #and so forth
How to write a function
On a more basic level, you will probably find it easier to create your functions in steps, rather than trying to write everything at once. At each step, you can be sure that the function (or script) works, and then move on to the next step. For example, your first version might be as simple as:
def simple_pig_latin(sentence, separator=' '):
    words = sentence.split(separator)
    for word in words:
        print word
simple_pig_latin("i like this")
This does nothing except print each word in the sentence, one per line, to show you that the function is breaking the sentence apart into words the way that you expect it to be doing. Since words are fundamental to your function, you need to be certain that you have words and that you know where they are before you can continue. Your error of trying to check [i][0] would have been caught much more easily in this version, for example.
A second version might then do nothing except return the sentence recreated, taking it apart and then putting it back together the same way it arrived:
def simple_pig_latin(sentence, separator=' '):
    words = sentence.split(separator)
    new_sentence = ""
    for word in words:
        new_sentence = new_sentence + word + separator
    return new_sentence
print simple_pig_latin("i like this")
Your third version might try to add the end punctuation:
def simple_pig_latin(sentence, separator=' ', sentenceEndPunctuation='.'):
    words = sentence.split(separator)
    new_sentence = ""
    for word in words:
        new_sentence = new_sentence + word + separator
    new_sentence = new_sentence + sentenceEndPunctuation
    return new_sentence
print simple_pig_latin("i like this")
At this point, you’ll realize that there’s an issue with the separator getting added on in front of the end punctuation, so you might fix that by stripping off the separator when done, or by using a list to construct the new_sentence, or any number of ways.
def simple_pig_latin(sentence, separator=' ', sentenceEndPunctuation='.'):
    words = sentence.split(separator)
    new_sentence = ""
    for word in words:
        new_sentence = new_sentence + word + separator
    new_sentence = new_sentence.strip(separator) + sentenceEndPunctuation
    return new_sentence
print simple_pig_latin("i like this")
Only when you can return the new sentence without the pig latin endings, and understand how that works, would you add the pig latin to your function. And when you add the pig latin, you would do it one rule at a time:
def simple_pig_latin(sentence, separator=' ', sentenceEndPunctuation='.'):
    vowels= ('a','e','i','o','u')
    words = sentence.split(separator)
    new_sentence = ""
    for word in words:
        if word[0] in vowels:
            new_word = word + "way"
        else:
            new_word = word
        new_sentence = new_sentence + new_word + separator
    new_sentence = new_sentence.strip(separator) + sentenceEndPunctuation
    return new_sentence
print simple_pig_latin("i like this")
And so on, adding each change one at a time, until the function performs the way you expect.
When you try to build the function complete all at once, you end up with competing errors that make it difficult to see where the function is going wrong. By building the function one step at a time, you should generally only have one error at a time to debug.
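For comparison, here is one possible end state of that step-by-step process, written for Python 3 and using the join approach described earlier. It is a sketch under the three criteria listed in the question; the helper name pig_latin_word is hypothetical. The check for an empty word is what guards against the IndexError in the traceback, since splitting can produce empty strings (for example when two separators are adjacent).

```python
VOWELS = "aeiou"


def pig_latin_word(word):
    # Empty words and words starting with a non-letter are left unchanged.
    if not word or not word[0].isalpha():
        return word
    if word[0].lower() in VOWELS:
        return word + "way"
    # Consonant: move the first letter to the end, then add "ay".
    return word[1:] + word[0] + "ay"


def simple_pig_latin(sentence, sep=" ", end="."):
    words = sentence.split(sep)
    # join puts sep only between words, so nothing has to be stripped afterwards.
    return sep.join(pig_latin_word(word) for word in words) + end


print(simple_pig_latin("i like this"))
print(simple_pig_latin("i.like.this", sep=".", end="!"))
```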

How to separate a irregularly cased string to get the words? - Python

I have the following word list.
Not all of my words are delimited by a capital letter: the list also contains words such as 'USA', and I am not sure how to handle those. 'USA' should stay as one word and must not be split.
myList=[u'USA',u'Chancellor', u'currentRank', u'geolocDepartment', u'populationUrban', u'apparentMagnitude', u'Train', u'artery',
u'education', u'rightChild', u'fuel', u'Synagogue', u'Abbey', u'ResearchProject', u'languageFamily', u'building',
u'SnookerPlayer', u'productionCompany', u'sibling', u'oclc', u'notableStudent', u'totalCargo', u'Ambassador', u'copilote',
u'codeBook', u'VoiceActor', u'NuclearPowerStation', u'ChessPlayer', u'runwayLength', u'horseRidingDiscipline']
How can I edit the elements in the list?
I would like to change the elements in the list as shown below:
updatemyList=[u'USA',u'Chancellor', u'current Rank', u'geoloc Department', u'population Urban', u'apparent Magnitude', u'Train', u'artery',
u'education', u'right Child', u'fuel', u'Synagogue', u'Abbey', u'Research Project', u'language Family', u'building',
u'Snooker Player', u'production Company', u'sibling', u'oclc', u'notable Student', u'total Cargo', u'Ambassador', u'copilote',
u'code Book', u'Voice Actor', u'Nuclear Power Station', u'Chess Player', u'runway Length', u'horse Riding Discipline']
That is, the words within each element should be separated by a space.
You could use re.sub
import re
first_cap_re = re.compile('(.)([A-Z][a-z]+)')
all_cap_re = re.compile('([a-z0-9])([A-Z])')
def convert(word):
    s1 = first_cap_re.sub(r'\1 \2', word)
    return all_cap_re.sub(r'\1 \2', s1)
updated_words = [convert(word) for word in myList]
Adapted from: Elegant Python function to convert CamelCase to snake_case?
You could do this using regex, but it is easier to comprehend with a small algorithm (ignoring corner cases like abbreviations, e.g. NLTK):
def split_camel_case(string):
    new_words = []
    current_word = ""
    for char in string:
        if char.isupper() and current_word:
            new_words.append(current_word)
            current_word = ""
        current_word += char
    return " ".join(new_words + [current_word])
old_words = ["HelloWorld", "MontyPython"]
new_words = [split_camel_case(string) for string in old_words]
print(new_words)
You can use a regular expression to prepend each upper-case letter that's not at the beginning of a word with a space:
re.sub(r"(?!\b)(?=[A-Z])", " ", your_string)
The bit in the first pair of parens means "not at the beginning of a word", and the bit in the second pair of parens means "followed by an uppercase letter". The regular expression matches the empty string at places where these two conditions hold, and replaces the empty string with a space, i.e. it inserts a space at these positions.
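A quick way to see what this pattern does is to run it on a few entries from the question (the wrapper name add_spaces is hypothetical). Note that it also splits all-caps entries, so 'USA' becomes 'U S A', which is not what the question asks for.

```python
import re


def add_spaces(name):
    # Insert a space at every position that is not a word boundary
    # and is immediately followed by an uppercase letter.
    return re.sub(r"(?!\b)(?=[A-Z])", " ", name)


for name in ["currentRank", "NuclearPowerStation", "USA"]:
    print(repr(add_spaces(name)))
```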
The following code snippet separates the words as you want:
myList=[u'Chancellor', u'currentRank', u'geolocDepartment', u'populationUrban', u'apparentMagnitude', u'Train', u'artery', u'education', u'rightChild', u'fuel', u'Synagogue', u'Abbey', u'ResearchProject', u'languageFamily', u'building', u'SnookerPlayer', u'productionCompany', u'sibling', u'oclc', u'notableStudent', u'totalCargo', u'Ambassador', u'copilote', u'codeBook', u'VoiceActor', u'NuclearPowerStation', u'ChessPlayer', u'runwayLength', u'managerYearsEndYear', 'horseRidingDiscipline']
updatemyList = []
for word in myList:
    phrase = word[0]
    for letter in word[1:]:
        if letter.isupper():
            phrase += " "
        phrase += letter
    updatemyList.append(phrase)
print updatemyList
Can you simply do a check to see if all letters in word are caps, and if so, to ignore them i.e. count them as a single word?
I've used similar code in the past, and it looks a bit hard-coded but it does the job right (in my case I wanted to capture abbreviations up to 4 letters long)
def CapsSumsAbbv():
    for word in words:
        for i,l in enumerate(word):
            try:
                if word[i] == word[i].upper() and word[i+1] == word[i+1].upper() and word[i+2] == word[i+2].upper() and word[i+3] == word[i+3].upper():
                    try:
                        word = int(word)
                    except:
                        if word not in allcaps:
                            allcaps.append(word)
            except:
                pass
To expand further: if you had entries such as u'USAMilitarySpending', you could adapt the above code so that if there are more than two capital letters in a row followed by lowercase letters, the space is added between the last and second-to-last capital letter, so it becomes u'USA Military Spending'.
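One way to get that behaviour without hard-coding lengths is a regular expression with two lookaround alternatives. This is a different technique from the loop above, offered only as a sketch; the names BOUNDARY and split_camel_case are hypothetical.

```python
import re

# Zero-width positions where a space should go (assumed rules for this sketch):
#   1. after a lowercase letter or digit, before an uppercase letter
#   2. after an uppercase letter, before an uppercase letter that starts
#      a capitalized word (i.e. is followed by a lowercase letter)
BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def split_camel_case(name):
    return BOUNDARY.sub(" ", name)


for name in ["USA", "USAMilitarySpending", "horseRidingDiscipline", "oclc"]:
    print(repr(split_camel_case(name)))
```

Runs of capitals with no lowercase letter after them, such as 'USA', contain no matching position and are left as a single word.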

How can I create a code that translates a string sentence to Pyglatin? [duplicate]

I have Python code that translates a one-word string to Pyglatin, as follows:
pyg = 'ay'
original = raw_input('Enter a word:')
if len(original)>0 and original.isalpha():
    word = original.lower()
    first = word[0]
    rest = word[1:]
    new_word = rest+first+pyg
    print new_word
However, I'm stumped on how to translate an entire sentence to Pyglatin. The problem I'm working on has these following conditions: for words that begin with consonants, all initial consonants are moved to the end of the word and 'ay' is appended. For words that begin with a vowel, the initial vowel remains, but 'way' is added to the end of the word.
As an example, the string 'How are you today?' would be 'owhay areway uoyay odaytay?'
Read in a sentence. Break it into individual words (split method). Translate each word to Pig Latin. Concatenate the translations.
Does that get you moving?
Try this. Use split and put it into an empty string.
original = raw_input("Enter Sentence: ")
conversion = ""
for word in original.split():
    if len(word)>0 and word.isalpha():
        word = word.lower()
        first = word[0]
        rest = word[1:]
        pyg = "ay"
        pygword = rest+first+pyg
        conversion += pygword + " "
print conversion
Here is my try, but if you are not doing it yourself at least try to understand it. (You are free to ask of course)
This has basic ability to deal with special characters like the "?"
def pygword(word):
    vowels = 'aeiou'
    if word[0] in vowels:
        return word + 'way'
    else:
        while word[0] not in vowels:
            word = word[1:]+word[0]
        return word + "ay"
def pygsentence(sentence):
    final = ''
    for word in sentence.lower().split(): #split the sentence
        #words should only end in symbols in correct grammar
        if word[-1] in '.,;:!?':
            symbol = word[-1]
            word = word[:-1]
        else:
            symbol = ''
        if word.isalpha(): #check that the word contains only letters
            final += pygword(word)+symbol+' '
        else:
            return "There is a strange thing in one of your words."
    return final[:-1] #remove last unnecessary space
There may be faster, more robust, simpler, or more understandable ways to do this, but this is how I would start.
Testing gives me:
In[1]: pygsentence("How are you today? I am fine, thank you very much good sir!")
Out[1]: 'owhay areway ouyay odaytay? iway amway inefay, ankthay ouyay eryvay uchmay oodgay irsay!'
Your code does not obey the vowel/consonant rule, so I wrote my own converter for single words.
I just realized that it won't be able to deal with apostrophes in the middle of words (we don't really have these in German ;) ), so there is a little task left for you.
Edit: I did not know in which order you wanted the consonants appended, since that was not clear from your example. So I made an alternative pygword function:
def pygword2(word):
    vowels = 'aeiou'
    if word[0] in vowels:
        return word + 'way'
    else:
        startcons = ''
        while word[0] not in vowels:
            startcons = word[0] +startcons
            word = word[1:]
        word = word+startcons
        return word + "ay"
See the difference:
In[48]: pygword("street")
Out[48]: 'eetstray'
In[49]: pygword2("street")
Out[49]: 'eetrtsay'

Code to detect all words that start with a capital letter in a string

I'm writing a small snippet that grabs all words that start with a capital letter in Python. Here's my code:
def WordSplitter(n):
    list1=[]
    words=n.split()
    print words
    #print all([word[0].isupper() for word in words])
    if ([word[0].isupper() for word in words]):
        list1.append(word)
    print list1
WordSplitter("Hello How Are You")
Now when I run the above code, I'm expecting that the list will contain all the elements from the string, since all of the words in it start with a capital letter.
But here's my output:
#ubuntu:~/py-scripts$ python wordsplit.py
['Hello', 'How', 'Are', 'You']
['You']# Im expecting this list to contain all words that start with a capital letter
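You're only evaluating the condition once. The list comprehension produces a non-empty list (here all True), which is always truthy, so the if body runs a single time and appends only the last word, which is what the leaked loop variable word still holds in Python 2. Test each word individually instead: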
print [word for word in words if word[0].isupper() ]
or
for word in words:
    if word[0].isupper():
        list1.append(word)
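As a self-contained version for Python 3, the per-word test could be wrapped up like this (the function name capitalized_words is hypothetical):

```python
def capitalized_words(sentence):
    # The test runs once per word, inside the comprehension.
    # split() with no arguments never yields empty strings, so word[0] is safe.
    return [word for word in sentence.split() if word[0].isupper()]


print(capitalized_words("Hello How Are You"))
print(capitalized_words("How are You"))
```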
You can take advantage of the filter function:
l = ['How', 'are', 'You']
print filter(str.istitle, l)
I have written the following Python snippet, which stores each word that starts with a capital letter as a key in a dictionary, with the number of its appearances as the value.
#!/usr/bin/env python
import sys
import re
hash = {} # initialize an empty dictionary
for line in sys.stdin.readlines():
    for word in line.strip().split(): # removing newline char at the end of the line
        x = re.search(r"[A-Z]\S+", word)
        if x:
        #if word[0].isupper():
            if word in hash:
                hash[word] += 1
            else:
                hash[word] = 1
for word, cnt in hash.iteritems(): # iterating over the dictionary items
    sys.stdout.write("%d %s\n" % (cnt, word))
In the above code, I have shown both ways of checking for an uppercase first letter: the string index (commented out) and the regular expression. Any further suggestions for improving the performance or simplicity of the above code are welcome.
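One possible simplification, sketched here for Python 3, is to let collections.Counter do the bookkeeping. The function name count_capitalized and the sample lines are assumptions for illustration; in the script above the lines would come from sys.stdin. Note that re.search(r"[A-Z]\S+", word) matches a capital anywhere in the word and skips one-letter words such as "I", so this sketch uses the index check instead.

```python
from collections import Counter


def count_capitalized(lines):
    counts = Counter()
    for line in lines:
        # Counter.update adds one to the count for every word it is given.
        counts.update(word for word in line.split() if word[0].isupper())
    return counts


sample = ["Hello world Hello", "How are You"]
for word, cnt in count_capitalized(sample).items():
    print(cnt, word)
```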
