Creating sentences by a function without using a package - python

I'm currently writing a program that iterates through a list and groups the words into sentences, but whenever I run it I get [] and I'm not 100% sure why. Here is my code for reading in the file and creating the sentences, along with a snippet of the list.
def import_file(text_file):
    wordcounts = []
    with open(text_file, encoding = "utf-8") as f:
        pride_text = f.read()
        sentences = pride_text.split(" ")
    return sentences
def create_sentance(sentance):
    sentence_list=[]
    my_sentence=""
    for character in sentance:
        if character=='.' or character=='?' or character=='!':
            sentence_list.append(my_sentence)
            my_sentence=""
        else:
            my_sentence=my_sentence + character
        return sentence_list
(Preview of the list: image not included)
Here is how I call my functions:
pride=import_file("pride.txt")
pride=remove_abbreviations_and_punctuation(pride)
pride=create_sentance(pride)
print(pride)

Your return sentence_list is indented one level further than it should be. On the first iteration of the for loop, if the else branch executes rather than the if, your function immediately returns sentence_list, which is still the empty list it was initialized to. Either way, even if sentance were 20 items long, your for loop would only run once because of where the return is.
Make the following change:
def create_sentance(sentance):
    sentence_list=[]
    my_sentence=""
    for character in sentance:
        if character=='.' or character=='?' or character=='!':
            sentence_list.append(my_sentence)
            my_sentence=""
        else:
            # do you not want this in 'sentence_list'?
            my_sentence=my_sentence + character
    return sentence_list

Your function returns an empty list because your return is inside the for loop. In addition, "character" is actually a word, since each element of your list is a word. This program works:
def create_sentance(sentance):
    sentence_list=[]
    my_sentence=""
    for character in sentance:
        print(character)  # shows that each "character" is really a whole word
        if '.' in character or '?' in character or '!' in character:
            sentence_list.append(my_sentence + ' ' + character)
            my_sentence=""
        else:
            my_sentence=my_sentence + ' ' + character
    return sentence_list
print(create_sentance(['I','will','go','to','the','park.','Ok?']))
You need to use "in" instead of == because each "character" is actually a word. Try printing character to see this. The program above returns
[' I will go to the park.', ' Ok?']
which is what you were intending, apart from the leading space on each sentence (call .strip() on each one to remove it).
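If you would rather keep the question's original character-by-character idea, a minimal sketch is to run it over the raw file text (one string) instead of the list of words. The function name split_sentences is hypothetical, and it assumes that '.', '?' and '!' are the only sentence terminators, so abbreviations such as "Mr." will still end a sentence.

```python
def split_sentences(text):
    """Group the characters of `text` into sentences, assuming that
    '.', '?' and '!' are the only sentence terminators."""
    sentences = []
    current = ""
    for character in text:
        current += character
        if character in ".?!":
            # Keep the terminator, drop surrounding spaces/newlines
            sentences.append(current.strip())
            current = ""
    # Keep any trailing text that has no terminator
    if current.strip():
        sentences.append(current.strip())
    return sentences


print(split_sentences("I will go to the park. Ok? Fine!"))
# ['I will go to the park.', 'Ok?', 'Fine!']
```

With the file, this would be called on the whole text, e.g. with open("pride.txt", encoding="utf-8") as f: split_sentences(f.read()).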

Related

Python Code to remove punctuation from dictionary not functioning correctly

I already turned in this assignment but it is driving me crazy. We were given a method to strip punctuation from a dictionary using a "for" loop with this example:
import string

quote = " The joy of coding Python should be in seeing short, concise, readable classes that express " \
    "a lot of action in a small amount of clear code -- not in reams of trivial code that bores the " \
    "reader to death. "

print(quote)
word_list = quote.split()
for word in word_list:
    word = word.strip(string.punctuation)
    print(word)
Our assignment for the week was to take the Gettysburg Address, saved as a .txt file, and create a dictionary that counts how many times each word appears. On my first try I did this:
import string
def word_counter(speech, word_dictionary):
    for word in speech:
        if word in word_dictionary:
            word_dictionary[word] += 1
        else:
            word_dictionary[word] = 1
def process_line(word_list, word_dictionary):
    ##split speech into list of words
    words_split = word_list.split()
    ##remove punctuation from list
    for word in words_split:
        word = word.strip(string.punctuation)
    else:
        word_counter(word, word_dictionary)
    # Printing extra Values
    pretty_print(word_dictionary)
def pretty_print(word_dictionary):
    ##clean up values that shouldn't be there
    ##word_dictionary.pop("")
    ##word_dictionary.pop("19")
    ##word_dictionary.pop("1863")
    ##word_dictionary.pop("Abraham")
    ##word_dictionary.pop("Lincoln")
    ##Calculating how many words are in the dictionary
    word_count_sum = len(word_dictionary.items())
    print("Length of dictionary: ", word_count_sum)
    for key, value in sorted(word_dictionary.items(), key=lambda kv: kv[1], reverse=True):
        print("%s: %s" % (key, value))
def main():
    ##creating empty dictionary
    word_count_dict = {}
    ##uploading file
    gba_file = open('gettysburg.txt','r')
    data = gba_file.read()
    process_line(data,word_count_dict)
if __name__ == '__main__':
    main()
What happens is that the only entries in the dictionary are 1, 9, 8, and 3. I added a print statement, and it does run through the entire loop; it also loops through the entire list after the split. I was able to complete the assignment by using:
for word in words_split:
    for character in word:
        if character in string.punctuation:
            word = word.replace(character,"")
    input_list.append(word)
but I am trying to learn so I want to know what I was doing wrong. Can anyone help? Sorry for the lengthy post and let me know if you need the .txt file to solve this.
You have an errant else in here that's messing up your for loop:
for word in words_split:
    word = word.strip(string.punctuation)
else:
    word_counter(word, word_dictionary)
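The else: clause of a for loop runs exactly once, after the loop has finished (and is skipped entirely if the loop exits via break), so you're only calling word_counter on the very last word from the loop. You don't need the else: here at all; delete that line and indent the word_counter call so it runs once per word. Note also that word_counter iterates over its first argument, so passing it a single string counts that word's characters rather than the word itself, which is why your dictionary ended up with single characters as keys. Pass it a list such as [word], or increment word_dictionary[word] directly.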
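Concretely, a corrected loop might look like the sketch below. It drops the else:, skips tokens that strip down to an empty string (such as "--"), and counts each word directly with dict.get() instead of calling word_counter; the pretty_print call is left out to keep it short. Words are counted case-sensitively, so "Four" and "four" stay separate unless you add .lower().

```python
import string


def process_line(text, word_dictionary):
    """Count each punctuation-stripped word of `text` into `word_dictionary`."""
    for word in text.split():
        word = word.strip(string.punctuation)
        if not word:
            # Tokens made only of punctuation (e.g. "--") strip to ""
            continue
        word_dictionary[word] = word_dictionary.get(word, 0) + 1


counts = {}
process_line("Four score and seven years ago, and -- seven.", counts)
print(counts)
# {'Four': 1, 'score': 1, 'and': 2, 'seven': 2, 'years': 1, 'ago': 1}
```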
Note that Python comes with a built-in class, collections.Counter, that will do this exact thing without you having to write your own function.
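As a sketch of the Counter route: the helper name count_words is hypothetical, and lower-casing every word is an added choice so that "Four" and "four" are counted together.

```python
import string
from collections import Counter


def count_words(text):
    """Return a Counter of punctuation-stripped, lower-cased words."""
    words = (word.strip(string.punctuation).lower() for word in text.split())
    # Drop tokens that were nothing but punctuation
    return Counter(word for word in words if word)


counts = count_words("Four score and seven years ago, and -- seven.")
print(counts.most_common(2))  # [('and', 2), ('seven', 2)]
```

With the real file this would be something like with open('gettysburg.txt') as f: counts = count_words(f.read()), and counts.most_common() takes the place of the manual sort in pretty_print.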
import string

quote = " The joy of coding Python should be in seeing short, concise, readable classes that express " \
    "a lot of action in a small amount of clear code -- not in reams of trivial code that bores the " \
    "reader to death. "

for s_char in string.punctuation:
    quote = quote.replace(s_char,"")
The str.replace() method replaces every occurrence of the substring in the string, not just the first one, so a single call per punctuation character is enough.
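A related alternative, not taken from the answer above, is str.translate with a table built by str.maketrans: it deletes every punctuation character in a single pass instead of one replace() call per character. Unlike strip(), it also removes punctuation inside words (so "don't" becomes "dont"), and removing "--" can leave a double space, which split() later ignores.

```python
import string

# The third argument of str.maketrans lists characters to delete
REMOVE_PUNCTUATION = str.maketrans("", "", string.punctuation)

quote = "short, concise, readable classes -- not reams of code."
print(quote.translate(REMOVE_PUNCTUATION).split())
```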

How to check generated strings against a text file

I'm trying to have the user input a string of characters with one asterisk. The asterisk indicates a character that can be subbed out for a vowel (a,e,i,o,u) in order to see what substitutions produce valid words.
Essentially, I want to take an input "l*g" and have it return "lag, leg, log, lug" because "lig" is not a valid English word. In the code below, invalid words are represented as "x".
I've gotten it to properly output each possible combination (e.g., including "lig"), but once I try to compare these words with the text file I'm referencing (for the list of valid words), it'll only return 5 lines of x's. I'm guessing it's that I'm improperly importing or reading the file?
Here's the link to the file I'm looking at so you can see the formatting:
https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/words.zip
I'm using the "en" file (~2.5 MB).
It's not in a dictionary layout i.e. no corresponding keys/values, just lines (maybe I could use the line number as the index, but I don't know how to do that). What can I change to check the test words to narrow down which are valid words based on the text file?
import os

with open(os.path.expanduser('~/Downloads/words/en')) as f:
    words = f.readlines()
inputted_word = input("Enter a word with ' * ' as the missing letter: ")
letters = []
for l in inputted_word:
    letters.append(l)
### find the index of the blank
asterisk = inputted_word.index('*') # also used a redundant int(), works fine
### sub in vowels
vowels = ['a','e','i','o','u']
list_of_new_words = []
for v in vowels:
    letters[asterisk] = v
    new_word = ''.join(letters)
    list_of_new_words.append(new_word)
for w in list_of_new_words:
    if w in words:
        print(new_word)
    else:
        print('x')
There are probably more efficient ways to do this, but I'm brand new to this. The last two for loops could probably be combined but debugging it was tougher that way.
print(list_of_new_words)
gives
['lag', 'leg', 'lig', 'log', 'lug']
So far, so good.
But this:
for w in list_of_new_words:
    if w in words:
        print(new_word)
    else:
        print('x')
Here you print new_word, which is defined in the previous for loop:
for v in vowels:
    letters[asterisk] = v
    new_word = ''.join(letters) # <----
    list_of_new_words.append(new_word)
So after the loop, new_word still holds the last value assigned to it: "lug" (if the input was l*g).
You probably meant w instead?
for w in list_of_new_words:
    if w in words:
        print(w)
    else:
        print('x')
But it still prints five x's...
That means w in words is always False. Why?
Looking at words:
print(words[0:10]) # the first 10 will suffice
['A\n', 'a\n', 'aa\n', 'aal\n', 'aalii\n', 'aam\n', 'Aani\n', 'aardvark\n', 'aardwolf\n', 'Aaron\n']
All the words from the dictionary end with a newline character (\n). I guess you were not aware that readlines() keeps it. So I recommend using:
words = f.read().splitlines()
instead.
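To see the difference without the real word file, here is a small sketch that uses io.StringIO as a stand-in for the opened file; the sample contents mimic the first few lines of the en file shown above.

```python
import io

# io.StringIO behaves like an opened text file for this purpose
sample = "A\na\naa\naal\n"

print(io.StringIO(sample).readlines())          # ['A\n', 'a\n', 'aa\n', 'aal\n']
print(io.StringIO(sample).read().splitlines())  # ['A', 'a', 'aa', 'aal']
```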
With these 2 modifications (w and splitlines) :
Enter a word with ' * ' as the missing letter: l*g
lag
leg
x
log
lug
🎉
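Putting both fixes together, one possible restructuring (a sketch, not the only way) builds the candidates by slicing around the asterisk and checks them against a set, which makes each lookup fast compared with scanning a 2.5 MB list. The function names vowel_candidates and valid_words are hypothetical, and the small set below stands in for the real word file.

```python
VOWELS = "aeiou"


def vowel_candidates(pattern):
    """Replace the single '*' in `pattern` with each vowel in turn."""
    asterisk = pattern.index("*")  # raises ValueError if there is no '*'
    return [pattern[:asterisk] + v + pattern[asterisk + 1:] for v in VOWELS]


def valid_words(pattern, dictionary):
    """Return the candidates that appear in `dictionary` (a set of words)."""
    return [w for w in vowel_candidates(pattern) if w in dictionary]


# In the real program the set would come from the word file, e.g.
#   with open(os.path.expanduser('~/Downloads/words/en')) as f:
#       dictionary = set(f.read().splitlines())
dictionary = {"lag", "leg", "log", "lug"}  # tiny stand-in for the word list
print(valid_words("l*g", dictionary))  # ['lag', 'leg', 'log', 'lug']
```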

split strings with multiple special characters into lists without importing anything in python

I need to make a program that will capitalize the first word in a sentence, and I want to be sure that all the special characters used to end a sentence are handled.
I cannot import anything! This is for a class and I just want some examples of how to do this.
I have tried using if to look in the string for a matching character and then do the corresponding split operation...
This is the function I have now. I know it's not good at all, as it just returns the original string...
def getSplit(userString):
    userStringList = []
    if "? " in userString:
        userStringList=userString.split("? ")
    elif "! " in userStringList:
        userStringList = userString.split("! ")
    elif ". " in userStringList:
        userStringList = userString.split(". ")
    else:
        userStringList = userString
    return userStringList
I want to be able to input something like: this is a test. this is a test? this is definitely a test!
and get ['this is a test.', 'this is a test?', 'this is definitely a test!']
Then the list of sentences will be sent to another function that capitalizes the first letter of each sentence.
This is an old homework assignment where I could only make it use one special character to split the string into a list, but I want the user to be able to enter more than one kind of sentence...
This may help: use str.replace to replace the special characters with spaces and then use str.split. Note that this splits the text into words rather than sentences, and the punctuation is discarded.
Ex:
def getSplit(userString):
    return userString.replace("!", " ").replace("?", " ").replace(".", " ").split()
print(list(map(lambda x: x.capitalize(), getSplit("sdfsdf! sdfsdfdf? sdfsfdsf.sdfsdfsd!fdfgdfg?dsfdsfgf"))))
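Normally, you could use re.split(), but since you cannot import anything, the best option is a plain for loop. Note that this version capitalizes each sentence with str.capitalize() (which also lower-cases the rest of the sentence) and joins the results back into a single string instead of returning a list: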
def getSplit(user_input):
    n = len(user_input)
    sentences = []
    previdx = 0
    for i in range(n - 1):
        if user_input[i:i+2] in ['. ', '! ', '? ']:
            sentences.append(user_input[previdx:i+2].capitalize())
            previdx = i + 2
    sentences.append(user_input[previdx:n].capitalize())
    return "".join(sentences)
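Since the question asks for a list, here is a no-import sketch that returns one, keeps each sentence's terminator, and treats runs such as "?!" or "..." as a single ending. The names get_sentences and capitalize_first are hypothetical. capitalize_first upper-cases only the first character, because str.capitalize() would also lower-case the rest (turning "NASA" into "nasa").

```python
def get_sentences(text):
    """Split `text` into sentences ending in '.', '!' or '?', without imports."""
    terminators = ".!?"
    sentences = []
    start = 0
    for i, ch in enumerate(text):
        # Split after a terminator unless another one follows ("?!", "...")
        is_last = i + 1 == len(text)
        if ch in terminators and (is_last or text[i + 1] not in terminators):
            sentence = text[start:i + 1].strip()
            if sentence:
                sentences.append(sentence)
            start = i + 1
    # Keep any trailing text that has no terminator
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences


def capitalize_first(sentence):
    """Upper-case only the first character; leave the rest untouched."""
    return sentence[:1].upper() + sentence[1:]


sentences = get_sentences("this is a test. this is a test? this is definitely a test!")
print(sentences)
print([capitalize_first(s) for s in sentences])
```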
I would split the string at each whitespace, then scan the list for words that contain one of the special characters; if one does, the next word is capitalised. Join the list back together at the end. Note that joining with single spaces collapses runs of whitespace (including line breaks) into one space.
def capitalise(text):
    words = text.split()
    if not words:
        return ""
    new_words = [words[0].capitalize()]
    for word in words[1:]:
        # Capitalise any word that follows one containing '.', '!' or '?'
        previous = new_words[-1]
        if "." in previous or "!" in previous or "?" in previous:
            new_words.append(word.capitalize())
        else:
            new_words.append(word)
    return " ".join(new_words)
If you can use the re module, which is part of Python's standard library, this is how you could do it:
import re
a = 'test this. and that, and maybe something else?even without space. or with multiple.\nor line breaks.'
print(re.sub(r'[.!?]\s*\w', lambda x: x.group(0).upper(), a))
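This prints the following (the \n in the string becomes a real line break, and the very first word stays lowercase because no punctuation precedes it):
test this. And that, and maybe something else?Even without space. Or with multiple.
Or line breaks.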

Python 3 - How to capitalize first letter of every sentence when translating from morse code

I am trying to translate Morse code into words and sentences and it all works fine... except for one thing. My entire output is lowercase, and I want to capitalize the first letter of every sentence.
This is my current code:
text = input()
if is_morse(text):
    lst = text.split(" ")
    text = ""
    for e in lst:
        text += TO_TEXT[e].lower()
    print(text)
Each element in the split list is a single character (in Morse), NOT a word. TO_TEXT is a dictionary. Does anyone have an easy solution to this? I am a beginner in programming and Python, by the way, so I might not understand some solutions...
Maintain a flag telling you whether or not this is the first letter of a new sentence. Use that to decide whether the letter should be upper-case.
text = input()
if is_morse(text):
    lst = text.split(" ")
    text = ""
    first_letter = True
    for e in lst:
        if first_letter:
            this_letter = TO_TEXT[e].upper()
        else:
            this_letter = TO_TEXT[e].lower()
        # Period heralds a new sentence.
        first_letter = this_letter == "."
        text += this_letter
    print(text)
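A self-contained sketch of the same flag idea, under stated assumptions: TO_TEXT below is a tiny hypothetical stand-in for the asker's dictionary, and word gaps are assumed to be written as "/" and decoded to a space. In the version above, if a space is decoded right after the period, the capital is spent on that space; here the flag is only cleared once an actual letter has been upper-cased, and "!" and "?" also start a new sentence. The function name morse_to_sentences is hypothetical.

```python
# Hypothetical, tiny stand-in for the asker's TO_TEXT dictionary.
# Assumes letters are separated by " " and words by "/" (decoded as a space).
TO_TEXT = {
    ".-": "A", "-...": "B", "....": "H", "..": "I",
    ".-.-.-": ".", "/": " ",
}


def morse_to_sentences(morse):
    """Decode `morse`, upper-casing the first letter of each sentence."""
    text = ""
    first_letter = True
    for code in morse.split(" "):
        char = TO_TEXT[code]
        if first_letter and char.isalpha():
            text += char.upper()
            first_letter = False  # only a real letter uses up the capital
        else:
            text += char.lower()
        if char in (".", "!", "?"):
            # Sentence-ending punctuation: capitalise the next letter
            first_letter = True
    return text


print(morse_to_sentences(".... .. .-.-.- / -... .- .... .-.-.-"))  # Hi. Bah.
```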
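From what I can understand of your code, you could use Python's str.title() method, though note that it capitalizes the first letter of every word, not just the first letter of each sentence.
For a slightly different result, you can use the capwords() function by importing the string module. It also capitalizes every word, but it splits only on whitespace, so apostrophes are handled better ("it's" becomes "It's" rather than "It'S").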
This is what you get from Python docs on capwords:
Split the argument into words using str.split(), capitalize each word using str.capitalize(), and join the capitalized words using str.join(). If the optional second argument sep is absent or None, runs of whitespace characters are replaced by a single space and leading and trailing whitespace are removed, otherwise sep is used to split and join the words.
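A quick comparison of the two on a lowercase two-sentence string shows that neither limits capitalization to sentence starts; they differ mainly in how they treat the apostrophe.

```python
import string

text = "hello world. it's fine"
print(text.title())           # Hello World. It'S Fine
print(string.capwords(text))  # Hello World. It's Fine
```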

Python: if a word contains a digit

I'm writing a function that will take a word as a parameter and will look at each character and if there is a number in the word, it will return the word
This is my string that I will iterate through
'Let us look at pg11.'
and I want to look at each character in each word and if there is a digit in the word, I want to return the word just the way it is.
import string
def containsDigit(word):
    for ch in word:
        if ch == string.digits
        return word
if any(ch.isdigit() for ch in word):
    print(word, 'contains a digit')
To make your code work use the in keyword (which will check if an item is in a sequence), add a colon after your if statement, and indent your return statement.
import string
def containsDigit(word):
    for ch in word:
        if ch in string.digits:
            return word
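Because the question's goal is to check each word of a sentence such as 'Let us look at pg11.', a small sketch combining the any() idea with str.split() may help. The name contains_digit is hypothetical. Note that str.isdigit() also accepts some non-ASCII digit characters (for example '²'), while string.digits contains only '0123456789'.

```python
def contains_digit(word):
    """Return True if any character of `word` is a digit."""
    return any(ch.isdigit() for ch in word)


sentence = 'Let us look at pg11.'
print([word for word in sentence.split() if contains_digit(word)])  # ['pg11.']
```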
Why not use Regex?
>>> import re
>>> word = "super1"
>>> if re.search(r"\d", word):
...     print("y")
...
y
>>>
So, in your function, just do:
import re
def containsDigit(word):
    if re.search(r"\d", word):
        return word
print(containsDigit("super1"))
output:
super1
You are missing a colon:
for ch in word:
    if ch.isdigit(): #<-- you are missing this colon
        print("%s contains a digit" % word)
        return word
Often, when you want to know whether "something" contains "something_else", sets may be useful.
digits = set('0123456789')
def containsDigit(word):
    if set(word) & digits:
        return word
print(containsDigit('hello'))  # None, because 'hello' has no digits
If you really want to use the string module, here is the code:
import string
def search(raw_string):
    for raw_array in string.digits:
        for listed_digits in raw_array:
            if listed_digits in raw_string:
                return True
    return False
If I run it in the shell, I get the expected results (True if the string contains a digit, False if not):
>>> search("Give me 2 eggs")
True
>>> search("Sorry, I don't have any eggs.")
False
Code Break Down
This is how the code works
string.digits is the string '0123456789', so looping over it yields each digit character in turn. Each of those is a one-character string, so the inner loop over raw_array just yields the same character again (it is redundant, but harmless). For each digit, listed_digits in raw_string checks whether that digit occurs anywhere in the input: the function returns True at the first match, and returns False only after all ten digits have been checked without a match.
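Since the inner loop is redundant, the same search function can be written with a single loop. This is a simplified sketch of the answer's code, not a different algorithm; any(digit in raw_string for digit in string.digits) would be an equivalent one-liner.

```python
import string


def search(raw_string):
    """Return True if `raw_string` contains any character of string.digits."""
    for digit in string.digits:  # '0', '1', ..., '9'
        if digit in raw_string:
            return True
    return False


print(search("Give me 2 eggs"))                 # True
print(search("Sorry, I don't have any eggs."))  # False
```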
