Finding specific words in a file - python

I have to write a program in Python where the user is given a menu with four different "word games". There is a file called dictionary.txt and one of the games requires the user to input a) the number of letters in a word and b) a letter to exclude from the words being searched in the dictionary (dictionary.txt has the whole dictionary). Then the program prints the words that follow the user's requirements. My question is how on earth do I open the file and search for words with a certain length in that file? I only have basic code that just asks the user for inputs. I am very new at this, please help :(
This is what I have up to the first option. The others are fine and I know how to break the loop, but this specific one is really giving me trouble. I have tried everything and I just keep getting errors. Honestly, I only took this class because someone said it would be fun. It is, but recently I've really been falling behind and I have no idea what to do now. This is an intro-level course, so please be nice; I've never done this before :(
print
print "Choose Which Game You Want to Play"
print "a) Find words with only one vowel and excluding a specific letter."
print "b) Find words containing all but one of a set of letters."
print "c) Find words containing a specific character string."
print "d) Find words containing state abbreviations."
print "e) Find US state capitals that start with months."
print "q) Quit."
print
choice = raw_input("Enter a choice: ")
choice = choice.lower()
print choice
while choice != "q":
    if choice == "a":
        # wordlen = word length the user is looking for.
        wordlen = raw_input("Please enter the word length you are looking for: ")
        wordlen = int(wordlen)
        print wordlen
        # letterex = letter the user wishes to exclude.
        letterex = raw_input("Please enter the letter you'd like to exclude: ")
        letterex = letterex.lower()
        print letterex

Here's what you'd want to do, algorithmically:
Open up your file
Read it line by line, and on each line (assuming each line has one and only one word), check if that word is a) of appropriate length and b) does not contain the excluded character
What sort of control flow would this suggest you use? Think about it.
I'm not sure if you're confused about how to approach this from a problem-solving standpoint or a Python standpoint, but if you're not sure how to do this specifically in Python, here are some helpful links:
The Input and Output section of the official Python tutorial
The len() function, which can be used to get the length of a string, list, set, etc.
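The two checks described above can be sketched as a small function. This is only an illustration in Python 3 syntax, not the asker's code: the name find_words and its parameters are made up here, and it assumes one word per line and a single excluded letter.

```python
def find_words(lines, length, excluded):
    """Return the words of the given length that do not contain `excluded`.

    `lines` can be any iterable of strings, including an open file.
    Assumes one word per line and that `excluded` is a single letter.
    """
    matches = []
    for line in lines:
        word = line.strip()  # remove the trailing newline and stray spaces
        if len(word) == length and excluded.lower() not in word.lower():
            matches.append(word)
    return matches


# A short list stands in for the lines of dictionary.txt here.
sample = ["apple\n", "grape\n", "pear\n", "mango\n"]
print(find_words(sample, 5, "e"))  # ['mango']
```

With a real file, the same function could be called as find_words(open_file, wordlen, letterex), since iterating over an open file yields one line at a time.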

To open the file, use open(). You should also read the Python tutorial sec. 7, file input/output.
Open a file and get each line
Assuming your dictionary.txt has each word on a separate line:
opened_file = open('dictionary.txt')
for line in opened_file:
    print(line)  # Put your code here to run it for each word in the dictionary
Word length:
You can check the length of a string using the built-in len() function. See the Python documentation on built-in functions.
len("Bacon, eggs and spam") # returns 20 (the string is 20 characters long)
Check if a letter is in a word:
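Use the in operator (for example, letter in word), or str.find(), which returns -1 when the letter is not found; see the Python string methods documentation.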
Further comments after seeing your code sample:
If you want to print a multi-line prompt, use a triple-quoted string instead of repeated print statements.
What happens if, when asked "how many letters long", your user enters bacon sandwich instead of a number? (Your assignment may not specify that you should gracefully handle incorrect user input, but it never hurts to think about it.)
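One way to handle the "bacon sandwich" case is to catch the ValueError that int() raises. The helper below is a hypothetical sketch in Python 3 syntax (the name parse_word_length is not from the question); it returns None for anything that is not a positive whole number, so the caller can ask again.

```python
def parse_word_length(text):
    """Return `text` as a positive int, or None if it is not a valid length."""
    try:
        value = int(text.strip())
    except ValueError:
        # int() raises ValueError for input such as "bacon sandwich".
        return None
    return value if value > 0 else None


print(parse_word_length("5"))               # 5
print(parse_word_length("bacon sandwich"))  # None
```

The calling code could loop until the result is not None before moving on to the next prompt.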

My question is how on earth do I open the file
Use the with statement
with open('dictionary.txt','r') as f:
    for line in f:
        print line
and search for words with a certain length in that file.
First, decide what is the length of the word you want to search.
Then, read each line of the file that has the words.
Check each word for its length.
If it matches the length you are looking for, add it to a list.
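The steps above, combined with the with statement, might look like the sketch below. It uses Python 3 syntax, the function name words_of_length is an assumption for illustration, and a temporary file stands in for dictionary.txt so the example runs on its own.

```python
import os
import tempfile


def words_of_length(path, length):
    """Return the words in the file at `path` that are exactly `length` long."""
    matches = []
    with open(path) as f:  # the file is closed automatically after the block
        for line in f:
            word = line.strip()  # drop the trailing newline before measuring
            if len(word) == length:
                matches.append(word)
    return matches


# Demonstration: a small temporary file stands in for dictionary.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("cat\nhorse\ndog\nzebra\n")
print(words_of_length(tmp.name, 3))  # ['cat', 'dog']
os.remove(tmp.name)
```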

Related

Keeping track of position in searched text when the same word appears more than once

def test():
    users_note = """There is a substantial need for child mental health support –
evidence shows that at any given year,
1 in 10 young people will have a diagnosable mental health problem.
Out of these incredibly high numbers,
70% of these young people do not receive adequate mental health support at all,
and of the 30% that do,
only half of them improve.
Though we have evidence-based intervention techniques that work,
we are still relying on outdated technologies and delivery mechanisms.
"""
    note_list = users_note.split()
    while True:
        for users_try in range(5):
            word_list = input('Enter words you gonna say next: ').split()
            for word in word_list:
                if word not in note_list:
                    print('Sorry, try again')
                else:
                    print(note_list[note_list.index(word) + 1])
        else:
            break
This function is essentially a teleprompter. When the user inputs a word or multiple words in any order, the function prints the word right after each of the user's input words. However, there's an edge case when an input word occurs multiple times within the pre-made text: the function always prints the word after the first occurrence. How can this function be modified to keep track of the position of the user's input word within the pre-made text? For example:
For the first occurrence of "a" in the pre-made text:
Enter words you gonna say next: a
substantial
If the user has already passed that point in the text, inputting "a" a second time:
Enter words you gonna say next: a
diagnosable
For the first occurrence of "these" in the pre-made text:
Enter words you gonna say next: these
incredibly
If the user has already passed that point in the text, inputting "these" a second time:
Enter words you gonna say next: these
young
In other words, the program should continue moving forward in the text once a word has already been searched for. How do I accomplish that?
As long as you don't intend for the user to ever be able to go backwards in the text, an easy solution is to modify note_list as you go:
if word not in note_list:
    print('Sorry, try again')
else:
    note_list = note_list[note_list.index(word) + 1:]
    print(note_list[0])
This makes it impossible to ever "repeat" a particular word, whether the word is unique or not.
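A variation on the same idea is to leave note_list untouched and remember a start index instead, using the optional start argument of list.index(). The sketch below is an assumption-laden illustration, not the answerer's code: the helper name next_word_after is made up, and it also covers the case where the searched word is the very last word in the text.

```python
def next_word_after(words, word, start=0):
    """Look for `word` at or after index `start`.

    Returns (following_word, new_start). When `word` does not occur in the
    remaining text, or is the final word, returns (None, start) unchanged.
    """
    try:
        position = words.index(word, start)
    except ValueError:
        return None, start
    if position + 1 >= len(words):
        return None, start
    return words[position + 1], position + 1


words = "there is a substantial need and a diagnosable problem".split()
cursor = 0
following, cursor = next_word_after(words, "a", cursor)
print(following)  # substantial
following, cursor = next_word_after(words, "a", cursor)
print(following)  # diagnosable
```

Keeping a cursor avoids copying the list on every lookup, and the original text stays available if it is needed later.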

how do I add phrases to a file without deleting the file

I have a plain-text (.txt) file. I also have separate files for phrases of different lengths (spaces count towards the phrase length). I say "phrases" because an entry can be multiple words, but in the example below I use three-letter entries, all of which are single words. Each phrase is on its own line and is followed by a comma. Imagine you have a file like this:
app,
bar,
car,
eel,
get,
pod,
What I want is to be able to add one or more phrases that will be assumed to only contain lowercase alphabetical letters and/or spaces.
For example, let us say I want to add the phrases in this order:
(cat, bat, car, hat, mom, rat)
Basically, I want to add these phrases to the file without deleting the file, making sure that no phrase is repeated in the file and that the phrases stay alphabetically sorted. Spaces are assumed to sort after the letter z. So after inputting these phrases, the file should look like this:
'
app,
bar,
bat,
car,
cat,
eel,
get,
hat,
mom,
pod,
rat
'
Each file is expected to grow to at least a gigabyte of data, so copying the file in order to accomplish this is a no-go. What is the fastest, least memory-consuming way to do this?
I haven't tried anything that 100% works. I know what to do, I just don't know how to do it. Here are the main points that I need to accomplish.
1) Make sure the phrase(s) are created (using input() function)
2) Open the file of organized words (using "with open(filename)" statements)
3) Put each phrase into the "correct" spot in the file. By "correct" I mean that is alphabetical and is not a repeat.
4) Make sure the file doesn't get deleted.
Here is what I have currently (changed it a bit and it is doing MORE of what I want, but not everything):
phrase_to_add = input('Please enter the phrase: ').lower()
with open('/Users/ian/Documents/three_character_phrases.txt') as file:
    unique_phrases = list(file.read().split())
unique_phrases.append(phrase_to_add)
unique_phrases.sort()
list_of_phrases = set()
for phrase in unique_phrases:
    list_of_phrases.add(phrase)
with open('/Users/ian/Documents/three_character_phrases.txt', 'w') as fin:
    for phrase in list_of_phrases:
        fin.write(phrase + '\n')
So I started with BOTH files being empty and I added the word 'cow' by putting it into the input, and this is what the file looked like:
three_character_phrases.txt:
cow
then I inputted the word "bat" and I got this:
bat
cow
then I added the word "bawk" (I know it isn't a 3 letter word but I'll take care of making sure the right words go into the right files)
I got this:
bawk
bat
cow
It looks like you're getting wrapped up in the implementation instead of trying to understand the concept, so let me invite you to take a step back with me.
You have a data structure that resembles a list (since order is relevant) but allows no duplicates.
['act', 'bar', 'dog']
You want to add an entry to that list
['act', 'bar', 'cat', 'dog']
and serialize the whole thing to file afterwards so you can use the same data between multiple sessions.
First up is to establish your serialization method. You've chosen a plain text file, line delimited. There's nothing wrong with that, but if you were looking for alternatives then a csv, a json, or indeed serializing directly to database might be good too. Let's proceed forward under the assumption that you won't change serialization schemas, though.
It's easy to read from file
from pathlib import Path
FILEPATH = Path("/Users/ian/Documents/three_character_phrases.txt")
def read_phrases():
    with FILEPATH.open(mode='r') as f:
        return [line.strip() for line in f]
and it's easy to write to it, too.
# Assume FILEPATH is defined here, and in all future snippets as well.
def write_phrases(phrases):
    with FILEPATH.open(mode='w') as f:
        f.writelines(f'{phrase}\n' for phrase in phrases)
        # this is equivalent to:
        # text = '\n'.join(phrases)
        # f.write(text + '\n')
You've even figured out how to have the user enter a new value, though your algorithm could use work to make the worst case better. Since you're always inserting into a sorted list, the bisect stdlib module can help your performance here for large lists (I'll leave that for a different question, though).
Since you've successfully done all the single steps, the only thing holding you back is to put them all together.
phrases = read_phrases()
phrase_to_add = input('Please enter the phrase: ').lower()
if phrase_to_add not in phrases:
    phrases.append(phrase_to_add)
    phrases.sort() # this is, again, not optimal. Look at bisect!
    write_phrases(phrases)
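As a rough illustration of the bisect hint, the sketch below finds the insertion point with a binary search and skips phrases that are already present. The helper name insert_sorted_unique is hypothetical, and it relies on Python's default string ordering, in which a space sorts before the letters (the question's "spaces after z" rule would need a custom sort key).

```python
import bisect


def insert_sorted_unique(phrases, new_phrase):
    """Insert `new_phrase` into the sorted list `phrases` unless it is present.

    Returns True if the phrase was inserted, False if it was a duplicate.
    """
    index = bisect.bisect_left(phrases, new_phrase)  # binary search
    if index < len(phrases) and phrases[index] == new_phrase:
        return False
    phrases.insert(index, new_phrase)
    return True


phrases = ["app", "bar", "car", "eel", "get", "pod"]
for phrase in ["cat", "bat", "car", "hat", "mom", "rat"]:
    insert_sorted_unique(phrases, phrase)
print(phrases)
```

This replaces the linear "not in" check and the full sort with a single binary search per phrase, although the whole list is still held in memory and written back afterwards.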

How to fix this Anagram Scrambler code?

print "YOU HAVE CHOSEN TO REARRANGE YOUR THE WORD THAT YOU ARE ABOUT TO ENTER..."
word = raw_input ("FIRSTLY YOU MUST ENTER A WORD TO BE REARRANGED, ENTER IT HERE:")
character_save = word[1]
def anagram(word):
    if len(word)>1:
        print str.replace('a','b')
        word = str.replace(word[1],word[3])
        word= str.replace(word[3], character_save,1)
        print word
anagram(word)
I have tried to fix this on numerous occasions. The problem the first time was that it would just replicate characters instead of swapping their positions. The second time I tried to store the character at the position I was going to replace in a variable, but now the error says that only one argument is given (when it should be 2).
Would it be easier to do this with a list instead of a string?
The replace method that you are using is called on the string in which you want to replace something, not on the str type itself.
In your case that is the word parameter that you are providing.
So if you replace the instances of str.replace with word.replace your code will run. However, it doesn't create an anagram yet. The algorithm is still lacking.
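On the question of whether a list would be easier: strings are immutable, so swapping positions is usually simpler on a list of characters that is joined back together afterwards. The sketch below uses Python 3 syntax and made-up function names (swap_characters, scramble); it is one possible approach, not the asker's intended algorithm.

```python
import random


def swap_characters(word, i, j):
    """Return `word` with the characters at indexes i and j swapped."""
    letters = list(word)  # lists are mutable, strings are not
    letters[i], letters[j] = letters[j], letters[i]
    return "".join(letters)


def scramble(word):
    """Return the letters of `word` in a random order."""
    letters = list(word)
    random.shuffle(letters)  # shuffles the list in place
    return "".join(letters)


print(swap_characters("python", 1, 3))  # phtyon
print(scramble("python"))               # a random rearrangement
```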

Need help understanding my computing CA A453 task?

For my assessment in my computing class I have completed the first two tasks but need help understanding what the third one is asking me. It states: "Develop a program that builds upon the technique from Task 2 to compress a text file with several sentences, including punctuation. The program should be able to compress a file into a list of words and list of positions to recreate the original file. It should also be able to take a compressed file and recreate the full text, including punctuation and capitalisation, of the original file".
Some of this I understand, but I don't really understand what it actually wants me to do. Also, as it says, I have to build on the technique from task two, so the description and solution for task two are below (the solution isn't finished because I don't have access to my finished one).
"Develop a program that identifies individual words in a sentence, stores these in a list and replaces each word in the original sentence with the position of that word in the list.
For example, the sentence ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU ASK WHAT YOU CAN DO FOR YOUR COUNTRY
contains the words ASK, NOT, WHAT, YOUR, COUNTRY, CAN, DO, FOR, YOU
The sentence can be recreated from the positions of these words in this list using the sequence
1,2,3,4,5,6,7,8,9,1,3,9,6,7,8,4,5
Save the list of words and the positions of these words in the sentence as separate files or as a single file."
And the code for task 2:
restart = 'y'
while (True):
    sentence = input("What is your sentence?: ")
    sentence_split = sentence.split()
    sentence2 = [0]
    print(sentence)
    for count, i in enumerate(sentence_split):
        if sentence_split.count(i) < 2:
            sentence2.append(max(sentence2) + 1)
        else:
            sentence2.append(sentence_split.index(i) +1)
    sentence2.remove(0)
    print(sentence2)
    restart = input("would you like restart the programme y/n?").lower()
    if (restart == "n"):
        print ("programme terminated")
        break
    elif (restart == "y"):
        pass
    else:
        print ("Please enter y or n")
As your solution for the second task shows, you have already compressed one sentence with the technique described in the task.
You should now provide a program that has two functionalities:
Read a file and use your technique to create a list of all the words it contains and the sequence of positions of those words, then write both to a file (or stdout).
Read the output created by the first function and reproduce the original file.
Your program may have this command line interface - maybe this makes the task more clear for you.
python task3.py compress /path/to/inputtext.txt /path/to/outputfile
python task3.py extract /path/to/outputfile /path/to/inputtext.txt
This is a very simple way to compress a text file. On top of that, you need to deal with Python's file API. Nice task!
I am doing the same task for my GCSE and I was confused as well.
However, task 3 is asking you to alter your code so that splitting the sentence is now case-sensitive, e.g. hello and Hello must be treated as separate entities, so they must get different numbers when the text is regenerated.
Also, your code must work for multiple sentences rather than just one.
Finally, you must split the punctuation marks into separate entities as well.
Use regular expressions to separate out the punctuation.
Remove .lower() to make your sentences case-sensitive.
Allow the code to treat the "." mark as an entity.
Hope that helped.
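Putting the points above together, a compress/recreate pair might be sketched as follows. This is one interpretation of the task, not a model answer: the names TOKEN_PATTERN, compress and decompress are made up, and it assumes that runs of whitespace are stored as tokens too, so that the original text can be recreated exactly. Reading and writing the files is left out.

```python
import re

# Words, single punctuation marks, and runs of whitespace are all tokens,
# so joining the tokens back together reproduces the text exactly.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]|\s+")


def compress(text):
    """Return (unique_tokens, positions); positions are 1-based indexes."""
    unique_tokens = []
    index_of = {}
    positions = []
    for token in TOKEN_PATTERN.findall(text):
        if token not in index_of:  # case-sensitive: "Ask" and "ask" differ
            unique_tokens.append(token)
            index_of[token] = len(unique_tokens)
        positions.append(index_of[token])
    return unique_tokens, positions


def decompress(unique_tokens, positions):
    """Rebuild the original text from the token list and positions."""
    return "".join(unique_tokens[p - 1] for p in positions)


text = "Ask not what. ask NOT what!\nAsk."
tokens, positions = compress(text)
print(decompress(tokens, positions) == text)  # True
```

A dictionary lookup is used here instead of list.count() and list.index(), which keeps the work per token constant as the text grows.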

Spell check program in python

Exercise problem: "given a word list and a text file, spell check the
contents of the text file and print all (unique) words which aren't
found in the word list."
I didn't get solutions to the problem so can somebody tell me how I went and what the correct answer should be?:
As a disclaimer none of this parses in my python console...
My attempt:
a=list[....,.....,....,whatever goes here,...]
data = open(C:\Documents and Settings\bhaa\Desktop\blablabla.txt).read()
#I'm aware that something is wrong here since I get an error when I use it.....when I just write blablabla.txt it says that it can't find the thing. Is this function only gonna work if I'm working off the online IVLE program where all those files are automatically linked to the console or how would I do things from python without logging into the online IVLE?
for words in data:
for words not in a
print words
wrong = words not in a
right = words in a
print="wrong spelling:" + "properly splled words:" + right
oh yeh...I'm very sure I've indented everything correctly but I don't know how to format my question here so that it doesn't come out as a block like it has. sorry.
What do you think?
There are many things wrong with this code - I'm going to mark some of them below, but I strongly recommend that you read up on Python control flow constructs, comparison operators, and built-in data types.
a=list[....,.....,....,whatever goes here,...]
data = open(C:\Documents and Settings\bhaa\Desktop\blablabla.txt).read()
# The filename needs to be a string value - put "C:\..." in quotes!
for words in data:
# data is a string - iterating over it will give you one letter
# per iteration, not one word
for words not in a
# aside from syntax (remember the colons!), remember what for means - it
# executes its body once for every item in a collection. "not in a" is not a
# collection of any kind!
print words
wrong = words not in a
# this does not say what you think it says - "not in" is an operator which
# takes an arbitrary value on the left, and some collection on the right,
# and returns a single boolean value
right = words in a
# same as the previous line
print="wrong spelling:" + "properly splled words:" + right
I don't know what you are trying to iterate over, but why don't you just first iterate over your words (which are in the variable a I guess?) and then for every word in a you iterate over the wordlist and check whether or not that word is in the wordslist.
I won't paste code since it seems like homework to me (if so, please add the homework tag).
Btw the first argument to open() should be a string.
It's simple really. Turn both lists into sets then take the difference. Should take like 10 lines of code. You just have to figure out the syntax on your own ;) You aren't going to learn anything by having us write it for you.
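For reference, the set-difference idea can be sketched as below. The function name misspelled_words is hypothetical, the sketch uses Python 3 syntax, and it assumes that words consist of the letters a-z (optionally with an apostrophe inside) and that case does not matter; loading the word list and the text file is left to the reader.

```python
import re


def misspelled_words(text, word_list):
    """Return the sorted unique words in `text` that are not in `word_list`."""
    known = {word.lower() for word in word_list}
    # Extract lowercase words; punctuation and digits are ignored.
    found = set(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))
    return sorted(found - known)  # set difference: in the text, not the list


word_list = ["the", "quick", "brown", "fox", "jumps"]
print(misspelled_words("The quick brwn fox jumps. The fox!", word_list))
# ['brwn']
```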
