Spell checking with custom dictionary

Need your guidance!
I want to check a text file for spelling mistakes against a custom dictionary.
Here is the code:
Dictionary=set(open("dictionary.txt").read().split())
print Dictionary
SearchFile = open(input("sample.txt"))
WordList = set()
for line in SearchFile:
    line = line.strip()
    if line not in Dictionary:
        WordList.add(line)
print(WordList)
But when I open the sample file and check it again, nothing has changed. What am I doing wrong?

What you are doing wrong is not explicitly changing anything in any file.
Here is a little bit of code to show how to write stuff to files...
fp = open(somefilepath,'w')
This line opens a file for writing. The 'w' tells Python to create the file if it does not exist, but it also deletes the contents of the file if it does exist. If you want to open a file for writing and keep the current contents, use 'a' instead. 'a' is for append.
fp.write(stuff)
writes whatever is in the variable 'stuff' to the file.
Hope this helps. For code more specific to your problem please tell us what exactly you want to write to your file.
Also, here is some documentation that should help you to better understand the topic of files: http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files
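To make the difference between 'w' and 'a' concrete, here is a minimal sketch. The scratch file demo.txt and the temporary directory are assumptions made up for the illustration; they are not part of the question.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as fp:   # 'w' creates the file (or empties an existing one)
    fp.write("first\n")

with open(path, "a") as fp:   # 'a' keeps the current contents and adds to the end
    fp.write("second\n")

with open(path) as fp:
    after_append = fp.read()  # both lines are present here

with open(path, "w") as fp:   # 'w' again: the two earlier lines are discarded
    fp.write("third\n")

with open(path) as fp:
    after_rewrite = fp.read()

print(repr(after_append))
print(repr(after_rewrite))
```

The first print shows both lines, the second shows only the last write.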
EDIT: but you are not changing anything!
By the end of your script here is what you have accomplished:
1. Dictionary is a set containing all acceptable words
2. WordList is a set containing all not acceptable lines
3. You have read to the end of SearchFile
If I am understanding your question correctly what you want to now do is:
4. find out which Dictionary word each line stored in WordList should be
5. rewrite SearchFile with the offending lines replaced.
If this is correct, how do you intend to figure out which WordList entry is supposed to be which Dictionary entry? How do you know the actual corrections? Have you attempted this part of the script? It is the crux, after all, so it would only be polite. Can you please share your attempt at this part with us?
Let's assume you have this function:
def magic(line, dictionary):
    """
    This takes a line to be checked, and a set of acceptable words.
    Outputs what the line is meant to be.
    PLEASE tell us your approach to this bit
    """
    if line in dictionary:
        return line
    # ...do stuff to find out which word is being misspelt, return that word
    return line  # placeholder so the script runs until that part is written

with open("dictionary.txt") as f:
    Dictionary = set(f.read().split())

result_text = ''
with open("sample.txt", 'r') as SearchFile:
    for line in SearchFile:
        result_text += magic(line.strip(), Dictionary)  # add the correct line to the result we want to save
        result_text += '\n'

# the file is closed after reading, so it is safe to reopen it for writing
with open("sample.txt", 'w') as SearchFile:
    SearchFile.write(result_text)  # here we actually make some changes
If you have not thought about how to find the actual dictionary value that mis-spelt lines should be corrected to become, try this out: http://norvig.com/spell-correct.html
To re-iterate a previous point, it is important that you show that you have at least attempted to solve the crux of your problem if you want any meaningful help.
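For what it is worth, one possible way to fill in magic is sketched below. It is not the approach from the linked article: it swaps in difflib.get_close_matches from the standard library, and it assumes one word per line, as the question's code does. The sample word set is made up for the illustration.

```python
import difflib

def magic(line, dictionary):
    """Return `line` if it is in `dictionary`, else the closest dictionary word."""
    if line in dictionary:
        return line
    # n=1 asks for the single best match; cutoff=0.6 is difflib's default threshold.
    matches = difflib.get_close_matches(line, sorted(dictionary), n=1, cutoff=0.6)
    return matches[0] if matches else line  # leave the line alone if nothing is close

# Hypothetical dictionary, for demonstration only.
words = {"hello", "world", "python"}
print(magic("helo", words))    # a near miss is corrected
print(magic("python", words))  # a known word is returned unchanged
print(magic("zzz", words))     # nothing similar, so the input comes back
```

Whether a similarity cutoff of 0.6 is appropriate depends on your dictionary; treat it as a starting point to tune.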

Related

Modifying letters in a file

I'm new to programming so I'm pretty lost. I'm currently learning Python and I need to open a text file and change every letter to the next one in the alphabet (e.g a -> b, b -> c, etc.). How would I go about writing a code like this?

This sounds like a neat problem to work on for a beginner.
Things you may want to look at:
The open() function, which allows you to open files and read/write to them:
https://docs.python.org/3/library/functions.html#open
For example:
with open('test.out', 'r+') as fi:
    all_lines = fi.readlines() # Read all lines from the file
    fi.write('this string will be written to the file')
# The file is closed at this point in the code; `with` uses a context manager, look that up
The os.replace() function, which lets you overwrite one file with another. You might try reading the input file, writing to a new output file, then overwriting the input file with the new output file; this will let you do that.
https://docs.python.org/3/library/os.html#os.replace
Replacing a character with the next increment of a character is an interesting twist, as it's not something that a lot of python programmers have to deal with. Here's one way to increment a character:
x = 'c'
print(chr(ord(x) + 1)) # will print 'd'
Without just giving away the answer, this should give you the pieces that you need to get started, feel free to ask more questions.

I think that this will work. The code can probably be shortened, but I'm not sure how. I'm not an expert with `with open` statements.
with open("(your text file path)", "r") as f:
    data = f.read()  # read the whole file, not just the first line
new_data = ""
for ch in data:
    if ch.isalpha():
        new_data += chr(ord(ch) + 1)  # next character; note 'z' becomes '{', there is no wrap-around
    else:
        new_data += ch  # leave spaces, newlines, and punctuation alone
print(new_data)
with open("(your text file path)", "w") as f:
    f.write(new_data)
You must change your letters to numbers so that you can increment them by one, and then change them back to letters. This should work.
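If you also want 'z' to wrap around to 'a', the same ord()/chr() idea can be combined with modulo arithmetic. This is only a sketch: it assumes plain ASCII letters, and the function names next_letter and shift_text are made up for the example.

```python
def next_letter(ch):
    """Return the next ASCII letter, wrapping z -> a and Z -> A; other characters pass through."""
    if "a" <= ch <= "z":
        return chr((ord(ch) - ord("a") + 1) % 26 + ord("a"))
    if "A" <= ch <= "Z":
        return chr((ord(ch) - ord("A") + 1) % 26 + ord("A"))
    return ch

def shift_text(text):
    """Apply next_letter to every character of text."""
    return "".join(next_letter(ch) for ch in text)

print(shift_text("abc xyz, Hello!"))  # prints "bcd yza, Ifmmp!"
```

To use it on a file, read the contents into a string, pass it through shift_text, and write the result back.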

How can I delete and rewrite a line without unknown characters?

I have a database.txt file: the first column is for usernames, the second for passwords, and the remaining columns hold 5 recovery questions and answers, alternating. I want to let a user change the password in their details without affecting another user's username, as the two may be the same. I have found a way to delete the previous line and append a new line with the modified details to the file. However, there is always a string of unknown characters at the start of the appended line, and other values are being changed, not just the second value in the list. Please help me find a way to avoid this.
https://repl.it/repls/NecessaryBoldButtons
You can find the code here; changing it will affect everyone, so please copy it elsewhere.
https://onlinegdb.com/BJbsn9-cL
I just need the password to be changed on user input, not other strings. The reason for all this code is that when changing a person's password, another username could be changed. This is the original file
This is what happens afterwards. Only the second string in the list of the line where data[0] = "bye" should be changed to newpass, not all of the others
'''
import linecache
f = open("database.txt" , "r+")
for loop in range(3):
    line = f.readline()
    data = line.split(",")
    if data[1] == "bye":
        print(data[1]) #These are to help me understand what is happening
        print(data[0])
        b = data[0]
        newpass = "Hi"
        a = data[1]
        fn = 'database.txt'
        e = open(fn)
        output = []
        str="happy"
        for line in e:
            if not line.startswith(str):
                output.append(line)
        e.close()
        print(output)
        e = open(fn, 'w')
        e.writelines(output)
        e.close()
        line1 = linecache.getline("database.txt" ,loop+1)
        print(line)
        password = True
        print("Password Valid\n")
        write = (line1.replace(a, newpass))
        write = f.write(line1.replace(a, newpass))
f.close()
'''
This is the file in text:
username,password,Recovery1,Answer1,Recovery2,Answer2,Recovery3,Answer3,Recovery4,Answer4,
Recovery5,Answer5,o,o,o,o,o,o,o,o,o,o,
happy,bye,o,o,o,o,o,o,o,o,o,o,
bye,happy,o,o,o,o,o,o,o,o,o,o,
Support is very much appreciated
Feel free to change the code as much as you need to, as it is already a mess
Thanks in Advance

This should be pretty easy. The basic idea is:
open input file for reading
open output file for writing
for each line in input file
    if username = "happy"
        change password in line
    write line to output file
It should be pretty easy to convert that to python.
From comments, and by examining your code, I get the feeling that you're trying to update a line in-place. That is, it looks like your expectation is that given the file "database.txt" that contains this:
username,password,Recovery1,Answer1,Recovery2,Answer2,Recovery3,Answer3, Recovery4,Answer4,Recovery5,Answer5,
o,o,o,o,o,o,o,o,o,o,
happy,bye,o,o,o,o,o,o,o,o,o,o,
bye,happy,o,o,o,o,o,o,o,o,o,o,
When you make the change, your new "database.txt" will contain this:
username,password,Recovery1,Answer1,Recovery2,Answer2,Recovery3,Answer3, Recovery4,Answer4,Recovery5,Answer5,
o,o,o,o,o,o,o,o,o,o,
happy,Hi,o,o,o,o,o,o,o,o,o,o,
bye,happy,o,o,o,o,o,o,o,o,o,o,
You can do that, but you can't do it in-place. You have to write all the lines of the file, including the changed line, to a new temporary file. Then you can delete the old "database.txt" and rename the temporary file.
You can't update a line in a text file, because if you change the length of the line then you'll either end up with extra space at the end of the line you changed (because the new line has fewer characters than the old line), or you'll overwrite the beginning of the next line (the new line is longer than the old line).
The only other option is to load all of the lines into memory and close the file. Then change the line or lines you want to change, in memory. Finally, open the "database.txt" file for writing and output all of the lines from memory to the file.
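A minimal sketch of the temporary-file approach follows. The function name change_password is made up for the example, and it assumes the comma-separated layout from the question (username first, password second). It compares whole fields rather than using str.replace, so a password that happens to equal another user's username is left alone.

```python
import os
import tempfile

def change_password(path, username, new_password):
    """Rewrite `path`, replacing the password (2nd field) on the line whose 1st field is `username`."""
    dir_name = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final rename stays on one filesystem.
    with open(path, "r") as src, tempfile.NamedTemporaryFile(
        "w", dir=dir_name, delete=False
    ) as dst:
        for line in src:
            fields = line.rstrip("\n").split(",")
            if len(fields) > 1 and fields[0] == username:
                fields[1] = new_password  # only the second column changes
            dst.write(",".join(fields) + "\n")
    os.replace(dst.name, path)  # swap the new file in for the old one
```

With the sample data, change_password("database.txt", "happy", "Hi") would turn the line starting with happy,bye into happy,Hi and leave the bye,happy line untouched.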

combining txt files and counting words

def word_counter(s):
    word_list=s.split()
    return len(word_list)
f=open("a.txt")
total=0
for i in f.readlines():
    total+=word_counter(i)
print(total)
I want to count the number of letters (not counting blanks), the number of words, and the average word length for each of 'a.txt', 'b.txt', 'c.txt', 'd.txt', 'e.txt'. Finally, I want a 'total.txt' that combines all of the txt files.
I don't know how to go further.
Please help.

You actually have the concept right. Just need to add a little more to reach your desired output.
Remember when you use f = open("a.txt"), make sure you call f.close(). Or, use the with keyword, like I did in the example. It automatically closes the file for you, even if you forget to.
I won't give the exact code as it is, but will provide the steps so that you learn the concepts.
Put all the .txt files names in a list.
Example, list_FileNames = ["a.txt", "b.txt"]
Then open each file, and get the entire file into a string.
for file in list_FileNames:
    with open(file, 'r') as inFile:
        myFileInOneString = inFile.read().replace('\n', ' ')  # a space keeps words on adjacent lines apart
You have the right function to count words. For characters: len(myFileInOneString) - myFileInOneString.count(' ')
Save all these values into variables and write them to another file. For that, look up how to write to a file in Python.
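If you get stuck, the steps above could be put together roughly as follows. This is a sketch under some assumptions: "letters" is taken to mean every non-whitespace character (punctuation included), and the helper names text_stats and summarize are invented for the example.

```python
def text_stats(text):
    """Return (character_count, word_count, average_word_length) for a string."""
    words = text.split()                     # split() treats spaces and newlines alike
    characters = sum(len(w) for w in words)  # characters excluding all whitespace
    average = characters / len(words) if words else 0.0
    return characters, len(words), average

def summarize(file_names, total_name="total.txt"):
    """Print the stats of each file and write all contents, one after another, to total_name."""
    with open(total_name, "w") as total:
        for name in file_names:
            with open(name) as in_file:
                text = in_file.read()
            print(name, text_stats(text))
            total.write(text)
            if text and not text.endswith("\n"):
                total.write("\n")  # keep the next file from gluing onto the last word
```

Calling summarize(["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"]) would print one tuple per file and create total.txt in the current directory.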

Write a program in Python 3.5 that reads a file, then writes a different file with the same text that was in the first one as well as more?

The exact question to this problem is:
*Create a file with 20 lines of text and name it “lines.txt”. Write a program to read the file “lines.txt” and write the text to a new file, “numbered_lines.txt”, that will also have line numbers at the beginning of each line.
Example:
Input file: “lines.txt”
Line one
Line two
Expected output file:
1 Line one
2 Line two
I am stuck, and this is what I have so far. I am a true beginner to Python and my instructor does not make things very clear. Critique and help much appreciated.
file_object=open("lines.txt",'r')
for ln in file_object:
    print(ln)
count=1
file_input=open("numbered_lines.txt",'w')
for Line in file_object:
    print(count,' Line',(str))
    count=+1
file_object.close
file_input.close
All I get for output is the .txt file I created stating lines 1-20. I am very stuck and honestly have very little idea about what I am doing. Thank you

You have all the right parts, and you're almost there:
When you do
for ln in file_object:
    print(ln)
you've exhausted the contents of that file, and you won't be able to read them again, like you try to do later on.
Also, print does not write to a file, you want file_input.write(...)
This should fix all of that:
infile = open("lines.txt", 'r')
outfile = open("numbered_lines.txt", 'w')
line_number = 1
for line in infile:
    outfile.write(str(line_number) + " " + line)
    line_number += 1  # move on to the next number
infile.close()
outfile.close()
However, here is a more pythonic way to do it:
with open("lines.txt") as infile, open("numbered_lines.txt", 'w') as outfile:
    for i, line in enumerate(infile, 1):
        outfile.write("{} {}".format(i, line))

Good first try, and with that, I can go through your code and explain what you did right (or wrong)
file_object=open("lines.txt",'r')
for ln in file_object:
    print(ln)
This is fine, though generally you want to put a space before and after assignments (you are assigning the result of open to file_object) and add a space after a `,` when separating arguments, so you might want to write that like so:
file_object = open("lines.txt", 'r')
for ln in file_object:
    print(ln)
However, at this point the internal position of file_object has reached the end of the file, so if you wish to reuse the same object, you need to seek back to the beginning position, which is 0. As your assignment only asks you to write to the file (and not to the screen), the above loop should be omitted from the program (I get what you want to do: you want to see the contents of the file immediately, though sometimes instructors are pretty strict about what they accept). Moving on:
count=1
file_input=open("numbered_lines.txt",'w')
for Line in file_object:
Looks pretty normal so far, again, minor formatting issues. In Python, typically we name all variables lower-case, as names with Capitalization are generally reserved for class names (if you wish to, you may read about them). Now we enter into the loop you got
print(count,' Line',(str))
This prints not quite what you want. As ' Line' is enclosed in quotes, it is treated as a string literal, so it is treated literally as text and not code. Given that you had assigned Line, you want to take out the quotes. The (str) at the end simply prints the str type object, which is definitely not what you want. Also, you forgot to specify the file you want to print to. By default it will print to the screen, but you want to print to the numbered_lines.txt file which you had opened and assigned to file_input. We will correct this later.
count=+1
If you format this differently, you are assigning +1 to count. I am guessing you wanted to use the += operator to increment it. Remember this on your quiz/tests.
Finally:
file_object.close
file_input.close
They are meant to be called as functions, so you need to invoke them by adding parentheses at the end; as close takes no arguments, there will be nothing inside the parentheses. Putting everything together, the complete corrected code for your program should look like this:
file_object = open("lines.txt", 'r')
count = 1
file_input = open("numbered_lines.txt", 'w')
for line in file_object:
    print(count, line, file=file_input)
    count += 1
file_object.close()
file_input.close()
Run the program. You will notice that there is an extra empty line between every line of text. This is because by default the print function adds a new line end character; the line you got from the file included a new-line character at the end (that's what make them lines, right?) so we don't have to add our own here. You can of course change it to an empty string. That line will look like this.
print(count, line, file=file_input, end='')
Naturally, other Python programmers will tell you that there are Pythonic ways, but you are just starting out, don't worry too much about them (although you can definitely pick up on this later and I highly encourage you to!)

The right way to open a file is using a with statement:
with open("lines.txt",'r') as file_object:
    ... # do something
That way, the context manager introduced by with will close your file at the end of "something" or in case of an exception.
Of course, you can close the file yourself if you are not familiar with that. Note that close is a method: to call it you need parentheses:
file_object.close()
See the chapter 7.2. Reading and Writing Files, in the official documentation.

In the first loop you're printing the contents of the input file. This means that the file contents have already been consumed when you get to the second loop. (Plus the assignment didn't ask you to print the file contents.)
In the second loop you're using print() instead of writing to a file. Try file_input.write(str(count) + " " + Line) (And file_input seems like a bad name for a file that you will be writing to.)
count=+1 sets count to +1, i.e. positive one. I think you meant count += 1 instead.
At the end of the program you're calling .close instead of .close(). The parentheses are important!
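The first and third points are easy to check for yourself. In the sketch below, io.StringIO stands in for an open text file so that nothing has to exist on disk; that substitution is an assumption made for the demonstration, and a real file object behaves the same way when iterated.

```python
import io

# io.StringIO behaves like an open text file for iteration and seek().
file_object = io.StringIO("Line one\nLine two\n")

first_pass = [ln for ln in file_object]   # consumes the whole stream
second_pass = [ln for ln in file_object]  # nothing left to read
file_object.seek(0)                       # rewind to the beginning
third_pass = [ln for ln in file_object]   # the lines are available again

count = 1
count = +1   # assigns positive one, so count is still 1
count += 1   # increments, so count is now 2

print(len(first_pass), len(second_pass), len(third_pass), count)  # prints "2 0 2 2"
```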

Delete a line in multiple text files with the same line beginning but varying line ending using Python v3.5

I have a folder full of .GPS files, e.g. 1.GPS, 2.GPS, etc...
Within each file is the following five lines:
Trace #1 at position 0.004610
$GNGSA,A,3,02,06,12,19,24,25,,,,,,,2.2,1.0,2.0*21
$GNGSA,A,3,75,86,87,,,,,,,,,,2.2,1.0,2.0*2C
$GNVTG,39.0304,T,39.0304,M,0.029,N,0.054,K,D*32
$GNGGA,233701.00,3731.1972590,S,14544.3073733,E,4,09,1.0,514.675,M,,,0.49,3023*27
...followed by the same data structure, with different values, over the next five lines:
Trace #6 at position 0.249839
$GNGSA,A,3,02,06,12,19,24,25,,,,,,,2.2,1.0,2.0*21
$GNGSA,A,3,75,86,87,,,,,,,,,,2.2,1.0,2.0*2C
$GNVTG,247.2375,T,247.2375,M,0.081,N,0.149,K,D*3D
$GNGGA,233706.00,3731.1971997,S,14544.3075178,E,4,09,1.0,514.689,M,,,0.71,3023*2F
(I realise the values after the $GNGSA lines don't vary in the above example. This is just a bad example... in the real dataset they do vary!)
I need to remove the lines that begin with "$GNGSA" and "$GNVTG" (i.e. I need to delete lines 2, 3, and 4 from each group of five lines within each .GPS file).
This five-line pattern continues for a varying number of times throughout each file (for some files, there might only be two five-line groups, while other files might have hundreds of the five-line groups). Hence, deleting these lines based on the line number will not work (because the line number would be variable).
The problem I am having (as seen in the above examples) is that the text that follows the "$GNGSA" or "$GNVTG" varies.
I'm currently learning Python (I'm using v3.5), so figured this would make for a good project for me to learn a few new tricks...
What I've tried already:
So far, I've managed to create the code to loop through the entire folder:
import os
indir = '/Users/dhunter/GRID01/' # input directory
for i in os.listdir(indir): # for each "i" (iteration) within the indir variable directory...
    if i.endswith('.GPS'): # if the filename of an iteration ends with .GPS, then...
        print(i + ' loaded') # print the filename to CLI, simply for debugging purposes.
        with open(indir + i, 'r') as my_file: # open the iteration file
            file_lines = my_file.readlines() # uses the readlines method to create a list of all lines in the file.
            print(file_lines) # this prints the entire contents of each file to CLI for debugging purposes.
Everything in the above works perfectly.
What I need help with:
How do I detect and delete the lines themselves, and then save the file (to the same location; there is no need to save to a different filename)?
The filenames - which usually end with ".GPS" - sometimes end with ".gps" instead (the only difference being the case). My above code will only work with the uppercase files. Besides completely duplicating the code and changing the endswith argument, how do I make it work with both cases?
In the end, my file needs to look something like this:
Trace #1 at position 0.004610
$GNGGA,233701.00,3731.1972590,S,14544.3073733,E,4,09,1.0,514.675,M,,,0.49,3023*27
Trace #6 at position 0.249839
$GNGGA,233706.00,3731.1971997,S,14544.3075178,E,4,09,1.0,514.689,M,,,0.71,3023*2F
Any suggestions, please? Thanks in advance. :)

You're almost there.
import os
indir = '/Users/dhunter/GRID01/' # input directory
for i in os.listdir(indir): # for each "i" (iteration) within the indir variable directory...
    if i.endswith('.GPS'): # if the filename of an iteration ends with .GPS, then...
        print(i + ' loaded') # print the filename to CLI, simply for debugging purposes.
        with open(indir + i, 'r') as my_file: # open the iteration file
            for line in my_file:
                if not line.startswith('$GNGSA') and not line.startswith('$GNVTG'):
                    print(line)

As per what the others have said, you're on the right track! Where you're going wrong is in the case-sensitive file extension check, and in reading in the entire file contents at once (this isn't per se wrong, but it's probably adding complexity we won't need).
I've commented your code, removing all the debug stuff for simplicity, to illustrate what I mean:
import os
indir = '/path/to/files/' # trailing slash needed because paths are built with indir + i
for i in os.listdir(indir):
    if i.endswith('.GPS'): #This CASE SENSITIVELY checks the file extension
        with open(indir + i, 'r') as my_file: # Opens the file
            file_lines = my_file.readlines() # This reads the ENTIRE file at once into an array of lines
So we need to fix the case sensitivity issue, and instead of reading in all the lines, we'll instead read the file line-by-line, check each line to see if we want to discard it or not, and write the lines we're interested in into an output file.
So, incorporating @tdelaney's case-insensitive fix for the file name, we replace the endswith check with
    if i.lower().endswith('.gps'): # Case-insensitively check the file name
and instead of reading in the entire file at once, we'll iterate over the file stream and write each desired line out
        with open(indir + i) as in_file, open(indir + i + 'new.gps', 'w') as out_file: # Opens the input file for reading and creates + opens a new output file for writing - thanks @tdelaney once again!
            for line in in_file: # This reads each line one-by-one from the in file
                if not line.startswith('$GNGSA') and not line.startswith('$GNVTG'): # Check the line has what we want (thanks Avinash)
                    out_file.write(line) # Write the line to the new output file; it already ends with a newline
Note that you should make certain that you open the output file OUTSIDE of the 'for line in in_file' loop, or else the file will be overwritten on every iteration which will erase what you've already written to it so far (I suspect this is the issue you've had with the previous answers). Open both files at the same time and you can't go wrong.
Alternatively, you can specify the file access mode when you open the file, as per
with open(indir + i + 'new.gps', 'a') as out_file:
which will open the file in append mode, which is a specialised form of write mode that preserves the original contents of the file, and appends new data to it instead of overwriting existing data.

Ok, based on suggestions by Avinash Raj, tdelaney, and Sampson Oliver, here on Stack Overflow, and another friend who helped privately, here is the solution that is now working:
import os
indir = '/Users/dhunter/GRID01/' # input directory
for i in os.listdir(indir): # for each "i" (iteration) within the indir variable directory...
    if i.lower().endswith('.gps'): # if the filename of an iteration ends with .GPS, then...
        if not i.lower().endswith('.gpsnew.gps'): # if the filename does not end with .gpsnew.gps, then...
            print(i + ' loaded') # print the filename to CLI.
            with open(indir + i, 'r') as my_file:
                for line in my_file:
                    if not line.startswith('$GNGSA'):
                        if not line.startswith('$GNVTG'):
                            with open(indir + i + 'new.gps', 'a') as outputfile:
                                outputfile.write(line)
                                outputfile.write('\r\n')
(You'll see I had to add in another layer of if statement to stop it from using the output files from previous uses of the script "if not i.lower().endswith('.gpsnew.gps'):", but this line can easily be deleted for anyone who uses these instructions in future)
We switched the open mode on the third-last line to "a" for append, so that it would save all the right lines to the file, rather than overwriting each time.
We also added in the final line to add a line break at the end of each line.
Thanks everyone for their help, explanations, and suggestions. Hopefully this solution will be useful to someone in future. :)
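For anyone who wants the filtered lines saved back under the original filename, as the question asked, a variant is sketched below. It is not the code from this thread: the function name strip_sentences and the ".tmp" suffix are assumptions for the example. It relies on str.startswith accepting a tuple of prefixes and on os.replace to overwrite the original.

```python
import os

UNWANTED = ("$GNGSA", "$GNVTG")  # str.startswith accepts a tuple of prefixes

def strip_sentences(indir):
    """Remove lines starting with an UNWANTED prefix from every .gps/.GPS file in indir."""
    for name in os.listdir(indir):
        if not name.lower().endswith(".gps"):
            continue  # skip anything that is not a GPS file, whatever its case
        path = os.path.join(indir, name)
        tmp_path = path + ".tmp"  # hypothetical temporary name
        # Both files are opened once per input file, outside the line loop.
        with open(path) as in_file, open(tmp_path, "w") as out_file:
            for line in in_file:
                if not line.startswith(UNWANTED):
                    out_file.write(line)  # the line already ends with its newline
        os.replace(tmp_path, path)  # overwrite the original with the filtered copy
```

Because the output file is opened once per input file, there is no need for append mode, and running the script twice does not duplicate lines.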

2. The filenames:
The if accepts any expression returning a truth value, and you can combine expressions with the standard boolean operators: if i.endswith('.GPS') or i.endswith('.gps').
You can also put the ... or ... expression after the if in parentheses, to feel more sure, but it's not necessary.
Alternatively, as a less universal solution (but since you wanted to learn a few tricks :)), you can use string manipulation in this case: an object of type string has a lot of methods. '.gps'.upper() gives '.GPS' -- see if you can make use of this! (Even a literal string is a string object, and your variables behave the same.)
1. Finding the Lines:
As you can see in the other solution, you need not read all of your lines at once; you can check whether you want to keep them 'on the fly'. But I will stick to your approach with readlines. It gives you a list, and lists support indexing and slicing. Try:
anylist[startindex:endindex:stride], for any values, so for example try: newlist = list(range(100))[1::5].
It's always helpful to try out the easy basic operations in interactive mode, or at the beginning of your script. Here list(range(100)) is just some sample list. Here you see how the Python for-syntax works, differently than in other languages: you can iterate over any list, and if you just need integers, you create a sequence of integers with range().
So this will work the same with any other list -- e.g. the one you get from readlines()
This selects a slice from the list, beginning with the second element, ending at the end (since the end index is omitted), and taking every 5th element. Now that you have this sub-list, you can just remove it from the original. So for the example with the range:
a = list(range(100))  # in Python 3, range() must be turned into a list before deleting from it
del a[1::5]
print(a)
So you see, that the appropriate items have been removed. Now do the same with your file_lines, and then proceed to remove the other lines you want to remove.
Then, in a new with block, open the file for writing and do writelines(file_lines), so the remaining lines are written back to the file.
Of course you can also take the approach of looking at the content of each line with a for loop over your list and startswith(). Or you can combine the approaches, and check whether deleting lines by number leaves the right line starts, so you can print an error if something is unexpected...
3. Saving the file
You can close your file after you have the lines saved by readlines(). In fact this is done automatically at the end of the with-block. Then just open it in 'w' mode instead of 'r' and do yourfile.writelines(yourlist). You don't need to save explicitly; it is saved on closing.
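Here is a small sketch of the slicing approach applied to lines like yours. The sample list stands in for what readlines() would return and is shortened for the illustration. Note that each deletion shortens every group by one line, so the stride shrinks from 5 to 4 to 3.

```python
# Stand-in for my_file.readlines(); two five-line groups with shortened values.
file_lines = [
    "Trace #1 at position 0.004610\n",
    "$GNGSA,A,3,02\n",
    "$GNGSA,A,3,75\n",
    "$GNVTG,39.0304\n",
    "$GNGGA,233701.00\n",
    "Trace #6 at position 0.249839\n",
    "$GNGSA,A,3,02\n",
    "$GNGSA,A,3,75\n",
    "$GNVTG,247.2375\n",
    "$GNGGA,233706.00\n",
]

del file_lines[1::5]  # drop the 2nd line of every group of five
del file_lines[1::4]  # groups are now four long; drop their 2nd line again
del file_lines[1::3]  # groups are now three long; drop the remaining unwanted line

# Sanity check: nothing that should have been removed is left.
leftovers = [ln for ln in file_lines if ln.startswith(("$GNGSA", "$GNVTG"))]
print(file_lines)
print(leftovers)
```

This only works if every group really has five lines in the same order, which is why the startswith() check at the end is worth keeping.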
