How to handle typos when cleansing a text file?

I'm trying to clean a text file in Python. I noticed the text file I'm reading in has several typos (e.g. chevroelt instead of chevrolet). I have a specific list of typos that I'd like to address. How would I approach making these edits as I read an input file and write to a new (clean) output file? Below is the code I have written to read in the original text file and output to a new (clean) file. I appreciate any help in advance!
def _clean_data(self):
    ifname = AutoMPGData.DATA_FILE_ORIG
    ofname = AutoMPGData.DATA_FILE_CLEAN
    with open(ifname, 'r') as ifile:
        with open(ofname, 'w') as ofile:
            for line in ifile:
                ofile.write(line.expandtabs())

If you have a list of specific typos you'd like to address, I would create a dictionary with each misspelling as the key and the correct spelling as the value, then do something like this (pseudocode):
for each word in file:
    if word is in corrections:
        word = corrections[word]
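The lookup described above can be sketched in Python roughly as follows. This is only one possible approach: the CORRECTIONS table, the second typo entry, and the names fix_typos and clean_file are assumptions made for illustration, and the sketch uses a regular expression to replace whole words so that quotes and tabs around a word are left alone.

```python
import re

# Hypothetical corrections table: misspelling -> correct spelling.
CORRECTIONS = {
    "chevroelt": "chevrolet",
    "toyouta": "toyota",
}

# One pattern that matches any listed typo as a whole word.
_TYPO_RE = re.compile(
    r"\b(" + "|".join(re.escape(typo) for typo in CORRECTIONS) + r")\b"
)


def fix_typos(line):
    """Return line with every known typo replaced by its correction."""
    return _TYPO_RE.sub(lambda match: CORRECTIONS[match.group(1)], line)


def clean_file(ifname, ofname):
    """Copy ifname to ofname, expanding tabs and fixing known typos."""
    with open(ifname, "r") as ifile, open(ofname, "w") as ofile:
        for line in ifile:
            ofile.write(fix_typos(line.expandtabs()))
```

Inside the _clean_data method from the question, the write call would then become ofile.write(fix_typos(line.expandtabs())).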

Related

delete all .csv file content in python

My program is supposed to replace the old data in a .csv file with new data, writing it to the same file. After I run the program, the old data is not replaced. Below is my code.
TEXTFILE = open("data.csv", "w")
for i in book_list:
    TEXTFILE.write("{},{},{},{}".format(i[0],i[1],i[2],i[3]))
TEXTFILE.close()
book_list is the list that holds the new data to be stored.
The result I got:
k,k,45,c
a,a,65,r
d,s,65,r
as,as,65,r
df,df6,65,r
as,as,6,r
as,as,46,r
as,as,45,r
as,as,56,rk,k,45,r
a,a,65,r
d,s,65,r
as,as,65,r
df,df6,65,r
as,as,6,r
as,as,46,r
as,as,45,r
as,as,56,r
The CSV ends up with the old and new content combined.
The original file looks like this:
k,k,45,r
a,a,65,r
d,s,65,r
as,as,65,r
df,df6,65,r
as,as,6,r
as,as,46,r
as,as,45,r
as,as,56,r
I'm not sure how to explain it, but I expect the result to have one line changed, with the new data in the fourth column. For example, line 1 was previously k,k,45,r, and the program should change it to k,k,45,c.
I hope you can help me :)

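Try using .truncate(). Called right after opening the file, it discards the file's contents. (Opening a file in "w" mode already truncates it, so the explicit call is redundant but harmless.)
TEXTFILE = open("data.csv", "w")
TEXTFILE.truncate()
for i in book_list:
    TEXTFILE.write("{},{},{},{}".format(i[0],i[1],i[2],i[3]))
TEXTFILE.close()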
I hope I understood you correctly.
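As a rough sketch of the whole round trip (read the rows, change one field, overwrite the file), the standard csv module can handle the commas and line endings. The helper names read_rows and write_rows are assumptions for illustration, not part of the original program.

```python
import csv


def read_rows(path):
    """Return the CSV file at path as a list of rows (lists of strings)."""
    with open(path, newline="") as f:
        return list(csv.reader(f))


def write_rows(path, rows):
    """Overwrite path with rows; opening in "w" mode discards the old content."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```

With these helpers, changing the fourth field of the first line would look like book_list = read_rows("data.csv"), then book_list[0][3] = "c", then write_rows("data.csv", book_list).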

overwrite some data in a python file

I was hoping someone could help me. I am currently trying to add some data to a text file, but the way I am doing it isn't giving me what I want. I have a file with 20+ lines of text in it and want to overwrite the first 30 characters of the file with 30 new characters. The code I have deletes all the content and adds only the 30 characters. Please help :)
file=open("text.txt", "w")
Is there something wrong with this that explains why it's removing all of the original data instead of simply overwriting it?

One way is to read the whole file into a single string, create a new string with first 30 characters replaced and rewrite the whole file. This can be done like this:
with open("text.txt", "r") as f:
    data = f.read()
new_thirty_characters = '<put your data here>'
new_data = new_thirty_characters + data[30:]
with open("text.txt", "w") as f:
    f.write(new_data)
Ideally, you should check that the file contains more than 30 characters after it's read. Also, avoid using file and other built-in names as variable names.
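An alternative sketch, not taken from the answer above, is to open the file in "r+" mode, which allows writing without truncating, and write over the start of the file in place. The function name overwrite_prefix is hypothetical, and the sketch assumes the old and new text occupy the same number of bytes (for example plain ASCII); with multi-byte encodings, rewriting the whole file as shown above is the safer route.

```python
def overwrite_prefix(path, new_text):
    """Write new_text over the beginning of the file, keeping the rest.

    Assumes new_text takes up the same number of bytes as the text
    it replaces (for example, plain ASCII characters).
    """
    # "r+" opens for reading and writing without discarding the content.
    with open(path, "r+") as f:
        f.seek(0)
        f.write(new_text)
```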

Python: Extracting lines from a file using another file as key

I have a 'key' file that looks like this (MyKeyFile):
afdasdfa ghjdfghd wrtwertwt asdf (these are in a column, but I never figured out the formatting, sorry)
I call these keys and they are identical to the first word of the lines that I want to extract from a 'source' file. So the source file (MySourceFile) would look something like this (again, bad formatting, but 1st column = the key, following columns = data):
afdasdfa (several tab delimited columns)
.
.
ghjdfghd ( several tab delimited columns)
.
wrtwertwt
.
.
asdf
And the '.' would indicate lines of no interest currently.
I am an absolute novice in Python and this is how far I've come:
with open('MyKeyFile','r') as infile, \
        open('MyOutFile','w') as outfile:
    for line in infile:
        for runner in source:
            # pick up the first word of the line in source
            # if match, print the entire line to MyOutFile
            # here I need help
outfile.close()
I realize there may be better ways to do this. All feedback is appreciated - along my way of solving it, or along more sophisticated ones.
Thanks
jd

I think that this would be a cleaner way of doing it, assuming that your "key" file is called "key_file.txt" and your main file is called "main_file.txt"
keys = []
my_file = open("key_file.txt", "r")  # "r" is for reading files, "w" is for writing to them
for line in my_file:
    key = line.strip()  # drop the trailing newline, otherwise the key never matches
    if key:
        keys.append(key)
my_file.close()
# now you have a list of strings called keys
# take each line from the main text file and check whether it contains any of the keys
new_file = open("main_file.txt", "r")
for line in new_file:
    for key in keys:
        if key in line:
            print("I FOUND A LINE THAT CONTAINS THE TEXT OF SOME KEY", line)
new_file.close()
You can modify the print function or get rid of it to do what you want with the desired line that contains the text of some key. Let me know if this works

As I understand it (correct me in the comments if I am wrong), you have 3 files:
MySourceFile
MyKeyFile
MyOutFile
And you want to:
Read keys from MyKeyFile
Read source from MySourceFile
Iterate over lines in the source
If line's first word is in keys: append that line to MyOutFile
Close MyOutFile
So here is the Code:
with open('MySourceFile', 'r') as sourcefile:
    source = sourcefile.read().splitlines()
with open('MyKeyFile', 'r') as keyfile:
    keys = keyfile.read().split()
with open('MyOutFile', 'w') as outfile:
    for line in source:
        words = line.split()
        # skip blank lines, then compare the first word against the keys
        if words and words[0] in keys:
            outfile.write(line + "\n")
# no explicit close needed: the with statement closes outfile
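If the source file is large, a variant of the steps above can stream it line by line and keep the keys in a set for fast lookups. This is only a sketch; the function name extract_lines and its parameters are assumptions for illustration.

```python
def extract_lines(key_path, source_path, out_path):
    """Copy lines of source_path whose first word is a key to out_path."""
    with open(key_path) as keyfile:
        keys = set(keyfile.read().split())  # whitespace-separated keys
    with open(source_path) as source, open(out_path, "w") as outfile:
        for line in source:
            words = line.split()
            # blank lines have no first word, so they are skipped
            if words and words[0] in keys:
                outfile.write(line)
```

It would be called as extract_lines('MyKeyFile', 'MySourceFile', 'MyOutFile').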

Reading text from a text file in python

I am having trouble figuring out an easy, or at least memorable, way to import text from a text file. I am trying to make a program that insults the user (for school): it takes one or more words from one text file, adds one or more words from a second file, and finishes with one or more words from a third text file...
I am having trouble finding a way to code this. I have the random number up and running to pick the text; I just need to know how to access strings or text in a text file.

The easiest way is to use the with statement. It takes care of closing the file for you.
with open("file.txt") as f:
    for line in f:
        pass  # do something with line
Or read the data directly into a list:
with open("file.txt") as f:
    lines = list(f)
# do something with lines
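Applied to the question, a minimal sketch could read each file into a list and pick one entry at random. The helper names random_line and build_phrase, and the assumption that each file holds one word or phrase per line, are mine rather than the asker's.

```python
import random


def random_line(path):
    """Return one randomly chosen non-empty line from the file at path."""
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    return random.choice(lines)


def build_phrase(paths):
    """Join one random line from each file in paths, separated by spaces."""
    return " ".join(random_line(path) for path in paths)
```

For three files, build_phrase(["first.txt", "second.txt", "third.txt"]) would return one entry from each file in order; those file names are placeholders.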

how do i delete a line that contains a certain string?

What I am trying to do here is:
1. Read lines from a text file.
2. Find lines that contain a certain string.
3. Delete those lines and write the result to a new text file.
For instance, if I have a text like this:
Starting text
How are you?
Nice to meet you
That meat is rare
Shake your body
And if my certain string is 'are'
I want the output as:
Starting text
Nice to meet you
Shake your body
I don't want something like this (with blank lines left behind):
Starting text

Nice to meet you

Shake your body
I was trying something like this:
opentxt = open.('original.txt','w')
readtxt = opentxt.read()
result = readtxt.line.replace('are', '')
newtxt = open.('revised.txt','w')
newtxt.write(result)
newtxt.close()
But it doesn't seem to work...
Any suggestions? Any help would be great!
Thanks in advance.

Same as always. Open source file, open destination file, only copy lines that you want from the source file into the destination file, close both files, rename destination file to source file.
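The steps just described can be sketched as follows. The function name remove_lines_containing and the ".tmp" suffix for the temporary file are assumptions chosen for illustration.

```python
import os


def remove_lines_containing(path, needle):
    """Rewrite the file at path without the lines that contain needle."""
    tmp_path = path + ".tmp"  # hypothetical name for the destination file
    with open(path) as src, open(tmp_path, "w") as dst:
        for line in src:
            if needle not in line:
                dst.write(line)  # copy only the lines we want to keep
    # both files are closed here; put the new file in place of the original
    os.replace(tmp_path, path)
```

Because whole lines are skipped instead of being replaced with empty strings, no blank lines are left behind. To keep the original untouched, write to 'revised.txt' and skip the rename.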

with open('data.txt') as f, open('out.txt', 'w') as f2:
    for x in f:
        if 'are' not in x:
            # strip the line first and then add a '\n',
            # so you don't get an empty line between two lines
            f2.write(x.strip() + '\n')
