Having trouble splitting a text file into separate words - python

I have been trying to split a text file into distinct words.
I tried using the iter method, the nltk module and plain splits, but something doesn't add up when I try to append the result to a list.
Maybe there is a problem with how I am reading the file.
txt = open(game_file)
print txt.read()
names = []
linestream = iter(txt.read())
for line in linestream:
    for word in line.split():
        names.append(word)
When I try to print the list names, I just get '[]'.

Remove print txt.read(); after that call the file has already been read to the end, so you are iterating over nothing.
Or store the contents in a new variable, text = txt.read(), and work with that instead.

When you call txt.read(), you move to the end of your file. So when you try to read it again, the file pointer is already at the end and finds nothing.
Delete your second line (print txt.read()) and the list will no longer be empty.
Also, you don't need iter(txt.read()), which walks the text one character at a time;
for line in txt iterates over the lines and should work!
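A minimal sketch of the fix described above, written for Python 3 (the question uses Python 2 print statements). The temporary file and its sample text stand in for game_file and are assumptions made only so the example runs on its own.

```python
import os
import tempfile

# Create a small sample file; it stands in for the asker's game_file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("mario luigi\npeach toad\n")
    game_file = tmp.name

names = []
with open(game_file) as txt:
    # Iterate over the file object directly: one line per iteration.
    for line in txt:
        # split() with no arguments splits on any run of whitespace.
        names.extend(line.split())

os.remove(game_file)  # clean up the sample file
print(names)
```

The file is read exactly once here, so there is no exhausted file pointer to worry about.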

Creating an iterator from _any_file_obj_.read() gives you an object that iterates over every single character in the file, which is surely not what you want to achieve here, since you want to split the file's text into distinct words.
If you want to get every word from the text file, you can follow this approach.
word_list = []
txt = open(any_file)  # creating file object
for line in txt.readlines():
    if line:
        word_list.extend(line.split())
txt.seek(0)
The last line, txt.seek(0), is important if you want to read the file again.
All this time, your code was giving an empty list [] because, after one full read, the file's current position was at the end of the file (EOF). _file_obj_.seek() can be used to move the file's current position to wherever you want in the opened file.
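The behaviour described above can be checked with a short Python 3 sketch. The sample file and the variable names (first, second, third, chars) are assumptions made for illustration only.

```python
import os
import tempfile

# Sample file standing in for any text file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one two\nthree\n")
    path = tmp.name

with open(path) as txt:
    first = txt.read()    # consumes the whole file
    second = txt.read()   # nothing left: the position is at EOF
    txt.seek(0)           # rewind to the start of the file
    third = txt.read()    # the full text is available again

os.remove(path)

# Iterating over the string returned by read() yields single characters.
chars = list(iter(first))[:3]
print(repr(second), third == first, chars)
```

The second read returns an empty string, which is why the loop in the question never ran, and iterating over the text itself produces characters rather than lines.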

Related

Saving stripped text from csv file as a string object with Python

I'd like to save the text from a file that I had to download and decompress (part of an assignment) so that I can carry on with my next steps. Specifically, I'd like to save it as its own string object.
I can see the exact text I need when I print it in the following manner.
for line in seq:
    print(line.strip())
I just can't seem to figure out how to assign the stripped text to a variable.
You could use a list, and append each line of text to the list.
my_text = []
for line in seq:
    my_text.append(line.strip())
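Since the question asks for a single string object rather than a list, one possible follow-up is to join the stripped lines. This is only a sketch: the contents of seq below are made up, and joining with a newline is an assumption about the desired result.

```python
# seq stands in for the asker's iterable of lines (hypothetical sample data).
seq = ["  first line \n", "second line\n", "\tthird line\n"]

# Strip each line, then join the pieces into one string.
my_text = "\n".join(line.strip() for line in seq)
print(my_text)
```

Use "".join(...) or " ".join(...) instead if the lines should be concatenated without line breaks.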

Saving Output as List

I have a text file called urldata.txt which I am opening and reading line by line. I wrote a for loop to read it line by line, but I want to save the output I receive as a list.
Here is what I have:
textdata = open("urldata.txt","r")
for line in textdata:
    print(line)
This returns:
http://www.google.com
https://twitter.com/search?q=%23ASUcis355
https://github.com/asu-cis-355/course-info
I want to save these lines above as a list. Any suggestions?
I have tried appending and such, however, being new to Python I'm not sure how to go about this.
You just want a list of every line of the file?
urls = open("urldata.txt").read().splitlines()
If you just want the lines as a list, that's trivial:
with open("urldata.txt") as textdata:
    lines = list(textdata)
If you want newlines stripped, use a list comprehension to do it:
with open("urldata.txt") as textdata:
    lines = [line.rstrip('\r\n') for line in textdata]
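The difference between these approaches is whether the newline characters survive. A small Python 3 sketch, assuming a temporary file that holds the three URLs from the question (the names raw_lines and urls are illustrative):

```python
import os
import tempfile

# Temporary stand-in for urldata.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("http://www.google.com\n")
    tmp.write("https://twitter.com/search?q=%23ASUcis355\n")
    tmp.write("https://github.com/asu-cis-355/course-info\n")
    path = tmp.name

with open(path) as textdata:
    raw_lines = list(textdata)            # each item keeps its trailing "\n"

with open(path) as textdata:
    urls = textdata.read().splitlines()   # trailing newlines are removed

os.remove(path)
print(raw_lines[0] == urls[0] + "\n")
```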

How to find a word in a string in a list? (Python)

I'm trying to find a way to read a txt file and find a specific word. I have been opening the file with
myfile=open('daily.txt','r')
r=myfile.readlines()
That returns a list with one string for each line in the file. I want to find a word in one of the strings inside the list.
Edit:
I'm sorry, I meant to ask whether there is a way to find where the word is in the txt file, like x=myfile[12] x=x[2:6]
def findLines():
    myWord = 'someWordIWantToSearchFor'
    answer = []
    with open('daily.txt') as myfile:
        lines = myfile.readlines()
    for line in lines:
        if myWord in line:
            answer.append(line)
    return answer
with open('daily.txt') as myfile:
    for line in myfile:
        if "needle" in line:
            print "found it:", line
With the above, you don't need to allocate memory for the entire file at once, only one line at a time. This will be much more efficient if your file is large. It also closes the file automatically at the end of the with.
I'm not sure if the suggested answers solve the problem or not, because I'm not sure what the original proposer means. If he really means "words," not "substrings" then the solutions don't work, because, for example,
'cat' in line
evaluates to True if line contains the word 'catastrophe.' I think you may want to amend these answers along the lines of
if word in line.split(): ...
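To make the substring versus whole-word distinction concrete, and to report where the word sits as the edit to the question asks, here is a small Python 3 sketch. The sample lines and the names substring_hits and word_hits are assumptions for illustration; punctuation attached to words is not handled.

```python
# Hypothetical sample data standing in for the lines of daily.txt.
lines = ["a catastrophe happened", "the cat sat", "no match here"]
word = "cat"

# Substring test: also matches "catastrophe".
substring_hits = [i for i, line in enumerate(lines) if word in line]

# Whole-word test: record (line index, word index) for each matching line.
word_hits = [
    (i, line.split().index(word))
    for i, line in enumerate(lines)
    if word in line.split()
]

print(substring_hits)
print(word_hits)
```

Only the first occurrence per line is reported, because list.index() stops at the first match.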

Writing to the end of specific line in python

I have a text file that contains key value pairs separated by a tab like this:
KEY\tVALUE
I have opened this file in append mode (a+) so I can both read and write. A particular key may have more than one value. In that case I want to go to that key's line and write the next value beside the original one, separated by some delimiter (a comma, for example).
Here is what I wish to do:
import io
ft = io.open("test.txt",'a+')
ft.seek(0)
for line in ft:
    if (line.split('\t')[0] == "querykey"):
        ft.write(unicode("nextvalue"))  # write the next value beside the original one
Now there are two problems with it:
1. I have to iterate through the file to see which line the key is on (is there a faster way?)
2. I have to write a string to the end of that line.
I would be grateful for help with the second point.
The write function always writes at the end of the file. How should I write to the end of a specific line? I have searched and have not found very clear answers on how to do that.
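You can read the whole file, make your edit in memory, and write the edited content back to the file.
with open('test.txt') as f:
    lines = f.read().splitlines()  # read every line, without trailing newlines
for i, line in enumerate(lines):
    if line.split('\t')[0] == "querykey":
        lines[i] = line + ',newkey'  # append the extra value to the matching line
with open('test.txt', 'w') as f:  # reopen for writing; this truncates the file
    f.write('\n'.join(lines) + '\n')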
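As for why the a+ attempt in the question cannot place text on a specific line: on most platforms a file opened in append mode sends every write to the end of the file, regardless of the current seek position. A Python 3 sketch, assuming a temporary file in place of test.txt:

```python
import os
import tempfile

# Temporary stand-in for test.txt with two tab-separated pairs.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("querykey\tvalue\nother\tx\n")
    path = tmp.name

with open(path, "a+") as ft:
    ft.seek(0)               # move the position to the start of the file
    ft.write(",nextvalue")   # in append mode this still lands at the end

with open(path) as f:
    content = f.read()

os.remove(path)
print(repr(content))
```

The appended text ends up after the last line rather than next to querykey, which is why rewriting the file is the usual workaround.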

Spell checking with custom dictionary

I need your guidance!
I want to check a text file for spelling mistakes against a custom dictionary.
Here is the code:
Dictionary=set(open("dictionary.txt").read().split())
print Dictionary
SearchFile = open(input("sample.txt"))
WordList = set()
for line in SearchFile:
    line = line.strip()
    if line not in Dictionary:
        WordList.add(line)
print(WordList)
But when I open the sample file and check it again, nothing has changed. What am I doing wrong?
What you are doing wrong is not explicitly changing anything in any file.
Here is a little bit of code to show how to write stuff to files...
fp = open(somefilepath,'w')
This line opens a file for writing. The 'w' tells Python to create the file if it does not exist, but it also deletes the contents of the file if it does exist. If you want to open a file for writing and keep the current contents, use 'a' instead; 'a' is for append.
fp.write(stuff)
This writes whatever is in the variable 'stuff' to the file.
Hope this helps. For code more specific to your problem please tell us what exactly you want to write to your file.
Also, here is some documentation that should help you to better understand the topic of files: http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files
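A short Python 3 sketch of the difference between 'w' and 'a' described above. The file name demo.txt and the temporary directory are assumptions used only to keep the example self-contained.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.txt")  # hypothetical file name

    with open(path, "w") as fp:   # creates the file
        fp.write("first\n")

    with open(path, "w") as fp:   # 'w' again: previous contents are discarded
        fp.write("second\n")

    with open(path, "a") as fp:   # 'a': contents are kept, new text goes last
        fp.write("third\n")

    with open(path) as fp:
        content = fp.read()

print(repr(content))
```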
EDIT: but you are not changing anything!
By the end of your script here is what you have accomplished:
1. Dictionary is a set containing all acceptable words
2. WordList is a set containing all not acceptable lines
3. You have read to the end of SearchFile
If I understand your question correctly, what you now want to do is:
4. find out which Dictionary word each line stored in WordList should be
5. rewrite SearchFile with the offending lines replaced.
If this is correct, how do you intend to figure out which WordList entry is supposed to be which Dictionary entry? How do you know the actual corrections? Have you attempted this part of the script? It is the crux, after all, so it would only be polite to share your attempt at this part with us.
Let's assume you have this function:
def magic(line, dictionary):
    """
    This takes a line to be checked and a set of acceptable words,
    and outputs what the line is meant to be.
    PLEASE tell us your approach to this bit.
    """
    if line in dictionary:
        return line
    # ...do stuff to find out which word is being misspelt, return that word
Dictionary = set(open("dictionary.txt").read().split())
SearchFile = open("sample.txt", 'r')
result_text = ''
for line in SearchFile:
    result_text += magic(line.strip(), Dictionary)  # add the correct line to the result we want to save
    result_text += '\n'
SearchFile.close()  # finish reading before reopening the same file for writing
SearchFile = open("sample.txt", 'w')
SearchFile.write(result_text)  # here we actually make some changes
SearchFile.close()
If you have not thought about how to find the actual dictionary value that mis-spelt lines should be corrected to become, try this out: http://norvig.com/spell-correct.html
To re-iterate a previous point, it is important that you show that you have at least attempted to solve the crux of your problem if you want any meaningful help.
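For illustration only, one possible way to fill in the magic placeholder is the standard library's difflib.get_close_matches. This is a swapped-in technique chosen for brevity, not the approach from the Norvig article and not necessarily what the asker needs; the sample dictionary, the sample lines and the 0.6 cutoff are all assumptions. The sketch is Python 3.

```python
import difflib


def magic(line, dictionary):
    """Return line if it is an accepted word, else the closest accepted word.

    Falls back to the original line when nothing is similar enough.
    """
    if line in dictionary:
        return line
    # get_close_matches returns the best candidates, most similar first.
    matches = difflib.get_close_matches(line, sorted(dictionary), n=1, cutoff=0.6)
    return matches[0] if matches else line


# Hypothetical dictionary and input lines.
dictionary = {"apple", "banana", "cherry"}
sample_lines = ["apple", "bananna", "chery", "zzz"]

corrected = [magic(line, dictionary) for line in sample_lines]
print(corrected)
```

Similarity matching of this kind can pick the wrong word when several dictionary entries are close, so the result should be treated as a suggestion rather than a guaranteed correction.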
