Reading a file in Python won't read the first line - python

I am reading a text file, splitting each line at the comma into a word and a number, then adding them to separate lists. However, the first name gets omitted. Here is my code:
for line in keywordFile:
    line = keywordFile.readline()
    keyword.append(line[0])
    keywordValue.append(line[1])

You're jumping ahead with the extra readline() call. Remove it and just use the line variable defined in the for statement.
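To see the skipping in action, here is a minimal sketch. It uses io.StringIO with made-up sample data as a stand-in for the real keywordFile, so the sample names and values are assumptions, not the asker's actual file.

```python
import io

# Hypothetical sample data standing in for the asker's file.
SAMPLE = "apple,1\nbanana,2\ncherry,3\ndate,4\n"

# Buggy pattern: the for loop takes one line, then readline() takes the next.
buggy = []
keyword_file = io.StringIO(SAMPLE)
for line in keyword_file:
    line = keyword_file.readline()  # overwrites the line the loop just read
    buggy.append(line.strip().split(',')[0])

# Fixed pattern: use the loop variable directly.
keyword, keyword_value = [], []
keyword_file = io.StringIO(SAMPLE)
for line in keyword_file:
    name, value = line.strip().split(',')
    keyword.append(name)
    keyword_value.append(value)

print(buggy)    # only every second name survives
print(keyword)  # all names are present
```

The buggy loop only ever keeps the lines fetched by readline(), which is why the first name never shows up.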

It seems that keywordFile is a file object. File objects are iterators (one-shot iterables), so the for statement itself already consumes one line on every pass:
for line in keywordFile:
Calling readline() inside the loop then reads the next line, which is redundant here, so to get rid of the problem you need to remove that call.
As a more Pythonic alternative, you can use a list comprehension to build the list of words by splitting each line on the comma. If you want one flat list of all words, use a nested loop:
with open('filename') as keywordFile:
    words = [w for line in keywordFile for w in line.strip().split(',')]
But if you want the separated words of each line in their own list, a single loop is enough:
with open('filename') as keywordFile:
    words = [line.strip().split(',') for line in keywordFile]
Or, as a better choice, use the csv module to parse the file into separated fields. You can pass a delimiter argument to csv.reader:
import csv
with open('file_name') as f:
    words = list(csv.reader(f, delimiter=','))
Here words is a list of rows, each row being a list of the separated fields. csv.reader itself is a lazy iterator, so it has to be consumed (for example with list()) before the file is closed. If you want to concatenate the rows into one flat sequence, you can use the itertools.chain.from_iterable() function.
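As a rough illustration of the csv approach combined with itertools.chain.from_iterable(), the sketch below parses two made-up lines. The sample data and the io.StringIO stand-in for an open file are assumptions for the sake of a self-contained example.

```python
import csv
import io
from itertools import chain

# Hypothetical sample data; io.StringIO stands in for an open file.
sample = io.StringIO("apple,1\nbanana,2\n")

rows = list(csv.reader(sample, delimiter=','))  # one list of fields per line
flat = list(chain.from_iterable(rows))          # all fields in one flat list

print(rows)
print(flat)
```

Keeping rows preserves the word/number pairing per line, while flat is only useful when that pairing no longer matters.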

Try something like:
for line in keywordFile:
    tokens = line.strip().split(',')  # strip the trailing newline first
    keyword.append(tokens[0])
    keywordValue.append(tokens[1])

Related

Having trouble splitting a file with text into separate words

I have been trying to split a file with text into distinct words.
I tried using the iter method, the nltk module and plain splits, but something doesn't add up when I try to append the result to a list.
Maybe there is a problem with how I am approaching the file.
txt = open(game_file)
print txt.read()
names = []
linestream = iter(txt.read())
for line in linestream:
    for word in line.split():
        names.append(word)
When I try to print the list names, I just get '[]'.
Remove print txt.read(); after that call you are iterating over a file that has already been read to the end.
Or assign the contents to a new variable, text = txt.read(), and work with that.
When you call txt.read(), you end up at the end of the file. So when you try to read it again, the file pointer is already at the end and nothing is found.
Delete your second line and it should work!
Also, you don't need iter(txt.read());
for line in txt should work!
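The two effects described above can be reproduced in a small sketch. The sample text is made up and io.StringIO stands in for the opened game_file, so treat both as assumptions.

```python
import io

# Hypothetical stand-in for open(game_file).
txt = io.StringIO("alice bob\ncarol\n")

first = txt.read()   # consumes the whole file
second = txt.read()  # nothing left to read, so this is an empty string

# Iterating over a string yields single characters, not lines.
chars = list(iter("ab c"))

# Splitting the text that was read once gives the words.
names = first.split()

print(repr(second))
print(chars)
print(names)
```

Reading once into a variable and reusing that variable avoids the empty second read entirely.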
Calling iter() on the result of _any_file_obj_.read() returns an iterator over every single character in the file. That is surely not what you want to achieve here, since you want to split the file's text into distinct words.
If you want to get every word from the text file, you can follow this approach:
word_list = []
txt = open(any_file)  # create the file object
for line in txt:
    word_list.extend(line.split())  # split() on a blank line adds nothing
txt.seek(0)  # rewind so the file can be read again
The last line, txt.seek(0), is very important if you intend to read the file again.
All this time, your code produced an empty list [] because, after the first full read, the file's current position was at the end of the file (EOF). _file_obj_.seek() moves the current position to wherever you want in the opened file.
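A minimal sketch of what seek(0) changes, again using io.StringIO with made-up text as an assumed stand-in for a real file:

```python
import io

txt = io.StringIO("one two\nthree\n")  # hypothetical file contents

words_first = [w for line in txt for w in line.split()]
words_empty = [w for line in txt for w in line.split()]  # already at EOF

txt.seek(0)  # move the position back to the start
words_again = [w for line in txt for w in line.split()]

print(words_first)
print(words_empty)
print(words_again)
```

The second pass finds nothing until the position is reset, after which the same words come back.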

Find, Replace inline file from multiple lists in Python

I have three python lists:
filePaths
textToFind
textToReplace
The lists are always equal lengths and in the correct order.
I need to open each file in filePaths, find the line in textToFind, and replace the line with textToReplace. I have all the code that populates the lists. I am stuck on making the replacements. I have tried:
for line in fileinput.input(filePath[i], inplace=1):
    sys.stdout.write(line.replace(find[i], replace[i]))
How do I iterate over each file to make the text replacements on each line that matches find?
When you need to use the indices of the items in a sequence while iterating over that sequence, use enumerate.
for i, path in enumerate(filePath):
    for line in fileinput.input(path, inplace=1):
        sys.stdout.write(line.replace(find[i], replace[i]))
Another option would be to use zip, which will give you one item from each sequence in order.
for path, find_text, replace_text in zip(filePaths, textToFind, textToReplace):
    for line in fileinput.input(path, inplace=1):
        sys.stdout.write(line.replace(find_text, replace_text))
Note that in Python 2.x zip builds a new list before the loop starts, so if the sequences you are zipping are huge it will consume memory. In Python 3.x zip returns an iterator, so it doesn't have that problem.
With a normal file object you could read the entire file into a variable and perform the string replacement on the whole file at once.
I might do something like this, without more information:
for my_file in file_paths:
    with open(my_file, 'r') as cin:
        lines = cin.readlines()  # store the file in memory so it can be overwritten
    with open(my_file, 'w') as cout:  # reopen for writing only after reading
        for line in lines:
            cout.write(line.replace(find, replace))  # change as needed
Iterate over all the file paths and open each file for reading first. Store the file's lines in a variable, because this code overwrites the original file: opening it in 'w' mode truncates it, so the read has to finish before the file is reopened for writing. Do your replace; if there is nothing to replace, Python just leaves the line alone. Write each line back to the file.
You can read the file into a temporary variable, make changes, and then write it back:
with open('file', 'r') as f:
    text = f.read()
with open('file', 'w') as f:
    f.write(text.replace('aaa', 'bbb'))
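Putting zip together with the read, replace, write pattern might look like the sketch below. The helper name replace_in_files, the temporary files and their contents are all hypothetical and exist only to make the example runnable.

```python
import os
import tempfile


def replace_in_files(paths, finds, replacements):
    """Replace finds[i] with replacements[i] in the file at paths[i]."""
    for path, find_text, replace_text in zip(paths, finds, replacements):
        with open(path, 'r') as f:
            text = f.read()  # finish reading before reopening for writing
        with open(path, 'w') as f:
            f.write(text.replace(find_text, replace_text))


# Demo on two throwaway files (made-up names and contents).
tmp_dir = tempfile.mkdtemp()
file_paths = []
for name, content in [("a.txt", "red\ngreen\n"), ("b.txt", "one\ntwo\n")]:
    path = os.path.join(tmp_dir, name)
    with open(path, 'w') as f:
        f.write(content)
    file_paths.append(path)

replace_in_files(file_paths, ["green", "one"], ["blue", "1"])

results = []
for path in file_paths:
    with open(path) as f:
        results.append(f.read())
print(results)
```

This variant loads each file fully into memory, which is fine for small files; for very large ones the fileinput approach shown earlier processes one line at a time.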

Saving Output as List

I have a text file called urldata.txt which I am opening and reading line by line. I wrote a for loop to read it line by line, but I want to save the output I receive as a list.
Here is what I have:
textdata = open("urldata.txt","r")
for line in textdata:
    print(line)
this returns:
http://www.google.com
https://twitter.com/search?q=%23ASUcis355
https://github.com/asu-cis-355/course-info
I want to save these lines above as a list. Any suggestions?
I have tried appending and such, however, being new to Python I'm not sure how to go about this.
You just want a list of every line of the file?
urls = open("urldata.txt").read().splitlines()
If you just want the lines as a list, that's trivial:
with open("urldata.txt") as textdata:
    lines = list(textdata)
If you want newlines stripped, use a list comprehension to do it:
with open("urldata.txt") as textdata:
    lines = [line.rstrip('\r\n') for line in textdata]
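To compare what the three approaches return, here is a small sketch that runs them on two of the URLs from the question. io.StringIO is used as an assumed stand-in for the real urldata.txt.

```python
import io

# Two of the URLs from the question, as they would appear in the file.
SAMPLE = "http://www.google.com\nhttps://github.com/asu-cis-355/course-info\n"

with_newlines = list(io.StringIO(SAMPLE))                         # keeps '\n'
stripped = [line.rstrip('\r\n') for line in io.StringIO(SAMPLE)]  # drops it
split = io.StringIO(SAMPLE).read().splitlines()                   # drops it too

print(with_newlines)
print(stripped)
print(split)
```

Both stripped and split give clean URLs; list() alone keeps the trailing newline on every element.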

I'm trying to write random numbers to a file and then make a list from lines in Python but list takes '\n' too

I'm using a function to create random numbers and writing them line by line in a file named random.txt using \n in a for loop using this block of code:
dosya.write(str(randomNumber))
dosya.write('\n')
I need to make a list from the lines and then sort that list using a sort function. I can see my random numbers line by line in that file, but when I use the readlines() function like:
List = open("random.txt").readlines()
print List
the output is:
['22\n', '16\n', '1\n', '4\n', '4\n']
Why am I seeing \n after my numbers? I tried printing only the first or second element and it didn't show anything extra. What is wrong with the whole list? When I use the sort function it takes the \n as well.
The .readlines() method reads each line, including the line ending \n, as a list element.
You likely want your random numbers as integers anyway.
Converting them to int would make your sorting work as well:
with open("random.txt") as fobj:
    data = [int(line) for line in fobj]
The with statement opens the file and guarantees that it is closed as soon as you leave the indented block. The open file object fobj is iterable, so you can write a list comprehension directly over fobj, converting each line into an integer. int() ignores surrounding whitespace, so the trailing newline \n is dropped automatically.
Loop through each line, convert the line to an int, and append it to a list.
f = open('random.txt', 'r')
array = []
for line in f:  # read lines
    array.append(int(line))
f.close()
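The sketch below shows why the conversion matters for sorting, using the exact list printed in the question. Only the variable names are made up.

```python
# The list from the question, as returned by readlines().
lines = ['22\n', '16\n', '1\n', '4\n', '4\n']

as_text = sorted(lines)                           # compares character by character
as_numbers = sorted(int(line) for line in lines)  # int() ignores the '\n'

print(as_text)
print(as_numbers)
```

Sorted as strings, '22\n' lands before '4\n' because '2' sorts before '4'; sorted as integers, the order is numeric.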

How to print lines with a certain length in a file (Python)

I am new to python. I have a document that has one random word per line. There are thousands of words in this file. I am trying to print only the words that are four letters long. I tried this:
f=open("filename.txt")
Words=f.readlines()
for line in f:
    if len(line)==4:
        print(line)
f.close()
But Python prints nothing when I do this. I am assuming I need to strip the blank spaces as well, but when I did
f.strip()
I received an error stating that .strip() doesn't apply to list items. Any help is appreciated. Thanks!
'Python is blank' because you attempt to iterate over the file for a second time.
The first time is with readlines(), so when that iteration is finished you are at the end of the file. Then when you do for line in f you are already at the end of the file so there is nothing left over which to iterate. To fix this, drop the call to readlines().
To do what you want, I would just do this:
with open('filename.txt') as f:
    for line in f:  # No need for `readlines()`
        word = line.strip()  # Strip the line, not the file object.
        if len(word) == 4:
            print(word)
Your other error occurs with f.strip() because f is a file object, and you can only strip a string. Therefore just strip the line on each iteration, as shown in the example above.
Alternatively, keep readlines() and loop over the list it returned. You should do:
for line in Words:
instead of
for line in f:
You also want line.strip() rather than f.strip(), because f is a file object, not a string.
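A short sketch of why the length check needs the strip. The word list is made up and io.StringIO stands in for filename.txt, so both are assumptions.

```python
import io

# Hypothetical word list, one word per line.
f = io.StringIO("tree\nhouse\nbird\nsky\n")

raw_length = len("tree\n")  # the newline counts as a character

four_letter = [word for word in (line.strip() for line in f) if len(word) == 4]

print(raw_length)
print(four_letter)
```

Without strip(), a four-letter word has length 5 and the comparison with 4 only matches three-letter words.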
