Reading from text file into python list


Very new to python and can't understand why this isn't working. I have a list of web addresses stored line by line in a text file. I want to store the first 10 in an array/list called bing, the next 10 in a list called yahoo, and the last 10 in a list called duckgo. I'm using the readlines function to read the data from the file into each array. The problem is nothing is being written to the lists. The count is incrementing like it should. Also, if I remove the loops altogether and just read the whole text file into one list it works perfectly. This leads me to believe that the loops are causing the problem. The code I am using is below. Would really appreciate some feedback.
count=0;

#Open the file
fo=open("results.txt","r")

#read into each array
while(count<30):
    if(count<10):
        bing = fo.readlines()
        count+=1
        print bing
        print count
    elif(count>=10 and count<=19):
        yahoo = fo.readlines()
        count+=1
        print count
    elif(count>=20 and count<=29):
        duckgo = fo.readlines()
        count+=1
        print count

print bing
print yahoo
print duckgo
fo.close

You're using readlines to read the file. readlines reads all of the lines at once, so the very first time through your loop, you exhaust the entire file and store the result in bing. Then, on every later pass through the loop, you overwrite bing, yahoo, or duckgo with the (empty) result of the next readlines call. So your lists all wind up being empty.
There are lots of ways to fix this. Among other things, you should consider reading the file a line at a time, with readline (no 's'). Or better yet, you could iterate over the file, line by line, simply by using a for loop:
for line in fo:
    ...
To keep the structure of your current code you could use enumerate:
for line_number, line in enumerate(fo):
    if condition(line_number):
        ...
But frankly I think you should ditch your current system. A much simpler way would be to use readlines without a loop, and slice the resulting list!
lines = fo.readlines()
bing = lines[0:10]
yahoo = lines[10:20]
duckgo = lines[20:30]
There are many other ways to do this, and some might be better, but none are simpler!

readlines() reads all of the lines of the file. If you call it again, you get an empty list. So you are overwriting your lists with empty data as you iterate through your loop.
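To see the effect in isolation, here is a minimal sketch (Python 3 syntax). It assumes an in-memory `io.StringIO` object as a stand-in for the open file, and the addresses are made up:

```python
import io

# io.StringIO stands in for an open text file (an assumption for this demo).
fo = io.StringIO("a.com\nb.com\nc.com\n")

first = fo.readlines()   # consumes every line and leaves the position at end of file
second = fo.readlines()  # nothing is left to read

print(first)   # ['a.com\n', 'b.com\n', 'c.com\n']
print(second)  # []
```

The second call does not fail; it just returns an empty list, which is why the original loop ran without errors.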

You should be using readline() instead of readlines()
readlines() reads the entire file in at once, whereas readline() reads a single line from the file.
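As a rough sketch of what the readline() version could look like (Python 3 syntax): the 30 addresses below are invented, and `io.StringIO` stands in for results.txt so the snippet runs on its own.

```python
import io

# Stand-in for results.txt: 30 made-up addresses, one per line.
fo = io.StringIO("".join("http://example.com/%d\n" % i for i in range(30)))

bing, yahoo, duckgo = [], [], []
for count in range(30):
    line = fo.readline()  # reads exactly one line; '' means end of file
    if line == "":
        break
    if count < 10:
        bing.append(line)
    elif count < 20:
        yahoo.append(line)
    else:
        duckgo.append(line)

print(len(bing), len(yahoo), len(duckgo))
```

Each list is appended to rather than reassigned, so earlier lines are not overwritten.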

I suggest you rewrite it like so:
bing = []
yahoo = []
duckgo = []
with open("results.txt", "r") as f:
    for i, line in enumerate(f):
        if i < 10:
            bing.append(line)
        elif i < 20:
            yahoo.append(line)
        elif i < 30:
            duckgo.append(line)
        else:
            raise RuntimeError("too many lines in input file")
Note how we use enumerate() to get a running count of lines, rather than making our own count variable and needing to increment it ourselves. This is considered good style in Python.
But I think the best way to solve this problem would be to use itertools like so:
import itertools as it
with open("results.txt", "r") as f:
    bing = list(it.islice(f, 10))
    yahoo = list(it.islice(f, 10))
    duckgo = list(it.islice(f, 10))
    if list(it.islice(f, 1)):
        raise RuntimeError("too many lines in input file")
itertools.islice() (or it.islice() since I did the import itertools as it) will pull a specified number of items from an iterator. Our open file-handle object f is an iterator that returns lines from the file, so it.islice(f, 10) pulls the next 10 lines from the input file (fewer if the file runs out first).
Because it.islice() returns an iterator, we must explicitly expand it out to a list by wrapping it in list().
I think this is the simplest way to do it. It perfectly expresses what we want: for each one, we want a list with 10 lines from the file. There is no need to keep a counter at all, just pull the 10 lines each time!
EDIT: The check for extra lines now uses it.islice(f, 1) so that it will only pull a single line. Even one extra line is enough to know that there are more than just the 30 expected lines, and this way if someone accidentally runs this code on a very large file, it won't try to slurp the whole file into memory.
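If the number of groups might change, the same idea can be wrapped in a small helper. This is only a sketch: `read_chunks` is a hypothetical name, not part of itertools, and `io.StringIO` stands in for the real file.

```python
import io
import itertools as it

def read_chunks(f, size, count):
    """Return `count` lists, each holding up to `size` consecutive lines from f."""
    return [list(it.islice(f, size)) for _ in range(count)]

# Stand-in for results.txt with 30 made-up lines.
f = io.StringIO("".join("line %d\n" % i for i in range(30)))
bing, yahoo, duckgo = read_chunks(f, 10, 3)

# Same "too many lines" check as above: pull at most one more line.
extra = list(it.islice(f, 1))
print(len(bing), len(yahoo), len(duckgo), extra)
```

Because every islice call continues from wherever the previous one stopped, no counter is needed here either.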

Related

How to read a specific range of lines in an external file in python?

Lets say you have a python file with 50 lines of code in it, and you want to read a specific range lines into a list. If you want to read ALL the lines in the file, you can just use the code from this answer:
with open('yourfile.py') as f:
    content = f.readlines()
print(content)
But what if you want to read a specific range of lines, like reading line 23-27?
I tried this, but it doesn't work:
f.readlines(23:27)

You were close. readlines returns a list and you can slice that, but it's invalid syntax to try and pass the slice directly in the function call.
f.readlines()[23:27]
If the file is very large, avoid the memory overhead of reading the entire file:
start, stop = 23, 27
for i in range(start):
    next(f)
content = []
for i in range(stop-start):
    content.append(next(f))

Try this:
sublines = content[23:27]

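If there are lots and lots of lines in your file, I believe you should consider calling f.readline() (without an s) 27 times, and only save the lines starting from wherever you want. :)
Otherwise, slicing as the other answers suggest is what I would have done too: f.readlines()[23:28]. The stop is 28 because the end of a slice is excluded. Keep in mind that list indexes are zero-based, so [23:28] selects the 24th through 28th lines; if you count lines from 1 and want lines 23 to 27, use [22:27].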
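For a large file, itertools.islice can skip to the range without building the whole list. A minimal sketch, assuming line numbers are counted from 1 and using `io.StringIO` with 50 made-up lines in place of the real file:

```python
import io
from itertools import islice

# Stand-in for a 50-line file: "line 1" through "line 50".
f = io.StringIO("".join("line %d\n" % n for n in range(1, 51)))

first, last = 23, 27  # 1-based, inclusive
# islice is zero-based with an exclusive stop, hence first - 1 and last.
content = list(islice(f, first - 1, last))

print(content)
```

islice still has to read past the first 22 lines, but it discards them one at a time instead of holding them in memory.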

iterative variable losing value in nested loop

So I seem to be doing something incredibly dumb and I can't seem to figure it out. I am trying to create script that will search a file for terms defined in another file. This seems pretty basic to me but for some reason the outside loop iteration is empty on the inside loop.
if __name__ == "__main__":
    searchfile = open(sys.argv[1],"r")
    terms = open(sys.argv[2],"r")
    for line in searchfile:
        for term in terms:
            if re.match(term, line.rstrip()):
                print line
If I print line before the term loop it has the information. If I print line inside the term loop, it doesn't. What am I missing?

The issue here is that files are iterators that get exhausted - this means that once they have been iterated over once, they will not restart from the beginning.
You are probably used to lists - iterables that return a new iterator each time you loop over them, from the beginning.
Files are single-use iterables - once you loop over them, they are exhausted.
You can either use list() to construct a list you can iterate over multiple times, or open the file inside the loop, so that it is reopened each time, creating a new iterator from the beginning.
Which option is best will vary depending on the use case. Opening the file and reading from disk will be slower, but making a list will require all the data being held in memory - if your file is extremely large, this may be a problem.
It's also worth noting that you should use the with statement when opening files in Python.
with open(sys.argv[1], "r") as searchfile, open(sys.argv[2], "r") as terms:
    terms = list(terms)
    for line in searchfile:
        for term in terms:
            if re.match(term, line.rstrip()):
                print line

So what are you doing: In the first for-iteration you read the first line of searchfile and compare it with every line in terms, by reading the file terms. After that, the file terms is read completely, so in every next iteration of the searchfile-loop the terms-loop isn't executed any more (terms is 'empty').
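The exhaustion is easy to reproduce on its own. In this sketch (Python 3 syntax) `io.StringIO` stands in for the terms file, and the terms and lines are invented:

```python
import io

# Stand-in for the terms file.
terms = io.StringIO("foo\nbar\n")

first_pass = [t.strip() for t in terms]   # iterates over both terms
second_pass = [t.strip() for t in terms]  # file is already at the end: nothing left

# Copying the lines into a list gives something that can be looped over repeatedly.
terms.seek(0)
term_list = list(terms)
pairs = [(line, t.strip()) for line in ("x", "y") for t in term_list]

print(first_pass, second_pass, len(pairs))
```

With the list, the inner loop sees every term for every outer line, which is what the nested loop in the question expected.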

While Loop Not Performing Main Function

I'm trying to write a Python script that uses a particular external application belonging to the company I work for. I can generally figure things out for myself when it comes to programming and scripting, but this time I am truly lost!
I can't seem to figure out why the while loop won't function as it is meant to. It doesn't give any errors, which doesn't help me. It just seems to skip past the important part of the code in the centre of the loop and then goes on to increment the "count" like it should afterwards!
f = open('C:/tmp/tmp1.txt', 'w') #Create a temporary textfile
f.write("TEXTFILE\nTEXTFILE\nTEXTFILE\nTEXTFILE\nTEXTFILE\nTEXTFILE\n") #Put some simple text in there
f.close() #Close the file
count = 0 #Insert the line number from the text file you want to begin with (first line starts with 0)
num_lines = sum(1 for line1 in open('C:/tmp/tmp1.txt')) #Get the number of lines from the textfile
f = open('C:/tmp/tmp2.txt', 'w') #Create a new textfile
f.close() #Close it
while (count < num_lines): #Keep the loop within the starting line and total number of lines from the first text file
    with open('C:/tmp/tmp1.txt', 'r') as f: #Open the first textfile
        line2 = f.readlines() #Read these lines for later input
        for line2[count] in f: #For each line from chosen starting line until last line from first text file,...
            with open('C:/tmp/tmp2.txt', 'a') as g: #...with the second textfile open for appending strings,...
                g.write("hello\n") #...write 'hello\n' each time while "count" < "num_lines"
    count = count + 1 #Increment the "count"
I think everything works up until: "for line2[count] in f:"
The real code I'm working on is somewhat more complicated, and the application I'm using isn't exactly for sharing, so I have simplified the code to give silly outputs instead just to fix the problem.
I'm not looking for alternative code, I'm just looking for a reason why the loop isn't working so I can try to fix it myself.
All answers will be appreciated, and thanking everyone in advance!
Cormac

Some comments:
num_lines = sum(1 for line1 in open('C:/tmp/tmp1.txt'))
Why? What's wrong with len(open(filename, 'rb').readlines())?
while (count < num_lines):
    ...
    count = count + 1
This is bad style, you could use:
for i in range(num_lines):
    ...
Note that I named your index i, which is universally recognized, and that I used range and a for loop.
Now, your problem, like I said in the comment, is that f is a file (that is, a stream of bytes with a location pointer) and you've read all the lines from it. So when you do for line2[count] in f:, it will try reading a line into line2[count] (this is a bit weird, actually, you almost never use a for loop with a list member as an index but apparently you can do that), see that there's no line to read, and never executes what's inside the loop.
Anyway, you want to read a file, line by line, starting from a given line number? Here's a better way to do that:
from itertools import islice
start_line = 0 # change this
filename = "foobar" # also this
with open(filename, 'rb') as f:
    for line in islice(f, start_line, None):
        print(line)
I realize you don't want alternative code, but your code really is needlessly complicated.

If you want to iterate over the lines in the file f, I suggest replacing your "for" line with
for line in line2:
    # do something with "line"...
You put the lines in an array called line2, so use that array! Using line2[count] as a loop variable doesn't make sense to me.

You seem to have misunderstood how the 'for line in f' loop works. It iterates over the file, reading one line at a time until there are no lines left. But by the time you start the loop, all the lines have already been read (via f.readlines()) and the file's current position is at the end. You could get what you want by calling f.seek(0), but that doesn't seem like a good decision anyway, since you would read the file again, and that's slow IO.
Instead you want to do something like:
for line in line2[count:]: # iterate over lines read, starting with `count` line
    do_smth_with(line)
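A small sketch of all three behaviours, assuming `io.StringIO` as a stand-in for tmp1.txt and a hypothetical starting line of 2:

```python
import io

# Stand-in for tmp1.txt: six identical lines.
f = io.StringIO("TEXTFILE\n" * 6)
lines = f.readlines()      # the position is now at end of file

body_runs = 0
for line in f:             # nothing left to read, so the body never executes
    body_runs += 1

count = 2                  # hypothetical starting line (0-based)
remaining = lines[count:]  # reuse the list instead of reading the file again

f.seek(0)                  # rewinding also works, at the cost of a second read
reread = list(f)

print(body_runs, len(remaining), len(reread))
```

The loop body running zero times, without any error, matches the "skips past the important part" symptom described in the question.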

using a text file in python

I'm trying to take a text file and use only the first 30 lines of it in Python.
This is what I wrote:
text = open("myText.txt")
lines = myText.readlines(30)
print lines
For some reason I get more than 150 lines when I print.
What am I doing wrong?

Use itertools.islice
import itertools
for line in itertools.islice(open("myText.txt"), 0, 30):
    print line

If you are going to process your lines individually, an alternative could be to use a loop:
file = open('myText.txt')
for i in range(30):
    line = file.readline()
    # do stuff with line here
EDIT: some of the comments below express concern about this method assuming there are at least 30 lines in the file. If that is an issue for your application, you can check the value of line before processing. readline() will return an empty string '' once EOF has been reached:
for i in range(30):
    line = file.readline()
    if line == '': # note that an empty line will return '\n', not ''!
        break
    # do stuff with line here

The sizehint argument for readlines isn't what you think it is (bytes, not lines).
If you really want to use readlines, try text.readlines()[:30] instead.
Do note that this is inefficient for large files as it first creates a list containing the whole file before returning a slice of it.
A straight-forward solution would be to use readline within a loop (as shown in mac's answer).
To handle files of various sizes (more or less than 30), Andrew's answer provides a robust solution using itertools.islice(). To achieve similar results without itertools, consider:
output = [line for _, line in zip(range(30), open("yourfile.txt", "r"))]
or as a generator expression (Python >2.4):
output = (line for _, line in zip(range(30), open("yourfile.txt", "r")))
for line in output:
    # do something with line.

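The argument for readlines is a size hint (in bytes), not a number of lines. The hint is only approximate and may be rounded up to an internal buffer size, which is most likely why you got 150+ lines back rather than 30 bytes' worth.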
Doing it with a for loop instead will give you proper results. Unfortunately, there doesn't seem to be a better built-in function for that.
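A quick way to check this yourself is to compare the hinted call with a plain slice. The sketch below uses Python 3 and an in-memory `io.StringIO` with 200 made-up lines; the exact number of lines the hinted call returns depends on the Python version and the kind of file object, so treat that count as illustrative only.

```python
import io

# 200 short lines; io.StringIO stands in for the open file (an assumption).
text = io.StringIO("line\n" * 200)

by_hint = text.readlines(30)      # 30 is a size hint, not a line count
text.seek(0)
by_slice = text.readlines()[:30]  # read everything, then keep the first 30 lines

print(len(by_hint), len(by_slice))
```

Only the sliced version is guaranteed to give 30 lines (or fewer, if the file is shorter).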

Read file Again Python

I would like to read a file again and again, starting over each time it reaches the end.
The file contains only numbers separated by commas.
I use Python, and I read in the docs that file.seek(0) can be used for this, but it doesn't work for me.
This is my script:
self.users = []
self.index = -1
infile = open(filename, "r")
for line in infile.readlines():
    if line != None:
        self.users.append(String.split((line),','))
    else:
        infile.seek(0)
        infile.read()
infile.close()
self.index= self._index +1
return self.users[self.index]
Thank you for your help

infile.read() will read in the whole of the file and then throw away the result. Why are you doing it?
When you call infile.readlines you have already read in the whole file. Then your loop iterates over the result, which is just a Python list. Moving to the start of the file will have no effect on that.
If your code did in fact move to the start of the file after reaching the end, it would simply loop for ever until it ran out of memory (because of the endlessly growing users list).
You could get the behaviour you're asking for by storing the result of readlines() in a variable and then putting the whole for line in all_lines: loop inside another while True:. (Or closing, re-opening and re-reading every time, if (a) you are worried that the file might be changed by another program or (b) you want to avoid reading it all in in a single gulp. For (b) you would replace for line in infile.readlines(): with for line in infile:. For (a), note that trying to read a file while something else might be writing to it is likely to be a bad idea no matter how you do it.)
I strongly suspect that the behaviour you're asking for is not what you really want. What's the goal you're trying to achieve by making your program keep reading the file over and over?

The 'else' branch will never be pursued because the for loop will iterate over all the lines of the files and then exit.
If you want the seek operation to be executed you will have to put it outside the for loop
self.users = []
self.index = -1
infile = open(filename, "r")
while True:
    for line in infile.readlines():
        self.users.append(String.split((line),','))
    infile.seek(0)
infile.close()
self.index= self._index +1
return self.users[self.index]
The problem is, if you will loop for ever you will exhaust the memory. If you want to read it only twice then copy and paste the for loop, otherwise decide an exit condition and use a break operation.

readlines is already reading the entire file contents into an in-memory list, which you are free to iterate over again and again!
To re-read the file do:
infile = file('whatever')
while True:
    content = infile.readlines()
    # do something with list 'content'
    # re-read the file - why? I do not know
    infile.seek(0)
infile.close()

You can use itertools.cycle() here.
Here's an example :
import itertools
f = open(filename)
lines = f.readlines()
f.close()
for line in itertools.cycle(lines):
    print line,
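Note that cycle() never stops on its own. A bounded sketch (Python 3 syntax), with a made-up two-line list standing in for the result of f.readlines() and islice() capping the output at five items:

```python
import itertools

# Stand-in for the list returned by f.readlines().
lines = ["1,2,3\n", "4,5,6\n"]

# cycle() repeats the list forever, so islice() limits this demo to 5 items.
repeated = [
    line.strip().split(",")
    for line in itertools.islice(itertools.cycle(lines), 5)
]

print(repeated)
```

In real code the cap would be replaced by whatever exit condition the program needs, for example a break inside the loop.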
