While Loop Not Performing Main Function - python

I'm trying to write a Python script that uses a particular external application belonging to the company I work for. I can generally figure things out for myself when it comes to programming and scripting, but this time I am truely lost!
I can't seem to figure out why the while loop wont function as it is meant to. It doesn't give any errors which doesn't help me. It just seems to skip past the important part of the code in the centre of the loop and then goes on to increment the "count" like it should afterwards!
f = open('C:/tmp/tmp1.txt', 'w') #Create a tempory textfile
f.write("TEXTFILE\nTEXTFILE\nTEXTFILE\nTEXTFILE\nTEXTFILE\nTEXTFILE\n") #Put some simple text in there
f.close() #Close the file
count = 0 #Insert the line number from the text file you want to begin with (first line starts with 0)
num_lines = sum(1 for line1 in open('C:/tmp/tmp1.txt')) #Get the number of lines from the textfile
f = open('C:/tmp/tmp2.txt', 'w') #Create a new textfile
f.close() #Close it
while (count < num_lines): #Keep the loop within the starting line and total number of lines from the first text file
with open('C:/tmp/tmp1.txt', 'r') as f: #Open the first textfile
line2 = f.readlines() #Read these lines for later input
for line2[count] in f: #For each line from chosen starting line until last line from first text file,...
with open('C:/tmp/tmp2.txt', 'a') as g: #...with the second textfile open for appending strings,...
g.write("hello\n") #...write 'hello\n' each time while "count" < "num_lines"
count = count + 1 #Increment the "count"
I think everything works up until: "for line2[count] in f:"
The real code I'm working on is somewhat more complicated, and the application I'm using isn't exactly for sharing, so I have simplified the code to give silly outputs instead just to fix the problem.
I'm not looking for alternative code, I'm just looking for a reason why the loop isn't working so I can try to fix it myself.
All answers will be appreciated, and thanking everyone in advance!
Cormac

Some comments:
num_lines = sum(1 for line1 in open('C:/tmp/tmp1.txt'))
Why? What's wrong with len(open(filename, 'rb').readlines())?
while (count < num_lines):
...
count = count + 1
This is bad style, you could use:
for i in range(num_lines):
...
Note that I named your index i, which is universally recognized, and that I used range and a for loop.
Now, your problem, like I said in the comment, is that f is a file (that is, a stream of bytes with a location pointer) and you've read all the lines from it. So when you do for line2[count] in f:, it will try reading a line into line2[count] (this is a bit weird, actually, you almost never use a for loop with a list member as an index but apparently you can do that), see that there's no line to read, and never executes what's inside the loop.
Anyway, you want to read a file, line by line, starting from a given line number? Here's a better way to do that:
from itertools import islice
start_line = 0 # change this
filename = "foobar" # also this
with open(filename, 'rb') as f:
for line in islice(f, start_line, None):
print(line)
I realize you don't want alternative code, but your code really is needlessly complicated.

If you want to iterate over the lines in the file f, I suggest replacing your "for" line with
for line in line2:
# do something with "line"...
You put the lines in an array called line2, so use that array! Using line2[count] as a loop variable doesn't make sense to me.

You seem to get it wrong how the 'for line in f' loop works. It iterates over a file and calls readline, until there are no lines to read. But at the moment you start the loop all the lines are already read(via f.readlines()) and file's current position is at end. You can achieve what you want by calling f.seek(0), but that doesn't seem to be a good decision anyway, since you're going to read file again and that's slow IO.
Instead you want to do smth like:
for line in line2[count:]: # iterate over lines read, starting with `count` line
do_smth_with(line)

Related

When writing to a file, can you be specific about where you want to write?

I have a text file, which has the following:
20
15
10
And I have the following code:
test_file = open("test.txt","r")
n = 21
line1 = test_file.readline(1)
line2 = test_file.readline(2)
line3 = test_file.readline(3)
test_file.close()
line1 = int(line1)
line2 = int(line2)
line3 = int(line3)
test_file = open("test.txt","a")
if n > line1:
test_file.write("\n")
n = str(n)
test_file.write(n)
test_file.close()
This code checks if the variable 'n' is bigger than line 1. What I wanted it to do is if it is bigger than line 1, it should be written in a line before the previous line 1. However this code will write it at the bottom of the file. Is there anything I can do to write something where I want to and not at the bottom of the file?
Any help is appreciated.
You can put your whole data in a variable, edit that variable then overwrite the information in the file.
with open('test.txt', 'r') as file:
# read a list of lines into data
data = file.readlines()
# now change the 2nd line, note that you have to add a newline
data[1] = "42\t\n"
# and write everything back
with open('test.txt', 'w') as file:
file.writelines( data )
This is a short answer, implement your own algorithm to solve your own problem.
As correctly pointed out by Amadan in a comment, the only way to obtain this result is a complete rewrite of the file.
This, clearly depending on how strict your requirements are, is fairly inefficient.
If you want to understand more about inefficiency just imagine the actions you would have to manually take to write a new 1st line in a physical notebook page.
Since the 1st line is already written you would have to turn the page, write the new first line, then copy again all the lines from the old page and, finally, tear the 1st page out and have your perfect notebook with a perfect page again.
You are writing with pen so there is no possibility to delete, only a new page will do the trick.
That is quite some work!
This is - well, more or less - what Python is doing behind the scenes when it is opening for reading (the 'r' part in my examples below) and then opening for writing (the 'w' part) the same file again.
As a general idea imagine that when you see for loops there is a lot of work to do.
I will clumsily over-simplify saying that the more the for loops the slower the code (countless pages of paper have been written by brilliant minds on performances, I suggest you diving dive deeper and searching for "Big O notation" using your preferred search engine. Here's an example: https://www.freecodecamp.org/news/all-you-need-to-know-about-big-o-notation-to-crack-your-next-coding-interview-9d575e7eec4/).
A better solution would be to change your data file and make sure that the last value is also the most recent one.
Rewriting the file is as easy as writing an empty file, code and result are identical.
The trick here is that we have in memory (in the variables data and new_data) everything we need.
In data we store the whole content of the file before the change.
In new_data we can easily apply the needed modification because it is just a list containing a number and a newline (\n) for each list item.
Once new_data contains the data in the desired order all we need to do is write that list into a file.
Here's a possible solution, as close as possible to your code:
n = 21
with open('test.txt', 'r') as file:
data = file.readlines()
first_entry = int(data[0])
if (n > first_entry):
new_data = []
new_value = str(n) + "\n"
new_data.append(new_value)
for item in data:
new_data.append(item)
with open('test.txt', 'w') as file:
file.writelines(new_data)
Here's a more portable one:
def prepend_to_file_if_bigger_than_first_line(filename, value):
"""Checks if value is bigger than the one found in the 1st line of the specified file,
if true prepends it to the file
Args:
filename (str): The file name to check.
value (str): The value to check.
"""
with open(filename, 'r') as file:
data = file.readlines()
first_entry = int(data[0])
if (value > first_entry):
new_value = "{}\n".format(value)
new_data = []
new_data.append(new_value)
for old_value in data:
new_data.append(old_value)
with open(filename, 'w') as file:
file.writelines(new_data)
prepend_to_file_if_bigger_than_first_line("test.txt", 301)
As bonus some food for thought and exercises to learn:
What if instead of rewriting everything you just add a new line to the end of the page? Wouldn't it be more efficient and effective?
How would you re-implement my function above just to check the last line in file and append a new value?
Try bench-marking the prepend and the append solution, which one is best?

python - Trying to do a program to replace a given line, by the same line but all CAPS

Trying to do a college exercise where I'm supposed to replace a given line in a file, by the same line but written in all caps. The problem is we can only write in the same file, and in that exact line, we can't write in the rest of the file.
This is the code I have so far, but I can't figure out how to go to the line I want
def upper(n):
count=0
with open("upper.txt", "r+") as file:
lines = file.readlines()
file.seek(0)
for line in file.readlines():
if count == n:
pos = file.tell()
line1 = str(line.upper())
count += 1
file.seek(pos)
file.write(line1)
Help appreciated!
The problem lies in that your readlines already has read the entire file, and so the position of the "file cursor" is always at the end of the file. In theory, a simple fix should be:
Initialize pos to 0.
Read a single line.
If the current line counter indicates this is the one you want, set the position to pos again, update that line, and exit.
Update pos to point to the end of this line (so it points to the start of the next line).
Loop until satisfied.
In code, that would be this:
def upper(n):
count=0
with open("text.txt", "r+") as file:
pos = 0
for line in file.readlines():
if count == n:
line1 = line.upper()
break
pos = file.tell()
count += 1
file.seek(pos)
file.write(line1)
upper(5)
However! There is a snag. File operations are heavily buffered, and the for loop on readlines does not read one line at a time. Instead, for efficiency, it reads as much as possible, but it only "returns" the next line to your program. On a next run through your loop, it simply checks if it already had read enough of your text file to return the following line, and if not, it fills its internal buffer again. So, even while tell() will correctly be updated to the external file position – the value you see –, it does not reflect the "cursor" position of what you are processing at the time.
One way to circumvent this is to physically mimic what readlines does: read a single byte at a time, determine whether you have read an entire line (then this byte would be \n), and update your position and status based on this.
However, a more proper way of updating a file is to read it into memory in its entirety, change it, and write it back to disk. Changing part of an existing file with "r+" is usually recommended to use binary mode (where the position of each byte is known beforehand); admittedly, in theory your method should have worked as well, but as you see the file buffering defeats this.
Reading, changing, and writing the file entirely is as simple as this:
def better_upper(n):
count=0
with open("text.txt", "r") as file:
lines = file.readlines()
lines[n] = lines[n].upper()
with open("text.txt", "w") as file:
file.writelines(lines)
better_upper(5)
(Where the only caveat is that it always overwrites the original file. That is: if something unexpected goes wrong, it will probably erase text.txt. If you want a belt-and-suspenders approach, write to a new file, then check if it got written correctly. If it did, delete the old file and rename the new one. Left as an exercise to the reader.)

Reading large files in a loop

I'm having some trouble dealing with large text files (about 1GB), when I want to read them and use them in while loops.
More specifically: First I start by doing some parsing on the lines of the file, in order to find e.g. all lines that start with "x". In doing so, I add the indices of the found lines to a list (say l). This is the pre-processing part.
Now in a while loop, I'm choosing random indices from l, and want to read its corresponding line (or say 5 lines around it). Thus I need to keep the file in memory once and for all throughout the while loop, as a priori I do not know what lines I end up reading (the line is randomly picked from l).
The problem is, when I call the file before my main loop, during the first run of the loop, the reading gets done successfully, but already from the second run, the file has vanished from memory. What I have tried:
The preprocess part:
for i, line in enumerate(filename):
prep = ''.join(c for c in line if c.isalnum() or c.isspace())
if 'x' in prep: l.append(i)
Now I have my l list. loading the file in memory before main loop:
with open(filename,'r') as f:
while (some condition):
random_index = random.sample(range(0,len(l)),1)
output_file = open("out","w") #I will write here the read line(s)
for i, line in enumerate(f):
#(the lines to be read, starting from the given random index)
if (i >= l[random_index]) and (i < l[random_index+1]):
out.write(line)
out.close()
Only during the first run of the loop, things work properly.
Alternatively I also tried:
f = open(filename)
while (some condition):
random_index = ... #rest is same as above.
Same issue, only first run work. One thing that worked was putting the f=open(filename) in the loop, so every run the file is called. But since it is a large one, this is really no practical solution.
What am I doing wrong here?
How should such readings be done properly?
What am I doing wrong here?
This answer addresses the same problem: you can't read file twice.
You open file f outside of the while loop and read it completely by calling for i, line in enumerate(f): during first iteration of the while loop. During the second iteration you can't read it again, since it has been read already.
How should such readings be done properly?
As noted in the linked answer:
To answer your question directly, once a file has been read, with read() you can use seek(0) to return the read cursor to the start of the file (docs are here).
That means, that to solve your problem you can add f.seek(0) at the end of the while loop to move pointer to the start of the file after each iteration. Doing this you can reread file from the start again.

Reading from text file into python list

Very new to python and can't understand why this isn't working. I have a list of web addresses stored line by line in a text file. I want to store the first 10 in an array/list called bing, the next 10 in a list called yahoo, and the last 10 in a list called duckgo. I'm using the readlines function to read the data from the file into each array. The problem is nothing is being written to the lists. The count is incrementing like it should. Also, if I remove the loops altogether and just read the whole text file into one list it works perfectly. This leads me to believe that the loops are causing the problem. The code I am using is below. Would really appreciate some feedback.
count=0;
#Open the file
fo=open("results.txt","r")
#read into each array
while(count<30):
if(count<10):
bing = fo.readlines()
count+=1
print bing
print count
elif(count>=10 and count<=19):
yahoo = fo.readlines()
count+=1
print count
elif(count>=20 and count<=29):
duckgo = fo.readlines()
count+=1
print count
print bing
print yahoo
print duckgo
fo.close
You're using readlines to read the files. readlines reads all of the lines at once, so the very first time through your loop, you exhaust the entire file and store the result in bing. Then, every time through the loop, you overwrite bing, yahoo, or duckgo with the (empty) result of the next readlines call. So your lists all wind up being empty.
There are lots of ways to fix this. Among other things, you should consider reading the file a line at a time, with readline (no 's'). Or better yet, you could iterate over the file, line by line, simply by using a for loop:
for line in fo:
...
To keep the structure of your current code you could use enumerate:
for line_number, line in enumerate(fo):
if condition(line_number):
...
But frankly I think you should ditch your current system. A much simpler way would be to use readlines without a loop, and slice the resulting list!
lines = fo.readlines()
bing = lines[0:10]
yahoo = lines[10:20]
duckgo = lines[20:30]
There are many other ways to do this, and some might be better, but none are simpler!
readlines() reads all of the lines of the file. If you call it again, you get empty list. So you are overwriting your lists with empty data when you iterate through your loop.
You should be using readline() instead of readlines()
readlines() reads the entire file in at once, whereas readline() reads a single line from the file.
I suggest you rewrite it like so:
bing = []
yahoo = []
duckgo = []
with open("results.txt", "r") as f:
for i, line in enumerate(f):
if i < 10:
bing.append(line)
elif i < 20:
yahoo.append(line)
elif i < 30:
duckgo.append(line)
else:
raise RuntimeError, "too many lines in input file"
Note how we use enumerate() to get a running count of lines, rather than making our own count variable and needing to increment it ourselves. This is considered good style in Python.
But I think the best way to solve this problem would be to use itertools like so:
import itertools as it
with open("results.txt", "r") as f:
bing = list(it.islice(f, 10))
yahoo = list(it.islice(f, 10))
duckgo = list(it.islice(f, 10))
if list(it.islice(f, 1)):
raise RuntimeError, "too many lines in input file"
itertools.islice() (or it.islice() since I did the import itertools as it) will pull a specified number of items from an iterator. Our open file-handle object f is an iterator that returns lines from the file, so it.islice(f, 10) pulls exactly 10 lines from the input file.
Because it.islice() returns an iterator, we must explicitly expand it out to a list by wrapping it in list().
I think this is the simplest way to do it. It perfectly expresses what we want: for each one, we want a list with 10 lines from the file. There is no need to keep a counter at all, just pull the 10 lines each time!
EDIT: The check for extra lines now uses it.islice(f, 1) so that it will only pull a single line. Even one extra line is enough to know that there are more than just the 30 expected lines, and this way if someone accidentally runs this code on a very large file, it won't try to slurp the whole file into memory.

Read file Again Python

I would like to read a file again and again when it arrives at the end.
The file is only numbers separate by comma.
I use python and I read on the doc that file.seek(0) can be use for this but doesn't work for me.
This is my script:
self.users = []
self.index = -1
infile = open(filename, "r")
for line in infile.readlines():
if line != None:
self.users.append(String.split((line),','))
else:
infile.seek(0)
infile.read()
infile.close()
self.index= self._index +1
return self.users[self.index]
Thank you for your help
infile.read() will read in the whole of the file and then throw away the result. Why are you doing it?
When you call infile.readlines you have already read in the whole file. Then your loop iterates over the result, which is just a Python list. Moving to the start of the file will have no effect on that.
If your code did in fact move to the start of the file after reaching the end, it would simply loop for ever until it ran out of memory (because of the endlessly growing users list).
You could get the behaviour you're asking for by storing the result of readlines() in a variable and then putting the whole for line in all_lines: loop inside another while True:. (Or closing, re-opening and re-reading every time, if (a) you are worried that the file might be changed by another program or (b) you want to avoid reading it all in in a single gulp. For (b) you would replace for line in infile.readlines(): with for line in infile:. For (a), note that trying to read a file while something else might be writing to it is likely to be a bad idea no matter how you do it.)
I strongly suspect that the behaviour you're asking for is not what you really want. What's the goal you're trying to achieve by making your program keep reading the file over and over?
The 'else' branch will never be pursued because the for loop will iterate over all the lines of the files and then exit.
If you want the seek operation to be executed you will have to put it outside the for loop
self.users = []
self.index = -1
infile = open(filename, "r")
while True:
for line in infile.readlines():
self.users.append(String.split((line),','))
infile.seek(0)
infile.close()
self.index= self._index +1
return self.users[self.index]
The problem is, if you will loop for ever you will exhaust the memory. If you want to read it only twice then copy and paste the for loop, otherwise decide an exit condition and use a break operation.
readlines is already reading the entire file contents into an in-memory list, which you are free to iterate over again and again!
To re-read the file do:
infile = file('whatever')
while True:
content = infile.readlines()
# do something with list 'content'
# re-read the file - why? I do not know
infile.seek(0)
infile.close()
You can use itertools.cycle() here.
Here's an example :
import itertools
f = open(filename)
lines = f.readlines()
f.close()
for line in itertools.cycle(lines):
print line,

Categories