I have an issue with a bit of code that works in Python 3 but fails in 2.7. I have the following part of code:
def getDimensions(file, log):
    noStations = 0
    noSpanPts = 0
    dataSet = False
    if log:
        print("attempting to retrieve dimensions. Opening file", file)
    while not dataSet:
        try:  # read until error occurs
            string = file.readline().rstrip()  # rstrip to avoid breaking on an empty line
        except IOError:
            break
        if "Ax dist hub" in string:  # parse out number of stations
            if log:
                print("found ax dist hub location")
            next(file)  # skip empty line
            eos = False  # end of stations
            while not eos:
                string = file.readline().rstrip()
                if string == "":
                    eos = True
                else:
                    noStations = int(string.split()[0])
This returns an error:
ValueError: Mixing iteration and read methods would lose data.
I understand that the issue is in how I read the string in the while loop, or at least that is what I believe. Is there a quick way to fix this? Any help is appreciated. Thank you!
The problem is that you are using next() and readline() on the same file. As the docs say:
As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right.
The fix is trivial: replace next with readline.
If you want a short way to collect the non-empty lines, try:
with open(filename) as f:
    lines = [line for line in f if line.strip()]
Then you can run your tests against lines.
Related
I have a file that contains information about users and the amount of times they have logged in. I am trying to pull all users that have a login count >= 250 and save them to another file. I am new at Python coding and keep getting an "invalid literal for int() with base 10" error when trying to run this portion of my code. Can anyone help me out and explain why this happens so I can prevent it from happening in the future? TIA
def main():
    userInformation = readFile("info")
    suspicious = []
    for i in userInformation:
        if int(i[2]) >= 250:
            suspicious.append(i)
Full code below if needed:
# Reading the file function
def readFile(filename):
    file = open(filename, 'r')
    lines = [x.split('\n')[0].split(';') for x in file.readlines()]
    file.close()
    return lines

def writeFile(suspicious):
    file = open('suspicious.txt', 'w')
    for i in suspicious:
        file.write('{};{};{};{}\n'.format(i[0], i[1], i[2], i[3]))
    file.close()
def main():
    userInformation = readFile("info")
    suspicious = []
    for i in userInformation:
        if int(i[2]) >= 250:
            suspicious.append(i)
    writeFile(suspicious)
    print('Suspicious users:')
    for i in suspicious:
        print('{} {}'.format(i[0], i[1]))

main()
Here are some lines from my file:
Jodey;Lamins;278
Chris;Taylors;113
David;Mann;442
etc
etc
"invalid literal with base 10" occurs when you're trying to parse an integer that's not in base 10. In other words i[2] is not a valid integer (most likely it's a string that you're incorrectly trying to convert to an integer). Also, it would be best to correctly format your main function.
OK, so I took your example file and played with it a little. The issues I faced were mostly down to spacing. So, here's the code you might like -
UsersInfoFileName = '/path/to/usersinfofile.txt'
MaxRetries = 250

usersWithExcessRetries = []
with open(UsersInfoFileName, 'r') as f:
    lines = f.readlines()

consecutiveLines = (line.strip() for line in lines if line.strip())
for line in consecutiveLines:
    if int(line.split(';')[-1]) > MaxRetries:
        usersWithExcessRetries.append(line)

for suspUsers in usersWithExcessRetries:
    print(suspUsers)
Here's what it does -
Reads all lines in the given file
Filters all lines by excluding lines which may be empty
Removes surrounding white spaces for the remaining lines
Reads last semi-colon separated value, and compares it with MaxRetries
Adds the original line to a list if the value exceeds MaxRetries
I'm trying to read a file.out server log file, but I only need the latest data within a datetime range.
Is it possible to read a file in reverse using with open() and one of its modes?
The a+ mode gives access to the end of the file:
"a+" Open for reading and writing. The file is created if it does not
exist. The stream is positioned at the end of the file. Subsequent writes
to the file will always end up at the then current end of the file,
irrespective of any intervening fseek(3) or similar.
Is there a way to use maybe a+ or other modes(methods) to access the end of the file and read a specific range?
since the regular r mode reads the file from the beginning:
with open('file.out','r') as file:
I have tried using reversed():
for line in reversed(list(open('file.out').readlines())):
but it returns no rows for me.
Or are there other ways to read a file in reverse? Help appreciated.
EDIT
What I got so far:
import os
import time
from datetime import datetime as dt

start_0 = dt.strptime('2019-01-27', '%Y-%m-%d')
stop_0 = dt.strptime('2019-01-27', '%Y-%m-%d')
start_1 = dt.strptime('09:34:11.057', '%H:%M:%S.%f')
stop_1 = dt.strptime('09:59:43.534', '%H:%M:%S.%f')

os.system("touch temp_file.txt")
process_start = time.clock()
count = 0
print("reading data...")
for line in reversed(list(open('file.out'))):
    try:
        th = dt.strptime(line.split()[0], '%Y-%m-%d')
        tm = dt.strptime(line.split()[1], '%H:%M:%S.%f')
        if (th == start_0) and (th <= stop_0):
            if (tm > start_1) and (tm < stop_1):
                count += 1
                print("%d occurrences" % (count))
                os.system("echo '" + line.rstrip() + "' >> temp_file.txt")
        if (th == start_0) and (tm < start_1):
            break
    except KeyboardInterrupt:
        print("\nLast line before interrupt:%s" % (str(line)))
        break
    except IndexError:
        continue
    except ValueError:
        continue
process_finish = time.clock()
print("Done:" + str(process_finish - process_start) + " seconds.")
I'm adding these limitations so that when I find the rows it can at least print that the occurrences appeared and then just stop reading the file.
The problem is that it's reading, but it's way too slow.
EDIT 2
(2019-04-29 9.34am)
All the answers I received work well for reverse-reading logs, but in my case (and maybe for other people's), with an n GB log, Rocky's answer below suited me best.
The code that works for me:
(I only added for loop to Rocky's code):
import collections

log_lines = collections.deque()
for line in open("file.out", "r"):
    log_lines.appendleft(line)
    if len(log_lines) > number_of_rows:  # number_of_rows: how many lines from the end to keep
        log_lines.pop()

log_lines = list(log_lines)
for line in log_lines:
    print(str(line).split("\n"))
Thanks people, all the answers work.
-lpkej
There's no way to do it with open parameters, but if you want to read the last part of a large file without loading the whole file into memory (which is what reversed(list(fp)) will do), you can use a two-pass solution.
LINES_FROM_END = 1000
with open(FILEPATH, "r") as fin:
    s = 0
    while fin.readline():  # count lines one at a time; readlines() would read everything into memory
        s += 1
    fin.seek(0)
    mylines = []
    for i, e in enumerate(fin):
        if i >= s - LINES_FROM_END:
            mylines.append(e)
This won't keep your file in memory; you can also reduce it to one pass by using collections.deque:
# one pass (a lot faster):
import collections

mylines = collections.deque()
for line in open(FILEPATH, "r"):
    mylines.appendleft(line)
    if len(mylines) > LINES_FROM_END:
        mylines.pop()

mylines = list(mylines)
# mylines will contain the last LINES_FROM_END lines of the file.
Sure there is:
filename = 'data.txt'
for line in reversed(list(open(filename))):
    print(line.rstrip())
EDIT:
As mentioned in comments this will read the whole file into memory. This solution should not be used with large files.
Another option is to mmap.mmap the file and then use rfind from the end to search for the newlines and then slice out the lines.
Hey m8, I made this code and it works for me; I can read my file in reversed order. Hope it helps :)
I start by creating a new text file, so I don't know how important that part is for you.
def main():
    f = open("Textfile.txt", "w+")
    for i in range(10):
        f.write("line number %d\r\n" % (i + 1))
    f.close()

def readReversed():
    for line in reversed(list(open("Textfile.txt"))):
        print(line.rstrip())

main()
readReversed()
I've been trying to write some code to read a CSV file. Some of the lines in the CSV are not complete. I would like the code to skip a bad line if there is data missing in one of the fields. I'm using the following code.
def Test():
    dataFile = open('test.txt', 'r')
    readFile = dataFile.read()
    lineSplit = readFile.split('\n')
    for everyLine in lineSplit:
        dividedLine = everyLine.split(';')
        a = dividedLine[0]
        b = dividedLine[1]
        c = dividedLine[2]
        d = dividedLine[3]
        e = dividedLine[4]
        f = dividedLine[5]
        g = dividedLine[6]
        print(a, b, c, d, e, f, g)
In my opinion, the Pythonic way to do this would be to use the included csv module in conjunction with a try/except block (while following PEP 8 - Style Guide for Python Code).
import csv

def test():
    with open('reading_test.txt', 'rb') as data_file:
        for line in csv.reader(data_file):
            try:
                a, b, c, d, e, f, g = line
            except ValueError:
                continue  # ignore the line
            print(a, b, c, d, e, f, g)

test()
This approach is called "It's Easier to Ask Forgiveness than Permission" (EAFP). The other more common style is referred to as "Look Before You Leap" (LBYL). You can read more about them in this snippet from a book by a very authoritative author.
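The two styles can be contrasted on a toy example (the row data here is made up): EAFP attempts the unpack and catches the failure, LBYL checks the length first.

```python
rows = [["a", "b", "c"], ["too", "short"], ["x", "y", "z"]]

# EAFP: just attempt the unpack and catch the failure
eafp = []
for row in rows:
    try:
        first, second, third = row
    except ValueError:          # wrong number of fields
        continue
    eafp.append(first)

# LBYL: check the length before touching the row
lbyl = [row[0] for row in rows if len(row) == 3]

print(eafp)          # ['a', 'x']
print(eafp == lbyl)  # True
```

EAFP tends to win when the failure is rare or the check itself is expensive; LBYL reads more explicitly when the condition is cheap to state.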
Given that you cannot know beforehand whether a given line is incomplete, you need to check whether it is complete and skip it if it is not. You can use continue for this, which makes the for loop move on to the next iteration:
def Test():
    dataFile = open('test.txt', 'r')
    readFile = dataFile.read()
    lineSplit = readFile.split('\n')
    for everyLine in lineSplit:
        dividedLine = everyLine.split(';')
        if len(dividedLine) != 7:
            continue
        a = dividedLine[0]
        b = dividedLine[1]
        c = dividedLine[2]
        d = dividedLine[3]
        e = dividedLine[4]
        f = dividedLine[5]
        g = dividedLine[6]
        print(a, b, c, d, e, f, g)
This doesn't seem all that Python-related so much as conceptual: a line parsed from a CSV row will be invalid if:
1. It is shorter than the minimum required length (i.e. missing elements)
2. One or more entries parsed come back empty or None (only if all elements are required)
3. The type of an element doesn't match the intended type of the column (not in the scope of what you requested, but good to keep in mind)
In Python, once you have split the array, you can check the first two conditions with:
if len(dividedLine) < intended_length or ("" in dividedLine): continue
The first part just needs you to get the intended length for a row; you can usually use the index row for that. The second part could have the quotes replaced with None or something, but split() returns an empty string for empty fields, so in this case use "".
HTH
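A sketch tying those checks together; the function name and the expected length are made up for illustration, matching the three-field file from the question:

```python
def valid_row(fields, expected_len=3):
    """Return True if a parsed CSV row passes checks 1 and 2 above."""
    if len(fields) < expected_len:          # 1. missing elements
        return False
    if "" in fields or None in fields:      # 2. empty entries
        return False
    return True

print(valid_row(["Jodey", "Lamins", "278"]))  # True
print(valid_row(["Chris", ""]))               # False
```

Check 3 (type validation) would slot in as a third clause, e.g. trying int() on the count column.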
I just started learning python 3 weeks ago, I apologize if this is really basic. I needed to open a .txt file and print the length of the longest line of code in the file. I just made a random file named it myfile and saved it to my desktop.
myfile = open('myfile', 'r')
line = myfile.readlines()
len(max(line)) - 1
# (the -1 is to remove the "\n")
Is this code correct? I put it in interpreter and it seemed to work OK.
But I got it wrong because apparently I was supposed to use a while loop. Now I am trying to figure out how to put it in a while loop. I've read what it says on python.org, watched videos on youtube and looked through this site. I just am not getting it. The example to follow that was given is this:
import os
du = os.popen('du /usr/local')
while 1:
    line = du.readline()
    if not line:
        break
    if list(line).count('/') == 3:
        print line,
print max([len(line) for line in file(filename).readlines()])
Taking what you have and stripping out the parts you don't need:
myfile = open('myfile', 'r')
max_len = 0
while 1:
    line = myfile.readline()
    if not line:
        break
    if len(line)  # ... something
        # something
Note that this is a crappy way to loop over a file. It relies on readline() returning an empty string at end-of-file. But homework is homework...
max(['b','aaa']) is 'b'.
This lexicographic order isn't what you want to maximise; you can use the key argument to choose a different function to maximise, like len:
max(['b','aaa'], key=len) is 'aaa'
So the solution could be: len(max(lines, key=len)).
A more elegant solution would be to use a generator expression:
max(len(line) - 1 for line in myfile.readlines())
As an aside, you should wrap opening a file in a with statement, which takes care of closing the file after the indented block:
with open('myfile', 'r') as mf:
    print max(len(line) - 1 for line in mf.readlines())
As others have mentioned, you need to find the line with the maximum length, which means giving the max() function a key= argument to extract the length from each of the lines in the list you pass it.
Likewise, in a while loop you'd need to read each line and see if its length was greater than the longest seen so far, which you could store in a separate variable initialized to 0 before the loop.
BTW, you would not want to open the file with os.popen() as shown in your second example.
I think it will be easier to understand if we keep it simple:
max_len = -1  # nothing was read so far
with open("filename.txt", "r") as f:  # opens the file and magically closes it at the end
    for line in f:
        max_len = max(max_len, len(line))
print max_len
As this is homework... I would ask myself whether I should count the line-feed character or not. If you need to chop the last character, replace len(line) with len(line[:-1]).
If you have to use while, try this:
max_len = -1  # nothing was read
with open("t.txt", "r") as f:  # opens the file
    while True:
        line = f.readline()
        if len(line) == 0:
            break
        max_len = max(max_len, len(line[:-1]))
print max_len
For those still in need. This is a little function which does what you need:
def get_longest_line(filename):
    length_lines_list = []
    open_file_name = open(filename, "r")
    all_text = open_file_name.readlines()
    open_file_name.close()  # close before returning (a close after return would never run)
    for line in all_text:
        length_lines_list.append(len(line))
    max_length_line = max(length_lines_list)
    for line in all_text:
        if len(line) == max_length_line:
            return line.strip()
I am reading a csv file several times, but cutting its size every time I go through it. So, once I've reached the bottom, I am writing a new csv file which is, say, the bottom half of the .csv file. I then wish to change the csv reader to use this new file instead, but it doesn't seem to be working... Here's what I've done.
sent = open(someFilePath)
r_send = csv.reader(sent)
try:
    something = r_send.next()
except StopIteration:
    sent.seek(0)
    sent.close()
    newFile = cutFile(someFilePath, someLineNumber)
    sent = open(newFile, "r")
    r_send = csv.reader(sent)
where cutFile does:
def cutFile(sender, lines):
    sent = open(sender, "r")
    new_sent = open(sender + ".temp.csv", "w")
    counter = 0
    for line in sent:
        counter = counter + 1
        if counter >= lines:
            print >> new_sent, ",".join(line)
    new_sent.close()
    return sender + ".temp.csv"
Why is this not working?
Is something = r_send.next() in some kind of loop? The way you wrote it, it's only going to read one line.
Why do you need ",".join(line)? You can simply print line itself, and it should work.
Plus, there really is no need to seek(0) before closing a file.
I suggest the following:
Use for something in r_send: rather than something = r_send.next(); you won't even need the try...except blocks, as you can put the code that closes the original file after that loop (as someone else mentioned, you aren't even looping through the original file in your current code). Then you'll probably want to wrap all of that in another loop so it keeps going until the file has been fully processed.
Use new_sent.write(line) instead of print >> new_sent, ",".join(line). The ",".join bit is wrong here anyway: line is a string, so joining it inserts a comma between every character. Since you aren't using the csv module to write the file, a plain write() also makes the fact that you're writing to a file more evident.
So...
sent = open(someFilePath)
r_send = csv.reader(sent)
someLineNumber = len(sent.readlines())
sent.seek(0)  # rewind: readlines() left the reader at end-of-file

while someLineNumber > 0:
    for something in r_send:
        pass  # do stuff
    someLineNumber /= 2  # //= 2 in Python 3
    sent.close()
    newFile = cutFile(someFilePath, someLineNumber)
    sent = open(newFile, "r")
    r_send = csv.reader(sent)
Something like that.