I'm trying to create an asynchronous function that reads a constantly updating log file and gets every line of it. This is what I have so far:
import asyncio
import os

async def log_reader():
    # LOG_PATH is defined elsewhere in my code
    with open(LOG_PATH, "r", encoding='utf-8', errors='ignore') as logfile:
        logfile.seek(0, os.SEEK_END)  # start at the current end of the file
        while True:
            line = logfile.readline()
            if not line:
                await asyncio.sleep(0.2)
                continue
            # do stuff
It works fine until the log file is rotated (recreated). I was thinking about checking whether the file's size has become smaller than it was before, which would mean it was replaced, but I feel there must be a better option for that.
Any tips are welcome.
To detect that the file has been replaced, you can check its inode. Get it from the path using os.stat and extract the inode number (st_ino). If the inode you get is different from the previous one, you'll have to reopen the file (so doing this inside a with block may not be easy).
To optimise it a bit so you don't stat the file all the time, you can use a re-check delay that you can easily accept, but that is longer than the usual delay between log lines.
This will work if the file has been replaced, which is the usual method of rotating logfiles. It will not work if the file has only been truncated.
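A minimal sketch of that idea, reusing the reader from the question (LOG_PATH and the 1-second re-check interval are just assumptions for illustration):

import asyncio
import os

async def log_reader():
    first_open = True
    while True:
        current_inode = os.stat(LOG_PATH).st_ino
        with open(LOG_PATH, "r", encoding="utf-8", errors="ignore") as logfile:
            if first_open:
                logfile.seek(0, os.SEEK_END)  # skip old content only on the very first open
                first_open = False
            while True:
                line = logfile.readline()
                if not line:
                    await asyncio.sleep(1.0)  # assumed re-check interval
                    try:
                        if os.stat(LOG_PATH).st_ino != current_inode:
                            break  # the path now points at a new file: reopen it
                    except FileNotFoundError:
                        break  # file is mid-rotation; reopen on the next outer loop
                    continue
                # do stuff with line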
Here is my code for accessing and editing the file:
def edit_default_settings(self, setting_type, value):
    with open("cam_settings.json", "r") as f:
        cam_settings = json.load(f)

    cam_settings[setting_type] = value

    with open("cam_settings.json", 'w') as f:
        json.dump(cam_settings, f, indent=4)
I use it in a program that runs for several hours a day, and about once a week I notice that the cam_settings.json file has become empty (literally empty, the file explorer shows 0 bytes), but I can't imagine how that is possible.
I would be glad to hear some comments on what could go wrong.
I can't see any issues with the code itself, but there can be an issue with the execution environment. Are you running the code in a multi-threaded environment, or running multiple instances of the same program at once?
This situation can arise if this code is executed in parallel and multiple threads/processes try to access the file at the same time. Try logging each time the function is executed and whether it completed successfully, and add exception handlers with error logging.
If this is the problem, buffering the writes or using a singleton pattern (so only one place ever writes the file) can solve the issue.
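As a sketch of that kind of diagnostic logging around your function (the logger name and message texts are just placeholders, not anything from your original code):

import json
import logging

logger = logging.getLogger("cam_settings")

def edit_default_settings(self, setting_type, value):
    logger.info("edit_default_settings(%r, %r) called", setting_type, value)
    try:
        with open("cam_settings.json", "r") as f:
            cam_settings = json.load(f)
        cam_settings[setting_type] = value
        with open("cam_settings.json", "w") as f:
            json.dump(cam_settings, f, indent=4)
    except Exception:
        logger.exception("failed to update cam_settings.json")
        raise
    else:
        logger.info("cam_settings.json updated successfully")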
As @Chels said, the file is truncated when it's opened with 'w'. That doesn't explain why it stays that way; I can only imagine that happening if your code crashed. Maybe you need to check logs for code crashes (or change how your code is run so that crash reasons get logged, if they aren't).
But there's a way to make this process safer in case of crashes. Write to a separate file and then replace the old file with the new file, only after the new file is fully written. You can use os.replace() for this. You could do this simply with a differently-named file:
with open(".cam_settings.json.tmp", 'w') as f:
json.dump(cam_settings, f, indent=4)
os.replace(".cam_settings.json.tmp", "cam_settings.json")
Or you could use a temporary file from the tempfile module.
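For example, a sketch using tempfile.NamedTemporaryFile, creating the temporary file in the same directory so os.replace stays on one filesystem (the function name and default path are just for illustration):

import json
import os
import tempfile

def save_settings(cam_settings, path="cam_settings.json"):
    # write everything to a temporary file, then atomically replace the original
    with tempfile.NamedTemporaryFile("w", dir=os.path.dirname(path) or ".",
                                     delete=False) as tmp:
        json.dump(cam_settings, tmp, indent=4)
        tmp.flush()
        os.fsync(tmp.fileno())  # make sure the data is on disk before the rename
    os.replace(tmp.name, path)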
When you open a file in "w" mode, its contents are truncated as soon as it is opened, so whatever was written there before is erased and you replace it with new content.
Not sure if this is what you are looking for, but it could be one of the reasons why "cam_settings.json" becomes empty after the call to open("cam_settings.json", 'w')!
In such a case, to append text instead of replacing it, use the "a" mode:
open("cam_settings.json", 'a')
I have a file 'out.txt' that is updated continuously. I need to send the contents of this file to another file 'received.txt' every N minutes. I do not want the previous lines to be sent again. So the script needs to send only the new data and update 'received.txt' with the new lines of text, without repeating lines.
I'm having a hard time putting this script together. I'm guessing I need some sort of loop to do this continuously. Here is what I have so far (not in order):
EDIT: I am using Debian (Raspbian) on a Raspberry Pi.
import sys

num_lines = sum(1 for line in open('out.txt'))  # count the lines currently in out.txt
sys.stdout = open('received.txt', 'w')  # redirect print output to received.txt
print 'test'

f = open('out.txt', 'r')
f.readline(num_lines)  # (readline's argument is a maximum byte count, not a line number)
for line in f:
    print line
Any advice would be extremely helpful.
Thank you
There are a few ways to do this.
The simplest is to keep looping over the file even after EOF. You could do this by just wrapping a while True: around the for line in f:, or by just looping forever around f.readline().
But this will waste a lot of CPU power and possibly even disk access checking over and over as fast as possible whether the file is still at EOF. You can fix that by sleeping whenever you get to the end of the file, like this:
import time

while True:
    for line in f:
        print line
    time.sleep(0.5)
But if the file is not written to for a long time, you're still wasting CPU power (which may not seem like a problem, but imagine what happens when the computer wants to go to sleep, and it can't because you're making it work every half a second). And meanwhile, if the file is being written to a lot faster than twice/second, you're going to lag.
So, a better solution is to block until there's something to read.
Unfortunately, there's no easy cross-platform way to do this. Fortunately, there are relatively easy platform-specific ways to do it on most platforms, but I'd need to know your platform to help.
For example, on OS X or other *BSD systems, you can use kqueue to wait until a file has something to read:
from select import *

# the rest of your code until the reading loop

while True:
    for line in f:
        print line
    kq = kqueue()
    kq.control([kevent(f.fileno(), filter=KQ_FILTER_READ, flags=KQ_EV_ADD)], 0, 0)
    kq.control(None, 1)
    kq.close()
But that won't work on Windows, or Linux, or any other platform. (Also, that's a pretty bad way to do it on BSD; it's just shorter to show it this way than the right way. If you want to do this on OS X, find a good tutorial on using kqueue in Python, don't copy this code.)
I'm learning PyGTK and I'm making a text editor (that seems to be the hello world of PyGTK :])
Anyway, I have a "Save" function that writes the TextBuffer to a file. It looks something like this:
try:
    f = open(self.working_file_path, "rw+")
    buff = self._get_buffer()
    f.write(self._get_text())
    #update modified flag
    buff.set_modified(False)
    f.close()
except IOError as e:
    print "File Doesnt Exist so bring up Save As..."
    ......
Basically, if the file exist, write the buffer to it, if not bring up the Save As Dialog.
My question is: What is the best way to "update" a file. I seem to only be able to append to the end of a file. I've tried various file modes, but I'm sure I'm missing something.
Thanks in advance!
You can open a file in "r+" mode, which allows you to both read and write to the file, and to seek to particular positions and write there. This probably doesn't help you do what I think you want though; it sounds like you're wanting to only write out the changed data?
Remember that on the disk the file isn't stored as a series of extensible lines, it's just a sequence of bytes; some of those bytes indicate line-endings, but the next line follows on immediately. So if you edit the first line in the file and you write the new first line out, unless the new one happens to be exactly the same length as the old one the second line now won't be in the right place, so you'll need to move it (and have taken a copy of it first if the new line you wrote out was longer than the original). And this now means that the next line isn't in the right position either... and so on until you've had to read in and write out the entire rest of the file.
In practice you almost never write only part of an existing file unless you can simply append more data; if you need to "alter" a file you read it in, alter it in memory, and write it back out or you read in the file in pieces (often line by line) and then write out to a new file as you go (and then possibly move the new file over the top of the original). The first approach is easiest, the second is better for not having to hold the whole thing in memory at once.
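A sketch of the second approach, line by line into a new file and then moving it over the original (transform_line and the ".tmp" suffix are placeholders, not anything from your editor code):

import os

def rewrite_file(path, transform_line):
    tmp_path = path + ".tmp"
    with open(path, "r") as src, open(tmp_path, "w") as dst:
        for line in src:
            dst.write(transform_line(line))  # alter each line as needed
    os.replace(tmp_path, path)  # move the new file over the original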
At the point where you write to the file, your location is at the end of the file, so you need to seek back to the beginning. Then, you will overwrite the file, but this may leave old content at the end, so you also need to truncate the file.
Additionally, the mode you're specifying ('rw+') is invalid, and I get IOErrors when I try to do some operations on files opened with it. I believe that you want mode 'r+' ("Open for reading and writing. The stream is positioned at the beginning of the file."). 'w+' is similar, but would create the file if it didn't exist (and truncate it if it did).
So, what you're looking for might be code like this:
try:
    f = open(self.working_file_path, "r+")
    buff = self._get_buffer()
    f.seek(0)
    f.truncate()
    f.write(self._get_text())
    #update modified flag
    buff.set_modified(False)
    f.close()
except IOError as e:
    print "File Doesnt Exist so bring up Save As..."
    ......
However, you may want to modify this code to correctly catch and handle errors while truncating and writing the file, rather than assuming that all IOErrors in this section are non-existent-file errors from the call to open.
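For instance, a sketch that only treats failure of open() itself as "file missing", and lets other I/O errors surface separately (show_save_as_dialog is a placeholder for whatever your Save As code does):

try:
    f = open(self.working_file_path, "r+")
except IOError:
    # only the open() failing means the file doesn't exist yet
    self.show_save_as_dialog()  # placeholder
else:
    try:
        f.seek(0)
        f.truncate()
        f.write(self._get_text())
        self._get_buffer().set_modified(False)
    finally:
        f.close()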
Read the file in as a list, add an element to the start of it, write it all out. Something like this.
f = open(self.working_file_path, "r+")
flist = f.readlines()              # read the whole file into a list of lines
flist.insert(0, self._get_text())  # add the new text at the start
f.seek(0)                          # go back to the beginning before writing
f.writelines(flist)
f.close()
I'm designing a daemon that will continuously read lines from a single text file and process those lines. What is a good general purpose way to keep track of the last line processed, independent of the file name, in the event of lines being written to the text file while the daemon isn't running?
Every so often, the file is archived and a new blank file is created in its place. The daemon will be stopped for the archival to occur.
My first idea, which seems overcomplicated, is to compute and store a hash and line number of the last successfully processed record. Then, when the daemon is started again, run to that line number and calculate the hash. If the hash matches, continue processing from the next record. If the hash doesn't match, start over at the beginning, since that would indicate this is a new file.
I have a feeling there is a good general purpose technique used by log file analyzers or something in a text book that I haven't had exposure to.
Assuming you have permission, enough disk space and assuming you kill the daemon safely...
Just write the last line processed to a file (upon shutdown of the daemon).
You could wrap each instance of the daemon inside a context manager if you want.
from contextlib import contextmanager
http://docs.python.org/library/contextlib.html
class a_daemon():
    def __init__(self, last_line):
        print "initializing.."
        self.last_line = last_line

    def run_me(self):
        print "running.."
        # while true, process lines, set last_line to the current line being processed
        self.last_line = 'blah'

from contextlib import contextmanager

@contextmanager
def run_new_daemon():
    print "getting last line"
    last_line = open("last_line.txt").read()  # you will get a "file does not exist" error the first time you run this, unless you created the file already
    my_daemon = a_daemon(last_line)
    yield my_daemon
    print "shutting down, writing last line to file."
    with open("last_line.txt", 'w') as last_line_file:
        last_line_file.write(my_daemon.last_line)

with run_new_daemon() as my_daemon:
    my_daemon.run_me()
If you're going to take the trouble of storing a hash, you might as well store the whole line. It can't be that long. Or in any case, if it's long enough to be a problem, then these must be really huge files!!
Anyway, you need data persistence of some kind. Pickle, JSON, SQLite are all options, but they all seem like overkill in this case. I would just store it in a file.
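A minimal sketch of that idea, remembering the byte offset alongside the last line in a small state file (the file names, the JSON layout, and process_line are all assumptions for illustration):

import json
import os

STATE_PATH = "daemon_state.json"  # hypothetical state file
LOG_PATH = "input.txt"            # hypothetical file the daemon reads

def load_state():
    try:
        with open(STATE_PATH) as f:
            return json.load(f)
    except (IOError, ValueError):
        return {"offset": 0, "last_line": ""}

def save_state(state):
    with open(STATE_PATH, "w") as f:
        json.dump(state, f)

def run_once(process_line):
    state = load_state()
    # if the file shrank, it was archived and recreated: start from the beginning
    if os.path.getsize(LOG_PATH) < state["offset"]:
        state = {"offset": 0, "last_line": ""}
    with open(LOG_PATH) as f:
        f.seek(state["offset"])
        for line in f:
            process_line(line)
            state["last_line"] = line
        state["offset"] = f.tell()
    save_state(state)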
Do files opened like file("foo.txt") have any info about file modification time?
Basically I want to know if the file has been modified or replaced since a certain time, but if the file is replaced between checking modification time and opening the file, then you have inaccurate information.
How can I be sure?
Thanks.
UPDATE
@rubayeet: Thanks for the answer (+1), I actually didn't think of that. But... what do I do if the modification time has changed? Perhaps I reload the file again. But what if it changes that time too? If the file is being touched regularly I could end up in a loop forever! What I really want is a way to just get an open file handle and a modification time to go with it, without a potential infinite loop.
PS: The answer you gave was actually plenty good enough for my purposes, as the file won't be changed regularly; this is just general interest on my part now.
UPDATE 2
Thinking the previous update through (and experimenting a little), I realize that simply knowing the modification time at the point the file was opened is not much use: if the file is modified while you are reading it, some or all of the modified data may end up in what you read. So you'd have to open and read/process the whole file, then check the mtime again (as per @rubayeet's answer) to see whether you may have stale data.
For simple modtimes you would use:
from os.path import getmtime
modtime = getmtime('/file/to/path')
If you want something like callback functionality, you could check the inotify bindings for Python: pyinotify.
You essentially set up a watch manager, which notifies you in an event loop if any changes happen in the monitored directory. You register for specific events, like a file being opened or written to (writing is what changes the modtime).
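A sketch of what that looks like with pyinotify (assuming the package is installed; the watched path and the handler body are placeholders):

import pyinotify

class ModifiedHandler(pyinotify.ProcessEvent):
    def process_IN_MODIFY(self, event):
        # called from the event loop whenever the watched file is written to
        print "modified:", event.pathname

wm = pyinotify.WatchManager()
wm.add_watch('/file/to/path', pyinotify.IN_MODIFY)
notifier = pyinotify.Notifier(wm, ModifiedHandler())
notifier.loop()  # blocks, dispatching events to the handler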
If you are interested in exclusive access to a file, I would point you to the fcntl module, which has some low-level file-locking mechanisms on file descriptors.
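For example, a small sketch with fcntl.flock (Unix only, advisory locking; the path is a placeholder):

import fcntl

with open('/file/to/path', 'r+') as f:
    fcntl.flock(f, fcntl.LOCK_EX)   # block until we hold an exclusive lock
    try:
        data = f.read()
        # ... read or modify the file while holding the lock ...
    finally:
        fcntl.flock(f, fcntl.LOCK_UN)  # release the lock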
import os
filepath = '/path/to/file'
modifytime1 = os.path.getmtime(filepath)
fp = open(filepath)
modifytime2 = os.path.getmtime(filepath)
if modifytime1 != modifytime2:
    print "File modified after opening"