python network file writing in a robust manner - python

I am looking for a robust way to write out to a network drive. I am stuck with WinXP writing to a share on a Win2003 server. I want to pause writing if the network share goes down... then reconnect and continue writing once the network resource is available. With my initial code below, what happens is the 'except' catches the IOError when the drive goes away, but then when the drive becomes available again, the outf operations continue to IOError.
import serial
with serial.Serial('COM8',9600,timeout=5) as port, open('m:\\file.txt','ab') as outf:
while True:
x = port.readline() # read one line from serial port
if x: # if the there was some data
print x[0:-1] # display the line without extra CR
try:
outf.write(x) # write the line to the output file
outf.flush() # actually write the file
except IOError: # catch an io error
print 'there was an io error'

I suspect that once an open file goes into an error state because of the IOError that you will need to reopen it. You could try something like this:
with serial.Serial('COM8',9600,timeout=5) as port:
while True:
try:
with open('m:\\file.txt','ab') as outf:
while True:
x = port.readline() # read one line from serial port
if x: # if the there was some data
print x[0:-1] # display the line without extra CR
try:
outf.write(x) # write the line to the output file
outf.flush() # actually write the file
break
except IOError:
print 'there was an io error'
This puts the exception handling inside an outer loop that will reopen the file (and continue reading from the port) in the event of an exception. In practice you would probably want to add a time.sleep() or something to the except block in order to prevent the code from spinning.

Related

Concurrency when writing to a file in python

I'm working on a p2p filesharing system in python3 right now and I've come across an issue I don't know how to fix exactly.
I have peers with a server process and a client process where a client process connects to the other nodes, puts it in its own thread, and listens for data over a socket. When downloading from only one other peer, the file is written correctly with no problem, but when it is split up over multiple peers, the file is corrupted. The data is correctly received from both other peers so I'm thinking this would be a file write issue.
When I get the data from a peer, I open the file, seek to the position where the data comes from, and then write it and close the file. Would locks be the solution to this?
This is the code that is in its own thread that is constantly listening
def handleResponse(clientConnection, fileName, fileSize):
# Listen for connections forever
try:
while True:
#fileName = ""
startPos = 0
data = clientConnection.recv(2154)
# If a response, process it
if (len(data) > 0):
split = data.split(b"\r\n\r\n")
#print(split[0])
headers = split[0].replace(b'\r\n', b' ').split(b' ')
# Go through the split headers and grab the startPos and fileName
for i in range(len(headers)):
if (headers[i] == b"Range:"):
startPos = int(headers[i+1])
#fileName = headers[i+2].decode()
break
# Write the file at the seek pos
mode = "ab+"
if (startPos == 0):
mode = "wb+"
with open ("Download/" + fileName, mode) as f:
f.seek(startPos, 0)
f.write(split[1])
f.close()
Answered by Steffen Ullrich.
Solution is to open the file in rb+ instead of ab+, seek to the position and write. Do note that if the file does not exist, it will throw an exception since it is not created in rb+

exception stop read files python

I'm trying to control exceptions when reading files, but I have a problem. I'm new to Python, and I am not yet able to control how I can catch an exception and still continue reading text from the files I am accessing. This is my code:
import errno
import sys
class Read:
#FIXME do immutables this 2 const
ROUTE = "d:\Profiles\user\Desktop\\"
EXT = ".txt"
def setFileReaded(self, fileToRead):
content = ""
try:
infile = open(self.ROUTE+fileToRead+self.EXT)
except FileNotFoundError as error:
if error.errno == errno.ENOENT:
print ("File not found, please check the name and try again")
else:
raise
sys.exit()
with infile:
content = infile.read()
infile.close()
return content
And from another class I tell it:
read = Read()
print(read.setFileReaded("verbs"))
print(read.setFileReaded("object"))
print(read.setFileReaded("sites"))
print(read.setFileReaded("texts"))
Buy only print this one:
turn on
connect
plug
File not found, please check the name and try again
And no continue with the next files. How can the program still reading all files?
It's a little difficult to understand exactly what you're asking here, but I'll try and provide some pointers.
sys.exit() will terminate the Python script gracefully. In your code, this is called when the FileNotFoundError exception is caught. Nothing further will be ran after this, because your script will terminate. So none of the other files will be read.
Another thing to point out is that you close the file after reading it, which is not needed when you open it like this:
with open('myfile.txt') as f:
content = f.read()
The file will be closed automatically after the with block.

Generator Reading Log File While Name Changes Python

The goal is to read a log file in real time line by line (standard generator stuff) but the catch is, the file name changes at various intervals. The name change can't be helped (application dictated appended with a time string) and the name is changed when the log file size reaches ~2MB (guesstimate).
My approach was to create a file getter function that got the file (or new file) and then passed that to the generator. I thought that when the file changed names I would get a 'File not found' error, but what my test showed, is that the file name change is prevented entirely as 'another program is using this file'. The name change must be allowed, and this reader code cannot interfere with the application logging process at all.
import os
import time
import fnmatch
directory = '\\foo\\'
def fileGenerator(logFile):
""" Run a line generator """
logFile.seek(0,2)
while True:
line = logFile.readline()
if not line:
time.sleep(0.1)
continue
yield line
def fileGetter():
""" Get the Logging File """
matchedFiles = []
for afile in os.listdir(directory):
if fnmatch.fnmatch(afile,'amc_*.txt'):
matchedFiles.append(afile)
if len(matchedFiles)==1:
#There was exactly one matching file found send it to the generator
return os.path.join(directory,matchedFiles[0])
else:
#There either wasn't a file found or many matching
#Error out and stop process... critical error
if __name__ == '__main__':
filePath = fileGetter()
try:
logFile = open(filePath,"r")
except Exception as e:
#Catch the file not found and go back to the file path getter
#Send the file back to the generator
print e
if logFile:
loglines = fileGenerator(logFile)
for line in loglines:
#handle the line
print line,
If you can't hold the file open while waiting for new content to be written to it, I suggest saving the file position you were last at and closing the file before you sleep, and then reopening the file and seeking to that point afterwards. You could also investigate filesystem notification systems if you care about spotting file additions or renames immediately.
def log_reader():
filename = "does_not_exist"
filepos = 0
while True:
try:
file = open(filename)
except FileNotFoundError:
filename = fileGetter()
# if renamed files start empty, set filepos to zero here!
continue
file.seek(filepos)
while True:
line = file.readline()
if not line:
filepos = file.tell()
file.close()
sleep(0.1) # you may want to test different sleep lengths to avoid FS thrash
break
yield line
The opening and closing of the file may stress out your filesystem if you do it too much, so I'd suggest sleeping longer than your previous code did (but you may want to test to see how well your OS handles it if you care about how responsive your log reader is).

Named pipe won't block

I'm trying to make multiple program communicate using Named Pipes under python.
Here's how I'm proceeding :
import os
os.mkfifo("/tmp/p")
file = os.open("/tmp/p", os.O_RDONLY)
while True:
line = os.read(file, 255)
print("'%s'" % line)
Then, after starting it I'm sending a simple data through the pipe :
echo "test" > /tmp/p
I expected here to have test\n showing up, and the python blocks at os.read() again.
What is happening is python to print the 'test\n' and then print '' (empty string) infinitely.
Why is that happening, and what can I do about that ?
From http://man7.org/linux/man-pages/man7/pipe.7.html :
If all file descriptors referring to the write end of a pipe have been
closed, then an attempt to read(2) from the pipe will see end-of-file
From https://docs.python.org/2/library/os.html#os.read :
If the end of the file referred to by fd has been reached, an empty string is returned.
So, you're closing the write end of the pipe (when your echo command finishes) and Python is reporting that as end-of-file.
If you want to wait for another process to open the FIFO, then you could detect when read() returns end-of-file, close the FIFO, and open it again. The open should block until a new writer comes along.
As an alternative to user9876's answer you can open your pipe for writing right after creating it, this allows it to stay open for writing at all times.
Here's an example contextmanager for working with pipes:
#contextlib.contextmanager
def pipe(path):
try:
os.mkfifo(path)
except FileExistsError:
pass
try:
with open(path, 'w'): # dummy writer
with open(path, 'r') as reader:
yield reader
finally:
os.unlink(path)
And here is how you use it:
with pipe('myfile') as reader:
while True:
print(reader.readline(), end='')

python: read file continuously, even after it has been logrotated

I have a simple python script, where I read logfile continuosly (same as tail -f)
while True:
line = f.readline()
if line:
print line,
else:
time.sleep(0.1)
How can I make sure that I can still read the logfile, after it has been rotated by logrotate?
i.e. I need to do the same what tail -F would do.
I am using python 2.7
As long as you only plan to do this on Unix, the most robust way is probably to check so that the open file still refers to the same i-node as the name, and reopen it when that is no longer the case. You can get the i-number of the file from os.stat and os.fstat, in the st_ino field.
It could look like this:
import os, sys, time
name = "logfile"
current = open(name, "r")
curino = os.fstat(current.fileno()).st_ino
while True:
while True:
buf = current.read(1024)
if buf == "":
break
sys.stdout.write(buf)
try:
if os.stat(name).st_ino != curino:
new = open(name, "r")
current.close()
current = new
curino = os.fstat(current.fileno()).st_ino
continue
except IOError:
pass
time.sleep(1)
I doubt this works on Windows, but since you're speaking in terms of tail, I'm guessing that's not a problem. :)
You can do it by keeping track of where you are in the file and reopening it when you want to read. When the log file rotates, you notice that the file is smaller and since you reopen, you handle any unlinking too.
import time
cur = 0
while True:
try:
with open('myfile') as f:
f.seek(0,2)
if f.tell() < cur:
f.seek(0,0)
else:
f.seek(cur,0)
for line in f:
print line.strip()
cur = f.tell()
except IOError, e:
pass
time.sleep(1)
This example hides errors like file not found because I'm not sure of logrotate details such as small periods of time where the file is not available.
NOTE: In python 3, things are different. A regular open translates bytes to str and the interim buffer used for that conversion means that seek and tell don't operate properly (except when seeking to 0 or the end of file). Instead, open in binary mode ("rb") and do the decode manually line by line. You'll have to know the file encoding and what that encoding's newline looks like. For utf-8, its b"\n" (one of the reasons utf-8 is superior to utf-16, btw).
Thanks to #tdelaney and #Dolda2000's answers, I ended up with what follows. It should work on both Linux and Windows, and also handle logrotate's copytruncate or create options (respectively copy then truncate size to 0 and move then recreate file).
file_name = 'my_log_file'
seek_end = True
while True: # handle moved/truncated files by allowing to reopen
with open(file_name) as f:
if seek_end: # reopened files must not seek end
f.seek(0, 2)
while True: # line reading loop
line = f.readline()
if not line:
try:
if f.tell() > os.path.getsize(file_name):
# rotation occurred (copytruncate/create)
f.close()
seek_end = False
break
except FileNotFoundError:
# rotation occurred but new file still not created
pass # wait 1 second and retry
time.sleep(1)
do_stuff_with(line)
A limitation when using copytruncate option is that if lines are appended to the file while time-sleeping, and rotation occurs before wake-up, the last lines will be "lost" (they will still be in the now "old" log file, but I cannot see a decent way to "follow" that file to finish reading it). This limitation is not relevant with "move and create" create option because f descriptor will still point to the renamed file and therefore last lines will be read before the descriptor is closed and opened again.
Using 'tail -F
man tail
-F same as --follow=name --retr
-f, --follow[={name|descriptor}] output appended data as the file grows;
--retry keep trying to open a file if it is inaccessible
-F option will follow the name of the file not descriptor.
So when logrotate happens, it will follow the new file.
import subprocess
def tail(filename: str) -> Generator[str, None, None]:
proc = subprocess.Popen(["tail", "-F", filename], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
while True:
line = proc.stdout.readline()
if line:
yield line.decode("utf-8")
else:
break
for line in tail("/config/logs/openssh/current"):
print(line.strip())
I made a variation of the awesome above one by #pawamoy into a generator function one for my log monitoring and following needs.
def tail_file(file):
"""generator function that yields new lines in a file
:param file:File Path as a string
:type file: str
:rtype: collections.Iterable
"""
seek_end = True
while True: # handle moved/truncated files by allowing to reopen
with open(file) as f:
if seek_end: # reopened files must not seek end
f.seek(0, 2)
while True: # line reading loop
line = f.readline()
if not line:
try:
if f.tell() > os.path.getsize(file):
# rotation occurred (copytruncate/create)
f.close()
seek_end = False
break
except FileNotFoundError:
# rotation occurred but new file still not created
pass # wait 1 second and retry
time.sleep(1)
yield line
Which can be easily used like the below
import os, time
access_logfile = '/var/log/syslog'
loglines = tail_file(access_logfile)
for line in loglines:
print(line)

Categories