Concurrency when writing to a file in Python

I'm working on a p2p filesharing system in python3 right now and I've come across an issue I don't know how to fix exactly.
Each peer has a server process and a client process; the client process connects to the other nodes, puts each connection in its own thread, and listens for data over a socket. When downloading from only one other peer, the file is written correctly with no problem, but when the download is split across multiple peers, the file ends up corrupted. The data is received correctly from both peers, so I think this is a file-write issue.
When I get data from a peer, I open the file, seek to the position the data belongs at, write it, and close the file. Would locks be the solution to this?
This is the code that runs in its own thread and listens continuously:
def handleResponse(clientConnection, fileName, fileSize):
    # Listen for connections forever
    try:
        while True:
            #fileName = ""
            startPos = 0
            data = clientConnection.recv(2154)
            # If a response, process it
            if (len(data) > 0):
                split = data.split(b"\r\n\r\n")
                #print(split[0])
                headers = split[0].replace(b'\r\n', b' ').split(b' ')
                # Go through the split headers and grab the startPos and fileName
                for i in range(len(headers)):
                    if (headers[i] == b"Range:"):
                        startPos = int(headers[i+1])
                        #fileName = headers[i+2].decode()
                        break
                # Write the file at the seek pos
                mode = "ab+"
                if (startPos == 0):
                    mode = "wb+"
                with open("Download/" + fileName, mode) as f:
                    f.seek(startPos, 0)
                    f.write(split[1])
                    f.close()
    except OSError:
        # socket closed or other error; stop listening
        pass

Answered by Steffen Ullrich.
The solution is to open the file in rb+ mode instead of ab+, then seek to the position and write. Note that if the file does not exist, opening it in rb+ will throw an exception, since rb+ does not create it.
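A minimal sketch of that approach (the helper name and the lock are my additions, not part of the answer; the file is created first because rb+ fails if it does not exist):

import os
import threading

write_lock = threading.Lock()  # optional: serializes writes coming from the per-peer threads

def write_chunk(path, start_pos, chunk):
    # Hypothetical helper: create the file once, then write each chunk at its offset
    if not os.path.exists(path):
        open(path, "wb").close()
    with write_lock:
        with open(path, "rb+") as f:
            f.seek(start_pos, 0)
            f.write(chunk)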

Related

Writing data in Python to a local file and uploading it to FTP at the same time does not work

I have this weird issue with my code on Raspberry Pi 4.
from gpiozero import CPUTemperature
from datetime import datetime
import ftplib
cpu = CPUTemperature()
now = datetime.now()
time = now.strftime('%H:%M:%S')
# Save data to file
f = open('/home/pi/temp/temp.txt', 'a+')
f.write(str(time) + ' - Temperature is: ' + str(cpu.temperature) + ' C\n')
# Login and store file to FTP server
ftp = ftplib.FTP('10.0.0.2', 'username', 'pass')
ftp.cwd('AiDisk_a1/usb/temperature_logs')
ftp.storbinary('STOR temp.txt', f)
# Close file and connection
ftp.close()
f.close()
With this code, the script doesn't write anything to the .txt file, and the file that is transferred to the FTP server has a size of 0 bytes.
When I remove this part of the code, the script writes to the file just fine.
# Login and store file to FTP server
ftp = ftplib.FTP('10.0.0.2', 'username', 'pass')
ftp.cwd('AiDisk_a1/usb/temperature_logs')
ftp.storbinary('STOR temp.txt', f)
...
ftp.close()
I also tried writing some random text to the file and running the script, and then the file transferred normally.
Do you have any idea what I am missing?
After you write the file, the file pointer is at the end. So if you pass the file handle to FTP, it reads nothing, and hence nothing is uploaded.
I do not have a direct explanation for why the local file ends up empty, but the strange way of combining append mode and reading may be the reason. I do not even see the a+ mode defined in the open function documentation.
If you want to both append the data to a local file and upload it to FTP, I suggest you either:
- append the data to the file, seek back to the original position, and upload the appended file contents; or
- write the data to memory and then separately 1) dump the in-memory data to a file and 2) upload it.
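A rough sketch of the first option (untested; it opens the log in binary append-and-read mode so storbinary gets bytes, and it seeks back to the start so the whole accumulated log is uploaded, which seems to be what the original script intends):

import ftplib

with open('/home/pi/temp/temp.txt', 'a+b') as f:
    f.write(b'12:00:00 - Temperature is: 42.0 C\n')  # append the new data
    f.seek(0)                                        # move the pointer back so FTP has something to read

    # Login and upload the file from the current position to EOF
    ftp = ftplib.FTP('10.0.0.2', 'username', 'pass')
    ftp.cwd('AiDisk_a1/usb/temperature_logs')
    ftp.storbinary('STOR temp.txt', f)
    ftp.close()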

Sending back a file stream from GRPC Python Server

I have a service that needs to return a filestream to the calling client so I have created this proto file.
service Sample {
    rpc getSomething(Request) returns (stream Response) {}
}

message Request {
}

message Response {
    bytes data = 1;
}
When the server receives this, it needs to read some source.txt file and then write it back to the client as a byte stream. I just would like to ask: is this the proper way to do this in a Python gRPC server?
file_name = "source.txt"
with open(file_name, 'r') as content_file:
    content = content_file.read()
    response.data = content.encode()
    yield response
I cannot find any examples related to this.
That looks mostly correct, though it's hard to be sure since you haven't shared with us all of your service-side code. A few tweaks I'd suggest would be (1) reading the file as binary content in the first place, (2) exiting the with statement as early as possible, (3) constructing the response message only after you've constructed the value of its data field, and (4) making a module-scope module-private constant out of the file name. Something like:
with open(_CONTENT_FILE_NAME, 'rb') as content_file:
    content = content_file.read()
yield my_generated_module_pb2.Response(data=content)
What do you think?
One option would be to lazily read in the binary and yield each chunk. Note, this is untested code:
def read_bytes(file_, num_bytes):
    # Lazily yield the file in chunks of num_bytes (the final chunk may be shorter)
    while True:
        chunk = file_.read(num_bytes)
        if not chunk:
            break
        yield chunk

class ResponseStreamer(Sample_pb2_grpc.SampleServicer):
    def getSomething(self, request, context):
        with open('test.bin', 'rb') as f:
            for rec in read_bytes(f, 4):
                yield Sample_pb2.Response(data=rec)
Downside is that you'll have the file opened while the stream is open.
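On the client side, the streamed chunks can be reassembled into a file. A sketch (untested; the stub and module names follow the Sample_pb2 naming used above, and the target address and output path are placeholders):

import grpc
import Sample_pb2
import Sample_pb2_grpc

def download(target, out_path):
    # Call the server-streaming RPC and append each received chunk to a local file
    with grpc.insecure_channel(target) as channel:
        stub = Sample_pb2_grpc.SampleStub(channel)
        with open(out_path, 'wb') as out:
            for response in stub.getSomething(Sample_pb2.Request()):
                out.write(response.data)

download('localhost:50051', 'downloaded.bin')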

Named pipe won't block

I'm trying to make multiple programs communicate using named pipes under Python.
Here's how I'm proceeding:
import os

os.mkfifo("/tmp/p")
file = os.open("/tmp/p", os.O_RDONLY)

while True:
    line = os.read(file, 255)
    print("'%s'" % line)
Then, after starting it, I send some simple data through the pipe:
echo "test" > /tmp/p
I expected test\n to show up here, and then Python to block at os.read() again.
What actually happens is that Python prints 'test\n' and then prints '' (empty string) forever.
Why is that happening, and what can I do about it?
From http://man7.org/linux/man-pages/man7/pipe.7.html :
If all file descriptors referring to the write end of a pipe have been
closed, then an attempt to read(2) from the pipe will see end-of-file
From https://docs.python.org/2/library/os.html#os.read :
If the end of the file referred to by fd has been reached, an empty string is returned.
So, you're closing the write end of the pipe (when your echo command finishes) and Python is reporting that as end-of-file.
If you want to wait for another process to open the FIFO, then you could detect when read() returns end-of-file, close the FIFO, and open it again. The open should block until a new writer comes along.
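A minimal sketch of that reopen-on-end-of-file loop (untested, reusing the /tmp/p path from the question):

import os

PIPE_PATH = "/tmp/p"

while True:
    fd = os.open(PIPE_PATH, os.O_RDONLY)   # blocks until a writer opens the FIFO
    while True:
        data = os.read(fd, 255)
        if not data:                       # empty read: all writers have closed the pipe
            break
        print("'%s'" % data)
    os.close(fd)                           # close and reopen to wait for the next writer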
As an alternative to user9876's answer, you can open your pipe for writing right after creating it; this allows it to stay open for writing at all times.
Here's an example contextmanager for working with pipes:
import contextlib
import os

@contextlib.contextmanager
def pipe(path):
    try:
        os.mkfifo(path)
    except FileExistsError:
        pass
    try:
        with open(path, 'w'):  # dummy writer
            with open(path, 'r') as reader:
                yield reader
    finally:
        os.unlink(path)
And here is how you use it:
with pipe('myfile') as reader:
    while True:
        print(reader.readline(), end='')

python: read file continuously, even after it has been logrotated

I have a simple Python script where I read a logfile continuously (same as tail -f):
while True:
    line = f.readline()
    if line:
        print line,
    else:
        time.sleep(0.1)
How can I make sure that I can still read the logfile, after it has been rotated by logrotate?
i.e. I need to do the same as tail -F would do.
I am using python 2.7
As long as you only plan to do this on Unix, the most robust way is probably to check that the open file still refers to the same i-node as the name, and reopen it when that is no longer the case. You can get the i-number of the file from os.stat and os.fstat, in the st_ino field.
It could look like this:
import os, sys, time

name = "logfile"
current = open(name, "r")
curino = os.fstat(current.fileno()).st_ino
while True:
    while True:
        buf = current.read(1024)
        if buf == "":
            break
        sys.stdout.write(buf)
    try:
        if os.stat(name).st_ino != curino:
            new = open(name, "r")
            current.close()
            current = new
            curino = os.fstat(current.fileno()).st_ino
            continue
    except IOError:
        pass
    time.sleep(1)
I doubt this works on Windows, but since you're speaking in terms of tail, I'm guessing that's not a problem. :)
You can do it by keeping track of where you are in the file and reopening it when you want to read. When the log file rotates, you notice that the file is smaller, and since you reopen it, you handle any unlinking too.
import time

cur = 0
while True:
    try:
        with open('myfile') as f:
            f.seek(0, 2)
            if f.tell() < cur:
                f.seek(0, 0)
            else:
                f.seek(cur, 0)
            for line in f:
                print line.strip()
            cur = f.tell()
    except IOError, e:
        pass
    time.sleep(1)
This example hides errors like file not found because I'm not sure of logrotate details such as small periods of time where the file is not available.
NOTE: In Python 3, things are different. A regular open translates bytes to str, and the interim buffer used for that conversion means that seek and tell don't operate properly (except when seeking to 0 or to the end of file). Instead, open the file in binary mode ("rb") and do the decode manually, line by line. You'll have to know the file encoding and what that encoding's newline looks like. For utf-8, it's b"\n" (one of the reasons utf-8 is superior to utf-16, btw).
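A sketch of that binary-mode variant of the loop above (untested; it assumes a utf-8 encoded log file):

import time

cur = 0
while True:
    try:
        with open('myfile', 'rb') as f:        # binary mode: tell()/seek() behave predictably
            f.seek(0, 2)
            if f.tell() < cur:                 # file shrank, assume it was rotated
                f.seek(0, 0)
            else:
                f.seek(cur, 0)
            for raw in f:                      # raw b"\n"-terminated lines
                print(raw.decode('utf-8').rstrip('\n'))
            cur = f.tell()
    except IOError:
        pass
    time.sleep(1)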
Thanks to @tdelaney's and @Dolda2000's answers, I ended up with what follows. It should work on both Linux and Windows, and it also handles logrotate's copytruncate or create options (respectively: copy then truncate the size to 0, and move then recreate the file).
file_name = 'my_log_file'
seek_end = True
while True:  # handle moved/truncated files by allowing to reopen
    with open(file_name) as f:
        if seek_end:  # reopened files must not seek end
            f.seek(0, 2)
        while True:  # line reading loop
            line = f.readline()
            if not line:
                try:
                    if f.tell() > os.path.getsize(file_name):
                        # rotation occurred (copytruncate/create)
                        f.close()
                        seek_end = False
                        break
                except FileNotFoundError:
                    # rotation occurred but new file still not created
                    pass  # wait 1 second and retry
                time.sleep(1)
            do_stuff_with(line)
A limitation when using the copytruncate option is that if lines are appended to the file while time-sleeping, and rotation occurs before wake-up, the last lines will be "lost" (they will still be in the now-"old" log file, but I cannot see a decent way to "follow" that file to finish reading it). This limitation does not apply to the "move and create" option, because the f descriptor will still point to the renamed file, and therefore the last lines will be read before the descriptor is closed and opened again.
Using 'tail -F'
From man tail:
-F      same as --follow=name --retry
-f, --follow[={name|descriptor}]      output appended data as the file grows
--retry      keep trying to open a file if it is inaccessible
The -F option follows the name of the file, not the descriptor.
So when logrotate happens, it will follow the new file.
import subprocess
from typing import Generator

def tail(filename: str) -> Generator[str, None, None]:
    proc = subprocess.Popen(["tail", "-F", filename],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    while True:
        line = proc.stdout.readline()
        if line:
            yield line.decode("utf-8")
        else:
            break

for line in tail("/config/logs/openssh/current"):
    print(line.strip())
I made a variation of the awesome answer above by @pawamoy, turning it into a generator function for my log monitoring and following needs.
def tail_file(file):
    """Generator function that yields new lines in a file.

    :param file: file path as a string
    :type file: str
    :rtype: collections.Iterable
    """
    seek_end = True
    while True:  # handle moved/truncated files by allowing to reopen
        with open(file) as f:
            if seek_end:  # reopened files must not seek end
                f.seek(0, 2)
            while True:  # line reading loop
                line = f.readline()
                if not line:
                    try:
                        if f.tell() > os.path.getsize(file):
                            # rotation occurred (copytruncate/create)
                            f.close()
                            seek_end = False
                            break
                    except FileNotFoundError:
                        # rotation occurred but new file still not created
                        pass  # wait 1 second and retry
                    time.sleep(1)
                yield line
It can then be used easily, like below:
import os, time

access_logfile = '/var/log/syslog'
loglines = tail_file(access_logfile)
for line in loglines:
    print(line)

python network file writing in a robust manner

I am looking for a robust way to write out to a network drive. I am stuck with WinXP writing to a share on a Win2003 server. I want to pause writing if the network share goes down, then reconnect and continue writing once the network resource is available again. With my initial code below, the except catches the IOError when the drive goes away, but when the drive becomes available again, the outf operations continue to raise IOError.
import serial

with serial.Serial('COM8', 9600, timeout=5) as port, open('m:\\file.txt', 'ab') as outf:
    while True:
        x = port.readline()            # read one line from serial port
        if x:                          # if there was some data
            print x[0:-1]              # display the line without extra CR
            try:
                outf.write(x)          # write the line to the output file
                outf.flush()           # actually write the file
            except IOError:            # catch an io error
                print 'there was an io error'
I suspect that once an open file goes into an error state because of the IOError that you will need to reopen it. You could try something like this:
with serial.Serial('COM8', 9600, timeout=5) as port:
    while True:                                # outer loop: reopen the file after an error
        try:
            with open('m:\\file.txt', 'ab') as outf:
                while True:                    # inner loop: read from the port and write
                    x = port.readline()        # read one line from serial port
                    if x:                      # if there was some data
                        print x[0:-1]          # display the line without extra CR
                        try:
                            outf.write(x)      # write the line to the output file
                            outf.flush()       # actually write the file
                        except IOError:
                            print 'there was an io error'
                            break              # leave the inner loop so the file is reopened
        except IOError:                        # opening the file may also fail
            print 'there was an io error'
This puts the exception handling inside an outer loop that will reopen the file (and continue reading from the port) in the event of an exception. In practice you would probably want to add a time.sleep() or something to the except block in order to prevent the code from spinning.
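As a sketch of that idea, the reopen itself could be wrapped in a retry loop (a hypothetical helper, not part of the answer above; the 5-second delay is an arbitrary choice):

import time

def open_with_retry(path, mode, delay=5):
    # Keep trying to open the file until the network share is reachable again
    while True:
        try:
            return open(path, mode)
        except IOError:
            print 'could not open %s, retrying in %d seconds' % (path, delay)
            time.sleep(delay)

The file returned by open_with_retry('m:\\file.txt', 'ab') can then replace the plain open() call in the outer loop.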
