I am trying to learn Python socket programming (networks) and I have a function that sends a file, but I don't fully understand each line of it. Could someone please explain line by line what it is doing? I am also unsure why it needs the length of the data to send the file. If you can see any improvements to this function as well, thanks.
def send_file(socket, filename):
    with open(filename, "rb") as x:
        data = x.read()
    length_data = len(data)
    try:
        socket.sendall(length_data.to_bytes(20, 'big'))
        socket.sendall(data)
    except Exception as error:
        print("Error", error)
This protocol first sends the size of the file and then its data. The size is sent as 20 octets (bytes) serialized big-endian, meaning the most significant byte is sent first. Suppose you had a 1 MB file:
>>> val = 1000000
>>> val.to_bytes(20, 'big')
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0fB@'
The program sends those 20 bytes and then the file payload itself. The receiver reads the 20 octets, converts them back into a file size, and then knows exactly how many more octets to receive for the file. TCP is a streaming protocol: it has no concept of message boundaries, so protocols need some way of telling the other side how much data makes up a message, and a length prefix is a common way to do it.
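The receiver just reverses the conversion with int.from_bytes:
>>> int.from_bytes(val.to_bytes(20, 'big'), 'big')
1000000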
As an aside, this code has the serious problem that it reads the entire file into memory in one go. For a huge file, that could exhaust memory.
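One way around that (a sketch along the same lines, untested) is to take the length from the filesystem and send the file in fixed-size chunks:

import os

def send_file_chunked(sock, filename, chunk_size=64 * 1024):
    # The length prefix comes from the file size on disk,
    # so the file never has to be read into memory all at once.
    file_size = os.path.getsize(filename)
    sock.sendall(file_size.to_bytes(20, 'big'))
    with open(filename, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            sock.sendall(chunk)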
A receiver would look something like the following. This is a rudimentary implementation:
import io

def recv_exact(skt, count):
    buf = io.BytesIO()
    while count:
        data = skt.recv(count)
        if not data:
            return b""  # socket closed
        buf.write(data)
        count -= len(data)
    return buf.getvalue()
def recv_file(skt):
    data = recv_exact(skt, 20)
    if not data:
        return b""
    file_size = int.from_bytes(data, "big")
    file_bytes = recv_exact(skt, file_size)
    return file_bytes
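A hypothetical client-side usage of that receiver (the address and output file name are placeholders):

import socket

with socket.create_connection(("localhost", 9999)) as skt:
    payload = recv_file(skt)
    with open("received.bin", "wb") as out:
        out.write(payload)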
Related
I have a service that needs to return a file stream to the calling client, so I have created this proto file:
service Sample {
    rpc getSomething(Request) returns (stream Response) {}
}

message Request {
}

message Response {
    bytes data = 1;
}
When the server receives this, it needs to read some source.txt file and then write it back to the client as a byte stream. Is this the proper way to do this in a Python gRPC server?
file_name = "source.txt"
with open(file_name, 'r') as content_file:
    content = content_file.read()
    response.data = content.encode()
    yield response
I cannot find any examples related to this.
That looks mostly correct, though it's hard to be sure since you haven't shared all of your service-side code with us. A few tweaks I'd suggest: (1) read the file as binary content in the first place, (2) exit the with statement as early as possible, (3) construct the response message only after you've constructed the value of its data field, and (4) make a module-scope, module-private constant out of the file name. Something like:
_CONTENT_FILE_NAME = 'source.txt'

with open(_CONTENT_FILE_NAME, 'rb') as content_file:
    content = content_file.read()
yield my_generated_module_pb2.Response(data=content)
What do you think?
One option would be to lazily read in the binary and yield each chunk. Note, this is untested code:
def read_bytes(file_, num_bytes):
    while True:
        chunk = file_.read(num_bytes)
        if not chunk:
            break
        yield chunk
        if len(chunk) < num_bytes:
            break  # short read: end of file reached

class ResponseStreamer(Sample_pb2_grpc.SampleServicer):
    def getSomething(self, request, context):
        with open('test.bin', 'rb') as f:
            for rec in read_bytes(f, 4):
                yield Sample_pb2.Response(data=rec)
The downside is that the file stays open for as long as the stream is open.
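If that matters, one alternative (untested, and it trades memory for it) is to read the whole file before streaming, so the handle closes immediately:

class ResponseStreamer(Sample_pb2_grpc.SampleServicer):
    def getSomething(self, request, context):
        # Slurp the file up front; the handle is closed before the
        # first response message is yielded.
        with open('test.bin', 'rb') as f:
            data = f.read()
        chunk_size = 4
        for i in range(0, len(data), chunk_size):
            yield Sample_pb2.Response(data=data[i:i + chunk_size])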
I need to be able to upload a file through FTP and SFTP in Python, but with some not-so-usual constraints.
The file MUST NOT be written to disk.
The file is generated by calling an API and writing the JSON response to it.
There are multiple calls to the API; it is not possible to retrieve the whole result in a single call.
I cannot accumulate the full result in a string variable by appending each call's response until I have the whole file in memory. The file could be huge and there is a memory constraint, so each chunk should be sent and its memory deallocated.
So here is some sample code of what I would like to do:
def chunks_generator():
    range_list = range(0, 4000, 100)
    for i in range_list:
        data_chunk = requests.get(url=someurl, params={'offset': i, 'limit': 100})
        yield str(data_chunk)

def upload_file():
    chunks = chunks_generator()
    for chunk in chunks:
        chunk_io = io.BytesIO(chunk)
        ftp = FTP(self.host)
        ftp.login(user=self.username, passwd=self.password)
        ftp.cwd(self.remote_path)
        ftp.storbinary("STOR " + "myfilename.json", chunk_io)
I want only one file with all the chunks appended.
What I already have, and what works, is sending the whole file at once when it is fully in memory:
string_io = io.BytesIO(all_chunks_together_in_one_string)
ftp = FTP(self.host)
ftp.login(user=self.username, passwd=self.password)
ftp.cwd(self.remote_path)
ftp.storbinary("STOR " + "myfilename.json", string_io )
Bonus
I need this with ftplib, but will need it in Paramiko as well for SFTP. If there are other libraries where this would work better, I am open to them.
And what if I need to zip the file? Can I zip each chunk and send one zipped chunk at a time?
You can implement a file-like class that, when its .read(blocksize) method is called, retrieves data from the requests object.
Something like this (untested):
class ChunksGenerator:
    def __init__(self, requests):
        self.requests = requests
        self.i = 0

    def read(self, blocksize):
        # TODO: somehow detect end-of-file and return b"" in that case
        buf = self.requests.get(
            url=someurl, params={'offset': self.i, 'limit': blocksize})
        self.i += blocksize
        return buf.content  # return the body bytes, not the Response object

generator = ChunksGenerator(requests)
ftp.storbinary("STOR " + "myfilename.json", generator)
With Paramiko, you can use the same class with SFTPClient.putfo method.
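For instance, a hypothetical Paramiko upload using the same object (host, username, password, and remote_path are placeholders):

import paramiko

transport = paramiko.Transport((host, 22))
transport.connect(username=username, password=password)
sftp = paramiko.SFTPClient.from_transport(transport)
# putfo reads the file-like object until read() returns empty bytes.
sftp.putfo(generator, remote_path + '/myfilename.json')
sftp.close()
transport.close()

As for the bonus question, you could compress the stream on the fly with zlib.compressobj; a wbits value of 16 + zlib.MAX_WBITS makes it emit a gzip header and trailer. A rough sketch:

import zlib

def gzip_chunks(chunks):
    # One compressor for the whole stream; flush() emits the gzip trailer.
    comp = zlib.compressobj(9, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:
            yield out
    yield comp.flush()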
If I make a request for a file and specify encoding of gzip, how do I handle that?
Normally when I have a large file I do the following:
while True:
    chunk = resp.read(CHUNK)
    if not chunk:
        break
    writer.write(chunk)
    writer.flush()
where CHUNK is some size in bytes, writer is a file object returned by open(), and resp is the response generated from a urllib request.
Most of the time it's pretty simple: when the response header contains 'gzip' as the returned encoding, I do the following:
decomp = zlib.decompressobj(16+zlib.MAX_WBITS)
data = decomp.decompress(resp.read())
writer.write(data)
writer.flush()
or this:
f = gzip.GzipFile(fileobj=buf)
writer.write(f.read())
where buf is a BytesIO object.
If I try to decompress the gzip response though, I am getting issues:
while True:
    chunk = resp.read(CHUNK)
    if not chunk:
        break
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)  # recreated on every iteration
    data = decomp.decompress(chunk)
    writer.write(data)
    writer.flush()
Is there a way I can decompress the gzip data as it comes down in little chunks? Or do I need to write the whole file to disk, decompress it, and then move it to the final file name? Part of the issue, since I am using 32-bit Python, is that I can get out-of-memory errors.
Thank you
I think I found a solution that I wish to share.
import zlib

def _chunk(response, size=4096):
    """ downloads a web response in pieces """
    method = response.headers.get("content-encoding")
    if method == "gzip":
        # One decompressor for the whole stream, fed chunk by chunk.
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        b = response.read(size)
        while b:
            yield d.decompress(b)
            b = response.read(size)
    else:
        while True:
            chunk = response.read(size)
            if not chunk:
                break
            yield chunk
If anyone has a better solution, please add to it. Basically my error was where I created the zlib.decompressobj(): I was creating it inside the loop instead of once before it.
This seems to work in both Python 2 and 3 as well, which is a plus.
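For reference, a hypothetical way to drive it on Python 3 (url is a placeholder):

from urllib.request import Request, urlopen

req = Request(url, headers={"Accept-Encoding": "gzip"})
resp = urlopen(req)
with open("output.dat", "wb") as writer:
    # _chunk yields decompressed (or raw) pieces one at a time.
    for piece in _chunk(resp):
        writer.write(piece)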
I am trying to write a cStringIO buffer to disk. The buffer may represent a PDF, image, or HTML file.
The approach I took seems a bit wonky, so I am open to alternative approaches as a solution as well.
def copyfile(self, destfilepath):
    if self.datastream.tell() == 0:
        raise Exception("Exception: Attempt to copy empty buffer.")
    with open(destfilepath, 'wb') as fp:
        shutil.copyfileobj(self.datastream, fp)
    self.__datastream__.close()

@property
def datastream(self):
    return self.__datastream__

# ... inside the function that sets __datastream__
while True:
    buffer = response.read(block_sz)
    self.__datastream__.write(buffer)
    if not buffer:
        break
# ... etc ...
test = Downloader()
ok = test.getfile(test_url)
if ok:
    test.copyfile(save_path)
I took this approach because I don't want to start writing data to disk until I know I have successfully read the entire file and that it's a type I am interested in.
After calling copyfile() the file on disk is always zero bytes.
whoops!
I forgot to reset the stream position before trying to read from it, so it was reading from the end, hence the zero bytes. Moving the cursor back to the beginning resolves the issue.
def copyfile(self, destfilepath):
    if self.datastream.tell() == 0:
        raise Exception("Exception: Attempt to copy empty buffer.")
    self.__datastream__.seek(0)  # <-- RESET POSITION TO BEGINNING
    with open(destfilepath, 'wb') as fp:
        shutil.copyfileobj(self.datastream, fp)
    self.__datastream__.close()
I have some problems with this code: it sends not the entire image but only some of its bytes. Is there someone who can help me? I want to send all the images I find in a folder. Thank you.
CLIENT
import socket
import sys
import os

s = socket.socket()
s.connect(("localhost", 9999))  # IP address, port
sb = 'c:\\python27\\invia'
os.chdir(sb)  # path
dirs = os.listdir(sb)  # list of files
print dirs
for file in dirs:
    f = open(file, "rb")  # open the image
    l = f.read()
    s.send(file)  # send the name of the file
    st = os.stat(sb + '\\' + file).st_size
    print str(st)
    s.send(str(st))  # send the size of the file
    s.send(l)  # send the data of the file
    f.close()
s.close()
SERVER
import socket
import sys
import os

s = socket.socket()
s.bind(("localhost", 9999))
s.listen(4)  # backlog: number of queued connections
sc, address = s.accept()
print address
sb = 'c:\\python27\\ricevi'
os.chdir(sb)
while True:
    fln = sc.recv(5)  # read the name of the file
    print fln
    f = open(fln, 'wb')  # create the new file
    size = sc.recv(7)  # receive the size of the file
    print size
    strng = sc.recv(int(size))  # receive the data of the file
    f.write(strng)  # write the file
    f.close()
sc.close()
s.close()
To transfer a sequence of files over a single socket, you need some way of delineating each file. In effect, you need to run a small protocol on top of the socket which allows you to know the metadata for each file, such as its size and name, as well as the image data itself.
It appears you're attempting to do this, however both sender and receiver must agree on a protocol.
You have the following in your sender:
s.send(file) #send the name of the file
st = os.stat(sb+'\\'+file).st_size
s.send(str(st)) #send the size of the file
s.send(l)
How is the receiver to know how long the file name is? Or, how will the receiver know where the end of the file name is and where the size of the file starts? You could imagine the receiver obtaining a string like foobar.txt8somedata and having to infer that the name of the file is foobar.txt and that the file is 8 bytes long, containing the data somedata.
What you need to do is separate the fields with some kind of delimiter, such as \n, to mark the boundary of each piece of metadata.
You could envisage a packet structure as <filename>\n<file_size>\n<file_contents>. An example stream of data from the transmitter may then look like this:
foobar.txt\n8\nsomedata
The receiver would then decode the incoming stream, looking for \n in the input to determine each field's value such as the file name and size.
Another approach would be to allocate fixed length strings for the file name and size, followed by the file's data.
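For example, a fixed-length-header variant might pack both lengths with struct (a hypothetical sketch, not the poster's code):

import os
import struct

def send_one_file(sock, path):
    # Frame: 2-byte name length, name, 8-byte big-endian file size, data.
    name = os.path.basename(path).encode()
    with open(path, 'rb') as f:
        data = f.read()
    sock.sendall(struct.pack('>H', len(name)))
    sock.sendall(name)
    sock.sendall(struct.pack('>Q', len(data)))
    sock.sendall(data)

The receiver can then read exactly 2 bytes, then the name, then exactly 8 bytes, then the data, with no ambiguity about where each field ends.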
The parameter to socket.recv only specifies the maximum buffer size for receiving data; it does not mean that exactly that many bytes will be read.
So if you write:
strng = sc.recv(int(size))
you won't necessarily get all the content, especially if size is rather large.
You need to read from the socket in a loop until you have actually read size bytes to make it work.
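A minimal sketch of such a loop (a variant of the recv_exact shown earlier in this page):

def recv_all(sock, size):
    # Keep calling recv until exactly `size` bytes have arrived.
    chunks = []
    while size > 0:
        chunk = sock.recv(size)
        if not chunk:
            raise EOFError("socket closed before the full message arrived")
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)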