Python FTP and Streams

I need to create a "turntable" platform: my server must be able to take a file from FTP A and send it to FTP B. I have built a lot of file transfer systems, so I have no problem with ftplib, Aspera, S3 and other transfer protocols.
The thing is that I have big files (150 GB) on FTP A, and many transfers will occur at the same time, from and to many FTP servers and other endpoints.
I don't want my platform to actually store these files in order to send them to another location, and I don't want to load everything into memory either. I need to "stream" the binary data from A to B, with minimal load on my transfer platform.
I am looking at https://docs.python.org/2/library/io.html with ReadBuffer and WriteBuffer, but I can't find examples and the documentation is sorta cryptic to me...
Does anyone have a starting point?
import io

buff = io.open('/var/tmp/test', 'wb')

def loadbuff(data):
    buff.write(data)

self.ftp.retrbinary('RETR ' + name, loadbuff, blocksize=8)
So my data is coming in buff, which is a <_io.BufferedWriter name='/var/tmp/test'> object, but how can I start reading from it while ftplib keeps downloading?
Hope I'm clear enough; any idea is welcome.
Thanks
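
One possible starting point, as a minimal sketch with plain (non-TLS) ftplib: transfercmd() hands you the raw data socket of a transfer, so you can read chunks from the RETR socket on A and write them straight into the STOR socket on B, never touching disk or holding more than one chunk in memory. The relay function, names and chunk size below are illustrative:

import ftplib

CHUNK = 64 * 1024  # bytes per read; tune to your network

def relay(src_ftp, dst_ftp, src_name, dst_name):
    # Open both data channels directly; nothing is stored locally.
    src = src_ftp.transfercmd('RETR ' + src_name)
    dst = dst_ftp.transfercmd('STOR ' + dst_name)
    try:
        while True:
            chunk = src.recv(CHUNK)
            if not chunk:  # end of file on A
                break
            dst.sendall(chunk)
    finally:
        src.close()
        dst.close()
        # Read the final "226 Transfer complete" on both control channels.
        src_ftp.voidresp()
        dst_ftp.voidresp()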

Related

ssl socket - Get the real number of bytes of the original message (without the encryption)

I have server/client code that transfers files in Python using sockets.
I was thinking about using SSL, but there is one simple problem.
I am using the byte counts returned by socket.recv() in order to know how many bytes have been transferred so far.
If I add SSL, each message grows (because of the encryption) and I won't be able to tell how much of the original file has been transferred.
Is there a way to get the original message size after the encryption, or do you have another method of knowing when the file has been completely transferred?
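
For what it's worth, Python's ssl module decrypts transparently, so recv() on a wrapped socket already returns plaintext bytes. Independent of that, a common way to know when a file is complete is to prefix it with its length; a minimal sketch (the helper names are made up):

import struct

def send_file(sock, data):
    # 8-byte big-endian length prefix, then the payload itself.
    sock.sendall(struct.pack('>Q', len(data)))
    sock.sendall(data)

def recv_exact(sock, n):
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('peer closed early')
        buf += chunk
    return buf

def recv_file(sock):
    (size,) = struct.unpack('>Q', recv_exact(sock, 8))
    return recv_exact(sock, size)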

Get large files from FTP with python lib

I need to download some large files (>30 GB per file) from an FTP server. I'm using ftplib from the Python standard library, but there are some pitfalls: if I download a large file, I cannot use the connection anymore once the file finishes. I get an EOFError afterwards, so the connection is closed (due to a timeout?), and for each subsequent file I get error 421.
From what I read, there are two channels: the data channel and the control channel. The data channel seems to work correctly (I can download the file completely), but the control channel times out in the meantime.
I have also read that ftplib (and other Python FTP libraries) is not suited to large files and may only support files up to around 1 GB.
There is a similar question on this topic here: How to download big file in python via ftp (with monitoring & reconnect)? It is not quite the same, because my files are huge in comparison.
My current code looks like this:
import ftplib
import tempfile

ftp = ftplib.FTP_TLS()
ftp.connect(host=server, port=port)
ftp.login(user=user, passwd=password)
ftp.prot_p()
ftp.cwd(folder)

for file in ftp.nlst():
    fd, local_filename = tempfile.mkstemp()
    f = open(fd, "wb")
    ftp.retrbinary('RETR %s' % file, callback=f.write, blocksize=8192)
    f.close()
Is there any tweak to it, or another library that I can use which does support huge files?
If you experience issues with standard FTP, you can try using a different protocol that is specifically designed to handle such large files.
A number of suitable solutions exist; rsync would probably be a good way to start, since it can resume interrupted transfers.
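
As a minimal sketch, you could shell out to rsync from Python (host and paths are placeholders; --partial keeps partially transferred files so an interrupted download can resume):

import subprocess

subprocess.run(
    ['rsync', '--partial', '--progress',
     'user@server:/remote/folder/bigfile.bin', '/local/dir/'],
    check=True)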

Python network communication with encryption and password protection

I want to create a Python program that can communicate with another Python program running on another machine. They should communicate via the network. For me it's super simple using BaseHTTPServer: I just direct my message to http://server2:port/my/message and server2 can take whatever action is needed based on the message "/my/message". It is also very time-efficient, since I don't have to check a file every X seconds or something similar. (My other idea was to put text files via ssh onto the remote server and then read that file...)
The downside is that this is neither password protected nor encrypted. I would like to have both, but still keep transferring messages that simple.
The machines that are communicating know each other, and I can put key files on all of them.
I also stumbled upon Twisted, but it looks rather complicated. gevent also looks way too complicated with gevent.ssl.SSLsocket, because I would have to check byte lengths of messages and such.
Is there a simple example of how to set something like this up?
You should consider using HTTPS, as it does exactly the job you want.
The good part is that you barely need to change your code, since the connection between the two parties is encrypted transparently. The downside is that you have to set up the server with an HTTPS certificate (there are lots of resources on the Internet), and depending on your implementation the client may sometimes need to accept this certificate explicitly in order to make a successful connection.
You can of course combine this with password-protected files.
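
A minimal sketch of such an HTTPS message server, using the Python 3 module names and assuming you have already generated cert.pem and key.pem (e.g. self-signed with openssl):

import http.server
import ssl

class MessageHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        print('received message:', self.path)  # e.g. '/my/message'
        self.send_response(200)
        self.end_headers()

httpd = http.server.HTTPServer(('0.0.0.0', 8443), MessageHandler)
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain(certfile='cert.pem', keyfile='key.pem')
httpd.socket = ctx.wrap_socket(httpd.socket, server_side=True)
httpd.serve_forever()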
If you have no problem rolling out a key file to all nodes, simply run your messages through AES and move the ciphertext the same way you moved the unencrypted messages.
On the other side, decrypt and handle the plaintext like the messages you handled before.
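
A minimal sketch of that idea, assuming the third-party cryptography package (its Fernet recipe uses AES under the hood and handles padding and authentication for you):

from cryptography.fernet import Fernet

key = Fernet.generate_key()  # generate once, then roll this key file out to all nodes
f = Fernet(key)

token = f.encrypt(b'/my/message')  # send this over the wire instead of the plaintext
plain = f.decrypt(token)           # on the receiving node: b'/my/message'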

Can't seek in streamed file

I have developed a server that serves audio files over HTTP. Using the Content-Length header I was able to show the current position, but this source isn't seekable. How can I make it seekable?
Some people recommended sending Accept-Ranges: bytes, but when I tried that the audio doesn't even play anymore.
Think about what you're asking: a stream of bytes is arriving over the network, and you want random access over it? You can't. What you can do is implement buffering yourself, which you could do reasonably transparently with the io module. If you want to seek forward, discard the intermediate blocks; if you want to seek backward, you'll have to hold the stream in memory until you don't need it anymore.
If you don't want to buffer the entire stream client-side, you need a way to tell the server to seek to a different position and restart streaming from there, which is what HTTP Range requests are for.
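
If the server does honor range requests (it must answer 206 Partial Content), seeking from the client side is just a new request with a Range header; a minimal sketch with a placeholder URL:

import urllib.request

# Restart the stream at byte offset 1000000.
req = urllib.request.Request('http://example.com/audio.mp3',
                             headers={'Range': 'bytes=1000000-'})
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # 206 means the server honored the range
    chunk = resp.read(8192)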

python ftp retrieve lines -- performance issues

I am trying to retrieve lines from a file over an FTP connection using the ftplib module of Python. It takes about 10 minutes to read a file of size 1 GB. I was wondering if there are other ways to read the lines faster.
I should have included some code to show what I am doing (roughly):
import ftplib

ftp = ftplib.FTP('xxx')   # host
ftp.login('xxx', 'xxx')   # user, password, and so on
ftp.retrlines('RETR ' + fileName, process)
Retrieving remote resources is usually bound by your bandwidth, and the FTP protocol does a decent job of using all of it.
Are you sure you aren't saturating your network connection? (What is the network link between the client running ftplib and the server you are downloading from?)
Back-of-the-envelope calculation:
1 GB / 10 min ≈ 1.7 MB/s ≈ 13 Mbps
So you are downloading at about 13 megabits per second. That's a decent speed for a remote DSL/cable/WAN connection, but obviously pretty low if this is all on a local network.
Can you show a minimal code sample of what you are doing? FTP is for transporting files, so retrieving lines from a remote file isn't necessarily as efficient as transferring the whole file once and reading it locally.
Aside from that, have you verified that this connection can actually go faster?
EDIT: if you try the following and it is not a bit faster, then you are limited by your OS or your connection:

ftp.retrbinary('RETR ' + fileName, open(temp_file_name, 'wb').write)

The assumption here is that FTP text mode might be somewhat less efficient (on the server side), which might be false or of minuscule relevance.
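
If the binary transfer is indeed faster, a larger block size can also cut per-callback overhead; a minimal sketch (the 1 MiB figure is an assumption to tune; the default is 8192 bytes):

import ftplib

ftp = ftplib.FTP('xxx')      # hypothetical host, as above
ftp.login('xxx', 'xxx')      # hypothetical credentials
fileName = 'big.txt'         # placeholder remote file name
with open('/tmp/big.txt', 'wb') as f:
    ftp.retrbinary('RETR ' + fileName, f.write, blocksize=1024 * 1024)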
