I'm trying to create a function that downloads a file from an FTP server into memory and returns it. In this case I am trying to download a zip file and unzip it without writing the file to disk, but I am getting the following error:
ValueError: I/O operation on closed file.
Here is my current code:
from io import BytesIO
from ftplib import FTP_TLS
def download_from_ftp(fp):
    """
    Retrieves a file from an FTP server
    """
    ftp_host = 'some ftp url'
    ftp_user = 'ftp username'
    ftp_pass = 'ftp password'
    with FTP_TLS(ftp_host) as ftp:
        ftp.login(user=ftp_user, passwd=ftp_pass)
        ftp.prot_p()
        with BytesIO() as download_file:
            ftp.retrbinary('RETR ' + fp, download_file.write)
            download_file.seek(0)
            return download_file
And here is the code that tries to unzip the file:
import zipfile
from ftp import download_from_ftp
ftp_file = download_from_ftp('ftp zip file path')
with zipfile.ZipFile(ftp_file, 'r') as zip_ref:
    # do some stuff with files in the zip
Because BytesIO is instantiated as a context manager, the buffer is closed when the with block exits, so download_file no longer has an open handle by the time it is returned to the caller.
Instead, simply assign the instantiated BytesIO object to a variable and return that. Change:
with BytesIO() as download_file:
to:
download_file = BytesIO()
and dedent the block.
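Applied to the function in the question, the corrected version looks like this (same logic, just without the context manager around the buffer):
from io import BytesIO
from ftplib import FTP_TLS

def download_from_ftp(fp):
    """
    Retrieves a file from an FTP server into an in-memory buffer
    """
    ftp_host = 'some ftp url'
    ftp_user = 'ftp username'
    ftp_pass = 'ftp password'
    with FTP_TLS(ftp_host) as ftp:
        ftp.login(user=ftp_user, passwd=ftp_pass)
        ftp.prot_p()
        download_file = BytesIO()  # no with block, so the buffer stays open
        ftp.retrbinary('RETR ' + fp, download_file.write)
        download_file.seek(0)  # rewind so the caller reads from the start
        return download_file
The caller can then pass the returned buffer straight to zipfile.ZipFile; since BytesIO only holds memory, there is no OS file handle to leak, and the buffer is freed when garbage-collected.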
I'm creating an API using Flask where a zip file should be downloaded on the client side. The zip file is converted into binary data and sent to the client, and the client regenerates the binary data back into a zip file. The server side is working fine and a zip file is downloaded, but the file is empty inside. How can I fix this?
This is the server side:
@app.route('/downloads/', methods=['GET'])
def download():
    from flask import Response
    import io
    import zipfile
    import time

    FILEPATH = "/home/Ubuntu/api/files.zip"
    fileobj = io.BytesIO()
    with zipfile.ZipFile(fileobj, 'w') as zip_file:
        zip_info = zipfile.ZipInfo(FILEPATH)
        zip_info.date_time = time.localtime(time.time())[:6]
        zip_info.compress_type = zipfile.ZIP_DEFLATED
        with open(FILEPATH, 'rb') as fd:
            zip_file.writestr(zip_info, fd.read())
    fileobj.seek(0)
    return Response(fileobj.getvalue(),
                    mimetype='application/zip',
                    headers={'Content-Disposition': 'attachment;filename=files.zip'})
And this is the client side:
bin_data = b"response.content"  # whatever binary data you have stored in a variable
binary_file_path = 'files.zip'  # name for the new zip file you want to regenerate
with open(binary_file_path, 'wb') as f:
    f.write(bin_data)
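For what it's worth, the symptom described (a zip that downloads but is empty inside) is consistent with the client writing the literal byte string b"response.content" instead of the actual response body. A minimal client sketch, assuming the requests library and a locally running instance of the Flask app above (both assumptions, not stated in the question):
import requests

# hypothetical URL for the /downloads/ endpoint shown above
resp = requests.get('http://localhost:5000/downloads/')
resp.raise_for_status()

with open('files.zip', 'wb') as f:
    f.write(resp.content)  # the real response bytes, not the placeholder b"response.content"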
When I try to upload a compressed gzip file to Cloud Storage using a Python script on a Cloud Shell instance, it always uploads an empty file.
Here's the code to reproduce the error:
import gzip
from google.cloud import storage
storage_client = storage.Client()
list = ['hello', 'world', 'please', 'upload']
out_file = gzip.open('test.gz', 'wt')
for line in list:
    out_file.write(line + '\n')
out_file.close
out_bucket = storage_client.bucket('test-bucket')
out_blob = out_bucket.blob('test')
out_blob.upload_from_filename('test.gz')
It uploads only an empty file named 'test' to my bucket, which is not what I expect.
However, the file written in my Cloud Shell is not empty, because when I run zcat test.gz it shows the expected content:
hello
world
please
upload
To understand what's happening in your code, note that out_file.close is missing its parentheses: it only references the close method and never calls it, so the GzipFile is never flushed and closed before the upload starts. The gzip docs add a related caveat:
Calling a GzipFile object’s close() method does not close fileobj, since you might wish to append more material after the compressed data.
Here's a supporting answer which clarifies that this caveat only applies when you hand GzipFile an open file object yourself:
The warning about fileobj not being closed only applies when you open the file, and pass it to the GzipFile via the fileobj= parameter. When you pass only a filename, GzipFile "owns" the file handle and will also close it.
Since you pass only a filename, GzipFile owns the handle; the remaining problem is simply that close() is never actually called. The fix is to let a with block close (and flush) the file before uploading:
import gzip
from google.cloud import storage

storage_client = storage.Client()

lines = ['hello', 'world', 'please', 'upload']
with gzip.open('test.gz', 'wt') as f_out:  # the with block closes (and flushes) the file
    for line in lines:
        f_out.write(line + '\n')

out_bucket = storage_client.bucket('test-bucket')
out_blob = out_bucket.blob('test.gz')  # include the file extension in the destination name
out_blob.upload_from_filename('test.gz')
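If you would rather avoid the on-disk file entirely (in the spirit of the in-memory answers elsewhere on this page), blobs also have an upload_from_file method, so you can gzip into a BytesIO and upload that directly. A minimal sketch, reusing the same hypothetical bucket and object names:
import gzip
from io import BytesIO
from google.cloud import storage

storage_client = storage.Client()

buf = BytesIO()
# GzipFile writes its trailer when the with block exits; the BytesIO itself stays open
with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
    for line in ['hello', 'world', 'please', 'upload']:
        gz.write((line + '\n').encode())
buf.seek(0)  # rewind before handing the buffer to the client library

bucket = storage_client.bucket('test-bucket')
bucket.blob('test.gz').upload_from_file(buf, content_type='application/gzip')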
Using ftplib in Python, you can download files, but it seems you are restricted to using the file name only (not the full file path). The following code successfully downloads the requested file:
import ftplib
ftp=ftplib.FTP("ladsweb.nascom.nasa.gov")
ftp.login()
ftp.cwd("/allData/5/MOD11A1/2002/001")
ftp.retrbinary('RETR MOD11A1.A2002001.h00v08.005.2007079015634.hdf',open("MOD11A1.A2002001.h00v08.005.2007079015634.hdf",'wb').write)
As you can see, first a login to the site (ftp.login()) is established and then the current directory is set (ftp.cwd()). After that you need to declare the file name to download the file that resides in the current directory.
How about downloading the file directly by using its full path/link?
import ftplib
ftp = ftplib.FTP("ladsweb.nascom.nasa.gov")
ftp.login()
a = 'allData/5/MOD11A1/2002/001/MOD11A1.A2002001.h00v08.005.2007079015634.hdf'
fhandle = open('ftp-test', 'wb')
ftp.retrbinary('RETR ' + a, fhandle.write)
fhandle.close()
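If you want to keep the transfer in memory instead of writing ftp-test to disk (as in the first question on this page), the same full-path RETR command works with a BytesIO as the sink; a small sketch:
import ftplib
from io import BytesIO

ftp = ftplib.FTP("ladsweb.nascom.nasa.gov")
ftp.login()

a = 'allData/5/MOD11A1/2002/001/MOD11A1.A2002001.h00v08.005.2007079015634.hdf'
buf = BytesIO()
ftp.retrbinary('RETR ' + a, buf.write)  # the full path works inside the RETR command
buf.seek(0)  # rewind before reading the downloaded bytes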
This solution uses the urlopen function from the urllib.request module (plain urllib in Python 2). urlopen will let you download ftp and http URLs. I like using it because you can connect and get all the data in one line. The last three lines extract the filename from the URL and then save the data to that filename.
from urllib.request import urlopen  # Python 2: from urllib import urlopen

url = 'ftp://ladsweb.nascom.nasa.gov/allData/5/MOD11A1/2002/001/MOD11A1.A2002001.h00v08.005.2007079015634.hdf'
data = urlopen(url).read()

filename = url.split('/')[-1]
with open(filename, 'wb') as f:
    f.write(data)
Right now, I have the following code:
pilimg = PILImage.open(img_file_tmp) # img_file_tmp just contains the image to read
pilimg.thumbnail((200,200), PILImage.ANTIALIAS)
pilimg.save(fn, 'PNG') # fn is a filename
This works just fine for saving to a local file pointed to by fn. However, what I want instead is to save the file to a remote FTP server.
What is the easiest way to achieve this?
Python's ftplib library can initiate an FTP transfer, but PIL cannot write directly to an FTP server.
What you can do is write the result to a file and then upload it to the FTP server using the FTP library. There are complete examples of how to connect in the ftplib manual so I'll focus just on the sending part:
# (assumes you already created an instance of FTP
# as "ftp", and already logged in)
f = open(fn, 'rb')  # note 'rb': storbinary needs the file opened in binary mode
ftp.storbinary("STOR remote_filename.png", f)
f.close()
If you have enough memory for the compressed image data, you can avoid the intermediate file by having PIL write to an in-memory BytesIO buffer, and then passing that object into the FTP library:
from io import BytesIO

f = BytesIO()
pilimg.save(f, 'PNG')
f.seek(0)  # rewind the buffer to the beginning before uploading

# again this assumes you already connected and logged in
ftp.storbinary("STOR remote_filename.png", f)
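Putting the pieces together, a self-contained sketch of the whole flow (the input file, host, credentials, and remote name are all placeholders):
from io import BytesIO
from ftplib import FTP
from PIL import Image

# create the thumbnail entirely in memory
pilimg = Image.open('input.jpg')        # placeholder input file
pilimg.thumbnail((200, 200))
buf = BytesIO()
pilimg.save(buf, 'PNG')
buf.seek(0)  # rewind before uploading

ftp = FTP('ftp.example.com')            # placeholder host
ftp.login('username', 'password')       # placeholder credentials
ftp.storbinary('STOR remote_filename.png', buf)
ftp.quit()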
Here's what I'm doing now:
mysock = urllib.urlopen('http://localhost/image.jpg')
fileToSave = mysock.read()
oFile = open(r"C:\image.jpg", 'wb')
oFile.write(fileToSave)
oFile.close()

f = open('image.jpg', 'rb')
ftp.storbinary('STOR ' + os.path.basename('image.jpg'), f)
os.remove('image.jpg')
Writing files to disk and then immediately deleting them seems like extra work on the system that should be avoided. Can I upload an object in memory to FTP using Python?
Because of duck-typing, the file object (f in your code) only needs to support the .read(blocksize) call to work with storbinary. When faced with questions like this, I go to the source, in this case lib/python2.6/ftplib.py:
def storbinary(self, cmd, fp, blocksize=8192, callback=None):
    """Store a file in binary mode.  A new port is created for you.

    Args:
      cmd: A STOR command.
      fp: A file-like object with a read(num_bytes) method.
      blocksize: The maximum data size to read from fp and send over
                 the connection at once.  [default: 8192]
      callback: An optional single parameter callable that is called
                on each block of data after it is sent.  [default: None]

    Returns:
      The response code.
    """
    self.voidcmd('TYPE I')
    conn = self.transfercmd(cmd)
    while 1:
        buf = fp.read(blocksize)
        if not buf: break
        conn.sendall(buf)
        if callback: callback(buf)
    conn.close()
    return self.voidresp()
As the docstring says, it only wants a file-like object; indeed, it need not even be particularly file-like, it just needs a read(n) method. StringIO (io.BytesIO in Python 3) provides such "memory file" services.
import urllib
import ftplib

ftp = ftplib.FTP(...)
f = urllib.urlopen('http://localhost/image.jpg')  # Python 2; on Python 3 use urllib.request.urlopen
ftp.storbinary('STOR image.jpg', f)
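To make the duck-typing point concrete, here is a minimal sketch of an object that is nothing like a file but still works with storbinary, because it implements the one method storbinary ever calls (the class and data are made up for illustration):
class RepeatingSource:
    """Not a file at all: just serves `total` bytes of a repeated pattern."""
    def __init__(self, pattern=b'spam', total=1024):
        self.data = (pattern * (total // len(pattern) + 1))[:total]
        self.pos = 0

    def read(self, n=-1):
        # the only method storbinary needs
        chunk = self.data[self.pos:self.pos + n] if n >= 0 else self.data[self.pos:]
        self.pos += len(chunk)
        return chunk

# assuming `ftp` is a connected, logged-in ftplib.FTP instance:
# ftp.storbinary('STOR pattern.bin', RepeatingSource())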
You can use any in-memory file-like object, like BytesIO:
from io import BytesIO
It works in binary mode with FTP.storbinary:
f = BytesIO(b"the contents")
ftp.storbinary("STOR /path/file.txt", f)
as well as in ASCII/text mode with FTP.storlines:
f = BytesIO(b"the contents")
ftp.storlines("STOR /path/file.txt", f)
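A usage note on the difference: storlines switches the connection to ASCII mode (TYPE A) and reads the buffer with readline(), normalizing line endings to CRLF on the wire, so for content that must arrive byte-for-byte intact, storbinary is the safer choice. To check an upload, you can read the file back into memory the same way it was sent; a small sketch, assuming ftp is still connected:
from io import BytesIO

# download the file we just uploaded back into an in-memory buffer
check = BytesIO()
ftp.retrbinary('RETR /path/file.txt', check.write)
assert check.getvalue() == b"the contents"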
For more advanced examples, see:
Python - Upload a in-memory file (generated by API calls) in FTP by chunks
Python - Transfer a file from HTTP(S) URL to FTP/Dropbox without disk writing (chunked upload)
How to send CSV file directly to an FTP server