Here's what I'm doing now:
mysock = urllib.urlopen('http://localhost/image.jpg')
fileToSave = mysock.read()
oFile = open(r"C:\image.jpg",'wb')
oFile.write(fileToSave)
oFile.close()
f=file('image.jpg','rb')
ftp.storbinary('STOR '+os.path.basename('image.jpg'),f)
os.remove('image.jpg')
Writing files to disk and then immediately deleting them seems like extra work on the system that should be avoided. Can I upload an object in memory to FTP using Python?
Because of duck-typing, the file object (f in your code) only needs to support the .read(blocksize) call to work with storbinary. When faced with questions like this, I go to the source, in this case lib/python2.6/ftplib.py:
def storbinary(self, cmd, fp, blocksize=8192, callback=None):
    """Store a file in binary mode.  A new port is created for you.

    Args:
      cmd: A STOR command.
      fp: A file-like object with a read(num_bytes) method.
      blocksize: The maximum data size to read from fp and send over
                 the connection at once.  [default: 8192]
      callback: An optional single parameter callable that is called
                on each block of data after it is sent.  [default: None]

    Returns:
      The response code.
    """
    self.voidcmd('TYPE I')
    conn = self.transfercmd(cmd)
    while 1:
        buf = fp.read(blocksize)
        if not buf: break
        conn.sendall(buf)
        if callback: callback(buf)
    conn.close()
    return self.voidresp()
As commented, it only wants a file-like object; indeed, it need not even be particularly file-like, it just needs read(n). StringIO provides such "memory file" services.
import urllib
import ftplib
ftp = ftplib.FTP(...)
f = urllib.urlopen('http://localhost/image.jpg')
ftp.storbinary('STOR image.jpg', f)
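For completeness, a minimal end-to-end sketch of that approach (the FTP host and credentials below are placeholders, not values from the question):

import ftplib
import urllib

# Hypothetical FTP connection details, for illustration only.
src = urllib.urlopen('http://localhost/image.jpg')
ftp = ftplib.FTP('ftp.example.com')
ftp.login('user', 'password')
# storbinary repeatedly calls src.read(blocksize), so the HTTP response
# is streamed to the FTP server without ever touching the local disk.
ftp.storbinary('STOR image.jpg', src)
ftp.quit()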
You can use any in-memory file-like object, like BytesIO:
from io import BytesIO
It works both in binary mode with FTP.storbinary:
f = BytesIO(b"the contents")
ftp.storbinary("STOR /path/file.txt", f)
as well as in ASCII/text mode with FTP.storlines:
f = BytesIO(b"the contents")
ftp.storlines("STOR /path/file.txt", f)
For more advanced examples, see:
Python - Upload a in-memory file (generated by API calls) in FTP by chunks
Python - Transfer a file from HTTP(S) URL to FTP/Dropbox without disk writing (chunked upload)
How to send CSV file directly to an FTP server
I have the following Python function to write the given content to a bucket in Cloud Storage:
import gzip
from google.cloud import storage


def upload_to_cloud_storage(json):
    """Write to Cloud Storage."""

    # The contents to upload as a JSON string.
    contents = json

    storage_client = storage.Client()

    # Path and name of the file to upload (file doesn't yet exist).
    destination = "path/to/name.json.gz"

    # Gzip the contents before uploading
    with gzip.open(destination, "wb") as f:
        f.write(contents.encode("utf-8"))

    # Bucket
    my_bucket = storage_client.bucket('my_bucket')

    # Blob (content)
    blob = my_bucket.blob(destination)
    blob.content_encoding = 'gzip'

    # Write to storage
    blob.upload_from_string(contents, content_type='application/json')
However, I receive an error when running the function:
FileNotFoundError: [Errno 2] No such file or directory: 'path/to/name.json.gz'
Highlighting this line as the cause:
with gzip.open(destination, "wb") as f:
I can confirm that the bucket and path both exist although the file itself is new and to be written.
I can also confirm that removing the Gzipping part sees the file successfully written to Cloud Storage.
How can I gzip a new file and upload to Cloud Storage?
Other answers I've used for reference:
https://stackoverflow.com/a/54769937
https://stackoverflow.com/a/67995040
Although @David's answer wasn't complete at the time I solved my problem, it got me on the right track. Here's what I ended up using, along with explanations I found along the way.
import gzip
from google.cloud import storage
from google.cloud.storage import fileio


def upload_to_cloud_storage(json_string):
    """Gzip and write to Cloud Storage."""

    storage_client = storage.Client()
    bucket = storage_client.bucket('my_bucket')

    # Filename (include path)
    blob = bucket.blob('path/to/file.json')

    # Set blob metadata for decompressive transcoding
    blob.content_encoding = 'gzip'
    blob.content_type = 'application/json'

    writer = fileio.BlobWriter(blob)

    # Must write as bytes
    gz = gzip.GzipFile(fileobj=writer, mode="wb")

    # When writing as bytes we must encode our JSON string.
    gz.write(json_string.encode('utf-8'))

    # Close connections
    gz.close()
    writer.close()
We use the GzipFile() class instead of the convenience method (gzip.compress) so that we can pass in the mode. When trying to write using w or wt you will receive the error:
TypeError: memoryview: a bytes-like object is required, not 'str'
So we must write in binary mode (wb). When doing so, however, we need to encode our JSON string; this can be done using str.encode() with utf-8. Failing to do this will also result in the same error.
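A quick way to see this behaviour in isolation (this sketch uses an io.BytesIO buffer in place of the BlobWriter, purely for illustration):

import gzip
import io

buf = io.BytesIO()
gz = gzip.GzipFile(fileobj=buf, mode="wb")
# gz.write('{"key": "value"}')                 # TypeError: a bytes-like object is required
gz.write('{"key": "value"}'.encode("utf-8"))   # works: bytes in, gzipped bytes out
gz.close()
print(len(buf.getvalue()))  # size of the compressed payload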
Finally, I wanted to be able to enable decompressive transcoding where the requester (a browser in my case) will receive the uncompressed version of the file when requested. To enable this, google.cloud.storage.blob allows you to set some metadata including content_type and content_encoding, so we can follow best practices.
This sees the JSON object in memory written to your chosen destination in Cloud Storage in a compressed format and decompressed on the fly (without needing to download a gzip archive).
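As a rough check (hypothetical bucket and path, and assuming decompressive transcoding behaves as described), reading the blob back should yield the original, uncompressed JSON:

from google.cloud import storage

client = storage.Client()
blob = client.bucket('my_bucket').blob('path/to/file.json')
# With content_encoding='gzip' set on the blob, the stored bytes are gzipped,
# but the download should be transparently decompressed for the requester.
print(blob.download_as_text())  # original JSON string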
Thanks also to @JohnHanley for the troubleshooting advice.
The best solution is not to write the gzip to a file at all, and directly compress and stream to GCS.
import gzip

from google.cloud import storage
from google.cloud.storage import fileio

storage_client = storage.Client()
bucket = storage_client.bucket('my_bucket')
blob = bucket.blob('my_object')
writer = fileio.BlobWriter(blob)
gz = gzip.GzipFile(fileobj=writer, mode="w")  # use "wb" if bytes
gz.write(contents)
gz.close()
writer.close()
When I try to upload a compressed gzip file to Cloud Storage using a Python script on a Cloud Shell instance, it always uploads an empty file.
Here's the code to reproduce the errors:
import gzip
from google.cloud import storage

storage_client = storage.Client()

list = ['hello', 'world', 'please', 'upload']

out_file = gzip.open('test.gz', 'wt')
for line in list:
    out_file.write(line + '\n')
out_file.close

out_bucket = storage_client.bucket('test-bucket')
out_blob = out_bucket.blob('test')
out_blob.upload_from_filename('test.gz')
It uploads only an empty file named 'test' to my bucket, which is not what I expect.
However, the file written in my Cloud Shell is not empty, because when I run zcat test.gz it shows the expected content:
hello
world
please
upload
To understand what's happening in your code, here's a description from the gzip docs:
Calling a GzipFile object’s close() method does not close fileobj, since you might wish to append more material after the compressed data.
This explains why the file object not being closed affects the upload of your file. Here's a supporting answer which describes the behavior when the file object is not closed:
The warning about fileobj not being closed only applies when you open the file, and pass it to the GzipFile via the fileobj= parameter. When you pass only a filename, GzipFile "owns" the file handle and will also close it.
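The effect is easy to reproduce with an in-memory buffer (a minimal sketch, not from the original post):

import gzip
import io

buf = io.BytesIO()
gz = gzip.GzipFile(fileobj=buf, mode="wb")
gz.write(b"hello\nworld\n")

print(len(buf.getvalue()))  # typically only the small gzip header; the body is still buffered
gz.close()                  # flushes the compressed data and writes the gzip trailer
print(len(buf.getvalue()))  # now the complete gzip stream is in the buffer
print(buf.closed)           # False: closing the GzipFile does not close fileobj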
The solution is to make sure the gzip file is actually closed before uploading; opening it with a with statement takes care of that:
import gzip
from google.cloud import storage

storage_client = storage.Client()

list = ['hello', 'world', 'please', 'upload']

with gzip.open('test.gz', 'wt') as f_out:
    for line in list:
        f_out.write(line + '\n')

out_bucket = storage_client.bucket('test-bucket')
out_blob = out_bucket.blob('test.gz')  # include file format in dest filename
out_blob.upload_from_filename("test.gz")
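If the goal is also to avoid the temporary file on the Cloud Shell disk, the same idea works with an in-memory buffer (a sketch along the lines of the earlier answers, with the same hypothetical bucket and object names):

import gzip
import io
from google.cloud import storage

storage_client = storage.Client()

lines = ['hello', 'world', 'please', 'upload']

buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    for line in lines:
        gz.write((line + '\n').encode('utf-8'))
buf.seek(0)  # rewind so upload_from_file reads from the start

out_bucket = storage_client.bucket('test-bucket')
out_blob = out_bucket.blob('test.gz')
out_blob.upload_from_file(buf, content_type='application/gzip')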
I need to be able to upload a file through FTP and SFTP in Python but with some not so usual constraints.
File MUST NOT be written in disk.
The file is generated by calling an API and writing the JSON response to the file.
There are multiple calls to the API. It is not possible to retrieve the whole result in one single call of the API.
I cannot store the full result in a string variable by making the multiple calls needed and appending on each call until I have the whole file in memory. The file could be huge and there is a memory resource constraint. Each chunk should be sent and its memory deallocated.
So here is some sample code of what I would like to do:
def chunks_generator():
    range_list = range(0, 4000, 100)
    for i in range_list:
        data_chunk = requests.get(url=someurl, url_parameters={'offset': i, 'limit': 100})
        yield str(data_chunk)

def upload_file():
    chunks_generator = chunks_generator()
    for chunk in chunks_generator:
        data_chunk = chunk
        chunk_io = io.BytesIO(data_chunk)
        ftp = FTP(self.host)
        ftp.login(user=self.username, passwd=self.password)
        ftp.cwd(self.remote_path)
        ftp.storbinary("STOR " + "myfilename.json", chunk_io)
I want only one file with all the chunks appended.
What I already have, and which works, is sending the whole file at once when I have it entirely in memory:
string_io = io.BytesIO(all_chunks_together_in_one_string)
ftp = FTP(self.host)
ftp.login(user=self.username, passwd=self.password)
ftp.cwd(self.remote_path)
ftp.storbinary("STOR " + "myfilename.json", string_io )
Bonus
I need this in ftplib but will need it in Paramiko as well for SFTP. If there are any other libraries with which this would work better, I am open to them.
What if I need to zip the file? Can I compress each chunk and send the zipped chunks one at a time?
You can implement a file-like class whose .read(blocksize) method retrieves data from the requests object.
Something like this (untested):
class ChunksGenerator:
    i = 0
    requests = None

    def __init__(self, requests):
        self.requests = requests

    def read(self, blocksize):
        # TODO: somehow detect end-of-file and return an empty bytes object in that case
        buf = self.requests.get(
            url=someurl, url_parameters={'offset': self.i, 'limit': blocksize})
        self.i += blocksize
        # storbinary expects bytes, so return the response body rather than the response object
        return buf.content

generator = ChunksGenerator(requests)
ftp.storbinary("STOR " + "myfilename.json", generator)
With Paramiko, you can use the same class with SFTPClient.putfo method.
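A rough sketch of the SFTP side (host and credentials are placeholders; ChunksGenerator is the class sketched above):

import paramiko

# Hypothetical connection details, for illustration only.
transport = paramiko.Transport(('sftp.example.com', 22))
transport.connect(username='user', password='secret')
sftp = paramiko.SFTPClient.from_transport(transport)

generator = ChunksGenerator(requests)
# putfo, like storbinary, simply calls .read(blocksize) on the object it is given.
sftp.putfo(generator, '/remote/path/myfilename.json')

sftp.close()
transport.close()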
(I'm working on a Python 3.4 project.)
There's a way to open a (sqlite3) database in memory:
with sqlite3.connect(":memory:") as database:
Does such a trick exist for the open() function? Something like:
with open(":file_in_memory:") as myfile:
The idea is to speed up some test functions opening/reading/writing some short files on disk; is there a way to be sure that these operations occur in memory?
How about StringIO:
import StringIO
output = StringIO.StringIO()
output.write('First line.\n')
print >>output, 'Second line.'
# Retrieve file contents -- this will be
# 'First line.\nSecond line.\n'
contents = output.getvalue()
# Close object and discard memory buffer --
# .getvalue() will now raise an exception.
output.close()
Python 3: io.StringIO
There is something similar for file-like input/output to or from a string in io.StringIO.
There is no clean way to add URL-based processing to the normal file open, but, Python being dynamic, you could monkey-patch the standard open procedure to handle this case.
For example:
from io import StringIO

old_open = open
in_memory_files = {}

def open(name, mode="r", *args, **kwargs):
    if name[:1] == ":" and name[-1:] == ":":
        # in-memory file
        if "w" in mode:
            in_memory_files[name] = ""
        f = StringIO(in_memory_files[name])
        oldclose = f.close
        def newclose():
            in_memory_files[name] = f.getvalue()
            oldclose()
        f.close = newclose
        return f
    else:
        return old_open(name, mode, *args, **kwargs)
after that you can write
f = open(":test:", "w")
f.write("This is a test\n")
f.close()
f = open(":test:")
print(f.read())
Note that this example is very minimal and doesn't handle all real file modes (e.g. append mode, or raising the proper exception when opening an in-memory file that doesn't exist in read mode), but it may work for simple cases.
Note also that all in-memory files will remain in memory forever (unless you also patch unlink).
PS: I'm not saying that monkey-patching standard open or StringIO instances is a good idea, just that you can :-D
PS2: This kind of problem is better solved at the OS level by creating a RAM disk. With that you can even call external programs, redirecting their output or input to those files, and you also get full support including concurrent access, directory listings and so on.
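For example, on a Linux box where /dev/shm is a tmpfs mount, you can simply point temporary files there (a sketch, assuming such a mount exists on your system):

import tempfile

# Files created under /dev/shm live in RAM, not on disk (assuming tmpfs).
with tempfile.NamedTemporaryFile(mode="w+", dir="/dev/shm", suffix=".txt") as f:
    f.write("This is a test\n")
    f.seek(0)
    print(f.read())
    # External programs can also open f.name while the file exists.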
io.StringIO provides a memory file implementation you can use to simulate a real file. Example from the documentation:
import io
output = io.StringIO()
output.write('First line.\n')
print('Second line.', file=output)
# Retrieve file contents -- this will be
# 'First line.\nSecond line.\n'
contents = output.getvalue()
# Close object and discard memory buffer --
# .getvalue() will now raise an exception.
output.close()
In Python 2, this class is available instead as StringIO.StringIO.
Right now, I have the following code:
pilimg = PILImage.open(img_file_tmp) # img_file_tmp just contains the image to read
pilimg.thumbnail((200,200), PILImage.ANTIALIAS)
pilimg.save(fn, 'PNG') # fn is a filename
This works just fine for saving to a local file pointed to by fn. However, what I want instead is to save the file on a remote FTP server.
What is the easiest way to achieve this?
Python's ftplib library can initiate an FTP transfer, but PIL cannot write directly to an FTP server.
What you can do is write the result to a file and then upload it to the FTP server using the FTP library. There are complete examples of how to connect in the ftplib manual so I'll focus just on the sending part:
# (assumes you already created an instance of FTP
# as "ftp", and already logged in)
f = open(fn, 'rb')  # open in binary mode so storbinary gets raw bytes
ftp.storbinary("STOR remote_filename.png", f)
If you have enough memory for the compressed image data, you can avoid the intermediate file by having PIL write to a StringIO, and then passing that object into the FTP library:
import StringIO
f = StringIO.StringIO()
pilimg.save(f, 'PNG')
f.seek(0) # return the StringIO's file pointer to the beginning of the file
# again this assumes you already connected and logged in
ftp.storbinary("STOR remote_filename.png", f)