How to dynamically select storage option for models.FileField?

Depending on the file extension, I want the file to be stored in a specific AWS bucket. I tried passing a function to the storage option, similar to how upload_to is dynamically defined.
However, this doesn't give the desired results. In my template, when I use document.docfile.url as an href, the link doesn't work.
Checking in the shell, this is what I get:
>>> Document.objects.all()[0].docfile.storage.bucket
<Bucket: <function aws_bucket at 0x110672050>>
>>> Document.objects.all()[0].docfile.storage.bucket_name
<function myproject.myapp.models.aws_bucket>
The desired behaviour would be:
>>> Document.objects.all()[0].docfile.storage.bucket_name
'asynch-uploader-txt'
>>> Document.objects.all()[0].docfile.storage.bucket
<Bucket: asynch-uploader-txt>
This is my models.py file:
# -*- coding: utf-8 -*-
from django.db import models
from storages.backends.s3boto import S3BotoStorage
def upload_file_to(instance, filename):
    import os
    from django.utils.timezone import now
    filename_base, filename_ext = os.path.splitext(filename)
    return 'files/%s_%s%s' % (
        filename_base,
        now().strftime("%Y%m%d%H%M%S"),
        filename_ext.lower(),
    )
def aws_bucket(instance, filename):
    import os
    filename_base, filename_ext = os.path.splitext(filename)
    return 'asynch-uploader-%s' % (filename_ext[1:])
class Document(models.Model):
    docfile = models.FileField(upload_to=upload_file_to, storage=S3BotoStorage(bucket=aws_bucket))
Why is aws_bucket getting passed as a function and not a string, the way that upload_file_to is? How can I correct it?

For what you're trying to do, you may be better off writing a custom storage backend and overriding the relevant parts of S3BotoStorage.
In particular, if you make bucket_name a property, you should be able to get the behavior you want.
EDIT:
To expand a bit on that: S3BotoStorage.__init__ takes the bucket as an optional argument and stores whatever it receives as the bucket name. Unlike upload_to, it never calls the value, which is why your function shows up as-is in the shell. Additionally, bucket is defined on the class as a @property, which makes it easy to override. The following code is untested, but should be enough to give you a starting point:
class MyS3BotoStorage(S3BotoStorage):
    @property
    def bucket(self):
        # _filename only exists once _save() has run, so fall back to '' before that
        filename = getattr(self, '_filename', '')
        if filename.endswith('.jpg'):
            return self._get_or_create_bucket('SomeBucketName')
        else:
            return self._get_or_create_bucket('SomeSaneDefaultBucket')
    def _save(self, name, content):
        # This part might need some work to normalize the name and all...
        self._filename = name
        return super(MyS3BotoStorage, self)._save(name, content)
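The choice of bucket in the property above can be moved into a small helper that maps a file name to a bucket name. This is only a sketch: bucket_name_for and DEFAULT_BUCKET are hypothetical names, and the "asynch-uploader-" prefix is taken from the question.

```python
import os

# The prefix comes from the question; the default bucket name is an assumption.
BUCKET_PREFIX = "asynch-uploader-"
DEFAULT_BUCKET = "asynch-uploader-misc"


def bucket_name_for(filename):
    """Return the bucket name for a file, based on its lowercased extension."""
    _, ext = os.path.splitext(filename)
    ext = ext[1:].lower()  # drop the leading dot
    if not ext:
        # No extension: fall back to a catch-all bucket.
        return DEFAULT_BUCKET
    return BUCKET_PREFIX + ext
```

Inside the overridden bucket property, the result could then be passed to self._get_or_create_bucket(...) instead of the hard-coded names.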

Related

How to know and instantiate only one class implemented in a Python module dynamically

Suppose in "./data_writers/excel_data_writer.py", I have:
from generic_data_writer import GenericDataWriter
class ExcelDataWriter(GenericDataWriter):
    def __init__(self, config):
        super().__init__(config)
        self.sheet_name = config.get('sheetname')
    def write_data(self, pandas_dataframe):
        pandas_dataframe.to_excel(
            self.get_output_file_path_and_name(),  # implemented in GenericDataWriter
            sheet_name=self.sheet_name,
            index=self.index)
In "./data_writers/csv_data_writer.py", I have:
from generic_data_writer import GenericDataWriter
class CSVDataWriter(GenericDataWriter):
    def __init__(self, config):
        super().__init__(config)
        self.delimiter = config.get('delimiter')
        self.encoding = config.get('encoding')
    def write_data(self, pandas_dataframe):
        pandas_dataframe.to_csv(
            self.get_output_file_path_and_name(),  # implemented in GenericDataWriter
            sep=self.delimiter,
            encoding=self.encoding,
            index=self.index)
In "./data_writers/generic_data_writer.py", I have:
import os
class GenericDataWriter:
    def __init__(self, config):
        self.output_folder = config.get('output_folder')
        self.output_file_name = config.get('output_file')
        self.output_file_path_and_name = os.path.join(self.output_folder, self.output_file_name)
        self.index = config.get('include_index')  # whether to include index column from Pandas' dataframe in the output file
Suppose I have a JSON config file that has a key-value pair like this:
{
    "__comment__": "Here, user can provide the path and python file name of the custom data writer module she wants to use.",
    "custom_data_writer_module": "./data_writers/excel_data_writer.py",
    "there_are_more_key_value_pairs_in_this_JSON_config_file": "for other input parameters"
}
In "main.py", I want to import the data writer module based on the custom_data_writer_module provided in the JSON config file above. So I wrote this:
import os
import importlib
def main():
    # Do other things to read and process data
    data_writer_class_file = config.get('custom_data_writer_module')
    data_writer_module = importlib.import_module(
        os.path.splitext(os.path.split(data_writer_class_file)[1])[0])
    dw = data_writer_module.what_should_this_be?  # <=== Here, what should I do to instantiate the right specific data writer (Excel or CSV) class instance?
    for df in dataframes_to_write_to_output_file:
        dw.write_data(df)
if __name__ == "__main__":
    main()
As I asked in the code above, I want to know if there's a way to retrieve and instantiate the class defined in a Python module assuming that there is ONLY ONE class defined in the module. Or if there is a better way to refactor my code (using some sort of pattern) without changing the structure of JSON config file described above, I'd like to learn from Python experts on StackOverflow. Thank you in advance for your suggestions!

You can do this easily with vars:
cls1, = [v for k, v in vars(data_writer_module).items()
         if isinstance(v, type)]
dw = cls1(config)
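The comma enforces that exactly one class is found: the unpacking raises ValueError if the list has zero or several elements. If the module is allowed to do anything like from collections import deque (or even foo=str), you need to filter based on v.__module__. That applies to your modules as written, because from generic_data_writer import GenericDataWriter puts a second class into each module's namespace.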
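A minimal sketch of the v.__module__ filter mentioned above. The helper name find_single_class is hypothetical, and the module is built in memory here only so that the example runs without files on disk; in the question it would be the object returned by importlib.import_module.

```python
import types


def find_single_class(module):
    """Return the only class defined in `module` itself (not merely imported into it)."""
    own_classes = [
        obj for obj in vars(module).values()
        if isinstance(obj, type) and obj.__module__ == module.__name__
    ]
    # Tuple unpacking raises ValueError unless exactly one class was found.
    (cls,) = own_classes
    return cls


# Stand-in for an imported data writer module.
demo_source = '''
from collections import deque   # an imported class that must be ignored

class CSVDataWriter:
    def __init__(self, config):
        self.config = config
'''
demo_module = types.ModuleType("csv_data_writer")
exec(demo_source, demo_module.__dict__)

writer_class = find_single_class(demo_module)
dw = writer_class({"delimiter": ","})
```

Comparing obj.__module__ with module.__name__ keeps only classes whose definition lives in that module, so base classes and helpers imported at the top are skipped.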

Patching a function in a file where it is defined

I am trying to learn unittest patching. I have a single file that both defines a function, then later uses that function. When I try to patch this function, its return value is giving me the real return value, not the patched return value.
How do I patch a function that is both defined and used in the same file? Note: I did try to follow the advice given here, but it didn't seem to solve my problem.
walk_dir.py
from os.path import dirname, join
from os import walk
from json import load
def get_config():
    current_path = dirname(__file__)
    with open(join(current_path, 'config', 'json', 'folder.json')) as json_file:
        json_data = load(json_file)
        return json_data['parent_dir']
def get_all_folders():
    dir_to_walk = get_config()
    for root, dir, _ in walk(dir_to_walk):
        return [join(root, name) for name in dir]
test_walk_dir.py
from hello_world.walk_dir import get_all_folders
from unittest.mock import patch
@patch('walk_dir.get_config')
def test_get_all_folders(mock_get_config):
    mock_get_config.return_value = 'C:\\temp\\test\\'
    result = get_all_folders()
    assert set(result) == set('C:\\temp\\test\\test_walk_dir')

Try declaring the patch this way:
@patch('hello_world.walk_dir.get_config')
As the answer to the question you linked explains, your patch target should match your import statements. In your case, from hello_world.walk_dir import get_all_folders and @patch('walk_dir.get_config') don't match.
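The following sketch shows the rule in a runnable form. The module name walk_dir_demo and its contents are hypothetical stand-ins for hello_world/walk_dir.py, built in memory so that no files are needed. The patch target is the same dotted module path that the import uses.

```python
import sys
import types
from unittest.mock import patch

# In-memory stand-in for the module that both defines and uses get_config.
source = '''
def get_config():
    return "/real/parent_dir"

def get_all_folders():
    # get_config is looked up in this module's namespace at call time
    return [get_config() + "/sub"]
'''
demo = types.ModuleType("walk_dir_demo")
exec(source, demo.__dict__)
sys.modules["walk_dir_demo"] = demo

# Same import style as the test file in the question.
from walk_dir_demo import get_all_folders


@patch("walk_dir_demo.get_config")  # same module path as the import above
def patched_result(mock_get_config):
    mock_get_config.return_value = "/tmp/test"
    return get_all_folders()


unpatched = get_all_folders()
patched = patched_result()
```

Because get_all_folders resolves get_config through its own module's globals, replacing the attribute on that exact module object is what makes the mock visible.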

Trying to understand Django source code and cause of missing argument TypeError

A screenshot (portrait view) of my IDE and traceback, showing all the code pasted here, may be easier to read if you have a vertical monitor.
Context: I'm trying to save an image from a URL to a Django ImageField, hosted on EC2 with files on S3 using S3BotoStorage. I'm confused because all of this suggests that Django is still treating it like local storage, while it should be using S3.
The lines in question that seem to be causing the error:
def get_filename(self, filename):
    return os.path.normpath(self.storage.get_valid_name(os.path.basename(filename)))
def get_valid_name(self, name):
    """
    Returns a filename, based on the provided filename, that's suitable for
    use in the target storage system.
    """
    return get_valid_filename(name)
TypeError Exception: get_valid_name() missing 1 required positional argument: 'name'
Last local vars from the traceback before the error at get_valid_name:
filename 'testimagefilename'
self <django.db.models.fields.files.ImageField: image>
(Only the stuff inside these two horizontal dividers is from me, the rest is from Django 1.9)
image.image.save('testimagefilename', File(temp), save=True)
Local vars from Traceback at that point (not sure about the ValueError on image, I think it's because it hasn't been created yet):
File <class 'django.core.files.base.File'>
image Error in formatting: ValueError: The 'image' attribute has no file associated with it.
requests <module 'requests' from '/usr/local/lib/python3.4/site-packages/requests/__init__.py'>
Image <class 'mcmaster.models.Image'>
NamedTemporaryFile <function NamedTemporaryFile at 0x7fb0e1bb0510>
temp <tempfile._TemporaryFileWrapper object at 0x7fb0dd241ef0>
Relevant snippets of Django source code:
files.py
def save(self, name, content, save=True):
    name = self.field.generate_filename(self.instance, name)
    if func_supports_parameter(self.storage.save, 'max_length'):
        self.name = self.storage.save(name, content, max_length=self.field.max_length)
    else:
        warnings.warn(
            'Backwards compatibility for storage backends without '
            'support for the `max_length` argument in '
            'Storage.save() will be removed in Django 1.10.',
            RemovedInDjango110Warning, stacklevel=2
        )
        self.name = self.storage.save(name, content)
    setattr(self.instance, self.field.name, self.name)
    # Update the filesize cache
    self._size = content.size
    self._committed = True
    # Save the object because it has changed, unless save is False
    if save:
        self.instance.save()
save.alters_data = True
def get_directory_name(self):
    return os.path.normpath(force_text(datetime.datetime.now().strftime(force_str(self.upload_to))))
def get_filename(self, filename):
    return os.path.normpath(self.storage.get_valid_name(os.path.basename(filename)))
def generate_filename(self, instance, filename):
    # If upload_to is a callable, make sure that the path it returns is
    # passed through get_valid_name() of the underlying storage.
    if callable(self.upload_to):
        directory_name, filename = os.path.split(self.upload_to(instance, filename))
        filename = self.storage.get_valid_name(filename)
        return os.path.normpath(os.path.join(directory_name, filename))
    return os.path.join(self.get_directory_name(), self.get_filename(filename))
storage.py
def get_valid_name(self, name):
    """
    Returns a filename, based on the provided filename, that's suitable for
    use in the target storage system.
    """
    return get_valid_filename(name)
text.py
def get_valid_filename(s):
    """
    Returns the given string converted to a string that can be used for a clean
    filename. Specifically, leading and trailing spaces are removed; other
    spaces are converted to underscores; and anything that is not a unicode
    alphanumeric, dash, underscore, or dot, is removed.
    >>> get_valid_filename("john's portrait in 2004.jpg")
    'johns_portrait_in_2004.jpg'
    """
    s = force_text(s).strip().replace(' ', '_')
    return re.sub(r'(?u)[^-\w.]', '', s)
get_valid_filename = allow_lazy(get_valid_filename, six.text_type)

My guess is that you didn't instantiate the storage class. How are you telling Django to use the custom storage? If you do this in models.py:
image = models.ImageField(storage=MyStorage)
it will fail exactly as you describe: get_valid_name is then called on the class rather than on an instance, so the file name is bound to self and name is reported as missing. It should be:
image = models.ImageField(storage=MyStorage())
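The failure can be reproduced without Django. In this sketch, DemoStorage and try_get_valid_name are hypothetical stand-ins; the only point is the difference between calling the method on the class and on an instance.

```python
class DemoStorage:
    """Hypothetical stand-in for a storage backend."""

    def get_valid_name(self, name):
        return name.strip().replace(" ", "_")


def try_get_valid_name(storage, filename):
    """Call storage.get_valid_name(filename) and report what happens."""
    try:
        return storage.get_valid_name(filename)
    except TypeError as exc:
        return "TypeError: %s" % exc


with_class = try_get_valid_name(DemoStorage, "test image")       # class, not instance
with_instance = try_get_valid_name(DemoStorage(), "test image")  # proper instance
```

With the class, the single argument fills the self slot and Python complains that name is missing, which is the error in the traceback above.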

Django Imagekit overwrite cachefile_name?

I'm trying to overwrite the cachefile_name property from the module django-imagekit.
Here is my code:
class Thumb150x150(ImageSpec):
    processors = [ResizeToFill(150, 150)]
    format = 'JPEG'
    options = {'quality': 90}
    @property
    def cachefile_name(self):
        # simplified for this example
        return "bla/blub/test.jpg"
register.generator('blablub:thumb_150x150', Thumb150x150)
class Avatar(models.Model):
    avatar = ProcessedImageField(upload_to=upload_to,
                                 processors=[ConvertToRGBA()],
                                 format='JPEG',
                                 options={'quality': 60})
    avatar_thumb = ImageSpecField(source='avatar',
                                  id='blablub:thumb_150x150')
It doesn't work at all. When I debug (without my override of cachefile_name) and look at the return value of cachefile_name, the result is a string like "CACHE/blablub/asdlkfjasd09fsaud0fj.jpg". Where is my mistake?
Any ideas?

Replicating the example as closely as I could, it worked fine. A couple of suggestions are:
1) Make sure you are using the avatar_thumb in a view. The file "bla/blub/test.jpg" won't be generated until then.
2) Check the configuration of your MEDIA_ROOT to make sure you know where "bla/blub/test.jpg" is expected to appear.
Let me give an example of something similar I was working on. I wanted to give my thumbnails unique names that can be predicted from the original filename. Imagekit's default scheme names the thumbnails based on a hash, which I can't guess. Instead of this:
media/12345.jpg
media/CACHE/12345/abcde.jpg
I wanted this:
media/photos/original/12345.jpg
media/photos/thumbs/12345.jpg
Overriding IMAGEKIT_SPEC_CACHEFILE_NAMER didn't work because I didn't want all of my cached files to end up in the "thumbs" directory, just those generated from a specific field in a specific model.
So I created this ImageSpec subclass and registered it:
class ThumbnailSpec(ImageSpec):
    processors = [Thumbnail(200, 200, Anchor.CENTER, crop=True, upscale=False)]
    format = 'JPEG'
    options = {'quality': 90}
    # put thumbnails into the "photos/thumbs" folder and
    # name them the same as the source file
    @property
    def cachefile_name(self):
        source_filename = getattr(self.source, 'name', None)
        s = "photos/thumbs/" + source_filename
        return s
register.generator('myapp:thumbnail', ThumbnailSpec)
And then used it in my model like this:
import os
import uuid
from django.conf import settings
from django.core.files.storage import FileSystemStorage
from django.db import models
from imagekit.models import ImageSpecField
# provide a unique name for each image file
def get_file_path(instance, filename):
    ext = filename.split('.')[-1]
    return "%s.%s" % (uuid.uuid4(), ext.lower())
# store the original images in the 'photos/original' directory
photoStorage = FileSystemStorage(
    location=os.path.join(settings.MEDIA_ROOT, 'photos/original'),
    base_url='/photos/original')
class Photo(models.Model):
    image = models.ImageField(storage=photoStorage, upload_to=get_file_path)
    thumb = ImageSpecField(
        source='image',
        id='myapp:thumbnail')
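The path logic inside cachefile_name can be pulled out into a plain function, which also makes the missing-name case explicit (concatenating None with a string would raise a TypeError). This is a sketch: thumb_name_for is a hypothetical helper, and keeping only the base name is an assumption that differs slightly from the spec above, which concatenates the full source name.

```python
import posixpath


def thumb_name_for(source_name, thumbs_dir="photos/thumbs"):
    """Build a predictable cache file name from the source file's name."""
    if not source_name:
        # A source without a name cannot be mapped to a predictable path.
        raise ValueError("source file has no name")
    # Keep only the final path component so nested upload paths do not leak in.
    return posixpath.join(thumbs_dir, posixpath.basename(source_name))
```

The cachefile_name property could then return thumb_name_for(getattr(self.source, 'name', None)).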

I think the correct way is to set IMAGEKIT_SPEC_CACHEFILE_NAMER. Have a look at the default namer in names.py: it joins settings.IMAGEKIT_CACHEFILE_DIR with the file path and a hash, and you should probably do the same.

How can I set a custom response header for pylons static (public) files?

How do I add a custom header to files pylons is serving from public?

a) Let your web server serve the files from /public instead of paster, and configure it to send the special headers.
b) Add a special route and serve the files yourself, à la
class FilesController(BaseController):
    def download(self, path):
        fapp = FileApp(path, headers=self.get_headers(path))
        return fapp(request.environ, self.start_response)
c) Maybe there is a way to overwrite headers and I just don't know how.

With a recent version of Routes, you can use the 'Magic path_info' feature, and follow the documentation from here to write your controller so it calls Paste's DirectoryApp.
In my project, I wanted to serve any file in the public directory, including subdirectories, and ended up with this controller, which lets me override content_type:
import logging
from paste.fileapp import FileApp
from paste.urlparser import StaticURLParser
from pylons import config
from os.path import basename
class ForceDownloadController(StaticURLParser):
    def __init__(self, directory=None, root_directory=None, cache_max_age=None):
        if not directory:
            directory = config['pylons.paths']['static_files']
        StaticURLParser.__init__(self, directory, root_directory, cache_max_age)
    def make_app(self, filename):
        # 'attachment' asks the browser to download the file instead of displaying it
        headers = [('Content-Disposition', 'attachment; filename="%s"' % basename(filename))]
        return FileApp(filename, headers, content_type='application/octet-stream')

In a standard Pylons setup, the public files are served by a StaticURLParser. This is typically set up in the make_app() function of config/middleware.py.
You need to subclass StaticURLParser as Antonin ENFRUN describes, though calling it a controller is confusing because it serves a different purpose. Add something like the following to the top of config/middleware.py:
from paste.fileapp import FileApp
from paste.urlparser import StaticURLParser
class HeaderURLParser(StaticURLParser):
    def make_app(self, filename):
        headers = []  # your headers here, as (name, value) tuples
        return FileApp(filename, headers, content_type='application/octet-stream')
Then replace StaticURLParser in make_app() in config/middleware.py with HeaderURLParser, so that
static_app = StaticURLParser(config['pylons.paths']['static_files'])
becomes
static_app = HeaderURLParser(config['pylons.paths']['static_files'])
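The headers list itself can be built by a small function. A minimal sketch, in which build_static_headers is a hypothetical helper and the particular headers and values are only examples, not something Pylons or Paste requires:

```python
import os


def build_static_headers(filename, max_age=3600):
    """Return extra response headers for one static file as (name, value) tuples."""
    return [
        ("Content-Disposition", 'attachment; filename="%s"' % os.path.basename(filename)),
        ("Cache-Control", "public, max-age=%d" % max_age),
        ("X-Content-Type-Options", "nosniff"),
    ]
```

In make_app, the result of build_static_headers(filename) would take the place of the headers placeholder.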

Here is a simpler way to use FileApp for streaming, based on the Pylons book. The code below assumes your route provides some_file_identifier; the other two variables are "magic" (see the explanation after the code).
class MyFileController(BaseController):
    def serve(self, environ, start_response, some_file_identifier):
        path = self._convert_id_to_path(some_file_identifier)
        app = FileApp(path)
        return app(environ, start_response)
Pylons automatically passes in the WSGI environ and start_response variables if your method signature has parameters with those names. You should not need to set or munge headers otherwise, but if you do, you can use the abilities built into FileApp to achieve this.
