sorl-thumbnail ThumbnailException Error After Cloning EC2 Instance - python

I cloned a working EC2 instance to create a secondary staging server. Everything is working as it should with the exception of sorl-thumbnail.
Before I describe the errors I'm receiving, it might help to describe the stack I'm working with. It involves three EC2 instances: an app server running Django behind Nginx and Gunicorn; a database server running MySQL and Redis; and a media server running Nginx. The app server uses NFS to mount the media directory from the media server locally. All appropriate ports are open in AWS and the app server has been added to /etc/exports on the media server.
On to the issue I am seeing... The img src attribute for all images that should be generated by sorl-thumbnail is empty. When I take a look at my django app's log, I see an entry like this for every missing image:
[04/29/2013 13:11:54] DEBUG : Could not find thumbnail image for rendering </media/images/12345.jpg>
ThumbnailException: Source file: '/images/12345.jpg' does not exist.
[04/29/2013 13:11:54] DEBUG : Could not retrieve image for </media/images/12345.jpg>
However, 12345.jpg does exist at /media/images/.
I spent most of Friday trying to run down the issue to no avail. Has anyone come across anything like this?

Generated data like image thumbnails is often stored in a (comparatively) temporary filesystem location, and the How sorl-thumbnail operates section of the docs suggests the same:
When you use the thumbnail template tag sorl-thumbnail looks up the thumbnail in a Key Value Store. The key for a thumbnail is generated from its filename and storage. [...] It is worth noting that sorl-thumbnail does not check if source or thumbnail exists if the thumbnail key is found in the Key Value Store.
Note: This means that if you change or delete a source file or delete the thumbnail, sorl-thumbnail will still fetch from the Key Value Store. Therefore it is important that you notify the Key Value Store if you delete or change a source or thumbnail file.
[emphasis mine]
Now, Amazon EC2 instances usually feature two distinct storage types: the persistent Amazon Elastic Block Store (Amazon EBS) volumes, which are copied when cloning an instance, and the Amazon EC2 Instance Store volumes (usually referred to as ephemeral storage), which are lost when cloning an instance. See my answer to how to take backup of aws ec2 instance/ephemeral storage? for more on this difference/problem.
So presumably your thumbnails were stored on the ephemeral volume and will now need to be regenerated accordingly, with the Key Value Store notified of the change.
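If it helps, here is a minimal sketch (not part of the original answer) of how the stale entries could be cleared so the thumbnails get regenerated on the new instance; it assumes sorl-thumbnail's bundled thumbnail management command and simply invokes it from Python:

from django.core.management import call_command

# Drop Key Value Store entries whose source or thumbnail file no longer exists on disk;
# equivalent to running "python manage.py thumbnail cleanup".
call_command('thumbnail', 'cleanup')

# Or wipe the Key Value Store completely; thumbnails are then regenerated lazily on the
# next template render (the image files themselves are left untouched).
# call_command('thumbnail', 'clear')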

Related

Putting a Django website into production using an SQLite database

Hey, I'm currently working on a website (a photo-selling service) and now I want to deploy it to a public host.
I didn't change the database, so I'm still using Django's SQLite database. Is that going to be a problem, or is it fine?
I'm also handling downloads with my views and templates, and the files (photos) will be downloaded from my database. Do I need one host for my application and another for the photos, or can I run the whole website on one host without a problem (the same way I run it on my local host)?
I prefer not to use SQLite in production because:
It is just a single file, so it can be deleted accidentally by anyone with access to it.
No user management.
Limited data types.
For serving files on a heavy-traffic website, it's also good to have a CDN to serve the files from.
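As a rough illustration of the first point, here is a hedged settings.py sketch for swapping SQLite out for PostgreSQL and pointing uploads at a CDN-backed domain; the database name, credentials and domain below are placeholders, not anything taken from the question:

# settings.py (sketch; all names and credentials are placeholders)
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'photosite',
        'USER': 'photosite',
        'PASSWORD': 'change-me',
        'HOST': 'localhost',
        'PORT': '5432',
    }
}

# Serve the uploaded photos from a CDN/object-storage domain instead of the app host.
MEDIA_URL = 'https://cdn.example.com/media/'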

What's the optimal way to store image data temporarily in a containerized website?

I'm currently working on a website where I want the user to upload one or more images; my Flask backend will make some changes to these pictures and then return them to the front end.
Where do I optimally save these images temporarily, especially if there is more than one user on my website at the same time (I'm planning on containerizing the website)? Is it safe to save the images in the website's folder, or do I need e.g. a database for that?
You should use a database, or external object storage like Amazon S3.
I say this for a couple of reasons:
Accidents do happen. Say the client does an HTTP POST, gets a URL back, and does an HTTP GET to retrieve the result. But in the meantime the container restarts (the system crashed, your cloud instance got terminated, you restarted the container to upgrade its image, the application failed), and the container-temporary filesystem is lost.
A worker can run in a separate container. It's very reasonable to structure this application as a front-end Web server, that pushes messages into a job queue, and then a back-end worker picks up messages out of that queue to process the images. The main server and the worker will have separate container-local filesystems.
You might want to scale up the parts of this. You can easily run multiple containers from the same image; they'll each have separate container-local filesystems, and you won't directly control which replica a request goes to, so every container needs access to the same underlying storage.
...and it might not be on the same host. In particular, cluster technologies like Kubernetes or Docker Swarm make it reasonably straightforward to run container-based applications spread across multiple systems; sharing files between hosts isn't straightforward, even in these environments. (Most of the Kubernetes Volume types that are easy to get aren't usable across multiple hosts, unless you set up a separate NFS server.)
That set of constraints would imply trying to avoid even named volumes as much as you can. It makes sense to use volumes for the underlying storage for your database, and it can make sense to use Docker bind mounts to inject configuration files or get log files out, but ideally your container doesn't really use its local filesystem at all and doesn't care how many copies of itself are running.
(Do not rely on Docker's behavior of populating a named volume on first use. There are three big problems with it: it is on first use only, so if you update the underlying image, the volume won't get updated; it only works with Docker named volumes and not other options like bind-mounts; and it only works in Docker proper and not in Kubernetes.)
Other decisions are possible given other sets of constraints. If you're absolutely sure you will never ever want to run this application spread across multiple nodes, Docker volumes or bind mounts might make sense. I'd still avoid the container-temporary filesystem.
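To make that concrete, here is a minimal Flask sketch along those lines, assuming boto3 is installed, the container has AWS credentials available from its environment or instance role, and a hypothetical bucket named my-image-scratch exists:

import uuid
import boto3
from flask import Flask, request, jsonify

app = Flask(__name__)
s3 = boto3.client('s3')
BUCKET = 'my-image-scratch'  # hypothetical bucket name

@app.route('/upload', methods=['POST'])
def upload():
    # Stream the uploaded file straight to S3 instead of the container filesystem,
    # so any replica (or a back-end worker container) can fetch it later by key.
    f = request.files['image']
    key = 'incoming/%s-%s' % (uuid.uuid4().hex, f.filename)
    s3.upload_fileobj(f.stream, BUCKET, key)
    return jsonify({'key': key})

@app.route('/result/<path:key>')
def result(key):
    # Hand back a short-lived signed URL so the client downloads from S3 directly.
    url = s3.generate_presigned_url('get_object',
                                    Params={'Bucket': BUCKET, 'Key': key},
                                    ExpiresIn=300)
    return jsonify({'url': url})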

Django + AWS: files not syncing to S3

I inherited a CMS system that was implemented using Django Suit. One of the forms is supposed to upload files to S3 but it's not happening (the files upload to the webserver - EC2, but not to S3).
What I determined so far:
The EC2 instance has full access to S3 (via a role)
The user set up in Django's config file has full access to S3
There is a CloudFront configured to point to the bucket, and it works when files are accessed via a URL. The configuration is working there
The previous developers used the following for handling the upload of files:
DEFAULT_FILE_STORAGE = 'fallback_storage.storage.FallbackStorage'
FALLBACK_STORAGES = (
    'django.core.files.storage.FileSystemStorage',
    'main.custom_storages.MediaStorage',
)
I looked into these 3 classes to see if I'm missing a configuration but everything looks good.
I'm not familiar with this way of syncing files between a web server and S3, so I may be missing something very obvious. Is there a cron job that needs to run in the background?
I found a blog post explaining how to use Django to upload files to S3 using FallbackStorage. That tutorial uses docker. In this case, docker is not used at all.
I'm lost at this point. There are thousands of classes spread across dozens of python libraries. It will take forever to do an exhaustive analysis of the code.
You should probably look at the FallbackStorage class. Typically, for file uploads to S3, the storage class would be S3BotoStorage with the proper AWS_STORAGE_BUCKET_NAME, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY set.
stored = models.FileField(storage=S3BotoStorage(bucket=AWS_BUCKET), upload_to='blog-uploads')
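For reference, this is roughly what main.custom_storages.MediaStorage and the related settings usually look like with django-storages' S3BotoStorage; the location prefix, bucket name and keys below are assumptions, not taken from the project in the question:

# main/custom_storages.py
from storages.backends.s3boto import S3BotoStorage

class MediaStorage(S3BotoStorage):
    # Uploaded media lands under media/ inside the bucket.
    location = 'media'

# settings.py
AWS_ACCESS_KEY_ID = 'AKIA...'              # placeholder
AWS_SECRET_ACCESS_KEY = '...'              # placeholder
AWS_STORAGE_BUCKET_NAME = 'my-cms-media'   # placeholder bucket name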

URL to Cloud Storage file on the SDK

I am experimenting with Google Cloud Storage on Appengine. I installed the new cloudstorage python api code and have everything working well. I deployed my code, and it is also working well.
My ACL is (correctly) set to public-read and I can view my added files on http://commondatastorage.googleapis.com// just fine when calling cloudstorage on the appspot.
However, is there a similar path on the local sdk?
I am using this to upload images and create thumbnails. However, locally, I don't know how to serve up the URL to the thumbnail. I do see in the blobstore viewer that the blobs are created, but there is no filename displayed in the blob info AND the URL uses the blobstore key rather than the filename I gave to the gs create call.
Yes, you can use the following on the app server:
/_ah/gcs/[bucket name]/[filename]
If you used the default bucket, it's called app_default_bucket.
I've tested it with images and it works well. With mp4 videos it seems to run into an error though.
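A small helper along these lines can pick the right URL depending on where the app is running; it is only a sketch and assumes the object was made public-read, as in the question:

import os

def public_url(bucket, filename):
    # The local dev server serves cloudstorage files through this App Engine path.
    if os.environ.get('SERVER_SOFTWARE', '').startswith('Development'):
        return '/_ah/gcs/%s/%s' % (bucket, filename)
    # Deployed, a public-read object is reachable on the Google storage domain.
    return 'http://commondatastorage.googleapis.com/%s/%s' % (bucket, filename)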

Mounting an S3 bucket in EC2 and using it transparently as a mount point

I have a webapp (call it myapp.com) that allows users to upload files. The webapp will be deployed on Amazon EC2 instance. I would like to serve these files back out to the webapp consumers via an s3 bucket based domain (i.e. uploads.myapp.com).
When the user uploads the files, I can easily drop them into a folder called "site_uploads" on the local EC2 instance. However, since my EC2 instance has finite storage, with a lot of uploads the EC2 file system will fill up quickly.
It would be great if the EC2 instance could mount an S3 bucket as the "site_upload" directory, so that uploads to the EC2 "site_upload" directory automatically end up on uploads.myapp.com (and my webapp can use template tags to make sure the links for this uploaded content are based on that S3-backed domain). This also gives me scalable file serving, as requests for files hit S3 and not my EC2 instance. Also, it makes it easy for my webapp to perform scaling/resizing of the images that appear locally in "site_upload" but are actually on S3.
I'm looking at s3fs, but judging from the comments, it doesn't look like a fully baked solution. I'm looking for a non-commercial solution.
FYI, The webapp is written in django, not that that changes the particulars too much.
I'm not using EC2, but I do have my S3 bucket permanently mounted on my Linux server. The way I did it is with Jungledisk. It isn't a non-commercial solution, but it's very inexpensive.
First I set up Jungledisk as normal. Then I made sure FUSE was installed. Mostly you just need to create the configuration file with your secret keys and such. Then just add a line to your fstab, something like this.
jungledisk /path/to/mount/at fuse noauto,allow_other,config=/path/to/jungledisk/config/file.xml 0 0
Then just mount, and you're good to go.
For uploads, your users can upload directly to S3, as described here.
This way you won't need to mount S3.
When serving the files, you can also do that from S3 directly by marking the files public. I'd prefer to name the site "files.mydomain.com" or "images.mydomain.com", pointing to S3.
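As a sketch of the direct-to-S3 upload idea (not from the linked answer), boto3 can generate a presigned POST that the browser submits straight to the bucket; the bucket name, object key and expiry below are placeholders:

import boto3

s3 = boto3.client('s3')
post = s3.generate_presigned_post(
    Bucket='uploads.myapp.com',         # the bucket backing uploads.myapp.com
    Key='site_uploads/user-photo.jpg',  # placeholder object key
    ExpiresIn=3600,                     # the signed form stays valid for an hour
)
# post['url'] and post['fields'] are dropped into the HTML upload form that the
# Django app renders, so the file bytes never touch the EC2 instance.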
I use s3fs, but there are no readily available distributions. I've got my build here for anyone who wants it easier.
Configuration documentation wasn't available, so I winged it until I got this in my fstab:
s3fs#{{ bucket name }} {{ /path/to/mount/point }} fuse allow_other,accessKeyId={{ key }},secretAccessKey={{ secret key }} 0 0
This is a little snippet that I use on an Ubuntu system; I have not tested it anywhere else, so it will obviously need to be adapted for a Windows system. You'll also need to install s3-simple-fuse. If you eventually wind up moving your job to the cloud, I'd recommend Fabric to run the same command.
import os
import subprocess

def mountS3(aws_access_key_id, aws_secret_access_key, targetDir, bucketName=None):
    '''
    Note: this is for Linux with s3cmd and libfuse2 installed.
    Run 'fusermount -u mount_directory' to unmount.
    '''
    if bucketName is None:
        bucketName = 's3Bucket'
    # Create the mount point under targetDir if it does not exist yet.
    mountDir = os.path.join(targetDir, bucketName)
    if not os.path.isdir(mountDir):
        os.mkdir(mountDir)
    # Hand the credentials and bucket name to s3-simple-fuse as mount options.
    subprocess.call(
        's3-simple-fuse %s -o AWS_ACCESS_KEY_ID=%s,AWS_SECRET_ACCESS_KEY=%s,bucket=%s'
        % (mountDir, aws_access_key_id, aws_secret_access_key, bucketName),
        shell=True)
I'd suggest using a separately-mounted EBS volume. I tried doing the same thing for some movie files. Access to S3 was slow, and S3 has some limitations like not being able to rename files, no real directory structure, etc.
You can set up EBS volumes in a RAID5 configuration and add space as you need it.
