Django + AWS: files not syncing to S3 - python

I inherited a CMS system that was implemented using Django Suit. One of the forms is supposed to upload files to S3 but it's not happening (the files upload to the webserver - EC2, but not to S3).
What I've determined so far:
The EC2 instance has full access to S3 (via a role).
The user set up in Django's config file has full access to S3.
There is a CloudFront distribution configured to point to the bucket, and it works when files are accessed via a URL, so that part of the configuration is fine.
The previous developers used the following for handling the upload of files:
DEFAULT_FILE_STORAGE = 'fallback_storage.storage.FallbackStorage'
FALLBACK_STORAGES = (
    'django.core.files.storage.FileSystemStorage',
    'main.custom_storages.MediaStorage'
)
I looked into these 3 classes to see if I'm missing a configuration but everything looks good.
I'm not familiar with this way of syncing files between a web server and S3, so I may be missing something very obvious. Is there a cron job that needs to run in the background?
I found a blog post explaining how to use Django to upload files to S3 using FallbackStorage. That tutorial uses docker. In this case, docker is not used at all.
I'm lost at this point. There are thousands of classes spread across dozens of python libraries. It will take forever to do an exhaustive analysis of the code.

You should probably look at the FallbackStorage class. Typically, for file uploads to S3, the storage class would be S3BotoStorage, with AWS_STORAGE_BUCKET_NAME, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY set properly.
stored = models.FileField(storage=S3BotoStorage(bucket=AWS_BUCKET), upload_to='blog-uploads')
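For comparison, here is a minimal settings sketch for sending uploads straight to S3 through django-storages. It assumes django-storages and boto3 are installed; the bucket name and the use of environment variables are placeholders, not taken from the project in question. Recent django-storages releases ship S3Boto3Storage rather than the older S3BotoStorage.

```python
# settings.py fragment (a sketch, not the project's actual configuration)
import os

# Assumes django-storages with the boto3 backend is installed.
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"

# Placeholder bucket name; replace with the real bucket.
AWS_STORAGE_BUCKET_NAME = os.environ.get("AWS_STORAGE_BUCKET_NAME", "my-media-bucket")

# If these are unset (None), boto3 should fall back to its default
# credential chain, which includes the EC2 instance role.
AWS_ACCESS_KEY_ID = os.environ.get("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY")
```

Swapping this in temporarily for the FallbackStorage setting is one way to check whether the S3 side works on its own. On Django 4.2 and later, the STORAGES setting replaces DEFAULT_FILE_STORAGE.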

Related

putting a Django website on production using sqlite data base

Hey, I'm currently working on a website (a photo-selling service) and now I want to deploy it on a public host.
I didn't change the database, so I'm using Django's default SQLite database. Is that going to be a problem, or is it fine?
Also, I'm handling downloads with my views and templates, and the files (photos) will be downloaded from my database. Do I need one host for my application and another for storing my photos, or can I run the whole website on one host without a problem (the same way I'm running it on my local host)?
I prefer not to use SQLite in production because:
It is just a single file, and it may be deleted accidentally by anyone who has access to it.
No user management.
Limited data types.
For serving files on a heavy-traffic website, it's good to have a CDN to serve them from.

Python HTTPS Requests: Get a file from GCP and send it to another location?

I have a script that scans a local folder and uploads some of the files to an SQL Server through a POST request. I would love to modify it to take files from a GCP bucket instead of from local storage. However I have no experience with GCP and I am having difficulty finding documentation supporting what I am trying to do. I have a few questions for anyone who has tried anything like this before:
Is a GET request the best way to copy GCP bucket files into a different location? I.e., is there a way to put my script directly into GCP and just use the POST request I already have, referencing the bucket instead of a folder?
If a GET request is the best way, does anyone know of a good resource to learn about HTTPS requests with GCP? (Not sure how to create the GET request/ what information Google would need).
After my GET request (if this is the best way), do the files necessarily have to download to my computer before the POST request to the SQL server OR is there any way to send the files to upload without having to download them?
If you want to replace your local storage with Cloud Storage, there are several things to know:
The most transparent option, if you use a Linux-compatible OS, is GCSFuse. You can mount a Cloud Storage bucket in a local directory and work with it as if it were local storage. HOWEVER, GCSFuse is a wrapper that translates system calls into HTTP calls, so latency, features, and performance are absolutely not the same.
When you search for a file on Cloud Storage, you can only search by prefix, not by suffix (so looking for a particular extension such as .sql or .csv is not possible).
You must download the file content locally before sending it to your database, unless your database has a module/extension able to read data from a URL or directly from Cloud Storage (which is unlikely to exist).
gsutil is the best tool for handling Cloud Storage files.
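As a rough sketch of the points above, the script could list objects by prefix, filter by extension on the client side, and hold each file in memory rather than writing it to disk before the POST. This assumes the google-cloud-storage and requests packages are installed and that Application Default Credentials are configured; the function names, the bucket name, and the "file" form field are hypothetical.

```python
def has_suffix(name, suffix):
    """Client-side suffix check, used because listing filters by prefix."""
    return name.endswith(suffix)


def forward_bucket_files(bucket_name, prefix, suffix, post_url):
    """Read matching objects into memory and POST each one to post_url."""
    # Imported lazily so the helper above works without these packages.
    from google.cloud import storage
    import requests

    client = storage.Client()  # uses Application Default Credentials
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        if not has_suffix(blob.name, suffix):
            continue
        data = blob.download_as_bytes()  # held in memory, not written to disk
        response = requests.post(post_url, files={"file": (blob.name, data)})
        response.raise_for_status()
```

The file content still travels through the machine running the script, but nothing has to be saved locally first. A call might look like forward_bucket_files("my-bucket", "exports/", ".csv", "https://example.com/upload"), with all four values replaced by real ones.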

URL to Cloudstorage file on sdk

I am experimenting with Google Cloud Storage on Appengine. I installed the new cloudstorage python api code and have everything working well. I deployed my code, and it is also working well.
My ACL is (correctly) set to public-read and I can view my added files on http://commondatastorage.googleapis.com// just fine when calling cloudstorage on the appspot.
However, is there a similar path on the local sdk?
I am using this to upload images and create thumbnails. However, locally, I don't know how to serve up the URL to the thumbnail. I do see in the blobstore viewer that the blobs are created, but there is no filename displayed in the blobinfo AND the URL uses the blobstore key rather than the filename I gave to the gs create call.
Yes, you can use the following on the app server:
/_ah/gcs/[bucket name]/[filename]
If you used the default bucket, it's called app_default_bucket.
I've tested it with images and it works well. With mp4 videos it seems to run into an error though.
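A small helper for building that local path might look like this. It is only a sketch of the pattern given above; the function name is made up, and it covers the local SDK only, not production.

```python
def local_gcs_path(filename, bucket="app_default_bucket"):
    """Build the dev-server path for an object, following /_ah/gcs/[bucket]/[filename]."""
    return "/_ah/gcs/%s/%s" % (bucket, filename.lstrip("/"))


print(local_gcs_path("thumbs/cat.jpg"))
# prints: /_ah/gcs/app_default_bucket/thumbs/cat.jpg
```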

sorl-thumbnail ThumbnailException Error After Cloning EC2 Instance

I cloned a working EC2 instance to create a secondary staging server. Everything is working as it should with the exception of sorl-thumbnail.
Before I describe the errors I'm receiving, I think it might be helpful to describe the stack I'm working with. It involves 3 EC2 instances: an app server running Django in combination with Nginx and Gunicorn; a database server running MySQL and Redis; and a media server running Nginx. The app server uses NFS to mount the media directory from the media server locally. All appropriate ports are open in AWS and the app server has been added to /etc/exports on the media server.
On to the issue I am seeing... The img src attribute for all images that should be generated by sorl-thumbnail is empty. When I take a look at my django app's log, I see an entry like this for every missing image:
[04/29/2013 13:11:54] DEBUG : Could not find thumbnail image for rendering </media/images/12345.jpg>
ThumbnailException: Source file: '/images/12345.jpg' does not exist.
[04/29/2013 13:11:54] DEBUG : Could not retrieve image for </media/images/12345.jpg>
However, 12345.jpg does exist at /media/images/.
I spent most of Friday trying to run down the issue to no avail. Has anyone come across anything like this?
Generated data like image thumbnails is often stored in a (comparatively) temporary filesystem location, and the documentation section "How sorl-thumbnail operates" suggests the same:
When you use the thumbnail template tag sorl-thumbnail looks up the
thumbnail in a Key Value Store. The key for a thumbnail is generated
from its filename and storage. [...] It is worth noting that sorl-thumbnail does not check if
source or thumbnail exists if the thumbnail key is found in the Key
Value Store.
Note: This means that if you change or delete a source file or delete the
thumbnail, sorl-thumbnail will still fetch from the Key Value Store.
Therefore it is important that if you delete or change a source or
thumbnail file notify the Key Value Store.
[emphasis mine]
Now, Amazon EC2 instances usually feature two distinct storage types, namely the persistent Amazon Elastic Block Store (Amazon EBS) volumes, which are copied when cloning an instance, and also the Amazon EC2 Instance Store volumes (usually referred to as ephemeral storage), which are lost when cloning an instance; see my answer to how to take backup of aws ec2 instance/ephemeral storage? for more on this difference/problem.
So presumably your thumbnails were stored on the ephemeral volume and now need to be regenerated.
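Before regenerating anything, it may be worth confirming that the app server process can actually see the source file through the NFS mount. Below is a minimal check, assuming MEDIA_ROOT points at the mounted media directory; the helper name is made up for illustration and is not part of sorl-thumbnail.

```python
import os


def find_source(media_root, name):
    """Return the absolute path for a stored name if the file exists, else None."""
    # Strip any leading slash so the name is joined under media_root
    # instead of being treated as an absolute path.
    candidate = os.path.join(media_root, name.lstrip("/"))
    return candidate if os.path.isfile(candidate) else None
```

From a Django shell on the app server, ideally run as the same user as Gunicorn, find_source(settings.MEDIA_ROOT, "/images/12345.jpg") returning None would point at the mount or at permissions rather than at the thumbnail cache.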

mounting an s3 bucket in ec2 and using transparently as a mnt point

I have a webapp (call it myapp.com) that allows users to upload files. The webapp will be deployed on Amazon EC2 instance. I would like to serve these files back out to the webapp consumers via an s3 bucket based domain (i.e. uploads.myapp.com).
When the user uploads the files, I can easily drop them in into a folder called "site_uploads" on the local ec2 instance. However, since my ec2 instance has finite storage, with a lot of uploads, the ec2 file system will fill up quickly.
It would be great if the EC2 instance could mount an S3 bucket as the "site_upload" directory, so that uploads to the EC2 "site_upload" directory automatically end up on uploads.myapp.com (and my webapp can use template tags to make sure the links for this uploaded content are based on that S3-backed domain). This also gives me scalable file serving, as requests for files hit S3 and not my EC2 instance. It also makes it easy for my webapp to scale/resize the images that appear locally in "site_upload" but are actually on S3.
I'm looking at s3fs, but judging from the comments, it doesn't look like a fully baked solution. I'm looking for a non-commercial solution.
FYI, The webapp is written in django, not that that changes the particulars too much.
I'm not using EC2, but I do have my S3 bucket permanently mounted on my Linux server. The way I did it is with Jungledisk. It isn't a non-commercial solution, but it's very inexpensive.
First I set up Jungledisk as normal. Then I make sure FUSE is installed. Mostly you just need to create the configuration file with your secret keys and such. Then just add a line to your fstab, something like this:
jungledisk /path/to/mount/at fuse noauto,allow_other,config=/path/to/jungledisk/config/file.xml 0 0
Then just mount, and you're good to go.
For uploads, your users can upload directly to S3, as described here.
This way you won't need to mount S3.
When serving the files, you can also do that from S3 directly by marking the files public, I'd prefer to name the site "files.mydomain.com" or "images.mydomain.com" pointing to s3.
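One way to sketch the direct-upload idea from the server side is a presigned POST: the Django view hands the browser a short-lived URL and form fields, and the file goes to S3 without touching the EC2 disk. This assumes boto3 is installed and AWS credentials are available; the function names and the key layout are hypothetical.

```python
import uuid


def make_upload_key(filename, prefix="site_uploads"):
    """Build a unique object key so uploads with the same name do not collide."""
    return "%s/%s-%s" % (prefix, uuid.uuid4().hex, filename)


def presign_upload(bucket, filename, expires=3600):
    """Return the URL and form fields a browser needs to POST a file to S3."""
    import boto3  # imported lazily; requires boto3 and AWS credentials

    s3 = boto3.client("s3")
    return s3.generate_presigned_post(
        Bucket=bucket, Key=make_upload_key(filename), ExpiresIn=expires
    )
```

The returned dictionary has "url" and "fields" entries, which the template can render as the action and hidden inputs of an upload form.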
I use s3fs, but there are no readily available distributions. I've got my build here for anyone who wants it easier.
Configuration documentation wasn't available, so I winged it until I got this in my fstab:
s3fs#{{ bucket name }} {{ /path/to/mount/point }} fuse allow_other,accessKeyId={{ key }},secretAccessKey={{ secret key }} 0 0
s3fs
This is a little snippet that I use on an Ubuntu system. I have not tested it on anything else, so it will obviously need to be adapted for an M$ system. You'll also need to install s3-simple-fuse. If you eventually wind up moving your job to the cloud, I'd recommend fabric to run the same command.
import os
import subprocess

'''
Note: this is for Linux with s3cmd installed and libfuse2 installed
Run: 'fusermount -u mount_directory' to unmount
'''

def mountS3(aws_access_key_id, aws_secret_access_key, targetDir, bucketName=None):
    # Fall back to a placeholder bucket name if none is given
    if bucketName is None:
        bucketName = 's3Bucket'
    mountDir = os.path.join(targetDir, bucketName)
    # os.path has no mkdir(); os.makedirs also creates missing parents
    if not os.path.isdir(mountDir):
        os.makedirs(mountDir)
    # Pass the command as an argument list so no shell is needed
    subprocess.call([
        's3-simple-fuse', mountDir, '-o',
        'AWS_ACCESS_KEY_ID=%s,AWS_SECRET_ACCESS_KEY=%s,bucket=%s'
        % (aws_access_key_id, aws_secret_access_key, bucketName),
    ])
I'd suggest using a separately-mounted EBS volume. I tried doing the same thing for some movie files. Access to S3 was slow, and S3 has some limitations like not being able to rename files, no real directory structure, etc.
You can set up EBS volumes in a RAID5 configuration and add space as you need it.
