I use Amazon S3 as a part of my webservice. The workflow is the following:
User uploads lots of files to web server. Web server first stores them locally and then uploads to S3 asynchronously
User sends http-request to initiate job (which is some processing of these uploaded files)
Web service asks worker to do the job
Worker does the job and uploads result to S3
User requests the download link from web-server, somedbrecord.result_file.url is returned
User downloads result using this link
To work with files I use the QueuedStorage backend. I initialize my FileFields like this:
user_uploaded_file = models.FileField(..., storage=queued_s3storage, ...)
result_file = models.FileField(..., storage=queued_s3storage, ...)
Where queued_s3storage is an instance of a class derived from ...backends.QueuedStorage whose remote field is set to '...backends.s3boto.S3BotoStorage'.
Now I'm planning to deploy the whole system on one machine and run everything locally, so I want to replace this '...backends.s3boto.S3BotoStorage' with something based on my local filesystem.
The first workaround was to use FakeS3, which can "emulate" S3 locally. It works, but it is not ideal: just extra, unnecessary overhead.
I have an Nginx server running and serving static files from particular directories. How do I create my "remote storage" class that actually stores files locally but provides download links that lead to files served by Nginx (something like http://myip:80/filedir/file1)? Is there a standard library class for that in Django?
The default storage backend for media files is local storage.
Your settings.py defines these two settings:
MEDIA_ROOT (link to docs) -- this is the absolute path to the local file storage folder
MEDIA_URL (link to docs) -- this is the webserver HTTP path (e.g. '/media/' or '//%s/media' % HOSTNAME)
These are used by the default storage backend to save media files. From Django's default/global settings.py:
# Default file storage mechanism that holds media.
DEFAULT_FILE_STORAGE = 'django.core.files.storage.FileSystemStorage'
This configured default storage is used in FileFields for which no storage kwarg is provided. It can also be accessed like so: from django.core.files.storage import default_storage.
So if you want to vary the storage for local development and production use, you can do something like this:
# file_storages.py
from django.conf import settings
from django.core.files.storage import default_storage
from whatever.backends.s3boto import S3BotoStorage

app_storage = None
if settings.DEBUG:
    app_storage = default_storage
else:
    app_storage = S3BotoStorage()
And in your models:
# models.py
from file_storages import app_storage
# ...
result_file = models.FileField(..., storage=app_storage, ...)
Lastly, you want nginx to serve the files directly from your MEDIA_URL. Just make sure that the nginx URL matches the path in MEDIA_URL.
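For example, a matching pair might look like this (a sketch only; the directory and URL below are placeholders, not values from your project):

# settings.py (illustrative values)
MEDIA_ROOT = '/var/www/media/'   # where FileSystemStorage writes uploaded files
MEDIA_URL = '/media/'            # URL prefix nginx should serve from MEDIA_ROOT

An nginx location for /media/ pointing at /var/www/media/ will then serve exactly the files the default storage saves.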
I'm planning to deploy the whole system on one machine to run everything locally
Stop using QueuedStorage then, because "[QueuedStorage] enables having a local and a remote storage backend" and you've just said you don't want a remote.
Just use FileSystemStorage and configure nginx to serve files from settings.MEDIA_ROOT at the URL you want; a sketch follows below.
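A minimal sketch of that, assuming nginx already serves a local directory at a URL like the one in the question (both values below are examples, not your real paths):

from django.core.files.storage import FileSystemStorage

local_storage = FileSystemStorage(
    location='/var/www/filedir',        # local directory that nginx serves
    base_url='http://myip/filedir/',    # URL prefix nginx maps to that directory
)

# models.py
# result_file = models.FileField(..., storage=local_storage, ...)

With base_url set, somedbrecord.result_file.url returns links under http://myip/filedir/, and nginx serves the files directly.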
I have a flask application and I use a config file with some sensitive information. I was wondering how to deploy my application with the config file without releasing the sensitive information it holds.
TLDR; Create a class to hold your config secrets, store the actual secrets in environment variables on your host machine, and read in the environment variables in your app.
Detailed implementation below.
This is my folder structure:
api
|_config
   |_config.py
|_app.py
Then inside of my app.py, which actually starts my Flask application, it looks roughly like this (I've excluded everything that doesn't matter).
import os
from flask import Flask

from config.config import config

def create_app(app_environment=None):
    if app_environment is None:
        app = Flask(__name__)
        app.config.from_object(config[os.getenv('FLASK_ENV', 'dev')])
    else:
        app = Flask(__name__)
        app.config.from_object(config[app_environment])
    return app

if __name__ == "__main__":
    app = create_app(os.getenv('FLASK_ENV', 'dev'))
    app.run()
This allows you to dynamically specify an app environment. For example, you can pass the app environment by setting an environment variable and reading it in before you call create_app(). This is extremely useful if you containerize your Flask app using Docker or some other virtualization tool.
Lastly, my config.py file looks like this. You would change the attributes in each of my environment configs to your secrets.
import os

class ProdConfig:
    # Database configuration
    API_TOKEN = os.environ.get('PROD_MARKET_STACK_API_KEY_SECRET')

class DevConfig:
    # Database configuration
    API_TOKEN = os.environ.get('API_KEY_SECRET')

class TestConfig:
    # Database configuration
    API_TOKEN = os.environ.get('MARKET_STACK_API_KEY')

config = {
    'dev': DevConfig,
    'test': TestConfig,
    'prod': ProdConfig
}
Further, you would access your config secrets throughout any modules in your Flask application via...
from flask import current_app
current_app.config['API_TOKEN']
I believe the answer to your question may be more related to where your application is being deployed, rather than which web-framework you are using.
As far as I understand, it's bad practice to store/track sensitive information (passwords and API keys, for example) in your source files, and you should probably avoid that.
If you have already commited that sensitive data and you want to remove it completely from your git history, I recommend checking this GitHub page.
A couple of high level solutions could be:
Have your config file access environment variables instead of hard-coded values.
If you are using a cloud service such as Google Cloud Platform or AWS, you could use a secret manager to store your data and fetch it safely from your app (see the sketch after this list).
Another approach could be storing the information encrypted (maybe with something like KMS), and decrypt it when needed (my least favorite).
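As an illustration of the secret-manager option, here is a minimal sketch using AWS Secrets Manager via boto3 (the secret name and region are placeholders, not values from the question):

import boto3

def get_secret(name, region_name='us-east-1'):
    # Fetch the secret at runtime instead of keeping it in the source tree
    client = boto3.client('secretsmanager', region_name=region_name)
    return client.get_secret_value(SecretId=name)['SecretString']

API_KEY_SECRET = get_secret('my-flask-app/api-key')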
I have deployed my Flask web app API on Azure. I have a lot of config files, so I created a separate directory where I keep all of them. This is how my project directory looks:
configs
-> app_config.json
-> client_config.json
logs
-> app_debug.log
-> app_error.log
data
-> some other data related files
app.py
app.py is my main Python file, from which I import all the config files; below is how I use them:
config_file = os.path.join(os.path.dirname(__file__), 'configs', 'app_config.json')
# Get the config data from config json file
json_data = open(config_file)
config_data = json.load(json_data)
json_data.close()
After this I can easily use config_data anywhere in the code:
mongo_db = connect_mongodb(username=config_data['MongoUsername'], password=config_data['MongoPassword'], url=config_data['MongoDBURL'], port=config_data['Port'], authdbname=config_data['AuthDBName'])
I'm trying to run a custom script to upload static files to a bucket.
import os
import sys
sys.path.append("/tools/google_appengine")
from google.appengine.ext import vendor
from google.appengine.api import app_identity
vendor.add('../libraries')
import cloudstorage as gcs
STATIC_DIR = '../dashboard/dist'
def main():
    bucket_path = ''.join('/' + app_identity.get_default_gcs_bucket_name())
What I've been trying so far:
- initialize stubs manually
def initialize_service_apis():
    from google.appengine.tools import dev_appserver
    from google.appengine.tools.dev_appserver_main import ParseArguments

    args, option_dict = ParseArguments(sys.argv)  # Otherwise the option_dict isn't populated.
    dev_appserver.SetupStubs('local', **option_dict)
(taken from https://blairconrad.wordpress.com/2010/02/20/automated-testing-using-app-engine-service-apis-and-a-memcaching-memoizer/)
But this gives me an import error when importing the dev_appserver lib.
Is there any way to resolve the issue?
I need this script for an automatic deployment process.
The No api proxy found for service <blah> error messages typically indicate attempts to use GAE standard env infrastructure (packages under google.appengine in your case) inside standalone scripts, which is not OK. See GAE: AssertionError: No api proxy found for service "datastore_v3".
You have 2 options:
keep the code but make it execute inside a GAE app (as a request handler, for example), not as a standalone script
drop GAE libraries and switch to libraries designed to be used from standalone scripts. In your case you're looking for the Cloud Storage Client Libraries (a sketch follows below). You may also need to adjust access control to the respective GAE app bucket.
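For the second option, here is a minimal sketch using the standalone google-cloud-storage client instead of the App Engine cloudstorage package (the bucket name is a placeholder; STATIC_DIR is taken from the question):

import os

from google.cloud import storage

STATIC_DIR = '../dashboard/dist'
BUCKET_NAME = 'your-app-id.appspot.com'  # the default GAE bucket; replace with yours

def upload_static_files():
    client = storage.Client()
    bucket = client.bucket(BUCKET_NAME)
    for root, _, files in os.walk(STATIC_DIR):
        for name in files:
            local_path = os.path.join(root, name)
            blob_name = os.path.relpath(local_path, STATIC_DIR)
            # Mirror the local directory layout into the bucket
            bucket.blob(blob_name).upload_from_filename(local_path)

if __name__ == '__main__':
    upload_static_files()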
I'm not familiar with dev_appserver.SetupStubs(), but I received this same error message while running unit tests in a testbed. In that environment, you have to explicitly enable stubs for any services you wish to test (see the docs).
In particular, initializing the app identity stub solved my problem:
from google.appengine.ext import testbed

t = testbed.Testbed()
t.activate()  # the testbed must be activated before initializing any stubs
t.init_app_identity_stub()
I'm new to Python and Boto, I've managed to sort out file uploads from my server to S3.
But once I've uploaded a new file I want to do an invalidation request.
I've got the code to do that:
import boto
print 'Connecting to CloudFront'
cf = boto.connect_cloudfront()
cf.create_invalidation_request(aws_distribution_id, ['/testkey'])
But I'm getting an error: NameError: name 'aws_distribution_id' is not defined
I guessed that I could add the distribution id to the ~/.boto config, like the aws_secret_access_key etc:
$ cat ~/.boto
[Credentials]
aws_access_key_id = ACCESS-KEY-ID-GOES-HERE
aws_secret_access_key = ACCESS-KEY-SECRET-GOES-HERE
aws_distribution_id = DISTRIBUTION-ID-GOES-HERE
But that's not actually listed in the docs, so I'm not too surprised it failed:
http://docs.pythonboto.org/en/latest/boto_config_tut.html
My problem is I don't want to add the distribution_id to the script, as I run it on both my live and staging servers, and I have different S3 and CloudFront setups for both.
So I need the distribution_id to change per server, which is how I've got the AWS access keys set.
Can I add something else to the boto config or is there a python user defaults I could add it to?
Since you can have multiple cloudfront distributions per account, it wouldn't make sense to configure it in .boto.
You could have another config file specific to your own environment and run your invalidation script using the config file as argument (or have the same file, but with different data depending on your env).
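A minimal sketch of that idea, reusing the calls from the question and taking the config file path as a command-line argument (the section and option names are made up for illustration):

import sys
import ConfigParser

import boto

config = ConfigParser.ConfigParser()
config.read(sys.argv[1])  # e.g. python invalidate.py /path/to/live.cnf
distribution_id = config.get('cloudfront', 'distribution_id')

cf = boto.connect_cloudfront()
cf.create_invalidation_request(distribution_id, ['/testkey'])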
I solved this by using the ConfigParser module. I added the following to the top of my script:
import os
import ConfigParser

# read conf (expanduser so '~' resolves to the user's home directory)
config = ConfigParser.ConfigParser()
config.read(os.path.expanduser('~/my-app.cnf'))
distribution_id = config.get('aws_cloudfront', 'distribution_id')
And inside the conf file at ~/my-app.cnf:
[aws_cloudfront]
distribution_id = DISTRIBUTION_ID
So on my live server I just need to drop the cnf file into the user's home dir and change the distribution_id
I'm currently developing a Django application for internal use which runs on one server (Server 1) in a local network but needs write access to another server (Server 2) when data is saved to the database.
When a new record is saved, Django creates a new directory on the external server (Server 2) with an appropriate folder name. This was working well on the Django test server, which seemed to have access to the entire local network.
I've now successfully deployed my Django application with Apache and mod_wsgi, but the folder creation procedure doesn't seem to work any more. I've tried a few things but can't seem to fix it quickly. Any ideas? Can this actually be achieved with Django and Apache?
def create_folder(self, request, obj, form, change, serverfolder, templatefolder):
    try:
        source_dir = templatefolder  # Replace with path to project folder template
        if not os.path.exists(destination_dir):
            dir_util.copy_tree(source_dir, destination_dir)
            obj.projectfolder = destination_dir
            messages.success(request, "Project folder created on %s" % (serverfolder))
            obj.create_folder = False
            obj.has_folder = True
        else:
            messages.warning(request, "No new project folder created on %s server" % (obj.office.abbreviation))
    except Exception, e:
        messages.warning(request, str(e) + " Error during project folder creation on %s server!" % (obj.office.abbreviation))

def save_model(self, request, obj, form, change):
    serverfolder = r'\\C-s-002\Projects'  # C-s-002 is the external server in the same local network as the server on which Django is running
    templatefolder = r'\\C-s-002\Projects\XXX Project Template'
    self.create_folder(request, obj, form, change, serverfolder, templatefolder)
There are various approaches you can take here, so I will not attempt to exhaust all possibilities:
Option 1: Call an external command with Python (see the sketch after this list). This is not specific to Django or Apache.
Option 2: Set up a web service on Server 2 that you can access via API calls to handle the file/directory creation needed by Server 1. This could be implemented with Django.
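As an illustration of Option 1, here is a minimal sketch that shells out to robocopy to copy the template folder to the UNC share (the template path is from the question, the destination is illustrative; note the command still runs as the Apache/mod_wsgi user, so that account needs rights on the \\C-s-002 share):

import subprocess

def copy_template(source_dir, destination_dir):
    # /E copies subdirectories, including empty ones; robocopy returns non-zero
    # exit codes even on success, so don't use check_call here.
    subprocess.call(['robocopy', source_dir, destination_dir, '/E'])

copy_template(r'\\C-s-002\Projects\XXX Project Template', r'\\C-s-002\Projects\New Project')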
I have a flask app hosted on Heroku that needs to run commands on an AWS EC2 instance (Amazon Linux AMI) using boto.cmdshell. A couple of questions:
Is using a key pair to access the EC2 instance the best practice? Or is using username/password better?
If using a key pair is the preferred method, what's the best practice on managing/storing private keys on Heroku? Obviously putting the private key in git is not an option.
Thanks.
Heroku lets you take advantage of config variables to manage your application. Here is an example of my config.py file that lives inside my flask application:
import os
# flask
PORT = int(os.getenv("PORT", 5000))
basedir = str(os.path.abspath(os.path.dirname(__file__)))
SECRET_KEY = str(os.getenv("APP_SECRET_KEY"))
DEBUG = str(os.getenv("DEBUG"))
ALLOWED_EXTENSIONS = str(os.getenv("ALLOWED_EXTENSIONS"))
TESTING = os.getenv("TESTING", False)
# s3
AWS_ACCESS_KEY_ID = str(os.getenv("AWS_ACCESS_KEY_ID"))
AWS_SECRET_ACCESS_KEY = str(os.getenv("AWS_SECRET_ACCESS_KEY"))
S3_BUCKET = str(os.getenv("S3_BUCKET"))
S3_UPLOAD_DIRECTORY = str(os.getenv("S3_UPLOAD_DIRECTORY"))
Now I can have two different sets of values: the config pulls from my local environment variables when the application runs on my computer, and from Heroku config variables when it is in production. For example,
DEBUG = str(os.getenv("DEBUG"))
is "TRUE" on my local computer. But False on Heroku. In order to check your Heroku config run.
Heroku config
Also keep in mind that if you ever want to keep some files as part of your project locally, but not on Heroku or GitHub, you can use .gitignore. Of course, those files won't exist in your production application then.
What I was looking for was guidance on how to deal with private keys. Both #DrewV and #yfeldblum pointed me in the right direction. I ended up turning my private key into a string and storing it in a Heroku config variable.
If anyone is looking to do something similar, here's a sample code snippet using paramiko:
import paramiko, base64
import StringIO
import os
key = paramiko.RSAKey.from_private_key(StringIO.StringIO(str(os.environ.get("AWS_PRIVATE_KEY"))))
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(str(os.environ.get("EC2_PUBLIC_DNS")), username='ec2-user', pkey=key)
stdin, stdout, stderr = ssh.exec_command('ps')
for line in stdout:
    print '... ' + line.strip('\n')
ssh.close()
Thanks to #DrewV and #yfeldblum for helping (upvote for both).
You can use config vars to store config items in an application running on Heroku.
You can use a username/password combination. You may make the username something easy; but be sure to generate a strong password, e.g., with something like openssl rand -base64 32.
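If you would rather generate the password from Python (3.6+) than from openssl, here is a quick sketch using the standard-library secrets module:

import secrets

# Roughly equivalent to `openssl rand -base64 32`: 32 random bytes, URL-safe base64
password = secrets.token_urlsafe(32)
print(password)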