I am testing out Pipelines on Heroku.
I have a staging app and a production app in a pipeline. I had two issues arise at the same time, so they may or may not be interrelated.
The first was how to run commands from my CLI on both my staging and production app.
This partially answered my question but not entirely. I found the solution was to set my staging app as the default: git config heroku.remote staging
Then to run commands against my production app I can pass its name explicitly, like so: heroku run python manage.py createsuperuser --app your-app-name
The other issue, which remains unresolved (there seems to be a solution for Ruby), is how to control my robots.txt from staging to production. I want my staging app to be hidden from Google indexing etc., but I don't want this carried over to my production app (of course). Perhaps I shouldn't be using robots.txt at all? Any help would be appreciated...
In the absence of any suggestions, I created a solution to this problem, namely how to prevent a staging app being indexed by Google when using Heroku pipelines.
The issue is that when "promoting" your linked repo from staging to production, there seems to be no obvious way to prevent the staging app being indexed by search engines while still ensuring your production app is indexed.
I decided on limiting all views via a middleware according to IP address, so now only specific IPs can access the staging app on Heroku. Perhaps this is not the best way, but in the absence of any other answer it seems to work:
from django.core.exceptions import PermissionDenied
import os

def ip_check_middleware(get_response):
    def middleware(request):
        # Only restrict access on staging (IS_LIVE is 'FALSE' there).
        if os.environ['IS_LIVE'] == 'FALSE':
            allowed_ips = [os.environ['IP_CHECKER'], os.environ['IP_CHECKER_1']]
            x_forwarded_for = request.META.get('HTTP_X_FORWARDED_FOR')
            if x_forwarded_for:
                # On Heroku the client IP is the last entry in X-Forwarded-For.
                ip = x_forwarded_for.split(',')[-1].strip()
            else:
                ip = request.META.get('REMOTE_ADDR')
            if ip not in allowed_ips:
                raise PermissionDenied
        return get_response(request)
    return middleware
Hope that helps anyone with the same/similar issue...!
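In the same spirit, the staging/production split could also be handled with a per-environment robots.txt instead of (or alongside) IP blocking. A minimal sketch, reusing the IS_LIVE config var from the middleware above; the Django view wiring is assumed, not shown:

```python
import os

def robots_txt_body():
    """Return the robots.txt body for the current environment."""
    if os.environ.get('IS_LIVE', 'FALSE') == 'TRUE':
        # Production: allow all crawlers.
        return "User-agent: *\nDisallow:\n"
    # Staging: ask crawlers to stay away entirely.
    return "User-agent: *\nDisallow: /\n"
```

In Django this could back a view routed at /robots.txt that returns HttpResponse(robots_txt_body(), content_type='text/plain'), so the same codebase serves the right file on both apps.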
I'm trying to serve a simple service using flask and flask_restx (a fork of flask-restplus) that will eventually be served on AWS.
When it is served, I want to generate a Swagger page so others can test it easily.
from flask import Flask
from flask_restx import Api
from my_service import service_namespace
app = Flask(__name__)
api = Api(app, version='1.0')
api.add_namespace(service_namespace)
if __name__ == '__main__':
    app.run(debug=True)
When I test it locally (e.g. localhost:5000), it works just fine. The problem is that when it is hosted on AWS, because it sits behind a specific path on the domain (gets redirected?) (e.g. my-company.com/chris-service routed to a container), the documentation page is unable to find its required assets such as the CSS:
What I've looked and tried
Python (Flask + Swagger) Flasgger throwing 404 error
flask python creating swagger document error
404 error in Flask
I also tried adding a Blueprint (albeit without knowing exactly what it does):
from flask import Blueprint

app = Flask(__name__)
blueprint = Blueprint("api", __name__,
                      root_path="/chris-service",
                      # url_prefix="/chris-service",  # doesn't work
                      )
api = Api(blueprint)
app.register_blueprint(blueprint)
...
And still no luck.
Update
So here's more information as per the comments (pseudo, but technically identical)
Access point for the swagger is my-company.com/chris (with or without http:// or https:// doesn't make difference)
When connecting to the above address, the request URL for the assets are my-company.com/swaggerui/swagger-ui.css
You can access the asset in my-company.com/chris/swaggerui/swagger-ui.css
So my attempted fix (which didn't work) was to somehow change the root_path (not even sure if that's the correct term), as shown in "What I've looked and tried".
I've spent about a week trying to solve this but can't find a way.
Any help will be appreciated :) Thanks
The Swagger parameters are defined in flask-restx's apidoc.py file, where the default apidoc object is also created. So if you want to customize it, you have to change it before the app and api are initialized.
In your case url_prefix should be changed (I recommend using an environment variable so the prefix can be set flexibly):
$ export URL_PREFIX='/chris'
from os import environ
from flask import Flask
from flask_restx import Api, apidoc
if (url_prefix := environ.get('URL_PREFIX', None)) is not None:
apidoc.apidoc.url_prefix = url_prefix
app = Flask(__name__)
api = Api(app)
...
if __name__ == '__main__':
app.run()
Always very frustrating when stuff works locally but not when deployed to AWS. Reading this GitHub issue, these 404 errors on Swagger assets are probably caused by one of the following:
Missing JavaScript swagger packages
Probably not the case, since flask-restx bundles these for you, and running it locally would also fail in that case.
Missing gunicorn settings
Make sure that you are also setting up gunicorn correctly with
--forwarded-allow-ips if deploying with it (you should be). If you are in a Kubernetes cluster you can set this to *
https://docs.gunicorn.org/en/stable/settings.html#forwarded-allow-ips
According to this post, you also have to explicitly set
settings.FLASK_SERVER_NAME to something like http://ec2-10-221-200-56.us-west-2.compute.amazonaws.com:5000
If that does not work, try to deploy a minimal flask-restx example, which should definitely work. This rules out any errors on your end.
I set SERVER_NAME in my Flask app to start using subdomains so I can have e.g. frontend and backend on two different subdomains:
frontend.domain.com
backend.domain.com
I set Flask like this:
app.config['SERVER_NAME'] = 'domain.com'
app.url_map.default_subdomain = "frontend"
The app is published using Google App Engine and everything works fine, except that the default App Engine domain https://PROJECT_ID.REGION_ID.r.appspot.com now returns a 404, because as I understand it Flask no longer recognises any matching route.
I thought that was fine since I never used https://PROJECT_ID.REGION_ID.r.appspot.com; now I know I was wrong...
https://PROJECT_ID.REGION_ID.r.appspot.com is used by Google Cloud Tasks to route tasks, so e.g. myapp.ey.r.appspot.com/my_task_worker, which is called by Cloud Tasks' create_task, now hits a 404 Not Found, while it worked before I set SERVER_NAME.
How do I fix this? Do I have to hardcode myapp.ey.r.appspot.com in my Flask app somehow?
Here's an extract of my app.yaml, adapted:
runtime: python37
handlers:
- url: /.*
secure: always
redirect_http_response_code: 301
script: auto
env_variables:
DEBUG: False
SERVER_NAME: 'domain.com'
DEFAULT_SUBDOMAIN: 'frontend'
GCP_PROJECT: 'myapp'
CLOUD_TASK_LOCATION: 'europe-west3'
CLOUD_TASK_QUEUE: 'default'
GOOGLE_CLOUD_PLATFORM_API_KEY: 'xxxxxxxx'
...
Do I have to hardcode myapp.ey.r.appspot.com in my Flask app somehow?
Yes. The problem here is that you're managing the redirection from your app instead of leaving App Engine to do it. Although this isn't bad practice on its own, it leaves out many App Engine features and, most importantly, as you already mentioned, other GCP products like Cloud Tasks expect a specific behaviour in order to work properly.
How do I fix this?
Under your current architecture you would have to add routing for the default URL. However, as far as I know, Flask doesn't allow routing more than one domain, so you would have to switch SERVER_NAME to the default App Engine domain, or change to something like Django that supports multiple domains.
My suggestion is to map your subdomains to App Engine services (one for your frontend and one for your backend) and leave the routing to GCP (and remove SERVER_NAME). You can use a dispatch.yaml to do the routing, for example with the following routes:
dispatch:
# Default service serves the typical web resources and all static resources.
- url: "myapp.ey.r.appspot.com/*"
service: default
- url: "frontend.domain.com/*"
service: frontend
- url: "backend.domain.com/*"
service: backend
I simply need an efficient way to debug a GAE application, and to do so I need to connect to the production GAE infrastructure from localhost when running dev_appserver.py.
The following code works well if I run it as a separate script:
import argparse

try:
    import dev_appserver
    dev_appserver.fix_sys_path()
except ImportError:
    print('Please make sure the App Engine SDK is in your PYTHONPATH.')
    raise

from google.appengine.ext import ndb
from google.appengine.ext.remote_api import remote_api_stub

def main(project_id):
    server_name = '{}.appspot.com'.format(project_id)
    remote_api_stub.ConfigureRemoteApiForOAuth(
        app_id='s~' + project_id,
        path='/_ah/remote_api',
        servername=server_name)

    # List the first 10 keys in the datastore.
    keys = ndb.Query().fetch(10, keys_only=True)
    for key in keys:
        print(key)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument('project_id', help='Your Project ID.')
    args = parser.parse_args()
    main(args.project_id)
With this script I was able to get data from the remote Datastore. But where do I need to put the same code in my application (which is obviously not a single script) to make it work?
I've tried putting the remote_api_stub.ConfigureRemoteApiForOAuth() call in appengine_config.py, but I got a recursion error.
I'm running app like this:
dev_appserver.py app.yaml --admin_port=8001 --enable_console --support_datastore_emulator=no --log_level=info
The application uses NDB to access Google Datastore.
The application contains many modules and files, and I simply don't know where to put the remote_api_stub auth code.
I hope somebody from the Google team sees this topic, because I've searched the whole internet without any results. It's unbelievable how many people develop apps for the GAE platform, yet it looks like nobody develops/debugs apps locally.
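One pattern that may help with the recursion error in appengine_config.py, assuming the stub ends up re-configuring itself on every request, is a module-level guard so the configuration runs at most once per process. This is a generic sketch, not App Engine-specific; configure_fn here stands in for a zero-argument wrapper around the real remote_api_stub.ConfigureRemoteApiForOAuth(...) call:

```python
# appengine_config.py (sketch)
_remote_api_configured = False

def configure_once(configure_fn):
    """Run configure_fn the first time only; later calls are no-ops.

    configure_fn is a zero-argument callable, e.g. a lambda wrapping
    remote_api_stub.ConfigureRemoteApiForOAuth(...).
    Returns True if configuration ran, False if it was skipped.
    """
    global _remote_api_configured
    if _remote_api_configured:
        return False
    configure_fn()
    _remote_api_configured = True
    return True
```

Whether this resolves the recursion depends on what actually triggers it, but a once-per-process guard is a cheap thing to try before restructuring the app.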
I'm working on a project that uses Google Cloud Platform's App Engine in the Python 3 Flexible Environment with Django. I'm trying to permanently redirect all HTTP requests to HTTPS for all routes, but so far have not been successful. I can access the site over HTTPS, but only if it is written explicitly in the address bar.
I've looked at this post: How to permanently redirect `http://` and `www.` URLs to `https://`? but did not find the answer useful.
The app works properly in every sense except for the redirecting. Here is my app.yaml file:
# [START runtime]
runtime: python
env: flex
entrypoint: gunicorn -b :$PORT myproject.wsgi
runtime_config:
python_version: 3
# [END runtime]
In myproject/settings.py I have these variables defined:
SECURE_SSL_REDIRECT = True
SESSION_COOKIE_SECURE = True
CSRF_COOKIE_SECURE = True
SECURE_PROXY_SSL_HEADER = ('HTTP-X-FORWARDED-PROTO', 'https')
On my local machine, when I set SECURE_SSL_REDIRECT to True, I was redirected to https properly, even though SSL is not supported on localhost. In production, I am still able to access the site using just http.
Is there something I'm missing or doing wrong to cause the redirect not to happen?
Setting secure in app.yaml only works for GAE Standard, not Flexible. The app.yaml docs for Flexible do not mention this key at all.
You will probably have to do it on application level by inspecting the value of the X-Forwarded-Proto header. It will be set to https if the request to your app came by HTTPS. You can find more info on environment-provided headers in Flexible environment in the docs here.
Make sure you have SecurityMiddleware and CommonMiddleware enabled, and assign a Base_URL:
settings.py:
MIDDLEWARE_CLASSES = (
    ...
    'django.middleware.security.SecurityMiddleware',
    'django.middleware.common.CommonMiddleware',
)

BASE_URL = 'https://www.example.com'
Or, you could write your own middleware:
MIDDLEWARE_CLASSES = (
    ...
    'core.my_middleware.ForceHttps',
)

BASE_URL = 'https://www.example.com'
my_middleware.py:
from django.conf import settings
from django.http import HttpResponsePermanentRedirect

class ForceHttps(object):
    def process_request(self, request):
        # Redirect unless the request is already HTTPS on the canonical host.
        if not (request.META.get('HTTPS') == 'on'
                and settings.BASE_URL == 'https://' + request.META.get('HTTP_HOST')):
            return HttpResponsePermanentRedirect(settings.BASE_URL + request.META.get('PATH_INFO'))
        return None
The issue is the header name. When Django runs behind a WSGI server, SECURE_PROXY_SSL_HEADER must use the WSGI-normalized spelling 'HTTP_X_FORWARDED_PROTO' (with underscores), not 'HTTP-X-FORWARDED-PROTO'.
See: Why does django ignore HTTP_X_FORWARDED_PROTO from the wire but not in tests?
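To see why the spelling matters: WSGI servers expose request headers in the environ (Django's request.META) with an HTTP_ prefix and hyphens turned into underscores, and that normalized form is what Django expects in SECURE_PROXY_SSL_HEADER. A small illustration of the normalization (the helper name is mine, not a Django API):

```python
def wsgi_meta_key(header_name):
    """Turn an HTTP header name into its WSGI environ / request.META key.

    e.g. 'X-Forwarded-Proto' -> 'HTTP_X_FORWARDED_PROTO'
    (Content-Type and Content-Length are special-cased by WSGI and
    don't get the HTTP_ prefix, but they aren't relevant here.)
    """
    return 'HTTP_' + header_name.upper().replace('-', '_')
```

So the value the proxy sends as X-Forwarded-Proto is only visible to Django under the key HTTP_X_FORWARDED_PROTO, which is why the hyphenated setting silently never matches.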
I had a similar problem and tried a number of changes, both in app.yaml and in settings.py, for a custom domain (with the default SSL cert supplied by GAE).
Through trial and error I found that updating the allowed hosts in settings.py to the appropriate domains had the desired result:
ALLOWED_HOSTS = ['https://{your-project-name}.appspot.com','https://www.yourcustomdomain.com']
Update: I am no longer sure the above is the reason, as on a subsequent deploy the above was rejected and I was getting a hosts error. However the redirect is still in place... :(
Before this change I was able to switch between http:// and https:// manually in the address bar; now it redirects automatically.
In order to make this work both on App Engine Flexible and your local machine when testing, you should set the following in your settings.py
import os

if os.getenv('GAE_INSTANCE'):
    # Running on App Engine Flexible
    SECURE_SSL_REDIRECT = True
    SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https')
else:
    # Running on your local machine
    SECURE_SSL_REDIRECT = False
    SECURE_PROXY_SSL_HEADER = None
That should be all you need to do to ensure that the redirect happens properly when you are on App Engine.
NOTE: If you are using old-school App Engine cron jobs (via cron.yaml), you will need to start using the much-improved Cloud Scheduler instead. App Engine cron jobs do not properly support redirection to HTTPS, but you can easily get it working with Cloud Scheduler.
When I try to upload sample CSV data to my GAE app through appcfg.py, it shows the 401 error below.
2015-11-04 10:44:41,820 INFO client.py:571 Refreshing due to a 401 (attempt 2/2)
2015-11-04 10:44:41,821 INFO client.py:797 Refreshing access_token
Error 401: --- begin server output ---
You must be logged in as an administrator to access this.
--- end server output ---
Here is the command I tried,
appcfg.py upload_data --application=dev~app --url=http://localhost:8080/_ah/remote_api --filename=data/sample.csv
This is how we do it in order to use custom authentication.
Custom handler in app.yaml
- url: /remoteapi.*
  script: remote_api.app
Custom wsgi app in remote_api.py to override CheckIsAdmin
from google.appengine.ext.remote_api import handler
from google.appengine.ext import webapp
import re

MY_SECRET_KEY = 'MAKE UP PASSWORD HERE'  # make one up, use the same one in the shell command
cookie_re = re.compile('^"?([^:]+):.*"?$')

class ApiCallHandler(handler.ApiCallHandler):
    def CheckIsAdmin(self):
        """Determine if admin access should be granted based on the
        auth cookie passed with the request."""
        login_cookie = self.request.cookies.get('dev_appserver_login', '')
        match = cookie_re.search(login_cookie)
        if (match and match.group(1) == MY_SECRET_KEY
                and 'X-appcfg-api-version' in self.request.headers):
            return True
        else:
            self.redirect('/_ah/login')
            return False

app = webapp.WSGIApplication([('.*', ApiCallHandler)])
From here we script the uploading of data that was exported from our live app. Use the same password that you made up in the python script above.
echo "MAKE UP PASSWORD HERE" | appcfg.py upload_data --email=some@example.org --passin --url=http://localhost:8080/remoteapi --num_threads=4 --kind=WebHook --filename=webhook.data --db_filename=bulkloader-progress-webhook.sql3
WebHook and webhook.data are specific to the Kind that we exported from production.
I had a similar issue, where appcfg.py was not giving me any credentials dialog, so I could not authenticate. I downgraded from GAE Launcher 1.9.27 to 1.9.26, and the authentication started working again.
Temporary solution: go to https://console.developers.google.com/storage/browser/appengine-sdks/featured/ to get version 1.9.26
Submitted bug report: https://code.google.com/p/google-cloud-sdk/issues/detail?id=340
You cannot use the appcfg.py upload_data command with the development server [edit: as is; see Josh J's answer]. It only works with the remote_api endpoint running on App Engine and authenticated with OAuth2.
An easy way to load data into the dev server's datastore is to create an endpoint that reads a CSV file and creates the appropriate datastore entities, then hit it with the browser. (Be sure to remove the endpoint before deploying the app, or restrict access to the URL with login: admin.)
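The CSV-reading part of such an endpoint could look like the sketch below. It only covers the parsing step; the handler wiring and the model name (MyModel) are hypothetical, and each row dict would be passed to the matching ndb model constructor inside the dev-only handler:

```python
import csv
import io

def entities_from_csv(csv_text):
    """Parse CSV text (first row = field names) into one dict per entity."""
    return list(csv.DictReader(io.StringIO(csv_text)))

# Inside the dev-only handler you might then do something like:
# for fields in entities_from_csv(uploaded_text):
#     MyModel(**fields).put()   # MyModel is a hypothetical ndb model
```

Note that DictReader yields all values as strings, so any integer or date properties on the model would need converting before construction.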
You probably have a cached OAuth token for a Google account that is not an admin of that project. Try passing the --no_cookies flag so that it prompts for authentication again.
Maybe this has something to do with it? From the docs:

Connecting your app to the local development server

To use the local development server for your app running locally, you need to do the following:

1. Set environment variables.
2. Add or modify your app's Datastore connection code.

Setting environment variables

Create an environment variable DATASTORE_HOST and set it to the host and port on which the local development server is listening. The default host and port is http://localhost:8080. (Note: If you use the port and/or host command line arguments to change these defaults, be sure to adjust DATASTORE_HOST accordingly.) The following bash shell example shows how to set this variable:

export DATASTORE_HOST=http://localhost:8080

Create an environment variable named DATASTORE_DATASET and set it to your dataset ID, as shown in the following bash shell example:

export DATASTORE_DATASET=

Note: Both the Python and Java client libraries look for the environment variables DATASTORE_HOST and DATASTORE_DATASET.

Link to the docs: https://cloud.google.com/datastore/docs/tools/devserver