I have an application in Google App Engine that consists of two modules (A and B). A handles user requests and is available without authentication. B is a microservice that performs certain tasks when A requires it. So A makes requests to B using urlfetch:
from google.appengine.api import urlfetch
from google.appengine.api import app_identity
rpc = urlfetch.create_rpc()
urlfetch.make_fetch_call(
    rpc,
    "https://b-dot-my-project.appspot.com/some/url",
    method='GET',
    follow_redirects=False,
    headers={
        'X-Appengine-Inbound-Appid': 'my-project',
    },
)
response = rpc.get_result()
B's app.yaml looks something like:
runtime: python27
api_version: 1
threadsafe: yes
service: b
handlers:
- url: /.*
  script: my_module.app
  login: admin
  auth_fail_action: unauthorized
In the docs, they suggest:
When issuing a request to another App Engine app, your App Engine app
must assert its identity by adding the header
X-Appengine-Inbound-Appid to the request. If you instruct the URL
Fetch service to not follow redirects, App Engine will add this header
to requests automatically.
No matter what I do, I keep getting a 401 when making this request. Both A and B are deployed in the same project. I tried setting follow_redirects=False and adding the X-Appengine-Inbound-Appid header manually (though I didn't expect that to work, for the reasons described here). I'm still not sure whether the header is being set: the logs for B don't include request headers, and the failure happens before my handler module gets executed.
If possible, I would rather have A authenticate to B than drop the login: admin option and rely only on the header, since it is nice to be able to call B from a project admin account (for debugging purposes, for example).
Instead of specifying login: admin in your config, use the users API from the Python library: https://cloud.google.com/appengine/docs/standard/python/refdocs/google.appengine.api.users This way you can check for the App Engine header first and fall back to the admin Google user.
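A minimal sketch of that check order, under some assumptions: is_authorized, allowed_app_ids and the injected is_admin callable are hypothetical names, not part of any App Engine API. In a real handler you would presumably pass users.is_current_user_admin from google.appengine.api.users as is_admin, and remove login: admin from app.yaml so the request reaches your code at all. As far as I know, App Engine strips X-Appengine-* headers from external requests, which is what would make the header trustworthy, but check that for your runtime.

```python
def is_authorized(headers, allowed_app_ids, is_admin):
    """Return True for a trusted calling app or, failing that, an admin user.

    headers: mapping of request header names to values. A plain dict is
        case-sensitive; framework header objects usually are not.
    allowed_app_ids: collection of app IDs allowed to call this service.
    is_admin: zero-argument callable, consulted only as a fallback
        (hypothetical stand-in for users.is_current_user_admin).
    """
    inbound_app_id = headers.get('X-Appengine-Inbound-Appid')
    if inbound_app_id is not None and inbound_app_id in allowed_app_ids:
        return True
    return bool(is_admin())
```

The handler would return a 401 or 403 itself whenever this returns False.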
Instead of login: admin, you could check the requests arriving at module B for the header 'HTTP_USER_AGENT': 'AppEngine-Google; (+http://code.google.com/appengine; appid: s~my-project)'. That tells you the request came from urlfetch, a task queue, or a cron job.
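A rough sketch of how that User-Agent check might look. The helper name appid_from_user_agent is made up, and the pattern is based only on the example string quoted above, so treat the exact format as an assumption. Keep in mind that any client can send whatever User-Agent it likes, so on its own this is a hint about the caller rather than proof.

```python
import re

# Pattern derived from the example User-Agent above (assumed format).
_APPID_RE = re.compile(
    r'AppEngine-Google; \(\+http://code\.google\.com/appengine; '
    r'appid: ([^)\s]+)\)'
)


def appid_from_user_agent(user_agent):
    """Return the app ID embedded in an App Engine User-Agent, or None."""
    match = _APPID_RE.search(user_agent or '')
    return match.group(1) if match else None
```

In the handler you would compare the returned value (for example 's~my-project') against the app ID you expect.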
Related
I set SERVER_NAME in my Flask app to start using subdomains so I can have e.g. frontend and backend on two different subdomains:
frontend.domain.com
backend.domain.com
I set Flask like this:
app.config['SERVER_NAME'] = 'domain.com'
app.url_map.default_subdomain = "frontend"
The app is published using Google App Engine and everything works fine, except that the default App Engine domain https://PROJECT_ID.REGION_ID.r.appspot.com now returns a 404 because, as I understand it, Flask no longer finds a matching route.
I thought it was fine since I never used https://PROJECT_ID.REGION_ID.r.appspot.com, now I know I was wrong...
https://PROJECT_ID.REGION_ID.r.appspot.com is used by Google Cloud Tasks to route tasks. For example, myapp.ey.r.appspot.com/my_task_worker, which is called by Cloud Tasks create_task, now returns a 404 Not Found, while it worked before I set SERVER_NAME.
How do I fix this? Do I have to hardcode myapp.ey.r.appspot.com in my Flask app somehow?
Here's an extract of my app.yaml, adapted:
runtime: python37
handlers:
- url: /.*
  secure: always
  redirect_http_response_code: 301
  script: auto
env_variables:
  DEBUG: False
  SERVER_NAME: 'domain.com'
  DEFAULT_SUBDOMAIN: 'frontend'
  GCP_PROJECT: 'myapp'
  CLOUD_TASK_LOCATION: 'europe-west3'
  CLOUD_TASK_QUEUE: 'default'
  GOOGLE_CLOUD_PLATFORM_API_KEY: 'xxxxxxxx'
...
Do I have to hardcode myapp.ey.r.appspot.com in my Flask app somehow?
Yes. The problem here is that you're managing the redirection from your app instead of leaving it to App Engine. Although this isn't bad practice on its own, it leaves many App Engine features unused and, most importantly, as you already mentioned, other GCP products like Cloud Tasks expect a specific behaviour in order to work properly.
How do I fix this?
Under your current architecture you would have to add a route for the default URL. However, as far as I know, Flask doesn't allow routing for more than one domain, so you would have to either switch 'SERVER_NAME' to the default App Engine domain or move to something like Django, which supports multiple domains.
My suggestion is to map your subdomains to App Engine services (one for your frontend and one for your backend), remove 'SERVER_NAME', and leave the routing to GCP. You can use dispatch.yaml for the routing; for example, you can create the following routes:
dispatch:
  # Default service serves the typical web resources and all static resources.
  - url: "myapp.ey.r.appspot.com/*"
    service: default
  - url: "frontend.domain.com/*"
    service: frontend
  - url: "backend.domain.com/*"
    service: backend
I am using Automated Certificate Management on Heroku to implement SSL for my application. My application connects securely over HTTPS if https://www.myapp.com is used, but if www.myapp.com or myapp.com is used, it defaults to HTTP.
In Heroku the domains that have been added are respectively as follows:
Domain Name: myapp.com, www.myapp.com
DNS Target: myapp.com.herokudns.com, www.myapp.com.herokudns.com
In Google Domains I have a subdomain forward record as follows:
myapp.com -> https://www.myapp.com
and under Custom resource records I have:
Name: www
Type: CNAME
Data: www.myapp.com.herokudns.com
Is there a way to force HTTPS through Google Domains or the Heroku CLI, or is this something I need to do in my Python app?
The easiest way is to use flask-sslify:
https://github.com/kennethreitz/flask-sslify
It redirects every HTTP request to your app to HTTPS.
You only have to add one line of code to your app (or app factory):
from flask import Flask
from flask_sslify import SSLify
app = Flask(__name__)
sslify = SSLify(app)
flask-sslify doesn't seem to be maintained anymore. Heroku suggests looking at flask-talisman, but the CSP requirements don't look trivial to me.
There really needs to be a simpler solution for this.
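For what it's worth, the redirect itself can be done in a few lines of plain WSGI without an extra package. This is only a sketch, assuming the Heroku router sets the X-Forwarded-Proto header on the requests it forwards; the class name ForceHTTPS is made up, and the path is copied as-is without re-quoting.

```python
class ForceHTTPS(object):
    """WSGI middleware that answers plain-HTTP requests with a 301 to HTTPS."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Behind a proxy the original scheme arrives in X-Forwarded-Proto;
        # fall back to the scheme the WSGI server saw directly.
        proto = environ.get('HTTP_X_FORWARDED_PROTO',
                            environ.get('wsgi.url_scheme', 'http'))
        if proto != 'https':
            host = environ.get('HTTP_HOST', environ.get('SERVER_NAME', ''))
            url = 'https://' + host + environ.get('PATH_INFO', '')
            query = environ.get('QUERY_STRING', '')
            if query:
                url += '?' + query
            start_response('301 Moved Permanently',
                           [('Location', url), ('Content-Length', '0')])
            return [b'']
        return self.app(environ, start_response)
```

With Flask this would be wired in as app.wsgi_app = ForceHTTPS(app.wsgi_app).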
Following up on "Google App Engine inter module communication authorization": the problem I have is that the docs (communication between modules) say:
You can configure any manual or basic scaling module to accept
requests from other modules in your app by restricting its handler to
only allow administrator accounts, specifying login: admin for the
appropriate handler in the module's configuration file. With this
restriction in place, any URLFetch from any other module in the app
will be automatically authenticated by App Engine, and any request
that is not from the application will be rejected.
And this is exactly the configuration I have for my module called "api1". In my app.yaml file I have:
# can accept requests from other modules.
# with login: admin and they are authenticated automatically.
- url: /.*
  script: _go_app
  login: admin
I'm now trying, from a different module in the same app, to make a service call as suggested in the docs using the urlfetch.fetch() method. My implementation is:
import json
from google.appengine.api import urlfetch, modules, app_identity
from rest_framework import status
from rest_framework.decorators import api_view
from rest_framework.response import Response
@api_view(['POST'])
def validate_email(request):
    # Resolve the hostname of the "api1" module in the same app.
    url = "http://%s/" % modules.get_hostname(module="api1")
    payload = json.dumps({"SOME_KEY": "SOME_VALUE"})
    appid = app_identity.get_application_id()
    # follow_redirects=False is what the docs say makes App Engine add
    # the X-Appengine-Inbound-Appid header automatically.
    result = urlfetch.fetch(url + "emails/validate/document",
                            follow_redirects=False,
                            method=urlfetch.POST,
                            payload=payload,
                            headers={"Content-Type": "application/json"})
    return Response({
        'status_code': result.status_code,
        'content': result.content
    }, status=status.HTTP_200_OK)
According to the documentation, because I specified follow_redirects=False, fetch() should automatically insert a header in my call, "X-Appengine-Inbound-Appid" : MY-APP-ID (I've even tried adding it explicitly).
Unfortunately, the fetch call returns a 302 redirect; if I follow it, it leads to the authentication form. This occurs on the development server as well as in production.
Can you please let me know how I can call my api1 service inside my validate_email method (belonging to a different module in the same app)?
Is there another way to authenticate the call since it seems the way suggested inside the documentation is not working?
Thank you
As written here, this is now a tracked issue on the Google App Engine public issue tracker, so everyone can go there to check for updates.
In the meantime, I solved the issue by removing login: admin from the app.yaml and manually checking, in the handler of my service, for the existence of the X-Appengine-Inbound-Appid header and its value.
I am developing a Flask-based web application ( https://github.com/opensourcehacker/sevabot ) which has HTTP-based API services.
Many developers are using and extending the API, and I'd like to add a feature which prints each HTTP request Flask receives to Python logging output, so you can see the raw HTTP payloads, source IP and headers you get.
What hooks does Flask offer where this kind of HTTP request dumping would be easiest to implement?
Are there any existing solutions and best practices to learn from?
Flask makes a standard logger available at current_app.logger (there's an example configuration in this gist). You can centralise the logging calls in a before_request handler if you want to log every request:
from flask import request, current_app
@app.before_request
def log_request():
    if current_app.config.get('LOG_REQUESTS'):
        current_app.logger.debug('whatever')
        # Or if you don't want to use a logger, implement
        # whatever system you prefer here
        # print request.headers
        # open(current_app.config['REQUEST_LOG_FILE'], 'w').write('...')
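To dump more than a fixed string, the handler could build the log message from the request object. A rough sketch: format_request is a hypothetical helper (not a Flask API) that takes plain values so it can be tried without Flask. Inside log_request you would presumably call it with request.method, request.url, request.remote_addr, request.headers.items() and request.get_data().

```python
def format_request(method, url, remote_addr, headers, body):
    """Build a readable multi-line dump of one HTTP request.

    headers: iterable of (name, value) pairs.
    body: raw payload as bytes or text.
    """
    lines = ['%s %s from %s' % (method, url, remote_addr)]
    for name, value in headers:
        lines.append('%s: %s' % (name, value))
    lines.append('')  # blank line between headers and payload
    if isinstance(body, bytes):
        # Payloads are not guaranteed to be valid UTF-8; never fail on them.
        body = body.decode('utf-8', 'replace')
    lines.append(body)
    return '\n'.join(lines)
```

Keeping the formatting in one function means the same dump can go to current_app.logger.debug or to a file, whichever you prefer.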
I want to proxy requests made to my Flask app to another web service running locally on the machine. I'd rather use Flask for this than our higher-level nginx instance so that we can reuse our existing authentication system built into our app. The more we can keep this "single sign on" the better.
Is there an existing module or other code to do this? Trying to bridge the Flask app through to something like httplib or urllib is proving to be a pain.
I spent a good deal of time working on this same thing and eventually found a solution using the requests library that seems to work well. It even handles setting multiple cookies in one response, which took a bit of investigation to figure out. Here's the flask view function:
from dotenv import load_dotenv  # pip package python-dotenv
import os
#
from flask import request, Response
import requests  # pip package requests
load_dotenv()
API_HOST = os.environ.get('API_HOST'); assert API_HOST, 'Envvar API_HOST is required'
@api.route('/', defaults={'path': ''})  # ref. https://medium.com/@zwork101/making-a-flask-proxy-server-online-in-10-lines-of-code-44b8721bca6
@api.route('/<path:path>')  # the path converter also matches URLs that contain slashes
def redirect_to_API_HOST(path):  #NOTE var :path will be unused as all path we need will be read from :request ie from flask import request
    res = requests.request(  # ref. https://stackoverflow.com/a/36601467/248616
        method          = request.method,
        url             = request.url.replace(request.host_url, f'{API_HOST}/'),
        headers         = {k:v for k,v in request.headers if k.lower() != 'host'},  # forward every header except Host
        data            = request.get_data(),
        cookies         = request.cookies,
        allow_redirects = False,
    )
    #region exclude some keys in :res response
    excluded_headers = ['content-encoding', 'content-length', 'transfer-encoding', 'connection']  #NOTE we here exclude all "hop-by-hop headers" defined by RFC 2616 section 13.5.1 ref. https://www.rfc-editor.org/rfc/rfc2616#section-13.5.1
    headers          = [
        (k,v) for k,v in res.raw.headers.items()
        if k.lower() not in excluded_headers
    ]
    #endregion exclude some keys in :res response
    response = Response(res.content, res.status_code, headers)
    return response
Update April 2021: excluded_headers should probably include all "hop-by-hop headers" defined by RFC 2616 section 13.5.1.
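A sketch of what that fuller exclusion could look like. The eight hop-by-hop names are the ones listed in RFC 2616 section 13.5.1; filter_response_headers and the two constant names are my own, and dropping content-encoding and content-length as well is an assumption that the proxy re-sends an already decoded body, as res.content is in the view above.

```python
# Hop-by-hop headers as listed in RFC 2616 section 13.5.1.
HOP_BY_HOP = frozenset([
    'connection', 'keep-alive', 'proxy-authenticate', 'proxy-authorization',
    'te', 'trailers', 'transfer-encoding', 'upgrade',
])
# Not hop-by-hop, but stale once the body has been decoded and re-sent.
BODY_DEPENDENT = frozenset(['content-encoding', 'content-length'])


def filter_response_headers(items):
    """Return the (name, value) pairs that are safe to pass downstream.

    Works on a list of pairs rather than a dict so that repeated headers
    such as Set-Cookie survive.
    """
    excluded = HOP_BY_HOP | BODY_DEPENDENT
    return [(k, v) for k, v in items if k.lower() not in excluded]
```

In the view this would replace the list comprehension, e.g. headers = filter_response_headers(res.raw.headers.items()).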
I have an implementation of a proxy using httplib in a Werkzeug-based app (as in your case, I needed to use the webapp's authentication and authorization).
Although the Flask docs don't state how to access the HTTP headers, you can use request.headers (see Werkzeug documentation). If you don't need to modify the response, and the headers used by the proxied app are predictable, proxying is straightforward.
Note that if you don't need to modify the response, you should use werkzeug.wsgi.wrap_file to wrap httplib's response stream. That allows passing the open OS-level file descriptor to the HTTP server for optimal performance.
My original plan was for the public-facing URL to be something like http://www.example.com/admin/myapp proxying to http://myapp.internal.example.com/. Down that path leads madness.
Most webapps, particularly self-hosted ones, assume that they're going to be running at the root of an HTTP server and do things like reference other files by absolute path. To work around this, you have to rewrite URLs all over the place: Location headers and HTML, JavaScript, and CSS files.
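To give an idea of the smallest piece of that rewriting, here is a sketch for Location headers only. It is not the blueprint I wrote; rewrite_location and its parameters are illustrative names, and the URLs in the usage note are the example ones from above. HTML, JavaScript and CSS bodies need the same treatment on every embedded URL, which is where the regular expressions pile up.

```python
from urllib.parse import urlsplit


def rewrite_location(location, internal_base, public_base):
    """Map a redirect target from the proxied app onto the public prefix."""
    public = public_base.rstrip('/')
    target = urlsplit(location)
    if target.netloc:
        if target.netloc != urlsplit(internal_base).netloc:
            return location  # redirect to a third-party host: leave alone
        rest = target.path
        if target.query:
            rest += '?' + target.query
        if target.fragment:
            rest += '#' + target.fragment
        return public + rest
    if location.startswith('/'):
        # Host-relative path: the app assumes it lives at the server root.
        return public + location
    return location  # path-relative redirects resolve on the client
```

For example, with internal_base 'http://myapp.internal.example.com/' and public_base 'http://www.example.com/admin/myapp', a redirect to '/login' becomes 'http://www.example.com/admin/myapp/login'.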
I did write a Flask proxy blueprint which did this, and while it worked well enough for the one webapp I really wanted to proxy, it was not sustainable. It was a big mess of regular expressions.
In the end, I set up a new virtual host in nginx and used its own proxying. Since both were at the root of the host, URL rewriting was mostly unnecessary. (And what little was necessary, nginx's proxy module handled.) The webapp being proxied to does its own authentication which is good enough for now.