Weird `/` behavior when Serving Flask with Apache2 + Gunicorn

I'm trying to build multiple endpoints and subendpoints within my application, partly as a learning exercise and partly because I have 2 domains.
For simplicity I'm going to refer to them as domain1 and domain2.
My Flask listening endpoints are on /api1 and /api2 for domains 1 & 2 respectively. Gunicorn is bound to listen on a unix socket at sock/domain1.sock and sock/domain2.sock. So far everything is working this way.
My Apache2 proxies the endpoints into the proper socket as follows:
for domain1 I have:
<Location /api>
ProxyPass unix:/var/www/socks/domain1.sock|http://127.0.0.1/api1
ProxyPassReverse unix:/var/www/socks/domain1.sock|http://127.0.0.1/api1
</Location>
for domain2 I have:
<Location /api>
ProxyPass unix:/var/www/socks/domain2.sock|http://127.0.0.1/api2
ProxyPassReverse unix:/var/www/socks/domain2.sock|http://127.0.0.1/api2
</Location>
I know that I don't need to have 2 sockets, but I'm doing so just for testing.
Now when I open domain1.com/api, things work perfectly, and the same goes for domain2.com/api.
But when I open domain1.com/api/ (with a slash at the end) or domain2.com/api/, it gives me a Site Not Found error. This is understandable, since in my Flask app I'm listening on the endpoint without a trailing slash. The fix is to add a trailing / to my Flask endpoint. But when I do that, the weird behavior occurs.
New Flask listening Endpoints are /api1/ and /api2/ (with trailing slash).
Now when I open domain.com/api/ it works as intended. But when I'm on domain.com/api (without the slash) it redirects me to either domain.comapi or 127.0.0.1/api, both of which are wrong. I tried adding a trailing slash in my Apache config and tried multiple Flask approaches, but they all show the same weird behavior and I can't understand why. Personally I don't mind using the endpoint without the slash; I just want to understand why this is happening. I also googled a lot, but nothing related to my query came up.
Reproducible Behavior:
I'm unable to link the 2nd domain as it is a protected IP for my company, so I created multiple endpoints so that you can click on to simulate the behavior.
https://thethiny.xyz/api1 -> sock|http://127.0.0.1/api1 -> internal /api1
https://thethiny.xyz/api2 -> sock|http://127.0.0.1/api2 -> internal /api2/
https://thethiny.xyz/api3 -> sock|http://127.0.0.1/api1/ -> internal /api1
https://thethiny.xyz/api4 -> sock|http://127.0.0.1/api2/ -> internal /api2
Working:
https://thethiny.xyz/api1
https://thethiny.xyz/api2/
https://thethiny.xyz/api4
Not Found:
https://thethiny.xyz/api1/
https://thethiny.xyz/api3
https://thethiny.xyz/api3/
Weird Redirect:
https://thethiny.xyz/api2
https://thethiny.xyz/api4/
Edit: I understand the problem and have come up with some solutions in the answer below. I'm not satisfied with the solutions, but I'm taking this as a limitation of mapping endpoints to different underlying endpoints. For more information, read about Reverse Proxy Pass and Redirects and Rewriting Location Header in HTTPd.

I now understand the problem. My Apache proxy hands the request to Flask on the specified endpoint, 127.0.0.1/api2, so when Flask issues a redirect it redirects to 127.0.0.1/api2/, since Flask has no information about the original URL. Using ProxyPreserveHost solves this only when the endpoint resources match, as in mapping /api2/ to /api2 but not /api4/ to /api2/: on the redirect, Flask receives a request for /api2, redirects to /api2/, and returns that with no knowledge of /api4. Unfortunately, I don't think there's an actual solution to this in the Apache2/Flask configuration other than handling the routes manually, exactly as you want them. That is, do not let Flask redirect automatically, since it will not know how; instead, either redirect manually (external redirect) to the correct endpoint, or handle each route separately (/api and /api/stuff but not /api/).
Example:
# view_func must be a callable, so pass the function itself rather than calling it
app.add_url_rule("/api2", view_func=StubFunction, redirect_to="/api2/")
app.add_url_rule("/api2/", view_func=ActualFunction)
And add ProxyPreserveHost On to your Apache2 config, or use Werkzeug's built-in ProxyFix middleware if you don't want to modify your Virtual Hosts:
from werkzeug.middleware.proxy_fix import ProxyFix
app.wsgi_app = ProxyFix(app.wsgi_app, x_proto=1, x_host=1)
What happens now is that 127.0.0.1 gets translated to yourdomain.tld when delivered to your Flask app. So when you're redirecting back using redirect_to, you're redirecting to your domain externally, no longer relatively. So in the case above, /api2 is redirecting to myDomain.tld/api2/ then /api2/ is called, which is functional.
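To make the host handling above concrete, here is a minimal standard-library sketch. It is not Flask's or Werkzeug's actual code: `slash_redirect_app` is a hypothetical stand-in for an app that redirects to the trailing-slash URL, and `TrustForwardedHost` is a simplified, hypothetical version of what a proxy-fixing middleware does. It assumes the proxy sends X-Forwarded-Host and X-Forwarded-Proto headers.

```python
from wsgiref.util import setup_testing_defaults


def slash_redirect_app(environ, start_response):
    """Hypothetical stand-in for an app that redirects /path to /path/."""
    # The absolute redirect target is built from whatever host the app sees.
    scheme = environ["wsgi.url_scheme"]
    host = environ["HTTP_HOST"]
    location = f"{scheme}://{host}{environ['PATH_INFO']}/"
    start_response("308 Permanent Redirect", [("Location", location)])
    return [b""]


class TrustForwardedHost:
    """Simplified sketch of a proxy-fixing middleware (not Werkzeug's code)."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Only safe when a trusted proxy sets these headers.
        forwarded_host = environ.get("HTTP_X_FORWARDED_HOST")
        forwarded_proto = environ.get("HTTP_X_FORWARDED_PROTO")
        if forwarded_host:
            environ["HTTP_HOST"] = forwarded_host
        if forwarded_proto:
            environ["wsgi.url_scheme"] = forwarded_proto
        return self.app(environ, start_response)


def redirect_location(app, path, extra_environ=None):
    """Call a WSGI app directly and return its Location header."""
    environ = {"PATH_INFO": path, "HTTP_HOST": "127.0.0.1"}
    setup_testing_defaults(environ)
    environ.update(extra_environ or {})
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured.update(headers)

    app(environ, start_response)
    return captured["Location"]


# Without the fix, the redirect points at the proxy's backend address.
plain = redirect_location(slash_redirect_app, "/api2")

# With the forwarded headers trusted, it points at the public domain.
fixed = redirect_location(
    TrustForwardedHost(slash_redirect_app),
    "/api2",
    {"HTTP_X_FORWARDED_HOST": "yourdomain.tld", "HTTP_X_FORWARDED_PROTO": "https"},
)
```

Note that only the host and scheme are corrected here. The path in the redirect is still the backend path, which is why mapping /api4/ to /api2/ keeps redirecting to the wrong place.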
You can also skip the preserve host and manually put in your domain name in the redirect as so:
app.add_url_rule("/api2", view_func=StubFunction, redirect_to="https://yourDomain.tld/api2/")
But I don't like this approach in case you change your domain for some reason.
tl;dr, don't put a trailing slash in your ProxyPass Applications.

Related

Flask : double slash in urls

I'm having some trouble with Flask deployed under gunicorn + nginx. My website is behind a reverse proxy that serves it at the URL http://example.com/identity/.
I have several URLs in my Flask application, and for each one (for example "index"):
when I try to access http://example.com/identity/index/ (with the trailing slash) it goes right to the URL
when I try to access http://example.com/identity/index (without the trailing slash) it goes to http://example.com/identity//index/? (notice the double slash and question mark).
In Flask, the route associated with index is: @route('/identity/index'). I guess my problem is "normal" Flask behaviour, but I would like to access index without any slash and keep my normal URL. The same goes for redirections using redirect(url_for('index')), for example.

Actually it was a problem with my Nginx configuration: I was running under https from the outside and under http between my servers. So I used this snippet, http://flask.pocoo.org/snippets/35/, to force https URLs in my application, and now it works!

How to detect which of the two virtual hosts is being used in python and flask

I have a website developed in flask running on an apache2 server that responds on port 80 to two URLs
Url-1 http://www.example.com
Url-2 http://oer.example.com
I want to detect which of the two URLs the user is coming in from, adjust what the server does accordingly, and store the result in a config variable:
app.config['SITE'] = 'OER'
or
app.config['SITE'] = 'WWW'
Looking around on the internet I can find lots of examples using urllib2. The issue is that you need to pass it the URL you want to slice, and I can't find a way to pull that out, as it may change between the two with each request.
I could fork the code and put up two different versions but that's as ugly as a box of frogs.
Thoughts welcome.

Use the Flask request object (from flask import request) and one of the following in your request handler:
hostname = request.environ.get('HTTP_HOST', '')
or, after from urllib.parse import urlparse:
url = urlparse(request.url)
hostname = url.netloc
This will get e.g. oer.example.com or www.example.com. If there is a port number that will be included too. Keep in mind that this ultimately comes from the client request so "bad" requests might have it set wrong, although hopefully apache wouldn't route those to your app.
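As a minimal sketch of turning that hostname into the 'OER' / 'WWW' label from the question: `site_for_host` is a hypothetical helper, and the mapping and the fallback to 'WWW' are assumptions for illustration.

```python
def site_for_host(host_header, default="WWW"):
    """Map a Host header value to a site label (hypothetical helper)."""
    # Drop an optional port and normalise case; the Host value is client-supplied.
    # This simple split does not handle IPv6 literals such as "[::1]:80".
    hostname = host_header.split(":", 1)[0].strip().lower()
    # Assumed mapping for the two example hostnames from the question.
    sites = {"www.example.com": "WWW", "oer.example.com": "OER"}
    return sites.get(hostname, default)
```

In a Flask handler this could be called as `site_for_host(request.host)` on each request. Since app.config is shared by all requests, a per-request location (for instance an attribute on flask.g) may be a better place for the result than app.config['SITE'].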

Can't reach Eve REST API

I'm using Eve to create a REST API for MongoDB.
It's all working fine, except for the fact that I can't reach the API from any other computer (in the same network), or even a different URL (e.g.: if I set SERVER_NAME = 'localhost:29000', I will not be able to reach the API with 127.0.0.1 and vice versa).
I've been looking around for hours, and I can't seem to find an answer. I also tried other REST APIs for MongoDB like Kule, and they seem to work just fine, but they don't have as many options as Eve has.

Eve's SERVER_NAME seems to be based on the configuration variable of the same name from Flask: see "More on server name" below the table in the Flask Configuration docs. So it's really just for the name (hostname / subdomain handling); the actual network interfaces it binds to are therefore probably determined by the server that runs the WSGI application.
If you're just doing the
app = Eve()
app.run()
from the quickstart example, try
app.run(host='0.0.0.0')
instead and leave the server name empty (SERVER_NAME = '').
I've never used Eve, but from what I understand about how it's built, that should work.

Flask request.remote_addr is wrong on webfaction and not showing real user IP

I just deployed a Flask app on Webfaction and I've noticed that request.remote_addr is always 127.0.0.1, which of course isn't of much use.
How can I get the real IP address of the user in Flask on Webfaction?
Thanks!

If there is a proxy in front of Flask, then something like this will get the real IP in Flask:
if request.headers.getlist("X-Forwarded-For"):
    ip = request.headers.getlist("X-Forwarded-For")[0]
else:
    ip = request.remote_addr
Update: Very good point mentioned by Eli in his comment. There could be some security issues if you just simply use this. Read Eli's post to get more details.

Werkzeug middleware
Flask's documentation is pretty specific about recommended reverse proxy server setup:
If you deploy your application using one of these [WSGI] servers behind an HTTP [reverse] proxy you will need to rewrite a few headers in order for the application to work [properly]. The two problematic values in the WSGI environment usually are REMOTE_ADDR and HTTP_HOST... Werkzeug ships a fixer that will solve some common setups, but you might want to write your own WSGI middleware for specific setups.
And also about security consideration:
Please keep in mind that it is a security issue to use such a middleware in a non-proxy setup because it will blindly trust the incoming headers which might be forged by malicious clients.
The suggested code (that installs the middleware) that will make request.remote_addr return client IP address is:
from werkzeug.contrib.fixers import ProxyFix
app.wsgi_app = ProxyFix(app.wsgi_app, num_proxies=1)
Note num_proxies, which is 1 by default. It's the number of proxy servers in front of the app. (In Werkzeug 0.15 and later, ProxyFix lives in werkzeug.middleware.proxy_fix and num_proxies is replaced by arguments such as x_for.)
The actual code is as follows (latest werkzeug==0.14.1 at the time of writing):
def get_remote_addr(self, forwarded_for):
    if len(forwarded_for) >= self.num_proxies:
        return forwarded_for[-self.num_proxies]
Webfaction
Webfaction's documentation about Accessing REMOTE_ADDR says:
...the IP address is available as the first IP address in the comma separated list in the HTTP_X_FORWARDED_FOR header.
They don't say what they do when a client request already contains X-Forwarded-For header, but following common sense I would assume they replace it. Thus for Webfaction num_proxies should be set to 0.
Nginx
Nginx is more explicit about its $proxy_add_x_forwarded_for:
the “X-Forwarded-For” client request header field with the $remote_addr variable appended to it, separated by a comma. If the “X-Forwarded-For” field is not present in the client request header, the $proxy_add_x_forwarded_for variable is equal to the $remote_addr variable.
For Nginx in front of the app num_proxies should be left at default 1.
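The reason for counting from the right can be shown with a small standard-library sketch. `client_ip` is a hypothetical helper, not Werkzeug's implementation, and unlike the quoted Werkzeug code a value of 0 here means that no header is trusted at all. The addresses are made up for illustration.

```python
def client_ip(forwarded_for_header, remote_addr, trusted_proxies=1):
    """Pick the client address from X-Forwarded-For (hypothetical helper)."""
    # Split the comma-separated header into individual addresses.
    hops = [part.strip() for part in forwarded_for_header.split(",") if part.strip()]
    if trusted_proxies < 1 or len(hops) < trusted_proxies:
        # No trusted proxy, or fewer entries than expected: use the socket peer.
        return remote_addr
    # Each trusted proxy appends one entry, so count from the right.
    return hops[-trusted_proxies]


# A client forges "X-Forwarded-For: 1.2.3.4"; an appending proxy such as Nginx
# then adds the address it actually saw (203.0.113.7).
header = "1.2.3.4, 203.0.113.7"
naive = header.split(",")[0].strip()      # leftmost entry is client-controlled
trusted = client_ip(header, "127.0.0.1")  # entry added by the one trusted proxy
```

Taking the leftmost entry returns the forged value, while counting one trusted proxy from the right returns the address the proxy recorded.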

Rewriting Ignas's answer:
headers_list = request.headers.getlist("X-Forwarded-For")
user_ip = headers_list[0] if headers_list else request.remote_addr
Remember to read Eli's post about spoofing considerations.

You can use request.access_route to get the list of IP addresses:
if len(request.access_route) > 1:
    return request.access_route[-1]
else:
    return request.access_route[0]
Update:
You can just write this:
return request.access_route[-1]

The problem is there's probably some kind of proxy in front of Flask. In this case the "real" IP address can often be found in request.headers['X-Forwarded-For'].

Running multiple sites from a single Python web framework [duplicate]

This question already has answers here:
Multiple sites on Django
I know you can do redirection based on the domain or path to rewrite the URI to point at a site-specific location and I've also seen some brutish if and elif statements for every site as shown in the following code, which I would like to avoid.
if site == 'site1':
    ...
elif site == 'site2':
    ...
What are some good and clever ways of running multiple sites from a single, common Python web framework (e.g., Pylons, TurboGears, etc.)?

Django has this built in. See the sites framework.
As a general technique, include a 'host' column in your database schema attached to the data you want to be host-specific, then include the Host HTTP header in the query when you are retrieving data.
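A minimal sketch of the host-column technique, using the standard library's sqlite3 with an in-memory database. The table, its columns, the sample rows, and the `page_title` helper are all hypothetical and only illustrate scoping a query by the Host header.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (host TEXT, slug TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [
        ("site1.com", "home", "Welcome to site 1"),
        ("site2.com", "home", "Welcome to site 2"),
    ],
)


def page_title(conn, host_header, slug):
    """Fetch a page title scoped to the requesting host (hypothetical helper)."""
    host = host_header.split(":", 1)[0].lower()  # drop an optional port
    row = conn.execute(
        "SELECT title FROM pages WHERE host = ? AND slug = ?", (host, slug)
    ).fetchone()
    return row[0] if row else None
```

The same handler code then serves every site, with the host value selecting the data rather than an if/elif chain per site.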

Using Django on apache with mod_python, I host multiple (unrelated) django sites simply with the following apache config:
<VirtualHost 1.2.3.4>
DocumentRoot /www/site1
ServerName site1.com
<Location />
SetHandler python-program
SetEnv DJANGO_SETTINGS_MODULE site1.settings
PythonPath "['/www'] + sys.path"
PythonDebug On
PythonInterpreter site1
</Location>
</VirtualHost>
<VirtualHost 1.2.3.4>
DocumentRoot /www/site2
ServerName site2.com
<Location />
SetHandler python-program
SetEnv DJANGO_SETTINGS_MODULE site2.settings
PythonPath "['/www'] + sys.path"
PythonDebug On
PythonInterpreter site2
</Location>
</VirtualHost>
No need for multiple apache instances or proxy servers. Using a different PythonInterpreter directive for each site (the name you enter is arbitrary) keeps the namespaces separate.

I use CherryPy as my web server (which comes bundled with Turbogears), and I simply run multiple instances of the CherryPy web server on different ports bound to localhost. Then I configure Apache with mod_proxy and mod_rewrite to transparently forward requests to the proper port based on the HTTP request.

Using multiple server instances on local ports is a good idea, but you don't need a full featured web server to redirect HTTP requests.
I would use pound as a reverse proxy to do the job. It is small, fast, simple and does exactly what we need here.
WHAT POUND IS:
a reverse-proxy: it passes requests from client browsers to one or more back-end servers.
a load balancer: it will distribute the requests from the client browsers among several back-end servers, while keeping session information.
an SSL wrapper: Pound will decrypt HTTPS requests from client browsers and pass them as plain HTTP to the back-end servers.
an HTTP/HTTPS sanitizer: Pound will verify requests for correctness and accept only well-formed ones.
a fail over-server: should a back-end server fail, Pound will take note of the fact and stop passing requests to it until it recovers.
a request redirector: requests may be distributed among servers according to the requested URL.
