I have a Python Tornado server sitting behind an nginx frontend. Every now and then, but not every time, I get a 502 error. When I look in the nginx access log I see this:
127.0.0.1 - - [02/Jun/2010:18:04:02 -0400] "POST /a/question/updates HTTP/1.1" 502 173 "http://localhost/tagged/python" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3"
and in the error log:
2010/06/02 18:04:02 [error] 14033#0: *1700 connect() failed (111: Connection refused)
while connecting to upstream, client: 127.0.0.1, server: _,
request: "POST /a/question/updates HTTP/1.1",
upstream: "http://127.0.0.1:8888/a/question/updates", host: "localhost", referrer: "http://localhost/tagged/python"
I don't think any errors show up in the Tornado log. How would you go about debugging this? Is there something I can put in the Tornado or nginx configuration to help debug this?
The line from the error log is very informative in my opinion. It says the connection was refused by the upstream, it contains client IP, Nginx server config, request line, hostname, upstream URL and referrer.
It is pretty clear you must look at the upstream (or firewall) to find out the reason.
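A quick way to confirm what the log says is to probe the upstream port directly while the 502s are happening. The sketch below is only an illustration: it assumes the upstream address 127.0.0.1:8888 from the error log, and the helper name `probe_upstream` and its timeout are made up for this example, not part of nginx or Tornado.

```python
import socket


def probe_upstream(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) is accepted."""
    try:
        # create_connection raises OSError (for example
        # ConnectionRefusedError) when nothing is listening on the port
        # or the connect attempt times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `probe_upstream("127.0.0.1", 8888)` in a loop alongside the failing requests should show whether Tornado stops accepting connections (for example while it is restarting or has crashed) at the moments nginx logs the refusal.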
If you'd like to see how Nginx processes the request, and why it chooses a specific server and location section, there is a very useful "debug" mode. (Note: your Nginx binary must be built with the --with-debug configure option.) Then:
error_log /path/to/your/error.log debug;
will turn on debugging for all requests. The debugging output in the error log takes some time to learn to read, but it is worth the effort.
Do not use this as is on high-traffic sites! It generates a lot of output and your error log will grow very fast. If you need to debug requests in production, use the debug_connection directive:
events {
    debug_connection 1.2.3.4;
}
It turns debugging on for the specific client IP address only.
I have 2 containers, client and server. In server, I started a python SimpleHTTPServer:
python3 -m http.server
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
on client, I started a request to it:
import requests
with requests.Session() as s:
    r = s.get("http://172.17.0.3:8000")
The server successfully received the request:
172.17.0.2 - - [26/Oct/2022 00:17:29] "GET / HTTP/1.1" 200 -
Now, I am inspecting the state of the TCP connections in server using ss and netstat and I see the TCP connections are still open for a little while before actually getting discarded, even when my python script already finished.
What's the point of opening a requests.Session() in a context manager if the connection is going to be left open anyway? According to the requests documentation:
Sessions can also be used as context managers:
with requests.Session() as s:
    s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
This will make sure the session is closed as soon as the with block is exited, even if unhandled exceptions occurred.
This says the Session is closed but doesn't really specify that the TCP connection is closed. What's the point of closing the Session, if the TCP connection is going to be left open anyways? Is there a way to close the TCP connection right after the context manager finishes? Is it even recommended?
I have a Django app with which users can create video collages from multiple videos. The problem is that in production, when uploading videos to Amazon S3, I get a 502 Bad Gateway (it works fine locally). I already set
client_max_body_size 100M
and
fastcgi_buffers 8 16k;
fastcgi_buffer_size 32k;
fastcgi_connect_timeout 3000;
fastcgi_send_timeout 3000;
fastcgi_read_timeout 3000;
Does anyone know what could be wrong? Thanks in advance
Full error:
2017/12/31 23:50:51 [error] 1279#1279: *1 upstream prematurely closed connection while reading response header from upstream,
client: 107.205.110.154,
server: movingcollage.com,
request: "POST /create-collage/ HTTP/1.1",
upstream: "http://unix:/home/mike/movingcollage/movingcollage.sock:/create-collage/",
host: "movingcollage.com", referrer: "http://movingcollage.com/create-collage/"
If the problem were an nginx timeout, you would get a 504 error. A 502 with "upstream prematurely closed connection" means the process behind nginx (gunicorn in your case, I guess) dropped the connection, which can happen when its own timeout expires. Try launching it with the -t 3000 parameter (to match your nginx conf).
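If the backend is indeed gunicorn (an assumption, since the question does not say), the same timeout can live in a config file instead of on the command line. gunicorn config files are plain Python, and `timeout` is the setting behind `-t`. The socket path below is copied from the error log; treat the file as a sketch rather than a complete configuration.

```python
# gunicorn.conf.py: a minimal sketch, assuming gunicorn serves the Django app.

# Unix socket that nginx proxies to (path taken from the nginx error log).
bind = "unix:/home/mike/movingcollage/movingcollage.sock"

# Seconds a worker may stay silent before the master kills and restarts it.
# gunicorn's default is 30; 3000 matches the timeouts in the nginx config.
timeout = 3000
```

gunicorn picks the file up when started with `-c gunicorn.conf.py`. A shorter value that still covers the slowest upload is usually preferable to a blanket 3000 seconds.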
I am having a problem with a misbehaving HTTP Proxy server. I have no control over the proxy server, unfortunately -- it's an 'enterprise' product from IBM. The proxy server is part of a service virtualization solution being leveraged for software testing.
The fundamental issue (I think*) is that the proxy server sends back HTTP/1.0 responses. I can get it to work fine from SOAP UI (a Java application) and from curl on the command line, but Python refuses to connect. From what I can tell, Python is behaving correctly, and the other two are not, as the server expects HTTP/1.1 responses (it wants Host headers, at the very least, to route the service request to a given stub).
Is there a way to get Requests, or the underlying urllib3, or the http library even further down, to always use HTTP/1.1, even if the other end appears to be using 1.0?
Here is a sample program to reproduce the problem (unfortunately, you need an IBM Rational Integration Tester installation with RTCP to really replicate it):
import http.client as http_client
http_client.HTTPConnection.debuglevel = 1
import logging
import requests
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True
requests.post("https://host:8443/axl",
headers={"soapAction": '"CUCM:DB ver=9.1 updateSipTrunk"'},
data='<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:tns="http://www.cisco.com/AXL/API/9.1"><soapenv:Header/><soapenv:Body><tns:updateSipTrunk><name>PLACEHOLDER</name><newName>PLACEHOLDER</newName><destinations><destination><addressIpv4>10.10.1.5</addressIpv4><sortOrder>1</sortOrder></destination></destinations></tns:updateSipTrunk></soapenv:Body></soapenv:Envelope>',
verify=False)
(Proxy is configured via HTTPS_PROXY environment variable)
Debug output before the error, note the HTTP/1.0:
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): host.com
send: b'CONNECT host.com:8443 HTTP/1.0\r\n'
send: b'\r\n'
header: Host: host.com:8443
header: Proxy-agent: Green Hat HTTPS Proxy/1.0
The exact error text that occurs in RHEL 6 is:
requests.exceptions.SSLError: [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:646)
Even though the Host header is shown here, it does NOT show up on the wire. I confirmed this with a tcpdump:
14:03:14.315049 IP sourcehost.53214 > desthost.com: Flags [P.], seq 0:32, ack 1, win 115, options [nop,nop,TS val 2743933964 ecr 4116114841], length 32
0x0000: 0000 0c07 ac00 0050 56b5 4044 0800 4500 .......PV.@D..E.
0x0010: 0054 3404 4000 4006 2ca0 0af8 3f15 0afb .T4.@.@.,...?...
0x0020: 84f8 cfde 0c7f a4f8 280a 4ebd b425 8018 ........(.N..%..
0x0030: 0073 da46 0000 0101 080a a38d 1c0c f556 .s.F...........V
0x0040: XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX ..CONNECT.host
0x0050: XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX xx:8443.HTTP/1.0
0x0060: 0d0a
When I curl it with verbose, this is what the output looks like:
* About to connect() to proxy proxy-host.com port 3199 (#0)
* Trying 10.**.**.** ... connected
* Connected to proxy-host.com (10.**.**.**) port 3199 (#0)
* Establish HTTP proxy tunnel to host.com:8443
> CONNECT host.com:8443 HTTP/1.1
> Host: host.com:8443
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.19.1 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Proxy-Connection: Keep-Alive
> soapAction: "CUCM:DB ver=9.1 updateSipTrunk"
>
< HTTP/1.0 200 OK
< Host: host.com:8443
< Proxy-agent: Green Hat HTTPS Proxy/1.0
<
* Proxy replied OK to CONNECT request
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* CAfile: /path/to/store/ca-bundle.crt
CApath: none
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
Truncated after this point. You can see the HTTP/1.0 response from the proxy after connecting. The curl's tcpdump also clearly shows the host header, as well as HTTP 1.1.
*I can't be entirely sure this is the fundamental issue, as I can't test it. I do see HTTP/1.0 responses, and can tell that my non-working Python code sends CONNECT HTTP/1.0 messages, while the working Java sends HTTP/1.1 messages, as does Curl. It's possible the problem is unrelated (although I find that unlikely) or that Python is misbehaving, and not Java/curl. I simply don't know enough to know for sure.
So, is there a way to force urllib3/requests to use HTTP v1.1 at all times?
httplib (which requests relies upon for HTTP(S) heavy lifting) always uses HTTP/1.0 with CONNECT:
Lib/httplib.py:788:
def _tunnel(self):
    self.send("CONNECT %s:%d HTTP/1.0\r\n" % (self._tunnel_host,
                                              self._tunnel_port))
    for header, value in self._tunnel_headers.iteritems():
        self.send("%s: %s\r\n" % (header, value))
    self.send("\r\n")
    <...>
So you can't "force" it to use "HTTP/1.1" here other than by editing the subroutine.
This MAY be the problem if the proxy doesn't support HTTP/1.0. In particular, 1.0 does not require a Host: header, and indeed, as you can see by comparing your log output with the code above, httplib does not send it, while a proxy may in practice expect it regardless. But if that were the case, you should have gotten an error from the proxy in response to CONNECT, unless the proxy is so broken that it substitutes some default (or garbage) for Host:, returns 200 anyway and tries to connect God-knows-where, at which point you get timeouts.
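One way to test that hypothesis without patching httplib is to speak to the proxy by hand. The sketch below sends an HTTP/1.1 CONNECT with a Host header over a plain socket and returns the first line of the proxy's reply. `build_connect_request` and `probe_proxy` are hypothetical helper names invented for this example, and the sketch stops before the TLS handshake, so it only shows how the proxy answers the CONNECT itself.

```python
import socket


def build_connect_request(target_host, target_port):
    """Build an HTTP/1.1 CONNECT request that includes a Host header."""
    authority = "%s:%d" % (target_host, target_port)
    lines = [
        "CONNECT %s HTTP/1.1" % authority,
        "Host: %s" % authority,
        "",  # the two empty items produce the blank line
        "",  # that terminates the header block
    ]
    return "\r\n".join(lines).encode("ascii")


def probe_proxy(proxy_host, proxy_port, target_host, target_port, timeout=10.0):
    """Send the CONNECT request and return the proxy's status line."""
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(build_connect_request(target_host, target_port))
        reply = sock.recv(4096)
    # Only the first line of the reply is returned, e.g. "HTTP/1.0 200 OK".
    return reply.split(b"\r\n", 1)[0].decode("ascii", "replace")
```

With the addresses from the curl output in the question, the call would be `probe_proxy("proxy-host.com", 3199, "host.com", 8443)`. If the proxy answers this request the same way it answers the HTTP/1.0 one, the missing Host header is probably not the cause.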
You can make httplib add the Host: header to CONNECT by adding it to _tunnel_headers (indirectly):
import os
import requests

s = requests.Session()
proxy_url = os.environ['HTTPS_PROXY']
s.proxies["https"] = proxy_url
# We have to specify the proxy here because the env variable is only detected by httplib code,
# while we need to trigger requests' proxy logic, which acts earlier.
# "https" means any https host. Since a Session persists cookies,
# it's meaningless to make requests to multiple hosts through it anyway.
pm = s.get_adapter("https://").proxy_manager_for(proxy_url)
pm.proxy_headers['Host'] = "host.com"
del pm, proxy_url
<...>
s.get('https://host.com')
If you do not depend on the requests library you may find the following snippet useful:
import http.client
conn = http.client.HTTPSConnection("proxy.domain.lu", 8080)
conn.set_tunnel("www.domain.org", 443, headers={'User-Agent': 'curl/7.56.0'})
conn.request("GET", "/api")
response = conn.getresponse()
print(response.read())
I have a setup with nginx, uwsgi, and gevent. When testing the setup's ability to handle premature client disconnects, I found that uwsgi isn't exactly responding in a timely manner.
This is how I detect that a disconnect has occurred inside of my python code:
while True:
    if 'uwsgi' in sys.modules:
        import uwsgi  # @UnresolvedImport
        fileDescriptor = uwsgi.connection_fd()
        if not uwsgi.is_connected(fileDescriptor):
            logger.debug("Connection was lost (client disconnect)")
            break
So when uwsgi signals a loss of connection, I break out of this loop. There's also a call to gevent.sleep(2) at the bottom of the loop to prevent hammering the CPU.
With that in place I have nginx logging the close connection like this:
2016/08/16 19:23:23 [info] 32452#0: *1 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while sending to client, client: 192.168.56.1, server: <removed>, request: "GET /myurl HTTP/1.1", upstream: "uwsgi://127.0.0.1:8070", host: "<removed>:8443"
nginx is aware of the disconnect immediately: it produces this log entry within milliseconds of the client disconnecting. Yet uwsgi doesn't seem to be aware of the disconnect until seconds, sometimes almost a minute, later, at least in terms of notifying my code:
DEBUG - Connection was lost (client disconnect) - 391 ms[08/16/16 19:24:04 UTC])
The uwsgi.log file created via daemonize suggests it somehow saw it a second before nginx but somehow waited half a minute to actually tell my code:
[pid: 32208|app: 0|req: 2/2] 192.168.56.1 () {32 vars in 382 bytes} [Tue Aug 16 19:23:22 2016] GET /myurl => generated 141 bytes in 42030 msecs (HTTP/1.1 200) 2 headers in 115 bytes (4 switches on core 999
This is my setup in nginx:
upstream bottle {
    server 127.0.0.1:8070;
}
server {
    listen 8443;
    ssl on;
    ssl_certificate /etc/pki/tls/certs/server.crt;
    ssl_certificate_key /etc/pki/tls/private/server.key;
    server_name <removed>;
    # Load configuration files for the default server block.
    include /etc/nginx/default.d/*.conf;
    location / {
        include uwsgi_params;
        #proxy_read_timeout 5m;
        uwsgi_buffering off;
        uwsgi_ignore_client_abort off;
        proxy_ignore_client_abort off;
        proxy_cache off;
        chunked_transfer_encoding off;
        #uwsgi_read_timeout 5m;
        #uwsgi_send_timeout 5m;
        uwsgi_pass bottle;
    }
}
The odd part to me is how the timestamp from uwsgi is saying it saw it right when nginx did, however it doesn't write that entry until my code sees it ~30 seconds later. It appears from my perspective, that uwsgi is essentially lying or locking it up, yet I can't find any errors from it.
Any help is appreciated. I've attempted to remove any buffering and delays from nginx without any success.
I have my Python Flask web app hosted on nginx. When I try to execute a request, a timeout error shows up in the nginx error log, as shown below:
[error] 2084#0: *1 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.2.224, server: 192.168.2.131, request: "POST /execute HTTP/1.1", upstream: "uwsgi://unix:/home/jay/PythonFlaskApp/app.sock", host: "192.168.2.131:9000", referrer: "http://192.168.2.131:9000/"
If I try to run the app locally it works fine and responds fine.
Does anyone have any idea what might be wrong?
The error shown in the browser console is:
Gateway Time-out
Here is the nginx config file:
server {
    listen 9000;
    server_name 192.168.2.131;
    location / {
        include uwsgi_params;
        proxy_read_timeout 300;
        uwsgi_pass unix:/home/jay/PythonFlaskApp/app.sock;
    }
}
And here is the Python Fabric code that I am trying to execute. I'm not sure if this is causing the issue, but anyway, here is the code:
from fabric.api import *
from flask import request, jsonify

# `application` is the Flask app object, created elsewhere in the module.
@application.route("/execute", methods=['POST'])
def execute():
    try:
        machineInfo = request.json['info']
        ip = machineInfo['ip']
        username = machineInfo['username']
        password = machineInfo['password']
        command = machineInfo['command']
        isRoot = machineInfo['isRoot']
        # Fabric expects the host string in user@host form.
        env.host_string = username + '@' + ip
        env.password = password
        resp = ''
        with settings(warn_only=True):
            if isRoot:
                resp = sudo(command)
            else:
                resp = run(command)
        return jsonify(status='OK', message=resp)
    except Exception as e:
        print('Error is ' + str(e))
        return jsonify(status='ERROR', message=str(e))
I have a uWSGI config file for the web app and start it using an upstart script. Here is the uWSGI conf file:
[uwsgi]
module = wsgi
master = true
processes = 5
socket = app.sock
chmod-socket = 660
vacuum = true
die-on-term = true
and here is upstart script
description "uWSGI server instance configured to serve Python Flask App"
start on runlevel [2345]
stop on runlevel [!2345]
setuid jay
setgid www-data
chdir /home/jay/PythonFlaskApp
exec uwsgi --ini app.ini
I have followed the below tutorial on running flask app on nginx
This is likely a problem with the Fabric task, not with Flask. Have you tried isolating / removing Fabric from the application, just for troubleshooting purposes? You could try stubbing out a value for resp, rather than actually executing the run/sudo commands in your function. I would bet that the app works just fine if you do that.
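The stubbing idea can be sketched by pulling the remote call out of the view so that it can be swapped for a fake. This is only an illustration under assumptions: `run_command` and `stub_runner` are hypothetical names, and in the real view the runner would wrap Fabric's run()/sudo() while the returned dict would be passed to jsonify.

```python
def run_command(info, runner):
    """Run info['command'] through `runner` and return a JSON-ready dict."""
    try:
        output = runner(info["command"], info["isRoot"])
        return {"status": "OK", "message": output}
    except Exception as exc:
        return {"status": "ERROR", "message": str(exc)}


def stub_runner(command, is_root):
    """Stand-in for Fabric's run()/sudo(): returns at once, no SSH involved."""
    return "stubbed output for: " + command
```

If the endpoint answers promptly when wired to `stub_runner`, the time is being spent in Fabric or in the SSH connection rather than in Flask, uWSGI or nginx.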
And so that would mean that you've got a problem with Fabric executing the command in question. First thing you should do is verify this by mocking up an example Fabfile on the production server using the info you're expecting in one of your requests, and then running it with fab -f <mock_fabfile.py>.
It's also worth noting that using with settings(warn_only=True): can result in suppression of error messages. I think that you should remove this, since you are in a troubleshooting scenario. From the docs on Managing Output:
warnings: Warning messages. These are often turned off when one expects a given operation to fail, such as when using grep to test existence of text in a file. If paired with setting env.warn_only to True, this can result in fully silent warnings when remote programs fail. As with aborts, this setting does not control actual warning behavior, only whether warning messages are printed or hidden.
As a third suggestion, you can get more info out of Fabric by using the show('debug') context manager, as well as enabling Paramiko's logging:
from fabric.api import env, run
from fabric.context_managers import show

# You can also enable Paramiko's logging like so:
import logging
logging.basicConfig(level=logging.DEBUG)

def my_task():
    with show('debug'):
        run('my command...')
The Fabric docs have some additional suggestions for troubleshooting: http://docs.fabfile.org/en/1.6/troubleshooting.html. (1.6 is an older/outdated version, but the concepts still apply.)