I am using nginx as a reverse proxy in front of a uWSGI server (flask apps).
Due to a memory leak, I use --max-requests to reload workers after a set number of requests.
The issue is the following: when a worker has just started or restarted, the first request it receives hangs between uWSGI and nginx. The processing time inside the Flask app is as quick as usual, but the client waits until uwsgi_send_timeout is triggered.
Using tcpdump to watch the request (nginx is XXX.14 and uWSGI is XXX.11):
You can see in the time column that it hangs for 300 seconds (uwsgi_send_timeout) even though the HTTP request has been received by nginx... uWSGI just doesn't send a [FIN] packet to signal that the connection is closed. Then nginx triggers the timeout and closes the session.
The end client receives a truncated response, with a 200 status code, which is very frustrating.
This happens at every worker reload, and only once: the very first request hangs, no matter how big the request is.
Does anyone have a workaround for this issue? Have I misconfigured something?
uwsgi.ini
[uwsgi]
# Get the location of the app
module = api:app
plugin = python3
socket = :8000
manage-script-name = true
mount = /=api:app
cache2 = name=xxx,items=1024
# Had to increase buffer-size because of big authentication requests.
buffer-size = 8192
## Workers management
# Number of workers
processes = $(UWSGI_PROCESSES)
master = true
# Number of requests managed by 1 worker before reloading (reload is time expensive)
max-requests = $(UWSGI_MAX_REQUESTS)
lazy-apps = true
single-interpreter = true
nginx-server.conf
server {
    listen 443 ssl http2;
    client_max_body_size 50M;

    location @api {
        include uwsgi_params;
        uwsgi_pass api:8000;
        uwsgi_read_timeout 300;
        uwsgi_send_timeout 300;
    }
}
For some weird reason, adding uwsgi_buffering off; to the nginx config fixed the issue.
I still don't understand why, but for now this fixes my issue. If anyone has a valid explanation, don't hesitate to share it.
server {
    listen 443 ssl http2;
    client_max_body_size 50M;

    location @api {
        include uwsgi_params;
        uwsgi_pass api:8000;
        # This is the directive that made the hanging first request go away.
        uwsgi_buffering off;
        uwsgi_read_timeout 300;
        uwsgi_send_timeout 300;
    }
}
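If anyone wants to reproduce this, a quick client-side probe along these lines makes the symptom easy to see (the URL and endpoint are placeholders, and the requests library is assumed to be installed): run it right after a worker reload, and without the fix the call either hangs until the timeout or comes back truncated with a 200.

# Rough probe, not part of the original setup: hit the API right after a
# worker reload and report how long the response takes and how big it is.
import time
import requests  # assumed installed

URL = "https://example.com/api/health"  # placeholder endpoint behind nginx

start = time.time()
try:
    resp = requests.get(URL, timeout=30)
    print(resp.status_code, len(resp.content), "bytes in",
          round(time.time() - start, 1), "s")
except requests.exceptions.Timeout:
    print("request hung for", round(time.time() - start, 1), "s before timing out")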
I am serving a sagemaker model through a custom docker container using the guide that AWS provides. This is a docker container that runs a simple nginx->gunicorn/wsgi->flask server
I am facing an issue where my transform requests time out at around 30 minutes in all instances, even though they should be able to continue for up to 60 minutes. I need requests to be able to run up to SageMaker's maximum of 60 minutes due to the data-intensive nature of the requests.
Through experience working with this setup for some months, I know that there are 3 factors that should affect the time my server has to respond to requests:
1. SageMaker itself will cap invocation requests according to the InvocationsTimeoutInSeconds parameter set when creating the batch transform job.
2. The nginx.conf file must be configured such that keepalive_timeout, proxy_read_timeout, proxy_send_timeout, and proxy_connect_timeout are all equal to or greater than the maximum timeout.
3. The gunicorn server must have its timeout configured to be equal to or greater than the maximum timeout.
I have verified that when I create my batch transform job, InvocationsTimeoutInSeconds is set to 3600 seconds (1 hour).
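For reference, this is roughly how that value gets passed when the job is created with boto3 (the job, model, and S3 names below are placeholders, not my actual configuration):

# Sketch of creating the batch transform job with a 60-minute per-invocation
# timeout via ModelClientConfig (names and S3 paths are placeholders).
import boto3

sm = boto3.client("sagemaker")
sm.create_transform_job(
    TransformJobName="my-transform-job",          # placeholder
    ModelName="my-model",                         # placeholder
    ModelClientConfig={
        "InvocationsTimeoutInSeconds": 3600,      # 1 hour, the maximum mentioned above
        "InvocationsMaxRetries": 1,
    },
    TransformInput={
        "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix",
                                        "S3Uri": "s3://my-bucket/input/"}},
        "ContentType": "application/json",
    },
    TransformOutput={"S3OutputPath": "s3://my-bucket/output/"},
    TransformResources={"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
)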
My nginx.conf looks like this:
worker_processes 1;
daemon off; # Prevent forking
pid /tmp/nginx.pid;
error_log /var/log/nginx/error.log;
events {
    # defaults
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    access_log /var/log/nginx/access.log combined;

    sendfile on;
    client_max_body_size 30M;
    keepalive_timeout 3920s;

    upstream gunicorn {
        server unix:/tmp/gunicorn.sock;
    }

    server {
        listen 8080 deferred;
        client_max_body_size 80m;

        keepalive_timeout 3920s;
        proxy_read_timeout 3920s;
        proxy_send_timeout 3920s;
        proxy_connect_timeout 3920s;
        send_timeout 3920s;

        location ~ ^/(ping|invocations) {
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $http_host;
            proxy_redirect off;
            proxy_pass http://gunicorn;
        }

        location / {
            return 404 "{}";
        }
    }
}
I start the gunicorn server like this:
import os
import signal
import subprocess

# model_server_workers, model_server_timeout, and sigterm_handler are defined
# elsewhere in the serve script.

def start_server():
    print('Starting the inference server with {} workers.'.format(model_server_workers))
    print('Model server timeout {}.'.format(model_server_timeout))

    # Link the log streams to stdout/err so they will be logged to the container logs.
    subprocess.check_call(['ln', '-sf', '/dev/stdout', '/var/log/nginx/access.log'])
    subprocess.check_call(['ln', '-sf', '/dev/stderr', '/var/log/nginx/error.log'])

    nginx = subprocess.Popen(['nginx', '-c', '/opt/program/nginx.conf'])
    gunicorn = subprocess.Popen(['gunicorn',
                                 '--timeout', str(3600),
                                 '-k', 'sync',
                                 '-b', 'unix:/tmp/gunicorn.sock',
                                 '--log-level', 'debug',
                                 '-w', str(1),
                                 'wsgi:app'])

    signal.signal(signal.SIGTERM, lambda a, b: sigterm_handler(nginx.pid, gunicorn.pid))

    # If either subprocess exits, so do we.
    pids = set([nginx.pid, gunicorn.pid])
    while True:
        pid, _ = os.wait()
        if pid in pids:
            break

    sigterm_handler(nginx.pid, gunicorn.pid)
    print('Inference server exiting')
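sigterm_handler and the model_server_* values come from the rest of the serve script and aren't shown above; purely as an assumption, a shutdown helper in this style of container typically looks roughly like this:

# Hypothetical sketch of the shutdown helper referenced above: ask nginx to
# quit gracefully, stop gunicorn, then exit the parent process.
import os
import signal
import sys

def sigterm_handler(nginx_pid, gunicorn_pid):
    try:
        os.kill(nginx_pid, signal.SIGQUIT)
    except OSError:
        pass
    try:
        os.kill(gunicorn_pid, signal.SIGTERM)
    except OSError:
        pass
    sys.exit(0)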
Despite all this, whenever a transform job takes longer than approximately 30 minutes, I see this message in my logs and the transform job status becomes Failed:
2023/01/07 08:23:14 [error] 11#11: *4 upstream prematurely closed connection while reading response header from upstream, client: 169.254.255.130, server: , request: "POST /invocations HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/invocations", host: "169.254.255.131:8080"
I am close to thinking there is a bug in AWS batch transform, but perhaps I am missing some other variable (perhaps in the nginx.conf) that could lead to premature upstream termination of my request.
By looking at hardware metrics, I was able to determine that the upstream termination only happened when the server was near its memory limit. So my guess is that the OS was killing the gunicorn worker, and the 30-minute mark was just a coincidence that happened on my long-running test cases.
My solution was to increase the memory available on the server.
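If you want to confirm this kind of memory pressure before resizing, a small sketch like the one below can log worker memory around each invocation (assuming psutil is installed in the container; the route and inference call are placeholders):

# Rough diagnostic sketch: log the worker's resident memory at the start and
# end of each invocation, so a climb toward the memory limit is visible in the
# container logs before the OS kills the worker.
import logging
import psutil  # assumed to be installed in the container

logger = logging.getLogger(__name__)

def log_memory(tag):
    rss_mb = psutil.Process().memory_info().rss / (1024 * 1024)
    logger.info("%s: worker RSS = %.1f MiB", tag, rss_mb)

# Example usage inside a (hypothetical) Flask handler:
# @app.route('/invocations', methods=['POST'])
# def invocations():
#     log_memory("before inference")
#     result = run_inference(request.data)   # placeholder
#     log_memory("after inference")
#     return result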
I'm using smtplib to send simple booking emails from a Flask application. I'm using Google Mail with an app password, and I have allowed less secure applications. The booking system runs on my personal computer, but as soon as I port it over to the VPS it stops working, for no known reason other than the username and password not being accepted, even though they are definitely correct. The app also runs by itself on the VPS, but not when run under uWSGI and nginx.
Nginx config
server {
    listen 80;
    server_name example.com;
    # return 301 https://$server_name$request_uri;

    location / {
        uwsgi_pass unix:/path/too/chatbot.sock;
        include uwsgi_params;
    }
}
server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name example.com;

    ssl_certificate /path/too/keys.pem;
    ssl_certificate_key /path/too//primarykey.pem;
    ssl_trusted_certificate /path/too//keys.pem;
    ssl_session_timeout 1d;
    ssl_session_cache shared:MozSSL:10m;  # about 40000 sessions

    # curl https://ssl-config.mozilla.org/ffdhe2048.txt > /path/to/dhparam
    #ssl_dhparam /path/to/dhparam;

    # intermediate configuration
    ssl_protocols TLSv1.2;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
    ssl_prefer_server_ciphers on;

    # HSTS (ngx_http_headers_module is required) (63072000 seconds)
    add_header Strict-Transport-Security "max-age=63072000" always;

    # replace with the IP address of your resolver
    resolver 8.8.8.8;

    location / {
        include uwsgi_params;
        uwsgi_pass unix:/path/too/chatbot.sock;
    }
}
UWSGI.ini file
[uwsgi]
module=wsgi:app
master = true
processes = 5
enable-threads = true
socket = chatbot.sock
chmod-socket = 666
vacuum = true
die-on-term = true
.env
DIALOGFLOW_PROJECT_ID=projectid
GOOGLE_APPLICATION_CREDENTIALS=Ajsonfile.json
RESTFUL_CREDENTIALS=restful_credentials.json
MAIL_USERNAME=example@gmail.com
MAIL_PASSWORD=apasswordforemailaddress
My current thinking is that uWSGI or nginx is unable to find the file due to some sort of permissions issue, but I've chown'ed all the related files. I'm getting the same issue with my Google API key now too.
All the information is stored in a .env file, which has the correct group access, along with all the other files already running on the site.
I don't know what else would be helpful to post here, other than that I'm using nginx and uWSGI to expose a Flask application, and some items stored in a .env file don't seem to be read.
To get them to load while running under uWSGI, you need to use the python-dotenv package:
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())
[uwsgi]
base = /var/www/html/poopbuddy-api
chdir = %(base)
app = app
I don't know exactly what chdir does but I think it at least sets the default location to the root directory of the app. From there, load_dotenv() works.
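For completeness, a minimal sketch of how the values are then read in the Flask app once the .env is loaded (my assumption of the surrounding code; only the variable names come from the .env above):

# Load the .env sitting in the uWSGI chdir (the project root) before the
# Flask app is configured, then read the values through os.environ as usual.
import os
from dotenv import load_dotenv, find_dotenv
from flask import Flask

load_dotenv(find_dotenv())  # finds .env relative to the working directory

app = Flask(__name__)
app.config['MAIL_USERNAME'] = os.environ.get('MAIL_USERNAME')
app.config['MAIL_PASSWORD'] = os.environ.get('MAIL_PASSWORD')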
I am trying to catch all GET requests to abc.example.com and redirect them to example.com. The following works locally:
@app.route('/', methods=['GET'])
def message_get():
    return redirect('https://example.com')
But on the production server it fails. Instead of being redirected, the URL ends up looking like this:
abc.example.comget%20/%20HTTP/1.1)uri
I observed that if I put in the whole URL, like this:
https://abc.example.com
it redirects properly, but abc.example.com or http://abc.example.com fails.
I have a Flask app with a gunicorn app server, and nginx is used as a reverse proxy. I am unable to determine which of them is causing the problem; I'm guessing it has something to do with my nginx configuration, but any pointers will help. Thanks.
Nginx configuration:
server {
    listen 80 default_server;
    listen [::]:80 default_server;
    server_name abc.example.com;
    return 301 https://$server_name$request)uri;
}

server {
    # SSL configuration
    server_name abc.example.com;
    listen 443 ssl http2 default_server;
    listen [::]:443 ssl http2 default_server;

    include snippets/ssl-abc.example.com.conf;
    include snippets/ssl-params.conf;

    location / {
        include proxy_params;
        proxy_pass http://unix:/home/user1/apps/myapp/myapp.sock;
    }

    location ~ /.well-known {
        allow all;
    }
}
You have a typo in your first server section; it should be:
return 301 https://$server_name$request_uri;
I have a setup with nginx, uwsgi, and gevent. When testing the setup's ability to handle premature client disconnects, I found that uwsgi isn't exactly responding in a timely manner.
This is how I detect that a disconnect has occurred inside of my python code:
while True:
    if 'uwsgi' in sys.modules:
        import uwsgi  # @UnresolvedImport
        fileDescriptor = uwsgi.connection_fd()
        if not uwsgi.is_connected(fileDescriptor):
            logger.debug("Connection was lost (client disconnect)")
            break
    gevent.sleep(2)  # avoid hammering the CPU (see below)
So when uwsgi signals a loss of connection, I break out of this loop. There's also a call to gevent.sleep(2) at the bottom of the loop to prevent hammering the CPU.
With that in place, nginx logs the closed connection like this:
2016/08/16 19:23:23 [info] 32452#0: *1 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while sending to client, client: 192.168.56.1, server: <removed>, request: "GET /myurl HTTP/1.1", upstream: "uwsgi://127.0.0.1:8070", host: "<removed>:8443"
nginx is immediately aware of the disconnect when it produces this log entry; it's within milliseconds of the client disconnecting. Yet uwsgi doesn't seem to become aware of the disconnect until seconds, sometimes almost a minute, later, at least in terms of notifying my code:
DEBUG - Connection was lost (client disconnect) - 391 ms[08/16/16 19:24:04 UTC])
The uwsgi.log file created via daemonize suggests that uwsgi saw the disconnect a second before nginx did, but somehow waited half a minute to actually tell my code:
[pid: 32208|app: 0|req: 2/2] 192.168.56.1 () {32 vars in 382 bytes} [Tue Aug 16 19:23:22 2016] GET /myurl => generated 141 bytes in 42030 msecs (HTTP/1.1 200) 2 headers in 115 bytes (4 switches on core 999
This is my setup in nginx:
upstream bottle {
    server 127.0.0.1:8070;
}

server {
    listen 8443;
    ssl on;
    ssl_certificate /etc/pki/tls/certs/server.crt;
    ssl_certificate_key /etc/pki/tls/private/server.key;
    server_name <removed>;

    # Load configuration files for the default server block.
    include /etc/nginx/default.d/*.conf;

    location / {
        include uwsgi_params;
        #proxy_read_timeout 5m;
        uwsgi_buffering off;
        uwsgi_ignore_client_abort off;
        proxy_ignore_client_abort off;
        proxy_cache off;
        chunked_transfer_encoding off;
        #uwsgi_read_timeout 5m;
        #uwsgi_send_timeout 5m;
        uwsgi_pass bottle;
    }
}
The odd part to me is that the timestamp from uwsgi says it saw the disconnect right when nginx did, yet it doesn't write that entry until my code sees it ~30 seconds later. From my perspective it appears that uwsgi is essentially lying or locking up, yet I can't find any errors from it.
Any help is appreciated. I've attempted to remove any buffering and delays from nginx without any success.
I am trying to run Tornado on a multicore CPU, with each Tornado IOLoop process on a different core, and I'll use nginx to proxy-pass to the Tornado processes, following http://www.tornadoweb.org/en/stable/guide/running.html
Edit: adding the actual configuration here for more details:
events {
    worker_connections 1024;
}

http {
    upstream chatserver {
        server 127.0.0.1:8888;
    }

    server {
        # Requires root access.
        listen 80;

        # WebSocket.
        location /chatsocket {
            proxy_pass http://chatserver;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
        }

        location / {
            proxy_pass http://chatserver;
        }
    }
}
Previously I was able to connect to the socket at ws://localhost:8888 from the client (when I was running python main.py), but now I can't connect. At the server, nginx is somehow changing the request to http, which I want to avoid. Access logs at the Tornado server:
WARNING:tornado.access:400 GET /search_image (127.0.0.1) 0.83ms
How can I make nginx communicate only via ws://, not http://?
I figured out the issue: it was solved by overriding Tornado's check_origin function and making it return True in all cases. Thank you all.
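A minimal sketch of that override (the handler and echo logic are placeholders, not my actual application code):

# Sketch: once nginx sits in front, the Origin seen by Tornado no longer
# matches the host it listens on, so check_origin rejects the handshake.
# Overriding it to return True accepts connections from any origin.
import tornado.ioloop
import tornado.web
import tornado.websocket

class ChatSocketHandler(tornado.websocket.WebSocketHandler):
    def check_origin(self, origin):
        # Accept any origin; tighten this check for production use.
        return True

    def on_message(self, message):
        self.write_message(message)  # simple echo as a placeholder

app = tornado.web.Application([(r"/chatsocket", ChatSocketHandler)])

if __name__ == "__main__":
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()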