Connection Leak in Azure WebApp Using Python on Ubuntu

I have a Flask app running on an Ubuntu WebApp on Azure. Every morning my queries to the app fail with the following error:
sqlalchemy.exc.OperationalError: (pyodbc.OperationalError) ('08S01', '[08S01] [Microsoft][ODBC Driver 17 for SQL Server]TCP Provider: Error code 0x68 (104) (SQLExecDirectW)')
I am using the SQLAlchemy ORM to query my Azure SQL Server instance. I believe my connections are going stale, for the following reasons:
It happens every morning, after a night of nobody using the app.
After some number of failed requests, it starts working again, until the next morning.
To make things weirder, when I check sys.dm_exec_sessions on the SQL server, it shows no active connections (other than the one I'm using to check).
In addition, when I run the dockerized app locally and connect to the DB, I get no such error.
If anyone has had a similar issue, I'd love some insights, or at least a recommendation on where to drill down.
https://azure.github.io/AppService/2018/03/01/Deep-Dive-into-TCP-Connections-in-App-Service-Diagnostics.html
This link helped me, but the solution applies only to Windows apps, not Linux.

With help from @snakecharmerb:
The application was in fact holding on to a pool of dead connections; setting pool_recycle, so that connections are discarded and refreshed before they can go stale, solved the issue.
from sqlalchemy import create_engine

engine = create_engine(
    key,  # the database connection URL
    echo=False,
    future=True,
    # echo_pool makes the connection pool log informational output,
    # such as when connections are invalidated.
    echo_pool=True,
    # pool_recycle makes the pool recycle connections after the
    # given number of seconds has passed.
    pool_recycle=3600,
)
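A complementary option, not part of the original fix but worth noting, is SQLAlchemy's pool_pre_ping, which tests each pooled connection with a lightweight ping on checkout and transparently replaces any that have died:

from sqlalchemy import create_engine

# A minimal sketch; "key" is the same connection URL used above.
engine = create_engine(
    key,
    pool_pre_ping=True,  # test each connection on checkout, replace dead ones
    pool_recycle=3600,   # and still recycle connections older than an hour
)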

Related

Connecting to jTDS Microsoft server with SQLalchemy and Presto

I'm trying to connect to an old-school jTDS MS SQL server for a variety of analysis tasks: first just using Python with SQLAlchemy, and later with Tableau and Presto.
Focusing on SQLAlchemy for the moment, I'm getting this error:
Data source name not found and no default driver specified
My connection code is based on this thread: Connecting to SQL Server 2012 using sqlalchemy and pyodbc,
i.e.:
import urllib.parse

import sqlalchemy as sa

params = urllib.parse.quote_plus("DRIVER={FreeTDS};"
                                 "SERVER=x-y.x.com;"
                                 "DATABASE=;"
                                 "UID=user;"
                                 "PWD=password")
# Note: the original code passed "{FreeTDS}" as the format placeholder,
# which produces an invalid URL; it should be an empty "{}".
engine = sa.create_engine("mssql+pyodbc:///?odbc_connect={}".format(params))
Connecting works fine through DBeaver, using a jTDS SQL Server (MSSQL) driver (which is labelled as legacy).
I'm curious how to resolve this issue; I'll keep researching away, but would appreciate any help.
I imagine there is an old driver on the internet I need to integrate into SQLAlchemy to begin with, and that I should then perhaps migrate this data to something newer.
Appreciate your time.
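One diagnostic worth trying (a sketch, not from the original post): "Data source name not found and no default driver specified" usually means the name in DRIVER={...} doesn't match any driver registered with unixODBC, and pyodbc can list what is registered:

import pyodbc

# Print the driver names unixODBC knows about (from odbcinst.ini).
# "FreeTDS" must appear here spelled exactly as in DRIVER={FreeTDS}.
print(pyodbc.drivers())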

Google Cloud SQL w/ Django - Extremely Slow Connection

Edit:
After doing some further investigation, the delay seems to lie more with Django than with the Cloud SQL Proxy.
I added a couple of print statements at the start and end of a view, and they print instantly when the request is made, but it takes a further 60 seconds for the page to load.
I've stripped back the template files to include only the bare bones, removing most scripts and static resources and it's still pretty similar.
Changing my view to return a simple HttpResponse('Done') cuts the time drastically.
While developing locally I am using Django to serve the static files, as described in the docs. I don't have this issue with other projects, though.
Original Post:
I've recently noticed my Django application is incredibly slow to connect to my Google Cloud SQL database when using the Cloud SQL Proxy in my local development environment.
The initial connection takes 2-3 minutes, then 60 seconds per request thereafter. This applies when performing migrations or running the development server. Eventually the request completes.
I've tried scaling up the database but to no effect (it's relatively small anyway). Database version is MySQL 5.7 with machine type db-n1-standard-1. Previously I've used Django Channels but have since removed all references to this.
The Middleware and settings.py are relatively standard and identical to another Django app that connects in an instant.
The live site also connects very fast without any issues.
Python version is 3.6 w/ Django 2.1.4 and mysqlclient 1.3.14.
My database settings are defined as:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': os.getenv('DB_NAME'),
        'USER': os.getenv('DB_USER'),
        'PASSWORD': os.getenv('DB_PASSWORD'),
        'PORT': '3306',
    }
}

if os.getenv('GAE_INSTANCE'):
    DATABASES['default']['HOST'] = os.getenv('DB_HOST')
else:
    DATABASES['default']['HOST'] = '127.0.0.1'
Using environment variables or not doesn't seem to make a difference.
I'm starting the Cloud SQL Proxy via ./cloud_sql_proxy -instances="my-project:europe-west1:my-project-instance"=tcp:3306.
After invoking the proxy via the command line I see Ready for new connections. Running python manage.py runserver shows New connection for "my-project:europe-west1:my-project-instance" but then takes an age before I see Starting development server at http://127.0.0.1:8000/.
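To separate the proxy from Django, one crude test (a sketch, assuming the proxy is listening on 127.0.0.1:3306 and the same environment variables) is to time a direct connection with mysqlclient:

import os
import time

import MySQLdb  # provided by the mysqlclient package

start = time.monotonic()
conn = MySQLdb.connect(
    host='127.0.0.1',
    port=3306,
    user=os.getenv('DB_USER'),
    passwd=os.getenv('DB_PASSWORD'),
    db=os.getenv('DB_NAME'),
)
print('connected in %.2f seconds' % (time.monotonic() - start))
conn.close()

If this connects quickly, the delay is inside Django rather than in the proxy or the database.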
I'm also noticing several errors in Stackdriver:
_mysql_exceptions.OperationalError: (2006, "Lost connection to MySQL server at 'reading initial communication packet', system error: 95")
django.db.utils.OperationalError: (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 95")
AttributeError: 'SessionStore' object has no attribute '_session_cache'
These appear (or don't) from time to time, without my changing any settings.
I've read they may indicate an access-rights issue, but the connection is eventually made; it's just incredibly slow. I'm authorising via the Google Cloud SDK, which seems to work fine.
Eventually I found that the main source of the delay was a recursive function called in one of my admin forms (which delayed the initial startup) and in my context processors (which delayed each page load). After removing it, the pages loaded without issue. What made debugging harder is that everything worked fine when deployed to App Engine, or when using a local test SQLite database.
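For anyone chasing a similar slowdown, a crude timing decorator (a sketch; my_context_processor is a hypothetical name) can pinpoint which function is eating the time:

import functools
import time

def timed(func):
    # Log how long each call takes; handy for context processors,
    # form __init__ methods, and anything else run on every request.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return func(*args, **kwargs)
        finally:
            print('%s took %.2fs' % (func.__name__, time.monotonic() - start))
    return wrapper

@timed
def my_context_processor(request):  # hypothetical context processor
    return {}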

OperationalError: Can't start transaction Postgresql

I am running pgdb.connect to make a connection to my PostgreSQL server and a cursor to execute queries. The first time (sometimes the first two times) it works properly, but after that it gives me
OperationalError: Can't start transaction
on cursor.execute(query).
Can someone please help me figure out how to solve this error?
I am using Docker Compose, with one container running PostgreSQL and another the Flask server.
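Not a confirmed diagnosis, but a common cause of this error is an earlier statement failing and leaving the connection inside an aborted transaction, so every later execute fails until the transaction is closed. A minimal sketch of defensive handling, assuming a pgdb connection with hypothetical credentials:

import pgdb

# Hypothetical host/database names for illustration.
conn = pgdb.connect(host='postgres', database='mydb',
                    user='user', password='password')
cur = conn.cursor()
query = 'SELECT 1'  # placeholder for the real query
try:
    cur.execute(query)
    conn.commit()      # end the transaction on success
except Exception:
    conn.rollback()    # clear the aborted transaction before the next execute
    raise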

Dealing with Concurrent Requests & Connections in Cloud SQL

We have created an app on App Engine using Datastore. We have now moved to Cloud SQL, as we wanted joins, nested queries, and aggregate functions such as average and total. Data is migrated from Datastore to Cloud SQL by daily cron jobs.
I was just going through the below links to know the basic details related to performance and limitations.
https://cloud.google.com/sql/docs/diagnose-issues#data-issues
https://cloud.google.com/sql/pricing#v1-pricing
https://cloud.google.com/sql/faq#sizeqps
So far it looks like Tier D0 or D1 will serve the purpose that we intended.
A few things are confusing me:
a) What is a pending connection, and how does it affect the instance?
I'm not sure whether this is what throws 1033 Instance has too many concurrent requests when it exceeds 100. How do we handle this? Does it mean we can create 250 connections but use only 100 at a time?
b) 250 concurrent connections.
This should throw a Too Many Connections error if it exceeds 250.
c) 12 concurrent connections per SQL instance per App Engine instance. How do we ensure that no more than 12 connections are opened per App Engine instance?
I have gone through the following forums:
What are the connection limits for Google Cloud SQL from App Engine, and how to best reuse DB connections?
What's a good approach to managing the db connection in a Google Cloud SQL (GAE) Python app?
But people report issues with those approaches too.
d) We got an OperationalError:
(2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 38") when we tried a test with 1000+ requests.
We have 1500+ people using our system concurrently, and it looks like it will fail. So we are unsure whether we can use Cloud SQL at all, given the issues above. But solutions should be available.
Can anyone help?
In my company we run a Postgres database on Google Cloud SQL. We encountered a similar problem, and our solution was to use a proxy service that keeps a common pool of connections for all users.
For Postgres this is called PgBouncer (https://www.pgbouncer.org/). After a short googling session I found ProxySQL (https://proxysql.com/), which is mentioned as a similar tool for MySQL (see: https://github.com/prisma/prisma/issues/3258).
On their website, ProxySQL also provides a diagram that can help you understand what this service looks like in practice.
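Regarding (c), if the app talks to MySQL through SQLAlchemy (an assumption; the posts don't say which client is used), the per-instance connection count can be capped at the pool, for example:

from sqlalchemy import create_engine

# Cap the pool so a single App Engine instance never opens more than
# 12 connections: up to 10 pooled plus 2 temporary overflow connections.
engine = create_engine(
    'mysql+mysqldb://user:password@127.0.0.1/mydb',  # hypothetical URL
    pool_size=10,
    max_overflow=2,
)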

SQLAlchemy hangs while connecting to SQL Azure, but not always

I have a Django application which uses SQLAlchemy to connect to a SQL Server instance on Windows Azure. The app worked perfectly for three months on a local SQL Server instance, and for over a month on an Azure instance. The issues appeared this Monday, after a week without any code changes.
The site uses:
Python 2.7
Django 1.6
Apache/Nginx
SQLAlchemy 0.9.3
pyODBC 3.0.7
FreeTDS
The application appears to lock up right after a connection is pulled out of the pool (I have set up verbose logging at every point in the workflow). I assumed this had something to do with the connections going stale, so we tried making pool_recycle incredibly short (5 seconds), all the way up to an hour. That did not help.
We also tried using NullPool to force a new connection on every page view, but that does not help either. After about 15 minutes the site completely locks up again (meaning no pages that use the database are viewable).
The weird thing is, half the computers that experience the hang will end up loading the page about 15 minutes later.
Has anyone had any experience with SQL Azure and SQLAlchemy?
I found a workaround for this issue. Please note that this is definitely not a fix, since the site worked perfectly fine before. We could not determine the actual issue because SQL Azure has no error log (one of the hundred reasons I would suggest never choosing SQL Azure over a real database server).
I got around the problem by turning off all connection pooling, at the application level AND at the driver level.
Things started consistently working after making my /etc/odbcinst.ini look like:
[FreeTDS]
Description = TDS driver (Sybase/MS SQL)
# Some installations may differ in the paths
Driver = /usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so
Setup = /usr/lib/x86_64-linux-gnu/odbc/libtdsS.so
CPReuse =
CPTimeout = 0
FileUsage = 1
Pooling = No
The key is setting CPTimeout (connection pool timeout) to 0 and Pooling to No. Just turning pooling off at the application level (in SQLAlchemy) did not work; only after setting it at the driver level did things start working smoothly.
I am now 4 days without a problem after changing that setting.
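For reference, this is roughly what disabling pooling at the application level looks like in SQLAlchemy (which, on its own, was not enough here); connection_url is a placeholder for the site's connection string:

from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

# NullPool opens a fresh connection on every checkout and closes it on
# release, so nothing is held between requests.
engine = create_engine(connection_url, poolclass=NullPool)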
