Can a connection to postgresql using psycopg2 hold for 24+ hours? - python

I have a bot running 24/7 that accesses a PostgreSQL database. My first implementation would create a connection and close it for every transaction (it was my first time learning SQL), but I learned that creating and closing all these connections takes a long time.
I made a small code to test the difference and got the following:
>>>test.py
100 tries:
26.547296285629272 s (non persistent)
1.3095812797546387 s (persistent)
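The benchmark code itself isn't shown; a minimal sketch of how such a comparison might look, assuming psycopg2 and a placeholder DSN, is:

import time
import psycopg2

DSN = "dbname=mydb user=myuser"          # placeholder connection string
N = 100

# Non-persistent: open and close a connection for every query.
start = time.time()
for _ in range(N):
    conn = psycopg2.connect(DSN)
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        cur.fetchone()
    conn.close()
print(time.time() - start, "s (non persistent)")

# Persistent: one connection reused for all queries.
start = time.time()
conn = psycopg2.connect(DSN)
for _ in range(N):
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        cur.fetchone()
conn.close()
print(time.time() - start, "s (persistent)")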
My question is: can a persistent connection hold for 24+ hours? If not, can I check for it and reconnect?

There is no inherent limit on the age of a connection. If you are operating through a firewall or gateway, though, it might interfere with attempts to hold one open indefinitely. And of course, if you ever take the server down for maintenance or a cold backup, that will also break the connection.
The classic way to "ping" a suspect connection in PostgreSQL is to issue select 1;. Some connection poolers will do this for you. You should probably use one, rather than inventing your own. Assuming you need one in the first place--while establishing connections is slow, it shouldn't be nearly as slow as your mysterious benchmark is showing.
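A minimal sketch of the check-and-reconnect idea with psycopg2, assuming a placeholder DSN:

import psycopg2
from psycopg2 import InterfaceError, OperationalError

DSN = "dbname=mydb user=myuser host=localhost"   # placeholder

conn = psycopg2.connect(DSN)

def get_connection():
    """Return the long-lived connection, reconnecting if it has gone away."""
    global conn
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")              # the classic "ping"
            cur.fetchone()
    except (OperationalError, InterfaceError):
        # Server restart, firewall timeout, etc.: open a fresh connection.
        conn = psycopg2.connect(DSN)
    return conn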

Related

Is it possible to cherry-pick the db connection (persistent) Django will use on each request on a per client/session basis?

I'd like to be able to tell someone (django, pgBouncer, or whoever can provide me with this service) to always hand me the same connection to the database (PostgreSQL in this case) on a per client/session basis, instead of getting a random one each time (or creating a new one for that matter).
To my knowledge:
Django's CONN_MAX_AGE can control the lifetime of connections, so far so good. This will also have a positive impact on performance (no connection setup penalties); a minimal settings sketch follows after this list.
Some pooling package (pgBouncer for example) can hold the connections and hand them to me as I need them. We're almost there.
The only bit I'm missing is the possibility to ask pgBouncer (or any other similar tool for that matter) to give me a specific db connection, instead of "one from the pool". This is important because I want to have control over the lifetime of the transaction. I want to be able to open a transaction, then send a series of commands, and then manually commit all the work, or roll everything back should something fail.
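The CONN_MAX_AGE part mentioned above is just a settings change; a minimal sketch, with placeholder database name, credentials and port (e.g. pointing at pgBouncer instead of PostgreSQL directly):

# settings.py -- keep Django's database connections open across requests
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "mydb",        # placeholder
        "USER": "myuser",      # placeholder
        "PASSWORD": "secret",  # placeholder
        "HOST": "127.0.0.1",
        "PORT": "6432",        # e.g. pgBouncer's port instead of 5432
        "CONN_MAX_AGE": 600,   # keep a connection for up to 10 minutes
    }
}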
Many years ago, I implemented something very similar to what I'm looking for now. It was a simple connection pool written in C which, on one hand, would hold as many connections to Oracle as clients needed, while on the other it would give these clients the chance to recover those exact connections based on some ID, which could have been, for example, a PHP session ID. That way users could acquire a lock on some database object/row, and the lock would persist even after the Apache process died. From that point on, the session owner was in total control of that row until he decided it was time to commit it, or until the backend decided it was time to let the transaction go due to idleness.

Telnet Connection Pooling

Background: I'm currently trying to develop a monitoring system at my job. All the nodes that need to be monitored are accessible via Telnet. Once a Telnet connection has been made, the system needs to execute a couple of commands on the node and process the output.
My problem is that both creating a new connection and running the commands take time. It takes approx. 10 s to get a connection up (the TCP connection is established instantly, but some commands need to be run to prepare the connection for use), and an almost equal amount of time to run each required command.
So, I need to come up with a solution that allows me to execute 10-20 of these 10s long commands on the nodes, without collectively taking more than 1min. I was thinking of creating a sort of connection pooler, which I could send the commands to and then it could execute them in parallel, dividing them over available Telnet sessions. I tried to find something similar that I could use (or even just look at to gain some understanding), but I am unable to find anything.
I'm developing on Ubuntu with Python. Any help would be appreciated!
Edit (update info):
@Aya @Thomas: A bit more info. I already have a working solution in Python, but it is getting difficult to manage the code. Currently I'm using the same approach that you advised, with one thread per connection. However, the problem is that there is a 10 s delay each time a connection is made to a node, and I need to make at least 10 connections per node per iteration. The time limit for each iteration is 60 s, so making a new connection each time is not feasible. It needs to open 10 connections per node at startup and maintain those connections.
What I am looking for is someone who can point out examples of a good architecture for something like this.
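A minimal sketch of the pooled-session architecture described above, assuming the standard library's telnetlib (removed in Python 3.13) with placeholder host names, prompt and commands:

import queue
import telnetlib   # note: removed from the standard library in Python 3.13
import threading

HOSTS = ["node1.example.com", "node2.example.com"]   # placeholder nodes
SESSIONS_PER_HOST = 10

work_queues = {host: queue.Queue() for host in HOSTS}
results = queue.Queue()

def worker(host):
    """One long-lived Telnet session: pay the ~10 s setup cost only once."""
    tn = telnetlib.Telnet(host, timeout=15)
    # ... run the site-specific login/preparation commands here ...
    while True:
        command = work_queues[host].get()
        tn.write(command.encode("ascii") + b"\n")
        output = tn.read_until(b"$ ", timeout=15)    # prompt is a placeholder
        results.put((host, command, output))
        work_queues[host].task_done()

# Start the pool: several persistent sessions per node.
for host in HOSTS:
    for _ in range(SESSIONS_PER_HOST):
        threading.Thread(target=worker, args=(host,), daemon=True).start()

# One iteration: the commands run in parallel over the open sessions.
for host in HOSTS:
    for cmd in ["show status", "show counters"]:     # placeholder commands
        work_queues[host].put(cmd)
for q in work_queues.values():
    q.join()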

How can I detect total MySQL server death from Python?

I've been doing some HA testing of our database and in my simulation of server death I've found an issue.
My test uses Django and does this:
Connect to the database
Do a query
Pull out the network cord of the server
Do another query
At this point everything hangs indefinitely within the mysql_ping function. As far as my app is concerned it is connected to the database (because of the previous query), it's just that the server is taking a long time to respond...
Does anyone know of any ways to handle this kind of situation? connect_timeout doesn't work as I'm already connected. read_timeout seems like a somewhat too blunt instrument (and I can't even get that working with Django anyway).
Setting the default socket timeout also doesn't work (and would be vastly too blunt as this would affect all socket operations and not just MySQL).
I'm seriously considering doing my queries within threads and using Thread.join(timeout) to perform the timeout.
In theory, if I can do this timeout then reconnect logic should kick in and our automatic failover of the database should work perfectly (kill -9 on affected processes currently does the trick but is a bit manual!).
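A minimal sketch of that thread-plus-join(timeout) idea (the connection object and query are placeholders; the main drawback is that a hung thread is simply abandoned):

import threading

class QueryThread(threading.Thread):
    """Run a single query in the background so the caller can time out."""
    def __init__(self, connection, sql):
        super().__init__(daemon=True)
        self.connection = connection   # e.g. a MySQLdb connection (placeholder)
        self.sql = sql
        self.rows = None
        self.error = None

    def run(self):
        try:
            cursor = self.connection.cursor()
            cursor.execute(self.sql)
            self.rows = cursor.fetchall()
        except Exception as exc:       # the driver's own errors land here
            self.error = exc

def query_with_timeout(connection, sql, timeout=5.0):
    t = QueryThread(connection, sql)
    t.start()
    t.join(timeout)
    if t.is_alive():
        # Still hanging: treat the server as dead and trigger reconnect/failover.
        raise TimeoutError("query did not return within %.1f s" % timeout)
    if t.error is not None:
        raise t.error
    return t.rows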
I would think this would be more in line with setting a read_timeout on your front-facing webserver. Any number of reasons could exist that hold up your Django app indefinitely. While you have found one specific case, there could be many more (code errors, cache difficulties, etc.).

How to ensure several Python processes access the data base one by one?

I've got a lot of scripts running: scrapers, checkers, cleaners, etc. They have some things in common:
they are forever running;
they have no time constraint to finish their job;
they all access the same MySQL DB, writing and reading.
As they accumulate, they're starting to slow down the website, which runs on the same system but depends on these scripts.
I can use queues with Kombu to serialize all the writes.
But do you know a way to do the same with reads?
E.g. if one script needs to read from the DB, its request is sent to a blocking queue, and it resumes when it gets the answer? That way everybody makes requests to one process, and that process is the only one talking to the DB, making one request at a time.
I have no idea how to do this.
Of course, in the end I may have to add more servers to the mix, but before that, is there something I can do at the software level ?
You could use a connection pooler and make the connections from the scripts go through it. It would limit the number of real connections hitting your DB while being transparent to your scripts (their connections would be held in a "wait" state until a real connection is freed).
I don't know what DB you use, but for Postgres I'm using PGBouncer for similar reasons, see http://pgfoundry.org/projects/pgbouncer/
You say that your dataset is <1 GB, so the problem is CPU bound.
Now start analyzing what is eating CPU cycles:
Which queries are really slow and executed often? MySQL can log those queries (the slow query log).
What about the slow queries? Can they be accelerated by using an index?
Are there unused indices? Drop them!
Nothing helps? Can you solve it by denormalizing/precomputing stuff?
You could create a function that each process must call in order to talk to the DB. You could rewrite the scripts so that they call that function rather than talking directly to the DB. Within that function, you could have a scope-based lock so that only one process talks to the DB at a time.
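A minimal sketch of that idea, using an advisory file lock so the exclusion holds across separate processes (the lock-file path and the query helper are placeholders; fcntl is POSIX-only):

import fcntl
from contextlib import contextmanager

LOCK_FILE = "/tmp/db_access.lock"       # placeholder path shared by all scripts

@contextmanager
def db_turn():
    """Scope-based lock: only one process at a time talks to the DB."""
    with open(LOCK_FILE, "w") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)

# Every script wraps its DB work in the same lock:
# with db_turn():
#     run_my_query(connection)          # placeholder helper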

Mysql Connection, one or many?

I'm writing a script in python which basically queries WMI and updates the information in a mysql database. One of those "write something you need" to learn to program exercises.
In case something breaks in the middle of the script (for example, the remote computer turns off), it's separated out into functions.
Query Some WMI data
Update that to the database
Query Other WMI data
Update that to the database
Is it better to open one mysql connection at the beginning and leave it open or close the connection after each update?
It seems as though one connection would use fewer resources. (Although I'm just learning, so this is a complete guess.) However, opening and closing the connection with each update seems more 'neat'. Functions would be more self-contained, rather than depending on code outside that function.
"However, opening and closing the connection with each update seems more 'neat'. "
It's also a huge amount of overhead -- and there's no actual benefit.
Creating and disposing of connections is relatively expensive. More importantly, what's the actual reason? How does it improve, simplify, clarify?
Generally, most applications have one connection that they use from when they start to when they stop.
I don't think there is a "better" solution. It's too early to think about resources. And since WMI is quite slow (in comparison to a SQL connection), the DB is not an issue.
Just make it work. And then make it better.
The good thing about working with an open connection here is that the "natural" solution is to use objects and not just functions. So it will be a learning experience (in case you are learning Python and not MySQL).
Think for a moment about the following scenario:
for dataItem in dataSet:
    update(dataItem)
If you open and close your connection inside of the update function and your dataSet contains a thousand items then you will destroy the performance of your application and ruin any transactional capabilities.
A better way would be to open a connection and pass it to the update function. You could even have your update function call a connection manager of sorts. If you intend to perform single updates periodically then open and close your connection around your update function calls.
In this way you will be able to use functions to encapsulate your data operations and be able to share a connection between them.
However, this approach is not great for performing bulk inserts or updates.
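A minimal sketch of the pass-a-connection pattern described above (MySQLdb is assumed as the driver; table, columns and credentials are placeholders):

import MySQLdb   # assumed driver; any DB-API module works the same way

def update(connection, data_item):
    """One data operation; it borrows the caller's connection."""
    cursor = connection.cursor()
    cursor.execute(
        "UPDATE nodes SET value = %s WHERE name = %s",   # placeholder SQL
        (data_item["value"], data_item["name"]),
    )

def main(data_set):
    connection = MySQLdb.connect(host="localhost", user="me",
                                 passwd="secret", db="inventory")   # placeholders
    try:
        for data_item in data_set:
            update(connection, data_item)
        connection.commit()             # one transaction for the whole batch
    finally:
        connection.close()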
Useful clues in S.Lott's and Igal Serban's answers. I think you should first find out your actual requirements and code accordingly.
Just to mention a different strategy: some applications keep a pool of database (or whatever) connections and, in case of a transaction, just pull one from that pool. It seems rather obvious that you just need one connection for this kind of application. But you can still keep a pool of one connection and apply the following:
Whenever a database transaction is needed, the connection is pulled from the pool and returned at the end.
(optional) The connection is expired (and replaced by a new one) after a certain amount of time.
(optional) The connection is expired after a certain amount of usage.
(optional) The pool can check (by sending an inexpensive query) whether the connection is alive before handing it over to the program.
This is somewhat in between single connection and connection per transaction strategies.
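A minimal sketch of such a one-connection pool, with age-based expiry and a liveness check (MySQLdb and the validation query are assumptions; usage-count expiry is left out for brevity):

import time
import MySQLdb   # assumed driver; placeholders throughout

class SingleConnectionPool:
    """A pool of exactly one connection, with expiry and a liveness check."""

    def __init__(self, max_age=3600, **connect_kwargs):
        self.max_age = max_age
        self.connect_kwargs = connect_kwargs
        self._conn = None
        self._opened_at = 0.0

    def _is_alive(self):
        try:
            cur = self._conn.cursor()
            cur.execute("SELECT 1")     # inexpensive validation query
            cur.fetchone()
            return True
        except Exception:
            return False

    def get(self):
        expired = (time.time() - self._opened_at) > self.max_age
        if self._conn is None or expired or not self._is_alive():
            if self._conn is not None:
                try:
                    self._conn.close()
                except Exception:
                    pass
            self._conn = MySQLdb.connect(**self.connect_kwargs)
            self._opened_at = time.time()
        return self._conn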
