Checking my MySQL connection when using Python

I have a Python script which reads files into my tables in MySQL. This program now runs automatically every now and then. However, I'm afraid of two things:
There might come a time when the program stops running because it can't connect to the MySQL server. A lot of processes depend on these tables, so if the tables are not up to date the rest of my process will also stop working.
A file might sneak into the process which does not have the expected content. After the script finishes running, every value of column X must have 12 rows. If it does not have 12 rows, this means the files did not have the right content inside them.
My question is: is there something I can do to tackle this before it happens? For example, send an e-mail to myself so I can be notified if the connection fails, run the program on another server, or get a warning if a certain value does NOT have 12 rows?
I'm very eager to know how you handle these situations.
I have a very simple connection made like this:
import mysql.connector

mydb = mysql.connector.connect(
    host='localhost',
    user='root',
    passwd='*****.',
    database='my_database'
)

The event you are talking about is very unlikely to happen; the only situation in which I could see it happening is when your database server runs out of memory. For that, you can set aside a couple of minutes every two or three days to check how much memory is left on your server.
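If you do want the e-mail alert and the row-count check the question asks about, a rough sketch could look like the one below. This is my own illustration, not part of the answer above: the table name my_table, the column x, the e-mail addresses and the local SMTP relay are all placeholder assumptions.

import smtplib
from email.message import EmailMessage

import mysql.connector
from mysql.connector import Error

def send_alert(subject, body):
    # placeholder addresses; assumes a mail relay listening on localhost
    msg = EmailMessage()
    msg['Subject'] = subject
    msg['From'] = 'loader@example.com'
    msg['To'] = 'me@example.com'
    msg.set_content(body)
    with smtplib.SMTP('localhost') as smtp:
        smtp.send_message(msg)

try:
    mydb = mysql.connector.connect(
        host='localhost',
        user='root',
        passwd='*****',
        database='my_database',
        connection_timeout=10,
    )
except Error as exc:
    send_alert('MySQL load failed', 'Could not connect: %s' % exc)
    raise

# after loading, flag any value of column x that does not have exactly 12 rows
cur = mydb.cursor()
cur.execute("SELECT x, COUNT(*) FROM my_table GROUP BY x HAVING COUNT(*) <> 12")
bad = cur.fetchall()
if bad:
    send_alert('MySQL load incomplete', 'Values of x without 12 rows: %r' % bad)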

Related

How to implement 'sqlite3_busy_timeout' in python?

I'm trying to run a python script with multiple threads, but I'm getting the following error:
sqlite3.OperationalError: database is locked
I've found out that I need to extend the sqlite3_busy_timeout to make it wait a bit longer before writing to the database.
The code used for this looks like the following:
'db.configure("busyTimeout", 10000)' //This should make it wait for 10 seconds.
What I want to know is how do I implement this code? Where should I place it, before or after the SQLite command? also, do I have to write anything before it? like c.execute("code")?
You can set the timeout with the busy_timeout pragma. Here is an example setting the busy timeout to 10 seconds:
import sqlite3

with sqlite3.connect('example.db') as db:
    db.execute('pragma busy_timeout=10000')
    # do more work with db...
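As an aside (an assumption on my part, not something the answer mentions), the sqlite3 module also accepts a timeout argument on connect(), expressed in seconds, which has a similar effect:

import sqlite3

# timeout here is in seconds; 10 matches the 10000 ms busy_timeout above
with sqlite3.connect('example.db', timeout=10) as db:
    db.execute('INSERT INTO example_table VALUES (1)')  # hypothetical table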

Run python job every x minutes

I have a small Python script that basically connects to a SQL Server (Microsoft) database, gets users from there, and then syncs them to another MySQL database. Basically I'm just running queries to check if the user exists and, if not, adding that user to the MySQL database.
The script usually takes around 1 min to sync. I require the script to do its work every 5 mins (for example), exactly once per interval (one sync per 5 mins).
What would be the best way to go about building this?
I have some test data for the users, but on the real site there are a lot more users, so I can't guarantee the script takes 1 min to execute; it might even take 20 mins. However, having an interval of, say, 15 mins between each execution of the script would be ideal for the problem...
Update:
I have the connection params for the SQL Server Windows db, so I'm using a small Ubuntu server to sync between the two databases located on different servers. So let's say db1 (Windows) and db2 (Linux) are the database servers; I'm using s1 (the Python server) and the pymssql and mysql Python modules to sync.
Regards
I am not sure cron is right for the job. It seems to me that if you have it run every 15 minutes but a sync sometimes takes 20 minutes, you could have multiple processes running at once and possibly colliding.
If the driving force is a constant wait time between the variable execution times, then you might need a continuously running process with a wait.
import time

def main():
    loopInt = 0
    while loopInt < 10000:
        synchDatabase()
        loopInt += 1
        print("call #" + str(loopInt))
        time.sleep(300)  # sleep 5 minutes

main()
(Obviously not continuous, but long running.) You can make the loop truly continuous by changing the while condition to True (and commenting out loopInt += 1).
Edited to add: please see the note in the comments about monitoring the process, as you don't want the script to hang or crash without you being aware of it.
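If you do stick with cron, one common way to avoid the collision mentioned above (my suggestion, not part of the answer) is a non-blocking lock file, so a run that starts while the previous one is still syncing simply exits. This is Unix-only, which fits the Ubuntu server in the update; the lock path is a placeholder:

import fcntl
import sys

LOCKFILE = '/tmp/user_sync.lock'    # placeholder path

def main():
    with open(LOCKFILE, 'w') as lock:
        try:
            # non-blocking exclusive lock: fails immediately if another run holds it
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            sys.exit('previous sync still running, skipping this run')
        synchDatabase()             # the sync routine from the snippet above

if __name__ == '__main__':
    main()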
You might want to use a system that handles queues, for example RabbitMQ, and use Celery as the python interface to implement it. With Celery, you can add tasks (like execution of a script) to a queue or run a schedule that'll perform a task after a given interval (just like cron).
Get started http://celery.readthedocs.org/en/latest/
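For a feel of what that could look like, here is a minimal sketch (assuming Celery 4+ configuration style and a RabbitMQ broker on localhost; the module name sync and the task name are placeholders):

from celery import Celery

# placeholder broker URL; assumes RabbitMQ running locally
app = Celery('sync', broker='amqp://guest@localhost//')

# run the task every 5 minutes
app.conf.beat_schedule = {
    'sync-users-every-5-minutes': {
        'task': 'sync.sync_users',
        'schedule': 300.0,          # seconds
    },
}

@app.task
def sync_users():
    # call the existing pymssql -> mysql sync logic here
    ...

If this lives in sync.py, something like celery -A sync worker -B starts a worker with the embedded beat scheduler, which is enough for a single small setup.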

What is the proper way of using MySQLdb connections and cursors across multiple functions in Python

I'm kind of new to Python and its MySQLdb connector.
I'm writing an API to return some data from a database using the RESTful approach. In PHP, I wrapped the Connection management part in a class, acting as an abstraction layer for MySQL queries.
In Python:
I define the connection early on in the script: con = mdb.connect('localhost', 'user', 'passwd', 'dbname')
Then, in all subsequent methods:
import MySQLdb as mdb

def insert_func():
    with con:
        cur = con.cursor(mdb.cursors.DictCursor)
        cur.execute("INSERT INTO table (col1, col2, col3) VALUES (%s, %s, %s)", (val1, val2, val3))
        rows = cur.fetchall()
        # do something with the results
        return someval
etc.
I use mdb.cursors.DictCursor because I prefer to be able to access database columns in an associative array manner.
Now the problems start popping up:
In one function, I issue an insert query to create a 'group' with a unique 'groupid'.
This 'group' has a creator. Every user in the database holds a JSON array in the 'groups' column of his/her row in the table.
So when I create a new group, I want to assign the groupid to the user that created it.
I update the user's record using a similar function.
I've wrapped the 'insert' and 'update' parts in two separate function defs.
The first time I run the script, everything works fine.
The second time I run the script, the script runs endlessly (I suspect due to some idle connection to the MySQL database).
When I interrupt it using CTRL + C, I get one of the following errors:
"'Cursor' object has no attribute 'connection'"
"commands out of sync; you can't run this command now"
or any other KeyboardInterrupt exception, as would be expected.
It seems to me that these errors are caused by some erroneous way of handling connections and cursors in my code.
I read it was good practice to use with con: so that the connection will automatically close itself after the query. I use 'with' on 'con' in each function, so the connection is closed, but I decided to define the connection globally, for any function to use it. This seems incompatible with the with con: context management. I suspect the cursor needs to be 'context managed' in a similar way, but I do not know how to do this (To my knowledge, PHP doesn't use cursors for MySQL, so I have no experience using them).
I now have the following questions:
Why does it work the first time but not the second? (It will, however, work again, once, after the Ctrl+C interrupt.)
How should I go about using connections and cursors when using multiple functions (that can be called upon in sequence)?
I think there are two main issues going on here: one is the Python code itself and the other is how you're structuring your interaction with the DB.
First, you're not closing your connection. This depends on your application's needs - you have to decide how long it should stay open. Reference this SO question
from contextlib import closing

with closing(connection.cursor()) as cursor:
    ...  # use the cursor
# cursor closed. Guaranteed.

connection.close()
Right now, you have to interrupt your program with Ctrl+C because there's no reason for your with statement to stop running.
Second, start thinking about your interactions with the DB in terms of 'transactions'. Do something and commit it to the DB; if it didn't work, roll back; if it did, close the connection. Here's a tutorial.
With connections, as with file handles, the rule of thumb is: open late, close early.
So I would recommend sharing connections only where they are doing one thing. Or, if you multiprocess, then each process gets its own connection, again following "open late, close early". And if you are doing sequential operations (say in a loop), open and close outside the loop. Having global connections can get messy, mainly because you then have to keep track of which function uses the connection at what time and what it tries to do with it.
The issue of "cannot run command now" is because your keyboard interrupt kills the active connection.
As to part one of your question: the endless hang could be happening anywhere. Each instance of Python gets its own connection, so when you run the script the second time it should get its own connection. Open up a mysql client and run
show full processlist
to see what's going on.
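To make the "open late, close early" advice concrete, here is a hedged sketch of a per-function connection with an explicit commit/rollback; the table and column names are made up for illustration:

import MySQLdb as mdb
from contextlib import closing

def insert_group(name):
    # hypothetical table/column; each call opens and closes its own connection
    con = mdb.connect('localhost', 'user', 'passwd', 'dbname')
    try:
        with closing(con.cursor(mdb.cursors.DictCursor)) as cur:
            cur.execute("INSERT INTO groups (name) VALUES (%s)", (name,))
            new_id = cur.lastrowid
        con.commit()                # make the insert permanent
        return new_id
    except mdb.Error:
        con.rollback()              # undo the partial work on failure
        raise
    finally:
        con.close()                 # open late, close early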

Python re-establishment after server shuts down

I have a Python script running on my server which accesses a database, executes a fetch query, and runs a learning algorithm to classify and update certain values and means depending on the query.
My worry is that if for some reason my server shuts down in between, my Python script would stop and the query's results would be lost.
How do I know where to continue from once I re-run the script, so I can carry on with the updated means from the previous queries?
First of all: the question is not really related to Python at all; it's a general problem.
And the answer is simple: keep track of what your script does (in a file or directly in the db). If it crashes, continue from the last step.
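As a minimal sketch of that idea, you could persist the last processed id and the running means to a small JSON file after every batch and read it back on start-up; the file name and state keys below are placeholders:

import json
import os

STATE_FILE = 'progress.json'        # placeholder path

def load_state():
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as fh:
            return json.load(fh)
    return {'last_id': 0, 'means': {}}

def save_state(state):
    tmp = STATE_FILE + '.tmp'
    with open(tmp, 'w') as fh:
        json.dump(state, fh)
    os.replace(tmp, STATE_FILE)     # atomic rename, so a crash can't corrupt the file

state = load_state()
# ...fetch rows with id > state['last_id'], update state['means'],
#    then call save_state(state) after each processed batch...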

Locking a row with SQLite (read lock ?)

I have developed a basic proxy tester in Python. Proxy IPs and ports, as well as their date_of_last_test (e.g. 31/12/2011 10:10:10) and result_of_last_test (OK or KO), are stored in a single SQLite table. (I realize I could store a lot more detail on the test results and keep a history / stats, but this simple model suits my needs.)
Here is the simplified code of the tester main loop, where I loop over the proxies and update their status:
while True:
    # STEP 1: select
    myCursor.execute("SELECT * from proxy ORDER BY date_of_last_test ASC;")
    row = myCursor.fetchone()
    # STEP 2: update
    if isProxyWorking(row['ip'], row['port']):  # this test can last a few seconds
        updateRow(row['ip'], row['port'], 'OK')
    else:
        updateRow(row['ip'], row['port'], 'KO')
My code works fine when run as a single process. Now, I would like to be able to run many processes of the program, using the same SQLite database file.
The problem with the current code is the lack of a locking mechanism that would prevent several processes from testing the same proxy.
What would be the cleanest way to put a lock on the row at STEP 1 / SELECT time, so that the next process doing a SELECT gets the next row?
In other words, I'd like to avoid the following situation:
Let's say it's 10PM, and the DB contains 2 proxies:
Proxy A tested for the last time at 8PM and proxy B tested at 9PM.
I start two processes of the tester to update their statuses:
10:00 - Process 1 gets the "oldest" proxy to test it: A
10:01 - Process 2 gets the "oldest" proxy to test it: !!! A !!! (here I'd like Process 2 to get proxy B, because A is already being tested, though not yet updated in the db)
10:10 - Testing of A by Process 1 is over; its status is updated in the DB
10:11 - Testing of A by Process 2 is over; its status is updated (!!! AGAIN !!!) in the DB
There is no actual error/exception in this case, but a waste of time I want to avoid.
SQLite only allows one process to update anything in the database at a time. From the FAQ:
Multiple processes can have the same database open at the same time. Multiple processes can be doing a SELECT at the same time. But only one process can be making changes to the database at any moment in time.
and
When SQLite tries to access a file that is locked by another process, the default behavior is to return SQLITE_BUSY. You can adjust this behavior from C code using the sqlite3_busy_handler() or sqlite3_busy_timeout() API functions.
So if there are only a few updates then this will work; otherwise you need to change to a more capable database.
So there is only one lock, and it is on the whole database.
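Building on that single-writer behaviour (this is my suggestion, not something the answer above spells out), each process can "claim" the oldest proxy by bumping its date_of_last_test inside an immediate transaction before testing it, so the next SELECT in another process returns a different row. The database file name and the timestamp format are placeholders, and the timestamp column needs a format that sorts chronologically for the ORDER BY to work:

import sqlite3
from datetime import datetime

def claim_next_proxy(path='proxies.db'):            # placeholder db file
    # isolation_level=None -> autocommit, so we control the transaction by hand
    con = sqlite3.connect(path, timeout=10, isolation_level=None)
    con.row_factory = sqlite3.Row
    try:
        con.execute('BEGIN IMMEDIATE')              # take the write lock up front
        row = con.execute(
            'SELECT ip, port FROM proxy ORDER BY date_of_last_test ASC LIMIT 1'
        ).fetchone()
        if row is not None:
            # bump the timestamp so other processes move on to the next proxy
            con.execute(
                'UPDATE proxy SET date_of_last_test = ? WHERE ip = ? AND port = ?',
                (datetime.now().isoformat(sep=' '), row['ip'], row['port']),
            )
        con.execute('COMMIT')
        return row                                  # test it outside the transaction
    finally:
        # an uncommitted transaction is rolled back when the connection closes
        con.close()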
