Django 1.7 Migrations hanging - python

I have a Django migration I am trying to apply. It gets made fine (it's small; it only adds a CharField to two different models). However, when I run the actual migrate it hangs (no failure, no success, it just sits).
Through googling I've found that other open connections can interfere with it, so I restarted the DB. However, this DB is connected to continuously running jobs, and new queries do sneak in right away. They are small, though, and the last time I tried restarting I THINK I was able to execute my migrate before anything else got in. Still nothing.
Are there any other known issues that cause something like this?

At least in PostgreSQL, you cannot modify a table (even just to add a new column) while other transactions hold locks on it, because ALTER TABLE needs an exclusive lock. The easiest workaround for this is usually to:
run the migration script (which will hang)
restart your webserver/wsgi container
When you restart your webserver, all of its open transactions are aborted (assuming you don't have background processes that also hold transactions open), so as soon as no transaction is blocking your table, the migration will finish.
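If you want to see what is blocking you before restarting anything, a small sketch along these lines lists sessions that are idle inside an open transaction, which is exactly what blocks the ALTER TABLE (the DSN is a placeholder; the state/query columns exist on PostgreSQL 9.2+):
import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # hypothetical DSN
with conn.cursor() as cur:
    cur.execute(
        "SELECT pid, usename, client_addr, state, query "
        "FROM pg_stat_activity "
        "WHERE state = 'idle in transaction'"
    )
    for row in cur.fetchall():
        print(row)
conn.close()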

I was having this same problem today. I discovered that you can clear out any hanging transactions in PostgreSQL by running the following SQL immediately before running your migration:
-- View all the current activity
-- SELECT * FROM pg_stat_activity;
-- Terminate every connection other than your own (make sure to put your own IP address in the filter).
-- On PostgreSQL 9.2+ the column is "pid"; on older versions it is called "procpid".
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE client_addr <> 'YOUR IP HERE';
This will terminate any connections that aren't yours, which might not be ideal in all circumstances, but works like a charm.

Worth noting for future readers: migrations can also hang when you try to apply an ALTER for a CharField size your database implementation doesn't handle well. I was trying to alter a CharField to a max_length greater than 255 and it just hung. Even after terminating the connections as described above it would not go through, because a CharField larger than 255 was the wrong choice for my setup (PostgreSQL).
TL;DR: keep your CharField at max_length 255 or less; if you need more, change the CharField to a TextField and that may fix your problem!
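For illustration, a minimal model sketch of that workaround (the model and field names are made up, not from the question):
from django.db import models

class Article(models.Model):
    title = models.CharField(max_length=255)  # stays at or below the 255 limit
    body = models.TextField()                 # TextField instead of a CharField above 255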

Related

Debugging idle postgres query executed from sqlalchemy

I have a batch query that I'm running daily on my database. However, it seems to get stuck in idle state, and I'm having a lot of difficulty debugging what's going on.
The query is an aggregation over a table that is simultaneously being inserted into, which I'm guessing somehow relates to the issue. (The aggregation is over the previous day's data, so the insertions shouldn't affect the results.)
Clues
I'm running this inside a Python script using SQLAlchemy. However, I've set the transaction isolation level to autocommit (see the sketch below), so I don't think things are getting wrapped inside a transaction. On the other hand, I don't see the query hang when I run it manually in a SQL terminal.
By querying pg_stat_activity, the query initially comes into the database as state='active'. After maybe 15 seconds, the state changes to 'idle' and additionally, the xact_start is set to NULL. The waiting flag is never set to true.
Before I figured out the transaction level autocommit for sqlalchemy, it would instead hang in state 'idle in transaction' rather than 'idle'. And it possibly hangs slightly less frequently since making that change?
I feel like I'm not equipped to dig any deeper than I have on this. Any feedback, even explaining more about different states and relevant postgres internals without giving a definite answer, would be greatly appreciated.
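For reference, a minimal sketch of how the autocommit isolation level mentioned in the clues is typically set in SQLAlchemy (the connection URL and the query are placeholders):
from sqlalchemy import create_engine, text

# Hypothetical connection URL; isolation_level="AUTOCOMMIT" makes every
# statement commit on its own instead of running inside an implicit transaction.
engine = create_engine("postgresql://user:pass@localhost/mydb",
                       isolation_level="AUTOCOMMIT")

with engine.connect() as conn:
    conn.execute(text("SELECT 1"))  # placeholder for the real aggregation query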

Python (PostgreSQL) wrap local and remote calls in one transaction

I am doing synchronization between two databases in Odoo. If everything goes without issues on the remote side, both sides end up synchronized. But if something goes wrong on the remote, the local database changes are committed while the remote ones are not.
In other words, databases go out of sync.
Is there a way to make changes in the local database and, if something goes wrong while synchronizing the remote database, roll the local database back to its previous state?
There is this method:
@api.one
def order_process_now(self):
    servers = self._synchro_before_sale()
    # Process local order
    inv_id = self.action_invoice_create()
    if inv_id:
        inv = self.env['account.invoice'].search([('id', '=', inv_id)])
        inv.signal_workflow('invoice_open')
    for picking in self.picking_ids:
        picking.force_assign()
        picking.action_done()
    # Process remote orders
    self._remote_order_action('order_process_now', servers)
As you can see, it is divided into two parts. First it makes changes to the local database, then it makes changes on the remote (using xmlrpclib with the erppeek wrapper).
How can I make this method as one transaction, so if anything goes wrong executing method, any changes to databases would rollback?
What you need for this is two-phase commit.
The general idea is:
Begin your local and remote changes
Do the required work on each
On the remote side PREPARE TRANSACTION with a chosen transaction ID and record that ID in persistent storage
On the local side COMMIT the changes
On the remote side COMMIT PREPARED with the recorded ID, or if the local commit failed for some reason, ROLLBACK PREPARED instead.
If your app restarts it must look at its record of prepared-but-not-committed remote transactions and:
if the local transaction was committed, issue a COMMIT PREPARED; or
if the local transaction was NOT committed, issue a ROLLBACK PREPARED
This is not simple to get right. The naïve approach that fails to record the local commit ID doesn't really fix anything, it just replaces inconsistent database state with leaked prepared transactions. You must actually keep a record of the prepared transactions and resolve them after a crash or restart. Remember that the ROLLBACK PREPARED or COMMIT PREPARED can fail due to connectivity issues, DB restarts, etc.
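To make the steps concrete, here is a rough sketch of that sequence using psycopg2's two-phase-commit helpers; everything specific in it (the connection strings, the sale_order update, the pending_remote_tx bookkeeping table) is an assumption for illustration, not code from the question, and the remote server must have max_prepared_transactions > 0:
import uuid
import psycopg2

local = psycopg2.connect("dbname=local_db")    # hypothetical local DSN
remote = psycopg2.connect("dbname=remote_db")  # hypothetical remote DSN

xid = remote.xid(0, str(uuid.uuid4()), "odoo-sync")  # global transaction ID

try:
    remote.tpc_begin(xid)
    with remote.cursor() as rcur:
        rcur.execute("UPDATE sale_order SET state = 'done' WHERE id = %s", (42,))
    remote.tpc_prepare()                      # PREPARE TRANSACTION on the remote

    with local.cursor() as lcur:
        lcur.execute("UPDATE sale_order SET state = 'done' WHERE id = %s", (42,))
        # Record the prepared ID so a restart/recovery job can resolve it later.
        lcur.execute("INSERT INTO pending_remote_tx (gid) VALUES (%s)", (xid.gtrid,))
    local.commit()                            # local COMMIT
except Exception:
    local.rollback()
    remote.tpc_rollback()                     # ROLLBACK PREPARED (or plain rollback)
    raise
else:
    remote.tpc_commit()                       # COMMIT PREPARED
    with local.cursor() as lcur:
        lcur.execute("DELETE FROM pending_remote_tx WHERE gid = %s", (xid.gtrid,))
    local.commit()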
For this reason many people use a separate transaction manager that takes care of this part for them. MSDTC is an option on Windows systems. For Java you can supposedly use JTA. On C/UNIX systems you could use XA. Unfortunately, distributed transaction managers seem to attract horrible, baroque and ill-defined API design (can you say javax.transaction.HeuristicMixedException?)
You'll need to look at two-phase commits. Basically this lets you do a trial commit on each separate system and then, only if both succeed, do a final "real" commit.
You still need to deal with the case where e.g. the client crashes. Then you'll have prepared commits hanging about and you'll want to roll them back and start again.

Raw query in Django much slower than the same query in Postgres

I'm facing the problem of an extremely slow (raw) query in my Django app. Strangely enough, it's not slow when I launch the isolated query from the shell (e.g. python manage.py my_code_query), but it is slow when I run the whole program that contains all my queries (it always "blocks" at the same query; it does eventually complete, but it's something like 100x slower). It's as if all the queries that run before the problematic one consume memory and there isn't enough memory left when my query starts. The same query run directly in Postgres has no problem at all.
I read somewhere (Django cursor.execute(QUERY) much slower than running the query in the postgres database) that the work_mem setting in Postgres can cause this problem, but it's not very clear how to set it from Django. Do I have to make a call from connection.cursor().execute() to set the work_mem parameter? Once only?
Could it be another problem than the work_mem setting?
Any hint will be very appreciated.
Thanks,
Patrick
Inspired by that post (How can I tell Django to execute all queries using 10MB statement mem?), I made this call before executing my cursor:
cursor.execute("set work_mem='100MB'") #set statement_mem does not work
It's running timely now.
--EDIT: Well, that was yesterday. Today it's not running timely anymore. Don't know why.
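For context, a minimal sketch of how a per-session work_mem change like the one above can be applied through Django's cursor (the function, table and query are placeholders, not from the question):
from django.db import connection

def run_report():
    with connection.cursor() as cursor:
        # Raise work_mem for this connection only; the value is an example.
        cursor.execute("SET work_mem = '100MB'")
        cursor.execute("SELECT count(*) FROM myapp_bigtable")  # placeholder for the slow query
        return cursor.fetchall()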

How can I detect total MySQL server death from Python?

I've been doing some HA testing of our database and in my simulation of server death I've found an issue.
My test uses Django and does this:
Connect to the database
Do a query
Pull out the network cord of the server
Do another query
At this point everything hangs indefinitely within the mysql_ping function. As far as my app is concerned it is connected to the database (because of the previous query), it's just that the server is taking a long time to respond...
Does anyone know of any ways to handle this kind of situation? connect_timeout doesn't work as I'm already connected. read_timeout seems like a somewhat too blunt instrument (and I can't even get that working with Django anyway).
Setting the default socket timeout also doesn't work (and would be vastly too blunt as this would affect all socket operations and not just MySQL).
I'm seriously considering doing my queries within threads and using Thread.join(timeout) to perform the timeout.
In theory, if I can do this timeout then reconnect logic should kick in and our automatic failover of the database should work perfectly (kill -9 on affected processes currently does the trick but is a bit manual!).
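For what it's worth, a rough sketch of that Thread.join(timeout) idea (the timeout value, query handling and error handling are placeholders; the abandoned worker thread will still sit there until its socket eventually errors out):
import threading
from django.db import connection

def run_with_timeout(sql, params=None, timeout=5.0):
    # Django gives each thread its own DB connection, so the worker below
    # uses a fresh one; the hung connection itself is simply abandoned.
    result = {}

    def target():
        try:
            with connection.cursor() as cursor:
                cursor.execute(sql, params)
                result['rows'] = cursor.fetchall()
        except Exception as exc:
            result['error'] = exc

    worker = threading.Thread(target=target)
    worker.daemon = True
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        # No answer from the server: treat the connection as dead and let
        # reconnect/failover logic take over.
        raise TimeoutError("query did not return within %.1fs" % timeout)
    if 'error' in result:
        raise result['error']
    return result.get('rows')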
I would think this would be more in line with setting a read_timeout on your front-facing webserver. Any number of reasons could hold up your Django app indefinitely. While you have found one specific case, there could be many more (code errors, cache difficulties, etc.).

Sometimes can't delete an Oracle database row using Django

I have a unit test which contains the following line of code
Site.objects.get(name="UnitTest").delete()
and this has worked just fine until now. However, that statement is currently hanging. It'll sit there forever trying to execute the delete. If I just say
print Site.objects.get(name="UnitTest")
then it works, so I know that it can retrieve the site. No other program is connected to Oracle, so it's not like there are two developers stepping on each other somehow. I assume that some sort of table lock hasn't been released.
So short of shutting down the Oracle database and bringing it back up, how do I release that lock or whatever is blocking me? I'd like to not resort to a database shutdown because in the future that may be disruptive to some of the other developers.
EDIT: Justin suggested that I look at the DBA_BLOCKERS and DBA_WAITERS tables. Unfortunately, I don't understand these tables at all, and I'm not sure what I'm looking for. So here's the information that seemed relevant to me:
The DBA_WAITERS table has 182 entries with lock type "DML". The DBA_BLOCKERS table has 14 entries whose session ids all correspond to the username used by our application code.
Since this needs to get resolved, I'm going to just restart the web server, but I'd still appreciate any suggestions about what to do if this problem repeats itself. I'm a real novice when it comes to Oracle administration and have mostly just used MySQL in the past, so I'm definitely out of my element.
EDIT #2: It turns out that despite what I thought, another programmer was indeed accessing the database at the same time as me. So what's the best way to detect this in the future? Perhaps I should have shut down my program and then queried the DBA_WAITERS and DBA_BLOCKERS tables to make sure they were empty.
From a separate session, can you query the DBA_BLOCKERS and DBA_WAITERS data dictionary tables and post the results? That will tell you if your session is getting blocked by a lock held by some other session, as well as what other session is holding the lock.
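If it's easier to do from the Django side, a quick sketch along these lines dumps both views (assuming your database user is allowed to read these dictionary views):
from django.db import connection

def show_blockers_and_waiters():
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM DBA_BLOCKERS")
        print("Blockers:", cursor.fetchall())
        cursor.execute("SELECT * FROM DBA_WAITERS")
        print("Waiters:", cursor.fetchall())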
