I am using psycopg2 (2.6.1) to connect to Amazon's Redshift.
I have a query that should take about 1 second, but roughly 1 out of every 20 concurrent attempts just hangs forever (I kill them manually after 1 hour). To address this, I set statement_timeout before my query, like so:
rcur.execute("SET statement_timeout TO 60000")
rcur.execute(query)
so that after 1 minute the query gives up and I can try again (the second attempt does complete quickly, as expected). I confirmed the setting works by lowering the timeout to 1 ms and seeing it raise an exception. Even so, sometimes the Python code hangs instead of raising an exception (it never reaches the print directly after rcur.execute(query)). I can see in the Redshift AWS dashboard that the query was "terminated" after 59 seconds, yet my code still hangs for an hour instead of raising an exception.
Does anyone know how to resolve this, or have a better method of dealing with typically short queries that occasionally take unnaturally long and simply need to be cancelled and retried?
I think you need to configure TCP keepalive settings for the Redshift connection.
Follow the steps in this AWS doc to do that:
http://docs.aws.amazon.com/redshift/latest/mgmt/connecting-firewall-guidance.html
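If the hang is caused by an idle connection being silently dropped by a firewall or NAT, you can also enable TCP keepalives directly from the client. Here is a minimal sketch, assuming libpq-style keepalive parameters passed through psycopg2.connect; the host, credentials, and timing values are placeholders:

import psycopg2

# Placeholder cluster details; replace with your own.
conn = psycopg2.connect(
    host="my-cluster.example.redshift.amazonaws.com",
    port=5439,
    dbname="mydb",
    user="myuser",
    password="mypassword",
    # libpq TCP keepalive parameters: probe an idle connection so a dead
    # peer is noticed instead of the client waiting forever.
    keepalives=1,
    keepalives_idle=60,      # seconds of idleness before the first probe
    keepalives_interval=10,  # seconds between probes
    keepalives_count=3,      # failed probes before the connection is dropped
)

with conn.cursor() as rcur:
    rcur.execute("SET statement_timeout TO 60000")
    rcur.execute("SELECT 1")  # stand-in for the real query
    print(rcur.fetchall())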
We have built a series of programs that process data. In the last step, they write out a series of Neo4j commands to create nodes, and later, to connect those nodes. This is a sample of what is created.
CREATE (n1:ActionElement {
    nodekey:1,
    name:'type',
    resource: 'action',
    element:'ActionElement',
    description:'CompilationUnit',
    linestart:1,
    colstart:7,
    lineend:454,
    colend:70,
    content:[],
    level:1,
    end:false
});
The issue is that the generated file has ~20,000 lines. When I run it through the shell, I get an error on some of the transactions. It seems to alternately process and reject. I can't see a pattern, but I am assuming that I am overrunning the processing speed.
neo4j> CREATE (n1573)-[:sibling]->(n1572);
Connection refused
neo4j> CREATE (n1574)-[:sibling]->(n1573);
Connection refused
neo4j> CREATE (n1575)-[:sibling]->(n1574);
0 rows available after 3361 ms, consumed after another 2 ms
Added 2 nodes, Created 1 relationships
neo4j> CREATE (n1579)-[:sibling]->(n1578);
0 rows available after 78 ms, consumed after another 0 ms
Interestingly enough, it recovers, fails, then recovers.
Any thoughts? Is this just fundamentally the wrong way to do this? The LAST program to touch it happens to be Python; should I have it update the database directly? Thank you
In the end, it was a network issue combined with a transaction issue. To solve it, we FTPed the file to the same server as the Neo4j instance (eliminating the network latency) and then modified the Python loader to wrap each statement in a try/except: if it failed, wait 5 seconds and retry, and only raise an error if the second attempt also failed. That eliminated any 'transaction latency'. On a 16-core machine, it did not fail at all. On a single core, it retried 4 times across 20,000 updates, and all passed on the second try. Not ideal, but workable.
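A rough sketch of that load-and-retry loop, assuming the official neo4j Python driver and a generated command file named commands.cypher (the URI and credentials are placeholders):

import time
from neo4j import GraphDatabase

# Placeholder URI and credentials for the Neo4j instance.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def run_with_retry(session, statement, wait_seconds=5):
    """Run one Cypher statement, retrying once after a short pause on failure."""
    try:
        session.run(statement)
    except Exception:
        time.sleep(wait_seconds)
        session.run(statement)  # a second failure propagates to the caller

with open("commands.cypher") as f:  # the generated command file
    statements = [s.strip() for s in f.read().split(";") if s.strip()]

with driver.session() as session:
    for statement in statements:
        run_with_retry(session, statement)

driver.close()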
Thanks
We are calling Postgres using psycopg2 2.7.5 in the following way: we perform a query, then perform some operation on the data we received, then open a new connection and perform another query, and so on.
Usually a query takes between 15 s and 10 min.
Occasionally, after 2 h, we receive the error: Python Exception : connection already closed
What may be the reason for that? The data is the same and the query is the same, yet sometimes the same query returns results in 3 min and sometimes it hits that timeout after 2 hrs.
I wonder if it is possible that the connection is broken earlier, but for some reason Python only reports it after 2 hrs?
I doubt there are any locks on the DB at the moment we perform the query, but it may be under huge load and the max number of connections may be reached (not confirmed, but it is a possibility).
What would be the best way to track down the problem? The firewall is set to a 30 min timeout.
We are calling Postgres using psycopg2 2.7.5 in the following way: we perform a query, then perform some operation on the data we received, then open a new connection and perform another query, and so on.
Why do you keep opening new connections? What do you do with the old one, and when do you do it?
I wonder if it is possible that the connection is broken earlier, but for some reason Python only reports it after 2 hrs?
In general, a broken connection won't be detected until you try to use it. If you are using a connection pooler, it is possible the pool manager checks up on the connection periodically in the background.
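For illustration, a minimal way to act on that is to probe the connection with a trivial query before reusing it and reconnect if the probe fails. This is just a sketch; the DSN is a placeholder:

import psycopg2
from psycopg2 import InterfaceError, OperationalError

DSN = "dbname=mydb user=myuser host=db.example.com"  # placeholder connection string

def get_live_connection(conn=None):
    """Return a usable connection, replacing it if a cheap probe fails."""
    if conn is not None and not conn.closed:
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT 1")  # trivial query just to exercise the connection
            return conn
        except (OperationalError, InterfaceError):
            conn.close()
    return psycopg2.connect(DSN)

Enabling TCP keepalives on the connection (the libpq keepalives* parameters) can also help the client notice sooner when a firewall has silently dropped an idle connection.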
We are having issues recently with our prod servers connecting to Oracle. Intermittently we are getting "DatabaseError: ORA-12502: TNS:listener received no CONNECT_DATA from client". This issue is completely random and goes away by itself within a second, and it's not a Django problem; we can reproduce it with SQL*Plus from the servers.
We opened a ticket with Oracle support, but in the meantime I'm wondering if it's possible to simply retry any DB-related operation when it fails.
The problem is that I can't use try/except blocks in the code to handle this, since it can happen on ANY DB interaction in the entire codebase. I have to do this at a lower level so that I do it only once. Is there any way to install an error handler or something like that directly at the django.db.backends.oracle level, so that it covers the whole codebase? Basically, all I want to do is this:
try:
    execute_sql()
except DatabaseError as e:
    if "ORA-12502" in str(e):
        time.sleep(1)  # wait a second, then re-try the same operation
        execute_sql()
    else:
        raise
Is this even possible, or am I out of luck?
Thanks!
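For what it's worth, the retry described in the question can be expressed once as a small wrapper and applied to whichever entry points issue SQL. This is only a sketch; retry_on_ora_12502 is a hypothetical helper, and matching on the error text is an assumption, not an Oracle or Django API. Hooking it in at the django.db.backends.oracle level would need a custom database backend, which is beyond this sketch:

import time
from functools import wraps

from django.db import DatabaseError

def retry_on_ora_12502(func, wait_seconds=1):
    """Hypothetical helper: retry a callable once if it fails with ORA-12502."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except DatabaseError as exc:
            if "ORA-12502" in str(exc):
                time.sleep(wait_seconds)
                return func(*args, **kwargs)  # a second failure propagates
            raise
    return wrapper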
I have a small Python script that connects to a SQL Server (Microsoft) database, gets users from there, and then syncs them to another MySQL database. Basically, I'm just running queries to check whether a user exists and, if not, adding that user to the MySQL database.
The script usually takes around 1 min to sync. I need the script to do its work every 5 mins (for example), exactly once (one sync per 5 mins).
What would be the best way to go about building this?
I have some test data for the users, but on the real site there are a lot more users, so I can't guarantee the script takes 1 min to execute; it might even take 20 mins. However, having an interval of, say, 15 mins between executions would be ideal for the problem...
Update:
I have the connection params for the SQL Server Windows DB, so I'm using a small Ubuntu server to sync between the two databases, which are located on different servers. So let's say db1 (Windows) and db2 (Linux) are the database servers; I'm using s1 (a Python server) with the pymssql and MySQL Python modules to sync.
Regards
I am not sure cron is right for the job. It seems to me that if you have it run every 15 minutes, but a sync sometimes takes 20 minutes, you could have multiple processes running at once and possibly colliding.
If the driving force is a constant wait time between the variable execution times, then you might need a continuously running process with a wait.
import time

def main():
    loopInt = 0
    while loopInt < 10000:
        synchDatabase()        # your existing sync routine
        loopInt += 1
        print("call #" + str(loopInt))
        time.sleep(300)        # sleep 5 minutes between syncs

main()
(Obviously not continuous, but long running.) You can change the while condition to True and it will run continuously (comment out loopInt += 1).
Edited to add: please see the note in the comments about monitoring the process, as you don't want the script to hang or crash without you being aware of it.
You might want to use a system that handles queues, for example RabbitMQ, and use Celery as the Python interface to implement it. With Celery, you can add tasks (like the execution of a script) to a queue, or run a schedule that performs a task after a given interval (just like cron).
Get started http://celery.readthedocs.org/en/latest/
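A minimal sketch of a periodic Celery task, assuming a RabbitMQ broker at a placeholder URL, a module saved as sync.py, and a hypothetical sync_users function; exact config names can vary between Celery versions:

from celery import Celery

app = Celery("sync", broker="amqp://guest@localhost//")  # placeholder broker URL

@app.task
def sync_users():
    # call the existing pymssql -> MySQL sync logic here
    pass

# Have celery beat schedule the task every 5 minutes.
app.conf.beat_schedule = {
    "sync-every-5-minutes": {
        "task": "sync.sync_users",  # module_name.function_name
        "schedule": 300.0,          # seconds
    },
}

# Run with something like: celery -A sync worker --beat --concurrency=1
# (a single worker process keeps overlapping runs from executing in parallel)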
I have a one-year-old production site configured with the django.contrib.sessions.backends.cached_db session backend and a MySQL database backend. The reason I chose cached_db is a mix of security and read performance.
The problem is that the cleanup command, responsible for deleting all expired sessions, was never executed, resulting in a session table with 2.3 GB of data, 6 million rows, and 500 MB of index.
When I try to run ./manage.py cleanup (in Django 1.3) or ./manage.py clearsessions (its Django 1.5 equivalent), the process never ends (or at least my patience runs out after 3 hours).
The code that Django uses to do this is:
Session.objects.filter(expire_date__lt=timezone.now()).delete()
At first glance I thought that was normal because the table has 6M rows, but after inspecting the system monitor I discovered that all the memory and CPU were being used by the Python process, not mysqld, exhausting my machine's resources. I think something is terribly wrong with this command's code. It seems that Python iterates over all the expired session rows it finds before deleting each of them, one by one. In that case, refactoring the code to issue a raw DELETE FROM would resolve my problem and help the Django community, right? But if that's the case, the QuerySet delete is behaving strangely and is not optimized, in my opinion. Am I right?
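For illustration, the raw-SQL equivalent of that cleanup might look something like the sketch below, assuming the default django_session table; run it against the database the sessions actually live in. As far as I know, QuerySet.delete() pulls the matching objects into Python so it can handle cascades and signals, which matches the memory behaviour described above.

from django.db import connection
from django.utils import timezone

# Delete expired sessions in one statement instead of fetching them into Python.
cursor = connection.cursor()
cursor.execute(
    "DELETE FROM django_session WHERE expire_date < %s",
    [timezone.now()],
)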