Django and Sqlite Concurrency issue - python

I've done a bit of reading about the concurrency issues with SQLite, but I don't see how they'd apply to Django, since it's inherently single-threaded. I'm not using any multiprocessing modules either, and I have absolutely no experience with concurrent programming, so if someone can identify WHY the following code is causing an OperationalError: 'database is locked', I'd be grateful.
views.py
def screening(request, ovramt=None):
    errors = []
    if request.method == "POST":
        form = ScreeningForm(request.POST)
        if form.is_valid():
            print "Woo valid!!"
            return HttpResponse()
    else:  # GET
        if ovramt is None:
            o = Ovramt.objects.select_related(depth=1).latest("date_completed")
            print "found?"
            print o.id
        else:
            try:
                o = Ovramt.objects.select_related(depth=1).get(id=ovramt)
            except:
                errors.append("OVRAMT NOT FOUND")
        if o.residents.count() <= 0:
            o.add_active_residents()
        residents = list(o.residents)
models.py
def add_active_residents(self):
    ssa_res = SSA_Resident.objects.select_related(depth=1).filter(ssa=self.ssa, active=True)
    for r in ssa_res:
        self.residents.add(r.resident)  # Fails here
    self.save()
The add_active_residents method works fine until it is called from the views module. Is there an open database connection in the view that prevents writing from the model? Does anyone have an explanation for why this code errors?

In the following method:
def add_active_residents(self):
    ssa_res = SSA_Resident.objects.select_related(depth=1).filter(ssa=self.ssa, active=True)
    for r in ssa_res:
        self.residents.add(r.resident)  # Fails here
    self.save()
Why is there a select_related? You only really need the FKs of the ssa_res items, so why do additional queries for the related items?
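For illustration, a minimal sketch of the same method without select_related, fetching only the FK values (this assumes a Django version whose many-to-many add() also accepts primary key values):
def add_active_residents(self):
    # Fetch only the resident FK values; no related-object queries are needed.
    resident_ids = SSA_Resident.objects.filter(ssa=self.ssa, active=True) \
                                       .values_list('resident_id', flat=True)
    self.residents.add(*resident_ids)  # M2M add() also accepts primary keys
    self.save()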

Are you using Python 2.6?
If so, this is (apparently) a known issue that can be mitigated by adding:
DATABASE_OPTIONS = {'timeout': 30}
to your settings.py
See http://code.djangoproject.com/ticket/9409
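For context, a minimal sketch of where that option lives in an old-style settings.py (the paths here are placeholders); on newer Django versions the equivalent goes into the OPTIONS key of the DATABASES entry:
# settings.py (old-style settings, matching the ticket above)
DATABASE_ENGINE = 'sqlite3'
DATABASE_NAME = '/path/to/db.sqlite3'
DATABASE_OPTIONS = {'timeout': 30}  # seconds SQLite waits on a locked database

# Equivalent on newer Django versions that use the DATABASES dict:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': '/path/to/db.sqlite3',
        'OPTIONS': {'timeout': 30},
    }
}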

My understanding is that only write operations will result in a db-locked condition.
http://www.sqlite.org/lockingv3.html
It's hard to say what the problem is without knowing how Django handles SQLite internally.
Speaking from experience using SQLite with plain CGI, I've noticed that in some cases it can take a 'long' time to release the lock. You may want to increase the timeout value mentioned by Matthew Christensen.

It sounds like you are actually running a multithreaded application, despite what you say. I am a bit clueless about Django, but I would assume that even though it might be single-threaded, whatever debugging or production server you run your application in won't be "inherently single-threaded".


How to use self.env.commit() properly?

Could someone please explain to me how self.env.cr.commit() works, when to use it, and some good practices?
From the Odoo documentation it seems the use of cr.commit() is very dangerous. This is my first time using it and I am not sure how to use it properly for my use case.
Edit:
More information for my use case: I am creating shipments through a shipping provider API. Let's say my API call is successful and I have created a shipment, but during handling of the response I have to raise a UserError for some reason, and my changes are rolled back. So now the state of the shipment is different in Odoo and on the shipping provider's server, which is unacceptable.
So if I am calling the method create_dhl_shipment() and the flag variable is True (an error occurred during the last API call), then I would like to delete the original shipment and create a new one.
And my problem is: how do I make a change in the database and keep it from being rolled back?
During a search on the internet, I came across cr.commit(), but the Odoo documentation really discourages using it.
very simplified example:
class StockPickingInherited(models.Model):
    _inherit = 'stock.picking'

    remnant_shipment = fields.Boolean("Possible remnant shipment")
    packages = fields.One2many("stock.shipment.package", "picking_id")

    def create_dhl_shipment(self):
        response_from_shipping_provider = requests.get("API URL")
        if response_from_shipping_provider.status_code != 200:
            if not self.remnant_shipment:
                raise UserError("Shipment creation failed")
            else:
                self.write({"packages": [(5, 0, 0)]})
                self.env.cr.commit()  # write data into the db and keep the change from being rolled back by the UserError below
                raise UserError("Shipment creation failed")
Am I doing it right? Are there some potential dangers?
It is true that self.env.cr.commit() should be used sparingly. However, there are some legitimate use cases for it. For example:
@api.model
def some_cron_job(self):
    for record in self.env[...].search(...):
        record.do_some_process()
        self.env.cr.commit()
The above is fine because you are processing a batch of records, and to avoid the cron job redoing everything over and over because of an error on one of the records, you can commit after each record has been processed. (PS: the above could be made safer with try...except, or by marking a record as "failed", for example. This can be done in conjunction with commit(), as sketched below.)
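A minimal sketch of that safer variant (the model name and the failed field are hypothetical placeholders):
@api.model
def some_cron_job(self):
    for record in self.env['some.model'].search([]):
        try:
            record.do_some_process()
        except Exception:
            # Mark the record as failed instead of aborting the whole batch;
            # `failed` is a hypothetical Boolean field on the model.
            record.failed = True
        # Persist this record's outcome so an error later in the batch
        # (or a crash) does not roll it back.
        self.env.cr.commit()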
In your use case you want to display a message to the user, but your problem is that once you raise, the transaction will roll back, so to counter this you call commit().
I have had similar situations and I sometimes find that using an @api.onchange works well for showing a message to the user.
@api.onchange('your_field')
def onchange_your_field(self):
    if self.your_field > 100:
        raise UserError("Thanks for entering the field")
However, I am no longer a fan of that approach, since Odoo has gotten rid of most @api.onchange methods in favour of computed fields. (There are good reasons for the move, so try to avoid it too.)
Luckily there is a new way to do this, introduced in Odoo Version 13.0.
def some_method(self):
    # Do something
    self.write({"something": False})
    # Display message
    return {
        'type': 'ir.actions.client',
        'tag': 'display_notification',
        'params': {
            'title': title,
            'message': message,
            'sticky': False,
        }
    }
The above is taken from here in the Odoo repo where a message is displayed to the user if the mail server credentials are correct.
The following rule is taken from the Odoo guidelines (never commit the transaction):
You should NEVER call cr.commit() yourself, UNLESS you have created your own database cursor explicitly! And the situations where you need to do that are exceptional! And by the way, if you did create your own cursor, then you need to handle error cases and proper rollback, as well as properly close the cursor when you're done with it.
In the following example, we create a new cursor to avoid rollback that could be caused by an upper method:
registry = odoo.registry(self.env.cr.dbname)
with registry.cursor() as cr:
    env = api.Environment(cr, SUPERUSER_ID, {})
For more details, check the well-documented Odoo guidelines.
You can find an example of this pattern in the auth_ldap module.
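Applied to the shipment use case, a rough sketch (the field name is the asker's; the helper name is hypothetical, and this assumes writing from a separate cursor is acceptable for your data) of recording the remnant shipment in its own cursor so the write survives the UserError rollback:
def _flag_remnant_shipment(self):
    # Open an independent cursor so this write is committed even if the
    # caller's transaction is rolled back by a UserError afterwards.
    registry = odoo.registry(self.env.cr.dbname)
    with registry.cursor() as cr:
        env = api.Environment(cr, SUPERUSER_ID, {})
        env['stock.picking'].browse(self.id).write({"remnant_shipment": True})
    # The `with` block commits the new cursor on success and closes it.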

Recover from PendingRollbackError and allow subsequent queries

We have a pyramid web application.
We use SQLAlchemy 1.4 with Zope transactions.
In our application, it is possible for an error to occur during a flush, as described here, which causes any subsequent usage of the SQLAlchemy session to throw a PendingRollbackError. The error which occurs during the flush is unintentional (a bug), and is raised to our exception-handling view... which tries to use data from the SQLAlchemy session, which then throws a PendingRollbackError.
Is it possible to "recover" from a PendingRollbackError if you have not framed your transaction management correctly? The SQLAlchemy documentation says that to avoid this situation you essentially "just need to do things the right way". Unfortunately, this is a large codebase, and developers don't always follow correct transaction management. This issue is also complicated if savepoints/nested transactions are used.
def some_view():
    # constraint violation
    session.add_all([Foo(id=1), Foo(id=1)])
    session.commit()  # Error is raised during flush
    return {'data': 'some data'}

def exception_handling_view():  # Wired in via pyramid framework, error ^ enters here.
    session.query(... does a query to get some data)  # This throws a `PendingRollbackError`
I am wondering if we can do something like the below, but don't understand pyramid + SQLAlchemy + Zope transactions well enough to know the implications (when considering the potential for nested transactions etc).
def exception_handling_view():  # Wired in via pyramid framework, error ^ enters here.
    def _query():
        session.query(... does a query to get some data)
    try:
        _query()
    except PendingRollbackError:
        session.rollback()
        _query()
Instead of trying to execute your query, just try to get the connection:
def exception_handling_view():
    try:
        _ = session.connection()
    except PendingRollbackError:
        session.rollback()
    session.query(...)
session.rollback() only rolls back the innermost transaction, as is usually expected (assuming nested transactions are used intentionally via an explicit session.begin_nested()).
You don't have to roll back parent transactions, but if you decide to do that, you can:
while session.registry().in_transaction():
    session.rollback()
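Putting the two snippets together, a rough sketch of how the exception view might reset the session before querying (SomeModel is a placeholder; with Zope transactions you may prefer to let the transaction manager drive the rollback instead):
from sqlalchemy.exc import PendingRollbackError

def exception_handling_view(exc, request):
    # Probe the session: session.connection() raises PendingRollbackError
    # if an earlier flush failed and the transaction is still dirty.
    try:
        session.connection()
    except PendingRollbackError:
        session.rollback()  # or the while-loop above, to unwind parents too
    data = session.query(SomeModel).all()  # SomeModel is a placeholder
    return {'error': str(exc), 'data': data}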

simple example of working with neo4j python driver?

Is there a simple example of working with the neo4j python driver?
How do I just pass a Cypher query to the driver to run and return a cursor?
If I'm reading, for example, this, it seems the demo has a class wrapper with a private member function that gets passed to session.write_transaction:
session.write_transaction(self._create_and_return_greeting, ...
That then gets called with a transaction as its first parameter...
def _create_and_return_greeting(tx, message):
which in turn runs the Cypher:
result = tx.run("CREATE (a:Greeting) "
This seems 10X more complicated than it needs to be.
I did just try something simpler:
def raw_query(query, **kwargs):
    neodriver = neo_connect()  # cached dbconn
    with neodriver.session() as session:
        try:
            result = session.run(query, **kwargs)
            return result.data()
But this results in a socket error on the query, probably because the session goes out of scope?
[dfcx/__init__] ERROR | Underlying socket connection gone (_ssl.c:2396)
[dfcx/__init__] ERROR | Failed to write data to connection IPv4Address(('neo4j-core-8afc8558-3.production-orch-0042.neo4j.io', 7687)) (IPv4Address(('34.82.120.138', 7687)))
Also I can't return a cursor/iterator, just the data()
When the session goes out of scope, the query result seems to die with it.
If I manually open and close a session, then I'd have the same problems?
Python must be the most popular language this DB is used with; does everyone use a different driver?
Py2neo seems cute, but it's completely lacking ORM wrapper functions for most of the Cypher language features, so you have to drop down to raw Cypher anyway. And I'm not sure it supports **kwargs argument interpolation in the same way.
I guess that big raise should help iron out some kinks :D
Slightly longer version trying to get a working DB wrapper:
def neo_connect() -> Union[neo4j.BoltDriver, neo4j.Neo4jDriver]:
    global raw_driver
    if raw_driver:
        # print('reuse driver')
        return raw_driver
    neoconfig = NEOCONFIG
    raw_driver = neo4j.GraphDatabase.driver(
        neoconfig['url'], auth=(
            neoconfig['user'], neoconfig['pass']))
    if raw_driver is None:
        raise BaseException("cannot connect to neo4j")
    else:
        return raw_driver

def raw_query(query, **kwargs):
    # just get data, no cursor
    neodriver = neo_connect()
    session = neodriver.session()
    # logging.info('neoquery %s', query)
    # with neodriver.session() as session:
    try:
        result = session.run(query, **kwargs)
        data = result.data()
        return data
    except neo4j.exceptions.CypherSyntaxError as err:
        logging.error('neo error %s', err)
        logging.error('failed query: %s', query)
        raise err
    # finally:
    #     logging.info('close session')
    #     session.close()
update: someone pointed me to this example which is another way to use the tx wrapper.
https://github.com/neo4j-graph-examples/northwind/blob/main/code/python/example.py#L16-L21
def raw_query(query, **kwargs):
    neodriver = neo_connect()  # cached dbconn
    with neodriver.session() as session:
        try:
            result = session.run(query, **kwargs)
            return result.data()
This is perfectly fine and works as intended on my end.
The error you're seeing states that there is a connection problem, so there must be something going on between the server and the driver that's outside the driver's control.
Also, please note that there is a difference between these ways of running a query:
with driver.session() as session:
    result = session.run("<SOME CYPHER>")

def work(tx):
    result = tx.run("<SOME CYPHER>")

with driver.session() as session:
    session.write_transaction(work)
The latter one might be 3 lines longer, and the team working on the drivers has collected some feedback regarding this. However, there are more things to consider here. Firstly, changing the API surface is something that needs careful planning and cannot be done in, say, a patch release. Secondly, there are technical hurdles to overcome. Here are the semantics, anyway:
Auto-commit transaction. Runs only that query as one unit of work.
If you run a new auto-commit transaction within the same session, the previous result will buffer all available records for you (depending on the query, this can consume a lot of memory). This can be avoided by calling result.consume(). However, if the session goes out of scope, the result will be consumed automatically, meaning you cannot extract further records from it. Lastly, any error will be raised and needs handling in the application code.
Managed transaction. Runs whatever unit of work you want within that function. A transaction is implicitly started and committed (unless you rollback explicitly) around the function.
If the transaction ends (end of function or rollback), the result will be consumed and become invalid. You'll have to extract all records you need before that.
This is the recommended way of using the driver because it will not raise all errors but handle some internally (where appropriate) and retry the work function (e.g. if the server is only temporarily unavailable). Since the function might be executed multiple times, you must make sure it's idempotent.
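For the raw_query helper from the question, a rough sketch of the managed-transaction variant (assuming the 4.x driver API with read_transaction, and the asker's neo_connect helper); all records are materialized inside the work function, so nothing depends on the session or result outliving the call:
def raw_query(query, **kwargs):
    def work(tx):
        result = tx.run(query, **kwargs)
        return result.data()  # extract all records before the transaction ends

    neodriver = neo_connect()  # cached driver, as in the question
    with neodriver.session() as session:
        # read_transaction retries the work function on transient failures,
        # so it should be idempotent; use write_transaction for writes.
        return session.read_transaction(work)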
Closing thoughts:
Please remember that Stack Overflow is monitored on a best-effort basis, and what can be perceived as hasty comments may get in the way of getting helpful answers to your questions.

Pythonic way to handle errors and exceptions

Some time ago I wrote a piece of code, a Flask route to log out users from a web application I was working on, that looked like this:
@app.route('/logout')
@login_required
def logout():
    # lets get the user cookie, and if it exists, delete it
    cookie = request.cookies.get('app_login')
    response = make_response(redirect(url_for('login')))
    if cookie:
        riak_bucket = riak_connect('sessions')
        riak_bucket.get(cookie).delete()
        response.delete_cookie('app_login', None)
        return response
    return response
It did its job, and was certainly working, but now I am getting into making the app more robust by adding proper error handling, something that I haven't done on a large scale anywhere in my code before. So I stumbled on this route function and started writing a new version of it, when I realised I don't know how to do it 'the right way'. Here is what I came up with:
@app.route('/logout')
@login_required
def logout():
    # why dont we call variables after what they are in specifics?
    login_redirect = make_response(redirect(url_for('login')))
    try:
        cookie = request.cookies.get('app_login')
    except:
        return login_redirect
    # if we are here, the above try/except went good, right?
    try:
        # perhaps sessions_bucket should really be bucket_object?
        # is it valid to chain try statements like that, or should they be
        # tried separately one by one?
        sessions_bucket = riak_connect('sessions')
        sessions_bucket.get(cookie).delete()
        login_redirect.delete_cookie('app_login', None)
    except:
        return login_redirect
    # return redirect by default, just because it seems more secure
    return login_redirect
It also does its job, but still doesn't look 'right' to me. So, the questions are, to all of you with more experience writing really Pythonic code (given that I would love the code to handle all errors nicely, be readable to others, and do its job fast and well, both in this particular case and in the rest of a rather large codebase):
how do you name your variables, extra-specific or general: sessions_bucket vs riak_bucket vs bucket_object?
how do you handle errors: with try/except blocks one after another, by nesting one try/except in another, or in some other way?
is it OK to do more than one thing in one try/except, or not?
and perhaps anything else that comes to mind about the above code examples?
Thanks in advance!
I don't know the exact riak Python API, so I don't know which exceptions are thrown. On the other hand, how should the web app behave under the different error conditions? Does the user have to be informed?
Variable names: I prefer generic. If you change the implementation (e.g. Session store), you don't have to change the variable names.
Exceptions: Depends on the desired behavior. If you want to recover from errors, try/except one after another. (Generally, linear code is simpler.) If you don't recover from errors, I find one bigger try clause with several exception clauses very acceptable.
For me it's ok to do several things in one try/except. If there are too many try/except clauses, the code gets less readable.
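For instance, a minimal sketch of the "one bigger try clause with several except clauses" shape (the exception types here are hypothetical stand-ins for whatever the riak client actually raises):
import logging

log = logging.getLogger(__name__)

def delete_session(cookie):
    # One larger try block; each failure mode gets its own except clause.
    try:
        bucket = riak_connect('sessions')  # helper from the question
        bucket.get(cookie).delete()
    except IOError:
        log.exception("session store unreachable")
    except KeyError:
        log.warning("session %r not found, nothing to delete", cookie)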
More things: logging. logging.exception will log the traceback so you can know where exactly the error appeared.
Some suggestion:
import logging
log = logging.getLogger(__name__)

@app.route('/logout')
@login_required
def logout():
    login_redirect = make_response(redirect(url_for('login')))
    try:
        sessionid = request.cookies.get('app_login', None)
    except AttributeError:
        sessionid = None
        log.error("Improperly configured")
    if sessionid:
        try:
            session_store = riak_connect('sessions')
            session = session_store.get(sessionid)
            if session:
                session.delete()
                login_redirect.delete_cookie('app_login', None)
        except IOError:  # what errors appear when connect fails?
            log.exception("during logout")
    return login_redirect

Caching of DB queries in django

I have a Django app that uses two external scripts. One script moves a file from A to B, stores the value for B in the database, and exits afterwards, which should commit any possibly open transactions. The next script reacts to movement of the file (using inotify), calculates the md5sum (which apparently takes time) and then looks for an entry in the database like:
x = Queue.objects.get(filename=location)
Looking at the timestamps in my logs, I am 100% sure that the first script is long done before the second script (actually a daemon) runs the query. Interestingly enough, the thing works perfectly after a restart of the daemon.
This leads me to believe that somehow the QuerySet (I actually run the code shown above every time a new file is detected with inotify) is cached during the runtime of the daemon. I, however, would not want to restart the daemon all the time, but instead force the query to actually hit the DB instead of that cache.
The Django documentation doesn't say much about that; then again, Django isn't usually used from external scripts like this :)
Thank you in advance for any hints!
Ben
PS: as per request, here is the source of the relevant part of the daemon:
def _get_info(self, path):
    try:
        obj = Queue.objects.get(filename=path)
        x = obj.x
        return x
    except Exception, e:
        self.logger.error("Error in lookup: %s" % e)
        return None
This is called by a thread every time a new file is moved to the watched directory.
The code in the first script looks like:
for f in Queue.objects.all():
    if (matching_stuff_here):
        f.filename = B
        f.save()
sys.exit(0)
You haven't shown any actual code, so we have to guess. My guess would be that even though the transaction in the first script is done and committed, you're still inside an open transaction in script B: because of transaction isolation, you won't see any changes in B until you finish the transaction there.
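A rough sketch of one common workaround inside the daemon (assuming it holds a long-lived database connection): drop that connection before each lookup, so the next ORM query opens a fresh one and therefore sees a fresh transaction snapshot. connection.close() exists across Django versions; on newer versions, wrapping the lookup in transaction.atomic() is another option.
from django.db import connection

def _get_info(self, path):
    # Close the daemon's long-lived connection; the next ORM query will
    # open a new one with a fresh transaction/snapshot.
    connection.close()
    try:
        obj = Queue.objects.get(filename=path)
        return obj.x
    except Queue.DoesNotExist:
        self.logger.error("No queue entry for %s", path)
        return None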
