I have a web application that stores statistics about my users in a JSON column, for example: {'current': {'friends': 5, 'wins': 2, 'loses': 10}}. I would like to update only a specific field, to avoid a race condition. So far I have simply been updating the whole dictionary, but if a user plays two games at the same moment, a race condition could occur.
This is how I am doing it at the moment:
class User:
    name = Column(Unicode(1024), nullable=False)
    username = Column(Unicode(128), nullable=False, unique=True, default='')
    password = Column(Unicode(256), nullable=True, default='')
    counters = Column(
        MutableDict.as_mutable(JSON), nullable=False,
        server_default=text('{}'), default=lambda: copy.deepcopy(DEFAULT_COUNTERS))

    def current_counter(self, feature, number):
        current = self.counters.get('current', {})[feature]
        if current + number < 0:
            return
        self.counters.get('current', {})[feature] = current + number
        self.counters.changed()
However, this rewrites the whole counters column after changing one value, so if two games finish at the same time I expect a race condition.
I was thinking about using session.query for this, something like the following, but I am not very experienced with it:
def update_counter(self, session, feature, number):
    current = self.counters.get('current', {})[feature]
    if current + number < 0:
        return
    session.query(User) \
        .filter(User.id == self.id) \
        .update({
            "current": func.jsonb_set(
                User.counters['current'][feature],
                column(current) + column(number),
                'true')
            },
            synchronize_session=False
        )
This code produces NotImplementedError: Operator 'getitem' is not supported on this expression for the Event.counters['current'][feature] line, and I don't know how to make it work.
Thanks for help.
The error is produced from chaining item access, instead of using a tuple of indexes as a single operation:
User.counters['current', feature]
This produces a path index operation. But if you did it that way, you would be passing only the nested value to jsonb_set(), not the whole JSON document. In addition, the value you're indexing is an integer rather than a collection, so jsonb_set() would not know what to do with it. That is why jsonb_set() accepts a path as its second argument: an array of text that describes which value you want to set in your JSON:
func.jsonb_set(User.counters, ['current', feature], ...)
As for race conditions, there might be one still. You first get the count from the current model object in
current = self.counters.get('current', {})[feature]
and then proceed to use that value in an update, but what if another transaction has managed to perform a similar update in between? You would possibly overwrite that update's changes:
select, counter = 42 |
| select, counter = 42
update counter = 52 | # +10
| update counter = 32 # -10
commit |
| commit # 32 instead of 42
The solution is one of the following: fetch the current model object using FOR UPDATE, use SERIALIZABLE transaction isolation (and be ready to retry on serialization failures), or ignore the fetched value and let the DB calculate the update:
# Note that create_missing is true by default
func.jsonb_set(
    User.counters,
    ['current', feature],
    func.to_jsonb(
        func.coalesce(User.counters['current', feature].astext.cast(Integer), 0) +
        number))
and if you want to be sure that you don't update the value if the result would turn out negative (remember that the value you've read before might've changed already), add a check using the DB calculated value as a predicate:
def update_counter(self, session, feature, number):
    current_count = User.counters['current', feature].astext.cast(Integer)
    # Coalesce in case the count has not been set yet and is NULL
    new_count = func.coalesce(current_count, 0) + number
    session.query(User) \
        .filter(User.id == self.id, new_count >= 0) \
        .update({
            User.counters: func.jsonb_set(
                func.to_jsonb(User.counters),
                ['current', feature],
                func.to_jsonb(new_count)
            )
        }, synchronize_session=False)
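The "let the DB calculate the update" idea can be tried out without PostgreSQL. The sketch below is a simplification and not the jsonb_set() query above: it assumes a plain integer column in an in-memory SQLite database, and the table name counters and the helper add_to_counter are made up for illustration. It shows the same two ingredients, though: the new value is computed inside the UPDATE, and the non-negative check is part of the WHERE clause.

```python
import sqlite3


def add_to_counter(conn, user_id, number):
    """Add `number` to the counter unless the result would be negative.

    Returns True if a row was updated, False if the guard rejected the change.
    """
    cur = conn.execute(
        # The database reads and writes the value in one statement, so no
        # value read earlier in Python is involved.
        "UPDATE counters SET wins = wins + ? "
        "WHERE user_id = ? AND wins + ? >= 0",
        (number, user_id, number),
    )
    conn.commit()
    return cur.rowcount == 1


# Hypothetical schema, only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE counters (user_id INTEGER PRIMARY KEY, wins INTEGER NOT NULL)")
conn.execute("INSERT INTO counters VALUES (1, 2)")
conn.commit()

applied = add_to_counter(conn, 1, 10)    # 2 + 10 = 12, accepted
rejected = add_to_counter(conn, 1, -20)  # 12 - 20 < 0, no row matches
final = conn.execute(
    "SELECT wins FROM counters WHERE user_id = 1").fetchone()[0]
```

Checking the affected row count is how the caller learns whether the guard in the WHERE clause let the update through.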
I am trying to compare two datetime.time variables in some python code in a View.
p_obj.time_start is a Time field
b_obj.time_start is also a Time field
slot_start = p_obj.time_start
whereami = slot_start
...
for b_obj in mydata:
    while (b_obj.time_start > whereami):
        ...
    ...
When I run the view, I get a TypeError:
can't compare datetime.time to builtin_function_or_method
Looking at the django debug I can see that:
whereami is <built-in method time of datetime.datetime object at 0x7fdb1136ee40>
but
slot_start is datetime.time(10, 0)
I believe my problem is due to the way the comparison is evaluated.
I have tried various permutations, and while I can get both variables independently to resolve to a datetime.time (like slot_start does), I can't seem to get them both to do so prior to or during the while comparison. Is there a way to force the evaluation prior to the comparison?
My thought is that the first operand is evaluated to a datetime.time, which is then not comparable to the second operand (which is still a method). Is there some simple syntax I am missing to make both operands evaluate prior to the comparison?
Strangely
if (whereami < b_obj.time_start):
doesn't throw any error, and does appear to work correctly (in that it performs the expected comparisons).
Adding more info as requested:
class Booking(models.Model):
    ....
    practavail = models.ForeignKey(PractAvail, on_delete=models.PROTECT, blank=False)
    time_start = models.TimeField("Start Time", blank=False)
    time_finish = models.TimeField("Finish Time", blank=False)
    ....

class PractAvail(models.Model):
    ....
    time_start = models.TimeField("Start Time", blank=False)
    time_finish = models.TimeField("Finish Time", blank=False)
    ....
Then in my view (skipping non essential bits):
p_avails = PractAvail.objects.filter(date=inputdate).order_by('time_start')
for p_obj in p_avails:
    mydata['bookings'][p_obj.pk] = p_obj.booking_set.all().order_by('time_start')
    for b_obj in mydata['bookings'][p_obj.pk]:
and then the while loop as above.
I have also found that my code works fine with:
if (b_obj.time_start > whereami):
    yesdoloop = True
else:
    yesdoloop = False
while yesdoloop:
    ...do stuff, including updating whereami...
    if (b_obj.time_start > whereami):
        yesdoloop = True
    else:
        yesdoloop = False
So, there is definitely something different about how a while compares as against how an if compares (I think?!)
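The debug output above shows whereami as a bound method rather than a datetime.time value, which is what you get when .time is referenced on a datetime object without being called. A minimal sketch of that situation follows. It assumes (a guess, since the assignment sits in the elided part of the view) that whereami is reassigned somewhere with something like some_datetime.time instead of some_datetime.time(); the variable names below are illustrative only.

```python
import datetime

slot_start = datetime.time(10, 0)
moment = datetime.datetime(2020, 1, 1, 9, 30)  # arbitrary example value

not_called = moment.time   # bound method object, not a time
called = moment.time()     # an actual datetime.time value

try:
    slot_start > not_called
    comparison_failed = False
except TypeError:
    # A time cannot be ordered against a method object.
    comparison_failed = True

comparison_ok = slot_start > called
```

If that is what happens, the while and if statements are not evaluating differently; the two comparisons simply see different values of whereami at the time they run.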
I'm trying to do a search between two dates with SQLAlchemy. With static dates it looks like this:
def secondExercise():
    for instance in session.query(Puppy.name, Puppy.weight, Puppy.dateOfBirth).\
            filter(Puppy.dateOfBirth <= '2015-08-31', Puppy.dateOfBirth >= '2015-02-25').order_by(desc("dateOfBirth")):
        print instance
Manipulating dates in python is quite easy.
today = date.today().strftime("%Y/%m/%d")
sixthmonth = date(date.today().year, date.today().month-6,date.today().day).strftime("%Y/%m/%d")
The problem is, I don't know how to implement this as parameter. Any help with this?
for instance in session.query(Puppy.name, Puppy.weight, Puppy.dateOfBirth).\
        filter(Puppy.dateOfBirth <= today, Puppy.dateOfBirth >= sixthmonth).order_by(desc("dateOfBirth")):
SQLAlchemy supports comparison by datetime.date() and datetime.datetime() objects.
http://docs.sqlalchemy.org/en/rel_1_0/core/type_basics.html?highlight=datetime#sqlalchemy.types.DateTime
You can expose these as parameters (replace your_query with all the stuff you want to be constant and not parametrized):
import datetime

# 180 days is used as an approximation of six months
six_months_ago = datetime.datetime.today() - datetime.timedelta(180)
today = datetime.datetime.today()

def query_puppies(birth_date=six_months_ago):
    for puppy in your_query.filter(Puppy.dateOfBirth.between(birth_date, today)):
        print puppy.name  # for example..
Also note the use of the between clause for some extra awesomeness :) Two separate clauses using <= and >= would also work.
cheers
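One detail worth noting: the question computes the start date with date.today().month-6, which produces an invalid month from January to June, and the answer approximates six months with 180 days. If exact calendar months matter, the arithmetic can be done by hand. The helper below is a sketch (the name months_ago is made up here, not part of SQLAlchemy or the standard library); it clamps the day so that month ends stay valid.

```python
import calendar
import datetime


def months_ago(day, months):
    """Return the date `months` calendar months before `day`.

    The day of month is clamped, so 31 August minus 6 months gives the
    last day of February.
    """
    # Use a zero-based month index so divmod handles the year rollover.
    index = day.year * 12 + (day.month - 1) - months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return datetime.date(year, month, min(day.day, last_day))


start = months_ago(datetime.date(2015, 8, 31), 6)
rollover = months_ago(datetime.date(2015, 3, 15), 6)
```

The resulting date objects can be passed straight to the filter or between() calls shown above, since SQLAlchemy accepts date objects in comparisons.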
I'm trying to utilise latest() on a django model queryset to return the next upcoming date in a model.
I've tried a few different things, using __lte and __gte lookups on a filter and to no avail.
The filter option would work for me, if there was a way to effectively utilise a model method within an exclude() but without writing a custom manager that's not going to be an option.
There must be an easier way?
class RaidSession(models.Model):
    scheduled = models.DateTimeField()
    duration = models.DurationField()

    def is_expired(self):
        duration_to_date = self.scheduled + self.duration
        return True if duration_to_date < timezone.now() else False
Since I'm a little old school, it usually helps me to think of such problems as an SQL query. In your case this would be
SELECT * FROM app_raidsession rs
WHERE rs.scheduled >= now()
ORDER BY rs.scheduled
LIMIT 1
This gives you the next scheduled raid.
In the Django ORM, you should be able to translate this more or less directly to:
from django.utils.timezone import now

# first() returns None if the result is empty
next_raid = models.RaidSession.objects \
    .filter(scheduled__gte=now()) \
    .order_by('scheduled') \
    .first()
If the duration is relevant, you will need an F-expression:
from django.db.models import F

next_raid = models.RaidSession.objects \
    .filter(scheduled__gte=now() - F('duration')) \
    .order_by('scheduled') \
    .first()
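The F-expression filter is the negation of is_expired() moved into the query: scheduled >= now - duration is the same condition as scheduled + duration >= now. A plain-Python sketch of that selection logic, using a hypothetical Raid tuple in place of the Django model and naive datetimes for brevity:

```python
import datetime
from collections import namedtuple

# Hypothetical stand-in for RaidSession rows; this is not a Django model.
Raid = namedtuple("Raid", ["name", "scheduled", "duration"])


def next_raid(raids, now):
    """Return the earliest raid that has not finished yet, or None."""
    candidates = [r for r in raids if r.scheduled >= now - r.duration]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.scheduled)


now = datetime.datetime(2020, 1, 1, 20, 0)
hour = datetime.timedelta(hours=1)
raids = [
    Raid("finished", now - 3 * hour, 2 * hour),  # ended an hour ago
    Raid("running", now - hour, 2 * hour),       # still in progress
    Raid("upcoming", now + hour, 2 * hour),      # has not started yet
]
chosen = next_raid(raids, now)
```

With the duration taken into account, a raid that has started but not yet finished is returned ahead of one that is still in the future.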
Is there any way to get SQLAlchemy to do a bulk insert rather than inserting each individual object? That is, doing:
INSERT INTO `foo` (`bar`) VALUES (1), (2), (3)
rather than:
INSERT INTO `foo` (`bar`) VALUES (1)
INSERT INTO `foo` (`bar`) VALUES (2)
INSERT INTO `foo` (`bar`) VALUES (3)
I've just converted some code to use SQLAlchemy rather than raw SQL, and although it is now much nicer to work with, it seems to be slower (by up to a factor of 10). I'm wondering if this is the reason.
Maybe I could improve the situation by using sessions more efficiently. At the moment I have autoCommit=False and do a session.commit() after I've added some stuff. However, this seems to cause the data to go stale if the DB is changed elsewhere: even if I do a new query, I still get old results back.
Thanks for your help!
SQLAlchemy introduced that in version 1.0.0:
Bulk operations - SQLAlchemy docs
With these operations, you can now do bulk inserts or updates!
For instance, you can do:
s = Session()
objects = [
    User(name="u1"),
    User(name="u2"),
    User(name="u3")
]
s.bulk_save_objects(objects)
s.commit()
Here, a bulk insert will be made.
The sqlalchemy docs have a writeup on the performance of various techniques that can be used for bulk inserts:
ORMs are basically not intended for high-performance bulk inserts -
this is the whole reason SQLAlchemy offers the Core in addition to the
ORM as a first-class component.
For the use case of fast bulk inserts, the SQL generation and
execution system that the ORM builds on top of is part of the Core.
Using this system directly, we can produce an INSERT that is
competitive with using the raw database API directly.
Alternatively, the SQLAlchemy ORM offers the Bulk Operations suite of
methods, which provide hooks into subsections of the unit of work
process in order to emit Core-level INSERT and UPDATE constructs with
a small degree of ORM-based automation.
The example below illustrates time-based tests for several different
methods of inserting rows, going from the most automated to the least.
With cPython 2.7, runtimes observed:
classics-MacBook-Pro:sqlalchemy classic$ python test.py
SQLAlchemy ORM: Total time for 100000 records 12.0471920967 secs
SQLAlchemy ORM pk given: Total time for 100000 records 7.06283402443 secs
SQLAlchemy ORM bulk_save_objects(): Total time for 100000 records 0.856323003769 secs
SQLAlchemy Core: Total time for 100000 records 0.485800027847 secs
sqlite3: Total time for 100000 records 0.487842082977 sec
Script:
import time
import sqlite3

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

Base = declarative_base()
DBSession = scoped_session(sessionmaker())
engine = None


class Customer(Base):
    __tablename__ = "customer"
    id = Column(Integer, primary_key=True)
    name = Column(String(255))


def init_sqlalchemy(dbname='sqlite:///sqlalchemy.db'):
    global engine
    engine = create_engine(dbname, echo=False)
    DBSession.remove()
    DBSession.configure(bind=engine, autoflush=False, expire_on_commit=False)
    Base.metadata.drop_all(engine)
    Base.metadata.create_all(engine)

def test_sqlalchemy_orm(n=100000):
    init_sqlalchemy()
    t0 = time.time()
    for i in xrange(n):
        customer = Customer()
        customer.name = 'NAME ' + str(i)
        DBSession.add(customer)
        if i % 1000 == 0:
            DBSession.flush()
    DBSession.commit()
    print(
        "SQLAlchemy ORM: Total time for " + str(n) +
        " records " + str(time.time() - t0) + " secs")


def test_sqlalchemy_orm_pk_given(n=100000):
    init_sqlalchemy()
    t0 = time.time()
    for i in xrange(n):
        customer = Customer(id=i+1, name="NAME " + str(i))
        DBSession.add(customer)
        if i % 1000 == 0:
            DBSession.flush()
    DBSession.commit()
    print(
        "SQLAlchemy ORM pk given: Total time for " + str(n) +
        " records " + str(time.time() - t0) + " secs")

def test_sqlalchemy_orm_bulk_insert(n=100000):
    init_sqlalchemy()
    t0 = time.time()
    n1 = n
    while n1 > 0:
        # Size the chunk before decrementing, so that all n rows are
        # inserted (decrementing first would skip the last 10000)
        chunk = min(10000, n1)
        DBSession.bulk_insert_mappings(
            Customer,
            [
                dict(name="NAME " + str(i))
                for i in xrange(chunk)
            ]
        )
        n1 = n1 - chunk
    DBSession.commit()
    print(
        "SQLAlchemy ORM bulk_save_objects(): Total time for " + str(n) +
        " records " + str(time.time() - t0) + " secs")

def test_sqlalchemy_core(n=100000):
    init_sqlalchemy()
    t0 = time.time()
    engine.execute(
        Customer.__table__.insert(),
        [{"name": 'NAME ' + str(i)} for i in xrange(n)]
    )
    print(
        "SQLAlchemy Core: Total time for " + str(n) +
        " records " + str(time.time() - t0) + " secs")


def init_sqlite3(dbname):
    conn = sqlite3.connect(dbname)
    c = conn.cursor()
    c.execute("DROP TABLE IF EXISTS customer")
    c.execute(
        "CREATE TABLE customer (id INTEGER NOT NULL, "
        "name VARCHAR(255), PRIMARY KEY(id))")
    conn.commit()
    return conn


def test_sqlite3(n=100000, dbname='sqlite3.db'):
    conn = init_sqlite3(dbname)
    c = conn.cursor()
    t0 = time.time()
    for i in xrange(n):
        row = ('NAME ' + str(i),)
        c.execute("INSERT INTO customer (name) VALUES (?)", row)
    conn.commit()
    print(
        "sqlite3: Total time for " + str(n) +
        " records " + str(time.time() - t0) + " sec")


if __name__ == '__main__':
    test_sqlalchemy_orm(100000)
    test_sqlalchemy_orm_pk_given(100000)
    test_sqlalchemy_orm_bulk_insert(100000)
    test_sqlalchemy_core(100000)
    test_sqlite3(100000)
As far as I know, there is no way to get the ORM to issue bulk inserts. I believe the underlying reason is that SQLAlchemy needs to keep track of each object's identity (i.e., new primary keys), and bulk inserts interfere with that. For example, assuming your foo table contains an id column and is mapped to a Foo class:
x = Foo(bar=1)
print x.id
# None
session.add(x)
session.flush()
# BEGIN
# INSERT INTO foo (bar) VALUES(1)
# COMMIT
print x.id
# 1
Since SQLAlchemy picked up the value for x.id without issuing another query, we can infer that it got the value directly from the INSERT statement. If you don't need subsequent access to the created objects via the same instances, you can skip the ORM layer for your insert:
Foo.__table__.insert().execute([{'bar': 1}, {'bar': 2}, {'bar': 3}])
# INSERT INTO foo (bar) VALUES ((1,), (2,), (3,))
SQLAlchemy can't match these new rows with any existing objects, so you'll have to query them anew for any subsequent operations.
As far as stale data is concerned, it's helpful to remember that the session has no built-in way to know when the database is changed outside of the session. In order to access externally modified data through existing instances, the instances must be marked as expired. This happens by default on session.commit(), but can be done manually by calling session.expire_all() or session.expire(instance). An example (SQL omitted):
x = Foo(bar=1)
session.add(x)
session.commit()
print x.bar
# 1
foo.update().execute(bar=42)
print x.bar
# 1
session.expire(x)
print x.bar
# 42
session.commit() expires x, so the first print statement implicitly opens a new transaction and re-queries x's attributes. If you comment out the first print statement, you'll notice that the second one now picks up the correct value, because the new query isn't emitted until after the update.
This makes sense from the point of view of transactional isolation - you should only pick up external modifications between transactions. If this is causing you trouble, I'd suggest clarifying or re-thinking your application's transaction boundaries instead of immediately reaching for session.expire_all().
I usually do it using add_all.
from app import session
from models import User
objects = [User(name="u1"), User(name="u2"), User(name="u3")]
session.add_all(objects)
session.commit()
Direct support was added to SQLAlchemy as of version 0.8
As per the docs, connection.execute(table.insert().values(data)) should do the trick. (Note that this is not the same as connection.execute(table.insert(), data) which results in many individual row inserts via a call to executemany). On anything but a local connection the difference in performance can be enormous.
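To make the difference between the two forms concrete without a SQLAlchemy setup, here is a sketch using only the standard-library sqlite3 module. It is an illustration of the two statement shapes, not of what SQLAlchemy emits internally; the helper names are made up. On a local SQLite file the speed difference is small; the benefit of the single statement comes from saving network round trips.

```python
import sqlite3


def insert_multirow(conn, values):
    """Insert all values with one INSERT ... VALUES (?), (?), ... statement."""
    placeholders = ", ".join(["(?)"] * len(values))
    sql = "INSERT INTO foo (bar) VALUES " + placeholders
    conn.execute(sql, values)
    return sql


def insert_executemany(conn, values):
    """Insert the values by running a one-row INSERT once per parameter set."""
    conn.executemany("INSERT INTO foo (bar) VALUES (?)", [(v,) for v in values])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, bar INTEGER)")

statement = insert_multirow(conn, [1, 2, 3])
insert_executemany(conn, [4, 5, 6])
conn.commit()

rows = [r[0] for r in conn.execute("SELECT bar FROM foo ORDER BY id")]
```

Databases limit the number of bound parameters per statement, so for large data sets the multi-row form has to be sent in chunks.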
SQLAlchemy introduced that in version 1.0.0:
Bulk operations - SQLAlchemy docs
With these operations, you can now do bulk inserts or updates!
For instance (if you want the lowest overhead for simple table INSERTs), you can use Session.bulk_insert_mappings():
loadme = [(1, 'a'),
          (2, 'b'),
          (3, 'c')]
dicts = [dict(bar=t[0], fly=t[1]) for t in loadme]

s = Session()
s.bulk_insert_mappings(Foo, dicts)
s.commit()
Or, if you want, skip the loadme tuples and write the dictionaries directly into dicts (but I find it easier to leave all the wordiness out of the data and load up a list of dictionaries in a loop).
Piere's answer is correct but one issue is that bulk_save_objects by default does not return the primary keys of the objects, if that is of concern to you. Set return_defaults to True to get this behavior.
The documentation is here.
foos = [Foo(bar='a'), Foo(bar='b'), Foo(bar='c')]
session.bulk_save_objects(foos, return_defaults=True)
for foo in foos:
    assert foo.id is not None
session.commit()
This is a way:
values = [1, 2, 3]
Foo.__table__.insert().execute([{'bar': x} for x in values])
This will insert like this:
INSERT INTO `foo` (`bar`) VALUES (1), (2), (3)
Reference: The SQLAlchemy FAQ includes benchmarks for various commit methods.
All roads lead to Rome, but some of them cross mountains and some require ferries; if you want to get there quickly, just take the motorway.
In this case the motorway is to use the execute_batch() feature of psycopg2. The documentation says it the best:
The current implementation of executemany() is (using an extremely charitable understatement) not particularly performing. These functions can be used to speed up the repeated execution of a statement against a set of parameters. By reducing the number of server roundtrips the performance can be orders of magnitude better than using executemany().
In my own test execute_batch() is approximately twice as fast as executemany(), and gives the option to configure the page_size for further tweaking (if you want to squeeze the last 2-3% of performance out of the driver).
The same feature can easily be enabled if you are using SQLAlchemy by setting use_batch_mode=True as a parameter when you instantiate the engine with create_engine()
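To illustrate what page_size controls, the sketch below groups parameter sets into pages of at most page_size items, so that each page could be sent to the server in one round trip. This is not psycopg2's implementation, and the helper name pages is made up; the default of 100 mirrors the documented default of execute_batch().

```python
def pages(params, page_size=100):
    """Yield consecutive slices of `params` with at most `page_size` items each."""
    for start in range(0, len(params), page_size):
        yield params[start:start + page_size]


params = [(i,) for i in range(250)]
sizes = [len(page) for page in pages(params)]
```

With 250 parameter sets and the default page size, this gives three round trips instead of 250.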
The best answer I found so far was in sqlalchemy documentation:
http://docs.sqlalchemy.org/en/latest/faq/performance.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow
There is a complete example of a benchmark of possible solutions.
As shown in the documentation:
bulk_save_objects is not the fastest solution, but its performance is acceptable.
The second best implementation in terms of readability, I think, was the one using SQLAlchemy Core:
def test_sqlalchemy_core(n=100000):
    init_sqlalchemy()
    t0 = time.time()
    engine.execute(
        Customer.__table__.insert(),
        [{"name": 'NAME ' + str(i)} for i in xrange(n)]
    )
The context of this function is given in the documentation article.
SQLAlchemy supports bulk inserts:
bulk_list = [
    Foo(bar=1),
    Foo(bar=2),
    Foo(bar=3),
]
db.session.bulk_save_objects(bulk_list)
db.session.commit()