I have a query...
message_batch = Message.objects.all()[:500]
I don't want to have to make another database call to retrieve the objects; besides, I already have them in memory, so what's the point?
So I tried to update like this:
message_batch.update(send_date=datetime.datetime.now(), status="Sent")
But I get the following error message:
Cannot update a query once a slice has been taken.
Why? Is there a way around this? I want to update the objects I already have in memory, not make another call to retrieve them.
This is my full code; there has to be a way around this...
total = Message.objects.filter(status="Unsent", sender=user, batch=batch).exclude(recipient_number__exact='').count()
for i in xrange(0, total, 500):
    message_batch = Message.objects.filter(status="Unsent").exclude(recipient_number__exact='')[i:i+500]
    # do some stuff here
    # once all done, update the objects
    message_batch.update(send_date=datetime.datetime.now(), billed=True)
Use Django database transactions for better performance:
https://docs.djangoproject.com/en/1.5/topics/db/transactions/
e.g.:
from django.db import transaction

total = Message.objects.filter(status="Unsent", sender=user, batch=batch).exclude(recipient_number__exact='').count()
for i in xrange(0, total, 500):
    message_batch = Message.objects.filter(status="Unsent").exclude(recipient_number__exact='')[i:i+500]
    # do some stuff here
    # once all done, update the objects inside a single transaction
    with transaction.commit_on_success():
        for m in message_batch:
            m.send_date = datetime.datetime.now()
            m.billed = True
            m.save()
This wraps, for example, a 5000-row update into a single transaction with one commit, instead of committing to the database 5000 times, which is what happens when each save runs outside a transaction. It does not reduce the number of UPDATE statements, only the number of commits.
You can update objects by their primary keys:
base_qs = Message.objects.filter(status="Unsent", sender=user, batch=batch).exclude(recipient_number__exact='')
total = base_qs.count()
for i in xrange(0, total, 500):
    page = list(base_qs[i:i+500])
    page_ids = [o.pk for o in page]
    # Do some stuff here
    base_qs.filter(pk__in=page_ids).update(send_date=datetime.datetime.now(), billed=True)
I've been working on a web app for a while and this is the first time I've run into this problem. I think it could be related to how SQLAlchemy sessions are handled, so some clarification in simple terms would be helpful.
My configuration for working with Flask-SQLAlchemy is:
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
app = Flask(__name__)
db = SQLAlchemy(app)
My problem: db.session.commit() sometimes doesn't save changes. I wrote some Flask endpoints which are reached via front-end requests from the user's browser.
In this particular case, I'm editing a hotel "Booking" object, altering the "Rooms" column, which is a Text field.
The function does the following:
1-Query the Booking object from the dates in the request
2- Edit the Rooms column of this Booking object
3- Commit the changes "db.session.commit()"
4- If a user has X functionality active, I make some checks calling a second function:
·4.1- This function makes some checks and queries and edits another object in the database, different from the "Booking" object I edited previously.
·4.2- At the end of this secondary function I call db.session.commit(). (Note: these changes always get saved correctly in the database.)
·4.3- Return the results to the calling function.
5- Return results to the front end. (Just before this return, I print Booking.Rooms to make sure it looks as it should, and it does. I even tried making a second commit after the print but before the return.) But after this, sometimes Booking.Rooms is updated as expected and other times it isn't. I've noticed that if I repeat the action enough times it eventually works, but given that the intermediate function (described in point 4) always saves its changes correctly, this causes an inconsistency in the data and drives me mad: if I repeat the action when the point-4 procedure has already run correctly, I can't repeat just the Rooms modification...
So I'm now really confused about whether this is something I don't understand about Flask sessions. From what I understand, whenever I make a new request to Flask, it gets an isolated session, right?
I mean, if 2 concurrent users are storing some changes in the database, a db.session.commit() from one of the users won't commit the changes from the other one, right?
Likewise, if I call db.session.commit() in one request, those changes are stored in the database, and if after that (in the same request) I keep modifying things, it's like another session, right? The committed changes are already safely stored, and I can still use the previous objects for further modifications?
Anyway, none of this should be a problem, because after the commit() I print Booking.Rooms and it looks as expected... and sometimes it gets stored correctly and sometimes it doesn't...
Also note: when I return this result to the client, the client instantly makes a second request to the server to fetch updated Booking data, and that data is returned without the expected changes committed... I suppose Flask handled the commit() before it got the second request (otherwise it wouldn't have returned the result previously...).
Could this be a limitation of the Flask development server, which can't handle many requests correctly, and something that won't happen when deployed with gunicorn?
Any hint or clarification about sessions would be nice, because this is pretty strange behaviour, especially since it sometimes works and other times doesn't...
As requested, here is the code. I know it's not possible to reproduce, since there is a lot of setup behind it and it would need a lot of data to work as intended under the same circumstances as mine, but it should provide an overview of what the functions look like and where the commits I mention above are. Any idea of where the problem could be is very helpful.
# Main function hit by the frontend
@users.route('/url_endpoint1', methods=['POST'], strict_slashes=False)
@login_required
def add_room_to_booking_Api():
    try:
        bookingData = request.get_json()
        roomURL = bookingData["roomSafeURL"]
        targetBooking = bookingData["bookingURL"]
        startDate = bookingData["checkInDate"]
        endDate = bookingData["checkOutDate"]
        roomPrices = bookingData["roomPrices"]
        booking = Bookings.query.filter_by(SafeURL=targetBooking).first()
        alojamiento = Alojamientos.query.filter_by(id=booking.CodigoAlojamiento).first()  # owner of the booking
        room = Rooms.query.filter_by(SafeURL=roomURL).first()
        roomsInBooking = ast.literal_eval(booking.Habitaciones)  # I know, should be json.loads() and json.dumps() for better performance probably...
        # if the room is available for the given days, add it to the booking
        if CheckIfRoomIsAvailableForBooking(alojamiento.id, room, startDate, endDate, booking) == "OK":
            roomsInBooking.append({"id": room.id, "Prices": roomPrices, "Guests": []})  # add the new room to the Rooms column of the booking
            booking.Habitaciones = str(roomsInBooking)  # save the new rooms data
            print(booking.Habitaciones)  # check changes applied
            room.ReservaAsociada = booking.id  # associate booking and room
            for ocupante in room.Ocupantes:  # associate people in the room with the booking
                ocupante.Reserva = booking.id
            # db.session.refresh(booking)  # test I made to check if something changes, but it didn't work
            if some_X_function() == True:  # if the user has some functionality enabled
                # db.session.begin()  # another test which didn't work
                RType = WuBook_Rooms.query.filter_by(ParentType=room.Tipo).first()
                RType = [RType]  # convert to list because I reuse the function in cases with multiple types
                resultAdd = function4(RType, booking.Entrada.replace(hour=0, minute=0, second=0), booking.Salida.replace(hour=0, minute=0, second=0))
                if resultAdd["resultado"] == True:  # "resultado": error, "casos": casos
                    return (jsonify({"resultado": "Error", "mensaje": resultAdd["casos"]}))
            print(booking.Habitaciones)  # here I still get the expected result
            db.session.commit()
            # I get this return with everything correct in my frontend, but it is not really stored in the database
            return jsonify({"resultado": "Ok", "mensaje": "Room " + str(room.Identificador) + " added to the booking"})
        else:
            return (jsonify({"resultado": "Error", "mensaje": "Room " + str(room.Identificador) + " not available to book in target dates"}))
    except Exception as e:
        # some error handling which is not getting hit now
        db.session.rollback()
        print(e, ": en linea", lineno())
        excepcion = str((''.join(traceback.TracebackException.from_exception(e).format()).replace("\n", "</br>"), "</br>Excepcion emitida en la línea: ", lineno()))
        sendExceptionEmail(excepcion, current_user)
        return (jsonify({"resultado": "Error", "mensaje": "Error"}))
# function from point 4
def function4(RType, startDate, endDate):
    delta = endDate - startDate
    print(startDate, endDate)
    print(delta)
    for ind_type in RType:
        calendarUpdated = json.loads(ind_type.updated_availability_by_date)
        calendarUpdatedBackup = calendarUpdated
        casos = {}
        actualizar = False
        error = False
        for i in range(delta.days):
            day = (startDate + timedelta(days=i))
            print(day, i)
            diaString = day.strftime("%d-%m-%Y")
            if day >= datetime.now() or diaString == datetime.now().strftime("%d-%m-%Y"):  # only care about present and future dates
                disponibilidadLocal = calendarUpdated[diaString]["local"]
                yaReservadas = calendarUpdated[diaString]["local_booked"]
                disponiblesChannel = calendarUpdated[diaString]["avail"]
                # adjust availability data
                if somecondition == True:
                    actualizar = True
                    casos.update({diaString: "Happened X"})
                else:
                    actualizar = False
                    casos.update({diaString: "Happened Y"})
                    error = "Error"
        if actualizar == True:  # this part of the code is hit normally and the changes are stored correctly
            ind_type.updated_availability_by_date = json.dumps(calendarUpdated)
            wubookproperty = WuBook_Properties.query.filter_by(id=ind_type.PropertyCode).first()
            wubookproperty.SyncPending = True
            ind_type.AvailUpdatePending = True
        elif actualizar == False:  # some error occurred, revert changes
            ind_type.updated_availability_by_date = json.dumps(calendarUpdatedBackup)
    db.session.commit()  # this commit persists
    return ({"resultado": error, "casos": casos})  # return to the main function with all these changes stored
Realized nothing was wrong at the session level; it was my fault in another function on the client side, which sends a request to update the same data that has just been updated, but with the old data... So in fact, the data was getting saved correctly in the database, but was overwritten a few milliseconds later. It was just a missing return statement in a JavaScript file that would have avoided this outcome.
I use arango-orm (which uses python-arango in the background) in my Python/ArangoDB back-end. I have set up a small testing util that uses a remote database to insert test data, execute the unit tests and remove the test data again.
I insert my test data with a Python for loop. Each iteration, a small piece of information changes based on a generic object, and then I insert that modified generic object into ArangoDB until I have 10 test objects. However, after that code is run, my test assertions tell me I don't have 10 objects stored in my db, but only 8 (or sometimes 3, 7 or 9). It looks like python-arango runs these queries asynchronously, or ArangoDB already replies with an OK before the data is actually inserted. Does anyone have an idea of what is going on? When I put in a sleep of 1 second after all data is inserted, my tests run green. This obviously is no solution.
This is a little piece of example code I use:
def load_test_data(self) -> None:
    # This method is called from the setUp() method.
    logging.info("Loading test data...")
    for i in range(1, 11):
        # insertion with data object (ORM)
        user = test_utils.get_default_test_user()
        user.id = i
        user.username += str(i)
        user.name += str(i)
        db.add(user)
        # insertion with dictionary
        project = test_utils.get_default_test_project()
        project['id'] = i
        project['name'] += str(i)
        project['description'] = f"Description for project with the id {i}"
        db.insert_document("projects", project)
    # TODO: solve this dirty hack
    sleep(1)

def test_search_by_user_username(self) -> None:
    actual = dao.search("TestUser3")
    self.assertEqual(1, len(actual))
    self.assertEqual(3, actual[0].id)
Then my db is created like this in a separate module:
from arango import ArangoClient
from arango_orm import Database

client = ArangoClient(hosts=f"http://{arango_host}:{arango_port}")
test_db = client.db(arango_db, arango_user, arango_password)
db = Database(test_db)
EDIT:
I had not put the sync property to true upon collection creation, but after changing the collection and setting it to true, the behaviour stays exactly the same.
After getting in touch with the people at ArangoDB, I learned that views are not updated as quickly as collections. They have given me an internal SEARCH option which also waits for views to sync. Since it's an internal option they highly discourage using it, but I only use it for unit testing.
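An alternative that avoids relying on that internal option is to poll in the test setup until the search view has caught up (or a timeout expires). This is only a minimal sketch of that idea, reusing the dao.search() call from the tests above; the helper name, timeout and polling interval are assumptions:
import time

def wait_for_search_view(query: str, expected_count: int, timeout: float = 5.0, interval: float = 0.1) -> None:
    # Poll the search view until it returns the expected number of hits or the timeout expires.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if len(dao.search(query)) == expected_count:
            return
        time.sleep(interval)
    raise TimeoutError(f"search view still stale after {timeout}s for query {query!r}")
Calling wait_for_search_view("TestUser3", 1) at the end of load_test_data() would replace the fixed sleep(1) with a bounded wait that returns as soon as the view is in sync.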
I need to increase the speed of parsing a heap of XML files. I decided to try Python threads, but I do not know how to work with the DB from them correctly.
My DB stores only links to the files. I decided to add an isProcessing column to my DB to prevent the same rows from being acquired by multiple threads.
So the resulting table looks like:
|xml_path|isProcessing|
Every thread sets this flag before starting processing, and other threads select rows for processing where this flag is not set.
But I am not sure this is the correct way, because I am not sure the acquisition is atomic, and two threads might process the same row twice.
def select_single_file_for_processing():
    # ...
    sql = """UPDATE processing_files SET "isProcessing" = 'TRUE' WHERE "xml_name"='{0}'""".format(xml_name)
    cursor.execute(sql)
    conn.commit()

def worker():
    result = select_single_file_for_processing()
    # ...
    # processing()

def main():
    # ....
    while unprocessed_xml_count != 0:  # now unprocessed_xml_count is global! I know that it's wrong, but how to fix it?
        checker_thread = threading.Thread(target=select_total_unpocessed_xml_count)
        checker_thread.start()  # if we have files for processing
        for i in range(10):  # run workers
            t = Process(target=worker)
            t.start()
The second question: what is the best practice for working with the DB from the multiprocessing module?
As written, your isProcessing flag could have problems with multiple threads. You should include a predicate for isProcessing = FALSE and check how many rows are updated. One thread will report 1 row and any other threads will report 0 rows.
As to best practices? This is a reasonable solution. The key is to be specific. A simple update will set the values as specified. The operation you are trying to perform though is to change the value from a to b, hence including the predicate for a in the statement.
UPDATE processing_files
SET isProcessing = 'TRUE'
WHERE xmlName = '...'
AND isProcessing = 'FALSE';
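For illustration, here is a minimal Python sketch of that claim-a-row pattern, assuming a DB-API connection such as psycopg2 (the table and column names are taken from the snippets above); cursor.rowcount tells each thread whether it actually won the row:
def try_claim_file(conn, xml_name):
    # Atomically claim one file: only the thread whose UPDATE matches the
    # isProcessing = 'FALSE' predicate sees rowcount == 1.
    with conn.cursor() as cursor:
        cursor.execute(
            """UPDATE processing_files
               SET "isProcessing" = 'TRUE'
               WHERE "xml_name" = %s
               AND "isProcessing" = 'FALSE'""",
            (xml_name,),
        )
        claimed = cursor.rowcount == 1
    conn.commit()
    return claimed
Using a parameterized query here also avoids the string formatting from the question, which is vulnerable to SQL injection if the file names are not trusted.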
I'm trying to optimize a method from a program I developed.
Basically, it's a GUI (I use the PyQt library) displaying information. The information is stored in an SQLite database. I use a QSqlTableModel and a QTableView to display this information. It is a very common combination.
One of the fields in the database is a boolean called "new". The purpose of the method I want to optimize is to set this boolean to 0.
Here is the method:
def markOneRead(self, element):
    """Slot to mark an article read"""
    print("\n")
    print("start markoneread")
    start_time = datetime.datetime.now()
    # Get the QTableView object (I have several)
    table = self.liste_tables_in_tabs[self.onglets.currentIndex()]
    # Save the current selected line
    line = table.selectionModel().currentIndex().row()
    print("before bdd change")
    elapsed_time = datetime.datetime.now() - start_time
    print(elapsed_time)
    # Change the data in the model
    # The 12th column is the field "new". I write 0
    # !!!!! Very long action
    table.model().setData(table.model().index(line, 12), 0)
    print("before searchbutton")
    elapsed_time = datetime.datetime.now() - start_time
And the output is something like this:
before bdd change
0:00:00.000141
before searchbutton
0:00:03.064438
So basically, this line:
table.model().setData(table.model().index(line, 12), 0)
Takes 3 seconds to perform. That's very long: I'm just updating one item in the database; it shouldn't take that long. My database has 25,000 items, but I don't think that changes anything.
EDIT:
Maybe it's because the model performs the change immediately and tries to reload all the data?
Do you have an idea of how to solve this issue?
EDIT 2:
Actually, the problem comes from the reloading of the data. If I change the editStrategy of the model:
model.setEditStrategy(QtSql.QSqlTableModel.OnManualSubmit)
Now it doesn't take 3 seconds anymore, but the view is not updated: new is still set to 1 after the call to the method.
So I wonder if there is a way to "reload" only one item, one index, after a model change?
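For context, with OnManualSubmit the edit only lives in the model's cache until submitAll() is called, which would explain why "new" still reads 1 afterwards; nothing reaches the database until that call. A minimal sketch under that assumption, reusing the table, line and column index from the method above:
model = table.model()
model.setEditStrategy(QtSql.QSqlTableModel.OnManualSubmit)
if model.setData(model.index(line, 12), 0):
    if not model.submitAll():            # with OnManualSubmit, nothing is written until submitAll()
        print(model.lastError().text())  # surface the SQL error if the write failed
Note that submitAll() repopulates the model on success, and Qt 5's QSqlTableModel.selectRow() can refresh a single row if only one record needs reloading.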
perhaps this helps:
with:
index = table.model().index(line, 12)
table.model().dataChanged.emit(index,index)
or:
table.model().dataChanged.emit(table.model().index(line, 12),table.model().index(line, 12))
you can define the items affected by the changes;
see the documentation.
With the table.model().dataChanged signal, the automatic update or repaint after table.model().setData() should be limited to the area defined by two indices representing the top-left and bottom-right items. If both indices are identical, only one item is affected.
QSqlTableModel is very convenient, but it's not magic. My guess is that executing the update statement takes most of the time. Try updating a few rows manually and see how long that takes.
If this is slow as well, then you probably need an index on the table so SQLite can locate rows more quickly.
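As an illustration, timing a manual update and adding an index could look roughly like this with Python's sqlite3 module; the database file, table and column names here are assumptions, since the question doesn't show the schema:
import sqlite3
import time

conn = sqlite3.connect("articles.db")  # hypothetical database file
start = time.perf_counter()
conn.execute("UPDATE articles SET new = 0 WHERE article_id = ?", (42,))  # hypothetical table/columns
conn.commit()
print("manual update took", time.perf_counter() - start, "seconds")

# If the manual update itself is slow, an index on the column used to locate
# the row usually helps SQLite avoid a full table scan:
conn.execute("CREATE INDEX IF NOT EXISTS idx_articles_article_id ON articles (article_id)")
conn.commit()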
We still have a rare case of duplicate entries when this POST method is called.
I had previously asked for advice on Stack Overflow and was given a solution: utilising the parent/child methodology to get strongly consistent queries.
I have migrated all data into that form and let it run for another 3 months.
However the problem was never solved.
The problem is right here, with this conditional: if recordsdb.count() == 1:
It should be true in order to update the entry, but HRD does not always find the latest entry, so a new entry is created instead.
As you can see, we are writing/reading from the Record via Parent/Child methodology as recommended:
new_record = FeelTrackerRecord(parent=user.key,...)
And yet upon retrieval, HRD still doesn't always fetch the latest entry:
recordsdb = FeelTrackerRecord.query(ancestor = user.key).filter(FeelTrackerRecord.record_date == ... )
So we are quite stuck on this and don't know how to solve it.
@requires_auth
def post(self, ios_sync_timestamp):
    user = User.query(User.email == request.authorization.username).fetch(1)[0]
    if user:
        json_records = request.json['records']
        for json_record in json_records:
            recordsdb = FeelTrackerRecord.query(ancestor=user.key).filter(FeelTrackerRecord.record_date == date_parser.parse(json_record['record_date']))
            if recordsdb.count() == 1:
                rec = recordsdb.fetch(1)[0]
                if 'timestamp' in json_record:
                    if rec.timestamp < json_record['timestamp']:
                        rec.rating = json_record['rating']
                        rec.notes = json_record['notes']
                        rec.timestamp = json_record['timestamp']
                        rec.is_deleted = json_record['is_deleted']
                        rec.put()
            elif recordsdb.count() == 0:
                new_record = FeelTrackerRecord(parent=user.key,
                                               user=user.key,
                                               record_date=date_parser.parse(json_record['record_date']),
                                               rating=json_record['rating'],
                                               notes=json_record['notes'],
                                               timestamp=json_record['timestamp'])
                new_record.put()
            else:
                raise Exception('Got more than one record for the same record date - during REST post')
        user.last_sync_timestamp = create_timestamp(datetime.datetime.today())
        user.put()
        return '', 201
    else:
        return '', 401
Possible Solution:
The very last idea I have to solve this would be stepping away from the Parent/Child strategy and using the user.key PLUS the date string as part of the key.
Saving:
new_record = FeelTrackerRecord(id=str(user.key) + json_record['record_date'], ...)
new_record.put()
Loading:
key = ndb.Key(FeelTrackerRecord, str(user.key) + json_record['record_date'])
record = key.get()
Now I could check: if record is None, I create a new entry, otherwise I update it. And hopefully HRD has no reason not to find the record anymore.
What do you think, is this a guaranteed solution?
The Possible Solution appears to have the same problem as the original code. Imagine the race condition if two servers execute the same instructions practically simultaneously. With Google's overprovisioning, that is sure to happen once in a while.
A more robust solution should use Transactions and a rollback for when concurrency causes a consistency violation. The User entity should be the parent of its own Entity Group. Increment a records counter field in the User entity within a transaction. Create the new FeelTrackerRecord only if the Transaction completes successfully. Therefore the FeelTrackerRecord entities must have a User as parent.
Edit: In the case of your code, the following lines would go before the user = User.query(...) line:
Transaction txn = datastore.beginTransaction();
try {
and the following lines would go after user.put() :
txn.commit();
} finally {
    if (txn.isActive()) {
        txn.rollback();
    }
}
That may overlook some flow-control nesting detail; it's the concept this answer is trying to describe.
With an active transaction, if multiple processes run the same instructions (for example, multiple servers executing the same POST concurrently because of overprovisioning), the first process will succeed with its put and commit, while the second will throw the documented ConcurrentModificationException.
Edit 2: The transaction that increments the counter (and may throw an exception) must also create the new record. That way if the exception is thrown, the new record is not created.
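Since the snippet above uses the Java Datastore API while the question's code is Python NDB, here is a rough sketch of the same idea with NDB's transaction support, checking for the existing record inside the transaction rather than keeping a counter; the helper name is an assumption, not part of the original code:
from google.appengine.ext import ndb

@ndb.transactional
def upsert_record_txn(user_key, record_date, rating, notes, timestamp):
    # Ancestor queries inside a transaction are strongly consistent, so this
    # check-then-create cannot race with another request for the same user:
    # conflicting transactions on the user's entity group are retried or fail
    # with TransactionFailedError instead of silently creating a duplicate.
    existing = FeelTrackerRecord.query(ancestor=user_key).filter(
        FeelTrackerRecord.record_date == record_date).get()
    if existing is None:
        FeelTrackerRecord(parent=user_key,
                          user=user_key,
                          record_date=record_date,
                          rating=rating,
                          notes=notes,
                          timestamp=timestamp).put()
Because the record creation is a write to the user's entity group, two concurrent attempts conflict at commit time instead of both slipping through, which is the property this answer relies on.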