I'm trying to optimize a method from a program I developed.
Basically, it's a GUI (built with the PyQt library) displaying information stored in an SQLite database. I use a QSqlTableModel and a QTableView to display that information; it's a very common combination.
One of the fields in the database is a boolean called "new". The purpose of the method I want to optimize is to set this boolean to 0.
Here is the method:
def markOneRead(self, element):
    """Slot to mark an article read"""
    print("\n")
    print("start markoneread")
    start_time = datetime.datetime.now()
    # Get the QTableView object (I have several)
    table = self.liste_tables_in_tabs[self.onglets.currentIndex()]
    # Save the currently selected line
    line = table.selectionModel().currentIndex().row()
    print("before bdd change")
    elapsed_time = datetime.datetime.now() - start_time
    print(elapsed_time)
    # Change the data in the model.
    # The 12th column is the field "new"; I write 0
    # !!!!! Very long action
    table.model().setData(table.model().index(line, 12), 0)
    print("before searchbutton")
    elapsed_time = datetime.datetime.now() - start_time
    print(elapsed_time)
And the output is something like this:
before bdd change
0:00:00.000141
before searchbutton
0:00:03.064438
So basically, this line:
table.model().setData(table.model().index(line, 12), 0)
Takes 3 seconds to run. That's very long; I'm just updating one item in the database, it shouldn't take that long. My database has 25,000 items, but I don't think that matters.
EDIT:
Maybe it's because the model performs the change immediately and tries to reload all the data?
Do you have any idea how to solve this issue?
EDIT 2:
Actually, the problem comes from the reloading of the data. If I change the editStrategy of the model:
model.setEditStrategy(QtSql.QSqlTableModel.OnManualSubmit)
Now it doesn't take 3 seconds anymore, but the view is not updated: "new" still shows as 1 after the call to the method.
So I wonder if there is a way to "reload" only one item, one index, after a model change ?
perhaps this helps:
with:
index = table.model().index(line, 12)
table.model().dataChanged.emit(index,index)
or:
table.model().dataChanged.emit(table.model().index(line, 12),table.model().index(line, 12))
you can specify which items are affected by the change.
See the documentation.
With the table.model().dataChanged signal, the automatic update or repaint after table.model().setData() is limited to the area defined by two indices representing the top-left and bottom-right items. If both indices are identical, only one item is affected.
QSqlTableModel is very convenient, but it's not magic. My guess is that executing the UPDATE statement takes most of the time. Try updating a few rows manually and see how long that takes.
If this is slow as well, then you probably need an index on the table so sqlite can locate rows more quickly.
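As a minimal sketch of that idea using the sqlite3 module (the table and column names here are made up, not the asker's actual schema), an index on the column used in the WHERE clause lets sqlite locate the row without scanning all 25,000 rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT, new INTEGER)")
conn.executemany(
    "INSERT INTO articles VALUES (?, 1)",
    ((f"article {i}",) for i in range(25000)),
)

# Without this index, the UPDATE must scan every row to find the match;
# with it, sqlite can jump straight to the target row.
conn.execute("CREATE INDEX idx_articles_title ON articles (title)")
conn.execute("UPDATE articles SET new = 0 WHERE title = ?", ("article 12345",))
conn.commit()
```

You can compare the UPDATE timing with and without the CREATE INDEX line to see whether the database lookup is the bottleneck.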
Related
I have code to test an app, and I need to record how long the progress bar is present on the page. I have an idea in mind with a while loop and try/except, but I am afraid it might be inaccurate due to possible delays in the find-element function. Any ideas? Python preferred.
You can initialize a creation_time variable right before your element appears on the screen, and then a destroy_time variable right after your element gets destroyed.
from datetime import datetime
# REST OF THE CODE
creation_time = datetime.now()
# ......
destroy_time = datetime.now()
appearance_time = destroy_time - creation_time
This may have been your thought, but I'd try using EC.visibility_of_element_located in a while loop. When the element is first located, store the time in a variable as the above answer suggests. Then, when the element is no longer visible, store the time in another variable and stop the loop.
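The polling idea can be sketched without Selenium at all; here `is_visible` is a placeholder for whatever visibility check you use (e.g. a wrapper around EC.visibility_of_element_located), so the names are assumptions, not a real Selenium API:

```python
import time

def time_visibility(is_visible, poll_interval=0.05, timeout=10.0):
    """Wait for a condition to become true, then measure how long it stays true."""
    deadline = time.monotonic() + timeout
    # Wait for the element to appear
    while not is_visible():
        if time.monotonic() > deadline:
            raise TimeoutError("element never appeared")
        time.sleep(poll_interval)
    creation_time = time.monotonic()
    # Wait for the element to disappear
    while is_visible():
        if time.monotonic() > deadline:
            raise TimeoutError("element never disappeared")
        time.sleep(poll_interval)
    return time.monotonic() - creation_time
```

Note the measurement is only accurate to within one poll interval plus the cost of the visibility check itself, which is exactly the inaccuracy the question worries about; a smaller poll_interval tightens it.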
I use arango-orm (which uses python-arango in the background) in my Python/ArangoDB back-end. I have set up a small testing util that uses a remote database to insert test data, execute the unit tests and remove the test data again.
I insert my test data with a Python for loop. Each iteration, a small piece of information changes based on a generic object, and then I insert that modified generic object into ArangoDB until I have 10 test objects. However, after that code is run, my test assertions tell me I don't have 10 objects stored in my db, but only 8 (or sometimes 3, 7 or 9). It looks like python-arango runs these queries asynchronously, or ArangoDB already replies with an OK before the data is actually inserted. Does anyone have an idea of what is going on? When I put in a sleep of 1 second after all data is inserted, my tests run green. This obviously is no solution.
This is a little piece of example code I use:
def load_test_data(self) -> None:
    # This method is called from the setUp() method.
    logging.info("Loading test data...")
    for i in range(1, 11):
        # insertion with data object (ORM)
        user = test_utils.get_default_test_user()
        user.id = i
        user.username += str(i)
        user.name += str(i)
        db.add(user)
        # insertion with dictionary
        project = test_utils.get_default_test_project()
        project['id'] = i
        project['name'] += str(i)
        project['description'] = f"Description for project with the id {i}"
        db.insert_document("projects", project)
    # TODO: solve this dirty hack
    sleep(1)

def test_search_by_user_username(self) -> None:
    actual = dao.search("TestUser3")
    self.assertEqual(1, len(actual))
    self.assertEqual(3, actual[0].id)
Then my db is created like this in a separate module:
client = ArangoClient(hosts=f"http://{arango_host}:{arango_port}")
test_db = client.db(arango_db, arango_user, arango_password)
db = Database(test_db)
EDIT:
I had not set the sync property to true upon collection creation, but after changing the collection and setting it to true, the behaviour stays exactly the same.
After getting in touch with the people at ArangoDB, I learned that views are not updated as quickly as collections. They have given me an internal SEARCH option which also waits for views to sync. Since it's an internal option, they highly discourage using it; I only use it for unit testing.
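A common way to replace the fixed sleep(1) is to poll until the store reports the expected count, with a timeout. This is a generic sketch: `get_count` is a placeholder you would implement yourself (e.g. as a wrapper around a collection count or a `dao.search(...)` length), not a python-arango API:

```python
import time

def wait_for_count(get_count, expected, timeout=5.0, poll=0.05):
    """Poll until the datastore reports at least `expected` documents,
    instead of sleeping a fixed amount and hoping the sync finished."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_count() >= expected:
            return True
        time.sleep(poll)
    return False
```

In setUp() you would call something like `wait_for_count(lambda: collection_count(), 10)` and fail the test setup if it returns False; the test then starts as soon as the data is visible rather than after a full second.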
I need to increase the speed of parsing a heap of XML files. I decided to try Python threads, but I do not know how to work with the DB correctly from them.
My DB stores only links to files. I decided to add an isProcessing column to my DB to prevent the same rows being claimed by multiple threads.
So result table look like:
|xml_path|isProcessing|
Every thread sets this flag before starting processing, and other threads select rows for processing where this flag is not set.
But I am not sure this is the correct way, because I am not sure the claim is atomic, and two threads may process the same row twice.
def select_single_file_for_processing():
    # ...
    sql = """UPDATE processing_files SET "isProcessing" = 'TRUE' WHERE "xml_name"='{0}'""".format(xml_name)
    cursor.execute(sql)
    conn.commit()

def worker():
    result = select_single_file_for_processing()
    # ...
    # processing()

def main():
    # ....
    while unprocessed_xml_count != 0:  # now unprocessed_xml_count is global! I know that it's wrong, but how to fix it?
        checker_thread = threading.Thread(target=select_total_unpocessed_xml_count)
        checker_thread.start()  # if we have files for processing
        for i in range(10):  # run workers
            t = Process(target=worker)
            t.start()
The second question: what is the best practice for working with a DB from the multiprocessing module?
As written, your isProcessing flag could have problems with multiple threads. You should include a predicate for isProcessing = 'FALSE' and check how many rows were updated. One thread will report 1 row updated, and any other threads will report 0 rows.
As to best practices? This is a reasonable solution. The key is to be specific: a simple UPDATE sets the values unconditionally, but the operation you are trying to perform is to change the value from a to b, hence including the predicate for a in the statement.
UPDATE processing_files
SET isProcessing = 'TRUE'
WHERE xmlName = '...'
AND isProcessing = 'FALSE';
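That check-how-many-rows idea can be sketched with sqlite3 (using an integer flag and a parameterized query instead of string formatting; the schema here is the |xml_path|isProcessing| table from the question, but the helper name is made up):

```python
import sqlite3

def claim_file(conn, xml_path):
    """Try to claim a row. The `isProcessing = 0` predicate makes the
    claim atomic: at most one caller sees rowcount == 1."""
    cur = conn.execute(
        "UPDATE processing_files SET isProcessing = 1 "
        "WHERE xml_path = ? AND isProcessing = 0",
        (xml_path,),
    )
    conn.commit()
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processing_files (xml_path TEXT, isProcessing INTEGER)")
conn.execute("INSERT INTO processing_files VALUES ('a.xml', 0)")
```

The first worker to run `claim_file(conn, 'a.xml')` gets True; every later attempt on the same row gets False, so no file is processed twice.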
I was asked to make a Subnet Scanner app for a summer class. The app is complete, but it's ugly as sin because the code is extremely lengthy.
It is lengthy because the app requirements called for a GUI with 8 IP addresses on it and whether or not there had been a response at each IP.
Since it required a GUI, I created a series of static text fields to represent the individual IPs it was scanning.
That app has a scanner button and once that button is pressed and the scan resolves, it goes to update the board with the appropriate values.
To make it easier, I set up a global placeholder value to just increment by 8 each time I needed to see a new set.
So here's what I'd like to do.
I'd like to make a for loop that updates the static text of each of the eight fields without having to write each one out individually.
What I have that works is below:
self.XValue0.SetLabel(str(placeholder))
self.XValue1.SetLabel(str(placeholder1))
self.XValue2.SetLabel(str(placeholder2))
self.XValue3.SetLabel(str(placeholder3))
self.XValue4.SetLabel(str(placeholder4))
self.XValue5.SetLabel(str(placeholder5))
self.XValue6.SetLabel(str(placeholder6))
self.XValue7.SetLabel(str(placeholder7))
XValue is in reference to 192.168.1.X where X is the value being changed since the first part of the Subnet stays the same anyway.
The placeholder with a number is in reference to a global value that adds the number (1-7) to the placeholder to populate the field.
For instance, if the GUI starts at 192.168.1.0, it would populate down to 192.168.1.7 and display the yes or no for all positions in that range.
What I would like to do is something more similar to this:
for x in range(0,7):
    PlaceholderValue = str(placeholder + x)
    XValue = 'XValue' + PlaceholderValue
    self.XValue.setLabel(PlaceholderValue)
However, when I do that, the console gives me an error saying it can't find "XValue".
My question is this: is there a way to make that for loop work the way I want it to, or is the longer code necessary based on how I wrote it? Thanks all. Sorry for the long post; it's my first question after lurking for ages. Please let me know if you need any more info.
When you create your static text objects, save the references in a list as well as (or instead of) the XValue attributes.
self.XValues = []
for x in range(8):
    self.XValues.append(wx.StaticText(...))
Then it's much easier to loop over and update them later (note SetLabel expects a string, so convert the number first):
for i, xvalue in enumerate(self.XValues):
    xvalue.SetLabel(str(placeholder + i))
You can still access the labels individually too, for instance to access XValue4 you had before, you can do self.XValues[4].
Also, you could have done it the way you tried, except you need getattr to look up attributes of objects dynamically; but you're better off storing the labels in a list.
for x in range(8):
    # The attribute names are XValue0 .. XValue7, so build the name
    # from x, not from the displayed value
    name = 'XValue' + str(x)
    getattr(self, name).SetLabel(str(placeholder + x))
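Both lookup styles can be seen without wx by standing in fake objects for the widgets (all class and attribute names here are made up for illustration):

```python
class FakeLabel:
    """Stands in for wx.StaticText: just stores the label string."""
    def __init__(self):
        self.label = ""

    def SetLabel(self, text):
        self.label = text

class FakeFrame:
    def __init__(self):
        # Eight "widgets", stored both as XValueN attributes and in a list
        self.XValues = []
        for x in range(8):
            widget = FakeLabel()
            setattr(self, f"XValue{x}", widget)
            self.XValues.append(widget)

frame = FakeFrame()
placeholder = 16  # e.g. the scan currently starts at 192.168.1.16

# List style: loop over the stored references
for i, xvalue in enumerate(frame.XValues):
    xvalue.SetLabel(str(placeholder + i))

# getattr style: build the attribute name dynamically
assert getattr(frame, "XValue3").label == "19"
```

Because both the list and the attributes point at the same objects, updating through one view is visible through the other, which is why keeping the list costs nothing.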
I have a query...
message_batch = Message.objects.all()[:500]
I don't want to have to make another database call to retrieve the objects; besides, I already have them in memory, so what's the point?
So I tried to update like this:
message_batch.update(send_date=datetime.datetime.now(), status="Sent")
But I get the following error message:
Cannot update a query once a slice has been taken.
Why? Is there a way around this? I want to update the objects I already have in memory, not make another call to retrieve them.
This is my full code; there has to be a way around this....
total = Message.objects.filter(status="Unsent", sender=user, batch=batch).exclude(recipient_number__exact='').count()
for i in xrange(0, total, 500):
    message_batch = Message.objects.filter(status="Unsent").exclude(recipient_number__exact='')[i:i+500]
    # do some stuff here
    # once all done, update the objects
    message_batch.update(send_date=datetime.datetime.now(), billed=True)
Use Django database transactions to batch the updates:
https://docs.djangoproject.com/en/1.5/topics/db/transactions/
eg:
from django.db import transaction

total = Message.objects.filter(status="Unsent", sender=user, batch=batch).exclude(recipient_number__exact='').count()
for i in xrange(0, total, 500):
    message_batch = Message.objects.filter(status="Unsent").exclude(recipient_number__exact='')[i:i+500]
    # do some stuff here
    # once all done, save the objects you already have in memory
    # inside one transaction (model instances have no .update();
    # set the fields and call .save())
    with transaction.commit_on_success():
        for m in message_batch:
            m.send_date = datetime.datetime.now()
            m.billed = True
            m.save()
This wraps e.g. 500 row updates into a single transaction, so the database commits once instead of once per row.
You can update objects by their primary keys:
base_qs = Message.objects.filter(status="Unsent", sender=user, batch=batch).exclude(recipient_number__exact='')
total = base_qs.count()
for i in xrange(0, total, 500):
    page = list(base_qs[i:i+500])
    page_ids = [o.pk for o in page]
    # Do some stuff here
    base_qs.filter(pk__in=page_ids).update(send_date=datetime.datetime.now(), billed=True)