I am trying to make an application in Python using PyQt that can fetch the generated content of a list of URLs and process the fetched source with the help of multiple threads. I need to run about ten QWebViews at once. As ridiculous as that might sound, when it comes to hundreds of URLs, using threaded QWebViews gets the results over 3 times faster than normal.
Here is the test code that I have been having problems with...
import sys
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *
class Worker(QThread):
def __init__(self, url, frame):
QThread.__init__(self)
self.url = url
self.frame = frame
def run(self):
self.frame.load(QUrl(self.url))
print len(self.frame.page().mainFrame().toHtml())
app = QApplication(sys.argv)
webFrame = QWebView()
workerList = []
for x in range(1):
worker = Worker('http://www.google.com', webFrame)
workerList.append(worker)
for worker in workerList:
worker.start()
sys.exit(app.exec_())
Above, I tried initializing the QWebView in the main QApplication only to get:
QObject: Cannot create children for a parent that is in a different thread.
So then I tried initializing the QWebView in the QThread; but then, the QWebView remained unchanged and blank without outputting any errors or anything. This was probably due to a cache error.
I have the feeling that I am missing out on something or skipping a very important step. Since threaded QWebViews in PyQt isn't a really documented topic, I would really appreciate any help on how to successfully implement this.
There are multiple issues with your question and code:
You are talking about QWebFrame, but are actually passing a QWebView to your worker(s). Since this is a QWidget, it belongs to the main (GUI) thread and should not be modified by other threads.
One QWebView / QWebFrame can only load one URL at a time, so you cannot share it across multiple workers.
QWebFrame.load() loads data asynchronously, i.e. a call to load() returns immediately and there will be no data to read yet. You will have to wait for the loadFinished() signal to be emitted before you can access the data.
Since the actual loading is done by the networking layer of the operating system, and the load() method does not block, there is no need to run it in a separate thread in the first place. Why do you claim that this should be faster -- it makes no sense.
Since you want to load hundreds of URLs in parallel (or about 10, you are mentioning both in the same sentence), are you sure that you want to use QWebFrame, which is a presentation class? Do you actually want to render the HTML or are you just interested in the data retrieved?
Related
I am working on a scientific algorithm (image processing), which is written in C++, and uses lots of parallelization, handled by OpenMP. I need it to be callable from Python, so I created a CPython package, which handles the wrapping of the algorithm.
Now I need some UI, as user interaction is essential for initializing some stuff. My problem is that the UI freezes when I run the algorithm. I start the algorithm in a separate thread, so this shouldn't be a problem (I even proved it by replacing the function call with time.sleep, and it works fine, not causing any freeze). For testing I reduced the UI to two buttons: one for starting the algorithm, and another just to print some random string to console (to check UI interactions).
I also experienced something really weird. If I started moving the mouse, then pressed the button to start the computation, and after that kept moving the mouse continuously, the UI did not freeze, so hovering over the buttons gave them the usual blueish Windows-style tint. But if I stopped moving my mouse for a several seconds over the application window, clicked a button, or swapped to another window, the UI froze again. It's even more strange that the UI stayed active if I rested my mouse outside of the application window.Here's my code (unfortunately I cannot share the algorithm for several reasons, but I hope I manage to get some help even like this):
if __name__ == "__main__":
from PyQt5.QtWidgets import QApplication, QPushButton, QVBoxLayout, QWidget
from PyQt5.QtCore import QThread, QObject, pyqtSignal
import time
from CustomAlgorithm import Estimator # my custom Python package implemented in C++
class Worker(QObject):
finished = pyqtSignal()
def run(self):
estimator = Estimator()
estimator.calculate()
# if the above two lines are commented, and the next line is uncommented,
# everything's fine
# time.sleep(5)
print("done")
app = QApplication([])
thread = QThread()
window = QWidget()
layout = QVBoxLayout()
# button to start the calculation
btn = QPushButton("start")
layout.addWidget(btn)
btn.clicked.connect(thread.start)
# button to print some text to console
btn2 = QPushButton("other button")
layout.addWidget(btn2)
btn2.clicked.connect(lambda: print("other button clicked"))
window.setLayout(layout)
# handling events
worker = Worker(app)
worker.moveToThread(thread)
thread.started.connect(worker.run)
worker.finished.connect(thread.quit)
worker.finished.connect(worker.deleteLater)
thread.finished.connect(thread.deleteLater)
window.show()
app.exec_()
I tried multiple variants of using threads, like threading.Thread, multiprocessing.Process, PyQt5.QtCore.QThread (as seen above), even napari's worker implementation, but the result was the same. I even tried removing omp from the code, just in case it interferes somehow with python threads, but it didn't help.
As for the reason I use python, is that the final goal is to make my implementation available in napari.
Any help would be highly appreciated!
Because of Python's "Global Interpreter Lock", only one thread can run Python code at a time. However, other threads can do I/O at the same time.
If you want to allow other threads to run (just like I/O does) you can surround your code with these macros:
Py_BEGIN_ALLOW_THREADS
// computation goes here
Py_END_ALLOW_THREADS
Other Python threads will be allowed to run while the computation is happening. You can't access anything from Python between these two lines - so get that data in order before Py_BEGIN_ALLOW_THREADS.
Reference
I have some simple code that loads google.com with the PyQt4 library.
This is the code:
import sys
from PyQt4.QtGui import QApplication
from PyQt4.QtCore import QUrl
from PyQt4.QtWebKit import QWebView
class Browser(QWebView):
def __init__(self):
QWebView.__init__(self)
self.loadFinished.connect(self._result_available)
def _result_available(self, ok):
frame = self.page().mainFrame()
#print(frame.toHtml())
app.exit()
if __name__ == '__main__':
app = QApplication(sys.argv)
view = Browser()
view.load(QUrl('http://www.google.com'))
print('start')
app.exec_()#hangs the main thread
print('end')
My problem with this code is that app.exec_() hangs the main thread for a while, between the print start and print end.
Is there a way to scrape a website in PyQt4 without making the main thread hang for a while.
I would like to resume the main thread's normal execution of code after app.exec_().
It's probably just taking time to download the content from the web. app.exec_() is what runs your application basically. So if anything is indeed hanging it's actually some other factor that influences the proper execution of your application. Any statements that are after the app.exec_() will be executed once you close the application. Usually app.exec_() is called at the end of the main (in C++ you do return app.exec_() which you can also do in PyQt of course).
When working with content that requires time to be downloaded and displayed in the UI, you have to add multithreading in order to allow the main thread to continue work properly (thus not creating the so called UI freeze) but at the same time do some work in the background. The sole purpose of the main thread is to keep track of the UI and run it. Anything else that you add to the main thread will prevent the UI from working in a fluent manner. You have several options here - inheriting QThread (I wouldn't suggest doing that for this scenario in particular), using QThread + QObject in order to incorporate slots and signal in your application (for example: page downloaded? -> if yes, signal the UI to display the content), QRunnable etc.
So my advice is to download the content in a separate thread and once it's done you can add it to your UI as well as provide some other form of visual feedback (this also includes print statements :P).
I am fairly new in writing bigger programs in Python (I was only writing short scripts before). The program I'm about to write recives data from an external device, saves it to database and displays it in GUI (continuously). Since handling the data is time consuming I want to use threads (PySide QtCore QThreads to be specific). The best way to do this that I can think of is to set up two threads, one for the database processes and one for the handling of serial port, with GUI running in the MainThread. Now I've read a whole bunch of stuff about proper/improper QThreading, starting with Maya's post and digging deeper, up to this post where I found that:
When to subclass and when not to?
If you do not really need an event loop in the thread, you should subclass.
If you need an event loop and handle signals and slots within the thread, you may not need to subclass.
Question 1: Now the first thing I don't know is which way to go (subclassing or not). My guess is subclassing, since (Question2), but I'm not entirely sure and for now I'm sticking to moveToThread().
Question 2: The other thing I have no clue about is how to make the actions in my threads event-driven (the SerialThread receives the data from the device, sends it via signal to the DatabaseThread, the DatabaseThread collects the data and puts it in the database).
Question 3: When using moveToThread() I get AttributeError: 'PySide.QtCore.QThread' object has no attribute 'my_awesome_method' for every method I write for my class. It's clear to me that I don't understand the principle of operation here. How do I implement my methods that I want my_class to have while using my_class.moveToThread(my_thread)?
I've found all tutorials possible on QThreads and C++, so this discussion on QThreads in Python is interesting, but does not explain everything I want to know. I need a simple example with signals, slots and an exact explanation on how to use run(). Do I decorate it? Why? How? How does it work?
The short version of my code looks like this:
from PySide import QtCore
from app.thrads.gui.gui import Gui #my own class
from app.threads.db.db import Database #my own class
from app.threads.serial.serial import Serial #my own class
class AppManager():
def __init__(self):
self.gui = Gui()
self.db = Database()
self.db_thread = QtCore.QThread()
self.db.moveToThread(self.db_thread)
self.db_thread.start()
self.serial = Serial()
self.serial_thread = QtCore.QThread()
self.serial.moveToThread(self.serial_thread)
self.serial_thread.start()
and my Database and Serial class look somewhat like this:
from PySide import QtCore
from .db_signal import DbSignal
class Database(QtCore.QObject):
def __init__(self):
super(Database, self).__init__()
self.signal = DbSignal()
def my_awesome_method(self):
''' does something awesome '''
pass
Question 1
I think the best way is the easiest: don't subclass, just create two different threads. In the first, move the Database object, in the second the Serial one. As it, you won't make mistakes in the implementation of your sub-classed threads,and bug fixing will be quicker.
Question 2
I don't know the architecture of your application, but you could do as follows, after creating the threads and moving the worker objects in them:
self.serial_thread.started.connect(self.serial.start)
self.serial_thread.finished.connect(self.serial.stop)
self.db_thread.started.connect(self.db.start)
self.serial_thread.finished.connect(self.db.stop)
self.serial.data_ready.connect(self.db.handle_new_data)
# Starting the database first, as it we won't lose a single packet of data
self.db_thread.start()
self.serial_thread.start()
In fact, the advantage of QThreads is that they doesn't really modify the code.
Question 3
I think the problem is you're trying to call my_awesome_method with the QThread where your data base or your serial listener has been moved to. On the contrary, you should call the method with the object itself, as if it wasn't in a different thread !
# Bad!
obj.moveToThread(thread)
thread.method_of_obj(param)
# Good!
obj.moveToThread(thread)
obj.method_of_obj(param)
I have a piece of code that displays a Gui which has a QTextEdit Field. I would like to print to this field in real time similar to how the print function outputs to the console.
I have tried using multiple instances of the append function. Ex:
self.textEdit.append(_translate("MainWindow", ">>> Text", None))
The problem is that no matter where they are in the code, they seem to only show after the program is executed. My goal is to have them show in line like the print function does on the console.
I feel like this is an easy answer, but I have had no luck searching.. I am fairly new to Python and any help or guidance will be appreciated.
Thanks in advance!
Indeed, as mata mentioned the freezing comes from doing all your work in the same (main) thread, which also handles UI updates. One way to solve your responsiveness issue is indeed to frequently use QApplication.processEvents() in your blocking code. That will give the impression of a responsive GUI to the user if frequent enough.
However, using threads in Python (whether native or QThread) will not always work. That is because of the existence of the Global Interpreter Lock (GIL, the wiki has a good short intro). In short, Python does not allow more than one thread to execute code at the same time.
If your background task is light, or is based on IO, you can get around this as most IO-heavy modules for Python release the GIL while doing their job. However, if you are performing heavy computations in Python, the GIL will be locked by your processes, and as such your UI will still be unresponsive.
Consider the following example, built using PySide:
import sys, random
from threading import Thread
from time import sleep
from urllib import urlopen
from PySide import QtCore, QtGui
class Window(QtGui.QMainWindow):
update_signal = QtCore.Signal(int)
def __init__(self):
QtGui.QMainWindow.__init__(self)
self.progress_bar = QtGui.QProgressBar(self)
self.progress_bar.setRange(0, 10)
self.setCentralWidget(self.progress_bar)
self.update_signal[int].connect(self.progress_bar.setValue)
self.show()
self.t = Thread(target=self.worker)
self.t.start()
def worker(self):
while self.progress_bar.value() < 10:
self.update_signal.emit(self.progress_bar.value()+1)
print "Starting Sleep"
sleep(5)
print "End of Sleep"
if __name__ == '__main__':
qapp = QtGui.QApplication(sys.argv)
win = Window()
sys.exit(qapp.exec_())
Then, try replacing the worker function to:
def worker(self):
while self.progress_bar.value() < 10:
self.update_signal.emit(self.progress_bar.value()+1)
v = 0
print "Starting Add"
for i in xrange(5000000):
v = v+random.uniform(0, 100)
print "End of Add"
The first case maintains a responsive UI, as the call to sleep() releases the GIL. But the second example does not, as the computationally-intense algorithm keeps the lock.
One solution could be using the multiprocessing package.
From the docs:
multiprocessing is a package that supports spawning processes using an
API similar to the threading module. The multiprocessing package
offers both local and remote concurrency, effectively side-stepping
the Global Interpreter Lock by using subprocesses instead of threads.
Due to this, the multiprocessing module allows the programmer to fully
leverage multiple processors on a given machine.
A simple, illustrative example of using python multipocessing.
Also, if you have further interest, this blog post about multi-processing techniques might be of interest.
This means that you're probably doing all your work in the GUI thread. This is a common mistake, and it means that the GUI will freeze and not respond while there is something else going on.
You can add a call to QApplication.processEvents() to allow for the GUI to update after you change the text in your QTextEdit, but that will only partially solve the problem, the GUI will nevertheless freeze up between those calls.
The solution is simple: do the work in a separate thread. You should read Threading Basics from the Qt documentation, that should get you started.
I am creating an application which automates loading webpages and creating screenshots (no, I cannot use one of the existing solutions) using PySide. One part of the app gets a list of URLs and loads each URL in turn using a new QWebPage object. After the page is loaded a screenshot is taken and the QWebPage object is deleted.
Every now and then, given enough runs, I get the following error from PySide, as a RuntimeError exception:
Invalid Signal signature: loadStarted()
Failed to connect signal loadStarted().
The first line is printed to STDERR (probably by Qt?) and the second line is the Python exception.
loadStarted() is a built-in QWebPage signal, not something I created. This works 90% of the time, and I could not figure out what makes it fail occasionally.
To be honest, this app is quite unusual in design, as it hooks PySide/Qt into a uWSGI served web app - this means that for example I am not using the QApplication event loop but rather a local event loop for each page load. I am also not experienced in either Qt or Python so it's possible I'm making a lot of mistakes, but I can't figure out what those are.
I am thinking this post might have something to do with it, but I'm not sure.
Any suggestions on where to look next?
UPDATE: the signal is connected through the following code:
class MyWebPage(QWebPage):
def __init__(self, parent=None):
super(MyWebPage, self).__init__(parent)
self.loadStarted.connect(self.started)
self.loadFinished[bool].connect(self.finished)
MyWebPage objects are created as children of a different single QObject instance which is not deleted until the process shuts down. They are deleted by calling page.deleteLater() once I am done with them. Since I am running a local event loop, I trigger deferred deletions to happen after exiting local event loop by calling:
# self.eventLoop is the local event loop, which at this stage is not running
self.eventLoop.processEvents()
# self.app is the QApplication instance
self.app.sendPostedEvents(None, QEvent.DeferredDelete)
I was having the same problem (I'd get these errors every once in a while, but I couldn't consistently reproduce it). I think you may be right that it has something to do with methods not existing when you try to connect the signals to them - just to test that, I put the .connect calls in a separate method, and the errors went away. For example:
EDIT:
(a few hours later) I guess I spoke too soon - I just got the error again.
UPDATE:
(a few weeks later)
I played around with the syntax a lot and occasionally even got a RuntimeError (possibly this bug in PySide?). I'm still not completely sure why, but as the error happens inconsistently, you're probably safe in forcing it like this:
class MyWebPage(QWebPage):
def __init__(self, parent=None):
super(MyWebPage, self).__init__(parent)
success = False
while not success:
try:
success = self.loadStarted.connect(self.started)
except RuntimeError:
success = False
success = False
while not success:
try:
success = self.loadFinished[bool].connect(self.finished)
except RuntimeError:
success = False
If you really want to be safe, you could maybe keep a loop counter and just crash the program if the signal doesn't connect correctly before some threshold.
For what its worth, this and other issues were finally solved in a decent way after I upgraded to PySide 1.2.1.