I want to save a page's content to an image once it has fully loaded, but sometimes the output raster is not rendered completely.
Code:
import sys
import signal
import os
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import QWebPage

app = QApplication(sys.argv)
signal.signal(signal.SIGINT, signal.SIG_DFL)
webpage = QWebPage()

def onLoadFinished(result):
    if not result:
        print "Request failed"
        sys.exit(1)
    webpage.setViewportSize(webpage.mainFrame().contentsSize())
    image = QImage(webpage.viewportSize(), QImage.Format_ARGB32)
    painter = QPainter(image)
    webpage.mainFrame().render(painter)
    painter.end()
    if os.path.exists("output.png"):
        os.remove("output.png")
    image.save("output.png")
    sys.exit(0) # quit this application

webpage.mainFrame().load(QUrl("file:///page.html"))
webpage.connect(webpage, SIGNAL("loadFinished(bool)"), onLoadFinished)
sys.exit(app.exec_())
The page uses JavaScript (an onload function) to fetch a Google map (640x640 px).
Image: http://i56.tinypic.com/15ojg3s.png
I'm not sure if this is even possible. For a static website this could probably work, but Google Maps loads tiles dynamically, and I doubt it emits a usable "I'm done" signal.
But it seems you only want an image of a Google map? Have you looked at their API? They allow you to generate static maps, just by building a URL.
Example
http://maps.google.com/maps/api/staticmap?center=Brooklyn+Bridge,New+York,NY&zoom=14&size=512x512&maptype=roadmap&markers=color:blue|label:S|40.702147,-74.015794&markers=color:green|label:G|40.711614,-74.012318&markers=color:red|color:red|label:C|40.718217,-73.998284&sensor=false
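Building such a URL by hand is error-prone, so here is a minimal sketch that assembles it from parameters. The endpoint and parameter names are copied from the example URL above; the helper name build_static_map_url and its arguments are my own, and I have not checked whether the service still accepts requests in exactly this form.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL above.
STATIC_MAP_BASE = "http://maps.google.com/maps/api/staticmap"

def build_static_map_url(center, zoom, size, markers=(), maptype="roadmap"):
    """Return a static-map URL for a centre, zoom level and (width, height) in pixels."""
    width, height = size
    params = [
        ("center", center),
        ("zoom", zoom),
        ("size", "%dx%d" % (width, height)),
        ("maptype", maptype),
    ]
    # Each marker becomes its own repeated "markers" query parameter.
    params.extend(("markers", marker) for marker in markers)
    params.append(("sensor", "false"))
    # urlencode percent-escapes commas, colons and pipes and turns spaces into "+".
    return STATIC_MAP_BASE + "?" + urlencode(params)

url = build_static_map_url(
    "Brooklyn Bridge,New York,NY", 14, (640, 640),
    markers=["color:blue|label:S|40.702147,-74.015794"],
)
print(url)
```

The resulting URL can be downloaded like any other image, so no page rendering or "load finished" detection is needed.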
Related
I am trying to scrape the SVG charts from the following link:
https://finance.yahoo.com/quote/AAPL/analysts?p=AAPL
The portion I am trying to scrape is as follows:
Images Here
I do not need the text of the charts (just the graphs themselves). However, I have never scraped an SVG image before and I'm not sure if it is possible. I looked around but could not find any useful Python packages to do this directly.
I know that I can take a screenshot of the image with python using selenium and then use PIL to crop it and save it as an svg, but I am wondering if there is a more direct way to grab these charts off the page. Any useful packages or implementations would be helpful. Thank you.
Edit: I got some downvotes but I'm not sure why. Here is how I would implement it my way:
import sys
import time
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *

class Screenshot(QWebView):
    def __init__(self):
        self.app = QApplication(sys.argv)
        QWebView.__init__(self)
        self._loaded = False
        self.loadFinished.connect(self._loadFinished)

    def capture(self, url, output_file):
        self.load(QUrl(url))
        self.wait_load()
        # set to webpage size
        frame = self.page().mainFrame()
        self.page().setViewportSize(frame.contentsSize())
        # render image
        image = QImage(self.page().viewportSize(), QImage.Format_ARGB32)
        painter = QPainter(image)
        frame.render(painter)
        painter.end()
        print 'saving', output_file
        image.save(output_file)

    def wait_load(self, delay=0):
        # process app events until page loaded
        while not self._loaded:
            self.app.processEvents()
            time.sleep(delay)
        self._loaded = False

    def _loadFinished(self, result):
        self._loaded = True

s = Screenshot()
s.capture('https://finance.yahoo.com/quote/AAPL/analysts?p=AAPL', 'yhf.png')
I would then use the crop function in PIL to take the images out of the charts.
Using QWebView for web scraping seems weird to me, although I do realize it has the advantage of telling the server "I'm not a web scraper, I'm an embedded browser". Note that this approach is not bulletproof: your scraper can still be detected if it behaves in a way that is unusual for a human user.
This is how I would do it:
I'd use requests to download the page (maybe through a proxy that hides your real IP address, to combat IP bans).
Then I'd parse the page using BeautifulSoup to get the url of the svg file you are trying to get.
Then I'd download the svg file and convert it into an image using something like this
If you want to continue using Qt instead, look for methods in the web view that allow inspecting DOM or extracting the resources the view downloaded.
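As a rough sketch of the "find the URL of the SVG file" step, the snippet below collects SVG links from a page's HTML. To keep it dependency-free it swaps in the standard library's html.parser for BeautifulSoup, and it works on an HTML string instead of downloading anything. The sample markup, the example.com base URL, and the names SvgLinkFinder and find_svg_urls are all made up for illustration. It only covers the case where the page references the SVG as a separate file; charts drawn inline by JavaScript would not be found this way.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class SvgLinkFinder(HTMLParser):
    """Collect absolute URLs of SVG files referenced by a page."""

    # Tag name -> attribute that may point at an external SVG file.
    URL_ATTRS = {"img": "src", "object": "data", "embed": "src"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.svg_urls = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            # Ignore any query string when checking the file extension.
            if name == wanted and value and value.split("?")[0].lower().endswith(".svg"):
                self.svg_urls.append(urljoin(self.base_url, value))

def find_svg_urls(html, base_url):
    """Return the absolute URLs of all SVG files referenced in html."""
    finder = SvgLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.svg_urls

# Hypothetical page fragment, not the real Yahoo Finance markup.
sample = ('<p><img src="/charts/rating.svg"><img src="logo.png">'
          '<object data="eps.svg?v=2"></object></p>')
found = find_svg_urls(sample, "https://example.com/quote/AAPL/")
print(found)
```

Each URL in the result could then be downloaded and converted in a separate step.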
I would like to get the DOM of a website after js execution.
I would also like to get all the content of the iframes in the website, similarly to what I have in Google Chrome's Inspect Element feature.
This is my code:
import sys
from PyQt4 import QtGui, QtCore, QtWebKit

class Sp():
    def save(self):
        print ("call")
        data = self.webView.page().currentFrame().documentElement().toInnerXml()
        print(data.encode('utf-8'))
        print ('finished')

    def main(self):
        self.webView = QtWebKit.QWebView()
        self.webView.load(QtCore.QUrl("http://www.w3schools.com/tags/tryit.asp?filename=tryhtml_iframe_scrolling"))
        QtCore.QObject.connect(self.webView,QtCore.SIGNAL("loadFinished(bool)"),self.save)

app = QtGui.QApplication(sys.argv)
s = Sp()
s.main()
sys.exit(app.exec_())
This gives me the HTML of the website, but not the HTML inside the iframes. Is there any way that I could get the HTML of the iframes?
This is a very hard problem to solve in general.
The main difficulty is that there is no way to know in advance how many frames each page has. And in addition to that, each child-frame may have its own set of frames, the number of which is also unknown. In theory, there could be an infinite number of nested frames, and the page will never finish loading (which seems no exaggeration for sites that have a lot of ads).
Anyway, below is a version of your script which gets the top-level QWebFrame object of each frame as it loads, and shows how you can access some of the things you are interested in. As you will see from the output, there are a lot of "junk" frames inserted by ads and such like that you will somehow need to filter out.
import sys, signal
from PyQt4 import QtGui, QtCore, QtWebKit

class Sp():
    def save(self, ok, frame=None):
        if frame is None:
            print ('main-frame')
            frame = self.webView.page().mainFrame()
        else:
            print('child-frame')
        print('URL: %s' % frame.baseUrl().toString())
        print('METADATA: %s' % frame.metaData())
        print('TAG: %s' % frame.documentElement().tagName())
        print()

    def handleFrameCreated(self, frame):
        frame.loadFinished.connect(lambda: self.save(True, frame=frame))

    def main(self):
        self.webView = QtWebKit.QWebView()
        self.webView.page().frameCreated.connect(self.handleFrameCreated)
        self.webView.page().mainFrame().loadFinished.connect(self.save)
        self.webView.load(QtCore.QUrl("http://www.w3schools.com/tags/tryit.asp?filename=tryhtml_iframe_scrolling"))

signal.signal(signal.SIGINT, signal.SIG_DFL)
print('Press Ctrl+C to quit\n')
app = QtGui.QApplication(sys.argv)
s = Sp()
s.main()
sys.exit(app.exec_())
NB: it is important that you connect to the loadFinished signal of the main frame rather than the web-view. If you connect to the latter, it will be called multiple times if the page contains more than one frame.
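Because frames can nest to an unknown depth, another way to look at the problem is as a recursive walk over the frame tree with a depth limit. The sketch below relies only on QWebFrame.childFrames(), baseUrl() and toHtml(); the FakeUrl and FakeFrame classes are stand-ins I made up so the walk can be tried without Qt, and the max_depth cut-off of 5 is an arbitrary assumption.

```python
def collect_frames(frame, depth=0, max_depth=5):
    """Return (depth, url, html) tuples for a frame and its descendants."""
    found = [(depth, frame.baseUrl().toString(), frame.toHtml())]
    # Stop descending at max_depth so deeply nested ad frames cannot run away.
    if depth < max_depth:
        for child in frame.childFrames():
            found.extend(collect_frames(child, depth + 1, max_depth))
    return found

# Stand-in objects (not part of Qt) that mimic the three QWebFrame calls used above.
class FakeUrl(object):
    def __init__(self, text):
        self._text = text

    def toString(self):
        return self._text

class FakeFrame(object):
    def __init__(self, url, html, children=()):
        self._url = url
        self._html = html
        self._children = list(children)

    def baseUrl(self):
        return FakeUrl(self._url)

    def toHtml(self):
        return self._html

    def childFrames(self):
        return list(self._children)

page = FakeFrame("http://example.com/", "<html>main</html>", [
    FakeFrame("http://example.com/ad", "<html>ad</html>", [
        FakeFrame("http://example.com/ad/inner", "<html>inner</html>"),
    ]),
    FakeFrame("http://example.com/content", "<html>content</html>"),
])
frames = collect_frames(page)
for depth, url, html in frames:
    print("%s%s" % ("  " * depth, url))
```

In the real script this would be called with self.webView.page().mainFrame() once loading has finished. Frames that scripts add afterwards would be missed, so it complements the frameCreated approach rather than replacing it.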
I am working on a Python script (environment: custom Linux Mint 17.1) that uses a web-browser class to create a browser instance that renders some HTML.
I'd like to have a hyperlink within the HTML which, when clicked, causes
a local Python script to run.
I have not found a precise way to do this. Any help is appreciated.
TIA, Kaiwan.
Since you're using PyQt4 and the QtWebKit module, it's very easy to do so.
You create a function that grabs the url and acts accordingly.
Here's some sample code to get you started:
from PyQt4.QtCore import QUrl
from PyQt4.QtGui import QApplication
from PyQt4.QtWebKit import QWebView, QWebPage
import sys

def linkHandler(url):
    # linkClicked passes a QUrl, so convert it to text before comparing
    link = url.toString()
    print "[DEBUG] Clicked link is: %s" % link
    if link == "My Triggering URL":
        print "Found my link, launching python script"
    else:
        # Handle url gracefully
        pass

def main():
    app = QApplication(sys.argv)
    webview = QWebView()
    # Tell our webview to handle links, which it doesn't by default
    webview.page().setLinkDelegationPolicy(QWebPage.DelegateAllLinks)
    webview.linkClicked.connect(linkHandler)
    webview.load(QUrl('http://google.com'))
    webview.show()
    return app.exec_()

if __name__ == "__main__":
    sys.exit(main())
P.S. Whenever you post code snippets, it's better to edit your question and put them there, because they become a mess in the comment field.
P.P.S. Be more specific with your tags next time. There are a lot of frameworks that can create a browser; PyQt4 would be a very good tag to begin with and would get you more answers.
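For the "launching python script" branch of the handler above, one possible shape is a lookup table from link targets to script paths plus a subprocess call. This is only a sketch: the run://hello link, the routes dictionary and the function name run_script_for_link are invented for illustration, and the throwaway script is created just so the example can run on its own.

```python
import os
import subprocess
import sys
import tempfile

def run_script_for_link(url_text, routes):
    """Run the script registered for url_text.

    Returns the script's exit code, or None if no route matches.
    """
    script_path = routes.get(url_text)
    if script_path is None:
        return None
    # sys.executable reuses the interpreter that is running the browser script.
    return subprocess.call([sys.executable, script_path])

# Create a throwaway script so the example is self-contained.
fd, script_path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "w") as handle:
    handle.write("print('script was launched')\n")

# Hypothetical link target mapped to the script it should start.
routes = {"run://hello": script_path}
exit_code = run_script_for_link("run://hello", routes)
ignored = run_script_for_link("http://google.com", routes)
os.remove(script_path)
print(exit_code, ignored)
```

Inside linkHandler, the text of the clicked QUrl (url.toString()) would be passed as url_text. Note that subprocess.call blocks until the script exits, which would freeze the GUI for long-running scripts; subprocess.Popen is the non-blocking alternative.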
I have some Python code using PySide with a QWebView that shows Google Maps.
I just want to get the response each time I make a request through the QWebView widget.
I have searched for information, but found no reference on getting a response with PySide. If you need me to paste some code I will, but I just have a simple QWebView widget.
EDIT: You asked me for the code:
from PySide.QtCore import *
from PySide.QtGui import *
import sys
import pyside3

class MainDialog(QMainWindow, pyside3.Ui_MainWindow):
    def __init__(self, parent=None):
        super(MainDialog,self).__init__(parent)
        self.setupUi(self)
        token_fb=""
        #self.Connect_buttom.clicked.connect(self.get_fb_token)
        self.Connect_buttom.clicked.connect(lambda: self.get_fb_token(self.FB_username.text(), self.FB_password.text()))
        #self.connect(self.Connect_buttom, SIGNAL("clicked()"), self.get_fb_token)
        #Change between locate and hunt
        self.MapsButton.clicked.connect(lambda: self.select_page_index(0))
        self.HuntButton.clicked.connect(lambda: self.select_page_index(1))
        ###########################
        self.webView.setHtml(URL)

    def select_page_index(self, index): # To change between frames
        self.Container.setCurrentIndex(index)
I need the response from: self.webView.setHtml(URL) because depending on the response my app has to do one thing or other.
The function QWebView.setHtml() has no response, in the sense that it doesn't return anything.
Maybe you want to listen for all links that are clicked and do something custom with them.
web_view = QtWebKit.QWebView()
web_view.page().setLinkDelegationPolicy(QtWebKit.QWebPage.DelegateAllLinks)
web_view.linkClicked.connect(your_handler)
Or maybe you want to do something when loading has finished. This is done by:
web_view = QtWebKit.QWebView()
web_view.loadFinished.connect(your_handler)
Is there a way to take a screenshot, using PIL, of a specified HTML/JavaScript page that resides on my server?
I want to write a script that will change some parameters on that HTML page and then have PIL take screenshots of it.
Any ideas? Examples would be truly appreciated.
Do you absolutely have to use PIL? If not, you might be able to get what you want using PyQt, which has a built-in WebKit control.
See http://notes.alexdong.com/xhtml-to-pdf-using-pyqt4-webkit-and-headless for an example which converts html+css into a PDF without using a separate browser. The code is pretty short so I've copied it below.
import sys
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *

app = QApplication(sys.argv)
web = QWebView()
web.load(QUrl("http://www.google.com"))
#web.show()

printer = QPrinter()
printer.setPageSize(QPrinter.A4)
printer.setOutputFormat(QPrinter.PdfFormat)
printer.setOutputFileName("file.pdf")

def convertIt():
    web.print_(printer)
    print "Pdf generated"
    QApplication.exit()

QObject.connect(web, SIGNAL("loadFinished(bool)"), convertIt)
sys.exit(app.exec_())