How do I use setPage() in pyqtwebengine? - python

I want to load my own JavaScript and HTML code into QWebEngine so that it is read and loaded on any website I specify. Is this possible?
I want the HTML to load on the website with the setPage() command, but I'm not sure how to do this.
I want to do this on my own website, using setPage() with QtWebEngine.

Related

use a single navbar file on multiple html files using a single tag in python flask

I have a navbar HTML template, nav.html, that I want to use in other HTML files.
I am creating a web application with Python Flask.
I don't want to write the same code on every page. I want to create it once and use it anywhere.

How to download a file in Python?

My issue isn't that there is a direct link to a PDF on the web that I am trying to scrape and download onto my PC (the link doesn't end in .pdf). I have a download link that I want to activate, which would then download a PDF onto my computer. It looks like this:
https://***.com/files/4122109/download?download_frd=1&verifier=xxx
When I click the link, it verifies that I am the right user and then lets me download the file with the ID contained in the above URL. The content type for this file is "application/pdf", so I know it downloads a PDF file for me. I just need a library that "clicks" or "activates" the download for me.
Also, I am trying to do this for all the URLs I am pulling from a course on Canvas in a GET request. I am not trying to use Selenium here because I am getting these URLs from an API. Any advice on this approach would be highly appreciated.
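One way to approach this without a browser, sketched here with only the standard library: request the link directly, confirm that the response really is a PDF, and stream it to disk. The function name download_pdf, the token parameter, and the bearer-token header are assumptions for illustration; the question does not say how the verifier in the URL or the Canvas login is checked, so the authentication part may need to change.

```python
import shutil
import urllib.request


def download_pdf(url, dest_path, token=None):
    """Fetch url and save the response body to dest_path if it is a PDF."""
    request = urllib.request.Request(url)
    if token:
        # Assumption: the server accepts a bearer token. The question does
        # not say which authentication scheme the download link expects.
        request.add_header("Authorization", f"Bearer {token}")
    # urlopen follows redirects, so a link that forwards to the real file works.
    with urllib.request.urlopen(request) as response:
        content_type = response.headers.get_content_type()
        if content_type != "application/pdf":
            raise ValueError(f"expected application/pdf, got {content_type}")
        with open(dest_path, "wb") as fh:
            # Copy in chunks so large files are not held in memory.
            shutil.copyfileobj(response, fh)
    return dest_path
```

For a list of URLs returned by the API, this could be called in a loop with a different dest_path for each file.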

HTML Parsing with Python (HTML vs. complete website)

I'm trying to parse HTML from a website that contains information about train tickets and their prices (source below); however, I'm having an issue getting back all the HTML from the website when I use urllib to request it.
What I need is the price per ticket, which doesn't appear when I use urllib to request the HTML. After some investigation, I determined that if I save the webpage with Chrome and select "HTML only", I don't get the price, but if I select "Complete Webpage", I do. Is there any way to view the HTML that I get when I download the "Complete Webpage" and use that in Python? Or is there a way to automate downloading the complete webpage and parse the downloaded files in Python?
Thanks,
George
https://www.raileurope.com/en/us/point_to_point/ptp_results.htm?execution=e3s1&resultId=147840746&cobrand=public&saleCountry=us&resultId=147840746&cobrand=public&saleCountry=us&itemId=-1&fn=fsRequest&cobrand=public&c=USD&roundtrip=0&isAtocRequest=0&georequest=1&lang=en&route-type=0&from0=paris&to0=amsterdam&deptDate0=06%2F07%2F2017&time0=8&pass-question-radio=1&nCountries=&selCountry1=&selCountry2=&selCountry3=&selCountry4=&selCountry5=&familyId=&p=0&additionalTraveler0=adult&additionalTravelerAge0=&paxIds=&nA=1&nY=0&nC=0&nS=0
Take a look at Selenium.
Since the website is rendered by JS, the prices are not in the HTML that urllib receives, so you will have to use a webdriver to simulate the "click".
You will need a crawler instead of a simple scraper.

Log into secured website, automatically print page as pdf

I have been exploring ways to use Python to log into a secure website (e.g. Salesforce), navigate to a certain page, and print (save) the page as a PDF at a prescribed location.
I have tried using:
pdfkit.from_url: Use requests to get a session cookie, parse it, then pass it as a cookie in wkhtmltopdf's options. This does not work because pdfkit does not recognise the cookie I passed.
pdfkit.from_file: Use requests.get to get the HTML of the page I want to print, then use pdfkit to convert the HTML file to PDF. This works, but the page formatting and images are all missing.
Selenium: Use a webdriver to log in, navigate to the wanted page, then call the window.print function. This does not work because I can't pass any arguments to the window's Save As dialog.
Does anyone have an idea how to get around this?
log in using requests
use requests session mechanism to keep track of the cookie
use session to retrieve the HTML page
parse the HTML (use beautifulsoup)
identify img tags and css links
download locally the images and css documents
rewrite the img src attributes to point to the locally downloaded images
rewrite the css links to point to the locally downloaded css
serialize the new HTML tree to a local .html file
use any "HTML to PDF" solution to render the local .html file
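The middle steps above (parse, find the assets, download them, rewrite the references, serialize) can be sketched as follows. To stay dependency-free, this sketch uses the standard library's html.parser in place of BeautifulSoup and takes the download function as a parameter. AssetLocalizer, localize, fetch, and the assetN file naming are hypothetical names chosen for illustration, not part of any library.

```python
import os
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AssetLocalizer(HTMLParser):
    """Re-emit HTML with <img src> and stylesheet <link href> made local."""

    def __init__(self, base_url, fetch, out_dir):
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.fetch = fetch  # hypothetical hook: absolute URL -> bytes
        self.out_dir = out_dir
        self.parts = []
        self.count = 0

    def _save(self, url):
        # Download one asset and return the local file name it was saved as.
        absolute = urljoin(self.base_url, url)
        ext = os.path.splitext(urlparse(absolute).path)[1]
        name = f"asset{self.count}{ext}"
        self.count += 1
        with open(os.path.join(self.out_dir, name), "wb") as fh:
            fh.write(self.fetch(absolute))
        return name

    def _emit(self, tag, attrs, close=""):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower()
        if tag == "img" and attrs.get("src"):
            attrs["src"] = self._save(attrs["src"])
        elif tag == "link" and "stylesheet" in rel and attrs.get("href"):
            attrs["href"] = self._save(attrs["href"])
        text = "".join(
            f" {key}" if value is None else f' {key}="{escape(value)}"'
            for key, value in attrs.items()
        )
        self.parts.append(f"<{tag}{text}{close}>")

    def handle_starttag(self, tag, attrs):
        self._emit(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit(tag, attrs, " /")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def localize(html, base_url, fetch, out_dir):
    """Write a copy of html to out_dir/page.html that uses local assets."""
    os.makedirs(out_dir, exist_ok=True)
    parser = AssetLocalizer(base_url, fetch, out_dir)
    parser.feed(html)
    parser.close()
    path = os.path.join(out_dir, "page.html")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("".join(parser.parts))
    return path
```

With a logged-in requests.Session named session, fetch could be `lambda url: session.get(url).content`, so the asset downloads reuse the login cookie. The returned page.html path can then be passed to pdfkit.from_file or another converter. Images referenced from inside the CSS files are not handled by this sketch.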

Web Scraping Javascript Using Python

I am used to using BeautifulSoup to scrape a website; however, this website is different. When I call soup.prettify(), I get back a lot of JavaScript code. I want to scrape this website for the data shown on the actual page (company name, telephone number, etc.). Is there a way of scraping scripts such as Main.js to retrieve the data that the website displays to me?
Clear version:
Code is:
<script src="/docs/Main.js" type="text/javascript" language="javascript"></script>
This script holds the text that is on the website. I would like to scrape this text; however, it is populated using JS, not HTML (which is what I have used BeautifulSoup for).
You're asking if you can scrape text generated at runtime by JavaScript. The answer is: sort of.
You'd need to run some kind of headless browser, like PhantomJS, in order to let the JavaScript execute and populate the page. You'd then need to feed the HTML that the headless browser generates to BeautifulSoup in order to parse it.
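A minimal sketch of that two-stage idea, under some assumptions: PhantomJS development has been suspended, so this swaps in Selenium driving headless Chrome (both must be installed), and the standard library's html.parser stands in for BeautifulSoup so the parsing half runs without extra packages. The names rendered_html, VisibleText, and visible_text are made up for illustration.

```python
from html.parser import HTMLParser


def rendered_html(url):
    """Load url in headless Chrome and return the DOM after scripts have run."""
    # Imported here so the parsing half below works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # page_source is the current DOM, including what the scripts inserted.
        return driver.page_source
    finally:
        driver.quit()


class VisibleText(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the non-empty text fragments of an HTML document, in order."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Usage would be along the lines of `visible_text(rendered_html(url))`. If the page fills in its data some time after loading, an explicit wait before reading page_source may be needed.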
