Dynamically decide which browser to use - python

I'm trying to find a way to dynamically decide which web browser will open the link I clicked.
There are a few sites that I visit that work best on Iexplore and others that I prefer to open with chrome. If I set my default browser to one of these, than I'll constantly find myself opening a site with one browser, than copying the url and opening it in a new one. This happens a lot when people send me links.
I've thought of making a python script as the default browser and making a function that decides which browser should open the page. I've tried setting the script as my default browser by changing some registry keys. It seemed to work but when I try to open a site (for example writing "http://stackoverflow.com" in the run window), the url doesn't show in sys.argv.
Is there another way of finding the arguments sent to the program?
The registry keys I changed are:
HKEY_CURRENT_USER\Software\Classes\http\shell\open\command
HKEY_CURRENT_USER\Software\Classes\https\shell\open\command
HKEY_LOCAL_MACHINE\SOFTWARE\Classes\http\shell\open\command
HKEY_LOCAL_MACHINE\SOFTWARE\Classes\https\shell\open\command
It seemed to work on windows XP but it doesn't work on 7 (the default browser is still the same...)

Have you considered using browsers extension that emulate IE rendering instead of a homegrown solution? I believe there is one called 'ie tab' for chrome/firefox. http://www.ietab.net/

You can try build something on top of existing software which automates browser-webpage interaction, have a look at Selenium, maybe you can tweak it somehow to suit your needs.
But beware, the problem you are trying to solve is fairly complex and complicated, for instance consider just this: how are you going to translate your own subjective experience of a website into code? There are some objective indices, some pages simply break, but many things, such as bad css styling are difficult to asses and quantify.
EDIT: here's a web testing framework in which you can generate your own tests in Python It's probably easier to start with then Selenium.

Related

Authenticating a Minecraft Microsoft account with Python

What even am I doing
So, as Minecraft Java has been slowly switching over to using Microsoft based accounts instead of solely Mojang accounts, I have been trying to put together an authentication method for a small launcher project I've been working on.
The First Issue.
I've been following a piece of documentation here, which had instructions on what GET and POST requests to send to which URLs, and how to parse them, etc. It's worked pretty well, except for The First Issue.
It was a dark and stormy night, and the Microsoft Authentication URL used Javascript for redirects, so the Requests library I was using in Python could not follow the redirects. There might be a way to parse the HTML content and find the redirections or something, but that is way above my head, because I am still new to even Python.
So I looked around for a solution that would let me follow the JavaScript redirects, and the best solution (in concept) looked to be using a headless browser. This led me down a long path until I came face to face with The Second Issue.
The Second Issue.
I looked around for a headless browser that I could use, and I found a couple:
Selenium, or
PyQT WebEngine or WebKit
(I know there are lots of others but I chose these and used them for examples)
From here, the issue isn't so much an issue to fix, but the issue of I don't know what I'm doing.
I looked into Selenium, and it looked promising, but the fact that I had to download a WebDriver confused me in terms of how I would package that, since this is going to be used for a distributed application.
I then looked into PyQT WebEngine, and it just confused me in all respects, so basically I just need some info on maybe how to use it. I also don't need to have to use PyQT to launch a window, or design my UI, or anything else. I already am planning to use Kivy for the GUI. I just need a headless browser or some other solution to follow Javascript redirects when sending a POST request to a certain URL.
So,
From here I just want to ask advice on which route I should take, since there seems to be a broad amount of options I could use. I've already mentioned what I need, so any advice on how or what I should use, in terms of headless browsers, libraries, etc.
Also if anyone has any other suggestions for how to authenticate a Microsoft account, please let me know.
I'm almost done
If there is anything I could answer or clarify, just let me know. I will highly appreciate all advice or suggestions.
Thanks,
Pyrotex7
Well to resolve this - I just went with PyQt in the end after messing around for a while.

Why am I not seeing the "full" html case? [duplicate]

This question already has answers here:
Web-scraping JavaScript page with Python
(18 answers)
Closed 4 hours ago.
What is the best method to scrape a dynamic website where most of the content is generated by what appears to be ajax requests? I have previous experience with a Mechanize, BeautifulSoup, and python combo, but I am up for something new.
--Edit--
For more detail: I'm trying to scrape the CNN primary database. There is a wealth of information there, but there doesn't appear to be an api.
The best solution that I found was to use Firebug to monitor XmlHttpRequests, and then to use a script to resend them.
This is a difficult problem because you either have to reverse engineer the JavaScript on a per-site basis, or implement a JavaScript engine and run the scripts (which has its own difficulties and pitfalls).
It's a heavy weight solution, but I've seen people doing this with GreaseMonkey scripts - allow Firefox to render everything and run the JavaScript, and then scrape the elements. You can even initiate user actions on the page if needed.
Selenium IDE, a tool for testing, is something I've used for a lot of screen-scraping. There are a few things it doesn't handle well (Javascript window.alert() and popup windows in general), but it does its work on a page by actually triggering the click events and typing into the text boxes. Because the IDE portion runs in Firefox, you don't have to do all of the management of sessions, etc. as Firefox takes care of it. The IDE records and plays tests back.
It also exports C#, PHP, Java, etc. code to build compiled tests/scrapers that are executed on the Selenium server. I've done that for more than a few of my Selenium scripts, which makes things like storing the scraped data in a database much easier.
Scripts are fairly simple to write and alter, being made up of things like ("clickAndWait","submitButton"). Worth a look given what you're describing.
Adam Davis's advice is solid.
I would additionally suggest that you try to "reverse-engineer" what the JavaScript is doing, and instead of trying to scrape the page, you issue the HTTP requests that the JavaScript is issuing and interpret the results yourself (most likely in JSON format, nice and easy to parse). This strategy could be anything from trivial to a total nightmare, depending on the complexity of the JavaScript.
The best possibility, of course, would be to convince the website's maintainers to implement a developer-friendly API. All the cool kids are doing it these days 8-) Of course, they might not want their data scraped in an automated fashion... in which case you can expect a cat-and-mouse game of making their page increasingly difficult to scrape :-(
There is a bit of a learning curve, but tools like Pamie (Python) or Watir (Ruby) will let you latch into the IE web browser and get at the elements. This turns out to be easier than Mechanize and other HTTP level tools since you don't have to emulate the browser, you just ask the browser for the html elements. And it's going to be way easier than reverse engineering the Javascript/Ajax calls. If needed you can also use tools like beatiful soup in conjunction with Pamie.
Probably the easiest way is to use IE webbrowser control in C# (or any other language). You have access to all the stuff inside browser out of the box + you dont need to care about cookies, SSL and so on.
i found the IE Webbrowser control have all kinds of quirks and workarounds that would justify some high quality software to take care of all those inconsistencies, layered around the shvwdoc.dll api and mshtml and provide a framework.
This seems like it's a pretty common problem. I wonder why someone hasn't anyone developed a programmatic browser? I'm envisioning a Firefox you can call from the command line with a URL as an argument and it will load the page, run all of the initial page load JS events and save the resulting file.
I mean Firefox, and other browsers already do this, why can't we simply strip off the UI stuff?

Is it possible to create a python program that controls my web pages?

I have been wondering about this since I could really benefit from a program that makes actions on websites that I use for my job that require the same command over and over again.
I know some python and I love to learn new things.
I tried looking for it on google but I guess I'm not sure how to find it.
I would love it if you could direct me to a guide or something like that.
Thank you very much!
Selenium interacts with a web browser directly, although you can hide the browser window in the code (look up Selenium in --headless mode). This is a good choice for filling out a lot of forms or interacting with graphical user interface elements.
However, if you need to request information from websites, you don't always need to interact with the web browser directly. You can use the package called Requests. This doesn't depend on any web browsers and can run silently in the background.
I think you can do it with Python and some packages like selenium. Also you need some html knowledge to search in the html source code of the specific wegpage.
I found an interesting use case, maybe that helps you:
https://towardsdatascience.com/controlling-the-web-with-python-6fceb22c5f08

Automate webpage tasks without having to have a browser open?

I know about tampermonkey/greasemonkey and have used it a fair bit, but now my task is to write a program that runs in the background and automates mundane tasks (clicking buttons, typing into input fields etc.) on a specific webpage. Running a browser in the background takes too much RAM and processing power, so I'm looking for an alternative.
So far I've found selenium, but after a bit of research it looks like that it requires to have a browser open at all times as well (or maybe not? the documentation isn't that good). I thought about python scripts too, but I don't have any experience with those nor have I any idea if they can handle anything that's not basic html. If they can, does anyone know of a good tutorial for python scripts? I have used that language a few years ago, so I shouldn't really have a problem with python itself.
If python scripts aren't ideal either, is there a (preferably somewhat simple) way I could achieve what I want?
It depends on whether you want a script that interacts with a web UI, or a script that automates web requests. Do you really need to click buttons and type into input fields? Presumably, the data from those buttons and input fields is eventually sent to a web server. You could skip the entire UI and just make the requests directly. You don't need a browser for that and python is fine for doing these types of things (you don't even need selenium, you can just use requests)
On the other hand, if you're trying to test out the UI of a web page, or you actually need to interact with the web UI for some other reason, then yeah, you'll need an application (like a web browser) that's capable of rendering the UI so you can interact with it.

Opening pages asynchronously in headless browser (PhantomJS)

I am using PhantomJS via Python through Selenium+Ghostdriver.
I am looking to load several pages simultaneously and to do so, I am looking for an async method to load pages.
From my research, PhantomJS already lives in a separate thread and supports multiple tabs, so I believe the only missing piece of the puzzle is a method to load pages in a non-blocking way.
Any solution would be welcome, be it a simple Ghostdriver method I overlooked, bypassing Ghostdriver and interfacing directly with PhantomJS or a different headless browser.
Thanks for the help and suggestions.
Yuval
If you want to bypass ghostdriver, then you can directly write your PhantomJS scripts in JavaScript or CoffeeScript. As far as I know there is no way of doing this with the selenium webdriver except with different threads in the language of your choice (python).
If you are not happy with it, there is CasperJS which has more freedom in writing scripts than with selenium, but you will only be able to use PhantomJS or SlimerJS.
I'm not completely sure on how to do this via Selenium/Ghostdriver specifically, but if you (or future readers) are able to touch the phantom scripts directly, then the solution is as simple as:
page.open(newUrl, ...);
The "page.open()" method is async by default, and should serve your needs. - Quite some time has passed since you asked this question so not sure if you need the help anymore. But, again, for those who may read this later I do hope this helps!

Categories