Authenticating a Minecraft Microsoft account with Python - python

What even am I doing
So, as Minecraft Java has been slowly switching over to using Microsoft based accounts instead of solely Mojang accounts, I have been trying to put together an authentication method for a small launcher project I've been working on.
The First Issue.
I've been following a piece of documentation here, which had instructions on what GET and POST requests to send to which URLs, and how to parse them, etc. It's worked pretty well, except for The First Issue.
It was a dark and stormy night, and the Microsoft Authentication URL used Javascript for redirects, so the Requests library I was using in Python could not follow the redirects. There might be a way to parse the HTML content and find the redirections or something, but that is way above my head, because I am still new to even Python.
So I looked around for a solution that would let me follow the JavaScript redirects, and the best solution (in concept) looked to be using a headless browser. This led me down a long path until I came face to face with The Second Issue.
The Second Issue.
I looked around for a headless browser that I could use, and I found a couple:
Selenium, or
PyQT WebEngine or WebKit
(I know there are lots of others but I chose these and used them for examples)
From here, the issue isn't so much an issue to fix, but the issue of I don't know what I'm doing.
I looked into Selenium, and it looked promising, but the fact that I had to download a WebDriver confused me in terms of how I would package that, since this is going to be used for a distributed application.
I then looked into PyQT WebEngine, and it just confused me in all respects, so basically I just need some info on maybe how to use it. I also don't need to have to use PyQT to launch a window, or design my UI, or anything else. I already am planning to use Kivy for the GUI. I just need a headless browser or some other solution to follow Javascript redirects when sending a POST request to a certain URL.
So,
From here I just want to ask advice on which route I should take, since there seems to be a broad amount of options I could use. I've already mentioned what I need, so any advice on how or what I should use, in terms of headless browsers, libraries, etc.
Also if anyone has any other suggestions for how to authenticate a Microsoft account, please let me know.
I'm almost done
If there is anything I could answer or clarify, just let me know. I will highly appreciate all advice or suggestions.
Thanks,
Pyrotex7

Well to resolve this - I just went with PyQt in the end after messing around for a while.

Related

Why am I not seeing the "full" html case? [duplicate]

This question already has answers here:
Web-scraping JavaScript page with Python
(18 answers)
Closed 4 hours ago.
What is the best method to scrape a dynamic website where most of the content is generated by what appears to be ajax requests? I have previous experience with a Mechanize, BeautifulSoup, and python combo, but I am up for something new.
--Edit--
For more detail: I'm trying to scrape the CNN primary database. There is a wealth of information there, but there doesn't appear to be an api.
The best solution that I found was to use Firebug to monitor XmlHttpRequests, and then to use a script to resend them.
This is a difficult problem because you either have to reverse engineer the JavaScript on a per-site basis, or implement a JavaScript engine and run the scripts (which has its own difficulties and pitfalls).
It's a heavy weight solution, but I've seen people doing this with GreaseMonkey scripts - allow Firefox to render everything and run the JavaScript, and then scrape the elements. You can even initiate user actions on the page if needed.
Selenium IDE, a tool for testing, is something I've used for a lot of screen-scraping. There are a few things it doesn't handle well (Javascript window.alert() and popup windows in general), but it does its work on a page by actually triggering the click events and typing into the text boxes. Because the IDE portion runs in Firefox, you don't have to do all of the management of sessions, etc. as Firefox takes care of it. The IDE records and plays tests back.
It also exports C#, PHP, Java, etc. code to build compiled tests/scrapers that are executed on the Selenium server. I've done that for more than a few of my Selenium scripts, which makes things like storing the scraped data in a database much easier.
Scripts are fairly simple to write and alter, being made up of things like ("clickAndWait","submitButton"). Worth a look given what you're describing.
Adam Davis's advice is solid.
I would additionally suggest that you try to "reverse-engineer" what the JavaScript is doing, and instead of trying to scrape the page, you issue the HTTP requests that the JavaScript is issuing and interpret the results yourself (most likely in JSON format, nice and easy to parse). This strategy could be anything from trivial to a total nightmare, depending on the complexity of the JavaScript.
The best possibility, of course, would be to convince the website's maintainers to implement a developer-friendly API. All the cool kids are doing it these days 8-) Of course, they might not want their data scraped in an automated fashion... in which case you can expect a cat-and-mouse game of making their page increasingly difficult to scrape :-(
There is a bit of a learning curve, but tools like Pamie (Python) or Watir (Ruby) will let you latch into the IE web browser and get at the elements. This turns out to be easier than Mechanize and other HTTP level tools since you don't have to emulate the browser, you just ask the browser for the html elements. And it's going to be way easier than reverse engineering the Javascript/Ajax calls. If needed you can also use tools like beatiful soup in conjunction with Pamie.
Probably the easiest way is to use IE webbrowser control in C# (or any other language). You have access to all the stuff inside browser out of the box + you dont need to care about cookies, SSL and so on.
i found the IE Webbrowser control have all kinds of quirks and workarounds that would justify some high quality software to take care of all those inconsistencies, layered around the shvwdoc.dll api and mshtml and provide a framework.
This seems like it's a pretty common problem. I wonder why someone hasn't anyone developed a programmatic browser? I'm envisioning a Firefox you can call from the command line with a URL as an argument and it will load the page, run all of the initial page load JS events and save the resulting file.
I mean Firefox, and other browsers already do this, why can't we simply strip off the UI stuff?

Is it possible to create a python program that controls my web pages?

I have been wondering about this since I could really benefit from a program that makes actions on websites that I use for my job that require the same command over and over again.
I know some python and I love to learn new things.
I tried looking for it on google but I guess I'm not sure how to find it.
I would love it if you could direct me to a guide or something like that.
Thank you very much!
Selenium interacts with a web browser directly, although you can hide the browser window in the code (look up Selenium in --headless mode). This is a good choice for filling out a lot of forms or interacting with graphical user interface elements.
However, if you need to request information from websites, you don't always need to interact with the web browser directly. You can use the package called Requests. This doesn't depend on any web browsers and can run silently in the background.
I think you can do it with Python and some packages like selenium. Also you need some html knowledge to search in the html source code of the specific wegpage.
I found an interesting use case, maybe that helps you:
https://towardsdatascience.com/controlling-the-web-with-python-6fceb22c5f08

how to embed a web browser on a webpage

I am looking to embed(stream) a web browser running from my vps on a webpage. I'm just wondering what the best way to do that would be. preferably using a python framework, But I'm just not sure where to start. I'm open to any suggestions, thanks!
What you're looking for is likely a python proxy server. Fortunately other people have also been interested in such a thing. Take a look at this stack overflow article, as it has a number of suggestions on simple python based proxies you may want to try.

Dynamically decide which browser to use

I'm trying to find a way to dynamically decide which web browser will open the link I clicked.
There are a few sites that I visit that work best on Iexplore and others that I prefer to open with chrome. If I set my default browser to one of these, than I'll constantly find myself opening a site with one browser, than copying the url and opening it in a new one. This happens a lot when people send me links.
I've thought of making a python script as the default browser and making a function that decides which browser should open the page. I've tried setting the script as my default browser by changing some registry keys. It seemed to work but when I try to open a site (for example writing "http://stackoverflow.com" in the run window), the url doesn't show in sys.argv.
Is there another way of finding the arguments sent to the program?
The registry keys I changed are:
HKEY_CURRENT_USER\Software\Classes\http\shell\open\command
HKEY_CURRENT_USER\Software\Classes\https\shell\open\command
HKEY_LOCAL_MACHINE\SOFTWARE\Classes\http\shell\open\command
HKEY_LOCAL_MACHINE\SOFTWARE\Classes\https\shell\open\command
It seemed to work on windows XP but it doesn't work on 7 (the default browser is still the same...)
Have you considered using browsers extension that emulate IE rendering instead of a homegrown solution? I believe there is one called 'ie tab' for chrome/firefox. http://www.ietab.net/
You can try build something on top of existing software which automates browser-webpage interaction, have a look at Selenium, maybe you can tweak it somehow to suit your needs.
But beware, the problem you are trying to solve is fairly complex and complicated, for instance consider just this: how are you going to translate your own subjective experience of a website into code? There are some objective indices, some pages simply break, but many things, such as bad css styling are difficult to asses and quantify.
EDIT: here's a web testing framework in which you can generate your own tests in Python It's probably easier to start with then Selenium.

MediaWiki / Python authentication integration advice needed

I'm trying to integrate my MediaWiki site with some custom Python web applications. I have complete control over the MediaWiki server and am free to change the authentication plugin if needed. For the time being, I would like all users to login via a screen on the MediaWiki page (or at least they should believe they are, the whole process should be transparent to them).
In general, I would prefer not to completely write my own authentication code, but I don't mind doing some minor adapting.
I'm looking for some advice from people who have done something like this before, my questions are:
I know absolutely nothing about LDAP, but it seems rather commonly supported with various plugins for MediaWiki and Python. Is it best to have a central LDAP server, and then force all applications to authenticate here?
As compared to the above, what are the downsides of just reading from the wiki database, and comparing to see if the shared-secret from the user's cookie match, and then assuming they are logged in?
Is it advisable to use openID for a situation like this? What are some of the downsides?
This might seem obvious but have you seen the LDAP Authentication extension? We used it (with some modifications) and it works well.
You can also use in combination with e.g. Lockdown.
So my (limited) answers to your questions are:
Yes (I can't think why you would not want it in one place).
One downside is if users move groups / authentication. They need manually to delete their cookies, which can cause headaches for people supporting the wiki.
Sorry, don't know that one.
Hope this limited answer helps.

Categories