How to modify the web-browsing history through a Python script?

I want to programmatically modify Chrome's browsing history from a Python script.
I already know that many browsers use an SQLite database for the browsing history. I asked Google, and all the questions and answers I found were about importing/exporting the browsing-history data.
What I want instead is to modify the data in the database, to delete specific sites I've visited, or all of them.
I would like to ask whether there are any Python modules that help with doing this task in code.
If not, the fallback is to make the code take control of the mouse and the screen: open Chrome, go to the browsing history, select the rows to be deleted, and press delete/confirm. For a beginner like me, it would be nearly impossible to gather the determination and resources to do that.

This may be helpful: How can I delete all web history that matches a specific query in Google Chrome.
It uses JavaScript, but you can easily translate it into some simple Python, using something like BeautifulSoup; or just scroll down, and there are some SQL-based answers that look promising.
And because you'll already have the code in one language, translating it should be pretty simple, even for a beginner, and especially into Python. PS: this took me about as long to find as your question took to read. It's rare that you'll want to do something that someone hasn't already done... just maybe not in Python. ;)
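The SQL route boils down to opening Chrome's History file (it's a plain SQLite database) with the standard-library sqlite3 module and deleting rows. A rough sketch; the path below is the usual Windows location (adjust for macOS/Linux), the urls/visits table names match Chrome's current schema but are worth verifying against your own History file, and the search pattern is a hypothetical example. Chrome must be closed while you run this, or the database will be locked.

    import os
    import sqlite3

    # Usual Windows location of the history database; adjust for your OS.
    history_db = os.path.expandvars(
        r"%LOCALAPPDATA%\Google\Chrome\User Data\Default\History")

    conn = sqlite3.connect(history_db)
    cur = conn.cursor()

    # Delete every entry matching a pattern (hypothetical example pattern).
    # visits.url is a foreign key to urls.id, so clear visits first.
    pattern = "%example.com%"
    cur.execute(
        "DELETE FROM visits WHERE url IN (SELECT id FROM urls WHERE url LIKE ?)",
        (pattern,),
    )
    cur.execute("DELETE FROM urls WHERE url LIKE ?", (pattern,))

    conn.commit()
    conn.close()

To delete everything rather than a specific site, the same two DELETE statements without the WHERE clauses would do it.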

Related

Authenticating a Minecraft Microsoft account with Python

What even am I doing
So, as Minecraft Java has been slowly switching over to using Microsoft-based accounts instead of solely Mojang accounts, I have been trying to put together an authentication method for a small launcher project I've been working on.
The First Issue.
I've been following a piece of documentation here, which had instructions on what GET and POST requests to send to which URLs, and how to parse them, etc. It's worked pretty well, except for The First Issue.
It was a dark and stormy night, and the Microsoft authentication URL used JavaScript for redirects, so the Requests library I was using in Python could not follow them. There might be a way to parse the HTML content and find the redirects or something, but that is way above my head, because I am still new even to Python.
So I looked around for a solution that would let me follow the JavaScript redirects, and the best solution (in concept) looked to be using a headless browser. This led me down a long path until I came face to face with The Second Issue.
The Second Issue.
I looked around for a headless browser that I could use, and I found a couple:
Selenium, or
PyQt WebEngine or WebKit
(I know there are lots of others but I chose these and used them for examples)
From here, the issue isn't so much an issue to fix, but the issue of I don't know what I'm doing.
I looked into Selenium, and it looked promising, but the fact that I had to download a WebDriver confused me in terms of how I would package that, since this is going to be a distributed application.
I then looked into PyQt WebEngine, and it just confused me in all respects, so basically I just need some info on how to use it. I also don't want to have to use PyQt to launch a window, design my UI, or anything else; I am already planning to use Kivy for the GUI. I just need a headless browser, or some other solution for following JavaScript redirects when sending a POST request to a certain URL.
So,
From here I just want to ask advice on which route I should take, since there seems to be a broad range of options I could use. I've already mentioned what I need, so any advice on what I should use, in terms of headless browsers, libraries, etc., would be appreciated.
Also if anyone has any other suggestions for how to authenticate a Microsoft account, please let me know.
I'm almost done
If there is anything I could answer or clarify, just let me know. I will highly appreciate all advice or suggestions.
Thanks,
Pyrotex7
Well, to resolve this: I just went with PyQt in the end, after messing around for a while.
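For anyone who lands here with the same problem, a minimal sketch of that idea (assuming PyQt5 with the PyQtWebEngine package installed; the URL and the "code=" check are hypothetical placeholders for the real Microsoft authentication flow): a QWebEnginePage loads the page and runs its JavaScript without ever opening a window, so you can watch where the redirects end up.

    import sys
    from PyQt5.QtCore import QUrl
    from PyQt5.QtWidgets import QApplication
    from PyQt5.QtWebEngineWidgets import QWebEnginePage

    app = QApplication(sys.argv)
    page = QWebEnginePage()  # no window is ever shown

    def on_url_changed(url):
        # JavaScript-driven redirects fire urlChanged just like HTTP ones,
        # so you can watch the hops and stop at the one you care about.
        print("Now at:", url.toString())
        if "code=" in url.toString():  # hypothetical success marker
            app.quit()

    page.urlChanged.connect(on_url_changed)
    page.load(QUrl("https://login.example.com/authorize"))  # placeholder URL
    app.exec_()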

Why am I not seeing the "full" HTML case?

What is the best method to scrape a dynamic website where most of the content is generated by what appear to be AJAX requests? I have previous experience with a Mechanize, BeautifulSoup, and Python combo, but I am up for something new.
--Edit--
For more detail: I'm trying to scrape the CNN primary database. There is a wealth of information there, but there doesn't appear to be an API.
The best solution that I found was to use Firebug to monitor XMLHttpRequests, and then to use a script to resend them.
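To make that concrete, here is a hedged sketch of replaying one captured request; the URL and the JSON structure are hypothetical stand-ins for whatever Firebug actually shows you (these endpoints usually return JSON, which is easy to parse):

    import json
    try:
        from urllib.request import urlopen  # Python 3
    except ImportError:
        from urllib2 import urlopen  # Python 2

    # Hypothetical placeholder for an endpoint captured in Firebug.
    url = "https://example.com/primaries/results.json?state=IA"
    payload = json.loads(urlopen(url).read())
    for row in payload["results"]:  # structure depends on the real response
        print(row)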
This is a difficult problem because you either have to reverse engineer the JavaScript on a per-site basis, or implement a JavaScript engine and run the scripts (which has its own difficulties and pitfalls).
It's a heavyweight solution, but I've seen people doing this with Greasemonkey scripts: let Firefox render everything and run the JavaScript, and then scrape the elements. You can even initiate user actions on the page if needed.
Selenium IDE, a tool for testing, is something I've used for a lot of screen-scraping. There are a few things it doesn't handle well (JavaScript window.alert() and popup windows in general), but it does its work on a page by actually triggering the click events and typing into the text boxes. Because the IDE portion runs in Firefox, you don't have to do all of the management of sessions, etc., as Firefox takes care of it. The IDE records and plays back tests.
It also exports C#, PHP, Java, etc. code to build compiled tests/scrapers that are executed on the Selenium server. I've done that for more than a few of my Selenium scripts, which makes things like storing the scraped data in a database much easier.
Scripts are fairly simple to write and alter, being made up of things like ("clickAndWait", "submitButton"). Worth a look given what you're describing; a Python equivalent of the same pattern is sketched below.
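The same click-and-type pattern can be driven straight from Python with the Selenium bindings. A rough sketch, in which the URL and element names are hypothetical placeholders:

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    driver.get("http://example.com/search")  # placeholder URL

    # Type into a text box and trigger the click, just like the IDE would.
    driver.find_element(By.NAME, "q").send_keys("primary results")
    driver.find_element(By.ID, "submitButton").click()

    # The browser has run the JavaScript, so the rendered HTML is available.
    print(driver.page_source)
    driver.quit()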
Adam Davis's advice is solid.
I would additionally suggest that you try to "reverse-engineer" what the JavaScript is doing, and instead of trying to scrape the page, you issue the HTTP requests that the JavaScript is issuing and interpret the results yourself (most likely in JSON format, nice and easy to parse). This strategy could be anything from trivial to a total nightmare, depending on the complexity of the JavaScript.
The best possibility, of course, would be to convince the website's maintainers to implement a developer-friendly API. All the cool kids are doing it these days 8-) Of course, they might not want their data scraped in an automated fashion... in which case you can expect a cat-and-mouse game of making their page increasingly difficult to scrape :-(
There is a bit of a learning curve, but tools like Pamie (Python) or Watir (Ruby) will let you latch onto the IE web browser and get at the elements. This turns out to be easier than Mechanize and other HTTP-level tools, since you don't have to emulate the browser; you just ask the browser for the HTML elements. And it's going to be way easier than reverse-engineering the JavaScript/AJAX calls. If needed, you can also use tools like Beautiful Soup in conjunction with Pamie.
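To give a rough idea of what that looks like, here is a hedged sketch of the COM automation that Pamie builds on (Windows only, requires the pywin32 package; the URL is a placeholder):

    import time
    import win32com.client  # pywin32; Windows only

    ie = win32com.client.Dispatch("InternetExplorer.Application")
    ie.Visible = False
    ie.Navigate("http://example.com")  # placeholder URL

    # Wait until the page, including its scripts, has finished loading.
    while ie.Busy or ie.Document.readyState != "complete":
        time.sleep(0.5)

    # Ask the browser itself for the rendered elements.
    for link in ie.Document.getElementsByTagName("a"):
        print(link.href)
    ie.Quit()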
Probably the easiest way is to use the IE WebBrowser control in C# (or any other language). You have access to all the stuff inside the browser out of the box, and you don't need to care about cookies, SSL, and so on.
I found the IE WebBrowser control has all kinds of quirks and workarounds that would justify some high-quality software taking care of all those inconsistencies, layered around the shdocvw.dll API and MSHTML, to provide a framework.
This seems like a pretty common problem. I wonder why no one has developed a programmatic browser? I'm envisioning a Firefox you can call from the command line with a URL as an argument: it would load the page, run all of the initial page-load JS events, and save the resulting file.
I mean, Firefox and other browsers already do all this; why can't we simply strip off the UI stuff?

Scraping data from a website with a search box

First of all I want to apologize if my question is too broad or generic, but it would really save me a lot of needlessly wasted time to get an answer to guide me in the right direction for the work I want to do. With that out of the way, here it goes.
I am trying to retrieve some publicly available data from a website, to create a dataset to work with for a Data Science project. My big issue is that the website does not have a friendly way to download it, and, from what I gathered, it also has no API. So, getting the data requires skills that I do not possess. I would love to learn how to scrape the website (the languages I am most comfortable with are Python and R), and it would add some value to my project if I did it, but I also am somewhat pressured by time constraints, and don't know if it is even possible to scrape the website, much less to learn how to do it in a few days.
The website is this one: https://www.rnec.pt/pt_PT/pesquisa-de-estudos-clinicos. It has a search box, and the only option I configure is to click the banner that says "Pesquisa Avançada" and then mark the box that says "Menor de 18 anos". I then click the "Pesquisar" button in the lower right, and the results that show up are the ones that I want to extract (either that or, if it's simpler, all the results, without checking the "Menor de 18 anos" box). On my computer, 2 results show up per page, and there are 38 pages total. Each result has some of its details in the page where the results appear but, to get the full data from each entry, one has to click "Detalhes" in the lower right of each result, which opens a display with all the data from that result. If possible, I would love to download all the data from that "Detalhes" page of each result (the data there already contains the fields that show up in the search result page).
Honestly, I am ready to spend a whole day manually transcribing all the data, but it would be much better to do it computationally, even if it takes me two or three days to learn and do it.
I think that, for someone with experience in web scraping, it is probably super simple to check if it is possible to download the data I described, and what is the best way to go about it (in general terms, I would research and learn it). But I really am lost when it comes to this, and just kindly want to ask for some help in showing me the way go about it (even if the answer is "it is too complicated/impossible, just do it manually"). I know that there are some Python packages for web scraping, like BeautifulSoup and Selenium, but I don't really know if either of them would be appropriate.
I am sorry if my request is not exactly a short and simple coding question, but I have to try to gather any help or guidance I can get. Thank you in advance to everyone who reads my question and a special thank you if you are able to give me some pointers.
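Not an answer to the whole project, but to show the shape of the Selenium route the question mentions, here is a heavily hedged sketch in which every selector is a guess to be replaced after inspecting the page with your browser's developer tools:

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    driver.get("https://www.rnec.pt/pt_PT/pesquisa-de-estudos-clinicos")

    # Open the advanced search, tick the box, and run the search.
    # All three selectors below are hypothetical placeholders.
    driver.find_element(By.LINK_TEXT, "Pesquisa Avançada").click()
    driver.find_element(By.XPATH, "//label[contains(., 'Menor de 18 anos')]").click()
    driver.find_element(By.XPATH, "//button[contains(., 'Pesquisar')]").click()

    for page in range(38):  # 38 result pages, 2 results per page
        for details_link in driver.find_elements(By.LINK_TEXT, "Detalhes"):
            details_link.click()
            # ...read the fields out of the details view here, then go back...
        # Advance to the next results page (selector is another guess).
        driver.find_element(By.LINK_TEXT, "Seguinte").click()

    driver.quit()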

How to: Python script that will 'click' on a portion of my screen, and then do key commands?

Python noobie.
I'm trying to make Python select a portion of my screen. In this case, it is a small window within a Firefox window -- it's Firebug source code. And then, once it has selected the right area, press control-A to select all and then control-C to copy. If I could figure this out, I would just do the same thing and paste all of the copies into a .txt file.
I don't really know where to begin -- are there libraries for this kind of thing? Is it even possible?
I would look into PyQt or PySide, which are Python wrappers on top of Qt.
Qt is a big monster, but it's very well documented, and I'm sure it will help you further in your project once you've grabbed your screen section.
As you've mentioned in the comments, the data is all in the HTML to start (I'm guessing it's greyed out in your Firebug screenshot since it's a hidden element). This approach avoids the complexity of trying to automate a browser. Here's a rough outline of how I would get the data:
Download the HTML for the whole page - I'd do this manually at first (i.e. File > Save from a browser), and if there are a bunch of pages you want to process, figure out how to download all the pages you want later. If you want to use Python for this part, I'd recommend urllib2. The URLs for each page are probably pretty structured, so you could easily store them in a list, then download each one and save it locally.
Write a script to parse the HTML - don't use regex. Since you're using Python, use something like Beautiful Soup, which will create a nice object representation of the page, and then you can get the elements you want. A short sketch of both steps follows this list.
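A minimal sketch of both steps together; the URLs and the "pre" tag are hypothetical placeholders, and the page you're scraping will dictate the real ones:

    try:
        from urllib.request import urlopen  # Python 3
    except ImportError:
        from urllib2 import urlopen  # Python 2
    from bs4 import BeautifulSoup

    # Hypothetical structured URLs; substitute the real pages.
    urls = ["http://example.com/source/page%d.html" % n for n in range(1, 4)]
    for url in urls:
        html = urlopen(url).read()
        soup = BeautifulSoup(html, "html.parser")
        # Pull out whichever elements hold the source you're after.
        for block in soup.find_all("pre"):
            print(block.get_text())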
You mention you're new to python, so there's definitely going to be a learning curve around this, but this actually sounds like a pretty doable project to use to learn some more python.
If you run into specific obstacles with each step, start a new question with a bit of sample code, showing what you're trying to accomplish, and people will be more than willing to help out.

How close is Python to being able to wrap it in a workbook type skin?

With my luck this question will be closed too quickly. I see a tremendous possibility for a Python application that is basically like a workbook. Imagine, if you will, that instead of writing code you select from a menu of choices. For example, the File menu would have an Open command that lets the user navigate to a file, a directory of files, or a webpage, even a list of web pages, and specify those as the base for the next actions.
Then you have a find menu. The menu would allow easy access to the various parsing tools, regular expression and string tools so you can specify the thing you want to find within the files.
Another menu item could allow you to create queries to interact with database objects.
I could go on and on. As the language becomes higher level, these types of features become easier to implement. There is a tremendous advantage to developing something like this. How much time is spent reinventing the wheel for mundane tasks? Programmers have functions that they have built to do many mundane tasks, but what about democratizing the power offered by a tool like Python?
I have people in my office all of the time asking how to solve problems that seem intractable to them, and when I show them how their problem is solvable with a few lines of code (except for the edge cases), they are amazed. I deflect their gratitude with the observation that it is not really that hard, except for being able to construct the right Google search to identify the right package or library to solve the problem. There is nothing amazing about my ability to use lxml and sets to pull all bolded sections from a collection of, say, 12,000 documents and compare, across time and across unique identifiers in the collection, how those bolded sections have evolved/changed or converged. The amazing piece is that someone wrote the libraries to do these things.
What is the advantage to the community of something like this? Imagine, if you would, an interface that looks like a workbook but interacts with an app store. So if you want to pull something from an HTML file, you go to the app store and buy a plug-in that handles the work. If the workbook is built robustly enough, it could be licensed to a machine, and the 'apps' would be tied to a particular workbook.
Just imagine the creativity that could be unleashed if users could get over the feeling that access to this power is difficult. You may not see this, but I see Python as being very close to being able to port to something like a workbook framework. Weren't the early spreadsheet programs little more than a frame around some Fortran libraries that had been ported to C?
Comments? Or is there already such an application that I have not found?
There are Python applications that are based on generating code -- probably the most amazing one being Resolver One, which focuses on spreadsheets (and hinges on IronPython). With that exception, however, interacting based on the UI paradigm you have in mind (pick one of these, one of those, etc.) tends to be pretty limited in the gamut of choices it offers to let the user generate the exact application they need -- there's just so much more you can say by writing even a little script than what you can say by point-and-grunt.
That being said, Python would surely be a great choice both to implement such an app and as the language to generate... if and when you have a UI sketch that looks like it can actually allow non-programmers to specify a large-enough spectrum of apps in a broad-enough domain!-). Spreadsheets have proven themselves in this sense, but I don't know of other niches or approaches that have actually done so -- do you?
Your idea kinda reminded me of something I stumbled across months ago: http://www.ailab.si/orange/
Is your concept very similar to Microsoft Access? Generally, programmers tend not to write such programs because they generate such horrible code that the authors themselves would never want to use it.
