OCR in opened applications?
I want to automate Firefox on some web page, and I don't have a way to "know" whether the page has already loaded completely or is still loading...
I was thinking about using OCR to check the status bar... is that difficult?
For example, when the word DONE appears in the status bar, the program continues to the next command...
OCR is a terrible, terrible choice for something like this. Use OCR when you are encountering images with unknown text. If you are trying to automate Firefox, there are far better ways of doing so. Check out something like AutoIt or any one of a hundred automation tools for Windows. Or write a custom Firefox extension. Either of those will be far easier to implement, more reliable, and more performant than OCR.
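To illustrate the non-OCR route, here is a minimal sketch of how a driver-based tool such as Selenium (which comes up later in this thread) can wait for a page to finish loading; the URL is a placeholder, and this assumes the selenium package plus a Firefox driver are installed:

```python
# Minimal sketch: let the browser report when loading is done,
# instead of OCR-ing the status bar for the word DONE.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Firefox()
driver.get("http://example.com/")  # placeholder URL

# Block until the document reports that it has finished loading.
WebDriverWait(driver, 30).until(
    lambda d: d.execute_script("return document.readyState") == "complete"
)
print("Page loaded, continuing with the next command...")
driver.quit()
```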
Maybe http://groups.csail.mit.edu/uid/sikuli/ is what you want
What is the best method to scrape a dynamic website where most of the content is generated by what appear to be AJAX requests? I have previous experience with a Mechanize, BeautifulSoup, and Python combo, but I am up for something new.
--Edit--
For more detail: I'm trying to scrape the CNN primary database. There is a wealth of information there, but there doesn't appear to be an API.
The best solution that I found was to use Firebug to monitor XMLHttpRequests, and then to use a script to resend them.
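To sketch what resending such a request can look like in Python, here is a rough example with the Requests library; the endpoint, parameters, and JSON shape are placeholders, not CNN's actual URLs:

```python
# Sketch: replay an XMLHttpRequest observed in Firebug and parse the
# JSON response directly, instead of scraping the rendered page.
# URL, parameters, and response keys below are illustrative placeholders.
import requests

resp = requests.get(
    "http://example.com/primary/results.json",
    params={"state": "IA", "party": "D"},
    headers={"X-Requested-With": "XMLHttpRequest"},  # mimic the browser's XHR
)
resp.raise_for_status()
data = resp.json()
for row in data.get("results", []):
    print(row)
```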
This is a difficult problem because you either have to reverse engineer the JavaScript on a per-site basis, or implement a JavaScript engine and run the scripts (which has its own difficulties and pitfalls).
It's a heavyweight solution, but I've seen people do this with GreaseMonkey scripts: let Firefox render everything and run the JavaScript, and then scrape the elements. You can even initiate user actions on the page if needed.
Selenium IDE, a tool for testing, is something I've used for a lot of screen-scraping. There are a few things it doesn't handle well (JavaScript window.alert() and popup windows in general), but it does its work on a page by actually triggering the click events and typing into the text boxes. Because the IDE portion runs in Firefox, you don't have to do any of the session management yourself; Firefox takes care of it. The IDE records tests and plays them back.
It also exports C#, PHP, Java, etc. code to build compiled tests/scrapers that are executed on the Selenium server. I've done that for more than a few of my Selenium scripts, which makes things like storing the scraped data in a database much easier.
Scripts are fairly simple to write and alter, being made up of things like ("clickAndWait","submitButton"). Worth a look given what you're describing.
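To make that concrete, here is a rough sketch of the same style of commands ("type", "clickAndWait") translated to a standalone Python script with the WebDriver API; the URL and element IDs are placeholders for whatever the IDE records:

```python
# Sketch: the "type" and "clickAndWait" commands as WebDriver calls.
# URL and element IDs are placeholders.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("http://example.com/search")

driver.find_element(By.ID, "query").send_keys("primary results")  # "type"
driver.find_element(By.ID, "submitButton").click()                # "clickAndWait"
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "results"))
)
print(driver.find_element(By.ID, "results").text)  # scraped content
driver.quit()
```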
Adam Davis's advice is solid.
I would additionally suggest that you try to "reverse-engineer" what the JavaScript is doing and, instead of trying to scrape the page, issue the HTTP requests that the JavaScript issues and interpret the results yourself (most likely in JSON format, nice and easy to parse). This strategy could be anything from trivial to a total nightmare, depending on the complexity of the JavaScript.
The best possibility, of course, would be to convince the website's maintainers to implement a developer-friendly API. All the cool kids are doing it these days 8-) Of course, they might not want their data scraped in an automated fashion... in which case you can expect a cat-and-mouse game of making their page increasingly difficult to scrape :-(
There is a bit of a learning curve, but tools like Pamie (Python) or Watir (Ruby) will let you hook into the IE web browser and get at the elements. This turns out to be easier than Mechanize and other HTTP-level tools, since you don't have to emulate the browser; you just ask the browser for the HTML elements. And it's going to be way easier than reverse-engineering the JavaScript/AJAX calls. If needed, you can also use tools like Beautiful Soup in conjunction with Pamie.
Probably the easiest way is to use the IE WebBrowser control in C# (or any other language). You have access to all the stuff inside the browser out of the box, plus you don't need to care about cookies, SSL, and so on.
I found the IE WebBrowser control has all kinds of quirks and workarounds that would justify some high-quality software to take care of all those inconsistencies, layered around the shdocvw.dll API and MSHTML, and provide a framework.
This seems like a pretty common problem. I wonder why no one has developed a programmatic browser. I'm envisioning a Firefox you can call from the command line with a URL as an argument: it would load the page, run all of the initial page-load JS events, and save the resulting file.
I mean, Firefox and other browsers already do this; why can't we simply strip off the UI stuff?
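For what it's worth, that vision is roughly what headless browsing gives you today. A hedged sketch, assuming Selenium with Firefox's headless mode (the URL is a placeholder):

```python
# Sketch of a "UI-less Firefox from the command line": load a URL,
# let the initial JavaScript run, and save the rendered DOM.
import sys
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument("-headless")  # no visible browser window
driver = webdriver.Firefox(options=options)

url = sys.argv[1] if len(sys.argv) > 1 else "http://example.com/"
driver.get(url)  # loads the page and runs its scripts

# Save the DOM as it exists after the initial page-load JS has run.
with open("rendered.html", "w", encoding="utf-8") as f:
    f.write(driver.page_source)
driver.quit()
```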
I am wondering if and how it is possible to 'listen to' the text that is in my browser window.
I am specifically NOT looking to scrape websites in the sense of crawling them for information; I am just interested in interacting with an arbitrary page that makes my browser output text.
Example
Suppose I am asking a question on Stack Overflow.
By the time I type a title like 'Listen to text in browser', the suggestions appear; one of them contains the plain text 'Listen to browser request'.
As soon as the word 'request' is on my screen, the browser gets shut down and I order a pizza.
What would a good solution look like
I want to be able to do this for practically any website that somehow makes my computer show simple text, ideally without knowing how the text is generated.
I want this to be somewhat fast; sub-second should be possible.
I do not want to hit the website or its APIs; I just want to use the information that is already on my screen.
I am not too picky about the OS and browser requirements.
I can also imagine there may be corner cases where it is hard (perhaps text is shown as a picture, or parts of a sentence are actually spread across multiple text boxes that are just displayed next to each other). For now I just wonder how this can be done for a simple page.
Bonus points if it could even capture text from the field in which I am typing myself, so I can scold myself when I am about to say something stupid.
What have I come up with so far
I am in general confident that I can process the text once I get it into a tool; the main challenge is how to listen to the browser.
I tried looking at the page source, but that does not appear to contain this dynamic text.
Perhaps there is a streaming API on the browser itself that can stream out changes?
Perhaps there is a way to grab all text from the browser, say 10x per second or so (see the sketch at the end of this question).
Normal scraping solutions are completely not what I want; I do not want to fire a request at the webserver 10x per second.
In the worst case, I suppose we could use screen-capture software followed by text recognition, but I really hope there is something more elegant.
I suppose there may be automation/testing software that can do this. That would be an answer, but something lightweight (e.g. a Python library) would be nicest.
I have tried searching but did not find any solution, or even the question itself. Presumably I am using the wrong words.
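To make the grab-all-text idea concrete, here is a rough sketch of the polling approach with Selenium; note this drives its own browser instance rather than attaching to an already-open window, and the URL and keyword are placeholders:

```python
# Sketch: poll all visible page text ~10x per second and react to a keyword.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("https://stackoverflow.com/questions/ask")  # placeholder page

try:
    while True:
        body_text = driver.find_element(By.TAG_NAME, "body").text
        if "request" in body_text:
            print("Keyword spotted: shutting down and ordering pizza...")
            break
        time.sleep(0.1)  # roughly 10 polls per second
finally:
    driver.quit()
```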
I have been wondering about this since I could really benefit from a program that performs actions on the websites I use for my job that require the same commands over and over again.
I know some python and I love to learn new things.
I tried looking for it on Google, but I guess I'm not sure how to find it.
I would love it if you could direct me to a guide or something like that.
Thank you very much!
Selenium interacts with a web browser directly, although you can hide the browser window in code (look up Selenium's --headless mode). This is a good choice for filling out a lot of forms or interacting with graphical user interface elements.
However, if you just need to request information from websites, you don't always need to interact with the web browser directly. You can use the package called Requests. It doesn't depend on any web browser and can run silently in the background.
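A minimal sketch of the Requests route, assuming the data you want is in the plain HTML response (the URL is a placeholder):

```python
# Sketch: fetch a page without any browser involved.
import requests

resp = requests.get("http://example.com/report?id=42")  # placeholder URL
resp.raise_for_status()
print(resp.text[:200])  # raw HTML, ready to be parsed
```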
I think you can do it with Python and some packages like Selenium. You will also need some HTML knowledge so you can search the HTML source code of the specific webpage.
I found an interesting use case, maybe that helps you:
https://towardsdatascience.com/controlling-the-web-with-python-6fceb22c5f08
Python noobie.
I'm trying to make Python select a portion of my screen. In this case, it is a small window within a Firefox window: the Firebug source code. Then, once it has selected the right area, Ctrl+A to select all and then Ctrl+C to copy. If I could figure this out, I would just do the same thing repeatedly and paste all of the copies into a .txt file.
I don't really know where to begin -- are there libraries for this kind of thing? Is it even possible?
I would look into PyQt or PySide, which are Python wrappers on top of Qt.
Qt is a big monster, but it's very well documented, and I'm sure it will help you further in your project once you've grabbed your screen section.
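As a starting point, here is a minimal sketch of grabbing a screen region with PyQt5 and saving it to a file; the coordinates are placeholders for wherever the Firebug panel happens to sit:

```python
# Sketch: grab a rectangular region of the screen with PyQt5.
# The x, y, width, height values are placeholders.
import sys
from PyQt5.QtWidgets import QApplication

app = QApplication(sys.argv)
screen = app.primaryScreen()
# grabWindow(window_id, x, y, width, height); window_id 0 = whole desktop
pixmap = screen.grabWindow(0, 100, 200, 800, 600)
pixmap.save("firebug_panel.png", "png")
```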
As you've mentioned in the comments, the data is all in the HTML to start (I'm guessing it's greyed out in your Firebug screenshot since it's a hidden element). This approach avoids the complexity of trying to automate a browser. Here's a rough outline of how I would get the data:
Download the HTML for the whole page - I'd do this manually at first (i.e. File > Save from a browser), and if there are a bunch of pages you want to process, figure out how to download all the pages you want later. If you want to use Python for this part, I'd recommend urllib2. The URLs for each page are probably pretty structured, so you could easily store them in a list, then download each one and save it locally.
Write a script to parse the HTML - don't use regex. Since you're using Python, use something like Beautiful Soup, which will create a nice object representation of the page so you can get the elements you want; a rough sketch of both steps follows below.
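Here is that sketch; it uses Python 3's urllib.request where the text above mentions urllib2, and the URL and element id are placeholders:

```python
# Sketch: fetch a page and pull a (possibly hidden) element out of it.
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://example.com/report/page1.html").read()  # placeholder
soup = BeautifulSoup(html, "html.parser")

# Grab the hidden element that holds the data.
data_div = soup.find("div", id="hidden-data")  # placeholder id
if data_div is not None:
    print(data_div.get_text(strip=True))
```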
You mention you're new to Python, so there's definitely going to be a learning curve around this, but this actually sounds like a pretty doable project to use to learn some more Python.
If you run into specific obstacles with each step, start a new question with a bit of sample code, showing what you're trying to accomplish, and people will be more than willing to help out.
I need my program (Python) to upload files (large reports) to services like RapidShare, MegaUpload, or easy-share and grab the URL the site gives me (to then forward to the user).
What's the easiest way (I think Selenium, but maybe it's overkill)?
What's the fastest (can I do it with Mechanize)?
How would you do it?
Thanx in advance.
I would attack this with Selenium; even though it's really heavy, I think the ease of it is worth it.
I would do what you need to do (upload the file to the service) by hand while the Firefox plugin SeleniumIDE records it. Then just export as Python and you have your code.
SeleniumIDE:
Selenium is a bit slow, but the simplicity I showed you is well worth it (IMHO).
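For a feel of the exported-and-tidied result, a hedged sketch of a file upload in WebDriver-style Python; the site URL, element locators, and timeout are placeholders for whatever SeleniumIDE actually records:

```python
# Sketch: upload a file and grab the share URL the site hands back.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("http://example.com/upload")  # placeholder upload page

# File inputs accept a local path via send_keys; no dialog automation needed.
driver.find_element(By.CSS_SELECTOR, "input[type=file]").send_keys("/tmp/report.pdf")
driver.find_element(By.ID, "uploadButton").click()

link = WebDriverWait(driver, 300).until(
    EC.presence_of_element_located((By.ID, "download-link"))
)
print("Share this URL:", link.get_attribute("href"))
driver.quit()
```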
You might first check whether the sites in question have an API meant for this sort of thing. easy-share, for example, does (the others are blocked for me at the moment, so I haven't checked them): http://www.easy-share.com/be/developers.html (and they even have a ready-made Python module available).
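To sketch what the API route might look like with the Requests library: the endpoint and field names below are purely hypothetical, so check the service's developer documentation for the real ones:

```python
# Sketch: POST a file to an upload API and read back the download URL.
# Endpoint and form fields are hypothetical placeholders.
import requests

with open("/tmp/report.pdf", "rb") as f:
    resp = requests.post(
        "http://example.com/api/upload",   # hypothetical endpoint
        files={"file": f},
        data={"email": "me@example.com"},  # hypothetical field
    )
resp.raise_for_status()
print("Download URL:", resp.text.strip())
```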