Listen to text in browser

I am wondering if and how it is possible to 'listen to' the text that is in my browser window.
I am specifically NOT looking to scrape websites in the sense that I want to crawl them for information, I am just interested in interacting with an arbitrary page that makes my browser output text.
Example
Suppose I am asking a question on Stack Overflow.
By the time I have typed the title 'Listen to text in browser', suggestions appear, and one of them contains the plain text 'Listen to browser request'.
As soon as the word 'request' is on my screen, the browser should get shut down and a pizza should get ordered.
What would a good solution look like
I want to be able to do this for practically any website that somehow makes my computer show simple text. Ideally without having knowledge of how the text is generated.
I want this to be somewhat fast, subsecond should be possible
I do not want to hit the website or its APIs; I just want to use the information that is already on my screen.
I am not too picky about the OS and browser requirements.
I can also imagine there may be corner cases where it is hard (perhaps text is shown as a picture, or perhaps parts of a sentence are actually spread across multiple text boxes that are just displayed next to each other). For now I just wonder how this can be done for a simple page.
Bonus points if it could even capture text from the field in which I am typing myself, so I can scold myself when I am about to say something stupid.
What have I come up with so far
I am in general confident that I can process the text once I get it into a tool; the main challenge is how to listen to the browser.
I tried looking at the page's source code, but that does not appear to contain this dynamic text.
Perhaps there is a streaming API in the browser itself that can stream out changes?
Perhaps there is a way to grab all text from a browser, perhaps 10x per second or so.
The normal scraping solutions are not what I want at all: I do not want to fire a request to the web server 10x per second.
In the worst case, I suppose we could use screen capture software, followed by text recognition, but I really hope there is something more elegant
I suppose there may be automation/testing software that can do this. That would be an answer but something lightweight (e.g. a python library) would be nicest.
I have tried searching but did not find any solution, or even the question. Presumably I am using the wrong words.
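One way to approach the "grab all text about 10x per second" idea is to let an automation library drive the browser and poll the rendered text of the page. The sketch below is only a minimal illustration under stated assumptions: wait_for_text and selenium_text_getter are hypothetical helper names, and it assumes the browser was started through Selenium's Python bindings so that driver.execute_script is available. Reading document.body.innerText only inspects the DOM that is already loaded, so no extra request goes to the web server.

```python
import time


def wait_for_text(get_text, trigger, interval=0.1, timeout=10.0):
    """Poll get_text() until `trigger` appears in it or `timeout` seconds pass.

    get_text is any zero-argument callable returning the text currently
    shown; returns True if the trigger was seen, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if trigger in get_text():
            return True
        time.sleep(interval)
    return False


def selenium_text_getter(driver):
    """Build a get_text callable from a Selenium WebDriver (assumed setup).

    The injected JavaScript returns the rendered text of the current page.
    """
    return lambda: driver.execute_script("return document.body.innerText;")


# Hypothetical usage, assuming Selenium and a browser driver are installed:
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   driver.get("https://stackoverflow.com/questions/ask")
#   if wait_for_text(selenium_text_getter(driver), "request", timeout=60):
#       driver.quit()  # ...and order the pizza
```

Text typed into form fields is generally not part of innerText, so capturing it would likely need a separate read of the field's value.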

Related

Scraping data from a website with a search box

First of all I want to apologize if my question is too broad or generic, but it would really save me a lot of needlessly wasted time to get an answer to guide me in the right direction for the work I want to do. With that out of the way, here it goes.
I am trying to retrieve some publicly available data from a website, to create a dataset to work with for a Data Science project. My big issue is that the website does not have a friendly way to download it, and, from what I gathered, it also has no API. So, getting the data requires skills that I do not possess. I would love to learn how to scrape the website (the languages I am most comfortable with are Python and R), and it would add some value to my project if I did it, but I also am somewhat pressured by time constraints, and don't know if it is even possible to scrape the website, much less to learn how to do it in a few days.
The website is this one https://www.rnec.pt/pt_PT/pesquisa-de-estudos-clinicos. It has a search box, and the only option I configure is to click the banner that says "Pesquisa Avançada" and then mark the box that says "Menor de 18 anos". I then click the "Pesquisar" button in the lower-right, and the results that show up are the ones that I want to extract (either that or, if it's simpler, all the results, without checking the "Menor de 18 anos" box). On my computer, 2 results show up per page, and there are 38 pages total. Each result has some of its details in the page where the results appear but, to get the full data from each entry, one has to click "Detalhes" in the lower right of each result, which opens a display with all the data from that result. If possible, I would love to download all the data from that "Detalhes" page of each result (the data there already contains the fields that show up in the search result page).
Honestly, I am ready to waste a whole day manually transcribing all the data, but it would be much better to do it computationally, even if it takes me two or three days to learn and do it.
I think that, for someone with experience in web scraping, it is probably super simple to check if it is possible to download the data I described, and what is the best way to go about it (in general terms, I would research and learn it). But I really am lost when it comes to this, and just kindly want to ask for some help in showing me the way to go about it (even if the answer is "it is too complicated/impossible, just do it manually"). I know that there are some Python packages for web scraping, like BeautifulSoup and Selenium, but I don't really know if either of them would be appropriate.
I am sorry if my request is not exactly a short and simple coding question, but I have to try to gather any help or guidance I can get. Thank you in advance to everyone who reads my question and a special thank you if you are able to give me some pointers.

What is the most efficient way to extract visible data from a poker room and how does one implement this?

So I'm new to Python and just finished my first application. (Giving random chords to be played on a MIDI piano and increasing the score if the right notes are hit in a graphical interface, nothing too fancy but also non-trivial.) And now I'm looking for a new challenge: this time I'm going to try and create a program that monitors a poker table and collects data on all the players. Though this is completely allowed on almost all poker rooms (example of the largest one), there is obviously no set and go API available. This probably makes the extraction of relevant data the most challenging part of the entire program. In my search for more information, I came across an undergraduate thesis that goes into writing such a program using Java (Internet Poker: Data Collection and Analysis - Haruyoshi Sakai).
In this thesis, the author speaks of 3 data collection methods:
Sniffing packets
Hand history
Scraping the screen
Like the author, I've come to the conclusion that the third option is probably the best route, but unlike him I have no knowledge of how to start this.
What I do know is the following: Any table will look like the image below. Note how text, including numbers, is written in the same font on the table. Additionally, all relevant information is also supplied in the chat box situated in the lower left corner of the window.
In some regards using the chat box sounds like the best way to go, seeing as all text is predictable and in the same font. The problem I see is computational speed: It will often occur that many actions get executed in rapid succession. Any program will have to be able to keep up with this.
On the other hand, using the table as reference means that you have to deal with unpredictable bet locations.
The plan: Taking this into account, I'd start by getting an index of all players' names and stacks from the table view and "initialising" the table that way, and continue to use their stacks to extrapolate the betting they do.
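The "use their stacks to extrapolate the betting" step could look roughly like the sketch below. It is a simplified illustration, assuming that two consecutive snapshots of the table have already been read into dictionaries mapping player name to stack size; infer_bets and the player names are hypothetical.

```python
def infer_bets(before, after):
    """Return {player: chips put into the pot} between two stack snapshots.

    Players whose stack did not shrink, or who are missing from the second
    snapshot (left the table, name could not be read), are skipped.
    """
    bets = {}
    for player, old_stack in before.items():
        new_stack = after.get(player)
        if new_stack is None:
            continue
        delta = old_stack - new_stack
        if delta > 0:
            bets[player] = delta
    return bets


# Two hypothetical snapshots taken a moment apart.
before = {"alice": 1000, "bob": 500, "carol": 750}
after = {"alice": 950, "bob": 500, "carol": 650}
print(infer_bets(before, after))  # {'alice': 50, 'carol': 100}
```

A stack that grows between snapshots (a won pot or a rebuy) is ignored here and would need its own handling.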
The Method: Of course, the method is the entire reason why I made this post. It seems to me like one would need some sort of OCR to achieve all this, but seeing as everything is in a known font, there may be some significant optimisations that can be made. I would love some input on resources to learn about solutions to similar problems. Or if you've got a better idea on how to tackle this problem, I'd love to hear that too!
Please do be sure to ask any questions you may have, I will be happy to answer them in as much detail as possible.
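To illustrate why a known font might allow a shortcut over general OCR: if every glyph is rendered pixel-for-pixel the same way, reading text can reduce to an exact lookup of each character cell in a table of known bitmaps. The sketch below uses a made-up 3x5 fixed-width font and plain strings instead of real screenshots; GLYPHS, render and read_text are all hypothetical. Real poker clients will likely need extra work (anti-aliasing, variable-width fonts, locating the text region first).

```python
# Hypothetical 3x5 pixel font: '#' is an "on" pixel, ' ' is background.
GLYPHS = {
    "0": ("###", "# #", "# #", "# #", "###"),
    "1": ("  #", "  #", "  #", "  #", "  #"),
    "2": ("###", "  #", "###", "#  ", "###"),
    "5": ("###", "#  ", "###", "  #", "###"),
}
# Reverse table: bitmap -> character, so recognition is one dict lookup.
LOOKUP = {bitmap: char for char, bitmap in GLYPHS.items()}

GLYPH_WIDTH = 3
GLYPH_HEIGHT = 5
GAP = 1  # blank columns between neighbouring glyphs


def render(text):
    """Draw `text` as GLYPH_HEIGHT rows of pixels (stand-in for a screenshot)."""
    return [
        (" " * GAP).join(GLYPHS[ch][y] for ch in text)
        for y in range(GLYPH_HEIGHT)
    ]


def read_text(rows):
    """Cut the rows into fixed-width cells and look each one up exactly."""
    chars = []
    step = GLYPH_WIDTH + GAP
    for x in range(0, len(rows[0]), step):
        cell = tuple(row[x:x + GLYPH_WIDTH] for row in rows)
        chars.append(LOOKUP.get(cell, "?"))  # '?' marks an unknown cell
    return "".join(chars)


print(read_text(render("1250")))  # 1250
```

Because each cell costs a single dictionary lookup, this kind of matching should comfortably keep up with rapid table updates, provided the rendering really is identical every time.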

Black box testing of webpages in python

Does anyone know of any APIs that I could use to test my website from a black-box point of view?
I would need to enter some text into a text box and extract the corresponding output for multiple cases on the same page.
I would also like to perform a load and stress test on this website.
Pardon my incorrect jargon, if there is any, as I am extremely new to web development.

Selenium is an extremely powerful tool for testing web applications. Its primary function is to drive a real browser and carry out automated tests, so it may be useful to you for doing things like entering text into a text box and extracting the corresponding output for multiple cases.
You can read more about Selenium here
Python bindings also exist, which you can read about here
As far as stress testing goes, take a look at this question posted on Stack Overflow: best way to stress test a website
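As a rough sketch of what "entering text into a text box and extracting the corresponding output for multiple cases" might look like with the Python bindings: the helper below takes an already-started WebDriver and the element ids of the input and output elements. run_cases and the ids are hypothetical, and the sketch assumes the page updates its output immediately after typing; a page that needs a button click or loads results asynchronously would need an extra step or an explicit wait.

```python
def run_cases(driver, input_id, output_id, cases):
    """Type each case into the input element and record the output text.

    driver is assumed to be a Selenium WebDriver with the page under test
    already loaded; "id" is the locator strategy Selenium uses for
    element ids.
    """
    results = {}
    for text in cases:
        box = driver.find_element("id", input_id)
        box.clear()            # remove the previous case
        box.send_keys(text)    # type the new one
        results[text] = driver.find_element("id", output_id).text
    return results


# Hypothetical usage, assuming Selenium and a browser driver are installed:
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   driver.get("http://localhost:8000/")
#   print(run_cases(driver, "query", "result", ["abc", "123"]))
#   driver.quit()
```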

How to: Python script that will 'click' on a portion of my screen, and then do key commands?

Python noobie.
I'm trying to make Python select a portion of my screen. In this case, it is a small window within a Firefox window -- it's Firebug source code. And then, once it has selected the right area, control-A to select all and then control-C to copy. If I could figure this out then I would just do the same thing and paste all of the copies into a .txt file.
I don't really know where to begin -- are there libraries for this kind of thing? Is it even possible?

I would look into PyQt or PySide, which are Python wrappers on top of Qt.
Qt is a big monster, but it's very well documented, and I'm sure it will help you further in your project once you have grabbed your screen section.

As you've mentioned in the comments, the data is all in the HTML to start (I'm guessing it's greyed out in your Firebug screenshot since it's a hidden element). This approach avoids the complexity of trying to automate a browser. Here's a rough outline of how I would get the data:
Download the HTML for the whole page - I'd do this manually at first (i.e. File > Save from a browser), and if there are a bunch of pages you want to process, figure out how to download all the pages you want later. If you want to use Python for this part, I'd recommend urllib2 (urllib.request in Python 3). The URLs for each page are probably pretty structured, so you could easily store them in a list, and download each one and save it locally.
Write a script to parse the HTML - don't use regex. Since you're using Python, use something like Beautiful Soup, which will create a nice object representation of the page, and then you can get the elements you want.
You mention you're new to python, so there's definitely going to be a learning curve around this, but this actually sounds like a pretty doable project to use to learn some more python.
If you run into specific obstacles with each step, start a new question with a bit of sample code, showing what you're trying to accomplish, and people will be more than willing to help out.
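The two steps above can be sketched as follows. To keep the example free of third-party dependencies, it swaps in the standard library's html.parser where the answer recommends Beautiful Soup, and uses urllib.request, the Python 3 successor of urllib2. The element id "data" and the sample HTML are made up for illustration; the real page will have its own structure.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def download(url):
    """Step 1: fetch the raw HTML of one page (not called in this sketch)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


class TextById(HTMLParser):
    """Step 2: collect the text inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_text(html, target_id):
    parser = TextById(target_id)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())  # normalise whitespace


sample = ('<html><body><div id="data" style="display:none">'
          '<p>Row <b>one</b></p><br><p>Row two</p></div>'
          '<p>other</p></body></html>')
print(extract_text(sample, "data"))  # Row one Row two
```

Hidden elements are still present in the downloaded HTML, which is why this works even though the text is not visible in the browser.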

Is it possible to make an OCR in Python to check words in open applications?

I want to automate Firefox on some web page, and I don't have a way to "know" whether the page has already loaded completely or is still loading...
I was thinking about using OCR to check the status bar... is that difficult?
For example, when the word DONE appears in the status bar, the program continues to the next command...

OCR is a terrible, terrible choice for something like this. Use OCR when you are encountering images with unknown text. If you are trying to automate Firefox, there's a billion better ways of doing so. Check out something like AutoIt or any one of a hundred automation tools for Windows. Or write a custom Firefox extension. Either one of those will be far easier to implement, more reliable, and more performant than OCR.

Maybe http://groups.csail.mit.edu/uid/sikuli/ is what you want
