Black box testing of webpages in python

Does anyone know of any APIs that I could use to test my website from a black-box point of view?
I would need to enter some text into a text box and extract the corresponding output for multiple cases on the same page.
I would like to perform a load and stress test on this website.
Pardon my incorrect jargon if there is any, as I am extremely new to web development.

Selenium is an extremely powerful tool for testing web applications. Its primary function is to drive a real browser and carry out automated functional tests, so it should be useful to you for things like entering text into a text box and extracting the corresponding output for multiple cases.
You can read more about Selenium here
Python bindings also exist, which you can read about here
As far as stress testing goes, take a look at this question posted on Stack Overflow: "Best way to stress test a website"
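A minimal sketch of the text-box workflow with the Selenium Python bindings, assuming Selenium 4 and Firefox with geckodriver are installed. The URL and the element IDs `query` and `result` are hypothetical placeholders, not taken from any real page. If the output appears asynchronously, an explicit wait (Selenium's `WebDriverWait`) would probably be needed before reading it.

```python
def run_cases(driver, cases, input_id="query", output_id="result"):
    """Type each case into the text box and collect the text of the output element.

    `driver` is a Selenium WebDriver; both element IDs are hypothetical.
    """
    results = {}
    for text in cases:
        # "id" is the locator strategy string that Selenium's By.ID stands for.
        box = driver.find_element("id", input_id)
        box.clear()
        box.send_keys(text)
        box.submit()  # submits the form that contains the text box
        results[text] = driver.find_element("id", output_id).text
    return results


def main():
    # Imported here so run_cases can be exercised without a browser installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get("http://localhost:8000/")  # hypothetical URL of the site under test
        for case, output in run_cases(driver, ["foo", "bar"]).items():
            print(case, "->", output)
    finally:
        driver.quit()
```

Calling `main()` opens a real browser window; `run_cases` only needs an object with a `find_element` method, so it can also be tried against a stub.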

Related

Why am I not seeing the "full" html case? [duplicate]

What is the best method to scrape a dynamic website where most of the content is generated by what appear to be Ajax requests? I have previous experience with a Mechanize, BeautifulSoup, and Python combo, but I am up for something new.
--Edit--
For more detail: I'm trying to scrape the CNN primary database. There is a wealth of information there, but there doesn't appear to be an API.

The best solution that I found was to use Firebug to monitor XMLHttpRequests, and then to use a script to resend them.

This is a difficult problem because you either have to reverse engineer the JavaScript on a per-site basis, or implement a JavaScript engine and run the scripts (which has its own difficulties and pitfalls).
It's a heavyweight solution, but I've seen people doing this with Greasemonkey scripts: allow Firefox to render everything and run the JavaScript, and then scrape the elements. You can even initiate user actions on the page if needed.

Selenium IDE, a tool for testing, is something I've used for a lot of screen-scraping. There are a few things it doesn't handle well (JavaScript window.alert() and popup windows in general), but it does its work on a page by actually triggering the click events and typing into the text boxes. Because the IDE portion runs in Firefox, you don't have to manage sessions, etc., as Firefox takes care of that. The IDE records and plays tests back.
It also exports C#, PHP, Java, etc. code to build compiled tests/scrapers that are executed on the Selenium server. I've done that for more than a few of my Selenium scripts, which makes things like storing the scraped data in a database much easier.
Scripts are fairly simple to write and alter, being made up of things like ("clickAndWait","submitButton"). Worth a look given what you're describing.

Adam Davis's advice is solid.
I would additionally suggest that you try to "reverse-engineer" what the JavaScript is doing, and instead of trying to scrape the page, you issue the HTTP requests that the JavaScript is issuing and interpret the results yourself (most likely in JSON format, nice and easy to parse). This strategy could be anything from trivial to a total nightmare, depending on the complexity of the JavaScript.
The best possibility, of course, would be to convince the website's maintainers to implement a developer-friendly API. All the cool kids are doing it these days 8-) Of course, they might not want their data scraped in an automated fashion... in which case you can expect a cat-and-mouse game of making their page increasingly difficult to scrape :-(
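The request-replay approach described above can be sketched with the standard library alone. This is only an illustration: the endpoint URL and parameter names in the usage note are hypothetical, and whether a given site expects the `X-Requested-With` header is an assumption you would check in the browser's network monitor.

```python
import json
import urllib.parse
import urllib.request


def build_xhr_request(base_url, params):
    """Build the GET request that the page's JavaScript would have sent."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    # Many Ajax endpoints look for this header; whether a given site does is an assumption.
    return urllib.request.Request(url, headers={"X-Requested-With": "XMLHttpRequest"})


def parse_json_body(body):
    """Decode a JSON response body (bytes) into Python objects."""
    return json.loads(body.decode("utf-8"))


def fetch_json(base_url, params, timeout=10):
    """Send the replayed request and return the parsed JSON payload."""
    request = build_xhr_request(base_url, params)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_json_body(response.read())
```

A call would look like `fetch_json("http://example.com/api/results", {"state": "IA"})`, with the real URL and parameters copied from the requests you observed.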

There is a bit of a learning curve, but tools like Pamie (Python) or Watir (Ruby) will let you latch into the IE web browser and get at the elements. This turns out to be easier than Mechanize and other HTTP-level tools since you don't have to emulate the browser; you just ask the browser for the HTML elements. And it's going to be way easier than reverse engineering the JavaScript/Ajax calls. If needed, you can also use tools like Beautiful Soup in conjunction with Pamie.

Probably the easiest way is to use the IE WebBrowser control in C# (or any other language). You have access to everything inside the browser out of the box, and you don't need to care about cookies, SSL, and so on.

I found that the IE WebBrowser control has all kinds of quirks and workarounds, enough to justify some high-quality software that takes care of those inconsistencies and provides a framework layered around the shdocvw.dll and MSHTML APIs.

This seems like a pretty common problem. I wonder why no one has developed a programmatic browser. I'm envisioning a Firefox you can call from the command line with a URL as an argument, which will load the page, run all of the initial page-load JS events, and save the resulting file.
I mean, Firefox and other browsers already do this, so why can't we simply strip off the UI stuff?

Listen to text in browser

I am wondering if and how it is possible to 'listen to' the text that is in my browser window.
I am specifically NOT looking to scrape websites in the sense that I want to crawl them for information, I am just interested in interacting with an arbitrary page that makes my browser output text.
Example
Suppose I am asking a question on Stack Overflow.
By the time I have typed a title of 'Listen to text in browser', the suggestions appear, and one of them contains the plain text 'Listen to browser request'.
As soon as the word 'request' is on my screen, the browser gets shut down and I order a pizza.
What would a good solution look like
I want to be able to do this for practically any website that somehow makes my computer show simple text. Ideally without having knowledge of how the text is generated.
I want this to be reasonably fast; subsecond should be possible.
I do not want to hit the website or its APIs; I just want to use the information that is already on my screen.
I am not too picky about the OS and browser requirements.
I can also imagine there may be corner cases where it is hard (perhaps text is shown as a picture, or perhaps parts of a sentence are actually spread across multiple text boxes that are just displayed next to each other). For now I just wonder how this can be done for a simple page.
Bonus points if it could even capture text from the field in which I am typing myself, so I can scold myself when I am about to say something stupid.
What have I come up with so far
I am generally confident that I can process the text once I get it into a tool; the main challenge is how to listen to the browser.
I tried looking at the page source, but that does not appear to contain this dynamic text.
Perhaps there is a streaming API on the browser itself that can stream out changes?
Perhaps there is a way to grab all text from a browser, perhaps 10x per second or so.
The normal scraping solutions are not at all what I want, so I do not want to fire a request to the web server 10x per second.
In the worst case, I suppose we could use screen capture software, followed by text recognition, but I really hope there is something more elegant
I suppose there may be automation/testing software that can do this. That would be an answer but something lightweight (e.g. a python library) would be nicest.
I have tried searching but did not find any solution, or even the question. Presumably I am using the wrong words.
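The "grab all text 10x per second" idea from the list above can be sketched as a small polling loop. This is only one possible shape, not an established solution: `get_text` is a hypothetical callable that returns whatever text is currently rendered, and the loop itself knows nothing about the browser.

```python
import time


def wait_for_text(get_text, needle, interval=0.1, timeout=30.0):
    """Poll get_text() until `needle` appears; return True if seen, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if needle in get_text():
            return True
        time.sleep(interval)  # 0.1 s gives roughly ten checks per second
    return False
```

With a browser driven by Selenium, `get_text` could be something like `lambda: driver.find_element("tag name", "body").text`, which reads the text of the already loaded page from the browser rather than sending a new request to the website. That only works for a browser session started through the automation tool, which is an assumption about the setup.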

How to document existing Selenium Webdriver tests?

I am in charge of testing a web application using Selenium WebDriver with Python. Over the past year I created a large script (20K+ lines) where each test is a separate function. Now my boss wants me to document my tests, explaining in plain English what each test does. What tool would you recommend to document the steps your tests make?

I think this is a great question. Many people and companies don't bother managing their existing tests properly which leads to redundant and repeated code without having a clear idea what is covered by automated tests.
There is no single answer to this question but in general you can consider the following options:
Testing framework built-in reporting. In Java, for example, you have unit testing libraries like JUnit and TestNG. When they run, they generate output that can later be formatted and reviewed as the need arises. I am sure there is a unit testing framework like this in Python too.
You can also consider using a BDD tool like Cucumber. This is a bit different and might not be suitable in certain cases, such as when the tests are low-level system checks. It can, however, help you organize your test scenarios and keep them in a readable form. It is also very good for reporting to a non-technical person.
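In Python, the standard `unittest` module is one such framework. As a rough sketch of the built-in reporting option, each test can carry a plain-English docstring, and the first line can be pulled out with `shortDescription()`. The `LoginTests` class and its docstrings are hypothetical examples, and the test bodies are left empty on purpose.

```python
import unittest


class LoginTests(unittest.TestCase):  # hypothetical test case for illustration
    def test_valid_login(self):
        """A user with valid credentials reaches the dashboard."""

    def test_wrong_password(self):
        """A wrong password shows an error and stays on the login page."""


def describe_tests(test_case_class):
    """Return (test name, first docstring line) pairs for every test in the class."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case_class)
    return [(test.id().split(".")[-1], test.shortDescription()) for test in suite]


for name, description in describe_tests(LoginTests):
    print(f"{name}: {description}")
```

Running the suite with `python -m unittest -v` also prints these descriptions next to each test name, so the same docstrings serve both the report and the documentation.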

Combination Functional/Load/Stress Testing Website Libraries Python

I need to scale up some testing efforts for a web application. I'm most familiar with using Selenium (with Python bindings) for functional testing, amongst other things. Now that I also need to do concurrent load/stress testing, I think I need to take a different approach. I like the look of Locust, but I'm not sure how to integrate the functional test requirements as well. The basic test outline for an individual user is this:
login to site with credentials
"click" relevant angular elements to navigate the site
"click" and initiate download of various reports
Ideally, I could scale this with 10-50-100 concurrent users and get a log file with results (times, failures, etc.)
Any best-practice tips from the frequently unsung test heroes would be sincerely appreciated!
EDIT:
I realize this is a bit non-standard. It's just the nature of what I am trying to replicate, with New Relic running in the background for analytics. Currently, I'm trying to figure out whether Selenium can be combined with Locust in an appropriate way.

You were right to make Locust your first choice. Its main strength is that it is a Python-code-based tool, so you can do in it almost anything you can do in pure Python.
If you also need some functional testing, you can integrate it into your load tests using Python's standard assertions.
Check this article; it should give you some ideas on how to make functional checks within your Locust performance tests using Python:
https://www.blazemeter.com/blog/locust-assertions-a-complete-user-manual
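To illustrate the idea of mixing functional assertions into a concurrent run, here is a sketch that deliberately uses only the standard library rather than Locust's own API. The `example_flow` function is a hypothetical stand-in for the login, navigate, and download steps; a real version would make HTTP requests to the site.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_run(user_flow, user_id):
    """Run one simulated user and record elapsed time plus any failure."""
    start = time.perf_counter()
    try:
        user_flow(user_id)
        error = None
    except Exception as exc:  # a failed assertion or a network error counts as a failure
        error = repr(exc)
    return {"user": user_id, "seconds": time.perf_counter() - start, "error": error}


def run_load(user_flow, n_users):
    """Run user_flow for n_users concurrent simulated users and collect the results."""
    with ThreadPoolExecutor(max_workers=n_users) as pool:
        return list(pool.map(lambda uid: timed_run(user_flow, uid), range(n_users)))


def example_flow(user_id):
    # Hypothetical stand-in for: log in, navigate, download a report.
    time.sleep(0.01)
    assert user_id != 3, "report download failed"  # functional check inside the load run


results = run_load(example_flow, 10)
failures = [r for r in results if r["error"]]
print(f"{len(results)} users, {len(failures)} failures")
```

The per-user dictionaries can be written to a log file to get the times and failures asked for in the question. A dedicated tool like Locust adds ramp-up, statistics, and distribution across machines that this sketch does not attempt.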

Minimized html preview in div

How can you implement a minimized view of an HTML page in a div (like Google's preview)?
http://img228.imageshack.us/i/minimized.png/
Edit: OK, I see it's a picture on Google, probably a scaled-down screenshot.

This is more or less a duplicate of the question: Create thumbnails from URLs using PHP
However, just to add my 2¢, my strong preference would be to use an existing web service, e.g. websnapr, as mentioned by thirtydot in the comments on your question. Generating the snapshots yourself will be difficult to scale well, and just the kind of thing I'd think is worth using an established service for.
If you really do want to do this yourself, I've had success using CutyCapt to generate snapshots of webpages - there are various other similar options (i.e. external programs you can call to do the rendering) mentioned in that other question.
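Calling an external renderer such as CutyCapt from a script can be sketched as below. The sketch is in Python to match the rest of this page rather than PHP, it assumes a `CutyCapt` executable is on the PATH, and it uses only the `--url` and `--out` options; the function names are hypothetical.

```python
import subprocess


def cutycapt_command(url, out_path, executable="CutyCapt"):
    """Build the command line for capturing `url` into the image file `out_path`."""
    return [executable, f"--url={url}", f"--out={out_path}"]


def capture(url, out_path, timeout=60):
    """Run CutyCapt; raises CalledProcessError if the program exits with an error."""
    subprocess.run(cutycapt_command(url, out_path), check=True, timeout=timeout)
```

The resulting full-size image would still need to be scaled down to a thumbnail in a separate step.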

Google displays an image thumbnail, so you would need to generate an image using GD or ImageMagick.
The general flow would be
Fetch page content, including stylesheets and all images, via curl (potentially tricky to capture all the embedded files, but shouldn't be beyond a competent PHP programmer)
Construct a rendering of the page inside PHP itself (EXTREMELY tricky! Wouldn't even know where to start with that, though there might be some kind of third-party extension available)
Use GD/ImageMagick/whatever to generate a thumbnail image in an appropriate format (shouldn't be too hard).
Clearly, rendering the page from the HTML, CSS, images, etc. that you downloaded is going to be the difficult part.
Personally I'd be wondering if the effort involved is worth it.
