Is there any module out there that could be used by my Django site to tell whether the client browser supports HTML5 and what features are supported?
Sadly, no. This is something you'll need client-side JavaScript for - specifically something like http://modernizr.com/
One way to do it would be to run Modernizr on the client and send the results to the back end.
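If you go that route, the back end only needs a small view to accept the posted results - a minimal sketch, assuming a Django view wired to a hypothetical /feature-report/ URL and JSON produced by Modernizr on the client:
import json
from django.http import HttpResponse

def feature_report(request):
    # Store the feature-detection results posted by client-side Modernizr.
    # (You may need to exempt this view from CSRF protection or send the token from the client.)
    if request.method == "POST":
        features = json.loads(request.body.decode("utf-8"))  # e.g. {"canvas": true, "webgl": false}
        request.session["html5_features"] = features
        return HttpResponse(status=204)
    return HttpResponse(status=405)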
If you were feeling really optimistic, you could build a list of User-Agents and decide based on that. But good luck keeping track of which features work in which versions of Chrome and Firefox.
I'm currently testing a website with python-selenium and it works pretty well so far. I'm using webdriver.Firefox() because it makes the development process much easier if you can see what the testing program actually does. However, the tests are very slow. At one point, the program has to click on 30 items to add them to a list, which takes roughly 40 seconds because the browser responds so awfully slowly. So after googling how to make Selenium faster, I've thought about using a headless browser instead, for example webdriver.PhantomJS().
However, the problem is that the website requires a login, including a captcha, at the beginning. Right now I enter the captcha manually in the Firefox browser. When switching to a headless browser, I cannot do this anymore.
So my idea was to open the website in Firefox, log in and solve the captcha manually. Then I somehow continue the session in headless PhantomJS, which allows me to run the code quickly. So basically it is about changing the used driver mid-code.
I know that a driver is completely clean when created. So if I create a new driver after logging in with Firefox, I'd be logged out in the other driver. So I guess I'd have to transfer some session information between the two drivers.
Could this somehow work? If yes, how can I do it? To be honest, I do not know a lot about the actual functionality of webhooks, cookies and storing the "logged-in" information in general. So how would you guys handle this problem?
Looking forward to hearing your answers,
Tobias
Note: I already asked a similar question, which got marked as a duplicate of this one. However, the other question discusses how to reconnect to the browser after quitting the script. This is not what I am intending to do. I want to change the used driver mid-script while staying logged in on the website. So I deleted my old question and created this new, more fitting one. I hope it is okay like that.
The real solution to this is to have your development team add a test mode (not available on Production) where the Captcha solution is either provided somewhere in the page code, or the Captcha is bypassed.
Your proposed solution does not sound like it would work, and having a manual step defeats the purpose of automation. Automation that requires manual steps to be taken will be abandoned.
The website "recognizes" the user via cookies - a special HTTP header that is sent with each request so the website knows that the user is authenticated, has this or that permission, etc.
Fortunately, Selenium provides functions for cookie manipulation, so all you need to do is read the cookies from Firefox using the WebDriver.get_cookies() method and, once you have them, add them to PhantomJS via the WebDriver.add_cookie() method.
firefoxCookies = firefoxDriver.get_cookies()  # export the session cookies from Firefox
for cookie in firefoxCookies:
    phantomJSDriver.add_cookie(cookie)  # import each one into the headless driver
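Putting it together, a rough sketch of the whole hand-off could look like the following (the URL and the input() pause are placeholders; note that Selenium generally requires the receiving driver to already be on the same domain before cookies can be added):
from selenium import webdriver

SITE_URL = "https://example.com"  # placeholder for your site

# 1. Log in manually (including the captcha) in a visible Firefox window
firefox = webdriver.Firefox()
firefox.get(SITE_URL)
input("Log in and solve the captcha, then press Enter...")

# 2. Start the headless browser and open the same domain so cookies can be set
phantom = webdriver.PhantomJS()
phantom.get(SITE_URL)

# 3. Copy the session cookies across
for cookie in firefox.get_cookies():
    phantom.add_cookie(cookie)

# 4. Continue the fast, headless part of the script
phantom.refresh()
firefox.quit()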
I've looked at urllib(2), mechanize, and Beautiful Soup in hopes of finding something that captures network calls such as pixel/beacon fires from a page. Unfortunately I'm not very familiar with any of them, and I'm also not very clear on how to go about my search.
I'd like to use Python to run through a series of web URLs and capture each one's network calls, aka pixel fires. Would anyone know of a means or library I can start from in order to accomplish this?
I've looked into web scraping, but I don't want the HTML; instead I believe I'm looking for the GET requests the site makes.
If I understand what you want, you want to log what requests a browser makes when displaying a page, in respect of many pages.
Your options are to script a browser using Python (see: http://wiki.python.org/moin/WebBrowserProgramming), or to script the browser using JavaScript, output your results in some way (I suggest JSON, over a request or to a file), and analyse them in Python.
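If you go the Python route, one possible sketch uses the third-party selenium-wire package to drive a real browser and record every request each page triggers (the package choice and URLs are my assumptions, not something from the links above):
# pip install selenium-wire
from seleniumwire import webdriver  # wraps Selenium and records HTTP traffic

urls = ["https://example.com/page1", "https://example.com/page2"]  # your list of pages
driver = webdriver.Firefox()
for url in urls:
    del driver.requests              # clear the requests captured for the previous page
    driver.get(url)
    for request in driver.requests:  # every network call the page made, pixels included
        print(url, "->", request.method, request.url)
driver.quit()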
You'll probably find it easier to do the scripting in javascript, honestly.
Another possibility if you have access to the Firefox web browser is to install Firebug, a powerful debugging tool that gives you the option to display all network traffic from a web page in the browser console. In order to transfer the output from the console to a file you will need to install the ConsoleExport plugin for Firebug.
You will now be able to capture all the traffic from a web page to a file which you can then parse with Python.
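Once you have the exported file, a small Python script can pull the request URLs out of it - a rough sketch, assuming the export is plain text with the URLs somewhere on each line (check the actual format ConsoleExport produces):
import re

url_pattern = re.compile(r"https?://\S+")  # naive URL pattern; tighten as needed

with open("console_export.log") as f:      # path to the exported file (assumption)
    for line in f:
        for url in url_pattern.findall(line):
            print(url)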
Is there a way, using some library or method, to scrape a webpage in real time as a user navigates it manually? Most scrapers I know of, such as python mechanize, create a browser object that emulates a browser - of course this is not what I am looking for, since if I have a browser open it will be different from the one mechanize creates.
If there is no solution: my problem is that I want to scrape elements from an HTML5 game to make an intelligent agent of sorts. I won't go into more detail, but I suspect that if others are trying to do the same in the future (or any real-time scraping with a real user), a solution to this could be useful for them as well.
Thanks in advance!
Depending on what your use-case is, you could set up a SOCKS proxy or some other form of proxy and configure it to log all traffic, then instruct your browser to use it. You'd then scrape that log somehow.
Similarly, if you have control over your router, you could configure capture and logging there, e.g. using tcpdump. This wouldn't decrypt encrypted traffic, of course.
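As one concrete variant of the proxy idea, mitmproxy (a Python-based intercepting proxy - my suggestion, not something mentioned above) lets you log every request the browser makes with a tiny addon script:
# log_requests.py -- run with: mitmdump -s log_requests.py
# then point the browser's proxy settings at mitmproxy (default 127.0.0.1:8080)
def request(flow):
    # called for each HTTP(S) request that passes through the proxy
    with open("requests.log", "a") as f:
        f.write(flow.request.method + " " + flow.request.pretty_url + "\n")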
If you are working with just one browser, there may be a way to instruct it to do something at each action via a custom browser plugin, but I'd have to guess you'd be running into security model issues a lot.
The problem with an HTML5 game is that typically most of its "navigation" is done using a lot of JavaScript. The JavaScript is typically doing a lot - manipulating the DOM, triggering requests for new content to fit into the DOM, etc.
Because of this you might be better off looking into OS-level or browser-level scripting services that can "drive" keyboard and mouse events, take screenshots, or possibly even take a snapshot of the current page DOM and query it.
You might investigate browser automation and testing frameworks like Selenium for this.
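A minimal sketch of that direction with Selenium (the URL and polling interval are placeholders; you can still click around manually in the window Selenium opens while the loop polls):
import time
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("https://example.com/game")    # placeholder URL for the game

while True:
    html = driver.page_source             # snapshot of the current DOM
    driver.save_screenshot("frame.png")   # current rendered frame
    # ... feed html / frame.png to your agent here ...
    time.sleep(1)                         # placeholder polling interval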
I am not sure if this would work in your situation but it is possible to create a simple web browser using PyQt which will work with HTML5 and from this it might be possible to capture what is going on when a live user plays the game.
I have used PyQt for a simple browser window (for a completely different application) and it seems to handle simple, sample HTML5 games. How one would delve into the details of what is going on in the game is a question for PyQt experts, not me.
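For reference, a browser window of that kind really is only a few lines - a minimal sketch, assuming PyQt5 with the QtWebEngine module installed:
import sys
from PyQt5.QtCore import QUrl
from PyQt5.QtWidgets import QApplication
from PyQt5.QtWebEngineWidgets import QWebEngineView

app = QApplication(sys.argv)
view = QWebEngineView()
view.load(QUrl("https://example.com/game"))  # placeholder URL for the HTML5 game
view.show()
sys.exit(app.exec_())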
I want to create a special wiki page on my local Redmine server. It should contain an inventory of some executables from my server. My goal is a script which scans certain folders on my server for these files and put them (with some additional information) in a nice Redmine wiki page.
My first thought was to traverse my server's file system with a simple batch file and to create a SQL expression for putting the results directly into the underlying MySQL database (which contains Redmine's wiki pages). But I consider this too risky and too error-prone.
Then I had the idea of using a scripting language like Python (which I always wanted to learn) to retrieve the information and send it back to the Redmine server, like a web browser would do. This should be a much safer way. But this doesn't seem to be an easy beginner's task when just starting with Python - I fail to authenticate myself on the Redmine server.
My last idea was to create an HTML page with Python, which could be displayed within a Redmine wiki page with the plugin 'Redmine Wiki Extensions'. But I consider this only a 'light' solution, because it's not very elegant.
So what I seek is either a new idea to solve this problem or some clues on how to do proper authentication with Python on my Redmine server - maybe I could use a cookie for easier authentication...
I'm not familiar with Redmine, but if you are looking for a script that performs some actions the same way you would do them in a browser, then mechanize is a library that might be helpful for you, unless there's some JavaScript involved. In that case, I'd look into something like windmill or selenium to let you drive the web browser.
However, please note using web scraping is also error-prone since any change in the design of the web pages involved might break your scripts.
Regarding the option of using an API, as pointed out by the comment from AdamKG, that would be a good option, since there's a REST API that you can use from Python if you like. Unfortunately, I don't see anything that lets you do exactly what you're looking for, and it seems it hasn't reached stable status yet. Anyway, as I said, it's still a good option to consider in the future.
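For completeness, here is a rough sketch of what pushing a generated page through the REST API could look like with the requests library (the project identifier, page title and API key are assumptions, and whether the wiki part of the API is available depends on your Redmine version):
import requests

REDMINE_URL = "http://localhost:3000"   # your Redmine server
API_KEY = "your-api-key"                # from "My account" in Redmine
PROJECT = "myproject"                   # project identifier (assumption)
PAGE = "Executable_inventory"           # wiki page title (assumption)

wiki_text = "h1. Executable inventory\n\n* /usr/local/bin/foo\n* /usr/local/bin/bar"

response = requests.put(
    "%s/projects/%s/wiki/%s.json" % (REDMINE_URL, PROJECT, PAGE),
    json={"wiki_page": {"text": wiki_text}},
    headers={"X-Redmine-API-Key": API_KEY},
)
print(response.status_code)  # 2xx on success, 401 if the API key is wrong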
I want to have a browser page that updates some information on a timer or events. I'd like to use Python on the server side. It's quite simple, I don't need anything massively complex.
I can spend some time figuring out how to do all this the "AJAX way", but I'm sure someone has written a nice Python library to do all the heavy lifting. If you have used such a library please let me know the details.
Note: I saw how-to-implement-a-minimal-server-for-ajax-in-python but I want a library to hide the implementation details.
AJAX stands for Asynchronous JavaScript and XML. You don't need any special library other than the JavaScript support already in the browser to make AJAX calls. The AJAX requests come from the client-side JavaScript code and go to the server side, which in your case would be handled in Python.
You probably want to use the Django web framework.
Check out this tutorial on Django tips: A simple AJAX example.
Here is a simple client-side tutorial on XMLHttpRequest / AJAX
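As a rough sketch of the server side in Django (the view name and returned data are placeholders), the handler that the browser's timer polls can be as small as this:
# views.py -- minimal polling endpoint; the client calls it with XMLHttpRequest
import json
import time
from django.http import HttpResponse

def status(request):
    data = {"server_time": time.strftime("%H:%M:%S")}  # whatever you want to update
    return HttpResponse(json.dumps(data), content_type="application/json")
The JavaScript side then just requests that URL every few seconds and updates the page with the returned JSON.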
You can also write both the client and server side of the AJAX code using Python with Pyjamas:
Here's an RPC style server and simple example:
http://www.machine-envy.com/blog/2006/12/10/howto-pyjamas-pylons-json/
Lots of people use it with Django, but as the above example shows it will work fine with Pylons, and can be used with TurboGears2 just as easily.
I'm generally in favor of learning enough JavaScript to do this kind of thing yourself, but if your problem fits what Pyjamas can do, you'll get results from that very quickly and easily.
I suggest you implement the server part in Django, which is in my opinion a fantastic toolkit. Through Django, you produce your XML responses (although I suggest you use JSON instead, which is easier to handle on the web browser side).
Once you have something that generates your reply on the server side, you have to write the JavaScript that invokes it (through an asynchronous call), gets the result (in JSON) and uses it to do something clever on the DOM tree of the page. For this, you need a JavaScript library.
I have some experience with various JavaScript libraries for "Web 2.0". Scriptaculous is cool, and Dojo as well, but my absolute favourite is MochiKit, because they focus on a syntax which is very pythonic, so it hides the differences between JavaScript and Python quite well.