I'm using Python 3.6 and I need to fill in a web form. Unfortunately, mechanize doesn't work on Python 3.
What do you suggest as an alternative to mechanize?
Selenium RC with the selenium Python bindings is an alternative (and one of the few workable options if the pages you need to scrape rely on JavaScript in an important, "structural" way, especially AJAX-y pages, since Mechanize doesn't execute the JavaScript on the pages it scrapes).
For scraping and form handling you can use lxml.html (though it doesn't automate fetching and cookies).
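For illustration, a minimal sketch of lxml.html form handling, paired with requests to do the fetching and cookie handling that lxml leaves out (the URL and field names below are placeholders, and the sketch assumes the form submits via POST):

```python
# Hypothetical URL and field names; requests keeps cookies across calls.
from urllib.parse import urljoin
import requests
import lxml.html

session = requests.Session()
page = session.get("https://example.com/login")
doc = lxml.html.fromstring(page.text)

form = doc.forms[0]                 # first <form> on the page
form.fields["username"] = "alice"   # fill named inputs in place
form.fields["password"] = "secret"

# Submit the form's successful field values to its action URL
# (assuming a POST form).
action = urljoin(page.url, form.action or "")
response = session.post(action, data=dict(form.form_values()))
print(response.status_code)
```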
Scrapy is a framework built specifically for scraping.
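A minimal Scrapy spider that submits a form might look like the sketch below; the start URL, form field, and result selector are placeholders, not anything specific to your site:

```python
# Hypothetical URL, form field, and selector -- adjust for the real site.
import scrapy

class FormSpider(scrapy.Spider):
    name = "form"
    start_urls = ["https://example.com/form"]

    def parse(self, response):
        # FormRequest.from_response reuses the page's own <form> markup,
        # so hidden fields and the action URL come along for free.
        yield scrapy.FormRequest.from_response(
            response,
            formdata={"query": "hello"},
            callback=self.after_submit,
        )

    def after_submit(self, response):
        yield {"result": response.css("#result::text").get()}
```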
I've had success with Splinter, a solution built on top of Selenium that provides a more Pythonic API.
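For a flavor of that API, a short sketch (assuming the Firefox driver is installed; the URL and field names are placeholders):

```python
# Hypothetical URL and field names; the context manager quits the browser.
from splinter import Browser

with Browser("firefox") as browser:
    browser.visit("https://example.com/form")
    browser.fill("query", "hello")            # fill an input by its name
    browser.find_by_name("submit").click()    # press the submit button
    if browser.is_text_present("Results"):
        print(browser.html[:200])             # raw HTML of the result page
```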
I've used twill for a lot of my testing needs. It works as a stand-alone language for "web browsing" or as a library from Python. It actually uses Mechanize under the hood, so I'm not sure whether it will meet your needs -- are you running into problems intrinsic to Mechanize, or would you benefit from a higher-level layer?
Try zope.testbrowser: http://pypi.python.org/pypi?:action=display&name=zope.testbrowser
scrapelib is another option: https://github.com/sunlightlabs/scrapelib
There's a website with a form. I would like to automate entering values into the form and checking the results, and I want to scan quite a lot of form combinations (which is why I don't want to do this manually). Unfortunately, I was unable to automate it using cURL because of some heavy cookie usage.
I thought that maybe I could use a real browser to do the automation for me. I was thinking of PhantomJS and Selenium (I haven't used Selenium so far). Does Selenium drive a real browser? That would be good, since a real browser would handle all the cookie stuff.
In short: I have a bunch of Python dictionaries that should be used to fill the website's form. After filling the form, I want to scan the HTML to retrieve the result. Afterwards, I'll summarize everything (that step will be easy). Does Selenium suit my needs? Can you recommend something better?
Yes, Selenium drives an actual browser; it opens a real browser window when you run it.
PhantomJS is a headless WebKit browser that Selenium can drive; it runs in the background without showing an actual browser window.
If you can't use urllib, requests, or mechanize, then yes, your best shot is Selenium.
For the HTML parsing I recommend BeautifulSoup; it's really easy to use and will get you all the info you need.
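To make that concrete, here is a rough sketch of the loop you describe: fill the form from each dictionary with Selenium, then parse the result page with BeautifulSoup. The URL, field names, and result selector are placeholders:

```python
# Hypothetical URL, field names, and selector -- adapt to the real form.
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

form_inputs = [{"query": "foo"}, {"query": "bar"}]  # your dictionaries

driver = webdriver.Firefox()   # a real browser, so cookies just work
results = []
for values in form_inputs:
    driver.get("https://example.com/form")
    for name, value in values.items():
        field = driver.find_element(By.NAME, name)
        field.clear()
        field.send_keys(value)
    driver.find_element(By.NAME, "submit").click()

    # Hand the rendered page over to BeautifulSoup for scanning.
    soup = BeautifulSoup(driver.page_source, "html.parser")
    results.append(soup.select_one("#result").get_text(strip=True))

driver.quit()
print(results)
```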
I'd like to have a utility running to periodically check our websites to make sure they're up and responding. Python is my preferred quick utility environment.
I know I can ping the server with urllib2 or something, but I really want to test that all the resources are there and available as well (CSS, JS, images, etc). Something like what a browser does when it loads a page -- fetch the HTML, then fetch the resources required, and check for any 400 or 500 errors.
Is there some simple way to do this in Python? I could probably use regex to try to grab the resource URLs from the HTML, but I don't want to worry about whether I'm doing it wrong.
Is there a tool or trick that will do the hard work, or will I have to parse the HTML myself? Or am I going about this the wrong way?
For availability monitoring, I'd recommend a third-party service like newrelic.com or site24x7.com.
If you want to roll your own (which isn't so hard if you only have basic needs), use an HTML parser and iterate over the DOM to request each linked resource. Just don't use regexes.
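As a sketch of that approach, using requests and BeautifulSoup (the page URL is a placeholder; the tags and attributes checked are the common ones a browser fetches):

```python
# Hypothetical page URL; checks every <img src>, <script src>, <link href>.
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup

PAGE = "https://example.com/"

page = requests.get(PAGE, timeout=10)
page.raise_for_status()
soup = BeautifulSoup(page.text, "html.parser")

# Collect the URLs a browser would fetch for this page.
resources = set()
for tag, attr in (("img", "src"), ("script", "src"), ("link", "href")):
    for node in soup.find_all(tag):
        if node.get(attr):
            resources.add(urljoin(PAGE, node[attr]))

# Any 4xx/5xx response means a missing or broken resource.
for url in sorted(resources):
    status = requests.get(url, timeout=10).status_code
    if status >= 400:
        print("BROKEN (%d): %s" % (status, url))
```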
I wonder if there is an open source CMS coded in Python that is as big as Drupal (or Joomla or WordPress).
You are looking for Django. (Edited to say: OK, it's a web application framework, but there's lots of overlap. Django-CMS is maybe more what you want, but it is in no way as big as Drupal or Joomla.)
Plone is an open-source, Python-powered CMS.
Check out web2py; it comes with tons of useful stuff.
I'm looking for similar alternatives and have found http://www.lfcproject.com/blog/release-10-final. At least they just had a release; it looks promising.
Btw, development of Django-CMS looks either dead or stale :\
Tendenci was recently released to the open source community and is written in Python on the Django framework.
Tendenci comes with a ton of features that typically require plugins or modules in other CMSes: membership management, selective permissions, event registration and an event calendar, a jobs board, video and photo galleries, etc.
You can download Tendenci on Github at https://github.com/tendenci/tendenci.
Does anyone know of a Python class similar to Java Robot?
Specifically I would like to perform a screen grab in Ubuntu, and eventually track mouse clicks and keyboard presses (although that's a slightly different question).
If you have GTK, you can use the gtk.gdk.Display class to do most of the work. It controls keyboard/mouse pointer grabs and a set of gtk.gdk.Screen objects.
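For the screen grab itself, the classic recipe with the legacy PyGTK 2 bindings (Python 2 era) looks roughly like this:

```python
# Legacy PyGTK 2 (Python 2) sketch: copy the root window into a pixbuf.
import gtk.gdk

window = gtk.gdk.get_default_root_window()    # the whole screen
width, height = window.get_size()
pixbuf = gtk.gdk.Pixbuf(gtk.gdk.COLORSPACE_RGB, False, 8, width, height)
pixbuf = pixbuf.get_from_drawable(window, window.get_colormap(),
                                  0, 0, 0, 0, width, height)
if pixbuf is not None:
    pixbuf.save("screenshot.png", "png")
```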
Check out GNU LDTP:
GNU/Linux Desktop Testing Project (GNU LDTP) is aimed at producing a high quality test automation framework [...]
See especially "Writing LDTP test scripts in the Python scripting language".
As far as the screen grab, see this answer. That worked for me. Other answers to the same question might be of interest as well.
Check out Robot Framework. I don't know whether it will do the same things as Java's Robot, or more, but it is easy and very flexible to use.
What's my best bet for parsing HTML if I can't use BeautifulSoup or lxml? I've got some code that uses sgmllib, but it's a bit low-level and it's now deprecated.
I would prefer something that can stomach a bit of malformed HTML, although I'm pretty sure most of the input will be fairly clean.
Python has a native HTML parser (the HTMLParser module, html.parser in Python 3); however, the Tidy wrapper Nick suggested would probably be a solid choice as well. Tidy is a very common library (written in C, I believe).
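For reference, a minimal sketch of the standard-library parser; you subclass it and override the handler callbacks:

```python
# Standard library only; in Python 2 the module is named HTMLParser.
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen in the stream."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

parser = LinkCollector()
# Note the unquoted attribute: the parser tolerates mildly sloppy HTML.
parser.feed("<p>See <a href='/docs'>the docs</a> and <a href=/faq>FAQ</a>.")
print(parser.links)  # ['/docs', '/faq']
```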
Perhaps µTidylib will meet your needs?
You can install lxml and many other Python modules easily and seamlessly on the Mac (OS X) using Pallet, the official MacPorts GUI.
The module name is py27-lxml. Easy as 1, 2, 3.
http://www.xmlhack.com/read.php?item=1392
http://sourceforge.net/projects/pirxx/
http://pyxml.sourceforge.net/topics/
I don't have much experience with Python, but I have used Xerces (from the Apache Foundation) in the past and found it very useful. The learning curve isn't bad either, though I'm not coming at it from a Python perspective. I suggest you consider it. (The first two links above discuss Python interfaces to Xerces; the last one is the first Google hit for "python xml".)
HTQL is good at handling malformed HTML:
http://htql.net/
html5lib is good:
http://code.google.com/p/html5lib/
Update: the link above is broken. A third-party mirror of the above can be accessed at https://github.com/html5lib/gcode-import
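A tiny example of why html5lib suits this question: it parses the way browsers do, so malformed markup is repaired rather than rejected:

```python
# html5lib builds the same tree a browser would from broken markup.
import html5lib

doc = html5lib.parse("<p>unclosed <b>tags", namespaceHTMLElements=False)
print(doc.find(".//b").text)  # -> "tags"
```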