Parsing HTML in Python [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
What's my best bet for parsing HTML if I can't use BeautifulSoup or lxml? I've got some code that uses sgmllib, but it's a bit low-level and it's now deprecated.
I would prefer it if it could stomach a bit of malformed HTML, although I'm pretty sure most of the input will be pretty clean.

Python has a native HTML parser (HTMLParser in Python 2, html.parser in Python 3); however, the Tidy wrapper Nick suggested would probably be a solid choice as well. Tidy is a very common library (written in C, I believe).
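
As a rough illustration of the native parser mentioned above, here is a minimal sketch using html.parser from the Python 3 standard library. The LinkCollector class and extract_links function are hypothetical names made up for this example; only HTMLParser and its handle_* callbacks come from the library. The parser is event-based and tolerant of things like unquoted attribute values and unclosed tags, but it does not repair the document the way Tidy or html5lib would.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs from <a> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside an <a> that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) tuples found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Calling extract_links on a string of markup returns the links in document order. Anchors that are never closed are skipped by this sketch, since a link is only recorded when its end tag is seen.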

Perhaps µTidylib will meet your needs?

You can install lxml and many other Python modules easily and seamlessly on the Mac (OS X) using Pallet, which is the official MacPorts GUI.
The module name is py27-lxml. Easy as 1, 2, 3.

http://www.xmlhack.com/read.php?item=1392
http://sourceforge.net/projects/pirxx/
http://pyxml.sourceforge.net/topics/
I don't have much experience with Python, but I have used Xerces (from the Apache foundation) in the past and found it to be very useful. The learning curve isn't bad either, though I'm not coming from a Python perspective. I suggest you consider it. (The first two links I've included discuss Python interfaces to Xerces, and the last one is the first Google hit for "python xml".)

htql is good at handling malformed html:
http://htql.net/

html5lib is good:
http://code.google.com/p/html5lib/
Update: The link above is broken. A third-party mirror can be accessed at https://github.com/html5lib/gcode-import

Related

programmatically migrating tests from self.assert to bare asserts [closed]

Closed 8 years ago.
I have a relatively large test code base which I will migrate from nose to py.test. I would also like to take advantage of py.test's 'bare assert' functionality, so I'd need to make a lot of changes like the following:
self.assertEquals(a, b)
->
assert a == b
The code base is in practice too large for me to consider doing this by hand. With some git and sed magic I can get rid of about half of the self.asserts, but that still leaves me with an awful lot to do and the script is already getting somewhat complex.
It occurred to me that I'm probably not the first person to have done this. So: does anyone have any nice scripts to do this kind of thing? Or know of any nice tool that can programmatically refactor Python? (Note: I'm aware of python-rope, but to be honest, at a glance it didn't seem particularly convenient.)

You could use py.convert_unittest from the pycmd package to transform the self.assert* calls. It doesn't deal with rewriting the inheritance, though.
Not sure it makes sense, but you might also check out the related pycmd hg repository and tweak the script, possibly submitting pull requests. If you like, I'd help factor the script out into a new repo (also on GitHub, if you prefer) and then advertise it so people with the same problem can start sharing efforts. As I have not used unittest myself for a long time (surprise!), I have no interest in driving this effort, but I am willing to help along.
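
For the simple one-line cases that sed struggles with, a rewrite of this kind can be sketched with the standard-library ast module. This is not how py.convert_unittest works; it is a separate, minimal illustration. The rewrite_line function and the BINARY mapping are hypothetical names for this example, ast.unparse needs Python 3.9 or later, and the sketch ignores multi-line calls, msg arguments, and drops any trailing comment on a rewritten line.

```python
import ast

# Hypothetical mapping from unittest method names to comparison operators.
BINARY = {
    "assertEqual": "==",
    "assertEquals": "==",
    "assertNotEqual": "!=",
    "assertIs": "is",
    "assertIsNot": "is not",
    "assertIn": "in",
    "assertNotIn": "not in",
}

# Node types that are safe to use as operands without extra parentheses.
ATOMS = (ast.Name, ast.Constant, ast.Attribute, ast.Call, ast.Subscript,
         ast.List, ast.Dict, ast.Set)


def _operand(node):
    """Unparse a node, adding parentheses unless it is a simple atom."""
    text = ast.unparse(node)
    return text if isinstance(node, ATOMS) else "(" + text + ")"


def rewrite_line(line):
    """Turn a one-line self.assert* call into a bare assert.

    Lines that are not a recognised call are returned unchanged.
    """
    body = line.strip()
    indent = line[: len(line) - len(line.lstrip())]
    ending = line[len(line.rstrip()):]
    try:
        tree = ast.parse(body)
    except SyntaxError:
        return line
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.Expr):
        return line
    call = tree.body[0].value
    if not (isinstance(call, ast.Call)
            and isinstance(call.func, ast.Attribute)
            and isinstance(call.func.value, ast.Name)
            and call.func.value.id == "self"
            and not call.keywords):
        return line
    name, args = call.func.attr, call.args
    if name in BINARY and len(args) == 2:
        expr = "{} {} {}".format(_operand(args[0]), BINARY[name], _operand(args[1]))
    elif name == "assertTrue" and len(args) == 1:
        expr = ast.unparse(args[0])
    elif name == "assertFalse" and len(args) == 1:
        expr = "not " + _operand(args[0])
    else:
        return line
    return indent + "assert " + expr + ending
```

Parsing each line rather than pattern-matching it means nested parentheses and commas inside arguments are handled by the parser, and operands such as a or b get parenthesised so the rewritten comparison keeps its meaning. Anything the sketch does not recognise is left untouched for manual review.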

The best way to document python code [closed]

Closed 9 years ago.
I'm starting to work on documentation for Python 3.3 modules, but I'm a beginner in this field.
I would be very grateful if you could help me choose a good tool for this.
I have read a lot of topics on the Internet and found that the most popular tools are Sphinx and Epydoc. But which of them is better to use? Almost all the discussions I found are quite old, and I'm sure the situation has changed since 2011. Maybe somebody here is a pro at writing docs for Python. Please help me take the first steps.

I cannot imagine more useful and helpful material than:
PEP8 Comments section
PEP8 Documentation Strings section
Documentation thread of The Hitchhiker’s Guide to Python
And, yes:
Epydoc is discontinued. Use Sphinx instead.
Hope that helps.
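
As a starting point, here is a small sketch of a docstring written with reStructuredText field lists, which Sphinx's autodoc extension can render. The scale function is a hypothetical example, not taken from any of the resources above; the :param:, :type:, :returns: and :rtype: fields are the ones Sphinx recognises for Python objects.

```python
def scale(values, factor=2.0):
    """Multiply every number in *values* by *factor*.

    :param values: Numbers to scale.
    :type values: list[float]
    :param factor: Multiplier applied to each element.
    :type factor: float
    :returns: A new list with the scaled numbers.
    :rtype: list[float]
    """
    # The input list is left untouched; a new list is returned.
    return [v * factor for v in values]
```

The first line is a one-sentence summary followed by a blank line, as the PEP 8 and docstring conventions listed above recommend, so the same docstring also reads well in help() output.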

XMPP server in python [closed]

Closed 8 years ago.
I'm looking hard but I cannot find any XMPP server in python with the following features:
using epoll, just like http://www.gevent.org/
supporting BOSH
modular design
use little RAM/CPU for up to 1000 users
more important than the previous requirement: the CPU/RAM usage must be predictable
Prosody looks quite good feature-wise, but I don't know how many users it can support simultaneously and how it is performance-wise.
Could someone give me an idea?

For a rough idea of how Prosody performs, see this post on their mailing list: https://groups.google.com/d/topic/prosody-users/SlXpfwJfgY4/discussion

xmpp.org uses Prosody, any other questions? :P
Btw, if you want to toy with it a little, you can always run Prosody using LuaJIT (I didn't test that myself, but I'm fairly sure it would work). Expect at least 2-4x faster execution.
Look at ejabberd too.

Are there any alternatives to Mechanize in Python? [closed]

Closed 8 years ago.
I'm using Python 3.6 and I have to fill in a form. Unfortunately, mechanize doesn't work on Python 3.
What do you suggest as an alternative to mechanize?

SeleniumRC with selenium.py is an alternative (and one of the few workable options if the pages you need to scrape have an important, "structural" role for JavaScript operations, esp. AJAX-y ones, since Mechanize doesn't execute the JavaScript on the pages it's scraping).

For scraping and form handling you can use lxml.html (it doesn't automate fetching and cookies though).
scrapy is a library specifically for scraping.
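
Since lxml.html does not handle fetching and cookies, that part can be covered by the Python 3 standard library. The following is a minimal sketch, assuming the form is submitted as an ordinary URL-encoded POST; make_opener and build_form_request are hypothetical helper names, and the URL and field names are placeholders.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_opener():
    """Return (opener, jar): an opener that keeps cookies between requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_form_request(action_url, fields):
    """Build a POST request carrying the form fields, URL-encoded."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        action_url,
        data=data,  # supplying data makes urllib send a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


opener, jar = make_opener()
# Placeholder URL and field names, for illustration only.
request = build_form_request(
    "https://example.com/login", {"user": "alice", "password": "secret"}
)
# response = opener.open(request)  # uncomment to actually send the request
```

Reusing the same opener for later requests sends back any cookies the server set, which is roughly what mechanize does behind the scenes. Pages that build their forms with JavaScript still need a browser-driven tool such as Selenium.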

I've been successful with Splinter, a solution built on top of Selenium that provides a more Pythonic API.

I've used twill for a lot of my testing needs. It works as a stand-alone language for "web browsing" or as a library from Python. It actually uses Mechanize under the hood, so I'm not sure if it will meet your needs -- are you encountering problems intrinsic to Mechanize, or would you benefit from a high level layer?

Try zope.testbrowser:
http://pypi.python.org/pypi?:action=display&name=zope.testbrowser

scrapelib is another option: https://github.com/sunlightlabs/scrapelib

Is there a Python equivalent to Java's AWT Robot class? [closed]

Closed 9 years ago.
Does anyone know of a Python class similar to Java Robot?
Specifically I would like to perform a screen grab in Ubuntu, and eventually track mouse clicks and keyboard presses (although that's a slightly different question).

If you have GTK, then you can use the gtk.gdk.Display class to do most of the work. It controls the keyboard/mouse pointer grabs and a set of gtk.gdk.Screen objects.

Check out GNU LDTP:
"GNU/Linux Desktop Testing Project (GNU LDTP) is aimed at producing high quality test automation framework [...]"
Especially: Writing LDTP test scripts in Python scripting language

As far as the screen grab, see this answer. That worked for me. Other answers to the same question might be of interest as well.

Check out Robot Framework. I do not know whether it does the same things as Java's Robot class or more, but it is easy and very flexible to use.
