Recognise Malformed XML file [closed] - python

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
Improve this question
I am building a python script to run nightly, part of which involves invoking a drupal bulk operation to export an XML file. Since the process takes a number of hours, and the next step is to automatically import it to another source, I would like to perform some level of integrity checking.
My first thought would be to simply make sure that the XML is not malformed. I don't particularly want to start inspecting the data, I just want to make sure it's not truncated.
What process might I use to perform this malformed check. Is there an applicable XML library built into Python? I would prefer to keep the script as portable as possible, so if possible a built-in solution would be the most preferable.
Thanks for any advice.

If you want use python, you can consider using element tree
Load your .xml and try to parse. Any Exception means XML is malformed.

Related

programmatically migrating tests from self.assert to bare asserts [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
Improve this question
I have a relatively large test code base which I will migrate from nose to py.test. I would also like to take advantage of py.tests 'bare assert' functionality so that I'd need to make a lot of the following changes (for example):
self.assertEquals(a, b)
->
assert a == b
The code base is in practice too large for me to consider doing this by hand. With some git and sed magic I can get rid of about half of the self.asserts, but that still leaves me with an awful lot to do and the script is already getting somewhat complex.
It occurred to me that I'm probably not the first person to have done this. So: have any nice scripts to do this kind of thing? Or know of any nice tool that can programmatically refactor python (note: I'm aware of python-rope but to be honest at a glance that didn't seem particularly convenient)
You could use py.convert_unittest from the pycmd package for transforming the self.assert* alternatively. It doesn't deal with rewriting the inheritance, though.
Not sure it makes sense but you might also checkout the related pycmd hg repository and tweak the script, possibly submitting pull requests. If you like, i'd help factoring out the script into a new repo (also on github, if you prefer) and then advertise it so people with the same problem can start sharing efforts. As i am not using unittest myself for a longer time (surprise!) i don't have interest to drive this effort but i am willing to help along.

Is there an open source tool for finding the flow of a python program? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
Improve this question
Is there some tool that can provide the flow of a python program at a functional level (eg. function A called function B with args1 which in turn called function C with args2). If not, what could be a possible starting point to create it? I thought cProfile might be of some help, but it doesn't give the proper stack trace iirc. Is there a better solution than using pdb and parsing the stack trace and providing the result in a better format?
A very interesting project to visualize the program flow is pythontutor!
There are a number of Python visual debuggers that'll do what you want:
pudb (console visual debugger, open-source)
WinPDB (free, open-source)
PyCharm (shareware, free trial, cross-platform, not open source but probably has the best interface of the three)

Python utility to monitor website uptime (including resources) [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
Improve this question
I'd like to have a utility running to periodically check our websites to make sure they're up and responding. Python is my preferred quick utility environment.
I know I can ping the server with urllib2 or something, but I really want to test that all the resources are there and available as well (CSS, JS, images, etc). Something like what a browser does when it loads a page -- fetch the HTML, then fetch the resources required, and check for any 400 or 500 errors.
Is there some simple way to do this in Python? I could probably use regex to try to grab the resource URLs from the HTML, but I don't want to worry about whether I'm doing it wrong.
Is there a tool or trick that will do the hard work, or will I have to parse the HTML myself? Or am I going about this the wrong way?
For availability monitoring I'd recommend a 3rd party service like newrelic.com or site24x7.com.
If you want to roll your own (which isn't so hard if you have only basic needs) just use an HTML parser and iterate over the DOM to request your linked resources. Just don't use regexes.

XMPP server in python [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
Improve this question
I'm looking hard but I cannot find any XMPP server in python with the following features:
using epoll, just like http://www.gevent.org/
supporting BOSH
modular design
use little RAM/CPU for up to 1000 users
more important than the previous requirement: the CPU/RAM usage must be predictable
Prosody looks quite good feature-wise, but I don't know how many users it can support simultaneously and how it is performance-wise.
Could someone give me an idea?
For a rough idea of how Prosody is performance-wise, see this post on their ML. https://groups.google.com/d/topic/prosody-users/SlXpfwJfgY4/discussion
xmpp.org uses Prosody, any other questions? :P
btw, if you want to toy with it a little, you can always run prosody using luajit (didn't test that myself, but I'm fairly sure it would work). Expect at least 2-4x faster execution.
Look # ejabberd too.

Parsing HTML in Python [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
Improve this question
What's my best bet for parsing HTML if I can't use BeautifulSoup or lxml? I've got some code that uses SGMLlib but it's a bit low-level and it's now deprecated.
I would prefer if it could stomache a bit of malformed HTML although I'm pretty sure most of the input will be pretty clean.
Python has a native HTML parser, however the Tidy wrapper Nick suggested would probably be a solid choice as well. Tidy is a very common library, (written in C is it?)
Perhaps µTidylib will meet your needs?
You can install lxml and many other python modules easily and seamlessly on the Mac (OS X) using Pallet, which is the MacPorts official GUI
The module name is py27-lxml. Easy as 1,2,3.
http://www.xmlhack.com/read.php?item=1392
http://sourceforge.net/projects/pirxx/
http://pyxml.sourceforge.net/topics/
I don't have much experience with python, but I have used Xerces (from the Apache foundation) in the past and found it to be very useful. The learning curve isn't bad either, though I'm not coming from a python perspective. I suggest you consider it though. (The first two links I've included discuss python interfaces to Xerces and the last one is the first google hit on "python xml").
htql is good at handling malformed html:
http://htql.net/
html5lib is good:
http://code.google.com/p/html5lib/
Update: The link above is broken. A third-party mirror of above, can be accessed from https://github.com/html5lib/gcode-import

Categories