Python and XPath - python

I'm trying to parse this XML
I want to get a list of all of the mechanisms, so I'm trying to use XPath (please suggest if there's an easier way) to get the mechanisms...
Here is my code:
parseMessage = libxml2.parseDoc(doc)
xpathcon = parseMessage.xpathNewContext()
xpathcon.xpathRegisterNs('urn','http://etherx.jabber.org/streams')
nodes = xpathcon.xpathEval("//urn:text()")
print nodes
And here is the error I'm getting...
Entity: line 1: parser error : Premature end of data in tag stream line 1
h"/><register xmlns="http://jabber.org/features/iq-register"/></stream:features>
I know that my code doesn't extract all the mechanisms yet, but first I'd just like to get around the issue at hand. Is there any way to make this into correct XML that can be parsed? Do I need to add a header, remove a header, or do something else?

It looks like you're trying to build an XMPP library. Why not use an existing library, such as SleekXMPP?
If you really need to build your own XMPP library, you'll need to use a streaming parser, such as Expat.

Please use one of the existing XMPP libraries.
Next: you won't be successful with XMPP if you think of it as a document. The opening stream tag stays open for the whole session, which is exactly what the "Premature end of data in tag stream" error is complaining about. You'll be able to hack around it for a few days, making yourself believe that you're on to something, and then you'll realize that there is no way to tell when the server is done sending you information, so there's no way to know when to call what you have a document.
Instead, use a stream-based parser. SleekXMPP uses xml.etree.cElementTree.iterparse with a wrapper around the socket to make it smell like a file. There are likely other ways, like using xml.parsers.expat directly.
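As a rough illustration of the stream-based approach, here is a minimal sketch using xml.etree.ElementTree.XMLPullParser from the standard library (Python 3.4+). This is a swapped-in technique, not what SleekXMPP does: it feeds chunks to the parser by hand instead of wrapping the socket for iterparse (and xml.etree.cElementTree no longer exists in Python 3.9+). The sample chunks and the extract_mechanisms helper are made up for the example; in real code the chunks would come from socket reads.

```python
import xml.etree.ElementTree as ET

# Namespace of the SASL <mechanisms/> element in XMPP stream features.
SASL_NS = "urn:ietf:params:xml:ns:xmpp-sasl"


def extract_mechanisms(chunks):
    """Feed raw chunks to a pull parser and collect SASL mechanism names.

    The parser never needs to see </stream:stream>, so it works on a
    stream that is still open.
    """
    parser = ET.XMLPullParser(events=("end",))
    mechanisms = []
    for chunk in chunks:
        parser.feed(chunk)
        # read_events() yields only the elements completed so far.
        for _event, elem in parser.read_events():
            if elem.tag == "{%s}mechanism" % SASL_NS:
                mechanisms.append(elem.text)
    return mechanisms


# Hypothetical data standing in for successive socket reads. Note that
# the <stream:stream> element is never closed.
chunks = [
    b'<stream:stream xmlns:stream="http://etherx.jabber.org/streams" xmlns="jabber:client">',
    b'<stream:features><mechanisms xmlns="urn:ietf:params:xml:ns:xmpp-sasl">',
    b'<mechanism>DIGEST-MD5</mechanism><mechanism>PLAIN</mechanism>',
    b'</mechanisms></stream:features>',
]
print(extract_mechanisms(chunks))
```

Because the parser reports each element as soon as its closing tag arrives, you can react to stream:features without waiting for an end of document that never comes.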

Related

Python: direct requesting with requests.get('http.weatherdataexample.com/example.json')

I'm using Python 3.x and the requests library.
Is there a way to get a specific value by giving the whole path into a JSON file, without first downloading the whole JSON?
I have the following code to get weather data:
import requests
data=requests.get('http.weatherdata.com/example.json')
current_wind=data.json()['features'][48]['properties']['value']
Is there a way to request the "current wind" directly, somehow like the following? This could decrease the traffic caused by the request.
import requests
current_wind=requests.get('http.weatherdata.com/example.json',['features'][48]['properties']['value'])
I was looking for an answer to this specific question for quite a while and almost gave up.
Thanks for your answer.
Please be more accurate in the question and/or give examples.
But I think you mean something like this:
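import requests
a=requests.get('http.weatherdata.com/example.json')  # fetches the whole response
current_wind=a.json()['features'][48]['properties']['value']  # decode the JSON, then index into it
You have to assign the response to a variable first, then call .json() on it to decode the body before you can index into it.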
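If the goal is mainly to write the path in one place, a small helper can walk the decoded JSON for you. This is only a sketch: get_path is a made-up name, and the sample dictionary is a stand-in for what a.json() would return. It does not reduce traffic, since a plain GET still downloads the whole body unless the server itself offers some way to filter.

```python
from functools import reduce


def get_path(document, path):
    """Walk a decoded JSON document along a sequence of keys and indexes."""
    return reduce(lambda node, key: node[key], path, document)


# Hypothetical stand-in for the dictionary returned by a.json().
sample = {"features": [{"properties": {"value": 3.5}}]}

current_wind = get_path(sample, ["features", 0, "properties", "value"])
print(current_wind)
```

With the real response this would read get_path(a.json(), ['features', 48, 'properties', 'value']).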

Feedparser.parse etag and modified params

import feedparser
d = feedparser.parse('http://rss.cnn.com/rss/edition.rss', etag=d.etag)
I am new to Python and can't get my head around the parameter etag=d.etag.
I don't understand the data type. It's important to me because I am trying to build this parameter dynamically as a string, and it does not work. I printed type(d.etag) and the result is unicode, so I tried the unicode() function to form my string. Still no luck. Sorry, I realise this is so basic, I just can't get it.
I know that getting the etag to work is easy if you follow the examples from the feedparser site, where you make your first call without the parameter and then use etag=d.etag on each subsequent call. I am mainly learning on my iPad using Pythonista, so I am running my program over and over. I also know I could write the feed out to a file and parse the file instead, but I really want to understand why I can't create this parameter dynamically. I am sure I will hit the same problem with another module sooner or later.
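For what it's worth, here is a minimal sketch of keeping the etag between runs, on the assumption that the etag parameter is just the string the server sent earlier. Note that in the one-line snippet above, d has to exist already from an earlier call in the same run. The helper names and the etag file are made up for the example; feedparser is imported inside the function only so that the two file helpers work without it.

```python
import os


def load_etag(path):
    """Return the ETag saved by a previous run, or None if there is none."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as handle:
        return handle.read().strip() or None


def save_etag(path, etag):
    """Store the ETag string so the next run can send it back."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(etag)


def fetch_if_changed(url, etag_path):
    """Parse the feed, passing along the ETag from the previous run if any."""
    import feedparser  # third-party package

    result = feedparser.parse(url, etag=load_etag(etag_path))
    new_etag = result.get("etag")
    if new_etag:
        save_etag(etag_path, new_etag)
    return result
```

On a later run, fetch_if_changed('http://rss.cnn.com/rss/edition.rss', 'etag.txt') would send the stored value; if the server reports that nothing changed, the result should come back with no entries.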

Naming issue with REST wrapper hammock in Python

I'm using a REST wrapper in Python called Hammock. Its own description explains it better than I can: "Hammock is a fun module lets you deal with rest APIs by converting them into dead simple programmatic APIs. It uses popular requests module in backyard to provide full-fledged rest experience."
It will turn api.website/end/point/ into website.end.point, which makes working with the API pretty simple. The issue I've run into is when an endpoint contains a character that Python does not allow in names, '-' in this case (e.g. api.website/end-point/). Accessing an endpoint like this turns into website.end-point, which Python parses as the subtraction website.end - point rather than as a single attribute name.
I looked and '-' is a totally valid character to have in a REST endpoint name. Is there a way to allow this character, maybe the equivalent of a character escape or something? I think I could fix it in the inner code of the module, but figure that's probably a bad way to go about this. Any ideas?
I was able to fix this by using 'website("end-point")' instead of 'website.end-point'. I hope this helps someone else out.
https://github.com/kadirpekel/hammock/issues/20
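To show why the call syntax gets around the naming problem, here is a minimal sketch of the general pattern. This is not Hammock's actual code: PathBuilder is a made-up class that builds URLs from attribute access via __getattr__ and also accepts path segments as strings via __call__.

```python
class PathBuilder:
    """Toy URL builder: attributes and call arguments become path segments."""

    def __init__(self, url):
        self.url = url.rstrip("/")

    def __getattr__(self, name):
        # Only reached for attributes that do not exist on the object.
        if name.startswith("__"):
            raise AttributeError(name)
        return PathBuilder(self.url + "/" + name)

    def __call__(self, *segments):
        # Strings may contain characters that are not valid in identifiers.
        return PathBuilder("/".join([self.url, *map(str, segments)]))

    def __str__(self):
        return self.url


website = PathBuilder("https://api.website")
print(website.end.point)             # attribute style
print(website("end-point").items)    # call style for the awkward segment
```

Since the segment is passed as a string, it never has to be a valid Python identifier.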

restructuredtext: what is the best way of extracting bibliographic and other fields?

I am putting together a website where the content is maintained as reStructuredText that is then converted into HTML. I need more control than e.g. rst2html.py offers, so I am using my own Python script that uses things like
docutils.core.publish_parts(source, writer_name='html')
to create the html.
publish_parts() gives me useful parts like the title, body, etc. However, it seems I must look elsewhere to get the values of rst fields like
:Authors:
:version:
etc. For this, I have been using publish_doctree() as in
doctree = core.publish_doctree(source).asdom()
and then going through this recursively using getElementsByTagName() as in
doctree.getElementsByTagName('authors')
doctree.getElementsByTagName('version')
etc.
Using publish_doctree() to extract fields does the job, and that's good, but it does seem more convoluted than using e.g. publish_parts().
My question is simply whether this is the best recommended way of extracting out these rst fields, or is there a more direct and less convoluted way? If not, that is fine, but I thought I would inquire in case I am missing something.

Need example/help with GtkTextBuffer (of GtkTextView) serialize/deserialize

I am trying to save user's bold/italic/font/etc tags in a GtkTextView.
Using GtkTextBuffer.get_text() does not return the tags.
The best documentation I have found on this is:
http://www.pygtk.org/docs/pygtk/class-gtktextbuffer.html#method-gtktextbuffer--register-serialize-format
However, I do not understand the function arguments.
It would be infinitely handy to have an example of how these are used to save/load a textview with tags in it.
Edit: I would like to clarify what I am trying to accomplish. Basically, I want to save/load the textview's text plus tags. I have no desire to do anything more complicated than that. I am using pickle as the file format, so I don't need any help here on how to save it or in what format. I just need a way to pull/push the data so that the user loses nothing that he/she sees on screen. Thank you.
If you need to save the tags because you just want to copy the text into another text buffer, you can use gtk.TextBuffer.insert_range().
If you need to save the text with tags into another format readable by other programs, I once wrote a library with a GTK text buffer serializer to and from RTF. It doesn't have any Python bindings though. But in any case the code is a good example of how to use the serializer facility. Link: Osxcart
I haven't worked with GtkTextBuffer's serialization. Reading the documentation you linked, I would suggest trying the default serializer, by calling
textbuffer.register_serialize_tagset()
This gives you GTK+'s built-in proprietary serializer. Being proprietary here means that it doesn't serialize into some well-known format; but if all you need is the ability to save out the text buffer's contents and load them back, this should be fine.
Of course the source code is available inside GTK+ if you really want to figure out how it works; I would recommend against trying to implement e.g. a stand-alone de-serializer though, since there are probably no guarantees made by GTK+ that the format will remain as-is.
