attributes not appearing in xmltodict parse output even with xml_attribs = True - python

I am having a problem with Python's xmltodict. Following the near-consensus recommendation here, I tried xmltodict and liked it very much until I had to access attributes at the top level of my handler. I'm probably doing something wrong, but it's not clear to me what. I have an XML document that looks something like this:
<api>
  <cons id="79550" modified_dt="1526652449">
    <firstname>Mackenzie</firstname>
    ...
  </cons>
  <cons id="79551" modified_dt="1526652549">
    <firstname>Joe</firstname>
    ...
  </cons>
</api>
I parse it with this:
xmltodict.parse(apiResult.body, item_depth=2, item_callback=handler, xml_attribs=True)
where apiResult.body contains the xml shown above. But, in spite of the xml_attribs=True, I see no @id or @modified_dt in the output after parsing in the handler, although all the elements in the original do appear.
The handler is coded as follows:
def handler(_, cons):
    print(cons)
    mc = MatchChecker(cons)
    mc.check()
    return True
What might I be doing wrong?
I've also tried xmljson and immediately liked it less than xmltodict; I would stay with xmltodict if only I had a way around this issue. Does anyone have a solution to this problem, or a package that would handle this better?

xmltodict works just fine, but you are passing the argument item_depth=2, which means your handler will only see the elements inside the <cons> elements rather than the <cons> element itself.
import xmltodict

xml = """
<api>
<cons id="79550" modified_dt="1526652449">
<firstname>Mackenzie</firstname>
</cons>
</api>
"""

def handler(_, arg):
    # Print every (key, value) pair the callback receives
    for i in arg.items():
        print(i)
    return True

xmltodict.parse(xml, item_depth=2, item_callback=handler, xml_attribs=True)
Prints ('firstname', 'Mackenzie') as expected.
Whereas:
xmltodict.parse(xml, item_depth=1, item_callback=handler, xml_attribs=True)
Prints ('cons', OrderedDict([('@id', '79550'), ('@modified_dt', '1526652449'), ('firstname', 'Mackenzie')])), again as expected.
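Since the question also asks about other packages, here is a minimal sketch using the standard library's xml.etree.ElementTree, where each element's attributes are available through .attrib. The sample document mirrors the one in the question; the records list and the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

xml = """
<api>
  <cons id="79550" modified_dt="1526652449">
    <firstname>Mackenzie</firstname>
  </cons>
  <cons id="79551" modified_dt="1526652549">
    <firstname>Joe</firstname>
  </cons>
</api>
"""

root = ET.fromstring(xml)
records = []
for cons in root.iter('cons'):
    # Start from the element's attributes, then add the child text
    record = dict(cons.attrib)
    record['firstname'] = cons.findtext('firstname')
    records.append(record)
    print(record)
```

Each record here combines the attributes and the child element in one plain dict, which may be enough if the handler only needs flat access to both.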

Try/Except not working with BeautifulSoup

I am trying to loop over a series of pages and extract some info. However, in certain pages some exceptions occur and I need to deal with them. I created the following function to try to deal with them. See below:
def iferr(x):
    try:
        x
    except (Exception, TypeError, AttributeError) as e:
        pass
I intend to use as part of code like this:
articles = [[iferr(dp[0].find('span', class_='citation')),
             iferr(dp[0].find('div', class_='abstract')),
             iferr(dp[0].find('a', rel='nofollow')['href'])] for dp in data]
The idea is that if, for example, dp[0].find('a', rel='nofollow')['href'] leads to an error (fails), it will simply ignore it (fill it with a blank or a None).
However, whenever an error/exception occurs in one of the three elements, it does not 'pass'. It just tells me that the error has occurred. The errors it displays are ones I listed in the 'except' clause, which I assumed would be dealt with.
EDIT:
Per Michael's suggestion, I was able to see that, because of the order in which iferr processes things, the error is always raised before the try is reached. So I worked on a workaround:
def fndwoerr(d,x,y,z,h):
    try:
        if not h:
            d.find('x',y = 'z')
        else:
            d.find('x',y = 'z')['h']
    except (Exception, TypeError, AttributeError) as e:
        pass
...
articles = [[fndwoerr(dp[0], 'span', 'class_', 'citation', None),
             fndwoerr(dp[0], 'div', 'class_', 'abstract', None),
             fndwoerr(dp[0], 'a', 'rel', 'nofollow', 'href')] for dp in data]
Now it runs without prompting an error. However, everything returned becomes None. I am pretty sure it has to do with the way the parameters are entered. y should not be passed as a string to the find function, whereas z should. However, I pass both as strings when I call the function. How can I go about this?
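On the parameter question: inside fndwoerr the quotes around 'x', 'z' and 'h' turn them into literal strings, y = 'z' always passes a keyword literally named y, and the function has no return statement, so every call yields None. A minimal sketch of forwarding the values instead, using ** to build the keyword argument from a string. Here fake_find and find_with are hypothetical names, and fake_find only stands in for a find method so the snippet runs without BeautifulSoup.

```python
def fake_find(name, **attrs):
    # Hypothetical stand-in for a find() method: report what it was asked for
    return {'name': name, 'attrs': attrs}

def find_with(find, x, y, z):
    # x is passed as a variable (no quotes); {y: z} is expanded into a
    # keyword argument whose name is the string held in y
    return find(x, **{y: z})

result = find_with(fake_find, 'span', 'class_', 'citation')
print(result)
```

With a real element, the same call shape would presumably be d.find(x, **{y: z}), with the result returned from the helper.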
The example looks a bit strange, so it would be a good idea to improve the question so that we can reproduce your issue easily. You may want to read how to create a minimal, reproducible example.
The idea is that if, for example, dp[0].find('a',
rel='nofollow')['href'] leads to an error (fails), it will simply
ignore it (fill it with a blank or a None).
What about checking if element is available with an if-statement?
dp[0].find('a', rel='nofollow').get('href') if dp[0].find('a', rel='nofollow') else None
or with the walrus operator from Python 3.8:
l.get('href') if (l:=dp[0].find('a', rel='nofollow')) else None
Example
from bs4 import BeautifulSoup
soup = BeautifulSoup('<h1>This is a Heading</h1>', 'html.parser')
for e in soup.select('h1'):
    print(e.find('a').get('href') if e.find('a') else None)
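As for why the original iferr never catches anything: Python evaluates the argument expression before iferr is called, so the exception is raised outside the try block. One way around that, sketched below, is to pass a callable and invoke it inside the try. The safe helper and the sample row dictionary are hypothetical, and a plain dict stands in for the parsed page so the snippet runs without bs4.

```python
def safe(func, default=None):
    """Call func() and return default if the lookup fails."""
    try:
        return func()
    except (TypeError, AttributeError, KeyError):
        return default

# A missing element is simulated by a None entry, like find() returning None
row = {'citation': 'Smith 2020', 'link': None}

citation = safe(lambda: row['citation'])
# None['href'] raises TypeError, but only once safe() calls the lambda
href = safe(lambda: row['link']['href'])
print(citation, href)
```

The lambda delays evaluation until the call happens inside the try, which is the piece the original iferr(x) was missing.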

Setting NSSpeechSynthesizer mode from Python

I am using PyObjC bindings to try to get a spoken sound file from phonemes.
I figured out that I can turn speech into sound as follows:
import AppKit
ss = AppKit.NSSpeechSynthesizer.alloc().init()
ss.setVoice_('com.apple.speech.synthesis.voice.Alex')
ss.startSpeakingString_toURL_("Hello", AppKit.NSURL.fileURLWithPath_("hello.aiff"))
# then wait until ss.isSpeaking() returns False
Next for greater control I'd like to turn the text first into phonemes, and then speak them.
phonemes = ss.phonemesFromText_("Hello")
But now I'm stuck, because I know from the docs that to get startSpeakingString to accept phonemes as input, you first need to set NSSpeechSynthesizer.SpeechPropertyKey.Mode to "phoneme". And I think I'm supposed to use setObject_forProperty_error_ to set that.
There are two things I don't understand:
Where is NSSpeechSynthesizer.SpeechPropertyKey.Mode in PyObjC? I grepped the entire PyObjC directory and SpeechPropertyKey is not mentioned anywhere.
How do I use setObject_forProperty_error_ to set it? I think based on the docs that the first argument is the value to set (although it's called just "an object", so True in this case?), and the second is the key (would be phoneme in this case?), and finally there is an error callback. But I'm not sure how I'd pass those arguments in Python.
Where is NSSpeechSynthesizer.SpeechPropertyKey.Mode in PyObjC?
Nowhere.
How do I use setObject_forProperty_error_ to set it?
ss.setObject_forProperty_error_("PHON", "inpt", None)
"PHON" is the same as NSSpeechSynthesizer.SpeechPropertyKey.Mode.phoneme
"inpt" is the same as NSSpeechSynthesizer.SpeechPropertyKey.inputMode
It seems these are not defined anywhere in PyObjC, but I found them by firing up Xcode and writing a short Swift snippet:
import Foundation
import AppKit
let synth = NSSpeechSynthesizer()
let x = NSSpeechSynthesizer.SpeechPropertyKey.Mode.phoneme
let y = NSSpeechSynthesizer.SpeechPropertyKey.inputMode
Now looking at x and y in the debugger shows that they are the strings mentioned above.
As for how to call setObject_forProperty_error_, I simply tried passing in those strings and None as the error handler, and that worked.

Can you use mock_open to simulate serial connections?

Morning folks,
I'm trying to get a few unit tests going in Python to confirm my code is working, but I'm having a real hard time getting a Mock anything to fit into my test cases. I'm new to Python unit testing, so this has been a trying week thus far.
The summary of the program is I'm attempting to do serial control of a commercial monitor I got my hands on and I thought I'd use it as a chance to finally use Python for something rather than just falling back on one of the other languages I know. I've got pyserial going, but before I start shoving a ton of commands out to the TV I'd like to learn the unittest part so I can write for my expected outputs and inputs.
I've tried using a library called dummyserial, but it didn't seem to be recognising the output I was sending. I thought I'd give mock_open a try as I've seen it works like a standard IO as well, but it just isn't picking up on the calls either. Samples of the code involved:
def testSendCmd(self):
    powerCheck = '{0}{1:>4}\r'.format(SharpCodes['POWER'], SharpCodes['CHECK']).encode('utf-8')
    read_text = 'Stuff\r'
    mo = mock_open(read_data=read_text)
    mo.in_waiting = len(read_text)
    with patch('__main__.open', mo):
        with open('./serial', 'a+b') as com:
            tv = SharpTV(com=com, TVID=999, tvInput = 'DVI')
            tv.sendCmd(SharpCodes['POWER'], SharpCodes['CHECK'])
            com.write(b'some junk')
        print(mo.mock_calls)
        mo().write.assert_called_with('{0}{1:>4}\r'.format(SharpCodes['POWER'], SharpCodes['CHECK']).encode('utf-8'))
And in the SharpTV class, the function in question:
def sendCmd(self, type, msg):
    sent = self.com.write('{0}{1:>4}\r'.format(type,msg).encode('utf-8'))
    print('{0}{1:>4}\r'.format(type,msg).encode('utf-8'))
Obviously, I'm attempting to control a Sharp TV. I know the commands are correct, that isn't the issue. The issue is just the testing. According to documentation on the mock_open page, calling mo.mock_calls should return some data that a call was made, but I'm getting just an empty set of []'s even in spite of the blatantly wrong com.write(b'some junk'), and mo().write.assert_called_with(...) is returning with an assert error because it isn't detecting the write from within sendCmd. What's really bothering me is I can do the examples from the mock_open section in interactive mode and it works as expected.
I'm missing something, I just don't know what. I'd like help getting either dummyserial working, or mock_open.
To answer one part of my question, I figured out the functionality of dummyserial. The following works now:
def testSendCmd(self):
    powerCheck = '{0}{1:>4}\r'.format(SharpCodes['POWER'], SharpCodes['CHECK'])
    com = dummyserial.Serial(
        port='COM1',
        baudrate=9600,
        ds_responses={powerCheck : powerCheck}
    )
    tv = SharpTV(com=com, TVID=999, tvInput = 'DVI')
    tv.sendCmd(SharpCodes['POWER'], SharpCodes['CHECK'])
    self.assertEqual(tv.recv(), powerCheck)
Previously I was encoding the dictionary values as utf-8. The dummyserial library decodes whatever you write(...) to it so it's a straight string vs. string comparison. It also encodes whatever you're read()ing as latin1 on the way back out.
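For the mock_open half of the question, which the answer above leaves open: mock_open is built to stand in for open(), whereas the code under test only needs an object with a write() method. One possible approach, sketched here under assumptions, is to hand SharpTV a plain unittest.mock.MagicMock as com. The SharpTV class below is a pared-down, hypothetical stand-in for the real one, and 'POWR' and '?' are placeholder command strings, since the actual SharpCodes values are not shown.

```python
from unittest.mock import MagicMock

class SharpTV:
    """Hypothetical, pared-down stand-in for the SharpTV class in the question."""

    def __init__(self, com):
        self.com = com

    def sendCmd(self, type, msg):
        # Same formatting as the question: command, then the value
        # right-aligned in four characters, then a carriage return
        self.com.write('{0}{1:>4}\r'.format(type, msg).encode('utf-8'))

com = MagicMock()              # records every call made on it
tv = SharpTV(com=com)
tv.sendCmd('POWR', '?')        # placeholder command strings

expected = 'POWR   ?\r'.encode('utf-8')
com.write.assert_called_once_with(expected)
print(com.mock_calls)
```

Because the mock is injected directly, nothing has to be patched, so the test does not depend on which module's open name is being replaced.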

XML Attribute modification not being saved

I have the following code:
def incrCount(root):
    root.attrib['count'] = int(root.attrib['count']) + 1
    # root.set('count', int(root.attrib['count']) + 1)
root = getXMLRoot('test.xml')
incrCount(root)
print root.attrib['count']
when I run it, the correct value is printed but that change is never visible in the file at the end of execution. I have tried both methods above to no success. Can anyone point out where I made the mistake?
As shown in the documentation (19.7.1.4. Modifying an XML File), you need to write the tree back to the file after all modifications have been performed. Assuming that root references an instance of ElementTree, you can use the ElementTree.write() method for this purpose:
.....
root = getXMLRoot('test.xml')
incrCount(root)
print root.attrib['count']
root.write('test.xml')
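A self-contained sketch of the same round trip with xml.etree.ElementTree, written for Python 3. getXMLRoot is not shown in the question, so this assumes the file is loaded with ET.parse; the temporary file and the <items> element are made up for the demo. The counter is stored back as a string, because ElementTree refuses to serialize an int attribute value on write.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def incr_count(root):
    # Attribute values must be strings when the tree is serialized
    root.set('count', str(int(root.get('count')) + 1))

# Throwaway input file for the demo (hypothetical content)
path = os.path.join(tempfile.mkdtemp(), 'test.xml')
with open(path, 'w') as f:
    f.write('<items count="1"/>')

tree = ET.parse(path)        # keep the ElementTree, not just the root
incr_count(tree.getroot())
tree.write(path)             # persist the change back to disk

new_count = ET.parse(path).getroot().get('count')
print(new_count)
```

The key point is that write() lives on the ElementTree object, so the tree returned by parse has to be kept around alongside the root element.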

pdfquery/PyQuery: example code shows no AttributeError but mine does...why?

I'm following the example code found here. The author has some documentation where he lists the steps he used to write the program. When I run the whole program together it runs perfectly, but when I follow the steps he lists one at a time I get an AttributeError.
Here's my code
pdf = pdfquery.PDFQuery("Aberdeen_2015_1735t.pdf")
pdf.load()
pdf.tree.write("test3.xml", pretty_print=True, encoding="utf-8")
sept = pdf.pq('LTPage[pageid=\'1\'] LTTextLineHorizontal:contains("SEPTEMBER")')
print(sept.text())
x = float(sept.get('x0'))
y = float(sept.get('y0'))
cells = pdf.extract([
    ('with_parent', 'LTPage[pageid=\'1\']'),
    ('cells', 'LTTextLineHorizontal:in_bbox("%s,%s,%s,%s")' % (x, y, x+600, y+20))
])
Everything runs fine until it gets to "sept.get" where it says that "'PyQuery' object has no attribute 'get'." Does anyone know why the program wouldn't encounter this error when it's run all together but it occurs when a piece of the code is run?
According to the PyQuery API reference, a PyQuery object indeed doesn't have a get member. The code example must be obsolete.
According to https://pypi.python.org/pypi/pdfquery, attributes are retrieved with .attr:
x = float(sept.attr('x0'))
Judging by the history of pyquery's README.rst, get was never documented and only worked due to some side effect (some delegation to a dict, perhaps).
