Hello, I am having trouble with an XML file I am using. Whenever I try to get the msg tag, I get an error preventing me from accessing the data. Here is the code I have written so far.
from xml.dom import minidom
import smtplib
from email.mime.text import MIMEText
from datetime import datetime
def xml_data():
    f = open('C:\opidea_2.xml', 'r')
    data = f.read()
    f.close()
    dom = minidom.parseString(data)
    ic = dom.getElementsByTagName('logentry')
    dom = None
    content = ''
    for num in ic:
        xmlDate = num.getElementsByTagName('date')[0].firstChild.nodeValue
        content += xmlDate + '\n '
        xmlMsg = num.getElementsByTagName('msg')
        if xmlMsg != '' and len(xmlMsg) > 0:
            xmlMsgc = xmlMsg[0].firstChild.nodeValue
            content += " Comments: \n " + str(xmlMsg) + '\n\n'
        else:
            xmlMsgc = "No comment made."
            content += xmlMsgc
    print content

if __name__ == "__main__":
    xml_data()
Here is part of the xml if it helps.
<log>
<logentry
revision="33185">
<author>glv</author>
<date>2012-08-06T21:01:52.494219Z</date>
<paths>
<path
kind="file"
action="M">/branches/Patch_4_2_0_Branch/text.xml</path>
<path
kind="dir"
action="M">/branches/Patch_4_2_0_Branch</path>
</paths>
<msg>PATCH_BRANCH:N/A
BUG_NUMBER:N/A
FEATURE_AFFECTED:N/A
OVERVIEW:N/A
Adding the SVN log size requirement to the branch
</msg>
</logentry>
</log>
Now, when I use xmlMsg = num.getElementsByTagName('msg')[0].toxml() I can get the code to work; I just have to do a lot of replacing, and I would rather not do that. I also have the date working using xmlDate = num.getElementsByTagName('date')[0].firstChild.nodeValue.
Is there something I am missing or doing wrong? Here is the traceback as well.
Traceback (most recent call last):
File "C:\python\src\SVN_Email_copy.py", line 141, in <module>
xml_data ()
File "C:python\src\SVN_Email_copy.py", line 94, in xml_data
xmlMsg = num.getElementsByTagName('msg').firstChild.nodeValue
AttributeError: 'NodeList' object has no attribute 'firstChild'
I suggest a different approach. Below is a program that does what you want (I think...). It uses the ElementTree API instead of minidom. This simplifies things quite a bit.
You have posted several related questions concerning parsing of an XML file using minidom. I really think you should look into ElementTree (and for even more advanced stuff, check out ElementTree's "superset", lxml). Both these APIs are much easier to work with than minidom.
import xml.etree.ElementTree as ET
def xml_data():
    root = ET.parse("opidea_2.xml")
    logentries = root.findall("logentry")
    content = ""
    for logentry in logentries:
        date = logentry.find("date").text
        content += date + '\n '
        msg = logentry.find("msg")
        if msg is not None:
            content += " Comments: \n " + msg.text + '\n\n'
        else:
            content += "No comment made."
    print content

if __name__ == "__main__":
    xml_data()
Output when using your XML sample (you may want to work a bit more on the exact layout):
2012-08-06T21:01:52.494219Z
Comments:
PATCH_BRANCH:N/A
BUG_NUMBER:N/A
FEATURE_AFFECTED:N/A
OVERVIEW:N/A
Adding the SVN log size requirement to the branch
It seems I was doing the code wrong. Here is how I was able to solve it:
if len(xmlMsg) > 0 and xmlMsg[0].firstChild != None:
    xmlMsgc = xmlMsg[0].firstChild.nodeValue
    xmlMsgpbr = xmlMsgc.replace('\n', ' ')
    xmlMsgf.append(xmlMsgpbr)
else:
    xmlMsgf = "No comments made"
I never checked whether firstChild had any value. That's what I was missing. The other answers helped, but this is how I was able to get it to work. Thank you, guys.
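For completeness, a minimal sketch of that guard as a helper (the name get_msg_text is mine, not from the original script):
def get_msg_text(logentry):
    # Return the text of the first <msg> child, or a default when the
    # element is missing or has no text node.
    nodes = logentry.getElementsByTagName('msg')
    if len(nodes) > 0 and nodes[0].firstChild is not None:
        return nodes[0].firstChild.nodeValue
    return "No comments made"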
Maybe myNodeList.item(0)?
See http://docs.python.org/library/xml.dom.html
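In minidom, NodeList.item(0) returns the first node, or None when the list is empty, so the lookup can be guarded without an IndexError (a sketch reusing the asker's num loop variable):
msg_nodes = num.getElementsByTagName('msg')
first = msg_nodes.item(0)  # None instead of an IndexError when <msg> is absent
if first is not None and first.firstChild is not None:
    print first.firstChild.nodeValue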
Use this: print "%s" % (num.getElementsByTagName('date')[0].firstChild.data)
I am one step away from finishing a project. As far as I know all parts of the code work, and I have tested them separately. However, the output CSV still comes out empty for some reason. My code:
import requests, bs4, csv, sys
reload(sys)
sys.setdefaultencoding('utf-8')

url = 'http://www.constructeursdefrance.com/resultat/?dpt=01'
count = 1

def result():
    res = requests.get(url)
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    links = []
    try:
        for div in soup.select('.link'):
            link = div.a.get('href')
            links.append(link)
        with open('french.csv', 'wb') as file:
            writer = csv.writer(file)
            for i in links:
                res2 = requests.get(i)
                soup2 = bs4.BeautifulSoup(res2.text, 'html.parser')
                for each in soup2.select('li > strong'):
                    writer.writerow([each.text, each.next_sibling])
    except:
        pass

while not url.endswith('?dpt=010'):
    print 'downloading %s' % url
    result()
    count += 1
    url = 'http://www.constructeursdefrance.com/resultat/?dpt=0' + str(count)

url = 'http://www.constructeursdefrance.com/resultat/?dpt=10'
count = 10
while not url.endswith('?dpt=102'):
    print 'downloading %s' % url
    result()
    count += 1
    url = 'http://www.constructeursdefrance.com/resultat/?dpt=' + str(count)
print 'done'
This is one of the first bigger projects I am trying to solve as a beginner, and being so close yet so stuck is frustrating. Any help is appreciated.
First, do not wrap a large block in try/except; use it only around the small piece of code that can actually fail.
If you comment out your try/except statement, this error is raised:
Traceback (most recent call last):
File "/home/li/PycharmProjects/tw/1.py", line 29, in <module>
result()
File "/home/li/PycharmProjects/tw/1.py", line 26, in result
writer.writerow([each.text, each.next_sibling])
TypeError: a bytes-like object is required, not 'str'
This error message is clear: when writing to the file, a bytes-like object is required. Note that you opened the file in 'wb' mode, where 'b' means binary mode. So the problem is clear: just change the mode to text mode, which expects str-like objects:
with open('french.csv', 'w') as file:
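Putting both points together, here is a sketch of how the script might look (my restructuring, assuming Python 3, which is what the bytes-related TypeError indicates): the try/except shrinks to the one call that can realistically fail, and the CSV file is opened once in text mode, which also stops result() from truncating french.csv on every call:
import requests, bs4, csv

def result(url, writer):
    # Keep try/except narrow: only the network request is expected to fail.
    try:
        res = requests.get(url)
        res.raise_for_status()
    except requests.RequestException:
        return
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    for div in soup.select('.link'):
        res2 = requests.get(div.a.get('href'))
        soup2 = bs4.BeautifulSoup(res2.text, 'html.parser')
        for each in soup2.select('li > strong'):
            writer.writerow([each.text, each.next_sibling])

with open('french.csv', 'w') as f:  # text mode: csv.writer writes str
    writer = csv.writer(f)
    for dpt in range(1, 102):  # same departments as the two original loops
        url = 'http://www.constructeursdefrance.com/resultat/?dpt=%02d' % dpt
        print('downloading %s' % url)
        result(url, writer)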
import sys
import os
import urllib
from xml.etree.ElementTree import ElementTree
from xml.etree.ElementTree import tostring
import flickrapi
api_key = ' '
api_password = ' '
photo_id='2124494179'
flickr= flickrapi.FlickrAPI(api_key, api_password)
#photos= flickr.photos_getinfo(photo_id='15295705890')
#tree=ElementTree(flickr.photos_getinfo(photo_id))
#image_id=open('photoIds.txt','r')
#Image_data=open('imageinformation','w')
#e=image_id.readlines(10)
#f= [s.replace('\r\n', '') for s in e]
#num_of_lines=len(f)
#image_id.close()
#i=0
#while i<269846:
# term=f[i]
#try:
photoinfo=flickr.photos_getinfo(photo_id=photo_id)
photo_tree=ElementTree(photoinfo)
#photo_tree.write('photo_tree')
#i+=1
#photo=photo_tree.getroot()
#photodata=photo.getiterator()
#for elem in owner.getiterator():
#for elem in photo.getiterator():
for elem in photo_tree.getroot():
    farm = elem.attrib['farm']
    id = elem.attrib['id']
    server = elem.attrib['server']
    #title = photo_tree.find('title').txt
    #for child in elem.findall():
    #    username = child.attrib['username']
    #    location = child.attrib['location']
    #user = elem.attrib['username']
    print(farm)
    print(id)
    print(server)
#owner=photo_tree.findall('owner')
# print(username)
#filename="%s.txt"%(farm)
#f=open(filename,'w')
#f.write("%s"%farm)
#for elem in photo_tree.getiterator():
#for child in photo_tree.getiterator():
#print (child.attrib)
#owner=child.attrib['username']
I would like to read data from a file and pass it to a flickrapi method to get images' information recursively using Python, and save it in a file as text: image id=..., user name=..., location=..., tags=..., and so on. I could save the attributes of the first element by using .getroot(), but when I tried to get the attributes of other elements it returned an error. I want to save the attributes into a txt file and read the image ids from a file, so I can use these data in the algorithm I'm working on.
Since I figured out a way to solve the problem (I'm a beginner and know almost nothing about Python): what we need to do is iterate over the object (since it's not saved as an XML file) using tag names, as follows:
photo_tree = ElementTree(photoinfo)
for elem in photo_tree.getroot():
    uploaded = elem.attrib['dateuploaded']
    uploaded = datetime.datetime.fromtimestamp(float(uploaded)).strftime('%Y-%m-%d %H:%M:%S')
for elem in photo_tree.getiterator(tag='dates'):
    taken_date = elem.attrib['taken']
photo_info = open(head + 'filename/' + ('%d.txt') % (id), 'a')
photo_info.write(str(id) + '\t' + uploaded + '\t' + taken_date + '\t' + '\n')
May it help someone who is seeking a solution for the same problem. Or maybe there is a more efficient way to solve this issue!
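As a side note, getiterator() is deprecated in favor of iter() in newer Python versions; a minimal sketch of the same lookup with iter() (assuming the same photo_tree object):
# iter() walks the whole tree and yields every element with the given tag
for dates in photo_tree.iter('dates'):
    taken_date = dates.attrib['taken']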
I have Quickbird metadata in XML format:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<isd>
<IMD>
<VERSION>AA</VERSION>
<GENERATIONTIME>2008-01-04T18:36:17.000000Z</GENERATIONTIME>
<PRODUCTORDERID>005708443040_01_P001</PRODUCTORDERID>
<PRODUCTCATALOGID>901001001E9ED900</PRODUCTCATALOGID>
</IMD>
</isd>
I would like to convert the XML into text format as follows:
version = "AA"; generationTime = 2008-01-04T18:36:17.000000Z; productOrderId = "005708443040_01_P001"; productCatalogId = "901001001E9ED900"; childCatalogId = "202001001E9ED800";
I wrote the following Python code, but it didn't produce the result I expected:
from xml.dom import minidom
xmldoc = minidom.parse("image.XML")
isd = xmldoc.getElementsByTagName("isd")[0]
imds = isd.getElementsByTagName("IMD")
for imd in imds:
    print(imd)
Could you please help me how to do this task?
Thanks so much for your help.
This should print all the contents of the XML. (It doesn't convert to camel case as in your expected result, because there's no way to know which characters to keep in uppercase and which to move to lowercase.)
from xml.dom import minidom
xmldoc = minidom.parse("image.XML")
isd = xmldoc.getElementsByTagName("isd")[0]
imds = isd.getElementsByTagName("IMD")
for imd in imds:
    for child in imd.childNodes:
        if child.nodeType == minidom.Node.ELEMENT_NODE:
            print child.nodeName + ' = "' + child.childNodes[0].nodeValue + '"; ',
This will print:
VERSION = "AA"; GENERATIONTIME = "2008-01-04T18:36:17.000000Z"; PRODUCTORDERID = "005708443040_01_P001"; PRODUCTCATALOGID = "901001001E9ED900";
I got this script from a forum, and it keeps coming up with the following error:
Traceback (most recent call last):
File "test.py", line 42, in <module> main()
File "test.py", line 28, in main
bot_response = objektid[0].toxml()
IndexError: list index out of range
I have searched around for an answer to this, but I cannot relate the answers to my code, maybe because I am such a noob with Python.
The script is as follows.
#!/usr/bin/python -tt
# Have a conversation with a PandaBot AI
# Author A.Roots
import urllib, urllib2
import sys
from xml.dom import minidom
from xml.sax.saxutils import unescape
def main():
    human_input = raw_input('You: ')
    if human_input == 'exit':
        sys.exit(0)
    base_url = 'http://www.pandorabots.com/pandora/talk-xml'
    data = urllib.urlencode([('botid', 'ebbf27804e3458c5'), ('input', human_input)])
    # Submit POST data and download response XML
    req = urllib2.Request(base_url)
    fd = urllib2.urlopen(req, data)
    # Take Bot's response out of XML
    xmlFile = fd.read()
    dom = minidom.parseString(xmlFile)
    objektid = dom.getElementsByTagName('that')
    bot_response = objektid[0].toxml()
    bot_response = bot_response[6:]
    bot_response = bot_response[:-7]
    # Some nasty unescaping
    bot_response = unescape(bot_response, {"&apos;": "'", "&quot;": '"'})
    print 'Getter:', str(bot_response)
    # Repeat until terminated
    while 1:
        main()
if __name__ == '__main__':
    print 'Hi. You can now talk to Getter. Type "exit" when done.'
    main()
Your help on this is greatly appreciated
No element <that> was found:
objektid = dom.getElementsByTagName('that')
so the list is empty.
Testing your code, I get the message:
<result status="3" botid="ebbf27804e3458c5"><input>Hello world!</input><message>Failed to find bot</message></result>
which contains no such tags. The error message seems to indicate that the specific bot id you are using does not exist or no longer exists. Perhaps you need to sign up for a new bot of your own on the Pandorabots homepage?
I note that you are doing "Some nasty unescaping". Why not grab the text nodes under that tag instead, and let the DOM library take care of that for you?
You may want to look into the ElementTree API (included with Python) instead as it is easier to use.
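For instance, a minimal sketch of that idea with ElementTree (my illustration; it assumes the response XML contains a <that> element, as the original script expects):
import xml.etree.ElementTree as ET

dom = ET.fromstring(xmlFile)
that = dom.find('.//that')  # None if the element is missing
if that is not None and that.text:
    bot_response = that.text  # the parser has already unescaped entities
else:
    bot_response = ''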
The problem is here
objektid = dom.getElementsByTagName('that')
bot_response = objektid[0].toxml()
If dom.getElementsByTagName returns nothing at all, then objektid[0], the first element of objektid, will not exist. Hence the fault!
To get around it, do something like:
objektid = dom.getElementsByTagName('that')
if len(objektid) > 0:
    bot_response = objektid[0].toxml()
    bot_response = bot_response[6:]
    bot_response = bot_response[:-7]
    # Some nasty unescaping
    bot_response = unescape(bot_response, {"&apos;": "'", "&quot;": '"'})
else:
    bot_response = ""
print 'Getter:', str(bot_response)
I am trying to read all the links in the tag and then create wiki links out of them. Basically, I want to read each link from the XML file and then create a wiki link using the last word of the link (see below for what I mean by "last word"). For some reason I am running into the following error. What am I missing? Please advise.
http://wiki.build.com/ca_builds/CIT (last word is CIT)
http://wiki.build.com/ca_builds/1.2_Archive (last word is 1.2_Archive)
INPUT XML:-
<returnLink>
http://wiki.build.com/ca_builds/CIT
http://wiki.build.com/ca_builds/1.2_Archive
</returnLink>
PYTHON code
def getReturnLink(xml):
    """Collects the link to return to the PL home page from the config file."""
    if xml.find('<returnLink>') == -1:
        return None
    else:
        linkStart = xml.find('<returnLink>')
        linkEnd = xml.find('</returnLink>')
        link = xml[linkStart+12:linkEnd].strip()
        link = link.split('\n')
        #if link.find('.com') == -1:
        #    return None
        for line in link:
            line = line.strip()
            print "LINE"
            print line
            lastword = line.rfind('/') + 1
            line = '['+link+' lastword]<br>'
            linklis.append(line)
        return linklis
OUTPUT:-
line = '['+link+' lastword]<br>'
TypeError: cannot concatenate 'str' and 'list' objects
EXPECTED OUTPUT:-
CIT (this will point to http://wiki.build.com/ca_builds/CIT)
1.2_Archive (this will point to http://wiki.build.com/ca_builds/1.2_Archive)
The Python standard library has an XML parser. You can also support multiple <returnLink> elements and Unicode words in a URL:
import posixpath
import urllib
import urlparse
from xml.etree import cElementTree as etree

def get_word(url):
    basename = posixpath.basename(urlparse.urlsplit(url).path)
    return urllib.unquote(basename).decode("utf-8")

urls = (url.strip()
        for links in etree.parse(input_filename_or_file).iter('returnLink')
        for url in links.text.splitlines())
wikilinks = [u"[{} {}]".format(url, get_word(url))
             for url in urls if url]
print(wikilinks)
Note: work with Unicode internally. Convert the text to bytes only to communicate with the outside world, e.g., when writing to a file.
Example
[http://wiki.build.com/ca_builds/CIT#some-fragment CIT]
[http://wiki.build.com/ca_builds/Unicode%20%28%E2%99%A5%29 Unicode (♥)]
Instead of parsing XML by hand, use a library like lxml:
>>> s = """<returnLink>
... http://wiki.build.com/ca_builds/CIT
... http://wiki.build.com/ca_builds/1.2_Archive
... </returnLink>"""
>>> from lxml import etree
>>> xml_tree = etree.fromstring(s)
>>> links = xml_tree.text.split()
>>> for i in links:
... print '['+i+']'+i[i.rfind('/')+1:]
...
[http://wiki.build.com/ca_builds/CIT]CIT
[http://wiki.build.com/ca_builds/1.2_Archive]1.2_Archive
I'm not sure what you mean by wikilinks, but the above should give you an idea of how to parse the string.
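If the goal is the [url word] wiki-link form from the question's expected output, a small variation on the same session (my tweak, not part of the original answer) would be:
>>> for i in links:
...     print '[' + i + ' ' + i[i.rfind('/')+1:] + ']'
...
[http://wiki.build.com/ca_builds/CIT CIT]
[http://wiki.build.com/ca_builds/1.2_Archive 1.2_Archive]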
I'm having some difficulty understanding your question, but it seems like you just want to return the string after the last '/' character in the link? You can do this with reverse find.
return link[link.rfind('/') + 1:]
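For example, with one of the URLs from the question:
>>> link = 'http://wiki.build.com/ca_builds/CIT'
>>> link[link.rfind('/') + 1:]
'CIT'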