How to write attributes from an xml file to a text file - python

I am using ElementTree to get the attributes and elements I need from an XML file. The XML file is queried from MySQL.
I want to write out all the attributes and elements into a new text file using Python.
root = tree.getroot()
name = root.attrib['name']
country = root.find("country").text
I can see the results when I print them out.
I want to write the list of all the names and countries in the XML file to a file.

If you build a list of all the names from your XML, you can use these few lines of code to create a .txt file and write each name on its own line.
list_names = ["OLIVIA", "RUBY", "EMILY", "GRACE", "JESSICA"]
with open('listName.txt', 'w') as filehandle:
    # filehandle.writelines("%s\n" % name for name in list_names)
    filehandle.writelines("{0}\n".format(name) for name in list_names)
As @Parfait suggested, here is a solution that formats the string without the % operator.
Source : https://stackabuse.com/reading-and-writing-lists-to-a-file-in-python/
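To connect this back to the question, here is a minimal sketch that collects the names and countries with ElementTree and writes them out, one pair per line. The question does not show the XML, so the sample document, the `person` tag, the tab separator, and the helper names `extract_rows` and `write_rows` are all assumptions for illustration; adjust them to your real structure.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample; the real XML from the question is not shown.
SAMPLE_XML = """<people>
    <person name="OLIVIA"><country>UK</country></person>
    <person name="RUBY"><country>France</country></person>
</people>"""


def extract_rows(xml_text):
    """Return a (name, country) tuple for every <person> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for person in root.iter("person"):
        name = person.get("name", "")
        # findtext returns the default when the <country> child is missing
        country = person.findtext("country", default="")
        rows.append((name, country))
    return rows


def write_rows(rows, path):
    """Write one tab-separated 'name, country' line per row."""
    with open(path, "w", encoding="utf-8") as handle:
        for name, country in rows:
            handle.write("{0}\t{1}\n".format(name, country))


rows = extract_rows(SAMPLE_XML)
# A temporary directory keeps the demo from touching your working directory.
out_path = os.path.join(tempfile.mkdtemp(), "names_and_countries.txt")
write_rows(rows, out_path)

with open(out_path, encoding="utf-8") as handle:
    written = handle.read()
print(written)
```

With a file on disk you would start from `ET.parse(path).getroot()` instead of `ET.fromstring`, and pass your own output path to `write_rows`.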

Related

Listing text file names in a new text file for reading links: print shows the desired result, but saving to a file or a list does not

I was trying to extract links from many text files. To do that, I tried to build a list of the text file names in a separate file so that I could store the names in a list. When I print the names I get the output I want, but when I store them in a list or a new file, the output is encoded or not readable.
I marked the desired output in green and the undesired output with a red circle.
Can you help me get the desired output?
docs = []
with open('files.txt','r') as f:
    content = f.readlines()
    for doc in content:
        if '-' in doc:
            print(doc[101:])
            docs.append(doc[101:])
            #print(doc[101:])
            #print(type(doc))
print(docs)
This snippet might help you in fetching file names from the txt file
docs = []
with open('files.txt', 'r') as f:
    content = f.readlines()
    for doc in content:
        file_name = doc.rstrip().split(" ")[-1]
        docs.append(file_name)
print(docs)
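One possible reason the stored output looks unreadable: printing a list shows the repr of each string, so a trailing newline appears as a literal `\n` and the brackets and quotes are included. The sketch below strips the newline and prints the names one per line. The sample lines are hypothetical (the real files.txt is not shown), and `last_field` is an illustrative helper name.

```python
def last_field(line):
    """Return the last whitespace-separated field of a line, or '' if blank."""
    fields = line.split()
    return fields[-1] if fields else ""


# Hypothetical directory-listing lines standing in for files.txt.
sample_lines = [
    "-rw-r--r-- 1 user group 1024 Jan 01 12:00 links_a.txt\n",
    "-rw-r--r-- 1 user group 2048 Jan 01 12:05 links_b.txt\n",
    "\n",
]

# Keep only non-empty names; split() already drops the trailing newline.
docs = [name for name in map(last_field, sample_lines) if name]

# Joining with newlines prints the plain strings instead of the list repr.
print("\n".join(docs))
```

Calling `split()` with no argument also copes with runs of spaces, which `split(" ")` would turn into empty fields.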

Add XML element and save the file?

I'm pretty new to XML parsing in Python with minidom.
I have this XML:
<filelist>
<file id="1.jpg"></file>
</filelist>
I would like to add the following row, for example, and then save it to the same file:
<file id="2.jpg"></file>
I am doing the parsing using:
doc = minidom.parse('filelist.xml')
files = doc.getElementsByTagName('file')
for file in files:
    idFile = file.getAttribute("id")
    print(idFile)
How can I add that element and then save it to the same file?
Starting from your original code, the following steps add an element and save it back to the original file:
1. Create a new 'file' element
2. Set the 'id' attribute of the new 'file' element
3. Retrieve the document root ('filelist') node
4. Append the new 'file' element to the 'filelist' node
5. Write the updated XML back to the original file
The updated code follows, with comments numbered to match this list.
from xml.dom import minidom
doc = minidom.parse('filelist.xml')
# 1. Create a new 'file' element
new_file_element = doc.createElement('file')
# 2. Set the new 'file' element 'id' attribute
new_file_element.setAttribute('id', '2.jpg')
# 3. Retrieve the document root ('filelist') node
filelist_element = doc.documentElement
# 4. Append the new 'file' element to the 'filelist' node
filelist_element.appendChild(new_file_element)
files = doc.getElementsByTagName('file')
for file in files:
    idFile = file.getAttribute('id')
    print(idFile)
# 5. Write updated XML back to original file
with open('filelist.xml', 'w') as xml_file:
    doc.writexml(xml_file, encoding='utf-8')

Why is the element inserted into an XML file without a line break using Python?

I am using Python to process an XML file. I need to insert one line into the XML file, and the code looks like this:
xobj = ET.parse('/src/xxx.xml')
xroot = xobj.getroot()
filename = ET.Element("filename")
filename.text = xmlname
xroot.insert(0, filename)
tree = ET.ElementTree(xroot)
tree.write('/dst/xxx.xml')
It did insert the content into the original XML file, but not on its own line. My XML file becomes:
<filename>004228.xml</filename><object>
....
</object>
There should be a \n between </filename> and <object>, but this method does not add that line break. How can I make the output nicely formatted?
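ElementTree writes exactly the whitespace stored in each element's text and tail, and a newly created element has no tail, so nothing separates it from the next tag. One way to handle this, sketched below, is `ET.indent`, which is available from Python 3.9 and rewrites that whitespace in place. The sample root element is a hypothetical stand-in for the parsed /src/xxx.xml, whose full content is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the tree parsed from /src/xxx.xml.
root = ET.fromstring("<annotation><object><name>cat</name></object></annotation>")

# Insert the new element as the first child, as in the question.
filename = ET.Element("filename")
filename.text = "004228.xml"
root.insert(0, filename)

# ET.indent (Python 3.9+) adds newlines and indentation between elements.
ET.indent(root, space="    ")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

On older Python versions, setting `filename.tail = "\n"` before writing adds just the missing line break after the inserted element.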

CSV Writer only writing first line in file

So I have patent data I wish to store from an XML to a CSV file. I've been able to run my code through each iteration of the invention name, date, country, and patent number, but when I try to write the results into a CSV file something goes wrong.
The XML data looks like this (for one section of many):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE us-patent-grant SYSTEM "us-patent-grant-v42-2006-08-23.dtd" [ ]>
<us-patent-grant lang="EN" dtd-version="v4.2 2006-08-23" file="USD0584026-20090106.XML" status="PRODUCTION" id="us-patent-grant" country="US" date-produced="20081222" date-publ="20090106">
<us-bibliographic-data-grant>
<publication-reference>
<document-id>
<country>US</country>
<doc-number>D0584026</doc-number>
<kind>S1</kind>
<date>20090106</date>
</document-id>
</publication-reference>
My code for running through and writing these lines one-by-one is:
for xml_string in separated_xml(infile): # Calls the output of the separated and read file to parse the data
    soup = BeautifulSoup(xml_string, "lxml") # BeautifulSoup parses the data strings where the XML is converted to Unicode
    pub_ref = soup.findAll("publication-reference") # Beginning parsing at every instance of a publication
    lst = [] # Creating empty list to append into
    for info in pub_ref: # Looping over all instances of publication
        # The final loop finds every instance of invention name, patent number, date, and country to print and append into
        with open('./output.csv', 'wb') as f:
            writer = csv.writer(f, dialect = 'excel')
            for inv_name, pat_num, date_num, country in zip(soup.findAll("invention-title"), soup.findAll("doc-number"), soup.findAll("date"), soup.findAll("country")):
                #print(inv_name.text, pat_num.text, date_num.text, country.text)
                #lst.append((inv_name.text, pat_num.text, date_num.text, country.text))
                writer.writerow([inv_name.text, pat_num.text, date_num.text, country.text])
And lastly, the output in my .csv file is this:
"Content addressable information encapsulation, representation, and transfer",07475432,20090106,US
I'm unsure where the issue lies and I know I'm still quite a newbie at Python but can anyone find the problem?
You open the file in overwrite mode ('wb') inside a loop. On each iteration you erase what could have been previously written. The correct way is to open the file outside the loop:
...
with open('./output.csv', 'wb') as f:
    writer = csv.writer(f, dialect = 'excel')
    for info in pub_ref: # Looping over all instances of publication
        # The final loop finds every instance of invention name, patent number, date, and country to print and append into
        for inv_name, pat_num, date_num, country in zip(soup.findAll("invention-title"), soup.findAll("doc-number"), soup.findAll("date"), soup.findAll("country")):
            ...
The problem lies in this line: with open('./output.csv', 'wb') as f:
If you want to write all rows into a single file, use append mode ('a'). Using 'wb' overwrites the file every time it is opened, which is why you only get the last line.
Read more about the file mode here: https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files
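The 'wb' mode in the code above is the Python 2 idiom. On Python 3 the csv module expects a text-mode file opened with newline=''. Here is a minimal sketch of the "open once, write many rows" pattern. The rows are hypothetical stand-ins for the values extracted with BeautifulSoup, `write_records` is an illustrative name, and an in-memory buffer replaces the real file so the example runs anywhere.

```python
import csv
import io

# Hypothetical rows standing in for the values pulled from each patent record.
records = [
    ("Content addressable information encapsulation", "07475432", "20090106", "US"),
    ("Example design", "D0584026", "20090106", "US"),
]


def write_records(rows, handle):
    """Create a single writer and send every row through it."""
    writer = csv.writer(handle, dialect="excel")
    for row in rows:
        writer.writerow(row)


# With a real file on Python 3, use: open("output.csv", "w", newline="")
buffer = io.StringIO(newline="")
write_records(records, buffer)

lines = buffer.getvalue().splitlines()
print(len(lines))
```

Because the writer is created once, outside the row loop, each call to `writerow` adds a line instead of replacing the previous one.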

How to split each line from a file into its own string in Python?

I have a file with a bunch of information in the following format:
<name>New York Jets</name>
I am trying to get each line of the file into its own string. For example, I want this line to say "This is the roster for the New York Jets." I have this so far, but it has "This is the roster for the" for every single line. I think I have to use
inputString.split('\n')
but I'm not sure where to put it. This is what I have so far:
def summarizeData(filename):
    with open(filename,"r") as fo:
        for rec in fo:
            name=rec.split('>')[1].split('<')[0]
            print("Here is the roster for the %s." % (name))
I call summarizeData("NewYorkJets.txt"). So basically I am trying to split each line from the file to get it into its own string.
from xml.dom import minidom
xmldoc = minidom.parse('filename.txt')
itemlist = xmldoc.getElementsByTagName('item')
for s in itemlist:
    print(s.attributes['name'].value)
You can read a file that has tags like this and retrieve the values.
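If the file really is just one `<name>...</name>` tag per line, as in the question, a regular expression is another option that skips blank or malformed lines instead of raising an IndexError. This is a rough sketch: `roster_sentences` is an illustrative name, and the second team in the sample is made up for the demonstration.

```python
import re

# Non-greedy match of the text between <name> and </name>.
NAME_TAG = re.compile(r"<name>(.*?)</name>")


def roster_sentences(lines):
    """Build one sentence per line that contains a <name> tag."""
    sentences = []
    for line in lines:
        match = NAME_TAG.search(line)
        if match:  # lines without the tag are skipped
            team = match.group(1).strip()
            sentences.append("Here is the roster for the %s." % team)
    return sentences


# Hypothetical file contents; iterating over an open file yields lines like these.
sample = ["<name>New York Jets</name>\n", "\n", "<name>Buffalo Bills</name>\n"]
sentences = roster_sentences(sample)
for sentence in sentences:
    print(sentence)
```

Iterating over the open file already yields one line at a time, so no explicit `split('\n')` is needed; `roster_sentences(fo)` works directly on the file object.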
