Writing Python XML ElementTree output to CSV - python

TL;DR
I'm now able to output the information I want in the CSV but I'm just repeating the last XML file's data over and over again.
This is the latest version of the script:
import csv
import glob
import xml.etree.ElementTree as ET
filenames = glob.glob("..\Lib\macros\*.xml")
for filename in filenames:
    with open(filename, 'r') as content:
        element = ET.parse(content)
        root = element.getroot()
        print(root.attrib, filename)
        e = element.findall('commands/MatrixSwitch/')
        for i in e:
            print (i.tag, i.text)
with open('results.csv', 'w', newline='') as file:
    for filename in filenames:
        writer = csv.writer(file)
        writer.writerow([root.attrib, filename])
        for i in e:
            writer.writerow([i.tag, i.text])
Say I have 10 XML files: I'm getting the output for XML "File 10" ten times in the CSV, and nothing for XML "File 1-9". I'm sure it's something simple?
=========================================================================
I've written a small script which ingests a folder of XML files, searches for a particular element and then recalls some of the data. This is then printed to the console and written to a CSV, except I'm having trouble formatting my CSV correctly.
This is where I've got so far:
import csv
import glob
import xml.etree.ElementTree as ET
filenames = glob.glob("..\Lib\macros\*.xml")
for filename in filenames:
    with open(filename, 'r') as content:
        element = ET.parse(content)
        root = element.getroot()
        print(root.attrib, filename)
        e = element.findall('commands/MatrixSwitch/')
        for i in e:
            print (i.tag, i.text)
with open('results.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow([root.attrib, filename])
I'm looking to capture the following data:
XML Filename
Macro Name
Monitor ID
Camera ID
I'm only interested in the camera and monitor IDs when a "Matrix Switch" is referred to in the XML. Sometimes there might be only one monitor ID and one camera ID, sometimes there might be more, so the script needs to loop through and get all of the IDs within each "Matrix Switch" element. This seems to work so far.
Typical XML structure looks like this:
<macro name="NAME OF THE MACRO IS SHOWN HERE">
    <execution>
        <delay>0</delay>
    </execution>
    <parameters/>
    <commands>
        <MatrixSwitch>
            <camera>1530</camera>
            <monitor>1020</monitor>
        </MatrixSwitch>
        <MatrixSwitch>
            <camera>1531</camera>
            <monitor>1001</monitor>
        </MatrixSwitch>
    </commands>
</macro>
Or like this:
<macro name="ANOTHER NAME GOES HERE">
    <execution>
        <delay>0</delay>
    </execution>
    <parameters/>
    <commands>
        <MatrixSwitch>
            <camera>201</camera>
            <monitor>17</monitor>
        </MatrixSwitch>
        <MatrixSwitch>
            <camera>206</camera>
            <monitor>18</monitor>
        </MatrixSwitch>
        <MatrixSwitch>
            <camera>202</camera>
            <monitor>19</monitor>
        </MatrixSwitch>
        <MatrixSwitch>
            <camera>207</camera>
            <monitor>20</monitor>
        </MatrixSwitch>
    </commands>
</macro>
My current results.csv is only set to output the name and filename. This works, but I'm unsure where I need to add the "writer" command to the loop where it's dealing with the Monitor ID and Camera ID.
I want my CSV to show: Name, Filename, Monitor A, Camera A, Monitor B, Camera B, Monitor C, Camera C, Monitor D, Camera D, etc.
Any pointers greatly appreciated!!
Code has now been changed slightly:
import csv
import glob
import xml.etree.ElementTree as ET
filenames = glob.glob("..\Lib\macros\*.xml")
for filename in filenames:
    with open(filename, 'r') as content:
        element = ET.parse(content)
        root = element.getroot()
        print(root.attrib, filename)
        e = element.findall('commands/MatrixSwitch/')
        for i in e:
            print (i.tag, i.text)
with open('results.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow([root.attrib, filename])
    for i in e:
        writer.writerow([i.tag, i.text])
Output in the CSV is as below :
https://imgur.com/a/SrPrgjm

Just add a loop calling writerow:
...
with open('results.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow([root.attrib, filename])
    for i in e:
        writer.writerow([i.tag, i.text])
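For the "last file repeated" symptom in the TL;DR, the likely cause is that root and e are overwritten on every pass of the first loop, so only the last file's values are left by the time the CSV is written. A minimal sketch of one way around that, assuming the XML layout shown in the question: parse each file and write its row in the same pass, with the CSV opened once. The helper names macro_row and write_macros_csv are made up for illustration.

```python
import csv
import glob
import os
import xml.etree.ElementTree as ET


def macro_row(xml_path):
    """Build one CSV row: macro name, filename, then monitor/camera pairs."""
    root = ET.parse(xml_path).getroot()
    row = [root.get("name", ""), os.path.basename(xml_path)]
    for switch in root.findall("commands/MatrixSwitch"):
        # findtext returns the default when the child element is missing
        row.append(switch.findtext("monitor", default=""))
        row.append(switch.findtext("camera", default=""))
    return row


def write_macros_csv(xml_paths, csv_path):
    """Open the CSV once and write one row per XML file."""
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        for xml_path in xml_paths:
            writer.writerow(macro_row(xml_path))


# Example call (folder path taken from the question, written as a raw string):
# write_macros_csv(sorted(glob.glob(r"..\Lib\macros\*.xml")), "results.csv")
```

Rows will have different lengths when files contain different numbers of MatrixSwitch elements; csv.writer accepts that, but a fixed header row would need the maximum pair count worked out first.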

Related

Write a CSV from JSON, importing only given keys

I have JSONs reporting different values, and I want to import only some keys in a csv.
I have tried 2 approaches, but both give me some problems.
At first, I have tried this :
import os,json
import glob
import csv
# Place your JSON data in a directory named 'data/'
src = "MYPATH"
data = []
json_pattern = os.path.join(src, '*.json')
# only json
files = glob.glob(json_pattern, recursive=True)
# Loop through files
for single_file in files:
    with open(single_file, 'r') as f:
        json_file = json.load(f)
        try:
            data.append([
                json_file['name1'],
                json_file['name2'],
                json_file['name3'],
                json_file['name4'],
            ])
        except KeyError:
            continue
# Add headers
data.insert(0, ['title_1', 'title_2', 'title_3'])
# Export to CSV.
# Add the date to the file name to avoid overwriting it each time.
csv_filename = 'name.csv'
with open((src + csv_filename), "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(data)
Unfortunately, done this way, if a key is missing the code skips the file altogether, while I want it to skip only that key.
So I tried this, instead:
import os,json
import glob
import csv
# Place your JSON data in a directory named 'data/'
src = "MY_PATH"
data = []
json_pattern = os.path.join(src, '*.json')
# Change the glob if you want to only look through files with specific names
files = glob.glob(json_pattern, recursive=True)
# Loop through files
col_name = ['name1','name2','name4']
for single_file in files:
    with open(single_file, 'r') as f:
        json_file = json.load(f)
        for key in col_name:
            try:
                data.append([json_file[key]])
            except KeyError:
                continue
# Add headers
data.insert(0, ['title_1', 'title_2', 'title_3'])
# Export to CSV.
# Add the date to the file name to avoid overwriting it each time.
csv_filename = 'name.csv'
with open((src + csv_filename), "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(data)
But in this case, each value is a new row in the csv, while I want the value from each json in a single row.
I am not an expert and I really don't know how to combine these two approaches.
Can someone help me out?
Thanks!
If I understand what you're trying to do correctly, why not just do
# Loop through files
for single_file in files:
    with open(single_file, 'r') as f:
        json_file = json.load(f)
        data.append([
            json_file.get('name1', ''),
            json_file.get('name2', ''),
            json_file.get('name3', ''),
            json_file.get('name4', '')
        ])
By using .get() you can specify the default value in case a key isn't found.
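To tie the .get() idea back to the second attempt in the question (a list of wanted keys), here is a rough sketch that builds exactly one row per JSON file and uses the key list as the header, so header and data always have the same number of columns. The function names json_to_row and write_rows are invented for this example, and it assumes each file holds a single JSON object.

```python
import csv
import json


def json_to_row(record, keys, default=""):
    """Pick the wanted keys from one JSON object, filling gaps with a default."""
    return [record.get(key, default) for key in keys]


def write_rows(json_paths, keys, csv_path):
    """Write a header plus one row per JSON file."""
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(keys)  # header row: one title per selected key
        for path in json_paths:
            with open(path, "r") as f:
                writer.writerow(json_to_row(json.load(f), keys))
```

A missing key then leaves an empty cell in that file's row instead of dropping the file or starting a new row.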

Don't understand why is my exported csv-file read-only?

I'm using the DyMat Python-package to export variables from .mat-files to .csv.
However, the exported .csv-files are all read-only? Why is this the case and is there a solution to resolve this?
I've tried to close() the file properly or delete the file in Python but that did not work. Any other suggestions?
import csv, numpy, os
import DyMat
os.chdir("C:/Users/myvhove/Documents/ResultsPyDymInt/Dymola/CoupledClutches")
print(os.getcwd())
dm = DyMat.DyMatFile("dymatresfile")
print(dm.fileName)
print(dm.names())
varList = ('J1.w', 'J2.w', 'J3.w', 'J4.w')
fileName = dm.fileName + '.csv'
oFile = open(fileName, 'w', newline='')
csvWriter = csv.writer(oFile)
vDict = dm.sortByBlocks(varList)
for vList in vDict.values():
    vData = dm.getVarArray(vList)
    vList.insert(0, dm._absc[0])
    csvWriter.writerow(vList)
    csvWriter.writerows(numpy.transpose(vData))
oFile.close()

Outputting child nodes to CSV with Python

Edit: I've replaced the example XML with real data and provided my code at the bottom.
I have several xml-files containing from 1 to 10+ lines of the following data:
<?xml version="1.0" encoding="UTF-8"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:cec="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2" xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope" xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" xmlns:xenc="http://www.w3.org/2001/04/xmlenc#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2 UBL-Invoice-2.0.xsd">
<cac:LegalMonetaryTotal>
<cbc:PayableAmount currencyID="DKK">2586.61</cbc:PayableAmount>
</cac:LegalMonetaryTotal>
<cac:InvoiceLine>
<cbc:ID>1</cbc:ID>
<cbc:InvoicedQuantity unitCode="HUR">1.50</cbc:InvoicedQuantity>
<cbc:LineExtensionAmount currencyID="DKK">1633.65</cbc:LineExtensionAmount>
</cac:InvoiceLine>
<cac:InvoiceLine>
<cbc:ID>2</cbc:ID>
<cbc:InvoicedQuantity unitCode="HUR">1.00</cbc:InvoicedQuantity>
<cbc:LineExtensionAmount currencyID="DKK">952.96</cbc:LineExtensionAmount>
</cac:InvoiceLine>
</Invoice>
And I want to output the data to a CSV-file in the following structure:
filename,lineId,lineQuantity,lineAmount,payableAmount
file1,1,1.50,1633.65,2586.61
file1,2,1.00,952.96,2586.61
file2,.,.,.
...where there's a row for each line per file coupled with the filename and total amount.
This is my code:
from os import listdir, path, walk
import xml.etree.ElementTree as ET
import csv
def invoicelines(self):
    filename = path.splitext(path.split(file)[1])[0]
    lineId = root.find('./InvoiceLine/ID').text
    lineQuantity = root.find('./InvoiceLine/InvoicedQuantity').text
    lineAmount = root.find('./InvoiceLine/LineExtensionAmount').text
    payableAmount = root.find('./LegalMonetaryTotal/PayableAmount').text
    row = [
        filename,
        lineId,
        lineQuantity,
        lineAmount,
        payableAmount
    ]
    return row
csvfile = 'output.csv'
def csv_write_header(csvfile):
    with open(csvfile, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        writer.writerow([
            'filename',
            'lineId',
            'lineQuantity',
            'lineAmount',
            'payableAmount'
        ])
xml_files = []
for root, dirs, files in walk('mypath'):
    for file in files:
        if file.endswith('.xml'):
            xml_files.append(path.join(root, file))
csv_write_header(csvfile)
for file in xml_files:
    tree = ET.iterparse(file)
    for _, el in tree:
        el.tag = el.tag.split('}', 1)[1] # ignores namespaces
    root = tree.root
    if 'Invoice' in root.tag: # only invoice files
        for e in root.iter('InvoiceLine'):
            with open(csvfile, 'a', newline='') as outfile:
                writer = csv.writer(outfile)
                writer.writerow(invoicelines(e))
And the output I get if I just parse the above file is:
filename,lineId,lineQuantity,lineAmount,payableAmount
file1,1,1.50,1633.65,2586.61
file1,1,1.50,1633.65,2586.61
...so I'm guessing it's something with my iteration.
The following code achieves your desired result.
import os
import xml.etree.ElementTree as ET
def extract_line_id_data(line_element):
    # Relies on the child order inside each line element: ID, quantity, amount
    line_id = line_element[0].text
    quantity = line_element[1].text
    line_amount = line_element[2].text
    return line_id, quantity, line_amount
# Iterate over all files in a directory
for folder, dirs, files in os.walk('/path/to_folder/with/xml_files/'):
    with open('output.csv', 'a') as output:
        output.write('Filename,LineID,Quantity,LineAmount,TotalAmount\n') # Headers
        for xml_file in files:
            # If not all files in the folder are XML you'll need to catch an exception here
            tree = ET.parse(os.path.join(folder, xml_file)) # os.walk yields bare names, so join with the folder
            root = tree.getroot()
            total_amount = root[0][0].text # Get total amount value
            # Iterate over all "Line" elements
            for e in root[1:]:
                output.write('{},{},{},{},{}\n'.format(xml_file, *extract_line_id_data(e), total_amount))
Tested with your file and a "file2.xml" with a TotalAmount of 350, output looks like this:
Filename,LineID,Quantity,LineAmount,TotalAmount
file.xml,1,4,132,407
file.xml,2,1,72,407
file.xml,3,7,203,407
file2.xml,1,4,132,350
file2.xml,2,1,72,350
file2.xml,3,7,203,350
I hope this works for you. I have used ElementTree as preferred, although I would have used lxml myself.
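For the namespaced UBL file shown in the edited question, the duplicated rows appear to come from calling root.find(...) inside invoicelines, which always returns the first InvoiceLine regardless of which line is being iterated. Below is a minimal ElementTree sketch that searches relative to each line element and passes a prefix map instead of stripping namespaces. The function name invoice_rows and the NS dictionary are illustrative; the two URIs are copied from the sample file.

```python
import xml.etree.ElementTree as ET

# Prefix map for find/findall; the URIs come from the sample invoice
NS = {
    "cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
    "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
}


def invoice_rows(root, filename):
    """Return one row per InvoiceLine, each carrying the invoice total."""
    payable = root.findtext("cac:LegalMonetaryTotal/cbc:PayableAmount", namespaces=NS)
    rows = []
    for line in root.findall("cac:InvoiceLine", NS):
        # Search relative to this line element, not the document root
        rows.append([
            filename,
            line.findtext("cbc:ID", namespaces=NS),
            line.findtext("cbc:InvoicedQuantity", namespaces=NS),
            line.findtext("cbc:LineExtensionAmount", namespaces=NS),
            payable,
        ])
    return rows
```

Each returned row can be passed straight to csv.writer's writerow, with root obtained from ET.parse(path).getroot().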
Try the following code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.Xml;
using System.Xml.Linq;
using System.IO;
namespace ConsoleApp2
{
    class Program
    {
        const string FILENAME = @"c:\temp\text.csv";
        static void Main()
        {
            string[] filenames = Directory.GetFiles(@"c:\temp", "*.xml");
            StreamWriter writer = new StreamWriter(FILENAME);
            foreach (string filename in filenames)
            {
                XDocument doc = XDocument.Load(filename);
                string amount = (string)doc.Descendants("TotalAmount").FirstOrDefault();
                foreach (XElement line in doc.Descendants("Line"))
                {
                    writer.WriteLine(string.Join(",",
                        filename,
                        (string)line.Element("LineID"),
                        (string)line.Element("Quantity"),
                        (string)line.Element("LineAmount"),
                        amount));
                }
            }
            writer.Flush();
            writer.Close();
        }
    }
}

Original files are automatically deleted by the process while compiling code

I have written Python code to convert DICOM (.dcm) data into a CSV file. However, if I run the code more than once on my database directory, the data gets lost/deleted. I searched the recycle bin but could not find the deleted data, and I don't know what went wrong.
Is there anything wrong with my code? Any suggestions are highly appreciated.
Here is my code:
import xlsxwriter
import os.path
import sys
import dicom
import xlrd
import csv
root = input("Enter Directory Name: ")
#path = os.path.join(root, "targetdirectory")
i=1
for path, subdirs, files in os.walk(root):
    for name in files:
        os.rename(os.path.join(path, name), os.path.join(path,'MR000'+ str(i)+'.dcm'))
        i=i+1
dcm_files = []
for path, dirs, files in os.walk(root):
    for names in files:
        if names.endswith(".dcm"):
            dcm_files.append(os.path.join(path, names))
print (dcm_files)
with open('junk/test_0.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',',
                            quotechar='|',
                            quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(["Folder Name","File Name", "PatientName",
                         "PatientID", "PatientBirthDate","SliceThickness","Rows"])
    for dcm_file in dcm_files:
        ds = dicom.read_file(dcm_file)
        fileName = dcm_file.split("/")
        spamwriter.writerow([fileName[1],fileName[2],
                             ds.get("PatientName", "None"),
                             ds.get("PatientID", "None"),
                             ds.get("PatientBirthDate", "None"),
                             ds.get("SliceThickness", "None"),
                             ds.get("Rows", "None")])
You have something like the following scenario:
After the first run, you end up with the files MR0001.dcm, MR0002.dcm, MR0003.dcm... On the second run, the following renames happen:
os.rename('some_file', 'MR0001.dcm')
os.rename('MR0001.dcm', 'MR0002.dcm')
os.rename('MR0002.dcm', 'MR0003.dcm')
os.rename('MR0003.dcm', 'MR0004.dcm')
...
On platforms where os.rename silently replaces an existing destination (such as Unix), each rename overwrites the file produced by the previous step, so at the end only the file 'MR0004.dcm' is left.
Add the following line just below the rename call:
print( os.path.join(path, name), '-->', os.path.join(path,'MR000'+ str(i)+'.dcm'))
Then you will see exactly which files are renamed.
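One possible way to make the renaming step safe to run repeatedly is sketched below, under a few assumptions that differ from the original script: numbering restarts in each folder, numbers are zero-padded to four digits, and any file already named MR<digits>.dcm is treated as done. The function name rename_new_files is hypothetical.

```python
import os
import re

# Names the script is assumed to have produced on an earlier run
RENAMED = re.compile(r"^MR\d+\.dcm$")


def rename_new_files(root):
    """Rename files not yet named MR<number>.dcm, never reusing a taken name."""
    renamed = []
    for path, _subdirs, files in os.walk(root):
        taken = set(files)  # names currently present in this folder
        counter = 1
        for name in sorted(files):
            if RENAMED.match(name):
                continue  # already renamed on an earlier run
            # Advance to the next number whose target name is still free
            while True:
                target = "MR{:04d}.dcm".format(counter)
                counter += 1
                if target not in taken:
                    break
            os.rename(os.path.join(path, name), os.path.join(path, target))
            taken.add(target)
            renamed.append((name, target))
    return renamed
```

Running it a second time on the same tree returns an empty list, because every file already matches the pattern and no rename can land on an existing name.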

File append in python

I have n files in the location /root as follows
result1.txt
abc
def
result2.txt
abc
def
result3.txt
abc
def
and so on.
I must create a consolidated file called result.txt with all the values concatenated from all result files looping through the n files in a location /root/samplepath.
It may be easier to use cat, as others have suggested. If you must do it with Python, this should work. It finds all of the text files in the directory and appends their contents to the result file.
import glob, os
os.chdir('/root')
with open('result.txt', 'w+') as result_file:
    for filename in glob.glob('result*.txt'):
        if filename == 'result.txt':
            continue  # the pattern also matches the output file itself; skip it
        with open(filename) as file:
            result_file.write(file.read())
            # append a line break if you want to separate them
            result_file.write("\n")
This could be an easy way of doing it.
Let's say, for example, that my file script.py is in a folder, and alongside that script there is a folder called testing that contains all the text files, named file_0, file_1, ...
import os
#reads all the files and put everything in data
number_of_files = 0  # set this to the number of files to read
data =[]
for i in range (number_of_files):
    fn = os.path.join(os.path.dirname(__file__), 'testing/file_%d.txt' % i)
    f = open(fn, 'r')
    for line in f:
        data.append(line)
    f.close()
#write everything to result.txt
fn = os.path.join(os.path.dirname(__file__), 'result.txt')
f = open(fn, 'w')
for element in data:
    f.write(element)
f.close()
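If the order of the pieces matters, note that glob returns names in arbitrary order and a plain sort puts result10.txt before result2.txt. Here is a small sketch that orders the inputs by the number in the filename and leaves the output file out of the input list; the function name concatenate and its default arguments are assumptions made for this example.

```python
import glob
import os
import re


def concatenate(folder, pattern="result*.txt", output_name="result.txt"):
    """Join all matching files into output_name, ordered by their number."""
    out_path = os.path.join(folder, output_name)

    def number(path):
        # First run of digits in the file name; files without digits sort first
        match = re.search(r"(\d+)", os.path.basename(path))
        return int(match.group(1)) if match else 0

    # Collect inputs before opening the output, and exclude the output itself
    inputs = sorted(
        (p for p in glob.glob(os.path.join(folder, pattern))
         if os.path.abspath(p) != os.path.abspath(out_path)),
        key=number,
    )
    with open(out_path, "w") as out:
        for path in inputs:
            with open(path) as src:
                out.write(src.read())
    return inputs
```

For the layout in the question this would be called as concatenate('/root').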
