Writing a list to a CSV file - python
I've got a piece of code that extracts coordinates from a KML file. It works beautifully and prints to the screen the way I'd want it to print to a CSV file. However, when I attempt to write it to a CSV file, the resulting file is empty.
I've tried both the method below and the standard text output method using .write and .writerows. All have the same result.
Here is the KML I'm using:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
<Document>
<name>Test3.kml</name>
<Style id="s_ylw-pushpin">
<IconStyle>
<scale>1.1</scale>
<Icon>
<href>http://maps.google.com/mapfiles/kml/pushpin/ylw-pushpin.png</href>
</Icon>
<hotSpot x="20" y="2" xunits="pixels" yunits="pixels"/>
</IconStyle>
</Style>
<Style id="s_ylw-pushpin_hl">
<IconStyle>
<scale>1.3</scale>
<Icon>
<href>http://maps.google.com/mapfiles/kml/pushpin/ylw-pushpin.png</href>
</Icon>
<hotSpot x="20" y="2" xunits="pixels" yunits="pixels"/>
</IconStyle>
</Style>
<StyleMap id="m_ylw-pushpin">
<Pair>
<key>normal</key>
<styleUrl>#s_ylw-pushpin</styleUrl>
</Pair>
<Pair>
<key>highlight</key>
<styleUrl>#s_ylw-pushpin_hl</styleUrl>
</Pair>
</StyleMap>
<Placemark>
<name>Untitled</name>
<styleUrl>#m_ylw-pushpin</styleUrl>
<LineString>
<tessellate>1</tessellate>
<coordinates>
-117.2983479390361,33.27144940863937,0 -117.2979479084534,33.27158154479859,0 -117.2974695164833,33.27172038778199,0 -117.2975027748323,33.27194103134417,0 -117.297514618297,33.27194834552386,0 -117.2979065026131,33.27210103585357,0 -117.2980671096438,33.27197757139673,0 -117.2980506390891,33.27176546338881,0 -117.2983889177018,33.27174732829762,0 -117.2985056013534,33.27196820309105,0 -117.2984607071796,33.27217535203514,0 -117.2982982520078,33.2722451382993,0 -117.2982714656408,33.2722496045722,0 -117.297926137081,33.27225329696987,0 -117.2979181624345,33.27225324047765,0 -117.297660871735,33.27222714260547,0 -117.2976362532899,33.2722186164706,0 -117.2974159727989,33.27218328409937,0 -117.2974081729552,33.27218350960742,0 -117.2970860609136,33.27208829299941,0 -117.2968393500826,33.27207716108421,0 -117.2967459496535,33.27216774204006,0 -117.2966603938058,33.27233920748802,0 -117.2969907889174,33.27237357387524,0 -117.2970232333844,33.27237306198914,0 -117.2973444433226,33.27239693646774,0 -117.297751764355,33.27242613992279,0 -117.2981731050047,33.27243373303686,0 -117.2981813185804,33.27243372905114,0 -117.2985617246156,33.2723816290589,0 -117.2987498163436,33.27248971415388,0 -117.2987694564539,33.27262188734785,0 -117.2985436721398,33.27267540671544,0 -117.2985270445518,33.27267612619851,0 -117.2981490803383,33.27268345629938,0 -117.2981145841072,33.2726829556605,0 -117.2977420026904,33.27265933276826,0 -117.2977334907908,33.27265936075214,0 -117.2977079525845,33.27265943947727,0 -117.297690884793,33.27265933069783,0 -117.2973143742666,33.2726410594433,0 -117.2972972842265,33.27263660852098,0 -117.2972803621663,33.27263663588342,0 -117.2969673713573,33.27262125275644,0 -117.296756583612,33.27260864705382,0 -117.2965634725893,33.27264899681126,0 -117.2965301429721,33.27279607660442,0 -117.296929900768,33.27282274189361,0 -117.2972917056901,33.27281884120617,0 -117.2975482260676,33.27280094439733,0 -117.2979485409129,33.27281652227333,0 
-117.2983940432828,33.2728392485114,0 -117.2987809571886,33.27284381722371,0
</coordinates>
</LineString>
</Placemark>
</Document>
</kml>
And the code:
from xml.dom import minidom
import csv

xmldoc = minidom.parse("Test.kml")
kml = xmldoc.getElementsByTagName("kml")[0]
document = kml.getElementsByTagName("Document")[0]
placemarks = document.getElementsByTagName("Placemark")
for placemark in placemarks:
    coords = placemark.getElementsByTagName("coordinates")[0].firstChild.data
    list = coords.split(",")
    for items in list:
        item = items.split(",")
        for allitems in item:
            latlon = allitems.replace("0 ","")
            latlon = latlon.strip()
            print(latlon)  # <-- Printing to the screen works fine
            with open("Output.csv", "w") as output:
                writer = csv.writer(output, delimiter='\n')
                writer.writerow(latlon)
****SOLVED****
Final working solution is this:
with open("Output.csv", "w") as text_file:  # open the file first
    #writer = csv.writer(output, delimiter='\n') # and get ready to write
    for placemark in placemarks:
        coords = placemark.getElementsByTagName("coordinates")[0].firstChild.data
        list = coords.split(",")
        for items in list:
            item = items.split(",")
            for allitems in item:
                latlon = allitems.replace("0 ","")
                latlon = latlon.strip()
                print(latlon)  # <-- Printing to the screen works fine
                text_file.write(latlon + '\n')  # Write the row to the already-open file
I abandoned the csv module and went with plain text output, simply renaming the file to .csv. I end up with the result I need. Thanks to all who contributed.
The with and writer= lines should happen once, before your loops begin. As it is now, you are re-creating the file for each item, throwing away everything written before the last item.
with open("Output.csv", "w") as output:  # open the file first
    writer = csv.writer(output, delimiter='\n')  # and get ready to write
    for placemark in placemarks:
        coords = placemark.getElementsByTagName("coordinates")[0].firstChild.data
        list = coords.split(",")
        for items in list:
            item = items.split(",")
            for allitems in item:
                latlon = allitems.replace("0 ","")
                latlon = latlon.strip()
                print(latlon)  # <-- Printing to the screen works fine
                writer.writerow([latlon])  # Write the row to the already-open file
                # EDIT 2 -------^-------^
Edit: Now there may be another issue: it looks like latlon is a string, but writerow expects a list of items and fills in the commas between the items automatically. You might want print(latlon + ',', file=output) instead of writer.writerow, depending on your specific use case.

Edit 2: Use [latlon] instead of latlon to get the whole line on one row instead of one character per row. The brackets make it a list of one item rather than a string, which behaves in this context like a list of its characters, one at a time.
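For illustration, the string-vs-list difference described above can be reproduced in isolation (the coordinate value here is just a hypothetical sample):

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)

latlon = "-117.2983479390361"
writer.writerow(latlon)    # a bare string is iterated character by character
writer.writerow([latlon])  # a one-item list keeps the value in a single cell

print(buf.getvalue())
```

The first row comes out as one character per column; the second keeps the whole value in one cell.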
Related
Group branches in an XML tree with Python on a common field
I have a list of order details in a CSV, and want to join all items from the lines together on one order. Example data is:

Order|Line|Item|Price
123456789|1|IK123456|199.99
654987321|1|MASGE12385|29.95
654987321|2|KLEAN458792|9.99
654987321|3|LP12489|1959.95

I want everything to be listed in an XML with the root as the Order Number, child as the Line Number, and sub-children as Item and Price. I want the output to look like:

<Order number="123456789">
    <Line number="1">
        <Item>IK123456</Item>
        <Price>199.99</Price>
    </Line>
</Order>
<Order number="654987321">
    <Line number="1">
        <Item>MASGE12385</Item>
        <Price>29.95</Price>
    </Line>
    <Line number="2">
        <Item>KLEAN458792</Item>
        <Price>9.99</Price>
    </Line>
    <Line number="3">
        <Item>LP12489</Item>
        <Price>1959.95</Price>
    </Line>
</Order>

Here is my code:

import csv
import xml.etree.ElementTree as ET

file = 'C:/github.txt'
with open(file, 'r') as f:
    reader = csv.reader(f, delimiter='|')
    header = next(reader)
    order_num = reader[0]
    root = ET.Element("Order")  # BUILD A ROOT FOR THE XML TREE
    root.set('number', order_num)  # ADD ATTRIBUTE
    for row in reader:  # ITERATE THROUGH EACH ROW AND POPULATE DATA IN BRANCHES OF XML TREE
        line = ET.SubElement(root, 'line', number=reader[1])
        item = ET.SubElement(line, 'item code')
        item.text = reader[2]
        price = ET.SubElement(line, 'price')
        price.text = reader[3]
    tree = ET.ElementTree(root)
    tree.write('C:/github.xml', encoding='utf-8', xml_declaration=True)

(NOTE: I moved something and got an error, but I'm not sure what happened.)
During the loop, consider keeping a tracker on the Order number to conditionally decide when to create a new Order element, which keeps the related underlying items together. Additionally, consider csv.DictReader, which iterates csv rows as dictionaries using the first-row headers as keys. Finally, use the built-in minidom to pretty-print the output. The code below incorporates all XML items under a single <Orders> root:

import csv
import xml.etree.ElementTree as ET
import xml.dom.minidom as mn

file = 'C:/github.txt'
curr_order = None

with open(file, 'r') as f:
    reader = csv.DictReader(f, delimiter='|')
    # BUILD A ROOT FOR THE XML TREE
    root = ET.Element("Orders")
    # ITERATE THROUGH EACH ROW AS DICTIONARY
    for d in reader:
        # CONDITIONALLY BUILD ORDER ELEMENT
        if curr_order != str(d['Order']):
            orderElem = ET.SubElement(root, "Order")
            curr_order = str(d['Order'])
        # CREATE DESCENDANTS OF ORDER
        orderElem.set('number', str(d['Order']))
        line = ET.SubElement(orderElem, 'line', number=str(d['Line']))
        ET.SubElement(line, 'item_code').text = str(d['Item'])
        ET.SubElement(line, 'price').text = str(d['Price'])

# PRETTY PRINT OUTPUT
dom = mn.parseString(ET.tostring(root, encoding='utf-8'))
with open('C:/github.xml', 'wb') as f:
    f.write(dom.toprettyxml(indent="  ", encoding='utf-8'))
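The csv.DictReader behavior mentioned above can be checked in isolation; the sample string below just mirrors the question's layout:

```python
import csv
import io

sample = ("Order|Line|Item|Price\n"
          "123456789|1|IK123456|199.99\n"
          "654987321|1|MASGE12385|29.95\n")

reader = csv.DictReader(io.StringIO(sample), delimiter='|')
for d in reader:
    # The first row supplies the keys, so columns are addressed by name.
    print(d['Order'], d['Line'], d['Item'], d['Price'])
```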
Python: file write and iterating re.sub from dict only writes last occurrence
I'm unable to figure out how to write and save all re.sub iterations from a dict. Only the last occurrence is saved to the file. I have a translation-worksheet.csv formatted as:

locale, lang, Foo-v2, Bar-v2
de_DE, German, German-Foo-v2, German-Bar-v2
zh_CN, Chinese, 零件-Foo-v2, 零件-Bar-v2

There's a folder with a file for each language: target/de_DE_v123.xml. The contents of a file:

<trans-unit id="14_de_DE" resname="documentGroup.translation">
    <source xml:lang="en-GB">Foo-v2</source>
    <target xml:lang="de-DE">German-Foo-v1</target>
</trans-unit>
<trans-unit id="1759_de_DE" resname="documentGroup.translation">
    <source xml:lang="en-GB">Bar-v2</source>
    <target xml:lang="de-DE">German-Bar-v1</target>
</trans-unit>

The goal is to go into each translation file and update all target text. Regex must be used because the target translation text must be overwritten regardless of what is currently there.

import glob
import re

import pandas as pd

data = pd.read_csv('translate-worksheet.csv', sep=',', header=0)
englishTranslation = data.columns[2:]  # get English text
for k, v in data.iterrows():
    locale = v[0]
    docGroup = v[2:]
    findnreplace = dict(zip(englishTranslation, docGroup))  # {english source: translated target}
    print("Working on language:" + locale)
    for propFile in glob.glob('target\\*' + locale + '*.xml'):
        print(" xliff file:" + propFile)
        with open(propFile, 'r+', encoding='utf-8') as f:
            content = f.read()
            for source, target in findnreplace.items():
                print("  Replacing:" + source + ", with:" + target)
                match = re.sub(r'(?<='+source+'<\/source>)[\r\n]+([^\r\n]+)\>(.*?)\<',
                               r"\1"+">"+target+"<", content, flags=re.MULTILINE)
                f.seek(0)
                f.write(match)
                print(match)

Expected output:

<trans-unit id="14_de_DE" resname="documentGroup.translation">
    <source xml:lang="en-GB">Foo-v2</source>
    <target xml:lang="de-DE">German-Foo-v2</target>
</trans-unit>
<trans-unit id="1759_de_DE" resname="documentGroup.translation">
    <source xml:lang="en-GB">Bar-v2</source>
    <target xml:lang="de-DE">German-Bar-v2</target>
</trans-unit>

Actual output:

<trans-unit id="14_de_DE" resname="documentGroup.translation">
    <source xml:lang="en-GB">Foo-v2</source>
    <target xml:lang="de-DE">German-Foo-v1</target>
</trans-unit>
<trans-unit id="1759_de_DE" resname="documentGroup.translation">
    <source xml:lang="en-GB">Bar-v2</source>
    <target xml:lang="de-DE">German-Bar-v2</target>
</trans-unit>

I'm new to Python and welcome all critiquing to improve the code overall.

UPDATE with solution: This is probably very inefficient code because it opens the file, modifies it, and closes it each time, but it works and my files are only 15 KB each. I changed it from "open the file and, for every source and target in the dict, do something" to "for every source and target in the dict, open the file and do something":

for propFile in glob.glob('target\\*' + locale + '*.xml'):
    print(" xliff file:" + propFile)
    for source, target in findnreplace.items():
        with open(propFile, 'r+', encoding='utf-8') as f:
            content = f.read()
            f.seek(0)
            print("  Replacing:" + source + ", with:" + target)
            match = re.sub(r'(?<='+source+'<\/source>)[\r\n]+([^\r\n]+)\>(.*?)\<',
                           r"\1"+">"+target+"<", content, flags=re.MULTILINE)
            f.write(match)
            f.truncate()
            print(match)
Based on your code, it looks like you want to replace a block of text within an existing text file using regex. For this the basic logic is:

Find the text you want to replace
Store the existing file text before this text
Store the existing file text after this text
Create the replacement text to be used in the updated file
Rewrite the file with the 'before' text, the replacement text, and the 'after' text

Without your actual data, I can't confirm this updated code works, but it should be close:

for source, target in findnreplace.items():
    print("  Replacing:" + source + ", with:" + target)
    pattern = r'(?<=' + source + r'</source>)[\r\n]+([^\r\n]+)>(.*?)<'
    # find start/end index of text to be replaced
    srch = re.search(pattern, content, flags=re.MULTILINE)
    startidx, endidx = srch.span()  # position of text within file
    # build the replacement text from the matched groups
    replacement = srch.expand(r"\1" + ">" + target + "<")
    preblk = content[:startidx]   # all text before replace block
    postblk = content[endidx:]    # all text after replace block
    f.seek(0)      # restart from beginning
    f.truncate(0)  # clear file contents
    f.write(preblk)
    f.write(replacement)
    f.write(postblk)
    # keep the in-memory copy current for the next substitution
    content = preblk + replacement + postblk
    print(replacement)
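The repeated open/modify/close cycle in the updated solution can also be avoided by applying every substitution to the in-memory string and writing the file once at the end. A minimal sketch, using hypothetical sample content that mirrors the question's files (the regex keeps the newline in a group so it survives the replacement):

```python
import re

# Hypothetical sample content and replacement dict mirroring the question's files.
content = (
    '<source xml:lang="en-GB">Foo-v2</source>\n'
    '<target xml:lang="de-DE">German-Foo-v1</target>\n'
    '<source xml:lang="en-GB">Bar-v2</source>\n'
    '<target xml:lang="de-DE">German-Bar-v1</target>\n'
)
findnreplace = {'Foo-v2': 'German-Foo-v2', 'Bar-v2': 'German-Bar-v2'}

# Apply each substitution to the accumulated result, not to the original
# string, so earlier replacements survive; write the file once afterwards.
for source, target in findnreplace.items():
    pattern = (r'(?<=' + re.escape(source) + r'</source>)'
               r'([\r\n]+)([^\r\n]+)>(.*?)<')
    content = re.sub(pattern, r'\1\2>' + target + '<', content)

print(content)
```

Writing the accumulated string once with open(propFile, 'w') at the end replaces the seek/truncate dance entirely.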
How do I sort XML alphabetically using python?
I have some XML files that I want to sort by element name. These XML files are used as Profiles in my Salesforce sandbox/org. I've built some code that takes an XML file and appends it to the bottom of each profile XML file, allowing me to add code to multiple files all at once rather than having to copy/paste into each file. The issue here: the XML needs to be sorted alphabetically by element name, e.g. (classAccesses, fieldPermissions, layoutAssignments, recordTypeVisibilities, objectPermissions). I have pasted an example of the XML below. The format of the file needs to be consistent and can't change, as Salesforce might not accept it otherwise.

<?xml version="1.0" encoding="UTF-8"?>
<Profile xmlns="http://soap.sforce.com/2006/04/metadata">
    <fieldPermissions>
        <editable>false</editable>
        <field>Branch_Queue__c.Cell_Phone_Number__c</field>
        <readable>true</readable>
    </fieldPermissions>
    <fieldPermissions>
        <editable>false</editable>
        <field>Branch_Queue__c.Branch__c</field>
        <readable>true</readable>
    </fieldPermissions>
    <fieldPermissions>
        <editable>false</editable>
        <field>Branch_Queue__c.Source__c</field>
        <readable>true</readable>
    </fieldPermissions>
    <fieldPermissions>
        <editable>false</editable>
        <field>Branch_Queue__c.Served_By__c</field>
        <readable>true</readable>
    </fieldPermissions>
    <fieldPermissions>
        <editable>false</editable>
        <field>Branch_Queue__c.Update__c</field>
        <readable>true</readable>
    </fieldPermissions>
    <recordTypeVisibilities>
        <default>false</default>
        <recordType>Knowledge__kav.RealEstate</recordType>
        <visible>true</visible>
    </recordTypeVisibilities>
    <recordTypeVisibilities>
        <default>false</default>
        <recordType>Knowledge__kav.RealEstate_Community_Connection</recordType>
        <visible>true</visible>
    </recordTypeVisibilities>
    <objectPermissions>
        <allowCreate>false</allowCreate>
        <allowDelete>false</allowDelete>
        <allowEdit>false</allowEdit>
        <allowRead>true</allowRead>
        <modifyAllRecords>false</modifyAllRecords>
        <object>Branch_Queue__c</object>
        <viewAllRecords>true</viewAllRecords>
    </objectPermissions>
    <classAccesses>
        <apexClass>BranchQueueDisplayList</apexClass>
        <enabled>true</enabled>
    </classAccesses>
    <classAccesses>
        <apexClass>BranchQueueDisplayList_Test</apexClass>
        <enabled>true</enabled>
    </classAccesses>
    <classAccesses>
        <apexClass>BranchQueueService</apexClass>
        <enabled>true</enabled>
    </classAccesses>
</Profile>

If it helps, here is the Python script I have built. If you have any questions, please feel free to ask. Thanks!

import os
import json

directory = 'C:/Users/HB35401/MAXDev/force-app/main/default/profiles'  # folder containing profiles to be modified
os.chdir(directory)
newData = 'C:/testXMLBatch/additionalXML/addXML.xml'  # xml file to append to profile-xml files

for nameOfFile in os.listdir(directory):  # for each profile in the directory
    if nameOfFile.endswith(".xml"):
        g = open(newData)
        data = g.read()  # set the value of the newXML to the data variable
        f = open(nameOfFile)
        fileContent = f.read()  # save the content of the profile to fileContent
        if data in fileContent:
            print('ERROR: XML is already inside the Profile.' + nameOfFile)
        else:
            EndLine = fileContent[-11:]  # save the </Profile> tag from the bottom of the file to EndLine.
            # The EndLine will be appended back after we add our new XML.
            test = fileContent[:-11]  # remove the </Profile> tag
            with open(nameOfFile, "w") as w:  # write back the profile without the </Profile> tag
                w.write(test)
            with open(nameOfFile) as t:
                fileContent2 = t.read()
            h = open(nameOfFile, "a")  # add the new data to the profile along with the </Profile> tag
            h.write(data + "\n" + EndLine)
            h.close()
Try this.

import operator
from simplified_scrapy import SimplifiedDoc, utils

xml = utils.getFileContent('your xml file.xml')
doc = SimplifiedDoc(xml)
root = doc.Profile
nodes = root.children  # Get all nodes
count = len(nodes)
if count:
    sorted_nodes = sorted(nodes, key=operator.itemgetter('tag'))  # Sort by tag
    sorted_htmls = []
    for node in sorted_nodes:
        sorted_htmls.append(node.outerHtml)  # Get the string of sorted nodes
    for i in range(0, count):
        nodes[i].repleaceSelf(sorted_htmls[i])  # Replace the nodes in the original text with the sorted nodes
print(doc.html)
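If adding a third-party dependency is undesirable, the same sort can be sketched with the standard library's ElementTree. This is only an illustration on a trimmed, hypothetical profile; whether the serialized output (attribute order, self-closing tags, whitespace) is acceptable to Salesforce would need to be verified:

```python
import xml.etree.ElementTree as ET

# A trimmed, hypothetical profile; real files have many more entries.
xml = """<Profile xmlns="http://soap.sforce.com/2006/04/metadata">
    <recordTypeVisibilities><default>false</default></recordTypeVisibilities>
    <fieldPermissions><editable>false</editable></fieldPermissions>
    <classAccesses><enabled>true</enabled></classAccesses>
</Profile>"""

root = ET.fromstring(xml)
# Sort the direct children of <Profile> by local tag name; the namespace
# URI prefix ("{...}") is stripped before comparing.
root[:] = sorted(root, key=lambda el: el.tag.split('}')[-1])
# Keep the default namespace unprefixed on output.
ET.register_namespace('', 'http://soap.sforce.com/2006/04/metadata')
print(ET.tostring(root, encoding='unicode'))
```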
generator 'yield' in splitting structured input file produces out of sync result
Referring to the top answer by @georg (which I've adapted below) here: Split one file into multiple files based on pattern (cut can occur within lines)

I find this a potentially useful pattern to split a file into multiples, based on the initial delimiter. However, as a commenter notes, it creates a blank file first, the reason for which is unclear. I think this is related to the problem I'm having. In my (clumsy, I'm no Python master!) adaptation, I try to set the filename by parsing the line following the delimiter before opening the new output file by calling the output=next(fs) generator. However, the dilemma, of course, is that the domain name is not known until the line after the delimiter. I end up with filenames which are one step out of sync with the contained data.

The input file contains 100+ XML 'trees', each of which starts with a standard <?xml version='1.0' encoding='UTF-8'?> followed by a line like this which includes the domain name:

<ns2:domain ... name="atypi.org" ...">

Here is my current script:

#!/usr/bin/python2.7
import re

def files():
    n = 0
    while n < 12:
        n += 1
        print "**DEBUG** in generator nameFile=%s n=%d \r" % (nameFile, n)
        yield open('/Users/peterf/Google Drive/2015 Projects-Strategy/Domain Admin/RackDomains/%s.part.xml' % nameFile, 'w')

filename = '/Users/peterf/Google Drive/2015 Projects-Strategy/Domain Admin/RackspaceListDomain.output.xml'
nameFile = ''
pat = '<?xml'
namePat = re.compile('<ns2:domain.+ name="(.+?)".+>')
fs = files()
outfile = next(fs)
with open(filename) as infile:
    for line in infile:
        m = namePat.search(line)
        if m:
            nameFile = m.group(1)
            print "<---\rin 'if m:' nameFile=%s\r" % (nameFile)
        if pat not in line:
            # print "\rin 'pat not in line' line=%s\r" % (line)
            outfile.write(line)
        else:
            items = line.split(pat)
            outfile.write(items[0])
            for item in items[1:]:
                print "in 'for item' pre next(fs) nameFile=%s\r" % (nameFile)
                outfile = next(fs)
                print "in 'for item' post next(fs) nameFile=%s --->\r" % (nameFile)
                outfile.write(pat + item)

My debug listing shows:

**DEBUG** in generator nameFile= n=1
in 'for item' pre next(fs) nameFile=
**DEBUG** in generator nameFile= n=2
in 'for item' post next(fs) nameFile= --->
<---
in 'if m:' nameFile=addressing.com
in 'for item' pre next(fs) nameFile=addressing.com
**DEBUG** in generator nameFile=addressing.com n=3
in 'for item' post next(fs) nameFile=addressing.com --->
<---
in 'if m:' nameFile=alicemcmahon.com
in 'for item' pre next(fs) nameFile=alicemcmahon.com
**DEBUG** in generator nameFile=alicemcmahon.com n=4
in 'for item' post next(fs) nameFile=alicemcmahon.com --->
<---
in 'if m:' nameFile=alphabets.com
in 'for item' pre next(fs) nameFile=alphabets.com
**DEBUG** in generator nameFile=alphabets.com n=5
in 'for item' post next(fs) nameFile=alphabets.com --->

The output directory contains these filenames, beginning with a truncated name from the first 'yield', I guess...

.part.xml (this has data from 'addressing.com')
addressing.com.part.xml
alicemcmahon.com.part.xml
alphabets.com.part.xml
americanletterpress.com.part.xml
americanwoodtype.com.part.xml
amyshoemaker.com.part.xml
archaicrevivalbooks.com.part.xml
archaicrevivalfonts.com.part.xml
archaicrevivalimages.com.part.xml
astroteddies.com.part.xml

I can't figure out how to approach this problem, where the generator is producing an output file before I can get an appropriate name for the file.
Here are some representative sections of the input file:

<?xml version='1.0' encoding='utf-8'?>
<ns2:domain xmlns:ns3="http://www.w3.org/2005/Atom" xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" xmlns="http://docs.rackspacecloud.com/dns/api/management/v1.0" id="1204245" name="addressing.com" ttl="300" emailAddress="ipadmin#stabletransit.com" updated="2012-10-10T21:33:36Z" created="2009-07-25T15:05:39Z">
    <ns2:nameservers>
        <ns2:nameserver name="dns1.stabletransit.com" />
        <ns2:nameserver name="dns2.stabletransit.com" />
    </ns2:nameservers>
    <ns2:recordsList totalEntries="5">
        <ns2:record id="A-2542579" type="A" name="addressing.com" data="198.101.155.141" ttl="300" updated="2012-10-10T21:33:35Z" created="2010-02-17T05:02:16Z" />
    </ns2:recordsList>
</ns2:domain>
<?xml version='1.0' encoding='UTF-8'?>
<ns2:domain xmlns:ns3="http://www.w3.org/2005/Atom" xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" xmlns="http://docs.rackspacecloud.com/dns/api/management/v1.0" id="2776403" name="alicemcmahon.com" ttl="300" emailAddress="ipadmin#stabletransit.com" updated="2013-10-21T16:43:17Z" created="2011-05-01T03:01:51Z">
    <ns2:nameservers>
        <ns2:nameserver name="dns1.stabletransit.com" />
        <ns2:nameserver name="dns2.stabletransit.com" />
    </ns2:nameservers>
    <ns2:recordsList totalEntries="10">
        <ns2:record id="A-6895108" type="A" name="alicemcmahon.com" data="216.185.152.144" ttl="300" updated="2013-10-21T16:43:17Z" created="2011-05-01T03:01:51Z" />
    </ns2:recordsList>
</ns2:domain>
<?xml version='1.0' encoding='UTF-8'?>
<ns2:domain xmlns:ns3="http://www.w3.org/2005/Atom" xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" xmlns="http://docs.rackspacecloud.com/dns/api/management/v1.0" id="1204247" name="americanletterpress.com" ttl="300" emailAddress="ipadmin#stabletransit.com" updated="2012-10-10T21:33:37Z" created="2009-07-25T15:05:41Z">
    <ns2:nameservers>
        <ns2:nameserver name="dns1.stabletransit.com" />
        <ns2:nameserver name="dns2.stabletransit.com" />
    </ns2:nameservers>
    <ns2:recordsList totalEntries="5">
        <ns2:record id="A-2542581" type="A" name="americanletterpress.com" data="198.101.155.141" ttl="300" updated="2012-10-10T21:33:36Z" created="2010-02-17T05:02:16Z" />
    </ns2:recordsList>
</ns2:domain>
<?xml version='1.0' encoding='UTF-8'?>
<ns2:domain xmlns:ns3="http://www.w3.org/2005/Atom" xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" xmlns="http://docs.rackspacecloud.com/dns/api/management/v1.0" id="1204249" name="americanwoodtype.com" ttl="300" emailAddress="ipadmin#stabletransit.com" updated="2012-10-10T21:33:38Z" created="2009-07-25T15:05:42Z">
    <ns2:nameservers>
        <ns2:nameserver name="dns1.stabletransit.com" />
        <ns2:nameserver name="dns2.stabletransit.com" />
    </ns2:nameservers>
    <ns2:recordsList totalEntries="5">
        <ns2:record id="A-2542583" type="A" name="americanwoodtype.com" data="198.101.155.141" ttl="300" updated="2012-10-10T21:33:37Z" created="2010-02-17T05:02:16Z" />
    </ns2:recordsList>
</ns2:domain>
You are asking the generator to produce an output file at the very start:

nameFile = ''
# ...
outfile = next(fs)

That's your blank filename right there. Postpone calling next(fs) until you have a value for nameFile, and not before. You could set outfile = None instead and test for None before you write:

if pat not in line:
    if outfile is not None:
        outfile.write(line)
else:
    items = line.split(pat)
    if outfile is not None:
        outfile.write(items[0])

If you need to handle lines before you can find your first filename, store those lines in a buffer instead, and clear the buffer when you first create a new file.

Not that I think you should be using a generator at all; you are really overcomplicating things by using one. Just create new file objects directly in your loop, that's much clearer. If all you are doing is split the file, use a buffer until you have a filename:

buffer = []
out_name = '/Users/peterf/Google Drive/2015 Projects-Strategy/Domain Admin/RackDomains/%s.part.xml'
outfile = None

with open(filename) as infile:
    for line in infile:
        # look for a filename to write to if we don't have one yet
        if outfile is None:
            match = namePat.search(line)
            if match:
                # New filename, open a file object
                outfile = open(out_name % match.group(1), 'w')
                # clear out the buffer, we'll write directly to
                # the file after this.
                outfile.writelines(buffer)
                buffer = []
        if '<?xml' in line:
            # new XML doc, close off the previous one
            if outfile is not None:
                outfile.close()
                outfile = None
        # line handling
        if outfile is None:
            buffer.append(line)
        else:
            outfile.write(line)

if outfile is not None:
    outfile.close()

# All lines processed; if there is a buffer left, then we have unhandled lines
if buffer:
    print('There were trailing lines without a name')
    print(*buffer, sep='')
Python XML to CSV parse result is None
I have this XML but I'm having an issue parsing it into CSV; I tried a simple print statement but am still getting no value:

<?xml version="1.0" encoding="UTF-8"?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.008.001.02" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <CstmrDrctDbtInitn>
        <GrpHdr>
            <MsgId>1820</MsgId>
            <CreDtTm>2016-05-17T11:56:12</CreDtTm>
            <NbOfTxs>197</NbOfTxs>
            <CtrlSum>136661.81</CtrlSum>
            <InitgPty>
                <Nm>GS Netherlands CDZ C.V.</Nm>
            </InitgPty>
        </GrpHdr>
    </CstmrDrctDbtInitn>
    <CstmrDrctDbtInitn>
        <GrpHdr>
            <CreDtTm>2016-05-18T10:34:51</CreDtTm>
            <NbOfTxs>1</NbOfTxs>
            <CtrlSum>758.99</CtrlSum>
            <InitgPty>
                <Nm>GS Netherlands CDZ C.V.</Nm>
            </InitgPty>
        </GrpHdr>
    </CstmrDrctDbtInitn>
</Document>

I want to iterate the value for each node. So far I have written the code below:

import xml.etree.ElementTree as ET
import csv

with open("D:\Python\Dave\\17_05_16_1820_DD201606B10_Base.xml") as myFile:
    tree = ET.parse(myFile)

ns = {'d': 'urn:iso:std:iso:20022:tech:xsd:pain.008.001.02'}

# open a file for writing
Resident_data = open('Bank.csv', 'w')

# create the csv writer object
csvwriter = csv.writer(Resident_data)
resident_head = []

# write header
MsgId = 'MsgId'
resident_head.append(MsgId)
CreDtTm = 'CreDtTm'
resident_head.append(CreDtTm)
NbOfTxs = 'NbOfTxs'
resident_head.append(NbOfTxs)
CtrlSum = 'CtrlSum'
resident_head.append(CtrlSum)
csvwriter.writerow(resident_head)

for member in tree.findall('.//d:Document/d:CstmrDrctDbtInitn/d:GrpHdr/d:MsgId', ns):
    resident = []
    # write values
    MsgId = member.find('MsgId').text
    resident.append(MsgId)
    CreDtTm = member.find('CreDtTm').text
    resident.append(CreDtTm)
    NbOfTxs = member.find('NbOfTxs').text
    resident.append(NbOfTxs)
    CtrlSum = member.find('CtrlSum').text
    resident.append(CtrlSum)
    csvwriter.writerow(resident)

Resident_data.close()

I get no error, and my Bank.csv has only the header but no data. Please help.
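For reference, a minimal sketch of iterating the namespaced elements with ElementTree. Two details matter: the parsed root element already is <Document>, so the findall path must not repeat it, and every child lookup also needs the namespace prefix. The sample below is a trimmed version of the question's XML (only the first GrpHdr, since the second one has no MsgId):

```python
import xml.etree.ElementTree as ET

xml = """<?xml version="1.0" encoding="UTF-8"?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.008.001.02">
  <CstmrDrctDbtInitn>
    <GrpHdr>
      <MsgId>1820</MsgId>
      <CreDtTm>2016-05-17T11:56:12</CreDtTm>
      <NbOfTxs>197</NbOfTxs>
      <CtrlSum>136661.81</CtrlSum>
    </GrpHdr>
  </CstmrDrctDbtInitn>
</Document>"""

root = ET.fromstring(xml)  # root is the <Document> element itself
ns = {'d': 'urn:iso:std:iso:20022:tech:xsd:pain.008.001.02'}

# Path starts below the root, and each find repeats the 'd:' prefix.
for hdr in root.findall('./d:CstmrDrctDbtInitn/d:GrpHdr', ns):
    row = [hdr.find('d:' + tag, ns).text
           for tag in ('MsgId', 'CreDtTm', 'NbOfTxs', 'CtrlSum')]
    print(row)
```

Note that missing children (like the absent MsgId in the question's second GrpHdr) would make find return None, which needs a guard before reading .text.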