Converting CSV to XML

Converting CSV to XML - python

I'm currently trying to make the input file for a hydrologic model (HBV-light) compatible with external calibration software (PEST). HBV-light requires that its input files be in XML format, while PEST can only read text files. My issue relates to writing a script that will automatically convert a parameter set written by PEST (in CSV format) to an XML file that can be read by HBV-light.
Here's a short example of a text file that can be written by PEST:
W,X,Y,Z
1,2,3,4
and this is how I'm attempting to organize the XML file:
<Parameters>
<GroupA>
<W>1</W>
<X>2</X>
</GroupA>
<GroupB>
<Y>3</Y>
<Z>4</Z>
</GroupB>
</Parameters>
I don't have much programming experience, but here is the Python code I have written so far:
import csv
csvFile = 'myCSVfile.csv'
xmlFile = 'myXMLfile.xml'
csvData = csv.reader(open(csvFile))
xmlData = open(xmlFile, 'w')
xmlData.write('<?xml version="1.0" encoding="utf-8"?>' + "\n")
# there must be only one top-level tag
xmlData.write('<Catchment xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">' + "\n")
xmlData.write('<CatchmentParameters>' + "\n")
rowNum = 0
for row in csvData:
    if rowNum == 0:
        tags = row
        # replace spaces w/ underscores in tag names
        for i in range(0, 2):
            tags[i] = tags[i].replace(' ', '_')
    else:
        for i in range(0, 2):
            xmlData.write(' ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
    rowNum +=1
xmlData.write('</CatchmentParameters>' + "\n")
xmlData.write('<VegetationZone>' + "\n")
xmlData.write('<VegetationZoneParameters>' + "\n")
rowNum = 0
for row in csvData:
    if rowNum == 0:
        tags = row
        # replace spaces w/ underscores in tag names
        for i in range(3, 5):
            tags[i] = tags[i].replace(' ', '_')
    else:
        for i in range(3, 5):
            xmlData.write(' ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
    rowNum +=1
xmlData.write('</VegetationZoneParameters>' + "\n")
xmlData.write('</VegetationZone>' + "\n")
xmlData.write('</Catchment>' + "\n")
xmlData.close()
I can get Group A (CatchmentParameters, specifically) to be written, but the Group B section is NOT being written. I'm not sure what to do!

I think the loop is wrong.
See whether this works for you:
#! /usr/bin/env python
# coding= utf-8
import csv
csvFile = 'myCSVfile.csv'
xmlFile = 'myXMLfile.xml'
csvData = csv.reader(open(csvFile))
xmlData = open(xmlFile, 'w')
xmlData.write('<?xml version="1.0" encoding="utf-8"?>' + "\n")
# there must be only one top-level tag
xmlData.write('<Catchment xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">' + "\n")
xmlData.write('<CatchmentParameters>' + "\n")
rowNum = 0
for row in csvData:
    if rowNum == 0:
        tags = row
        # replace spaces w/ underscores in tag names
        for i in range(0, 2):
            tags[i] = tags[i].replace(' ', '_')
    else:
        # first two columns go into the catchment group
        for i in range(0, 2):
            xmlData.write(' ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write('</CatchmentParameters>' + "\n")
        xmlData.write('<VegetationZone>' + "\n")
        xmlData.write('<VegetationZoneParameters>' + "\n")
        # last two columns of the same row go into the vegetation zone group
        for i in range(2, 4):
            xmlData.write(' ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write('</VegetationZoneParameters>' + "\n")
        xmlData.write('</VegetationZone>' + "\n")
    rowNum +=1
xmlData.write('</Catchment>' + "\n")
xmlData.close()

I think the issue is in your range definition in the second part: range(3, 5) yields indices 3 and 4, i.e. the fourth and fifth elements. What you want is probably range(2, 4), i.e. the third and fourth elements.
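
The index arithmetic can be checked in isolation. This is only a small sketch; the `header` list is an assumption that mirrors the W,X,Y,Z example file from the question.

```python
# Header row of the example file from the question (assumed here as a plain list).
header = ["W", "X", "Y", "Z"]

# range(2, 4) yields the zero-based indices 2 and 3: the third and fourth columns.
second_group = [header[i] for i in range(2, 4)]
print(second_group)

# range(3, 5) yields indices 3 and 4; index 4 does not exist in a four-column row.
indices = list(range(3, 5))
valid = [i for i in indices if i < len(header)]
print(indices, valid)
```

With a four-column row, `header[4]` would raise an IndexError, so range(3, 5) cannot be the right slice for Group B.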

The problem is that you iterate over the contents of the CSV file twice. A csv.reader is consumed by the first loop, so you need to "rewind" (or reopen the file) before the second loop. There is also a minor indexing issue: the second range needs to be range(2,4) and not range(3,5), as was already pointed out.
I created a piece of code that appears to work. It can probably be improved upon by people who understand Python properly. Note - I added a couple of print statements to convince myself I understood what is happening. If you don't open the csvFile a second time (at "starting the second for loop"), then no rows get printed. That's your clue that this is the problem.
import csv
csvFile = 'myCSVfile.csv'
xmlFile = 'myXMLfile.xml'
csvData = csv.reader(open(csvFile))
xmlData = open(xmlFile, 'w')
xmlData.write('<?xml version="1.0" encoding="utf-8"?>' + "\n")
# there must be only one top-level tag
xmlData.write('<Catchment xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">' + "\n")
xmlData.write('<CatchmentParameters>' + "\n")
rowNum = 0
for row in csvData:
    print("row is", row)
    if rowNum == 0:
        tags = row
        # replace spaces w/ underscores in tag names
        for i in range(0, 2):
            tags[i] = tags[i].replace(' ', '_')
    else:
        for i in range(0, 2):
            xmlData.write(' ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
    rowNum +=1
xmlData.write('</CatchmentParameters>' + "\n")
xmlData.write('<VegetationZone>' + "\n")
xmlData.write('<VegetationZoneParameters>' + "\n")
rowNum = 0
print("starting the second for loop")
# reopen the file: the first reader has already been consumed
csvData = csv.reader(open(csvFile))
for row in csvData:
    print("row is now", row)
    if rowNum == 0:
        tags = row
        # replace spaces w/ underscores in tag names
        for i in range(2, 4):
            tags[i] = tags[i].replace(' ', '_')
    else:
        for i in range(2, 4):
            xmlData.write(' ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
    rowNum +=1
xmlData.write('</VegetationZoneParameters>' + "\n")
xmlData.write('</VegetationZone>' + "\n")
xmlData.write('</Catchment>' + "\n")
xmlData.close()
Using the above with the little test file you had given resulted in the following XML file:
<?xml version="1.0" encoding="utf-8"?>
<Catchment xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<CatchmentParameters>
 <W>1</W>
 <X>2</X>
</CatchmentParameters>
<VegetationZone>
<VegetationZoneParameters>
 <Y>3</Y>
 <Z>4</Z>
</VegetationZoneParameters>
</VegetationZone>
</Catchment>
Problem solved?
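
The exhaustion of the reader can also be reproduced without any files. The sketch below is not part of the original answer: it uses an in-memory string (`csv_text`, an assumed stand-in for the PEST output) and then shows one alternative to reopening the file, namely reading all rows into a list once.

```python
import csv
import io

# In-memory stand-in for the CSV file written by PEST (assumed content).
csv_text = "W,X,Y,Z\n1,2,3,4\n"

reader = csv.reader(io.StringIO(csv_text))
first_pass = [row for row in reader]
second_pass = [row for row in reader]  # the reader is already exhausted here
print(len(first_pass), len(second_pass))

# Alternative: materialise the rows once, then slice them as often as needed.
rows = list(csv.reader(io.StringIO(csv_text)))
tags, values = rows[0], rows[1]
group_a = dict(zip(tags[0:2], values[0:2]))  # first two columns
group_b = dict(zip(tags[2:4], values[2:4]))  # last two columns
print(group_a, group_b)
```

Keeping the rows in a list avoids opening the file twice, at the cost of holding the whole file in memory, which is not a concern for a small parameter set.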

Related

Tweepy error with exporting array content

I want to extract tweets and write them to a CSV file, but I cannot figure out how to get my script to generate the file. I am using Tweepy to extract the tweets. I would like the CSV file to contain the following columns: User, date, tweet, likes, retweets, total, eng rate, rating, tweet id
import tweepy
import csv
auth = tweepy.OAuthHandler("", "")
auth.set_access_token("", "")
api = tweepy.API(auth)
try:
    api.verify_credentials()
    print("Authentication OK")
except:
    print("Error during authentication")
def timeline(username):
    tweets = api.user_timeline(screen_name=username, count = '100', tweet_mode="extended")
    for status in (tweets):
        eng = round(((status.favorite_count + status.retweet_count)/status.user.followers_count)*100, 2)
        if (not status.retweeted) and ('RT #' not in status.full_text) and (eng <= 0.02):
            print (status.user.screen_name + ',' + str(status.created_at) + ',' + status.full_text + ",Likes: " + str(status.favorite_count) + ",Retweets: " + str(status.retweet_count) + ',Total: ' + str(status.favorite_count + status.retweet_count) + ',Engagement rate: ' + str(eng) + '%' + 'Rating: Low' + ',Tweet ID: ' + str(status.id))
        elif (not status.retweeted) and ('RT #' not in status.full_text) and (0.02 < eng <= 0.09):
            print (status.user.screen_name + ',' + str(status.created_at) + ',' + status.full_text + ",Likes: " + str(status.favorite_count) + ",Retweets: " + str(status.retweet_count) + ',Total: ' + str(status.favorite_count + status.retweet_count) + ',Engagement rate: ' + str(eng) + '%' + 'Rating: Good' + ',Tweet ID: ' + str(status.id))
        elif (not status.retweeted) and ('RT #' not in status.full_text) and (0.09 < eng <= 0.33):
            print (status.user.screen_name + ',' + str(status.created_at) + ',' + status.full_text + ",Likes: " + str(status.favorite_count) + ",Retweets: " + str(status.retweet_count) + ',Total: ' + str(status.favorite_count + status.retweet_count) + ',Engagement rate: ' + str(eng) + '%' + 'Rating: High' + ',Tweet ID: ' + str(status.id))
        elif (not status.retweeted) and ('RT #' not in status.full_text) and (0.33 < eng):
            print (status.user.screen_name + ',' + str(status.created_at) + ',' + status.full_text + ",Likes: " + str(status.favorite_count) + ",Retweets: " + str(status.retweet_count) + ',Total: ' + str(status.favorite_count + status.retweet_count) + ',Engagement rate: ' + str(eng) + '%' + 'Rating: Very High' + ',Tweet ID: ' + str(status.id))
tweet = timeline("twitter")
with open('tweet.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow([tweet])

You can look at https://docs.python.org/3/library/csv.html for information on how to generate a CSV file in Python. Quick example:
import csv
# newline='' lets the csv module control line endings itself
with open('some_output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["field1", "field2", "field3"])

Your function timeline does not return a value, but you are trying to use its return value, which will be None. Also, it looks like the tweet value should be a list of strings: the writerow method of csv.writer expects a list of items, not a list of lists. I have modified your code to address those issues (the function is renamed get_tweets here). Let me know if it works.
def get_tweets(username):
    tweets = api.user_timeline(screen_name=username, count=100)
    tweets_for_csv = [tweet.text for tweet in tweets]
    print(tweets_for_csv)
    return tweets_for_csv
tweet = get_tweets("fazeclan")
with open('tweet.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(tweet)
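
To get the nine columns listed in the question, each tweet can become one list and the whole batch can be written with writerows. The following is a minimal sketch under stated assumptions: it makes no Tweepy calls, `rating` and `build_row` are hypothetical helpers, the sample values are made up, and the thresholds are copied from the question's if/elif chain.

```python
import csv
import io

def rating(eng):
    """Map an engagement rate (in percent) to the labels used in the question."""
    if eng <= 0.02:
        return "Low"
    if eng <= 0.09:
        return "Good"
    if eng <= 0.33:
        return "High"
    return "Very High"

def build_row(user, created_at, text, likes, retweets, followers, tweet_id):
    """Return one CSV row; the arguments stand in for fields of a status object."""
    total = likes + retweets
    eng = round(total / followers * 100, 2)
    return [user, created_at, text, likes, retweets, total, eng, rating(eng), tweet_id]

header = ["User", "date", "tweet", "likes", "retweets", "total",
          "eng rate", "rating", "tweet id"]

# Made-up sample values, not real tweets.
rows = [
    build_row("example_user", "2021-01-01 12:00:00", "Hello, world", 5, 1, 10000, 1),
    build_row("example_user", "2021-01-02 12:00:00", "Second tweet", 40, 10, 10000, 2),
]

# StringIO keeps the sketch self-contained; for a real file use
# open('tweet.csv', 'w', newline='', encoding='utf-8') instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)   # one header row
writer.writerows(rows)    # one row per tweet
print(buffer.getvalue())
```

Because csv.writer quotes fields that contain commas, tweet text such as "Hello, world" stays in a single cell, which the string concatenation in the question would not guarantee.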

Python for loop incomplete

I am trying to convert all csv files in a folder to XML but having issues, code as follows:
import glob
import csv
fileCount = 0
path = "csvs\*.csv"
for fname in glob.glob(path):
print(fname)
for fname in glob.glob(path):
csvFile = fname
xmlFile = "csvs\myData" + str(fileCount) + ".xml"
print (xmlFile)
print (csvFile)
print (fileCount)
fileCount +=1
csvData = csv.reader(open(csvFile))
xmlData = open(xmlFile, 'w')
xmlData.write('<?xml version="1.0" encoding="UTF-8"?>' + "\n")
# there must be only one top-level tag
xmlData.write('<userforms>' + "\n")
rowNum = 0
for row in csvData:
if rowNum == 0:
tags = row
# replace spaces w/ underscores in tag names
for i in range(len(tags)):
tags[i] = tags[i].replace(' ', ' ')
else:
xmlData.write('<userform>' + "\n")
for i in range(len(tags)):
xmlData.write('<response>' + "\n" + ' <field>' + tags[i] + '</field>' + "\n" + ' <value>' + row[i] + '</value>'+ "\n" + '</response>' + "\n")
xmlData.write('</userform>' + "\n")
rowNum +=1
#print (fileCount)
#print (xmlFile)
#print (csvFile)
xmlData.write('</userforms>' + "\n")
#xmlData.write('</csv_data>' + "\n")
xmlData.close()
There are 4 CSV files in the original folder; the content of each is the same, but they are named 1.csv, 2.csv, 3.csv and 4.csv. This code does generate 4 XML files, but the first three are incomplete, with only the XML header written.
Is there any way to add a delay or check to a for loop to ensure it completes?
The console output is clean, showing only the printed info:
csvs\myData0.xml
csvs\1.csv
0
csvs\myData1.xml
csvs\2.csv
1
csvs\myData2.xml
csvs\3.csv
2
csvs\myData3.xml
csvs\4.csv
3
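
The listing above has lost its indentation, so the cause cannot be confirmed from it. One plausible explanation (an assumption) is that the row loop runs after the file loop has finished rather than inside it, so only the last file receives its rows; no delay would help in that case. The sketch below sidesteps the issue by doing all the work for one file inside a hypothetical `convert` function and by closing both files with `with`. The temporary folder and sample content are made up for the demonstration.

```python
import csv
import glob
import os
import tempfile
from xml.sax.saxutils import escape

def convert(csv_path, xml_path):
    """Write one <userform> per data row; both files are closed on exit."""
    with open(csv_path, newline='', encoding='utf-8') as src, \
         open(xml_path, 'w', encoding='utf-8') as dst:
        reader = csv.reader(src)
        tags = next(reader)  # header row
        dst.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        dst.write('<userforms>\n')
        for row in reader:
            dst.write('<userform>\n')
            for tag, value in zip(tags, row):
                dst.write('<response>\n')
                dst.write(' <field>' + escape(tag) + '</field>\n')
                dst.write(' <value>' + escape(value) + '</value>\n')
                dst.write('</response>\n')
            dst.write('</userform>\n')
        dst.write('</userforms>\n')

# Demo: four identical CSV files in a temporary folder, as in the question.
folder = tempfile.mkdtemp()
for n in range(1, 5):
    with open(os.path.join(folder, str(n) + '.csv'), 'w', newline='') as f:
        f.write('name,answer\nalice,yes\n')

outputs = []
for count, fname in enumerate(sorted(glob.glob(os.path.join(folder, '*.csv')))):
    xml_path = os.path.join(folder, 'myData' + str(count) + '.xml')
    convert(fname, xml_path)  # the whole conversion happens inside the loop body
    outputs.append(xml_path)
print(outputs)
```

Putting the per-file work in a function makes it hard to leave part of it outside the loop by accident, and the `with` block guarantees each XML file is flushed and closed before the next one is opened.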

Index out of range while converting file from csv to xml

When I run the following code to convert CSV to XML, I get an index out of range error.
The code works fine on a small subset of the file with 16 columns, but when I try it on a file with more than 30 columns it gives the following error:
Traceback (most recent call last):
File "csv2xml.py", line 40, in <module>
+ rowData[i] + '</' + tags[i] + '>' + "\n")
IndexError: list index out of range
#!/usr/bin/python
import sys
import os
import glob
delimiter = "," # "\t" "|" # delimiter used in the CSV file(s)
# the optional command-line argument may be a CSV file or a folder
if len(sys.argv) == 2:
    arg = sys.argv[1].lower()
    if arg.endswith('.csv'): # if a CSV file then convert only that file
        csvFiles = [arg]
    else: # if a folder path then convert all CSV files in that folder
        os.chdir(arg)
        csvFiles = glob.glob('*.csv')
# if no command-line argument then convert all CSV files in the current folder
elif len(sys.argv) == 1:
    csvFiles = glob.glob('*.csv')
else:
    os._exit(1)
for csvFileName in csvFiles:
    xmlFile = csvFileName[:-4] + '.xml'
    # read the CSV file as binary data in case there are non-ASCII characters
    csvFile = open(csvFileName, 'rb')
    csvData = csvFile.readlines()
    csvFile.close()
    tags = csvData.pop(0).strip().replace(' ', '_').split(delimiter)
    xmlData = open(xmlFile, 'w')
    xmlData.write('<?xml version="1.0" encoding="UTF-8" ?>' + "\n")
    # there must be only one top-level tag
    xmlData.write('<CTS>' + "\n")
    for row in csvData:
        rowData = row.strip().split(delimiter)
        xmlData.write('<Product>' + "\n")
        for i in range(len(tags)):
            xmlData.write(' ' + '<' + tags[i] + '>'
                          + rowData[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write('</Product>' + "\n")
    xmlData.write('</CTS>' + "\n")
    xmlData.close()

It sounds like your for loop over the lines of data should check the length of rowData like this:
tags_length = len(tags)
for row in csvData:
    rowData = row.strip().split(delimiter)
    xmlData.write('<Product>' + "\n")
    if len(rowData) >= tags_length:
        for i in range(tags_length):
            xmlData.write(' ' + '<' + tags[i] + '>'
                          + rowData[i] + '</' + tags[i] + '>' + "\n")
    xmlData.write('</Product>' + "\n")
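
The data file is not shown, so the actual cause is unknown. Rows with fewer fields than the header (blank lines, missing trailing values) are one possibility, and commas inside quoted fields are another way for a plain split to miscount columns. The sketch below is a different technique from the length check above: it uses csv.reader for parsing and zip_longest to pad short rows. The `sample` text and the `rows_to_xml` helper are assumptions made for illustration.

```python
import csv
import io
from itertools import zip_longest
from xml.sax.saxutils import escape

# Made-up sample: a quoted comma, a short row and a trailing blank line.
sample = 'id,name,note\n1,"Widget, large",ok\n2,Gadget\n\n'

def rows_to_xml(text, delimiter=","):
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    tags = [t.strip().replace(' ', '_') for t in next(reader)]
    lines = ['<CTS>']
    for row in reader:
        if not row:  # csv.reader yields an empty list for a blank line
            continue
        lines.append('<Product>')
        # Extra fields are cut off; missing ones are padded with ''
        # instead of raising IndexError.
        for tag, value in zip_longest(tags, row[:len(tags)], fillvalue=''):
            lines.append(' <' + tag + '>' + escape(value) + '</' + tag + '>')
        lines.append('</Product>')
    lines.append('</CTS>')
    return '\n'.join(lines)

xml_text = rows_to_xml(sample)
print(xml_text)
```

Whether padding with empty elements or skipping incomplete rows is the right behaviour depends on what the consumer of the XML expects.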

python script to convert csv to xml

Please help me correct this Python script so that it produces the required output.
I have written the code below to convert CSV to XML.
The input file has columns 1 to 278. The output file needs to have tags A1 to A278.
Code:
#!/usr/bin/python
import sys
import os
import csv
if len(sys.argv) != 2:
    os._exit(1)
path=sys.argv[1] # get folder as a command line argument
os.chdir(path)
csvFiles = [f for f in os.listdir('.') if f.endswith('.csv') or f.endswith('.CSV')]
for csvFile in csvFiles:
    xmlFile = csvFile[:-4] + '.xml'
    csvData = csv.reader(open(csvFile))
    xmlData = open(xmlFile, 'w')
    xmlData.write('<?xml version="1.0"?>' + "\n")
    # there must be only one top-level tag
    xmlData.write('<TariffRecords>' + "\n")
    rowNum = 0
    for row in csvData:
        if rowNum == 0:
            tags = Tariff
            # replace spaces w/ underscores in tag names
            for i in range(len(tags)):
                tags[i] = tags[i].replace(' ', '_')
        else:
            xmlData.write('<Tariff>' + "\n")
            for i in range(len(tags)):
                xmlData.write(' ' + '<' + tags[i] + '>' \
                              + row[i] + '</' + tags[i] + '>' + "\n")
            xmlData.write('</Tariff>' + "\n")
        rowNum +=1
    xmlData.write('</TariffRecords>' + "\n")
    xmlData.close()
Getting below error from script:
Traceback (most recent call last):
File "ctox.py", line 20, in ?
tags = Tariff
NameError: name 'Tariff' is not defined
Sample input file (this is only a sample; the actual input file will contain 278 columns).
If the input file has two or three records, all of them need to be appended to one XML file.
name,Tariff Summary,Record ID No.,Operator Name,Circle (Service Area),list
Prepaid Plan Voucher,test_All calls 2p/s,TT07PMPV0188,Ta Te,Gu,
Prepaid Plan Voucher,test_All calls 3p/s,TT07PMPV0189,Ta Te,HR,
Sample output file:
The TariffRecords and Tariff tags will be hard-coded at the beginning and end of the XML file.
<TariffRecords>
<Tariff>
<A1>Prepaid Plan Voucher</A1>
<A2>test_All calls 2p/s</A2>
<A3>TT07PMPV0188</A3>
<A4>Ta Te</A4>
<A5>Gu</A5>
<A6></A6>
</Tariff>
<Tariff>
<A1>Prepaid Plan Voucher</A1>
<A2>test_All calls 3p/s</A2>
<A3>TT07PMPV0189</A3>
<A4>Ta Te</A4>
<A5>HR</A5>
<A6></A6>
</Tariff>
</TariffRecords>

First off you need to replace
tags = Tariff
with
tags = row
Secondly, you want to change the write line so that it writes A1, A2, etc. instead of the tag names.
Complete code:
import sys
import os
import csv
if len(sys.argv) != 2:
    os._exit(1)
path=sys.argv[1] # get folder as a command line argument
os.chdir(path)
csvFiles = [f for f in os.listdir('.') if f.endswith('.csv') or f.endswith('.CSV')]
for csvFile in csvFiles:
    xmlFile = csvFile[:-4] + '.xml'
    csvData = csv.reader(open(csvFile))
    xmlData = open(xmlFile, 'w')
    xmlData.write('<?xml version="1.0"?>' + "\n")
    # there must be only one top-level tag
    xmlData.write('<TariffRecords>' + "\n")
    rowNum = 0
    for row in csvData:
        if rowNum == 0:
            # the header row only determines how many columns there are
            tags = row
            # replace spaces w/ underscores in tag names
            for i in range(len(tags)):
                tags[i] = tags[i].replace(' ', '_')
        else:
            xmlData.write('<Tariff>' + "\n")
            for i in range(len(tags)):
                # tags are named A1, A2, ... by column position
                xmlData.write(' ' + '<' + 'A%s' % (i+1) + '>' \
                              + row[i] + '</' + 'A%s' % (i+1) + '>' + "\n")
            xmlData.write('</Tariff>' + "\n")
        rowNum +=1
    xmlData.write('</TariffRecords>' + "\n")
    xmlData.close()
Output:
<?xml version="1.0"?>
<TariffRecords>
<Tariff>
<A1>Prepaid Plan Voucher</A1>
<A2>test_All calls 2p/s</A2>
<A3>TT07PMPV0188</A3>
<A4>Ta Te</A4>
<A5>Gu</A5>
<A6></A6>
</Tariff>
<Tariff>
<A1>Prepaid Plan Voucher</A1>
<A2>test_All calls 3p/s</A2>
<A3>TT07PMPV0189</A3>
<A4>Ta Te</A4>
<A5>HR</A5>
<A6></A6>
</Tariff>
</TariffRecords>

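import pandas as pd
from xml.etree import ElementTree as xml
# read every cell as text and keep empty cells as '' rather than NaN
df = pd.read_csv("file_path", dtype=str, keep_default_na=False)
root = xml.Element("TariffRecords")
for record in df.values:
    # one <Tariff> element per CSV record
    tariff = xml.SubElement(root, "Tariff")
    for index, data in enumerate(record, start=1):
        # columns become <A1>, <A2>, ... in order
        field = xml.SubElement(tariff, "A" + str(index))
        field.text = data
print(xml.tostring(root, encoding="unicode"))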

If Else Range Search Python 3.5

I am trying to use Python 3.5 to create an .xml document from a CSV file. The .xml requires a specific schema, which I have been able to replicate in Python. My issue is that part of the schema needs to change depending on input from the CSV: if a specific product is listed in the CSV, a couple of lines of the XML need to be left out. I have provided a basic example below. The issue lies in the middle of the code, where I compare against a predefined string variable and use an if/else statement on the range to eliminate the unneeded lines. No matter what I do, the else branch is always taken; the if condition never evaluates to true, although the data in the source document matches the predefined string, so the two lines for the range are always left out. Thanks in advance.
#! /usr/bin/env python
# coding= utf-8
import csv
csvFile = 'PRODUCT LIST.csv'
xmlFile = 'PRODUCT LIST.xml'
csvData = csv.reader(open(csvFile))
xmlData = open(xmlFile, 'w')
var1 = 'CocaCola'
xmlData.write('<?xml version="1.0" encoding="utf-8"?>' + "\n")
# there must be only one top-level tag
xmlData.write('<ArrayOfProducts>' + "\n")
rowNum = 0
for row in csvData:
if rowNum == 0:
tags = row
# replace spaces w/ underscores in tag names
for i in range(0, 12):
tags[i] = tags[i].replace(' ', '_')
xmlData.write(' <Product>' + "\n")
xmlData.write(' <Name />' + "\n")
for i in range(0, 2):
xmlData.write(' ' + '<' + tags[i] + '>' \
+ row[i] + '</' + tags[i] + '>' + "\n")
xmlData.write(' <List>' + "\n")
for i in range(2, 3):
xmlData.write(' <List ' + "p:type=" + '"' + row[i] + '"' + ' ' + "xmlns:p=" '"xsi"' '>' "\n")
for i in range(3, 7):
xmlData.write(' ' + '<' + tags[i] + '>' \
+ row[i] + '</' + tags[i] + '>' + "\n")
if i in range(3,4) == var1:
for i in range(7, 9):
xmlData.write(' ' + '<' + tags[i] + '>' \
+ row[i] + '</' + tags[i] + '>' + "\n")
else:
pass
xmlData.write(' <Supported>' + "\n")
for i in range (9, 10):
xmlData.write(' <Manufacturer ' + "p:type=" + '"' + row[i] + '"' + '>' "\n")
for i in range(10, 11):
xmlData.write(' ' + '<' + tags[i] + '>' \
+ row[i] + '</' + tags[i] + '>' + "\n")
xmlData.write(' </Manufacturer>' + "\n")
xmlData.write(' </Supported>' + "\n")
for i in range(11, 12):
xmlData.write(' ' + '<' + tags[i] + '>' \
+ row[i] + '</' + tags[i] + '>' + "\n")
xmlData.write(' </Manufacturer>' + "\n")
xmlData.write(' </List>' + "\n")
xmlData.write(' </Product>' + "\n")
rowNum +=1
xmlData.write('</ArrayOfProducts>' + "\n")
xmlData.close()
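
The condition `if i in range(3,4) == var1` is a chained comparison in Python, which would explain why the else branch always runs. The sketch below shows how it is evaluated. The sample `row` and the guess that the product name sits in column index 3 are assumptions, since the CSV is not shown.

```python
var1 = 'CocaCola'

# Made-up row; the product name is assumed to be in column index 3.
row = ['id1', 'desc', 'Retail', 'CocaCola', 'a', 'b', 'c', 'opt1', 'opt2']

i = 3  # even with i inside the range, the original test fails

# "a in b == c" is evaluated as "(a in b) and (b == c)".
# A range object never equals a string, so the result is always False.
original_test = i in range(3, 4) == var1

# Comparing the cell itself is probably what was intended.
intended_test = row[3] == var1

print(original_test, intended_test)
```

Replacing the condition with a direct comparison such as `row[3] == var1` makes the branch depend on the data in the row rather than on the loop counter.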
