How can I do automation for excel to xml in python? - python

I have been assigned a task: read an Excel document and store its data in an XML file. I have written Python code for this, but it raises an error when I write the XML file.
#!/usr/bin/python
import xlrd
import xml.etree.ElementTree as ET
workbook = xlrd.open_workbook('anuja.xls')
workbook = xlrd.open_workbook('anuja.xlsx', on_demand = True)
worksheet = workbook.sheet_by_index(0)
first_row = [] # Header
for col in range(worksheet.ncols):
    first_row.append( worksheet.cell_value(0,col) )
# transform the workbook to a list of dictionaries
data =[]
for row in range(1, worksheet.nrows):
    elm = {}
    for col in range(worksheet.ncols):
        elm[first_row[col]]=worksheet.cell_value(row,col)
    data.append(elm)
for set1 in data :
    f = open('data.xml', 'w')
    f.write("<Progress>%s</Progress>" % (set1[0]))
    f.write("<P>%s</P>" % (set1[1]))
    f.write("<Major>%s</Major>" % (set1[2]))
    f.write("<pop>%s</pop>" % (set1[3]))
    f.write("<Key>%s</Key>" % (set1[4]))
    f.write("<Summary>%s</Summary>" % (set1[5]))
Error is
Traceback (most recent call last):
File "./read.py", line 23, in <module>
f.write("<Progress>%s</Progress>" % (set1[0]))
KeyError: 0

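The error message tells you that set1 has no key 0. Each set1 is a dictionary keyed by the column headers from the first row of the sheet, so it has to be indexed by header name (for example set1['Progress'], if that is the header text) rather than by position.
Some more tips:
You open the XML file in 'w' mode on every iteration of your loop, so each pass truncates the file and only the last row survives. Open it once, before the loop, and close it afterwards.
There are easier ways to create XML files; check out this article: https://pythonadventures.wordpress.com/2011/04/04/write-xml-to-file/
You should try a Python debugger; it makes it easy to inspect, for example, what your data list looks like from inside the loop. I like ipdb most: https://pypi.python.org/pypi/ipdb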
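As a rough illustration of the points above, here is a minimal sketch that builds the XML with xml.etree.ElementTree from a list of dictionaries and serializes it once. The function name rows_to_xml, the Rows/Row tag names, and the sample data are assumptions made for the example; they are not from the question.

```python
import xml.etree.ElementTree as ET

def rows_to_xml(rows, root_tag="Rows", row_tag="Row"):
    """Build an ElementTree from a list of dicts keyed by column header."""
    root = ET.Element(root_tag)
    for row in rows:
        item = ET.SubElement(root, row_tag)
        for header, value in row.items():
            # XML tag names cannot contain spaces, so replace them.
            tag = str(header).strip().replace(" ", "_")
            ET.SubElement(item, tag).text = str(value)
    return ET.ElementTree(root)

# Hypothetical stand-in for the `data` list built from the worksheet.
data = [
    {"Progress": "Done", "P": 1.0, "Major": "yes"},
    {"Progress": "Open", "P": 2.0, "Major": "no"},
]

tree = rows_to_xml(data)
xml_text = ET.tostring(tree.getroot(), encoding="unicode")
print(xml_text)

# To write the file a single time, after all rows have been added:
# tree.write("data.xml", encoding="utf-8", xml_declaration=True)
```

Because the elements are looked up by header name, the code does not depend on column positions, and the file is written only once.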

Related

Breaking the hash

I have to break 4 hash codes and find the number behind each one, but my code is not working.
These are the hash codes (in a CSV file):
javad :f478525457dcd5ec6223e52bd3df32d1edb600275e18d6435cdeb3ef2294e8de
milad : 297219e7de424bb52c040e7a2cbbd9024f7af18e283894fe59ca6abc0313c3c4
tahmine : 6621ead3c9ec19dfbd65ca799cc387320c1f22ac0c6b3beaae9de7ef190668c4
niloofar : 26d72e99775e03d2501416c6f402c265e628b7d02eee17a7671563c32e0cd9a3
My code:
import hashlib
import itertools as it
import csv
from typing import Dict
number=[0,1,2,3,4,5,6,7,8,9]
code = hashlib.sha256()
passwords = list(it.permutations(number, 4))
with open('passwords.csv', newline='') as theFile:
    reader = csv.reader(theFile)
passdic = dict()
# hpass is hash password
for hpass in passwords :
    encoded_hpass = ''.join(map(str, hpass)).encode('ascii')
    code = hashlib.sha256()
    code.update(encoded_hpass)
    passdic[encoded_hpass] = code.digest()
for row in theFile :
    for key, value in row.items():
        passdic[key].append(value)
And my result is:
'C:\Users\Parsa\AppData\Local\Programs\Python\Python38-32\python.exe' 'c:\Users\Parsa\.vscode\extensions\ms-python.python-2021.12.1559732655\pythonFiles\lib\python\debugpy\launcher' '3262' '--' 'c:\Users\Parsa\Desktop\project\hash breaker.py'
Traceback (most recent call last):
File "c:\Users\Parsa\Desktop\project\hash breaker.py", line 24, in <module>
for row in theFile :
ValueError: I/O operation on closed file.
You're trying to read from a closed file, which is impossible.
I don't know what your code is supposed to do, but here are the illogical parts:
This opens the file to parse it as CSV:
with open('passwords.csv', newline='') as theFile:
    reader = csv.reader(theFile)
Then later on you run:
for row in theFile :
    for key, value in row.items():
But by now you're outside of the with block, and the file is closed.
I guess you should use reader in place of theFile, and do the looping inside the with block, because reader reads from the same file object. Note also that csv.reader yields lists, which have no .items() method. If you really intend to loop over the raw lines of the file, you need to wrap the loop in a with open statement again.
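One way to put this together is sketched below. It assumes the hashes are SHA-256 hex digests of 4-digit codes (the 64-character strings in the question look like that, but this is a guess) and that the CSV holds one name,hash pair per row. The in-memory sample_csv and the names alice and bob are hypothetical stand-ins for passwords.csv. It also uses itertools.product rather than permutations, since permutations never repeats a digit.

```python
import csv
import hashlib
import io
import itertools

# Map each SHA-256 hex digest to the 4-digit code that produced it.
# product() allows repeated digits (e.g. "1123"); permutations() would skip those.
lookup = {}
for digits in itertools.product("0123456789", repeat=4):
    candidate = "".join(digits)
    lookup[hashlib.sha256(candidate.encode("ascii")).hexdigest()] = candidate

# Hypothetical stand-in for passwords.csv: one "name,hash" pair per row.
sample_csv = io.StringIO(
    "alice," + hashlib.sha256(b"0042").hexdigest() + "\n"
    "bob," + hashlib.sha256(b"9871").hexdigest() + "\n"
)

cracked = {}
# All reading happens while the file-like object is still open.
for name, hashed in csv.reader(sample_csv):
    cracked[name.strip()] = lookup.get(hashed.strip())

print(cracked)  # {'alice': '0042', 'bob': '9871'}
```

The lookup is keyed by hexdigest() because the file stores hex text; digest() returns raw bytes and would never match.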

Conversion of JSON to XML errors out when I try to write to file

I am in the process of doing a conversion of JSON to XML using Python.
I'm giving a presentation on how, starting with one file (CSV), you can convert it through multiple formats in a chain: CSV to JSON, that JSON to XML, XML to the next file type in the chain, and so on, back to CSV.
I obtained a public domain CSV file from Kaggle (https://www.kaggle.com/canggih/anime-data-score-staff-synopsis-and-genre), then converted it to JSON.
From JSON, I am trying to convert to XML and write to an outfile.
I converted the CSV to JSON using this (no formatting, just a straight conversion):
#This should convert CSV to JSON
import json, os
import pandas as pd
import csv
df = pd.read_csv('dataanime.csv')
df.to_json(r'sassyg_data_Anime.json')
Then, I created my JSON to XML file:
#With help from instructor and CodeSpeedy
#https://www.codespeedy.com/how-to-convert-json-to-xml-using-python/
#Import libraries
import json as j
import xml.etree.ElementTree as et
#load in the json file
with open("sassyg_data_Anime.json") as json_file_format:
    d = j.load(json_file_format)
#create the main container element for the entire XML file
r = et.Element("Work")
#creates the subelements for each part of the json file
et.SubElement(r,"Title").text = d["Title"]
et.SubElement(r,"Type").text = d["Type"]
et.SubElement(r,"Episodes").text = d["Episodes"]
et.SubElement(r,"Status").text = d["Status"]
et.SubElement(r,"Start airing").text = str(d["Start airing"])
et.SubElement(r,"End airing").text = str(d["End airing"])
et.SubElement(r,"Starting season").text = d["Starting season"]
et.SubElement(r,"Broadcast time").text = d["Broadcast time"]
et.SubElement(r,"Producers").text = d["Producers"]
et.SubElement(r,"Licensors").text = d["Licensors"]
et.SubElement(r,"Studios").text = d["Studios"]
et.SubElement(r,"Sources").text = d["Sources"]
et.SubElement(r,"Genres").text = d["Genres"]
et.SubElement(r,"Duration").text = str(d["Duration"])
et.SubElement(r,"Rating").text = d["Rating"]
et.SubElement(r,"Score").text = str(d["Score"])
et.SubElement(r,"Scored by").text = str(d["Scored by"])
et.SubElement(r,"Members").text = str(d["Members"])
et.SubElement(r,"Favorites").text = str(d["Favorites"])
et.SubElement(r,"Description").text = d["Description"]
#create the element tree/info for the write file
a = et.ElementTree(r)
#ERROR ERROR
#structure the output for xml via tostring rather than str
#Cannot write an ElementTree to file, errors out
#This was one solution I came up with, still errors out
a_xml_str = et.tostring(a)
print(a_xml_str)
#This might error out as well, I can't get the program to get to this point
#write file it should go to
outfile = open("json_to_xml.xml", 'w', encoding='utf-8')
outfile.write(a_xml_str)
outfile.close()
The error I get is:
Traceback (most recent call last):
File "F:\Data_Int_Final\Gardner_json_to_xml\convert_json_to_xml.py", line 44, in <module>
a_xml_str = et.tostring(a)
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\xml\etree\ElementTree.py", line 1109, in tostring
ElementTree(element).write(stream, encoding,
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\xml\etree\ElementTree.py", line 748, in write
serialize(write, self._root, qnames, namespaces,
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\xml\etree\ElementTree.py", line 873, in _serialize_xml
tag = elem.tag
AttributeError: 'ElementTree' object has no attribute 'tag'
This is the latest version of the code I've tried. Can anyone see a solution?
Update:
I have two other ways to convert to the starting JSON file, would one of these be a better approach?
import json
import csv
def make_json(csvFilePath, jsonFilePath):
    data = {}
    with open(csvFilePath, encoding='utf-8') as csvf:
        csvReader = csv.DictReader(csvf)
        for rows in csvReader:
            key = rows['Title']
            data[key] = rows
    with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
        jsonf.write(json.dumps(data, indent=4))
csvFilePath = r'dataanime.csv'
jsonFilePath = r'dataAnime.json'
make_json(csvFilePath, jsonFilePath)
which errors out my XML conversion when I use this JSON file with it:
Traceback (most recent call last):
File "F:\Data_Int_Final\convert_json_to_xml.py", line 16, in <module>
et.SubElement(r,"Title").text = d["Title"]
KeyError: 'Title'
or:
import csv
import json
import time
def csv_to_json(csvFilePath, jsonFilePath):
    jsonArray = []
    #read csv file
    with open(csvFilePath, encoding='utf-8') as csvf:
        #load csv file data using csv library's dictionary reader
        csvReader = csv.DictReader(csvf)
        #convert each csv row into python dict
        for row in csvReader:
            #add this python dict to json array
            jsonArray.append(row)
    #convert python jsonArray to JSON String and write to file
    with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
        jsonString = json.dumps(jsonArray, indent=4)
        jsonf.write(jsonString)
csvFilePath = r'dataanime.csv'
jsonFilePath = r'g_d_anime.json'
start = time.perf_counter()
csv_to_json(csvFilePath, jsonFilePath)
finish = time.perf_counter()
print(f"Conversion of all rows completed successfully in {finish - start:0.4f} seconds")
which errors out my XML conversion when I use this created JSON file with it:
Traceback (most recent call last):
File "F:\Data_Int_Final\convert_json_to_xml.py", line 16, in <module>
et.SubElement(r,"Title").text = d["Title"]
TypeError: list indices must be integers or slices, not str
It's simpler to work with the CSV file and generate an XML file from that directly.
Try something like this:
import csv
import xml.etree.ElementTree as et
root = et.Element('WorksXML')
tree = et.ElementTree(root)
with open("dataanime.csv", "r", encoding="utf-8") as fin:
    reader = csv.DictReader(fin)
    for row in reader:
        r = et.SubElement(root, "Work")
        # iterate over each of the fields and add to the XML element
        for field in reader.fieldnames:
            et.SubElement(r, field.replace(' ', '_')).text = row[field]
with open("csv_to_xml.xml", 'wb') as fout:
    tree.write(fout, xml_declaration=True, encoding='utf-8')
This generates an XML file with each "work" as a separate sub-element under the root element.
<?xml version="1.0" encoding="utf-8"?>
<WorksXML>
  <Work>
    <Title>Fullmetal Alchemist: Brotherhood</Title>
    <Type>TV</Type>
    <Episodes>64</Episodes>
    <Status>Finished Airing</Status>
    <Start_airing>4/5/2009</Start_airing>
    <End_airing>7/4/2010</End_airing>
    <Starting_season>Spring</Starting_season>
    ...
For the CSV to JSON conversion, the first approach creates a dictionary with titles as keys, and the second approach creates an array in which each item is an object holding all the attributes of one work.
If any works share a title, the first approach silently overwrites the earlier entries with the later ones. If there are no duplicates, it's just a matter of whether you want to access the data in the JSON file as a dictionary or as a list. If you want to generate XML from the JSON file, the second approach with an array is the better option.
To convert the array-based JSON file to XML, this will do the job:
import json
import xml.etree.ElementTree as ET
def json_to_xml(jsonFilePath, xmlFilePath):
    root = ET.Element('WorksXML')
    tree = ET.ElementTree(root)
    with open(jsonFilePath, "r", encoding="utf-8") as fin:
        jdata = json.load(fin)
        for obj in jdata:
            r = ET.SubElement(root, "Work")
            for key, value in obj.items():
                ET.SubElement(r, key.replace(' ', '_')).text = value
    with open(xmlFilePath, 'wb') as fout:
        tree.write(fout, xml_declaration=True, encoding='utf-8')
jsonFilePath = 'g_d_anime.json'
xmlFilePath = 'g_d_anime.xml'
json_to_xml(jsonFilePath, xmlFilePath)
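As for the original AttributeError: the traceback shows tostring() reading elem.tag, which suggests it was handed the ElementTree wrapper instead of an Element. A minimal sketch of the difference follows; the one-element document is made up for the example.

```python
import xml.etree.ElementTree as ET

root = ET.Element("Work")
ET.SubElement(root, "Title").text = "Example title"
tree = ET.ElementTree(root)

# tostring() wants an Element, so pass the root rather than the tree wrapper.
xml_bytes = ET.tostring(tree.getroot())                     # bytes by default
xml_text = ET.tostring(tree.getroot(), encoding="unicode")  # str

print(xml_text)  # <Work><Title>Example title</Title></Work>
```

Without encoding="unicode" the result is bytes, so writing it to a file opened in text mode ('w') would raise a TypeError. Either request a str as shown, or open the file in 'wb' mode.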

Search,replace text and save as based on text in document in Python

All, I am just getting started with Python, and I thought this might be a good time to see if it can help me automate a lot of the repetitive tasks I have to complete.
I am using a script I found on GitHub that searches and replaces text and then writes a new file named output.txt. It works fine, but since I have lots of these files, I need to be able to give them different names based on the text in the final modified document.
To make this a little more difficult, the name of the file is based on the text I will be modifying the document with.
So, basically, after I run this script, I have a file named output.txt that sits at C:\Program Files (x86)\Python35-32\Scripts\Text_Find_and_Replace\Result. I would like to rename this modified file to the plain text found on line 35 of the file.
I have figured out how to read a line from the file using
import linecache
line = linecache.getline("readme.txt", 1)
line
>>> line
'This is Python version 3.5.1\n'
I just need to figure out how to rename the file based on the variable "line"
Any Ideas?
#!/usr/bin/python
import os
import sys
import string
import re
## information/replacingvalues.txt this is the text of the values you want in your final document
information = open("C:\Program Files (x86)\Python35-32\Scripts\Text_Find_and_Replace\information/replacingvalues.txt", 'r')
#Text_Find_and_Replace\Result\output.txt This is the dir and the sum or final document
output = open("C:\Program Files (x86)\Python35-32\Scripts\Text_Find_and_Replace\Result\output.txt", 'w')
#field = open("C:\Program Files (x86)\Python35-32\Scripts\Text_Find_and_Replace\Field/values.txt"
# Field is the file or words you will be replacing
field = open("C:\Program Files (x86)\Python35-32\Scripts\Text_Find_and_Replace\Field/values.txt", 'r')
##
##
# modified code for autohot key
# Text_Find_and_Replace\Test/remedy line 1.ahk is the original doc you want modified
with open("C:\Program Files (x86)\Python35-32\Scripts\Text_Find_and_Replace\Test/remedy line 1.ahk", 'r') as myfile:
    inline = myfile.read()
#orig code
##with open("C:\Program Files (x86)\Python35-32\Scripts\Text_Find_and_Replace\Test/input.txt", 'r') as myfile:
##    inline = myfile.read()
informations = []
fields = []
dictionary = {}
i = 0
for line in information:
    informations.append(line.splitlines())
for lines in field:
    fields.append(lines.split())
    i = i+1;
if (len(fields) != len(informations) ):
    print ("replacing values and values have different numbers")
    exit();
else:
    for i in range(0, i):
        rightvalue = str(informations[i])
        rightvalue = rightvalue.strip('[]')
        rightvalue = rightvalue[1:-1]
        leftvalue = str(fields[i])
        leftvalue = leftvalue.strip('[]')
        leftvalue = leftvalue.strip("'")
        dictionary[leftvalue] = rightvalue
    robj = re.compile('|'.join(dictionary.keys()))
    result = robj.sub(lambda m: dictionary[m.group(0)], inline)
    output.write(result)
# close() must be called with parentheses; a bare "information.close" does nothing
information.close()
output.close()
field.close()
I figured out how...
import os
import linecache
linecache.clearcache()
newfilename= linecache.getline("C:\python 3.5/remedy line 1.txt",37)
filename = ("C:\python 3.5/output.ahk")
os.rename(filename, newfilename.strip())
linecache.clearcache()
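The same idea can be wrapped in a small helper, sketched here under the assumption that the chosen line contains only characters that are legal in a file name. The helper name rename_from_line and the temporary demo file are made up for the example.

```python
import linecache
import os
import tempfile

def rename_from_line(path, line_number):
    """Rename `path` to the text on `line_number` (1-based), keeping its folder."""
    linecache.checkcache(path)  # drop a stale cached copy if the file changed
    new_name = linecache.getline(path, line_number).strip()
    if not new_name:
        raise ValueError("line %d is empty or missing" % line_number)
    new_path = os.path.join(os.path.dirname(path), new_name)
    os.rename(path, new_path)
    return new_path

# Demo on a throwaway file so the sketch can run anywhere.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "output.txt")
with open(source, "w") as f:
    f.write("first line\nticket_1234.ahk\nthird line\n")

renamed = rename_from_line(source, 2)
print(os.path.basename(renamed))  # ticket_1234.ahk
```

Joining the new name onto the original directory keeps the renamed file next to the source instead of moving it into the current working directory.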

Reading contents of excel file in python webapp2

I have two files, sample.csv and sample.xlsx, both stored in the blobstore. I am able to read the records of the CSV file (which is in the blobstore) using the following code:
blobReader = blobstore.BlobReader(blob_key)
inputFile = BlobIterator(blobReader)
if inputFile is None:
    values = None
else:
    try:
        stringReader = csv.reader(inputFile)
        data = []
        columnHeaders = []
        for rowIndex, row in enumerate(stringReader):
            if(rowIndex == 0):
                columnHeaders = row
            else:
                data.append(row)
        values = {'columnHeaders' : columnHeaders, 'data' : data}
    except:
        values = None
self.response.write(values)
The output of the above code of a sample.csv file is
{'columnHeaders': ['First Name', 'Last Name', 'Email', 'Mobile'], 'data': [['fx1', 'lx2', 'flx1x2#xxx.com', 'xxx-xxx-xxxx'], ['fy1', 'ly2', 'fly1y2#yyy.com', 'yyy-yyy-yyyy'], ['fz1', 'lz2', 'flz1z2#zzz.com', 'zzz-zzz-zzzz']]}
Using the xlrd package, I am able to read the Excel file contents, but then I have to specify the exact file location:
book = xlrd.open_workbook('D:/sample.xlsx')
first_sheet = book.sheet_by_index(0)
self.response.write(first_sheet.row_values(0))
cell = first_sheet.cell(0,0)
self.response.write(cell.value)
Is there any way to read the Excel file contents from the blobstore? I have tried the following code:
blobReader = blobstore.BlobReader(blobKey)
uploadedFile = BlobIterator(blobReader)
book = xlrd.open_workbook(file_contents=uploadedFile)
(or)
book = xlrd.open_workbook(file_contents=blobReader)
But it throws an error: TypeError: 'BlobReader' object has no attribute '__getitem__'.
Any ideas? Thanks.
Looking at the documentation for open_workbook in the xlrd package, it seems that when you pass file_contents, it expects a string, not a reader object.
So you need to turn the blob into a string first, which can be done with BlobReader.read(); it returns the data read as a string.
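A minimal sketch of that suggestion, written against any file-like object so it can be tried outside App Engine. The helper names read_all_bytes and open_workbook_from_reader are made up here, and the io.BytesIO object merely stands in for blobstore.BlobReader(blobKey). Also keep in mind that xlrd 2.0 and later read only .xls files, so a .xlsx upload needs an older xlrd or a different library.

```python
import io

def read_all_bytes(reader):
    """Return the whole content of a file-like object such as a BlobReader."""
    return reader.read()

def open_workbook_from_reader(reader):
    """Hand the raw content to xlrd instead of the reader object itself."""
    import xlrd  # third-party; imported lazily so the helper above runs without it
    return xlrd.open_workbook(file_contents=read_all_bytes(reader))

# Stand-in for blobstore.BlobReader(blobKey); any object with read() will do.
fake_blob = io.BytesIO(b"raw workbook bytes would be here")
content = read_all_bytes(fake_blob)
print(len(content))
```

In the handler this would become open_workbook_from_reader(blobstore.BlobReader(blobKey)), without the BlobIterator wrapper.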

'ValueError: I/O operation on closed file.' when attempting to advance to new line from CSV

I'm a QA tester who is new to Python, trying to write a script that creates multiple XML files from a CSV file containing various fields. I feel I am close. Unfortunately, I get the following error when I add code to advance to the next line in the CSV file (line = next(reader)). If I don't add the line to advance, the program runs, but multiple XML files are created with information from only the first line of the CSV file. I can't figure out why or how to fix it.
Error Message:
Traceback (most recent call last):
File "C:\Users\xxxxxxxxx\Desktop\defxmlImportcsv.py", line 22, in <module>
line = next(reader)
ValueError: I/O operation on closed file.
Here is my code:
import xml.etree.ElementTree as etree
import csv
with open('datanames.csv') as csvfile:
    reader = csv.reader(csvfile)
    x=0
    line = next(reader)
    line = next(reader)
while x<2:
    filename = "Output"+str(x)+".xml"
    [firstName,lastName] = line
    print(line)
    tree = etree.parse('WB5655(BR-09).xml')
    root = tree.getroot()
    registration_id=tree.find('primaryApplicant/ssn')
    registration_id.text = str(53)
    first_name = tree.find('primaryApplicant/firstName')
    first_name.text = (line[0])
    last_name = tree.find('primaryApplicant/lastName')
    last_name.text =(line[1])
    line = next(reader)
    tree.write(filename)
    print(x)
    x=x+1
Any help would be greatly appreciated. Thanks in advance.
csvfile is automatically closed when you exit your with block. Which means that reader, in turn, can no longer read from it, causing your line = next(reader) line to fail.
The easiest (and likely most correct) fix is to add indentation to your code so that your while loop is inside the with block.
You exited the with statement:
with open('datanames.csv') as csvfile:
    reader = csv.reader(csvfile)
    x=0
    line = next(reader)
    line = next(reader)
while x<2:
    # ...
The moment the while line is reached the csvfile file object is closed, because, logically, that block is outside of the with statement (not matching the indentation).
The solution is to indent the whole while loop to be within the with block:
with open('datanames.csv') as csvfile:
    reader = csv.reader(csvfile)
    x=0
    line = next(reader)
    line = next(reader)
    while x<2:
        # ...
Rather than use while, use itertools.islice() to loop just twice:
from itertools import islice
tree = etree.parse('WB5655(BR-09).xml')
registration_id=tree.find('primaryApplicant/ssn')
registration_id.text = '53'
with open('datanames.csv') as csvfile:
    reader = csv.reader(csvfile)
    # skip two lines
    next(islice(reader, 2, 2), None)
    for x, row in enumerate(islice(reader, 2)):
        filename = "Output{}.xml".format(x)
        first_name = tree.find('primaryApplicant/firstName')
        last_name = tree.find('primaryApplicant/lastName')
        first_name.text, last_name.text = row
        tree.write(filename)
I simplified your XML handling as well; you don't have to read the input XML tree twice, for example.
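For experimenting without the real files, the same pattern can be tried end to end with in-memory data. In this sketch TEMPLATE and CSV_TEXT are hypothetical stand-ins for WB5655(BR-09).xml and datanames.csv, and it assumes a single header row instead of skipping two lines.

```python
import csv
import io
import os
import tempfile
import xml.etree.ElementTree as ET
from itertools import islice

# Hypothetical stand-ins for the template XML and the CSV file.
TEMPLATE = (
    "<application><primaryApplicant>"
    "<ssn/><firstName/><lastName/>"
    "</primaryApplicant></application>"
)
CSV_TEXT = "first,last\nAda,Lovelace\nAlan,Turing\nGrace,Hopper\n"

out_dir = tempfile.mkdtemp()
written = []

with io.StringIO(CSV_TEXT) as csvfile:
    reader = csv.reader(csvfile)
    next(reader)  # skip the header row (an assumption about the file layout)
    # Every read from `reader` happens inside the with block.
    for x, (first, last) in enumerate(islice(reader, 2)):
        root = ET.fromstring(TEMPLATE)
        root.find("primaryApplicant/ssn").text = "53"
        root.find("primaryApplicant/firstName").text = first
        root.find("primaryApplicant/lastName").text = last
        path = os.path.join(out_dir, "Output{}.xml".format(x))
        ET.ElementTree(root).write(path)
        written.append(path)

print([os.path.basename(p) for p in written])  # ['Output0.xml', 'Output1.xml']
```

Each output file gets the values from its own CSV row because the loop variable advances the reader, so no manual next(reader) call is needed inside the loop.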
