Read through an unstructured xls file - python

I have an .xls file which contains one column with 2,000 rows.
I want to iterate through the file and print out the data points
which start with "cheap". However, the following code doesn't work.
Help!
import xlrd
wb = xlrd.open_workbook("file.xls")
wb.sheet_names()
sh = wb.sheet_by_index(0)
lst = [sh]
for item in lst:
    print item.startswith("cheap")
Traceback (most recent call last):
  File "C:\Python26\keywords.py", line 14, in <module>
    print item.startswith("cheap")
AttributeError: 'Sheet' object has no attribute 'startswith'

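lst = [sh] puts the Sheet object itself into a list, so the loop sees the sheet rather than its cells. Iterate over the cells of the first column instead. It should look like this: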
import xlrd
wb = xlrd.open_workbook("file.xls")
wb.sheet_names()
sh = wb.sheet_by_index(0)
for item in sh.col(0):
    value = unicode(item.value)
    if value.startswith("cheap"):
        print value
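For Python 3, the same filter could be sketched roughly as below. This is only an illustration: a plain list stands in for the cell values (which would come from something like [cell.value for cell in sh.col(0)]), and values_with_prefix is a hypothetical helper name, not part of xlrd.

```python
def values_with_prefix(values, prefix):
    """Return the values whose text form starts with prefix."""
    matches = []
    for value in values:
        text = str(value)  # numeric cells arrive as floats, so convert first
        if text.startswith(prefix):
            matches.append(text)
    return matches


# Stand-in data (an assumption) for the values of the first column.
column_values = ["cheap flights", "Cheap hotels", 42.0, "cheapest car hire", "not cheap"]
matches = values_with_prefix(column_values, "cheap")
print(matches)
```

Note that startswith is case-sensitive, so "Cheap hotels" is not matched here; lower-casing the text first would change that.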

Related

Converting excel to xml with Null values in excel

I'm trying to convert an excel file to xml using this skeleton code:
from openpyxl import load_workbook
from yattag import Doc, indent
wb = load_workbook("deneme.xlsx")
# Getting an object of active sheet 1
ws = wb.worksheets[0]
doc, tag, text = Doc().tagtext()
xml_header = '<?xml version="1.0" encoding="UTF-8"?>'
# Appends the String to document
doc.asis(xml_header)
with tag('userdata'):
    with tag('basicinf'):
        for row in ws.iter_rows(min_row=2, max_row=None, min_col=1, max_col=90):
            row = [cell.value for cell in row]
            a=row[0]
            with tag("usernumber"):
                text(row[0])
            with tag("username"):
                text(row[1])
            with tag("serviceareacode"):
                text(row[2])
            with tag("language"):
                text(row[3])
            with tag("welcomemsgid"):
                text(row[4])
            with tag("calledlimitedid"):
                text(row[5])
            with tag("followmeflag"):
                text(row[6])
            with tag("followmenumber"):
                text(row[7])
            with tag("mobilespecial"):
                text(row[8])
result = indent(
    doc.getvalue(),
    indentation=' ',
    indent_text=False
)
print(result)
with open("routescheme_{}.xml".format(a), "w") as f:
    f.write(result)
If I leave the first cell of a row (row[0]) empty in Excel, I get the error below:
Traceback (most recent call last):
  File "C:\Python39\lib\site-packages\yattag\simpledoc.py", line 489, in html_escape
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
AttributeError: 'NoneType' object has no attribute 'replace'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Users\abdul\Desktop\mmm\main.py", line 36, in <module>
    text(row[0])
  File "C:\Python39\lib\site-packages\yattag\simpledoc.py", line 179, in text
    transformed_string = html_escape(strg)
  File "C:\Python39\lib\site-packages\yattag\simpledoc.py", line 491, in html_escape
    raise TypeError(
TypeError: You can only insert a string, an int or a float inside a xml/html text node. Got None (type <class 'NoneType'>) instead.
My expectation is that when row[0] is empty it should be like <usernumber></usernumber> in my xml result file.
How can I do that?
I think I found a solution myself, although I am not sure it is a good one:
with tag("usernumber"):
    if row[0] is None:
        text()
    else:
        text(row[0])
Applying the same check to serviceareacode, an empty cell gives <serviceareacode></serviceareacode>, and a non-empty cell gives <serviceareacode>value</serviceareacode>.
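To avoid repeating the if/else for every tag, the None check could be moved into a small helper. The sketch below is only one possible approach; cell_text is a hypothetical name, and the sample row is made-up data standing in for [cell.value for cell in row].

```python
def cell_text(value):
    """Return an empty string for an empty (None) cell, else the value unchanged."""
    return "" if value is None else value


# Stand-in row (an assumption): the first cell is empty.
row = [None, "alice", 34]
cleaned = [cell_text(value) for value in row]
print(cleaned)
```

Inside the loop each call would then become text(cell_text(row[0])), text(cell_text(row[1])) and so on, so an empty cell produces an empty element instead of raising the TypeError.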

openpyxl: AttributeError: 'MergedCell' object attribute 'value' is read-only

When I try to fill a cell in an existing .xlsx file and then save it to a new file, I get the error shown after the code:
import openpyxl
path = "/home/karol/Dokumenty/wzor.xlsx"
wb_obj = openpyxl.load_workbook(path)
sheet_obj = wb_obj.active
new_protokol = sheet_obj
firma = input("Podaj nazwe: ")
nazwa_pliku = "Protokol odczytu"
filename = nazwa_pliku + firma + ".xlsx"
sheet_obj["C1"] = firma
sheet_obj["D1"] = input()
new_protokol.save(filename=filename)
Traceback (most recent call last):
  File "/home/karol/PycharmProjects/Protokolu/Main.py", line 16, in <module>
    sheet_obj["C1"] = firma
  File "/home/karol/PycharmProjects/Protokolu/venv/lib/python3.7/site-packages/openpyxl/worksheet/worksheet.py", line 309, in __setitem__
    self[key].value = value
AttributeError: 'MergedCell' object attribute 'value' is read-only
Process finished with exit code 1
How to fix it?
When you merge cells, all cells but the top-left one are removed from the worksheet. To carry the border information of the merged cell, the boundary cells of the merged range are created as MergedCell objects, which always have the value None. Write to the top-left cell instead:
ws.merge_cells('B2:F4')
top_left_cell = ws['B2']
top_left_cell.value = "My Cell"
Please try this approach; it should work for you.
To write to a merged cell, write to the cell in its upper-left corner, and the error will not occur.
ws['I6']="123123123"
wb.save(filename=path....)
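The top-left rule can be illustrated in plain Python, without openpyxl. The sketch below maps any coordinate to the cell that should actually be written; split_coordinate and resolve_anchor are hypothetical helper names, and the merged ranges are passed in as strings such as "B2:F4".

```python
import re


def split_coordinate(coordinate):
    """Split a coordinate such as 'C3' into a 1-based (row, column) pair."""
    match = re.fullmatch(r"([A-Z]+)([0-9]+)", coordinate.upper())
    if match is None:
        raise ValueError("not a cell coordinate: %r" % coordinate)
    letters, digits = match.groups()
    column = 0
    for letter in letters:
        column = column * 26 + (ord(letter) - ord("A") + 1)
    return int(digits), column


def resolve_anchor(coordinate, merged_ranges):
    """Return the top-left cell of the merged range containing coordinate,
    or the coordinate itself when it lies outside every merged range."""
    row, column = split_coordinate(coordinate)
    for merged in merged_ranges:
        top_left, bottom_right = merged.split(":")
        min_row, min_col = split_coordinate(top_left)
        max_row, max_col = split_coordinate(bottom_right)
        if min_row <= row <= max_row and min_col <= column <= max_col:
            return top_left
    return coordinate.upper()


print(resolve_anchor("C3", ["B2:F4"]))  # inside the merged range
print(resolve_anchor("A1", ["B2:F4"]))  # outside the merged range
```

In openpyxl, the merged ranges of a sheet can be read from ws.merged_cells.ranges (each one has a coord string such as "B2:F4"), so the returned anchor is the cell to assign the value to.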
I also ran into this error. I deleted my current Excel file and replaced it with a good Excel file, and then the error disappeared.

Python xls: Copying only non-hidden rows to new Workbook and save

I'm working on a script to copy only non-hidden rows from one file to a new workbook. Right now I have this:
import xlrd
import xlwt
from xlrd import open_workbook
wb = open_workbook('input.xls', formatting_info=True)
wb_sheet = wb.sheet_by_index(0)
newBook = xlwt.Workbook()
newSheet = newBook.add_sheet("no_hidden")
idx = 0
for row_idx in range(1, wb_sheet.nrows):
    hidden = wb_sheet.rowinfo_map[row_idx].hidden
    if(hidden is not True):
        for col_index, cell_value in enumerate(wb_sheet.row[row_idx]):
            newSheet.write(idx, col_index, cell_value)
        idx = idx+1
newBook.save("test.xls")
However, I get an error saying this:
Traceback (most recent call last):
  File "delete.py", line 16, in <module>
    for col_index, cell_value in enumerate(wb_sheet.row[row_idx]):
TypeError: 'method' object is not subscriptable
I think I'm handling the wb_sheet.row[]-object wrong, but I cannot figure out how to achieve what I want at this point. Any help would be great.
Thanks!
Please modify your code as shown below:
import xlrd
import xlwt
from xlrd import open_workbook
wb = open_workbook('input.xls', formatting_info=True)
wb_sheet = wb.sheet_by_index(0)
newBook = xlwt.Workbook()
newSheet = newBook.add_sheet("no_hidden")
idx = 0
for row_idx in range(0, wb_sheet.nrows):
    hidden = wb_sheet.rowinfo_map[row_idx].hidden
    if(hidden==0):
        for col_index, cell_obj in enumerate(wb_sheet.row(row_idx)):
            newSheet.write(idx, col_index, cell_obj.value)
        idx = idx+1
newBook.save("test1.xls")
Instead of wb_sheet.row[row_idx] it should be wb_sheet.row(row_idx), because row is a method and must be called rather than subscripted. It returns Cell objects, so write cell_obj.value, not the cell object itself, into the other Excel file.
Hope this helps.
According to the documentation, row is a method:
row(rowx)
Returns a sequence of the Cell objects in the given row.
Just change this part of your code:
for col_index, cell_value in enumerate(wb_sheet.row(row_idx)):
The same kind of error occurs when you subscript any function:
def a():pass
a[1] # TypeError: 'function' object is not subscriptable
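The filter-and-renumber logic from the corrected loop can be tried without any workbook. In this rough sketch, plain lists stand in for the sheet rows and a dict stands in for rowinfo_map[row_idx].hidden; copy_visible_rows and the sample data are assumptions made for illustration.

```python
def copy_visible_rows(rows, hidden_flags):
    """Return (new_index, row) pairs for the rows whose hidden flag is not set."""
    copied = []
    new_idx = 0
    for row_idx, row in enumerate(rows):
        # Assumption: a row missing from the flag mapping counts as visible.
        if hidden_flags.get(row_idx, 0):
            continue
        copied.append((new_idx, row))
        new_idx += 1  # advance only when a row was actually copied
    return copied


rows = [["name", "price"], ["apple", 1.5], ["pear", 2.0], ["plum", 0.8]]
hidden_flags = {0: 0, 1: 1, 2: 0, 3: 1}  # 1 means the row is hidden
visible = copy_visible_rows(rows, hidden_flags)
print(visible)
```

Keeping a separate output index, as idx does in the answer, means the copied rows end up contiguous in the new sheet instead of leaving gaps where the hidden rows were.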

How can I do automation for excel to xml in python?

I have been assigned a task in which I have to read an Excel document and store its data in an XML file. I wrote the Python code below, but it raises an error when writing the XML file.
#!/usr/bin/python
import xlrd
import xml.etree.ElementTree as ET
workbook = xlrd.open_workbook('anuja.xls')
workbook = xlrd.open_workbook('anuja.xlsx', on_demand = True)
worksheet = workbook.sheet_by_index(0)
first_row = [] # Header
for col in range(worksheet.ncols):
    first_row.append( worksheet.cell_value(0,col) )
# transform the workbook to a list of dictionaries
data =[]
for row in range(1, worksheet.nrows):
    elm = {}
    for col in range(worksheet.ncols):
        elm[first_row[col]]=worksheet.cell_value(row,col)
    data.append(elm)
for set1 in data :
    f = open('data.xml', 'w')
    f.write("<Progress>%s</Progress>" % (set1[0]))
    f.write("<P>%s</P>" % (set1[1]))
    f.write("<Major>%s</Major>" % (set1[2]))
    f.write("<pop>%s</pop>" % (set1[3]))
    f.write("<Key>%s</Key>" % (set1[4]))
    f.write("<Summary>%s</Summary>" % (set1[5]))
The error is:
Traceback (most recent call last):
  File "./read.py", line 23, in <module>
    f.write("<Progress>%s</Progress>" % (set1[0]))
KeyError: 0
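The error message tells you that set1 has no key 0. Each element of data is a dictionary keyed by the header names from the first row, not by column position, so look the values up by header name (for example set1[first_row[0]]) instead of set1[0].
Some more tips:
You open the XML file in write mode on every iteration of the loop, which truncates it each time, so only the last row would survive. Open it once, before the loop.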
There are easier ways to create XML files, check out this article https://pythonadventures.wordpress.com/2011/04/04/write-xml-to-file/
You should check out a python debugger, it will make it easy for you to investigate e.g. what your data loop looks from the inside. I like ipdb most https://pypi.python.org/pypi/ipdb
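As a rough sketch of the tips above, the list of dictionaries could be turned into XML with xml.etree.ElementTree from the standard library. The names rows_to_xml, Rows and Row are hypothetical, the sample data is made up, and the sketch assumes every header is a valid XML element name.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(rows, root_tag="Rows", row_tag="Row"):
    """Build one XML string from a list of {header: value} dictionaries."""
    root = ET.Element(root_tag)
    for row in rows:
        row_element = ET.SubElement(root, row_tag)
        # Look values up by header name, not by numeric position.
        for header, value in row.items():
            child = ET.SubElement(row_element, header)
            child.text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


# Stand-in for the data list built from the worksheet (an assumption).
data = [{"Progress": "Done", "P": 1.0}, {"Progress": "Open", "P": 2.0}]
xml_text = rows_to_xml(data)
print(xml_text)
```

The resulting string can then be written with a single open('data.xml', 'w') call, so the file is opened only once.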

Python openpyxl module says: AttributeError: 'tuple' object has no attribute 'upper'

I installed Python 3.4 together with the jdcal and openpyxl modules, and I am trying out the openpyxl library to read and write XLSX files from Python. The code lets me create the workbook and worksheet:
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
However, if I try to write to the first cell like this:
ws[ 1, 1] = 'testing 1-2-3'
Python says:
C:\Wolf\Python Studies>database.py
Traceback (most recent call last):
  File "C:\Wolf\Python Studies\database.py", line 13, in <module>
    ws[ 1, 1] = 'testing 1-2-3'
  File "C:\Python34\lib\site-packages\openpyxl-2.2.0b1-py3.4.egg\openpyxl\worksheet\worksheet.py", line 403, in __setitem__
    self[key].value = value
  File "C:\Python34\lib\site-packages\openpyxl-2.2.0b1-py3.4.egg\openpyxl\worksheet\worksheet.py", line 400, in __getitem__
    return self._get_cell(key)
  File "C:\Python34\lib\site-packages\openpyxl-2.2.0b1-py3.4.egg\openpyxl\worksheet\worksheet.py", line 368, in _get_cell
    coordinate = coordinate.upper()
AttributeError: 'tuple' object has no attribute 'upper'
C:\Wolf\Python Studies>
Any idea what I am doing wrong?
Cell coordinates should be provided as a string:
ws['A1'] = 'testing 1-2-3'
Or, if you want to use row and column indexes, use ws.cell().value:
ws.cell(row=1, column=1).value = 'testing 1-2-3'
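If you have numeric indexes but want the string form, the conversion is short. The sketch below is a plain-Python illustration with a hypothetical helper name, to_a1; openpyxl also provides its own get_column_letter in openpyxl.utils for the column part.

```python
def to_a1(row, column):
    """Convert 1-based row and column numbers to a coordinate such as 'A1'."""
    if row < 1 or column < 1:
        raise ValueError("row and column are 1-based")
    letters = ""
    while column > 0:
        # Columns use a base-26 scheme without a zero digit (A..Z, AA..AZ, ...).
        column, remainder = divmod(column - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return "%s%d" % (letters, row)


print(to_a1(1, 1))
print(to_a1(3, 27))
```

With that, ws[to_a1(1, 1)] = 'testing 1-2-3' would address the same cell as ws['A1'].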
