openpyxl read excel with filtered data - python

With openpyxl, I am reading an excel file which has some filters applied already.
from openpyxl import load_workbook
wb = load_workbook(r'C:\Users\dsivaji\Downloads\testcases.xlsx')  # raw string keeps the backslashes literal
ws = wb['TestCaseList']
print(ws['B3'].value)
My goal is to loop through the contents of column 'B'. With the code above I can read the content of cell 'B3'. If filters are applied, however, I don't want to start from the initial cell.
That is, I want to fetch only the rows that are visible in Excel after the filters are applied.
After searching the web for some time, I found that ws.row_dimensions might help through a visible property, but still no luck.
>>> ws.row_dimensions[1]
<openpyxl.worksheet.dimensions.RowDimension object at 0x03EF5B48>
>>> ws.row_dimensions[2]
<openpyxl.worksheet.dimensions.RowDimension object at 0x03EF5B70>
>>> ws.row_dimensions[3].visible
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'RowDimension' object has no attribute 'visible'
How can I achieve this?

You are almost there. The name of the attribute is hidden. If you replace visible in your code with hidden, it should work.

openpyxl is a library for the OOXML file format (.xlsx) and not a replacement for an application like Microsoft Excel. As such, support for filters is limited to reading and writing their definitions; openpyxl does not apply them itself.
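To make the hidden attribute concrete, here is a minimal sketch of collecting only the visible cells of one column. It assumes Excel saved the filtered-out rows with their hidden flag set, which is how filtered rows usually appear in the file. The helper name visible_column_values and the in-memory sample workbook are made up for illustration and are not part of openpyxl or the original question.

```python
from openpyxl import Workbook


def visible_column_values(ws, column="B", first_row=1):
    """Return the values of `column` for every row that is not marked hidden."""
    values = []
    for row in range(first_row, ws.max_row + 1):
        # row_dimensions[row].hidden is True for rows Excel saved as hidden
        if ws.row_dimensions[row].hidden:
            continue
        values.append(ws[f"{column}{row}"].value)
    return values


# Hypothetical stand-in for the real workbook: four rows, row 3 hidden.
wb = Workbook()
ws = wb.active
for i, name in enumerate(["header", "case-1", "case-2", "case-3"], start=1):
    ws.cell(row=i, column=2, value=name)
ws.row_dimensions[3].hidden = True

print(visible_column_values(ws))
```

With a real file you would pass the sheet obtained from load_workbook instead of the sample sheet. Rows hidden manually (not by a filter) are skipped as well, since the file does not distinguish the two cases at the row level.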

Related

Error writing to an existing xlsm file using OPENPYXL -> xml.etree.ElementTree.ParseError: mismatched tag: line 42, column 8

I want to write / change a value to a single cell of an existing xlsm file (let's call it filename.xlsm), maintaining all macros and properties of the original xlsm file.
filename.xlsm has several sheets. I wish to write the value 4 in cell E3 of sheet 'Sup'.
I get the error "xml.etree.ElementTree.ParseError: mismatched tag: line 42, column 8"
The Python code:
import pandas as pd
import numpy as np
import openpyxl
InputExcelfile = openpyxl.load_workbook('filename.xlsm', keep_vba=True)
sup_sheet = InputExcelfile['Sup']
sup_sheet['E3'] = 4
InputExcelfile.save('filename.xlsm')
I have tried deleting several sheets to see if the error disappears, but this only happens with this sheet.
After running the program and getting the error, I try opening the file in Excel and get this error:
[Image: Excel error message]
Help would be very much appreciated.
Many thanks.

getting the error; attributeerror: 'Worksheet' object has no attribute 'delete_rows' openpyxl

I'm writing code for a tool that performs GIS functions on an input Excel sheet. Sometimes the sheet comes in with two separate rows across the top for its attribute fields, and when there are two, I need to delete the top row. In that case the value of cell A1 will be "Naming".
I tried writing code to check this and delete the row, as below:
import arcpy, os, sys, csv, openpyxl
from arcpy import env
env.workspace = r"C:\Users\myname\Desktop\Yanko's tool"
arcpy.env.overwriteOutput = True
excel = r"C:\Users\myname\Desktop\Yanko's tool\Yanko's Duplicate tool\Construction_table_Example.xlsx"
layer = r"C:\Users\myname\Desktop\Yanko's tool\Yanko's Duplicate tool\Example_Polygons.shp"
output = r"C:\Users\myname\Desktop\Yanko's tool\\Yanko's Duplicate tool"
book = openpyxl.load_workbook(excel)
book.get_sheet_by_name("Construction Table format")
if ws.cell(row=1, column=1).value == "Naming":
    ws.delete_rows(1, 1)
book.save
book.close
It should just delete the first row if the condition is true, but I get this error:
Warning
(from warnings module):
File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\openpyxl\reader\worksheet.py", line 310
warn(msg)
UserWarning: Data Validation extension is not supported and will be removed
Traceback (most recent call last):
File "C:\Users\ronan.corrigan\Desktop\Yanko's tool\Yanko's Duplicate tool\Yanko's Tool.py", line 31, in <module>
ws.delete_rows(1, 1)
AttributeError: 'Worksheet' object has no attribute 'delete_rows'
any help in figuring out what I've done wrong would be greatly appreciated
thanks
First of all, according to the docs, the get_sheet_by_name function is deprecated, and you should index the workbook by sheet name to get the sheet instead:
book["Construction Table format"]
Another thing to note: in your code I don't see you setting the ws variable, which should be assigned whatever sheet object is returned. If you are setting it somewhere else, you may be using a different sheet object that doesn't have that method:
ws=book["Construction Table format"]
Other than that you'd have to share the stack trace to give a better understanding of what's breaking
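Putting the points above together, a hedged sketch of the check-and-delete step might look like the following. The helper name drop_naming_row and the in-memory sample data are invented for illustration. As far as I know, delete_rows only exists in openpyxl 2.5 and later, so an older install would raise exactly this AttributeError; that is an assumption about the asker's environment, not something the question confirms.

```python
from openpyxl import Workbook


def drop_naming_row(ws):
    """Delete row 1 when A1 holds the marker text "Naming"; report whether a row was removed."""
    if ws.cell(row=1, column=1).value == "Naming":
        ws.delete_rows(1, 1)  # needs a recent openpyxl (assumed 2.5 or later)
        return True
    return False


# Hypothetical stand-in for the real workbook loaded with load_workbook(excel).
book = Workbook()
book.active.title = "Construction Table format"
book.active.append(["Naming", None])
book.active.append(["ID", "Name"])
book.active.append([1, "Parcel A"])

ws = book["Construction Table format"]  # keep the returned sheet in a variable
removed = drop_naming_row(ws)
print(removed, ws["A1"].value)

# When working on a real file, save() must be called with parentheses and a path,
# for example book.save(excel); a bare `book.save` does nothing.
```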

Open and edit excel via python

I want to import an existing Excel file and edit it, but when I copy the file and try to edit the copy I get errors. Writing works without errors, but reading values from a cell fails.
import xlsxwriter
from xlrd import open_workbook
from xlwt import Workbook, easyxf
import xlwt
from xlutils.copy import copy
workbook=open_workbook("month.xlsx")
sheet=workbook.sheet_by_index(0)
print sheet.nrows
book = copy(workbook)
w_sheet=book.get_sheet(0)
print w_sheet.cell(0,0).value
Error: Traceback (most recent call last):
File "excel.py", line 18, in <module>
print w_sheet.cell(0,0).value
AttributeError: 'Worksheet' object has no attribute 'cell'
I haven't used this library, but looking at the documentation I think you are trying to do something it doesn't support. The worksheet documentation lists its functionality, and cell() is not there.
I think this library is for writing Excel files only, not reading them.
Perhaps try pandas read_excel() to read the Excel documents you create?
You can then use pandas iloc on the resulting DataFrame to get the value you want:
import pandas as pd
value = pd.read_excel("file.xlsx", sheet_name="sheet", header=None).iloc[0, 0]  # header=None keeps the first row as data
I think that's correct, although I can't run the code to check just now...
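If the goal is to read and edit the same .xlsx file, another option is a library that does both. The sketch below uses openpyxl, which is a swap from the xlrd/xlwt/xlutils combination in the question, and it builds a small temporary file so it can run on its own. The file contents and the temporary path are made up for illustration.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Build a small stand-in file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "month.xlsx")
wb = Workbook()
wb.active.append(["January", 31])
wb.save(path)

# Read an existing value, then edit the same sheet and save it back.
wb = load_workbook(path)
ws = wb.worksheets[0]
first_value = ws.cell(row=1, column=1).value  # openpyxl rows and columns are 1-based
ws.cell(row=1, column=2, value=30)
wb.save(path)

# Reload to confirm the edit was written to disk.
reloaded = load_workbook(path).worksheets[0]
print(first_value, reloaded["B1"].value)
```

Note that cell indices here start at 1, unlike the 0-based cell(0,0) call in the question.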

pandas pd.read_excel() returning empty dictionary

I am a novice Python programmer and I am having an issue loading an xlsx workbook with the pd.read_excel() function. The pandas read_excel documentation says that specifying 'sheet_name = None' should return "All sheets as a dictionary of DataFrames", however I am getting an empty dictionary back:
template_workbook = pd.read_excel(template_path, sheet_name=None, index_col=None)
template_workbook
Returns:
OrderedDict()
When I try to print the worksheet names in the dictionary:
template_workbook.sheet_name
Returns:
AttributeError                            Traceback (most recent call last)
<ipython-input-67-e76a0b915981> in <module>()
----> 1 template_workbook.sheet_name
AttributeError: 'OrderedDict' object has no attribute 'sheet_name'
It is not clear to me why the worksheets are not being listed in the output dictionary. Any tips are greatly appreciated.
I have 26 tabs/sheets, and am trying to fill 23 using the tab names for indexing.
When you use read_excel with multiple sheets, pandas will return a dictionary:
Returns: DataFrame or Dict of DataFrames
If you have a dictionary, you can use the .keys() method to see the file tabs, as in:
print(template_workbook.keys())
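As a rough illustration of what sheet_name=None returns, the sketch below writes a two-sheet workbook to a temporary file and reads it back. The sheet names "Alpha" and "Beta" and the sample data are invented, and the example assumes pandas has an .xlsx engine such as openpyxl installed.

```python
import os
import tempfile

import pandas as pd

# Write a small two-sheet workbook to a temporary location.
path = os.path.join(tempfile.mkdtemp(), "template.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"x": [1, 2]}).to_excel(writer, sheet_name="Alpha", index=False)
    pd.DataFrame({"y": [3]}).to_excel(writer, sheet_name="Beta", index=False)

# sheet_name=None asks for every sheet; the result maps sheet name -> DataFrame.
sheets = pd.read_excel(path, sheet_name=None)
sheet_names = list(sheets.keys())
print(sheet_names)

# Each value is an ordinary DataFrame, so it can be indexed by tab name.
for name, frame in sheets.items():
    print(name, frame.shape)
```

An empty result, as in the question, means no sheets were parsed at all, which points at the file itself rather than at how the dictionary is accessed.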
I found this post through Google as I ran into this same problem. Unfortunately, no errors were thrown which is not very helpful, so I'm posting this answer to help the next person who might find this.
The read_excel function in pandas doesn't support all Excel functionality. This means that if you are using some advanced Excel features (named ranges, for example), your data might not be parsed correctly when pandas tries to read your Excel file.
I tried to simplify my Excel file as much as possible which still didn't work, so I created a new Excel Workbook and copied my data in sheet by sheet. This ended up working for me.
So my advice is to keep your Excel file as simple as possible and you'll probably be able to import it with Pandas. If you send over your exact Excel file I'm happy to help debug (I know this is coming years after the question though).

Python xlwt produces AttributeError when searching for empty cell in Excel spreadsheet file

I have an Excel file and I am using Python to fill its rows and columns.
I want to use the following function to find the first empty row in the table and fill it:
from xlwt import Workbook, easyxf
def next_available_row(sheet):
    str_list = filter(None, sheet.col_values(1)) # error
    return str(len(str_list)+1)
wb=Workbook()
sheet=wb.add_sheet('sheet1')
sheet.write(0,0,'item')
sheet.write(0,1,'cost')
sheet.write(next_available_row(sheet),0,'potato')
sheet.write(next_available_row(sheet),1,4)
but I get the following error:
AttributeError: 'sheet' object has no attribute 'col_values'
What should I do?
The library you are using, xlwt, is for writing .xls spreadsheets only, and does not have the method col_values (to read its contents), as the error message already states (correctly).
The function next_available_row() (from How to find the first empty row of a google spread sheet using python GSPREAD?) that you want to use to search for an empty cell is based on a different library, gspread, and that is apparently not for Excel files (e.g. .xls, note there are several versions of this file type).
So you probably are looking for an entirely different library, one that reads and writes Excel files.
http://www.python-excel.org/ lists several libraries (including xlwt, which you are using):
https://pypi.python.org/pypi/xlrd
https://pypi.python.org/pypi/xlwt
https://pypi.python.org/pypi/XlsxWriter
https://pypi.python.org/pypi/openpyxl
Or maybe try to manage something by reading the file first, e.g. with xlwt's sister project, xlrd.
It seems the xlwt API has no col_values method: http://xlwt.readthedocs.io/en/latest/api.html
Maybe by using xlrd together with it you can reach your goal:
http://xlrd.readthedocs.io/en/latest/api.html?highlight=col_values#xlrd-sheet
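For comparison, here is a minimal sketch of the "first empty row" idea using openpyxl, one of the libraries listed above that can both read and write. This is a swap from xlwt and targets .xlsx rather than .xls; the helper below is a hypothetical rewrite of next_available_row, not the gspread original.

```python
from openpyxl import Workbook


def next_available_row(ws, column=2):
    """Return the 1-based index of the first row whose cell in `column` is empty."""
    row = 1
    while ws.cell(row=row, column=column).value is not None:
        row += 1
    return row


wb = Workbook()
ws = wb.active
ws.append(["item", "cost"])  # header row lands in row 1 of the empty sheet

# Look the row up once, then fill both cells of that row.
row = next_available_row(ws)
ws.cell(row=row, column=1, value="potato")
ws.cell(row=row, column=2, value=4)

print(row, next_available_row(ws))
```

Looking the row up once and reusing it avoids the problem in the original code, where the second call would see a different sheet state than the first. For simply adding rows at the end, ws.append(["potato", 4]) does the same job without a helper.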
