Limit the number of worksheets in an Excel file using Python

I need to limit the number of worksheets in an Excel file to a specific number using Python. Once the workbook reaches that number of sheets, the user should not be able to add a new one.
I couldn't find a solution using xlsxwriter, openpyxl or xlrd.
Is there an option available in some other package?

Excel doesn't have such functionality built in. You can only disallow creating new sheets by protecting the workbook or with a VBA handler that reverses the operation immediately.
Protecting the workbook also disables other worksheet operations such as moving, renaming and hiding/unhiding, which may or may not be desirable.
OTOH, the VBA handler can be more intelligent than the one at the link:
Private Sub Workbook_NewSheet(ByVal Sh As Object)
    If ThisWorkbook.Worksheets.Count > <Maximum> Then
        With Application
            .ScreenUpdating = False
            .DisplayAlerts = False
            Sh.Delete
            .DisplayAlerts = True
            .ScreenUpdating = True
        End With
    End If
End Sub
Of course, this has no effect if someone edits the file with a third-party package that doesn't run VBA, or disables macros in Excel.
To carry macros, the workbook must be saved as .xlsm; otherwise Excel gives an error on opening.
For the Python side, see "Working with VBA Macros" in the XlsxWriter documentation. openpyxl cannot work with macros, only preserve them at most, and xlrd is designed only to read files, not edit them. Alternatively, there's always Excel's own COM interface, which pywin32 can use.
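Because the handler does nothing when macros are disabled or the file is edited outside Excel, a Python-side check can at least detect a workbook that has grown past the limit. The following is a minimal sketch using only the standard library. It assumes the usual OOXML layout, in which xl/workbook.xml lists one sheet element per sheet. count_sheets and MAX_SHEETS are hypothetical names, and the demo builds a stripped-down stand-in package rather than a real workbook.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main SpreadsheetML parts.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
MAX_SHEETS = 3  # hypothetical limit


def count_sheets(xlsx):
    """Count the <sheet> entries listed in xl/workbook.xml (path or file object)."""
    with zipfile.ZipFile(xlsx) as package:
        root = ET.fromstring(package.read("xl/workbook.xml"))
    return len(root.findall(f"{{{MAIN_NS}}}sheets/{{{MAIN_NS}}}sheet"))


# Demo: a stand-in package containing only the part this check reads.
workbook_xml = (
    f'<workbook xmlns="{MAIN_NS}"><sheets>'
    '<sheet name="A" sheetId="1"/><sheet name="B" sheetId="2"/>'
    '<sheet name="C" sheetId="3"/><sheet name="D" sheetId="4"/>'
    "</sheets></workbook>"
)
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as package:
    package.writestr("xl/workbook.xml", workbook_xml)

sheet_count = count_sheets(demo)
over_limit = sheet_count > MAX_SHEETS
print(sheet_count, over_limit)
```

This only reports the situation; it does not stop anyone from adding sheets, so it complements the protection or VBA approach rather than replacing it.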

Related

Write data with Python into existing excel file keeping it intact as much as possible

We have a rather complicated Excel-based VBA tool that is to be replaced step by step by a proper database and a Python-based application.
During the transition, the not yet complete Python tool and the existing VBA solution will coexist.
To allow interoperability, the Python tool must be able to export the database values into the Excel VBA tool while keeping it intact. That means not only must all VBA code work as expected, but shapes, special formats, checkboxes etc. must also still work after the export.
Currently a simple:
from openpyxl import load_workbook
wb = load_workbook(r'Tool.xlsm', keep_vba=True)
# Write some data (the file breaks even without this step)
wb["SomeSheet"]["A1"] = "SomeValue"  # placeholder sheet name and cell
wb.save(r"Tool_filled.xlsm")
will destroy the file, i.e. shapes won't work, and neither will checkboxes. (The resulting file is only 5 MB, down from the original 8 MB, which shows that something went quite wrong.)
Is there a way to modify only the data of an Excel sheet while keeping everything else intact/untouched?
As far as I know, an Excel file is just a set of zipped .xml files. So it should be possible to edit only the relevant sheets, correct?
Is there a more comfortable way to modify only the data of an existing Excel file than writing everything from scratch?
Note: The solution has to work on Linux, so simple remote Excel calls are not an option.
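The zip-based idea in the question can be sketched with the standard library alone: copy every entry of the package unchanged and swap in new bytes for a single part. This is an illustrative sketch, not a tested recipe for the tool above. replace_part is a hypothetical helper, the part names are only typical ones, and the demo uses a stand-in archive instead of a real workbook. Producing valid sheet XML is the hard part and is not shown; cell text, for instance, usually lives in xl/sharedStrings.xml rather than in the sheet part itself.

```python
import io
import zipfile


def replace_part(src, dst, part_name, new_bytes):
    """Copy every zip entry from src to dst unchanged, except part_name."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            if info.filename == part_name:
                zout.writestr(info, new_bytes)
            else:
                zout.writestr(info, zin.read(info.filename))


# Demo on a stand-in package (not a real workbook).
original = io.BytesIO()
with zipfile.ZipFile(original, "w") as z:
    z.writestr("xl/worksheets/sheet1.xml", "<old/>")
    z.writestr("xl/vbaProject.bin", b"\x00macro bytes")

patched = io.BytesIO()
replace_part(original, patched, "xl/worksheets/sheet1.xml", b"<new/>")

with zipfile.ZipFile(patched) as z:
    sheet_xml = z.read("xl/worksheets/sheet1.xml")
    vba_bytes = z.read("xl/vbaProject.bin")
```

Everything that is not explicitly replaced (macros, drawings, form controls) is carried over byte for byte, which is the property the question asks for.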

openpyxl: password protect entire excel file (xlsx)

I am trying to find Pythonic ways to encrypt/password-protect Excel xlsx files. I came across openpyxl, whose documentation (https://openpyxl.readthedocs.io/en/stable/protection.html) states that it can do so.
However, I get the error AttributeError: 'NoneType' object has no attribute 'workbookPassword' when I run the following. Can anyone help?
from openpyxl import load_workbook
file = 'test.xlsx'  # an existing xlsx
wb = load_workbook(filename = file)
wb.security.workbookPassword = 'test_password'
wb.security.lockStructure = True
Edit:
I believe I have used the function improperly, though the documentation is not very clear on this. It also mentions that the password can be set using openpyxl.workbook.protection.WorkbookProtection.workbookPassword(), which differs from their example.
You use "encrypt" and "password-protect" for Excel xlsx files as if they were synonyms. However, please note that in MS Office these are not the same (although one may argue about the wording here)! This can be seen from the screenshot below, or simply open an Excel file, go to "File", and under Permissions click on "Protect Workbook".
Screenshot Excel Protection
The workbookPassword from openpyxl only prevents modifications of the workbook structure. Their documentation states that this is only meant
To prevent other users from viewing hidden worksheets, adding, moving,
deleting, or hiding worksheets, and renaming worksheets, you can
protect the structure of your workbook with a password.
See also their documentation here.
This only refers to the structure, i.e. to adding/removing sheets. It explicitly does not stop users from reading the file, nor from editing the contents of the (already available) sheets! In particular, the password offers no read protection in the sense of encryption.
I have not found a way to actually encrypt an Excel file with openpyxl, but other packages might do just that. I know encrypted files can be decrypted using Python, see e.g. this post. Therefore, I am guessing you can also encrypt such files in a similar way, although sadly right now I cannot test this.
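As for the AttributeError in the question: a workbook loaded from a file without existing protection can have wb.security set to None, so there is nothing to set attributes on. One way around it, sketched here under the assumption that structure protection (not encryption) is what is wanted, is to assign a fresh WorkbookProtection object. The in-memory workbook stands in for the question's test.xlsx.

```python
import io

from openpyxl import Workbook, load_workbook
from openpyxl.workbook.protection import WorkbookProtection

wb = Workbook()  # stands in for load_workbook('test.xlsx')

# Assign a new protection object instead of mutating wb.security,
# which may be None for a loaded file.
wb.security = WorkbookProtection(
    workbookPassword="test_password",
    lockStructure=True,
)

# Save to memory and read back to confirm the setting survives.
buffer = io.BytesIO()
wb.save(buffer)

reloaded = load_workbook(io.BytesIO(buffer.getvalue()))
structure_locked = reloaded.security.lockStructure
print(structure_locked)
```

The saved file can still be opened and read by anyone; only structural changes such as adding or renaming sheets ask for the password in Excel.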

Saving XlsxWriter workbook more than once

I am writing software that manipulates Excel sheets. So far, I've been using xlrd and xlwt to do so, and everything works pretty well.
It opens a sheet (xlrd) and copies selected columns to a new workbook (xlwt).
It then opens the newly created workbook to read the data (xlrd) and does some math and formatting with it (which couldn't be done unless the file had been saved once), after which xlwt saves once again.
However, I now want to add charts to my documents, which xlwt does not support. I have found that xlsxwriter does, but it adds another complication to my code: xlsxwriter only has workbook.close(), which saves AND closes the document.
Does anyone know of a workaround? Once I call close(), the workbook object containing the document I'm writing is no longer usable.
Fundamentally, there is no reason you need to read twice and save twice. For your current (no charts) process, you can just read the data you need using xlrd; then do all your processing; and write once with xlwt.
Following this workflow, it is a relatively simple matter to replace xlwt with XlsxWriter.
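A rough sketch of that single-pass workflow with XlsxWriter might look like the following. The hard-coded list stands in for values that would be read with xlrd, the doubling stands in for the real math, and the sheet name and cell positions are arbitrary choices for illustration.

```python
import io
import zipfile

import xlsxwriter

values = [3, 1, 4, 1, 5]            # stands in for data read with xlrd
doubled = [v * 2 for v in values]   # all processing happens in memory

output = io.BytesIO()
workbook = xlsxwriter.Workbook(output, {"in_memory": True})
worksheet = workbook.add_worksheet("Data")
worksheet.write_column("A1", values)
worksheet.write_column("B1", doubled)

# The chart refers to cells by range, so it needs no intermediate save.
chart = workbook.add_chart({"type": "line"})
chart.add_series({"values": "=Data!$B$1:$B$5"})
worksheet.insert_chart("D2", chart)

workbook.close()  # the one and only save

# Inspect the result: an xlsx file is a zip package.
with zipfile.ZipFile(io.BytesIO(output.getvalue())) as package:
    part_names = package.namelist()
```

Since the computed values are already available in Python before anything is written, there is no need to reopen the file between steps.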

Can't find the active or selected cell in excel using Openpyxl

I want to use Python to find the address or coordinates of the currently active or selected cell in an Excel spreadsheet's currently active sheet.
So far all I've been able to do is find the active sheet. Perhaps I'm just using the wrong words to search. However, this is the first time in two years of writing first VBA and now Python that I haven't been able to just search and find the answer, even if it took me half a day.
I've crawled through the code at readthedocs (http://openpyxl.readthedocs.org/en/latest/_modules/index.html)
and looked through the openpyxl.cell.cell, openpyxl.worksheet.worksheet, openpyxl.worksheet.views code. The last seemed to have some promise and led me to writing the code below. Still, no joy, and I don't seem to be able to phrase my online searches to be able to pinpoint results that talk about finding the actual active/selected cell. Perhaps this is because openpyxl is really looking at the saved spreadsheet which might not include any data on the last cell to be selected.
I've tried it both in Python 3.4.3 and 2.7.11. Using openpyxl 2.4.0.
Here's the code that got me the closest to my goal. I was running it in Python3.
from openpyxl.worksheet.views import Selection
import openpyxl
wb = openpyxl.load_workbook('example.xlsx')
ws = wb.active
print(wb.get_sheet_names())
print(ws)
print(Selection.activeCell)
Which gives me the below.
['Sheet1', 'Sheet2', 'Sheet3']
<Worksheet "Sheet3">
Values must be of type <class 'str'>
I put in the first two prints just to prove to myself that I'm actually accessing the workbook/sheet.
If I change the last line to:
print(Selection.activeCellId)
I get:
Values must be of type <class 'int'>
I assume this is because these are only for writing, not querying. I've toyed with the idea of writing a VBA macro and just running it from Python. However, this code will be used with spreadsheets I don't control, by people who aren't necessarily capable of fixing any problems, and I don't think I'm capable of writing something good enough to handle every problem that might crop up either.
Any help will be greatly appreciated.
It's difficult to see the purpose of an active cell for a library like openpyxl as it is effectively a GUI artefact. Nevertheless, because openpyxl works hard to implement the OOXML specification it should be possible to read the value stored by the previous application, or write it.
ws.views.sheetView[0].selection[0].activeCell
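A short round trip illustrates this, as a sketch that assumes a recent openpyxl where ws.sheet_view is a shortcut for the first sheet view. It writes an active cell, saves to memory, and reads the stored value back. get_active_cell is a hypothetical helper name. Sheets with frozen or split panes can carry more than one selection, which this sketch ignores.

```python
import io

from openpyxl import Workbook, load_workbook


def get_active_cell(ws):
    """Return the active cell address stored in the sheet's first selection."""
    return ws.sheet_view.selection[0].activeCell


# Write a selection, as the last application to save the file would.
wb = Workbook()
ws = wb.active
ws.sheet_view.selection[0].activeCell = "C5"
ws.sheet_view.selection[0].sqref = "C5"

buffer = io.BytesIO()
wb.save(buffer)

# Read it back from the saved file.
reloaded = load_workbook(io.BytesIO(buffer.getvalue()))
active_cell = get_active_cell(reloaded.active)
print(active_cell)
```

The value reflects whatever was stored at the last save, not the live selection in a running Excel window.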
Consider the win32com library to replicate the Excel VBA property ActiveCell. Openpyxl might have a limited method for this property, while win32com allows Python to fully utilize the COM libraries of Windows programs, including the MS Office suite (Excel, Word, Access, etc.). You can even manipulate files as a child process, as if you were directly writing VBA.
import os
import win32com.client
# Open the Excel application and the spreadsheet (Excel expects a full path)
xlApp = win32com.client.Dispatch("Excel.Application")
xlApp.Workbooks.Open(os.path.abspath('example.xlsx'))
xlApp.ActiveWorkbook.Worksheets('Sheet1').Activate()
# Address of the active cell, in $A$1 form
print(xlApp.ActiveCell.Address)
xlApp.ActiveWorkbook.Close(False)
xlApp.Quit()
xlApp = None

How to save in openpyxl without losing formulae?

Because I need to parse and then use the actual data in cells, I open an xlsm in openpyxl with data_only = True.
This has proved very useful. Now, though, I have the same need for an xlsm that contains formulae in cells, and when I save my changes the formulae are missing from the saved version.
Are data_only = True and formulae mutually exclusive? If not, how can I access the actual value in cells without losing the formulae when I save?
When I say I lose the formulae, I mean that the results of the formulae (sums, concatenations etc.) are preserved, but the formulae themselves are no longer displayed when a cell is clicked.
UPDATE:
To confirm whether the formulae were being preserved, I re-opened the saved xlsm, this time with data_only left as False, and checked the value of a cell that had been built using a formula. Had the formulae been preserved, opening the xlsm with data_only set to False should have returned the formula. Instead it returns the actual text value (which is not what I want).
If you want to preserve the integrity of the workbook, i.e. retain the formulae, then you cannot use data_only=True. The documentation makes this very clear.
Part of your question was: Are data_only = True and formulae mutually exclusive?
The answer to that, in openpyxl, is yes.
But this is not intrinsic to Excel. You could have a library like openpyxl which gives you access to both the formulas and their results. This is unlikely to happen, since the maintainer(s) of openpyxl are philosophically opposed to this idea.
So, how you're expected to handle your kind of situation in openpyxl is to load the workbook twice: once with data_only=True just to read the data (which you keep in memory), then load it again as a "different" workbook with data_only=False to get a writable version.
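The double-load approach could look roughly like this. It is a sketch on a throwaway file created in a temporary directory; load_values_and_formulas is a hypothetical helper name. Note that openpyxl never calculates formulae, so for this demo file, which Excel has never opened, the cached value is empty.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def load_values_and_formulas(path):
    """Load the same file twice: cached values for reading, formulae for saving."""
    values_wb = load_workbook(path, data_only=True)
    formulas_wb = load_workbook(path)  # data_only defaults to False
    return values_wb, formulas_wb


# Demo file with one formula.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
wb = Workbook()
wb.active["A1"] = "=1+2"
wb.save(demo_path)

values_wb, formulas_wb = load_values_and_formulas(demo_path)
cached = values_wb.active["A1"].value     # value cached by Excel, if any
formula = formulas_wb.active["A1"].value  # the formula text

# Edit and save only the formula-preserving copy.
formulas_wb.active["B1"] = "edited"
formulas_wb.save(demo_path)
formula_after_save = load_workbook(demo_path).active["A1"].value
```

Only the workbook loaded with data_only=False should ever be saved; saving the data_only=True copy is what replaces the formulae with their last cached results.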
The "canonical" way of modifying an existing workbook with Python while preserving everything else (including formatting, formulas, charts, macros, etc.) is to use a COM interface (such as PyWin32, or a higher-level wrapper like xlwings) to control a running instance of Excel. Of course, this is only possible if you are running on a machine with Excel installed.
