Python to Excel, openpyxl and file format not valid

The following is a simple snippet that opens an .xlsm file, writes a few values to it with Python, and saves it.
import openpyxl
from openpyxl import load_workbook

def toExcel():
    wb = load_workbook(filename="C:\\Users\\Mark\\Documents\\Test.xlsm")
    ws = wb.worksheets[0]
    ws.cell(row=1, column=1).value = 'foo'
    ws['A2'] = 'bar'
    wb.save("C:\\Users\\Mark\\Documents\\Test1.xlsm")

toExcel()
The script runs and saves the file, but Excel then reports that the file format is not valid or the file is corrupt, and cannot open it. If the .xlsm extension is removed from the filename in wb.save, the file saves and opens after selecting Excel with Open With. Why is the file format not valid as is?

From here: https://openpyxl.readthedocs.io/en/default/tutorial.html#saving-to-a-file
Note
The following will fail:
>>> wb = load_workbook('document.xlsx')
>>> # Need to save with the extension *.xlsx
>>> wb.save('new_document.xlsm')
>>> # MS Excel can't open the document
>>>
>>> # or
>>>
>>> # Need to specify attribute keep_vba=True
>>> wb = load_workbook('document.xlsm')
>>> wb.save('new_document.xlsm')
>>> # MS Excel can't open the document
>>>
>>> # or
>>>
>>> wb = load_workbook('document.xltm', keep_vba=True)
>>> # If we need a template document, then we need to specify the extension as *.xltm.
>>> # If we need a document, then we need to specify the attribute as_template=False.
>>> wb.save('new_document.xlsm', as_template=True)
>>> # MS Excel can't open the document

I found this post because I was trying to create a .xlsm file from scratch using openpyxl. I figured out that I was getting this error because, when you load the workbook, you need to pass keep_vba=True to the load_workbook function.
So this is what your load_workbook function should look like:
wb = load_workbook(filename="C:\\Users\\Mark\\Documents\\Test.xlsm", keep_vba=True)
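Put together, the question's function with that fix might look like the minimal sketch below. Taking the paths as parameters (and the name to_excel) is an assumption for illustration, not part of the original. As far as I can tell from openpyxl's source, the content type written into the saved package depends on whether the loaded workbook carries a VBA archive, not on the file extension, so without keep_vba=True the file is labelled internally as a plain .xlsx while being named .xlsm, and Excel refuses the mismatch.

```python
from openpyxl import load_workbook


def to_excel(src_path, dst_path):
    """Write a few values into a macro-enabled workbook and save a copy.

    keep_vba=True keeps the VBA project (and related parts) from the
    source file, so the copy saved under an .xlsm name stays macro-enabled.
    """
    wb = load_workbook(filename=src_path, keep_vba=True)
    ws = wb.worksheets[0]
    ws.cell(row=1, column=1).value = "foo"
    ws["A2"] = "bar"
    # Keep the .xlsm extension so it matches the macro-enabled content
    wb.save(dst_path)
```

For example: to_excel(r"C:\Users\Mark\Documents\Test.xlsm", r"C:\Users\Mark\Documents\Test1.xlsm").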
As a side note, here is my post that talks about creating a .xlsm file from scratch using openpyxl.

Related

Unable to read Excel file, list index out of range error, can't find sheets

I am trying to read an Excel (.xlsx) file and convert it to a DataFrame. I used pandas.ExcelFile, pandas.read_excel, openpyxl's load_workbook and even plain io file reading, but I am unable to read the sheet of this file. Every time I get a list index out of range error or, in the case of openpyxl, no sheet names. I also tried xlrd.
import pandas as pd
temp_df = pd.read_excel("v2s.xlsx", sheet_name=0)
or
temp_df = pd.read_excel("v2s.xlsx", sheet_name="Sheet1")
or
from openpyxl import load_workbook
workbook = load_workbook(filename="v2s.xlsx", read_only=True, data_only=True)
workbook.sheetnames
Link to excel file
According to this ticket, the file is saved in a "slightly defective" format.
The reporter said they used Save As to change the document type back to a normal Excel spreadsheet file.
Check which file type Excel shows for your file in the Save As dialog, and re-save it as a regular Excel Workbook (.xlsx), here as v2s_0.xlsx.
Then run your code:
from openpyxl import load_workbook
workbook = load_workbook(filename="v2s_0.xlsx", read_only=True, data_only=True)
print(workbook.sheetnames)
Outputs:
['Sheet1']
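To see what kind of package a file like v2s.xlsx actually is before re-saving it, you can look inside it, since an .xlsx is a ZIP archive. The minimal sketch below uses only the standard library and reports the workbook part's content type and the XML namespace of its root element. A regular Excel Workbook uses the namespace in TRANSITIONAL_NS; a file saved in another Open XML flavour (for example Excel's "Strict Open XML Spreadsheet" type, whose namespace is http://purl.oclc.org/ooxml/spreadsheetml/main) shows something different. Whether that is what happened to this particular file is an assumption, and the helper names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the [Content_Types].xml part in OOXML packages
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"
# Root namespace used by a regular (Transitional) Excel workbook
TRANSITIONAL_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def describe_workbook_part(path):
    """Return (part name, content type, root namespace) of the workbook part.

    Returns None if no content-type override ending in ".main+xml" exists.
    """
    with zipfile.ZipFile(path) as zf:
        types = ET.fromstring(zf.read("[Content_Types].xml"))
        for override in types.iter(CT_NS + "Override"):
            content_type = override.get("ContentType", "")
            if content_type.endswith(".main+xml"):
                part = override.get("PartName", "").lstrip("/")
                root = ET.fromstring(zf.read(part))
                # ElementTree tags look like "{namespace}localname"
                if root.tag.startswith("{"):
                    namespace = root.tag[1:].split("}", 1)[0]
                else:
                    namespace = ""
                return part, content_type, namespace
    return None


def looks_transitional(path):
    """True if the workbook part uses the regular spreadsheetml namespace."""
    info = describe_workbook_part(path)
    return info is not None and info[2] == TRANSITIONAL_NS
```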

How to extract OLE objects from Excel table using Python?

I would like to use Python to extract OLE objects from an Excel table into the Windows clipboard.
This post didn't help further since it is for VBA.
And this post is still unanswered.
Assume an Excel table containing ChemDraw or ChemSketch OLE objects, with the compound name in column A and the embedded objects in columns B and C.
There are some Python modules which can handle Excel files, e.g. openpyxl, xlrd.
The module win32clipboard can put data into the clipboard.
My Problems:
I don't see how to get the embedded OLE object to the clipboard. Probably, openpyxl and xlrd together with win32clipboard are not suited for this?
There is a Python module, oletools, which might be able to do it, but I don't understand how it works.
https://pypi.org/project/oletools/
From this page:
oleobj: to extract embedded objects from OLE files.
This seems to be exactly what I am looking for, however, I couldn't find any MCVEs. And unfortunately, the documentation of oleobj is basically reduced to: "read the source code and find out yourself". I would be grateful for hints and assistance.
My code so far:
### trying to extract OLE objects from Excel table into clipboard
from openpyxl import load_workbook
import win32clipboard as clpbd

def set_clipboard(data):
    clpbd.OpenClipboard()
    clpbd.EmptyClipboard()
    clpbd.SetClipboardText(data) # I'm aware, this is only for text, is there anything for OLEs?
    clpbd.CloseClipboard()

def print_clipboard():
    clpbd.OpenClipboard()
    data = clpbd.GetClipboardData()
    clpbd.CloseClipboard()
    print(data)

wb = load_workbook(filename = 'tbChemOLE.xlsx')
ws = wb.active
myName = ws['A3'].value # result: napthalene
myImage = ws['B3'].value # result: None
myObject = ws['C3'].value # result: None
set_clipboard(myName)
print_clipboard() # result: Naphtalene
# set_clipboard(myImage) # crash, because myImage is None
print_clipboard()
# set_clipboard(myObject) # crash, because myObject is None
print_clipboard()
wb.close()
### end of code
I built a Python module to do exactly this; check it out here: https://pypi.org/project/AttachmentsExtractor/. The module can also be run on any OS.
After installing the library, use the following code snippet:
from AttachmentsExtractor import extractor
abs_path_to_file='Please provide absolute path here '
path_to_destination_directory = 'Please provide path of the directory where the extracted attachments should be stored'
extractor.extract(abs_path_to_file,path_to_destination_directory) # returns true if one or more attachments are found else returns false.
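If the objects only need to end up as files rather than on the clipboard, a dependency-free alternative is to open the package directly: an .xlsx is a ZIP archive, and Excel normally stores embedded OLE objects as parts under xl/embeddings/ (typically oleObjectN.bin, which are OLE files that oletools' oleobj can then unpack further). This is a minimal sketch assuming that layout; extract_embeddings is a hypothetical helper, not part of AttachmentsExtractor.

```python
import os
import zipfile


def extract_embeddings(xlsx_path, dest_dir):
    """Copy every part under xl/embeddings/ out of an .xlsx package.

    Returns the list of written file paths (empty if nothing is embedded).
    """
    os.makedirs(dest_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(xlsx_path) as zf:
        for name in zf.namelist():
            if name.startswith("xl/embeddings/") and not name.endswith("/"):
                # basename() drops the archive's folder structure
                target = os.path.join(dest_dir, os.path.basename(name))
                with open(target, "wb") as fh:
                    fh.write(zf.read(name))
                written.append(target)
    return written
```

The raw .bin files are not what Excel places on the clipboard, so this complements rather than replaces the win32com approach below.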
In the meantime I found this post, where the OP actually didn't want the OLE objects on the clipboard, but for me that is fine. Actually, openpyxl and xlrd are not needed, but win32com.client is required.
I can get all OLE objects; however, they are indexed (probably) in the order in which they were added.
So I create a dictionary with the row index as key and a tuple of (OLE object index, name) as value.
Code:
### copy OLE object in certain cell to clipboard
import win32com.client as win32
import win32clipboard

excel = win32.gencache.EnsureDispatch('Excel.Application')
ffname = r'C:\Test\tbChemOLE.xlsx'
wb = excel.Workbooks.Open(ffname)
ws = wb.Worksheets.Item(1)
objs = ws.OLEObjects()

def get_all_OLEs():
    oleNo_dict = {}  # row index -> (OLE object index, name)
    for i in range(1, objs.Count + 1):  # Excel collections are 1-based
        obj = objs.Item(i)
        myRow = obj.TopLeftCell.Row  # row of OLE object
        myName = ws.Cells(myRow, 1).Value  # corresponding name in column A
        oleNo_dict[myRow] = (i, myName)
    return oleNo_dict

def get_OLE(row):
    # Returns (index, name) and leaves the OLE object on the clipboard,
    # or returns None if there is no object in that row or copying fails
    if row not in oleNo_dict:
        print("No OLE object found in row", row)
        return None
    try:
        objs.Item(oleNo_dict[row][0]).Copy()
        win32clipboard.OpenClipboard()
        try:
            # Binary access; 0xC004 is in the registered-format range
            # (0xC000-0xFFFF), whose IDs are assigned at runtime and may differ per machine
            win32clipboard.GetClipboardData(0xC004)
        finally:
            win32clipboard.CloseClipboard()
    except Exception as e:
        print(e)
        win32clipboard.OpenClipboard()
        win32clipboard.EmptyClipboard()
        win32clipboard.CloseClipboard()
        return None
    return oleNo_dict[row]

oleNo_dict = get_all_OLEs()
row = 4
myMolecule = get_OLE(row)
if myMolecule is not None:
    print(myMolecule[1], "OLE object is now on the clipboard.")
wb.Close(False)  # close without saving
excel.Quit()
### end of code
Result:
Anthracene OLE object is now on the clipboard.

How do I download an xlsm file and read every sheet in Python?

Right now I am doing the following.
import requests
import xlrd

# url and auth are defined elsewhere
resp = requests.get(url, auth=auth).content
with open(r'temp.xlsx', 'wb') as output:
    output.write(resp)
xl = xlrd.open_workbook(r'temp.xlsx')
xls = []
try:
    for sheet in xl.sheets():
        xls.append(sheet.name)
except Exception:
    xls = ['']
It extracts the sheet names, but I don't know how to read the file's contents, or whether saving the file as .xlsx actually preserves the macros. All I know is that the code is not working right now, and I need to capture the data that a macro generates. Please help! Thanks.
I highly recommend using xlwings if you want to open, modify, and save .xlsm files without corrupting them. I have tried a ton of different methods (using other modules like openpyxl) and the macros always end up being corrupted.
import xlwings as xw

app = xw.App(visible=False)  # IF YOU WANT EXCEL TO RUN IN BACKGROUND
xlwb = app.books.open('PATH\\TO\\FILE.xlsm')  # open the file in that background instance
xlws = {}
xlws['ws1'] = xlwb.sheets['Your Worksheet']
print(xlws['ws1'].range('B1').value)  # get value
xlws['ws1'].range('B1').value = 'New Value'  # change value
yourMacro = xlwb.macro('YourExcelMacro')
yourMacro()
xlwb.save()
xlwb.close()
app.quit()  # shut down the background Excel instance
Edit: I added an option to keep Excel invisible, at a user's request.
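If you only need the values from every sheet rather than running the macro, openpyxl can read an .xlsm directly, provided the downloaded bytes are written to a file with the .xlsm extension (for example temp.xlsm instead of temp.xlsx). Recent xlrd releases (2.0 and later) only read legacy .xls files, which is one reason the code in the question fails. openpyxl never executes VBA, so values produced by a macro are only visible if Excel ran it and saved the file before it was downloaded. A minimal sketch; read_all_sheets is a hypothetical helper.

```python
from openpyxl import load_workbook


def read_all_sheets(path):
    """Return {sheet name: list of row tuples} for every worksheet.

    data_only=True returns the values Excel last saved instead of formulas.
    """
    wb = load_workbook(filename=path, read_only=True, data_only=True)
    try:
        return {
            ws.title: list(ws.iter_rows(values_only=True))
            for ws in wb.worksheets
        }
    finally:
        wb.close()  # read-only mode keeps the file open until closed
```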

Retrieve Excel Workbook Connection Properties

I am attempting to grab the "Command Text" from the Connection Properties window of an Excel file using Python. However, I cannot find the object that contains this information. I would like to retrieve the highlighted command text, EXEC sp_FooBar, as a string.
I am able to retrieve the Connection names with:
import odbc
import win32com.client
file = r'PATH_TO_FILE'
xl = win32com.client.DispatchEx('Excel.Application')
wb = xl.workbooks.open(file)
for x in wb.connections:
    print(x)
But I'm not sure how to use the <COMObject <unknown>> object further to grab the command text. I'm thinking win32com may have something, but can't seem to crack the code.
You can get the CommandText property from an OLEDBConnection instance like this:
import win32com.client

file = r'PATH_TO_FILE'
xl = win32com.client.DispatchEx('Excel.Application')
wb = xl.workbooks.open(file)
for x in wb.Connections:
    print(x.OLEDBConnection.CommandText)
wb.Close(False)  # close without saving
xl.Application.Quit()
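OLEDBConnection only applies to OLE DB connections; for an ODBC connection Excel exposes the text through ODBCConnection instead, and reading the wrong one typically raises a COM error. The sketch below checks WorkbookConnection.Type first, assuming the Excel constants xlConnectionTypeOLEDB = 1 and xlConnectionTypeODBC = 2; other connection types (text, web, data model) return None. The helper names are hypothetical, and the win32com import is kept inside the function so the type dispatch can be used and tested without Excel.

```python
# XlConnectionType values from the Excel object model
XL_CONNECTION_TYPE_OLEDB = 1
XL_CONNECTION_TYPE_ODBC = 2


def command_text(conn):
    """Return the CommandText of a WorkbookConnection, or None for other types."""
    if conn.Type == XL_CONNECTION_TYPE_OLEDB:
        return conn.OLEDBConnection.CommandText
    if conn.Type == XL_CONNECTION_TYPE_ODBC:
        return conn.ODBCConnection.CommandText
    return None


def list_command_texts(path):
    """Open the workbook in a separate Excel instance; map connection name -> command text."""
    import win32com.client  # requires Windows, Excel and pywin32

    xl = win32com.client.DispatchEx("Excel.Application")
    try:
        wb = xl.Workbooks.Open(path)
        try:
            return {conn.Name: command_text(conn) for conn in wb.Connections}
        finally:
            wb.Close(False)  # close without saving
    finally:
        xl.Quit()
```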

Is there a way to edit/save an xlsx file that is open in Excel?

I want to do something like this:
import openpyxl as x
wb = x.load_workbook(filename)
# do some edit to the workbook
wb.save(filename)
The file specified by filename is open in Excel. Excel locks the file, so I get a permission denied error when running the above code. Is there a way to edit and save it anyway?
from openpyxl import load_workbook
ifile = 'Whales.xlsx'
wb = load_workbook(filename=ifile)
# do some edit to the workbook
wb.save(ifile)
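openpyxl works on the file on disk, not on Excel's in-memory copy, so while Excel has Whales.xlsx open, Windows refuses the overwrite and wb.save raises PermissionError. One workaround, sketched below under that assumption, is to catch the error and save next to the original; the "_edited" suffix and the helper name are hypothetical. To change the open workbook itself, the edit has to go through Excel, for example with xlwings, whose xw.Book(path) attaches to a workbook that is already open.

```python
import os


def save_or_fallback(wb, filename):
    """Save wb to filename; if the file is locked, save a sibling copy instead.

    Returns the path that was actually written.
    """
    try:
        wb.save(filename)
        return filename
    except PermissionError:
        # The file is locked (e.g. open in Excel): write a copy alongside it
        root, ext = os.path.splitext(filename)
        fallback = root + "_edited" + ext
        wb.save(fallback)
        return fallback
```

Usage: path = save_or_fallback(wb, ifile), then merge the copy once Excel has closed the original.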
