How to extract OLE objects from an Excel table using Python?

I would like to use Python to extract OLE-objects from an Excel table into the Windows clipboard.
This post didn't help, since it is about VBA.
And this post is still unanswered.
Assume an Excel table containing ChemDraw or ChemSketch OLE objects (names in column A, with the structures as images and OLE objects in columns B and C).
There are some Python modules which can handle Excel files, e.g. openpyxl, xlrd.
The module win32clipboard can put data into the clipboard.
My problems:
I don't see how to get the embedded OLE object onto the clipboard. Perhaps openpyxl and xlrd together with win32clipboard are not suited for this?
There is a Python module, oletools, which might be able to do it, but I don't understand how it works.
https://pypi.org/project/oletools/
From this page:
oleobj: to extract embedded objects from OLE files.
This seems to be exactly what I am looking for; however, I couldn't find any MCVEs. Unfortunately, the documentation of oleobj basically boils down to: "read the source code and find out yourself". I would be grateful for hints and assistance.
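Some background that may help: an .xlsx file is a zip package, and embedded OLE objects are normally stored as separate parts in its xl/embeddings/ folder (for example oleObject1.bin). These .bin parts are usually OLE compound files, which is the kind of input oleobj works on. A minimal sketch that copies those parts out using only the standard zipfile module, swapped in here instead of oletools; extract_embeddings is a hypothetical helper name, and it does not map objects to cells or touch the clipboard:

```python
import zipfile
from pathlib import Path


def extract_embeddings(xlsx_path, out_dir):
    """Copy every part under xl/embeddings/ of an .xlsx/.xlsm package to out_dir.

    Returns the list of written file paths (hypothetical helper, not an oletools API).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(xlsx_path) as zf:
        for name in zf.namelist():
            # Embedded objects are stored as parts inside the xl/embeddings/ folder
            if name.startswith("xl/embeddings/") and not name.endswith("/"):
                target = out / Path(name).name
                target.write_bytes(zf.read(name))
                written.append(target)
    return written
```

For example, extract_embeddings('tbChemOLE.xlsx', 'embedded') would write the embedded parts into an "embedded" folder, where each .bin file could then be inspected with oleobj.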
My code so far:
### trying to extract OLE objects from Excel table into clipboard
from openpyxl import load_workbook
import win32clipboard as clpbd
def set_clipboard(data):
    clpbd.OpenClipboard()
    clpbd.EmptyClipboard()
    clpbd.SetClipboardText(data)  # I'm aware this is only for text; is there anything for OLE objects?
    clpbd.CloseClipboard()
def print_clipboard():
    clpbd.OpenClipboard()
    data = clpbd.GetClipboardData()
    clpbd.CloseClipboard()
    print(data)
wb = load_workbook(filename='tbChemOLE.xlsx')
ws = wb.active
myName = ws['A3'].value    # result: Naphthalene
myImage = ws['B3'].value   # result: None
myObject = ws['C3'].value  # result: None
set_clipboard(myName)
print_clipboard()  # result: Naphthalene
# set_clipboard(myImage)   # crash, because myImage is None
print_clipboard()
# set_clipboard(myObject)  # crash, because myObject is None
print_clipboard()
wb.close()
### end of code

I built a Python module to do exactly this; check it out here: https://pypi.org/project/AttachmentsExtractor/ (the module can also be run on any OS).
After installing the library, use the following code snippet:
from AttachmentsExtractor import extractor
abs_path_to_file = 'Please provide absolute path here'
path_to_destination_directory = 'Please provide path of the directory where the extracted attachments should be stored'
extractor.extract(abs_path_to_file, path_to_destination_directory)  # returns True if one or more attachments are found, else False

In the meantime I found this post, where the OP actually didn't want the OLE objects on the clipboard, but that is fine for me. There is no need for openpyxl or xlrd, but win32com.client is required.
I can get all OLE objects; however, they are (probably) indexed in the order in which they were added.
So I need to create a dictionary with the row index as key and a tuple of (OLE object index, name) as value.
Code:
### copy OLE object in certain cell to clipboard
import win32com.client as win32
import win32clipboard
excel = win32.gencache.EnsureDispatch('Excel.Application')
ffname = r'C:\Test\tbChemOLE.xlsx'
wb = excel.Workbooks.Open(ffname)
ws = wb.Worksheets.Item(1)
objs = ws.OLEObjects()
def get_all_OLEs():
    oleNo_dict = {}  # dictionary for all OLE objects
    for i in range(1, len(objs) + 1):  # loop over all OLE objects (1-based)
        obj = objs.Item(i)
        myRow = obj.TopLeftCell.Row  # row of OLE object
        myName = ws.Cells(myRow, 1).Value  # corresponding name in column A
        oleNo_dict[myRow] = (i, myName)
    return oleNo_dict
def get_OLE(row):
    try:
        objs.Item(oleNo_dict[row][0]).Copy()  # puts the OLE object on the clipboard
        win32clipboard.OpenClipboard()
        # 0xC004 is a registered clipboard format; IDs >= 0xC000 are assigned
        # at runtime, so this value may differ between sessions
        data = win32clipboard.GetClipboardData(0xC004)  # binary access
        win32clipboard.CloseClipboard()
    except Exception as e:
        print(e)
        win32clipboard.OpenClipboard()
        win32clipboard.EmptyClipboard()
        win32clipboard.CloseClipboard()
    # the OLE object is on the clipboard if it was found
    return oleNo_dict[row]
oleNo_dict = get_all_OLEs()
row = 4
myMolecule = get_OLE(row)
print(myMolecule[1], "OLE object is now on the clipboard.")
wb.Close()
excel.Application.Quit()
### end of code
Result:
Anthracene OLE object is now on the clipboard.
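One caveat with keying the dictionary by row: in the example table a single row can hold more than one OLE object (columns B and C), and the later object then overwrites the earlier one. A sketch that keys on the top-left cell's (row, column) instead; it relies only on the TopLeftCell.Row and TopLeftCell.Column attributes of each object, so the COM objects can be passed in directly. index_oles_by_cell is a hypothetical name:

```python
def index_oles_by_cell(ole_objects):
    """Map (row, column) of each OLE object's top-left cell to its 1-based index.

    ole_objects: a sequence of objects exposing .TopLeftCell.Row and
    .TopLeftCell.Column, e.g. [objs.Item(i) for i in range(1, len(objs) + 1)]
    built from Excel's 1-based OLEObjects collection.
    """
    by_cell = {}
    for i, obj in enumerate(ole_objects, start=1):
        cell = obj.TopLeftCell
        by_cell[(cell.Row, cell.Column)] = i
    return by_cell
```

With lookup = index_oles_by_cell([objs.Item(i) for i in range(1, len(objs) + 1)]), the call objs.Item(lookup[(4, 3)]).Copy() would copy the object whose top-left corner sits in C4.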

Related

How do I download an xlsm file and read every sheet in Python?

Right now I am doing the following.
import requests
import xlrd  # note: xlrd >= 2.0 only reads legacy .xls files, not .xlsx/.xlsm
resp = requests.get(url, auth=auth).content
with open(r'temp.xlsx', 'wb') as output:
    output.write(resp)
xl = xlrd.open_workbook(r'temp.xlsx')
sh = 1
xls = []
try:
    for sheet in xl.sheets():
        xls.append(sheet.name)
except:
    xls = ['']
It extracts the sheet names, but I don't know how to read the file's contents, or whether saving the file as .xlsx actually preserves the macros. All I know is that the code is not working right now, and I need to be able to capture the data that is generated by a macro. Please help! Thanks.
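An .xlsm file does not have to be saved to disk just to see what it contains: .xlsx and .xlsm files are zip packages, and the list of sheets lives in xl/workbook.xml. A minimal standard-library sketch that lists the sheet names straight from the downloaded bytes; sheet_names_from_bytes is a hypothetical helper, and reading cell values would still need a library such as openpyxl (whose load_workbook also accepts a file-like object like io.BytesIO):

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# SpreadsheetML main namespace used in xl/workbook.xml
SML_NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def sheet_names_from_bytes(data):
    """Return the sheet names of an .xlsx/.xlsm file given as raw bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("xl/workbook.xml"))
    # <workbook><sheets><sheet name="..."/>...</sheets></workbook>
    return [s.get("name") for s in root.findall("m:sheets/m:sheet", SML_NS)]
```

In the question's code this would be sheet_names_from_bytes(resp), with resp being the downloaded content.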
I highly recommend using xlwings if you want to open, modify, and save .xlsm files without corrupting them. I have tried a ton of different methods (using other modules like openpyxl) and the macros always end up being corrupted.
import xlwings as xw
app = xw.App(visible=False)  # IF YOU WANT EXCEL TO RUN IN BACKGROUND
xlwb = app.books.open('PATH\\TO\\FILE.xlsm')  # open the book in that invisible instance
xlws = {}
xlws['ws1'] = xlwb.sheets['Your Worksheet']
print(xlws['ws1'].range('B1').value)  # get value
xlws['ws1'].range('B1').value = 'New Value'  # change value
yourMacro = xlwb.macro('YourExcelMacro')
yourMacro()
xlwb.save()
xlwb.close()
app.quit()  # shut down the background Excel instance
Edit: I added an option to keep Excel invisible, at a user's request.

How to check if a large Excel file is protected as quickly as possible?

How can I check, as quickly as possible, whether an Excel file is password-protected (without trying to open it and catching an exception)?
Updated:
from zipfile import BadZipFile, is_zipfile
from openpyxl import load_workbook
filename = 'Z:\\path_to_file\\qwerty.xlsm'  # protected one
try:
    wb = load_workbook(filename, data_only=True, read_only=True)
except BadZipFile as error:
    print(is_zipfile(filename))
The problem is that I get False as output, so I cannot get rid of the exception and replace it with an is_zipfile() condition.
Solution using the openpyxl library:
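import openpyxl
wb = openpyxl.load_workbook(PATH_TO_FILE)
# wb.security is None when the file has no workbook protection element at all
if wb.security is None or wb.security.lockStructure is None:
    # no password, act accordingly
    ...
else:
    # password, act accordingly
    ...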
You can do this using the protection._password property of a sheet:
import openpyxl
wb = openpyxl.load_workbook("C:\\Users\\...\\Downloads\\some_workbook.xlsx")
print(wb.worksheets[0].protection._password)
You can do this for whichever sheet you like, based on the worksheets in the workbook.
If there is no password, the value is None. Otherwise, it returns the hashed password.
So, you can create a method to check this:
def password_protected(sheet):
    return sheet.protection._password is not None
The same approach applies to the whole workbook; there the property is wb.security._workbook_password (also exposed as wb.security.workbookPassword), and wb.security itself is None if the workbook has no protection at all.
When trying to open a password-protected workbook with openpyxl, this indeed raises a zipfile.BadZipFile error, so a workaround would be to use:
import zipfile
zipfile.is_zipfile("workbook.xlsx")
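The False from is_zipfile() is itself a useful signal: a workbook saved with an open password is not stored as a zip package at all, but inside an OLE2 compound file (the same container format used by legacy .xls files). A quick heuristic sketch that checks for the zip structure and then the first 8 bytes, without parsing the workbook; looks_encrypted_ooxml is a hypothetical name, and it only detects an open password, not the sheet or structure protection covered by the other answers:

```python
import zipfile

# First 8 bytes of an OLE2 / Compound File Binary container
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def looks_encrypted_ooxml(path):
    """Heuristic for files that are supposed to be .xlsx/.xlsm.

    Returns True if the file is not a zip package but starts with the OLE2
    signature, which is how password-encrypted OOXML workbooks are stored.
    A legacy .xls is also OLE2, so don't use this on .xls files.
    """
    if zipfile.is_zipfile(path):
        return False
    with open(path, "rb") as f:
        return f.read(8) == OLE2_MAGIC
```

So looks_encrypted_ooxml('Z:\\path_to_file\\qwerty.xlsm') returning True would be a hint to skip load_workbook for that file.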

Retrieve Excel Workbook Connection Properties

I am attempting to grab the "Command Text" from the Connection Properties window of an Excel file using Python. However, I cannot find the object that contains this information. In that window, I would like to retrieve the highlighted EXEC sp_FooBar as a string.
I am able to retrieve the Connection names with:
import win32com.client
file = r'PATH_TO_FILE'
xl = win32com.client.DispatchEx('Excel.Application')
wb = xl.workbooks.open(file)
for x in wb.connections:
    print(x)
But I'm not sure how to use the <COMObject <unknown>> object further to grab the command text. I'm thinking win32com may have something, but can't seem to crack the code.
You can get the CommandText property from an OLEDBConnection instance like this:
import win32com.client
file = r'PATH_TO_FILE'
xl = win32com.client.DispatchEx('Excel.Application')
wb = xl.Workbooks.Open(file)
for x in wb.Connections:
    print(x.OLEDBConnection.CommandText)
wb.Close(False)  # close without saving
xl.Application.Quit()
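The loop above assumes every connection is an OLE DB connection. A workbook connection can also be ODBC (or another type), in which case its command text lives on ODBCConnection instead. A sketch that checks WorkbookConnection.Type first, using the XlConnectionType values 1 (OLE DB) and 2 (ODBC); connection_command_texts is a hypothetical helper, and other connection types are simply skipped:

```python
# XlConnectionType values from the Excel object model
XL_CONNECTION_TYPE_OLEDB = 1
XL_CONNECTION_TYPE_ODBC = 2


def connection_command_texts(wb):
    """Return {connection name: command text} for OLE DB and ODBC connections.

    wb is an Excel Workbook COM object opened through win32com.
    """
    texts = {}
    for conn in wb.Connections:
        if conn.Type == XL_CONNECTION_TYPE_OLEDB:
            texts[conn.Name] = conn.OLEDBConnection.CommandText
        elif conn.Type == XL_CONNECTION_TYPE_ODBC:
            texts[conn.Name] = conn.ODBCConnection.CommandText
    return texts
```

Called as print(connection_command_texts(wb)) before closing the workbook, it would print each connection name next to its command text, such as EXEC sp_FooBar.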

Accessing UpdateLinks() in COM Object using Python

I am working on automating an Excel file which is linked to certain .csv files.
Those .csv files are created by SAS code which is run every quarter.
The files created are timestamped accordingly, for example XYZ_201603.csv, XYZ_201606.csv and so on.
I need to update the links in my Excel file so that they automatically point to the file from the next quarter. I am trying to do this using Python's win32com.client, and my code looks like this:
from win32com.client import Dispatch
xl_app = Dispatch("Excel.Application")
xl_app.Visible = True
xl_app.DisplayAlerts = False
wb = xl_app.workbooks.open(r"C:\Users\XYZ\Desktop\Test\Summary.xlsx")
xl_app.AskToUpdateLinks = False
try:
    wb.UpdateLink(Name=r"C:\Users\XYZ\Desktop\Test\XYZ_201606.csv")
except Exception as e:
    print(e)
finally:
    wb.Close(True)
    wb = None
    xl_app.Quit()
    xl_app = None
Whenever I run this, I get the following error:
(-2147352567, 'Exception occured.', (0, 'Microsoft Excel', 'UpdateLink method of Workbook class failed', 'xlmain11.chm', 0, -2146827284), None)
Can somebody tell me what is going wrong here? Also, in case I have multiple links, how do I tell which link needs to be changed to what? Can I pass a dictionary of directories of updated datasets?
The code and the approach have been taken from this answer on Stack Overflow:
Update Links in for Excel Spreadsheet Using Python
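Because the file names follow the pattern XYZ_YYYYMM.csv and move forward by one quarter (three months) each time, the new link target can be computed from the old one. A sketch assuming exactly that naming pattern; next_quarter_file is a hypothetical helper:

```python
import re


def next_quarter_file(name):
    """Turn e.g. 'XYZ_201603.csv' into 'XYZ_201606.csv' (YYYYMM plus 3 months).

    Assumes the name ends in _YYYYMM followed by the file extension.
    """
    m = re.fullmatch(r"(.*_)(\d{4})(\d{2})(\.\w+)", name)
    if m is None:
        raise ValueError(f"unexpected file name: {name!r}")
    prefix, year, month, ext = m.group(1), int(m.group(2)), int(m.group(3)), m.group(4)
    month += 3
    if month > 12:  # roll over into the next year
        year, month = year + 1, month - 12
    return f"{prefix}{year:04d}{month:02d}{ext}"
```

For example, next_quarter_file("XYZ_201603.csv") gives "XYZ_201606.csv", and a December file such as "XYZ_201612.csv" rolls over to "XYZ_201703.csv". Full paths work too, as long as the file name itself follows the pattern.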
According to the Microsoft documentation, all parameters of the Workbook.UpdateLink method are optional, so it can be called without any. Therefore this program should work:
import win32com.client as win32
xl_app = win32.gencache.EnsureDispatch("Excel.Application")
xl_app.Visible = True
xl_app.DisplayAlerts = False
# EnsureDispatch uses early binding, so member names are case-sensitive (Workbooks.Open)
wb = xl_app.Workbooks.Open(r"C:\Users\XYZ\Desktop\Test\Summary.xlsx")
wb.UpdateLink()
wb.Save()
wb.Close()
xl_app.Quit()
I'm not sure if my solution solves your issue, but I had the same problem and used LinkSources() and ChangeLink() instead:
newSource = r"C:\Users\XYZ\Desktop\Test\XYZ_201606.csv"
oldSource = wb.LinkSources()  # tuple of the opened workbook's link sources
wb.ChangeLink(Name=oldSource[0], NewName=newSource, Type=1)  # Type=1: xlLinkTypeExcelLinks
Hope it helps!
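For the multiple-links part of the question, the LinkSources()/ChangeLink() approach extends naturally to a dictionary that maps each old source to its replacement. A sketch assuming the dictionary keys are spelled exactly as LinkSources() reports them (full paths); relink is a hypothetical helper, and 1 is the value of both xlExcelLinks and xlLinkTypeExcelLinks:

```python
XL_EXCEL_LINKS = 1  # xlExcelLinks (LinkSources) / xlLinkTypeExcelLinks (ChangeLink)


def relink(wb, mapping):
    """Point each external link found in mapping (old source -> new source)
    at its new source; links missing from mapping are left alone.

    wb is an Excel Workbook COM object. Returns the list of changed sources.
    """
    changed = []
    # LinkSources gives no value (None in Python) when there are no links
    sources = wb.LinkSources(XL_EXCEL_LINKS) or ()
    for old in sources:
        new = mapping.get(old)
        if new is not None:
            wb.ChangeLink(Name=old, NewName=new, Type=XL_EXCEL_LINKS)
            changed.append(old)
    return changed
```

A call might look like relink(wb, {r"C:\Users\XYZ\Desktop\Test\XYZ_201603.csv": r"C:\Users\XYZ\Desktop\Test\XYZ_201606.csv"}), where the old path is only an example; printing wb.LinkSources(1) first shows the exact strings to use as keys.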

How to output an xlsx generated by openpyxl to the browser?

I have been using Stack Overflow for a while now and it has helped me very often. Now I have a problem I couldn't solve myself or by searching.
I'm trying to output my Excel file generated by openpyxl to the browser, as I was doing with PHPExcel. The method appears to be the same, but I only get a broken file. My code looks like this:
from openpyxl.workbook import Workbook
from openpyxl.writer.excel import ExcelWriter
from openpyxl.writer.excel import save_virtual_workbook
from openpyxl.cell import get_column_letter
from StringIO import StringIO
print 'Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
print 'Content-Disposition: attachment;filename="results.xlsx"'
print 'Cache-Control: max-age=0\n'
output = StringIO()
wb = Workbook()
ws = wb.worksheets[0]
ws.cell('A1').value = 3.14
wb.save(output)
print output.getvalue()
#print save_virtual_workbook(wb)
I use openpyxl version 1.5.8 and Python 2.7.
None of the approaches works. When I run it from the desktop rather than the browser, it works flawlessly.
I would be very thankful for help.
P.S. Please don't tell me that using another language or program would be easier. I need to solve this with Python.
This works for me. I use Python 2.7, the latest openpyxl, and send_file from Flask:
... code ...
import StringIO
from openpyxl import Workbook
wb = Workbook()
ws = wb.active # worksheet
ws.title = "Excel Using Openpyxl"
c = ws.cell(row=5, column=5)
c.value = "Hi on 5,5"
out = StringIO.StringIO()
wb.save(out)
out.seek(0)
return send_file(out, mimetype='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
                 attachment_filename='xxl.xlsx', as_attachment=True)  # Flask >= 2.0: download_name='xxl.xlsx'
from django.http import HttpResponse
from openpyxl import Workbook
def download_xlsx(request):  # illustrative view name
    output = HttpResponse(content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')
    file_name = "Test.xlsx"
    output['Content-Disposition'] = 'attachment; filename=' + file_name
    wb = Workbook()
    ws = wb.worksheets[0]
    ws['A1'] = 3.14
    wb.save(output)  # HttpResponse is file-like, so openpyxl can write into it
    return output
I used these tips to download my files with openpyxl. I hope that helps.
Writing the xlsx output to disk and then serving it up via Apache worked perfectly, but putting it out directly caused errors in Excel and other issues.
I added a couple of extra steps and made one minor change to your code:
buffer = output.getvalue()
In the HTTP headers:
print "Content-Length: " + str(len(buffer))
And I used write() instead of print to push the buffer into the standard output stream:
sys.stdout.write(buffer)  # requires: import sys
Your script works for me as expected, without alterations.
I can only assume you have a problem with your CGI script setup.
Make sure the directory where the script lives is actually served by the web server. On Apache you can achieve this with:
ScriptAlias /cgi-bin/ /home/WWW/localhost/cgi-bin/
Make sure the script is executable by setting the script permissions. For command-line operation (python scriptname) that was not necessary, but for your web browser it is. Also make sure the user the web server runs as can execute the scripts, as the web server probably does not run as you.
Because Excel files use a binary format, you should be using BytesIO as the buffer.
from io import BytesIO
But what error are you getting if you use save_virtual_workbook(), which does this for you?
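For Python 3, a sketch combining these points (a BytesIO buffer, an explicit Content-Length, and raw bytes written to the binary layer under sys.stdout instead of print); write_cgi_download is a hypothetical helper, and the payload is assumed to be the complete .xlsx file as bytes:

```python
import sys

XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def write_cgi_download(payload, filename, out=None):
    """Write CGI headers followed by a binary payload to a binary stream.

    out defaults to sys.stdout.buffer, so the bytes bypass text-mode encoding.
    """
    if out is None:
        out = sys.stdout.buffer
    headers = (
        f"Content-Type: {XLSX_MIME}\r\n"
        f'Content-Disposition: attachment; filename="{filename}"\r\n'
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"  # blank line ends the CGI headers
    )
    out.write(headers.encode("ascii"))
    out.write(payload)
    out.flush()
```

In the question's script, output would become BytesIO() instead of StringIO(), and after wb.save(output) the call would be write_cgi_download(output.getvalue(), "results.xlsx").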
I had the same problem.
The solution is to switch stdout to binary mode (msvcrt makes this Windows-specific):
import os
import sys
import msvcrt
from shutil import copyfileobj
print 'Content-Type:application/octet-stream; name="{}"'.format(os.path.basename(xls_file))
print 'Content-Disposition:attachment; filename="{}"'.format(os.path.basename(xls_file))
print "Content-Length: " + str(os.path.getsize(xls_file))
print 'Cache-Control: max-age=0\r\n'
msvcrt.setmode(1, os.O_BINARY)  # 1 = stdout file descriptor
sys.stdout.flush()
with open(xls_file, 'rb') as fobj:
    copyfileobj(fobj, sys.stdout)
If you want to build an HTML table that looks like your spreadsheet, you probably want to work with CSV. Either do this, instead of Excel, OR convert your Excel to CSV after you build it.
In any case, once you have the data in CSV format, then it's simply a matter of using python to build the HTML page and looping through the CSV data, while inserting your <table>, <tr>, and <td> tags, as appropriate.
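The loop described above, as a sketch using only the standard library: csv.reader parses the rows (including quoted cells with commas), and html.escape keeps characters such as < in cell text from breaking the markup. csv_to_html_table is a hypothetical name, and treating the first row as the header is an assumption:

```python
import csv
import html
import io


def csv_to_html_table(csv_text):
    """Render CSV text as an HTML <table>; the first row becomes the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return "<table></table>"
    parts = ["<table>"]
    parts.append("<tr>" + "".join(f"<th>{html.escape(c)}</th>" for c in rows[0]) + "</tr>")
    for row in rows[1:]:
        parts.append("<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>")
    parts.append("</table>")
    return "\n".join(parts)
```

The returned string can be dropped into the body of the generated HTML page.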
