Unable to convert and save excel file into PDF using Python - python

I am trying to import Excel files and convert them to PDF using Python. I tried the win32com library, but the result is not usable: the saved PDF contains only the first few columns of the Excel sheet.
Code:
import win32com.client
o = win32com.client.Dispatch("Excel.Application")
o.Visible = False
wb_path = r'c:\user\desktop\sample.xls'
wb = o.Workbooks.Open(wb_path)
ws_index_list = [1,4,5] #say you want to print these sheets
path_to_pdf = r'C:\user\desktop\sample.pdf'
wb.WorkSheets(ws_index_list).Select()
wb.ActiveSheet.ExportAsFixedFormat(0, path_to_pdf)

Related

Print specific sheets in excel doc to PDF with Python (xlwings)

I am attempting to automate the very manual process of exporting a selected range of worksheets in an Excel file to PDF. I was able to string together the following code, which successfully prints the document. However, I cannot figure out how to select specific worksheets within my workbook, so it currently prints the entire workbook to PDF (which comes out to a whopping 897 pages).
Any ideas on how to select certain sheets and then print them to PDF with a given file name?
import os
import xlwings as xw
book = xw.Book(r'linktomyfile.xlsm')
sheet = book.sheets[0]
current_work_dir = os.getcwd()
pdf_path = os.path.join(current_work_dir, "Report_Date.pdf")
print(f"Saving workbook as '{pdf_path}' ...")
book.api.ExportAsFixedFormat(0, pdf_path)
print(f"Opening PDF file with default application")
Much appreciated!
You can just use the sheet reference to print to pdf, for example:
book = xw.Book(r'linktomyfile.xlsm')
sheet = book.sheets("Sheet1")
current_work_dir = os.getcwd()
pdf_path = os.path.join(current_work_dir, "Report_Date.pdf")
sheet.api.ExportAsFixedFormat(0, pdf_path)
You can also specify a range, e.g.
sheet.range("A1:G15").api.ExportAsFixedFormat(0, pdf_path)
Example of looping through specific sheets:
sheetlist = ["Sheet A", "Sheet B"]
for each in sheetlist:
    pdf_path = os.path.join(current_work_dir, f"{each}.pdf")
    sht = book.sheets(each)
    sht.api.ExportAsFixedFormat(0, pdf_path)
Here each pdf is named after the sheet name.
In newer versions of xlwings, there is a built-in .to_pdf() method. Assuming you have a book or sheet ready to print:
# to print a whole workbook
myXlwingsWorkBook.to_pdf(r"c:\myOutputPath")
# print a sheet
myXlwingsSheet.to_pdf(r"c:\myOutputPath")
Documentation: Xlwings documentation - then search for "PDF"
There're a few options. I wish I could just print/pdf the first page though...

Python win32 client saving %20 instead of spaces

I have an issue saving the pdf files. The code works to convert excel files to pdf, but it is saving all of my files with %20 instead of spaces. So "Fort Worth" would save as "Fort%20Worth".
Here is the code below. Thanks.
import xlwings as xw
import win32com.client
curyq = "2017Q4"
msa_list_ea = ['Albuquerque','Atlanta','Austin','Baltimore','Boston','Charlotte','Chicago','Cincinnati','Cleveland','Columbus',
'Dallas','Dallas/Ft. Worth','Denver','Detroit','Fort Lauderdale','Fort Worth','Hartford','Houston','Indianapolis',
'Jacksonville','Kansas City','Las Vegas','Long Island','Los Angeles','Louisville','Memphis','Miami','Milwaukee',
'Minneapolis','Nashville','New York','Norfolk','Newark','Oakland','Orange County','Orlando','West Palm Beach',
'Philadelphia','Phoenix','Pittsburgh','Portland','Raleigh','Richmond','Sacramento','Salt Lake City','San Antonio',
'Riverside','San Diego','San Francisco','San Jose','Seattle','St. Louis','Tampa','Tucson','Ventura','Washington, DC']
## convert market excel EBA reports to PDF
o = win32com.client.Dispatch("Excel.Application")
o.Visible = False
for i in msa_list_ea:
    if i == "Dallas/Ft. Worth":
        i = "Dallas-Ft. Worth"
    if i == "Newark":
        i = "Northern New Jersey"
    wb_path = r'G:/Team/EBAs/{}/Excel/{}_EBA_{}.xlsx'.format(curyq, i, curyq)
    wb = o.Workbooks.Open(wb_path)
    ws_index_list = [1] #chooses which sheet in workbook to print (counting begins at 1)
    path_to_pdf = r'G:/Team/EBAs/{}/PDF/{}_EBA_{}.pdf'.format(curyq, i, curyq) ## path to save pdf file
    wb.WorkSheets(ws_index_list).Select()
    wb.ActiveSheet.ExportAsFixedFormat(0, path_to_pdf)
    wb.Close(False)
    print("{}".format(i))
This prints correctly in my terminal, no %20s here.
I assume your raw string literal is not respected by the external call to Excel in wb.ActiveSheet.ExportAsFixedFormat(0, path_to_pdf).
Try adding quotes:
path_to_pdf = r'"G:/Team/EBAs/{}/PDF/{}_EBA_{}.pdf"'.format(curyq, i, curyq) ## path to save pdf file
I had the same issue when running similar code on a Windows machine. The path used forward slashes; switching to double backslashes solved the problem.
To keep it OS-independent, I used the os and pathlib modules to format the path correctly:
import os
from pathlib import Path
path_to_pdf = os.fspath(Path(path_to_pdf))
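As a minimal sketch of the slash conversion described in this answer: `Path` produces backslashes only when the script runs on Windows, whereas `PureWindowsPath` applies Windows formatting on any OS and never touches the file system. The helper name `to_windows_path` and the example file name are hypothetical. Whether this removes the `%20` in a given Excel setup is the answerer's report and is not verified here.

```python
from pathlib import PureWindowsPath


def to_windows_path(path: str) -> str:
    """Return `path` formatted with Windows-style backslash separators."""
    # PureWindowsPath only manipulates the string; it does not access the disk.
    return str(PureWindowsPath(path))


# Hypothetical example path in the same shape as the question's format string.
pdf_path = to_windows_path("G:/Team/EBAs/2017Q4/PDF/Fort Worth_EBA_2017Q4.pdf")
print(pdf_path)
```

The resulting string can then be passed to ExportAsFixedFormat in place of the forward-slash path.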

Copy Protected excel workbook into another workbook python

I am trying to open a protected Excel file and copy its contents to another file. I'm using the following snippet:
import win32com.client
xlApp = win32com.client.Dispatch("Excel.Application")
filename='C:/Users/sh/Documents/Supply.xls'
xlwb = xlApp.Workbooks.Open(filename,False,True,None)
for sheet in xlwb.Worksheets:
    xlApp = win32com.client.Dispatch("Excel.Application")
    nwb = xlApp.Workbooks.Add()
    sheet.Copy(Before=nwb.Worksheets('Sheet1'))
    nwb.SaveAs("C:/Users/sh/Documents/"+sheet.Name+'.xlsx') # Line 9
    nwb.Close(True)
However, I'm not able to copy the contents: it throws an exception at line 9 saying 'Microsoft Excel cannot access the file'.
Is there any other method to copy the contents of a protected Excel workbook to another workbook in Python?
import win32com.client
xlApp = win32com.client.Dispatch("Excel.Application")
filename='C:/Py/Input/Supply.xls'
# UpdateLinks=False, ReadOnly=True
xlwb = xlApp.Workbooks.Open(filename,False,True,None)
sheet1 = xlwb.Sheets(1)
sheet2 = xlwb.Sheets(2)
# copy the first sheet into a new workbook and save it as CSV (24 = xlCSVMSDOS)
nwb = xlApp.Workbooks.Add()
sheet1.Copy(Before=nwb.Sheets(1))
nwb.SaveAs('Sheet1.csv',24)
nwb.Close(True)
# same for the second sheet
nwb1 = xlApp.Workbooks.Add()
sheet2.Copy(Before=nwb1.Sheets(1))
nwb1.SaveAs('Sheet2.csv',24)
nwb1.Close(True)

Python - save different sheets of an excel file as individual excel files

Newbie : I have an Excel file, which has more than 100 different Sheets. Each sheet contains several tables and charts.
I wish to save every sheet as a new Excel file.
I tried many python codes, but none of them worked.
Kindly help in this. Thanks!
Edit 1: In response to comments, this is what I tried:
import pandas as pd
import xlrd
inputFile = r'D:\Excel\Complete_data.xlsx'
#getting sheet names
xls = xlrd.open_workbook(inputFile, on_demand=True)
sheet_names = xls.sheet_names()
path = "D:/Excel/All Files/"
#create a new excel file for every sheet
for name in sheet_names:
    parsing = pd.ExcelFile(inputFile).parse(sheet_name = name)
    #writing data to the new excel file
    parsing.to_excel(path+str(name)+".xlsx", index=False)
To be precise, the problem is coming in copying tables and charts.
I have just worked through this issue, so I will post my solution. I do not know how it will affect charts etc.
import os
import xlrd
from xlutils.copy import copy
import xlwt
path = "..." #placeholder: place path where files to split up are (must end with a separator)
targetdir = (path + "New_Files/") #where you want your new files
if not os.path.exists(targetdir): #makes your new directory
    os.makedirs(targetdir)
for root, dirs, files in os.walk(path, topdown=False): #all the files you want to split
    xlsfiles = [f for f in files] #can add selection condition here
    for f in xlsfiles:
        wb = xlrd.open_workbook(os.path.join(root, f), on_demand=True)
        for sheet in wb.sheets(): #cycles through each sheet in each workbook
            newwb = copy(wb) #makes a temp copy of that book
            #brute force, but strips away all other sheets apart from the sheet being looked at
            newwb._Workbook__worksheets = [worksheet for worksheet in newwb._Workbook__worksheets if worksheet.name == sheet.name]
            #saves each sheet as the original file name (extension removed) plus the sheet name
            newwb.save(targetdir + os.path.splitext(f)[0] + sheet.name + ".xls")
Not particularly elegant but worked well for me and gives easy functionality. Hopefully useful for someone.
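The naming scheme described in this answer (original file name plus sheet name) can be sketched as a small standalone helper. The function name `split_output_name` and the sample names are hypothetical. It uses `os.path.splitext` to drop the extension; note that `str.strip(".xls")` removes any of the characters `.`, `x`, `l`, `s` from both ends of the string rather than removing a suffix, so it can eat parts of the file name itself.

```python
import os


def split_output_name(workbook_file: str, sheet_name: str, ext: str = ".xls") -> str:
    """Build '<workbook stem><sheet name><ext>' for one extracted sheet."""
    # splitext removes only the final extension, leaving the rest of the name intact.
    stem, _ = os.path.splitext(os.path.basename(workbook_file))
    return stem + sheet_name + ext


# Hypothetical workbook and sheet names, for illustration only.
print(split_output_name("sales.xls", "Q1"))
# Character stripping, by contrast, damages this file name:
print("sales.xls".strip(".xls"))
```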

Python parsing XLS with images [duplicate]

I found some Python2 code to extract images from Excel files.
I have a very fundamental question: Where shall I specify the path of my target excel file?
Or does it only work with an active opened Excel file?
import win32com.client # Need pywin32 from pip
from PIL import ImageGrab # Need PIL as well
import os
excel = win32com.client.Dispatch("Excel.Application")
workbook = excel.ActiveWorkbook
wb_folder = workbook.Path
wb_name = workbook.Name
wb_path = os.path.join(wb_folder, wb_name)
#print "Extracting images from %s" % wb_path
print("Extracting images from", wb_path)
image_no = 0
for sheet in workbook.Worksheets:
    for n, shape in enumerate(sheet.Shapes):
        if shape.Name.startswith("Picture"):
            # Some debug output for console
            image_no += 1
            print("---- Image No. %07i ----" % image_no)
            # Sequence number the pictures, if there's more than one
            num = "" if n == 0 else "_%03i" % n
            filename = sheet.Name + num + ".jpg"
            file_path = os.path.join(wb_folder, filename)
            #print "Saving as %s" % file_path # Debug output
            print('Saving as ', file_path)
            shape.Copy() # Copies from Excel to Windows clipboard
            # Use PIL (python imaging library) to save from Windows clipboard
            # to a file
            image = ImageGrab.grabclipboard()
            image.save(file_path,'jpeg')
You can grab images from existing Excel file like this:
from PIL import ImageGrab
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
workbook = excel.Workbooks.Open(r'C:\Users\file.xlsx')
for sheet in workbook.Worksheets:
    for i, shape in enumerate(sheet.Shapes):
        if shape.Name.startswith('Picture'): # or try 'Image'
            shape.Copy()
            image = ImageGrab.grabclipboard()
            image.save('{}.jpg'.format(i+1), 'jpeg')
An xlsx file is actually a zip file. You can directly get the images from the xl/media subfolder. You can do this in python using the ZipFile class. You don't need to have MS Excel or even run in Windows!
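A minimal sketch of that approach, using only the standard library. The function name `extract_xlsx_images` and the file names in the usage note are hypothetical. It assumes the images live under `xl/media/` inside the archive, as stated above, and it applies to .xlsx files only, not to the older binary .xls format.

```python
import os
import zipfile


def extract_xlsx_images(xlsx_path, out_dir):
    """Copy every file stored under xl/media/ in the .xlsx archive into out_dir.

    Returns the list of paths that were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    with zipfile.ZipFile(xlsx_path) as archive:
        for member in archive.namelist():
            # Skip everything outside the media folder, and directory entries.
            if not member.startswith("xl/media/") or member.endswith("/"):
                continue
            target = os.path.join(out_dir, os.path.basename(member))
            with archive.open(member) as src, open(target, "wb") as dst:
                dst.write(src.read())
            saved.append(target)
    return saved
```

For example, `extract_xlsx_images("report.xlsx", "images")` would write the embedded pictures into an `images` folder, keeping the names they have inside the archive.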
The file path and file name are defined in these variables:
wb_folder = workbook.Path
wb_name = workbook.Name
wb_path = os.path.join(wb_folder, wb_name)
In this particular case, the script takes the active workbook on the preceding line:
workbook = excel.ActiveWorkbook
In theory, you should be able to specify the path through the wb_folder and wb_name variables instead, as long as you open the file through the Excel COM object (see "Python: Open Excel Workbook using Win32 COM Api").
