Python parsing XLS with images [duplicate]

Python parsing XLS with images [duplicate] - python

I found some Python2 code to extract images from Excel files.
I have a very fundamental question: Where shall I specify the path of my target excel file?
Or does it only work with an active opened Excel file?
import win32com.client # Need pywin32 from pip
from PIL import ImageGrab # Need PIL as well
import os
excel = win32com.client.Dispatch("Excel.Application")
workbook = excel.ActiveWorkbook
wb_folder = workbook.Path
wb_name = workbook.Name
wb_path = os.path.join(wb_folder, wb_name)
#print "Extracting images from %s" % wb_path
print("Extracting images from", wb_path)
image_no = 0
for sheet in workbook.Worksheets:
for n, shape in enumerate(sheet.Shapes):
if shape.Name.startswith("Picture"):
# Some debug output for console
image_no += 1
print("---- Image No. %07i ----", image_no)
# Sequence number the pictures, if there's more than one
num = "" if n == 0 else "_%03i" % n
filename = sheet.Name + num + ".jpg"
file_path = os.path.join (wb_folder, filename)
#print "Saving as %s" % file_path # Debug output
print('Saving as ', file_path)
shape.Copy() # Copies from Excel to Windows clipboard
# Use PIL (python imaging library) to save from Windows clipboard
# to a file
image = ImageGrab.grabclipboard()
image.save(file_path,'jpeg')

You can grab images from existing Excel file like this:
from PIL import ImageGrab
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
workbook = excel.Workbooks.Open(r'C:\Users\file.xlsx')
for sheet in workbook.Worksheets:
for i, shape in enumerate(sheet.Shapes):
if shape.Name.startswith('Picture'): # or try 'Image'
shape.Copy()
image = ImageGrab.grabclipboard()
image.save('{}.jpg'.format(i+1), 'jpeg')

An xlsx file is actually a zip file. You can directly get the images from the xl/media subfolder. You can do this in python using the ZipFile class. You don't need to have MS Excel or even run in Windows!

Filepath and filename is defined in the variables here:
wb_folder = workbook.Path
wb_name = workbook.Name
wb_path = os.path.join(wb_folder, wb_name)
In this particular case, it calls the active workbook at the line prior:
workbook = excel.ActiveWorkbook
But you should theoretically be able to specify path using the wb_folder and wb_name variables, as long as you load the file on the excel module (Python: Open Excel Workbook using Win32 COM Api).

Related

Unable to convert and save excel file into PDF using Python

I am trying to import excel files and converting them into PDF using Python. I tried using win32com library but the saved file is not good. Saved PDF only contains first few columns from excel.
Code:-
import win32com.client
o = win32com.client.Dispatch("Excel.Application")
o.Visible = False
wb_path = r'c:\user\desktop\sample.xls'
wb = o.Workbooks.Open(wb_path)
ws_index_list = [1,4,5] #say you want to print these sheets
path_to_pdf = r'C:\user\desktop\sample.pdf'
wb.WorkSheets(ws_index_list).Select()
wb.ActiveSheet.ExportAsFixedFormat(0, path_to_pdf)

Download WMF from pptx and decode to JPEG

Hello stackoverflow users.
I am trying to download an image from the powerpoint presentation and then to process it(to recognize numbers on it at certain coordinates).
My problem is that I can download an image from pptx data only in .wmf format, and I cannot convert it. I have tried all possible solutions already.
from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE
pptx_path = "name_pptx.pptx"
prs = Presentation(pptx_path)
desired_slide = prs.slides[6 - 1]
for shape in desired_slide.shapes:
if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
image_file_bytes = shape.image.blob
file_extension = shape.image.ext # at this point format is .wfm
Interesting that in Powerpoint I can select a desired .jpeg extension when saving a file.

It took me few hours to solve my problem, convertation of wmf file to jpg is a bit tricky in Windows. I add the image to temporary excel file, and then download an image from it.
class ExcelHelpers():
#staticmethod
def add_img_to_excel(path_to_wmf):
import xlsxwriter
workbook = xlsxwriter.Workbook('test.xlsx')
worksheet = workbook.add_worksheet()
worksheet.insert_image('A1', path_to_wmf)
workbook.close()
#staticmethod
def get_img_from_excel(long_filename):
filename = os.path.basename(long_filename).split('.')[0]
from PIL import ImageGrab
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
path_to_excel = os.path.join(os.getcwd(), 'test.xlsx')
workbook = excel.Workbooks.Open(path_to_excel)
for sheet in workbook.Worksheets:
for i, shape in enumerate(sheet.Shapes):
if shape.Name.startswith('Picture'):
shape.Copy()
image = ImageGrab.grabclipboard()
image.save('{}.jpg'.format(filename), 'jpeg')
workbook.Close()
excel.Quit()
del excel
os.remove(long_filename)
os.remove('test.xlsx')

Can excel CSV files be injected with macros?

I am trying to inject my csv files with a macro that automatically fits the column widths in excel but nothing is happening. I am using a code from a previous post (Use Python to Inject Macros into Spreadsheets). But here is the code
import os
import sys
# Import System libraries
import glob
import random
import re
#sys.coinit_flags = 0 # comtypes.COINIT_MULTITHREADED
# USE COMTYPES OR WIN32COM
#import comtypes
#from comtypes.client import CreateObject
# USE COMTYPES OR WIN32COM
import win32com
from win32com.client import Dispatch
desktop = os.path.join(os.path.join(os.environ['USERPROFILE']), 'Desktop')
x = r'C:\This\is\the\path'
scripts_dir = x
conv_scripts_dir = x
strcode = \
'''
sub test()
Column.Autofit
end sub
'''
#com_instance = CreateObject("Excel.Application", dynamic = True) # USING COMTYPES
com_instance = Dispatch("Excel.Application") # USING WIN32COM
com_instance.Visible = True#False
com_instance.DisplayAlerts = True#False
for script_file in glob.glob(os.path.join(scripts_dir, '*.csv')):
print("Processing: %s" % scr ipt_file)
# do the operation in background without actually opening Excel
(file_path, file_name) = os.path.split(script_file)
objworkbook = com_instance.Workbooks.Open(script_file)
xlmodule = objworkbook.VBProject.VBComponents.Add(1)
xlmodule.CodeModule.AddFromString(strcode.strip())
objworkbook.SaveAs(os.path.join(conv_scripts_dir, file_name))
com_instance.Quit()
This code actually opens the a file in excel, executes a macro, and then closes the excel window. Why doesn't the macro work from the python command line, but does work inside excel?

Exactly. There is not formats, whatsoever, in a CSV. It's like trying to add formatting to a Text file. You can't do that. You can convert CSV files to XLSM files (Excel Macro) or XLSB (Binary).
Sub CSVtoXLSB2()
Dim wb As Workbook
Dim CSVPath As String
Dim sProcessFile As String
CSVPath = "C:\your_path_here\"
sProcessFile = Dir(CSVPath & "*.csv")
Do Until sProcessFile = "" ' Loop until no file found.
Set wb = Application.Workbooks.Open(CSVPath & sProcessFile)
wb.SaveAs CSVPath & Split(wb.Name, ".")(0) & ".xlsb", FileFormat _
:=50, Password:="", WriteResPassword:="", ReadOnlyRecommended:=False, _
CreateBackup:=False
wb.Close
sProcessFile = Dir() ' Get next entry.
Loop
Set wb = Nothing
End Sub
Now, if you want to copy a Module from one Workbook to another, follow the steps listed below:
Copying a module from one workbook to another
Open both the workbook that contains the macro you want to copy, and the workbook where you want to copy it.
On the Developer tab, click Visual Basic to open the Visual Basic Editor.
In the Visual Basic Editor, on the View menu, click Project Explorer Project Explorer button image, or press CTRL+R.
In the Project Explorer pane, drag the module containing the macro you want to copy to the destination workbook. In this case, we're copying Module1 from Book2.xlsm to Book1.xlsm.
VBA Project Explorer
Module1 copied from Book2.xlsm
Copy of Module1 copied to Book1.xlsm

Python win32 client saving %20 instead of spaces

I have an issue saving the pdf files. The code works to convert excel files to pdf, but it is saving all of my files with %20 instead of spaces. So "Fort Worth" would save as "Fort%20Worth".
Here is the code below. Thanks.
import xlwings as xw
import win32com.client
curyq = "2017Q4"
msa_list_ea = ['Albuquerque','Atlanta','Austin','Baltimore','Boston','Charlotte','Chicago','Cincinnati','Cleveland','Columbus',
'Dallas','Dallas/Ft. Worth','Denver','Detroit','Fort Lauderdale','Fort Worth','Hartford','Houston','Indianapolis',
'Jacksonville','Kansas City','Las Vegas','Long Island','Los Angeles','Louisville','Memphis','Miami','Milwaukee',
'Minneapolis','Nashville','New York','Norfolk','Newark','Oakland','Orange County','Orlando','West Palm Beach',
'Philadelphia','Phoenix','Pittsburgh','Portland','Raleigh','Richmond','Sacramento','Salt Lake City','San Antonio',
'Riverside','San Diego','San Francisco','San Jose','Seattle','St. Louis','Tampa','Tucson','Ventura','Washington, DC']
## convert market excel EBA reports to PDF
o = win32com.client.Dispatch("Excel.Application")
o.Visible = False
for i in msa_list_ea:
if i == "Dallas/Ft. Worth":
i = "Dallas-Ft. Worth"
if i == "Newark":
i = "Northern New Jersey"
wb_path = r'G:/Team/EBAs/{}/Excel/{}_EBA_{}.xlsx'.format(curyq, i, curyq)
wb = o.Workbooks.Open(wb_path)
ws_index_list = [1] #chooses which sheet in workbook to print (counting begins at 1)
path_to_pdf = r'G:/Team/EBAs/{}/PDF/{}_EBA_{}.pdf'.format(curyq, i, curyq) ## path to save pdf file
wb.WorkSheets(ws_index_list).Select()
wb.ActiveSheet.ExportAsFixedFormat(0, path_to_pdf)
wb.Close(False)
print("{}".format(i))

This prints correctly in my terminal, no %20s here.
I assume your raw string literal is not respected by the external call to Excel in wb.ActiveSheet.ExportAsFixedFormat(0, path_to_pdf).
Try adding quotes:
path_to_pdf = r'"G:/Team/EBAs/{}/PDF/{}_EBA_{}.pdf"'.format(curyq, i, curyq) ## path to save pdf file

I had the same issue when running a similar code on a Windows machine. The path was using forward slashes. Using double backslashes solved the problem.
To make it non OS specific I used the os and pathlib modules to format the path correctly:
path_to_pdf = os.fspath(Path(path_to_pdf))

xls to csv converter

I am using win32.client in python for converting my .xlsx and .xls file into a .csv. When I execute this code it's giving an error. My code is:
def convertXLS2CSV(aFile):
'''converts a MS Excel file to csv w/ the same name in the same directory'''
print "------ beginning to convert XLS to CSV ------"
try:
import win32com.client, os
from win32com.client import constants as c
excel = win32com.client.Dispatch('Excel.Application')
fileDir, fileName = os.path.split(aFile)
nameOnly = os.path.splitext(fileName)
newName = nameOnly[0] + ".csv"
outCSV = os.path.join(fileDir, newName)
workbook = excel.Workbooks.Open(aFile)
workbook.SaveAs(outCSV, c.xlCSVMSDOS) # 24 represents xlCSVMSDOS
workbook.Close(False)
excel.Quit()
del excel
print "...Converted " + nameOnly + " to CSV"
except:
print ">>>>>>> FAILED to convert " + aFile + " to CSV!"
convertXLS2CSV("G:\\hello.xlsx")
I am not able to find the error in this code. Please help.

I would use xlrd - it's faster, cross platform and works directly with the file.
As of version 0.8.0, xlrd reads both XLS and XLSX files.
But as of version 2.0.0, support was reduced back to only XLS.
import xlrd
import csv
def csv_from_excel():
wb = xlrd.open_workbook('your_workbook.xls')
sh = wb.sheet_by_name('Sheet1')
your_csv_file = open('your_csv_file.csv', 'wb')
wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
for rownum in xrange(sh.nrows):
wr.writerow(sh.row_values(rownum))
your_csv_file.close()

I would use pandas. The computationally heavy parts are written in cython or c-extensions to speed up the process and the syntax is very clean. For example, if you want to turn "Sheet1" from the file "your_workbook.xls" into the file "your_csv.csv", you just use the top-level function read_excel and the method to_csv from the DataFrame class as follows:
import pandas as pd
data_xls = pd.read_excel('your_workbook.xls', 'Sheet1', index_col=None)
data_xls.to_csv('your_csv.csv', encoding='utf-8')
Setting encoding='utf-8' alleviates the UnicodeEncodeError mentioned in other answers.

Maybe someone find this ready-to-use piece of code useful. It allows to create CSVs from all spreadsheets in Excel's workbook.
Python 2:
# -*- coding: utf-8 -*-
import xlrd
import csv
from os import sys
def csv_from_excel(excel_file):
workbook = xlrd.open_workbook(excel_file)
all_worksheets = workbook.sheet_names()
for worksheet_name in all_worksheets:
worksheet = workbook.sheet_by_name(worksheet_name)
with open(u'{}.csv'.format(worksheet_name), 'wb') as your_csv_file:
wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
for rownum in xrange(worksheet.nrows):
wr.writerow([unicode(entry).encode("utf-8") for entry in worksheet.row_values(rownum)])
if __name__ == "__main__":
csv_from_excel(sys.argv[1])
Python 3:
import xlrd
import csv
from os import sys
def csv_from_excel(excel_file):
workbook = xlrd.open_workbook(excel_file)
all_worksheets = workbook.sheet_names()
for worksheet_name in all_worksheets:
worksheet = workbook.sheet_by_name(worksheet_name)
with open(u'{}.csv'.format(worksheet_name), 'w', encoding="utf-8") as your_csv_file:
wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
for rownum in range(worksheet.nrows):
wr.writerow(worksheet.row_values(rownum))
if __name__ == "__main__":
csv_from_excel(sys.argv[1])

I'd use csvkit, which uses xlrd (for xls) and openpyxl (for xlsx) to convert just about any tabular data to csv.
Once installed, with its dependencies, it's a matter of:
python in2csv myfile > myoutput.csv
It takes care of all the format detection issues, so you can pass it just about any tabular data source. It's cross-platform too (no win32 dependency).

First read your excel spreadsheet into pandas, below code will import your excel spreadsheet into pandas as a OrderedDict type which contain all of your worksheet as dataframes. Then simply use worksheet_name as a key to access specific worksheet as a dataframe and save only required worksheet as csv file by using df.to_csv(). Hope this will workout in your case.
import pandas as pd
df = pd.read_excel('YourExcel.xlsx', sheet_name=None)
df['worksheet_name'].to_csv('YourCsv.csv')
If your Excel file contain only one worksheet then simply use below code:
import pandas as pd
df = pd.read_excel('YourExcel.xlsx')
df.to_csv('YourCsv.csv')
If someone want to convert all the excel worksheets from single excel workbook to the different csv files, try below code:
import pandas as pd
def excelTOcsv(filename):
df = pd.read_excel(filename, sheet_name=None)
for key, value in df.items():
return df[key].to_csv('%s.csv' %key)
This function is working as a multiple Excel sheet of same excel workbook to multiple csv file converter. Where key is the sheet name and value is the content inside sheet.

#andi I tested your code, it works great, BUT
In my sheets there's a column like this
2013-03-06T04:00:00
date and time in the same cell
It gets garbled during exportation, it's like this in the exported file
41275.0416667
other columns are ok.
csvkit, on the other side, does ok with that column but only exports ONE sheet, and my files have many.

xlsx2csv is faster than pandas and xlrd.
xlsx2csv -s 0 crunchbase_monthly_.xlsx cruchbase
excel file usually comes with n sheetname.
-s is sheetname index.
then, cruchbase folder will be created, each sheet belongs to xlsx will be converted to a single csv.
p.s. csvkit is awesome too.

Quoting an answer from Scott Ming, which works with workbook containing multiple sheets:
Here is a python script getsheets.py (mirror), you should install pandas and xlrd before you use it.
Run this:
pip3 install pandas xlrd # or `pip install pandas xlrd`
How does it works?
$ python3 getsheets.py -h
Usage: getsheets.py [OPTIONS] INPUTFILE
Convert a Excel file with multiple sheets to several file with one sheet.
Examples:
getsheets filename
getsheets filename -f csv
Options:
-f, --format [xlsx|csv] Default xlsx.
-h, --help Show this message and exit.
Convert to several xlsx:
$ python3 getsheets.py goods_temp.xlsx
Sheet.xlsx Done!
Sheet1.xlsx Done!
All Done!
Convert to several csv:
$ python3 getsheets.py goods_temp.xlsx -f csv
Sheet.csv Done!
Sheet1.csv Done!
All Done!
getsheets.py:
# -*- coding: utf-8 -*-
import click
import os
import pandas as pd
def file_split(file):
s = file.split('.')
name = '.'.join(s[:-1]) # get directory name
return name
def getsheets(inputfile, fileformat):
name = file_split(inputfile)
try:
os.makedirs(name)
except:
pass
df1 = pd.ExcelFile(inputfile)
for x in df1.sheet_names:
print(x + '.' + fileformat, 'Done!')
df2 = pd.read_excel(inputfile, sheetname=x)
filename = os.path.join(name, x + '.' + fileformat)
if fileformat == 'csv':
df2.to_csv(filename, index=False)
else:
df2.to_excel(filename, index=False)
print('\nAll Done!')
CONTEXT_SETTINGS = dict(help_option_names=['-h', '--help'])
#click.command(context_settings=CONTEXT_SETTINGS)
#click.argument('inputfile')
#click.option('-f', '--format', type=click.Choice([
'xlsx', 'csv']), default='xlsx', help='Default xlsx.')
def cli(inputfile, format):
'''Convert a Excel file with multiple sheets to several file with one sheet.
Examples:
\b
getsheets filename
\b
getsheets filename -f csv
'''
if format == 'csv':
getsheets(inputfile, 'csv')
else:
getsheets(inputfile, 'xlsx')
cli()

We can use Pandas lib of Python to conevert xls file to csv file
Below code will convert xls file to csv file .
import pandas as pd
Read Excel File from Local Path :
df = pd.read_excel("C:/Users/IBM_ADMIN/BU GPA Scorecard.xlsx",sheetname=1)
Trim Spaces present on columns :
df.columns = df.columns.str.strip()
Send Data frame to CSV file which will be pipe symbol delimted and without Index :
df.to_csv("C:/Users/IBM_ADMIN/BU GPA Scorecard csv.csv",sep="|",index=False)

Python is not the best tool for this task. I tried several approaches in Python but none of them work 100% (e.g. 10% converts to 0.1, or column types are messed up, etc). The right tool here is PowerShell, because it is an MS product (as is Excel) and has the best integration.
Simply download this PowerShell script, edit line 47 to enter the path for the folder containing the Excel files and run the script using PowerShell.

Using xlrd is a flawed way to do this, because you lose the Date Formats in Excel.
My use case is the following.
Take an Excel File with more than one sheet and convert each one into a file of its own.
I have done this using the xlsx2csv library and calling this using a subprocess.
import csv
import sys, os, json, re, time
import subprocess
def csv_from_excel(fname):
subprocess.Popen(["xlsx2csv " + fname + " --all -d '|' -i -p "
"'<New Sheet>' > " + 'test.csv'], shell=True)
return
lstSheets = csv_from_excel(sys.argv[1])
time.sleep(3) # system needs to wait a second to recognize the file was written
with open('[YOUR PATH]/test.csv') as f:
lines = f.readlines()
firstSheet = True
for line in lines:
if line.startswith('<New Sheet>'):
if firstSheet:
sh_2_fname = line.replace('<New Sheet>', '').strip().replace(' - ', '_').replace(' ','_')
print(sh_2_fname)
sh2f = open(sh_2_fname+".csv", "w")
firstSheet = False
else:
sh2f.close()
sh_2_fname = line.replace('<New Sheet>', '').strip().replace(' - ', '_').replace(' ','_')
print(sh_2_fname)
sh2f = open(sh_2_fname+".csv", "w")
else:
sh2f.write(line)
sh2f.close()

I've tested all anwers, but they were all too slow for me. If you have Excel installed you can use the COM.
I thought initially it would be slower since it will load everything for the actual Excel application, but it isn't for huge files. Maybe because the algorithm for opening and saving files runs a heavily optimized compiled code, Microsoft guys make a lot of money for it after all.
import sys
import os
import glob
from win32com.client import Dispatch
def main(path):
excel = Dispatch("Excel.Application")
if is_full_path(path):
process_file(excel, path)
else:
files = glob.glob(path)
for file_path in files:
process_file(excel, file_path)
excel.Quit()
def process_file(excel, path):
fullpath = os.path.abspath(path)
full_csv_path = os.path.splitext(fullpath)[0] + '.csv'
workbook = excel.Workbooks.Open(fullpath)
workbook.Worksheets(1).SaveAs(full_csv_path, 6)
workbook.Saved = 1
workbook.Close()
def is_full_path(path):
return path.find(":") > -1
if __name__ == '__main__':
main(sys.argv[1])
This is very raw code and won't check for errors, print help or anything, it will just create a csv file for each file that matches the pattern you entered in the function so you can batch process a lot of files only launching excel application once.

As much as I hate to rely on Windows Excel proprietary software, which is not cross-platform, my testing of csvkit for .xls, which uses xlrd under the hood, failed to correctly parse dates (even when using the commandline parameters to specify strptime format).
For example, this xls file, when parsed with csvkit, will convert cell G1 of 12/31/2002 to 37621, whereas when converted to csv via excel -> save_as (using below) cell G1 will be "December 31, 2002".
import re
import os
from win32com.client import Dispatch
xlCSVMSDOS = 24
class CsvConverter(object):
def __init__(self, *, input_dir, output_dir):
self._excel = None
self.input_dir = input_dir
self.output_dir = output_dir
if not os.path.isdir(self.output_dir):
os.makedirs(self.output_dir)
def isSheetEmpty(self, sheet):
# https://archive.is/RuxR7
# WorksheetFunction.CountA(ActiveSheet.UsedRange) = 0 And ActiveSheet.Shapes.Count = 0
return \
(not self._excel.WorksheetFunction.CountA(sheet.UsedRange)) \
and \
(not sheet.Shapes.Count)
def getNonEmptySheets(self, wb, as_name=False):
return [ \
(sheet.Name if as_name else sheet) \
for sheet in wb.Sheets \
if not self.isSheetEmpty(sheet) \
]
def saveWorkbookAsCsv(self, wb, csv_path):
non_empty_sheet_names = self.getNonEmptySheets(wb, as_name=True)
assert (len(non_empty_sheet_names) == 1), \
"Expected exactly 1 sheet but found %i non-empty sheets: '%s'" \
%(
len(non_empty_sheet_names),
"', '".join(name.replace("'", r"\'") for name in non_empty_sheet_names)
)
wb.Worksheets(non_empty_sheet_names[0]).SaveAs(csv_path, xlCSVMSDOS)
wb.Saved = 1
def isXlsFilename(self, filename):
return bool(re.search(r'(?i)\.xls$', filename))
def batchConvertXlsToCsv(self):
xls_names = tuple( filename for filename in next(os.walk(self.input_dir))[2] if self.isXlsFilename(filename) )
self._excel = Dispatch('Excel.Application')
try:
for xls_name in xls_names:
csv_path = os.path.join(self.output_dir, '%s.csv' %os.path.splitext(xls_name)[0])
if not os.path.isfile(csv_path):
workbook = self._excel.Workbooks.Open(os.path.join(self.input_dir, xls_name))
try:
self.saveWorkbookAsCsv(workbook, csv_path)
finally:
workbook.Close()
finally:
if not len(self._excel.Workbooks):
self._excel.Quit()
self._excel = None
if __name__ == '__main__':
self = CsvConverter(
input_dir='C:\\data\\xls\\',
output_dir='C:\\data\\csv\\'
)
self.batchConvertXlsToCsv()
The above will take an input_dir containing .xls and output them to output_dir as .csv -- it will assert that there is exactly 1 non-empty sheet in the .xls; if you need to handle multiple sheets into multiple csv then you'll need to edit saveWorkbookAsCsv.

I was trying to use xlrd library in order to convert the format xlsx into csv, but I was getting error: xlrd.biffh.XLRDError: Excel xlsx file; not supported. That was happening because this package is no longer reading any other format unless xls, according to xlrd documentation.
Following the answer from Chris Withers I was able to change the engine for the function read_excel() from pandas, then I was able to a create a function that is converting any sheet from your Excel spreadsheet you want to successfully.
In order to work the function below, don't forget to install the openpyxl library from here.
Function:
import os
import pathlib
import pandas as pd
# Function to convert excel spreadsheet into csv format
def Excel_to_csv():
# Excel file full path
excel_file = os.path.join(os.path.sep, pathlib.Path(__file__).parent.resolve(), "Excel_Spreadsheet.xlsx")
# Excel sheets
excel_sheets = ['Sheet1', 'Sheet2', 'Sheet3']
for sheet in excel_sheets:
# Create dataframe for each sheet
df = pd.DataFrame(pd.read_excel(excel_file, sheet, index_col=None, engine='openpyxl'))
# Export to csv. i.e: sheet_name.csv
df.to_csv(os.path.join(os.path.sep, pathlib.Path(__file__).parent.resolve(), sheet + '.csv'), sep=",", encoding='utf-8', index=False, header=True)
# Runs the excel_to_csv function:
Excel_to_csv()

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python parsing XLS with images [duplicate] - python

An xlsx file is actually a zip file. You can directly get the images from the xl/media subfolder. You can do this in python using the ZipFile class. You don't need to have MS Excel or even run in Windows!

Related

Unable to convert and save excel file into PDF using Python

Download WMF from pptx and decode to JPEG

Can excel CSV files be injected with macros?

Python win32 client saving %20 instead of spaces

xls to csv converter

Categories

Resources