pd.read_excel not able to read an xlsx file - python

I am trying to read an xlsx file using pd.read_excel, but I am getting this error:
[screenshot: error]
My code snippet looks like this:
[screenshot: notebook code]
When I tried to open the xlsx file directly from the notebook, I got this error:
[screenshot: opening the xlsx file]
Please help me solve this issue.

Your title says pd.read_excel, but your code says pd.read_csv. CSV files are plain text files, readable with Notepad. XLSX files are binary, and are not human-readable. If you have an XLSX file, then use pd.read_excel, like your title says.
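Instead of trusting the extension, you can look at the file's first bytes. An .xlsx workbook is a ZIP archive and always starts with the bytes PK\x03\x04, while a CSV is plain text. The sketch below assumes the file is either an .xlsx workbook or a CSV. The helper name `pick_reader` is hypothetical, and it only returns the name of the pandas function to call.

```python
ZIP_MAGIC = b"PK\x03\x04"  # every .xlsx file is a ZIP archive


def pick_reader(path):
    """Return the name of the pandas reader that fits the file's real format (hypothetical helper)."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    # A ZIP header means a binary XLSX workbook; anything else is treated as CSV text.
    return "read_excel" if head == ZIP_MAGIC else "read_csv"
```

For example, getattr(pd, pick_reader(path))(path) would then call pd.read_excel for a real workbook and pd.read_csv for a text file.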

Related

Loading xls files with pandas fails

I am trying to load an xls file with pandas using:
pd.read_excel(fi_name, sheet_name=None, engine=None)
But I get this error:
"XLRDError: Workbook is encrypted"
But the file is not encrypted: I can open it with Excel and read its text with the tika package.
Does anyone know how I can solve this?
Also, does anyone know a Python package that can read all Excel file formats, even when pandas fails?
Thanks
I think I found something that solves your problem:
import msoffcrypto

# Read the protected workbook and save a decrypted copy as a new file.
with open('encrypted.xls', 'rb') as src, open('decrypted.xls', 'wb') as dst:
    office_file = msoffcrypto.OfficeFile(src)
    # If the file opens in Excel without a password prompt, the key is the default 'VelvetSweatshop'.
    office_file.load_key(password='VelvetSweatshop')
    office_file.decrypt(dst)
After that, you can open and work with the decrypted file normally using xlrd.
You can install the package with
pip install msoffcrypto-tool
and you can see the full documentation here
There are two possible reasons for this:
1. The file is not actually in the format its extension says it is.
2. The whole workbook, or one of its sheets, is password-protected, so the data being read from it is encrypted.
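Both causes can often be spotted from the file's signature. A legacy .xls workbook is an OLE2 compound file that starts with the bytes D0 CF 11 E0 A1 B1 1A E1. An .xlsx workbook is a ZIP archive that starts with PK. An .xlsx that has been encrypted with a password is stored inside an OLE2 container instead of a ZIP. The sketch below compares the extension with those signatures. The helper `check_extension` is hypothetical. It cannot detect encryption inside a legacy .xls, which xlrd reports on its own.

```python
from pathlib import Path

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls, and encrypted Office files
ZIP_MAGIC = b"PK\x03\x04"                       # ordinary (unencrypted) .xlsx


def check_extension(path):
    """Describe whether an Excel file's signature matches its extension (hypothetical helper)."""
    ext = Path(path).suffix.lower()
    if ext not in (".xls", ".xlsx"):
        return "not an Excel extension"
    with open(path, "rb") as fh:
        head = fh.read(8)
    if ext == ".xlsx" and head.startswith(OLE2_MAGIC):
        return "xlsx in an OLE2 container: probably password-encrypted"
    if ext == ".xls" and head.startswith(ZIP_MAGIC):
        return "named .xls but really an .xlsx (ZIP) workbook"
    if not head.startswith((OLE2_MAGIC, ZIP_MAGIC)):
        return "not an Excel binary at all (maybe HTML, XML or CSV text)"
    return "signature matches extension"
```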

I am facing a problem with the .CSV format in pandas

I will explain in detail:
I have an Excel file, and my client uses a tool that reads only .csv files.
I open the Excel file in Excel and save it in .CSV format using the Save As option. Let's call this File_1.
I also wrote Python code that uses the pandas module to convert the same Excel file to CSV. Let's call this File_2.
The client's tool can read File_1 but not File_2. Why? What could the problem be?
My observations:
When I read File_1 (the manually converted CSV) in pandas, I have to pass encoding="ISO-8859-1"; otherwise it raises a Unicode error.
Ex: pd.read_csv("File_1.csv", encoding="ISO-8859-1")
But when I read File_2 in pandas, it reads without any error.
Ex: pd.read_csv("File_2.csv")
So why can't the client's tool read File_2? Is it a Unicode problem or something else?
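The observations point to an encoding difference. Excel's plain "CSV (Comma delimited)" Save As option typically writes the Windows "ANSI" code page (cp1252 on Western-European systems), which would explain why File_1 needs a single-byte encoding such as ISO-8859-1. pandas' to_csv writes UTF-8 by default. Assuming the client's tool expects cp1252, a character such as "é" is one byte in File_1 but two bytes in File_2.

The stdlib sketch below writes rows the way Excel would under that assumption. The helper `write_ansi_csv` is hypothetical. With pandas, the rough equivalent would be df.to_csv("File_2.csv", index=False, encoding="cp1252"). The index=False part also drops the index column that to_csv writes by default, which a strict tool might reject as well.

```python
import csv


def write_ansi_csv(path, rows, encoding="cp1252"):
    """Write rows as CSV in a single-byte Windows code page (hypothetical helper)."""
    # newline="" lets the csv module control line endings; its default
    # "excel" dialect terminates each row with "\r\n".
    with open(path, "w", newline="", encoding=encoding) as fh:
        csv.writer(fh).writerows(rows)
```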

Python: How to write data to an Excel file using pd.ExcelWriter?

The question:
I'm trying to write data to an Excel file using Python, specifically with the ExcelWriter class that pandas provides, as described here in the docs. I think I'm onto something, but I'm only able to achieve one of two outcomes:
1. If the Excel file is open, access permission is denied.
2. If the Excel file is closed, the code seems to run just fine, but Excel shows the following error message when I try to open the file after execution:
Excel cannot open the file excelTest.xlsm because the file format or
file extension is not valid. Verify that the file has not been
corrupted and that the file extension matches the format of the file
Does anyone know what's going on here? Or is there perhaps a better way to do this than using pd.ExcelWriter?
The details:
I've got three files in the directory C:\pythontest:
1. input.txt
2. excelTest.xlsm
3. pythonTest.py
input.txt is a comma separated text file with this content:
A,B,C
1,4,6
2,5,5
3,5,6
excelTest.xlsm is an Excel file that is completely empty except for one empty sheet named Sheet1.
pythonTest.py is a script where I'm trying to read the txt file using Python, and then write a pandas dataframe to the Excel file:
import os
import pandas as pd
os.getcwd()
os.chdir('C:/pythonTest')
os.listdir(os.getcwd())
df = pd.read_csv('C:\\pythonTest\\input.txt')
writer = pd.ExcelWriter('excelTest.xlsm')
df.to_excel(writer,'Sheet2')
writer.save()
But as I've mentioned, it fails spectacularly. Any suggestions?
System info:
Windows 7, 64 bit
Excel Version 1803
Python 3.6.6 | Anaconda custom (64-bit) |
Pandas 0.23.4
EDIT 1 - print(df) output as requested in the comments:
Pandas requires that a workbook name ends in .xls or .xlsx. It uses the extension to choose which Excel engine to use.
So the problem you've got is the extension. Because of "extension hardening", Excel won't open this file, since it can tell that the file doesn't contain macros and isn't actually an xlsm file. Writing to excelTest.xlsx should work!
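Here is a sketch of the fix with current pandas, assuming pandas 1.x or 2.x with openpyxl installed. It writes to an .xlsx path and lets a with block save and close the writer (ExcelWriter.save() was deprecated in pandas 1.5 and removed in 2.0). The helper names `as_xlsx` and `write_sheet` are hypothetical.

Note that the default mode="w" replaces the whole workbook. To add Sheet2 next to an existing Sheet1, you would pass mode="a" together with engine="openpyxl".

```python
from pathlib import Path


def as_xlsx(path):
    """Swap the extension for .xlsx, since the workbook holds no macros (hypothetical helper)."""
    return Path(path).with_suffix(".xlsx")


def write_sheet(df, path, sheet_name="Sheet2"):
    """Write df to an .xlsx workbook and return the path used (hypothetical helper)."""
    import pandas as pd  # imported here so as_xlsx works even without pandas installed

    target = as_xlsx(path)
    # Leaving the with block saves and closes the file; no writer.save() call is needed.
    with pd.ExcelWriter(str(target)) as writer:
        df.to_excel(writer, sheet_name=sheet_name)
    return target
```

For example, write_sheet(df, "C:/pythonTest/excelTest.xlsm") would write C:/pythonTest/excelTest.xlsx instead.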

How to import an old Excel file with extension .xls?

I have a file from SAS that is exported as an older Excel .xls file. I would like to import this file into python 3.5.
When I run:
import pandas as pd
Filewant = pd.read_excel("Filepath\\Filename.xls")
I get a bunch of error messages culminating in
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'<html xm'
If I open the file, manually save it as a current .xlsx file, and use the same command:
Filewant = pd.read_excel("Filepath\\Filename.xlsx")
then the file is imported into Python properly. However, I want the process to be more automated so that I don't have to manually save the file in .xlsx format to make it work.
SAS tech support told me that this won't work and that I'll need to convert the .xls SAS output into a .xlsx file:
Unfortunately, the MSOffice2K destination creates an HTML file even though it uses the .XLS extension here which allows the file to be opened with excel.
You can use VBScript to convert the file to .XLSX, however, there is no way to do this using the MSoffice2K destination.
The error message tells you the problem. found b'<html xm' Your file is an HTML file and not an XLS file. This was commonly done with "old" SAS since it did not support writing XLS files, but Excel did support reading HTML files.
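To automate this without re-saving by hand, you can check for that HTML prefix and pass the file to pd.read_html, which parses <table> elements into DataFrames. pd.read_html needs lxml, or BeautifulSoup4 with html5lib, to be installed.

The sketch below assumes the SAS output contains one table of interest. The helper names `is_html_export` and `read_sas_xls` are hypothetical.

```python
def is_html_export(path, probe=512):
    """True if the file starts like an HTML document rather than a binary .xls (hypothetical helper)."""
    with open(path, "rb") as fh:
        # Skip a UTF-8 byte-order mark and leading whitespace before checking the tag.
        head = fh.read(probe).lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    return head.startswith((b"<html", b"<!doctype html"))


def read_sas_xls(path):
    """Read a SAS 'xls' export whether it is really HTML or a real workbook (hypothetical helper)."""
    import pandas as pd  # imported lazily; only needed for the actual read

    if is_html_export(path):
        # read_html returns a list of DataFrames, one per <table>; take the first.
        return pd.read_html(path)[0]
    return pd.read_excel(path)
```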

python excel processing error

I am working on Excel processing using Python.
I am using the xlrd module (version 0.6.1) for this.
I am able to read most Excel files, but for some of them I get this error:
XLRDError: Expected BOF record; found 0x213c
Can anyone tell me how to solve this issue?
Thanks in advance.
What you have is most probably an "XML Spreadsheet 2003 (*.xml)" file. The bytes "<!" (aka "\x3c\x21", the start of markup such as "<!DOCTYPE" or "<!--") are being interpreted as the little-endian number 0x213c.
If you open such a file in Notepad, its first two lines look like this:
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
You can also check this by opening the file in Excel, clicking Save As, and looking at the file type that is displayed. While you are there, save it as an XLS file so that xlrd can read it.
Note: this XML file is NOT the Excel 2007+ XLSX file. An XLSX is actually a ZIP file (starts with "PK", not "<?") containing a bunch of XML streams.
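You can check the byte-order arithmetic directly. The first two bytes are read as a little-endian 16-bit record ID, where a BOF record is expected, so "<!" produces the 0x213c in the error. Here is a short stdlib sketch using struct. The helpers `first_record_id` and `is_xml_spreadsheet_2003` are hypothetical, and the XML check relies on the mso-application line shown above.

```python
import struct


def first_record_id(data):
    """Read the first two bytes as a little-endian unsigned 16-bit integer (hypothetical helper)."""
    (record_id,) = struct.unpack("<H", data[:2])
    return record_id


def is_xml_spreadsheet_2003(data):
    """Rough check for an 'XML Spreadsheet 2003' file (hypothetical helper)."""
    return data.lstrip().startswith(b"<?xml") and b'progid="Excel.Sheet"' in data[:512]
```

first_record_id(b"<!") gives 0x213c, matching the error message. A file that starts with "<?xml" would give 0x3f3c instead.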
