I have the following function that will read from an excel workbook with the openpyxl library:
import openpyxl

def read_excel(path):
    excel_workbook = openpyxl.load_workbook(path, read_only=True)
    # other logic
    return None
I can call that function like this:
read_excel("C:/Users/anon/Desktop/Current Projects/Test Files/Test.xlsm ")
And it returns this error:
openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support .xlsm file
format, please check you can open it with Excel first. Supported formats are: .xlsx,.xlsm,
.xltx,.xltm
That error message confuses me: it says openpyxl doesn't support the .xlsm file format, and then lists .xlsm among the supported formats. The file opens just fine in Excel, so why won't openpyxl read it?
There is an extra whitespace character in the error message after .xlsm. Remove the whitespace character at the end of the path string you call the function with, and the function runs without error.
read_excel("C:/Users/anon/Desktop/Current Projects/Test Files/Test.xlsm")
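One way to guard against this kind of mistake is to normalize the path before handing it to openpyxl. This is only a minimal sketch: clean_path is a hypothetical helper, not part of openpyxl, and it assumes stray whitespace at either end of the string is never intentional.

```python
from pathlib import Path


def clean_path(raw_path):
    """Strip stray leading/trailing whitespace from a user-supplied path.

    clean_path is a hypothetical helper, not an openpyxl function.
    """
    return Path(raw_path.strip())


path = clean_path("C:/Users/anon/Desktop/Current Projects/Test Files/Test.xlsm ")
print(repr(path.suffix))  # the suffix no longer carries the trailing space
```

The cleaned path can then be passed to read_excel(str(path)).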
The same problem bothered me a lot today. I finally updated openpyxl from 2.3.2 to 2.3.5, and the problem disappeared.
Although I am using Anaconda, updating packages with pip is sometimes worth a try.
I'm using PyQt5 and had the same problem. I found that adding _filter fixed the problem. The full line reads:
fileName, _filter = QtWidgets.QFileDialog.getOpenFileName(None, "Lists", "", "xlsx files *.xlsx")
First, change the current working directory to the folder that contains the file. When passing the file name, copy and paste it rather than typing it manually; the error may come from small differences that are hard to spot.
I have tons of ".xls" format Excel files that have #NAME errors in them.
I need to open each one and collect data from a specific range, but when I try to open one with xlrd I get the following error: "ERROR *** Token 0x2d (AreaN) found in NAME formula"
Code is below:
import xlrd
book = xlrd.open_workbook(r"C:\Users\metin.unlu\Desktop\Python\renk_study\labs\2-19.xls",ignore_workbook_corruption=True)
sheet=book.sheet_by_index(0)
While the error is self-explanatory and I know its cause, I have no idea how to solve it without opening each Excel file manually and fixing it.
I am trying to learn Python (day 2) and am hoping to practice with Excel books first as this is where I am comfortable/fluent.
Right off the bat I am getting an error that I don't quite understand when running the code below:
import openpyxl
wb = openpyxl.load_workbook("/Users/Scott/Desktop/Workbook1.xlsx")
print(wb.sheetnames)
This does print my sheet names as requested, but it is followed by:
/Users/Scott/PycharmProjects/Excel/venv/lib/python3.7/site-packages/openpyxl/worksheet/_reader.py:293: UserWarning: Unknown extension is not supported and will be removed
warn(msg)
I have found other questions that point to slicers/conditional formatting etc, but that does not apply here. This is a book I just made and only added 3 sheets before saving. It has no data, no formatting, and the extension is valid. I have no add-ons installed on my excel either.
Any idea why I am getting this error? How do I resolve it?
Python: 3.7
openpyxl: 2.6
I had a similar issue. I developed an application that reads and writes Excel files. It worked well on my Windows computer, but when I tried to run it on a friend's Mac, it showed the same error. I could "fix" it by changing how the workbook is loaded, like this:
import openpyxl as op
wb = op.load_workbook(file, read_only=True, data_only=True)
But, as you can see, this configuration only lets you read Excel files. In the end, I realized that my friend didn't have Microsoft Office installed on his computer. Installing it truly solved my problem.
This question was from a couple years ago but I'm encountering it now with openpyxl and require a fix, as the warning is confounding and misleading to my end users.
The warning from openpyxl comes via the stdlib warnings library, which can be suppressed.
import warnings
warnings.simplefilter("ignore")
That's the "hit it with a hammer" approach. More granular levels of warnings suppression can be found here: https://docs.python.org/3/library/warnings.html
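As an illustration of a more granular approach, the sketch below ignores only the one message and only for the duration of a single call. It uses the stdlib warnings.catch_warnings and warnings.filterwarnings; noisy_load and load_quietly are hypothetical names, with noisy_load standing in for whatever call triggers the warning (for example openpyxl.load_workbook).

```python
import warnings


def noisy_load():
    """Hypothetical stand-in for a call such as openpyxl.load_workbook(path)."""
    warnings.warn("Unknown extension is not supported and will be removed", UserWarning)
    return "workbook"


def load_quietly(loader):
    """Run loader() while ignoring only the 'Unknown extension' UserWarning."""
    with warnings.catch_warnings():  # filters are restored when the block exits
        warnings.filterwarnings(
            "ignore",
            message="Unknown extension",  # regex matched against the start of the message
            category=UserWarning,
        )
        return loader()


workbook = load_quietly(noisy_load)
```

Other warnings, and the same warning raised outside load_quietly, are still reported as usual.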
This is exactly the problem I encountered just now.
In my situation (this may not apply to everyone), I found that I just needed to close Excel and rerun the code.
If this doesn't work, you can refer to other answers.
Thanks
Python - Openpyxl - "UserWarning: Unknown extension" issue
To understand the error, you need to know what's inside an XLSX file. The best way to take a look is to change the extension to zip and open that. Inside you will see a file called [Content_Types].xml and directories for the other content. If you check out the XML in Content_Types you will see a <Types ...> tag containing other tags like this:
<Default Extension="png" ContentType="image/png"/>
<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
<Default Extension="xml" ContentType="application/xml"/>
Note the "Extension" property. This is what the warning refers to. In the example above, my file included Extension="png" - the unknown extension.
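To see which extensions a workbook declares without renaming and unzipping it by hand, something like the following sketch may help. It reads [Content_Types].xml with the stdlib zipfile and xml.etree modules. list_default_extensions is a hypothetical helper name, and the in-memory archive is only a stand-in so the example runs without a real workbook.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by [Content_Types].xml in Open Packaging Conventions files.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def list_default_extensions(xlsx_file):
    """Return the Extension attribute of every <Default> entry in [Content_Types].xml."""
    with zipfile.ZipFile(xlsx_file) as archive:
        root = ET.fromstring(archive.read("[Content_Types].xml"))
    return [node.get("Extension") for node in root.iter(CT_NS + "Default")]


# Build a tiny stand-in archive in memory (not a complete workbook).
sample_xml = (
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="png" ContentType="image/png"/>'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    "</Types>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", sample_xml)
buffer.seek(0)

extensions = list_default_extensions(buffer)
print(extensions)
```

With a real file, pass its path instead of the buffer, e.g. list_default_extensions("Workbook1.xlsx").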
For me, it was enough to specify read_only=True and the error went away, e.g.:
wb = openpyxl.load_workbook(file, read_only=True)
I could also fix the issue by copying everything except the images to a new workbook and saving that. After checking, the xml in the new workbook no longer contained the png property.
Note, reading into pandas with pd.read_excel uses openpyxl and generates the same "Unknown extension" error but there is no way to pass through the read_only parameter. You can suppress the specific warning with:
import warnings
warnings.filterwarnings('ignore', category=UserWarning, module='openpyxl')
I'm opening a plain text file, parsing it, and adding different lines to existing, empty string variables. I add these variables into a new variable that is a multi-line fstring. Trying to write the data to a new text file is not behaving as expected.
Reading the original file works fine. Text is properly parsed, variables populated.
The multi-line fstring variable seems fine. Prints normally. Even tried formatting it different ways which I show below.
When writing to a new file, that's where the strangeness starts. I've tried 2 ways:
Straight coding the open function with w or w+
Adding the above to a function and using that inside main()
The file is saved to disk with the correct name. Trying to double-click open in Finder produces nothing. Right-click to open produces nothing. Trying to move it to the trash with command+delete gives an error (error -43).
It sounds like the file goes to trash, but as the file disappears from the folder a new one is created with the same name in its place.
If I try to open in TextMate via File > Open, it opens as a blank file with no errors.
Since I can't get rid of the file, I have to delete the directory and create the directory again with the same name, or force delete in Terminal using rm. Restarting the system does not help. Relaunching Finder does nothing. Saving text files from other apps works fine. Directory is chmod 755.
If I copy an existing text file into the output directory, rename it to what the file is expected to be named, and let python overwrite the contents, it doesn't work either. The file modification date changes (and I see the file "blink" in Finder) but the contents remain the same. However, the file is not corrupted and opens normally.
If I do the same but delete the text inside of the copied file first, then run the script, python writes no data to the file, I can't open it by double-clicking on it, and I get error -43 again with the odd non-trashing behavior.
The strangest thing is this: if I add another with open() at the end of the script, open the file that was just created and supposedly written to, and print its contents, the contents print. It's as if the file contents are being removed, or the file is being corrupted somehow, when the script ends. I tried closing the file inside the script even though it's not needed, but the same behavior persists.
Code:
Here's the code for writing:
import os

FORMAT = 'utf-8'
OUTPUT_DIR = '/Path/To/SaveFolder'

# as a function
def write_to_file(content, fpath, name):
    the_file = os.path.join(fpath, name)
    with open(the_file, 'w+', encoding=FORMAT) as t:
        t.write(content)

def main():
    print(f" Writing File...\n")
    filename = f"{pcode}_{author}_{title}_text.txt"
    write_to_file(multiline_var, OUTPUT_DIR, filename)

# or hard coded in main()
def main():
    print(f" Writing File...\n")
    filename = f"{pcode}_{author}_{title}_text.txt"
    the_file = os.path.join(OUTPUT_DIR, filename)
    with open(the_file, 'w+', encoding=FORMAT) as t:
        t.write(multiline_var)
I have tried using w w+ wt and wt+ and with and without encoding='utf-8'
Here is an example of multi-line fstring variable:
# using triple quotes
multiline_var = f"""
[PROJ-{pcode}] {full_title} by {author}
{description}
{URL}
{DIVIDER_1}
{TEXT_BLURB}
Some text here and then {SOME_MORE_TEXT}
{DIVIDER_1}
{SOME_LINK}
"""

# or inside parens
multiline_var = (
    f"[PROJ-{pcode}] {full_title} by {author}\n"
    f"{description}\n\n"
    f"{URL}\n"
    f"{DIVIDER_1}\n"
    f"{TEXT_BLURB}\n\n"
    f"Some text here and then {SOME_MORE_TEXT}\n"
    f"{DIVIDER_1}\n\n"
    f"{SOME_LINK}"
)
Using exiftool on the text file shows the following, so it looks like the data is there but must be corrupted:
File Size : 1797 bytes
File Modification Date/Time : 2021:12:31 15:55:39-05:00
File Access Date/Time : 2021:12:31 15:58:13-05:00
File Inode Change Date/Time : 2021:12:31 15:55:39-05:00
File Permissions : -rw-r--r--
File Type : TXT
File Type Extension : txt
MIME Type : text/plain
MIME Encoding : utf-8
Byte Order Mark : No
Newlines : Unix LF
Line Count : 55
Word Count : 181
Not sure what I'm doing wrong. VScode shows no syntax errors in the script. There are no errors in Terminal when running the script. Have I made some simple mistake in the above code? Maybe the fstring variable is causing a problem?
Thanks to #bnaecker for leading me to the solution to this problem.
It appeared that when creating/writing to a text file with a long name, Python can corrupt it. Not sure why, as I save long names for images with Python image libraries all the time. Using a short name like "MyFile.txt" it worked just fine, but that was a red herring.
I have updated this post with my journey to the final solution for using the long names that are needed for my project, though I'm not sure why the problem exists.
First Attempts:
So far, attempts at creating the file with a short name and then renaming it to a long one have failed. I did notice that Python is locking the file it creates and never unlocks it. Not sure if this is the problem. Setting chflags with the os.system('chflags nouchg') command does not work, not even with sudo, and not even when doing it manually in the Terminal.
Using os.rename() in Python corrupts the file
Using os.system('mv oldFile.txt newFile.txt') corrupts the file
Manually using mv command in Terminal corrupts the file
Manually changing the filename in the Finder does not (wtf?)
I kept looking for workarounds but nothing did the job.
Round 2:
Progress!
After much tinkering, I discovered a hidden character inside the file. I ran cat /path/longfilename.txt in Terminal, selected and copied the output and pasted into VScode. Here is what I saw:
Somehow a hidden character is getting into the project code number.
Pasting it into a Unicode search engine, it came up as ZERO WIDTH NO-BREAK SPACE, which is encoded in UTF-8 as the bytes EF BB BF. However, when pasting this symbol into TextMate it shows up as <U+FEFF> which is?...
The Byte Order Mark!
Opening a normal utf-8 text file in a hex editor also shows the files starting with EFBBBF for the BOM.
Now, the text file being read and parsed at first has no blank lines to start the file, so I added a line break, and also tried adding some spaces. This time when writing the file I could open it, however, after sending it to the trash, the same behavior occurred and the file was broken again. It seems that because other corrupted versions were in the trash, it added the symbol back to the file name for some reason.
So what appears to be happening, for whatever reason, when Python opens the text file I'm parsing that has no line break at the top, it seems to be grabbing the BOM from the file and adding that to the first variable which is grabbing the first line of the text file. Since that text is a number code that starts the file name, the BOM symbol is being added to the file name as well as the code inside the text file.
Just... wow
The Current Solution:
I have to leave a blank line at the start of the text file that I'm opening and parsing and a simple line break won't do it. I have no idea why this is. I added some spaces for good measure because randomly the BOM would be added to the variable and filename again. So far (knock on wood) as long as the first line of that initial file has some spaces and then a line break, and previous corrupted files have been deleted from the trash, a long file name can be used for all the files I'm creating and writing to without any problems.
This corruption even persists if I remove the encoding flag from both of the open functions I'm using (one to read and parse, the other to create and write).
If anyone knows why this is happening, please share. I've never seen it mentioned before. I'm not sure if it's a python 3.8 bug, a mac OS bug, the way TextMate wrote the original file, or a combination of these.
Correct Solution:
Thanks to #tripleee for the proper way to handle this, as I don't remember seeing this before, though I haven't been using python for very long.
In order to ignore the BOM, reading in the text file to be parsed with an encoding='utf-8-sig' does the job. Seems to be why it exists. :)
Problem solved.
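A small sketch of the difference, for anyone hitting the same thing. The file contents here ("12345" as a project code) are made up for illustration. The only point is that encoding='utf-8' keeps the BOM as a '\ufeff' character at the start of the first line, while encoding='utf-8-sig' drops it.

```python
import os
import tempfile

# Write a small file that starts with a UTF-8 BOM (bytes EF BB BF), as some editors do.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"\xef\xbb\xbf12345\nSome title\n")
    sample_path = tmp.name

# Plain utf-8 decodes the BOM into a real character (U+FEFF) on the first line.
with open(sample_path, encoding="utf-8") as f:
    first_line_plain = f.readline().strip()

# utf-8-sig skips a leading BOM if one is present.
with open(sample_path, encoding="utf-8-sig") as f:
    first_line_sig = f.readline().strip()

os.remove(sample_path)

print(repr(first_line_plain))
print(repr(first_line_sig))
```

Note that strip() does not remove U+FEFF, which is consistent with the character surviving into the file name in the story above.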
I have been provided with a xlsb file full of data. I want to process the data using python. I can convert it to csv using excel or open office, but I would like the whole process to be more automated. Any ideas?
Update: I took a look at this question and used the first answer:
import subprocess
subprocess.call("cscript XlsToCsv.vbs data.xlsb data.csv", shell=False)
The issue is the file contains greek letters so the encoding is not preserved. Opening the csv with Notepad++ it looks as it should, but when I try to insert into a database comes like this ���. Opening the file as csv, just to read text is displayed like this:
\xc2\xc5\xcb instead of ΒΕΛ.
I realize it's an encoding issue, but is it possible to retain the original encoding when converting the xlsb file to csv?
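For what it's worth, the bytes shown above are consistent with the CSV having been saved in a Greek single-byte code page rather than UTF-8. The sketch below assumes Windows-1253 (cp1253); that is a guess based on the three bytes, not something confirmed about the actual file.

```python
raw = b"\xc2\xc5\xcb"  # the bytes shown in the question

# Assumption: the CSV was saved in the Windows Greek code page (cp1253).
text = raw.decode("cp1253")
print(text)

# Re-encode as UTF-8 before inserting into a database that expects UTF-8.
utf8_bytes = text.encode("utf-8")
```

If that assumption holds, opening the CSV with encoding="cp1253" in Python should give readable text.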
I've encountered this same problem and using pyxlsb does it for me:
from pyxlsb import open_workbook

with open_workbook('HugeDataFile.xlsb') as wb:
    for sheetname in wb.sheets:
        with wb.get_sheet(sheetname) as sheet:
            for row in sheet.rows():
                values = [r.v for r in row]  # retrieving content
                # cell values may be numbers or None, so convert them before joining
                csv_line = ','.join('' if v is None else str(v) for v in values)  # or do your thing
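Joining with ',' breaks as soon as a cell itself contains a comma or a quote, so it may be safer to hand the rows to the stdlib csv module. A minimal sketch follows. rows_to_csv is a hypothetical helper, and sample_rows stands in for the lists of cell values that the pyxlsb loop above produces.

```python
import csv
import io


def rows_to_csv(rows, out_file):
    """Write an iterable of rows (lists of cell values) as CSV, with None as empty."""
    writer = csv.writer(out_file)
    for row in rows:
        writer.writerow(["" if value is None else value for value in row])


# Hypothetical sample rows standing in for [cell.v for cell in row] from pyxlsb.
sample_rows = [["name", "amount"], ["ΒΕΛ", 12.5], ["a,b", None]]

buffer = io.StringIO(newline="")
rows_to_csv(sample_rows, buffer)
print(buffer.getvalue())
```

To write a real file that keeps Greek text intact, open it with open(path, "w", newline="", encoding="utf-8") and pass that file object instead of the buffer.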
The most popular Excel packages for Python, openpyxl and xlrd, have no support for the xlsb format (bug tracker entries: openpyxl, xlrd).
So I'm afraid there is no native Python way =/. However, since you are using Windows, it should be easy to script the task with external tools.
I would suggest taking a look at Convert XLS to XLSB Programatically?. You mention Python in the title, but the question itself does not imply you are strongly tied to it, so you could go the pure C# way.
If you really feel comfortable only with Python, one of the answers there suggests a command line tool under the fancy name of Convert-XLSB. You could script it as an external tool from Python with subprocess.
I know this is not a good answer, but I don't think there is better/easier way as of now.
In my previous experience, I handled converting xlsb with the LibreOffice command line utility.
In Ruby I just execute a system command that calls LibreOffice to convert the xlsb file to csv:
`libreoffice --headless --convert-to csv your_xlsb_file.xlsb --outdir /path/csv`
To change the encoding, I call iconv from the command line, again from Ruby:
`iconv -f ISO-8859-1 -t UTF-8 your_csv_file.csv > new_file_csv.csv`
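Since the question is about Python, here is a rough equivalent of the two commands above. It is a sketch only: build_convert_command, run_conversion and transcode are hypothetical helper names, the flags are copied from the libreoffice command shown above, and actually running the conversion assumes LibreOffice is installed and on PATH.

```python
import subprocess


def build_convert_command(xlsb_path, out_dir):
    """Build the argument list for the LibreOffice conversion shown above."""
    return ["libreoffice", "--headless", "--convert-to", "csv", xlsb_path, "--outdir", out_dir]


def run_conversion(xlsb_path, out_dir):
    """Run the conversion; raises CalledProcessError if LibreOffice reports failure."""
    subprocess.run(build_convert_command(xlsb_path, out_dir), check=True)


def transcode(src_path, dst_path, src_encoding="ISO-8859-1", dst_encoding="UTF-8"):
    """Re-encode a text file, similar in effect to `iconv -f ... -t ...`."""
    with open(src_path, encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        for line in src:
            dst.write(line)


command = build_convert_command("your_xlsb_file.xlsb", "/path/csv")
print(command)
```

The source encoding is the part most likely to need changing; ISO-8859-1 is just the value used in the iconv example.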
I also looked at the problem and the following worked for me: first open the file in Excel via Python, then save it to a different file. It's a bit of a workaround, but I like it more than the other solutions. In the example I use file format 6, which is CSV, but you can also use other ones.
import win32com.client
excel = win32com.client.Dispatch("Excel.Application")
excel.DisplayAlerts = False
excel.Visible=False
doc = excel.Workbooks.Open("C:/users/A295998/Python/#TA1PROG3.xlsb")
doc.SaveAs(Filename="C:\\users\\A295998\\Python\\test5.csv",FileFormat=6)
doc.Close()
excel.Quit()
XLSB is a binary format and I don't think you'll be able to parse it with current Python tools and packages. If you still want to automate the process with Python, you can do what the others have suggested and script that Windows CLI tool: call the .exe from the command line with subprocess, passing it the files you want to convert.
For example, with a script similar to this one you could convert all the .xlsb files that you place in the "xlsb" folder to .csv format...
├── xlsb
│ ├── file1.xlsb
│ ├── file2.xlsb
│ └── file3.xlsb
└── xlsb_to_csv.py
xlsb_to_csv.py
#!/usr/bin/env python
import os
import subprocess

# list the files to convert, with the folder prefix so the tool can find them
files = [os.path.join('xlsb', f) for f in os.listdir('./xlsb')]
for f in files:
    subprocess.call("ConvertXLS.EXE " + str(f) + " --arguments", shell=True)
Note: the Windows command is pseudocode... I use a similar approach to batch-convert stuff on headless Windows servers for testing purposes. You just have to figure out the exe location and the Windows command...
Hope it helps... good luck!
I think you can do this using pyuno. This blog entry shows how to convert xls files to csv, and as OpenOffice has supported xlsb files since version 3.2, this code might just work for you. You will have to go through the hassle of setting up the pyuno environment, though.
The script you reference seems to use the ActiveX interface to Excel, and saves via its Workbook.SaveAs method.
According to the MSDN documentation, this method has a TextCodepage argument which may be helpful.
Sidenote: You can rewrite the VB script in python, see this question.
http://scienceoss.com/read-excel-files-from-python/comment-page-1/#comment-1051
From the above link, I used this utility to read an XLS file. If the XLS file contains different language characters like Chinese or Hindi, it does not output them correctly. Is there a workaround for this?
After Googling, I found this:
import xlrd

def upload_xls(dir, file, request):
    try:
        global msg
        global row_num
        row_num = []
        header_arr = []
        global file_path
        file_path = dir
        #reader = csv.reader(open(file), delimiter='#', quotechar='"')
        book = xlrd.open_workbook('dodgy.xls',encoding='cp1252') ##To specify UTF8-encoding
        book.sheet_names()
        sh = book.sheet_by_index(0)
        valid_xl_format = 0
        invalid_xl_format = 0
    except:
        print "Error"
But there is an error in the line book = open_workbook('dodgy.xls',encoding='cp1252'):
TypeError: open_workbook() got an unexpected keyword argument 'encoding'
[dis]claimer: I'm the author of xlrd.
If the xls contains different language characters like chine or
hindi.It does not output the exact wordings.Is there a work around for
this..
The encoding_override argument is (as explained in the documentation) used ONLY for OLD files (produced by Excels earlier than Excel 97 (that's the year 1997)) and only then when the internally-recorded "codepage" is missing or incorrect.
Note: Old file with Chinese characters: Overriding with 'cp1252' is guaranteed to raise an exception.
Note: Old file with "Hindi" (Devanagari?) characters: very unlikely ... as far as I know there never was an officially-supported codepage for any of the ISCII scripts, and I haven't heard of any unofficial one. Any information on this topic and/or sample files would be very welcome.
Excel 97 and later versions record all text data in (effectively) UTF-16LE. The encoding_override is ignored if the file is a valid Excel-97-or-later file.
Whatever the version of Excel that produced the file, (as documented) xlrd returns unicode strings. Your problems are much more likely to be related to how you are displaying or converting those unicode strings.
For further assistance, edit your question to show examples of the actual output together with the "exact wording".
According to the xlrd module documentation, the correct parameter is: encoding_override="cp1252" and not encoding="cp1252".
From the way you are importing the xlrd module you should be calling the function as xlrd.open_workbook but in the example code you use the function directly, as if you had used from xlrd import *.
There is a csv module in the standard library, which handles unicode in Python 3.1.
Warning: in Python 2.x the csv library does not handle unicode.
There is a similar question. The answer there was that the output was causing the issue, not xlrd.
An answer on how to set your script to UTF-8:
https://stackoverflow.com/a/17628350/713