Traceback (most recent call last) when docx.Document() is used - python

import docx
f = open('~/Desktop/python/test/draft.docx','rb')
document = docx.Document(f)
Traceback (most recent call last):
File "./test.py", line 56, in <module>
document = docx.Document(f)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/docx/api.py", line 25, in Document
document_part = Package.open(docx).main_document_part
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/docx/opc/package.py", line 116, in open
pkg_reader = PackageReader.from_file(pkg_file)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/docx/opc/pkgreader.py", line 32, in from_file
phys_reader = PhysPkgReader(pkg_file)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/docx/opc/phys_pkg.py", line 101, in __init__
self._zipf = ZipFile(pkg_file, 'r')
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/zipfile.py", line 1200, in __init__
self._RealGetContents()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/zipfile.py", line 1267, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
I uninstalled docx and installed python-docx, and uninstalled lxml too.
None of it works.
Any help would be appreciated.
I am running Python 3.7 on OS X 10.13.

Don't open the file before you pass it to Document(). Just give it the path like you did in the open() call above.
It needs to be an actual Word .docx file; a .docx is a ZIP archive, which is why zipfile raises BadZipFile for anything else. Note that you can just call document = Document() to get started. The "save as" file name is provided in the document.save() call. The file (if any) provided in the Document() call is just the starting-point "template" to use.
See the related documentation here:
https://python-docx.readthedocs.io/en/latest/user/documents.html
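As a quick way to confirm whether the file is really a .docx before handing it to Document(), a check like the one below may help. It is a minimal sketch using only the standard library; the helper name looks_like_docx is hypothetical, and it assumes the main document part lives at word/document.xml, which is the usual layout but is not guaranteed for every package. It also expands "~", which open() does not do on its own.

```python
import os
import zipfile


def looks_like_docx(path):
    """Return True if `path` is a ZIP archive that contains word/document.xml."""
    # open() and ZipFile() do not expand "~", so do it explicitly.
    full_path = os.path.expanduser(path)
    if not os.path.isfile(full_path) or not zipfile.is_zipfile(full_path):
        return False
    with zipfile.ZipFile(full_path) as archive:
        # Assumption: the main document part uses the conventional name.
        return "word/document.xml" in archive.namelist()
```

If this returns False for draft.docx, the file is probably an older .doc, a renamed file, or an incomplete download, and re-saving it from Word as .docx would be the next thing to try.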

Related

Why can't I edit my .xlsx file with openpyxl?

I am encountering a problem when running the code below with openpyxl:
import openpyxl
import os
wb = openpyxl.load_workbook('example.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')
sheet["A1"].value
sheet["A1"].value == None
sheet["A1"].value = 42
sheet["A3"].value = 'Hello'
os.chdir("/Users/mac/Desktop")
wb.save('exceeeel.xlsx')
The error is
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/openpyxl/reader/excel.py", line 312, in load_workbook
reader = ExcelReader(filename, read_only, keep_vba,
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/openpyxl/reader/excel.py", line 124, in __init__
self.archive = _validate_archive(fn)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/openpyxl/reader/excel.py", line 96, in _validate_archive
What am I doing wrong? I am using the current version of the openpyxl library.
I can't provide a confident answer because the question only includes a partial traceback. That being said, it looks like the traceback one would get for a FileNotFoundError:
C:\Python37\python.exe C:/Users/user/PycharmProjects/scratch/scratch2.py
Traceback (most recent call last):
File "C:/Users/user/PycharmProjects/scratch/scratch2.py", line 3, in <module>
wb = openpyxl.load_workbook('example.xlsx')
File "C:\Python37\lib\site-packages\openpyxl\reader\excel.py", line 313, in load_workbook
data_only, keep_links)
File "C:\Python37\lib\site-packages\openpyxl\reader\excel.py", line 124, in __init__
self.archive = _validate_archive(fn)
File "C:\Python37\lib\site-packages\openpyxl\reader\excel.py", line 96, in _validate_archive
archive = ZipFile(filename, 'r')
File "C:\Python37\lib\zipfile.py", line 1207, in __init__
self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: 'example.xlsx'
Process finished with exit code 1
This error is raised when the path you provided to openpyxl.load_workbook does not point to an existing file. Since the only argument you passed is 'example.xlsx', that probably means there is no file with that name in the folder you are running this script from.
If this 'example.xlsx' file is in a different folder then you'll want to either specify the relative path to that file as your argument or move the file into the same folder as your script.
If this isn't what's going on then you'll need to provide the full traceback that you are seeing on your end in order to get a better answer.
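One way to rule out the working-directory problem is to build the path relative to the script itself and fail early with a clear message. This is a minimal sketch using only pathlib; the helper name resolve_next_to is hypothetical, and it assumes the workbook sits in the same folder as the script.

```python
from pathlib import Path


def resolve_next_to(anchor_file, filename):
    """Return the absolute path of `filename` in the same folder as `anchor_file`."""
    candidate = Path(anchor_file).resolve().parent / filename
    if not candidate.is_file():
        # Fail before the library does, and show the full path that was tried.
        raise FileNotFoundError(f"Expected a file at {candidate}")
    return candidate
```

In the script from the question this could be used as openpyxl.load_workbook(str(resolve_next_to(__file__, 'example.xlsx'))), so the result no longer depends on the folder the script is launched from.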

How do we convert HTML to PDF using Python? Is there any code you can share?

I have tried the library called pdftotree, but I didn't get a result.
This is the code:
import pdftotree
file= open('C:/Users/chaitanya.naidu/Downloads/test.pdf', 'rb')
f = pdftotree.parse(file)
I am getting this error:
Traceback (most recent call last):
File "<ipython-input-4-4a9a6b72801d>", line 1, in <module>
f = pdftotree.parse(file)
File "C:\Users\chaitanya.naidu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pdftotree\core.py", line 63, in parse
if not extractor.is_scanned():
File "C:\Users\chaitanya.naidu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pdftotree\TreeExtract.py", line 121, in is_scanned
self.parse()
File "C:\Users\chaitanya.naidu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pdftotree\TreeExtract.py", line 91, in parse
for page_num, layout in enumerate(analyze_pages(self.pdf_file)):
File "C:\Users\chaitanya.naidu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pdftotree\utils\pdf\pdf_utils.py", line 117, in analyze_pages
with open(os.path.realpath(file_name), "rb") as fp:
File "C:\Users\chaitanya.naidu\AppData\Local\Continuum\Anaconda3\lib\ntpath.py", line 542, in abspath
path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not _io.BufferedReader
You can use pdfkit (a wrapper around the wkhtmltopdf tool, which must be installed separately), for example:
import pdfkit
pdfkit.from_url('http://google.com', 'out.pdf')
pdfkit.from_file('test.html', 'out.pdf')
pdfkit.from_string('Hello!', 'out.pdf')
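Separately, the TypeError in the question's traceback suggests that the function received an open file object where it expected a path, since the library goes on to call open(os.path.realpath(file_name), "rb") itself. A minimal sketch of normalising either kind of input to a path is shown below; the helper name to_path is hypothetical and is not part of pdftotree or pdfkit.

```python
import os


def to_path(source):
    """Return a filesystem path for `source`, which may be a path or an open file."""
    if isinstance(source, (str, bytes, os.PathLike)):
        return os.fspath(source)
    # Files created with open(path, ...) expose the original path as .name.
    name = getattr(source, "name", None)
    if isinstance(name, (str, bytes)):
        return name
    raise TypeError(f"cannot derive a path from {type(source).__name__}")
```

Under that reading, passing the path string directly instead of the result of open() would likely avoid this particular error.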

Pyglet can't load .wav file

Env:
Ubuntu 18.04
Python 3.6.6
pyglet 1.3.2
Issue:
Based on the pyglet documentation, I tried to run the following code:
import pyglet
pyglet.options["audio"] = ("openal", "pulse", "directsound", "silent")
explosion = pyglet.media.load('explosion.wav')
But the following exceptions occurred:
1) If the file was converted with ffmpeg -i input.mp3 output.wav:
Traceback (most recent call last):
File "<path_to_dir>/test_sound.py", line 3, in <module>
explosion = pyglet.media.load('zxc.wav', streaming=False)
File "<path_to_env>lib/python3.6/site-packages/pyglet/media/sources/loader.py", line 63, in load
source = get_source_loader().load(filename, file)
File "<path_to_env>lib/python3.6/site-packages/pyglet/media/sources/loader.py", line 84, in load
return WaveSource(filename, file)
File "<path_to_env>lib/python3.6/site-packages/pyglet/media/sources/riff.py", line 197, in __init__
raise WAVEFormatException('Not a WAVE file')
pyglet.media.sources.riff.WAVEFormatException: Not a WAVE file
2) Or this one, for several .wav files from the internet:
Traceback (most recent call last):
File "<path_to_dir>//test_sound.py", line 3, in <module>
explosion = pyglet.media.load('explosion.wav', streaming=False)
File "<path_to_env>lib/python3.6/site-packages/pyglet/media/sources/loader.py", line 63, in load
source = get_source_loader().load(filename, file)
File "<path_to_env>lib/python3.6/site-packages/pyglet/media/sources/loader.py", line 84, in load
return WaveSource(filename, file)
File "<path_to_env>lib/python3.6/site-packages/pyglet/media/sources/riff.py", line 192, in __init__
format = wave_form.get_format_chunk()
File "<path_to_env>lib/python3.6/site-packages/pyglet/media/sources/riff.py", line 172, in get_format_chunk
for chunk in self.get_chunks():
File "<path_to_env>lib/python3.6/site-packages/pyglet/media/sources/riff.py", line 108, in get_chunks
chunk = cls(self.file, name, length, offset)
File "<path_to_env>lib/python3.6/site-packages/pyglet/media/sources/riff.py", line 153, in __init__
raise RIFFFormatException('Size of format chunk is incorrect.')
pyglet.media.sources.riff.RIFFFormatException: Size of format chunk is incorrect.
Question:
How do I play .wav files with pyglet correctly?
Like in the example, it is probably either an issue with openal or the wav-files. Are the procedural sounds playing correctly, e.g.:
from pyglet.media.sources.procedural import Sine
sine = Sine(duration=1, frequency=500,
            sample_size=16, sample_rate=44100)
pyglet.media.StaticSource(sine).play()
and can you share an offending wav-file? I just ran a test on Linux Mint 19, Python 3.7.1 and pyglet 1.3.2 with https://github.com/pyreiz/pyreiz/blob/master/reiz/media/wav/ding.wav and it runs fine.
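To narrow down whether the wav-files themselves are the problem, it may help to inspect their headers independently of pyglet. The sketch below uses Python's standard wave module, which only reads uncompressed PCM WAVE files; the helper name describe_wav is hypothetical. Note that pyglet 1.3.2 uses its own RIFF parser, so a file that passes here is not guaranteed to load in pyglet, but a file that fails here is a likely culprit.

```python
import wave


def describe_wav(path):
    """Return basic PCM parameters; raises wave.Error if the header is not readable."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": wav.getframerate(),
            "frames": wav.getnframes(),
        }
```

Calling describe_wav('explosion.wav') on each failing file and comparing the result with a file that plays correctly should show whether the format differs.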

spaCy: errors attempting to load serialized Doc

I am trying to serialize/deserialize spaCy documents (setup is Windows 7, Anaconda) and am getting errors. I haven't been able to find any explanations. Here is a snippet of code and the error it generates:
import spacy
nlp = spacy.load('en')
text = 'This is a test.'
doc = nlp(text)
fout = 'test.spacy' # <-- according to the API for Doc.to_disk(), this needs to be a directory (but for me, spaCy writes a file)
doc.to_disk(fout)
doc.from_disk(fout)
Traceback (most recent call last):
File "<ipython-input-7-aa22bf1b9689>", line 1, in <module>
doc.from_disk(fout)
File "doc.pyx", line 763, in spacy.tokens.doc.Doc.from_disk
File "doc.pyx", line 806, in spacy.tokens.doc.Doc.from_bytes
ValueError: [E033] Cannot load into non-empty Doc of length 5.
I have also tried creating a new Doc object and loading from that, as shown in the example ("Example: Saving and loading a document") in the spaCy docs, which results in a different error:
from spacy.tokens import Doc
from spacy.vocab import Vocab
new_doc = Doc(Vocab()).from_disk(fout)
Traceback (most recent call last):
File "<ipython-input-16-4d99a1199f43>", line 1, in <module>
Doc(Vocab()).from_disk(fout)
File "doc.pyx", line 763, in spacy.tokens.doc.Doc.from_disk
File "doc.pyx", line 838, in spacy.tokens.doc.Doc.from_bytes
File "stringsource", line 646, in View.MemoryView.memoryview_cwrapper
File "stringsource", line 347, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only
EDIT:
As pointed out in the replies, the path provided should be a directory. However, the first code snippet creates a file. Changing this to a non-existing directory path doesn't help as spaCy still creates a file. Attempting to write to an existing directory causes an error too:
fout = 'data'
doc.to_disk(fout)
Traceback (most recent call last):
File "<ipython-input-8-6c30638f4750>", line 1, in <module>
doc.to_disk(fout)
File "doc.pyx", line 749, in spacy.tokens.doc.Doc.to_disk
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1161, in open
opener=self._opener)
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1015, in _opener
return self._accessor.open(self, flags, mode)
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 387, in wrapped
return strfunc(str(pathobj), *args)
PermissionError: [Errno 13] Permission denied: 'data'
Python has no problem writing at this location via standard file operations (open/read/write).
Trying with a Path object yields the same results:
from pathlib import Path
import os
fout = Path(os.path.join(os.getcwd(), 'data'))
doc.to_disk(fout)
Traceback (most recent call last):
File "<ipython-input-17-6c30638f4750>", line 1, in <module>
doc.to_disk(fout)
File "doc.pyx", line 749, in spacy.tokens.doc.Doc.to_disk
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1161, in open
opener=self._opener)
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1015, in _opener
return self._accessor.open(self, flags, mode)
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 387, in wrapped
return strfunc(str(pathobj), *args)
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\Username\\workspace\\data'
Any ideas why this might be happening?
The argument in doc.to_disk(fout) must be "a path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects", as the spaCy documentation states at https://spacy.io/api/doc
Try changing fout to a directory; it might do the trick.
EDIT:
Examples from the spaCy documentation:
for doc.to_disk:
doc.to_disk('/path/to/doc')
and for doc.from_disk:
from spacy.tokens import Doc
from spacy.vocab import Vocab
doc = Doc(Vocab()).from_disk('/path/to/doc')

Failed to download file using pafy

I am using Python 2.7 and pafy to download an audio file from YouTube:
import pafy
video = pafy.new("https://www.youtube.com/watch?v=dcNlEn1LrrE")
print video.m4astreams
filename = video.m4astreams[0].download(quiet=False)
I get the following error:
Traceback (most recent call last):
File "E:\work\Python\2017\pafy\work_with_pafy.py", line 27, in <module>
filename = video.m4astreams[0].download(quiet=False)#.encode('utf-8')
File "c:\python27\lib\site-packages\pafy\backend_shared.py", line 586, in download
filename = self.generate_filename(meta=meta, max_length=256-len('.temp'))
File "c:\python27\lib\site-packages\pafy\backend_shared.py", line 458, in generate_filename
return xenc(filename)
File "c:\python27\lib\site-packages\pafy\util.py", line 63, in xenc
return utf8_replace(stuff) if not_utf8_environment else stuff
File "c:\python27\lib\site-packages\pafy\util.py", line 57, in utf8_replace
txt = txt.encode(sse, "replace").decode(sse)
TypeError: encode() argument 1 must be string, not None
Please Help!
Thanks in advance.
I have found the solution.
The problem is solved by replacing one line in the util.py file (C:\Python27\Lib\site-packages\pafy\util.py).
I replaced this line in util.py:
txt = txt.encode(sse, "replace").decode(sse)
with this one:
txt = txt.encode('utf-8')
After that, the file downloaded successfully without any problems.
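The traceback says the encoding argument was None, which would be consistent with the console encoding being unavailable (an assumption here: that sse comes from sys.stdout.encoding, which can be None when output is redirected). A less invasive variant of the fix keeps the original round trip and only falls back to UTF-8 when no encoding is known. This is a minimal sketch written for Python 3; the function name reencode_for_console is hypothetical and not part of pafy.

```python
import sys


def reencode_for_console(text, encoding=None):
    """Round-trip `text` through an encoding, replacing characters it cannot represent."""
    # Fall back to UTF-8 when neither the caller nor stdout provides an encoding.
    target = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(target, "replace").decode(target)
```

Unlike txt.encode('utf-8'), this still returns text instead of encoded bytes, which stays closer to what the original line produced.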
