PyPDF2: writing output to stdout fails with python3 - python

I am trying to use Python 3.7.2 with PyPDF2 1.26 to select some pages of an input PDF file and write the output to stdout (the actual code is more complicated; this is just an MCVE):
import sys
from PyPDF2 import PdfFileReader, PdfFileWriter
input = PdfFileReader("example.pdf")
output = PdfFileWriter()
output.addPage(input.getPage(0))
output.write(sys.stdout)
This fails with the following error:
UserWarning: File <<stdout>> to write to is not in binary mode. It may not be written to correctly. [pdf.py:453]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.7/site-packages/PyPDF2/pdf.py", line 487, in write
stream.write(self._header + b_("\n"))
TypeError: write() argument must be str, not bytes
The problem seems to be that sys.stdout is not open in binary mode. As some of the answers suggest, I have tried the following:
output.write(sys.stdout.buffer)
This fails with the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.7/site-packages/PyPDF2/pdf.py", line 491, in write
object_positions.append(stream.tell())
OSError: [Errno 29] Illegal seek
I have also tried the answer from Changing the way stdin/stdout is opened in Python 3:
sout = open(sys.stdout.fileno(), "wb")
output.write(sout)
This fails with the same error as above.
How can I use the PyPDF2 library to output a PDF to standard output?
More generally, how do I correctly switch sys.stdout to binary mode (akin to Perl's binmode STDOUT)?
Note: There is no need to tell me that I can open a file in binary mode and write the PDF to that file. That works; however, I specifically want to write the PDF to stdout.

From the documentation:
write(stream)
Writes the collection of pages added to this object out as a PDF file.
Parameters: stream – An object to write the file to. The object must support the write method and the tell method, similar to a file object.
It turns out that sys.stdout.buffer is not tellable (its tell() raises OSError) when stdout is not redirected to a file, e.g. when it is a terminal or a pipe; hence you can't use it as a stream for PdfFileWriter.write.
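You can check this yourself by calling tell() on the stream. The sketch below uses an anonymous pipe from os.pipe() as a stand-in for stdout connected to another program (a terminal behaves the same way on Linux) and a temporary file as a stand-in for a redirected stdout. The helper name stream_is_tellable is hypothetical, not part of PyPDF2.

```python
import os
import tempfile

def stream_is_tellable(f):
    """Return True if f.tell() succeeds, False if the OS refuses to report a position."""
    try:
        f.tell()
        return True
    except OSError:  # e.g. [Errno 29] Illegal seek (ESPIPE) for a pipe or terminal
        return False

# A pipe stands in for stdout connected to another program.
r, w = os.pipe()
with os.fdopen(r, "rb") as reader, os.fdopen(w, "wb") as writer:
    print("pipe:", stream_is_tellable(writer))

# A regular file stands in for stdout redirected with "> myfile.pdf".
with tempfile.TemporaryFile() as f:
    print("regular file:", stream_is_tellable(f))
```

On Linux the pipe reports False and the regular file reports True, which matches the behaviour described here.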
Say your script is called myscript. If you run just myscript, you'll get this error, but if you redirect its output to a file, as in:
myscript > myfile.pdf
then stdout is a regular file, which is seekable, and you won't get the error.
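If you need the PDF on stdout even when it is a pipe or a terminal, one workaround (not part of the answer above) is to let PdfFileWriter.write() write into an in-memory io.BytesIO, which supports both write() and tell(), and then copy the finished bytes to sys.stdout.buffer. This also answers the general binmode question: sys.stdout.buffer already is the binary stream underneath the text-mode sys.stdout; the only problem was tell(). The helper name write_via_buffer is hypothetical, and the commented usage assumes PyPDF2 1.26 and example.pdf from the question.

```python
import io
import sys

def write_via_buffer(write_fn, out=None):
    """Let write_fn write to a seekable in-memory buffer, then copy the bytes to out."""
    if out is None:
        out = sys.stdout.buffer  # the binary stream underneath text-mode sys.stdout
    buf = io.BytesIO()           # supports both write() and tell()
    write_fn(buf)
    out.write(buf.getvalue())
    out.flush()

# Usage with the question's MCVE (assumes PyPDF2 1.26 and example.pdf):
# from PyPDF2 import PdfFileReader, PdfFileWriter
# reader = PdfFileReader("example.pdf")
# writer = PdfFileWriter()
# writer.addPage(reader.getPage(0))
# write_via_buffer(writer.write)
```

The trade-off is that the whole PDF is held in memory before anything reaches stdout, which is usually fine for a handful of pages.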

Related

Why does the error say it's not a string?

The path of the file is:
"C:\Users\deana\OneDrive\Marlon's files\Programming\Python\PITT\PITT_LIbrary\Lists\test.txt"
The lines of code are:
import os
os.chdir("C:/Users/deana/OneDrive/Marlon's files/Programming/Python/PITT/PITT_LIbrary/Lists")
exec(open('test.txt'))
The error is:
Traceback (most recent call last):
File "<pyshell#14>", line 1, in <module>
exec(open('test.txt'))
TypeError: exec() arg 1 must be a string, bytes or code object
Also, if I try it on one line, like this:
exec(open(r"C:/Users/deana/OneDrive/Marlon's files/Programming/Python/PITT/PITT_LIbrary/Lists/test.txt"))
it's the same error (with and without r).
This is super frustrating, as it reads like I'm not passing a string... but it is a string!?
Also, I've done this literally the same way before, and restarting the IDLE shell made no difference.
Ugh! I always get stupid errors with file paths.
I should have been using os.startfile() to open this.
I got confused by using open(), as I was attempting to open the file in its default app. open() returns a file object, not a string, which is why exec() rejected it.
Before, I've used exec(open(...).read()) to run .py files, and I guess I got the two confused.
exec is just for running other scripts (Python source code)... need stronger coffee next time.
Try this:
import os
os.chdir("C:/Users/deana/OneDrive/Marlon's files/Programming/Python/PITT/PITT_LIbrary/Lists")
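exec(open('test.txt', 'rb').read())
Call .read() to get the file's contents: open() itself returns a file object, which exec() does not accept. Opening the file with 'rb' (read bytes) makes .read() return bytes, which exec() accepts just like a str.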
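A small sketch of the difference, using a throwaway script: reading the file's contents works, while passing the file object itself reproduces the question's TypeError. The helper name run_script and the script contents are hypothetical. (To simply open a file in its default application on Windows, os.startfile(path) is the call the asker ended up using; it is Windows-only.)

```python
import os
import tempfile

def run_script(path):
    """Execute the Python source stored in path and return the names it defined."""
    namespace = {}
    with open(path, "rb") as f:
        # open() returns a file object; .read() returns its contents as bytes ('rb' mode),
        # and exec() accepts bytes as well as str.
        exec(f.read(), namespace)
    return namespace

# Demonstration with a throwaway script (hypothetical contents).
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
    tmp.write("answer = 6 * 7\n")
    script_path = tmp.name

print(run_script(script_path)["answer"])  # 42

with open(script_path) as f:
    try:
        exec(f)  # passing the file object itself reproduces the question's TypeError
    except TypeError as err:
        print("TypeError:", err)

os.remove(script_path)
```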

Invalid Signature Exception with .sav file

New to SciPy but not to Python. I'm trying to import a .sav file into SciPy so I can do some basic work on it. But each time I try to import the file using scipy.io.readsav(), Python throws an error:
Traceback (most recent call last):
File "<ipython-input-7-743be643d8a1>", line 1, in <module>
dataset = io.readsav("c:/users/me/desktop/survey.sav")
File "C:\Users\me\Anaconda3\lib\site-packages\scipy\io\idl.py", line 726, in readsav
raise Exception("Invalid SIGNATURE: %s" % signature)
Exception: Invalid SIGNATURE: b'$F'
Any idea what's happening? I can open the file in R and manipulate the data, but I'd like to do it in Python. Running Anaconda on Windows.
scipy.io.readsav() reads IDL SAVE files. You have tagged this question spss, so I assume you are trying to read an SPSS file. The format of an SPSS .sav file is not the same as the format of an IDL SAVE file.
Look on PyPI for savReaderWriter, a Python package for reading and writing SPSS .sav files.
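The b'$F' in the error message is itself a clue: scipy.io.readsav() expects an IDL SAVE file, which starts with the 2-byte signature b'SR', while an SPSS system file starts with b'$FL2' (or b'$FL3' for the zlib-compressed .zsav variant). A minimal sketch that sniffs the first four bytes before you pick a reader; the function names are hypothetical.

```python
def guess_sav_format(header: bytes) -> str:
    """Guess what kind of .sav file starts with these bytes."""
    if header[:4] in (b"$FL2", b"$FL3"):
        return "spss"     # SPSS system file ($FL3: zlib-compressed .zsav variant)
    if header[:2] == b"SR":
        return "idl"      # IDL SAVE file: the format scipy.io.readsav() expects
    return "unknown"

def guess_sav_file(path: str) -> str:
    """Read the first four bytes of path and classify them."""
    with open(path, "rb") as f:
        return guess_sav_format(f.read(4))

print(guess_sav_format(b"$FL2"))  # spss
```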

How to read a large file set

I am very new to Python. So please give specific advice. I am using Python 3.2.2.
I need to read a large set of files on my computer. Right now I cannot even open one of them. To verify the directory of the file, I used:
>>> import os
>>> os.path.dirname(os.path.realpath('a9000006.txt'))
It gives me the location 'C:\\Python32'
Then I wrote code to open it:
>>> file=open('C:\\Python32\a9000006.txt','r')
Traceback (most recent call last):
File "<pyshell#29>", line 1, in <module>
file=open('C:\\Python32\a9000006.txt','r')
IOError: [Errno 22] Invalid argument: 'C:\\Python32\x079000006.txt'
Then I tried another one:
>>> file=open('C:\\Python32\\a9000006.txt','r')
Traceback (most recent call last):
File "<pyshell#33>", line 1, in <module>
file=open('C:\\Python32\\a9000006.txt','r')
IOError: [Errno 2] No such file or directory: 'C:\\Python32\\a9000006.txt'
Then another one:
>>> file=open(r'C:\Python32\a9000006.txt','r')
Traceback (most recent call last):
File "<pyshell#35>", line 1, in <module>
file=open(r'C:\Python32\a9000006.txt','r')
IOError: [Errno 2] No such file or directory: 'C:\\Python32\\a9000006.txt'
The file is saved in the Python folder, but inside several layers of subfolders; the path is D\Software\Python\Python3.2.2\Part1\Part1awards_1990\awd_1990_00.
Also, can anyone share how to read the abstract section of all files in that folder? Thanks.
\a is the ASCII bell character, not a backslash and an a. Use forward slashes instead of backslashes:
open('C:/Python32/a9000006.txt')
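The first error message already shows the problem: the \a in the path turned into \x07. The snippet below demonstrates why, together with three spellings that avoid it. Note that the question's second and third attempts already used correct spellings; they failed with "No such file or directory" only because the file is not actually at that path.

```python
# In an ordinary (non-raw) string literal, "\a" is the escape for the ASCII bell character.
broken = 'C:\\Python32\a9000006.txt'     # "\\" -> one backslash, but "\a" -> chr(7)
print(repr(broken))                      # 'C:\\Python32\x079000006.txt', as in the error

# Three spellings that keep a literal backslash (or use a forward slash) before the "a":
escaped = 'C:\\Python32\\a9000006.txt'   # double every backslash
raw = r'C:\Python32\a9000006.txt'        # raw string: backslashes are not escapes
forward = 'C:/Python32/a9000006.txt'     # Windows also accepts forward slashes
print(escaped == raw)                    # True
```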
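Also, use the actual path to the file instead of C:/Python32/a9000006.txt. It's not clear from your question what that path might be; you seem like you might already know it, but you're misusing realpath in a way that suggests you're trying to use it to search for the file. realpath doesn't do that: it just turns a relative name into an absolute path based on the current working directory (resolving symlinks), whether or not the file actually exists there.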
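If you really do need to search for a file somewhere below a folder with several layers of subfolders, pathlib.Path.rglob() walks the whole tree. A minimal sketch; find_files and the root folder in the commented usage are hypothetical placeholders, not the asker's real path.

```python
import os
from pathlib import Path

def find_files(root, name):
    """Search root and all of its subfolders for regular files called name."""
    return [p for p in Path(root).rglob(name) if p.is_file()]

# realpath() only builds an absolute path from the current working directory;
# it does not check that the file exists, let alone search for it.
print(os.path.realpath("a9000006.txt"))

# Hypothetical usage (replace the root with the folder that holds your data):
# for path in find_files(r"C:\path\to\Part1", "a9000006.txt"):
#     print(path)
```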

Is it possible to get writing access to raw devices using python with windows?

This is sort of a follow-up to this question. I want to know if you can access raw devices (i.e. \\.\PhysicalDriveN) in writing mode and, if so, how.
Using Linux, write access can simply be achieved by using e.g. open("/dev/sdd", "w+") (provided that the script is running with root permissions). I assume that Mac OS behaves similarly (with /dev/diskN as input file).
When trying the same command under Windows (with the corresponding path), it fails with the following error:
IOError: [Errno 22] invalid mode ('w+') or filename: '\\\\.\\PhysicalDrive3'
However, when trying to read from the PhysicalDrive, it does work (even the correct data is read). The shell is running with administrator permissions under Windows 7.
Is there any other way to accomplish this task using python while still keeping the script as platform-independent as possible?
Edit:
I looked a bit further into what methods python provides for file handling and stumbled across os.open. Opening the PhysicalDrive using os.open(drive_string, os.O_WRONLY|os.O_BINARY) returns no error. So far, so good. Now I have either the choice to write directly to this file-descriptor using os.write, or use os.fdopen to get a file-object and write to it in the regular way.
Sadly, none of these possibilities works. In the first case (os.write()), I get this:
>>> os.write(os.open("\\\\.\\PhysicalDrive3", os.O_WRONLY|os.O_BINARY), "test")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument
In the second case, I can create a file object with write permissions, but the writing itself fails (well, once I force it to actually happen with .flush()):
>>> g = os.fdopen(os.open("\\\\.\\PhysicalDrive3", os.O_WRONLY|os.O_BINARY), "wb")
>>> g.write("test")
>>> g.flush()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IOError: [Errno 22] Invalid argument
As eryksun and agf pointed out in the comments (though I didn't really get it at first), the solution is rather simple: you have to open the device in rb+ mode, which opens it for updating (as I have now found out) instead of trying to replace it with a new file, which can't work because the "file" is in fact a physical drive.
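A regular temporary file shows the difference between the two modes without touching a disk: 'w' means truncate-or-create, which makes no sense for a physical drive, while 'r+' opens an existing object for reading and writing and leaves its contents alone. This is only an analogy with an ordinary file, not a test of raw-device behaviour; size_after is a hypothetical helper.

```python
import os
import tempfile

def size_after(mode, initial=b"A" * 1024):
    """Create a file with known contents, open it in mode, write nothing, and report its size."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(initial)
        with open(path, mode):
            pass  # just opening is enough to see the difference
        return os.path.getsize(path)
    finally:
        os.remove(path)

print(size_after("rb+"))  # 1024: opened for update, contents kept
print(size_after("wb+"))  # 0: 'w' truncates (or creates) the file first
```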
When writing, you always have to write whole sectors at a time (i.e. multiples of the sector size, usually 512 bytes); otherwise the write fails.
In addition, .seek() can also only move sector-wise. If you try to seek to a position inside a sector (e.g. position 621), the file object will jump to the beginning of the sector containing that position (i.e. to the beginning of the second sector, byte 512).
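Putting these two constraints together, writing a few bytes at an arbitrary offset means a read-modify-write of the whole sectors that contain them. A minimal sketch, assuming 512-byte sectors (some drives use 4096) and a file object already opened in 'rb+' mode; write_at is a hypothetical helper, demonstrated here on an io.BytesIO rather than a real drive.

```python
import io

SECTOR_SIZE = 512  # assumption: 512-byte sectors; some drives use 4096

def write_at(dev, offset, data, sector_size=SECTOR_SIZE):
    """Write data at any byte offset by rewriting the whole, aligned sectors that contain it."""
    start = (offset // sector_size) * sector_size                                 # round down
    end = ((offset + len(data) + sector_size - 1) // sector_size) * sector_size   # round up
    dev.seek(start)
    block = bytearray(dev.read(end - start))       # read the affected sectors
    block.extend(bytes(end - start - len(block)))  # zero-pad if the read came up short
    rel = offset - start
    block[rel:rel + len(data)] = data              # patch in the new bytes
    dev.seek(start)
    dev.write(bytes(block))                        # write whole sectors back
    dev.flush()

# Demonstration on an in-memory "disk" of two sectors.
disk = io.BytesIO(bytes(1024))
write_at(disk, 621, b"test")
print(disk.getvalue()[621:625])  # b'test'

# On a real drive (hypothetical drive number; needs admin rights and overwrites data!):
# with open(r"\\.\PhysicalDrive3", "rb+") as dev:
#     write_at(dev, 621, b"test")
```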
Possibly in Win 7 you have to do something more extreme, such as locking the volume(s) for the disk beforehand with DeviceIoControl(hVol, FSCTL_LOCK_VOLUME, ...)
In Win 7 you don't have to do that; opening and writing with 'rb+' mode works fine.

Python zlib inflate error

I'm trying to inflate a zlib compressed file using Python with this code:
import zlib
data = open("3B42.110531.21.6A.HDF.Z", 'rb').read()
inflated = zlib.decompress(data)
f = open('3B42.110531.21.6A.HDF', 'wb')
f.write(inflated)
f.close()
I've already made several attempts with different options:
Adding a second parameter to zlib.decompress (zlib.decompress(data,-15))
Skipping the first two bytes: zlib.decompress(data[2:-4]) / zlib.decompress(data[2:]) / ..
Base64-encoding the data.
Either way, it keeps failing with this message:
Traceback (most recent call last):
File "C:\opt\Python25\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 325, in RunScript
exec codeObject in __main__.__dict__
File "E:\Tesis\data\uncompress.py", line 6, in <module>
inflated = zlib.decompress(data)
error: Error -3 while decompressing data: incorrect header check
The only difference when using a negative wbits parameter in zlib.decompress is the error message: invalid block type.
import zlib
data = open("3B42.110531.21.6A.HDF.Z", 'rb').read()
inflated = zlib.decompress(data,-15)
f = open('3B42.110531.21.6A.HDF', 'wb')
f.write(inflated)
f.close()
Traceback (most recent call last):
File "C:\opt\Python25\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 325, in RunScript
exec codeObject in __main__.__dict__
File "E:\Tesis\data\uncompress.py", line 6, in <module>
inflated = zlib.decompress(data,-15)
error: Error -3 while decompressing data: invalid block type
I'm sure that the file is not corrupted; I can open it with WinRAR.
(Environment: Windows x64, Python 2.5. I guess the file was created on a Unix machine; I downloaded it in binary mode.)
I've already read the following links
zlib-decompression-in-python
python-inflate and deflate-implementation
.Z indicates an LZC (Unix compress) file. Despite the name similarity, this LZW-based format is different from deflate, the format zlib implements (and that gzip uses).
Try using the command-line compress utility to uncompress the file (your gzip program may also be able to decompress it, since gzip -d can read .Z files).
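You can confirm which format you have from the first two bytes: compress (.Z) files begin with 0x1F 0x9D, gzip files with 0x1F 0x8B, and a zlib stream with a 2-byte header whose first byte has 8 (deflate) in its low nibble and whose 16-bit value is a multiple of 31. Python's zlib module can read the last two (gzip via wbits=16 + zlib.MAX_WBITS) but not the first. A minimal Python 3 sketch; sniff_compression is a hypothetical name.

```python
import gzip
import zlib

def sniff_compression(header: bytes) -> str:
    """Classify data by its leading magic bytes."""
    if header[:2] == b"\x1f\x9d":
        return "compress"  # Unix compress / LZW (.Z): not deflate, zlib can't read it
    if header[:2] == b"\x1f\x8b":
        return "gzip"      # zlib.decompress(data, 16 + zlib.MAX_WBITS) handles this
    if len(header) >= 2 and (header[0] & 0x0F) == 8 and ((header[0] << 8) | header[1]) % 31 == 0:
        return "zlib"      # plain zlib.decompress(data) handles this
    return "unknown"

sample = b"example " * 20
print(sniff_compression(zlib.compress(sample)))  # zlib
print(sniff_compression(gzip.compress(sample)))  # gzip
print(zlib.decompress(gzip.compress(sample), 16 + zlib.MAX_WBITS) == sample)  # True
```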
The '.Z' file extension and the attempts you've made so far suggest that either you're using zlib wrong (though it looks correct according to the links you posted) or the zlib stream doesn't start right at the beginning of the file.
You can run my tool Precomp on the file to detect the position of any zlib streams inside it:
precomp -v -slow 3B42.110531.21.6A.HDF.Z
It should output something like this:
Possible zLib-Stream (slow mode) found at position 85, windowbits = 15
Can be decompressed to 9264 bytes
This will tell you both the position of the stream and the windowbits parameter to use (negated).
It will also tell you whether there are any zlib streams inside the file at all, because, as phihag said, it's possible that the file is compressed with something other than deflate/zlib. Note that in this case there will probably be some misdetections, since the zlib header is only 2 bytes long, but those can be identified because they decompress to fewer than 100 bytes.
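To illustrate the idea (this is not Precomp's actual implementation), a brute-force scan can try to inflate a zlib stream at every offset whose two bytes form a valid zlib header, and report the offset, the window bits implied by the header, and the decompressed size, discarding candidates that inflate to fewer than 100 bytes as suggested above. It is quadratic in the worst case, so treat it as a sketch for small files; find_zlib_streams is a hypothetical name.

```python
import zlib

def find_zlib_streams(data: bytes, min_output: int = 100):
    """Return (offset, windowbits, decompressed_size) for each zlib stream found in data."""
    found = []
    for pos in range(len(data) - 1):
        cmf, flg = data[pos], data[pos + 1]
        # Valid header: method 8 (deflate), window field <= 7, (CMF*256 + FLG) % 31 == 0.
        if (cmf & 0x0F) != 8 or (cmf >> 4) > 7 or ((cmf << 8) | flg) % 31 != 0:
            continue
        d = zlib.decompressobj()
        try:
            out = d.decompress(data[pos:])
        except zlib.error:
            continue  # header looked right, but what follows isn't valid deflate
        if d.eof and len(out) >= min_output:  # complete stream, not a tiny misdetection
            found.append((pos, (cmf >> 4) + 8, len(out)))
    return found

# Demonstration: 85 bytes of junk, then a zlib stream, then trailing bytes.
payload = b"hello world " * 50
blob = bytes(85) + zlib.compress(payload) + b"trailer"
for pos, wbits, size in find_zlib_streams(blob):
    print(pos, wbits, size)
```

As in the Precomp output, the raw deflate data starts two bytes after the reported offset, which is why the window bits are passed negated (e.g. -15) when decompressing from there.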
