Python gnupg encrypt_file(StringIO()) creates encrypted PGP message with empty content? - python

I don't understand what I'm doing wrong here. As far as I can tell from the python-gnupg documentation, this encrypt_file call should produce content just like the encrypt call, but everything I try results in a GPG blob that decrypts to a blank file. No error, no indication that anything went wrong with the operation, just... emptiness, like it was compiled with sartre=1...
Am I missing some key to the incantation?
Python 3.6.5,
python-gnupg==0.4.7
from gnupg import GPG
from io import StringIO
gpg = GPG(gpgbinary='/usr/local/bin/gpg', gnupghome='/tmp/gpg/')
keyserver = <insert keyserver URI>
fp = <insert GPG key fingerprint>
gpg.recv_keys(keyserver, fp)
sio = StringIO('\n'.join([f"Hello world {n}" for n in range(120)]))
c0 = gpg.encrypt(sio.getvalue(), recipients=fp)
len(str(c0))
1252
c1 = gpg.encrypt_file(sio, recipients=fp)
len(str(c1))
858
sio = StringIO('\n'.join([f"Hello world {n}" for n in range(1200)]))
c0 = gpg.encrypt(sio.getvalue(), recipients=fp)
len(str(c0))
4559
c1 = gpg.encrypt_file(sio, recipients=fp)
len(str(c1))
858

Seems the following were the operative issues:
The file-like object has to be binary, e.g. a BytesIO or a file opened in binary mode.
You have to seek(0) so that encrypt_file sees the full content of the file-like object. (Kind of obvious, but I overlooked it; I probably kept checking encrypt() first and then encrypt_file() without seeking back to the start. It didn't help that seeking doesn't make it work on a StringIO, so I may have mentally ruled it out as the issue early on.)
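The seek issue can be reproduced without gpg at all; a minimal sketch (plain io objects, no gnupg involved) showing why the second read saw an empty stream:

```python
from io import BytesIO

# Reading a file-like object moves its position to the end, so whatever
# reads it next (e.g. encrypt_file) sees nothing until you rewind.
buf = BytesIO(b"Hello world\n" * 3)

first = buf.read()   # full 36 bytes
second = buf.read()  # b'' - the stream is already exhausted
buf.seek(0)          # rewind to the start
third = buf.read()   # full content again

print(len(first), len(second), len(third))  # 36 0 36
```

Note the BytesIO rather than StringIO, matching the first point above.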

Related

Python: Symmetric Encryption with GPG and Subprocess

I'm trying to achieve the functionality provided by the following bash command in Python.
echo "$DATA" | gpg --symmetric --armor --batch --passphrase "${KEY}"
So far I've tried to use subprocess but am having difficulty passing in the data. I tried giving it as a command in the list of parameters to send to subprocess, but that just effectively echoes the entire thing.
cmd = f"| gpg --symmetric --armor --batch --passphrase {key}".split()
temp = ["echo", f"\"{data}\""]
temp.extend(cmd)
res = subprocess.run(temp, stdout=subprocess.PIPE, universal_newlines=True)
encrypted = res.stdout.strip()
I'm also interested in using the python-gnupg module but have not yet figured out how to replicate the above with it either.
Thanks in advance for any help!
You can use the input argument to run()/check_output():
from getpass import getpass
import subprocess
key = getpass("KEY: ")
data = b'Symmetric Encryption with GPG and Subprocess'
command = ["gpg", "--symmetric", "--armor", "--batch", "--passphrase", key]
out = subprocess.check_output(command, input=data, universal_newlines=False)
Note that GNU echo will, by default, append a newline. Use echo -n to not print the trailing \n. Either way, you'll want to be careful to mimic this in Python.
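To illustrate, a sketch using cat as a stand-in for gpg (assuming a POSIX system): echo's trailing newline can be mimicked by appending b'\n' to the input bytes yourself.

```python
import subprocess

data = b"Symmetric Encryption with GPG and Subprocess"

# `echo "$DATA" | some-command` sends the data plus a trailing newline;
# with input= you must append that newline yourself to match exactly.
res = subprocess.run(["cat"], input=data + b"\n", stdout=subprocess.PIPE)

print(res.stdout)  # b'Symmetric Encryption with GPG and Subprocess\n'
```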
In case anyone was wondering, I also got the python-gnupg module to work for my application. I am sticking with the subprocess answer since that reduces dependencies but wanted to share this as well.
import gnupg
gpg = gnupg.GPG()
encrypted = str(gpg.encrypt(data, recipients=None, symmetric=True, passphrase=key, extra_args=["--batch"]))
The python-gnupg module has a long history of serious security flaws, many of which it is more likely to be affected by due to the decision to use subprocess to call an external binary executable.
Instead, the recommendation of the GnuPG Project is to use the CPython bindings for the GPGME C API, which ship with the GPGME source code.
import gpg
from getpass import getpass
key = getpass("KEY: ")
c = gpg.Context(armor=True)
data = b"Symmetric encryption with GPGME."
ciphertext, result, sign_result = c.encrypt(data, sign=False, passphrase=key)
with open("some_file.txt.asc", "wb") as f:
    f.write(ciphertext)
Because this uses symmetric encryption, there won't be a digital signature included and there are no recipient keys to check for, which means both result and sign_result will be None. Only ciphertext contains anything, and that's the ASCII-armoured encrypted data, which can be written to a file as above, or you can do something else with it.
The documentation for this (far superior) module is included with the GPGME source, and an online draft version is also available.

Embed filename using python-gnupg

I'm able to embed the original filename using python-gnupg when encrypting a file using below:
filename = 'test.tim'
with open(filename, 'rb') as fh:
    status = gpg.encrypt_file(fh, recipients='somerecipient', output='test.tim.gpg',
                              sign='somesignature', extra_args=['--set-filename', os.path.basename(filename)])
This can be verified by using gpg from the command line:
$ gpg2 --list-packets test.tim.gpg | grep name
I am however unable to preserve the original filename when decrypting the file:
with open(filename, 'rb') as fh:
    status = gpg.decrypt_file(fh, extra_args=['--use-embedded-filename'])
I am aware of the output parameter (which specifies the filename to save the contents to) in the decrypt_file function, but I want to preserve the original filename (which I won't always know).
It seems the decrypt_file function always passes the --decrypt flag to gpg, which always outputs the contents to stdout (unless used in conjunction with the output parameter), as in:
$ gpg --decrypt --use-embedded-filename test.tim.gpg
The following command will decrypt and save the output to the original filename:
$ gpg --use-embedded-filename test.tim.gpg
Any ideas?
Tim
The functionality to do what you want doesn't exist in the original python-gnupg.
There's a modified version by isislovecruft (which is what you get if you pip install gnupg) that adds support for --list-packets via gpg.listpackets, but it still doesn't support --use-embedded-filename.
So my approach, if I were to insist on using Python only, would probably be to start with isislovecruft's version and then subclass GPG like this:
import gnupg
import os
GPGBINARY = os.environ.get('GPGBINARY', 'gpg')
hd = os.path.join(os.getcwd(), 'keys')
class myGPG(gnupg.GPG):
    def decrypt_file_original_name(self, file, always_trust=False, passphrase=None, extra_args=None):
        args = ["--use-embedded-filename"]
        output = calculate_the_file_name_using_list_packets()
        self.set_output_without_confirmation(args, output)
        if always_trust:  # pragma: no cover
            args.append("--always-trust")
        if extra_args:
            args.extend(extra_args)
        result = self.result_map['crypt'](self)
        self._handle_io(args, file, result, passphrase, binary=True)
        # logger.debug('decrypt result: %r', result.data)
        return result
gpg = myGPG(gnupghome=hd, gpgbinary=GPGBINARY)
Bear in mind, at this point it is almost certainly much easier to just use subprocess and run the gpg binary directly, especially as you don't need to capture the output in Python.
Anyway, I got this far and have run out of time for now, so I leave implementing calculate_the_file_name_using_list_packets up to you, if you choose to go the 'pure python' route. Hopefully it is a bit easier now that you have gpg.listpackets. Good luck!

python gnupg decrypt error

I'm new to Python and I want to decrypt a downloadable PGP-encrypted file in Python using the gnupg module (http://pythonhosted.org/python-gnupg/). (I thought simple API calls would be easy, but I have wasted so much time on this that I decided to ask for help.)
So I'm able to download a file from the url in Python, and I tried decrypting it with the Gpg4win software, and that works well. But I get different errors when I try to decrypt it in Python using the gnupg module.
Ideally I would like to download the file from the url, decrypt it, and then store it in a file (rather than downloading the file, saving it, decrypting the file, and saving a new decrypted file).
This is my prototype code:
#test
import urllib2
import gnupg
z='https://abcd_url'
u = urllib2.urlopen(z)
localFile = open('file_haha_test2', 'w+b')
localFile.write(u.read())
gpg = gnupg.GPG()
#gpg.encoding = 'utf-8'
##gpg = gnupg.GPG(gnupghome='C:\\Program Files (x86)\\GNU\\Desktop\\GnuPG',
## gpgbinary='C:\\Program Files (x86)\\GNU\\Desktop\\GnuPG\\gpg.exe',
## keyring='C:\\user\\Desktop\\Encryption keys\\secret-key-73F.asc')
status = str(gpg.decrypt(u.read(), passphrase='hp', output='HAHAHAH.txt'))
#status = str(gpg.decrypt_file(localFile, passphrase='hp',output='HAHAHAH.txt'))
#status = gpg.decrypt_file(localFile)
print status
#localFile.close()
I got different errors for different syntaxes (you can see them commented out above). Currently I'm not getting any output on the screen; I think it should print the contents.
I really want to get this working as quickly as possible and any help will be greatly appreciated.
I assume you are using the library on PyPI as ‘gnupg’.
Have you tried the library on PyPI as ‘python-gnupg’?
The two libraries have quite similar APIs, and both have seen activity in recent years. I don't know which is better in general.

Pycurl: uploading files with filenames in UTF-8

This question is related to this one.
Please read the problem that Chris describes there. I'll narrow it down: there's a CURL error 26 if a filename is UTF-8-encoded and contains characters outside the range supported by the system's non-Unicode programs.
Let me explain myself:
local_filename = filename.encode("utf-8")
self.curl.setopt(self.curl.HTTPPOST, [(field, (self.curl.FORM_FILE, local_filename, self.curl.FORM_FILENAME, local_filename))])
I have Windows 7 with Russian set as the language for non-Unicode programs. If I don't encode the filename to UTF-8 (and pass filename, not local_filename, to pycurl), everything goes flawlessly as long as the filename contains only English or Russian chars. But if there is, say, an à, it throws error 26. If I pass local_filename (encoded to UTF-8), even Russian chars aren't allowed.
Could you help, please? Thanks!
This is easy to answer, harder to fix:
pycurl uses libcurl for formposting. libcurl uses plain fopen() to open files for posting. Therefore you need to tell libcurl the exact file name that it should open and read from your local file system.
Decompose this problem into 2 components:
tell pycurl which file to open to read file data
send filename in correct encoding to the server
These may or may not be same encodings.
For 1, use sys.getfilesystemencoding() to convert the unicode filename (which you correctly use throughout your Python code) to a byte string that pycurl/libcurl can open with fopen(). Use strace (Linux) or the equivalent on Windows or OS X to verify that the correct file path is being opened by pycurl.
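On Python 3, os.fsencode() wraps sys.getfilesystemencoding() for exactly this purpose; a minimal sketch (the filename is made up):

```python
import os

# A unicode filename, as used throughout your Python code.
filename = u"example-\N{EURO SIGN}.mp3"

# Encode it the way the OS expects file names to be encoded, rather than
# hard-coding UTF-8; this is what you'd hand to pycurl/libcurl for fopen().
local_filename = os.fsencode(filename)

# fsdecode() round-trips it back, so no information is lost.
assert os.fsdecode(local_filename) == filename
```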
If that totally fails you can always feed file data stream from Python via pycurl.READFUNCTION.
For 2, learn how the filename is transmitted during a file upload. I don't have a good link; all I know is that it's not trivial, e.g. when it comes to very long file names.
I hacked up your code snippet; I have this, and it works against nc -vl 5050 at least.
#!/usr/bin/python
import pycurl
c = pycurl.Curl()
filename = u"example-\N{EURO SIGN}.mp3"
with open(filename, "wb") as f:
    f.write("\0\xfffoobar\x07\xff" * 9)
local_filename = filename.encode("utf-8")
c.setopt(pycurl.HTTPPOST, [("xxx", (pycurl.FORM_FILE, local_filename, pycurl.FORM_FILENAME, local_filename))])
c.setopt(pycurl.URL, "http://localhost:5050/")
c.setopt(pycurl.HTTPHEADER, ["Expect:"])
c.perform()
My test doesn't cover the case where encoding is different between OS and HTTP.
Should be enough to get you started though, shouldn't it?

How can I work with Gzip files which contain extra data?

I'm writing a script which will work with data coming from instrumentation as gzip streams. In about 90% of cases, the gzip module works perfectly, but some of the streams cause it to produce IOError: Not a gzipped file. If the gzip header is removed and the deflate stream fed directly to zlib, I instead get Error -3 while decompressing data: incorrect header check. After about half a day of banging my head against the wall, I discovered that the streams which are having problems contain a seemingly-random number of extra bytes (which are not part of the gzip data) appended to the end.
It strikes me as odd that Python cannot work with these files for two reasons:
Both Gzip and 7zip are able to open these "padded" files without issue. (Gzip produces the message decompression OK, trailing garbage ignored, 7zip succeeds silently.)
Both the Gzip and Python docs seem to indicate that this should work: (emphasis mine)
Gzip's format.txt:
It must be possible to
detect the end of the compressed data with any compression method,
regardless of the actual size of the compressed data. In particular,
the decompressor must be able to detect and skip extra data appended
to a valid compressed file on a record-oriented file system, or when
the compressed data can only be read from a device in multiples of a
certain block size.
Python's gzip.GzipFile:
Calling a GzipFile object’s close() method does not close fileobj, since you might wish to append more material after the compressed data. This also allows you to pass a StringIO object opened for writing as fileobj, and retrieve the resulting memory buffer using the StringIO object’s getvalue() method.
Python's zlib.Decompress.unused_data:
A string which contains any bytes past the end of the compressed data. That is, this remains "" until the last byte that contains compression data is available. If the whole string turned out to contain compressed data, this is "", the empty string.
The only way to determine where a string of compressed data ends is by actually decompressing it. This means that when compressed data is contained part of a larger file, you can only find the end of it by reading data and feeding it followed by some non-empty string into a decompression object’s decompress() method until the unused_data attribute is no longer the empty string.
Here are the four approaches I've tried. (These examples are Python 3.1, but I've tested 2.5 and 2.7 and had the same problem.)
# approach 1 - gzip.open
with gzip.open(filename) as datafile:
    data = datafile.read()
# approach 2 - gzip.GzipFile
with open(filename, "rb") as gzipfile:
    with gzip.GzipFile(fileobj=gzipfile) as datafile:
        data = datafile.read()
# approach 3 - zlib.decompress
with open(filename, "rb") as gzipfile:
    data = zlib.decompress(gzipfile.read()[10:])
# approach 4 - zlib.decompressobj
with open(filename, "rb") as gzipfile:
    decompressor = zlib.decompressobj()
    data = decompressor.decompress(gzipfile.read()[10:])
Am I doing something wrong?
UPDATE
Okay, while the problem with gzip seems to be a bug in the module, my zlib problems are self-inflicted. ;-)
While digging into gzip.py I realized what I was doing wrong: by default, zlib.decompress et al. expect zlib-wrapped streams, not bare deflate streams. By passing in a negative value for wbits, you can tell zlib to skip the zlib header and decompress the raw stream. Both of these work:
# approach 5 - zlib.decompress with negative wbits
with open(filename, "rb") as gzipfile:
    data = zlib.decompress(gzipfile.read()[10:], -zlib.MAX_WBITS)
# approach 6 - zlib.decompressobj with negative wbits
with open(filename, "rb") as gzipfile:
    decompressor = zlib.decompressobj(-zlib.MAX_WBITS)
    data = decompressor.decompress(gzipfile.read()[10:])
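The unused_data attribute quoted earlier is also the clean way to cope with the instruments' trailing junk; a small self-contained demo (made-up payload):

```python
import zlib

payload = b"instrument reading " * 40
# Simulate a stream with extra non-gzip bytes appended to the end.
blob = zlib.compress(payload) + b"\x00\x07JUNK"

d = zlib.decompressobj()
data = d.decompress(blob)

assert data == payload
print(d.unused_data)  # b'\x00\x07JUNK' - everything past the compressed stream
```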
This is a bug. The quality of the gzip module in Python falls far short of the quality that should be required in the Python standard library.
The problem here is that the gzip module assumes that the file is a stream of gzip-format files. At the end of the compressed data, it starts from scratch, expecting a new gzip header; if it doesn't find one, it raises an exception. This is wrong.
Of course, it is valid to concatenate two gzip files, e.g.:
echo testing > test.txt
gzip test.txt
cat test.txt.gz test.txt.gz > test2.txt.gz
zcat test2.txt.gz
# testing
# testing
The gzip module's error is that it should not raise an exception if there's no gzip header the second time around; it should simply end the file. It should only raise an exception if there's no header the first time.
There's no clean workaround without modifying the gzip module directly; if you want to do that, look at the bottom of the _read method. It should set another flag, e.g. reading_second_block, to tell _read_gzip_header to raise EOFError instead of IOError.
There are other bugs in this module. For example, it seeks unnecessarily, causing it to fail on nonseekable streams, such as network sockets. This gives me very little confidence in this module: a developer who doesn't know that gzip needs to function without seeking is badly unqualified to implement it for the Python standard library.
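Until the module is fixed, one workaround (a sketch for modern Python 3, using wbits=16+zlib.MAX_WBITS so zlib itself parses each gzip header) is to walk the members by hand and stop at the first non-gzip bytes:

```python
import gzip
import zlib

def read_gzip_members(blob):
    """Decompress consecutive gzip members; ignore trailing non-gzip bytes."""
    out = []
    while blob.startswith(b"\x1f\x8b"):       # gzip magic number
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        out.append(d.decompress(blob))
        blob = d.unused_data                  # whatever follows this member
    return b"".join(out)

blob = gzip.compress(b"testing\n") * 2 + b"trailing garbage"
print(read_gzip_members(blob))  # b'testing\ntesting\n'
```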
I had a similar problem in the past. I wrote a new module that works better with streams. You can try that out and see if it works for you.
I had exactly this problem, but none of these answers resolved my issue. So here is what I did to solve it:
#for gzip files
unzipped = zlib.decompress(gzip_data, zlib.MAX_WBITS|16)
#for zlib files
unzipped = zlib.decompress(gzip_data, zlib.MAX_WBITS)
#automatic header detection (zlib or gzip):
unzipped = zlib.decompress(gzip_data, zlib.MAX_WBITS|32)
Depending on your case, it might be necessary to decode your data, like:
unzipped = unzipped.decode()
https://docs.python.org/3/library/zlib.html
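A quick self-contained check of the three variants (made-up payload):

```python
import gzip
import zlib

payload = b"wbits demo"
gz = gzip.compress(payload)   # gzip-wrapped deflate stream
zl = zlib.compress(payload)   # zlib-wrapped deflate stream

# 16+MAX_WBITS accepts only gzip, MAX_WBITS only zlib, 32+MAX_WBITS either.
assert zlib.decompress(gz, zlib.MAX_WBITS | 16) == payload
assert zlib.decompress(zl, zlib.MAX_WBITS) == payload
assert zlib.decompress(gz, zlib.MAX_WBITS | 32) == payload
assert zlib.decompress(zl, zlib.MAX_WBITS | 32) == payload
```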
I couldn't make it work with the techniques mentioned above, so I made a workaround using the zipfile package:
import zipfile
from io import BytesIO
mock_file = BytesIO(data) #data is the compressed string
z = zipfile.ZipFile(file = mock_file)
neat_data = z.read(z.namelist()[0])
Works perfectly.
