Protobuf Decode Error when reading bytes from within Bitbucket


I have a protobuf file saved as bytes on Windows.
I am reading it as follows:
tlog = tlog_schema_pb2.TLog()
with open("tests/unittests/data/tlog.proto", "rb") as f:
    tlog.ParseFromString(f.read())
It all works fine locally. But when I push my changes with git to Bitbucket, I get an error there:
google.protobuf.message.DecodeError: Error parsing message with type 'globusdigital.tlogprocessing.TLog'
I cannot understand what the cause could be. What can change when I push to Bitbucket?

It turned out that the problem was in pre-commit.
I had the trailing-whitespace check enabled there, and it was corrupting the protobuf file.
To solve it, just add
files: '\.pyi?$'
at the end of .pre-commit-config.yaml, so that the hooks only run on Python files (it would be nicer to exclude specific files, but I did not figure out how).
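To see why a whitespace fixer can break a serialized message, here is a minimal sketch. The strip_trailing_whitespace helper is hypothetical and only approximates what such a hook does, and the payload bytes are made up for illustration (they are not a real TLog message). The point is just that bytes which happen to look like spaces or tabs before a newline are data in a binary file, so removing them changes the message.

```python
def strip_trailing_whitespace(data):
    """Hypothetical stand-in for a trailing-whitespace fixer.

    Strips spaces, tabs and carriage returns at the end of every
    newline-separated chunk, as a text-oriented tool might do.
    """
    lines = data.split(b"\n")
    return b"\n".join(line.rstrip(b" \t\r") for line in lines)


# Made-up binary payload containing bytes that look like trailing whitespace.
payload = b"\x08\x96\x01\x12\x03ab \n\x1a\x02 \t"
cleaned = strip_trailing_whitespace(payload)

print(len(payload), len(cleaned))
print(cleaned == payload)
```

A length-prefixed format such as protobuf no longer parses once bytes go missing, which matches the DecodeError that only shows up after the hook has touched the file.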

Related

Python-GNUPG encrypted file cannot be decrypted with private key

I am trying to encrypt a text file in Python 3.6 using python-gnupg, and a public key provided by a client, for which they have a private key to decrypt it with. I don't have access to that key. Despite python-gnupg appearing to successfully encrypt the file (though with some confusing errors appearing in the log), the client is unable to decrypt it. We're told the error they're getting is gpg: decryption failed: No secret key
When we tested encrypting a file using Cryptophane (different computer, running Windows instead of Ubuntu) and the same public key, they were able to decrypt it. This is how the encryption was successfully done manually for months. When testing the same code with our company public key, we were able to decrypt it using our private key and Cryptophane.
I've googled extensively for the error messages and general problem, and haven't found anything that seemed to be the same problem getting solved.
Here's the relevant code. filepath is the relative path to the file to be encrypted. pgp_key_name is the name of the .asc file containing the public key. pgp_key_dir is the directory it's in.
def pgp_encrypt_file(filepath, pgp_key_name, pgp_key_dir):
    gpg = gnupg.GPG()
    output_full_filepath = filepath + '.pgp'
    try:
        with open(pgp_key_dir + pgp_key_name) as file:
            key_data = file.read()
        import_result = gpg.import_keys(key_data)
        logger.info(msg='Public key imported: {}'.format(pgp_key_name))
        public_keys = gpg.list_keys()
        fingerprint = public_keys[0]['fingerprint']
        logger.info(msg='Attempting to encrypt file: ' +
                    output_full_filepath)
        with open(filepath, 'r') as f:
            newfile = f.read()
        status = gpg.encrypt(newfile, fingerprint,
                             output=output_full_filepath)
        logger.info(msg='status.ok : ' + str(status.ok))
        logger.info(msg='status.status : ' + str(status.status))
    except FileNotFoundError as e:
        logger.error(msg='File not found: ' + str(e))
    except TypeError as e:
        logger.error(msg='GNUPG TypeError: ' + str(e))
    return output_full_filepath
And the relevant section of the logs:
03-01 15:18:58 gnupg INFO Setting homedir to
'/home/[user]/.config/python-gnupg'
03-01 15:18:58 gnupg ERROR Could neither invoke nor terminate a
gpg process... Are you sure you specified the corrent (and full) path to the
gpg binary?
(That error did NOT appear later, and I was unable to find anything relevant on Google or Stack Overflow for it.)
03-04 09:04:39 gnupg WARNING Ignoring '/usr/bin/gpg' (path is a symlink)
03-04 09:04:39 gnupg ERROR Could not find binary for 'gpg'.
03-04 09:04:39 gnupg INFO Setting homedir to
'/home/[user]/.config/python-gnupg'
03-04 09:04:39 gnupg INFO
Initialised settings:
binary: /usr/bin/gpg2
binary version: `2.0.14\ncfg:pubkey:1;16;17\ncfg:cipher:2;3;4;7;8;9;10;11;12;13\ncfg:ciphername:3DES;CAST5;BLOWFISH;AES;AES192;AES256;TWOFISH;CAMELLIA128;CAMELLIA192;CAMELLIA256\ncfg:digest:1;2;3;8;9;10;11\ncfg:digestname:MD5;SHA1;RIPEMD160;SHA256;SHA384;SHA512;SHA224\ncfg:compress:0;1;2;3\n'
homedir: /home/[user]/.config/python-gnupg
ignore_homedir_permissions: False
keyring: /home/[user]/.config/python-gnupg/pubring.gpg
secring: /home/[user]/.config/python-gnupg/secring.gpg
default_preference_list: SHA512 SHA384 SHA256 AES256 CAMELLIA256 TWOFISH
AES192 ZLIB ZIP Uncompressed
keyserver: hkp://wwwkeys.pgp.net
options: None
verbose: False
use_agent: False
03-04 09:04:39 gnupg INFO Importing: [first few lines of public key]
03-04 09:04:39 root INFO Public key imported: [name of key]
03-04 09:04:39 root INFO Attempting to encrypt file: [file]
03-04 09:04:39 gnupg INFO Writing encrypted output to file:
[file.pgp]
03-04 09:04:39 gnupg INFO Encrypted output written successfully.
Some thoughts and things we've tried:
Though there is a gpg binary in /usr/bin/gpg, we're using a conda virtual environment for the project itself, which I think may be messing this up. However, when I ran this code from the command line, with the environment deactivated, I ended up with the same result.
I see that the log file says that it couldn't find the gpg binary, and that it's ignoring a symlink pointing to it, but all of its status messages thereafter seemed to indicate that the encryption was fine, and again, it worked just fine multiple times with a different public/private key pair.
Examining the gpg object in the IDE once instantiated leads me to think that it found the gpg binary just fine, even without passing any parameters to gnupg.GPG(). Passing in gnupghome='/usr/bin/gpg' leads me to the same place, and passing in gnupghome='not/real/path' throws an error.
Setting armor=False on the call to encrypt did not change anything.
I really appreciate any and all thoughts on the matter.
If the answer is that it's just not looking in the right directories for the gpg binary or homedir, due to our virtual environment settings, recommendations on how to work around that would also be appreciated.

Resolved.
In this case, it was the client's error. We later attempted to encrypt the file using a variety of slightly different options, including many done from the command line, and from Python.
They were able to decrypt every single one.
For the sake of helping some others down the line, here are a few things that I've learned since starting on this journey:
There are two distinct packages both named python-gnupg.
The original one (from what I understand): https://pythonhosted.org/python-gnupg/
And a fork of it: https://github.com/isislovecruft/python-gnupg
Since these packages share a name, it is very confusing when googling errors in one or the other. Doing pip install python-gnupg seems to always download the second one. My experience is almost entirely with this second one, so keep that in mind when reading everything else in this post.
On CentOS 6, /usr/bin/gpg is a symlink that points to /usr/bin/gpg2. Python-GNUPG logs errors noting this, but then it seems to find /usr/bin/gpg2 just fine.
Regarding the error Could neither invoke nor terminate a gpg process...: While this concerned me, this also appears to have had no effect at all on any functionality. Your mileage may vary.
Compatibility issues are possible between the Python-GNUPG version and gpg binary version. This can lead to Unknown status message: [SOME-GPG-MESSAGE] errors; for example: Unknown status message: PINENTRY_LAUNCHED which I believe arises when gpg tries to bring up the passphrase prompt (which it does not do in older versions!). If you are NOT trying to make a module with different uses on different OSes (we were), you can try your luck with manually editing the python-gnupg source code once you pip install the package. Specifically, in pretty_bad_protocol._parsers.py in the _handle_status method, there is a tuple of known status messages; just add in any of the "unknown" status messages there, and that error won't trip in the future. I mean, you're on your own after that, but it was something that we tried and it doesn't appear to have harmed anything.
Best of luck to anyone trying to do pgp encryption in the future.

Thanks for providing all the details.
I fixed that issue by creating the GPG object like this:
gpg = gnupg.GPG(binary='/usr/bin/gpg2', homedir='/tmp')

Install virtualenv fails because an HTML is downloaded not a tar.gz

I follow instructions here:
However when I try to tar xvfz:
gzip: stdin: not in gzip format tar: Child returned status 1 tar:
Error is not recoverable: exiting now
this happens because apparently, I have downloaded an HTML:
file virtualenv-16.0.tar.gz
gives:
virtualenv-16.0.tar.gz: HTML document, ASCII text, with no line
terminators
I guess the problem could be in some settings on my machine, but have no idea which ones.
Thanks for any help.

This is because the file URL mentioned in the instructions you followed is invalid, so it returns the following HTML instead of the expected tar.gz file:
<html><head><title>301 Moved Permanently</title></head><body><center><h1>301 Moved Permanently</h1></center></body></html>
Solution
Try the URL below:
https://files.pythonhosted.org/packages/33/bc/fa0b5347139cd9564f0d44ebd2b147ac97c36b2403943dbee8a25fd74012/virtualenv-16.0.0.tar.gz
You can find it listed here:
https://pypi.org/project/virtualenv/#files
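If you want to catch this kind of problem before running tar, a quick check of the first two bytes is enough: every gzip stream starts with 0x1f 0x8b. The sketch below is only an illustration; the helper names is_gzip_bytes and is_gzip_file are made up, and the sample HTML is a shortened version of the redirect page shown above.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip_bytes(data):
    """Return True if `data` starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def is_gzip_file(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as f:
        return is_gzip_bytes(f.read(2))


# A real gzip payload versus an HTML redirect page saved under a .tar.gz name.
archive = gzip.compress(b"payload")
html = b"<html><head><title>301 Moved Permanently</title></head></html>"

print(is_gzip_bytes(archive))
print(is_gzip_bytes(html))
```

Calling is_gzip_file("virtualenv-16.0.tar.gz") on the downloaded file would tell you the same thing the file command did.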

create a new odt-file with python

I have a folder "myfolder" that was created when I unzipped an odt file. I deleted content.xml from the folder. Now I want to add a file called "content.xml" with data in it (the variable "content" holds the XML text). I tried this:
with zipfile.ZipFile('myfolder', mode='a', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('content.xml', content)
I get an odt file, but it is damaged. When I unzip it, there is only content.xml in it. The mode parameter is 'a', so I thought it would append content.xml to the other files.
Can anybody help?

You can try the odfpy package.
You can see some info here and here
Odfpy is a library to read and write OpenDocument v. 1.2 files. The
main focus has been to prevent the programmer from creating invalid
documents. It has checks that raise an exception if the programmer
adds an invalid element, adds an attribute unknown to the grammar,
forgets to add a required attribute or adds text to an element that
doesn’t allow it.
It's on PyPI, so you can install it with pip:
pip install odfpy
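If you would rather stay with zipfile as in the question, note that ZipFile cannot append to a folder: it needs an archive file, so the unpacked files have to be zipped again. Below is a rough sketch of that, not a tested recipe for every document. The function name build_odt and its parameters are made up for this example. It relies on the OpenDocument packaging convention that the "mimetype" entry comes first and is stored uncompressed, and it assumes the output path lies outside the folder being zipped.

```python
import os
import zipfile


def build_odt(folder, content, odt_path):
    """Zip an unpacked ODT `folder` into `odt_path`, using `content` as content.xml.

    Assumes `odt_path` is not located inside `folder`.
    """
    with zipfile.ZipFile(odt_path, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        mimetype_path = os.path.join(folder, "mimetype")
        if os.path.isfile(mimetype_path):
            # By convention "mimetype" is the first entry and is not compressed.
            zf.write(mimetype_path, "mimetype", compress_type=zipfile.ZIP_STORED)
        for root, _dirs, files in os.walk(folder):
            for name in files:
                full_path = os.path.join(root, name)
                arcname = os.path.relpath(full_path, folder).replace(os.sep, "/")
                if arcname in ("mimetype", "content.xml"):
                    continue  # mimetype is already written, content.xml is replaced
                zf.write(full_path, arcname)
        # Add the new content.xml from the in-memory string.
        zf.writestr("content.xml", content)
```

With the names from the question this would be called as build_odt('myfolder', content, 'new.odt'). Whether the result opens cleanly still depends on content.xml being valid for the rest of the package.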

Python cannot read "warc.gz" file completely

For my work, I scrape web-sites and write them to gzipped web-archives (with extension "warc.gz"). I use Python 2.7.11 and the warc 0.2.1 library.
I noticed that for the majority of files I cannot read them completely with the warc library. For example, if the warc.gz file has 517 records, I can read only about 200 of them.
After some research I found out that this problem happens only with the gzipped files. The files with extension "warc" do not have this problem.
I have found out that some other people have this problem as well (https://github.com/internetarchive/warc/issues/21), but no solution for it has been found.
I guess that there might be a bug in "gzip" in Python 2.7.11. Does anyone have experience with this, and know what can be done about this problem?
Thanks in advance!
Example:
I create new warc.gz files like this:
import warc
# raw string, so that "\f" in the path is not read as a form-feed escape
warc_path = r"\\some_path\file_name.warc.gz"
warc_file = warc.open(warc_path, "wb")
To write records I use:
record = warc.WARCRecord(payload=value, headers=headers)
warc_file.write_record(record)
This creates perfect "warc.gz" files. There are no problems with them. All, including "\r\n" is correct. But the problem starts when I read these files.
To read files I use:
warc_file = warc.open(warc_path, "rb")
To loop through records I use:
for record in warc_file:
    ...
The problem is that not all records are found during this looping for "warc.gz" file, while they all are found for "warc" files. Working with both types of files is addressed in the warc-library itself.

It seems that the custom gzip handling in warc.gzip2.GzipFile, file splitting with warc.utils.FilePart and reading in warc.warc.WARCReader is broken as a whole (tested with python 2.7.9, 2.7.10 and 2.7.11). It stops short when it receives no data instead of a new header.
It would seem that the basic stdlib gzip handles the concatenated files just fine, so this should work as well:
import gzip
import warc
with gzip.open('my_test_file.warc.gz', mode='rb') as gzf:
    for record in warc.WARCFile(fileobj=gzf):
        print record.payload.read()
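The claim about concatenated gzip members is easy to check without the warc library. This small sketch is written for Python 3 (unlike the Python 2 snippet above) and uses made-up payloads in place of real WARC records: each payload is compressed as its own gzip member, the members are written back to back, and the stdlib reader returns all of them in one pass.

```python
import gzip
import io

# Three stand-in "records", each compressed as a separate gzip member.
payloads = [b"record-1\r\n", b"record-2\r\n", b"record-3\r\n"]

buf = io.BytesIO()
for payload in payloads:
    buf.write(gzip.compress(payload))
buf.seek(0)

# The stdlib reader continues past each member boundary.
with gzip.GzipFile(fileobj=buf, mode="rb") as gzf:
    data = gzf.read()

print(data == b"".join(payloads))
```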

Winzip cannot open an archive created by python shutil.make_archive on windows. On ubuntu archive manager does fine

I am trying to return a zip file in django http response, the code goes something like...
archive = shutil.make_archive('testfolder', 'zip', MEDIA_ROOT, 'testfolder')
response = HttpResponse(FileWrapper(open(archive)),
                        content_type=mimetypes.guess_type(archive)[0])
response['Content-Length'] = getsize(archive)
response['Content-Disposition'] = "attachment; filename=test %s.zip" % datetime.now()
return response
Now, when this code is executed on Ubuntu, the resulting downloaded file opens without any issue, but when it's executed on Windows, the file created does not open in WinZip (it gives the error 'Unsupported Zip Format').
Is there something very obvious I am missing here? Isn't python code supposed to be portable?
EDIT:
Thanks to J.F. Sebastian for his comment...
There was no problem in creating the archive, it was reading it back into the request. So, the solution is to change second line of my code from,
response = HttpResponse(FileWrapper(open(archive)),
content_type=mimetypes.guess_type(archive)[0])
to,
response = HttpResponse(FileWrapper(open(archive, 'rb')), # notice extra 'rb'
content_type=mimetypes.guess_type(archive)[0])
checkout, my answer to this question for more details...

The code you have written should work correctly. I've just run the following line from your snippet to generate a zip file and was able to extract on Linux and Windows.
archive = shutil.make_archive('testfolder', 'zip', MEDIA_ROOT, 'testfolder')
There is something funny and specific going on. I recommend you check the following:
Generate the zip file outside of Django with a script that contains just that one line. Then try to extract it on a Windows machine. This will help you rule out anything related to Django, the web server, or the browser.
If that works, then look at exactly what is in the folder you compressed. Do the files have any funny characters in their names? Are there strange file types or very long filenames?
Run an md5 checksum on the zip file on Windows and Linux to make absolutely sure that the two files are byte-for-byte identical. This rules out any file corruption that might have occurred.
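For the checksum step, a few lines of Python give the same result on both systems, so you do not need separate tools. This is just a sketch; the function name md5_of_file and the chunk size are arbitrary choices for the example.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in binary mode."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large archives do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Run it on the archive on each machine, for example md5_of_file('testfolder.zip'), and compare the two strings.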

Thanks to J.F. Sebastian for his comment...
I'll still write the solution here in detail...
There was no problem in creating the archive, it was reading it back into the request. So, the solution is to change second line of my code from,
response = HttpResponse(FileWrapper(open(archive)),
content_type=mimetypes.guess_type(archive)[0])
to,
response = HttpResponse(FileWrapper(open(archive, 'rb')), # notice extra 'rb'
content_type=mimetypes.guess_type(archive)[0])
because apparently, hidden somewhere in python 2.3 documentation on open:
The most commonly-used values of mode are 'r' for reading, 'w' for
writing (truncating the file if it already exists), and 'a' for
appending (which on some Unix systems means that all writes append to
the end of the file regardless of the current seek position). If mode
is omitted, it defaults to 'r'. The default is to use text mode, which
may convert '\n' characters to a platform-specific representation on
writing and back on reading. Thus, when opening a binary file, you
should append 'b' to the mode value to open the file in binary mode,
which will improve portability. (Appending 'b' is useful even on
systems that don’t treat binary and text files differently, where it
serves as documentation.) See below for more possible values of mode.
So, in simple terms, when reading binary files, using open(file, 'rb') increases the portability of your code (it certainly did in this case).
Now it extracts without trouble on Windows...
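Here is a small way to see the effect for yourself. It is a Python 3 sketch rather than the Python 2 setup from the question, and the bytes are made up to resemble the start of a zip file. In Python 3 the default text mode translates "\r\n" to "\n" when reading on every platform, so the demo behaves the same on Ubuntu and Windows; latin-1 is used only so that every byte can be decoded.

```python
import os
import tempfile

# Made-up binary data that happens to contain "\r\n" byte pairs.
raw = b"PK\x03\x04\r\n\x00data\r\n"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

# Binary mode returns the bytes exactly as stored.
with open(path, "rb") as f:
    from_binary = f.read()

# Text mode applies newline translation before we see the data.
with open(path, "r", encoding="latin-1") as f:
    from_text = f.read().encode("latin-1")

os.remove(path)

print(from_binary == raw)
print(from_text == raw)
```

The text-mode copy has lost bytes, which is enough to make an archive unreadable.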
