I want to install some Python packages using pip but can't, as every file downloaded produces the same hash, which then fails comparison in pip's security check.
After playing around, I see that every file I download using curl from files.pythonhosted.org hashes to the same value. I've tested this with a Python script like so:
curl http://files.pythonhosted.org/packages/1a/80/b06ce333aabba7ab1b6a41ea3c4e46970ceb396e705733480a2d47a7f74b/Django-4.0.3-py3-none-any.whl -o django.whl
import hashlib

hasher = hashlib.sha256()
BLOCKSIZE = 65536

def hash_stuff(file):
    with open(file, 'rb') as afile:
        buf = afile.read(BLOCKSIZE)
        while len(buf) > 0:
            hasher.update(buf)
            buf = afile.read(BLOCKSIZE)
    print(hasher.hexdigest())

hash_stuff("pynvim.tar.gz")
hash_stuff("opencv.tar.gz")
hash_stuff("django.whl")
which outputs:
➜ ~ python pythonhash.py
c77ab57a36e39ce205ca2327a3edd10399f4d78a3be91e80d845a1b97c29b7d6
ea75572349ed10da0f3224398737fd08352ae10e6f3c571345feb971e080a276
9e31adaf584633587df90d7be36e2fb287c7344eaa4bb23d619f4bdaa19a67d0
If I modify the order of the hash_stuff calls like so (note the ordering is different):
hash_stuff("django.whl")
hash_stuff("opencv.tar.gz")
hash_stuff("pynvim.tar.gz")
the output does not change!
➜ ~ python pythonhash.py
c77ab57a36e39ce205ca2327a3edd10399f4d78a3be91e80d845a1b97c29b7d6
ea75572349ed10da0f3224398737fd08352ae10e6f3c571345feb971e080a276
9e31adaf584633587df90d7be36e2fb287c7344eaa4bb23d619f4bdaa19a67d0
If I reset the hasher object inside the function, I get the first hash (c77ab57...) three times:

def hash_stuff(file):
    hasher = hashlib.sha256()
    BLOCKSIZE = 65536
    with open(file, 'rb') as afile:
        buf = afile.read(BLOCKSIZE)
        while len(buf) > 0:
            hasher.update(buf)
            buf = afile.read(BLOCKSIZE)
    print(hasher.hexdigest())
➜ ~ python pythonhash.py
c77ab57a36e39ce205ca2327a3edd10399f4d78a3be91e80d845a1b97c29b7d6
c77ab57a36e39ce205ca2327a3edd10399f4d78a3be91e80d845a1b97c29b7d6
c77ab57a36e39ce205ca2327a3edd10399f4d78a3be91e80d845a1b97c29b7d6
I've written the same test in Ruby and am getting the same results:
require 'digest'
puts Digest::SHA256.hexdigest File.read "django.whl"
puts Digest::SHA256.hexdigest File.read "opencv.tar.gz"
puts Digest::SHA256.hexdigest File.read "pynvim.tar.gz"
As a sanity check, I've tested hashing some local files and they produce the same hash consistently, regardless of ordering.
How can the order of execution affect the hash?
Erm, files.pythonhosted.org doesn't even have a proper SSL certificate... can I even trust this host?
What could I possibly be doing wrong?
Turns out my internet provider was blocking the content because files.pythonhosted.org presented a self-signed SSL certificate.
The reason the hash was the same for all files was that I was downloading the same HTML error page every time (doh..). Thanks #jasonharper for the pointer!
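In hindsight, a quick sanity check on the downloaded bytes would have exposed the problem immediately. A minimal sketch (the 512-byte sniff window is an arbitrary choice, and the fresh-hasher-per-file fix is included):

```python
import hashlib

def looks_like_html(path):
    # Heuristic: a download that begins with an HTML tag is probably
    # an error or block page, not the package you asked for.
    with open(path, 'rb') as f:
        head = f.read(512).lstrip().lower()
    return head.startswith((b'<!doctype', b'<html'))

def sha256_file(path, blocksize=65536):
    # A fresh hasher per file, unlike the shared-hasher script above.
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(blocksize), b''):
            h.update(chunk)
    return h.hexdigest()
```

Running looks_like_html on each download before hashing would have flagged all three files as the same block page.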
The PHP code is
'''
$input_file = "a.txt";
$source = file_get_contents($input_file);
$source = gzcompress($source);
file_put_contents("php.txt", $source);
'''
The Python code is
'''
import zlib

testFile = "a.txt"
content = None
with open(testFile, "rb") as f:
    content = f.read()
outContent = zlib.compress(content)
with open("py.txt", "wb") as f:
    f.write(outContent)
'''
The Python 3 version is 3.6.9.
The PHP version is 7.2.17.
I need the same compressed output so that the md5 checksums match.
The problem is not in PHP or Python, but rather in your "need". You cannot expect to get the same result, unless the two environments happen to be using the same version of the same compression code with the same settings. Since you do not have control of the version of code being used, your "need" can never be guaranteed to be met.
You should instead be doing your md5 on the decompressed data, not the compressed data.
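To illustrate that advice with a runnable sketch: two compressors at different settings may emit different byte streams, but the md5 of the decompressed data is identical either way:

```python
import hashlib
import zlib

data = b"the quick brown fox jumps over the lazy dog" * 100

# Different settings can (and often do) yield different compressed streams...
fast = zlib.compress(data, 1)
best = zlib.compress(data, 9)

# ...but the md5 of the *decompressed* bytes is always the same.
md5_fast = hashlib.md5(zlib.decompress(fast)).hexdigest()
md5_best = hashlib.md5(zlib.decompress(best)).hexdigest()
```

Comparing checksums of the decompressed data sidesteps the whole problem of matching compressor implementations across languages.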
I found the solution.
The code is

compress = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, 15, 9)
outContent = compress.compress(content)
outContent += compress.flush()

Python's zlib provides the interface zlib.compressobj, which returns a compression object; its parameters (compression level, method, window bits, memory level, strategy) determine the resulting stream.
You can adjust the parameters to make sure Python's result is the same as PHP's.
I'm able to embed the original filename using python-gnupg when encrypting a file using below:
filename = 'test.tim'
with open(filename, 'rb') as fh:
    status = gpg.encrypt_file(fh, recipients='somerecipient', output='test.tim.gpg',
                              sign='somesignature', extra_args=['--set-filename', os.path.basename(filename)])
This can be verified by using gpg from the command line:
$ gpg2 --list-packets test.tim.gpg | grep name
I am however unable to preserve the original filename when decrypting the file:
with open(filename, 'rb') as fh:
    status = gpg.decrypt_file(fh, extra_args=['--use-embedded-filename'])
I am aware of the output parameter (which specifies the filename to save the contents to) in the decrypt_file function, but I want to preserve the original filename (which I won't always know).
It seems the decrypt_file function always passes the --decrypt flag to gpg, which always outputs the contents to stdout (unless used in conjunction with the output parameter), as in:
$ gpg --decrypt --use-embedded-filename test.tim.gpg
Below command will decrypt and save output to original filename:
$ gpg --use-embedded-filename test.tim.gpg
Any ideas?
Tim
The functionality to do what you want doesn't exist in the original python-gnupg.
There's a modified version here by isislovecruft (which is what you get if you pip install gnupg) that adds support for --list-packets with gpg.listpackets, but still doesn't support --use-embedded-filename.
So my approach, if I were to insist on using python only, would probably be to start with isislovecruft's version and then subclass GPG like this:
import gnupg
import os

GPGBINARY = os.environ.get('GPGBINARY', 'gpg')
hd = os.path.join(os.getcwd(), 'keys')

class myGPG(gnupg.GPG):
    def decrypt_file_original_name(self, file, always_trust=False, passphrase=None, extra_args=None):
        args = ["--use-embedded-filename"]
        output = calculate_the_file_name_using_list_packets()
        self.set_output_without_confirmation(args, output)
        if always_trust:  # pragma: no cover
            args.append("--always-trust")
        if extra_args:
            args.extend(extra_args)
        result = self.result_map['crypt'](self)
        self._handle_io(args, file, result, passphrase, binary=True)
        # logger.debug('decrypt result: %r', result.data)
        return result

gpg = myGPG(gnupghome=hd, gpgbinary=GPGBINARY)
Bear in mind, at this point it is almost certainly much easier to just use subprocess and run the gpg binary directly from the shell, especially as you don't care about the output.
Anyway, I got this far and have run out of time for now, so I leave implementing calculate_the_file_name_using_list_packets up to you, if you choose to go the 'pure Python' route. Hopefully it is a bit easier now that you have gpg.listpackets. Good luck!
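For reference, the subprocess route mentioned above could be as small as this (the argv-builder split is just for testability; --batch and --yes are standard gpg flags to suppress interactive prompts):

```python
import subprocess

def build_decrypt_argv(encrypted_path, gpg_binary="gpg"):
    # gpg invoked without --decrypt but with --use-embedded-filename
    # writes the output to the filename stored in the literal data packet.
    return [gpg_binary, "--batch", "--yes", "--use-embedded-filename", encrypted_path]

def decrypt_with_embedded_name(encrypted_path, gpg_binary="gpg"):
    # Runs gpg and lets it restore the embedded filename itself.
    return subprocess.run(build_decrypt_argv(encrypted_path, gpg_binary),
                          capture_output=True, text=True)
```

Since the whole point is that gpg already knows how to restore the name, shelling out avoids re-implementing the packet parsing in Python.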
A way of checking whether a file has been modified is to calculate and store a hash (or checksum) for the file. Then at any point the hash can be re-calculated and compared against the stored value.
I'm wondering if there is a way to store the hash of a file in the file itself? I'm thinking text files.
The algorithm to calculate the hash would have to be iterative and take into account that the hash will be appended to the very file it is being calculated for... makes sense? Is anything like this available?
Thanks!
edit:
https://security.stackexchange.com/questions/3851/can-a-file-contain-its-md5sum-inside-it
from Crypto.Hash import HMAC

secret_key = "Don't tell anyone"
h = HMAC.new(secret_key)
text = "whatever you want in the file"
## or: text = open("your_file_without_hash_yet").read()
h.update(text)
with open("file_with_hash", "w") as fh:
    fh.write(text)
    fh.write(h.hexdigest())
Now, as some people tried to point out, though they seemed confused - you need to remember that this file has the hash on the end of it and that the hash is itself not part of what gets hashed. So when you want to check the file, you would do something along the lines of:
end_len = len(h.hexdigest())
all_text = open("file_with_hash").read()
text, expected_hmac = all_text[:-end_len], all_text[-end_len:]

h = HMAC.new(secret_key)
h.update(text)
if h.hexdigest() != expected_hmac:
    raise ValueError("Somebody messed with your file!")
It should be clear though that this alone doesn't ensure your file hasn't been changed; the typical use case is to encrypt your file, but take the hash of the plaintext. That way, if someone changes the hash (at the end of the file) or tries changing any of the characters in the message (the encrypted portion), things will mismatch and you will know something was changed.
A malicious actor won't be able to change the file AND fix the hash to match because they would need to change some data, and then rehash everything with your private key. So long as no one knows your private key, they won't know how to recreate the correct hash.
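The same append-and-verify scheme can be written with just the standard library's hmac and hashlib modules; a self-contained sketch:

```python
import hashlib
import hmac

SECRET = b"Don't tell anyone"
MAC_LEN = hashlib.sha256().digest_size * 2  # length of a hexdigest in characters

def append_mac(text):
    # Returns the text (bytes) with its keyed hash appended.
    mac = hmac.new(SECRET, text, hashlib.sha256).hexdigest().encode()
    return text + mac

def verify(blob):
    # Split the trailing MAC off and recompute it over the rest.
    text, mac = blob[:-MAC_LEN], blob[-MAC_LEN:]
    expected = hmac.new(SECRET, text, hashlib.sha256).hexdigest().encode()
    return hmac.compare_digest(mac, expected)
```

hmac.compare_digest is used instead of != to avoid leaking timing information during comparison.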
This is an interesting question. You can do it if you adopt a proper convention for hashing and verifying the integrity of the files. Suppose you have this file, namely, main.py:
#!/usr/bin/env python
# encoding: utf-8
print "hello world"
Now, you could append an SHA-1 hash to the python file as a comment:
(printf '#'; cat main.py | sha1sum) >> main.py
Updated main.py:
#!/usr/bin/env python
# encoding: utf-8
print "hello world"
#30e3b19d4815ff5b5eca3a754d438dceab9e8814 -
Hence, to verify if the file was modified you can do this in Bash:
if [ "$(printf '#';head -n-1 main.py | sha1sum)" == "$(tail -n1 main.py)" ]
then
echo "Unmodified"
else
echo "Modified"
fi
Of course, someone could try to fool you by changing the hash string manually. To stop these bad guys, you can improve the system by mixing a secret string into the data being hashed before adding the hash to the last line.
Improved version
Add the hash in the last line including your secret string:
(printf '#'; (cat main.py; echo 'MyUltraSecretTemperString12345') | sha1sum) >> main.py
For checking if the file was modified:
if [ "$(printf '#';(head -n-1 main.py; echo 'MyUltraSecretTemperString12345') | sha1sum)" == "$(tail -n1 main.py)" ]
then
echo "Unmodified"
else
echo "Modified"
fi
Using this improved version, the bad guys can only fool you if they find your ultra-secret key first.
EDIT: This is a rough implementation of the keyed-hash message authentication code (HMAC).
Well, although it looks like a strange idea, it could be an application of a little-used but very powerful property of the Windows NTFS file system: file streams.
It allows adding many streams to a file without changing the content of the default stream. For example:
echo foo > foo.text
echo bar > foo.text:alt
type foo.text
=> foo
more < foo.text:alt
=> bar
But when listing the directory, you can only see one single file: foo.text
So in your use case, you could write the hash of main stream in stream named hash, and later compare the content of the hash stream with the hash of the main stream.
Just a remark: for a reason I do not know, type foo.text:alt generates the following error:
"The filename, directory name, or volume label syntax is incorrect."
that's why my example uses more < instead, as recommended on the Using streams page on MSDN.
So assuming you have a myhash function that gives the hash for a file (you can easily build one by using the hashlib module):
def myhash(filename):
    # compute the hash of the file
    ...
    return hash_string

You can do:

def store_hash(filename):
    hash_string = myhash(filename)
    with open(filename + ":hash", "w") as fd:
        fd.write(hash_string)

def compare_hash(filename):
    hash_string = myhash(filename)
    with open(filename + ":hash") as fd:
        orig = fd.read()
    return (hash_string == orig)
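For completeness, the myhash stub could be filled in with hashlib along these lines (SHA-256 is an arbitrary choice here; note the alternate streams themselves only exist on NTFS, while this function is portable):

```python
import hashlib

def myhash(filename):
    # Chunked sha256 over the file's (default stream's) contents.
    h = hashlib.sha256()
    with open(filename, 'rb') as fd:
        for chunk in iter(lambda: fd.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()
```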
I know I can get the sha256 digest of a given file using hashlib.sha256, and could get the rmd160 digest using a subprocess call to openssl rmd160 <myfile>, but is there a python package I can import that provides a method to determine the rmd160 digest?
(rmd160 is a recommended checksum for use in a Macports Portfile [1].)
You may use hashlib.new(name) to create hashers from the algorithm name. My understanding of the documentation (both Python 2 and 3) is that in this case this is delegated to OpenSSL. OpenSSL supports the names "rmd160" and "ripemd160".
I use
import hashlib

def rmd160(fname):
    BLOCKSIZE = 65536
    h = hashlib.new("rmd160")
    with open(fname, 'rb') as f:
        buf = f.read(BLOCKSIZE)
        while buf:
            h.update(buf)
            buf = f.read(BLOCKSIZE)
    return h.hexdigest()
It's not included in the standard library, but pycrypto supports it.
If you're using a Unix system with access to a compiler and the required dependencies, you can simply pip install pycrypto. If you're using Windows, there are third-party pre-built binaries.
Duplicates (that I haven't found answers in):
https://stackoverflow.com/questions/4066361/how-to-obfuscate-python-code
How do I protect Python code?
So I have looked at both of the links above ^^ and I have found nothing useful for actually encrypting Python scripts and/or obfuscating Python code. I am new to C but experienced in Python, and if I wanted to develop commercial Python projects, my best idea was this:
Create a C program and an encrypted, compiled Python script. The C program would simply supply the string encryption key and decrypt the script. Just FYI, I've never actually tried encryption, and I know this wouldn't be perfect. But I don't need perfect. I just want to make it harder to decompile my Python source code, realizing this would still be easy, but not AS easy.
I have currently looked at Cython, and I can easily generate a *.c file; now how can I compile this to a binary (with Visual Studio)?
So how could I encrypt my python code and decrypt it from a C script (which I could compile to binary making it significantly harder to edit)?
Here's what I would do:
1) Create a C program which produces a key and stores it to a text file
2) Pick up the key and immediately delete the text file when the Python program starts
3) Use the key to decrypt the really important parts of your Python code (make sure that not having these bits will break your script), then import it all
4) Immediately re-encrypt the important Python bits, and delete the .pyc file
This will be beatable, but you're ok with that.
To encrypt and re-encrypt your python bits, try this code:
from hashlib import md5
from Crypto.Cipher import AES
from Crypto import Random

# key-derivation helper (EVP_BytesToKey-style) that the two functions
# below rely on; it was missing from the original snippet
def derive_key_and_iv(password, salt, key_length, iv_length):
    d = d_i = ''
    while len(d) < key_length + iv_length:
        d_i = md5(d_i + password + salt).digest()
        d += d_i
    return d[:key_length], d[key_length:key_length + iv_length]

def encrypt(in_file, out_file, password, key_length=32):
    bs = AES.block_size
    salt = Random.new().read(bs - len('Salted__'))
    key, iv = derive_key_and_iv(password, salt, key_length, bs)
    cipher = AES.new(key, AES.MODE_CBC, iv)
    out_file.write('Salted__' + salt)
    finished = False
    while not finished:
        chunk = in_file.read(1024 * bs)
        if len(chunk) == 0 or len(chunk) % bs != 0:
            padding_length = (bs - len(chunk) % bs) or bs
            chunk += padding_length * chr(padding_length)
            finished = True
        out_file.write(cipher.encrypt(chunk))
def decrypt(in_file, out_file, password, key_length=32):
    bs = AES.block_size
    salt = in_file.read(bs)[len('Salted__'):]
    key, iv = derive_key_and_iv(password, salt, key_length, bs)
    cipher = AES.new(key, AES.MODE_CBC, iv)
    next_chunk = ''
    finished = False
    while not finished:
        chunk, next_chunk = next_chunk, cipher.decrypt(in_file.read(1024 * bs))
        if len(next_chunk) == 0:
            padding_length = ord(chunk[-1])
            chunk = chunk[:-padding_length]
            finished = True
        out_file.write(chunk)
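The padding arithmetic inside those loops is essentially PKCS#7; pulled out on its own (as a Python 3 bytes version), it behaves like this:

```python
def pad(chunk, bs=16):
    # A full block of padding is added when len(chunk) is already a
    # multiple of the block size, so there is always something to strip.
    padding_length = (bs - len(chunk) % bs) or bs
    return chunk + bytes([padding_length]) * padding_length

def unpad(chunk):
    # The last byte says how many padding bytes to strip.
    return chunk[:-chunk[-1]]
```

Padding even when the input is already block-aligned is what lets unpad work unconditionally on the last block.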
So to summarize, here's some pseudo-code:
def main():
    os.system("C_Executable.exe")

    with open("key.txt", 'r') as f:
        key = f.read()
    os.remove("key.txt")

    # Calls to decrypt files, which look like this:
    with open("encrypted file name", 'rb') as in_file, open("unencrypted file name", 'wb') as out_file:
        decrypt(in_file, out_file, key)
    os.remove("encrypted file name")

    import fileA, fileB, fileC, etc
    global fileA, fileB, fileC, etc

    # Calls to re-encrypt files and remove unencrypted versions along with
    # .pyc files, using a similar scheme to the decryption calls

    # Whatever else you want
But just to stress an important point:
Python is not made for this! It is meant to be open and free!
If you find yourself at this juncture with no other alternative, you should probably just use a different language.
Take a look at the Nuitka project. It's a Python compiler that compiles your Python scripts to native executable code that uses libpython to run.
http://nuitka.net/
You didn't explain WHY you felt the need to encrypt/decrypt. The answer could materially alter any suggestions offered.
For example, let's assume you're trying to protect intellectual property, but like the convenience of coding in Python. If this is your motivation, consider Cython -- http://cython.org.
But let's say you are more concerned about security (i.e. preventing someone from altering your code without a user's permission). In that case, you could consider some sort of embedded loader that checksums your Python source BEFORE invoking an embedded Python interpreter.
I'm sure there are 1/2 dozen other reasons you might want to encrypt.