I want to install some Python packages using pip but cannot, because every downloaded file produces the same hash, which then fails the comparison in pip's security check.
After playing around, I see that every file I download using curl from files.pythonhosted hashes to the same value. I download a file like this and then hash it with a Python script:
curl http://files.pythonhosted.org/packages/1a/80/b06ce333aabba7ab1b6a41ea3c4e46970ceb396e705733480a2d47a7f74b/Django-4.0.3-py3-none-any.whl -o django.whl
import hashlib
hasher = hashlib.sha256()
BLOCKSIZE = 65536
def hash_stuff(file):
    with open(file, 'rb') as afile:
        buf = afile.read(BLOCKSIZE)
        while len(buf) > 0:
            hasher.update(buf)
            buf = afile.read(BLOCKSIZE)
    print(hasher.hexdigest())
hash_stuff("pynvim.tar.gz")
hash_stuff("opencv.tar.gz")
hash_stuff("django.whl")
which outputs:
➜ ~ python pythonhash.py
c77ab57a36e39ce205ca2327a3edd10399f4d78a3be91e80d845a1b97c29b7d6
ea75572349ed10da0f3224398737fd08352ae10e6f3c571345feb971e080a276
9e31adaf584633587df90d7be36e2fb287c7344eaa4bb23d619f4bdaa19a67d0
If I change the order of the hash_stuff calls like so (note the ordering is different):
hash_stuff("django.whl")
hash_stuff("opencv.tar.gz")
hash_stuff("pynvim.tar.gz")
the output does not change!
➜ ~ python pythonhash.py
c77ab57a36e39ce205ca2327a3edd10399f4d78a3be91e80d845a1b97c29b7d6
ea75572349ed10da0f3224398737fd08352ae10e6f3c571345feb971e080a276
9e31adaf584633587df90d7be36e2fb287c7344eaa4bb23d619f4bdaa19a67d0
If I reset the hasher object inside the function, I get the first hash (c77ab57...) three times, like so:
def hash_stuff(file):
    hasher = hashlib.sha256()
    BLOCKSIZE = 65536
    with open(file, 'rb') as afile:
-----
➜ ~ python pythonhash.py
c77ab57a36e39ce205ca2327a3edd10399f4d78a3be91e80d845a1b97c29b7d6
c77ab57a36e39ce205ca2327a3edd10399f4d78a3be91e80d845a1b97c29b7d6
c77ab57a36e39ce205ca2327a3edd10399f4d78a3be91e80d845a1b97c29b7d6
I've written the same test in Ruby and I'm getting the same results:
require 'digest'
puts Digest::SHA256.hexdigest File.read "django.whl"
puts Digest::SHA256.hexdigest File.read "opencv.tar.gz"
puts Digest::SHA256.hexdigest File.read "pynvim.tar.gz"
As a sanity check, I've tested hashing some local files and they produce the same hash consistently, regardless of ordering.
How can the ordering of execution affect the hash?
Also, files.pythonhosted doesn't even seem to have a proper SSL certificate. Can I even trust this host?
What could I possibly be doing wrong?
It turns out my internet provider was blocking the content due to files.pythonhosted having a self-signed SSL certificate.
The hash was the same for all files because every download was actually an HTML error page (doh). Thanks @jasonharper for the pointer!
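For anyone hitting the same symptom, here is a minimal sketch of the two checks that would have caught this earlier: hashing each file with a fresh hasher, and peeking at the first bytes to see whether the "archive" is really an HTML page. The function names and the 512-byte probe size are my own choices for illustration, not anything pip does.

```python
import hashlib

BLOCKSIZE = 65536

def sha256_of_file(path):
    """Return the SHA-256 hex digest of one file, using a fresh hasher per call."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size blocks until read() returns an empty bytes object.
        for block in iter(lambda: f.read(BLOCKSIZE), b""):
            hasher.update(block)
    return hasher.hexdigest()

def looks_like_html(path, probe=512):
    """Heuristic only: True if the file starts like an HTML page, not an archive."""
    with open(path, "rb") as f:
        head = f.read(probe).lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html"))
```

If looks_like_html returns True for a downloaded .whl or .tar.gz, the hash mismatch is most likely caused by a block or error page rather than by the hashing code.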
The php code is
'''
$input_file = "a.txt";
$source = file_get_contents($input_file);
$source = gzcompress($source);
file_put_contents("php.txt", $source);
'''
The python code is
'''
import zlib
testFile = "a.txt"
content = None
with open(testFile, "rb") as f:
    content = f.read()
outContent = zlib.compress(content)
with open("py.txt", "wb") as f:
    f.write(outContent)
'''
The python3 version is [Python 3.6.9]
The php version is [PHP 7.2.17]
I need both to produce the same output, so that the MD5 of the two results is the same.
The problem is not in PHP or Python, but rather in your "need". You cannot expect to get the same result, unless the two environments happen to be using the same version of the same compression code with the same settings. Since you do not have control of the version of code being used, your "need" can never be guaranteed to be met.
You should instead be doing your md5 on the decompressed data, not the compressed data.
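A minimal sketch of that suggestion, using only Python's standard library. The two compression levels here are just stand-ins for "two environments with different settings"; the helper name md5_of_payload is made up for this example.

```python
import hashlib
import zlib

def md5_of_payload(compressed):
    """MD5 of the data inside a zlib stream, independent of how it was compressed."""
    return hashlib.md5(zlib.decompress(compressed)).hexdigest()

data = b"the quick brown fox jumps over the lazy dog\n" * 200
fast = zlib.compress(data, 1)  # stand-in for one environment's settings
best = zlib.compress(data, 9)  # stand-in for another environment's settings

# The compressed bytes are allowed to differ between the two streams...
print(fast == best)
# ...but the decompressed payload, and therefore its MD5, must match.
print(md5_of_payload(fast) == md5_of_payload(best) == hashlib.md5(data).hexdigest())
```

Comparing the digest of the decompressed payload keeps the check stable even if one side upgrades its zlib or changes its compression level.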
I found a solution.
The code is:
compress = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,zlib.DEFLATED, 15, 9)
outContent = compress.compress(content)
outContent += compress.flush()
Python's zlib provides an interface, zlib.compressobj, which returns a compression object; its parameters determine the output.
You can adjust these parameters so that Python's result is the same as PHP's.
There are already a lot of questions about this out there, but none of them has a sufficient answer on how to do it, especially when using Python 3.
Basically, I want to read JAR/APK certificates, like this one: Link to ASN1 Decoder, with Android Test Signing Key
There are now several alternatives:
pyasn1: seems to work, but only can parse the raw ASN.1 format
M2Crypto: only works on py2
Chilkat: Not free, although CkCert seems to be free
cryptography: Can not load the certificate, as the X509 certificate is inside the PKCS#7 container
I found a way to use pyasn1 to unpack the cert from the pkcs#7 message, then use cryptography to read it:
from pyasn1.codec.der.decoder import decode
from pyasn1.codec.der.encoder import encode
from cryptography import x509
from cryptography.hazmat.backends import default_backend
cdata = open("CERT.RSA", "rb").read()
cert, rest = decode(cdata)
# The cert should be located there
realcert = encode(cert[1][3])
realcert = realcert[2 + (realcert[1] & 0x7F) if realcert[1] & 0x80 > 1 else 2:] # remove the first DER identifier from the front
x509.load_der_x509_certificate(realcert, default_backend())
which gives
<Certificate(subject=<Name([<NameAttribute(oid=<ObjectIdentifier(oid=2.5.4.6, name=countryName)>, value='US')>, <NameAttribute(oid=<ObjectIdentifier(oid=2.5.4.8, name=stateOrProvinceName)>, value='California')>, <NameAttribute(oid=<ObjectIdentifier(oid=2.5.4.7, name=localityName)>, value='Mountain View')>, <NameAttribute(oid=<ObjectIdentifier(oid=2.5.4.10, name=organizationName)>, value='Android')>, <NameAttribute(oid=<ObjectIdentifier(oid=2.5.4.11, name=organizationalUnitName)>, value='Android')>, <NameAttribute(oid=<ObjectIdentifier(oid=2.5.4.3, name=commonName)>, value='Android')>, <NameAttribute(oid=<ObjectIdentifier(oid=1.2.840.113549.1.9.1, name=emailAddress)>, value='android#android.com')>])>, ...)>
Is there no other way to have it clean and tidy?
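For readers puzzling over the slice that removes the first DER identifier: it relies on how DER encodes lengths (one length byte in short form, or a count byte followed by that many length bytes in long form). Below is a small sketch of the same idea as a named helper. It assumes a single-byte tag and a definite length, and the name strip_der_header is my own, not part of pyasn1 or cryptography.

```python
def strip_der_header(der):
    """Return the content octets of the outermost DER element.

    Assumes a single-byte tag and a definite length (short or long form).
    """
    first_len = der[1]
    if first_len & 0x80:
        # Long form: the low 7 bits give the number of length bytes that follow.
        n = first_len & 0x7F
        length = int.from_bytes(der[2:2 + n], "big")
        start = 2 + n
    else:
        # Short form: the byte itself is the content length.
        length = first_len
        start = 2
    return der[start:start + length]
```

With this helper the one-line slice could be written as realcert = strip_der_header(realcert), which at least makes the intent visible.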
There are now libraries to do this in pure Python. One is asn1crypto: https://github.com/wbond/asn1crypto#readme
This is also implemented in androguard, including examples of how to use it: https://androguard.readthedocs.io/en/latest/intro/certificates.html
I am creating a ZIP file with ZipFile in Python 2.5, and it works OK so far:
import zipfile, os
locfile = "test.txt"
loczip = os.path.splitext (locfile)[0] + ".zip"
zip = zipfile.ZipFile (loczip, "w")
zip.write (locfile)
zip.close()
But I couldn't find how to encrypt the files in the ZIP file.
I could use system and call PKZIP -s, but I suppose there must be a more "Pythonic" way. I'm looking for an open source solution.
I created a simple library, pyminizip, to create a password-encrypted zip file in Python.
import pyminizip
compression_level = 5 # 1-9
pyminizip.compress("src.txt", "dst.zip", "password", compression_level)
The library requires zlib.
I have checked that the file can be extracted in WINDOWS/MAC.
This thread is a little old, but for people looking for an answer to this question in 2020/2021:
Look at pyzipper.
A 100% API compatible replacement for Python’s zipfile that can read and write AES encrypted zip files.
7-zip is also a good choice, but if you do not want to use subprocess, go with pyzipper...
The duplicate question: Code to create a password encrypted zip file? has an answer that recommends using 7z instead of zip. My experience bears this out.
Copy/pasting the answer by @jfs here too, for completeness:
To create encrypted zip archive (named 'myarchive.zip') using open-source 7-Zip utility:
import subprocess
rc = subprocess.call(['7z', 'a', '-mem=AES256', '-pP4$$W0rd', '-y', 'myarchive.zip'] +
                     ['first_file.txt', 'second.file'])
To install 7-Zip, type:
$ sudo apt-get install p7zip-full
To unzip by hand (to demonstrate compatibility with zip utility), type:
$ unzip myarchive.zip
And enter P4$$W0rd at the prompt.
Or the same in Python 2.6+:
>>> zipfile.ZipFile('myarchive.zip').extractall(pwd='P4$$W0rd')
pyminizip works great for creating a password-protected zip file. For unzipping, it fails in some situations. Tested on Python 3.7.3.
Here, I used pyminizip to encrypt the file.
import pyminizip
compression_level = 5 # 1-9
pyminizip.compress("src.txt",'src', "dst.zip", "password", compression_level)
For unzip, I used zip file module:
from zipfile import ZipFile
with ZipFile('/home/paulsteven/dst.zip') as zf:
    zf.extractall(pwd=b'password')
You can use pyzipper for this task and it will work great when you want to encrypt a zip file or generate a protected zip file.
pip install pyzipper
import pyzipper
def encrypt_():
    secret_password = b'your password'
    with pyzipper.AESZipFile('new_test.zip',
                             'w',
                             compression=pyzipper.ZIP_LZMA,
                             encryption=pyzipper.WZ_AES) as zf:
        zf.setpassword(secret_password)
        zf.writestr('test.txt', "What ever you do, don't tell anyone!")
    with pyzipper.AESZipFile('new_test.zip') as zf:
        zf.setpassword(secret_password)
        my_secrets = zf.read('test.txt')
The strength of the AES encryption can be configured to be 128, 192 or 256 bits. By default it is 256 bits. Use the setencryption() method to specify the encryption kwargs:
def encrypt_():
    secret_password = b'your password'
    with pyzipper.AESZipFile('new_test.zip',
                             'w',
                             compression=pyzipper.ZIP_LZMA) as zf:
        zf.setpassword(secret_password)
        zf.setencryption(pyzipper.WZ_AES, nbits=128)
        zf.writestr('test.txt', "What ever you do, don't tell anyone!")
    with pyzipper.AESZipFile('new_test.zip') as zf:
        zf.setpassword(secret_password)
        my_secrets = zf.read('test.txt')
Official Python ZipFile documentation is available here: https://docs.python.org/3/library/zipfile.html
@tripleee's answer helped me, see my test below.
This code works for me on python 3.5.2 on Windows 8.1 ( 7z path added to system).
rc = subprocess.call(['7z', 'a', output_filename + '.zip', '-mx9', '-pSecret^)'] + [src_folder + '/'])
With two parameters:
-mx9 means maximum compression
-pSecret^) means the password is Secret^). The ^ is the escape character for ) on Windows, but when you unzip you still need to type the ^ as part of the password.
Without the ^, Windows will not apply the password when 7z.exe creates the zip file.
Also, if you want to use -mhe switch, you'll need the file format to be in 7z instead of zip.
I hope that may help.
2022 answer:
I believe this is an utterly mundane task and should therefore be a one-liner. I abstracted away all the frivolous details in a library that is as powerful as a bash terminal.
from crocodile.toolbox import Path
file = Path(r'my_string_path')
result_file = file.zip(pwd="lol", use_7z=True)
When the use_7z flag is set, 7z gets called behind the scenes.
You don't need to learn the 7z command-line syntax.
You don't need to worry about installing 7z; the library does that automatically if it's not installed (tested on Windows so far).
You can use the Chilkat library. It's commercial, but has a free evaluation and seems pretty nice.
Here's an example I got from here:
import chilkat
# Demonstrates how to create a WinZip-compatible 128-bit AES strong encrypted zip
zip = chilkat.CkZip()
zip.UnlockComponent("anything for 30-day trial")
zip.NewZip("strongEncrypted.zip")
# Set the Encryption property = 4, which indicates WinZip compatible AES encryption.
zip.put_Encryption(4)
# The key length can be 128, 192, or 256.
zip.put_EncryptKeyLength(128)
zip.SetPassword("secret")
zip.AppendFiles("exampleData/*",True)
zip.WriteZip()
Let's say you want to save a bunch of files somewhere, for instance in BLOBs. Let's say you want to dish these files out via a web page and have the client automatically open the correct application/viewer.
Assumption: The browser figures out which application/viewer to use by the mime-type (content-type?) header in the HTTP response.
Based on that assumption, in addition to the bytes of the file, you also want to save the MIME type.
How would you find the MIME type of a file? I'm currently on a Mac, but this should also work on Windows.
Does the browser add this information when posting the file to the web page?
Is there a neat python library for finding this information? A WebService or (even better) a downloadable database?
The python-magic method suggested by toivotuo is outdated. Python-magic's current trunk is on GitHub, and based on the readme there, finding the MIME type is done like this:
# For MIME types
import magic
mime = magic.Magic(mime=True)
mime.from_file("testdata/test.pdf") # 'application/pdf'
The mimetypes module in the standard library will determine/guess the MIME type from a file extension.
If users are uploading files the HTTP post will contain the MIME type of the file alongside the data. For example, Django makes this data available as an attribute of the UploadedFile object.
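To make the "the POST contains the MIME type" point concrete, here is a rough sketch that parses a hand-written multipart/form-data body with the standard library's email package and reads the Content-Type that the client declared for the uploaded part. The boundary, field name, filename and payload are all made up for illustration, and the function assumes the body really is multipart. A real framework (Django, for example) does this parsing for you.

```python
import email

# A hand-written request body; everything in it is illustrative.
raw_post = (
    b"Content-Type: multipart/form-data; boundary=XYZ\r\n"
    b"\r\n"
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="upload"; filename="report.pdf"\r\n'
    b"Content-Type: application/pdf\r\n"
    b"\r\n"
    b"%PDF-1.4 (placeholder bytes)\r\n"
    b"--XYZ--\r\n"
)

def uploaded_parts(raw):
    """Return (filename, declared MIME type) for each part of a multipart body."""
    message = email.message_from_bytes(raw)
    return [(part.get_filename(), part.get_content_type())
            for part in message.get_payload()]

print(uploaded_parts(raw_post))
```

Keep in mind that this value is only what the client claims, so it is worth cross-checking against the file's content if it matters.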
A more reliable way than using the mimetypes library would be to use the python-magic package.
import magic
m = magic.open(magic.MAGIC_MIME)
m.load()
m.file("/tmp/document.pdf")
This would be equivalent to using file(1).
On Django one could also make sure that the MIME type matches that of UploadedFile.content_type.
This seems to be very easy
>>> from mimetypes import MimeTypes
>>> import urllib
>>> mime = MimeTypes()
>>> url = urllib.pathname2url('Upload.xml')
>>> mime_type = mime.guess_type(url)
>>> print mime_type
('application/xml', None)
Please refer to the old post.
Update: in Python 3 it's more convenient now:
import mimetypes
print(mimetypes.guess_type("sample.html"))
13 years later...
Most of the answers on this page for Python 3 were either outdated or incomplete.
To get the MIME type of a file I use:
import mimetypes
mt = mimetypes.guess_type("https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf")
# guess_type always returns a (type, encoding) tuple, so test its first element
if mt[0]:
    print("Mime Type:", mt[0])
else:
    print("Cannot determine Mime Type")
# Mime Type: application/pdf
From Python docs:
mimetypes.guess_type(url, strict=True)
Guess the type of a file based on its filename, path or URL, given by url. URL can be a string or a path-like object.
The return value is a tuple (type, encoding) where type is None if the type can’t be guessed (missing or unknown suffix) or a string of the form 'type/subtype', usable for a MIME content-type header.
encoding is None for no encoding or the name of the program used to encode (e.g. compress or gzip). The encoding is suitable for use as a Content-Encoding header, not as a Content-Transfer-Encoding header. The mappings are table driven. Encoding suffixes are case sensitive; type suffixes are first tried case sensitively, then case insensitively.
The optional strict argument is a flag specifying whether the list of known MIME types is limited to only the official types registered with IANA. When strict is True (the default), only the IANA types are supported; when strict is False, some additional non-standard but commonly used MIME types are also recognized.
Changed in version 3.8: Added support for url being a path-like object.
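A short illustration of the (type, encoding) tuple described above, using three made-up filenames: one plain suffix, one with an encoding suffix, and one with no suffix at all.

```python
import mimetypes

# Filenames are illustrative; no files need to exist, only the names are inspected.
for name in ("sample.html", "archive.tar.gz", "README"):
    mime_type, encoding = mimetypes.guess_type(name)
    print(name, "->", mime_type, encoding)
```

The second name shows why the result is a tuple: the .gz suffix is reported as the encoding, while the type comes from the .tar underneath. The suffix-less name yields None for both.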
Python bindings to libmagic
All the different answers on this topic are very confusing, so I'm hoping to give a bit more clarity with this overview of the different bindings of libmagic. Previously mammadori gave a short answer listing the available options.
libmagic
module name: magic
pypi: file-magic
source: https://github.com/file/file/tree/master/python
When determining a file's MIME type, the tool of choice is simply called file and its back-end is called libmagic. (See the Project home page.) The project is developed in a private CVS repository, but there is a read-only git mirror on GitHub.
Now this tool, which you will need if you want to use any of the libmagic bindings with Python, already comes with its own Python bindings called file-magic. There is not much dedicated documentation for them, but you can always have a look at the man page of the C library: man libmagic. The basic usage is described in the readme file:
import magic
detected = magic.detect_from_filename('magic.py')
print('Detected MIME type: {}'.format(detected.mime_type))
print('Detected encoding: {}'.format(detected.encoding))
print('Detected file type name: {}'.format(detected.name))
Apart from this, you can also use the library by creating a Magic object using magic.open(flags) as shown in the example file.
Both toivotuo and ewr2san use these file-magic bindings included in the file tool. They mistakenly assume they are using the python-magic package. This seems to indicate that if both file and python-magic are installed, the Python module magic refers to the former.
python-magic
module name: magic
pypi: python-magic
source: https://github.com/ahupp/python-magic
This is the library that Simon Zimmermann talks about in his answer and which is also employed by Claude COULOMBE as well as Gringo Suave.
filemagic
module name: magic
pypi: filemagic
source: https://github.com/aliles/filemagic
Note: This project was last updated in 2013!
Due to being based on the same c-api, this library has some similarity with file-magic included in libmagic. It is only mentioned by mammadori and no other answer employs it.
2017 Update
No need to go to github, it is on PyPi under a different name:
pip3 install --user python-magic
# or:
sudo apt install python3-magic # Ubuntu distro package
The code can be simplified as well:
>>> import magic
>>> magic.from_file('/tmp/img_3304.jpg', mime=True)
'image/jpeg'
There are 3 different libraries that wrap libmagic.
2 of them are available on PyPI (so pip install will work):
filemagic
python-magic
Another one, similar to python-magic, is available directly in the latest libmagic sources, and it is the one you probably have in your Linux distribution.
In Debian, the package python-magic is this one; it is used as toivotuo said and it is not obsolete as Simon Zimmermann said (IMHO).
It seems to me to be another take (by the original author of libmagic).
Too bad it is not available directly on PyPI.
In Python 2.6:
import subprocess
# PATH is the path of the file to inspect; passing a list avoids the shell and any quoting
mime = subprocess.Popen(["/usr/bin/file", "--mime", PATH],
                        stdout=subprocess.PIPE).communicate()[0]
You didn't state what web server you were using, but Apache has a nice little module called Mime Magic which it uses to determine the type of a file when told to do so. It reads some of the file's content and tries to figure out what type it is based on the characters found. And as Dave Webb mentioned, the mimetypes module under Python will work, provided an extension is handy.
Alternatively, if you are sitting on a UNIX box you can use os.popen('file -i ' + fileName, mode='r') to grab the MIME type. Windows should have an equivalent command, but I'm unsure as to what it is.
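On Python 3.7+ the same idea of shelling out to the file utility can be sketched with subprocess.run. This is only a sketch: it assumes a file binary on PATH that understands --brief and --mime-type, and it returns None when the utility is missing or reports failure. The function name mime_via_file is my own.

```python
import shutil
import subprocess

def mime_via_file(path):
    """Ask the external `file` utility for a MIME type; None if it is unavailable."""
    exe = shutil.which("file")
    if exe is None:
        return None
    # Passing a list avoids the shell, so the path needs no quoting.
    result = subprocess.run(
        [exe, "--brief", "--mime-type", path],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None
```

Note that the utility inspects the content, so it works for files without an extension, but the exact strings it returns depend on the installed libmagic database.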
@toivotuo's method worked best and most reliably for me under Python 3. My goal was to identify gzipped files which do not have a reliable .gz extension. I installed python3-magic.
import magic
filename = "./datasets/test"
def file_mime_type(filename):
    m = magic.open(magic.MAGIC_MIME)
    m.load()
    return m.file(filename)
print(file_mime_type(filename))
for a gzipped file it returns:
application/gzip; charset=binary
for an unzipped txt file (iostat data):
text/plain; charset=us-ascii
for a tar file:
application/x-tar; charset=binary
for a bz2 file:
application/x-bzip2; charset=binary
and last but not least for me a .zip file:
application/zip; charset=binary
python 3 ref: https://docs.python.org/3.2/library/mimetypes.html
mimetypes.guess_type(url, strict=True) Guess the type of a file based
on its filename or URL, given by url. The return value is a tuple
(type, encoding) where type is None if the type can’t be guessed
(missing or unknown suffix) or a string of the form 'type/subtype',
usable for a MIME content-type header.
encoding is None for no encoding or the name of the program used to
encode (e.g. compress or gzip). The encoding is suitable for use as a
Content-Encoding header, not as a Content-Transfer-Encoding header.
The mappings are table driven. Encoding suffixes are case sensitive;
type suffixes are first tried case sensitively, then case
insensitively.
The optional strict argument is a flag specifying whether the list of
known MIME types is limited to only the official types registered with
IANA. When strict is True (the default), only the IANA types are
supported; when strict is False, some additional non-standard but
commonly used MIME types are also recognized.
import mimetypes
print(mimetypes.guess_type("sample.html"))
In Python 3.x, for a web app that has a URL to a file which may have no extension or a fake extension, you should install python-magic using:
pip3 install python-magic
For Mac OS X, you should also install libmagic using
brew install libmagic
Code snippet
import urllib
import magic
from urllib.request import urlopen
url = "http://...url to the file ..."
request = urllib.request.Request(url)
response = urlopen(request)
mime_type = magic.from_buffer(response.readline())
print(mime_type)
Alternatively, you could pass a size to read:
import urllib
import magic
from urllib.request import urlopen
url = "http://...url to the file ..."
request = urllib.request.Request(url)
response = urlopen(request)
mime_type = magic.from_buffer(response.read(128))
print(mime_type)
I try the mimetypes library first. If that doesn't work, I use the python-magic library instead.
import mimetypes
def guess_type(filename, buffer=None):
    mimetype, encoding = mimetypes.guess_type(filename)
    if mimetype is None:
        try:
            import magic
            if buffer:
                mimetype = magic.from_buffer(buffer, mime=True)
            else:
                mimetype = magic.from_file(filename, mime=True)
        except ImportError:
            pass
    return mimetype
The mimetypes module only recognises a file type based on the file extension. If you try to determine the type of a file without an extension, mimetypes will not work.
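If neither mimetypes nor a libmagic binding is an option, a tiny content-based check can be hand-rolled for a handful of formats by looking at their leading "magic" bytes. The sketch below is in no way a replacement for libmagic: the signature table only covers four well-known formats, and the names SIGNATURES and sniff_mime are invented for this example.

```python
# Leading byte signatures of a few common formats (deliberately incomplete).
SIGNATURES = [
    (b"\x1f\x8b", "application/gzip"),
    (b"BZh", "application/x-bzip2"),
    (b"PK\x03\x04", "application/zip"),
    (b"%PDF-", "application/pdf"),
]

def sniff_mime(path):
    """Return a MIME type based on the file's first bytes, or None if unknown."""
    with open(path, "rb") as f:
        head = f.read(8)
    for magic_bytes, mime in SIGNATURES:
        if head.startswith(magic_bytes):
            return mime
    return None
```

This covers cases like a gzipped file with no .gz extension; anything outside the table falls through to None, at which point python-magic is the better tool.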
I'm surprised that nobody has mentioned it, but Pygments is able to make an educated guess about the MIME type of, particularly, text documents.
Pygments is actually a Python syntax highlighting library, but it has a method that will make an educated guess about which of 500 supported document types your document is,
i.e. C++ vs C# vs Python, etc.
import inspect
def _test(text: str):
    from pygments.lexers import guess_lexer
    lexer = guess_lexer(text)
    mimetype = lexer.mimetypes[0] if lexer.mimetypes else None
    print(mimetype)
if __name__ == "__main__":
    # Set the text to the actual definition of _test(...) above
    text = inspect.getsource(_test)
    print('Text:')
    print(text)
    print()
    print('Result:')
    _test(text)
Output:
Text:
def _test(text: str):
    from pygments.lexers import guess_lexer
    lexer = guess_lexer(text)
    mimetype = lexer.mimetypes[0] if lexer.mimetypes else None
    print(mimetype)
Result:
text/x-python
Now, it's not perfect, but if you need to be able to tell which of 500 document formats are being used, this is pretty darn useful.
For byte Array type data you can use
magic.from_buffer(_byte_array,mime=True)
I've tried a lot of examples, but with Django, mutagen plays nicely.
Example checking whether a file is an MP3:
from django.core.exceptions import ValidationError
from mutagen.mp3 import MP3, HeaderNotFoundError
try:
    audio = MP3(file)
except HeaderNotFoundError:
    raise ValidationError('This file should be mp3')
The downside is that the range of file types you can check is limited, but it's a great approach if you want not only to check the file type but also to access additional information.