Version of a Python script

I was wondering if it is possible to set a version on a Python script. I would like to be able to see the version of a script by right-clicking the file, selecting Properties and then selecting the Version tab.
This tab exists for many other file types, but is it possible to get it to appear for a .py/.pyw file?

.py/.pyw files are plain text files, so they don't carry a binary header in which such information could be embedded.
Specify the version with a __version__ attribute inside the script, and ask Microsoft to add this functionality to Explorer.

__version__ is conventionally a string, though you will sometimes also see a float or tuple. You should also make sure that the version number conforms to the format described in PEP 440 (PEP 386 was a previous version of this standard).
__version__ = '1.0'
or
__version__ = (1, 0)
__version__ = sys.version[:3]  # note: this is the interpreter's version, not the script's
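If the goal is simply to make the version visible to users, a more portable option than the Explorer properties dialog is a command-line flag. A minimal sketch using argparse (the script name myscript.py is just an example):
import argparse

__version__ = '1.0'

# Sketch: running `python myscript.py --version` prints the version and exits.
parser = argparse.ArgumentParser(description='Example script exposing its version.')
parser.add_argument('--version', action='version',
                    version='%(prog)s ' + __version__)
args = parser.parse_args()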


Accessing alternate clipboard formats from python

Copying to the clipboard from an application that supports rich text will typically add the text in several formats. I need to find out the available formats, and then retrieve the clipboard contents in a selected format. In case it matters, I'm interested in rich text formats (from Word, Acrobat, browsers, ...), not in image data or other exotica.
I've looked and looked, but the solutions I've found are limited to plain text, super outdated, specific to Windows (I'm on OS X), reliant on the commandline utilities pbcopy and pbpaste (which don't handle all clipboard formats), or several of the above.
So: How can I get a list of the formats present in the clipboard, and extract its contents in a format of my choice?
Platforms, in order of interest: system-independent (I wish), OS X Mountain Lion (my current platform) or similar, other platforms (I plan to distribute my code).
Selected links
pyperclip: Looks interesting, but on OS X it delegates to pbcopy and pbpaste which support text, rtf and ps formats only.
This recipe from activestate is for Windows only, but shows how to get HTML. (This SO question refers to it).
This SO answer is also specific to win32clipboard.
This question is about dragging and dropping files to the clipboard (on Windows). Interesting, but no help with what I need.
This tkinter-based solution is simple and still works on OS X, but it only gets plain text-- and I've found no evidence that tkinter can handle anything else.
This shows near-identical tkinter code for putting text on the clipboard.
Edit (May 2017)
I now have a solution for OS X (see self-answer below), but I would appreciate hearing if (and how) pyperclip or another module can do the same on Windows. Pyperclip gets its hands deep in the Windows API, so it can't be very far from supporting a listing and selection of all available formats.
It's quite straightforward on OS X with the help of the richxerox module, available on PyPI. It requires system support, including the Apple AppKit and Foundation modules. I had trouble building the Objective-C bridge for Python 3, so initially I only got this working on Python 2; Anaconda 3, however, comes with all the necessary pieces preinstalled.
Here's a demo that prints the available clipboard types, and then fetches and prints each one:
import richxerox as rx

# Dump formats
verbose = True
if verbose:
    print(rx.available(neat=False, dyn=True))
else:
    print(rx.available())

# Dump contents in all formats
for k, v in rx.pasteall(neat=False, dyn=True).items():
    line = "\n*** " + k + ": " + v
    print(line)
Output:
(
"public.html",
"public.utf8-plain-text"
)
*** public.html: <html><head><meta http-equiv="content-type" content="text/html; charset=utf-8">
</head><body><a href="http://coffeeghost.net/2010/10/09/pyperclip-a-cross-platform-clipboard-module-for-python/"
rel="nofollow noreferrer">pyperclip</a>: Looks interesting</body></html>
*** public.utf8-plain-text: pyperclip: Looks interesting
To print in a desired format with fall-back to text, you could use this:
paste_format = "rtf"
content = rx.paste(paste_format)
if not content:
    content = rx.paste("text")
Or you could first check if a format is available:
if "public.rtf" in rx.available():
content = rx.paste("rtf")
else:
content = rx.paste("text")

Finding the user's "My Documents" path

I have this small program and it needs to create a small .txt file in their 'My Documents' Folder. Here's the code I have for that:
textfile = open(r'C:\Users\MYNAME\Documents\myfile.txt', 'w')
lines = ['stuff goes here']
textfile.writelines(lines)
textfile.close()
The problem is that if other people use it, how do I change the MYNAME to their account name?
Use os.path.expanduser(path), see http://docs.python.org/library/os.path.html
e.g. expanduser('~/filename')
This works on both Unix and Windows, according to the docs.
Edit: forward slash due to Sven's comment.
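For example, a minimal sketch combining expanduser with a join (the 'Documents' subfolder name and myfile.txt are assumptions; the folder name may differ on some systems or locales):
import os

# '~' expands to the user's home directory on both Unix and Windows
path = os.path.join(os.path.expanduser('~'), 'Documents', 'myfile.txt')
with open(path, 'w') as textfile:
    textfile.writelines(['stuff goes here'])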
This works without any extra libs:
import ctypes.wintypes
CSIDL_PERSONAL = 5 # My Documents
SHGFP_TYPE_CURRENT = 0 # Get current, not default value
buf = ctypes.create_unicode_buffer(ctypes.wintypes.MAX_PATH)
ctypes.windll.shell32.SHGetFolderPathW(None, CSIDL_PERSONAL, None, SHGFP_TYPE_CURRENT, buf)
print(buf.value)
Also works if documents location and/or default save location is changed by user.
On Windows, you can use something similar to what is shown in the accepted answer to the question: Python, get windows special folders for currently logged-in user.
For the My Documents folder path, use shellcon.CSIDL_PERSONAL in the shell.SHGetFolderPath() function call instead of shellcon.CSIDL_MYPICTURES.
So, assuming you have the PyWin32 extensions¹ installed, this might work (see caveat in Update section below):
>>> from win32com.shell import shell, shellcon
>>> shell.SHGetFolderPath(0, shellcon.CSIDL_PERSONAL, None, 0)
u'<path\\to\\folder>'
Update: I just read something that said that CSIDL_PERSONAL won't return the correct folder if the user has changed the default save folder in the Win7 Documents library. This refers to what you can do in the library's Properties dialog:
The checkmark means that the path is set as the default save location.
I am currently unaware of a way to call the SHLoadLibraryFromKnownFolder() function through PyWin32 (there currently isn't a shell.SHLoadLibraryFromKnownFolder). However, it should be possible to do so using the ctypes module.
¹ Installers for the latest versions of the Python for Windows Extensions are currently available from: http://sourceforge.net/projects/pywin32
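As a hedged sketch of the ctypes route (using the related SHGetKnownFolderPath call rather than SHLoadLibraryFromKnownFolder, and assuming Windows Vista or later), something like this could resolve the Documents known folder:
import ctypes
from ctypes import wintypes

class GUID(ctypes.Structure):
    # Layout of a Windows GUID structure
    _fields_ = [
        ("Data1", wintypes.DWORD),
        ("Data2", wintypes.WORD),
        ("Data3", wintypes.WORD),
        ("Data4", ctypes.c_ubyte * 8),
    ]

# FOLDERID_Documents = {FDD39AD0-238F-46AF-ADB4-6C85480369C7}
FOLDERID_Documents = GUID(
    0xFDD39AD0, 0x238F, 0x46AF,
    (ctypes.c_ubyte * 8)(0xAD, 0xB4, 0x6C, 0x85, 0x48, 0x03, 0x69, 0xC7))

path_ptr = ctypes.c_wchar_p()
hr = ctypes.windll.shell32.SHGetKnownFolderPath(
    ctypes.byref(FOLDERID_Documents), 0, None, ctypes.byref(path_ptr))
if hr == 0:  # S_OK
    print(path_ptr.value)
    ctypes.windll.ole32.CoTaskMemFree(path_ptr)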

What is the common header format of Python files?

I came across the following header format for Python source files in a document about Python coding guidelines:
#!/usr/bin/env python
"""Foobar.py: Description of what foobar does."""
__author__ = "Barack Obama"
__copyright__ = "Copyright 2009, Planet Earth"
Is this the standard format of headers in the Python world?
What other fields/information can I put in the header?
Python gurus share your guidelines for good Python source headers :-)
It's all metadata for the Foobar module.
The first one is the docstring of the module, which is already explained in Peter's answer.
How do I organize my modules (source files)? (Archive)
The first line of each file should be #!/usr/bin/env python. This makes it possible to run the file as a script invoking the interpreter implicitly, e.g. in a CGI context.
Next should be the docstring with a description. If the description is long, the first line should be a short summary that makes sense on its own, separated from the rest by a newline.
All code, including import statements, should follow the docstring. Otherwise, the docstring will not be recognized by the interpreter, and you will not have access to it in interactive sessions (i.e. through obj.__doc__) or when generating documentation with automated tools.
Import built-in modules first, followed by third-party modules, followed by any changes to the path and your own modules. In particular, additions to the path and names of your modules are likely to change rapidly: keeping them in one place makes them easier to find.
Next should be authorship information. This information should follow this format:
__author__ = "Rob Knight, Gavin Huttley, and Peter Maxwell"
__copyright__ = "Copyright 2007, The Cogent Project"
__credits__ = ["Rob Knight", "Peter Maxwell", "Gavin Huttley",
"Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.0.1"
__maintainer__ = "Rob Knight"
__email__ = "rob#spot.colorado.edu"
__status__ = "Production"
Status should typically be one of "Prototype", "Development", or "Production". __maintainer__ should be the person who will fix bugs and make improvements if imported. __credits__ differs from __author__ in that __credits__ includes people who reported bug fixes, made suggestions, etc. but did not actually write the code.
Here is more information, listing __author__, __authors__, __contact__, __copyright__, __license__, __deprecated__, __date__ and __version__ as recognized metadata.
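As a quick aside (a sketch, not part of the guideline itself), such metadata can be read back at runtime with plain attribute access; foobar is a hypothetical module defining the fields above:
import foobar  # hypothetical module with the dunder metadata shown above

# getattr with a default avoids an AttributeError when a field is missing
print(getattr(foobar, '__version__', 'unknown'))
print(getattr(foobar, '__author__', 'unknown'))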
I strongly favour minimal file headers, by which I mean just:
#!/usr/bin/env python # [1]
"""\
This script foos the given bars [2]

Usage: myscript.py BAR1 BAR2
"""
import os # standard library, [3]
import sys

import requests # 3rd party packages

import mypackage # local source
[1] The hashbang if, and only if, this file should be able to be directly executed, i.e. run as myscript.py or myscript or maybe even python myscript.py. (The hashbang isn't used in the last case, but providing it gives users the choice of executing it either way.) The hashbang should not be included if the file is a module, intended just to be imported by other Python files.
[2] Module docstring
[3] Imports, grouped in the standard way, i.e. three groups of imports, with a single blank line between them. Within each group, imports are sorted. The final group, imports from local source, can either be absolute imports as shown, or explicit relative imports.
Everything else is a waste of time - both for the author and for subsequent maintainers. It wastes the precious visual space at the top of the file with information that is better tracked elsewhere, and is easy to get out of date and become actively misleading.
If you have legal disclaimers or licensing info, it goes into a separate file. It does not need to infect every source code file. Your copyright should be part of this. People should be able to find it in your LICENSE file, not random source code.
Metadata such as authorship and dates is already maintained by your source control. There is no need to add a less-detailed, erroneous, and out-of-date version of the same info in the file itself.
I don't believe there is any other data that everyone needs to put into all their source files. You may have some particular requirement to do so, but such things apply, by definition, only to you. They have no place in “general headers recommended for everyone”.
The answers above are really complete, but if you want a quick and dirty header to copy'n paste, use this:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Module documentation goes here
and here
and ...
"""
Why this is a good one:
The first line is for *nix users. It will pick the Python interpreter found on the user's PATH, so it automatically chooses the user's preferred interpreter.
The second one is the file encoding. Nowadays every file should have an encoding associated with it. UTF-8 works everywhere; only legacy projects use other encodings.
And a very simple documentation. It can fill multiple lines.
See also: https://www.python.org/dev/peps/pep-0263/
If you just write a class in each file, you don't even need the documentation (it would go inside the class doc).
Also see PEP 263 if you are using a non-ASCII character set.
Abstract
This PEP proposes to introduce a syntax to declare the encoding of
a Python source file. The encoding information is then used by the
Python parser to interpret the file using the given encoding. Most
notably this enhances the interpretation of Unicode literals in
the source code and makes it possible to write Unicode literals
using e.g. UTF-8 directly in an Unicode aware editor.
Problem
In Python 2.1, Unicode literals can only be written using the
Latin-1 based encoding "unicode-escape". This makes the
programming environment rather unfriendly to Python users who live
and work in non-Latin-1 locales such as many of the Asian
countries. Programmers can write their 8-bit strings using the
favorite encoding, but are bound to the "unicode-escape" encoding
for Unicode literals.
Proposed Solution
I propose to make the Python source code encoding both visible and
changeable on a per-source file basis by using a special comment
at the top of the file to declare the encoding.
To make Python aware of this encoding declaration a number of
concept changes are necessary with respect to the handling of
Python source code data.
Defining the Encoding
Python will default to ASCII as standard encoding if no other
encoding hints are given.
To define a source code encoding, a magic comment must
be placed into the source files either as first or second
line in the file, such as:
# coding=<encoding name>
or (using formats recognized by popular editors)
#!/usr/bin/python
# -*- coding: <encoding name> -*-
or
#!/usr/bin/python
# vim: set fileencoding=<encoding name> :
...
What I use in some projects, as the first line for Linux machines, is:
# -*- coding: utf-8 -*-
As a DOC & Author credit, I like simple string in multiline. Here an example from Example Google Style Python Docstrings
# -*- coding: utf-8 -*-
"""Example Google style docstrings.
This module demonstrates documentation as specified by the `Google Python
Style Guide`_. Docstrings may extend over multiple lines. Sections are created
with a section header and a colon followed by a block of indented text.
Example:
Examples can be given using either the ``Example`` or ``Examples``
sections. Sections support any reStructuredText formatting, including
literal blocks::
$ python example_google.py
Section breaks are created by resuming unindented text. Section breaks
are also implicitly created anytime a new section starts.
Attributes:
module_level_variable1 (int): Module level variables may be documented in
either the ``Attributes`` section of the module docstring, or in an
inline docstring immediately following the variable.
Either form is acceptable, but the two should not be mixed. Choose
one convention to document module level variables and be consistent
with it.
Todo:
* For module TODOs
* You have to also use ``sphinx.ext.todo`` extension
.. _Google Python Style Guide:
http://google.github.io/styleguide/pyguide.html
"""
It can also be nice to add:
"""
#Author: ...
#Date: ....
#Credit: ...
#Links: ...
"""
Additional Formats
Meta-information markup (from the Python devguide):
"""
:mod:`parrot` -- Dead parrot access
===================================
.. module:: parrot
:platform: Unix, Windows
:synopsis: Analyze and reanimate dead parrots.
.. moduleauthor:: Eric Cleese <eric#python.invalid>
.. moduleauthor:: John Idle <john#python.invalid>
"""
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# ----------------------------------------------------------------------------
# Created By  : name_of_the_creator
# Created Date: date/month/time ..etc
# version ='1.0'
# ----------------------------------------------------------------------------
I also report, similarly to other answers:
__author__ = "Rob Knight, Gavin Huttley, and Peter Maxwell"
__copyright__ = "Copyright 2007, The Cogent Project"
__credits__ = ["Rob Knight", "Peter Maxwell", "Gavin Huttley",
"Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.0.1"
__maintainer__ = "Rob Knight"
__email__ = "rob#spot.colorado.edu"
__status__ = "Production"

How to find the mime type of a file in python?

Let's say you want to save a bunch of files somewhere, for instance in BLOBs. Let's say you want to dish these files out via a web page and have the client automatically open the correct application/viewer.
Assumption: The browser figures out which application/viewer to use by the mime-type (content-type?) header in the HTTP response.
Based on that assumption, in addition to the bytes of the file, you also want to save the MIME type.
How would you find the MIME type of a file? I'm currently on a Mac, but this should also work on Windows.
Does the browser add this information when posting the file to the web page?
Is there a neat python library for finding this information? A WebService or (even better) a downloadable database?
The python-magic method suggested by toivotuo is outdated. python-magic's current trunk is at GitHub and, based on the readme there, finding the MIME type is done like this:
# For MIME types
import magic
mime = magic.Magic(mime=True)
mime.from_file("testdata/test.pdf") # 'application/pdf'
The mimetypes module in the standard library will determine/guess the MIME type from a file extension.
If users are uploading files, the HTTP POST will contain the MIME type of the file alongside the data. For example, Django makes this data available as an attribute of the UploadedFile object.
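As a hedged Django sketch (the view and the 'upload' form field name are assumptions for illustration; the value comes from the client and should not be trusted blindly):
from django.http import HttpResponse

def handle_upload(request):
    # request.FILES values are UploadedFile objects with a content_type attribute
    uploaded = request.FILES['upload']
    return HttpResponse('Reported type: %s' % uploaded.content_type)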
A more reliable way than using the mimetypes library would be to use the python-magic package.
import magic
m = magic.open(magic.MAGIC_MIME)
m.load()
m.file("/tmp/document.pdf")
This would be equivalent to using file(1).
On Django one could also make sure that the MIME type matches that of UploadedFile.content_type.
This seems to be very easy
>>> from mimetypes import MimeTypes
>>> import urllib
>>> mime = MimeTypes()
>>> url = urllib.pathname2url('Upload.xml')
>>> mime_type = mime.guess_type(url)
>>> print mime_type
('application/xml', None)
Please refer to the old post.
Update: in Python 3+, it's more convenient now:
import mimetypes
print(mimetypes.guess_type("sample.html"))
13 years later...
Most of the answers on this page for Python 3 were either outdated or incomplete.
To get the mime type of a file I use:
import mimetypes
mt = mimetypes.guess_type("https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf")
if mt and mt[0]:
    print("Mime Type:", mt[0])
else:
    print("Cannot determine Mime Type")
# Mime Type: application/pdf
From Python docs:
mimetypes.guess_type(url, strict=True)
Guess the type of a file based on its filename, path or URL, given by url. URL can be a string or a path-like object.
The return value is a tuple (type, encoding) where type is None if the type can’t be guessed (missing or unknown suffix) or a string of the form 'type/subtype', usable for a MIME content-type header.
encoding is None for no encoding or the name of the program used to encode (e.g. compress or gzip). The encoding is suitable for use as a Content-Encoding header, not as a Content-Transfer-Encoding header. The mappings are table driven. Encoding suffixes are case sensitive; type suffixes are first tried case sensitively, then case insensitively.
The optional strict argument is a flag specifying whether the list of known MIME types is limited to only the official types registered with IANA. When strict is True (the default), only the IANA types are supported; when strict is False, some additional non-standard but commonly used MIME types are also recognized.
Changed in version 3.8: Added support for url being a path-like object.
Python bindings to libmagic
All the different answers on this topic are very confusing, so I'm hoping to give a bit more clarity with this overview of the different bindings of libmagic. Previously, mammadori gave a short answer listing the available options.
libmagic
module name: magic
pypi: file-magic
source: https://github.com/file/file/tree/master/python
When determining a file's MIME type, the tool of choice is simply called file and its back-end is called libmagic. (See the project home page.) The project is developed in a private CVS repository, but there is a read-only git mirror on GitHub.
Now this tool, which you will need if you want to use any of the libmagic bindings with python, already comes with its own python bindings called file-magic. There is not much dedicated documentation for them, but you can always have a look at the man page of the c-library: man libmagic. The basic usage is described in the readme file:
import magic

detected = magic.detect_from_filename('magic.py')
print('Detected MIME type: {}'.format(detected.mime_type))
print('Detected encoding: {}'.format(detected.encoding))
print('Detected file type name: {}'.format(detected.name))
Apart from this, you can also use the library by creating a Magic object using magic.open(flags) as shown in the example file.
Both toivotuo and ewr2san use these file-magic bindings included in the file tool. They mistakenly assume they are using the python-magic package. This seems to indicate that if both file and python-magic are installed, the Python module magic refers to the former.
python-magic
module name: magic
pypi: python-magic
source: https://github.com/ahupp/python-magic
This is the library that Simon Zimmermann talks about in his answer and which is also employed by Claude COULOMBE as well as Gringo Suave.
filemagic
module name: magic
pypi: filemagic
source: https://github.com/aliles/filemagic
Note: This project was last updated in 2013!
Due to being based on the same C API, this library has some similarity with the file-magic bindings included with libmagic. It is only mentioned by mammadori; no other answer employs it.
2017 Update
No need to go to GitHub; it is on PyPI under a different name:
pip3 install --user python-magic
# or:
sudo apt install python3-magic # Ubuntu distro package
The code can be simplified as well:
>>> import magic
>>> magic.from_file('/tmp/img_3304.jpg', mime=True)
'image/jpeg'
There are 3 different libraries that wraps libmagic.
2 of them are available on pypi (so pip install will work):
filemagic
python-magic
And another, similar to python-magic, is available directly in the latest libmagic sources; it is the one you probably have in your Linux distribution.
In Debian, the package python-magic is this one; it is used as toivotuo said and, contrary to what Simon Zimmermann said, it is not obsolete (IMHO).
It seems to me to be another take (by the original author of libmagic).
Too bad it is not available directly on PyPI.
In Python 2.6:
import shlex
import subprocess

# note: shlex.quote exists from Python 3.3; on Python 2 use pipes.quote instead
mime = subprocess.Popen("/usr/bin/file --mime " + shlex.quote(PATH),
                        shell=True, stdout=subprocess.PIPE).communicate()[0]
You didn't state which web server you are using, but Apache has a nice little module called mod_mime_magic which it uses to determine the type of a file when told to do so. It reads some of the file's content and tries to figure out what type it is based on the characters found. And as Dave Webb mentioned, the mimetypes module in Python will work, provided an extension is handy.
Alternatively, if you are sitting on a UNIX box you can use os.popen('file -i ' + fileName, mode='r') to grab the MIME type. Windows should have an equivalent command, but I'm unsure as to what it is.
@toivotuo's method worked best and most reliably for me under Python 3. My goal was to identify gzipped files which do not have a reliable .gz extension. I installed python3-magic.
import magic
filename = "./datasets/test"
def file_mime_type(filename):
    m = magic.open(magic.MAGIC_MIME)
    m.load()
    return m.file(filename)

print(file_mime_type(filename))
for a gzipped file it returns:
application/gzip; charset=binary
for an unzipped txt file (iostat data):
text/plain; charset=us-ascii
for a tar file:
application/x-tar; charset=binary
for a bz2 file:
application/x-bzip2; charset=binary
and last but not least for me a .zip file:
application/zip; charset=binary
python 3 ref: https://docs.python.org/3.2/library/mimetypes.html
mimetypes.guess_type(url, strict=True) Guess the type of a file based
on its filename or URL, given by url. The return value is a tuple
(type, encoding) where type is None if the type can’t be guessed
(missing or unknown suffix) or a string of the form 'type/subtype',
usable for a MIME content-type header.
encoding is None for no encoding or the name of the program used to
encode (e.g. compress or gzip). The encoding is suitable for use as a
Content-Encoding header, not as a Content-Transfer-Encoding header.
The mappings are table driven. Encoding suffixes are case sensitive;
type suffixes are first tried case sensitively, then case
insensitively.
The optional strict argument is a flag specifying whether the list of
known MIME types is limited to only the official types registered with
IANA. When strict is True (the default), only the IANA types are
supported; when strict is False, some additional non-standard but
commonly used MIME types are also recognized.
import mimetypes
print(mimetypes.guess_type("sample.html"))
In Python 3.x, for a web app with a URL to a file that might have no extension or a fake extension, you should install python-magic, using
pip3 install python-magic
For Mac OS X, you should also install libmagic using
brew install libmagic
Code snippet
import urllib
import magic
from urllib.request import urlopen
url = "http://...url to the file ..."
request = urllib.request.Request(url)
response = urlopen(request)
mime_type = magic.from_buffer(response.readline())
print(mime_type)
Alternatively, you could pass a size to the read:
import urllib
import magic
from urllib.request import urlopen
url = "http://...url to the file ..."
request = urllib.request.Request(url)
response = urlopen(request)
mime_type = magic.from_buffer(response.read(128))
print(mime_type)
I try the mimetypes library first. If it's not working, I use the python-magic library instead.
import mimetypes
def guess_type(filename, buffer=None):
    mimetype, encoding = mimetypes.guess_type(filename)
    if mimetype is None:
        try:
            import magic
            if buffer:
                mimetype = magic.from_buffer(buffer, mime=True)
            else:
                mimetype = magic.from_file(filename, mime=True)
        except ImportError:
            pass
    return mimetype
The mimetypes module only recognises a file type based on the file extension. If you try to recover the file type of a file without an extension, mimetypes will not work.
I'm surprised that nobody has mentioned it, but Pygments is able to make an educated guess about the MIME type of, particularly, text documents.
Pygments is actually a Python syntax highlighting library, but it has a method that will make an educated guess about which of 500 supported document types your document is,
i.e. C++ vs. C# vs. Python, etc.
import inspect

def _test(text: str):
    from pygments.lexers import guess_lexer
    lexer = guess_lexer(text)
    mimetype = lexer.mimetypes[0] if lexer.mimetypes else None
    print(mimetype)

if __name__ == "__main__":
    # Set the text to the actual definition of _test(...) above
    text = inspect.getsource(_test)
    print('Text:')
    print(text)
    print()
    print('Result:')
    _test(text)
Output:
Text:
def _test(text: str):
    from pygments.lexers import guess_lexer
    lexer = guess_lexer(text)
    mimetype = lexer.mimetypes[0] if lexer.mimetypes else None
    print(mimetype)
Result:
text/x-python
Now, it's not perfect, but if you need to be able to tell which of 500 document formats are being used, this is pretty darn useful.
For byte-array data you can use
magic.from_buffer(_byte_array, mime=True)
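A minimal sketch (the file name is just an example); a couple of kilobytes from the start of the file is usually enough for libmagic:
import magic

with open('example.bin', 'rb') as f:
    head = f.read(2048)  # read only the beginning of the file

print(magic.from_buffer(head, mime=True))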
I've tried a lot of examples, but with Django, mutagen plays nicely.
Example checking whether a file is an mp3:
from mutagen.mp3 import MP3, HeaderNotFoundError
try:
    audio = MP3(file)
except HeaderNotFoundError:
    raise ValidationError('This file should be mp3')
The downside is that your ability to check file types is limited, but it's a great approach if you want not only to check the file type but also to access additional information.
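For example, a hedged sketch of pulling that extra information out of the parsed MP3 ('song.mp3' is just an example path):
from mutagen.mp3 import MP3, HeaderNotFoundError

try:
    audio = MP3('song.mp3')
    print('MIME types:', audio.mime)         # e.g. ['audio/mp3', 'audio/mpg', ...]
    print('Length (s):', audio.info.length)
    print('Bitrate:', audio.info.bitrate)
except HeaderNotFoundError:
    print('Not an mp3 file')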
