I'm trying to get a file type with Python. For example, if I give the code "somearchive.rar" it must return "WinRAR Archive". If I give it "someapplication.exe" it must return "Application", etc...
Basically the text you see when you open a file's properties in Windows, on the "File type" line.
I don't know how to do this, though I think you can do it by looking at the registry or something similar and taking the file's properties (or file's extension properties?) and then keeping only the type, because I saw this code
import shlex
import winreg
def def_app(estensione):
    # Look up the class registered for the extension, e.g. ".rar"
    class_root = winreg.QueryValue(winreg.HKEY_CLASSES_ROOT, estensione)
    with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, r'{}\shell\open\command'.format(class_root)) as key:
        command = winreg.QueryValueEx(key, '')[0]
        return shlex.split(command)[0]
that looks at the registry and gives you the default application that opens files with the given extension.
OK, so I found out how to do it. This code checks the file type (or association) by looking in the Windows registry (the same as opening regedit, going to HKEY_CLASSES_ROOT and looking at the keys there, as the user @martineau suggested):
rawcom = os.popen("assoc ."+command[len(command)-1]).read().split("=")
It is already split, so I can do rawcom[1] and get the file type easily.
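For reference, assoc prints a line of the form .ext=Type, so the split above can be wrapped in a small helper that also copes with a missing association. This is only a minimal sketch: parse_assoc_output is a hypothetical name, and the sample string is an illustrative input rather than captured output.

```python
def parse_assoc_output(output):
    """Return the text after '=' in an assoc line such as '.rar=WinRAR', or None."""
    # partition never raises, even when '=' is missing from the text
    _ext, sep, file_type = output.strip().partition("=")
    return file_type if sep else None


# Illustrative input only; on Windows the string would come from
# os.popen("assoc .rar").read(), as in the line above.
print(parse_assoc_output(".rar=WinRAR\n"))
```

Returning None instead of indexing rawcom[1] makes it easy to fall back to another method when no association exists.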
If there isn't a file association in the Windows registry, it checks the file type using this code that I found:
import win32com.client
def get_file_metadata(path, filename, metadata):
    # Ask the Windows shell for the folder, then for the item inside it
    sh = win32com.client.gencache.EnsureDispatch('Shell.Application', 0)
    ns = sh.NameSpace(path)
    file_metadata = dict()
    item = ns.ParseName(str(filename))
    # The position of each name in `metadata` is used as the shell column index
    for ind, attribute in enumerate(metadata):
        attr_value = ns.GetDetailsOf(item, ind)
        if attr_value:
            file_metadata[attribute] = attr_value
    return file_metadata
if __name__ == '__main__':
    folder = direc
    filename = file
    metadata = ['Name', 'Size', 'Item type', 'Date modified', 'Date created']
    proprietà = get_file_metadata(folder, filename, metadata)
It does exactly what I was trying to do at the start, getting the file type as if I was opening the file's properties in the Windows explorer. With this I put the file metadata in a dictionary and then get only the "Item type" value.
Maybe this library can help: filetype
Example:
In [1]: import filetype
In [2]: kind = filetype.guess('/Users/ayik/Pictures/Archive.rar')
In [3]: print(f'MIME type: {kind.mime}')
MIME type: application/x-rar-compressed
From there you can map the MIME types to your desired types.
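A minimal sketch of such a mapping, assuming you maintain the table yourself. FRIENDLY_NAMES and friendly_name are hypothetical names, and the labels are examples rather than what Windows would display.

```python
# Hypothetical mapping from MIME type to the label you want to show.
FRIENDLY_NAMES = {
    "application/x-rar-compressed": "WinRAR Archive",
    "application/zip": "ZIP Archive",
    "image/png": "PNG Image",
}


def friendly_name(mime, default="Unknown file type"):
    """Look up a display label for a MIME type, falling back to a default."""
    return FRIENDLY_NAMES.get(mime, default)


print(friendly_name("application/x-rar-compressed"))
```

You would pass kind.mime from the example above to friendly_name, after checking that the guess did not come back empty.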
Related
I'm relatively new to python, I have made a few pieces of very simple code, and I'm struggling to implement a program that prompts the user for the name of a file (e.g. : cat.png) and then outputs that file’s media type (e.g. : .png). If the file’s name ends, case-insensitively, in any of these suffixes:
-.gif
-.jpg
-.jpeg
-.png
-.pdf
-.txt
-.zip
Then I want to print out its corresponding meaning:
-image/gif
-image/jpeg
-image/jpeg
-image/png
-application/pdf
-text/plain
-application/zip
e.g. :
My desired output:
$ python extensions.py
File name: cat.gif
image/gif
$ python extensions.py
File name: cat.jpg
image/jpeg
I've tried to solve this problem using a dictionary, to match a name to its corresponding format:
file_name = input('File name: ').strip().lower()
extensions = [
    {'name': '.gif', 'format': 'image/gif'},
    {'name': '.jpg', 'format': 'image/jpeg'},
    {'name': '.jpeg', 'format': 'image/jpeg'},
    {'name': '.png', 'format': 'image/png'},
    {'name': '.pdf', 'format': 'application/pdf'},
    {'name': '.txt', 'format': 'text/plain'},
    {'name': '.zip', 'format': 'application/zip'},
]
The problem is, I don't know how to turn a user input like cat.png into an extension like .png and print the result on the terminal as image/png, as in the desired output above. I'm trying to find a way to take the .png part out of 'cat.png' and pass it through a dictionary, printing out image/png.
Appreciate you reading this long description. Does anyone have ideas on how to implement such a program?
import os
file_name = 'cat.png'
extensions = {
    '.gif': 'image/gif',
    '.jpg': 'image/jpeg',
    '.jpeg': 'image/jpeg',
    '.png': 'image/png',
    '.pdf': 'application/pdf',
    '.txt': 'text/plain',
    '.zip': 'application/zip',
}
# splitext returns (root, extension); the extension keeps its leading dot
print(extensions[os.path.splitext(file_name)[1]])
So first of all, I have changed the structure of extensions: there is no real need to be so repetitive and keep a list of dicts, which would be much harder to search. Using the extensions as keys, you can look up the respective format directly.
Then you can use the splitext function from Python's os.path module to get a string representing the file extension (which handles various edge cases better than str.split).
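To make the edge cases concrete, here is a small sketch that wraps the lookup in a function. The name media_type and the fallback value are assumptions for illustration. It lowercases the extension and uses dict.get so that unknown or missing extensions do not raise KeyError.

```python
import os

EXTENSIONS = {
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".txt": "text/plain",
    ".zip": "application/zip",
}


def media_type(file_name, default="application/octet-stream"):
    """Map a file name to a media type using its last extension."""
    # splitext splits at the last dot and returns '' when there is no extension
    _root, ext = os.path.splitext(file_name.strip())
    return EXTENSIONS.get(ext.lower(), default)


for name in ("cat.PNG", "my.photo.jpeg", "README"):
    print(name, "->", media_type(name))
```

In a script you would call it as print(media_type(input('File name: '))).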
That should not be much of an issue.
First, create a dictionary of {extension:explanation} items.
Then, ask the user to enter the filename.
Then, split the file name by dot, take the part after the dot. You can use the partition method for that (or rpartition, which splits at the last dot and so also handles names with several dots).
Finally, query your dictionary and return the output.
Code:
# First, create a dictionary of {extension:explanation} items.
extensions_dict = {
    'png': 'image/png',
    # put as much as you wish
}
# Then, ask the user to enter the filename.
user_file = input('file name: ')
# Then, split the file name at the last dot, take the part after the dot.
file_name, dot, file_extension = user_file.rpartition('.')
# Finally, query your dictionary and return the output.
# You can print a message to the user as a default
# if the extension is not in your dictionary
print(extensions_dict.get(
    file_extension,
    f'I cannot understand your extension ({file_extension}) :('
))
If the file name really is just a name with a dot in it, then you can use .split() to get the extension:
file_extension = file_name.split('.')[-1]
By way of explanation: .split('.') splits the string that you've entered, using the dot as the separator, and returns a list; the [-1] index gives you the last item in that list, i.e., the extension.
You can then look up that extension in your dictionary.
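One caveat is worth seeing in code: when the name contains no dot, split('.')[-1] returns the whole name rather than an empty string. The sketch below compares it with a guarded variant built on rpartition; extension_of is a hypothetical helper name.

```python
def extension_of(file_name):
    """Return the text after the last dot, or '' when there is no dot."""
    _head, dot, tail = file_name.rpartition(".")
    return tail if dot else ""


for name in ("cat.png", "my.photo.png", "README"):
    # left: plain split, right: guarded helper
    print(name, "->", repr(name.split(".")[-1]), "vs", repr(extension_of(name)))
```

With a dictionary lookup that has a default, both variants end up at the default for README, but only the guarded one reports that no extension was present.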
The other solutions are fine. But I want to offer you a more Pythonic way, using Python's pathlib package to handle file paths in a platform-independent way, plus some other tweaks.
#!/usr/bin/env python3
import pathlib # recommended way with Python3
media_types_dict = {
    'png': 'image/png',
    'gif': 'image/gif',
    # ...
}
# ask the user
user_file = input('file name: ')
# convert to a file path object
user_file = pathlib.Path(user_file)
# get the extension (with the leading dot, e.g. ".png")
extension = user_file.suffix
# remove the dot
extension = extension[1:]
try:
    # get the media type
    media_type = media_types_dict[extension.lower()]
except KeyError: # if the extension is not present in the dict
    raise Exception(f'The extension "{extension}" is unknown.')
# output the result
print(media_type)
Using split(".") may give a wrong result if the user enters a name without a dot (.). It may not be more Pythonic, but using if...elif...else would solve this problem.
x = input("enter filename with extension: ").lower()
if x.endswith(".gif"):
    print("image/gif")
elif x.endswith(".jpg"):
    print("image/jpeg")
elif x.endswith(".jpeg"):
    print("image/jpeg")
elif x.endswith(".png"):
    print("image/png")
elif x.endswith(".pdf"):
    print("application/pdf")
elif x.endswith(".txt"):
    print("text/plain")
elif x.endswith(".zip"):
    print("application/zip")
else:
    print("application/octet-stream")
I wrote a function in Python using the comtypes.client module; the function is supposed to open the database from a .msi file and write a special (key, value) pair. My issue so far is that once the function has been called without problems, I try to use os.rename() to rename the .msi file afterwards and get a permission error:
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process
What I understand is that my COM object is still in use, so I cannot access the file. The function and the function calls look like this (obviously this is very simplified, but it should work as such):
import comtypes.client
import os, shutil
def setInstallerAttribute(installer_path, attribute_key, attribute_value):
    installerCOM = comtypes.client.CreateObject("WindowsInstaller.Installer")
    installerDatabase = installerCOM.OpenDatabase (installer_path, 1)
    view = installerDatabase.OpenView ("INSERT INTO Property (Property, Value) VALUES ('{0}', '{1}')".format(attribute_key, attribute_value))
    view.Execute
    installerDatabase.Commit
    view = None
    installerDatabase = None
    installerCOM = None
if __name__ == "__main__":
    input = '{}'.format(msi_fullapth)
    key = "Build"
    value = "test_value"
    if os.path.exists(input):
        setInstallerAttribute(input, key, value)
        os.rename(input, {some other path})
The function is written because previously I was using a VBScript to set this (key, value) pair:
Option Explicit
Dim installer, database, view, myproperty, stdout, key
Set installer = CreateObject("WindowsInstaller.Installer")
Set database = installer.OpenDatabase (WScript.Arguments.Item(0), 1)
' Update Property'
'Set view = database.OpenView ("UPDATE Property SET Value = '" & myproperty & "' WHERE Property = 'MYPROPERTY'")'
myproperty = WScript.Arguments.Item(2)
key = WScript.Arguments.Item(1)
' Add/Insert Property'
Set view = database.OpenView ("INSERT INTO Property (Property, Value) VALUES ('" & key & "', '" & myproperty & "')")
view.Execute
database.Commit
Set database = Nothing
Set installer = Nothing
Set view = Nothing
I would call this in my Python code with os.system(cscript {VBScript} {path} {Key} {Value}); however, I want as few external dependencies as possible in my Python code. I was looking around for answers and checked the comtypes documentation to see if I can explicitly release or "uncouple" my COM object. I tried installerCOM.Quit() and installerCOM.Exit(), which do not seem to be available on WindowsInstaller.Installer objects.
Finally, I read in several previous non-Python (mainly C#) answers on StackOverflow that setting the COM object variables to null would solve this. The VBScript does the same, but it does not seem to work in Python with None.
Maybe:
import gc
import comtypes.client
def setInstallerAttribute(installer_path, attribute_key, attribute_value):
    installerCOM = comtypes.client.CreateObject("WindowsInstaller.Installer")
    installerDatabase = installerCOM.OpenDatabase (installer_path, 1)
    view = installerDatabase.OpenView ("INSERT INTO Property (Property, Value) VALUES ('{0}', '{1}')".format(attribute_key, attribute_value))
    view.Execute
    installerDatabase.Commit
    # Drop every reference to the COM objects, then force a collection
    # so the underlying handles are released before the file is renamed
    del view
    del installerDatabase
    del installerCOM
    gc.collect()
How can I get the username value from the "Last saved by" property from any windows file?
e.g.: I can see this info by right-clicking on a Word file and opening the Details tab.
Does anybody know how I can get it using Python code?
Following the comment from @user1558604, I searched a bit on Google and reached a solution. I tested it on the extensions .docx, .xlsx, .pptx.
import zipfile
import xml.dom.minidom
# Open the MS Office file to see the XML structure.
filePath = r"C:\Users\Desktop\Perpetual-Draft-2019.xlsx"
document = zipfile.ZipFile(filePath)
# Open/read the core.xml (contains the last user and modified date).
uglyXML = xml.dom.minidom.parseString(document.read('docProps/core.xml')).toprettyxml(indent=' ')
# Split lines in order to create a list.
asText = uglyXML.splitlines()
# loop the list in order to get the value you need. In my case last Modified By and the date.
for item in asText:
    if 'lastModifiedBy' in item:
        itemLength = len(item)-20
        print('Modified by:', item[21:itemLength])
    if 'dcterms:modified' in item:
        itemLength = len(item)-29
        print('Modified On:', item[46:itemLength])
The result in the console is:
Modified by: adm.UserName
Modified On: 2019-11-08"
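The fixed slice offsets above depend on how the pretty-printed XML happens to be indented. As a possible alternative, here is a sketch that reads the same two elements with xml.etree.ElementTree and the standard core-properties namespaces. read_core_properties and make_sample_package are hypothetical helper names, and the sample archive contains made-up data so the example runs without a real Office file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by docProps/core.xml in Office Open XML packages
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dcterms": "http://purl.org/dc/terms/",
}


def read_core_properties(file_or_path):
    """Return (last_modified_by, modified) read from docProps/core.xml."""
    with zipfile.ZipFile(file_or_path) as document:
        root = ET.fromstring(document.read("docProps/core.xml"))
    # findtext returns None when the element is absent
    return (
        root.findtext("cp:lastModifiedBy", namespaces=NS),
        root.findtext("dcterms:modified", namespaces=NS),
    )


def make_sample_package():
    """Build a tiny in-memory archive with made-up metadata (demo only)."""
    core_xml = (
        '<cp:coreProperties xmlns:cp="%s" xmlns:dcterms="%s">'
        "<cp:lastModifiedBy>adm.UserName</cp:lastModifiedBy>"
        "<dcterms:modified>2019-11-08T10:00:00Z</dcterms:modified>"
        "</cp:coreProperties>" % (NS["cp"], NS["dcterms"])
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", core_xml)
    buffer.seek(0)
    return buffer


modified_by, modified_on = read_core_properties(make_sample_package())
print("Modified by:", modified_by)
print("Modified on:", modified_on)
```

For a real document, pass its path (such as filePath above) to read_core_properties instead of the in-memory sample.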
This is the part of the mailer.py script:
config = pyfig.Pyfig(config_file)
svnlook = config.general.svnlook #svnlook path
sendmail = config.general.sendmail #sendmail path
From = config.general.from_email #from email address
To = config.general.to_email #to email address
What does this config variable contain? Is there a way to get the values in the config variable without pyfig?
In this case, config is a pyfig.Pyfig object initialised with the contents of the file named by the string config_file.
To find out what that object does and contains, you can either look at the documentation and the source code or print its contents after the initialisation, e.g.:
config = pyfig.Pyfig(config_file)
print "Config Contains:\n\t", '\n\t'.join(dir(config))
if hasattr(config, "keys"):
    print "Config Keys:\n\t", '\n\t'.join(config.keys())
or if you are using Python 3,
config = pyfig.Pyfig(config_file)
print("Config Contains:\n\t", '\n\t'.join(dir(config)))
if hasattr(config, "keys"):
    print("Config Keys:\n\t", '\n\t'.join(config.keys()))
To get the same data without pyfig, you would need to read and parse the content of the file referenced by config_file within your own code.
N.B.: pyfig seems to be more or less abandoned (no updates in over 5 years, web site no longer exists, etc.), so I would strongly recommend converting the code to use a JSON configuration file instead.
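As a rough sketch of what the JSON route could look like, the snippet below parses JSON into nested objects so that attribute access such as config.general.svnlook keeps working. load_config is a hypothetical helper, and the paths and addresses in SAMPLE are placeholders, not values from the original configuration.

```python
import json
from types import SimpleNamespace


def load_config(text):
    """Parse JSON text into nested objects that allow attribute access."""
    # object_hook is called for every JSON object, innermost first
    return json.loads(text, object_hook=lambda d: SimpleNamespace(**d))


# Placeholder contents; the keys mirror the attributes used in mailer.py.
SAMPLE = """
{
  "general": {
    "svnlook": "/usr/bin/svnlook",
    "sendmail": "/usr/sbin/sendmail",
    "from_email": "svn@example.com",
    "to_email": "dev@example.com"
  }
}
"""

config = load_config(SAMPLE)
print(config.general.svnlook)
print(config.general.to_email)
```

In the script you would read the text from the file named by config_file and pass it to load_config.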
This code is copied from http://code.google.com/p/closure-library/source/browse/trunk/closure/bin/build/source.py
The Source class's __str__ method refers to self._path.
Is it a special property of self?
I ask because I couldn't find where this variable is defined in the Source class.
import re
_BASE_REGEX_STRING = '^\s*goog\.%s\(\s*[\'"](.+)[\'"]\s*\)'
_PROVIDE_REGEX = re.compile(_BASE_REGEX_STRING % 'provide')
_REQUIRES_REGEX = re.compile(_BASE_REGEX_STRING % 'require')
# This line identifies base.js and should match the line in that file.
_GOOG_BASE_LINE = (
    'var goog = goog || {}; // Identifies this file as the Closure base.')
class Source(object):
    """Scans a JavaScript source for its provided and required namespaces."""
    def __init__(self, source):
        """Initialize a source.
        Args:
          source: str, The JavaScript source.
        """
        self.provides = set()
        self.requires = set()
        self._source = source
        self._ScanSource()
    def __str__(self):
        return 'Source %s' % self._path #!!!!!! what is self_path !!!!
    def GetSource(self):
        """Get the source as a string."""
        return self._source
    def _ScanSource(self):
        """Fill in provides and requires by scanning the source."""
        # TODO: Strip source comments first, as these might be in a comment
        # block. RegExes can be borrowed from other projects.
        source = self.GetSource()
        source_lines = source.splitlines()
        for line in source_lines:
            match = _PROVIDE_REGEX.match(line)
            if match:
                self.provides.add(match.group(1))
            match = _REQUIRES_REGEX.match(line)
            if match:
                self.requires.add(match.group(1))
        # Closure's base file implicitly provides 'goog'.
        for line in source_lines:
            if line == _GOOG_BASE_LINE:
                if len(self.provides) or len(self.requires):
                    raise Exception(
                        'Base files should not provide or require namespaces.')
                self.provides.add('goog')
def GetFileContents(path):
    """Get a file's contents as a string.
    Args:
      path: str, Path to file.
    Returns:
      str, Contents of file.
    Raises:
      IOError: An error occurred opening or reading the file.
    """
    fileobj = open(path)
    try:
        return fileobj.read()
    finally:
        fileobj.close()
No, _path is just an attribute that may or may not be set on an object, like any other attribute. The leading underscore simply means that the author felt it was an internal detail of the object and didn't want it regarded as part of the public interface.
In this particular case, unless something is setting the attribute from outside that source file, it looks like it's simply a mistake. It won't do any harm unless anyone ever tries to call str() on a Source object and probably nobody ever does.
BTW, you seem to be thinking there is something special about self. The name self isn't special in any way: it's a convention to use this name for the first parameter of a method, but it is just a name like any other that refers to the object being processed. So if you could access self._path without causing an error you could access it equally well through any other name for the object.
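A stripped-down sketch makes both points visible. The classes here are simplified stand-ins written for illustration, not the closure-library code: the missing attribute only causes an error once str() is actually called, and the first parameter works under any name.

```python
class Source:
    def __init__(self, source):
        self._source = source  # note: _path is never assigned here

    def __str__(self):
        return 'Source %s' % self._path


src = Source('goog.provide("a");')  # constructing the object works fine
try:
    str(src)
except AttributeError as err:
    print('str() failed:', err)

# Setting the attribute from outside makes __str__ work
src._path = 'a.js'
print(str(src))


class Renamed:
    # 'this' behaves exactly like 'self'; only the convention differs
    def __init__(this, value):
        this.value = value


print(Renamed(3).value)
```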