Finding the version of an application from Python?

Basically, I am trying to find out which version of ArcGIS the user has installed. I looked through the registry and couldn't find anything related to a version string, but I know it is stored within the .exe.
I've done a fair bit of googling and can't find anything really useful. I tried using GetFileVersionInfo, but I seem to get a random mishmash of data.
Any ideas?
EDIT
Sigh....
Turns out pywin32 is not installed on every machine. Does anyone know if it's possible to do the same thing via ctypes?
Also, this is only for Windows.

If you prefer not to use pywin32, you can certainly do this with ctypes.
The trick will be decoding that silly file version structure that comes back.
There's one old mailing list post that does what you're asking. Unfortunately, I don't have a Windows box handy to test this myself right now, but if it doesn't work, it should at least give you a good start.
Here's the code, in case those 2006 archives vanish sometime:
import array
from ctypes import *

def get_file_info(filename, info):
    """
    Extract information from a file.
    """
    # Get size needed for buffer (0 if no info).
    # The W (wide) functions match Python 3 str arguments, which ctypes passes as wchar_t*.
    size = windll.version.GetFileVersionInfoSizeW(filename, None)
    # If no info in file -> empty string
    if not size:
        return ''
    # Create buffer
    res = create_string_buffer(size)
    # Load file information into buffer res
    windll.version.GetFileVersionInfoW(filename, None, size, res)
    # r receives a pointer into res: c_void_p is pointer-sized (c_uint truncates on 64-bit)
    r = c_void_p()
    l = c_uint()
    # Look for codepages
    windll.version.VerQueryValueW(res, '\\VarFileInfo\\Translation',
                                  byref(r), byref(l))
    # If no codepage -> empty string
    if not l.value:
        return ''
    # Take the first codepage (what else ?)
    codepages = array.array('H', string_at(r.value, l.value))
    codepage = tuple(codepages[:2].tolist())
    # Extract information; VerQueryValueW returns 0 if the entry is missing
    query = ('\\StringFileInfo\\%04x%04x\\' + info) % codepage
    if not windll.version.VerQueryValueW(res, query, byref(r), byref(l)):
        return ''
    # The W variant hands back a NUL-terminated UTF-16 string
    return wstring_at(r.value)

print(get_file_info(r'C:\WINDOWS\system32\calc.exe', 'FileVersion'))
--
OK, back near a Windows box. I have actually tried this code now, and it "works for me":
>>> print(get_file_info(r'C:\WINDOWS\system32\calc.exe', 'FileVersion'))
6.1.7600.16385 (win7_rtm.090713-1255)
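The code above just takes the first language/codepage pair ("what else ?"), but a file can list several. The \VarFileInfo\Translation value is an array of (language ID, codepage) pairs, each stored as two little-endian 16-bit WORDs. A minimal sketch that turns the whole array (the bytes returned by string_at(r.value, l.value)) into StringFileInfo query paths you could try in turn; translation_queries is a hypothetical helper name:

```python
import struct


def translation_queries(raw, key="FileVersion"):
    """Turn raw Translation bytes into StringFileInfo query paths.

    Each 4-byte entry is (language ID, codepage) as little-endian WORDs.
    """
    usable = len(raw) - len(raw) % 4  # ignore a trailing partial entry
    words = struct.unpack("<%dH" % (usable // 2), raw[:usable])
    return ["\\StringFileInfo\\%04x%04x\\%s" % (words[i], words[i + 1], key)
            for i in range(0, len(words), 2)]
```

For US English (0x0409) with the Unicode codepage (0x04b0), the path comes out as \StringFileInfo\040904b0\FileVersion.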

There's a GNU/Linux utility called 'strings' that prints the printable character sequences in any file (binary or not). Try running it on the .exe and look for a pattern that looks like a version number.
On Windows, you can get strings here: http://unxutils.sourceforge.net/
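By default GNU strings only prints 8-bit text, while Windows version resources are stored as UTF-16LE, so you may need strings -e l (16-bit little-endian) to see them. A pure-Python approximation of the same idea, scanning for both encodings; the loose version regex (2 to 4 dot-separated numbers) and the function name are assumptions for illustration:

```python
import re

# Loose "version-number-like" pattern; re.ASCII keeps \d to [0-9].
VERSION_RE = re.compile(r"\d+(?:\.\d+){1,3}", re.ASCII)


def find_version_strings(data):
    """Return version-like strings found in raw bytes (8-bit or UTF-16LE text)."""
    hits = VERSION_RE.findall(data.decode("latin-1"))  # what plain `strings` sees
    # UTF-16LE text can start at an even or odd offset, so try both alignments.
    for offset in (0, 1):
        chunk = data[offset:]
        chunk = chunk[:len(chunk) - len(chunk) % 2]
        hits += VERSION_RE.findall(chunk.decode("utf-16-le", errors="replace"))
    return list(dict.fromkeys(hits))  # de-duplicate, keep first-seen order
```

Usage, with a placeholder path: open(r"C:\path\to\program.exe", "rb") and pass .read() to find_version_strings. Expect some false positives from unrelated numbers.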

Related

Windows media foundation decoding audio streams

I am new to Windows Media Foundation. I am currently following some tutorials on the basics to get started with decoding audio, but I am running into a few issues. I'm using Windows 10 64-bit (1809) and Python (ctypes and COM interfaces).
1) IMFSourceReader will not let me select or deselect any stream. I have tried wav and mp3 formats (and multiple different files), but they all error out. According to the docs, to improve performance you should deselect the other streams and select only the stream you want, in this case audio.
However:
source_reader.SetStreamSelection(MF_SOURCE_READER_ANY_STREAM, False)
Produces an error:
OSError: [WinError -1072875853] The stream number provided was invalid.
This should be valid, since the MF_SOURCE_READER_ANY_STREAM value (a DWORD of 4294967294) should apply to any stream. Or am I wrong about that?
I've tried seeing if I could just select the audio stream:
source_reader.SetStreamSelection(MF_SOURCE_READER_FIRST_AUDIO_STREAM, True)
Which produces a different error:
OSError: exception: access violation reading 0x0000000000000001
My current code up until that point:
MFStartup(MF_VERSION) # initialize
source_reader = IMFSourceReader()
filename = "C:\\test.mp3"
MFCreateSourceReaderFromURL(filename, None, ctypes.byref(source_reader)) # out: source reader.
if source_reader:  # Not null
    source_reader.SetStreamSelection(MF_SOURCE_READER_ANY_STREAM, False)  # invalid stream #?
    source_reader.SetStreamSelection(MF_SOURCE_READER_FIRST_AUDIO_STREAM, True)  # access violation??
IMFSourceReader seems to be functioning just fine for other methods, such as GetCurrentMediaType, SetCurrentMediaType, etc. Could it still return an IMFSourceReader if something went wrong?
2) I am not sure whether not being able to select the streams is causing further issues (I suspect it is). If I just skip selecting or deselecting streams, everything actually works until I try to convert a sample into a single buffer with ConvertToContiguousBuffer, which, according to the docs, outputs an IMFMediaBuffer. The problem is that after running it, it returns S_OK, but the buffer is null. I used GetBufferCount to make sure the sample contains at least some buffers, and it always returns 1-3 depending on the file used, so it shouldn't be empty.
Here is the relevant code:
while True:
    flags = DWORD()
    sample = IMFSample()
    source_reader.ReadSample(streamIndex, 0, None, ctypes.byref(flags), None, ctypes.byref(sample))  # flags, sample [out]
    if flags.value & MF_SOURCE_READERF_ENDOFSTREAM:
        print("READ ALL OF STREAM")
        break
    if sample:
        buffer_count = DWORD()
        sample.GetBufferCount(ctypes.byref(buffer_count))
        print("BUFFER COUNT IN SAMPLE", buffer_count.value)
    else:
        print("NO SAMPLE")
        continue
    buffer = IMFMediaBuffer()
    hr = sample.ConvertToContiguousBuffer(ctypes.byref(buffer))
    print("Conversion succeeded", hr == 0)  # true
    if buffer:
        print("CREATED BUFFER")
    else:
        print("BUFFER IS NULL")
        break
I am unsure where to go from here; I couldn't find many explanations on the internet regarding these specific issues. Is WMF still the go-to for Windows 10? Should I be using something else? I really am stumped, and any help is greatly appreciated.
Try using MF_SOURCE_READER_ALL_STREAMS instead of MF_SOURCE_READER_ANY_STREAM.
source_reader.SetStreamSelection(MF_SOURCE_READER_ALL_STREAMS, False)
When reading the samples, you also need to specify a valid stream index, and I suspect 0 isn't one in your case. Try:
source_reader.ReadSample(DWORD(MF_SOURCE_READER_FIRST_AUDIO_STREAM), 0, None, ctypes.byref(flags), None, ctypes.byref(sample))
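For reference, here are the stream-index constants as I read the MF_SOURCE_READER_CONSTANTS enumeration in mfreadwrite.h (worth double-checking against your SDK headers). Note that in that header ALL_STREAMS and ANY_STREAM share the value 0xFFFFFFFE. In Python there is no C-style (DWORD) cast; the as_dword helper below (a hypothetical name) is one way to express it, though whether your wrapper wants a ctypes instance or a plain int depends on how its argtypes are declared:

```python
import ctypes

# MF_SOURCE_READER_CONSTANTS (mfreadwrite.h)
MF_SOURCE_READER_INVALID_STREAM_INDEX = 0xFFFFFFFF
MF_SOURCE_READER_ALL_STREAMS = 0xFFFFFFFE
MF_SOURCE_READER_ANY_STREAM = 0xFFFFFFFE
MF_SOURCE_READER_FIRST_AUDIO_STREAM = 0xFFFFFFFD
MF_SOURCE_READER_FIRST_VIDEO_STREAM = 0xFFFFFFFC
MF_SOURCE_READER_MEDIASOURCE = 0xFFFFFFFF


def as_dword(value):
    """Python analogue of the C cast (DWORD)value: keep the low 32 bits, unsigned."""
    return ctypes.c_uint32(value)
```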
Also, does your Python wrapper return a result when you make your Media Foundation calls? Nearly all Media Foundation methods return an HRESULT, and it's important to check that it equals S_OK before proceeding. If you don't, it's very hard to work out where a wrong call occurred.
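ctypes usually hands an HRESULT back as a signed int, which is why the question's error shows up as -1072875853; masked to 32 bits that is 0xC00D36B3, MF_E_INVALIDSTREAMNUMBER ("The stream number provided was invalid"). A small sketch of a check helper in the spirit of the FAILED() macro; hresult_hex and check_hr are hypothetical names:

```python
import ctypes


def hresult_hex(hr):
    """Format an HRESULT (possibly a negative Python int) as 0xXXXXXXXX."""
    return "0x%08X" % (hr & 0xFFFFFFFF)


def check_hr(hr, what="call"):
    """Raise if the HRESULT signals failure (negative as a signed 32-bit value)."""
    if ctypes.c_int32(hr).value < 0:
        raise OSError("%s failed with HRESULT %s" % (what, hresult_hex(hr)))
    return hr  # S_OK (0) and S_FALSE (1) both count as success
```

Usage: check_hr(source_reader.SetStreamSelection(...), "SetStreamSelection"), assuming your wrapper returns the raw HRESULT rather than raising on its own.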
Is WMF still the go-to for Windows 10?
Many people have asked that question. The answer depends on what you need to do (in your case audio codec support is very limited so perhaps it isn't the best option). But for things like rendering audio/video, reading/writing to media files, audio/video device capture etc. it is still the case that Media Foundation is the most up to date Microsoft supported option.
Usually, with Media Foundation, you need to call CoInitializeEx before MFStartup:
Media Foundation and COM
Best Practices for Applications
In Media Foundation, asynchronous processing and callbacks are handled by work queues. Work queues always have multithreaded apartment (MTA) threads, so an application will have a simpler implementation if it runs on an MTA thread as well. Therefore, it is recommended to call CoInitializeEx with the COINIT_MULTITHREADED flag.
MFCreateSourceReaderFromURL function
Remarks
Call CoInitialize(Ex) and MFStartup before calling this function.
In your code, I don't see the call to CoInitializeEx.
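A minimal ctypes sketch of that initialization order (COM in the multithreaded apartment first, then Media Foundation). The constant values are from objbase.h and mfapi.h as I understand them, with MF_VERSION assuming the Windows 7+ SDK value 0x0002; start_media_foundation and stop_media_foundation are hypothetical names:

```python
import ctypes
import sys

COINIT_MULTITHREADED = 0x0
MF_SDK_VERSION = 0x0002
MF_API_VERSION = 0x0070
MF_VERSION = (MF_SDK_VERSION << 16) | MF_API_VERSION
MFSTARTUP_FULL = 0


def _check(hr, what):
    # Negative signed HRESULT means failure (like the FAILED() macro).
    if ctypes.c_int32(hr).value < 0:
        raise OSError("%s failed: 0x%08X" % (what, hr & 0xFFFFFFFF))


def start_media_foundation():
    """Initialize COM (MTA) on this thread, then start Media Foundation."""
    if sys.platform != "win32":
        raise OSError("Media Foundation is only available on Windows")
    # S_FALSE (1) just means COM was already initialized on this thread.
    _check(ctypes.windll.ole32.CoInitializeEx(None, COINIT_MULTITHREADED), "CoInitializeEx")
    _check(ctypes.windll.mfplat.MFStartup(MF_VERSION, MFSTARTUP_FULL), "MFStartup")


def stop_media_foundation():
    """Undo start_media_foundation() in reverse order."""
    ctypes.windll.mfplat.MFShutdown()
    ctypes.windll.ole32.CoUninitialize()
```

If something else (for example a COM library you import) already initialized the thread as single-threaded, CoInitializeEx returns RPC_E_CHANGED_MODE (0x80010106), which this sketch reports as an error.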
EDIT
Because you are using an audio file, you should normally have only one audio stream, and its index should be 0.
Try :
source_reader.SetStreamSelection(0, True)
and tell us the result.

How can I retrieve a file's details in Windows using the Standard Library for Python 2

I need to read the details for a file in Windows so that I can interrogate the file's 'File version' as displayed in the Details tab of the File Properties window.
I haven't found anything in the standard library that makes this very easy to accomplish but figured if I could find the right windows function, I could probably accomplish it using ctypes.
Does anyone have example code, or can anyone point me to a Windows function that would let me read this info? I already took a look at GetFileAttributes, but as far as I could tell that wasn't quite right.
Use the Win32 API Version Information functions from ctypes. The API is a little fiddly to use, but I also want this, so I have thrown together a quick script as an example.
usage: version_info.py [-h] [--lang LANG] [--codepage CODEPAGE] path
It can also be used as a module; see the VersionInfo class. Checked with Python 2.7 and 3.6 against a few files.
import array
from ctypes import *

def get_file_info(filename, info):
    """
    Extract information from a file.
    """
    # Get size needed for buffer (0 if no info).
    # The W (wide) functions match Python 3 str arguments, which ctypes passes as wchar_t*.
    size = windll.version.GetFileVersionInfoSizeW(filename, None)
    # If no info in file -> empty string
    if not size:
        return ''
    # Create buffer
    res = create_string_buffer(size)
    # Load file information into buffer res
    windll.version.GetFileVersionInfoW(filename, None, size, res)
    # r receives a pointer into res: c_void_p is pointer-sized (c_uint truncates on 64-bit)
    r = c_void_p()
    l = c_uint()
    # Look for codepages
    windll.version.VerQueryValueW(res, '\\VarFileInfo\\Translation',
                                  byref(r), byref(l))
    # If no codepage -> empty string
    if not l.value:
        return ''
    # Take the first codepage (what else ?)
    codepages = array.array('H', string_at(r.value, l.value))
    codepage = tuple(codepages[:2].tolist())
    # Extract information; VerQueryValueW returns 0 if the entry is missing
    query = ('\\StringFileInfo\\%04x%04x\\' + info) % codepage
    if not windll.version.VerQueryValueW(res, query, byref(r), byref(l)):
        return ''
    # The W variant hands back a NUL-terminated UTF-16 string
    return wstring_at(r.value)
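Besides the free-form StringFileInfo strings, the numeric file version is stored in the root block ("\\") as a VS_FIXEDFILEINFO structure, with the four parts packed into two DWORDs. A sketch that reads it with ctypes; the field list follows the documented VS_FIXEDFILEINFO layout, and get_fixed_file_version and split_version are hypothetical names:

```python
import ctypes
import sys


class VS_FIXEDFILEINFO(ctypes.Structure):
    # 13 DWORDs, in the documented order
    _fields_ = [(name, ctypes.c_uint32) for name in (
        "dwSignature", "dwStrucVersion",
        "dwFileVersionMS", "dwFileVersionLS",
        "dwProductVersionMS", "dwProductVersionLS",
        "dwFileFlagsMask", "dwFileFlags", "dwFileOS",
        "dwFileType", "dwFileSubtype", "dwFileDateMS", "dwFileDateLS")]


def split_version(ms, ls):
    """Unpack the two DWORDs into (major, minor, build, revision)."""
    return (ms >> 16, ms & 0xFFFF, ls >> 16, ls & 0xFFFF)


def get_fixed_file_version(filename):
    """Return the numeric file version tuple, or None if the file has none."""
    if sys.platform != "win32":
        raise OSError("version.dll is only available on Windows")
    version = ctypes.windll.version
    size = version.GetFileVersionInfoSizeW(filename, None)
    if not size:
        return None
    buf = ctypes.create_string_buffer(size)
    if not version.GetFileVersionInfoW(filename, 0, size, buf):
        return None
    ptr, length = ctypes.c_void_p(), ctypes.c_uint()
    if not version.VerQueryValueW(buf, "\\", ctypes.byref(ptr), ctypes.byref(length)):
        return None
    info = VS_FIXEDFILEINFO.from_address(ptr.value)  # points into buf
    if info.dwSignature != 0xFEEF04BD:
        return None
    return split_version(info.dwFileVersionMS, info.dwFileVersionLS)
```

For the calc.exe example in the first question, the packed values 0x00060001 and 0x1DB04001 unpack to (6, 1, 7600, 16385).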

Accessing alternate clipboard formats from python

Copying to the clipboard from an application that supports rich text will typically add the text in several formats. I need to find out the available formats, and then retrieve the clipboard contents in a selected format. In case it matters, I'm interested in rich text formats (from Word, Acrobat, browsers, ...), not in image data or other exotica.
I've looked and looked, but the solutions I've found are limited to plain text, super outdated, specific to Windows (I'm on OS X), reliant on the command-line utilities pbcopy and pbpaste (which don't handle all clipboard formats), or several of the above.
So: How can I get a list of the formats present in the clipboard, and extract its contents in a format of my choice?
Platforms, in order of interest: system-independent (I wish), OS X Mountain Lion (my current platform) or similar, other platforms (I plan to distribute my code).
Selected links
pyperclip: Looks interesting, but on OS X it delegates to pbcopy and pbpaste which support text, rtf and ps formats only.
This recipe from activestate is for Windows only, but shows how to get HTML. (This SO question refers to it).
This SO answer is also specific to win32clipboard.
This question is about dragging and dropping files to the clipboard (on Windows). Interesting, but no help with what I need.
This tkinter-based solution is simple and still works on OS X, but it only gets plain text, and I've found no evidence that tkinter can handle anything else.
This shows near-identical tkinter code for putting text on the clipboard.
Edit (May 2017)
I now have a solution for OS X (see self-answer below), but I would appreciate hearing if (and how) pyperclip or another module can do the same on Windows. Pyperclip gets its hands deep in the Windows API, so it can't be very far from supporting a listing and selection of all available formats.
It's quite straightforward on OS X with the help of the module richxerox, available on PyPI. It requires system support, including the Apple AppKit and Foundation modules. I had trouble building the Objective-C pieces for Python 3, so initially I only got this to work with Python 2. However, Anaconda 3 comes with all the necessary pieces preinstalled.
Here's a demo that prints the available clipboard types, and then fetches and prints each one:
import richxerox as rx

# Dump formats
verbose = True
if verbose:
    print(rx.available(neat=False, dyn=True))
else:
    print(rx.available())

# Dump contents in all formats
for k, v in rx.pasteall(neat=False, dyn=True).items():
    line = "\n*** " + k + ": " + v
    print(line)
Output:
(
"public.html",
"public.utf8-plain-text"
)
*** public.html: <html><head><meta http-equiv="content-type" content="text/html; charset=utf-8">
</head><body><a href="http://coffeeghost.net/2010/10/09/pyperclip-a-cross-platform-clipboard-module-for-python/"
rel="nofollow noreferrer">pyperclip</a>: Looks interesting</body></html>
*** public.utf8-plain-text: pyperclip: Looks interesting
To print in a desired format with fall-back to text, you could use this:
paste_format = "rtf"
content = rx.paste(paste_format)
if not content:
    content = rx.paste("text")
Or you could first check if a format is available:
if "public.rtf" in rx.available():
    content = rx.paste("rtf")
else:
    content = rx.paste("text")
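The two fall-back snippets can be generalized to an ordered list of preferred types. A sketch, assuming (as in the examples) that rx.available() returns an iterable of UTI strings and rx.paste() takes a short format name; paste_preferred is a hypothetical helper, and any (UTI, name) pairing beyond ("public.rtf", "rtf") is an assumption not verified against richxerox:

```python
def paste_preferred(available, paste, preferences, fallback="text"):
    """Return (format_name, content) for the first preferred type present.

    available   -- iterable of type identifiers, e.g. rx.available()
    paste       -- callable taking a short format name, e.g. rx.paste
    preferences -- ordered (uti, format_name) pairs to try
    """
    present = set(available)
    for uti, name in preferences:
        if uti in present:
            content = paste(name)
            if content:  # skip empty results and keep looking
                return name, content
    return fallback, paste(fallback)
```

Usage: paste_preferred(rx.available(), rx.paste, [("public.rtf", "rtf")]).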

How to unlock a "secured" (read-protected) PDF in Python?

In Python I'm using pdfminer to read the text from a pdf with the code below this message. I now get an error message saying:
  File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfpage.py", line 124, in get_pages
    raise PDFTextExtractionNotAllowed('Text extraction is not allowed: %r' % fp)
PDFTextExtractionNotAllowed: Text extraction is not allowed: <cStringIO.StringO object at 0x7f79137a1ab0>
When I open this PDF with Acrobat Pro, it turns out it is secured (or "read protected"). From this link, however, I read that there is a multitude of services which can disable this read protection easily (for example, pdfunlock.com). When diving into the source of pdfminer, I see that the error above is generated by these lines:
if check_extractable and not doc.is_extractable:
    raise PDFTextExtractionNotAllowed('Text extraction is not allowed: %r' % fp)
Since there are many services that can disable this read protection within a second, I presume it is really easy to do. It seems that .is_extractable is a simple attribute of the doc, but I don't think it is as simple as changing .is_extractable to True.
Does anybody know how I can disable the read protection on a pdf using Python? All tips are welcome!
================================================
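Below is the code I currently use to extract text from PDFs that are not read-protected.
from cStringIO import StringIO  # Python 2, as in the traceback; use io.BytesIO on Python 3
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
from pdfminer.pdfpage import PDFPage

def getTextFromPDF(rawFile):
    resourceManager = PDFResourceManager(caching=True)
    outfp = StringIO()
    device = TextConverter(resourceManager, outfp, codec='utf-8', laparams=LAParams(), imagewriter=None)
    interpreter = PDFPageInterpreter(resourceManager, device)
    fileData = StringIO()
    fileData.write(rawFile)
    fileData.seek(0)  # rewind so parsing starts at the beginning of the data
    for page in PDFPage.get_pages(fileData, set(), maxpages=0, caching=True, check_extractable=True):
        interpreter.process_page(page)
    fileData.close()
    device.close()
    result = outfp.getvalue()
    outfp.close()
    return result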
I had some issues trying to get qpdf to behave in my program. I found a useful library, pikepdf, that is based on qpdf and automatically converts PDFs to be extractable.
The code to use this is pretty straightforward:
import pikepdf
pdf = pikepdf.open('unextractable.pdf')
pdf.save('extractable.pdf')
As far as I know, in most cases the full content of the PDF is actually encrypted, using the password as the encryption key, and so simply setting .is_extractable to True isn't going to help you.
Per this thread:
Does a library exist to remove passwords from PDFs programmatically?
I would recommend removing the read-protection with a command-line tool such as qpdf (easily installable, e.g. on Ubuntu use apt-get install qpdf if you don't have it already):
qpdf --password=PASSWORD --decrypt SECURED.pdf UNSECURED.pdf
Then open the unlocked file with pdfminer and do your stuff.
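If you'd rather drive qpdf from your Python program than from a shell, a small subprocess wrapper is one option. Passing the arguments as a list avoids shell-quoting problems with the password. qpdf uses exit status 0 for success and 3 for success with warnings, so both are accepted here; the function names are hypothetical:

```python
import shutil
import subprocess


def qpdf_decrypt_command(src, dst, password=""):
    """Build argv for: qpdf [--password=PASSWORD] --decrypt SRC DST."""
    cmd = ["qpdf"]
    if password:
        cmd.append("--password=" + password)
    return cmd + ["--decrypt", src, dst]


def decrypt_with_qpdf(src, dst, password=""):
    """Write a decrypted copy of src to dst using the qpdf command-line tool."""
    if shutil.which("qpdf") is None:
        raise FileNotFoundError("qpdf is not on PATH")
    cmd = qpdf_decrypt_command(src, dst, password)
    result = subprocess.run(cmd)
    if result.returncode not in (0, 3):  # 3 = finished with warnings
        raise subprocess.CalledProcessError(result.returncode, cmd)
```

After decrypt_with_qpdf("SECURED.pdf", "UNSECURED.pdf", "PASSWORD"), pass UNSECURED.pdf to pdfminer as before.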
For a pure-Python solution, you can try using PyPDF2 and its .decrypt() method, but it doesn't work with all types of encryption, so really, you're better off just using qpdf - see:
https://github.com/mstamy2/PyPDF2/issues/53
I used the code below with pikepdf and was able to overwrite the original file.
import pikepdf
pdf = pikepdf.open('filepath', allow_overwriting_input=True)
pdf.save('filepath')
In my case there was no password, but simply setting check_extractable=False circumvented the PDFTextExtractionNotAllowed exception for a problematic file (that opened fine in other viewers).
Full disclosure: I am one of the maintainers of pdfminer.six, a community-maintained version of pdfminer for Python 3.
This issue was fixed in 2020 by disabling the check_extractable by default. It now shows a warning instead of raising an error.
Similar question and answer here.
The check_extractable=True argument is by design.
Some PDFs explicitly disallow text extraction, and PDFMiner follows that directive. You can override it (by passing check_extractable=False), but do so at your own risk.
If you want to unlock all pdf files in a folder without renaming them, you may use this code:
import glob, os, pikepdf

p = os.getcwd()
for file in glob.glob('*.pdf'):
    file_path = os.path.join(p, file).replace('\\', '/')
    init_pdf = pikepdf.open(file_path)
    new_pdf = pikepdf.new()
    new_pdf.pages.extend(init_pdf.pages)
    new_pdf.save(str(file))
In pikepdf, you cannot (by default) overwrite the existing file by saving it under the same name. Instead, copy the pages into a newly created, empty PDF and save that.
I too faced the same problem parsing a secured PDF, and it was resolved using the pikepdf library. When I tried this library in my Jupyter notebook on Windows, it gave errors, but it worked smoothly on Ubuntu.
If you've forgotten the password to your PDF, below is a generic script which tries a LOT of password combinations on the same PDF. It uses pikepdf, but you can update the function check_password to use something else.
Usage example:
I used this when I had forgotten the password to a bank PDF. I knew that my bank always encrypts this kind of PDF with the same password structure:
Total length = 8
First 4 characters = uppercase letters.
Last 4 characters = digits.
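For this structure, the search space is the product of the per-position alphabet sizes (the same quantity the script computes as num_passwords):

```latex
N = \underbrace{26^4}_{\text{letters}} \times \underbrace{10^4}_{\text{digits}}
  = 456{,}976 \times 10{,}000 = 4{,}569{,}760{,}000
```

So a full sweep is roughly 4.6 billion attempts, and for a uniformly random password of this shape you would expect to hit it after about half of that on average.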
I call the script as follows:
check_passwords(
    pdf_file_path='/Users/my_name/Downloads/XXXXXXXX.pdf',
    combination=[
        ALPHABET_UPPERCASE,
        ALPHABET_UPPERCASE,
        ALPHABET_UPPERCASE,
        ALPHABET_UPPERCASE,
        NUMBER,
        NUMBER,
        NUMBER,
        NUMBER,
    ]
)
Password-checking script:
(Requires Python 3.8, with the libraries numpy and pikepdf)
from typing import *
from itertools import product
import time, pikepdf, math, numpy as np
from pikepdf import PasswordError

ALPHABET_UPPERCASE: Sequence[str] = tuple('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
ALPHABET_LOWERCASE: Sequence[str] = tuple('abcdefghijklmnopqrstuvwxyz')
NUMBER: Sequence[str] = tuple('0123456789')

def as_list(l):
    if isinstance(l, (list, tuple, set, np.ndarray)):
        l = list(l)
    else:
        l = [l]
    return l

def human_readable_numbers(n, decimals: int = 0):
    n = round(n)
    if n < 1000:
        return str(n)
    names = ['', 'thousand', 'million', 'billion', 'trillion', 'quadrillion']
    n = float(n)
    idx = max(0, min(len(names) - 1,
                     int(math.floor(0 if n == 0 else math.log10(abs(n)) / 3))))
    return f'{n/10**(3*idx):.{decimals}f} {names[idx]}'

def check_password(pdf_file_path: str, password: str) -> bool:
    ## You can modify this function to use something other than pikepdf.
    ## It should return True on success, and False on password failure.
    try:
        # The with-block closes the file again so handles don't pile up.
        with pikepdf.open(pdf_file_path, password=password):
            return True
    except PasswordError:
        return False

def check_passwords(pdf_file_path, combination, log_freq: int = int(1e4)):
    combination = [tuple(as_list(c)) for c in combination]
    print('Trying all combinations:')
    for i, c in enumerate(combination):
        print(f"{i}) {c}")
    # math.prod (Python 3.8+) replaces np.product, which NumPy 2.0 removed.
    num_passwords: int = math.prod(len(x) for x in combination)
    passwords = product(*combination)
    success: Union[bool, str] = False
    count: int = 0
    start: float = time.perf_counter()
    for password in passwords:
        password = ''.join(password)
        if check_password(pdf_file_path, password=password):
            success = password
            print(f'SUCCESS with password "{password}"')
            break
        count += 1
        if count % int(log_freq) == 0:
            now = time.perf_counter()
            print(f'Tried {human_readable_numbers(count)} ({100*count/num_passwords:.1f}%) of {human_readable_numbers(num_passwords)} passwords in {(now-start):.3f} seconds ({human_readable_numbers(count/(now-start))} passwords/sec). Latest password tried: "{password}"')
    end: float = time.perf_counter()
    msg: str = f'Tried {count} passwords in {1000*(end-start):.3f}ms ({count/(end-start):.3f} passwords/sec). '
    msg += f"Correct password: {success}" if success is not False else f"All {num_passwords} passwords failed."
    print(msg)
Comments
Obviously, don't use this to break into PDFs which are not your own. I hold no responsibility over how you use this script or any consequences of using it.
A lot of optimizations can be made.
Right now check_password uses pikepdf, which loads the file from disk for every "check". This is really slow; ideally it should run against an in-memory copy. I haven't figured out a way to do that, though.
You can probably speed this up a LOT by calling qpdf directly using C++, which is much better than Python for this kind of stuff.
I would avoid multi-processing here, since we're calling the same qpdf binary (which is normally a system-wide installation), which might become the bottleneck.
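pikepdf.open() accepts a readable file-like object as well as a path, so one way to stop re-reading the file from disk on every attempt is to load the bytes once and wrap them in a fresh io.BytesIO per attempt. A sketch; make_in_memory_checker is a hypothetical name, and the open_pdf/password_error hooks default to pikepdf but let another backend be swapped in, as the script's own comment suggests. How much this actually speeds things up has not been measured here:

```python
import io


def make_in_memory_checker(pdf_bytes, open_pdf=None, password_error=None):
    """Return check(password) -> bool that works on an in-memory copy of a PDF."""
    if open_pdf is None:
        import pikepdf  # default backend, imported only when needed
        open_pdf, password_error = pikepdf.open, pikepdf.PasswordError

    def check(password):
        try:
            # A new BytesIO per attempt: no disk access after the first read.
            with open_pdf(io.BytesIO(pdf_bytes), password=password):
                return True
        except password_error:
            return False

    return check
```

To use it in the script, read the file once (with open(pdf_file_path, "rb") as f: check = make_in_memory_checker(f.read())) and call check(password) in place of check_password(pdf_file_path, password=password).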

Finding the user's "My Documents" path

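I have a small program that needs to create a small .txt file in the user's 'My Documents' folder. Here's the code I have for that:
# "myfile.txt" is a placeholder name; opening the folder path itself fails.
# The raw string keeps "\U" from being read as an escape in Python 3.
with open(r'C:\Users\MYNAME\Documents\myfile.txt', 'w') as textfile:
    lines = ['stuff goes here']
    textfile.writelines(lines)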
The problem is that if other people use it, how do I change the MYNAME to their account name?
Use os.path.expanduser(path), see http://docs.python.org/library/os.path.html
e.g. expanduser('~/filename')
This works on both Unix and Windows, according to the docs.
Edit: forward slash due to Sven's comment.
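Note that expanduser('~') gives the home directory, not Documents itself. A minimal sketch applying it to the question, assuming the folder is literally named Documents directly under the home directory (redirected or localized folders are not handled); both function names are hypothetical:

```python
import os


def documents_file_path(filename):
    """Path to `filename` inside ~/Documents for the current user."""
    return os.path.join(os.path.expanduser("~"), "Documents", filename)


def write_text_file(path, lines):
    """Write lines to path, creating the parent folder if needed."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as textfile:
        textfile.writelines(lines)
```

Usage: write_text_file(documents_file_path("stuff.txt"), ["stuff goes here"]).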
This works without any extra libs:
import ctypes.wintypes
CSIDL_PERSONAL = 5 # My Documents
SHGFP_TYPE_CURRENT = 0 # Get current, not default value
buf= ctypes.create_unicode_buffer(ctypes.wintypes.MAX_PATH)
ctypes.windll.shell32.SHGetFolderPathW(None, CSIDL_PERSONAL, None, SHGFP_TYPE_CURRENT, buf)
print(buf.value)
It also works if the user has changed the Documents location and/or the default save location.
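Since Windows Vista, SHGetFolderPath is documented as a compatibility wrapper around SHGetKnownFolderPath. A ctypes sketch of the newer call, using the FOLDERID_Documents GUID from KnownFolders.h; the non-Windows fallback to ~/Documents is an assumption, and documents_folder and guid_from_uuid are hypothetical names. As far as I know, like CSIDL_PERSONAL this returns the Documents known folder (including redirection), not a library's default save location:

```python
import ctypes
import os
import sys
import uuid

# FOLDERID_Documents, as listed in KnownFolders.h
FOLDERID_DOCUMENTS = uuid.UUID("FDD39AD0-238F-46AF-ADB4-6C85480369C7")


class GUID(ctypes.Structure):
    _fields_ = [("Data1", ctypes.c_uint32), ("Data2", ctypes.c_uint16),
                ("Data3", ctypes.c_uint16), ("Data4", ctypes.c_ubyte * 8)]


def guid_from_uuid(u):
    """Copy a uuid.UUID into a GUID struct (bytes_le matches its memory layout)."""
    return GUID.from_buffer_copy(u.bytes_le)


def documents_folder():
    """Return the current user's Documents folder."""
    if sys.platform != "win32":
        return os.path.join(os.path.expanduser("~"), "Documents")  # assumption
    path_ptr = ctypes.c_wchar_p()
    folder_id = guid_from_uuid(FOLDERID_DOCUMENTS)
    hr = ctypes.windll.shell32.SHGetKnownFolderPath(
        ctypes.byref(folder_id), 0, None, ctypes.byref(path_ptr))
    try:
        if ctypes.c_int32(hr).value < 0:
            raise OSError("SHGetKnownFolderPath failed: 0x%08X" % (hr & 0xFFFFFFFF))
        return path_ptr.value  # copied into a Python str before the free below
    finally:
        # The caller frees the returned string, whether or not the call succeeded.
        ctypes.windll.ole32.CoTaskMemFree(path_ptr)
```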
On Windows, you can use something similar what is shown in the accepted answer to the question: Python, get windows special folders for currently logged-in user.
For the My Documents folder path, use shellcon.CSIDL_PERSONAL in the shell.SHGetFolderPath() function call instead of shellcon.CSIDL_MYPICTURES.
So, assuming you have the PyWin32 extensions[1] installed, this might work (see the caveat in the Update section below):
>>> from win32com.shell import shell, shellcon
>>> shell.SHGetFolderPath(0, shellcon.CSIDL_PERSONAL, None, 0)
u'<path\\to\\folder>'
Update: I just read something that said that CSIDL_PERSONAL won't return the correct folder if the user has changed the default save folder in the Win7 Documents library. This is referring to what you can do in library's Properties dialog:
The checkmark means that the path is set as the default save location.
I am currently unaware of a way to call the SHLoadLibraryFromKnownFolder() function through PyWin32 (there currently isn't a shell.SHLoadLibraryFromKnownFolder). However, it should be possible to do so using the ctypes module.
[1] Installers for the latest versions of the Python for Windows Extensions are currently available from: http://sourceforge.net/projects/pywin32
