Copying to the clipboard from an application that supports rich text will typically add the text in several formats. I need to find out the available formats, and then retrieve the clipboard contents in a selected format. In case it matters, I'm interested in rich text formats (from Word, Acrobat, browsers, ...), not in image data or other exotica.
I've looked and looked, but the solutions I've found are limited to plain text, super outdated, specific to Windows (I'm on OS X), reliant on the commandline utilities pbcopy and pbpaste (which don't handle all clipboard formats), or several of the above.
So: How can I get a list of the formats present in the clipboard, and extract its contents in a format of my choice?
Platforms, in order of interest: system-independent (I wish), OS X Mountain Lion (my current platform) or similar, other platforms (I plan to distribute my code).
Selected links
pyperclip: Looks interesting, but on OS X it delegates to pbcopy and pbpaste which support text, rtf and ps formats only.
This recipe from activestate is for Windows only, but shows how to get HTML. (This SO question refers to it).
This SO answer is also specific to win32clipboard.
This question is about dragging and dropping files to the clipboard (on Windows). Interesting, but no help with what I need.
This tkinter-based solution is simple and still works on OS X, but it only gets plain text-- and I've found no evidence that tkinter can handle anything else.
This shows near-identical tkinter code for putting text on the clipboard.
Edit (May 2017)
I now have a solution for OS X (see self-answer below), but I would appreciate hearing if (and how) pyperclip or another module can do the same on Windows. Pyperclip gets its hands deep in the Windows API, so it can't be very far from supporting a listing and selection of all available formats.
It's quite straightforward on OS X with the help of the module richxerox, available on pypi. It requires system support including the Apple AppKit and Foundation modules. I had trouble building Objective C for Python 3, so that initially I had only gotten this to work for Python 2. Anaconda 3 comes with all the necessary pieces preinstalled, however.
Here's a demo that prints the available clipboard types, and then fetches and prints each one:
import richxerox as rx
# Dump formats
verbose = True
if verbose:
print(rx.available(neat=False, dyn=True))
else:
print(rx.available())
# Dump contents in all formats
for k, v in rx.pasteall(neat=False, dyn=True).items():
line = "\n*** "+k+": "+v
print(line)
Output:
(
"public.html",
"public.utf8-plain-text"
)
*** public.html: <html><head><meta http-equiv="content-type" content="text/html; charset=utf-8">
</head><body><a href="http://coffeeghost.net/2010/10/09/pyperclip-a-cross-platform-clipboard-module-for-python/"
rel="nofollow noreferrer">pyperclip</a>: Looks interesting</body></html>
*** public.utf8-plain-text: pyperclip: Looks interesting
To print in a desired format with fall-back to text, you could use this:
paste_format = "rtf"
content = rx.paste(paste_format)
if not content:
content = rx.paste("text")
Or you could first check if a format is available:
if "public.rtf" in rx.available():
content = rx.paste("rtf")
else:
content = rx.paste("text")
Related
I have this small program and it needs to create a small .txt file in their 'My Documents' Folder. Here's the code I have for that:
textfile=open('C:\Users\MYNAME\Documents','w')
lines=['stuff goes here']
textfile.writelines(lines)
textfile.close()
The problem is that if other people use it, how do I change the MYNAME to their account name?
Use os.path.expanduser(path), see http://docs.python.org/library/os.path.html
e.g. expanduser('~/filename')
This works on both Unix and Windows, according to the docs.
Edit: forward slash due to Sven's comment.
This works without any extra libs:
import ctypes.wintypes
CSIDL_PERSONAL = 5 # My Documents
SHGFP_TYPE_CURRENT = 0 # Get current, not default value
buf= ctypes.create_unicode_buffer(ctypes.wintypes.MAX_PATH)
ctypes.windll.shell32.SHGetFolderPathW(None, CSIDL_PERSONAL, None, SHGFP_TYPE_CURRENT, buf)
print(buf.value)
Also works if documents location and/or default save location is changed by user.
On Windows, you can use something similar what is shown in the accepted answer to the question: Python, get windows special folders for currently logged-in user.
For the My Documents folder path, useshellcon.CSIDL_PERSONALin the shell.SHGetFolderPath() function call instead of shellcon.CSIDL_MYPICTURES.
So, assuming you have the PyWin32 extensions1 installed, this might work (see caveat in Update section below):
>>> from win32com.shell import shell, shellcon
>>> shell.SHGetFolderPath(0, shellcon.CSIDL_PERSONAL, None, 0)
u'<path\\to\\folder>'
Update: I just read something that said that CSIDL_PERSONAL won't return the correct folder if the user has changed the default save folder in the Win7 Documents library. This is referring to what you can do in library's Properties dialog:
The checkmark means that the path is set as the default save location.
I currently am unware of a way to call the SHLoadLibraryFromKnownFolder() function through PyWin32 (there currently isn't a shell.SHLoadLibraryFromKnownFolder. However it should be possible to do so using the ctypes module.
1Installers for the latest versions of the Python for Windows Extensions are currently available from: http://sourceforge.net/projects/pywin32
I'm working on a python script that creates numerous images files based on a variety of inputs in OS X Yosemite. I am trying to write the inputs used to create each file as 'Finder comments' as each file is created so that IF the the output is visually interesting I can look at the specific input values that generated the file. I've verified that this can be done easily with apple script.
tell application "Finder" to set comment of (POSIX file "/Users/mgarito/Desktop/Random_Pixel_Color/2015-01-03_14.04.21.png" as alias) to {Val1, Val2, Val3} as Unicode text
Afterward, upon selecting the file and showing its info (cmd+i) the Finder comments clearly display the expected text 'Val1, Val2, Val2'.
This is further confirmed by running mdls [File/Path/Name] before and after the applescript is used which clearly shows the expected text has been properly added.
The problem is I can't figure out how to incorporate this into my python script to save myself.
Im under the impression the solution should* be something to the effect of:
VarList = [Var1, Var2, Var3]
Fiele = [File/Path/Name]
file.os.system.add(kMDItemFinderComment, VarList)
As a side note I've also look at xattr -w [Attribute_Name] [Attribute_Value] [File/Path/Name] but found that though this will store the attribute, it is not stored in the desired location. Instead it ends up in an affiliated pList which is not what I'm after.
Here is my way to do that.
First you need to install applescript package using pip install applescript command.
Here is a function to add comments to a file:
def set_comment(file_path, comment_text):
import applescript
applescript.tell.app("Finder", f'set comment of (POSIX file "{file_path}" as alias) to "{comment_text}" as Unicode text')
and then I'm just using it like this:
set_comment('/Users/UserAccountName/Pictures/IMG_6860.MOV', 'my comment')
After more digging, I was able to locate a python applescript bundle: https://pypi.python.org/pypi/py-applescript
This got me to a workable answer, though I'd still prefer to do this natively in python if anyone has a better option?
import applescript
NewFile = '[File/Path/Name]' <br>
Comment = "Almost there.."
AddComment = applescript.AppleScript('''
on run {arg1, arg2}
tell application "Finder" to set comment of (POSIX file arg1 as alias) to arg2 as Unicode text
return
end run
''')
print(AddComment.run(NewFile, Comment))
print("Done")
This is the function to get comment of a file.
def get_comment(file_path):
import applescript
return applescript.tell.app("Finder", f'get comment of (POSIX file "{file_path}" as alias)').out
print(get_comment('Your Path'))
Another approach is to use appscript, a high-level Apple event bridge that is sadly no longer officially supported but still works (and saw an updated release in Jan. 2021). Here is an example of reading and setting the comment on a file:
import appscript
import mactypes
# Get a handle on the Finder.
finder = appscript.app('Finder')
# Tell Finder to select the file.
file = finder.items[mactypes.Alias("/path/to/a/file")]
# Print the current comment
comment = file.comment()
print("Current comment: " + comment)
# Set a new comment.
file.comment.set("New comment")
# Print the current comment again to verify.
comment = file.comment()
print("Current comment: " + comment)
Despite that the author of appscript recommends against using it in new projects, I used it recently to create a command-line utility called Urial for the specialized purpose of writing and updating URIs in Finder comments. Perhaps its code can serve as an an additional example of using appscript to manipulate Finder comments.
I have this small program and it needs to create a small .txt file in their 'My Documents' Folder. Here's the code I have for that:
textfile=open('C:\Users\MYNAME\Documents','w')
lines=['stuff goes here']
textfile.writelines(lines)
textfile.close()
The problem is that if other people use it, how do I change the MYNAME to their account name?
Use os.path.expanduser(path), see http://docs.python.org/library/os.path.html
e.g. expanduser('~/filename')
This works on both Unix and Windows, according to the docs.
Edit: forward slash due to Sven's comment.
This works without any extra libs:
import ctypes.wintypes
CSIDL_PERSONAL = 5 # My Documents
SHGFP_TYPE_CURRENT = 0 # Get current, not default value
buf= ctypes.create_unicode_buffer(ctypes.wintypes.MAX_PATH)
ctypes.windll.shell32.SHGetFolderPathW(None, CSIDL_PERSONAL, None, SHGFP_TYPE_CURRENT, buf)
print(buf.value)
Also works if documents location and/or default save location is changed by user.
On Windows, you can use something similar what is shown in the accepted answer to the question: Python, get windows special folders for currently logged-in user.
For the My Documents folder path, useshellcon.CSIDL_PERSONALin the shell.SHGetFolderPath() function call instead of shellcon.CSIDL_MYPICTURES.
So, assuming you have the PyWin32 extensions1 installed, this might work (see caveat in Update section below):
>>> from win32com.shell import shell, shellcon
>>> shell.SHGetFolderPath(0, shellcon.CSIDL_PERSONAL, None, 0)
u'<path\\to\\folder>'
Update: I just read something that said that CSIDL_PERSONAL won't return the correct folder if the user has changed the default save folder in the Win7 Documents library. This is referring to what you can do in library's Properties dialog:
The checkmark means that the path is set as the default save location.
I currently am unware of a way to call the SHLoadLibraryFromKnownFolder() function through PyWin32 (there currently isn't a shell.SHLoadLibraryFromKnownFolder. However it should be possible to do so using the ctypes module.
1Installers for the latest versions of the Python for Windows Extensions are currently available from: http://sourceforge.net/projects/pywin32
I'm trying to use pyPdf to extract and print pages from a multipage PDF. Problem is, text is not extracted from some pages. I've put an example file here:
http://www.4shared.com/document/kmJF67E4/forms.html
If you run the following, the first 81 pages return no text, while the final 11 extract properly. Can anyone help?
from pyPdf import PdfFileReader
input = PdfFileReader(file("forms.pdf", "rb"))
for page in input1.pages:
print page.extractText()
Note that extractText() still has problems extracting the text properly. From the documentation for extractText():
This works well for some PDF files,
but poorly for others, depending on
the generator used. This will be
refined in the future. Do not rely on
the order of text coming out of this
function, as it will change if this
function is made more sophisticated.
Since it is the text you want, you can use the Linux command pdftotext.
To invoke that using Python, you can do this:
>>> import subprocess
>>> subprocess.call(['pdftotext', 'forms.pdf', 'output'])
The text is extracted from forms.pdf and saved to output.
This works in the case of your PDF file and extracts the text you want.
This isn't really an answer, but the problem with pyPdf is this: it doesn't yet support CMaps. PDF allows fonts to use CMaps to map character IDs (bytes in the PDF) to Unicode character codes. When you have a PDF that contains non-ASCII characters, there's probably a CMap in use, and even sometimes when there's no non-ASCII characters. When pyPdf encounters strings that are not in standard Unicode encoding, it just sees a bunch of byte code; it can't convert those bytes to Unicode, so it just gives you empty strings. I actually had this same problem and I'm working on the source code at the moment. It's time consuming, but I hope to send a patch to the maintainer some time around mid-2011.
You could also try the pdfminer library (also in python), and see if it's better at extracting the text. For splitting however, you will have to stick with pyPdf as pdfminer doesn't support that.
I find it sometimes useful to convert it to ps (try with pdf2psand pdftops for potential differences) then back to pdf (ps2pdf). Then try your original script again.
I had similar problem with some pdfs and for windows, this is working excellent for me:
1.- Download Xpdf tools for windows
2.- copy pdftotext.exe from xpdf-tools-win-4.00\bin32 to C:\Windows\System32 and also to C:\Windows\SysWOW64
3.- use subprocess to run command from console:
import subprocess
try:
extInfo = subprocess.check_output('pdftotext.exe '+filePath + ' -',shell=True,stderr=subprocess.STDOUT).strip()
except Exception as e:
print (e)
I'm starting to think I should adopt a messy two-part solution. there are two sections to the PDF, pp 1-82 which have text page labels (pdftotext can extract), and pp 83-end which have no page labels but pyPDF can extract and it explicitly knows pages.
I think I need to combine the two. Clunky, but I don't see any way round it. Sadly I'm having to do this on a Windows machine.
Basically i am trying to find out what version of ArcGIS the user currently has installed, i looked through the registry and couldn't find anything related to a version string. However i know it is stored, within the .exe.
I've done a fair bit of googling, and can't find anything really worth it. I tried using the GetFileVersionInfo, and i seem to get a random mishmash of stuff.
Any ideas?
EDIT
Sigh....
Turns out pywin32 is not always installed on all machines. Does anyone know if its possible to do the same thing via ctypes?
Also this is only for windows.
If you prefer not to do this using pywin32, you would be able to do this with ctypes, for sure.
The trick will be decoding that silly file version structure that comes back.
There's one old mailing list post that is doing what you're asking. Unfortunately, I don't have a windows box handy to test this myself, right now. But if it doesn't work, it should at least give you a good start.
Here's the code, in case those 2006 archives vanish sometime:
import array
from ctypes import *
def get_file_info(filename, info):
"""
Extract information from a file.
"""
# Get size needed for buffer (0 if no info)
size = windll.version.GetFileVersionInfoSizeA(filename, None)
# If no info in file -> empty string
if not size:
return ''
# Create buffer
res = create_string_buffer(size)
# Load file informations into buffer res
windll.version.GetFileVersionInfoA(filename, None, size, res)
r = c_uint()
l = c_uint()
# Look for codepages
windll.version.VerQueryValueA(res, '\\VarFileInfo\\Translation',
byref(r), byref(l))
# If no codepage -> empty string
if not l.value:
return ''
# Take the first codepage (what else ?)
codepages = array.array('H', string_at(r.value, l.value))
codepage = tuple(codepages[:2].tolist())
# Extract information
windll.version.VerQueryValueA(res, ('\\StringFileInfo\\%04x%04x\\'
+ info) % codepage, byref(r), byref(l))
return string_at(r.value, l.value)
print get_file_info(r'C:\WINDOWS\system32\calc.exe', 'FileVersion')
--
Ok - back near a windows box. Have actually tried this code now. "Works for me".
>>> print get_file_info(r'C:\WINDOWS\system32\calc.exe', 'FileVersion')
6.1.7600.16385 (win7_rtm.090713-1255)
there's a gnu linux utility called 'strings' that prints the printable characters in any file(binary or non-binary), try using that and look for a version number like pattern
on windows, you can get strings here http://unxutils.sourceforge.net/