If I were to tag a bunch of images with XMP metadata from Python, what would be the best way? I've used Perl's Image::ExifTool and I am very much used to its reliability; the thing never choked on tens of thousands of images.
I found this, backed by some heavy-hitters like the European Space Agency, but it's clearly marked as unstable.
Now, assuming I am comfortable with C++, how easy is it to, say, use the Adobe XMP toolkit directly, in Python? Having never done this before, I am not sure what I'd sign up for.
Update: I tried some of the libraries out there, including the aforementioned toolkit, and they are still pretty immature and have glaring problems. I resorted to writing a Perl-based server that accepts XML requests to read and write metadata, using the combat-tested Image::ExifTool. The amount of code is actually very light, and it definitely beats torturing yourself trying to get the Python libraries to work. The server solution is language-agnostic, so it's a twofer.
Well, the website says that python-xmp-toolkit uses Exempi (which is based on the Adobe XMP toolkit) via ctypes. What I'm trying to say is that you're not likely to create a better wrapping of the C++ code yourself. If it's unstable (i.e. buggy), it's most likely still cheaper for you to contribute patches than to do it yourself from scratch.
However, in your particular situation, it depends on how much functionality you need. If you just need a single function, wrapping the C++ code into a small C extension library or with Cython is feasible. If you need all the functionality and flexibility, you will have to create wrappers manually or with SWIG, basically repeating work already done by other people.
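For illustration, here is roughly what the ctypes route looks like; the shared library name and the exported function below are hypothetical stand-ins for a real C wrapper, not Exempi's actual API:

import ctypes

# Hypothetical sketch: load a C wrapper around the C++ toolkit and
# call one exported function (names are illustrative only).
lib = ctypes.CDLL("libxmp_wrapper.so")
lib.xmp_set_property.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p]
lib.xmp_set_property.restype = ctypes.c_int

rc = lib.xmp_set_property(b"test.jpg", b"Xmp.dc.title", b"My Title")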
I struggled for several hours with python-xmp-toolkit, and eventually gave up and just wrapped calls to ExifTool.
There is a Ruby library that wraps ExifTool as well (albeit much better than what I created); I feel it'd be worth porting it to Python as a simple way of dealing with XMP.
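For what it's worth, such a wrapper can start out as a thin subprocess call; here is a minimal sketch, assuming the exiftool binary is on your PATH and using the XMP-dc Subject list as the example tag:

import subprocess

# Minimal sketch: write XMP subject keywords by shelling out to ExifTool.
# -overwrite_original avoids leaving "_original" backup files around.
def set_xmp_subject(path, keywords):
    args = ["exiftool", "-overwrite_original"]
    args += ["-XMP-dc:Subject={}".format(k) for k in keywords]
    args.append(path)
    subprocess.run(args, check=True)

set_xmp_subject("test.jpg", ["sleepy", "happy"])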
For Python 3.x there's py3exiv2, which supports editing XMP metadata.
With py3exiv2 you can read and write all standard metadata, create your own XMP namespace, or extract the thumbnail embedded in an image file.
One thing I like about py3exiv2 is that it's built on the (C++) exiv2 library, which seems well-maintained.
I did encounter a problem, though, when installing it on my system (Ubuntu 16.04). To get it working I first had to install the latest version of libexiv2-dev (sudo apt-get install libexiv2-dev), and only after that install py3exiv2 (sudo -H pip3 install py3exiv2).
Here's how I've used py3exiv2 to write a new tag:
import pyexiv2

# Read the existing metadata from the file
metadata = pyexiv2.ImageMetadata("file_name.jpg")
metadata.read()

# Assign a value to a custom key in the generic "xmp" namespace
key = "Xmp.xmp.CustomTagKey"
value = "CustomTagValue"
metadata[key] = pyexiv2.XmpTag(key, value)

# Write the changes back to the file
metadata.write()
(There's also a tutorial in the documentation)
For people finding this thread in the future, I would like to share my solution. I put a package up on the Python Package Index (PyPI) called imgtag. It lets you do basic XMP subject field tag editing using python-xmp-toolkit, but abstracts away all of the frustrating nonsense of actually using python-xmp-toolkit into one-line commands.
Install exempi for your platform, then run
python3 -m pip install imgtag
Now you can use it as such:
from imgtag import ImgTag

# Open image for tag editing
test = ImgTag(
    filename="test.jpg",  # The image file
    force_case="lower",   # Converts the case of all tags
                          # Can be `None`, `"lower"`, `"upper"`
                          # Default: None
    strip=True,           # Strips whitespace from the ends of all tags
                          # Default: True
    no_duplicates=True    # Removes all duplicate tags (case sensitive)
                          # Default: True
)

# Print existing tags
print("Current tags:")
for tag in test.get_tags():
    print(" Tag:", tag)

# Add tags
test.add_tags(["sleepy", "happy"])

# Remove tags
test.remove_tags(["cute"])

# Set tags, removing all existing tags
test.set_tags(["dog", "good boy"])

# Save changes and close file
test.close()

# Re-open for tag editing
test.open()

# Remove all tags
test.clear_tags()

# Delete the ImgTag object, automatically saving and closing the file
del test
I haven't yet added methods for the other XMP fields like description, date, creator, etc. Maybe someday I will, but if you look at how the existing functions work in the source code, you can probably figure out how to add a method yourself. If you do add more methods, please make a pull request. :)
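As a starting point, here is a minimal sketch of setting one simple property directly with python-xmp-toolkit, as I understand its API (xmp:Label is just an example field; double-check the calls against the libxmp docs):

from libxmp import XMPFiles, consts

# Minimal sketch: open the file for update, set one simple XMP
# property, and write the packet back.
xmpfile = XMPFiles(file_path="test.jpg", open_forupdate=True)
xmp = xmpfile.get_xmp()
xmp.set_property(consts.XMP_NS_XMP, "Label", "Approved")
if xmpfile.can_put_xmp(xmp):
    xmpfile.put_xmp(xmp)
xmpfile.close_file()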
You can use ImageMagick's convert; IIRC there's a Python module for it as well.
Similar questions have been raised many times, but I was not able to find a solution to my specific problem.
I was playing around with setuptools_scm recently and at first thought it was exactly what I needed. I have it configured like this:
pyproject.toml
[build-system]
requires = ["setuptools_scm"]
build-backend = "setuptools.build_meta"
[project]
...
dynamic = ["version"]
[tool.setuptools_scm]
write_to = "src/hello_python/_version.py"
version_scheme = "python-simplified-semver"
and my __init__.py
from ._version import __version__
from ._version import __version_tuple__
Relevant features it covers for me:
I can use semantic versioning
it is able to use *.*.*.devN version strings
it increments minor version in case of feature-branches
it increments patch/micro version in case of fix-branches
This is all cool. As long as I am on my feature-branch I am able to get the correct version strings.
What I like particularly is, that the dev version string contains the commit hash and is thus unique across multiple branches.
My workflow now looks like this:
create feature or fix branch
commit, (push, ) publish
merge PR to develop-branch
As soon as I am on my feature-branch I am able to run python -m build, which generates a new _version.py with the correct version string according to the latest git tag found. If I add new commits, that is fine, as the devN part of the version string changes due to the commit hash. I would even be able to run python -m twine upload dist/* now. My package is built with the correct version, so I simply publish it. This works perfectly fine locally and on CI for both fix and feature branches alike.
The problem I am facing now is that I need slightly different behavior for my merged pull requests.
As soon as I merge, e.g. 0.0.1.dev####, I want to run my Jenkins job not on the feature-branch anymore, but instead on the develop-branch. And the important part now is, I want to:
get develop-branch (done by CI)
update the version string to the same as on the branch, but without devN, so: 0.0.1
build and publish
In fact, setuptools_scm is changing the version to 0.0.2.dev### now, and I would like to have 0.0.1.
I was tinkering a bit with creating git tags before running setuptools_scm or build, but I was not able to get the correct version string to put into the tag. This is where I am stuck.
Is anyone aware of a solution that would give me the following?
minor increment on feature-branches + add .devN
patch/micro increment on fix-branches + add .devN
no increment on develop-branch and version string only containing major.minor.patch of merged branch
TL;DR: turning off writing the version number to a file every time setuptools_scm runs could solve your problem; alternatively, add the version file to .gitignore.
Explanation:
I also just started using setuptools_scm, so I am not very confident in using it yet.
But as far as I understand, the version number is derived and incremented according to the state of your repository (the detailed logic is documented here: https://github.com/pypa/setuptools_scm/#default-versioning-scheme).
If I am not mistaken, the tool now does exactly what it is supposed to: it does NOT derive the version from the tag alone, but also adds a devN suffix, because in your case the tag you've set does not reference the most recent commit on the develop branch head.
I also had the problem that when letting setuptools_scm generate a version and also configuring it to write that version to a file, the written file itself changed the repository state since the last commit, again generating a dev version number.
To get a "clean" (e.g. v0.0.1) version number I had to do the tagging after merging (with a merge commit), since the merge commit is also taken into account by the version numbering logic.
Still, my setup is currently less complex than yours: just feature and fix branches and a main branch without develop, so fewer merge commits (I chose to do merge commits, so no linear history). Now, after merging with a commit, I create a tag manually and formulate its name myself.
And this only works for me if I opt out of writing the version number into a file. I have done this by inserting the following into pyproject.toml:
[tool.setuptools_scm]
# intentionally empty/commented out
# write_to option leads to an unclean workspace during build
# which in turn leads setuptools_scm to pick this up during build and produce wheels with unclean version numbers
# write_to = "version.txt"
Since setuptools_scm runs during the build, a new version file is generated, which pollutes your worktree. Because your worktree will never be clean this way, you always get a dev version number. To still have a version file but let it be ignored during the build, add the file to your .gitignore.
My approach is not perfect (some manual steps), but for now it works for me.
It's certainly not 100% applicable to your CI scenario, but maybe you could change the order in which you do merges and tags. I hope this helps somehow.
In my current work environment, we produce a large number of Python packages for internal use (10s if not 100s). Each package has some dependencies, usually on a mixture of internal and external packages, and some of these dependencies are shared.
As we approach dependency hell, updating dependencies becomes a time-consuming process. While we care about the functional changes a new version might introduce, of equal (if not greater) importance are the API changes that break the code.
Although running unit/integration tests against newer versions of a dependency helps us to catch some issues, our coverage is not close enough to 100% to make this a robust strategy. Release notes and a change log help identify major changes at a high-level, but these rarely exist for internally developed tools or go into enough detail to understand the implications the new version has on the (public) API.
I am looking at other ways to automate this process.
I would like to be able to automatically compare two versions of a Python package and report the API differences between them. In particular this would include backwards-incompatible changes such as removing functions/methods/classes/modules, adding positional arguments to a function/method/class, and changing the number of items a function/method returns. As a developer, based on the report this generates, I should have a greater understanding of the code-level implications the version change will introduce, and thus the time required to integrate it.
Elsewhere, we use the C++ abi-compliance-checker and are looking at the Java api-compliance-checker to help with this process. Is there a similar tool available for Python? I have found plenty of lint/analysis/refactor tools but nothing that provides this level of functionality. I understand that Python's dynamic typing will make a comprehensive report impossible.
If such a tool does not exist, are they any libraries that could help with implementing a solution? For example, my current approach would be to use an ast.NodeVisitor to traverse the package and build a tree where each node represents a module/class/method/function and then compare this tree to that of another version for the same package.
Edit: since posting the question I have found pysdiff, which covers some of my requirements, but I am still interested in alternatives.
Edit: I also found Upstream-Tracker, which is a good example of the sort of information I'd like to end up with.
What about using the AST module to parse the files?
import ast

# Parse the source text into an AST (note: this doesn't compile the src)
with open("test.py") as f:
    python_src = f.read()
node = ast.parse(python_src)
print(ast.dump(node))
There's the walk helper in the ast module (described at http://docs.python.org/2/library/ast.html)
The astdump module might work (available on PyPI)
There's also this out-of-date pretty printer:
http://code.activestate.com/recipes/533146-ast-pretty-printer/
The documentation tool Sphinx also extracts the information you are looking for. Perhaps give that a look.
So walk the AST and build a tree with the information you want in it. Once you have a tree you can pickle it and diff it later, or convert the tree to a text representation in a text file you can diff with difflib or some external diff program.
The ast module has parse(), and the built-in compile() also accepts ASTs. The only thing is, I'm not entirely sure how much information is available to you after parsing alone (as you don't want to compile()).
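For what it's worth, here is a minimal sketch of that idea: walk the parsed tree and dump the public functions of one file with their argument names (extract_api and test.py are illustrative names, not an existing tool):

import ast

def extract_api(src_path):
    # Collect "name(arg, ...)" entries for every public function
    with open(src_path) as f:
        tree = ast.parse(f.read())
    api = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and not node.name.startswith("_"):
            args = [a.arg for a in node.args.args]
            api.append("{}({})".format(node.name, ", ".join(args)))
    return sorted(api)

print("\n".join(extract_api("test.py")))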
Perhaps you can start by using the inspect module:

import inspect
import types

def genFunctions(module):
    # Print "public" functions (no leading underscore) with their
    # argument lists, one per line
    moduleDict = module.__dict__
    for name in dir(module):
        if name.startswith('_'):
            continue
        element = moduleDict[name]
        if isinstance(element, types.FunctionType):
            # inspect.signature replaces the old inspect.getargspec,
            # which was removed in Python 3.11
            sig = inspect.signature(element)
            print("{}.{}{}".format(module.__name__, name, sig))
That will give you a list of "public" (not starting with an underscore) functions with their argument lists. You can add more to cover classes, methods, etc.
Once you run that on all the packages/modules you care about, in both old and new versions, you'll have two lists like this:
myPackage.myModule.myFunction1(foo, bar)
myPackage.myModule.myFunction2(baz)
Then you can either just sort and diff them, or write some smarter tooling in Python to actually compare all the names, e.g. to permit additional optional arguments but reject new mandatory arguments.
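A minimal sketch of that smarter comparison, treating the two dumps as sets (old_api.txt and new_api.txt are hypothetical file names):

def load_api(path):
    # One "module.function(args)" entry per line
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

old_api = load_api("old_api.txt")
new_api = load_api("new_api.txt")

# Anything that disappeared is a removal or a signature change;
# anything new is an addition or the other side of a change.
for entry in sorted(old_api - new_api):
    print("removed or changed:", entry)
for entry in sorted(new_api - old_api):
    print("added or changed:", entry)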
Check out zope.interface (you can get it from PyPI). Then you can verify in your unit tests that modules support the interfaces you define. It may take a while to retrofit, however, and it's not a silver bullet.
As programmers we read more code than we write. I've started working at a company that uses a couple of "big" Python packages: packages or package families that run to many KLOC. Case in point: Zope.
My problem is that I have trouble navigating this codebase quickly and easily. My current strategy is:
I start reading a module I need to change/understand
I hit an import which I need to know more of
I find out where the source code for that import lives by placing a Python debugger (pdb) statement after the imports and echoing the module, which tells me its source file
I navigate to it, in shell or the Vim file explorer.
most of the time the module itself imports more modules and before I know it I've got 10KLOC "on my plate"
Alternatively:
I see a method/class I need to know more of
I do a search (ack-grep) for the definition of that method/class across the whole codebase (which can be a pain because the codebase is partly in ~/.buildout-eggs)
I find one or more pieces of code that define that method/class
I have to deduce which one of them is the one I need to read
This costs a lot of time, which is understandable for a big codebase. But I get the feeling that navigating a large and unknown Python codebase is a common enough problem.
So I'm looking for technical tools or strategic solutions for this problem.
...
I just can't imagine hardcore Python programmers using the strategies outlined above.
In Vim, I like NERDTree (a file browser) and taglist.vim (a source code browser: http://www.vim.org/scripts/script.php?script_id=273)
Also in Vim, you can use CTRL-] to jump to a definition (:h CTRL-]):
download exuberant ctags http://ctags.sourceforge.net/
follow the install directions and put it somewhere on your PATH
from the 'root' directory of your source code, make a tags file from the shell: "ctags -R"
(make sure you have :set noautochdir, and make sure :pwd is the root directory from step 3)
go into Vim, cursor over some function or class name, hit CTRL-]
by default, if there are multiple matches for the tag, it shows you everywhere it was imported and where it was declared
if the tag only has one match, it immediately jumps to it
...then use Ctrl+O and Ctrl+I to move back and forth from where you were
(repeat the above steps for the source code of particular libraries you use; I usually keep a separate Vim window open to study stuff)
I use IPython's ?? command.
You just need to figure out how to import the things you want to look at, then append ?? to the module, class, function, or method name to view its source code. Command completion helps with figuring out long names as well.
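For example, in an IPython session (json.dumps is just an arbitrary target):

In [1]: import json

In [2]: json.dumps??

A single ? shows the docstring; ?? also shows the source when it's available (i.e. for pure-Python objects).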
Try red pill: https://github.com/klen/python-mode
I have a string which contains an SVN unified diff. My PyGTK app needs to show this diff to the user, and I want to render it like other diff tools do, or at least have it colorized.
Do you have something to suggest: an external tool, a library, a custom implementation...? I was looking at http://kafka.fr.free.fr/diff2html/ but I'd prefer to use a library or something similar, so users don't need to install third-party apps.
I want to use this for git and Mercurial diffs later as well.
You could use difflib to generate diffs, and pygtkscintilla for syntax-highlighting, line-numbering, code-folding, etc.
If you only want syntax-highlighting (as opposed to all the editor features offered by pygtkscintilla), then you could also look at pygments.
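For instance, a minimal sketch of colorizing a unified diff string with Pygments (the file names are illustrative):

from pygments import highlight
from pygments.lexers import DiffLexer
from pygments.formatters import HtmlFormatter

# Render a unified diff as standalone, colorized HTML
with open("changes.diff") as f:
    diff_text = f.read()
html = highlight(diff_text, DiffLexer(), HtmlFormatter(full=True))
with open("changes.html", "w") as f:
    f.write(html)

The resulting HTML can then be fed to whatever widget your app uses to display rich text.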
The difflib.HtmlDiff class provides facilities for doing this. However, instead of starting with a unified diff file, HtmlDiff wants you to pass the complete "before" and "after" files. These files are easy to get with svn/git/mercurial commands without using the "diff" functionality of those VCS.
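A minimal sketch of that approach (file names are illustrative):

import difflib

# HtmlDiff wants the complete before/after contents as lists of lines
with open("before.py") as f:
    before = f.readlines()
with open("after.py") as f:
    after = f.readlines()

html = difflib.HtmlDiff().make_file(before, after, "before.py", "after.py")
with open("diff.html", "w") as f:
    f.write(html)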
GtkSourceView is a drop-in replacement for PyGTK's TextView that can syntax-highlight diff files, including unified diffs.
Has anyone managed to use cscope with Python code successfully? I have Vim 7.2 and the latest version of cscope installed; however, it doesn't get my code's tags correctly (always off by a couple of lines). I tried the pycscope script, but its output isn't supported by the modern version of cscope.
Any ideas? Or an alternative for browsing Python code with Vim? (I'm specifically interested in the extra features cscope offers beyond the simple tags of ctags.)
EDIT: I'm going to run through the process step by step:
Preparing the sources:
Exuberant Ctags has an option: -x
Alternatively, ctags can generate a cross reference file which lists,
in human readable form, information about the various source objects
found in a set of language files.
This is the key to the problem:
ctags -x $(ls **/*.py); # replace with find if no zsh
will give you your database of source objects in a known, format, described under
man ctags; # make sure you use exuberant ctags!
GNU Global is not limited to only the "out of the box" file types. Any regular file format will serve.
Also, you can use gtags-cscope, which comes with Global (as mentioned in section 3.7 of the manual), as a possible shortcut using gtags. You'll end up with a ctags tabular file as input, which Global/gtags can parse to get your objects; or you can use the source of pycscope together with your ctags file of known format to produce an input for the Vim cscope commands in if_cscope.txt.
Either way it's quite doable.
Perhaps you'd prefer idutils?
Definitely possible, since z3c.recipe.tags on PyPI makes use of both ctags and idutils to create tag files for a buildout, which is a method I shall investigate in a short while.
Of course, you could always use the greputils script below; it has support for idutils, we know idutils works with Python, and if that fails, there is also something called vimentry from this year that also uses Python, idutils, and Vim.
Reference links (not a complete list):
gtags vimscript, uses Gnu global. updated 2008
greputils vimscript, contains support for the *id idutils, 2005
lid vimscript, Ancient, but this guy is pretty good, his tag and buffer howtos are amazing 2002
An updated version of pycscope, 2010
Hopefully this helps you with your problem; it certainly helped me. I would have been quite sad tonight with a maggoty pycscope.
This seems to work for me:
Change to the top directory of your Python code and create a file called cscope.files:
find . -name '*.py' > cscope.files
cscope -R
You may need to perform a cscope -b first if the cross-references don't get built properly.
From correspondence with the maintainer of cscope, this tool isn't designed to work with Python, and there are no plans to implement that compatibility. Whatever works now apparently works by accident, and there is no promise whatsoever that it will keep working.
It appears I've been using an out-of-date version of pycscope. The latest version, 0.3, is supported by the cscope DB. The author of pycscope told me that he figured out the cscope DB output format by reading cscope's source code. That format isn't documented, on purpose, but it nevertheless currently works with pycscope 0.3, which is the solution I'll be using.
I'm going to accept this answer since, unfortunately, no other answer provided help even after the bounty was declared. No answers were upvoted, so I honestly have no idea where the bounty will go.
There is a wonderful python-mode plugin (Python-mode-klen). If you have it and rope (a Python refactoring library) installed, then going to the definition of a particular term is as simple as <C-c>g or <C-c>rag (the first is a filetype mapping, the second a global one). There are many more useful features, some useless for me; all of them can be disabled. Features from the list of questions found at cscope-intro:
Where is this symbol used? <C-c>f. Rather confusing though, as the results in the quickfix list show a - instead of the actual lines (though they point to the correct locations). Maybe it will be fixed.
Where is it defined?, What is this global symbol's definition?, Where is this function in the source files? <C-c>g
What is <...> global symbol's definition? <C-c>raj
That's not very much, but then I am not a very experienced user of ropevim.
I had the same question, and after browsing the internet I found a way to fix this:
Create a Python script called cscope_scan.py:
import os

codeRootDir = os.getcwd()

__revision__ = '0.1'
__author__ = 'lxd'

FILE_TYPE_LIST = ['py']

if __name__ == '__main__':
    # Collect every matching source file into cscope.files
    with open('cscope.files', 'w') as f:
        for root, dirs, files in os.walk(codeRootDir):
            for file in files:
                for file_type in FILE_TYPE_LIST:
                    if file.split('.')[-1] == file_type:
                        f.write('%s\n' % os.path.join(root, file))
    # Build the cscope database from cscope.files
    cmd = 'cscope -bk'
    os.system(cmd)
Execute this script under your code's root folder; this will generate cscope.files and then run cscope. I don't know what happens on my computer, the last two lines don't work well, but I think manually typing cscope -bk is acceptable. :)
This hack also seems to force cscope to go through Python files:
cscope -Rb -s *
If you accept that cscope is apparently not designed to work with Python.
A superset question (any language, any tool): How to find all occurrences of a variable in Vim?