Toolchain not downloading the tool - python

Hi, I'm trying to set up a toolchain for the Fn project. The approach is to set up a toolchain per binary available on GitHub and then, in theory, use it in a rule.
I have a common package which has the available binaries:
default_version = "0.5.44"

os_list = [
    "linux",
    "mac",
    "windows",
]

def get_bin_name(os):
    return "fn_cli_%s_bin" % os
The download part looks like this:
load(":common.bzl", "get_bin_name", "os_list", "default_version")
_url = "https://github.com/fnproject/cli/releases/download/{version}/{file}"
_os_to_file = {
"linux" : "fn_linux",
"mac" : "fn_mac",
"windows" : "fn.exe",
}
def _fn_binary(os):
name = get_bin_name(os)
file = _os_to_file.get(os)
url = _url.format(
file = file,
version = default_version
)
native.http_file(
name = name,
urls = [url],
executable = True
)
def fn_binaries():
"""
Installs the hermetic binary for Fn.
"""
for os in os_list:
_fn_binary(os)
Then I set up the toolchain like this:
load(":common.bzl", "get_bin_name", "os_list")
_toolchain_type = "toolchain_type"
FnInfo = provider(
doc = "Information about the Fn Framework CLI.",
fields = {
"bin" : "The Fn Framework binary."
}
)
def _fn_cli_toolchain(ctx):
toolchain_info = platform_common.ToolchainInfo(
fn_info = FnInfo(
bin = ctx.attr.bin
)
)
return [toolchain_info]
fn_toolchain = rule(
implementation = _fn_cli_toolchain,
attrs = {
"bin" : attr.label(mandatory = True)
}
)
def _add_toolchain(os):
toolchain_name = "fn_cli_%s" % os
native_toolchain_name = "fn_cli_%s_toolchain" % os
bin_name = get_bin_name(os)
compatibility = ["#bazel_tools//platforms:%s" % os]
fn_toolchain(
name = toolchain_name,
bin = ":%s" % bin_name,
visibility = ["//visibility:public"]
)
native.toolchain(
name = native_toolchain_name,
toolchain = ":%s" % toolchain_name,
toolchain_type = ":%s" % _toolchain_type,
target_compatible_with = compatibility
)
def setup_toolchains():
"""
Macro te set up the toolchains for the different platforms
"""
native.toolchain_type(name = _toolchain_type)
for os in os_list:
_add_toolchain(os)
def fn_register():
"""
Registers the Fn toolchains.
"""
path = "//tools/bazel_rules/fn/internal/cli:fn_cli_%s_toolchain"
for os in os_list:
native.register_toolchains(path % os)
In my BUILD file I call setup_toolchains:
load(":toolchain.bzl", "setup_toolchains")
setup_toolchains()
With this set up I have a rule which looks like this:
_toolchain = "//tools/bazel_rules/fn/cli:toolchain_type"
def _fn(ctx):
print("HEY")
bin = ctx.toolchains[_toolchain].fn_info.bin
print(bin)
# TEST RULE
fn = rule(
implementation = _fn,
toolchains = [_toolchain]
)
Workspace:
workspace(name = "basicwindow")
load("//tools/bazel_rules/fn:defs.bzl", "fn_binaries", "fn_register")
fn_binaries()
fn_register()
When I query for the different binaries with bazel query //tools/bazel_rules/fn/internal/cli:fn_cli_linux_bin, they are there, but calling bazel build //... results in an error:
ERROR: /Users/marcguilera/Code/Marc/basicwindow/tools/bazel_rules/fn/internal/cli/BUILD.bazel:2:1: in bin attribute of fn_toolchain rule //tools/bazel_rules/fn/internal/cli:fn_cli_windows: rule '//tools/bazel_rules/fn/internal/cli:fn_cli_windows_bin' does not exist. Since this rule was created by the macro 'setup_toolchains', the error might have been caused by the macro implementation in /Users/marcguilera/Code/Marc/basicwindow/tools/bazel_rules/fn/internal/cli/toolchain.bzl:35:15
ERROR: Analysis of target '//tools/bazel_rules/fn/internal/cli:fn_cli_windows' failed; build aborted: Analysis of target '//tools/bazel_rules/fn/internal/cli:fn_cli_windows' failed; build aborted
INFO: Elapsed time: 0.079s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded, 0 targets configured)
I tried to follow the toolchain tutorial in the documentation, but I can't get it to work. Another interesting thing is that I'm actually on a Mac, so the toolchain compatibility also seems to be wrong.
I'm using this toolchain inside a bigger repo, so the paths vary, but here's a repo containing only the Fn stuff for ease of reading.

Two things:
One, I suspect this is your actual issue: https://github.com/bazelbuild/bazel/issues/6828
The core of the problem is that, if the toolchain_type target is in an external repository, it always needs to be referred to by its fully-qualified name, never by the locally-qualified name.
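For example (a sketch of my own, assuming the workspace name basicwindow from your WORKSPACE file and the package from your fn_register path), the rule would point at the fully-qualified label rather than the package-relative ":toolchain_type":

# toolchain.bzl -- sketch: refer to the toolchain type by its full label,
# repository name included.
_toolchain_type = "@basicwindow//tools/bazel_rules/fn/internal/cli:toolchain_type"

fn = rule(
    implementation = _fn,
    toolchains = [_toolchain_type],
)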
The second is a little more fundamental: you have a lot of Starlark macros here that generate other targets, and it's very hard to read. It would actually be a lot simpler to remove most of the macros, such as _fn_binary, fn_binaries, and _add_toolchain. Just have setup_toolchains directly create the needed native.toolchain targets, and have a repository macro that calls http_archive three times to declare the three different sets of binaries. This will make the code much easier to read and thus easier to debug.
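A rough sketch of what that flattened macro could look like (not the answer's code; it reuses the fn_toolchain rule defined above and assumes external repositories named fn_cli_<os>_bin, declared as in the sketch further below):

# toolchain.bzl -- sketch: one flat macro that declares everything directly.
def setup_toolchains():
    native.toolchain_type(name = "toolchain_type")
    for os in ["linux", "mac", "windows"]:
        fn_toolchain(
            name = "fn_cli_%s" % os,
            bin = "@fn_cli_%s_bin//file" % os,
            visibility = ["//visibility:public"],
        )
        native.toolchain(
            name = "fn_cli_%s_toolchain" % os,
            toolchain = ":fn_cli_%s" % os,
            toolchain_type = ":toolchain_type",
            target_compatible_with = ["@bazel_tools//platforms:%s" % os],
        )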
For debugging toolchains, I follow two steps: first, I verify that the tool repositories exist and can be accessed directly, and then I check the toolchain registration and resolution.
After going several levels deep, it looks like you're calling http_archive, naming the new repository @linux, and downloading a specific binary file. This isn't how http_archive works: it expects to fetch a zip file (or tar.gz file), extract that, and find a WORKSPACE and at least one BUILD file inside.
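For a single prebuilt binary like the Fn CLI releases, http_file is the repository rule that matches this layout; a minimal sketch (the sha256 is omitted, and the load line assumes a reasonably recent Bazel):

# WORKSPACE -- sketch: one external repository per release binary.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_file")

http_file(
    name = "fn_cli_linux_bin",
    urls = ["https://github.com/fnproject/cli/releases/download/0.5.44/fn_linux"],
    executable = True,
    # sha256 = "...",  # fill in to make the download reproducible
)

The downloaded file is then addressable as @fn_cli_linux_bin//file.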
My suggestions: simplify your macros, get the external repositories clearly defined, and then explore using toolchain resolution to choose the right one.
I'm happy to help answer further questions as needed.

Related

Why isn't a class identical to the same class loaded from an entry point?

I have a program that loads objects via entry points. Later, I want to check to see what type those objects are, but I'm getting stuck because isinstance() returns False when I'd expect it to return True.
Below is a minimal working example. It has a couple of files, because it needs to be "pip-installed" in order to define entry points, so I also uploaded it to GitHub in case that's more convenient.
Can anyone explain (i) why isinstance() is returning False and (ii) if there's any way to make it return True?
The entry point:
# pyproject.toml

[build-system]
requires = ["flit_core >=3.2,<4"]
build-backend = "flit_core.buildapi"

[project]
name = "entry_point_isinstance"
authors = [
    {name = "Kale Kundert", email = "kale@thekunderts.net"},
]
readme = "README.rst"
version = "0.0.0"
description = "Reproduce a bug involving entry points and ``isinstance()``."
requires-python = "~=3.10"
dependencies = []

[project.entry-points.entry_point_isinstance]
my_class = "entry_point_isinstance:MyClass"
The code:
# entry_point_isinstance.py
from importlib.metadata import entry_points

class MyClass:
    pass

if __name__ == '__main__':
    entry_points = entry_points(group='entry_point_isinstance')
    plugin_cls = next(iter(entry_points)).load()
    plugin_obj = plugin_cls()

    print(
        f'{MyClass=}',
        f'{plugin_cls=}',
        f'{plugin_obj=}',
        f'{plugin_cls is MyClass=}',
        f'{isinstance(plugin_obj, MyClass)=}',
        sep='\n',
    )
The terminal commands:
$ pip install -e .
$ python entry_point_isinstance.py
MyClass=<class '__main__.MyClass'>
plugin_cls=<class 'entry_point_isinstance.MyClass'>
plugin_obj=<entry_point_isinstance.MyClass object at 0x7f96b58893f0>
plugin_cls is MyClass=False
isinstance(plugin_obj, MyClass)=False
Note that MyClass and plugin_cls seem to come from different modules: __main__ and entry_point_isinstance, respectively. This seems suspicious, but I don't understand why it's happening. Normally python avoids loading the same module twice.
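One way to see the double import in action (my own sketch, added at the end of the if __name__ == '__main__': block): import the module under its real name and compare against that copy of the class.

import entry_point_isinstance  # loads a second copy of this file, under its real name

print(MyClass is entry_point_isinstance.MyClass)               # False when run as a script
print(isinstance(plugin_obj, entry_point_isinstance.MyClass))  # True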

Fastai - failed initiation of language model in Sentence Piece Processor, cache_dir parameter

I've already been browsing the web for hours to find a solution to what I believe might be a pretty petty issue.
I'm using fastai's Sentence Piece Processor (SPProcessor) at the very first steps of initializing a language model.
My code for these steps looks like this:
bs = 48

processor = SPProcessor(lang='pl')

data_lm = (TextList.from_csv('', target_corpus, processor=processor)
           .split_by_rand_pct(0.1)
           .label_for_lm()
           .databunch(bs=bs)
           )

data_lm.save(data_lm_file)
After execution I get an error which is as follows:
~/x/miniconda3/envs/fastai/lib/python3.6/site-packages/fastai/text/data.py in process(self, ds)
466 self.sp_model,self.sp_vocab = cache_dir/'spm.model',cache_dir/'spm.vocab'
467 if not getattr(self, 'vocab', False):
--> 468 with open(self.sp_vocab, 'r', encoding=self.enc) as f: self.vocab = Vocab([line.split('\t')[0] for line in f.readlines()])
469 if self.n_cpus <= 1: ds.items = self._encode_batch(ds.items)
470 else:
FileNotFoundError: [Errno 2] No such file or directory: 'tmp/spm/spm.vocab'
The proper outcome of the code above should be a folder named 'tmp' containing a folder 'spm', inside which two files should be placed, named spm.vocab and spm.model respectively.
What happens instead is that the 'tmp' folder is created, along with files named "cache_dir".vocab and "cache_dir".model inside my current directory.
The 'spm' folder is nowhere to be found.
I've found a sort of workaround.
It consists of manually creating an 'spm' folder inside 'tmp', moving the two files mentioned above into it, and renaming them to spm.vocab and spm.model.
That allows me to carry on with my processing, yet I'd like to find a way to skip the necessity of manually moving and renaming the created files.
Maybe I need to pass some parameters (probably cache_dir) with specific values before processing?
If you have any idea how to solve this issue, please point me to it.
I'd be grateful.
I can see a similar error if I switch the code in fastai/text/data.py to an earlier version of this commit. Then, if I apply the changes from the same commit, it all works nicely. Now, the most recent version of the same file (the one that is supposed to help with paths containing spaces) seems to have yet another bug introduced.
So it pretty much seems that the problem is that fastai is passing the --model_prefix argument with quotes to sentencepiece's SentencePieceTrainer.Train, which makes it "misbehave".
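To make the difference concrete, here is a small illustration (mine, not fastai's) of the flag string each variant produces:

from pathlib import Path

cache_dir = Path('tmp')
vocab_sz, model_type = 30000, 'unigram'

# Buggy variant: the prefix is the literal, quoted string "cache_dir".
buggy = f'--model_prefix="cache_dir" --vocab_size={vocab_sz} --model_type={model_type}'
# Fixed variant: the prefix is the real path tmp/spm.
fixed = f"--model_prefix={cache_dir/'spm'} --vocab_size={vocab_sz} --model_type={model_type}"

print(buggy)  # --model_prefix="cache_dir" --vocab_size=30000 --model_type=unigram
print(fixed)  # --model_prefix=tmp/spm --vocab_size=30000 --model_type=unigram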
One possibility for you would be to either (1) update to the later version of fastai (which might not help due to another bug in a newer version), or (2) manually apply changes from here to your installation's fastai/text/data.py. It's a very small change - just delete the line:
cache_dir = cache_dir/'spm'
and replace
f'--model_prefix="cache_dir" --vocab_size={vocab_sz} --model_type={model_type}']))
with
f"--model_prefix={cache_dir/'spm'} --vocab_size={vocab_sz} --model_type={model_type}"]))
In case you are not comfortable with updating the installation's code, you can monkey-patch the module by substituting the existing train_sentencepiece function: write a fixed version in your code and then do something like fastai.text.data.train_sentencepiece = my_fixed_train_sentencepiece before other calls.
So if you are using a newer version of the library, the code might look like this:
import os
from pathlib import Path
from typing import Collection

import fastai
from fastai.core import PathOrStr, defaults, ifnone  # defaults/ifnone are used in the body below
from fastai.text.data import ListRules, get_default_size, quotemark, full_char_coverage_langs

def train_sentencepiece(texts:Collection[str], path:PathOrStr, pre_rules: ListRules=None, post_rules:ListRules=None,
        vocab_sz:int=None, max_vocab_sz:int=30000, model_type:str='unigram', max_sentence_len:int=20480, lang='en',
        char_coverage=None, tmp_dir='tmp', enc='utf8'):
    "Train a sentencepiece tokenizer on `texts` and save it in `path/tmp_dir`"
    from sentencepiece import SentencePieceTrainer
    cache_dir = Path(path)/tmp_dir
    os.makedirs(cache_dir, exist_ok=True)
    if vocab_sz is None: vocab_sz = get_default_size(texts, max_vocab_sz)
    raw_text_path = cache_dir / 'all_text.out'
    with open(raw_text_path, 'w', encoding=enc) as f: f.write("\n".join(texts))
    spec_tokens = ['\u2581'+s for s in defaults.text_spec_tok]
    SentencePieceTrainer.Train(" ".join([
        f"--input={quotemark}{raw_text_path}{quotemark} --max_sentence_length={max_sentence_len}",
        f"--character_coverage={ifnone(char_coverage, 0.99999 if lang in full_char_coverage_langs else 0.9998)}",
        f"--unk_id={len(defaults.text_spec_tok)} --pad_id=-1 --bos_id=-1 --eos_id=-1",
        f"--user_defined_symbols={','.join(spec_tokens)}",
        f"--model_prefix={cache_dir/'spm'} --vocab_size={vocab_sz} --model_type={model_type}"]))
    raw_text_path.unlink()
    return cache_dir

fastai.text.data.train_sentencepiece = train_sentencepiece
And if you are using an older version, then like the following:
import os
from pathlib import Path
from typing import Collection

import fastai
from fastai.core import PathOrStr, defaults, ifnone  # defaults/ifnone are used in the body below
from fastai.text.data import ListRules, get_default_size, full_char_coverage_langs

def train_sentencepiece(texts:Collection[str], path:PathOrStr, pre_rules: ListRules=None, post_rules:ListRules=None,
        vocab_sz:int=None, max_vocab_sz:int=30000, model_type:str='unigram', max_sentence_len:int=20480, lang='en',
        char_coverage=None, tmp_dir='tmp', enc='utf8'):
    "Train a sentencepiece tokenizer on `texts` and save it in `path/tmp_dir`"
    from sentencepiece import SentencePieceTrainer
    cache_dir = Path(path)/tmp_dir
    os.makedirs(cache_dir, exist_ok=True)
    if vocab_sz is None: vocab_sz = get_default_size(texts, max_vocab_sz)
    raw_text_path = cache_dir / 'all_text.out'
    with open(raw_text_path, 'w', encoding=enc) as f: f.write("\n".join(texts))
    spec_tokens = ['\u2581'+s for s in defaults.text_spec_tok]
    SentencePieceTrainer.Train(" ".join([
        f"--input={raw_text_path} --max_sentence_length={max_sentence_len}",
        f"--character_coverage={ifnone(char_coverage, 0.99999 if lang in full_char_coverage_langs else 0.9998)}",
        f"--unk_id={len(defaults.text_spec_tok)} --pad_id=-1 --bos_id=-1 --eos_id=-1",
        f"--user_defined_symbols={','.join(spec_tokens)}",
        f"--model_prefix={cache_dir/'spm'} --vocab_size={vocab_sz} --model_type={model_type}"]))
    raw_text_path.unlink()
    return cache_dir

fastai.text.data.train_sentencepiece = train_sentencepiece

How to properly link to PyQt5 documentation using intersphinx?

I'm running into some trouble trying to link to the PyQt5 docs using intersphinx.
Trying to cross reference any of the QtCore classes (such as QThread) does not work as I'd expect. I have parsed the objects.inv available here using python -m sphinx.ext.intersphinx objects.inv, which results in an output shown in this gist.
Unfortunately, under the python namespace there are no classes and only a few functions. Everything PyQt5-related is in the sip:class namespace. Trying to reference this in documentation using the standard :py:class: syntax does not link to anything (since sphinx doesn't see that reference connected to anything), and using :sip:class: causes a warning of Unknown interpreted text role "sip:class", which makes sense because that is not a known reference code.
So, how do we access the documentation of PyQt through intersphinx (if we can at all)?
EDIT:
I created a Python package with this solution: https://pypi.org/project/sphinx-qt-documentation/
ORIGINAL ANSWER:
I use another approach to this problem. I created a custom sphinx plugin that translates the inventory file (which uses the sip domain) on the fly. It allows choosing which documentation should be pointed to (please see the docstring at the top). It works on my project, but I'm not sure if it supports all cases.
This extension needs the sphinx.ext.intersphinx extension to be configured in the Sphinx extensions, and PyQt configured in the mapping:
intersphinx_mapping = {
    ...,
    "PyQt": ("https://www.riverbankcomputing.com/static/Docs/PyQt5", None),
}
"""
This module contains sphinx extension supporting for build PartSeg documentation.
this extensio provides one configuration option:
`qt_documentation` with possibe values:
* PyQt - linking to PyQt documentation on https://www.riverbankcomputing.com/static/Docs/PyQt5/api/ (incomplete)
* Qt - linking to Qt documentation on "https://doc.qt.io/qt-5/" (default)
* PySide - linking to PySide documentation on "https://doc.qt.io/qtforpython/PySide2/"
"""
import re
from sphinx.application import Sphinx
from sphinx.environment import BuildEnvironment
from docutils.nodes import Element, TextElement
from docutils import nodes
from typing import List, Optional, Dict, Any
from sphinx.locale import _
from sphinx.ext.intersphinx import InventoryAdapter
try:
from qtpy import QT_VERSION
except ImportError:
QT_VERSION = None
# TODO add response to
# https://stackoverflow.com/questions/47102004/how-to-properly-link-to-pyqt5-documentation-using-intersphinx
signal_slot_uri = {
"Qt": "https://doc.qt.io/qt-5/signalsandslots.html",
"PySide": "https://doc.qt.io/qtforpython/overviews/signalsandslots.html",
"PyQt": "https://www.riverbankcomputing.com/static/Docs/PyQt5/signals_slots.html"
}
signal_name = {
"Qt": "Signal",
"PySide": "Signal",
"PyQt": "pyqtSignal"
}
slot_name = {
"Qt": "Slot",
"PySide": "Slot",
"PyQt": "pyqtSlot"
}
signal_pattern = re.compile(r'((\w+\d?\.QtCore\.)|(QtCore\.)|(\.)())?(pyqt)?Signal')
slot_pattern = re.compile(r'((\w+\d?\.QtCore\.)|(QtCore\.)|(\.)())?(pyqt)?Slot')
def missing_reference(app: Sphinx, env: BuildEnvironment, node: Element, contnode: TextElement
) -> Optional[nodes.reference]:
"""Linking to Qt documentation."""
target: str = node['reftarget']
inventories = InventoryAdapter(env)
objtypes = None # type: Optional[List[str]]
if node['reftype'] == 'any':
# we search anything!
objtypes = ['%s:%s' % (domain.name, objtype)
for domain in env.domains.values()
for objtype in domain.object_types]
domain = None
else:
domain = node.get('refdomain')
if not domain:
# only objects in domains are in the inventory
return None
objtypes = env.get_domain(domain).objtypes_for_role(node['reftype'])
if not objtypes:
return None
objtypes = ['%s:%s' % (domain, objtype) for objtype in objtypes]
if target.startswith("PySide2"):
head, tail = target.split(".", 1)
target = "PyQt5." + tail
obj_type_name = "sip:{}".format(node.get("reftype"))
if obj_type_name not in inventories.named_inventory["PyQt"]:
return None
target_list = [target, "PyQt5." + target]
target_list += [name + "." + target for name in inventories.named_inventory["PyQt"]["sip:module"].keys()]
if signal_pattern.match(target):
uri = signal_slot_uri[app.config.qt_documentation]
dispname = signal_name[app.config.qt_documentation]
version = QT_VERSION
elif slot_pattern.match(target):
uri = signal_slot_uri[app.config.qt_documentation]
dispname = slot_name[app.config.qt_documentation]
version = QT_VERSION
else:
for target_name in target_list:
if target_name in inventories.main_inventory[obj_type_name]:
proj, version, uri, dispname = inventories.named_inventory["PyQt"][obj_type_name][target_name]
print(node) # print nodes with unresolved references
break
else:
return None
if app.config.qt_documentation == "Qt":
html_name = uri.split("/")[-1]
uri = "https://doc.qt.io/qt-5/" + html_name
elif app.config.qt_documentation == "PySide":
html_name = "/".join(target.split(".")[1:]) + ".html"
uri = "https://doc.qt.io/qtforpython/PySide2/" + html_name
# remove this line if you would like straight to pyqt documentation
if version:
reftitle = _('(in %s v%s)') % (app.config.qt_documentation, version)
else:
reftitle = _('(in %s)') % (app.config.qt_documentation,)
newnode = nodes.reference('', '', internal=False, refuri=uri, reftitle=reftitle)
if node.get('refexplicit'):
# use whatever title was given
newnode.append(contnode)
else:
# else use the given display name (used for :ref:)
newnode.append(contnode.__class__(dispname, dispname))
return newnode
def setup(app: Sphinx) -> Dict[str, Any]:
app.connect('missing-reference', missing_reference)
app.add_config_value('qt_documentation', "Qt", True)
return {
'version': "0.9",
'env_version': 1,
'parallel_read_safe': True
}
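To actually use the snippet above, it has to be importable from conf.py; a minimal conf.py fragment might look like this (the file name _ext/qt_doc_linker.py is just an assumption, use whatever name you saved the module under):

# conf.py -- sketch: wire up intersphinx plus the custom extension above.
import os
import sys

sys.path.insert(0, os.path.abspath('_ext'))  # folder holding the extension module

extensions = [
    'sphinx.ext.intersphinx',
    'qt_doc_linker',  # hypothetical name of the module shown above
]

intersphinx_mapping = {
    'python': ('https://docs.python.org/3', None),
    'PyQt': ('https://www.riverbankcomputing.com/static/Docs/PyQt5', None),
}

qt_documentation = 'Qt'  # or 'PyQt' / 'PySide'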
In order to get intersphinx mapping to work for my project that uses PyQt5 I did the following (a rough sketch of steps 1 and 2 follows below):
1. Downloaded the original objects.inv file
2. Changed the :sip: domain to be :py:
3. Redirected the URL for most PyQt objects to point to the Qt website, which means that instead of being directed to PyQt-QWidget when someone clicks on a QWidget in my documentation they are directed to Qt-QWidget
4. Added aliases so that :class:`QWidget`, :class:`QtWidgets.QWidget` and :class:`PyQt5.QtWidgets.QWidget` are all linked to Qt-QWidget
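A rough sketch of steps 1 and 2 using the sphobjinv package (my own assumption about tooling; the answer's actual script may differ):

# Sketch: fetch the PyQt5 inventory and move its sip:* entries to the py: domain.
import attr
import sphobjinv as soi

inv = soi.Inventory(url='https://www.riverbankcomputing.com/static/Docs/PyQt5/objects.inv')
inv.objects[:] = [
    attr.evolve(obj, domain='py') if obj.domain == 'sip' else obj
    for obj in inv.objects
]
soi.writebytes('pyqt5-modified-objects.inv',
               soi.compress(inv.data_file(contract=True)))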
If you would like to use my modified objects.inv file in your own project you can download it, save it to the same directory as your conf.py file and then edit your intersphinx_mapping dictionary in your conf.py to be
intersphinx_mapping = {
    # 'PyQt5': ('http://pyqt.sourceforge.net/Docs/PyQt5/', None),
    'PyQt5': ('', 'pyqt5-modified-objects.inv'),
}
If my 'pyqt5-modified-objects.inv' file does not meet the requirements for your project (for example, I did not add aliases for all Qt modules, only QtWidgets, QtCore and QtGui) then you can modify the source code that automatically performs steps 1 - 4 above.
The source code can also be used to create a modified objects.inv file for PyQt4; however, the original objects.inv file for PyQt4 does not contain a complete listing of all Qt modules and classes and therefore using intersphinx mapping with PyQt4 isn't very useful.
Note: The SourceForge team is currently solving some issues and so executing the source code will raise a ConnectionError until their issues are resolved.

python apt_pkg to obtain individual pkg details?

I've been using a combination of apt_pkg and apt libraries to obtain the following details from each package:
package.name
package.installedVersion
package.description
package.homepage
package.priority
I was able to obtain what I needed in the following manner, though I'm not entirely sure it's the best method of obtaining the results:
import apt_pkg, apt

apt_pkg.InitConfig()
apt_pkg.InitSystem()

aptpkg_cache = apt_pkg.GetCache()  # Low level
apt_cache = apt.Cache()  # High level
apt_cache.update()
apt_cache.open()

pkgs = {}
list_pkgs = []

for package in aptpkg_cache.Packages:
    try:
        # I use this to pass in the pkg name from the apt_pkg.packages
        # to the high level apt_cache which allows me to obtain the
        # details I need. Is it better to just stick to one library here?
        # In other words, can I obtain this with just apt_pkg instead of using apt?
        selected_package = apt_cache[package.name]
        # Verify that the package can be upgraded
        if check_pkg_status(package) == "upgradable":
            pkgs["name"] = selected_package.name
            pkgs["version"] = selected_package.installedVersion
            pkgs["desc"] = selected_package.description
            pkgs["homepage"] = selected_package.homepage
            pkgs["severity"] = selected_package.priority
            list_pkgs.append(pkgs)
        else:
            print "Package: " + package.name + " does not exist"
            pass  # Not upgradable?
    except:
        pass  # This is one of the main reasons why I want to try a different method.
              # I'm using this Try/Catch because there are a lot of times that when
              # I pass in package.name to apt_cache[], I get an error that the package
              # does not exist...

def check_pkg_status(package):
    versions = package.VersionList
    version = versions[0]
    for other_version in versions:
        if apt_pkg.VersionCompare(version.VerStr, other_version.VerStr) < 0:
            version = other_version
    if package.CurrentVer:
        current = package.CurrentVer
        if apt_pkg.VersionCompare(current.VerStr, version.VerStr) < 0:
            return "upgradable"
        else:
            return "current"
    else:
        return "uninstalled"
I want to find a good way of using apt_pkg/apt to get the details for each package that's a possible upgrade/update candidate.
The way I'm currently doing this, I only get updates/upgrades for packages already on the system, even though I've noticed that the update manager for Debian shows me packages that I don't have on my system.
The following script is based on your Python code; it works on my Ubuntu 12.04 and should also work on any system that has python-apt 0.8+.
import apt

apt_cache = apt.Cache()  # High level
apt_cache.update()
apt_cache.open()

list_pkgs = []

for package_name in apt_cache.keys():
    selected_package = apt_cache[package_name]
    # Verify that the package can be upgraded
    if selected_package.isUpgradable:
        pkg = dict(
            name=selected_package.name,
            version=selected_package.installedVersion,
            desc=selected_package.description,
            homepage=selected_package.homepage,
            severity=selected_package.priority)
        list_pkgs.append(pkg)

print list_pkgs
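On more recent python-apt releases the camelCase attributes used above are deprecated; a rough equivalent with the snake_case API (an assumption on my part, check it against your installed python-apt version) would be:

import apt

cache = apt.Cache()
cache.update()
cache.open()

upgradable = []
for pkg in cache:
    if pkg.is_upgradable:
        upgradable.append(dict(
            name=pkg.name,
            version=pkg.installed.version if pkg.installed else None,
            desc=pkg.candidate.description,
            homepage=pkg.candidate.homepage,
            severity=pkg.candidate.priority,
        ))

print(upgradable)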

Script/utility to rewrite all svn:externals in repository trunk

Say that one wishes to convert all absolute svn:externals URLs to relative URLs throughout their repository.
Alternatively, if heeding the tip in the svn:externals docs ("You should seriously consider using explicit revision numbers..."), one might find themselves needing to periodically pull new revisions for externals in many places throughout the repository.
What's the best way to programmatically update a large number of svn:externals properties?
My solution is posted below.
Here's my class to extract parts from a single line of an svn:externals property:
from urlparse import urlparse
import re

class SvnExternalsLine:
    '''Consult https://subversion.apache.org/docs/release-notes/1.5.html#externals for parsing algorithm.

    The old svn:externals format consists of:
        <local directory> [revision] <absolute remote URL>
    The NEW svn:externals format consists of:
        [revision] <absolute or relative remote URL> <local directory>
    Therefore, "relative" remote paths always come *after* the local path.

    One complication is the possibility of local paths with spaces.
    We just assume that the remote path cannot have spaces, and treat all other
    tokens (except the revision specifier) as part of the local path.
    '''

    REVISION_ARGUMENT_REGEXP = re.compile("-r(\d+)")

    def __init__(self, original_line):
        self.original_line = original_line
        self.pinned_revision_number = None
        self.repo_url = None
        self.local_pathname_components = []

        for token in self.original_line.split():
            revision_match = self.REVISION_ARGUMENT_REGEXP.match(token)
            if revision_match:
                self.pinned_revision_number = int(revision_match.group(1))
            elif urlparse(token).scheme or any(map(lambda p: token.startswith(p), ["^", "//", "/", "../"])):
                self.repo_url = token
            else:
                self.local_pathname_components.append(token)

    # ---------------------------------------------------------------------
    def constructLine(self):
        '''Reconstruct the externals line in the Subversion 1.5+ format'''
        tokens = []

        # Update the revision specifier if one existed
        if self.pinned_revision_number is not None:
            tokens.append( "-r%d" % (self.pinned_revision_number) )

        tokens.append( self.repo_url )
        tokens.extend( self.local_pathname_components )

        if self.repo_url is None:
            raise Exception("Found a bad externals property: %s; Original definition: %s" % (str(tokens), repr(self.original_line)))

        return " ".join(tokens)
I use the pysvn library to iterate recursively through all of the directories possessing the svn:externals property, then split that property value by newlines, and act upon each line according to the parsed SvnExternalsLine.
The process must be performed on a local checkout of the repository. Here's how pysvn (propget) can be used to retrieve the externals:
client.propget( "svn:externals", base_checkout_path, recurse=True)
Iterate through the return value of this function, and after modifying the property for each directory, set it with:
client.propset("svn:externals", new_externals_property, path)
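Putting those pieces together, a minimal sketch of the whole rewrite loop might look like this (convert_url is a hypothetical placeholder for whatever URL rewriting you need, e.g. absolute-to-relative):

import pysvn

def convert_url(url):
    # Hypothetical placeholder: rewrite one absolute URL into the relative form you want.
    return url

client = pysvn.Client()
externals = client.propget("svn:externals", base_checkout_path, recurse=True)

for path, prop_value in externals.items():
    new_lines = []
    for line in prop_value.splitlines():
        if not line.strip():
            continue
        parsed = SvnExternalsLine(line)
        parsed.repo_url = convert_url(parsed.repo_url)
        new_lines.append(parsed.constructLine())
    client.propset("svn:externals", "\n".join(new_lines), path)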
