Google Dataflow: global name is not defined - apache beam - python

Locally I have this:
from shapely.geometry import Point
<...>
class GeoDataIngestion:
    def parse_method(self, string_input):
        Place = Point(float(values[2]), float(values[3]))
        <...>
I run this with Python 2.7 and all goes well.
After that, I try to test it with the Dataflow runner, but while running I get this error:
NameError: global name 'Point' is not defined
The pipeline:
geo_data = (raw_data
            | 'Geo data transform' >> beam.Map(lambda s: geo_ingestion.parse_method(s))
            )
I have read other posts and I think this should work, but I'm not sure whether there is something special about Google Dataflow here.
I also tried:
import shapely.geometry
<...>
Place = shapely.geometry.Point(float(values[2]), float(values[3]))
With the same result
NameError: global name 'shapely' is not defined
Any idea?
In Google Cloud, if I try it in my virtual environment, I can do it without any problem:
(env) ...#cloudshell:~ ()$ python
Python 2.7.13 (default, Sep 26 2018, 18:42:22)
[GCC 6.3.0 20170516] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from shapely.geometry import Point
>>> Var = Point(-5.020751953125, 39.92237576385941)
EXTRA:
Error using requirements.txt
Collecting Shapely==1.6.4.post1 (from -r req.txt (line 2))
Using cached https://files.pythonhosted.org/packages/7d/3c/0f09841db07aabf9cc387662be646f181d07ed196e6f60ce8be5f4a8f0bd/Shapely-1.6.4.post1.tar.gz
Saved c:\<...>\shapely-1.6.4.post1.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "c:\<...>\temp\pip-download-kpg5ca\Shapely\setup.py", line 80, in <module>
from shapely._buildcfg import geos_version_string, geos_version, \
File "shapely\_buildcfg.py", line 200, in <module>
lgeos = CDLL("geos_c.dll")
File "C:\Python27\Lib\ctypes\__init__.py", line 366, in __init__
self._handle = _dlopen(self._name, mode)
WindowsError: [Error 126] No se puede encontrar el módulo especificado (the specified module could not be found)
Error using setup.py
setup.py changed like this:
CUSTOM_COMMANDS = [
    ['apt-get', 'update'],
    ['apt-get', '--assume-yes', 'install', 'libgeos-dev'],
    ['pip', 'install', 'Shapely'],
    ['echo', 'Custom command worked!']
]
The result is as if no package had been installed, because I get the same error as at the beginning:
NameError: global name 'Point' is not defined
setup.py file:
from __future__ import absolute_import
from __future__ import print_function

import subprocess
from distutils.command.build import build as _build

import setuptools


class build(_build):  # pylint: disable=invalid-name
    sub_commands = _build.sub_commands + [('CustomCommands', None)]


CUSTOM_COMMANDS = [
    ['apt-get', 'update'],
    ['apt-get', '--assume-yes', 'install', 'libgeos-dev'],
    ['pip', 'install', 'Shapely']]


class CustomCommands(setuptools.Command):
    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def RunCustomCommand(self, command_list):
        print('Running command: %s' % command_list)
        p = subprocess.Popen(
            command_list,
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        # Can use communicate(input='y\n'.encode()) if the command run requires
        # some confirmation.
        stdout_data, _ = p.communicate()
        print('Command output: %s' % stdout_data)
        if p.returncode != 0:
            raise RuntimeError(
                'Command %s failed: exit code: %s' % (command_list, p.returncode))

    def run(self):
        for command in CUSTOM_COMMANDS:
            self.RunCustomCommand(command)


REQUIRED_PACKAGES = ['Shapely']

setuptools.setup(
    name='dataflow',
    version='0.0.1',
    description='Dataflow set workflow package.',
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
    cmdclass={
        'build': build,
        'CustomCommands': CustomCommands,
    }
)
pipeline options:
pipeline_options = PipelineOptions()
pipeline_options.view_as(StandardOptions).streaming = True
pipeline_options.view_as(SetupOptions).save_main_session = True
pipeline_options.view_as(SetupOptions).setup_file = 'C:\<...>\setup.py'
with beam.Pipeline(options=pipeline_options) as p:
The call:
python -m dataflow --project XXX --temp_location gs://YYY --runner DataflowRunner --region europe-west1 --setup_file C:\<...>\setup.py
The log at the beginning (before Dataflow waits for the data):
INFO:root:Defaulting to the temp_location as staging_location: gs://iotbucketdetector/test/prueba
C:\Users\<...>~1\Desktop\PROYEC~2\env\lib\site-packages\apache_beam\runners\dataflow\dataflow_runner.py:816: DeprecationWarning: options is deprecated since First stable release.. References to <pipeline>.options will
not be supported
transform_node.inputs[0].pipeline.options.view_as(StandardOptions))
INFO:root:Starting GCS upload to gs://<...>-1120074505-586000.1542699905.588000/pipeline.pb...
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:root:Completed GCS upload to gs://<...>-1120074505-586000.1542699905.588000/pipeline.pb
INFO:root:Executing command: ['C:\\Users\\<...>~1\\Desktop\\PROYEC~2\\env\\Scripts\\python.exe', 'setup.py', 'sdist', '--dist-dir', 'c:\\users\\<...>~1\\appdata\\local\\temp\\tmpakq8bs']
running sdist
running egg_info
writing requirements to dataflow.egg-info\requires.txt
writing dataflow.egg-info\PKG-INFO
writing top-level names to dataflow.egg-info\top_level.txt
writing dependency_links to dataflow.egg-info\dependency_links.txt
reading manifest file 'dataflow.egg-info\SOURCES.txt'
writing manifest file 'dataflow.egg-info\SOURCES.txt'
warning: sdist: standard file not found: should have one of README, README.rst, README.txt, README.md
running check
warning: check: missing required meta-data: url
warning: check: missing meta-data: either (author and author_email) or (maintainer and maintainer_email) must be supplied
creating dataflow-0.0.1
creating dataflow-0.0.1\dataflow.egg-info
copying files to dataflow-0.0.1...
copying setup.py -> dataflow-0.0.1
copying dataflow.egg-info\PKG-INFO -> dataflow-0.0.1\dataflow.egg-info
copying dataflow.egg-info\SOURCES.txt -> dataflow-0.0.1\dataflow.egg-info
copying dataflow.egg-info\dependency_links.txt -> dataflow-0.0.1\dataflow.egg-info
copying dataflow.egg-info\requires.txt -> dataflow-0.0.1\dataflow.egg-info
copying dataflow.egg-info\top_level.txt -> dataflow-0.0.1\dataflow.egg-info
Writing dataflow-0.0.1\setup.cfg
Creating tar archive
removing 'dataflow-0.0.1' (and everything under it)
INFO:root:Starting GCS upload to gs://<...>-1120074505-586000.1542699905.588000/workflow.tar.gz...
INFO:root:Completed GCS upload to gs://<...>-1120074505-586000.1542699905.588000/workflow.tar.gz
INFO:root:Starting GCS upload to gs://<...>-1120074505-586000.1542699905.588000/pickled_main_session...
INFO:root:Completed GCS upload to gs://<...>-1120074505-586000.1542699905.588000/pickled_main_session
INFO:root:Downloading source distribtution of the SDK from PyPi
INFO:root:Executing command: ['C:\\Users\\<...>~1\\Desktop\\PROYEC~2\\env\\Scripts\\python.exe', '-m', 'pip', 'download', '--dest', 'c:\\users\\<...>~1\\appdata\\local\\temp\\tmpakq8bs', 'apache-beam==2.5.0', '--no-d
eps', '--no-binary', ':all:']
Collecting apache-beam==2.5.0
Using cached https://files.pythonhosted.org/packages/c6/96/56469c57cb043f36bfdd3786c463fbaeade1e8fcf0593ec7bc7f99e56d38/apache-beam-2.5.0.zip
Saved c:\users\<...>~1\appdata\local\temp\tmpakq8bs\apache-beam-2.5.0.zip
Successfully downloaded apache-beam
INFO:root:Staging SDK sources from PyPI to gs://<...>-1120074505-586000.1542699905.588000/dataflow_python_sdk.tar
INFO:root:Starting GCS upload to gs://<...>-1120074505-586000.1542699905.588000/dataflow_python_sdk.tar...
INFO:root:Completed GCS upload to gs://<...>-1120074505-586000.1542699905.588000/dataflow_python_sdk.tar
INFO:root:Downloading binary distribtution of the SDK from PyPi
INFO:root:Executing command: ['C:\\Users\\<...>~1\\Desktop\\PROYEC~2\\env\\Scripts\\python.exe', '-m', 'pip', 'download', '--dest', 'c:\\users\\<...>~1\\appdata\\local\\temp\\tmpakq8bs', 'apache-beam==2.5.0', '--no-d
eps', '--only-binary', ':all:', '--python-version', '27', '--implementation', 'cp', '--abi', 'cp27mu', '--platform', 'manylinux1_x86_64']
Collecting apache-beam==2.5.0
Using cached https://files.pythonhosted.org/packages/ff/10/a59ba412f71fb65412ec7a322de6331e19ec8e75ca45eba7a0708daae31a/apache_beam-2.5.0-cp27-cp27mu-manylinux1_x86_64.whl
Saved c:\users\<...>~1\appdata\local\temp\tmpakq8bs\apache_beam-2.5.0-cp27-cp27mu-manylinux1_x86_64.whl
Successfully downloaded apache-beam
INFO:root:Staging binary distribution of the SDK from PyPI to gs://<...>-1120074505-586000.1542699905.588000/apache_beam-2.5.0-cp27-cp27mu-manylinux1_x86_64.whl
INFO:root:Starting GCS upload to gs://<...>-1120074505-586000.1542699905.588000/apache_beam-2.5.0-cp27-cp27mu-manylinux1_x86_64.whl...
INFO:root:Completed GCS upload to gs://<...>-1120074505-586000.1542699905.588000/apache_beam-2.5.0-cp27-cp27mu-manylinux1_x86_64.whl
INFO:root:Create job: <Job
createTime: u'2018-11-20T07:45:28.050865Z'
currentStateTime: u'1970-01-01T00:00:00Z'
id: u'2018-11-19_23_45_27-14221834310382472741'
location: u'europe-west1'
name: u'beamapp-<...>-1120074505-586000'
projectId: u'poc-cloud-209212'
stageStates: []
steps: []
tempFiles: []
type: TypeValueValuesEnum(JOB_TYPE_STREAMING, 2)>

This is because you need to tell Dataflow to install the packages you want.
Brief documentation is here.
Simply put, for a PyPI package like shapely, you can do the following to ensure all dependencies are installed.
pip freeze > requirements.txt
Remove all unrelated packages from requirements.txt
Run your pipeline with --requirements_file requirements.txt (see the sketch below)
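For instance, a minimal sketch of that last step expressed through pipeline options (the file name and the pinned version in the comment are examples, not taken from the question):
# Sketch: point the Dataflow runner at the trimmed requirements file so workers
# pip-install shapely before executing the DoFns.
# requirements.txt would contain a pinned line such as: Shapely==1.6.4.post1
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

pipeline_options = PipelineOptions()
pipeline_options.view_as(SetupOptions).requirements_file = 'requirements.txt'
Equivalently, pass --requirements_file requirements.txt on the command line.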
Going further, if you want to do something like install a Linux package with apt-get or use your own Python module, take a look at this official example. You need to create a setup.py for this and run your pipeline command with
--setup_file setup.py.
For PyPI modules, use REQUIRED_PACKAGES as in the example.
REQUIRED_PACKAGES = [
    'numpy', 'shapely'
]
If you use pipeline options, then add setup.py like this:
pipeline_options = {
    'project': PROJECT,
    'staging_location': 'gs://' + BUCKET + '/staging',
    'runner': 'DataflowRunner',
    'job_name': 'test',
    'temp_location': 'gs://' + BUCKET + '/temp',
    'save_main_session': True,
    'setup_file': '.\setup.py'
}
options = PipelineOptions.from_dictionary(pipeline_options)
p = beam.Pipeline(options=options)

Import inside the function + setup.py:
class GeoDataIngestion:
    def parse_method(self, string_input):
        from shapely.geometry import Point
        place = Point(float(values[2]), float(values[3]))
setup.py with:
REQUIRED_PACKAGES = ['shapely']
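Putting the two pieces together, a minimal setup.py along these lines might look as follows (a sketch; the package name and version are placeholders, not from the question):
# Sketch of a minimal setup.py whose only job is to declare the PyPI dependency;
# 'dataflow-geo' and '0.0.1' are placeholder values.
import setuptools

REQUIRED_PACKAGES = ['shapely']

setuptools.setup(
    name='dataflow-geo',
    version='0.0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
)
Run the pipeline with --setup_file pointing at this file, as in the question's call.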

Related

Python - Build and release an Artifact with AzureDevOps

I'm trying to create an Azure DevOps Pipeline in order to build and release a Python package under the Azure DevOps Artifacts section.
I've started creating a feed called "utils", then I've created my package and I've structured it like that:
.
├── src
│   ├── __init__.py
│   └── class.py
├── test
│   ├── __init__.py
│   └── test_class.py
├── .pypirc
├── azure-pipelines.yml
├── pyproject.toml
├── requirements.txt
└── setup.cfg
And this is the content of files:
.pypirc
[distutils]
Index-servers =
    prelios-utils

[utils]
Repository = https://pkgs.dev.azure.com/OMIT/_packaging/utils/pypi/upload/
pyproject.toml
[build-system]
requires = [
    "setuptools>=42",
    "wheel"
]
build-backend = "setuptools.build_meta"
setup.cfg
[metadata]
name = my_utils
version = 0.1
author = Walter Tranchina
author_email = walter.tranchina@OMIT.com
description = A package containing [...]
long_description = file: README.md
long_description_content_type = text/markdown
url = OMIT.com
project_urls =
classifiers =
    Programming Language :: Python :: 3
    License :: OSI Approved :: MIT License
    Operating System :: OS Independent

[options]
package_dir =
    = src
packages = find:
python_requires = >=3.7
install_requires=

[options.packages.find]
where = src
azure-pipelines.yml
trigger:
- main

pool:
  vmImage: 'ubuntu-latest'
strategy:
  matrix:
    Python38:
      python.version: '3.8'

steps:
- task: UsePythonVersion@0
  inputs:
    versionSpec: '$(python.version)'
  displayName: 'Use Python $(python.version)'

- script: |
    python -m pip install --upgrade pip
  displayName: 'Install dependencies'

- script: |
    pip install twine wheel
  displayName: 'Install buildtools'

- script: |
    pip install pytest pytest-azurepipelines
    pytest
  displayName: 'pytest'

- script: |
    python -m build
  displayName: 'Artifact creation'

- script: |
    twine upload -r utils --config-file ./.pypirc dist/*
  displayName: 'Artifact Upload'
The problem I'm facing is that the pipeline gets stuck in the Artifact Upload stage for hours without completing.
Can someone please help me understand what's wrong?
Thanks!
[UPDATE]
I've updated my yml file as suggested in the answers:
- task: TwineAuthenticate@1
  displayName: 'Twine Authenticate'
  inputs:
    artifactFeed: 'utils'
And now I have this error:
2022-05-19T09:20:50.6726960Z ##[section]Starting: Artifact Upload
2022-05-19T09:20:50.6735745Z ==============================================================================
2022-05-19T09:20:50.6736081Z Task : Command line
2022-05-19T09:20:50.6736434Z Description : Run a command line script using Bash on Linux and macOS and cmd.exe on Windows
2022-05-19T09:20:50.6736788Z Version : 2.201.1
2022-05-19T09:20:50.6737008Z Author : Microsoft Corporation
2022-05-19T09:20:50.6737375Z Help : https://learn.microsoft.com/azure/devops/pipelines/tasks/utility/command-line
2022-05-19T09:20:50.6737859Z ==============================================================================
2022-05-19T09:20:50.8090380Z Generating script.
2022-05-19T09:20:50.8100662Z Script contents:
2022-05-19T09:20:50.8102321Z twine upload -r utils --config-file ./.pypirc dist/*
2022-05-19T09:20:50.8102824Z ========================== Starting Command Output ===========================
2022-05-19T09:20:50.8129029Z [command]/usr/bin/bash --noprofile --norc /home/vsts/work/_temp/706c12ef-da25-44b0-b1fc-5ab83e7e0bf9.sh
2022-05-19T09:20:51.1178721Z Uploading distributions to
2022-05-19T09:20:51.1180490Z https://pkgs.dev.azure.com/OMIT/_packaging/utils/pypi/upload/
2022-05-19T09:20:27.0860014Z Traceback (most recent call last):
2022-05-19T09:20:27.0861203Z File "/opt/hostedtoolcache/Python/3.8.12/x64/bin/twine", line 8, in <module>
2022-05-19T09:20:27.0862081Z sys.exit(main())
2022-05-19T09:20:27.0863965Z File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/twine/__main__.py", line 33, in main
2022-05-19T09:20:27.0865080Z error = cli.dispatch(sys.argv[1:])
2022-05-19T09:20:27.0866638Z File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/twine/cli.py", line 124, in dispatch
2022-05-19T09:20:27.0867670Z return main(args.args)
2022-05-19T09:20:27.0869183Z File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/twine/commands/upload.py", line 198, in main
2022-05-19T09:20:27.0870362Z return upload(upload_settings, parsed_args.dists)
2022-05-19T09:20:27.0871990Z File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/twine/commands/upload.py", line 127, in upload
2022-05-19T09:20:27.0873239Z repository = upload_settings.create_repository()
2022-05-19T09:20:27.0875392Z File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/twine/settings.py", line 329, in create_repository
2022-05-19T09:20:27.0876447Z self.username,
2022-05-19T09:20:27.0877911Z File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/twine/settings.py", line 131, in username
2022-05-19T09:20:27.0879043Z return cast(Optional[str], self.auth.username)
2022-05-19T09:20:27.0880583Z File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/twine/auth.py", line 34, in username
2022-05-19T09:20:27.0881640Z return utils.get_userpass_value(
2022-05-19T09:20:27.0883208Z File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/twine/utils.py", line 248, in get_userpass_value
2022-05-19T09:20:27.0884302Z value = prompt_strategy()
2022-05-19T09:20:27.0886234Z File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/twine/auth.py", line 85, in username_from_keyring_or_prompt
2022-05-19T09:20:27.0887440Z return self.prompt("username", input)
2022-05-19T09:20:27.0888964Z File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/twine/auth.py", line 96, in prompt
2022-05-19T09:20:27.0890017Z return how(f"Enter your {what}: ")
2022-05-19T09:20:27.0890786Z EOFError: EOF when reading a line
2022-05-19T09:20:27.1372189Z ##[error]Bash exited with code 'null'.
2022-05-19T09:20:27.1745024Z ##[error]The operation was canceled.
2022-05-19T09:20:27.1749049Z ##[section]Finishing: Artifact Upload
Seems like twine is waiting for something... :/
I guess this is because you are missing a Python Twine Upload Authenticate task.
- task: TwineAuthenticate@1
  inputs:
    artifactFeed: 'MyTestFeed'
If you are using a project level feed, the value of artifactFeed should be {project name}/{feed name}.
If you are using an organization level feed, the value of artifactFeed should be {feed name}.
A simpler way is to click the gray "setting" button under the task and select your feed from the drop-down list.
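For example, for a hypothetical project-scoped feed the task would look like this (the project and feed names are placeholders):
# Sketch: authenticate against a project-level feed 'MyTestFeed' in project 'MyProject'
- task: TwineAuthenticate@1
  inputs:
    artifactFeed: 'MyProject/MyTestFeed'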
I've found the solution after many attempts...
First I created a Service Connection to Python in Azure DevOps, containing a previously generated API key.
Then I edited the yaml file:
- task: TwineAuthenticate@1
  displayName: 'Twine Authenticate'
  inputs:
    pythonUploadServiceConnection: 'PythonUpload'

- script: |
    python -m twine upload --skip-existing --verbose -r utils --config-file $(PYPIRC_PATH) dist/*
  displayName: 'Artifact Upload'
The key was using the variable $(PYPIRC_PATH), which is automatically set by the previous task. The .pypirc file is ignored by the process, so it can be deleted!
Hope it will help!

Python/Pip packaging; how to move built files to install directory

I have been working on a Python package which wraps some C++ libraries that need to be built from source. I build these with CMake, and I want the whole thing to be pip-installable in the end. I am almost there; however, I am having problems getting the libraries built by CMake to end up in the final Python installation directory.
I managed to get them into the final wheel, oddly enough, but they aren't in my site-packages directory.
My setup.py file looks like this:
import os
import re
import sys
import sysconfig
import site
import platform
import subprocess
import pathlib
from distutils.version import LooseVersion
from setuptools import setup, Extension
from setuptools.command.build_ext import build_ext as build_ext_orig


class CMakeExtension(Extension):
    def __init__(self, name, sourcedir=''):
        Extension.__init__(self, name, sources=[])
        self.sourcedir = os.path.abspath(sourcedir)


class CMakeBuild(build_ext_orig):
    def run(self):
        try:
            out = subprocess.check_output(['cmake', '--version'])
        except OSError:
            raise RuntimeError("CMake must be installed to build the following extensions: " +
                               ", ".join(e.name for e in self.extensions))
        if platform.system() == "Windows":
            raise RuntimeError("Sorry, pyScannerBit doesn't work on Windows platforms. Please use Linux or OSX.")
        for ext in self.extensions:
            self.build_extension(ext)

    def build_extension(self, ext):
        extdir = os.path.abspath(os.path.dirname(self.get_ext_fullpath(ext.name)))
        cmake_args = ['-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=' + extdir,
                      '-DPYTHON_EXECUTABLE=' + sys.executable,
                      '-DCMAKE_VERBOSE_MAKEFILE:BOOL=OFF',
                      '-Wno-dev',
                      '-DCMAKE_RUNTIME_OUTPUT_DIRECTORY=' + extdir,
                      '-DSCANNERBIT_STANDALONE=True',
                      '-DCMAKE_INSTALL_RPATH=$ORIGIN',
                      '-DCMAKE_BUILD_WITH_INSTALL_RPATH:BOOL=ON',
                      '-DCMAKE_INSTALL_RPATH_USE_LINK_PATH:BOOL=ON',
                      '-DCMAKE_INSTALL_PREFIX:PATH=' + extdir,
                      ]
        cfg = 'Debug' if self.debug else 'Release'
        build_args = ['--config', cfg]
        if platform.system() == "Windows":
            cmake_args += ['-DCMAKE_LIBRARY_OUTPUT_DIRECTORY_{}={}'.format(cfg.upper(), extdir)]
            if sys.maxsize > 2**32:
                cmake_args += ['-A', 'x64']
            build_args += ['--', '/m']
        else:
            cmake_args += ['-DCMAKE_BUILD_TYPE=' + cfg]
            build_args += ['--', '-j2']
        env = os.environ.copy()
        env['CXXFLAGS'] = '{} -DVERSION_INFO=\\"{}\\"'.format(env.get('CXXFLAGS', ''),
                                                              self.distribution.get_version())
        if not os.path.exists(self.build_temp):
            os.makedirs(self.build_temp)
        # untar ScannerBit tarball
        subprocess.check_call(['tar','-C','pyscannerbit/scannerbit/untar/ScannerBit','-xf','pyscannerbit/scannerbit/ScannerBit_stripped.tar','--strip-components=1'], cwd=ext.sourcedir, env=env)
        # First cmake
        subprocess.check_call(['cmake', ext.sourcedir] + cmake_args, cwd=self.build_temp, env=env)
        # Build all the scanners
        subprocess.check_call(['cmake', '--build', '.', '--target', 'multinest'] + build_args, cwd=self.build_temp)
        # Re-run cmake to detect built scanner plugins
        subprocess.check_call(['cmake', ext.sourcedir], cwd=self.build_temp)
        # Main build
        subprocess.check_call(['cmake', '--build', '.'] + build_args, cwd=self.build_temp)
        # Install
        #subprocess.check_call(['cmake', '--build', '.', '--target', 'install'], cwd=self.build_temp)


setup(
    name='pyscannerbit',
    version='0.0.8',
    author='Ben Farmer',
    # Add yourself if you contribute to this package
    author_email='ben.farmer@gmail.com',
    description='A python interface to the GAMBIT scanning module, ScannerBit',
    long_description='',
    ext_modules=[CMakeExtension('_interface')],
    cmdclass=dict(build_ext=CMakeBuild),
    zip_safe=False,
    packages=['pyscannerbit'],
)
As you can see, I am telling CMake to build the libraries in 'extdir', which it turns out is
/tmp/pip-req-build-d7mfvn1a/build/lib.linux-x86_64-3.6
I had assumed that the files would just be copied from here (or some other temporary directory?) into the final install path in bulk, but perhaps it doesn't work like that (though as I said earlier, these built files do end up in the wheel that is generated). Do these built files need to be added to MANIFEST.in or some 'package_data' entry or something like that? Currently they are not listed anywhere like that, since it was my understanding that those were for moving files around pre-build, not post-build. Currently I only use MANIFEST.in to make sure my sdist tarball gets filled correctly.
For completeness, I am building the package with pip as follows:
python setup.py sdist
pip install -v dist/pyscannerbit-0.0.8.tar.gz
This is just so I know that the build from the tarball works, for later use with PyPI.
The source is on github if you want to try it out: https://github.com/bjfar/pyscannerbit
OK, so it seems that I just had the paths a bit wrong. I was previously setting CMAKE_LIBRARY_OUTPUT_DIRECTORY to
extdir = os.path.abspath(os.path.dirname(self.get_ext_fullpath(ext.name)))
However, I needed to point it to
extdir+'/pyscannerbit'
where pyscannerbit is the name of the package. Otherwise the files end up in the parent directory where the build occurs, but not inside the project directory. So then they don't subsequently get copied to the install path.
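In terms of the setup.py above, that change amounts to roughly this (a sketch; only the output-directory arguments change, and 'pyscannerbit' is assumed to be the package name):
# Sketch: build the CMake outputs into the package subdirectory of the build tree,
# so setuptools copies them along with the rest of the 'pyscannerbit' package.
extdir = os.path.abspath(os.path.dirname(self.get_ext_fullpath(ext.name)))
package_dir = os.path.join(extdir, 'pyscannerbit')

cmake_args = ['-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=' + package_dir,
              '-DCMAKE_RUNTIME_OUTPUT_DIRECTORY=' + package_dir,
              '-DCMAKE_INSTALL_PREFIX:PATH=' + package_dir,
              # ... the remaining flags stay as in the original build_extension
              ]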

Can't use Python's sh module in Bazel genrule

When I run a Python script that uses the 'sh' module from a Bazel genrule, it fails with this:
INFO: Analysed target //src:foo_gen (8 packages loaded).
INFO: Found 1 target...
ERROR: /home/libin11/workspace/test/test/src/BUILD:1:1: Executing genrule //src:foo_gen failed (Exit 1)
Traceback (most recent call last):
File "src/test.py", line 2, in <module>
sh.touch("foo.bar")
File "/usr/local/lib/python2.7/dist-packages/sh.py", line 1427, in __call__
return RunningCommand(cmd, call_args, stdin, stdout, stderr)
File "/usr/local/lib/python2.7/dist-packages/sh.py", line 767, in __init__
self.call_args, pipe, process_assign_lock)
File "/usr/local/lib/python2.7/dist-packages/sh.py", line 1784, in __init__
self._stdout_read_fd, self._stdout_write_fd = pty.openpty()
File "/usr/lib/python2.7/pty.py", line 29, in openpty
master_fd, slave_name = _open_terminal()
File "/usr/lib/python2.7/pty.py", line 70, in _open_terminal
raise os.error, 'out of pty devices'
OSError: out of pty devices
Target //src:foo_gen failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 2.143s, Critical Path: 0.12s
INFO: 0 processes.
FAILED: Build did NOT complete successfully
I want to integrate a third-party project into my own. The third-party project is built with a Python script, so I would like to build it with a Bazel genrule.
Here is an example file list:
.
├── src
│   ├── BUILD
│   └── test.py
└── WORKSPACE
WORKSPACE is empty, BUILD is:
genrule(
    name = "foo_gen",
    srcs = glob(["**/*"]),
    outs = ["foo.bar"],
    cmd = "python $(location test.py)",
)
test.py is:
import sh
sh.touch("foo.bar")
And run:
bazel build //src:foo_gen
OS: Ubuntu 16.04
bazel: release 0.14.1
It looks like if you change the call to sh.touch("foo.bar", _tty_in=False, _tty_out=False) it works (see the sketch below), but you'll still need a small modification to the genrule, otherwise it won't produce output.
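A sketch of that minimal route, keeping the question's test.py (the genrule would still have to copy foo.bar to $@, as in the rule shown further down):
# test.py -- avoid allocating a pseudo-terminal, which the Bazel sandbox does not provide
import sh

sh.touch("foo.bar", _tty_in=False, _tty_out=False)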
I prefer to import pip dependencies using the bazel python rules, so I can create the tool for my genrule. This way, bazel handles the pip requirement install and you don't have to chmod the test.py file.
load("#my_deps//:requirements.bzl", "requirement")
py_binary(
name = "foo_tool",
srcs = [
"test.py",
],
main = "test.py",
deps = [
requirement("sh"),
],
)
genrule(
    name = "foo_gen",
    outs = ["foo.bar"],
    cmd = """
python3 $(location //src:foo_tool)
cp foo.bar $@
""",
    tools = [":foo_tool"],
)
Note the required copy in the genrule command. It's a bit cleaner if your Python script can write its output to stdout; then you can just redirect the output to the file instead of adding a copy command, as in the sketch below. See this for more info.
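For example, a sketch of that stdout-based variant (the generated content here is made up):
# test.py -- emit the generated file's contents on stdout instead of writing foo.bar
import sys

sys.stdout.write("generated content\n")
The genrule command then shrinks to something like cmd = "python3 $(location //src:foo_tool) > $@", with no copy step.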
My output with these changes:
INFO: Analysed target //src:foo_gen (0 packages loaded).
INFO: Found 1 target...
Target //src:foo_gen up-to-date:
bazel-genfiles/src/foo.bar
INFO: Elapsed time: 0.302s, Critical Path: 0.00s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action

Pip failing to install dependencies

I have an open source project (GridCal) and I tell users to install the package with pip install GridCal or pip3 install GridCal for unix systems.
The setup file is this:
from distutils.core import setup
import sys
import os
import platform
from GridCal.grid.CalculationEngine import __GridCal_VERSION__

name = "GridCal"
version = str(__GridCal_VERSION__)
description = "Research Oriented electrical simulation software."

# Python 3.5 or later needed
if sys.version_info < (3, 5, 0, 'final', 0):
    raise (SystemExit, 'Python 3.5 or later is required!')

# Build a list of all project modules
packages = []
for dir_name, dir_names, file_names in os.walk(name):
    if '__init__.py' in file_names:
        packages.append(dir_name.replace('/', '.'))

package_dir = {name: name}

# Data_files (e.g. doc) needs (directory, files-in-this-directory) tuples
data_files = []
for dir_name, dir_names, file_names in os.walk('doc'):
    files_list = []
    for filename in file_names:
        fullname = os.path.join(dir_name, filename)
        files_list.append(fullname)
    data_files.append(('share/' + name + '/' + dir_name, files_list))

if platform.system() == 'Windows':
    # list the packages (On windows anaconda is assumed)
    required_packages = ["numpy",
                         "scipy",
                         "networkx",
                         "pandas",
                         "xlwt",
                         "xlrd",
                         # "PyQt5",
                         "matplotlib",
                         "qtconsole",
                         "pysot",
                         "openpyxl",
                         "pulp"
                         ]
else:
    # make the desktop entry
    make_linux_desktop_file(version_=version, comment=description)

    # list the packages
    required_packages = ["numpy",
                         "scipy",
                         "networkx",
                         "pandas",
                         "xlwt",
                         "xlrd",
                         "PyQt5",
                         "matplotlib",
                         "qtconsole",
                         "pysot",
                         "openpyxl",
                         "pulp"
                         ]

# Read the license
with open('LICENSE.txt', 'r') as f:
    license_text = f.read()

setup(
    # Application name:
    name=name,

    # Version number (initial):
    version=version,

    # Application author details:
    author="Santiago Peñate Vera",
    author_email="santiago.penate.vera@gmail.com",

    # Packages
    packages=packages,
    data_files=data_files,

    # Include additional files into the package
    include_package_data=True,

    # Details
    url="http://pypi.python.org/pypi/GridCal/",

    # License file
    license=license_text,

    # description
    description=description,
    # long_description=open("README.txt").read(),

    # Dependent packages (distributions)
    install_requires=required_packages,
    setup_requires=required_packages
)
From time to time I get user reports saying that the program is missing modules: https://github.com/SanPen/GridCal/issues/12
I have specified the list of packages in both install_requires and setup_requires.
Is this a pip bug, or should I do something else?
Your setup.py imports GridCal.grid.CalculationEngine, which imports almost all of your dependencies. That is, your setup.py imports dependencies before installing them.
Try installing it in a new empty virtual env detached from your global site-packages; that reliably fails:
$ virtualenv --no-site-packages --python python3.4 test-gcal
Running virtualenv with interpreter /usr/bin/python3.4
Using base prefix '/usr'
New python executable in /home/phd/tmp/test-gcal/bin/python3.4
Also creating executable in /home/phd/tmp/test-gcal/bin/python
Installing setuptools, pip, wheel...done.
$ source test-gcal/bin/activate
$ pip install GridCal
Collecting GridCal
Using cached GridCal-1.85.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-c7q9pbep/GridCal/setup.py", line 5, in <module>
from GridCal.grid.CalculationEngine import __GridCal_VERSION__
File "/tmp/pip-build-c7q9pbep/GridCal/GridCal/grid/CalculationEngine.py", line 18, in <module>
from GridCal.grid.JacobianBased import IwamotoNR, Jacobian, LevenbergMarquardtPF
File "/tmp/pip-build-c7q9pbep/GridCal/GridCal/grid/JacobianBased.py", line 19, in <module>
from numpy import array, angle, exp, linalg, r_, Inf, conj, diag, asmatrix, asarray, zeros_like, zeros, complex128, \
ImportError: No module named 'numpy'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-c7q9pbep/GridCal/
The fix is relatively straightforward: you have to move __GridCal_VERSION__ from GridCal/grid/CalculationEngine.py to a separate GridCal/version.py (or __version__.py or something like that) and do from GridCal.version import __GridCal_VERSION__ in setup.py.
Please remember that the import will work only if your GridCal/__init__.py is empty or only imports builtin/standard modules. If said __init__.py directly or indirectly imports a (not yet installed) dependency, version.py cannot be imported. There is a way to overcome this in setup.py, but I skip it for now. If you ever need the solution, ask again.
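A sketch of what that split could look like (file contents are illustrative; the version number is taken from the pip transcript above and could equally be a string):
# GridCal/version.py -- keep this module free of third-party imports
__GridCal_VERSION__ = 1.85


# setup.py -- import only the lightweight version module, not the calculation engine
from GridCal.version import __GridCal_VERSION__

version = str(__GridCal_VERSION__)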

How to include (script-built) libraries with package installation?

I am making a Python package that has a C++-extension module and someone else's shared library that it requires. I want everything installable via pip. My current setup.py file works when I use pip install -e ., but when I don't use develop mode (i.e. omit the -e) I get "cannot open shared object file" when importing the module in Python. I believe the reason is that setuptools doesn't consider the shared library to be part of my package, so the relative link to the library is broken during installation when files are copied to the install directory.
Here is what my setup.py file looks like:
from setuptools import setup, Extension, Command
import setuptools.command.develop
import setuptools.command.build_ext
import setuptools.command.install
import distutils.command.build
import subprocess
import sys
import os


# This function downloads and builds the shared-library
def run_clib_install_script():
    build_clib_cmd = ['bash', 'clib_install.sh']
    if subprocess.call(build_clib_cmd) != 0:
        sys.exit("Failed to build C++ dependencies")


# I make a new command that will build the shared-library
class build_clib(Command):
    user_options = []

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def run(self):
        run_clib_install_script()


# I subclass install so that it will call my new command
class install(setuptools.command.install.install):
    def run(self):
        self.run_command('build_clib')
        setuptools.command.install.install.run(self)


# I do the same for build...
class build(distutils.command.build.build):
    sub_commands = [
        ('build_clib', lambda self: True),
    ] + distutils.command.build.build.sub_commands


# ...and the same for develop
class develop(setuptools.command.develop.develop):
    def run(self):
        self.run_command('build_clib')
        setuptools.command.develop.develop.run(self)


# These are my includes...
# note that /clib/include only exists after calling clib_install.sh
cwd = os.path.dirname(os.path.abspath(__file__))
include_dirs = [
    cwd,
    cwd + '/clib/include',
    cwd + '/common',
]

# These are my arguments for the compiler to my shared-library
lib_path = os.path.join(cwd, "clib", "lib")
library_dirs = [lib_path]
link_args = [os.path.join(lib_path, "libclib.so")]

# My extension module gets these arguments so it can link to clib
mygen_module = Extension('mygen',
                         language="c++14",
                         sources=["common/mygen.cpp"],
                         libraries=['clib'],
                         extra_compile_args=['-std=c++14'],
                         include_dirs=include_dirs,
                         library_dirs=library_dirs,
                         extra_link_args=link_args
                                         + ['-Wl,-rpath,$ORIGIN/../clib/lib'])

# I use cmdclass to override the default setuptool commands
setup(name='mypack',
      cmdclass={'install': install,
                'build_clib': build_clib, 'build': build,
                'develop': develop},
      packages=['mypack'],
      ext_package='mypack',
      ext_modules=[mygen_module],
      # package_dir={'mypack': '.'},
      # package_data={'mypack': ['docs/*md']},
      include_package_data=True)
I subclass some of the setuptools commands in order to build the shared-library before it compiles the extension. clib_install.sh is a bash script that locally downloads and builds the shared library in /clib, creating the headers (in /clib/include) and .so file (in /clib/lib). To solve problems with linking to shared-library dependencies I used $ORIGIN/../clib/lib as a link argument so that the absolute path to clib isn't needed.
Unfortunately, the /clib directory doesn't get copied to the install location. I tried tinkering with package_data, but it didn't copy my directory over. In fact, I don't even know what pip/setuptools does with /clib after the script is called; I guess it is made in some temporary build directory and deleted afterwards. I am not sure how to get /clib to where it needs to be after it is made.
package_data={
    'mypack': [
        'clib/include/*.h',
        'clib/lib/*.so',
        'docs/*md',
    ]
},
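In context, that entry would sit in the existing setup() call, roughly like this (a sketch; it assumes clib_install.sh builds clib inside the mypack package directory, i.e. mypack/clib/..., because package_data paths are resolved relative to the package):
# Sketch: declare the built headers and shared objects as package data so they are
# copied into site-packages along with the 'mypack' package.
setup(name='mypack',
      cmdclass={'install': install,
                'build_clib': build_clib, 'build': build,
                'develop': develop},
      packages=['mypack'],
      ext_package='mypack',
      ext_modules=[mygen_module],
      package_data={
          'mypack': [
              'clib/include/*.h',
              'clib/lib/*.so',
              'docs/*md',
          ]
      },
      include_package_data=True)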
