NCCL install / TensorFlow compile error with RTX 2080 Ti - python

I am installing CUDA 10, cuDNN 7.3 and NCCL 2.3 on Ubuntu 18.04 with an RTX 2080 Ti, and I am having trouble with the NCCL part. When I try to compile TensorFlow, the build fails because it cannot find the file NCCL-SLA.txt. When I search for that file, I can't find it anywhere on my system. People online say you can copy it over from another directory, but it is not there for me, so I do not know what to do.
Here is the error I am getting:
ERROR: missing input file '@local_config_nccl//:nccl/NCCL-SLA.txt'
ERROR: /home/josh/tensorflow/tensorflow/tools/pip_package/BUILD:166:1: //tensorflow/tools/pip_package:build_pip_package: missing input file '@local_config_nccl//:nccl/NCCL-SLA.txt'
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
ERROR: /home/josh/tensorflow/tensorflow/tools/pip_package/BUILD:166:1 1 input file(s) do not exist
INFO: Elapsed time: 17.209s, Critical Path: 11.30s
INFO: 353 processes, local.
FAILED: Build did NOT complete successfully

If you can't find the file NCCL-SLA.txt, try copying the file LICENSE.txt to a new file named NCCL-SLA.txt in the same directory.
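
The copy suggested above can be scripted. The following is a minimal Python 3 sketch, assuming the NCCL files sit in a single directory that already contains LICENSE.txt; the function name `ensure_nccl_sla` is hypothetical and not part of NCCL or TensorFlow.

```python
import shutil
from pathlib import Path


def ensure_nccl_sla(nccl_dir):
    """Copy LICENSE.txt to NCCL-SLA.txt inside nccl_dir if NCCL-SLA.txt is missing.

    nccl_dir is an assumption: pass whatever directory your NCCL install uses.
    Returns the path of the NCCL-SLA.txt file.
    """
    nccl_dir = Path(nccl_dir)
    license_file = nccl_dir / "LICENSE.txt"
    sla_file = nccl_dir / "NCCL-SLA.txt"

    if sla_file.exists():
        # Nothing to do: the file the build asks for is already there.
        return sla_file

    if not license_file.exists():
        raise FileNotFoundError("{} not found".format(license_file))

    # Same directory, new name, identical contents.
    shutil.copyfile(str(license_file), str(sla_file))
    return sla_file
```

Call it with the NCCL directory your build is pointed at, then rerun the build. The function leaves an existing NCCL-SLA.txt untouched, so running it twice is harmless.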

Related

Tensorflow XLA compiler - bazel build error - Could not find include file ".../hlo_ops_base.td"

I am trying to use XLA compiler from Tensorflow following the example provided at this page:
https://gist.github.com/carlthome/6ae8a570e21069c60708017e3f96c9fd
In short, it downloads a ResNet50 network and compiles it as library.
During execution of bazel build I always end up on the following build error:
error: Could not find include file 'tensorflow/compiler/mlir/xla/ir/hlo_ops_base.td'
include "tensorflow/compiler/mlir/xla/ir/hlo_ops_base.td"
^
external/org_tensorflow/tensorflow/compiler/mlir/xla/ir/hlo_ops.td:22:9: error: Unexpected input at top level
include "tensorflow/compiler/mlir/xla/ir/hlo_ops_base.td"
> ERROR: /home/ubuntu/.cache/bazel/_bazel_ubuntu/e5cce820cc082410b4fcc604db349066/external/org_tensorflow/tensorflow/compiler/mlir/xla/BUILD:465:1: Executing genrule @org_tensorflow//tensorflow/compiler/mlir/xla:operator_writer_inc failed (Exit 1)
[6,144 / 7,191] 3 actions running
@org_tensorflow//tensorflow/compiler/xla/client:global_data; 4s local
@org_tensorflow//tensorflow/core/kernels/tensor_forest:resources; 1s local
...//tensorflow/core/kernels:eigen_contraction_kernel_with_mkl; 1s local
external/org_tensorflow/tensorflow/compiler/mlir/xla/ir/hlo_ops.td:22:9: error: Could not find include file 'tensorflow/compiler/mlir/xla/ir/hlo_ops_base.td'
include "tensorflow/compiler/mlir/xla/ir/hlo_ops_base.td"
^
external/org_tensorflow/tensorflow/compiler/mlir/xla/ir/hlo_ops.td:22:9: error: Unexpected input at top level
include "tensorflow/compiler/mlir/xla/ir/hlo_ops_base.td"
^
[6,144 / 7,191] 3 actions running
@org_tensorflow//tensorflow/compiler/xla/client:global_data; 4s local
@org_tensorflow//tensorflow/core/kernels/tensor_forest:resources; 1s local
...//tensorflow/core/kernels:eigen_contraction_kernel_with_mkl; 1s local
Target @org_tensorflow//:graph failed to build
[6,147 / 7,191] checking cached actions
Use --verbose_failures to see the command lines of failed build steps.
[6,147 / 7,191] checking cached actions
INFO: Elapsed time: 7903.567s, Critical Path: 204.12s
[6,147 / 7,191] checking cached actions
INFO: 5961 processes: 5961 local.
[6,147 / 7,191] checking cached actions
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
So it does not find the hlo_ops_base.td file, which is of course present at that path (I checked).
The first time I tried this, it worked like a charm.
Afterwards I ran it again on different machines (including perfectly clean VMs on different platforms), but always hit the same issue.
I am using
bazel 1.1.0,
tensorflow 1.14 (cpu),
protobuf 3.0.0,
python 2.7
Does anyone have a clue how to solve this? I have searched online, and it seems no one else is having this issue...
Thanks,
Matteo

Compiling Tensorflow and Bazel on 32-bit Linux

I was trying to compile Bazel on my 32-bit Debian system.
I started the process with this command:
env EXTRA_BAZEL_ARGS="--host_javabase=@local_jdk//:jdk" BAZEL_JAVAC_OPTS="-J-Xms384m -J-Xmx512m" bash ./compile.sh
And I get this error:
In file included from external/com_google_protobuf/src/google/protobuf/message.h:122,
from external/com_google_protobuf/src/google/protobuf/descriptor.pb.h:29,
from external/com_google_protobuf/src/google/protobuf/descriptor.cc:52:
external/com_google_protobuf/src/google/protobuf/descriptor.h:1283:26: note: 'class google::protobuf::FileDescriptor' declared here
class LIBPROTOBUF_EXPORT FileDescriptor {
^~~~~~~~~~~~~~
At global scope:
cc1plus: warning: unrecognized command line option '-Wno-writable-strings'
ERROR: /path/to/bazel/third_party/BUILD:535:1: Executing genrule //third_party:filter_netty_dynamic_libs failed (Exit 12)
zip error: Nothing to do! (bazel-out/piii-opt/bin/third_party/netty_tcnative/netty-tcnative-filtered.jar)
Target //src:bazel_nojdk failed to build
INFO: Elapsed time: 1173,794s, Critical Path: 37,45s
INFO: 976 processes: 976 local.
FAILED: Build did NOT complete successfully
ERROR: Could not build Bazel
I searched for a while... but nothing...
Any idea?
I followed the official instructions available at https://docs.bazel.build/versions/master/install-compile-source.html
Unfortunately, Bazel does not support 32-bit builds. Given the complexity of the builds Bazel is designed to handle, a 32-bit system is simply not feasible to use.

How to build Python PEX using Bazel?

I'm trying to build a pex_binary on Mac OS X for my Apache Heron application (written in Python), but it fails with an error.
Details of the bazel build error are below.
$bazel build pmTop
ERROR: /Arun/Python/Heron/PatMon/WORKSPACE:1:1: name 'git_repository' is not defined
ERROR: Error evaluating WORKSPACE file
ERROR: error loading package '': Encountered error while reading extension file 'pex/pex_rules.bzl': no such package '@io_bazel_rules_pex//pex': error loading package 'external': Could not load //external package
ERROR: error loading package '': Encountered error while reading extension file 'pex/pex_rules.bzl': no such package '@io_bazel_rules_pex//pex': error loading package 'external': Could not load //external package
INFO: Elapsed time: 0.104s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
Contents of my WORKSPACE below.
git_repository(
    name = "io_bazel_rules_pex",
    remote = "https://github.com/benley/bazel_rules_pex.git",
    tag = "0.3.0",
)
load("@io_bazel_rules_pex//pex:pex_rules.bzl", "pex_repositories")
pex_repositories()
Bazel version details below.
$bazel version
Build label: 0.25.0
Build target: bazel-out/darwin-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Wed May 1 21:47:49 2019 (1556747269)
Build timestamp: 1556747269
Build timestamp as int: 1556747269
I'm not sure why it's unable to fetch the pex_rules.bzl package. I'm not behind a firewall. I'd appreciate any pointers to fix this issue.
The git_repository rule is not known because it has not been loaded. Add the following statement at the top of your WORKSPACE file:
load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")
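
If several workspaces need the same fix, the edit can be automated. This is a minimal Python 3 sketch, assuming a plain-text WORKSPACE file; the helper name `add_git_repository_load` is hypothetical. Note that Bazel labels for external repositories begin with `@`, so the statement is written as `@bazel_tools//...`.

```python
from pathlib import Path

# The load statement from the answer above; external repository labels start with "@".
LOAD_LINE = 'load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")'


def add_git_repository_load(workspace_path):
    """Prepend the git_repository load statement to a WORKSPACE file.

    Returns True if the file was modified, False if the statement was
    already present. The function name is illustrative only.
    """
    path = Path(workspace_path)
    text = path.read_text()

    if LOAD_LINE in text:
        return False  # already loaded, leave the file alone

    # load() must appear before the first use of git_repository,
    # so put it at the very top of the file.
    path.write_text(LOAD_LINE + "\n\n" + text)
    return True
```

Running `add_git_repository_load("WORKSPACE")` from the workspace root (the file name is the only assumption) should leave the load statement as the first line, followed by the existing git_repository(...) call.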

Building boost examples in linux

I've installed all the Boost and Boost-related packages from the Synaptic package manager. If I run "bjam" from the directory, I get this error:
error: Unable to load Jamfile.
error: Could not find a Jamfile in directory '/usr/share/boost-build/kernel'.
error: Attempted to find it with pattern '[Bb]uild.jam [Jj]amfile.v2 [Jj]amfile [Jj]amfile.jam'.
error: Please consult the documentation at 'http://www.boost.org'.

Cannot build the bootloader of pyinstaller on solaris 10(i386)

I am trying to build the PyInstaller bootloader by following these instructions, but it fails.
Error messages:
[37/40] cc_link: build/debug/linux/getpath_1.o build/debug/common/pyi_python_1.o build/debug/common/pyi_global_1.o build/debug/common/pyi_launch_1.o build/debug/common/pyi_pythonlib_1.o build/debug/common/pyi_utils_1.o build/debug/common/pyi_archive_1.o build/debug/common/main_1.o build/debug/common/pyi_path_1.o -> build/debug/run_d
ld: fatal: library -lz: not found
ld: fatal: File processing errors. No output written to /home/devel3/nwtham/python27/pyinstaller/bootloader/build/debug/run_d
Waf: Leaving directory `/home/devel3/nwtham/python27/pyinstaller/bootloader/build'
Build failed: -> task failed (err #1):
{task: cc_link getpath_1.o,pyi_python_1.o,pyi_global_1.o,pyi_launch_1.o,pyi_pythonlib_1.o,pyi_utils_1.o,pyi_archive_1.o,main_1.o,pyi_path_1.o -> run_d}
I tried installing libz1-1.2.8,REV=2013.09.23-SunOS5.10-i386-CSW.pkg.gz from here, but I still get the same error messages. I checked the folder /opt/csw/lib, found a link named "libz.so.1" and renamed it to "libz.so", but I still got the same error message.
My LD_LIBRARY_PATH
/opt/csw/lib/python2.7:/opt/csw/lib:/usr/lib:/usr/local/gcc/lib:/usr/local/lib:/usr/sfw/lib:/usr/dt/lib:/usr/openwin/lib/:/usr/ucblib:/opt/sybase/lib
Maybe I installed the wrong package. Which package should I install?
Python 2.7 and gcc 4.9.2 were both downloaded from here.
edit :
part of the PATH :
/usr/bin:/usr/local/bin:/opt/csw/gcc4/bin:/usr/sfw/bin:/opt/csw/bin
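
Since `-lz` asks the linker for a file named libz.so or libz.a, one way to narrow this down is to check which directories on a search path actually hold such a file. The following is a minimal diagnostic sketch, not a fix; the function name `find_libz` is hypothetical, and which directories the linker really searches on this system is an assumption you would need to confirm.

```python
import glob
import os


def find_libz(search_path):
    """Scan a colon-separated search path for zlib files.

    Returns (linkable, versioned_only):
      linkable       - existing libz.so / libz.a files, the names -lz looks for
      versioned_only - libz.so.* files, which -lz does not match by name
    """
    linkable = []
    versioned_only = []
    for directory in search_path.split(os.pathsep):
        if not os.path.isdir(directory):
            continue  # skip empty or nonexistent entries
        for name in ("libz.so", "libz.a"):
            candidate = os.path.join(directory, name)
            # os.path.exists() is False for a dangling symlink,
            # so a broken libz.so link will not be reported as linkable.
            if os.path.exists(candidate):
                linkable.append(candidate)
        versioned_only.extend(
            sorted(glob.glob(os.path.join(directory, "libz.so.*")))
        )
    return linkable, versioned_only
```

For example, `find_libz(os.environ.get("LD_LIBRARY_PATH", ""))` lists the candidates on the path shown above. An empty first list together with a non-empty second list would suggest that only versioned library files are present.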
