I am trying to read from an external Hadoop cluster from TensorFlow on my Mac. I have built TensorFlow with Hadoop support from source, and have also built Hadoop with native library support on my Mac. I am getting the following error:
hdfsBuilderConnect(forceNewInstance=0, nn=192.168.60.53:9000, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
java.lang.NoSuchFieldError: LOG
at org.apache.hadoop.ipc.ClientCache.getClient(ClientCache.java:62)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.<init>(ProtobufRpcEngine.java:145)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.<init>(ProtobufRpcEngine.java:133)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.<init>(ProtobufRpcEngine.java:119)
at org.apache.hadoop.ipc.ProtobufRpcEngine.getProxy(ProtobufRpcEngine.java:102)
at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:579)
at org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:418)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:314)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:162)
at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:159)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:159)
2018-10-05 16:01:21.867554: W tensorflow/core/kernels/queue_base.cc:277] _0_input_producer: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
This is my code:
import tensorflow as tf

def create_file_reader_ops(filename_queue):
    reader = tf.TextLineReader(skip_header_lines=1)
    _, csv_row = reader.read(filename_queue)
    record_defaults = [[""], [""], [0], [0]]
    country, code, gold, silver = tf.decode_csv(csv_row, record_defaults=record_defaults)
    features = tf.stack([gold, silver])
    return features, country

filename_queue = tf.train.string_input_producer([
    "hdfs://192.168.60.53:9000/iris_data_multiclass.csv",
])
example, country = create_file_reader_ops(filename_queue)

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    while True:
        try:
            example_data, country_name = sess.run([example, country])
            print(example_data, country_name)
        except tf.errors.OutOfRangeError:
            break
I have built Hadoop from source on my Mac.
$ hadoop version
Hadoop 2.7.3
Subversion https://github.com/apache/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by himaprasoon on 2018-10-04T11:09Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /Users/himaprasoon/git/hadoop/hadoop-dist/target/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar
hadoop checknative output
$ hadoop checknative
18/10/05 16:15:05 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library libbz2.dylib
18/10/05 16:15:05 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /Users/himaprasoon/git/hadoop/hadoop-dist/target/hadoop-2.7.3/lib/native/libhadoop.dylib
zlib: true /usr/lib/libz.1.dylib
snappy: true /usr/local/lib/libsnappy.1.dylib
lz4: true revision:99
bzip2: true /usr/lib/libbz2.1.0.dylib
openssl: true /usr/local/lib/libcrypto.dylib
tf version : 1.10.1
Any ideas what I might be doing wrong?
Here are my environment variables:
HADOOP_HOME=/Users/himaprasoon/git/hadoop/hadoop-dist/target/hadoop-2.7.3/
HADOOP_MAPRED_HOME=$HADOOP_HOME
HADOOP_COMMON_HOME=$HADOOP_HOME
HADOOP_HDFS_HOME=$HADOOP_HOME
YARN_HOME=$HADOOP_HOME
HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
HADOOP_INSTALL=$HADOOP_HOME
OPENSSL_ROOT_DIR="/usr/local/opt/openssl"
LDFLAGS="-L${OPENSSL_ROOT_DIR}/lib"
CPPFLAGS="-I${OPENSSL_ROOT_DIR}/include"
PKG_CONFIG_PATH="${OPENSSL_ROOT_DIR}/lib/pkgconfig"
OPENSSL_INCLUDE_DIR="${OPENSSL_ROOT_DIR}/include"
PATH="/usr/local/opt/protobuf#2.5/bin:$PATH
HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native
JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:${HADOOP_HOME}/lib/native
This is how I am running my program:
CLASSPATH=$($HADOOP_HDFS_HOME/bin/hdfs classpath --glob) python3.6 myfile.py
References used to build TensorFlow and Hadoop:
Hadoop native libraries not found on OS/X
https://medium.com/@s.matthew.english/build-hadoop-from-source-on-macos-a3fb2b958b6c
Can Tensorflow read from HDFS on Mac?
https://gist.github.com/zedar/f631ace0759c1d512573
Have you read this post?
Tensorflow Enqueue operation was cancelled
It seems there is a workaround for the same error message there:
The problem happens at the very last stage when python tries to kill threads.
To do this properly you should create a train.Coordinator and pass it to your
queue_runner (no need to pass sess, as the default session will be used)
with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    # do your things
    coord.request_stop()
    coord.join(threads)
The last two lines should be added to your while loop to make sure all threads are properly killed.
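Applied to the code in the question, the fixed loop would look roughly like this (a sketch; only the two shutdown lines after the loop are new):

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    while True:
        try:
            example_data, country_name = sess.run([example, country])
            print(example_data, country_name)
        except tf.errors.OutOfRangeError:
            break
    # new: stop the queue runners and wait for their threads to finish
    coord.request_stop()
    coord.join(threads)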
I'm trying to process the data streaming from Apache Kafka using the Python SDK for Apache Beam with the Flink runner. After running Kafka 2.4.0 and Flink 1.8.3, I follow these steps:
1) Compile and run Beam 2.16 with Flink 1.8 runner.
git clone --single-branch --branch release-2.16.0 https://github.com/apache/beam.git beam-2.16.0
cd beam-2.16.0
nohup ./gradlew :runners:flink:1.8:job-server:runShadow -PflinkMasterUrl=localhost:8081 &
2) Run the Python pipeline.
from apache_beam import Pipeline
from apache_beam.io.external.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions

if __name__ == '__main__':
    with Pipeline(options=PipelineOptions([
        '--runner=FlinkRunner',
        '--flink_version=1.8',
        '--flink_master_url=localhost:8081',
        '--environment_type=LOOPBACK',
        '--streaming'
    ])) as pipeline:
        (
            pipeline
            | 'read' >> ReadFromKafka({'bootstrap.servers': 'localhost:9092'}, ['test'])  # [BEAM-3788] ???
        )
        result = pipeline.run()
        result.wait_until_finish()
3) Publish some data to Kafka.
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
>{"hello":"world!"}
The Python script throws this error:
[flink-runner-job-invoker] ERROR org.apache.beam.runners.fnexecution.jobsubmission.JobInvocation - Error during job invocation BeamApp-USER-somejob. org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: xxx)
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:268)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:483)
at org.apache.beam.runners.flink.FlinkExecutionEnvironments$BeamFlinkRemoteStreamEnvironment.executeRemotely(FlinkExecutionEnvironments.java:360)
at org.apache.flink.streaming.api.environment.RemoteStreamEnvironment.execute(RemoteStreamEnvironment.java:310)
at org.apache.beam.runners.flink.FlinkStreamingPortablePipelineTranslator$StreamingTranslationContext.execute(FlinkStreamingPortablePipelineTranslator.java:173)
at org.apache.beam.runners.flink.FlinkPipelineRunner.runPipelineWithTranslator(FlinkPipelineRunner.java:104)
at org.apache.beam.runners.flink.FlinkPipelineRunner.run(FlinkPipelineRunner.java:80)
at org.apache.beam.runners.fnexecution.jobsubmission.JobInvocation.runPipeline(JobInvocation.java:78)
at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:265)
... 13 more
Caused by: java.lang.ClassCastException: org.apache.beam.sdk.io.kafka.KafkaRecord cannot be cast to [B
at org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
at org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56)
at org.apache.beam.sdk.values.ValueWithRecordId$ValueWithRecordIdCoder.encode(ValueWithRecordId.java:105)
at org.apache.beam.sdk.values.ValueWithRecordId$ValueWithRecordIdCoder.encode(ValueWithRecordId.java:81)
at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:578)
at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:529)
at org.apache.beam.sdk.util.CoderUtils.encodeToSafeStream(CoderUtils.java:82)
at org.apache.beam.sdk.util.CoderUtils.encodeToByteArray(CoderUtils.java:66)
at org.apache.beam.sdk.util.CoderUtils.encodeToByteArray(CoderUtils.java:51)
at org.apache.beam.sdk.util.CoderUtils.clone(CoderUtils.java:141)
at org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.copy(CoderTypeSerializer.java:67)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:577)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:305)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:394)
at org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapper.emitElement(UnboundedSourceWrapper.java:341)
at org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapper.run(UnboundedSourceWrapper.java:283)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:93)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:57)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:97)
at org.apache.flink.streaming.runtime.tasks.StoppableSourceStreamTask.run(StoppableSourceStreamTask.java:45)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:302)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
... 1 more
ERROR:root:java.lang.ClassCastException: org.apache.beam.sdk.io.kafka.KafkaRecord cannot be cast to [B
[flink-runner-job-invoker] INFO org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService - Manifest at /tmp/artifacts0k1mnin0/somejob/MANIFEST has 0 artifact locations
[flink-runner-job-invoker] INFO org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactStagingService - Removed dir /tmp/artifacts0k1mnin0/job_somejob/
Traceback (most recent call last):
  File "main.py", line 40, in <module>
    run()
  File "main.py", line 37, in run
    result.wait_until_finish()
  File "/home/USER/beam/lib/python3.5/site-packages/apache_beam/runners/portability/portable_runner.py", line 439, in wait_until_finish
    self._job_id, self._state, self._last_error_message()))
RuntimeError: Pipeline BeamApp-USER-somejob failed in state FAILED: java.lang.ClassCastException: org.apache.beam.sdk.io.kafka.KafkaRecord cannot be cast to [B
I tried the other deserializers available in Kafka, but they did not work, failing with: Couldn't infer Coder from class org.apache.kafka.common.serialization.StringDeserializer. This error originates from this piece of code.
Am I doing something wrong?
Disclaimer: this is my first encounter with Apache Beam project.
It seems that Kafka consumer support is quite a fresh thing in Beam (at least in the Python interface), according to this JIRA. Apparently there is still a problem with the FlinkRunner combined with this new API. Even though your code is technically correct, it will not run correctly on Flink. There is a patch available, which to me seems more like a quick fix than a final solution. It requires recompilation, so it is not something I would propose using in production. If you are just getting started with the technology and don't want to be blocked, feel free to try it out.
I tried to install Spark release 2.4.0 on my PC, which runs Win7 x64.
However, when I try to run some simple code to check whether Spark is ready to work:
code:
import os
from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster('local[*]').setAppName('word_count')
sc = SparkContext(conf=conf)
d = ['a b c d', 'b c d e', 'c d e f']
d_rdd = sc.parallelize(d)
rdd_res = d_rdd.flatMap(lambda x: x.split(' ')).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
print(rdd_res)
print(rdd_res.collect())
I get this error:
[screenshot of the error traceback: ImportError: No module named 'resource']
I opened the worker.py file to check the code.
I find that in version 2.4.0 the code is:
[screenshot: worker.py in v2.4.0, which imports the resource module]
However, in version 2.3.2 the code is:
[screenshot: worker.py in v2.3.2, which does not import it]
Then I reinstalled spark-2.3.2-bin-hadoop2.7, and the code works fine.
Also, I found this question:
ImportError: No module named 'resource'
So I think spark-2.4.0-bin-hadoop2.7 cannot work on Win7 because worker.py imports the resource module, which is a Unix-specific package.
I hope someone can fix this problem in Spark.
I got this error too, with Spark 2.4.0, JDK 11 and Kafka 2.11 on Windows.
I was able to resolve it by doing the following:
1) cd spark_home\python\lib, e.g. cd C:\myprograms\spark-2.4.0-bin-hadoop2.7\python\lib
2) unzip pyspark.zip
3) edit worker.py, comment out 'import resource' as well as the paragraph that follows it, and save the file. This paragraph is just an add-on, not core code, so it is fine to comment it out.
4) remove the older pyspark.zip and create a new zip (a sketch of this step follows the commented block below).
5) in Jupyter Notebook, restart the kernel.
The commented-out paragraph in worker.py:
# set up memory limits
# memory_limit_mb = int(os.environ.get('PYSPARK_EXECUTOR_MEMORY_MB', "-1"))
# total_memory = resource.RLIMIT_AS
# try:
#     if memory_limit_mb > 0:
#         (soft_limit, hard_limit) = resource.getrlimit(total_memory)
#         msg = "Current mem limits: {0} of max {1}\n".format(soft_limit, hard_limit)
#         print(msg, file=sys.stderr)
#         # convert to bytes
#         new_limit = memory_limit_mb * 1024 * 1024
#         if soft_limit == resource.RLIM_INFINITY or new_limit < soft_limit:
#             msg = "Setting mem limits to {0} of max {1}\n".format(new_limit, new_limit)
#             print(msg, file=sys.stderr)
#             resource.setrlimit(total_memory, (new_limit, new_limit))
# except (resource.error, OSError, ValueError) as e:
#     # not all systems support resource limits, so warn instead of failing
#     print("WARN: Failed to set memory limit: {0}\n".format(e), file=sys.stderr)
Python has some compatibility issues with the newly released Spark 2.4.0 version. I also faced this issue. If you download and configure Spark 2.3.2 on your system (and change the environment variables), the problem will be resolved.
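For instance, a minimal sketch of pointing the environment at the 2.3.2 install before creating the SparkContext (the install path is illustrative):

import os

# point SPARK_HOME at the Spark 2.3.2 distribution and put its bin dir on PATH
os.environ['SPARK_HOME'] = r'C:\myprograms\spark-2.3.2-bin-hadoop2.7'
os.environ['PATH'] = (os.path.join(os.environ['SPARK_HOME'], 'bin')
                      + os.pathsep + os.environ['PATH'])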
Has anyone here succeeded in running im2txt with TensorFlow 1.4.1?
I'm using this model (https://drive.google.com/file/d/0B_qCJ40uBfjEWVItOTdyNUFOMzg/view) and get this error:
2018-01-04 00:46:59.268582: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key lstm/basic_lstm_cell/kernel not found in checkpoint
Then I tried the following script to convert the model. The script generated checkpoint, .meta, .data, and .index files.
OLD_CHECKPOINT_FILE = "/tmp/my_checkpoint/model.ckpt-3000000"
NEW_CHECKPOINT_FILE = "/tmp/my_converted_checkpoint/model.ckpt-3000000"
import tensorflow as tf
vars_to_rename = {
"lstm/BasicLSTMCell/Linear/Matrix": "lstm/basic_lstm_cell/weights",
"lstm/BasicLSTMCell/Linear/Bias": "lstm/basic_lstm_cell/biases",
}
new_checkpoint_vars = {}
reader = tf.train.NewCheckpointReader(OLD_CHECKPOINT_FILE)
for old_name in reader.get_variable_to_shape_map():
if old_name in vars_to_rename:
new_name = vars_to_rename[old_name]
else:
new_name = old_name
new_checkpoint_vars[new_name] = tf.Variable(reader.get_tensor(old_name))
init = tf.global_variables_initializer()
saver = tf.train.Saver(new_checkpoint_vars)
with tf.Session() as sess:
sess.run(init)
print("save checkpoint")
saver.save(sess, NEW_CHECKPOINT_FILE)
Could anyone tell me how I can use those files to run im2txt with TensorFlow 1.4.1? (I could actually run im2txt with TensorFlow 0.12.1.)
Env
python 3.5.2
Mac OS X version 10.12.6
TensorFlow 1.4.1
Thanks for your help.
I got the same error with the checkpoint file, with TF 1.4.1 and Python 3.5 on macOS 10.13.
Reasons: the downloaded checkpoint file was generated with an old version of TensorFlow (Python 2), and the word_count.txt file format differs.
Answers adapted from https://github.com/KranthiGV/Pretrained-Show-and-Tell-model
Changes:
1. Generate a checkpoint file that can be loaded by TF 1.4.1:
OLD_CHECKPOINT_FILE = "model.ckpt-1000000"
NEW_CHECKPOINT_FILE = "model2.ckpt-1000000"
import tensorflow as tf
vars_to_rename = {
"lstm/basic_lstm_cell/weights": "lstm/basic_lstm_cell/kernel",
"lstm/basic_lstm_cell/biases": "lstm/basic_lstm_cell/bias",
}
new_checkpoint_vars = {}
reader = tf.train.NewCheckpointReader(OLD_CHECKPOINT_FILE)
for old_name in reader.get_variable_to_shape_map():
if old_name in vars_to_rename:
new_name = vars_to_rename[old_name]
else:
new_name = old_name
new_checkpoint_vars[new_name] =
tf.Variable(reader.get_tensor(old_name))`
init = tf.global_variables_initializer()
saver = tf.train.Saver(new_checkpoint_vars)
with tf.Session() as sess:
sess.run(init)
saver.save(sess, NEW_CHECKPOINT_FILE)
2. Fix the Python 3 file-reading problem in im2txt/run_inference.py:
with tf.gfile.GFile(filename, "rb") as f:
3. The word_count.txt downloaded from that link needs to be replaced with the one from
https://github.com/siavash9000/im2txt_demo/tree/master/im2txt_pretrained
Chunfang's solution works for me, but I wanted to share another approach.
In recent versions of TensorFlow, Google provides an "official" checkpoint_convert.py utility to convert old RNN checkpoints:
python checkpoint_convert.py [--write_v1_checkpoint] \
'/path/to/old_checkpoint' '/path/to/new_checkpoint'
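For the checkpoint files in this question, the invocation might look like this (paths taken from the conversion script above; the utility's location in your TensorFlow checkout may vary):

python checkpoint_convert.py \
    '/tmp/my_checkpoint/model.ckpt-3000000' '/tmp/my_converted_checkpoint/model.ckpt-3000000'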
I have a client-server design using Pyro4, in which the client code is as follows:
import Pyro4
uri = 'PYRO:PYRO_SERVER@123.123.123.123:10000'
test_1 = Pyro4.Proxy(uri)
test_1.run_model()
The server-side code is as follows:
import Pyro4
import socket
from keras.models import Sequential
from keras.layers import LSTM
import tensorflow as tf

@Pyro4.expose
class PyroServer(object):
    def run_model(self):
        session = tf.Session()
        session.run(tf.global_variables_initializer())
        session.run(tf.local_variables_initializer())
        session.run(tf.tables_initializer())
        session.run(tf.variables_initializer([]))
        tf.reset_default_graph()
        model = Sequential()
        model.add(LSTM(25, input_shape=(5, 10)))

host_name = socket.gethostbyname(socket.getfqdn())
daemon = Pyro4.Daemon(host=host_name, port=10000)
uri = daemon.register(PyroServer, objectId='PYRO_SERVER')
daemon.requestLoop()
After the server is started, the first call from the client to the run_model() method functions properly. For the second, and all subsequent calls, the following error message is displayed:
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Pyro4/core.py", line 187, in call
return self.__send(self.__name, args, kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Pyro4/core.py", line 472, in _pyroInvoke
raise data # if you see this in your traceback, you should probably inspect the remote traceback as well
ValueError: Fetch argument cannot be interpreted as a Tensor. (Operation name: "lstm_1/init"
op: "NoOp"
input: "^lstm_1/kernel/Assign"
input: "^lstm_1/recurrent_kernel/Assign"
input: "^lstm_1/bias/Assign"
is not an element of this graph.)
Can anyone suggest a possible solution for this?
I'm not familiar with Tensorflow, but the actual error is this:
ValueError: Fetch argument cannot be interpreted as a Tensor.
Simplify your code and make it run stand-alone correctly first, only then wrap it in a Pyro service.
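That said, the "is not an element of this graph" part of the message suggests the model is being rebuilt against a stale default graph on the second call. A hedged sketch of one commonly suggested workaround (an assumption on my part, untested with your setup) is to reset the Keras/TF state at the start of each call:

import Pyro4
from keras import backend as K
from keras.models import Sequential
from keras.layers import LSTM

@Pyro4.expose
class PyroServer(object):
    def run_model(self):
        K.clear_session()  # drop the old default graph/session between calls
        model = Sequential()
        model.add(LSTM(25, input_shape=(5, 10)))
        return 'done'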
Hey, I use this code and it works great for me.
$ cat greeting-server.py
import Pyro4
import tensorflow as tf

@Pyro4.expose
class GreetingMaker(object):
    def get_fortune(self, name):
        var = tf.constant('Hello, TensorFlow!')
        sess = tf.Session()
        return "Hello, {0}. Here is your greeting message:\n" \
               "{1}".format(name, sess.run(var))

daemon = Pyro4.Daemon()                # make a Pyro daemon
uri = daemon.register(GreetingMaker)   # register the greeting maker as a Pyro object
print("Ready. Object uri =", uri)      # print the uri so we can use it in the client later
daemon.requestLoop()                   # start the event loop of the server to wait for calls
$ cat greeting-client.py
import Pyro4
uri = input("What is the Pyro uri of the greeting object? ").strip()
name = input("What is your name? ").strip()
greeting_maker = Pyro4.Proxy(uri) # get a Pyro proxy to the greeting object
print(greeting_maker.get_fortune(name)) # call method normally
$ python greeting-server.py &
[1] 2965
Ready. Object uri = PYRO:obj_a751da78da6a4feca49f18ab664cc366@localhost:53025
$ python greeting-client.py
What is the Pyro uri of the greeting object?
PYRO:obj_a751da78da6a4feca49f18ab664cc366@localhost:53025
What is your name?
Plm
2018-03-06 16:20:32.271647: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-03-06 16:20:32.271673: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-03-06 16:20:32.271678: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-03-06 16:20:32.271682: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-03-06 16:20:32.271686: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Hello, Plm. Here is your greeting message:
b'Hello, TensorFlow!'
And as you can see, if you connect to the same URI again, it works without the TF initialization time, since that was already done during the first call. Persistence is thus maintained across two separate calls, as long as you call the same Pyro URI, obviously.
$ python greeting-client.py
What is the Pyro uri of the greeting object?
PYRO:obj_a751da78da6a4feca49f18ab664cc366@localhost:53025
What is your name?
Plm2
Hello, Plm2. Here is your greeting message:
b'Hello, TensorFlow!'
TensorBoard should be started from the command line like this:
tensorboard --logdir=path
I need to run it from code. Until now I have been using this:
import os
os.system('tensorboard --logdir=' + path)
However, TensorBoard does not start because it is not included in the system path. I use PyCharm with virtualenv on Windows. I don't want to change system paths, so the only option is to run it from the virtualenv. How can I do this?
Using Tensorboard 2 API (2019):
from tensorboard import program

tracking_address = log_path  # the path of your log file

if __name__ == "__main__":
    tb = program.TensorBoard()
    tb.configure(argv=[None, '--logdir', tracking_address])
    url = tb.launch()
    print(f"Tensorflow listening on {url}")
Note: tb.launch() creates a daemon thread that will die automatically when your process finishes.
Probably a bit late for an answer, but this is what worked for me in Python 3.6.2:
import tensorflow as tf
from tensorboard import main as tb
tf.flags.FLAGS.logdir = "/path/to/graphs/"
tb.main()
That runs tensorboard with the default configuration and looks for graphs and summaries in "/path/to/graphs/". You can of course change the log directory and set as many variables as you like using:
tf.flags.FLAGS.variable = value
Hope it helps.
You should launch TensorBoard in a separate thread:
def launchTensorBoard():
    import os
    os.system('tensorboard --logdir=' + tensorBoardPath)
    return

import threading
t = threading.Thread(target=launchTensorBoard, args=([]))
t.start()
As I had the same problem, you can use these lines, inspired by tensorboard\main.py:
from tensorboard import default
from tensorboard import program
tb = program.TensorBoard(default.PLUGIN_LOADERS, default.get_assets_zip_provider())
tb.configure(argv=['--logdir', my_directory])
tb.main()
Here my_directory is the folder you want to check. Don't forget to create a separate thread if you want to avoid being blocked after tb.main(); see the sketch below.
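A minimal sketch of that separate thread (reusing the configured tb object, and calling tb.main() in the thread instead of directly):

import threading

tb_thread = threading.Thread(target=tb.main, daemon=True)
tb_thread.start()  # tb.main() now blocks this worker thread instead of your program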
Best regards
EDIT Tensorboard V1.10:
For some personal reasons, I wrote it in a different way:
import logging
import sys

class TensorBoardTool:

    def __init__(self, dir_path):
        self.dir_path = dir_path

    def run(self):
        # Remove http messages
        log = logging.getLogger('werkzeug')
        log.setLevel(logging.ERROR)
        # Start tensorboard server
        tb = program.TensorBoard(default.PLUGIN_LOADERS, default.get_assets_zip_provider())
        tb.configure(argv=['--logdir', self.dir_path])
        url = tb.launch()
        sys.stdout.write('TensorBoard at %s \n' % url)
EDIT Tensorboard V1.12:
According to Elad Weiss and tsbertalan, for version 1.12 of TensorBoard:
def run(self):
    # Remove http messages
    logging.getLogger('werkzeug').setLevel(logging.ERROR)
    # Start tensorboard server
    tb = program.TensorBoard(default.get_plugins(), default.get_assets_zip_provider())
    tb.configure(argv=[None, '--logdir', self.dir_path])
    url = tb.launch()
    sys.stdout.write('TensorBoard at %s \n' % url)
Then to run it just do:
# Tensorboard tool launch
tb_tool = TensorBoardTool(work_dir)
tb_tool.run()
This will allow you to run a TensorBoard server at the same time as your main process, without disturbing HTTP requests!
For Tensorboard 2.1.0, this works for me:
python -m tensorboard.main --logdir $PWD/logs
You must have your env active first. (In my case, conda install had a fatal error, so I needed to reinstall tf via pip inside conda.)
A full solution for Tensorboard 2 (2019), with automatic opening of the Chrome browser, for Windows and Linux. It works in both environments: conda and virtualenv. This solution suppresses the TensorBoard output so that it doesn't (irritatingly) show up in stdout.
from multiprocessing import Process
import sys
import os


class TensorboardSupervisor:
    def __init__(self, log_dp):
        self.server = TensorboardServer(log_dp)
        self.server.start()
        print("Started Tensorboard Server")
        self.chrome = ChromeProcess()
        print("Started Chrome Browser")
        self.chrome.start()

    def finalize(self):
        if self.server.is_alive():
            print('Killing Tensorboard Server')
            self.server.terminate()
            self.server.join()
        # As a preference, we leave chrome open - but this may be amended similar to the method above


class TensorboardServer(Process):
    def __init__(self, log_dp):
        super().__init__()
        self.os_name = os.name
        self.log_dp = str(log_dp)
        # self.daemon = True

    def run(self):
        if self.os_name == 'nt':  # Windows
            os.system(f'{sys.executable} -m tensorboard.main --logdir "{self.log_dp}" 2> NUL')
        elif self.os_name == 'posix':  # Linux
            os.system(f'{sys.executable} -m tensorboard.main --logdir "{self.log_dp}" '
                      f'--host `hostname -I` >/dev/null 2>&1')
        else:
            raise NotImplementedError(f'No support for OS : {self.os_name}')


class ChromeProcess(Process):
    def __init__(self):
        super().__init__()
        self.os_name = os.name
        self.daemon = True

    def run(self):
        if self.os_name == 'nt':  # Windows
            os.system(f'start chrome http://localhost:6006/')
        elif self.os_name == 'posix':  # Linux
            os.system(f'google-chrome http://localhost:6006/')
        else:
            raise NotImplementedError(f'No support for OS : {self.os_name}')
Initialization:
tb_sup = TensorboardSupervisor('path/to/logs')
After finishing the training/testing:
tb_sup.finalize()
If your python interpreter path is:
/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/bin/python3.6
You can run this command instead of tensorboard:
/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorboard/main.py
To run tensorboard from a Python script within a specified virtual environment, you have to change tensorboard to /path/to/your/environment/bin/tensorboard. It is also recommended to execute the command in a separate thread, as suggested by @Dmitry.
Together it looks like this and works for me with tb and tf version 1.14.0:
def run_tensorboard(logdir_absolute):
    import os, threading
    tb_thread = threading.Thread(
        target=lambda: os.system('/home/username/anaconda3/envs/'
                                 'env_name/bin/tensorboard '
                                 '--logdir=' + logdir_absolute),
        daemon=True)
    tb_thread.start()
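Usage, with an absolute path (the path is illustrative):

run_tensorboard('/home/username/logs')
# TensorBoard should now be serving on http://localhost:6006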
As of TensorBoard version 1.9.0, the following works to start TensorBoard with default settings in the same Python process:
import tensorboard as tb
import tensorboard.program
import tensorboard.default
tb.program.FLAGS.logdir = 'path/to/logdir'
tb.program.main(tb.default.get_plugins(),
tb.default.get_assets_zip_provider())
The following opens a Chrome tab and launches TensorBoard. Simply provide the desired directory and your system's name.
import os
os.system(
"cd <directory> \
&& google-chrome http://<your computer name>:6007 \
&& tensorboard --port=6007 --logdir runs"
)
I had the same problem:
As you're working on Windows, you can use batch files to fully automate opening TensorBoard, as in the example below.
You probably want to open TensorBoard in a visible console window (cmd.exe). Calling a batch file from within your IDE (PyCharm) will run it inside the IDE, i.e. in the background, which means you can't see the console. Therefore you can use a workaround: call a batch file that then calls the batch file that starts TensorBoard.
Note: I'm using Anaconda as my virtual environment for this example.
batch_filename = 'start_tb.bat'  # set filename for batch file
tb_command = 'tensorboard --logdir=' + log_dir  # join strings for the tensorboard command

# create a batch file that will call the second batch file in a console window (cmd.exe)
with open(os.path.join('invoke.bat'), "w") as f:
    f.writelines('start ' + batch_filename)

# create a batch file that activates the Anaconda environment and starts tensorboard
with open(os.path.join(batch_filename), "w") as f:
    f.writelines('\nconda activate YOURCondaEnvNAME && ' + tb_command)  # change to your conda environment, or other virtualenv

# start tensorboard using the batch files (will open a console window)
# calls 'invoke.bat', which will call 'start_tb.bat'
os.system('invoke.bat')

# open tensorboard in the default browser >> ATTENTION: must be adapted to your local host
os.system('start "" http://YOUR-COMPUTER-NAME:6006/')  # just copy the URL that tensorboard runs at on your computer
Sometimes you might have to refresh TensorBoard in your browser, as it may open before the server is properly set up.
Try running from Python:
import os
os.system('python -m tensorflow.tensorboard --logdir=' + path)
This works for me in PyCharm (but on Linux, so if the shell syntax is different you'll have to tweak it).