Keras/TensorFlow error while running as Pyro4 Server - python

I have a client-server design using Pyro4, in which the client code is as follows:
import Pyro4
uri = 'PYRO:PYRO_SERVER@123.123.123.123:10000'
test_1 = Pyro4.Proxy(uri)
test_1.run_model()
The server-side code is as follows:
import Pyro4
import socket
from keras.models import Sequential
from keras.layers import LSTM
import tensorflow as tf
@Pyro4.expose
class PyroServer(object):
    def run_model(self):
        session = tf.Session()
        session.run(tf.global_variables_initializer())
        session.run(tf.local_variables_initializer())
        session.run(tf.tables_initializer())
        session.run(tf.variables_initializer([]))
        tf.reset_default_graph()
        model = Sequential()
        model.add(LSTM(25, input_shape=(5, 10)))

host_name = socket.gethostbyname(socket.getfqdn())
daemon = Pyro4.Daemon(host=host_name, port=10000)
uri = daemon.register(PyroServer, objectId='PYRO_SERVER')
daemon.requestLoop()
After the server is started, the first call from the client to the run_model() method functions properly. For the second, and all subsequent calls, the following error message is displayed:
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Pyro4/core.py", line 187, in call
return self.__send(self.__name, args, kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/Pyro4/core.py", line 472, in _pyroInvoke
raise data # if you see this in your traceback, you should probably inspect the remote traceback as well
ValueError: Fetch argument cannot be interpreted as a Tensor. (Operation name: "lstm_1/init"
op: "NoOp"
input: "^lstm_1/kernel/Assign"
input: "^lstm_1/recurrent_kernel/Assign"
input: "^lstm_1/bias/Assign"
is not an element of this graph.)
Can anyone suggest a possible solution for this?

I'm not familiar with Tensorflow, but the actual error is this:
ValueError: Fetch argument cannot be interpreted as a Tensor.
Simplify your code and make it run stand-alone correctly first, only then wrap it in a Pyro service.
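That said, the error itself points at a graph mismatch between calls: run_model() builds a model and then resets the default graph, so the next call tries to fetch initializers that belong to a graph that no longer exists. One common fix is to start every call from a clean Keras session instead. A minimal sketch (assuming Keras on the TensorFlow 1.x backend, as in the question):
import Pyro4
from keras import backend as K
from keras.models import Sequential
from keras.layers import LSTM

@Pyro4.expose
class PyroServer(object):
    def run_model(self):
        # Discard any graph/session left over from a previous call so the
        # layers built below live in the current default graph.
        K.clear_session()
        model = Sequential()
        model.add(LSTM(25, input_shape=(5, 10)))
        return 'ok'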

Hey, I use this code and it works great for me.
$ cat greeting-server.py
import Pyro4
import tensorflow as tf

@Pyro4.expose
class GreetingMaker(object):
    def get_fortune(self, name):
        var = tf.constant('Hello, TensorFlow!')
        sess = tf.Session()
        return "Hello, {0}. Here is your greeting message:\n" \
               "{1}".format(name, sess.run(var))

daemon = Pyro4.Daemon()                 # make a Pyro daemon
uri = daemon.register(GreetingMaker)    # register the greeting maker as a Pyro object
print("Ready. Object uri =", uri)       # print the uri so we can use it in the client later
daemon.requestLoop()                    # start the event loop of the server to wait for calls
$ cat greeting-client.py
import Pyro4
uri = input("What is the Pyro uri of the greeting object? ").strip()
name = input("What is your name? ").strip()
greeting_maker = Pyro4.Proxy(uri) # get a Pyro proxy to the greeting object
print(greeting_maker.get_fortune(name)) # call method normally
$ python greeting-server.py &
[1] 2965
Ready. Object uri = PYRO:obj_a751da78da6a4feca49f18ab664cc366#localhost:53025
$ python greeting-client.py
What is the Pyro uri of the greeting object?
PYRO:obj_a751da78da6a4feca49f18ab664cc366#localhost:53025
What is your name?
Plm
2018-03-06 16:20:32.271647: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-03-06 16:20:32.271673: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-03-06 16:20:32.271678: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-03-06 16:20:32.271682: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-03-06 16:20:32.271686: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Hello, Plm. Here is your greeting message:
b'Hello, TensorFlow!'
And as you can see, if you connect to the same URI again, it works without the TF initialization time, since that was already done during the first call. So persistence is maintained across the two separate calls, as long as you call the same Pyro URI, obviously.
$ python greeting-client.py
What is the Pyro uri of the greeting object?
PYRO:obj_a751da78da6a4feca49f18ab664cc366#localhost:53025
What is your name?
Plm2
Hello, Plm2. Here is your greeting message:
b'Hello, TensorFlow!'
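Translated back to the original question, the same persistence idea means building the model once and reusing its graph on every call, rather than resetting the default graph inside run_model(). A rough, untested sketch:
import Pyro4
import socket
import tensorflow as tf
from keras.models import Sequential
from keras.layers import LSTM

@Pyro4.expose
class PyroServer(object):
    def __init__(self):
        # Build the model a single time and remember which graph it lives in.
        self.graph = tf.get_default_graph()
        self.model = Sequential()
        self.model.add(LSTM(25, input_shape=(5, 10)))

    def run_model(self):
        # Every call re-enters the graph the model was built in.
        with self.graph.as_default():
            return str(self.model.output_shape)

host_name = socket.gethostbyname(socket.getfqdn())
daemon = Pyro4.Daemon(host=host_name, port=10000)
uri = daemon.register(PyroServer, objectId='PYRO_SERVER')
daemon.requestLoop()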

Related

Problem with init() function for model deployment in Azure

I want to deploy a model in Azure but I'm struggling with the following problem.
I have my model registered in Azure. The file with the extension .sav is located locally. The registration looks like the following:
import urllib.request
from azureml.core.model import Model
# Register model
model = Model.register(ws, model_name="my_model_name.sav", model_path="model/")
I have my score.py file. The init() function in the file looks like this:
import json
import numpy as np
import pandas as pd
import os
import pickle
from azureml.core.model import Model
def init():
    global model
    model_path = Model.get_model_path(model_name = 'my_model_name.sav', _workspace='workspace_name')
    model = pickle(open(model_path, 'rb'))
But when I try to deploy I see the following error:
"code": "AciDeploymentFailed",
"statusCode": 400,
"message": "Aci Deployment failed with exception: Your container application crashed. This may be caused by errors in your scoring file's init() function.
1. Please check the logs for your container instance: leak-tester-pm. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs.
And when I run print(service.logs()) I have the following output (I have only one model registered in Azure):
None
Am I doing something wrong with loading the model in the score.py file?
P.S. The .yml file for the deployment:
name: project_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
  - python=3.6.2
  - pip:
    - scikit-learn==0.24.2
    - azureml-defaults
    - numpy
    - pickle-mixin
    - pandas
    - xgboost
    - azure-ml-api-sdk
channels:
  - anaconda
  - conda-forge
The local inference server lets you quickly debug your entry script (score.py). If the underlying scoring script has a bug, the server will fail to initialize or serve the model; instead, it will throw an exception and report where the issue occurred.
There are two possible causes for the error or exception:
an HTTP server issue, which needs to be troubleshot
the Docker deployment.
You need to debug the procedure you followed. In some cases, HTTP server issues cause the initialization (init()) problem.
Check the Azure Machine Learning inference HTTP server for better debugging from the server's perspective.
The Docker file mentioned looks fine, but it is better to debug it once using the steps described in https://learn.microsoft.com/en-us/azure/machine-learning/how-to-troubleshoot-deployment-local#dockerlog
Try the code below inside the init() function:
def init():
    global model
    model_path = "modelfoldername/model.pkl"
    # load the model from disk
    model = pickle.load(open(model_path, 'rb'))
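For reference, the scoring script also needs a run() entry point next to init(); a minimal sketch (the expected JSON payload shape is an assumption, not taken from the question):
import json
import numpy as np

def run(raw_data):
    # Expects a payload like {"data": [[...], [...]]}; `model` is the
    # global that init() above populated.
    data = np.array(json.loads(raw_data)['data'])
    predictions = model.predict(data)
    return predictions.tolist()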

Getting Memory usage of a tensorflow model with guppy.hpy() not working

I am loading a saved TensorFlow model (.pb file) and trying to evaluate how much memory it allocates for the model with the guppy package. Following a simple tutorial, here is what I tried:
from guppy import hpy
import tensorflow as tf
heap = hpy()
print("Heap Status at starting: ")
heap_status1 = heap.heap()
print("Heap Size : ", heap_status1.size, " bytes\n")
print(heap_status1)
heap.setref()
print("\nHeap Status after setting reference point: ")
heap_status2 = heap.heap()
print("Heap size: ", heap_status2.size, " bytes\n")
print(heap_status2)
model_path = "./saved_model/" #.pb file directory
model = tf.saved_model.load(model_path)
print("\nHeap status after creating model: ")
heap_status3 = heap.heap()
print("Heap size: ", heap_status3.size, " bytes\n")
print(heap_status3)
print("Memory used by the model: ", heap_status3.size - heap_status2.size)
I don't know why, but when I run the code it suddenly stops executing when I call heap_status1 = heap.heap(). It doesn't throw any error.
This same code runs fine when I don't use anything related to tensorflow, i.e. it runs successfully when I just create some random lists, strings, etc. instead of loading a tensorflow model.
Note: my model will run in a CPU device. Unfortunately, tf.config.experimental.get_memory_info works with GPUs only.
If you are on Windows, the crash may be related to https://github.com/zhuyifei1999/guppy3/issues/25. Check the pywin32 version and, if it is < 300, upgrade pywin32 with
pip install -U pywin32
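A quick way to check which pywin32 build is installed before deciding to upgrade (a small sketch using pkg_resources):
import pkg_resources

# Prints the installed pywin32 build number, e.g. '228' or '305'.
print(pkg_resources.get_distribution('pywin32').version)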

why does importing keras take so long on AWS Lambda?

Problem
I'm trying to load and use a keras model in AWS Lambda, but importing keras from tensorflow is taking a long time in my lambda function. Curiously, though, it didn't take very long in SageMaker. Why is this, and how can I fix it?
Setup Description
I'm using the serverless framework to deploy my function. The handler and serverless.yml are included below. I have an EFS volume holding my dependencies, which were installed using an EC2 instance with the EFS volume mounted. I pip installed dependencies to the EFS with the -t flag. For example, I installed tensorflow like this:
sudo miniconda3/envs/devenv/bin/pip install tensorflow -t /mnt/efs/fs1/lib
where /mnt/efs/fs1/lib is the folder on the EFS which stores my dependencies. The models are stored on s3.
I prototyped loading my model in a SageMaker notebook with the following code:
import time
start = time.time()
from tensorflow import keras
print('keras: {}'.format(time.time()-start))
import boto3
import os
import zipfile
print('imports: {}'.format(time.time()-start))
modelPath = '***model.zip'
bucket = 'predictionresources'
def load_motion_model():
    s3 = boto3.client('s3')
    s3.download_file(bucket, modelPath, 'model.motionmodel.zip')
    with zipfile.ZipFile('model.motionmodel.zip', 'r') as zip_ref:
        zip_ref.extractall('model.motionmodel')
    return keras.models.load_model('model.motionmodel/'+os.listdir('model.motionmodel')[0])
model = load_motion_model()
print('total time: {}'.format(time.time()-start))
which has the following output:
keras: 2.0228586196899414
imports: 2.0231151580810547
total time: 3.0635251998901367
So, including all imports, this takes around 3 seconds to execute.
However, when I deploy to AWS Lambda with serverless, keras takes substantially longer to import. This lambda function (which is the same as the other one, just wrapped in a handler) and serverless.yml:
Handler
try:
    import sys
    import os
    sys.path.append(os.environ['MNT_DIR']+'/lib0')  # nopep8 # noqa
except ImportError:
    pass

# returns the version of all dependancies
def test(event, context):
    print('TEST LOADING')
    import time
    start = time.time()
    from tensorflow import keras
    print('keras: {}'.format(time.time()-start))
    import boto3
    import os
    import zipfile
    print('imports: {}'.format(time.time()-start))
    modelPath = '**********nmodel.zip'
    bucket = '***********'
    def load_motion_model():
        s3 = boto3.client('s3')
        s3.download_file(bucket, modelPath, 'model.motionmodel.zip')
        with zipfile.ZipFile('model.motionmodel.zip', 'r') as zip_ref:
            zip_ref.extractall('model.motionmodel')
        return keras.models.load_model('model.motionmodel/'+os.listdir('model.motionmodel')[0])
    model = load_motion_model()
    print('total time: {}'.format(time.time()-start))
    body = {
        'message': 'done!'
    }
    response = {
        "statusCode": 200,
        "body": json.dumps(body)
    }
    return response
(p.s. I know this would fail due to lack of write access, and the model needs to be saved to /tmp/)
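A /tmp-based variant of load_motion_model would look roughly like this (a sketch; only the download and extraction paths change):
import os
import zipfile

import boto3
from tensorflow import keras

def load_motion_model(bucket, model_path):
    # /tmp is the only writable filesystem inside a Lambda container.
    local_zip = '/tmp/model.motionmodel.zip'
    local_dir = '/tmp/model.motionmodel'
    s3 = boto3.client('s3')
    s3.download_file(bucket, model_path, local_zip)
    with zipfile.ZipFile(local_zip, 'r') as zip_ref:
        zip_ref.extractall(local_dir)
    return keras.models.load_model(os.path.join(local_dir, os.listdir(local_dir)[0]))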
Serverless.yml
service: test4KerasTest

plugins:
  - serverless-pseudo-parameters

custom:
  efsAccessPoint: fsap-*****
  LocalMountPath: /mnt/efs
  subnetsId: subnet-*****
  securityGroup: sg-*****

provider:
  name: aws
  runtime: python3.6
  region: us-east-2
  timeout: 30
  iamRoleStatements:
    - Effect: Allow
      Action:
        - s3:GetObject
        - s3:PutObject
      Resource: 'arn:aws:s3:::predictionresources/*'

package:
  exclude:
    - node_modules/**
    - .vscode/**
    - .serverless/**
    - .pytest_cache/**
    - __pychache__/**

functions:
  test:
    handler: handler.test
    environment: # Service wide environment variables
      MNT_DIR: ${self:custom.LocalMountPath}
      BUCKET: predictionresources
      REGION: us-east-2
    vpc:
      securityGroupIds:
        - ${self:custom.securityGroup}
      subnetIds:
        - ${self:custom.subnetsId}
    iamManagedPolicies:
      - arn:aws:iam::aws:policy/AmazonElasticFileSystemClientReadWriteAccess
      - arn:aws:iam::aws:policy/AmazonS3FullAccess
    events:
      - http:
          path: test
          method: get
    fileSystemConfig:
      localMountPath: '${self:custom.LocalMountPath}'
      arn: 'arn:aws:elasticfilesystem:${self:provider.region}:#{AWS::AccountId}:access-point/${self:custom.efsAccessPoint}'
results in this output from CloudWatch:
As can be seen, keras takes substantially longer to import in my lambda environment, but the other imports don't seem to be as negatively affected. I have tried importing different modules in different orders, and keras consistently takes an unreasonable amount of time to import. Due to restrictions imposed by the API Gateway, this function can't take longer than 30 seconds, which means I have to find a way to shorten the time it takes to import keras in my lambda function.

Tensorflow on Elastic Beanstalk and Django

import tensorflow as tf
with tf.Graph().as_default():
    sentences = tf.placeholder(tf.string)
    import tensorflow_hub as hub
    embed = hub.Module('/tmp/module')
    embeddings = embed(sentences)
This is from my views.py file in my django app.
The execution fails at the import and shows the following error:
End of script output before headers: wsgi.py child pid 29409 exit
signal Segmentation fault (11)
While it works fine on my local machine.
I am currently using a t3.medium instance. Any tips on how to fix this?

Tensorflow Read from HDFS mac : java.lang.NoSuchFieldError: LOG

I am trying to read from an external Hadoop cluster from TensorFlow on my Mac. I have built TF with Hadoop support from source, and also built Hadoop with native library support on my Mac. I am getting the following error:
hdfsBuilderConnect(forceNewInstance=0, nn=192.168.60.53:9000, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
java.lang.NoSuchFieldError: LOG
at org.apache.hadoop.ipc.ClientCache.getClient(ClientCache.java:62)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.<init>(ProtobufRpcEngine.java:145)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.<init>(ProtobufRpcEngine.java:133)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.<init>(ProtobufRpcEngine.java:119)
at org.apache.hadoop.ipc.ProtobufRpcEngine.getProxy(ProtobufRpcEngine.java:102)
at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:579)
at org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:418)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:314)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:162)
at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:159)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:159)
2018-10-05 16:01:21.867554: W tensorflow/core/kernels/queue_base.cc:277] _0_input_producer: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
This is my code:
import tensorflow as tf

def create_file_reader_ops(filename_queue):
    reader = tf.TextLineReader(skip_header_lines=1)
    _, csv_row = reader.read(filename_queue)
    record_defaults = [[""], [""], [0], [0]]
    country, code, gold, silver = tf.decode_csv(csv_row, record_defaults=record_defaults)
    features = tf.stack([gold, silver])
    return features, country

filename_queue = tf.train.string_input_producer([
    "hdfs://192.168.60.53:9000/iris_data_multiclass.csv",
])
example, country = create_file_reader_ops(filename_queue)

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    while True:
        try:
            example_data, country_name = sess.run([example, country])
            print(example_data, country_name)
        except tf.errors.OutOfRangeError:
            break
I have built Hadoop from source on Mac.
$ hadoop version
Hadoop 2.7.3
Subversion https://github.com/apache/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by himaprasoon on 2018-10-04T11:09Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /Users/himaprasoon/git/hadoop/hadoop-dist/target/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar
hadoop checknative output
$ hadoop checknative
18/10/05 16:15:05 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library libbz2.dylib
18/10/05 16:15:05 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /Users/himaprasoon/git/hadoop/hadoop-dist/target/hadoop-2.7.3/lib/native/libhadoop.dylib
zlib: true /usr/lib/libz.1.dylib
snappy: true /usr/local/lib/libsnappy.1.dylib
lz4: true revision:99
bzip2: true /usr/lib/libbz2.1.0.dylib
openssl: true /usr/local/lib/libcrypto.dylib
tf version : 1.10.1
Any ideas what I might be doing wrong?
Here are my environment variables.
HADOOP_HOME=/Users/himaprasoon/git/hadoop/hadoop-dist/target/hadoop-2.7.3/
HADOOP_MAPRED_HOME=$HADOOP_HOME
HADOOP_COMMON_HOME=$HADOOP_HOME
HADOOP_HDFS_HOME=$HADOOP_HOME
YARN_HOME=$HADOOP_HOME
HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
HADOOP_INSTALL=$HADOOP_HOME
OPENSSL_ROOT_DIR="/usr/local/opt/openssl"
LDFLAGS="-L${OPENSSL_ROOT_DIR}/lib"
CPPFLAGS="-I${OPENSSL_ROOT_DIR}/include"
PKG_CONFIG_PATH="${OPENSSL_ROOT_DIR}/lib/pkgconfig"
OPENSSL_INCLUDE_DIR="${OPENSSL_ROOT_DIR}/include"
PATH="/usr/local/opt/protobuf#2.5/bin:$PATH
HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native
JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:${HADOOP_HOME}/lib/native
This is how I am running my program:
CLASSPATH=$($HADOOP_HDFS_HOME/bin/hdfs classpath --glob) python3.6 myfile.py
References used to build TF and Hadoop:
Hadoop native libraries not found on OS/X
https://medium.com/@s.matthew.english/build-hadoop-from-source-on-macos-a3fb2b958b6c
Can Tensorflow read from HDFS on Mac?
https://gist.github.com/zedar/f631ace0759c1d512573
Have you read this post?
Tensorflow Enqueue operation was cancelled
It seems there is a workaround for the same error message there:
The problem happens at the very last stage when python tries to kill threads.
To do this properly you should create a train.Coordinator and pass it to your
queue_runner (no need to pass sess, as the default session will be used)
with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    # do your things
    coord.request_stop()
    coord.join(threads)
The last two lines should be added to your while loop to make sure all threads are properly killed.
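Applied to the snippet from the question, the session block would look roughly like this (a sketch, not tested against HDFS):
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    try:
        while True:
            try:
                example_data, country_name = sess.run([example, country])
                print(example_data, country_name)
            except tf.errors.OutOfRangeError:
                break
    finally:
        # Stop and join the queue-runner threads so Python does not tear
        # down the session while they are still enqueueing.
        coord.request_stop()
        coord.join(threads)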
