Kubeflow error while creating a pipeline on an on-premise server - python

I wanted to create a Kubeflow pipeline for an MNIST model I created. I included the train function and predict function as container images in my pipeline. After creating the client instance, when I call create_run_from_pipeline_func to run my pipeline, it throws the following error.
AttributeError Traceback (most recent call last)
<ipython-input-24-bccaab255371> in <module>
11
12 # Submit pipeline directly from pipeline function
---> 13 run_result = client.create_run_from_pipeline_func(pipeline_func,
14 experiment_name=experiment_name,
15 run_name=run_name,
~/.local/lib/python3.8/site-packages/kfp/_client.py in create_run_from_pipeline_func(self, pipeline_func, arguments, run_name, experiment_name, pipeline_conf, namespace, mode, launcher_image, pipeline_root, enable_caching, service_account)
999 pipeline_conf=pipeline_conf)
1000
-> 1001 return self.create_run_from_pipeline_package(
1002 pipeline_file=pipeline_package_path,
1003 arguments=arguments,
~/.local/lib/python3.8/site-packages/kfp/_client.py in create_run_from_pipeline_package(self, pipeline_file, arguments, run_name, experiment_name, namespace, pipeline_root, enable_caching, service_account)
1080 experiment = self.create_experiment(
1081 name=experiment_name, namespace=namespace)
-> 1082 run_info = self.run_pipeline(
1083 experiment_id=experiment.id,
1084 job_name=run_name,
~/.local/lib/python3.8/site-packages/kfp/_client.py in run_pipeline(self, experiment_id, job_name, pipeline_package_path, params, pipeline_id, version_id, pipeline_root, enable_caching, service_account)
756 html = (
757 '<a href="%s/#/runs/details/%s" target="_blank" >Run details</a>.'
--> 758 % (self._get_url_prefix(), response.run.id))
759 IPython.display.display(IPython.display.HTML(html))
760 return response.run
AttributeError: 'NoneType' object has no attribute 'id'
The following built-in call should return a response object, but it returns None:
response = self._run_api.create_run(body=run_body)
I am running the following code, with the host specified when creating the client instance:
client = kfp.Client(host='http://10.152.183.8.nip.io/')
Kubeflow version: 1.21
kfctl version: kfctl v1.0.1-0-gf3edb9b
Kubernetes platform: Microk8s
Kubernetes version: Client GitVersion "v1.24.0", Server GitVersion "v1.21.12-3+6937f71915b56b"
OS: Linux 18.04
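One thing worth checking (an assumption on my part, not confirmed by the post): with a standalone or multi-user Kubeflow Pipelines deployment, the client host usually has to point at the ml-pipeline API path rather than the dashboard root, and in multi-user mode runs must be created inside a profile namespace. A minimal sketch, where the /pipeline path and the 'my-profile' namespace are placeholders to adapt to your cluster:

import kfp

# Hedged sketch: host points at the pipeline API endpoint; 'my-profile' is a
# hypothetical profile namespace for multi-user deployments.
client = kfp.Client(host='http://10.152.183.8.nip.io/pipeline',
                    namespace='my-profile')

# Verify connectivity before submitting; an unreachable API can surface later
# as empty responses like the NoneType error above.
print(client.list_experiments(namespace='my-profile'))

run_result = client.create_run_from_pipeline_func(
    pipeline_func,
    arguments={},
    experiment_name=experiment_name,
    run_name=run_name,
    namespace='my-profile')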

Related

Azure CLI ML Throws TypeError: __init__() takes 2 positional arguments but 3 were given

I'm attempting to follow this tutorial and am getting the following error. I'm working in a Jupyter notebook with Python 3. I am trying to build a recommendation engine using Azure tools and the Microsoft doc I attached.
TypeError: __init__() takes 2 positional arguments but 3 were given
I have attached an image of code from my Jupyter notebook to demonstrate proper formatting.
I've tried solutions to roll back my Azure Python SDK using pip. I run pip install --upgrade azureml-sdk and all looks good. Thanks very much for any and all help!
The code appears below:
from azureml.core import Workspace

print(workspace_name)
ws = Workspace.create(
    name=workspace_name,
    subscription_id=subscription_id,
    resource_group=resource_group,
    location=location,
    exist_ok=True
)
The full error appears below:
TypeError Traceback (most recent call last)
<ipython-input-11-25e04e55f419> in <module>
5 resource_group=resource_group,
6 location=location,
----> 7 exist_ok=True
8 )
~\.conda\envs\reco_pyspark\lib\site-packages\azureml\core\workspace.py in create(name, auth, subscription_id, resource_group, location, create_resource_group, sku, friendly_name, storage_account, key_vault, app_insights, container_registry, cmk_keyvault, resource_cmk_uri, hbi_workspace, default_cpu_compute_target, default_gpu_compute_target, private_endpoint_config, private_endpoint_auto_approval, exist_ok, show_output)
437
438 if location:
--> 439 available_locations = _available_workspace_locations(subscription_id, auth)
440 available_locations = [x.lower().replace(' ', '') for x in available_locations]
441 location = location.lower().replace(' ', '')
~\.conda\envs\reco_pyspark\lib\site-packages\azureml\core\workspace.py in _available_workspace_locations(subscription_id, auth)
1556 if not auth:
1557 auth = InteractiveLoginAuthentication()
-> 1558 return _commands.available_workspace_locations(auth, subscription_id)
~\.conda\envs\reco_pyspark\lib\site-packages\azureml\_project\_commands.py in available_workspace_locations(auth, subscription_id)
334 :rtype: list[str]
335 """
--> 336 response = auth._get_service_client(ResourceManagementClient, subscription_id).providers.get(
337 "Microsoft.MachineLearningServices")
338 for resource_type in response.resource_types:
~\.conda\envs\reco_pyspark\lib\site-packages\azureml\core\authentication.py in _get_service_client(self, client_class, subscription_id, subscription_bound, base_url)
150 return _get_service_client_using_arm_token(self, client_class, subscription_id,
151 subscription_bound=subscription_bound,
--> 152 base_url=base_url)
153
154 def signed_session(self, session=None):
~\.conda\envs\reco_pyspark\lib\site-packages\azureml\core\authentication.py in _get_service_client_using_arm_token(auth, client_class, subscription_id, subscription_bound, base_url)
1620 else:
1621 # converting subscription_id, which is string, to string because of weird python 2.7 errors.
-> 1622 client = client_class(adal_auth_object, str(subscription_id), base_url=base_url)
1623 return client
1624
TypeError: __init__() takes 2 positional arguments but 3 were given
Have you tried using the Workspace.from_config() and config.json method? Check out this doc page on how to set up your environment.
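Expanding on that suggestion with a minimal sketch, assuming a config.json for the workspace has been downloaded (e.g. from the Azure portal) into the working directory or a parent directory. As a further hedged guess, a signature mismatch deep inside azureml's ARM calls like the one above often points at an incompatible azure-mgmt-resource version installed next to the azureml-sdk, so comparing package versions with pip list is also worthwhile:

from azureml.core import Workspace

# Reads config.json from the current or a parent directory and prompts for
# interactive login; this avoids calling Workspace.create() entirely.
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location)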

sagemaker.tensorflow.serving predict failed with 502 error

I deployed a TensorFlow saved_model using the following code:
from sagemaker.tensorflow.serving import Model

model_path = 's3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz'
model = Model(model_data=model_path, role=role)
predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge')
The model takes images of dimensions (1, 48, 48, 1).
Immediately after, when I try to make a prediction using the following code:
predictor.predict(preprocessed_faces_emo.tolist())
I get the following error, and I don't understand what the problem is. I am using this code from within SageMaker with Python version 3.7 and TensorFlow version 1.14.0:
---------------------------------------------------------------------------
ModelError Traceback (most recent call last)
<ipython-input-37-4dc04dc0679c> in <module>()
----> 1 predictor.predict(preprocessed_faces_emo.tolist())
~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/predictor.py in predict(self, data, initial_args)
105
106 request_args = self._create_request_args(data, initial_args)
--> 107 response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
108 return self._handle_response(response)
109
~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
355 "%s() only accepts keyword arguments." % py_operation_name)
356 # The "self" in this scope is referring to the BaseClient.
--> 357 return self._make_api_call(operation_name, kwargs)
358
359 _api_call.__name__ = str(py_operation_name)
~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
659 error_code = parsed_response.get("Error", {}).get("Code")
660 error_class = self.exceptions.from_code(error_code)
--> 661 raise error_class(parsed_response, operation_name)
662 else:
663 return parsed_response
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (502) from model with message "<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.16.1</center>
</body>
</html>
". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/sagemaker-tensorflow-serving-2020-01-13-13-43-12-354 in account 970351559819 for more information.

How to run model trained on GPU on CPU in spaCy

I'm using spaCy 2.0.18. I have trained a model using the GPU, but now I want to load this model for predictions and run it on the CPU only.
I am able to load the model into memory but once I try to use it I get the following error:
import spacy
nlp = spacy.load("path_to_my_model")
# works fine up to this moment
result = nlp("Test") # throws the exception below:
Exception ignored in: <bound method Stream.__del__ of <cupy.cuda.stream.Stream object at 0x7fd288621be0>>
Traceback (most recent call last):
File "cupy/cuda/stream.pyx", line 161, in cupy.cuda.stream.Stream.__del__
AttributeError: 'Stream' object has no attribute 'ptr'
---------------------------------------------------------------------------
CUDARuntimeError Traceback (most recent call last)
<ipython-input-4-306c96b208c5> in <module>
----> 1 nlp("Yolo")
/opt/anaconda3/lib/python3.7/site-packages/spacy/language.py in __call__(self, text, disable)
344 if not hasattr(proc, '__call__'):
345 raise ValueError(Errors.E003.format(component=type(proc), name=name))
--> 346 doc = proc(doc)
347 if doc is None:
348 raise ValueError(Errors.E005.format(name=name))
nn_parser.pyx in spacy.syntax.nn_parser.Parser.__call__()
nn_parser.pyx in spacy.syntax.nn_parser.Parser.parse_batch()
/opt/anaconda3/lib/python3.7/site-packages/spacy/util.py in get_cuda_stream(require)
236
237 def get_cuda_stream(require=False):
--> 238 return CudaStream() if CudaStream is not None else None
239
240
cupy/cuda/stream.pyx in cupy.cuda.stream.Stream.__init__()
cupy/cuda/runtime.pyx in cupy.cuda.runtime.streamCreate()
cupy/cuda/runtime.pyx in cupy.cuda.runtime.streamCreate()
cupy/cuda/runtime.pyx in cupy.cuda.runtime.check_status()
CUDARuntimeError: cudaErrorNoDevice: no CUDA-capable device is detected
How can I force spaCy to use the CPU instead of the GPU?
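One workaround suggested by the traceback itself (a hedged sketch, not an official spaCy 2.0.x API): spacy/util.py's get_cuda_stream() calls CudaStream() whenever CudaStream was importable, i.e. whenever cupy is installed, even with no CUDA device present. Uninstalling cupy from the CPU-only inference environment avoids that path entirely; alternatively, nulling out the reference before running the pipeline forces the CPU branch:

import spacy
import spacy.util

# Force get_cuda_stream() to take its CPU branch ("return CudaStream() if
# CudaStream is not None else None" in the traceback above). This patch
# target is an assumption based on the 2.0.x code shown there; verify it
# against your installed version.
spacy.util.CudaStream = None

nlp = spacy.load("path_to_my_model")
result = nlp("Test")  # should no longer try to create a CUDA stream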

More efficient way to send a request than JSON to deployed tensorflow model in Sagemaker?

I have trained a tf.estimator based TensorFlow model in SageMaker and deployed it, and it works fine.
But I can only send requests to it in JSON format. I need to send some big input tensors, and this seems very inefficient and also quickly breaks InvokeEndpoint's 5 MB request limit.
Is it possible to use a more efficient format against the TensorFlow Serving based endpoint?
I tried sending a protobuf based request:
import numpy as np
import tensorflow as tf

from sagemaker.tensorflow.serving import Model
from sagemaker.tensorflow.tensorflow_serving.apis import predict_pb2
from sagemaker.tensorflow.predictor import tf_serializer, tf_deserializer
from sagemaker.predictor import RealTimePredictor

role = 'xxx'
model = Model('s3://xxx/tmp/artifacts/sagemaker-tensorflow-scriptmode-xxx/output/model.tar.gz', role)
predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge', endpoint_name='test-endpoint')
# this predictor has a JSON serializer, so make a new one
pred = RealTimePredictor('test-endpoint', serializer=tf_serializer, deserializer=tf_deserializer)
req = predict_pb2.PredictRequest()
req.inputs['instances'].CopyFrom(tf.make_tensor_proto(np.zeros((4, 36, 64)), shape=(4, 36, 64)))
predictor.predict(req)
Which results in the following error:
---------------------------------------------------------------------------
ModelError Traceback (most recent call last)
<ipython-input-40-5ba7f281bd0d> in <module>()
----> 1 predictor.predict(req)
~/anaconda3/envs/default/lib/python3.6/site-packages/sagemaker/predictor.py in predict(self, data, initial_args)
76
77 request_args = self._create_request_args(data, initial_args)
---> 78 response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
79 return self._handle_response(response)
80
~/anaconda3/envs/default/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
355 "%s() only accepts keyword arguments." % py_operation_name)
356 # The "self" in this scope is referring to the BaseClient.
--> 357 return self._make_api_call(operation_name, kwargs)
358
359 _api_call.__name__ = str(py_operation_name)
~/anaconda3/envs/default/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
659 error_code = parsed_response.get("Error", {}).get("Code")
660 error_class = self.exceptions.from_code(error_code)
--> 661 raise error_class(parsed_response, operation_name)
662 else:
663 return parsed_response
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (415) from model with message "{"error": "Unsupported Media Type: application/octet-stream"}".
Is JSON the only available query format for deployed TensorFlow models?
Have you looked at batch transform? If you don't actually need an HTTPS endpoint, this might solve your problem:
https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-batch-transform.html
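A minimal sketch of that batch-transform route, reusing the Model object from the question (the bucket paths and instance type are placeholders). Because the inputs are read from S3 rather than sent over InvokeEndpoint, the 5 MB request limit does not apply:

# Create a transformer from the already-constructed Model and run an offline
# batch job over files staged in S3.
transformer = model.transformer(
    instance_count=1,
    instance_type='ml.c5.xlarge',
    output_path='s3://xxx/tmp/predictions',  # hypothetical output prefix
)
transformer.transform(
    data='s3://xxx/tmp/inputs',              # hypothetical input prefix
    content_type='application/json',
)
transformer.wait()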

pymongo error when writing

I am unable to do any writes to a remote MongoDB database. I am able to connect and do lookups (e.g. find). I connect like this:
conn = pymongo.MongoClient(db_uri,slaveOK=True)
db = conn.test_database
coll = db.test_collection
But when I try to insert,
coll.insert({'a':1})
I run into an error:
---------------------------------------------------------------------------
AutoReconnect Traceback (most recent call last)
<ipython-input-56-d4ffb9e3fa79> in <module>()
----> 1 coll.insert({'a':1})
/usr/lib/python2.7/dist-packages/pymongo/collection.pyc in insert(self, doc_or_docs, manipulate, safe, check_keys, continue_on_error, **kwargs)
410 message._do_batched_insert(self.__full_name, gen(), check_keys,
411 safe, options, continue_on_error,
--> 412 self.uuid_subtype, client)
413
414 if return_one:
/usr/lib/python2.7/dist-packages/pymongo/mongo_client.pyc in _send_message(self, message, with_last_error, command, check_primary)
1126 except (ConnectionFailure, socket.error), e:
1127 self.disconnect()
-> 1128 raise AutoReconnect(str(e))
1129 except:
1130 sock_info.close()
AutoReconnect: not master
If I remove the slaveOK=True (setting it to its default value of False) then I can still connect, but the reads (and writes) fail:
AutoReconnect Traceback (most recent call last)
<ipython-input-70-6671eea24f80> in <module>()
----> 1 coll.find_one()
/usr/lib/python2.7/dist-packages/pymongo/collection.pyc in find_one(self, spec_or_id, *args, **kwargs)
719 *args, **kwargs).max_time_ms(max_time_ms)
720
--> 721 for result in cursor.limit(-1):
722 return result
723 return None
/usr/lib/python2.7/dist-packages/pymongo/cursor.pyc in next(self)
1036 raise StopIteration
1037 db = self.__collection.database
-> 1038 if len(self.__data) or self._refresh():
1039 if self.__manipulate:
1040 return db._fix_outgoing(self.__data.popleft(),
/usr/lib/python2.7/dist-packages/pymongo/cursor.pyc in _refresh(self)
980 self.__skip, ntoreturn,
981 self.__query_spec(), self.__fields,
--> 982 self.__uuid_subtype))
983 if not self.__id:
984 self.__killed = True
/usr/lib/python2.7/dist-packages/pymongo/cursor.pyc in __send_message(self, message)
923 self.__tz_aware,
924 self.__uuid_subtype,
--> 925 self.__compile_re)
926 except CursorNotFound:
927 self.__killed = True
/usr/lib/python2.7/dist-packages/pymongo/helpers.pyc in _unpack_response(response, cursor_id, as_class, tz_aware, uuid_subtype, compile_re)
99 error_object = bson.BSON(response[20:]).decode()
100 if error_object["$err"].startswith("not master"):
--> 101 raise AutoReconnect(error_object["$err"])
102 elif error_object.get("code") == 50:
103 raise ExecutionTimeout(error_object.get("$err"),
AutoReconnect: not master and slaveOk=false
Am I connecting incorrectly? Is there a way to specify connecting to the primary replica?
AutoReconnect: not master means that your operation is failing because the node on which you are attempting to issue the command is not the primary of a replica set, where the command (e.g., a write operation) requires that node to be a primary. Setting slaveOK=True just enables you to read from a secondary node, where by default you would only be able to read from the primary.
MongoClient is automatically able to discover and connect to the primary if the replica set name is provided to the constructor with replicaSet=<replica set name>. See "Connecting to a Replica Set" in the PyMongo docs.
As an aside, slaveOK is deprecated, replaced by ReadPreference. You can specify a ReadPreference when creating the client or when issuing queries, if you want to target a node other than the primary.
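A minimal sketch of the replica-set connection described above (the replica set name 'rs0' and the read preference are placeholders; check your cluster's actual configuration with rs.status() in the mongo shell):

import pymongo
from pymongo import ReadPreference

# With replicaSet set, MongoClient discovers the primary automatically and
# routes writes there; read_preference replaces the deprecated slaveOK.
conn = pymongo.MongoClient(
    db_uri,
    replicaSet='rs0',                                    # hypothetical name
    read_preference=ReadPreference.SECONDARY_PREFERRED,
)
db = conn.test_database
coll = db.test_collection
coll.insert({'a': 1})  # routed to the primary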
I don't know whether it's related to this topic or not, but when I searched for the exception below, Google led me to this question. Maybe it'll be helpful.
pymongo.errors.NotMasterError: not master
In my case, my hard drive was full.
You can check this with the df -h command.
