Trouble saving repeated protobuf object to file (Python)

I'm new to protobuf, so I may not frame the question correctly.
Anyway, I'm using this Model Config proto file. I converted it into Python using the command protoc -I=. --python_out=. ./model_server_config.proto from the Protocol Buffers page. Now I have some Python files which I can import and work with. My objective is to create a file (for running the TensorFlow Model Server with multiple models) which should look like the following:
model_config_list: {
  config: {
    name: "name1",
    base_path: "path1",
    model_platform: "tensorflow"
  },
  config: {
    name: "name2",
    base_path: "path2",
    model_platform: "tensorflow"
  },
  config: {
    name: "name3",
    base_path: "path3",
    model_platform: "tensorflow"
  },
}
Now, using the compiled Python package, I made a protobuf object which looks like this when I print it out:
model_config_list {
  config {
    name: "name1"
    base_path: "path1"
    model_platform: "tensorflow"
  }
  config {
    name: "name2"
    base_path: "path2"
    model_platform: "tensorflow"
  }
  config {
    name: "name3"
    base_path: "path3"
    model_platform: "tensorflow"
  }
}
But when serializing the object using objectname.SerializeToString(), I get a weird output like:
b'\n\x94\x01\n \n\x04name1\x12\x0cpath1"\ntensorflow\n7\n\x08name2\x12\x1fpath2"\ntensorflow\n7\n\x08name3\x12\x1fpath3"\ntensorflow'
I also tried converting it to JSON using protobuf's Python support, like this:
from google.protobuf.json_format import MessageToJson
MessageToJson(objectname)
which gave me a result like:
{
  "modelConfigList": {
    "config": [
      {
        "name": "name1",
        "basePath": "path1",
        "modelPlatform": "tensorflow"
      },
      {
        "name": "name2",
        "basePath": "path2",
        "modelPlatform": "tensorflow"
      },
      {
        "name": "name3",
        "basePath": "path3",
        "modelPlatform": "tensorflow"
      }
    ]
  }
}
with all the config objects in a list and the field names converted to camelCase, which is not acceptable as a TensorFlow Model Server config.
Any ideas on how to write it into a file correctly? Or am I creating the objects incorrectly? Any help is welcome. Thanks in advance.

I don't know anything about what system will be reading your file, so I can't say anything about how you should write it to a file. It really depends on how the Model Server expects to read it.
That said, I don't see anything wrong with how you're creating the message, or any of the serialization methods you've shown.
The print method shows a "text format" proto, which is good for debugging and is sometimes used for storing configuration files. It's not very compact (field names are present in the file) and doesn't have all the backwards- and forwards-compatible features of the binary representation. It's actually functionally the same as what you've said it "should look like": the colons and commas are optional.
The SerializeToString() method uses the binary serialization format. This is arguably what Protocol Buffers were built to do. It's a compact representation and provides backwards and forwards compatibility, but it's not very human-readable.
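If the consumer did expect the binary format, writing it to disk is just a matter of opening the file in binary mode. A minimal sketch (the file name is only illustrative):

# Write the binary serialization ('wb' because SerializeToString() returns bytes)
with open('models.config.bin', 'wb') as output:
    output.write(objectname.SerializeToString())

# Read it back into a fresh message of the same type
restored = type(objectname)()
with open('models.config.bin', 'rb') as source:
    restored.ParseFromString(source.read())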
As the name suggests, the json_format module provides a JSON representation of the message. That's perfectly good if the system you're interacting with expects JSON, but it's not exactly common.
Appendix: instead of using print(), the google.protobuf.text_format module has utilities better suited to using the text format programmatically. To write to a file, you could use:
from google.protobuf import text_format
(...)
with open(file_path, 'w') as output:
    text_format.PrintMessage(my_message, output)
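For completeness, here is a sketch of the whole flow for the config in the question. The module and message names (model_server_config_pb2, ModelServerConfig) are assumptions based on the .proto file name; check your generated code for the actual names:

from google.protobuf import text_format
import model_server_config_pb2  # generated by protoc; name assumed from the .proto

config = model_server_config_pb2.ModelServerConfig()
for name, path in [('name1', 'path1'), ('name2', 'path2'), ('name3', 'path3')]:
    entry = config.model_config_list.config.add()  # add() appends to a repeated field
    entry.name = name
    entry.base_path = path
    entry.model_platform = 'tensorflow'

with open('models.config', 'w') as output:
    text_format.PrintMessage(config, output)

Since the Model Server's --model_config_file flag expects the text format, a file written this way should match what you said it "should look like".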

Related

singer tap-zendesk - how to extract catalog.json with selected:True from discovery mode

I am using singer's tap-zendesk library and want to extract data from specific schemas.
I am running the following command in sync mode:
tap-zendesk --config config.json --catalog catalog.json.
Currently my config.json file has the following parameters:
{
  "email": "<email>",
  "api_token": "<token>",
  "subdomain": "<domain>",
  "start_date": "<start_date>"
}
I've managed to extract data by putting 'selected':true under schema, properties and metadata in the catalog.json file. But I was wondering if there was an easier way to do this? There are around 15 streams I need to go through.
I managed to get the catalog.json file through the discovery mode command:
tap-zendesk --config config.json --discover > catalog.json
The output looks something like the following, but that means that I have to go and add selected:True under every field.
{
  "streams": [
    {
      "stream": "tickets",
      "tap_stream_id": "tickets",
      "schema": {
        **"selected": "true"**,
        "properties": {
          "organization_id": {
            **"selected": "true"**,},
      "metadata": [
        {
          "breadcrumb": [],
          "metadata": {
            **"selected": "true"**
          }
The selected: true needs to be applied only once per stream: add it to the metadata entry under the stream where breadcrumb is []. This is very poorly documented.
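If you don't want to edit 15 streams by hand, a short script can set that flag for you. A minimal sketch, assuming the catalog layout shown above:

import json

with open('catalog.json') as f:
    catalog = json.load(f)

for stream in catalog['streams']:
    for entry in stream['metadata']:
        # The stream-level entry is the one whose breadcrumb is empty
        if entry['breadcrumb'] == []:
            entry['metadata']['selected'] = True

with open('catalog.json', 'w') as f:
    json.dump(catalog, f, indent=2)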
Please see this blog post for some helpful details: https://medium.com/getting-started-guides/extracting-ticket-data-from-zendesk-using-singer-io-tap-zendesk-57a8da8c3477

How to insert variable value into JSON string for use with PDAL

I'm trying to use the Python extension for PDAL to read in a laz file.
To do so, I'm using the simple pipeline structure shown here: https://gis.stackexchange.com/questions/303334/accessing-raw-data-from-laz-file-in-python-with-open-source-software. It would be useful for me, however, to insert the value contained in a variable into the "filename" field. To do so, I've tried the following, where fullFileName is a str variable containing the full path of the file, but I'm getting an error that no such file exists. I'm assuming my JSON syntax is slightly off or something; can anyone help?
pipeline = """{
    "pipeline": [
        {
            "type": "readers.las",
            "filename": "{fullFileName}"
        }
    ]
}"""
You can follow this code:
import json
import pdal

file = "D:/Lidar data/input.laz"

pipeline = {
    "pipeline": [
        {
            "type": "readers.las",
            "filename": file
        },
        {
            "type": "filters.sort",
            "dimension": "Z"
        }
    ]
}

r = pdal.Pipeline(json.dumps(pipeline))
r.validate()
points = r.execute()
print(points)
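Alternatively, if you want to keep the pipeline as a JSON string as in the question, note that a plain triple-quoted string does not interpolate {fullFileName}; an f-string (or str.format) does. A minimal sketch (the doubled braces escape literal { and } inside an f-string; prefer forward slashes in the path so the JSON stays valid):

import pdal

fullFileName = "D:/Lidar data/input.laz"  # example path

pipeline = f"""{{
    "pipeline": [
        {{
            "type": "readers.las",
            "filename": "{fullFileName}"
        }}
    ]
}}"""

r = pdal.Pipeline(pipeline)
r.validate()
points = r.execute()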

Is it possible to use references in JSON?

I have this JSON:
{
  "app_name": "my_app",
  "version": {
    "1.0": {
      "path": "/my_app/1.0"
    },
    "2.0": {
      "path": "/my_app/2.0"
    }
  }
}
Is it somehow possible to reference the keywords app_name and the key of version so that I don't have to repeat "my_app" and the version numbering?
I was thinking something along the lines of... (code totally made up):
{
  "#app_name": "my_app",
  "version": {
    "1.0": {
      "path": "/{{$app_name}}/{{key[-1]}}"
    },
    "2.0": {
      "path": "/{{$app_name}}/{{key[-1]}}"
    }
  }
}
Or is this something that could instead be handled better using YAML?
In the end, I intend to read this data into a Python dictionary.
No, JSON does not have references. (The functionality you request here, with substring expansion, would open itself to memory attacks against the parser; by not supporting this functionality, JSON avoids vulnerability to such attacks).
If you want such functionality, you need to implement it yourself.
Not in pure JSON, but you could perform string substitution after you parse the JSON.
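For example, a minimal sketch of post-parse substitution over the structure from the question:

import json

raw = '''{
    "app_name": "my_app",
    "version": {
        "1.0": {},
        "2.0": {}
    }
}'''

data = json.loads(raw)
# Derive each path from app_name and the version key after parsing
for version, info in data['version'].items():
    info['path'] = '/{}/{}'.format(data['app_name'], version)

print(data['version']['1.0']['path'])  # /my_app/1.0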

Access GlobalParameters in Azure ML Python script

How can one access the global parameters ("GlobalParameters") sent from a web service in a Python script on Azure ML?
I tried:
if 'GlobalParameters' in globals():
    myparam = GlobalParameters['myparam']
but with no success.
EDIT: Example
In my case, I'm sending a sound file over the web service (as a list of samples). I would also like to send a sample rate and the number of bits per sample. I've successfully configured the web service (I think) to take these parameters, so the GlobalParameters now look like:
"GlobalParameters": {
"sampleRate": "44100",
"bitsPerSample": "16",
}
However, I cannot access these variables from the Python script, neither as GlobalParameters["sampleRate"] nor as sampleRate. Is it possible? Where are they stored?
Based on our understanding of your question, there may be a misconception here: Azure ML parameters are not "Global Parameters"; as a matter of fact, they are just parameter substitutions tied to a particular module. So in effect there are no global parameters accessible throughout the experiment you mentioned. Such being the case, we think the experiment below accomplishes what you are asking for:
Please add an "Enter Data" module to the experiment and add data in CSV format. Then, for the Data field, click the parameter option to create a web service parameter. Add in the CSV data which will be substituted with the data passed by the client application.
Please add an "Execute Python" module and hook the "Enter Data" output up to the "Execute Python" input1. Add Python code to take dataframe1 and put it in a Python list. Once you have it in a list, you can use it anywhere in your Python code.
Python code snippet
def azureml_main(dataframe1 = None, dataframe2 = None):
    import pandas as pd
    global_list = []
    for g in dataframe1["Col3"]:
        global_list.append(g)
    df_global = pd.DataFrame(global_list)
    print('Input pandas.DataFrame:\r\n\r\n{0}'.format(df_global))
    return [df_global]
Once you publish your experiment, you can pass new values in the "Data" section below; they will be substituted for the "Enter Data" values in the experiment.
data = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["Col1", "Col2", "Col3"],
            "Values": [["0", "value", "0"], ["0", "value", "0"]]
        }
    },
    "GlobalParameters": {
        "Data": "1,sampleRate,44500\\n2,bitsPerSample,20"
    }
}
Please feel free to let us know if this makes sense.
The GlobalParameters parameter cannot be used in a Python script. It is used to override certain parameters in other modules.
If you, for example, take the 'Split Data' module, you'll find an option to turn a parameter into a web service parameter:
Once you click that, a new section appears titled "Web Service Parameters". There you can change the default parameter name to one of your choosing.
If you deploy your project as a web service, you can override that parameter by putting it in the GlobalParameters parameter:
"GlobalParameters": {
"myFraction": 0.7
}
I hope that clears things up a bit.
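To illustrate, a call to the deployed web service from Python might look like this (a sketch; the URL, API key, and input layout are placeholders patterned on the request format shown elsewhere on this page):

import json
import requests

body = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["Col1"],
            "Values": [["0"]]
        }
    },
    # Overrides the web service parameter defined on the module
    "GlobalParameters": {
        "myFraction": 0.7
    }
}

response = requests.post(
    'https://ussouthcentral.services.azureml.net/workspaces/xxx/services/xxx',
    headers={'Authorization': 'Bearer xxx', 'Content-Type': 'application/json'},
    data=json.dumps(body),
)
print(response.json())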
Although it is not possible to use GlobalParameters in the Python script (see my previous answer), you can however hack/abuse the second input of the Python script to pass in other parameters. In my example I call them metadata parameters.
To start, I added:
a Web service input module with name: "realdata" (for your real data off course)
a Web service input module with name: "metadata" (we will abuse this one to pass parameters to our Python).
a Web service output module with name: "computedMetadata"
Connect the modules as follows:
As you can see, I also added a real data set (Restaurant ratings) as well as a dummy metadata CSV (the "Enter Data Manually" module).
In this manual data you will have to predefine your metadata parameters as if they were a CSV with a header and only a single row to hold the data:
In the example both sampleRate and bitsPerSample are set to 0.
My Python script then takes in that fake CSV as metadata, does a dummy calculation with it, and returns it as a column:
import pandas as pd

def azureml_main(realdata = None, metadata = None):
    theSum = metadata["sampleRate"][0] + metadata["bitsPerSample"][0]
    outputString = "The sum of the sampleRate and the bitsPerSecond is " + str(theSum)
    print(outputString)
    return pd.DataFrame([outputString])
I then published this as a web service and called it using Node.js like this:
httpreq.post('https://ussouthcentral.services.azureml.net/workspaces/xxx/services/xxx', {
    headers: {
        Authorization: 'Bearer xxx'
    },
    json: {
        "Inputs": {
            "realdata": {
                "ColumnNames": ["userID", "placeID", "rating"],
                "Values": [
                    ["100", "101", "102"],
                    ["200", "201", "202"]
                ]
            },
            "metadata": {
                "ColumnNames": ["sampleRate", "bitsPerSample"],
                "Values": [
                    [44100, 16]
                ]
            }
        },
        "GlobalParameters": {}
    }
}, (err, res) => {
    if (err) return console.log(err);
    console.log(JSON.parse(res.body));
});
The output was as expected:
{ Results:
    { computedMetadata:
        { type: 'table',
          value:
            { ColumnNames: [ '0' ],
              ColumnTypes: [ 'String' ],
              Values: [ [ 'The sum of the sampleRate and the bitsPerSecond is 44116' ] ] } } } }
Good luck!

Convert a JSON schema to a python class

Is there a python library for converting a JSON schema to a python class definition, similar to jsonschema2pojo -- https://github.com/joelittlejohn/jsonschema2pojo -- for Java?
So far the closest thing I've been able to find is warlock, which advertises this workflow:
Build your schema
>>> schema = {
...     'name': 'Country',
...     'properties': {
...         'name': {'type': 'string'},
...         'abbreviation': {'type': 'string'},
...     },
...     'additionalProperties': False,
... }
Create a model
>>> import warlock
>>> Country = warlock.model_factory(schema)
Create an object using your model
>>> sweden = Country(name='Sweden', abbreviation='SE')
However, it's not quite that easy. The objects that Warlock produces lack much in the way of introspectible goodies. And if it supports nested dicts at initialization, I was unable to figure out how to make them work.
To give a little background, the problem that I was working on was how to take Chrome's JSONSchema API and produce a tree of request generators and response handlers. Warlock doesn't seem too far off the mark, the only downside is that meta-classes in Python can't really be turned into 'code'.
Other useful modules to look for:
jsonschema - (which Warlock is built on top of)
valideer - similar to jsonschema but with a worse name.
bunch - An interesting structure builder that's halfway between a dotdict and construct
If you end up finding a good one-stop solution for this, please follow up your question - I'd love to find one. I pored through GitHub, PyPI, Google Code, SourceForge, etc., and just couldn't find anything really sexy.
For lack of any pre-made solutions, I'll probably cobble together something with Warlock myself. So if I beat you to it, I'll update my answer. :p
python-jsonschema-objects is an alternative to warlock, built on top of jsonschema.
python-jsonschema-objects provides an automatic class-based binding to JSON schemas for use in python.
Usage:
Sample JSON schema:
schema = '''{
    "title": "Example Schema",
    "type": "object",
    "properties": {
        "firstName": {
            "type": "string"
        },
        "lastName": {
            "type": "string"
        },
        "age": {
            "description": "Age in years",
            "type": "integer",
            "minimum": 0
        },
        "dogs": {
            "type": "array",
            "items": {"type": "string"},
            "maxItems": 4
        },
        "gender": {
            "type": "string",
            "enum": ["male", "female"]
        },
        "deceased": {
            "enum": ["yes", "no", 1, 0, "true", "false"]
        }
    },
    "required": ["firstName", "lastName"]
}'''
Converting the schema object to a class:
import python_jsonschema_objects as pjs
import json

schema = json.loads(schema)
builder = pjs.ObjectBuilder(schema)
ns = builder.build_classes()
Person = ns.ExampleSchema
james = Person(firstName="James", lastName="Bond")

>>> james.lastName
u'Bond'
>>> james
<example_schema lastName=Bond age=None firstName=James>
Validation:
>>> james.age = -2
python_jsonschema_objects.validators.ValidationError: -2 was less or equal to than 0
But the problem is that it still uses Draft 4 validation, while jsonschema has moved past Draft 4; I filed an issue on the repo regarding this.
Unless you are using an old version of jsonschema, the above package will work as shown.
I just created this small project to generate code classes from a JSON schema; even though it's for Python, I think it can be useful when working on business projects:
pip install jsonschema2popo
Running the following command will generate a Python module containing the classes defined in the JSON schema (it uses Jinja2 templating):
jsonschema2popo -o /path/to/output_file.py /path/to/json_schema.json
more info at: https://github.com/frx08/jsonschema2popo
