Avro python array serialization - python

Looks like the following doesn't fit avro schema:
[{'page_title': 'Antoine Meillet', 'page_id': 3, 'contributors': [['contribution', {'revisions': 2, 'username': 'Curry'}], ['contribution', {'revisions': 1, 'username': 'script de conversion'}], ['contribution', {'revisions': 1, 'username': 'Francis'}]]}]
Schema :
{
"namespace": "org.wikipedia.fr",
"name": "meta-history",
"type": "record",
"fields": [
{
"name": "page_title",
"type": "string"
},
{
"name": "page_id",
"type": "int"
},
{
"name": "contributors",
"type": {
"type": "array",
"items": {
"type": "record",
"name": "contribution",
"fields": [
{
"name": "revisions",
"type": "int"
},
{
"name": "username",
"type": "string"
}
]
}
}
}
]
}
Got "ValueError: no value and no default for revisions"
Not sure what I'm doing wrong here...

Ok, not needed to name the array field.
[{'page_title': 'Antoine Meillet', 'page_id': 3, 'contributors': [{'revisions': 2, 'username': 'Curry'}, {'revisions': 1, 'username': 'script de conversion'}, {'revisions': 1, 'username': 'Francis'}]]
Got it right.

Related

How to convert Json to Python object?

How to convert the complex Json format to python? I feel difficulty in converting the attached complex json to python object and I have to validate this data later against the DB.
Json:
{
"namespace":"Data.Datapoint",
"type":"record",
"name":"Blood Donar",
"fields":[
{
"name":"id",
"type":"int"
},
{
"name":"donor_number",
"type":"string"
},
{
"name":"birth_date",
"type":{
"type":"int",
"logicalType":"date"
},
"doc":"Birth Date"
},
{
"name":"height",
"type":[
"int",
"null"
],
"doc":"Height"
},
{
"name":"applicant_ts",
"type":[
{
"type":"long",
"logicalType":"timestamp-millis"
},
"null"
],
"doc":"Creation Timestamp"
},
{
"name":"arm_preference_ind",
"type":[
"string",
"null"
],
"doc":"Arm Preference; Selection from list"
},
{
"name":"abo_ind",
"type":[
"string",
"null"
],
"doc":"Blood Type/ABO"
},
{
"name":"vein_grading_ind",
"type":[
"string",
"null"
],
"doc":"Vein Grade"
}
]
}
import json
data = '''
{ "namespace": "Data.Datapoint", "type": "record", "name": "Blood Donar", "fields": [ { "name": "id", "type": "int" }, { "name": "donor_number", "type": "string" }, { "name": "birth_date", "type": { "type": "int", "logicalType": "date" }, "doc": "Birth Date" }, { "name": "height", "type": [ "int", "null" ], "doc": "Height" }, { "name": "applicant_ts", "type": [ { "type": "long", "logicalType": "timestamp-millis" }, "null" ], "doc": "Creation Timestamp" }, { "name": "arm_preference_ind", "type": [ "string", "null" ], "doc": "Arm Preference; Selection from list" }, { "name": "abo_ind", "type": [ "string", "null" ], "doc": "Blood Type/ABO" }, { "name": "vein_grading_ind", "type": [ "string", "null" ], "doc": "Vein Grade" } ] }
'''
json_data = json.loads(data)
json_data is your python dict obj.
if you want json data from web you can try this
import json
import requests
response = requests.get("https://jsonplaceholder.typicode.com/todos")
todos = json.loads(response.text)

Validate Json schema with repeating json response in Python

I am getting none when I try to validate my Json schema with my Json response using Validate from Jsonschema.validate, while it shows matched on https://www.jsonschemavalidator.net/
Json Schema
{
"KPI": [{
"KPIDefinition": {
"id": {
"type": "string"
},
"name": {
"type": "string"
},
"version": {
"type": "number"
},
"description": {
"type": "string"
},
"datatype": {
"type": "string"
},
"units": {
"type": "string"
}
},
"KPIGroups": [{
"id": {
"type": "number"
},
"name": {
"type": "string"
}
}]
}],
"response": [{
"Description": {
"type": "string"
}
}]
}
JSON Response
JSON Response
{
"KPI": [
{
"KPIDefinition": {
"id": "2",
"name": "KPI 2",
"version": 1,
"description": "This is KPI 2",
"datatype": "1",
"units": "perHour"
},
"KPIGroups": [
{
"id": 7,
"name": "Group 7"
}
]
},
{
"KPIDefinition": {
"id": "3",
"name": "Parameter 3",
"version": 1,
"description": "This is KPI 3",
"datatype": "1",
"units": "per Hour"
},
"KPIGroups": [
{
"id": 7,
"name": "Group 7"
}
]
}
],
"response": [
{
"Description": "RECORD Found"
}
]
}
Code
json_schema2 = {"KPI":[{"KPIDefinition":{"id_new":{"type":"number"},"name":{"type":"string"},"version":{"type":"number"},"description":{"type":"string"},"datatype":{"type":"string"},"units":{"type":"string"}},"KPIGroups":[{"id":{"type":"number"},"name":{"type":"string"}}]}],"response":[{"Description":{"type":"string"}}]}
json_resp = {"KPI":[{"KPIDefinition":{"id":"2","name":"Parameter 2","version":1,"description":"This is parameter 2 definition version 1","datatype":"1","units":"kN"},"KPIGroups":[{"id":7,"name":"Group 7"}]},{"KPIDefinition":{"id":"3","name":"Parameter 3","version":1,"description":"This is parameter 3 definition version 1","datatype":"1","units":"kN"},"KPIGroups":[{"id":7,"name":"Group 7"}]}],"response":[{"Description":"RECORD FETCHED"}]}
print(jsonschema.validate(instance=json_resp, schema=json_schema2))
Validation is not being done correctly, I changed the dataType and key name in my response but still, it is not raising an exception or error.
jsonschema.validate(..) is not supposed to return anything.
Your schema object and the JSON object are both okay and validation has passed if it didn't raise any exceptions -- which seems to be the case here.
That being said, you should wrap your call within a try-except block so as to be able to catch validation errors.
Something like:
try:
jsonschema.validate(...)
print("Validation passed!")
except ValidationError:
print("Validation failed")
# similarly catch SchemaError too if needed.
Update: Your schema is invalid. As it stands, it will validate almost all inputs. A schema JSON should be an object (dict) that should have fields like "type" and based on the type, it might have other required fields like "items" or "properties". Please read up on how to write JSONSchema.
Here's a schema I wrote for your JSON:
{
"type": "object",
"required": [
"KPI",
"response"
],
"properties": {
"KPI": {
"type": "array",
"items": {
"type": "object",
"required": ["KPIDefinition","KPIGroups"],
"properties": {
"KPIDefinition": {
"type": "object",
"required": ["id","name"],
"properties": {
"id": {"type": "string"},
"name": {"type": "string"},
"version": {"type": "integer"},
"description": {"type": "string"},
"datatype": {"type": "string"},
"units": {"type": "string"},
},
"KPIGroups": {
"type": "array",
"items": {
"type": "object",
"required": ["id", "name"],
"properties": {
"id": {"type": "integer"},
"name": {"type": "string"}
}
}
}
}
}
}
},
"response": {
"type": "array",
"items": {
"type": "object",
"required": ["Description"],
"properties": {
"Description": {"type": "string"}
}
}
}
}
}

Python how to pick 3rd occurence in nested json array

I am working with one of my requirement
My requirement: I need to pick and print only 3rd "id" from "syrap" list from the nested json file. I am not getting desired output. Any help will be appreciated.
Test file:
{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55,
"batters":
{ "process": "abc",
"mix": "0303",
"syrap":
[
{ "id": "1001", "type": "Regular" },
{ "id": "1002", "type": "Chocolate" },
{ "id": "1003", "type": "Blueberry" },
{ "id": "1004", "type": "Devil's Food" }
]
},
"rate": 0.55,
"topping":
[
{ "id": "5001", "type": "None" },
{ "id": "5002", "type": "Glazed" },
{ "id": "5005", "type": "Sugar" },
{ "id": "5007", "type": "Powdered Sugar" },
{ "id": "5006", "type": "Chocolate with Sprinkles" },
{ "id": "5003", "type": "Chocolate" },
{ "id": "5004", "type": "Maple" }
]
}
Expected output in a csv:
0001,donut,abc,0303,1003
My code:
import requests
import json
import csv
f = open('testdata.json')
data = json.load(f)
f.close()
f = csv.writer(open('testout.csv', 'wb+'))
for item in data:
f.writerow([item['id'], item[type], item['batters'][0]['process'],
item['batters'][0]['mix'],
item['batters'][0]['syrap'][0]['id'],
item['batters'][0]['syrap'][1]['id'],
item['batters'][0]['syrap'][2]['id'])
Here is some sample code showing how you can iterate through json content parsed as a dictionary:
import json
json_str = '''{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55,
"batters":
{ "process": "abc",
"mix": "0303",
"syrap":
[
{ "id": "1001", "type": "Regular" },
{ "id": "1002", "type": "Chocolate" },
{ "id": "1003", "type": "Blueberry" },
{ "id": "1004", "type": "Devil's Food" }
]
},
"rate": 0.55,
"topping":
[
{ "id": "5001", "type": "None" },
{ "id": "5002", "type": "Glazed" },
{ "id": "5005", "type": "Sugar" },
{ "id": "5007", "type": "Powdered Sugar" },
{ "id": "5006", "type": "Chocolate with Sprinkles" },
{ "id": "5003", "type": "Chocolate" },
{ "id": "5004", "type": "Maple" }
]
}
'''
jsondict = json.loads(json_str)
syrap_node = jsondict['batters']['syrap']
for item in syrap_node:
print (f'id:{item["id"]} type: {item["type"]}')
Simply, data[“batters”][“syrap”][2][“id”]
Much better way to achieve this would be
f = open('testout.csv', 'wb+')
with f:
fnames = ['id','type','process','mix','syrap']
writer = csv.DictWriter(f, fieldnames=fnames)
writer.writeheader()
for item in data:
print item
writer.writerow({'id' : item['id'], 'type': item['type'],
'process' : item['batters']['process'],
'mix': item['batters']['mix'],
'syrap': item['batters']['syrap'][2]['id']})
You need to make sure that data is actually a list. if it is not a list, don't use for loop.
simply,
writer.writerow({'id' : data['id'], 'type': data['type'],
'process' : data['batters']['process'],
'mix': data['batters']['mix'],
'syrap': data['batters']['syrap'][2]['id']})

"object mapping [prices] can't be changed from nested to non-nested" on Bulk Python

I'm trying to insert a doc in ElasticSearch but every time i try to insert in python, its return me an error. But if i try to insert from Kibana or cUrl, its succeed.
I already tried the elasticserach-dsl but i've got the same error.
(Sorry for my bad english, i'm from brazil :D)
Error i've got:
elasticsearch.helpers.BulkIndexError: ((...)'status': 400, 'error': {'type':
'illegal_argument_exception', 'reason': "object mapping [prices] can't be changed from nested to non-nested"}}}])
My code:
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
doc = [{
"_index": "products",
"_type": "test_products",
"_source": {
[...]
"prices": {
"latest": {
"value": 89,
"when": 1502795602848
},
"old": [
{
"value": 0,
"when": 1502795602848
}
]
},
"sizes": [
{
"name": "P",
"available": True
},
{
"name": "M",
"available": True
}
],
"created": "2017-08-15T08:13:22.848284"
}
}]
bulk(self.es, doc, index="products")
My ES mapping:
{
"test_products": {
"mappings": {
"products": {
"properties": {
"approved": {
"type": "boolean"
},
"available": {
"type": "boolean"
},
"brand": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"buyClicks": {
"type": "integer"
},
"category": {
"type": "keyword"
},
"code": {
"type": "keyword"
},
"color": {
"type": "nested",
"properties": {
"name": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
},
"created": {
"type": "date"
},
"description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"gender": {
"type": "keyword"
},
"images": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"likes": {
"type": "integer"
},
"link": {
"type": "keyword"
},
"name": {
"type": "text",
"term_vector": "yes",
"analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer"
},
"prices": {
"type": "nested",
"properties": {
"latest": {
"type": "nested",
"properties": {
"value": {
"type": "long"
},
"when": {
"type": "date",
"format": "dd-MM-yyyy||epoch_millis"
}
}
},
"old": {
"type": "nested",
"properties": {
"value": {
"type": "long"
},
"when": {
"type": "date",
"format": "dd-MM-yyyy||epoch_millis"
}
}
}
}
},
"redirectClicks": {
"type": "integer"
},
"sizes": {
"type": "nested",
"properties": {
"available": {
"type": "boolean"
},
"name": {
"type": "keyword"
},
"quantity": {
"type": "integer"
}
}
},
"slug": {
"type": "keyword"
},
"store": {
"type": "keyword"
},
"subCategories": {
"type": "nested",
"properties": {
"name": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
},
"tags": {
"type": "text",
"fields": {
"raw": {
"type": "text",
"term_vector": "yes",
"analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer"
}
}
},
"thumbnails": {
"type": "keyword"
}
}
}
}
}
}

str to dict in python, but maintain the sequence of json attributes

I've tried ast.literal_eval and json.loads but both of these, doesn't maintain the sequence of json attributes when a string is provided. Please see the following example -
String before providing it to json.loads -
{
"type": "array",
"properties": {
"name": {
"type": "string"
},
"i": {
"type": "integer"
},
"strList": {
"type": "array",
"items": {
"type": "string"
}
},
"strMap": {
"type": "object"
},
"p2": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"i": {
"type": "integer"
},
"p3": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"i": {
"type": "integer"
},
"p4": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"i": {
"type": "integer"
}
}
}
}
}
}
}
},
"p3": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"i": {
"type": "integer"
},
"p4": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"i": {
"type": "integer"
}
}
}
}
}
},
"b": {
"type": "boolean",
"required": true
}
},
"classnames": {
"rootNode": {
"classname": "com.agent.Person"
},
"p2": {
"classname": "com.agent.Person2",
"p3": {
"classname": "com.agent.Person3",
"p4": {
"classname": "com.agent.Person4"
}
}
},
"p3": {
"classname": "com.agent.Person3",
"p4": {
"classname": "com.agent.Person4"
}
}
}
}
String after providing it to json.loads -
{
'classnames': {
'p2': {
'classname': 'com.agent.Person2',
'p3': {
'classname': 'com.agent.Person3',
'p4': {
'classname': 'com.agent.Person4'
}
}
},
'p3': {
'classname': 'com.agent.Person3',
'p4': {
'classname': 'com.agent.Person4'
}
},
'rootNode': {
'classname': 'com.agent.Person'
}
},
'properties': {
'b': {
'required': True,
'type': 'boolean'
},
'i': {
'type': 'integer'
},
'name': {
'type': 'string'
},
'p2': {
'items': {
'properties': {
'i': {
'type': 'integer'
},
'name': {
'type': 'string'
},
'p3': {
'properties': {
'i': {
'type': 'integer'
},
'name': {
'type': 'string'
},
'p4': {
'properties': {
'i': {
'type': 'integer'
},
'name': {
'type': 'string'
}
},
'type': 'object'
}
},
'type': 'object'
}
},
'type': 'object'
},
'type': 'array'
},
'p3': {
'items': {
'properties': {
'i': {
'type': 'integer'
},
'name': {
'type': 'string'
},
'p4': {
'properties': {
'i': {
'type': 'integer'
},
'name': {
'type': 'string'
}
},
'type': 'object'
}
},
'type': 'object'
},
'type': 'array'
},
'strList': {
'items': {
'type': 'string'
},
'type': 'array'
},
'strMap': {
'type': 'object'
}
},
'type': 'array'
}
Can anyone please suggest an alternative or something in python which keeps the sequence of attributes as they are and convert the string into the python dictionary?
As tobias_k has said, python dictionaries are unordered, so you'll lose any order information as soon as you load your data into one.
You can, however, load your JSON string into a OrderedDict:
from collections import OrderedDict
import json
json.loads(your_json_string, object_pairs_hook=OrderedDict)
This method is mentioned in the json module documentation

Categories