Easy way to convert JSON to pyarrow schema

Disclaimer: I'm new to Apache Parquet and pyarrow. Is there an easy way to convert JSON to a pyarrow schema? The JSON I'm working with is:
{
"_time": ${datetime},
"activity": ${event_id},
"activity_id": 6,
"category_name": "Network Activity",
"category_uid": 4,
"class_name": "HTTP Activity",
"class_uid": 4002,
"dst_endpoint": {
"ip": ${sip}
},
"http_request": {
"hostname": ${host},
"url": {
"hostname": ${url_host},
"path": ${url_path},
"text": ${url}
},
"user_agent": ${ua},
"version": ${reqversion}
},
"http_response": {
"code": ${respcode}
},
"metadata": {
"version": 1.0.0
},
"severity": ${riskscore},
"severity_id": 0,
"src_endpoint": {
"ip": ${cip}
},
"type_name": "HTTP Activity: Traffic",
"type_uid": 400206
}
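One possible approach, sketched under assumptions: fill the ${...} placeholders with representative values, decode the record, and walk it recursively to derive a type for each field. The helper names (infer_spec, to_pyarrow_type, to_pyarrow_schema) and the sample values below are hypothetical. pyarrow is imported lazily, so the inference step can be checked without it installed.

```python
import json

# Hypothetical sample: a trimmed record with the ${...} placeholders filled in.
SAMPLE = json.loads("""
{
  "_time": "2023-01-01T00:00:00Z",
  "activity_id": 6,
  "class_uid": 4002,
  "dst_endpoint": {"ip": "10.0.0.1"},
  "http_response": {"code": 200},
  "metadata": {"version": "1.0.0"}
}
""")


def infer_spec(value):
    """Map a decoded JSON value to a type name; objects become dicts of specs."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "float64"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return {key: infer_spec(child) for key, child in value.items()}
    if isinstance(value, list):
        # Assumes homogeneous arrays; an empty array falls back to "null".
        return [infer_spec(value[0])] if value else ["null"]
    return "null"


def to_pyarrow_type(spec):
    """Turn a spec produced by infer_spec into a pyarrow DataType."""
    import pyarrow as pa  # imported lazily; only needed for this step

    if isinstance(spec, dict):
        return pa.struct(
            [pa.field(name, to_pyarrow_type(child)) for name, child in spec.items()]
        )
    if isinstance(spec, list):
        return pa.list_(to_pyarrow_type(spec[0]))
    scalars = {
        "bool": pa.bool_(),
        "int64": pa.int64(),
        "float64": pa.float64(),
        "string": pa.string(),
        "null": pa.null(),
    }
    return scalars[spec]


def to_pyarrow_schema(spec):
    """Build a pyarrow Schema from the top-level spec of a JSON object."""
    import pyarrow as pa

    return pa.schema(
        [pa.field(name, to_pyarrow_type(child)) for name, child in spec.items()]
    )


SPEC = infer_spec(SAMPLE)
```

With pyarrow installed, to_pyarrow_schema(SPEC) should give a schema with nested struct fields. Timestamps such as "_time" come out as strings in this sketch and would need an explicit mapping if a timestamp type is wanted.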

Related

A Python script that can navigate a .json and export a .csv based on a search term

I want to take a .json of "_PRESET..." items and their "code-state"s, whose "actions" contain other "code-state"s, "appearance"s, and "switch"es, and turn it into a .csv built from the actions under a given "_PRESET...", including the referenced "code-state"s and the "actions" listed under their individual entries.
This would let a user enter the "_PRESET..." name and receive a 3-column .csv file containing each action's "type", name, and "value". There are of course easy ways to export the entire .json, but I can't work out how to navigate it as needed.
For example, the user enters "_PRESET_Config_A" for
input.json:
{
"abc_data": {
"_PRESET_Config_A": {
"properties": {
"category": "configuration",
"name": "_PRESET_Config_A",
"collection": null,
"description": ""
},
"actions": {
"EN-R9": {
"type": "code_state",
"value": "on"
}
}
},
"PN4FP": {
"properties": {
"category": "uncategorized",
"name": "PN4FP",
"collection": null,
"description": ""
},
"actions": {
"E_xxxxxx_Default": {
"type": "appearance",
"value": "M_Red"
}
}
},
"HEDIS": {
"properties": {
"category": "uncategorized",
"name": "HEDIS",
"collection": null,
"description": ""
},
"actions": {
"E_xxxxxx_Default": {
"type": "appearance",
"value": "M_Purple"
}
}
},
"_PRESET_Config_B": {
"properties": {
"category": "configuration",
"name": "_PRESET_Config_A",
"collection": null,
"description": ""
},
"actions": {
"HEDIS": {
"type": "code_state",
"value": "on"
}
}
},
"EN-R9": {
"properties": {
"category": "uncategorized",
"name": "EN-R9",
"collection": null,
"description": ""
},
"actions": {
"PN4FP": {
"type": "code_state",
"value": "on"
},
"switch_StorageBin": {
"type": "switch",
"value": "00_w_Storage_Bin_R9"
}
}
}
}
}
Desired output.csv
type,name,value
code_state,EN-R9,on
code_state,PN4FP,on
appearance,E_xxxxxx_Default,M_Red
switch,switch_StorageBin,00_w_Storage_Bin_R9
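A minimal sketch of one way to produce that output, assuming every "code_state" action names another top-level entry under "abc_data" whose actions should be pulled in depth-first. The function names collect_actions and to_csv are hypothetical, and DATA is a trimmed copy of the input above.

```python
import csv
import io

# Trimmed copy of input.json from the question ("properties" omitted).
DATA = {
    "abc_data": {
        "_PRESET_Config_A": {
            "actions": {"EN-R9": {"type": "code_state", "value": "on"}}
        },
        "PN4FP": {
            "actions": {"E_xxxxxx_Default": {"type": "appearance", "value": "M_Red"}}
        },
        "EN-R9": {
            "actions": {
                "PN4FP": {"type": "code_state", "value": "on"},
                "switch_StorageBin": {"type": "switch", "value": "00_w_Storage_Bin_R9"},
            }
        },
    }
}


def collect_actions(entries, name, seen=None):
    """Return (type, name, value) rows for `name`, following code_state references."""
    seen = set() if seen is None else seen
    if name in seen or name not in entries:
        return []  # already visited (guards against cycles) or unknown entry
    seen.add(name)
    rows = []
    for action_name, action in entries[name].get("actions", {}).items():
        rows.append((action["type"], action_name, action["value"]))
        if action["type"] == "code_state":
            # A code_state action refers to another entry; include its actions too.
            rows.extend(collect_actions(entries, action_name, seen))
    return rows


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["type", "name", "value"])
    writer.writerows(rows)
    return buffer.getvalue()


ROWS = collect_actions(DATA["abc_data"], "_PRESET_Config_A")
CSV_TEXT = to_csv(ROWS)
```

To write a file instead, the same csv.writer can be pointed at open("output.csv", "w", newline="").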

singer.io incremental load is not working

I am working with singer.io, trying to get data from MySQL (running under XAMPP on Ubuntu). I can fetch the historical data, but when I try an incremental load it does not work.
I am not able to get the replication_key_value. If anyone can help, I would be really thankful.
config.json:
{
"host": "localhost",
"port": "3307",
"user": "root",
"password": ""
}
properties.json:
{
"streams": [
{
"tap_stream_id": "test_1-company",
"table_name": "company",
"schema": {
"properties": {
"COMPANY_ID": {
"inclusion": "automatic",
"maxLength": 6,
"type": [
"null",
"string"
]
},
"COMPANY_NAME": {
"inclusion": "available",
"maxLength": 25,
"type": [
"null",
"string"
]
},
"COMPANY_CITY": {
"inclusion": "available",
"maxLength": 25,
"type": [
"null",
"string"
]
}
},
"type": "object"
},
"stream": "company",
"metadata": [
{
"breadcrumb": [],
"metadata": {
"selected-by-default": false,
"database-name": "test_1",
"row-count": 7,
"is-view": false,
"table-key-properties": [
"COMPANY_ID"
],
"replication-method": "INCREMENTAL",
"replication-key": "COMPANY_ID"
}
},
{
"breadcrumb": [
"properties",
"COMPANY_ID"
],
"metadata": {
"selected-by-default": true,
"sql-datatype": "varchar(6)"
}
},
{
"breadcrumb": [
"properties",
"COMPANY_NAME"
],
"metadata": {
"selected-by-default": true,
"sql-datatype": "varchar(25)"
}
},
{
"breadcrumb": [
"properties",
"COMPANY_CITY"
],
"metadata": {
"selected-by-default": true,
"sql-datatype": "varchar(25)"
}
}
]
}
]
}
Console log:
$ tap-mysql -c config.json --properties properties.json
INFO Server Parameters: version: 10.4.14-MariaDB, wait_timeout: 2700, innodb_lock_wait_timeout: 2700, max_allowed_packet: 1048576, interactive_timeout: 28800
INFO Server SSL Parameters (blank means SSL is not active): [ssl_version: ], [ssl_cipher: ]
{"type": "STATE", "value": {"currently_syncing": null}}

Can you share your executable?
Your problem may be in the state.json file.
state.json needs to have this form:
{
"bookmarks":{
"123po-employee":{
"replication_key":"Id",
"replication_key_value":45,
"version":1609927544221
}
},
"currently_syncing":null
}
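A state file of that shape is usually taken from the tap's own output: the value of the last STATE message becomes the next run's state.json. The sketch below assumes the tap output has been captured as lines of text. The helper name last_state, the sample lines, and the bookmark value "C00007" are hypothetical.

```python
import json


def last_state(lines):
    """Return the value of the final STATE message in a tap's output, or None."""
    state = None
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip log lines such as "INFO ..."
        message = json.loads(line)
        if message.get("type") == "STATE":
            state = message["value"]
    return state


# Hypothetical captured output; the bookmark contents are made up for illustration.
OUTPUT = [
    "INFO Server SSL Parameters (blank means SSL is not active): [ssl_version: ], [ssl_cipher: ]",
    '{"type": "STATE", "value": {"currently_syncing": null}}',
    '{"type": "STATE", "value": {"bookmarks": {"test_1-company": '
    '{"replication_key": "COMPANY_ID", "replication_key_value": "C00007", '
    '"version": 1609927544221}}, "currently_syncing": null}}',
]

STATE = last_state(OUTPUT)
STATE_JSON = json.dumps(STATE, indent=2)  # text to save as state.json
```

The saved file can then be passed to the next run, for example tap-mysql -c config.json --properties properties.json --state state.json.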

Validate Json schema with repeating json response in Python

I am getting None when I try to validate my JSON response against my JSON schema using jsonschema.validate, while the same pair shows as matched on https://www.jsonschemavalidator.net/
Json Schema
{
"KPI": [{
"KPIDefinition": {
"id": {
"type": "string"
},
"name": {
"type": "string"
},
"version": {
"type": "number"
},
"description": {
"type": "string"
},
"datatype": {
"type": "string"
},
"units": {
"type": "string"
}
},
"KPIGroups": [{
"id": {
"type": "number"
},
"name": {
"type": "string"
}
}]
}],
"response": [{
"Description": {
"type": "string"
}
}]
}
JSON Response
{
"KPI": [
{
"KPIDefinition": {
"id": "2",
"name": "KPI 2",
"version": 1,
"description": "This is KPI 2",
"datatype": "1",
"units": "perHour"
},
"KPIGroups": [
{
"id": 7,
"name": "Group 7"
}
]
},
{
"KPIDefinition": {
"id": "3",
"name": "Parameter 3",
"version": 1,
"description": "This is KPI 3",
"datatype": "1",
"units": "per Hour"
},
"KPIGroups": [
{
"id": 7,
"name": "Group 7"
}
]
}
],
"response": [
{
"Description": "RECORD Found"
}
]
}
Code
json_schema2 = {"KPI":[{"KPIDefinition":{"id_new":{"type":"number"},"name":{"type":"string"},"version":{"type":"number"},"description":{"type":"string"},"datatype":{"type":"string"},"units":{"type":"string"}},"KPIGroups":[{"id":{"type":"number"},"name":{"type":"string"}}]}],"response":[{"Description":{"type":"string"}}]}
json_resp = {"KPI":[{"KPIDefinition":{"id":"2","name":"Parameter 2","version":1,"description":"This is parameter 2 definition version 1","datatype":"1","units":"kN"},"KPIGroups":[{"id":7,"name":"Group 7"}]},{"KPIDefinition":{"id":"3","name":"Parameter 3","version":1,"description":"This is parameter 3 definition version 1","datatype":"1","units":"kN"},"KPIGroups":[{"id":7,"name":"Group 7"}]}],"response":[{"Description":"RECORD FETCHED"}]}
print(jsonschema.validate(instance=json_resp, schema=json_schema2))
Validation does not seem to work correctly: I changed the data type and key name in my response, but it still does not raise an exception or error.

jsonschema.validate(..) is not supposed to return anything.
Your schema object and the JSON object are both okay and validation has passed if it didn't raise any exceptions -- which seems to be the case here.
That being said, you should wrap your call within a try-except block so as to be able to catch validation errors.
Something like:
try:
    jsonschema.validate(...)
    print("Validation passed!")
except jsonschema.ValidationError:
    print("Validation failed")
# similarly catch jsonschema.SchemaError too if needed.
Update: Your schema is invalid. As it stands, it will validate almost all inputs. A schema JSON should be an object (dict) that should have fields like "type" and based on the type, it might have other required fields like "items" or "properties". Please read up on how to write JSONSchema.
Here's a schema I wrote for your JSON:
{
"type": "object",
"required": [
"KPI",
"response"
],
"properties": {
"KPI": {
"type": "array",
"items": {
"type": "object",
"required": ["KPIDefinition","KPIGroups"],
"properties": {
"KPIDefinition": {
"type": "object",
"required": ["id","name"],
"properties": {
"id": {"type": "string"},
"name": {"type": "string"},
"version": {"type": "integer"},
"description": {"type": "string"},
"datatype": {"type": "string"},
"units": {"type": "string"}
}
},
"KPIGroups": {
"type": "array",
"items": {
"type": "object",
"required": ["id", "name"],
"properties": {
"id": {"type": "integer"},
"name": {"type": "string"}
}
}
}
}
}
},
"response": {
"type": "array",
"items": {
"type": "object",
"required": ["Description"],
"properties": {
"Description": {"type": "string"}
}
}
}
}
}
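To see why the original schema accepted almost everything: validators ignore keywords they do not recognize, and its top-level keys ("KPI", "response") are not JSON Schema keywords. The check below is a rough heuristic for illustration only, not part of jsonschema; KNOWN_KEYWORDS is a hand-picked, partial list and recognized_keywords is a hypothetical helper.

```python
# Partial, hand-picked list of JSON Schema keywords (an assumption for this sketch).
KNOWN_KEYWORDS = {
    "type", "properties", "items", "required", "enum",
    "additionalProperties", "anyOf", "allOf", "oneOf", "$ref",
}


def recognized_keywords(schema):
    """Return the top-level keys of `schema` that are known schema keywords."""
    return sorted(set(schema) & KNOWN_KEYWORDS)


# Shortened versions of the two schemas discussed above.
original = {
    "KPI": [{"KPIDefinition": {"id": {"type": "string"}}}],
    "response": [{"Description": {"type": "string"}}],
}
rewritten = {
    "type": "object",
    "required": ["KPI", "response"],
    "properties": {"KPI": {"type": "array"}, "response": {"type": "array"}},
}

ORIGINAL_HITS = recognized_keywords(original)    # no constraints at the top level
REWRITTEN_HITS = recognized_keywords(rewritten)  # type, required, properties
```

An empty result for the original schema means nothing at the top level constrains the instance, which matches the behaviour seen with jsonschema.validate.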

Converting JSON String to Dictionary in python [duplicate]

{
"status_code": 0,
"result_type": "DRAGON_NLU_ASR_CMD",
"NMAS_PRFX_SESSION_ID": "8c63c3ed-40da-4cdc-8ad8-1dd94ce8e466",
"NMAS_PRFX_TRANSACTION_ID": "1",
"audio_transfer_info": {
"packages": [
{
"time": "20160105015723190",
"bytes": 1668
},
{
"time": "20160105015723646",
"bytes": 7613
}
],
"nss_server": "10.56.11.186:4510",
"end_time": "20160105015723645",
"audio_id": 1,
"start_time": "20160105015722835"
},
"cadence_regulatable_result": "completeRecognition",
"appserver_results": {
"status": "success",
"final_response": 1,
"payload": {
"diagnostic_info": {
"adk_dialog_manager_status": "undefined",
"nlu_version": "[NLU_PROJECT:NVCCP-deu-DEU];[Datapack:Version: nlps-deu-DEU-NVCCP-6.1.100.12-2-GMT20151130161021];[VL-Models:Version: vlmodels-NVCCP-deu-DEU-6.1.100.12-2-GMT20151130160231]",
"nlps_host": "mt-dmz-nlps002.nuance.com:8636",
"nlps_ip": "10.56.10.51",
"application": "AUDI_2017",
"nlu_component_flow": "[Input:VoiceJSON] [FieldID|auto_main] [NLUlib|C-eckart-r$Rev$.f20151118.1250] [build|G-r72490M.f20151130.1055] [vlmodel|Version: vlmodels-NVCCP-deu-DEU-6.1.100.12-2-GMT20151130160231] [Flow|+VlingoTokenized]",
"third_party_delay": "0",
"nmaid": "AUDI_SDS_2017_EXT_20151203",
"nlps_profile": "AUDI_2017",
"fieldId": "auto_main",
"nlps_profile_package_version": "r159218",
"nlu_annotator": "com.nuance.NVCCP.deu-DEU.ncs51.VlingoNLU-client-qNVCCP_NCS51",
"ext_map_time": "3",
"nlu_use_literal_annotator": "0",
"int_map_time": "1",
"nlps_nlu_type": "nlu_project",
"nlu_language": "deu-DEU",
"timing": {
"finalRespSentDelay": "311",
"intermediateRespSentDelay": "1896"
},
"nlps_profile_package": "AUDI_2017"
},
"actions": [
{
"Input": {
"Interpretations": [
"18 Slash 6/2015"
],
"Type": "asr"
},
"Instances": [
{
"nlu_classification": {
"Domain": "UDE",
"Intention": "Unspecified"
},
"nlu_interpretation_index": 1,
"nlu_slot_details": {
"Location": {
"literal": "18 Slash 6/2015"
},
"Search-phrase": {
"literal": "18 slash 6/2015"
}
},
"interpretation_confidence": 3174
}
],
"type": "nlu_results",
"api_version": "1.0"
}
],
"nlps_version": "nlps(z):6.1.100.12.2-B359;Version: nlps-base-Zeppelin-6.1.100-B124-GMT20151130193521;"
}
},
"final_response": 1,
"prompt": "",
"result_format": "appserver_post_results"
}
I would like to convert the above JSON to a dictionary in Python. From it, I want to read these values:
{"Domain":"UDE","Intention":"Unspecified"}
I am new to JSON, so I don't understand how to do this. Can someone please suggest some ideas?

Simply use the json module, which comes with Python.
import json
json_string = '{"first_name": "Guido", "last_name":"Rossum"}'
parsed_json = json.loads(json_string)
print(parsed_json['first_name'])  # prints: Guido
Reference: http://docs.python-guide.org/en/latest/scenarios/json/
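Applied to the JSON in the question, the nested values can be reached by indexing through each level in turn. This is a minimal sketch: json_string is a trimmed copy of the original document that keeps only the path to "nlu_classification", and it assumes the first action and first instance are the ones wanted.

```python
import json

# Trimmed copy of the question's JSON, keeping only the path of interest.
json_string = """
{
  "appserver_results": {
    "payload": {
      "actions": [
        {
          "Instances": [
            {
              "nlu_classification": {"Domain": "UDE", "Intention": "Unspecified"},
              "nlu_interpretation_index": 1
            }
          ],
          "type": "nlu_results"
        }
      ]
    }
  }
}
"""

data = json.loads(json_string)  # JSON objects become dicts, arrays become lists

# Walk down: dict keys for objects, integer indexes for arrays.
first_action = data["appserver_results"]["payload"]["actions"][0]
classification = first_action["Instances"][0]["nlu_classification"]

print(classification)  # prints: {'Domain': 'UDE', 'Intention': 'Unspecified'}
```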

How to get id of latest uploaded video on youtube channel [duplicate]

How can I get the id of the latest uploaded video in a specific YouTube channel using Python?

You can request JSON and parse it. The following code gives you the first (most recent) result and stores it in first.
import json
import urllib.request
author = 'Google'
url = 'http://gdata.youtube.com/feeds/api/videos?max-results=1&alt=json&orderby=published&author=' + author
# Fetch the feed and decode the JSON body
with urllib.request.urlopen(url) as inp:
    resp = json.load(inp)
first = resp['feed']['entry'][0]
# Title of the video
print(first['title']['$t'])
# URL
print(first['link'][0]['href'])
I just looked through the JSON object in an interactive Python shell. You can build your own query or use the one I posted. Remember to change the author. This is a lower-level approach, and @Frederik mentioned something a bit higher level.
The first object looks like this.
{
"author": [
{
"name": {
"$t": "Google"
},
"uri": {
"$t": "http://gdata.youtube.com/feeds/api/users/google"
}
}
],
"category": [
{
"scheme": "http://schemas.google.com/g/2005#kind",
"term": "http://gdata.youtube.com/schemas/2007#video"
},
{
"label": "Science & Technology",
"scheme": "http://gdata.youtube.com/schemas/2007/categories.cat",
"term": "Tech"
},
{
"scheme": "http://gdata.youtube.com/schemas/2007/keywords.cat",
"term": "Google Currents"
},
{
"scheme": "http://gdata.youtube.com/schemas/2007/keywords.cat",
"term": "Google"
},
{
"scheme": "http://gdata.youtube.com/schemas/2007/keywords.cat",
"term": "Currents"
},
{
"scheme": "http://gdata.youtube.com/schemas/2007/keywords.cat",
"term": "Magazine App"
},
{
"scheme": "http://gdata.youtube.com/schemas/2007/keywords.cat",
"term": "Reader App"
},
{
"scheme": "http://gdata.youtube.com/schemas/2007/keywords.cat",
"term": "Android"
},
{
"scheme": "http://gdata.youtube.com/schemas/2007/keywords.cat",
"term": "ios"
},
{
"scheme": "http://gdata.youtube.com/schemas/2007/keywords.cat",
"term": "Android phone"
},
{
"scheme": "http://gdata.youtube.com/schemas/2007/keywords.cat",
"term": "Android tablet"
},
{
"scheme": "http://gdata.youtube.com/schemas/2007/keywords.cat",
"term": "iphone"
},
{
"scheme": "http://gdata.youtube.com/schemas/2007/keywords.cat",
"term": "ipad"
}
],
"content": {
"$t": "Google Currents is a new mobile app that lets you enjoy free online magazines and other content optimized for your Android or Apple phones and tablets. Learn more at www.google.com",
"type": "text"
},
"gd$comments": {
"gd$feedLink": {
"countHint": 463,
"href": "http://gdata.youtube.com/feeds/api/videos/5LOcUkm8m9w/comments"
}
},
"gd$rating": {
"average": 4.7557077,
"max": 5,
"min": 1,
"numRaters": 1752,
"rel": "http://schemas.google.com/g/2005#overall"
},
"id": {
"$t": "http://gdata.youtube.com/feeds/api/videos/5LOcUkm8m9w"
},
"link": [
{
"href": "http://www.youtube.com/watch?v=5LOcUkm8m9w&feature=youtube_gdata",
"rel": "alternate",
"type": "text/html"
},
{
"href": "http://gdata.youtube.com/feeds/api/videos/5LOcUkm8m9w/responses",
"rel": "http://gdata.youtube.com/schemas/2007#video.responses",
"type": "application/atom+xml"
},
{
"href": "http://gdata.youtube.com/feeds/api/videos/5LOcUkm8m9w/related",
"rel": "http://gdata.youtube.com/schemas/2007#video.related",
"type": "application/atom+xml"
},
{
"href": "http://m.youtube.com/details?v=5LOcUkm8m9w",
"rel": "http://gdata.youtube.com/schemas/2007#mobile",
"type": "text/html"
},
{
"href": "http://gdata.youtube.com/feeds/api/videos/5LOcUkm8m9w",
"rel": "self",
"type": "application/atom+xml"
}
],
"media$group": {
"media$category": [
{
"$t": "Tech",
"label": "Science & Technology",
"scheme": "http://gdata.youtube.com/schemas/2007/categories.cat"
}
],
"media$content": [
{
"duration": 94,
"expression": "full",
"isDefault": "true",
"medium": "video",
"type": "application/x-shockwave-flash",
"url": "http://www.youtube.com/v/5LOcUkm8m9w?version=3&f=videos&app=youtube_gdata",
"yt$format": 5
},
{
"duration": 94,
"expression": "full",
"medium": "video",
"type": "video/3gpp",
"url": "rtsp://v1.cache8.c.youtube.com/CiILENy73wIaGQncm7xJUpyz5BMYDSANFEgGUgZ2aWRlb3MM/0/0/0/video.3gp",
"yt$format": 1
},
{
"duration": 94,
"expression": "full",
"medium": "video",
"type": "video/3gpp",
"url": "rtsp://v5.cache4.c.youtube.com/CiILENy73wIaGQncm7xJUpyz5BMYESARFEgGUgZ2aWRlb3MM/0/0/0/video.3gp",
"yt$format": 6
}
],
"media$description": {
"$t": "Google Currents is a new mobile app that lets you enjoy free online magazines and other content optimized for your Android or Apple phones and tablets. Learn more at www.google.com",
"type": "plain"
},
"media$keywords": {
"$t": "Google Currents, Google, Currents, Magazine App, Reader App, Android, ios, Android phone, Android tablet, iphone, ipad"
},
"media$player": [
{
"url": "http://www.youtube.com/watch?v=5LOcUkm8m9w&feature=youtube_gdata_player"
}
],
"media$thumbnail": [
{
"height": 360,
"time": "00:00:47",
"url": "http://i.ytimg.com/vi/5LOcUkm8m9w/0.jpg",
"width": 480
},
{
"height": 90,
"time": "00:00:23.500",
"url": "http://i.ytimg.com/vi/5LOcUkm8m9w/1.jpg",
"width": 120
},
{
"height": 90,
"time": "00:00:47",
"url": "http://i.ytimg.com/vi/5LOcUkm8m9w/2.jpg",
"width": 120
},
{
"height": 90,
"time": "00:01:10.500",
"url": "http://i.ytimg.com/vi/5LOcUkm8m9w/3.jpg",
"width": 120
}
],
"media$title": {
"$t": "Introducing Google Currents",
"type": "plain"
},
"yt$duration": {
"seconds": "94"
}
},
"published": {
"$t": "2011-12-08T09:10:07.000Z"
},
"title": {
"$t": "Introducing Google Currents",
"type": "text"
},
"updated": {
"$t": "2011-12-14T12:57:53.000Z"
},
"yt$hd": {},
"yt$statistics": {
"favoriteCount": "312",
"viewCount": "420050"
}
}
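Since the question asks for the video id, it can be read from the entry's "id" field, whose "$t" value ends with the id. A minimal sketch, assuming the entry has the shape shown above; entry here is a trimmed copy and video_id is a hypothetical helper.

```python
# Trimmed copy of the entry shown above.
entry = {
    "id": {"$t": "http://gdata.youtube.com/feeds/api/videos/5LOcUkm8m9w"},
    "title": {"$t": "Introducing Google Currents", "type": "text"},
}


def video_id(entry):
    """Return the last path segment of the entry's id URL."""
    return entry["id"]["$t"].rsplit("/", 1)[-1]


VIDEO_ID = video_id(entry)
print(VIDEO_ID)  # prints: 5LOcUkm8m9w
```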
