For a personal project I'm trying to write an AWS Lambda function in Python 3.9 that deletes a newly created user if the creator is not me. The logs in CloudWatch Logs will trigger my Lambda (via CloudTrail and EventBridge), so I receive the JSON request as the event argument of:
def lambda_handler(event, context):
But I'm having trouble parsing it.
If I print the event, I get this:
{'version': '1.0', 'invokingEvent': '{
"configurationItemDiff": {
"changedProperties": {},
"changeType": "CREATE"
},
"configurationItem": {
"relatedEvents": [],
"relationships": [],
"configuration": {
"path": "/",
"userName": "newUser",
"userId": "xxx",
"arn": "xxx",
"createDate": "2022-11-23T09:02:49.000Z",
"userPolicyList": [],
"groupList": [],
"attachedManagedPolicies": [],
"permissionsBoundary": null,
"tags": []
},
"supplementaryConfiguration": {},
"tags": {},
"configurationItemVersion": "1.3",
"configurationItemCaptureTime": "2022-11-23T09:04:40.659Z",
"configurationStateId": 1669194280659,
"awsAccountId": "141372946428",
"configurationItemStatus": "ResourceDiscovered",
"resourceType": "AWS::IAM::User",
"resourceId": "xxx",
"resourceName": "newUser",
"ARN": "arn:aws:iam::xxx:user/newUser",
"awsRegion": "global",
"availabilityZone": "Not Applicable",
"configurationStateMd5Hash": "",
"resourceCreationTime": "2022-11-23T09:02:49.000Z"
},
"notificationCreationTime": "2022-11-23T09:04:41.317Z",
"messageType": "ConfigurationItemChangeNotification",
"recordVersion": "1.3"
}', 'ruleParameters': '{
"badUser": "arn:aws:iam::xxx:user/badUser"
}', 'resultToken': 'xxx=', 'eventLeftScope': False, 'executionRoleArn': 'arn:aws:iam: : xxx:role/aws-service-role/config.amazonaws.com/AWSServiceRoleForConfig', 'configRuleArn': 'arn:aws:config:eu-west-1: xxx:config-rule/config-rule-q3nmvt', 'configRuleName': 'UserCreatedRule', 'configRuleId': 'config-rule-q3nmvt', 'accountId': 'xxx'
}
For my purpose, I'd like to read the "changeType": "CREATE" value: if it is CREATE, I check the creator, and if it is not me, I delete newUser.
The weird thing is that when I copy/paste that event into a .json document in VSCode and format it, it reports errors (for example, on line 1, version and invokingEvent should be in double quotes).
For now I only try to reach and print the
"changeType": "CREATE"
by doing :
import json
import boto3
import logging
iam = boto3.client('iam')
def lambda_handler(event, context):
    """
    Triggered if a user is created
    Check the creator - if not myself :
    - delete new user and remove from groups if necessary
    """
    try:
        print(event['invokingEvent']["configurationItemDiff"]["changeType"])
    except Exception as e:
        print("Error because :")
        print(e)
This fails with the error "string indices must be integers", raised at ["configurationItemDiff"].
I think I understand the error (though I'm new to Python, so maybe not completely) and have tried several things:
print(event['invokingEvent']['configurationItemDiff']) : swapping double quotes for single quotes doesn't change anything
print(event['invokingEvent'][0]) : this gives me the character { and [2] gives me the c, not the whole value.
At this point I'm stuck and can't find a solution. I don't use SNS; should I? I saw that with it the JSON document would be different and could be accessed through ["Records"][...]. Any help is appreciated.
What you are printing is a Python dict. It looks a bit like JSON, but it is the repr of a Python dict, not JSON. That means it has True / False instead of true / false, ' instead of ", and so on.
You could do print(json.dumps(event)) instead.
The actual problem, though, is that invokingEvent is itself JSON, but still in string form, so you need to json.loads that nested JSON string. You can tell because the value after invokingEvent is wrapped in its own '...': it is a string, not an already-parsed dict.
invoking_event = json.loads(event['invokingEvent'])
change_type = invoking_event["configurationItemDiff"]["changeType"]
ruleParameters would be another nested JSON which needs parsing first if you wanted to use it.
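Putting the two steps together, here is a minimal sketch of decoding both nested strings. The helper names get_change_type and get_rule_parameters and the trimmed-down sample_event are made up for illustration; only the key names come from the event printed in the question.

```python
import json


def get_change_type(event):
    """Return the changeType from the event, or None if it is absent."""
    # invokingEvent arrives as a JSON *string*, so decode it before indexing.
    invoking_event = json.loads(event["invokingEvent"])
    diff = invoking_event.get("configurationItemDiff") or {}
    return diff.get("changeType")


def get_rule_parameters(event):
    """Decode ruleParameters, which is a nested JSON string as well."""
    return json.loads(event.get("ruleParameters") or "{}")


# Hypothetical, shortened stand-in for the event shown in the question.
sample_event = {
    "version": "1.0",
    "invokingEvent": json.dumps({
        "configurationItemDiff": {"changedProperties": {}, "changeType": "CREATE"},
        "configurationItem": {"resourceName": "newUser"},
    }),
    "ruleParameters": json.dumps({"badUser": "arn:aws:iam::xxx:user/badUser"}),
}

print(get_change_type(sample_event))
print(get_rule_parameters(sample_event)["badUser"])
```

Using .get() for the inner keys means an event without a configurationItemDiff yields None instead of raising, which may or may not be what you want in the handler.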
I am writing Python code to validate a .csv file using a JSON schema and the jsonschema Python module. I have a clinical manifest schema that looks like this:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "http://example.com/veoibd_schema.json",
"title": "clinical data manifest schema",
"description": "Validates clinical data manifests",
"type": "object",
"properties": {
"individualID": {
"type": "string",
"pattern": "^CENTER-"
},
"medicationAtDx": {
"$ref": "https://raw.githubusercontent.com/not-my-username/validation_schemas/reference_definitions/clinicalData.json#/definitions/medicationAtDx"
}
},
"required": [
"individualID",
"medicationAtDx"
]
}
The schema referenced by the $ref looks like this:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "http://example.com/clinicalData.json",
"definitions":{
"ageDxYears": {
"description": "Age in years at diagnosis",
"type": "number",
"minimum": 0,
"maximum": 90
},
"ageOnset": {
"description": "Age in years of first symptoms",
"type": "number",
"exclusiveMinimum": 0
},
"medicationAtDx": {
"description": "Medication prescribed at diagnosis",
"type": "string"
}
}
}
(Note that both schemas are quite a bit larger and have been edited for brevity.)
I need to be able to figure out the "type" of "medicationAtDx" and am trying to figure out how to use jsonschema.RefResolver to de-reference it, but am a little lost in the terminology used in the documentation and can't find a good example that explains what the parameters are and what it returns "in small words", i.e. something that a beginning JSON schema user would easily understand.
I created a RefResolver from the clinical manifest schema:
import jsonschema
testref = jsonschema.RefResolver.from_schema(clin_manifest_schema)
I fed it the url in the "$ref":
meddx_url = "https://raw.githubusercontent.com/not-my-username/validation_schemas/reference_definitions/clinicalData.json#/definitions/medicationAtDx"
testref.resolve_remote(meddx_url)["definitions"].keys()
What I was expecting to get back was:
dict_keys(['medicationAtDx'])
What I actually got back was:
dict_keys(['ageDxYears', 'ageOnset', 'medicationAtDx'])
Is this the expected behavior? If not, how can I narrow it down to just the definition for "medicationAtDx"? I can traverse the whole dictionary to get what I want if I have to, but I'd rather have it return just the reference I need.
Thanks in advance!
ETA: per Relequestual's comment below, I took a couple of passes with resolve_fragment as follows:
ref_doc = meddx_url.split("#")[0]
ref_frag = meddx_url.split("#")[1]
testref.resolve_fragment(ref_doc, ref_frag)
This gives me "TypeError: string indices must be integers" and "RefResolutionError: Unresolvable JSON pointer". I tried tweaking the parameters in different ways (adding the "#" back into the fragment, removing the leading slash, etc.) and got the same results. Relequestual's explanation of a fragment was very helpful, but apparently I'm still not understanding the exact parameters that resolve_fragment is expecting.
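For what it's worth, the behaviour described above is consistent with how the pieces fit together: resolve_remote fetches the whole referenced document, and the part after the "#" is a JSON Pointer that selects one node inside it. As far as I can tell, resolve_fragment expects the already-loaded document (a dict) as its first argument rather than a URL string, which would explain the TypeError. The sketch below is not jsonschema's API; it is a standard-library illustration of what a fragment lookup does, with resolve_pointer as a hypothetical helper and referenced_schema copied from the question.

```python
from urllib.parse import unquote, urldefrag


def resolve_pointer(document, fragment):
    """Walk an already-loaded JSON document along a JSON Pointer fragment."""
    # An empty fragment refers to the whole document.
    parts = fragment.split("/")[1:] if fragment.startswith("/") else []
    node = document
    for part in parts:
        # Undo URL escaping, then the JSON Pointer escapes (~1 before ~0).
        part = unquote(part).replace("~1", "/").replace("~0", "~")
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


referenced_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "$id": "http://example.com/clinicalData.json",
    "definitions": {
        "ageDxYears": {"description": "Age in years at diagnosis",
                       "type": "number", "minimum": 0, "maximum": 90},
        "ageOnset": {"description": "Age in years of first symptoms",
                     "type": "number", "exclusiveMinimum": 0},
        "medicationAtDx": {"description": "Medication prescribed at diagnosis",
                           "type": "string"},
    },
}

meddx_url = ("https://raw.githubusercontent.com/not-my-username/validation_schemas/"
             "reference_definitions/clinicalData.json#/definitions/medicationAtDx")
doc_url, fragment = urldefrag(meddx_url)

medication = resolve_pointer(referenced_schema, fragment)
print(medication["type"])
```

With a real resolver, the equivalent would presumably be to pass the dict returned by resolve_remote, not the URL, as the document, together with the fragment without the leading "#".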
I'm fairly new to Python programming, and have so far been reverse engineering code written by previous developers, or cobbling together some functions on my own.
The script itself works. To cut a long story short, it's designed to parse a CSV and to (a) create and/or update the contacts found in the CSV, and (b) correctly assign each contact to their associated company, all using the HubSpot API. To achieve this I've also imported requests and csvmapper.
I have the following questions:
How can I improve this script to make it more Pythonic?
What is the best way to make this script run on a remote server, keeping in mind that Requests and CSVMapper probably aren't installed on that server, and that I most likely won't have permission to install them? What is the best way to "package" this script, or to upload Requests and CSVMapper to the server?
Any advice is much appreciated.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import print_function
import sys, os.path, requests, json, csv, csvmapper, glob, shutil
from time import sleep
major, minor, micro, release_level, serial = sys.version_info
# Client Portal ID
portal = "XXXXXX"
# Client API Key
hapikey = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
# This attempts to find any file in the directory that starts with "note" and ends with ".CSV"
# Server Version
# findCSV = glob.glob('/home/accountName/public_html/clientFolder/contact*.CSV')
# Local Testing Version
findCSV = glob.glob('contact*.CSV')
for i in findCSV:
theCSV = i
csvfileexists = os.path.isfile(theCSV)
# Prints a confirmation if file exists, prints instructions if it doesn't.
if csvfileexists:
print ("\nThe \"{csvPath}\" file was found ({csvSize} bytes); proceeding with sync ...\n".format(csvSize=os.path.getsize(theCSV), csvPath=os.path.basename(theCSV)))
else:
print ("File not found; check the file name to make sure it is in the same directory as this script. Exiting ...")
sys.exit()
# Begin the CSVmapper mapping... This creates a virtual "header" row - the CSV therefore does not need a header row.
mapper = csvmapper.DictMapper([
[
{'name':'account'}, #"Org. Code"
{'name':'id'}, #"Hubspot Ref"
{'name':'company'}, #"Company Name"
{'name':'firstname'}, #"Contact First Name"
{'name':'lastname'}, #"Contact Last Name"
{'name':'job_title'}, #"Job Title"
{'name':'address'}, #"Address"
{'name':'city'}, #"City"
{'name':'phone'}, #"Phone"
{'name':'email'}, #"Email"
{'name':'date_added'} #"Last Update"
]
])
# Parse the CSV using the mapper
parser = csvmapper.CSVParser(os.path.basename(theCSV), mapper)
# Build the parsed object
obj = parser.buildObject()
def contactCompanyUpdate():
# Open the CSV, use commas as delimiters, store it in a list called "data", then find the length of that list.
with open(os.path.basename(theCSV),"r") as f:
reader = csv.reader(f, delimiter = ",", quotechar="\"")
data = list(reader)
# For every row in the CSV ...
for row in range(0, len(data)):
# Set up the JSON payload ...
payload = {
"properties": [
{
"name": "account",
"value": obj[row].account
},
{
"name": "id",
"value": obj[row].id
},
{
"name": "company",
"value": obj[row].company
},
{
"property": "firstname",
"value": obj[row].firstname
},
{
"property": "lastname",
"value": obj[row].lastname
},
{
"property": "job_title",
"value": obj[row].job_title
},
{
"property": "address",
"value": obj[row].address
},
{
"property": "city",
"value": obj[row].city
},
{
"property": "phone",
"value": obj[row].phone
},
{
"property": "email",
"value": obj[row].email
},
{
"property": "date_added",
"value": obj[row].date_added
}
]
}
nameQuery = "{first} {last}".format(first=obj[row].firstname, last=obj[row].lastname)
# Get a list of all contacts for a certain company.
contactCheck = "https://api.hubapi.com/contacts/v1/search/query?q={query}&hapikey={hapikey}".format(hapikey=hapikey, query=nameQuery)
# Convert the payload to JSON and assign it to a variable called "data"
data = json.dumps(payload)
# Defined the headers content-type as 'application/json'
headers = {'content-type': 'application/json'}
contactExistCheck = requests.get(contactCheck, headers=headers)
for i in contactExistCheck.json()[u'contacts']:
# ... Get the canonical VIDs
canonicalVid = i[u'canonical-vid']
if canonicalVid:
print ("{theContact} exists! Their VID is \"{vid}\"".format(theContact=obj[row].firstname, vid=canonicalVid))
print ("Attempting to update their company...")
contactCompanyUpdate = "https://api.hubapi.com/companies/v2/companies/{companyID}/contacts/{vid}?hapikey={hapikey}".format(hapikey=hapikey, vid=canonicalVid, companyID=obj[row].id)
doTheUpdate = requests.put(contactCompanyUpdate, headers=headers)
if doTheUpdate.status_code == 200:
print ("Attempt Successful! {theContact}'s has an updated company.\n".format(theContact=obj[row].firstname))
break
else:
print ("Attempt Failed. Status Code: {status}. Company or Contact not found.\n".format(status=doTheUpdate.status_code))
def createOrUpdateClient():
# Open the CSV, use commas as delimiters, store it in a list called "data", then find the length of that list.
with open(os.path.basename(theCSV),"r") as f:
reader = csv.reader(f, delimiter = ",", quotechar="\"")
data = list(reader)
# For every row in the CSV ...
for row in range(0, len(data)):
# Set up the JSON payload ...
payloadTest = {
"properties": [
{
"property": "email",
"value": obj[row].email
},
{
"property": "firstname",
"value": obj[row].firstname
},
{
"property": "lastname",
"value": obj[row].lastname
},
{
"property": "website",
"value": None
},
{
"property": "company",
"value": obj[row].company
},
{
"property": "phone",
"value": obj[row].phone
},
{
"property": "address",
"value": obj[row].address
},
{
"property": "city",
"value": obj[row].city
},
{
"property": "state",
"value": None
},
{
"property": "zip",
"value": None
}
]
}
# Convert the payload to JSON and assign it to a variable called "data"
dataTest = json.dumps(payloadTest)
# Defined the headers content-type as 'application/json'
headers = {'content-type': 'application/json'}
#print ("{theContact} does not exist!".format(theContact=obj[row].firstname))
print ("Attempting to add {theContact} as a contact...".format(theContact=obj[row].firstname))
createOrUpdateURL = 'http://api.hubapi.com/contacts/v1/contact/createOrUpdate/email/{email}/?hapikey={hapikey}'.format(email=obj[row].email,hapikey=hapikey)
r = requests.post(createOrUpdateURL, data=dataTest, headers=headers)
if r.status_code == 409:
print ("This contact already exists.\n")
elif (r.status_code == 200) or (r.status_code == 202):
print ("Success! {firstName} {lastName} has been added.\n".format(firstName=obj[row].firstname,lastName=obj[row].lastname, response=r.status_code))
elif r.status_code == 204:
print ("Success! {firstName} {lastName} has been updated.\n".format(firstName=obj[row].firstname,lastName=obj[row].lastname, response=r.status_code))
elif r.status_code == 400:
print ("Bad request. You might get this response if you pass an invalid email address, if a property in your request doesn't exist, or if you pass an invalid property value.\n")
else:
print ("Contact Marko for assistance.\n")
if __name__ == "__main__":
# Run the Create or Update function
createOrUpdateClient()
# Give the previous function 5 seconds to take effect.
sleep(5.0)
# Run the Company Update function
contactCompanyUpdate()
print("Sync complete.")
print("Moving \"{something}\" to the archive folder...".format(something=theCSV))
# Cron version
#shutil.move( i, "/home/accountName/public_html/clientFolder/archive/" + os.path.basename(i))
# Local version
movePath = "archive/{thefile}".format(thefile=theCSV)
shutil.move( i, movePath )
print("Move successful! Exiting...\n")
sys.exit()
I'll just go from top to bottom. The first rule is, do what's in PEP 8. It's not the ultimate style guide, but it's certainly a reference baseline for Python coders, and that's more important, especially when you're getting started. The second rule is, make it maintainable. A couple of years from now, when some other new kid comes through, it should be easy for her to figure out what you were doing. Sometimes that means doing things the long way, to reduce errors. Sometimes it means doing things the short way, to reduce errors. :-)
#!/usr/bin/env python
# -*- coding: utf-8 -*-
Two things. First, you got the encoding right, per PEP 8. Second, to quote PEP 8:
"Conventions for writing good documentation strings (a.k.a. "docstrings") are immortalized in PEP 257."
You've got a program that does something, but you don't document what.
from __future__ import print_function
import sys, os.path, requests, json, csv, csvmapper, glob, shutil
from time import sleep
major, minor, micro, release_level, serial = sys.version_info
Per PEP 8: put your import module statements one per line.
Per Austin: make your paragraphs have separate subjects. You've got some imports right next to some version info stuff. Insert a blank line. Also, DO SOMETHING with the data! Or you didn't need it to be right here, did you?
# Client Portal ID
portal = "XXXXXX"
# Client API Key
hapikey = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
You've obscured these in more ways than one. WTF is a hapikey? I think you mean Hubspot_API_key. And what does portal do?
One piece of advice: the more "global" a thing is, the more "formal" it should be. If you have a for loop, it's okay to call one of the variables i. If you have a piece of data that is used throughout a function, call it obj or portal. But if you have a piece of data that is used globally, or is a class variable, make it put on a tie and a jacket so everyone can recognize it: make it Hubspot_api_key instead of client_api_key. Maybe even Hubspot_client_api_key if there is more than one API. Do the same with portal.
# This attempts to find any file in the directory that starts with "note" and ends with ".CSV"
# Server Version
# findCSV = glob.glob('/home/accountName/public_html/clientFolder/contact*.CSV')
It didn't take long for the comments to become lies. Just delete them if they aren't true.
# Local Testing Version
findCSV = glob.glob('contact*.CSV')
This is the kind of thing that you should create a function for. Just create a simple function called "get_csv_files" or whatever, and have it return a list of filenames. That decouples you from glob, and it means you can make your test code data driven (pass a list of filenames into a function, or pass a single file into a function, instead of asking it to search for them). Also, those glob patterns are exactly the kind of thing that go in a config file, or a global variable, or get passed as command line arguments.
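A minimal sketch of such a function, assuming the pattern lives in a module-level constant. The names CONTACT_PATTERN and get_csv_files and the temporary-directory demo are only illustrative.

```python
import glob
import os
import tempfile

# The pattern from the original script, pulled out so it can later come
# from a config file or the command line.
CONTACT_PATTERN = "contact*.CSV"


def get_csv_files(directory=".", pattern=CONTACT_PATTERN):
    """Return a sorted list of paths in `directory` that match `pattern`."""
    return sorted(glob.glob(os.path.join(directory, pattern)))


# Demo against a throwaway directory, so nothing depends on real data files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("contact_a.CSV", "contact_b.CSV", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(path) for path in get_csv_files(tmp)]

print(found)
```

Because the rest of the code only ever sees a list of paths, a test can hand it a hard-coded list and never touch glob at all.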
for i in findCSV:
I'll bet typing CSV in upper case all the time is a pain. And what does findCSV mean? Read that line, and figure out what that variable should be called. Maybe csv_files? Or new_contact_files? Something that demonstrates that there is a collection of things.
theCSV = i
csvfileexists = os.path.isfile(theCSV)
Now what does i do? You had this nice small variable name, in a BiiiiiiG loop. That was a mistake, since if you can't see a variable's entire scope all on one page, it probably needs a somewhat longer name. But then you created an alias for it. Both i and theCSV refer to the same thing. And ... I don't see you using i again. So maybe your loop variable should be theCSV. Or maybe it should be the_csv to make it easier to type. Or just csvname.
# Prints a confirmation if file exists, prints instructions if it doesn't.
This seems a little needless. If you're using glob to get filenames, they pretty much are going to exist. (If they don't, it's because they were deleted between the time you called glob and the time you tried to open them. That's possible, but rare. Just continue or raise an exception, depending.)
if csvfileexists:
print ("\nThe \"{csvPath}\" file was found ({csvSize} bytes); proceeding with sync ...\n".format(csvSize=os.path.getsize(theCSV), csvPath=os.path.basename(theCSV)))
In this code, you use the value of csvfileexists. But that's the only place you use it. In this case, you can probably move the call to os.path.isfile() into the if statement and get rid of the variable.
else:
print ("File not found; check the file name to make sure it is in the same directory as this script. Exiting ...")
sys.exit()
Notice that in this case, when there is an actual problem, you didn't print the file name? How helpful was that?
Also, remember the part where you're on a remote server? You should consider using Python's logging module to record these messages in a useful manner.
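Here is a rough sketch of both points, the missing file name and the logging module. The logger name contact_sync and the helper check_csv are made up for the example.

```python
import logging
import os

logger = logging.getLogger("contact_sync")  # hypothetical logger name


def check_csv(path):
    """Log whether `path` exists, always naming the file, and return a bool."""
    if os.path.isfile(path):
        logger.info('"%s" found (%d bytes); proceeding with sync',
                    path, os.path.getsize(path))
        return True
    logger.error('"%s" not found; skipping it', path)
    return False


if __name__ == "__main__":
    # Configure output once, at the entry point; under cron you might pass
    # filename=... here instead of logging to the console.
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    check_csv("contact_missing.CSV")
```

Returning a bool instead of calling sys.exit() leaves the decision about what to do next to the caller.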
# Begin the CSVmapper mapping... This creates a virtual "header" row - the CSV therefore does not need a header row.
mapper = csvmapper.DictMapper([
[
{'name':'account'}, #"Org. Code"
{'name':'id'}, #"Hubspot Ref"
{'name':'company'}, #"Company Name"
{'name':'firstname'}, #"Contact First Name"
{'name':'lastname'}, #"Contact Last Name"
{'name':'job_title'}, #"Job Title"
{'name':'address'}, #"Address"
{'name':'city'}, #"City"
{'name':'phone'}, #"Phone"
{'name':'email'}, #"Email"
{'name':'date_added'} #"Last Update"
]
])
You're creating an object with a bunch of data. This would be a good place for a function. Define a make_csvmapper() function to do all this for you, and move it out of line.
Also, note that the standard csv module has most of the functionality you are using. I don't think you actually need csvmapper.
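For example, csv.DictReader accepts a fieldnames argument, which gives you the same "virtual header row" without a third-party package. The sketch below assumes the column order from the mapper above; read_contacts and the sample row are invented for illustration.

```python
import csv
import io

# Column order taken from the DictMapper definition in the question.
FIELDNAMES = [
    "account", "id", "company", "firstname", "lastname", "job_title",
    "address", "city", "phone", "email", "date_added",
]


def read_contacts(csv_file):
    """Yield one dict per row of a header-less contact CSV."""
    reader = csv.DictReader(csv_file, fieldnames=FIELDNAMES,
                            delimiter=",", quotechar='"')
    # Iterate over the reader directly; no list() or range(len(...)) needed.
    for row in reader:
        yield row


# Made-up sample row standing in for a real contact file.
sample = io.StringIO(
    'A1,42,"Acme, Inc.",Jane,Doe,CTO,1 Main St,Springfield,'
    '555-0100,jane@example.com,2016-01-01\n'
)
contacts = list(read_contacts(sample))
print(contacts[0]["company"])
```

Rows come back as dicts, so row["email"] replaces obj[row].email, and the file only has to be opened once.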
# Parse the CSV using the mapper
parser = csvmapper.CSVParser(os.path.basename(theCSV), mapper)
# Build the parsed object
obj = parser.buildObject()
Here's another chance for a function. Maybe instead of making a csv mapper, you could just return the obj?
def contactCompanyUpdate():
At this point, things get fishy. You have these function definitions indented, but I don't think you need them. Is that a stackoverflow problem, or does your code really look like this?
# Open the CSV, use commas as delimiters, store it in a list called "data", then find the length of that list.
with open(os.path.basename(theCSV),"r") as f:
No, apparently it really looks like this. Because you're using theCSV inside this function when you don't really need to. Please consider using formal function parameters instead of just grabbing outer-scope objects. Also, why are you using basename on the csv file? If you obtained it using glob, doesn't it already have the path you want?
reader = csv.reader(f, delimiter = ",", quotechar="\"")
data = list(reader)
# For every row in the CSV ...
for row in range(0, len(data)):
Here you forced data to be a list of rows obtained from reader, and then started iterating over them. Just iterate over reader directly, like: for row in reader: BUT WAIT! You're actually iterating over a CSV file that you have already opened, in your obj variable. Just pick one, and iterate over it. You don't need to open the file twice for this.
# Set up the JSON payload ...
payload = {
"properties": [
{
"name": "account",
"value": obj[row].account
},
{
"name": "id",
"value": obj[row].id
},
{
"name": "company",
"value": obj[row].company
},
{
"property": "firstname",
"value": obj[row].firstname
},
{
"property": "lastname",
"value": obj[row].lastname
},
{
"property": "job_title",
"value": obj[row].job_title
},
{
"property": "address",
"value": obj[row].address
},
{
"property": "city",
"value": obj[row].city
},
{
"property": "phone",
"value": obj[row].phone
},
{
"property": "email",
"value": obj[row].email
},
{
"property": "date_added",
"value": obj[row].date_added
}
]
}
Okay, that was a LOOOONG span of code that didn't do much. At the least, tighten those inner dicts up to one line each. But better still, write a function to create your dictionary in the format you want. You can use getattr to pull the data by name from obj.
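Something along these lines, as a sketch. Note that the original mixes "name" and "property" keys in the inner dicts; this version uses "property" throughout, which is an assumption on my part. The namedtuple is just a stand-in for whatever row object you end up with.

```python
from collections import namedtuple

CONTACT_PROPERTIES = (
    "firstname", "lastname", "job_title", "address",
    "city", "phone", "email", "date_added",
)


def build_payload(contact, names=CONTACT_PROPERTIES):
    """Build the {"properties": [...]} payload by reading attributes by name."""
    return {
        "properties": [
            {"property": name, "value": getattr(contact, name)}
            for name in names
        ]
    }


# Stand-in for a parsed CSV row with attribute access.
Contact = namedtuple("Contact", CONTACT_PROPERTIES)
jane = Contact("Jane", "Doe", "CTO", "1 Main St", "Springfield",
               "555-0100", "jane@example.com", "2016-01-01")

payload = build_payload(jane, names=("firstname", "email"))
print(payload)
```

Adding or dropping a property then means editing one tuple instead of five lines of dict literal.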
nameQuery = "{first} {last}".format(first=obj[row].firstname, last=obj[row].lastname)
# Get a list of all contacts for a certain company.
contactCheck = "https://api.hubapi.com/contacts/v1/search/query?q={query}&hapikey={hapikey}".format(hapikey=hapikey, query=nameQuery)
# Convert the payload to JSON and assign it to a variable called "data"
data = json.dumps(payload)
# Defined the headers content-type as 'application/json'
headers = {'content-type': 'application/json'}
contactExistCheck = requests.get(contactCheck, headers=headers)
Here you're encoding details of the API into your code. Consider pulling them out into functions. (That way, you can come back later and build a module of them, to re-use in your next program.) Also, beware of comments that don't actually tell you anything. And feel free to pull that together as a single paragraph, since it's all in service of the same key thing - making an API call.
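As a sketch of what "pulling them out" could look like, here are two small URL builders. The endpoint paths are copied from your code; the function names and API_BASE are hypothetical. Using urlencode also takes care of the space in the name query.

```python
from urllib.parse import urlencode

API_BASE = "https://api.hubapi.com"


def contact_search_url(query, api_key):
    """URL for searching contacts by a free-text query."""
    params = urlencode({"q": query, "hapikey": api_key})
    return "{base}/contacts/v1/search/query?{params}".format(
        base=API_BASE, params=params)


def company_contact_url(company_id, vid, api_key):
    """URL for attaching contact `vid` to company `company_id`."""
    params = urlencode({"hapikey": api_key})
    return "{base}/companies/v2/companies/{company}/contacts/{vid}?{params}".format(
        base=API_BASE, company=company_id, vid=vid, params=params)


print(contact_search_url("Jane Doe", "XXXX"))
print(company_contact_url(7, 99, "XXXX"))
```

The calling code then reads as requests.get(contact_search_url(name, key)), and the API details live in one place.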
for i in contactExistCheck.json()[u'contacts']:
# ... Get the canonical VIDs
canonicalVid = i[u'canonical-vid']
if canonicalVid:
print ("{theContact} exists! Their VID is \"{vid}\"".format(theContact=obj[row].firstname, vid=canonicalVid))
print ("Attempting to update their company...")
contactCompanyUpdate = "https://api.hubapi.com/companies/v2/companies/{companyID}/contacts/{vid}?hapikey={hapikey}".format(hapikey=hapikey, vid=canonicalVid, companyID=obj[row].id)
doTheUpdate = requests.put(contactCompanyUpdate, headers=headers)
if doTheUpdate.status_code == 200:
print ("Attempt Successful! {theContact}'s has an updated company.\n".format(theContact=obj[row].firstname))
break
else:
print ("Attempt Failed. Status Code: {status}. Company or Contact not found.\n".format(status=doTheUpdate.status_code))
I'm not sure if this last bit should be an exception or not. Is an "Attempt Failed" normal behavior, or does it mean that something is broken?
At any rate, please look into the API you are using. I'd bet there is some more information available for minor failures. (Major failures would be the internet is broken or their server is offline.) They might provide an "errors" or "error" field in their return JSON, for example. Those should be logged or printed with your failure message.
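A sketch of how that might look. The field names "message", "errors" and "error" are guesses, not something I have checked against the HubSpot documentation, and describe_failure is a made-up helper.

```python
import json


def describe_failure(status_code, body_text):
    """Build a failure message that includes any detail found in the body."""
    try:
        body = json.loads(body_text)
    except ValueError:
        # The body was not JSON at all (an HTML error page, for instance).
        body = {}
    if not isinstance(body, dict):
        body = {}
    # Hypothetical field names; check the API docs for the real ones.
    detail = (body.get("message") or body.get("errors")
              or body.get("error") or "no detail provided")
    return "Attempt failed. Status code: {0}. Detail: {1}".format(
        status_code, detail)


print(describe_failure(404, '{"message": "contact not found"}'))
print(describe_failure(500, "<html>Server error</html>"))
```

With requests you would call it as describe_failure(r.status_code, r.text) and log the result.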
def createOrUpdateClient():
Mostly this function has the same issues as the previous one.
else:
print ("Contact Marko for assistance.\n")
Except here. Never put your name in someplace like this. Or you'll still be getting calls on this code 10 years from now. Put your department name ("IT Operations") or a support number. The people who need to know will already know. And the people who don't need to know can just notify the people that already know.
if __name__ == "__main__":
# Run the Create or Update function
createOrUpdateClient()
# Give the previous function 5 seconds to take effect.
sleep(5.0)
# Run the Company Update function
contactCompanyUpdate()
print("Sync complete.")
print("Moving \"{something}\" to the archive folder...".format(something=theCSV))
# Cron version
#shutil.move( i, "/home/accountName/public_html/clientFolder/archive/" + os.path.basename(i))
# Local version
movePath = "archive/{thefile}".format(thefile=theCSV)
shutil.move( i, movePath )
print("Move successful! Exiting...\n")
This was awkward. You might consider taking some command line arguments and using them to determine your behavior.
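A minimal argparse sketch of that idea. The option names are invented; the default archive folder is the one from your local version.

```python
import argparse


def parse_args(argv=None):
    """Parse command line arguments; pass a list to test without a shell."""
    parser = argparse.ArgumentParser(
        description="Sync contact CSV files to HubSpot.")
    parser.add_argument(
        "csv_files", nargs="*",
        help="CSV files to sync (hypothetical: search for them if omitted)")
    parser.add_argument(
        "--archive-dir", default="archive",
        help="folder that processed files are moved to")
    return parser.parse_args(argv)


args = parse_args(["contact_1.CSV", "--archive-dir", "old"])
print(args.csv_files, args.archive_dir)
```

The cron job and your local test run then differ only in the arguments they pass, so the commented-out "Cron version" lines can go.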
sys.exit()
And don't do this. Never put an exit() at module scope, because it means you can't possibly import this code. Maybe someone wants to import it to parse the docstrings. Or maybe they want to borrow some of those API functions you wrote. Too bad! sys.exit() means always having to say "Oh, sorry, I'll have to do that for you." Put it at the bottom of your actual __name__ == "__main__" code. Or, since you aren't actually passing a value, just remove it entirely.
Is there a python library for converting a JSON schema to a python class definition, similar to jsonschema2pojo -- https://github.com/joelittlejohn/jsonschema2pojo -- for Java?
So far the closest thing I've been able to find is warlock, which advertises this workflow:
Build your schema
>>> schema = {
'name': 'Country',
'properties': {
'name': {'type': 'string'},
'abbreviation': {'type': 'string'},
},
'additionalProperties': False,
}
Create a model
>>> import warlock
>>> Country = warlock.model_factory(schema)
Create an object using your model
>>> sweden = Country(name='Sweden', abbreviation='SE')
However, it's not quite that easy. The objects that Warlock produces lack much in the way of introspectible goodies. And if it supports nested dicts at initialization, I was unable to figure out how to make them work.
To give a little background, the problem that I was working on was how to take Chrome's JSONSchema API and produce a tree of request generators and response handlers. Warlock doesn't seem too far off the mark, the only downside is that meta-classes in Python can't really be turned into 'code'.
Other useful modules to look for:
jsonschema - (which Warlock is built on top of)
valideer - similar to jsonschema but with a worse name.
bunch - An interesting structure builder that's halfway between a dotdict and construct
If you end up finding a good one-stop solution for this, please follow up on your question - I'd love to find one. I pored through GitHub, PyPI, Google Code, SourceForge, etc., and just couldn't find anything really sexy.
For lack of any pre-made solutions, I'll probably cobble together something with Warlock myself. So if I beat you to it, I'll update my answer. :p
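In the meantime, here is a rough standard-library sketch of the general idea, building a class at runtime from a schema's properties with dataclasses.make_dataclass. This is not how Warlock or any of the libraries above work, it performs no validation, and class_from_schema and JSON_TYPES are made-up names; it only shows that a flat schema maps onto an introspectable class fairly directly.

```python
from dataclasses import field, fields, make_dataclass
from typing import Optional

# Rough mapping from JSON Schema type names to Python types.
JSON_TYPES = {
    "string": str, "integer": int, "number": float,
    "boolean": bool, "array": list, "object": dict,
}


def class_from_schema(schema):
    """Create a dataclass whose fields mirror the schema's properties."""
    required = set(schema.get("required", []))
    required_specs, optional_specs = [], []
    for name, prop in schema.get("properties", {}).items():
        py_type = JSON_TYPES.get(prop.get("type"), object)
        if name in required:
            required_specs.append((name, py_type))
        else:
            # Optional properties default to None and must come last.
            optional_specs.append((name, Optional[py_type], field(default=None)))
    return make_dataclass(schema.get("name", "Model"),
                          required_specs + optional_specs)


schema = {
    'name': 'Country',
    'properties': {
        'name': {'type': 'string'},
        'abbreviation': {'type': 'string'},
    },
    'additionalProperties': False,
}

Country = class_from_schema(schema)
sweden = Country(name='Sweden', abbreviation='SE')
print([f.name for f in fields(Country)])
```

Nested objects, $ref and constraints such as minimum are left out entirely; handling those is where the dedicated libraries earn their keep.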
python-jsonschema-objects is an alternative to warlock, built on top of jsonschema.
python-jsonschema-objects provides an automatic class-based binding to JSON schemas for use in python.
Usage:
Sample Json Schema
schema = '''{
"title": "Example Schema",
"type": "object",
"properties": {
"firstName": {
"type": "string"
},
"lastName": {
"type": "string"
},
"age": {
"description": "Age in years",
"type": "integer",
"minimum": 0
},
"dogs": {
"type": "array",
"items": {"type": "string"},
"maxItems": 4
},
"gender": {
"type": "string",
"enum": ["male", "female"]
},
"deceased": {
"enum": ["yes", "no", 1, 0, "true", "false"]
}
},
"required": ["firstName", "lastName"]
} '''
Converting the schema object to class
import python_jsonschema_objects as pjs
import json
schema = json.loads(schema)
builder = pjs.ObjectBuilder(schema)
ns = builder.build_classes()
Person = ns.ExampleSchema
james = Person(firstName="James", lastName="Bond")
james.lastName
u'Bond'
james
example_schema lastName=Bond age=None firstName=James
Validation:
james.age = -2
python_jsonschema_objects.validators.ValidationError: -2 was less or equal to than 0
The problem is that it still uses Draft 4 validation, while jsonschema has moved beyond Draft 4; I filed an issue on the repo about this.
Unless you are using an old version of jsonschema, the above package will work as shown.
I just created this small project to generate code classes from a JSON schema. Even though it deals with Python, I think it can be useful when working on business projects:
pip install jsonschema2popo
Running the following command will generate a Python module containing the classes defined by the JSON schema (it uses jinja2 templating):
jsonschema2popo -o /path/to/output_file.py /path/to/json_schema.json
More info at: https://github.com/frx08/jsonschema2popo