Is there any way to edit an airflow operator after creation? - python

I have a Python script that dynamically creates tasks (Airflow operators) and a DAG based on a JSON file that maps every desired option.
The script also has a dedicated function to create each operator type needed.
Sometimes I want to activate conditional options based on the mapping... for example, in a BigQueryOperator I sometimes need a time_partitioning and a destination table, but I don't want to set them on every mapped task.
I've read the documentation for BaseOperator, but I can't see any Java-like setter methods.
The function that returns the operator, for example the BigQuery one:
def bqOperator(mappedTask):
    try:
        return BigQueryOperator(
            task_id=mappedTask.get('task_id'),
            sql=mappedTask.get('sql'),
            ##destination_dataset_table=project+'.'+dataset+'.'+mappedTask.get('target'),
            write_disposition=mappedTask.get('write_disposition'),
            allow_large_results=mappedTask.get('allow_large_results'),
            ##time_partitioning=mappedTask.get('time_partitioning'),
            use_legacy_sql=mappedTask.get('use_legacy_sql'),
            dag=dag,
        )
    except Exception as e:
        error = 'Error creating BigQueryOperator for task : ' + mappedTask.get('task_id')
        logger.error(error)
        raise Exception(error)
The mappedTask in the JSON file, without partitioning:
{
    "task_id": "TEST_TASK_ID",
    "sql": "some fancy query",
    "type": "bqOperator",
    "dependencies": [],
    "write_disposition": "WRITE_APPEND",
    "allow_large_results": true,
    "createDisposition": "CREATE_IF_NEEDED",
    "use_legacy_sql": false
},
The mappedTask in the JSON file, with partitioning:
{
    "task_id": "TEST_TASK_ID_PARTITION",
    "sql": "some fancy query",
    "type": "bqOperator",
    "dependencies": [],
    "write_disposition": "WRITE_APPEND",
    "allow_large_results": true,
    "createDisposition": "CREATE_IF_NEEDED",
    "use_legacy_sql": false,
    "targetTable": "TARGET_TABLE",
    "time_partitioning": {
        "field": "DATE_TO_PART",
        "type": "DAY"
    }
},

Change bqOperator as below to handle that case; it will simply pass None when it can't find the field in your JSON:
def bqOperator(mappedTask):
    try:
        return BigQueryOperator(
            task_id=mappedTask.get('task_id'),
            sql=mappedTask.get('sql'),
            # note: the JSON mapping above uses the key "targetTable"
            destination_dataset_table="{}.{}.{}".format(project, dataset, mappedTask.get('targetTable')) if mappedTask.get('targetTable', None) else None,
            write_disposition=mappedTask.get('write_disposition'),
            allow_large_results=mappedTask.get('allow_large_results'),
            time_partitioning=mappedTask.get('time_partitioning', None),
            use_legacy_sql=mappedTask.get('use_legacy_sql'),
            dag=dag,
        )
    except Exception as e:
        error = 'Error creating BigQueryOperator for task : ' + mappedTask.get('task_id')
        logger.error(error)
        raise Exception(error)

There are no private methods or fields in Python, so you can directly set and get fields, like:
op.use_legacy_sql = True
That said, I strongly discourage doing this, as it is a real code smell. Instead, you could modify your factory function to apply some defaults to your JSON data.
Or even better, apply the defaults to the JSON itself, then save and use the updated JSON. That will make things more predictable.
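For illustration, here is a minimal sketch of the defaults idea applied before the factory call; the specific default values are hypothetical:
BQ_DEFAULTS = {
    'targetTable': None,         # hypothetical defaults; adjust to your mapping
    'time_partitioning': None,
    'write_disposition': 'WRITE_APPEND',
    'use_legacy_sql': False,
}

def withDefaults(mappedTask):
    # start from the defaults and let the JSON mapping override them
    merged = dict(BQ_DEFAULTS)
    merged.update(mappedTask)
    return merged
The factory then builds every operator from bqOperator(withDefaults(mappedTask)) and no longer needs per-field conditionals.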

Related

Parsing JSON in AWS Lambda Python

For a personal project I'm trying to write an AWS Lambda in Python 3.9 that will delete a newly created user if the creator is not myself. For this, the logs in CloudWatch Logs will trigger (via CloudTrail and EventBridge) my Lambda. Therefore, I will receive the JSON request as my event in:
def lambdaHandler(event, context)
But I'm having trouble parsing it...
If I print the event, I get this:
{'version': '1.0', 'invokingEvent': '{
"configurationItemDiff": {
"changedProperties": {},
"changeType": "CREATE"
},
"configurationItem": {
"relatedEvents": [],
"relationships": [],
"configuration": {
"path": "/",
"userName": "newUser",
"userId": "xxx",
"arn": "xxx",
"createDate": "2022-11-23T09:02:49.000Z",
"userPolicyList": [],
"groupList": [],
"attachedManagedPolicies": [],
"permissionsBoundary": null,
"tags": []
},
"supplementaryConfiguration": {},
"tags": {},
"configurationItemVersion": "1.3",
"configurationItemCaptureTime": "2022-11-23T09:04:40.659Z",
"configurationStateId": 1669194280659,
"awsAccountId": "141372946428",
"configurationItemStatus": "ResourceDiscovered",
"resourceType": "AWS::IAM::User",
"resourceId": "xxx",
"resourceName": "newUser",
"ARN": "arn:aws:iam::xxx:user/newUser",
"awsRegion": "global",
"availabilityZone": "Not Applicable",
"configurationStateMd5Hash": "",
"resourceCreationTime": "2022-11-23T09:02:49.000Z"
},
"notificationCreationTime": "2022-11-23T09:04:41.317Z",
"messageType": "ConfigurationItemChangeNotification",
"recordVersion": "1.3"
}', 'ruleParameters': '{
"badUser": "arn:aws:iam::xxx:user/badUser"
}', 'resultToken': 'xxx=', 'eventLeftScope': False, 'executionRoleArn': 'arn:aws:iam::xxx:role/aws-service-role/config.amazonaws.com/AWSServiceRoleForConfig', 'configRuleArn': 'arn:aws:config:eu-west-1:xxx:config-rule/config-rule-q3nmvt', 'configRuleName': 'UserCreatedRule', 'configRuleId': 'config-rule-q3nmvt', 'accountId': 'xxx'
}
For my purpose, I'd like to get the "changeType": "CREATE" value, so that if it is CREATE, I check the creator, and if it is not myself, I delete newUser.
The weird thing is that when I copy/paste that event into VSCode and format it as a .json document, it reports errors (on line 1, for example, version and invokingEvent should be double-quoted, but well).
For now I am only trying to reach and print the
"changeType": "CREATE"
by doing:
import json
import boto3
import logging

iam = boto3.client('iam')

def lambda_handler(event, context):
    """
    Triggered if a user is created
    Check the creator - if not myself :
    - delete new user and remove from groups if necessary
    """
    try:
        print(event['invokingEvent']["configurationItemDiff"]["changeType"])
    except Exception as e:
        print("Error because :")
        print(e)
I get the error string indices must be integers; it happens at ["configurationItemDiff"].
I understand the error already (I'm new to Python though, so maybe not completely) and have tried many things, like:
print(event['invokingEvent']['configurationItemDiff']): swapping double quotes for single quotes, but that doesn't change anything.
print(event['invokingEvent'][0]): but that gives me the first character { and [2] gives me the c, not the whole value.
At this point I'm stuck and need help, because I can't find any solution for this. I don't use SNS; maybe I should? I saw that with it the JSON document would not be the same, and it could be accessed through ["Records"][...]. Please help.
What you are printing is a Python dict. It looks sort of like JSON, but it is not JSON; it is the representation of a Python dict. That means it will have True/False instead of true/false, ' instead of ", etc.
You could do print(json.dumps(event)) instead.
Anyway, the actual problem is that invokingEvent is yet another JSON document, but in its string form; you need to json.loads that nested JSON string. You can see that because the value after invokingEvent is inside another set of '...', therefore it is a string, not an already-parsed dict.
invoking_event = json.loads(event['invokingEvent'])
change_type = invoking_event["configurationItemDiff"]["changeType"]
ruleParameters would be another nested JSON which needs parsing first if you wanted to use it.
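Putting the two steps together, a minimal sketch of the handler (the creator check and the actual deletion are left out):
import json

def lambda_handler(event, context):
    # invokingEvent and ruleParameters both arrive as JSON strings, so parse them first
    invoking_event = json.loads(event['invokingEvent'])
    change_type = invoking_event["configurationItemDiff"]["changeType"]
    rule_parameters = json.loads(event['ruleParameters'])
    if change_type == "CREATE":
        # the creator check and the iam delete call would go here
        print(rule_parameters["badUser"])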

Navigating Event in AWS Lambda Python

So I'm fairly new to both AWS and Python. I'm on a uni assignment and have hit a roadblock.
I'm uploading data to AWS S3; this information is being sent to an SQS queue and passed into AWS Lambda. I know, it would be much easier to just go straight from S3 to Lambda... but apparently "that's not the brief".
So I've got my event accurately coming into AWS Lambda, but no matter how deep I dig, I can't reach the information I need. In AWS Lambda, I run the following:
def lambda_handler(event, context):
    print(event)
Via CloudWatch, I get the output
{'Records': [{'messageId': '1d8e0a1d-d7e0-42e0-9ff7-c06610fccae0', 'receiptHandle': 'AQEBr64h6lBEzLk0Xj8RXBAexNukQhyqbzYIQDiMjJoLLtWkMYKQp5m0ENKGm3Icka+sX0HHb8gJoPmjdTRNBJryxCBsiHLa4nf8atpzfyCcKDjfB9RTpjdTZUCve7nZhpP5Fn7JLVCNeZd1vdsGIhkJojJ86kbS3B/2oBJiCR6ZfuS3dqZXURgu6gFg9Yxqb6TBrAxVTgBTA/Pr35acEZEv0Dy/vO6D6b61w2orabSnGvkzggPle0zcViR/shLbehROF5L6WZ5U+RuRd8tLLO5mLFf5U+nuGdVn3/N8b7+FWdzlmLOWsI/jFhKoN4rLiBkcuL8UoyccTMJ/QTWZvh5CB2mwBRHectqpjqT4TA3Z9+m8KNd/h/CIZet+0zDSgs5u', 'body': '{"Records":[{"eventVersion":"2.1","eventSource":"aws:s3","awsRegion":"eu-west-2","eventTime":"2021-03-26T01:03:53.611Z","eventName":"ObjectCreated:Put","userIdentity":{"principalId":"MY_ID"},"requestParameters":{"sourceIPAddress":"MY_IP_ADD"},"responseElements":{"x-amz-request-id":"BQBY06S20RYNH1XJ","x-amz-id-2":"Cdo0RvX+tqz6SZL/Xw9RiBLMCS3Rv2VOsu2kVRa7PXw9TsIcZeul6bzbAS6z4HF6+ZKf/2MwnWgzWYz+7jKe07060bxxPhsY"},"s3":{"s3SchemaVersion":"1.0","configurationId":"test","bucket":{"name":"MY_BUCKET","ownerIdentity":{"principalId":"MY_ID"},"arn":"arn:aws:s3:::MY_BUCKET"},"object":{"key":"test.jpg","size":246895,"eTag":"c542637a515f6df01cbc7ee7f6e317be","sequencer":"00605D33019AD8E4E5"}}}]}', 'attributes': {'ApproximateReceiveCount': '1', 'SentTimestamp': '1616720643174', 'SenderId': 'AIDAIKZTX7KCMT7EP3TLW', 'ApproximateFirstReceiveTimestamp': '1616720648174'}, 'messageAttributes': {}, 'md5OfBody': '1ab703704eb79fbbb58497ccc3f2c555', 'eventSource': 'aws:sqs', 'eventSourceARN': 'arn:aws:sqs:eu-west-2:ARN', 'awsRegion': 'eu-west-2'}]}
[Disclaimer, I've tried to edit out any identifying information but if there's any sensitive data I'm not understanding or missed, please let me know]
Anyway, just as a sample, I want to get the object key, which is test.jpg. I tried to drill down as far as I could, finally getting to:
def lambda_handler(event, context):
    print(event['Records'][0]['body'])
This returned the following (which was nice to see fully stylized):
{
    "Records": [
        {
            "eventVersion": "2.1",
            "eventSource": "aws:s3",
            "awsRegion": "eu-west-2",
            "eventTime": "2021-03-26T01:08:16.823Z",
            "eventName": "ObjectCreated:Put",
            "userIdentity": {
                "principalId": "MY_ID"
            },
            "requestParameters": {
                "sourceIPAddress": "MY_IP"
            },
            "responseElements": {
                "x-amz-request-id": "ZNKHRDY8GER4F6Q5",
                "x-amz-id-2": "i1Cazudsd+V57LViNWyDNA9K+uRbSQQwufMC6vf50zQfzPaH7EECsvw9SFM3l3LD+TsYEmnjXn1rfP9GQz5G5F7Fa0XZAkbe"
            },
            "s3": {
                "s3SchemaVersion": "1.0",
                "configurationId": "test",
                "bucket": {
                    "name": "MY_BUCKET",
                    "ownerIdentity": {
                        "principalId": "MY_ID"
                    },
                    "arn": "arn:aws:s3:::MY_BUCKET"
                },
                "object": {
                    "key": "test.jpg",
                    "size": 254276,
                    "eTag": "b0052ab9ba4b9395e74082cfd51a8f09",
                    "sequencer": "00605D3407594DE184"
                }
            }
        }
    ]
}
However, from this stage on, if I try to write print(event['Records'][0]['body']['Records']) or print(event['Records'][0]['s3']), I'm told that string indices must be integers. If I try to write print(event['Records'][0]['body'][0]), I'm given a single character every time (in this case the first { bracket).
I'm not sure if this has something to do with tuples, or if at this stage it's all saved as one large string, but at least in the output view it doesn't appear to be saved that way.
Does anyone have any idea what I'd do from this stage to access the further information? In the full release after I'm done testing, I'll be wanting to save an audio file and the file name as opposed to a picture.
Thanks.
You are having this problem because the content of the body is JSON, but in string format. You should parse it to be able to access it like a normal dictionary, like so:
import json

def handler(event: dict, context: object):
    body = event['Records'][0]['body']
    body = json.loads(body)
    # use the body as a normal dictionary
You are getting only a single char when using integer indexes because it is a string, so using [n] on a string will return the nth char.
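For instance, once the body is parsed, the object key the question asks about can be reached like this (a small sketch of the same idea):
import json

def handler(event, context):
    body = json.loads(event['Records'][0]['body'])
    # drill into the parsed S3 notification; prints "test.jpg" for the event above
    key = body['Records'][0]['s3']['object']['key']
    print(key)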
It's because you're getting stringified JSON data. You need to load it back into its Python dict format.
There is a useful package called lambda_decorators; you can install it with pip install lambda_decorators.
so you can do this:
from lambda_decorators import load_json_body

@load_json_body
def lambda_handler(event, context):
    print(event['Records'][0]['body'])
    # now you can access the items in the body using their indexes and keys
This will extract the JSON for you.

Add stdout of subprocess to JSON report if test case fails

I'm investigating methods of adding to the JSON report generated by either pytest-json or pytest-json-report; I'm not hung up on either plugin. So far I've done the bulk of my evaluation using pytest-json. So, for example, the JSON report has this for a test case:
{
    "name": "fixture_test.py::test_failure1",
    "duration": 0.0012421607971191406,
    "run_index": 2,
    "setup": {
        "name": "setup",
        "duration": 0.00011181831359863281,
        "outcome": "passed"
    },
    "call": {
        "name": "call",
        "duration": 0.0008759498596191406,
        "outcome": "failed",
        "longrepr": "def test_failure1():\n> assert 3 == 4, \"3 always equals 3\"\nE AssertionError: 3 always equals 3\nE assert 3 == 4\n\nfixture_test.py:19: AssertionError"
    },
    "teardown": {
        "name": "teardown",
        "duration": 0.00014257431030273438,
        "outcome": "passed"
    },
    "outcome": "failed"
}
This is from experiments I'm running. In practice, some of the test cases work by spawning a subprocess via Popen, and the assert is that a certain string appears in the stdout. In the event that the test case fails, I need to add a key/value to the call dictionary which contains the stdout of that subprocess. I have so far tried in vain to find the correct fixture or apparatus to accomplish this. It seems that pytest_exception_interact may be the way to go, but drilling into the JSON structure has thus far eluded me. All I need to do is add/modify the JSON structure at the point of an error. pytest_runtest_call seems too heavy-handed.
Alternatively, is there a means of altering the value of longrepr in the above? I've been unable to find the correct way of doing either of these, and it's time to ask.
As it appears, the pytest-json project is rather defunct. The developer/owner of pytest-json-report has this to say (under Related Tools at this link):
pytest-json has some great features but appears to be unmaintained. I borrowed some ideas and test cases from there.
The pytest-json-report project handles exactly the case that I'm requiring: capturing stdout from a subprocess and putting it into the JSON report. A crude example of doing so follows:
import subprocess as sp
import sys
import re

def specialAssertHandler(output, assertMessage):
    # because pytest automatically captures stdout/stderr, printing is all that's
    # needed; when the report is generated, this text appears in a field named "stdout"
    print(output)
    return assertMessage

def test_subProcessStdoutCapture():
    # NOTE: if your version of Python 3 is sufficiently mature, add text=True as well
    proc = sp.Popen(['find', '.', '-name', '*.json'], stdout=sp.PIPE)
    if sys.version_info[0] == 3:
        # on the Ubuntu I was using, Python 3's proc.stdout.read() returns a
        # bytes object, not a string, so it needs decoding
        output = proc.stdout.read().decode()
    elif sys.version_info[0] == 2:
        # the other version I'm using is 2.7.15; it's exceedingly frustrating
        # that the language changed so much between 2 and 3 - in 2, the output
        # is already a string object
        output = proc.stdout.read()
    m = re.search('some string', output)
    assert m is not None, specialAssertHandler(output, "did not find 'some string' in output")
With the above, using pytest-json-report, the full output of the subprocess is captured by the infrastructure and placed into the aforementioned report. An excerpt showing this is below:
{
    "nodeid": "expirment_test.py::test_stdout",
    "lineno": 25,
    "outcome": "failed",
    "keywords": [
        "PyTest",
        "test_stdout",
        "expirment_test.py"
    ],
    "setup": {
        "duration": 0.0002694129943847656,
        "outcome": "passed"
    },
    "call": {
        "duration": 0.02718186378479004,
        "outcome": "failed",
        "crash": {
            "path": "/home/afalanga/devel/PyTest/expirment_test.py",
            "lineno": 32,
            "message": "AssertionError: Expected to find always\nassert None is not None"
        },
        "traceback": [
            {
                "path": "expirment_test.py",
                "lineno": 32,
                "message": "AssertionError"
            }
        ],
        "stdout": "./.report.json\n./report.json\n./report1.json\n./report2.json\n./simple_test.json\n./testing_addition.json\n\n",
        "longrepr": "..."
    },
    "teardown": {
        "duration": 0.0004875659942626953,
        "outcome": "passed"
    }
}
The field longrepr holds the full text of the test case, but in the interest of brevity it is shown here as an ellipsis. In the field crash, the value of assertMessage from my example is placed. This shows that it is possible to place such messages into the report at the point of occurrence instead of in post-processing.
I think it may be possible to "cleverly" handle this using the hook I referenced in my original question, pytest_exception_interact. If I find that it is so, I'll update this answer with a demonstration.
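For what it's worth, a rough, untested sketch of that hook route; it relies only on pytest's documented pytest_exception_interact(node, call, report) hook and the report's sections list, and the attribute name here is hypothetical:
# conftest.py
def pytest_exception_interact(node, call, report):
    # _subprocess_output is a hypothetical attribute the test would set on
    # its node before asserting; sections is the standard list of
    # (title, content) pairs attached to a test report
    captured = getattr(node, '_subprocess_output', None)
    if captured is not None:
        report.sections.append(('subprocess stdout', captured))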

Python 3 unittest - Is it possible to get access to assert information without overloading?

I am building custom TestResult and TestRunner classes. I would like to capture assert information from every assert that runs, regardless of whether it results in a Fail, Error, or Pass.
An example test scenario:
import unittest

class TestMath(unittest.TestCase):
    def test_square(self):
        self.assertEqual(9, 3*3)   # will Pass
        self.assertEqual(9, 3**3)  # will Fail
From running the above case, is it possible for me to access information such as the assertion type, status, and parameters without digging into the assert methods and overloading them in my own TestCase?
My end goal is to create something like this:
"TestMath": {
"test_square": [
{
"type": "assertEqual",
"status": "Pass",
"expected": 9,
"result": 9
},
{
"type": "assertEqual",
"status": "Fail",
"expected": 9,
"result": 27
}
]
}
Is there a built-in mechanism for getting access to this kind of information that I am just not aware of? Or is this something I will need to build myself for my use case?

How can I improve this script to make it more pythonic?

I'm fairly new to Python programming, and have thus far been reverse-engineering code that previous developers have made, or cobbling together some functions on my own.
The script itself works. To cut a long story short, it's designed to parse a CSV and (a) create and/or update the contacts found in the CSV, and (b) correctly assign each contact to their associated company, all using the HubSpot API. To achieve this I've also imported requests and csvmapper.
I had the following questions:
How can I improve this script to make it more pythonic?
What is the best way to make this script run on a remote server, keeping in mind that Requests and CSVMapper probably aren't installed on that server, and that I most likely won't have permission to install them? What is the best way to "package" this script, or to upload Requests and CSVMapper to the server?
Any advice much appreciated.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import print_function
import sys, os.path, requests, json, csv, csvmapper, glob, shutil
from time import sleep
major, minor, micro, release_level, serial = sys.version_info
# Client Portal ID
portal = "XXXXXX"
# Client API Key
hapikey = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
# This attempts to find any file in the directory that starts with "note" and ends with ".CSV"
# Server Version
# findCSV = glob.glob('/home/accountName/public_html/clientFolder/contact*.CSV')
# Local Testing Version
findCSV = glob.glob('contact*.CSV')
for i in findCSV:
    theCSV = i
    csvfileexists = os.path.isfile(theCSV)
    # Prints a confirmation if file exists, prints instructions if it doesn't.
    if csvfileexists:
        print ("\nThe \"{csvPath}\" file was found ({csvSize} bytes); proceeding with sync ...\n".format(csvSize=os.path.getsize(theCSV), csvPath=os.path.basename(theCSV)))
    else:
        print ("File not found; check the file name to make sure it is in the same directory as this script. Exiting ...")
        sys.exit()
    # Begin the CSVmapper mapping... This creates a virtual "header" row - the CSV therefore does not need a header row.
    mapper = csvmapper.DictMapper([
        [
            {'name':'account'}, #"Org. Code"
            {'name':'id'}, #"Hubspot Ref"
            {'name':'company'}, #"Company Name"
            {'name':'firstname'}, #"Contact First Name"
            {'name':'lastname'}, #"Contact Last Name"
            {'name':'job_title'}, #"Job Title"
            {'name':'address'}, #"Address"
            {'name':'city'}, #"City"
            {'name':'phone'}, #"Phone"
            {'name':'email'}, #"Email"
            {'name':'date_added'} #"Last Update"
        ]
    ])
    # Parse the CSV using the mapper
    parser = csvmapper.CSVParser(os.path.basename(theCSV), mapper)
    # Build the parsed object
    obj = parser.buildObject()
    def contactCompanyUpdate():
        # Open the CSV, use commas as delimiters, store it in a list called "data", then find the length of that list.
        with open(os.path.basename(theCSV),"r") as f:
            reader = csv.reader(f, delimiter = ",", quotechar="\"")
            data = list(reader)
        # For every row in the CSV ...
        for row in range(0, len(data)):
            # Set up the JSON payload ...
            payload = {
                "properties": [
                    {
                        "name": "account",
                        "value": obj[row].account
                    },
                    {
                        "name": "id",
                        "value": obj[row].id
                    },
                    {
                        "name": "company",
                        "value": obj[row].company
                    },
                    {
                        "property": "firstname",
                        "value": obj[row].firstname
                    },
                    {
                        "property": "lastname",
                        "value": obj[row].lastname
                    },
                    {
                        "property": "job_title",
                        "value": obj[row].job_title
                    },
                    {
                        "property": "address",
                        "value": obj[row].address
                    },
                    {
                        "property": "city",
                        "value": obj[row].city
                    },
                    {
                        "property": "phone",
                        "value": obj[row].phone
                    },
                    {
                        "property": "email",
                        "value": obj[row].email
                    },
                    {
                        "property": "date_added",
                        "value": obj[row].date_added
                    }
                ]
            }
            nameQuery = "{first} {last}".format(first=obj[row].firstname, last=obj[row].lastname)
            # Get a list of all contacts for a certain company.
            contactCheck = "https://api.hubapi.com/contacts/v1/search/query?q={query}&hapikey={hapikey}".format(hapikey=hapikey, query=nameQuery)
            # Convert the payload to JSON and assign it to a variable called "data"
            data = json.dumps(payload)
            # Defined the headers content-type as 'application/json'
            headers = {'content-type': 'application/json'}
            contactExistCheck = requests.get(contactCheck, headers=headers)
            for i in contactExistCheck.json()[u'contacts']:
                # ... Get the canonical VIDs
                canonicalVid = i[u'canonical-vid']
                if canonicalVid:
                    print ("{theContact} exists! Their VID is \"{vid}\"".format(theContact=obj[row].firstname, vid=canonicalVid))
                    print ("Attempting to update their company...")
                    contactCompanyUpdate = "https://api.hubapi.com/companies/v2/companies/{companyID}/contacts/{vid}?hapikey={hapikey}".format(hapikey=hapikey, vid=canonicalVid, companyID=obj[row].id)
                    doTheUpdate = requests.put(contactCompanyUpdate, headers=headers)
                    if doTheUpdate.status_code == 200:
                        print ("Attempt Successful! {theContact}'s has an updated company.\n".format(theContact=obj[row].firstname))
                        break
                    else:
                        print ("Attempt Failed. Status Code: {status}. Company or Contact not found.\n".format(status=doTheUpdate.status_code))
    def createOrUpdateClient():
        # Open the CSV, use commas as delimiters, store it in a list called "data", then find the length of that list.
        with open(os.path.basename(theCSV),"r") as f:
            reader = csv.reader(f, delimiter = ",", quotechar="\"")
            data = list(reader)
        # For every row in the CSV ...
        for row in range(0, len(data)):
            # Set up the JSON payload ...
            payloadTest = {
                "properties": [
                    {
                        "property": "email",
                        "value": obj[row].email
                    },
                    {
                        "property": "firstname",
                        "value": obj[row].firstname
                    },
                    {
                        "property": "lastname",
                        "value": obj[row].lastname
                    },
                    {
                        "property": "website",
                        "value": None
                    },
                    {
                        "property": "company",
                        "value": obj[row].company
                    },
                    {
                        "property": "phone",
                        "value": obj[row].phone
                    },
                    {
                        "property": "address",
                        "value": obj[row].address
                    },
                    {
                        "property": "city",
                        "value": obj[row].city
                    },
                    {
                        "property": "state",
                        "value": None
                    },
                    {
                        "property": "zip",
                        "value": None
                    }
                ]
            }
            # Convert the payload to JSON and assign it to a variable called "data"
            dataTest = json.dumps(payloadTest)
            # Defined the headers content-type as 'application/json'
            headers = {'content-type': 'application/json'}
            #print ("{theContact} does not exist!".format(theContact=obj[row].firstname))
            print ("Attempting to add {theContact} as a contact...".format(theContact=obj[row].firstname))
            createOrUpdateURL = 'http://api.hubapi.com/contacts/v1/contact/createOrUpdate/email/{email}/?hapikey={hapikey}'.format(email=obj[row].email,hapikey=hapikey)
            r = requests.post(createOrUpdateURL, data=dataTest, headers=headers)
            if r.status_code == 409:
                print ("This contact already exists.\n")
            elif (r.status_code == 200) or (r.status_code == 202):
                print ("Success! {firstName} {lastName} has been added.\n".format(firstName=obj[row].firstname,lastName=obj[row].lastname, response=r.status_code))
            elif r.status_code == 204:
                print ("Success! {firstName} {lastName} has been updated.\n".format(firstName=obj[row].firstname,lastName=obj[row].lastname, response=r.status_code))
            elif r.status_code == 400:
                print ("Bad request. You might get this response if you pass an invalid email address, if a property in your request doesn't exist, or if you pass an invalid property value.\n")
            else:
                print ("Contact Marko for assistance.\n")
    if __name__ == "__main__":
        # Run the Create or Update function
        createOrUpdateClient()
        # Give the previous function 5 seconds to take effect.
        sleep(5.0)
        # Run the Company Update function
        contactCompanyUpdate()
        print("Sync complete.")
        print("Moving \"{something}\" to the archive folder...".format(something=theCSV))
        # Cron version
        #shutil.move( i, "/home/accountName/public_html/clientFolder/archive/" + os.path.basename(i))
        # Local version
        movePath = "archive/{thefile}".format(thefile=theCSV)
        shutil.move( i, movePath )
        print("Move successful! Exiting...\n")
    sys.exit()
I'll just go from top to bottom. The first rule is, do what's in PEP 8. It's not the ultimate style guide, but it's certainly a reference baseline for Python coders, and that's more important, especially when you're getting started. The second rule is, make it maintainable. A couple of years from now, when some other new kid comes through, it should be easy for her to figure out what you were doing. Sometimes that means doing things the long way, to reduce errors. Sometimes it means doing things the short way, to reduce errors. :-)
#!/usr/bin/env python
# -*- coding: utf-8 -*-
Two things: you got the encoding right, per PEP 8. And
Conventions for writing good documentation strings (a.k.a. "docstrings") are immortalized in PEP 257.
You've got a program that does something. But you don't document what.
from __future__ import print_function
import sys, os.path, requests, json, csv, csvmapper, glob, shutil
from time import sleep
major, minor, micro, release_level, serial = sys.version_info
Per PEP 8: put your import module statements one per line.
Per Austin: make your paragraphs have separate subjects. You've got some imports right next to some version info stuff. Insert a blank line. Also, DO SOMETHING with the data! Or you didn't need it to be right here, did you?
# Client Portal ID
portal = "XXXXXX"
# Client API Key
hapikey = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
You've obscured these in more ways than one. WTF is a hapikey? I think you mean Hubspot_API_key. And what does portal do?
One piece of advice: the more "global" a thing is, the more "formal" it should be. If you have a for loop, it's okay to call one of the variables i. If you have a piece of data that is used throughout a function, call it obj or portal. But if you have a piece of data that is used globally, or is a class variable, make it put on a tie and a jacket so everyone can recognize it: make it Hubspot_api_key instead of client_api_key. Maybe even Hubspot_client_api_key if there are more than one API. Do the same with portal.
# This attempts to find any file in the directory that starts with "note" and ends with ".CSV"
# Server Version
# findCSV = glob.glob('/home/accountName/public_html/clientFolder/contact*.CSV')
It didn't take long for the comments to become lies. Just delete them if they aren't true.
# Local Testing Version
findCSV = glob.glob('contact*.CSV')
This is the kind of thing that you should create a function for. Just create a simple function called "get_csv_files" or whatever, and have it return a list of filenames. That decouples you from glob, and it means you can make your test code data driven (pass a list of filenames into a function, or pass a single file into a function, instead of asking it to search for them). Also, those glob patterns are exactly the kind of thing that go in a config file, or a global variable, or get passed as command line arguments.
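For instance, a small sketch of that function (the name and the default pattern are just the ones suggested here):
def get_csv_files(pattern='contact*.CSV'):
    # Hiding glob behind a function decouples the rest of the code from it;
    # tests can bypass it entirely and pass filenames straight in.
    return glob.glob(pattern)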
for i in findCSV:
I'll bet typing CSV in upper case all the time is a pain. And what does findCSV mean? Read that line, and figure out what that variable should be called. Maybe csv_files? Or new_contact_files? Something that demonstrates that there is a collection of things.
    theCSV = i
    csvfileexists = os.path.isfile(theCSV)
Now what does i do? You had this nice small variable name, in a BiiiiiiG loop. That was a mistake, since if you can't see a variable's entire scope all on one page, it probably needs a somewhat longer name. But then you created an alias for it. Both i and theCSV refer to the same thing. And ... I don't see you using i again. So maybe your loop variable should be theCSV. Or maybe it should be the_csv to make it easier to type. Or just csvname.
# Prints a confirmation if file exists, prints instructions if it doesn't.
This seems a little needless. If you're using glob to get filenames, they pretty much are going to exist. (If they don't, it's because they were deleted between the time you called glob and the time you tried to open them. That's possible, but rare. Just continue or raise an exception, depending.)
    if csvfileexists:
        print ("\nThe \"{csvPath}\" file was found ({csvSize} bytes); proceeding with sync ...\n".format(csvSize=os.path.getsize(theCSV), csvPath=os.path.basename(theCSV)))
In this code, you use the value of csvfileexists. But that's the only place you use it. In this case, you can probably move the call to os.path.isfile() into the if statement and get rid of the variable.
    else:
        print ("File not found; check the file name to make sure it is in the same directory as this script. Exiting ...")
        sys.exit()
Notice that in this case, when there is an actual problem, you didn't print the file name? How helpful was that?
Also, remember the part where you're on a remote server? You should consider using Python's logging module to record these messages in a useful manner.
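A minimal sketch of what that could look like (the file name and format are hypothetical):
import logging

logging.basicConfig(
    filename='sync.log',  # hypothetical log file on the server
    format='%(asctime)s %(levelname)s %(message)s',
    level=logging.INFO,
)
log = logging.getLogger(__name__)

log.error('File not found: %s', theCSV)  # the message now names the file, with a timestamp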
    # Begin the CSVmapper mapping... This creates a virtual "header" row - the CSV therefore does not need a header row.
    mapper = csvmapper.DictMapper([
        [
            {'name':'account'}, #"Org. Code"
            {'name':'id'}, #"Hubspot Ref"
            {'name':'company'}, #"Company Name"
            {'name':'firstname'}, #"Contact First Name"
            {'name':'lastname'}, #"Contact Last Name"
            {'name':'job_title'}, #"Job Title"
            {'name':'address'}, #"Address"
            {'name':'city'}, #"City"
            {'name':'phone'}, #"Phone"
            {'name':'email'}, #"Email"
            {'name':'date_added'} #"Last Update"
        ]
    ])
You're creating an object with a bunch of data. This would be a good place for a function. Define a make_csvmapper() function to do all this for you, and move it out of line.
Also, note that the standard csv module has most of the functionality you are using. I don't think you actually need csvmapper.
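As a sketch of that idea, the standard library alone can produce similar rows, using the same column names as the mapper above:
import csv

FIELD_NAMES = ['account', 'id', 'company', 'firstname', 'lastname',
               'job_title', 'address', 'city', 'phone', 'email', 'date_added']

def read_contacts(csv_path):
    # Passing explicit fieldnames to DictReader acts like the "virtual header
    # row" above, so the CSV file itself still needs no header row.
    with open(csv_path, 'r') as f:
        return list(csv.DictReader(f, fieldnames=FIELD_NAMES))
The difference is that rows come back as dicts (row['email']) rather than attribute-style objects (obj[row].email).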
    # Parse the CSV using the mapper
    parser = csvmapper.CSVParser(os.path.basename(theCSV), mapper)
    # Build the parsed object
    obj = parser.buildObject()
Here's another chance for a function. Maybe instead of making a csv mapper, you could just return the obj?
    def contactCompanyUpdate():
At this point, things get fishy. You have these function definitions indented, but I don't think you need them. Is that a stackoverflow problem, or does your code really look like this?
        # Open the CSV, use commas as delimiters, store it in a list called "data", then find the length of that list.
        with open(os.path.basename(theCSV),"r") as f:
No, apparently it really looks like this. Because you're using theCSV inside this function when you don't really need to. Please consider using formal function parameters instead of just grabbing outer-scope objects. Also, why are you using basename on the csv file? If you obtained it using glob, doesn't it already have the path you want?
            reader = csv.reader(f, delimiter = ",", quotechar="\"")
            data = list(reader)
        # For every row in the CSV ...
        for row in range(0, len(data)):
Here you forced data to be a list of rows obtained from reader, and then started iterating over them. Just iterate over reader directly, like: for row in reader: BUT WAIT! You're actually iterating over a CSV file that you have already opened, in your obj variable. Just pick one, and iterate over it. You don't need to open the file twice for this.
            # Set up the JSON payload ...
            payload = {
                "properties": [
                    {
                        "name": "account",
                        "value": obj[row].account
                    },
                    {
                        "name": "id",
                        "value": obj[row].id
                    },
                    {
                        "name": "company",
                        "value": obj[row].company
                    },
                    {
                        "property": "firstname",
                        "value": obj[row].firstname
                    },
                    {
                        "property": "lastname",
                        "value": obj[row].lastname
                    },
                    {
                        "property": "job_title",
                        "value": obj[row].job_title
                    },
                    {
                        "property": "address",
                        "value": obj[row].address
                    },
                    {
                        "property": "city",
                        "value": obj[row].city
                    },
                    {
                        "property": "phone",
                        "value": obj[row].phone
                    },
                    {
                        "property": "email",
                        "value": obj[row].email
                    },
                    {
                        "property": "date_added",
                        "value": obj[row].date_added
                    }
                ]
            }
Okay, that was a LOOOONG span of code that didn't do much. At the least, tighten those inner dicts up to one line each. But better still, write a function to create your dictionary in the format you want. You can use getattr to pull the data by name from obj.
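Something like this sketch; note the original payload mixes "name" and "property" keys, so this assumes "property" throughout, which is worth checking against the API:
def build_payload(contact, property_names):
    # Pull each value off the parsed row by name instead of writing
    # one dict literal per property.
    return {
        'properties': [
            {'property': name, 'value': getattr(contact, name)}
            for name in property_names
        ]
    }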
            nameQuery = "{first} {last}".format(first=obj[row].firstname, last=obj[row].lastname)
            # Get a list of all contacts for a certain company.
            contactCheck = "https://api.hubapi.com/contacts/v1/search/query?q={query}&hapikey={hapikey}".format(hapikey=hapikey, query=nameQuery)
            # Convert the payload to JSON and assign it to a variable called "data"
            data = json.dumps(payload)
            # Defined the headers content-type as 'application/json'
            headers = {'content-type': 'application/json'}
            contactExistCheck = requests.get(contactCheck, headers=headers)
Here you're encoding details of the API into your code. Consider pulling them out into functions. (That way, you can come back later and build a module of them, to re-use in your next program.) Also, beware of comments that don't actually tell you anything. And feel free to pull that together as a single paragraph, since it's all in service of the same key thing - making an API call.
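For example, a hedged sketch of one such wrapper, built from the URL already in the code above:
HUBSPOT_BASE = 'https://api.hubapi.com'

def search_contacts(api_key, query):
    # Keeps the search endpoint's layout in one place; callers just pass a query.
    url = '{base}/contacts/v1/search/query'.format(base=HUBSPOT_BASE)
    response = requests.get(url, params={'q': query, 'hapikey': api_key},
                            headers={'content-type': 'application/json'})
    return response.json()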
            for i in contactExistCheck.json()[u'contacts']:
                # ... Get the canonical VIDs
                canonicalVid = i[u'canonical-vid']
                if canonicalVid:
                    print ("{theContact} exists! Their VID is \"{vid}\"".format(theContact=obj[row].firstname, vid=canonicalVid))
                    print ("Attempting to update their company...")
                    contactCompanyUpdate = "https://api.hubapi.com/companies/v2/companies/{companyID}/contacts/{vid}?hapikey={hapikey}".format(hapikey=hapikey, vid=canonicalVid, companyID=obj[row].id)
                    doTheUpdate = requests.put(contactCompanyUpdate, headers=headers)
                    if doTheUpdate.status_code == 200:
                        print ("Attempt Successful! {theContact}'s has an updated company.\n".format(theContact=obj[row].firstname))
                        break
                    else:
                        print ("Attempt Failed. Status Code: {status}. Company or Contact not found.\n".format(status=doTheUpdate.status_code))
I'm not sure if this last bit should be an exception or not. Is an "Attempt Failed" normal behavior, or does it mean that something is broken?
At any rate, please look into the API you are using. I'd bet there is some more information available for minor failures. (Major failures would be the internet is broken or their server is offline.) They might provide an "errors" or "error" field in their return JSON, for example. Those should be logged or printed with your failure message.
    def createOrUpdateClient():
Mostly this function has the same issues as the previous one.
            else:
                print ("Contact Marko for assistance.\n")
Except here. Never put your name in someplace like this. Or you'll still be getting calls on this code 10 years from now. Put your department name ("IT Operations") or a support number. The people who need to know will already know. And the people who don't need to know can just notify the people that already know.
    if __name__ == "__main__":
        # Run the Create or Update function
        createOrUpdateClient()
        # Give the previous function 5 seconds to take effect.
        sleep(5.0)
        # Run the Company Update function
        contactCompanyUpdate()
        print("Sync complete.")
        print("Moving \"{something}\" to the archive folder...".format(something=theCSV))
        # Cron version
        #shutil.move( i, "/home/accountName/public_html/clientFolder/archive/" + os.path.basename(i))
        # Local version
        movePath = "archive/{thefile}".format(thefile=theCSV)
        shutil.move( i, movePath )
        print("Move successful! Exiting...\n")
This was awkward. You might consider taking some command line arguments and using them to determine your behavior.
    sys.exit()
And don't do this. Never put an exit() at module scope, because it means you can't possibly import this code. Maybe someone wants to import it to parse the docstrings. Or maybe they want to borrow some of those API functions you wrote. Too bad! sys.exit() means always having to say "Oh, sorry, I'll have to do that for you." Put it at the bottom of your actual __name__ == "__main__" code. Or, since you aren't actually passing a value, just remove it entirely.
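One common shape for that, reusing the names from the script (a sketch):
def main():
    createOrUpdateClient()
    sleep(5.0)
    contactCompanyUpdate()
    return 0

if __name__ == "__main__":
    sys.exit(main())
With this shape, the exit only happens when the file is run as a script, and the exit code still reaches the shell.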
