Reading/writing to JSON adds an extra unnecessary curly bracket "}" - Python

I am writing a program in Python where a local JSON file needs to be updated with the last processed item in a database, so that the process kicks off again from that point.
The problem I am having is that sometimes the code adds an extra curly bracket "}" to the end of the file, causing the JSON to become invalid. This then breaks the scheduled process until the JSON file is corrected.
I know that I could first read the file into an object, close the file, and then open it again to write to it, but it doesn't feel like the code would be as clean, given that the file is constantly written to so that the tracking of the last processed item is not lost.
import json

# log and env are defined in the surrounding processing loop
with open(_SETTINGS, 'r+') as settings:
    settings_data = json.load(settings)
    _last_processed = log['#timestamp']
    settings_data[env]['last_processed'] = _last_processed
    settings.seek(0)
    # settings.truncate()
    json.dump(settings_data, settings, indent=2)
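For reference, the read-then-rewrite approach mentioned above would look something like this (a minimal sketch, assuming the same _SETTINGS, env, and log names). Opening with 'w' truncates the file, so a shorter document can never leave stale trailing bytes behind - which is exactly what a stray "}" at the end is:

import json

# Read the current settings ...
with open(_SETTINGS, 'r') as settings:
    settings_data = json.load(settings)

settings_data[env]['last_processed'] = log['#timestamp']

# ... then rewrite the file from scratch; 'w' truncates any old contents.
with open(_SETTINGS, 'w') as settings:
    json.dump(settings_data, settings, indent=2)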
The JSON file, _SETTINGS, looks as follows:
{
  "UAT": {
    "last_processed": "2019-10-10T00:00:00.0000Z"
  },
  "DEV": {
    "last_processed": "2019-10-10T00:00:00.0000Z"
  }
}
Annoyingly, what sometimes gets written is the above JSON but with an extra closing curly bracket "}", as below.
{
  "UAT": {
    "last_processed": "2019-10-10T00:00:00.0000Z"
  },
  "DEV": {
    "last_processed": "2019-10-10T00:00:00.0000Z"
  }
}}
Can anyone shed some light on this?

Related

PDF long text extraction to JSON in Python

I'm trying to create a Python script that extracts text from a PDF and then converts it to a correctly formatted JSON file (see below).
The text extraction is not a problem. I'm using PyPDF2 to extract the text from a user-inputted PDF, which will often result in a LONG text string. I would like to add this text as a 'value' to a JSON 'key' (see the second example below).
My code:
# Writing all data to JSON file
# Data to be written
dictionary = {
    "company": str(company),
    "document": str(document),
    "text": str(text)  # This is what would be a LONG string of text
}

# Serializing json
json_object = json.dumps(dictionary, indent=4)
print(json_object)

with open('company_document.json', 'w') as fp:
    json.dump(json_object, fp)
The ideal output would be a JSON file that is structured like this:
[
    {
        "company": 1,
        "document-name": "Orlando",
        "text": " **LONG_TEXT_HERE** "
    }
]
I'm not getting the right JSON structure as output. Also, the long text string most likely contains punctuation or special characters that can affect the JSON - such as closing the string too early. I could take these out beforehand, but is there a way to keep them in for the JSON file so I can address them in the next step (in Neo4j)?
This is my output at the moment:
"{\n \"company\": \"Stack\",\n \"document\": \"Overflow Report\",\n \"text\": \"Long text 2020\\nSharing relevant and accountable information about our development "quotes and things...
Does anyone have an idea on how this can be achieved?
Like many people, you are confusing the CONTENT of your data with the REPRESENTATION of your data. The code you have works just fine. Notice:
import json

# Data to be written
dictionary = {
    "company": 1,
    "document": "Orlando",
    "text": """Long text 2020
Sharing relevant and accountable information about our development.
This is a complicated text string with "quotes and things".
"""
}

# Serializing json
json_object = json.dumps([dictionary], indent=4)
print(json_object)

with open('company_document.json', 'w') as fp:
    json.dump([dictionary], fp)
When executed, this produces the following on stdout:
[
    {
        "company": 1,
        "document": "Orlando",
        "text": "Long text 2020\nSharing relevant and accountable information about our development.\nThis is a complicated text string with \"quotes and things\".\n"
    }
]
Notice that the embedded quotes are escaped. That's what the standard requires. The file does not have the indentation, because you didn't ask for it, but it's still quite valid JSON.
[{"company": 1, "document": "Orlando", "text": "Long text 2020\nSharing relevant and accountable information about our development.\nThis is a complicated text string with \"quotes and things\".\n"}]
FOLLOWUP
This version reads in whatever was in the file before, adds a new record to the list, and saves the whole thing out.
import os
import json

# Read existing data.
MASTER = "company_document.json"
if os.path.exists(MASTER):
    with open(MASTER, 'r') as fp:
        database = json.load(fp)
else:
    database = []

# Data to be written
dictionary = {
    "company": 1,
    "document": "Orlando",
    "text": """Long text 2020
Sharing relevant and accountable information about our development.
This is a complicated text string with "quotes and things".
"""
}

# Serializing json
json_object = json.dumps([dictionary], indent=4)
print(json_object)

database.append(dictionary)
with open(MASTER, 'w') as fp:
    json.dump(database, fp)

Navigating Event in AWS Lambda Python

So I'm fairly new to both AWS and Python. I'm on a uni assignment and have hit a roadblock.
I'm uploading data to AWS S3; this information is sent to an SQS queue and passed into AWS Lambda. I know, it would be much easier to just go straight from S3 to Lambda... but apparently "that's not the brief".
So I've got my event accurately coming into AWS Lambda, but no matter how deep I dig, I can't reach the information I need. In AWS Lambda, I run the following:
def lambda_handler(event, context):
    print(event)
Via CloudWatch, I get the output
{'Records': [{'messageId': '1d8e0a1d-d7e0-42e0-9ff7-c06610fccae0', 'receiptHandle': 'AQEBr64h6lBEzLk0Xj8RXBAexNukQhyqbzYIQDiMjJoLLtWkMYKQp5m0ENKGm3Icka+sX0HHb8gJoPmjdTRNBJryxCBsiHLa4nf8atpzfyCcKDjfB9RTpjdTZUCve7nZhpP5Fn7JLVCNeZd1vdsGIhkJojJ86kbS3B/2oBJiCR6ZfuS3dqZXURgu6gFg9Yxqb6TBrAxVTgBTA/Pr35acEZEv0Dy/vO6D6b61w2orabSnGvkzggPle0zcViR/shLbehROF5L6WZ5U+RuRd8tLLO5mLFf5U+nuGdVn3/N8b7+FWdzlmLOWsI/jFhKoN4rLiBkcuL8UoyccTMJ/QTWZvh5CB2mwBRHectqpjqT4TA3Z9+m8KNd/h/CIZet+0zDSgs5u', 'body': '{"Records":[{"eventVersion":"2.1","eventSource":"aws:s3","awsRegion":"eu-west-2","eventTime":"2021-03-26T01:03:53.611Z","eventName":"ObjectCreated:Put","userIdentity":{"principalId":"MY_ID"},"requestParameters":{"sourceIPAddress":"MY_IP_ADD"},"responseElements":{"x-amz-request-id":"BQBY06S20RYNH1XJ","x-amz-id-2":"Cdo0RvX+tqz6SZL/Xw9RiBLMCS3Rv2VOsu2kVRa7PXw9TsIcZeul6bzbAS6z4HF6+ZKf/2MwnWgzWYz+7jKe07060bxxPhsY"},"s3":{"s3SchemaVersion":"1.0","configurationId":"test","bucket":{"name":"MY_BUCKET","ownerIdentity":{"principalId":"MY_ID"},"arn":"arn:aws:s3:::MY_BUCKET"},"object":{"key":"test.jpg","size":246895,"eTag":"c542637a515f6df01cbc7ee7f6e317be","sequencer":"00605D33019AD8E4E5"}}}]}', 'attributes': {'ApproximateReceiveCount': '1', 'SentTimestamp': '1616720643174', 'SenderId': 'AIDAIKZTX7KCMT7EP3TLW', 'ApproximateFirstReceiveTimestamp': '1616720648174'}, 'messageAttributes': {}, 'md5OfBody': '1ab703704eb79fbbb58497ccc3f2c555', 'eventSource': 'aws:sqs', 'eventSourceARN': 'arn:aws:sqs:eu-west-2:ARN', 'awsRegion': 'eu-west-2'}]}
[Disclaimer, I've tried to edit out any identifying information but if there's any sensitive data I'm not understanding or missed, please let me know]
Anyways, just for a sample, I want to get the Object Key, which is test.jpg. I tried to drill down as much as I could, finally getting to:
def lambda_handler(event, context):
    print(event['Records'][0]['body'])
This returned the following (which was nice to see fully stylized):
{
    "Records": [
        {
            "eventVersion": "2.1",
            "eventSource": "aws:s3",
            "awsRegion": "eu-west-2",
            "eventTime": "2021-03-26T01:08:16.823Z",
            "eventName": "ObjectCreated:Put",
            "userIdentity": {
                "principalId": "MY_ID"
            },
            "requestParameters": {
                "sourceIPAddress": "MY_IP"
            },
            "responseElements": {
                "x-amz-request-id": "ZNKHRDY8GER4F6Q5",
                "x-amz-id-2": "i1Cazudsd+V57LViNWyDNA9K+uRbSQQwufMC6vf50zQfzPaH7EECsvw9SFM3l3LD+TsYEmnjXn1rfP9GQz5G5F7Fa0XZAkbe"
            },
            "s3": {
                "s3SchemaVersion": "1.0",
                "configurationId": "test",
                "bucket": {
                    "name": "MY_BUCKET",
                    "ownerIdentity": {
                        "principalId": "MY_ID"
                    },
                    "arn": "arn:aws:s3:::MY_BUCKET"
                },
                "object": {
                    "key": "test.jpg",
                    "size": 254276,
                    "eTag": "b0052ab9ba4b9395e74082cfd51a8f09",
                    "sequencer": "00605D3407594DE184"
                }
            }
        }
    ]
}
However, from this stage on, if I try to write print(event['Records'][0]['body']['Records']) or print(event['Records'][0]['s3']), I'm told that the indices must be integers, not strings. If I try print(event['Records'][0]['body'][0]), I'm given a single character every time (in this case the first { bracket).
I'm not sure if this has something to do with tuples, or if at this stage it's all saved as one large string, but at least in the output view it doesn't appear to be saved that way.
Does anyone have any idea what I'd do from this stage to access the further information? In the full release after I'm done testing, I'll be wanting to save an audio file and the file name as opposed to a picture.
Thanks.
You are having this problem because the content of body is JSON, but in string format. You should parse it to be able to access it like a normal dictionary, like so:
import json

def handler(event: dict, context: object):
    body = event['Records'][0]['body']
    body = json.loads(body)
    # use the body as a normal dictionary
You are getting only a single char when using integer indexes because it is a string; using [n] on a string returns the nth char.
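Applied to the event in the question, drilling down to the object key would look something like this (a sketch based on the structure printed above):

import json

def lambda_handler(event, context):
    # The SQS record's body is the S3 notification, delivered as a JSON string.
    body = json.loads(event['Records'][0]['body'])
    key = body['Records'][0]['s3']['object']['key']
    print(key)  # e.g. "test.jpg"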
It's because you're getting stringified JSON data. You need to load it back into its Python dict format.
There is a useful package called lambda_decorators. You can install it with pip install lambda_decorators.
So you can do this:
from lambda_decorators import load_json_body

@load_json_body
def lambda_handler(event, context):
    print(event['Records'][0]['body'])
    # Now you can access the items in the body using their indexes and keys.
This will extract the JSON for you.

How to get a value from my JSON file, in Python

I want to ask for your help.
I need to browse a folder with JSON files and search for one attribute with a specific value.
Here is my code:
def listDir():
    fileNames = os.listdir(FOLDER_PATH)
    for fileName in fileNames:
        print(fileName)
        with open('C:\\Users\\Kamčo\\Desktop\\UBLjsons\\' + fileName, 'r') as json_file:
            data = json.load(json_file)
            data_json = json.dumps(data, indent=2, sort_keys=True)
            print(data_json)
            for line in data_json:
                if line['ID'] == 'Kamilko':
                    print("try")  # just to be sure it got to this point
I am getting this error: TypeError: string indices must be integers
I also tried to search for a solution here, but it didn't help me.
Here is my JSON
{
  "Invoice": {
    "#xmlns": "urn:oasis:names:specification:ubl:schema:xsd:Invoice-2",
    "#xmlns:cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
    "#xmlns:cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
    "ID": "Kamilko",
    "IssueDate": "2020-02-09",
    "OrderReference": {
      "ID": "22"
    },
    "InvoiceLine": {
      "Price": {
        "PriceAmount": {
          "#currencyID": "EUR",
          "#text": "23.50"
        },
        "BaseQuantity": {
          "#unitCode": "C62",
          "#text": "1"
        }
      }
    }
  }
}
Do you have any idea how to do it?
You've loaded your file in using json.load(...). That'll convert the JSON data into a Python dictionary that you can use to access elements:
if data["Invoice"]["OrderReference"]["ID"] == 22:
print("try")
Note that you might want to check the relevant keys exist along the way, in case the structure of your file varies, or you could catch the KeyError that'll come up if the key doesn't exist, using try/except.
Some more background:
When you then call json.dumps(...), you're taking that handy Python structure and converting it back into a hard-to-understand string again. I don't think you want or need to do this.
The specific error you have is because dumps has created a string. You're then trying to access an element of that string using the [ ] operator. Strings can only be indexed using integers, e.g. mystr[4], so Python doesn't understand what you're asking it to do with data_json["ID"].
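Putting that together, the loop from the question could look something like this (a sketch; FOLDER_PATH and the target value 'Kamilko' are taken from the question):

import os
import json

def listDir():
    for fileName in os.listdir(FOLDER_PATH):
        with open(os.path.join(FOLDER_PATH, fileName), 'r') as json_file:
            data = json.load(json_file)
        # Work with the parsed dictionary directly; no json.dumps needed.
        if data.get("Invoice", {}).get("ID") == "Kamilko":
            print(fileName, "matches")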

Use a json file stored on fs/disk as output for an Ansible module

I am struggling with an Ansible module I needed to create. Everything is done: the module gets a JSON file delivered from a third party onto the filesystem. This JSON file is expected to be the (only) output, so that I can register the result and access the content - or at least make the output somehow properly accessible.
The output file is a proper JSON file, and I have tried various things to reach my goal.
Including:
Simply printing the JSON file using print or os.stdout.write, because according to the documentation, Ansible simply takes the stdout.
Importing the JSON and dumping it using json.dumps(data), or like this:
with open('path-to-file', 'r') as tmpfile:
    data = json.load(tmpfile)
module.exit_json(changed=True, message="API call to %s successful" % endpoint, meta=data)
This ended up having the json in the output, but in an escaped variant and ansible refuses to access the escaped part.
What would be the correct way to make the json data accessible for further usage?
Edit:
The json looks like this (well, it’s a huge json, this is simply a part of it):
{
  "total_results": 51,
  "total_pages": 2,
  "prev_url": null,
  "next_url": "/v2/apps?order-direction=asc&page=2&results-per-page=50",
After register, the debug output looks like this, and I cannot access output.meta.total_results, for example.
ok: [localhost] => {
    "output": {
        "changed": true,
        "message": "API call filtering /v2/apps with name and yes-no was successful",
        "meta": "{\"total_results\": 51, \"next_url\": \"/v2/apps?order-direction=asc&page=2&results-per-page=50\", \"total_pages\": 2, \"prev_url\": null, (...)
The ansible output when trying to access the var:
ok: [localhost] => {
    "output.meta.total_results": "VARIABLE IS NOT DEFINED!"
}
Interesting. My tests using os.stdout.write somehow failed, but using print json.dumps(data) works.
This is solved.
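In other words, the working pattern is roughly the following (a sketch, assuming the same file-based handoff as in the question). An Ansible module reports its result as a single JSON object on stdout, and the trick is to embed the parsed data itself rather than a pre-serialized string - a string is what produces the escaped, inaccessible meta above:

import json

# Parse the third-party JSON from disk.
with open('path-to-file', 'r') as tmpfile:
    data = json.load(tmpfile)

# Emit one JSON object on stdout; "meta" holds the parsed dict, not a string,
# so a registered variable can be accessed as output.meta.total_results.
print(json.dumps({
    "changed": True,
    "meta": data,
}))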

How can I improve this script to make it more pythonic?

I'm fairly new to Python programming, and have thus far been reverse-engineering code that previous developers have made, or have cobbled together some functions on my own.
The script itself works; to cut a long story short, it's designed to parse a CSV and to (a) create and/or update the contacts found in the CSV, and (b) correctly assign each contact to their associated company - all using the HubSpot API. To achieve this I've also imported requests and csvmapper.
I had the following questions:
1. How can I improve this script to make it more pythonic?
2. What is the best way to make this script run on a remote server, keeping in mind that Requests and CSVMapper probably aren't installed on that server, and that I most likely won't have permission to install them? What is the best way to "package" this script, or to upload Requests and CSVMapper to the server?
Any advice much appreciated.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import print_function
import sys, os.path, requests, json, csv, csvmapper, glob, shutil
from time import sleep

major, minor, micro, release_level, serial = sys.version_info

# Client Portal ID
portal = "XXXXXX"
# Client API Key
hapikey = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"

# This attempts to find any file in the directory that starts with "note" and ends with ".CSV"
# Server Version
# findCSV = glob.glob('/home/accountName/public_html/clientFolder/contact*.CSV')
# Local Testing Version
findCSV = glob.glob('contact*.CSV')

for i in findCSV:
    theCSV = i
    csvfileexists = os.path.isfile(theCSV)
    # Prints a confirmation if file exists, prints instructions if it doesn't.
    if csvfileexists:
        print ("\nThe \"{csvPath}\" file was found ({csvSize} bytes); proceeding with sync ...\n".format(csvSize=os.path.getsize(theCSV), csvPath=os.path.basename(theCSV)))
    else:
        print ("File not found; check the file name to make sure it is in the same directory as this script. Exiting ...")
        sys.exit()

    # Begin the CSVmapper mapping... This creates a virtual "header" row - the CSV therefore does not need a header row.
    mapper = csvmapper.DictMapper([
        [
            {'name':'account'},    #"Org. Code"
            {'name':'id'},         #"Hubspot Ref"
            {'name':'company'},    #"Company Name"
            {'name':'firstname'},  #"Contact First Name"
            {'name':'lastname'},   #"Contact Last Name"
            {'name':'job_title'},  #"Job Title"
            {'name':'address'},    #"Address"
            {'name':'city'},       #"City"
            {'name':'phone'},      #"Phone"
            {'name':'email'},      #"Email"
            {'name':'date_added'}  #"Last Update"
        ]
    ])

    # Parse the CSV using the mapper
    parser = csvmapper.CSVParser(os.path.basename(theCSV), mapper)
    # Build the parsed object
    obj = parser.buildObject()

    def contactCompanyUpdate():
        # Open the CSV, use commas as delimiters, store it in a list called "data", then find the length of that list.
        with open(os.path.basename(theCSV),"r") as f:
            reader = csv.reader(f, delimiter = ",", quotechar="\"")
            data = list(reader)
        # For every row in the CSV ...
        for row in range(0, len(data)):
            # Set up the JSON payload ...
            payload = {
                "properties": [
                    {
                        "name": "account",
                        "value": obj[row].account
                    },
                    {
                        "name": "id",
                        "value": obj[row].id
                    },
                    {
                        "name": "company",
                        "value": obj[row].company
                    },
                    {
                        "property": "firstname",
                        "value": obj[row].firstname
                    },
                    {
                        "property": "lastname",
                        "value": obj[row].lastname
                    },
                    {
                        "property": "job_title",
                        "value": obj[row].job_title
                    },
                    {
                        "property": "address",
                        "value": obj[row].address
                    },
                    {
                        "property": "city",
                        "value": obj[row].city
                    },
                    {
                        "property": "phone",
                        "value": obj[row].phone
                    },
                    {
                        "property": "email",
                        "value": obj[row].email
                    },
                    {
                        "property": "date_added",
                        "value": obj[row].date_added
                    }
                ]
            }
            nameQuery = "{first} {last}".format(first=obj[row].firstname, last=obj[row].lastname)
            # Get a list of all contacts for a certain company.
            contactCheck = "https://api.hubapi.com/contacts/v1/search/query?q={query}&hapikey={hapikey}".format(hapikey=hapikey, query=nameQuery)
            # Convert the payload to JSON and assign it to a variable called "data"
            data = json.dumps(payload)
            # Defined the headers content-type as 'application/json'
            headers = {'content-type': 'application/json'}
            contactExistCheck = requests.get(contactCheck, headers=headers)
            for i in contactExistCheck.json()[u'contacts']:
                # ... Get the canonical VIDs
                canonicalVid = i[u'canonical-vid']
                if canonicalVid:
                    print ("{theContact} exists! Their VID is \"{vid}\"".format(theContact=obj[row].firstname, vid=canonicalVid))
                    print ("Attempting to update their company...")
                    contactCompanyUpdate = "https://api.hubapi.com/companies/v2/companies/{companyID}/contacts/{vid}?hapikey={hapikey}".format(hapikey=hapikey, vid=canonicalVid, companyID=obj[row].id)
                    doTheUpdate = requests.put(contactCompanyUpdate, headers=headers)
                    if doTheUpdate.status_code == 200:
                        print ("Attempt Successful! {theContact}'s has an updated company.\n".format(theContact=obj[row].firstname))
                        break
                    else:
                        print ("Attempt Failed. Status Code: {status}. Company or Contact not found.\n".format(status=doTheUpdate.status_code))

    def createOrUpdateClient():
        # Open the CSV, use commas as delimiters, store it in a list called "data", then find the length of that list.
        with open(os.path.basename(theCSV),"r") as f:
            reader = csv.reader(f, delimiter = ",", quotechar="\"")
            data = list(reader)
        # For every row in the CSV ...
        for row in range(0, len(data)):
            # Set up the JSON payload ...
            payloadTest = {
                "properties": [
                    {
                        "property": "email",
                        "value": obj[row].email
                    },
                    {
                        "property": "firstname",
                        "value": obj[row].firstname
                    },
                    {
                        "property": "lastname",
                        "value": obj[row].lastname
                    },
                    {
                        "property": "website",
                        "value": None
                    },
                    {
                        "property": "company",
                        "value": obj[row].company
                    },
                    {
                        "property": "phone",
                        "value": obj[row].phone
                    },
                    {
                        "property": "address",
                        "value": obj[row].address
                    },
                    {
                        "property": "city",
                        "value": obj[row].city
                    },
                    {
                        "property": "state",
                        "value": None
                    },
                    {
                        "property": "zip",
                        "value": None
                    }
                ]
            }
            # Convert the payload to JSON and assign it to a variable called "data"
            dataTest = json.dumps(payloadTest)
            # Defined the headers content-type as 'application/json'
            headers = {'content-type': 'application/json'}
            #print ("{theContact} does not exist!".format(theContact=obj[row].firstname))
            print ("Attempting to add {theContact} as a contact...".format(theContact=obj[row].firstname))
            createOrUpdateURL = 'http://api.hubapi.com/contacts/v1/contact/createOrUpdate/email/{email}/?hapikey={hapikey}'.format(email=obj[row].email,hapikey=hapikey)
            r = requests.post(createOrUpdateURL, data=dataTest, headers=headers)
            if r.status_code == 409:
                print ("This contact already exists.\n")
            elif (r.status_code == 200) or (r.status_code == 202):
                print ("Success! {firstName} {lastName} has been added.\n".format(firstName=obj[row].firstname,lastName=obj[row].lastname, response=r.status_code))
            elif r.status_code == 204:
                print ("Success! {firstName} {lastName} has been updated.\n".format(firstName=obj[row].firstname,lastName=obj[row].lastname, response=r.status_code))
            elif r.status_code == 400:
                print ("Bad request. You might get this response if you pass an invalid email address, if a property in your request doesn't exist, or if you pass an invalid property value.\n")
            else:
                print ("Contact Marko for assistance.\n")

    if __name__ == "__main__":
        # Run the Create or Update function
        createOrUpdateClient()
        # Give the previous function 5 seconds to take effect.
        sleep(5.0)
        # Run the Company Update function
        contactCompanyUpdate()
        print("Sync complete.")
        print("Moving \"{something}\" to the archive folder...".format(something=theCSV))
        # Cron version
        #shutil.move( i, "/home/accountName/public_html/clientFolder/archive/" + os.path.basename(i))
        # Local version
        movePath = "archive/{thefile}".format(thefile=theCSV)
        shutil.move( i, movePath )
        print("Move successful! Exiting...\n")

sys.exit()
I'll just go from top to bottom. The first rule is, do what's in PEP 8. It's not the ultimate style guide, but it's certainly a reference baseline for Python coders, and that's more important, especially when you're getting started. The second rule is, make it maintainable. A couple of years from now, when some other new kid comes through, it should be easy for her to figure out what you were doing. Sometimes that means doing things the long way, to reduce errors. Sometimes it means doing things the short way, to reduce errors. :-)
#!/usr/bin/env python
# -*- coding: utf-8 -*-
Two things: you got the encoding right, per PEP 8. And conventions for writing good documentation strings (a.k.a. "docstrings") are immortalized in PEP 257. You've got a program that does something, but you don't document what.
from __future__ import print_function
import sys, os.path, requests, json, csv, csvmapper, glob, shutil
from time import sleep
major, minor, micro, release_level, serial = sys.version_info
Per PEP 8: put your import module statements one per line.
Per Austin: make your paragraphs have separate subjects. You've got some imports right next to some version info stuff. Insert a blank line. Also, DO SOMETHING with the data! Or you didn't need it to be right here, did you?
# Client Portal ID
portal = "XXXXXX"
# Client API Key
hapikey = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
You've obscured these in more ways than one. WTF is a hapikey? I think you mean Hubspot_API_key. And what does portal do?
One piece of advice: the more "global" a thing is, the more "formal" it should be. If you have a for loop, it's okay to call one of the variables i. If you have a piece of data that is used throughout a function, call it obj or portal. But if you have a piece of data that is used globally, or is a class variable, make it put on a tie and a jacket so everyone can recognize it: make it Hubspot_api_key instead of client_api_key. Maybe even Hubspot_client_api_key if there are more than one API. Do the same with portal.
# This attempts to find any file in the directory that starts with "note" and ends with ".CSV"
# Server Version
# findCSV = glob.glob('/home/accountName/public_html/clientFolder/contact*.CSV')
It didn't take long for the comments to become lies. Just delete them if they aren't true.
# Local Testing Version
findCSV = glob.glob('contact*.CSV')
This is the kind of thing that you should create a function for. Just create a simple function called "get_csv_files" or whatever, and have it return a list of filenames. That decouples you from glob, and it means you can make your test code data driven (pass a list of filenames into a function, or pass a single file into a function, instead of asking it to search for them). Also, those glob patterns are exactly the kind of thing that go in a config file, or a global variable, or get passed as command line arguments.
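A sketch of what that might look like (the function name is the one suggested above; the default pattern is illustrative):

def get_csv_files(pattern='contact*.CSV'):
    """Return the list of CSV files to process; the pattern could come
    from a config file or a command-line argument instead."""
    return glob.glob(pattern)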
for i in findCSV:
I'll bet typing CSV in upper case all the time is a pain. And what does findCSV mean? Read that line, and figure out what that variable should be called. Maybe csv_files? Or new_contact_files? Something that demonstrates that there is a collection of things.
theCSV = i
csvfileexists = os.path.isfile(theCSV)
Now what does i do? You had this nice small variable name, in a BiiiiiiG loop. That was a mistake, since if you can't see a variable's entire scope all on one page, it probably needs a somewhat longer name. But then you created an alias for it. Both i and theCSV refer to the same thing. And ... I don't see you using i again. So maybe your loop variable should be theCSV. Or maybe it should be the_csv to make it easier to type. Or just csvname.
# Prints a confirmation if file exists, prints instructions if it doesn't.
This seems a little needless. If you're using glob to get filenames, they pretty much are going to exist. (If they don't, it's because they were deleted between the time you called glob and the time you tried to open them. That's possible, but rare. Just continue or raise an exception, depending.)
if csvfileexists:
    print ("\nThe \"{csvPath}\" file was found ({csvSize} bytes); proceeding with sync ...\n".format(csvSize=os.path.getsize(theCSV), csvPath=os.path.basename(theCSV)))
In this code, you use the value of csvfileexists. But that's the only place you use it. In this case, you can probably move the call to os.path.isfile() into the if statement and get rid of the variable.
else:
    print ("File not found; check the file name to make sure it is in the same directory as this script. Exiting ...")
    sys.exit()
Notice that in this case, when there is an actual problem, you didn't print the file name? How helpful was that?
Also, remember the part where you're on a remote server? You should consider using Python's logging module to record these messages in a useful manner.
# Begin the CSVmapper mapping... This creates a virtual "header" row - the CSV therefore does not need a header row.
mapper = csvmapper.DictMapper([
    [
        {'name':'account'},    #"Org. Code"
        {'name':'id'},         #"Hubspot Ref"
        {'name':'company'},    #"Company Name"
        {'name':'firstname'},  #"Contact First Name"
        {'name':'lastname'},   #"Contact Last Name"
        {'name':'job_title'},  #"Job Title"
        {'name':'address'},    #"Address"
        {'name':'city'},       #"City"
        {'name':'phone'},      #"Phone"
        {'name':'email'},      #"Email"
        {'name':'date_added'}  #"Last Update"
    ]
])
You're creating an object with a bunch of data. This would be a good place for a function. Define a make_csvmapper() function to do all this for you, and move it out of line.
Also, note that the standard csv module has most of the functionality you are using. I don't think you actually need csvmapper.
# Parse the CSV using the mapper
parser = csvmapper.CSVParser(os.path.basename(theCSV), mapper)
# Build the parsed object
obj = parser.buildObject()
Here's another chance for a function. Maybe instead of making a csv mapper, you could just return the obj?
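For instance, a sketch using only the standard csv module (the field names are the ones from the mapper above), which could replace both the mapper and the parser/buildObject steps:

import csv

FIELDNAMES = ['account', 'id', 'company', 'firstname', 'lastname',
              'job_title', 'address', 'city', 'phone', 'email', 'date_added']

def read_contacts(csvname):
    """Read a headerless contact CSV and return one dict per row."""
    with open(csvname, 'r') as f:
        return list(csv.DictReader(f, fieldnames=FIELDNAMES))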
def contactCompanyUpdate():
At this point, things get fishy. You have these function definitions indented, but I don't think you need them. Is that a stackoverflow problem, or does your code really look like this?
# Open the CSV, use commas as delimiters, store it in a list called "data", then find the length of that list.
with open(os.path.basename(theCSV),"r") as f:
No, apparently it really looks like this. Because you're using theCSV inside this function when you don't really need to. Please consider using formal function parameters instead of just grabbing outer-scope objects. Also, why are you using basename on the csv file? If you obtained it using glob, doesn't it already have the path you want?
reader = csv.reader(f, delimiter = ",", quotechar="\"")
data = list(reader)
# For every row in the CSV ...
for row in range(0, len(data)):
Here you forced data to be a list of rows obtained from reader, and then started iterating over them. Just iterate over reader directly, like: for row in reader: BUT WAIT! You're actually iterating over a CSV file that you have already opened, in your obj variable. Just pick one, and iterate over it. You don't need to open the file twice for this.
# Set up the JSON payload ...
payload = {
    "properties": [
        {
            "name": "account",
            "value": obj[row].account
        },
        {
            "name": "id",
            "value": obj[row].id
        },
        {
            "name": "company",
            "value": obj[row].company
        },
        {
            "property": "firstname",
            "value": obj[row].firstname
        },
        {
            "property": "lastname",
            "value": obj[row].lastname
        },
        {
            "property": "job_title",
            "value": obj[row].job_title
        },
        {
            "property": "address",
            "value": obj[row].address
        },
        {
            "property": "city",
            "value": obj[row].city
        },
        {
            "property": "phone",
            "value": obj[row].phone
        },
        {
            "property": "email",
            "value": obj[row].email
        },
        {
            "property": "date_added",
            "value": obj[row].date_added
        }
    ]
}
Okay, that was a LOOOONG span of code that didn't do much. At the least, tighten those inner dicts up to one line each. But better still, write a function to create your dictionary in the format you want. You can use getattr to pull the data by name from obj.
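For example (a sketch; the helper name is illustrative, and the uniform "property" key follows the createOrUpdateClient payload later in the script):

PROPERTY_NAMES = ('account', 'id', 'company', 'firstname', 'lastname',
                  'job_title', 'address', 'city', 'phone', 'email', 'date_added')

def make_payload(record):
    """Build the HubSpot properties payload for one parsed CSV record."""
    return {
        "properties": [
            {"property": name, "value": getattr(record, name)}
            for name in PROPERTY_NAMES
        ]
    }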
nameQuery = "{first} {last}".format(first=obj[row].firstname, last=obj[row].lastname)
# Get a list of all contacts for a certain company.
contactCheck = "https://api.hubapi.com/contacts/v1/search/query?q={query}&hapikey={hapikey}".format(hapikey=hapikey, query=nameQuery)
# Convert the payload to JSON and assign it to a variable called "data"
data = json.dumps(payload)
# Defined the headers content-type as 'application/json'
headers = {'content-type': 'application/json'}
contactExistCheck = requests.get(contactCheck, headers=headers)
Here you're encoding details of the API into your code. Consider pulling them out into functions. (That way, you can come back later and build a module of them, to re-use in your next program.) Also, beware of comments that don't actually tell you anything. And feel free to pull that together as a single paragraph, since it's all in service of the same key thing - making an API call.
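For instance, the search call could be wrapped along these lines (a sketch; the function name is illustrative, the URL is the one from the script):

def search_contacts(api_key, query):
    """Call the HubSpot contact search endpoint and return the response."""
    url = ("https://api.hubapi.com/contacts/v1/search/query"
           "?q={query}&hapikey={hapikey}").format(query=query, hapikey=api_key)
    return requests.get(url, headers={'content-type': 'application/json'})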
for i in contactExistCheck.json()[u'contacts']:
    # ... Get the canonical VIDs
    canonicalVid = i[u'canonical-vid']
    if canonicalVid:
        print ("{theContact} exists! Their VID is \"{vid}\"".format(theContact=obj[row].firstname, vid=canonicalVid))
        print ("Attempting to update their company...")
        contactCompanyUpdate = "https://api.hubapi.com/companies/v2/companies/{companyID}/contacts/{vid}?hapikey={hapikey}".format(hapikey=hapikey, vid=canonicalVid, companyID=obj[row].id)
        doTheUpdate = requests.put(contactCompanyUpdate, headers=headers)
        if doTheUpdate.status_code == 200:
            print ("Attempt Successful! {theContact}'s has an updated company.\n".format(theContact=obj[row].firstname))
            break
        else:
            print ("Attempt Failed. Status Code: {status}. Company or Contact not found.\n".format(status=doTheUpdate.status_code))
I'm not sure if this last bit should be an exception or not. Is an "Attempt Failed" normal behavior, or does it mean that something is broken?
At any rate, please look into the API you are using. I'd bet there is some more information available for minor failures. (Major failures would be the internet is broken or their server is offline.) They might provide an "errors" or "error" field in their return JSON, for example. Those should be logged or printed with your failure message.
def createOrUpdateClient():
Mostly this function has the same issues as the previous one.
else:
    print ("Contact Marko for assistance.\n")
Except here. Never put your name in someplace like this. Or you'll still be getting calls on this code 10 years from now. Put your department name ("IT Operations") or a support number. The people who need to know will already know. And the people who don't need to know can just notify the people that already know.
if __name__ == "__main__":
# Run the Create or Update function
createOrUpdateClient()
# Give the previous function 5 seconds to take effect.
sleep(5.0)
# Run the Company Update function
contactCompanyUpdate()
print("Sync complete.")
print("Moving \"{something}\" to the archive folder...".format(something=theCSV))
# Cron version
#shutil.move( i, "/home/accountName/public_html/clientFolder/archive/" + os.path.basename(i))
# Local version
movePath = "archive/{thefile}".format(thefile=theCSV)
shutil.move( i, movePath )
print("Move successful! Exiting...\n")
This was awkward. You might consider taking some command line arguments and using them to determine your behavior.
sys.exit()
And don't do this. Never put an exit() at module scope, because it means you can't possibly import this code. Maybe someone wants to import it to parse the docstrings. Or maybe they want to borrow some of those API functions you wrote. Too bad! sys.exit() means always having to say "Oh, sorry, I'll have to do that for you." Put it at the bottom of your actual __name__ == "__main__" code. Or, since you aren't actually passing a value, just remove it entirely.
