Best practices when pulling API data that is being updated/changed repeatedly - python

Currently I'm pulling data from the QuickBooks Online API, parsing it, and storing it in a database. My problem is that right now, with the small amount of data I'm pulling, I simply drop my DB tables and repopulate them with the updated data. Is there a way to do this more optimally?

QuickBooks provides an API for exactly this. It's called Change Data Capture (CDC), a common pattern for time-based updates like the one you're describing.
Intuit's docs cover it in detail:
https://developer.intuit.com/app/developer/qbo/docs/learn/explore-the-quickbooks-online-api/change-data-capture
Basically, you make a request like the following, providing a date/time you want changes since:
https://quickbooks.api.intuit.com/v3/company/<realmId>/cdc?entities=<entityList>&changedSince=<dateTime>
You get back a list of changed objects that you can then use to update your local database:
{
    "CDCResponse": [{
        "QueryResponse": [{
            "Customer": [{
                ...
                "Id": "63",
                ...
            },
            {
                ...
                "Id": "99",
                ...
            }],
            "startPosition": 1,
            "maxResults": 2
        },
        {
            "Estimate": [{
                ...
                "Id": "34",
                ...
            },
            {
                ...
                "Id": "123",
                ...
            },
            {
                "domain": "QBO",
                "status": "Deleted",
                "Id": "979",
                "MetaData": {
                    "LastUpdatedTime": "2015-12-23T12:55:50-08:00"
                }
            }],
            "startPosition": 1,
            "maxResults": 3,
            "totalCount": 5
        }]
    }],
    "time": "2015-12-23T10:00:01-07:00"
}
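A minimal sketch of polling that endpoint from Python with requests (the realm ID, access token, and entity list below are placeholder assumptions; QuickBooks expects an OAuth2 bearer token):

import requests
from datetime import datetime, timedelta, timezone

REALM_ID = "1234567890"                    # placeholder company realm ID
ACCESS_TOKEN = "your-oauth2-access-token"  # placeholder bearer token

# Ask for everything changed in, say, the last 24 hours
changed_since = (datetime.now(timezone.utc) - timedelta(days=1)).isoformat()

resp = requests.get(
    f"https://quickbooks.api.intuit.com/v3/company/{REALM_ID}/cdc",
    params={"entities": "Customer,Estimate", "changedSince": changed_since},
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}", "Accept": "application/json"},
)
resp.raise_for_status()

# Walk the response shape shown above: upsert changed objects, drop deleted ones
for query_response in resp.json()["CDCResponse"][0]["QueryResponse"]:
    for key, value in query_response.items():
        if key in ("startPosition", "maxResults", "totalCount"):
            continue  # pagination metadata, not an entity list
        for obj in value:
            if obj.get("status") == "Deleted":
                pass  # DELETE the row with obj["Id"] from your table
            else:
                pass  # INSERT or UPDATE the row keyed on obj["Id"]

Run this on a schedule, persisting the timestamp of the last successful sync and passing it as changedSince on the next run, and you only ever touch the rows that actually changed.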


Add ascending serial number field to all existing mongodb documents in a collection

I have a MongoDB collection which looks something like this:
[
    {"Code": "018906", "X": "0.12"},
    {"Code": "018907", "X": "0.18"},
    {"Code": "018910", "X": "0.24"},
    {"Code": "018916", "X": "0.75"}
]
I want to add an ascending serial number field to all existing documents in the collection. After adding it, the new collection will look like this:
[
    {"Serial": 1, "Code": "018906", "X": "0.12"},
    {"Serial": 2, "Code": "018907", "X": "0.18"},
    {"Serial": 3, "Code": "018910", "X": "0.24"},
    {"Serial": 4, "Code": "018916", "X": "0.75"}
]
I am open to using any Python MongoDB library, such as PyMongo or MongoEngine.
I am using Python 3.7 and MongoDB v4.2.
You can do it with a single aggregation query by grouping all documents into a single array, then unwinding it with the element index included:
db.collection.aggregate([
    {
        $group: {
            _id: null,
            doc: { $push: "$$ROOT" }
        }
    },
    {
        $unwind: {
            path: "$doc",
            includeArrayIndex: "doc.Serial"
        }
    },
    {
        $replaceRoot: { newRoot: "$doc" }
    },
    {
        $out: "new_collection_name"
    }
])
All the work is done server-side; there is no need to load the whole collection into the application's memory. If the collection is large, you may need to run the aggregation with allowDiskUse. Note that includeArrayIndex is zero-based, so the serials start at 0 unless you add a stage to shift them. Prepend a sort stage to guarantee the expected order if required.
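Since the question allows any Python library, here is the same pipeline run from PyMongo (database, collection, and output names are placeholder assumptions); a $set stage shifts the zero-based index so serials start at 1, matching the expected output:

from pymongo import MongoClient

db = MongoClient()["mydb"]  # placeholder database name

pipeline = [
    {"$sort": {"Code": 1}},  # optional: pin down the ordering first
    {"$group": {"_id": None, "doc": {"$push": "$$ROOT"}}},
    {"$unwind": {"path": "$doc", "includeArrayIndex": "doc.Serial"}},
    {"$replaceRoot": {"newRoot": "$doc"}},
    {"$set": {"Serial": {"$add": ["$Serial", 1]}}},  # 0-based index -> 1-based serial
    {"$out": "new_collection_name"},
]
db.collection.aggregate(pipeline, allowDiskUse=True)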
Alternatively, first find all the _id values in the collection, then use a bulk write operation:
from pymongo import UpdateOne

# Fetch only the _id of each document, then number them in order
records = db.collection.find({}, {'_id': 1})

requests = []
for i, record in enumerate(records, start=1):
    requests.append(UpdateOne({'_id': record['_id']}, {'$set': {'Serial': i}}))

db.collection.bulk_write(requests)

I need help figuring out how to turn online data into a usable list that I can print data from

In a program I am working on, I use ACRCloud's music fingerprinting service. After uploading the audio I need identified, I am given back this piece of data:
re = ACRCloudRecognizer(config)
data = (re.recognize_by_file('audio_name.mp3', 0))
>>>data
'{"metadata":{"timestamp_utc":"2020-05-18 23:00:59","music":[{"label":"NoCopyrightSounds","play_offset_ms":125620,"duration_ms":326609,"external_ids":{},"artists":[{"name":"Culture Code & Regoton"}],"result_from":1,"acrid":"a53ea40c6a8b4a6795ac3d799f6a4aec","title":"Waking Up","genres":[{"name":"Electro"}],"album":{"name":"Waking Up"},"score":100,"external_metadata":{},"release_date":"2014-05-25"}]},"cost_time":5.5099999904633,"status":{"msg":"Success","version":"1.0","code":0},"result_type":0}\n'
I think it's a list, but I am unable to figure out how to navigate it or grab specific information from it. I'm unsure how the information is structured and what patterns to look for. Ideally, I would like to create a print function that prints the title, artists, and album.
Any help is much appreciated!
Formatting the JSON makes it more legible
{
    "metadata": {
        "timestamp_utc": "2020-05-18 23:00:59",
        "music": [
            {
                "label": "NoCopyrightSounds",
                "play_offset_ms": 125620,
                "duration_ms": 326609,
                "external_ids": {},
                "artists": [
                    {
                        "name": "Culture Code & Regoton"
                    }
                ],
                "result_from": 1,
                "acrid": "a53ea40c6a8b4a6795ac3d799f6a4aec",
                "title": "Waking Up",
                "genres": [
                    {
                        "name": "Electro"
                    }
                ],
                "album": {
                    "name": "Waking Up"
                },
                "score": 100,
                "external_metadata": {},
                "release_date": "2014-05-25"
            }
        ]
    },
    "cost_time": 5.5099999904633,
    "status": {
        "msg": "Success",
        "version": "1.0",
        "code": 0
    },
    "result_type": 0
}
It's actually a JSON string rather than a list, so parse it first. Then it looks like you're after .metadata.music[0].title (presumably), but only if .status.code is 0 (success).
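A small sketch using the data variable from the question; json.loads turns the string into ordinary dicts and lists:

import json

response = json.loads(data)  # parse the JSON string into a dict

if response["status"]["code"] == 0:  # 0 means success
    for track in response["metadata"]["music"]:
        artists = ", ".join(artist["name"] for artist in track["artists"])
        print("Title:  ", track["title"])
        print("Artists:", artists)
        print("Album:  ", track["album"]["name"])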

No enum error when validating JSON using jsonschema in python

First of all, I am not getting a proper error response on the web platform either (https://jsonschemalint.com). I am using jsonschema in Python, and have a JSON schema and JSON data that validate successfully.
The problem I'd like to solve is the following: before we deliver JSON files with example data, we run them through SoapUI to check that they are correct, since we deal with huge files and our devs sometimes make errors when generating them, so we do the final check.
I'd like to create a script to automate this and avoid SoapUI. After googling, I came across jsonschema and tried it. I get all the expected results: I get errors when I delete certain elements, as usual. But the biggest issues are the following:
Example:
I have a sub-sub-sub object in my JSON schema; let's call it Test1. It contains the following:
**Schema**
{
    "exname": "2",
    "info": {},
    "consumes": {},
    "produces": {},
    "schemes": {},
    "tags": {},
    "parameters": {},
    "paths": {},
    "definitions": {
        "MainTest1": {
            "description": "",
            "minProperties": 1,
            "properties": {
                "test1": {
                    "items": {
                        "$ref": "#//Test1"
                    },
                    "maxItems": 10,
                    "minItems": 1,
                    "type": "array"
                },
                "test2": {
                    "items": {
                        "$ref": "#//"
                    },
                    "maxItems": 10,
                    "minItems": 1,
                    "type": "array"
                }
            }
        },
        "Test1": {
            "description": "test1des",
            "minProperties": 1,
            "properties": {
                "prop1": {
                    "description": "prop1des",
                    "example": "prop1exam",
                    "maxLength": 10,
                    "minLength": 2,
                    "type": "string"
                },
                "prop2": {
                    "description": "prop2des",
                    "example": "prop2example",
                    "maxLength": 200,
                    "minLength": 2,
                    "type": "string"
                },
                "prop3": {
                    "enum": [
                        "enum1",
                        "enum2",
                        "enum3"
                    ],
                    "example": "enum1",
                    "type": "string"
                }
            },
            "required": [
                "prop3"
            ],
            "type": "object"
        }
    }
}
**Proper example for Test1**
{
    "Test1": [{
        "prop1": "TestStr",
        "prop2": "Test and Test",
        "prop3": "enum1"
    }]
}
**Improper example that still passes validation for Test1**
{
    "test1": [{
        "prop1": "TestStr123456", [wrong, as it exceeds the maxLength of 10]
        "prop2": "Test and Test",
        "prop3": " enum1" [wrong, as it has a whitespace character before enum1]
    }]
}
The first issue I ran across is that the enum in prop3 isn't validated correctly: when I use " enum1", "enumruwehrqweur", or literally anything else, the tests pass. In addition, the min/max character limits are not checked anywhere in my JSON; no matter how many characters I use in any field, I do not get an error. Does anyone have an idea how to fix this, or has anyone found a better workaround for what I'm trying to do? Thank you in advance!
There were a few issues with your schema. I'll address each of them.
In your schema, you have "Test1". In your JSON instance, you have "test1". Case is important. I would guess this is just an error in creating your example.
In your schema, you have "Test1" at the root level. Because this is not a schema keyword, it is ignored and has no effect on validation. You need to nest it inside a "properties" object, as you have done elsewhere.
{
    "properties": {
        "test1": {
Your validation would still not work correctly. If you want to validate each item in an array, you need to use the items keyword.
{
    "properties": {
        "test1": {
            "items": {
                "description": "test1des",
Finally, you'll need to nest the required and type keywords inside the items object.
Here's the complete schema:
{
    "properties": {
        "test1": {
            "items": {
                "description": "test1des",
                "minProperties": 1,
                "properties": {
                    "prop1": {
                        "description": "prop1des",
                        "example": "prop1exam",
                        "maxLength": 10,
                        "minLength": 2,
                        "type": "string"
                    },
                    "prop2": {
                        "description": "prop2des",
                        "example": "prop2example",
                        "maxLength": 200,
                        "minLength": 2,
                        "type": "string"
                    },
                    "prop3": {
                        "enum": [
                            "enum1",
                            "enum2",
                            "enum3"
                        ],
                        "example": "enum1",
                        "type": "string"
                    }
                },
                "required": [
                    "prop3"
                ],
                "type": "object"
            }
        }
    }
}
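With the corrected schema, a short jsonschema sketch (the file names are placeholder assumptions) reports the enum and maxLength violations instead of silently passing:

import json
from jsonschema import Draft7Validator

with open("schema.json") as f:
    schema = json.load(f)
with open("example.json") as f:
    instance = json.load(f)

# iter_errors reports every violation instead of stopping at the first one
for error in Draft7Validator(schema).iter_errors(instance):
    print(f"{list(error.absolute_path)}: {error.message}")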

How to insert large documents in mongodb

I want to store a JSON document larger than 16 MB (the size limit per document) in MongoDB, but due to the size limit I am unable to do so. How can I store such large documents in MongoDB? I know the GridFS API may be an option, but after a lot of struggle I am still unable to figure out how to use GridFS and what the right commands are to insert and retrieve data with it. Any help in using GridFS, or any other alternative for storing large JSON documents, would be much appreciated.
I am using Python's PyMongo package.
Thanks!
There are several ways to store data in MongoDB. For a document that fits under the size limit, a plain insert is enough:
var Data = {
    "userID": "1",
    "userData": {
        "firstName": "Test First Name",
        "lastName": "Test Last Name",
        "number": {
            "phNumber": "9999999991",
            "cellNumber": "8888888888"
        },
        "address": {
            "Geo": {
                "latitude": 15.40,
                "longitude": -70.90
            },
            "city": "surat",
            "state": "gujarat",
            "country": "india"
        },
        "product": {
            "game": {
                "GTA": true,
                "DOTA": true
            },
            "television": {
                "TV": true,
                "PlayStation": true,
                "Xbox": false
            }
        }
    },
    "key": "ANbcsgYSIDncsSK"
};
db.collection("[collection Name]").insertOne(Data);
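Since the question asks specifically about GridFS for documents over the 16 MB limit, here is a minimal PyMongo sketch (the connection string, database name, and filename are placeholder assumptions). GridFS stores the serialized JSON as an opaque file, so you can no longer query inside the document; the alternative is splitting it into smaller documents:

import json
import gridfs
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
fs = gridfs.GridFS(client["mydb"])                 # placeholder database name

large_doc = {"example": "a JSON document that may exceed 16MB"}

# Insert: serialize the document to bytes and store it as a GridFS file
file_id = fs.put(json.dumps(large_doc).encode("utf-8"), filename="large_doc.json")

# Retrieve: read the bytes back and parse them into a dict again
restored = json.loads(fs.get(file_id).read().decode("utf-8"))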

Python .get nested Json values

I have a json file with the following example json entry:
{
    "title": "Test prod",
    "leafPage": true,
    "type": "product",
    "product": {
        "title": "test product",
        "offerPrice": "$19.95",
        "offerPriceDetails": {
            "amount": 19.95,
            "text": "$19.95",
            "symbol": "$"
        },
        "media": [
            {
                "link": "http://www.test.com/cool.jpg",
                "primary": true,
                "type": "image",
                "xpath": "/html[1]/body[1]/div[1]/div[3]/div[2]/div[1]/div[1]/div[1]/div[1]/a[1]/img[1]"
            }
        ],
        "availability": true
    },
    "human_language": "en",
    "url": "http://www.test.com"
}
I can post this to my test server perfectly via a Python script when I use:
"text": entry.get("title"),
"url": entry.get("url"),
"type": entry.get("type"),
However, I cannot get the following nested item to upload its value. How do I structure the call to reach a nested JSON entry?
I've tried the below without success. It has to be .get, because the fields present in the JSON file vary and the script errors out without the .get call:
"Amount": entry.get("product"("offerPrice"))
Any help on how to access the nested JSON entry would be very much appreciated.
You need to do:
"Amount": entry.get("product", {}).get("offerPrice")
entry.get("product", {}) returns a product dictionary (or an empty dictionary if there is no product key).
