AppleScript doesn't complete when parsing a large JSON structure - Python

I have an iMac 2017 with Monterey 12.6.
I have to call my API in Python to get a correctly structured JSON result.
For reference, this is the URL I call in Python (CoinGecko limits the result to the first 100 entries):
https://api.coingecko.com/api/v3/coins/markets?ids=acala,alpaca-finance,altair,astar,avalanche-2,baanx,bbeefy-finance,bifrost-native-coin,binancecoin,binance-eth,binance-usd,bitcoin,cardano,chainlink,chainx,chronicle,coin-capsule,colony,cosmos,crabada,crypto-com-chain,cumrocket,curve-dao-token,dappradar,dogecoin,elrond-erd-2,ergo,ethereum,evmos,exeedme,fantom,ftx-token,fuse-network-token,genshiro,green-satoshi-token,havven,hooked-protocol,integritee,kadena,karura,kintsugi,kucoin-shares,kusama,kuswap,matic-network,metagame-arena,metagods,metavault,mina-protocol,moonbeam,moonpot,moonriver,nafty,near,osmosis,pancakeswap-token,paraswap,platypus-finance,pluton,polkadot,safemoon,shiba-inu,kryll,kucoin-shares,kusama,kuswap,lido-dao,matic-network,maze-token,memepad,metagame-arena,metagods,metis-token,mina-protocol,moonbeam,moonpot,moonriver,movn,nafter,nafty,near,nexo,nftlaunch,orion-protocol,osmosis,paid-network,pancakeswap-token,paraswap,platypus-finance,polkadot,polkamon,polkamarkets,polkastarter,polycat-finance,polychain-monsters,polygonfarm-finance,presearch,safemoon,safepal,shiba-inu,shiden,solana,staked-ether,staked-olympus,stasis-eurs,stepn,sushi,swissborg,switch,tether,terrausd,the-graph,the-sandbox,unifarm,uniswap,usd-coin,valkyrie-protocol,wizarre-scroll,zelcash&vs_currency=EUR
The result in Python is something like this, but with many other cryptos:
[{'id': 'bitcoin', 'symbol': 'btc', 'name': 'Bitcoin', 'image': 'https://assets.coingecko.com/coins/images/1/large/bitcoin.png?1547033579', 'current_price': 15765.21, 'market_cap': 303279048538, 'market_cap_rank': 1, 'fully_diluted_valuation': 330971307514, 'total_volume': 9519061248, 'high_24h': 15812.98, 'low_24h': 15758.06, 'price_change_24h': -18.11016048005331, 'price_change_percentage_24h': -0.11474, 'market_cap_change_24h': -207406972.0255127, 'market_cap_change_percentage_24h': -0.06834, 'circulating_supply': 19242937.0, 'total_supply': 21000000.0, 'max_supply': 21000000.0, 'ath': 59717, 'ath_change_percentage': -73.60052, 'ath_date': '2021-11-10T14:24:11.849Z', 'atl': 51.3, 'atl_change_percentage': 30631.82104, 'atl_date': '2013-07-05T00:00:00.000Z', 'roi': None, 'last_updated': '2022-12-25T14:21:15.974Z'}, {'id': 'ethereum', 'symbol': 'eth', 'name': 'Ethereum', 'image': 'https://assets.coingecko.com/coins/images/279/large/ethereum.png?1595348880', 'current_price': 1142.38, 'market_cap': 137652680087, 'market_cap_rank': 2, 'fully_diluted_valuation': 137652680087, 'total_volume': 2165257956, 'high_24h': 1149.42, 'low_24h': 1141.8, 'price_change_24h': -1.0190823115187868, 'price_change_percentage_24h': -0.08913, 'market_cap_change_24h': -73228021.97866821, 'market_cap_change_percentage_24h': -0.05317, 'circulating_supply': 120523982.078808, 'total_supply': 120523982.078808, 'max_supply': None, 'ath': 4228.93, 'ath_change_percentage': -72.99273, 'ath_date': '2021-12-01T08:38:24.623Z', 'atl': 0.381455, 'atl_change_percentage': 299310.75605, 'atl_date': '2015-10-20T00:00:00.000Z', 'roi': {'times': 95.89474372830291, 'currency': 'btc', 'percentage': 9589.474372830291}, 'last_updated': '2022-12-25T14:20:55.699Z'}]
Content of my AppleScript:
set desktop_folder to "$HOME/PycharmProjects/crypto/"
set valReturned to do shell script "python3 " & desktop_folder & "crypto.py"
set coins to (every item in valReturned) as list
repeat with n from 1 to count of coins
    set coin to item n of coins
end repeat
The AppleScript process uses 99% of my processor.
If I comment out the loop, I get a huge JSON string (it's just the result for 100 different cryptocurrencies). I can't post it here because its contents exceed 81,000 characters when I copy-paste them.
Why does this crash AppleScript?
I've already posted on Stack Overflow here, but that attempt took a different approach.
I could reduce the number of cryptos in my request, but it's still strange that it crashes this way. I have no problem with Excel and Power Query, for example.
I feel like the best way to produce a Numbers file would be to generate a CSV, then use AppleScript to import that CSV into Numbers and apply formatting. Any suggestions?

I can only speculate, as I don't have/use any Apple-anything:
There are no line breaks in the JSON. If it is loaded/processed with a tool/language that relies (internally) on strings, it may exceed a maximum character limit for a string.
In both cases, your valReturned is an AppleScript variable. It may come with some size limitations, and you're also filling/setting it with the textual characters that come from the output of echo or python3. This can lead to delays and size limits (crashes?). Especially with echo, printing text to the terminal is often limited by the terminal's output/drawing/rendering speed, not by how fast the computer can process the characters.
I'm not sure how AppleScript handles JSON: does it parse the JSON and keep it all in memory? Does it also try to be clever and deserialize/demarshal the JSON into its own data types, creating many objects for later programmatic access? You then also seem to have two copies of it, one in the list structure coins and one in valReturned (as a string, I assume).
There are potentially more reasons along those lines...
If you're already able to call echo and python3, maybe you don't need AppleScript to iterate over the JSON entries at all? Since you seem to be able to write the JSON from the network into a file directly, maybe you don't need AppleScript to load/keep the JSON data either? I wonder whether avoiding AppleScript's exposure to the JSON, or to the output of the external programs, would mitigate the performance and crash problems.
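For example, a rough sketch of that idea in Python: fetch the JSON and write a CSV directly, so AppleScript only has to tell Numbers to import a file. The shortened coin list and the crypto.csv output path are placeholders, not the asker's actual script.

import csv
import json
from urllib.request import urlopen

# Placeholder URL with a shortened coin list; the real request would use the
# full ids list from the question.
URL = ("https://api.coingecko.com/api/v3/coins/markets"
       "?ids=bitcoin,ethereum&vs_currency=EUR")

with urlopen(URL) as response:
    coins = json.load(response)  # list of dicts, one per coin

# Write a CSV that Numbers can import; AppleScript never touches the JSON.
with open("crypto.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "name", "current_price", "market_cap"])
    for coin in coins:
        writer.writerow([coin["id"], coin["name"],
                         coin["current_price"], coin["market_cap"]])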

AppleScript will perform the shell script without any issues, but note that the result will be a string. Your sample script is just stepping through the (81,000+) characters of that string, which takes a while. Many languages (such as Python) provide direct support for JSON strings; for AppleScript, Cocoa (via AppleScriptObjC) can be used.
The JSON string from the shell script can be parsed to get a list of records (the result can also be left as Cocoa objects for use with those methods). The individual list items can then be stepped through using a repeat statement, and the various record key values used as desired, for example:
use framework "Foundation"
use scripting additions
set theURL to "https://api.coingecko.com/api/v3/coins/markets?ids=acala,alpaca-finance,altair,astar,avalanche-2,baanx,bbeefy-finance,bifrost-native-coin,binancecoin,binance-eth,binance-usd,bitcoin,cardano,chainlink,chainx,chronicle,coin-capsule,colony,cosmos,crabada,crypto-com-chain,cumrocket,curve-dao-token,dappradar,dogecoin,elrond-erd-2,ergo,ethereum,evmos,exeedme,fantom,ftx-token,fuse-network-token,genshiro,green-satoshi-token,havven,hooked-protocol,integritee,kadena,karura,kintsugi,kucoin-shares,kusama,kuswap,matic-network,metagame-arena,metagods,metavault,mina-protocol,moonbeam,moonpot,moonriver,nafty,near,osmosis,pancakeswap-token,paraswap,platypus-finance,pluton,polkadot,safemoon,shiba-inu,kryll,kucoin-shares,kusama,kuswap,lido-dao,matic-network,maze-token,memepad,metagame-arena,metagods,metis-token,mina-protocol,moonbeam,moonpot,moonriver,movn,nafter,nafty,near,nexo,nftlaunch,orion-protocol,osmosis,paid-network,pancakeswap-token,paraswap,platypus-finance,polkadot,polkamon,polkamarkets,polkastarter,polycat-finance,polychain-monsters,polygonfarm-finance,presearch,safemoon,safepal,shiba-inu,shiden,solana,staked-ether,staked-olympus,stasis-eurs,stepn,sushi,swissborg,switch,tether,terrausd,the-graph,the-sandbox,unifarm,uniswap,usd-coin,valkyrie-protocol,wizarre-scroll,zelcash&vs_currency=EUR"
set valReturned to (do shell script "/usr/bin/curl " & quoted form of theURL) -- string
set theResult to parseJSON from valReturned -- list of records
set output to "Current Prices:" & return -- header for the example
repeat with anItem in theResult -- the individual list items
    -- do whatever with the record key values
    set theName to |name| of anItem -- "name" is a keyword
    set theValue to current_price of anItem
    set output to output & theName & ": " & theValue & return
end repeat
return output
# Parse a JSON string into a data structure.
to parseJSON from sourceString given coercion:coerce : true
    if class of sourceString is not in {string} then error "parseJSON error: source is not a string"
    set theString to current application's NSString's stringWithString:sourceString
    set theData to theString's dataUsingEncoding:(current application's NSUTF8StringEncoding)
    set {theObject, theError} to current application's NSJSONSerialization's JSONObjectWithData:theData options:0 |error|:(reference)
    if theObject is missing value then error "parseJSON error: " & (theError's userInfo's objectForKey:"NSDebugDescription")
    if coerce then
        if (theObject's isKindOfClass:(current application's NSArray)) as boolean then
            return theObject as list
        else
            return theObject as record
        end if
    end if
    return theObject -- leave as NSArray or NSDictionary
end parseJSON

Related

Write a custom JSON interpreter for a file that looks like JSON but isn't, using Python

What I need to do is write a module that can read and write files that use the PDX script language. This language looks a lot like JSON but has enough differences that a custom encoder/decoder is needed to do anything with those files (without a mess of regex substitutions, which would make maintenance hell). I originally went with reading them as text files and using regex to find and replace things to convert them to valid JSON. That led me to my current point, where any addition to the code requires me to write far more code than I would want to, just to support some small new thing. With a custom JSON-style parser I could write code that defines what the valid key:value pairs are, then use that to handle the files. To me that is a lot less code and a lot easier to maintain.
So what does this code look like? In general it looks like this (I tried to include all possible syntax; this is not an example of a working file):
#key = value # this is the definition for the scripted variable
key = {
    # This is a comment. No multiline comments
    function # This is a single key, usually optimize_memory
    # These are the accepted key:value pairs. The quoted version is being phased out
    key = "value"
    key = value
    key = #key # This key is using a scripted variable, defined either in the file it's in or in the `scripted_variables` folder (see above for how these are initially defined)
    # type is what the key type is, like trigger:planet_stability where planet_stability is a trigger
    key = type:key
    # Variables like this allow custom names to be set. Mostly used for flags and such things
    [[VARIABLE_NAME]
        math_key = $VARIABLE_NAME$
    ]
    # This is inline math; I don't actually understand how this works in the script language yet as it's new. The "<" can be replaced with any math symbol.
    # Valid example: planet_stability < #[ stabilitylevel2 + 10 ]
    key < #[ key + 10 ]
    # This is used a lot to handle code blocks. Valid example:
    # potential = {
    #     exists = owner
    #     owner = {
    #         has_country_flag = flag_name
    #     }
    # }
    key = {
        key = value
    }
    # This is just a list. Inline brackets are used a lot, which annoys me...
    key = { value value }
}
The major differences between JSON and PDX script are the nearly complete lack of quotation marks, the use of an equals sign instead of a colon as a separator, and no commas at the ends of lines. Now, before you ask me to change the PDX code: I can't. It's not mine. This is what I have to work with and I can't make any changes to the syntax. And no, I don't want to convert back and forth; as I have already mentioned, that would require a lot of work. I have tried to look for examples of this, but all I can find are references to converting already valid JSON into a Python object, which is not what I want. So I can't give any examples of what I have already done, as I can't find anywhere to even start.
Some additional info:
The order of key:value pairs does not technically matter; however, they are expected to be in a certain order, and when they are not, it causes issues with mods and conflict solvers
Boolean properties always use yes or no rather than true or false
Lowercase is expected and in some cases required
Math operators are used as separators as well, e.g. >=, <=, etc.
The list of syntax is not exhaustive, but should contain most of the syntax used in the language
Past work:
My previous attempts all revolved around converting the text file to a JSON file. It was a lot of work just to get a small piece of it working.
Example:
potential = {
    exists = owner
    owner = {
        is_regular_empire = yes
        is_fallen_empire = no
    }
    NOR = {
        has_modifier = resort_colony
        has_modifier = slave_colony
        uses_habitat_capitals = yes
    }
}
And here is what I did to get most of the way to JSON (I couldn't find a way to add the quotes):
test_string = test_string.replace("\n", ",")
test_string = test_string.replace("{,", "{")
test_string = test_string.replace("{", "{\n")
test_string = test_string.replace(",", ",\n")
test_string = test_string.replace("}, ", "},\n")
test_string = "{\n" + test_string + "\n}"
# Replace the equals sign with a colon
test_string = test_string.replace(" =", ":")
This resulted in this:
{
    potential: {
        exists: owner,
        owner: {
            is_regular_empire: yes,
            is_fallen_empire: no,
        },
        NOR: {
            has_modifier: resort_colony,
            has_modifier: slave_colony,
            uses_habitat_capitals: yes,
        },
    }
}
Very, very close, yes, but I could not find a way to add the quotation marks to each word (I think I did try a regex sub, but wasn't able to get it to work, since the whole thing is just one unbroken string), which left this attempt stuck and also shows just how much work is required to get even a very simple potential block mostly working. However, this is not the method I want anymore: one, because it's a lot of work, and two, because I couldn't find anything to finish it. So a custom JSON-style interpreter is what I want.
The classical approach (potentially leading to more code, but also more "correctness"/elegance) is probably to build a "recursive descent parser": a bunch of conditionals/checks, loops, and (sometimes recursive) functions/handlers that deal with each of the elements/characters encountered on the input stream. An implicit parse/call tree might be sufficient if you directly output/print the JSON equivalent; otherwise you could also create a representation/model in memory for later output/conversion. A minimal sketch follows below.
A related book recommendation would be "Language Implementation Patterns" by Terence Parr (I'm avoiding promoting my own interpreters and introductory materials :-) ). If you need further help, feel free to write me.
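To make the recursive descent idea concrete, here is a minimal, untested Python sketch for a simplified subset of the syntax (bare words, key = value pairs, nested { } blocks, and # comments). The token rules and the dict/list output shape are my assumptions for illustration, not the full PDX grammar; scripted variables, the $VAR$ and inline-math forms, and duplicate keys are left out.

import re

# Token kinds: comments (dropped), braces, '=', quoted strings, bare words.
TOKEN_RE = re.compile(r'#[^\n]*|[{}=]|"[^"]*"|[^\s{}=]+')

def tokenize(text):
    return [t for t in TOKEN_RE.findall(text) if not t.startswith('#')]

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_block(self):
        # A block is a sequence of `key = value` pairs and/or bare words,
        # terminated by '}' or end of input. Duplicate keys simply overwrite
        # each other here; a fuller version would merge them into lists.
        pairs = {}
        bare = []
        while self.peek() not in (None, '}'):
            token = self.next()
            if self.peek() == '=':
                self.next()                  # consume '='
                pairs[token] = self.parse_value()
            else:
                bare.append(token)           # lone key or list element
        return pairs if pairs else bare

    def parse_value(self):
        tok = self.next()
        if tok == '{':
            value = self.parse_block()       # recurse into the nested block
            assert self.next() == '}', 'expected closing brace'
            return value
        return tok.strip('"')

def parse(text):
    return Parser(tokenize(text)).parse_block()

Running parse() on the potential example above yields nested dictionaries, and key = { value value } comes back as a plain list; handling repeated keys (like the two has_modifier lines) and the remaining operators would be the next step.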

Key 'boot_num' is not recognized when being interpreted from a .JSON file

Currently, I am working on a boot sequence in Python for a larger project. For this specific part of the sequence, I need to access a .JSON file (specs.json) and load it as a dictionary in the main program. I then need to take a value from the .JSON file and add 1 to it, using its key to find the value. Once that's done, I need to push the changes back to the .JSON file. Yet every time I run the code below, I get the error:
bootNum = spcInfDat['boot_num']
KeyError: 'boot_num'
Here's the code I currently have:
(Note: I'm using the Python json library, and have imported dumps, dump, and load.)
# Opening of the JSON files
spcInf = open('mki/data/json/specs.json',) # .JSON file that contains the current system's specifications. Not quite needed, but it may make a nice reference?
spcInfDat = load(spcInf)
This code is later followed by this, where I attempt to assign the value to a variable by using its dictionary key (the for statement was a debug statement, so I could see the key):
for i in spcInfDat['spec']:
    print(CBL + str(i) + CEN)

# Locating and increasing the value of bootNum.
bootNum = spcInfDat['boot_num']
print(str(bootNum))
bootNum = bootNum + 1
(Another Note: CBL and CEN are just variables I use to colour text I send to the terminal.)
This is the interior of specs.json:
{
    "spec": [
        {
            "os": "name",
            "os_type": "getwindowsversion",
            "lang": "en",
            "cpu_amt": "cpu_count",
            "storage_amt": "unk",
            "boot_num": 1
        }
    ]
}
I'm relatively new to .JSON files, as well as to the Python json library; I only have experience with them through some GeeksforGeeks tutorials I found. There is a rather good chance that I just don't know how .JSON files work in conjunction with the library, but I figure it's still worth a shot to check here. The GeeksforGeeks tutorial had no documentation about this, and I know very little about how this works, so I'm lost. I've tried searching here and have found nothing.
Issue Number 2
Now, the prior part works. But when I attempt to run the code in the following lines:
# Changing the values of specDict.
print(CBL + "Changing values of specDict... 50%" + CEN)
specDict = {
    "os": name,
    "os_type": ost,
    "lang": "en",
    "cpu_amt": cr,
    "storage_amt": "unk",
    "boot_num": bootNum
}
# Writing the product of makeSpec to `specs.json`.
print(CBL + "Writing makeSpec() result to `specs.json`... 75%" + CEN)
jsonobj = dumps(specDict, indent = 4)
with open('mki/data/json/specs.json', "w") as outfile:
    dump(jsonobj, outfile)
I get the error:
TypeError: Object of type builtin_function_or_method is not JSON serializable.
Is there a chance that I set up my dictionary incorrectly, or am I using the dump function incorrectly?
You can show the data using:
print(spcInfDat)
This shows it to be a dictionary, whose single entry 'spec' holds an array, whose zeroth element is a sub-dictionary, whose 'boot_num' entry is an integer.
{'spec': [{'os': 'name', 'os_type': 'getwindowsversion', 'lang': 'en', 'cpu_amt': 'cpu_count', 'storage_amt': 'unk', 'boot_num': 1}]}
So what you are looking for is
boot_num = spcInfDat['spec'][0]['boot_num']
and note that the value obtained this way is already an integer. str() is not necessary.
It's also good practice to guard against file format errors so the program handles them gracefully.
try:
    boot_num = spcInfDat['spec'][0]['boot_num']
except (KeyError, IndexError):
    print('Database is corrupt')
Issue Number 2
"Not serializable" means there is something somewhere in your data structure that is not an accepted type and can't be converted to a JSON string.
json.dump() only processes certain types, such as strings, dictionaries, and integers. That restriction applies to all of the objects nested within sub-dictionaries, sub-arrays, etc. See the documentation for json.JSONEncoder for the complete list of allowable types.
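As a rough, hedged sketch of a fix (guessing at the intent, since the definitions of name, ost and cr aren't shown in the question): if any of those variables hold a function object such as os.cpu_count rather than its result, json.dump() cannot serialize it, so call the function first. Passing the dict straight to dump() also avoids double-encoding it by calling dumps() beforehand.

import os
from json import load, dump

path = 'mki/data/json/specs.json'

# Read the current specs and increment the boot counter.
with open(path) as f:
    spcInfDat = load(f)
bootNum = spcInfDat['spec'][0]['boot_num'] + 1

specDict = {
    "os": os.name,              # a string such as 'posix' or 'nt'
    "os_type": "unk",           # placeholder; sys.getwindowsversion() is Windows-only
    "lang": "en",
    "cpu_amt": os.cpu_count(),  # an int, not the os.cpu_count function object
    "storage_amt": "unk",
    "boot_num": bootNum
}

# dump() serializes the dict directly; calling dumps() first and then dump()
# on the resulting string would wrap the JSON in an extra layer of quoting.
with open(path, 'w') as f:
    dump({"spec": [specDict]}, f, indent=4)  # keep the original file shape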

Cannot remove single quotes from string

I've got a super weird problem: I have a list of strings which looks like this:
batch = ['{"SageMakerOutput":[{"label":"LABEL_8","score":0.9152628183364868},"inputs":"test"}',
'{"SageMakerOutput":[{"label":"LABEL_8","score":0.9769203066825867},"inputs":"Alles OK"}',
'{"SageMakerOutput":[{"label":"LABEL_8","score":0.9345938563346863},"inputs":"F"}']
In each entry of the list I want to remove the single quotes "'", but somehow I cannot remove them with .replace():
for line in batch:
    line = line.replace("'","")
I don't get it.
Alright, so judging by your comments and your other post, it seems like what you have are strings, and what you want are dictionaries. I've copied your data from the other post, because your data isn't correct in this post (you're missing a ] in this post).
Solution
import json
batch = ['{"SageMakerOutput":[{"label":"LABEL_8","score":0.9152628183364868}],"inputs":"test"}',
'{"SageMakerOutput":[{"label":"LABEL_8","score":0.9769203066825867}],"inputs":"Alles OK"}']
batch = [json.loads(b) for b in batch]
Output:
[{'SageMakerOutput': [{'label': 'LABEL_8', 'score': 0.9152628183364868}],
'inputs': 'test'},
{'SageMakerOutput': [{'label': 'LABEL_8', 'score': 0.9769203066825867}],
'inputs': 'Alles OK'}]
Explanation
Those objects in batch are what are known as JSON objects. They're just strings with a very specific structure. They are analogous to Python's dict type with some very minor differences and can very easily be converted to Python dict objects using Python's built-in json module, which automatically translates those minor differences between JSON and Python (e.g., booleans in JSON strings are true and false, but in Python they need to be True and False).
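For instance, a tiny illustration of that translation (not taken from the question's data):

import json

# json.loads() turns a JSON string into the corresponding Python objects,
# translating JSON's lowercase true/false/null into Python's True/False/None.
print(json.loads('{"ok": true, "count": 2, "note": null}'))
# -> {'ok': True, 'count': 2, 'note': None}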
Notes
Advice to OP and future readers: this post is a classic case of the XY problem. In the future, try to be more clear about your end goal. Your question in this post isn't actually answerable because it's impossible to do what you were asking.

Coding Python to handle JSON array inconsistencies

I've been playing with Python grabbing and parsing data from JSON API's. Specifically, I am working with the CTA (Chicago Transit Authority) Train Tracker API.
I periodically receive a TypeError: string indices must be integers, which I tracked down to the difference between when an array of multiple 'train' runs exists versus a single 'train' run. The single run is not wrapped in an array of runs.
{'ctatt':
    {'tmst': '2018-03-05T01:59:10',
     'errCd': '0',
     'errNm': None,
     'route': [{'#name': 'g'},
               {'#name': 'y',
                'train': {'rn': '030',
                          .....
                          'heading': '302'},
               {'#name': 'blue',
                'train': [{'rn': '125',
                           .....
                           'heading': '302'},
                          {'rn': '127',
                           .....
                           'heading': '278'},
The 'g' route has no instances of runs.
The 'y' route has 1 run.
'train': {'rn':}
the 'blue' route has multiple runs.
'train': [{'rn': ...},{'rn': ...},{'rn': ...}]
The code I'm using to parse this handles both the lack of runs and multiple runs; it hits the TypeError with a single run.
for train_rt in trains_data['ctatt']['route']:
    line_name = train_rt['#name']
    if train_rt.get('train', 'None') != 'None':
        for train_run in train_rt['train']:
What is the best way to handle just a single run that's not in the array?
[Screenshot: 2 Yellow Line runs in Chrome Dev Tools > Network > Preview]
[Screenshot: 1 Yellow Line run in Chrome Dev Tools > Network > Preview]
An inconsistency I noticed: if I query a single route, the routes are still in an array of one route.
You have two options:
explicitly test for a list or dictionary, with isinstance()
put your access in a try:...except and catch the TypeError, then proceed to treat it as a single element.
It won't matter all that much which one you pick (though there can be a performance difference); pick the style you feel works best for your code.
For example, if you used an isinstance() test, you could add a list around the single element so the rest of your code doesn't have to change:
for train_rt in trains_data['ctatt']['route']:
    line_name = train_rt['#name']
    train_runs = train_rt.get('train', [])
    if not isinstance(train_runs, list):
        # single entry, wrap
        train_runs = [train_runs]
    for train_run in train_runs:
        # ...
Note that if the 'train' key is missing, the above code again normalises by using an empty list. That allows you to avoid another if test, because now the for loop will not iterate at all.
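For completeness, a rough sketch of the second option (catching the TypeError); trains_data is the parsed API response from the question. A single run is a dict, so iterating it yields key strings, and indexing one of those with ['rn'] raises the TypeError, which is the cue to treat the whole thing as one run:

for train_rt in trains_data['ctatt']['route']:
    line_name = train_rt['#name']
    train_runs = train_rt.get('train', [])
    try:
        run_numbers = [train_run['rn'] for train_run in train_runs]
    except TypeError:
        # single entry: fall back to treating train_runs as one run
        run_numbers = [train_runs['rn']]
    for rn in run_numbers:
        print(line_name, rn)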
If you have a support contact for that API, I'd at least report the issue and point out that their data structure is inconsistent.

Printing new line in RTF file made with Python

I have made a GUI program where I can enter some values in various fields. All info from these fields is then combined into a dictionary, which is then stored using the shelve module. With the push of a button, I can then export all dictionary entries into an RTF file, as I want parts of the file formatted in italics.
The GUI and shelve parts of the program work just fine. The problem I'm having is exporting multiple lines to the RTF file. When I print the strings I want to write to the RTF file in the Python shell, I get multiple lines. But when I export them to RTF, everything is printed on one line. I know this would usually be fixed by adding a \n to the string, but that hasn't worked for me in any way. Can anyone tell me what I'm doing wrong, or suggest a workaround where I can still use italics in the saved text?
As far as a working example goes:
data = dict()
data['first'] = {'author': 'Kendall MA',
                 'year': '1987',
                 'title': 'This is a test title'}
data['second'] = {'author': 'Mark',
                  'year': '2014',
                  'title': 'It is not working correctly'}

rtf = open('../test.rtf', 'w')
rtf.write(r'{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fswiss\fcharset0 Cambria;}}')

for key in data.keys():
    entry = data[key]
    rtf.write(r'{0} ({1}): \i {2} \i0'.format(entry['author'], entry['year'], entry['title']) + '\n')

rtf.write(r'}\n\x00')
rtf.close()
The output this code gives is:
Mark (2014): It is not working correctly Kendall (1987): This is a test title
While it should be:
Mark (2014): It is not working correctly
Kendall (1987): This is a test title
EDIT:
I found out that the combination of \line and \par works. Using them separately does not, for some reason that is unclear to me (maybe somebody can explain?).
But a new problem occurred. When the author is in fact multiple authors, which I enter as a list (['Kendall MA', 'Powsen RB']) and then turn into a single string using ', '.join(entry['author']), the first word gets cut off. So I get 'MA, Powsen RB' instead of 'Kendall MA, Powsen RB'. Does anyone know why, and how to counter it?
\n has no special meaning in RTF. If you want to output a line break, you will need to use r'\line' or r'\par' for a paragraph break instead of (or in addition to, for readability) \n.
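As an illustration of that advice, here is a lightly adjusted version of the question's loop (only the write call changes; whether \line alone suffices or, as the edit above found, \line plus \par is needed may depend on the RTF reader):

for key in data.keys():
    entry = data[key]
    # \line is an RTF line break (\par is a paragraph break); the trailing '\n'
    # only makes the generated file easier to read and has no effect on the
    # rendered output.
    rtf.write(r'{0} ({1}): \i {2} \i0 \line'.format(
        entry['author'], entry['year'], entry['title']) + '\n')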
