What is the difference between loads/dumps and reading a JSON file directly? - python

I have a simple JSON file, 'stackoverflow.json':
{
"firstname": "stack",
"lastname": "overflow"
}
What is the difference between the two functions below?
import ujson
def read_json_file_1():
    with open('stackoverflow.json') as fs:
        # Parse the file into a Python object, then serialize it back to a JSON string
        payload = ujson.loads(fs.read())
        payload = ujson.dumps(payload)
    return payload
def read_json_file_2():
    with open('stackoverflow.json') as fs:
        # Return the raw file content without parsing it
        return fs.read()
Then I use the 'requests' module to send a POST request with the payload from each of the two functions above, and it works for both.
Thanks.

The 'loads' function takes a JSON string and converts it into a dictionary or list, depending on the content.
The 'dumps' function takes a Python data structure and converts it back into a JSON string.
So your first function loads and validates the JSON, converts it into a Python structure, and then converts it back to JSON before returning, whereas your second function just reads the content of the file, with no conversion or validation.
The functions are therefore only equivalent if the JSON is valid. If the file contains error-free JSON, the two functions return equivalent output. If the JSON is broken, the first function fails with a relevant traceback, while the second raises no error at all and simply returns the broken text.
The first function is more error friendly but less efficient (it converts JSON -> Python -> JSON). The second function is much more efficient but much less error friendly (it won't fail if the JSON is broken).
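The difference described above can be sketched without touching the file system. This is a minimal illustration, not the asker's code: it uses the standard library json module in place of ujson, and the names roundtrip, passthrough, valid and broken are made up for the example.

```python
import json

def roundtrip(text):
    """Parse the text as JSON, then serialize it again (validates on the way)."""
    return json.dumps(json.loads(text))

def passthrough(text):
    """Return the text untouched, with no parsing or validation."""
    return text

valid = '{\n"firstname": "stack",\n"lastname": "overflow"\n}'
broken = '{"firstname": "stack",'  # truncated on purpose

# Both results describe the same data, but the text differs in whitespace.
same_data = json.loads(roundtrip(valid)) == json.loads(passthrough(valid))
same_text = roundtrip(valid) == passthrough(valid)

# Only the round trip notices that the broken input is not JSON.
try:
    roundtrip(broken)
    roundtrip_failed = False
except ValueError:
    roundtrip_failed = True

passthrough_result = passthrough(broken)  # no error, broken text is returned
```

So "equivalent" here means the same data once parsed; the round trip normalizes whitespace, so the two strings are not necessarily identical character for character.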

Related

How to custom indent JSON?

I want to be able to indent a JSON file so that the first key is on the same line as the opening bracket. By default, json.dump() puts it on a new line.
For example, if the original file was:
[
    {
        "statusName": "CO_FILTER",
        "statusBar": {}
I want it to start like this:
[
    { "statusName": "CO_FILTER",
      "statusBar":{}
I believe that if you want a custom format like this, you will most likely have to write it yourself or extend the JSON library, as the Google JSON formatter and many others do. It is not a setting you can change on demand in Python's default json module, so it would have to come from another JSON library. Maybe search for a Python JSON library that dumps JSON the way you like.
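One way to "add onto" the default library is to post-process its output. The following is only a sketch under that assumption: the helper name dumps_first_key_inline is hypothetical, and it applies the layout to every non-empty object, nested ones included. It relies on the fact that newlines inside JSON strings are always escaped, so a literal newline right after an opening brace can only be structural whitespace.

```python
import json
import re

def dumps_first_key_inline(obj, indent=2):
    """Serialize with json.dumps, then pull the first key of every
    non-empty object up onto the line of its opening brace."""
    text = json.dumps(obj, indent=indent)
    # "{" followed by a real newline is always structural in json.dumps output
    return re.sub(r"\{\n\s*", "{ ", text)

data = [{"statusName": "CO_FILTER", "statusBar": {}}]
formatted = dumps_first_key_inline(data)
print(formatted)
```

The result is still valid JSON, so json.loads(formatted) returns the original data. The remaining keys keep the indentation that json.dumps gave them.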

How to use wiktextract

I am trying to extract a Wiktionary xml file from their dumps using the wiktextract python module. However their website does not give me enough information. I could not use the command line program that comes with it since it isn't a Windows executable, so I tried the programmatic way. The following code takes a while to run so it seems to be doing something but then I'm not sure what to do with the ctx variable. Can anyone help me?
import wiktextract
def word_cb(data):
    print(data)
ctx = wiktextract.parse_wiktionary(
    r'myfile.xml', word_cb,
    languages=["English", "Translingual"])
You are on the right track, but don't have to worry too much about the ctx object.
As the documentation says:
The parse_wiktionary call will call word_cb(data) for words and redirects found in the Wiktionary dump. data is information about a single word and part-of-speech as a dictionary (multiple senses of the same part-of-speech are combined into the same dictionary). It may also be a redirect (indicated by presence of a redirect key in the dictionary).
The output ctx object mostly contains summary information (the number of sections processed, etc.); you can use dir(ctx) to see some of its fields.
The useful results are not the ones in the returned ctx object, but the ones passed to word_cb on a word-by-word basis. So you might just try something like the following to get a JSON dump from a wiktionary XML dump. Because the full dumps are many gigabytes, I put a small one on a server for convenience in this example.
import json
import wiktextract
import requests
xml_fn = 'enwiktionary-20190220-pages-articles-sample.xml'
print("Downloading XML dump to " + xml_fn)
response = requests.get('http://45.61.148.79/' + xml_fn, stream=True)
# Throw an error for bad status codes
response.raise_for_status()
# Stream the download to disk in 4096-byte blocks
with open(xml_fn, 'wb') as handle:
    for block in response.iter_content(4096):
        handle.write(block)
print("Downloaded XML dump, beginning processing...")
fh = open("output.json", "wb")
def word_cb(data):
    # The file is open in binary mode, so encode the JSON text before writing.
    # Entries are written back to back, with no separator between them.
    fh.write(json.dumps(data).encode("utf-8"))
ctx = wiktextract.parse_wiktionary(
    xml_fn, word_cb,
    languages=["English", "Translingual"])
print("{} English entries processed.".format(ctx.language_counts["English"]))
print("{} bytes written to output.json".format(fh.tell()))
fh.close()
For me this produces:
Downloading XML dump to enwiktionary-20190220-pages-articles-sample.xml
Downloaded XML dump, beginning processing...
684 English entries processed.
326478 bytes written to output.json
That is with the small sample dump mentioned above. It will take much longer to run on the full dump.
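The callback pattern itself can be tried without wiktextract or a dump file. The sketch below is an illustration only: fake_parse is a hypothetical stand-in for the real parser, the "word" and "pos" keys are invented sample data, and only the "redirect" key comes from the documentation quoted above. It writes one JSON object per line, which makes the output easy to read back.

```python
import io
import json

def make_word_cb(out):
    """Return a callback that writes each non-redirect entry as one JSON line."""
    def word_cb(data):
        if "redirect" in data:
            return  # skip redirects, keep only real entries
        out.write(json.dumps(data) + "\n")
    return word_cb

def fake_parse(entries, word_cb):
    """Hypothetical stand-in for the parser: calls word_cb once per entry."""
    for entry in entries:
        word_cb(entry)

# Invented sample data, shaped loosely like the entries described above
sample_entries = [
    {"word": "cat", "pos": "noun"},
    {"word": "kitty-cat", "redirect": "cat"},
    {"word": "run", "pos": "verb"},
]

buffer = io.StringIO()
fake_parse(sample_entries, make_word_cb(buffer))

# Reading the output back: one json.loads call per line
records = [json.loads(line) for line in buffer.getvalue().splitlines()]
```

With a real file, the StringIO buffer would be replaced by a file opened in text mode.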

json.load changing the string that is input

Hi I am working on a simple program that takes data from a json file (input through an html form with flask handling the data) and uses this data to make calls to an API.
So I have some JSON like this:
[{"id": "ßLÙ", "server": "NA"}]
and I want to send the id to an api call like this example:
http://apicallnamewhatever+id=ßLÙ
However, when I load the JSON file into my app.py with the following command
ids = json.load(open('../names.json'))
json.load seems to alter the id from 'ßLÙ' to 'ÃŸLÃ™'.
I'm not sure why this happens during json.load, but I need to find a way to get 'ßLÙ' into the API call instead of the deformed 'ÃŸLÃ™'.
It looks as if your names.json is encoded in "utf-8", but you are opening it as "windows-1252" [*] or something like that. Try
json.load(open('names.json', encoding="utf-8"))
You should probably also URL-encode the id instead of concatenating it directly with the server address, something along these lines (Python 3):
urllib.parse.quote(idExtractedFromJson.encode("utf-8"))
[*] Thanks @jDo for pointing that out, I initially guessed the wrong codepage.
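The diagnosis can be reproduced in a few lines. This is a sketch assuming Python 3: it writes the sample JSON to a temporary file as UTF-8, reads it back once with the wrong codec ("cp1252", Python's name for windows-1252) and once with the right one, then URL-encodes the correct id. The variable names are made up for the example.

```python
import json
import os
import tempfile
from urllib.parse import quote

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "names.json")
    # Write the sample data as UTF-8, as the original file presumably was
    with open(path, "w", encoding="utf-8") as f:
        f.write('[{"id": "ßLÙ", "server": "NA"}]')
    # Reading UTF-8 bytes as windows-1252 produces the garbled id
    with open(path, encoding="cp1252") as f:
        wrong = json.load(f)[0]["id"]
    # Reading with the matching encoding returns the id unchanged
    with open(path, encoding="utf-8") as f:
        right = json.load(f)[0]["id"]

# quote() percent-encodes the UTF-8 bytes of the id for use in a URL
encoded = quote(right)
print(wrong, right, encoded)
```

Passing encoding explicitly matters because the default used by open() depends on the platform's locale settings.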

loading Behave test results in JSON file in environment.py's after_all() throws error

I'm trying to send my Behave test results to an API Endpoint. I set the output file to be a new JSON file, run my test, and then in the Behave after_all() send the JSON result via the requests package.
I'm running my Behave test like so:
args = ['--outfile=/home/user/nathan/results/behave4.json',
        '--format=json.pretty']
from behave.__main__ import main as behave_main
behave_main(args)
In my environment.py's after_all(), I have:
def after_all(context):
    data = json.load(open('/home/user/myself/results/behave.json', 'r'))  # This line causes the error
    sendoff = {}
    sendoff['results'] = data
    r = requests.post(MyAPIEndpoint, json=sendoff)
I'm getting the following error when running my Behave test:
HOOK-ERROR in after_all: ValueError: Expecting object: line 124 column 1 (char 3796)
ABORTED: By user.
The reported error is here in my JSON file:
[
{
...
} <-- line 124, column 1
]
However, behave.json is outputted after the run and according to JSONLint it is valid JSON. I don't know the exact details of after_all(), but I think the issue is that the JSON file isn't done writing by the time I try to open it in after_all(). If I try json.load() a second time on the behave.json file after the file is written, it runs without error and I am able to view my JSON file at the endpoint.
Any better explanation as to why this is happening? Any solution or change in logic to get past this?
Yes, it seems as though the file is still in the process of being written when I try to access it in after_all(). I put in a small delay before I open the file in my code, then I manually viewed the behave.json file and saw that there was no closing ] after the last }.
That explains it. I will create a new question to find out how to get around this, or whether a change in logic is required.
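The failure mode is easy to reproduce in isolation. The sketch below is an illustration with invented data (the "status" key is hypothetical, not Behave's actual output format): a JSON array whose closing ] has not been written yet fails to parse, while the finished text loads fine. It only demonstrates the symptom and is not a fix for the timing problem.

```python
import json

complete = '[\n  {"status": "passed"}\n]'
truncated = '[\n  {"status": "passed"}\n'  # closing ] not written yet

def try_load(text):
    """Return the parsed data, or None if the text is not complete, valid JSON."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None

finished = try_load(complete)
unfinished = try_load(truncated)
```

Here finished holds the parsed list and unfinished is None, which matches what was observed: the same file parses once the writer has emitted the final bracket.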

Retrieving a JSON valid string from a python requests.post()

I am extremely confused after trying a few possible solutions and getting various errors that just lead me in circles. I have a function that will grab a tweet, put it in a dictionary, then write that dictionary to a file using dumps like so:
jsonFile = {}
jsonFile["tweet"] = tweet
jsonFile["language"] = language
with open('jsonOutputfile.txt', 'w') as f:
    json.dump(jsonFile, f)
I then have another python file that has a function that will return the value of this jsonOutputfile.txt if I want to use it elsewhere. I do that like so:
with open('jsonOutputfile.txt') as f:
    jsonObject = json.load(f)
return jsonObject
This function sits on my localhost. The above two functions that have to do with saving and retrieving the JSON file are separate from the rest of my functions below, as I want them to be.
I have another function that will retrieve the values of the returned status using python requests, like so:
def grab_tweet():
    return requests.post("http://gateway:8080/function/twittersend")
and then after grabbing the tweet I want to manipulate it, and I want to do so using the JSON that I should have received from this request.
r = grab_tweet()
data = json.dumps(r.text)
return data.get('tweet')
I want this function above to return just the value that is associated with the tweet key in the JSON that I received from when I saved and loaded it. However, I keep on getting the following error: AttributeError: 'str' object has no attribute 'get' which I am confused about because from my understanding using json.dumps() should create a JSON valid string that I can call get on. Is there an encoding error when I am transferring this to and from a file, or maybe when I am receiving my request?
Any help is appreciated.
EDIT:
Here is a sample of a response from my requests.post when I use r.text, it also looks like there is some Unicode in the response so I put an example at the end of the tweet section. (This also doesn't look like a JSON which is what my question is centered around. There should at least be double quotes and no U's right?):
{u'tweet': u'RT THIS IS THE TWEET BLAH BLAH\u2026', u'language': u'en'}
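Use the .json() method of the requests response to decode the body as JSON.
Ex:
data = r.json()
return data.get('tweet')
Note: json.dumps serializes a Python object to a string, so calling it on r.text just gives you another string, which has no .get method.
Edit as per comment: the sample response looks like the repr of a Python dict rather than JSON, so try using the ast module.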
Ex:
import ast
data = ast.literal_eval(r.text)
You will need to use the .json() method. See requests' documentation: JSON Response Content
Also, for future reference, rather than do
f.write(json.dumps(jsonFile))
You could simply use:
json.dump(jsonFile, f)
Same with using load instead of loads:
jsonObject = json.load(f)
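The points made in the answers above can be checked against the sample response from the question. This is a sketch assuming Python 3, with the response body pasted in as a plain string (body stands in for r.text, so no HTTP request is made).

```python
import ast
import json

# The sample body from the question: a Python 2 dict repr, not JSON
body = r"{u'tweet': u'RT THIS IS THE TWEET BLAH BLAH\u2026', u'language': u'en'}"

# json.dumps goes the wrong way: it turns the string into another string
dumped = json.dumps(body)

# json.loads rejects the body, because JSON requires double-quoted keys
try:
    json.loads(body)
    is_json = True
except ValueError:
    is_json = False

# ast.literal_eval safely evaluates the dict literal instead
data = ast.literal_eval(body)
tweet = data.get('tweet')

# Serializing the dict properly gives text that json.loads accepts
as_json = json.dumps(data)
roundtripped = json.loads(as_json)
```

A cleaner fix is usually on the sending side: if the function behind the endpoint returns json.dumps(jsonObject) rather than the dict's string form, r.json() works directly.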
