Python not parsing JSON API return prettily - python

I reviewed a handful of the questions related to mine and found this slightly unique. I'm using Python 2.7.1 on OS X 10.7. One more note: I'm more of a hacker than developer.
I snagged the syntax below from Python documentation to try to do a "Pretty Print:"
date = {}
data = urllib2.urlopen(url)
s = json.dumps(data.read(), sort_keys=True, indent=4)
print '\n'.join([l.rstrip() for l in s.splitlines()])
I expected using the rstrip / splitlines commands would expand out the calls like in the example.
Also, not sure if it's relevant, but when trying to pipe the output to python -mjson.tool, the reply is: No JSON object could be decoded
Here's a snippet of the cURL output I'm trying to parse:
{"data":[{"name":"Site Member","created_at":"2012-07-24T11:22:04-07:00","activity_id":"500ee7cbbaf02xxx8e011e2e",
And so on.
The main objective is to make this mess of data more legible so I can learn from it and start structuring some automatic scraping of data based on arguments. Any guidance to get me from green to successful is a huge help.
Thanks,
mjb

urllib2.urlopen().read() returns a string. Calling json.dumps() on a string only wraps it in quotes, so you need to parse it into a Python object with json.loads() first and then dump that object.
Modified code:
import json
import urllib2

data = urllib2.urlopen(url)
data_obj = json.loads(data.read())  # parse the JSON text into Python objects
s = json.dumps(data_obj, sort_keys=True, indent=4)  # serialize again, indented
print s
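To see the difference without calling the API, here is a minimal Python 3 sketch. The raw string is a made-up stand-in for the response body, not the real data from the question.

```python
import json

# Hypothetical sample standing in for the body returned by the API.
raw = '{"data": [{"name": "Site Member", "activity_id": "abc123"}]}'

# Dumping the raw string just wraps it in quotes and escapes it.
wrapped = json.dumps(raw, indent=4)

# Parsing first, then dumping, produces the indented layout.
pretty = json.dumps(json.loads(raw), sort_keys=True, indent=4)

print(wrapped)
print(pretty)
```

The first print shows a single quoted line, which is what the original code produced. The second shows one key per line.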

Related

JSON Dictionary in .txt value has None type as a value, can't import correctly?

I am working on a larger project that uses an API, and I am having issues writing and reading the JSON response as a dictionary in a .txt file. I believe the error is driven by the formatting of the dictionary, which is written with values such as None and datetime.date(2020, 7, 11).
What am I doing incorrectly when writing or reading this?
I read in by:
with open('./testing/txns.txt', 'r') as f:
    txns = f.read()
txns = txns.replace("'", "\"") # to solve for single quotes
Sample of what might be in 'txns.txt':
{'account_owner': None,
'amount': 6.33,
'authorized_date': datetime.date(2020, 12, 23)}
Error from reading:
json.decoder.JSONDecodeError: Expecting value: line 1 column 19 (char 18)
If you need more context - I am happy to provide that. I am trying to work with Plaid's API, specifically the Transactions endpoint and client.transactions_get(request). I successfully acquired transactions through this line, and I would now like to debug, do discovery, and develop with this sample data. I cannot write it and read it correctly and I think this is why :)
Update1:
This is my code now
response = client.transactions_get(request)
with open('txns.txt', 'w') as f:
    f.write(json.dumps(response.to_dict()))
Error: TypeError: Object of type date is not JSON serializable
Update2 - SOLVED
With the help of Alex & How to overcome "datetime.datetime not JSON serializable"? I figured it out:
f.write(json.dumps(response.to_dict(), default=str))
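A small sketch of what default=str does, and what comes back when the file is read again. The record below is a hypothetical dictionary shaped like the sample above, not an actual Plaid response.

```python
import datetime
import json

# Hypothetical record shaped like the sample in the question.
record = {
    "account_owner": None,
    "amount": 6.33,
    "authorized_date": datetime.date(2020, 12, 23),
}

# default=str is called for any object json cannot serialize natively.
text = json.dumps(record, default=str)

# Reading it back gives plain JSON types: None survives, the date is now a string.
loaded = json.loads(text)

# Convert the string back to a date explicitly if you need one.
authorized = datetime.date.fromisoformat(loaded["authorized_date"])
print(loaded, authorized)
```

Note that the round trip is lossy: dates come back as ISO-formatted strings, so any code that needs real date objects has to convert them after loading.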
Yep, the commenters are correct that the Plaid Python client library doesn't return this value as JSON. There's a section about this in the README at https://github.com/plaid/plaid-python#converting-the-response-to-a-json
Converting the response to a JSON
As this is a common question, we've
included this in the README. plaid-python uses models like
TransactionsGetResponse to encapsulate API responses. If you want to
convert this to a JSON, do something like this:
import json
...
response = ... # type TransactionsGetResponse
# to_dict makes it first a python dictionary, and then we turn it into a
# string JSON.
json_string = json.dumps(response.to_dict())

Python parsing JSON from url incomplete

I am trying to get JSON from a URL. I have succeeded in doing that, but when I print the result it seems like I get only half of the content.
This is what I wrote:
with urllib.request.urlopen("https://financialmodelingprep.com/api/v3/company/stock/list") as url:
    data = json.loads(url.read().decode())
print(data)
I won't print what I am getting because it's very long (if you look at the source and use Ctrl+F to look up the sentence, you will find that it's in the middle of the page), but it starts with
ties Municipal Fund', 'price': 9.83,
Thanks for the help.
I think that your output has been cut off by your IDE.
When I run the same request and write the result to a file, I can see that the data is fully written:
import requests
with requests.get("https://financialmodelingprep.com/api/v3/company/stock/list") as url:
    data = url.content.decode()
with open("file.txt","w") as f:
    f.write(data)
When the response is large, the first read may only return what is available at that time. If you want to be sure to have the full data, you should loop reading until you get nothing (an empty byte string here).
Here, it would probably be more robust to pass the response stream to json.load and let it read until the json string is completed:
with urllib.request.urlopen(
        "https://financialmodelingprep.com/api/v3/company/stock/list") as url:
    data = json.load(url)
print(data)
But I think that if the data was partial, json.loads should have choked on it. So the problem is likely elsewhere...
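Both points can be checked offline with a rough sketch. The payload here is invented for illustration, and io.BytesIO stands in for the response stream.

```python
import io
import json

# Hypothetical payload standing in for the HTTP response body.
payload = b'[{"name": "Example Fund", "price": 9.83}]'

# json.load reads the file-like object to the end before parsing.
data = json.load(io.BytesIO(payload))

# A body cut off part-way is not valid JSON, so parsing fails loudly.
try:
    json.loads(payload[: len(payload) // 2])
    truncated_ok = True
except json.JSONDecodeError:
    truncated_ok = False

print(data, truncated_ok)
```

Since a truncated body raises an error rather than returning half of the data, a successful json.loads suggests the data arrived complete and only the display was cut.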

Retrieving a JSON valid string from a python requests.post()

I am extremely confused after trying a few possible solutions and getting various errors that just lead me in circles. I have a function that will grab a tweet, put it in a dictionary, then write that dictionary to a file using dumps like so:
jsonFile = {}
jsonFile["tweet"] = tweet
jsonFile["language"] = language
with open('jsonOutputfile.txt', 'w') as f:
    json.dump(jsonFile, f)
I then have another python file that has a function that will return the value of this jsonOutputfile.txt if I want to use it elsewhere. I do that like so:
with open('jsonOutputfile.txt') as f:
    jsonObject = json.load(f)
return jsonObject
This function sits on my localhost. The above two functions that have to do with saving and retrieving the JSON file are separate from the rest of my functions below, as I want them to be.
I have another function that will retrieve the values of the returned status using python requests, like so:
def grab_tweet():
    return requests.post("http://gateway:8080/function/twittersend")
and then after grabbing the tweet I want to manipulate it, and I want to do so using the JSON that I should have received from this request.
r = grab_tweet()
data = json.dumps(r.text)
return data.get('tweet')
I want this function above to return just the value that is associated with the tweet key in the JSON that I received from when I saved and loaded it. However, I keep on getting the following error: AttributeError: 'str' object has no attribute 'get' which I am confused about because from my understanding using json.dumps() should create a JSON valid string that I can call get on. Is there an encoding error when I am transferring this to and from a file, or maybe when I am receiving my request?
Any help is appreciated.
EDIT:
Here is a sample of a response from my requests.post when I use r.text, it also looks like there is some Unicode in the response so I put an example at the end of the tweet section. (This also doesn't look like a JSON which is what my question is centered around. There should at least be double quotes and no U's right?):
{u'tweet': u'RT THIS IS THE TWEET BLAH BLAH\u2026', u'language': u'en'}
Use the .json() method of the requests response to parse the body as JSON.
Ex:
data = r.json()
return data.get('tweet')
Note: json.dumps converts your response to a string object, which is why .get() fails.
Edit as per comment - Try using the ast module.
Ex:
import ast
data = ast.literal_eval(r.text)
You will need to use the .json() method. See requests' documentation: JSON Response Content
Also, for future reference, rather than do
f.write(json.dumps(jsonFile))
You could simply use:
json.dump(jsonFile, f)
Same with using load instead of loads:
jsonObject = json.load(f)
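The direction of each call is the crux here, so a short Python 3 sketch may help. Both strings are made-up samples modelled on the ones in the question. No request is made.

```python
import ast
import json

# Valid JSON text, as written by json.dump in the question.
json_text = '{"tweet": "RT THIS IS THE TWEET", "language": "en"}'

as_string = json.dumps(json_text)   # str -> another str (quoted and escaped)
as_dict = json.loads(json_text)     # str -> dict, which does have .get()

# Text that is a Python 2 repr of a dict, like the sample in the EDIT, is not JSON.
repr_text = "{u'tweet': u'RT THIS IS THE TWEET', u'language': u'en'}"
from_repr = ast.literal_eval(repr_text)

print(type(as_string).__name__, as_dict.get("tweet"), from_repr.get("language"))
```

If the endpoint really returns the repr form, the cleaner fix is usually on the sending side: return json.dumps(jsonObject) there so that r.json() works on the receiving side.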

Python reading responses and validating the result

Currently I can print out the result I get from the API, but I cannot alter or read it without first writing it to a text file.
Furthermore, I don't need all of the information that the API provides; it would be great if I could get only the match_id.
The response from the API: Result.
From the result I only need the match_id. Once I have it, I want to compare it against a list of strings (e.g. 3238829394, 3238829395 and more) to check whether any of the values match, and if one does, the system should alert me.
I have found a way of doing it by writing the results to a text file and then comparing that with the list that I have.
The code for getting the response:
import dota2api
import json
import requests
api = dota2api.Initialise("API_KEY")
hist = api.get_match_history_by_seq_num(start_at_match_seq_num=2829690055, matches_requested=1)
response = str(hist)
f = open('myfile.txt', 'w')
f.write(response)
f.close()
However I am hoping to find a faster and better way to do this process, as it is very time consuming and unstable. Thank you.
The API returns JSON, and the client hands it to you as Python dictionaries and lists, so you can access the data directly without writing it to a file and parsing it again.
The code will be something like this (sorry, I cannot copy and paste from that image to read the JSON properly):
for match in response['matches']:
    if is_similar(match['match_id']):  # is_similar stands for your own comparison
        do_something_cool_here()  # e.g. raise your alert
I think that should do what you need. If you post the response as text I can help you build the code properly, but I guess you get the idea of what I am trying to say :)
Hope it helps!
EDIT:
We talked privately, and this works:
import dota2api
import requests
api = dota2api.Initialise("API_KEY")
response = api.get_match_history_by_seq_num(start_at_match_seq_num=SEQ_NUM, matches_requested=1)
match_id_check = MATCH_ID
for match in response['matches']:
    if match_id_check == match['match_id']:
        print(match)
Replace API_KEY, SEQ_NUM and MATCH_ID with your own values.
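To check against a whole list of IDs rather than a single one, a set lookup is one option. This is only a sketch: the response dictionary below is a hypothetical stand-in with the same "matches" / "match_id" shape used above, and the other field and values are invented.

```python
# Hypothetical response shaped like the one used above: a dict with a "matches" list.
response = {
    "matches": [
        {"match_id": 3238829394, "match_seq_num": 2829690055},
        {"match_id": 1111111111, "match_seq_num": 2829690056},
    ]
}

# IDs to watch for; a set makes each membership test fast.
watched_ids = {3238829394, 3238829395}

# Keep only the match_id of every match that is on the watch list.
hits = [m["match_id"] for m in response["matches"] if m["match_id"] in watched_ids]

for match_id in hits:
    print("Watched match found:", match_id)
```

If your watch list holds strings, convert them first (for example with int()) so that the comparison is between values of the same type.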

requests - Python command line behavior differs from behavior when script is run

I'm trying to write a script that will input data I supply into a web form at a url I supply.
To start with, I'm testing it out by simply getting the html of the page and outputting it as a text file. (I'm using Windows, hence .txt.)
import sys
import requests
sys.stdout = open('html.txt', 'a')
content = requests.get('http://www.york.ac.uk/teaching/cws/wws/webpage1.html')
content.text
When I do this (i.e., the last two lines) on the python command line (>>>), I get what I expect. When I do it in this script and run it from the normal command line, the resulting html.txt is blank. If I add print(content) then html.txt contains only: <Response [200]>.
Can anyone elucidate what's going on here? Also, as you can probably tell, I'm a beginner, and I can't for the life of me find a beginner-level tutorial that explains how to use requests (or urllib[2] or selenium or whatever) to send data to webpages and retrieve the results. Thanks!
You want:
import sys
import requests
result = requests.get('http://www.york.ac.uk/teaching/cws/wws/webpage1.html')
if result.status_code == requests.codes.ok:
    with open('html.txt', 'a') as sys.stdout:
        print result.content
Requests returns an instance of type requests.Response. When you tried to print that, the __repr__ method was called, which looks like this:
def __repr__(self):
    return '<Response [%s]>' % (self.status_code)
That is where the <Response [200]> came from.
The requests.Response has a content attribute, which is an instance of str (or bytes in Python 3) that contains your HTML.
The text attribute is type unicode which may or may not be what you want. You mention in the comments that you saw a UnicodeDecodeError when you tried to write it to a file. I was able to replace the print result.content above with print result.text and I did not get that error.
If you need help solving your unicode problems, I recommend reading this unicode presentation. It explains why and when to decode and encode unicode.
The interactive interpreter echoes the result of every expression that doesn't produce None. This doesn't happen in regular scripts.
Use print to explicitly echo values:
print response.content
I used the undecoded version here as you are redirecting stdout to a file with no further encoding information.
You'd be better off writing the output directly to a file, however:
with open('html.txt', 'ab') as outputfile:
    outputfile.write(response.content)
This writes the response body, undecoded, directly to the file.
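The echo behaviour can be reproduced without any network access. In this Python 3 sketch, the html string is a made-up stand-in for content.text, and an in-memory buffer plays the role of the redirected html.txt.

```python
import contextlib
import io

html = "<html>hello</html>"  # hypothetical stand-in for content.text

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    html          # bare expression: evaluated, then discarded when run as a script
    print(html)   # explicit print: this is what actually reaches stdout

captured = buffer.getvalue()
print(repr(captured))
```

Only the printed line ends up in the buffer. The bare expression produces output only at the interactive >>> prompt, which is why the script left html.txt empty.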
