Validate well-formed JSON using Python

I would like to create a Python script that makes a series of web requests to endpoints that will spit out JSON. After receiving the response, I want to validate that it is well-formed JSON, and not some error page. Also, I will need to insert an API key into the request header to get a proper response. What is the best way to go about doing this in Python?
Thanks!

To validate the JSON you can use http://python-jsonschema.readthedocs.org/en/latest/
To insert the API key in the header you can use Requests http://docs.python-requests.org/en/latest/user/quickstart/
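A short sketch of both suggestions combined, assuming a hypothetical endpoint, header name, and schema (swap in your own):
import requests
import jsonschema

headers = {"X-API-Key": "your-api-key"}  # hypothetical header name; check your API docs
response = requests.get("https://api.example.com/items", headers=headers)  # placeholder URL

# jsonschema validates the structure of the decoded JSON against a schema you define
schema = {"type": "object", "required": ["status"]}
jsonschema.validate(response.json(), schema)  # raises ValidationError if it does not match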

To validate the JSON, use json.loads(); if the text is not valid JSON it raises an error, which you can handle with a try/except block. As for your second question, I think you are talking about something like X-AUTH-TOKEN. If your request returns a JSON object, use this header:
headers = {
    "Content-Type": "application/json",
    "X-AUTH-TOKEN": "your API token"
}
otherwise use
"Content-Type": "application/x-www-form-urlencoded"

Related

How do you get a part of a json response in python?

I am a new programmer and I'm learning the requests module. I'm stuck on the fact that I don't know how to get a specific part of a JSON response. I think it's called a header? Or is it the thing inside of a header? I'm not sure. But the API returns simple JSON code. This is the API:
https://mcapi.us/server/status?ip=mc.hypixel.net
For more of an example, let's say it returns this JSON code from the API:
{"status":"success","online":true}
And I wanted to get the "online" response, how would I do that?
And this is the code I'm currently working with.
import requests

def main():
    ask = input("IP : ")
    response = requests.get('https://mcapi.us/server/status?ip=' + ask)
    print(response.content)

main()
And to be honest, I don't even know if this is JSON. I think it is, but the API page says it's CORS? If it isn't, I'm sorry.
In your example you have a dictionary with the key "online".
You need to parse it first with .json(), and then you can access it in the form dict[key].
In your case:
response = requests.get('https://mcapi.us/server/status?ip=' + ask).json()
print(response["online"])
or, in the case of the actual content:
response = requests.get('https://mcapi.us/server/status?ip=' + ask).json()
print(response["content"])
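Plugged into your script, it would look roughly like this ("online" comes from your sample JSON; any other field names would be assumptions about what the API returns):
import requests

def main():
    ask = input("IP : ")
    response = requests.get('https://mcapi.us/server/status?ip=' + ask)
    data = response.json()   # decode the JSON body into a Python dict
    print(data["online"])    # True or False, according to the sample response

main()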

GET requests work only with params, not just the URL

I've just discovered something strange. When downloading data from Facebook with GET using the requests 2.18.4 library, I get an error when I just use
requests.get('https://.../{}/likes?acces_token={}'.format(userID,token))
into which I insert the user ID and access token - the API does not read the access token correctly.
But, it works fine as
requests.get('https://../{}'.format(userID), params={"access_token":token})
Or it works when I copy-paste the values into the appropriate fields by hand in the Python console.
So my hypothesis is that it has something to do with how the token string gets parsed using params vs. the formatted string. But what I don't understand at all is why that would be the case. Or is the ? character somehow special in this case?
Double check whether both URLs are the same (in your post they differ by the /likes substring).
Then you can check how the requests library concatenates parameters from the params argument:
url = 'https://facebook.com/.../{}'.format(userID)
r = requests.Request('GET', url, params={"access_token": token})
pr = r.prepare()
print(pr.url)
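One plausible explanation, sketched below with a made-up token: params percent-encodes the value, while .format() inserts it verbatim, so if the token contains characters that need URL encoding (such as | or +) the server sees two different strings:
import requests

token = "abc|d+ef"  # made-up token containing characters that need URL encoding
url = 'https://graph.facebook.com/12345'  # placeholder user ID

pr = requests.Request('GET', url, params={"access_token": token}).prepare()
print(pr.url)  # token is percent-encoded: ...access_token=abc%7Cd%2Bef
print('{}?access_token={}'.format(url, token))  # token inserted verbatim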

Is a request response always JSON?

I'm trying to understand what exactly I'm getting back when I make a POST request using the Requests module — is it always JSON? Seems like every response I get appears to be JSON, but I'm not sure.
Where r is my response object, when I do:
print r.apparent_encoding
It always seems to return ascii
And when I try type():
>>> print type(r)
<class 'requests.models.Response'>
I pasted the output from print r.text into a JSON validator, and it reported no errors. So should I assume Requests is providing me with JSON objects here?
A response can be anything. If you've posted to a REST endpoint, it will usually respond with JSON. If so, requests will detect that and allow you to decode it via the .json() method.
But it's perfectly possible for you to post to a normal web URL, in effect pretending to be a browser, and unless the server is doing something really clever it will just respond with the standard HTML it would serve to the browser. In that case, doing response.json() will raise a ValueError.
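A small sketch of how you might guard against that (URL and form data are placeholders):
import requests

response = requests.post("https://example.com/endpoint", data={"whatever": "value"})
try:
    data = response.json()  # raises ValueError when the body is not JSON (e.g. an HTML page)
except ValueError:
    print("Non-JSON response, Content-Type was:", response.headers.get("Content-Type"))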
No, the response text for a POST request is totally up to the web service. A good REST API will always respond with JSON, but you will not always get that.
Example
A common pattern in PHP is
<?php
$successful_whatever = false;
if (isset($_POST['whatever'])) {
    # put $_POST['whatever'] in a database
    $successful_whatever = true;
}
echo $twig->render('gallery.twig',
    array('successful_whatever' => $successful_whatever));
?>
As you can see the response text will be a rendered template (HTML). I'm not saying it is good, just that it is common.

Send Cookie Using Python Requests

I am trying to scrape some data (reproduce a POST operation I did in a browser) using the Python Requests library, expecting it to return the content I saw in the browser once I copy over the request headers and POST form.
However, I am not quite sure what the correct way is to send cookies using Python Requests.
Here is a screenshot of how it looks in Chrome.
It looks like I can either use cookie as a key in the request header or use the cookie argument in the post command.
(1) If I want to use the cookie in the request header, should I treat the whole cookie content as one long string, set the key to be Cookie, and set the value to be that string?
(2) If I want to use the cookie argument in the requests.post command, should I manually translate that long string into a dictionary first, and then pass it to the cookie argument? Something like this?
mycookie = {'firsttimevisitor': 'N', 'cmTPSet': 'Y', 'viewType': 'List', ... }
# Then
r = requests.post(myurl, data=myformdata, cookies=mycookie, headers=myheaders)
Thanks!
Just follow the documentation:
>>> url = 'http://httpbin.org/cookies'
>>> cookies = dict(cookies_are='working')
>>> r = requests.get(url, cookies=cookies)
>>> r.text
'{"cookies": {"cookies_are": "working"}}'
So use a dict for the cookies, and note the argument is cookies, not cookie.
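If you start from the raw Cookie header string copied out of Chrome, one way to turn it into such a dict is a small sketch like this (the cookie names are just the ones from your example):
import requests

# raw Cookie header value as copied from the browser
raw = "firsttimevisitor=N; cmTPSet=Y; viewType=List"
mycookie = dict(pair.split("=", 1) for pair in raw.split("; "))

r = requests.post(myurl, data=myformdata, cookies=mycookie)  # myurl/myformdata as in your snippet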
Yes. But make sure you call it "Cookie" (with a capital C).
I always did it with a dict. Requests expects you to give a dict.
A string will give the following
cookiejar.set_cookie(create_cookie(name, cookie_dict[name]))
TypeError: string indices must be integers, not str

HTTP POST and parsing JSON with Scrapy

I have a site that I want to extract data from. The data retrieval is very straightforward.
It takes the parameters via HTTP POST and returns a JSON object. So, I have a list of queries that I want to run and then repeat at certain intervals to update a database. Is Scrapy suitable for this, or should I be using something else?
I don't actually need to follow links, but I do need to send multiple requests at the same time.
What does the POST request look like? There are many variations, like simple query parameters (?a=1&b=2), a form-like payload (the body contains a=1&b=2), or any other kind of payload (the body contains a string in some format, like JSON or XML).
In Scrapy it is fairly straightforward to make POST requests; see: http://doc.scrapy.org/en/latest/topics/request-response.html#request-usage-examples
For example, you may need something like this:
import json
from urllib.parse import urlencode

from scrapy import Request

# these methods go inside your Spider subclass; `url` is the endpoint you are posting to
def start_requests(self):
    payload = {"a": 1, "b": 2}
    yield Request(url, self.parse_data, method="POST", body=urlencode(payload))

def parse_data(self, response):
    # the endpoint returns JSON, so decode the response body
    data = json.loads(response.body)
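If the site expects a JSON body rather than form data (an assumption; check what the browser actually sends), the same idea works with a JSON-encoded body:
import json
from scrapy import Request

def start_requests(self):
    payload = {"a": 1, "b": 2}
    yield Request(url, self.parse_data, method="POST",
                  body=json.dumps(payload),
                  headers={"Content-Type": "application/json"})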
For handling requests and retrieving responses, Scrapy is more than enough. And to parse JSON, just use the json module in the standard library:
import json
data = ...
json_data = json.loads(data)
Hope this helps!
Based on my understanding of the question, you just want to fetch/scrape data from a web page at certain intervals. Scrapy is generally used for crawling.
If you just want to make HTTP POST requests, you might consider using the Python requests library.
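For instance, a minimal polling sketch with requests (URL, payloads, and interval are placeholders):
import time
import requests

queries = [{"q": "first"}, {"q": "second"}]  # your list of POST payloads

while True:
    for payload in queries:
        response = requests.post("https://example.com/api", json=payload)  # placeholder URL
        data = response.json()
        # ... update your database with `data` here ...
    time.sleep(3600)  # repeat every hour; adjust to your interval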
