How to parse a JSONP response in Python?

How can I parse a JSONP response? I tried json.loads(), but it does not work on JSONP.

From what I have read:
JSONP is JSON with padding, that is, a callback name is put at the beginning
and a pair of parentheses is wrapped around the JSON.
I tried removing the padding from the string and then using json.loads():
import requests
from json import loads

response = requests.get(link)  # link is the JSONP endpoint URL
# Everything between the first '(' and the last ')' is the JSON payload
startidx = response.text.find('(')
endidx = response.text.rfind(')')
data = loads(response.text[startidx + 1:endidx])
It works.
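For a slightly more robust version of the same idea, the callback wrapper (and any trailing semicolon) can be stripped with a regular expression before parsing; this is a minimal sketch, and load_jsonp and the sample payload are just illustrative names:
import re
from json import loads

def load_jsonp(text):
    """Strip a JSONP wrapper such as callback({...}); and parse the JSON inside."""
    match = re.search(r'^[^(]*\((.*)\)[^)]*$', text, re.DOTALL)
    if match is None:
        raise ValueError('Not a JSONP payload')
    return loads(match.group(1))

print(load_jsonp('parseResponse({"id": 1, "name": "demo"});'))  # {'id': 1, 'name': 'demo'}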

Related

Storing response as json in Python requests-cache

I'm using requests-cache to cache http responses in human-readable format.
I've patched requests using the filesystem backend and the json serializer, like so:
import requests_cache
requests_cache.install_cache('example_cache', backend='filesystem', serializer='json')
The responses do get cached as json, but the response's body is encoded (I guess using the cattrs library, as described here).
Is there a way to make requests-cache save responses as-is?
What you want to do makes sense, but it's a bit more complicated than it appears. The response files you see are representations of requests.Response objects. Response._content contains the original bytes received from the server. The wrapper methods and properties like Response.json() and Response.text will then attempt to decode that content. For a Response object to work correctly, it needs to have the original binary response body.
When requests-cache serializes that response as JSON, the binary content is encoded in Base85. That's why you're seeing encoded bytes instead of JSON there. To have everything including the response body saved in JSON, there are a couple options:
Option 1
Make a custom serializer. If you wanted to be able to modify response content and have those changes reflected in responses returned by requests-cache, this would probably be the best way to do it.
This may become a bit convoluted, because you would have to:
Handle response content that isn't valid JSON, and save as encoded bytes instead
During deserialization, if the content was saved as JSON, convert it back into bytes to recreate the original Response object
It's doable, though. I could try to come up with an example later, if needed.
Option 2
Make a custom backend. It could extend FileCache and FileDict, and copy valid JSON content to a separate file. Here is a working example:
import json
from os.path import splitext

from requests import Response
from requests_cache import CachedSession, FileCache, FileDict


class JSONFileCache(FileCache):
    """Filesystem backend that copies JSON-formatted response content into a separate file
    alongside the main response file
    """

    def __init__(self, cache_name, **kwargs):
        super().__init__(cache_name, **kwargs)
        self.responses = JSONFileDict(cache_name, **kwargs)


class JSONFileDict(FileDict):
    def __setitem__(self, key: str, value: Response):
        super().__setitem__(key, value)
        response_path = splitext(self._path(key))[0]
        json_path = f'{response_path}_content.json'

        # Will handle errors and skip writing if content can't be decoded as JSON
        with self._try_io(ignore_errors=True):
            content = json.dumps(value.json(), indent=2)
            with open(json_path, mode='w') as f:
                f.write(content)
Usage example:
custom_backend = JSONFileCache('example_cache', serializer='json')
session = CachedSession(backend=custom_backend)
session.get('https://httpbin.org/get')
After making a request, you will see a pair of files like:
example_cache/680f2a52944ee079.json
example_cache/680f2a52944ee079_content.json
That may not be exactly what you want, but it's the easiest option if you only need to read the response content and don't need to modify it.
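If you only need to read the copied content back later, the _content.json file can be opened directly with the standard json module; a small sketch using the example filename shown above (the 'url' key is just what httpbin returns for this request):
import json

# Read the copied response body directly, without going through requests-cache
with open('example_cache/680f2a52944ee079_content.json') as f:
    body = json.load(f)
print(body['url'])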

JSON TypeError: string indices must be integers

I get the error "TypeError: string indices must be integers" in the following code.
import json
import requests
url = "https://petition.parliament.uk/petitions/300139.json"
response = requests.get(url)
data = response.text
parsed = json.loads(data)
sig_count = data["attributes"]["signature_count"]
print(sig_count)
import json
import requests

url = 'https://petition.parliament.uk/petitions/300139.json'
response = requests.get(url)
data = response.text
parsed = json.loads(data)
sig_count = parsed["data"]["attributes"]["signature_count"]
print(sig_count)
You are indexing the variable data instead of parsed, and you are also missing the "data" key when filtering.
After calling json.loads(), you need to use the newly created variable, because the operation does not happen in place; data is the JSON in its raw form, interpreted as a string.
Try with:
parsed['data']['attributes']['signature_count']
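As a side note, requests can decode the body for you, so the same lookup can skip json.loads() entirely; a minimal sketch:
import requests

response = requests.get('https://petition.parliament.uk/petitions/300139.json')
parsed = response.json()  # decodes the JSON body into a dict
print(parsed['data']['attributes']['signature_count'])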

Why is type(JSON) str in Python?

I got some data from an API with requests:
r = requests.get(...)
a = r.text
print(type(a))
str2JSON = json.dumps(a,indent=4)
print(type(str2JSON))
The result is:
<class 'str'>
<class 'str'>
Then I try loads instead of dumps:
str2JSON_2 = json.loads(a)
print(type(str2JSON_2))
And I get <class 'list'>!
Why is that behaviour?
If you dump a string into JSON and you don’t get an error, does it automatically mean that the JSON is well parsed? Shouldn't that be a JSON class?
The thing you get back from requests is a str value containing a JSON encoded value.
dumps takes that str and produces another string containing the JSON-encoded version of the original (JSON-encoded) string.
You need loads to decode the string into a value.
json2str = json.loads(a)  # name changed to reflect the direction of the operation (loads does not take an indent argument)
Consider:
>>> s = '"foo"' # A JSON string value
>>> json.dumps(s)
'"\\"foo\\""'
>>> json.loads(s)
'foo'
The string may, of course, encode a value other than a simple string:
>>> json.loads('3') # Compare to json.loads('"3"') returning '3'
3
>>> json.loads('[1,2,3]')
[1, 2, 3]
>>> json.loads('{"foo": 6}')
{'foo': 6}
requests, though, doesn't actually require you to remember which direction dumps and loads go (although you should make it a point to learn). A Response object has a json method which decodes the text attribute for you.
json2str = r.json() # equivalent to json2str = json.loads(r.text)
You are using requests, which provides a convenience method that parses the response as JSON for you (i.e. it loads it for you): you want a = r.json(). That way a is a parsed JSON object, and you can dump it back to a string later on. This assumes the response actually is valid JSON.
Here is an example:
import requests
import json
url = 'https://reqres.in/api/users' # dummy response
resp = requests.get(url)
my_json = resp.json()
#print example user
print(my_json['data'][0])
json_string = json.dumps(my_json, indent=4)
print(json_string)
json.dumps returns a JSON-formatted Python string object.
Below is the opening line of the docstring in the actual implementation of dumps:
"""Serialize obj to a JSON formatted str.

How to decode a post url string containing plus sign

I have an encoded URL string:
http://epub.sipo.gov.cn/patentoutline.action?strWhere=OPD%3D%272019.02.15%27+and+PA%3D%27%25%E5%8D%8E%E4%B8%BA%25%27
obtained via Chrome's inspector. I tried to write a requests post call to fetch the page; the best I could come up with is the following, but it does not work properly. The troubling part seems to be the plus sign. (With no and clause, "OPD='2019.02.15'" or "PA='%华为%'" works fine.)
import requests
url = 'http://epub.sipo.gov.cn/patentoutline.action'
params = {'strWhere': r"OPD='2019.02.15' and PA='%华为%'"} # cannot find results
# params = {'strWhere': r"OPD='2019.02.15'"} # works
# params = {'strWhere': r"PA='%华为%'"} # works
r = requests.post(url, data=params)
print(r.content.decode())
Replace the spaces in the URL with %20. You can use str.replace(old, new[, count]) before sending it, for example:
params = {'strWhere': r"OPD='2019.02.15' and PA='%华为%'"}
params['strWhere'] = params['strWhere'].replace(' ', '%20')
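Note that when a dict is passed as data, requests form-encodes it again, so a literal %20 in the value may itself get re-encoded (to %2520). If the server really expects %20 for spaces, one workaround is to build the body yourself with urllib.parse.quote and pass it as a string; a sketch under that assumption (Python 3):
from urllib.parse import quote
import requests

url = 'http://epub.sipo.gov.cn/patentoutline.action'
where = "OPD='2019.02.15' and PA='%华为%'"

# quote() percent-encodes spaces as %20 (not the + used by form encoding);
# safe='' forces characters such as = and ' to be encoded as well
body = 'strWhere=' + quote(where, safe='')

r = requests.post(url, data=body,
                  headers={'Content-Type': 'application/x-www-form-urlencoded'})
print(r.status_code)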

How to parse an HTML response as JSON using Python?

I used Python 2 to make a request to the RNAcentral database, and I read the response as JSON with response.json().
That let me read the data as a dictionary, so I used the corresponding syntax to obtain the cross references, which contain links to other databases. But when I request each of those links and call the same method again, I can't read the result as JSON, because the response content comes back as HTML.
So I need to know how to make a request to each link from the cross references and read it as JSON in Python.
Here is the code:
import requests

direcc = 'http://rnacentral.org/api/v1/rna/' + code + '/?flat=true.json'
resp = requests.get(direcc)
datos = resp.json()
d = {}
links = []
for diccionario in datos['xrefs']['results']:
    if diccionario['taxid'] == 9606:
        base_datos = diccionario['database']
        for llave, valor in diccionario['accession'].iteritems():
            d[base_datos] = {'url': diccionario['accession']['url'],
                             'expert_db_url': diccionario['accession']['expert_db_url'],
                             'source_url': diccionario['accession']['source_url']}
for key, value in d.iteritems():
    links.append(d[key]['expert_db_url'])
for item in links:
    response = requests.get(item)
    r = response.json()
And this is the error I get: ValueError: No JSON object could be decoded.
Thank you.
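A quick way to see why the decode fails is to check each link's Content-Type header before calling .json(); a diagnostic sketch, assuming item is one of the links collected in the loop above:
import requests

resp = requests.get(item)  # item: one of the expert_db_url links collected above
content_type = resp.headers.get('Content-Type', '')
if 'json' in content_type:
    r = resp.json()
else:
    # The link points to an HTML page, which is why JSON decoding fails
    print('Not JSON:', resp.url, content_type)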
