The following code in utils.py
manager = PoolManager()
data = json.dumps(dict) #takes in a python dictionary of json
manager.request("POST", "https://myurlthattakesjson", data)
Gives me ValueError: need more than 1 value to unpack when the server is run. Does this most likely mean that the JSON is incorrect or something else?
Your Json data needs to be URLencoded for it to be POST (or GET) safe.
# import parser
import urllib.parse
manager = PoolManager()
# stringify your data
data = json.dumps(dict) #takes in a python dictionary of json
# base64 encode your data string
encdata = urllib.parse.urlencode(data)
manager.request("POST", "https://myurlthattakesjson", encdata)
I believe in python3 they made some changes that the data needs to be binary. See unable to Post data to a login form using urllib python v3.2.1
Related
This question already has answers here:
How can I parse (read) and use JSON?
(5 answers)
What are the differences between the urllib, urllib2, urllib3 and requests module?
(11 answers)
Closed last month.
I want to dynamically query Google Maps through the Google Directions API. As an example, this request calculates the route from Chicago, IL to Los Angeles, CA via two waypoints in Joplin, MO and Oklahoma City, OK:
http://maps.googleapis.com/maps/api/directions/json?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false
It returns a result in the JSON format.
How can I do this in Python? I want to send such a request, receive the result and parse it.
I recommend using the awesome requests library:
import requests
url = 'http://maps.googleapis.com/maps/api/directions/json'
params = dict(
origin='Chicago,IL',
destination='Los+Angeles,CA',
waypoints='Joplin,MO|Oklahoma+City,OK',
sensor='false'
)
resp = requests.get(url=url, params=params)
data = resp.json() # Check the JSON Response Content documentation below
JSON Response Content: https://requests.readthedocs.io/en/master/user/quickstart/#json-response-content
The requests Python module takes care of both retrieving JSON data and decoding it, due to its builtin JSON decoder. Here is an example taken from the module's documentation:
>>> import requests
>>> r = requests.get('https://github.com/timeline.json')
>>> r.json()
[{u'repository': {u'open_issues': 0, u'url': 'https://github.com/...
So there is no use of having to use some separate module for decoding JSON.
requests has built-in .json() method
import requests
requests.get(url).json()
import urllib
import json
url = 'http://maps.googleapis.com/maps/api/directions/json?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false'
result = json.load(urllib.urlopen(url))
Use the requests library, pretty print the results so you can better locate the keys/values you want to extract, and then use nested for loops to parse the data. In the example I extract step by step driving directions.
import json, requests, pprint
url = 'http://maps.googleapis.com/maps/api/directions/json?'
params = dict(
origin='Chicago,IL',
destination='Los+Angeles,CA',
waypoints='Joplin,MO|Oklahoma+City,OK',
sensor='false'
)
data = requests.get(url=url, params=params)
binary = data.content
output = json.loads(binary)
# test to see if the request was valid
#print output['status']
# output all of the results
#pprint.pprint(output)
# step-by-step directions
for route in output['routes']:
for leg in route['legs']:
for step in leg['steps']:
print step['html_instructions']
just import requests and use from json() method :
source = requests.get("url").json()
print(source)
OR you can use this :
import json,urllib.request
data = urllib.request.urlopen("url").read()
output = json.loads(data)
print (output)
Try this:
import requests
import json
# Goole Maps API.
link = 'http://maps.googleapis.com/maps/api/directions/json?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false'
# Request data from link as 'str'
data = requests.get(link).text
# convert 'str' to Json
data = json.loads(data)
# Now you can access Json
for i in data['routes'][0]['legs'][0]['steps']:
lattitude = i['start_location']['lat']
longitude = i['start_location']['lng']
print('{}, {}'.format(lattitude, longitude))
Also for pretty Json on console:
json.dumps(response.json(), indent=2)
possible to use dumps with indent. (Please import json)
I'm using requests-cache to cache http responses in human-readable format.
I've patched requests using the filesystem backend, and the the serializer to json, like so:
import requests_cache
requests_cache.install_cache('example_cache', backend='filesystem', serializer='json')
The responses do get cached as json, but the response's body is encoded (I guess using the cattrs library, as described here).
Is there a way to make requests-cache save responses as-is?
What you want to do makes sense, but it's a bit more complicated than it appears. The response files you see are representations of requests.Response objects. Response._content contains the original bytes received from the server. The wrapper methods and properties like Response.json() and Response.text will then attempt to decode that content. For a Response object to work correctly, it needs to have the original binary response body.
When requests-cache serializes that response as JSON, the binary content is encoded in Base85. That's why you're seeing encoded bytes instead of JSON there. To have everything including the response body saved in JSON, there are a couple options:
Option 1
Make a custom serializer. If you wanted to be able to modify response content and have those changes reflected in responses returned by requests-cache, this would probably be the best way to do it.
This may be become a bit convoluted, because you would have to:
Handle response content that isn't valid JSON, and save as encoded bytes instead
During deserialization, if the content was saved as JSON, convert it back into bytes to recreate the original Response object
It's doable, though. I could try to come up with an example later, if needed.
Option 2
Make a custom backend. It could extend FileCache and FileDict, and copy valid JSON content to a separate file. Here is a working example:
import json
from os.path import splitext
from requests import Response
from requests_cache import CachedSession, FileCache, FileDict
class JSONFileCache(FileCache):
"""Filesystem backend that copies JSON-formatted response content into a separate file
alongside the main response file
"""
def __init__(self, cache_name, **kwargs):
super().__init__(cache_name, **kwargs)
self.responses = JSONFileDict(cache_name, **kwargs)
class JSONFileDict(FileDict):
def __setitem__(self, key: str, value: Response):
super().__setitem__(key, value)
response_path = splitext(self._path(key))[0]
json_path = f'{response_path}_content.json'
# Will handle errors and skip writing if content can't be decoded as JSON
with self._try_io(ignore_errors=True):
content = json.dumps(value.json(), indent=2)
with open(json_path, mode='w') as f:
f.write(content)
Usage example:
custom_backend = JSONFileCache('example_cache', serializer='json')
session = CachedSession(backend=custom_backend)
session.get('https://httpbin.org/get')
After making a request, you will see a pair of files like:
example_cache/680f2a52944ee079.json
example_cache/680f2a52944ee079_content.json
That may not be exactly what you want, but it's the easiest option if you only need to read the response content and don't need to modify it.
I got some data from an API with requests:
r = requests.get(...)
a = r.text
print(type(a))
str2JSON = json.dumps(a,indent=4)
print(type(str2JSON))
The result is:
class 'str'
class 'str'
Then I try loads instead of dumps:
str2JSON_2 = json.loads(a)
print(type(str2JSON_2))
And I get class list!!!
Why is that behaviour?
If you dump a string into JSON and you don’t get an error, does it automatically mean that the JSON is well parsed? Shouldn't that be a JSON class?
The thing you get back from requests is a str value containing a JSON encoded value.
dumps takes that str and produces another string containing the JSON-encoded version of the original (JSON-encoded) string.
You need loads to decode the string into a value.
json2str = json.loads(a,indent=4) # name change to reflect the direction of the operation
Consider:
>>> s = '"foo"' # A JSON string value
>>> json.dumps(s)
'"\\"foo\\""'
>>> json.loads(s)
'foo'
The string may, of course, encode a value other than a simple string:
>>> json.loads('3') # Compare to json.loads('"3"') returning '3'
3
>>> json.loads('[1,2,3]')
[1,2,3]
>>> json.loads('{"foo": 6}')
{'foo': 6}
requests, though, doesn't actually require you to remember which direction dumps and loads go (although you should make it a point to learn). A Response object has a json method which decodes the text attribute for you.
json2str = r.json() # equivalent to json2str = json.loads(r.text)
you are using requests. it provides convince method to parse your response as json (i.e. it loads it for you) you want a = r.json(). This way a would be JSON object and you can dump it as string later on. That is assuming you get valid json as response.
here is an example
import requests
import json
url = 'https://reqres.in/api/users' # dummy resposnse
resp = requests.get(url)
my_json = resp.json()
#print example user
print(my_json['data'][0])
json_string = json.dumps(my_json, indent=4)
print(json_string)
json.dumps returns a JSON formatted python string object.
Below the is the statement you can notice in actual implementation file for def dumps
"""Serialize obj to a JSON formatted str.
given json {"foo":"bazz","1":2}
I want to convert it to POST data :
"foo"="bazz";"1"=2;
(the data format in case it were posted from html form)
is there any exist decoder for json>> POST data? if no, does next script will do it as well?
json_body = {"foo":"bazz","1":2}
data = ''
for key, value in json_body.items():
data += '"{key}"={value};'.format(key=key, value=value)
print data
>> "foo"="bazz";"1"=2;
thanks
Use urllib.parse.urlencode:
from urllib.parse import urlencode
data = urlencode(json_body)
This produces x-www-form-urlencoded data, which is the default mime-type used by browsers when POST-ing HTML forms.
I have a URL:
http://somewhere.com/relatedqueries?limit=2&query=seedterm
where modifying the inputs, limit and query, will generate wanted data. Limit is the max number of term possible and query is the seed term.
The URL provides text result formatted in this way:
oo.visualization.Query.setResponse({version:'0.5',reqId:'0',status:'ok',sig:'1303596067112929220',table:{cols:[{id:'score',label:'Score',type:'number',pattern:'#,##0.###'},{id:'query',label:'Query',type:'string',pattern:''}],rows:[{c:[{v:0.9894380670262618,f:'0.99'},{v:'newterm1'}]},{c:[{v:0.9894380670262618,f:'0.99'},{v:'newterm2'}]}],p:{'totalResultsCount':'7727'}}});
I'd like to write a python script that takes two arguments (limit number and the query seed), go fetch the data online, parse the result and return a list with the new terms ['newterm1','newterm2'] in this case.
I'd love some help, especially with the URL fetching since I have never done this before.
It sounds like you can break this problem up into several subproblems.
Subproblems
There are a handful of problems that need to be solved before composing the completed script:
Forming the request URL: Creating a configured request URL from a template
Retrieving data: Actually making the request
Unwrapping JSONP: The returned data appears to be JSON wrapped in a JavaScript function call
Traversing the object graph: Navigating through the result to find the desired bits of information
Forming the request URL
This is just simple string formatting.
url_template = 'http://somewhere.com/relatedqueries?limit={limit}&query={seedterm}'
url = url_template.format(limit=2, seedterm='seedterm')
Python 2 Note
You will need to use the string formatting operator (%) here.
url_template = 'http://somewhere.com/relatedqueries?limit=%(limit)d&query=%(seedterm)s'
url = url_template % dict(limit=2, seedterm='seedterm')
Retrieving data
You can use the built-in urllib.request module for this.
import urllib.request
data = urllib.request.urlopen(url) # url from previous section
This returns a file-like object called data. You can also use a with-statement here:
with urllib.request.urlopen(url) as data:
# do processing here
Python 2 Note
Import urllib2 instead of urllib.request.
Unwrapping JSONP
The result you pasted looks like JSONP. Given that the wrapping function that is called (oo.visualization.Query.setResponse) doesn't change, we can simply strip this method call out.
result = data.read()
prefix = 'oo.visualization.Query.setResponse('
suffix = ');'
if result.startswith(prefix) and result.endswith(suffix):
result = result[len(prefix):-len(suffix)]
Parsing JSON
The resulting result string is just JSON data. Parse it with the built-in json module.
import json
result_object = json.loads(result)
Traversing the object graph
Now, you have a result_object that represents the JSON response. The object itself be a dict with keys like version, reqId, and so on. Based on your question, here is what you would need to do to create your list.
# Get the rows in the table, then get the second column's value for
# each row
terms = [row['c'][2]['v'] for row in result_object['table']['rows']]
Putting it all together
#!/usr/bin/env python3
"""A script for retrieving and parsing results from requests to
somewhere.com.
This script works as either a standalone script or as a library. To use
it as a standalone script, run it as `python3 scriptname.py`. To use it
as a library, use the `retrieve_terms` function."""
import urllib.request
import json
import sys
E_OPERATION_ERROR = 1
E_INVALID_PARAMS = 2
def parse_result(result):
"""Parse a JSONP result string and return a list of terms"""
prefix = 'oo.visualization.Query.setResponse('
suffix = ');'
# Strip JSONP function wrapper
if result.startswith(prefix) and result.endswith(suffix):
result = result[len(prefix):-len(suffix)]
# Deserialize JSON to Python objects
result_object = json.loads(result)
# Get the rows in the table, then get the second column's value
# for each row
return [row['c'][2]['v'] for row in result_object['table']['rows']]
def retrieve_terms(limit, seedterm):
"""Retrieves and parses data and returns a list of terms"""
url_template = 'http://somewhere.com/relatedqueries?limit={limit}&query={seedterm}'
url = url_template.format(limit=limit, seedterm=seedterm)
try:
with urllib.request.urlopen(url) as data:
data = perform_request(limit, seedterm)
result = data.read()
except:
print('Could not request data from server', file=sys.stderr)
exit(E_OPERATION_ERROR)
terms = parse_result(result)
print(terms)
def main(limit, seedterm):
"""Retrieves and parses data and prints each term to standard output"""
terms = retrieve_terms(limit, seedterm)
for term in terms:
print(term)
if __name__ == '__main__'
try:
limit = int(sys.argv[1])
seedterm = sys.argv[2]
except:
error_message = '''{} limit seedterm
limit must be an integer'''.format(sys.argv[0])
print(error_message, file=sys.stderr)
exit(2)
exit(main(limit, seedterm))
Python 2.7 version
#!/usr/bin/env python2.7
"""A script for retrieving and parsing results from requests to
somewhere.com.
This script works as either a standalone script or as a library. To use
it as a standalone script, run it as `python2.7 scriptname.py`. To use it
as a library, use the `retrieve_terms` function."""
import urllib2
import json
import sys
E_OPERATION_ERROR = 1
E_INVALID_PARAMS = 2
def parse_result(result):
"""Parse a JSONP result string and return a list of terms"""
prefix = 'oo.visualization.Query.setResponse('
suffix = ');'
# Strip JSONP function wrapper
if result.startswith(prefix) and result.endswith(suffix):
result = result[len(prefix):-len(suffix)]
# Deserialize JSON to Python objects
result_object = json.loads(result)
# Get the rows in the table, then get the second column's value
# for each row
return [row['c'][2]['v'] for row in result_object['table']['rows']]
def retrieve_terms(limit, seedterm):
"""Retrieves and parses data and returns a list of terms"""
url_template = 'http://somewhere.com/relatedqueries?limit=%(limit)d&query=%(seedterm)s'
url = url_template % dict(limit=2, seedterm='seedterm')
try:
with urllib2.urlopen(url) as data:
data = perform_request(limit, seedterm)
result = data.read()
except:
sys.stderr.write('%s\n' % 'Could not request data from server')
exit(E_OPERATION_ERROR)
terms = parse_result(result)
print terms
def main(limit, seedterm):
"""Retrieves and parses data and prints each term to standard output"""
terms = retrieve_terms(limit, seedterm)
for term in terms:
print term
if __name__ == '__main__'
try:
limit = int(sys.argv[1])
seedterm = sys.argv[2]
except:
error_message = '''{} limit seedterm
limit must be an integer'''.format(sys.argv[0])
sys.stderr.write('%s\n' % error_message)
exit(2)
exit(main(limit, seedterm))
i didn't understand well your problem because from your code there it seem to me that you use Visualization API (it's the first time that i hear about it by the way).
But well if you are just searching for a way to fetch data from a web page you could use urllib2 this is just for getting data, and if you want to parse the retrieved data you will have to use a more appropriate library like BeautifulSoop
if you are dealing with another web service (RSS, Atom, RPC) rather than web pages you can find a bunch of python library that you can use and that deal with each service perfectly.
import urllib2
from BeautifulSoup import BeautifulSoup
result = urllib2.urlopen('http://somewhere.com/relatedqueries?limit=%s&query=%s' % (2, 'seedterm'))
htmletxt = resul.read()
result.close()
soup = BeautifulSoup(htmltext, convertEntities="html" )
# you can parse your data now check BeautifulSoup API.