TypeError with urlopen() - python

I am a bit confused with using Request, urlopen and JSONDecoder().decode().
Currently I have:
hdr = {'User-agent' : 'anything'} # header, User-agent header describes my web browser
I am assuming that the server uses this to determine which browsers are acceptable? Not sure
my url is:
url = 'http://wwww.reddit.com/r/aww.json'
I set a req variable
req = Request(url,hdr) #request to access the url with header
json = urlopen(req).read() # read json page
I tried using urlopen in terminal and I get this error:
TypeError: must be string or buffer, not dict # Does this have to do with my header?
data = JSONDecoder().decode(json) # translate json data so I can parse through it with regular python functions?
I'm not really sure why I get the TypeError

If you look at the documentation of Request, you can see that the constructor signature is actually Request(url, data=None, headers={}, …). So the second parameter, the one after the URL, is the data you are sending with the request. But if you want to set the headers instead, you will have to specify the headers parameter.
You can do this in two different ways. Either you pass None as the data parameter:
Request(url, None, hdr)
But this requires you to pass the data parameter explicitly, and you have to make sure that you pass its default value so that you don't cause any unwanted effects. So instead, you can tell Python to pass the headers parameter explicitly by name, without specifying data:
Request(url, headers=hdr)
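To see the difference without making a network call, here is a minimal sketch that builds both requests and inspects them. It assumes Python 3, where Request lives in urllib.request (the question's Python 2 code would import it from urllib2); the User-agent value is just the placeholder from the question.

```python
from urllib.request import Request

url = 'http://www.reddit.com/r/aww.json'
hdr = {'User-agent': 'anything'}

# Positional second argument: the dict is taken as the request body (data).
wrong = Request(url, hdr)

# Keyword argument: the dict is taken as the headers, and no body is sent.
right = Request(url, headers=hdr)

print(wrong.data)          # the header dict ended up as the body
print(wrong.headers)       # no headers were set
print(right.data)          # None, so urlopen() would issue a GET
print(right.headers)       # the User-agent header is set
print(right.get_method())  # GET
```

Because the first form stores a dict as the body, urlopen() later fails when it tries to send that dict, which is where the TypeError comes from.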

Related

Parameters are ignored in python web request for JSON data

I try to read JSON-formatted data from the following public URL: http://ws-old.parlament.ch/factions?format=json. Unfortunately, I was not able to convert the response to JSON as I always get the HTML-formatted content back from my request. Somehow the request seems to completely ignore the parameters for JSON formatting passed with the URL:
import urllib.request
response = urllib.request.urlopen('http://ws-old.parlament.ch/factions?format=json')
response_text = response.read()
print(response_text) #why is this HTML?
Does somebody know how I am able to get the JSON formatted content as displayed in the web browser?
You need to add "Accept": "text/json" to the request headers.
For example using requests package:
import requests
r = requests.get('http://ws-old.parlament.ch/factions?format=json',
                 headers={'Accept': 'text/json'})
print(r.json())
Result:
[{'id': 3, 'updated': '2022-02-22T14:59:17Z', 'abbreviation': ...
Unfortunately, these web services have a misleading implementation: the format query parameter has no effect. As pointed out by @maciek97x, only the Accept: <format> header is considered when choosing the response format.
So you can call the endpoint directly without ?format=json, as long as you send the header Accept: text/json.
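The same header can also be sent with the standard library's urllib.request, which the question already uses. The sketch below is one way to do it; build_json_request and fetch_json are hypothetical helper names, and decoding the body as UTF-8 is an assumption about this service.

```python
import json
import urllib.request

URL = 'http://ws-old.parlament.ch/factions'


def build_json_request(url):
    """Return a Request that asks the server for JSON via the Accept header."""
    return urllib.request.Request(url, headers={'Accept': 'text/json'})


def fetch_json(url):
    """Send the request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(build_json_request(url)) as response:
        # Assumption: the service returns UTF-8 encoded JSON.
        return json.loads(response.read().decode('utf-8'))


req = build_json_request(URL)
print(req.get_header('Accept'))   # the header that selects the format
# factions = fetch_json(URL)      # uncomment to perform the actual request
```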

Obtain both headers and content from single GET request with the Python requests library

I am trying to obtain both headers and content with a single GET request.
At the moment, I am able to obtain them individually:
Headers=requests.get('https://coinmarketcap.com', verify=False).headers
and
ParseLM=requests.get('https://coinmarketcap.com', verify=False).content
However, this makes two separate GET requests while I am trying to parse both headers and content from the same request, although separately.
Call requests.get() once, saving the entire result:
response = requests.get('https://coinmarketcap.com', verify=False)
Then you can access individual pieces of the result:
headers = response.headers
content = response.content

What are the corresponding parameters of Requests "data" and "params" in urllib2?

I have been successfully using the Python Requests module to send POST requests to a server with the specified parameters:
resp = requests.request("POST", url, proxies, data, headers, params, timeout)
However, for a certain reason, I now need to use python urllib2 module to query. For urllib2.urlopen's parameter "data," what I understand is that it helps to form the query string (which is the same as Requests "params"). requests.request's parameter "data," on the other hand, is used to fill the request body.
After searching and reading many posts, examples, and documentation, I still have not been able to figure out which urllib2 parameter corresponds to requests.request's "data".
Any advice is much appreciated! Thanks.
-Janton
It doesn't matter what the parameter is called; what matters is passing the value in at the right place. In the example below, the POST data starts out as a dictionary (the variable name can be anything).
The dictionary is urlencoded, and the resulting string can again be named anything. I've picked "postdata", since it is the data that gets POSTed:
import urllib # for the urlencode
import urllib2
searchdict = {'q' : 'urllib2'}
url = 'https://duckduckgo.com/html'
postdata = urllib.urlencode(searchdict)
req = urllib2.Request(url, postdata)
response = urllib2.urlopen(req)
print response.read()
print response.getcode()
If your POST data is already a plain string (not a Python type such as a dictionary), you can pass it without urllib.urlencode:
import urllib2
searchstring = 'q=urllib2'
url = 'https://duckduckgo.com/html'
req = urllib2.Request(url, searchstring)
response = urllib2.urlopen(req)
print response.read()
print response.getcode()
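To map this back to the question: in Requests, params ends up in the query string of the URL and data ends up in the request body. The data argument of urllib2.Request (urllib.request.Request in Python 3) is also the request body, so it corresponds to Requests' data; there is no separate query-string argument, so you append the encoded parameters to the URL yourself. A minimal Python 3 sketch, where build_request is a hypothetical helper, the lang parameter is made up for illustration, and the URL is assumed to have no query string yet:

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(url, params=None, data=None, headers=None):
    """Mimic the params/data split of Requests on top of urllib.request."""
    if params:
        # Like Requests' "params": encoded into the query string of the URL.
        url = url + '?' + urlencode(params)
    body = None
    if data is not None:
        # Like Requests' "data": form-encoded and sent as the body (bytes).
        body = urlencode(data).encode('ascii')
    return Request(url, data=body, headers=headers or {})


req = build_request('https://duckduckgo.com/html',
                    params={'lang': 'en'},   # hypothetical query parameter
                    data={'q': 'urllib2'})
print(req.full_url)      # URL with the query string appended
print(req.data)          # the form-encoded body
print(req.get_method())  # POST, because a body is present
```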

Making a GET request JSON with parameters using Python

How do I make a GET request to a specific URL with two query parameters? These query parameters contain two id numbers.
So far I have:
import json, requests
url = 'http://'
requests.post(url)
But they gave me the query parameters first_id=### and last_id=###. I don't know how to include these parameters.
To make a GET request, use the get() method; pass the query parameters via the params argument:
response = requests.get(url, params={'first_id': 1, 'last_id': 2})
If the response is of a JSON content type, you can use the json() shortcut method to get it loaded into a Python object for you:
data = response.json()
print(data)
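To see what the params argument contributes to the URL, here is a standard-library sketch that builds the same kind of query string by hand. The base URL and the id values are placeholders, since the question does not give the real ones.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

base_url = 'http://example.com/items'    # placeholder URL
params = {'first_id': 1, 'last_id': 2}   # placeholder id values

# Encode the dict as a query string and append it to the URL.
full_url = base_url + '?' + urlencode(params)
print(full_url)

# Parse the query string back to check what a server would receive.
query = parse_qs(urlsplit(full_url).query)
print(query)
```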

Python Requests data parameter not working properly

I'm using Requests http://docs.python-requests.org/en/latest/
I'm trying to do a simple POST request with some extra headers. I don't understand the following behavior.
opener = requests.Session()
data = {"payload": {"id": 1, "pwd": "mypass"}}
headers = {"Content-Type":"application/json"}
url = "https://mysite.com/login"
# THIS WORKS
opener.post(url, data)
# THIS DOES NOT WORK
opener.post(url, json.dumps(data))
# THIS DOES NOT WORK
opener.post(url, data=data, headers=headers)
# THIS WORKS
opener.post(url, data=json.dumps(data), headers=headers)
It seems that the post method expects a dict normally and does not work when I convert that dict to a string. I can login using the former but I can't using the latter.
However, when I supply extra header information, the post method seems to work the exact opposite way: it expects a string for the data. On the server side, the first version throws an error while the second version works with headers.
What is the reason for this behavior?
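One likely explanation, offered as a sketch rather than a confirmed diagnosis of this particular server: Requests form-encodes a dict passed as data, but sends a string passed as data unchanged. The two working calls therefore send a body whose format matches its Content-Type (form-encoded by default, or JSON with the JSON header), while the two failing calls send JSON without the JSON header, or a form-encoded body labeled as JSON. The standard-library snippet below only illustrates the two body formats; it uses a flat version of the credentials, which is an assumption made to keep the form encoding simple.

```python
import json
from urllib.parse import urlencode

# Flat placeholder credentials (the question nests them under "payload").
credentials = {'id': 1, 'pwd': 'mypass'}

form_body = urlencode(credentials)   # form-encoded body, as sent for a dict
json_body = json.dumps(credentials)  # JSON body, as sent for a dumped string

print(form_body)
print(json_body)
```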
