Is there a way to get headers from a URL in Python? - python

Is there a way to get the headers from a URL in Python, in any format, like Charles Proxy does?

Yes, there is a way to get headers from a URL in Python. The requests module provides a way to do this.
import requests
url = "https://www.google.com"
response = requests.head(url)
print(response.headers) # prints the entire header as a dictionary
print(response.headers["Content-Length"]) # prints a specific entry of the dictionary
https://www.folkstalk.com/tech/python-get-response-headers-with-code-examples/
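If you would rather not depend on requests, the standard library can do the same; a minimal sketch using urllib.request (the HEAD method keeps the body from being downloaded). The function name here is just illustrative:

```python
from urllib.request import Request, urlopen

def head_headers(url):
    """Fetch only the response headers for url via a HEAD request."""
    req = Request(url, method="HEAD")   # HEAD: no body is downloaded
    with urlopen(req) as response:
        return dict(response.headers)

# e.g. head_headers("https://www.google.com") returns a plain dict of headers
```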

Related

Parameters are ignored in python web request for JSON data

I am trying to read JSON-formatted data from the following public URL: http://ws-old.parlament.ch/factions?format=json. Unfortunately, I am not able to convert the response to JSON, as I always get HTML-formatted content back. The request seems to completely ignore the JSON formatting parameter passed with the URL:
import urllib.request
response = urllib.request.urlopen('http://ws-old.parlament.ch/factions?format=json')
response_text = response.read()
print(response_text) #why is this HTML?
Does somebody know how I am able to get the JSON formatted content as displayed in the web browser?
You need to add "Accept": "text/json" to the request headers.
For example using requests package:
import requests

r = requests.get('http://ws-old.parlament.ch/factions?format=json',
                 headers={'Accept': 'text/json'})
print(r.json())
Result:
[{'id': 3, 'updated': '2022-02-22T14:59:17Z', 'abbreviation': ...
Sorry to say, but these web services have a misleading implementation: the format query parameter is useless. As pointed out by maciek97x, only the Accept: <format> header is considered for the formatting.
So you can call the endpoint directly without ?format=json, as long as you send the Accept: text/json header.
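Since the question used urllib.request, the same fix can be applied there by attaching the Accept header to a Request object; a sketch (the live call is commented out, as the service may no longer be reachable):

```python
import json
import urllib.request

req = urllib.request.Request(
    'http://ws-old.parlament.ch/factions',   # ?format=json is ignored anyway
    headers={'Accept': 'text/json'},         # this is what selects JSON
)
# factions = json.load(urllib.request.urlopen(req))
print(req.get_header('Accept'))  # text/json
```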

Python - Requests pulling HTML instead of JSON

I'm building a Python web scraper (personal use) and am running into some trouble retrieving a JSON file. I was able to find the request URL I need, but when I run my script (I'm using Requests) the URL returns HTML instead of the JSON shown in the Chrome Developer Tools console. Here's my current script:
import requests
import json
url = 'https://nytimes.wd5.myworkdayjobs.com/Video?clientRequestID=1f1a6071627946499b4b09fd0f668ef0'
r = requests.get(url)
print(r.text)
Completely new to Python, so any push in the right direction is greatly appreciated. Thanks!
Looks like that website varies the response depending on the Accept header provided in the request. So try:
import requests
import json
url = 'https://nytimes.wd5.myworkdayjobs.com/Video?clientRequestID=1f1a6071627946499b4b09fd0f668ef0'
r = requests.get(url, headers={'accept': 'application/json'})
print(r.json())
You can have a look at the full API for further reference: http://docs.python-requests.org/en/latest/api/.

urllib request for json does not match the json in browser

This is my code thus far.
import json
import urllib

url = 'https://www.endomondo.com/rest/v1/users/3014732/workouts/357031682'
response = urllib.urlopen(url)
print response
data = json.load(response)
print data
The problem is that when I look at the json in the browser it is long and contains more features than I see when printing it.
To be more exact, I'm looking for the 'points' part which should be
data['points']['points']
however
data['points']
has only 2 attributes and doesn't contain the second 'points' that I do see in the browser.
Could it be that I can only load 1 "layer" deep and not 2?
You need to add a user-agent to your request.
Using requests (which the urllib documentation itself recommends over using urllib directly), you can do:
import requests
url = 'https://www.endomondo.com/rest/v1/users/3014732/workouts/357031682'
response = requests.get(url, headers={'user-agent': 'Mozilla 5.0'})
print(response.json())
# long output....
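The same User-Agent fix works with the urllib the question started from (Python 3 shown here); a sketch with the live call commented out, since the Endomondo API may no longer be online:

```python
import json
from urllib.request import Request, urlopen

url = 'https://www.endomondo.com/rest/v1/users/3014732/workouts/357031682'
req = Request(url, headers={'User-Agent': 'Mozilla 5.0'})  # the header that was missing
# data = json.load(urlopen(req))      # the service may no longer be online
print(req.get_header('User-agent'))   # urllib stores header names capitalized
```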

How to print out http-response header in Python

Today I needed to retrieve data from the HTTP response headers. Since I've never done it before, and there is not much to be found on Google about this, I decided to ask my question here.
So, the actual question: how does one print the HTTP response headers in Python? I'm working in Python 3.5 with the requests module and have yet to find a way to do this.
Update: based on a comment from the OP, only the response headers are needed. That is even easier, as described below in the Requests documentation:
We can view the server's response headers using a Python dictionary:
>>> r.headers
{
'content-encoding': 'gzip',
'transfer-encoding': 'chunked',
'connection': 'close',
'server': 'nginx/1.0.4',
'x-runtime': '148ms',
'etag': '"e1ca502697e5c9317743dc078f67693f"',
'content-type': 'application/json'
}
And especially the documentation notes:
The dictionary is special, though: it's made just for HTTP headers. According to RFC 7230, HTTP Header names are case-insensitive.
So, we can access the headers using any capitalization we want:
and goes on to explain even more cleverness concerning RFC compliance.
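That case-insensitive behaviour is easy to see with the dictionary class Requests actually uses for headers, requests.structures.CaseInsensitiveDict:

```python
from requests.structures import CaseInsensitiveDict

headers = CaseInsensitiveDict({'Content-Type': 'application/json'})
# per RFC 7230, any capitalization reaches the same entry
print(headers['content-type'])   # application/json
print(headers['CONTENT-TYPE'])   # application/json
```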
The Requests documentation states:
Using Response.iter_content will handle a lot of what you would otherwise have to handle when using Response.raw directly. When streaming a download, the above is the preferred and recommended way to retrieve the content.
It offers as example:
>>> r = requests.get('https://api.github.com/events', stream=True)
>>> r.raw
<requests.packages.urllib3.response.HTTPResponse object at 0x101194810>
>>> r.raw.read(10)
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'
The documentation also offers practical advice, such as streaming the download to a file with Response.iter_content rather than reading Response.raw directly.
How about something like this (Python 2, using urllib2):
import urllib2
req = urllib2.Request('http://www.google.com/')
res = urllib2.urlopen(req)
print res.info()
res.close()
If you are looking for something specific in the header:
For Date: print res.info().get('Date')
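Note that urllib2 is Python 2 only; in Python 3 the equivalent lives in urllib.request. A minimal sketch (the wrapper function is just illustrative, and the live call is left commented out):

```python
from urllib.request import urlopen

def show_headers(url):
    # Python 3 equivalent of the urllib2 snippet above
    with urlopen(url) as res:
        print(res.info())              # all header info
        print(res.info().get('Date'))  # a single header, e.g. Date

# show_headers('http://www.google.com/')
```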
Here's how you get just the response headers using the requests library like you mentioned (implementation in Python3):
import requests
url = "https://www.google.com"
response = requests.head(url)
print(response.headers) # prints the entire header as a dictionary
print(response.headers["Content-Length"]) # prints a specific section of the dictionary
It's important to use .head() instead of .get() otherwise you will retrieve the whole file/page like the rest of the answers mentioned.
If you wish to retrieve a URL that requires authentication you can replace the above response with this:
response = requests.head(url, auth=requests.auth.HTTPBasicAuth(username, password))
easy
import requests
site = "https://www.google.com"
headers = requests.get(site).headers
print(headers)
if you want something specific
print(headers["Content-Type"])
I'm using the urllib module, with the following code:
from urllib import request
with request.urlopen(url, data) as f:
    print(f.getcode())  # HTTP response code
    print(f.info())     # all header info
    resp_body = f.read().decode('utf-8')  # response body
It's very easy, you can just type
print(response.headers)
or, my favourite,
print(requests.get('url').headers)
You can also use
print(requests.get('url').content)
Try res.headers and that's all. You will get the response headers ;)
import pprint
import requests
res = requests.request("GET", "https://google.com")
pprint.PrettyPrinter(indent=2).pprint(dict(res.headers))

Python change Accept-Language using requests

I'm new to Python and am trying to get some info from IMDb using the requests library. My code captures all the data (e.g., movie titles) in my native language, but I would like to get it in English.
How can I change the Accept-Language header in requests to do that?
All you need to do is define your own headers:
import requests
url = "http://www.imdb.com/title/tt0089218/"
headers = {"Accept-Language": "en-US,en;q=0.5"}
r = requests.get(url, headers=headers)
You can add whatever other headers you'd like to modify as well.
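If your scraper makes many requests, you can set the header once on a requests.Session instead of passing it to every call; a small sketch (the live request is commented out):

```python
import requests

session = requests.Session()
session.headers.update({"Accept-Language": "en-US,en;q=0.5"})
# every request made through this session now sends the header:
# r = session.get("http://www.imdb.com/title/tt0089218/")
print(session.headers["Accept-Language"])  # en-US,en;q=0.5
```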
