When I send a GET request, the response is not human-readable.
My code:
import requests

# session used for the request below
session_request = requests.session()

get_log_head = {
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-US,en;q=0.9',
'sec-ch-ua': '"Chromium";v="86", "\"Not\\A;Brand";v="99", "Google Chrome";v="86"',
'sec-ch-ua-mobile': '?0',
'sec-fetch-dest': 'document',
'sec-fetch-mode': 'navigate',
'sec-fetch-site': 'none',
'sec-fetch-user': '?1',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36'
}
get_login = session_request.get("https://www.ex.com", headers=get_log_head)
print(get_login.text)
What is the solution?
This is because the web server is sending you a Brotli-compressed response, since you set 'accept-encoding': 'gzip, deflate, br'.
The requests module can natively handle gzip and deflate, and those are decoded for you automatically (documented here), but not Brotli. Try changing your accept-encoding to
'accept-encoding': 'gzip, deflate'
To get a human-readable response I used this code:
import requests
session_request = requests.session()
get_log_head = {
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'sec-ch-ua': '"Chromium";v="86", "\"Not\\A;Brand";v="99", "Google Chrome";v="86"',
'sec-ch-ua-mobile': '?0',
'sec-fetch-dest': 'document',
'sec-fetch-mode': 'navigate',
'sec-fetch-site': 'none',
'sec-fetch-user': '?1',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36'
}
get_login = session_request.get("https://www.panago.com", headers=get_log_head)
print(get_login.text)
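To double-check what the server actually sent, you can inspect the Content-Encoding response header (a quick sketch, continuing from the snippet above):
# get_login is the response object from the code above.
# With 'br' removed from accept-encoding this should print 'gzip'
# (or None), both of which requests decodes for you automatically.
print(get_login.headers.get('Content-Encoding'))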
Related
I want to use a proxy with Python requests. To test whether my request is working, I send a request to jsonip.com. The response returns my real IP instead of the proxy's. The website providing the proxy also says "no activity". Am I connecting to the proxy correctly? Here is the code:
import time, requests, random
from requests.auth import HTTPProxyAuth
auth = HTTPProxyAuth("muyjgovw", "mtpysgrb3nkj")
def reqs():
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/107.0',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
        # 'Accept-Encoding': 'gzip, deflate, br',
        'Referer': 'https://www.google.com/',
        'Connection': 'keep-alive',
        'Upgrade-Insecure-Requests': '1',
        'Sec-Fetch-Dest': 'document',
        'Sec-Fetch-Mode': 'navigate',
        'Sec-Fetch-Site': 'cross-site',
        'Sec-Fetch-User': '?1',
    }
    prox = [{"http": "http://64.137.58.19:6265"}]
    proxies = random.choice(prox)
    response = requests.get('https://jsonip.com/', headers=headers, proxies=proxies)
    print(response.status_code)
    print(response.json())

reqs()
Screenshot of website showing no activity
You have to include an https entry in the proxies dict as well. The request is made over HTTPS, so only the https mapping is used; without it, the request bypasses the proxy entirely:
import time, requests, random
from requests.auth import HTTPProxyAuth
auth = HTTPProxyAuth("muyjgovw", "mtpysgrb3nkj")
def reqs():
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/107.0',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
        # 'Accept-Encoding': 'gzip, deflate, br',
        'Referer': 'https://www.google.com/',
        'Connection': 'keep-alive',
        'Upgrade-Insecure-Requests': '1',
        'Sec-Fetch-Dest': 'document',
        'Sec-Fetch-Mode': 'navigate',
        'Sec-Fetch-Site': 'cross-site',
        'Sec-Fetch-User': '?1',
    }
    prox = [{"http": "http://64.137.58.19:6265",
             "https": "http://64.137.58.19:6265"}]
    proxies = random.choice(prox)
    response = requests.get('https://jsonip.com/', headers=headers, proxies=proxies)
    print(response.status_code)
    print(response.json())

reqs()
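One more thing worth checking: the auth object defined at the top is never actually used. If the proxy requires a username and password (which the HTTPProxyAuth line suggests), a common approach is to embed the credentials in the proxy URL; a minimal sketch, assuming the same credentials and endpoint:
import requests

# Assumption: the proxy expects the username/password from the question.
# Putting the credentials in the proxy URL authenticates both HTTP and
# HTTPS traffic without a separate auth object.
proxy_url = "http://muyjgovw:mtpysgrb3nkj@64.137.58.19:6265"
proxies = {"http": proxy_url, "https": proxy_url}

response = requests.get("https://jsonip.com/", proxies=proxies, timeout=10)
print(response.json())  # should now report the proxy's IP, not yours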
I need to parse a site that sits behind DDoS-GUARD.
In the Firefox devtools I found a GET request to https://stolichki.ru/cities/all
If I open this URL in Firefox, it returns a JSON object.
But Python requests returns an HTML page with a 403 status.
response_raw = requests.get('https://stolichki.ru/cities/all')
print(response_raw.text)
I tried changing the request headers:
headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3',
'Connection': 'keep-alive',
'DNT': '1',
'Host': 'stolichki.ru',
'Sec-Fetch-Dest': 'document',
'Sec-Fetch-Mode': 'navigate',
'Sec-Fetch-Site': 'none',
'Sec-Fetch-User': '?1',
'Sec-GPC': '1',
'TE': 'trailers',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:99.0) Gecko/20100101 Firefox/99.0'
}
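Passing them roughly like this (a sketch of the call; headers is the dict above, same endpoint as before):
import requests

response_raw = requests.get('https://stolichki.ru/cities/all', headers=headers)
print(response_raw.status_code)  # still 403
print(response_raw.text)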
But it didn't help.
Please don't suggest the grab library.
I have a Python script (url.py) like this:
import requests
headers = {
'authority': 'www.spain.com',
'pragma': 'no-cache',
'cache-control': 'no-cache',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'sec-fetch-site': 'none',
'sec-fetch-mode': 'navigate',
'sec-fetch-user': '?1',
'sec-fetch-dest': 'document',
'accept-language': 'en-US,en;q=0.9,pt;q=0.8',
}
links = ['https://www.spain.com']

for url in links:
    page = requests.get(url, headers=headers)
    print(page)
Output:
ubuntu@OS-Ubuntu:/mnt/$ python3 url.py
<Response [200]>
I need this to be filled in automatically, because I will receive a txt file (domain.txt) with domains like this:
www.spain.com
www.uk.com
www.italy.com
I want the Python script to stay generic and reusable: I would just add more domains to domain.txt, run url.py, and it would automatically make the request for every domain in domain.txt.
Can you help me with that?
Assuming url.py is located in the same directory as domains.txt, you can open the file and read each link into a list using:
with open('domains.txt', 'r') as f:
    links = f.read().splitlines()
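Since the entries in the file have no scheme, you also need to prepend one before requesting them. A minimal sketch putting it together (the user-agent here is just a placeholder; reuse the full headers dict from url.py if you like):
import requests

headers = {'user-agent': 'Mozilla/5.0'}  # or the full dict from url.py

with open('domains.txt', 'r') as f:
    links = f.read().splitlines()

for domain in links:
    url = f'https://{domain}'  # entries such as www.spain.com lack a scheme
    page = requests.get(url, headers=headers)
    print(url, page.status_code)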
import requests, json

api_url = "https://en.coinjinja.com/api/events/search"
headers = {'origin': 'https://en.coinjinja.com',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7',
'authority': 'en.coinjinja.com',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.106 Safari/537.36',
'content-type': 'application/json',
'content-length': '126',
'accept': '*/*',
'referer': 'https://en.coinjinja.com/events/time/next_week/tags/hardfork+airdrop+burn+exchange+partnership'
}
data = {"start": "2020-02-17","end":"2020-02-24","symbol":"","types":["hardfork","airdrop","burn","exchange","partnership"]}
api_request = requests.post(api_url, headers=headers, data=json.dumps(data))
print(api_request.headers)
print(api_request.encoding)
print(api_request.content.decode('utf-8','ignore'))
You need to install the brotli package to work with 'Content-Encoding': 'br'. This is a duplicate of unable to decode Python web request.
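A minimal sketch of both options, reusing api_url, headers and data from the snippet above (whether requests decodes br transparently depends on your urllib3 version, so treat that part as an assumption):
# Option 1: pip install brotli   (or brotlicffi)
# With the package installed, recent urllib3/requests versions decode a
# 'Content-Encoding: br' body transparently, so api_request.text is readable.

# Option 2: stop advertising Brotli so the server falls back to gzip/deflate,
# which requests always decodes on its own.
headers['accept-encoding'] = 'gzip, deflate'
api_request = requests.post(api_url, headers=headers, data=json.dumps(data))
print(api_request.text)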
I want to scrape Facebook company pages for their date (if they have one).
The problem is that when I try to retrieve the HTML, I seem to get the Hebrew version of it (I'm located in Israel).
This is part of the result:
�1u�9X�/.������~�O+$B\^����y�����e�;�+
Code:
import requests
from bs4 import BeautifulSoup
headers = {'accept': '*/*',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-GB,en;q=0.9,en-US;q=0.8,hi;q=0.7,la;q=0.6',
'cache-control': 'no-cache',
'dnt': '1',
'pragma': 'no-cache',
'referer': 'https',
'sec-fetch-mode': 'no-cors',
'sec-fetch-site': 'cross-site',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
}
url = 'https://www.facebook.com/pg/google/about/'
def fetch(URL):
    try:
        response = requests.get(url=URL, headers=headers).text
        print(response)
    except:
        print('Could not retrieve data, or connect')

fetch(url)
Is there a way to get the English version of the site? Some subdomain? Or should I use a proxy in the request?
What you are seeing isn't the Hebrew version of the site, but a compressed response from the server. As a quick solution, you can remove the accept-encoding header from the request:
import requests
from bs4 import BeautifulSoup
headers = {
'accept': '*/*',
# 'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-GB,en;q=0.9,en-US;q=0.8,hi;q=0.7,la;q=0.6',
'cache-control': 'no-cache',
'dnt': '1',
'pragma': 'no-cache',
'referer': 'https',
'sec-fetch-mode': 'no-cors',
'sec-fetch-site': 'cross-site',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
}
url = 'https://www.facebook.com/pg/google/about/'
def fetch(URL):
    try:
        response = requests.get(url=URL, headers=headers).text
        print(response)
    except:
        print('Could not retrieve data, or connect')

fetch(url)
Prints the uncompressed page:
<!DOCTYPE html>
<html lang="en" id="facebook" class="no_js">
<head><meta charset="utf-8" /><meta name="referrer" content="origin-when-crossorigin" id="meta_referrer" /><script>window._cstart=+new Date();</script><script>function envFlush(a){function b(b){for(var c in a)b[
...and so on.
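Removing the header works because requests then falls back to its default Accept-Encoding (gzip, deflate), which it can always decode. You can also set that explicitly to keep compression on the wire while avoiding Brotli; a sketch, reusing the headers dict and url from above:
headers['accept-encoding'] = 'gzip, deflate'  # everything except Brotli
response = requests.get(url, headers=headers)
# The transfer is still compressed, but requests decompresses gzip/deflate
# for you, so .text is plain HTML again.
print(response.headers.get('Content-Encoding'), len(response.text))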