Python requests - 403 forbidden - despite setting `User-Agent` headers - python

import requests
import webbrowser
from bs4 import BeautifulSoup
url = 'https://www.gamefaqs.com'
#headers={'User-Agent': 'Mozilla/5.0'}
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
response = requests.get(url, headers)
response.status_code is returning 403.
I can browse the website using Firefox/Chrome, so it seems to be a coding error, but I can't figure out what mistake I'm making.
Thank you.

This works if you make the request through a Session object.
import requests
session = requests.Session()
response = session.get('https://www.gamefaqs.com', headers={'User-Agent': 'Mozilla/5.0'})
print(response.status_code)
Output:
200

Using a keyword argument works for me. In requests.get(url, headers) the second positional parameter is params, not headers, so your User-Agent dict was being sent as a query string instead of as request headers:
import requests
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get('https://www.gamefaqs.com', headers=headers)
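You can see this binding without sending anything over the network by preparing the request instead of dispatching it (a small illustrative sketch):

```python
import requests

# requests.get(url, X) binds X to the second positional parameter, `params`,
# so a dict passed positionally becomes a query string, not request headers.
prepared = requests.Request(
    'GET', 'https://www.gamefaqs.com', params={'User-Agent': 'Mozilla/5.0'}
).prepare()
print(prepared.url)                        # the header value ended up in the URL
print(prepared.headers.get('User-Agent'))  # and no User-Agent header was set
```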

Try using a Session.
import requests
url = 'https://www.gamefaqs.com'
session = requests.Session()
response = session.get(url, headers={'user-agent': 'Mozilla/5.0'})
print(response.status_code)
If the request still returns 403 Forbidden (even with a session object and
a user-agent header), you may need to add more headers:
headers = {
    'user-agent': 'Mozilla/5.0 ...',
    'accept': 'text/html,application...',
    'referer': 'https://...',
}
r = session.get(url, headers=headers)
In Chrome, the request headers can be found under Network > Headers > Request Headers in the Developer Tools (press F12 to toggle them).
The reason is that some websites check the user-agent, or for the presence of specific headers, before accepting a request.
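If you need such browser-like headers on every request, one option is to set them once as session defaults rather than repeating them on each call (the header values below are illustrative placeholders, not ones the site is known to require):

```python
import requests

session = requests.Session()
# Defaults applied to every request made through this session;
# per-request headers passed to get()/post() are merged on top of these.
session.headers.update({
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'accept': 'text/html,application/xhtml+xml',
    'referer': 'https://www.gamefaqs.com/',
})
print(session.headers['user-agent'])
```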

Related

Python trying to download a file with login and password

I am trying to access a comfort panel running Windows CE from Windows 10 and retrieve an audit-trail CSV with Python. After logging in with a username and password, I try to download the CSV file containing the audit-trail data, but a file containing the page's HTML gets downloaded instead. How do I download the actual file?
import requests
from bs4 import BeautifulSoup
loginurl = ("http://10.70.148.11/FormLogin")
secure_url = ("http://10.70.148.11/StorageCardSD?UP=TRUE&FORCEBROWSE")
downloadurl = ("http://10.70.148.11/StorageCardSD/AuditTrail0.csv?UP=TRUE&FORCEBROWSE")
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'}
payload = {
    'Login': 'Admin',
    'Password': 'Pass'
}
with requests.session() as s:
    s.post(loginurl, data=payload)
    r = s.get(secure_url)
    soup = BeautifulSoup(r.content, 'html.parser')
    print(soup.prettify())
    req = requests.get(downloadurl, headers=headers, allow_redirects=True)
    url_content = req.content
    csv_file = open('audittrail.csv', 'wb')
    csv_file.write(url_content)
    csv_file.close()
When you try to get the file, you call requests.get instead of going through the session, so the login cookies are not sent.
Make that request through the logged-in session (s.get instead of requests.get).
It should work.
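Put together, the whole flow can stay inside the session block. A sketch, using the URLs and credentials from the question but wrapped in a hypothetical helper function (it has not been tested against the actual device):

```python
import requests

def download_audit_trail(loginurl, downloadurl, payload, dest='audittrail.csv'):
    """Log in and fetch the CSV through the same session, so the
    authentication cookies set by the login POST are sent along."""
    with requests.Session() as s:
        s.post(loginurl, data=payload)   # session stores the auth cookies
        r = s.get(downloadurl)           # same session, so cookies are sent
        with open(dest, 'wb') as f:
            f.write(r.content)           # raw file bytes, not page HTML
    return dest

# Example call (values from the question):
# download_audit_trail(
#     "http://10.70.148.11/FormLogin",
#     "http://10.70.148.11/StorageCardSD/AuditTrail0.csv?UP=TRUE&FORCEBROWSE",
#     {'Login': 'Admin', 'Password': 'Pass'},
# )
```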

JS site does not return data when scraping

I'm using the script I always use to scrape data from the web, but this time I'm not having any success.
I would like to get the data from the table on the website:
https://www.rad.cvm.gov.br/ENET/frmConsultaExternaCVM.aspx
I'm using the following code for scraping:
from bs4 import BeautifulSoup
from selenium import webdriver
url = "https://www.rad.cvm.gov.br/ENET/frmConsultaExternaCVM.aspx"
browser = webdriver.PhantomJS()
browser.get(url)
html = browser.page_source
bs = BeautifulSoup(html, 'lxml')
print(bs)
Currently I only receive the site's JavaScript, not the data from the table itself.
Do an HTTP POST to https://www.rad.cvm.gov.br/ENET/frmConsultaExternaCVM.aspx/PopulaComboEmpresas
This will return the table data as JSON.
In the browser, press F12 --> Network --> Fetch/XHR to see more details, such as the HTTP headers and the POST body.
You can do this easily using only requests, since the endpoint returns a JSON response to a POST request.
Here is the working code:
import requests
import json
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
body = {'tipoEmpresa': '0'}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36',
    'x-dtpc': '33$511511524_409h2vHHVRBIAIGILPJNCRGRCECUBIACWCBUEE-0e37',
    'X-Requested-With': 'XMLHttpRequest',
    'Content-Type': 'application/json'
}
url = 'https://www.rad.cvm.gov.br/ENET/frmConsultaExternaCVM.aspx/PopulaComboEmpresas'
r = requests.post(url, data=json.dumps(body), headers=headers, verify=False)
res = r.json()['d']
print(res)

Unable to fetch a response - requests library, Python

I am unable to fetch a response from this URL, although it works in the browser, even in incognito mode. The request just keeps running without any output and no errors. I even tried setting a 'user-agent' request header, but still received no response.
Following is the code used:
import requests
response = requests.get('https://www1.nseindia.com/ArchieveSearch?h_filetype=eqbhav&date=04-12-2020&section=EQ')
print(response.text)
I want html text from the response page for further use.
The server checks whether the request is coming from a web browser; if not, it returns nothing. Try this:
import requests
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:83.0) Gecko/20100101 Firefox/83.0'}
r = requests.get('https://www1.nseindia.com/ArchieveSearch?h_filetype=eqbhav&date=04-12-2020&section=EQ', timeout=3, headers=headers)
print(r.text)

Incomplete HTML Content Using Python request.get

I am trying to get HTML content from a URL using requests.get in Python, but I am getting an incomplete response.
import requests
from lxml import html
url = "https://www.expedia.com/Hotel-Search?destination=Maldives&latLong=3.480528%2C73.192127&regionId=109&startDate=04%2F20%2F2018&endDate=04%2F21%2F2018&rooms=1&_xpid=11905%7C1&adults=2"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36',
    'Content-Type': 'text/html',
}
response = requests.get(url, headers=headers)
print(response.content)
Can anyone suggest the changes needed to get the complete response?
NB: using Selenium I am able to get the complete response, but that is not the recommended way.
If you need to get content generated dynamically by JavaScript and you don't want to use Selenium, you can try requests-html tool that supports JavaScript:
from requests_html import HTMLSession
session = HTMLSession()
url = "https://www.expedia.com/Hotel-Search?destination=Maldives&latLong=3.480528%2C73.192127&regionId=109&startDate=04%2F20%2F2018&endDate=04%2F21%2F2018&rooms=1&_xpid=11905%7C1&adults=2"
r = session.get(url)
r.html.render()
print(r.html.html)  # the rendered HTML; r.content still holds the original response

Not able to login to a website using requests python package

I am using the following script to log in to https://www.mbaco.com/login. While I am not getting any error, I can't access the protected pages of the website. Please help.
import requests
url = 'https://www.mbaco.com/login'
payload = {
    '_username': "myusername",
    '_password': "password"
}
session = requests.session()
r = session.post(url, data=payload)
You have the wrong URL: the POST goes to https://www.mbaco.com/login_check. It is also a good idea to add a user-agent:
import requests
headers = {"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"}
url = 'https://www.mbaco.com/login_check'
payload = {
    '_username': "myusername",
    '_password': "password"
}
session = requests.session()
r = session.post(url, data=payload, headers=headers)
If you want to see what gets posted and where, open the browser developer tools (or Firebug) and inspect the network requests; there you can see exactly what is posted and to which URL.
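A quick way to confirm the login actually worked is to request a protected page through the same session and check whether you were bounced back to the login form. The helper below is a sketch with a hypothetical name and a heuristic success check; the exact check depends on how the site signals a failed login:

```python
import requests

def login_ok(session, protected_url, login_path='/login'):
    """Heuristic check: a failed login typically redirects protected
    pages back to the login form, or answers with 401/403."""
    r = session.get(protected_url, allow_redirects=True)
    return r.status_code == 200 and login_path not in r.url
```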
