Python requests is giving "Access Denied"

I was trying to download some data using the Python requests library as follows:
import requests
head = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0'}
session = requests.session()
session.get('https://www1.nseindia.com/ArchieveSearch?h_filetype=fobhav&date=02-07-2011&section=FO', headers=head)
r= session.get('https://www1.nseindia.com/content/historical/DERIVATIVES/2011/AUG/fo02AUG2011bhav.csv.zip', headers=head)
print(r.status_code)
print(r.content)
But the above code is giving me following output:
403
b'<HTML><HEAD>\n<TITLE>Access Denied</TITLE>\n</HEAD><BODY>\n<H1>Access Denied</H1>\n \nYou don\'t have permission to access "http://www1.nseindia.com/content/historical/DERIVATIVES/2011/AUG/fo02AUG2011bhav.csv.zip" on this server.<P>\nReference #18.661c2017.1662218167.332744f\n</BODY>\n</HTML>\n'
Why am I getting "Access Denied"? Anyone who visits the website can simply select the date and download the data.
EDIT
Site to visit to get the url: https://www1.nseindia.com/products/content/derivatives/equities/archieve_fo.htm
Select 'bhavcopy' and a date to get the link.
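A common cause of a 403 here is that the CDN in front of nseindia.com checks more than just the User-Agent, and expects the cookies set by the archive page on the download request. A minimal sketch of that approach, assuming the archive page sets the cookies the download URL expects; the exact header set required is an assumption:

```python
import requests

# Headers beyond User-Agent that protected sites commonly check;
# which ones this server actually requires is an assumption.
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Referer": "https://www1.nseindia.com/products/content/derivatives/equities/archieve_fo.htm",
}

def fetch_bhavcopy(url: str) -> bytes:
    """Visit the archive page first so the session picks up cookies,
    then download the zip with the same session and headers."""
    with requests.Session() as session:
        session.headers.update(headers)
        # Priming request: collects the cookies the server may expect
        # to see on the subsequent download request.
        session.get(
            "https://www1.nseindia.com/products/content/derivatives/equities/archieve_fo.htm",
            timeout=10,
        )
        r = session.get(url, timeout=10)
        r.raise_for_status()
        return r.content
```

Calling `fetch_bhavcopy()` with the zip URL from the question would then return the raw bytes of the archive, or raise an exception if the server still refuses the request.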

Unable to fetch a response - Request library Python

I am unable to fetch a response from this URL, even though it works in the browser, including incognito mode. The script just keeps running without producing any output or errors. I even tried setting a 'user-agent' request header, but still received no response.
Following is the code used:
import requests
response = requests.get('https://www1.nseindia.com/ArchieveSearch?h_filetype=eqbhav&date=04-12-2020&section=EQ')
print(response.text)
I want html text from the response page for further use.
The server is checking whether the request comes from a web browser; if not, it returns nothing. Try this:
import requests
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:83.0) Gecko/20100101 Firefox/83.0'}
r=requests.get('https://www1.nseindia.com/ArchieveSearch?h_filetype=eqbhav&date=04-12-2020&section=EQ', timeout=3, headers=headers)
print(r.text)
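Since the original symptom was a request that hangs forever, it can help to wrap the call so that timeouts and HTTP errors surface explicitly rather than silently. A small sketch along those lines (the wrapper function name is mine):

```python
import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:83.0) Gecko/20100101 Firefox/83.0"}

def fetch_html(url, headers, timeout=3):
    """Return the page HTML, raising for HTTP errors and surfacing timeouts."""
    try:
        r = requests.get(url, headers=headers, timeout=timeout)
        # Turn a 403/404/500 into an exception instead of silently
        # returning an error page.
        r.raise_for_status()
        return r.text
    except requests.exceptions.Timeout:
        # Without a timeout the request can hang indefinitely, which is
        # exactly what "keeps running without any output" looks like.
        raise RuntimeError(f"{url} timed out after {timeout}s")
```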

Syntax Error in Python Requests library for the identifier "headers"

I'm trying to build a web scraper from a tutorial I watched.
Replicating the same code gives me the following error.
import requests
import bs4
r = requests.get("http://www.pyclass.com/example.html", headers={"User-agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0"})  
c=r.content
The error says "SyntaxError: invalid character in identifier", and the word headers is highlighted.
I really need to use headers so I can fetch the data by impersonating a web browser; without it I get a 406 error.
This error usually means an invisible non-ASCII character (such as a non-breaking space, easily picked up when copying code from a web page or video) ended up in the line. Retype the line by hand, or try the code below:
import requests
head={"User-agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0"}
r=requests.get("http://www.example.com/", headers=head)
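If retyping doesn't immediately reveal the culprit, you can locate the offending character programmatically. A sketch, where the pasted_line value is a made-up example containing a non-breaking space (U+00A0) where an ordinary space should be:

```python
# Locate non-ASCII characters that trigger "invalid character in identifier".
# This example line deliberately contains a non-breaking space (U+00A0)
# after the comma, mimicking a bad copy-paste.
pasted_line = 'r = requests.get(url,\u00a0headers=head)'

def find_non_ascii(line):
    """Return (index, character, codepoint) for every non-ASCII character."""
    return [(i, c, hex(ord(c))) for i, c in enumerate(line) if ord(c) > 127]

print(find_non_ascii(pasted_line))  # reports the U+00A0 at index 21
```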

Bad request trying to scrape page using Python 3

I am trying to scrape the following page using Python 3, but I keep getting HTTP Error 400: Bad Request. I have looked at some previous answers suggesting urllib.quote, which didn't work for me since it's Python 2. I also tried the following code, as suggested by another post, and it still didn't work.
url = requote_uri('http://www.txhighereddata.org/Interactive/CIP/CIPGroup.cfm?GroupCode=01')
with urllib.request.urlopen(url) as response:
html = response.read()
The server denies queries that don't carry a browser-like User-Agent HTTP header.
Just pick a browser's User-Agent string and set it as a header on your query:
import urllib.request
url = 'http://www.txhighereddata.org/Interactive/CIP/CIPGroup.cfm?GroupCode=01'
headers={
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0"
}
request = urllib.request.Request(url, headers=headers)
with urllib.request.urlopen(request) as response:
html = response.read()
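You can confirm the header is actually attached before sending anything, since building the Request object doesn't touch the network. A quick offline check:

```python
import urllib.request

url = 'http://www.txhighereddata.org/Interactive/CIP/CIPGroup.cfm?GroupCode=01'
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0"
}

# Build the Request without sending it; this shows the header is attached,
# which is what distinguishes it from the failing bare urlopen(url) call.
# Note urllib stores header names capitalized, hence "User-agent".
request = urllib.request.Request(url, headers=headers)
print(request.get_header("User-agent"))
```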

requests (Python lib) doesn't get a correct response with POST

I want to communicate with a website. I can log into the site successfully, but I can't send the query below. The parameters are clear in the images below, but I don't know why my code gets a response code of 400.
Header:
Params:
Here is my code in python:
#init user-agent header for performance and compatibility
heads={ 'User-Agent' : "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0" }
user = "XXXXXXXXXXX"
user_id="4013370545"
#update session
user_url = instagram_url + user + "/"
result=c.get(user_url,headers=heads)
#fist request for followers.
qdata= "ig_user("+user_id+")+{++followed_by.first(10)+{++++count,++++page_info+{++++++end_cursor,++++++has_next_page++++},++++nodes+{++++++id,++++++is_verified,++++++followed_by_viewer,++++++requested_by_viewer,++++++full_name,++++++profile_pic_url,++++++username++++}++}}"
queryid="17851938028087704"
query_data= {'q': qdata, 'ref': "relationships::follow_list", 'query_id': queryid}
#query_data="q="+qdata+"ref="+"relationships::follow_list"+"query_id="+queryid
#set number of header required for login
heads['X-Requested-With']="XMLHttpRequest"
heads['X-Instagram-AJAX']="1"
heads['Referer']='https://www.instagram.com/'+user+'/'
heads['Host']= 'www.instagram.com'
heads['X-CSRFToken']=result.cookies['csrftoken']
#heads['Accept-Encoding']="gzip, deflate, br"
#heads['Accept-Language']="en-US,en;q=0.5"
#heads['Accept']='*/*'
#heads['Content-Type']='application/x-www-form-urlencoded'
#login to the instagram using query_data and prepared headers
result =c.post(instagram_url+"query/", data=query_data, headers=heads)
but result is:
<Response [400]>
Where is my mistake? Any suggestions?
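One way to debug an HTTP 400 like this is to build the request without sending it and inspect exactly what would go over the wire, then compare it against the browser's request in the network tab. A sketch using requests' PreparedRequest; the URL and form fields here merely mirror the question and are assumptions:

```python
import requests

# Form fields mirroring the question (values are placeholders).
query_data = {
    "q": "ig_user(4013370545)+{followed_by.first(10){count}}",
    "ref": "relationships::follow_list",
    "query_id": "17851938028087704",
}
heads = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0",
    "X-Requested-With": "XMLHttpRequest",
}

# Prepare the POST without sending it, then inspect what requests would send.
req = requests.Request("POST", "https://www.instagram.com/query/",
                       data=query_data, headers=heads)
prepared = req.prepare()
print(prepared.headers.get("Content-Type"))  # application/x-www-form-urlencoded
print(prepared.body)  # the exact percent-encoded form body
```

If the Content-Type or the encoded body differs from what the browser sends, that difference is usually the source of the 400.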

trying to log in and scrape a website through asp.net

I have written a program that logs into one of my company's websites and then scrapes data, with the aim of making data collection quicker. It uses requests and Beautiful Soup.
I can get it to print the HTML for the login page, but I can't get it to log in past the .aspx page and then print the HTML of the page after it.
Below is the code I'm using, along with my headers and params. Any help would be appreciated.
import requests
from bs4 import BeautifulSoup
URL="http://mycompanywebsiteloginpage.co.uk/Login.aspx"
headers={"User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:44.0) Gecko/20100101 Firefox/44.0 Iceweasel/44.0.2"}
username="myusername"
password="mypassword"
s=requests.Session()
s.headers.update(headers)
r=s.get(URL)
soup=BeautifulSoup(r.content)
VIEWSTATE=soup.find(id="__VIEWSTATE")['value']
EVENTVALIDATION=soup.find(id="__EVENTVALIDATION")['value']
EVENTTARGET=soup.find(id="__EVENTTARGET")['value']
EVENTARGUEMENT=soup.find(id="__EVENTARGUMENT")['value']
login_data={"__VIEWSTATE":VIEWSTATE,
"ctl00$ContentPlaceHolder1$_tbEngineerUsername":username,
"ctl00$ContentPlaceHolder1$_tbEngineerPassword":password,
"ctl00$ContentPlaceHolder1$_tbSiteOwnerEmail":"",
"ctl00$ContentPlaceHolder1$_tbSiteOwnerPassword":"",
"ctl00$ContentPlaceHolder1$tbAdminName":username,
"ctl00$ContentPlaceHolder1$tbAdminPassword":password,
"__EVENTVALIDATION":EVENTVALIDATION,
"__EVENTTARGET":EVENTTARGET,
"--EVENTARGUEMENT":EVENTARGUEMENT}
r = s.post(URL, data=login_data)
r = requests.get("http://mycompanywebsitespageafterthelogin.co.uk/Secure/")
print (r.url)
print (r.text)
FORM DATA
__VIEWSTATE:"DAwNEAIAAA4BBQAOAQ0QAgAADgEFAw4BDRACDwEBBm9ubG9hZAFkU2hvd1BhbmVsKCdjdGwwMF9Db250ZW50UGxhY2VIb2xkZXIxX19wbkFkbWluaXN0cmF0b3JzJywgZG9jdW1lbnQuZ2V0RWxlbWVudEJ5SWQoJ2FkbWluTG9naW5MaW5rJykpOwAOAQUBDgENEAIAAA4DBQEFBwULDgMNEAIMDwEBDUFsdGVybmF0ZVRleHQBDldEU0kgRGFzaGJvYXJkAAAAAA0QAgAADgIFAAUBDgINEAIPAQEEVGV4dAEEV0RTSQAAAA0QAgwPAQEHVmlzaWJsZQgAAAAADRACDwECBAABBFdEU2kAAAAAAABCX8QugS7ztoUJMfDmZ0s20ZNQfQ=="
ctl00$ContentPlaceHolder1$_tbEngineerUsername:"myusername"
ctl00$ContentPlaceHolder1$_tbEngineerPassword:"mypassword"
ctl00$ContentPlaceHolder1$_tbSiteOwnerEmail:""
ctl00$ContentPlaceHolder1$_tbSiteOwnerPassword:""
ctl00$ContentPlaceHolder1$tbAdminName:"myusername"
ctl00$ContentPlaceHolder1$tbAdminPassword:"mypassword"
__EVENTVALIDATION:"HQABAAAA/////wEAAAAAAAAADwEAAAAKAAAACBzHEFXh+HCtf3vdl8crWr6QZnmaeK7pMzThEoU2hwqJxnlkQDX2XLkLAOuKEnW/qBMtNK2cdpQgNxoGtq65"
__EVENTTARGET:"ctl00$ContentPlaceHolder1$_btAdminLogin"
__EVENTARGUMENT:""
REQUEST COOKIES
ASP.NET_SessionId:"11513CDDE31AF267CCD87BAB"
RESPONSE HEADERS
Cache-Control:"private"
Connection:"Keep-Alive"
Content-Length:"123"
Content-Type:"text/html; charset=utf-8"
Date:"Thu, 28 Jul 2016 13:37:45 GMT"
Keep-Alive:"timeout=15, max=91"
Location:"/Secure/"
Server:"Apache/2.2.14 (Ubuntu)"
x-aspnet-version:"2.0.50727"
REQUEST HEADERS
Host:"mycompanywebsite.co.uk"
User-Agent:"Mozilla/5.0 (X11; Linux x86_64; rv:44.0) Gecko/20100101 Firefox/44.0 Iceweasel/44.0.2"
Accept:"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
Accept-Language:"en-US,en;q=0.5"
Accept-Encoding:"gzip, deflate"
Referer:"http://mycompanywebsiteloginpage/Login.aspx"
Cookie:"ASP.NET_SessionId=F11CB47B137ADB66D2274758"
Connection:"keep-alive"
Change the line
r = requests.get("http://mycompanywebsitespageafterthelogin.co.uk/Secure/")
to use your session object, so the cookies from the login POST are sent with it:
r = s.get("http://mycompanywebsitespageafterthelogin.co.uk/Secure/")
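The reason this matters: a Session keeps cookies from earlier responses (such as the ASP.NET_SessionId set at login) and sends them on later requests, while a bare requests.get() starts from an empty cookie jar. A small offline sketch, with the session id value mirroring the one in the question:

```python
import requests

s = requests.Session()
# Simulate what the login response's Set-Cookie header would do to the
# session-level jar (value copied from the question's REQUEST COOKIES).
s.cookies.set("ASP.NET_SessionId", "11513CDDE31AF267CCD87BAB",
              domain="mycompanywebsiteloginpage.co.uk")

# The session jar now holds the cookie that authenticates later requests.
print(dict(s.cookies))

# A fresh, session-less jar has no such cookie:
print(requests.cookies.RequestsCookieJar().get("ASP.NET_SessionId"))  # None
```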
