I'm facing an issue accessing the following URL via Python code:
https://www1.nseindia.com/content/historical/EQUITIES/2021/JAN/cm01JAN2021bhav.csv.zip
This had been working for the last 3 years, until 31-Dec-2020. It seems the site has implemented some restrictions.
There's a solution to a similar problem here:
VB NSE ACCESS DENIED
This addition was made: "User-Agent" : "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11" and "Referer" : "https://www1.nseindia.com/products/content/equities/equities/archieve_eq.htm"
Original code is here :
https://github.com/naveen7v/Bhavcopy/blob/master/Bhavcopy.py
It's not working even after adding the following in the requests section:
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11'}
# print(Path)
a = requests.get(Path, headers)
Can someone help?
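One detail worth checking in that snippet: requests.get(Path, headers) passes the dict positionally, so requests treats it as the params argument (a query string), not as headers. A minimal sketch with the headers, including the Referer from the VB solution, passed as a keyword argument:

import requests

# Sketch only: header values are the ones quoted from the VB answer above,
# applied to the bhavcopy URL from the question.
url = "https://www1.nseindia.com/content/historical/EQUITIES/2021/JAN/cm01JAN2021bhav.csv.zip"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11',
    'Referer': 'https://www1.nseindia.com/products/content/equities/equities/archieve_eq.htm',
}
a = requests.get(url, headers=headers)  # headers= as a keyword, not positional
print(a.status_code)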
import io
import zipfile
import requests

def download_bhavcopy(formated_date):
    # formated_date is expected as DD-MM-YYYY
    url = "https://www1.nseindia.com/content/historical/DERIVATIVES/{0}/{1}/fo{2}{1}{0}bhav.csv.zip".format(
        formated_date.split('-')[2],
        month_dict[formated_date.split('-')[1]],
        formated_date.split('-')[0])
    print(url)
    hdr = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36',
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
           'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
           'Accept-Encoding': 'gzip, deflate, br',
           'Accept-Language': 'en-IN,en;q=0.9,en-GB;q=0.8,en-US;q=0.7,hi;q=0.6',
           'Connection': 'keep-alive',
           'Cache-Control': 'max-age=0',
           'Host': 'www1.nseindia.com',
           'Referer': 'https://www1.nseindia.com/products/content/derivatives/equities/fo.htm',
           }
    # session cookie captured from a browser visit; it expires, so refresh it when requests start failing
    cookie_dict = {'bm_sv': 'E2109FAE3F0EA09C38163BBF24DD9A7E~t53LAJFVQDcB/+q14T3amyom/sJ5dm1gV7z2R0E3DKg6WiKBpLgF0t1Mv32gad4CqvL3DIswsfAKTAHD16vNlona86iCn3267hHmZU/O7DrKPY73XE6C4p5geps7yRwXxoUOlsqqPtbPsWsxE7cyDxr6R+RFqYMoDc9XuhS7e18='}
    session = requests.session()
    for cookie in cookie_dict:
        session.cookies.set(cookie, cookie_dict[cookie])
    response = session.get(url, headers=hdr)
    if response.status_code == 200:
        print('Success!')
    elif response.status_code == 404:
        print('Not Found.')
    else:
        print('response.status_code ', response.status_code)
    file_name = "none"
    try:
        zipT = zipfile.ZipFile(io.BytesIO(response.content))
        zipT.extractall()
        file_name = zipT.filelist[0].filename
        print('file name ' + file_name)
    except zipfile.BadZipFile:  # the response body was not a valid zip (e.g. an error page)
        print('Error: Zip file is corrupted')
    except zipfile.LargeZipFile:  # raised when Zip64 support would be needed but is not enabled
        print('Error: File size is too large')
    print(file_name)
    return file_name
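The function relies on a month_dict that isn't shown. A plausible definition and call, assuming the input date is DD-MM-YYYY and the dict maps month numbers to NSE month codes (both assumptions, not from the original code):

# Assumed mapping from month number to NSE month code (not in the original answer)
month_dict = {'01': 'JAN', '02': 'FEB', '03': 'MAR', '04': 'APR',
              '05': 'MAY', '06': 'JUN', '07': 'JUL', '08': 'AUG',
              '09': 'SEP', '10': 'OCT', '11': 'NOV', '12': 'DEC'}

print(download_bhavcopy('01-01-2021'))  # derivatives bhavcopy for 1 Jan 2021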
Inspect the link in your web browser (developer tools, Network tab) and find the GET request for the required download link.
Go to Headers and check the User-Agent,
e.g. User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0
Now modify your code as:
import requests
headers = {
'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0'
}
result = requests.get(URL, headers=headers)
Related
I wanted to get the Fear and Greed index from https://alternative.me/crypto/, and I found the URL that serves the XHR data by using the Chrome network tool, but when trying to scrape it, I got a 404 or 500 error page.
url = 'https://alternative.me/api/crypto/fear-and-greed-index/history'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36',
    'Referer': 'https://alternative.me/crypto/fear-and-greed-index/',
    'Accept-Language': 'ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7',
    'accept-encoding': 'gzip, deflate, br',
    'accept': 'application/json, text/plain, */*'
}
res = requests.post(url, headers=headers)
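If the page's XHR actually issues a GET rather than a POST, the method alone can produce errors like these, so it's worth re-checking the request method in the network tool. A sketch trying GET with the same headers (switching the method is a guess, not confirmed against the site):

import requests

url = 'https://alternative.me/api/crypto/fear-and-greed-index/history'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36',
    'Referer': 'https://alternative.me/crypto/fear-and-greed-index/',
    'accept': 'application/json, text/plain, */*',
}
res = requests.get(url, headers=headers)  # guess: GET instead of POST
print(res.status_code)
if res.ok:
    print(res.json())  # expect a JSON body if the guess is right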
I have Python code like
import requests
headers = {'Host': 'www.google.com',
           'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0',
           'Accept': '*/*',
           'Accept-Language': 'en-US,en;q=0.5',
           'Accept-Encoding': 'gzip, deflate, br',
           'Referer': 'https://www.google.com/',
           'Origin': 'https://www.google.com',
           'Connection': 'keep-alive',
           'Content-Length': '0',
           'TE': 'Trailers'}
response = requests.get("https://www.google.co.in/search?tbs=sbi:AMhZZivWlHh9fYSFQ1SYSgdWdYroq7vlNqRWbgzAeOHgb1_1aVO6EfHf9oo4N6kMf9pR-MjgXMeP5EG4VTTeZ5UujHI12znActXxMoyDqKsqI0cgI9YJ_11xd5R0DiKpo2drjWKnK2lNgGSGJYKdDFJ0ZNKhfBTUn3WKSmG72gLR07uPXdby9jCXC1KJFqBSpaGNrJ6Zc6LSQymwNqqJZrO8iwNYRPzJsoHlWUZNSoZ1X18Ii8X7x0TrlgSz0HySJ_1QO3E8LLbaE0rZluLVBsk6t0GDW2MR4IXs3dCuCcTMPDgqZS-CMks6Tgc6xkDLyLLBC051S6gNxRhXpZ3FVg75Vlt_1nAptI_1Vpw",headers=headers)
print(len(response.text))
and Flutter code like
import 'package:dio/dio.dart';
Dio dio = Dio();
var headers = {'Host': 'www.google.com',
               'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0',
               'Accept': '*/*',
               'Accept-Language': 'en-US,en;q=0.5',
               'Accept-Encoding': 'gzip, deflate, br',
               'Referer': 'https://www.google.com/',
               'Origin': 'https://www.google.com',
               'Connection': 'keep-alive',
               'Content-Length': '0',
               'TE': 'Trailers'};
var response = await dio.get("https://www.google.co.in/search?tbs=sbi:AMhZZivWlHh9fYSFQ1SYSgdWdYroq7vlNqRWbgzAeOHgb1_1aVO6EfHf9oo4N6kMf9pR-MjgXMeP5EG4VTTeZ5UujHI12znActXxMoyDqKsqI0cgI9YJ_11xd5R0DiKpo2drjWKnK2lNgGSGJYKdDFJ0ZNKhfBTUn3WKSmG72gLR07uPXdby9jCXC1KJFqBSpaGNrJ6Zc6LSQymwNqqJZrO8iwNYRPzJsoHlWUZNSoZ1X18Ii8X7x0TrlgSz0HySJ_1QO3E8LLbaE0rZluLVBsk6t0GDW2MR4IXs3dCuCcTMPDgqZS-CMks6Tgc6xkDLyLLBC051S6gNxRhXpZ3FVg75Vlt_1nAptI_1Vpw",options: Options(headers: headers,followRedirects: true));
var a=response.data;
print(a.length);
The problem is that the results I am getting from the two packages are not the same. I want the output of the Python code, but I have to implement it in Flutter. A Flutter solution with any other package would also be fine.
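One likely source of the mismatch: hand-setting 'Accept-Encoding: gzip, deflate, br' invites a Brotli-compressed response, and requests only decodes Brotli transparently when the optional brotli package is installed, while Dio negotiates and decodes on its own. A sketch of the Python side with the encoding header dropped so the library negotiates compression itself (this change is my suggestion, not from the question):

import requests

# Let requests negotiate Accept-Encoding so the body is always decoded;
# put the long search URL from the question in place of the placeholder.
url = "https://www.google.co.in/search?tbs=sbi:..."  # full URL as in the question
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0',
    'Accept': '*/*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Referer': 'https://www.google.com/',
}
response = requests.get(url, headers=headers)
print(len(response.text))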
I'm trying to submit info to this site: https://cxkes.me/xbox/xuid
The info: e = {'gamertag': "Xi Fall iX"}
Every time I try, I get WinError 10054, and I can't seem to find a fix for it.
My Code:
import urllib.parse
import urllib.request
import json
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
url = "https://cxkes.me/xbox/xuid"
e = {'gamertag' : "Xi Fall iX"}
f = {'accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
'accept-encoding': "gzip, deflate, br",
'accept-language': "en-GB,en-US;q=0.9,en;q=0.8",
'cache-control': "max-age=0",
'content-length': "76",
'content-type': "application/x-www-form-urlencoded",
'cookie': "__cfduid=d2f371d250727dc4858ad1417bdbcfba71593253872; XSRF-TOKEN=eyJpdiI6IjVcL2dHMGlYSGYwd3ZPVEpTRGlsMnFBPT0iLCJ2YWx1ZSI6InA4bDJ6cEtNdzVOT3UxOXN4c2lcLzlKRTlYaVNvZjdpMkhqcmllSWN3eFdYTUxDVHd4Y2NiS0VqN3lDSll4UDhVMHM1TXY4cm9lNzlYVGE0dkRpVWVEZz09IiwibWFjIjoiYjdlNjU3ZDg3M2Y0MDBlZDY3OWE5YTdkMWUwNGRiZTVkMTc5OWE1MmY1MWQ5OTQ2ODEzNzlhNGFmZGNkZTA1YyJ9; laravel_session=eyJpdiI6IjJTdlFhK0dacFZ4cFI5RFFxMHgySEE9PSIsInZhbHVlIjoia2F6UTJXVmNSTEt1M3lqekRuNVFqVE5ZQkpDang4WWhraEVuNm0zRmlVSjVTellNTDRUb1wvd1BaKzNmV2lISGNUQ0l6Z21jeFU3VlpiZzY0TzFCOHZ3PT0iLCJtYWMiOiIwODU3YzMxYzg2N2UzMjdkYjcxY2QyM2Y4OTVmMTY1YTcxZTAxZWI0YTExZDE0ZjFhYWI2NzRlODcyOTg3MjIzIn0%3D",
'origin': "https://cxkes.me",
'referer': "https://cxkes.me/xbox/xuid",
'sec-fetch-dest': "document",
'sec-fetch-mode': "navigate",
'sec-fetch-site': "same-origin",
'sec-fetch-user': "?1",
'upgrade-insecure-requests': "1",
'user-agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"}
data = urllib.parse.urlencode(e)
data = json.dumps(data)  # note: json-encodes the already urlencoded string
data = str(data)
data = data.encode('ascii')
req = urllib.request.Request(url, data, f)
with urllib.request.urlopen(req) as response:
the_page = response.read()
print(the_page)
Having run the code, I get the following error:
[WinError 10054] An existing connection was forcibly closed by the remote host
That could be caused by any of the following:
the network link between server and client temporarily going down,
the server running out of system resources,
or the client sending malformed data (see the sketch below).
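Of those three, malformed data is the easiest to rule out here: the snippet urlencodes the form and then runs it through json.dumps, which wraps the body in literal quote characters, so the server receives something that is no longer valid application/x-www-form-urlencoded. A minimal sketch building the body with urlencode alone, dropping the copied cookie and the hard-coded content-length (urllib computes the length itself); the site may still demand its XSRF cookie/token pair, so this only rules out the body format:

import urllib.parse
import urllib.request

url = "https://cxkes.me/xbox/xuid"
# urlencode once, then encode to bytes; no json.dumps in between
data = urllib.parse.urlencode({'gamertag': "Xi Fall iX"}).encode('ascii')
headers = {
    'user-agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36",
    'content-type': "application/x-www-form-urlencoded",
}
req = urllib.request.Request(url, data, headers)
with urllib.request.urlopen(req) as response:
    print(response.read())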
I am not sure what you're trying to achieve here entirely. But if your aim is simply to read the XUID of a gamertag, then use a web automator like Selenium to retrieve that value, e.g. the sketch below.
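A minimal Selenium sketch; the input name 'gamertag' and the submit-button selector are assumptions about the page's markup, not verified:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://cxkes.me/xbox/xuid")
# Assumed markup: an input named 'gamertag' and a submit button
driver.find_element(By.NAME, "gamertag").send_keys("Xi Fall iX")
driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()
print(driver.page_source)  # the XUID should appear in the rendered page
driver.quit()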
import requests
x = requests.get('https://www1.nseindia.com/live_market/dynaContent/live_watch/equities_stock_watch.htm' )
print(x.status_code)
print(x.content)
This gives a connection error. Please help me correct it.
Try this:
import requests
url = "https://www1.nseindia.com/live_market/dynaContent/live_watch/equities_stock_watch.htm"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, '
'like Gecko) '
'Chrome/80.0.3987.149 Safari/537.36',
'accept-language': 'en,gu;q=0.9,hi;q=0.8', 'accept-encoding': 'gzip, deflate, br'}
session = requests.Session()
request = session.get(url, headers=headers, timeout=5)
cookies = dict(request.cookies)
response = session.get(url, headers=headers, timeout=5, cookies=cookies)
print(response.status_code)
print(response.content)
This code is for the first time you access the website in your program. If you're accessing the site multiple times, reuse
response = session.get(url, headers=headers, timeout=5, cookies=cookies)
each time you access it again, as in the sketch below.
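A minimal sketch of that reuse:

# Reuse the same session (and the cookies captured above) for later requests
for _ in range(3):
    response = session.get(url, headers=headers, timeout=5, cookies=cookies)
    print(response.status_code)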
Tell me if this works.
Try adding a user agent to the headers:
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.162 Safari/537.36'}
r = requests.get('https://www1.nseindia.com/live_market/dynaContent/live_watch/equities_stock_watch.htm', headers=headers)
print(r.content)
I want to scrape a website that requires a login first, using Beautiful Soup and Python requests. I'm able to log in by sending my username and password via a POST request; however, making a GET request within the same session after login yields a 403 (Forbidden) error. The last line in my code produces the 'forbidden' message; is there a workaround?
import requests
from bs4 import BeautifulSoup
headers = {
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36'
}
payload = {
    'login': '#my_username', 'password': '#my_password', 'remember_me': 'false', 'fallback': 'false'
}
with requests.Session() as s:
    url = 'https://www.hackerrank.com/auth/login'
    r = s.get(url, headers=headers)
    soup = BeautifulSoup(r.content, 'html5lib')
    r = s.post(url, data=payload, headers=headers)
    print(r.content)
    s.get('Webpage_that_can_be_accessed_only_after_login', headers=headers)
I did almost the same thing; the only difference was that I passed the exact headers I saw being sent in Chrome, and I passed the csrf_token.
import requests
import json
from bs4 import BeautifulSoup
# header string picked from Chrome
headerString = '''
{
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "en-US,en;q=0.9",
    "cache-control": "max-age=0",
    "cookie": "hackerrank_mixpanel_token=7283187c-1f24-4134-a377-af6c994db2a0; hrc_l_i=F; _hrank_session=653fb605c88c81624c6d8f577c9094e4f8657136ca3487f07a3068c25080706db7178cc4deda978006ce9d0937c138b52271e3cd199fda638e8a0b8650e24bb7; _ga=GA1.2.397113208.1599678708; _gid=GA1.2.933726361.1599678708; user_type=hacker; session_id=h3xb3ljp-1599678763378; __utma=74197771.397113208.1599678708.1599678764.1599678764.1; __utmc=74197771; __utmz=74197771.1599678764.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmt=1; __utmb=74197771.3.10.1599678764; _biz_uid=5969ac22487d4b0ff8d000621de4a30c; _biz_sid=79bd07; _biz_nA=1; _biz_pendingA=%5B%5D; _biz_flagsA=%7B%22Version%22%3A1%2C%22ViewThrough%22%3A%221%22%2C%22XDomain%22%3A%221%22%7D; _gat_UA-45092266-28=1; _gat_UA-45092266-26=1; session_referrer=https%3A%2F%2Fwww.google.com%2F; session_referring_domain=www.google.com; session_landing_url=https%3A%2F%2Fwww.hackerrank.com%2Fprefetch_data%3Fcontest_slug%3Dmaster%26get_feature_feedback_list%3Dtrue",
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "none",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36"
}
'''
d = json.loads(headerString)
# creating a session
s = requests.Session()
url = 'https://www.hackerrank.com/auth/login'
r = s.get(url, headers=d)
# getting the csrf_token
soup = BeautifulSoup(r.text, 'html.parser')
csrf_token = soup.find('meta', id='csrf-token')['content']
# using it in the login post call
request_header = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36",
    "x-csrf-token": csrf_token
}
payload = {"login": "<user-name>", "password": "<password>", "remember_me": False, "fallback": True}
r = s.post(url, headers=request_header, data=payload)
# then I tested whether login succeeded by going to the dashboard page;
# the login response returns a fresh csrf_token to use on later calls
d = json.loads(r.text)
csrf_token = d['csrf_token']
url = 'https://www.hackerrank.com/dashboard'
request_header = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36",
    "x-csrf-token": csrf_token
}
r = s.get(url, headers=request_header)  # no request body needed on the GET
print(r.text)