Is there a way to resolve FORBIDDEN FOR url? [duplicate] - python

This question already has an answer here:
Reading value from HTML page - nseindia
(1 answer)
Closed 2 years ago.
import requests
from bs4 import BeautifulSoup

url = "https://www.nseindia.com/api/option-chain-indices?symbol=BANKNIFTY"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36',
    'Accept-Language': 'en-US,en;q=0.9,hi;q=0.8,es;q=0.7',
    'Accept-Encoding': 'gzip, deflate, br',
}
response = requests.get(url, headers=headers)  # fetch the option-chain JSON
response.raise_for_status()
df = response.json()
This used to work, but now this error comes up: HTTPError: 401 Client Error: Unauthorized for url: https://www.nseindia.com/api/option-chain-indices?symbol=BANKNIFTY.
Any idea what I should do to access the website in Python?

Well, in that case we should visit the main page first to obtain the cookies, and then contact the API:
import requests

params = {
    'symbol': 'BANKNIFTY'
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:81.0) Gecko/20100101 Firefox/81.0'
}

def main(url):
    with requests.Session() as req:
        req.headers.update(headers)
        # visit the home page first so the session picks up the cookies the API expects
        r = req.get("https://www.nseindia.com")
        r = req.get(url, params=params).json()
        print(r)

main("https://www.nseindia.com/api/option-chain-indices")

Related

Python IndexError: list index out of range. Can someone help me?

Can someone help me solve the problem in this code?
CODE:
from bs4 import BeautifulSoup
import requests
headers = {'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36"}
url = "https://www.amazon.com/RUNMUS-Surround-Canceling-Compatible-Controller/dp/B07GRM747Y"
resp = requests.get(url, headers=headers)
s = BeautifulSoup(resp.content, features='lxml')
product_title = s.select("#productTitle")[0].get_text().strip()
print(product_title)
If you print what you actually get back as the response, you will see the real problem instead of the IndexError:
import requests
headers = {'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36"}
url = "https://www.amazon.com/RUNMUS-Surround-Canceling-Compatible-Controller/dp/B07GRM747Y"
resp = requests.get(url, headers=headers)
print(resp.content)
The output you are getting from this request:
b'<!--\n To discuss automated access to Amazon data please contact api-services-support@amazon.com.\n For information about migrating to our APIs refer to our Marketplace APIs...
The site you are sending requests to is not allowing you to access the content with the provided headers. So s.select("#productTitle") returns an empty list, and indexing it with [0] is what raises the IndexError.
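
As a side note, a defensive variant of the lookup avoids the IndexError entirely and makes the real failure obvious. A minimal sketch using the same URL and headers as above:
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36"}
url = "https://www.amazon.com/RUNMUS-Surround-Canceling-Compatible-Controller/dp/B07GRM747Y"

resp = requests.get(url, headers=headers)
s = BeautifulSoup(resp.content, features='lxml')

# select_one returns None when the element is missing, instead of raising
title = s.select_one("#productTitle")
if title is None:
    print("No #productTitle found - the page is probably a bot-check response")
else:
    print(title.get_text().strip())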

python requests not able to make connection to NSE india, Connection error

import requests
x = requests.get('https://www1.nseindia.com/live_market/dynaContent/live_watch/equities_stock_watch.htm')
print(x.status_code)
print(x.content)
This gives a connection error. Please help me correct it.
Try this:
import requests

url = "https://www1.nseindia.com/live_market/dynaContent/live_watch/equities_stock_watch.htm"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
           'accept-language': 'en,gu;q=0.9,hi;q=0.8',
           'accept-encoding': 'gzip, deflate, br'}

session = requests.Session()
# the first request picks up the cookies NSE sets on the HTML page
request = session.get(url, headers=headers, timeout=5)
cookies = dict(request.cookies)
# the second request sends those cookies back explicitly
response = session.get(url, headers=headers, timeout=5, cookies=cookies)
print(response.status_code)
print(response.content)
This code is for the first time you access the website in your program. If you are accessing the site multiple times, then run
response = session.get(url, headers=headers, timeout=5, cookies=cookies)
every time you access it again.
Tell me if this works.
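
To make that pattern reusable, one option is to keep a single session for the whole program; a minimal sketch (fetch_nse is a hypothetical helper name, not part of the answer above):
import requests

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
           'accept-language': 'en,gu;q=0.9,hi;q=0.8',
           'accept-encoding': 'gzip, deflate, br'}

session = requests.Session()
session.headers.update(headers)

def fetch_nse(url):
    # hypothetical helper: one shared session keeps the NSE cookies across calls
    response = session.get(url, timeout=5)
    response.raise_for_status()
    return response

url = "https://www1.nseindia.com/live_market/dynaContent/live_watch/equities_stock_watch.htm"
fetch_nse(url)                         # first call picks up the cookies
print(fetch_nse(url).status_code)      # later calls reuse them automatically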
Try adding a user agent to the headers:
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.162 Safari/537.36'}
r = requests.get('https://www1.nseindia.com/live_market/dynaContent/live_watch/equities_stock_watch.htm', headers=headers)
print(r.content)

WEB SCRAPING - python requests session not able to gather data

I've seen some similar threads, but none of them gave me the answer. I simply need to get the HTML content from one website. I'm sending a POST request with the data for a particular case, and then with a GET request I want to scrape the text from the HTML. The problem is that I always receive the first page's content. Not sure what I am doing wrong.
import requests

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'pl-PL,pl;q=0.9,en-US;q=0.8,en;q=0.7',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Origin': 'https://przegladarka-ekw.ms.gov.pl',
    'Referer': 'https://przegladarka-ekw.ms.gov.pl/eukw_prz/KsiegiWieczyste/wyszukiwanieKW',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
}
data = {
    'kodWydzialu': 'PT1R',
    'nrKw': '00037314',
    'cyfraK': '9',
}
url = 'https://przegladarka-ekw.ms.gov.pl/eukw_prz/KsiegiWieczyste/wyszukiwanieKW'
r = requests.session()
r.post(url, data=data, headers=headers)
final_content = r.get(url, headers=headers)
print(final_content.text)
The GET requests come from "https://przegladarka-ekw.ms.gov.pl/eukw_prz/eukw201906070952/js/jquery-1.11.0_min.js", but that returns a wall of code. My goal is to scrape the page which appears after submitting the data above to the search form.
Try this:
import json
import urllib.request

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'pl-PL,pl;q=0.9,en-US;q=0.8,en;q=0.7',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Origin': 'https://przegladarka-ekw.ms.gov.pl',
    'Referer': 'https://przegladarka-ekw.ms.gov.pl/eukw_prz/KsiegiWieczyste/wyszukiwanieKW',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
}
data = {
    'kodWydzialu': 'PT1R',
    'nrKw': '00037314',
    'cyfraK': '9',
}
url = 'https://przegladarka-ekw.ms.gov.pl/eukw_prz/KsiegiWieczyste/wyszukiwanieKW'
r = urllib.request.urlopen(url, data=bytes(json.dumps(data), encoding="utf-8"))
final_content = r
for i in r:
    print(i)
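
Note that the urlopen call above sends a JSON body to a form that, per the question's own headers, expects application/x-www-form-urlencoded, and it never uses the headers dict at all. If the result page is rendered directly in the POST response, reading that response instead of issuing a follow-up GET may be enough; a minimal sketch under that assumption:
import requests

url = 'https://przegladarka-ekw.ms.gov.pl/eukw_prz/KsiegiWieczyste/wyszukiwanieKW'
data = {'kodWydzialu': 'PT1R', 'nrKw': '00037314', 'cyfraK': '9'}
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}

with requests.Session() as session:
    # data= sends the fields form-encoded, matching the form's Content-Type
    result = session.post(url, data=data, headers=headers)
    print(result.text)  # inspect the POST response itself rather than a fresh GET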

error in $ failed reading not a valid json value when i try to send requests via python

So I've been trying to figure out how to 'follow' a user on imvu.com with Python, but it always returns the error "invalid arguments: error in $: failed reading: not a valid json value".
import requests

headers = {
    "Origin": "https://secure.imvu.com/",
    "Referer": "https://secure.imvu.com/next/av/Sammy165/",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36",
    "X-IMVU-SAUCE": ""  # removed sauce for account safety
}
url = "https://api.imvu.com/profile/profile-user-696969696/subscriptions"
data = {"id": "https://api.imvu.com/profile/profile-user-175389029"}
req = requests.post(url=url, headers=headers, data=data)
print(req.text)
Have you tried requests.post(url=url, headers=headers, json=data)?
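The json= keyword makes requests serialize the dict to JSON for you, and it also sets the Content-Type: application/json header when you have not set one yourself, so it is equivalent to the json.dumps approach below.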
You have to do json.dumps(data). See the code below:
import requests
import json

headers = {
    "Origin": "https://secure.imvu.com/",
    "Referer": "https://secure.imvu.com/next/av/Sammy165/",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36",
    "X-IMVU-SAUCE": ""  # removed sauce for account safety
}
url = "https://api.imvu.com/profile/profile-user-696969696/subscriptions"
data = {"id": "https://api.imvu.com/profile/profile-user-175389029"}
req = requests.post(url=url, headers=headers, data=json.dumps(data))
print(req.text)
Output:
{"status":"failure","error":"ERROR-GENERIC-001","message":"Permission Denied: You are not allowed to modify this subscription set."}

python requests does not POST after redirect

For some reason python requests does not re-POST after encountering a redirect header:
import requests

proxies = {'http': 'http://127.0.0.1:8888'}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36',
    'content-type': 'application/x-www-form-urlencoded'}

# url, data and timeout are defined elsewhere in my script
r = requests.post(url, data, headers=headers, timeout=timeout, proxies=proxies, allow_redirects=True)
html = r.text
So it means I can't log in through any form that sits behind a redirect. How can I solve this issue? Thank you!
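
For background, requests follows browser behaviour here: a 301 or 302 response to a POST is retried as a GET without the body, while 307/308 preserve the method. One workaround is to disable automatic redirects and re-POST to the Location target yourself; a minimal sketch (url, data and timeout stand in for the asker's values):
import requests
from urllib.parse import urljoin

url = "https://example.com/login"      # stand-in for the real form URL
data = {"user": "u", "pass": "p"}      # stand-in form fields
timeout = 10

resp = requests.post(url, data=data, timeout=timeout, allow_redirects=False)
if resp.is_redirect:
    # re-POST manually to the redirect target, body and all
    target = urljoin(url, resp.headers["Location"])  # Location may be relative
    resp = requests.post(target, data=data, timeout=timeout)
print(resp.status_code)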
