import requests
from bs4 import BeautifulSoup

url = "https://www.nseindia.com/api/option-chain-indices?symbol=BANKNIFTY"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) chrome/85.0.4183.102 Safari/537.36',
    'Accept-Language': 'en-US,en;q=0.9,hi;q=0.8,es;q=0.7',
    'Accept-Encoding': 'gzip, deflate, br'
}
response = requests.get(url, headers=headers)  # request the option-chain JSON
response.raise_for_status()
df = response.json()
This used to work, but now this error comes up:
HTTPError: 401 Client Error: Unauthorized for url: https://www.nseindia.com/api/option-chain-indices?symbol=BANKNIFTY
Any idea what I should do to access the website in Python?
Well, in that case, we should visit the main page first to obtain the cookies and then contact the API:
import requests

params = {
    'symbol': 'BANKNIFTY'
}

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:81.0) Gecko/20100101 Firefox/81.0'
}

def main(url):
    with requests.Session() as req:
        req.headers.update(headers)
        r = req.get("https://www.nseindia.com")   # warm-up visit: collects the cookies
        r = req.get(url, params=params).json()    # the same session now sends them to the API
        print(r)

main("https://www.nseindia.com/api/option-chain-indices")
Related
Can someone help me solve the problem in this code?
CODE:
from bs4 import BeautifulSoup
import requests
headers = {'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36"}
url = "https://www.amazon.com/RUNMUS-Surround-Canceling-Compatible-Controller/dp/B07GRM747Y"
resp = requests.get(url, headers=headers)
s = BeautifulSoup(resp.content, features='lxml')
product_title = s.select("#productTitle")[0].get_text().strip()
print(product_title)
If you print what you actually get back as the response, the cause of the error becomes clear:
import requests
headers = {'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36"}
url = "https://www.amazon.com/RUNMUS-Surround-Canceling-Compatible-Controller/dp/B07GRM747Y"
resp = requests.get(url, headers=headers)
print(resp.content)
The output you are getting from this request:
b'<!--\n To discuss automated access to Amazon data please contact api-services-support@amazon.com.\n For information about migrating to our APIs refer to our Marketplace APIs...
The site you are sending requests to is not allowing you to access the content with the provided headers, so s.select("#productTitle") returns an empty list, and indexing it with [0] raises an IndexError.
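A minimal defensive sketch, reusing the url and headers from the question: guard the selector result before indexing, so a bot-check page fails with a useful message instead of an IndexError.

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36"}
url = "https://www.amazon.com/RUNMUS-Surround-Canceling-Compatible-Controller/dp/B07GRM747Y"

resp = requests.get(url, headers=headers)
s = BeautifulSoup(resp.content, features='lxml')
matches = s.select("#productTitle")
if matches:
    print(matches[0].get_text().strip())
else:
    # likely a bot-check page; dump the start of the body to see what came back
    print("No #productTitle found; response starts with:", resp.content[:200])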
import requests
x = requests.get('https://www1.nseindia.com/live_market/dynaContent/live_watch/equities_stock_watch.htm')
print(x.status_code)
print(x.content)
This gives a connection error. Please help me correct it.
Try this:
import requests
url = "https://www1.nseindia.com/live_market/dynaContent/live_watch/equities_stock_watch.htm"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
           'accept-language': 'en,gu;q=0.9,hi;q=0.8', 'accept-encoding': 'gzip, deflate, br'}
session = requests.Session()
request = session.get(url, headers=headers, timeout=5)
cookies = dict(request.cookies)
response = session.get(url, headers=headers, timeout=5, cookies=cookies)
print(response.status_code)
print(response.content)
Use this code the first time you access the website in your program. If you are accessing the site multiple times, reuse
response = session.get(url, headers=headers, timeout=5, cookies=cookies)
for every subsequent request. Let me know if this works.
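A minimal sketch of that reuse pattern (the helper name is hypothetical): warm the session up once so it collects the cookies, then keep requesting through the same session. Note that requests.Session already persists cookies between calls, so passing cookies= explicitly is optional rather than required.

import requests

url = "https://www1.nseindia.com/live_market/dynaContent/live_watch/equities_stock_watch.htm"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
           'accept-language': 'en,gu;q=0.9,hi;q=0.8', 'accept-encoding': 'gzip, deflate, br'}

def make_nse_session():
    # hypothetical helper: the first visit exists only to collect the cookies
    session = requests.Session()
    session.get(url, headers=headers, timeout=5)
    return session

session = make_nse_session()
for _ in range(3):                        # later requests reuse the same cookie jar
    response = session.get(url, headers=headers, timeout=5)
    print(response.status_code)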
Try adding a user agent to the headers:
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.162 Safari/537.36'}
r = requests.get('https://www1.nseindia.com/live_market/dynaContent/live_watch/equities_stock_watch.htm', headers=headers)
print(r.content)
I've seen some similar threads, but none gave me the answer. I simply need to get the HTML content from one website. I send a POST request with the data for a particular case, and then with a GET request I want to scrape the text from the HTML. The problem is that I always receive the first page's content. I am not sure what I am doing wrong.
import requests
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'pl-PL,pl;q=0.9,en-US;q=0.8,en;q=0.7',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Origin': 'https://przegladarka-ekw.ms.gov.pl',
    'Referer': 'https://przegladarka-ekw.ms.gov.pl/eukw_prz/KsiegiWieczyste/wyszukiwanieKW',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
}

data = {
    'kodWydzialu': 'PT1R',
    'nrKw': '00037314',
    'cyfraK': '9',
}
url = 'https://przegladarka-ekw.ms.gov.pl/eukw_prz/KsiegiWieczyste/wyszukiwanieKW'
r = requests.Session()
r.post(url, data=data, headers=headers)
final_content = r.get(url, headers=headers)
print(final_content.text)
The GET requests come from https://przegladarka-ekw.ms.gov.pl/eukw_prz/eukw201906070952/js/jquery-1.11.0_min.js, but that returns a wall of code. My goal is to scrape the page that appears after submitting the data above to the search form.
Try this:
import json
import urllib.request
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'pl-PL,pl;q=0.9,en-US;q=0.8,en;q=0.7',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Origin': 'https://przegladarka-ekw.ms.gov.pl',
    'Referer': 'https://przegladarka-ekw.ms.gov.pl/eukw_prz/KsiegiWieczyste/wyszukiwanieKW',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
}

data = {
    'kodWydzialu': 'PT1R',
    'nrKw': '00037314',
    'cyfraK': '9',
}
url = 'https://przegladarka-ekw.ms.gov.pl/eukw_prz/KsiegiWieczyste/wyszukiwanieKW'
r=urllib.request.urlopen(url, data=bytes(json.dumps(data), encoding="utf-8"))
final_content = r
for i in r:
    print(i)
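Two caveats about the snippet above: the headers dict is defined but never passed to urlopen, and the body is sent as JSON even though the form endpoint declares application/x-www-form-urlencoded. An alternative sketch with requests, under the assumption that the search result is rendered in the POST response itself rather than on a later GET:

import requests

url = 'https://przegladarka-ekw.ms.gov.pl/eukw_prz/KsiegiWieczyste/wyszukiwanieKW'
data = {'kodWydzialu': 'PT1R', 'nrKw': '00037314', 'cyfraK': '9'}

session = requests.Session()
post_resp = session.post(url, data=data)  # a dict passed via data= is form-encoded automatically
print(post_resp.status_code)
print(post_resp.text[:500])               # check whether the result page already came back here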
So I've been trying to figure out how to do the 'follow' action on imvu.com using Python code, but it always returns the message: "invalid arguments" error in $: failed reading: not a valid json value
import requests
headers = {
    "Origin": "https://secure.imvu.com/",
    "Referer": "https://secure.imvu.com/next/av/Sammy165/",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36",
    "X-IMVU-SAUCE": ""  # removed sauce for account safety
}
url = "https://api.imvu.com/profile/profile-user-696969696/subscriptions"
data = {"id": "https://api.imvu.com/profile/profile-user-175389029"}
req = requests.post(url=url, headers=headers, data=data)
print(req.text)
Have you tried requests.post(url=url, headers=headers, json=data)?
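The json= keyword makes requests serialize the dict and set the Content-Type: application/json header automatically, which matches what the "not a valid json value" error suggests the API expects. A rough sketch of the equivalence, using httpbin.org as a stand-in endpoint:

import json
import requests

url = "https://httpbin.org/post"   # stand-in endpoint that echoes the parsed body
data = {"id": "example"}

# json= serializes the dict and sets Content-Type: application/json...
r1 = requests.post(url, json=data)

# ...which is shorthand for doing both steps yourself:
r2 = requests.post(url, data=json.dumps(data),
                   headers={"Content-Type": "application/json"})

print(r1.json()["json"], r2.json()["json"])  # both echo back {'id': 'example'}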
You have to do json.dumps(data). See the code below:
import requests
import json
headers = {
    "Origin": "https://secure.imvu.com/",
    "Referer": "https://secure.imvu.com/next/av/Sammy165/",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36",
    "X-IMVU-SAUCE": ""  # removed sauce for account safety
}
url = "https://api.imvu.com/profile/profile-user-696969696/subscriptions"
data = {"id": "https://api.imvu.com/profile/profile-user-175389029"}
req = requests.post(url=url, headers=headers, data=json.dumps(data))
print(req.text)
Output:
{"status":"failure","error":"ERROR-GENERIC-001","message":"Permission Denied: You are not allowed to modify this subscription set."}
For some reason, python requests does not re-POST after encountering a redirect header:
import requests

proxies = {'http': 'http://127.0.0.1:8888'}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36',
    'content-type': 'application/x-www-form-urlencoded'}
# url, data and timeout are defined elsewhere in the original script
r = requests.post(url, data, headers=headers, timeout=timeout, proxies=proxies, allow_redirects=True)
html = r.text
So it means I can't log in to any form that sits behind a redirect. How can I solve this issue? Thank you!
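This is standard behavior rather than a bug: on a 301, 302, or 303 response, requests (like browsers) follows the redirect with a GET and drops the body; only 307 and 308 preserve the method. A minimal sketch of handling the redirect by hand so the POST can be re-sent, with stand-in url and data (whether the target actually expects a second POST depends on the server):

import requests
from urllib.parse import urljoin

url = "https://example.com/login"            # stand-in for the question's url
data = {"user": "name", "pass": "secret"}    # stand-in form fields

resp = requests.post(url, data=data, allow_redirects=False)
if resp.is_redirect:                         # a redirect status with a Location header
    target = urljoin(url, resp.headers["Location"])  # Location may be relative
    resp = requests.post(target, data=data)  # explicit re-POST to the new location
print(resp.status_code)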