Scraping pre-market table on Barchart with Python - python

Apologies if this is a bit website specific (barchart.com). I used the guidance provided here for properly connecting and scraping barchart.com for Futures data. However, after hours of trying, I am at a loss as to how to pull off this same trick for their pre-market data table: Barchart_Premarket_Site.
Anyone know the trick to get the pre-market data?
Here is the basic connection, for which i get a 403:
import requests
geturl=r'https://www.barchart.com/stocks/pre-market-trading/volume-advances?orderBy=preMarketVolume&orderDir=desc'
s=requests.Session()
r=s.get(geturl)
#j=r.json()
print(r)`

All that was required was to add more headers to the request. You can find your own headers using chrome > developer tools; and then just find the api request for the table and slam in a few of the headers associated with that request.
import requests
request_url = "https://www.barchart.com/proxies/core-api/v1/quotes/get?lists=stocks.us.premarket.volume_advances&orderDir=desc&fields=symbol%2CsymbolName%2CpreMarketLastPrice%2CpreMarketPriceChange%2CpreMarketPercentChange%2CpreMarketVolume%2CpreMarketAverage5dVolume%2CpreMarketPreviousLast%2CpreMarketPreviousChange%2CpreMarketPreviousPercentChange%2CpreMarketTradeTime%2CnextEarningsDate%2CnextEarningsDate%2CtimeCode%2CsymbolCode%2CsymbolType%2ChasOptions&orderBy=preMarketVolume&meta=field.shortName%2Cfield.type%2Cfield.description%2Clists.lastUpdate&hasOptions=true&page=1&limit=100&raw=1"
headers = {
'accept': 'application/json',
'cookie': '_gcl_au=1.1.685644914.1670446600; _fbp=fb.1.1670446600221.1987872306; _pbjs_userid_consent_data=3524755945110770; _pubcid=e7cf9178-59bc-4a82-b6c4-a2708ed78b8d; _admrla=2.2-1e3aed0d7d9d2975-a678aeef-7671-11ed-803e-d12e87d011f0; _lr_env_src_ats=false; _cc_id=6c9e21e7f9c269f8501e2616f9e68632; __browsiUID=c0174d21-a0ab-4dfe-8978-29ae08f44964; __qca=P0-531499020-1670446603686; __gads=ID=220b766bf87e15f9-22fa0316ded8001f:T=1670446598:S=ALNI_MaEWcBqESsJKLF0AwoIVvrKjpjZ_g; panoramaId_expiry=1673549551401; panoramaId=9aa5615403becfbc8adf14a3024816d53938b8cdbea6c8f5cabb60112755d70c; udmsrc=%7B%7D; _pk_id.1.73a4=1aee00a1c66e897b.1672997455.; _ccm_inf=1; bcPremierAdsListScreen=true; _hjSessionUser_2563157=eyJpZCI6ImI2MTM5NTQ4LWUxYzMtNTU2NS04MmM3LTk4ODQ5MWNjY2YxZCIsImNyZWF0ZWQiOjE2NzMwMzQ3OTY0NDAsImV4aXN0aW5nIjp0cnVlfQ==; bcFreeUserPageView=0; _gid=GA1.2.449489725.1673276404; _ga_4HQ9CY2XKK=GS1.1.1673303248.3.0.1673303248.0.0.0; _ga=GA1.2.606341620.1670446600; __aaxsc=2; aasd=5%7C1673314072749; webinar131WebinarClosed=true; _lr_geo_location_state=NC; _lr_geo_location=US; udm_edge_floater_fcap=%5B1673397095403%2C1673392312561%2C1673078162569%2C1673076955809%2C1673075752582%2C1673066137343%2C1673056514808%2C1673051706099%2C1673042087115%2C1673037276340%2C1672960427551%2C1672952009965%2C1672947201101%5D; pbjs-unifiedid=%7B%22TDID%22%3A%2219345091-e7fd-4323-baeb-4627c879c6ba%22%2C%22TDID_LOOKUP%22%3A%22TRUE%22%2C%22TDID_CREATED_AT%22%3A%222022-12-05T19%3A48%3A10%22%7D; __gpi=UID=000008c6d06e1e0d:T=1670446598:RT=1673433090:S=ALNI_MZS6mLx8CJg9iN6kzx4JeDFHPOMjg; market=eyJpdiI6InJvcVNudkprUjQ1bE0yWWQrSTlYY1E9PSIsInZhbHVlIjoieUpabHpmSnJGSkIxc0o1enpyb1dLdENBSWp4UE5NYUZwUFg3OGs0TGJSL0dQWUNpTDU0a2hZbklOQTFNd09OVSIsIm1hYyI6IjBjMjJkNDExZjRhOTc2M2QwYWU3NGUyNmVlZTgyMzY2NWM2MjQyOTY2MjY2YmUxODI2Y2RkY2FlNzI3MjNkOTIifQ%3D%3D; _lr_retry_request=true; __browsiSessionID=c02dadca-6355-415f-aa80-926cccd94759&true&false&DEFAULT&us&desktop-4.11.12&false; IC_ViewCounter_www.barchart.com=2; cto_bundle=dxDlRl90VldIJTJGa0VaRzRIS0xnQmdQOXVWVlhybWJ3NDluY29PelBnM0prMkFxZkxyZWh4dkZNZG9LcyUyRjY1VWlIMWRldkRVRlJ5QW05dHlsQU1xN2VmbzlJOXZFSTNlcFRxUkRxYiUyRlp6Z3hhUHpBekdReU5idVV0WnkxVll0eGp5TyUyQlVzJTJCVDVoRkpWWlZ4R0hOSUl2YTVJVDhBJTNEJTNE; cto_bidid=51ixCl92dkhqbmVmdnlTZHVYS25nWTk2eDVMUnVRNjhEMUhxa3FlcmFzRHVNSERUQkd5cFZrM0QyQyUyRkVNNkV6S0ZHOUZPcTBTR2lBUjA5QUc5YU1ucW9GMFZBWHB4aU9sMlo3WHAlMkJYWjZmJTJGWkpsWSUzRA; _awl=2.1673451629.5-df997ba8dc13bee936d8d14a9771e587-6763652d75732d6561737431-0; laravel_token=eyJpdiI6IjR2YStGblAxWlZoZzllcEtPUUFLNlE9PSIsInZhbHVlIjoiY3E2bHdQWFkyT1FFUHFka2NMMVoyREFvQlZwWXlxc3F0SlRuZnIyTHJsSWtNVFA0K1czcDloWFF2d0lVZys3azZyelkrWks5SWxuRW05MGlqV1I4QmViMU9KKzArVXJOTWNVK2hqZVRocVNHM3NZa1dNeStQbnNyYVBtcjlUeTZzT2lpV2t1ek1UOE1wSUFudmg0NzFTQ3VPeDJiYk16bGNBTzVqVHBCcFRZdTFsZjBVREVyUEhLeThjZm9wSGIzQ2NDVE0ya0xOQWx1VGx0aUlEUE9yakU4Q3RicWFmNDdkYjJSWHlsSWYwajlSUkozVmQ4OVNGNzZEeWhtUExtcXB6VnNrY2NsUzRFQnJyMlhiejFtc0l3U2p5SW5BbFFDZTN0dk9EUWNOR2hVYUdMbmhFUFZVT24xOFFGVkM3L2giLCJtYWMiOiIxYzM5Yzk1ZWNjNjM0NzdjMmM4YTJkZDg0ZmY5MWQwNWUzOTlhNTAwNjg2MTNmNTNlYzY4M2MzYWQ3MDA4MThlIn0%3D; XSRF-TOKEN=eyJpdiI6Ik1PMGEvOGFkZ1p1ekpNcXIvZWZtcHc9PSIsInZhbHVlIjoiMVZYQ3NCV1hjcWREdG5uSDVqYXZVVy91U29USys1dkJJeFNZZG9QVGNRNDhmMTJIeitVV2NUV0xSUC9ZTThVM3FDQWZBcVdNclhqSkx4MG1NTGhadUNlNXRLMEdUc3RDcEVwNnJVYU9FNTBub2NKRWxlekxBZmZEVXNhZUlwWnoiLCJtYWMiOiIxYTI0N2E2OGMxMzRhNmFiYTliMzBlYTdjYWZlNzUwMDRlY2Q5YjI2YzY4OGZlMWIxYmM0YTE3YzZkMTdhMGM3In0%3D; laravel_session=eyJpdiI6InJIcmMxRWVacmtGc2tENS9zYUFFOVE9PSIsInZhbHVlIjoibG1vQWh1d1dmaUNBZTV4dGdJbWhTVEoyMWVrblBFNTBycTBPai9ad2llcHRkd0hHUTI4ZS8rUFNFVm5LNEcvd1RXY1RwOHdVZHplNU92Vk9xUHZjYmMrUC9Cc3hJUkJNWE54OVR1UHFaTExpM1BRcWRSWEJ5Q3gvVVNzajdHZUoiLCJtYWMiOiI5NDVkOGU4NGM5Y2MwMThmMTgwMzQyOWQ1Yzc5MzU5ZGU2ZjkwMWRjYzBjZWJiZDFhMTQzODMzZmE2NWExMGQ3In0%3D',
'referer': 'https://www.barchart.com/stocks/pre-market-trading/volume-advances?orderBy=preMarketVolume&orderDir=desc',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
'x-xsrf-token': 'eyJpdiI6Im1LQVRpVEJONzZwMDRVQnhYK0I5SWc9PSIsInZhbHVlIjoiMkRIMnJBb1VDQmRscjNlajF1dVR2eWxRbGNJTGZCNWxMaWk3N0EzQWlyOWk0cXJBK2oyUVJ1N282R2VOVWh6WlhJcXdZdFplZmRqaFhPa203bi9HeFBxckJKeUVzVDRETHI5OHlxNDZnOEF5WVV5NXdNSWJiWk95UlFHRXQwN2siLCJtYWMiOiI1NTkyZjk2M2FlNTE0NDI0ODQ3YmE4ZjIyZDY1MzM2MTA3ZTY4NDA5NzA5YzViMjhiN2UwYTFhNTM1Y2ZkMjk5In0='
}
r = requests.get(request_url,headers=headers)

Related

Is there a way to change the geolocation of the request in requests_html?

I am trying to scrape some Facebook post stats (number of reactions, comments, views or shares) with requests_html package. I used this approach because I had to render the initial page source since Facebook uses some lazy loading scripts.
I managed to do it but it depends on the geolocation. For example, if I make the request from my local machine the comment count span will be in my country's language, but if I use a UK VPN it will be displayed in English (E.g. '5 comments').
The goal is to deploy the code into cloud and to standardize it to a single Language to be more robust.
I tried sending the Accept-Language header as shown in the snippet below, but no success.
Please note that the xpath might change if the page structure changes.
from requests_html import HTMLSession
session = HTMLSession()
headers = {
'Accept-Language': 'en-GB,en;q=0.9,en-US;q=0.8',
'Content-type': 'text/html; charset=utf-8',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
}
r = session.get(url, headers=headers)
r.html.render(timeout=10, wait=2, sleep=2)
xpath = '/html/body/div[1]/div/div[1]/div/div[3]/div/div/div/div[1]/div[2]/div[1]/div/div/div[1]/div[2]/div[2]/div/div/div[2]/div/div[3]/span/div/span'
comments = r.html.xpath(xpath, first=True).text
Thank you in advance!

Data Scraping of the Numbers and details from the website

I want to scrape the contact numbers from the website with the respective details of the Courier Services. I am not able to scrape the Contact numbers and other details like name address and rating from all the Courier services. I analyzed the data is in the script tag. Please suggest a fix for this
import requests
import pandas as pd
import json
import csv
from lxml import html
import re
headers ={'authority': 'www.justdial.com',
'accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9 ',
'accept-encoding': 'gzip, deflate, br',
'accept-language':'en-US,en;q=0.9',
'user-agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36" }
produrl = 'https://www.justdial.com/Mumbai/Courier-Services-in-Mumbai-Bazar-Nalasopara-East/nct-10142628'
prodresp = requests.get(produrl, headers=headers, timeout=30)
prodResphtml = html.fromstring(prodresp.text)
partjson = prodResphtml.xpath('/html/head/script[9]/text()')
print(partjson)
That's data are coming with ajax api call from there;
https://www.justdial.com/api/india_api_write/20march2020/searchziva.php?city=Mumbai&area=Mumbai-Bazar-Nalasopara-East&lat=&long=&darea_flg=0&case=spcall&stype=category_list&search=Courier-Services&national_catid=10142628&nextdocid=&attribute_values=&basedon=&sortby=&nearme=0&max=100&pg_no=1

Python requests - session token changing

I am currently using Python requests to scrape data from a website and using Postman as a tool to help me do it.
To those not familiar with Postman, it sends a get request and generates a code snippet to be used in many languages, including Python.
By using it, I can get data from the website quite easily, but it seems as like the 'Cookie' aspect of headers provided by Postman changes with time, so I can't automate my code to run anytime. The issue is that when the cookie is not valid I get an access denied message.
Here's an example of the code provided by Postman:
import requests
url = "https://wsloja.ifood.com.br/ifood-ws-v3/restaurants/7c854a4c-01a4-48d8-b3d4-239c6c069f6a/menu"
payload = {}
headers = {
'access_key': '69f181d5-0046-4221-b7b2-deef62bd60d5',
'browser': 'Windows',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36',
'Accept': 'application/json, text/plain, */*',
'secret_key': '9ef4fb4f-7a1d-4e0d-a9b1-9b82873297d8',
'Cache-Control': 'no-cache, no-store',
'X-Ifood-Session-Id': '85956739-2fac-4ebf-85d3-1aceda9738df',
'platform': 'Desktop',
'app_version': '8.37.0',
'Cookie': 'session_token=TlNUXzMyMjJfMTU5Nzg1MDE5NTIxNF84NDI5NTA2NDQ2MjUxMg==; _abck=AD1745CB8A0963BF3DD67C8AF7932007~-1~YAAQtXsGYH8UUe9zAQAACZ+IAgStbP4nYLMtonPvQ+4UY+iHA3k6XctPbGQmPF18spdWlGiDB4/HbBvDiF0jbgZmr2ETL8YF+f71Uwhsj+L8K+Fk4PFWBolAffkIRDfSubrf/tZOYRfmw09o59aFuQor5LeqxzXkfVsXE8uIJE0P/nC1JfImZ35G0OFt+HyIgDUZMFQ54Wnbap7+LMSWcvMKF6U/RlLm46ybnNnT/l/NLRaEAOIeIE3/JdKVVcYT2t4uePfrTkr5eD499nyhFJCwSVQytS9P7ZNAM4rFIPnM6kPtwcPjolLNeeU=~-1~-1~-1; ak_bmsc=129F92B2F8AC14A400433647B8C29EA3C9063145805E0000DB253D5F49CE7151~plVgguVnRQTAstyzs8P89cFlKQnC9ISQCH9KPHa8xYPDVoV2iQ/Hij2PL9r8EKEqcQfzkGmUWpK09ZpU0tL/llmBloi+S+Znl5P5/NJeV6Ex2gXqBu1ZCxc9soMWWyrdvG+0FFvSP3a6h3gaouPh2O/Tm4Ghk9ddR92t380WBkxvjXBpiPzoYp1DCO4yrEsn3Tip1Gan43IUHuCvO+zkRmgrE3Prfl1T/g0Px9mvLSVrg=; bm_sz=3106E71C2F26305AE435A7DA00506F01~YAAQRTEGyfky691zAQAAGuDbBggFW4fJcnF1UtgEsoXMFkEZk1rG8JMddyrxP3WleKrWBY7jA/Q08btQE43cKWmQ2qtGdB+ryPtI2KLNqQtKM5LnWRzU+RqBQqVbZKh/Rvp2pfTvf5lBO0FRCvESmYjeGvIbnntzaKvLQiDLO3kZnqmMqdyxcG1f51aoOasrjfo=; bm_sv=B4011FABDD7E457DDA32CBAB588CE882~aVOIuceCgWY25bT2YyltUzGUS3z5Ns7gJ3j30i/KuVUgG1coWzGavUdKU7RfSJewTvE47IPiLztXFBd+mj7c9U/IJp+hIa3c4z7fp22WX22YDI7ny3JxN73IUoagS1yQsyKMuxzxZOU9NpcIl/Eq8QkcycBvh2KZhhIZE5LnpFM='
}
response = requests.request("GET", url, headers=headers, data = payload)
print(response.text.encode('utf8'))
Here's just the Cookie part where I get access denied:
'Cookie': 'session_token=TlNUXzMyMjJfMTU5Nzg1MDE5NTIxNF84NDI5NTA2NDQ2MjUxMg==; _abck=AD1745CB8A0963BF3DD67C8AF7932007~-1~YAAQtXsGYH8UUe9zAQAACZ+IAgStbP4nYLMtonPvQ+4UY+iHA3k6XctPbGQmPF18spdWlGiDB4/HbBvDiF0jbgZmr2ETL8YF+f71Uwhsj+L8K+Fk4PFWBolAffkIRDfSubrf/tZOYRfmw09o59aFuQor5LeqxzXkfVsXE8uIJE0P/nC1JfImZ35G0OFt+HyIgDUZMFQ54Wnbap7+LMSWcvMKF6U/RlLm46ybnNnT/l/NLRaEAOIeIE3/JdKVVcYT2t4uePfrTkr5eD499nyhFJCwSVQytS9P7ZNAM4rFIPnM6kPtwcPjolLNeeU=~-1~-1~-1; ak_bmsc=129F92B2F8AC14A400433647B8C29EA3C9063145805E0000DB253D5F49CE7151~plVgguVnRQTAstyzs8P89cFlKQnC9ISQCH9KPHa8xYPDVoV2iQ/Hij2PL9r8EKEqcQfzkGmUWpK09ZpU0tL/llmBloi+S+Znl5P5/NJeV6Ex2gXqBu1ZCxc9soMWWyrdvG+0FFvSP3a6h3gaouPh2O/Tm4Ghk9ddR92t380WBkxvjXBpiPzoYp1DCO4yrEsn3Tip1Gan43IUHuCvO+zkRmgrE3Prfl1T/g0Px9mvLSVrg=; bm_sz=3106E71C2F26305AE435A7DA00506F01~YAAQRTEGyfky691zAQAAGuDbBggFW4fJcnF1UtgEsoXMFkEZk1rG8JMddyrxP3WleKrWBY7jA/Q08btQE43cKWmQ2qtGdB+ryPtI2KLNqQtKM5LnWRzU+RqBQqVbZKh/Rvp2pfTvf5lBO0FRCvESmYjeGvIbnntzaKvLQiDLO3kZnqmMqdyxcG1f51aoOasrjfo=; bm_sv=B4011FABDD7E457DDA32CBAB588CE882~aVOIuceCgWY25bT2YyltUzGUS3z5Ns7gJ3j30i/KuVUgG1coWzGavUdKU7RfSJewTvE47IPiLztXFBd+mj7c9U/IJp+hIa3c4z7fp22WX23E755znZL76c0V/amxbHU9BUnrEff3HGcsniyh5mU+C9XVmtNRLd8oT1UW9WUg3qE=' }
Which is slightly different from the one before.
How could I get through this by somehow having python get the session token?
Apparently just removing 'Cookie' from headers does the job.

Workaround for blocked GET requests in Python

I'm trying to retrieve and process the results of a web search using requests and beautifulsoup.
I've written some simple code to do the job, and it returns successfully (status = 200), but the content of the request is just an error message "We're sorry for any inconvenience, but the site is currently unavailable.", and has been the same for the last several days. Searching within Firefox returns results without issue, however. I've run the code using a URL for the UK-based site and it works without issue so I wonder if the US site is set up to block attempts to scrape web searches.
Are there ways to mask the fact I'm attempting to retrieve search results from within Python (eg, masquerading as a standard search within Firefox) or some other work around to allow access to the search results?
Code included for reference below:
import pandas as pd
from requests import get
import bs4 as bs
import re
# works
# baseURL = 'https://www.autotrader.co.uk/car-search?sort=sponsored&radius=1500&postcode=ky119sb&onesearchad=Used&onesearchad=Nearly%20New&onesearchad=New&make=TOYOTA&model=VERSO&year-from=1990&year-to=2017&minimum-mileage=0&maximum-mileage=200000&body-type=MPV&fuel-type=Diesel&minimum-badge-engine-size=1.6&maximum-badge-engine-size=4.5&maximum-seats=8'
# doesn't work
baseURL = 'https://www.autotrader.com/cars-for-sale/Certified+Cars/cars+under+50000/Jeep/Grand+Cherokee/Seattle+WA-98101?extColorsSimple=BURGUNDY%2CRED%2CWHITE&maxMileage=45000&makeCodeList=JEEP&listingTypes=CERTIFIED%2CUSED&interiorColorsSimple=BEIGE%2CBROWN%2CBURGUNDY%2CTAN&searchRadius=0&modelCodeList=JEEPGRAND&trimCodeList=JEEPGRAND%7CSRT%2CJEEPGRAND%7CSRT8&zip=98101&maxPrice=50000&startYear=2015&marketExtension=true&sortBy=derivedpriceDESC&numRecords=25&firstRecord=0'
a = get(baseURL)
soup = bs.BeautifulSoup(a.content,'html.parser')
info = soup.find_all('div', class_ = 'information-container')
price = soup.find_all('div', class_ = 'vehicle-price')
d = []
for idx, i in enumerate(info):
ii = i.find_next('ul').find_all('li')
year_ = ii[0].text
miles = re.sub("[^0-9\.]", "", ii[2].text)
engine = ii[3].text
hp = re.sub("[^\d\.]", "", ii[4].text)
p = re.sub("[^\d\.]", "", price[idx].text)
d.append([year_, miles, engine, hp, p])
df = pd.DataFrame(d, columns=['year','miles','engine','hp','price'])
By default, Requests sends a unique user agent when making requests.
>>> r = requests.get('https://google.com')
>>> r.request.headers
{'User-Agent': 'python-requests/2.22.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
It is possible that the website you are using is trying to avoid scrapers by denying any request with a user agent of python-requests.
To get around this, you can change your user agent when sending a request. Since it's working on your browser, simply copy your browser user agent (you can Google it, or record a request to a webpage and copy your user agent like that). For me, it's Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36 (what a mouthful), so I'd set my user agent like this:
>>> headers = {
... 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36'
... }
and then send the request with the new headers (the new headers are added to the default headers, they don't replace them unless they have the same name):
>>> r = requests.get('https://google.com', headers=headers) # Using the custom headers we defined above
>>> r.request.headers
{'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
Now we can see that the request was sent with our preferred headers, and hopefully the site won't be able to tell the difference between Requests and a browser.

How to obtain a JSON response from the stats.nba.com API?

I'm just trying to simply use a Python get request to access JSON data from stats.nba.com. It seems pretty straight-forward as I can enter the URL into your browser and get the results I'm looking for. However, whenever I run this the program just runs to no end. I'm wondering if I have to include some type of headers information in my get request.
The code is below:
import requests
url = 'http://stats.nba.com/stats/commonteamroster?LeagueID=00&Season=2017-18&TeamID=1610612756'
response=requests.get(url)
print response.text
I have tried to visit the url you given, you can add header to your request to avoid this problem (the minimum information you need to provide is User-Agent, I think you can use more header information as you can):
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
response = requests.get(url, headers=headers)
The stats.nba.com website need your 'User-Agent' header information.
You can get your request header information from Network tab in the browser.
Take chrome as example, when you press F12, and visit url you given, you can find the relative request information, the most useful information is request headers.
You need to use headers. Try copying from your browser's network tab. Here's what worked for me:
request_headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'en-US,en;q=0.8',
'Connection': 'keep-alive',
'Host': 'stats.nba.com',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'
}
And here's the modified get:
response = requests.get(url, headers = request_headers)

Categories