JWT Bearer Authorization for web scraping using Python requests

This is my first post on StackOverflow so please bear with me.
I am writing a function that makes a request via REST API and then returns the values, but I'm having trouble with the authentication part.
The authentication uses a JWT bearer token, which is needed to retrieve the data (though I don't need to log in, so in that sense it is an unauthenticated API).
import requests

def get__price(jwt, cookie):
    headers = {
        'authority': 'www.dextools.io',
        'pragma': 'no-cache',
        'cache-control': 'no-cache',
        'accept': 'application/json',
        'authorization': f'Bearer {jwt}',  # HERE IS THE VAR I NEED
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36',
        'content-type': 'application/json',
        'sec-gpc': '1',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'cors',
        'sec-fetch-dest': 'empty',
        'referer': 'https://www.dextools.io/app/uniswap/pair-explorer/0x0d4a11d5eeaac28ec3f61d100daf4d40471f1852',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
        #'cookie': f'__cfduid={cookie}; ai_user=hizb^|2021-04-03T00:16:45.460Z; ai_session=5vAmv^|1617443356577.045^|1617443356577.045',
    }
    params = (
        ('v', '1.9.1'),
        ('pair', '0x0d4a11d5eeaac28ec3f61d100daf4d40471f1852'),
        ('ts', '1617443384-0'),
    )
    try:
        response = requests.get('https://www.dextools.io/api/uniswap/1/pairexplorer', headers=headers, params=params)
        return response
    except Exception as e:
        print(f"ERROR: {e}")
I've tried making a request to https://www.dextools.io and grabbing a JWT token from it, but it doesn't seem to work using Sessions.
Maybe it has no importance, but I can find this JWT token in the browser under Developer Tools > Local Storage > (website url) > t, where t contains my eyJxxxxxxxxxxxxxxx token.
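For reference, here is a sketch of how that same token could be pulled out of localStorage programmatically. This is just an assumption on my part (it requires selenium and a matching ChromeDriver; the key name t is what I see in DevTools):
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.dextools.io/app/uniswap/pair-explorer/0x0d4a11d5eeaac28ec3f61d100daf4d40471f1852")
# Read the token the page stored under the 't' key.
jwt = driver.execute_script("return window.localStorage.getItem('t');")
driver.quit()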
Any help would be appreciated, thanks.

Hello. Looking at the website's network requests, I was able to get the data with the code below. The JWT token generated there is valid for roughly 6 to 8 minutes; you can reuse it during that window, and after that you need to get a new one by calling the back-end login URL again, as shown in the code. You might also need to grab a fresh password from the network requests if the website ever blocks the current one.
Code:
import time
import requests

s = requests.session()
headersdict = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36',
    'Referer': 'https://www.dextools.io/app/uniswap/pair-explorer/0x0d4a11d5eeaac28ec3f61d100daf4d40471f1852',
    'Origin': 'https://www.dextools.io',
}
s.headers.update(headersdict)
# You can grab this password from the network requests again if the site ever
# blocks it, but I don't think they will block it that way.
payload = {"id": "anyone", "password": "TfY6WC6F4L4+S6xwvPo8QoHlYZ50rK2DrJnEAWBoMqU="}
s1 = s.post("https://www.dextools.io/back/user/login", json=payload)
jwt = s1.headers["X-Auth"]
# The session already carries the browser-like headers; just add the token.
s.headers.update({'authorization': f'Bearer {jwt}'})
params = (
    ('v', '1.9.1'),
    ('pair', '0x0d4a11d5eeaac28ec3f61d100daf4d40471f1852'),
    ('ts', f'{time.time()}-0'),
)
response = s.get('https://www.dextools.io/api/uniswap/1/pairexplorer', params=params)
print(response.text)
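If you need the data for longer than one token's lifetime, a small cache saves you from logging in on every call. This is only a sketch; the 6-minute TTL is an assumption based on the 6-8 minute validity mentioned above, so tune it to what you observe:
import time
import requests

TOKEN_TTL = 6 * 60  # assumed lifetime in seconds; tokens seemed to last 6-8 minutes

_jwt = None
_jwt_fetched_at = 0.0

def get_jwt(session):
    # Re-login only when the cached token is older than TOKEN_TTL.
    global _jwt, _jwt_fetched_at
    if _jwt is None or time.time() - _jwt_fetched_at > TOKEN_TTL:
        payload = {"id": "anyone", "password": "TfY6WC6F4L4+S6xwvPo8QoHlYZ50rK2DrJnEAWBoMqU="}
        resp = session.post("https://www.dextools.io/back/user/login", json=payload)
        _jwt = resp.headers["X-Auth"]
        _jwt_fetched_at = time.time()
    return _jwt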
Let me know if you have any questions :)

Related

Python Requests does not get a website that opens in the browser

I have been trying to access https://www.dickssportinggoods.com/f/tents-accessories with the requests module, but it just keeps processing and never stops, while the same website works fine in a browser. Scrapy gives a timeout error for the same website. Is there something that should be taken into account when accessing websites like these? Thanks.
For sites like these you can try adding the extra headers that your browser sends. Following these steps worked for me:
1. Open the link in an incognito window with the network tab open.
2. Copy the first request made by right-clicking -> Copy -> Copy as cURL.
3. Go to https://curl.trillworks.com/ and paste the cURL command to get the equivalent Python requests code.
4. Now try removing headers one by one until it works with the minimal set (see the sketch below).
Image for reference - https://i.stack.imgur.com/vRS98.png
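A rough way to automate step 4 (my own sketch, not part of the original answer): greedily drop each header in turn and keep only the ones whose removal breaks the request.
import requests

def minimal_headers(url, headers):
    # Greedy sketch: try the request without each header; if it still
    # returns 200, the header was unnecessary and can stay dropped.
    needed = dict(headers)
    for name in list(needed):
        trial = {k: v for k, v in needed.items() if k != name}
        try:
            r = requests.get(url, headers=trial, timeout=10)
            if r.status_code == 200:
                needed.pop(name)  # worked without it
        except requests.RequestException:
            pass  # timeout or error: keep the header
    return needed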
Edit -
import requests

headers = {
    'authority': 'www.dickssportinggoods.com',
    'pragma': 'no-cache',
    'cache-control': 'no-cache',
    'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
    'sec-ch-ua-mobile': '?0',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'none',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'accept-language': 'en-US,en;q=0.9',
}
response = requests.get('https://www.dickssportinggoods.com/f/tents-accessories', headers=headers)
print(response.text)
Have you tried adding headers?
import requests
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get('https://www.dickssportinggoods.com/f/tents-accessories', headers=headers)
response.raise_for_status()
print(response.text)
So thanks to @Marcel and @Sonal, but apart from the headers, it only worked when I put the statement in a try/except block.
import requests

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36'
}
session = requests.Session()

def fetch(link):  # wrapper around the request, since the snippet returns r
    try:
        r = session.get(link, headers=headers, stream=True)
        return r
    except requests.exceptions.ConnectionError:
        # r does not exist when the connection fails, so report and bail out.
        print("Connection refused")
        return None
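One addition worth making (my suggestion, not from the answers above): pass a timeout so the request fails fast instead of hanging the way the question describes, and use raise_for_status to surface HTTP errors:
import requests

session = requests.Session()
try:
    # timeout (in seconds) makes requests give up instead of hanging forever
    r = session.get("https://www.dickssportinggoods.com/f/tents-accessories",
                    headers={'User-Agent': 'Mozilla/5.0'}, timeout=15)
    r.raise_for_status()  # turn 4xx/5xx responses into exceptions
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")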

How to obtain a JSON response from the stats.nba.com API?

I'm just trying to use a simple Python get request to access JSON data from stats.nba.com. It seems pretty straightforward, as I can enter the URL into a browser and get the results I'm looking for. However, whenever I run this, the program just hangs and never finishes. I'm wondering if I have to include some kind of headers information in my get request.
The code is below:
import requests

url = 'http://stats.nba.com/stats/commonteamroster?LeagueID=00&Season=2017-18&TeamID=1610612756'
response = requests.get(url)
print(response.text)
I have tried visiting the URL you gave; you can add headers to your request to avoid this problem (the minimum you need to provide is User-Agent; I think you can add as much header information as you like):
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
response = requests.get(url, headers=headers)
The stats.nba.com website needs your 'User-Agent' header information.
You can get your request header information from the Network tab in the browser.
Take Chrome as an example: press F12 and visit the URL, and you can find the corresponding request information; the most useful part is the request headers.
You need to use headers. Try copying from your browser's network tab. Here's what worked for me:
request_headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.8',
    'Connection': 'keep-alive',
    'Host': 'stats.nba.com',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'
}
And here's the modified get:
response = requests.get(url, headers=request_headers)
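Since the question asks for the JSON itself, response.json() decodes the payload once the request succeeds. A sketch of flattening it into rows (the resultSets shape is from my recollection of the stats.nba.com API and may have changed):
# continuing from the response above
data = response.json()
roster = data["resultSets"][0]                # first result set: the roster table
for row in roster["rowSet"]:
    print(dict(zip(roster["headers"], row)))  # pair column names with values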

Why is this simple POST request not working in Python Scrapy when it works with a plain requests.post()?

I have the simplest POST request code:
import requests

headers = {
    'origin': 'https://jet.com',
    'accept-encoding': 'gzip, deflate, br',
    'x-csrf-token': 'IzaENk9W-Xzv9I5NcCJtIf9h_nT24p5fU-Tk',
    'jet-referer': '/product/detail/87e89b3ce17f4742ab6d72aeaaa5480d?gclid=CPzS982CgdMCFcS1wAodABwIOQ',
    'x-requested-with': 'XMLHttpRequest',
    'accept-language': 'en-US,en;q=0.8',
    # The cookie value is split across two adjacent string literals, which
    # Python concatenates into a single header value.
    'cookie': 'akacd_phased_release=3673158615~rv=53~id=041cdc832c1ee67c7be18df3f637ad43; jet.csrf=_JKKPyR5fKD-cPDGmGv8AJk5; jid=7292a61d-af8f-4d6f-a339-7f62afead9a0; jet-phaser=%7B%22experiments%22%3A%5B%7B%22variant%22%3A%22a%22%2C%22version%22%3A1%2C%22id%22%3A%22a_a_test16%22%7D%2C%7B%22variant%22%3A%22slp_categories%22%2C%22version%22%3A1%2C%22id%22%3A%22slp_categories%22%7D%2C%7B%22variant%22%3A%22on_cat_nav_clicked%22%2C%22version%22%3A1%2C%22id%22%3A%22catnav_load%22%7D%2C%7B%22variant%22%3A%22zipcode_table%22%2C%22version%22%3A1%2C%22id%22%3A%22zipcode_table%22%7D%5D%2C%22id%22%3A%222982c0e7-287e-42bb-8858-564332ada868%22%7D; ak_bmsc=746D16A88CE3AE7088B0CD38DB850B694F8C5E56B1650000DAA82659A1D56252~plJIR8hXtAZjTSjYEr3IIpW0tW+u0nQ9IrXdfV5GjSfmXed7+tD65YJOVp5Vg0vdSqkzseD0yUZUQkGErBjGxwmozzj5VjhJks1AYDABrb2mFO6QqZyObX99GucJA834gIYo6/8QDIhWMK1uFvgOZrFa3SogxRuT5MBtC8QBA1YPOlK37Ecu1WRsE2nh55E24F0mFDx5hXcfBAhWdMne6NrQ88JE9ZDxjW5n8qsh+QAHo=; _sdsat_landing_page=https://jet.com/product/detail/87e89b3ce17f4742ab6d72aeaaa5480d?gclid=CPzS982CgdMCFcS1wAodABwIOQ|1495705823651; _sdsat_session_count=1; AMCVS_A7EE579F557F617B7F000101%40AdobeOrg=1; AMCV_A7EE579F557F617B7F000101%40AdobeOrg=-227196251%7CMCIDTS%7C17312%7CMCMID%7C11996417004070294145733272597342763775%7CMCAID%7CNONE%7CMCAAMLH-1496310624%7C3%7CMCAAMB-1496310625%7Chmk_Lq6TPIBMW925SPhw3Q%7CMCOPTOUT-1495713041s%7CNONE; __qca=P0-949691368-1495705852397; mm_gens=Rollout%20SO123%20-%20PDP%20Grid%20Image%7Ctitle%7Chide%7Cattr%7Chide%7Cprice%7Chide~SO19712%20HP%20Rec%20View%7Clast_viewed%7Cimage-only~SO17648%20-%20PLA%20PDP%7Cdesc%7CDefault%7Cbuybox%7Cmodal%7Cexp_cart%7Chide-cart%7Ctop_caro%7CDefault; jcmp_productSku=882b1010309d48048b8f3151ddccb3cf; _sdsat_all_pages_canary_variants=a_a_test16:a|slp_categories:slp_categories|catnav_load:on_cat_nav_clicked|zipcode_table:zipcode_table; _sdsat_all_pages_native_pay_eligible=No; _uetsid=_uet6ed8c6ab; _tq_id.TV-098163-1.3372=ef52068e069c26b9.1495705843.0.1495705884..; _ga=GA1.2.789964406.1495705830; _gid=GA1.2.1682210002.1495705884; s_cc=true; __pr.NaN=6jvgorz8tb; mm-so17648=gen; __pr.11xw=xqez1m3cvl; _sdsat_all_pages_login_status=logged-out; _sdsat_jid_cookie=7292a61d-af8f-4d6f-a339-7f62afead9a0; _sdsat_phaser_id=2982c0e7-287e-42bb-8858-564332ada868; _sdsat_all_pages_jet_platform=desktop; _sdsat_all_pages_site_version=3.860.1495036770896|2017-05-16 20:35:36 UTC; _sdsat_all_pages_canary_variants_2=a_a_test16:a~slp_categories:slp_categories~catnav_load:on_cat_nav_clicked~zipcode_table:zipcode_table; jet=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpZCI6WyJmMmUwMjI1NS1iODFkLTRlOTktOGU1Yi0yZGI1MjU0ZTdjNzUiXSwiamNtcEhpc3RvcnkiOltbXV0sImlwWmlwY29kZSI6WyIyMTA2MSJdLCJjbGllbnRUaWNrZXQiOlsiZXlKMGVYQWlPaUpLVjFRaUxDSmhiR2NpT2lKSVV6STFOaUo5LmV5SmpiR2xsYm5SZmFXUWlPaUl3Tm1JMlkyTTNaVGRtTnpVME16TmhPREU0T0RjelpUWmpZMkV4WTJRelppSXNJbWx6Y3lJNkltcGxkQzVqYjIwaUxDSmhkV1FpT2lKM1pXSmpiR2xsYm5RaWZRLnlKMXdoYklDVml4TE1iblliV0xQY1RvdF9EWUo3MjFYQkdFMzBpUktpdTQiXSwicHJvbW9jb2RlIjpbIlNQUklORzE1Il0sInBsYSI6W3RydWVdLCJmcmVlU2hpcHBpbmciOltmYWxzZV0sImpjbXAiOlt7ImpjbXAiOiJwbGE6Z2dsOm5qX2R1cl9nZW5fcGF0aW9fX2dhcmRlbl9hMjpwYXRpb19fZ2FyZGVuX2dyaWxsc19fb3V0ZG9vcl9jb29raW5nX2dyaWxsX2NvdmVyc19hMjpuYTpwbGFfNzg0NzQ0NTQyXzQwNTY4Mzg3NzA2X3BsYS0yOTM2MjcyMDMzNDE6bmE6bmE6bmE6Mjo4ODJiMTAxMDMwOWQ0ODA0OGI4ZjMxNTFkZGNjYjNjZiIsImNvZGUiOiJQTEExNSIsInNrdSI6Ijg4MmIxMDEwMzA5ZDQ4MDQ4YjhmMzE1MWRkY2NiM2NmIn1dLCJpYXQiOjE0OTU3MDU4OTh9.6OEM9e9fTyUZdFGju19da4rEnFh8kPyg8wENmKyhYgc; '
              'bm_sv=360FA6B793BB42A17F395D08A2D90484~BLAlpOUET7ALPzcGziB9dbZNvjFjG3XLQPFGCRTk+2bnO/ivK7G+kOe1WXpHgIFmyZhniWIzp2MpGel1xHNmiYg0QOLNqourdIffulr2J9tzacGPmXXhD6ieNGp9PAeTqVMi+2kSccO1+JzO+CaGFw==; s_tps=30; s_pvs=173; mmapi.p.pd=%221759837076%7CDwAAAApVAgDxP2Qu1Q4AARAAAUJz0Q1JAQAmoW6kU6PUSKeaIXVTo9RIAAAAAP%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FAAZEaXJlY3QB1Q4BAAAAAAAAAAAAt8wAAH0vAQC3zAAABQDZlQAAAmpAfP%2FVDgD%2F%2F%2F%2F%2FAdUO1Q7%2F%2FwYAAAEAAAAAAc9dAQCNFgIAADyXAABuDVKACdUOAP%2F%2F%2F%2F8B1Q7VDv%2F%2FCgAAAQAAAAABs2ABANweAgAAiY0AAMCzlXtx1Q4A%2F%2F%2F%2F%2FwHVDtUO%2F%2F8GAAABAAAAAAORSwEATPoBAJJLAQBO%2BgEAk0sBAFD6AQABt8wAAAYAAADYlQAAHMPK3ZbVDgD%2F%2F%2F%2F%2FAdUO1Q7%2F%2FwYAAAEAAAAAAc5dAQCJFgIAAbfMAAAGAAAAmpgAAFAf9YUU1Q4A%2F%2F%2F%2F%2FwHVDtUO%2F%2F8EAAABAAAAAAR0YwEA1R4CAHVjAQDWHgIAdmMBANgeAgB3YwEA2x4CAAG3zAAABAAAAAAAAAAAAUU%3D%22; mmapi.p.srv=%22fravwcgus04%22; mmapi.e.PLA=%22true%22; mmapi.p.uat=%7B%22PLATraffic%22%3A%22true%22%7D; _sdsat_lt_pages_viewed=6; _sdsat_pages_viewed=6; _sdsat_traffic_source=',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
    'content-type': 'application/json',
    'accept': 'application/json, text/javascript, */*; q=0.01',
    'referer': 'https://jet.com/product/detail/87e89b3ce17f4742ab6d72aeaaa5480d?gclid=CPzS982CgdMCFcS1wAodABwIOQ',
    'authority': 'jet.com',
    'dnt': '1',
}
data = '{"zipcode":"21061","sku":"87e89b3ce17f4742ab6d72aeaaa5480d","origination":"PDP"}'
r = requests.post('https://jet.com/api/product/v2', headers=headers, data=data)
print(r)
It returns 200
And I want to convert this working request into a Scrapy Request:
body = '{"zipcode":"21061","sku":"87e89b3ce17f4742ab6d72aeaaa5480d","origination":"PDP"}'
yield Request(url='https://jet.com/api/product/v2', callback=self.parse_jet_page, meta={'data': data}, method="POST", body=body, headers=self.jet_headers)
It returns 400; it looks like the headers are being overwritten or something. Or is there a bug?
I guess the error is caused by cookies.
By default, the "cookie" entry in your HTTP headers will be overridden by the built-in downloader middleware CookiesMiddleware. Scrapy expects you to use Request.cookies for passing cookies.
If you do need to pass cookies directly in Request.headers (instead of using Request.cookies), you'll need to disable the built-in CookiesMiddleware. You can simply set COOKIES_ENABLED = False in the settings.
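A sketch of both options (the spider skeleton is hypothetical; the URL, body, and headers dict come from the question):
import scrapy

# jet_headers stands in for the full headers dict from the question.
jet_headers = {
    'content-type': 'application/json',
    'x-requested-with': 'XMLHttpRequest',
    # ... plus the 'cookie' entry and the rest of the browser headers ...
}

class JetSpider(scrapy.Spider):
    name = "jet"
    # Option B: disable the middleware so the raw 'cookie' header goes through untouched.
    custom_settings = {"COOKIES_ENABLED": False}

    def start_requests(self):
        body = '{"zipcode":"21061","sku":"87e89b3ce17f4742ab6d72aeaaa5480d","origination":"PDP"}'
        yield scrapy.Request(
            url="https://jet.com/api/product/v2",
            method="POST",
            body=body,
            headers=jet_headers,
            callback=self.parse_jet_page,
        )
        # Option A instead: remove 'cookie' from jet_headers, re-enable the
        # middleware, and pass cookies={...} to scrapy.Request so
        # CookiesMiddleware manages them for you.

    def parse_jet_page(self, response):
        self.logger.info(response.text)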

POST request with form data

I'm trying to simulate a request that has various headers and bracketed form data.
Form Data:
{"username": "MY_USERNAME", "pass": "MY_PASS", "AUTO": "true"}
That is the form data shown in the console of Chrome
So I tried putting it together with Python's requests library:
import requests

reqUrl = 'http://website.com/login'
postHeaders = {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip,deflate',
    'Accept-Language': 'en-US,en;q=0.8',
    'Connection': 'keep-alive',
    'Content-Length': '68',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Host': 'website.com',
    'Origin': 'http://www.website.com',
    'Referer': 'http://www.website.com/',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36'
}
payload = {
    "username": "MY_USERNAME",
    "pass": "MY_PASS",
    "AUTO": "true"
}
session = requests.Session()
response = session.post(reqUrl, data=payload, headers=postHeaders)
I'm receiving a response but it shows:
{"status":"failure","error":"Invalid request data"}
Am I going about implementing the form data wrong? I was also thinking it could have to do with modifying the Content-Length?
Yes, you are setting a content length, overriding anything requests might set. You are also setting too many headers; leave most of those to the library instead:
postHeaders = {
    'Accept-Language': 'en-US,en;q=0.8',
    'Origin': 'http://www.website.com',
    'Referer': 'http://www.website.com/',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36'
}
is plenty. All the others will be generated for you.
However, from your description of the form data, it looks like you are posting JSON instead. In that case, use the json keyword argument instead of data, which will encode your payload to JSON and set the Content-Type header to application/json:
response = session.post(reqUrl, json=payload, headers=postHeaders)
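To see the difference between the two keyword arguments on the wire without sending anything, you can prepare both variants. A small illustrative sketch:
import requests

payload = {"username": "MY_USERNAME", "pass": "MY_PASS", "AUTO": "true"}

# Prepare (but don't send) both variants to compare the wire format.
form = requests.Request("POST", "http://website.com/login", data=payload).prepare()
js = requests.Request("POST", "http://website.com/login", json=payload).prepare()

print(form.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form.body)                     # username=MY_USERNAME&pass=MY_PASS&AUTO=true
print(js.headers["Content-Type"])    # application/json
print(js.body)                       # {"username": "MY_USERNAME", "pass": "MY_PASS", "AUTO": "true"}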

Specifying the body content in a POST using requests (Python)

I have been using Fiddler to inspect an HTTP POST that an application makes, and then trying to replicate that POST using requests in Python.
The link I am posting to: http://www.example.com/ws/for/250/buy
Now in Fiddler I can clearly see the headers, which are easy to replicate using requests. However, when I look at the TextView tab in Fiddler, I see this:
tuples=4,421&flows=undefined
To replicate that, I think I need to use the data parameter, which I found in the docs, but I am not sure how to write it in Python. Do I pass it as a dictionary, split up according to the & sign, or do I have to specify it as a string, etc.?
My current code:
import requests

url = 'http://www.example.com/ws/for/250/buy'
headers = {
    'Connection': 'keep-alive',
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1003.1 Safari/535.19 Awesomium/1.7.1',
    'Accept': '*/*',
    'Accept-Encoding': 'gzip,deflate',
    'Accept-Language': 'en',
    'Accept-Charset': 'iso-8859-1,*,utf-8',
}
r6 = requests.post(url, headers=headers, verify=False)
Something like this should work:
import requests

url = 'http://www.example.com/ws/for/250/buy'
headers = {
    'Connection': 'keep-alive',
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1003.1 Safari/535.19 Awesomium/1.7.1',
    'Accept': '*/*',
    'Accept-Encoding': 'gzip,deflate',
    'Accept-Language': 'en',
    'Accept-Charset': 'iso-8859-1,*,utf-8',
}
r6 = requests.post(url, headers=headers, verify=False, data={'tuples': '4,421', 'flows': 'undefined'})
You provide a dictionary to the data argument:
r = requests.post(url, data={'tuples': '4,421', 'flows': 'undefined'})
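One caveat (my note, not part of the original answer): requests percent-encodes dictionary values, so the comma goes out as %2C. Servers normally decode that fine, but if the one you're targeting insists on the byte-for-byte body seen in Fiddler, you can pass the pre-encoded string instead:
# data as a raw string is sent exactly as written, commas and all
r = requests.post(url, data='tuples=4,421&flows=undefined',
                  headers={'Content-Type': 'application/x-www-form-urlencoded'})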
