Python requests - session token changing

I am currently using Python requests to scrape data from a website and using Postman as a tool to help me do it.
For those not familiar with Postman, it sends a GET request and generates a code snippet to be used in many languages, including Python.
By using it, I can get data from the website quite easily, but it seems like the 'Cookie' entry in the headers provided by Postman changes over time, so I can't automate my code to run at any time. The issue is that when the cookie is no longer valid, I get an access denied message.
Here's an example of the code provided by Postman:
import requests
url = "https://wsloja.ifood.com.br/ifood-ws-v3/restaurants/7c854a4c-01a4-48d8-b3d4-239c6c069f6a/menu"
payload = {}
headers = {
'access_key': '69f181d5-0046-4221-b7b2-deef62bd60d5',
'browser': 'Windows',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36',
'Accept': 'application/json, text/plain, */*',
'secret_key': '9ef4fb4f-7a1d-4e0d-a9b1-9b82873297d8',
'Cache-Control': 'no-cache, no-store',
'X-Ifood-Session-Id': '85956739-2fac-4ebf-85d3-1aceda9738df',
'platform': 'Desktop',
'app_version': '8.37.0',
'Cookie': 'session_token=TlNUXzMyMjJfMTU5Nzg1MDE5NTIxNF84NDI5NTA2NDQ2MjUxMg==; _abck=AD1745CB8A0963BF3DD67C8AF7932007~-1~YAAQtXsGYH8UUe9zAQAACZ+IAgStbP4nYLMtonPvQ+4UY+iHA3k6XctPbGQmPF18spdWlGiDB4/HbBvDiF0jbgZmr2ETL8YF+f71Uwhsj+L8K+Fk4PFWBolAffkIRDfSubrf/tZOYRfmw09o59aFuQor5LeqxzXkfVsXE8uIJE0P/nC1JfImZ35G0OFt+HyIgDUZMFQ54Wnbap7+LMSWcvMKF6U/RlLm46ybnNnT/l/NLRaEAOIeIE3/JdKVVcYT2t4uePfrTkr5eD499nyhFJCwSVQytS9P7ZNAM4rFIPnM6kPtwcPjolLNeeU=~-1~-1~-1; ak_bmsc=129F92B2F8AC14A400433647B8C29EA3C9063145805E0000DB253D5F49CE7151~plVgguVnRQTAstyzs8P89cFlKQnC9ISQCH9KPHa8xYPDVoV2iQ/Hij2PL9r8EKEqcQfzkGmUWpK09ZpU0tL/llmBloi+S+Znl5P5/NJeV6Ex2gXqBu1ZCxc9soMWWyrdvG+0FFvSP3a6h3gaouPh2O/Tm4Ghk9ddR92t380WBkxvjXBpiPzoYp1DCO4yrEsn3Tip1Gan43IUHuCvO+zkRmgrE3Prfl1T/g0Px9mvLSVrg=; bm_sz=3106E71C2F26305AE435A7DA00506F01~YAAQRTEGyfky691zAQAAGuDbBggFW4fJcnF1UtgEsoXMFkEZk1rG8JMddyrxP3WleKrWBY7jA/Q08btQE43cKWmQ2qtGdB+ryPtI2KLNqQtKM5LnWRzU+RqBQqVbZKh/Rvp2pfTvf5lBO0FRCvESmYjeGvIbnntzaKvLQiDLO3kZnqmMqdyxcG1f51aoOasrjfo=; bm_sv=B4011FABDD7E457DDA32CBAB588CE882~aVOIuceCgWY25bT2YyltUzGUS3z5Ns7gJ3j30i/KuVUgG1coWzGavUdKU7RfSJewTvE47IPiLztXFBd+mj7c9U/IJp+hIa3c4z7fp22WX22YDI7ny3JxN73IUoagS1yQsyKMuxzxZOU9NpcIl/Eq8QkcycBvh2KZhhIZE5LnpFM='
}
response = requests.request("GET", url, headers=headers, data = payload)
print(response.text.encode('utf8'))
Here's just the Cookie part where I get access denied:
'Cookie': 'session_token=TlNUXzMyMjJfMTU5Nzg1MDE5NTIxNF84NDI5NTA2NDQ2MjUxMg==; _abck=AD1745CB8A0963BF3DD67C8AF7932007~-1~YAAQtXsGYH8UUe9zAQAACZ+IAgStbP4nYLMtonPvQ+4UY+iHA3k6XctPbGQmPF18spdWlGiDB4/HbBvDiF0jbgZmr2ETL8YF+f71Uwhsj+L8K+Fk4PFWBolAffkIRDfSubrf/tZOYRfmw09o59aFuQor5LeqxzXkfVsXE8uIJE0P/nC1JfImZ35G0OFt+HyIgDUZMFQ54Wnbap7+LMSWcvMKF6U/RlLm46ybnNnT/l/NLRaEAOIeIE3/JdKVVcYT2t4uePfrTkr5eD499nyhFJCwSVQytS9P7ZNAM4rFIPnM6kPtwcPjolLNeeU=~-1~-1~-1; ak_bmsc=129F92B2F8AC14A400433647B8C29EA3C9063145805E0000DB253D5F49CE7151~plVgguVnRQTAstyzs8P89cFlKQnC9ISQCH9KPHa8xYPDVoV2iQ/Hij2PL9r8EKEqcQfzkGmUWpK09ZpU0tL/llmBloi+S+Znl5P5/NJeV6Ex2gXqBu1ZCxc9soMWWyrdvG+0FFvSP3a6h3gaouPh2O/Tm4Ghk9ddR92t380WBkxvjXBpiPzoYp1DCO4yrEsn3Tip1Gan43IUHuCvO+zkRmgrE3Prfl1T/g0Px9mvLSVrg=; bm_sz=3106E71C2F26305AE435A7DA00506F01~YAAQRTEGyfky691zAQAAGuDbBggFW4fJcnF1UtgEsoXMFkEZk1rG8JMddyrxP3WleKrWBY7jA/Q08btQE43cKWmQ2qtGdB+ryPtI2KLNqQtKM5LnWRzU+RqBQqVbZKh/Rvp2pfTvf5lBO0FRCvESmYjeGvIbnntzaKvLQiDLO3kZnqmMqdyxcG1f51aoOasrjfo=; bm_sv=B4011FABDD7E457DDA32CBAB588CE882~aVOIuceCgWY25bT2YyltUzGUS3z5Ns7gJ3j30i/KuVUgG1coWzGavUdKU7RfSJewTvE47IPiLztXFBd+mj7c9U/IJp+hIa3c4z7fp22WX23E755znZL76c0V/amxbHU9BUnrEff3HGcsniyh5mU+C9XVmtNRLd8oT1UW9WUg3qE=' }
Which is slightly different from the one before.
How could I get around this by somehow having Python fetch the session token itself?

Apparently just removing 'Cookie' from headers does the job.
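For reference, a minimal sketch of that working version, assuming the remaining Postman headers are still accepted by the server: the hard-coded 'Cookie' entry is dropped, and a requests.Session is used so any cookies the server sets during the exchange are stored and resent automatically instead of being pasted in by hand.
import requests

url = "https://wsloja.ifood.com.br/ifood-ws-v3/restaurants/7c854a4c-01a4-48d8-b3d4-239c6c069f6a/menu"

# Same headers as the Postman snippet, minus 'Cookie'.
headers = {
    'access_key': '69f181d5-0046-4221-b7b2-deef62bd60d5',
    'browser': 'Windows',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36',
    'Accept': 'application/json, text/plain, */*',
    'secret_key': '9ef4fb4f-7a1d-4e0d-a9b1-9b82873297d8',
    'Cache-Control': 'no-cache, no-store',
    'X-Ifood-Session-Id': '85956739-2fac-4ebf-85d3-1aceda9738df',
    'platform': 'Desktop',
    'app_version': '8.37.0',
}

# The Session stores and resends any cookies the server sets,
# so the session token is managed automatically rather than hard-coded.
with requests.Session() as s:
    response = s.get(url, headers=headers)
    print(response.status_code)
    print(response.text)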

Related

Scraping pre-market table on Barchart with Python

Apologies if this is a bit website-specific (barchart.com). I used the guidance provided here for properly connecting to and scraping barchart.com for futures data. However, after hours of trying, I am at a loss as to how to pull off the same trick for their pre-market data table: Barchart_Premarket_Site.
Does anyone know the trick to get the pre-market data?
Here is the basic connection, for which I get a 403:
import requests

geturl = r'https://www.barchart.com/stocks/pre-market-trading/volume-advances?orderBy=preMarketVolume&orderDir=desc'
s = requests.Session()
r = s.get(geturl)
# j = r.json()
print(r)
All that was required was to add more headers to the request. You can find your own headers using Chrome > Developer Tools: locate the API request that populates the table and copy in a few of the headers associated with that request.
import requests
request_url = "https://www.barchart.com/proxies/core-api/v1/quotes/get?lists=stocks.us.premarket.volume_advances&orderDir=desc&fields=symbol%2CsymbolName%2CpreMarketLastPrice%2CpreMarketPriceChange%2CpreMarketPercentChange%2CpreMarketVolume%2CpreMarketAverage5dVolume%2CpreMarketPreviousLast%2CpreMarketPreviousChange%2CpreMarketPreviousPercentChange%2CpreMarketTradeTime%2CnextEarningsDate%2CnextEarningsDate%2CtimeCode%2CsymbolCode%2CsymbolType%2ChasOptions&orderBy=preMarketVolume&meta=field.shortName%2Cfield.type%2Cfield.description%2Clists.lastUpdate&hasOptions=true&page=1&limit=100&raw=1"
headers = {
'accept': 'application/json',
'cookie': '_gcl_au=1.1.685644914.1670446600; _fbp=fb.1.1670446600221.1987872306; _pbjs_userid_consent_data=3524755945110770; _pubcid=e7cf9178-59bc-4a82-b6c4-a2708ed78b8d; _admrla=2.2-1e3aed0d7d9d2975-a678aeef-7671-11ed-803e-d12e87d011f0; _lr_env_src_ats=false; _cc_id=6c9e21e7f9c269f8501e2616f9e68632; __browsiUID=c0174d21-a0ab-4dfe-8978-29ae08f44964; __qca=P0-531499020-1670446603686; __gads=ID=220b766bf87e15f9-22fa0316ded8001f:T=1670446598:S=ALNI_MaEWcBqESsJKLF0AwoIVvrKjpjZ_g; panoramaId_expiry=1673549551401; panoramaId=9aa5615403becfbc8adf14a3024816d53938b8cdbea6c8f5cabb60112755d70c; udmsrc=%7B%7D; _pk_id.1.73a4=1aee00a1c66e897b.1672997455.; _ccm_inf=1; bcPremierAdsListScreen=true; _hjSessionUser_2563157=eyJpZCI6ImI2MTM5NTQ4LWUxYzMtNTU2NS04MmM3LTk4ODQ5MWNjY2YxZCIsImNyZWF0ZWQiOjE2NzMwMzQ3OTY0NDAsImV4aXN0aW5nIjp0cnVlfQ==; bcFreeUserPageView=0; _gid=GA1.2.449489725.1673276404; _ga_4HQ9CY2XKK=GS1.1.1673303248.3.0.1673303248.0.0.0; _ga=GA1.2.606341620.1670446600; __aaxsc=2; aasd=5%7C1673314072749; webinar131WebinarClosed=true; _lr_geo_location_state=NC; _lr_geo_location=US; udm_edge_floater_fcap=%5B1673397095403%2C1673392312561%2C1673078162569%2C1673076955809%2C1673075752582%2C1673066137343%2C1673056514808%2C1673051706099%2C1673042087115%2C1673037276340%2C1672960427551%2C1672952009965%2C1672947201101%5D; pbjs-unifiedid=%7B%22TDID%22%3A%2219345091-e7fd-4323-baeb-4627c879c6ba%22%2C%22TDID_LOOKUP%22%3A%22TRUE%22%2C%22TDID_CREATED_AT%22%3A%222022-12-05T19%3A48%3A10%22%7D; __gpi=UID=000008c6d06e1e0d:T=1670446598:RT=1673433090:S=ALNI_MZS6mLx8CJg9iN6kzx4JeDFHPOMjg; market=eyJpdiI6InJvcVNudkprUjQ1bE0yWWQrSTlYY1E9PSIsInZhbHVlIjoieUpabHpmSnJGSkIxc0o1enpyb1dLdENBSWp4UE5NYUZwUFg3OGs0TGJSL0dQWUNpTDU0a2hZbklOQTFNd09OVSIsIm1hYyI6IjBjMjJkNDExZjRhOTc2M2QwYWU3NGUyNmVlZTgyMzY2NWM2MjQyOTY2MjY2YmUxODI2Y2RkY2FlNzI3MjNkOTIifQ%3D%3D; _lr_retry_request=true; __browsiSessionID=c02dadca-6355-415f-aa80-926cccd94759&true&false&DEFAULT&us&desktop-4.11.12&false; IC_ViewCounter_www.barchart.com=2; cto_bundle=dxDlRl90VldIJTJGa0VaRzRIS0xnQmdQOXVWVlhybWJ3NDluY29PelBnM0prMkFxZkxyZWh4dkZNZG9LcyUyRjY1VWlIMWRldkRVRlJ5QW05dHlsQU1xN2VmbzlJOXZFSTNlcFRxUkRxYiUyRlp6Z3hhUHpBekdReU5idVV0WnkxVll0eGp5TyUyQlVzJTJCVDVoRkpWWlZ4R0hOSUl2YTVJVDhBJTNEJTNE; cto_bidid=51ixCl92dkhqbmVmdnlTZHVYS25nWTk2eDVMUnVRNjhEMUhxa3FlcmFzRHVNSERUQkd5cFZrM0QyQyUyRkVNNkV6S0ZHOUZPcTBTR2lBUjA5QUc5YU1ucW9GMFZBWHB4aU9sMlo3WHAlMkJYWjZmJTJGWkpsWSUzRA; _awl=2.1673451629.5-df997ba8dc13bee936d8d14a9771e587-6763652d75732d6561737431-0; laravel_token=eyJpdiI6IjR2YStGblAxWlZoZzllcEtPUUFLNlE9PSIsInZhbHVlIjoiY3E2bHdQWFkyT1FFUHFka2NMMVoyREFvQlZwWXlxc3F0SlRuZnIyTHJsSWtNVFA0K1czcDloWFF2d0lVZys3azZyelkrWks5SWxuRW05MGlqV1I4QmViMU9KKzArVXJOTWNVK2hqZVRocVNHM3NZa1dNeStQbnNyYVBtcjlUeTZzT2lpV2t1ek1UOE1wSUFudmg0NzFTQ3VPeDJiYk16bGNBTzVqVHBCcFRZdTFsZjBVREVyUEhLeThjZm9wSGIzQ2NDVE0ya0xOQWx1VGx0aUlEUE9yakU4Q3RicWFmNDdkYjJSWHlsSWYwajlSUkozVmQ4OVNGNzZEeWhtUExtcXB6VnNrY2NsUzRFQnJyMlhiejFtc0l3U2p5SW5BbFFDZTN0dk9EUWNOR2hVYUdMbmhFUFZVT24xOFFGVkM3L2giLCJtYWMiOiIxYzM5Yzk1ZWNjNjM0NzdjMmM4YTJkZDg0ZmY5MWQwNWUzOTlhNTAwNjg2MTNmNTNlYzY4M2MzYWQ3MDA4MThlIn0%3D; XSRF-TOKEN=eyJpdiI6Ik1PMGEvOGFkZ1p1ekpNcXIvZWZtcHc9PSIsInZhbHVlIjoiMVZYQ3NCV1hjcWREdG5uSDVqYXZVVy91U29USys1dkJJeFNZZG9QVGNRNDhmMTJIeitVV2NUV0xSUC9ZTThVM3FDQWZBcVdNclhqSkx4MG1NTGhadUNlNXRLMEdUc3RDcEVwNnJVYU9FNTBub2NKRWxlekxBZmZEVXNhZUlwWnoiLCJtYWMiOiIxYTI0N2E2OGMxMzRhNmFiYTliMzBlYTdjYWZlNzUwMDRlY2Q5YjI2YzY4OGZlMWIxYmM0YTE3YzZkMTdhMGM3In0%3D; 
laravel_session=eyJpdiI6InJIcmMxRWVacmtGc2tENS9zYUFFOVE9PSIsInZhbHVlIjoibG1vQWh1d1dmaUNBZTV4dGdJbWhTVEoyMWVrblBFNTBycTBPai9ad2llcHRkd0hHUTI4ZS8rUFNFVm5LNEcvd1RXY1RwOHdVZHplNU92Vk9xUHZjYmMrUC9Cc3hJUkJNWE54OVR1UHFaTExpM1BRcWRSWEJ5Q3gvVVNzajdHZUoiLCJtYWMiOiI5NDVkOGU4NGM5Y2MwMThmMTgwMzQyOWQ1Yzc5MzU5ZGU2ZjkwMWRjYzBjZWJiZDFhMTQzODMzZmE2NWExMGQ3In0%3D',
'referer': 'https://www.barchart.com/stocks/pre-market-trading/volume-advances?orderBy=preMarketVolume&orderDir=desc',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
'x-xsrf-token': 'eyJpdiI6Im1LQVRpVEJONzZwMDRVQnhYK0I5SWc9PSIsInZhbHVlIjoiMkRIMnJBb1VDQmRscjNlajF1dVR2eWxRbGNJTGZCNWxMaWk3N0EzQWlyOWk0cXJBK2oyUVJ1N282R2VOVWh6WlhJcXdZdFplZmRqaFhPa203bi9HeFBxckJKeUVzVDRETHI5OHlxNDZnOEF5WVV5NXdNSWJiWk95UlFHRXQwN2siLCJtYWMiOiI1NTkyZjk2M2FlNTE0NDI0ODQ3YmE4ZjIyZDY1MzM2MTA3ZTY4NDA5NzA5YzViMjhiN2UwYTFhNTM1Y2ZkMjk5In0='
}
r = requests.get(request_url,headers=headers)
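The snippet above hard-codes a cookie and XSRF token copied from one browser session, which will eventually expire. A possible variation, not verified against Barchart's current setup, is to let a requests.Session load the listing page first so it picks up fresh cookies, then reuse the session's XSRF-TOKEN cookie for the API call; the reduced fields list below is only illustrative.
import requests
from urllib.parse import unquote

page_url = "https://www.barchart.com/stocks/pre-market-trading/volume-advances?orderBy=preMarketVolume&orderDir=desc"
api_url = "https://www.barchart.com/proxies/core-api/v1/quotes/get"

params = {
    "lists": "stocks.us.premarket.volume_advances",
    "orderBy": "preMarketVolume",
    "orderDir": "desc",
    "fields": "symbol,symbolName,preMarketLastPrice,preMarketVolume",  # illustrative subset
    "page": 1,
    "limit": 100,
    "raw": 1,
}

headers = {
    "accept": "application/json",
    "referer": page_url,
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
}

with requests.Session() as s:
    # Load the listing page first so the session picks up fresh cookies.
    s.get(page_url, headers=headers)

    # Assumption: the XSRF-TOKEN cookie, URL-decoded, is accepted as the
    # x-xsrf-token header; adjust if the site expects something else.
    xsrf = s.cookies.get("XSRF-TOKEN")
    if xsrf:
        headers["x-xsrf-token"] = unquote(xsrf)

    r = s.get(api_url, params=params, headers=headers)
    print(r.status_code)
    print(r.json() if r.ok else r.text)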

Error status code 403 even with headers, Python Requests

I am sending a request to some URL. I copied the cURL command and used a curl-to-Python tool to generate the code, so all the headers are included, but my request is not working: I receive status code 403 when printing, and error code 1020 in the HTML output. The code is:
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:106.0) Gecko/20100101 Firefox/106.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    # 'Accept-Encoding': 'gzip, deflate, br',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
}

response = requests.get('https://v2.gcchmc.org/book-appointment/', headers=headers)

print(response.status_code)
print(response.cookies.get_dict())
with open("test.html", 'w') as f:
    f.write(response.text)
I also get cookies, but not the desired response. I know I can do it with Selenium, but I want to know the reason behind this. Thanks in advance.
Note:
I have installed all the libraries that requests depends on, with the same versions as on the computer where it works, and it still isn't working and throws a 403 error.
The site is protected by Cloudflare, which aims to block, among other things, unauthorized data scraping. From What is data scraping?:
The process of web scraping is fairly simple, though the implementation can be complex. Web scraping occurs in 3 steps:
1. First the piece of code used to pull the information, which we call a scraper bot, sends an HTTP GET request to a specific website.
2. When the website responds, the scraper parses the HTML document for a specific pattern of data.
3. Once the data is extracted, it is converted into whatever specific format the scraper bot's author designed.
You can use urllib instead of requests; it seems to be able to deal with Cloudflare:
import urllib.request

req = urllib.request.Request('https://v2.gcchmc.org/book-appointment/')
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:106.0) Gecko/20100101 Firefox/106.0')
req.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8')
req.add_header('Accept-Language', 'en-US,en;q=0.5')

r = urllib.request.urlopen(req).read().decode('utf-8')
with open("test.html", 'w', encoding="utf-8") as f:
    f.write(r)
It works on my machine, so I am not sure what the problem is.
However, when a request does not work, I often check whether it works with Playwright. Playwright uses a browser driver and thus mimics your actual browser when visiting the page. It can be installed with pip install playwright. The first time you run it, it may give an error telling you to install the browser drivers; just follow the instructions to do so.
With Playwright you can try the following:
from playwright.sync_api import sync_playwright

url = 'https://v2.gcchmc.org/book-appointment/'
ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/69.0.3497.100 Safari/537.36"
)

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page(user_agent=ua)
    page.goto(url)
    page.wait_for_timeout(1000)
    html = page.content()
    print(html)
A downside of Playwright is that it requires installing Chromium (or another browser). This may complicate deployment, as the browser cannot simply be added to requirements.txt and a container image is required.
Try running Burp Suite's Proxy to see all the headers and other data like cookies. Then you could mimic the request with the Python module. That's what I always do.
Good luck!
Had the same problem recently.
Using the JavaScript fetch API with Selenium-Profiles worked for me.
Example JS:
fetch('http://example.com/movies.json')
  .then((response) => response.json())
  .then((data) => console.log(data));
Example Python with Selenium-Profiles:
headers = {
    "accept": "application/json",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": profile["cdp"]["useragent"]["acceptLanguage"],
    "content-type": "application/json",
    # "cookie": cookie_str,  # optional
    "sec-ch-ua": "'Google Chrome';v='107', 'Chromium';v='107', 'Not=A?Brand';v='24'",
    "sec-ch-ua-mobile": "?0",  # "?1" for mobile
    "sec-ch-ua-platform": "'" + profile['cdp']['useragent']['userAgentMetadata']['platform'] + "'",
    "sec-fetch-dest": "empty",
    "sec-fetch-mode": "cors",
    "user-agent": profile['cdp']['useragent']['userAgent'],
}

answer = driver.requests.fetch(
    "https://www.example.com/",
    options={
        "body": json.dumps(post_data),
        "headers": headers,
        "method": "POST",
        "mode": "same-origin",
    },
)
I don't know why this occurs, but I assume Cloudflare and others are able to detect whether a request is made with JavaScript.

Requests return the same data

I am using the Python requests module to grab data from a website.
The first time I run the script, everything works fine and the data is OK. If I run the script again, it returns the same data, even though the data has changed on the website when opened in a browser. No matter how many times I run the script, the data stays the same. BUT!
After 5 or 6 minutes, if I run the script again, the data is updated. It looks like requests is caching the response.
If I use the browser, the data updates correctly every time I hit refresh.
r = requests.get('https://verysecretwebsite.com', headers=headers)
r.text
Actually, I use the following headers:
headers = {
    'User-Agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 YaBrowser/19.6.1.153 Yowser/2.5 Safari/537.36",
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate, br',
    'Referer': 'https://www.gismeteo.ru/weather-orenburg-5159/now/',
    'DNT': '1',
    'Connection': 'false',
    'Upgrade-Insecure-Requests': '1',
    'Cache-Control': 'no-cache, max-age=0',
    'TE': 'Trailers',
}
but with no luck.
I am trying to grab this link https://www.gismeteo.ru/weather-orenburg-5159/now/, specifically the section with data-dateformat="G:i".
In your code you haven't set any headers. This means that requests will always send its default User-Agent header, like User-Agent: python-requests/2.22.0, and use no caching directives like Cache-Control.
The remote server of your website may have different caching policies for different client applications. It can respond with different data, or use a different caching time, based on the User-Agent and/or Cache-Control headers of your request.
So check what headers your browser uses (F12 in Chrome) to make requests to the site, and then add them to your request. You can also add a Cache-Control directive to force the server to return the most recent data.
Example:
import requests
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 YaBrowser/19.6.1.153 Yowser/2.5 Safari/537.36",
"Cache-Control": "no-cache, max-age=0", # disable caching
}
r = requests.get("https://www.mysecretURL.com", headers=headers)
The requests.get() method doesn't cache data by default (from this StackOverflow post). I'm not entirely sure of the reason for the lag, as refreshing your browser is essentially identical to calling requests.get(). You could try creating a loop that automatically collects data every 5-10 seconds or so, as sketched below; that should work fine and keep you from having to run the same lines of code manually. Hope this helps!
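A minimal sketch of such a polling loop; the URL and headers are just taken from the question, and the interval is a placeholder, so adjust both and replace the print with your own parsing:
import time
import requests

url = "https://www.gismeteo.ru/weather-orenburg-5159/now/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 YaBrowser/19.6.1.153 Yowser/2.5 Safari/537.36",
    "Cache-Control": "no-cache, max-age=0",  # ask the server not to serve a cached copy
}

while True:
    r = requests.get(url, headers=headers)
    print(r.status_code, len(r.text))  # replace with your own parsing of the response
    time.sleep(10)  # poll every 10 seconds or so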

POST request fails to interact with site

I am trying to log in to a site called grailed.com and follow a certain product. The code below is what I have tried.
It succeeds in logging in with my credentials. However, whenever I try to follow a product (the id in the payload is the id of the product), the code runs without any errors but fails to follow the product. I am confused by this behavior. Is it a similar case to Instagram, where Instagram blocks any attempt to interact programmatically with their site and forces you to use their API? (grailed.com does not have a public API, AFAIK.)
I tried the following code, which looks exactly like the POST request sent when you follow on the site.
import requests

# headers/data defined here
r = requests.Session()
v = r.post("https://www.grailed.com/api/sign_in", json=data, headers=headers)

headers = {
    'authority': 'www.grailed.com',
    'method': 'POST',
    "path": "/api/follows",
    'scheme': 'https',
    'accept': 'application/json',
    'accept-encoding': 'gzip, deflate, br',
    "content-type": "application/json",
    "x-amplitude-id": "1547853919085",
    "x-api-version": "application/grailed.api.v1",
    "x-csrf-token": "9ph4VotTqyOBQzcUt8c3C5tJrFV7VlT9U5XrXdbt9/8G8I14mGllOMNGqGNYlkES/Z8OLfffIEJeRv9qydISIw==",
    "origin": "https://www.grailed.com",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36",
}
payload = {
    "id": "7917017"
}
b = r.post("https://www.grailed.com/api/follows", json=payload, headers=headers)
If the API is not designed to be public, you are most likely missing a valid CSRF token in your follow headers.
You have to find the CSRF token and add it to the /api/follows POST.
Taking a quick look at the site, this might be hard, as everything is generated inside JavaScript.
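If you want to try it anyway, here is a rough sketch (not verified against Grailed): the assumption is that the logged-in HTML exposes a Rails-style meta tag named csrf-token, which may not hold if the token is only injected by JavaScript.
import re
import requests

s = requests.Session()

# Log in first (data and headers defined as in the question).
# s.post("https://www.grailed.com/api/sign_in", json=data, headers=headers)

# Fetch a page with the logged-in session and look for a csrf-token meta tag.
# Assumption: the token is exposed as <meta name="csrf-token" content="...">.
page = s.get("https://www.grailed.com/")
match = re.search(r'name="csrf-token" content="([^"]+)"', page.text)
if match:
    csrf_token = match.group(1)
    follow_headers = {
        "accept": "application/json",
        "content-type": "application/json",
        "x-csrf-token": csrf_token,
        "origin": "https://www.grailed.com",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36",
    }
    b = s.post("https://www.grailed.com/api/follows",
               json={"id": "7917017"}, headers=follow_headers)
    print(b.status_code, b.text)
else:
    print("No csrf-token meta tag found; the token may be set via JavaScript.")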

GET request while access token is changing

I wrote a Python script which sends GET requests to a particular website.
In order to perform this request, I need to attach the access token that I was given when I logged in.
The problem is that the access token changes every 15 minutes, and I have to find it over and over again using Chrome DevTools (Network tab). I was wondering if there is any way to obtain the new token automatically, or any other way to perform this GET request without the access token, using only my credentials (username and password) for this website.
Right now, this is how I'm doing it (note that the data provided is not real, so please don't try to use it):
url = "https://www.containers.io/web/alerts"
querystring = {"access_token":"cfc6f6d22f00303fb7ac--f","envId":"58739be2c2","folderId":"active","sortBy":"status"}
headers = {
'origin': "https://www.containers.io",
'accept-encoding': "gzip, deflate, br",
'accept-language': "en-US,en;q=0.9",
'user-agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36",
'accept': "application/json, text/plain, */*",
'referer': "https://www.containers.io",
'connection': "keep-alive",
'cache-control': "no-cache",
}
response = requests.request("GET", url, headers=headers, params=querystring)
JSON_format = response.json()
I'd advise a handshake-style implementation of this. Pass the access code with your request and make sure the returned response contains that same access code, which you can then use to generate another code for a second or subsequent requests. Hope this answers your question.
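To the original question of obtaining the token automatically from the credentials, one possible sketch is to replay the login request whenever the GET comes back 401/403, then retry with the fresh token. The login endpoint path and the JSON field names below are hypothetical placeholders; the real ones have to be copied from the login request visible in the browser's Network tab.
import requests

BASE = "https://www.containers.io"
session = requests.Session()

def get_access_token(username, password):
    # Assumption: the site exposes a JSON login endpoint; "/api/login" and the
    # field names are placeholders taken from nothing but guesswork.
    resp = session.post(BASE + "/api/login",
                        json={"username": username, "password": password})
    resp.raise_for_status()
    return resp.json()["access_token"]  # hypothetical field name

def get_alerts(token, username, password):
    params = {"access_token": token, "envId": "58739be2c2",
              "folderId": "active", "sortBy": "status"}
    resp = session.get(BASE + "/web/alerts", params=params)
    if resp.status_code in (401, 403):
        # Token expired: fetch a new one and retry once.
        token = get_access_token(username, password)
        params["access_token"] = token
        resp = session.get(BASE + "/web/alerts", params=params)
    resp.raise_for_status()
    return resp.json(), token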
