I try to create POST request to ASP site (just like in Firefox), for get JSON response.
But in my code response is html, not JSON.
link to site
Firebug Response Headers:
HTTP/1.1 200 OK
Cache-Control: private
Content-Type: application/json; charset=utf-8
Server: Microsoft-IIS/7.5
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Mon, 08 Sep 2014 11:32:22 GMT
Content-Length: 101
Firebug Request Headers:
POST /Portal/WebPageMethods/Playlista/playlist.aspx HTTP/1.1
Host: www.polskieradio.pl
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: pl,en-US;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With: XMLHttpRequest
Referer: http://www.polskieradio.pl/10,Czworka.json
Content-Length: 17
Cookie: cookies-accepted=true; ASP.NET_SessionId=35p3kig5t5cmlikubnlnytlh
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
source code:
import requests
import json
url = "http://www.polskieradio.pl/Portal/WebPageMethods/Playlista/playlist.aspx?program=4&count=1"
payload = { "Host": "www.polskieradio.pl",
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0",
"Accept": "application/json, text/javascript, */*; q=0.01",
"Accept-Language": "pl,en-US;q=0.7,en;q=0.3",
"Accept-Encoding": "gzip, deflate",
"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
"X-Requested-With": "XMLHttpRequest",
"Referer": "http://www.polskieradio.pl/10,Czworka",
"Content-Length": "17",
"Cookie": "cookies-accepted=true; ASP.NET_SessionId=5l1eezrjfdyvvevxushojtc2",
"Connection": "keep-alive",
"Pragma": "no-cache",
"Cache-Control": "no-cache"
}
r = requests.post(url, data=json.dumps(payload))
print(r.headers['content-type'])
print r.content
How to do this properly?
Thanks for answers!
Try a little bit different...
Look at this example:
headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}
r = requests.post(url, data=json.dumps(data), headers=headers)
Accept is a header, not a payload.
Everything you are sending as payload, are, in fact, headers.
Your POST payload may be program=4&count=1, or you can do a GET.
--- ADDITION with final solution
import requests
import json
url = "http://www.polskieradio.pl/Portal/WebPageMethods/Playlista/playlist.aspx"
data = 'program=4&count=1'
headers = {
'User-Agent': 'curl/7.35.0',
'Host': 'www.polskieradio.pl',
'Accept':'*/*',
'Proxy-Connection': 'Keep-Alive',
'Content-Type': 'application/x-www-form-urlencoded'
}
r = requests.post(url, data=data, headers=headers)
print r.content
Related
# selenium-request.py
from seleniumwire import webdriver # Import from seleniumwire
# Create a new instance of the Chrome driver
driver = webdriver.Chrome()
driver.get('https://www.cmegroup.com/content/cmegroup/en/tools-information/advisorySearch/jcr:content/full-par/cmeadvisorysearch.advisorySearch.advisorynotices:Advisory%20Notices.-.2.12|07|2021.01|01|2008.json')
for request in driver.requests:
if request.response:
print(request.response.headers)
When I run that code I get the headers Selenium uses:
$ python selenium-request.py
Accept-Ranges: bytes
Access-Control-Allow-Origin: http://star-website.com
Content-Type: application/json
ETag: W/"36b8a-5d3d28ed9cc43"
Last-Modified: Thu, 23 Dec 2021 16:16:16 GMT
Referrer-Policy: no-referrer-when-downgrade
Server: Apache
ServerID: e1
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Accept-Encoding
Content-Encoding: gzip
Cache-Control: max-age=86400
Date: Thu, 23 Dec 2021 16:16:16 GMT
Content-Length: 46236
Connection: keep-alive
Content-Security-Policy: frame-ancestors 'self' *.cmegroup.com *.quikstrike.net commodex.co.il openexchange.community.cmegroup.com staging.tickertocker.com http://www.straitsfinancial.com www.straitsfinancial.com http://straitsfinancial.com https://www.home.saxo https://app.topsteptrader.com https://help.topsteptrader.com https://staging.topsteptrader.com https://blueeditsitecore.sys.dom https://bluesitecore.sys.dom https://sitecoredev.orange.saxobank.com https://sitecoredev-nocache.orange.saxobank.com https://sitecoredevedit.orange.tst2.dom http://star-website.com https://www.investing.com https://*.benzinga.com https://bz.zingbot.bz https://www.zingbot.bz https://gdcdyn.interactivebrokers.com https://www.interactivebrokers.com https://zingbot.bz https://www.zingbot.bz https://m.zingbot.bz https://bz.zingbot.bz https://dev.futuresfirstacademy.com https://uat.futuresfirstacademy.com https://futuresfirstacademy.com http://stage.barchart.com http://www.barchart.com https://www.infinityfutures.com https://kilofutures.com https://m.cqg.com https://mdemo.cqg.com *.chicago.cme.com:7822 https://uatm.cqg.com https://local.zingbot.bz https://www.gulfbondsukuk.org www.kgieworld.sg https://www.propex24.wpcomstaging.com https://www.propex24.com *.straitsfinancial.gate39tech.com us.straitsfinancial.com https://*.kapcoclients.com https://kapcoclients.com https://*.wallstreetbound.org https://wallstreetbound.org https://cofcointl.plateau.com https://rise.articulate.com https://members.tradeday.com http://blf-django.herokuapp.com https://www.bluelinefutures.com https://www.bluelinefutures.live https://www.bluelinefutures.trade https://login.chicago.cme.com https://loginnr.chicago.cme.com https://logincert.chicago.cme.com https://login-ny.chicago.cme.com https://ampfutures.com https://cme.ampfutures.com https://*.advantagefutures.com https://*.e-futures.com https://*.etrade.com https://*.gffbrokers.com https://infinityfutures-cn.com https://sweetfutures.com https://*.tradovate.com https://home.saxo https://*.tickmill.co.uk https://*.directa.it https://big.pt https://*.tradestation-international.com https://*.stonex.com http://tradinglesson.com https://tradinglesson.com *.ibroker.it *.ibroker.es *.cornertrader.ch *.whselfinvest.com *.banxbroker.de *.ameritrade.com *.sweetfutures.com *.danielstrading.com *.gainfutures.com *.futuresonline.com *.tdainc.com *.lsvp.com *.schwab.com *.schwab.co.uk *.us.global.schwab.com *.dev.schwab.com;
Set-Cookie: ak_bmsc=AB0A9701302106EABE2E195C6AC2A074~000000000000000000000000000000~YAAQLtERAvOZVN19AQAA7C8U6A7AWr7StAmiphZPltguFftPSOXgfa2NAq7Vts+40k7AdnPG55ULK1vyBRhPRdqWbtYml3JTC3RjHLu31l8kWBFvysYyuY2uz4GpkvmOWoBSN/Dl/2bQ9bEgbiYj3tCZ1o+wEvMfsiAWiJeMY3M1ozu6nyQz0JVpdvfsqun3z5wGhpJWhkjrJjeIyHvVdzx2uyIb1azRFlHT+nRCR6NHGoaMM/G2sI1DqPOXPB5btXjdncvB739c2Beh7RgWD/zvb78qpAJDUR1KOenDy1EwN2Bg8pqH1sxlsoVrl7i7r/pAOaWKfd4U1FKP7p730GfOp/m2VRBIdYgHDPHPvGeITPKrR/G22aR886r9Lerhug==; Domain=.cmegroup.com; Path=/; Expires=Thu, 23 Dec 2021 18:16:01 GMT; Max-Age=7185; HttpOnly
I copy these exact headers into a python dict and request as follows:
# python-request.py
import requests
headers = {
"Accept-Ranges": "bytes",
"Access-Control-Allow-Origin": "http://star-website.com",
"Content-Type": "application/json",
"ETag": 'W/"36b8a-5d3d28ed9cc43"',
"Last-Modified": "Thu, 23 Dec 2021 16:16:16 GMT",
"Referrer-Policy": "no-referrer-when-downgrade",
"Server": "Apache",
"ServerID": "e1",
"Strict-Transport-Security": "max-age=31536000; includeSubDomains",
"Vary": "Accept-Encoding",
"Content-Encoding": "gzip",
"Cache-Control": "max-age=86400",
"Date": "Thu, 23 Dec 2021 16:16:16 GMT",
"Content-Length": "46236",
"Connection": "keep-alive",
"Content-Security-Policy": "frame-ancestors 'self' *.cmegroup.com *.quikstrike.net commodex.co.il openexchange.community.cmegroup.com staging.tickertocker.com http://www.straitsfinancial.com www.straitsfinancial.com http://straitsfinancial.com https://www.home.saxo https://app.topsteptrader.com https://help.topsteptrader.com https://staging.topsteptrader.com https://blueeditsitecore.sys.dom https://bluesitecore.sys.dom https://sitecoredev.orange.saxobank.com https://sitecoredev-nocache.orange.saxobank.com https://sitecoredevedit.orange.tst2.dom http://star-website.com https://www.investing.com https://*.benzinga.com https://bz.zingbot.bz https://www.zingbot.bz https://gdcdyn.interactivebrokers.com https://www.interactivebrokers.com https://zingbot.bz https://www.zingbot.bz https://m.zingbot.bz https://bz.zingbot.bz https://dev.futuresfirstacademy.com https://uat.futuresfirstacademy.com https://futuresfirstacademy.com http://stage.barchart.com http://www.barchart.com https://www.infinityfutures.com https://kilofutures.com https://m.cqg.com https://mdemo.cqg.com *.chicago.cme.com:7822 https://uatm.cqg.com https://local.zingbot.bz https://www.gulfbondsukuk.org www.kgieworld.sg https://www.propex24.wpcomstaging.com https://www.propex24.com *.straitsfinancial.gate39tech.com us.straitsfinancial.com https://*.kapcoclients.com https://kapcoclients.com https://*.wallstreetbound.org https://wallstreetbound.org https://cofcointl.plateau.com https://rise.articulate.com https://members.tradeday.com http://blf-django.herokuapp.com https://www.bluelinefutures.com https://www.bluelinefutures.live https://www.bluelinefutures.trade https://login.chicago.cme.com https://loginnr.chicago.cme.com https://logincert.chicago.cme.com https://login-ny.chicago.cme.com https://ampfutures.com https://cme.ampfutures.com https://*.advantagefutures.com https://*.e-futures.com https://*.etrade.com https://*.gffbrokers.com https://infinityfutures-cn.com https://sweetfutures.com https://*.tradovate.com https://home.saxo https://*.tickmill.co.uk https://*.directa.it https://big.pt https://*.tradestation-international.com https://*.stonex.com http://tradinglesson.com https://tradinglesson.com *.ibroker.it *.ibroker.es *.cornertrader.ch *.whselfinvest.com *.banxbroker.de *.ameritrade.com *.sweetfutures.com *.danielstrading.com *.gainfutures.com *.futuresonline.com *.tdainc.com *.lsvp.com *.schwab.com *.schwab.co.uk *.us.global.schwab.com *.dev.schwab.com;",
"Set-Cookie": "ak_bmsc=AB0A9701302106EABE2E195C6AC2A074~000000000000000000000000000000~YAAQLtERAvOZVN19AQAA7C8U6A7AWr7StAmiphZPltguFftPSOXgfa2NAq7Vts+40k7AdnPG55ULK1vyBRhPRdqWbtYml3JTC3RjHLu31l8kWBFvysYyuY2uz4GpkvmOWoBSN/Dl/2bQ9bEgbiYj3tCZ1o+wEvMfsiAWiJeMY3M1ozu6nyQz0JVpdvfsqun3z5wGhpJWhkjrJjeIyHvVdzx2uyIb1azRFlHT+nRCR6NHGoaMM/G2sI1DqPOXPB5btXjdncvB739c2Beh7RgWD/zvb78qpAJDUR1KOenDy1EwN2Bg8pqH1sxlsoVrl7i7r/pAOaWKfd4U1FKP7p730GfOp/m2VRBIdYgHDPHPvGeITPKrR/G22aR886r9Lerhug==; Domain=.cmegroup.com; Path=/; Expires=Thu, 23 Dec 2021 18:16:01 GMT; Max-Age=7185; HttpOnly"
}
requests.get(
"https://www.cmegroup.com/content/cmegroup/en/tools-information/advisorySearch/jcr:content/full-par/cmeadvisorysearch.advisorySearch.advisorynotices:Advisory%20Notices.-.2.12|07|2021.01|01|2008.json",
headers=headers)
When I run this it just hangs indefinitely, so there is some issue with the request.
Apart from the headers, what is the difference between the requests made by python and Selenium - how could I identify the issue and hopefully get this working with the python requests library?
Update
I updated the code to get the request.headers instead:
Host: www.cmegroup.com
Connection: keep-alive
sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Linux"
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
... but the python requests script has the same result when using these headers, just hanging (or timing out if I set a timeout parameter).
Further update
Debug output is as follows:
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.cmegroup.com:443
send: b'GET /content/cmegroup/en/tools-information/advisorySearch/jcr:content/full-par/cmeadvisorysearch.advisorySearch.advisorynotices:Advisory%20Notices.-.2.12%7C07%7C2021.01%7C01%7C2008.json HTTP/1.1\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36\r\nAccept-Encoding: gzip, deflate, br\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\r\nConnection: keep-alive\r\nHost: www.cmegroup.com\r\nsec-ch-ua: " Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"\r\nsec-ch-ua-mobile: ?0\r\nsec-ch-ua-platform: Linux\r\nUpgrade-Insecure-Requests: 1\r\nSec-Fetch-Site: none\r\nSec-Fetch-Mode: navigate\r\nSec-Fetch-User: ?1\r\nSec-Fetch-Dest: document\r\nAccept-Language: en-US,en;q=0.9\r\n\r\n'
It looks like it only needs a compatible User-Agent header.
import requests
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0',
}
url = 'https://www.cmegroup.com/content/cmegroup/en/tools-information/advisorySearch/jcr:content/full-par/cmeadvisorysearch.advisorySearch.advisorynotices:Advisory%20Notices.-.2.12|07|2021.01|01|2008.json'
response = requests.get(url, headers = headers, timeout = 30) # A
print(response.status_code) # Prints 200 (OK).
print(response.json()) # Prints the output as JSON. "item" key has 50 values in a list.
^ This snippet did the trick for me.
It looks, you are using the response headers, not request headers.
Try
print(request.headers)
I am scraping a website with the following url and headers:
url : 'https://tennistonic.com/tennis-news/'
headers :
{
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8",
"Cache-Control": "no-cache",
"content-length": "0",
"content-type": "text/plain",
"cookie": "IDE=AHWqTUl3YRZ8Od9MzGofphNI-OCOFESmxlN69Ekm4Sbh9tcBDXGJQ1LVwbDd2uX_; DSID=AAO-7r74ByYt6ieW2yasN78hFsOGY6mrhpN5pEOWQ1vGRnAOdolIlKv23JqCRf11OpFUGFdZ-yxB3Ii1VE6UjcK-jny-4mcJ5uO-_BaV3bEFbLvU7rJNBlc",
"origin": "https//tennistonic.com",
"Connection": "keep-alive",
"Pragma": "no-cache",
"Referer": "https://tennistonic.com/",
"Sec-Fetch-Dest": "empty",
"Sec-Fetch-Mode": "cors",
"Sec-Fetch-Site": "cross-site",
"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.80 Safari/537.36",
"x-client-data": "CI22yQEIprbJAQjBtskBCKmdygEIl6zKAQisx8oBCPXHygEI58jKAQjpyMoBCOLNygEI3NXKAQjB18oBCP2XywEIj5nLARiKwcoB"}
The x client data has a decoded section afterwards which I have left out but also tried with. The full request on dev tools is shown below:
:authority: stats.g.doubleclick.net
:method: POST
:path: /j/collect?t=dc&aip=1&_r=3&v=1&_v=j87&tid=UA-13059318-2&cid=1499412700.1601628730&jid=598376897&gjid=243704922&_gid=1691643639.1604317227&_u=QACAAEAAAAAAAC~&z=1736278164
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: en-GB,en-US;q=0.9,en;q=0.8
cache-control: no-cache
content-length: 0
content-type: text/plain
cookie: IDE=AHWqTUl3YRZ8Od9MzGofphNI-OCOFESmxlN69Ekm4Sbh9tcBDXGJQ1LVwbDd2uX_; DSID=AAO-7r74ByYt6ieW2yasN78hFsOGY6mrhpN5pEOWQ1vGRnAOdolIlKv23JqCRf11OpFUGFdZ-yxB3Ii1VE6UjcK-jny-4mcJ5uO-_BaV3bEFbLvU7rJNBlc
origin: https://tennistonic.com
pragma: no-cache
referer: https://tennistonic.com/
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: cross-site
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.80 Safari/537.36
x-client-data: CI22yQEIprbJAQjBtskBCKmdygEIl6zKAQisx8oBCPXHygEI58jKAQjpyMoBCOLNygEI3NXKAQjB18oBCP2XywEIj5nLARiKwcoB
Decoded:
message ClientVariations {
// Active client experiment variation IDs.
repeated int32 variation_id = [3300109, 3300134, 3300161, 3313321, 3315223, 3318700, 3318773, 3318887, 3318889, 3319522, 3320540, 3320769, 3329021, 3329167];
// Active client experiment variation IDs that trigger server-side behavior.
repeated int32 trigger_variation_id = [3317898];
}
r = requests.get(url2, headers=headers2)
soup_cont = soup(r.content, 'html.parser')
My soup contents from the response is as follows:
Is this website protected or am I sending wrong requests?
Try using selenium:
from selenium import webdriver
from bs4 import BeautifulSoup
import time
driver = webdriver.Chrome()
driver.get('https://tennistonic.com/tennis-news/')
time.sleep(3)
soup = BeautifulSoup(driver.page_source,'html5lib')
print(soup.prettify())
driver.close()
I'm trying to log in into one web application with python but very attempt ends with 500 error and the html body shows the error: [HttpAntiForgeryException]. I tried to apply a few solutions from the other questions here but nothing helped. So now, I'm sucked at first request which response Is giving me 500.
import requests
from bs4 import BeautifulSoup
url = "http://localhost:52053/Account/Login"
username = "test#test.sk"
user_password = "pass"
session = requests.Session()
response = session.get(url)
soup = BeautifulSoup(response.content, features="html.parser")
#print(soup)
states = ["__RequestVerificationToken", "Email", "RememberMe"]
login_data = {"username": username, "password": user_password, "Login": "submit"}
headers = {"Host": "localhost:52053",
"Content-Type": "application/x-www-form-urlencoded",
"Connection": "close",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0)",
"Cookie": str(session.cookies.get_dict())}
for state in states: # search for existing aspnet states and get its values
result = soup.find('input', {'name': state})
if not (result is None): # when existent (some may not be needed!)
if state == "Email":
login_data.update({state: login_data["username"]})
else:
login_data.update({state: result['value']})
post_request = session.post(url, headers=headers, data=login_data)
Successful login attempt looks like this.
POST /Account/Login HTTP/1.1
Host: localhost:52053
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: sk,en-US;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded
Content-Length: 193
Origin: http://localhost:52053
Connection: close
Referer: http://localhost:52053/Account/Login
Cookie: __RequestVerificationToken=j9yFGpTFSlH5_aQt0k-Gvz10I16TVXbDk31NKPm1HkcWsksUfKXkjL567yFplCS_VovTR7lVuEgNjwgp-EO3RjNj4gQOvNUXnPkjymZx_jA1
Upgrade-Insecure-Requests: 1
__RequestVerificationToken=LjHuOdKSCr1A7KRDNie4GUnCZ3qRwUCdHyLlPYT40DsEB-GNUvEKxe5nvZWf5gZ4ZflwI43xGWPyYu8GI15wroEg9WRRVtSzZ9-KY9Mu_JA1&Email=test%40test.sk&Password=pass&RememberMe=false
Following response is:
HTTP/1.1 302 Found
Cache-Control: no-cache
Pragma: no-cache
Content-Type: text/html; charset=utf-8
Expires: -1
Location: /
Server: Microsoft-IIS/10.0
X-AspNetMvc-Version: 5.2
X-AspNet-Version: 4.0.30319
Set-Cookie: .AspNet.ExternalCookie=; path=/; expires=Thu, 01-Jan-1970 00:00:00 GMT
Set-Cookie: .AspNet.ApplicationCookie=KCLm03FHj8v_6rIpTzBTm7EzEtzpKmIz1Z9_z29wycUSqUVyKbGEmptXUwG41MqNOMR7Vbeq2u576ijazupNLffLP-Ua0n60aLmnVSDsLsdTqYT7jjqyGPw1Ppp8AnIDs3sdefmksazX2UvKTxzxRBufFCoxtCJx51mWtBv7v0JzUeC1hnfu1AIJ7GH_8T59KD3iv0hRSHDqlWHlkWzyN1Xt0m5ixC14e4eC2YxEm3_acy96atB2Jv5u0HREPzssLmywuzj6sLa9cHCllTG2gMVWvHA3IDhCWu7Ojf8BO02Eml3pPM5QTJ-sq540fcj9QyELayUOwBZWffSgsJeq8mlt3FupQcJ-JTJxDzAsDc4Cmk-BcvYSfpAJq4SdR-Y4mTN_6vu-wwAOLZPSgh-5K7guWmZ3VfRitZHXd_rvTEmMiVrgHFTEQAkUYu4zTSupxRplTtKb1VSDs0Nc1uEos2z0_aw-nBbRBrTPpvmqGok
Auth flow continues with this request. I'm not trying to sent this request yet (I put it here just for better imagination):
GET / HTTP/1.1
Host: localhost:52053
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: sk,en-US;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
Referer: http://localhost:52053/Account/Login
Connection: close
Cookie: __RequestVerificationToken=j9yFGpTFSlH5_aQt0k-Gvz10I16TVXbDk31NKPm1HkcWsksUfKXkjL567yFplCS_VovTR7lVuEgNjwgp-EO3RjNj4gQOvNUXnPkjymZx_jA1;
.AspNet.ApplicationCookie=gvv113IJhtdaOhdc0Rz2N--5Ob18W6gS64J3wtOJggRTqE70h-8HyBGQAmLvSM2qCV2e-dXR2Uto-BktD6NmNz6dJtxckIYasPOfqodDNZX33YJxNEDg7a64LPi1bNnmrnvQcOHAceQNqZDykXrhFm55dqoo1oZnJHfZQnltwqAdg7DGO31PZpzu-GAZh2_gzuxd_saJdS09ZZQrc9h7WiU2ONqeya87pSAN7ZyHQ_XvsU5cUwDGq7FWLpzlIeeZWkay6iWVmCSwNEofpdVsb880P3XZnFKEj2SW2PfazdNLfgy86YNjkoD6_3Vb1BLirRoSP0XIQMcs2F_CzgXkxD5GvDray8TPYqcQJ4L2fikReUJHadx9fFnslF2BFcnKYC8D-Xusrda_5r-CQoQ4SzAe2Cqn0h1NYHxS1wsxt35neC5RuQ3geadAEEghjrSSVhSl8jCfACtQtcBeNL2x_m6I9L3XJCjMpzJjtP6up3E
Upgrade-Insecure-Requests: 1
Next response is just kind of 200 - you are in.
So my problem is that the response from the first request is failing. Is someone able to see some mistake or did I forgot something?
Failed response from the first request call:
HTTP/1.1 500 Internal Server Error
Cache-Control: private
Content-Type: text/html; charset=utf-8
Server: Microsoft-IIS/10.0
X-AspNet-Version: 4.0.30319
X-SourceFiles: =?UTF-8?B?QzpcVXNlcnNcUENBZG1pbmlzdHJhdG9yXERlc2t0b3BccGVuIHRlc3RpbmdcU2VjdXJpdHlXb3Jrc2hvcC1EVldBLW1hc3RlclxkdndhLXRyYWluaW5nXGR2d2EtdHJhaW5pbmdcQWNjb3VudFxMb2dpbg==?=
If I try to pint request headers and login_data, result is:
print(post_request.request.headers)
{'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0)', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'close', 'Host': 'localhost:52053', 'Content-Type': 'application/x-www-form-urlencoded', 'Cookie': "{'__RequestVerificationToken': 'yg-7mFRyZiONwsZ2dIVkIIW5tB7gSL_sazgphg-VuW2OpNNRRkxmLH-9SZJXiN9whUC_BYTo8RgsiDrVjcYtLEf9anW56rVwZ2RQPzxHA481'}", 'Content-Length': '249'}
print(login_data)
{'username': 'test#test.sk', 'password': 'pass', 'Login': 'submit', '__RequestVerificationToken': '14OuwaRqldlGKi93C91zf6QD_ouOorHBDe63s4KgfP3gbt85V0QMy2X5OMwWAo1TUrD8zJ-zoZbXLPpgDI_wrxVZv3ceYNos_e5_elFhVt01', 'Email': 'test#test.sk', 'RememberMe': 'true', 'Password': 'pass'}
I just find the solution out.
Requests could handle all headers by itself (and my headers was, for some reason, causing errors), it was redirecting my request to 200 response so I didn't instantly saw, that it is actually working and catching 302 response.
I found out by printing:
print(post_request.history)
Which gave me <[302]>
Now, when I know, there is a redirection, I just have to allow_redirects=False and now I'm able to catch my set-cookie header
Full code, witch is getting an expected responses is:
import requests
from bs4 import BeautifulSoup
url = "http://localhost:52053/Account/Login"
username = "test#test.sk"
user_password = "pass"
session = requests.Session()
response = session.get(url)
soup = BeautifulSoup(response.content, features="html.parser")
#print(soup)
states = ["__RequestVerificationToken", "Email", "RememberMe"]
login_data = {"username": username, "password": user_password, "Login": "submit"}
for state in states: # search for existing aspnet states and get its values
result = soup.find('input', {'name': state})
if not (result is None): # when existent (some may not be needed!)
if state == "Email":
login_data.update({state: login_data["username"]})
else:
login_data.update({state: result['value']})
post_request = session.post(url, data=login_data, allow_redirects=False)
print(login_data)
#the code below is testing, if the HttpAntiForgeryException is in code
if "HttpAntiForgeryException" not in post_request.text:
print(post_request.headers)
else:
print("antiforgery")
Updated code - I'm using this code to send the request:
headers = {
"Host": "www.roblox.com",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0",
"Accept": "application/json, text/plain, */*",
"Accept-Language": "en-US;q=0.7,en;q=0.3",
"Referer": "https://www.roblox.com/users/12345/profile",
"Content-Type": "application/json;charset=utf-8",
"X-CSRF-TOKEN": "some-xsrf-token",
"Content-Length": "27",
"DNT": "1",
"Connection": "close"
}
data = {"targetUserId":"56789"}
url = "http://www.roblox.com/user/follow"
r = requests.post(url, headers=headers, data=data, cookies={"name":"value"})
Response (using r.text):
{"isValid":false,"data":null,"error":""}
The request itself is valid, I sent it using burp and it worked:
POST /user/follow HTTP/1.1
Host: www.roblox.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0
Accept: application/json, text/plain, */*
Accept-Language: pl,en-US;q=0.7,en;q=0.3
Referer: https://www.roblox.com/users/12345/profile
Content-Type: application/json;charset=utf-8
X-CSRF-TOKEN: Ab1/2cde3fGH
Content-Length: 27
Cookie: some-cookie=;
DNT: 1
Connection: close
{"targetUser":"56789"}
Because it works in Burp but not in Python requests, get a packet sniffer (Wireshark is the simplest IMO) and look to see the difference in the packet sent by Burp that works and the one sent from Python that does not work. I am suspecting that the problem is that the website is HTTPS but you are using http://www.roblox.com . Do try https://www.roblox.com , but I am not sure if it will work.
How can I grab the request headers for an XHR requests using Python Requests module? Using the following code seems to return the response headers:
import requests
r = requests.get('http://www.whoscored.com/tournamentsfeed/12496/Fixtures/?d=2015W50&isAggregate=false')
headers = r.headers
print headers
This returns an object that looks like this:
{'content-length': '624', 'content-encoding': 'gzip', 'expires': '-1', 'vary': 'Accept-Encoding', 'server': 'Microsoft-IIS/8.0', 'pragma': 'no-cache', 'cache-control': 'no-cache', 'date': 'Tue, 15 Dec 2015 14:41:34 GMT', 'x-powered-by': 'ASP.NET', 'content-type': 'text/html; charset=utf-8'}
However, when I look in Chrome developer tools the request header looks like this:
Host: www.whoscored.com
Connection: keep-alive
Accept: text/plain, */*; q=0.01
Model-Last-Mode: W50hFYr7jwZWt40WUb9udPVFxmB6g9yct204X0/gmf4=
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36
Referer: http://www.whoscored.com/Regions/252/Tournaments/2/England-Premier-League
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6
Cookie: __gads=ID=d09f8c0cdc1a4258:T=1449875272:S=ALNI_MbTPDtXiIlHK49F4FOqdDap__pfCA; nlsnocrvu=1; OX_plg=swf|shk|pm; _ga=GA1.3.578623339.1449875271; _gat=1; _ga=GA1.2.578623339.1449875271
Can anyone assist?
Thanks
You need to check for request headers like this
r.request.headers
That would give you something like
{'Connection': 'keep-alive', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'User-Agent': 'python-requests/2.7.0 CPython/2.7.10 Darwin/15.0.0'}
For obvious reasons it won't be the same as you see in the Chrome developer tools, because the browser adds its own headers which the requests module doesn't.
GET /tournamentsfeed/12496/Fixtures/?d=2015W50&isAggregate=false HTTP/1.1
Host: www.whoscored.com
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,fr;q=0.6
Cookie: _ga=GA1.2.788154924.1450195026; _gat_as25n45=1
To get these headers you need to run some js code to pull the headers.