Return 302 in Web Crawler - python

After simulating a login, when I POST to the original website it returns 302. When I open the same URL directly in Chrome, it returns 415.
I tried several ways:
session.post(url,headers = headers,data = data)
requests.post(url,headers = headers,data = data)
urllib.request.urlopen(url).read().decode()
import requests
import json
header = {'Host': 'sty.js118114.com:8080',
'Connection': 'keep-alive',
'Content-Length': '8188',
'Accept': '*/*',
'X-Requested-With': 'XMLHttpRequest',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36',
'Content-Type': 'text/plain;charset=UTF-8;application/xml',
'Origin': 'http://sty.js118114.com:8080',
'Referer': 'http://sty.js118114.com:8080/Report/report/movecar_list.html',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'zh-CN,zh;q=0.9',
'Cookie': cookie_name + '=' + cookie_value
}
data = {"calling_no":"","begin_time":"","end_time":"","called_car_no":""}
res = requests.post(target,data = json.dumps(data),headers = header)
print(res.content.decode())
I expect the response to be JSON or HTML so that I can use the re module or XPath to extract the information I want (without any redirects).
Lastly, here is the relevant information about the request:
Chrome Network
General
Request URL: http://sty.js118114.com:8080/Report/movecar/list/1/10
Request Method: POST
Status Code: 200 OK
Remote Address: 127.0.0.1:8888
Referrer Policy: no-referrer-when-downgrade
Response Headers
Content-Length: 8150
Content-Type: application/json;charset=UTF-8
Date: Thu, 22 Aug 2019 00:47:51 GMT
Server: Apache-Coyote/1.1
Request Headers
Accept: */*
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9
Content-Length: 66
Content-Type: text/plain;charset=UTF-8;
Cookie: JSESSIONID=0A474B00017BFFD89A515B336F482905
Host: sty.js118114.com:8080
Origin: http://sty.js118114.com:8080
Proxy-Connection: keep-alive
Referer: http://sty.js118114.com:8080/Report/report/movecar_list.html
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36
X-Requested-With: XMLHttpRequest
Request Payload
{calling_no: "", begin_time: "", end_time: "", called_car_no: ""}
begin_time: ""
called_car_no: ""
calling_no: ""
end_time: ""
Fiddler Inspectors Raw
POST http://sty.js118114.com:8080/Report/movecar/list/1/10 HTTP/1.1
Host: sty.js118114.com:8080
Connection: keep-alive
Content-Length: 66
Accept: */*
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36
Content-Type: text/plain;charset=UTF-8;
Origin: http://sty.js118114.com:8080
Referer: http://sty.js118114.com:8080/Report/report/movecar_list.html
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9
Cookie: JSESSIONID=0A474B00017BFFD89A515B336F482905
{"calling_no":"","begin_time":"","end_time":"","called_car_no":""}
Response Raw
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: application/json;charset=UTF-8
Date: Thu, 22 Aug 2019 00:27:59 GMT
Content-Length: 8150
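One way to approach this, sketched under assumptions: the browser capture shows a 66-byte JSON body sent with Content-Type: text/plain;charset=UTF-8, whereas the script hardcodes Content-Length: 8188 and sends a different Content-Type. The sketch below builds (without sending) a request that mirrors the capture and lets requests compute Content-Length. The helper name build_movecar_request and the placeholder cookie value are illustrative and not part of the original post. Whether matching the capture removes the 302 depends on the server and on the session cookie still being valid.

```python
import json

import requests

URL = "http://sty.js118114.com:8080/Report/movecar/list/1/10"


def build_movecar_request(session, payload):
    """Prepare (but do not send) a POST that mirrors the browser capture."""
    # Compact separators reproduce the browser's body without extra spaces.
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    headers = {
        "Content-Type": "text/plain;charset=UTF-8",
        "X-Requested-With": "XMLHttpRequest",
        "Referer": "http://sty.js118114.com:8080/Report/report/movecar_list.html",
    }
    # Content-Length is deliberately not set here; it is computed from the body.
    request = requests.Request("POST", URL, data=body, headers=headers)
    return session.prepare_request(request)


session = requests.Session()
# Placeholder (assumption): use the JSESSIONID obtained from the simulated login.
session.cookies.set("JSESSIONID", "PLACEHOLDER")

payload = {"calling_no": "", "begin_time": "", "end_time": "", "called_car_no": ""}
prepared = build_movecar_request(session, payload)
print(prepared.headers["Content-Length"])
print(prepared.body)
```

To see where the server redirects to instead of following the redirect, the prepared request can be sent with session.send(prepared, allow_redirects=False), after which response.status_code and response.headers.get("Location") show the redirect target.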

Related

Python 3 POST request to handle form data

I can't figure out how to correctly set up a POST request with the following data:
General
Request URL: https://myurl.com/install/index.cgi
Request Method: POST
Request Headers
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en
Cache-Control: max-age=0
Connection: keep-alive
Content-Length: 48
Content-Type: application/x-www-form-urlencoded
Host: myurl.com
Origin: https://myurl.com
Referer: https://myurl.com/install/
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
Form Data
page: install
state: STATUS
I can do the following:
import requests
headers = {"Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"Accept-Encoding":"gzip,deflate,br",
"Accept-Language":"en-US,en;q=0.8",
"Cache-Control":"max-age=0",
"Connection":"keep-alive",
"Content-Length":"48",
"Content-Type":"application/x-www-form-urlencoded",
"Host":"myurl.com",
"Origin":"https://myurl.com",
"Referer":"https://myurl.com/install/?s=ROM",
"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}
f = requests.put(path, headers=headers)
But how do I handle the form data? Under the form data there is a page: install and a state: STATUS.
How do I include this on my POST request?
Just add data= to your request, and use requests.post rather than requests.put, since the captured request method is POST:
import requests
path = ...
headers = ...
form_data = {
    "page": "install",
    "state": "STATUS",
}
f = requests.post(path, headers=headers, data=form_data)
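To illustrate what data= does with the dictionary: requests form-encodes it into the request body, in the same application/x-www-form-urlencoded format that urllib.parse.urlencode from the standard library produces. The sketch below only shows that encoding and sends nothing.

```python
from urllib.parse import urlencode

form_data = {
    "page": "install",
    "state": "STATUS",
}

# This is the application/x-www-form-urlencoded body for the dict above.
body = urlencode(form_data)
print(body)
# The byte length of the body is what Content-Length has to match.
print(len(body.encode("utf-8")))
```

Because the length depends on the actual values, a hardcoded Content-Length header such as the "48" in the question is probably best left out so that requests can compute it.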
I presume you know how to use the developer tools in the browser of your choice. The following is the template I follow:
Load the page (GET)
Use XPath (or find_element_by_id) to locate the element I'm targeting, e.g. username
Set the value of that element
Submit the page (POST)

Python - Send post wrong par?

I have a problem with a site where the email address is hidden behind an obfuscator.
When I run my program I get this output:
The email is:{"success":"","code":1,"msg":"ReCAPTCHA"}
But when I click 'watch email' in the browser, everything works and I get:
The email is:{"success":"","code":0,"msg":"xxxx@gmail.com"}
The POST request captured in the browser:
REQUEST HEADERS
Accept: application/json, text/javascript, */*; q=0.01
Accept-Encoding: gzip, deflate, br
Accept-Language: pl-PL,pl;q=0.9,en-US;q=0.8,en;q=0.7
Connection: keep-alive
Content-Length: 141
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Cookie: SOOOME COOKIES.
Host: https://xxxxxxx.com
Origin: https://xxxxxxx.com
Referer: https://xxxxxxx.com/asas
User-Agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Mobile Safari/537.36
X-Requested-With: XMLHttpRequest
QUERY STRING PARAMETERS
decode:
FORM DATA
hash: YToyOntpOjA7czo0NDoidHh3VFlXck83eFdza1FRUWgydUlvb0MveHRRemNLaCtNa3BuenVJU0VmUT0iO2k6MTtzOjE2OiK3SJ7OlhTa5DgPfA1YqCfRIjt9
type: ademail
And here is my code:
import requests
url = "https://xxxxx.com/_ajax/obfuscator/?decode"
headers = {
'User-Agent':'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Mobile Safari/537.36',
'Accept': 'application/json, text/javascript, */*; q=0.01',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'pl-PL,pl;q=0.9,en-US;q=0.8,en;q=0.7',
'Connection': 'keep-alive',
'Content-Length': '141',
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'Host': 'https://xxxxx.com/',
'Origin': 'https://xxxxx.com/',
'Referer':'https://xxxxx.com/asd',
'X-Requested-With':'XMLHttpRequest' }
data = {'hash':'YToyOntpOjA7czo0NDoiQStHbXkrY2p1dllrUmlXSWdWTjdNbHF2Y3cyak13QU5GeUtaQXZReFcrbz0iO2k6MTtzOjE2OiJ7Byq7O88ydxCtVWgoEETOIjt9',
'type':'adsemail'}
r = requests.post(url, data, headers)
pastebin_url = r.text
print("The email is:%s"%pastebin_url)
I also tried doing the same with Selenium WebDriver:
from time import sleep
from selenium import webdriver
driver = webdriver.Chrome("C:/Users/User/Desktop/Email/chromedriver.exe")
driver.set_page_load_timeout(5000)
driver.get("https://xxxx.com/asd")
driver.implicitly_wait(3000)
sleep(1)
RODO = "//input[@class='btn btn-confirm']"
driver.find_element_by_xpath(RODO).click()
sleep(7)
email = "//span[@class='click_to_show']"
driver.find_element_by_xpath(email).click()
But I still get a reCAPTCHA challenge.
Where is the problem?
I also tried:
user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('user-agent={0}'.format(user_agent))
# Pass the options to the driver; otherwise the custom user agent is never applied.
driver = webdriver.Chrome("C:/Users/User/Desktop/Email/chromedriver.exe", options=chrome_options)
driver.set_page_load_timeout(5000)
But it does not work either; the site still asks for a CAPTCHA.
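Two details in the requests version of this question may be worth checking, although fixing them may well not be enough to get past the reCAPTCHA check. First, headers is passed as the third positional argument of requests.post, and that position is the json parameter, so the custom headers are not sent as headers. Second, the script sends type 'adsemail' while the captured form data shows 'ademail'. The sketch below inspects the signature and prepares (without sending) a request with keyword arguments. The URL and hash value are placeholders, not the real ones.

```python
import inspect

import requests

# Parameter order of requests.post: the third positional slot is `json`.
params = list(inspect.signature(requests.post).parameters)
print(params)

url = "https://example.invalid/_ajax/obfuscator/?decode"  # placeholder URL
headers = {"X-Requested-With": "XMLHttpRequest"}
data = {"hash": "PLACEHOLDER", "type": "ademail"}  # placeholder hash value

# Building the request with keywords puts the headers where they are intended.
prepared = requests.Request("POST", url, data=data, headers=headers).prepare()
print(prepared.headers["X-Requested-With"])
print(prepared.body)
```

With the real call this corresponds to requests.post(url, data=data, headers=headers).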

Log in and download - Python Requests

I've been trying to log in to this website and download some files for a while now. I can't find out what is wrong with my code below. I'm new to Python:
When you hit the login page https://www.targetsite/members/login.php, it makes the following calls:
Login page:
[General]
Request URL: https://www.targetsite/members/login.php
Request Method: GET
Status Code: 302 Found
Remote Address: 0.0.0.0:443
Referrer Policy: no-referrer-when-downgrade
[Response Headers]
Connection: keep-alive
Content-Length: 425
Content-Type: text/html; charset=iso-8859-1
Date: Thu, 13 Dec 2018 14:37:02 GMT
Keep-Alive: timeout=5
Location: https://www.targetsite/auth.form?bWFmYkpTbjhNN0J4bWM2S2NwaCtlNTkydDJJV0xiMk1aTWRKa0kwVldWZ29hZjdaMEUweFJuRWl3a3NqOVUwTwpSbmtyRnptekp6Wm40VlM2MDF5dWVVSDR2V1FXMG5JU1ZNQVUrZ0lvYTlXTWQ4T2ZEVzJVY1ZSWW4wZk1NVHZhCm9uZWZZeTI2V1JNPQo=
Server: nginx/1.14.2
Set-Cookie: pcar%5fUkVTVFJJQ1RFRA%3d%3d=; path=/; domain=.targetsite; expires=Wed 13-Dec-2017 14:37:02 GMT
X-Vegas-No-Cache: YES
[Request Headers]
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Cookie: pcah=SXlOK1VkSytiMTl0VllvbDk4N2tVaXR5bmZFZmNNVUsK
Host: www.targetsite
Referer: http://www.targetsite/tour/index.php
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36 Vivaldi/2.1.1337.36
AuthForm:
[General]
Request URL: https://www.targetsite/auth.form?bWFmYkpTbjhNN0J4bWM2S2NwaCtlNTkydDJJV0xiMk1aTWRKa0kwVldWZ29hZjdaMEUweFJuRWl3a3NqOVUwTwpSbmtyRnptekp6Wm40VlM2MDF5dWVVSDR2V1FXMG5JU1ZNQVUrZ0lvYTlXTWQ4T2ZEVzJVY1ZSWW4wZk1NVHZhCm9uZWZZeTI2V1JNPQo=
Request Method: GET
Status Code: 200 OK
Remote Address: 0.0.0.0:443
Referrer Policy: no-referrer-when-downgrade
[Response Headers]
Cache-Control: max-age=1, must-revalidate
Connection: keep-alive
Content-Type: text/html
Date: Thu, 13 Dec 2018 14:37:03 GMT
Expires: Thu, 13 Dec 2018 14:37:03 GMT
Keep-Alive: timeout=5
Server: nginx/1.14.2
Set-Cookie: pcar%5fUkVTVFJJQ1RFRA%3d%3d=; path=/; domain=.targetsite; expires=Wed 13-Dec-2017 14:37:03 GMT
Transfer-Encoding: chunked
[Request Headers]
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Cookie: pcah=SXlOK1VkSytiMTl0VllvbDk4N2tVaXR5bmZFZmNNVUsK
Host: www.targetsite
Referer: http://www.targetsite/tour/index.php
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36 Vivaldi/2.1.1337.36
bWFmYkpTbjhNN0J4bWM2S2NwaCtlNTkydDJJV0xiMk1aTWRKa0kwVldWZ29hZjdaMEUweFJuRWl3a3NqOVUwTwpSbmtyRnptekp6Wm40VlM2MDF5dWVVSDR2V1FXMG5JU1ZNQVUrZ0lvYTlXTWQ4T2ZEVzJVY1ZSWW4wZk1NVHZhCm9uZWZZeTI2V1JNPQo:
So based on that behaviour observed using chrome inspector, I coded the following, trying to emulate the actions triggered after accessing https://www.targetsite.com/members/login.php
#Libraries
import requests
import json
from lxml import html
#URL
primeiraUrl = 'https://www.targetsite.com/members/login.php'
urlPost = 'https://ams.targetsite.com/auth.form'
#Credentials
userd = 'user'
passwd = 'pass'
session = requests.Session()
#session.verify = False
#GetToken
headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9',
'Connection': 'keep-alive',
'Referer': 'http://www.targetsite.com/tour/index.php',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36 Vivaldi/2.1.1337.36'
}
get1stContact = session.get(primeiraUrl,headers=headers)
segundaUrl = get1stContact.url
get2ndContact = session.get(segundaUrl,headers=headers)
And then, when you log in on the website, this is what you get:
[General]
Request URL: https://ams.targetsite.com/auth.form
Request Method: POST
Status Code: 302 Found
Remote Address: 1.1.1.1:443
Referrer Policy: no-referrer-when-downgrade
[Response Headers]
Connection: Keep-Alive
Content-Length: 239
Content-Type: text/html; charset=iso-8859-1
Date: Thu, 13 Dec 2018 14:47:57 GMT
Keep-Alive: timeout=20, max=94
Location: http://www.targetsite.com/members/index.php
Server: Apache/2.2.15 (CentOS)
Set-Cookie: pcar%5fUkVTVFJJQ1RFRA%3d%3d=cS90NW9XLzVVeVNIeElMOUpFaHlCb2hGWkZveVUrdTFiK0dad0FYVDN2UT0K; path=/; domain=.targetsite.com; expires=Thu 13-Dec-2018 20:47:57 GMT
[Request Headers]
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cache-Control: max-age=0
Connection: keep-alive
Content-Length: 147
Content-Type: application/x-www-form-urlencoded
Cookie: pcah=Q3BLRnpDUGRhcnJFMmg1OGI0LzBrLzNhYWM5cjBVV2IK
Host: ams.targetsite.com
Origin: https://www.targetsite.com
Referer: https://www.targetsite.com/auth.form?N2dFMUIwaGFIc1BWQ3BoRTd2NVBWayt5ZE91UnZsa2xCcmNUU1VtVG8yNW54WUhjNFBYblE3STJwK2xrRWhNawpNRWtmMjJtUFF2Y0xSL2t1N2xIc2pmSk4wZG5uRVdmbkEyRUpxdnVDODI4UmVhMjlId1h6dVZIeFRtWGZuUGd5CkoySHEwZXg5RnRVPQo=
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36 Vivaldi/2.1.1337.36
[Form-Data]
rlm: RESTRICTED
for: http%3a%2f%2fwww%2etargetsite%2ecom%2fmembers%2findex%2ephp
rmb: y
uid: user
pwd: pass
And here is the code I wrote to make that post request:
headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9',
'Cache-Control': 'max-age=0',
'Connection': 'keep-alive',
'Origin': 'https://www.targetsite.com',
'Referer': segundaUrl,
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36 Vivaldi/2.1.1337.36'
}
body = {
'rlm':'RESTRICTED',
'for':'http://www.targetsite.com/members/index.php',
'rmb': 'y',
'uid': 'user',
'pwd': 'pass'
}
#session.post(url)
r = session.post(urlPost, headers=headers, data=body)
That being said, can anyone help me figure this out? Thanks in advance!
Edit: full code as requested:
#Help
#http://kazuar.github.io/scraping-tutorial/
#Libraries
import requests
import json
from lxml import html
#URL
primeiraUrl = 'https://www.targetsite.com/members/login.php'
urlPost = 'https://ams.targetsite.com/auth.form'
#Credentials
userd = 'user'
passwd = 'pass'
session = requests.Session()
#session.verify = False
#GetToken
headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9',
'Connection': 'keep-alive',
'Referer': 'http://www.targetsite.com/tour/index.php',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36 Vivaldi/2.1.1337.36'
}
get1stContact = session.get(primeiraUrl,headers=headers)
segundaUrl = get1stContact.url
get2ndContact = session.get(segundaUrl,headers=headers)
headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate, br',
'Content-Type': 'application/x-www-form-urlencoded',
'Accept-Language': 'en-US,en;q=0.9',
'Cache-Control': 'max-age=0',
'Connection': 'keep-alive',
'Origin': 'https://www.targetsite.com',
'Referer': segundaUrl,
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36 Vivaldi/2.1.1337.36'
}
body = {
'rlm':'RESTRICTED',
'for':'http://www.targetsite.com/members/index.php',
'rmb': 'y',
'uid': 'user',
'pwd': 'pass'
}
#session.post(url)
r = session.post(urlPost, headers=headers, data=body)
print(r.text)
You are not sending some required headers, most importantly:
Content-Type: application/x-www-form-urlencoded
That one is very important, as it tells the server how to parse the parameters you are sending in your form.
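As a quick way to check what is actually sent, a request can be prepared without sending it. In the sketch below (an illustration, not the original poster's code) requests sets the Content-Type header on its own when data= is a dict and percent-encodes the values, which is why the raw URL is passed for the 'for' field. Setting the header explicitly as well does no harm.

```python
import requests

body = {
    "rlm": "RESTRICTED",
    "for": "http://www.targetsite.com/members/index.php",
    "rmb": "y",
    "uid": "user",
    "pwd": "pass",
}

# Prepare the login POST without sending it, to inspect headers and body.
request = requests.Request("POST", "https://ams.targetsite.com/auth.form", data=body)
prepared = request.prepare()

print(prepared.headers["Content-Type"])
print(prepared.body)
```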

Can't login to a specific ASP.NET website using python requests

I've been trying for the last 6 hours to make this work without success, and endless searching didn't help. I guess I'm either doing something fundamentally wrong, or there is a trivial bug that I can't see, so I need extra eyes to help me fix it.
The website url is this.
I wrote a piece of messy Python code to just log in and read the next page, but all I get is a nasty 500 error saying something went wrong on the server while processing my request.
Here is the request made by a browser which works just fine, no problem.
HTTP Response code to this request is 302 (Redirect)
POST /appstatus/index.aspx HTTP/1.1
Host: www.wes.org
Connection: close
Content-Length: 303
Cache-Control: max-age=0
Origin: https://www.wes.org
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Referer: https://www.wes.org/appstatus/index.aspx
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.8,fa;q=0.6
Cookie: ASP.NET_SessionId=bu2gemmlh3hvp4f5lqqngrbp; _ga=GA1.2.1842963052.1473348318; _gat=1
__VIEWSTATE=%2FwEPDwUKLTg3MTMwMDc1NA9kFgICAQ9kFgICAQ8PFgIeBFRleHRkZGRk9rP20Uj9SdsjOKNUBlbw55Q01zI%3D&__VIEWSTATEGENERATOR=189D346C&__EVENTVALIDATION=%2FwEWBQK6lf6LBAKf%2B9bUAgK9%2B7qcDgK8w4S2BALowqJjoU1f0Cg%2FEAGU6r2IjpIPG8BO%2BiE%3D&txtUID=Email%40Removed.com&txtPWD=PASSWORDREMOVED&Submit=Log+In&Hidden1=
and this one is the request made by my script.
POST /appstatus/index.aspx HTTP/1.1
Host: www.wes.org
Connection: close
Accept-Encoding: gzip, deflate, br
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Upgrade-Insecure-Requests: 1
Content-Type: application/x-www-form-urlencoded
Origin: https://www.wes.org
Accept-Language: en-US,en;q=0.8,fa;q=0.6
Cache-Control: max-age=0
Referer: https://www.wes.org/appstatus/indexca.aspx
Cookie: ASP.NET_SessionId=nxotmb55jjwf5x4511rwiy45
Content-Length: 303
txtPWD=PASSWORDREMOVED&Submit=Log+In&__EVENTVALIDATION=%2FwEWBQK6lf6LBAKf%2B9bUAgK9%2B7qcDgK8w4S2BALowqJjoU1f0Cg%2FEAGU6r2IjpIPG8BO%2BiE%3D&txtUID=Email%40Removed.com&__VIEWSTATE=%2FwEPDwUKLTg3MTMwMDc1NA9kFgICAQ9kFgICAQ8PFgIeBFRleHRkZGRk9rP20Uj9SdsjOKNUBlbw55Q01zI%3D&Hidden1=&__VIEWSTATEGENERATOR=189D346C
And this is the script making the request. Sorry that it's messy; I just needed something quick.
import requests
import bs4
import urllib.parse
def main():
    session = requests.Session()
    headers = {"Origin": "https://www.wes.org",
               "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
               "Cache-Control": "max-age=0", "Upgrade-Insecure-Requests": "1", "Connection": "close",
               "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36",
               "Referer": "https://www.wes.org/appstatus/indexca.aspx", "Accept-Encoding": "gzip, deflate, br",
               "Accept-Language": "en-US,en;q=0.8,fa;q=0.6", "Content-Type": "application/x-www-form-urlencoded"}
    r = session.get('https://www.wes.org/appstatus/index.aspx',headers=headers)
    cookies = r.cookies
    soup = bs4.BeautifulSoup(r.content, "html5lib")
    viewState=urllib.parse.quote(str(soup.select('#__VIEWSTATE')[0]).split('value="')[1].split('"/>')[0])
    viewStateGenerator=urllib.parse.quote(str(soup.select('#__VIEWSTATEGENERATOR')[0]).split('value="')[1].split('"/>')[0])
    eventValidation=urllib.parse.quote(str(soup.select('#__EVENTVALIDATION')[0]).split('value="')[1].split('"/>')[0])
    paramsPost = {}
    paramsPost.update({'__VIEWSTATE':viewState})
    paramsPost.update({'__VIEWSTATEGENERATOR':viewStateGenerator})
    paramsPost.update({'__EVENTVALIDATION':eventValidation})
    paramsPost.update({"txtUID": "My@Email.Removed"})
    paramsPost.update({"txtPWD": "My_So_Called_Password"})
    paramsPost.update({"Submit": "Log In"})
    paramsPost.update({"Hidden1": ""})
    response = session.post("https://www.wes.org/appstatus/index.aspx", data=paramsPost, headers=headers,
                            cookies=cookies)
    print("Status code:", response.status_code) #Outputs 500.
    #print("Response body:", response.content)
if __name__ == '__main__':
    main()
Any help would be so much appreciated.
You are doing far too much work and, in doing so, not passing valid data. You can extract the value attribute directly, i.e. .select_one('#__VIEWSTATEGENERATOR')["value"], and likewise for the rest. The cookies are stored in the Session object after your initial GET, so the logic boils down to:
import bs4
import requests
with requests.Session() as session:
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36"}
    r = session.get('https://www.wes.org/appstatus/index.aspx', headers=headers)
    soup = bs4.BeautifulSoup(r.content, "html5lib")
    # Read the hidden ASP.NET fields straight from their value attributes.
    viewState = soup.select_one('#__VIEWSTATE')["value"]
    viewStateGenerator = soup.select_one('#__VIEWSTATEGENERATOR')["value"]
    eventValidation = soup.select_one('#__EVENTVALIDATION')["value"]
    paramsPost = {'__VIEWSTATE': viewState,'__VIEWSTATEGENERATOR': viewStateGenerator,
                  '__EVENTVALIDATION': eventValidation,"txtUID": "My@Email.Removed",
                  "txtPWD": "My_So_Called_Password",
                  "Submit": "Log In","Hidden1": ""}
    # Pass raw values; requests URL-encodes them once when building the body.
    response = session.post("https://www.wes.org/appstatus/index.aspx", data=paramsPost, headers=headers)
    print("Status code:", response.status_code)
Python by convention uses CamelCase for class names and lowercase with underscores for variable and function names; you might want to consider applying that to your code.
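A likely reason the values in the question's script are not valid: the script as posted runs urllib.parse.quote on each hidden field and then hands the result to data=, which form-encodes it a second time. The sketch below reproduces the effect with urllib.parse.urlencode, which produces the same form encoding. The field value is a shortened, made-up stand-in for a real __VIEWSTATE.

```python
from urllib.parse import quote, urlencode

raw_viewstate = "/wEPDw=="  # shortened, made-up stand-in for a hidden-field value

# What the question's script does: quote first, then let data= encode again.
double_encoded = urlencode({"__VIEWSTATE": quote(raw_viewstate)})

# What the answer does: pass the raw value and encode exactly once.
single_encoded = urlencode({"__VIEWSTATE": raw_viewstate})

print(double_encoded)
print(single_encoded)
```

In the double-encoded form each "%" produced by the first pass is itself escaped as "%25", so the server decodes a value that no longer matches the one it issued.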

How to get request headers rather than response headers using Python Requests

How can I grab the request headers for an XHR request using the Python Requests module? Using the following code seems to return the response headers:
import requests
r = requests.get('http://www.whoscored.com/tournamentsfeed/12496/Fixtures/?d=2015W50&isAggregate=false')
headers = r.headers
print(headers)
This returns an object that looks like this:
{'content-length': '624', 'content-encoding': 'gzip', 'expires': '-1', 'vary': 'Accept-Encoding', 'server': 'Microsoft-IIS/8.0', 'pragma': 'no-cache', 'cache-control': 'no-cache', 'date': 'Tue, 15 Dec 2015 14:41:34 GMT', 'x-powered-by': 'ASP.NET', 'content-type': 'text/html; charset=utf-8'}
However, when I look in Chrome developer tools the request header looks like this:
Host: www.whoscored.com
Connection: keep-alive
Accept: text/plain, */*; q=0.01
Model-Last-Mode: W50hFYr7jwZWt40WUb9udPVFxmB6g9yct204X0/gmf4=
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36
Referer: http://www.whoscored.com/Regions/252/Tournaments/2/England-Premier-League
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6
Cookie: __gads=ID=d09f8c0cdc1a4258:T=1449875272:S=ALNI_MbTPDtXiIlHK49F4FOqdDap__pfCA; nlsnocrvu=1; OX_plg=swf|shk|pm; _ga=GA1.3.578623339.1449875271; _gat=1; _ga=GA1.2.578623339.1449875271
Can anyone assist?
Thanks
You need to check for request headers like this
r.request.headers
That would give you something like
{'Connection': 'keep-alive', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'User-Agent': 'python-requests/2.7.0 CPython/2.7.10 Darwin/15.0.0'}
For obvious reasons it won't be the same as you see in the Chrome developer tools, because the browser adds its own headers which the requests module doesn't.
GET /tournamentsfeed/12496/Fixtures/?d=2015W50&isAggregate=false HTTP/1.1
Host: www.whoscored.com
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,fr;q=0.6
Cookie: _ga=GA1.2.788154924.1450195026; _gat_as25n45=1
To get these headers you need to run some JavaScript in the browser to pull them.
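r.request is the prepared request that was actually sent, so r.request.headers reflects what requests put on the wire. The same headers can be inspected before sending anything by preparing the request through a Session, as in the sketch below. Adding X-Requested-With is only an example of copying a header from the browser capture, not something the server is known to require.

```python
import requests

url = "http://www.whoscored.com/tournamentsfeed/12496/Fixtures/?d=2015W50&isAggregate=false"

session = requests.Session()
# Headers seen in the browser can be added explicitly if the server expects them.
session.headers.update({"X-Requested-With": "XMLHttpRequest"})

# prepare_request merges the session defaults with per-request headers; no network is used.
prepared = session.prepare_request(requests.Request("GET", url))
for name, value in prepared.headers.items():
    print(name, value, sep=": ")
```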
