After I simulate to log in, when I try to post the original website, it returns 302. When I open the original website in Chrome, it returns 415.
I tried several ways:
session.post(url,headers = headers,data = data)
requests.post(url,headers = headers,data = data)
urllib.request.urlopen.read(url).decode()
import requets
import json
header = {'Host': 'sty.js118114.com:8080',
'Connection': 'keep-alive',
'Content-Length': '8188',
'Accept': '*/*',
'X-Requested-With': 'XMLHttpRequest',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36',
'Content-Type': 'text/plain;charset=UTF-8;application/xml',
'Origin': 'http://sty.js118114.com:8080',
'Referer':
'http://sty.js118114.com:8080/Report/report/movecar_list.html',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'zh-CN,zh;q=0.9',
'Cookie': cookie_name + '=' + cookie_value
}
data = {"calling_no":"","begin_time":"","end_time":"","called_car_no":""}
res = requests.post(target,data = json.dumps(data),headers = header)
print(res.content.decode())
I expect the content must be the json version or html version so that I can use re model or xpath to get the infomation I want.(without any redirects
Lastly, I provide the necessary infomation about the problem:
Chrome Network
General
Request URL: http://sty.js118114.com:8080/Report/movecar/list/1/10
Request Method: POST
Status Code: 200 OK
Remote Address: 127.0.0.1:8888
Referrer Policy: no-referrer-when-downgrade
Response Headers
Content-Length: 8150
Content-Type: application/json;charset=UTF-8
Date: Thu, 22 Aug 2019 00:47:51 GMT
Server: Apache-Coyote/1.1
Request Headers
Accept: */*
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9
Content-Length: 66
Content-Type: text/plain;charset=UTF-8;
Cookie: JSESSIONID=0A474B00017BFFD89A515B336F482905
Host: sty.js118114.com:8080
Origin: http://sty.js118114.com:8080
Proxy-Connection: keep-alive
Referer: http://sty.js118114.com:8080/Report/report/movecar_list.html
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
(KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36
X-Requested-With: XMLHttpRequest
Request Payload
{calling_no: "", begin_time: "", end_time: "", called_car_no: ""}
begin_time: ""
called_car_no: ""
calling_no: ""
end_time: ""
Fiddler Inspectors Raw
POST http://sty.js118114.com:8080/Report/movecar/list/1/10 HTTP/1.1
Host: sty.js118114.com:8080
Connection: keep-alive
Content-Length: 66
Accept: */*
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
(KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36
Content-Type: text/plain;charset=UTF-8;
Origin: http://sty.js118114.com:8080
Referer: http://sty.js118114.com:8080/Report/report/movecar_list.html
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9
Cookie: JSESSIONID=0A474B00017BFFD89A515B336F482905
{"calling_no":"","begin_time":"","end_time":"","called_car_no":""}
Response Raw
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: application/json;charset=UTF-8
Date: Thu, 22 Aug 2019 00:27:59 GMT
Content-Length: 8150
Related
I can't figure out how to correctly set up a POST request with the following data:
General
Request URL: https://myurl.com/install/index.cgi
Request Method: POST
Request Headers
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en
Cache-Control: max-age=0
Connection: keep-alive
Content-Length: 48
Content-Type: application/x-www-form-urlencoded
Host: myurl.com
Origin: https://myurl.com
Referer: https://myurl.com/install/
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
Form Data
page: install
state: STATUS
I can do the following:
import requests
headers = {"Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"Accept-Encoding":"gzip,deflate,br",
"Accept-Language":"en-US,en;q=0.8",
"Cache-Control":"max-age=0",
"Connection":"keep-alive",
"Content-Length":"48",
"Content-Type":"application/x-www-form-urlencoded",
"Host":"myurl.com",
"Origin":"https://myurl.com",
"Referer":"https://myurl.com/install/?s=ROM",
"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}
f = requests.put(path, headers=headers)
But how do I handle the form data? Under the form data there is a page: install and a state: STATUS.
How do I include this on my POST request?
Just add data= to your request:
import requests
path = ...
headers = ...
form_data = {
"page": "install",
"state": "STATUS",
}
f = requests.put(path, headers=headers, data=form_data)
I presume you know how to use the developer tools on the browser of your choice. The following is a template I follow:
Load the page (GET)
Use XPATH to find_element_by_id I'm targeting, i.e. username
Set XPATH to SetValue of such element
Post the page (POST)
I have problem on site where email is under obfuscator.
When i run my program i get output:
The email is:{"success":"","code":1,"msg":"ReCAPTCHA"}
But when i want click 'watch email' on computer all is fine and i get:
The email is:{"success":"","code":0,"msg":"xxxx#gmail.com"}
Code in POST:
REQUEST HEADERS
Accept: application/json, text/javascript, */*; q=0.01
Accept-Encoding: gzip, deflate, br
Accept-Language: pl-PL,pl;q=0.9,en-US;q=0.8,en;q=0.7
Connection: keep-alive
Content-Length: 141
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Cookie: SOOOME COOKIES.
Host: https://xxxxxxx.com
Origin: https://xxxxxxx.com
Referer: https://xxxxxxx.com/asas
User-Agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Mobile Safari/537.36
X-Requested-With: XMLHttpRequest
QUERY STRING PARAMETRS
decode:
FORM DATA
hash: YToyOntpOjA7czo0NDoidHh3VFlXck83eFdza1FRUWgydUlvb0MveHRRemNLaCtNa3BuenVJU0VmUT0iO2k6MTtzOjE2OiK3SJ7OlhTa5DgPfA1YqCfRIjt9
type: ademail
And here is my code:
import requests
url = "https://xxxxx.com/_ajax/obfuscator/?decode"
headers = {
'User-Agent':'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Mobile Safari/537.36',
'Accept': 'application/json, text/javascript, */*; q=0.01',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'pl-PL,pl;q=0.9,en-US;q=0.8,en;q=0.7',
'Connection': 'keep-alive',
'Content-Length': '141',
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'Host': 'https://xxxxx.com/',
'Origin': 'https://xxxxx.com/',
'Referer':'https://xxxxx.com/asd',
'X-Requested-With':'XMLHttpRequest' }
data = {'hash':'YToyOntpOjA7czo0NDoiQStHbXkrY2p1dllrUmlXSWdWTjdNbHF2Y3cyak13QU5GeUtaQXZReFcrbz0iO2k6MTtzOjE2OiJ7Byq7O88ydxCtVWgoEETOIjt9',
'type':'adsemail'}
r = requests.post(url, data, headers)
pastebin_url = r.text
print("The email is:%s"%pastebin_url)
I also try do it the same as Webdriver
driver = webdriver.Chrome("C:/Users/User/Desktop/Email/chromedriver.exe")
driver.set_page_load_timeout(5000)
driver.get("https://xxxx.com/asd")
driver.implicitly_wait(3000)
sleep(1)
RODO = "//input[#class='btn btn-confirm']"
driver.find_element_by_xpath(RODO).click()
sleep(7)
email = "//span[#class='click_to_show']"
driver.find_element_by_xpath(email).click()
But i get Recaptcha to do.... ;/
Where is the problem?
I also try:
``` user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('user-agent={0}'.format(user_agent))
driver = webdriver.Chrome("C:/Users/User/Desktop/Email/chromedriver.exe")
driver.set_page_load_timeout(5000)
But not working, site want captcha ;/
I've been trying to login on this website and download some files for a while now. I can't find out what is wrong with my code below. I'm new to Python:
When you hit the login page https://www.targetsite/members/login.php, it makes the following calls:
Login page:
[General]
Request URL: https://www.targetsite/members/login.php
Request Method: GET
Status Code: 302 Found
Remote Address: 0.0.0.0:443
Referrer Policy: no-referrer-when-downgrade
[Response Headers]
Connection: keep-alive
Content-Length: 425
Content-Type: text/html; charset=iso-8859-1
Date: Thu, 13 Dec 2018 14:37:02 GMT
Keep-Alive: timeout=5
Location: https://www.targetsite/auth.form?bWFmYkpTbjhNN0J4bWM2S2NwaCtlNTkydDJJV0xiMk1aTWRKa0kwVldWZ29hZjdaMEUweFJuRWl3a3NqOVUwTwpSbmtyRnptekp6Wm40VlM2MDF5dWVVSDR2V1FXMG5JU1ZNQVUrZ0lvYTlXTWQ4T2ZEVzJVY1ZSWW4wZk1NVHZhCm9uZWZZeTI2V1JNPQo=
Server: nginx/1.14.2
Set-Cookie: pcar%5fUkVTVFJJQ1RFRA%3d%3d=; path=/; domain=.targetsite; expires=Wed 13-Dec-2017 14:37:02 GMT
X-Vegas-No-Cache: YES
[Request Headers]
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Cookie: pcah=SXlOK1VkSytiMTl0VllvbDk4N2tVaXR5bmZFZmNNVUsK
Host: www.targetsite
Referer: http://www.targetsite/tour/index.php
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36 Vivaldi/2.1.1337.36
AuthForm:
[General]
Request URL: https://www.targetsite/auth.form?bWFmYkpTbjhNN0J4bWM2S2NwaCtlNTkydDJJV0xiMk1aTWRKa0kwVldWZ29hZjdaMEUweFJuRWl3a3NqOVUwTwpSbmtyRnptekp6Wm40VlM2MDF5dWVVSDR2V1FXMG5JU1ZNQVUrZ0lvYTlXTWQ4T2ZEVzJVY1ZSWW4wZk1NVHZhCm9uZWZZeTI2V1JNPQo=
Request Method: GET
Status Code: 200 OK
Remote Address: 0.0.0.0:443
Referrer Policy: no-referrer-when-downgrade
[Response Headers]
Cache-Control: max-age=1, must-revalidate
Connection: keep-alive
Content-Type: text/html
Date: Thu, 13 Dec 2018 14:37:03 GMT
Expires: Thu, 13 Dec 2018 14:37:03 GMT
Keep-Alive: timeout=5
Server: nginx/1.14.2
Set-Cookie: pcar%5fUkVTVFJJQ1RFRA%3d%3d=; path=/; domain=.targetsite; expires=Wed 13-Dec-2017 14:37:03 GMT
Transfer-Encoding: chunked
[Request Headers]
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Cookie: pcah=SXlOK1VkSytiMTl0VllvbDk4N2tVaXR5bmZFZmNNVUsK
Host: www.targetsite
Referer: http://www.targetsite/tour/index.php
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36 Vivaldi/2.1.1337.36
bWFmYkpTbjhNN0J4bWM2S2NwaCtlNTkydDJJV0xiMk1aTWRKa0kwVldWZ29hZjdaMEUweFJuRWl3a3NqOVUwTwpSbmtyRnptekp6Wm40VlM2MDF5dWVVSDR2V1FXMG5JU1ZNQVUrZ0lvYTlXTWQ4T2ZEVzJVY1ZSWW4wZk1NVHZhCm9uZWZZeTI2V1JNPQo:
So based on that behaviour observed using chrome inspector, I coded the following, trying to emulate the actions triggered after accessing https://www.targetsite.com/members/login.php
#Libraries
import requests
import json
from lxml import html
#URL
primeiraUrl = 'https://www.targetsite.com/members/login.php'
urlPost = 'https://ams.targetsite.com/auth.form'
#Credentials
userd = 'user'
passwd = 'pass'
session = requests.Session()
#session.verify = False
#GetToken
headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9',
'Connection': 'keep-alive',
'Referer': 'http://www.targetsite.com/tour/index.php',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36 Vivaldi/2.1.1337.36'
}
get1stContact = session.get(primeiraUrl,headers=headers)
segundaUrl = get1stContact.url
get2ndContact = session.get(segundaUrl,headers=headers)
And then when you login on the website, that's what you get:
[General]
Request URL: https://ams.targetsite.com/auth.form
Request Method: POST
Status Code: 302 Found
Remote Address: 1.1.1.1:443
Referrer Policy: no-referrer-when-downgrade
[Response Headers]
Connection: Keep-Alive
Content-Length: 239
Content-Type: text/html; charset=iso-8859-1
Date: Thu, 13 Dec 2018 14:47:57 GMT
Keep-Alive: timeout=20, max=94
Location: http://www.targetsite.com/members/index.php
Server: Apache/2.2.15 (CentOS)
Set-Cookie: pcar%5fUkVTVFJJQ1RFRA%3d%3d=cS90NW9XLzVVeVNIeElMOUpFaHlCb2hGWkZveVUrdTFiK0dad0FYVDN2UT0K; path=/; domain=.targetsite.com; expires=Thu 13-Dec-2018 20:47:57 GMT
[Request Headers]
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cache-Control: max-age=0
Connection: keep-alive
Content-Length: 147
Content-Type: application/x-www-form-urlencoded
Cookie: pcah=Q3BLRnpDUGRhcnJFMmg1OGI0LzBrLzNhYWM5cjBVV2IK
Host: ams.targetsite.com
Origin: https://www.targetsite.com
Referer: https://www.targetsite.com/auth.form?N2dFMUIwaGFIc1BWQ3BoRTd2NVBWayt5ZE91UnZsa2xCcmNUU1VtVG8yNW54WUhjNFBYblE3STJwK2xrRWhNawpNRWtmMjJtUFF2Y0xSL2t1N2xIc2pmSk4wZG5uRVdmbkEyRUpxdnVDODI4UmVhMjlId1h6dVZIeFRtWGZuUGd5CkoySHEwZXg5RnRVPQo=
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36 Vivaldi/2.1.1337.36
[Form-Data]
rlm: RESTRICTED
for: http%3a%2f%2fwww%2etargetsite%2ecom%2fmembers%2findex%2ephp
rmb: y
uid: user
pwd: pass
And here is the code I wrote to make that post request:
headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9',
'Cache-Control': 'max-age=0',
'Connection': 'keep-alive',
'Origin': 'https://www.targetsite.com',
'Referer': segundaUrl,
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36 Vivaldi/2.1.1337.36'
}
body = {
'rlm':'RESTRICTED',
'for':'http://www.targetsite.com/members/index.php',
'rmb': 'y',
'uid': 'user',
'pwd': 'pass'
}
#session.post(url)
r = session.post(urlPost, headers=headers, data=body)
That all being said there's anyone who can help me figure this out ? Thanks in advance!
Edit.: Full code as requested:
#Help
#http://kazuar.github.io/scraping-tutorial/
#Libraries
import requests
import json
from lxml import html
#URL
primeiraUrl = 'https://www.targetsite.com/members/login.php'
urlPost = 'https://ams.targetsite.com/auth.form'
#Credentials
userd = 'user'
passwd = 'pass'
session = requests.Session()
#session.verify = False
#GetToken
headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9',
'Connection': 'keep-alive',
'Referer': 'http://www.targetsite.com/tour/index.php',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36 Vivaldi/2.1.1337.36'
}
get1stContact = session.get(primeiraUrl,headers=headers)
segundaUrl = get1stContact.url
get2ndContact = session.get(segundaUrl,headers=headers)
headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate, br',
'Content-Type': 'application/x-www-form-urlencoded',
'Accept-Language': 'en-US,en;q=0.9',
'Cache-Control': 'max-age=0',
'Connection': 'keep-alive',
'Origin': 'https://www.targetsite.com',
'Referer': segundaUrl,
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36 Vivaldi/2.1.1337.36'
}
body = {
'rlm':'RESTRICTED',
'for':'http://www.targetsite.com/members/index.php',
'rmb': 'y',
'uid': 'user',
'pwd': 'pass'
}
#session.post(url)
r = session.post(urlPost, headers=headers, data=body)
print(r.text)
You are missing to send some headers, most importantly:
Content-Type: application/x-www-form-urlencoded
That one is very important, as it tells the server how to parse the parameters you are sending in your form.
So I've been trying for the last 6 hours to make this work, but I couldn't and endless searches didn't help, So I guess I'm either doing something very fundamental wrong, or it's just a trivial bug which happens to match my logic so I need extra eyes to help me fix it.
The website url is this.
I wrote a piece of messy python code to just login and read the next page, but All I get is a nasty 500 error saying something on the server went wrong processing my request.
Here is the request made by a browser which works just fine, no problem.
HTTP Response code to this request is 302 (Redirect)
POST /appstatus/index.aspx HTTP/1.1
Host: www.wes.org
Connection: close
Content-Length: 303
Cache-Control: max-age=0
Origin: https://www.wes.org
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Referer: https://www.wes.org/appstatus/index.aspx
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.8,fa;q=0.6
Cookie: ASP.NET_SessionId=bu2gemmlh3hvp4f5lqqngrbp; _ga=GA1.2.1842963052.1473348318; _gat=1
__VIEWSTATE=%2FwEPDwUKLTg3MTMwMDc1NA9kFgICAQ9kFgICAQ8PFgIeBFRleHRkZGRk9rP20Uj9SdsjOKNUBlbw55Q01zI%3D&__VIEWSTATEGENERATOR=189D346C&__EVENTVALIDATION=%2FwEWBQK6lf6LBAKf%2B9bUAgK9%2B7qcDgK8w4S2BALowqJjoU1f0Cg%2FEAGU6r2IjpIPG8BO%2BiE%3D&txtUID=Email%40Removed.com&txtPWD=PASSWORDREMOVED&Submit=Log+In&Hidden1=
and this one is the request made by my script.
POST /appstatus/index.aspx HTTP/1.1
Host: www.wes.org
Connection: close
Accept-Encoding: gzip, deflate, br
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Upgrade-Insecure-Requests: 1
Content-Type: application/x-www-form-urlencoded
Origin: https://www.wes.org
Accept-Language: en-US,en;q=0.8,fa;q=0.6
Cache-Control: max-age=0
Referer: https://www.wes.org/appstatus/indexca.aspx
Cookie: ASP.NET_SessionId=nxotmb55jjwf5x4511rwiy45
Content-Length: 303
txtPWD=PASSWORDREMOVED&Submit=Log+In&__EVENTVALIDATION=%2FwEWBQK6lf6LBAKf%2B9bUAgK9%2B7qcDgK8w4S2BALowqJjoU1f0Cg%2FEAGU6r2IjpIPG8BO%2BiE%3D&txtUID=Email%40Removed.com&__VIEWSTATE=%2FwEPDwUKLTg3MTMwMDc1NA9kFgICAQ9kFgICAQ8PFgIeBFRleHRkZGRk9rP20Uj9SdsjOKNUBlbw55Q01zI%3D&Hidden1=&__VIEWSTATEGENERATOR=189D346C
And this is the script making the request, I'm sorry if it's so messy, just need something quick.
import requests
import bs4
import urllib.parse
def main():
session = requests.Session()
headers = {"Origin": "https://www.wes.org",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"Cache-Control": "max-age=0", "Upgrade-Insecure-Requests": "1", "Connection": "close",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36",
"Referer": "https://www.wes.org/appstatus/indexca.aspx", "Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.8,fa;q=0.6", "Content-Type": "application/x-www-form-urlencoded"}
r = session.get('https://www.wes.org/appstatus/index.aspx',headers=headers)
cookies = r.cookies
soup = bs4.BeautifulSoup(r.content, "html5lib")
viewState=urllib.parse.quote(str(soup.select('#__VIEWSTATE')[0]).split('value="')[1].split('"/>')[0])
viewStateGenerator=urllib.parse.quote(str(soup.select('#__VIEWSTATEGENERATOR')[0]).split('value="')[1].split('"/>')[0])
eventValidation=urllib.parse.quote(str(soup.select('#__EVENTVALIDATION')[0]).split('value="')[1].split('"/>')[0])
paramsPost = {}
paramsPost.update({'__VIEWSTATE':viewState})
paramsPost.update({'__VIEWSTATEGENERATOR':viewStateGenerator})
paramsPost.update({'__EVENTVALIDATION':eventValidation})
paramsPost.update({"txtUID": "My#Email.Removed"})
paramsPost.update({"txtPWD": "My_So_Called_Password"})
paramsPost.update({"Submit": "Log In"})
paramsPost.update({"Hidden1": ""})
response = session.post("https://www.wes.org/appstatus/index.aspx", data=paramsPost, headers=headers,
cookies=cookies)
print("Status code:", response.status_code) #Outputs 500.
#print("Response body:", response.content)
if __name__ == '__main__':
main()
Any help would be so much appreciated.
You are doing way too much work and in doing so not passing valid data,you extract value attribute directly i.e .select_one('#__VIEWSTATEGENERATOR')["value"] and the same for all the rest, the cookies will be set in the Session object after your initial get so the logic boils down to:
with requests.Session() as session:
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36"}
r = session.get('https://www.wes.org/appstatus/index.aspx', headers=headers)
soup = bs4.BeautifulSoup(r.content, "html5lib")
viewState = soup.select_one('#__VIEWSTATE')["value"]
viewStateGenerator = soup.select_one('#__VIEWSTATEGENERATOR')["value"]
eventValidation = soup.select_one('#__EVENTVALIDATION')["value"]
paramsPost = {'__VIEWSTATE': viewState,'__VIEWSTATEGENERATOR': viewStateGenerator,
'__EVENTVALIDATION': eventValidation,"txtUID": "My#Email.Removed",
"txtPWD": "My_So_Called_Password",
"Submit": "Log In","Hidden1": ""}
response = session.post("https://www.wes.org/appstatus/index.aspx", data=paramsPost, headers=headers)
print("Status code:", response.status_code)
Python by convention uses CamelCase for class names and lowercase with underscores to separate multiple words, you might want to consider applying that to your code.
How can I grab the request headers for an XHR requests using Python Requests module? Using the following code seems to return the response headers:
import requests
r = requests.get('http://www.whoscored.com/tournamentsfeed/12496/Fixtures/?d=2015W50&isAggregate=false')
headers = r.headers
print headers
This returns an object that looks like this:
{'content-length': '624', 'content-encoding': 'gzip', 'expires': '-1', 'vary': 'Accept-Encoding', 'server': 'Microsoft-IIS/8.0', 'pragma': 'no-cache', 'cache-control': 'no-cache', 'date': 'Tue, 15 Dec 2015 14:41:34 GMT', 'x-powered-by': 'ASP.NET', 'content-type': 'text/html; charset=utf-8'}
However, when I look in Chrome developer tools the request header looks like this:
Host: www.whoscored.com
Connection: keep-alive
Accept: text/plain, */*; q=0.01
Model-Last-Mode: W50hFYr7jwZWt40WUb9udPVFxmB6g9yct204X0/gmf4=
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36
Referer: http://www.whoscored.com/Regions/252/Tournaments/2/England-Premier-League
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6
Cookie: __gads=ID=d09f8c0cdc1a4258:T=1449875272:S=ALNI_MbTPDtXiIlHK49F4FOqdDap__pfCA; nlsnocrvu=1; OX_plg=swf|shk|pm; _ga=GA1.3.578623339.1449875271; _gat=1; _ga=GA1.2.578623339.1449875271
Can anyone assist?
Thanks
You need to check for request headers like this
r.request.headers
That would give you something like
{'Connection': 'keep-alive', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'User-Agent': 'python-requests/2.7.0 CPython/2.7.10 Darwin/15.0.0'}
For obvious reasons it won't be the same as you see in the Chrome developer tools, because the browser adds its own headers which the requests module doesn't.
GET /tournamentsfeed/12496/Fixtures/?d=2015W50&isAggregate=false HTTP/1.1
Host: www.whoscored.com
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,fr;q=0.6
Cookie: _ga=GA1.2.788154924.1450195026; _gat_as25n45=1
To get these headers you need to run some js code to pull the headers.