Log in to website using Python and Requests module? - python

I'm writing an AliExpress web scraper using Python and the Requests module along with BeautifulSoup and I got it working well, however I've run into a problem - I get redirected to a login page randomly. My solution to this is to simply log in at the start of my session before scraping, but I don't know how to log in.
The login page (https://login.aliexpress.com) requires only the username and password, but when I try to enter them with my code and test to see if I'm logged in by going to https://home.aliexpress.com/index.htm and looking at the html, it fails as it redirects me back to the login page.
My code after trying multiple solutions to no avail:
import requests
LOGIN_URL = "https://login.aliexpress.com/"
LOGIN_INFO = {
    "loginId": "myemail@email.com",
    "password": "mypassword"
}
with requests.Session() as sess:
    # go to login page
    sess.get(LOGIN_URL)
    # attempt to log in with my login info
    sess.post(LOGIN_URL, data=LOGIN_INFO)
    # go to 'My AliExpress' page to verify successful login
    success = sess.get("https://home.aliexpress.com/index.htm")
    # manually check html to see if I was sent to the login page again
    print(success.text)
This is pretty much what's left after my many failed attempts. Some of the things I've tried are:
Looking at the cookie after the 'sess.get(LOGIN_URL)', it returns this but I don't know what to do with it (in key:value format):
ali_apache_tracktmp :
ali_apache_track :
xman_f : t52Eyo+p3qf6E6fdmL5yJ81g2icRn+2PYjjrWYHlqlDyXAixo92Z5KHMZV8SCV7vP4ZjxEmuTQesVWkqxUi3SpFU1qbRyNRd+d0pIIKVhrIDri2oaWrt6A==
JSESSIONID : 30678741D7473C80BEB85825718FB1C6
acs_usuc_t : acs_rt=343aef98b0ca4ae79497e31b11c82c29&x_csrf=1b5g78e7fz2rt
xman_us_f : x_l=0
ali_apache_id : 23.76.146.14.1510893827939.187695.4
xman_t : PSIYMbKN2UyuejZBfmP9o5hdmQGoSB0UL0785LnRBxW0bdbdMmtW2A47hHbgTgD7TmFp7QVsOW4kXTsXMncy+iKisKfqagqb4yPxOVFdw+k=
Tried looking for a csrf token and only found the text after '_csrf=' in the 5th bullet above. Tried using it and it didn't work.
Looked at the html form sent when you log in but I don't know html and can only recognize it has a lot more fields than the ones I've seen other people use for other websites (Image of Form Data from Chrome here).
Changing the "myPassword" in my code to the text in the password2 field in image above and changing the "password" key to "password2" too.
Googled for a few hours but didn't find anything that would work.
At this point, I'm at my wits end, so any help on how to proceed would be very much appreciated. I'm not the best coder (still learning), don't know html outside of what I've learned from a few tutorials about scraping, and was hoping to figure it out myself, but hours later I still haven't solved it and realized I could really use the help.
I'm using python 3.5. If there's any more info needed, let me know. Brain is just about turned completely to mush after being stuck and awake for so long.

I have a suspicion this will not work the way you want it to.
Even after somehow accomplishing the login prompt, the following page presents a "slider verification" which to my knowledge requests is unable to do anything about. (If there is a method please let me know).
I have been trying to use cookies instead:
session = requests.Session()
cj = requests.cookies.RequestsCookieJar()
cj.set('KEY', 'VALUE')
session.cookies = cj
response = session.get(url, timeout=5, headers=headers, proxies=proxies)
Previously the scraper worked using headers and proxies for a time, but recently it always prompts a login.
I have tried all the keys and values in the cookies as well to no avail.
An idea would be to use selenium to login and capture cookies, then pass it to requests session.
AntoG has a solution to do this:
https://stackoverflow.com/a/42114843
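The Selenium-to-Requests handoff suggested above could look roughly like the sketch below. It assumes the cookies come from Selenium's `driver.get_cookies()`, which returns a list of dicts with `name` and `value` keys and usually `domain` and `path` as well. The helper name `copy_cookies` and the sample cookie values are made up for illustration, and the browser login itself (including the slider) is left out.

```python
import requests


def copy_cookies(selenium_cookies, session):
    """Copy cookies (in the shape returned by driver.get_cookies()) into a requests session."""
    for cookie in selenium_cookies:
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    return session


# Stand-in for driver.get_cookies() after a browser login.
# These values are placeholders, not real AliExpress cookies.
sample_cookies = [
    {"name": "JSESSIONID", "value": "abc123", "domain": ".aliexpress.com", "path": "/"},
    {"name": "xman_us_f", "value": "x_l=1", "domain": ".aliexpress.com", "path": "/"},
]

session = copy_cookies(sample_cookies, requests.Session())
print(session.cookies.get("JSESSIONID"))
```

Keeping the domain and path of each cookie matters here, because the session only sends a cookie back to URLs that match them. Whether the copied cookies are enough to stay logged in depends on the site.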

Related

How to use requests to login to this website

I'm trying to automate some tasks with Python and web scraping, but first I need to log in to a website I have an account on.
I've seen several examples on Stack Overflow, but for some reason this website won't let me log in using requests. Can anyone tell me what I'm doing wrong?
The webpage:
https://www.americanbulls.com/Signin.aspx?lang=en
the form variables:
ctl00$MainContent$uEmail
ctl00$MainContent$uPassword
Is it because the variable names have '$' in them?
Any help would be greatly appreciated.
import sys
import time
print(sys.path)
# raw string so the backslashes in the Windows path are not treated as escapes
sys.path.append(r'C:\program files\python36\lib\site-packages\pip\_vendor')
import requests
EMAIL = '<my_email>'
PASSWORD = '<my_password>'
URL = 'https://www.americanbulls.com/Signin.aspx?lang=en'
# Start a session so we can have persistent cookies
session = requests.session()
# This is the form data that the page sends when logging in
login_data = {
    'ctl00$MainContent$uEmail': EMAIL,
    'ctl00$MainContent$uPassword': PASSWORD
}
# Authenticate
r = session.post(URL, data=login_data, timeout=15, verify=True)
# Try accessing a page that requires you to be logged in
r = session.get('https://www.americanbulls.com/members/SignalPage.aspx?lang=en&Ticker=SQ')
print(r.url)
I submitted a form using test@test.test as the email and test as the password, and when I looked at the headers of the request I'd sent in the Network tab of Chrome dev tools, it showed that I had submitted the following form data.
ctl00$ScriptManager1:ctl00$MainContent$UpdatePanel|ctl00$MainContent$btnSubmit
__LASTFOCUS:
__EVENTTARGET:
__EVENTARGUMENT:
__VIEWSTATE:/wEPDwULLTE5MzMzODAyNzIPZBYCZg9kFgICAQ9kFgICAw9kFgICBQ9kFhICAQ8WAh4FY2xhc3MFFmhlYWRlcmNvbnRhaW5lcl9zYWZhcmkWCgIBDzwrAAkCAA8WAh4OXyFVc2VWaWV3U3RhdGVnZAYPZBAWAmYCARYCPCsADAEAFgYeC05hdmlnYXRlVXJsBRVSZWdpc3Rlci5hc3B4P2xhbmc9ZW4eBFRleHQFCFJlZ2lzdGVyHgdUb29sVGlwBTFSZWdpc3RlciBub3cgdG8gZ2V0IGFjY2VzcyB0byBleGNsdXNpdmUgZmVhdHVyZXMhPCsADAEAFgYfAwUHU2lnbiBJbh8CBRNTaWduaW4uYXNweD9sYW5nPWVuHghTZWxlY3RlZGdkZAIDDw8WBB8CBRREZWZhdWx0LmFzcHg/bGFuZz1lbh4ISW1hZ2VVcmwFGH4vaW1nL2FtZXJpY2FuYnVsbHMxLmdpZmRkAgcPZBYCAgEPPCsACQIADxYCHwFnZAYPZBAWAWYWATwrAAwCABYCHwMFB0VuZ2xpc2gBD2QQFghmAgECAgIDAgQCBQIGAgcWCDwrAAwCABYGHwMFB0VuZ2xpc2gfAgUUL1NpZ25pbi5hc3B4P2xhbmc9ZW4fBWcCFCsAAhYCHgNVcmwFEn4vaW1nL2VuaWNvbjAxLnBuZ2Q8KwAMAgAWBB8DBQdEZXV0c2NoHwIFFC9TaWduaW4uYXNweD9sYW5nPWRlAhQrAAIWAh8HBRJ+L2ltZy9kZWljb24wMS5wbmdkPCsADAIAFgQfAwUG5Lit5paHHwIFFC9TaWduaW4uYXNweD9sYW5nPXpoAhQrAAIWAh8HBRJ+L2ltZy96aGljb24wMS5wbmdkPCsADAIAFgQfAwUJRnJhbsOnYWlzHwIFFC9TaWduaW4uYXNweD9sYW5nPWZyAhQrAAIWAh8HBRJ+L2ltZy9mcmljb24wMS5wbmdkPCsADAIAFgQfAwUIVMO8cmvDp2UfAgUUL1NpZ25pbi5hc3B4P2xhbmc9dHICFCsAAhYCHwcFEn4vaW1nL3RyaWNvbjAxLnBuZ2Q8KwAMAgAWBB8DBQlJbmRvbmVzaWEfAgUUL1NpZ25pbi5hc3B4P2xhbmc9aWQCFCsAAhYCHwcFEn4vaW1nL2lkaWNvbjAxLnBuZ2Q8KwAMAgAWBB8DBQhFc3Bhw7FvbB8CBRQvU2lnbmluLmFzcHg/bGFuZz1lcwIUKwACFgIfBwUSfi9pbWcvZXNpY29uMDEucG5nZDwrAAwCABYEHwMFCEl0YWxpYW5vHwIFFC9TaWduaW4uYXNweD9sYW5nPWl0AhQrAAIWAh8HBRJ+L2ltZy9pdGljb24wMS5wbmdkZGRkAgkPZBYCAgEPPCsABAEADxYEHgVWYWx1ZQULTGFzdCBVcGRhdGUeB1Zpc2libGVoZGQCCw9kFgICAQ88KwAEAQAPFgIfCWhkZAIDDxYCHwAFGG1haW5tZW51Y29udGFpbmVyX3NhZmFyaRYGAgEPPCsACQIADxYCHwFnZAYPZBAWDWYCAQICAgMCBAIFAgYCBwIIAgkCCgILAgwWDTwrAAwBABYEHwIFFERlZmF1bHQuYXNweD9sYW5nPWVuHwMFBEhPTUU8KwAMAQAWBB8DBQRBTUVYHwIFKVNpZ25hbExpc3QuYXNweD9sYW5nPWVuJk1hcmtldFN5bWJvbD1BTUVYPCsADAEAFgQfAwUETllTRR8CBSlTaWduYWxMaXN0LmFzcHg/bGFuZz1lbiZNYXJrZXRTeW1ib2w9TllTRTwrAAwBABYEHwMFBk5BU0RBUR8CBStTaWduYWxMaXN0LmFzcHg/bGFuZz1lbiZNYXJrZXRTeW1ib2w9TkFTREFRPCsADAEAFgQfAwUIT1RDIFBJTksfAgUpU2lnbmFsTGlzdC5hc3B4P2xhbmc9ZW4mTWFya2V0U3ltYm9sPVBJTks8KwAM
AQAWBB8DBQlQUkVGRVJSRUQfAgUuU2lnbmFsTGlzdC5hc3B4P2xhbmc9ZW4mTWFya2V0U3ltYm9sPVBSRUZFUlJFRDwrAAwBABYEHwMFCFdBUlJBTlRTHwIFLVNpZ25hbExpc3QuYXNweD9sYW5nPWVuJk1hcmtldFN5bWJvbD1XQVJSQU5UUzwrAAwBABYEHwMFB0lOREVYRVMfAgUcSW5kZXhTaWduYWxMaXN0LmFzcHg/bGFuZz1lbjwrAAwCABYEHwMFAmZ4HwIFGVNpZ25hbExpc3RGWC5hc3B4P2xhbmc9ZW4KPCsADgEAFgYeCUZvcmVDb2xvcgpgHgtGb250X0l0YWxpY2ceBF8hU0IChCA8KwAMAQAWAh8JaDwrAAwBABYCHwloPCsADAEAFgIfCWg8KwAMAQAWAh8JaGRkAgMPFCsABA8WBB8IBRRTdXBwb3J0LmFzcHg/bGFuZz1lbh8JaGRkZDwrAAUBABYCHwMFBEhlbHBkAgUPZBYCAgMPPCsABAEADxYCHwgFJmh0dHBzOi8vd3d3LnR3aXR0ZXIuY29tL2FtZXJpY2FuX0J1bGxzZGQCBQ8WAh8ABRdzdWJtZW51Y29udGFpbmVyX3NhZmFyaRYKAgEPPCsACQIADxYCHwFnZAYPZBAWAWYWATwrAAwBABYEHwIFFVJlZ2lzdGVyLmFzcHg/bGFuZz1lbh8DBTFSZWdpc3RlciBub3cgdG8gZ2V0IGFjY2VzcyB0byBleGNsdXNpdmUgZmVhdHVyZXMhZGQCAw88KwAJAgAPFgQfAWcfCWhkBg9kEBYBZhYBPCsADAEAFgQfAgUTU2lnbmluLmFzcHg/bGFuZz1lbh8DBQdTaWduIEluZGQCBQ88KwAJAgAPFgIfAWdkBg9kEBYBZhYBPCsADAEAFgQfAgUfTWVtYmVyc2hpcEJlbmVmaXRzLmFzcHg/bGFuZz1lbh8DBRNNZW1iZXJzaGlwIEJlbmVmaXRzZGQCBw88KwAJAQAPFgQfAWcfCWhkZAILDzwrAAYBAzwrAAgBABYCHghOdWxsVGV4dAUMRW50ZXIgU3ltYm9sZAIHDxYCHwAFEGNvbnRhaW5lcl9zYWZhcmkWAgIBD2QWAgIBD2QWAgIBD2QWAgIDD2QWAmYPZBYcAgEPPCsABAEADxYCHwgFB1NpZ24gSW5kZAIDDzwrAAQBAA8WAh8IBQlOZXcgVXNlcj9kZAIFDxQrAAQPFgIfCAUVUmVnaXN0ZXIuYXNweD9sYW5nPWVuZGRkPCsABQEAFgIfAwUIUmVnaXN0ZXJkAgcPPCsABAEADxYCHwhlZGQCCQ88KwAEAQAPFgIfCAUFRW1haWxkZAINDw8WAh4MRXJyb3JNZXNzYWdlBQ1JbnZhbGlkIGVtYWlsZGQCDw8PFgIfDgUNSW52YWxpZCBlbWFpbGRkAhEPPCsABAEADxYCHwgFCFBhc3N3b3JkZGQCFQ8PFgIfDgUUUGFzc3dvcmQgaXMgcmVxdWlyZWRkZAIXDzwrAAQBAA8WAh8IBQtSZW1lbWJlciBNZWRkAhsPDxYCHwMFB1NpZ24gSW5kZAIdDw8WAh8DBQZDYW5jZWxkZAIfDzwrAAQBAA8WAh8IBShJZiB5b3UgY2Fubm90IHJlYWNoIHlvdXIgYWNjb3VudCwgcGxlYXNlZGQCIQ8UKwAEDxYCHwgFGVNlbmRQYXNzd29yZC5hc3B4P2xhbmc9ZW5kZGQ8KwAFAQAWAh8DBQtjbGljayBoZXJlLmQCCQ8WAh8ABRB3aGl0ZWJhbnRfc2FmYXJpZAILDxYCHwAFG3N1cHBvcnRtZW51Y29udGFpbmVyX3NhZmFyaRYCAgEPPCsACQIADxYCHwFnZAYPZBAWBmYCAQICAgMCBAIFFgY8KwAMAQAWBB8DBQhBYm91dCBVcx8CBRRBYm91dFVzLmFzcHg/bGFuZz1lbjwrAAwBABYEHwMFB1N1cHBvcnQfAgUUU3Vw
cG9ydC5hc3B4P2xhbmc9ZW48KwAMAQAWBB8DBQdQcml2YWN5HwIFFFByaXZhY3kuYXNweD9sYW5nPWVuPCsADAEAFgQfAwUDVE9THwIFEFRvcy5hc3B4P2xhbmc9ZW48KwAMAQAWBB8DBRNNZW1iZXJzaGlwIEJlbmVmaXRzHwIFH01lbWJlcnNoaXBCZW5lZml0cy5hc3B4P2xhbmc9ZW48KwAMAQAWBB8DBQ9JbXBvcnRhbnQgTGlua3MfAgUbSW1wb3J0YW50TGlua3MuYXNweD9sYW5nPWVuZGQCDQ8WAh8ABRdmb290ZXJjb250YWluZXIxX3NhZmFyaRYCAgEPPCsACQIADxYCHwFnZAYPZBAWCGYCAQICAgMCBAIFAgYCBxYIPCsADAEAFgYfAwUHRW5nbGlzaB8FZx8CBRZ+Ly9TaWduaW4uYXNweD9sYW5nPWVuPCsADAEAFgQfAwUHRGV1dHNjaB8CBRZ+Ly9TaWduaW4uYXNweD9sYW5nPWRlPCsADAEAFgQfAwUG5Lit5paHHwIFFn4vL1NpZ25pbi5hc3B4P2xhbmc9emg8KwAMAQAWBB8DBQlGcmFuw6dhaXMfAgUWfi8vU2lnbmluLmFzcHg/bGFuZz1mcjwrAAwBABYEHwMFCFTDvHJrw6dlHwIFFn4vL1NpZ25pbi5hc3B4P2xhbmc9dHI8KwAMAQAWBB8DBQlJbmRvbmVzaWEfAgUWfi8vU2lnbmluLmFzcHg/bGFuZz1pZDwrAAwBABYEHwMFCEVzcGHDsW9sHwIFFn4vL1NpZ25pbi5hc3B4P2xhbmc9ZXM8KwAMAQAWBB8DBQhJdGFsaWFubx8CBRZ+Ly9TaWduaW4uYXNweD9sYW5nPWl0ZGQCDw8WAh8ABRdmb290ZXJjb250YWluZXIzX3NhZmFyaRYOAgEPFgIeCWlubmVyaHRtbAUMRGlzY2xhaW1lcnM6ZAIDDxYCHw8FjwVBbWVyaWNhbmJ1bGxzLmNvbSBMTEMgaXMgbm90IHJlZ2lzdGVyZWQgYXMgYW4gaW52ZXN0bWVudCBhZHZpc2VyIHdpdGggdGhlIFUuUy4gU2VjdXJpdGllcyBhbmQgRXhjaGFuZ2UgQ29tbWlzc2lvbi4gIFJhdGhlciwgQW1lcmljYW5idWxscy5jb20gTExDIHJlbGllcyB1cG9uIHRoZSDigJxwdWJsaXNoZXLigJlzIGV4Y2x1c2lvbuKAnSBmcm9tIHRoZSBkZWZpbml0aW9uIG9mIGludmVzdG1lbnQgYWR2aXNlciBhcyBwcm92aWRlZCB1bmRlciBTZWN0aW9uIDIwMihhKSgxMSkgb2YgdGhlIEludmVzdG1lbnQgQWR2aXNlcnMgQWN0IG9mIDE5NDAgYW5kIGNvcnJlc3BvbmRpbmcgc3RhdGUgc2VjdXJpdGllcyBsYXdzLiBBcyBzdWNoLCBBbWVyaWNhbmJ1bGxzLmNvbSBMTEMgZG9lcyBub3Qgb2ZmZXIgb3IgcHJvdmlkZSBwZXJzb25hbGl6ZWQgaW52ZXN0bWVudCBhZHZpY2UuIFRoaXMgc2l0ZSBhbmQgYWxsIG90aGVycyBvd25lZCBhbmQgb3BlcmF0ZWQgYnkgQW1lcmljYW5idWxscy5jb20gTExDIGFyZSBib25hIGZpZGUgcHVibGljYXRpb25zIG9mIGdlbmVyYWwgYW5kIHJlZ3VsYXIgY2lyY3VsYXRpb24gb2ZmZXJpbmcgaW1wZXJzb25hbCBpbnZlc3RtZW50LXJlbGF0ZWQgYWR2aWNlIHRvIG1lbWJlciBhbmQgL29yIHByb3NwZWN0aXZlIG1lbWJlcnMuZAIFDxYCHw8FrAJBbWVyaWNhbmJ1bGxzLmNvbSBpcyBhbiBpbmRlcGVuZGVudCB3ZWJzaXRlLiBBbWVyaWNhbmJ1bGxzLmNvbSBMTEMgZG9lcyBub3QgcmVjZWl2ZSBjb21wZW5zYXRp
b24gYnkgYW55IGRpcmVjdCBvciBpbmRpcmVjdCBtZWFucyBmcm9tIHRoZSBzdG9ja3MsIHNlY3VyaXRpZXMgYW5kIG90aGVyIGluc3RpdHV0aW9ucyBvciBhbnkgdW5kZXJ3cml0ZXJzIG9yIGRlYWxlcnMgYXNzb2NpYXRlZCB3aXRoIHRoZSBicm9hZGVyIG5hdGlvbmFsIG9yIGludGVybmF0aW9uYWwgZm9yZXgsIGNvbW1vZGl0eSBhbmQgc3RvY2sgbWFya2V0cy5kAgcPFgIfDwX3CFRoZXJlZm9yZSwgQW1lcmljYW5idWxscy5jb20gYW5kIEFtZXJpY2FuYnVsbHMuY29tIExMQyBpcyBleGVtcHQgZnJvbSB0aGUgZGVmaW5pdGlvbiBvZiDigJxpbnZlc3RtZW50IGFkdmlzZXLigJ0gYXMgcHJvdmlkZWQgdW5kZXIgU2VjdGlvbiAyMDIoYSkgKDExKSBvZiB0aGUgSW52ZXN0bWVudCBBZHZpc2VycyBBY3Qgb2YgMTk0MCBhbmQgY29ycmVzcG9uZGluZyBzdGF0ZSBzZWN1cml0aWVzIGxhd3MsIGFuZCBoZW5jZSByZWdpc3RyYXRpb24gYXMgc3VjaCBpcyBub3QgcmVxdWlyZWQuIFdlIGFyZSBub3QgYSByZWdpc3RlcmVkIGJyb2tlci1kZWFsZXIuIE1hdGVyaWFsIHByb3ZpZGVkIGJ5IEFtZXJpY2FuYnVsbHMuY29tIExMQyBpcyBmb3IgaW5mb3JtYXRpb25hbCBwdXJwb3NlcyBvbmx5LCBhbmQgdGhhdCBubyBtZW50aW9uIG9mIGEgcGFydGljdWxhciBzZWN1cml0eSBpbiBhbnkgb2Ygb3VyIG1hdGVyaWFscyBjb25zdGl0dXRlcyBhIHJlY29tbWVuZGF0aW9uIHRvIGJ1eSwgc2VsbCwgb3IgaG9sZCB0aGF0IG9yIGFueSBvdGhlciBzZWN1cml0eSwgb3IgdGhhdCBhbnkgcGFydGljdWxhciBzZWN1cml0eSwgcG9ydGZvbGlvIG9mIHNlY3VyaXRpZXMsIHRyYW5zYWN0aW9uIG9yIGludmVzdG1lbnQgc3RyYXRlZ3kgaXMgc3VpdGFibGUgZm9yIGFueSBzcGVjaWZpYyBwZXJzb24uIFRvIHRoZSBleHRlbnQgdGhhdCBhbnkgb2YgdGhlIGluZm9ybWF0aW9uIG9idGFpbmVkIGZyb20gQW1lcmljYW5idWxscy5jb20gTExDIG1heSBiZSBkZWVtZWQgdG8gYmUgaW52ZXN0bWVudCBvcGluaW9uLCBzdWNoIGluZm9ybWF0aW9uIGlzIGltcGVyc29uYWwgYW5kIG5vdCB0YWlsb3JlZCB0byB0aGUgaW52ZXN0bWVudCBuZWVkcyBvZiBhbnkgc3BlY2lmaWMgcGVyc29uLiBBbWVyaWNhbmJ1bGxzLmNvbSBMTEMgZG9lcyBub3QgcHJvbWlzZSwgZ3VhcmFudGVlIG9yIGltcGx5IHZlcmJhbGx5IG9yIGluIHdyaXRpbmcgdGhhdCBhbnkgaW5mb3JtYXRpb24gcHJvdmlkZWQgdGhyb3VnaCBvdXIgd2Vic2l0ZXMsIGNvbW1lbnRhcmllcywgb3IgcmVwb3J0cywgaW4gYW55IHByaW50ZWQgbWF0ZXJpYWwsIG9yIGRpc3BsYXllZCBvbiBhbnkgb2Ygb3VyIHdlYnNpdGVzLCB3aWxsIHJlc3VsdCBpbiBhIHByb2ZpdCBvciBsb3NzLmQCCQ8WAh8PBeMGR292ZXJubWVudCByZWd1bGF0aW9ucyByZXF1aXJlIGRpc2Nsb3N1cmUgb2YgdGhlIGZhY3QgdGhhdCB3aGlsZSB0aGVzZSBtZXRob2RzIG1heSBoYXZlIHdvcmtlZCBpbiB0aGUgcGFzdCwgcGFzdCByZXN1bHRzIGFyZSBub3Qg
bmVjZXNzYXJpbHkgaW5kaWNhdGl2ZSBvZiBmdXR1cmUgcmVzdWx0cy4gV2hpbGUgdGhlcmUgaXMgYSBwb3RlbnRpYWwgZm9yIHByb2ZpdHMgdGhlcmUgaXMgYWxzbyBhIHJpc2sgb2YgbG9zcy4gVGhlcmUgaXMgc3Vic3RhbnRpYWwgcmlzayBpbiBzZWN1cml0eSB0cmFkaW5nLiBMb3NzZXMgaW5jdXJyZWQgaW4gY29ubmVjdGlvbiB3aXRoIHRyYWRpbmcgc3RvY2tzIG9yIGZ1dHVyZXMgY29udHJhY3RzIGNhbiBiZSBzaWduaWZpY2FudC4gWW91IHNob3VsZCB0aGVyZWZvcmUgY2FyZWZ1bGx5IGNvbnNpZGVyIHdoZXRoZXIgc3VjaCB0cmFkaW5nIGlzIHN1aXRhYmxlIGZvciB5b3UgaW4gdGhlIGxpZ2h0IG9mIHlvdXIgZmluYW5jaWFsIGNvbmRpdGlvbiBzaW5jZSBhbGwgc3BlY3VsYXRpdmUgdHJhZGluZyBpcyBpbmhlcmVudGx5IHJpc2t5IGFuZCBzaG91bGQgb25seSBiZSB1bmRlcnRha2VuIGJ5IGluZGl2aWR1YWxzIHdpdGggYWRlcXVhdGUgcmlzayBjYXBpdGFsLiBOZWl0aGVyIEFtZXJpY2FuYnVsbHMuY29tIExMQywgbm9yIEFtZXJpY2FuYnVsbHMuY29tIG1ha2VzIGFueSBjbGFpbXMgd2hhdHNvZXZlciByZWdhcmRpbmcgcGFzdCBvciBmdXR1cmUgcGVyZm9ybWFuY2UuIEFsbCBleGFtcGxlcywgY2hhcnRzLCBoaXN0b3JpZXMsIHRhYmxlcywgY29tbWVudGFyaWVzLCBvciByZWNvbW1lbmRhdGlvbnMgYXJlIGZvciBlZHVjYXRpb25hbCBvciBpbmZvcm1hdGlvbmFsIHB1cnBvc2VzIG9ubHkuZAILDxYCHw8F3wZEaXNwbGF5ZWQgaW5mb3JtYXRpb24gaXMgYmFzZWQgb24gd2lkZWx5LWFjY2VwdGVkIG1ldGhvZHMgb2YgdGVjaG5pY2FsIGFuYWx5c2lzIGJhc2VkIG9uIGNhbmRsZXN0aWNrIHBhdHRlcm5zLiBBbGwgaW5mb3JtYXRpb24gaXMgZnJvbSBzb3VyY2VzIGRlZW1lZCB0byBiZSByZWxpYWJsZSwgYnV0IHRoZXJlIGlzIG5vIGd1YXJhbnRlZSB0byB0aGUgYWNjdXJhY3kuIExvbmctdGVybSBpbnZlc3RtZW50IHN1Y2Nlc3MgcmVsaWVzIG9uIHJlY29nbml6aW5nIHByb2JhYmlsaXRpZXMgaW4gcHJpY2UgYWN0aW9uIGZvciBwb3NzaWJsZSBmdXR1cmUgb3V0Y29tZXMsIHJhdGhlciB0aGFuIGFic29sdXRlIGNlcnRhaW50eSDigJMgcmlzayBtYW5hZ2VtZW50IGlzIGNyaXRpY2FsIGZvciBzdWNjZXNzLiBFcnJvciBhbmQgdW5jZXJ0YWludHkgYXJlIHBhcnQgb2YgYW55IGZvcm0gb2YgbWFya2V0IGFuYWx5c2lzLiBQYXN0IHBlcmZvcm1hbmNlIGlzIG5vIGd1YXJhbnRlZSBvZiBmdXR1cmUgcGVyZm9ybWFuY2UuIEludmVzdG1lbnQvIHRyYWRpbmcgY2FycmllcyBzaWduaWZpY2FudCByaXNrIG9mIGxvc3MgYW5kIHlvdSBzaG91bGQgY29uc3VsdCB5b3VyIGZpbmFuY2lhbCBwcm9mZXNzaW9uYWwgYmVmb3JlIGludmVzdGluZyBvciB0cmFkaW5nLiBZb3VyIGZpbmFuY2lhbCBhZHZpc2VyIGNhbiBnaXZlIHlvdSBzcGVjaWZpYyBmaW5hbmNpYWwgYWR2aWNlIHRoYXQgaXMgYXBwcm9wcmlhdGUgdG8geW91ciBuZWVkcywgcmlzay10
b2xlcmFuY2UsIGFuZCBmaW5hbmNpYWwgcG9zaXRpb24uIEFueSB0cmFkZXMgb3IgaGVkZ2VzIHlvdSBtYWtlIGFyZSB0YWtlbiBhdCB5b3VyIG93biByaXNrIGZvciB5b3VyIG93biBhY2NvdW50LmQCDQ8WAh8PBdsBWW91IGFncmVlIHRoYXQgQW1lcmljYW5idWxscy5jb20gYW5kIEFtZXJpY2FuYnVsbHMuY29tIExMQyBpdHMgcGFyZW50IGNvbXBhbnksIHN1YnNpZGlhcmllcywgYWZmaWxpYXRlcywgb2ZmaWNlcnMgYW5kIGVtcGxveWVlcyBzaGFsbCBub3QgYmUgbGlhYmxlIGZvciBhbnkgZGlyZWN0LCBpbmRpcmVjdCwgaW5jaWRlbnRhbCwgc3BlY2lhbCBvciBjb25zZXF1ZW50aWFsIGRhbWFnZXMuZAIRDxYCHwAFHGJvdHRvbWJhbm5lcmNvbnRhaW5lcl9zYWZhcmlkGAEFHl9fQ29udHJvbHNSZXF1aXJlUG9zdEJhY2tLZXlfXxYIBQ9jdGwwMCRMb2dpbk1lbnUFC2N0bDAwJG1NYWluBQ5jdGwwMCRNYWluTWVudQUWY3RsMDAkRnJlZVJlZ2lzdGVyTWVudQUYY3RsMDAkTWVtYmVyc2hpcEJlbmVmaXRzBRJjdGwwMCRTZWFyY2hCdXR0b24FEWN0bDAwJFN1cHBvcnRNZW51BRNjdGwwMCRMYW5ndWFnZXNNZW51NlBIALTovVw6LJEOuDXyhCTS4+M=
__VIEWSTATEGENERATOR:ECDA716A
__EVENTVALIDATION:/wEdAAVswH4c0JxRe30eXDiX0bhcXr7XOgipC8DNcjKl0sbO7fwNII+YQgXfxmh/KZz6Myr4IcjYoaGuA6R78NuEHgsNQX9+ScDGDIM47zqhQCjs5Ynd+DEUmo0/Xv9Oy6tQgLO7ip/G
ctl00$mMain:{"selectedItemIndexPath":"0i0","checkedState":""}
ctl00$MainMenu:{"selectedItemIndexPath":"","checkedState":""}
ctl00$FreeRegisterMenu:{"selectedItemIndexPath":"","checkedState":""}
ctl00$MembershipBenefits:{"selectedItemIndexPath":"","checkedState":""}
ctl00$SearchBox$State:{"rawValue":"","validationState":""}
ctl00$SearchBox:Enter Symbol
ctl00$MainContent$uEmail:test@test.test
ctl00$MainContent$uPassword:test
ctl00$MainContent$ASPxCheckBox1:I
ctl00$SupportMenu:{"selectedItemIndexPath":"","checkedState":""}
ctl00$LanguagesMenu:{"selectedItemIndexPath":"","checkedState":""}
DXScript:1_304,1_185,1_298,1_211,1_221,1_188,1_182,1_290,1_296,1_279,1_198,1_209,1_217,1_201
DXCss:1_40,1_50,1_53,1_51,1_4,1_16,1_13,0_4617,0_4621,1_14,1_17,Styles/Site.css,img/favicon.ico,https://adservice.google.com/adsid/integrator.js?domain=www.americanbulls.com,https://securepubads.g.doubleclick.net/static/3p_cookie.html
__ASYNCPOST:true
ctl00$MainContent$btnSubmit:Sign In
Your code looks great. It just looks like the script is failing because you're not submitting everything that the browser would normally submit. You could try continuing down the path you are on, submit all of the extra form data, and hope you don't have to bother with adding a CSRF token (a CSRF token is a randomly generated string that you're required to send back), or you can do as Sidharth Shah suggested and use Selenium.
There is a Firefox extension for Selenium that will allow you to start recording your mouse and keyboard actions, and then when you are done, you can export the results in Python. That Python code will depend on the Selenium library and a Selenium Chrome/Firefox/IE driver. When you run your Python code, a new browser window will open up, controlled by the Selenium driver and your Python code. It's pretty cool: you're basically writing Python code that controls a browser window. You will have to modify the Python code that the Firefox extension gives you a little bit to read all of the data from the page and start doing stuff with it after you're logged in, but the code for opening the browser window, navigating to the login page, filling in your login credentials and submitting the form, and navigating to other pages after you're logged in will all be written for you.
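Submitting "everything that the browser would normally submit" usually means reading the hidden inputs (`__VIEWSTATE`, `__EVENTVALIDATION`, and so on) out of the login page first and sending them back together with the credentials. Below is a minimal sketch of that step using only the standard library's `html.parser`. The HTML snippet and its values are invented stand-ins for the real page, and `HiddenInputCollector` and `build_login_payload` are hypothetical names.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_login_payload(page_html, email, password):
    """Merge the page's hidden fields with the visible login fields."""
    collector = HiddenInputCollector()
    collector.feed(page_html)
    payload = dict(collector.fields)
    payload["ctl00$MainContent$uEmail"] = email
    payload["ctl00$MainContent$uPassword"] = password
    payload["ctl00$MainContent$btnSubmit"] = "Sign In"
    return payload


# Invented stand-in for the HTML of the sign-in page.
SAMPLE_HTML = """
<form method="post" action="Signin.aspx?lang=en">
  <input type="hidden" name="__VIEWSTATE" value="dummy-viewstate" />
  <input type="hidden" name="__VIEWSTATEGENERATOR" value="ECDA716A" />
  <input type="hidden" name="__EVENTVALIDATION" value="dummy-validation" />
  <input type="text" name="ctl00$MainContent$uEmail" />
  <input type="password" name="ctl00$MainContent$uPassword" />
</form>
"""

payload = build_login_payload(SAMPLE_HTML, "me@example.com", "secret")
for name in sorted(payload):
    print(name, "=", payload[name])
```

With a real session, the page text would come from something like `session.get(URL).text` and the result would go into `session.post(URL, data=payload)`. Whether this particular site also insists on the remaining script-generated fields is something only a test against the live page can tell.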

How to make HTTP POST on website that uses asp.net?

I'm using Python library requests for this, but I can't seem to be able to log in to this website.
The url is https://www.bet365affiliates.com/ui/pages/affiliates/, and I've been trying post requests to https://www.bet365affiliates.com/Members/CMSitePages/SiteLogin.aspx?lng=1 with the data of "ctl00$MasterHeaderPlaceHolder$ctl00$passwordTextbox", "ctl00$MasterHeaderPlaceHolder$ctl00$userNameTextbox", etc, but I never seem to be able to get logged in.
Could someone more experienced check the page's source code and tell me what I am missing here?
One possible solution is below. Note that you could do this without Selenium: first get the main affiliate page, then pull all the required information out of the response (the fields I gather with XPaths below). I just didn't have enough time to write it purely with requests.
To gather that information from the response data you could use an XML tree library; the same XPath expressions will find all the required fields.
import os
import requests
from selenium import webdriver
Password = 'YOURPASS'
Username = 'YOURUSERNAME'
# Selenium 3 API: the positional driver path and find_element_by_xpath were removed in Selenium 4
browser = webdriver.Chrome(os.getcwd()+"/"+"Chromedriver.exe")
browser.get('https://www.bet365affiliates.com/ui/pages/affiliates/Affiliates.aspx')
# read the value attribute of each hidden field, not the WebElement itself
VIEWSTATE = browser.find_element_by_xpath('//*[@id="__VIEWSTATE"]').get_attribute('value')
SESSIONID = browser.find_element_by_xpath('//*[@id="CMSessionId"]').get_attribute('value')
PREVPAG = browser.find_element_by_xpath('//*[@id="__PREVIOUSPAGE"]').get_attribute('value')
EVENTVALIDATION = browser.find_element_by_xpath('//*[@id="__EVENTVALIDATION"]').get_attribute('value')
cookies = browser.get_cookies()
session = requests.session()
# copy the browser's cookies into the requests session
for cookie in cookies:
    print(cookie['name'])
    print(cookie['value'])
    session.cookies.set(cookie['name'], cookie['value'])
payload = {'ctl00_AjaxScriptManager_HiddenField': '',
           '__EVENTTARGET': 'ctl00$MasterHeaderPlaceHolder$ctl00$goButton',
           '__EVENTARGUMENT': '',
           '__VIEWSTATE': VIEWSTATE,
           '__PREVIOUSPAGE': PREVPAG,
           '__EVENTVALIDATION': EVENTVALIDATION,
           'txtPassword': Password,
           'txtUserName': Username,
           'CMSessionId': SESSIONID,
           'returnURL': '/ui/pages/affiliates/Affiliates.aspx',
           'ctl00$MasterHeaderPlaceHolder$ctl00$userNameTextbox': Username,
           'ctl00$MasterHeaderPlaceHolder$ctl00$passwordTextbox': Password,
           'ctl00$MasterHeaderPlaceHolder$ctl00$tempPasswordTextbox': 'Password'}
session.post('https://www.bet365affiliates.com/Members/CMSitePages/SiteLogin.aspx?lng=1', data=payload)
Did you inspect the HTTP request the browser uses to log you in?
You should replicate it.
FB

Retrieve awesomebox.io scan page content with python-requests

I'm trying to retrieve the page content from https://www.awesomebox.io/scan
But before I can do that I need to be logged in. At the moment I still get the login page content, because the site redirects me when I'm not logged in.
Does anybody know how to get the scan page content with python-requests?
I tried multiple requests authentication methods.
My code so far:
import requests
session = requests.session()
loginURL = 'http://www.awesomebox.io/login'
payload = {'username': '******','password': '******'}
session.post(loginURL, data=payload)
scanURL = "http://awesomebox.io/scan"
scanpage = session.get(scanURL)
print(scanpage.content)
I don't have an account with awesomebox, so I can't say exactly what is needed. But nowadays logging in to a website is usually more sophisticated and secure than a simple POST of username and password.
To find out, you can log in manually, trace the web traffic in the browser's developer tools (e.g. F12 in MSIE or Edge), and store it in a .har file. There you can (hopefully) see how the login procedure is implemented and rebuild the same sequence in your requests session.
Sometimes there is a hidden field in the form (e.g. "lt" for login ticket) that the page has populated via JS beforehand. Sometimes it's even more complex, for instance if a secret login is run via Ajax in the background. In that case you may see nothing in the F12 view and have to dig into the JS scripts.
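A saved .har file is plain JSON, so the form fields the browser actually posted can be listed with a few lines of standard-library code. The sketch below assumes the HAR 1.2 layout (`log.entries[].request` with `method`, `url`, and `postData.params`); the sample data, the example.com URLs, and the function name `list_form_posts` are invented for illustration.

```python
import json


def list_form_posts(har):
    """Return (url, {field: value}) for every POST request recorded in a HAR dict."""
    posts = []
    for entry in har["log"]["entries"]:
        request = entry["request"]
        if request["method"] != "POST":
            continue
        params = request.get("postData", {}).get("params", [])
        fields = {p["name"]: p.get("value", "") for p in params}
        posts.append((request["url"], fields))
    return posts


# Tiny hand-written HAR fragment; a real one would be loaded from the saved file with json.load().
SAMPLE_HAR = json.loads("""
{"log": {"entries": [
  {"request": {"method": "GET", "url": "https://example.com/login"}},
  {"request": {"method": "POST", "url": "https://example.com/login",
               "postData": {"mimeType": "application/x-www-form-urlencoded",
                            "params": [{"name": "username", "value": "me"},
                                       {"name": "password", "value": "secret"},
                                       {"name": "lt", "value": "LT-123"}]}}}
]}}
""")

for url, fields in list_form_posts(SAMPLE_HAR):
    print(url, sorted(fields))
```

Any field that shows up here but is missing from your own payload (such as the hidden `lt` in the sample) is a candidate for what the login is rejecting. Some browsers record the body only as raw `postData.text`, in which case `params` is empty and the text has to be parsed instead.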
Thank you, I noticed i forgot a hidden parameter.
I added the csrfmiddlewaretoken.

Mechanize and Python not handling cookies properly

I have a Python script using mechanize browser which logs into a self hosted Wordpress blog, navigates to a different page after the automatic redirect to the dashboard to automate several builtin functions.
This script actually works 100% on most of my blogs but goes into a permanent loop with one of them.
The difference is that the only one which fails has a plugin called Wassup running. This plugin sets a session cookie for all visitors and this is what I think is causing the issue.
When the script goes to the new page the Wordpress code doesn't get the proper cookie set, decides that the browser isn't logged in and redirects to the login page. The script logs in again and attempts the same function and round we go again.
I tried using Twill which does login correctly and handles the cookies correctly but Twill, by default, outputs everything to the command line. This is not the behaviour I want as I am doing page manipulation at this point and I need access to the raw html.
This is the setup code
# Browser
self.br = mechanize.Browser()
# Cookie Jar
policy = mechanize.DefaultCookiePolicy(rfc2965=True)
cj = mechanize.LWPCookieJar(policy=policy)
self.br.set_cookiejar(cj)
After successful login I call this function
def open(self):
    if 'http://' in str(self.burl):
        site = str(self.burl) + '/wp-admin/plugin-install.php'
        self.burl = self.burl[7:]
    else:
        site = "http://" + str(self.burl) + '/wp-admin/plugin-install.php'
    try:
        r = self.br.open(site, timeout=1000)
        html = r.read()
        return html
    except HTTPError as e:
        return str(e.code)
I'm thinking that I will need to save the cookies to a file and then shuffle the order so the Wordpress session cookie gets returned before the Wassup one.
Any other suggestions?
This turned out to be a quite different problem, and fix, than it seemed which is why I have decided to put the answer here for anyone who reads this later.
When a WordPress site is setup there is an option for the url to default to http://sample.com or http://www.sample.com. This turned out to be a problem for the cookie storage. Cookies are stored with the url as part of their name. My program semi-hardcodes the url with one or the other of these formats. This meant that every time I made a new url request it had the wrong format and no cookie with the right name could be found so the WordPress site rightfully decided I wasn't logged in and sent me back to login again.
The fix is to grab the url delivered in the redirect after login and recode the variable (in this case self.burl) to reflect what the .htaccess file expects to see.
This fixed my problem because some of my sites had one format and some the other.
I hope this helps someone out with using requests, twill, mechanize etc.
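The fix described above, taking the host from the URL the site redirects to after login instead of hardcoding it, can be sketched as follows. This is an illustration in Python 3 and not the original script. `final_url` stands for whatever the library reports as the final URL after redirects (for example `response.geturl()` in mechanize or `response.url` in requests), and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def base_from_final_url(final_url):
    """Return 'scheme://host' of the URL the site redirected to after login."""
    parts = urlsplit(final_url)
    return "{}://{}".format(parts.scheme, parts.netloc)


def admin_page(final_url, path="/wp-admin/plugin-install.php"):
    """Build the admin URL on the same host the login redirect used."""
    return base_from_final_url(final_url) + path


# A blog configured with "www" and one configured without it.
print(admin_page("http://www.sample.com/wp-admin/"))
print(admin_page("http://sample.com/wp-admin/"))
```

Because the follow-up request then uses exactly the host the login cookies were set for, the cookie jar finds them again on either kind of site.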

Python script is scraping the wrong page source. I think it's failing to login properly?

This script succeeds at getting a 200 response object, getting a cookie, and returning reddit's stock homepage source. However, it is supposed to get the source of the "recent activity" subpage which can only be accessed after logging in. This makes me think it's failing to log in appropriately but the username and password are accurate, I've double checked that.
#!/usr/bin/python
import requests
import urllib2
auth = ('username', 'password')
with requests.session(auth=auth) as s:
    c = s.get('http://www.reddit.com')
    cookies = c.cookies
    for k, v in cookies.items():
        opener = urllib2.build_opener()
        opener.addheaders.append(('cookie', '{}={}'.format(k, v)))
    f = opener.open('http://www.reddit.com/account-activity')
    print f.read()
It looks like you're using standard "HTTP Basic" authentication, which is not what Reddit uses to log in to its web site. Almost no web sites use HTTP Basic (the kind that pops up a modal dialog box requesting authentication); they implement their own username/password form instead.
What you'll need to do is get the home page, read the login form fields, fill in the user name and password, POST the response back to the web site, get the resulting cookie, then use the cookie in future requests. There may be quite a number of other details for you to work out too, but you'll have to experiment.
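To see why the `auth` tuple does not log in through a form: HTTP Basic only attaches an `Authorization` header to each request, built as in the sketch below. The function name `basic_auth_header` is made up for illustration; for ASCII credentials the result should match the header an HTTP client sends for Basic auth.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value used by HTTP Basic authentication."""
    raw = "{}:{}".format(username, password).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


print(basic_auth_header("username", "password"))
# Basic dXNlcm5hbWU6cGFzc3dvcmQ=
```

A site that authenticates through a login form ignores this header, so the server still sees an anonymous visitor and serves the public page with a 200 status.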
I just think maybe we're having the same problem. I get status code 200 ok. But the script never logged me in. I'm getting some suggestions and help. Hopefully you'll let me know what works for you too. Seems reddit is using the same system too.
Check out this page where my problem is being discussed.
Authentication issue using requests on aspx site
