Can the Python webbrowser module work with CGI?

I am trying to open a URL from an HTML form submit action using the Python cgi and webbrowser modules. I am able to launch the URL on localhost with
python -m webbrowser -t "http://myhost.com/path/to/some/file.html"
However, with
webbrowser.open(url, new=0, autoraise=True)
in my CGI script, Apache serves a blank page and no errors are printed to the page. The location bar of my Firefox browser shows 'http://myhost.com/my_script.cgi?Submit=Submit' instead of the destination URL.
If the webbrowser module cannot be used in this context, I'm confused about what purpose it actually serves. If I use a URL as the form action, the page is served to the client, but my script needs to process variables passed by a previous HTML form in order to determine the URL and then serve that URL to the client. Let me know what I'm missing, thanks.

I'm pretty sure that, yes, you are confusing what the webbrowser library is supposed to do.
If you call webbrowser inside the CGI script, it will attempt to open a web browser on the server running your script (as whatever user the CGI process runs as). This will probably not work. What I think you want is for your CGI script to return the appropriate HTTP headers to the browser to make it redirect. For example:
import cgi
form = cgi.FieldStorage()
if "Submit" in form:
    url = "http://www.python.org"  # or do whatever processing is needed to calculate the new URL
    print("Status: 302 Found")  # this tells the browser the result is a redirect
    print("Location: " + url)  # this tells the browser where to redirect
    print()  # a blank line ends the HTTP headers
Note that if you were using Django or another framework (and I expect even the cgi module counts), there's probably a higher-level way to do this, but this should work just about anywhere. You could also use a meta refresh tag inside the HTML, etc.
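For illustration, here is a minimal sketch of the two responses the answer mentions, written as plain functions so they can be checked without a web server. It assumes Python 3, and `redirect_response` and `meta_refresh_response` are hypothetical helper names, not part of the cgi module. A CGI script writes its headers to stdout, then a blank line, then the body; the redirect needs no body at all, while the meta refresh variant is an ordinary HTML page that tells the browser to move on.

```python
import html
import sys


def redirect_response(url, status="302 Found"):
    """Raw CGI output for an HTTP redirect: headers, then a blank line."""
    return "Status: {}\nLocation: {}\n\n".format(status, url)


def meta_refresh_response(url):
    """Fallback: an ordinary page whose <meta> tag sends the browser on."""
    safe = html.escape(url, quote=True)  # the URL ends up inside HTML attributes
    body = (
        "<!DOCTYPE html>\n<html><head>"
        '<meta http-equiv="refresh" content="0; url={0}">'
        '</head><body><a href="{0}">Continue</a></body></html>\n'
    ).format(safe)
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # In the real script the URL would be computed from cgi.FieldStorage() values.
    sys.stdout.write(redirect_response("http://www.python.org"))
```

The `Location` redirect is the cleaner option because the browser never renders an intermediate page; the meta refresh version is mainly useful when you cannot control the response headers.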

Related

How to use requests to log in to this website

I'm trying to automate some tasks with Python and web scraping, but first I need to log in to a website I have an account on.
I've seen several examples on Stack Overflow, but for some reason this website won't let me log in using requests. Can anyone tell me what I'm doing wrong?
The webpage:
https://www.americanbulls.com/Signin.aspx?lang=en
the form variables:
ctl00$MainContent$uEmail
ctl00$MainContent$uPassword
Is it because the variable names have '$' in them?
Any help would be greatly appreciated.
import sys
print(sys.path)
sys.path.append(r'C:\program files\python36\lib\site-packages\pip\_vendor')
import requests
import time
EMAIL = '<my_email>'
PASSWORD = '<my_password>'
URL = 'https://www.americanbulls.com/Signin.aspx?lang=en'
# Start a session so we can have persistent cookies
session = requests.Session()
# This is the form data that the page sends when logging in
login_data = {
    'ctl00$MainContent$uEmail': EMAIL,
    'ctl00$MainContent$uPassword': PASSWORD
}
# Authenticate
r = session.post(URL, data=login_data, timeout=15, verify=True)
# Try accessing a page that requires you to be logged in
r = session.get('https://www.americanbulls.com/members/SignalPage.aspx?lang=en&Ticker=SQ')
print(r.url)
I submitted a form using test@test.test as the email and test as the password, and when I looked at the headers of the request I had sent, in the Network tab of Chrome DevTools, it showed that I submitted the following form data:
ctl00$ScriptManager1:ctl00$MainContent$UpdatePanel|ctl00$MainContent$btnSubmit
__LASTFOCUS:
__EVENTTARGET:
__EVENTARGUMENT:
__VIEWSTATE:/wEPDwULLTE5MzMzODAyNzIPZBYCZg9kFgICAQ9kFgICAw9kFgICBQ9k… (several kilobytes of base64-encoded view state, truncated here)
__VIEWSTATEGENERATOR:ECDA716A
__EVENTVALIDATION:/wEdAAVswH4c0JxRe30eXDiX0bhcXr7XOgipC8DNcjKl0sbO7fwNII+YQgXfxmh/KZz6Myr4IcjYoaGuA6R78NuEHgsNQX9+ScDGDIM47zqhQCjs5Ynd+DEUmo0/Xv9Oy6tQgLO7ip/G
ctl00$mMain:{"selectedItemIndexPath":"0i0","checkedState":""}
ctl00$MainMenu:{"selectedItemIndexPath":"","checkedState":""}
ctl00$FreeRegisterMenu:{"selectedItemIndexPath":"","checkedState":""}
ctl00$MembershipBenefits:{"selectedItemIndexPath":"","checkedState":""}
ctl00$SearchBox$State:{"rawValue":"","validationState":""}
ctl00$SearchBox:Enter Symbol
ctl00$MainContent$uEmail:test@test.test
ctl00$MainContent$uPassword:test
ctl00$MainContent$ASPxCheckBox1:I
ctl00$SupportMenu:{"selectedItemIndexPath":"","checkedState":""}
ctl00$LanguagesMenu:{"selectedItemIndexPath":"","checkedState":""}
DXScript:1_304,1_185,1_298,1_211,1_221,1_188,1_182,1_290,1_296,1_279,1_198,1_209,1_217,1_201
DXCss:1_40,1_50,1_53,1_51,1_4,1_16,1_13,0_4617,0_4621,1_14,1_17,Styles/Site.css,img/favicon.ico,https://adservice.google.com/adsid/integrator.js?domain=www.americanbulls.com,https://securepubads.g.doubleclick.net/static/3p_cookie.html
__ASYNCPOST:true
ctl00$MainContent$btnSubmit:Sign In
Your code looks great. It just looks like the script is failing because you're not submitting everything that the browser would normally submit. You could continue down the path you are on, submit all of the extra form data, and hope you don't have to bother with a CSRF token (a randomly generated string the server gives you and requires you to send back), or you can do as Sidharth Shah suggested and use Selenium.
There is a Firefox extension for Selenium (Selenium IDE) that lets you record your mouse and keyboard actions and then, when you are done, export the result as Python code. That Python code depends on the Selenium library and a Selenium Chrome/Firefox/IE driver. When you run it, a new browser window opens, controlled by the Selenium driver and your Python code. It's pretty cool: you're basically writing Python code that controls a browser window. You will have to modify the code the extension gives you a little, so that it reads the data from the page and does something with it after you're logged in, but the code for opening the browser window, navigating to the login page, filling in your credentials, submitting the form and navigating to other pages after you're logged in will all be written for you.
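The first option above, submitting everything the browser submits, can be sketched as follows. ASP.NET pages like this one embed server-generated hidden inputs such as `__VIEWSTATE` and `__EVENTVALIDATION` that the server expects back, so they are best read from a fresh GET of the login page rather than copied from dev tools. This is a sketch under assumptions: `session` is a `requests.Session`, `hidden_fields` and `login` are hypothetical helper names, the field names come from the captured form data above, and it has not been tested against this site, which may also insist on the DevExpress (`DX...`) or async-postback fields.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        a = dict(attrs)
        if (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.fields[a["name"]] = a.get("value") or ""


def hidden_fields(page_html):
    parser = HiddenFieldParser()
    parser.feed(page_html)
    return parser.fields


def login(session, url, email, password):
    """GET the form, copy its hidden fields, add the credentials, POST it back.

    Assumption: session is a requests.Session; the site may need more fields.
    """
    form = hidden_fields(session.get(url, timeout=15).text)
    form.update({
        "ctl00$MainContent$uEmail": email,
        "ctl00$MainContent$uPassword": password,
        "ctl00$MainContent$btnSubmit": "Sign In",
    })
    return session.post(url, data=form, timeout=15)
```

Usage would be along the lines of `s = requests.Session()`, `login(s, URL, EMAIL, PASSWORD)`, then `s.get(...)` for the members page, reusing the same session so its cookies are sent.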

python requests: url with another sub-requests

When I open this URL manually in my web browser, I see in the network console that three other requests are executed, and it works.
call: www.my.url/publish_something
get this cmd
get that cmd
post that...
How can I do this with Python requests, so that I call the "main" URL only once and it runs all the sub-requests, as my web browser does?
publish_url = "www.my.url/publish_something"
r = self.session.get(publish_url, verify=False, params=p)
It seems that when I call this URL with the Python requests module, it does not execute the sub-requests.
When you open a URL in your browser, the browser:
- issues a GET request to that URL,
- parses the content,
- issues GET requests for each image tag and for each script, style, etc. tag that references an external source,
- executes the scripts, which may lead to more sub-requests and DOM modifications,
- and finally renders the final DOM.
When you send a GET request with Python (with python-requests, the urllib module or whatever), only the first of the above stages is performed, so if you want more you'll have to do it yourself (parsing the content, retrieving images, etc.).
Or you can use a headless browser like PhantomJS.
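Doing the third step yourself could look roughly like the sketch below: parse the page, collect the URLs of images, scripts, stylesheets and frames, and resolve them against the page URL. It uses only the standard library (`html.parser`, `urllib.parse.urljoin`); `sub_resources` is a hypothetical helper name, and the tag list is a rough approximation (a `<link href>` may also be an icon or canonical link). It does not execute JavaScript, so requests that scripts make, which is likely what the "get this cmd" / "post that" calls in the question are, will not be found this way; those have to be read from the network console and sent explicitly with the same session.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceParser(HTMLParser):
    """Collect the static sub-resources a browser would fetch after the page."""

    # (tag, attribute) pairs that point at a sub-resource
    WATCHED = {("img", "src"), ("script", "src"), ("link", "href"), ("iframe", "src")}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in self.WATCHED and value:
                # Relative references are resolved against the page's own URL
                self.urls.append(urljoin(self.base_url, value))


def sub_resources(page_html, base_url):
    parser = ResourceParser(base_url)
    parser.feed(page_html)
    return parser.urls
```

With requests, `r = session.get(url)` followed by `for u in sub_resources(r.text, r.url): session.get(u)` would replay the static sub-requests.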

Logging into a website and retrieving HTML with Python

I need to log into a website to access its html on a login-protected page for a project I'm doing.
I'm using this person's answer with the values I need:
from twill.commands import *
go('https://example.com/login')
fv("3", "email", "myemail@example.com")
fv("3", "password", "mypassword")
submit()
Presumably this should log me in, so I then run:
import urllib
sock = urllib.urlopen("https://www.example.com/activities")
html_source = sock.read()
sock.close()
print html_source
I thought this would print the HTML of the (now) accessible page, but instead it just gives me the HTML of the login page. I've tried other methods (e.g. mechanize) but I get the same result.
What am I missing? Do some sites restrict this type of login or does it not work with https or something? (The site is FitBit, since I couldn't use the url in the question)
You're using one library to log in and another to retrieve the subsequent page. twill and urllib do not share data about your session. If you split the work like that, you need to manage the session cookie / authentication yourself: specifically, you'll need to copy the cookie and related data from the first library and add them to the post-login request made with the other.
Otherwise, and more logically, use the same library for both the login and the post-login requests.
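As a sketch of "use the same library for both", here is a standard-library version: a single `urllib.request` opener with an `HTTPCookieProcessor`, so the session cookie set by the login response is sent with the next request. The plain `email`/`password` POST is an assumption borrowed from the twill `fv` calls in the question; a real login form may need extra hidden fields, and `make_session_opener` and `login_and_fetch` are hypothetical helper names. With requests, a single `requests.Session` plays the same role.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session_opener():
    """One opener plus the cookie jar it uses for every request it makes."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login_and_fetch(opener, login_url, protected_url, email, password):
    """Log in, then fetch a protected page through the same opener.

    Assumption: the form can be submitted as a plain POST of "email" and
    "password" to login_url; real sites often need more fields.
    """
    data = urllib.parse.urlencode({"email": email, "password": password}).encode()
    opener.open(login_url, data=data, timeout=15).close()  # cookies land in the jar
    with opener.open(protected_url, timeout=15) as resp:  # same jar, same session
        return resp.read().decode("utf-8", errors="replace")
```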

How to access netgear router web interface

What I am trying to do is access the traffic meter data on my local Netgear router. It's easy enough to log in to it and click the link, but ideally I would like a little app that sits in the system tray (Windows) that I can check whenever I want to see my network traffic.
I'm using Python to try to access the router's web page, but I've run into some snags. I originally tried modifying a script that reboots the router (found here https://github.com/ncw/router-rebooter/blob/master/router_rebooter.py), but it just serves up the raw HTML, and I need the page after the onload JavaScript functions have run. This kind of thing is described in many posts about web scraping, and people suggested using Selenium.
I tried selenium and have run into two problems. First, it actually opens the browser window, which is not what I want. Second, it skips the stuff I put in to pass the HTTP authentication and pops up the login window anyway. Here is the code:
from selenium import webdriver
baseAddress = '192.168.1.1'
baseURL = 'http://%(user)s:%(pwd)s@%(host)s/traffic_meter.htm'
username = 'admin'
pwd = 'thisisnotmyrealpassword'
url = baseURL % {
    'user': username,
    'pwd': pwd,
    'host': baseAddress
}
profile = webdriver.FirefoxProfile()
profile.set_preference('network.http.phishy-userpass-length', 255)
driver = webdriver.Firefox(firefox_profile=profile)
driver.get(url)
So, my question is, what is the best way to accomplish what I want without having it launch a visible web browser window?
Update:
Okay, I tried sircapsalot's suggestion and modified the script to this:
from selenium import webdriver
from contextlib import closing
url = 'http://admin:notmyrealpassword@192.168.1.1/start.htm'
with closing(webdriver.Remote(desired_capabilities=webdriver.DesiredCapabilities.HTMLUNIT)) as driver:
    driver.get(url)
    print(driver.page_source)
This stops the web browser window from opening, but authentication fails. Any suggestions?
Okay, I found the solution, and it was much easier than I thought. I did try John1024's suggestion and was able to download the proper web page from the router using wget. However, I didn't like that wget saved the result to a file, which I would then have to open and parse.
I ended up going back to the original router_rebooter.py script I had unsuccessfully tried to modify the first time. My problem was that I was trying to make it too complicated. This is the final script I ended up using:
import urllib2
user = 'admin'
pwd = 'notmyrealpassword'
host = '192.168.1.1'
url = 'http://' + host + '/traffic_meter_2nd.htm'
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, host, user, pwd)
authhandler = urllib2.HTTPBasicAuthHandler(passman)
opener = urllib2.build_opener(authhandler)
response = opener.open(url)
stuff = response.read()
response.close()
print stuff
This prints out the entire traffic meter web page from my router, with its proper values loaded. I can then parse the values out of it. The nice thing about this is that it has no external dependencies like Selenium, wget or other libraries that need to be installed. Clean is good.
Thank you, everyone, for your suggestions. I wouldn't have gotten to this answer without them.
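The script above is Python 2 (`urllib2`, `print` statement). A rough Python 3 equivalent using the same standard-library approach is sketched below; `make_router_opener` and `fetch_page` are hypothetical names, and registering the credentials for `http://<host>/` assumes the router uses HTTP Basic authentication over plain HTTP, as the original script does.

```python
import urllib.request


def make_router_opener(host, user, pwd):
    """Opener that answers the router's Basic-auth challenge automatically."""
    top_level_url = "http://" + host + "/"
    passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None: use these credentials for any realm under top_level_url
    passman.add_password(None, top_level_url, user, pwd)
    handler = urllib.request.HTTPBasicAuthHandler(passman)
    return urllib.request.build_opener(handler), passman


def fetch_page(opener, url, timeout=10):
    with opener.open(url, timeout=timeout) as response:
        # Router pages are not always UTF-8; replace undecodable bytes
        return response.read().decode("utf-8", errors="replace")
```

Usage: `opener, _ = make_router_opener("192.168.1.1", "admin", "notmyrealpassword")`, then `html = fetch_page(opener, "http://192.168.1.1/traffic_meter_2nd.htm")`.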
The web interface for my Netgear router (WNDR3700) is also filled with javascript. Yours may differ but I have found that my scripts can get all the info they need without javascript.
The first step is finding the correct URL. Using Firefox, I went to the traffic page and then used "This Frame -> Show only this frame" to discover that the URL for the traffic page on my router is:
http://my_router_address/traffic.htm
After finding this URL, no web browser and no javascript is needed. I can, for example, capture this page with wget:
wget http://my_router_address/traffic.htm
Using a text editor on the resulting traffic.htm file, I see that the traffic data is available in a lengthy block that starts:
var traffic_today_time="1486:37";
var traffic_today_up="1,959";
var traffic_today_down="1,945";
var traffic_today_total="3,904";
. . . .
Thus, the traffic.htm file can be easily captured and parsed with the scripting language of your choice. No javascript ever needs to be executed.
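For example, in Python the values can be pulled out with a single regular expression over the downloaded text. A sketch, assuming the variables look exactly like the `var traffic_...="...";` lines shown above (other models or firmware versions may name or format them differently); `parse_traffic` is a hypothetical helper name.

```python
import re

# Matches lines such as: var traffic_today_up="1,959";
TRAFFIC_VAR = re.compile(r'var\s+(traffic_\w+)\s*=\s*"([^"]*)"')


def parse_traffic(page_text):
    """Return {variable_name: value_string} for every traffic_* variable found."""
    return dict(TRAFFIC_VAR.findall(page_text))
```

The values keep their thousands separators, so something like `int(d["traffic_today_up"].replace(",", ""))` turns "1,959" into 1959.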
UPDATE: I have a ~/.netrc file with a line in it like:
machine my_router_address login someloginname password somepassword
Before wget downloads from the router, it retrieves the login info from this file. This has security advantages. If one runs wget http://name#password..., then the password is viewable to all on your machine via the process list (ps a). Using .netrc, this never happens. Restrictive permissions can be set on .netrc, e.g. readable only by user (chmod 400 ~/.netrc).
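Python's standard `netrc` module reads the same file format, so a Python script can also take the router credentials from `~/.netrc` instead of hard-coding them. A sketch; `router_credentials` and `netrc_is_private` are hypothetical helper names, and the permission check is only meaningful on POSIX systems.

```python
import netrc
import os
import stat


def router_credentials(host, netrc_path=None):
    """Return (login, password) for host from a .netrc file, or None.

    netrc_path=None makes Python read ~/.netrc, the same file wget uses;
    a missing file raises FileNotFoundError.
    """
    auth = netrc.netrc(netrc_path).authenticators(host)
    if auth is None:
        return None
    login, _account, password = auth
    return login, password


def netrc_is_private(path):
    """POSIX check: True if group and others have no access (e.g. chmod 400)."""
    mode = os.stat(path).st_mode
    return not mode & (stat.S_IRWXG | stat.S_IRWXO)
```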

Mechanize and Python not handling cookies properly

I have a Python script using a mechanize browser that logs into a self-hosted WordPress blog and, after the automatic redirect to the dashboard, navigates to a different page to automate several built-in functions.
This script actually works 100% on most of my blogs but goes into a permanent loop with one of them.
The difference is that the only one which fails has a plugin called Wassup running. This plugin sets a session cookie for all visitors and this is what I think is causing the issue.
When the script goes to the new page, the WordPress code doesn't see the proper cookie, decides that the browser isn't logged in and redirects to the login page. The script logs in again, attempts the same function, and round we go again.
I tried using Twill which does login correctly and handles the cookies correctly but Twill, by default, outputs everything to the command line. This is not the behaviour I want as I am doing page manipulation at this point and I need access to the raw html.
This is the setup code
# Browser
self.br = mechanize.Browser()
# Cookie Jar
policy = mechanize.DefaultCookiePolicy(rfc2965=True)
cj = mechanize.LWPCookieJar(policy=policy)
self.br.set_cookiejar(cj)
After successful login I call this function
def open(self):
    if 'http://' in str(self.burl):
        site = str(self.burl) + '/wp-admin/plugin-install.php'
        self.burl = self.burl[7:]
    else:
        site = "http://" + str(self.burl) + '/wp-admin/plugin-install.php'
    try:
        r = self.br.open(site, timeout=1000)
        html = r.read()
        return html
    except HTTPError, e:
        return str(e.code)
I'm thinking that I will need to save the cookies to a file and then shuffle the order so the Wordpress session cookie gets returned before the Wassup one.
Any other suggestions?
This turned out to be a quite different problem, and fix, than it seemed which is why I have decided to put the answer here for anyone who reads this later.
When a WordPress site is set up, there is an option for the URL to default to http://sample.com or http://www.sample.com. This turned out to be a problem for cookie storage: cookies are stored per host, so (unless a Domain attribute says otherwise) a cookie set for one form of the address is not sent to the other. My program semi-hardcodes the URL in one or the other of these formats. This meant that every time I made a new URL request in the wrong format, no matching cookie could be found, so the WordPress site rightfully decided I wasn't logged in and sent me back to log in again.
The fix is to grab the URL delivered in the redirect after login and reset the variable (in this case self.burl) to reflect what the .htaccess file expects to see.
This fixed my problem because some of my sites had one format and some the other.
I hope this helps someone out with using requests, twill, mechanize, etc.
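A sketch of that fix, assuming you can get the URL the post-login redirect ended on (mechanize's `Browser.geturl()` returns the URL of the current page): keep only its scheme and host, and build every later URL from that instead of from a hard-coded base. `site_root` and `admin_url` are hypothetical helper names.

```python
from urllib.parse import urlsplit, urlunsplit


def site_root(final_url):
    """Scheme + host of the URL the login redirect actually landed on.

    If a blog redirects to http://www.sample.com/wp-admin/, this returns
    "http://www.sample.com", the host its cookies were set for.
    """
    parts = urlsplit(final_url)
    return urlunsplit((parts.scheme, parts.netloc, "", "", ""))


def admin_url(root, page):
    """Build a wp-admin URL on the same host as root."""
    return root.rstrip("/") + "/wp-admin/" + page.lstrip("/")
```

In the script above that would mean something like `root = site_root(self.br.geturl())` right after logging in, and then `self.br.open(admin_url(root, "plugin-install.php"))`.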
