How do I edit the url command for chromium browsers - python

I want to redirect websites in my chromium browser but I'm not sure how to do so. I want to add code so that when I go to www.google.com, the browser redirects to www.stackoverflow.com.
I tried looking online but I couldn't find anything related to this. I tried creating a browser in pyqt5 and I was able to do this with the commands:
def navigate(self,url): if "google.com" in url: url= "http://stackoverflow.com"
Can someone give me the equivalent for chromium?

Related

Log in to website using Python and Requests module?

I'm writing an AliExpress web scraper using Python and the Requests module along with BeautifulSoup and I got it working well, however I've run into a problem - I get redirected to a login page randomly. My solution to this is to simply log in at the start of my session before scraping, but I don't know how to log in.
The login page (https://login.aliexpress.com) requires only the username and password, but when I try to enter them with my code and test to see if I'm logged in by going to https://home.aliexpress.com/index.htm and looking at the html, it fails as it redirects me back to the login page.
My code after trying multiple solutions to no avail:
import requests
LOGIN_URL = "https://login.aliexpress.com/"
LOGIN_INFO = {
"loginId": "myemail#email.com",
"password": "mypassword"
}
with requests.Session() as sess:
#go to login page
sess.get(LOGIN_URL)
#attempt to log in with my login info
sess.post(LOGIN_URL, data=LOGIN_INFO)
#go to 'My AliExpress' page to verify successful login
success = sess.get("https://home.aliexpress.com/index.htm")
#manually check html to see if I was sent to the login page again
print(success.text)
This is pretty much what's left after my many failed attempts. Some of the things I've tried are:
Looking at the cookie after the 'sess.get(LOGIN_URL)', it returns
this but I don't know what to do with it (in key:value format):
ali_apache_tracktmp :
ali_apache_track :
xman_f :
t52Eyo+p3qf6E6fdmL5yJ81g2icRn+2PYjjrWYHlqlDyXAixo92Z5KHMZV8SCV7vP4ZjxEmuTQesVWkqxUi3SpFU1qbRyNRd+d0pIIKVhrIDri2oaWrt6A==
JSESSIONID : 30678741D7473C80BEB85825718FB1C6
acs_usuc_t :
acs_rt=343aef98b0ca4ae79497e31b11c82c29&x_csrf=1b5g78e7fz2rt
xman_us_f : x_l=0
ali_apache_id : 23.76.146.14.1510893827939.187695.4
xman_t :
PSIYMbKN2UyuejZBfmP9o5hdmQGoSB0UL0785LnRBxW0bdbdMmtW2A47hHbgTgD7TmFp7QVsOW4kXTsXMncy+iKisKfqagqb4yPxOVFdw+k=
Tried looking for a csrf token and only found the text after '_csrf=' in the 5th bullet above. Tried using it and it didn't work.
Looked at the html form sent when you log in but I don't know html and can only recognize it has a lot more fields than the ones I've seen other people use for other websites (Image of Form Data from Chrome here).
Changing the "myPassword" in my code to the text in the password2 field in image above and changing the "password" key to "password2" too.
Googled for a few hours but didn't find anything that would work.
At this point, I'm at my wits end, so any help on how to proceed would be very much appreciated. I'm not the best coder (still learning), don't know html outside of what I've learned from a few tutorials about scraping, and was hoping to figure it out myself, but hours later I still haven't solved it and realized I could really use the help.
I'm using python 3.5. If there's any more info needed, let me know. Brain is just about turned completely to mush after being stuck and awake for so long.
I have a suspicion this will not work the way you want it to.
Even after somehow accomplishing the login prompt, the following page presents a "slider verification" which to my knowledge requests is unable to do anything about. (If there is a method please let me know).
I have been trying to use cookies instead:
session = requests.Session()
cj = requests.cookies.RequestsCookieJar()
cj.set('KEY', 'VALUE')
session.cookies = cj
response = session.get(url, timeout=5, headers=headers, proxies=proxies)
Previously the scraper worked using headers and proxies for a time, but recently it always prompts a login.
I have tried all the keys and values in the cookies as well to no avail.
An idea would be to use selenium to login and capture cookies, then pass it to requests session.
AntoG has a solution to do this:
https://stackoverflow.com/a/42114843

How to use requests to login to this website

I'm trying to automate some tasks with python, and webscraping. but first, I need to login to a website I have an account on.
I've seen several examples on stack overflow, but for some reason, this website won't let me login using requests. Can anyone tell me what I'm doing wrong?
The webpage:
https://www.americanbulls.com/Signin.aspx?lang=en
the form variables:
ctl00$MainContent$uEmail
ctl00$MainContent$uPassword
Is it the variable names have '$' in them?
Any help would be greatly appreciated.
import sys
print(sys.path)
sys.path.append('C:\program files\python36\lib\site-packages\pip\_vendor')
import requests
import sys
import time
EMAIL = '<my_email>'
PASSWORD = '<my_password>'
URL = 'https://www.americanbulls.com/Signin.aspx?lang=en'
# Start a session so we can have persistant cookies
session = requests.session()
#This is the form data that the page sends when logging in
login_data = {
'ctl00$MainContent$uEmail': EMAIL,
'ctl00$MainContent$uPassword': PASSWORD
}
# Authenticate
r = session.post(URL, data=login_data, timeout=15, verify=True)
# Try accessing a page that requires you to be logged in
r = session.get('https://www.americanbulls.com/members/SignalPage.aspx?lang=en&Ticker=SQ')
print(r.url)
I submitted a form using test#test.test as the email and test as the password, and when I looked at the headers of the request I'd sent in the network tab of chrome dev tools it said I submitted the following form data.
ctl00$ScriptManager1:ctl00$MainContent$UpdatePanel|ctl00$MainContent$btnSubmit
__LASTFOCUS:
__EVENTTARGET:
__EVENTARGUMENT:
__VIEWSTATE:/wEPDwULLTE5MzMzODAyNzIPZBYCZg9kFgICAQ9kFgICAw9kFgICBQ9kFhICAQ8WAh4FY2xhc3MFFmhlYWRlcmNvbnRhaW5lcl9zYWZhcmkWCgIBDzwrAAkCAA8WAh4OXyFVc2VWaWV3U3RhdGVnZAYPZBAWAmYCARYCPCsADAEAFgYeC05hdmlnYXRlVXJsBRVSZWdpc3Rlci5hc3B4P2xhbmc9ZW4eBFRleHQFCFJlZ2lzdGVyHgdUb29sVGlwBTFSZWdpc3RlciBub3cgdG8gZ2V0IGFjY2VzcyB0byBleGNsdXNpdmUgZmVhdHVyZXMhPCsADAEAFgYfAwUHU2lnbiBJbh8CBRNTaWduaW4uYXNweD9sYW5nPWVuHghTZWxlY3RlZGdkZAIDDw8WBB8CBRREZWZhdWx0LmFzcHg/bGFuZz1lbh4ISW1hZ2VVcmwFGH4vaW1nL2FtZXJpY2FuYnVsbHMxLmdpZmRkAgcPZBYCAgEPPCsACQIADxYCHwFnZAYPZBAWAWYWATwrAAwCABYCHwMFB0VuZ2xpc2gBD2QQFghmAgECAgIDAgQCBQIGAgcWCDwrAAwCABYGHwMFB0VuZ2xpc2gfAgUUL1NpZ25pbi5hc3B4P2xhbmc9ZW4fBWcCFCsAAhYCHgNVcmwFEn4vaW1nL2VuaWNvbjAxLnBuZ2Q8KwAMAgAWBB8DBQdEZXV0c2NoHwIFFC9TaWduaW4uYXNweD9sYW5nPWRlAhQrAAIWAh8HBRJ+L2ltZy9kZWljb24wMS5wbmdkPCsADAIAFgQfAwUG5Lit5paHHwIFFC9TaWduaW4uYXNweD9sYW5nPXpoAhQrAAIWAh8HBRJ+L2ltZy96aGljb24wMS5wbmdkPCsADAIAFgQfAwUJRnJhbsOnYWlzHwIFFC9TaWduaW4uYXNweD9sYW5nPWZyAhQrAAIWAh8HBRJ+L2ltZy9mcmljb24wMS5wbmdkPCsADAIAFgQfAwUIVMO8cmvDp2UfAgUUL1NpZ25pbi5hc3B4P2xhbmc9dHICFCsAAhYCHwcFEn4vaW1nL3RyaWNvbjAxLnBuZ2Q8KwAMAgAWBB8DBQlJbmRvbmVzaWEfAgUUL1NpZ25pbi5hc3B4P2xhbmc9aWQCFCsAAhYCHwcFEn4vaW1nL2lkaWNvbjAxLnBuZ2Q8KwAMAgAWBB8DBQhFc3Bhw7FvbB8CBRQvU2lnbmluLmFzcHg/bGFuZz1lcwIUKwACFgIfBwUSfi9pbWcvZXNpY29uMDEucG5nZDwrAAwCABYEHwMFCEl0YWxpYW5vHwIFFC9TaWduaW4uYXNweD9sYW5nPWl0AhQrAAIWAh8HBRJ+L2ltZy9pdGljb24wMS5wbmdkZGRkAgkPZBYCAgEPPCsABAEADxYEHgVWYWx1ZQULTGFzdCBVcGRhdGUeB1Zpc2libGVoZGQCCw9kFgICAQ88KwAEAQAPFgIfCWhkZAIDDxYCHwAFGG1haW5tZW51Y29udGFpbmVyX3NhZmFyaRYGAgEPPCsACQIADxYCHwFnZAYPZBAWDWYCAQICAgMCBAIFAgYCBwIIAgkCCgILAgwWDTwrAAwBABYEHwIFFERlZmF1bHQuYXNweD9sYW5nPWVuHwMFBEhPTUU8KwAMAQAWBB8DBQRBTUVYHwIFKVNpZ25hbExpc3QuYXNweD9sYW5nPWVuJk1hcmtldFN5bWJvbD1BTUVYPCsADAEAFgQfAwUETllTRR8CBSlTaWduYWxMaXN0LmFzcHg/bGFuZz1lbiZNYXJrZXRTeW1ib2w9TllTRTwrAAwBABYEHwMFBk5BU0RBUR8CBStTaWduYWxMaXN0LmFzcHg/bGFuZz1lbiZNYXJrZXRTeW1ib2w9TkFTREFRPCsADAEAFgQfAwUIT1RDIFBJTksfAgUpU2lnbmFsTGlzdC5hc3B4P2xhbmc9ZW4mTWFya2V0U3ltYm9sPVBJTks8KwAMAQAWBB8DBQlQUkVGRVJSRUQfAgUuU2lnbmFsTGlzdC5hc3B4P2xhbmc9ZW4mTWFya2V0U3ltYm9sPVBSRUZFUlJFRDwrAAwBABYEHwMFCFdBUlJBTlRTHwIFLVNpZ25hbExpc3QuYXNweD9sYW5nPWVuJk1hcmtldFN5bWJvbD1XQVJSQU5UUzwrAAwBABYEHwMFB0lOREVYRVMfAgUcSW5kZXhTaWduYWxMaXN0LmFzcHg/bGFuZz1lbjwrAAwCABYEHwMFAmZ4HwIFGVNpZ25hbExpc3RGWC5hc3B4P2xhbmc9ZW4KPCsADgEAFgYeCUZvcmVDb2xvcgpgHgtGb250X0l0YWxpY2ceBF8hU0IChCA8KwAMAQAWAh8JaDwrAAwBABYCHwloPCsADAEAFgIfCWg8KwAMAQAWAh8JaGRkAgMPFCsABA8WBB8IBRRTdXBwb3J0LmFzcHg/bGFuZz1lbh8JaGRkZDwrAAUBABYCHwMFBEhlbHBkAgUPZBYCAgMPPCsABAEADxYCHwgFJmh0dHBzOi8vd3d3LnR3aXR0ZXIuY29tL2FtZXJpY2FuX0J1bGxzZGQCBQ8WAh8ABRdzdWJtZW51Y29udGFpbmVyX3NhZmFyaRYKAgEPPCsACQIADxYCHwFnZAYPZBAWAWYWATwrAAwBABYEHwIFFVJlZ2lzdGVyLmFzcHg/bGFuZz1lbh8DBTFSZWdpc3RlciBub3cgdG8gZ2V0IGFjY2VzcyB0byBleGNsdXNpdmUgZmVhdHVyZXMhZGQCAw88KwAJAgAPFgQfAWcfCWhkBg9kEBYBZhYBPCsADAEAFgQfAgUTU2lnbmluLmFzcHg/bGFuZz1lbh8DBQdTaWduIEluZGQCBQ88KwAJAgAPFgIfAWdkBg9kEBYBZhYBPCsADAEAFgQfAgUfTWVtYmVyc2hpcEJlbmVmaXRzLmFzcHg/bGFuZz1lbh8DBRNNZW1iZXJzaGlwIEJlbmVmaXRzZGQCBw88KwAJAQAPFgQfAWcfCWhkZAILDzwrAAYBAzwrAAgBABYCHghOdWxsVGV4dAUMRW50ZXIgU3ltYm9sZAIHDxYCHwAFEGNvbnRhaW5lcl9zYWZhcmkWAgIBD2QWAgIBD2QWAgIBD2QWAgIDD2QWAmYPZBYcAgEPPCsABAEADxYCHwgFB1NpZ24gSW5kZAIDDzwrAAQBAA8WAh8IBQlOZXcgVXNlcj9kZAIFDxQrAAQPFgIfCAUVUmVnaXN0ZXIuYXNweD9sYW5nPWVuZGRkPCsABQEAFgIfAwUIUmVnaXN0ZXJkAgcPPCsABAEADxYCHwhlZGQCCQ88KwAEAQAPFgIfCAUFRW1haWxkZAINDw8WAh4MRXJyb3JNZXNzYWdlBQ1JbnZhbGlkIGVtYWlsZGQCDw8PFgIfDgUNSW52YWxpZCBlbWFpbGRkAhEPPCsABAEADxYCHwgFCFBhc3N3b3JkZGQCFQ8PFgIfDgUUUGFzc3dvcmQgaXMgcmVxdWlyZWRkZAIXDzwrAAQBAA8WAh8IBQtSZW1lbWJlciBNZWRkAhsPDxYCHwMFB1NpZ24gSW5kZAIdDw8WAh8DBQZDYW5jZWxkZAIfDzwrAAQBAA8WAh8IBShJZiB5b3UgY2Fubm90IHJlYWNoIHlvdXIgYWNjb3VudCwgcGxlYXNlZGQCIQ8UKwAEDxYCHwgFGVNlbmRQYXNzd29yZC5hc3B4P2xhbmc9ZW5kZGQ8KwAFAQAWAh8DBQtjbGljayBoZXJlLmQCCQ8WAh8ABRB3aGl0ZWJhbnRfc2FmYXJpZAILDxYCHwAFG3N1cHBvcnRtZW51Y29udGFpbmVyX3NhZmFyaRYCAgEPPCsACQIADxYCHwFnZAYPZBAWBmYCAQICAgMCBAIFFgY8KwAMAQAWBB8DBQhBYm91dCBVcx8CBRRBYm91dFVzLmFzcHg/bGFuZz1lbjwrAAwBABYEHwMFB1N1cHBvcnQfAgUUU3VwcG9ydC5hc3B4P2xhbmc9ZW48KwAMAQAWBB8DBQdQcml2YWN5HwIFFFByaXZhY3kuYXNweD9sYW5nPWVuPCsADAEAFgQfAwUDVE9THwIFEFRvcy5hc3B4P2xhbmc9ZW48KwAMAQAWBB8DBRNNZW1iZXJzaGlwIEJlbmVmaXRzHwIFH01lbWJlcnNoaXBCZW5lZml0cy5hc3B4P2xhbmc9ZW48KwAMAQAWBB8DBQ9JbXBvcnRhbnQgTGlua3MfAgUbSW1wb3J0YW50TGlua3MuYXNweD9sYW5nPWVuZGQCDQ8WAh8ABRdmb290ZXJjb250YWluZXIxX3NhZmFyaRYCAgEPPCsACQIADxYCHwFnZAYPZBAWCGYCAQICAgMCBAIFAgYCBxYIPCsADAEAFgYfAwUHRW5nbGlzaB8FZx8CBRZ+Ly9TaWduaW4uYXNweD9sYW5nPWVuPCsADAEAFgQfAwUHRGV1dHNjaB8CBRZ+Ly9TaWduaW4uYXNweD9sYW5nPWRlPCsADAEAFgQfAwUG5Lit5paHHwIFFn4vL1NpZ25pbi5hc3B4P2xhbmc9emg8KwAMAQAWBB8DBQlGcmFuw6dhaXMfAgUWfi8vU2lnbmluLmFzcHg/bGFuZz1mcjwrAAwBABYEHwMFCFTDvHJrw6dlHwIFFn4vL1NpZ25pbi5hc3B4P2xhbmc9dHI8KwAMAQAWBB8DBQlJbmRvbmVzaWEfAgUWfi8vU2lnbmluLmFzcHg/bGFuZz1pZDwrAAwBABYEHwMFCEVzcGHDsW9sHwIFFn4vL1NpZ25pbi5hc3B4P2xhbmc9ZXM8KwAMAQAWBB8DBQhJdGFsaWFubx8CBRZ+Ly9TaWduaW4uYXNweD9sYW5nPWl0ZGQCDw8WAh8ABRdmb290ZXJjb250YWluZXIzX3NhZmFyaRYOAgEPFgIeCWlubmVyaHRtbAUMRGlzY2xhaW1lcnM6ZAIDDxYCHw8FjwVBbWVyaWNhbmJ1bGxzLmNvbSBMTEMgaXMgbm90IHJlZ2lzdGVyZWQgYXMgYW4gaW52ZXN0bWVudCBhZHZpc2VyIHdpdGggdGhlIFUuUy4gU2VjdXJpdGllcyBhbmQgRXhjaGFuZ2UgQ29tbWlzc2lvbi4gIFJhdGhlciwgQW1lcmljYW5idWxscy5jb20gTExDIHJlbGllcyB1cG9uIHRoZSDigJxwdWJsaXNoZXLigJlzIGV4Y2x1c2lvbuKAnSBmcm9tIHRoZSBkZWZpbml0aW9uIG9mIGludmVzdG1lbnQgYWR2aXNlciBhcyBwcm92aWRlZCB1bmRlciBTZWN0aW9uIDIwMihhKSgxMSkgb2YgdGhlIEludmVzdG1lbnQgQWR2aXNlcnMgQWN0IG9mIDE5NDAgYW5kIGNvcnJlc3BvbmRpbmcgc3RhdGUgc2VjdXJpdGllcyBsYXdzLiBBcyBzdWNoLCBBbWVyaWNhbmJ1bGxzLmNvbSBMTEMgZG9lcyBub3Qgb2ZmZXIgb3IgcHJvdmlkZSBwZXJzb25hbGl6ZWQgaW52ZXN0bWVudCBhZHZpY2UuIFRoaXMgc2l0ZSBhbmQgYWxsIG90aGVycyBvd25lZCBhbmQgb3BlcmF0ZWQgYnkgQW1lcmljYW5idWxscy5jb20gTExDIGFyZSBib25hIGZpZGUgcHVibGljYXRpb25zIG9mIGdlbmVyYWwgYW5kIHJlZ3VsYXIgY2lyY3VsYXRpb24gb2ZmZXJpbmcgaW1wZXJzb25hbCBpbnZlc3RtZW50LXJlbGF0ZWQgYWR2aWNlIHRvIG1lbWJlciBhbmQgL29yIHByb3NwZWN0aXZlIG1lbWJlcnMuZAIFDxYCHw8FrAJBbWVyaWNhbmJ1bGxzLmNvbSBpcyBhbiBpbmRlcGVuZGVudCB3ZWJzaXRlLiBBbWVyaWNhbmJ1bGxzLmNvbSBMTEMgZG9lcyBub3QgcmVjZWl2ZSBjb21wZW5zYXRpb24gYnkgYW55IGRpcmVjdCBvciBpbmRpcmVjdCBtZWFucyBmcm9tIHRoZSBzdG9ja3MsIHNlY3VyaXRpZXMgYW5kIG90aGVyIGluc3RpdHV0aW9ucyBvciBhbnkgdW5kZXJ3cml0ZXJzIG9yIGRlYWxlcnMgYXNzb2NpYXRlZCB3aXRoIHRoZSBicm9hZGVyIG5hdGlvbmFsIG9yIGludGVybmF0aW9uYWwgZm9yZXgsIGNvbW1vZGl0eSBhbmQgc3RvY2sgbWFya2V0cy5kAgcPFgIfDwX3CFRoZXJlZm9yZSwgQW1lcmljYW5idWxscy5jb20gYW5kIEFtZXJpY2FuYnVsbHMuY29tIExMQyBpcyBleGVtcHQgZnJvbSB0aGUgZGVmaW5pdGlvbiBvZiDigJxpbnZlc3RtZW50IGFkdmlzZXLigJ0gYXMgcHJvdmlkZWQgdW5kZXIgU2VjdGlvbiAyMDIoYSkgKDExKSBvZiB0aGUgSW52ZXN0bWVudCBBZHZpc2VycyBBY3Qgb2YgMTk0MCBhbmQgY29ycmVzcG9uZGluZyBzdGF0ZSBzZWN1cml0aWVzIGxhd3MsIGFuZCBoZW5jZSByZWdpc3RyYXRpb24gYXMgc3VjaCBpcyBub3QgcmVxdWlyZWQuIFdlIGFyZSBub3QgYSByZWdpc3RlcmVkIGJyb2tlci1kZWFsZXIuIE1hdGVyaWFsIHByb3ZpZGVkIGJ5IEFtZXJpY2FuYnVsbHMuY29tIExMQyBpcyBmb3IgaW5mb3JtYXRpb25hbCBwdXJwb3NlcyBvbmx5LCBhbmQgdGhhdCBubyBtZW50aW9uIG9mIGEgcGFydGljdWxhciBzZWN1cml0eSBpbiBhbnkgb2Ygb3VyIG1hdGVyaWFscyBjb25zdGl0dXRlcyBhIHJlY29tbWVuZGF0aW9uIHRvIGJ1eSwgc2VsbCwgb3IgaG9sZCB0aGF0IG9yIGFueSBvdGhlciBzZWN1cml0eSwgb3IgdGhhdCBhbnkgcGFydGljdWxhciBzZWN1cml0eSwgcG9ydGZvbGlvIG9mIHNlY3VyaXRpZXMsIHRyYW5zYWN0aW9uIG9yIGludmVzdG1lbnQgc3RyYXRlZ3kgaXMgc3VpdGFibGUgZm9yIGFueSBzcGVjaWZpYyBwZXJzb24uIFRvIHRoZSBleHRlbnQgdGhhdCBhbnkgb2YgdGhlIGluZm9ybWF0aW9uIG9idGFpbmVkIGZyb20gQW1lcmljYW5idWxscy5jb20gTExDIG1heSBiZSBkZWVtZWQgdG8gYmUgaW52ZXN0bWVudCBvcGluaW9uLCBzdWNoIGluZm9ybWF0aW9uIGlzIGltcGVyc29uYWwgYW5kIG5vdCB0YWlsb3JlZCB0byB0aGUgaW52ZXN0bWVudCBuZWVkcyBvZiBhbnkgc3BlY2lmaWMgcGVyc29uLiBBbWVyaWNhbmJ1bGxzLmNvbSBMTEMgZG9lcyBub3QgcHJvbWlzZSwgZ3VhcmFudGVlIG9yIGltcGx5IHZlcmJhbGx5IG9yIGluIHdyaXRpbmcgdGhhdCBhbnkgaW5mb3JtYXRpb24gcHJvdmlkZWQgdGhyb3VnaCBvdXIgd2Vic2l0ZXMsIGNvbW1lbnRhcmllcywgb3IgcmVwb3J0cywgaW4gYW55IHByaW50ZWQgbWF0ZXJpYWwsIG9yIGRpc3BsYXllZCBvbiBhbnkgb2Ygb3VyIHdlYnNpdGVzLCB3aWxsIHJlc3VsdCBpbiBhIHByb2ZpdCBvciBsb3NzLmQCCQ8WAh8PBeMGR292ZXJubWVudCByZWd1bGF0aW9ucyByZXF1aXJlIGRpc2Nsb3N1cmUgb2YgdGhlIGZhY3QgdGhhdCB3aGlsZSB0aGVzZSBtZXRob2RzIG1heSBoYXZlIHdvcmtlZCBpbiB0aGUgcGFzdCwgcGFzdCByZXN1bHRzIGFyZSBub3QgbmVjZXNzYXJpbHkgaW5kaWNhdGl2ZSBvZiBmdXR1cmUgcmVzdWx0cy4gV2hpbGUgdGhlcmUgaXMgYSBwb3RlbnRpYWwgZm9yIHByb2ZpdHMgdGhlcmUgaXMgYWxzbyBhIHJpc2sgb2YgbG9zcy4gVGhlcmUgaXMgc3Vic3RhbnRpYWwgcmlzayBpbiBzZWN1cml0eSB0cmFkaW5nLiBMb3NzZXMgaW5jdXJyZWQgaW4gY29ubmVjdGlvbiB3aXRoIHRyYWRpbmcgc3RvY2tzIG9yIGZ1dHVyZXMgY29udHJhY3RzIGNhbiBiZSBzaWduaWZpY2FudC4gWW91IHNob3VsZCB0aGVyZWZvcmUgY2FyZWZ1bGx5IGNvbnNpZGVyIHdoZXRoZXIgc3VjaCB0cmFkaW5nIGlzIHN1aXRhYmxlIGZvciB5b3UgaW4gdGhlIGxpZ2h0IG9mIHlvdXIgZmluYW5jaWFsIGNvbmRpdGlvbiBzaW5jZSBhbGwgc3BlY3VsYXRpdmUgdHJhZGluZyBpcyBpbmhlcmVudGx5IHJpc2t5IGFuZCBzaG91bGQgb25seSBiZSB1bmRlcnRha2VuIGJ5IGluZGl2aWR1YWxzIHdpdGggYWRlcXVhdGUgcmlzayBjYXBpdGFsLiBOZWl0aGVyIEFtZXJpY2FuYnVsbHMuY29tIExMQywgbm9yIEFtZXJpY2FuYnVsbHMuY29tIG1ha2VzIGFueSBjbGFpbXMgd2hhdHNvZXZlciByZWdhcmRpbmcgcGFzdCBvciBmdXR1cmUgcGVyZm9ybWFuY2UuIEFsbCBleGFtcGxlcywgY2hhcnRzLCBoaXN0b3JpZXMsIHRhYmxlcywgY29tbWVudGFyaWVzLCBvciByZWNvbW1lbmRhdGlvbnMgYXJlIGZvciBlZHVjYXRpb25hbCBvciBpbmZvcm1hdGlvbmFsIHB1cnBvc2VzIG9ubHkuZAILDxYCHw8F3wZEaXNwbGF5ZWQgaW5mb3JtYXRpb24gaXMgYmFzZWQgb24gd2lkZWx5LWFjY2VwdGVkIG1ldGhvZHMgb2YgdGVjaG5pY2FsIGFuYWx5c2lzIGJhc2VkIG9uIGNhbmRsZXN0aWNrIHBhdHRlcm5zLiBBbGwgaW5mb3JtYXRpb24gaXMgZnJvbSBzb3VyY2VzIGRlZW1lZCB0byBiZSByZWxpYWJsZSwgYnV0IHRoZXJlIGlzIG5vIGd1YXJhbnRlZSB0byB0aGUgYWNjdXJhY3kuIExvbmctdGVybSBpbnZlc3RtZW50IHN1Y2Nlc3MgcmVsaWVzIG9uIHJlY29nbml6aW5nIHByb2JhYmlsaXRpZXMgaW4gcHJpY2UgYWN0aW9uIGZvciBwb3NzaWJsZSBmdXR1cmUgb3V0Y29tZXMsIHJhdGhlciB0aGFuIGFic29sdXRlIGNlcnRhaW50eSDigJMgcmlzayBtYW5hZ2VtZW50IGlzIGNyaXRpY2FsIGZvciBzdWNjZXNzLiBFcnJvciBhbmQgdW5jZXJ0YWludHkgYXJlIHBhcnQgb2YgYW55IGZvcm0gb2YgbWFya2V0IGFuYWx5c2lzLiBQYXN0IHBlcmZvcm1hbmNlIGlzIG5vIGd1YXJhbnRlZSBvZiBmdXR1cmUgcGVyZm9ybWFuY2UuIEludmVzdG1lbnQvIHRyYWRpbmcgY2FycmllcyBzaWduaWZpY2FudCByaXNrIG9mIGxvc3MgYW5kIHlvdSBzaG91bGQgY29uc3VsdCB5b3VyIGZpbmFuY2lhbCBwcm9mZXNzaW9uYWwgYmVmb3JlIGludmVzdGluZyBvciB0cmFkaW5nLiBZb3VyIGZpbmFuY2lhbCBhZHZpc2VyIGNhbiBnaXZlIHlvdSBzcGVjaWZpYyBmaW5hbmNpYWwgYWR2aWNlIHRoYXQgaXMgYXBwcm9wcmlhdGUgdG8geW91ciBuZWVkcywgcmlzay10b2xlcmFuY2UsIGFuZCBmaW5hbmNpYWwgcG9zaXRpb24uIEFueSB0cmFkZXMgb3IgaGVkZ2VzIHlvdSBtYWtlIGFyZSB0YWtlbiBhdCB5b3VyIG93biByaXNrIGZvciB5b3VyIG93biBhY2NvdW50LmQCDQ8WAh8PBdsBWW91IGFncmVlIHRoYXQgQW1lcmljYW5idWxscy5jb20gYW5kIEFtZXJpY2FuYnVsbHMuY29tIExMQyBpdHMgcGFyZW50IGNvbXBhbnksIHN1YnNpZGlhcmllcywgYWZmaWxpYXRlcywgb2ZmaWNlcnMgYW5kIGVtcGxveWVlcyBzaGFsbCBub3QgYmUgbGlhYmxlIGZvciBhbnkgZGlyZWN0LCBpbmRpcmVjdCwgaW5jaWRlbnRhbCwgc3BlY2lhbCBvciBjb25zZXF1ZW50aWFsIGRhbWFnZXMuZAIRDxYCHwAFHGJvdHRvbWJhbm5lcmNvbnRhaW5lcl9zYWZhcmlkGAEFHl9fQ29udHJvbHNSZXF1aXJlUG9zdEJhY2tLZXlfXxYIBQ9jdGwwMCRMb2dpbk1lbnUFC2N0bDAwJG1NYWluBQ5jdGwwMCRNYWluTWVudQUWY3RsMDAkRnJlZVJlZ2lzdGVyTWVudQUYY3RsMDAkTWVtYmVyc2hpcEJlbmVmaXRzBRJjdGwwMCRTZWFyY2hCdXR0b24FEWN0bDAwJFN1cHBvcnRNZW51BRNjdGwwMCRMYW5ndWFnZXNNZW51NlBIALTovVw6LJEOuDXyhCTS4+M=
__VIEWSTATEGENERATOR:ECDA716A
__EVENTVALIDATION:/wEdAAVswH4c0JxRe30eXDiX0bhcXr7XOgipC8DNcjKl0sbO7fwNII+YQgXfxmh/KZz6Myr4IcjYoaGuA6R78NuEHgsNQX9+ScDGDIM47zqhQCjs5Ynd+DEUmo0/Xv9Oy6tQgLO7ip/G
ctl00$mMain:{"selectedItemIndexPath":"0i0","checkedState":""}
ctl00$MainMenu:{"selectedItemIndexPath":"","checkedState":""}
ctl00$FreeRegisterMenu:{"selectedItemIndexPath":"","checkedState":""}
ctl00$MembershipBenefits:{"selectedItemIndexPath":"","checkedState":""}
ctl00$SearchBox$State:{"rawValue":"","validationState":""}
ctl00$SearchBox:Enter Symbol
ctl00$MainContent$uEmail:test#test.test
ctl00$MainContent$uPassword:test
ctl00$MainContent$ASPxCheckBox1:I
ctl00$SupportMenu:{"selectedItemIndexPath":"","checkedState":""}
ctl00$LanguagesMenu:{"selectedItemIndexPath":"","checkedState":""}
DXScript:1_304,1_185,1_298,1_211,1_221,1_188,1_182,1_290,1_296,1_279,1_198,1_209,1_217,1_201
DXCss:1_40,1_50,1_53,1_51,1_4,1_16,1_13,0_4617,0_4621,1_14,1_17,Styles/Site.css,img/favicon.ico,https://adservice.google.com/adsid/integrator.js?domain=www.americanbulls.com,https://securepubads.g.doubleclick.net/static/3p_cookie.html
__ASYNCPOST:true
ctl00$MainContent$btnSubmit:Sign In
Your code looks great. It just looks like the script is failing because you're not submitting everything that the browser would normally submit. You could try continuing down the path you are on, submit all of the extra form data, and hope you don't have to bother with adding a CSRF token (a CSRF token is a randomly generated string that you're required to send back), or you can do as Sidharth Shah sugggested and use Selenium.
There is a Firefox extension for Selenium that will allow you to start recording your mouse and keyboard actions, and then when you are done, you can export the results in Python. That Python code will depend on the Selenium library and a Selenium Chrome/Firefox/IE driver. When you run your Python code, a new browser window will open up, controlled by the selenium driver and your Python code. It's pretty cool, your basically writing Python code that controls a browser window. You will have to modify the Python code that the Firefox extension gives you a little bit to read all of the data from the page and start doing stuff with it after you're logged in, but the code for opening the browser window, navigating to athe login page, filling in your login credentials and submitting the form, and navigating to other pages after you're logged in will all be written for you.

building Web browser in Python and issue regarding cookies

I know this sounds weird, but I have got no choice, I searched the google and I found nothing, So..
I'm following a video tutorial https://www.youtube.com/watch?v=JEW50aEVi4k on 'building a webbrowser in python', I was wondering if cookies can be saved, So is it possible ?
If yes, then could you give some suggestions.
Cookies are not a problem - you can use mechanize (https://pypi.python.org/pypi/mechanize/) which saves and sends the cookies automatically.
import mechanize
browser = mechanize.Browser()
browser.set_handle_robots(False)
response = browser.open('http://www.youtube.com')
#Headers are handled automatically. You can access them:
headers = browser.request.header_items()
>>> headers
[('Host', 'www.youtube.com'), ('Cookie', 'YSC=cNcoiHG71bY; VISITOR_INFO1_LIVE=uLHsDODGalg; PREF=f1=50000000'), ('User-agent', 'Python-urllib/2.7')]
It is very hard to write a browser with Javascript support. If you need javasctipt then i suggest you to use selenium with PhantomJS which acts just like a real browser.

Figuring out url made in ajax post

At this link when hover over any row, then there is an image box which says "i" you can click to get extra data. Then navigate to Lines History. Where is that information coming from? I can't find the URL that is connected with that.
I used dev tools in chrome, and found out that there's an ajax post being made:
Request URL:http://www.sbrforum.com/ajax/?a=[SBR.Odds.Modules]OddsEvent_GetLinesHistory
Form Data: UserId=0&Sport=basketball&League=NBA&EventId=259672&View=LH&SportsbookId=238&DefaultBookId=238&ConsensusBookId=19&PeriodTypeId=&StartDate=2014-03-24&MatchupLink=http%3A%2F%2Fwww.sbrforum.com%2Fnba-basketball%2Fmatchups%2F20140324-602%2F&Key=de2f9e1485ba96a69201680d1f7bace4&theme=default
but when I try to visit this url in browser I got Invalid Ajax Call -- from host:
Any idea?
Like you say, it's probably an HTTP POST request.
When you navigate to the URL with the browser, the browser issues a GET request, without all the form data.
Try curl, wget, or the javascript console in your browser to do a POST.

How to access netgear router web interface

What I am trying to do is access the traffic meter data on my local netgear router. It's easy enough to login to it and click on the link, but ideally I would like a little app that sits down in the system tray (windows) that I can check whenever I want to see what my network traffic is.
I'm using python to try to access the router's web page, but I've run into some snags. I originally tried modified a script that would reboot the router (found here https://github.com/ncw/router-rebooter/blob/master/router_rebooter.py) but it just serves up the raw html and I need it after the onload javascript functions have run. This type of thing is described in many posts about web scraping and people suggested using selenium.
I tried selenium and have run into two problems. First, it actually opens the browser window, which is not what I want. Second, it skips the stuff I put in to pass the HTTP authentication and pops up the login window anyway. Here is the code:
from selenium import webdriver
baseAddress = '192.168.1.1'
baseURL = 'http://%(user)s:%(pwd)s#%(host)s/traffic_meter.htm'
username = 'admin'
pwd = 'thisisnotmyrealpassword'
url = baseURL % {
'user': username,
'pwd': pwd,
'host': baseAddress
}
profile = webdriver.FirefoxProfile()
profile.set_preference('network.http.phishy-userpass-length', 255)
driver = webdriver.Firefox(firefox_profile=profile)
driver.get(url)
So, my question is, what is the best way to accomplish what I want without having it launch a visible web browser window?
Update:
Okay, I tried sircapsalot's suggestion and modified the script to this:
from selenium import webdriver
from contextlib import closing
url = 'http://admin:notmyrealpassword#192.168.1.1/start.htm'
with closing(webdriver.Remote(desired_capabilities = webdriver.DesiredCapabilities.HTMLUNIT)) as driver:
driver.get(url)
print(driver.page_source)
This fixes the web browser being loaded, but it failed the authentication. Any suggestions?
Okay, I found the solution and it was way easier than I thought. I did try John1024's suggestion and was able to download the proper webpage from the router using wget. However I didn't like the fact that wget saved the result to a file, which I would then have to open and parse.
I ended up going back to the original reboot_router.py script I had attempted to modify unsuccessfully the first time. My problem was I was trying to make it too complicated. This is the final script I ended up using:
import urllib2
user = 'admin'
pwd = 'notmyrealpassword'
host = '192.168.1.1'
url = 'http://' + host + '/traffic_meter_2nd.htm'
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, host, user, pwd)
authhandler = urllib2.HTTPBasicAuthHandler(passman)
opener = urllib2.build_opener(authhandler)
response = opener.open(url)
stuff = response.read()
response.close()
print stuff
This prints out the entire traffic meter webpage from my router, with its proper values loaded. I can then take this and parse the values out of it. The nice thing about this is it has no external dependencies like selenium, wget or other libraries that needs to be installed. Clean is good.
Thank you, everyone, for your suggestions. I wouldn't have gotten to this answer without them.
The web interface for my Netgear router (WNDR3700) is also filled with javascript. Yours may differ but I have found that my scripts can get all the info they need without javascript.
The first step is finding the correct URL. Using FireFox, I went to the traffic page and then used "This Frame -> Show only this frame" to discover that the URL for the traffic page on my router is:
http://my_router_address/traffic.htm
After finding this URL, no web browswer and no javascript is needed. I can, for example, capture this page with wget:
wget http://my_router_address/traffic.htm
Using a text editor on the resulting traffic.htm file, I see that the traffic data is available in a lengthy block that starts:
var traffic_today_time="1486:37";
var traffic_today_up="1,959";
var traffic_today_down="1,945";
var traffic_today_total="3,904";
. . . .
Thus, the traffic.htm file can be easily captured and parsed with the scripting language of your choice. No javascript ever needs to be executed.
UPDATE: I have a ~/.netrc file with a line in it like:
machine my_router_address login someloginname password somepassword
Before wget downloads from the router, it retrieves the login info from this file. This has security advantages. If one runs wget http://name#password..., then the password is viewable to all on your machine via the process list (ps a). Using .netrc, this never happens. Restrictive permissions can be set on .netrc, e.g. readable only by user (chmod 400 ~/.netrc).

Categories