Python clicking a button on a webpage (back end)

I am trying to use Python to simulate logging in to my company email.
The script works fine, and the resulting page shows that I am already logged in.
import urllib
import urllib2
import mechanize
import cookielib

br = mechanize.Browser()

# Cookie jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Follow refresh 0 but don't hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [('User-agent', 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)')]

url = 'http://www.company.com/'
form_data = {'name_file': 'THENAME', 'password_field': 'THEPASSWORD'}
params = urllib.urlencode(form_data)
response = br.open(url, params)  # POST the encoded credentials to the login page
However, I need to click the "GoTo Email" button on the page to enter the email dashboard. Note that the web address does not change and there is no redirect to any other page before or after clicking the button.
The relevant HTML is shown below. I think it is a submit button.
<input id="mail1_btnGoto" class="btn" onmouseover="this.className='btnOver'" onmouseout="this.className='btn'" name="mail1$btnGoto" value="GoTo Email" type="submit">
I thought about using winapi to simulate a mouse click, but that is a poor fit because it only controls the mouse at the front end. Selenium is not a solution in this case either, because I want the script to run at the back end.
How can I "click" the button on the webpage?

It seems the email dashboard is driven by JavaScript, so you cannot simply use winapi to simulate a mouse click without evaluating the script.
Generally there are two workarounds:
Use a full-featured browser driver. As you mentioned, Selenium is a good choice and is available in many programming languages. The webdriver does not require opening a browser manually and can be fully controlled by scripts. You can also try GhostDriver: it uses PhantomJS and runs on a back-end server (but installing PhantomJS is required).
Mock the request. Logging in usually fires an HTTP/HTTPS request, and you can use Python to reproduce that request. HTTP debugging tools like Fiddler, Wireshark, or the Chrome web inspector can capture the information the browser sends to the authentication server; see the sketch below.
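For the page in the question, the second approach may not even need packet capture: the "GoTo Email" control is an ordinary submit input, so mechanize can "click" it by submitting the form that contains it. A minimal sketch, assuming the button sits in the first form on the page (adjust nr= if it does not):
# after the login above has succeeded
br.open(url)                            # reload the page that holds the button
br.select_form(nr=0)                    # assumption: the button lives in the first form
resp = br.submit(name='mail1$btnGoto')  # "click" the submit control by its name attribute
print resp.read()                       # should now be the email dashboard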
I have tried to be specific and detailed, but given the diversity of web crawling, a step-by-step guide is beyond the scope of this answer.

Related

Python transfer mechanize browser session

I'm having a little difficulty trying to navigate a website past the login screen. I've done this using mechanize. However, once I navigate past the login page I want to interact with the page, click elements, etc., which mechanize cannot do. I also want to do all of this "behind the curtain", so the browser window is invisible (trying not to use Selenium).
Here is the code I use to log in. What can I do past this point to start interacting with the page?
import mechanize

br = mechanize.Browser()           # get browser
br.set_handle_robots(False)        # what robots?
br.open("http://www.website.com")  # open website (a URL scheme is required)
br.select_form(nr=0)               # get the main form
br.set_all_readonly(False)         # allow writing to all controls
for control in br.form.controls:
    print control
user_control = br.form.controls[0]
user_control._value = 'username'
user_password = br.form.controls[1]
user_password._value = 'password'
br.submit()
One option would be to "transfer" cookies from mechanize to selenium and use selenium with a headless browser like PhantomJS or with a virtual display. Or, just switch to selenium+PhantomJS completely (including the authentication step).
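A minimal sketch of the cookie transfer, assuming Selenium and PhantomJS are installed and that a cookielib jar was attached to the mechanize browser before logging in:
import cookielib
from selenium import webdriver

cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)  # attach before the login so the jar captures the session cookies
# ... perform the mechanize login shown in the question ...

driver = webdriver.PhantomJS()
driver.get("http://www.website.com")  # selenium only accepts cookies for the current domain
for c in cj:
    driver.add_cookie({'name': c.name, 'value': c.value,
                       'domain': c.domain, 'path': c.path})
driver.get("http://www.website.com")  # now loads as the logged-in session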
See also:
Python: how to dump cookies of a mechanize.Browser instance?
How to save and load cookies using python selenium webdriver

python and mechanize to login into university webpage

I know quite a few people have asked a similar question, but after looking through the answers and following the tips, I can't get this script to work...
Here is my problem: I am trying to write a Python script that uses the "mechanize" module to log in to my university "meal balance" page and fetch the HTML source of the page that displays my declining balance for food; I will then parse that source and extract the numbers.
The problem is accessing the page and logging in...
This is the login website: http://www.wcu.edu/11407.asp
Towards the end you will see the FORM I need to fill in...
Here is the code I am trying to use to log in and get the page with my declining balance:
import mechanize, cookielib
from time import sleep
url = 'http://www.wcu.edu/11407.asp'
myId = 'xxxxxxxx'
myPin = 'xxxxxxxx'
# Browser
#br = mechanize.Browser()
#br = mechanize.Browser(factory=mechanize.DefaultFactory(i_want_broken_xhtml_support=True))
br = mechanize.Browser(factory=mechanize.RobustFactory()) # Use this because of bad html
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Follow refresh 0 but don't hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
# User-Agent (fake agent: google-chrome on linux x86_64)
br.addheaders = [('User-agent','Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11'),
('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
('Accept-Encoding', 'gzip,deflate,sdch'),
('Accept-Language', 'en-US,en;q=0.8'),
('Accept-Charset', 'ISO-8859-1,utf-8;q=0.7,*;q=0.3')]
# The site we will navigate to, handling its session
br.open(url)
for f in br.forms():
    print f
# Select the third (index two) form
br.select_form(nr=2)
# User credentials
br.form['id'] = myId
br.form['PIN'] = myPin
br.form.action = 'https://itapp.wcu.edu/BanAuthRedirector/Default.aspx'
# Login
res = br.submit().read()
sleep(10)
f = file('mycatpage.html', 'w')
f.write(res)
f.close()
This gives me back the login page, not the page after it. Why?
Why don't you check where the error originates by running your code in a Python shell, or by testing it against another site? There are a number of obvious possibilities you can test to narrow down the cause of the error.
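For example, in an interactive shell you can check where the submit actually landed and what came back:
resp = br.submit()
print resp.geturl()  # still the login URL, or the redirect target?
print resp.info()    # response headers; look for Set-Cookie and Location
html = resp.read()
print 'login' in html.lower()  # crude check for being bounced back to the form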
Look into my problem here.
It is also an auto-login for my university's page, with working code and the HTML as an example.

Scrape Facebook in Python

I'm interested in getting the number of friends each of my friends on Facebook has. Apparently the official Facebook API does not allow getting friends of friends, so I need to get around this (somewhat sensible) limitation somehow. I tried the following:
import sys
import urllib, urllib2, cookielib
username = 'me@example.com'
password = 'mypassword'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'email' : username, 'pass' : password})
request = urllib2.Request('https://login.facebook.com/login.php')
request.add_header('User-Agent','Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12) Gecko/20101027 Fedora/3.6.12-1.fc14 Firefox/3.6.12')
opener.open(request, login_data)
resp = opener.open('http://facebook.com')
print resp.read()
but I only end up with a captcha page. Any idea how Facebook detects that the request is not from a "normal" browser? I could add an extra step and solve the captcha, but that would add unnecessary complexity to the program, so I would rather avoid it. When I use a web browser with the same User-Agent string, I don't get a captcha.
Alternatively, does anyone have a saner idea of how to accomplish my goal, i.e. get a list of friends of friends?
Have you tried tracing and comparing HTTP transactions with Fiddler2 or Wireshark? Fiddler can even trace https, as long as your client code can be made to work with bogus certs.
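To get the script's own traffic into Fiddler for comparison, you can route urllib2 through it; a sketch, assuming Fiddler is listening on its default port 8888 and cj is the cookie jar from the question:
proxy = urllib2.ProxyHandler({'http': 'http://127.0.0.1:8888',
                              'https': 'http://127.0.0.1:8888'})
opener = urllib2.build_opener(proxy, urllib2.HTTPCookieProcessor(cj))
# requests made through this opener now show up in Fiddler next to the browser's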
I tried a lot of ways to scrape Facebook, and the only one that worked for me was:
Install Selenium: the Firefox plugin, the server, and the Python client library.
Then, with the Firefox plugin, you can record the actions you take to log in and export them as a Python script; use this as a base for your work and it will work. Basically, I added to this script a request to my web server to fetch a list of things to inspect on Facebook, and at the end of the script I send the results back to my server.
I could NOT find a way to do it directly from my server with a browser simulator like mechanize. I guess it needs to be done from a client browser.
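Roughly, the exported script boils down to something like this (a sketch, not the exact IDE output; the login URL and field names are taken from the urllib attempt in the question):
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://login.facebook.com/login.php')
driver.find_element_by_name('email').send_keys(username)
driver.find_element_by_name('pass').send_keys(password)
driver.find_element_by_name('pass').submit()  # submit the login form
# ... navigate, scrape friend counts, then POST the results back to your server ...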

Python - The request headers for mechanize

I am looking for a way to view the request (not response) headers, specifically which browser mechanize claims to be. Also, how would I go about manipulating them, e.g. reporting a different browser?
Example:
import mechanize
browser = mechanize.Browser()
# Now I want to make a request to e.g. example.com with custom headers using browser
The purpose is of course to test a website and see whether or not it shows different pages depending on the reported browser.
It has to be the mechanize browser as the rest of the code depends on it (but is left out as it's irrelevant.)
browser.addheaders = [('User-Agent', 'Mozilla/5.0 blahblah')]
You've got an answer on how to change the headers, but if you want to see the exact headers that are being used, try a proxy that displays the traffic, e.g. Fiddler2 on Windows, or see this question for some Linux alternatives.
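Alternatively, mechanize can print the raw request headers it sends, without any proxy; a minimal sketch:
browser = mechanize.Browser()
browser.set_debug_http(True)  # echo raw request/response headers to stdout
browser.addheaders = [('User-Agent', 'Mozilla/5.0 blahblah')]
browser.open('http://example.com')  # the send: lines show exactly what was claimed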
You can modify the Referer too:
br.addheaders = [('Referer', 'http://google.com')]

python mechanize.browser submit() related problem

I'm making a script with the mechanize.Browser module.
Everything else works, but when I submit() the form it does not work, so I went looking for a suspicious part of the source.
In the HTML source I found the following:
<form method="post" onsubmit="return loginCheck(this)" name="FRMLOGIN"/>
I'm thinking loginCheck(this) causes the problem when the form is submitted.
But how can I handle this kind of JavaScript function with the mechanize module, so that I can successfully submit the form and receive the result?
The following is my current script source.
If anyone can help, I would much appreciate it!
# -*- coding: cp949 -*-
import sys, os, re
import datetime, time, socket
import mechanize, urllib, cookielib
from BeautifulSoup import BeautifulSoup, BeautifulStoneSoup, Tag
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Follows refresh 0 but not hangs on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
# Want debugging messages?
br.set_debug_http(True)
br.set_debug_redirects(True)
br.set_debug_responses(True)
# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.6)')]
br.open('http://user.buddybuddy.co.kr/Login/LoginForm.asp?URL=')
html = br.response().read()
print html
br.select_form(name='FRMLOGIN')
print br.viewing_html()
br.form['ID']='zero1zero2'
br.form['PWD']='012045'
br.submit()
print br.response().read()
mechanize doesn't support JavaScript at all. If you absolutely have to run that JavaScript, look into Selenium. It offers Python bindings to control a real, running browser like Firefox or IE.
You will either need to use the unmaintained DOMForm module together with Spidermonkey (http://pypi.python.org/pypi/python-spidermonkey) to process the JavaScript, or figure out what loginCheck() is doing and perform its work in Python before submitting the form. If loginCheck() just checks the login data for obvious validity, that should be pretty easy.
Please note that the action attribute of the quoted form tag is missing; it is probably set in the JavaScript.
Depending on what you intend, it might be easier to work with urllib2 only. You could assume a static layout of the web page, post the data with urllib2's methods, and read the results with it as well.
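If loginCheck() turns out to do nothing more than fill in the missing action before submitting, you can replicate that step in mechanize directly; a sketch, where the target URL is a placeholder you would lift from the JavaScript:
br.select_form(name='FRMLOGIN')
br.form['ID'] = 'zero1zero2'
br.form['PWD'] = '012045'
br.form.action = 'http://user.buddybuddy.co.kr/placeholder/from/the/javascript'  # placeholder
br.submit()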
onsubmit is simply ignored by mechanize; no JavaScript interpretation is done.
You need to check what loginCheck() does; in limited cases (validation), you can do programmatically in Python what the JavaScript does, as in the sketch below.
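For example, if loginCheck() only rejects empty fields, the Python-side equivalent is a simple pre-submit check (a sketch of typical validation, not the actual loginCheck()):
if not br.form['ID'] or not br.form['PWD']:  # what a typical loginCheck() validates
    raise ValueError('ID and PWD must be non-empty')
br.submit()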
