scraping way2sms with mechanize


I am trying to send an SMS by scraping way2sms.com, but I am unable to log in to way2sms.com using mechanize.
I am using the following code to submit the login form:
import mechanize
br = mechanize.Browser()
br.set_handle_robots(False)
br.set_handle_refresh(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; Linux x86_64; rv:18.0) Gecko/20100101 Firefox/18.0')]
res=br.open('http://wwwa.way2sms.com/content/prehome.jsp')
link=list(br.links())[5]
res=br.follow_link(link)
br.form = list(br.forms())[0]
br.form.find_control('username').value=USERNAME #user name
br.form.find_control('password').value=PASSWORD #password
res=br.submit()
After submitting the form, I get the login page again.

Just replace username and password below with your own username and password.
import mechanize
import cookielib
br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Follows refresh 0 but not hangs on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
# User-Agent
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
url = 'http://site25.way2sms.com/content/index.html?'
# Open the website
op = br.open(url)
# Select the first form on the page
br.select_form(nr=0)
username = 'mobilenumberhere'
password = 'passwordhere'
# Fill in the username and password
br.form['username'] = username
br.form['password'] = password
br.submit()
# Check whether the login succeeded
if username in br.geturl():
    print "Login Failed"  # Go to way2sms and enter wrong details; the failed-login URL contains the username.
else:
    print "Login Successful. You are at ", br.geturl()

Related

How can I navigate a site after logging in

I have used mechanize to log in successfully through a user login page. Now I want to navigate the site to a specific page in the submenus. When I try this by opening the URL of that page after logging in, another login page comes up, for which I do not have a username and password. This login page does not usually show up when I navigate the site in a web browser.
How can I do this?
import mechanize
import webbrowser
import cookielib
usern = '****'
passw = '****'
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)  # attach the jar before any request so the login session cookie is kept
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
r = br.open("https://myunihub-1.mdx.ac.uk/cas-web/login?service=https%3A%2F%2Fmyunihub.mdx.ac.uk%2Fc%2Fportal%2Flogin")
br.select_form(nr=0)
br.form['username'] = usern
br.form['password'] = passw
br.submit()  # the parentheses matter: a bare br.submit does not send the form
url = "https://misis.mdx.ac.uk/mislve/bwskfshd.P_CrseSchd"
webbrowser.open_new(url)  # launches the system browser, which does not share mechanize's cookies

Try using cookies and pretending to be an actual browser. Some sites don't allow automated scripts/robots to crawl them, but you can always tell them "no, no, I'm an actual browser."
import cookielib
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
Then pretend to be an actual browser rather than a robot:
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
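For comparison, here is a rough Python 3 equivalent of the same two ideas (a cookie jar plus a browser-like User-Agent) using only the standard library, where cookielib became http.cookiejar. This swaps mechanize for urllib.request, and the function name make_browser_like_opener is hypothetical; it only builds the opener and sends no request.

```python
import http.cookiejar
import urllib.request


def make_browser_like_opener(user_agent):
    """Build a urllib opener that keeps cookies between requests and sends a browser User-Agent."""
    jar = http.cookiejar.LWPCookieJar()  # stores cookies set by the server (e.g. a session id)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Replace the default "Python-urllib/3.x" User-Agent with a browser-like one.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar
```

Every opener.open(...) call made with the returned opener sends and updates the same jar, which is what keeps a login session alive across pages.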

Extracting Hindi, Tamil and Punjabi (Indian language) posts from a social networking site

I am using Python and Beautiful Soup to extract Hindi, Tamil and Punjabi (Indian language) posts from a social networking site with the help of cookies. I am able to extract them, but the output is not in the language itself; it is in some encoded form. I want it in the same language, e.g. a Hindi post should be extracted in Hindi.
import mechanize
import cookielib
from bs4 import BeautifulSoup
import urllib2
import csv
from html2text import html2text
import re
br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Follows refresh 0 but not hangs on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
urls = []
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1'),('Connection','keep-alive'),('Accept','application/json, text/javascript, */*; q=0.01'),('Accept-Encoding','gzip, deflate, sdch'),('Host','link'),('Referer','https://link/'),('X-Requested-With','XMLHttpRequest'),('Accept-Language','en-US,en;q=0.8')]
br.open('https://link')
br._factory.is_html = True
# Select the first (index zero) form
#br.select_form(predicate=lambda f: f.attrs.get('id', None) == 'login_form')
br.select_form(nr=0)
# User credentials
br.form['USER'] = 'username'
br.form['PASSWORD'] = 'password'
# Login
br.submit()
soup = BeautifulSoup(br.response().read())
for tag in soup.find_all("div", re.compile("classname")):
    #print tag
    for tag1 in tag.find_all(re.compile("^p")):
        print tag1
Output sample:
\u0baa\u0b9f\u0bbf\u0ba4\u0bcd\u0ba4\u0ba4\u0bbf\u0bb2\u0bcd \u0baa\u0bbf\u0b9f\u0bbf\u0ba4\u0bcd\u0ba4\u0ba4\u0bc1 \u263a
Expected output: text written in that particular language (here, Tamil).

Decoding with unicode-escape worked for me:
.decode('unicode-escape')
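A minimal Python 3 sketch of the same fix, assuming the post text arrives as literal \uXXXX escape sequences in plain ASCII, as in the sample output. Python 3 str has no .decode(), so the text is encoded to bytes first. The json.loads variant is an alternative under a further assumption: that the escapes come from a JSON response (the request sends Accept: application/json and X-Requested-With: XMLHttpRequest, which hints at this but does not confirm it). Both function names are hypothetical.

```python
import codecs
import json

# Sample from the question as literal backslash-u text; the raw string keeps
# Python from interpreting the escapes itself.
raw = r"\u0baa\u0b9f\u0bbf\u0ba4\u0bcd\u0ba4\u0ba4\u0bbf\u0bb2\u0bcd \u0baa\u0bbf\u0b9f\u0bbf\u0ba4\u0bcd\u0ba4\u0ba4\u0bc1 \u263a"


def decode_escapes(text):
    """Turn literal \\uXXXX sequences into real characters.

    Raises UnicodeEncodeError if `text` already contains non-ASCII characters.
    """
    return codecs.decode(text.encode("ascii"), "unicode_escape")


def decode_via_json(text):
    """Alternative: parse the text as the body of a JSON string literal.

    Assumes `text` contains no unescaped double quotes or control characters.
    """
    return json.loads('"' + text + '"')


tamil = decode_escapes(raw)  # the Tamil post plus a smiley, as real characters
```

The JSON route also handles surrogate pairs (characters outside the Basic Multilingual Plane written as two \u escapes), which unicode_escape leaves as separate code points.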

python mechanize returns wrong 302 location

I am trying to automate some tasks on http://www.beistle.com/, but to visit the search page I need a pid, which is generated on the server.
I looked at the requests and responses the browser makes and tried to do the same with mechanize in Python.
import mechanize
import cookielib
import urllib2
import lxml.html
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(False)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1'),('Referer', 'http://www.beistle.com/'), ('Connection', 'keep-alive'), ('Host','www.beistle.com'), ('Accept-Language', 'en-US,en;q=0.5'), ('Accept-Encoding', 'gzip, deflate'), ('Accept','text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8') ]
br.set_debug_http(True)
br.set_debug_redirects(True)
br.set_debug_responses(True)
br.open("http://www.beistle.com/")
br.select_form(name='aspnetForm')
br.submit()
In the browser, the request returns a 302 with the correct Location, which contains the pid. However, mechanize gets "Location: /Search.aspx", which can't be used for searching.
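A small Python 3 sketch of the final step only: once you have the raw Location header value, resolve it against the page URL and pull out the query parameter. The parameter name pid and the sample value /Search.aspx?pid=abc123 are assumptions for illustration, since the thread does not show the real Location format, and the function name is hypothetical. It does not explain why the server sends mechanize a different Location.

```python
from urllib.parse import parse_qs, urljoin, urlparse


def query_param_from_location(base_url, location, param="pid"):
    """Resolve a possibly relative Location header and return one query parameter.

    Returns None when the parameter is missing, as with a bare "/Search.aspx".
    """
    absolute = urljoin(base_url, location)  # "/Search.aspx?..." -> "http://host/Search.aspx?..."
    values = parse_qs(urlparse(absolute).query).get(param)
    return values[0] if values else None
```

Returning None for the bare /Search.aspx case makes the failure the question describes easy to detect in code.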

Filling a form using a python script

I'm trying to write a Python script that fills in a form on a website, submits it, and then searches for a keyword on the resulting web page.
More specifically, the form is: https://booking.elal.co.il/newBooking/changeOrder.jsp?LANG=EN&RESSYSTEMID=1
When I fill in the form manually in a browser and press the "continue" button, I get a kind of "processing" page, and after that the web page I want to search for the keyword.
I tried to use the script here: http://stockrt.github.io/p/handling-html-forms-with-python-mechanize-and-BeautifulSoup/ , but for some reason, after submitting the form, print br.response().geturl() gives me the URL of the "processing" page rather than the URL of the page I want to search.
My code:
import mechanize
import cookielib
from BeautifulSoup import BeautifulSoup
import html2text
# Browser
br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Follows refresh 0 but not hangs on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
# The site we will navigate into, handling its session
br.open('https://booking.elal.co.il/newBooking/changeOrder.jsp?LANG=EN&RESSYSTEMID=1')
# Select the first (index zero) form
br.select_form(nr=0)
# User credentials
br.form['REC_LOC'] = '...'
br.form['DIRECT_RETRIEVE_LASTNAME'] = '...'
# Login
br.submit()
#Trying to print the webpage
html = br.response().read()
print html2text.html2text(html)
Is it possible to do what I want, and how can I do it?
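One possible explanation, which the thread does not confirm: the "processing" page redirects with a <meta http-equiv="refresh"> whose delay is longer than the max_time=1 passed to set_handle_refresh, so mechanize stops on it (as the code's own comment says, refreshes longer than that are not followed). The following Python 3 standard-library sketch pulls the refresh delay and target URL out of such a page; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshFinder(HTMLParser):
    """Remember the content attribute of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.refresh = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.refresh is not None:
            return
        a = dict(attrs)  # attribute names arrive lowercased; values keep their case
        if (a.get("http-equiv") or "").lower() == "refresh":
            self.refresh = a.get("content") or ""


def meta_refresh_target(html, base_url):
    """Return (delay_seconds, absolute_url) for a meta refresh, or None if there is none."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    if finder.refresh is None:
        return None
    # content looks like "5; URL=/next/page"
    delay_text, _, target = finder.refresh.partition(";")
    target = target.strip()
    if target.lower().startswith("url="):
        target = target[4:].strip().strip("'\"")
    try:
        delay = float(delay_text.strip())
    except ValueError:
        delay = 0.0  # malformed delay: treat as immediate
    # A refresh without a URL means "reload the current page".
    url = urljoin(base_url, target) if target else base_url
    return delay, url
```

With mechanize you would then br.open() the returned URL yourself. If the processing page uses JavaScript instead, this finds nothing, since neither mechanize nor this sketch runs scripts.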

Python, parse html form

How can I get the inputs of HTML forms on other sites?
I want it to return a list of dictionaries such as:
form = [{'name': 'somename', 'type': 'text', 'value': ''}, {'name': 'somename', 'type': 'submit', 'value': 'submit'}]
Sorry for my English.

You probably won't be able to retrieve form data that other users submit on other sites. If you want a script to send data to a form, mechanize is one tool that makes this quite easy.

Yes, mechanize works well for this. This prints every form on a page:
import mechanize
# Browser
br = mechanize.Browser()
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
# Inspect all the forms on http://stackoverflow.com
br.open('http://stackoverflow.com')
for form in br.forms():
    print form

Look at mechanize, lxml.html and BeautifulSoup.
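If you only need the list of dictionaries from the question, a Python 3 standard-library parser is enough. This sketch uses html.parser instead of the libraries named above, groups <input> tags by their enclosing <form>, and ignores <select>, <textarea> and <button>; the names FormInputCollector and form_inputs are hypothetical, and fetching the page is left out.

```python
from html.parser import HTMLParser


class FormInputCollector(HTMLParser):
    """Record name/type/value for every <input> tag, grouped by enclosing <form>."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one list of input dicts per <form>
        self._current = None  # inputs of the form currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self._current = []
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            a = dict(attrs)
            self._current.append({
                "name": a.get("name") or "",
                "type": (a.get("type") or "text").lower(),  # HTML's default input type is "text"
                "value": a.get("value") or "",
            })

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def form_inputs(html):
    """Return one list of {'name', 'type', 'value'} dicts per form in `html`."""
    collector = FormInputCollector()
    collector.feed(html)
    collector.close()
    return collector.forms
```

Self-closing tags such as <input ... /> also work, because HTMLParser's default handle_startendtag forwards them to handle_starttag.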
