I'm trying to access SharePoint using mechanize but I got a 401 error. Here's the code I'm using:
import mechanize
url = "http://sharepoint:8080/foo/bar/foobar.aspx"
br = mechanize.Browser()
br.addheaders = [('User-agent', 'Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)')]
br.add_password(url, 'domain\\user', 'myPassword')
r = br.open(url)
html = r.read()
Did I miss anything?
Did you happen to try python-ntlm for accessing SharePoint?
The examples in the python-ntlm documentation explain how to use it with urllib2. Below is the code for NTLM authentication using mechanize.
import mechanize
from ntlm import HTTPNtlmAuthHandler
pass_manager = mechanize.HTTPPasswordMgrWithDefaultRealm()
pass_manager.add_password(None, url, user, password)
auth_NTLM = HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(pass_manager)
browser = mechanize.Browser()
browser.add_handler(auth_NTLM)
r = browser.open(url)
html = r.read()
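For reference, the urllib2 version from the python-ntlm documentation looks roughly like this (a sketch; url, user and password are assumed to be defined the same way as above):
import urllib2
from ntlm import HTTPNtlmAuthHandler

passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, user, password)
auth_NTLM = HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman)
opener = urllib2.build_opener(auth_NTLM)
response = opener.open(url)
html = response.read()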
Try with:
import base64
credentials = base64.b64encode('domain\\user:myPassword')
br.addheaders = [('User-agent', 'Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)'),
                 ('Authorization', 'Basic %s' % credentials)]
instead of
br.addheaders = [('User-agent', 'Mozilla/4.0(compatible; MSIE 7.0b; Windows NT 6.0)')]
This should work if your SharePoint server allows Basic auth (note that the Authorization value must be the base64-encoded 'user:password' pair, not the plain text).
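If you're not sure which authentication schemes the server actually offers, you can inspect the WWW-Authenticate header on the 401 response; a quick sketch with urllib2, using the URL from the question:
import urllib2

url = "http://sharepoint:8080/foo/bar/foobar.aspx"
try:
    urllib2.urlopen(url)
except urllib2.HTTPError as e:
    # prints e.g. 401 NTLM, or 401 Basic realm="..."
    print e.code, e.hdrs.get("WWW-Authenticate")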
Looking at the usage in the mechanize docs, you only need to specify the username (e.g. 'john_doe'); try this:
...
br.add_password(url, 'username_string', 'myPassword')
r = br.open(url)
html = r.get_data() # r.get_data() can be called many times without calling seek
I have been trying to automate a site login that requires cookies. I found an answer on this site, and replied to it, but I had problems logging in because I forgot I had an account here already. I apologize for the double post, but I was worried my reply wouldn't be seen.
Can't automate login using python mechanize (must "activate" specific browser)
One question. When trying to replicate this, I run into an error.
File "test5.py", line 6, in <module>
self.br = mechanize.Browser( factory=mechanize.RobustFactory() )
NameError: name 'self' is not defined
I usually script in Perl, but have been reading that this python module would be much easier for what I'm trying to accomplish.
Here is my code:
#!/usr/bin/python
import sys
import mechanize
from mechanize import ParseResponse, urlopen, urljoin
self.br = mechanize.Browser( factory=mechanize.RobustFactory() )
self.br.add_handler(PrettifyHandler())
cj = cookielib.LWPCookieJar()
cj.save('cookies.txt', ignore_discard=False, ignore_expires=False)
self.br.set_cookiejar(cj)
self.br.addheaders = [('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
('User-agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20100101 Firefox/16.0'),
('Referer', 'https://--------------/admin/login.jsp'),
('Accept-Encoding', 'gzip,deflate,sdch'),
('Accept-Language', 'en-US,en;q=0.5'),
]
self.br.open('https://--------------/admin/login.jsp')
# Select the first (index zero) form
self.br.select_form(nr=0)
# User credentials
self.br.form['email'] = 'emailaddress'
self.br.form['password'] = 'password'
# Login
self.br.submit()
# Inventory
body = self.br.response().read().split('\n')
To me, it looks like a problem with declaring the variable self, but I'm not familiar enough with Python to know if that is the case.
Any ideas why I get this error would be greatly appreciated.
UPDATE:
I was able to get past the initial error by removing all instances of self. Now when I run the following code, I get this error:
raise FormNotFoundError("no form matching "+description)
mechanize._mechanize.FormNotFoundError: no form matching name 'Loginform'
Below is the code:
#!/usr/bin/python
import sys
import mechanize
import cookielib
from mechanize import ParseResponse, urlopen, urljoin, Browser
from BeautifulSoup import MinimalSoup  # used by PrettifyHandler below
from time import sleep
class PrettifyHandler(mechanize.BaseHandler):
    def http_response(self, request, response):
        if not hasattr(response, "seek"):
            response = mechanize.response_seek_wrapper(response)
        # only use BeautifulSoup if response is html
        if response.info().dict.has_key('content-type') and ('html' in response.info().dict['content-type']):
            soup = MinimalSoup(response.get_data())
            response.set_data(soup.prettify())
        return response
br = mechanize.Browser( factory=mechanize.RobustFactory() )
br.add_handler(PrettifyHandler())
br.set_handle_robots(False)
cj = cookielib.LWPCookieJar()
cj.save('cookies.txt', ignore_discard=False, ignore_expires=False)
br.set_cookiejar(cj)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
('User-agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20100101 Firefox/16.0'),
('Referer', 'https://****/admin/login.jsp'),
('Accept-Encoding', 'gzip,deflate,sdch'),
('Accept-Language', 'en-US,en;q=0.5'),
]
br.open('https://****/admin/login.jsp')
print br.response()
# Select the first (index zero) form
br.select_form(name='Loginform')
#br.select_form(nr=0)
# User credentials
br.form['email'] = 'luskbo#gmail.com'
br.form['password'] = 'password!!!'
# Login
br.submit()
# Inventory
body = br.response().read().split('\n')
Just remove every self., then it should work (if there aren't any other errors).
self is usually only used to refer to the current object from within a method of a class, not at module level.
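For illustration (Scraper here is just a made-up class name):
import mechanize

class Scraper(object):
    def __init__(self):
        # inside a method, self refers to the instance being created
        self.br = mechanize.Browser(factory=mechanize.RobustFactory())

# at module level there is no instance, so a plain variable is enough
br = mechanize.Browser(factory=mechanize.RobustFactory())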
What I need is to extract the uid cookie from the first web site and open the second one with it (it's a sort of authorisation).
It works neither with this code:
#!/usr/bin/env python
import urllib, urllib2, cookielib
import socket, Cookie
def extract(url):
    jar = cookielib.FileCookieJar("cookies")
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
    opener.addheaders = [('User-agent',
        'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.14) Gecko/20110201 Firefox/2.0.0.14')]
    response = opener.open(url)
    for cookie in jar:
        precious_value = cookie.value
    return precious_value
site1 = "mysite1.com"
site2 = "mysite2.com"
cp = urllib2.HTTPCookieProcessor()
cj = cp.cookiejar
cj.set_cookie(cookielib.Cookie(0, cookie_name,
extract(site1),
'80', False, 'domain', True, False, '/path',
True, False, None, False, None, None, None))
opener = urllib2.build_opener(urllib2.HTTPHandler(),cp)
opener.addheaders.append(('User-agent', 'Mozilla/5.0 (compatible)'))
print opener.open(site2).read()
nor this way:
#!/usr/bin/env python
import urllib, urllib2, cookielib
def extract(url):
    jar = cookielib.FileCookieJar("cookies")
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
    opener.addheaders = [('User-agent',
        'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.14) Gecko/20110201 Firefox/2.0.0.14')]
    response = opener.open(url)
    for cookie in jar:
        precious_value = cookie
    return precious_value
site1 = "mysite1.com"
site1 = "mysite2.com"
jar = cookielib.FileCookieJar("cookies")
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
opener.addheaders = [('User-Agent',
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.14) Gecko/20110201 Firefox/2.0.0.14')]
opener.addheaders = [('Cookies', extract(site1))]
response = opener.open(site2)
print response.read()
However, I've managed to succeed with the 'requests' library, and the code looks nice:
import requests

cookies = dict(mycid='9ti6cACUi6AqxXBG2H9AMPkrfRbBJPalKTAh_bLcuQ8c8C')
r = requests.get(url, cookies=cookies)
print r.text
It's fine for me and I don't have anything against requests... but still, what have I done wrong in the first two attempts? In both cases the extract procedure works fine and I can see that the uid has been properly extracted. I guess the problem is in the addheaders area. The answer is probably obvious, but I still can't get through it. Can someone help?
1) What is the proper way to pass a cookie into the headers with only urllib or urllib2?
2) How can I pass it as a parameter that can be changed, not just a reference to the extracted object?
3) How should I properly pass it as a name/value object?
Thanks in advance
Your extract(url) loop has two problems:
It always returns the last value, which is not necessarily the cookie you want.
It makes assumptions about the order in which the cookies are stored, which you can't rely on.
(I'm assuming precious_value is defined somewhere else, otherwise this code doesn't work.)
To know which key you should use to retrieve the particular cookie you're interested in, you can use the Chrome developer tools to see the name of the cookie set by the site you want.
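For example, here is a sketch of extract() that returns that specific cookie's value by name (cookie_name is whatever name you found in the developer tools):
import cookielib
import urllib2

def extract(url, cookie_name):
    jar = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
    opener.open(url)
    for cookie in jar:
        if cookie.name == cookie_name:
            return cookie.value
    return None  # cookie not found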
Hope this helps.
** EDIT: after a short thought, I just figured that I don't have to use mechanize at all, and yet I don't know which Python library I should use in order to interact with cookies and session data. Can anyone please give me a hint? **
I would like to perform a simple login and use the credentials (and the cookies and session data too) for some site.
I used mechanize to handle the basic form usage, since the form is built using JavaScript.
import urllib
import cookielib
import mechanize

br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.set_debug_http(True)
br.set_debug_redirects(True)
br.set_debug_responses(True)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
parameters = { 'username' : 'w00t',
'password' : 't00w'
}
data = urllib.urlencode(parameters)
resp = br.open(url,data)
However, for some reason I can't seem to get any positive response from the server; I don't see any sign of success (for example, a redirect to the desired page), nor do I know how to continue once I have the cookies and session data, so that I can actually keep using them.
I was wondering if anyone could hint me or refer me to the correct documentation, as what I have found does not seem to solve my problem
I've used the Requests library (http://docs.python-requests.org/en/latest/index.html) for this sort of thing in Python before. I found it very straightforward and it has great documentation. Here's an example that includes cookies in a request:
>>> url = 'http://httpbin.org/cookies'
>>> cookies = dict(cookies_are='working')
>>> r = requests.get(url, cookies=cookies)
>>> r.text
'{"cookies": {"cookies_are": "working"}}'
I have used mechanize and, if I recall correctly, it keeps track of cookies for you. By contrast, this library requires you to resend the cookies with each request.
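For example, a rough sketch of a login flow with requests, where the cookies from the login response are passed along with the next request (the URLs and form field names below are made up):
import requests

# hypothetical login URL and form fields, purely for illustration
r = requests.post('https://example.com/login',
                  data={'email': 'user@example.com', 'password': 'secret'})

# resend the cookies returned by the login request with the next one
r2 = requests.get('https://example.com/account', cookies=r.cookies)
print r2.text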
I'm trying to use Python mechanize to retrieve the list of apps on iTunes Connect. Once this list is retrieved, further work will be done with those links.
Logging in succeeds, but when I follow the "Manage Your Applications" link I get redirected back to the login page. It is as if the session gets lost.
import mechanize
import cookielib
from BeautifulSoup import BeautifulSoup
import html2text
filename = 'itunes.html'
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
br.open('https://itunesconnect.apple.com/WebObjects/iTunesConnect.woa')
br.select_form(name='appleConnectForm')
br.form['theAccountName'] = username
br.form['theAccountPW'] = password
br.submit()
apps_link = br.find_link(text='Manage Your Applications')
print "Manage Your Apps link = ", apps_link
req = br.follow_link(text='Manage Your Applications')
for app_link in br.links():
    print "link is ", app_link
Any ideas what could be wrong?
You need to save/load the cookiejar
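A minimal sketch of what saving and loading could look like with the LWPCookieJar from the code above ('cookies.txt' is just an example filename):
import mechanize
import cookielib

br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# ... log in as in the question ...

# persist the session cookies after logging in
cj.save('cookies.txt', ignore_discard=True, ignore_expires=True)

# later (or in another run), restore them before following links
cj.load('cookies.txt', ignore_discard=True, ignore_expires=True)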
Figured this out after further investigation. This was due to a known bug in cookielib documented here: http://bugs.python.org/issue3924
Basically, some sites (notably iTunes Connect) set the cookie version as a string rather than an int, which causes an error in cookielib since it does not handle that condition. The fix at the bottom of that issue thread worked for me.
Hello there,
I was wondering whether it is possible to connect to an HTTP host (for example google.com) and download the source of the web page?
Thanks in advance.
Use urllib2 to download a page. Google will block this request because it tries to block all robots, so add a User-Agent header to the request.
import urllib2
user_agent = 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.63 Safari/534.3'
headers = { 'User-Agent' : user_agent }
req = urllib2.Request('http://www.google.com', None, headers)
response = urllib2.urlopen(req)
page = response.read()
response.close() # it's always safe to close an open connection
You can also use pycurl:
import sys
import pycurl
class ContentCallback:
    def __init__(self):
        self.contents = ''

    def content_callback(self, buf):
        self.contents = self.contents + buf
t = ContentCallback()
curlObj = pycurl.Curl()
curlObj.setopt(curlObj.URL, 'http://www.google.com')
curlObj.setopt(curlObj.WRITEFUNCTION, t.content_callback)
curlObj.perform()
curlObj.close()
print t.contents
You can use the urllib2 module.
import urllib2
url = "http://somewhere.com"
page = urllib2.urlopen(url)
data = page.read()
print data
See the docs for more examples.
The documentation for httplib (low-level) and urllib (high-level) should get you started. Choose whichever is more suitable for you.
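For comparison, a bare-bones httplib sketch (same idea as the urllib2 examples above):
import httplib

conn = httplib.HTTPConnection("www.google.com")
conn.request("GET", "/")
resp = conn.getresponse()
print resp.status, resp.reason
page = resp.read()
conn.close()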
Using the requests package:
# Import requests
import requests
#url
url = 'https://www.google.com/'
# Create the binary string html containing the HTML source
html = requests.get(url).content
or with urllib:
from urllib.request import urlopen
#url
url = 'https://www.google.com/'
# Create the binary string html containing the HTML source
html = urlopen(url).read()
So here's another approach to this problem using mechanize. I found it can bypass a website's robot-checking system. I commented out set_all_readonly because, for some reason, it wasn't recognized by my version of mechanize.
import mechanize
url = 'http://www.example.com'
br = mechanize.Browser()
#br.set_all_readonly(False) # allow everything to be written to
br.set_handle_robots(False) # ignore robots
br.set_handle_refresh(False) # can sometimes hang without this
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')] # [('User-agent', 'Firefox')]
response = br.open(url)
print response.read() # the text of the page
response1 = br.response() # get the response again
print response1.read() # can apply lxml.html.fromstring()