Python Authentication

I'm new to Python, and after struggling with it a little I almost got the code working.
import urllib, urllib2, cookielib

username = 'myuser'
password = 'mypass'

# cookie-aware opener so the session survives across requests
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# passing data to open() makes this a POST of the login form
login = urllib.urlencode({'user' : username, 'pass' : password})
opener.open('http://www.ok.com/', login)

mailb = opener.open('http://www.ok.com/mailbox').read()
print mailb
But the output I get from the print is just a redirect page:
<html>
<head>
<META HTTP-EQUIV="Refresh" CONTENT="0;URL=https://login.ok.com/login.html?skin=login-page&dest=REDIR|http://www.ok.com/mailbox">
<HTML dir=ltr><HEAD><TITLE>OK :: Redirecting</TITLE>
</head>
</html>
Thanks

If a browser got that response, it would interpret it as a request to redirect to the URL specified.
You will need to do something similar in your script: parse the <META> tag, extract the URL, and then do a GET on that URL.
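A minimal sketch of that idea with your opener (the regex below is an assumption based on the page you posted, not robust HTML parsing):
import re, urllib, urllib2, cookielib

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login = urllib.urlencode({'user': 'myuser', 'pass': 'mypass'})
opener.open('http://www.ok.com/', login)

page = opener.open('http://www.ok.com/mailbox').read()
# look for <META HTTP-EQUIV="Refresh" CONTENT="0;URL=...">
match = re.search(r'CONTENT="\d+;\s*URL=([^"]+)"', page, re.IGNORECASE)
if match:
    # follow the redirect target with the same opener so the cookies go along
    page = opener.open(match.group(1)).read()
print page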

Related

Why can't the Requests library read the source code?

I've been writing a Python script for all the Natas challenges. So far, everything has gone smoothly.
In challenge natas22 there is nothing on the page, but it gives you a link to the source code. From the browser, I can reach the source code (which is PHP) and read it. But I cannot do it with my Python script. Which is very weird, because I've done that in other challenges...
I also tried setting a user-agent (an up-to-date Chrome browser); that did not work.
Here is the small code:
import requests
user = 'natas22'
passw = 'chG9fbe1Tq2eWVMgjYYD1MsfIvN461kJ'
url = 'http://%s.natas.labs.overthewire.org/' % user
response = requests.get('http://natas22.natas.labs.overthewire.org/index-source.html', auth=(user, passw))
print(response.text)
Which returns:
<code><span style="color: #000000">
<br /></span>ml>id="viewsource"><a href="index-source.html">View sourcecode</a></div>nbsp;next level are:<br>";l.js"></script>
</code>
But in fact, it should have returned:
<? session_start();
if(array_key_exists("revelio", $_GET)) {
// only admins can reveal the password
if(!($_SESSION and array_key_exists("admin", $_SESSION) and $_SESSION["admin"] == 1)) {
header("Location: /");
} } ?>
<html> <head> <!-- This stuff in the header has nothing to do with the level --> <link rel="stylesheet" type="text/css" href="http://natas.labs.overthewire.org/css/level.css"> <link rel="stylesheet" href="http://natas.labs.overthewire.org/css/jquery-ui.css" /> <link rel="stylesheet" href="http://natas.labs.overthewire.org/css/wechall.css" /> <script src="http://natas.labs.overthewire.org/js/jquery-1.9.1.js"></script> <script src="http://natas.labs.overthewire.org/js/jquery-ui.js"></script> <script src=http://natas.labs.overthewire.org/js/wechall-data.js></script><script src="http://natas.labs.overthewire.org/js/wechall.js"></script> <script>var wechallinfo = { "level": "natas22", "pass": "<censored>" };</script></head> <body> <h1>natas22</h1> <div id="content">
<?
if(array_key_exists("revelio", $_GET)) {
print "You are an admin. The credentials for the next level are:<br>";
print "<pre>Username: natas23\n";
print "Password: <censored></pre>";
} ?>
<div id="viewsource">View sourcecode</div> </div> </body> </html>
Why is it behaving like this? I'm very curious and couldn't find out.
If you want to try the URL from the browser:
url: http://natas22.natas.labs.overthewire.org/index-source.html
Username: natas22
Password: chG9fbe1Tq2eWVMgjYYD1MsfIvN461kJ
Your code seems to be fine. The source code uses \r instead of \n for line breaks, so in a terminal each carriage return moves the cursor back to the start of the line and later text overwrites the earlier lines, hiding most of the code. You can see this by printing response.content instead of response.text:
import requests
user = 'natas22'
passw = 'chG9fbe1Tq2eWVMgjYYD1MsfIvN461kJ'
url = 'http://%s.natas.labs.overthewire.org/' % user
response = requests.get('http://natas22.natas.labs.overthewire.org/index-source.html', auth=(user, passw))
print(response.content)
Try:
import requests
user = 'natas22'
passw = 'chG9fbe1Tq2eWVMgjYYD1MsfIvN461kJ'
url = 'http://%s.natas.labs.overthewire.org/' % user
response = requests.get('http://natas22.natas.labs.overthewire.org/index-source.html', auth=(user, passw))
print(response.text.replace('\r', '\n'))
This also works:
import requests
user = 'natas22'
passw = 'chG9fbe1Tq2eWVMgjYYD1MsfIvN461kJ'
url = 'http://%s.natas.labs.overthewire.org/' % user
response = requests.get('http://natas22.natas.labs.overthewire.org/index-source.html', auth=(user, passw))
print(response.content.decode('utf8').replace('\r', '\n'))
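As a side note, str.splitlines() treats \r, \n, and \r\n all as line breaks, so a variation like this (my own sketch, not from the original answer) prints the source line by line regardless of the server's line endings:
import requests

user = 'natas22'
passw = 'chG9fbe1Tq2eWVMgjYYD1MsfIvN461kJ'
response = requests.get('http://natas22.natas.labs.overthewire.org/index-source.html',
                        auth=(user, passw))
# splitlines() recognizes \r, \n, and \r\n alike
for line in response.text.splitlines():
    print(line)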

Bypassing company single-sign-on using requests in Python

import requests
from requests.auth import HTTPDigestAuth

url_0 = "https://xyz.ecorp.abc.com/"
r = requests.get(url_0, auth=HTTPDigestAuth('username', 'password'))
print(r.text)
I am trying to get data from this site, but when I log in using the requests library, my company's additional layer of security redirects me to a single sign-on page. Is there a way to bypass that and go to the above URL?
Below is the response from the server:
<HTML>
<HEAD>
<SCRIPT language="JavaScript">
function redirect() {
window.location.replace("https://abc.xyz.com/CwsLogin/cws/sso.htm");
}
</SCRIPT>
</HEAD>
<BODY onload="redirect()"></BODY>
</HTML>
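One approach worth trying is to follow the JavaScript redirect by hand: extract the target URL from window.location.replace(...) and request it within the same requests.Session so any cookies persist. This is only a sketch; it assumes the SSO page will accept the same digest credentials, which may not hold:
import re
import requests
from requests.auth import HTTPDigestAuth

session = requests.Session()
session.auth = HTTPDigestAuth('username', 'password')

r = session.get('https://xyz.ecorp.abc.com/')
# pull the target out of window.location.replace("...")
match = re.search(r'window\.location\.replace\("([^"]+)"\)', r.text)
if match:
    # request the SSO target with the same session so cookies carry over
    r = session.get(match.group(1))
print(r.text)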

Python3: http.client with privoxy/TOR making bad requests

I'm trying to use TOR with http.client.HTTPConnection, but for some reason I keep getting weird responses from everything. I'm not really sure how to explain it, so here's an example of what I have:
import http.client

class Socket(http.client.HTTPConnection):
    def __init__(self, url):
        # connect to the local privoxy instance and tunnel to the target host
        super().__init__('127.0.0.1', 8118)
        super().set_tunnel(url)
        #super().__init__(url)

    def get(self, url = '/', params = {}):
        params = util.params_to_query(params)
        if params:
            if url.find('?') == -1: url += '?' + params
            else: url += '&' + params
        self.request(
            'GET',
            url,
            '',
            {'Connection': 'Keep alive'}
        )
        return self.getresponse().read().decode('utf-8')
If I run this with:
sock = Socket('www.google.com')
print(sock.get())
I get:
<html><head><meta content="text/html;charset=utf-8" http-equiv="content-type"/>
<title>301 Moved</title></head><body>
<h1>301 Moved</h1>
The document has moved
here.
</body></html>
Google is redirecting me to the url I just requested, except with the privoxy port. And it gets weirder - if I try https://check.torproject.org:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>Welcome to sergii!</title>
</head>
<body>
<h1>Welcome to sergii!</h1>
This is sergii, a system run by and for the Tor Project.
She does stuff.
What kind of stuff and who our kind sponsors are you might learn on
db.torproject.org.
<p>
</p><hr noshade=""/>
<font size="-1">torproject-admin</font>
</body>
</html>
If I don't use privoxy/TOR, I get exactly what a browser gets at http://www.google.com or http://check.torproject.org. I don't know what's going on here. I suspect the issue is with Python, because I can use TOR with Firefox, but I don't really know.
Privoxy log reads:
2015-06-27 19:28:26.950 7f58f4ff9700 Request: www.google.com:80/
2015-06-27 19:30:40.360 7f58f4ff9700 Request: check.torproject.org:80/
TOR log has nothing useful to say.
This ended up being because I was connecting with http:// and those sites wanted https://. It does work correctly for sites that accept normal http://.
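For the https:// case, a minimal sketch (assuming Privoxy is still listening on 127.0.0.1:8118) is to let http.client.HTTPSConnection handle the tunnel:
import http.client

# HTTPSConnection plus set_tunnel sends a CONNECT request to the proxy,
# then performs the TLS handshake with the target host through the tunnel
conn = http.client.HTTPSConnection('127.0.0.1', 8118)
conn.set_tunnel('check.torproject.org', 443)
conn.request('GET', '/')
print(conn.getresponse().read().decode('utf-8'))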

How to handle redirects in url opener

Sorry for the rookie question. I was wondering if there is an efficient URL opener class in Python that handles redirects. I'm currently using plain urllib.urlopen(), but it's not working. This is an example:
http://thetechshowdown.com/Redirect4.php
For this URL, the opener I'm using does not follow the redirect to:
http://www.bhphotovideo.com/
and only shows:
"You are being automatically redirected to B&H.
Page Stuck? Click Here ."
Thanks in advance.
Use the requests module - it follows redirects by default.
But a page can also be redirected by JavaScript, and no module will follow that kind of redirection.
Turn off JavaScript in your browser and go to http://thetechshowdown.com/Redirect4.php to see whether it still redirects you to the other page.
I checked this page - there is a JavaScript redirect and an HTML redirect (a <meta> tag with a "refresh" argument). Neither is a normal redirection sent by the server, so no module will follow it. You have to read the page, find the URL in the code, and connect to that URL yourself.
import requests
import lxml, lxml.html

# starting page
r = requests.get('http://thetechshowdown.com/Redirect4.php')
#print r.url
#print r.history
#print r.text

# first redirection
html = lxml.html.fromstring(r.text)
refresh = html.cssselect('meta[http-equiv="refresh"]')
if refresh:
    print 'refresh:', refresh[0].attrib['content']
    x = refresh[0].attrib['content'].find('http')
    url = refresh[0].attrib['content'][x:]
    print 'url:', url
    r = requests.get(url)
#print r.text

# second redirection
html = lxml.html.fromstring(r.text)
refresh = html.cssselect('meta[http-equiv="refresh"]')
if refresh:
    print 'refresh:', refresh[0].attrib['content']
    x = refresh[0].attrib['content'].find('http')
    url = refresh[0].attrib['content'][x:]
    print 'url:', url
    r = requests.get(url)

# final page
print r.text
That happens because of soft redirects. urllib is not following the redirects because it does not recognize them as such. In fact, an HTTP response code 200 (page found) is issued, and the redirection happens as a side effect in the browser.
The first page has an HTTP response code 200, but contains the following:
<meta http-equiv="refresh" content="1; url=http://fave.co/1idiTuz">
which instructs the browser to follow the link. The second resource issues an HTTP response code 301 or 302 (redirect) to another resource, where a second soft redirect takes place, this time with JavaScript:
<script type="text/javascript">
setTimeout(function () {window.location.replace(\'http://bhphotovideo.com\');}, 2.75 * 1000);
</script>
<noscript>
<meta http-equiv="refresh" content="2.75;http://bhphotovideo.com" />
</noscript>
Unfortunately, you will have to extract the URLs to follow by hand. However, it's not difficult. Here is the code:
from lxml.html import parse
from urllib import urlopen
from contextlib import closing

def follow(url):
    """Follow both true and soft redirects."""
    while True:
        with closing(urlopen(url)) as stream:
            next = parse(stream).xpath("//meta[@http-equiv = 'refresh']/@content")
            if next:
                url = next[0].split(";")[1].strip().replace("url=", "")
            else:
                return stream.geturl()

print follow("http://thetechshowdown.com/Redirect4.php")
I will leave the error handling to you :) Also note that this might result in an endless loop if the target page contains a <meta> refresh tag too. It is not your case, but you could add some checks to prevent that: stop after n redirects, see if the page redirects to itself, whatever you think is better.
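For instance, a cap on the number of hops could look like this (my own sketch, reusing the imports above):
def follow_limited(url, max_hops=10):
    """Like follow(), but give up after max_hops soft redirects."""
    for _ in range(max_hops):
        with closing(urlopen(url)) as stream:
            next = parse(stream).xpath("//meta[@http-equiv = 'refresh']/@content")
            if not next:
                return stream.geturl()
            url = next[0].split(";")[1].strip().replace("url=", "")
    raise RuntimeError("gave up after %d redirects at %s" % (max_hops, url))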
You will probably need to install the lxml library.
The meta refresh redirect URLs found in HTML can look like any of these:
Relative URLs:
<meta http-equiv="refresh" content="0; url=legal_notices_en.htm#disclaimer">
With quotes inside quotes:
<meta http-equiv="refresh" content="0; url='legal_notices_en.htm#disclaimer'">
Uppercase letters in the content of the tag:
<meta http-equiv="refresh" content="0; URL=legal_notices_en.htm#disclaimer">
Summary:
Use lxml.html to parse the HTML,
Use lower() and two split()s to get the URL part,
Strip any wrapping quotes and spaces,
Resolve to an absolute URL,
Cache the results in a local file with shelve (useful if you have lots of URLs to test).
Usage:
print get_redirections('https://www.google.com')
Returns something like:
{'final': u'https://www.google.be/?gfe_rd=fd&ei=FDDASaSADFASd', 'history': [<Response [302]>]}
Code:
from urlparse import urljoin, urlparse
import urllib, shelve, lxml, requests
from lxml import html

def get_redirections(initial_url, url_id = None):
    if not url_id:
        url_id = initial_url
    documents_checked = shelve.open('tested_urls.log')
    if url_id in documents_checked:
        print 'cached'
        output = documents_checked[url_id]
    else:
        print 'not cached'
        redirecting = True
        history = []
        try:
            current_url = initial_url
            while redirecting:
                r = requests.get(current_url)
                final = r.url
                history += r.history
                status = {'final': final, 'history': history}
                html = lxml.html.fromstring(r.text.encode('utf8'))
                refresh = html.cssselect('meta[http-equiv="refresh"]')
                if refresh:
                    refresh_content = refresh[0].attrib['content']
                    current_url = refresh_content.lower().split('url=')[1].split(';')[0]
                    before_stripping = ''
                    after_stripping = current_url
                    while before_stripping != after_stripping:
                        before_stripping = after_stripping
                        after_stripping = before_stripping.strip('"').strip("'").strip()
                    current_url = urljoin(final, after_stripping)
                    history += [current_url]
                else:
                    redirecting = False
        except requests.exceptions.RequestException as e:
            status = {'final': str(e), 'history': [], 'error': e}
        documents_checked[url_id] = status
        output = status
    documents_checked.close()
    return output

url = 'http://google.com'
print get_redirections(url)

Cannot fetch a web site with python urllib.urlopen() or any web browser other than Shiretoko

Here is the URL of the site I want to fetch
https://salami.parc.com/spartag/GetRepository?friend=jmankoff&keywords=antibiotic&option=jmankoff%27s+tags
When I fetch the site and display the contents with the following code:
import urllib
from BeautifulSoup import BeautifulSoup

sock = urllib.urlopen("https://salami.parc.com/spartag/GetRepository?friend=jmankoff&keywords=antibiotic&option=jmankoff's+tags")
html = sock.read()
sock.close()
soup = BeautifulSoup(html)
print soup.prettify()
I get the following output:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>
Error message
</title>
</head>
<body>
<h2>
Invalid input data
</h2>
</body>
</html>
I get the same result with urllib2 as well. Now, interestingly, this URL works only in the Shiretoko web browser, v3.5.7 (when I say it works, I mean that it brings me the right page). When I feed this URL into Firefox 3.0.15 or Konqueror v4.2.2, I get exactly the same error page (with "Invalid input data"). I don't have any idea what causes this difference or how I can fetch this page using Python. Any ideas?
Thanks
If you look at the urllib2 docs, they say:
urllib2.build_opener([handler, ...])
.....
If the Python installation has SSL support (i.e., if the ssl module can be imported), HTTPSHandler will also be added.
.....
You can try using urllib2 together with the ssl module. Alternatively, you can use httplib.
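A rough httplib sketch of that suggestion (the exact request is my assumption; urllib.urlencode takes care of quoting the apostrophe and spaces):
import httplib, urllib

# build the query string with proper quoting
params = urllib.urlencode({'friend': 'jmankoff',
                           'keywords': 'antibiotic',
                           'option': "jmankoff's tags"})
conn = httplib.HTTPSConnection('salami.parc.com')
conn.request('GET', '/spartag/GetRepository?' + params)
response = conn.getresponse()
print response.read()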
That's exactly what you get when you click on the link in a web browser. Maybe you are supposed to be logged in or have a cookie set, or something.
I get the same message with Firefox 3.5.8 (Shiretoko) on Linux.
