Bypassing company single sign-on using requests in Python

import requests
from requests.auth import HTTPDigestAuth

url_0 = "https://xyz.ecorp.abc.com/"
r = requests.get(url_0, auth=HTTPDigestAuth('username', 'password'))
print(r.text)
I am trying to get data from this site, but when I log in with the requests library, an additional layer of security at my company redirects me to a single sign-on page. Is there a way to bypass that and go straight to the URL above?
Below is the response from the server:
<HTML>
<HEAD>
<SCRIPT language="JavaScript">
function redirect() {
window.location.replace("https://abc.xyz.com/CwsLogin/cws/sso.htm");
}
</SCRIPT>
</HEAD>
<BODY onload="redirect()"></BODY>
</HTML>
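
One approach (a minimal sketch, not specific to any SSO product): keep a requests.Session so cookies persist, extract the redirect target from the JavaScript, and follow it by hand. The field names and follow-up steps below are hypothetical; the real flow depends entirely on which identity provider your company uses, so inspect the login exchange in the browser dev tools first.
import re
import requests

# Keep cookies across requests so the SSO server can track the login.
session = requests.Session()
r = session.get("https://xyz.ecorp.abc.com/")

# Extract the target of window.location.replace("...") from the response.
match = re.search(r'window\.location\.replace\("([^"]+)"\)', r.text)
if match:
    sso_url = match.group(1)
    sso_page = session.get(sso_url)
    # From here you would normally POST your credentials to the login form
    # on sso_page; once the server sets its session cookies, retrying the
    # original URL through the same Session should return the real content.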

Related

Why can't the Requests library read the source code?

I've been writing a Python script for all the Natas challenges. So far, everything has gone smoothly.
In challenge natas22, there is nothing on the page, but it gives you a link to the source code. From the browser, I can reach the source code (which is PHP) and read it. But I cannot do it with my Python script, which is very weird, because I've done the same in other challenges...
I also tried setting a user agent (an up-to-date Chrome browser); that did not work.
Here is the small code:
import requests
user = 'natas22'
passw = 'chG9fbe1Tq2eWVMgjYYD1MsfIvN461kJ'
url = 'http://%s.natas.labs.overthewire.org/' % user
response = requests.get('http://natas22.natas.labs.overthewire.org/index-source.html', auth=(user, passw))
print(response.text)
Which returns:
<code><span style="color: #000000">
<br /></span>ml>id="viewsource"><a href="index-source.html">View sourcecode</a></div>nbsp;next level are:<br>";l.js"></script>
</code>
But in fact, it should have returned:
<? session_start();
if(array_key_exists("revelio", $_GET)) {
// only admins can reveal the password
if(!($_SESSION and array_key_exists("admin", $_SESSION) and $_SESSION["admin"] == 1)) {
header("Location: /");
} } ?>
<html> <head> <!-- This stuff in the header has nothing to do with the level --> <link rel="stylesheet" type="text/css" href="http://natas.labs.overthewire.org/css/level.css"> <link rel="stylesheet" href="http://natas.labs.overthewire.org/css/jquery-ui.css" /> <link rel="stylesheet" href="http://natas.labs.overthewire.org/css/wechall.css" /> <script src="http://natas.labs.overthewire.org/js/jquery-1.9.1.js"></script> <script src="http://natas.labs.overthewire.org/js/jquery-ui.js"></script> <script src=http://natas.labs.overthewire.org/js/wechall-data.js></script><script src="http://natas.labs.overthewire.org/js/wechall.js"></script> <script>var wechallinfo = { "level": "natas22", "pass": "<censored>" };</script></head> <body> <h1>natas22</h1> <div id="content">
<?
if(array_key_exists("revelio", $_GET)) {
print "You are an admin. The credentials for the next level are:<br>";
print "<pre>Username: natas23\n";
print "Password: <censored></pre>";
} ?>
<div id="viewsource">View sourcecode</div> </div> </body> </html>
Why is it behaving like this? I'm very curious and couldn't find out.
If you want to try the URL from the browser:
url: http://natas22.natas.labs.overthewire.org/index-source.html
Username: natas22
Password: chG9fbe1Tq2eWVMgjYYD1MsfIvN461kJ
Your code seems to be fine. The source code uses \r instead of \n, so most of it is hidden in a terminal.
You can see this by printing response.content instead of response.text:
import requests
user = 'natas22'
passw = 'chG9fbe1Tq2eWVMgjYYD1MsfIvN461kJ'
url = 'http://%s.natas.labs.overthewire.org/' % user
response = requests.get('http://natas22.natas.labs.overthewire.org/index-source.html', auth=(user, passw))
print(response.content)
Try:
import requests
user = 'natas22'
passw = 'chG9fbe1Tq2eWVMgjYYD1MsfIvN461kJ'
url = 'http://%s.natas.labs.overthewire.org/' % user
response = requests.get('http://natas22.natas.labs.overthewire.org/index-source.html', auth=(user, passw))
print(response.text.replace('\r', '\n'))
This also works:
import requests
user = 'natas22'
passw = 'chG9fbe1Tq2eWVMgjYYD1MsfIvN461kJ'
url = 'http://%s.natas.labs.overthewire.org/' % user
response = requests.get('http://natas22.natas.labs.overthewire.org/index-source.html', auth=(user, passw))
print(response.content.decode('utf8').replace('\r', '\n'))
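
A small variation, for what it's worth: str.splitlines() recognizes \r, \n, and \r\n alike, so you can normalize the line endings without knowing which convention the server used:
import requests

user = 'natas22'
passw = 'chG9fbe1Tq2eWVMgjYYD1MsfIvN461kJ'
response = requests.get('http://natas22.natas.labs.overthewire.org/index-source.html', auth=(user, passw))
# splitlines() handles \r, \n and \r\n, so the join prints every line.
print('\n'.join(response.text.splitlines()))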

Web-scraping a password protected website using Ghost.py

I'm trying to get the HTML content of a password-protected site using Ghost.py.
The web server I have to access has the following HTML code (trimmed to just the important parts):
URL: http://192.168.1.60/PAGE.htm
<html>
<head>
<script language="JavaScript">
function DoHash()
{
    var psw = document.getElementById('psw_id');
    var hpsw = document.getElementById('hpsw_id');
    var nonce = hpsw.value;
    hpsw.value = MD5(nonce.concat(psw.value));
    psw.value = '';
    return true;
}
</script>
</head>
<body>
<form action="PAGE.HTM" name="" method="post" onsubmit="DoHash();">
Access code <input id="psw_id" type="password" maxlength="15" size="20" name="q" value="">
<br>
<input type="submit" value="" name="q" class="w_bok">
<br>
<input id="hpsw_id" type="hidden" name="pA" value="180864D635AD2347">
</form>
</body>
</html>
The value of "#hpsw_id" changes every time you load the page.
In a normal browser, once you type the correct password and press Enter or click the submit button, you land on the same page, but now with the real contents.
URL: http://192.168.1.60/PAGE.htm
<html>
<head>
<!-- javascript is gone -->
</head>
<body>
Welcome to PAGE.htm content
</body>
</html>
First I tried with mechanize but failed, as I need JavaScript. So now I'm trying to solve it using Ghost.py.
My code so far:
import ghost
g = ghost.Ghost()
with g.start(wait_timeout=20) as session:
    page, extra_resources = session.open("http://192.168.1.60/PAGE.htm")
    if page.http_status == 200:
        print("Good!")
        session.evaluate("document.getElementById('psw_id').value='MySecretPassword';")
        session.evaluate("document.getElementsByClassName('w_bok')[0].click();", expect_loading=True)
    print session.content
This code is not loading the contents correctly; in the console I get:
Traceback (most recent call last):
  File "<stdin>", line 8, in <module>
  File "/usr/local/lib/python2.7/dist-packages/ghost/ghost.py", line 181, in wrapper
    timeout=kwargs.pop('timeout', None))
  File "/usr/local/lib/python2.7/dist-packages/ghost/ghost.py", line 1196, in wait_for_page_loaded
    'Unable to load requested page', timeout)
  File "/usr/local/lib/python2.7/dist-packages/ghost/ghost.py", line 1174, in wait_for
    raise TimeoutError(timeout_message)
ghost.ghost.TimeoutError: Unable to load requested page
Two questions...
1) How can I successfully log in to the password-protected site and get the real content of PAGE.htm?
2) Is this the best way to go, or am I missing something completely that would make things work more efficiently?
I'm using Ubuntu Mate.
This is not the answer I was looking for, just a work-around to make it work (in case someone else has a similar issue in the future).
To skip the JavaScript part (which was stopping me from using Python's requests), I decided to compute the expected hash in Python (instead of on the web page) and send the hash as the normal web form would.
So the JavaScript basically concatenates the hidden hpsw_id value and the password, and makes an MD5 of it.
The python now looks like this:
import requests
from hashlib import md5
from re import search

url = "http://192.168.1.60/PAGE.htm"
with requests.Session() as s:
    # Get hpsw_id number from website
    r = s.get(url)
    hpsw_id = search('name="pA" value="([A-Z0-9]*)"', r.text)
    hpsw_id = hpsw_id.group(1)
    # Make hash of ID and password
    m = md5()
    m.update(hpsw_id + 'MySecretPassword')
    pA = m.hexdigest()
    # Post to website to login
    r = s.post(url, data=[('q', ''), ('q', ''), ('pA', pA)])
    print r.content
Note: q, q and pA are the fields that the form sends (q=&q=&pA=f08b97e5e3f472fdde4280a9aa408aaa) when I log in normally with a web browser.
If someone does know the answer to my original question, I would appreciate it if you posted it here.

Cannot find the source of the data I need when crawling a website

I am writing a web crawler in Python, and I came across a problem when trying to find the source of the data I need.
The site I am crawling is: https://www.whoscored.com/Regions/252/Tournaments/2/England-Premier-League, and the data I want is as below:
I can find this data by browsing the page source after the page has been fully loaded by Firefox:
DataStore.prime('standings', { stageId:15151, idx:0, field: 'overall'}, [[15151,32,'Manchester United',1,5,4,1,0,16,2,14,13,1,3,3,0,0,10,0,10,9,7,2,1,1,0,6,2,4,4,[[0,1190179,4,0,2,252,'England',2,'Premier League','2017/2018',32,29,'Manchester United','West Ham','Manchester United','West Ham',4,0,'w'] ......
I thought this data would be requested through ajax, but I detected no such request in the web console.
Then I simulated the browser behaviour (set headers and cookies) and requested the HTML page; this is what came back:
<html>
<head>
<META NAME="robots" CONTENT="noindex,nofollow">
<script src="/_Incapsula_Resource?SWJIYLWA=2977d8d74f63d7f8fedbea018b7a1d05">
</script>
<script>
(function() {
var z="";var b="7472797B766172207868723B76617220743D6E6577204461746528292E67657454696D6528293B7661722073746174757......";for (var i=0;i<b.length;i+=2){z=z+parseInt(b.substring(i, i+2), 16)+",";}z = z.substring(0,z.length-1); eval(eval('String.fromCharCode('+z+')'));})();
</script></head>
<body>
<iframe style="display:none;visibility:hidden;" src="//content.incapsula.com/jsTest.html" id="gaIframe"></iframe>
</body></html>
I created an .html file with the content above and opened it with Firefox, but it seems the script did not execute. Now I don't know what to do; I need some help, thanks!
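
For what it's worth, the inline script is straightforward to inspect: the loop converts each pair of hex digits into a character code and evals the resulting string, so the payload is just hex-encoded JavaScript and can be decoded offline, e.g.:
# `b` is the hex blob from the page (truncated here, as in the question),
# so only the visible prefix decodes. Decoding it lets you read the
# Incapsula script instead of executing it.
b = "7472797B766172207868723B76617220743D6E6577204461746528292E67657454696D6528293B"
print(bytes.fromhex(b).decode('ascii'))  # -> try{var xhr;var t=new Date().getTime();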

Python Authentication

I'm new to Python and, after struggling a little, I almost got the code working.
import urllib, urllib2, cookielib
username = 'myuser'
password = 'mypass'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login = urllib.urlencode({'user' : username, 'pass' : password})
opener.open('http://www.ok.com/', login)
mailb = opener.open('http://www.ok.com/mailbox').read()
print mailb
But the output I get from the print is just a redirect page.
<html>
<head>
<META HTTP-EQUIV="Refresh" CONTENT="0;URL=https://login.ok.com/login.html?skin=login-page&dest=REDIR|http://www.ok.com/mailbox">
<HTML dir=ltr><HEAD><TITLE>OK :: Redirecting</TITLE>
</head>
</html>
Thanks
If a browser got that response, it would interpret it as a request to redirect to the specified URL.
You will need to do something similar in your script: parse the <META> tag to locate the URL, then do a GET on that URL.
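A rough sketch of that step, reusing the opener and mailb from the question (the regex is a quick-and-dirty way to pull the URL out of the META refresh; a real HTML parser would be more robust):
import re

# Pull the target URL out of: <META HTTP-EQUIV="Refresh" CONTENT="0;URL=...">
match = re.search(r'CONTENT="\d+;\s*URL=([^"]+)"', mailb, re.IGNORECASE)
if match:
    redirect_url = match.group(1)
    # Reuse the same opener so the login cookies are sent along.
    mailbox_page = opener.open(redirect_url).read()
    print mailbox_page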

Cannot fetch a web site with python urllib.urlopen() or any web browser other than Shiretoko

Here is the URL of the site I want to fetch
https://salami.parc.com/spartag/GetRepository?friend=jmankoff&keywords=antibiotic&option=jmankoff%27s+tags
When I fetch the site and display its contents with the following code:
import urllib
from BeautifulSoup import BeautifulSoup

sock = urllib.urlopen("https://salami.parc.com/spartag/GetRepository?friend=jmankoff&keywords=antibiotic&option=jmankoff's+tags")
html = sock.read()
sock.close()
soup = BeautifulSoup(html)
print soup.prettify()
I get the following output:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>
Error message
</title>
</head>
<body>
<h2>
Invalid input data
</h2>
</body>
</html>
I get the same result with urllib2 as well. Interestingly, this URL works only in the Shiretoko web browser, v3.5.7 (by "works" I mean that it brings me the right page). When I feed this URL into Firefox 3.0.15 or Konqueror v4.2.2, I get exactly the same error page (with "Invalid input data"). I have no idea what causes this difference or how I can fetch this page using Python. Any ideas?
Thanks
If you look at the urllib2 documentation, it says:
urllib2.build_opener([handler, ...])
.....
If the Python installation has SSL support (i.e., if the ssl module can be imported), HTTPSHandler will also be added.
.....
You can try using urllib2 together with the ssl module. Alternatively, you can use httplib.
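For instance, a minimal httplib sketch (Python 2, hostname and path taken from the question) might look like the following; note this only gives you the HTTPS transport, so if the server keys the response off cookies or a login session you will still get the error page:
import httplib

# Plain HTTPS GET with httplib; no cookies or session handling.
conn = httplib.HTTPSConnection("salami.parc.com")
conn.request("GET", "/spartag/GetRepository?friend=jmankoff&keywords=antibiotic&option=jmankoff%27s+tags")
resp = conn.getresponse()
print resp.status, resp.reason
print resp.read()
conn.close()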
That's exactly what you get when you click the link in a web browser. Maybe you are supposed to be logged in or have a cookie set or something.
I get the same message with Firefox 3.5.8 (Shiretoko) on Linux.
