Why can't the Requests library read the source code? - python

I've been writing a Python script for all the Natas challenges. So far, everything has gone smoothly.
In challenge natas22, there is nothing on the page, but it gives you a link to the source code. From the browser, I can reach the source code (which is PHP) and read it. But I cannot do it with my Python script, which is very weird, because I've done the same in other challenges...
I also tried setting a user-agent (an up-to-date Chrome browser); that did not work.
Here is the small code:
import requests
user = 'natas22'
passw = 'chG9fbe1Tq2eWVMgjYYD1MsfIvN461kJ'
url = 'http://%s.natas.labs.overthewire.org/' % user
response = requests.get('http://natas22.natas.labs.overthewire.org/index-source.html', auth=(user, passw))
print(response.text)
Which returns:
<code><span style="color: #000000">
<br /></span>ml>id="viewsource"><a href="index-source.html">View sourcecode</a></div>nbsp;next level are:<br>";l.js"></script>
</code>
But in fact, it should have returned:
<? session_start();
if(array_key_exists("revelio", $_GET)) {
// only admins can reveal the password
if(!($_SESSION and array_key_exists("admin", $_SESSION) and $_SESSION["admin"] == 1)) {
header("Location: /");
} } ?>
<html> <head> <!-- This stuff in the header has nothing to do with the level --> <link rel="stylesheet" type="text/css" href="http://natas.labs.overthewire.org/css/level.css"> <link rel="stylesheet" href="http://natas.labs.overthewire.org/css/jquery-ui.css" /> <link rel="stylesheet" href="http://natas.labs.overthewire.org/css/wechall.css" /> <script src="http://natas.labs.overthewire.org/js/jquery-1.9.1.js"></script> <script src="http://natas.labs.overthewire.org/js/jquery-ui.js"></script> <script src=http://natas.labs.overthewire.org/js/wechall-data.js></script><script src="http://natas.labs.overthewire.org/js/wechall.js"></script> <script>var wechallinfo = { "level": "natas22", "pass": "<censored>" };</script></head> <body> <h1>natas22</h1> <div id="content">
<?
if(array_key_exists("revelio", $_GET)) {
print "You are an admin. The credentials for the next level are:<br>";
print "<pre>Username: natas23\n";
print "Password: <censored></pre>";
} ?>
<div id="viewsource">View sourcecode</div> </div> </body> </html>
Why is it behaving like this? I'm very curious and couldn't figure it out.
If you want to try the URL from the browser:
url: http://natas22.natas.labs.overthewire.org/index-source.html
Username: natas22
Password: chG9fbe1Tq2eWVMgjYYD1MsfIvN461kJ

Your code is fine. The source file uses \r instead of \n as its line separator, so most of the output is overwritten when printed in a terminal.
You can see this by printing response.content instead of response.text:
import requests
user = 'natas22'
passw = 'chG9fbe1Tq2eWVMgjYYD1MsfIvN461kJ'
url = 'http://%s.natas.labs.overthewire.org/' % user
response = requests.get('http://natas22.natas.labs.overthewire.org/index-source.html', auth=(user, passw))
print(response.content)

Try:
import requests
user = 'natas22'
passw = 'chG9fbe1Tq2eWVMgjYYD1MsfIvN461kJ'
url = 'http://%s.natas.labs.overthewire.org/' % user
response = requests.get('http://natas22.natas.labs.overthewire.org/index-source.html', auth=(user, passw))
print(response.text.replace('\r', '\n'))
This also works:
import requests
user = 'natas22'
passw = 'chG9fbe1Tq2eWVMgjYYD1MsfIvN461kJ'
url = 'http://%s.natas.labs.overthewire.org/' % user
response = requests.get('http://natas22.natas.labs.overthewire.org/index-source.html', auth=(user, passw))
print(response.content.decode('utf8').replace('\r', '\n'))
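A quick way to confirm the \r behaviour without guessing: Python's str.splitlines() treats \r, \n, and \r\n all as line breaks, so it recovers every line regardless of which separator the server used. A minimal sketch with a stand-in string (not the real natas response):

```python
# Stand-in for response.text: lines separated by bare carriage returns,
# as the natas22 source page serves them (an assumption for this sketch).
body = "first line\rsecond line\rthird line"

# Printed raw, each \r returns the cursor to column 0 in a terminal,
# so later lines overwrite earlier ones.
# splitlines() recognises \r as a line break and recovers every line:
for line in body.splitlines():
    print(line)
```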

Related

Web-scraping a password protected website using Ghost.py

I'm trying to get the HTML content of a password-protected site using Ghost.py.
The web server I have to access has the following HTML code (cut down to just the important parts):
URL: http://192.168.1.60/PAGE.htm
<html>
<head>
<script language="JavaScript">
function DoHash()
{
var psw = document.getElementById('psw_id');
var hpsw = document.getElementById('hpsw_id');
var nonce = hpsw.value;
hpsw.value = MD5(nonce.concat(psw.value));
psw.value = '';
return true;
}
</script>
</head>
<body>
<form action="PAGE.HTM" name="" method="post" onsubmit="DoHash();">
Access code <input id="psw_id" type="password" maxlength="15" size="20" name="q" value="">
<br>
<input type="submit" value="" name="q" class="w_bok">
<br>
<input id="hpsw_id" type="hidden" name="pA" value="180864D635AD2347">
</form>
</body>
</html>
The value of "#hpsw_id" changes every time you load the page.
On a normal browser, once you type the correct password and press enter or click the "submit" button, you land on the same page but now with the real contents.
URL: http://192.168.1.60/PAGE.htm
<html>
<head>
<!-- javascript is gone -->
</head>
<body>
Welcome to PAGE.htm content
</body>
</html>
First I tried mechanize but failed, as I need JavaScript. So now I'm trying to solve it using Ghost.py.
My code so far:
import ghost

g = ghost.Ghost()
with g.start(wait_timeout=20) as session:
    page, extra_resources = session.open("http://192.168.1.60/PAGE.htm")
    if page.http_status == 200:
        print("Good!")
    session.evaluate("document.getElementById('psw_id').value='MySecretPassword';")
    session.evaluate("document.getElementsByClassName('w_bok')[0].click();", expect_loading=True)
    print session.content
This code does not load the contents correctly; in the console I get:
Traceback (most recent call last):
  File "", line 8, in <module>
  File "/usr/local/lib/python2.7/dist-packages/ghost/ghost.py", line 181, in wrapper
    timeout=kwargs.pop('timeout', None))
  File "/usr/local/lib/python2.7/dist-packages/ghost/ghost.py", line 1196, in wait_for_page_loaded
    'Unable to load requested page', timeout)
  File "/usr/local/lib/python2.7/dist-packages/ghost/ghost.py", line 1174, in wait_for
    raise TimeoutError(timeout_message)
ghost.ghost.TimeoutError: Unable to load requested page
Two questions...
1) How can I successfully log in to the password-protected site and get the real content of PAGE.htm?
2) Is this the best way to go? Or am I missing something completely that would make things work more efficiently?
I'm using Ubuntu Mate.
This is not the answer I was looking for, just a work-around to make it work (in case someone else has a similar issue in the future).
To skip the JavaScript part (which was preventing me from using Python's requests), I decided to compute the expected hash in Python (rather than in the browser) and send the hash as the normal web form would.
The JavaScript basically concatenates the hidden hpsw_id value and the password, and takes the MD5 of the result.
The python now looks like this:
import requests
from hashlib import md5
from re import search

url = "http://192.168.1.60/PAGE.htm"

with requests.Session() as s:
    # Get hpsw_id number from website
    r = s.get(url)
    hpsw_id = search('name="pA" value="([A-Z0-9]*)"', r.text)
    hpsw_id = hpsw_id.group(1)

    # Make hash of ID and password
    m = md5()
    m.update(hpsw_id + 'MySecretPassword')
    pA = m.hexdigest()

    # Post to website to login
    r = s.post(url, data=[('q', ''), ('q', ''), ('pA', pA)])
    print r.content
Note: q, q and pA are the fields the form sends (q=&q=&pA=f08b97e5e3f472fdde4280a9aa408aaa) when I log in normally using an internet browser.
If someone knows the answer to my original question, however, I would appreciate it if you posted it here.
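For reference, the same nonce-plus-password hash can also be computed on Python 3, where hashlib requires bytes rather than str. A sketch using the sample hidden value from the HTML above and a placeholder password:

```python
from hashlib import md5

nonce = "180864D635AD2347"     # sample hpsw_id value from the page above
password = "MySecretPassword"  # placeholder password

# Python 3: hashlib wants bytes, so encode the concatenated string first.
pA = md5((nonce + password).encode("ascii")).hexdigest()
print(pA)  # 32-character lowercase hex digest, ready for the pA form field
```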

Bypassing company single-sign-on using requests in Python

import requests
from requests.auth import HTTPDigestAuth

url_0 = "https://xyz.ecorp.abc.com/"
r = requests.get(url_0, auth=HTTPDigestAuth('username', 'password'))
r.text
I am trying to get data from this site, but when I log in using the requests library, my company's additional layer of security directs me to a single sign-on page. Is there a way to bypass that and go to the above URL?
Below is the response from server:
<HTML>
<HEAD>
<SCRIPT language="JavaScript">
function redirect() {
window.location.replace("https://abc.xyz.com/CwsLogin/cws/sso.htm");
}
</SCRIPT>
</HEAD>
<BODY onload="redirect()"></BODY>
</HTML>
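One thing to note: requests never executes JavaScript, so the window.location.replace() redirect has to be followed by hand, by pulling the target URL out of the response body and issuing a second GET with the same session. A sketch of the extraction step only, using the response quoted above (the follow-up request is omitted since it needs the real server and credentials):

```python
import re

# Response body as returned by the server (copied from the question).
html = '''<HTML>
<HEAD>
<SCRIPT language="JavaScript">
function redirect() {
window.location.replace("https://abc.xyz.com/CwsLogin/cws/sso.htm");
}
</SCRIPT>
</HEAD>
<BODY onload="redirect()"></BODY>
</HTML>'''

# requests does not run JavaScript, so extract the redirect target manually.
match = re.search(r'window\.location\.replace\("([^"]+)"\)', html)
sso_url = match.group(1)
print(sso_url)
# A second requests.get(sso_url) with the same auth/cookies would then
# fetch the SSO page; omitted here because it needs the real server.
```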

How to handle alerts with Python?

I would like to handle alerts with Python. What I would like to do is:
Open a URL
Submit a form or click some links
Check if an alert occurs on the new page
I did this in JavaScript using PhantomJS, but I would like to do it with Python as well.
Here is the JavaScript code:
file test.js:
var webPage = require('webpage');
var page = webPage.create();
var url = 'http://localhost:8001/index.html'

page.onConsoleMessage = function (msg) {
    console.log(msg);
}

page.open(url, function (status) {
    page.evaluate(function () {
        document.getElementById('myButton').click()
    });
    page.onConsoleMessage = function (msg) {
        console.log(msg);
    }
    page.onAlert = function (msg) {
        console.log('ALERT: ' + msg);
    };
    setTimeout(function () {
        page.evaluate(function () {
            console.log(document.documentElement.innerHTML)
        });
        phantom.exit();
    }, 1000);
});
file index.html
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<title></title>
<meta charset="utf-8" />
</head>
<body>
<form>
<input id="username" name="username" />
<button id="myButton" type="button" value="Page2">Go to Page2</button>
</form>
</body>
</html>
<script>
    document.getElementById("myButton").onclick = function () {
        location.href = "page2.html";
    };
</script>
file page2.html
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<title></title>
<meta charset="utf-8" />
</head>
<body onload="alert('hello')">
</body>
</html>
This works; it detects the alert on page2.html.
Now I made this python script:
test.py
import requests
from test import BasicTest
from selenium import webdriver
from bs4 import BeautifulSoup

url = 'http://localhost:8001/index.html'

def main():
    #browser = webdriver.Firefox()
    browser = webdriver.PhantomJS()
    browser.get(url)
    html_source = browser.page_source
    #browser.quit()
    soup = BeautifulSoup(html_source, "html.parser")
    soup.prettify()
    request = requests.get('http://localhost:8001/page2.html')
    print request.text
    #Handle Alert

if __name__ == "__main__":
    main()
Now, how can I check whether an alert occurs on page2.html with Python? First I open index.html, then page2.html.
I'm just getting started, so any suggestions will be appreciated.
p.s.
I also tested webdriver.Firefox(), but it is extremely slow.
I also read this question: Check if any alert exists using selenium with python
but it doesn't work (below is the same script as before, plus the solution suggested in the answer).
.....
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
....

def main():
    .....
    #Handle Alert
    try:
        WebDriverWait(browser, 3).until(EC.alert_is_present(),
                                        'Timed out waiting for PA creation ' +
                                        'confirmation popup to appear.')
        alert = browser.switch_to.alert()
        alert.accept()
        print "alert accepted"
    except TimeoutException:
        print "no alert"

if __name__ == "__main__":
    main()
I get the error :
"selenium.common.exceptions.WebDriverException: Message: Invalid
Command Method.."
PhantomJS uses GhostDriver to implement the WebDriver Wire Protocol, which is how it works as a headless browser within Selenium.
Unfortunately, GhostDriver does not currently support alerts, although it looks like they would welcome help implementing the feature:
https://github.com/detro/ghostdriver/issues/20
You could possibly switch to the JavaScript version of PhantomJS or use the Firefox driver within Selenium.
from selenium import webdriver
from selenium.common.exceptions import NoAlertPresentException

if __name__ == '__main__':
    # Switch to this driver and switch_to_alert will fail.
    # driver = webdriver.PhantomJS('<Path to Phantom>')
    driver = webdriver.Firefox()
    driver.set_window_size(1400, 1000)
    driver.get('http://localhost:8001/page2.html')

    try:
        driver.switch_to.alert.accept()
        print('Alarm! ALARM!')
    except NoAlertPresentException:
        print('*crickets*')

Python3: http.client with privoxy/TOR making bad requests

I'm trying to use Tor with http.client.HTTPConnection, but for some reason I keep getting weird responses from everything. I'm not really sure how to explain it, so here's an example of what I have:
class Socket(http.client.HTTPConnection):
    def __init__(self, url):
        super().__init__('127.0.0.1', 8118)
        super().set_tunnel(url)
        #super().__init__(url)

    def get(self, url = '/', params = {}):
        params = util.params_to_query(params)
        if params:
            if url.find('?') == -1: url += '?' + params
            else: url += '&' + params
        self.request(
            'GET',
            url,
            '',
            {'Connection': 'Keep alive'}
        )
        return self.getresponse().read().decode('utf-8')
If I run this with:
sock = Socket('www.google.com')
print(sock.get())
I get:
<html><head><meta content="text/html;charset=utf-8" http-equiv="content-type"/>
<title>301 Moved</title></head><body>
<h1>301 Moved</h1>
The document has moved
here.
</body></html>
Google is redirecting me to the url I just requested, except with the privoxy port. And it gets weirder - if I try https://check.torproject.org:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>Welcome to sergii!</title>
</head>
<body>
<h1>Welcome to sergii!</h1>
This is sergii, a system run by and for the Tor Project.
She does stuff.
What kind of stuff and who our kind sponsors are you might learn on
db.torproject.org.
<p>
</p><hr noshade=""/>
<font size="-1">torproject-admin</font>
</body>
</html>
If I don't try to use privoxy/Tor, I get exactly what your browser gets at http://www.google.com or http://check.torproject.org. I don't know what's going on here. I suspect the issue is with Python, because I can use Tor with Firefox, but I don't really know.
Privoxy log reads:
2015-06-27 19:28:26.950 7f58f4ff9700 Request: www.google.com:80/
2015-06-27 19:30:40.360 7f58f4ff9700 Request: check.torproject.org:80/
TOR log has nothing useful to say.
This ended up being because I was connecting with http:// while those sites wanted https://. It does work correctly for sites that accept plain http://.
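The fix, in code: use http.client.HTTPSConnection instead of HTTPConnection for HTTPS sites, still tunnelling through the local Privoxy on 127.0.0.1:8118 (the port assumed throughout this question). The sketch below only constructs the connection and inspects it; no request is sent, since that would need a running Privoxy/Tor:

```python
import http.client

# Tunnel HTTPS through a local Privoxy instance (assumed on 127.0.0.1:8118).
# HTTPSConnection sends a CONNECT to the proxy, then speaks TLS to the target.
conn = http.client.HTTPSConnection('127.0.0.1', 8118)
conn.set_tunnel('check.torproject.org', 443)

# .host/.port stay pointed at the proxy; the tunnel target is stored separately.
print(conn.host, conn.port)
# conn.request('GET', '/') would go over Tor once Privoxy is running; omitted here.
```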

Python Authentication

I'm new to Python, and after struggling with it a little I almost got the code working.
import urllib, urllib2, cookielib
username = 'myuser'
password = 'mypass'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login = urllib.urlencode({'user' : username, 'pass' : password})
opener.open('http://www.ok.com/', login)
mailb = opener.open('http://www.ok.com/mailbox').read()
print mailb
But the output I get after the print is just a redirect page.
<html>
<head>
<META HTTP-EQUIV="Refresh" CONTENT="0;URL=https://login.ok.com/login.html?skin=login-page&dest=REDIR|http://www.ok.com/mailbox">
<HTML dir=ltr><HEAD><TITLE>OK :: Redirecting</TITLE>
</head>
</html>
Thanks
If a browser got that response, it would interpret it as a request to redirect to the specified URL.
You will need to do something similar in your script: parse the <META> tag, locate the URL, and then do a GET on that URL.
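A sketch of that parsing step, using the meta-refresh response quoted above (a regex is enough for this one tag, though an HTML parser would be more robust in general):

```python
import re

# Redirect page as returned by the server (copied from the question).
body = ('<META HTTP-EQUIV="Refresh" CONTENT="0;URL=https://login.ok.com/'
        'login.html?skin=login-page&dest=REDIR|http://www.ok.com/mailbox">')

# Pull the target URL out of the Refresh meta tag; a follow-up
# opener.open(...) on it would reuse the session cookies already set.
match = re.search(r'CONTENT="\d+;\s*URL=([^"]+)"', body, re.IGNORECASE)
print(match.group(1))
```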
