I want to unshorten URLs to get the real address.In some cases there are more than one redirection. I have tried using urllib2 but it seems to be making GET requests which is consuming time and bandwidth. I want get only the headers so that I have the final URL without needing to get the whole body/data of that page.
thanks
You need to execute a HTTP HEAD request to get just the headers.
The second answer shows how to perform a HEAD request using urllib.
How do you send a HEAD HTTP request in Python 2?
Related
I'm needing to write a script to confirm a part of website is vulnerable to reflected XSS but the request response doesn't contain complete HTML so I can't check it for the payload. For example in Burb the response contains the whole page HTML where I can see the 'alert('xss')' but in Python it does not. I've tried response.text/content etc. but they're all the same. Is there a seperate module for this stuff or am I just doing something wrong with the request?
for p in payloads:
response = requests.get(url+p)
if p in response.content:
print(f'Vulnerable: payload - {p}')
Burp response does contain the following
<pre>Hello <script>alert("XSS")</script></pre>
I need to have the same thing in the Python response
One possibility is that that part of the script only will load after a few seconds after GET. When using the request module, it will return the first thing it sees (i.e. the unloaded script).
To go around this you may want to use a web driver module like selenium that allows waiting before getting the HTML
When I was just doing some research on python web scraping I got to know of a package named grequests, it was said that this can send parallel HTTP requests thus gaining more speed than the normal python requests module. Well, that sounds great but I was not able to get the HTML of the web pages I requested as there is no .text method like the normal requests module. If I get some help it would be great!
Since grequests.imap function returns a list, you'll need to use an index or call the entire list in a loop.
responses = grequests.imap(session)
for response in responses:
print(response.text)
When I access a specific webpage, it sends a specific POST request, the response to which I want to parse. How do I make my script, receiving only the webpage's URL, parse that specific request's response?
(Ideally, in Python.)
So, I've found out that the 'seleniumwire' library for Python is one way to access requests made by a browser when loading a page.
I'm writing a script, to help me do some repetitive testing of a bunch of URLs.
I've written a python method in the script that it opens up the URL and sends a get request. I'm using Requests: HTTP for Humans -http://docs.python-requests.org/en/latest/- api to handle the http calls.
There's the request.history that returns a list of status codes of the directs. I need to be able to access the particular redirects for those list of 301s. There doesn't seem to be a way to do this - to access and trace what my URLS are redirecting to. I want to be able to access the redirected URLS (status code 301)
Can anyone offer any advice?
Thanks
Okay, I'm so silly. Here's the answer I was looking for
r = requests.get("http://someurl")
r.history[1].url will return the URL
I've got a link that I know redirects to another end url, and I'm trying to get the address for that end url using python. But the original link is a little weird, and doesn't work like a normal redirect, and I can't figure out why. When I post the link (the link's below for you try, if you'd like) into a browser, it redirects perfectly. But when I run the following code, it doesn't.
import urllib2
request = urllib2.Request('http://www.facebook.com/ajax/emu/end.php?eid=AQJSWpZ3e4cCTHoNdahpJzPYzmzHOENzbTWBVlW4SgIxX0rL9bo6NXmS3q06cjeh5jO9wbsmr3IyGrpbXPSj0GPLbRJl4VUH-EBnmSy_R4j7iYzpMe1ooZ6IEqSEIlBl0-5SEldIhxI82m75YPa5nOhuBdokiwTw79hoiRB-Zn1auxN-6WLVe3e5WNSt3HLAEjZL-2e4ox_7yAyLcBo1nkamEvShTyZ-GfIf0A9oFXylwRnV8oNaqNmUnqrFYqDbUhzh7d6LSm3jbv1ue2coS3w8N7OxTKVwODHa-Hd3qRbYskB9weio8eKdDFtkvDKuzSSq5hjr711UjlDsgpxLuAmdD95xVwpomxeEsBsMCYJoUEQYa-cM7q3W1aiIYBHlyn2__t74qHWVvzK5zaLKFMKjRFQqphDlUMgMni6AP1VHSn1wli_3lgeVD8TzcJMSlJIF7DC_O44WdjBIMY8OufER3ZB_mm2NqwUe6cvV9oV9SNyYHE4UUURYjW_Z6sUxz3SpHG8c6QxJ-ltSeShvU3mIwAhFE3M0jGTg7AQ7nIoOUfC8PDainFZ1NV8g31aqaqDsF7UxdlOmBT6w-Y8TPmHOXfSlWB-M3MQYUBmcWS3UzlbSsavQG8LXPqYbyKfvkAfncSnZS3_tkoqbTksFirQWlSxJ3mgXrO5PqopH63Esd9ynCbFQM1q_3_wgkYvTeGS9XK6G63_Ag3N9dCHsO_bCJToJT4jeHQCSQ83cb1U5Qpe_7EWbw1ilzgyL-LBVrpH424dwK-4AoaL00W-gWzShSdOynjcoGeB7KE0pHbg-XhuaVribSodriSGybNdADBosnddVvZldY22-_97MqEuA&&c=4&&f=4&&ui=6003071106023-id_4e0b51323f9d01393198225&&en=1&&a=0&&sig=78154')
opener = urllib2.build_opener()
f = opener.open(request)
f.geturl()
I simply get my original url back. I encounter the same problem when I save cookies and use mechanize. Any help would be much appreciated! Thanks!
It looks like this is using Javascript to perform the redirect. You'll either have to figure out exactly how the Javascript is performing the redirects and pull out the appropriate urls, or you'll have to actually run the Javascript. As far as I know, running Javascript from python is not an easy task.
(original answer deleted)
If you look at the contents of f.read() you'll see what's going on here. Instead of returning a 301 or 302 that redirects to the new URL, Facebook actually returns a real HTML document - which contains a piece of Javascript that uses document.location.replace to change the URL in the browser.
There's no easy way of replicating that with Python - the best thing to do is to parse the document with something like BeautifulSoup to find the Javascript, and somehow extract the new URL. It won't be pretty.