I am trying to capture the HTTP 3xx/302 status code for a redirecting URL, but I cannot: I always get a 200 status code.
Here is the code:
import requests
r = requests.get('http://goo.gl/NZek5')
print(r.status_code)
I would expect this to return 301 or 302, because the URL redirects to another page. I have tried a few other redirecting URLs (e.g. http://fb.com), but they all return 200 as well. What should be done to capture the redirection code properly?
requests handles redirects for you; see Redirection and History in the documentation.
Set allow_redirects=False if you don't want requests to follow redirects, or inspect the redirect responses kept in the r.history list.
Demo:
>>> import requests
>>> url = 'https://httpbin.org/redirect-to'
>>> params = {"status_code": 301, "url": "https://stackoverflow.com/q/22150023"}
>>> r = requests.get(url, params=params)
>>> r.history
[<Response [301]>, <Response [302]>]
>>> r.history[0].status_code
301
>>> r.history[0].headers['Location']
'https://stackoverflow.com/q/22150023'
>>> r.url
'https://stackoverflow.com/questions/22150023/http-redirection-code-3xx-in-python-requests'
>>> r = requests.get(url, params=params, allow_redirects=False)
>>> r.status_code
301
>>> r.url
'https://httpbin.org/redirect-to?status_code=301&url=https%3A%2F%2Fstackoverflow.com%2Fq%2F22150023'
So if allow_redirects is True (the default), the redirects are followed and the response returned is for the final page. If allow_redirects is False, the first response is returned, even if it is a redirect.
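Applied to the question's URL, a short sketch (goo.gl links have since been retired, so substitute any URL that redirects):
import requests

r = requests.get('http://goo.gl/NZek5')
# Each redirect hop is preserved in r.history, oldest first.
for hop in r.history:
    print(hop.status_code, hop.headers['Location'])
print(r.status_code, r.url)  # the final, non-redirect response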
requests.get allows for an optional keyword argument allow_redirects which defaults to True. Setting allow_redirects to False will disable automatically following redirects, as follows:
In [1]: import requests
In [2]: r = requests.get('http://goo.gl/NZek5', allow_redirects=False)
In [3]: print(r.status_code)
301
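With automatic redirects disabled, the redirect target is available from the Location response header:
In [4]: print(r.headers['Location'])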
This solution identifies redirects, displays the redirect history, and handles common errors. It asks for the URL in the console.
import requests

def init():
    console = input("Type the URL: ")
    get_status_code_from_request_url(console)

def get_status_code_from_request_url(url, do_restart=True):
    try:
        r = requests.get(url)
        if len(r.history) < 1:
            print("Status Code: " + str(r.status_code))
        else:
            # Report the status of the first hop (301, 302, ...), then each redirect
            print("Status Code: " + str(r.history[0].status_code) + ". Below are the redirects")
            for i, resp in enumerate(r.history):
                print(" " + str(i) + " - URL " + resp.url + " \n")
        if do_restart:
            init()
    except requests.exceptions.MissingSchema:
        print("You forgot the protocol. http://, https://, ftp://")
    except requests.exceptions.ConnectionError:
        print("Sorry, but I couldn't connect. There was a connection problem.")
    except requests.exceptions.Timeout:
        print("Sorry, but I couldn't connect. I timed out.")
    except requests.exceptions.TooManyRedirects:
        print("There were too many redirects. I can't count that high.")

init()
I made a phishing-site searcher using Python. Here is the code that I use:
import random
import requests

# typo, HTTP, web and delimiters are lists defined elsewhere in the script
output = []
for i in range(100):
    for subdomain_count in [1, 2, 3, 4]:
        webtypo = random.choice(typo) + '.odoo.com'
        http = random.choice(HTTP)
        data = random.sample(web, k=subdomain_count) + [webtypo]
        delims = random.choices(delimiters, k=subdomain_count)
        address = ''.join([a + b for a, b in zip(data, delims)])
        weburl = http + address
        output.append(weburl)

exist = []
for c in output:
    try:
        request = requests.get(c)
        if request.status_code == 200:
            exist.append(c)
            print('Exist')
        elif request.status_code == 204:
            print('user does not exist')
    except requests.exceptions.RequestException:
        print('Not Exist')
When I check the request URL, the link changes to https://www.odoo.com/typo?domain=minecraftnet.odoo.com&autodbname=edcandroidbni123&hosting=eu142a. Is there a way to detect when a link changes like this, so that the script prints "web does not exist", but prints "exist" when the site really runs under the odoo.com domain?
Yes, you can use the response.url attribute to get the final URL after any redirects.
response = requests.get(c)
final_url = response.url
Note this only handles 3xx redirects, not JavaScript redirects; requests never executes JavaScript.
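A minimal sketch of that check, assuming (as in the URL you observed) that Odoo sends unknown subdomains to its /typo page; the helper name and the /typo heuristic are mine, not an Odoo API:
import requests

def subdomain_exists(url):
    try:
        resp = requests.get(url, timeout=10)
    except requests.exceptions.RequestException:
        return False
    # Landing on Odoo's typo page means the subdomain is not registered
    return '/typo' not in resp.url

for c in output:
    print(c, 'Exist' if subdomain_exists(c) else 'web does not exist')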
By default allow_redirects is True for requests.get, but that does not handle redirects that are delivered with a non-3xx status code.
More specifically, how do I handle
>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
# example url
>>> r.status_code
200
>>> r.text
<p>Redirecting...</p>
And going to the site manually does reveal that the page indeed redirects despite giving a 200.
edit:
My current solution for now. Might clean it up later
import requests

META_REFRESH = '<metahttp-equiv="refresh"content="0;url='

def meta_redirect(html_text, uri):
    start = html_text.find(META_REFRESH) + len(META_REFRESH)
    end = html_text.find('"', start)
    redirect = html_text[start:end]
    if "http" in redirect:
        return redirect
    return '/'.join([uri, redirect])

def main():
    uri = 'https://api.github.com/user'
    r = requests.get(uri, auth=('user', 'pass'))
    # Normalise quotes and strip spaces so the marker above matches
    html_text = r.text.lower().replace("'", '"').replace(" ", "")
    while META_REFRESH in html_text:
        r = requests.get(meta_redirect(html_text, uri), auth=('user', 'pass'))
        html_text = r.text.lower().replace("'", '"').replace(" ", "")
    return r
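For what it's worth, a more robust sketch of the same idea using BeautifulSoup (assuming bs4 is installed; the function name and the 10-hop cap are mine):
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def follow_meta_refresh(uri, **kwargs):
    # Follow <meta http-equiv="refresh"> hops (capped at 10) until a page without one
    r = requests.get(uri, **kwargs)
    for _ in range(10):
        soup = BeautifulSoup(r.text, 'html.parser')
        meta = soup.find('meta', attrs={'http-equiv': lambda v: v and v.lower() == 'refresh'})
        if meta is None or 'url=' not in meta.get('content', '').lower():
            break
        target = meta['content'].split('=', 1)[1].strip('\'" ')
        r = requests.get(urljoin(r.url, target), **kwargs)
    return r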
I have a little problem with authentication. I am writing a script that takes a login and password from the user (keyboard input) and then fetches some data from a website (HTTP, not HTTPS), but every time I run the script the response is 401. I read some similar posts on Stack Overflow and tried these solutions:
Solution 1
from base64 import b64encode
from http.client import HTTPConnection

c = HTTPConnection("somewebsite")
userAndPass = b64encode(b"username:password").decode("ascii")
headers = {'Authorization': 'Basic %s' % userAndPass}
c.request('GET', '/', headers=headers)
res = c.getresponse()
data = res.read()
Solution 2
import requests

with requests.Session() as c:
    url = 'somewebsite'
    USERNAME = 'username'
    PASSWORD = 'password'
    c.get(url)
    login_data = dict(username=USERNAME, password=PASSWORD)
    c.post(url, data=login_data)
    page = c.get('somewebsite', headers={"Referer": "somewebsite"})
    print(page)
Solution 3
import urllib.parse
import urllib.request

www = 'somewebsite'
value = {'filter': 'somefilter'}
data = urllib.parse.urlencode(value)
data = data.encode('utf-8')
req = urllib.request.Request(www, data)
resp = urllib.request.urlopen(req)
respData = resp.read()
print(respData)

# Note: urlopen() does not accept a username/password like this;
# its second positional argument is the request body
x = urllib.request.urlopen(www, "username", "password")
print(x.read())
I don't know how to solve this problem. Can somebody give me a link or a tip?
Have you tried the Basic Authentication example from requests?
>>> import requests
>>> from requests.auth import HTTPBasicAuth
>>> requests.get('https://api.github.com/user', auth=HTTPBasicAuth('user', 'pass'))
<Response [200]>
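The tuple shorthand does the same thing without the extra import:
>>> requests.get('https://api.github.com/user', auth=('user', 'pass'))
<Response [200]>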
Can I ask what type of authentication the website uses?
Here is the official Basic Auth example (http://docs.python-requests.org/en/master/user/advanced/#http-verbs):
import requests
from requests.auth import HTTPBasicAuth

# url and body are placeholders for your endpoint and payload
auth = HTTPBasicAuth('fake@example.com', 'not_a_real_password')
r = requests.post(url=url, data=body, auth=auth)
print(r.status_code)
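If the site turns out to use Digest rather than Basic authentication, requests ships HTTPDigestAuth as well:
from requests.auth import HTTPDigestAuth
r = requests.get('http://somewebsite', auth=HTTPDigestAuth('username', 'password'))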
To use an API that requires authentication, we need a token or app ID that grants access for our requests. Below is an example of how to build the URL and read the response:
import requests

city = input()
api_call = "http://api.openweathermap.org/data/2.5/weather?"
app_id = "892d5406f4811786e2b80a823c78f466"
req_url = api_call + "q=" + city + "&appid=" + app_id
response = requests.get(req_url)
data = response.json()
if data["cod"] == 200:
    hum = data["main"]["humidity"]
    print("Humidity is %d" % hum)
else:
    print("Error occurred:", data["cod"], data["message"])
I have a problem where I send a GET request and, if there is a next-page token in the result, recursively execute another request until no next-page token remains.
The first request works fine, but when the response contains a next_page_token and the code executes the new request, the result is an invalid response; yet if I take the very same link from the result and open it in Postman or my browser, everything is fine.
I'm assuming it has something to do with requests running on different threads at the same time.
The second response from request using Python:
{'html_attributions': [], 'status': 'INVALID_REQUEST', 'results': []}
Here is what I have:
import requests

def getPlaces(location, radius, type, APIKEY):
    url = "https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=" + location + "&radius=" + radius + "&type=" + type + "&key=" + APIKEY
    print('Getting results for type ' + type + '...')
    r = requests.get(url)
    response = r.json()
    results = []
    if response['status'] == 'ZERO_RESULTS':
        print("Did not find results for the type " + type)
    else:
        print("Results for type " + type)
        for result in response['results']:
            results.append(result)
            print(result)
    print('Printing results')
    print(results)
    if 'next_page_token' in response:
        print("There is a next page")
        page_token = response['next_page_token']
        print(page_token)
        next_results = getNextPlace(page_token, APIKEY)
        print(next_results)
        results.append(next_results)
    return results

# Get the rest of the results
def getNextPlace(page_token, APIKEY):
    print('...')
    # NOTE: location, radius and type are not defined in this scope
    next_url = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=' + location + '&radius=' + radius + '&type=' + type + '&pagetoken=' + page_token + '&key=' + APIKEY
    print(next_url)
    r = requests.get(next_url)
    response = r.json()
    results = []
    print(response)
    if response['status'] == 'ZERO_RESULTS':
        print("Did not find results")
    elif response['status'] == 'INVALID_REQUEST':
        print('Invalid response')
    else:
        for next_result in response['results']:
            results.append(next_result)
            print(next_result)
        if 'next_page_token' in response:
            new_page_token = response['next_page_token']
            getNext = getNextPlace(new_page_token, APIKEY)
            results.append(getNext)
    return results
Figured out the issue!
The Google API doesn't allow consecutive requests if the previous request was made within roughly the last 2 seconds.
What I did was have the program sleep for 3 seconds and then send the request.
Now everything is working fine.
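A minimal sketch of that fix inside getPlaces above (time is the only new import):
import time

if 'next_page_token' in response:
    # The next_page_token takes a moment to become valid on Google's side
    time.sleep(3)
    next_results = getNextPlace(response['next_page_token'], APIKEY)
    results.append(next_results)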
What you are trying to do can be done in one function, like this:
import time
import requests

def getPlaces(location, radius, API, i, type):
    url = "https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=" + location + "&radius=" + radius + "&key=" + API + "&types=" + type
    r = requests.get(url)
    response = r.json()
    results = []
    for result in response['results']:
        results.append(result)
    l = []
    while True:
        if 'next_page_token' in response:
            page_token = response['next_page_token']
            l.append(page_token)
            next_url = url + '&pagetoken=' + l[i]
            i = i + 1
            time.sleep(3)
            r = requests.get(next_url)
            response = r.json()
            for next_result in response['results']:
                results.append(next_result)
        else:
            break
    return results
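Called with a starting index of 0; the coordinates and type here are made-up examples:
places = getPlaces('40.7128,-74.0060', '1500', APIKEY, 0, 'restaurant')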
Your code prints "Invalid response" because response['status'] == 'INVALID_REQUEST', i.e. the Google API service considers your request URL invalid.
As this document says, the parameters location, radius, type and key are required, while pagetoken is optional. So your second request URL is invalid because it does not include all the required keys.
Maybe you should try changing the URL to:
next_url = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json?location='+location+"&radius="+radius+"&type="+type+"&key="+APIKEY + "&pagetoken=" + page_token
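Building the query through requests' params argument avoids this kind of mistake entirely (a sketch with the same parameters):
import requests

params = {
    'location': location,
    'radius': radius,
    'type': type,
    'key': APIKEY,
    'pagetoken': page_token,
}
r = requests.get('https://maps.googleapis.com/maps/api/place/nearbysearch/json', params=params)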
I am able to use the code below to do a GET request against the Concourse API and fetch pipeline build details.
However, the POST request to trigger a pipeline build does not work, and no error is reported.
Here is the code
url = "http://192.168.100.4:8080/api/v1/teams/main/"
r = requests.get(url + 'auth/token')
json_data = json.loads(r.text)
cookie = {'ATC-Authorization': 'Bearer '+ json_data["value"]}
r = requests.post(url + 'pipelines/pipe-name/jobs/job-name/builds'
, cookies=cookie)
print r.text
print r.content
r = requests.get(url + 'pipelines/pipe-name/jobs/job-name/builds/17', cookies=cookie)
print r.text
You may use a Session:
[...] The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance [...]
url = "http://192.168.100.4:8080/api/v1/teams/main/"
req_sessions = requests.Session() #load session instance
r = req_sessions.get(url + 'auth/token')
json_data = json.loads(r.text)
cookie = {'ATC-Authorization': 'Bearer '+ json_data["value"]}
r = req_sessions.post(url + 'pipelines/pipe-name/jobs/job-name/builds', cookies=cookie)
print r.text
print r.content
r = req_sessions.get(url + 'pipelines/pipe-name/jobs/job-name/builds/17')
print r.text
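Alternatively, you could attach the token once as a session-wide header instead of passing cookies on each call (a sketch; whether the plain Authorization header is accepted depends on your Concourse version):
import requests

url = "http://192.168.100.4:8080/api/v1/teams/main/"
session = requests.Session()
token = session.get(url + 'auth/token').json()["value"]
# Every request made from this session now carries the bearer token
session.headers.update({'Authorization': 'Bearer ' + token})
r = session.post(url + 'pipelines/pipe-name/jobs/job-name/builds')
print(r.text)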