I created an API on my site and I'm trying to call it from Python, but I always get 406 as the response. However, if I put the URL with the parameters in the browser, I can see the correct answer.
I already did some tests on pages where you can test your own API, and I also tested it in the browser; it works fine.
I also followed a guide that explains how to call an API from Python, but I still do not get the correct response :(
This is the URL of the API with the params:
https://icassy.com/api/login.php?usuario_email=warles34%40gmail.com&usuario_clave=123
This is the code I use to call the API from Python
import requests
urlLogin = "https://icassy.com/api/login.php"
params = {'usuario_email': 'warles34@gmail.com', 'usuario_clave': '123'}
r = requests.get(url=urlLogin, data=params)
print(r)
print(r.content)
and I get:
<Response [406]>
b'<head><title>Not Acceptable!</title></head><body><h1>Not Acceptable!</h1><p>An appropriate representation of the requested resource could not be found on this server. This error was generated by Mod_Security.</p></body></html>'
I should receive the success message and the apikey in JSON format, like this:
{"message":"Successful login.","apikey":"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwOlwvXC9leGFtcGxlLm9yZyIsImF1ZCI6Imh0dHA6XC9cL2ljYXNzeS5jb20iLCJpYXQiOjEzNTY5OTk1MjQsIm5iZiI6MTM1NzAwMDAwMCwiZGF0YSI6eyJ1c3VhcmlvX2lkIjoiMzQiLCJ1c3VhcmlvX25vbWJyZSI6IkNhcmxvcyIsInVzdWFyaW9fYXBlbGxpZG8iOiJQZXJleiIsInVzdWFyaW9fZW1haWwiOiJ3YXJsZXMzNEBnbWFpbC5jb20ifX0.bOhrC-vXhQEHtbbZGmhLByCxvJY7YxDrLhVOfy9zeFc"}
It looks like there is a validation on the server that checks whether the request comes from a browser. Adding a User-Agent header should do it:
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
r = requests.get(url=urlLogin, params=params, headers=headers)
A list of common user-agent strings might come in handy in the future.
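Putting it together, a minimal sketch of the corrected call, using the placeholder credentials from the question. Note params= instead of data=, so the values are sent as a query string exactly as the browser does:
import requests

urlLogin = "https://icassy.com/api/login.php"
params = {'usuario_email': 'warles34@gmail.com', 'usuario_clave': '123'}
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

# Send the credentials as query-string parameters with a browser-like User-Agent
r = requests.get(url=urlLogin, params=params, headers=headers)
print(r.status_code)
print(r.json())  # should contain the "message" and "apikey" fields if the 406 is gone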
It turned out that the service I was making requests to was hosted behind Akamai, which has a bot manager. It looks at the request (where it comes from), and if it decides the client is a bot, you get a 406 error.
The solution was to ask for the server's IP to be whitelisted, or to send a special header with all requests to the server.
In my case, I had
'Accept': 'text/plain'
and it worked after I replaced it with
'Accept': 'application/json'
I didn't need to set a User-Agent at all.
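For illustration, a minimal sketch of such a request; the URL below is a hypothetical placeholder, since the original service isn't named in this answer:
import requests

# Hypothetical endpoint behind a bot manager (placeholder URL)
url = "https://api.example.com/resource"

# 'Accept': 'text/plain' triggered the 406 here; asking for JSON did not
headers = {'Accept': 'application/json'}

r = requests.get(url, headers=headers)
print(r.status_code)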
Related
I am trying to scrape a page that requires logging in first. I am fairly certain that I have my code and input names ('login' and 'password') correct, yet it still gives me a 'Login Failed' page. Here is my code:
import requests

payload = {'login': 'MY_USERNAME', 'password': 'MY_PASSWORD'}
login_url = "https://www.spatialgroup.com.au/property_daily/"

with requests.Session() as session:
    session.post(login_url, data=payload)
    response = session.get("https://www.spatialgroup.com.au/cgi-bin/login.cgi")
    html = response.text
    print(html)
I've done some snooping around and have figured out that the session doesn't stay logged in when I run my session.get("LOGGEDIN_PAGE"). For example, if I complete the log in process and then enter a URL into the address bar that I know for a fact is a page only accessible once logged in, it returns me to the 'Login Failed' page. How would I get around this if my login session is not maintained?
As others have mentioned, it's hard to help here without knowing the actual site you are attempting to log in to.
I'd point out that you aren't setting any HTTP headers at all, which is a common validation check for logins on web pages. If you're sure that you are POSTing the data in the right format (form-encoded versus JSON-encoded), then I would open up the Chrome inspector and copy the user-agent from your browser.
s = requests.Session()
s.headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
    'Accept': '*/*'
}
Also, it's good practice to check the response status code of each web request you make using a try/except pattern. This will help you catch errors as you write and test requests, instead of blindly guessing which requests are erroneous.
r = requests.get('http://mypage.com')
try:
    r.raise_for_status()
except requests.exceptions.HTTPError:
    print('oops bad status code {} on request!'.format(r.status_code))
Edit: Now that you've given us the site, inspecting a login attempt reveals that the form data isn't actually being POSTed to that website, but rather it's being sent to a CGI script url.
To find this, open up Chrome Inspector and watch the "Network" tab as you try to login. You'll see that the login is actually being sent to https://www.spatialgroup.com.au/cgi-bin/login.cgi, not the actual login page. When you submit to this login page, it executes a 302 redirect after logging in. We can check the location after performing the request to see if the login was successful.
Knowing this I would send a request like this:
s = requests.Session()

# try to log in
r = s.post(
    url='https://www.spatialgroup.com.au/cgi-bin/login.cgi',
    headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3'
    },
    data={
        'login': USERNAME,
        'password': PASSWORD
    }
)

# now let's check to make sure we didn't get a 4XX or 5XX error
try:
    r.raise_for_status()
except requests.exceptions.HTTPError:
    print('oops bad status code {} on request!'.format(r.status_code))
else:
    print('our login redirected to: {}'.format(r.url))
    # if the login was successful, you can now make a request to the login-protected page at this point
It's very difficult to help you without knowing the actual website you are working with. That being said, I would recommend changing this line:
session.post(login_url, data=payload)
to this one:
session.post(login_url, json=payload)
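For context: data= sends the payload form-encoded (Content-Type: application/x-www-form-urlencoded), while json= serializes it to a JSON body and sets Content-Type: application/json. Some login endpoints only accept one of the two, so it's worth checking in the browser's network inspector which format the site's own login form uses.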
Hope this helps.
I'm trying to authenticate on a French water provider's website to get my water consumption data. The website does not provide any API, so I'm trying to write a Python script that authenticates on the website and crawls the data. My work is based on a working Domoticz Python script and a shell script.
The workflow is the following:
1. Get a token from the website.
2. Authenticate with login, password, and the token obtained at step 1.
3. Get one or more cookies from step 2.
4. Get the data using the cookie(s) from step 3.
I'm stuck at step 2, where I can't get the cookies with my Python script. I tried with Postman, curl, and wget, and it works there. I even used the Python code generated by Postman and I still get no cookies.
Here is a screenshot of my Postman POST request, which gives two cookies in the response.
And here is my Python code:
import requests
url = "https://www.toutsurmoneau.fr/mon-compte-en-ligne/je-me-connecte"
querystring = {"_username":"mymail@gmail.com","_password":"mypass","_csrf_token":"knfOIFZNhiCVxHS0U84GW5CrfMt36eLvqPPYGDSsOww","signin[username]":"mymail@gmail.com","signin[password]":"mypass","tsme_user_login[_username]":"mymail@gmail.com","tsme_user_login[_password]":"mypass"}
payload = ""
headers = {
    'Accept': "application/json, text/javascript, */*; q=0.01",
    'Content-Type': "application/x-www-form-urlencoded",
    'Accept-Language': "fr,fr-FR;q=0.8,en;q=0.6",
    'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Mobile Safari/537.36",
    'Connection': "keep-alive",
    'cache-control': "no-cache",
    'Postman-Token': "c7e5f7ca-abea-4161-999a-3c28ec979628"
}
response = requests.request("POST", url, data=payload, headers=headers, params=querystring)
print(response.cookies.get_dict())
The output is {}.
I cannot figure out what I'm doing wrong.
If you have any help to provide, I'll be happy to get it.
Thanks for reading.
Edit:
Some of my assumptions were wrong: the shell script was indeed working, but Postman was not. I was confused by the 200 response I received.
So I'll answer my own question.
First, when getting the token at step 1, I receive a cookie. I'm supposed to send this cookie when logging in, which I was not doing before.
Then, when using this cookie and the token to log in at step 2, I could not see any cookie in the response even though I was actually logged in (the content contains a "disconnect" string, which is only there when properly logged in). That's normal behaviour, since the cookies are not returned on the response of the POST request itself.
I had to create a requests.Session() to POST my login form, and the session stores the cookies.
Now, I'm able to use this information to grab the data from the server.
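For anyone following the same workflow, here is a rough sketch of the session-based approach. The login URL and the underscore-prefixed field names are the ones from my snippet above; the CSRF-token extraction and the final data URL are placeholders, since the exact markup and endpoint aren't shown here:
import re
import requests

LOGIN_URL = "https://www.toutsurmoneau.fr/mon-compte-en-ligne/je-me-connecte"
DATA_URL = "https://www.toutsurmoneau.fr/..."  # placeholder: the consumption-data URL isn't shown in this post

with requests.Session() as s:
    # Step 1: GET the login page; the session keeps the cookie it sets,
    # and the page body contains the CSRF token (the exact markup is an assumption).
    page = s.get(LOGIN_URL).text
    match = re.search(r'name="_csrf_token"\s+value="([^"]+)"', page)
    token = match.group(1) if match else ""

    # Step 2: POST credentials plus token; the session stores the cookies
    # returned by the login response and any redirects.
    s.post(LOGIN_URL, data={
        "_username": "mymail@gmail.com",
        "_password": "mypass",
        "_csrf_token": token,
    })

    # Steps 3 and 4: later GETs reuse the stored cookies automatically.
    data = s.get(DATA_URL)
    print(data.status_code)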
Hope that will help others.
I am trying to log in to a website using Python and the requests module.
My problem is that I am still seeing the login page even after I have given my username/password and am trying to access pages behind the login - in other words, I am not getting past the login page, even though it seems successful.
I am learning that it can be a different process with each website and so it's not obvious what I need to add to fix the problem.
It was suggested that I download a web traffic snooper like Fiddler and then try to replicate the actions with my python script.
I have downloaded Fiddler, but I'm a little out of my depth with how I find and replicate the actions that I need.
Any help would be gratefully received.
My original code:
import requests

payload = {
    'login_Email': 'xxxxx@gmail.com',
    'login_Password': 'xxxxx'
}

with requests.Session() as s:
    p = s.post('https://www.auction4cars.com/', data=payload)
    print p.text
If you look at the browser developer tools, you may see that the login POST request needs to be submitted to a different URL:
https://www.auction4cars.com/Home/UserLogin
Note that also the payload needs to be:
payload = {
    'login_Email_or_Username': 'xxxxx@gmail.com',
    'login_Password': 'xxxxx'
}
I'd still visit the login page before doing that and set the headers:
HOME_URL = 'https://www.auction4cars.com/'
LOGIN_URL = "https://www.auction4cars.com/Home/UserLogin"

with requests.Session() as s:
    s.headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
    }
    s.get(HOME_URL)
    p = s.post(LOGIN_URL, data=payload)
    print(p.text)  # or use p.json() as, I think, the response format is JSON
I'm new to Python and to coding in general. I'm trying to call the Poloniex public API using this simple code, but I keep getting a 403 error.
Does anyone have any idea what can cause it and how to fix it?
Link to Poloniex API Doc
Thanks
import requests

def public_method():
    url = 'https://poloniex.com/public?command=returnTicker'
    api = requests.get(url)
    return api

print(public_method())
403 is an HTTP status code, meaning the request was forbidden by the server.
That said, the code you supplied works: it connects to the API, but the API itself returns a 403 Forbidden response.
Your code returns a requests Response object, which is (I believe) almost what you want. If you'd like to retrieve the data from the Poloniex API, you'll need to call the json() method on that object.
import requests

def public_method():
    url = 'https://poloniex.com/public?command=returnTicker'
    api = requests.get(url)
    return api

print(public_method().json())
Basically, the API requires headers to be set. This solved the problem.
import requests

def public_method():
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
        'Cookie': 'cf_clearance=1159d2ca806b3ebf2a85a8706f4b8c90ff6abc01-1517488982-1800'
    }
    url = 'https://poloniex.com/public?command=returnTicker'
    api = requests.get(url, headers=headers)
    return api

print(public_method())
If it shows a CAPTCHA when you open it from your browser, it is a GeoIP security feature; you may need a VPS or VPN located in Europe or the US to avoid it.
I want to open a URL using urllib.request.urlopen('someurl'):
import urllib.request

with urllib.request.urlopen('someurl') as url:
    b = url.read()
I keep getting the following error:
urllib.error.HTTPError: HTTP Error 403: Forbidden
I understand the error to be due to the site not letting Python access it, to stop bots wasting their network resources, which is understandable. I went searching and found that you need to change the user agent for urllib. However, all the guides and solutions I have found for changing the user agent are for urllib2, and I am using Python 3, so those solutions don't work.
How can I fix this problem in Python 3?
From the Python docs:
import urllib.request

req = urllib.request.Request(
    url,
    data=None,
    headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
    }
)

f = urllib.request.urlopen(req)
print(f.read().decode('utf-8'))
from urllib.request import urlopen, Request
urlopen(Request(url, headers={'User-Agent': 'Mozilla'}))
I just answered a similar question here: https://stackoverflow.com/a/43501438/206820
In case you not only want to open the URL but also want to download the resource (say, a PDF file), you can use the code below:
from urllib.request import ProxyHandler, build_opener, install_opener, urlretrieve

# proxy = ProxyHandler({'http': 'http://192.168.1.31:8888'})
proxy = ProxyHandler({})
opener = build_opener(proxy)
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.1 Safari/603.1.30')]
install_opener(opener)
result = urlretrieve(url=file_url, filename=file_name)
The reason I added the proxy is to monitor the traffic in Charles.
The host site's rejection is coming from the OWASP ModSecurity Core Rules for Apache mod_security. Rule 900002 has a list of "bad" user agents, and one of them is "python-urllib2". That's why requests with the default user agent fail.
Unfortunately, if you use Python's "robotparser" module,
https://docs.python.org/3.5/library/urllib.robotparser.html?highlight=robotparser#module-urllib.robotparser
it uses the default Python user agent, and there's no parameter to change that. If robotparser's attempt to read robots.txt is refused (not just "URL not found"), it then treats all URLs from that site as disallowed.
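One possible workaround (my own suggestion, not something described above) is to fetch robots.txt yourself with a browser-like User-Agent and hand the text to RobotFileParser.parse() instead of letting it call read() with the default agent. A minimal sketch, using a placeholder site:
import urllib.request
import urllib.robotparser

# Fetch robots.txt with a custom User-Agent so the server doesn't reject it
req = urllib.request.Request(
    'https://example.com/robots.txt',  # placeholder URL
    headers={'User-Agent': 'Mozilla/5.0'}
)
with urllib.request.urlopen(req) as resp:
    robots_txt = resp.read().decode('utf-8')

# Parse the text manually instead of letting RobotFileParser fetch it itself
rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())
print(rp.can_fetch('Mozilla/5.0', 'https://example.com/some/page'))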