Getting 403 error using Python 3

I'm new to Python and to coding in general. I'm trying to request the Poloniex public API using this simple code, but I keep getting a 403 error.
Does anyone have any idea what can cause it and how to fix it?
Link to Poloniex API Doc
Thanks
import requests

def public_method():
    url = 'https://poloniex.com/public?command=returnTicker'
    api = requests.get(url)
    return api

print(public_method())

403 is an HTTP status code. You can learn more about those here.
That said, the code you supplied works: it connects to the API, but the API itself returns a 403 Forbidden response.
Your code returns a requests Response object, which is (I believe) almost what you want. If you'd like to retrieve the data from the Poloniex API, you'll need to call the json() method on that object.
import requests

def public_method():
    url = 'https://poloniex.com/public?command=returnTicker'
    api = requests.get(url)
    return api

print(public_method().json())
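Note that if the server is still responding with 403, calling json() may fail or just hand you an error payload, so it's worth checking the status before parsing. A minimal sketch of that check (nothing Poloniex-specific about it):

import requests

def public_method():
    url = 'https://poloniex.com/public?command=returnTicker'
    response = requests.get(url)
    # Raise an exception for 4xx/5xx responses instead of trying to parse them as data
    response.raise_for_status()
    return response

print(public_method().json())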

Basically, the request needs to include headers. This is what solved the problem for me.
import requests

def public_method():
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
        # cf_clearance is a Cloudflare clearance cookie copied from the browser; it expires, so it has to be refreshed periodically
        'Cookie': 'cf_clearance=1159d2ca806b3ebf2a85a8706f4b8c90ff6abc01-1517488982-1800'
    }
    url = 'https://poloniex.com/public?command=returnTicker'
    api = requests.get(url, headers=headers)
    return api

print(public_method())

If you see a CAPTCHA when you open the URL in your browser, it is a GeoIP security feature; you can use a VPS or VPN located in Europe or the US to avoid it.
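If you go the proxy route rather than a full VPN, requests can also send the traffic through a proxy directly. A minimal sketch, where the proxy address and credentials are placeholders you would replace with a proxy you control in an allowed region:

import requests

# Placeholder proxy address and credentials; substitute your own EU/US proxy here
proxies = {
    'http': 'http://user:password@my-eu-proxy.example.com:8080',
    'https': 'http://user:password@my-eu-proxy.example.com:8080',
}

response = requests.get('https://poloniex.com/public?command=returnTicker', proxies=proxies)
print(response.status_code)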

Related

AWS Lambda - Python web scraping - unable to bypass Cloudflare anti-bots from AWS IP but working from local IP

I've built a simple Python web scraper that works as expected locally but does not work on AWS Lambda -- specifically and only for the website I would like to scrape. I've tested just the scraping portion of the code and can confirm that it is a Cloudflare anti-bot issue.
I've combed through relevant SO and Medium articles and tried:
adding the appropriate headers
specifying a user agent
using different libraries (urllib, cloudscraper, selenium) -- a rough cloudscraper sketch is shown after this list
using a virtual display (pyvirtualdisplay with xvfb), as described in this post: How to bypass Cloudflare bot protection in selenium
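The cloudscraper attempt was roughly along these lines (a simplified sketch, not my exact Lambda code):

import cloudscraper

# create_scraper() returns a requests-like session that tries to pass Cloudflare's JS challenge
scraper = cloudscraper.create_scraper()
response = scraper.get('https://disboard.org/servers/tag/python/15')
print(response.status_code)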
Example code of the urllib version to illustrate the question:
import json
import urllib.request

def lambda_handler(event, context):
    url = 'https://disboard.org/servers/tag/python/15'
    headers = {}
    headers['User-Agent'] = "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"
    req = urllib.request.Request(url, headers=headers)
    resp = urllib.request.urlopen(req)
    respData = resp.read()
    return respData
The above code returns a 403 status + reCAPTCHA.
I understand that data center IP ranges get handled more carefully by antispam than residential IPs -- is there any workaround for this?
Thank you in advance.

Web Scraping TooManyRedirects: Exceeded 30 redirects. requests_ip_rotator

import requests
from requests_ip_rotator import ApiGateway, EXTRA_REGIONS

if __name__ == "__main__":
    # Create gateway object and initialise in AWS
    gateway = ApiGateway("https://spare.avspart.com", regions=EXTRA_REGIONS, access_key_id='my key', access_key_secret='my secret key')
    gateway.start(force=True)

    # Execute from random IP
    session = requests.Session()
    # session.max_redirects = 100
    session.mount("https://spare.avspart.com", gateway)

    # Setting User-Agent header
    session.headers['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36'
    response = session.get("https://spare.avspart.com/catalog/case/64848/4534337/677993/")
    print(response.status_code)

    # Delete gateways
    gateway.shutdown()
I am trying to scrape the page "https://spare.avspart.com/catalog/case/64848/4534337/677993/" using requests-ip-rotator, because I was blocked when using requests.get(), but when I try to access it I get a TooManyRedirects: Exceeded 30 redirects. error.
I have read through most of the posts on this problem and tried various things, such as changing session.max_redirects, trying different types of headers, and reaching out to the library creator. The accepted answer for the same issue seems to solve the problem, but when I try to implement it in my code the issue persists.
It would be great if anyone has any recommendations for other things I can try.

I call an API from Python and get a 406 Not Acceptable response

I created an API on my site and I'm trying to call it from Python, but I always get 406 as a response. However, if I put the URL in the browser with the parameters, I can see the correct answer.
I already did some tests on pages where you can test your own API; I tested it in the browser and it works fine.
I already followed a manual that explains how to call an API from Python, but I do not get the correct response :(
This is the URL of the API with the params:
https://icassy.com/api/login.php?usuario_email=warles34%40gmail.com&usuario_clave=123
This is the code I use to call the API from Python
import requests
urlLogin = "https://icassy.com/api/login.php"
params = {'usuario_email': 'warles34#gmail.com', 'usuario_clave': '123'}
r = requests.get(url=urlLogin, data=params)
print(r)
print(r.content)
and I get:
<Response [406]>
b'<head><title>Not Acceptable!</title></head><body><h1>Not Acceptable!</h1><p>An appropriate representation of the requested resource could not be found on this server. This error was generated by Mod_Security.</p></body></html>'
I should receive in JSON format the success message and the apikey like this:
{"message":"Successful login.","apikey":"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwOlwvXC9leGFtcGxlLm9yZyIsImF1ZCI6Imh0dHA6XC9cL2ljYXNzeS5jb20iLCJpYXQiOjEzNTY5OTk1MjQsIm5iZiI6MTM1NzAwMDAwMCwiZGF0YSI6eyJ1c3VhcmlvX2lkIjoiMzQiLCJ1c3VhcmlvX25vbWJyZSI6IkNhcmxvcyIsInVzdWFyaW9fYXBlbGxpZG8iOiJQZXJleiIsInVzdWFyaW9fZW1haWwiOiJ3YXJsZXMzNEBnbWFpbC5jb20ifX0.bOhrC-vXhQEHtbbZGmhLByCxvJY7YxDrLhVOfy9zeFc"}
Looks like there is a validation on the server that checks whether the request is made from a browser. Adding a User-Agent header should do it:
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
r = requests.get(url=urlLogin, params=params, headers=headers)
This list of user agents might come in handy in the future.
It turned out that the service I was making requests to was hosted on Akamai, which has a bot manager. It looks at the requests (where they come from), and if it determines that the client is a bot, you get a 406 error.
The solution was to ask for the server IP to be whitelisted, or to send a special header with all server communication.
In my case, I had
'Accept': 'text/plain'
and it worked after I replaced it with
'Accept': 'application/json'
I didn't need to use a User-Agent header at all.
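Put together with the request from the question, that looks roughly like this (a sketch, reusing the same urlLogin and params):

import requests

urlLogin = "https://icassy.com/api/login.php"
params = {'usuario_email': 'warles34#gmail.com', 'usuario_clave': '123'}

# Ask explicitly for JSON; some servers (or WAF rules like Mod_Security) reject requests whose Accept header they don't like
headers = {'Accept': 'application/json'}
r = requests.get(url=urlLogin, params=params, headers=headers)
print(r.status_code)
print(r.json())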

Replicate browser actions with a python script using Fiddler

I am trying to log in to a website using python and the requests module.
My problem is that I still see the login page even after I have submitted my username/password and try to access pages behind the login - in other words, I am not getting past the login page, even though the login seems successful.
I am learning that it can be a different process with each website and so it's not obvious what I need to add to fix the problem.
It was suggested that I download a web traffic snooper like Fiddler and then try to replicate the actions with my python script.
I have downloaded Fiddler, but I'm a little out of my depth with how I find and replicate the actions that I need.
Any help would be gratefully received.
My original code:
import requests

payload = {
    'login_Email': 'xxxxx#gmail.com',
    'login_Password': 'xxxxx'
}

with requests.Session() as s:
    p = s.post('https://www.auction4cars.com/', data=payload)
    print p.text
If you look at the browser developer tools, you may see that the login POST request needs to be submitted to a different URL:
https://www.auction4cars.com/Home/UserLogin
Note that also the payload needs to be:
payload = {
    'login_Email_or_Username': 'xxxxx#gmail.com',
    'login_Password': 'xxxxx'
}
I'd still visit the login page before doing that and set the headers:
HOME_URL = 'https://www.auction4cars.com/'
LOGIN_URL = "https://www.auction4cars.com/Home/UserLogin"

with requests.Session() as s:
    s.headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
    }

    s.get(HOME_URL)
    p = s.post(LOGIN_URL, data=payload)

    print(p.text)  # or use p.json() as, I think, the response format is JSON
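To check whether the login actually took, you can reuse the same session to request a page that is only visible when logged in (the URL below is just a placeholder for whichever page you were trying to reach). Continuing inside the same with requests.Session() as s: block:

    # Placeholder URL for a page behind the login; replace it with the page you actually want
    protected = s.get('https://www.auction4cars.com/some/page/behind/login')
    # Rough heuristic: if the login form field name still appears, you are probably not logged in
    print('login form still shown:', 'login_Password' in protected.text)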

How to Google in Python Using urllib or requests

What is the proper way to Google something in Python 3? I have tried requests and urllib for a Google page. When I simply do res = requests.get("https://www.google.com/#q=" + query), it doesn't come back with the same HTML as when I inspect the Google page in Safari. The same happens with urllib, and a similar thing happens when I use Bing. I am familiar with AJAX; however, it seems that it is now deprecated.
In Python, if you do not specify the User-Agent header in HTTP requests manually, Python will add one for you by default, which can be detected by Google and may be blocked by it.
Try the following and see if it helps.
import urllib.request

yourUrl = "post it here"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
req = urllib.request.Request(yourUrl, headers=headers)
page = urllib.request.urlopen(req)
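urlopen() returns a response object, so you still need to read and decode it to get the HTML as a string, for example:

html = page.read().decode('utf-8')  # assumes the page is UTF-8 encoded
print(html[:500])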
