OAuth for third party API on Jupyter? - python

I'm using Python on a Jupyter notebook for data analysis, and want access to a third party API (Mendeley) that uses OAuth. There used to be a work-around with a server on Heroku that produced a token manually, but that's been discontinued recently.
This must be an insanely common problem, but I can't find a maintained library that supports it. Most Python OAuth libraries are server-only; there's a well-supported JupyterHub OAuthenticator, but AFAICS that uses OAuth for a different purpose.
ipyauth looks the business, but it's not been updated much and it's not documented how to extend it for new services. That situation usually means there's something better-supported available.
What is the currently-maintained Jupyter-Python-ThirdPartyAPI library, please?

Well, one answer turns out to be simply to use the requests-oauthlib package, and to copy and paste the redirected URL each time:
from requests_oauthlib import OAuth2Session
scope = 'all'
redirect_uri='http://localhost:8888/'
oauth = OAuth2Session('YourApplicationApiIdNumber', redirect_uri=redirect_uri, scope=scope)
authorization_url, state = oauth.authorization_url("https://api.mendeley.com/oauth/authorize")
print('Please go to %s to authorize access, and copy the final localhost URL' % authorization_url)
assert False  # Stop processing until this is done.
... into a variable:
authorization_response = 'http://localhost:8888/tree?TheStuffWereInterestedIn'
... And then take it from there:
token = oauth.fetch_token(
    'https://api.mendeley.com/oauth/token',
    # oauthlib rejects http:// URLs, and Jupyter sometimes appends a trailing comma
    authorization_response=authorization_response.replace('http://', 'https://', 1).rstrip(','),
    client_secret='YourClientSecret')
r = oauth.get('https://api.mendeley.com/documents?sort=last_modified&order=desc&limit=500', timeout=30)
You have to configure the Mendeley application with the callback URL http://localhost:8888/, because that's a page Jupyter can display without losing the additional OAuth2 parameters. However, requests-oauthlib (via oauthlib) doesn't accept non-HTTPS URLs (nor the trailing comma that Jupyter occasionally adds), so we fudge it as shown.
I expect this approach will work with pretty much any OAuth2 API; the requests-oauthlib documentation certainly lists quite a few.
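If you'd rather not hand-edit the pasted URL, a small helper can do the clean-up. This is a minimal sketch using only the standard library; `normalize_redirect` is a hypothetical name, not part of requests-oauthlib. It rewrites only the scheme, so a URL that is already `https://` is not mangled into `httpss://`. Alternatively, oauthlib skips its HTTPS check when the environment variable `OAUTHLIB_INSECURE_TRANSPORT` is set to `1`, which is arguably acceptable only for a localhost callback like this one.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_redirect(pasted: str) -> str:
    """Clean a redirect URL copied from the browser before passing it to fetch_token."""
    # Drop surrounding whitespace and the trailing comma Jupyter sometimes adds
    url = pasted.strip().rstrip(',')
    parts = urlsplit(url)
    # oauthlib refuses http:// URLs; rewrite only the scheme, keep host, path and query
    return urlunsplit(parts._replace(scheme='https'))
```

Usage: `oauth.fetch_token(token_url, authorization_response=normalize_redirect(authorization_response), client_secret=...)`.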

Not sure if this helps, but I had to do an OAuth PKCE authorization flow with a client_id, a registered callback URL and no client secret, and it had to run in Jupyter. The user has to log in to the authorization provider's website as part of the redirect (I think; I get mixed up with the terms because for me the auth provider and the resource were the same). To do this I used Python's HTTPServer and a request handler to intercept the callback URL. The handler then passes the authorization code to my OAuth client when fetching the access token.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import parse
import random
import string
import hashlib
import base64
from typing import Any
import webbrowser
from authlib.integrations.requests_client import OAuth2Session
from IPython.display import clear_output

config = {
    'scopes': ['openid', 'profile'],
    'port': 8888,  # must be a free port: Jupyter's own server also defaults to 8888
    'redirect_url': 'http://localhost:8888/auth/callback',
    'access_token_url': 'https://some_oauth_proivder.com/oauth2/access',
    'auth_code_url': 'https://some_oauth_proivder.com/oauth2/authz/',
    'client_id': 'providedByOauthprovider'}

def generate_code() -> tuple[str, str]:
    rand = random.SystemRandom()
    code_verifier = ''.join(rand.choices(string.ascii_letters + string.digits, k=128))
    code_sha_256 = hashlib.sha256(code_verifier.encode('utf-8')).digest()
    b64 = base64.urlsafe_b64encode(code_sha_256)
    code_challenge = b64.decode('utf-8').replace('=', '')
    return (code_verifier, code_challenge)

def login(config: dict[str, Any]) -> str:
    class OAuthHttpServer(HTTPServer):
        def __init__(self, *args, **kwargs):
            HTTPServer.__init__(self, *args, **kwargs)
            self.authorization_code = ""

    class OAuthHttpHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write("Redirecting to the My Auth provider login\n".encode("UTF-8"))
            parsed = parse.urlparse(self.path)
            # self.path is the callback path plus ?code=...&state=...;
            # client, state and code_verifier come from the enclosing login() call
            client.fetch_token(url=config['access_token_url'],
                               authorization_response=self.path,
                               state=state,
                               code_verifier=code_verifier,
                               grant_type='authorization_code')
            self.wfile.write("""
            <html>
            <body>
            <h2>Authorization request to My Auth provider has been completed.</h2>
            <h3>You may close this tab or window now.</h3>
            </body>
            </html>
            """.encode("UTF-8"))
            self.wfile.write('<script> setTimeout("window.close()", 2500);</script>'.encode("UTF-8"))  # Timeout only works if already logged in, for some reason

    with OAuthHttpServer(('', config["port"]), OAuthHttpHandler) as httpd:
        client = OAuth2Session(client_id=config['client_id'],
                               scope=config['scopes'],
                               redirect_uri=config['redirect_url'],
                               code_challenge_method='S256')
        code_verifier, code_challenge = generate_code()
        # authlib derives the S256 code_challenge from code_verifier itself
        auth_uri, state = client.create_authorization_url(config['auth_code_url'], code_verifier=code_verifier)
        webbrowser.open_new(auth_uri)
        httpd.handle_request()  # serve exactly one request: the OAuth callback
    clear_output()
    print("Logged in successfully")
    return client.token['access_token']
I borrowed a lot of this from a sample by Camilo Terevinto, but I had to swap out the OAuth client because it didn't work for me, and I had to rearrange the HTTP request handler a little.
https://github.com/CamiloTerevinto/Blog/tree/main/Samples
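For reference, the PKCE "S256" transformation that `generate_code` performs is small enough to check in isolation. This sketch swaps `random.SystemRandom` for the standard library's `secrets` module (my choice, not the answer's), and the function names are hypothetical. Note that when `code_challenge_method='S256'` is set on authlib's `OAuth2Session`, `create_authorization_url` computes the challenge from `code_verifier` itself, so only the verifier really has to be generated.

```python
import base64
import hashlib
import secrets


def make_code_verifier(n_bytes: int = 64) -> str:
    """Random verifier using only URL-safe characters [A-Za-z0-9_-]."""
    # 64 random bytes -> 86 characters, inside RFC 7636's 43..128 range
    return secrets.token_urlsafe(n_bytes)


def s256_challenge(code_verifier: str) -> str:
    """code_challenge = BASE64URL(SHA256(verifier)) with the '=' padding removed."""
    digest = hashlib.sha256(code_verifier.encode('ascii')).digest()
    return base64.urlsafe_b64encode(digest).decode('ascii').rstrip('=')
```

The test vector from RFC 7636 Appendix B is a quick way to confirm the transformation matches what the provider will compute.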

Related

Google login on PythonAnywhere

I'm using this code to log in to my Google account remotely using requests. Locally it works perfectly, but on PythonAnywhere (same Python version, 3.6, and I have a paid Hacker plan account) it doesn't work: it doesn't connect to my Google account at all, and there's no error message in the console. Do you have any idea what the problem could be?
from bs4 import BeautifulSoup
import requests

class SessionGoogle:
    def __init__(self, url_login, url_auth, login, pwd):
        self.ses = requests.session()
        login_html = self.ses.get(url_login)
        soup_login = BeautifulSoup(login_html.content, 'html.parser').find('form').find_all('input')
        my_dict = {}
        for u in soup_login:
            if u.has_attr('value'):
                my_dict[u['name']] = u['value']
        # override the inputs without login and pwd:
        my_dict['Email'] = login
        my_dict['Passwd'] = pwd
        self.ses.post(url_auth, data=my_dict)

    def get(self, URL):
        return self.ses.get(URL).text

url_login = "https://accounts.google.com/ServiceLogin"
url_auth = "https://accounts.google.com/ServiceLoginAuth"
session = SessionGoogle(url_login, url_auth, "myGoogleLogin", "myPassword")
print(session.get("http://plus.google.com"))
You're never actually looking at the response of the login post request, so it's entirely likely that Google is rejecting your login for some reason and you have no way of knowing.
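Capturing the return value of `self.ses.post(...)` and printing a short summary makes such a silent failure visible, and lets you compare what happens locally with what happens on PythonAnywhere. A sketch; `describe_response` is a hypothetical helper that only reads the standard `status_code`, `url` and `history` attributes of a requests `Response`.

```python
def describe_response(resp) -> str:
    """Summarise a requests.Response: status, final URL and any redirect chain."""
    lines = [f'status: {resp.status_code}', f'final url: {resp.url}']
    # resp.history holds the intermediate responses of redirects that were followed
    hops = [r.url for r in resp.history]
    if hops:
        lines.append('redirected via: ' + ' -> '.join(hops))
    return '\n'.join(lines)
```

In `__init__`, replace the bare post with `resp = self.ses.post(url_auth, data=my_dict)` followed by `print(describe_response(resp))`, and inspect `resp.text` if the final URL is still a login page.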

How to use oauth2 to access StackExchange API?

I'm following the instructions mentioned here: https://api.stackexchange.com/docs/authentication
But since no code is provided, I can't understand the flow properly.
I've tried to get the authentication working with the two methods below, but I've hit a dead end.
1)
import requests
from pprint import pprint
resp = requests.get('https://stackexchange.com/oauth/dialog?client_id=6667&scope=private_info&redirect_uri=https://stackexchange.com/oauth/login_success/')
pprint(vars(resp))
2)
import oauth2 as oauth
from pprint import pprint
url = 'https://www.stackexchange.com'
request_token_url = '%s/oauth/' % url
access_token_url = '%s/' % url
consumer = oauth.Consumer(key='mykey', secret='mysecret')
client = oauth.Client(consumer)
response, content = client.request(request_token_url, 'GET')
print(response, content)
I'm not sure how to proceed from here. I need to take the access token that is returned and use it to query the API. Sample code would really help! Thanks.
EDIT: This is the code I'm using currently:
from requests_oauthlib import OAuth2Session
from pprint import pprint
client_id = 'x'
client_secret = 'x'
redirect_uri = 'https://stackexchange.com/oauth/login_success'
scope = 'no_expiry'
oauth = OAuth2Session(client_id, redirect_uri=redirect_uri, scope=scope)
pprint(vars(oauth))
authorization_url, state = oauth.authorization_url('https://stackexchange.com/oauth/dialog')
print(authorization_url)
Instead of having to click on the authorization_url and get the token, is there a way I can directly fetch the token within the script itself?
Of the two methods you used, the first is the recommended method for desktop applications. It is probably correct.
OAuth is intended to force the user to go to a specific webpage and acknowledge that they are giving permission (usually through clicking a button) for an application to access their data. The HTTP responses you print are merely the webpage where a user needs to click accept.
To get a feeling for the flow, put the first address (https://stackexchange.com/oauth/dialog?client_id=6667&scope=&redirect_uri=https://stackexchange.com/oauth/login_success/) in the address bar and click accept on the loaded page. The access_token will be in the URL right after that.
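The token can also be pulled out of that redirected URL programmatically. A minimal sketch, assuming the desktop flow described above (redirect_uri set to https://stackexchange.com/oauth/login_success) and that the token arrives in the URL fragment as `#access_token=...&expires=...`; `dialog_url` and `token_from_redirect` are hypothetical helper names. The browser step itself can't be skipped: the user still has to open the dialog URL and approve access.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

DIALOG_URL = 'https://stackexchange.com/oauth/dialog'
LOGIN_SUCCESS = 'https://stackexchange.com/oauth/login_success'


def dialog_url(client_id: int, scope: str = 'no_expiry') -> str:
    """URL for the user to open (e.g. with webbrowser.open) and approve."""
    query = urlencode({'client_id': client_id, 'scope': scope, 'redirect_uri': LOGIN_SUCCESS})
    return f'{DIALOG_URL}?{query}'


def token_from_redirect(redirected_url: str) -> str:
    """Extract access_token from the URL the browser lands on after approval."""
    # Assumption: the token is in the fragment (after '#'), not in the query string
    values = parse_qs(urlsplit(redirected_url).fragment)
    if 'access_token' not in values:
        raise ValueError('no access_token in redirect URL')
    return values['access_token'][0]
```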
If you are making the application only for yourself, the access_token can be copied into your Python script. The token expires in one day; if that is too short add no_expiry to scope to make it last forever. DO NOT share the token with anyone else, since it gives them access to details of your account! Each user of the script must generate their own token.
Test the access token by inserting your app's key and the access token you just obtained into this URL: https://api.stackexchange.com/2.2/me?key=key&site=stackoverflow&order=desc&sort=reputation&access_token=&filter=default
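Building that test URL with `urllib.parse.urlencode` avoids escaping mistakes once a real key and token are filled in. A sketch; `me_url` is a hypothetical name, and the parameter values are copied from the URL above.

```python
from urllib.parse import urlencode


def me_url(app_key: str, access_token: str, site: str = 'stackoverflow') -> str:
    """The /me test URL from the answer, with each value properly escaped."""
    params = {'key': app_key, 'site': site, 'order': 'desc', 'sort': 'reputation',
              'access_token': access_token, 'filter': 'default'}
    return 'https://api.stackexchange.com/2.2/me?' + urlencode(params)
```

With requests installed, something like `requests.get(me_url(key, token)).json()` should return the account details if the token is valid.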
If you need a more automated, integrated, user-friendly solution, I would look at Selenium WebDriver to open a browser window and capture the resulting credentials.
Just one minor correction on Marc's answer. If you want the access token to last forever, you should add no_expiry instead of no_expire.

403 when retrieving a WSDL via Python SUDS

I can't seem to get SUDS to download a WSDL that requires Basic auth credentials. My code is simple:
wsdl_url = 'https://example.com/ChangeRequest.do?WSDL'
self.client = Client(wsdl_url, username=username, password=password)
I've also tried:
from suds.transport.https import HttpAuthenticated
wsdl_url = 'https://example.com/ChangeRequest.do?WSDL'
credentials = dict(username=username, password=password)
t = HttpAuthenticated(**credentials)
self.client = Client(url=wsdl_url, transport=t)
In both cases, the service returns a 403 Forbidden error. I can go down into the SUDS code in http.py and add this line to the call:
u2request.add_header('Authorization','Basic xxxxxxxxxxxxxxxxxxxx')
This works. What am I doing wrong to get SUDS to pass my credentials when downloading the WSDL?
Note: I tried connecting to the WSDL directly with both Chrome's Postman plugin and SoapUI, and the service works there too, so I know the credentials are correct.
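The value pasted into http.py above is a standard HTTP Basic credential: `Basic ` followed by the base64 encoding of `username:password`. A minimal sketch for building it without editing the suds sources (`basic_auth_header` is a hypothetical name). The resulting dict can be passed to `client.set_options(headers=...)`, but older suds versions may not apply client headers to the WSDL download itself, which would explain why patching http.py worked; in that case the header has to be added where the transport builds its request, as in the `open` override in the answer below.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the Authorization header for HTTP Basic authentication."""
    raw = f'{username}:{password}'.encode('utf-8')
    token = base64.b64encode(raw).decode('ascii')
    return {'Authorization': f'Basic {token}'}
```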
I encountered a similar issue (suds v0.4, WSDL, 403) and found out that it was because the server I was trying to access blocks any request whose User-Agent header looks like Python-urllib* (suds uses urllib2, hence the default header). Explicitly changing the header solves the issue.
Particular to my solution: I overrode the open method of a transport class and set client options, as in the following snippet. Note that the header has to be set separately for opening the WSDL and for subsequent requests. Please advise if you know better ways to get around this. I hope this post saves someone some time in the future.
import urllib2
from suds.client import Client
from suds.transport.https import HttpAuthenticated
from suds.transport import TransportError

URL = 'https://example.com/ChangeRequest.do?WSDL'

class HttpHeaderModify(HttpAuthenticated):
    def open(self, request):
        try:
            url = request.url
            # Replace the default Python-urllib/x.y User-Agent that the server blocks
            u2request = urllib2.Request(url, headers={'User-Agent': 'Mozilla'})
            self.proxy = self.options.proxy
            return self.u2open(u2request)
        except urllib2.HTTPError as e:
            raise TransportError(str(e), e.code, e.fp)

transport = HttpHeaderModify()
client = Client(URL, transport=transport, timeout=10)
# Subsequent requests' header needs to be set again here. The overridden transport
# class only handles opening of the client.
client.set_options(headers={'User-Agent': 'Mozilla'})
P.S. My problem may not be exactly the same, but searching for "403 suds" brings up this SO question, so I decided to post my solution here.
Reference post that pointed me in the right direction: https://bitbucket.org/jurko/suds/issues/27/client-request-for-wsdl-does-not-use-given
I had this issue before and compared suds's request headers with SoapUI's.
It turned out that suds was not including the Host header.
client.set_options(headers={'Host': 'value'})
That fixed the issue.
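If the Host header is what's missing, its value is normally the host (plus the port, if it is non-default) of the URL being requested. A sketch deriving it from the WSDL URL instead of hard-coding `'value'`; `host_header` is a hypothetical name, and it assumes the URL contains no `user:password@` part, which `netloc` would otherwise include.

```python
from urllib.parse import urlsplit


def host_header(wsdl_url: str) -> dict:
    """Host header value taken from the URL's network location."""
    # netloc keeps an explicit port such as 'example.com:8443'
    return {'Host': urlsplit(wsdl_url).netloc}
```

Usage: `client.set_options(headers=host_header(URL))`.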

django/python the works of views and bringing in an API

I'm just beginning to learn Python/Django. I know PHP, but I wanted to get to know this framework. I'm trying to work with Yelp's API, and I'm trying to figure out what to do when you bring a new file like this into the project.
In their business.py they have this:
import json
import oauth2
import optparse
import urllib
import urllib2
parser = optparse.OptionParser()
parser.add_option('-c', '--consumer_key', dest='consumer_key', help='OAuth consumer key (REQUIRED)')
parser.add_option('-s', '--consumer_secret', dest='consumer_secret', help='OAuth consumer secret (REQUIRED)')
parser.add_option('-t', '--token', dest='token', help='OAuth token (REQUIRED)')
parser.add_option('-e', '--token_secret', dest='token_secret', help='OAuth token secret (REQUIRED)')
parser.add_option('-a', '--host', dest='host', help='Host', default='api.yelp.com')
parser.add_option('-i', '--id', dest='id', help='Business')
parser.add_option('-u', '--cc', dest='cc', help='Country code')
parser.add_option('-n', '--lang', dest='lang', help='Language code')
options, args = parser.parse_args()
# Required options
if not options.consumer_key:
    parser.error('--consumer_key required')
if not options.consumer_secret:
    parser.error('--consumer_secret required')
if not options.token:
    parser.error('--token required')
if not options.token_secret:
    parser.error('--token_secret required')
if not options.id:
    parser.error('--id required')

url_params = {}
if options.cc:
    url_params['cc'] = options.cc
if options.lang:
    url_params['lang'] = options.lang

path = '/v2/business/%s' % (options.id,)

def request(host, path, url_params, consumer_key, consumer_secret, token, token_secret):
    """Returns response for API request."""
    # Unsigned URL
    encoded_params = ''
    if url_params:
        encoded_params = urllib.urlencode(url_params)
    url = 'http://%s%s?%s' % (host, path, encoded_params)
    print 'URL: %s' % (url,)

    # Sign the URL
    consumer = oauth2.Consumer(consumer_key, consumer_secret)
    oauth_request = oauth2.Request('GET', url, {})
    oauth_request.update({'oauth_nonce': oauth2.generate_nonce(),
                          'oauth_timestamp': oauth2.generate_timestamp(),
                          'oauth_token': token,
                          'oauth_consumer_key': consumer_key})

    token = oauth2.Token(token, token_secret)
    oauth_request.sign_request(oauth2.SignatureMethod_HMAC_SHA1(), consumer, token)
    signed_url = oauth_request.to_url()
    print 'Signed URL: %s\n' % (signed_url,)

    # Connect
    try:
        conn = urllib2.urlopen(signed_url, None)
        try:
            response = json.loads(conn.read())
        finally:
            conn.close()
    except urllib2.HTTPError, error:
        response = json.loads(error.read())

    return response

response = request(options.host, path, url_params, options.consumer_key, options.consumer_secret, options.token, options.token_secret)
print json.dumps(response, sort_keys=True, indent=2)
It's very lengthy; I apologize for that. My concern is what I'm supposed to do with this. They've defined a request() function here; am I supposed to import it into my views?
I've been following the Django documentation on creating a new app. In the documentation they define a bunch of functions inside the views.py file. I'm confused about how to make this work with my project. If I wanted to search for a business via the URL, how would it send the data out?
Thanks for your help.
This is a command-line script that makes HTTP requests to the Yelp API. You probably don't want to make such an external request within the context of a main request handler. Still, you could have a request handler make this call to Yelp. Let's see...
You could import the request function and, instead of invoking it with command-line options, call it yourself:
from yelp.business import request as yelp_req
def my_request_handler(request):
    json_from_yelp = yelp_req(...)
    # do stuff and return a response
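One caveat if you import `request` from business.py as-is: the module runs its optparse code (`parser.parse_args()` and the `parser.error(...)` checks) at import time, so importing it inside Django would parse Django's own command line and most likely raise `SystemExit`. A hedged way around that is to keep only the `request` function in the module and build its inputs from plain arguments. The helper below (hypothetical name `business_args`) mirrors the script's option handling in Python 3; the OAuth signing still happens inside `request()`.

```python
from typing import Dict, Optional, Tuple


def business_args(business_id: str, cc: Optional[str] = None,
                  lang: Optional[str] = None) -> Tuple[str, Dict[str, str]]:
    """Return (path, url_params) for request(), built without optparse."""
    url_params = {}
    if cc:
        url_params['cc'] = cc      # country code
    if lang:
        url_params['lang'] = lang  # language code
    path = '/v2/business/%s' % (business_id,)
    return path, url_params
```

In a view you could then write `path, url_params = business_args(business_id)` and call `yelp_req('api.yelp.com', path, url_params, consumer_key, consumer_secret, token, token_secret)`, with the keys read from your settings rather than the command line.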
Making this kind of call inside a request handler (that is, an HTTP request to an external service while serving a request) is pretty meh, though. If the call is made via Ajax, it may be OK for the UX.
This business.py is just an example showing how to create a signed request with the oauth2 library (which, despite its name, does OAuth 1.0 HMAC-SHA1 signing). You may be able to just import the request function and use it. OTOH, you may prefer to write your own (perhaps using the requests library). You probably want to use Celery or some other asynchronous mechanism to make the calls outside your request handlers, and/or cache the responses to avoid costly external HTTP I/O on every request.
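To illustrate the caching suggestion, here is a minimal per-process time-to-live cache built from the standard library. It is only a sketch (`ttl_cache` is a hypothetical name): in a real Django project, Django's own cache framework (`django.core.cache`) is the more natural choice because it can be shared across worker processes, and Celery covers the asynchronous part.

```python
import time
from functools import wraps


def ttl_cache(seconds: float):
    """Cache results per positional-argument tuple for `seconds` seconds."""
    def decorator(func):
        store = {}  # args -> (timestamp, value); grows without bound, not thread-safe

        @wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit is not None and now - hit[0] < seconds:
                return hit[1]  # fresh enough: skip the external call
            value = func(*args)
            store[args] = (now, value)
            return value
        return wrapper
    return decorator
```

Decorating a thin wrapper around the Yelp call with `@ttl_cache(600)` would mean repeated lookups of the same business within ten minutes reuse the first response.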

pywikipedia bot with https and http authentication

I'm having trouble getting my bot to login to a MediaWiki install on the intranet. I believe it is due to the http authentication protecting the wiki.
Facts:
The wiki root is: https://local.example.com/mywiki/
When visiting the wiki with a web browser, a popup comes up asking for enterprise credentials (I assume this is basic access authentication)
This is what I have in my user-config.py:
mylang = 'en'
family = 'mywiki'
usernames['mywiki']['en'] = u'Bot'
authenticate['local.example.com'] = ('user', 'pass')
This is what I have in mywiki_family.py:
# -*- coding: utf-8 -*-
import family, config
# The Wikimedia family that is known as mywiki
class Family(family.Family):
def __init__(self):
family.Family.__init__(self)
self.name = 'mywiki'
self.langs = { 'en' : 'local.example.com'}
def scriptpath(self, code):
return '/mywiki'
def version(self, code):
return '1.13.5'
def isPublic(self):
return False
def hostname(self, code):
return 'local.example.com'
def protocol(self, code):
return 'https'
def path(self, code):
return '/mywiki/index.php'
When I execute login.py -v -v, I get this:
urllib2.urlopen(urllib2.Request('https://local.example.com/w/index.php?title=Special:Userlogin&useskin=monobook&action=submit', wpSkipCookieCheck=1&wpPassword=XXXX&wpDomain=&wpRemember=1&wpLoginattempt=Aanmelden%20%26%20Inschrijven&wpName=Bot, {'Content-type': 'application/x-www-form-urlencoded', 'User-agent': 'PythonWikipediaBot/1.0'})):
(Redundant traceback info here)
urllib2.HTTPError: HTTP Error 401: Unauthorized
(I'm not sure why it has 'local.example.com/w' instead of '/mywiki'.)
I thought it might be trying to authenticate to example.com instead of example.com/wiki, so I changed the authenticate line to:
authenticate['local.example.com/mywiki'] = ('user', 'pass')
But then I get an HTTP 401.2 error back from IIS:
You do not have permission to view this directory or page using the credentials that you supplied because your Web browser is sending a WWW-Authenticate header field that the Web server is not configured to accept.
Any help on how to get this working would be appreciated.
Update: After fixing my family file, it now says:
Getting information for site mywiki:en
('http error', 401, 'Unauthorized', )
WARNING: Could not open 'https://local.example.com/mywiki/index.php?title=Non-existing_page&action=edit&useskin=monobook'. Maybe the server or your connection is down. Retrying in 1 minutes...
I looked at the HTTP headers on a plain urllib2.urlopen call, and the server responds with WWW-Authenticate: Negotiate and WWW-Authenticate: NTLM. I'm guessing urllib2, and thus pywikipedia, doesn't support this?
Update: Added a tasty bounty for help getting this to work. I can authenticate using python-ntlm. How do I integrate this into pywikipedia?
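To see which schemes a server actually offers before picking a handler, you can catch the 401 and read its `WWW-Authenticate` headers. A Python 3 sketch (`auth_schemes` and `probe` are hypothetical names): a server configured like the one above would be expected to report `['Negotiate', 'NTLM']`, whereas a server using basic access authentication reports `Basic`.

```python
import urllib.error
import urllib.request


def auth_schemes(header_values) -> list:
    """Scheme names (the first token) from WWW-Authenticate header values."""
    return [v.split(None, 1)[0] for v in header_values if v.strip()]


def probe(url: str) -> list:
    """Request url without credentials and report the schemes a 401 offers."""
    try:
        urllib.request.urlopen(url, timeout=10)
    except urllib.error.HTTPError as err:
        if err.code == 401:
            return auth_schemes(err.headers.get_all('WWW-Authenticate') or [])
        raise
    return []  # no authentication was demanded
```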
Well, the fact that login.py tries to access '/w' instead of your path shows that there is a family configuration issue.
Your code is indented strangely: is scriptpath a member of the new Family class? As in:
class Family(family.Family):
    def __init__(self):
        family.Family.__init__(self)
        self.name = 'mywiki'
        self.langs = { 'en' : 'local.example.com'}

    def scriptpath(self, code):
        return '/mywiki'

    def version(self, code):
        return '1.13.5'

    def isPublic(self):
        return False

    def hostname(self, code):
        return 'local.example.com'

    def protocol(self, code):
        return 'https'
?
I believe something is wrong with your family file. A good way to check is to run this in a Python console:
import wikipedia
site = wikipedia.getSite('en', 'mywiki')
print site.login_address()
As long as the relative address is wrong, showing '/w' instead of '/mywiki', the family file is still not configured correctly, and the bot won't work :)
Update: how to integrate NTLM into pywikipedia?
I just had a look at the basic example here. I would integrate the code just before this line in login.py:
response = urllib2.urlopen(urllib2.Request(self.site.protocol() + '://' + self.site.hostname() + address, data, headers))
You want to write something like this:
from ntlm import HTTPNtlmAuthHandler

user = r'DOMAIN\User'  # raw string so the backslash is kept literally
password = "Password"
url = self.site.protocol() + '://' + self.site.hostname()

passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, user, password)
# create the NTLM authentication handler
auth_NTLM = HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman)
# create and install the opener
opener = urllib2.build_opener(auth_NTLM)
urllib2.install_opener(opener)
response = urllib2.urlopen(urllib2.Request(self.site.protocol() + '://' + self.site.hostname() + address, data, headers))
I would test this and integrate it directly into the pywikipedia codebase if only I had an NTLM setup available...
Whatever happens, please don't vanish with your solution: we at pywikipedia are interested in it :)
I am guessing the problem you have is that the server expects basic authentication and you are not handling that in your client. Michael Foord wrote a good article about handling basic authentication in Python.
You did not provide enough information for me to be sure about this, so if that does not work, please provide some additional information, such as a network dump of your connection attempt.
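For completeness, this is roughly what handling basic authentication looks like with the standard library in Python 3 (`urllib.request`, the successor of urllib2). It is a sketch: `basic_auth_opener` is a hypothetical name, and the URL and credentials are the placeholders from the question. Note that the server in the question advertises `Negotiate` and `NTLM`, so basic authentication alone may not be accepted there.

```python
import urllib.request


def basic_auth_opener(base_url: str, user: str, password: str):
    """Opener that answers Basic-auth challenges for URLs under base_url."""
    passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None: use these credentials whatever realm the server names
    passman.add_password(None, base_url, user, password)
    handler = urllib.request.HTTPBasicAuthHandler(passman)
    return urllib.request.build_opener(handler)
```

Usage: `opener = basic_auth_opener('https://local.example.com/mywiki/', 'user', 'pass')`, then `opener.open(url)` for requests under that path.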
