Getting author's articles from Scopus using Scopus API (AUTHENTICATION_ERROR) - python

I've registered at http://www.developers.elsevier.com/action/devprojects. I created a project and got my scopus key:
Now, using this generated key, I would like to find an author by first name, last name and subject area. I make requests from my university network, which is allowed to access Scopus (I have full manual access to Scopus search and use it from Firefox with no problem). However, I want to automate my Scopus mining by writing a simple script that finds an author's publications given his/her first name, last name and subject area.
Here's my code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests
import json
from scopus import SCOPUS_API_KEY
scopus_author_search_url = 'http://api.elsevier.com/content/search/author?'
headers = {'Accept':'application/json', 'X-ELS-APIKey': SCOPUS_API_KEY}
search_query = 'query=AUTHFIRST(%s) AND AUTHLASTNAME(%s) AND SUBJAREA(%s)' % ('John', 'Kitchin', 'COMP')
# api_resource = "http://api.elsevier.com/content/search/author?apiKey=%s&" % (SCOPUS_API_KEY)
# request with first searching page
page_request = requests.get(scopus_author_search_url + search_query, headers=headers)
print page_request.url
# response to json
page = json.loads(page_request.content.decode("utf-8"))
print page
Where SCOPUS_API_KEY looks just like this: SCOPUS_API_KEY="xxxxxxxx".
Although I have full access to Scopus from my university network, I get the following response:
{u'service-error': {u'status': {u'statusText': u'Requestor
configuration settings insufficient for access to this resource.',
u'statusCode': u'AUTHENTICATION_ERROR'}}}
The generated link looks like this: http://api.elsevier.com/content/search/author?query=AUTHFIRST(John)%20AND%20AUTHLASTNAME(Kitchin)%20AND%20SUBJAREA(COMP) and when I click it, it shows an XML file:
<service-error><status>
<statusCode>AUTHORIZATION_ERROR</statusCode>
<statusText>No APIKey provided for request</statusText>
</status></service-error>
Or, when I change the scopus_author_search_url to "http://api.elsevier.com/content/search/author?apiKey=%s&" % (SCOPUS_API_KEY) I'm getting:
{u'service-error': {u'status': {u'statusText': u'Requestor configuration settings insufficient for access to this resource.', u'statusCode': u'AUTHENTICATION_ERROR'}}} and the XML file:
<service-error>
<status>
<statusCode>AUTHENTICATION_ERROR</statusCode>
<statusText>Requestor configuration settings insufficient for access to this resource.</statusText>
</status>
</service-error>
What can be the cause of this problem and how can I fix it?

I have just registered for an API key and tested it first with this URL:
http://api.elsevier.com/content/search/author?apikey=4xxxxxxxxxxxxxxxxxxxxxxxxxxxxx43&query=AUTHFIRST%28John%29+AND+AUTHLASTNAME%28Kitchin%29+AND+SUBJAREA%28COMP%29
This works fine from my university network. I also tested a second API key, so I have verified one key registered with a website on my university domain and one registered with the website http://apitest.example.com. That rules out the domain name used at registration as the source of your problem.
I tested this:
- in the browser;
- using your Python code with the API key in the headers. The only change I made to your code was removing
from scopus import SCOPUS_API_KEY
and adding
SCOPUS_API_KEY = '4xxxxxxxxxxxxxxxxxxxxxxxxxxxxx43'
- using your Python code adapted to put the API key in the URL instead of the headers.
In all cases, the query returns two authors, one at Carnegie Mellon and one at Palo Alto.
I can't replicate your error message. If I try to use the API key from an IP address that is not registered with Elsevier (e.g. my home computer), I see a different error:
<service-error>
<status>
<statusCode>AUTHENTICATION_ERROR</statusCode>
<statusText>Client IP Address: xxx.yyy.aaa.bbb does not resolve to an account</statusText>
</status>
</service-error>
If I use a random (wrong) API key from the university network, I see
<service-error>
<status>
<statusCode>AUTHORIZATION_ERROR</statusCode>
<statusText>APIKey <mad3upa1phanum3r1ck3y> with IP address <my.uni.IP.add> is unrecognized or has insufficient privileges for access to this resource</statusText>
</status>
</service-error>
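The failure modes above differ only in statusCode and statusText, so it can help to pull those two fields out of a response before deciding what went wrong. A minimal sketch, assuming the error body has exactly the un-namespaced XML shape shown above; parse_service_error is a hypothetical helper, not part of any Elsevier client:

```python
import xml.etree.ElementTree as ET


def parse_service_error(xml_text):
    """Return (statusCode, statusText) for a <service-error> body, else None."""
    root = ET.fromstring(xml_text)
    if root.tag != "service-error":
        return None  # not an error document (e.g. a normal search result)
    status = root.find("status")
    if status is None:
        return None
    return status.findtext("statusCode"), status.findtext("statusText")


# Sample body copied from the "unregistered IP address" case above.
sample = """<service-error>
<status>
<statusCode>AUTHENTICATION_ERROR</statusCode>
<statusText>Client IP Address: xxx.yyy.aaa.bbb does not resolve to an account</statusText>
</status>
</service-error>"""

code, text = parse_service_error(sample)
print(code)
print(text)
```

The statusText is usually the more informative of the two: here it points at the client IP address rather than at the key itself.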
Debug steps
As I can't replicate your problem, here are some diagnostic steps you can use to resolve it:
1. Use your browser at uni to submit the API query with your key in the URL (i.e. copy the URL above, paste it into the address bar, substitute your key and see whether you get the XML back).
2. If 1 returns the XML you expect, move on to submitting the request via Python: first, copy the exact URL straight into Python (no variable substitution via %s, no apikey in the header) and simply do a .get() on it.
3. If 2 returns correctly, ensure that your SCOPUS_API_KEY holds the exact key value, no more, no less, i.e. print SCOPUS_API_KEY should output your apikey: 4xxxxxxxxxxxxxxxxxxxxxxxxxxxxx43
4. If 1 returns the error, it looks like your uni (for whatever reason) does not have access to the author search API. This doesn't make much sense given that you can search manually, but that is all I can conclude.
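The debug steps compare two ways of passing the key: in the URL and in the header. A minimal sketch of both, using only the standard library to build the request pieces without sending anything; the function names and the "YOUR_KEY" placeholder are illustrative assumptions:

```python
from urllib.parse import urlencode

AUTHOR_SEARCH_URL = "http://api.elsevier.com/content/search/author"


def build_query(first, last, subject_area):
    """Compose the Scopus author-search expression used in the question."""
    return "AUTHFIRST(%s) AND AUTHLASTNAME(%s) AND SUBJAREA(%s)" % (
        first, last, subject_area)


def key_in_url(api_key, query):
    """The key travels as the apikey URL parameter (as in the browser test)."""
    url = AUTHOR_SEARCH_URL + "?" + urlencode({"apikey": api_key, "query": query})
    headers = {"Accept": "application/json"}
    return url, headers


def key_in_header(api_key, query):
    """The key travels in the X-ELS-APIKey header (as in the question's code)."""
    url = AUTHOR_SEARCH_URL + "?" + urlencode({"query": query})
    headers = {"Accept": "application/json", "X-ELS-APIKey": api_key}
    return url, headers


query = build_query("John", "Kitchin", "COMP")
url_a, headers_a = key_in_url("YOUR_KEY", query)      # placeholder key
url_b, headers_b = key_in_header("YOUR_KEY", query)   # placeholder key
print(url_a)
print(url_b, headers_b)
```

Either pair can then be sent with requests.get(url, headers=headers). If only one variant fails, the problem lies in how the key is passed rather than in the key or the network.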
Docs
For reference the authentication algorithm documentation is here, but it is not very simple to follow. You are following authentication option 1 and your method should just work.
N.B. The API is limited to 5000 author retrievals per week. If you have run a lot of queries in a loop, even if they have failed, it is possible that you have exceeded that...

For future reference: the OP was using the scopus package, which has long since been renamed to pybliometrics.
Nowadays you can do
from pybliometrics.scopus import AuthorSearch
q = "AUTHFIRST(John) AND AUTHLASTNAME(Kitchin) AND SUBJAREA(COMP)"
s = AuthorSearch(q) # handles access, retrieval, parsing and even caches results
print(s)
results = s.authors # Holds all the information as a list of namedtuples
print(results) # You can put this into a pandas DataFrame as well


python: getting the complete url with os.environ

I would like to get the COMPLETE URL using os.environ.
I am rendering notebooks with voila and I would like to open a URL from a dashboard and read the parameters in that URL.
So far I have:
URL_BASE = os.environ.get('SCRIPT_NAME')
PARAMETERS = os.environ.get("QUERY_STRING")
print(f'{URL_BASE=}')
print(f'{PARAMETERS=}')
assuming this is the url:
https://flyingcar.org/john/voila/render/shared/users/j/john/learn_url.ipynb?redirects=2&name=john&dossier=SA123445#{whatever=no/f=5}
URL_BASE="flyingcar.org/john/voila/render/shared/users/j/john/learn_url.ipynb"
and
PARAMETERS="redirects=2&name=john&dossier=SA123445"
Looking at the whole collection of variables in os.environ, I don't see any that contains the whole URL (including what comes after the #), which I would need in order to parse that part along with the parameters.
captured_values = parse_qs(PARAMETERS)
print('here parse_qs of query:',captured_values)
>>> here parse_qs of query: {'d': ['34'], 'f': ['56']}
Some ideas?
I tried to print all the os.environ variables with:
for k,v in os.environ.items():
    print(k,v)
but nothing seems to contain what is beyond the # symbol in the URL.
Any ideas?
Thanks
RELATED:
Get current URL in Python
The fragment (#) part of the URI never leaves the user agent, so it can't be picked up by the python web server. See: https://datatracker.ietf.org/doc/html/rfc3986#page-24
... instead, the fragment identifier is separated
from the rest of the URI prior to a dereference, and thus the
identifying information within the fragment itself is dereferenced
solely by the user agent, regardless of the URI scheme.
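Because the fragment stays in the browser, Python can only see it if it is handed the full URL as a string (for example, forwarded by client-side JavaScript, which is an assumption about the setup and not something shown in the question). Given such a string, the standard library splits it cleanly. A minimal sketch using the URL from the question:

```python
from urllib.parse import urlsplit, parse_qs

# Full URL as it appears in the browser address bar (taken from the question).
url = ("https://flyingcar.org/john/voila/render/shared/users/j/john/"
       "learn_url.ipynb?redirects=2&name=john&dossier=SA123445#{whatever=no/f=5}")

parts = urlsplit(url)
query_params = parse_qs(parts.query)  # the part the server does receive

print(parts.path)
print(query_params)
print(parts.fragment)                 # the part the server never receives
```

On the server, only the path and query are available (as in SCRIPT_NAME and QUERY_STRING); the fragment exists only in the string the browser holds.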

Sending multi-var dictionary via get in google app engine; python

I recently started working with Google App Engine and am facing the following problem:
I have a main.py where the user sees his own comments plus those of others. Now I need to add an EditComment.py, to which a user is directed when he wants to edit his comment.
I am working with the guestbook application only, and to actually fetch the selected comment I need both guestbook name and the content of the comment. How do I create this url?
In other words, I need to create a url like
/edit?guestbook="Family"&content="helloworld"
I tried this
# I need to send guestbook_name and the content of the greeting in order to fetch the row from
# the database.
# So, I show the text of the greeting and give a URL to the edit page.
content_toSend = {'guestbook_name':guestbook_name,'content':greeting.content}
self.response.write('<blockquote>%s</blockquote>' %
                    (content_toSend,greeting.content))
# But the handler on the other side receives only the first variable of the dict in the GET request.
so that the user can click on a greeting and be directed to the edit page. But the GET request just sends the first variable (guestbook_name) in the URL. How do I send the whole dictionary?
Edit: I had tried urllib.urlencode, but the handler in webapp2 requires a dict, so that didn't work.
The urlencode() function from the urllib standard library can be useful.
Edit with example:
content_toSend = urllib.urlencode({
    'guestbook_name': guestbook_name,
    'content': greeting.content
})
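To make the idea concrete, here is a minimal sketch of turning both values into one query string and embedding it in a link. It uses Python 3, where urlencode lives in urllib.parse rather than urllib; the /edit path, the sample values and the anchor markup are assumptions for illustration:

```python
from html import escape
from urllib.parse import urlencode, parse_qs

# Sample values standing in for guestbook_name and greeting.content.
guestbook_name = "Family"
greeting_content = "hello world"

# Encode both variables into a single query string.
query = urlencode({"guestbook_name": guestbook_name, "content": greeting_content})
edit_url = "/edit?" + query

# Escape the URL and the text before placing them inside HTML.
link = '<a href="%s">%s</a>' % (escape(edit_url), escape(greeting_content))
print(edit_url)
print(link)

# The receiving side gets both keys back when it parses the query string.
print(parse_qs(query))
```

The handler for the edit page should then be able to read each value by name from the request (in webapp2, typically self.request.get('guestbook_name')), rather than expecting a dict.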
If you know that you are going to have these two variables in the dictionary, why don't you try this:
self.response.write('<blockquote>%s</blockquote>' %
(content_toSend['guestbook_name'],content_toSend['content'],greeting.content))

Parsing wikipedia stubs using python wikitools

I implemented the example from: Mediawiki and Python
I read Get wikipedia abstract using python and How to parse/extract data from a mediawiki marked-up article via python and several others.
I am trying to get a dump of some Wikipedia stubs associated with a category and insert them into an internal semantic mediawiki site. For the purpose of this example I am using the "Somali_Region" category. The script uses the mediawiki API to obtain data then it parses the data removing all template information which is desirable.
from wikitools import wiki
from wikitools import category
import mwparserfromhell
wikisite = "http://en.wikipedia.org/w/api.php"
parse_category = "Somali_Region"
wikiObject = wiki.Wiki(wikisite)
wikiCategory = category.Category(wikiObject, parse_category)
articles = wikiCategory.getAllMembersGen(namespaces=[0])
for article in articles:
    wikiraw = article.getWikiText()
    parsedWikiText = mwparserfromhell.parse(wikiraw)
    for template in parsedWikiText.filter_templates():
        parsedWikiText.remove(template)
    print parsedWikiText
The internal semantic mediawiki site fails if I try to do a dump from Wikipedia and insert it, so that is not an option. Is it possible to use the API to insert data into the semantic mediawiki site? I read the mediawiki API edit page, but I could not find a Python example.
If I understand correctly, you want to take your parsedWikiText and save it into a private wiki.
Here's what I have for doing that kind of thing (you'll need to store USERNAME and PASSWORD somewhere; I use a config file, but there are more secure ways). I'll pick up from right before your for loop...
# Set up and authenticate into the target wiki if you need to.
from wikitools import wiki, page
target_wiki = wiki.Wiki('http://wiki.example.com/w/api.php')
target_wiki.login(USERNAME, PASSWORD)
for article in articles:
    wikiraw = article.getWikiText()
    parsedWikiText = mwparserfromhell.parse(wikiraw)
    for template in parsedWikiText.filter_templates():
        parsedWikiText.remove(template)
    # Use the API's edit function to save the new content.
    target_title = article.title
    target_page = page.Page(target_wiki, target_title)
    result = target_page.edit(text=parsedWikiText, summary="Imported text")
    # Check to see if it worked.
    if result['edit']['result'] == 'Success':
        print 'Saved', target_title
    else:
        print 'Save failed', target_title
I'm assuming here you want to save parsedWikiText into a new page. If there's already something on the page in your wiki, you'll have to read it first with target_page.getWikiText() and then mix the new text in somehow. I've also assumed the article will have the same name it had in Wikipedia; if not then change target_title.

using amazon api with python

It seems that Amazon has changed their API; I get an error from Python:
id = "..."
pas = "..."
produit = amazon.API(id, pas, "fr")
produit.item_search("playstation")
and I get this error:
AWSError: AWS.MissingParameters: Your request is missing required
parameters. Required parameters include AssociateTag.
I've also tried the example in the documentation, and the result is the same:
produit.item_search('Books', Publisher='Galileo Press')
AWSError: AWS.MissingParameters: Your request is missing required
parameters. Required parameters include AssociateTag.
I've found this:
Changing the example to:
api = API(AWS_KEY, SECRET_KEY, 'de',ASSOC_TAG)
from here:
https://bitbucket.org/basti/python-amazon-product-api/issue/33/required-parameters-include-associatetag
Any ideas? Or should the documentation be updated?
They dropped support for obsolete APIs recently, and the newest version requires a valid Associate Tag.
https://affiliate-program.amazon.com/gp/advertising/api/detail/api-changes.html
Associate Tag Parameter: Every request made to the API should include a valid Associate Tag. Any request that does not contain a valid Associate Tag will be rejected with an appropriate error message.
ASSOC_TAG must be your real tag (one that matches the API key).

Amazon Web Service ItemSearch DetailPageURL's with Associate IDs?

DetailPageURL's returned by ItemSearch seem to include an incorrect ID/tag rather than the associate ID I requested the search with.
I'm getting:
http://www.amazon.co.uk/gp/product/1590595009?SubscriptionId=XXX&tag=foo-12&linkCode=as2&camp=1634&creative=19450&creativeASIN=1590595009
When I expect:
http://www.amazon.co.uk/gp/product/1590595009?SubscriptionId=XXX&tag=wwwmydomain-12&linkCode=as2&camp=1634&creative=19450&creativeASIN=1590595009
How do I get the correct tag? (Note that SO rewrites the above links to their own Associate ID if you click either of the above)
I'm using Python and PyAWS 0.3.0, although I think the problem is with my request, rather than with the API wrapper.
(As an aside, The Amazon Associates Link Checker (U.K. store)/U.S. store is invaluable in testing these links)
It was a simple error in the end: I was including the tag in the initial search:
for searchResult in ecs.ItemSearch(item,
                                   SearchIndex=index,
                                   AssociateTag='wwwmydomain-12'):
But not in the secondary loop that steps through each result getting more details:
for item in ecs.ItemSearch(searchResult.ASIN,
                           ResponseGroup='Medium'):
should be:
for item in ecs.ItemSearch(searchResult.ASIN,
                           ResponseGroup='Medium',
                           AssociateTag='wwwmydomain-12'):
The tag is needed in both - it seems it's not carried over.
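A quick way to confirm the fix is to read the tag parameter back out of each returned DetailPageURL. A minimal sketch with the standard library only; associate_tag is a hypothetical helper and the URL is the one from the question:

```python
from urllib.parse import urlsplit, parse_qs


def associate_tag(detail_page_url):
    """Return the value of the tag query parameter, or None if it is absent."""
    params = parse_qs(urlsplit(detail_page_url).query)
    return params.get("tag", [None])[0]


# DetailPageURL as returned before the fix (copied from the question).
url = ("http://www.amazon.co.uk/gp/product/1590595009?SubscriptionId=XXX"
       "&tag=foo-12&linkCode=as2&camp=1634&creative=19450&creativeASIN=1590595009")

expected_tag = "wwwmydomain-12"
found = associate_tag(url)
print(found, found == expected_tag)
```

Running such a check over every result makes a request that silently dropped the tag easy to spot.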
