Python: getting the complete URL with os.environ

I would like to get the COMPLETE URL using os.environ.
I am rendering notebooks with Voila and I would like to open URLs from a dashboard using parameters in the URL.
So far I have:
URL_BASE = os.environ.get('SCRIPT_NAME')
PARAMETERS = os.environ.get("QUERY_STRING")
print(f'{URL_BASE=}')
print(f'{PARAMETERS=}')
Assuming this is the URL:
https://flyingcar.org/john/voila/render/shared/users/j/john/learn_url.ipynb?redirects=2&name=john&dossier=SA123445#{whatever=no/f=5}
URL_BASE="flyingcar.org/john/voila/render/shared/users/j/john/learn_url.ipynb"
&
PARAMETERS="redirects=2&name=john&dossier=SA123445"
Having a look at the whole collection of vars in os.environ, I don't see any that would include the whole URL (including what comes after the #), so I could parse that part as well as the parameters.
captured_values = parse_qs(PARAMETERS)
print('here parse_qs of query:', captured_values)
>>> here parse_qs of query: {'redirects': ['2'], 'name': ['john'], 'dossier': ['SA123445']}
I tried to print all the os.environ variables with:
for k, v in os.environ.items():
    print(k, v)
but nothing seems to contain what is beyond the # symbol in the URL.
Any ideas?
Thanks

The fragment (#) part of the URI never leaves the user agent, so it can't be picked up by the Python web server. See: https://datatracker.ietf.org/doc/html/rfc3986#page-24
... instead, the fragment identifier is separated
from the rest of the URI prior to a dereference, and thus the
identifying information within the fragment itself is dereferenced
solely by the user agent, regardless of the URI scheme.
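Given that constraint, the most a server-side script can reconstruct is everything up to and including the query string. A minimal sketch, assuming a CGI-style environment such as the one Voila runs under (HTTP_HOST and HTTPS are standard CGI variables, but whether your deployment sets them is an assumption worth checking):
import os
from urllib.parse import parse_qs

# Rebuild the server-visible URL from standard CGI variables.
# Whatever follows # in the browser's address bar never reaches
# the server, so it cannot be recovered here.
scheme = 'https' if os.environ.get('HTTPS', '').lower() in ('on', '1') else 'http'
host = os.environ.get('HTTP_HOST', '')
path = os.environ.get('SCRIPT_NAME', '')
query = os.environ.get('QUERY_STRING', '')

full_url = f'{scheme}://{host}{path}' + (f'?{query}' if query else '')
print(full_url)
print(parse_qs(query))
If you really need the fragment, capture it in the browser (e.g. window.location.hash in JavaScript) and send it back explicitly, for instance as an extra query parameter.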


django allauth redirect to another page after email verification

I have this issue: I need to redirect to another page when email verification is confirmed and the user logs in for the first time after verification. I tried to configure this in settings.py, but it didn't work.
My settings.py:
ACCOUNT_LOGIN_ON_EMAIL_CONFIRMATION = True
ACCOUNT_EMAIL_CONFIRMATION_AUTHENTICATED_REDIRECT_URL = os.getenv('schools/form-school')
The URL in urls.py is: path('form-school', views.school_list_view, name='schools'),
If you have any clue how to resolve this, please share your ideas, thanks so much!
When setting ACCOUNT_EMAIL_CONFIRMATION_AUTHENTICATED_REDIRECT_URL you have to specify the URL or the named URL pattern associated with the view you want to display. Sometimes knowing the full URL can be tricky, and using the named URL pattern comes in handy, as it lets you reference your views without having to care about the absolute path where the URL is defined. In your case the URL pattern is the string 'schools', set as the value of the name parameter.
It should be enough to set that as:
ACCOUNT_LOGIN_ON_EMAIL_CONFIRMATION = True
ACCOUNT_EMAIL_CONFIRMATION_AUTHENTICATED_REDIRECT_URL = "schools"
Note that os.getenv() is for retrieving the value of an environment variable. That is rather odd in this situation, and I doubt your project stores the URL/named URL pattern for a particular view in an environment variable (also, environment variable names containing characters other than numbers, letters and underscores may not be interpreted correctly).
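As a quick sanity check that the name resolves, assuming the path() from the question is wired into your URLconf, you can ask Django to reverse it (a minimal sketch, run e.g. from manage.py shell):
from django.urls import reverse

# 'schools' is the name= given to path('form-school', ...) in urls.py,
# so Django resolves it to the concrete path at runtime.
print(reverse('schools'))  # -> '/form-school'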

How to validate and know if a URL is a Google Docs URL? (Python, Flask)

I'm creating a website that has a function to let users share their Google Docs URLs with each other. I want to validate that the user's input is a Google Docs URL before I let them post it, so that it is safe. I'm using Flask and Python and I wonder if there is any way to validate this.
The only validations I learn so far are those from FlaskForm like below:
project_link = StringField('Google Docs link to your project', validators=[DataRequired()])
and limiting the URL to 100 characters in my models.py.
I think a possible way to do it is to write some Python code in my views.py that checks if the URL contains phrases like "docs.google.com"...
I don't really know how to validate whether a URL is a Google Docs URL, and I would greatly appreciate it if you could show me how.
Thank you.
Try something like this:
url = "http://docs.google.com/an/example/google/doc"
prefixes = ["https://", "http://"]

def validate(url):
    for pre in prefixes:
        url = url.strip(pre)  # strips these characters from both ends (note: str.strip removes characters, not substrings)
    if url.startswith("docs.google.com"):
        return True
    else:
        return False
This also has the effect of filtering out any unwanted prefixes, such as "chrome://" or "about://".
An example:
>>> url = "http://docs.google.com/document"
>>> validate(url)
True
>>> url = "https://googledocs.com"
>>> validate(url)
False
>>> url = "prefix://docs.google.com"
>>> validate(url)
False
URL = 'www......'
if 'docs.google.com' in URL and '&site=' not in URL:
    print(True)
As monsieuralfonse64 pointed out, you need the second half of the condition to prevent bypasses where a URL merely contains docs.google.com (for example in a query parameter) while actually pointing at another site.
This answer is WRONG. As was once again pointed out, any number of prefixes could be in front of a link, and anything from microsoft.com/hello?x=docs.google.com to stackoverflow.com/docs.google.com?name=hello and youtube.com/watch?v=docs.google.com would all be validated by this approach.
I would like to add one more solution to these already good solutions. For stuff like this you can always just use an existing library!
An existing library has probably accounted for corner cases you haven't thought of yourself (if you choose the right one). We don't want to reinvent the wheel now, do we?
Here's how I would go about it:
from urllib.parse import urlparse

url = "https://docs.google.com/an/example/google/doc"
expected_netloc = "docs.google.com"

parsed = urlparse(url)
if parsed.netloc == expected_netloc and parsed.scheme in ("http", "https"):
    print(True)
I only tested this in Python 3; note that urllib.parse is where the module lives in Python 3 (in Python 2 it was the standalone urlparse module).
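One corner case worth noting with the netloc comparison above: netloc can carry userinfo and a port (e.g. user@docs.google.com:443), so comparing parsed.hostname is slightly more robust. A minimal sketch building on the same idea (the helper name is_google_docs_url is mine, not from the question):
from urllib.parse import urlparse

def is_google_docs_url(url):
    parsed = urlparse(url)
    # hostname is lowercased and excludes userinfo/port, unlike netloc
    return parsed.scheme in ("http", "https") and parsed.hostname == "docs.google.com"

print(is_google_docs_url("https://docs.google.com/an/example/google/doc"))  # True
print(is_google_docs_url("https://evil.example/?x=docs.google.com"))        # False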

GET request works only with params, not with the URL alone

I've just discovered something strange. When downloading data from Facebook with GET using the requests 2.18.4 library, I get an error when I just use
requests.get('https://.../{}/likes?acces_token={}'.format(userID, token))
into which I substitute the user ID and access token; the API does not read the access token correctly.
But it works fine as
requests.get('https://../{}'.format(userID), params={"access_token": token})
It also works when I copy-paste the values into the appropriate fields by hand in the Python console.
So my hypothesis is that it has something to do with how the token string gets parsed via params versus the raw string. But I don't understand at all why that would be the case. Or is the ? character somehow special here?
Double-check that both URLs are the same (in your post they differ by the /likes substring).
Then you can check how the requests library concatenates parameters from the params argument:
url = 'https://facebook.com/.../{}'.format(userID)
r = requests.Request('GET', url, params={"access_token": token})
pr = r.prepare()
print(pr.url)
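The practical difference is that requests percent-encodes values passed via params, whereas values interpolated into the URL string go over the wire as-is. A small illustration (the token value is made up):
import requests

token = 'abc+def/ghi'  # hypothetical token containing characters that need escaping

# Interpolated directly: '+' and '/' are sent unescaped
print('https://example.com/me?access_token={}'.format(token))
# -> https://example.com/me?access_token=abc+def/ghi

# Via params: requests escapes the value properly
pr = requests.Request('GET', 'https://example.com/me',
                      params={'access_token': token}).prepare()
print(pr.url)
# -> https://example.com/me?access_token=abc%2Bdef%2Fghi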

Getting author's articles from Scopus using Scopus API (AUTHENTICATION_ERROR)

I've registered at http://www.developers.elsevier.com/action/devprojects. I created a project and got my Scopus key.
Now, using this generated key, I would like to find an author by first name, last name and subject area. I make requests from my university network, which is allowed to visit Scopus (I have full manual access to Scopus search and use it from Firefox with no problem). However, I wanted to automate my Scopus mining by writing a simple script. I would like to find the publications of an author given his/her first name, last name and subject area.
Here's my code:
# !/usr/bin/env python
# -*- coding: utf-8 -*-
import requests
import json
from scopus import SCOPUS_API_KEY
scopus_author_search_url = 'http://api.elsevier.com/content/search/author?'
headers = {'Accept':'application/json', 'X-ELS-APIKey': SCOPUS_API_KEY}
search_query = 'query=AUTHFIRST(%s) AND AUTHLASTNAME(%s) AND SUBJAREA(%s)' % ('John', 'Kitchin', 'COMP')
# api_resource = "http://api.elsevier.com/content/search/author?apiKey=%s&" % (SCOPUS_API_KEY)
# request with first searching page
page_request = requests.get(scopus_author_search_url + search_query, headers=headers)
print page_request.url
# response to json
page = json.loads(page_request.content.decode("utf-8"))
print page
Where SCOPUS_API_KEY looks just like this: SCOPUS_API_KEY="xxxxxxxx".
Although I have full access to scopus from my university network, I'm getting such response:
{u'service-error': {u'status': {u'statusText': u'Requestor
configuration settings insufficient for access to this resource.',
u'statusCode': u'AUTHENTICATION_ERROR'}}}
The generated link looks like this: http://api.elsevier.com/content/search/author?query=AUTHFIRST(John)%20AND%20AUTHLASTNAME(Kitchin)%20AND%20SUBJAREA(COMP) and when I click it, it shows an XML file:
<service-error><status>
<statusCode>AUTHORIZATION_ERROR</statusCode>
<statusText>No APIKey provided for request</statusText>
</status></service-error>
Or, when I change the scopus_author_search_url to "http://api.elsevier.com/content/search/author?apiKey=%s&" % (SCOPUS_API_KEY) I'm getting:
{u'service-error': {u'status': {u'statusText': u'Requestor configuration settings insufficient for access to this resource.', u'statusCode': u'AUTHENTICATION_ERROR'}}} and the XML file:
<service-error>
<status>
<statusCode>AUTHENTICATION_ERROR</statusCode>
<statusText>Requestor configuration settings insufficient for access to this resource.</statusText>
</status>
</service-error>
What can be the cause of this problem and how can I fix it?
I have just registered for an API key and tested it first with this URL:
http://api.elsevier.com/content/search/author?apikey=4xxxxxxxxxxxxxxxxxxxxxxxxxxxxx43&query=AUTHFIRST%28John%29+AND+AUTHLASTNAME%28Kitchin%29+AND+SUBJAREA%28COMP%29
This works fine from my university network. I also tested a second API key, so I have verified one registered with a website on my university domain and one registered with the website http://apitest.example.com, ruling out the domain name used to register as the source of your problem.
I tested this:
1. in the browser,
2. using your Python code with the API key in the headers (the only change I made to your code was removing "from scopus import SCOPUS_API_KEY" and adding SCOPUS_API_KEY = '4xxxxxxxxxxxxxxxxxxxxxxxxxxxxx43'),
3. using your Python code adapted to put the API key in the URL instead of the headers.
In all cases, the query returns two authors, one at Carnegie Mellon and one at Palo Alto.
I can't replicate your error message. If I try to use the API key from an IP address not registered with Elsevier (e.g. my home computer), I see a different error:
<service-error>
<status>
<statusCode>AUTHENTICATION_ERROR</statusCode>
<statusText>Client IP Address: xxx.yyy.aaa.bbb does not resolve to an account</statusText>
</status>
</service-error>
If I use a random (wrong) API key from the university network, I see
<service-error>
<status>
<statusCode>AUTHORIZATION_ERROR</statusCode>
<statusText>APIKey <mad3upa1phanum3r1ck3y> with IP address <my.uni.IP.add> is unrecognized or has insufficient privileges for access to this resource</statusText>
</status>
</service-error>
Debug steps
As I can't replicate your problem, here are some diagnostic steps you can use to resolve it:
1. Use your browser at uni to actually submit the API query with your key in the URL (i.e. copy the URL above, paste it into the address bar, substitute your key and see whether you get the XML back).
2. If 1 returns the XML you expect, move on to submitting the request via Python: first, copy the exact URL straight into Python (no variable substitution via %s, no apikey in the header) and simply do a .get() on it (a sketch follows this list).
3. If 2 returns correctly, ensure that your SCOPUS_API_KEY holds the exact key value, no more, no less; i.e. print SCOPUS_API_KEY should output your apikey: 4xxxxxxxxxxxxxxxxxxxxxxxxxxxxx43
4. If 1 returns the error, it looks like your uni (for whatever reason) has not got access to the author search API. That doesn't make much sense given that you can perform a manual search, but it is all I can conclude.
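For step 2, a minimal sketch (keeping the question's Python 2 print style; the key is the placeholder from above, not a real one):
import requests

url = 'http://api.elsevier.com/content/search/author?apikey=4xxxxxxxxxxxxxxxxxxxxxxxxxxxxx43&query=AUTHFIRST%28John%29+AND+AUTHLASTNAME%28Kitchin%29+AND+SUBJAREA%28COMP%29'
page_request = requests.get(url)  # no headers, no string substitution
print page_request.status_code
print page_request.content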
Docs
For reference, the authentication documentation is here, but it is not very simple to follow. You are following authentication option 1, and your method should just work.
N.B. The API is limited to 5000 author retrievals per week. If you have run a lot of queries in a loop, even if they have failed, it is possible that you have exceeded that...
For future reference: OP was using the package scopus, which has long since been renamed to pybliometrics.
Nowadays you can do
from pybliometrics.scopus import AuthorSearch
q = "AUTHFIRST(John) AND AUTHLASTNAME(Kitchin) AND SUBJAREA(COMP)"
s = AuthorSearch(q) # handles access, retrieval, parsing and even caches results
print(s)
results = s.authors # Holds all the information as a list of namedtuples
print(results) # You can put this into a pandas DataFrame as well

What is the syntax for adding a GET parameter to a URL?

I am using Python and Google App Engine.
I need to get access to certain webpage by adding some elements to the url.
What is the syntax for adding a GET parameter to a URL?
You put ? at the end of the URL. After the ? you put var1=val1&var2=val2& ....
For example, if your raw url (without the get parameters) is http://www.example.com/ and you have two parameters, param1=7 and param2=seven, then the full url should be:
http://www.example.com/?param1=7&param2=seven.
If you want to generate the param1=7&param2=seven part in Python, you can use a parameter dictionary, like this:
import urllib
parameters = urllib.urlencode({'param1': '7', 'param2': 'seven'})
The value of parameters is 'param1=7&param2=seven', so you can append the parameter string to a URL like this:
raw_url = 'http://www.example.com/'
url = raw_url + '?' + parameters
Now the value of url is 'http://www.example.com/?param1=7&param2=seven'.
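Note that urllib.urlencode is the Python 2 location of this function; on Python 3 the same helper lives in urllib.parse:
from urllib.parse import urlencode

parameters = urlencode({'param1': '7', 'param2': 'seven'})
url = 'http://www.example.com/' + '?' + parameters
# url is now 'http://www.example.com/?param1=7&param2=seven'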
I am fairly certain this has been asked many times before, but query parameters start with ? and are separated by &, like so:
http://www.site.com/script.php?key=value&var=num
http://www.foo.com/somepath?keyword1=value1&keyword2=value2&keyword3=value3
The requests module handles this pretty neatly:
>>> import requests
>>> reply = requests.get('https://example.com', {'abc':1, 'def':'<?>'})
>>> reply.url
'https://example.com/?abc=1&def=%3C%3F%3E'
