Django : extract a path from a full URL - python

In a Django 1.8 simple tag, I need to resolve the path to the HTTP_REFERER found in the context. I have a piece of code that works, but I would like to know if a more elegant solution could be implemented using Django tools.
Here is my code :
from django.core.urlresolvers import resolve, Resolver404
# [...]
#register.simple_tag(takes_context=True)
def simple_tag_example(context):
# The referer is a full path: http://host:port/path/to/referer/
# We only want the path: /path/to/referer/
referer = context.request.META.get('HTTP_REFERER')
if referer is None:
return ''
# Build the string http://host:port/
prefix = '%s://%s' % (context.request.scheme, context.request.get_host())
path = referer.replace(prefix, '')
resolvermatch = resolve(path)
# Do something very interesting with this resolvermatch...
So I manually construct the string 'http://sub.domain.tld:port', then I remove it from the full path to HTTP_REFERER found in context.request.META. It works but it seems a bit overwhelming for me.
I tried to build a HttpRequest from referer without success. Is there a class or type that I can use to easily extract the path from an URL?

You can use urlparse module to extract the path:
try:
from urllib.parse import urlparse # Python 3
except ImportError:
from urlparse import urlparse # Python 2
parsed = urlparse('http://stackoverflow.com/questions/32809595')
print(parsed.path)
Output:
'/questions/32809595'

Related

Unable to use amazonify in Python 3. ModuleNotFoundError: No module named 'urlparse'

My code:
from amazonify import amazonify
from urllib.parse import urlparse
User_url = input('Enter the link here: ')
The error I got:
File
"C:\Users\jashandeep\AppData\Local\Programs\Python\Python39\lib\site-packages\amazonify.py",
line 4, in
from urlparse import urlparse, urlunparse ModuleNotFoundError: No module named 'urlparse'
You will need to update this import to add Python 3 support:
From:
from urlparse import urlparse, urlunparse, parse_qs
from urllib import urlencode
to:
from urllib.parse import urlparse, urlunparse, parse_qs, urlencode
Your options:
Use this package with Python 2.
Update package source code.
Modify the import statements in amazonify.py.
In some cases, changing the source code can have unexpected consequences (for example, for dependent packages and scripts). In general, I do not recommend it. But it looks like this package is pretty simple and hasn't been updated for a long time.
It seems that there will be no update. You need to do everything yourself.
Use this as a standalone script.
This package is basically a single Python function. The author of the package did not specify a license, but you can write your own script similar to this example.
from urllib.parse import urlparse, urlunparse, parse_qs, urlencode
def amazonify(url, affiliate_tag):
"""Generate an Amazon affiliate link given any Amazon link and affiliate
tag.
:param str url: The Amazon URL.
:param str affiliate_tag: Your unique Amazon affiliate tag.
:rtype: str or None
:returns: An equivalent Amazon URL with the desired affiliate tag included,
or None if the URL is invalid.
"""
# Ensure the URL we're getting is valid:
new_url = urlparse(url)
if not new_url.netloc:
return None
# Add or replace the original affiliate tag with our affiliate tag in the
# querystring. Leave everything else unchanged.
query_dict = parse_qs(new_url[4])
query_dict['tag'] = affiliate_tag
new_url = new_url[:4] + (urlencode(query_dict, True), ) + new_url[5:]
return urlunparse(new_url)
if __name__ == '__main__':
affiliate_tag = 'tag'
urls = [...]
affiliate_urls = [amazonify(u, affiliate_tag) for u in urls]
print(affiliate_urls)

Combining three URL components into a single URL

I am trying to write a function combine three components of a URL: a protocol, location, and resource, into a single URL.
I have the following code, and it works only partially, returning a URL with only the protocol and resource components, but not the location component.
Code:
from urllib.parse import urlparse
import os
def buildURL(protocol, location, resource):
return urllib.parse.urljoin(protocol, os.path.join(location,
resource))
Example: buildURL('http://', 'httpbin.org', '/get')
This returns http:///get. I a trying to debug this to also allow for the location parameter to be in the URL. It should be returning http://httpbin.org/get.
How can I build a URL successfully?
It's because you put /get in the os.path.join. you should call it like buildURL('http://', 'httpbin.org', 'get'). os.path.join will treat / as an absolute path that will be hooked from the root of the base location, which is the first parameter of the join function: location
You shouldn't be using os.path here at all. That module is for filesystem paths, e.g. to deal with things like /usr/bin/bash and C:\Documents and Settings\User\.
It's not for building URLs. They aren't affected by the user's host OS.
Instead, use urlunparse() or urlunsplit() from urllib.parse:
from urllib.parse import urlunparse
urlunparse(('https', 'httpbin.org', '/get', None, None, None))
# 'https://httpbin.org/get'

how to get current model name from url odoo

I need to get current model name from this url:
http://localhost:8069/web#view_type=form&model=system.audit&menu_id=221&action=229
Here the model name is system.audit.
Please Help
You can use urlparse built-in library in python
Try this snippet:
from urlparse import urlparse, parse_qs
url = "http://localhost:8069/web#view_type=form&model=system.audit&menu_id=221&action=229"
uri = urlparse(url)
qs = uri.fragment
print parse_qs(qs).get('model', None)
# ['system.audit']
It is working fine for me, check if the model is in query params when you're getting in real-time
In python 3, try this:
from urllib import parse
result = parse.parse_qs("http://localhost:8069/web#view_type=form&model=system.audit&menu_id=221&action=229")['model'][0]
print(result)
For python 2 use urlparse.urlparse.parse_qs().

python os.listdir() for remote locations

I recently noticed that
os.listdir('http://chymera.eu/data/faceRT')
complains about not finding my directories.
What can I do to be able to run os.listdir() on remote locations? I have checked and this is not a permissions issue, I can open the folder via my browser and my webftp client says it's 755.
Whatever I do, I would NOT like to have to use login information. I made a decision about sharing when I set the directory permissions. If I say r+x for everyone then I want that to mean r+x for everyone.
os.listdir expects the argument to be a path on the filesystem. It does not attempt to understand URLs
You can use urllib to request the page and parse it to find the URLs
Ok, so I solved this by using the HTMLparser to parse my web index:
if source == 'server':
from HTMLParser import HTMLParser
import urllib
class ChrParser(HTMLParser):
def handle_starttag(self, tag, attrs):
if tag =='a':
for key, value in attrs:
if key == 'href' and value.endswith('.csv'):
pre_fileslist.append(value)
results_dir = 'http://chymera.eu/data/faceRT'
data_url = urllib.urlopen(results_dir).read()
parser = ChrParser()
pre_fileslist = []
parser.feed(data_url) # pre_fileslist gets populated here

Making relative paths absolute in python

I want to crawl web page with python, the problem is with relative paths, I have the following functions which normalize and derelativize urls in web page, I can not implement one part of derelativating function. Any ideas? :
def normalizeURL(url):
if url.startswith('http')==False:
url = "http://"+url
if url.startswith('http://www.')==False:
url = url[:7]+"www."+url[7:]
return url
def deRelativizePath(url, path):
url = normalizeURL(url)
if path.startswith('http'):
return path
if path.startswith('/')==False:
if url.endswith('/'):
return url+path
else:
return url+"/"+path
else:
#this part is missing
The problem is: I do not know how to get main url, they can be in many formats:
http://www.example.com
http://www.example.com/
http://www.sub.example.com
http://www.sub.example.com/
http://www.example.com/folder1/file1 #from this I should extract http://www.example.com/ then add path
...
I recommend that you consider using urlparse.urljoin() for this:
Construct a full ("absolute") URL by combining a "base URL" (base) with another URL (url). Informally, this uses components of the base URL, in particular the addressing scheme, the network location and (part of) the path, to provide missing components in the relative URL.
from urlparse import urlparse
And then parse into the respective parts.

Categories