Using absolute URLs in Pelican static site generator ATOM feeds - python

In the ATOM feeds (RSS) for my site created with Pelican, the URLs and images with a relative path point to http://feeds.feedburner.com/blogname/whatever instead of my site's path.
I have set SITEURL and RELATIVE_URLS = False in my Pelican settings. What else can I do to get the feeds to use absolute instead of relative paths?
I know there is an xml:base attribute, but I don't know how to get Pelican to use it.

OK, googling around, it seems like there's no way to do this. I'm going to have to resort to rewriting all the feed content by hacking the feed writer with absolute image paths and link URLs.
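A minimal sketch of the rewriting step described above, assuming the feed entry content is available as an HTML string. The function name absolutize and the regular expression are illustrative and not part of Pelican; where this would be hooked into Pelican's feed writer is not shown here.

```python
import re
from urllib.parse import urljoin

# Matches src="..." or href="..." attribute values (double or single quoted).
_ATTR_RE = re.compile(
    r'''(?P<attr>\b(?:src|href))=(?P<quote>["'])(?P<url>.*?)(?P=quote)'''
)


def absolutize(html, site_url):
    """Rewrite relative src/href values in an HTML fragment against site_url.

    Hypothetical helper: URLs that are already absolute are left unchanged,
    because urljoin returns them as-is.
    """
    base = site_url.rstrip("/") + "/"

    def _fix(match):
        absolute = urljoin(base, match.group("url"))
        return "{0}={1}{2}{1}".format(
            match.group("attr"), match.group("quote"), absolute
        )

    return _ATTR_RE.sub(_fix, html)
```

A regular expression is enough for simple generated markup; for arbitrary HTML a real parser would be more robust.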

Related

Django open pdf on certain page number

I am trying to create a PDF analysis web app and I am stuck. I want to allow the user to open a certain page of a PDF that has over 300 pages.
So, can anyone tell me how to use Django to open the PDF in a new tab on a specific page?
EDIT -- The Django code is actually running on an AWS server, and I want the user to be able to open a PDF that is stored in my media folder after analysis, on a specific page.
The answer to this really isn't Django-specific and is going to depend on what viewer you are using to display the PDF. If you're relying on the browser to display it, according to Adobe, you can use an anchor link (#page=) in the url (e.g. http://www.example.com/myfile.pdf#page=XX) though I think support for this varies outside Firefox and Chrome.
If you're using PDF.js, on the other hand, you'll be able to programmatically select a page to render. You can see how to do that on the viewer's examples page.
This depends on which library or viewer you use to display the PDF.
Not really a Django question, as @ps_ said.
From Adobe.com
To target an HTML link to a specific page in a PDF file, add #page=[page number] to the end of the link's URL.
For example, this HTML tag opens page 4 of a PDF file named myfile.pdf:
<A HREF="http://www.example.com/myfile.pdf#page=4">
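As a small illustration of the #page= convention quoted above, the link can be built in Python before it is passed to a template. The helper name pdf_page_url is hypothetical, and whether the fragment is honored still depends on the viewer.

```python
from urllib.parse import urldefrag


def pdf_page_url(pdf_url, page):
    """Return pdf_url with a #page=N fragment (hypothetical helper)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Drop any existing fragment so it is not duplicated.
    base, _fragment = urldefrag(pdf_url)
    return "{}#page={}".format(base, page)
```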
I was able to solve my problem. It took a while, after every attempt to use different libraries and PDF.js had failed.
The solution is quite simple. Here is what I did:
Step 1: Add MEDIA_ROOT and MEDIA_URL to settings.py.
# path to the MEDIA directory
MEDIA_ROOT = '/home/absolute/path/to/your/media/folder'
# URL to use to open MEDIA
MEDIA_URL = '/media/'
Step 2: Update the urls.py with the following code:
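from django.conf import settings
# url() exists in Django < 4.0; newer versions use django.urls.re_path
from django.conf.urls import url
from django.views.static import serve

urlpatterns = [
    ...
    ...
    # Serve files under MEDIA_ROOT for any URL containing "media/"
    url(r'^(.*?)media/(?P<path>.*)$', serve,
        {'document_root': settings.MEDIA_ROOT}),
]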
Step 3: Add an onclick attribute to the HTML tag.
My PDF files were in the documents folder inside the media directory, so it looks like this:
onclick="window.open('media/documents/{{Your/Pdf/File/Path.pdf}}#page={{Page_number_to_open}}')"

Using relative URLs

I am writing a simple web application in Python using CherryPy and Mako, and my question is also simple.
I have a page at the URL http://1.2.3.4/a/page_first. There is also an image available at the URL http://1.2.3.4/a/page_first/my_image.png, and I want to display my_image.png on page_first.
I added the tag <img src="my_image.png"/>, but the image is not shown. I looked at web developer tools->Network and saw that the request URL for the image was http://1.2.3.4/a/my_image.png instead of http://1.2.3.4/a/page_first/my_image.png.
Why does this happen?
Thanks.
The page address needs to be http://1.2.3.4/a/page_first/ (with trailing slash).
ADDED:
You don't seem to understand relative URLs, so let me explain. When you reference an image like this <img src="my_image.png"/>, the image URL in the tag doesn't have any host/path info, so path is taken from the address of the HTML page that refers to the image. Since path is everything up to the last slash, in your case it is http://1.2.3.4/a/. So the full image URL that the browser will request becomes http://1.2.3.4/a/my_image.png.
You want it to be http://1.2.3.4/a/page_first/my_image.png, so the path part of the HTML page must be /a/page_first/.
Note that the browser will not assume page_first is "a directory" just because it doesn't have an "extension", and will not add the trailing slash automatically. When you access a server publishing static dirs and files, specify a directory name for the path, and omit the trailing slash (e.g. http://www.example.com/some/path/here), the server is able to determine that you are actually requesting a directory, and it adds the slash (and usually also a default/index file name) for you. This is generally not the case with dynamic web sites where URLs are programmed.
So basically you need to explicitly include the trailing slash in your page path: dispatcher.connect('page','/a/:number_of_page/', controller=self, action='page_method') and always refer to it with the trailing slash (http://1.2.3.4/a/page_first/), otherwise the route will not be matched.
As a side note, usually you put the images and other static files into a dedicated dir and serve them either with CherryPy's static dir tool, or, if it's a high load site, with a dedicated server.
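The resolution rule described above can be checked with Python's standard urljoin, which applies the same relative-reference rules a browser uses. The variable names are only for illustration.

```python
from urllib.parse import urljoin

# Without the trailing slash, "page_first" is treated as a file name,
# so the relative reference replaces it.
no_slash = urljoin("http://1.2.3.4/a/page_first", "my_image.png")

# With the trailing slash, "page_first/" is part of the base path.
with_slash = urljoin("http://1.2.3.4/a/page_first/", "my_image.png")

print(no_slash)    # http://1.2.3.4/a/my_image.png
print(with_slash)  # http://1.2.3.4/a/page_first/my_image.png
```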
Try an absolute path: <img src="/a/page_first/my_image.png"/>

Does BeautifulSoup understand relative URLs?

I'm trying to scrape a site that uses a ton of relative URLs. One archive page has links to many individual entries, but the URL is given like "../2011/category/example.html"
For each entry, I want to open the page and scrape it, but I'm not sure what the most efficient way to handle that is. I'm thinking of splitting the starting URL on "/", popping off the last item, and re-joining the rest to get the base URL.
That seems like such a kludge, though. Is there a cleaner way?
To construct an absolute URL from a relative URL, use urlparse.urljoin (urllib.parse.urljoin in Python 3).
If you are using a browsing system like mechanize for crawling, however, you can simply fetch an absolute url initially and then feed the browser relative urls after that. The browser will keep track of state and fetch the URL from the same domain as the previous request automatically.
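A rough sketch of the urljoin approach applied to scraped links. To keep it self-contained it uses the standard library's HTMLParser in place of BeautifulSoup; the LinkCollector class, the archive URL, and the sample markup are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect <a href> values, resolved against the page they came from."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # urljoin handles "../", absolute paths and full URLs alike.
                self.links.append(urljoin(self.base_url, value))


# Hypothetical archive page URL and markup.
collector = LinkCollector("http://example.com/archive/index.html")
collector.feed('<a href="../2011/category/example.html">Entry</a>')
print(collector.links)  # ['http://example.com/2011/category/example.html']
```

With BeautifulSoup the same urljoin call would be applied to each href found in the parsed page.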

django compressor and clevercss with absolute url paths

When using django, compressor, and clevercss, I set my css url to an absolute path. Clevercss is then passed the path of the .ccss file without the COMPRESS_ROOT prefixed (the absolute path). When I set my css url to a relative path, clevercss processes the ccss files, but the browser then correctly looks for relatively placed css files (e.g. mywebsite.com/profile/user/1/css/stylesheet.css)
Compressor, however, does use the MEDIA_ROOT when the css link is a relative url, but not when an absolute url is used. This has the unfortunate effect of my css either being rendered by clevercss and not accessible by the browser (unless on the home page), or clevercss not having access to the files (due to an absolute url being used). Ironically, the examples offered on http://github.com/mintchaos/django_compressor use absolute urls for the css paths.
I think I'm doing something wrong here, but I'm not sure where it could be and have spent quite a few hours looking. I'm also currently running this locally through ./manage.py runserver and serving some static files (images) through django. (this is fine for my local development).
I can't speak to django-compressor specifically; but I have been dealing with finding a good automatic compression solution for the CSS and JS files of my Django-powered web applications. I'm currently using django-static. It's been really easy to set up and use, IMO. I was running into some issues running django-compress (different from django-compressor) when I decided to give django-static a try. So far it's been great. Might be worth checking out. It can be found here: http://github.com/peterbe/django-static.

Url for the current page from a Mako template in Pylons

I need to know the full url for the current page from within a Mako template file in Pylons.
The URL will be used in an iframe contained within the page, so it needs to be known when the page is being generated rather than after the page hits the server or from the environment. (Not sure if I am communicating that last bit properly.)
Not sure if this is the Pylons way of doing things but ${request.url} seems to work for me.
I think you can use h.url_for('', qualified=True) to get the full URL.
Make sure you have imported url_for in your helper file: from routes.util import helpers as h
Have a look at http://pylonshq.com/docs/en/0.9.7/thirdparty/routes/#routes.util.url_for
