After setting up HTTPS on a site, some of the JavaScript libraries are not loading while others are. In this case, the select2 lib is not loading. Why would this be?
Head extract
<head>
<link rel="stylesheet" href="https://yui.yahooapis.com/pure/0.6.0/pure-min.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>
<script src="https://code.jquery.com/ui/1.11.4/jquery-ui.js"></script>
<link rel="stylesheet" href="https://code.jquery.com/ui/1.11.4/themes/cupertino/jquery-ui.css">
<link href="https://cdnjs.cloudflare.com/ajax/libs/select2/4.0.0/css/select2.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/select2/4.0.0/js/select2.min.js"></script>
<link rel="stylesheet" type="text/css" href="https://d1r6do663ilw4i.cloudfront.net/static/sweetalerts/sweetalert.css">
<script src="https://d1r6do663ilw4i.cloudfront.net/static/sweetalerts/sweetalert.min.js"></script>
First: Make sure the file actually exists at that URL. (Try different browsers, command-line tools such as curl, etc.)
Second: Make sure your ad-blocker or other browser plugins aren't blocking the request.
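The first check can be scripted. Here is a minimal sketch using only the standard library (the `fetch` helper name is my own invention, not part of any library):

```python
from urllib.request import urlopen
from urllib.error import URLError, HTTPError

def fetch(url):
    """Return the resource's bytes, or None if it cannot be fetched."""
    try:
        with urlopen(url) as resp:
            return resp.read()
    except (HTTPError, URLError, OSError):
        return None

# A None result for the select2 URL (but not for jQuery's) would point
# at a server/CDN problem rather than a browser-side one.
```

Running this against each URL in the head quickly separates "the file is gone" from "the browser blocked it".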
Related
I'm trying to link it in the header of an HTML file at the following path:
main/home/templates/home/index.html
And the style.css lives in
main/main/stylesheet/style.css
And this is my link in the index.html:
<link rel="stylesheet" type="text/css" href="/main/stylesheet/style.css">
Is something wrong?
I guess you should go for:
<link rel="stylesheet" type="text/css" href="main/stylesheet/style.css">
That is, you need a relative path (no leading slash).
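The distinction matters because the browser resolves the href against the page's own URL. A quick illustration with Python's urllib.parse (the example URLs are invented):

```python
from urllib.parse import urljoin

page = "http://example.com/home/index.html"

# A leading slash makes the path root-relative: the page's directory is ignored.
print(urljoin(page, "/main/stylesheet/style.css"))
# http://example.com/main/stylesheet/style.css

# Without the leading slash, the path is resolved against the page's directory.
print(urljoin(page, "main/stylesheet/style.css"))
# http://example.com/home/main/stylesheet/style.css
```

(If this is a Django project, which the templates/ layout suggests, the usual route is to serve the file through the static-files machinery rather than hand-writing the path.)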
I have to download and save webpages from a given URL. I have downloaded the page as well as the required JS and CSS files, but the problem is changing the src and href values of those tags in the HTML source file as well, so that the local copy works.
My HTML source is:
<link REL="shortcut icon" href="/commd/favicon.ico">
<script src="/commd/jquery.min.js"></script>
<script src="/commd/jquery-ui.min.js"></script>
<script src="/commd/slimScroll.min.js"></script>
<script src="/commd/ajaxstuff.js"></script>
<script src="/commd/jquery.nivo.slider.pack.js"></script>
<link rel="stylesheet" type="text/css" href="/fonts/stylesheet.css"/>
<link rel="stylesheet" type="text/css" href="/commd/stylesheet.css"/>
<!--[if gte IE 6]>
<link rel="stylesheet" type="text/css" href="/commd/stylesheetIE.css" />
<![endif]-->
<link rel="stylesheet" type="text/css" href="/commd/accordion.css"/>
<link rel="stylesheet" href="/commd/nivo.css" type="text/css" media="screen" />
<link rel="stylesheet" href="/commd/nivo-slider.css" type="text/css" media="screen" />
I have found all the links to the CSS and JS files, and downloaded them using:
import os
import urllib.request

scriptsurl = soup3.find_all("script")
os.chdir(foldername)
for l in scriptsurl:
    if l.get("src") is not None:
        print(l.get("src"))
        script = "http://www.iitkgp.ac.in" + l.get("src")
        print(script)
        file = l.get("src").split("/")[-1]
        l.get("src").replaceWith('./foldername/' + file)
        print(file)
        urllib.request.urlretrieve(script, file)
linksurl = soup3.find_all("link")
for l in linksurl:
    if l.get("href") is not None:
        print(l.get("href"))
        css = "http://www.iitkgp.ac.in" + l.get("href")
        file = l.get("href").split("/")[-1]
        print(css)
        print(file)
        if os.path.exists(file):
            urllib.request.urlretrieve(css, file.split(".")[0] + "(1)." + file.split(".")[-1])
        else:
            urllib.request.urlretrieve(css, file)
os.chdir("..")
Can anyone suggest a way to change the src/href values to local machine paths during these same loop executions? That would be a great help.
This is my first crawling task.
Reading from the documentation:
You can add, remove, and modify a tag’s attributes. Again, this is done by treating the tag as a dictionary:
So writing something like:
l["src"] = os.path.join(os.getcwd(),foldername, file)
instead of
l.get("src").replaceWith('./foldername/'+file)
should do the trick. Note that l.get("src") returns a plain string, which has no replaceWith method; assigning through l["src"] updates the attribute on the tag itself.
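A minimal, runnable sketch of that dictionary-style assignment (assuming BeautifulSoup 4 is installed; the tag and local path are made up):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<script src="/commd/jquery.min.js"></script>', "html.parser")

tag = soup.find("script")
file = tag.get("src").split("/")[-1]
tag["src"] = "./foldername/" + file  # dictionary-style assignment updates the tree

print(soup)
# <script src="./foldername/jquery.min.js"></script>
```

Serializing the modified soup (e.g. with str(soup)) then gives you HTML that points at the local copies.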
I'd like to be able to download an HTML page (let's say this actual question!):
f = urllib2.urlopen('https://stackoverflow.com/questions/33914277')
content = f.read() # soup = BeautifulSoup(content) could be useful?
g = open("mypage.html", 'w')
g.write(content)
g.close()
such that it is displayed the same way locally as online. Currently the downloaded page renders without any of its styling.
Thus, one needs to download the CSS, and modify the HTML itself so that it points to this local CSS file, and the same for images, etc.
How can this be done? (I think there should be something simpler than this answer, which doesn't handle CSS. A library, perhaps?)
CSS and image files are generally not subject to cross-origin restrictions, so your local HTML can still refer to them while they remain in the cloud. The problem is unresolved URIs. In the HTML head section you have something like this:
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<link rel="stylesheet" type="text/css" href="/assets/8943fcf6/select.css" />
<link href="/css/media.css" rel="stylesheet" type="text/css">
<script type="text/javascript" src="/assets/jquery.yii.js"></script>
<script type="text/javascript" src="/assets/select.js"></script>
</head>
A root-relative path such as /css/media.css implies a base address, e.g. http://example.com. To resolve it for a local file you need http://example.com/css/media.css as the href value in your local copy of the HTML. So you should parse the document and add the base into the local code:
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<link rel="stylesheet" type="text/css" href="http://example.com/assets/select.css" />
<link href="http://example.com/css/media.css" rel="stylesheet" type="text/css">
<script type="text/javascript" src="http://example.com/assets/jquery.yii.js"></script>
<script type="text/javascript" src="http://example.com/assets/select.js"></script>
</head>
Use any means you like for that (JS, PHP, ...).
Update
The local file also contains image references throughout the body section, so you'll need to resolve those too.
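A small sketch of that base-prepending step, using only the standard library. It is deliberately naive (a regex over double-quoted href/src attributes); a real HTML parser is more robust:

```python
import re
from urllib.parse import urljoin

def absolutize(html_text, base):
    """Rewrite relative href/src attribute values against `base`.
    Already-absolute URLs pass through urljoin unchanged."""
    def repl(match):
        attr, url = match.group(1), match.group(2)
        return '%s="%s"' % (attr, urljoin(base, url))
    return re.sub(r'(href|src)="([^"]+)"', repl, html_text)

print(absolutize('<link href="/css/media.css" rel="stylesheet">', 'http://example.com'))
# <link href="http://example.com/css/media.css" rel="stylesheet">
```

The same pass covers both the head links and the image references in the body.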
I'm making a bootstrap theme for Trac installation. This is my first time using Genshi so please be patient :)
So I've following:
<head py:match="head" py:attrs="select('@*')">
${select('*|comment()|text()')}
<link rel="stylesheet" type="text/css" href="${chrome.htdocs_location}css/bootstrap.min.css" />
<link rel="stylesheet" type="text/css" href="${chrome.htdocs_location}css/style.css" />
</head>
This loads my custom CSS, but also the JS/CSS that Trac needs to use.
So result is this:
<link rel="help" href="/pixelperfect/wiki/TracGuide" />
<link rel="start" href="/pixelperfect/wiki" />
<link rel="stylesheet" href="/pixelperfect/chrome/common/css/trac.css" type="text/css" />
<link rel="stylesheet" href="/pixelperfect/chrome/common/css/wiki.css" type="text/css" />
<link rel="stylesheet" type="text/css" href="/pixelperfect/chrome/common/css/bootstrap.min.css" />
<link rel="stylesheet" type="text/css" href="/pixelperfect/chrome/common/css/style.css" />
All is good, except that I would like to exclude trac.css out of there completely.
So my question is twofold:
1. How does Genshi know what to load? Where is the manifest of all the CSS/JS files that it displays?
2. Is it Genshi or Python doing this?
Any help and relevant reading appreciated! :)
Thanks!
On 1:
The information on CSS files is accumulated in the 'links' dictionary of a request's Chrome property (req.chrome['links']), for JS files it is the 'scripts' dictionary. See add_link and add_script functions from trac.web.chrome respectively.
The default style sheet is added to the Chrome object directly. See the add_stylesheet call in trac.web.chrome.Chrome.prepare_request() method.
On 2:
It's part of the Request object, which is processed by Genshi. The preparation is done in Python in any case, but it lives in the Trac Python code rather than in Genshi's own scripts.
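As a sketch of what excluding trac.css could look like: assuming req.chrome['links'] maps each rel value to a list of dicts carrying an 'href' key (which is how add_link builds them), the stylesheet list could be filtered before rendering. The function name and the exact hook to call it from are my own assumptions:

```python
def drop_trac_css(links):
    """Remove trac.css entries from a Chrome-style 'links' dictionary."""
    stylesheets = links.get('stylesheet', [])
    links['stylesheet'] = [link for link in stylesheets
                           if not link.get('href', '').endswith('/trac.css')]
    return links

# Hypothetical shape of req.chrome['links']:
links = {'stylesheet': [
    {'href': '/pixelperfect/chrome/common/css/trac.css'},
    {'href': '/pixelperfect/chrome/common/css/wiki.css'},
]}
print(drop_trac_css(links)['stylesheet'])
# [{'href': '/pixelperfect/chrome/common/css/wiki.css'}]
```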
I have gotten the HTML of a webpage using Python, and I now want to find all of the .css files that are linked to in the header and save each as its own variable (I know how to do that part). I tried partitioning, as shown below, but I got the error "IndexError: string index out of range" upon running it.
sytle = src.partition(".css")
style = style[0].partition('<link href=')
print style[2]
c =1
I do not think that this is the right way to approach this, so I would love some advice. Many thanks in advance. Here is a section of the kind of text I need to extract the .css file(s) from:
<meta name="viewport" content="width=device-width, initial-scale=1.0, minimum-scale=1.0" />
<!--[if gte IE 7]><!-->
<link href="/stylesheets/master.css?1342791430" media="screen, projection" rel="stylesheet" type="text/css" />
<link href="/stylesheets/adapt.css?1342791413" media="screen, projection" rel="stylesheet" type="text/css" />
<!-- <![endif]-->
<link href="/stylesheets/print.css?1342791421" media="print" rel="stylesheet" type="text/css" />
<link href="/apple-touch-icon-precomposed.png" rel="apple-touch-icon-precomposed" />
<link href="http://dribbble.com/shots/popular.rss" rel="alternate" title="RSS" type="application/rss+xml" />
You could use a regular expression for this. Try the following pattern:
/href="(.*\.css[^"]*)/g
EDIT
import re
matches = re.findall('href="(.*\.css[^"]*)', html)
print(matches)
My answer is along the same lines as Jon Clements' answer, but I tested mine and added a drop of explanation.
You should not use a regex. You can't parse HTML with a regex. The regex answer might work, but writing a robust solution is very easy with lxml. This approach is guaranteed to return the full href attribute of all <link rel="stylesheet"> tags and no others.
from lxml import html

def extract_stylesheets(page_content):
    doc = html.fromstring(page_content)  # Parse
    return doc.xpath('//head/link[@rel="stylesheet"]/@href')  # Search
There is no need to check the filenames, since the results of the xpath search are already known to be stylesheet links, and there's no guarantee that the filenames will have a .css extension anyway. The simple regex will catch only a very specific form, but the general html parser solution will also do the right thing in cases such as this, where the regex would fail miserably:
<link REL="stylesheet" hREf =
'/stylesheets/print?1342791421'
media="print"
><!-- link href="/css/stylesheet.css" -->
It could also be easily extended to select only stylesheets for a particular media.
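For instance, the media filter mentioned above might look like this (assuming lxml is installed; the sample document is invented):

```python
from lxml import html

doc = html.fromstring("""
<html><head>
<link rel="stylesheet" href="/css/screen.css" media="screen">
<link rel="stylesheet" href="/css/print.css" media="print">
</head><body></body></html>
""")

# Select only the stylesheets declared for print media.
print(doc.xpath('//head/link[@rel="stylesheet"][@media="print"]/@href'))
# ['/css/print.css']
```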
For what it's worth, here is an approach using lxml.html as the parsing lib (untested):
import lxml.html
from urllib.parse import urlparse

sample_html = """<meta name="viewport" content="width=device-width, initial-scale=1.0, minimum-scale=1.0" />
<!--[if gte IE 7]><!-->
<link href="/stylesheets/master.css?1342791430" media="screen, projection" rel="stylesheet" type="text/css" />
<link href="/stylesheets/adapt.css?1342791413" media="screen, projection" rel="stylesheet" type="text/css" />
<!-- <![endif]-->
<link href="/stylesheets/print.css?1342791421" media="print" rel="stylesheet" type="text/css" />
<link href="/apple-touch-icon-precomposed.png" rel="apple-touch-icon-precomposed" />
<link href="http://dribbble.com/shots/popular.rss" rel="alternate" title="RSS" type="application/rss+xml" />
"""

page = lxml.html.fromstring(sample_html)
link_hrefs = (p.path for p in map(urlparse, page.xpath('//head/link/@href')))
for href in link_hrefs:
    if href.rsplit('.', 1)[-1].lower() == 'css':  # implement smarter error handling here
        pass  # do whatever