How to prevent automatic HTML source code fixing on web browser - python

My original html source is below:
<html>
<head>
<title> aaaaa<bbbbb </title>
</head>
<body>
</body>
</html>
As you can see there is a mistake in the title. There is an unclosed < between aaaaa and bbbbb.
When I open this page in web browsers (Firefox, Chrome and Edge), the browsers fix the problem and change the source code to this:
<html>
<head>
<title> aaaaa&lt;bbbbb </title>
</head>
<body>
</body>
</html>
So is there a way to prevent browsers from fixing problems in the original HTML? When I browse, I want to see the original HTML source.
Note: I am using Firefox geckodriver with Python/Selenium, so any solution that involves Firefox configuration or Python code would be OK.

There is a fundamental difference between the markup shown through View Source (Ctrl + U) and the markup shown through the Inspector (Ctrl + Shift + I).
These are two different browser features that let users look at the HTML of a webpage. The main difference is that View Source shows the HTML as it was delivered from the web server (application server) to the browser, whereas Inspect Element is a developer tool (e.g. Chrome DevTools) that shows the state of the DOM tree after the browser has applied its error correction and after any JavaScript has manipulated the DOM. Those activities may include:
HTML error correction by the browser
HTML normalization by the browser
DOM manipulation by JavaScript
In short, View Source shows the HTML and JavaScript source as delivered, but not the changes the browser or the scripts make afterwards. HTML errors may appear corrected in the Inspect Element tool. As an example:
Within View Source you may observe:
<h1>The title</h2>
whereas through Inspect Element it would be shown corrected as:
<h1>The title</h1>
This use case
Based on the concept described above, the following markup:
<html>
<head>
<title> aaaaa<bbbbb </title>
</head>
<body>
</body>
</html>
gets corrected as:
<html>
<head>
<title> aaaaa&lt;bbbbb </title>
</head>
<body>
</body>
</html>
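To read the markup as delivered from a Python script, one option is to fetch the page outside the browser instead of reading driver.page_source, which serializes the already-corrected DOM. The following is a minimal sketch using only the standard library. The function name fetch_raw_html and the UTF-8 default are assumptions for illustration, and a page that needs the browser's cookies or session will not come back identical this way.

```python
import urllib.request


def fetch_raw_html(url, encoding="utf-8"):
    """Return the markup exactly as delivered, without browser error correction.

    Hypothetical helper: the name and the default encoding are assumptions.
    """
    with urllib.request.urlopen(url) as response:
        # Decode the raw bytes only; no HTML parsing or normalization happens here.
        return response.read().decode(encoding)
```

Calling fetch_raw_html with the same URL that was passed to driver.get() should return the title with the unescaped <, since no HTML parser ever touches the text.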

Related

Blocking a section of text so that it is printed on one page

My program generates some text and tables as HTML report files. Tables may vary in length depending on the week. I want the text under the tables to stay on one page; it must not be split across two pages when printed.
I don't have any idea how to do this, thanks for any help.
This is a sample of my code:
<!DOCTYPE html>
<html lang="pl">
<head>
<title>Some title</title>
</head>
<style type="text/css">
Some style formating...
</style>
<body>
Some text......
{DF.to_html()}
Some text.....
{DF.to_html()}
Text that need to be on one page when printed
</body>
</html>

How can I use Python to receive input from HTML in my Chrome extension?

In my extension, I have an input box, and a button. What I would like is for text to be submitted, and when the button is clicked, a summary to be generated.
This is my Python and HTML code (it's very simple).
Python Code:
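import gensim
from gensim.summarization import summarize  # note: this module was removed in gensim 4.0
def summary(original_text):
    # Return gensim's extractive summary of the given text
    return summarize(original_text)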
HTML Code
<!DOCTYPE html>
<head>
<meta charset="UTF-8">
<title>Document</title>
</head>
<body>
<input placeholder="Paste your text here!"> </input>
<button type="button" id='button_summarize'> Summarize! </button>
</body>
</html>
I tried to use flask but I realised it would not work since I'm trying to create an extension. Is there any way I'd get the Python to communicate with my HTML? Sample code would be helpful as I'm a bit new to this (hopefully it isn't too cumbersome given the simplicity of the task).
Thanks!
Short answer:
No, there isn't a way; you should be using JavaScript.
Long answer (for die-hard Pythonistas):
Use Brython or something similar.
You can add an event listener to the button in Brython with:
document['button_summarize'].bind('click',function)

Using PAGE_ORDER_BY = 'page-order' in Pelican HTML pages

Using Pelican, the Python static site generator, I want to re-order how my pages display in the navigation.
My pages are HTML, not Markdown/reST.
According to the docs and the question "How can I control the order of pages from within a pelican article category?", I should be using:
PAGE_ORDER_BY = 'page-order'
in pelicanconf.py.
I have tried the following meta tags in my HTML pages, using the same number format in each page ('000', '111', etc.):
<meta page-order="888">
<meta name="page-order" content="888">
I get the following error when compiling the site:
There is no "page-order" attribute in the item metadata. Defaulting to slug order.
What is the correct method of specifying page order in HTML pages?
Thanks in advance.
I solved this with the following structure.
In the config (no hyphen):
PAGE_ORDER_BY = 'sortorder'
HTML Page Structure:
{% extends "base.html" %}
<html>
<head>
<title>Test</title>
<meta name="sortorder" content="222" />
</head>
<body>
</body>
</html>
Pelican removes the html/head/body elements and uses the ones included in your base HTML template, but it seems to need this structure to recognise the meta tags in the head.
Make sure to use the same number of digits in each content attribute, e.g. 111, 222, 333, not 1, 222, 33.
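The advice about equal digit counts makes sense if the values are compared as text rather than as numbers. That is an assumption here, not something confirmed from Pelican's source. A small sketch of the effect, where sort_as_text is an illustrative name:

```python
# Page-order values as they might appear in <meta name="sortorder" content="...">.
# Assumption: the values are compared as strings, not as numbers.
uneven = ["1", "222", "33"]
padded = [value.zfill(3) for value in uneven]  # "001", "222", "033"


def sort_as_text(values):
    """Sort the way plain string comparison does (character by character)."""
    return sorted(values)


print(sort_as_text(uneven))  # ['1', '222', '33']
print(sort_as_text(padded))  # ['001', '033', '222']
```

With uneven lengths, "222" sorts before "33" because "2" comes before "3". Zero-padding to a common width makes the text order match the numeric order.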

html css example in django

I wanted to use a third-party html/css example in Django, but it doesn't work.
<!DOCTYPE html>
<html>
<head>
<title>Example</title>
<link rel="stylesheet" type="text/css" href="/static/css/style.css" />
</head>
<body>
EXAMPLE CODE
</body>
</html>
I have used just this example and commented out all other code. Django recognizes the CSS file; I checked it. However, I don't get a sticky footer and header. They turn into plain text, just like the main body.
I put this example in the Codecademy engine and it works there as well.
What hidden pitfalls of Django might I be missing?
There are no hidden pitfalls in Django here, and Django has no notion of "recognizing" a CSS file; Django itself has nothing to do with CSS files.
I recommend opening the Network panel in the browser developer tools and checking whether the browser downloaded the CSS file successfully.
If you use the Django development server, there is a guide on how to serve static files in development mode:
https://docs.djangoproject.com/en/1.2/howto/static-files/
Other possible causes are a misspelled CSS file name or a misspelled or incorrect path to the CSS file:
https://docs.djangoproject.com/en/dev/howto/static-files/
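The path check can also be done from a script. The sketch below does not call Django at all: locate_static_file is a hypothetical helper that only mimics the idea of mapping an href under a static prefix onto a list of directories, and the "/static/" default is taken from the href in the question.

```python
from pathlib import Path


def locate_static_file(url_path, static_dirs, static_url="/static/"):
    """Return the first file on disk that matches url_path, or None.

    Hypothetical helper: it imitates a static-file lookup with plain
    filesystem checks and is not part of Django.
    """
    if not url_path.startswith(static_url):
        return None  # the href does not point below the static prefix
    relative = url_path[len(static_url):]
    for directory in static_dirs:
        candidate = Path(directory) / relative
        if candidate.is_file():
            return candidate
    return None
```

A result of None for "/static/css/style.css" would point to a misspelled name or a wrong directory rather than to anything in the template.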

selenium2 chrome webdriver - workaround for a blocking confirm?

Sorry, my question could not be more succinct.
I'm using Selenium 2.14.0.
I have two pages, test.html and test2.html. I load test.html with the chrome webdriver, and click a link that takes me to test2.html.
test2.html contains a confirm in the body, which I think is preventing the page from loading, which is blocking my test script. Below are the html pages and my test script.
Is there any way to have Selenium 2 close the confirm dialog? It looks like there is a way to do it with selenium-rc (choose_ok_on_next_confirmation), but that functionality is not available in WebDriver (AFAICT).
When I run my test script, I don't get the 'blocking' output until I close the confirm. Funny thing is, if I load test2.html directly from my test script, instead of clicking a link, it doesn't seem to block.
Test.html
<head>
<title>Test</title>
</head>
<body>
<a id="link" href="test2.html">Click Me</a>
</body>
</html>
Test2.html
<html>
<head>
<title>Test</title>
</head>
<body>
<a id="link" href="test2.html">Click Me</a>
</body>
</html>
Python selenium test script
import selenium.webdriver as webdriver
wd = webdriver.Chrome()
wd.get('file:///C:/cygwin/tmp/postest/test.html')
elem = wd.find_element_by_id('link')
elem.click()
print 'Blocking!'
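Have you tried the following?
alert = wd.switch_to_alert()
alert.accept()
In newer Selenium releases, switch_to_alert() was replaced by the switch_to.alert property, so the first line becomes alert = wd.switch_to.alert.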
