Flask - href to anchor on a different page (navigation bar) - python

I am using Flask to develop a web app. On the home page (index.html), the navigation bar navigates one to specific sections on the page using anchors:
<a class='text' href='#body2'>calculate</a>
<a id="body2"></a>
On the home page, there is a form which links you to a new page (output.html). I want the same navigation bar to navigate a user to the previous page (index.html) and the specific sections. I have written the navigation links on the second page as shown below:
<a class='text' href="{{ url_for('index') }}#body2">calculate</a>
When I click the navigation links, the new page does not load. Strangely, when I inspect the link element in my browser's developer tools and click the link from there, it does take me to the correct page and section.
If I remove '#body2' from the above line, it successfully navigates me to the previous page, but not to the specific section.
(If you want to physically try out the navigation links on the web app, use the following link:
http://yourgreenhome.appspot.com/ - Enter some random values into the blank form entries and it will take you to the second page. It is running through Google's App Engine but this is definitely not causing the problem because the problem still occurs when I run the site on local host).

You have an error in smoothscroll.js
$(document).ready(function(){
  $("a").on('click', function(event) {
    if (this.hash !== "") {
      event.preventDefault();
      var hash = this.hash;
      $('html, body').animate({
        scrollTop: $(hash).offset().top
      }, 800, function(){
        window.location.hash = hash;
      });
    }
  });
});
On the advpy page, no element matches the hash, so $(hash).offset() is undefined and reading .top from it throws an error. Because you have already prevented the default event (event.preventDefault();), the browser never follows the link.

Related

Incomplete html from Selenium

Hi, I was wondering why, when I have a certain page's URL and use Selenium like this:
webdriver.get(url)
webdriver.page_source
the source code returned by Selenium lacks elements that are there when I inspect the page in the browser.
Is this some way the website protects itself from scraping?
Try adding some delay between webdriver.get(url) and webdriver.page_source to let the page load completely.
Generally, page_source gives you the entire page source with all the tags and tag attributes, but this only holds for static web pages.
For dynamic web pages, webdriver.page_source only gives you whatever is in the DOM at that point in time, because the DOM is updated by scripts and by user interaction with the page.
Note that the contents of iframes are not included in page_source in any case.
If the site you are scraping is dynamic, it takes some time to load, because the JavaScript has to run and manipulate the DOM; only after that do you get the full source of the page.
So it is better to add a delay between the get request and reading the page source.
import time
webdriver.get(url)
# pauses execution for x seconds.
time.sleep(x)
webdriver.page_source
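A fixed sleep is either too short or wastes time. A minimal sketch of the alternative is to poll until a condition holds or a timeout expires. Note that `wait_until` and its parameters are hypothetical names introduced here for illustration, not part of Selenium (Selenium ships its own `WebDriverWait` for this purpose), and the marker text in the usage comment is a placeholder.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Poll predicate() until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or raises TimeoutError if the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)


# Hypothetical usage with the driver object from the snippet above
# ("expected text" stands for something that only appears once the page is ready):
# wait_until(lambda: "expected text" in webdriver.page_source, timeout=15)
```

The predicate is checked once before any sleeping, so a page that is already ready costs no delay at all.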
The page source might contain only a link to a JavaScript file, while many of the controls you see on the page are generated in your browser by running that JS code.
For example, the page source is:
<script>
[1,2,3,4,5].map(i => document.write(`<p id="${i}">${i}</p>`))
</script>
The resulting DOM is:
<p id="1">1</p>
<p id="2">2</p>
<p id="3">3</p>
<p id="4">4</p>
<p id="5">5</p>
To get the HTML of the live DOM:
document.querySelector('html').innerHTML
<script>
[1,2,3,4,5].map(i => document.write(`<p id="${i}">${i}</p>`))
console.log(document.querySelector('body').innerHTML)
</script>

I don't want a new tab to open when I click the submit button (form using url_for with method POST)

This is a Flask application.
This is the HTML page for login. My problem is that when I Ctrl+click the submit button, the page opens in a new tab.
I don't want it to open in a new tab. How can I fix this problem?
(The form uses url_for with method POST.)
I want to fix this on both phones and computers.
Ctrl+click is the standard way to open a link in a new tab. If even a normal click takes you to a new tab, check whether the code contains something like this:
<form target="_blank" ...
and change it to
<form ...
EDIT
From your comment I now understand what you're looking for, so try putting this JavaScript in the page:
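$(document).ready(function(){
  // select the submit input; "submit" alone would only match <submit> elements
  $("input[type='submit']").click(function(e){
    e.preventDefault();
    // the action attribute lives on the form, not on the button
    var formAction = $(this).closest("form").attr("action");
    window.location.href = formAction;
  });
});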
or without jQuery
document.querySelectorAll("input[type='submit']")[0].onclick = function (e) {
  var formAction = document.querySelectorAll("form")[0].action;
  e.preventDefault();
  window.location.href = formAction;
};
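Please note that this only works if there is exactly one form and one submit input (input type=submit) on the page, so no multiple forms and no button-type submits. Also note that assigning window.location.href issues a GET request, so the form fields are not submitted.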

How to download data from hidden rows from a table on a web page using bs4 on py3

I want to know how I can download the data from the first table, including the data contained in hidden rows, and save it into arrays, from the following link:
https://www.diretta.it/giocatore/dybala-paulo/W4myUVXR/
To see them I have to press the "show more matches" button that you see in the image.
It is important that the code downloads every number/name in the first match table, including the elements in hidden rows, which are the focus of the question.
When you press that button, the table shows additional rows for that player's older games.
I used the code you see below and was only able to download the information that is visible, not the information that appears after pressing the button.
for record in link.findAll('a', class_ = 'leagueTable__team'):
    linkplayer = record.get('href')
    destlink.append(linkplayer)
for i in range(len(destlink)):
    link_step1 = "https://www.diretta.it"+ destlink[i]+"/rosa/"
    link_team.append(link_step1)
    link_soap1=make_soup(link_step1)
    for record in link_soap1.findAll('div', class_='tableTeam__squadName--playerName'):
        for record1 in record.findAll("a"):
            linkplayer = record1.get('href')
            link_step2=diretta+linkplayer
            players.append(linkplayer)
            link_step2_list.append(link_step2)
for i in range(len(link_step2_list)):
    link_soap2 = make_soup(link_step2_list[i])
    for record in link_soap2.findAll('div', class_='playerTable__date'):
        date = record.get_text()
        print(date)
HTML:
<div class="profileTable__row profileTable__row--last show-more-last-matches">
  <a>Mostra più incontri</a>
</div>
<script type="text/javascript">
$this = $('.profileTable__row--leagueHeading')
$this.hide();
$(document).ready(function() {
  $this.eq(0).show();
  var actualElement = $this.eq(0).attr('data-state');
  for(var i = 1; i < $this.length; i++) {
    if($this.eq(i).attr('data-state') != actualElement) {
      $this.eq(i).show();
      actualElement = $this.eq(i).attr('data-state');
    }
  }
})
</script>
I don't know about bs4, but ...
The 'hidden' data isn't actually in the page until you click on the "show more" link. So you need to use something like Selenium to:
1. find the "show more" link
2. click the link
3. use Selenium to find all of the players again
This code will open the page, find the link, and click it:
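from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get('https://www.diretta.it/giocatore/dybala-paulo/W4myUVXR/')
# find_elements_by_tag_name was removed in Selenium 4; use find_elements(By...)
for link in driver.find_elements(By.TAG_NAME, 'a'):
    if "Mostra" in link.text:
        link.click()
        break  # the page changes after the click, so stop iterating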
You need to have ChromeDriver installed on your machine too:
https://chromedriver.chromium.org/downloads
The data is obtained through a separate request that returns JSON. One possibility is to have your script make that request itself and parse the JSON object.
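Since the structure of that JSON response is not given here, the following is only a sketch of the parsing half: a small helper that walks a decoded JSON document and collects every value stored under a given key. The name `find_values`, the key `"date"`, and the sample payload are all made up for illustration; the real response will look different.

```python
import json


def find_values(node, key):
    """Recursively collect every value stored under `key` in decoded JSON."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            # keep descending, the key may also occur deeper in the tree
            found.extend(find_values(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_values(item, key))
    return found


# Made-up payload purely for illustration.
sample = '{"rows": [{"date": "01.02.", "team": "A"}, {"date": "08.02.", "team": "B"}]}'
dates = find_values(json.loads(sample), "date")
print(dates)
```

A recursive search like this is convenient while exploring an unfamiliar response; once the layout is known, indexing the decoded object directly is simpler.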

Switching contents of a webpage with python CGI

I'm trying to learn by doing. I was experimenting with Twitch's API and JSON and got a list of the top 25 streams on their site to print out (splinksy.com shows what I mean). Now I want to make it so that when you click on a link, the text is removed from the page and replaced with a full-screen embed of the stream. I know how to get Python to show the embed; I just don't know how to get it to work with page URLs such as ?channel=, or how to replace content without refreshing at all.
In a click listener, you can use AJAX to load the HTML needed to display a full-screen embed of the stream, and use it to replace the content of a div that spans the whole body of the current page.
$('#link').click(function(){
  $.get('?channel=1',
    function(data) {
      $('#content').html(data);
    }
  );
});
As you are loading content from another website, you should use the following code instead:
<iframe width="800" height="400" id="content"></iframe>
<a id="link" data-url="http://www.twitch.tv/esltv_dota">esltv_dota</a>
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
<script>
$(document).ready(function(){
  $('#link').click(function(){
    $('#content').attr('src',$(this).attr('data-url'));
  });
});
</script>
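On the server side, the ?channel= part of the question can be handled by reading the query string that the web server passes to a CGI script. The following is a rough sketch only: `render_page`, the allowed channel-name pattern, and the iframe markup are assumptions made for illustration, and the embed URL simply follows the http://www.twitch.tv/ form used above.

```python
import html
import os
import re
from urllib.parse import parse_qs

# Assumed shape of a channel name; adjust to whatever the site really allows.
CHANNEL_RE = re.compile(r"[A-Za-z0-9_]{1,25}")


def render_page(query_string):
    """Build a complete CGI response for a query string such as 'channel=esltv_dota'."""
    params = parse_qs(query_string)
    channel = params.get("channel", [""])[0]
    if CHANNEL_RE.fullmatch(channel):
        src = html.escape("http://www.twitch.tv/" + channel, quote=True)
        body = '<iframe width="100%" height="100%" src="{}"></iframe>'.format(src)
    else:
        # No (or an invalid) channel: fall back to a plain message.
        body = "<p>No channel selected.</p>"
    return "Content-Type: text/html\n\n<html><body>{}</body></html>".format(body)


if __name__ == "__main__":
    # A CGI server exposes the part after '?' in the QUERY_STRING variable.
    print(render_page(os.environ.get("QUERY_STRING", "")))
```

Validating the channel name before putting it into the page keeps arbitrary input out of the generated HTML.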

The html content that I'm trying to scrape only appears to load when I navigate to a certain anchor within the site

I'm trying to scrape a certain value off the following website: https://www.theice.com/productguide/ProductSpec.shtml?specId=6747556#data
Specifically, I'm trying to grab the "last" value from the table at the bottom of the page in the table with class "data default borderless". The issue is that when I search for that object name, nothing appears.
The code I use is as follows:
from bs4 import BeautifulSoup
import urllib2
url = "https://www.theice.com/productguide/ProductSpec.shtml?specId=6747556#data"
page=urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
result = soup.findAll(attrs={"class":"data default borderless"})
print result
One issue I noticed is that when I pull the soup for that URL, it strips off the anchor tag and shows me the html for the url: https://www.theice.com/productguide/ProductSpec.shtml?specId=6747556
It was my understanding that anchor tags just navigate you around the page but all the HTML should be there regardless, so I'm wondering if this table somehow doesn't load unless you've navigated to the "data" section of the webpage.
Does anyone know how to force the table to load before I pull the soup? Is there something else I'm doing wrong that prevents me from seeing the table?
Thanks in advance!
The content is generated dynamically by the JavaScript below:
<script type="text/javascript">
var app = {};
app.isOption = false;
app.urls = {
  'spec':'/productguide/ProductSpec.shtml?details=&specId=6747556',
  'data':'/productguide/ProductSpec.shtml?data=&specId=6747556',
  'confirm':'/reports/dealreports/getSampleConfirm.do?hubId=4080&productId=3418',
  'reports':'/productguide/ProductSpec.shtml?reports=&specId=6747556',
  'expiry':'/productguide/ProductSpec.shtml?expiryDates=&specId=6747556'
};
app.Router = Backbone.Router.extend({
  routes:{
    "spec":"spec",
    "data":"data",
    "confirm":"confirm",
    "reports":"reports",
    "expiry":"expiry"
  },
  initialize: function(){
    _.bindAll(this, "spec");
  },
  spec:function () {
    this.navigate("");
    this._loadPage('spec');
  },
  data:function () {
    this._loadPage('data');
  },
  confirm:function () {
    this._loadPage('confirm');
  },
  reports:function () {
    this._loadPage('reports');
  },
  expiry:function () {
    this._loadPage('expiry');
  },
  _loadPage:function (cssClass, cb) {
    $('#right').html('Loading..').load(this._makeUrlUnique(app.urls[cssClass]), cb);
    this._updateNav(cssClass);
  },
  _updateNav:function (cssClass) {
    // the left bar gets hidden on margin rates because the tables get smashed up too much
    // so ensure they're showing for the other links
    $('#left').show();
    $('#right').removeClass('wide');
    // update the subnav css so the arrow points to the right location
    $('#subnav ul li a.' + cssClass).siblings().removeClass('on').end().addClass('on');
  },
  _makeUrlUnique:function (urlString) {
    return urlString + '&_=' + new Date().getTime();
  }
});
// init and start the app
$(function () {
  window.router = new app.Router();
  Backbone.history.start();
});
</script>
Two things you can do:
1. Figure out the real path and parameters the page uses to pull the data. See this part: 'data':'/productguide/ProductSpec.shtml?data=&specId=6747556'. The router requests that URL and loads the response into the page.
2. Use the RSS feed they provide and construct your own table.
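A minimal sketch of the first option, using only the standard library. It shows that the #data fragment is never part of the request and builds the URL that the script's 'data' route loads. Whether the server returns the table to a non-browser client is an assumption you would have to verify; `make_url_unique` is a hypothetical helper that mirrors the script's _makeUrlUnique.

```python
import time
from urllib.parse import urldefrag, urljoin

page_url = "https://www.theice.com/productguide/ProductSpec.shtml?specId=6747556#data"

# The fragment ("#data") is handled by the browser and is not sent to the server,
# which is why the fetched HTML is the same with or without it.
url_without_fragment, fragment = urldefrag(page_url)

# Path copied from app.urls['data'] in the page's script.
data_path = "/productguide/ProductSpec.shtml?data=&specId=6747556"
data_url = urljoin(page_url, data_path)


def make_url_unique(url):
    """Mirror the script's _makeUrlUnique: append a millisecond timestamp."""
    return url + "&_=" + str(int(time.time() * 1000))


print(url_without_fragment)
print(fragment)
print(data_url)
```

The resulting data_url (optionally passed through make_url_unique) is what you would request and then hand to BeautifulSoup in place of the original page URL.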
The table is generated by JavaScript, and you can't get it without actually loading the page in a browser.
You could use Selenium to load the page and then evaluate the JavaScript and HTML. Selenium will bring up a visible browser window, but you can use PhantomJS, which makes the browser headless.
But yes, you will need to run the actual JS in a browser to get the HTML it generates.
Take a look at this answer also
Good Luck!
The HTML is generated using JavaScript, so BeautifulSoup won't be able to get the HTML for that table (in fact the whole <div id="right" class="main"> is loaded using JavaScript; judging by the script, through a Backbone.js router).
You can check this by printing the value of soup.get_text(). You can see that the table is not there in the source.
In that case, there is no way for you to access the data unless your code does exactly what the script does to get the data from the server.
