This webpage opens fine manually, but goes straight to a "maintenance" error message when opened with Selenium!
from selenium import webdriver
driver = webdriver.Chrome(executable_path="chromedriver")
driver.get("https://www.winamax.fr/paris-sportifs/")
Is there a way to avoid this behaviour?
Sometimes a website checks your user agent. Maybe you can change that.
Otherwise, I would recommend using AutoIt to do the job.
You could also try loading your personal Chrome profile into Selenium.
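As a rough sketch of the two suggestions above (changing the user agent and loading an existing Chrome profile), the helper below builds the corresponding Chrome command-line switches. The function names, the user-agent string, and the profile path are illustrative assumptions, not values from the question, and whether either switch gets past this particular site's check is untested.

```python
from typing import List, Optional


def build_chrome_arguments(user_agent: Optional[str] = None,
                           profile_dir: Optional[str] = None) -> List[str]:
    """Return Chrome command-line switches for a custom user agent and profile."""
    arguments = []
    if user_agent:
        # --user-agent overrides the User-Agent string Chrome reports.
        arguments.append(f"--user-agent={user_agent}")
    if profile_dir:
        # --user-data-dir points Chrome at an existing profile directory.
        arguments.append(f"--user-data-dir={profile_dir}")
    return arguments


def make_driver(arguments: List[str]):
    """Start Chrome with the given switches (needs Selenium and chromedriver on PATH)."""
    # Imported here so build_chrome_arguments can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in arguments:
        options.add_argument(argument)
    return webdriver.Chrome(options=options)


if __name__ == "__main__":
    # Hypothetical values: substitute your own user agent and profile path.
    print(build_chrome_arguments(
        user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        profile_dir=r"C:\Users\me\AppData\Local\Google\Chrome\User Data",
    ))
```

Calling make_driver(build_chrome_arguments(...)) would then open Chrome with those switches. Chrome usually refuses to share a profile directory with an instance that is already running, so close the regular browser first.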
It is a bit unclear why you think the website is blocking Selenium. However, I was able to access the website using the solution below:
Code Block:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# Drop the "enable-automation" switch and the automation extension, which mark the browser as automated.
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
# Selenium 3 syntax; Selenium 4 takes the driver path through a Service object instead of executable_path.
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://www.winamax.fr/paris-sportifs/')
# Wait up to 20 seconds for the /paris-sportifs link in the above-content section to become clickable.
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "section#above-content a[href='/paris-sportifs']")))
print(driver.page_source)
Console Output:
<html class="app-desktop js flexbox canvas canvastext webgl no-touch geolocation postmessage websqldatabase indexeddb hashchange history draganddrop websockets rgba hsla multiplebgs backgroundsize borderimage borderradius boxshadow textshadow opacity cssanimations csscolumns cssgradients cssreflections csstransforms csstransforms3d csstransitions fontface generatedcontent video audio localstorage sessionstorage webworkers applicationcache svg inlinesvg smil svgclippaths pointerevents cssremunit" lang="fr" style=""><!--[if lt IE 7]> <html class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]--><!--[if IE 7]> <html class="no-js lt-ie9 lt-ie8"> <![endif]--><!--[if IE 8]> <html class="no-js lt-ie9"> <![endif]--><!--[if gt IE 8]><!--><!--<![endif]--><head>
<title>Paris Sportifs - Parier en ligne avec Winamax</title>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<meta name="viewport" content="width=1024">
<meta name="verify-v1" content="divGwlYLgkmoS558zISY8BYg1KLLvQLFmRf0CmPJ1kc=">
<meta name="google-site-verification" content="DB-aVdWJ00FHClrX_fsXFGgZzaYUKGVcvY7uSfOkXsw">
<meta name="format-detection" content="telephone=no">
<meta name="apple-mobile-web-app-title" content="Winamax">
<meta property="og:description" content="Agréé par l’Arjel – Pariez sur le sport sur Winamax! Faites un premier dépôt et votre premier pari sera remboursé si il est perdant.">
<meta property="og:image" content="https://operator-front-static-cdn.winamax.fr/img/content/betting/ParisSportif_Facebook.jpg?v=20150401">
<meta property="fb:admins" content="519584907">
<meta property="og:site_name" content="Winamax.fr">
<meta property="og:type" content="game">
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:site" content="@WinamaxSport">
<meta name="twitter:title" content="Paris Sportifs - Parier en ligne avec Winamax">
<meta name="twitter:description" content="Agréé par l’Arjel – Pariez sur le sport sur Winamax! Faites un premier dépôt et votre premier pari sera remboursé si il est perdant.">
<meta name="twitter:image:src" content="https://operator-front-static-cdn.winamax.fr/img/content/betting/ParisSportif_Facebook_twitter.jpg?v=20150401">
<meta name="twitter:domain" content="winamax.fr">
<meta property="og:url" content="https://www.winamax.fr/paris-sportifs/">
<link rel="stylesheet" id="normalize" href="https://operator-front-static-cdn.winamax.fr/style/v2/normalize.css?v=20191219-1" type="text/css" media="all">
<link rel="stylesheet" id="reset" href="https://operator-front-static-cdn.winamax.fr/style/v2/reset.css?v=20191219-1" type="text/css" media="all">
<link rel="stylesheet" id="fontawesome" href="https://operator-front-static-cdn.winamax.fr/style/v2/fontawesome.min.css?v=20191219-1" type="text/css" media="all">
<link rel="stylesheet" id="magnific-popup" href="https://operator-front-static-cdn.winamax.fr/style/v2/magnific-popup.css?v=20191219-1" type="text/css" media="all">
<link rel="stylesheet" id="spritesheet" href="https://operator-front-static-cdn.winamax.fr/style/v2/spritesheet.css?v=20191219-1" type="text/css" media="all">
<link rel="stylesheet" id="common" href="https://operator-front-static-cdn.winamax.fr/style/v2/common.css?v=20191219-1" type="text/css" media="all">
<link rel="stylesheet" id="fancybox" href="https://operator-front-static-cdn.winamax.fr/style/v2/jquery.fancybox.css?v=20191219-1" type="text/css" media="all">
<link rel="stylesheet" id="doubleslider" href="https://operator-front-static-cdn.winamax.fr/style/v2/doubleslider.css?v=20191219-1" type="text/css" media="all">
<link rel="stylesheet" id="source-sans-pro" href="https://operator-front-static-cdn.winamax.fr/style/fonts/SourceSansPro/source-sans-pro.css?v=20191219-1" type="text/css" media="all">
<link rel="icon" href="https://operator-front-static-cdn.winamax.fr/img/style/v2/favicon.ico" type="image/png">
<link rel="manifest" href="/manifest.json">
<link rel="apple-touch-icon" href="https://operator-front-static-cdn.winamax.fr/img/style/v2/20170721_touch_icon_winamax.png">
<link rel="canonical" href="https://www.winamax.fr/paris-sportifs/">
<link rel="alternate" hreflang="fr" href="https://www.winamax.fr/paris-sportifs/">
<link rel="alternate" hreflang="en" href="https://www.winamax.fr/en/sports-betting/">
<link rel="alternate" hreflang="de" href="https://www.winamax.fr/de/sportwetten/">
<script async="" src="https://www.google-analytics.com/analytics.js"></script><script type="text/javascript" async="" src="https://ssl.google-analytics.com/ga.js"></script><script type="text/javascript" src="https://operator-front-static-cdn.winamax.fr/script/swfobject.js?v=20191219-1"></script>
<script type="text/javascript" src="https://operator-front-static-cdn.winamax.fr/script/v2/betting-helpers.js?v=20191219-1"></script>
<script type="text/javascript" src="https://operator-front-static-cdn.winamax.fr/script/numeral.js?v=20191219-1"></script>
<script type="text/javascript" src="https://operator-front-static-cdn.winamax.fr/script/numeral.languages.js?v=20191219-1"></script>
<script src="https://operator-front-static-cdn.winamax.fr/script/v2/lib/modernizr.custom.js?v=20191219-1"></script>
<!--[if lte IE 8]>
<script src="https://operator-front-static-cdn.winamax.fr/script/v2/lib/jquery-1.10.1.min.js?v=20191219-1"></script>
<script src="https://operator-front-static-cdn.winamax.fr/script/v2/lib/selectivizr-min.js?v=20191219-1"></script>
<![endif]-->
<script language="Javascript" type="text/javascript" src="https://operator-front-static-cdn.winamax.fr/script/messages/messages_fr.js?v=20191219-1"></script>
<script>
var $siteLanguage = 'FR';
var $siteLanguagePath = '';
var $siteLicense = 'FR';
</script>
<style data-styled-components=""></style><script charset="utf-8" src="https://operator-front-static-cdn.winamax.fr/betting/client/1.55.1/main.30da6bcf4d935f4ad03c.js"></script><script charset="utf-8" src="https://operator-front-static-cdn.winamax.fr/betting/client/1.55.1/main.6dcf45fd436abb91aedd.js"></script><script charset="utf-8" src="https://operator-front-static-cdn.winamax.fr/betting/client/1.55.1/main.1812cb976eb7abff1666.js"></script><script charset="utf-8" src="https://operator-front-static-cdn.winamax.fr/betting/client/1.55.1/main.fbe533b04c719597d791.js"></script><script type="text/javascript" charset="utf-8" async="" data-requirecontext="_" data-requiremodule="jquery" src="https://operator-front-static-cdn.winamax.fr/script/v2/lib/jquery-1.12.4.min.js?v=20191219-1"></script><script type="text/javascript" charset="utf-8" async="" data-requirecontext="_" data-requiremodule="swipe" src="https://operator-front-static-cdn.winamax.fr/script/v2/lib/swipe.js?v=20191219-1"></script><script type="text/javascript" charset="utf-8" async="" data-requirecontext="_" data-requiremodule="touch" src="https://operator-front-static-cdn.winamax.fr/script/v2/lib/jquery.touch.js?v=20191219-1"></script><script type="text/javascript" charset="utf-8" async="" data-requirecontext="_" data-requiremodule="magnific" src="https://operator-front-static-cdn.winamax.fr/script/v2/lib/jquery.magnific-popup.min.js?v=20191219-1"></script><script type="text/javascript" charset="utf-8" async="" data-requirecontext="_" data-requiremodule="fancybox" src="https://operator-front-static-cdn.winamax.fr/script/v2/lib/jquery.fancybox.pack.js?v=20191219-1"></script><style id="detectElementResize" type="text/css">#keyframes resizeanim { from { opacity: 0; } to { opacity: 0; } } .resize-triggers { animation: 1ms resizeanim; visibility: hidden; opacity: 0; } .resize-triggers, .resize-triggers > div, .contract-trigger:before { content: " "; display: block; position: absolute; top: 0; left: 0; height: 
100%; width: 100%; overflow: hidden; z-index: -1; } .resize-triggers > div { background: #eee; overflow: auto; } .contract-trigger:before { width: 200%; height: 200%; }</style><style type="text/css">.fancybox-margin{margin-right:17px;}</style><script type="text/javascript" charset="utf-8" async="" data-requirecontext="_" data-requiremodule="script/v2/common" src="https://operator-front-static-cdn.winamax.fr/script/v2/common.js?v=20191219-1"></script><script type="text/javascript" charset="utf-8" async="" data-requirecontext="_" data-requiremodule="script/v2/mobile" src="https://operator-front-static-cdn.winamax.fr/script/v2/mobile.js?v=20191219-1"></script><script type="text/javascript" charset="utf-8" async="" data-requirecontext="_" data-requiremodule="script/v2/gallery" src="https://operator-front-static-cdn.winamax.fr/script/v2/gallery.js?v=20191219-1"></script><script type="text/javascript" charset="utf-8" async="" data-requirecontext="_" data-requiremodule="script/v2/doubleslider" src="https://operator-front-static-cdn.winamax.fr/script/v2/doubleslider.js?v=20191219-1"></script></head>
<body class="lang-fr license-fr">
<div id="wrap-menu-overlay"></div>
<!-- Facebook -->
<div id="fb-root"></div>
<script>$fbLocalized = 'fr_FR';</script>
<div id="doc">
<div id="inner-wrap"><div style="cursor: pointer; background: url('https://static.winamax.fr/img/style/v2/common/avertissement.png?v=2') center center no-repeat rgb(0, 0, 0); height: 60px; z-index: 112; display: block;"></div>
<script>
var arjelBannerImgPath = Math.random() > 0.5 ? "https://static.winamax.fr/img/style/v2/common/avertissement.png?v=2" : "https://static.winamax.fr/img/style/v2/common/avertissement-alt.png";
function setupArjelBanner() {
var ref = document.getElementById("inner-wrap");
if (ref) {
var banner = document.createElement("div");
banner.style.cursor = "pointer";
banner.style.background = "url('" + arjelBannerImgPath + "') no-repeat center center #000";
banner.style.height = "60px";
banner.style.zIndex = Math.floor(50 + Math.random() * 100);
function goArjelInfos() {
window.open("https://www.winamax.fr/CLIC/PREVENTION/JIS_HOME/www.joueurs-info-service.fr");
}
function WidthChange(mq) {
if (mq.matches) {
var arjelBannerImgPath = Math.random() > 0.5 ? "https://static.winamax.fr/img/style/v2/common/avertissement-mobile.png?v=2" : "https://static.winamax.fr/img/style/v2/common/avertissement-alt-mobile.png";
banner.style.background = "url('" + arjelBannerImgPath + "') no-repeat center center #000";
banner.style.backgroundSize = "100% auto";
} else {
banner.style.display = "block";
}
}
banner.addEventListener("click", goArjelInfos, false);
ref.insertBefore(banner, ref.firstChild);
if (matchMedia) {
var mq = window.matchMedia("(max-width: 480px)");
mq.addListener(WidthChange);
WidthChange(mq);
}
}
}
setupArjelBanner();
</script>
<div class="top-notif-wrapper"><div class="notification top-notif medium cookie-consent">
<i class="icon-warning-sign"></i>
<p><span>En poursuivant votre navigation sur ce site, vous acceptez l’utilisation de Cookies afin de réaliser des statistiques de visites et vous proposer des promotions adaptées. S’y opposer - Plus d’informations sur les cookies</span></p>
<i class="icon-remove"></i></div>
</div><header id="masthead">
<div class="container">
<a href="/" id="logo">
<img src="https://operator-front-static-cdn.winamax.fr/img/style/v2/common/logo.png" alt="Winamax">
<img class="hover" src="https://operator-front-static-cdn.winamax.fr/img/style/v2/common/logo-highlight.png" alt="">
</a>
<div class="content">
<div id="login-toggle" class="toggle">
<a href="/account/login.php?redir=/paris-sportifs/">
<svg id="user-icon" viewBox="0 0 32 24">
<path d="M10.3,13.9c0,0-7.1-0.1-9.8,5.3v4.7h9.8h9.8v-4.7C17.5,13.8,10.3,13.9,10.3,13.9z"></path>
<ellipse cx="10.3" cy="6.4" rx="4.9" ry="6.2"></ellipse>
</svg>
</a>
</div>
<div id="login-container" class="expandable">
Se connecter
S'inscrire
<div id="login-popup">
<form id="login-form" method="post" action="/paris-sportifs/" autocomplete="off">
<p class="field">
<label>Adresse email</label>
<input type="email" name="email" id="loginbox_email" placeholder="vous@exemple.com" tabindex="1" autocomplete="off" value="">
</p>
<p class="field">
<label>Mot de passe</label>
<input type="password" id="loginbox_password" name="password" placeholder="" tabindex="2" autocomplete="off">
<a class="forgot-password" href="/account/lost_password.php">Mot de passe oublié ?</a>
</p>
<p class="field birthdate">
<label for="day-input">Date de naissance</label>
<input type="text" id="loginbox_birthday" name="birth_day" autocomplete="off" placeholder="JJ" tabindex="3" pattern="[0-9]*" size="2" maxlength="2" value="">
<input type="text" id="loginbox_birthmonth" name="birth_month" autocomplete="off" placeholder="MM" tabindex="4" pattern="[0-9]*" size="2" maxlength="2" value="">
<input type="text" id="loginbox_birthyear" name="birth_year" autocomplete="off" placeholder="AAAA" tabindex="5" pattern="[0-9]*" size="4" maxlength="4" value="">
</p>
<button type="submit" id="login-button" name="submitlogin" class="secondary-button" tabindex="6"><i class="icon-lock"></i>Connexion</button>
</form>
<div id="no-account">
<span>Pas encore de compte ?</span> Inscrivez-vous gratuitement
</div>
</div>
<div id="pin-popup">
</div>
</div>
<div id="search" class="expandable" style="height:28px">
</div>
<div id="nav-toggle" class="toggle">
<svg id="list-icon" viewBox="0 0 24 24">
<rect x="6" y="17" width="18" height="3"></rect>
<rect x="6" y="10.5" width="18" height="3"></rect>
<rect x="6" y="4" width="18" height="3"></rect>
<rect x="0" y="4" width="3" height="3"></rect>
<rect x="0" y="10.5" width="3" height="3"></rect>
<rect x="0" y="17" width="3" height="3"></rect>
</svg>
</div>
</div>
</div>
<nav id="main-nav" class="">
<ul class="main-nav-list">
<li class="">
JOUER AU POKER
</li>
<li class="focus">
PARIS SPORTIFS
</li>
<li class="">
GRILLES
</li>
<li class="">
JEU DE L’ENTRAINEUR
</li>
<li class="">
PROMOS
</li>
<li class="">
VIP
</li>
<li class="">
ACTUS
</li>
</ul>
</nav>
</header>
<section id="above-content">
<nav id="secondary-nav" class="container">
<div id="section-title">
Paris Sportifs
</div>
<ul>
<li>À la Une</li>
<li>LIVE</li>
<li>Multiplex</li>
<li>Winamax TV</li>
<li>Calendrier</li>
<li>Mes paris</li>
<li>Stats</li>
<li>Résultats</li>
</ul>
</nav>
</section>
Related
I am trying to get the HTML of a marketplace. I get the correct code for one category, but the wrong result for another.
For this one:
import requests
from bs4 import BeautifulSoup
k = requests.get('https://www.skroutz.gr/plus-deals').text
soup1 = BeautifulSoup(k, 'html.parser')
soup1
I get the correct HTML and can process the data, but for the following link I get less HTML, and there is no information about the products in it.
k = requests.get('https://www.skroutz.gr/c/40/kinhta-thlefwna.html').text
soup = BeautifulSoup(k, 'html.parser')
soup
Output
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]> <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]> <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
<head>
<title>Please Wait... | Cloudflare</title>
<meta charset="utf-8"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="IE=Edge" http-equiv="X-UA-Compatible"/>
<meta content="noindex, nofollow" name="robots"/>
<meta content="width=device-width,initial-scale=1" name="viewport"/>
<link href="/cdn-cgi/styles/cf.errors.css" id="cf_styles-css" rel="stylesheet"/>
<!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/styles/cf.errors.ie.css" /><![endif]-->
<style>body{margin:0;padding:0}</style>
<!--[if gte IE 10]><!-->
<script>
if (!navigator.cookieEnabled) {
window.addEventListener('DOMContentLoaded', function () {
var cookieEl = document.getElementById('cookie-alert');
cookieEl.style.display = 'block';
})
}
</script>
<!--<![endif]-->
<script>
//<![CDATA[
(function(){
window._cf_chl_opt={
cvId: "2",
cType: "managed",
cNounce: "84571",
cRay: "724b4343a9c4152e",
cHash: "fe638ea0b43499b",
cUPMDTk: "\/c\/40\/kinhta-thlefwna.html?__cf_chl_tk=SyBbHJlHzS.xDpnxJdXeTIBNmKGSVY9Caf6BU5YW6xE-1656805606-0-gaNycGzNBz0",
cFPWv: "b",
cTTimeMs: "1000",
cLt: "n",
cRq: {
ru: "aHR0cHM6Ly93d3cuc2tyb3V0ei5nci9jLzQwL2tpbmh0YS10aGxlZnduYS5odG1s",
ra: "cHl0aG9uLXJlcXVlc3RzLzIuMjYuMA==",
rm: "R0VU",
d: "MFxIKPp9sia0pudoUteuGKwmEJZnwrwbUU7R6o3bLbmRQdKUSifFGDKIWAthmI25tWLGCJvkxJ/ZSXohN4X7NLhi1X31Fa3HI/KIaiMyh8YXFbroUVZul9D5Y0Hxgy3uYhvKQWMzzsfJA0wJ3OWZCF2SCCYvYhqZeaVPccnVLdSfk0r1+/iOoRZlTpwKzHg1y8avQMw3jpGpm4AvsDclXglM3x1j0hmlFWCig7J2b1QPfPUB1n7jUOL7angCdkUPRolkIzbpbiv1EBnFvj8LTKjERnDHfL9FMw/kobs+wwFcHik9T3fBJA6zpy3pMCiJCMv8yXAAnbmICUswEwNMGEqj2XSYaae1wYLwVRfku7hcs5p1SULPSvG6Tdnw3KBFaN8G3LCzuPXMCIecb0shm/WJQLFiQJZ3vkqH4pYgVzU+dy0RKpmpFdPX/oJcvcCexMG9PQS+5DmKbEqSaMJ0o4MS3lydhg/0CB98601f/gQpLlOvwGQgmpZA0dQNsoJ4qdk/zkPIsonkBETn8ymsyUWZTYjARpB5oPl2Pb/jIBaTEbYiI/q0AkBC4NL3AHZTpJimOCbg+Qo1qW2qMjIJ4sguHuVn1sXfdD61k0x5AS29MSDKIL7TNcIybM223qnleOlFMwBhOYuRt5kbhi0iCQ==",
t: "MTY1NjgwNTYwNi45OTUwMDA=",
m: "jz0GZxi5IN9yD0ZMzRvqHS7iO3DUD67nnR9aWi8akek=",
i1: "yKKkb4Pl9MdAGpv3ff55sw==",
i2: "0V/b+SP8SYiGm6Ql5jqhjg==",
zh: "zMLZU0ozMixiusF5YQ59SCEM/iph9RYq7XDo619EjZk=",
uh: "xaa5dII6Z3KyYGzGAu/zTXOfAYzLW3WlpO4dxW/Wc8c=",
hh: "SbqW99632Mb3TCb6zbuLigmv9PVrnmEea13QmnYx5Y4=",
}
};
}());
//]]>
</script>
<style>
#cf-wrapper #spinner {width:69px; margin: auto;}
#cf-wrapper #cf-please-wait{text-align:center}
.attribution {margin-top: 32px;}
.bubbles { background-color: #f58220; width:20px; height: 20px; margin:2px; border-radius:100%; display:inline-block; }
#cf-wrapper #challenge-form { padding-top:25px; padding-bottom:25px; }
#cf-hcaptcha-container { text-align:center;}
#cf-hcaptcha-container iframe { display: inline-block;}
@keyframes fader { 0% {opacity: 0.2;} 50% {opacity: 1.0;} 100% {opacity: 0.2;} }
#cf-wrapper #cf-bubbles { width:69px; }
@-webkit-keyframes fader { 0% {opacity: 0.2;} 50% {opacity: 1.0;} 100% {opacity: 0.2;} }
#cf-bubbles > .bubbles { animation: fader 1.6s infinite;}
#cf-bubbles > .bubbles:nth-child(2) { animation-delay: .2s;}
#cf-bubbles > .bubbles:nth-child(3) { animation-delay: .4s;}
</style>
</head>
<body>
<div id="cf-wrapper">
<div class="cf-alert cf-alert-error cf-cookie-error" data-translate="enable_cookies" id="cookie-alert">Please enable cookies.</div>
<div class="cf-error-details-wrapper" id="cf-error-details">
<div class="cf-wrapper cf-header cf-error-overview">
<h1 data-translate="managed_challenge_headline">Please wait...</h1>
<h2 class="cf-subheadline"><span data-translate="managed_checking_msg">We are checking your browser...</span> www.skroutz.gr</h2>
</div>
<div class="cf-section cf-highlight cf-captcha-container">
<div class="cf-wrapper">
<div class="cf-columns two">
<div class="cf-column">
<div class="cf-highlight-inverse cf-form-stacked">
<form action="/c/40/kinhta-thlefwna.html?__cf_chl_f_tk=SyBbHJlHzS.xDpnxJdXeTIBNmKGSVY9Caf6BU5YW6xE-1656805606-0-gaNycGzNBz0" class="challenge-form managed-form" enctype="application/x-www-form-urlencoded" id="challenge-form" method="POST">
<div id="cf-please-wait">
<div id="spinner">
<div id="cf-bubbles">
<div class="bubbles"></div>
<div class="bubbles"></div>
<div class="bubbles"></div>
</div>
</div>
<p data-translate="please_wait" id="cf-spinner-please-wait">Please stand by, while we are checking your browser...</p>
<p data-translate="redirecting" id="cf-spinner-redirecting" style="display:none">Redirecting...</p>
</div>
<input name="md" type="hidden" value="IwzwByLaPyYwh36ds_B76pH5fbjsm4NPLcrfOyVKp4k-1656805606-0-ATjH4PthZIQnlHja_TD2k3GIpJNvZi83t0ImbV0HYerX3pLXRLzqVYaRg_jJnr-ENIDFSkm2yG4ucsXrIj2kFquA7Ko4B40ctfjPfq93KAuO2HKoglP7QsaBaAwi3M-psOfnAbR48_o8kIFryXMdm6fJxGT8XpMh4LukCG0MavaJDWjYYiZcHHX82oSF7rx5_LnOgEkx0xkLtZZseZkhzqTfzzJ0S9wKzO9ZZdQMKxRxMIXWn36p3IezexUmZFlfbNtTFuETI0mWGSoTodXSiebeJasF-Ug2dnksRfhCQ1tsLun1XkVrbB5FauyBoa1Lh2-j6k3iW59xN6wsekKZcTrSaq03kn-bEod25lpVvQoe6u0wMmNYbBNWUtn4GER9CxWsDlUAXAwCp-BJAD-lHiJzAuBAmygbwFLRsoOcKOzUqtdQXNMX852hbTsSeTEw8a8bYxj-rHT4-d4zZDHGAw-dWKUAzVrtVSJLKlnRbknZ8BgT4FNnHF3Wwoo6JJxpoyYrrpMi4X27dgBkyymEQ0t1q7LXO_NSBxZIs33DM3hdiBMBHa69ZlFv92IKf-g6pnJnJj1QSlo5kVBTEd_wDswWTzLO6LmFz9VmMbGYTPPYbSFlP_b1VipLLI2DA4UVIOvb_7-alRtQjQQVZXFlkQYREd2J2EKtjzaOdMDKB6PgldFFeAOb97nCukF9UBBXHBAIATvR05AMvKt3LdBeL_3VDoyZe4Gnl8gbzlUTNP4OQ-o8ZEGcPC86WqWj5t2pSHw9XLxDFgR4IT7uYQOCY0cDmNe-BcbWz71ELBDGUXoyu7snXR_nMh6E3b87YNxVRSzeQ5JMrxajndJOfG_Oe0HSFF2SXFU_ahvhlY_GAY9TqexHH0pYzoVCJUtrkU8eTQ"/>
<input name="r" type="hidden" value="DUXMrZguXrr5SKxL5gb8bc9rgvwFZTJBMi4FVb0iz94-1656805606-0-AWxS7hG0CDsoq8Hfnn/fAlbcwDr533szKw8waJNidRmjag4uyqPCsquvtrdjXbzXFP39PakxrWmNVDcKmdXQ6S9o9TGKWaXg537pbNvR73prQvU1eXhXCkbjf86u4b88W0acULgkpL9h+5tRLelwypJ3QwI2X/MfTod2V2drUXQ5h0peLqbcO0HTqMTvXgwUSQapbHaW8ZiQp+mGupb30Pu8YMKTXRGgvJWilv1F+WhpyprJaNOA86hfIlIolBHItxli/EZHW8IR4/GmMlFowX11GPibk8OLHZ+YizRjmdnALqh/AraZFv8GgRwerfHXrZvwka6KqlggzN77KL1q0JhnG1B0oAef8kpOr+yxxp2+cLCiJV8UkCTa96xVAsIwsz1lamCKGqJHaoFVLadytpHWkWQ+Mfqr2/a4tl9ivB6K6bB7qZBzneFusiSOXfwrSsgeXJ6aLNUzuYtMR8wpEVVkiakRpsFVXKIJOzD3KgFyCAfMCd8F2cktMc1+IasYMPlLuE96aqHqQFOoZoNCKOkxIjC35w3SGYgVUw7GOwGtvt08Q6mHns9C9PpRSxsIXUS1jMvQdRIW1tI3lBa1f/9u+u326qXb43+Hnk1F5nuIkXir6CkZKz3HFbAKKJis8rEyI7V0XCKpM/dd0n0EE6H75L0ogVgyQ0YtzxF03NvTdbO4pVLIHoWveMSVQHSFBHvJnVC7B5hhjoFVreOKh4mq3MAWM68jpdhYpv0lFNCSh50T+aCvcdXDCy78PqNQYl0C6jRfsFrGlkUxdLAN/UFwPNq/FzhwijK05TDkY0xN/eUQGoWxhDuCq4FMcukFHlHLMX9XomWLJV70XgP4OHxVarraKvzXTKDLVmKw04mNcVHVXR9syQmZnGWT9kSonL8Yqie6Z4K7JsMZIQuKFbVSLvvMvPsp5Ij9gNmGmexBcbxoQCoi4UivIg8WBtGu8D0BssU5QN8jTlcupcPqY+CsYFmvF3fX0CT3Vri46K6uxRDiXiVhAls7FnhjH1eFykshMQgDVZ04RopWa+sBGy/8FB2b7OI7kXgV0AaQTAkjiywNgKE93F7udAjXHjMYI4Z6MBwa2M8R3/CaC/6nfrCuI9JSopiPs7bXDjYXvVSwAUmaaWHHowg8F7xNbDWPRutBjaAauCqXLEMFYUs7NUyXwuvM+SqbV9T02Iukya848oEqMGX/pfrXcEP+BoXw/rL7PQ0J1NCB5leXQ4kJYmJpEvhsCi+3jklvpPgTxgmH66D5/dOF6EIUKlNO70Mk5Ft6aVEewLT/SWwXcyXjd5jXjLEyrG2PZN3oNFSMRFPLZqXDbwafxYjKHQoFHDKWSoESdTb6pO31v5kDzFdDF5wfy8Grr5tqYEiCQaiawPMEvbzZ/C0CbmZEvxxgii9UA/fNU6ZtvCXoljfao5b1tVh7ww0Bl3/KtCjj/zDcittuheaQaoBrAd1h0L1m1G/UmcXlRoJXthIRatelutU1voXttp097S8myV92G65vM+J5T+8q4yiFapkRYvwKRtp6spOaIqIEGh1VHGZAUZZFu2TtZRA1kVWFXof8d0krQvUly9QmxWAz4xHqIpedOtjc+0Sa+aX781jxvKzZ6YA9kWe78Z1VCUsv3h7vKUZ3uUD1mDt8xpZukiuCtU8fqH+8rPhEeKoAOTk03HrzvrjnUeplxjht6JOx0ZfkvFaUyjDjnL4Pr/R9uMe9V07Dl6aIj9HaQRtBfygajoSePZwAi7emffrLi8YPjvYctJCFyK/auNRJBIKj5nEeOUvKkKNmdA=="/>
<input name="vc" type="hidden" value="28e85100a980cbce13a010a21dbc0851"/>
<noscript class="cf-captcha-info" id="cf-captcha-bookmark">
<h1 data-translate="turn_on_js" style="color:#bd2426;">Please turn JavaScript on and reload the page.</h1>
</noscript>
<div class="cookie-warning" data-translate="turn_on_cookies" id="no-cookie-warning" style="display:none">
<p data-translate="turn_on_cookies" style="color:#bd2426;">Please enable Cookies and reload the page.</p>
</div>
<script>
//<![CDATA[
var a = function() {try{return !!window.addEventListener} catch(e) {return !1} },
b = function(b, c) {a() ? document.addEventListener("DOMContentLoaded", b, c) : document.attachEvent("onreadystatechange", b)};
b(function(){
var cookiesEnabled=(navigator.cookieEnabled)? true : false;
if(!cookiesEnabled){
var q = document.getElementById('no-cookie-warning');q.style.display = 'block';
}
});
//]]>
</script>
<div id="trk_captcha_js" style="background-image:url('/cdn-cgi/images/trace/captcha/nojs/h/transparent.gif?ray=724b4343a9c4152e')"></div>
</form>
<script>
//<![CDATA[
(function(){
var isIE = /(MSIE|Trident\/|Edge\/)/i.test(window.navigator.userAgent);
var trkjs = isIE ? new Image() : document.createElement('img');
trkjs.setAttribute("src", "/cdn-cgi/images/trace/managed/js/transparent.gif?ray=724b4343a9c4152e");
trkjs.id = "trk_managed_js";
trkjs.setAttribute("alt", "");
document.body.appendChild(trkjs);
var cpo=document.createElement('script');
cpo.type='text/javascript';
cpo.src="/cdn-cgi/challenge-platform/h/b/orchestrate/managed/v1?ray=724b4343a9c4152e";
window._cf_chl_opt.cOgUHash = location.hash === '' && location.href.indexOf('#') !== -1 ? '#' : location.hash;
window._cf_chl_opt.cOgUQuery = location.search === '' && location.href.slice(0, -window._cf_chl_opt.cOgUHash.length).indexOf('?') !== -1 ? '?' : location.search;
if (window._cf_chl_opt.cUPMDTk && window.history && window.history.replaceState) {
var ogU = location.pathname + window._cf_chl_opt.cOgUQuery + window._cf_chl_opt.cOgUHash;
history.replaceState(null, null, "\/c\/40\/kinhta-thlefwna.html?__cf_chl_rt_tk=SyBbHJlHzS.xDpnxJdXeTIBNmKGSVY9Caf6BU5YW6xE-1656805606-0-gaNycGzNBz0" + window._cf_chl_opt.cOgUHash);
cpo.onload = function() {
history.replaceState(null, null, ogU);
};
}
document.getElementsByTagName('head')[0].appendChild(cpo);
}());
//]]>
</script>
</div>
</div>
<div class="cf-column">
<div class="cf-screenshot-container">
<span class="cf-no-screenshot"></span>
</div>
</div>
</div>
</div>
</div>
<div class="cf-section cf-wrapper">
<div class="cf-columns two">
<div class="cf-column">
<h2 data-translate="why_captcha_headline">Why do I have to complete a CAPTCHA?</h2>
<p data-translate="why_captcha_detail">Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.</p>
</div>
<div class="cf-column">
<h2 data-translate="resolve_captcha_headline">What can I do to prevent this in the future?</h2>
<p data-translate="resolve_captcha_antivirus">If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware.</p>
<p data-translate="resolve_captcha_network">If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices.</p>
</div>
</div>
</div>
<div class="cf-error-footer cf-wrapper w-240 lg:w-full py-10 sm:py-4 sm:px-8 mx-auto text-center sm:text-left border-solid border-0 border-t border-gray-300">
<p class="text-13">
<span class="cf-footer-item sm:block sm:mb-1">Cloudflare Ray ID: <strong class="font-semibold">724b4343a9c4152e</strong></span>
<span class="cf-footer-separator sm:hidden">•</span>
<span class="cf-footer-item hidden sm:block sm:mb-1" id="cf-footer-item-ip">
Your IP:
<button class="cf-footer-ip-reveal-btn" id="cf-footer-ip-reveal" type="button">Click to reveal</button>
<span class="hidden" id="cf-footer-ip">213.7.17.251</span>
<span class="cf-footer-separator sm:hidden">•</span>
</span>
<span class="cf-footer-item sm:block sm:mb-1"><span>Performance & security by</span> Cloudflare</span>
</p>
<script>(function(){function d(){var b=a.getElementById("cf-footer-item-ip"),c=a.getElementById("cf-footer-ip-reveal");b&&"classList"in b&&(b.classList.remove("hidden"),c.addEventListener("click",function(){c.classList.add("hidden");a.getElementById("cf-footer-ip").classList.remove("hidden")}))}var a=document;document.addEventListener&&a.addEventListener("DOMContentLoaded",d)})();</script>
</div><!-- /.error-footer -->
</div>
</div>
<script>
window._cf_translation = {};
</script>
</body>
</html>
The website loads its data through an API. Here is the link: https://www.skroutz.gr/c/40/kinhta-thlefwna.json?page=1
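A minimal sketch of requesting that endpoint page by page, using only the standard library. The function names are made up for illustration, the layout of the returned JSON is not shown in this thread (so the parsed object is returned as-is), and it is an assumption that the endpoint answers a plain HTTP client at all; the check simply looks for markers copied from the Cloudflare page in the question's output.

```python
import json
import urllib.error
import urllib.request

# URL pattern taken from the link above; only the page number varies.
API_TEMPLATE = "https://www.skroutz.gr/c/40/kinhta-thlefwna.json?page={page}"


def page_url(page: int) -> str:
    """Build the API URL for one page of results."""
    return API_TEMPLATE.format(page=page)


def looks_like_cloudflare_challenge(body: str) -> bool:
    """Heuristic check using markers from the challenge page in the question."""
    return "_cf_chl_opt" in body or "Please Wait... | Cloudflare" in body


def fetch_page(page: int, user_agent: str):
    """Download one page and parse it as JSON, failing loudly on a challenge page."""
    request = urllib.request.Request(page_url(page), headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as error:
        # Challenge pages are often served with an error status, so inspect the body.
        body = error.read().decode("utf-8", errors="replace")
        if looks_like_cloudflare_challenge(body):
            raise RuntimeError("Got a Cloudflare challenge instead of JSON") from error
        raise
    if looks_like_cloudflare_challenge(body):
        raise RuntimeError("Got a Cloudflare challenge instead of JSON")
    return json.loads(body)
```

Checking for the challenge page explicitly makes the failure visible, rather than handing a "Please Wait..." document to the parser as in the question.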
I set pageLoadStrategy to eager, but now the method I wrote to close the website's popups no longer works (it works when pageLoadStrategy is normal). So I want to know how to change pageLoadStrategy from eager to normal, close the popups, and then change it back to eager. I want to use eager because I want my code to work on slow connections as well.
Here is the content of the page returned by driver.page_source when pageLoadStrategy is set to eager:
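For context, the page load strategy is a capability negotiated when the WebDriver session is created, so it generally cannot be switched inside a running session. One possible workaround, sketched below under that assumption, is to keep eager and poll document.readyState until it reports "complete" just before the popup-closing step. The helper name and its defaults are hypothetical; the driver only needs an execute_script method, as Selenium drivers have.

```python
import time


def wait_until_page_complete(driver, timeout: float = 30.0, poll: float = 0.5) -> bool:
    """Poll document.readyState until it is "complete" or the timeout expires.

    Returns True if the page finished loading in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "complete" is the state the normal strategy waits for before returning.
        if driver.execute_script("return document.readyState") == "complete":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

With eager set at start-up (in Selenium 4, options.page_load_strategy = "eager"), the script would call wait_until_page_complete(driver) only where the fully loaded page is needed, such as right before closing the popups, and skip it everywhere else.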
<head>
<meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
<meta charset="utf-8"/>
<title>
Amizone
</title>
<meta content="overview & stats" name="description"/>
<meta content="width=device-width, initial-scale=1.0, maximum-scale=1.0" name="viewport"/>
<!-- bootstrap & fontawesome -->
<link href="/Content/bootstrap.min.css" rel="stylesheet"/>
<link href="/font-awesome/4.2.0/css/font-awesome.min.css" rel="stylesheet"/>
<link href="/Content/color.css" rel="stylesheet"/>
<!-- page specific plugin styles -->
<!-- text fonts -->
<link href="https://fonts.googleapis.com/css?family=Poppins:300,400,500,600,700" rel="stylesheet" type="text/css"/>
<!-- ace styles -->
<link href="/Content/ace.min.css" rel="stylesheet"/>
<link href="/Content/Dashboard.css" rel="stylesheet"/>
<link href="/assets/bootstrap-datepicker.min.css" rel="stylesheet"/>
<!-- end css for this page-->
<script src="/Scripts/jquery.2.1.1.min.js">
</script>
</head>
</html>
This is some of the content of the page (which I want) at the same point when pageLoadStrategy is set to normal:
<head>
<meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
<meta charset="utf-8"/>
<title>
Amizone
</title>
<meta content="overview & stats" name="description"/>
<meta content="width=device-width, initial-scale=1.0, maximum-scale=1.0" name="viewport"/>
<!-- bootstrap & fontawesome -->
<link href="/Content/bootstrap.min.css" rel="stylesheet"/>
<link href="/font-awesome/4.2.0/css/font-awesome.min.css" rel="stylesheet"/>
<link href="/Content/color.css" rel="stylesheet"/>
<!-- page specific plugin styles -->
<!-- text fonts -->
<link href="https://fonts.googleapis.com/css?family=Poppins:300,400,500,600,700" rel="stylesheet" type="text/css"/>
<!-- ace styles -->
<link href="/Content/ace.min.css" rel="stylesheet"/>
<link href="/Content/Dashboard.css" rel="stylesheet"/>
<link href="/assets/bootstrap-datepicker.min.css" rel="stylesheet"/>
<!-- end css for this page-->
<script src="/Scripts/jquery.2.1.1.min.js">
</script>
<script src="/Scripts/jquery.unobtrusive-ajax.min.js">
</script>
<script src="/Scripts/ace-extra.min.js">
</script>
<script src="/Scripts/bootstrap.min.js">
</script>
<script src="/Scripts/bootbox.min.js">
</script>
<!-- page calender plugin scripts -->
<!--start ace scripts -->
<script src="/Scripts/ace-elements.min.js">
</script>
<script src="/Scripts/ace.min.js">
</script>
<script src="/Scripts/jquery.easypiechart.min.js">
</script>
<script src="/assets/animate-plus.min.js">
</script>
<script src="/assets/owl.carousel.js">
</script>
<script src="/assets/bootstrap-datepicker.min.js">
</script>
<script src="/Scripts/form-wizard.js">
</script>
<script src="/Scripts/validator.js">
</script>
<link href="/Content/color.css" rel="stylesheet"/>
<script src="/Scripts/jquery.colorbox.min.js">
</script>
<link href="/Content/colorbox.min.css" rel="stylesheet"/>
<!-- include the style -->
<link href="/Content/alertifyjs/alertify.min.css" rel="stylesheet"/>
<!-- include a theme -->
<link href="/Content/alertifyjs/themes/default.min.css" rel="stylesheet"/>
<script src="/Scripts/alertify.js">
</script>
<link href="/Content/main.css" rel="stylesheet"/>
<style>
#myDiv {
visibility: hidden;
opacity: 0;
}
</style>
<script>
alertify.set('notifier', 'position', 'top-right');
</script>
<script>
var baseurl = "";
</script>
<script>
var myVar;
function myFunction() {
$("#lodingDiv").css("display", "block");
myVar = setTimeout(showPage, 300);
}
function showPage() {
$("#lodingDiv").css("display", "none");
$("#myDiv").css("visibility", "visible");
$("#myDiv").css("opacity", "1");
}
</script>
<script charset="UTF-8" src="https://www.gstatic.com/charts/46.1/loader.js" type="text/javascript">
</script>
<link href="https://www.gstatic.com/charts/46.1/css/core/tooltip.css" id="load-css-0" rel="stylesheet" type="text/css"/>
<link href="https://www.gstatic.com/charts/46.1/css/util/util.css" id="load-css-1" rel="stylesheet" type="text/css"/>
<script charset="UTF-8" src="https://www.gstatic.com/charts/46.1/js/jsapi_compiled_format_module.js" type="text/javascript">
</script>
<script charset="UTF-8" src="https://www.gstatic.com/charts/46.1/js/jsapi_compiled_default_module.js" type="text/javascript">
</script>
<script charset="UTF-8" src="https://www.gstatic.com/charts/46.1/js/jsapi_compiled_ui_module.js" type="text/javascript">
</script>
<script charset="UTF-8" src="https://www.gstatic.com/charts/46.1/js/jsapi_compiled_corechart_module.js" type="text/javascript">
</script>
</head>
<body class="no-skin modal-open" oncontextmenu="return false;" onload="myFunction()" style="padding-right: 15px;">
<div id="lodingDiv" style="display: block;">
<div id="loader">
</div>
</div>
<script>
function beginReadNotificationNav(stype, iNoticeId) {
var hfvalue = $('#Hf' + stype + '_' + iNoticeId).val();
var count = $('#PendHome' + stype + 's').text();
var actualCount = parseInt($.trim(count));
if (hfvalue == "False" && actualCount > 0) {
$('#PendHome' + stype + 's').text(actualCount - 1);
$('#PendHome1' + stype + 's').text(actualCount - 1);
$('#Pend' + stype + 's').text(actualCount - 1);
$('#Hf' + stype + '_' + iNoticeId).val('True');
}
}
</script>
<script>
function imageExists(url, callback) {
var img = new Image();
img.onload = function() { callback(true); };
img.onerror = function() { callback(false); };
img.src = url;
}
function validateImageURL()
{
var imageUrl = 'https://amizone.net/amizone/Images/Signatures/7071804_P.png';
imageExists(imageUrl, function(exists) {
//Show the result
// alert('Fileexists=' + exists);
var html='';
if (exists)
{
html= '<img class="nav-user-photo" src="https://amizone.net/amizone/Images/Signatures/7071804_P.png"/>';
}
else
{
html = '<img class="nav-user-photo" src="../Images/blankphoto.png" />';
}
// alert(html);
$('#userphoto').append(html);
});
}
validateImageURL();
</script>
<style>
.margin-left-20{margin-left:20px}
.top-notice li {
line-height: 20px;
overflow: hidden;
text-overflow: ellipsis;
white-space: nowrap;
}
.Nav-Note-Title {
font-size: 13px;
color: #0eb2a4;
}
.Nav-Note-Date {
font-size: 12px;
color: #18960a;
}
.Nav-Note-View a {
font-size: 11px;
color: #fff;
text-decoration: none;
padding: 2px 10px;
border-radius: 10px;
}
@media(min-width:767px) {
.modal-close-btn-home {
display:none;
}
}
</style>
<div class="navbar navbar-default ace-save-state navbar-fixed-top" id="navbar">
<script type="text/javascript">
try { ace.settings.check('navbar', 'fixed') } catch (e) { }
</script>
<div class="navbar-container" id="navbar-container">
<button class="navbar-toggle menu-toggler pull-left" data-target="#sidebar" id="menu-toggler" type="button">
<span class="sr-only">
Toggle sidebar
</span>
<span class="icon-bar">
</span>
<span class="icon-bar">
</span>
<span class="icon-bar">
</span>
</button>
<div class="navbar-header pull-left">
<a class="navbar-brand" href="/Home/">
<img src="/images/amizone-logo-inner.png"/>
</a>
</div>
<div class="pull-left">
<h4 class="align-middle white margin-left-20">
Amity University Uttar Pradesh, Noida
</h4>
</div>
<div class="navbar-buttons pull-right" role="navigation">
<ul class="nav ace-nav">
<li class="grey dropdown-modal">
<a aria-expanded="false" class="dropdown-toggle" data-toggle="dropdown" href="#">
<i class="ace-icon fa fa-bell">
</i>
<span class="badge badge-grey" id="PendHomeNotifications">
10
</span>
</a>
<ul class="dropdown-menu-right dropdown-navbar dropdown-menu dropdown-caret dropdown-close">
<li class="dropdown-header">
<i class="ace-icon fa fa-check">
</i>
<span id="PendHome1Notifications">
10
</span>
Notices To Read
<button aria-label="Close" class="close modal-close-btn-home" data-dismiss="modal" type="button">
<span aria-hidden="true">
×
</span>
</button>
</li>
<li class="dropdown-content ace-scroll" style="position: relative;">
<div class="scroll-track" style="display: none;">
<div class="scroll-bar">
</div>
</div>
<div class="scroll-content" style="max-height: 200px;">
<div class="scroll-content" style="">
<ul class="dropdown-menu dropdown-navbar top-notice">
<li>
<input id="HfNotification_6699" type="hidden" value="False"/>
<span class="Nav-Note-Title">
NOTICE FOR ALL -- LOST & FOUND
</span>
<div class="clearfix">
<span class="pull-left Nav-Note-Date">
13 Jan 2020
</span>
<span class="pull-right Nav-Note-View">
<a class="bg-b-blue" data-ajax="true" data-ajax-begin="beginReadNotificationNav('Notification','6699');" data-ajax-loading="#lodingDiv" data-ajax-method="GET" data-ajax-mode="replace" data-ajax-success=" $('#FormModal').modal('show');" data-ajax-update="#DivForm" href="/Home/NoticeDescription/6787102C-D0F6-4AF2-8E58-4D6F9FF3D4C8?Type=2" id="6787102C-D0F6-4AF2-8E58-4D6F9FF3D4C8" rel="0">
View
</a>
</span>
</div>
</li>
<li>
<input id="HfNotification_6700" type="hidden" value="False"/>
<span class="Nav-Note-Title">
COMPLETE BAN ON E-CIGARETTES
</span>
<div class="clearfix">
<span class="pull-left Nav-Note-Date">
10 Jan 2020
</span>
<span class="pull-right Nav-Note-View">
<a class="bg-b-blue" data-ajax="true" data-ajax-begin="beginReadNotificationNav('Notification','6700');" data-ajax-loading="#lodingDiv" data-ajax-method="GET" data-ajax-mode="replace" data-ajax-success=" $('#FormModal').modal('show');" data-ajax-update="#DivForm" href="/Home/NoticeDescription/F81A5BB8-68BE-4B92-8DD0-2A76F1880197?Type=2" id="F81A5BB8-68BE-4B92-8DD0-2A76F1880197" rel="0">
View
</a>
</span>
</div>
</li>
<li>```
This is the Python automation code:
#import sys
import time
import db
from bs4 import BeautifulSoup
from selenium.common import exceptions
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
#from selenium.webdriver.support import expected_conditions as EC
start_time = time.time() #stores time at which program starts
#while(True):
#***setting up chrome driver***
caps = DesiredCapabilities().CHROME
caps["pageLoadStrategy"] = "normal" #complete
#caps["pageLoadStrategy"] = "eager" #interactive
#caps["pageLoadStrategy"] = "none"
chromedriver = "/usr/share/chromedriver/chromedriver"
driver = webdriver.Chrome(desired_capabilities=caps,executable_path=chromedriver)
driver.set_window_size(800, 1000)
# driver.set_network_conditions(
# offline=False,
# latency=5, # additional latency (ms)
# download_throughput=500 * 1024, # maximal throughput
# upload_throughput=500 * 1024) # maximal throughput
#driver.maximize_window()
#wait = WebDriverWait(driver, 10)
wait = driver.implicitly_wait(10)
url = "https://student.amizone.net"
driver.get(url) #getting amizone.net
#***write page content to a file and return page soup***
def page_content_to_file(*argsv):
    wait
    content = driver.page_source
    #print(content)
    page_soup = BeautifulSoup(content, "html.parser")
    page_soup_text = BeautifulSoup.prettify(page_soup)
    if(len(argsv) > 1):
        raise NameError('page_content_to_file cannot take more than 2 arguments')
    if(len(argsv) == 1):
        filename = argsv[0]
        with open(filename, "w") as file:
            file.write(page_soup_text)
        print("wrote to file {}".format(filename))
    return page_soup
#***function that enters login credentials***
def login(username, password):
    try:
        #type | name=_UserName
        driver.find_element(By.NAME, "_UserName").send_keys(username)
        #type | name=_Password
        driver.find_element(By.NAME, "_Password").send_keys(password)
        #click | css=#loginform .login100-form-btn |
        driver.find_element(By.CSS_SELECTOR, "#loginform .login100-form-btn").click()
    except:
        print("couldn't complete login")
#***function to close popups***
def close_popups():
    #try:
    page_soup = page_content_to_file("popup.html")
    # getting names of divs having class 'modal fade in'
    # driver.implicitly_wait(10)
    # content = driver.page_source
    # page_soup = BeautifulSoup(content,"html.parser")
    popup_divs = page_soup.find_all('div', {"class":"modal fade in"})
    #print(popup_divs)
    popups_name = []
    for div in popup_divs:
        popups_name.append(div['id'])
    print(popups_name)
    if(len(popups_name) == 0):
        print("no popups found popups_name length=0")
    else:
        print("starting")
        # clicking to close pop-ups
        for name in reversed(popups_name):
            xpath = "//div[@id='" + name + "']//button[@class='close']"
            print(xpath)
            driver.find_element(By.XPATH, xpath).click()
        print("clicks complete")
    #extra code
    #wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#ModalPopAmityHostel button.btn"))).click()
    #driver.execute_script("arguments[0].click()", driver.find_element_by_css_selector("#StudentSatisfactionPop button.btn"))
    #click | id=ModalPopAmityHostel |
    #driver.find_element(By.XPATH, "//div[@id='ModalPopAmityHostel']//button[@class='close']").click()
    #click | id=StudentSatisfactionPop |
    #driver.find_element(By.XPATH, "//div[@id='StudentSatisfactionPop']//button[@class='close']").click()
    # except:
    #     print("error occured while closing popups")
    return 0
#----------FUNCTION CALLS----------
login("username", "password") #login
close_popups() #close all popups
#----------------------------------
#***go to next/prev date in myClasses***
page_soup = page_content_to_file("amizone.html")
date_prev_next = ""
while(date_prev_next != "end"):
    date_prev_next = input("type prev/next:")
    try:
        if(date_prev_next == "next"):
            #click | css=.fc-icon-right-single-arrow |
            driver.find_element(By.CSS_SELECTOR, ".fc-icon-right-single-arrow").click()
            print(driver.find_element(By.XPATH, "//*[@id='calendar']/div[1]/div[3]/h2").text)
        if(date_prev_next == "prev"):
            # click | css=.fc-prev-button |
            driver.find_element(By.CSS_SELECTOR, ".fc-prev-button").click()
            print(driver.find_element(By.XPATH, "//*[@id='calendar']/div[1]/div[3]/h2").text)
    except exceptions.NoSuchElementException as e:
        print(e, "unable to click. Something may be blocking the element")
#***clicking on the hamburger button and choosing timetable***
def menu_click(option):
    menu_toggler_xpath = "//*[@id='menu-toggler']" #xpath of the menu toggler(hamburger button)
    # clicking on the hamburger button on top left corner
    driver.find_element(By.XPATH, menu_toggler_xpath).click()
    if(option == "timetable"):
        time_table_navbar_xpath = "//*[@id='10']" #xpath of the timetable option in menu
        # clicking on the timetable button in the menu
        driver.find_element(By.XPATH, time_table_navbar_xpath).click()
        print("clicked on timetable.")
    wait
menu_click("timetable")
#***scraping timetable***
# print(driver.find_element(By.CLASS_NAME, "tab-content").text)
#NOTE: no need to click on weekdays because all info is in the webpage
#NOTE: just clicking on the tt loads only the current day's tt. if then you click once on any day webpage shows tt of whole week.(why?)
#for i in range(1,8): #iterating over all weekdays 1-7
#weekday_xpath = "//*[@id='myTab3']/li[" + str(i) + "]/a" #concatenating string to make xpath for each weekday
#clicking on a day to get whole week's tt
page_soup = page_content_to_file("01.html")
weekday_xpath = "//*[@id='myTab3']/li[1]/a" #xpath of day no.1 of the week in the timetable at that time.
driver.find_element(By.XPATH, weekday_xpath).click()
# get info about classes and attendance marked from myclasses
#TODO: for a course check whether green or blue dot is shown
#extra code
#date = driver.find_element(By.XPATH, "//*[@id='calendar']/div[1]/div[3]/h2").text #to get the date of myClasses
#print(calendar_date_element.text)
#driver.find_element(By.ID, "ModalPopAmityHostel").click()
#time.sleep(5)
#driver.implicitly_wait(5000)
#driver.find_element(By.ID, "StudentSatisfactionPop").click()
#url = driver.current_url
#driver.find_element(By.CLASS_NAME, "close").click()
#driver.implicitly_wait(5000)
period_data = [] #list to make sql statement
#scraping timetable data
page_soup = page_content_to_file()
divs_class_tab_pane = page_soup.findAll("div", {"class":"tab-pane"}) #finds and makes a list all the <div class="tab-pane in active" id="[day]">
for day_div in divs_class_tab_pane: #selects each day's div from divs_class_tab_pane list
    print()
    day = day_div["id"].strip() #gets the id attribute of div tag e.g <div class="tab-pane in active" id="Sunday"> returns the day
    print(day)
    try:
        #find all <div class="thumbnail timetable-box"> elements which contains p tags of details of a class
        div_thumbnail_timetable_box = day_div.findAll("div", {"class":"thumbnail timetable-box"})
        if(len(div_thumbnail_timetable_box) == 0):
            print("no classes alloted yet")
    except:
        print("no classes today")
    #selecting element one at a time from div_thumbnail_timetable_box
    for ttbox in div_thumbnail_timetable_box:
        period_data.append(day) #appending day to list
        print()
        #get text from <p class="class-time"> the class time
        class_time = ttbox.find('p', {"class":"class-time"}).text.strip()
        period_data.append(class_time) #appending class time to list
        print(class_time)
        #get text from <p class="course-code"> the course code
        course_code = ttbox.find('p', {"class":"course-code"}).text.strip()
        period_data.append(course_code) #appending course_code to list
        print(course_code)
        #get text from <p class="course-teacher"> the course teacher
        course_teacher = ttbox.find('p', {"class":"course-teacher"}).text.strip()
        period_data.append(course_teacher) #appending course_teacher to list
        print(course_teacher)
        class_location = ttbox.find('p', {"class":"class-loc"}).text.strip()
        period_data.append(class_location)
        print(class_location)
        print(period_data)
        #connecting to database #TODO: exception handliling required here
        mydb = db.establish_con("localhost", "manik", "sweetbread","amizone")
        script = "','".join(period_data)
        period_data.clear()
        query = "INSERT INTO amizone.tt_data(`day`,`time`,course,teacher, class_loc) VALUES ('" + script + "');"
        #running MySQL query in the database
        mycursor = db.run_sql(mydb, query)
        #mycursor = db.run_sql(mydb, "SELECT * FROM amizone.tt_data;")
        mydb.commit()
#driver.quit()
print("execution time: %ss" % (round(time.time() - start_time, 5)))
# time.sleep(10)```
You have to consider a couple of things:
Once you set pageLoadStrategy to eager through DesiredCapabilities(), the value is fixed when the browser session is created and persists for the lifetime of the WebDriver instance. So you can't change pageLoadStrategy from eager to normal, or vice versa, while the test execution is in progress.
You can find a couple of relevant discussions in:
Set capability on already running selenium webdriver
Change ChromeOptions in an existing webdriver
Comparing the page source obtained by WebDriver under both page load strategies, you will observe:
The page source with pageLoadStrategy set to eager doesn't contain the jQuery and AJAX calls:
<script src="/Scripts/jquery.2.1.1.min.js">
</script>
<script src="/Scripts/jquery.unobtrusive-ajax.min.js">
</script>
Hence, it is quite likely that WebDriver attempts an interaction, e.g. click() or send_keys(), before the JavaScript has registered its event handlers on the elements in the HTML document, which eventually results in click() failures.
This is the exact reason behind the "popup not getting closed" behaviour.
Solution
From the page source obtained with pageLoadStrategy set to eager, it is evident that the elements on the page depend on JavaScript, so a better approach is to keep pageLoadStrategy as normal throughout the test execution.
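As a rough sketch of that suggestion (not the asker's final code): the snippet below assumes Selenium 4, where the strategy can be set through `options.page_load_strategy`, and replaces the BeautifulSoup lookup with an explicit wait on each close button. The helper names `close_button_xpath`, `make_driver` and `close_popups` are made up for illustration; the modal ids in the usage comment are the ones mentioned in the question.

```python
def close_button_xpath(modal_id):
    """Build the XPath of the close button inside the modal with this id."""
    return "//div[@id='{}']//button[@class='close']".format(modal_id)


def make_driver():
    """Create a Chrome driver that waits for the full page load (assumes Selenium 4)."""
    # Imported lazily so the XPath helper above works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.page_load_strategy = "normal"  # fixed for the whole session
    return webdriver.Chrome(options=options)


def close_popups(driver, modal_ids, timeout=10):
    """Click the close button of each modal, topmost (last listed) first."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    for modal_id in reversed(modal_ids):
        locator = (By.XPATH, close_button_xpath(modal_id))
        # Wait until the button is actually clickable instead of clicking blindly.
        wait.until(EC.element_to_be_clickable(locator)).click()


# Hypothetical usage, after logging in:
#   driver = make_driver()
#   driver.get("https://student.amizone.net")
#   close_popups(driver, ["ModalPopAmityHostel", "StudentSatisfactionPop"])
```

The explicit wait is a design choice rather than a requirement: it only guarantees that the button is visible and enabled, so it complements the normal page load strategy and does not replace it.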
Is there any way to get direct image URL from http-link to telegram post with Python?
I have a direct link to a Telegram post, for example: https://t.me/tele2slack/223
I can find the image URL with the Chrome inspector. For my link the image URL is:
https://cdn4.telesco.pe/file/oxrpfWsqyBeFI3KIxPqBf-5A1k_OEiueCdwpuhR0oWtM7_88zpYi7kRsADHYobpByICSfImn_CffaxWr2nC6E49BSFchpKRKO5bkNPsFmefhsjdLstZwtHaeZGqHkqWFcGbtujPcmigwJkl7gH7tjHJrqlpmhZmGS7QnacF8PNxpocVMqaQXRxLW7kAwm6lVxLYo6AJqNb8bdZ5RXJgd6mQG0v5QINvTwtJNdioEWDAjtsufsxHVgzdUK1yBn1M3cjmhjfv8o4uMyi0bhsdFV_q21e0Sqj-QvUi-99JCPSHNVlLBfoWQEtSCeErPE45UrlqbnELYOznvLq_CeE6BcQ.jpg
Is there any way to automate this process with Python?
I've tried a GET request, but unfortunately it returns no useful information:
response = requests.get("https://t.me/tele2slack/223")
response:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Telegram: Contact @tele2slack</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta property="og:title" content="Tele2slack_dev">
<meta property="og:image" content="https://telegram.org/img/t_logo.png">
<meta property="og:site_name" content="Telegram">
<meta property="og:description" content="🇺🇸#TSN #отчетности #сша
Tyson Foods Q4 Earnings:
-Q4 Adj EPS $1.21 (est $1.25)
-Q4 Revenue $10.88 Bln (esat $11.0 Bln)">
<meta property="twitter:title" content="Tele2slack_dev">
<meta property="twitter:image" content="https://telegram.org/img/t_logo.png">
<meta property="twitter:site" content="@Telegram">
<meta property="al:ios:app_store_id" content="686449807">
<meta property="al:ios:app_name" content="Telegram Messenger">
<meta property="al:ios:url" content="tg://resolve?domain=tele2slack&post=224">
<meta property="al:android:url" content="tg://resolve?domain=tele2slack&post=224">
<meta property="al:android:app_name" content="Telegram">
<meta property="al:android:package" content="org.telegram.messenger">
<meta name="twitter:card" content="summary">
<meta name="twitter:site" content="@Telegram">
<meta name="twitter:description" content="🇺🇸#TSN #отчетности #сша
Tyson Foods Q4 Earnings:
-Q4 Adj EPS $1.21 (est $1.25)
-Q4 Revenue $10.88 Bln (esat $11.0 Bln)
">
<meta name="twitter:app:name:iphone" content="Telegram Messenger">
<meta name="twitter:app:id:iphone" content="686449807">
<meta name="twitter:app:url:iphone" content="tg://resolve?domain=tele2slack&post=224">
<meta name="twitter:app:name:ipad" content="Telegram Messenger">
<meta name="twitter:app:id:ipad" content="686449807">
<meta name="twitter:app:url:ipad" content="tg://resolve?domain=tele2slack&post=224">
<meta name="twitter:app:name:googleplay" content="Telegram">
<meta name="twitter:app:id:googleplay" content="org.telegram.messenger">
<meta name="twitter:app:url:googleplay" content="https://t.me/tele2slack/224">
<meta name="apple-itunes-app" content="app-id=686449807, app-argument: tg://resolve?domain=tele2slack&post=224">
<link rel="shortcut icon" href="//telegram.org/favicon.ico?3" type="image/x-icon" />
<link href="https://fonts.googleapis.com/css?family=Roboto:400,700" rel="stylesheet" type="text/css">
<!--link href="/css/myriad.css" rel="stylesheet"-->
<link href="//telegram.org/css/bootstrap.min.css?3" rel="stylesheet">
<link href="//telegram.org/css/telegram.css?177" rel="stylesheet" media="screen">
</head>
<body>
<div class="tgme_page_wrap">
<div class="tgme_head_wrap">
<div class="tgme_head">
<a href="//telegram.org/" class="tgme_head_brand">
<i class="tgme_logo"></i>
</a>
</div>
</div>
<a class="tgme_head_dl_button" href="//telegram.org/dl?tme=6dae9c11480edfa67e_2093069837989679044">
Don't have <strong>Telegram</strong> yet? Try it now!<i class="tgme_icon_arrow"></i>
</a>
<div class="tgme_page tgme_page_post">
<div class="tgme_page_widget" id="widget">
<script async src="https://telegram.org/js/telegram-widget.js?7" data-telegram-post="tele2slack/224" data-width="100%"></script>
</div>
<div class="tgme_page_widget_actions" id="widget_actions">
<div class="tgme_page_widget_actions_cont">
<div class="tgme_page_widget_action_right">
<div class="tgme_page_context_btn"><a class="tgme_action_button_new" href="/s/tele2slack/224"><span class="tgme_action_button_label">Context</span></a></div>
</div>
<div class="tgme_page_widget_action_left">
<div class="tgme_page_embed_btn">
<a class="tgme_action_button_new" onclick="return toggleEmbed();"><span class="tgme_action_button_label">Embed</span></a>
</div>
</div>
<div class="tgme_page_widget_action">
<a class="tgme_action_button_new" href="tg://resolve?domain=tele2slack&post=224">View In Channel</a>
</div>
<div class="tgme_page_embed_action">
<textarea class="tgme_page_embed_code" rows="3" id="embed_code_field" readonly><script async src="https://telegram.org/js/telegram-widget.js?7" data-telegram-post="tele2slack/224" data-width="100%"></script></textarea>
<div class="tgme_page_copy_action">
<a class="tgme_action_button_new" onclick="return copyEmbedCode();">Copy</a>
</div>
</div>
</div>
</div>
</div>
</div>
<div id="tgme_frame_cont"></div>
<script type="text/javascript">
var protoUrl = "tg:\/\/resolve?domain=tele2slack&post=224";
if (false) {
var iframeContEl = document.getElementById('tgme_frame_cont') || document.body;
var iframeEl = document.createElement('iframe');
iframeContEl.appendChild(iframeEl);
var pageHidden = false;
window.addEventListener('pagehide', function () {
pageHidden = true;
}, false);
window.addEventListener('blur', function () {
pageHidden = true;
}, false);
if (iframeEl !== null) {
iframeEl.src = protoUrl;
}
!false && setTimeout(function() {
if (!pageHidden) {
window.location = protoUrl;
}
}, 2000);
}
else if (protoUrl) {
setTimeout(function() {
window.location = protoUrl;
}, 100);
}
</script>
<script>(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-45099287-3', 'auto', {'sampleRate': 5});
ga('set', 'anonymizeIp', true);
ga('send', 'pageview');function toggleEmbed() {
var widget_actions = document.getElementById('widget_actions');
if (widget_actions.classList.contains('embed_opened')) {
widget_actions.classList.remove('embed_opened');
} else {
widget_actions.classList.add('embed_opened');
if (!document.body.classList.contains('fixed_actions')) {
window.scrollTo(0, document.body.offsetHeight);
}
selectEmbedCode();
}
checkActionsPosition();
return false;
}
function selectEmbedCode() {
var field = document.getElementById('embed_code_field');
field.focus();
field.setSelectionRange(0, field.value.length);
}
function copyEmbedCode() {
selectEmbedCode();
document.execCommand('copy');
return false;
}
function checkActionsPosition() {
var widget = document.getElementById('widget');
var widget_actions = document.getElementById('widget_actions');
var widget_rect = widget.getBoundingClientRect();
var actions_bottom = widget_rect.bottom + widget_actions.offsetHeight - 1;
var client_bottom = window.innerHeight || html.clientHeight;
if (actions_bottom > client_bottom) {
widget.style.marginBottom = widget_actions.offsetHeight + 'px';
document.body.classList.add('fixed_actions');
} else {
widget.style.marginBottom = '';
document.body.classList.remove('fixed_actions');
}
}
function postMessageHandler(event) {
try { var data = JSON.parse(event.data); }
catch(e) { var data = {}; }
if (data.event == 'resize') {
setTimeout(checkActionsPosition, 50);
}
}
window.addEventListener('resize', checkActionsPosition);
window.addEventListener('scroll', checkActionsPosition);
window.addEventListener('message', postMessageHandler);
</script>
</body>
</html>
<!-- page generated in 10.93ms -->
I usually use Selenium for web scraping and other automation.
Check this solution, maybe this could help:
from urllib.request import urlretrieve
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get('https://www.google.com/')
# get the image from google website
img = driver.find_element(By.XPATH, '//*[@id="hplogo"]/img')
src = img.get_attribute('src')
# download the image
urlretrieve(src, "google_logo.png")
driver.close()
To get the XPath, right-click the element on the page > Inspect > right-click the highlighted HTML element > Copy > Copy XPath.
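Applied to the Telegram post from the question, the same idea might look like the sketch below. It rests on an assumption that is not confirmed anywhere in this thread: that the rendered post exposes its photo as a CSS background-image in an inline style attribute. If that holds, the URL can be pulled out of driver.page_source with a regular expression (you may first need to switch into the widget's iframe). The helper name and the sample markup are made up for illustration.

```python
import re

# Matches url(...) inside a CSS background-image declaration, quoted or not.
BACKGROUND_URL = re.compile(
    r"background-image\s*:\s*url\(\s*['\"]?([^'\")]+)['\"]?\s*\)"
)


def background_image_urls(html):
    """Return every background-image URL found in the given HTML text."""
    return BACKGROUND_URL.findall(html)


# Stand-in for driver.page_source; the markup and URL are invented for the demo.
sample = (
    "<a class=\"photo\" "
    "style=\"width:400px;background-image:url('https://cdn.example.org/file/abc.jpg')\">"
    "</a>"
)
print(background_image_urls(sample))
```

With a live browser you would pass driver.page_source instead of sample and download the first result the same way as in the answer above.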
I shared a note through Evernote, and an HTML page was generated.
I want to get the title and content of this note, so I wrote the code below:
import re
resource = '<!DOCTYPE html>\n<!--[if lt IE 7 ]> <html class="ie6">; <![endif]--><!--[if IE 7 ]> <html class="ie7"> <![endif]--><!--[if IE 8 ]> <html class="ie8"> <![endif]--><!--[if IE 9 ]> <html class="ie9"> <![endif]--><!--[if gt IE 9]> <html> <![endif]--><!--[if !IE]><!--> <html> <!--<![endif]--><head><meta name="en:locale" content="en" />\n <meta charset="utf-8" />\n <meta http-equiv="X-UA-Compatible" content="IE=9,chrome=1" />\n <meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1,minimum-scale=1,user-scalable=0" />\n\n <meta property="og:title" content="python re"/>\n <meta property="og:type" content="article"/>\n <meta property="og:description" content="a question about python re\n "/>\n <meta property="og:url" content="https://www.evernote.com/shard/s61/sh/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d"/>\n <meta property="og:image"\n content="https://www.evernote.com/shard/s61/sh/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d/thm/note/396b4a1f-ae9c-40aa-b740-5aa19e301489"/>\n <meta property="og:site_name" content="Evernote"/>\n <meta property="og:created_time" content="1350193749000"/>\n <meta property="og:updated_time" content="1350193786000"/>\n <link rel="Shortcut Icon" href="/favicon.ico" type="image/x-icon" />\n\n <link rel="stylesheet" href="/redesign/global/css/fonts.css" />\n <link rel="stylesheet" href="/redesign/global/css/header.css" />\n\n <link rel="stylesheet" href="/redesign/sharing/css/sharedNote.css" />\n <title>python re</title>\n <link rel="stylesheet" href="/redesign/modules/SharingMenu/SharingMenu.css"><link rel="stylesheet" href="/redesign/modules/LinkUrlDialog/LinkUrlDialog.css"></head><body class="wrapper"><div class="logo-bar">\n \n <a class="save-button save-button-desktop" href="/saveNote/s61/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d">\n Save to Evernote</a>\n\n <div class="switch-account-div">\n <div 
class="switch-account-icon"></div>\n <span class="switch-account-name"></span>\n <div class="switch-account-arrow"></div>\n <div class="switch-account-dropdown">\n <div class="switch-dropdown-arrow"></div>\n <div class="switch-account-menuitem">\n Switch Account</div>\n <div class="switch-account-logout">\n Sign Out</div>\n </div>\n </div>\n\n </div>\n\n <div id="message-container">\n <div id="message">\n <div id="message-checkmark"></div>\n <span></span>\n </div>\n </div>\n\n <div id="container-boundingbox" class="wrapper">\n <div id="container" class="wrapper">\n <div class="sharing-imagegallery">\n <div class="SharingMenu"><div class="sharing-menu">\n <div class="share-button-container">\n <div class="label-container">\n <span class="label">\n Share</span>\n <div class="label-icon facebook-icon">\n </div>\n </div>\n <div class="icon-container"\n title="Share">\n <div class="icon">\n </div>\n </div>\n </div>\n <div class="menu-bar">\n <div class="menu-bar-div">\n <div class="menu-bar-icon facebook-icon"></div>\n <span class="menu-bar-label">\n Facebook</span>\n </div>\n <div class="menu-bar-div">\n <div class="menu-bar-icon twitter-icon"></div>\n <span class="menu-bar-label">\n Twitter</span>\n </div>\n <div class="menu-bar-div">\n <div class="menu-bar-icon linkedin-icon"></div>\n <span class="menu-bar-label">\n LinkedIn</span>\n </div>\n <div class="menu-bar-div">\n <div class="menu-bar-icon link-icon"></div>\n <span class="menu-bar-label">\n Link</span>\n </div>\n </div>\n </div>\n</div></div>\n <div class="shared-by-mobile">\n Shared by flowerszhong</div>\n <div class="shared-by shared-by-desktop">\n <div class="shared-by-left"></div>\n Shared by flowerszhong<div class="shared-by-right"></div>\n </div>\n <h2 class="note-title">python re</h2>\n <div class="vtop">\n <div class="note-updated">\n <span>\n Updated Today</span>\n </div>\n </div>\n <div class="divider"></div>\n <div class="note-content">\n <div style="word-wrap: break-word; -webkit-nbsp-mode: space; 
-webkit-line-break: after-white-space;" class="ennote">\na question about python re\n<div><br/></div></div></div>\n <a class="save-button save-button-mobile" href="/saveNote/s61/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d">\n Save to Evernote</a>\n <div class="clearfix" style="clear: both;"></div>\n</div>\n </div>\n\n\n <div class="footer">\n <div>\n Evernote makes it easy to remember things big and small from your everyday life using your computer, tablet, phone and the web.</div>\n <div class="footer-logo"></div>\n </div>\n\n <div class="LinkUrlDialog"><script id="linkUrlDialog" type="text/html">\n <div class="link-url-dialog">\n <div class="dialog-head">\n Link to Note</div>\n <div class="dialog-body">\n <p>Paste this link into an email or IM to share it.</p>\n <p>Anyone with the link will be able to view the note.</p>\n </div>\n <div class="url-container">\n <div class="url-title">\n Note URL:</div>\n <input type="text" class="url-input" value="{{url}}" readonly>\n <div class="copy-container">\n <button type="button" class="copy-button">\n Copy to Clipboard</button>\n </div>\n </div>\n </div>\n </script>\n</div><script src="/redesign/global/js/respond.min.js"></script>\n <script src="/redesign/global/js/require.min.js"></script>\n <script src="/redesign/global/js/config-require.js"></script>\n <script type="text/javascript">\n define("actionBean", [], function() {return {"shareNoteUri":"/shard/s61/sh/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d?shareNote&service=","foodNote":false,"skitchNote":false,"userName":"","switchAccountUri":"/saveNote/s61/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d?switch","logoutUri":"/saveNote/s61/396b4a1f-ae9c-40aa-b740-5aa19e301489/3de6deff539dec4772bdc4f1057a437d?logout","userStatus":"","images":false,"userLoggedIn":false};});\n </script>\n <!-- Google Analytics -->\n<script type="text/javascript">\nvar _gaq = _gaq || [];\n_gaq.push([\'_setAccount\', 
\'UA-285778-5\']);\n\n\n _gaq.push([\'_trackPageview\', \'/sh/{noteGuid}/{noteKey}/{suffix}\']);\n \n\n(function() {\n var ga = document.createElement(\'script\'); ga.type = \'text/javascript\'; ga.async = true;\n ga.src = (\'https:\' == document.location.protocol ? \'https://ssl\' : \'http://www\') + \'.google-analytics.com/ga.js\';\n var s = document.getElementsByTagName(\'script\')[0]; s.parentNode.insertBefore(ga, s);\n})();\n</script>\n<!-- End of Google Analytics -->\n<script type="text/javascript">\n var _gaq = _gaq || [];\n _gaq.push([\'_setCustomVar\',\n 4, // Slot 4 - required\n \'contentClass\', // Category - required\n \'\', // Value - required\n 3 // Page-level scope\n ]);\n\n _gaq.push([\'_setCustomVar\',\n 5, // Slot 5 - required\n \'sourceApplication\', // Category - required\n \'\', // Value - required\n 3 // Page-level scope\n ]);\n _gaq.push([\'_trackPageview\', \'/singleNote\']);\n </script>\n <script type="text/javascript" src="/redesign/modules/SharingMenu/SharingMenu.js"></script><script type="text/javascript" src="/redesign/modules/LinkUrlDialog/LinkUrlDialog.js"></script><script type="text/javascript" src="/redesign/sharing/SharedNoteViewAction/SharedNoteViewAction.js"></script></body></html>'
title_pattern = re.compile('(?<=<title>).+(?=</title>)')
content_pattern = re.compile('(?<=class=\"divider\"></div>).+(?=<a class=\"save-button)')
title= re.search(title_pattern,resource)
content = re.search(content_pattern,resource)
if title:
    print(title.group())
if content:
    print(content.group())
# if __name__=='__main__':main()
output:
python re
Why do I only get the title, and how do I get the content of this note?
Your problem is that the content contains newlines, and the dot (.) doesn't match newlines by default.
Therefore, you should use re.DOTALL:
content_pattern = re.compile('(?<=class=\"divider\"></div>).+(?=<a class=\"save-button)', re.DOTALL)
to make . match newlines. Then it works.
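A small self-contained demonstration of the difference, using a shortened, made-up stand-in for the Evernote page rather than the full string from the question:

```python
import re

# Shortened stand-in for the page: the note body spans several lines.
html = (
    '<div class="divider"></div>\n'
    '<div class="note-content">\nbody text\n</div>\n'
    '<a class="save-button" href="#">Save</a>'
)
pattern = r'(?<=class="divider"></div>).+(?=<a class="save-button)'

# Without re.DOTALL the very first character after the lookbehind is a
# newline, which "." refuses to match, so the search fails.
no_flag = re.search(pattern, html)
print(no_flag)

# With re.DOTALL "." also matches newlines and the whole body is captured.
with_flag = re.search(pattern, html, re.DOTALL)
print(with_flag.group().strip())
```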
I don't fully understand what you want to do but it seems like BeautifulSoup can help you out.
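To illustrate the parser-based route without extra dependencies, here is a sketch that uses the standard library's html.parser as a stand-in for BeautifulSoup (a swap made only so the example runs anywhere). The class name NoteParser and the shortened sample page are assumptions for illustration; the tag and class names (title, note-content) come from the page in the question.

```python
from html.parser import HTMLParser


class NoteParser(HTMLParser):
    """Collect the <title> text and the text inside <div class="note-content">."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.content_parts = []
        self._in_title = False
        self._content_depth = 0  # > 0 while inside the note-content div

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self._content_depth:
                self._content_depth += 1  # nested div inside the note
            elif "note-content" in classes:
                self._content_depth = 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "div" and self._content_depth:
            self._content_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._content_depth:
            self.content_parts.append(data)

    @property
    def content(self):
        return "".join(self.content_parts).strip()


# Shortened version of the shared-note page from the question.
sample = (
    "<html><head><title>python re</title></head><body>"
    '<h2 class="note-title">python re</h2>'
    '<div class="divider"></div>'
    '<div class="note-content"><div class="ennote">\n'
    "a question about python re\n<div><br/></div></div></div>"
    "</body></html>"
)
parser = NoteParser()
parser.feed(sample)
print(parser.title)
print(parser.content)
```

Unlike the regular expression, this does not depend on which elements happen to sit before and after the note, only on the note-content class.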
I am trying to get the frame from the HTML source below using the following code:
idx = self.selenium.get_element_index("GEPNav")
idx = self.selenium.get_element_index("TOCFrames")
frame = self.selenium.select_frame("TOCFrames")
The two calls to get_element_index are for testing and they work, but the call to select_frame() returns None. I'm not sure why.
<html>
<head>
<TITLE>NYSE Arca Bylaws and Rules</TITLE>
<link rel="stylesheet" href="/PCX/styles/GEP.css">
<script language="javascript" src="misc.js"></script>
<META Http-Equiv="Cache-Control" Content="no-cache">
<META Http-Equiv="Pragma" Content="no-cache">
<META Http-Equiv="Expires" Content="0">
</head>
<script language="javascript" src="RenderTOC.js"></script>
<script language="javascript">
var IntervalID = 0;
IntervalID = window.setInterval('setTimer()', 2000);
</script>
<frameset rows="188, *" border="0" >
<frame src="/PCXTools/ExchangeNav.asp?SelectedNode=chp_1_1&manual=/PCX/pcxe/pcxe-rules/" name="GEPNav" id="GEPNav" scrolling="no" FRAMEBORDER="0" noresize marginwidth="0" marginheight="0">
<frame src="/PCXTools/PlatformTOCFrame.asp?SelectedNode=chp_1_1&manual=/PCX/pcxe/pcxe-rules/#chp_1_1" Name="TOCFrames" id="TOCFrames" scrolling="no" FRAMEBORDER="0" noresize marginwidth="0" marginheight="0">
</frameset>
<noframes>
To be viewed properly, this page requires frames.
</noframes>