This code is working:
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup
from datetime import datetime
import time
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://www.apple.com/br/shop/product/MV7N2BE/A/airpods-com-estojo-de-recarga")
    html = page.content()
    soup = BeautifulSoup(html,'html.parser')
    valorAppleStore = soup.select("span.as-price-installments")[-2].get_text().replace(" à vista (10% de desconto)", '')
    print(valorAppleStore)
    browser.close()
But if I change headless=True, the code returns an error:
Traceback (most recent call last):
  File "c:/Users/ANDERSONCARVALHODELI/Documents/py/AirpodsPW.py", line 19, in <module>
    valorAppleStore = soup.select("span.as-price-installments")[-2].get_text().replace(" à vista (10% de desconto)", '')
IndexError: list index out of range
I fixed this using:
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup
from datetime import datetime
import time
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://www.apple.com/br/shop/product/MV7N2BE/A/airpods-com-estojo-de-recarga")
    time.sleep(1)
    browser.close()
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://www.apple.com/br/shop/product/MV7N2BE/A/airpods-com-estojo-de-recarga")
    html = page.content()
    soup = BeautifulSoup(html,'html.parser')
    valorAppleStore = soup.select("span.as-price-installments")[-2].get_text().replace(" à vista (10% de desconto)", '')
    print(valorAppleStore)
But I don't think this is the best approach. How can I fix this while sticking to headless=True, without first opening the browser with headless=False?
When I print(html) before soup=..., I see:
<!DOCTYPE html><html><head> <title>Page Not Found - Apple</title> <link rel="stylesheet" href="https://www.apple.com/wss/fonts?families=SF+Pro,v1|SF+Pro+Icons,v1"> <link rel="stylesheet" href="https://www.apple.com/v/errors/c/built/styles/main.built.css" type="text/css"> <link rel="stylesheet" href="https://www.apple.com/v/errors/c/built/styles/overview.built.css" type="text/css"> <link rel="stylesheet" href="https://store.storeimages.cdn-apple.com/4982/store.apple.com/shop/rs-external/rel/us/external.css"> <link rel="stylesheet" href="https://store.storeimages.cdn-apple.com/4982/store.apple.com/shop/rs-globalelements/dist/us/globalelements.css"> <style>.more::after{content: "";}a.pointer, a.more, a.block span.more, button.unbutton.more{padding-right: .7em; background-image: url(https://store.storeimages.cdn-apple.com/4982/store.apple.com/shop/rs-web/2/dist/assets/as-legacy/base/link/res/more.svg); background-repeat: no-repeat; background-position: 100% 50%; background-size: 5px 9px; zoom: 1;}.as-globalfooter-directory-column-section-list a{margin-bottom: .8em; display: block}.as-globalfooter-directory-column-section-list a:last-child{margin-bottom: 0;}.as-globalfooter-mini .as-globalfooter-mini-shop a{color: #06c;}.as-globalfooter .as-globalfooter-mini-legal-copyright, .as-footnotes .as-globalfooter-mini-legal-copyright, .as-globalfooter .as-globalfooter-mini-legal-link, .as-footnotes .as-globalfooter-mini-legal-link{top: -3px; position: relative; z-index: 1;}.as-globalfooter .as-globalfooter-directory+.as-globalfooter-mini, .as-footnotes .as-globalfooter-directory+.as-globalfooter-mini{padding-bottom: 26px;}.container{position: relative;}hr{display: inline-block; border: 0px; border-top: 0.1em solid #CCD2D9; width: 100%}</style></head><body class="page-overview"> <nav data-store-api="/shop/bag/status" id="ac-globalnav"> <div class="ac-gn-content"> <ul class="ac-gn-list"> <p class="ac-gn-link-text">Apple</p> <p class="ac-gn-link-text">Store</p> <p 
class="ac-gn-link-text">Mac</p> <p class="ac-gn-link-text">iPad</p> <p class="ac-gn-link-text">iPhone</p> <p class="ac-gn-link-text">Watch</p> <p class="ac-gn-link-text">AirPods</p> <p class="ac-gn-link-text">TV & Home</p>
<p class="ac-gn-link-text">Only on Apple</p> <p class="ac-gn-link-text">Accessories</p> <p class="ac-gn-link-text">Support</p> <li class="ac-gn-item ac-gn-item-menu ac-gn-search"> <a id="ac-gn-link-search" class="ac-gn-link ac-gn-link-search" href="/us/search" data-analytics-title="search" data-analytics-intrapage-link="" aria-label="Search apple.com" role="button" aria-haspopup="true"></a> </li> <p class="ac-gn-link-text">Shopping Bag</p> </ul> </div></nav> <div id="ac-gn-placeholder"> </div><main id="main" class="main" role="main" data-page-type="overview"> <h1 class="section-headline typography-headline">The page you’re looking for can’t be found.</h1> <aside id="search-wrapper" role="search" data-analytics-region="search" aria-hidden="false"> <form id="searchform-form" class="searchform" action="/us/search" method="get" data-suggestions-url="/search-services/suggestions/"><input id="searchform-input" type="text" class="form-textbox form-textbox-text form-icon-left" aria-labelledby="textbox_label" required="" aria-required="true" data-placeholder-long="Search for Products, Stores, and Help" autocorrect="off" autocapitalize="off" autocomplete="off"><span class="form-label" id="textbox_label" aria-hidden="true">Search apple.com</span> <div id="searchform-submit" class="form-icons-wrapper form-icons-wrapper-left form-icons-focusable" type="submit" aria-label="Submit"><button class="form-icons form-icons-search15"></button></div><div id="searchform-reset" class="button-reset form-icons-wrapper form-icons-focusable" type="reset" disabled="" aria-label="Clear Search"><button class="form-icons form-icons-small form-icons-clearsolid15 form-icon-reset"></button></div></form> </aside> <div class="cta-sitemap"> <div class="cta-sitemap"> Or see our site map </div></div></main> <footer class="as-globalfooter as-globalfooter-contained"> <div class="as-globalfooter-content"> <div class="as-globalfooter-breadcrumbs"> <p class="as-globalfooter-breadcrumbs-home-icon"></p><p 
class="as-globalfooter-breadcrumbs-home-label">Apple</p> <div class="as-globalfooter-breadcrumbs-path"> <ol class="as-globalfooter-breadcrumbs-list"> <li class="as-globalfooter-breadcrumbs-item breadcrumbs-title"> Page Not Found</li></ol> </div></div><nav class="as-globalfooter-directory with-5-columns"> <div class="as-globalfooter-directory-column"> <div class="as-globalfooter-directory-column-section"> <h3 class="as-globalfooter-directory-column-section-title">Shop and Learn</h3> <ul class="as-globalfooter-directory-column-section-list"> Store Mac iPad iPhone Watch AirPods TV & Home iPod touch AirTag Accessories Gift Cards </ul> </div></div><div class="as-globalfooter-directory-column"> <div class="as-globalfooter-directory-column-section"> <h3 class="as-globalfooter-directory-column-section-title">Services</h3> <ul class="as-globalfooter-directory-column-section-list"> Apple Music Apple TV+ Apple Fitness+ Apple News+ Apple Arcade iCloud Apple One Apple Card Apple Books Apple Podcasts App Store </ul> </div><div class="as-globalfooter-directory-column-section"> <h3 class="as-globalfooter-directory-column-section-title">Account</h3> <ul class="as-globalfooter-directory-column-section-list"> Manage Your Apple ID Apple Store Account iCloud.com </ul> </div></div><div class="as-globalfooter-directory-column"> <div class="as-globalfooter-directory-column-section"> <h3 class="as-globalfooter-directory-column-section-title">Apple Store</h3> <ul class="as-globalfooter-directory-column-section-list"> Find a Store Genius Bar Today at Apple Apple Camp Apple Store App Refurbished and Clearance Financing Apple Trade In Order Status Shopping Help </ul> </div></div><div class="as-globalfooter-directory-column"> <div class="as-globalfooter-directory-column-section"> <h3 class="as-globalfooter-directory-column-section-title">For Business</h3> <ul class="as-globalfooter-directory-column-section-list"> Apple and Business Shop for Business </ul> </div><div 
class="as-globalfooter-directory-column-section"> <h3 class="as-globalfooter-directory-column-section-title">For Education</h3> <ul class="as-globalfooter-directory-column-section-list"> Apple and Education Shop for K-12 Shop for College </ul> </div><div class="as-globalfooter-directory-column-section"> <h3 class="as-globalfooter-directory-column-section-title">For Healthcare</h3> <ul class="as-globalfooter-directory-column-section-list"> Apple in Healthcare Health on Apple Watch Health Records on iPhone </ul> </div><div class="as-globalfooter-directory-column-section"> <h3 class="as-globalfooter-directory-column-section-title">For Government</h3> <ul class="as-globalfooter-directory-column-section-list"> Shop for Government Shop for Veterans and Military </ul> </div></div><div class="as-globalfooter-directory-column"> <div class="as-globalfooter-directory-column-section"> <h3 class="as-globalfooter-directory-column-section-title">Apple Values</h3> <ul class="as-globalfooter-directory-column-section-list"> Accessibility Education Environment Inclusion and Diversity Privacy <a href="/racial-equity-justice-initiative/">Racial Equity
and Justice</a> Supplier Responsibility </ul> </div><div class="as-globalfooter-directory-column-section"> <h3 class="as-globalfooter-directory-column-section-title">About Apple</h3> <ul class="as-globalfooter-directory-column-section-list"> Newsroom Apple Leadership Career Opportunities Investors Ethics & Compliance Events Contact Apple </ul> </div></div></nav> <div class="as-globalfooter-mini"> <div class="as-globalfooter-mini-shop">More ways to shop:
Find an Apple Store or other retailer near you. <span>Or call 1-800-MY-APPLE.</span> </div><div class="as-globalfooter-mini-locale"> <a class="as-globalfooter-mini-locale-link" href="/choose-country-region/" title="Choose your country or region" aria-label="United States. Choose your country or region" data-analytics-title="choose your country">United States</a> </div><p class="as-globalfooter-mini-legal-copyright">Copyright © 2022 Apple Inc. All rights reserved. </p><a class="as-globalfooter-mini-legal-link" href="/legal/privacy/">Privacy Policy </a> <a class="as-globalfooter-mini-legal-link" href="/legal/internet-services/terms/site.html">Terms of Use </a> <a class="as-globalfooter-mini-legal-link" href="/us/shop/goto/help/sales_refunds">Sales
and Refunds </a> <a class="as-globalfooter-mini-legal-link" href="/legal/">Legal </a> <a class="as-globalfooter-mini-legal-link" href="/sitemap/">Site Map </a> </div></div></footer> <script src="https://www.apple.com/v/errors/c/built/scripts/main.built.js" type="text/javascript" charset="utf-8"></script></body></html>
First of all, Playwright already has a full suite of selectors that work on the live page. I suggest skipping BeautifulSoup: that eliminates a dependency, speeds up your scrape, takes less code, and avoids weird errors when the static HTML snapshot gets out of sync with the live page.
On to the main problem: you did well to print the HTML and see what sort of response you're dealing with. The 404 page indicates you've been detected as a bot when running headlessly. The same detection can also show up as a captcha, a Cloudflare browser check page, or some other "are you a robot?" notice.
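If you want the script to notice this kind of response by itself instead of failing later with an IndexError, you can check the page title before looking for the price. This is only a minimal sketch using the standard library; TitleParser, BLOCK_MARKERS and looks_blocked are hypothetical names, and the marker list is an assumption based on the 404 page shown in the question.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical marker list: extend it with whatever block pages you run into.
BLOCK_MARKERS = ("page not found", "captcha", "are you a robot")


def looks_blocked(html: str) -> bool:
    """Return True if the page title suggests an error or bot-check page."""
    parser = TitleParser()
    parser.feed(html)
    title = parser.title.strip().lower()
    return any(marker in title for marker in BLOCK_MARKERS)
```

You would call it as looks_blocked(page.content()) right after page.goto() and stop early (or retry with different settings) when it returns True.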
As with everything in scraping, there's no one-size-fits-all solution, but one typical approach is to set a custom user agent string:
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    ua = (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/69.0.3497.100 Safari/537.36"
    )
    url = (
        "https://www.apple.com/br/shop/product/MV7N2BE/A/airpods-com-estojo-de-recarga"
    )
    page = browser.new_page(user_agent=ua)
    page.goto(url, wait_until="domcontentloaded")
    sel = "span.as-price-installments:last-child"
    text = (
        page.wait_for_selector(sel)
        .text_content()
        .replace("à vista (10% de desconto)", "")
        .strip()
    )
    print(text) # => R$ 1.399,50
    browser.close()
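If you need the price as a number rather than as text, something along these lines might help. It is a sketch that assumes the text always looks like the "R$ 1.399,50" shown above (dot for thousands, comma for decimals); parse_brl and SUFFIX are hypothetical names, not part of Playwright.

```python
from decimal import Decimal

# Suffix taken from the question; assumed to be the only extra text present.
SUFFIX = "à vista (10% de desconto)"


def parse_brl(text: str) -> Decimal:
    """Convert a Brazilian-formatted price string to a Decimal."""
    cleaned = text.replace(SUFFIX, "").replace("R$", "").strip()
    # Brazilian format: "." groups thousands, "," marks the decimals.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return Decimal(cleaned)


print(parse_brl("R$ 1.399,50 à vista (10% de desconto)"))
```

Decimal is used instead of float so the cents are kept exactly.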
Related
I'm having trouble fetching content from IRIs that contain special characters. I'm working strictly with the requests module.
These are some of the URLs that are causing trouble:
https://cwur.org/2018-19/King's-College-London.php
https://cwur.org/2018-19/University-of-Wisconsin–Madison.php
import requests
res = requests.get('https://cwur.org/2018-19/University-of-São-Paulo.php')
res.text
To get a 200 response, pass a User-Agent in the headers.
import requests
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'}
res = requests.get('https://cwur.org/2018-19/University-of-São-Paulo.php', headers=headers)
print(res.status_code)
print("---" * 10)
print(res.text)
Output:
200
------------------------------
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<meta name="description" content="The Center for World University Rankings (CWUR) is a leading consulting organization and publisher of the largest academic ranking of global universities.">
<meta name="keywords" content="ranking, rankings, university, universities, college, colleges, 2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, world, top, best, global, Ranking universitario mundial, Classement mondial des universités , Weltweites Universitätsranking, Zentrum für weltweite Universitätsrankings , ××ר×× ×××× ××רס××××ת ××¢××××, ××ר×× ×××ר×× ×××× ××רס××××ת ××¢××××, ì¸ê³ ëíìì, ãä¸çã®å¤§å¦ããã, ä¸ç大å¸æåä¸å¿, ì¸ê³ëíëí¹ì¼í°,ä¸ç大å¦ã©ã³ãã³ã°ã»ã³ã¿ã¼, Ranking mundial universitário, РейÑинг ÑнивеÑÑиÑеÑов миÑа , ÑазÑабоÑки ÑейÑинга ÑнивеÑÑиÑеÑов миÑа, ÙرÙز ,تصÙÙ٠اÙجاÙعات اÙعاÙÙÙØ© ,تصÙÙÙ, اÙجاÙعات, جاÙعات, اÙعاÙÙ, تصÙÙ٠اÙجاÙعات, ÙرÙز تصÙÙ٠اÙجاÙعات اÙعاÙÙÙØ©, Ranking de universidades del mundo, subject, subjects, journal, journals, ranking by subjects, country ranking, country rankings">
<link rel="icon" type="image/png" href="../../favicon.png" />
<!-- Bootstrap core CSS -->
<link href="../../dist/css/bootstrap.min.css" rel="stylesheet">
<!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
<link href="../../assets/css/ie10-viewport-bug-workaround.css" rel="stylesheet">
<!-- Custom styles for this template -->
<link href="../../starter-template.css" rel="stylesheet">
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
<style type="text/css">
/* CSS used here will be applied after bootstrap.css */
.navbar-custom {
color: #FFFFFF;
background-color: #222222;
border-color: #222222;
}
</style>
<title> University of São Paulo Ranking | CWUR World University Rankings 2018-2019</title>
</head>
<body>
<nav class="navbar navbar-inverse navbar-fixed-top">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<img src="../images/logo_944_400.png" height="50">
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav">
<li>About</li>
<li class="dropdown">
World University Rankings <span class="caret"></span>
<ul class="dropdown-menu">
<li class="dropdown-header">World University Rankings</li>
<li>2020-21</li>
<li>2019-20</li>
<li>2018-19</li>
<li>2017</li>
<li>2016</li>
<li>2015</li>
<li>2014</li>
<li>2013</li>
<li>2012</li>
<li role="separator" class="divider"></li>
<li class="dropdown-header">University Rankings by Country</li>
<li>2018-19</li>
<li>2017</li>
<li>2016</li>
<li>2015</li>
<li>2014</li>
<li role="separator" class="divider"></li>
<li>Rankings by Subject</li>
</ul>
</li>
<li class="dropdown">
Methodology <span class="caret"></span>
<ul class="dropdown-menu">
<li>World University Rankings</li>
<li>Subject Rankings</li>
</ul>
</li>
<li>Media</li>
</ul>
</div>
</div>
</nav>
<div class="container">
<div class="page-header">
<h4> University of São Paulo Ranking - CWUR World University Rankings 2018-2019</h4>
<!-- Go to www.addthis.com/dashboard to customize your tools -->
<div class="addthis_toolbox addthis_default_style addthis_32x32_style"> <a class="addthis_button_preferred_1"></a> <a class="addthis_button_preferred_2"></a> <a class="addthis_button_preferred_3"></a> <a class="addthis_button_preferred_4"></a><a class="addthis_button_compact"></a></div> </div>
<div class="row">
<div class="col-md-8">
<table class="table table-bordered table-hover">
<tr><td><b>Institution Name</b></td><td>University of São Paulo </td></tr>
<tr><td><b>Native Name</b></td><td>Universidade de São Paulo </td></tr>
<tr><td><b>Location</b></td><td>Brazil</td></tr>
<tr><td><b>World Rank</b></td><td>77</td></tr>
<tr><td><b>National Rank</b></td><td>1</td></tr>
<tr><td><b>Quality of Education Rank</b></td><td>583</td></tr>
<tr><td><b>Alumni Employment Rank</b></td><td>256</td></tr>
<tr><td><b>Quality of Faculty Rank</b></td><td>109</td></tr>
<tr><td><b>Research Output Rank</b></td><td>4</td></tr>
<tr><td><b>Quality Publications Rank</b></td><td>60</td></tr>
<tr><td><b>Influence Rank</b></td><td>162</td></tr>
<tr><td><b>Citations Rank</b></td><td>139</td></tr>
<tr><td><b>Overall Score</b></td><td>82.6</td></tr>
<tr><td><b>Domain</b></td><td>usp.br</td></tr>
</table>
</div>
<div class="col-md-4">
<div class="table-responsive">
<table class="table table-bordered table-hover">
<tr><td>Top 2000 Universities (2020-21)</td></tr>
<tr><td>Top 2000 Universities (2019-20)</td></tr>
<tr><td>Top 1000 Universities (2018-19)</td></tr>
<tr><td>Ranking by Country (2018-2019)</td></tr>
<tr><td>Top 1000 Universities (2017)</td></tr>
<tr><td>Ranking by Country (2017)</td></tr>
<tr><td>Rankings by Subject</td></tr>
<tr><td>Top 1000 Universities (2016)</td></tr>
<tr><td>Ranking by Country (2016)</td></tr>
<tr><td>Top 1000 Universities (2015)</td></tr>
<tr><td>Ranking by Country (2015)</td></tr>
<tr><td>Top 1000 Universities (2014)</td></tr>
<tr><td>Ranking by Country (2014)</td></tr>
</table>
</div>
</div>
</div>
<p>Copyright © 2012-2020 Center for World University Rankings</p>
</div>
<!-- Bootstrap core JavaScript
================================================== -->
<!-- Placed at the end of the document so the pages load faster -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
<script>window.jQuery || document.write('<script src="../../assets/js/vendor/jquery.min.js"><\/script>')</script>
<script src="../../dist/js/bootstrap.min.js"></script>
<!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
<script src="../../assets/js/ie10-viewport-bug-workaround.js"></script>
<!-- Go to www.addthis.com/dashboard to customize your tools -->
<script type="text/javascript" src="//s7.addthis.com/js/300/addthis_widget.js#pubid=ra-5316b43f5ee1fc57"></script>
</body>
</html>
Update:
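If a Unicode URL reaches you as mojibake (UTF-8 bytes that were decoded as ISO-8859-1), you can repair it by encoding it back to ISO-8859-1 and decoding it as UTF-8: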
import requests
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'}
url = "https://cwur.org/2018-19/University-of-S\xc3\xa3o-Paulo.php"
new_url = url.encode("iso-8859-1").decode()
res = requests.get(new_url, headers=headers)
print(res.status_code)
print("---" * 10)
print(res.text)
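To see what such a URL looks like once the special characters are percent-encoded, here is a small standard-library sketch. repair_mojibake and iri_to_uri are hypothetical helper names, and the sketch assumes the path has not already been percent-encoded (otherwise the % signs would be encoded a second time).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def repair_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were misread as ISO-8859-1."""
    return text.encode("iso-8859-1").decode("utf-8")


def iri_to_uri(iri: str) -> str:
    """Percent-encode the path of an IRI; "/" is left untouched."""
    parts = urlsplit(iri)
    path = quote(parts.path)  # non-ASCII characters become UTF-8 %XX escapes
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(repair_mojibake("University-of-S\xc3\xa3o-Paulo"))
print(iri_to_uri("https://cwur.org/2018-19/University-of-São-Paulo.php"))
```

Printing the encoded form is mainly useful for debugging, since it makes it obvious which bytes are actually being requested.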
I recommend storing the data you receive from the .get() method in a dictionary and then using the pprint module to display it neatly:
import requests
from pprint import pprint
url = 'https://cwur.org/2018-19/University-of-Wisconsin–Madison.php'
res = requests.get(url)
# printing the status code is also helpful to see if the request was successful
print("Status code:", res.status_code)
# note: res.json() raises an error unless the response body is JSON
r_dict = res.json()
pprint(r_dict)
If you get a status code of 200, the request was successful. See the HTTP status code documentation for the meaning of the other response codes.
Hope this helps you to find the problem with your link.
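One caveat: the pages in the question are HTML, so calling .json() on them will fail. A possible guard, sketched here with only the standard library, is to look at the Content-Type before parsing. parse_body is a hypothetical helper, and the "json" substring check is a simplifying assumption.

```python
import json


def parse_body(content_type: str, text: str):
    """Return parsed JSON for JSON responses, otherwise the raw text."""
    if "json" in content_type.lower():
        return json.loads(text)
    return text


print(parse_body("application/json; charset=utf-8", '{"rank": 77}'))
print(parse_body("text/html; charset=UTF-8", "<html></html>"))
```

With requests you would call it as parse_body(res.headers.get("Content-Type", ""), res.text).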
I am a beginner with Python Selenium.
I appreciate any help.
My Selenium code:
from selenium import webdriver
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) ' \
             'Chrome/80.0.3987.132 Safari/537.36'
chrome_option = webdriver.ChromeOptions()
chrome_option.add_argument('--no-sandbox')
chrome_option.add_argument('--disable-dev-shm-usage')
chrome_option.add_argument('--ignore-certificate-errors')
chrome_option.add_argument("--disable-blink-features=AutomationControlled")
chrome_option.add_argument(f'user-agent={user_agent}')
chrome_option.headless = True
driver = webdriver.Chrome(options = chrome_option)
driver.get("https://apps.apple.com/us/app/tiktok-make-your-day/id835599320#see-all/reviews")
I got the following error message:
[0708/203913.943:INFO:CONSOLE(0)] "Access to font at 'https://www.apple.com/ac/globalfooter/3/en_US/assets/ac-footer/legacy/appleicons_text.woff' from origin 'https://apps.apple.com' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.", source: https://apps.apple.com/us/app/tiktok-make-your-day/id835599320#see-all/reviews (0)
[0708/203914.012:INFO:CONSOLE(8199)] "Metrics config: No config provided via delegate or fetched via init(), using default/cached config values.", source: https://apps.apple.com/assets/vendor-5715c00de8dadd4a8dd6d176ecd12d82.js (8199)
[0708/203914.022:INFO:CONSOLE(8199)] "Metrics config: No config provided via delegate or fetched via init(), using default/cached config values.", source: https://apps.apple.com/assets/vendor-5715c00de8dadd4a8dd6d176ecd12d82.js (8199)
[0708/203914.027:INFO:CONSOLE(6946)] "ember-i18n has been deprecated in favor of ember-intl", source: https://apps.apple.com/assets/vendor-5715c00de8dadd4a8dd6d176ecd12d82.js (6946)
[0708/203915.024:INFO:CONSOLE(0)] "Access to font at 'https://www.apple.com/ac/globalfooter/3/en_US/assets/ac-footer/legacy/appleicons_text.ttf' from origin 'https://apps.apple.com' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.", source: https://apps.apple.com/us/app/tiktok-make-your-day/id835599320#see-all/reviews (0)
I look forward to hearing from you.
Regards!
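A note on the log above: every line is tagged INFO:CONSOLE, so these look like messages from the page's JavaScript console rather than failures in the Selenium script. The sketch below pulls the severity out of such a line. The regular expression is an assumption inferred only from the lines pasted above, and LOG_LINE and log_level are hypothetical names.

```python
import re

# Pattern inferred from the pasted lines, e.g.
# [0708/203914.027:INFO:CONSOLE(6946)] "message", source: ... (6946)
LOG_LINE = re.compile(
    r"^\[(?P<stamp>\d{4}/\d{6}\.\d{3}):(?P<level>[A-Z]+):"
    r"(?P<source>[A-Z]+)\((?P<line>\d+)\)\]\s+(?P<message>.*)$"
)


def log_level(line: str):
    """Return the severity tag of a log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.group("level") if match else None


sample = (
    '[0708/203914.027:INFO:CONSOLE(6946)] "ember-i18n has been deprecated '
    'in favor of ember-intl", source: https://apps.apple.com/assets/'
    'vendor-5715c00de8dadd4a8dd6d176ecd12d82.js (6946)'
)
print(log_level(sample))
```

Filtering the output for anything other than INFO is a quick way to check whether a real error is hiding among these messages.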
It's not clear from your question why you need the following argument:
--disable-blink-features=AutomationControlled
However, with a simple tweak I was able to access the URL successfully, as follows:
Code Block:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://apps.apple.com/us/app/tiktok-make-your-day/id835599320#see-all/reviews')
print(driver.page_source)
Console Output:
<html lang="en-us" prefix="og: http://ogp.me/ns#" xml:lang="en-us"><head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1, viewport-fit=cover">
.
.
<meta name="description" content="Read reviews, compare customer ratings, see screenshots, and learn more about TikTok - Make Your Day. Download TikTok - Make Your Day and enjoy it on your iPhone, iPad, and iPod touch." id="ember15042807" class="ember-view">
<meta name="keywords" content="TikTok - Make Your Day, TikTok Inc., Entertainment, Photo & Video, ios apps, app, appstore, app store, iphone, ipad, ipod touch, itouch, itunes" id="ember15042809" class="ember-view">
<meta property="og:title" content="TikTok - Make Your Day" id="ember15042811" class="ember-view">
<meta property="og:description" content="TikTok is THE destination for mobile videos. On TikTok, short-form videos are exciting, spontaneous, and genuine. Whether you’re a sports fanatic, a pet enthusiast, or just looking for a laugh, there’s something for everyone on TikTok. All you have to do is watch, engage with what you like, skip wha…" id="ember15042813" class="ember-view">
<meta property="og:site_name" content="App Store" id="ember15042815" class="ember-view">
<meta property="og:url" content="https://apps.apple.com/us/app/tiktok-make-your-day/id835599320" id="ember15042817" class="ember-view">
<meta property="og:image" content="https://is2-ssl.mzstatic.com/image/thumb/Purple124/v4/b7/54/52/b7545217-dbd1-c219-ce00-00189f3b739e/AppIcon_TikTok-0-0-1x_U007emarketing-0-0-0-7-0-0-sRGB-0-0-0-GLES2_U002c0-512MB-85-220-0-0.png/1200x630wa.png" id="ember15042819" class="ember-view">
<meta property="og:image:alt" content="TikTok - Make Your Day on the App Store" id="ember15042821" class="ember-view">
<meta property="og:image:secure_url" content="https://is2-ssl.mzstatic.com/image/thumb/Purple124/v4/b7/54/52/b7545217-dbd1-c219-ce00-00189f3b739e/AppIcon_TikTok-0-0-1x_U007emarketing-0-0-0-7-0-0-sRGB-0-0-0-GLES2_U002c0-512MB-85-220-0-0.png/1200x630wa.png" id="ember15042823" class="ember-view">
<meta property="og:image:type" content="image/png" id="ember15042825" class="ember-view">
<meta property="og:image:width" content="1200" id="ember15042827" class="ember-view">
<meta property="og:image:height" content="630" id="ember15042829" class="ember-view">
<meta property="og:type" content="website" id="ember15042831" class="ember-view">
<meta property="og:locale" content="en_US" id="ember15042833" class="ember-view">
<meta property="fb:app_id" content="116556461780510" id="ember15042835" class="ember-view">
<meta name="twitter:title" content="TikTok - Make Your Day" id="ember15042837" class="ember-view">
<meta name="twitter:description" content="TikTok is THE destination for mobile videos. On TikTok, short-form videos are exciting, spontaneous, and genuine. Whether you’re a sports fanatic, a pet enthusiast, or just looking for a laugh, there’s something for everyone on TikTok. All you have to do is watch, engage with what you like, skip wha…" id="ember15042839" class="ember-view">
<meta name="twitter:site" content="#AppStore" id="ember15042841" class="ember-view">
<meta name="twitter:domain" content="AppStore" id="ember15042843" class="ember-view">
<meta name="twitter:image" content="https://is2-ssl.mzstatic.com/image/thumb/Purple124/v4/b7/54/52/b7545217-dbd1-c219-ce00-00189f3b739e/AppIcon_TikTok-0-0-1x_U007emarketing-0-0-0-7-0-0-sRGB-0-0-0-GLES2_U002c0-512MB-85-220-0-0.png/600x600wa.png" id="ember15042845" class="ember-view">
<meta name="twitter:image:alt" content="TikTok - Make Your Day on the App Store" id="ember15042847" class="ember-view">
<meta name="twitter:card" content="summary_large_image" id="ember15042849" class="ember-view">
<meta name="apple-itunes-app" content="app-id=375380948, app-argument=https://apps.apple.com/us/app/tiktok-make-your-day/id835599320" id="ember15042851" class="ember-view">
<script name="schema:software-application" id="ember15042853" class="ember-view" type="application/ld+json">{"#context":"http://schema.org","#type":"SoftwareApplication","name":"TikTok - Make Your Day","description":"TikTok is THE destination for mobile videos. On TikTok, short-form videos are exciting, spontaneous, and genuine. Whether you’re a sports fanatic, a pet enthusiast, or just looking for a laugh, there’s something for everyone on TikTok. All you have to do is watch, engage with what you like, skip what you don’t, and you’ll find an endless stream of short videos that feel personalized just for you. From your morning coffee to your afternoon errands, TikTok has the videos that are guaranteed to make your day.\n\nWe make it easy for you to discover and create your own original videos by providing easy-to-use tools to view and capture your daily moments. Take your videos to the next level with special effects, filters, music, and more. \n\n■ Watch endless amount of videos customized specifically for you\nA personalized video feed based on what you watch, like, and share. TikTok offers you real, interesting, and fun videos that will make your day.\n \n■ Explore videos, just one scroll away\nWatch all types of videos, from Comedy, Gaming, DIY, Food, Sports, Memes, and Pets, to Oddly Satisfying, ASMR, and everything in between.\n \n■ Pause recording multiple times in one video\nPause and resume your video with just a tap. Shoot as many times as you need.\n \n■ Be entertained and inspired by a global community of creators\nMillions of creators are on TikTok showcasing their incredible skills and everyday life. Let yourself be inspired.\n\n■ Add your favorite music or sound to your videos for free\nEasily edit your videos with millions of free music clips and sounds. 
We curate music and sound playlists for you with the hottest tracks in every genre, including Hip Hop, Edm, Pop, Rock, Rap, and Country, and the most viral original sounds.\n\n■ Express yourself with creative effects\nUnlock tons of filters, effects, and AR objects to take your videos to the next level.\n\n■ Edit your own videos \nOur integrated editing tools allow you to easily trim, cut, merge and duplicate video clips without leaving the app.\n\n* Any feedback? Contact us at feedback#tiktok.com or tweet us #tiktok_us","screenshot":["https://is3-ssl.mzstatic.com/image/thumb/Purple113/v4/ab/10/3d/ab103d22-efca-509b-312a-f2ff2feb819d/pr_source.png/300x0w.jpg","https://is3-ssl.mzstatic.com/image/thumb/Purple113/v4/d3/30/6b/d3306b31-f0e1-fed0-683c-44899a9c0b5d/pr_source.png/300x0w.jpg","https://is2-ssl.mzstatic.com/image/thumb/Purple123/v4/03/88/18/038818fa-44c6-9289-f9b7-fc1d67273fde/pr_source.png/300x0w.jpg","https://is5-ssl.mzstatic.com/image/thumb/Purple123/v4/67/ee/ad/67eead34-780b-5aae-4be7-574ccdd4010a/pr_source.png/300x0w.jpg","https://is5-ssl.mzstatic.com/image/thumb/Purple123/v4/71/6f/da/716fdafc-b9ef-856f-5ce1-5ecf5b05a815/pr_source.png/300x0w.jpg","https://is4-ssl.mzstatic.com/image/thumb/Purple113/v4/98/7a/12/987a1281-b961-75a0-8da7-1e123bbada56/pr_source.jpg/643x0w.jpg","https://is4-ssl.mzstatic.com/image/thumb/Purple113/v4/27/3f/cd/273fcda9-05c6-3d16-1e7d-465cf7d5aa25/pr_source.jpg/643x0w.jpg","https://is3-ssl.mzstatic.com/image/thumb/Purple113/v4/c3/e6/7b/c3e67b5b-9159-50e9-9b62-558aa32cd605/pr_source.jpg/643x0w.jpg","https://is3-ssl.mzstatic.com/image/thumb/Purple113/v4/6c/ff/97/6cff975c-dcb6-253c-a63b-f30c7d465efc/pr_source.jpg/643x0w.jpg","https://is2-ssl.mzstatic.com/image/thumb/Purple123/v4/12/3a/bf/123abfb8-fda2-85bb-6419-a58eefd2af7f/pr_source.jpg/643x0w.jpg"],"image":"https://is2-ssl.mzstatic.com/image/thumb/Purple124/v4/b7/54/52/b7545217-dbd1-c219-ce00-00189f3b739e/AppIcon_TikTok-0-0-1x_U007emarketing-0-0-0-7-0-0-sRGB-0-0-0-GLES2_U002c0-512MB-
85-220-0-0.png/1200x630wa.png","applicationCategory":"Entertainment","datePublished":"2014年4月1日","operatingSystem":"Requires iOS 9.3 or later. Compatible with iPhone, iPad, and iPod touch.","author":{"#type":"Person","name":"TikTok Inc.","url":"https://apps.apple.com/us/developer/tiktok-inc/id1039610913"},"aggregateRating":{"#type":"AggregateRating","ratingValue":4.7,"reviewCount":5397820},"offers":{"#type":"Offer","price":0,"priceCurrency":"USD","category":"free"}}
</script>
<meta name="apple:content_id" content="835599320" id="ember15042855" class="ember-view">
<meta name="ember-cli-head-end" content="">
<link rel="stylesheet" type="text/css" href="/global-elements/2014.4.0/en_US/ac-global-nav.d915b46b2869cd416cbafe206ca74838.css" data-global-elements-nav-styles="">
<link rel="stylesheet" type="text/css" href="/global-elements/2014.4.0/en_US/ac-global-footer.23e044b4f5b5dd393dc9767d96faf248.css" data-global-elements-footer-styles="">
<meta name="version" content="2026.2.0">
<link integrity="" rel="stylesheet" href="/assets/web-experience-app-f1c50a018dab4bdb8c454bab56ac60a1.css" data-rtl="/assets/web-experience-rtl-app-bd50312cdbe2a58d742a5dbf295ff43a.css">
<script charset="utf-8" src="https://apps.apple.com/assets/chunk.d1a0e2a45b401b141683.js"></script></head>
<body class="ember-application has-js no-touch">
<div id="ember-app">
<script type="x/boundary" id="fastboot-body-start"></script><aside id="ac-gn-segmentbar" class="ac-gn-segmentbar" lang="en-US" dir="ltr" data-strings="{ 'exit': 'Exit', 'view': '{%STOREFRONT%} Store Home', 'segments': { 'smb': 'Business Store Home', 'eduInd': 'Education Store Home', 'other': 'Store Home' } }">
</aside>
<input type="checkbox" id="ac-gn-menustate" class="ac-gn-menustate">
<nav id="ac-globalnav" class="js no-touch windows" role="navigation" aria-label="Global" data-hires="false" data-analytics-region="global nav" lang="en-US" dir="ltr" data-www-domain="www.apple.com" data-store-locale="us" data-store-root-path="/us" data-store-api="https://www.apple.com/[storefront]/shop/bag/status" data-search-locale="en_US" data-search-suggestions-api="https://www.apple.com/search-services/suggestions/" data-search-defaultlinks-api="https://www.apple.com/search-services/suggestions/defaultlinks/">
<div class="ac-gn-content">
<ul class="ac-gn-header">
<li class="ac-gn-item ac-gn-menuicon">
<label class="ac-gn-menuicon-label" for="ac-gn-menustate" aria-hidden="true">
<span class="ac-gn-menuicon-bread ac-gn-menuicon-bread-top">
<span class="ac-gn-menuicon-bread-crust ac-gn-menuicon-bread-crust-top"></span>
</span>
<span class="ac-gn-menuicon-bread ac-gn-menuicon-bread-bottom">
<span class="ac-gn-menuicon-bread-crust ac-gn-menuicon-bread-crust-bottom"></span>
</span>
</label>
<a href="#ac-gn-menustate" role="button" class="ac-gn-menuanchor ac-gn-menuanchor-open" id="ac-gn-menuanchor-open">
<span class="ac-gn-menuanchor-label">Global Nav Open Menu</span>
</a>
<a href="#" role="button" class="ac-gn-menuanchor ac-gn-menuanchor-close" id="ac-gn-menuanchor-close">
<span class="ac-gn-menuanchor-label">Global Nav Close Menu</span>
</a>
</li>
<li class="ac-gn-item ac-gn-apple">
<a class="ac-gn-link ac-gn-link-apple" href="https://www.apple.com/" data-analytics-title="apple home" id="ac-gn-firstfocus-small">
<span class="ac-gn-link-text">Apple</span>
</a>
</li>
<li class="ac-gn-item ac-gn-bag ac-gn-bag-small" id="ac-gn-bag-small">
<div class="ac-gn-bag-wrapper">
<a class="ac-gn-link ac-gn-link-bag" href="https://www.apple.com/us/shop/goto/bag" data-analytics-title="bag" data-analytics-click="bag" aria-label="Shopping Bag" data-string-badge="Shopping Bag with item count :">
<span class="ac-gn-link-text">Shopping Bag</span>
</a>
<span class="ac-gn-bag-badge">
<span class="ac-gn-bag-badge-separator"></span>
<span class="ac-gn-bag-badge-number"></span>
<span class="ac-gn-bag-badge-unit">+</span>
</span>
</div>
<span class="ac-gn-bagview-caret ac-gn-bagview-caret-large"></span>
</li>
</ul>
<div class="ac-gn-search-placeholder-container" role="search">
<div class="ac-gn-search ac-gn-search-small">
<a id="ac-gn-link-search-small" class="ac-gn-link" href="https://www.apple.com/us/search" data-analytics-title="search" data-analytics-click="search" data-analytics-intrapage-link="" aria-label="Search apple.com" role="button" aria-haspopup="true">
<div class="ac-gn-search-placeholder-bar">
<div class="ac-gn-search-placeholder-input">
<div class="ac-gn-search-placeholder-input-text" aria-hidden="true">
<div class="ac-gn-link-search ac-gn-search-placeholder-input-icon"></div>
<span class="ac-gn-search-placeholder">Search apple.com</span>
</div>
</div>
<div class="ac-gn-searchview-close ac-gn-searchview-close-small ac-gn-search-placeholder-searchview-close">
<span class="ac-gn-searchview-close-cancel" aria-hidden="true">Cancel</span>
</div>
</div>
</a>
</div>
</div>
<ul class="ac-gn-list">
<li class="ac-gn-item ac-gn-apple">
<a class="ac-gn-link ac-gn-link-apple" href="https://www.apple.com/" data-analytics-title="apple home" id="ac-gn-firstfocus">
<span class="ac-gn-link-text">Apple</span>
</a>
</li>
<li class="ac-gn-item ac-gn-item-menu ac-gn-mac">
<a class="ac-gn-link ac-gn-link-mac" href="https://www.apple.com/mac/" data-analytics-title="mac">
<span class="ac-gn-link-text">Mac</span>
</a>
</li>
<li class="ac-gn-item ac-gn-item-menu ac-gn-ipad">
<a class="ac-gn-link ac-gn-link-ipad" href="https://www.apple.com/ipad/" data-analytics-title="ipad">
<span class="ac-gn-link-text">iPad</span>
</a>
</li>
<li class="ac-gn-item ac-gn-item-menu ac-gn-iphone">
<a class="ac-gn-link ac-gn-link-iphone" href="https://www.apple.com/iphone/" data-analytics-title="iphone">
<span class="ac-gn-link-text">iPhone</span>
</a>
</li>
<li class="ac-gn-item ac-gn-item-menu ac-gn-watch">
<a class="ac-gn-link ac-gn-link-watch" href="https://www.apple.com/watch/" data-analytics-title="watch">
<span class="ac-gn-link-text">Watch</span>
</a>
</li>
<li class="ac-gn-item ac-gn-item-menu ac-gn-tv">
<a class="ac-gn-link ac-gn-link-tv" href="https://www.apple.com/tv/" data-analytics-title="tv">
<span class="ac-gn-link-text">TV</span>
</a>
</li>
<li class="ac-gn-item ac-gn-item-menu ac-gn-music">
<a class="ac-gn-link ac-gn-link-music" href="https://www.apple.com/music/" data-analytics-title="music">
<span class="ac-gn-link-text">Music</span>
</a>
</li>
<li class="ac-gn-item ac-gn-item-menu ac-gn-support">
<a class="ac-gn-link ac-gn-link-support" href="https://support.apple.com" data-analytics-title="support">
<span class="ac-gn-link-text">Support</span>
</a>
</li>
<li class="ac-gn-item ac-gn-item-menu ac-gn-search" role="search">
<a id="ac-gn-link-search" class="ac-gn-link ac-gn-link-search" href="https://www.apple.com/us/search" data-analytics-title="search" data-analytics-click="search" data-analytics-intrapage-link="" aria-label="Search apple.com" role="button" aria-haspopup="true"></a>
</li>
<li class="ac-gn-item ac-gn-bag" id="ac-gn-bag">
<div class="ac-gn-bag-wrapper">
<a class="ac-gn-link ac-gn-link-bag" href="https://www.apple.com/us/shop/goto/bag" data-analytics-title="bag" data-analytics-click="bag" aria-label="Shopping Bag" data-string-badge="Shopping Bag with item count : {%BAGITEMCOUNT%}">
<span class="ac-gn-link-text">Shopping Bag</span>
</a>
<span class="ac-gn-bag-badge" aria-hidden="true">
<span class="ac-gn-bag-badge-separator"></span>
<span class="ac-gn-bag-badge-number"></span>
<span class="ac-gn-bag-badge-unit">+</span>
</span>
</div>
<span class="ac-gn-bagview-caret ac-gn-bagview-caret-large"></span>
</li>
</ul>
.
.
.
</div><footer id="ac-globalfooter" class="js flexbox" role="contentinfo" lang="en-US" dir="ltr"><div class="ac-gf-content"><section class="ac-gf-footer">
<div class="ac-gf-footer-shop" x-ms-format-detection="none">
More ways to shop: Find an Apple Store or other retailer near you. <span class="nowrap">Or call 1-800-MY-APPLE.</span>
</div>
<div class="ac-gf-footer-locale">
<a class="ac-gf-footer-locale-link" href="https://www.apple.com/choose-country-region/" title="Choose your country or region" aria-label="Choose your country or region" data-analytics-title="choose your country"><span class="ac-gf-footer-locale-flag" data-hires="false"></span>Choose your country or region</a>
</div>
<div class="ac-gf-footer-legal">
<div class="ac-gf-footer-legal-copyright">Copyright © 2020 Apple Inc. All rights reserved.</div>
<div class="ac-gf-footer-legal-links">
<a class="ac-gf-footer-legal-link" href="https://www.apple.com/legal/privacy/" data-analytics-title="privacy policy">Privacy Policy</a>
<a class="ac-gf-footer-legal-link" href="https://www.apple.com/legal/internet-services/terms/site.html" data-analytics-title="terms of use">Terms of Use</a>
<a class="ac-gf-footer-legal-link" href="https://www.apple.com/us/shop/goto/help/sales_refunds" data-analytics-title="sales and refunds">Sales and Refunds</a>
<a class="ac-gf-footer-legal-link" href="https://www.apple.com/legal/" data-analytics-title="legal">Legal</a>
<a class="ac-gf-footer-legal-link" href="https://www.apple.com/sitemap/" data-analytics-title="site map">Site Map</a>
</div>
</div>
</section>
</div></footer>
.
.
.
</div>
<div id="modal-container"></div>
<script integrity="" src="/assets/vendor-5715c00de8dadd4a8dd6d176ecd12d82.js"></script><div id="ac-gn-viewport-emitter"> </div>
<script integrity="" src="/assets/web-experience-app-26d37fb2d982f3cfebb0a2498926aa6e.js"></script>
<script src="https://js-cdn.music.apple.com/-amp/v2/musickit.js"></script>
<div id="ember-basic-dropdown-wormhole"></div>
</body></html>
I am new to web development and scraping in general, and I am trying to challenge myself by scraping websites like LinkedIn.
Since they use Ember and dynamically changing ids, it is more of a struggle to scrape properly.
I am trying to scrape the "experience section" of a LinkedIn profile using the following code:
experience = driver.find_element_by_xpath('//section[@id = "experience-section"]/ul/li[@class="position"]')
The driver has loaded the entire LinkedIn profile page. I would like to get all the positions under the "experience-section". The error message is:
Unable to locate element: {"method":"xpath","selector":"//section[@id = "experience-section"]/ul/li/div[@class="position"]"}
I am able to scrape other things on LinkedIn, but the experience section is a big struggle for me. Is the XPath wrong? If so, what should I change?
Thank you
<section id="experience-section" class="pv-profile-section experience-section ember-view"><header class="pv-profile-section__card-header">
<h2 class="pv-profile-section__card-heading t-20 t-black t-normal">
Experience
</h2>
<!----></header>
<ul id="ember1620" class="pv-profile-section__section-info section-info pv-profile-section__section-info--has-no-more ember-view"><li id="ember1622" class="pv-profile-section__sortable-item pv-profile-section__section-info-item relative pv-profile-section__list-item sortable-item ember-view"><div id="ember1623" class="pv-entity__position-group-pager ember-view"> <li id="392598211" class="pv-profile-section__sortable-card-item pv-profile-section pv-position-entity ember-view"><!----><a data-control-name="background_details_company" href="/company/8736/" id="ember1626" class="ember-view"> <div class="pv-entity__logo company-logo">
<img class="lazy-image pv-entity__logo-img pv-entity__logo-img EntityPhoto-square-5 loaded" alt="Bill & Melinda Gates Foundation" src="https://media.licdn.com/dms/image/C560BAQHvFIyUvuKtQA/company-logo_400_400/0?e=1556755200&v=beta&t=Qhh8_KnrE-OiuXAutFyeI69tgUF3c1ptC9N12siDO4o">
</div>
<div class="pv-entity__summary-info pv-entity__summary-info--background-section ">
<h3 class="t-16 t-black t-bold">Co-chair</h3>
<h4 class="t-16 t-black t-normal">
<span class="visually-hidden">Company Name</span>
<span class="pv-entity__secondary-title">Bill & Melinda Gates Foundation</span>
</h4>
<div class="display-flex">
<h4 class="pv-entity__date-range t-14 t-black--light t-normal">
<span class="visually-hidden">Dates Employed</span>
<span>2000 – Present</span>
</h4>
<h4 class="t-14 t-black--light t-normal">
<span class="visually-hidden">Employment Duration</span>
<span class="pv-entity__bullet-item-v2">19 yrs</span>
</h4>
</div>
<!---->
</div>
</a>
<!---->
</li>
</div>
</li><li id="ember1630" class="pv-profile-section__sortable-item pv-profile-section__section-info-item relative pv-profile-section__list-item sortable-item ember-view"><div id="ember1631" class="pv-entity__position-group-pager ember-view"> <li id="392599749" class="pv-profile-section__sortable-card-item pv-profile-section pv-position-entity ember-view"><!----><a data-control-name="background_details_company" href="/company/1035/" id="ember1634" class="ember-view"> <div class="pv-entity__logo company-logo">
<img class="lazy-image pv-entity__logo-img pv-entity__logo-img EntityPhoto-square-5 loaded" alt="Microsoft" src="https://media.licdn.com/dms/image/C4D0BAQEko6uLz7XylA/company-logo_400_400/0?e=1556755200&v=beta&t=XQhwV5ruWfGBfjgQylV9gkeXD8VnQRBHGd1bOfTs2tw">
</div>
<div class="pv-entity__summary-info pv-entity__summary-info--background-section ">
<h3 class="t-16 t-black t-bold">Co-founder</h3>
<h4 class="t-16 t-black t-normal">
<span class="visually-hidden">Company Name</span>
<span class="pv-entity__secondary-title">Microsoft</span>
</h4>
<div class="display-flex">
<h4 class="pv-entity__date-range t-14 t-black--light t-normal">
<span class="visually-hidden">Dates Employed</span>
<span>1975 – Present</span>
</h4>
<h4 class="t-14 t-black--light t-normal">
<span class="visually-hidden">Employment Duration</span>
<span class="pv-entity__bullet-item-v2">44 yrs</span>
</h4>
</div>
<!---->
</div>
</a>
<!---->
</li>
</div>
</li>
</ul>
<!----></section>
---- Update:
I used the solution provided by Sers
driver.get('https://www.linkedin.com/in/williamhgates/')
experience = driver.find_elements_by_xpath('//section[@id = "experience-section"]/ul//li')
for item in experience:
    print(item.text)
    print("")
and I somehow get the results twice:
Co-chair
Company Name
Bill & Melinda Gates Foundation
Dates Employed
2000 – Present
Employment Duration
19 yrs
Co-chair
Company Name
Bill & Melinda Gates Foundation
Dates Employed
2000 – Present
Employment Duration
19 yrs
Co-founder
Company Name
Microsoft
Dates Employed
1975 – Present
Employment Duration
44 yrs
Co-founder
Company Name
Microsoft
Dates Employed
1975 – Present
Employment Duration
44 yrs
The problem with your XPath is that the li elements are not direct children of ul. Try the XPath below:
//section[@id = "experience-section"]/ul//li
Update
driver.get('https://www.linkedin.com/in/williamhgates/')
experience = driver.find_elements_by_css_selector('#experience-section .pv-profile-section')
for item in experience:
    print(item.text)
    print("")
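The duplicated output from //section[@id = "experience-section"]/ul//li can be reproduced offline. The sketch below is not Selenium: it uses Python's standard xml.etree.ElementTree on a hand-simplified, hypothetical fragment modelled on the posted HTML (an outer li wrapping an inner li), purely to show that the descendant step //li matches both levels, while matching on the inner item's class picks each position once.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the posted markup: each outer <li>
# wraps an inner <li class="pv-position-entity"> that holds the actual text.
SAMPLE = """
<root>
  <section id="experience-section">
    <ul>
      <li class="outer"><div><li class="pv-position-entity"><h3>Co-chair</h3></li></div></li>
      <li class="outer"><div><li class="pv-position-entity"><h3>Co-founder</h3></li></div></li>
    </ul>
  </section>
</root>
"""

root = ET.fromstring(SAMPLE)
section = ".//section[@id='experience-section']"


def texts(elements):
    """Return the full text content of each element, roughly like Selenium's .text."""
    return ["".join(el.itertext()).strip() for el in elements]


# ul//li matches every descendant <li>: the outer wrappers AND the inner items.
all_levels = root.findall(section + "/ul//li")
# Matching on the inner item's class selects each position exactly once.
inner_only = root.findall(section + "//li[@class='pv-position-entity']")

print(texts(all_levels))
print(texts(inner_only))
```

The CSS selector in the update above relies on the same idea: it targets the inner items by class rather than every li under the list.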
I tried using
driver.find_element_by_partial_link_text('2019')
but I get an error saying it was unable to find the element. I also tried find_element_by_link_text('') with the whole line, but it won't work.
Any ideas?
driver.find_element_by_partial_link_text('2019').click()
That is what I have been trying with nothing working.
Here is the webpage HTML:
<div class="rowOf" id="tableRow1">
<div class="tableD">
<div class="productDiv" id="productDiv92195">
<h2 class="productTitle" id="productTitle92195" onclick="goToProduct(0)">2019 Wall Calendar by Camoleaf</h2>
<img class="productImage" src="https://images-na.ssl-images-amazon.com/images/I/91j3pmPYDOL.jpg" onclick="goToProduct(0)">
<hr>
<h4 class="normalPrice" id="normalPrice0" onclick="goToProduct(0)">
Normally: <span class="currency">$ </span>16.95
</h4>
<h4 class="promoPrice" style="margin:2.5px auto;" id="promoPrice92195" onclick="goToProduct(0)">
Your Amazon Price: <span class="currency">$ </span>1.70
</h4>
<h3>Your Total: <span class="currency">$ </span>1.70</h3>
<p class="clickToViewP" id="cToVP92195" onclick="goToProduct(0)">Click to view and purchase!</p>
</div>
</div>
<div class="tableD">
<div class="productDiv" id="productDiv69354">
<h2 class="productTitle" id="productTitle69354" onclick="goToProduct(1)">Pure Lyft Energy Drink Mix (4 Pack) by PURELYFT</h2>
<img class="productImage" src="https://images-na.ssl-images-amazon.com/images/I/81kCgs96Z0L.jpg" onclick="goToProduct(1)">
<hr>
<h4 class="normalPrice" id="normalPrice1" onclick="goToProduct(1)">
Normally: <span class="currency">$ </span>9.99
</h4>
<h4 class="promoPrice" style="margin:2.5px auto;" id="promoPrice69354" onclick="goToProduct(1)">
Your Amazon Price: <span class="currency">$ </span>0.99
</h4>
<h3>Your Total: <span class="currency">$ </span>0.99</h3>
<p class="clickToViewP" id="cToVP69354" onclick="goToProduct(1)">Click to view and purchase!</p>
</div>
</div>
<div class="tableD">
<div class="productDiv" id="productDiv79478">
<h2 class="productTitle" id="productTitle79478" onclick="goToProduct(2)">Multi-Purpose Calf Compression Sleeves by DS Sports</h2>
<img class="productImage" src="https://images-na.ssl-images-amazon.com/images/I/91U7ExY-SfL.jpg" onclick="goToProduct(2)">
<hr>
<h4 class="normalPrice" id="normalPrice2" onclick="goToProduct(2)">
Normally: <span class="currency">$ </span>12.95
</h4>
<h4 class="promoPrice" style="margin:2.5px auto;" id="promoPrice79478" onclick="goToProduct(2)">
Your Amazon Price: <span class="currency">$ </span>5.05
</h4>
<h3>Your Total: <span class="currency">$ </span>5.05</h3>
<p class="clickToViewP" id="cToVP79478" onclick="goToProduct(2)">Click to view and purchase!</p>
</div>
</div>
</div>
In your sample HTML, the only instance of "2019" is in an <h2> tag, not an anchor (<a>) link. Since find_element_by_partial_link_text() only searches anchor tags, it won't find it.
You can search via XPath to find an arbitrary element via partial text. Something like this:
all_matches = driver.find_elements_by_xpath("//*[text()[contains(., '2019')]]")
all_matches[0].click()
That XPath says:
Search all elements (*)
Look at each item's text() in turn
If that text() contains() the string "2019", add it to the set of matches.
And of course we only click on the first element that matches.
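To see the difference between a link-text search and a text search over arbitrary elements without a browser, here is a small offline sketch. It does not use Selenium or a full XPath engine: it parses a trimmed, hypothetical copy of the first product block with xml.etree.ElementTree and filters in plain Python, which only approximates what contains() does over text() nodes. The helper name elements_containing is made up for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical copy of the first product block from the question.
SAMPLE = """
<div class="productDiv" id="productDiv92195">
  <h2 class="productTitle" id="productTitle92195">2019 Wall Calendar by Camoleaf</h2>
  <h3>Your Total: <span class="currency">$ </span>1.70</h3>
  <p class="clickToViewP" id="cToVP92195">Click to view and purchase!</p>
</div>
"""

root = ET.fromstring(SAMPLE)


def elements_containing(root, needle, tag=None):
    """Return elements (optionally of one tag) whose leading text contains needle.

    Simplification: only el.text (the text before the first child) is checked,
    whereas XPath's text() also covers text that follows child elements.
    """
    return [el for el in root.iter(tag) if el.text and needle in el.text]


# A link-text search only considers <a> elements, and the sample has none.
links = elements_containing(root, "2019", tag="a")
# Searching every element finds the <h2> title.
anywhere = elements_containing(root, "2019")

print([el.tag for el in links])
print([(el.tag, el.get("id")) for el in anywhere])
```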
Instead of using driver.findElement(By.partialLinkText("2019"));
you could use driver.findElement(By.linkText("2019"));
That again won't work on its own, because many calendars have 2019 in their link text, so you need to provide the full link text of a particular product.
For example, I did this:
import org.openqa.selenium.By;
import org.openqa.selenium.Keys;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class AmazonShoping {
    public static void main(String[] args) {
        System.setProperty("webdriver.chrome.driver", "C:\\Users\\priyj_kumar\\Downloads\\chromedriver.exe");
        WebDriver driver = new ChromeDriver();
        driver.get("https://www.amazon.in");
        driver.findElement(By.id("twotabsearchtextbox")).sendKeys("2019 Calander", Keys.ENTER);
        driver.findElement(By.linkText("mapyourmonth Planner Organizer Diary Wall Calendar 2019")).click();
        // checking for a particular boat headphone say Boat BassHeads 900 Wired Headphone with Mic
        // driver.findElement(By.linkText("Boat BassHeads 900 Wired Headphone with Mic")).click();
        // String str = driver.findElement(By.xpath("//*[@id='mp-tfa']/p")).getText();
        // System.out.println(str);
    }
}
This worked fine for me. Please let me know if I misunderstood the question somehow.
I've been building a web scraper in BS4 and have gotten stuck. I am using Trip Advisor as a test for other data I will be going after, but am not able to isolate the tag of the 'entire' reviews. Here is an example:
https://www.tripadvisor.com/Restaurant_Review-g56010-d470148-Reviews-Chez_Nous-Humble_Texas.html
Notice in the first review, there is an icon below "the wine list is...". I am able to easily isolate the partial reviews, but have not been able to figure out a way to get BS4 to pull the reviews after a simulated 'More' click. I'm trying to figure out what tool(s) are needed for this? Do I need to use selenium instead?
The original element looks like this:
<span class="partnerRvw">
<span class="taLnk hvrIE6 tr475091998 moreLink ulBlueLinks" onclick=" ta.util.cookie.setPIDCookie(4444); ta.call('ta.servlet.Reviews.expandReviews', {type: 'dummy'}, ta.id('review_475091998'), 'review_475091998', '1', 4444);
">
More </span>
<span class="ui_icon caret-down"></span>
</span>
Looking at the HTML after you click the More link, you find a new dynamically added element that contains the information I need (see below):
<div class="review dyn_full_review inlineReviewUpdate provider0 first newFlag" style="display: block;">
<a name="UR475091998" class=""></a>
<div id="UR475091998" class="extended provider0 first newFlag">
<div class="col1of2">
<div class="member_info">
<div id="UID_6875524F623CC948F4F9CA95BB4A9567-SRC_475091998" class="memberOverlayLink" onmouseover="requireCallIfReady('members/memberOverlay', 'initMemberOverlay', event, this, this.id, 'Reviews', 'user_name_photo');" data-anchorwidth="90">
<div class="avatar profile_6875524F623CC948F4F9CA95BB4A9567 ">
<a onclick="">
<img src="https://media-cdn.tripadvisor.com/media/photo-l/0d/97/43/bf/joannecarpenter.jpg" class="avatar potentialFacebookAvatar avatarGUID:6875524F623CC948F4F9CA95BB4A9567" width="74" height="74">
</a>
</div>
<div class="username mo">
<span class="expand_inline scrname mbrName_6875524F623CC948F4F9CA95BB4A9567" onclick="ta.trackEventOnPage('Reviews', 'show_reviewer_info_window', 'user_name_name_click')">joannecarpenter</span>
</div>
</div>
<div class="location">
Humble, Texas
</div>
</div>
<div class="memberBadging g10n">
<div id="UID_6875524F623CC948F4F9CA95BB4A9567-CONT" class="no_cpu" onclick="ta.util.cookie.setPIDCookie('15984'); requireCallIfReady('members/memberOverlay', 'initMemberOverlay', event, this, this.id, 'Reviews', 'review_count');" data-anchorwidth="90">
<div class="levelBadge badge lvl_02">
Level <span><img src="https://static.tacdn.com/img2/badges/20px/lvl_02.png" alt="" class="icon" width="20" height="20/"></span> Contributor </div>
<div class="reviewerBadge badge">
<img src="https://static.tacdn.com/img2/badges/20px/rev_03.png" alt="" class="icon" width="20" height="20">
<span class="badgeText">6 reviews</span> </div>
<div class="contributionReviewBadge badge">
<img src="https://static.tacdn.com/img2/badges/20px/Foodie.png" alt="" class="icon" width="20" height="20">
<span class="badgeText">6 restaurant reviews</span>
</div>
</div>
</div>
</div>
<div class="col2of2">
<div class="innerBubble">
<div class="quote">“<span class="noQuotes">Dinner</span>”</div>
<div class="rating reviewItemInline">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s50" width="70" src="https://static.tacdn.com/img2/x.gif" alt="5 of 5 bubbles">
</span>
<span class="ratingDate relativeDate" title="April 12, 2017">Reviewed 3 days ago
<span class="new redesigned">NEW</span> </span>
<a class="viaMobile" href="/apps" target="_blank" onclick="ta.util.cookie.setPIDCookie(24687)">
<span class="ui_icon mobile-phone"></span>
via mobile
</a>
</div>
<div class="entry">
<p>
Our favorite restaurant in Houston. Definitely the best and friendliest service! The food is not only served with a flair, it is absolutely delicious. My favorite is the Lamb. It is the best! Also the duck moose, fois gras, the crispy salad and the French onion soup are all spectacular! This is a must try restaurant! The wine list is fantastic. Just ask Daniel for suggestions. He not only knows his wines; he loves what he does! We Love this place!
</p>
</div>
<div class="rating-list">
<div class="recommend">
<span class="recommend-titleInline noRatings">Visited April 2017</span>
</div>
</div>
<div class="expanded lessLink">
<span class="taLnk collapse ulBlueLinks no_cpu ">
Less
</span>
<span class="textArrow_more ui_icon caret-up"></span>
</div>
<div id="helpfulq475091998_expanded" class="helpful redesigned white_btn_container ">
<span class="isHelpful">Helpful?</span> <div class="tgt_helpfulq475091998 rnd_white_thank_btn" onclick="ta.call('ta.servlet.Reviews.helpfulVoteHandlerOb', event, this, 'LeJIVqd4EVIpECri1GII2t6mbqgqguuuxizSxiniaqgeVtIJpEJCIQQoqnQQeVsSVuqHyo3KUKqHMdkKUdvqHxfqHfGVzCQQoqnQQZiptqH5paHcVQQoqnQQrVxEJtxiGIac6XoXmqoTpcdkoKAUAAv0tEn1dkoKAUAAv0zH1o3KUK0pSM13vkooXdqn3XmffAdvqndqnAfbAo77dbAo3k0npEEeJIV1K0EJIVqiJcpV1U0Ii9VC1rZlU3XozxbZZxE2crHN2TDUJiqnkiuzsVEOxdkXqi7TxXpUgyR2xXvOfROwaqILkrzz9MvzCxMva7xEkq8xXNq8ymxbAq8AzzrhhzCxbx2vdNvEn2fnwEfq8alzCeqi53ZrgnMrHhshTtowGpNSmq89IwiVb7crUJxdevaCnJEqI33qiE5JGErJExXKx5ooItGCy5wnCTx2VA7RvxEsO3'); ta.trackEventOnPage('HELPFUL_VOTE_TEST', 'helpfulvotegiven_v2');">
<img src="https://static.tacdn.com/img2/icons/icon_thumb_white.png" class="helpful_thumbs_up white">
<img src="https://static.tacdn.com/img2/icons/icon_thumb_green.png" class="helpful_thumbs_up green">
<span class="helpful_text">Thank joannecarpenter</span> </div>
</div>
<div class="tooltips vertically_centered">
<div class="reportProblem">
<span id="ReportIAP_475091998" class="problem collapsed taLnk" onclick="ta.trackEventOnPage('Report_IAP', 'Report_Button_Clicked', 'member'); ta.call('ta.servlet.Reviews.iapFlyout', event, this, '475091998')" onmouseover="if (!this.getAttribute('data-first')) {ta.trackEventOnPage('Reviews', 'report_problem', 'hover_over_flag'); this.setAttribute('data-first', 1)} uiOverlay(event, this)" data-tooltip="" data-position="above" data-content="Problem with this review?">
<img src="https://static.tacdn.com/img2/icons/gray_flag.png" width="13" height="14" alt="">
<span class="reportTxt">Report</span> </span>
</div>
</div>
<div class="userLinks">
<div class="sameGeoActivity">
<a href="/members-citypage/joannecarpenter/g56010" target="_blank" onclick="ta.setEvtCookie('Reviews','more_reviews_by_user','',0,this.href); ta.util.cookie.setPIDCookie(19160)">
See all 5 reviews by joannecarpenter for Humble </a>
</div>
<div class="askQuestion">
<span class="taLnk ulBlueLinks" onclick="ta.trackEventOnPage('answers_review','ask_user_intercept_click' ); ta.load('ta-answers', (function() {require('answers/misc').askReviewerIntercept(this, '470148', 'joannecarpenter', '6875524F623CC948F4F9CA95BB4A9567', 'en', '475091998','Chez Nous', 39151)}).bind(this), true);">Ask joannecarpenter about Chez Nous</span>
</div>
</div>
<div class="note">
This review is the subjective opinion of a TripAdvisor member and not of TripAdvisor LLC. </div>
<div class="duplicateReviewsInline">
<div class="previous">joannecarpenter has 1 more review of Chez Nous</div> <ul class="dupReviews">
<li class="dupReviewItem">
<div class="reviewTitle">
“Joanne Carpenter”
</div>
<div class="rating">
<span class="rate sprite-rating_ss rating_ss"> <img class="sprite-rating_ss_fill rating_ss_fill ss50" width="50" src="https://static.tacdn.com/img2/x.gif" alt="5 of 5 bubbles">
</span>
<span class="date">Reviewed January 18, 2017</span>
</div>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="large">
</div>
<div class="ad iab_inlineBanner">
<div id="gpt-ad-468x60" class="adInner gptAd"></div>
</div>
</div>
Is there a way for BS4 to handle this for me?
Here's a simple example to get you started:
from selenium import webdriver

driver = webdriver.PhantomJS()
url = "https://www.tripadvisor.com/Restaurant_Review-g56010-d470148-Reviews-Chez_Nous-Humble_Texas.html"
driver.get(url)
# Find an element with class "taLnk", the class carried by the "More" link in the question
elem = driver.find_element_by_class_name("taLnk")
...
You can find more information about the available methods here:
http://selenium-python.readthedocs.io/
In all likelihood you will need to examine a few more of these pages, to identify variations in the HTML code. For the sample you have offered, and given that you are able to obtain it by simulating a press, the following code works to select the paragraph that you seem to want.
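from bs4 import BeautifulSoup

# Read the saved page source (obtained after the simulated click on "More")
with open('temp.htm') as f:
    HTML = f.read()
soup = BeautifulSoup(HTML, 'lxml')
# Select <p> elements that are direct children of an element with class "entry"
para = soup.select('.entry > p')
print(para[0].text)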
Result:
Our favorite restaurant in Houston. Definitely the best and friendliest service! The food is not only served with a flair, it is absolutely delicious. My favorite is the Lamb. It is the best! Also the duck moose, fois gras, the crispy salad and the French onion soup are all spectacular! This is a must try restaurant! The wine list is fantastic. Just ask Daniel for suggestions. He not only knows his wines; he loves what he does! We Love this place!
Note that there are newlines before and after the paragraph.
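The surrounding newlines are easy to drop. With BeautifulSoup, para[0].get_text(strip=True) returns the text without leading or trailing whitespace. As a dependency-free illustration of what the .entry > p selection does, the sketch below uses the standard library's html.parser instead of BeautifulSoup; the class name EntryParagraphs and the shortened sample HTML are made up for this example, and real pages may need more careful handling.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class EntryParagraphs(HTMLParser):
    """Collect the text of each <p> that is a direct child of a class="entry" element."""

    def __init__(self):
        super().__init__()
        self.open_tags = []    # stack of (tag, class list) for currently open elements
        self.paragraphs = []   # whitespace-normalized paragraph texts
        self._pieces = None    # text pieces of the <p> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        parent = self.open_tags[-1] if self.open_tags else None
        if tag == "p" and parent is not None and "entry" in parent[1]:
            self._pieces = []
        classes = (dict(attrs).get("class") or "").split()
        self.open_tags.append((tag, classes))

    def handle_data(self, data):
        if self._pieces is not None:
            self._pieces.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._pieces is not None:
            # Collapse runs of whitespace, which also drops the outer newlines.
            self.paragraphs.append(" ".join("".join(self._pieces).split()))
            self._pieces = None
        # Close the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                del self.open_tags[i:]
                break


# Shortened, hypothetical version of the expanded review markup.
SAMPLE = """
<div class="innerBubble">
  <div class="quote">Dinner</div>
  <div class="entry">
    <p>
    Our favorite restaurant in Houston. The wine list is fantastic.
    </p>
  </div>
</div>
"""

parser = EntryParagraphs()
parser.feed(SAMPLE)
parser.close()
print(parser.paragraphs)
```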