I want to get the 'a' tags inside divs with the class 'x'. I'm trying this code:
import urllib
from bs4 import BeautifulSoup
u = urllib.urlopen('http://www.full-hd-wallpaper.com/all')
data = u.read()
soap = BeautifulSoup(data)
print soap.select("div.wallpaper_item a")
but the result is empty. I'm sure the selector is correct. I also tried this simpler selector:
print soap.select("div")
but that returned nothing either. What's wrong with my code?
This is the input:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US" xmlns:fb="http://www.facebook.com/2008/fbml">
<head>
<link rel="stylesheet" type="text/css" href="/templates/darkbrush/style.css?20140420:8" />
<script type="text/javascript">
SITE_URL = '';
SEO_ON = '3';
COMMENT_WAIT = 'Please wait 60 seconds between comments';
COMMENT_ERROR = 'An error occured in sending your comment';
WALLPAPER_SUBMIT_COMMENT = 'Submit comment';
ADDING_COMMENT = 'Adding comment';
COMMENT_ADDED = 'Comment added';
function WallpaperAddHit(id) {
AjaxPost("/includes/wallpaper/ajax/wallpaper_hit.php", "wallpaper_id="+id,
function () {}
)
}
</script>
<script type="text/javascript" src="/includes/wss.js"></script>
<link rel="alternate" type="application/rss+xml" title="" href="/rss.php" />
<title>All wallpapers - Full HD Wallpaper</title><meta name="description" content="All wallpapers All our wallpapers in one place!" />
<meta name="keywords" content="Full HD Wallpapers, wallpapers 2013, full hd wallpaper,wallpapers, HD Wallpapers, HD Wallpaper, desktops, downloads,Wallpaper,hd wallpaper download, hd wallpaper nature, hd wallpaper 1920x1080, HD Wallpaper 1080p, hd wallpaper for android, " /><meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" />
<link rel="icon" href="/favicon.ico" type="image/x-icon" />
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-42003300-1', 'full-hd-wallpaper.com');
ga('send', 'pageview');
</script>
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-45774874-1']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
</head>
<body>
<!-- Report popup and overlay !-->
<div id="ava-popup">
<div id="ava-popup-header">
<div id="ava-popup-title"></div>
<div id="popup-close-button" onclick="HidePopup('ava-popup');"></div>
</div>
<div id="ava-popup-content"></div>
</div>
<div id="overlay" onclick="HidePopup('ava-popup')"></div>
<div class="header">
<div class="header_logo">
<img src="/templates/darkbrush/images/logo.png" alt="Full HD Wallpaper" />
</div>
<div class="header_right">
<!--ads-->
</div>
<br style="clear:both;" />
</div>
<div class="menu">
<div class="menu_left">
<div class="menu_item">
Homepage</div><div class="menu_item">
New wallpapers</div><div class="menu_item">
Most Downloads</div><div class="menu_item">
Top rated</div><div class="menu_item"></div><div class="menu_item">Copyright policy</div><div class="menu_item">Contact Us </div>
</div>
<div class="menu_right">
<form action="/index.php?task=search" method="get" onsubmit="searchSubmit('', ''); return false;">
<input name="task" type="hidden" value="search" />
<div class="search_contain">
<div class="search_contain_left">
<input name="q" type="text" size="20" id="search_textbox" value="Search..." onclick="clickclear(this, 'Search...')" onblur="clickrecall(this,'Search...')" class="search_box" />
</div>
<div class="search_contain_right">
<input type="image" style="margin-top:7px;" src="/templates/darkbrush/images/search_button.png" />
</div>
</div>
</form>
</div>
</div>
<div class="mc_bg">
<div class="main_container">
<div class="secondary_container">
<div class="left_sidebar">
News
Subscribe
Links<br />
<form action="http://feedburner.google.com/fb/a/mailverify" method="post" target="popupwindow" onsubmit="window.open('http://feedburner.google.com/fb/a/mailverify?uri=FullHdWallpaper', 'popupwindow', 'scrollbars=yes,width=550,height=520');return true"><input type="hidden" name="loc" value="en_US" /><input type="hidden" value="FullHdWallpaper" name="uri" /><h5>Email Subscription</h5>
<input type="text" class="email_textbox" name="email" size="20" />
<input type="submit" class="email_button" value="Subscribe" />
</form>
<br />
<h2>Categories</h2>
<div class="category_menu_item">
All wallpapers</div><div class="category_menu_item">Abstract</div><div class="category_menu_item">Animals</div><div class="category_menu_item">Nature</div><div class="category_menu_item">Men</div><div class="category_menu_item">Children</div><div class="category_menu_item">Girls and Women</div><div class="category_menu_item">World</div><div class="category_menu_item">Foods</div><div class="category_menu_item">Cars</div><div class="category_menu_item">Technology</div><div class="category_menu_item">Holiday</div><div class="category_menu_item">Other</div><div class="category_menu_item">Flower</div>
<br />
<h2>Tags</h2>
<div class="tag_cloud">Adriana lima wallpaper Alessandra ambrosio wallpaper Amber heard wallpaper Beyonce wallpaper Britney Spears wallpaper Candice swanepoel wallpaper Cheryl cole wallpaper Doutzen kroes wallpaper Elisha cuthbert wallpaper Ewelina olczak wallpapers Inna wallpaper Jennifer lawrence wallpapers Jennifer lopez wallpaper Jessica alba wallpaper Kate upton wallpaper Lady Gaga wallpaper Lindsey stirling wallpaper Marloes horst wallpaper Natalia vodianova wallpaper Nicki minaj wallpaper Nicole scherzinger wallpaper Rihanna wallpaper Robert pattinson wallpaper Scarlett johansson wallpaper Taylor swift wallpaper Woman with Car wallpaper alexandra stan wallpaper alfa romeo wallpaper aston martin wallpaper audi wallpaper bar refaeli wallpaper barbara palvin wallpaper benz wallpaper bmw wallpaper bugatti wallpaper emma watson wallpaper eva mendes wallpaper ferrari wallpaper ford wallpaper irina shayk wallpaper kate beckinsale wallpaper katy perry wallpaper kelly clarkson wallpaper lamborghini wallpaper megan fox wallpaper miranda kerr wallpaper motorcycle wallpaper porsche wallpaper rosie huntington wallpaper selena gomez wallpaper suzuki wallpaper </div><br />
<h2>New</h2>
<div class="module_wallpaper">
<a href="/nature/windmill-morning-sunrise-2">
<img src="/image.php?width=229&height=129&id=5997&nocache=1&dothumb=1" alt="windmill morning sunrise" width="150" height="85" />
</a>
</div><div class="module_wallpaper">
<a href="/nature/landscapes-nature-roads">
<img src="/image.php?width=229&height=129&id=5996&nocache=1&dothumb=1" alt="landscapes nature roads" width="150" height="85" />
</a>
</div><div class="module_wallpaper">
<a href="/girls-and-women/candice-swanepoel-model">
<img src="/image.php?width=229&height=129&id=5995&nocache=1&dothumb=1" alt="candice swanepoel model" width="150" height="85" />
</a>
</div><br /> </div>
<div class="right_sidebar">
<h2>Your account</h2>
<form method="post" action="/login.php?done=1">
<div class="mini_login_form">
<p>Username</p>
<input name="username" type="text" id="username" class="mini_login_textbox" /><br />
<p>Password</p>
<input name="password" type="password" id="password" class="mini_login_textbox" /><br />
<p><label><input type="checkbox" name="remember" id="remember" checked="checked" /> Keep me logged in</label></p>
<input type="submit" name="Submit" value="Login" class="mini_login_button" />
Forgotten your password?
</div>
</form>
Register new account
<br />
<h2>Recently viewed</h2>
<div class="no_recents">None...</div><br />
<h2>Stats</h2>
<ul class="stats_ul">
<li><strong>317</strong> users online</li>
<li><strong>5729</strong> wallpapers</li>
<li><strong>76823</strong> members</li>
<li><strong>0</strong> news posts</li>
<li><strong>31</strong> comments</li>
<li><strong>19</strong> categories</li>
</ul><br />
<h2>Links</h2>
<ul><li>Fashion Shows</li><li>wallpapers gallery</li><li>Girls Wallpapers</li><li>High Definition Wallpapers</li></ul><div class="more_links">More links</div>
</div>
<div class="center_column">
<div class="center_container">
<center>
<div class="gas">
<script async="async" src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- wallpaper-text-728 -->
<ins class="adsbygoogle" style="display:inline-block;width:728px;height:15px" data-ad-client="ca-pub-3574787538747201" data-ad-slot="8438048800"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script> </div>
</center>
<div class="header_overflow"><h1>Homepage » All wallpapers</h1></div>
<div class="ad_banner_misc">
</div>
<div class="category_sort_options">
Sort options: <a class="sort_bold" href="/all">Newest</a> | <a class="sort_notbold" href="/all/oldest/1">Oldest</a> | <a class="sort_notbold" href="/all/rating/1">Top Rated</a> | <a class="sort_notbold" href="/all/downloads/1">Most Downloads</a> | <a class="sort_notbold" href="/all/nameasc/1">A-Z</a> | <a class="sort_notbold" href="/all/namedesc/1">Z-A</a> <select class="select" id="resolution_filter" name="resolution_filter" onchange="setResFilter();">
<option value="all">All resolutions</option><optgroup label="4:3"> <option value="1600x1200">1600x1200 </option><option value="1400x1050">1400x1050 </option><option value="1280x960">1280x960 </option><option value="1024x768">1024x768 </option><option value="800x600">800x600 </option></optgroup><optgroup label="16:9"> <option value="2560x1440">2560x1440 </option><option value="1920x1080">1920x1080 (1080p) </option><option value="1600x900">1600x900 </option><option value="1280x720">1280x720 (720p) </option></optgroup><optgroup label="16:10"> <option value="2880x1800">2880x1800 </option><option value="2560x1600">2560x1600 </option><option value="1920x1200">1920x1200 </option><option value="1680x1050">1680x1050 </option><option value="1440x900">1440x900 </option><option value="1280x800">1280x800 </option></optgroup><optgroup label="Apple"> <option value="2048x2048">Retina iPad </option><option value="1024x1024">iPad / iPad mini </option><option value="640x1136">iPhone 5 (& iPod) </option><option value="640x960">iPhone 4/4S/iPod </option><option value="320x480">Older iPhone & iPod </option></optgroup><optgroup label="Blackberry"> <option value="360x480">360x480 </option><option value="320x240">320x240 </option></optgroup><optgroup label="Google Android"> <option value="720x1280">720x1280 </option><option value="540x960">540x960 </option><option value="480x854">480x854 </option><option value="480x800">480x800 </option><option value="320x480">320x480 </option></optgroup><optgroup label="Netbook"> <option value="1366x768">1366x768 </option><option value="1024x600">1024x600 </option><option value="800x480">800x480 </option></optgroup><optgroup label="Other resolutions"> <option value="960x544">960x544 PS Vita </option><option value="480x272">480x272 PSP </option></optgroup><optgroup label="Windows Phone 7/8"> <option value="768x1280">768x1280 </option><option value="720x1280">720x1280 </option><option value="480x800">480x800 </option></optgroup></select> </div>
<script async="async" src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- wallpaper-720 -->
<ins class="adsbygoogle" style="display:inline-block;width:728px;height:90px" data-ad-client="ca-pub-3574787538747201" data-ad-slot="1054382805"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script><br /><br />
<div class="category_container">
<div class="wallpaper_item">
<a href="/nature/windmill-morning-sunrise-2">
<img src="/image.php?width=229&height=129&id=5997&nocache=1&dothumb=1" alt="windmill morning sunrise" />
</a>
<div class="wallpaper_item_name">
<a href="/nature/windmill-morning-sunrise-2">
windmill morning sunrise </a>
</div>
</div><div class="wallpaper_item">
<a href="/nature/landscapes-nature-roads">
<img src="/image.php?width=229&height=129&id=5996&nocache=1&dothumb=1" alt="landscapes nature roads" />
</a>
<div class="wallpaper_item_name">
<a href="/nature/landscapes-nature-roads">
landscapes nature roads </a>
</div>
</div><div class="wallpaper_item">
<a href="/girls-and-women/candice-swanepoel-model">
<img src="/image.php?width=229&height=129&id=5995&nocache=1&dothumb=1" alt="candice swanepoel model" />
</a>
<div class="wallpaper_item_name">
<a href="/girls-and-women/candice-swanepoel-model">
candice swanepoel model </a>
</div>
</div><div class="wallpaper_item">
<a href="/cars/bugatti-veyron-grand-sport-2014">
<img src="/image.php?width=229&height=129&id=5994&nocache=1&dothumb=1" alt="bugatti veyron grand spor…" />
</a>
<div class="wallpaper_item_name">
<a href="/cars/bugatti-veyron-grand-sport-2014">
bugatti veyron grand spor… </a>
</div>
</div><div class="wallpaper_item">
<a href="/girls-and-women/beyonce-knowles-singer">
<img src="/image.php?width=229&height=129&id=5993&nocache=1&dothumb=1" alt="beyonce knowles singer" />
</a>
<div class="wallpaper_item_name">
<a href="/girls-and-women/beyonce-knowles-singer">
beyonce knowles singer </a>
</div>
</div><div class="wallpaper_item">
<a href="/girls-and-women/vintage-girl">
<img src="/image.php?width=229&height=129&id=5992&nocache=1&dothumb=1" alt="vintage girl" />
</a>
<div class="wallpaper_item_name">
<a href="/girls-and-women/vintage-girl">
vintage girl </a>
</div>
</div><div class="wallpaper_item">
<a href="/girls-and-women/jennifer-lopez-2014">
<img src="/image.php?width=229&height=129&id=5991&nocache=1&dothumb=1" alt="jennifer lopez 2014" />
</a>
<div class="wallpaper_item_name">
<a href="/girls-and-women/jennifer-lopez-2014">
jennifer lopez 2014 </a>
</div>
</div><div class="wallpaper_item">
<a href="/cars/bmw-m3">
<img src="/image.php?width=229&height=129&id=5990&nocache=1&dothumb=1" alt="bmw m3" />
</a>
<div class="wallpaper_item_name">
<a href="/cars/bmw-m3">
bmw m3 </a>
</div>
</div><div class="wallpaper_item">
<a href="/nature/tree-leaves-fog">
<img src="/image.php?width=229&height=129&id=5989&nocache=1&dothumb=1" alt="tree leaves fog" />
</a>
<div class="wallpaper_item_name">
<a href="/nature/tree-leaves-fog">
tree leaves fog </a>
</div>
</div><div class="wallpaper_item">
<a href="/girls-and-women/hand-lips-girl">
<img src="/image.php?width=229&height=129&id=5988&nocache=1&dothumb=1" alt="hand lips girl" />
</a>
<div class="wallpaper_item_name">
<a href="/girls-and-women/hand-lips-girl">
hand lips girl </a>
</div>
</div><div class="wallpaper_item">
<a href="/nature/spring-sheet">
<img src="/image.php?width=229&height=129&id=5987&nocache=1&dothumb=1" alt="spring sheet" />
</a>
<div class="wallpaper_item_name">
<a href="/nature/spring-sheet">
spring sheet </a>
</div>
</div><div class="wallpaper_item">
<a href="/cars/srt-viper-2014">
<img src="/image.php?width=229&height=129&id=5986&nocache=1&dothumb=1" alt="srt viper 2014" />
</a>
<div class="wallpaper_item_name">
<a href="/cars/srt-viper-2014">
srt viper 2014 </a>
</div>
</div><div class="wallpaper_item">
<a href="/animals/yellow-butterfly-red">
<img src="/image.php?width=229&height=129&id=5985&nocache=1&dothumb=1" alt="yellow butterfly red" />
</a>
<div class="wallpaper_item_name">
<a href="/animals/yellow-butterfly-red">
yellow butterfly red </a>
</div>
</div><div class="wallpaper_item">
<a href="/girls-and-women/art-girl-butterflies">
<img src="/image.php?width=229&height=129&id=5984&nocache=1&dothumb=1" alt="art girl butterflies" />
</a>
<div class="wallpaper_item_name">
<a href="/girls-and-women/art-girl-butterflies">
art girl butterflies </a>
</div>
</div><div class="wallpaper_item">
<a href="/cities/eiffel-tower-2">
<img src="/image.php?width=229&height=129&id=5983&nocache=1&dothumb=1" alt="eiffel tower" />
</a>
<div class="wallpaper_item_name">
<a href="/cities/eiffel-tower-2">
eiffel tower </a>
</div>
</div><div class="wallpaper_item">
<a href="/girls-and-women/hair-face-wind">
<img src="/image.php?width=229&height=129&id=5982&nocache=1&dothumb=1" alt="hair face wind" />
</a>
<div class="wallpaper_item_name">
<a href="/girls-and-women/hair-face-wind">
hair face wind </a>
</div>
</div><div class="wallpaper_item">
<a href="/cars/bmw-f30">
<img src="/image.php?width=229&height=129&id=5981&nocache=1&dothumb=1" alt="bmw f30" />
</a>
<div class="wallpaper_item_name">
<a href="/cars/bmw-f30">
bmw f30 </a>
</div>
</div><div class="wallpaper_item">
<a href="/nature/purple-ears-field">
<img src="/image.php?width=229&height=129&id=5980&nocache=1&dothumb=1" alt="purple ears field" />
</a>
<div class="wallpaper_item_name">
<a href="/nature/purple-ears-field">
purple ears field </a>
</div>
</div><div class="wallpaper_item">
<a href="/other/bike">
<img src="/image.php?width=229&height=129&id=5979&nocache=1&dothumb=1" alt="bike" />
</a>
<div class="wallpaper_item_name">
<a href="/other/bike">
bike </a>
</div>
</div><div class="wallpaper_item">
<a href="/cars/porsche-carrera">
<img src="/image.php?width=229&height=129&id=5978&nocache=1&dothumb=1" alt="porsche carrera" />
</a>
<div class="wallpaper_item_name">
<a href="/cars/porsche-carrera">
porsche carrera </a>
</div>
</div><div class="wallpaper_item">
<a href="/nature/mountain-nature-landscape">
<img src="/image.php?width=229&height=129&id=5977&nocache=1&dothumb=1" alt="mountain nature landscape" />
</a>
<div class="wallpaper_item_name">
<a href="/nature/mountain-nature-landscape">
mountain nature landscape </a>
</div>
</div><div class="wallpaper_item">
<a href="/cars/suzuki-gsx-r600-motorcycle">
<img src="/image.php?width=229&height=129&id=5976&nocache=1&dothumb=1" alt="suzuki gsx r600 motorcycl…" />
</a>
<div class="wallpaper_item_name">
<a href="/cars/suzuki-gsx-r600-motorcycle">
suzuki gsx r600 motorcycl… </a>
</div>
</div><div class="wallpaper_item">
<a href="/nature/road-mountain">
<img src="/image.php?width=229&height=129&id=5975&nocache=1&dothumb=1" alt="road mountain" />
</a>
<div class="wallpaper_item_name">
<a href="/nature/road-mountain">
road mountain </a>
</div>
</div><div class="wallpaper_item">
<a href="/girls-and-women/car-2">
<img src="/image.php?width=229&height=129&id=5973&nocache=1&dothumb=1" alt="car" />
</a>
<div class="wallpaper_item_name">
<a href="/girls-and-women/car-2">
car </a>
</div>
</div><div class="wallpaper_item">
<a href="/girls-and-women/brunette-dress">
<img src="/image.php?width=229&height=129&id=5972&nocache=1&dothumb=1" alt="brunette dress" />
</a>
<div class="wallpaper_item_name">
<a href="/girls-and-women/brunette-dress">
brunette dress </a>
</div>
</div><div class="wallpaper_item">
<a href="/girls-and-women/model-dress">
<img src="/image.php?width=229&height=129&id=5971&nocache=1&dothumb=1" alt="model dress" />
</a>
<div class="wallpaper_item_name">
<a href="/girls-and-women/model-dress">
model dress </a>
</div>
</div><div class="wallpaper_item">
<a href="/animals/africa-elephants">
<img src="/image.php?width=229&height=129&id=5970&nocache=1&dothumb=1" alt="africa elephants" />
</a>
<div class="wallpaper_item_name">
<a href="/animals/africa-elephants">
africa elephants </a>
</div>
</div><div class="wallpaper_item">
<a href="/nature/green-trees-forest-akes">
<img src="/image.php?width=229&height=129&id=5969&nocache=1&dothumb=1" alt="Green trees forest akes" />
</a>
<div class="wallpaper_item_name">
<a href="/nature/green-trees-forest-akes">
Green trees forest akes </a>
</div>
</div><div class="wallpaper_item">
<a href="/animals/wild-cheetah-alone">
<img src="/image.php?width=229&height=129&id=5968&nocache=1&dothumb=1" alt="wild cheetah alone" />
</a>
<div class="wallpaper_item_name">
<a href="/animals/wild-cheetah-alone">
wild cheetah alone </a>
</div>
</div><div class="wallpaper_item">
<a href="/nature/lake-3">
<img src="/image.php?width=229&height=129&id=5967&nocache=1&dothumb=1" alt="lake" />
</a>
<div class="wallpaper_item_name">
<a href="/nature/lake-3">
lake </a>
</div>
</div> </div>
<br />
<script async="async" src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- wallpaper-720 -->
<ins class="adsbygoogle" style="display:inline-block;width:728px;height:90px" data-ad-client="ca-pub-3574787538747201" data-ad-slot="1054382805"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script><br />
<div class="category_pages">
<div class="pagination_wrap">
<b><font color="#2B6FE4">1</font></b> 2 3 4 5 6 7 8 ... 188 189 Next » </div>
</div>
<br />
<div style="padding: 10px">
<center>
</center>
</div>
</div>
</div>
</div>
<div class="ad_banner_footer"><br style="clear:both" />
<script async="async" src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- wallpaper-text-728 -->
<ins class="adsbygoogle" style="display:inline-block;width:728px;height:15px" data-ad-client="ca-pub-3574787538747201" data-ad-slot="8438048800"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div>
</div>
</div>
<div class="footer">
Powered by Wallpaper Site Script - Copyright AV Scripts 2014 </div>
</body>
</html>
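The selector itself can be checked in isolation against a small excerpt of the markup above. The following is a minimal sketch, assuming Python 3 with bs4 installed; the `excerpt` string is a trimmed copy of one `wallpaper_item` block, and the parser is named explicitly. Whether the parser choice is what causes the empty result on the live page is an assumption, not something the post confirms.

```python
from bs4 import BeautifulSoup

# Trimmed excerpt of one item from the page source posted above.
excerpt = """
<div class="category_container">
<div class="wallpaper_item">
<a href="/nature/windmill-morning-sunrise-2">thumbnail</a>
<div class="wallpaper_item_name">
<a href="/nature/windmill-morning-sunrise-2">windmill morning sunrise </a>
</div>
</div>
</div>
"""

# Name the parser explicitly so the result does not depend on which
# parser bs4 happens to pick by default on a given machine.
soup = BeautifulSoup(excerpt, "html.parser")

# Descendant selector: every <a> anywhere inside a div.wallpaper_item.
links = soup.select("div.wallpaper_item a")
hrefs = [a["href"] for a in links]
print(hrefs)
```

If this works on the excerpt but the full page still yields nothing, it may be worth comparing `len(data)` with the page size and trying another installed parser (for example `"lxml"` or `"html5lib"`) on the full document.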
I am trying to pull the name and position of random people from Sales Navigator. Each person shows up as a card that contains all the information. I can obtain the list of cards, but I then want to get the name and title from each one. I have tried the code below to get the information from a card; the HTML of one result follows it.
So far, my attempts always raise an error indicating that the element could not be found. How can I solve this?
def testeo(driver):
    # Collect every result card on the page.
    lista = driver.find_elements_by_xpath("//*[contains(@class,'pv5 ph2 search-results__result-item')]")
    nombres = []
    for i in range(0, len(lista)):
        # The leading '.' keeps each lookup relative to the current card.
        nombres.append((lista[i].find_element_by_xpath(".//*[contains(@class,'result-lockup__name')]").text,
                        lista[i].find_element_by_xpath(".//*[contains(@class,'t-14 t-bold')]").text))
    return nombres
<li class="pv5 ph2 search-results__result-item" data-scroll-into-view="urn:li:fs_salesProfile:(ACwAAAJ-Ab0Bu4JpScPs9SE2b8R_LP9L9vU9nM8,NAME_SEARCH,fH_T)">
<div class="pt5 absolute search-results__select-container">
<input id="search-result-ember6830" class="small-input ember-checkbox ember-view" type="checkbox">
<label class="m0" for="search-result-ember6830">
<span class="a11y-text">
Select Jean Jongejan
</span>
</label>
</div>
<div style="" id="ember6866" class="flex full-width deferred-area ember-view"> <div class="search-results__result-container full-width pl2">
<div id="ember6981" class="ember-view"> <div id="ember6982" class="ember-view">
<article>
<h3 class="a11y-text">
Profile result – Jean Jongejan
</h3>
<section class="result-lockup">
<h4 class="a11y-text">
Profile result lockup – Jean Jongejan
</h4>
<div class="result-lockup__profile-info flex flex-column">
<div class="horizontal-person-entity-lockup-4 result-lockup__entity ml6">
<figure>
<a href="/sales/people/ACwAAAJ-Ab0Bu4JpScPs9SE2b8R_LP9L9vU9nM8,NAME_SEARCH,fH_T?_ntb=ErSmZYlWS8KlI9CD0cB6Yg%3D%3D" id="ember6985" class="result-lockup__icon-link ember-view">
<div class="presence-entity--size-4 relative mr2">
<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" loading="lazy" alt="Go to Jean Jongejan’s profile" id="ember6986" class="max-width max-height circle-entity-4 lazy-image ghost-person loaded ember-view">
<div class="presence-indicator presence-indicator--size-4 hidden presence-entity__indicator presence-entity__indicator--size-4" title="Reachable">
<span class="a11y-text">
Jean Jongejan is reachable
</span>
</div>
</div>
</a> </figure>
<dl>
<dt class="result-lockup__name">
<a href="/sales/people/ACwAAAJ-Ab0Bu4JpScPs9SE2b8R_LP9L9vU9nM8,NAME_SEARCH,fH_T?_ntb=ErSmZYlWS8KlI9CD0cB6Yg%3D%3D" id="ember6989" class="ember-view"> Jean Jongejan
</a> </dt>
<dd class="inline-flex vertical-align-middle">
<ul class="ml1 flex align-items-center list-style-none">
<li class="mr1">
<span class="a11y-text">
3rd degree contact
</span>
<span class="label-16dp block" aria-hidden="true">
3rd
</span>
</li>
<!----><!----><!----> </ul>
</dd>
<dd class="result-lockup__highlight-keyword">
<span class="t-14 t-bold">EXT Key Account Management & Consultancy</span>
<span>
at
<span data-entity-hovercard-id="urn:li:fs_salesCompany:36314" class="result-lockup__position-company">
<a href="/sales/company/36314?_ntb=Z6Rvdg6sRMiPD6xsYlUuFQ%3D%3D" id="ember6991" class="Sans-14px-black-75%-bold ember-view"> <span aria-hidden="true">
Marimekko
</span>
<span class="a11y-text">
Go to Marimekko account page
</span>
</a> <button aria-expanded="false" aria-label="See more about Marimekko" class="entity-hovercard__a11y-trigger p0 b0" data-entity-hovercard-id="urn:li:fs_salesCompany:36314" data-entity-hovercard-trigger="click"></button>
</span>
</span>
</dd>
<dd>
<span class="t-12 t-black--light">
3 years 11 months in role and company
</span>
</dd>
<dd>
<ul class="mv1 t-12 t-black--light result-lockup__misc-list">
<li class="result-lockup__misc-item">Breda, North Brabant, Netherlands</li>
</ul>
</dd>
</dl>
</div>
<!----> </div>
<div class="result-lockup__actions flex">
<ul class="result-lockup__common-actions">
<li class="result-lockup__action-item mb3">
<div class="display-flex">
<div id="ember6993" class="ember-view"> <div id="ember6995" class="save-to-list-dropdown artdeco-dropdown artdeco-dropdown--placement-bottom artdeco-dropdown--justification-right ember-view"><button aria-expanded="false" id="ember6996" class="save-to-list-dropdown__trigger ph4 artdeco-button artdeco-button--secondary artdeco-button--pro artdeco-button--1 m-type--message artdeco-dropdown__trigger artdeco-dropdown__trigger--placement-bottom ember-view" type="button" tabindex="0"> Save
<!----></button><div tabindex="-1" aria-hidden="true" id="ember6997" class="save-to-list-dropdown__content-container artdeco-dropdown__content artdeco-dropdown--is-dropdown-element artdeco-dropdown__content--has-arrow artdeco-dropdown__content--arrow-right artdeco-dropdown__content--justification-right artdeco-dropdown__content--placement-bottom ember-view"><div class="artdeco-dropdown__content-inner">
<!---->
</div>
</div></div>
<div id="ember6998" class="ember-view">
<!---->
<!----></div>
</div> <div class="relative">
<div id="ember6999" class="ember-view">
<div id="ember7000" class="artdeco-dropdown artdeco-dropdown--placement-bottom artdeco-dropdown--justification-right ember-view"><button aria-expanded="false" id="ember7001" class="artdeco-dropdown__trigger result-lockup__action-button m-type--more artdeco-dropdown__trigger--non-button artdeco-dropdown__trigger--placement-bottom ember-view" type="button" tabindex="0"> <span class="a11y-text">See more actions for this result</span>
<li-icon aria-hidden="true" type="ellipsis-horizontal-icon" class="artdeco-button artdeco-button--tertiary artdeco-button--1 artdeco-button--muted p0" size="medium"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" data-supported-dps="24x24" fill="currentColor" width="24" height="24" focusable="false">
<path d="M2 10h4v4H2v-4zm8 4h4v-4h-4v4zm8-4v4h4v-4h-4z"></path>
</svg></li-icon>
<!----></button><div tabindex="-1" aria-hidden="true" id="ember7002" class="artdeco-dropdown__content result-lockup__dropdown-more artdeco-dropdown--is-dropdown-element artdeco-dropdown__content--has-arrow artdeco-dropdown__content--arrow-right artdeco-dropdown__content--justification-right artdeco-dropdown__content--placement-bottom ember-view"><!----></div></div>
<div id="ember7003" class="ember-view"><!----></div>
<!----></div> </div>
</div>
</li>
<!----> </ul>
</div>
</section>
<section class="result-context relative pt1">
<h4 class="a11y-text">Profile result context – Jean Jongejan</h4>
<!---->
<!---->
<!----> </section>
</article>
</div>
</div> </div>
</div>
</li>
Can you try this?
//*[name()='dt'][@class='result-lockup__name']
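To see what an attribute predicate of this kind selects, here is a minimal sketch that uses the standard library's ElementTree as a stand-in for Selenium, so it runs without a browser. The `card_markup` string is a heavily trimmed, hypothetical excerpt of the card above, made well-formed so it parses as XML. ElementTree supports only exact attribute matches, not `contains()`, so exact matches are used here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed version of one result card (well-formed XML).
card_markup = """
<li class="pv5 ph2 search-results__result-item">
  <dl>
    <dt class="result-lockup__name">
      <a href="#"> Jean Jongejan
      </a>
    </dt>
    <dd class="result-lockup__highlight-keyword">
      <span class="t-14 t-bold">EXT Key Account Management &amp; Consultancy</span>
    </dd>
  </dl>
</li>
"""

card = ET.fromstring(card_markup)

# './/' searches below the current card only; '@class' tests the attribute.
name_node = card.find(".//dt[@class='result-lockup__name']")
title_node = card.find(".//span[@class='t-14 t-bold']")

# The name sits inside a nested <a>, so gather all descendant text.
name = "".join(name_node.itertext()).strip()
title = title_node.text.strip()
print(name, "|", title)
```

In Selenium, an XPath that starts with `//` is evaluated against the whole document even when called on an element, so inside a per-card loop the relative form `.//dt[@class='result-lockup__name']` is usually what is wanted.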
I want to scrape a web table with Selenium and BeautifulSoup. The table contains 10 x 'resultMainRow' and 4 x 'resultMainCell'.
Inside the 4th resultMainCell there are 8 span elements, each holding an img with a src attribute.
The following HTML represents one of the table rows. So far I have only managed to print the relevant source code of the table. How can I iterate through the full web table and also get each img src?
<div class="resultMainTable">
<div class="resultMainRow">
<div class="resultMainCell_1 tableResult2">
<a href="javascript:genResultDetails(2);"
title="Best of the date">20/006 </a></div>
<div class="resultMainCell_2 tableResult2">21/01/2020</div>
<div class="resultMainCell_3 tableResult2"></div>
<div class="resultMainCell_4 tableResult2">
<span class="resultMainCellInner">
<img height="25" src="/info/images/icon/no_3abc”> </span>
<span class="resultMainCellInner">
<img height="25" src = "/info/images/icon/no_14 " ></span>
<span class="resultMainCellInner">
<img height="25" src "/info/images/icon/no_21 " ></span>
<span class="resultMainCellInner">
<img height="25" src="/info/images/icon/no_28 " ></span>
<span class="resultMainCellInner">
<img height="25" src=" /info/images/icon/no_37 "></span>
<span class="resultMainCellInner">
<img height="25" src="/info/images/icon/no_44 "></span>
<span class="resultMainCellInner">
<img height="6" src="/info/images/icon_happy " ></span>
<span class="resultMainCellInner"
<img height="25" src="/info/images/icon/smile "></span>
</div>
</div>
My code is as follows:
soup = BeautifulSoup(driver.page_source, 'lxml')
sixsix = soup.findAll("div", {"class": "resultMainTable"})
print (sixsix)
for row in sixsix:
    images = soup.findAll('img')
    for image in images:
        if len(images) == 8:
            aaa = images[1].find('src')
            bbb = images[2].find('src')
            ccc = images[3].find('src')
            ddd = images[4].find('src')
            eee = images[5].find('src')
            fff = images[6].find('src')
            ggg = images[7].find('src')
            hhh = images[8].find('src')
        print ((row.text), (image('src')))
You can try this script to iterate over all rows of the table and extract text from first three cells and 8 URLs from src attributes:
from bs4 import BeautifulSoup
txt = '''
<div class="resultMainTable">
<div class="resultMainRow">
<div class="resultMainCell">text1</div>
<div class="resultMainCell">text2</div>
<div class="resultMainCell">text3</div>
<div class="resultMainCell">
<div>
<div>
<span>
<img src="1" />
<img src="2" />
<img src="3" />
<img src="4" />
<img src="5" />
<img src="6" />
<img src="7" />
<img src="8" />
</span>
</div>
</div>
</div>
</div>
<div class="resultMainRow">
<div class="resultMainCell">text3</div>
<div class="resultMainCell">text4</div>
<div class="resultMainCell">text5</div>
<div class="resultMainCell">
<div>
<div>
<span>
<img src="9" />
<img src="10" />
<img src="11" />
<img src="12" />
<img src="13" />
<img src="14" />
<img src="15" />
<img src="16" />
</span>
</div>
</div>
</div>
</div>
</div>'''
soup = BeautifulSoup(txt, 'html.parser')
for row in soup.select('div.resultMainTable .resultMainRow'):
    v1, v2, v3, v4 = row.select('div.resultMainCell')
    imgs = [img['src'] for img in v4.select('img')]
    print(v1.text, v2.text, v3.text, *imgs)
Prints:
text1 text2 text3 1 2 3 4 5 6 7 8
text3 text4 text5 9 10 11 12 13 14 15 16
EDIT (With real HTML code from edited question):
from bs4 import BeautifulSoup
txt = '''<div class="resultMainTable">
<div class="resultMainRow">
<div class="resultMainCell_1 tableResult2">
<a href="javascript:genResultDetails(2);"
title="Best of the date">20/006 </a></div>
<div class="resultMainCell_2 tableResult2">21/01/2020</div>
<div class="resultMainCell_3 tableResult2"></div>
<div class="resultMainCell_4 tableResult2">
<span class="resultMainCellInner">
<img height="25" src="/info/images/icon/no_3abc"> </span>
<span class="resultMainCellInner">
<img height="25" src = "/info/images/icon/no_14 " ></span>
<span class="resultMainCellInner">
<img height="25" src "/info/images/icon/no_21 " ></span>
<span class="resultMainCellInner">
<img height="25" src="/info/images/icon/no_28 " ></span>
<span class="resultMainCellInner">
<img height="25" src=" /info/images/icon/no_37 "></span>
<span class="resultMainCellInner">
<img height="25" src="/info/images/icon/no_44 "></span>
<span class="resultMainCellInner">
<img height="6" src="/info/images/icon_happy " ></span>
<span class="resultMainCellInner"
<img height="25" src="/info/images/icon/smile "></span>
</div>
</div>'''
soup = BeautifulSoup(txt, 'html.parser')
for row in soup.select('div.resultMainTable .resultMainRow'):
    v1, v2, v3, v4 = row.select('div[class^="resultMainCell"]')
    imgs = [img['src'] for img in v4.select('img')]
    print(v1.text, v2.text, v3.text, *imgs)
Prints:
20/006 21/01/2020 /info/images/icon/no_3abc /info/images/icon/no_14 /info/images/icon/no_28 /info/images/icon/no_37 /info/images/icon/no_44 /info/images/icon_happy
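Several src attributes in the markup above carry leading or trailing spaces, which img['src'] returns unchanged, and img['src'] raises a KeyError for an img tag that has no src attribute at all. A minimal sketch of a more defensive variant follows; the HTML is a shortened, well-formed stand-in for the real page, and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for the real page markup.
html = '''
<div class="resultMainRow">
  <div class="resultMainCell_1 tableResult2">20/006</div>
  <div class="resultMainCell_4 tableResult2">
    <img height="25" src=" /info/images/icon/no_37 ">
    <img height="25">
    <img height="25" src="/info/images/icon/no_44 ">
  </div>
</div>'''

soup = BeautifulSoup(html, 'html.parser')

# "img[src]" matches only <img> tags that actually have a src attribute,
# and .strip() removes the stray whitespace around each value.
clean_srcs = [
    img['src'].strip()
    for img in soup.select('div[class^="resultMainCell"] img[src]')
]
print(clean_srcs)
```

Filtering in the selector keeps the list comprehension free of try/except blocks; img.get('src') would be an alternative if the missing entries need to be kept as None.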
I just found out how to process webpages in Python using BeautifulSoup.
There's a list of div elements from which I want to get those in a specific range. The range is defined by two divs that have an h2 child.
How would I do that? Thank you for your support!
EDIT: I added an actual representation of my html code below instead of a previous "simplified" version that was missing tags.
The new code shows a root div with class foo-bar-details.
Nested inside it are 9 div tags, two of which have a nested h2 tag. All of those 9 div tags contain img elements deeply nested within. What I need is each img element of the divs that lie between the ones containing the h2 element.
An expected outcome if applied to the html code below would be:
<img src="../../images/123456_thumb.jpg" alt="Image 123456" title="Image 123456">
<img src="../../images/67890_thumb.JPG" alt="Image 67890 " title="Image 67890">
This is the html code:
<div class="foo-bar-details">
<div class="padding-y-10 padding-x-40 gray-sand-bg" id="sec-feat-3-1">
<div class="row">
<div class="col-sm-6 info-panel">
<div class="row">
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>fsuhfsdf </strong>
</p>
</div>
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>Feat</strong><span class="icon-help"></span>
</p>
</div>
</div>
</div>
<div class="col-sm-6 foo-images">
<div class="row">
<img src="../../images/39826_thumb.JPG" alt="Image 39826" title="Image 39826 ">
<div class="img-description">
</div>
</div>
</div>
</div>
</div>
<div class="padding-y-10 padding-x-40 gray-sand-bg" id="sec-feat-3-1">
<div class="row">
<div class="col-sm-6 info-panel">
<div class="row">
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>JHFDFD </strong>
</p>
</div>
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>Feat</strong><span class="icon-help"></span>
</p>
</div>
</div>
</div>
<div class="col-sm-6 foo-images">
<div class="row">
<img src="../../images/223234_thumb.JPG" alt="Image 223234" title="Image 223234 ">
<div class="img-description">
</div>
</div>
</div>
</div>
</div>
<div class="padding-y-10 padding-x-40 gray-sand-bg" id="sec-feat-3-1">
<div class="row">
<div class="col-sm-6 info-panel">
<div class="row">
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>sdfsdf </strong>
</p>
</div>
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>Feat</strong><span class="icon-help"></span>
</p>
</div>
</div>
</div>
<div class="col-sm-6 foo-images">
<div class="row">
<img src="../../images/223823_thumb.JPG" alt="Image 223823" title="Image 223823 ">
<div class="img-description">
</div>
</div>
</div>
</div>
</div>
<div class="element-header mystic-bg padding-y-10 padding-x-20" id="elem-4">
<h2 class="h3 margin-bottom-5">
Foo
</h2>
<ul class="list-inline margin-0">
<li> Foo feature </li>
...
</ul>
</div>
<div id="info-panel-header" class="padding-y-10 padding-x-40">
<div class="row">
<div class="col-se-6 element-info">
<div class="col-se-12">
<div class="row">
</div>
</div>
</div>
<div class="col-sm-6 foo-images">
<div class="row">
<img src="../../images/123456_thumb.jpg" alt="Image 123456" title="Image 123456">
<div class="img-description">
</div>
</div>
</div>
</div>
</div>
<div class="padding-y-10 padding-x-40 gray-wild-sand-bg" id="sec-feat-4-1">
<div class="row">
<div class="col-sm-6 info-panel">
<div class="row">
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>Foo strin: </strong>
</p>
</div>
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>Barbar</strong><span class="icon-help"></span>
</p>
</div>
</div>
<div class="row">
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>Mine: </strong>
</p>
</div>
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
TEST<span class="icon-help"></span>
</p>
</div>
</div>
</div>
<div class="col-sm-6 foo-images">
<div class="row">
<img src="../../images/67890_thumb.JPG" alt="Image 67890 " title="Image 67890">
<div class="img-description">
</div>
</div>
</div>
</div>
</div>
<div class="element-header mystic-bg padding-y-10 padding-x-20" id="elem-5">
<h2 class="h3 margin-bottom-5">
Bar
</h2>
<ul class="list-inline margin-0">
<li> Bar feature </li>
...
</ul>
</div>
<div class="padding-y-10 padding-x-40 gray-sand-bg" id="sec-feat-3-1">
<div class="row">
<div class="col-sm-6 info-panel">
<div class="row">
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>fsuhfsdf </strong>
</p>
</div>
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>Feat</strong><span class="icon-help"></span>
</p>
</div>
</div>
</div>
<div class="col-sm-6 foo-images">
<div class="row">
<img src="../../images/39826_thumb.JPG" alt="Image 39826" title="Image 39826 ">
<div class="img-description">
</div>
</div>
</div>
</div>
</div>
<div class="padding-y-10 padding-x-40 gray-sand-bg" id="sec-feat-3-1">
<div class="row">
<div class="col-sm-6 info-panel">
<div class="row">
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>fsuhfsdf </strong>
</p>
</div>
<div class="col-sm-6 margin-bottom-10">
<p class="margin-0">
<strong>Feat</strong><span class="icon-help"></span>
</p>
</div>
</div>
</div>
<div class="col-sm-6 foo-images">
<div class="row">
<img src="../../images/209876_thumb.JPG" alt="Image 209876" title="Image 209876 ">
<div class="img-description">
</div>
</div>
</div>
</div>
</div>
</div>
Here is a solution involving lxml.html:
We extract all divs between the first and last divs which contain an h2 tag:
import lxml.html
# HTML file saved as "file.html"
file_name = "file.html"
with open(file_name, 'r') as f:
    tree = lxml.html.fromstring(f.read())
# all_div = tree.findall('div')
all_div = tree.find_class('foo-bar-details')[0].findall('div')
start, stop = None, None
for k, div in enumerate(all_div):
    if div.findall('h2') and start is None:
        print("Range starts at %d" % k)
        start = k
        continue
    if div.findall('h2') and start is not None:
        print("Range stops at %d" % k)
        stop = k + 1 # add one because a slice excludes its stop index
        continue
# div_list = all_div[start:stop]
img_list = [_.xpath('.//img') for _ in all_div[start:stop]]
print(img_list)
# [[], [<Element img at 0x20b58d73f40>], [<Element img at 0x20b58d73f90>], []]
# Or
img_list = [_.xpath('.//img/@src') for _ in all_div[start:stop]]
print(img_list)
# [[], ['../../images/123456_thumb.jpg'], ['../../images/67890_thumb.JPG'], []]
Another solution involving SimplifiedDoc:
from simplified_scrapy.simplified_doc import SimplifiedDoc
html ='''
<div class="foo-bar-details">
<div class="element-header mystic-bg padding-y-10 padding-x-20" id="elem-4">
<h2 class="h3 margin-bottom-5">
Foo
</h2>
<ul class="list-inline margin-0">
<li> Foo feature </li>
...
</ul>
</div>
<div id="info-panel-header" class="padding-y-10 padding-x-40">Test 1</div>
<div class="padding-y-10 padding-x-40 gray-wild-sand-bg" id="foo-feat-4-1">Test 2</div>
<div class="padding-y-10 padding-x-40 " id="foo-feat-4-2">Test 3</div>
<div class="padding-y-10 padding-x-40 gray-wild-sand-bg" id="foo-feat-4-3">Test 4</div>
<div class="element-header mystic-bg padding-y-10 padding-x-20" id="elem-5">
<h2 class="h3 margin-bottom-5">
Bar
</h2>
<ul class="list-inline margin-0">
<li> Bar feature </li>
...
</ul>
</div>
</div>
'''
doc = SimplifiedDoc(html)
divs = doc.select('div.foo-bar-details').divs.contains('<h2')
print ([div.id for div in divs])
divs = doc.select('div.foo-bar-details').divs.notContains('<h2')
print ([div.id for div in divs])
Result:
['elem-4', 'elem-5']
['info-panel-header', 'foo-feat-4-1', 'foo-feat-4-2', 'foo-feat-4-3']
The SimplifiedDoc library does not rely on third-party libraries, which makes it lighter and faster, and well suited to beginners.
More examples are available here.
If I understand you correctly, you want to find the <img> tags and the corresponding <h2> to which each image belongs.
This example does that (the txt variable contains the HTML snippet from your question):
from bs4 import BeautifulSoup
soup = BeautifulSoup(txt, 'html.parser')
out = {}
for img in soup.select('div:has(h2) ~ div img'):
    out.setdefault(img.find_previous('h2').get_text(strip=True), []).append(img['src'])
from pprint import pprint
pprint(out)
Prints:
{'Bar': ['../../images/39826_thumb.JPG', '../../images/209876_thumb.JPG'],
'Foo': ['../../images/123456_thumb.jpg', '../../images/67890_thumb.JPG']}
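If only the images between the first two h2-bearing divs are needed (the expected outcome stated in the question), one option is to slice the direct children of div.foo-bar-details. The following is a sketch under assumptions: the HTML is a shortened stand-in for the question's markup, each h2 is a direct child of its header div, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for the HTML in the question.
html = '''
<div class="foo-bar-details">
  <div id="sec-a"><div class="row"><img src="before.jpg"></div></div>
  <div id="elem-4"><h2>Foo</h2></div>
  <div id="sec-b"><div class="row"><img src="123456_thumb.jpg"></div></div>
  <div id="sec-c"><div class="row"><img src="67890_thumb.JPG"></div></div>
  <div id="elem-5"><h2>Bar</h2></div>
  <div id="sec-d"><div class="row"><img src="after.jpg"></div></div>
</div>'''

soup = BeautifulSoup(html, 'html.parser')
root = soup.select_one('div.foo-bar-details')

# Only the direct child divs of the root, in document order.
children = root.find_all('div', recursive=False)

# Positions of the child divs that have an h2 as a direct child.
header_idx = [i for i, d in enumerate(children) if d.find('h2', recursive=False)]

# Everything strictly between the first and the second header div.
between = children[header_idx[0] + 1:header_idx[1]] if len(header_idx) >= 2 else []

srcs = [img['src'] for d in between for img in d.find_all('img')]
print(srcs)
```

Slicing by position avoids relying on class names or ids, which repeat across sections in the question's markup.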
I have tried several answers on Stack Overflow. When I print the webpage I can only see the equivalent of viewing the page source in Chrome, rather than the full DOM tree you would get from inspecting the web page. As you can see, I have put a wait in, but this hasn't changed anything. Should I try Firefox instead of Chrome?
Is it possible the website I'm trying to use has anti-scraping measures? What else could I try?
def selenium_start(url):
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    driver = webdriver.Chrome('chromedriver',chrome_options=options)
    driver.get(url)
    try:
        driver = WebDriverWait(driver, 5).until\
            (EC.presence_of_element_located((By.ID, "koex")))
    except:
        print('Sorry!')
    return driver
webpage_driver = selenium_start('https://getbootstrap.com/docs/4.0/components/collapse/')
"""
div_container = webpage_driver.find_element(By.CLASS_NAME, 'maincontent')
html = webpage_driver.execute_script('return document.documentElement.outerHTML')
#inner_div = div_container.get_attribute('outerHTML')
"""
print(page_soup)
To extract the page source, you have to induce WebDriverWait for the visibility_of_element_located() of an element within the webpage before reading driver.page_source. You can use the following locator strategy:
Code Block:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get("https://getbootstrap.com/docs/4.0/components/collapse/")
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h1.bd-title")))
print(driver.page_source)
driver.quit()
Console Output:
<html lang="en"><head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta name="description" content="Toggle the visibility of content across your project with a few classes and our JavaScript plugins.">
<meta name="author" content="Mark Otto, Jacob Thornton, and Bootstrap contributors">
<meta name="generator" content="Jekyll v3.7.0">
<title>Collapse · Bootstrap</title>
<link rel="canonical" href="https://getbootstrap.com/docs/4.0/components/collapse/">
<!-- Bootstrap core CSS -->
<style class="anchorjs"></style><link href="/docs/4.0/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm" crossorigin="anonymous">
<!-- Documentation extras -->
<link href="https://cdn.jsdelivr.net/npm/docsearch.js@2/dist/cdn/docsearch.min.css" rel="stylesheet">
<link href="/docs/4.0/assets/css/docs.min.css" rel="stylesheet">
<!-- Favicons -->
<link rel="apple-touch-icon" href="/docs/4.0/assets/img/favicons/apple-touch-icon.png" sizes="180x180">
<link rel="icon" href="/docs/4.0/assets/img/favicons/favicon-32x32.png" sizes="32x32" type="image/png">
<link rel="icon" href="/docs/4.0/assets/img/favicons/favicon-16x16.png" sizes="16x16" type="image/png">
<link rel="manifest" href="/docs/4.0/assets/img/favicons/manifest.json">
<link rel="mask-icon" href="/docs/4.0/assets/img/favicons/safari-pinned-tab.svg" color="#563d7c">
<link rel="icon" href="/docs/4.0/assets/img/favicons/favicon.ico">
<meta name="msapplication-config" content="/docs/4.0/assets/img/favicons/browserconfig.xml">
<meta name="theme-color" content="#563d7c">
<!-- Twitter -->
<meta name="twitter:card" content="summary">
<meta name="twitter:site" content="@getbootstrap">
<meta name="twitter:creator" content="@getbootstrap">
<meta name="twitter:title" content="Collapse">
<meta name="twitter:description" content="Toggle the visibility of content across your project with a few classes and our JavaScript plugins.">
<meta name="twitter:image" content="https://getbootstrap.com/docs/4.0/assets/brand/bootstrap-social-logo.png">
<!-- Facebook -->
<meta property="og:url" content="https://getbootstrap.com/docs/4.0/components/collapse/">
<meta property="og:title" content="Collapse">
<meta property="og:description" content="Toggle the visibility of content across your project with a few classes and our JavaScript plugins.">
<meta property="og:type" content="website">
<meta property="og:image" content="http://getbootstrap.com/docs/4.0/assets/brand/bootstrap-social.png">
<meta property="og:image:secure_url" content="https://getbootstrap.com/docs/4.0/assets/brand/bootstrap-social.png">
<meta property="og:image:type" content="image/png">
<meta property="og:image:width" content="1200">
<meta property="og:image:height" content="630">
<script async="" src="https://www.google-analytics.com/analytics.js"></script><script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-146052-10', 'getbootstrap.com');
ga('set', 'anonymizeIp', true);
ga('send', 'pageview');
</script>
<script id="_carbonads_projs" type="text/javascript" src="https://srv.carbonads.net/ads/CKYIKKJL.json?segment=placement:getbootstrapcom&callback=_carbonads_go"></script></head>
<body>
<a id="skippy" class="sr-only sr-only-focusable" href="#content">
<div class="container">
<span class="skiplink-text">Skip to main content</span>
</div>
</a>
There's a newer version of Bootstrap 4!
<header class="navbar navbar-expand navbar-dark flex-column flex-md-row bd-navbar">
<a class="navbar-brand mr-0 mr-md-2" href="/" aria-label="Bootstrap"><svg class="d-block" width="36" height="36" viewBox="0 0 612 612" xmlns="http://www.w3.org/2000/svg" focusable="false"><title>Bootstrap</title><path fill="currentColor" d="M510 8a94.3 94.3 0 0 1 94 94v408a94.3 94.3 0 0 1-94 94H102a94.3 94.3 0 0 1-94-94V102a94.3 94.3 0 0 1 94-94h408m0-8H102C45.9 0 0 45.9 0 102v408c0 56.1 45.9 102 102 102h408c56.1 0 102-45.9 102-102V102C612 45.9 566.1 0 510 0z"></path><path fill="currentColor" d="M196.77 471.5V154.43h124.15c54.27 0 91 31.64 91 79.1 0 33-24.17 63.72-54.71 69.21v1.76c43.07 5.49 70.75 35.82 70.75 78 0 55.81-40 89-107.45 89zm39.55-180.4h63.28c46.8 0 72.29-18.68 72.29-53 0-31.42-21.53-48.78-60-48.78h-75.57zm78.22 145.46c47.68 0 72.73-19.34 72.73-56s-25.93-55.37-76.46-55.37h-74.49v111.4z"></path></svg>
</a>
<div class="navbar-nav-scroll">
<ul class="navbar-nav bd-navbar-nav flex-row">
<li class="nav-item">
<a class="nav-link " href="/" onclick="ga('send', 'event', 'Navbar', 'Community links', 'Bootstrap');">Home</a>
</li>
<li class="nav-item">
<a class="nav-link active" href="/docs/4.0/getting-started/introduction/" onclick="ga('send', 'event', 'Navbar', 'Community links', 'Docs');">Documentation</a>
</li>
<li class="nav-item">
<a class="nav-link " href="/docs/4.0/examples/" onclick="ga('send', 'event', 'Navbar', 'Community links', 'Examples');">Examples</a>
</li>
<li class="nav-item">
<a class="nav-link" href="https://themes.getbootstrap.com/" onclick="ga('send', 'event', 'Navbar', 'Community links', 'Themes');" target="_blank" rel="noopener">Themes</a>
</li>
<li class="nav-item">
<a class="nav-link" href="https://expo.getbootstrap.com/" onclick="ga('send', 'event', 'Navbar', 'Community links', 'Expo');" target="_blank" rel="noopener">Expo</a>
</li>
<li class="nav-item">
<a class="nav-link" href="https://blog.getbootstrap.com/" onclick="ga('send', 'event', 'Navbar', 'Community links', 'Blog');" target="_blank" rel="noopener">Blog</a>
</li>
</ul>
</div>
<ul class="navbar-nav flex-row ml-md-auto d-none d-md-flex">
<li class="nav-item dropdown">
<a class="nav-item nav-link dropdown-toggle mr-md-2" href="#" id="bd-versions" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
v4.0
</a>
<div class="dropdown-menu dropdown-menu-right" aria-labelledby="bd-versions">
<a class="dropdown-item" href="/docs/4.1/">Latest (v4.1.x)</a>
<a class="dropdown-item active" href="/docs/4.0/">v4.0.0</a>
<div class="dropdown-divider"></div>
<a class="dropdown-item" href="https://v4-alpha.getbootstrap.com/">v4 Alpha 6</a>
<a class="dropdown-item" href="https://getbootstrap.com/docs/3.3/">v3.3.7</a>
<a class="dropdown-item" href="https://getbootstrap.com/2.3.2/">v2.3.2</a>
</div>
</li>
<li class="nav-item">
<a class="nav-link p-2" href="https://github.com/twbs/bootstrap" target="_blank" rel="noopener" aria-label="GitHub"><svg class="navbar-nav-svg" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 499.36" focusable="false"><title>GitHub</title><path d="M256 0C114.64 0 0 114.61 0 256c0 113.09 73.34 209 175.08 242.9 12.8 2.35 17.47-5.56 17.47-12.34 0-6.08-.22-22.18-.35-43.54-71.2 15.49-86.2-34.34-86.2-34.34-11.64-29.57-28.42-37.45-28.42-37.45-23.27-15.84 1.73-15.55 1.73-15.55 25.69 1.81 39.21 26.38 39.21 26.38 22.84 39.12 59.92 27.82 74.5 21.27 2.33-16.54 8.94-27.82 16.25-34.22-56.84-6.43-116.6-28.43-116.6-126.49 0-27.95 10-50.8 26.35-68.69-2.63-6.48-11.42-32.5 2.51-67.75 0 0 21.49-6.88 70.4 26.24a242.65 242.65 0 0 1 128.18 0c48.87-33.13 70.33-26.24 70.33-26.24 14 35.25 5.18 61.27 2.55 67.75 16.41 17.9 26.31 40.75 26.31 68.69 0 98.35-59.85 120-116.88 126.32 9.19 7.9 17.38 23.53 17.38 47.41 0 34.22-.31 61.83-.31 70.23 0 6.85 4.61 14.81 17.6 12.31C438.72 464.97 512 369.08 512 256.02 512 114.62 397.37 0 256 0z" fill="currentColor" fill-rule="evenodd"></path></svg>
</a>
</li>
<li class="nav-item">
<a class="nav-link p-2" href="https://twitter.com/getbootstrap" target="_blank" rel="noopener" aria-label="Twitter"><svg class="navbar-nav-svg" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 416.32" focusable="false"><title>Twitter</title><path d="M160.83 416.32c193.2 0 298.92-160.22 298.92-298.92 0-4.51 0-9-.2-13.52A214 214 0 0 0 512 49.38a212.93 212.93 0 0 1-60.44 16.6 105.7 105.7 0 0 0 46.3-58.19 209 209 0 0 1-66.79 25.37 105.09 105.09 0 0 0-181.73 71.91 116.12 116.12 0 0 0 2.66 24c-87.28-4.3-164.73-46.3-216.56-109.82A105.48 105.48 0 0 0 68 159.6a106.27 106.27 0 0 1-47.53-13.11v1.43a105.28 105.28 0 0 0 84.21 103.06 105.67 105.67 0 0 1-47.33 1.84 105.06 105.06 0 0 0 98.14 72.94A210.72 210.72 0 0 1 25 370.84a202.17 202.17 0 0 1-25-1.43 298.85 298.85 0 0 0 160.83 46.92" fill="currentColor"></path></svg>
</a>
</li>
<li class="nav-item">
<a class="nav-link p-2" href="https://bootstrap-slack.herokuapp.com" target="_blank" rel="noopener" aria-label="Slack"><svg class="navbar-nav-svg" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512" focusable="false"><title>Slack</title><path fill="currentColor" d="M210.787 234.832l68.31-22.883 22.1 65.977-68.309 22.882z"></path><path d="M490.54 185.6C437.7 9.59 361.6-31.34 185.6 21.46S-31.3 150.4 21.46 326.4 150.4 543.3 326.4 490.54 543.34 361.6 490.54 185.6zM401.7 299.8l-33.15 11.05 11.46 34.38c4.5 13.92-2.87 29.06-16.78 33.56-2.87.82-6.14 1.64-9 1.23a27.32 27.32 0 0 1-24.56-18l-11.46-34.38-68.36 22.92 11.46 34.38c4.5 13.92-2.87 29.06-16.78 33.56-2.87.82-6.14 1.64-9 1.23a27.32 27.32 0 0 1-24.56-18l-11.46-34.43-33.15 11.05c-2.87.82-6.14 1.64-9 1.23a27.32 27.32 0 0 1-24.56-18c-4.5-13.92 2.87-29.06 16.78-33.56l33.12-11.03-22.1-65.9-33.15 11.05c-2.87.82-6.14 1.64-9 1.23a27.32 27.32 0 0 1-24.56-18c-4.48-13.93 2.89-29.07 16.81-33.58l33.15-11.05-11.46-34.38c-4.5-13.92 2.87-29.06 16.78-33.56s29.06 2.87 33.56 16.78l11.46 34.38 68.36-22.92-11.46-34.38c-4.5-13.92 2.87-29.06 16.78-33.56s29.06 2.87 33.56 16.78l11.47 34.42 33.15-11.05c13.92-4.5 29.06 2.87 33.56 16.78s-2.87 29.06-16.78 33.56L329.7 194.6l22.1 65.9 33.15-11.05c13.92-4.5 29.06 2.87 33.56 16.78s-2.88 29.07-16.81 33.57z" fill="currentColor"></path></svg>
</a>
</li>
</ul>
<a class="btn btn-bd-download d-none d-lg-inline-block mb-3 mb-md-0 ml-md-3" href="https://github.com/twbs/bootstrap/archive/v4.0.0.zip">Download</a>
</header>
<div class="container-fluid">
<div class="row flex-xl-nowrap">
<div class="col-12 col-md-3 col-xl-2 bd-sidebar">
<form class="bd-search d-flex align-items-center">
<span class="algolia-autocomplete" style="position: relative; display: inline-block; direction: ltr;"><input type="search" class="form-control ds-input" id="search-input" placeholder="Search..." aria-label="Search for..." autocomplete="off" spellcheck="false" role="combobox" aria-autocomplete="list" aria-expanded="false" aria-owns="algolia-autocomplete-listbox-0" dir="auto" style="position: relative; vertical-align: top;"><pre aria-hidden="true" style="position: absolute; visibility: hidden; white-space: pre; font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol"; font-size: 16px; font-style: normal; font-variant: normal; font-weight: 400; word-spacing: 0px; letter-spacing: normal; text-indent: 0px; text-rendering: auto; text-transform: none;"></pre><span class="ds-dropdown-menu" role="listbox" id="algolia-autocomplete-listbox-0" style="position: absolute; top: 100%; z-index: 100; display: none; left: 0px; right: auto;"><div class="ds-dataset-1"></div></span></span>
<button class="btn btn-link bd-search-docs-toggle d-md-none p-0 ml-3" type="button" data-toggle="collapse" data-target="#bd-docs-nav" aria-controls="bd-docs-nav" aria-expanded="false" aria-label="Toggle docs navigation"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 30 30" width="30" height="30" focusable="false"><title>Menu</title><path stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-miterlimit="10" d="M4 7h22M4 15h22M4 23h22"></path></svg>
</button>
</form>
.
<div class="bd-example">
<div id="accordion">
<div class="card">
<div class="card-header" id="headingOne">
<h5 class="mb-0">
<button class="btn btn-link" data-toggle="collapse" data-target="#collapseOne" aria-expanded="true" aria-controls="collapseOne">
Collapsible Group Item #1
</button>
</h5>
</div>
<div id="collapseOne" class="collapse show" aria-labelledby="headingOne" data-parent="#accordion">
<div class="card-body">
Anim pariatur cliche reprehenderit, enim eiusmod high life accusamus terry richardson ad squid. 3 wolf moon officia aute, non cupidatat skateboard dolor brunch. Food truck quinoa nesciunt laborum eiusmod. Brunch 3 wolf moon tempor, sunt aliqua put a bird on it squid single-origin coffee nulla assumenda shoreditch et. Nihil anim keffiyeh helvetica, craft beer labore wes anderson cred nesciunt sapiente ea proident. Ad vegan excepteur butcher vice lomo. Leggings occaecat craft beer farm-to-table, raw denim aesthetic synth nesciunt you probably haven't heard of them accusamus labore sustainable VHS.
</div>
</div>
</div>
<div class="card">
<div class="card-header" id="headingTwo">
<h5 class="mb-0">
<button class="btn btn-link collapsed" data-toggle="collapse" data-target="#collapseTwo" aria-expanded="false" aria-controls="collapseTwo">
Collapsible Group Item #2
</button>
</h5>
</div>
<div id="collapseTwo" class="collapse" aria-labelledby="headingTwo" data-parent="#accordion">
<div class="card-body">
Anim pariatur cliche reprehenderit, enim eiusmod high life accusamus terry richardson ad squid. 3 wolf moon officia aute, non cupidatat skateboard dolor brunch. Food truck quinoa nesciunt laborum eiusmod. Brunch 3 wolf moon tempor, sunt aliqua put a bird on it squid single-origin coffee nulla assumenda shoreditch et. Nihil anim keffiyeh helvetica, craft beer labore wes anderson cred nesciunt sapiente ea proident. Ad vegan excepteur butcher vice lomo. Leggings occaecat craft beer farm-to-table, raw denim aesthetic synth nesciunt you probably haven't heard of them accusamus labore sustainable VHS.
</div>
</div>
</div>
<div class="card">
<div class="card-header" id="headingThree">
<h5 class="mb-0">
<button class="btn btn-link collapsed" data-toggle="collapse" data-target="#collapseThree" aria-expanded="false" aria-controls="collapseThree">
Collapsible Group Item #3
</button>
</h5>
</div>
.
<table class="table table-bordered table-striped">
<thead>
<tr>
<th style="width: 100px;">Name</th>
<th style="width: 50px;">Type</th>
<th style="width: 50px;">Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>parent</td>
<td>selector | jQuery object | DOM element </td>
<td>false</td>
<td>If parent is provided, then all collapsible elements under the specified parent will be closed when this collapsible item is shown. (similar to traditional accordion behavior - this is dependent on the <code>card</code> class). The attribute has to be set on the target collapsible area.</td>
</tr>
<tr>
<td>toggle</td>
<td>boolean</td>
<td>true</td>
<td>Toggles the collapsible element on invocation</td>
</tr>
</tbody>
</table>
.
<table class="table table-bordered table-striped">
<thead>
<tr>
<th style="width: 150px;">Event Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>show.bs.collapse</td>
<td>This event fires immediately when the <code>show</code> instance method is called.</td>
</tr>
<tr>
<td>shown.bs.collapse</td>
<td>This event is fired when a collapse element has been made visible to the user (will wait for CSS transitions to complete).</td>
</tr>
<tr>
<td>hide.bs.collapse</td>
<td>This event is fired immediately when the <code>hide</code> method has been called.</td>
</tr>
<tr>
<td>hidden.bs.collapse</td>
<td>This event is fired when a collapse element has been hidden from the user (will wait for CSS transitions to complete).</td>
</tr>
</tbody>
</table>
<div class="bd-clipboard"><button class="btn-clipboard" title="" data-original-title="Copy to clipboard">Copy</button></div><figure class="highlight"><pre><code class="language-js" data-lang="js"><span class="nx">$</span><span class="p">(</span><span class="s1">'#myCollapsible'</span><span class="p">).</span><span class="nx">on</span><span class="p">(</span><span class="s1">'hidden.bs.collapse'</span><span class="p">,</span> <span class="kd">function</span> <span class="p">()</span> <span class="p">{</span>
<span class="c1">// do something…</span>
<span class="p">})</span></code></pre></figure>
</main>
</div>
</div>
<script src="https://code.jquery.com/jquery-3.2.1.slim.min.js" integrity="sha384-KJ3o2DKtIkvYIK3UENzmM7KCkRr/rE9/Qpg6aAZGJwFDMVNA/GpGFF93hXpG5KkN" crossorigin="anonymous"></script>
<script>window.jQuery || document.write('<script src="/docs/4.0/assets/js/vendor/jquery-slim.min.js"><\/script>')</script>
<script src="/docs/4.0/assets/js/vendor/popper.min.js" integrity="sha384-ApNbgh9B+Y1QKtv3Rn7W3mgPxhU9K/ScQsAP7hUibX39j7fakFPskvXusvfa0b4Q" crossorigin="anonymous"></script><script src="/docs/4.0/dist/js/bootstrap.min.js" integrity="sha384-JZR6Spejh4U02d8jOt6vLEHfe/JQGiRRSQQxSfFWpi1MquVdAyjUar5+76PVCmYl" crossorigin="anonymous"></script><script src="https://cdn.jsdelivr.net/npm/docsearch.js@2/dist/cdn/docsearch.min.js"></script><script src="/docs/4.0/assets/js/docs.min.js"></script>
</body></html>
The DOM nodes you are looking for may be inside iframes. In that case, you need to switch into each iframe and look for them there:
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.google.com")
# find_elements(By.TAG_NAME, ...) replaces find_elements_by_tag_name, which recent Selenium 4 releases removed
iframes = driver.find_elements(By.TAG_NAME, 'iframe')
for pos, iframe in enumerate(iframes):
    src = iframe.get_attribute('src')
    driver.switch_to.frame(iframe)
    # driver = WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.ID, "...")))
    source = getattr(driver, "page_source", "no page source")
    print(pos, src, len(source), source[:1000])
    driver.switch_to.default_content()
It's hard to tell from context, but if you have a string containing the page's HTML source, then parsing it with Beautiful Soup will do. That may not be ideal if you need to keep the number of dependencies as small as possible, but that's an easy fix.
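As a rough sketch of that idea, the string below stands in for what driver.page_source would return. The markup is a shortened, hypothetical snippet modelled on the accordion in the console output above, and the selector is only an assumption about which elements are wanted.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source (a plain str of HTML).
page_source = '''<html><body>
<div id="accordion">
  <div class="card"><h5><button data-target="#collapseOne">
    Collapsible Group Item #1
  </button></h5></div>
  <div class="card"><h5><button data-target="#collapseTwo">
    Collapsible Group Item #2
  </button></h5></div>
</div>
</body></html>'''

soup = BeautifulSoup(page_source, 'html.parser')

# Pull the visible label and the data-target attribute of every accordion button.
buttons = soup.select('#accordion .card button')
titles = [btn.get_text(strip=True) for btn in buttons]
targets = [btn['data-target'] for btn in buttons]
print(titles)
print(targets)
```

Because the parsing works on a plain string, the same code applies whether the HTML came from Selenium, a saved file, or an HTTP client.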
This is part of an HTML page from which I need to extract the following items:
the name from the strong tag, the classification type (Actor and Singer), and the born and died locations.
<li class="clearfix">
<div style="margin-top:10px;">
<div class="float-left" style="margin-bottom:10px;">
<a href="http://" title="Elvis Presley" name="Elvis Presley" class="float-left">
<strong>Mr. Elvis Presley</strong></a>
</div>
<div class="rating_overall fleft" style="margin:0px 0px 0px 10px;">
<div class="rating_overall voted_rating_overall" style='width:72.96px;'></div>
</div>
<span class="result-vote float-left" id="result" style="line-height:15px; color: #AAA; font-size: 0.9em; margin-top: 1px;"> (15 vots)</span>
<div class="clear"></div>
<a href="http://" title="Mr. Elvis Presley" name="Mr. Elvis Presley">
<img style="float:left;" src="http://a.jpg" alt="Mr. Elvis Presley" title="Mr. Elvis Presley" />
</a>
<br/>
<p>
<b>Classification:</b>
Actor
, Singer
<br />
<b>Born:</b> Tupelo<br />
<b>Died:</b>
Memphis,
<!--<b>City:</b>-->
Memphis
</p>
<div class="clk"></div>
</div>
</li>
I tried using BeautifulSoup, but I'm new to Python:
data2 = soup.find_all('li',{'class':'clearfix'})
for container in data2:
    if container.find('a', {'class':'float-left'}):
        name = container.a.text
        print (name)
    if container.find('a', {'class':'underline'}):
        classification=container.div.p.a.text
        print (classification)
Although I didn't get any errors from the script, I managed to extract only the name and the first classification. How do I target the rest of the elements that I need: classification("Singer") and the born and died location?
You can use Beautiful Soup as the HTML parser. I am showing both approaches: first with Beautiful Soup, and second with regex, catching the results with capture groups:
First, with Beautiful Soup:
string_1="""<li class="clearfix">
<div style="margin-top:10px;">
<div class="float-left" style="margin-bottom:10px;">
<a href="http://" title="Elvis Presley" name="Elvis Presley" class="float-left">
<strong>Mr. Elvis Presley</strong></a>
</div>
<div class="rating_overall fleft" style="margin:0px 0px 0px 10px;">
<div class="rating_overall voted_rating_overall" style='width:72.96px;'></div>
</div>
<span class="result-vote float-left" id="result" style="line-height:15px; color: #AAA; font-size: 0.9em; margin-top: 1px;"> (15 vots)</span>
<div class="clear"></div>
<a href="http://" title="Mr. Elvis Presley" name="Mr. Elvis Presley">
<img style="float:left;" src="http://a.jpg" alt="Mr. Elvis Presley" title="Mr. Elvis Presley" />
</a>
<br/>
<p>
<b>Classification:</b>
Actor
, Singer
<br />
<b>Born:</b> Tupelo<br />
<b>Died:</b>
Memphis,
<!--<b>City:</b>-->
Memphis
</p>
<div class="clk"></div>
</div>
</li>"""
from bs4 import BeautifulSoup
soup=BeautifulSoup(string_1,"html.parser")
for a in soup.find_all('a'):
    print(a['name'])
Output:
Elvis Presley
Mr. Elvis Presley
Actor
Singer
Tupelo
Memphis
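In string_1 as it is printed above, only the two name links are `<a>` tags with a `name` attribute; the classification and place values appear as bare text inside the `<p>`. If your real page looks like that, one option is to walk the text that follows each `<b>` label. This is only a sketch under that assumption: `extract_fields` and the trimmed `html` sample are names made up for illustration, not part of the original page or of Beautiful Soup.

```python
from bs4 import BeautifulSoup, Comment, Tag

# Trimmed copy of string_1 (assumed to be representative of the real page).
html = """<li class="clearfix">
<a href="http://" title="Elvis Presley" name="Elvis Presley" class="float-left">
<strong>Mr. Elvis Presley</strong></a>
<p>
<b>Classification:</b>
Actor
, Singer
<br />
<b>Born:</b> Tupelo<br />
<b>Died:</b>
Memphis,
<!--<b>City:</b>-->
Memphis
</p>
</li>"""


def extract_fields(markup):
    """Return (name, fields), where fields maps each <b> label to a list of values."""
    soup = BeautifulSoup(markup, "html.parser")
    name = soup.find("strong").get_text(strip=True)
    fields = {}
    for label in soup.find("p").find_all("b"):
        key = label.get_text(strip=True).rstrip(":")
        chunks = []
        for node in label.next_siblings:
            if isinstance(node, Tag):
                if node.name == "b":
                    break      # the next label starts here
                continue       # skip other tags such as <br/>
            if isinstance(node, Comment):
                continue       # ignore the commented-out City label
            chunks.append(str(node))
        text = " ".join(chunks)
        # Values are comma-separated; drop the empty pieces left by whitespace.
        fields[key] = [part.strip() for part in text.split(",") if part.strip()]
    return name, fields


name, fields = extract_fields(html)
print(name)
for key, values in fields.items():
    print(key, values)
```

Each label maps to a list, so "Actor" and "Singer" come back as separate items under Classification.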
Second, with regex:
Use this only if the markup has the same form as what you have shown:
import re
string_1="""<li class="clearfix">
<div style="margin-top:10px;">
<div class="float-left" style="margin-bottom:10px;">
<a href="http://" title="Elvis Presley" name="Elvis Presley" class="float-left">
<strong>Mr. Elvis Presley</strong></a>
</div>
<div class="rating_overall fleft" style="margin:0px 0px 0px 10px;">
<div class="rating_overall voted_rating_overall" style='width:72.96px;'></div>
</div>
<span class="result-vote float-left" id="result" style="line-height:15px; color: #AAA; font-size: 0.9em; margin-top: 1px;"> (15 vots)</span>
<div class="clear"></div>
<a href="http://" title="Mr. Elvis Presley" name="Mr. Elvis Presley">
<img style="float:left;" src="http://a.jpg" alt="Mr. Elvis Presley" title="Mr. Elvis Presley" />
</a>
<br/>
<p>
<b>Classification:</b>
Actor
, Singer
<br />
<b>Born:</b> Tupelo<br />
<b>Died:</b>
Memphis,
<!--<b>City:</b>-->
Memphis
</p>
<div class="clk"></div>
</div>
</li>"""
pattern=r'<strong>(\w.+)<\/strong>|<b>Classification:<\/b>(\s.+)(\s.+)|(Born:.+)|(Died:.+\s.+\s.+\s.+)'
pattern_2=r'name=["](\w.+?)["]'
match = re.finditer(pattern, string_1, re.M)
for find in match:
    # Exactly one alternative of the pattern matches per iteration.
    if find.group(1):
        print("Name {}".format(find.group(1)))
    if find.group(2):
        # pattern_2 pulls the value out of a name="..." attribute in the captured text.
        print("Classification first {}".format(re.search(pattern_2, str(find.group(2))).group(1)))
        print("Classification second {}".format(re.search(pattern_2, str(find.group(3))).group(1)))
    if find.group(4):
        print("Born {}".format(re.search(pattern_2, str(find.group(4))).group(1)))
    if find.group(5):
        print("Dead {}".format(re.search(pattern_2, str(find.group(5))).group(1)))
Output:
Name Mr. Elvis Presley
Classification first Actor
Classification second Singer
Born Tupelo
Dead Memphis
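The second pattern above looks for a `name="..."` attribute inside each captured group. In string_1 as it is printed here, the classification and place values are plain text, so a pattern keyed on the `<b>` labels may be closer to what is needed. The following is a rough sketch under that assumption; `fields_by_label` and `snippet` are illustrative names, and like any regex over HTML it is fragile if the markup changes.

```python
import re

# The <p> block from string_1, copied as shown above.
snippet = """<p>
<b>Classification:</b>
Actor
, Singer
<br />
<b>Born:</b> Tupelo<br />
<b>Died:</b>
Memphis,
<!--<b>City:</b>-->
Memphis
</p>"""


def fields_by_label(markup):
    """Map each <b>Label:</b> to the comma-separated values that follow it."""
    # Drop HTML comments first so the commented-out <b>City:</b> is ignored.
    markup = re.sub(r"<!--.*?-->", "", markup, flags=re.S)
    fields = {}
    # Capture the text after each label, up to the next <b> label or </p>.
    for label, body in re.findall(r"<b>([^<:]+):</b>(.*?)(?=<b>|</p>)", markup, flags=re.S):
        text = re.sub(r"<[^>]+>", " ", body)  # strip leftover tags such as <br />
        fields[label] = [part.strip() for part in text.split(",") if part.strip()]
    return fields


for label, values in fields_by_label(snippet).items():
    print(label, values)
```

The lookahead `(?=<b>|</p>)` stops each capture at the next label without consuming it, so `<br />` is not mistaken for a `<b>` tag.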