Get the Text from the next_sibling - BeautifulSoup 4

Get the Text from the next_sibling - BeautifulSoup 4 - python

I want to scrape Restaurants from this URL
for rests in dining_soup.select("div.infos-restos"):
for rest in rests.select("h3"):
safe_print(" Rest Nsme: "+rest.text)
print(rest.next_sibling.next_sibling.next_sibling.next_sibling.contents)
outputs
<div class="descriptif-resto">
<p>
<strong>Type of cuisine</strong>:International</p>
<p>
<strong>Opening hours</strong>:06:00-23:30</p>
<p>The Food Square bar and restaurant offers a varied menu in an elegant and welcoming setting. In fine weather you can also enjoy your meal next to the pool or relax on the garden terrace.</p>
</div>
and
print(rest.next_sibling.next_sibling.next_sibling.next_sibling.text)
outputs always empty
So my question is how do I scrape Type of cuisine and opening hours from that Div?

Opening hours and cuisine are in "descriptif-resto" text:
import requests
from bs4 import BeautifulSoup
r = requests.get("http://www.accorhotels.com/gb/hotel-5548-mercure-niederbronn-hotel/restaurant.shtml")
soup = BeautifulSoup(r.content)
print(soup.find("div",attrs={"class":"descriptif-resto"}).text)
Type of cuisine:Brasserie
Opening hours:12:00 - 14:00 / 19:00 - 22:00
The name is in the first h3 tag, the type and opening hours are in the two p tags:
name = soup.find("div", attrs={"class":"infos-restos"}).h3.text
det = soup.find("div",attrs={"class":"descriptif-resto"}).p
hours = det.find_next("p").text
tpe = det.text
print(name)
print(hours)
print(tpe)
LA STUB DU CASINO
Opening hours:12:00 - 14:00 / 19:00 - 22:00
Type of cuisine:Brasserie
Ok so some parts don't have both opening hours and cuisine so you will have to fine tune that but this gets all the info:
from itertools import chain
all_dets = soup.find_all("div", attrs={"class":"infos-restos"})
# get all names from h3 tagsusing chain so we can zip later
names = chain.from_iterable(x.find_all("h3") for x in all_dets)
# get all info to extract cuisine, hours
det = chain.from_iterable(x.find_all("div",attrs={"class":"descriptif-resto"}) for x in all_dets)
# zipp appropriate details with each name
zipped = zip(names, det)
for name, det in zipped:
details = det.p
name, tpe = name.text, details
hours = details.find_next("p") if "cuisine" in det.p.text else ""
if hours: # empty string means we have a bar
print(name, tpe.text, hours.text)
else:
print(name, tpe.text)
print("-----------------------------")
LA STUB DU CASINO
Type of cuisine:Brasserie
Opening hours:12:00 - 14:00 / 19:00 - 22:00
-----------------------------
RESTAURANT DU CASINO IVORY
Type of cuisine:French
Opening hours:19:00 - 22:00
-----------------------------
BAR DE L'HOTEL LE DOLLY
Opening hours:10:00-01:00
-----------------------------
BAR DES MACHINES A SOUS
Opening hours:10:30-03:00
-----------------------------

Related

How to use loop 'find next sibling' until reaching a certain tag when web scraping with beautifulsoup in python?

The webpage I'm attempting to scrape has a section where the html tags are nested like so:
<div>
<h3>
<p>
<p>
<h3>
<p>
<p>
<p>
My code is able to navigate to the correct tag but I am struggling to split the text by as is a sibling, not a child. I am either able to print just the tags or print all the text within the tag without splitting into sections.
I've tried using for loops but I don't think is the right approach if searching within siblings. I think looping an if statement to determine if find_next_sibling().name = 'h3' might work but I've been unable to iterate this without nesting a large number of if statements.
Can anyone please advise on what approach I should take? Please see my full code below - the treaty files section works fine.
from bs4 import BeautifulSoup
import requests
url = 'https://www.gov.uk/government/publications/albania-tax-treaties'
get_url = requests.get(url)
url_html = get_url.content
soup = BeautifulSoup(url_html, 'lxml')
treaty_files = soup.find_all('div', class_='attachment-details')
for treaty_file in treaty_files:
file_name = treaty_file.h3.a.text
file_url = treaty_file.h3.a['href']
#print(f"Treaty Name: {file_name}")
#print(f"Treaty URL: {file_url}")
#print()
#Attempt 1
treaty_details = soup.find('div', class_='govspeak').find_all('h3')
for treaty_content in treaty_details:
content = treaty_content.find_next_siblings()
for x in content:
test = x
a = test
#print(a)
#Attempt 2
treaty_details = soup.find('div', class_='govspeak').find_all('h3')
for treaty_content in treaty_details:
content = treaty_content.find_next_sibling()
while content.name != 'h3':
print(f"Text: {content.text}")
content = content.find_next_sibling()
if content.name == 'h3':
break

One possible solution is to leverage pandas.Series.groupby function to group sections together:
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = "https://www.gov.uk/government/publications/albania-tax-treaties"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
govspeak = soup.select_one(".govspeak")
s = pd.Series(govspeak.find_all(recursive=False))
for _, g in s.groupby(s.apply(lambda x: x.name).eq("h3").cumsum()):
title = g.iloc[0].text
text = "\n".join(row.text for row in g.iloc[1:])
print(title)
print("-" * 120)
print(text)
print()
print()
Prints:
2021 UK-Albania Synthesised text of the Multilateral Instrument and the 2013 Double Taxation Agreement — in force
------------------------------------------------------------------------------------------------------------------------
The 2013 UK-Albania Double Taxation Agreement has been modified by the Multilateral Instrument (MLI).
The modifications made by the Multilateral Instrument entered into force in:
the UK on 1 October 2018
Albania on 1 January 2021
They are effective in the UK from:
1 January 2021 for taxes withheld at source
1 April 2022 for Corporation Tax
6 April 2022 for Income Tax and Capital Gains Tax
They are effective in Albania from 1 July 2021.
2013 UK-Albania Double Taxation Agreement — in force
------------------------------------------------------------------------------------------------------------------------
The agreement entered into force on 30 December 2013.
It is effective in the UK from:
1 April 2014 for Corporation Tax
6 April 2014 for Income Tax and Capital Gains Tax
It is effective in Albania from 1 January 2014 for Income Tax and Capital Gains Tax.

BeautifulSoup not getting web data

I'm creating a web scraper in order to pull the name of a company from a chamber of commerce website directory.
Im using BeautifulSoup. The page and soup objects appear to be working, but when I scrape the HTML content, an empty list is returned when it should be filled with the directory names on the page.
Web page trying to scrape: https://www.austinchamber.com/directory
Here is the HTML:
<div>
<ul> class="item-list item-list--small"> == $0
<li>
<div class='item-content'>
<div class='item-description'>
<h5 class = 'h5'>Women Helping Women LLC</h5>
Here is the python code:
def pageRequest(url):
page = requests.get(url)
return page
def htmlSoup(page):
soup = BeautifulSoup(page.content, "html.parser")
return soup
def getNames(soup):
name = soup.find_all('h5', class_='h5')
return name
page = pageRequest("https://www.austinchamber.com/directory")
soup = htmlSoup(page)
name = getNames(soup)
for n in name:
print(n)

The data is loaded dynamically via Ajax. To get the data, you can use this script:
import json
import requests
url = 'https://www.austinchamber.com/api/v1/directory?filter[categories]=&filter[show]=all&page={page}&limit=24'
page = 1
for page in range(1, 10):
print('Page {}..'.format(page))
data = requests.get(url.format(page=page)).json()
# uncommentthis to print all data:
# print(json.dumps(data, indent=4))
for d in data['data']:
print(d['title'])
Prints:
...
Indeed
Austin Telco Federal Credit Union - Taos
Green Bank
Seton Medical Center Austin
Austin Telco Federal Credit Union - Jollyville
Page 42..
Texas State SBDC - San Marcos Office
PlainsCapital Bank - Motor Bank
University of Texas - Thompson Conference Center
Lamb's Tire & Automotive Centers - #2 Research & Braker
AT&T Labs
Prosperity Bank - Rollingwood
Kerbey Lane Cafe - Central
Lamb's Tire & Automotive Centers - #9 Bee Caves
Seton Medical Center Hays
PlainsCapital Bank - North Austin
Ellis & Salazar Body Shop
aLamb's Tire & Automotive Centers - #6 Lake Creek
Rudy's Country Store and BarBQ
...

Get Text from h1 with BeautifulSoup

I was asked to get a product name from a web.
I was asked to get this text:
SEIKO 5 AUTOMATIC MENS STEEL VINTAGE JAPAN MADE BLACK DIAL WATCH RUN ORDER K
This is my BeautifulSoup code:
import requests
from bs4 import BeautifulSoup
get = requests.get('https://www.ebay.com/itm/SEIKO-5-AUTOMATIC-MENS-STEEL-VINTAGE-JAPAN-MADE-BLACK-DIAL-WATCH-RUN-ORDER-K/143420840058?epid=18032713872&_trkparms=ispr%3D1&hash=item21648c587a:g:ZzEAAOSw9MRdsI8v&enc=AQAEAAACQBPxNw%2BVj6nta7CKEs3N0qVBgKB1sCHq6imZgPqwOxGc8125XNy2Dq0slMe8clDZgTSnJdS4K5F5NyTF%2FwJExAng2G2%2FdtRUNYEnKcxoo4WXaAM5K%2BUxqDKTnmNGfgjTzpWCdoE50XlC7BXz3bBrJTY0vo62kBVR03HYvJwVCxnu8NEBiz4YMfAlPWDNnP2lVje46p22rKWDem6rHFqpoKtLDVHS8CaQER%2BqJxucEnw14LJIybRkfCmDuobZv%2F4F9Lhrl8xiPp%2Bbk6iRIu3UqqocBO%2FNyxW1aAa8QWkaJqtUy3g6Yue61yMEb0GY3BwO1%2BpVwkTOZLDvYHXZ%2FZEGNu%2F%2BYznes9jNtctDCr9Xv3QECsXyLDEOeo7LHh1srunEoRvK9T0AkS7oT%2BI3%2B%2BtD5fGnpJJu%2FJ3MdktqvgnTwieipeZTrGsHiQ8iL1nWm0CJcMbe2UUELEG%2BLHPNSSkRcUVBWnoPuOE5FjuyFHR1ujG2TgGLfN8HlO6ZyfNWz0K%2Bc4zjo7wBPnJdffcn6p8kLHWhbFyMyIY1Jc8yZBl20mlA29S%2BN%2Bw0e3uZDHK%2BIyCBctbYgGxaQM6Aevcdx0OcXl%2Fy7aDoRTqhBue9OYrAa3fEQf6ObFqtCbiEiXTioQZZJfrC%2FXfbq36oMTuQAFRvH2ahowGoPhSQkE1Jn73QLI%2FGXVynHIG2KdQSbX4eU%2FgoGy9y5WIvvUL9Xxy4ltNvTtCpjg5XlY8VxDv4M2gsLY3C0SRv7LNELk%2FitBSjfuUjzg%3D%3D&checksum=143420840058aa89790ec2164a5caf16644bb1bfd7c8&enc=AQAEAAACQBPxNw%2BVj6nta7CKEs3N0qVBgKB1sCHq6imZgPqwOxGc8125XNy2Dq0slMe8clDZgTSnJdS4K5F5NyTF%2FwJExAng2G2%2FdtRUNYEnKcxoo4WXaAM5K%2BUxqDKTnmNGfgjTzpWCdoE50XlC7BXz3bBrJTY0vo62kBVR03HYvJwVCxnu8NEBiz4YMfAlPWDNnP2lVje46p22rKWDem6rHFqpoKtLDVHS8CaQER%2BqJxucEnw14LJIybRkfCmDuobZv%2F4F9Lhrl8xiPp%2Bbk6iRIu3UqqocBO%2FNyxW1aAa8QWkaJqtUy3g6Yue61yMEb0GY3BwO1%2BpVwkTOZLDvYHXZ%2FZEGNu%2F%2BYznes9jNtctDCr9Xv3QECsXyLDEOeo7LHh1srunEoRvK9T0AkS7oT%2BI3%2B%2BtD5fGnpJJu%2FJ3MdktqvgnTwieipeZTrGsHiQ8iL1nWm0CJcMbe2UUELEG%2BLHPNSSkRcUVBWnoPuOE5FjuyFHR1ujG2TgGLfN8HlO6ZyfNWz0K%2Bc4zjo7wBPnJdffcn6p8kLHWhbFyMyIY1Jc8yZBl20mlA29S%2BN%2Bw0e3uZDHK%2BIyCBctbYgGxaQM6Aevcdx0OcXl%2Fy7aDoRTqhBue9OYrAa3fEQf6ObFqtCbiEiXTioQZZJfrC%2FXfbq36oMTuQAFRvH2ahowGoPhSQkE1Jn73QLI%2FGXVynHIG2KdQSbX4eU%2FgoGy9y5WIvvUL9Xxy4ltNvTtCpjg5XlY8VxDv4M2gsLY3C0SRv7LNELk%2FitBSjfuUjzg%3D%3D&checksum=143420840058aa89790ec2164a5caf16644bb1bfd7c8')
soup = BeautifulSoup(get.text, 'lxml')
company = soup.select('h1.it-ttl')[0].text.strip()
print(company)
The HTML from the code is:
<h1 class="it-ttl" id="itemTitle" itemprop="name">
<span class="g-hdn">Details about
</span>
SEIKO 5 AUTOMATIC MENS STEEL VINTAGE JAPAN MADE BLACK DIAL WATCH RUN ORDER K
</h1>
Instead of the desired text, I get this:
Details about SEIKO 5 AUTOMATIC MENS STEEL VINTAGE JAPAN MADE BLACK DIAL WATCH RUN ORDER K
How can I extract only the product name?

import requests
from bs4 import BeautifulSoup
get = requests.get('https://www.ebay.com/itm/SEIKO-5-AUTOMATIC-MENS-STEEL-VINTAGE-JAPAN-MADE-BLACK-DIAL-WATCH-RUN-ORDER-K/143420840058?epid=18032713872&_trkparms=ispr%3D1&hash=item21648c587a:g:ZzEAAOSw9MRdsI8v&enc=AQAEAAACQBPxNw%2BVj6nta7CKEs3N0qVBgKB1sCHq6imZgPqwOxGc8125XNy2Dq0slMe8clDZgTSnJdS4K5F5NyTF%2FwJExAng2G2%2FdtRUNYEnKcxoo4WXaAM5K%2BUxqDKTnmNGfgjTzpWCdoE50XlC7BXz3bBrJTY0vo62kBVR03HYvJwVCxnu8NEBiz4YMfAlPWDNnP2lVje46p22rKWDem6rHFqpoKtLDVHS8CaQER%2BqJxucEnw14LJIybRkfCmDuobZv%2F4F9Lhrl8xiPp%2Bbk6iRIu3UqqocBO%2FNyxW1aAa8QWkaJqtUy3g6Yue61yMEb0GY3BwO1%2BpVwkTOZLDvYHXZ%2FZEGNu%2F%2BYznes9jNtctDCr9Xv3QECsXyLDEOeo7LHh1srunEoRvK9T0AkS7oT%2BI3%2B%2BtD5fGnpJJu%2FJ3MdktqvgnTwieipeZTrGsHiQ8iL1nWm0CJcMbe2UUELEG%2BLHPNSSkRcUVBWnoPuOE5FjuyFHR1ujG2TgGLfN8HlO6ZyfNWz0K%2Bc4zjo7wBPnJdffcn6p8kLHWhbFyMyIY1Jc8yZBl20mlA29S%2BN%2Bw0e3uZDHK%2BIyCBctbYgGxaQM6Aevcdx0OcXl%2Fy7aDoRTqhBue9OYrAa3fEQf6ObFqtCbiEiXTioQZZJfrC%2FXfbq36oMTuQAFRvH2ahowGoPhSQkE1Jn73QLI%2FGXVynHIG2KdQSbX4eU%2FgoGy9y5WIvvUL9Xxy4ltNvTtCpjg5XlY8VxDv4M2gsLY3C0SRv7LNELk%2FitBSjfuUjzg%3D%3D&checksum=143420840058aa89790ec2164a5caf16644bb1bfd7c8&enc=AQAEAAACQBPxNw%2BVj6nta7CKEs3N0qVBgKB1sCHq6imZgPqwOxGc8125XNy2Dq0slMe8clDZgTSnJdS4K5F5NyTF%2FwJExAng2G2%2FdtRUNYEnKcxoo4WXaAM5K%2BUxqDKTnmNGfgjTzpWCdoE50XlC7BXz3bBrJTY0vo62kBVR03HYvJwVCxnu8NEBiz4YMfAlPWDNnP2lVje46p22rKWDem6rHFqpoKtLDVHS8CaQER%2BqJxucEnw14LJIybRkfCmDuobZv%2F4F9Lhrl8xiPp%2Bbk6iRIu3UqqocBO%2FNyxW1aAa8QWkaJqtUy3g6Yue61yMEb0GY3BwO1%2BpVwkTOZLDvYHXZ%2FZEGNu%2F%2BYznes9jNtctDCr9Xv3QECsXyLDEOeo7LHh1srunEoRvK9T0AkS7oT%2BI3%2B%2BtD5fGnpJJu%2FJ3MdktqvgnTwieipeZTrGsHiQ8iL1nWm0CJcMbe2UUELEG%2BLHPNSSkRcUVBWnoPuOE5FjuyFHR1ujG2TgGLfN8HlO6ZyfNWz0K%2Bc4zjo7wBPnJdffcn6p8kLHWhbFyMyIY1Jc8yZBl20mlA29S%2BN%2Bw0e3uZDHK%2BIyCBctbYgGxaQM6Aevcdx0OcXl%2Fy7aDoRTqhBue9OYrAa3fEQf6ObFqtCbiEiXTioQZZJfrC%2FXfbq36oMTuQAFRvH2ahowGoPhSQkE1Jn73QLI%2FGXVynHIG2KdQSbX4eU%2FgoGy9y5WIvvUL9Xxy4ltNvTtCpjg5XlY8VxDv4M2gsLY3C0SRv7LNELk%2FitBSjfuUjzg%3D%3D&checksum=143420840058aa89790ec2164a5caf16644bb1bfd7c8')
soup = BeautifulSoup(get.text, 'html.parser')
company = soup.select('h1.it-ttl')[0].text.strip()
span_text = soup.select('span.g-hdn')[0].text.strip()
print(company)
print(span_text)
print(company.lstrip(span_text))
Since the span tag is nested in the h1 tag, the necessary step is to extract the span text and remove it from the h1 tag with the lstrip method.

Return empty bracket [ ] when web scraping

I try to print all the titles on nytimes.com. I used requests and beautifulsoup module. But I got empty brackets in the end. The return result is [ ]. How can I fix this problem?
import requests
from bs4 import BeautifulSoup
url = "https://www.nytimes.com/"
r = requests.get(url)
text = r.text
soup = BeautifulSoup(text, "html.parser")
title = soup.find_all("span", "balanceHeadline")
print(title)

I am assuming that you are trying to retrieve the headlines of nytimes. Doing title = soup.find_all("span", {'class':'balancedHeadline'}) will not get you your results. The <span> tag found using the element selector is often misleading. What you have to do is to look into the source code of the page and find the tags wrapped around the title.
For nytimes its a little tricky because the headlines are wrapped in the <script> tag with a lot of junk inside. Hence what you can do is to "clean" it first and deserialize the string by convertinng it into a python dictionary object.
import requests
from bs4 import BeautifulSoup
import json
url = "https://www.nytimes.com/"
r = requests.get(url)
r_html = r.text
soup = BeautifulSoup(r_html, "html.parser")
scripts = soup.find_all('script')
for script in scripts:
if 'preloadedData' in script.text:
jsonStr = script.text
jsonStr = jsonStr.split('=', 1)[1].strip() # remove "window.__preloadedData = "
jsonStr = jsonStr.rsplit(';', 1)[0] # remove trailing ;
jsonStr = json.loads(jsonStr)
for key,value in jsonStr['initialState'].items():
try:
if value['promotionalHeadline'] != "":
print(value['promotionalHeadline'])
except:
continue
outputs
Jeffrey Epstein Autopsy Results Conclude He Hanged Himself
Trump and Netanyahu Put Bipartisan Support for Israel at Risk
Congresswoman Rejects Israel’s Offer of a West Bank Visit
In Tlaib’s Ancestral Village, a Grandmother Weathers a Global Political Storm
Cathay Chief’s Resignation Shows China’s Power Over Hong Kong Unrest
Trump Administration Approves Fighter Jet Sales to Taiwan
Peace Road Map for Afghanistan Will Let Taliban Negotiate Women’s Rights
Debate Flares Over Afghanistan as Trump Considers Troop Withdrawal
In El Paso, Hundreds Show Up to Mourn a Woman They Didn’t Know
Is Slavery’s Legacy in the Power Dynamics of Sports?
Listen: ‘Modern Love’ Podcast
‘The Interpreter’
If You Think Trump Is Helping Israel, You’re a Fool
First They Came for the Black Feminists
How Women Can Escape the Likability Trap
With Trump as President, the World Is Spiraling Into Chaos
To Understand Hong Kong, Don’t Think About Tiananmen
The Abrupt End of My Big-Girl Summer
From Trump Boom to Trump Gloom
What Are Trump and Netanyahu Afraid Of?
King Bibi Bows Before a Tweet
Ebola Could Be Eradicated — But Only if the World Works Together
The Online Mob Came for Me. What Happened to the Reckoning?
A German TV Star Takes On Bullies
Why Is Hollywood So Scared of Climate Change?
Solving Medical Mysteries With Your Help: Now on Netflix

title = soup.find_all("span", "balanceHeadline")
replace it with
title = soup.find_all("span", {'class':'balanceHeadline'})

Difficulty using beautifulsoup in Python to scrape web data from multiple HTML classes

I am using Beautiful Soup in Python to scrape some data from a property listings site.
I have had success in scraping the individual elements that I require but wish to use a more efficient script to pull back all the data in one command if possible.
The difficulty is that the various elements I require reside in different classes.
I have tried the following, so far.
for listing in content.findAll('h2', attrs={"class": "listing-results-attr"}):
print(listing.text)
which successfully gives the following list
15 room mansion for sale
3 bed barn conversion for sale
2 room duplex for sale
1 bed garden shed for sale
Separately, to retrieve the address details for each listing I have used the following successfully;
for address in content.findAll('a', attrs={"class": "listing-results-address"}):
print(address.text)
which gives this
22 Acacia Avenue, CityName Postcode
100 Sleepy Hollow, CityName Postcode
742 Evergreen Terrace, CityName Postcode
31 Spooner Street, CityName Postcode
And for property price I have used this...
for prop_price in content.findAll('a', attrs={"class": "listing-results-price"}):
print(prop_price.text)
which gives...
$350,000
$1,250,000
$750,000
$100,000
This is great however I need to be able to pull back all of this information in a more efficient and performant way such that all the data comes back in one pass.
At present I can do this using something like the code below:
all = content.select("a.listing-results-attr, h2.listing-results-address, a.listing-results-price")
This works somewhat but brings back too much additional HTML tags and is just not nearly as elegant or sophisticated as I require. Results as follows.
</a>, <h2 class="listing-results-attr">
15 room mansion for sale
</h2>, <a class="listing-results-address" href="redacted">22 Acacia Avenue, CityName Postcode</a>, <a class="listing-results-price" href="redacted">
$350,000
Expected results should look something like this:
15 room mansion for sale
22 Acacia Avenue, CityName Postcode
$350,000
3 bed barn conversion for sale
100 Sleepy Hollow, CityName Postcode
$1,250,000
etc
etc
I then need to be able to store the results as JSON objects for later analysis.
Thanks in advance.

Change your selectors as shown below:
import requests
from bs4 import BeautifulSoup as bs
url = 'https://www.zoopla.co.uk/for-sale/property/caerphilly/?q=Caerphilly&results_sort=newest_listings&search_source=home'
r = requests.get(url)
soup = bs(r.content, 'lxml')
details = ([item.text.strip() for item in soup.select(".listing-results-attr a, .listing-results-address , .text-price")])
You can view separately with, for example,
prices = details[0::3]
descriptions = details[1::3]
addresses = details[2::3]
print(prices, descriptions, addresses)

find_all() function always returns a list, strip() is remove spaces at the beginning and at the end of the string.
import requests
from bs4 import BeautifulSoup as bs
url = 'https://www.zoopla.co.uk/for-sale/property/caerphilly/?q=Caerphilly&results_sort=newest_listings&search_source=home'
r = requests.get(url)
soup = bs(r.content, 'lxml')
results = soup.find("ul",{'class':"listing-results clearfix js-gtm-list"})
for li in results.find_all("li",{'class':"srp clearfix"}):
price = li.find("a",{"class":"listing-results-price text-price"}).text.strip()
address = li.find("a",{'class':"listing-results-address"}).text.strip()
description = li.find("h2",{'class':"listing-results-attr"}).find('a').text.strip()
print(description)
print(address)
print(price)
O/P:
2 bed detached bungalow for sale
Bronrhiw Fach, Caerphilly CF83
£159,950
2 bed semi-detached house for sale
Cwrt Nant Y Felin, Caerphilly CF83
£159,950
3 bed semi-detached house for sale
Pen-Y-Bryn, Caerphilly CF83
£102,950
.....

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Get the Text from the next_sibling - BeautifulSoup 4 - python

Related

How to use loop 'find next sibling' until reaching a certain tag when web scraping with beautifulsoup in python?

BeautifulSoup not getting web data

Get Text from h1 with BeautifulSoup

Return empty bracket [ ] when web scraping

Difficulty using beautifulsoup in Python to scrape web data from multiple HTML classes

Categories

Resources