I'm from Hong Kong and new to Python. I am trying to scrape information from the website below:
https://event.hktdc.com/en/?eventFormat=Exhibition&countryRegion=Hong-Kong&location=all&year=2023&p=1
My code is as follows:
import requests
from bs4 import BeautifulSoup
response = requests.get(
"https://event.hktdc.com/tc/?eventFormat=Exhibition&countryRegion=Hong-Kong&location=all&year=2023&p=1#list")
soup = BeautifulSoup(response.text, "html.parser")
print(soup.prettify())
How can I extract the information from this page?
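The URL carries the search filters and the page number as query parameters (`p=1` is the first page). Assuming each results page is served as plain HTML that `requests` can see (the `print(soup.prettify())` output will tell you), a minimal sketch for covering several pages is to generate one URL per page and fetch each in turn. The helper `with_page` and the page range are illustrative only, not part of any HKTDC API:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url, page):
    """Return `url` with its `p` query parameter set to `page`, keeping every other filter."""
    parts = urlsplit(url)
    # parse_qsl keeps the parameters in their original order; dict() preserves that order
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["p"] = str(page)
    # Rebuild the URL with the new query string; scheme, host, path and #fragment are unchanged
    return urlunsplit(parts._replace(query=urlencode(query)))


BASE = ("https://event.hktdc.com/en/?eventFormat=Exhibition"
        "&countryRegion=Hong-Kong&location=all&year=2023&p=1")

# Pages 1-3 are a placeholder range; the real number of pages has to be read from the site
page_urls = [with_page(BASE, n) for n in range(1, 4)]
for u in page_urls:
    print(u)
```

Each generated URL can then be passed to `requests.get` exactly as in the code above.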
Related
from bs4 import BeautifulSoup
import requests
url='https://play.google.com/store/apps/details?id=com.Shooter.ModernWarships'
req=requests.get(url)
soup = BeautifulSoup(req.content, 'html.parser')
print(soup.find_all("h1", class_="Fd93Bb ynrBgc xwcR9d"))
Avoid selecting by these class names: they are generated automatically and change often. Change your strategy and select by tag names or ids, which tend to be more stable.
from bs4 import BeautifulSoup
import requests
url='https://play.google.com/store/apps/details?id=com.Shooter.ModernWarships'
req=requests.get(url)
soup = BeautifulSoup(req.content, 'html.parser')
print(soup.h1.text)
or
print(soup.find('h1').text)
Output
MODERN WARSHIPS
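The same tag-based idea can be sketched without BeautifulSoup, using only the standard library's `html.parser` (a swap made here so the example runs with no extra packages). It ignores the `class` attribute entirely, so a change in Google's generated class names would not affect it. The helper name `first_tag_text` and the sample markup are hypothetical:

```python
from html.parser import HTMLParser


class FirstTagText(HTMLParser):
    """Collect the text inside the first occurrence of `tag`, whatever its classes are."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # > 0 while inside the target tag
        self.done = False   # set once the first target tag has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if not self.done and tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if not self.done and tag == self.tag and self.depth:
            self.depth -= 1
            self.done = self.depth == 0

    def handle_data(self, data):
        if self.depth and not self.done:
            self.parts.append(data)


def first_tag_text(html, tag="h1"):
    parser = FirstTagText(tag)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


# Hypothetical markup: the class names may change, the <h1> tag is what we rely on
SAMPLE = '<div><h1 class="Fd93Bb ynrBgc xwcR9d"><span>MODERN WARSHIPS</span></h1><h1>Other</h1></div>'
print(first_tag_text(SAMPLE))  # prints: MODERN WARSHIPS
```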
I tried to retrieve table data from the link below with Python. Unfortunately, the response contains all the HTML tags but not the table. Could you help me?
https://www150.statcan.gc.ca/n1/pub/71-607-x/2021004/exp-eng.htm?r1=(1)&r2=0&r3=0&r4=12&r5=0&r7=0&r8=2022-02-01&r9=2022-02-01
My code:
import requests
from bs4 import BeautifulSoup
url = 'https://www150.statcan.gc.ca/n1/pub/71-607-x/2021004/exp-eng.htm?r1=(1)&r2=0&r3=0&r4=12&r5=0&r7=0&r8=2022-02-01&r9=2022-02-01'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
print(soup)
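A quick check is to look for table rows in the HTML that `requests` actually received. If your data table is not among them, it is most likely filled in by the browser after the page loads (for example by JavaScript), and you would need to find the request that delivers its data in the browser's Network tab (F12) instead. A sketch of that check with the standard library's `html.parser`; the `table_rows` helper and the sample table are hypothetical:

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.row = None    # cells of the current row, None outside a <tr>
        self.cell = None   # text fragments of the current cell, None outside a cell

    def _close_cell(self):
        if self.cell is not None and self.row is not None:
            self.row.append("".join(self.cell).strip())
        self.cell = None

    def _close_row(self):
        self._close_cell()
        if self.row is not None:
            self.rows.append(self.row)
        self.row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()      # tolerate a missing </tr>
            self.row = []
        elif tag in ("td", "th") and self.row is not None:
            self._close_cell()     # tolerate a missing </td>
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def table_rows(html):
    """Return every table row found in `html` as a list of cell strings ([] if none)."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical table, only to show the output shape
SAMPLE_TABLE = ("<table><tr><th>Year</th><th>Value</th></tr>"
                "<tr><td>2022</td><td>1,234</td></tr></table>")
print(table_rows(SAMPLE_TABLE))
```

Calling `table_rows(r.text)` on the response from the code above returns only rows of tables that exist in the downloaded HTML; if the data you see in the browser is missing from the result, it is not in that HTML.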
I am trying to scrape a Chinese-language website, https://bo.io.gov.mo/bo/ii/2021/43/avisosoficiais_cn.asp, but the code below does not return the full HTML. Strangely, the same code does return the full HTML of the Portuguese version of the same page, https://bo.io.gov.mo/bo/ii/2021/43/avisosoficiais.asp. What is the problem?
from bs4 import BeautifulSoup
from urllib.request import urlopen
response = urlopen('https://bo.io.gov.mo/bo/ii/2021/43/avisosoficiais_cn.asp')
html_doc = response.read()
soup = BeautifulSoup(html_doc, 'lxml')
strhtm = soup.prettify()
print(strhtm)
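Since only the Chinese version fails, one possible cause worth ruling out (an assumption, not a confirmed diagnosis) is character encoding: if the bytes are decoded with the wrong encoding, the Chinese text can come out garbled or the parser may not see the whole document. A minimal sketch that decodes the raw bytes explicitly by trying candidate encodings in order; `decode_html` is a hypothetical helper and the candidate list is a guess to adjust to whatever the page declares:

```python
def decode_html(raw, candidates=("utf-8", "big5", "gb18030")):
    """Decode page bytes with the first candidate encoding that works.

    Returns (text, encoding_used). The candidate list is a guess for a
    Chinese-language page; put the page's declared charset first if it has one.
    """
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # bytes are not valid in this encoding, try the next one
    # Nothing decoded cleanly: fall back to the first candidate, replacing bad bytes with U+FFFD
    return raw.decode(candidates[0], errors="replace"), candidates[0]


# With the question's code this would be: text, enc = decode_html(html_doc)
sample = "\u4e2d\u6587".encode("big5")  # the two characters for "Chinese", encoded as Big5
text, enc = decode_html(sample)
print(enc, len(text))
```

The decoded string can then be passed to `BeautifulSoup` in place of `html_doc`. BeautifulSoup also accepts an explicit encoding through its `from_encoding` argument, e.g. `BeautifulSoup(html_doc, 'lxml', from_encoding='big5')`. Note that a legacy encoding such as Big5 can sometimes decode bytes that were really in another encoding without an error, so check the result by eye.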
I want to find 2018 Skoda cars with less than 100,000 km on this site:
https://www.autocenter.co.il/
However, I cannot find the right method. Here is what I tried:
from bs4 import BeautifulSoup
import requests
url = "https://www.autocenter.co.il/"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
#print((response.status_code))
first=soup.find_all("div",{"class":"product-wrapper-inner"})
print(first[0].text)
As @Elyes mentioned in the comments, construct your URL from your search criteria:
from bs4 import BeautifulSoup
import requests
url = "https://www.autocenter.co.il/shop/?flr_manufacturer=196&flr_from_year=2018&flr_mileage_range=0-100000"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
# print(response.status_code)  # uncomment to check that the request succeeded
first = soup.find_all("div", {"class": "col-lg-11 text-center px-4 px-lg-15"})
print([i.text for i in first])
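Building the query string from the criteria, instead of typing it by hand, makes it easy to change the search. A sketch using `urllib.parse.urlencode`; the parameter names are the ones in the URL above, and manufacturer id `196` is taken from that URL (presumably Skoda). The ids of other manufacturers are not known here and would have to be read off the site's filter form. `search_url` is a hypothetical helper:

```python
from urllib.parse import urlencode

SHOP_URL = "https://www.autocenter.co.il/shop/"


def search_url(manufacturer_id, from_year, max_km):
    """Build the shop search URL from filter values (parameter names copied from the answer's URL)."""
    params = {
        "flr_manufacturer": manufacturer_id,    # 196 in the answer's URL, presumably Skoda
        "flr_from_year": from_year,
        "flr_mileage_range": f"0-{max_km}",     # mileage range written as "min-max"
    }
    return SHOP_URL + "?" + urlencode(params)


print(search_url(196, 2018, 100000))
```

The result can be passed straight to `requests.get(url)` in the code above.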
I am trying to get a movie's rating from the website Letterboxd. I have used code like this on other websites and it worked, but here it does not return the information I want.
import requests
from bs4 import BeautifulSoup
page = requests.get("https://letterboxd.com/film/avengers-endgame/")
soup = BeautifulSoup(page.content, 'html.parser')
final = soup.find("section", attrs={"class": "section ratings-histogram-chart"})
print(final)
This prints None, even though the page I see in the browser has a section with this class and the information I want is inside it.
The reason is that the website loads most of its content asynchronously: the initial page contains the layout, and further HTTP requests fetch the content afterwards. requests only downloads the initial HTML, so you have to find those later requests yourself in the "Network" tab of your browser's developer tools (press F12).
For instance, one of the endpoints the page uses to load the rating is:
https://letterboxd.com/csi/film/avengers-endgame/rating-histogram/
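If you want the histogram for other films, the endpoint can be derived from the film page's URL. The pattern below is inferred from this single example and is an assumption, not a documented Letterboxd API; `rating_histogram_url` is a hypothetical helper:

```python
from urllib.parse import urlsplit


def rating_histogram_url(film_url):
    """Map https://letterboxd.com/film/<slug>/ to the rating-histogram endpoint.

    The /csi/film/<slug>/rating-histogram/ pattern is inferred from one example
    and may change without notice.
    """
    parts = urlsplit(film_url)
    segments = parts.path.strip("/").split("/")
    if len(segments) < 2 or segments[0] != "film":
        raise ValueError(f"not a Letterboxd film URL: {film_url}")
    slug = segments[1]
    return f"{parts.scheme}://{parts.netloc}/csi/film/{slug}/rating-histogram/"


print(rating_histogram_url("https://letterboxd.com/film/avengers-endgame/"))
```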
Alternatively, you can get the weighted average rating from a meta tag on the film page itself:
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://letterboxd.com/film/avengers-endgame/')
soup = bs(r.content, 'lxml')
print(soup.select_one('[name="twitter:data2"]')['content'])
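The selector `[name="twitter:data2"]` matches a `<meta>` tag, and the rating is read from its `content` attribute. The `'lxml'` parser used above requires the separate `lxml` package; if it is not installed, `'html.parser'` can be used instead. As a dependency-free alternative, here is a sketch of the same lookup with the standard library's `html.parser`; the helper names and the sample markup are hypothetical:

```python
from html.parser import HTMLParser


class MetaContent(HTMLParser):
    """Find the `content` attribute of the first <meta name="..."> with the given name."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.content = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute names arrive lowercased
        if tag == "meta" and self.content is None:
            attributes = dict(attrs)
            if attributes.get("name") == self.name:
                self.content = attributes.get("content")


def meta_content(html, name):
    parser = MetaContent(name)
    parser.feed(html)
    parser.close()
    return parser.content  # None if no such tag was found


# Hypothetical markup; on the real page you would pass r.text instead
SAMPLE_HEAD = '<head><meta name="twitter:data2" content="sample value" /></head>'
print(meta_content(SAMPLE_HEAD, "twitter:data2"))
```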
To get the text of every bar in the histogram, request the histogram endpoint directly:
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://letterboxd.com/csi/film/avengers-endgame/rating-histogram/')
soup = bs(r.content, 'lxml')
ratings = [item['title'].replace('\xa0',' ') for item in soup.select('.tooltip')]
print(ratings)