As you can see from the result screen in the picture, the class name is correct and there seems to be no mistake. But I'm not getting any results.
from bs4 import BeautifulSoup
from urllib.request import urlopen
response = urlopen("https://www.naver.com")
soup = BeautifulSoup(response, 'html.parser')
for anchor in soup.select('span .realtime_item'):
    print(anchor)
The site can no longer be crawled this way.
It worked for me:
from bs4 import BeautifulSoup
from urllib.request import urlopen
response = urlopen("https://www.naver.com")
soup = BeautifulSoup(response, 'html.parser')
for anchor in soup.select('.realtime_item'):
    print(anchor)
    print("\n\n")
You are not getting any data because the selector 'span .realtime_item' matches nothing: there is no element with the class realtime_item inside a span. Try printing the soup first and check whether the value is there at all, and only then write your select:
from bs4 import BeautifulSoup
from urllib.request import urlopen
response = urlopen("https://www.naver.com")
soup = BeautifulSoup(response, 'html.parser')
print(soup)
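To illustrate why the extra `span` in the selector matters, here is a minimal sketch with made-up HTML: a descendant selector like `span .realtime_item` only matches elements with that class *inside* a `<span>`.

```python
from bs4 import BeautifulSoup

# Made-up snippet: one .realtime_item inside a <span>, one outside it
html = '<span><a class="realtime_item">inside</a></span><a class="realtime_item">outside</a>'
soup = BeautifulSoup(html, 'html.parser')

print([a.get_text() for a in soup.select('span .realtime_item')])  # only the nested one
print([a.get_text() for a in soup.select('.realtime_item')])       # both
```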
This is from my project and it does not work:
from bs4 import BeautifulSoup
import requests
import lxml
import html
html_txt = requests.get("http://transfer.ttc.com.ge/?page=live&setLng=ka")
soup = BeautifulSoup(html_text, "lxml")
job = soup.find("tr", class_= "text-left right txt-td")
print(job)
First of all, you can't make a soup from a requests Response object; you should add .text:
html_text = requests.get("http://transfer.ttc.com.ge/?page=live&setLng=ka").text
You also first call the variable html_txt and then html_text, so the second line refers to a name that doesn't exist.
You should try this:
from bs4 import BeautifulSoup
import requests
import lxml
import html
html_text = requests.get("http://transfer.ttc.com.ge/?page=live&setLng=ka").text
soup = BeautifulSoup(html_text, "lxml")
job = soup.find("tr", class_="text-left right txt-td")
print(job)
I am new to BeautifulSoup and I am practicing with little tasks. Here I try to get the "previous" link on this site. The HTML is
here
My code is
import requests, bs4
from bs4 import BeautifulSoup
url = 'https://www.xkcd.com/'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
result = soup.find('div', id="comic")
url2 = result.find('ul', class_='comicNav').find('a', rel='prev').find('href')
But it returns NoneType. I have read some posts about child elements in HTML, and I tried some different things, but it still does not work. Thank you for your help in advance.
You could use a CSS selector instead.
import requests, bs4
from bs4 import BeautifulSoup
url = 'https://www.xkcd.com/'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
result = soup.select('.comicNav a[rel~="prev"]')[0]
print(result)
If you want just the href, change it to:
result = soup.select('.comicNav a[rel~="prev"]')[0]["href"]
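One caveat: if nothing matches, indexing with `[0]` raises an IndexError. `select_one` returns None instead, which is easier to guard. A small sketch with a minimal stand-in for the xkcd nav markup:

```python
from bs4 import BeautifulSoup

# Minimal stand-in for the xkcd navigation markup
html = '<ul class="comicNav"><li><a rel="prev" href="/2254/">Prev</a></li></ul>'
soup = BeautifulSoup(html, 'html.parser')

# select_one returns None when nothing matches, instead of raising IndexError
link = soup.select_one('.comicNav a[rel~="prev"]')
if link is not None:
    print(link["href"])
```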
To get the prev link, find the ul tag and then find the a tag inside it. Try the code below.
import requests, bs4
from bs4 import BeautifulSoup
url = 'https://www.xkcd.com/'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
url2 = soup.find('ul', class_='comicNav').find('a',rel='prev')['href']
print(url2)
Output:
/2254/
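For context on why the original code returned None: `find('href')` searches for a *tag* named `<href>`, which doesn't exist; attributes are read with subscription. A tiny sketch with made-up HTML:

```python
from bs4 import BeautifulSoup

html = '<a rel="prev" href="/2254/">Prev</a>'
soup = BeautifulSoup(html, 'html.parser')
a = soup.find('a')

print(a.find('href'))  # None: searches descendants for a <href> tag
print(a['href'])       # /2254/: attribute access uses subscription
```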
How can I get the content of an HTML tag with BeautifulSoup? For example, the content of the <title> tag?
I tried:
from bs4 import BeautifulSoup
url ='http://www.websiteaddress.com'
soup = BeautifulSoup(url)
result = soup.findAll('title')
for each in result:
    print(each.get_text())
But nothing happened. I'm using Python 3.
You need to fetch the website data first; you can do this with the urllib.request module. Note that HTML documents have only one title, so there is no need for find_all() and a loop.
from urllib.request import urlopen
from bs4 import BeautifulSoup
url ='http://www.websiteaddress.com'
data = urlopen(url)
soup = BeautifulSoup(data, 'html.parser')
result = soup.find('title')
print(result.get_text())
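As a convenience, BeautifulSoup also exposes the first `<title>` directly as an attribute, so `find` isn't strictly needed. A minimal sketch (the HTML here is a made-up example):

```python
from bs4 import BeautifulSoup

html = '<html><head><title>Example Page</title></head><body></body></html>'
soup = BeautifulSoup(html, 'html.parser')

# soup.title is shorthand for soup.find('title')
print(soup.title.get_text())
```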
I am working in Python and learning BeautifulSoup by parsing a link.
My URL:
http://www.dtemaharashtra.gov.in/approvedinstitues/StaticPages/frmInstituteSummary.aspx?InstituteCode=1002
I want to extract the email address from that page. How can I do that?
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('http://www.dtemaharashtra.gov.in/approvedinstitues/StaticPages/frmInstituteSummary.aspx?InstituteCode=1002').read()
soup = BeautifulSoup(html, 'html.parser')
print(soup.find(id='ctl00_rightContainer_ContentBox1_lblEMailAddress').text)
import requests
from bs4 import BeautifulSoup
r = requests.get("http://www.dtemaharashtra.gov.in/approvedinstitues/StaticPages/frmInstituteSummary.aspx?InstituteCode=1002")
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.find("span", {"id": "ctl00_rightContainer_ContentBox1_lblEMailAddress"}).text)
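As a side note, passing an attrs dict and using the `id=` keyword are equivalent ways to filter in `find`. A self-contained sketch (the HTML and address below are invented for the demo):

```python
from bs4 import BeautifulSoup

# Invented snippet standing in for the real page
html = '<span id="lblEMailAddress">info@example.com</span>'
soup = BeautifulSoup(html, 'html.parser')

# The attrs-dict form and the keyword form find the same tag
a = soup.find("span", {"id": "lblEMailAddress"})
b = soup.find("span", id="lblEMailAddress")
print(a.text, a == b)
```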
Could anyone please give me a snippet of BeautifulSoup code to extract some of the items in the table found here?
Here's my attempt:
from bs4 import BeautifulSoup
from urllib2 import urlopen
url = "http://biology.burke.washington.edu/conus/accounts/../recordview/record.php?ID=1ll&tabs=21100111&frms=1&res=&pglimit=A"
html = urlopen(url).read()
soup = BeautifulSoup(html,"lxml")
tables = soup.findAll("table")
However, this is failing: tables turns out to be empty.
Sorry, I'm a BeautifulSoup noob.
Thanks!
The page at the given URL does not contain any table element in its source; the table is generated by JavaScript inside an iframe. You can fetch the iframe's document directly instead:
from urllib.request import urlopen
from bs4 import BeautifulSoup
url = 'http://biology.burke.washington.edu/conus/recordview/description.php?ID=1l9l0l421l55llll&tabs=21100111&frms=1&pglimit=A&offset=&res=&srt=&sql2='
html = urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
tables = soup.find_all('table')
#print(tables)
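One way to find that inner document yourself is to read the iframe's src attribute from the outer page. A sketch with a made-up stand-in for the real page:

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the outer page: the real table lives in the iframe's document
html = '<html><body><iframe src="/conus/recordview/description.php?ID=1"></iframe></body></html>'
soup = BeautifulSoup(html, 'html.parser')

frame = soup.find('iframe')
# Fetch this URL (resolved against the page's base URL) to get the table
print(frame['src'])
```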
selenium solution:
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
url = "http://biology.burke.washington.edu/conus/accounts/../recordview/record.php?ID=1ll&tabs=21100111&frms=1&res=&pglimit=A"
driver = webdriver.Firefox()
driver.get(url)
# The table lives inside the first iframe, so switch into it before reading the source
driver.switch_to.frame(driver.find_elements(By.TAG_NAME, 'iframe')[0])
soup = BeautifulSoup(driver.page_source, 'html.parser')
tables = soup.find_all('table')
#print(tables)
driver.quit()
This is my current workflow:
from bs4 import BeautifulSoup
from urllib2 import urlopen
url = "http://somewebpage.com"
html = urlopen(url).read()
soup = BeautifulSoup(html)
tables = soup.find_all('table')