Requests vs Selenium Python in Youtube - python

When I use the Selenium library to count the related channels on a YouTube channel page, I get 12. When I use the Requests library to do the same, I get 0.
I want to use Requests. Please help me if that's possible.
My code:
Requests
import requests
from bs4 import BeautifulSoup
import time
r = requests.get("https://www.youtube.com/channel/UCoykjkkJxsz7JukJR7mGrwg/about")
soup = BeautifulSoup(r.content, 'html.parser')
bb = soup.find_all("ytd-mini-channel-renderer",class_="style-scope ytd-vertical-channel-section-renderer")
print(len(bb))
Selenium
from selenium import webdriver
from bs4 import BeautifulSoup
import time
driver = webdriver.Chrome(chrome_path)  # chrome_path: path to the chromedriver executable (not defined in the post)
driver.get("https://www.youtube.com/channel/UCoykjkkJxsz7JukJR7mGrwg/about")
soup = BeautifulSoup(driver.page_source, 'html.parser')
bb = soup.find_all("ytd-mini-channel-renderer",class_="style-scope ytd-vertical-channel-section-renderer")
print(len(bb))

Every time I've run into an issue like this, it was because JavaScript was creating the data I was after. If that is the case here, you likely won't be able to use requests, because it does not execute JavaScript.
If you open that YouTube page in a browser and inspect it, you can see that "ytd-mini-channel-renderer" exists, but if you view the page source you get 0 results. The code shown by "view source" is what requests receives.
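To make the difference concrete, here is a minimal sketch that counts a tag in two HTML strings using only the standard library. Both strings are made-up stand-ins, not real YouTube markup: one for what "view source" and requests would return, and one for the DOM after scripts have run. The names TagCounter, count_tag and renderRelatedChannels are hypothetical.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how many times a given start tag appears in an HTML string."""

    def __init__(self, tag_name):
        super().__init__()
        self.tag_name = tag_name.lower()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag == self.tag_name:
            self.count += 1


def count_tag(html, tag_name):
    """Return the number of <tag_name> start tags found in html."""
    parser = TagCounter(tag_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Made-up "view source" HTML: the container is empty until a script fills it.
RAW_HTML = """
<html><body>
  <div id="related"></div>
  <script>
    // In a browser, a script like this would insert the channel elements.
    renderRelatedChannels(document.getElementById("related"));
  </script>
</body></html>
"""

# Made-up DOM after the script has run (the kind of thing page_source returns).
RENDERED_HTML = """
<html><body>
  <div id="related">
    <ytd-mini-channel-renderer>Channel A</ytd-mini-channel-renderer>
    <ytd-mini-channel-renderer>Channel B</ytd-mini-channel-renderer>
  </div>
</body></html>
"""

raw_count = count_tag(RAW_HTML, "ytd-mini-channel-renderer")
rendered_count = count_tag(RENDERED_HTML, "ytd-mini-channel-renderer")
print(raw_count, rendered_count)
```

Running the same count on r.text from requests and on driver.page_source from Selenium is a quick way to check whether an element is created by JavaScript.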

Sometimes the issue is that the soup object contains different tags from the ones you see in dev tools, which is what is happening in your case. If you analyse the soup object, you'll notice that the information you need is actually in <h3 class="yt-lockup-title ">.
This code will pull the results you want:
import requests
from bs4 import BeautifulSoup
r = requests.get("https://www.youtube.com/channel/UCoykjkkJxsz7JukJR7mGrwg/about")
soup = BeautifulSoup(r.content, 'html.parser')
bb = soup.find_all('h3', class_='yt-lockup-title')
print(len(bb))

Related

Browser code and beautifulsoup collection different

I'm trying to parse the soccer matches on the soccerstand front page, but it fails because the items I get with BeautifulSoup are very different from what I see in the browser.
My code is simple at the moment:
import urllib.request
from bs4 import BeautifulSoup
with urllib.request.urlopen('https://soccerstand.com/') as response:
    url_data = response.read()
soup = BeautifulSoup(url_data, 'html.parser')
print(soup.find_all('div.event__match'))
This failed. When I checked the soup variable, it turned out not to contain such divs at all, so what I get with BeautifulSoup is different from what I see when inspecting the page in the browser.
What's the reason for that? Is there any workaround?

HTML parsing with BeautifulSoup in Python unknown error

I know that this code works for other websites that end in .com.
However, I noticed that it doesn't work when I try to parse websites that end in .kr.
Can somebody help me find out why this is happening, and suggest an alternative way to parse these types of websites?
My code follows.
import requests
from bs4 import BeautifulSoup
URL = 'https://everytime.kr/#nN4K1XC0weHnnM9VB5Qe'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='container')
print(results)
The URL here is a link to my timetable. I need to parse this website so that I can easily collect the information for the subjects and data relevant to the subject (duration, location, professor's name, etc.).
Thanks
The website serves dynamic content, so requests gets an essentially empty page back. You can use Selenium instead.
Example:
from selenium import webdriver
from bs4 import BeautifulSoup
import time
driver = webdriver.Chrome(executable_path=r'C:\Program Files\ChromeDriver\chromedriver.exe')
url = 'https://everytime.kr/#nN4K1XC0weHnnM9VB5Qe'
driver.get(url)
time.sleep(5)
soup = BeautifulSoup(driver.page_source, 'html.parser')
results = soup.find(id='container')
print(results)
driver.close()
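One more detail that may matter for this particular URL: everything after the # is a fragment, and HTTP clients do not send the fragment to the server. Assuming the site uses that fragment to pick the timetable in client-side JavaScript (an assumption, not confirmed in the question), requests would only ever ask for the bare front page. A small sketch with the standard library, where request_target is just an illustrative name:

```python
from urllib.parse import urlsplit

URL = "https://everytime.kr/#nN4K1XC0weHnnM9VB5Qe"

parts = urlsplit(URL)

# What an HTTP client asks the server for: the path plus the query string only.
request_target = parts.path or "/"
if parts.query:
    request_target += "?" + parts.query

print("sent to server:", request_target)
print("kept on the client:", parts.fragment)
```

A real browser driven by Selenium keeps the fragment and runs the page's scripts, which is why the rendered page can differ from what requests downloads.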

Neither Selenium nor Beautiful Soup showing full HTML source?

I tried using Beautiful Soup to parse a website, but when I printed "page_soup" I got only part of the HTML; the beginning of the source, which has the info I need, was omitted. No one answered my question. After some research I tried using Selenium to get the full HTML, but I got the same result. Both attempts, with Selenium and with Beautiful Soup, are below. When I print the HTML, it starts in the middle of the source code, skipping the doctype, lang, and other initial declarations.
# Attempt 1: Selenium
from selenium import webdriver
from bs4 import BeautifulSoup
browser = webdriver.Chrome(executable_path="/usr/local/bin/chromedriver")
browser.get('https://coronavirusbellcurve.com/')
html = browser.page_source
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly
print(soup)
# Attempt 2: urllib and Beautiful Soup
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as soup
# pageRequest was not defined in the post; a plain Request for the same URL is assumed here
pageRequest = Request('https://coronavirusbellcurve.com/')
htmlPage = urlopen(pageRequest).read()
page_soup = soup(htmlPage, 'html.parser')
print(page_soup)
The requests module seems to return the numbers in the first table on the page, assuming you are referring to US Totals.
import requests
r = requests.get('https://coronavirusbellcurve.com/').content
print(r)

How to get some data in real time from a website using python?

I want to fetch some data from this website:
https://web.sensibull.com/optionchain?expiry=2020-03-26&tradingsymbol=NIFTY
I am using beautifulsoup library to fetch this data, and have tried the following code:
import requests
import urllib.request
import time
from bs4 import BeautifulSoup
url = 'https://web.sensibull.com/optionchain?expiry=2020-03-26&tradingsymbol=NIFTY'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
b = soup.find("div", {"class": "style__AtmIVWrapper-idZNMX kUMMRI"})
print(b)
But it prints "None" as the output.
There is only one element with this class in the full HTML, but I also tried this:
for b in soup.find_all('div', attrs={'class':'style__AtmIVWrapper-idZNMX kUMMRI'}):
    print(b.get_text())
    print(len(b))
But it doesn't work.
I also tried soup.find("div"),
but it does not show the required div tag in the output, maybe because of the nested divs.
I am unable to fetch this data and proceed with my work. Please help.
If you are looking for code, this might help:
from selenium import webdriver
import time
webpage = 'https://web.sensibull.com/optionchain?expiry=2020-03-26&tradingsymbol=NIFTY'
driver = webdriver.Chrome(executable_path='Your/path/to/chromedriver.exe')
driver.get(webpage)
time.sleep(10)
nifty_fut = driver.find_element_by_xpath('//*[@id="app"]/div/div[4]/div[2]/div[3]/div/div/div[2]/div[1]/div[1]/div/button/span[1]/div[1]')
print(nifty_fut.text)
atm_iv = driver.find_element_by_xpath('//*[@id="app"]/div/div[4]/div[2]/div[3]/div/div/div[2]/div[1]/div[2]')
print(atm_iv.text)
driver.quit()
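A fixed time.sleep(10) is either longer than needed or too short on a slow connection. The usual alternative is to poll until the element is there, up to a timeout. Selenium ships WebDriverWait for this; below is a rough plain-Python sketch of the same idea so it can be run without a browser. wait_until and element_ready are hypothetical names, and the fake check stands in for a real element lookup.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Stand-in for a page whose element only becomes available on the third check.
attempts = {"count": 0}


def element_ready():
    attempts["count"] += 1
    return "ATM IV" if attempts["count"] >= 3 else None


text = wait_until(element_ready, timeout=5.0, interval=0.01)
print(text, attempts["count"])
```

With a real driver, the predicate could be something like lambda: driver.find_elements(By.XPATH, xpath), which returns an empty list until the element exists.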
It could be a syntax problem. Try soup.find_all("div", class_="style__AtmIVWrapper-idZNMX kUMMRI") or just soup.find("div", class_="style__AtmIVWrapper-idZNMX kUMMRI").
If you are interested in web scraping and bs4, take a look at the documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#find

Using Requests and BeautifulSoup - Python returns tag with no text

I'm trying to capture the number of visits on this page, but Python returns the tag with no text.
This is what I've done:
import requests
from bs4 import BeautifulSoup
r = requests.get("http://www.kijiji.ca/v-2-bedroom-apartments-condos/city-of-halifax/clayton-park-west-condo-style-luxury-2-bed-den/1016364514")
soup = BeautifulSoup(r.content, 'html.parser')
print(soup.find_all("span", {"class": "ad-visits"}))
The values you are trying to scrape are populated by JavaScript, so BeautifulSoup and requests alone aren't going to work in this case.
You'll need to use something like Selenium to get the output.
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://www.kijiji.ca/v-2-bedroom-apartments-condos/city-of-halifax/clayton-park-west-condo-style-luxury-2-bed-den/1016364514")
soup = BeautifulSoup(driver.page_source, 'html.parser')
print(soup.find_all("span", {"class": "ad-visits"}))
Selenium returns the page source as rendered, and you can then use BeautifulSoup to get the value:
[<span class="ad-visits">385</span>]
