Failed to extract an attribute using beautifulsoup in deep div tree - python

I am trying to extract an attribute from a deep <div> tree. I tried find_all and select; both failed. Please help. The line highlighted in grey is what I need: I need to get the value of data-num out, which is 2.
Basically, I need to get the orange value at the top right of https://www.xin.com/c2b_car_o/201/
Thank you for your help!

Thank you for the help. I finally found a solution.
[problem]: I could not locate the web elements using BeautifulSoup.
[reason]: the element is rendered by JavaScript, so it never appears in the raw HTML that BeautifulSoup parses. I don't know why, but that's the answer I got from another source. I was told that I need to use Selenium.
[solution]: I used Selenium to successfully extract the numbers. See my code below.
[more help]: the problem is solved! But I am unclear about which situations call for Selenium, and how to tell that a specific element is rendered by JavaScript and so can't be extracted using BeautifulSoup. Please give it a try and post your code if you can do it with bs4, or please provide more explanation. Thanks.
from selenium import webdriver

# Use a raw string so the backslashes in the Windows path are not treated as escapes
driver = webdriver.Chrome(r'G:\chromedriver\chromedriver')
driver.get("https://www.xin.com/c2b_car_o/201/")
# XPath attribute tests use "@", not "#"
elements = driver.find_elements_by_xpath('//*[@class="mt-number-animate-dom"]')
# Each digit of the counter sits in its own element's data-num attribute
num_str = [el.get_attribute("data-num") for el in elements]
print(int(''.join(num_str)))
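Regarding the [more help] question above, one practical test for "is this element rendered by JavaScript?" is to fetch the page without a browser and check whether the element exists in the raw HTML. A minimal sketch (the helper name rendered_by_javascript and the sample markup are mine, not from the original post):

```python
from bs4 import BeautifulSoup

def rendered_by_javascript(raw_html, css_selector):
    """Return True if the selector matches nothing in the *raw* HTML.

    If an element is visible in the browser but absent from the raw page
    source (which is all that requests/BeautifulSoup ever see), it was
    injected by JavaScript and needs a real browser (Selenium) to extract.
    """
    return not BeautifulSoup(raw_html, "html.parser").select(css_selector)

# A JS-heavy page's raw source typically contains only an empty mount point:
raw = "<html><body><div id='app'></div></body></html>"
print(rendered_by_javascript(raw, ".mt-number-animate-dom"))  # True -> use Selenium
```

In practice you would pass `requests.get(url).text` as `raw_html`; if the selector works in the browser's dev tools but this check returns True, the element is JavaScript-rendered.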

Related

Get Xpath for Element in Python

I've been researching this for two days now. There seems to be no simple way of doing this. I can find an element on a page by downloading the HTML with Selenium and passing it to BeautifulSoup, followed by a search via classes and strings. I want to click on this element after finding it, so I want to pass its XPath to Selenium. I have no minimal working example, only pseudo-code for what I'm hoping to do.
Why is there no function/library that lets me search through the HTML of a webpage, find an element, and then request its XPath? I can do this manually by inspecting the webpage and clicking 'copy XPath'. I can't find any solutions to this on Stack Overflow, so please don't tell me I haven't looked hard enough.
Pseudo-Code:
*parser is BeautifulSoup HTML object*
for box in parser.find_all('span', class_="icon-type-2"):  # find all elements with a particular icon
    xpath = box.get_xpath()
I'm willing to change my code entirely, as long as I can locate a particular element and extract its XPath. So any ideas involving entirely different libraries are welcome.
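BeautifulSoup has no built-in get_xpath(), but an absolute XPath can be reconstructed by walking from the element up to the root and recording each tag's position among same-named siblings. A sketch (get_xpath is a hypothetical helper, and the sample HTML is made up):

```python
from bs4 import BeautifulSoup

def get_xpath(element):
    # Walk from the element up to the root, recording each tag name and
    # its 1-based position among same-named siblings at every level.
    parts = []
    while element is not None and element.name not in (None, "[document]"):
        siblings = element.parent.find_all(element.name, recursive=False)
        # Compare by identity ("is"), not ==, because bs4 compares Tags by content
        position = next(i for i, s in enumerate(siblings) if s is element) + 1
        parts.append(f"{element.name}[{position}]")
        element = element.parent
    return "/" + "/".join(reversed(parts))

html = """<html><body><div>
<span class="icon-type-2">first</span>
<span class="icon-type-2">second</span>
</div></body></html>"""
parser = BeautifulSoup(html, "html.parser")
for box in parser.find_all("span", class_="icon-type-2"):
    print(get_xpath(box))
```

The resulting positional path (e.g. /html[1]/body[1]/div[1]/span[2]) can then be handed to Selenium's find_element_by_xpath to click the element.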

How to Click on a Hidden Link with BeautifulSoup or Selenium

I'm trying to download a file using Selenium and BeautifulSoup, but am running into some issues with the way the website is set up. I can see there is a table object containing the link I want deep in the code, but I'm having difficulty instructing BeautifulSoup and Selenium to navigate that far and find the link. https://www.theice.com/clear-us/risk-management#margin-rates is the website, and I want to download the Margin Scanning File.
import urllib.request
from bs4 import BeautifulSoup

url = 'https://www.theice.com/clear-us/risk-management#margin-rates'
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib.request.Request(url, headers=hdr)
icepage = urllib.request.urlopen(req)
htmlitem = icepage.read()
soup = BeautifulSoup(htmlitem, 'lxml')
divs = soup.find('div', {'class': 'sticky-header__main'})
print(divs.findChild().find('div', {'class': 'row'}).find('div', {'class': '1-main true-grid-10'}).find_all('div')[2])
From there divs.findChild().find('div',{'class':'row'}).find('div',{'class':'1-main true-grid-10'}).find_all('div')[2] is the closest I have gotten to selecting the next div that has id='content-5485eefe-b105-49ed-b1ac-7e9470d29262' and I want to drill down that to the ICUS_MARGIN_SCANNING csv in the table five or six further div levels below that.
With Selenium I'm even further lost where I've been trying variations of driver.find_element_by_link_text('Margin Scanning') and getting nothing back.
Any help with accessing that table and the ICUS_Margin_scanning file would be much appreciated. Thank you!
I used F12 => the Network tab and found the request that actually returns the table, so here you go:
from bs4 import BeautifulSoup
import requests
import datetime

BASE_API_URL = 'https://www.theice.com'
# The trailing "_" parameter is a cache-busting timestamp in milliseconds
r = requests.get(f'{BASE_API_URL}/marginrates/ClearUSMarginParameterFiles.shtml'
                 f'?getParameterFileTable&category=Current'
                 f'&_={int(datetime.datetime.now().timestamp() * 1000)}')
soup = BeautifulSoup(r.content, features='lxml')
margin_scanning_link = BASE_API_URL + soup.find_all("a", string="Margin Scanning")[0].attrs['href']
margin_scanning_file = requests.get(margin_scanning_link)

Extracting links from website with selenium bs4 and python

Okay so.
The heading might make it seem like this question has already been asked, but I had no luck finding an answer to it.
I need help with making a link-extracting program in Python.
It actually works: it finds all <a> elements on a webpage, takes their href="" values and puts them in an array, then exports them to a CSV file. Which is what I want.
But I can't get a hold of one thing.
The website is dynamic so I am using the Selenium webdriver to get JavaScript results.
The code for the program is pretty simple. I open a website with webdriver and then get its content. Then I get all links with
results = driver.find_elements_by_tag_name('a')
Then I loop through results with for loop and get href with
result.get_attribute("href")
I store results in an array and then print them out.
But the problem is that I can't get the names of the links. For example, given a link like <a href="https://www.google.com">This leads to Google</a>:
Is there any way to get the 'This leads to Google' string?
I need it for every link that is stored in the array.
Thank you for your time
UPDATE:
It seems .text only works for dynamically generated links; I just noticed this. It is really strange: for hard-coded items it returns an empty string, while for a dynamic link it returns its name.
Okay. So. The answer is that instead of using .text you should use get_attribute("textContent"). It works better than get_attribute("innerHTML").
Thanks KunduK for this answer. You saved my day :)
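One likely explanation for the .text vs textContent difference: Selenium's .text returns only text the browser actually renders, so a hidden element yields an empty string, while textContent is the raw DOM text regardless of visibility. A small static illustration (the hidden-link markup below is a made-up example, parsed with bs4 rather than a browser):

```python
from bs4 import BeautifulSoup

# Selenium's `.text` would return "" for this link because it is hidden,
# but the text is still present in the DOM, which is exactly what
# get_attribute("textContent") recovers. Static parsing shows it is there:
html = '<a href="https://www.google.com" style="display:none">This leads to Google</a>'
link = BeautifulSoup(html, "html.parser").find("a")
print(link.get_text())   # textContent-style extraction ignores visibility
print(link["href"])
```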

How to find a javascript function using selenium/python/chrome

I'm struggling to find and click a JavaScript element which, when inspected, looks like this:
"Patch advisories"
I tried to find it by name, partial name, and href, but I am a beginner so I did not manage to make it work.
driver.find_element_by_css_selector("a[onlick*=PAAdvanced-search','300]").click() does not work either (note that selector is malformed: onlick should be onclick, and the quoting is unbalanced).
Can someone advise?
Thank you
If your requirement is to click on the above-mentioned link, then locate it by its text.
Use the //a[text()='Patch advisories'] XPath:
element=driver.find_element_by_xpath("//a[text()='Patch advisories']")
element.click()
You should be able to do something simple like
driver.find_element_by_link_text("Patch advisories").click()
An alternative using the href would be to do something like
driver.find_element_by_css_selector("a[href*='PAAdvanced-search']").click()
If those aren't working, something else is going on in the page. You either need an explicit wait for the element to become available, or there may be an IFRAME on the page that you have to switch into first.

Unable to locate the element while using selenium-webdriver

I am very new to Selenium WebDriver, and I am trying to automate a page which has a button named "Delete Log File". Using Firebug I inspected how the button's HTML is described, and using FirePath the CSS selector is defined as "#DeleteLogButton" (i.e. the button's id is DeleteLogButton).
Hence I used
browser.find_element_by_css_selector("#DeleteLogButton").click() in WebDriver to click on that button, but it's not working. I also tried
browser.find_element_by_id("DeleteLogButton").click() to click on that button. Even this did not solve my problem...
Please help me out in resolving the issue.
Most of the time I use By.xpath, and it works, especially if you use contains() in your XPath. For example: //*[contains(text(),'ABC')]
This will look for all the elements that contain the string 'ABC'.
In your case you can replace ABC with Delete Log File.
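The contains() expression can be sanity-checked offline with lxml, which evaluates the same XPath 1.0 syntax Selenium uses (the snippet below is made-up markup assuming the button's id is DeleteLogButton):

```python
from lxml import html

# Evaluate the suggested contains(text(), ...) XPath against a minimal
# snippet before handing it to Selenium; the expression syntax is identical.
doc = html.fromstring(
    "<html><body><button id='DeleteLogButton'>Delete Log File</button></body></html>"
)
matches = doc.xpath("//*[contains(text(), 'Delete Log File')]")
print([el.get("id") for el in matches])
```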
You can also try to find it by its name attribute (note this matches the HTML name="..." attribute, not the visible text, so it only works if the button actually has name="Delete Log File"):
browser.find_element_by_name("Delete Log File").click()
