In the Xpath Helper plugin, I was able to get the HTML tag content:
QUERY: //div[@id="cardModel"]/div[@class="modal-dialog"]/div[@class="modal-content"]//tr[1]/td[1]//tr/td[2]/div/span/text()
RESULTS (1): Enrico
The result is:
Enrico
But in Python:
from selenium import webdriver
from lxml import etree
import time
driver = webdriver.Chrome()
detailUrl = 'https://www.enf.com.cn/3d-energy-1?directory=panel&utm_source=ENF&utm_medium=perc&utm_content=22196&utm_campaign=profiles_panel'
driver.get(detailUrl)
time.sleep(5)  # wait for the page to render before grabbing the source
html_ele_detail = etree.HTML(driver.page_source)
companyPhone = html_ele_detail.xpath('//div[@id="cardModel"]/div[@class="modal-dialog"]/div[@class="modal-content"]//tr[1]/td[1]//tr/td[2]/div/span/text()')
print("companyPhone = ", companyPhone)
companyPhone comes back empty. What's wrong? Thank you all for helping to solve this problem.
As you are already using the selenium library, you do not need the etree library; for this application selenium alone is enough. See the example below and adapt it for your purpose:
from selenium import webdriver
driver = webdriver.Chrome()
detailUrl = 'your url here'
driver.get(detailUrl)
web_element_text = driver.find_element_by_xpath('your xpath here').text
print(web_element_text)
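For instance, applied to the original question, a minimal sketch could look like this (the XPath and the 5-second wait are carried over from the question; the modal may still need to be opened by a click before its text is readable):
from selenium import webdriver
import time
driver = webdriver.Chrome()
driver.get('https://www.enf.com.cn/3d-energy-1?directory=panel&utm_source=ENF&utm_medium=perc&utm_content=22196&utm_campaign=profiles_panel')
time.sleep(5)  # crude wait for the JavaScript-rendered content
# same XPath as in the question, but evaluated by Selenium itself
companyPhone = driver.find_element_by_xpath('//div[@id="cardModel"]/div[@class="modal-dialog"]/div[@class="modal-content"]//tr[1]/td[1]//tr/td[2]/div/span').text
print("companyPhone =", companyPhone)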
Let me know if this was helpful.
Okay, so I've got HTML code like this one:
<span class="lista_td_calendar" rel="1617096300">finished</span>
And I would like to fetch it using lxml. There are many spans of this class, each with a different rel attribute, and I've written something like this:
from lxml import html
import requests
page = requests.get(link)
tree = html.fromstring(page.content)
series = tree.xpath('//span[@class="lista_td_calendar"]/text()')
print(series)
It doesn't fetch anything, though. Is there any way to make it independent of the rel attribute?
The problem was that the value I was trying to reach is generated by JavaScript, so it's unreachable through the requests module; using Selenium solved the problem:
from selenium.webdriver.chrome.options import Options
from selenium import webdriver
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)
driver.get('https://blackclover.wbijam.pl/pierwsza_seria-170.html')
elements = driver.find_elements_by_class_name('lista_td_calendar')
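To pull the values out of the matched elements, a short follow-up sketch (each span also carries its rel attribute, per the HTML above):
for element in elements:
    print(element.get_attribute('rel'), element.text)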
I'm using webdriver because I need to copy the site after authentication.
from selenium import webdriver
import myconnutils
import re
from time import sleep
connection = myconnutils.getConnection()
# use Chrome
driver = webdriver.Chrome("/Users/User/Documents/sender/chromedriver")
# go to the site and log in
driver.get("https://example.com/en/account")
driver.find_element_by_id("user").send_keys("userlogin")
driver.find_element_by_id("password").send_keys("passwordinput")
driver.find_element_by_id("submit").click()
What's next? How do I copy the whole page with its CSS, JS, and images?
You could try using Selenium with BeautifulSoup. You should be able to get the source code like this:
from bs4 import BeautifulSoup
example_soup = BeautifulSoup(driver.page_source, 'html.parser')
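As a minimal sketch of "copying" the page after login, the rendered HTML can simply be written to a file (the file name here is just an example). Note that CSS, JS, and images are separate resources; you would have to collect their URLs from the soup and download them yourself.
with open('account_page.html', 'w', encoding='utf-8') as f:
    f.write(example_soup.prettify())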
I'm trying to automate blog posting using Selenium and Python. I am able to locate the element using Inspect Element in the browser.
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
browser = webdriver.Firefox()
browser.get('https://www.blogger.com/go/signin')
email = browser.find_element_by_id('identifierId')
email.send_keys("xxxxxx@gmail.com")
email.send_keys(Keys.RETURN)
time.sleep(5)
password = browser.find_element_by_name("password")
password.send_keys("xxxxxx")
password.send_keys(Keys.RETURN)
time.sleep(5)
newpost = browser.find_element_by_partial_link_text("editor")
posttitle = browser.find_element_by_name("K3JSBVB-C-b titleField textField K3JSBVB-C-a")
This fails with the error:
Unable to locate element: [name="K3JSBVB-C-b titleField textField K3JSBVB-C-a"]
This K3JSBVB-C-b class attribute looks dynamic, so I would rather suggest sticking to the New post text instead.
The relevant XPath expression would be something like:
//a[text()='New post']
Also consider using an Explicit Wait, as using sleep is an antipattern you should avoid:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
new_post = WebDriverWait(browser, 10).until(
    expected_conditions.element_to_be_clickable((By.XPATH, "//a[text()='New post']")))
new_post.click()
More information: How to use Selenium to test web applications using AJAX technology
I am very new to web scraping. I have the following url:
https://www.bloomberg.com/markets/symbolsearch
So I use Selenium to fill in the Symbol textbox and press Find Symbols to get the details. This is the code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("https://www.bloomberg.com/markets/symbolsearch/")
element = driver.find_element_by_id("query")
element.send_keys("WMT:US")
driver.find_element_by_name("commit").click()
It returns the table. How can I retrieve that? I am pretty clueless.
Second question: can I do this without Selenium, since it slows things down? Is there a way to find an API that returns JSON?
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
from bs4 import BeautifulSoup
import requests
driver = webdriver.Firefox()
driver.get("https://www.bloomberg.com/markets/symbolsearch/")
element = driver.find_element_by_id("query")
element.send_keys("WMT:US")
driver.find_element_by_name("commit").click()
time.sleep(5)  # give the results page time to load
url = driver.current_url  # the results live at a plain GET URL, so requests can re-fetch it
parsed = requests.get(url)
soup = BeautifulSoup(parsed.content, 'html.parser')
a = soup.find_all("table", {"class": "dual_border_data_table"})
print(a)
Here is the complete code with which you can get the table you are looking for. Now do whatever you need after getting the table. Hope it helps.
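To actually read the rows out of that table, here is a sketch continuing from the code above (assuming the usual tr/td structure inside dual_border_data_table):
for table in a:
    for row in table.find_all("tr"):
        # collect the text of every header/data cell in this row
        cells = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        print(cells)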
I want to get all the links from a web page using Selenium and Python. For example, if I search for "test" or anything else on the Google website, I want all the links related to that search.
Here is my code
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
baseurl="https://www.google.co.in/?gws_rd=ssl"
driver = webdriver.Firefox()
driver.get(baseurl)
driver.find_element_by_id("lst-ib").click()
driver.find_element_by_id("lst-ib").clear()
driver.find_element_by_id("lst-ib").send_keys("test")
link_name = driver.find_element_by_xpath(".//*[@id='rso']/div[2]/li[2]/div/h3/a")
print(link_name)
driver.close()
Output
<selenium.webdriver.remote.webelement.WebElement object at 0x7f0ba50c2090>
Using xpath $x(".//*[@id='rso']/div[2]/li[2]/div/h3/a") in Firebug's console.
Output
[a jtypes2.asp]
How can I get the link contents from such an object?
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
baseurl="https://www.google.co.in/?gws_rd=ssl"
driver = webdriver.Firefox()
driver.get(baseurl)
driver.find_element_by_id("lst-ib").click()
driver.find_element_by_id("lst-ib").clear()
driver.find_element_by_id("lst-ib").send_keys("test")
driver.find_element_by_id("lst-ib").send_keys(Keys.RETURN)
driver.implicitly_wait(2)
link_name = driver.find_elements_by_xpath(".//*[@id='rso']/div/li/div/h3/a")
for link in link_name:
    print(link.get_attribute('href'))
Try the above code. Your code doesn't send a RETURN key after entering the search keyword. I've also added an implicit wait of 2 seconds for the search results to load, and changed the XPath to get all the links.
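As a side note: the find_element_by_* helpers used throughout this thread were removed in Selenium 4, so on current versions the equivalent lookup would be:
from selenium.webdriver.common.by import By
# same XPath as above, using the Selenium 4 locator API
link_name = driver.find_elements(By.XPATH, ".//*[@id='rso']/div/li/div/h3/a")
for link in link_name:
    print(link.get_attribute('href'))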