I am working in Python with Selenium and have to do this:
elem = driver.find_element_by_css_selector('#results .page_block_sub_header_count')
elem = driver.find_element_by_css_selector('#results .page_block_sub_header_count')
elem = driver.find_element_by_css_selector('#results .page_block_sub_header_count')
elem = driver.find_element_by_css_selector('#results .page_block_sub_header_count')
I am doing this because there are 4 elements on the page that match the CSS selector "#results .page_block_sub_header_count", and I want to get the result of the 4th one.
Is there a better way to write this? I don't want 4 identical lines, and I believe repeating them is not good coding practice.
Just use find_elements_by_css_selector() (note the "s") to locate all elements matching a locator - a CSS selector in your case:
results = driver.find_elements_by_css_selector('#results .page_block_sub_header_count')
results will be a list of WebElement instances; you can iterate over it and get, say, the text of each:
for item in results:
    print(item.text)
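And since you specifically want the 4th match, you can index into the list (a minimal sketch, assuming the page really does contain at least four matches):
results = driver.find_elements_by_css_selector('#results .page_block_sub_header_count')
# Matches come back in document order; index 3 is the 4th element.
# An IndexError is raised if fewer than four elements match.
fourth = results[3]
print(fourth.text)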
Without the HTML it is tough to provide the best-fitting solution. However, as I can see from the code you tried, you have used:
driver.find_element_by_css_selector('#results .page_block_sub_header_count')
This essentially means the node with id results has at least 4 descendant nodes with the class page_block_sub_header_count. So to construct the best-fitting CSS selector we are missing the tag name of the elements that carry the class page_block_sub_header_count, which would be available in the HTML DOM.
Still, if you want to get the result of the 4th one, you can use the following line of code (note that :nth-of-type() counts by tag position within the parent, not by how many elements match the class, so verify it against your actual markup):
elem = driver.find_element_by_css_selector('#results .page_block_sub_header_count:nth-of-type(4)')
Also you can use the following code (just in case):
elements = driver.find_elements_by_css_selector("#results .page_block_sub_header_count")
for index in range(len(elements)):
    print(elements[index].text)
PS: @alecxe's answer is the better approach.
The basic concept I know:
find_element finds a single element. We can use .text or .get_attribute('href') to make the element readable. Since find_elements returns a list, we can't call .text or .get_attribute('href') on it directly, otherwise it shows no such attribute.
To scrape readable information from find_elements, we can use a for loop:
vegetables_search = driver.find_elements(By.CLASS_NAME, "product-brief-wrapper")
for i in vegetables_search:
    print(i.text)
Here is my problem: when I use find_element, it shows the same result every time. I searched the problem on the internet and the answer said it's because find_element returns only a single result. Here is the line of my code that is supposed to grab the different URLs:
links.append(driver.find_element(By.XPATH, ".//a[@rel='noopener']").get_attribute('href'))
But I don't know how to combine the results in pandas. When I run this code, the links column repeats the same URL in the csv file...
vegetables_search = driver.find_elements(By.CLASS_NAME, "product-brief-wrapper")
Product_name = []
links = []
for search in vegetables_search:
    Product_name.append(search.find_element(By.TAG_NAME, "h4").text)
    links.append(driver.find_element(By.XPATH, ".//a[@rel='noopener']").get_attribute('href'))
# use pandas to export the information
df = pd.DataFrame({'Product': Product_name, 'Link': links})
df.to_csv('name.csv', index=False)
print(df)
Admittedly, if I use a standalone loop, it does show different links. (That means my XPath is correct, right?)
product_link = driver.find_elements(By.XPATH, "//a[@rel='noopener']")
for i in product_link:
    print(i.get_attribute('href'))
My questions:
Besides using a for loop, how can I make the result of find_elements readable, just like find_element(By.attribute, 'content').text?
How do I take my code a step further? I cannot print out the different URLs.
Thanks so much. ORZ
This is the HTML code inspected from the website:
This line:
links.append(driver.find_element(By.XPATH, ".//a[@rel='noopener']").get_attribute('href'))
should be changed to
links.append(search.find_element(By.XPATH, ".//a[@rel='noopener']").get_attribute('href'))
driver.find_element(By.XPATH, ".//a[@rel='noopener']").get_attribute('href') will always search the whole DOM and return the first element matching the .//a[@rel='noopener'] XPath locator, while you want to find the match inside another element.
To do so, you need to replace the WebDriver driver object with the WebElement search object you want to search inside, as shown above.
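Putting it together, the corrected loop looks like this (a sketch based on your own snippet; the column names and output filename are kept from your code):
vegetables_search = driver.find_elements(By.CLASS_NAME, "product-brief-wrapper")
Product_name = []
links = []
for search in vegetables_search:
    Product_name.append(search.find_element(By.TAG_NAME, "h4").text)
    # Searching relative to the current card picks up that card's own link
    links.append(search.find_element(By.XPATH, ".//a[@rel='noopener']").get_attribute('href'))
df = pd.DataFrame({'Product': Product_name, 'Link': links})
df.to_csv('name.csv', index=False)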
I am trying to get the href with selenium and python.
This is my page:
Some class information changes depending on the element. So I am basically trying to get every href for <a id="job____ .....
links.append(job.find_element_by_xpath('//a[@aria-live="polite"]//span').get_attribute(name="href"))
I tried a couple of things but can't figure out how. How can I get all the hrefs from the screenshot above?
Try this, but take care with your XPath:
//a[@aria-live="polite"]//span
will get a span, and I don't see any span with an href in your HTML. Maybe this XPath solves it:
//a[./span[@aria-live="polite"]]
links.append(job.find_element_by_xpath('//a[./span[@aria-live="polite"]]').get_attribute("href"))
But it won't get all the URLs. For that, use find_elements (which returns a list) and extend your URL list with a list comprehension:
links.extend([x.get_attribute("href") for x in job.find_elements_by_xpath('//a[./span[@aria-live="polite"]]')])
Edit 1: another XPath solution:
links.extend(["website_base_url"+x.get_attribute("href") for x in job.find_elements_by_xpath('//a[contains(@id, "job_")]')])
list_of_elements_with_href = wd.find_elements_by_xpath("//a[contains(@href,'')]")
for el_with_href in list_of_elements_with_href:
    links.append(el_with_href.get_attribute("href"))
or, if you need to be more specific:
list_of_elements_with_href = wd.find_elements_by_xpath("//a[contains(@href,'') and contains(@id,'job_')]")
Based on your description and the attached image, I think you have the wrong XPath. Try the following code:
find_links = driver.find_elements_by_xpath("//a[starts-with(@id,'job_')]")
links = []
for link in find_links:
    links.append(link.get_attribute("href"))
Please note the elements in find_elements_by_xpath, instead of element.
I am unable to test this solution as you have not provided the website.
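If you prefer a comprehension over the explicit loop, the same collection step can be written in one line:
# Equivalent to the loop above: one href per matched anchor
links = [link.get_attribute("href") for link in find_links]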
I have some trouble finding an element here: Link
I want to scrape the names of the matches using:
WebDriverWait(driver,10).until(EC.presence_of_element_located((By.XPATH, '/html/body/div[7]/div[1]/div/div/div[4]/div/div/main/div[2]/div/div/div/div//div/div/div/article/main/a/span')))
(I don't like using the XPath, but otherwise I'd also get the bets that weren't just full time result bets)
But when I run this it returns an empty list.
I have tried figuring out whether the match names are within an iframe or something, but I can't work it out. Does anyone know how I can scrape these elements?
N.B. I have checked multiple times that the XPath is actually in the HTML, and it is.
The code you are using waits for the page to load until the XPath is found. I wrote the code below and it works; it prints the dates too, so you will need to adjust that. I just ran it, so I am confident it works. Ensure you load all the dependencies at the top. It works with a class name rather than the XPath.
driver.get("https://sports.williamhill.com/betting/en-gb/football/competitions/OB_TY295/English-Premier-League/matches/OB_MGMB/Match-Betting")
WebDriverWait(driver, 10).until(
lambda x: x.find_element_by_class_name('sp-o-market__title').is_displayed())
out = driver.find_elements_by_class_name('sp-o-market__title')
for item in out:
item = item.get_attribute('innerHTML')
item = item.split('<span>')[1]
item = item.split("</span>")[0]
print(item)
Produces:
Arsenal v Norwich
Brentford v Brighton
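Splitting innerHTML works but is brittle. Here is a sketch of an alternative that reads the span's text directly (assuming each title keeps the team names in a child span, as the split above suggests):
out = driver.find_elements_by_class_name('sp-o-market__title')
for item in out:
    # Read the child span's text instead of string-slicing the raw HTML
    print(item.find_element_by_tag_name('span').text)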
I have some basic selenium code and an xpath expression that performs well.
The xpath:
/html/body/div/div/table[2]/tbody/tr/td/div/table/tbody/tr//td/div[5]/table/tbody/tr[2]
selects the section I'm interested in, containing many elements.
however, appending '//p' like so:
/html/body/div/div/table[2]/tbody/tr/td/div/table/tbody/tr//td/div[5]/table/tbody/tr[2]//p
does NOT select all the <p> elements in that section. Instead, what I ended up with was a single element.
I'm obviously missing something basic. This is an example of what my code looks like:
#!/usr/bin/env python
from selenium import webdriver
from time import sleep
fp = webdriver.FirefoxProfile()
wd = webdriver.Firefox(firefox_profile=fp)
wd.get("http://someurl.html")
# appending //p here is the problem: it finds only a single element
elems = wd.find_element_by_xpath("/html/body/div/div/table[2]/tbody/tr/td/div/table/tbody/tr/td/div[5]/table/tbody/tr[2]//p")
print elems.get_attribute("innerHTML").encode("utf-8", 'ignore')
wd.close()
EDIT: solved by using find_element*s*_by_xpath instead of find_element as suggested (thanks, Alexander Petrovich, for spotting this).
Don't use such locators. Shorten them a bit, to something like //table[@attr='value']/tbody/tr[2]//p.
To select multiple elements, use the find_elements_by_xpath() method (it returns a list of WebElement objects).
You will not be able to call elems.get_attribute() on the list. Instead, you'll have to iterate through it:
elems = wd.find_elements_by_xpath("/your/xpath")
for el in elems:
    print '\n' + el.get_attribute('innerHTML').encode("utf-8", 'ignore')
I am trying to index over the results returned by an XPath. For example:
xpath = '//a[@id="someID"]'
can return a few results, and I want to get a list of them. I thought that doing:
numOfResults = sel.get_xpath_count(xpath)
l = []
for i in range(1, numOfResults + 1):
    l.append(sel.get_text('(%s)[%d]' % (xpath, i)))
would work, because doing something similar in Firefox's XPath Checker works:
(//a[@id='someID'])[2]
returns the 2nd result.
Any ideas why the behavior is different, and how to do such a thing with Selenium?
Thanks
Can you try the XPath /html/descendant::a[@id="someID"]? You can replace the /html with something else that is an ancestor of your links, like id('content'). You should then be able to locate individual links using [1], [2], etc.
From the XPath TR at http://www.w3.org/TR/xpath#path-abbrev:
NOTE: The location path //para[1] does not mean the same as the location path /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their parents.
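A quick way to see that note in action outside the browser (a self-contained illustration using lxml, not your Selenium setup):
from lxml import html

# Two parents, each containing two matching <a> elements
doc = html.fromstring(
    "<div><p><a id='someID'>1</a><a id='someID'>2</a></p>"
    "<p><a id='someID'>3</a><a id='someID'>4</a></p></div>")

print(len(doc.xpath("//a[@id='someID'][1]")))              # 2: first match under each parent
print(len(doc.xpath("(//a[@id='someID'])[1]")))            # 1: first match in the whole document
print(len(doc.xpath("/descendant::a[@id='someID'][1]")))   # 1: same, via the descendant axis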
The answer is that you need to tell Selenium that you're using XPath, via the xpath= locator prefix:
numOfResults = sel.get_xpath_count(xpath)
l = []
for i in range(1, numOfResults + 1):
    l.append(sel.get_text('xpath=(%s)[%d]' % (xpath, i)))
In Selenium you normally do it without the extra brackets, so your loop would look like the following:
numOfResults = sel.get_xpath_count(xpath)
l = []
for i in range(1, numOfResults + 1):
    l.append(sel.get_text('%s[%d]' % (xpath, i)))
And that will produce a valid XPath in Selenium, like //a[@id='someID'][2]