I am trying to count the number of items in a list box on a webpage and then select multiple items from it. I can select the items fine; I am just struggling to find out how to count the items in the list box.
see code:
from selenium import webdriver
from selenium.webdriver.support.ui import Select
... ...
accountListBox = Select(driver.find_element_by_id("ctl00_MainContent_accountItemsListBox"))
accountListBox.select_by_index(0)
print(len(accountListBox))
I have tried using len(), which results in the error "TypeError: object of type 'Select' has no len()".
I have also tried accountListBox.size(), and removing the Select() wrapper, but neither works.
Pretty new to this so would appreciate your feedback.
Thanks!
According to the docs, a list of a Select element's options can be obtained via select.options. In your particular case this would be accountListBox.options, and you need to call len() on that, not on the Select instance itself:
print(len(accountListBox.options))
Or, if you only want to count the currently selected options:
print(len(accountListBox.all_selected_options))
You could also use find_elements() with a selector common to all of the list box's items to find all of them, store the found elements in a variable, and use Python's built-in len() to count them.
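A minimal sketch of that approach, assuming the list box's items are option elements (the CSS selector is illustrative, not taken from the actual page):
items = driver.find_elements_by_css_selector('#ctl00_MainContent_accountItemsListBox option')  # one element per item
print(len(items))  # count them with Python's built-in len()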
I usually use Selenium along with Beautiful Soup. Beautiful Soup is a Python package for parsing HTML and XML documents.
With Beautiful Soup you can get the count of items in a list box in the following way:
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.PhantomJS() # or webdriver.Firefox()
driver.get('http://some-website.com/some-page/')
html = driver.page_source.encode('utf-8')
b = BeautifulSoup(html, 'lxml')
items = b.find_all('p', attrs={'id':'ctl00_MainContent_accountItemsListBox'})
print(len(items))
I assumed that the DOM element you want to find is a paragraph (p tag), but you can replace this with whatever element you need to find.
Related
My code goes to a website and clicks each row of the table, which opens a new window.
I want to scrape one piece of information from each new window, but I am having difficulty using CSS selectors to get this field (Faculty).
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import time
import requests
driver = webdriver.Chrome()
productlink=[]
driver.get('https://aaaai.planion.com/Web.User/SearchSessions?ACCOUNT=AAAAI&CONF=AM2021&USERPID=PUBLIC&ssoOverride=OFF')
time.sleep(3)
page_source = driver.page_source
soup = BeautifulSoup(page_source,'html.parser')
productlist=driver.find_elements_by_class_name('clickdiv')
for item in productlist:
    item.click()  # opens the new window per each row
    time.sleep(2)
    faculty = driver.find_elements_by_xpath('//*[@id="W1"]/div/div/div/div[2]/div[2]/table/tbody/tr[7]/td/table/tbody/tr/td[2]/b')
    print(faculty)
    driver.find_element_by_class_name('XX').click()  # closes window
    time.sleep(1)
You have:
faculty=driver.find_elements_by_xpath('//*[@id="W1"]/div/div/div/div[2]/div[2]/table/tbody/tr[7]/td/table/tbody/tr/td[2]/b')
So you have a list of elements; when you print it, you will get the Selenium element objects themselves rather than their text. If you only expect one element, you can use find_element_by_xpath instead of find_elements_by_xpath.
If you want to obtain the faculty values and you have a list, you need a loop to extract the text, or get the attribute, of each element.
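For example, a minimal sketch reusing the XPath from the question:
faculty_elements = driver.find_elements_by_xpath('//*[@id="W1"]/div/div/div/div[2]/div[2]/table/tbody/tr[7]/td/table/tbody/tr/td[2]/b')
faculty = [el.text for el in faculty_elements]  # extract the visible text of each matched element
print(faculty)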
Faculty names reside within a table with class sortable. You want the first of these, so you can use :nth-of-type to restrict matches. The actual names are within b tags, so use a descendant combinator (a space) to move to the b tags within the table. Use a list comprehension to return a faculty list:
faculty = [i.text for i in driver.find_elements_by_css_selector('.sortable:nth-of-type(1) b')]
I sadly couldn't find any resources online for my problem. I'm trying to store elements found by XPath in a list and then loop over that list, searching within each element. But instead of searching within the given element, Selenium always seems to search the whole page again.
Anyone with good knowledge about this? I've seen that:
// Selects nodes in the document from the current node that matches the selection no matter where they are
But I've also tried "/" and it didn't work either.
Instead of giving me the text for each div, it gives me the text from all divs.
My Code:
from selenium import webdriver
driver = webdriver.Chrome()
result_text = []
# I'm looking for all divs with a specific class and store them in a list
divs_found = driver.find_elements_by_xpath("//div[@class='a-fixed-right-grid-col a-col-left']")
# Here seems to be the problem: instead of searching within "divs_found[1]", it behaves like "driver" and searches the whole site
hrefs_matching_in_div = divs_found[1].find_elements_by_xpath("//a[contains(@href, '/gp/product/')]")
# Now I'm looking in the found href matches to store the text from it
for href in hrefs_matching_in_div:
    result_text.append(href.text)
print(result_text)
You need to prefix the XPath with . so that the search is evaluated relative to the element (its descendants) rather than the whole document. Try now.
hrefs_matching_in_div = divs_found[1].find_elements_by_xpath(".//a[contains(@href, '/gp/product/')]")
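For completeness, a sketch of the question's loop with the relative XPath applied:
divs_found = driver.find_elements_by_xpath("//div[@class='a-fixed-right-grid-col a-col-left']")
# the leading . anchors the search at divs_found[1] instead of the document root
hrefs_matching_in_div = divs_found[1].find_elements_by_xpath(".//a[contains(@href, '/gp/product/')]")
result_text = [href.text for href in hrefs_matching_in_div]
print(result_text)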
How do I click an element using Selenium and BeautifulSoup in Python? I have these lines of code and I find it difficult to achieve what I want. I want to click every element in every iteration. There is no pagination or next page; there are only about 10 elements, and after clicking the last element it should stop. Does anyone know what I should do? Here is my code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
import urllib
import urllib.request
from bs4 import BeautifulSoup
chrome_path = r"C:\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
url = 'https://www.99.co/singapore/condos-apartments/a-treasure-trove'
driver.get(url)
html = driver.page_source
soup = BeautifulSoup(html,'lxml')
details = soup.select('.FloorPlans__container__rwH_w')  # whole container of the result
for d in details:
    picture = d.find('span',{'class':'Tappable-inactive'}).click()  # the single element
    print(d)
driver.close()
Here is the site: https://www.99.co/singapore/condos-apartments/a-treasure-trove. I want to scrape the details and the image in every floor plan section, but it is difficult because the image only appears after you click the specific element. I can only get the details, not the image itself. Try it yourself so that you know what I mean.
EDIT:
I tried this method
for d in driver.find_elements_by_xpath('//*[@id="floorPlans"]/div/div/div/div/span'):
    d.click()
The problem is that it clicks too fast, so the image doesn't get a chance to load. Also, I'm using Selenium here. Is there any method for selecting with BeautifulSoup, in a format like picture = d.find('span',{'class':'Tappable-inactive'}).click()?
You cannot interact with website widgets using BeautifulSoup; you need to work with Selenium. There are two ways to handle this problem.
The first is to get the main wrapper (class) of the 10 elements and then iterate over each child element of the main class.
The second is to get the element by XPath and increment the last number in the XPath by one in each iteration to move to the next element, as sketched below.
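A minimal sketch of the second approach; the XPath is an assumption based on the asker's edit, and the item count of 10 comes from the question:
import time
for i in range(1, 11):
    # the [%d] index is assumed to select the i-th floor plan element
    element = driver.find_element_by_xpath('//*[@id="floorPlans"]/div/div/div/div[%d]/span' % i)
    element.click()
    time.sleep(2)  # crude pause so the revealed image has time to load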
I printed some results to check your code.
"details" only has one item.
And "picture" is not an element, so it's not clickable.
details = soup.select('.FloorPlans__container__rwH_w')
print(details)
print(len(details))
for d in details:
    print(d)
    picture = d.find('span',{'class':'Tappable-inactive'})
    print(picture)
Output:
For your edited version, you can check that the images have become visible before you call click().
Use visibility_of_element_located to do that.
Reference: https://selenium-python.readthedocs.io/waits.html
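A minimal sketch of such a wait, reusing the XPath from the asker's edit; the locator for the revealed image is a guess at the page's structure:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)
for d in driver.find_elements_by_xpath('//*[@id="floorPlans"]/div/div/div/div/span'):
    d.click()
    # block until the image is actually visible instead of sleeping blindly
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.FloorPlans__container__rwH_w img')))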
I'm trying to do a loop to obtain the elements from a list. This is one part of each element in the list (the problem is that I don't know how to select the elements in produto-nome, produto-preco, categoria, and subcategoria).
<div categoria="Carnes" class="panel-product" produto-fabricante="" produto-nome="Contra Filé Maturada FRIBOI Resfriado Pedaço 1,1kg" produto-preco="45.09" produto-qtd="1" produto-sku="0028363" ruptura="Verdadeiro" subcategoria="Carne bovina">
I'm using the Selenium package in Python, so to extract this list from the HTML page I use soup.find_all("div", "panel-product"), but after this I don't know how to select the elements holding the information described above. Thanks!
By default soup.find_all provides a list of objects, one per matching element, so it could be achieved with:
data = soup.find_all("div", "panel-product")
for d in data:
    produtofabricante = d.attrs['produto-fabricante']
    # do what you need here
Check the Beautiful Soup documentation for .attrs.
As you wanted to use Selenium, you do not need BeautifulSoup. Initialize the Selenium driver, load the page, and then use:
elements = driver.find_elements_by_class_name("panel-product")
Since find_elements returns a list, loop over it to get the attributes of each element in this class:
for c in elements:
    print(c.get_attribute('produto-nome'))
I want to extract the text of a particular span, which is shown in the snapshot. I am unable to find the span by its class attribute. I have attached the HTML source (snapshot) of the data to be extracted as well.
Any suggestions?
import bs4 as bs
import urllib.request
sourceUrl='https://www.pakwheels.com/forums/t/planing-a-trip-from-karachi-to-lahore-by-road-in-feb-2017/414115/2'
source=urllib.request.urlopen(sourceUrl).read()
soup=bs.BeautifulSoup(source, 'html.parser')
count=soup.find('span',{'class':'number'})
print(len(count))
If you disable JavaScript in your browser, you can easily see that the span element you want disappears.
In order to get that element, one possible solution is to use a Selenium-driven browser.
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://www.pakwheels.com/forums/t/planing-a-trip-from-karachi-to-lahore-by-road-in-feb-2017/414115/2')
span = driver.find_element_by_xpath('//li[3]/span')
print(span.text)
driver.close()
Output:
Another solution: find the desired value deep down in the web page source (in the Chrome browser, press Ctrl+U) and extract the span value using a regular expression.
import re
import requests
r = requests.get(
    'https://www.pakwheels.com/forums/t/planing-a-trip-from-karachi-to-lahore-by-road-in-feb-2017/414115/2')
span = re.search(r'"posts_count":(\d+)', r.text)
print(span.group(1))
Output:
If you know how to use CSS selectors, you can use:
mySpan = soup.select("span.number")
It will return a list of all nodes that match this selector.
So mySpan[0] could contain what you need. Then use one of its methods, for example get_text(), to get what you need.
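For example, a minimal sketch assuming the first match is the one you need:
print(mySpan[0].get_text())  # prints the text content of the first matching span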
First of all, you need to decode the response:
source=urllib.request.urlopen(sourceUrl).read().decode()
Maybe your issue will disappear after this fix.