I am trying to get the product's seller yet it will not get the text. I assume this is some weird thing since the text is also a link. Any help?
Python Code:
self.sold_by = driver.find_element_by_css_selector('#sellerProfileTriggerId').text
HTML Element:
SKUniverse
Try like this:
self.sold_by = driver.find_element_by_css_selector('#sellerProfileTriggerId')
text_element=self.sold_by.text
print(text_element)
Also, why aren't you using xpath or id selectors! Just asking :)
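One possible cause, assuming the element is found but is not displayed at the moment it is read: .text only returns text that is rendered as visible, while the textContent DOM property is available either way. The helper below is a minimal sketch of that fallback; element_text is a hypothetical name, and it only assumes an object with a .text attribute and a get_attribute method, as Selenium's WebElement has.

```python
def element_text(element):
    """Return the element's visible text, falling back to its textContent."""
    text = element.text
    if not text:
        # get_attribute returns None when the attribute/property is absent
        text = element.get_attribute("textContent") or ""
    return text.strip()
```

With the code from the question this would be called as element_text(driver.find_element_by_css_selector('#sellerProfileTriggerId')).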
I am writing a webscraper that scrapes data from a list of links one after the other. The problem is that the website uses the same class names for up to 3 different buttons at once with no other unique identifiers used which to my understanding makes it impossible to point to the exact button if there are more.
I used driver.find_element, which worked well since it just found the first result and ignored the other buttons. However, on some pages the offers information that I am trying to scrape is missing, so the script picks up the wrong button's data and fills it in even though I am not interested in that data at all.
So I came up with a solution that checks whether the scraped text contains a specific string that only appears in the piece of information I am after. If the string is not found, the variable should be overwritten with an empty string so that it is obvious the information doesn't exist.
However, the if statement that I am trying to filter the strings with doesn't seem to work at all. When there are no buttons on the webpage, it does fill the variable with empty data. But once a different button appears, it is not filtered out; it gets through somehow and ruins the whole thing.
This is an example webpage which doesn't contain the data at all :
https://reality.idnes.cz/rk/detail/nido-group-s-r-o/5a85b108a26e3a2adb4e394c/?page=185
This is an example webpage that contains 2 buttons with data, the first of which I am trying to scrape. Look for the "nemovitostí" text in the blue button; that's what I am trying to filter on.
https://reality.idnes.cz/rk/detail/m-m-reality-holding-a-s/5a85b582a26e3a321d4f2700/
This is the problematic code :
# Offers
offers = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "span.btn__text")))
offers = offers.text
print(offers)
# Check if scraped information contains offers else move on
if "nemovitostí" or "nemovitosti" or "nemovitost" in offers:
    pass
else:
    offers = ""
Since the if statement should look for the set of strings and otherwise execute the code under else, I can't understand how the data gets in at all. There are no errors or warnings; it just picks up the data instead of ignoring it, even when the string is different.
This is more of the code for reference :
# Open links.csv file and read its contents
with open('links.csv') as read:
    reader = csv.reader(read)
    link_list = list(reader)
# Information search
for link in link_list:
    driver.get(', '.join(link))
    # Title
    title = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "h1.b-annot__title.mb-5")))
    # Offers
    offers = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "span.btn__text")))
    offers = offers.text
    print(offers)
    # Check if scraped information contains offers else move on
    if "nemovitostí" or "nemovitosti" or "nemovitost" in offers:
        None
    else:
        offers = ""
    # Address
    address = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "p.font-sm")))
    # Phone number
    # Try to obtain phone number if nonexistent move on
    try:
        phone_number = wait.until(ec.presence_of_element_located((By.XPATH, "//a[./span[contains(@class, 'icon icon--phone')]]")))
        phone_number = phone_number.text
    except TimeoutException:
        phone_number = ""
    # Email
    # Try to obtain email if nonexistent move on
    try:
        email = wait.until(ec.presence_of_element_located((By.XPATH, "//a[./span[contains(@class, 'icon icon--email')]]")))
        email = email.text
    except TimeoutException:
        email = ""
    # Print scraping results
    print(title.text, " ", offers, " ", address.text, " ", phone_number, " ", email)
    # Save results to a list
    company = [title.text, offers, address.text, phone_number, email]
    # Write results to scraped.xlsx file
    worksheet.write_row(row, 0, company)
    del title, offers, address, phone_number, email
    # Push row number lower
    row += 1
workbook.close()
driver.quit()
How is it possible that the data still gets through? Is there an error in my syntax? If you saw my mistake please let me know so I can get better next time! Thanks to anyone for any sort of help!
1. The problem is that the website uses the same class names for up to 3 different buttons at once with no other unique identifiers used which to my understanding makes it impossible to point to the exact button if there are more
You can actually get the element you need if you use By.XPATH instead of By.CSS_SELECTOR.
The first would be (//span[@class='btn__text'])[1], the second (//span[@class='btn__text'])[2] and the third (//span[@class='btn__text'])[3].
Or, if you are not sure what the order will be, you can be more specific, like
(//span[@class='btn__text' and contains(text(),'nemovitostí')])
2. The second problem is related to the if syntax in Python.
It should be like this:
if "nemovitostí" in offers or "nemovitosti" in offers or "nemovitost" in offers:
There might be a nicer way to write this, maybe something like this:
for i in ["nemovitostí", "nemovitosti", "nemovitost"]:
    if i in offers:
        break  # a keyword was found, keep offers
The most ideal way to write this would be the following:
value = ["nemovitostí", "nemovitosti", "nemovitost"]
if any(s in offers for s in value):
    pass  # do something here
else:
    offers = ""
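To see why the condition from the question never reaches the else branch: Python reads it as "nemovitostí" or "nemovitosti" or ("nemovitost" in offers), and a non-empty string literal is always truthy, so the expression short-circuits on the first operand. The sketch below shows that and wraps the any() check in a small function; clean_offers and the sample button texts are made up for illustration.

```python
KEYWORDS = ("nemovitostí", "nemovitosti", "nemovitost")


def clean_offers(offers, keywords=KEYWORDS):
    """Return offers unchanged if it mentions any keyword, else an empty string."""
    if any(keyword in offers for keyword in keywords):
        return offers
    return ""


# The original condition evaluates to the first non-empty literal,
# so it is truthy no matter what the scraped text is.
always_true = bool("nemovitostí" or "nemovitosti" or "nemovitost" in "Kontaktovat")

print(always_true)
print(repr(clean_offers("128 nemovitostí")))  # hypothetical button text
print(repr(clean_offers("Kontaktovat")))      # hypothetical unrelated button
```

Since "nemovitost" is a prefix of the other two words, checking for that one string alone would match all three forms.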
Please help me find a way to get information out of this HTML using Selenium without XPath, because I want to run it in a loop.
I want to get the result "4.7" from title="4.7/5 - 10378 Reviews".
If you already have the driver set up,
then try something like this:
rating = driver.find_element_by_class_name("stars").get_attribute("title")
print(rating)
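That call returns the whole title, e.g. "4.7/5 - 10378 Reviews". To end up with just "4.7", the string can be cut at the first slash. A minimal sketch, assuming every title follows the "score/5 - N Reviews" shape shown in the question; rating_from_title is a hypothetical helper name.

```python
def rating_from_title(title):
    """Extract '4.7' from a title such as '4.7/5 - 10378 Reviews'."""
    # Everything before the first slash is the score
    return title.split("/", 1)[0].strip()


print(rating_from_title("4.7/5 - 10378 Reviews"))
```

For the loop, the same helper could be applied to the title attribute of each element returned by driver.find_elements_by_class_name("stars").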
So I am trying to scrape a list of email addresses from my User Explorer page in Google Analytics.
I obtained the x-path via here
The item's XPath is //*[@id="ID-explorer-table-dataTable-key-0-0"]/div
But no matter what I try:
driver.find_elements_by_xpath('//*[@id="ID-explorer-table-dataTable-key-0-0"]/div')
or
driver.find_elements_by_xpath('//*[@id="ID-reportContainer"]')
or
driver.find_elements_by_id(r"ID-explorer-table-dataTable-key-0-0")
it returns an empty list.
Can anyone tell me where I have gone wrong?
I also tried using:
html = driver.page_source
but of course I couldn't find the list of emails there either.
I am also wondering, if this doesn't work, whether there is a way to automate Ctrl+A, copy all the displayed text into a string in Python, and then use re.findall() to find the email addresses?
email = driver.find_element_by_xpath('//*[@id="ID-explorer-table-dataTable-key-0-0"]/div')
print("email", email.get_attribute("innerHTML"))
Thanks to @Guy for the help!
It was related to an iframe. This worked and detected which frame the item I need belongs to:
iframelist = driver.find_elements_by_tag_name('iframe')
for i in range(len(iframelist)):
    driver.switch_to.frame(iframelist[i])
    if len(driver.find_elements_by_xpath('//*[@id="ID-explorer-table-dataTable-key-0-0"]/div')) != 0:
        print('it is item {}'.format(i))
        break
    else:
        driver.switch_to.default_content()
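On the fallback idea from the question (grab all displayed text and run re.findall() over it): once the driver is switched into the right frame, the text of the body element can be fed to a regular expression instead of automating Ctrl+A. The following is only a sketch; the pattern is a deliberately simple approximation of an email address rather than a full validator, and extract_emails is a hypothetical helper name.

```python
import re

# Simple approximation: local part, "@", then a dotted domain
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(page_text):
    """Return email-like strings from page_text, in order, without duplicates."""
    found = []
    for match in EMAIL_PATTERN.findall(page_text):
        if match not in found:
            found.append(match)
    return found
```

It could be called as extract_emails(driver.find_element_by_tag_name('body').text) after the switch_to.frame call that located the table.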
I am using Selenium to scrape reviews from tripadvisor.com. I haven't found the right way to extract the rating of each user review, which is encoded in the class name: in "ui_bubble_rating bubble_50", 50 = 5 stars.
<span class="ui_bubble_rating bubble_50"></span>
::before==$0
::after==$0
Is there any way to extract the number using selenium?
If anyone knows, please point me in the right direction. Many thanks.
I tried the code below, but the XPath doesn't find the values I need. It returns only one star rating, which is the same for every review.
var = driver.find_element_by_xpath("//span[contains(@class, 'ui_bubble_rating bubble_')]").get_attribute("class")
I need the star rating of each review. Thank you.
Hi. I finally solved my problem. Thank you so much for your time. I am putting the answer here in case someone needs it.
var = driver.find_element_by_class_name('ui_bubble_rating').get_attribute('class')
review_rating = var.split("_")[-1]
Get the class attribute value and store it in 'var'.
Split it on underscores; this returns a list of the pieces.
Access the last item of the list 'data', which contains "ui", "bubble", "rating bubble", "50".
var = driver.find_element_by_xpath("//span[contains(@class, 'ui_bubble_rating bubble_')]").get_attribute("class")
data = var.split("_")
print(data[-1])
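Note that find_element returns only the first matching element, which would explain getting the same rating for every review; find_elements (plural) returns all matches. The conversion from class name to stars can be kept in a small pure function. This is a sketch that assumes the class always carries a bubble_NN token where NN is ten times the star value, as in the question's "50 = 5 star"; rating_from_class is a hypothetical name.

```python
import re


def rating_from_class(class_value):
    """Convert a class string such as 'ui_bubble_rating bubble_50' to 5.0."""
    match = re.search(r"bubble_(\d+)", class_value)
    if match is None:
        return None  # no bubble_NN token in the class attribute
    return int(match.group(1)) / 10
```

Applied per review, this would be rating_from_class(el.get_attribute("class")) for each el in the list returned by driver.find_elements_by_xpath with the same XPath as above.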