Trying to get the finance data from this div. There is no unique identifier for the div, so I collect all three or four divs, check whether the word FINANSE appears in the text, and if it does, get the inner div's text. However, it doesn't seem to work. Is there another approach, or what am I missing here? Thanks in advance.
link = https://rejestr.io/krs/882875/fortuna-cargo
fin_divs = driver.find_elements_by_css_selector('div.card.mb-4')
for div in fin_divs:
    if 'FINANSE' in div.text:
        finances = div.find_element_by_css_selector('div.card-body').text
    else:
        finances = "Finance Data Not Available"
You can simplify your code by selecting the exact element instead of looping through a list of elements:
finances = driver.find_element_by_xpath('//div[div="Finanse"]/div[@class="card-body"]').text
print(finances)
>>>Kapitał zakładowy
>>>5 tys. zł
You are doing everything correctly; just add a break inside the if statement so that finances is not overwritten with "Finance Data Not Available" after the correct div has been found:
fin_divs = driver.find_elements_by_css_selector('div.card.mb-4')
for div in fin_divs:
    if 'FINANSE' in div.text:
        finances = div.find_element_by_css_selector('div.card-body').text
        break
    else:
        finances = "Finance Data Not Available"
As an exercise for learning Python and Selenium, I'm trying to write a script that checks a web page with all kinds of commercial deals, finds all the specific food deals (class name 'tag-food'), puts them in a list (elem), then checks which ones contain the text 'sushi', and for those elements extracts the HTML element which contains the price, and prints the results.
I have:
elem = driver.find_elements_by_class_name('tag-food')
i = 0
while i < len(elem):
    source_code = elem[i].get_attribute("innerHTML")
    # ?? how to check if source_code contains 'sushi'?
    # ?? if true how to extract price data?
    i = i + 1
driver.quit()
What's the best and most direct way to do these checks? Thanks! 🙏
I don't think you need a while loop for this. Also, you would be looking for a text value, not the innerHTML. You can make it simpler, like this:
for row in driver.find_elements_by_class_name('tag-food'):
    if "sushi" in row.get_attribute("innerText"):
        print("Yes this item has sushi")
        # find element to grab price, store in variable to do something else with
    else:
        print("No sushi in this item")
Or even just this, depending on how the text in the HTML is structured:
for row in driver.find_elements_by_class_name('tag-food'):
    if "sushi" in row.text:
        print("Yes this item has sushi")
        # find element to grab price, store in variable to do something else with
    else:
        print("No sushi in this item")
I created a Selenium script to check that the number shown in the cart is zero (0). However, it returns an empty string even though the field shows zero.
Script:
shopping_cart_qty = self.driver.find_element_by_xpath("//span[contains(@class,'topnav-cart-qty')]").text
self.assertEqual('0', shopping_cart_qty, "The shopping cart is not empty")
Output:
The shopping cart is not empty
!= 0
Expected :0
Actual :
Try with:
shopping_cart_qty = self.driver.find_element_by_xpath("//span[contains(@class,'topnav-cart-qty')]").get_attribute("value")
shopping_cart_qty = self.driver.find_element_by_xpath("//span[contains(@class,'topnav-cart-qty')]").get_attribute("textContent")
or:
shopping_cart_qty = self.driver.find_element_by_xpath("//span[contains(@class,'topnav-cart-qty')]")
self.driver.execute_script("arguments[0].scrollIntoView();", shopping_cart_qty)
print(shopping_cart_qty.text)
If the element is not visible, .text returns an empty string because it takes visibility into account; you can use textContent to retrieve the text regardless of visibility. But the best approach is to scroll the element into view first and then get its text.
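If the empty value is just a timing issue (the counter is rendered after the page settles), an explicit wait is another option. A minimal sketch, assuming the same XPath as above:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 10 seconds for the cart counter to become visible before reading it
qty_element = WebDriverWait(self.driver, 10).until(
    EC.visibility_of_element_located((By.XPATH, "//span[contains(@class,'topnav-cart-qty')]"))
)
self.assertEqual('0', qty_element.text, "The shopping cart is not empty")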
I am trying to scrape sales data for recently sold items from eBay with BeautifulSoup in Python, and it works very well with the following code, which finds all prices and all dates of sold items.
price = []
try:
    p = soup.find_all('span', class_='POSITIVE')
except:
    p = 'nan'
for x in p:
    x = str(x)
    x = x.replace(' ', '"')
    x = x.split('"')
    if '>Sold' in x:
        continue
    else:
        price.append(x)
Now I am running into a problem, though. As seen in the picture for this URL (https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2334524.m570.l1313&_nkw=babe+ruth+1933+goudey+149+psa+%281.5%29&_sacat=0&LH_TitleDesc=0&_osacat=0&_odkw=babe+ruth+1933+goudey+149+psa+1.5&LH_Complete=1&rt=nc&LH_Sold=1), eBay sometimes suggests other search results when there are not enough for a specific search query. Because of that, my code finds not only the correct prices but also those of the suggested results below the warning. I tried to find out where the warning message is located and delete every listing found after it, but I cannot figure it out. I also thought I could search for the prices one by one, but even then I cannot figure out how to detect when the warning appears.
Is there any other way you can think of to solve this? I am aware that this is really specific.
You can scrape the number of results (shown in the picture) and loop over that range. The code will be something like:
results = soup.find...
# You have to make the variable an int, so strip everything extra first
results = int(results)
for i in range(results):
    price[i] = str(price[i])
    price[i] = price[i].replace(' ', '"')
    price[i] = price[i].split('"')
    if '>Sold' in price[i]:
        continue
    else:
        pass  # price[i] is within the genuine results, so keep it
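Another option is to walk the results in document order and stop as soon as you hit eBay's "results matching fewer words" notice. A rough sketch; the class names used here ('li.s-item' and 's-item__price' for listings and prices, 'srp-river-answer' for the notice) are assumptions you should verify by inspecting the page:
prices = []
for node in soup.select('li.s-item, div.srp-river-answer'):
    if 'srp-river-answer' in node.get('class', []):
        # everything after this notice is a suggested result, not a real match
        break
    price_span = node.select_one('span.s-item__price')
    if price_span:
        prices.append(price_span.get_text(strip=True))
print(prices)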
There are multiple buttons on my page containing similar hrefs. They differ only in id_invoices. I want to click one button on the page using XPath and an href which looks like:
href="/pl/payment/getinvoice/?id_invoices=461"
I can select all the buttons using:
invoices = driver.find_elements_by_xpath("//a[contains(@href, '/payment/getinvoice/')]")
but I need to select only the button with the highest id_invoices. Can it be done? :)
What you can do is:
invoices = driver.find_elements_by_xpath("//a[contains(@href, '/payment/getinvoice/')]")
# take the numeric part after 'id_invoices=' from each href
ids = [int(inv.get_attribute("href").split("id_invoices=")[-1]) for inv in invoices]
highest = max(ids)
driver.find_element_by_xpath("//a[contains(@href, '/payment/getinvoice/?id_invoices=" + str(highest) + "')]").click()
I don't know much about Python, so I'll give you a direction/algorithm to achieve the same:
Use getAttribute('href'); you will get the URL strings.
Split each string you get in the invoice list by '=' and pick up the last part.
Typecast that string to an int, as the last value after '=' will be a number.
Then you just need to pick the highest value.
Since you have an XPath that returns all the desired elements, you just need to grab the href attribute from each one, split the href by '=' to get the id (the 2nd part of the string), find the largest id, and then use that id to find the element you want and click on it.
invoices = driver.find_elements_by_xpath("//a[contains(@href, '/payment/getinvoice/')]")
ids = []
for invoice in invoices:
    ids.append(invoice.get_attribute("href").split('=')[1])
results = list(map(int, ids))  # you can't do max on a list of strings, you won't get the right answer
highest = max(results)
driver.find_element_by_xpath("//a[@href='/pl/payment/getinvoice/?id_invoices=" + str(highest) + "']").click()
I'm trying to get the text $27.50 inside a <div> tag. I located the element by its id, and the element is called "price".
The snippet of html is as follows:
<div id="PPP,BOSSST,NYCPAS,2015-04-26T01:00:00-04:00,2015-04-26T05:20:00-04:00,_price" class="price inlineBlock strong mediumText">$27.50</div>
Here is what I've tried:
price.text
price.get_attribute('value')
Neither of the above works.
Update:
Thanks to everyone who tried to help. I combined your answers and got the solution :)
price = driver.find_element_by_xpath("//div[@class='price inlineBlock strong mediumText']")
price_content = price.get_attribute('innerHTML')
print(price_content.strip())
Can't you use a regular expression or Beautiful Soup to find the contents of the element in the HTML?
re.search(r'<div.*?>(.*?)</div>', price.get_attribute('outerHTML')).group(1)
Your element is hidden; last time I worked with Selenium, you were not able to get the text of hidden elements. That said, you can always execute JavaScript. I don't usually write in Python, but it should be something like:
val = driver.execute_script("return document.getElementById('locator').innerHTML")
Change the CSS selector to
div[id$='_price']
which matches any div whose id ends with _price. Complete code:
from selenium.webdriver.common.by import By

price = fltright.find_element(By.CSS_SELECTOR, "div[id$='_price']")
price.text
I tried your edited solution, but it only gets one div with that class. So I tried the following to print a list of divs having the same class.
Changing find_element to find_elements will output a list:
price = driver.find_elements_by_xpath('//div[@class = "price inlineBlock strong mediumText"]')
Use for ... in range() to print the list:
num = len(price)
for i in range(num):
    print(price[i].text)