Can't print text with HTML XPath with Selenium - Python

I'm trying to make a temporary email generator using 20 Minute Mail, but I can't seem to print the text from my XPath. I started Python 2 months ago and have been getting really good answers to my other questions. Any response is appreciated.
code:
from selenium import webdriver
from time import sleep
PATH = "C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("http://www.20minutemail.com/")
sleep(1)
createMail = driver.find_element_by_xpath("//*[@id=\"headerwrap\"]/header/div[2]/div/div/input[2]")
createMail.click()
sleep(3)
email = driver.find_element_by_xpath("//*[@id=\"userTempMail\"]/text()")
print(email)

I've had similar problems when I tried to get some kind of attribute using XPath; I'm still not sure why. I worked around it using the WebElement's text attribute. Try this:
email = driver.find_element_by_xpath("//*[@id=\"userTempMail\"]").text
Also, if you want to optimize your code, you can change sleep(time) to WebDriverWait(driver, timeout).until(some_condition). This stops halting your code as soon as some_condition is met. More on this here: https://selenium-python.readthedocs.io/waits.html#explicit-waits
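For example, a minimal sketch of the question's flow with an explicit wait in place of the fixed sleeps (the 10-second timeout is an arbitrary choice):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome("C:\\Program Files (x86)\\chromedriver.exe")
driver.get("http://www.20minutemail.com/")
createMail = driver.find_element_by_xpath("//*[@id=\"headerwrap\"]/header/div[2]/div/div/input[2]")
createMail.click()
# wait until the email element exists instead of sleeping a fixed time
email = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "userTempMail")))
print(email.text)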

I changed it to
email = driver.find_element_by_xpath("//*[@id=\"userTempMail\"]")
(taking the /text() out, since find_element_by_xpath has to return an element, not a text node)
then doing
print(email.text)
to get the inner text out.

Related

For loops while using Selenium for web scraping in Python

I am attempting to web-scrape info off of the following website: https://www.axial.net/forum/companies/united-states-family-offices/
I am trying to scrape the description for each family office, so "https://www.axial.net/forum/companies/united-states-family-offices/" + insert_company_name are the pages I need to scrape.
So I wrote the following code to test the program for just one page:
from bs4 import BeautifulSoup as soup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome('insert_path_here/chromedriver')
driver.get("https://network.axial.net/company/ansaco-llp")
page_source = driver.page_source
soup2 = soup(page_source,"html.parser")
soup2.findAll('axl-teaser-description')[0].text
This works for the single page, as long as the description doesn't have a "show full description" drop down button. I will save that for another question.
I wrote the following loop:
# Note: lst2 has all the names for the companies. I made sure they match the webpage.
lst3 = []
for key in lst2[1:]:
    driver.get("https://network.axial.net/company/" + key.lower())
    page_source = driver.page_source
    for handle in driver.window_handles:
        driver.switch_to.window(handle)
    word_soup = soup(page_source, "html.parser")
    if word_soup.findAll('axl-teaser-description') == []:
        lst3.append('null')
    else:
        c = word_soup.findAll('axl-teaser-description')[0].text
        lst3.append(c)
print(lst3)
When I run the loop, all of the values come out as "null", even the ones without "click for full description" buttons.
I edited the loop to instead print out "word_soup", and the page source is different than if I had run it without a loop and does not contain the description text.
I don't understand why a loop would cause that but apparently it does. Does anyone know how to fix this problem?
Found a solution: pause the program for 3 seconds after driver.get:
import time

lst3 = []
for key in lst2[1:]:
    driver.get("https://network.axial.net/company/" + key.lower())
    time.sleep(3)
    page_source = driver.page_source
    word_soup = soup(page_source, "html.parser")
    if word_soup.findAll('axl-teaser-description') == []:
        lst3.append('null')
    else:
        c = word_soup.findAll('axl-teaser-description')[0].text
        lst3.append(c)
print(lst3)
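An alternative worth noting: an explicit wait instead of a fixed sleep avoids both wasted time and too-short pauses. A sketch reusing driver, soup, and lst2 from above (the 10-second timeout is an arbitrary choice):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

lst3 = []
for key in lst2[1:]:
    driver.get("https://network.axial.net/company/" + key.lower())
    try:
        # wait for the JavaScript-rendered description tag instead of sleeping
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.TAG_NAME, "axl-teaser-description")))
    except TimeoutException:
        lst3.append('null')
        continue
    word_soup = soup(driver.page_source, "html.parser")
    lst3.append(word_soup.findAll('axl-teaser-description')[0].text)
print(lst3)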
I see that the page uses JavaScript to generate the text, meaning it doesn't show up in the initial page source, which is weird but OK. I don't quite understand why you're iterating through and switching to all the window handles you have open, but you definitely won't find the description in the raw page source / BeautifulSoup.
Honestly, I'd personally look for a better website if you can; otherwise, you'll have to do it with Selenium, which is inefficient and horrible.

Webpage formatted in a way that makes selecting text with selenium impossible

This problem is driving me insane: I'm trying to capture the response from a Pandorabot using Selenium but although I can input text and make the bot reply, its webpage is formatted in such a way that makes selecting the output text a nightmare.
This is my code in Python:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from time import sleep
driver = webdriver.Firefox()
driver.get("http://demo.vhost.pandorabots.com/pandora/talk?botid=b0dafd24ee35a477")
elem = driver.find_element_by_name("input")
elem.clear()
elem.send_keys("hello")
elem.send_keys(Keys.RETURN)
line = driver.find_element_by_xpath("(//input)[@name='botcust2']/preceding::font[1]/*")
print(line)
response = line.text
print(response)
driver.close()
which manages to get the first bit of the response ("Chomsky:") but not the rest.
How do I get to properly capture the response text (ideally excluding the bot name)?
Is there a more elegant way to do it (eg jquery script) that wouldn't break so easily if the webpage gets reformatted?
Many thanks!
Edit
So, after playing around a bit more with jQuery I found a workaround to the problem of any URL text not showing.
I set the whole text string into a variable and then I replace any instances of the name and empty lines with ''. So the jQuery code as pointed out by pguardiario becomes:
# get the last child text node
response = self.browser.execute_script("""
    var main_str = $('font:has(b:contains("Chomsky:"))').contents().has("br").last().text().trim();
    main_str = main_str.replace(/Chomsky:/g, '').replace(/^\\s*[\\r\\n]/gm, '');
    return main_str;
""")
I'm sure there may be better/more elegant ways to do the whole thing but for now it works.
Many thanks to pguardiario and everyone else for the suggestions!
Since you asked for jQuery:
from requests import get
body = get("http://code.jquery.com/jquery-1.11.3.min.js").content.decode('utf8')
driver.execute_script(body)
# get the last child text node
response = driver.execute_script("""
    return $('font:has(b:contains("Chomsky:"))').contents().last().text().trim()
""")
To capture the response from the Pandorabot using Selenium, as the response is within a text node, you can use the execute_script() method as follows:
Code Block:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver.get('http://demo.vhost.pandorabots.com/pandora/talk?botid=b0dafd24ee35a477')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[name='input']"))).send_keys("hello")
driver.find_element_by_css_selector("input[value='Ask Chomsky']").click()
response_font = WebDriverWait(driver, 20).until(EC.visibility_of_element_located(
    (By.XPATH, "//input[@value='Ask Chomsky']//following-sibling::font[last()]//font")))
print(driver.execute_script("return arguments[0].lastChild.textContent;", response_font).strip())
Console Output:
Hi! Can I ask you a question?

Cycle through URLs from a txt

This is my first question so please bear with me (I have googled this and I did not find anything)
I'm making a program which goes to a url, clicks a button, checks if the page gets forwarded and if it does saves that url to a file.
So far I've got the first two steps done but I'm having some issues.
I want Selenium to repeat this process with multiple urls (if possible, multiple at a time).
I have all the urls in a txt called output.txt
At first I did
url_list = "https://example.com"
to see if my program even worked, and it did. However, I am stuck on how to get it to go to the next URL in the list, and I am unable to find anything on the internet which helps me.
This is my code so far
import selenium
from selenium import webdriver
url_list = "C\\user\\python\\output.txt"
def site():
    driver = webdriver.Chrome("C:\\python\\chromedriver")
    driver.get(url_list)
    send = driver.find_element_by_id("NextButton")
    send.click()
    if driver.find_elements_by_css_selector("a[class='Error']"):
        print("Error class found")
I have no idea as to how I'd get Selenium to go to the first URL in the list, then go on to the second one, and so forth.
If anyone would be able to help me I'd be very grateful.
I think the problem is that you assumed the name of the file containing the URLs is itself a URL. You need to open the file first and build the URL list.
According to the docs (https://selenium.dev/documentation/en/webdriver/browser_manipulation/), get() expects a URL, not a file path.
import selenium
from selenium import webdriver
with open("C\\user\\python\\output.txt") as f:
    url_list = f.read().split('\n')

def site():
    driver = webdriver.Chrome("C:\\python\\chromedriver")
    for url in url_list:
        driver.get(url)
        send = driver.find_element_by_id("NextButton")
        send.click()
        if driver.find_elements_by_css_selector("a[class='Error']"):
            print("Error class found")

Selenium stops after input

I am new to Python and Selenium coding, but I think I figured it out and tried to build some examples for myself to learn from. I have 2 questions.
First of all, for some reason my code stops after my input; it does not go on to the yalla() function:
yallaurl = str(input('Your URL + ' + ""))
browser = webdriver.Chrome()
browser.get(yallaurl)
browser.maximize_window()
yalla()
Other than this, the other question is about browser.find_element_by_xpath. After I go to an HTML file and click Copy XPath, I get something like this:
/html/body/table[2]/tbody/tr/td/form/table[4]/tbody/tr[2]/td/table/tbody/tr[2]/td[2]
So how is this line of code working? Is this legit?
def yalla():
    sleep(2)
    count = len(browser.find_elements_by_class_name('flyingCart'))
    email = browser.find_element_by_xpath('/html/body/table[2]/tbody/tr/td/form/table[4]/tbody/tr[2]/td/table/tbody/tr[2]/td[2]')
    for x in range(2, count):
        itemdesc[x] = browser.find_element_by_xpath(
            "/html/body/table[2]/tbody/tr/td/form/table[1]/tbody/tr[2]/td[2]/table/tbody/tr[x]/td[2]/a[1]/text()")
        priceper[x] = browser.find_element_by_xpath(
            "/html/body/table[2]/tbody/tr/td/form/table[1]/tbody/tr[2]/td[2]/table/tbody/tr[x]/td[5]/text()")
        amount[x] = browser.find_element_by_xpath(
            "/html/body/table[2]/tbody/tr/td/form/table[1]/tbody/tr[2]/td[2]/table/tbody/tr[x]/td[6]")
    browser.navigate().to('https://www.greeninvoice.co.il/app/documents/new#type=100')
    checklogininvoice()
Yes, your code will run just fine and is legit, but it's not recommended.
As described, the absolute path works fine, but it would break if the HTML were changed only slightly.
Reference: https://selenium-python.readthedocs.io/locating-elements.html
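For example, a locator tied to a stable attribute survives layout changes much better than an absolute path; a quick sketch (the id 'email-cell' is hypothetical, purely for illustration):
email = browser.find_element_by_xpath("//td[@id='email-cell']")
# or, skipping XPath entirely:
email = browser.find_element_by_id("email-cell")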
Firstly, this code is confusing:
yallaurl = str(input('Your URL + ' + ""))
This is essentially equivalent to:
yallaurl = input('Your URL: ')
Yes, this code is correct:
browser.find_element_by_xpath('/html/body/table[2]/tbody/tr/td/form/table[4]/tbody/tr[2]/td/table/tbody/tr[2]/td[2]')
Please refer to the docs for proper usage.
Here is the suggested use of this method:
from selenium.webdriver.common.by import By
driver.find_element(By.XPATH, '/html/body/table[2]/tbody/tr/td/form/table[4]/tbody/tr[2]/td/table/tbody/tr[2]/td[2]')
This code will return an object of the element you have selected. To print the HTML of the element itself, this should work:
print(element.get_attribute('outerHTML'))
For further information on page objects, please refer to this page of the docs.
Since you have not provided the code for your 'yalla' function, it is hard to diagnose the problem there.
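One thing worth noting about the loop inside yalla(): the index is never actually substituted, because "tr[x]" is literal text inside the string, and /text() does not work with find_element_by_xpath (as covered in the first question on this page). A sketch of the loop body with the index interpolated and .text used instead, assuming itemdesc, priceper, and amount are containers defined elsewhere in the asker's code:
for x in range(2, count):
    row = "/html/body/table[2]/tbody/tr/td/form/table[1]/tbody/tr[2]/td[2]/table/tbody/tr[{}]".format(x)
    itemdesc[x] = browser.find_element_by_xpath(row + "/td[2]/a[1]").text  # element text, not /text()
    priceper[x] = browser.find_element_by_xpath(row + "/td[5]").text
    amount[x] = browser.find_element_by_xpath(row + "/td[6]")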

Python: How to use re.compile after click()

I've successfully coded Python to open a website and then click a link. After that, I'd like to grab info from the site that is "active". But I'm getting the following error:
Search = Regex.search(res.text)
AttributeError: 'NoneType' object has no attribute 'text'
I think the problem is that I don't know how to "define" the clicked-into webpage as a variable. Here is the code that is relevant:
import re, requests, csv, pyperclip, logging
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('http://espn.go.com/golf/players')
find_player = browser.find_element_by_partial_link_text("Allenby, Robert")
res = find_player.click()
xRegex = re.compile(r'(1991)')
xSearch = xRegex.search(res.text)
output_player_name = xSearch.group(1)
This is my first Python coding experience and my first post to ask a question. Thanks in advance for any help.
PS I know that 1991 appears in the webpage. It's the year Robert Allenby turned pro.
find_element_by_partial_link_text() returns an anchor element, and click() is a browser event that returns None. To get the page content, use browser.page_source and process it as per your requirement.
import re
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('http://espn.go.com/golf/players')
href_link = browser.find_element_by_partial_link_text("Allenby")
href_link.click()
res = browser.page_source
print(res)
xRegex = re.compile(r'(1991)')
xSearch = xRegex.search(res)
output_player_name = xSearch.group(1)
print(output_player_name)
Hope this helps :)
find_player is of type WebElement; res is of type NoneType. The clicked-into webpage is held in the variable browser. You can use functions like find_element_by_partial_link_text to find what you need.
By the way, what are you trying to accomplish? It might help me answer your question better.
