I am new to Python and Selenium, but I think I mostly figured it out; I tried to build some examples for myself to learn from. I have two questions.
First, for some reason my code stops after my input and never reaches the yalla() function:
yallaurl = str(input('Your URL + ' + ""))
browser = webdriver.Chrome()
browser.get(yallaurl)
browser.maximize_window()
yalla()
My other question is about browser.find_element_by_xpath. After I inspect an element in the HTML and click "Copy XPath", I get something like this:
/html/body/table[2]/tbody/tr/td/form/table[4]/tbody/tr[2]/td/table/tbody/tr[2]/td[2]
So how does this line of code work? Is this even legit?
def yalla():
    sleep(2)
    count = len(browser.find_elements_by_class_name('flyingCart'))
    email = browser.find_element_by_xpath('/html/body/table[2]/tbody/tr/td/form/table[4]/tbody/tr[2]/td/table/tbody/tr[2]/td[2]')
    for x in range(2, count):
        itemdesc[x] = browser.find_element_by_xpath(
            "/html/body/table[2]/tbody/tr/td/form/table[1]/tbody/tr[2]/td[2]/table/tbody/tr[x]/td[2]/a[1]/text()")
        priceper[x] = browser.find_element_by_xpath(
            "/html/body/table[2]/tbody/tr/td/form/table[1]/tbody/tr[2]/td[2]/table/tbody/tr[x]/td[5]/text()")
        amount[x] = browser.find_element_by_xpath(
            "/html/body/table[2]/tbody/tr/td/form/table[1]/tbody/tr[2]/td[2]/table/tbody/tr[x]/td[6]")
    browser.navigate().to('https://www.greeninvoice.co.il/app/documents/new#type=100')
    checklogininvoice()
Yes, your code will run just fine and is legit, but it is not recommended.
As written, the absolute path works, but it will break if the HTML changes even slightly.
Reference: https://selenium-python.readthedocs.io/locating-elements.html
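For example, a locator anchored to a stable attribute is far less brittle than a full absolute path. A sketch (the id and attribute names here are hypothetical; your page's HTML will differ):

from selenium.webdriver.common.by import By

# Relative XPath anchored to a (hypothetical) stable attribute:
email = browser.find_element(By.XPATH, "//td[@id='email']")
# Or a CSS selector, where an id exists:
email = browser.find_element(By.CSS_SELECTOR, "#email")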
Firstly, this code is confusing:
yallaurl = str(input('Your URL + ' + ""))
This is essentially equivalent to:
yallaurl = input('Your URL: ')
Yes, this code is correct:
browser.find_element_by_xpath('/html/body/table[2]/tbody/tr/td/form/table[4]/tbody/tr[2]/td/table/tbody/tr[2]/td[2]')
Please refer to the docs for proper usage.
Here is the suggested use of this method:
from selenium.webdriver.common.by import By
driver.find_element(By.XPATH, '/html/body/table[2]/tbody/tr/td/form/table[4]/tbody/tr[2]/td/table/tbody/tr[2]/td[2]')
This code will return an object of the element you have selected. To print the HTML of the element itself, this should work:
print(element.get_attribute('outerHTML'))
For further information on page objects, please refer to this page of the docs.
As for your code stopping after the input: that is hard to diagnose from what you have posted, but note that yalla() must already be defined at the point where you call it, otherwise Python raises a NameError right after the input line.
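A minimal sketch of an ordering that works (function body trimmed for brevity): define yalla() first, then call it:

from selenium import webdriver
from time import sleep

def yalla():
    sleep(2)
    # ... the rest of your function ...

yallaurl = input('Your URL: ')
browser = webdriver.Chrome()
browser.get(yallaurl)
browser.maximize_window()
yalla()  # defined above, so the call resolves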
I'm using Selenium to check whether Facebook pages exist. When I enter the page title in the search bar it works fine, but after the second loop iteration the name of the previous page stays attached to the search text, and I can't find a way to clear the previous search.
For example, it looks for xyz the first time, then it looks for xyzabc, when I just want to look for abc this time.
How can I clear the search bar so I can enter the input without the previous input?
Here is my code:
for page_target in df.page_name.values:
    time.sleep(3)
    inputElement = driver.find_element_by_name("q")
    inputElement.send_keys(page_target)
    inputElement.submit()
    time.sleep(5)
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser').get_text()
    title = soup.find(page_target)
    # if page exists add 1 to the dict, otherwise -1
    if title > 0:
        dic_holder[page_target] = 1
    else:
        dic_holder[page_target] = -1
    driver.find_element_by_name("q").clear()
    time.sleep(3)
You can use:
WebElement.clear()           # to clear the previous search item
WebElement.send_keys("abc")  # to insert the new search
Also, I guess you have a sticky search in your application, so I recommend using this pattern every time you insert something into the search box.
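In Python, with the names from your code, that would look roughly like this (a sketch, assuming the search box is still found by name="q"):

input_element = driver.find_element_by_name("q")
input_element.clear()                  # drop the previous search text
input_element.send_keys(page_target)   # type the new term only
input_element.submit()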
A few ways to do it:
Use element.clear(). I see that you already tried it in your code; I am not sure why it didn't work, unless the element is not actually a text box or input element.
Use JavaScript: driver.execute_script('document.getElementsByName("q")[0].value=""')
Emulate Ctrl+A to select the old text, then type over it:
from selenium.webdriver.common.keys import Keys
elem.send_keys(Keys.CONTROL, 'a')
elem.send_keys("page 1")
I'm trying to make a temporary email generator using 20 Minute Mail, but I can't seem to print the text from my XPath. I started Python 2 months ago and have been getting really good answers to my other questions. Any response is appreciated.
Code:
from selenium import webdriver
from time import sleep
PATH = "C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("http://www.20minutemail.com/")
sleep(1)
createMail = driver.find_element_by_xpath("//*[@id=\"headerwrap\"]/header/div[2]/div/div/input[2]")
createMail.click()
sleep(3)
email = driver.find_element_by_xpath("//*[@id=\"userTempMail\"]/text()")
print(email)
I've had similar problems when I tried to get some kind of attribute using XPath; I'm still not sure why. I worked around it using the WebElement's text attribute. Try this:
email = driver.find_element_by_xpath("//*[@id=\"userTempMail\"]").text
Also, if you want to optimize your code, you can replace sleep(time) with WebDriverWait(driver, timeout).until(some_condition). That stops halting your code as soon as some_condition is met. More on this here: https://selenium-python.readthedocs.io/waits.html#explicit-waits
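A minimal sketch of an explicit wait for this page (assuming the element's id really is userTempMail):

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Wait up to 10 seconds for the email element to appear,
# instead of sleeping for a fixed time.
email = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "userTempMail"))
)
print(email.text)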
I changed it to
email = driver.find_element_by_xpath("//*[@id=\"userTempMail\"]")
(taking the /text() out so the XPath selects the element itself, not a text node)
then doing
print(email.text)
to get the inner text out.
I am attempting to web-scrape info off of the following website: https://www.axial.net/forum/companies/united-states-family-offices/
I am trying to scrape the description for each family office, so "https://www.axial.net/forum/companies/united-states-family-offices/" + insert_company_name are the pages I need to scrape.
So I wrote the following code to test the program for just one page:
from bs4 import BeautifulSoup as soup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome('insert_path_here/chromedriver')
driver.get("https://network.axial.net/company/ansaco-llp")
page_source = driver.page_source
soup2 = soup(page_source,"html.parser")
soup2.findAll('axl-teaser-description')[0].text
This works for the single page, as long as the description doesn't have a "show full description" drop down button. I will save that for another question.
I wrote the following loop:
#Note: lst2 has all the names for the companies. I made sure they match the webpage
lst3 = []
for key in lst2[1:]:
    driver.get("https://network.axial.net/company/" + key.lower())
    page_source = driver.page_source
    for handle in driver.window_handles:
        driver.switch_to.window(handle)
    word_soup = soup(page_source, "html.parser")
    if word_soup.findAll('axl-teaser-description') == []:
        lst3.append('null')
    else:
        c = word_soup.findAll('axl-teaser-description')[0].text
        lst3.append(c)
print(lst3)
When I run the loop, all of the values come out as "null", even the ones without "click for full description" buttons.
I edited the loop to print out word_soup instead, and the page is different from when I run it outside the loop, and it does not have the description text.
I don't understand why a loop would cause that, but apparently it does. Does anyone know how to fix this problem?
Found the solution: pause the program for 3 seconds after driver.get:
import time

lst3 = []
for key in lst2[1:]:
    driver.get("https://network.axial.net/company/" + key.lower())
    time.sleep(3)
    page_source = driver.page_source
    word_soup = soup(page_source, "html.parser")
    if word_soup.findAll('axl-teaser-description') == []:
        lst3.append('null')
    else:
        c = word_soup.findAll('axl-teaser-description')[0].text
        lst3.append(c)
print(lst3)
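An explicit wait is usually more reliable than a fixed sleep, since it proceeds as soon as the element appears and waits longer when a page is slow. A sketch of the same loop with WebDriverWait, assuming the appearance of the axl-teaser-description tag is what signals that the JavaScript has rendered (lst2 and soup as in the code above):

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

lst3 = []
for key in lst2[1:]:
    driver.get("https://network.axial.net/company/" + key.lower())
    try:
        # Wait up to 10 seconds for the JS-rendered description element.
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.TAG_NAME, "axl-teaser-description"))
        )
        word_soup = soup(driver.page_source, "html.parser")
        lst3.append(word_soup.findAll('axl-teaser-description')[0].text)
    except TimeoutException:
        lst3.append('null')  # the description never appeared
print(lst3)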
I see that the page uses JavaScript to generate the text, meaning it doesn't show up in the page source until the script has run, which is odd but okay. I don't quite understand why you're iterating through and switching to every window handle Selenium has open, but either way you won't find the description in the raw page source / BeautifulSoup.
Honestly, I'd look for a better website if you can; otherwise you'll have to do it with Selenium, which is inefficient and horrible.
For various reasons, all I want is to read back from the input field what I have just written IN THE INPUT FIELD, just to check.
from selenium import webdriver
import os
xpath_user = '//*[#id="login-username"]'
user = 'user@yahoo.com'
dir_path = os.path.dirname(os.path.realpath(__file__))
chromedriver = dir_path + "/chromedriver.exe"
driver = webdriver.Chrome(chromedriver)
driver.implicitly_wait(3)
driver.get('https://www.yahoo.com')
driver.find_element_by_xpath(xpath_user).send_keys(user)
element = driver.find_element_by_xpath(xpath_user).text
print(element)
if element == 'user@yahoo.com':
    print("Good")
In this example the output is '', but I want the actual 'user@yahoo.com'. I don't know if it is even possible, because 'user@yahoo.com' doesn't appear in the HTML of the page. Maybe I am missing something or there is a workaround. I'll be glad if someone can help me.
Note that my experience with Python is limited.
Try driver.find_element_by_xpath(xpath_user).get_attribute("value")
The text property only returns the text between an element's opening and closing tags; the typed value lives in the element's value attribute.
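Putting that together with the original example, only the read-back line changes (a sketch):

driver.find_element_by_xpath(xpath_user).send_keys(user)
# The typed text lives in the input's "value" attribute, not its inner text.
element = driver.find_element_by_xpath(xpath_user).get_attribute("value")
print(element)
if element == 'user@yahoo.com':
    print("Good")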
I've successfully coded Python to open a website and then click a link. After that, I'd like to grab info from the site that is "active". But I'm getting the following error:
Search = Regex.search(res.text)
AttributeError: 'NoneType' object has no attribute 'text'
I think the problem is that I don't know how to "define" the clicked-into webpage as a variable. Here is the code that is relevant:
import re, requests, csv, pyperclip, logging
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('http://espn.go.com/golf/players')
find_player = browser.find_element_by_partial_link_text("Allenby, Robert")
res = find_player.click()
xRegex = re.compile(r'(1991)')
xSearch = xRegex.search(res.text)
output_player_name = xSearch.group(1)
This is my first Python coding experience and my first post to ask a question. Thanks in advance for any help.
PS I know that 1991 appears in the webpage. It's the year Robert Allenby turned pro.
find_element_by_partial_link_text() returns an anchor element, and click() is a browser event that returns None. To get the page content, use browser.page_source and process it as required.
import re
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('http://espn.go.com/golf/players')
href_link = browser.find_element_by_partial_link_text("Allenby")
href_link.click()
res = browser.page_source
print(res)
xRegex = re.compile(r'(1991)')
xSearch = xRegex.search(res)
output_player_name = xSearch.group(1)
print(output_player_name)
Hope this helps :)
find_player is of type WebElement; res is of type NoneType. The clicked-into webpage is held in the browser variable, so you can call functions like find_element_by_partial_link_text on it to find what you need.
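For instance, after the click you can search the now-active page through browser directly (a sketch reusing the regex from your question):

# browser now holds the clicked-into page, so search its source:
match = re.search(r'(1991)', browser.page_source)
if match:
    print(match.group(1))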
By the way, what are you trying to accomplish? It might help me answer your question better.