I am working on a project here at work and I have print lines being displayed for 5+ elements. I need to write all the outputs into a txt file. I was told I can use a for loop, but I am not sure how. What are my best options?
import unittest
from selenium import webdriver
import time
import sys
driver = webdriver.Chrome()
driver.get('website')
driver.maximize_window()
# Displays Time Sheet for Current Day
element = driver.find_element_by_xpath('//*[@id="page-wrapper"]/div[1]/h1')
# this element is visible
print(element.text)
# Displays Start Time
element = driver.find_element_by_xpath('//*[@id="page-inner"]/div/div[1]/div/div/div/div[1]/div[2]/div[1]')
print("Start time was at:", element.text)
# Displays End Time
element = driver.find_element_by_xpath('//*[@id="page-inner"]/div/div[1]/div/div/div/div[4]/div[2]/div[1]')
print("Clocked out as of:", element.text)
# Displays when out to Lunch
element = driver.find_element_by_xpath('//*[@id="page-inner"]/div/div[1]/div/div/div/div[2]/div[2]/div[1]/h3')
print("I left for Lunch at:", element.text)
# Displays when back from Lunch
element = driver.find_element_by_xpath('//*[@id="page-inner"]/div/div[1]/div/div/div/div[3]/div[2]/div[1]/h3')
print("I arrived back from Lunch at:", element.text)
# Total Hours for The Day
element = driver.find_element_by_xpath('//*[@id="page-inner"]/div/div[1]/div/div/div/div[5]/div[2]/div[1]')
print("I was at work for:", element.text)
'''
# Save to txt
sys.stdout = open('file.txt', 'w')
print(element.text)
'''
# Screenshot
# driver.save_screenshot('screenshot.png')
driver.close()
if __name__ == '__main__':
unittest.main()
At the end here I need to save all the printed output into a txt file for documentation.
Selenium 3.5 with the Python 3.6.1 bindings provides a simple way to redirect all the console output into a log file.
You can create a sub-directory within your project space named Log and redirect the console output into a log file as follows:
# Firefox
driver = webdriver.Firefox(executable_path=r'C:\your_path\geckodriver.exe', log_path='./Log/geckodriver.log')
# Chrome
driver = webdriver.Chrome(executable_path=r'C:\your_path\chromedriver.exe', service_log_path='./Log/chromedriver.log')
# IE
driver = webdriver.Ie(executable_path=r'C:\your_path\IEDriverServer.exe', log_file='./Log/IEdriver.log')
It is worth mentioning that the verbosity of the webdriver is easily configurable.
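For the original goal of saving every printed value into a txt file, there is no need to redirect sys.stdout: collect each label and locator, then print and write in one for loop. A minimal sketch reusing the XPaths from the question (the labels are illustrative):
# Pair each label with its XPath, then loop once to print and save.
fields = [
    ("Time sheet:", '//*[@id="page-wrapper"]/div[1]/h1'),
    ("Start time was at:", '//*[@id="page-inner"]/div/div[1]/div/div/div/div[1]/div[2]/div[1]'),
    ("Clocked out as of:", '//*[@id="page-inner"]/div/div[1]/div/div/div/div[4]/div[2]/div[1]'),
    ("I left for Lunch at:", '//*[@id="page-inner"]/div/div[1]/div/div/div/div[2]/div[2]/div[1]/h3'),
    ("I arrived back from Lunch at:", '//*[@id="page-inner"]/div/div[1]/div/div/div/div[3]/div[2]/div[1]/h3'),
    ("I was at work for:", '//*[@id="page-inner"]/div/div[1]/div/div/div/div[5]/div[2]/div[1]'),
]
with open('file.txt', 'w') as f:
    for label, xpath in fields:
        element = driver.find_element_by_xpath(xpath)
        line = label + " " + element.text
        print(line)           # still visible on the console
        f.write(line + "\n")  # saved to the txt file for documentation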
I will describe a problem with an Excel VBA script calling a Python web-scraper script to collect and visualize financial data from the n-tv website.
It is an exercise, purely of private interest to me, to understand where my mistake is. I am a beginner in programming and not a professional, so please do not be irritated by my probably very poor programming style. I am just learning and this is an exercise.
First I will show you my Python web-scraper script, based on Selenium:
Text of code:
"""
Spyder Editor
This is a temporary script file.
"""
import time
print("ich bin überfordert")  # German: "I am overwhelmed"
time.sleep(3)
print("import ausgeführt")  # German: "import executed"
time.sleep(3)
from selenium import webdriver
driver = webdriver.Chrome(executable_path=r'C:\Program Files\Google\chromedriver.exe')
driver.get('http://www.n-tv.de')
time.sleep(5)
iframe = driver.find_element_by_xpath('//*[@id="sp_message_iframe_532515"]')
driver.switch_to.frame(iframe)
button = driver.find_element_by_xpath('//*[@id="notice"]/div[3]/div[2]/button')
button.click()
time.sleep(5)
driver.refresh()
driver.implicitly_wait(15)
link = driver.find_elements_by_class_name('stock__currency')
link[0].click()
time.sleep(3)
tab2 = driver.find_elements_by_class_name("tableholder")
rows = tab2[3].find_elements_by_class_name('linked')
datei = open('textdatei.txt','a')
for row in rows:
    cols = row.find_elements_by_tag_name('td')
    zeichen = ""
    for col in cols:
        print(col.text)
        zeichen = zeichen + col.text + "\t"
    print(zeichen)
    datei.write(zeichen + "\n")
datei.close()
driver.close()
It is based on Selenium: it clicks away a cookie button in an iframe, navigates to the target DAX financial data, and reads that data out into a text file.
Next, the text of the calling Excel VBA script:
Sub Textdatei_Einlesen()
Dim objShell As Object
Dim PythonExe As String, PythonScript As String
Set objShell = VBA.CreateObject("Wscript.Shell")
Dim TextAusDatei As String
Dim Zähler As Long
Dim Tabelle As Worksheet
PythonExe = """C:\Python39\python.exe"""
PythonScript = "C:\Users\annet\ScrapeTest.py"
'PythonScript = "D:\PYTHON-Test\shell_excel.py"
Set objShell = VBA.CreateObject("Wscript.Shell")
objShell.Run PythonExe & " " & PythonScript, 1, True
End Sub
The VBA script works in the sense that a simple Python test program, "shell_excel.py", is called by Excel and runs without problems, so everything seemed fine so far.
The small "shell_excel.py" counter-test script is executed correctly.
Following the source code of shell_excel.py:
# -*- coding: utf-8 -*-
"""
Spyder Editor
This is a temporary script file.
"""
import time
print("ich bin überfordert")
for i in range(0, 10):
    print(str(i))
    time.sleep(i)
    print("sek: " + str(i))
print("import ausgeführt")
But the problem arises when the Excel VBA calls the Python scraper script:
The web scraper is called and starts, up to the first print command of the Python script.
But the rest of the Python source code is not executed. After about half a second the shell prompt closes, without the webdriver ever being invoked and without any error message.
The scraper script runs until the first print command, but at the first imports it terminates without any result or effect.
I think there must be a problem with importing selenium etc.
And that I do not understand, because if I call and run my scrape script on its own under Spyder, it works fine.
I know there would be other ways to get the data into the Excel sheet, but ideally I would like to just open Excel, press the VBA run button, have the scraping and the other processes start, and have Excel automatically take the scraped data from the txt file to make graphics.
Does anyone have an idea where the problem in my Python environment could be?
To repeat: the scraper alone works, the Excel VBA script alone works, and the small Python test script is called correctly by Excel VBA, but when I switch to the Selenium-based script, it is not executed past the imports.
I am very sorry, the screenshots (jpg files) could not be included here; I do not know why, it seems I do not have the rights.
The Python script worked sometimes but not always, so I added some WebDriverWait blocks, and that seems to have fixed it. The VBA is much the same, except that I used Exec instead of Run to capture the output.
Option Explicit
Sub Textdatei_Einlesen()
Const PyExe = "C:\Python39\python.exe"
Const PyScript = "C:\Users\annet\ScrapeTest.py"
Dim objShell As Object, objScript As Object
Set objShell = VBA.CreateObject("Wscript.Shell")
Set objScript = objShell.Exec("""" & PyExe & """ " & PyScript)
MsgBox objScript.StdOut.ReadAll(), vbInformation
End Sub
Python:
import time
import sys
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
URL = 'https://www.n-tv.de'
DRIVERPATH = r'C:\Program Files\Google\chromedriver.exe'
def logit(s):
    log.write(time.strftime("%H:%M:%S ") + s + "\n")
# create logfile
logfile = sys.path[0] + '\\' + time.strftime('%Y-%m-%d_%H%M%S') + ".log"
log = open(logfile,'a')
# webdriver
s = Service(DRIVERPATH)
op = webdriver.ChromeOptions()
op.add_argument('--ignore-certificate-errors-spki-list')
driver = webdriver.Chrome(service=s,options=op)
print("Getting " + URL)
driver.get(URL)
try:
    iframe = WebDriverWait(driver, 3).until \
        (EC.presence_of_element_located((By.XPATH, '//*[@id="sp_message_iframe_532515"]')))
    logit("IFrame is ready!")
except TimeoutException:
    logit("Loading IFrame took too much time!")
    quit()
driver.switch_to.frame(iframe)
driver.implicitly_wait(3)
button = driver.find_element(By.XPATH, '//*[@id="notice"]/div[3]/div[2]/button')
button.click()
time.sleep(2)
driver.refresh()
try:
    WebDriverWait(driver, 5).until \
        (EC.presence_of_all_elements_located((By.CLASS_NAME, 'stock__currency')))
    logit("Page is ready!")
except TimeoutException:
    logit("Loading Page took too much time!")
    driver.quit()
    quit()
link = driver.find_elements(By.CLASS_NAME,'stock__currency')
link[0].click()
try:
    WebDriverWait(driver, 5).until \
        (EC.presence_of_all_elements_located((By.CLASS_NAME, 'tableholder')))
    logit("Table is ready!")
except TimeoutException:
    logit("Loading Table took too much time!")
    driver.quit()
    quit()
# find table
tab2 = driver.find_elements(By.CLASS_NAME,"tableholder")
rows = tab2[3].find_elements(By.CLASS_NAME,'linked')
# create text file
datei = open('textdatei.txt','a')
# write to text file
for row in rows:
    logit(row.get_attribute('innerHTML'))
    cols = row.find_elements(By.TAG_NAME, 'td')
    zeichen = ""
    for col in cols:
        zeichen = zeichen + col.text + "\t"
    datei.write(zeichen + "\n")
# exit
datei.close()
log.close()
driver.quit()
print("Ended see logfile " + logfile)
quit()
I wrote a short program to automate the process of clicking and saving profiles on LinkedIn.
Brief:
The program reads from a txt file containing a large number of LinkedIn URLs.
Using Selenium, it opens them one by one, then hits the "Open in Sales Navigator" button.
A new tab opens, and in it the program needs to click the "Save" button and choose the relevant list to save to.
I have two main problems:
LinkedIn has 3 versions of the same page. How can I use a condition to check which page version it is? (Meaning: if you can't find this button, move on to the next version.) From what I've seen, you can't really use a plain "if" with Selenium, because it causes trouble. Any other suggestions?
More important, and the reason I opened this thread: I want to monitor the "failed" links. Say I have a list of 1000 LinkedIn URLs and I run the program to save them to my account. I want to monitor the ones it didn't save or failed to open (broken links, unavailable pages, etc.). To achieve that, I used a CSV file and had the program record the pages, but it only captures the ones that were already saved on this account, which doesn't solve my problem. How can I make it record all the failures, not just those? (I find this hard to implement because when a page appears as "Unavailable", the program jumps to the next one, and I couldn't find a way to make it record that one.)
This makes it hard to work with: when I feed in 500 or 1000 URLs, I can't tell which ones were saved and which weren't.
Here's the code:
import selenium.webdriver as webdriver
import selenium.webdriver.support.ui as ui
from selenium.webdriver.common.keys import Keys
from time import sleep
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import NoSuchElementException
import csv
import random
options = webdriver.ChromeOptions()
options.add_argument('--lang=EN')
options.add_argument("--start-maximized")
prefs = {"profile.default_content_setting_values.notifications" : 2}
options.add_experimental_option("prefs",prefs)
driver = webdriver.Chrome(executable_path='assets\chromedriver', chrome_options=options)
driver.get("https://www.linkedin.com/login?fromSignIn=true")
minDelay=input("\n Please provide min delay in seconds : ")
maxDelay=input("\n Please provide max delay in seconds : ")
listNumber=input("\n Please provide list number : ")
outputFile=input('\n save skipped as?: ')
count=0
closed=2
with open("links.txt", "r") as links:
    for link in links:
        try:
            driver.get(link.strip())
            sleep(3)
            driver.find_element_by_xpath("//button[@class='save-to-list-dropdown__trigger ph5 artdeco-button artdeco-button--primary artdeco-button--3 artdeco-button--pro artdeco-dropdown__trigger artdeco-dropdown__trigger--placement-bottom ember-view']").click()
            sleep(2)
            count += 1
            if count == 1:
                driver.find_element_by_xpath("//ul[@class='save-to-list-dropdown__content']//ul//li[" + str(listNumber) + "]").click()
            else:
                driver.find_element_by_xpath("//ul[@class='save-to-list-dropdown__content']//ul//li[1]").click()
            sleep(2)
            sleep(random.randint(int(minDelay), int(maxDelay)))
        except:
            if closed == 0:
                driver.close()
                sleep(1)
            fileOutput = open(outputFile + ".csv", mode='a', newline='', encoding='utf-8')
            file_writer = csv.writer(fileOutput, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
            file_writer.writerow([link.strip()])
            fileOutput.close()
print("Finished.")
The common approach to having different sorts of listeners is to use EventFiringWebDriver. See the example here:
from selenium import webdriver
from selenium.webdriver.support.abstract_event_listener import AbstractEventListener
from selenium.webdriver.support.event_firing_webdriver import EventFiringWebDriver
class EventListener(AbstractEventListener):
    def before_click(self, element, driver):
        if element.tag_name == 'a':
            print('Clicking link:', element.get_attribute('href'))

if __name__ == '__main__':
    driver = EventFiringWebDriver(driver=webdriver.Firefox(), event_listener=EventListener())
    driver.get("https://webelement.click/en/welcome")
    link = driver.find_element_by_xpath('//a[text()="All Posts"]')
    link.click()
    driver.quit()
UPD:
Basically your case does not really need that listener. However, you can use it. Say you have a links file like:
https://google.com
https://invalid.url
https://duckduckgo.com/
https://sadfsdf.sdf
https://stackoverflow.com
Then the way with EventFiringWebDriver would be:
from selenium import webdriver
from selenium.webdriver.support.abstract_event_listener import AbstractEventListener
from selenium.webdriver.support.event_firing_webdriver import EventFiringWebDriver
broken_urls = []

class EventListener(AbstractEventListener):
    def on_exception(self, exception, drv):
        broken_urls.append(drv.current_url)

if __name__ == '__main__':
    driver = EventFiringWebDriver(driver=webdriver.Firefox(), event_listener=EventListener())
    with open("links.txt", "r") as links:
        for link in links:
            try:
                driver.get(link.strip())
            except:
                print('Cannot reach the link', link.strip())
    print("Finished.")
    driver.quit()

    import csv
    with open('broken_urls.csv', 'w', newline='') as broken_urls_csv:
        wr = csv.writer(broken_urls_csv, quoting=csv.QUOTE_ALL)
        wr.writerow(broken_urls)
and without EventFiringWebDriver would be:
broken_urls = []

if __name__ == '__main__':
    from selenium import webdriver

    driver = webdriver.Firefox()
    with open("links.txt", "r") as links:
        for link in links:
            stripped_link = link.strip()
            try:
                driver.get(stripped_link)
            except:
                print('Cannot reach the link', stripped_link)
                broken_urls.append(stripped_link)
    print("Finished.")
    driver.quit()

    import csv
    with open('broken_urls.csv', 'w', newline='') as broken_urls_csv:
        wr = csv.writer(broken_urls_csv, quoting=csv.QUOTE_ALL)
        wr.writerow(broken_urls)
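As for the first sub-question (detecting which of the three page versions is displayed), the usual pattern is to try each version's locator in turn and treat NoSuchElementException as "not this version". A sketch with hypothetical locators, not taken from the post:
from selenium.common.exceptions import NoSuchElementException

# Hypothetical XPaths, one per page version; replace with your real ones.
save_button_xpaths = [
    "//button[contains(@class, 'version-one-save')]",
    "//button[contains(@class, 'version-two-save')]",
    "//button[contains(@class, 'version-three-save')]",
]
save_button = None
for xpath in save_button_xpaths:
    try:
        save_button = driver.find_element_by_xpath(xpath)
        break  # found a button, so this locator matches the page version
    except NoSuchElementException:
        continue  # not this version, try the next locator
if save_button is not None:
    save_button.click()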
My task is to open each URL from the following website and retrieve some evaluation data for each essay. I have located the elements successfully, which means I get 10 elements. However, when Selenium begins to imitate a human clicking the URLs, it can only open the first of the ten links.
https://esi.clarivate.com/DocumentsAction.action
The code is as follows.
import time
from selenium import webdriver
driver=webdriver.Chrome('/usr/local/bin/chromedriver')
driver.get('https://esi.clarivate.com/IndicatorsAction.action?Init=Yes&SrcApp=IC2LS&SID=H3-M1jrs4mSS2O3WTFbtdrUJugtDvogGRIM-18x2dx2B1ubex2Bo9Y5F6ZPQtUZbfUAx3Dx3Dp1StTsneXx2B7vu85UqXoaoQx3Dx3D-03Ff2gF3hTJGBPDScD1wSwx3Dx3D-cLUx2FoETAVeN3rTSMreq46gx3Dx3D')
#add filter-> research fields-> "clinical medicine"
target = driver.find_element_by_id("ext-gen1065")
time.sleep(1)
target.click()
time.sleep(1)
n = driver.window_handles
driver.switch_to.window(n[-1])
links=driver.find_elements_by_class_name("docTitle")
length=len(links)
for i in range(0, length):
    item = links[i]
    item.click()
    time.sleep(1)
    handles = driver.window_handles
    index_handle = driver.current_window_handle
    for handle in handles:
        if handle != index_handle:
            driver.switch_to.window(handle)
        else:
            continue
    time.sleep(1)
    u1 = driver.find_elements_by_class_name("large-number")[2].text
    u2 = driver.find_elements_by_class_name("large-number")[3].text
    print(u1, u2)
    print("\n")
    driver.close()
    time.sleep(1)
    driver.switch_to_window(index_handle)
driver.quit()
print("————finished————")
And I tried to find out the problem by testing this code:
links=driver.find_elements_by_class_name("docTitle")
length=len(links)
print(length)
print(links[1].text)
#links[0].click()
links[1].click()
The result shows that the element had already been found, but it failed to open it. (When using links[0].text, it works fine.)
Any idea about this?
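No answer was recorded for this one, but one plausible cause (an assumption, not confirmed here) is that the references in links go stale once clicking opens and closes windows; re-locating the links on every iteration is a common workaround. A rough sketch:
# Re-find the links every iteration instead of reusing the initial list,
# so the references cannot go stale after switching windows.
length = len(driver.find_elements_by_class_name("docTitle"))
for i in range(length):
    links = driver.find_elements_by_class_name("docTitle")  # fresh lookup
    links[i].click()
    time.sleep(1)
    index_handle = driver.current_window_handle
    for handle in driver.window_handles:
        if handle != index_handle:
            driver.switch_to.window(handle)
    # ... read the evaluation data here, as in the original loop ...
    driver.close()
    driver.switch_to.window(index_handle)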
I have created a screen-scraping program using Selenium which prints out a few variables. I want to take the numbers it spits out and compare them to numbers in a text document, but I am unsure how to go about this. What would be the best way? The text file will contain 3 numbers, which will be compared to the 3 numbers that have been screen-scraped.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
# The above imports the modules needed for this code to work
chrome_path = r"C:\Users\ashabandha\Downloads\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://signin.acellus.com/SignIn/index.html")
time.sleep(2)
username = driver.find_element_by_id("Name")
password = driver.find_element_by_id("Psswrd")
username.send_keys("my login")
password.send_keys("my password")
time.sleep(2)
driver.find_element_by_xpath("""//*[#id="loginform"]/table[2]/tbody/tr/td[2]/input""").click()
#The program has now signed in and is going to navigate to the progress tab
time.sleep(2)
driver.get("https://admin252.acellus.com/StudentFunctions/progress.html?ClassID=484")
time.sleep(2)
#now we are on the progress tab
posts = driver.find_elements_by_class_name("Object7069")
time.sleep(2)
for post in posts:
    print(post.text)
#this gives me the first class log
time.sleep(2)
driver.get("https://admin252.acellus.com/StudentFunctions/progress.html?ClassID=326")
#This gives me second class log
time.sleep(2)
posts = driver.find_elements_by_class_name("Object7069")
time.sleep(2)
for post in posts:
    print(post.text)
time.sleep(2)
driver.get("https://admin252.acellus.com/StudentFunctions/progress.html?ClassID=292")
posts = driver.find_elements_by_class_name("Object7069")
time.sleep(2)
for post in posts:
    print(post.text)
Save the Selenium output in a data structure, like a list or a dictionary; then open the file, extract the values you want to compare against, and apply whatever comparison or expression you need: https://www.python.org/doc/
Check out the sections on working with files.
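A minimal sketch of that idea, assuming the text file (here called expected.txt, a made-up name) holds three numbers, one per line, and that the scraped texts parse as numbers:
# Collect the scraped values in a list instead of only printing them.
scraped = [float(post.text) for post in posts]  # assumes numeric text

# Read the three expected numbers from the text file, one per line.
with open('expected.txt') as f:
    expected = [float(line.strip()) for line in f if line.strip()]

# Compare pairwise and report the result.
for got, want in zip(scraped, expected):
    print(got, want, "OK" if got == want else "MISMATCH")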
I have already written several lines of code to pull URLs from the following website:
http://www.worldhospitaldirectory.com/United%20States/hospitals
code is below:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import csv
driver = webdriver.Firefox()
driver.get('http://www.worldhospitaldirectory.com/United%20States/hospitals')
url = []
pagenbr = 1
while pagenbr <= 115:
    current = driver.current_url
    driver.get(current)
    lks = driver.find_elements_by_xpath('//*[@href]')
    for ii in lks:
        link = ii.get_attribute('href')
        if '/info' in link:
            url.append(link)
    print('page ' + str(pagenbr) + ' is done.')
    if pagenbr <= 114:
        elm = driver.find_element_by_link_text('Next')
        driver.implicitly_wait(10)
        elm.click()
    time.sleep(2)
    pagenbr += 1
ls = list(set(url))
with open('US_GeneralHospital.csv', 'wb') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    for u in ls:
        wr.writerow([u])
And it worked very well to pull each individual link from this website.
But the problem is that I have to change the page number to loop over manually every time.
I want to upgrade this code so that it iterates by working out how many passes it needs, not by manual input.
Thank you very much.
It is a bad idea to hardcode the number of pages in your script. Try just clicking the "Next" button while it is enabled:
from selenium.common.exceptions import NoSuchElementException

while True:
    try:
        # do whatever you need to do on the page
        driver.find_element_by_xpath('//li[not(@class="disabled")]/span[text()="Next"]').click()
    except NoSuchElementException:
        break
This should allow you to keep scraping pages until the last page is reached.
Also note that the lines current = driver.current_url and driver.get(current) serve no purpose at all, since the driver is already on that page, so you can skip them.
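Putting the two together with the link collection from the question, a sketch that assumes the page markup stays as described:
from selenium.common.exceptions import NoSuchElementException

url = []
pagenbr = 1
while True:
    # collect the '/info' links on the current page
    for ii in driver.find_elements_by_xpath('//*[@href]'):
        link = ii.get_attribute('href')
        if '/info' in link:
            url.append(link)
    print('page ' + str(pagenbr) + ' is done.')
    # stop when there is no enabled Next button left
    try:
        driver.find_element_by_xpath('//li[not(@class="disabled")]/span[text()="Next"]').click()
    except NoSuchElementException:
        break
    time.sleep(2)
    pagenbr += 1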