Hi guys, I'm having a hard time with a script.
I have a Python script that uses Selenium to automate a site, and the script needs to keep running on that site for a long time.
The problem is that the site times out after 30 minutes; the robot then returns an error and stops executing.
When this happens, I need the script to close all windows and reconnect to the site.
If anyone can help, it would mean a lot!
from selenium import webdriver
import csv
import pyautogui

URL = 'https://XXXXXXX'
URL2 = 'https://XXXXXX'
user = 'user12345'
password = 'password12345'
tabela = 'tabela.csv'  # placeholder: the CSV (with a 'folder' column) that the original code reads

class Web:
    browser = None  # kept on the class so code outside can reach and close it

    def __init__(self):
        Web.browser = browser = webdriver.Ie()
        browser.get(URL)
        browser.find_element_by_name('login').send_keys(user)
        browser.find_element_by_name('password').send_keys(password)
        # here I open a login window so I can use another link that I need to use
        pyautogui.moveTo(121, 134)
        pyautogui.click(121, 134)
        browser.execute_script("window.open()")
        browser.switch_to.window(browser.window_handles[1])
        browser.get(URL2)
        with open(tabela, "r") as leitor:
            reader = csv.DictReader(leitor, delimiter=';')
            for linha in reader:
                folder = linha['folder']
                try:
                    browser.find_element_by_id('field').send_keys(folder)
                    browser.find_element_by_id('save').click()
                except Exception:
                    with open('failed.txt', 'a') as writer:
                        writer.write(folder)
                    browser.quit()

if __name__ == '__main__':
    Web()
From then on, it needs to keep running the code inside the page.
This code is an example similar to my original code.
Replace that part of your code with the code below:
if __name__ == '__main__':
    while True:
        try:
            Web()
        except Exception:
            if Web.browser is not None:
                Web.browser.quit()
As you can see, we call Web() inside while True, which means it runs indefinitely; Web.browser.quit() closes the Selenium session (and all its windows) so the next iteration can log in again.
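If the disconnect always comes from the site's 30-minute timeout, a slightly more defensive variant (my sketch, not part of the answer above) catches Selenium's own exception type and always cleans up before reconnecting:
# A sketch building on the answer above, not the original poster's code:
# catch WebDriverException specifically and always quit before retrying.
import time
from selenium.common.exceptions import WebDriverException

if __name__ == '__main__':
    while True:
        try:
            Web()
        except WebDriverException:
            pass  # the site's 30-minute timeout (or any driver error) lands here
        finally:
            if Web.browser is not None:
                Web.browser.quit()  # close every window before reconnecting
        time.sleep(5)  # brief pause so a crash loop doesn't hammer the site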
I have some Python code which scrapes a website and reports the live price of a specific crypto. When I use a while loop to keep printing the live price, it keeps printing the same price over and over, even when the live price on the website has changed. I thought my code was hitting the website too fast, so I added a delay using the time module, but even with a one-minute delay it prints the same stale price. Manually ending and restarting the code seemed to make this bug go away, but I want this program to run 24/7 and email me when the price reaches a certain point. This is my code so far (BTW, I am a beginner):
import requests
import bs4
import time

run = True
while run == True:
    # time.sleep(60)
    res = requests.get("https://coinmarketcap.com/currencies/gitcoin/")
    soup_obj = bs4.BeautifulSoup(res.text, "lxml")
    item = soup_obj.select(".priceValue___11gHJ")[0]
    item = item.text
    print(item)
    exit()
This has a loop, but I added an exit() call so that it ends and I can restart it manually. I just need a way for this code to end itself and then restart automatically, over and over. I am using the Community Edition of PyCharm (latest version).
You can write your program to call a subprocess instead of doing the web call itself. That subprocess can call requests, return whatever you want via stdout, and exit. There are multiple ways to do this: you could write separate scripts, or use multiprocessing.Process, but in this example I've written a script that calls itself and uses a command-line parameter to know which role it is playing.
import sys

if len(sys.argv) == 1:
    # no extra argument: run as the parent, polling via a subprocess so it exits each time
    import time
    import subprocess as subp
    while True:
        result = subp.run([sys.executable, __file__, "called"], capture_output=True)
        # assuming the program returns an ascii float on a single line
        item = result.stdout.decode("ascii").strip()
        print(item)
        time.sleep(60)
else:
    # "called" argument present: run as the child, do one scrape and exit
    import requests
    import bs4
    res = requests.get("https://coinmarketcap.com/currencies/gitcoin/")
    soup_obj = bs4.BeautifulSoup(res.text, "lxml")
    item = soup_obj.select(".priceValue___11gHJ")[0]
    item = item.text
    sys.stdout.write(item)
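For comparison, here is a rough sketch of the multiprocessing.Process route mentioned above; the fetch_price function and the queue are my own illustration, not from the original answer. A fresh process per poll means no stale module or connection state can survive between reads:
import time
import multiprocessing as mp

def fetch_price(q):
    # runs in a fresh process each time, so nothing is cached between polls
    import requests
    import bs4
    res = requests.get("https://coinmarketcap.com/currencies/gitcoin/")
    soup_obj = bs4.BeautifulSoup(res.text, "lxml")
    q.put(soup_obj.select(".priceValue___11gHJ")[0].text)

if __name__ == '__main__':
    while True:
        q = mp.Queue()
        p = mp.Process(target=fetch_price, args=(q,))
        p.start()
        p.join()
        if not q.empty():
            print(q.get())
        time.sleep(60)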
I'm developing a Python app that sets form components of a certain webpage (developed using Vue 3).
I'm able to set a datepicker's value, but after that, the next operation clears the datepicker away.
I must be doing something really foolish, but I'm out of ideas.
Here's my code:
import sys
from selenium import webdriver
#-----------------------------------------------------------------------------------------------------------------------
driver_headfull = None
try:
    driver_headfull = webdriver.Firefox()
    firefox_options = webdriver.FirefoxOptions()
except Exception as e:
    print('ERROR WHEN CREATING webdriver.Firefox()')
    print("Please verify that Firefox is installed and that geckodriver.exe is in %PATH%")
    print(e)
    sys.exit(5)

# navigate to url
driver_headfull.get('http://a_certain_ip')

# set a datepicker
element_to_interact_with_headfull = driver_headfull.find_element_by_id('datesPolicyEffectiveDate')
driver_headfull.execute_script("arguments[0].value = '2020-07-01';", element_to_interact_with_headfull)

# set a <div> that behaves like a <select>
element_to_interact_with_headfull = driver_headfull.find_element_by_id('industryDescription')
driver_headfull.execute_script("arguments[0].click();", element_to_interact_with_headfull)
element_pseudo_select_option_headfull = driver_headfull.find_element_by_id('descriptionIndustryoption0')
driver_headfull.execute_script("arguments[0].click();", element_pseudo_select_option_headfull)
# this very last instruction resets the value of html_id=datesPolicyEffectiveDate (the datepicker)

# keep the browser open for inspection
while True:
    pass
Any ideas would be most welcome!
Well, this was a pain. I'll post it in case it's of any use to someone.
It seems the component was reloaded, and I was setting the child of the component by means of
arguments[0].value = '2020-07-01';
so the parent wouldn't see the change and would automatically reload the child with a default (empty) value.
Adding the following snippet solved my trouble:
driver_headfull.execute_script("arguments[0].value = '2021-07-01';", element_to_interact_with_headfull)
driver_headfull.execute_script("arguments[0].dispatchEvent(new Event('input', { bubbles: true }));", element_to_interact_with_headfull)
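Distilled into a small helper (the function name set_vue_input is mine, and it assumes the element is an input that Vue watches through its 'input' event):
# A sketch, assuming a Vue-watched <input>; set_vue_input is an illustrative name.
def set_vue_input(driver, element, value):
    driver.execute_script(
        "arguments[0].value = arguments[1];"
        "arguments[0].dispatchEvent(new Event('input', { bubbles: true }));",
        element, value)

# usage with the element above:
# set_vue_input(driver_headfull, element_to_interact_with_headfull, '2021-07-01')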
I am trying to run Selenium using ThreadPoolExecutor. The website requires a login, and I am trying to speed up one step of what I do on the site. But every time a thread opens Chrome, I need to log in again, and it sometimes just hangs. I log in once first, without threads, to do some processing; from there on, I'd like to open a few Chrome webdrivers without needing to log in again. Is there a way around this? PS: the website has no id and password strings in the URL.
import concurrent.futures
from selenium import webdriver

def startup(dirPath):
    # Start the WebDriver, load options
    options = webdriver.ChromeOptions()
    options.add_argument("--disable-infobars")
    options.add_argument("--enable-file-cookies")
    params = {'behavior': 'allow', 'downloadPath': dirPath}
    wd = webdriver.Chrome(options=options, executable_path=r"C:\Chrome\chromedriver.exe")
    wd.execute_cdp_cmd('Page.setDownloadBehavior', params)
    # wd.delete_all_cookies()
    wd.set_page_load_timeout(30)
    wd.implicitly_wait(10)
    return wd

def webLogin(dID, pw, wd):
    wd.get('some url')
    # Login, clear any outstanding login id
    wd.find_element_by_id('username').clear()
    wd.find_element_by_id('username').send_keys(dID)
    wd.find_element_by_id('password').clear()
    wd.find_element_by_id('password').send_keys(pw)
    wd.find_element_by_css_selector('.button').click()

if __name__ == '__main__':
    dirPath, styleList = firstProcessing()
    loginAndClearLB(dID, dPw, dirPath)  # calls startup & webLogin; this is also my 1st login
    # many webdrivers spawned here
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        results = {executor.submit(addArtsToLB, dID, dPw, dirPath, style): style for style in styleList}
    # Do other stuff
    wd2 = startup(dirPath)
    webLogin(dID, dPw, wd2)
    startDL(wd2)
    logOut(wd2, dirPath)
Any help would be greatly appreciated. Thanks!!
As mentioned above, you could obtain the authentication token from the first login and then include it in all subsequent requests.
However, another option (if you're using basic auth) is to just add the username and password into the URL, like:
https://username:password@your.domain.com
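For completeness, a minimal sketch of that URL form with Selenium (the host and credentials are placeholders, and this only works for sites behind HTTP basic auth):
# A sketch assuming HTTP basic auth; host and credentials are placeholders.
from selenium import webdriver

wd = webdriver.Chrome()
wd.get('https://username:password@your.domain.com')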
OK, it looks like there is no solution yet for more complicated websites that do not use basic authentication. My modified solution: log in once, grab the cookies, and hand them to each thread. Note that Selenium only lets you add a cookie for the domain currently loaded, which is why each thread first opens a throwaway page (a 404) on the target site before calling add_cookie:
def webOpenWithCookie(wd, cookies):
    wd.get('https://some website url/404')  # must be on the site's domain before add_cookie
    for cookie in cookies:
        wd.add_cookie(cookie)
    wd.get('https://some website url/home')
    return wd

def myThreadedFunc(dirPath, style, cookies):  # this is the function that gets threaded
    wd = startup(dirPath)  # just starts chrome
    wd = webOpenWithCookie(wd, cookies)  # adds the cookies, then opens the real target page; no login required now
    doSomethingHere(wd, style)
    wd.quit()  # better to close each driver inside its own thread, I think

if __name__ == '__main__':
    dirPath, styleList = firstProcessing()
    wd1 = startup(dirPath)
    wd1 = webLogin(dID, dPw, wd1)  # here I log in once
    cookies = wd1.get_cookies()  # and grab the cookies
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        # spawns the threads; each thread reuses the cookies instead of logging in,
        # at the cost of visiting the 404 page first
        results = {executor.submit(myThreadedFunc, dirPath, style, cookies): style for style in styleList}
Fairly new to Python; I learn by doing, so I thought I'd give this project a shot: a script that finds the Google Analytics request for a certain website, parses the request payload, and does something with it.
Here are the requirements:
Ask the user for 2 URLs (to compare the payloads from 2 different HAR captures)
Use Selenium to open the two URLs, and browsermobproxy/PhantomJS to capture all the HAR entries
Store the HAR as a list
From the list of all HAR entries, find the Google Analytics request, including the payload
If a Google Analytics tag is found, do things... like parse the payload, compare the payloads, etc.
Issue: Sometimes, for a website that I know has Google Analytics (e.g. nytimes.com), my program says "GA Not found", but only because the complete HAR was not captured, so the matching entry wasn't there when the regex ran. The issue is intermittent and does not happen every time. Any ideas?
I'm thinking that due to some dependency or latency, the script moved on before the complete HAR was captured. I tried "wait for traffic to stop", but maybe I didn't do it right.
Also, as a bonus, I'd appreciate any help making this script run faster; it's fairly slow. As I mentioned, I'm new to Python, so go easy :)
This is what I've got thus far.
import browsermobproxy as mob
from selenium import webdriver
import re
import sys
import urlparse
import time
from datetime import datetime

def cleanup():
    s.stop()
    driver.quit()

proxy_path = '/Users/bob/Downloads/browsermob-proxy-2.1.4-bin/browsermob-proxy-2.1.4/bin/browsermob-proxy'
s = mob.Server(proxy_path)
s.start()
proxy = s.create_proxy()
proxy_address = "--proxy=127.0.0.1:%s" % proxy.port
service_args = [proxy_address, '--ignore-ssl-errors=yes', '--ssl-protocol=any']  # so that I can do https connections
driver = webdriver.PhantomJS(executable_path='/Users/bob/Downloads/phantomjs-2.1.1-windows/phantomjs-2.1.1-windows/bin/phantomjs', service_args=service_args)
driver.set_window_size(1400, 1050)

urlLists = []
collectTags = []
gaCollect = 0
varList = []

for x in range(0, 2):  # I want to ask the user for 2 inputs
    url = raw_input("Enter a website to find GA on: ")
    time.sleep(2.0)
    urlLists.append(url)
    if not url:
        print "You need to type something in...here"
        sys.exit()

# gets the two user urls stored in the list
for urlList in urlLists:
    print urlList, 'start 2nd loop'  # printing for debug purposes, no need for this
    if not urlList:
        print 'Your Url list is empty'
        sys.exit()
    proxy.new_har()
    driver.get(urlList)
    # proxy.wait_for_traffic_to_stop(15, 30)  # <-- tried this but it did not do anything
    for ent in proxy.har['log']['entries']:
        gaCollect = (ent['request']['url'])
        print gaCollect
        if re.search(r'google-analytics.com/r\b', gaCollect):
            print 'Found GA'
            collectTags.append(gaCollect)
            time.sleep(2.0)
            break
        else:
            print 'No GA Found - Ending Prog.'
            cleanup()
            sys.exit()

cleanup()
This might be a stale question, but I found an answer that worked for me.
You need to change two things:
1 - Remove sys.exit() -- it runs on the first entry that doesn't match, so if the GA request isn't the very first entry in the list, it will never be found
2 - Call new_har with the captureContent option enabled to get the payload of the requests:
proxy.new_har(options={'captureHeaders': True, 'captureContent': True})
See if that helps.
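Once captureContent is on, the payload can be read from the HAR entries themselves; this is a rough sketch (the field names follow the HAR spec, but the GA matching is simplified from the code above):
# A sketch: pull the GA payload out of the captured HAR entries.
# GET-style GA hits carry the payload in the query string; POST hits in the body.
proxy.new_har(options={'captureHeaders': True, 'captureContent': True})
driver.get(urlList)
for ent in proxy.har['log']['entries']:
    if 'google-analytics.com' in ent['request']['url']:
        print(ent['request'].get('queryString', []))  # list of {name, value} pairs
        print(ent['request'].get('postData', {}).get('text', ''))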
Struggling with what I'm sure is a very straightforward problem. I have a scheduled task set up which launches a batch file, which in turn runs a Python script. All is well; however, I cannot seem to close the Python shell once the script is finished. The result is lots of open windows.
If this is a Python issue, I have read the best way to close is to do the following:
import json
import time
import datetime
import sys
from selenium import webdriver
from datetime import timedelta
today = datetime.datetime.today()
yesterday = today - timedelta(days=1)
yesterday = yesterday.strftime("%d.%m.%Y")
browser = webdriver.Chrome(executable_path = 'c:/xampp/htdocs/portal/functions/timon/chromedriver.exe')
browser.get('http://adventures.timon.is')
time.sleep(2)
browser.find_element_by_id('tbxNumerstarfsmanns').clear()
browser.find_element_by_id('tbxNumerstarfsmanns').send_keys('user')
browser.find_element_by_id('tbxUserLykilord').clear()
browser.find_element_by_id('tbxUserLykilord').send_keys('pass')
time.sleep(2)
browser.find_element_by_css_selector('input[type=\"submit\"]').click()
browser.find_element_by_css_selector("a[href*=reports]").click()
browser.find_element_by_link_text("Salary administrators").click()
browser.find_element_by_link_text("Punch-in report").click()
time.sleep(2)
browser.find_element_by_id('id_fromdate').clear()
browser.find_element_by_id('id_fromdate').send_keys(yesterday)
browser.find_element_by_id('id_todate').clear()
browser.find_element_by_id('id_todate').send_keys(yesterday)
time.sleep(2)
browser.find_element_by_css_selector("input[type=submit]").click()
time.sleep(2)
results = browser.find_elements_by_css_selector("table#resultstable td")
columns = [val.text for val in results]
data = json.dumps(columns)
text_file = open("c:/xampp/htdocs/portal/functions/timon/info.txt", "w")
text_file.write(data)
text_file.close()
browser.close()
sys.exit()
However this does not work.
The batch file looks like this:
start "extractTimon" "C:\xampp\Python36-32\python.exe" C:\xampp\htdocs\portal\functions\timon\extractTimon.py
If anyone could point me in the right direction, I'd really appreciate it.