TBSelenium: Tor page closes immediately - python

I'm trying to use Tor with selenium, which works through the use of tbselenium.
However, when loading an url or clicking a web element, the page immideately closes when finishing the action, instead of remaining open as would be the case when using selenium with chrome.
Any ideas to keep the page open?
import tbselenium.common as cm
from tbselenium.tbdriver import TorBrowserDriver
from tbselenium.utils import launch_tbb_tor_with_stem
tbb_dir = "C:\\pathto\\Tor Browser\\"
tor_process = launch_tbb_tor_with_stem(tbb_path=tbb_dir)
for i in range(1):
with TorBrowserDriver(tbb_dir, tor_cfg=cm.USE_STEM) as driver:
driver.load_url("http://hln.be",3,wait_for_page_body=True)
#driver.get('https://google.be')
try:
policypage=driver.find_element_by_xpath("//a[contains(#href,'members/join')]")
policypage.click()
usern=driver.find_element_by_xpath("//input[contains(#id,'user_member_username')]")
usern.send_keys('Tryout')
except:
print('different look')

As Furas said, use the standard driver declaration.

Related

Selenium - How to open url in active/current tab in python

def get_status(driver):
try:
driver.execute(Command.STATUS)
return "Alive"
except (socket.error, httplib.CannotSendRequest):
return "Dead"
if get_status(driver) == 'Alive':
#using this opens a whole new window but i want to go to another website in same tab.
driver.get('https://www.amazon.in')
else :
driver.get('https://www.google.in')
so basically i want to open a new url in the active tab of chrome/firefox browser. but i failed to find any workaround. i hope you can answer this question.
all tutorials on this redirect me to java function
driver.navigate.to()
which is not working in python.
This code did open a new URL within the same tab.
from selenium import webdriver
from selenium.webdriver.remote.command import Command
import http.client as httplib
import socket
def get_status(driver):
try:
driver.execute(Command.STATUS)
return "Alive"
except (socket.error, httplib.CannotSendRequest):
return "Dead"
driver = webdriver.Chrome(executable_path="path to chromedriver.exe")
driver.maximize_window()
driver.implicitly_wait(20)
driver.get("https://www.youtube.com/")
if get_status(driver) == 'Alive':
# using this opens a whole new window but i want to go to another website in same tab.
driver.get('https://www.amazon.in')
else:
driver.get('https://www.google.in')
driver.quit()

Take snapshot using Selenium for page that doesn't stop loading

I'm using Selenium to capture screenshots of a web page. It works great on sites like stackoverflow but I'm trying to use it on a page that never stops loading. Is there a way to grab the screenshot after x seconds regardless if it's done or not?
Current code:
import os
from selenium import webdriver
def main():
driver = webdriver.Chrome()
with open('test.txt', 'r') as f:
for url in f.readlines():
driver.get('http://' + url)
sn_name = os.path.join('Screenshots', url.strip().replace('/', '-') + '.png')
print('Attempting to save:', sn_name)
if not driver.save_screenshot(sn_name):
raise Exception('Could not save screen shot: ' + sn_name)
driver.quit()
if __name__ == '__main__':
main()
I think it doesn't work like that.
Webdriver will implicit waiting for a page loading till timed-out.
It should give you a timeout exception.
I think you should use try-except to catch that and then take a screenshot.
Otherwise, you should do a multithreading programming for another thread to take a screenshot.

selenium works on local and not on azure server

I am trying to get video url from links on this page. Video link could be seen on https://in.news.yahoo.com/video/jaguar-fighter-aircraft-crashes-near-084300217.html . (Open in Chrome)
For that I wrote chrome web driver related code as below :
from bs4 import BeautifulSoup
from selenium import webdriver
from pyvirtualdisplay import Display
chromedriver = '/usr/local/bin/chromedriver'
os.environ['webdriver.chrome.driver'] = chromedriver
display = Display(visible=0, size=(800,600))
display.start()
driver = webdriver.Chrome(chromedriver)
driver.get('https://in.news.yahoo.com/video/jaguar-fighter-aircraft-crashes-near-084300217.html')
try:
element = WebDriverWait(driver, 20).until(lambda driver: driver.find_elements_by_class_name('yvp-main'))
self.yahoo_video_trend = []
for s in driver.find_elements_by_class_name('yvp-main'):
print "Processing link - ", item['link']
trend = item
print item['description']
trend['video_link'] = s.find_element_by_tag_name('video').get_attribute('src')
print
print s.find_element_by_tag_name('video').get_attribute('src')
self.yahoo_video_trend.append(trend)
except:
return
This works fine on my local system but when I run on my azure server it does not give any result at s.find_element_by_tag_name('video').get_attribute('src')
I have installed chrome on my azureserver.
Update :
Please see, requests and Beautifulsoup I already tried, but as yahoo loads html content dynamically from json, I could not get it using them.
And yeah azure server is simple linux system with command line access. Not any application.
I tried to reproduce your issue using you code. However, I found there was no tag named video in that page('https://in.news.yahoo.com/video/jaguar-fighter-aircraft-crashes-near-084300217.html')(using IE and Chrome to test).
I used the developer Tool to check the HTML code, like this picture:
It seems that this page used the flash player to play video,not HTML5 video control.
For this reason, I suggest that you can check your code whether used the rightly tag name.
Any concerns, please feel free to let me know.
We tried to reproduce the error on our side. I was not able to get chrome driver to work, but I did try the firefox driver and it worked fine. It was able to load the page and get the link via the URL.
Can you change your code to print the exception and send it to us, to see where the script is failing?
Change your code:
except:
return
try
do
except Exception,e: print str(e)
Send us the exception, so we can take a look.

Python - Selenium - Print Webpage

How do I print a webpage using selenium please.
import time
from selenium import webdriver
# Initialise the webdriver
chromeOps=webdriver.ChromeOptions()
chromeOps._binary_location = "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"
chromeOps._arguments = ["--enable-internal-flash"]
browser = webdriver.Chrome("C:\\Program Files\\Google\\Chrome\\Application\\chromedriver.exe", port=4445, chrome_options=chromeOps)
time.sleep(3)
# Login to Webpage
browser.get('www.webpage.com')
Note: I am using the, at present, current version of Google Chrome: Version 32.0.1700.107 m
While it's not directly printing the webpage, it is easy to take a screenshot of the entire current page:
browser.save_screenshot("screenshot.png")
Then the image can be printed using any image printing library. I haven't personally used any such library so I can't necessarily vouch for it, but a quick search turned up win32print which looks promising.
The key "trick" is that we can execute JavaScript in the selenium browser window using the "execute_script" method of the selenium webdriver, and if you execute the JavaScript command "window.print();" it will activate the browsers print function.
Now, getting it to work elegantly requires setting a few preferences to print silently, remove print progress reporting, etc. Here is a small but functional example that loads up and prints whatever website you put in the last line (where 'http://www.cnn.com/' is now):
import time
from selenium import webdriver
import os
class printing_browser(object):
def __init__(self):
self.profile = webdriver.FirefoxProfile()
self.profile.set_preference("services.sync.prefs.sync.browser.download.manager.showWhenStarting", False)
self.profile.set_preference("pdfjs.disabled", True)
self.profile.set_preference("print.always_print_silent", True)
self.profile.set_preference("print.show_print_progress", False)
self.profile.set_preference("browser.download.show_plugins_in_list",False)
self.driver = webdriver.Firefox(self.profile)
time.sleep(5)
def get_page_and_print(self, page):
self.driver.get(page)
time.sleep(5)
self.driver.execute_script("window.print();")
if __name__ == "__main__":
browser_that_prints = printing_browser()
browser_that_prints.get_page_and_print('http://www.cnn.com/')
The key command you were probably missing was "self.driver.execute_script("window.print();")" but one needs some of that setup in init to make it run smooth so I thought I'd give a fuller example. I think the trick alone is in a comment above so some credit should go there too.

Any advice for sending a request to a website from Python?

def align_sequences(IDs):
import webbrowser
import urllib,urllib2
url = 'http://www.uniprot.org/align/'
params = {'query':IDs}
data = urllib.urlencode(params)
request = urllib2.Request(url, data)
response = urllib2.urlopen(request)
job_url = response.geturl()
webbrowser.open(job_url)
align_sequences('Q4PRD1 Q7LZ61')
With this function I want to open 'http://www.uniprot.org/align/', request the protein sequences with IDs Q4PRD1 and Q7LZ61 to be aligned, and then open the website in my browser.
Initially it seems to be working fine - running the script will open the website and show the alignment job to being run. However, it will keep going forever and never actually finish, even if I refresh the page. If I input the IDs in the browser and hit 'align' it works just fine, taking about 8 seconds to align.
I am not familiar with the differences between running something directly from a browser and running it from Python. Do any of you have an idea of what might be going wrong?
Thank you :-)
~Max
You have to click align button. You can't do this with webbrowser though. One option is to use selenium:
from selenium import webdriver
url = 'http://www.uniprot.org/align/'
ids = 'Q4PRD1 Q7LZ61'
driver = webdriver.Firefox()
driver.get(url)
q = driver.find_element_by_id('alignQuery')
q.send_keys(ids)
btn = driver.find_element_by_id("sequence-align-submit")
btn.click()
I think this is in javascript. If you look at the html-code of button Align you can see
onclick="UniProt.analytics('AlignmentSubmissionPage', 'click', 'Submit align'); submitAlignForm();"
UniProt.analytics() and submitAlignForm() some javascript magic. This magic in js-compr.js2013_11 file.
You can view this file using http://jsbeautifier.org/ and then do on python what do javascript.

Categories