I am trying to use the Tor browser and get a new IP address for each URL I visit, in Python. I am able to open an instance of Selenium running the Tor browser, but how can I request a new IP for every website I visit?
import os

from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary

binary = '/Applications/TorBrowser.app/Contents/MacOS/firefox'
if os.path.exists(binary) is False:
    raise ValueError("The binary path to Tor firefox does not exist.")
firefox_binary = FirefoxBinary(binary)
browser = None

def get_browser(binary=None):
    browser = webdriver.Firefox(firefox_binary=binary)
    return browser

if __name__ == "__main__":
    browser = get_browser(binary=firefox_binary)
    urls = (
        ('tor browser check', 'https://check.torproject.org/'),
        ('ip checker', 'http://icanhazip.com')
    )
    for url_name, url in urls:
        print "getting", url_name, "at", url
        browser.get(url)
To use Python to request a new IP for every request, you need to open a connection to the ControlPort and issue a NEWNYM signal.
You can use Stem to simplify the connection and commands:
from stem import Signal
from stem.control import Controller

if __name__ == '__main__':
    with Controller.from_port(port=9051) as controller:
        controller.authenticate('password')  # provide the password here if you set one
        controller.signal(Signal.NEWNYM)     # switch to clean circuits
Keep in mind that Tor may rate-limit NEWNYM requests, so you may need to wait a short while (10 seconds by default) before issuing that command again. Also, due to the limited number of exit nodes, your circuits might end up with the same exit node depending on how many requests you are issuing.
You need to issue this command every time you want to get a new IP (switch circuits).
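For convenience, here is a minimal sketch of a reusable helper that wraps this, assuming the ControlPort is 9051 and a control password has been set (adjust both for your setup):

import time

from stem import Signal
from stem.control import Controller

def renew_tor_ip(password='password', port=9051):
    with Controller.from_port(port=port) as controller:
        controller.authenticate(password)
        # Respect Tor's rate limiting: wait until a NEWNYM would be accepted.
        if not controller.is_newnym_available():
            time.sleep(controller.get_newnym_wait())
        controller.signal(Signal.NEWNYM)  # switch to clean circuits

You could then call renew_tor_ip() before each browser.get(url) in the loop above.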
I want to use the tbselenium package for browser automation, but when I try to run the code I get the following error: TBDriverPortError: SOCKS port 9050 is not listening.
The following is the code snippet that I have used.
import unittest
from time import sleep

from tbselenium.tbdriver import TorBrowserDriver
import tbselenium.common as cm

class TestSite(unittest.TestCase):
    def setUp(self):
        # Point the path to the tor-browser_en-US directory on your system
        tbpath = r'C:\Users\Sachin\Desktop\Tor Browsernew'
        self.driver = TorBrowserDriver(tbpath, tbb_logfile_path='test.log')
        self.url = "https://check.torproject.org"

    def tearDown(self):
        # We want the browser to close at the end of each test.
        self.driver.close()

    def test_available(self):
        self.driver.load_url(self.url)
        # Find the element indicating success
        element = self.driver.find_element_by_class_name('on')
        self.assertEqual(str.strip(element.text),
                         "Congratulations. This browser is configured to use Tor.")
        sleep(2)  # So that we can see the page

if __name__ == '__main__':
    unittest.main()
Can anyone help me solve this error? I have been struggling for days.
You need to uncomment ControlPort, HashedControlPassword and CookieAuthentication in your torrc file. You can get a torrc from here.
Then reload the Tor service.
If you are still facing issues, post the full traceback in your question.
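For reference, the relevant torrc lines might look like the following once uncommented (the port is the default; the hashed password value is a placeholder you generate yourself):

ControlPort 9051
# generate the value below with: tor --hash-password <your_password>
HashedControlPassword 16:<your_hashed_password>
CookieAuthentication 1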
I'm testing my own DDoS protection feature implemented on my server (this is necessary). Currently I have a terrible loop for making multiple Tor requests, each with its own identity.
import os
import requests

os.system("taskkill /f /im tor.exe")
os.startfile("C:/Tor/Browser/TorBrowser/Tor/tor.exe")

session = requests.session()
session.proxies = {}
session.proxies['http'] = 'socks5h://localhost:9050'
session.proxies['https'] = 'socks5h://localhost:9050'
Now I want to multithread this for faster speeds, since each Tor connection takes ages to load.
If I google how to run multiple Tor instances, I get info on how to do this from within the Tor browser itself, never how to do it programmatically. Is there a way to do this on Windows with Python 3 specifically?
Any help appreciated.
The key point to understand about running multiple separate Tor processes is that each one needs to listen on its own ControlPort and SocksPort so that your clients can issue requests through each individual instance.
If you use Stem, stem.process.launch_tor_with_config would be the recommended way to launch multiple Tor processes. By using this method, you can pass the necessary config options dynamically to each client without having to create individual files, and you'll have better process management over the Tor instances.
If you want to use os, you will need to create one config file per instance and pass that to tor when you start it.
At minimum, create one torrc config file for each instance you want to run with the following:
torrc.1
ControlPort 9800
SocksPort 9801
torrc.2
ControlPort 9802
SocksPort 9803
Each individual client will connect on the different socks ports to issue requests.
To start them, use:
os.system("C:/Tor/Browser/TorBrowser/Tor/tor.exe -f C:/path/to/torrc.1")
os.system("C:/Tor/Browser/TorBrowser/Tor/tor.exe -f C:/path/to/torrc.2")
Then create one or more clients per instance:
session1 = requests.session()
session1.proxies = {}
session1.proxies['http'] = 'socks5h://localhost:9801'
session1.proxies['https'] = 'socks5h://localhost:9801'
session2 = requests.session()
session2.proxies = {}
session2.proxies['http'] = 'socks5h://localhost:9803'
session2.proxies['https'] = 'socks5h://localhost:9803'
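To actually multithread the requests, you can hand one SocksPort to each worker. A minimal sketch using the standard library's ThreadPoolExecutor, assuming the two instances above are already running (adjust the ports and the URL to your setup):

import requests
from concurrent.futures import ThreadPoolExecutor

def fetch(socks_port, url='http://icanhazip.com'):
    # Each worker builds its own session bound to one Tor instance.
    session = requests.session()
    proxy = 'socks5h://localhost:%d' % socks_port
    session.proxies = {'http': proxy, 'https': proxy}
    return session.get(url).text.strip()

with ThreadPoolExecutor(max_workers=2) as pool:
    for ip in pool.map(fetch, [9801, 9803]):
        print(ip)  # each instance reports the exit IP it is using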
First of all, install Stem from a terminal:
pip install stem
Write the following code in a text file and name the file myfile.py. Import requests and stem.process at the start of the file, then write the following code:
import requests
import stem.process

x = 6
for i in range(1, x):
    cp = str(10000 + i)
    sp = str(11000 + i)
    tp1 = stem.process.launch_tor_with_config(
        tor_cmd='C:\\Users\\<Tor Directory>\\Browser\\TorBrowser\\Tor\\tor.exe',
        config={
            'ControlPort': cp,
            'SocksPort': sp,
            'DataDirectory': 'C:/<Any Path for data directories>/proxies/' + str(i) + '/',
            'Log': [
                'NOTICE stdout',
                'ERR file C:/<Any Path for Error file>/tor_error_log.txt',
            ],
        },
    )
    proxies = {
        'http': 'socks5h://127.0.0.1:' + sp,
        'https': 'socks5h://127.0.0.1:' + sp
    }
    r1 = requests.get('http://ipinfo.io/json', proxies=proxies)
    print('\n')
    print(r1.content)
    print('\n')
Now go into the folder that contains myfile.py, open a command prompt (cmd) or any terminal there, and launch the file:
python myfile.py
This will launch 5 Tor processes on the ports 11001, 11002, 11003, 11004 and 11005.
You can access each Tor proxy (SOCKS5) by using the IP address 127.0.0.1 and any of the above ports from any program.
If you open Task Manager you will see 5 Tor processes running, each consuming 10-20 MB of RAM.
If you get an error like this while running myfile.py in the terminal:
can not bind listening port. working with config files left us in broken state. Dying
then just close all Tor processes and launch myfile.py again. This error happens because a Tor process is already running on one of the ports.
To create more Tor processes, close all Tor instances from Task Manager and change the value of the variable x at the start of the file, e.g. x = 10, 20, 30 or 50.
Save myfile.py and run the file again.
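As an alternative to killing the instances from Task Manager, launch_tor_with_config returns a subprocess handle, so you can also shut the processes down programmatically. A sketch, assuming you collect the handles in a list inside the loop:

tor_processes = []
# inside the loop above, after launching: tor_processes.append(tp1)
for tp in tor_processes:
    tp.kill()  # terminate this Tor instance when you are done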
Cheers!
Today I am trying to make a "waiting page" using Flask.
I mean: a client makes a request, I want to show him a page like "wait, the process can take a few minutes", and when the process ends on the server, display the result. I want to display "wait" before my function manageBill.teste runs, but redirect only works when the function has returned, right?
@application.route('/teste', methods=['POST', 'GET'])
def test_conf():
    if request.method == 'POST':
        if request.form.get('confList') != None:
            conf_file = request.form.get('confList')
            username = request.form.get('username')
            password = request.form.get('password')
            date = request.form.get('date')
            if date == '' or conf_file == '' or username == '' or password == '':
                return "You forget to provide information"

            newpid = os.fork()
            if newpid == 0:  # in child process
                print('A new child ', os.getpid())
                error = manageBill.teste(conf_file, username, password, date)
                print("Error :" + error)
                return redirect('/tmp/' + error)
            else:  # in parent process
                return redirect('/tmp/wait')
            return error
    return manageBill.manageTest()
My /tmp route:
@application.route('/tmp/<wait>')
def wait_teste(wait):
    return "The process can take a few minutes, you will be redirected when the teste is done.<br>" + wait
If you are using the WSGI server (the default), requests are handled by threads. This is likely incompatible with forking.
But even if it wasn't, you have another fundamental issue. A single request can only produce a single response. Once you return redirect('/tmp/wait') that request is done. Over. You can't send anything else.
To support such a feature you have a few choices (see the sketch after this list):
1. The most common approach is to have AJAX make the request that starts the long-running process. Then set up an /is_done Flask endpoint that you can check (via AJAX) periodically; this is called polling. Once your endpoint reports that the work is done, you can update the page (either with JS or by redirecting to a new page).
2. Have /is_done be a page instead of an API endpoint that is queried from JS. Set an HTTP refresh on it (with some short timeout like 10 seconds). Then your server can answer the /is_done request with a redirect to the results page once the task finishes.
Generally you should strive to serve web requests as quickly as possible. You shouldn't leave connections open (to wait for a long task to finish) and you should offload these long running tasks to a queue system running separately from the web process. In this way, you can scale your ability to handle web requests and background processes separately (and one failing does not bring the other down).
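Here is a minimal sketch of the polling approach, using a background thread in place of a real queue system; the endpoint names and the tasks dict are illustrative assumptions, not part of the original code:

import uuid
from threading import Thread

from flask import Flask, jsonify

application = Flask(__name__)
tasks = {}  # task_id -> result, or None while the task is still running

def long_running_job(task_id):
    # ... call manageBill.teste(...) here instead of forking ...
    tasks[task_id] = 'done'

@application.route('/start', methods=['POST'])
def start():
    task_id = str(uuid.uuid4())
    tasks[task_id] = None
    Thread(target=long_running_job, args=(task_id,)).start()
    return jsonify(task_id=task_id)  # the wait page then polls /is_done/<task_id>

@application.route('/is_done/<task_id>')
def is_done(task_id):
    return jsonify(done=tasks.get(task_id) is not None)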
I am using Python 2.7 to perform a simple task: launch a browser, verify the header, and close the browser.
import os
import httplib
import webbrowser

# Launch the browser at Google
new = 0
url = "http://www.google.com/"
webbrowser.open(url, new=new)

# Check for the header
conn = httplib.HTTPConnection("www.google.com")
conn.request("HEAD", "/")
r1 = conn.getresponse()

# Close the browser
os.system("taskkill /im iexplore.exe")
This just runs in an infinite loop in order to verify continuous connectivity. A ping check isn't sufficient for the amount of traffic I need, or I would use that.
My problem is that if I lose connectivity, the script freezes and I get getaddrinfo errors. How do I ignore or recognize this, kill the browser, and keep the script running?
Sorry if I'm not doing this right... it is my first post.
I don't think you actually need a browser here at all.
Meanwhile, the way you ignore or recognize errors is with a try statement. So:
import time
import httplib

while True:
    try:
        conn = httplib.HTTPConnection("www.google.com")
        conn.request("HEAD", "/")
        r1 = conn.getresponse()
        if not my_verify_response_func(r1):
            print('Headers are wrong!')
    except Exception as e:
        print('Failed to check headers with {}'.format(e))
    time.sleep(60)  # I doubt you want to run as fast as possible
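If the script still hangs when connectivity drops, you can also pass a socket-level timeout so the connection fails fast instead of blocking (the 10-second value is an arbitrary choice):

conn = httplib.HTTPConnection("www.google.com", timeout=10)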
Instead of just using urllib, does anyone know of the most efficient package for fast, multithreaded downloading of URLs that can operate through HTTP proxies? I know of a few, such as Twisted, Scrapy and libcurl, but I don't know enough about them to make a decision, or even whether they can use proxies. Anyone know of the best one for my purposes? Thanks!
It's simple to implement this in Python. As the urllib documentation notes:
The urlopen() function works transparently with proxies which do not require authentication. In a Unix or Windows environment, set the http_proxy, ftp_proxy or gopher_proxy environment variables to a URL that identifies the proxy server before starting the Python interpreter.
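For example, a minimal sketch of the environment-variable approach (the proxy address is an assumption; urlopen() picks it up from the environment at call time):

import os
os.environ['http_proxy'] = 'http://127.0.0.1:8080'  # your HTTP proxy here

The multithreaded crawler below then runs through that proxy without any further changes: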
# -*- coding: utf-8 -*-
import sys
from urllib import urlopen
from BeautifulSoup import BeautifulSoup
from Queue import Queue, Empty
from threading import Thread

visited = set()
queue = Queue()

def get_parser(host, root, charset):

    def parse():
        try:
            while True:
                url = queue.get_nowait()
                try:
                    content = urlopen(url).read().decode(charset)
                except UnicodeDecodeError:
                    continue
                for link in BeautifulSoup(content).findAll('a'):
                    try:
                        href = link['href']
                    except KeyError:
                        continue
                    if not href.startswith('http://'):
                        href = 'http://%s%s' % (host, href)
                    if not href.startswith('http://%s%s' % (host, root)):
                        continue
                    if href not in visited:
                        visited.add(href)
                        queue.put(href)
                        print href
        except Empty:
            pass

    return parse

if __name__ == '__main__':
    host, root, charset = sys.argv[1:]
    parser = get_parser(host, root, charset)
    queue.put('http://%s%s' % (host, root))
    workers = []
    for i in range(5):
        worker = Thread(target=parser)
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()
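Usage (the filename is assumed): save the script as crawler.py and run
python crawler.py example.com / utf-8
where the arguments are the host, the root path to stay under, and the page charset.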
Usually proxies filter websites categorically, based on how the website is classified. It is difficult to transmit data through proxies that block by category: e.g. YouTube is classified as audio/video streaming, and therefore YouTube is blocked in some places, especially schools.
If you want to bypass such proxies, you can take the data off a website and put it on your own genuine website, such as a dot-com site registered to you.
When you are making and registering the website, you can categorise your website as anything you want.