Failure to change identify using Tor when scraping google

Failure to change identify using Tor when scraping google - python

I am trying to automate google search but unfortunately my IP is blocked. After some searches, it seems like using Tor could get me a new IP dynamically. However, after adding the following code block into my existing code, google still blocks my attempts even under the new IP. So I am wondering is there anything wrong with my code?
Code (based on this)
from TorCtl import TorCtl
import socks
import socket
import urllib2
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 9050)
__originalSocket = socket.socket
def newId():
''' Clean circuit switcher
Restores socket to original system value.
Calls TOR control socket and closes it
Replaces system socket with socksified socket
'''
socket.socket = __originalSocket
conn = TorCtl.connect(controlAddr="127.0.0.1", controlPort=9051, passphrase="mypassword")
TorCtl.Connection.send_signal(conn, "NEWNYM")
conn.close()
socket.socket = socks.socksocket
## generate a new ip
newId()
### verify the new ip
print(urllib2.urlopen("http://icanhazip.com/").read())
## run my scrape code
google_scrape()
new error message
<br>Sometimes you may be asked to solve the CAPTCHA if you are using advanced terms that robots are known to use, or sending requests very quickly.
</div>
IP address: 89.234.XX.25X<br>Time: 2017-02-12T05:02:53Z<br>

Google (and many other sites such as "protected" by Cloudflare) filter requests coming via TOR by the IP address of Tor exit nodes. They can do this because the list of IP addresses of Tor exit nodes is public.
Thus changing your identity - which in turn changes your Tor circuit and will likely result in using a different exit node and thus different IP (although the latter two are not guaranteed) - will not work against this block.
For your use case you might consider using VPN instead of Tor, as their IP addresses are less likely to be blocked. Especially if you use non-free VPN.

Related

Check if service is reachable and working with socket module in Python

The context related to my question is that I work for a company in the networking area. This company has several stores around the country where DVRs are accessible through port 2781 and a domain for people to access security cameras, the problem is that in order for these people to successfully access DVRs through the domain and port you must have a DMZ configured in the modem of the stores. To corroborate the DMZ I'm trying to use Python with the sockets module but I don't understand the best way to do it yet.
import socket
s = socket.socket()
s.connect((domain, port))
s.close()
Once I make the proper connection which is the best way to check if there is a communication? Work it with an exception or just use socket.recv and detect if it is empty?

In order for connect to succeed there already has to be some kind of connection be done. Otherwise the TCP handshake would fail. Thus the first step would be to check if connect succeeds or throws an exception.
It can still be possible that there is some deep packet inspection firewall in place which does not block the initial connection but only blocks the later data exchange. To find out if this is the case you have to do actual bidirectional communication. But how this communication should look like depends on the specific application protocol which is unknown in your case. Still you need to check that a) sending and receiving works (catching exceptions) and b) that a response returns the expected data.

Tor, Stem, and Sockets - Changing Identity with TOR

I'm trying to run Tor through python. My goal is to be able to switch exits or otherwise alter my IP from time-to-time at a time of my choosing. I've followed several tutorials and encountered several different errors.
This code prints my IP address
import requests
r = requests.get('http://icanhazip.com/')
r.content
# returns my regular IP
I've downloaded and 'installed' the Tor browser (although it seems to run from an .exe). When it is running, I can use the following code to return my Tor IP, which seems to change every 10 minutes or so, so everything seems to be working so far
import socks
import socket
socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9150)
socket.socket = socks.socksocket
r = requests.get('http://icanhazip.com/')
print r.content
# Returns a different IP
Now, I've found instructions here on how to request a new identity programmatically, but this is where things begin to go wrong. I run the following code
from stem import Signal
from stem.control import Controller
with Controller.from_port(port = 9151) as controller:
controller.authenticate()
controller.signal(Signal.NEWNYM)
and I get the error "SOCKS5Error: 0x01: General SOCKS server failure" at the line that creates the controller. I thought this might be a configuration problem - the instructions say that there should be a config file called 'torrc' that contains the port numbers, apart from other things.
Eventually, in the directory C:\Users\..myname..\Desktop\Tor Browser\Browser\TorBrowser\Data\Tor I found three files, a 'torrc', a 'torrc-defaults', and a 'torrc.orig.1'. My 'torrc-defaults' looks similar to the 'torrc' shown in the documentation, my 'torrc.orig.1' is empty, and my 'torrc' has two lines of comments that look as follows with some more settings afterwards but no port settings
# This file was generated by Tor; if you edit it, comments will not be preserved
# The old torrc file was renamed to torrc.orig.1 or similar, and Tor will ignore it
...
I've tried to make sure that the two ports listed in the 'torrc-defaults' match the ports in the socks statement at the beginning and the controller statment further on. When these are 9150 and 9151 respectively I get the general SOCKS server failure listed above.
When I try and run the Controller with the wrong port (in this case, 9051 which was recommended in other posts but for me leads to Tor browser failing to load when I adjust the 'torrc-defaults') then I instead get the error message "[Errno 10061] No connection could be made because the target machine actively refused it"
Finally, when I run the Controller with Tor browser running in the background but without first running the SOCKS statement, I instead get a lot of warning text and finally an error, as shown in part below:
...
...
250 __OwningControllerProcess
DEBUG:stem:GETCONF __owningcontrollerprocess (runtime: 0.0010)
TRACE:stem:Sent to tor:
SIGNAL NEWNYM
TRACE:stem:Received from tor:
250 OK
TRACE:stem:Received from tor:
650 SIGNAL NEWNYM
TRACE:stem:Sent to tor:
QUIT
TRACE:stem:Received from tor:
250 closing connection
INFO:stem:Error while receiving a control message (SocketClosed): empty socket content
At first I thought this looked quite promising - I put in a quick check line like this both before and after the new user request
print requests.get('http://icanhazip.com/').content
It prints the IP, but unfortunately it's the same both before and afterwards.
I'm pretty lost here - any support would be appreciated!
Thanks.

Get your hashed password :
tor --hash-password my_password
Modify the configuration file :
sudo gedit /etc/tor/torrc
Add these lines
ControlPort 9051
HashedControlPassword 16:E600ADC1B52C80BB6022A0E999A7734571A451EB6AE50FED489B72E3DF
CookieAuthentication 1
Note :- The HashedControlPassword is generated from first command
Refer this link
I haven't tried it on my own because I am using mac, but I hope this helps with configuration

You can use torify to achieve your goal. for example run the python interpreter as this command "sudo torify python" and every connection will be routed through TOR.
Reference on torify: https://trac.torproject.org/projects/tor/wiki/doc/TorifyHOWTO

Sending traffic from multiple source IPs Scapy

I am trying to send some traffic via python using scapy (on Ubuntu). I am using a range of source IPs (10.0.0.32/29). Everything seems to be working (at least I see the traffic in wireshark and it reaches my firewall) but I am having a problem completing the TCP handshake using the IP addresses that aren't the main IP of the eth0 adapter. Does anyone know if this is possible to do:
Source:
from scapy.all import *
import random
sp=random.randint(1024,65535)
ip=IP(src="10.0.0.234/29",dst="www.google.com")
SYN=TCP(sport=sp, dport=80,flags="S",seq=10)
SYNACK=sr1(ip/SYN)
my_ack=SYNACK.seq+1
ACK=TCP(sport=sp,dport=80,flags="A",seq=11,ack=my_ack)
send(ip/ACK)
payload="SEND TCP"
PUSH=TCP(sport=sp,dport=80,flags="PA",seq=11,ack=my_ack)
send(ip/PUSH/payload)

Because you are behind a NAT/router, you should check it allows you to use the full range of IPs. If it is running DHCP protocol, your eth0 will typically recieve a unique IP adress that will be the only routed in your private network.
Furthermore, you must ensure your kernel knows what IPs are attributed to it, else it will drop response packets. If you want to use the full range of IP, you have two choices :
Create virtual devices with virtual mac adresses, each requesting an IP through DHCP.
Configure your router so it statically routes the full IP table to your host, and alias each IP you intend to use
Once you have done that, there is no reason you wouldn't be able to syn/ack from your multiple source IPs. From distant server point of view, there wouldn't be any difference between what you are trying to do and several machines in a local network requesting a page at the same time.

Select outgoing IP from django?

If we're running on a host which can have multiple IP addresses (it's actually EC2 with elastic IPs), is it possible to select from django which outgoing IP address to use?
Even if this is just a random choice it'd be fine.
Edit: Apologies, I was not clear above. The requests are new outgoing calls made from within Python, not a response to a client request - happy for that to go back down whatever pipe it came in on.

I guess that for webapp responses, the server is always going to use one connection socket, so if the request came to IP address X, the response will be sent in the same TCP connection and will originate from the same address X, even though the host also has addresses Y and Z.
On the other hand, if your application creates another TCP connection during its operation, its probably possible to bind that socket on any of host's IP addresses you want. If you're using python's socket module, you can do it by specifying source_address argument in socket.create_connection() call. Unfortunately, not all higher-level libraries may allow this level of control.

I am not sure about the question quite well, but just wanted to drop this page, if it comes to any help python outgoing ip

Trouble initiating a TCP connection in Python--blocking and timing out

For a class project I'm trying to do some socket programming Python but running into a very basic issue. I can't create a TCP connection from my laptop to a lab machine. (Which I'm hoping to use as the "server") Without even getting into the scripts I have written, I've been simply trying interpreter line commands with no success. On the lab machine (kh4250-39.cselabs.umn.edu) I type the following into Python:
from socket import *
sock = socket()
sock.bind(('', 8353))
sock.listen(5)
sock.accept()
And then on my laptop I type:
from socket import *
sock = socket()
sock.connect(('kh4250-39.cselabs.umn.edu', 8353))
At which point both machines block and don't do anything until the client times out or I send a SIGINT. This code is pretty much exactly copied from examples I've found online and from Mark Lutz's book Programming Python (using '' for the server host name apparently uses the OS default and is fairly common). If I run both ends in my computer and use 'localhost' for the hostname it works fine, so I suspect it's some problem with the hostnames I'm using on one or both ends. I'm really not sure what could be going wrong on such a simple example. Does anyone have an idea?

A good way to confirm whether it's a firewall issue or not is to perform a telnet from the command-line to the destination host in question:
% telnet kh4250-39.cselabs.umn.edu 8353
Trying 128.101.38.44...
And then sometime later:
telnet: connect to address 128.101.38.44: Connection timed out
If it just hangs there at Trying and then eventually times out, chances are the connection to the remote host on that specific port is being blocked by a firewall. It could either be at the network layer (e.g. a real firewall or a router access-list) or at the host, such as iptables or other host-based filtering mechanisms.
Access to this lab host might only be available from within the lab or the campus network. Talk with your professor or a network administrator or someone "in the know" on the network to find out for sure.

Try to bind the server to 'kh4250-39.cselabs.umn.edu' instead of '':
sock.bind(('kh4250-39.cselabs.umn.edu', 8353))
If this does not work: Another reason could be a firewall blocking the port 8353....

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Failure to change identify using Tor when scraping google - python

Related

Check if service is reachable and working with socket module in Python

Tor, Stem, and Sockets - Changing Identity with TOR

Sending traffic from multiple source IPs Scapy

Select outgoing IP from django?

Trouble initiating a TCP connection in Python--blocking and timing out

Categories

Resources