My requirement is to generate hundreds of HTTP POST requests per second. I am doing it using urllib2.
import datetime
import time
import urllib2
from multiprocessing import Process

# url, data and ftime are defined elsewhere in my script

def send():
    req = urllib2.Request(url)
    req.add_data(data)
    response = urllib2.urlopen(req)

while datetime.datetime.now() <= ftime:
    p = Process(target=send, args=[])
    p.start()
    time.sleep(0.001)
The problem is that this code sometimes, for some iterations, throws one of the following exceptions:
HTTP 503 Service Unavailable.
URLError: <urlopen error [Errno -2] Name or service not known>
I have tried using requests (HTTP for Humans) as well, but I am having proxy issues with that module. It seems requests sends HTTP packets to the proxy server even when the target machine is within the same LAN. I don't want the packets to go to a proxy server.
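For what it's worth, requests can be told to ignore the proxy environment entirely, either via the NO_PROXY environment variable or by disabling trust_env on a session. A minimal sketch, where the LAN address and endpoint are made-up placeholders:

import requests

session = requests.Session()
session.trust_env = False  # ignore HTTP_PROXY/HTTPS_PROXY/netrc from the environment

# 192.168.1.10/endpoint is a placeholder for a machine on the same LAN
response = session.post("http://192.168.1.10/endpoint", data={"key": "value"})
print(response.status_code)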
The simplest way to limit the number of concurrent connections is to use a thread pool:
#!/usr/bin/env python
from itertools import izip, repeat
from multiprocessing.dummy import Pool  # use threads for I/O bound tasks
from urllib2 import urlopen

def fetch(url_data):
    try:
        return url_data[0], urlopen(*url_data).read(), None
    except EnvironmentError as e:
        return url_data[0], None, str(e)

if __name__ == "__main__":
    pool = Pool(20)  # use 20 concurrent connections
    params = izip(urls, repeat(data))  # use the same data for all urls
    for url, content, error in pool.imap_unordered(fetch, params):
        if error is None:
            print("done: %s: %d" % (url, len(content)))
        else:
            print("error: %s: %s" % (url, error))
503 Service Unavailable is a server error; the server might be failing to handle the load.
Name or service not known is a DNS error. If you need to make many requests, install/enable a local caching DNS server.
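If installing a caching DNS server is not an option, a stdlib-only workaround is to resolve the hostname once and reuse the address for every request. A minimal sketch for plain HTTP (the hostname and POST body are placeholders):

import socket
import urllib2

host = "example.com"  # placeholder for the real target host
ip = socket.gethostbyname(host)  # resolve once, reuse for every request

req = urllib2.Request("http://%s/" % ip, "key=value")  # attaching data makes this a POST
req.add_header("Host", host)  # preserve the Host header for virtual hosting
response = urllib2.urlopen(req)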
I want to change my IP every time I run through the loop. I am trying to achieve this with Tor. I have seen a few posts with similar questions, but the solutions given there are not working. So far my code looks like this:
import socks
#import socket
import requests
import time

for i in range(1, 3):
    socks.setdefaultproxy(proxy_type=socks.PROXY_TYPE_SOCKS5, addr="127.0.0.1", port=9050)
    try:
        print(requests.get("http://icanhazip.com").text)
    except Exception as e:
        time.sleep(30)
        print(type(e))
        print(e)
I need a different IP every time, instead of the same IP.
Edit: I have tried the approach given in How to change Tor identity in Python?. My constraint is that I must not use any external libraries; the solution provided by Nedim also works without an external library.
So far I have tried the following from the linked post to get a different IP:
import socket
import sys
import os

try:
    tor_c = socket.create_connection(("127.0.0.1", 9051))
    secret = os.urandom(32)  # pass this to authenticate
    hash = tor_c.s2k_gen(secret)  # pass this to Tor on startup.
    tor_c.send('AUTHENTICATE "{}"\r\nSIGNAL NEWNYM\r\n'.format(hash))
    response = tor_c.recv(1024)
    if response != '250 OK\r\n250 OK\r\n':
        sys.stderr.write('Unexpected response from Tor control port: {}\n'.format(response))
except Exception as e:
    sys.stderr.write('Error connecting to Tor control port: {}\n'.format(repr(e)))
but it is throwing the following error:
Error connecting to Tor control port: ConnectionRefusedError(10061, 'No connection could be made because the target machine actively refused it', None, 10061, None)
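That ConnectionRefusedError means nothing is listening on port 9051: Tor's control port is disabled by default and has to be enabled in torrc (ControlPort 9051, plus HashedControlPassword or CookieAuthentication). Once it is enabled, a stdlib-only sketch of sending NEWNYM could look like this (the password argument is a placeholder for whatever you hashed into torrc):

import socket

def renew_tor_ip(password, host="127.0.0.1", port=9051):
    # assumes torrc enables the control port with a hashed password
    s = socket.create_connection((host, port))
    try:
        s.sendall('AUTHENTICATE "{}"\r\n'.format(password).encode())
        if not s.recv(1024).startswith(b'250'):
            raise RuntimeError('Tor control port authentication failed')
        s.sendall(b'SIGNAL NEWNYM\r\n')
        return s.recv(1024).startswith(b'250')  # True if Tor accepted the signal
    finally:
        s.close()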
import time
import requests
from stem import Signal  # stem and fake_useragent are external packages
from stem.control import Controller
from fake_useragent import UserAgent

def renew_connection():
    with Controller.from_port(port=9051) as controller:
        controller.authenticate(password='password')
        controller.signal(Signal.NEWNYM)
        controller.close()

def request_tor(url, headers):
    print((requests.get(url, proxies={'http': 'socks5h://localhost:9050'}, headers=headers)).text)
    r = requests.get(url)
    print('direct IP:', r.text)

if __name__ == "__main__":
    url = 'http://icanhazip.com'
    headers = {'User-Agent': UserAgent().random}
    for i in range(5):
        request_tor(url, headers)
        renew_connection()
        time.sleep(5)
I am writing a small Python script to use for checking HAProxy. What the script does is connect to the HAProxy socket and "poll" for stats.
#!/usr/bin/env python
import socket
import sys

my_socket = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
    my_socket.connect("/var/run/haproxy/haproxy.sock")
except socket.error:
    print "cant connect to socket"
    sys.exit(1)
my_socket.send("show stat\n")
response = my_socket.recv(1024)
print response
What I wish to do is, if there is no response from the socket, meaning HAProxy will not output the stats, exit the script with exit code 1. Is it possible to somehow evaluate whether an answer is received?
By default the socket will be in blocking mode and recv() will block until data is received or the connection is closed.
If you can assume that the proxy will respond within a certain amount of time, you can set a timeout on the client socket. The timeout is the number of seconds to wait for a socket operation to complete. If the operation does not complete in time, an exception is raised:
my_socket.settimeout(5.0)  # 5 seconds. Set this after connecting.
try:
    response = my_socket.recv(1024)
    print response
except socket.timeout as exc:
    print 'timed out waiting for response from proxy'
    my_socket.close()
    sys.exit(1)
That's one way, and it's probably the easiest. You could also look at the select module, which provides functions that let your client wait for the socket to become readable, indicating that there is data to be read or that the socket has been closed. It really depends on what behaviour you want. Example using select():
import select

r, _, _ = select.select([my_socket], [], [], 5.0)
if r:
    response = my_socket.recv(1024)
    print response
else:
    print 'Nothing received from proxy in 5 seconds'
    my_socket.close()
    sys.exit(1)
I'd like to use the Requests package to connect to the streaming API of a web service. Suppose I use the following code to send a request, receive the response and iterate through the lines of the response as they arrive:
import requests

r = requests.get('http://httpbin.org/stream/20', stream=True)
for line in r.iter_lines():
    if line:
        print line
While waiting to receive new data, we are basically waiting for r.iter_lines() to generate a new piece of data. But what if I lose internet connection while waiting? How can I find out so I can attempt to reconnect?
You can disconnect from your network to try it out. Requests raises an error like this:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='httpbin.org', port=80): Max retries exceeded with url: /stream/20 (Caused by : [Errno -3] Temporary failure in name resolution)
The error message shows that Requests already retries on network errors. You can refer to this answer for setting max_retries. If you want more customization (e.g. waiting between retries), do it in a loop:
import socket
import requests
import time

MAX_RETRIES = 2
WAIT_SECONDS = 5

for i in range(MAX_RETRIES):
    try:
        r = requests.get('http://releases.ubuntu.com/14.04.1/ubuntu-14.04.1-desktop-amd64.iso',
                         stream=True, timeout=10)
        idx = 1
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                print 'Chunk %d received' % idx
                idx += 1
        break
    except requests.exceptions.ConnectionError:
        print 'build http connection failed'
    except socket.timeout:
        print 'download failed'
    time.sleep(WAIT_SECONDS)
else:
    print 'all tries failed'
EDIT: I tested with a large file, using iter_content instead because it's a binary file. iter_lines is based on iter_content (source code), so I believe the behaviour is the same. Procedure: run the code with the network connected; after receiving some chunks, disconnect; wait 2-3 seconds; reconnect. The download continued, so the requests package DOES retry when the connection is lost during iteration.
Note: if there is no network when the connection is built (requests.get()), ConnectionError is raised; if the network is lost during iter_lines / iter_content, socket.timeout is raised.
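For the max_retries route mentioned above, the retry count can be raised by mounting a transport adapter on a session; a minimal sketch:

import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
# retry failed connection attempts up to 3 times before raising ConnectionError
session.mount('http://', HTTPAdapter(max_retries=3))
session.mount('https://', HTTPAdapter(max_retries=3))
r = session.get('http://httpbin.org/stream/20', stream=True, timeout=10)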
I have a list of x websites from which I want to scrape data.
Code:
import urllib2
from urllib2 import Request, urlopen, HTTPError, URLError

def checkurl(z):
    print urllib2.urlopen('http://' + z).read()

for x in t2w:  # t2w is my list
    print x
    checkurl(x)
    print "\n"
As of now, the whole process stops as soon as a website is unavailable. What can I do to let urllib2 try for x seconds, report an error such as "website not available", and then move on to the next item in the list?
Maybe I should have mentioned that this is for .onion sites.
import socks
import socket

def create_connection(address, timeout=None, source_address=None):
    sock = socks.socksocket()
    sock.connect(address)
    return sock

socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 9150)
socket.socket = socks.socksocket
socket.create_connection = create_connection

#####

import urllib2
from urllib2 import Request, urlopen, HTTPError, URLError

def checkurl(z):
    try:
        urllib2.urlopen("http://" + z, timeout=1).read()
    except urllib2.URLError, e:
        # MyException is a placeholder; define it or handle the error here
        raise MyException("Error raised: %r" % e)
        #print urllib2.urlopen('http://'+z).read()
You can use the timeout parameter.
try:
    urllib2.urlopen("http://example.com", timeout=1)
except urllib2.URLError, e:
    raise MyException("Error raised: %r" % e)
From the docs:
The optional timeout parameter specifies a timeout in seconds for
blocking operations like the connection attempt (if not specified, the
global default timeout setting will be used). This actually only works
for HTTP, HTTPS and FTP connections.
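Putting the timeout together with the loop from the question: catch the error inside checkurl, report it, and return so the loop continues. A sketch assuming t2w from the question and a 5-second timeout:

import socket
import urllib2

def checkurl(z):
    try:
        return urllib2.urlopen('http://' + z, timeout=5).read()
    except (urllib2.URLError, socket.timeout) as e:
        print "website not available: %s (%r)" % (z, e)
        return None

for x in t2w:  # t2w is the list from the question
    checkurl(x)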
Is there a way in Python to check the server response codes (200, 301, 404) in the headers for a specified IP range (1.1.1.1 - 1.1.1.254)? Maybe it's even possible to do it multi-threaded?
P.S. I found out that it's possible with the "HTTPResponse.status" attribute (http://docs.python.org/library/httplib.html); how could I now check the IP range with it?
P.S. Maybe it would be a good idea to first check whether port 80 is open and then only test the ones with open ports; I think that would really speed it up, since out of 254 IPs maybe only 30 are using port 80.
You can just try and connect with a normal GET request to the root of the host, with a short timeout (or longer one if you want it to wait more). Then you can run it through a map.
import httplib
from multiprocessing import Pool

def test_ip(addr):
    conn = httplib.HTTPConnection(addr, timeout=1)
    try:
        conn.request("GET", "/")
    except:
        return addr, httplib.REQUEST_TIMEOUT
    else:
        resp = conn.getresponse()
        return addr, resp.status
    finally:
        conn.close()

p = Pool(20)
results = p.map(test_ip, ["1.1.1.%d" % d for d in range(1, 255)], chunksize=10)
print results
# [('1.1.1.1', 408), ('1.1.1.2', 408), ...]
Adjust Pool size and chunksize to suit.
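For the port-80 pre-check suggested in the question, the range can be filtered first with a plain TCP connect, reusing the pool p and test_ip from above; a minimal sketch:

import socket

def port_80_open(addr, timeout=0.5):
    # connect_ex returns 0 when the TCP handshake succeeds
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        return s.connect_ex((addr, 80)) == 0
    finally:
        s.close()

candidates = [a for a in ("1.1.1.%d" % d for d in range(1, 255)) if port_80_open(a)]
results = p.map(test_ip, candidates, chunksize=10)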