How to specify source interface in python requests module? [duplicate] - python

I have a script that makes some requests with urllib2.
I use the trick suggested elsewhere on Stack Overflow to bind another IP to the application; my computer has two IP addresses (IP A and IP B).
I would like to switch to using the requests library. Does anyone know how I can achieve the same functionality with that library?

Looking into the requests module, it looks like it uses httplib to send the HTTP requests, and httplib uses socket.create_connection() to connect to the remote host.
Knowing that, and following the monkey-patching method in the link you provided:
import socket

real_create_conn = socket.create_connection

def set_src_addr(*args):
    address, timeout = args[0], args[1]
    source_address = ('IP_ADDR_TO_BIND_TO', 0)
    return real_create_conn(address, timeout, source_address)

socket.create_connection = set_src_addr

import requests
r = requests.get('http://www.google.com')
It looks like httplib passes all the arguments to create_connection() positionally rather than as keywords; trying to extend the kwargs dict inside set_src_addr was failing. I believe the above is what you want, but I don't have a dual-homed machine to test on.
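If you want the patch to survive either calling convention, a slightly more defensive variant (untested, same placeholder IP) might look like this:
import socket

_real_create_conn = socket.create_connection

def set_src_addr(address, timeout=socket._GLOBAL_DEFAULT_TIMEOUT, source_address=None, **kwargs):
    # Ignore whatever source_address the caller asked for and force our own.
    # 'IP_ADDR_TO_BIND_TO' is a placeholder for the local address of the interface to use.
    return _real_create_conn(address, timeout, ('IP_ADDR_TO_BIND_TO', 0), **kwargs)

socket.create_connection = set_src_addr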

Actually, you should bind the IP for requests like this:
import urllib3

real_create_conn = urllib3.util.connection.create_connection

def set_src_addr(address, timeout, *args, **kw):
    source_address = ('YOUR_BIND_IP', 0)
    return real_create_conn(address, timeout=timeout, source_address=source_address)

urllib3.util.connection.create_connection = set_src_addr

import requests
r = requests.get('http://ipecho.net/plain')
print(r.text)

Reposting my answer from a duplicate question, as I believe it is a cleaner solution:
You can use a SourceAddressAdapter from requests-toolbelt:
import requests
from requests_toolbelt.adapters import source

source = source.SourceAddressAdapter('127.0.0.1')

with requests.Session() as session:
    session.mount('http://', source)
    r = session.get("http://example.com/foo/bar")
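Note that an adapter is only used for the schemes it is mounted on; if the session will also make HTTPS requests, you would presumably mount it for https:// as well:
# mount the source-address adapter for both schemes
session.mount('http://', source)
session.mount('https://', source)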

Related

How do I go about enabling HTTP proxy in Twitty Twister?

I'm using the twittytwister module to implement a Twitter client, but it looks like it's no longer being developed. I need to connect through an HTTP proxy to reach the internet, but the module does not have such an option, so I'm looking at modifying it.
def __downloadPage(factory, *args, **kwargs):
    downloader = factory(*args, **kwargs)
    if downloader.scheme == 'https':
        from twisted.internet import ssl
        contextFactory = ssl.ClientContextFactory()
        reactor.connectSSL(downloader.host, downloader.port,
                           downloader, contextFactory)
    else:
        reactor.connectTCP(downloader.host, downloader.port,
                           downloader)
    return downloader

def getPage(url, *args, **kwargs):
    return __downloadPage(client.HTTPClientFactory, url, *args, **kwargs)
What can I do here to make it connect with my proxy? Do I replace client.HTTPClientFactory with something else?
The newer (added in Twisted 9.0.0) HTTP client API, twisted.web.client.Agent, includes support for connecting to an HTTP proxy. For example, you could write:
from twisted.web.client import Agent, ProxyAgent
from twisted.internet.endpoints import clientFromString
from twisted.internet import reactor
from os import environ

try:
    proxy = environ["HTTP_PROXY"]
except KeyError:
    agent = Agent(reactor)
else:
    agent = ProxyAgent(clientFromString(reactor, proxy))
See the endpoints documentation for details about the expected format of the HTTP_PROXY environment variable this example uses.
Unfortunately it looks like twittytwister uses the older HTTP client API, so you'll need to port it to use Agent before you can benefit from ProxyAgent. Fortunately the getPage-style API is fairly limited, so it shouldn't be very hard to replace it with Agent-style code, as sketched below.
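For illustration, here is a rough sketch (not from the original answer) of what an Agent-based replacement for a getPage-style call might look like, using readBody to collect the response; the helper name get_page is just for this sketch:
from os import environ

from twisted.internet import reactor
from twisted.internet.endpoints import clientFromString
from twisted.web.client import Agent, ProxyAgent, readBody

def get_page(url):
    # Use an HTTP proxy if HTTP_PROXY holds an endpoint string
    # (e.g. "tcp:host=127.0.0.1:port=8080"), otherwise connect directly.
    proxy = environ.get("HTTP_PROXY")
    if proxy:
        agent = ProxyAgent(clientFromString(reactor, proxy))
    else:
        agent = Agent(reactor)
    d = agent.request(b"GET", url.encode("ascii"))
    d.addCallback(readBody)  # fires with the response body as bytes
    return d

d = get_page("http://example.com/")
d.addCallback(print)
d.addBoth(lambda _: reactor.stop())
reactor.run()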

Why doesn't requests.get() return? What is the default timeout that requests.get() uses?

In my script, requests.get never returns:
import requests

print("requesting..")

# This call never returns!
r = requests.get(
    "http://www.some-site.example",
    proxies={'http': '222.255.169.74:8080'},
)

print(r.ok)
What could be the possible reason(s)? Any remedy? What is the default timeout that get uses?
What is the default timeout that get uses?
The default timeout is None, which means it'll wait (hang) until the connection is closed.
Just specify a timeout value, like this:
r = requests.get(
    'http://www.example.com',
    proxies={'http': '222.255.169.74:8080'},
    timeout=5
)
From the requests documentation:
You can tell Requests to stop waiting for a response after a given number of seconds with the timeout parameter:
>>> requests.get('http://github.com', timeout=0.001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)
Note: timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds).
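For example, wrapping the original call so a stalled proxy surfaces as an exception instead of a hang:
import requests

try:
    r = requests.get(
        "http://www.some-site.example",
        proxies={'http': '222.255.169.74:8080'},
        timeout=5,
    )
    print(r.ok)
except requests.exceptions.Timeout:
    # No bytes received for 5 seconds; give up instead of hanging forever.
    print("request timed out")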
It happens a lot to me that requests.get() takes a very long time to return, even if the timeout is 1 second. There are a few ways to overcome this problem:
1. Use the TimeoutSauce internal class
From: https://github.com/kennethreitz/requests/issues/1928#issuecomment-35811896
import requests
from requests.adapters import TimeoutSauce

class MyTimeout(TimeoutSauce):
    def __init__(self, *args, **kwargs):
        if kwargs['connect'] is None:
            kwargs['connect'] = 5
        if kwargs['read'] is None:
            kwargs['read'] = 5
        super(MyTimeout, self).__init__(*args, **kwargs)

requests.adapters.TimeoutSauce = MyTimeout
This code should cause us to set the read timeout as equal to the connect timeout, which is the timeout value you pass on your Session.get() call. (Note that I haven't actually tested this code, so it may need some quick debugging, I just wrote it straight into the GitHub window.)
2. Use a fork of requests from kevinburke: https://github.com/kevinburke/requests/tree/connect-timeout
From its documentation: https://github.com/kevinburke/requests/blob/connect-timeout/docs/user/advanced.rst
If you specify a single value for the timeout, like this:
r = requests.get('https://github.com', timeout=5)
The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:
r = requests.get('https://github.com', timeout=(3.05, 27))
NOTE: The change has since been merged to the main Requests project.
3. Using eventlet or signal, as already mentioned in the similar question:
Timeout for python requests.get entire response
I wanted a default timeout easily added to a bunch of code (assuming that a timeout solves your problem).
This is the solution I picked up from a ticket submitted to the repository for Requests.
credit: https://github.com/kennethreitz/requests/issues/2011#issuecomment-477784399
The solution is the last couple of lines here, but I show more code for better context. I like to use a session for retry behaviour.
import requests
import functools
from requests.adapters import HTTPAdapter, Retry

def requests_retry_session(
    retries=10,
    backoff_factor=2,
    status_forcelist=(500, 502, 503, 504),
    session=None,
) -> requests.Session:
    session = session or requests.Session()
    retry = Retry(
        total=retries,
        read=retries,
        connect=retries,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    # set a default timeout on every HTTP verb method of the session
    for method in ('get', 'options', 'head', 'post', 'put', 'patch', 'delete'):
        setattr(session, method, functools.partial(getattr(session, method), timeout=30))
    return session
then you can do something like this:
requests_session = requests_retry_session()
r = requests_session.get(url=url,...
In my case, the reason requests.get() never returned was that it first tried to connect to the host's resolved IPv6 address. If that IPv6 connection got stuck, it only fell back to the IPv4 address after I explicitly set timeout=<N seconds> and the timeout was hit.
My solution was monkey-patching the Python socket module to ignore IPv6 (or IPv4, if IPv4 is the one misbehaving); either of the linked answers worked for me, and one common variant is sketched below.
You might wonder why the curl command works: curl connects over IPv4 without waiting for IPv6 to complete. You can trace the socket syscalls with the command strace -ff -e network -s 10000 -- curl -vLk '<your url>'. For Python, use strace -ff -e network -s 10000 -- python3 <your python script>.
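One common variant of that patch (a sketch along those lines, not copied from the linked answers) forces urllib3, and therefore requests, to resolve hostnames to IPv4 only:
import socket

import urllib3.util.connection as urllib3_connection

def allowed_gai_family():
    # Restrict address resolution to IPv4 so requests never tries an IPv6 address first.
    return socket.AF_INET

urllib3_connection.allowed_gai_family = allowed_gai_family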
Patching the documented send function will fix this for all requests, even in many dependent libraries and SDKs. When patching library internals, be sure to patch supported/documented functions, not TimeoutSauce; otherwise you may wind up silently losing the effect of your patch.
import requests

DEFAULT_TIMEOUT = 180

old_send = requests.Session.send

def new_send(*args, **kwargs):
    if kwargs.get("timeout", None) is None:
        kwargs["timeout"] = DEFAULT_TIMEOUT
    return old_send(*args, **kwargs)

requests.Session.send = new_send
The effects of not having any timeout are quite severe, and the use of a default timeout can almost never break anything - because TCP itself has default timeouts as well.
I reviewed all the answers and came to the conclusion that the problem still exists. On some sites requests may hang indefinitely, and using multiprocessing seems to be overkill. Here's my approach (Python 3.5+):
import asyncio
import aiohttp

async def get_http(url):
    async with aiohttp.ClientSession(conn_timeout=1, read_timeout=3) as client:
        try:
            async with client.get(url) as response:
                content = await response.text()
                return content, response.status
        except Exception:
            pass

loop = asyncio.get_event_loop()
task = loop.create_task(get_http('http://example.com'))
loop.run_until_complete(task)

result = task.result()
if result is not None:
    content, status = task.result()
    if status == 200:
        print(content)
UPDATE
If you receive a deprecation warning about using conn_timeout and read_timeout, check near the bottom of THIS reference for how to use the ClientTimeout data structure. One simple way to apply this data structure per the linked reference to the original code above would be:
async def get_http(url):
    timeout = aiohttp.ClientTimeout(total=60)
    async with aiohttp.ClientSession(timeout=timeout) as client:
        try:
            etc.
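For completeness, a full version of the coroutine with ClientTimeout might look like this (a sketch; asyncio.run requires Python 3.7+):
import asyncio
import aiohttp

async def get_http(url):
    timeout = aiohttp.ClientTimeout(total=60)  # overall budget for connect + read
    async with aiohttp.ClientSession(timeout=timeout) as client:
        try:
            async with client.get(url) as response:
                return await response.text(), response.status
        except Exception:
            return None

result = asyncio.run(get_http('http://example.com'))
if result is not None:
    content, status = result
    if status == 200:
        print(content)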

Using python Requests library to consume from Twitter's user streams - how to detect disconnection?

I'm trying to use Requests to create a robust way of consuming from Twitter's user streams. So far, I've produced the following basic working example:
"""
Example of connecting to the Twitter user stream using Requests.
"""
import sys
import json
import requests
from oauth_hook import OAuthHook
def userstream(access_token, access_token_secret, consumer_key, consumer_secret):
oauth_hook = OAuthHook(access_token=access_token, access_token_secret=access_token_secret,
consumer_key=consumer_key, consumer_secret=consumer_secret,
header_auth=True)
hooks = dict(pre_request=oauth_hook)
config = dict(verbose=sys.stderr)
client = requests.session(hooks=hooks, config=config)
data = dict(delimited="length")
r = client.post("https://userstream.twitter.com/2/user.json", data=data, prefetch=False)
# TODO detect disconnection somehow
# https://github.com/kennethreitz/requests/pull/200/files#L13R169
# Use a timeout? http://pguides.net/python-tutorial/python-timeout-a-function/
for chunk in r.iter_lines(chunk_size=1):
if chunk and not chunk.isdigit():
yield json.loads(chunk)
if __name__ == "__main__":
import pprint
import settings
for obj in userstream(access_token=settings.ACCESS_TOKEN, access_token_secret=settings.ACCESS_TOKEN_SECRET, consumer_key=settings.CONSUMER_KEY, consumer_secret=settings.CONSUMER_SECRET):
pprint.pprint(obj)
However, I need to be able to handle disconnections gracefully. Currently, when the stream disconnects, the above just hangs, and there are no exceptions raised.
What would be the best way to achieve this? Is there a way to detect this through the urllib3 connection pool? Should I use a timeout?
I would recommend adding a timeout parameter to the client.post() call. http://docs.python-requests.org/en/latest/user/quickstart/#timeouts
However, it is important to note that requests doesn't set the TCP timeout, so you could set that using the following:
import socket
socket.setdefaulttimeout(TIMEOUT)
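Putting the two together for the streaming call in the question, a sketch might look like this; note that depending on the requests version, a stalled read can surface as either a Timeout or a ConnectionError:
# inside userstream(), replacing the bare post/loop:
r = client.post("https://userstream.twitter.com/2/user.json",
                data=data, prefetch=False, timeout=90)
try:
    for chunk in r.iter_lines(chunk_size=1):
        if chunk and not chunk.isdigit():
            yield json.loads(chunk)
except (requests.exceptions.Timeout, requests.exceptions.ConnectionError):
    # The stream went quiet for too long; reconnect here.
    pass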

Python - Twisted, Proxy and modifying content

So I've looked around at a few things involving writing an HTTP proxy using Python and the Twisted framework.
Essentially, like some other questions, I'd like to be able to modify the data that will be sent back to the browser. That is, the browser requests a resource and the proxy fetches it. Before the resource is returned to the browser, I'd like to be able to modify ANY content (HTTP headers AND body).
This (Need help writing a twisted proxy) was what I initially found. I tried it out, but it didn't work for me. I also found this (Python Twisted proxy - how to intercept packets), which I thought would work; however, I can only see the HTTP requests from the browser.
I am looking for any advice. Some thoughts I have are to use the ProxyClient and ProxyRequest classes and override their functions, but I read that the Proxy class itself is a combination of the two.
For those who may ask to see some code, it should be noted that I have worked with only the above two examples. Any help is great.
Thanks.
To create a ProxyFactory that can modify server response headers and content, you could override the ProxyClient.handle*() methods:
from twisted.python import log
from twisted.web import http, proxy

class ProxyClient(proxy.ProxyClient):
    """Mangle returned header, content here.

    Use `self.father` methods to modify request directly.
    """
    def handleHeader(self, key, value):
        # change response header here
        log.msg("Header: %s: %s" % (key, value))
        proxy.ProxyClient.handleHeader(self, key, value)

    def handleResponsePart(self, buffer):
        # change response part here
        log.msg("Content: %s" % (buffer[:50],))
        # make all content upper case
        proxy.ProxyClient.handleResponsePart(self, buffer.upper())

class ProxyClientFactory(proxy.ProxyClientFactory):
    protocol = ProxyClient

class ProxyRequest(proxy.ProxyRequest):
    protocols = dict(http=ProxyClientFactory)

class Proxy(proxy.Proxy):
    requestFactory = ProxyRequest

class ProxyFactory(http.HTTPFactory):
    protocol = Proxy
I arrived at this solution by looking at the source of twisted.web.proxy. I don't know how idiomatic it is.
To run it as a script or via twistd, add at the end:
portstr = "tcp:8080:interface=localhost"  # serve on localhost:8080

if __name__ == '__main__':  # $ python proxy_modify_request.py
    import sys
    from twisted.internet import endpoints, reactor

    def shutdown(reason, reactor, stopping=[]):
        """Stop the reactor."""
        if stopping:
            return
        stopping.append(True)
        if reason:
            log.msg(reason.value)
        reactor.callWhenRunning(reactor.stop)

    log.startLogging(sys.stdout)
    endpoint = endpoints.serverFromString(reactor, portstr)
    d = endpoint.listen(ProxyFactory())
    d.addErrback(shutdown, reactor)
    reactor.run()
else:  # $ twistd -ny proxy_modify_request.py
    from twisted.application import service, strports

    application = service.Application("proxy_modify_request")
    strports.service(portstr, ProxyFactory()).setServiceParent(application)
Usage
$ twistd -ny proxy_modify_request.py
In another terminal:
$ curl -x localhost:8080 http://example.com
For a two-way proxy using Twisted, see this article:
http://sujitpal.blogspot.com/2010/03/http-debug-proxy-with-twisted.html

Python Twisted JSON RPC

Can anyone recommend some simple code to set up a simple JSON-RPC client and server using Twisted?
I found txJSON-RPC, but I was wondering if someone had experience with some of these and could recommend something.
txJSONRPC is great. I use it and it works. I suggest you give it a try.
SERVER:
from txjsonrpc.web import jsonrpc
from twisted.web import server
from twisted.internet import reactor

class Math(jsonrpc.JSONRPC):
    """
    An example object to be published.
    """
    def jsonrpc_add(self, a, b):
        """
        Return sum of arguments.
        """
        return a + b

reactor.listenTCP(7080, server.Site(Math()))
reactor.run()
CLIENT:
from twisted.internet import reactor
from txjsonrpc.web.jsonrpc import Proxy

def printValue(value):
    print "Result: %s" % str(value)

def printError(error):
    print 'error', error

def shutDown(data):
    print "Shutting down reactor..."
    reactor.stop()

proxy = Proxy('http://127.0.0.1:7080/')
d = proxy.callRemote('add', 3, 5)
d.addCallback(printValue).addErrback(printError).addBoth(shutDown)
reactor.run()
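Run the server first, then the client in a second terminal; for the callRemote('add', 3, 5) call above it should print Result: 8 and then stop its reactor.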
As a bonus, I will leave an alternative: AMP.
http://amp-protocol.net
If you are looking for a framework-independent approach, a library I pushed (using mixins) might be helpful.
Cyclone, a Tornado async web server implementation written using twisted, has a built-in json-rpc request handler that uses the python json/simplejson module. Example server and client code is here.
Wikipedia has a bunch of implementations listed for Python: https://en.wikipedia.org/wiki/JSON-RPC#Implementations
That said, txjason feels like the one best integrated with Twisted. It seems to support out-of-order responses out of the box, for example. Most of it would be portable to Python 3 using six. The most horrible part is the parameter validation, which is not exposed in the normal public API anyway.
Speaking of the client side, this worked better for me than the libraries:
import json

from twisted.internet import reactor
from twisted.web.client import getPage  # deprecated in newer Twisted in favour of Agent

TESTDATA = {'id': 1234,
            'method': 'getbalance',
            }
URL = 'http://localhost:7777'

d = getPage(URL, method="POST", postdata=json.dumps(TESTDATA))
d.addBoth(lambda x: print(json.loads(x)))
reactor.run()
