I am making HTTP requests using the requests library in Python, but I need the IP address of the server that responded to the HTTP request, and I'm trying to avoid making two calls (and possibly getting a different IP address from the one that actually served the request).
Is that possible? Does any Python HTTP library allow me to do that?
PS: I also need to make HTTPS requests and use an authenticated proxy.
Update 1:
Example:
import requests
proxies = {
    "http": "http://user:password@10.10.1.10:3128",
    "https": "http://user:password@10.10.1.10:1080",
}
response = requests.get("http://example.org", proxies=proxies)
response.ip  # This doesn't exist; it's just what I would like to do
Then, I would like to know which IP address requests connected to, via a method or property on the response. In other libraries, I was able to do that by finding the sock object and calling getpeername().
It turns out that it's rather involved.
Here's a monkey-patch for requests version 1.2.3:
It wraps the _make_request method on HTTPConnectionPool to store the result of socket.getpeername() on the HTTPResponse instance.
For me, on Python 2.7.3, that instance was available as response.raw._original_response.
from requests.packages.urllib3.connectionpool import HTTPConnectionPool

def _make_request(self, conn, method, url, **kwargs):
    response = self._old_make_request(conn, method, url, **kwargs)
    sock = getattr(conn, 'sock', False)
    if sock:
        setattr(response, 'peer', sock.getpeername())
    else:
        setattr(response, 'peer', None)
    return response

HTTPConnectionPool._old_make_request = HTTPConnectionPool._make_request
HTTPConnectionPool._make_request = _make_request

import requests
r = requests.get('http://www.google.com')
print r.raw._original_response.peer
Yields:
('2a00:1450:4009:809::1017', 80, 0, 0)
Ah, but if there's a proxy involved or the response is chunked, HTTPConnectionPool._make_request isn't called.
So here's a new version that patches httplib.HTTPConnection.getresponse instead:
import httplib

def getresponse(self, *args, **kwargs):
    response = self._old_getresponse(*args, **kwargs)
    if self.sock:
        response.peer = self.sock.getpeername()
    else:
        response.peer = None
    return response

httplib.HTTPConnection._old_getresponse = httplib.HTTPConnection.getresponse
httplib.HTTPConnection.getresponse = getresponse

import requests

def check_peer(resp):
    orig_resp = resp.raw._original_response
    if hasattr(orig_resp, 'peer'):
        return getattr(orig_resp, 'peer')
Running:
>>> r1 = requests.get('http://www.google.com')
>>> check_peer(r1)
('2a00:1450:4009:808::101f', 80, 0, 0)
>>> r2 = requests.get('https://www.google.com')
>>> check_peer(r2)
('2a00:1450:4009:808::101f', 443, 0, 0)
>>> r3 = requests.get('http://wheezyweb.readthedocs.org/en/latest/tutorial.html#what-you-ll-build')
>>> check_peer(r3)
('162.209.99.68', 80)
I also checked running with proxies set; the proxy address is returned instead.
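For illustration, a minimal sketch of that proxy case (the proxy address below is hypothetical; with a proxy configured, check_peer() reports the proxy's endpoint rather than the origin server's):
>>> proxies = {'http': 'http://10.10.1.10:3128'}  # hypothetical proxy
>>> r4 = requests.get('http://www.google.com', proxies=proxies)
>>> check_peer(r4)
('10.10.1.10', 3128)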
Update 2016/01/19
est offers an alternative that doesn't need the monkey-patch:
rsp = requests.get('http://google.com', stream=True)
# grab the IP while you can, before you consume the body!
print rsp.raw._fp.fp._sock.getpeername()
# consume the body, which calls read(); after that the fileno is no longer available
print rsp.content
Update 2016/05/19
From the comments (copied here for visibility), Richard Kenneth Niescior offers the following, confirmed working with requests 2.10.0 and Python 3.
rsp = requests.get(..., stream=True)
rsp.raw._connection.sock.getpeername()
Update 2019/02/22
Python 3 with requests 2.19.1:
resp = requests.get(..., stream=True)
resp.raw._connection.sock.socket.getpeername()
Update 2020/01/31
Python 3.8 with requests 2.22.0:
resp = requests.get('https://www.google.com', stream=True)
resp.raw._connection.sock.getpeername()
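Pulling those updates together, here is a small helper I would sketch on top of them; it relies on the private raw._connection attribute shown above, which is not a public API and may change between versions:
import requests

def get_with_peer(url, **kwargs):
    """Return (response, peer) where peer is the (ip, port) tuple the
    connection was made to, or None if it couldn't be determined.
    stream=True keeps the socket attached until the body is consumed."""
    kwargs.setdefault("stream", True)
    resp = requests.get(url, **kwargs)
    try:
        peer = resp.raw._connection.sock.getpeername()  # private attribute
    except AttributeError:
        peer = None
    return resp, peer

resp, peer = get_with_peer("https://www.google.com")
print(peer)   # (server_ip, port), or the proxy's address if one is configured
resp.close()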
Try:
import requests
proxies = {
    "http": "http://user:password@10.10.1.10:3128",
    "https": "http://user:password@10.10.1.10:1080",
}
response = requests.get('http://jsonip.com', proxies=proxies)
ip = response.json()['ip']
print('Your public IP is:', ip)
Related
I have a list of URLs in a format such as ["www.bol.com", "www.dopper.com"].
In order to use them as start URLs in Scrapy, I need to know the correct HTTP protocol.
For example:
["https://www.bol.com/nl/nl/", "https://dopper.com/nl"]
As you can see, the protocol might be https or http, and the domain may appear with or without www.
Not sure if there are any other variations.
Is there any Python tool that can determine the right protocol?
If not, and I have to build the logic myself, what are the cases I should take into account?
For option 2, this is what I have so far:
import requests

def identify_protocol(url):
    try:
        r = requests.get("https://" + url + "/", timeout=10)
        return r.url, r.status_code
    except requests.HTTPError:
        r = requests.get("http://" + url + "/", timeout=10)
        return r.url, r.status_code
    except requests.HTTPError:
        r = requests.get("https://" + url.replace("www.", "") + "/", timeout=10)
        return r.url, r.status_code
    except:
        return None, None
Is there any other possibility I should take into account?
There is no way to determine the protocol or full domain from the fragment directly; the information simply isn't there. In order to find it you would need either:
a database of the correct protocol/domains, which you can lookup your domain fragment in
to make the request and see what the server tells you
If you do (2) you can of course gradually build your own database to avoid needing the request in the future.
On many https servers, if you attempt an http connection you will be redirected to https. If you are not redirected, then you can reliably use http. If the http request fails, you could try again with https and see if that works.
The same applies to the domain: if the site usually redirects, you can perform the request using the original domain and see where you are redirected.
An example using requests:
>>> import requests
>>> r = requests.get('http://bol.com')
>>> r
<Response [200]>
>>> r.url
'https://www.bol.com/nl/nl/'
As you can see, the Response object's url attribute holds the final destination URL, including the protocol.
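Putting that together, here is a minimal sketch of the fallback logic described above (the function name, the 10-second timeout, and the printed examples are illustrative assumptions, not part of the original answers):
import requests

def resolve_url(fragment, timeout=10):
    """Try https first, then fall back to http, and return the final URL
    after any redirects (or None if neither scheme works)."""
    for scheme in ("https://", "http://"):
        try:
            r = requests.get(scheme + fragment, timeout=timeout)
            r.raise_for_status()
            return r.url  # final URL after redirects, including the protocol
        except requests.RequestException:
            continue
    return None

print(resolve_url("bol.com"))         # e.g. 'https://www.bol.com/nl/nl/'
print(resolve_url("www.dopper.com"))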
As I understood the question, you need to retrieve the final URL after all possible redirections. That can be done with the built-in urllib.request. If the provided URL has no scheme, you can use http as the default. To parse the input URL I used a combination of urlsplit() and urlunsplit().
Code:
import urllib.request as request
import urllib.parse as parse

def find_redirect_location(url, proxy=None):
    parsed_url = parse.urlsplit(url.strip())
    url = parse.urlunsplit((
        parsed_url.scheme or "http",
        parsed_url.netloc or parsed_url.path,
        parsed_url.path.rstrip("/") + "/" if parsed_url.netloc else "/",
        parsed_url.query,
        parsed_url.fragment
    ))

    if proxy:
        handler = request.ProxyHandler(dict.fromkeys(("http", "https"), proxy))
        opener = request.build_opener(handler, request.ProxyBasicAuthHandler())
    else:
        opener = request.build_opener()

    with opener.open(url) as response:
        return response.url
Then you can just call this function on every URL in the list:
urls = ["bol.com ","www.dopper.com", "https://google.com"]
final_urls = list(map(find_redirect_location, urls))
You can also use proxies:
from itertools import cycle
urls = ["bol.com ","www.dopper.com", "https://google.com"]
proxies = ["http://localhost:8888"]
final_urls = list(map(find_redirect_location, urls, cycle(proxies)))
To make it a bit faster you can run the checks in parallel threads using ThreadPoolExecutor:
from concurrent.futures import ThreadPoolExecutor
urls = ["bol.com ","www.dopper.com", "https://google.com"]
final_urls = list(ThreadPoolExecutor().map(find_redirect_location, urls))
I am trying to connect to Splunk via its API using Python. I can connect and get a 200 status code, but when I read the content, it doesn't return the content of the page.
Here is my code:
import json
import requests
import re
baseurl = 'https://my_splunk_url:8888'
username = 'my_username'
password = 'my_password'
headers={"Content-Type": "application/json"}
s = requests.Session()
s.proxies = {"http": "my_proxy"}
r = s.get(baseurl, auth=(username, password), verify=False, headers=None, data=None)
print(r.status_code)
print(r.text)
I am new to Splunk and Python, so any ideas or suggestions as to why this is happening would help.
You need to authenticate first to get a token; then you'll be able to hit the rest of the REST endpoints. The auth endpoint is at /servicesNS/admin/search/auth/login, which will give you the session_key, which you then provide to subsequent requests.
Here is some code that uses requests to authenticate to a Splunk instance and start a search. It then checks whether the search is complete; if not, it waits a second and checks again, repeating until the search is done, and then prints out the results.
import time  # need for sleep
from xml.dom import minidom
import json, pprint

import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

base_url = 'https://localhost:8089'
username = 'admin'
password = 'changeme'
search_query = "search=search index=*"

r = requests.get(base_url + "/servicesNS/admin/search/auth/login",
                 data={'username': username, 'password': password}, verify=False)
session_key = minidom.parseString(r.text).getElementsByTagName('sessionKey')[0].firstChild.nodeValue
print("Session Key:", session_key)

r = requests.post(base_url + '/services/search/jobs/', data=search_query,
                  headers={'Authorization': ('Splunk %s' % session_key)},
                  verify=False)
sid = minidom.parseString(r.text).getElementsByTagName('sid')[0].firstChild.nodeValue
print("Search ID", sid)

done = False
while not done:
    r = requests.get(base_url + '/services/search/jobs/' + sid,
                     headers={'Authorization': ('Splunk %s' % session_key)},
                     verify=False)
    response = minidom.parseString(r.text)
    for node in response.getElementsByTagName("s:key"):
        if node.hasAttribute("name") and node.getAttribute("name") == "dispatchState":
            dispatchState = node.firstChild.nodeValue
    print("Search Status: ", dispatchState)
    if dispatchState == "DONE":
        done = True
    else:
        time.sleep(1)

r = requests.get(base_url + '/services/search/jobs/' + sid + '/results/',
                 headers={'Authorization': ('Splunk %s' % session_key)},
                 data={'output_mode': 'json'},
                 verify=False)
pprint.pprint(json.loads(r.text))
Many of the request calls used here include the flag verify=False to avoid issues with the default self-signed SSL certs, but you can drop that if you have legitimate certificates.
Published a while ago at https://gist.github.com/sduff/aca550a8df636fdc07326225de380a91
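If you do have proper certificates, verify can instead point at a CA bundle rather than being disabled; a brief sketch reusing the variables from the snippet above (the bundle path is a placeholder):
# Use a CA bundle (placeholder path) instead of verify=False
r = requests.get(base_url + '/services/search/jobs/' + sid,
                 headers={'Authorization': 'Splunk %s' % session_key},
                 verify='/path/to/ca_bundle.pem')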
Nice piece of coding. One of the wonderful aspects of Python is the ability to use other people's well-written packages. In this case, why not use Splunk's Python packages to do all of that work, with a lot less code around it?
pip install splunklib.
Then add the following to your import block
import splunklib.client as client
import splunklib.results as results
pypi.org has documentation on some of the usage, and Splunk has an excellent set of how-to documents. Remember: be lazy, and use someone else's work to make your own work look better.
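As a rough sketch of what that looks like (the host, credentials, and query below are placeholders, and this assumes the SDK's client.connect()/jobs.oneshot()/ResultsReader pattern):
import splunklib.client as client
import splunklib.results as results

# Placeholder connection details; use your own Splunk host and credentials
service = client.connect(host="localhost", port=8089,
                         username="admin", password="changeme")

# Run a blocking one-shot search and iterate over the results
rr = results.ResultsReader(service.jobs.oneshot("search index=* | head 5"))
for result in rr:
    if isinstance(result, dict):
        print(result)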
I'm playing around trying to write a client for a site which provides data as an HTTP stream (aka HTTP server push). However, urllib2.urlopen() grabs the stream in its current state and then closes the connection. I tried skipping urllib2 and using httplib directly, but this seems to have the same behaviour.
The request is a POST request with a set of five parameters. There are no cookies or authentication required, however.
Is there a way to get the stream to stay open, so it can be checked each program loop for new contents, rather than waiting for the whole thing to be redownloaded every few seconds, introducing lag?
You could try the requests lib.
import requests
r = requests.get('http://httpbin.org/stream/20', stream=True)
for line in r.iter_lines():
    # filter out keep-alive new lines
    if line:
        print line
You also could add parameters:
import requests
settings = { 'interval': '1000', 'count':'50' }
url = 'http://agent.mtconnect.org/sample'
r = requests.get(url, params=settings, stream=True)
for line in r.iter_lines():
    if line:
        print line
Do you need to actually parse the response headers, or are you mainly interested in the content? And is your HTTP request complex, making you set cookies and other headers, or will a very simple request suffice?
If you only care about the body of the HTTP response and don't have a very fancy request, you should consider simply using a socket connection:
import socket
SERVER_ADDR = ("example.com", 80)
sock = socket.create_connection(SERVER_ADDR)
f = sock.makefile("r+", bufsize=0)
f.write("GET / HTTP/1.0\r\n"
        + "Host: example.com\r\n"   # you can put other headers here too
        + "\r\n")

# skip headers
while f.readline() != "\r\n":
    pass

# keep reading forever
while True:
    line = f.readline()  # blocks until more data is available
    if not line:
        break  # we ran out of data!
    print line

sock.close()
One way to do it using urllib2 is (assuming this site also requires Basic Auth):
import urllib2
p_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
url = 'http://streamingsite.com'
p_mgr.add_password(None, url, 'login', 'password')
auth = urllib2.HTTPBasicAuthHandler(p_mgr)
opener = urllib2.build_opener(auth)
urllib2.install_opener(opener)
f = opener.open('http://streamingsite.com')
while True:
    data = f.readline()  # blocks until a new line of streamed data arrives
I'm trying to send a SOAP request using SOAPpy as the client. I've found some documentation stating how to add a cookie by extending SOAPpy.HTTPTransport, but I can't seem to get it to work.
I tried to use the example here,
but the server I'm trying to send the request to started throwing 415 errors, so I'm trying to accomplish this without using ClientCookie, or by figuring out why the server is throwing 415s when I do use it. I suspect it might be because ClientCookie uses urllib2 and HTTP/1.1, whereas SOAPpy uses urllib and HTTP/1.0.
Does someone know how to make ClientCookie use HTTP/1.0, if that is even the problem, or another way to add a cookie to the SOAPpy headers without using ClientCookie? I tried this code with other services, and it only seems to throw errors when sending requests to Microsoft servers.
I'm still finding my footing with python, so it could just be me doing something dumb.
import sys, os, string
from SOAPpy import WSDL, HTTPTransport, Config, SOAPAddress, Types
import ClientCookie

Config.cookieJar = ClientCookie.MozillaCookieJar()

class CookieTransport(HTTPTransport):
    def call(self, addr, data, namespace, soapaction=None, encoding=None,
             http_proxy=None, config=Config):

        if not isinstance(addr, SOAPAddress):
            addr = SOAPAddress(addr, config)

        cookie_cutter = ClientCookie.HTTPCookieProcessor(config.cookieJar)
        hh = ClientCookie.HTTPHandler()
        hh.set_http_debuglevel(1)

        # TODO proxy support
        opener = ClientCookie.build_opener(cookie_cutter, hh)

        t = 'text/xml'
        if encoding != None:
            t += '; charset="%s"' % encoding
        opener.addheaders = [("Content-Type", t),
                             ("Cookie", "Username=foobar"),  # ClientCookie should handle
                             ("SOAPAction", "%s" % (soapaction))]

        response = opener.open(addr.proto + "://" + addr.host + addr.path, data)
        data = response.read()

        # get the new namespace
        if namespace is None:
            new_ns = None
        else:
            new_ns = self.getNS(namespace, data)

        print '\n' * 4, '-' * 50

        # return response payload
        return data, new_ns

url = 'http://www.authorstream.com/Services/Test.asmx?WSDL'
proxy = WSDL.Proxy(url, transport=CookieTransport)
print proxy.GetList()
Error 415 is caused by an incorrect Content-Type header.
Install HttpFox for Firefox, or another tool (Wireshark, Charles, or Fiddler), to track which headers you are sending. Try Content-Type: application/xml.
...
t = 'application/xml'
if encoding != None:
    t += '; charset="%s"' % encoding
...
If you are trying to send a file to the web server, use Content-Type: application/x-www-form-urlencoded.
A nice hack to use cookies with SOAPpy calls: Using Cookies with SOAPpy calls