unittest for opening and reading url [duplicate]

I've got a piece of code that I can't figure out how to unit test! The module pulls content from external XML feeds (twitter, flickr, youtube, etc.) with urllib2. Here's some pseudo-code for it:
params = (url, urlencode(data),) if data else (url,)
req = Request(*params)
response = urlopen(req)
#check headers, content-length, etc...
#parse the response XML with lxml...
My first thought was to pickle the response and load it for testing, but apparently urllib's response object is unserializable (it raises an exception).
Just saving the XML from the response body isn't ideal, because my code uses the header information too. It's designed to act on a response object.
And of course, relying on an external source for data in a unit test is a horrible idea.
So how do I write a unit test for this?

urllib2 has functions called build_opener() and install_opener(), which you can use to mock the behaviour of urlopen():
import urllib2
from StringIO import StringIO
def mock_response(req):
    if req.get_full_url() == "http://example.com":
        resp = urllib2.addinfourl(StringIO("mock file"), "mock message", req.get_full_url())
        resp.code = 200
        resp.msg = "OK"
        return resp
class MyHTTPHandler(urllib2.HTTPHandler):
    def http_open(self, req):
        print "mock opener"
        return mock_response(req)
my_opener = urllib2.build_opener(MyHTTPHandler)
urllib2.install_opener(my_opener)
response = urllib2.urlopen("http://example.com")
print response.read()
print response.code
print response.msg

It would be best if you could write a mock urlopen (and possibly Request) which provides the minimum required interface to behave like urllib2's version. You'd then need to have your function/method which uses it able to accept this mock urlopen somehow, and use urllib2.urlopen otherwise.
This is a fair amount of work, but worthwhile. Remember that Python is very friendly to duck typing, so you only need to provide the parts of the response object's interface that your code actually uses.
For example:
class MockResponse(object):
    def __init__(self, resp_data, code=200, msg='OK'):
        self.resp_data = resp_data
        self.code = code
        self.msg = msg
        self.headers = {'content-type': 'text/xml; charset=utf-8'}
    def read(self):
        return self.resp_data
    def getcode(self):
        return self.code
    # Define other members and properties you want
def mock_urlopen(request):
    return MockResponse(r'<xml document>')
Granted, some of these are difficult to mock, because for example I believe the normal "headers" is an HTTPMessage which implements fun stuff like case-insensitive header names. But, you might be able to simply construct an HTTPMessage with your response data.
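A minimal sketch of the two ideas above, written for Python 3 (where urllib2 became urllib.request). The names FakeResponse, summarize_feed and fake_urlopen are hypothetical, and the opener parameter is just one assumed way of letting the caller pass in a stand-in for urlopen. The headers are built as an http.client.HTTPMessage so that header lookups stay case-insensitive.

```python
from http.client import HTTPMessage
from urllib.request import urlopen


class FakeResponse:
    """Duck-typed stand-in exposing only what the code under test uses."""

    def __init__(self, body, code=200, msg="OK"):
        self._body = body
        self.code = code
        self.msg = msg
        # HTTPMessage gives case-insensitive header access, like a real response.
        self.headers = HTTPMessage()
        self.headers["Content-Type"] = "text/xml; charset=utf-8"

    def read(self):
        return self._body

    def getcode(self):
        return self.code


def summarize_feed(url, opener=urlopen):
    """Hypothetical code under test; `opener` defaults to the real urlopen."""
    response = opener(url)
    return response.getcode(), response.headers["content-type"], response.read()


def fake_urlopen(request):
    # Returns canned data instead of touching the network.
    return FakeResponse(b"<feed/>")


status, content_type, body = summarize_feed(
    "http://example.com/feed", opener=fake_urlopen
)
```

In production code the opener argument is simply left out, so the real urlopen is used.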

Build a separate class or module responsible for communicating with your external feeds.
Make this class able to be replaced by a test double. You're using Python, so you're pretty golden there; if you were using C#, I'd suggest either an interface or virtual methods.
In your unit test, insert a test double of the external feed class. Test that your code uses the class correctly, assuming that the class does the work of communicating with your external resources correctly. Have your test double return fake data rather than live data; test various combinations of the data and of course the possible exceptions urllib2 could throw.
Aand... that's it.
You can't effectively automate unit tests that rely on external sources, so you're best off not doing it. Run an occasional integration test on your communication module, but don't include those tests as part of your automated tests.
Edit:
Just a note on the difference between my answer and @Crast's answer. Both are essentially correct, but they involve different approaches. In Crast's approach, you use a test double on the library itself. In my approach, you abstract the use of the library away into a separate module and use a test double for that module.
Which approach you use is entirely subjective; there's no "correct" answer there. I prefer my approach because it allows me to build more modular, flexible code, something I value. But it comes at a cost in terms of additional code to write, something that may not be valued in many agile situations.
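The wrapper-module approach could look roughly like the following Python 3 sketch. FeedClient, FakeFeedClient and count_entries are hypothetical names invented for illustration; the point is only that the code under test talks to a small class, and the test swaps that class for a double that returns canned data or raises a canned error.

```python
from urllib.error import URLError
from urllib.request import urlopen


class FeedClient:
    """Hypothetical wrapper: the only place that talks to the network."""

    def fetch(self, url):
        with urlopen(url) as response:
            return response.read()


class FakeFeedClient:
    """Test double: returns canned bytes or raises a canned error."""

    def __init__(self, body=None, error=None):
        self.body = body
        self.error = error
        self.requested = []  # records the URLs the code under test asked for

    def fetch(self, url):
        self.requested.append(url)
        if self.error is not None:
            raise self.error
        return self.body


def count_entries(client, url):
    """Hypothetical code under test: counts <entry> tags, 0 on network errors."""
    try:
        xml = client.fetch(url)
    except URLError:
        return 0
    return xml.count(b"<entry>")


ok = FakeFeedClient(body=b"<feed><entry></entry><entry></entry></feed>")
down = FakeFeedClient(error=URLError("connection refused"))
ok_count = count_entries(ok, "http://example.com/feed")
down_count = count_entries(down, "http://example.com/feed")
```

The real FeedClient is then covered separately by an occasional integration test rather than by the unit tests.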

You can use pymox to mock the behavior of anything and everything in the urllib2 (or any other) package. It's 2010, you shouldn't be writing your own mock classes.
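The answer recommends pymox. As a swapped-in illustration of the same idea, the sketch below uses unittest.mock from the Python 3 standard library instead (it is not pymox's API). read_feed is a hypothetical function under test.

```python
import urllib.request
from unittest import mock


def read_feed(url):
    """Hypothetical code under test."""
    response = urllib.request.urlopen(url)
    return response.getcode(), response.read()


# A canned response object; only the methods the code calls are configured.
fake_response = mock.Mock()
fake_response.getcode.return_value = 200
fake_response.read.return_value = b"<feed/>"

# Replace urlopen for the duration of the block; it is restored afterwards.
with mock.patch("urllib.request.urlopen", return_value=fake_response) as fake_urlopen:
    result = read_feed("http://example.com/feed")

# The mock also records how it was called.
fake_urlopen.assert_called_once_with("http://example.com/feed")
```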

I think the easiest thing to do is to actually create a simple web server in your unit test. When you start the test, create a new thread that listens on some arbitrary port and when a client connects just returns a known set of headers and XML, then terminates.
I can elaborate if you need more info.
Here's some code:
import threading, SocketServer, time
# a request handler
class SimpleRequestHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        data = self.request.recv(102400) # token receive
        senddata = file(self.server.datafile).read() # read data from unit test file
        self.request.send(senddata)
        time.sleep(0.1) # make sure it finishes receiving request before closing
        self.request.close()
def serve_data(datafile):
    server = SocketServer.TCPServer(('127.0.0.1', 12345), SimpleRequestHandler)
    server.datafile = datafile
    # pass the bound method itself (no parentheses) so it runs in the new thread
    http_server_thread = threading.Thread(target=server.handle_request)
    http_server_thread.start()
To run your unit test, call serve_data() then call your code that requests a URL that looks like http://localhost:12345/anythingyouwant.
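For comparison, here is a rough Python 3 sketch of the same throwaway-server idea using http.server instead of a raw socket handler. CannedHandler and CANNED_BODY are hypothetical names, and binding to port 0 (letting the OS pick a free port) is an assumption that replaces the fixed port 12345 used above.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener

CANNED_BODY = b"<feed><entry/></feed>"


class CannedHandler(BaseHTTPRequestHandler):
    """Answers every GET with the same headers and XML body."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/xml; charset=utf-8")
        self.send_header("Content-Length", str(len(CANNED_BODY)))
        self.end_headers()
        self.wfile.write(CANNED_BODY)

    def log_message(self, format, *args):
        pass  # keep test output quiet


server = HTTPServer(("127.0.0.1", 0), CannedHandler)  # port 0: OS picks a free port
port = server.server_address[1]
thread = threading.Thread(target=server.handle_request)  # serve exactly one request
thread.start()

# An opener with no proxies, so the loopback request is never sent to a proxy.
opener = build_opener(ProxyHandler({}))
with opener.open("http://127.0.0.1:%d/anythingyouwant" % port) as response:
    status = response.getcode()
    content_type = response.headers["Content-Type"]
    body = response.read()

thread.join()
server.server_close()
```

Because the request goes through the real HTTP stack, the code under test sees a genuine response object, headers included.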

Why not just mock a website that returns the response you expect? Start the server in a thread in setUp and kill it in tearDown. I ended up doing this for testing code that sends email, by mocking an SMTP server, and it works great. Surely something more trivial could be done for HTTP...
from smtpd import SMTPServer
from threading import Thread
from time import sleep
import asyncore
SMTP_PORT = 6544
class MockSMTPServer(SMTPServer):
    def __init__(self, localaddr, remoteaddr, cb = None):
        self.cb = cb
        SMTPServer.__init__(self, localaddr, remoteaddr)
    def process_message(self, peer, mailfrom, rcpttos, data):
        print (peer, mailfrom, rcpttos, data)
        if self.cb:
            self.cb(peer, mailfrom, rcpttos, data)
        self.close()
def start_smtp(cb, port=SMTP_PORT):
    def smtp_thread():
        _smtp = MockSMTPServer(("127.0.0.1", port), (None, 0), cb)
        asyncore.loop()
    return Thread(None, smtp_thread)
def test_stuff():
    #.......snip noise
    email_result = []  # a list, so the nested callback can record into it
    def email_back(*args):
        email_result.append(args)
    t = start_smtp(email_back)
    t.start()
    sleep(1)
    res.form["email"]= self.admin_email
    res = res.form.submit()
    assert res.status_int == 302,"should've redirected"
    sleep(1)
    assert email_result, "didn't get an email"

Trying to improve a bit on @john-la-rooy's answer, I've made a small class that allows simple mocking in unit tests.
It should work with both Python 2 and 3:
try:
    import urllib.request as urllib
except ImportError:
    import urllib2 as urllib
from io import BytesIO
class MockHTTPHandler(urllib.HTTPHandler):
    def mock_response(self, req):
        url = req.get_full_url()
        print("incoming request:", url)
        if url.endswith('.json'):
            resdata = b'[{"hello": "world"}]'
            headers = {'Content-Type': 'application/json'}
            resp = urllib.addinfourl(BytesIO(resdata), headers, url, 200)
            resp.msg = "OK"
            return resp
        raise RuntimeError('Unhandled URL', url)
    http_open = mock_response
    @classmethod
    def install(cls):
        previous = urllib._opener
        urllib.install_opener(urllib.build_opener(cls))
        return previous
    @classmethod
    def remove(cls, previous=None):
        urllib.install_opener(previous)
Used like this:
import unittest
class TestOther(unittest.TestCase):
    def setUp(self):
        previous = MockHTTPHandler.install()
        self.addCleanup(MockHTTPHandler.remove, previous)

Related

Python3 : Records not getting pushed to Splunk

I have created a custom class which pushes my logs to Splunk, but somehow it is not working. Here is the class:
class Splunk(logging.StreamHandler):
    def __init__(self, url, token):
        super().__init__()
        self.url = url
        self.headers = {f'Authorization': f'Splunk {token}'}
        self.propagate = False
    def emit(self, record):
        mydata = dict()
        mydata['sourcetype'] = 'mysourcetype'
        mydata['event'] = record.__dict__
        response = requests.post(self.url, data=json.dumps(mydata), headers=self.headers)
        return response
I call the class from my logger class roughly like this (adding an additional handler), so that it logs to the console as well as sending to Splunk:
if splunk_config is not None:
    splunk_handler = Splunk(splunk_config["url"], splunk_config["token"])
    self.default_logger.addHandler(splunk_handler)
But somehow I am not able to see any logs in Splunk, though I can see the logs in the console.
When I run a stripped-down version of the above logic from a python3 terminal, it is successful:
import requests
import json
url = 'myurl'
token = 'mytoken'
headers = {'Authorization': 'Splunk mytoken'}
propagate = False
mydata = dict()
mydata['sourcetype'] = 'mysourcetype'
mydata['event'] = {'name': 'root', 'msg': 'this is a sample message'}
response = requests.post(url, data=json.dumps(mydata), headers=headers)
print(response.text)
Things I have already tried: making my dictionary data JSON serializable using the link below, but it didn't help.
https://pynative.com/make-python-class-json-serializable/
Any other things to try ?

I've successfully used this Python Class for Sending Events to Splunk HTTP Event Collector instead of writing a dedicated class
https://github.com/georgestarcher/Splunk-Class-httpevent
Advantage is that it implements batchEvent() and flushBatch() methods to submit multiple events at once across multiple threads.
The example here should get you started:
https://github.com/georgestarcher/Splunk-Class-httpevent/blob/master/example.py
If this answers your question, take a moment to accept the answer. This can be done by clicking on the check mark beside the answer to toggle it from greyed out to filled in!
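The question does not establish the cause, so the following is only one thing to rule out. The handler passes record.__dict__ to json.dumps, and a LogRecord can carry values that JSON cannot encode (for example exception info or arbitrary objects in args), whereas the working terminal test sends a plain dict. The sketch below builds the event from a few selected fields instead. build_hec_payload is a hypothetical helper, the chosen fields are an assumption, and no network call is made.

```python
import json
import logging


def build_hec_payload(record, sourcetype="mysourcetype"):
    """Hypothetical helper: turn a LogRecord into a JSON string."""
    event = {
        "name": record.name,
        "level": record.levelname,
        "msg": record.getMessage(),  # applies %-style args to the message
    }
    # default=str is a fallback for any value json cannot encode natively.
    return json.dumps({"sourcetype": sourcetype, "event": event}, default=str)


# Build a record by hand so the helper can be exercised without a logger.
record = logging.LogRecord(
    name="root",
    level=logging.INFO,
    pathname="example.py",
    lineno=1,
    msg="this is a sample message from %s",
    args=("tester",),
    exc_info=None,
)
payload = build_hec_payload(record)
decoded = json.loads(payload)
```

Inside emit(), the resulting string would take the place of json.dumps(mydata).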

Create SoapRequest without sending them with Suds/Python

Is there any way to get suds to return the SoapRequest (in XML) without sending it?
The idea is that the upper levels of my program can call my API with an additional boolean argument (simulation).
If simulation == false then process the other params and send the request via suds
If simulation == true then process the other params, create the XML using suds (or any other way) and return it to the caller without sending it to the host.
I already implemented a MessagePlugin following https://fedorahosted.org/suds/wiki/Documentation#MessagePlugin, but I am not able to get the XML, stop the request and send the XML back to the caller...
Regards

suds uses a "transport" class called HttpAuthenticated by default. That is where the actual send occurs. So theoretically you could try subclassing that:
from suds.client import Client
from suds.transport import Reply
from suds.transport.https import HttpAuthenticated
class HttpAuthenticatedWithSimulation(HttpAuthenticated):
    def send(self, request):
        is_simulation = request.headers.pop('simulation', False)
        if is_simulation:
            # don't actually send the SOAP request, just return its XML
            return Reply(200, request.headers, request.message)
        # otherwise defer to the normal transport
        return HttpAuthenticated.send(self, request)
...
sim_transport = HttpAuthenticatedWithSimulation()
client = Client(url, transport=sim_transport,
                headers={'simulation': is_simulation})
It's a little hacky. (For example, this relies on HTTP headers to pass the boolean simulation option down to the transport level.) But I hope this illustrates the idea.

The solution that I implemented is:
from suds.client import Client
from suds.transport import Reply
from suds.transport.http import HttpTransport
class CustomTransportClass(HttpTransport):
    def __init__(self, *args, **kwargs):
        HttpTransport.__init__(self, *args, **kwargs)
        self.opener = MutualSSLHandler() # I use a special opener to enable a mutual SSL authentication
    def send(self,request):
        print "===================== 1-* request is going ===================="
        is_simulation = request.headers['simulation']
        if is_simulation == "true":
            # don't actually send the SOAP request, just return its XML
            print "This is a simulation :"
            print request.message
            return Reply(200, request.headers, request.message )
        return HttpTransport.send(self,request)
sim_transport = CustomTransportClass()
client = Client(url, transport=sim_transport,
                headers={'simulation': is_simulation})
Thanks for your help,

How to retain cookies for xmlrpc.client in Python 3?

The default Python xmlrpc.client.Transport (can be used with xmlrpc.client.ServerProxy) does not retain cookies, which are sometimes needed for cookie based logins.
For example, the following proxy, when used with the TapaTalk API (for which the login method uses cookies for authentication), will give a permission error when trying to modify posts.
proxy = xmlrpc.client.ServerProxy(URL, xmlrpc.client.Transport())
There are some solutions for Python 2 on the net, but they aren't compatible with Python 3.
How can I use a Transport that retains cookies?

The existing answer from GermainZ works only for HTTP. After a lot of time fighting with it, here is an HTTPS adaptation. Note the context option, which is crucial.
class CookiesTransport(xmlrpc.client.SafeTransport):
    """A SafeTransport (HTTPS) subclass that retains cookies over its lifetime."""
    # Note context option - it's required for success
    def __init__(self, context=None):
        super().__init__(context=context)
        self._cookies = []
    def send_headers(self, connection, headers):
        if self._cookies:
            connection.putheader("Cookie", "; ".join(self._cookies))
        super().send_headers(connection, headers)
    def parse_response(self, response):
        # This check is required if in some responses we receive no cookies at all
        if response.msg.get_all("Set-Cookie"):
            for header in response.msg.get_all("Set-Cookie"):
                cookie = header.split(";", 1)[0]
                self._cookies.append(cookie)
        return super().parse_response(response)
The reason is that ServerProxy ignores its own context option when a transport is specified, so the context has to be passed directly to the Transport constructor.
Usage:
import xmlrpc.client
import ssl
transport = CookiesTransport(context=ssl._create_unverified_context())
# Note the closing slash in address as well, very important
server = xmlrpc.client.ServerProxy("https://<api_link>/", transport=transport)
# do stuff with server
server.myApiFunc({'param1': 'x', 'param2': 'y'})

This is a simple Transport subclass that will retain all cookies:
class CookiesTransport(xmlrpc.client.Transport):
    """A Transport subclass that retains cookies over its lifetime."""
    def __init__(self):
        super().__init__()
        self._cookies = []
    def send_headers(self, connection, headers):
        if self._cookies:
            connection.putheader("Cookie", "; ".join(self._cookies))
        super().send_headers(connection, headers)
    def parse_response(self, response):
        # get_all() returns None when the header is absent, so default to an empty list
        for header in response.msg.get_all("Set-Cookie", []):
            cookie = header.split(";", 1)[0]
            self._cookies.append(cookie)
        return super().parse_response(response)
Usage:
proxy = xmlrpc.client.ServerProxy(URL, CookiesTransport())
Since xmlrpc.client in Python 3 has better suited hooks for this, it's much simpler than an equivalent Python 2 version.

Using web.py return a post request and do further processing after the response

I am using web.py to return a protocol buffer response from a POST request, and response time is critical. I have some writes to redis that I would like to do after sending the response rather than before.
r = redis.StrictRedis(host='localhost', port=6379, db=0)
class index:
    def POST(self):
        return pPbuffer
        r.set('a','b')
So, how can I modify the code so that I can return as quickly as possible but still do the post cleanup (no pun intended)?
Thanks

If you are running under WSGI or a similar server, you could use yield to produce the response in pieces; the client receives them in order, and any code after the yield runs when the server resumes the generator.
For your example:
class index:
    def POST(self):
        yield pPbuffer
        r.set('a','b')
And this is a good example which is doing it this way.
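The ordering this relies on can be seen with a plain generator, without web.py or redis. In the sketch below, post_handler and the events list are hypothetical stand-ins for the POST method and the redis write. Whether the client really receives the first chunk before the trailing work finishes depends on how the server buffers output, which this sketch does not show.

```python
events = []


def post_handler():
    """Stand-in for the POST method above, written as a generator."""
    yield b"protobuf-bytes"       # handed to the server first
    events.append("redis write")  # runs only when the server asks for more


body = post_handler()
first_chunk = next(body)                 # the server pulls the response chunk
events_after_first_chunk = list(events)  # the trailing work has not run yet
remaining = list(body)                   # the server exhausts the iterator
```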

custom methods in python urllib2

Using urllib2, are we able to use a method other than 'GET' or 'POST' (when data is provided)?
I dug into the library and it seems that the decision to use GET or POST is 'conveniently' tied to whether or not data is provided in the request.
For example, I want to interact with a CouchDB database which requires methods such as 'DEL', 'PUT'. I want the handlers of urllib2, but need to make my own method calls.
I WOULD PREFER NOT to import 3rd party modules into my project, such as the CouchDB python api. So lets please not go down that road. My implementation must use the modules that ship with python 2.6. (My design spec requires the use of a barebones PortablePython distribution). I would write my own interface using httplib before importing external modules.
Thanks so much for the help

You could subclass urllib2.Request like so (untested)
import urllib2
class MyRequest(urllib2.Request):
    # HTTP method names are case-sensitive, so use the standard uppercase forms
    GET = 'GET'
    POST = 'POST'
    PUT = 'PUT'
    DELETE = 'DELETE'
    def __init__(self, url, data=None, headers={},
                 origin_req_host=None, unverifiable=False, method=None):
        urllib2.Request.__init__(self, url, data, headers, origin_req_host, unverifiable)
        self.method = method
    def get_method(self):
        if self.method:
            return self.method
        return urllib2.Request.get_method(self)
opener = urllib2.build_opener(urllib2.HTTPHandler)
req = MyRequest('http://yourwebsite.com/put/resource/', method=MyRequest.PUT)
resp = opener.open(req)
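The question is about Python 2.6, but for readers on Python 3 the same subclassing idea can be sketched against urllib.request. MethodRequest and its http_method keyword are hypothetical names. Note that urllib.request.Request in Python 3.3 and later also accepts a method argument directly, as the last line shows. No request is actually sent here.

```python
from urllib.request import Request


class MethodRequest(Request):
    """Hypothetical Python 3 port of the subclass above."""

    def __init__(self, *args, http_method=None, **kwargs):
        super().__init__(*args, **kwargs)
        self._http_method = http_method

    def get_method(self):
        # Fall back to the stock GET/POST choice when no override was given.
        return self._http_method or super().get_method()


put_req = MethodRequest("http://example.com/db/doc", data=b"{}", http_method="PUT")
plain_req = MethodRequest("http://example.com/db/doc")

# Python 3.3+ supports the same thing without a subclass.
builtin_req = Request("http://example.com/db/doc", method="DELETE")
```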

It could be:
import urllib2
method = 'PATCH'
request = urllib2.Request('http://host.com')
# method is a plain string, so return it rather than calling it
request.get_method = lambda: method
That is, a runtime modification of the request object, a.k.a. a monkey patch.
