Python 3.x WSGI Testing framework

Given that webtest doesn't seem to have a 3.x version (or any plans to develop one), are there any solutions for automated system testing of a WSGI application? I know unittest for unit testing - I'm more interested at the moment in whole-system tests.
I'm not looking for tools to help develop an application - just test it.

In case anyone else comes upon this, I ended up writing a solution myself. Here's a very simple class I use - I just inherit from WSGIBaseTest instead of TestCase, and get a method self.request() that I can pass requests into. It stores cookies, and will automatically send them into the application on later requests (until self.new_session() is called).
import unittest
from wsgiref import util
import io


class WSGIBaseTest(unittest.TestCase):
    '''Base class for unit tests. Provides a simple interface to make requests
    as though they came through a WSGI interface from a user.'''

    def setUp(self):
        '''Set up a fresh testing environment before each test.'''
        self.cookies = []

    def request(self, application, url, post_data=None):
        '''Hand a request to the application as if sent by a client.

        @param application: The callable WSGI application to test.
        @param url: The URL to make the request against.
        @param post_data: A string of POST data, or None for a GET request.'''
        self.response_started = False
        body = post_data if post_data is not None else ''
        temp = io.StringIO(body)
        environ = {
            'PATH_INFO': url,
            'REQUEST_METHOD': 'POST' if post_data else 'GET',
            'CONTENT_LENGTH': str(len(body)),
            'wsgi.input': temp,
        }
        util.setup_testing_defaults(environ)
        if self.cookies:
            environ['HTTP_COOKIE'] = ';'.join(self.cookies)
        self.response = ''
        for ret in application(environ, self._start_response):
            assert self.response_started
            self.response += str(ret)
        temp.close()
        return self.response

    def _start_response(self, status, headers):
        '''A callback passed into the application, to simulate a WSGI
        environment.

        @param status: The response status of the application ("200", "404", etc.)
        @param headers: Any headers to begin the response with.'''
        assert not self.response_started
        self.response_started = True
        self.status = status
        self.headers = headers
        for header in headers:
            # Parse out any cookies and save them to send with later requests.
            if header[0] == 'Set-Cookie':
                var = header[1].split(';', 1)
                if len(var) > 1 and var[1][0:9] == ' Max-Age=':
                    if int(var[1][9:]) > 0:
                        # An approximation, since our cookies never expire unless
                        # explicitly deleted (by setting Max-Age=0).
                        self.cookies.append(var[0])
                    else:
                        index = self.cookies.index(var[0])
                        self.cookies.pop(index)

    def new_session(self):
        '''Start a new session (or pretend to be a different user) by deleting
        all current cookies.'''
        self.cookies = []
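For illustration, here is a hedged sketch of how I use it; application stands in for whatever WSGI callable is under test, and the URLs, POST payload, and asserted content are made up:
class TestMyApp(WSGIBaseTest):
    '''Example system tests built on WSGIBaseTest (names and URLs are hypothetical).'''

    def test_front_page(self):
        body = self.request(application, '/')
        # _start_response stored the status and headers on the test case.
        self.assertTrue(self.status.startswith('200'))

    def test_login_then_profile(self):
        # The cookie set by /login is replayed automatically on the next request.
        self.request(application, '/login', post_data='user=alice&password=secret')
        body = self.request(application, '/profile')
        self.assertIn('alice', body)

    def test_profile_requires_login(self):
        # Pretend to be a brand-new visitor by dropping all stored cookies.
        self.new_session()
        self.request(application, '/profile')
        self.assertTrue(self.status.startswith('302'))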

Related

Python3 : Records not getting pushed to Splunk

I have created a custom class which pushes my logs to Splunk, but somehow it is not working. Here is the class:
import json
import logging

import requests


class Splunk(logging.StreamHandler):
    def __init__(self, url, token):
        super().__init__()
        self.url = url
        self.headers = {'Authorization': f'Splunk {token}'}
        self.propagate = False

    def emit(self, record):
        mydata = dict()
        mydata['sourcetype'] = 'mysourcetype'
        mydata['event'] = record.__dict__
        response = requests.post(self.url, data=json.dumps(mydata), headers=self.headers)
        return response
I call the class from my logger class, roughly like this (adding an additional handler), so that it logs to the console as well as sending to Splunk:
if splunk_config is not None:
    splunk_handler = Splunk(splunk_config["url"], splunk_config["token"])
    self.default_logger.addHandler(splunk_handler)
But somehow I am not able to see any logs in Splunk, though I can see the logs in the console.
When I run a stripped-down version of the above logic from the python3 terminal, it succeeds:
import requests
import json
url = 'myurl'
token = 'mytoken'
headers = {'Authorization': 'Splunk mytoken'}
propagate = False
mydata = dict()
mydata['sourcetype'] = 'mysourcetype'
mydata['event'] = {'name': 'root', 'msg': 'this is a sample message'}
response = requests.post(url, data=json.dumps(mydata), headers=headers)
print(response.text)
Things I have already tried: making my dictionary data JSON serializable using the approach from the link below, but it didn't help.
https://pynative.com/make-python-class-json-serializable/
Any other things to try?
I've successfully used this Python Class for Sending Events to Splunk HTTP Event Collector instead of writing a dedicated class
https://github.com/georgestarcher/Splunk-Class-httpevent
The advantage is that it implements batchEvent() and flushBatch() methods to submit multiple events at once across multiple threads.
The example here should get you started:
https://github.com/georgestarcher/Splunk-Class-httpevent/blob/master/example.py
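For reference, a hedged sketch of the batching usage described above; the import path, token, and host below are assumptions on my part, so treat the repo's example.py as authoritative:
# Hedged sketch based on the repo's example; module name, token and host are assumed.
from splunk_http_event_collector import http_event_collector

hec = http_event_collector('mytoken', 'splunk.example.com')

payload = {
    'sourcetype': 'mysourcetype',
    'event': {'name': 'root', 'msg': 'this is a sample message'},
}

hec.batchEvent(payload)   # queue the event (batched, thread-friendly)
hec.flushBatch()          # push everything queued so far to the HEC endpoint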
If this answers your question, take a moment to accept the answer. This can be done by clicking on the check mark beside the answer to toggle it from greyed out to filled in!

bottle : how to set a cookie inside a python decorator?

There are some operations that need to be done before running some routes. For example:
check if we recognise the user,
check the language,
check the location,
set variables in the navbar (hereafter named header) of the HTML,
and so on; then make decisions based on the outcome and lastly run the requested route.
I find it hard to use response.set_cookie("cookie_name", actual_cookie) inside a decorator. It seems Flask has a make_response helper that works well (see Stack Overflow question 34543157: Python Flask - Setting a cookie using a decorator), but I find it difficult to reproduce the same thing with bottle.
Anyhow, here is my attempt, which is not working:
#python3
#/decorator_cookie.py
from bottle import request, response, redirect
from other_module import datamodel, db_pointer, secret_value  # custom_module
import json

cookie_value = None
surfer_email_exist_in_db = None
header = None
db_pointer = instanciation_of_a_db_connexion_to_tables
surfer = db_pointer.get(request.get_cookie('surfer')) if db_pointer.get(request.get_cookie('surfer')) != None else "empty"


def set_header(func):
    def header_manager():
        global cookie_value, surfer_email_exist_in_db, header, db_pointer
        cookie_value = True  # for stack-overflow question convenience
        surfer_email_exist_in_db = True  # for stack-overflow question convenience
        if not all([cookie_value, surfer_email_exist_in_db]):
            redirect('/login')
        else:
            header = json.dumps(db_pointer.get('header_fr'))
            response.set_cookie("header", header, secret=secret_value, path="/", httponly=True)
        return func()
    return header_manager
And the main file where the routing goes:
#python3
#/main.py
from bottle import route, request, template
from decorator_cookie import set_header
from other_module import secret_value


@route('/lets_try')
@set_header
def lets_try():
    header = request.get_cookie('header', secret=secret_value)
    print(header)  # here I get None
    return template('lets_try.tpl', headers=header)
I also tried to set the cookie like that:
make_response = response(func).set_cookie("header", header, secret = secret_value, path = "/", httponly = True)
But got an error :)
Here is the response doc : Response documentation
Do you have any clues ?
Thanks
There is no issue with your code; what you are missing is an understanding of how the cookie round trip works:
Request 1 [by browser, no cookies] -> request has no cookies -> response: you add the Set-Cookie header
Request 2 [by browser, with cookie] -> request carries the cookie header -> response
So for your first request request.get_cookie will return None, but for your second request it will actually return the value.
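To make the two-request cycle concrete, here is a minimal self-contained bottle sketch (route name and cookie values are made up): the first visit prints None, and a refresh prints the value the browser sends back.
# Minimal demo of the cookie round trip described above (not the asker's app).
from bottle import route, request, response, run

@route('/visit')
def visit():
    seen = request.get_cookie('seen')              # None on the very first request
    response.set_cookie('seen', 'yes', path='/')   # only visible on the *next* request
    return 'cookie received on this request: {}'.format(seen)

if __name__ == '__main__':
    run(host='localhost', port=8080)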

Is there a REST API reverse proxy library to inject request headers?

I'm setting up a small Python service to act as a REST API reverse proxy, and I'm hoping there are some libraries available to help speed this process up.
I need to be able to run a function that calculates a value to inject as a request header when the request is proxied through to the backend.
As it stands, I have a simple script that computes the value, injects it into an Nginx config file, and then forces an Nginx hot reload via signals, but I'm trying to remove this dependency for what should be a fairly simple task.
Would a good approach be to use falcon as the listener and combine it with another approach to inject and forward requests?
Thanks for reading.
Edit: I've been reading https://aiohttp.readthedocs.io/en/stable/ as it seems to be the right direction.
Thanks to someone over at falcon, this is now the accepted answer!
import io

import falcon
import requests


class Proxy(object):
    UPSTREAM = 'https://httpbin.org'

    def __init__(self):
        self.session = requests.Session()

    def handle(self, req, resp):
        headers = dict(req.headers, Via='Falcon')
        for name in ('HOST', 'CONNECTION', 'REFERER'):
            headers.pop(name, None)
        request = requests.Request(req.method, self.UPSTREAM + req.path,
                                   data=req.bounded_stream.read(),
                                   headers=headers)
        prepared = request.prepare()
        from_upstream = self.session.send(prepared, stream=True)
        resp.content_type = from_upstream.headers.get('Content-Type',
                                                      falcon.MEDIA_HTML)
        resp.status = falcon.get_http_status(from_upstream.status_code)
        resp.stream = from_upstream.iter_content(io.DEFAULT_BUFFER_SIZE)


api = falcon.API()
api.add_sink(Proxy().handle)
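A quick note on wiring this up: the computed request header the question asks about can be added right next to the Via='Falcon' entry inside handle() (e.g. headers['X-Custom-Token'] = my_function(req), both names hypothetical), and since api is a plain WSGI app it can be served locally like this (a sketch, assuming the code above lives in the same module):
# Serve the proxy locally for testing; any WSGI server (gunicorn, uWSGI, ...) works too.
from wsgiref.simple_server import make_server

if __name__ == '__main__':
    make_server('127.0.0.1', 8000, api).serve_forever()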

Create SoapRequest without sending it with Suds/Python

Is there any way to get suds to return the SOAP request (in XML) without sending it?
The idea is that the upper levels of my program can call my API with an additional boolean argument (simulation).
If simulation == false, then process the other params and send the request via suds.
If simulation == true, then process the other params, create the XML using suds (or any other way), and return it to the caller without sending it to the host.
I already implemented a MessagePlugin following https://fedorahosted.org/suds/wiki/Documentation#MessagePlugin, but I am not able to get the XML, stop the request, and send the XML back to the caller...
Regards
suds uses a "transport" class called HttpAuthenticated by default. That is where the actual send occurs. So theoretically you could try subclassing that:
from suds.client import Client
from suds.transport import Reply
from suds.transport.https import HttpAuthenticated


class HttpAuthenticatedWithSimulation(HttpAuthenticated):
    def send(self, request):
        is_simulation = request.headers.pop('simulation', False)
        if is_simulation:
            # don't actually send the SOAP request, just return its XML
            return Reply(200, request.headers, request.message)
        return HttpAuthenticated.send(self, request)

...

sim_transport = HttpAuthenticatedWithSimulation()
client = Client(url, transport=sim_transport,
                headers={'simulation': is_simulation})
It's a little hacky. (For example, this relies on HTTP headers to pass the boolean simulation option down to the transport level.) But I hope this illustrates the idea.
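For completeness, a hedged sketch of how that transport might be driven from the caller's side; url and SomeOperation are placeholders, and client.set_options() is used to flip the header between calls. Note that suds will still try to parse whatever Reply the transport returns, so the retxml option (or catching the parse error) may be needed to actually see the raw XML.
# Usage sketch only; operation name and url are placeholders.
sim_transport = HttpAuthenticatedWithSimulation()
client = Client(url, transport=sim_transport)

client.set_options(headers={'simulation': True})
xml_only = client.service.SomeOperation('some', 'params')     # transport returns the request XML

client.set_options(headers={'simulation': False})
real_result = client.service.SomeOperation('some', 'params')  # actually sent this time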
The solution that I implemented is:
from suds.client import Client
from suds.transport import Reply
from suds.transport.http import HttpTransport


class CustomTransportClass(HttpTransport):
    def __init__(self, *args, **kwargs):
        HttpTransport.__init__(self, *args, **kwargs)
        self.opener = MutualSSLHandler()  # I use a special opener to enable mutual SSL authentication

    def send(self, request):
        print "===================== 1-* request is going ===================="
        is_simulation = request.headers['simulation']
        if is_simulation == "true":
            # don't actually send the SOAP request, just return its XML
            print "This is a simulation :"
            print request.message
            return Reply(200, request.headers, request.message)
        return HttpTransport.send(self, request)


sim_transport = CustomTransportClass()
client = Client(url, transport=sim_transport,
                headers={'simulation': is_simulation})
Thanks for your help,

Python Scrapy - mimetype based filter to avoid non-text file downloads

I have a running scrapy project, but it is bandwidth intensive because it tries to download a lot of binary files (zip, tar, mp3, etc.).
I think the best solution is to filter the requests based on the mimetype (Content-Type:) HTTP header. I looked at the scrapy code and found this setting:
DOWNLOADER_HTTPCLIENTFACTORY = 'scrapy.core.downloader.webclient.ScrapyHTTPClientFactory'
I changed it to:
DOWNLOADER_HTTPCLIENTFACTORY = 'myproject.webclients.ScrapyHTTPClientFactory'
And I played a little with ScrapyHTTPPageGetter; here are my edits:
class ScrapyHTTPPageGetter(HTTPClient):
    # this is my edit
    def handleEndHeaders(self):
        if 'Content-Type' in self.headers.keys():
            mimetype = str(self.headers['Content-Type'])
            # Actually I need only the html, but just in
            # case I've preserved all the text
            if mimetype.find('text/') > -1:
                # Good, this page is needed
                self.factory.gotHeaders(self.headers)
            else:
                self.factory.noPage(Exception('Incorrect Content-Type'))
I feel this is wrong; I need a more Scrapy-friendly way to cancel/drop the request right after determining that it's an unwanted mimetype, instead of waiting for the whole body to be downloaded.
Edit:
I'm asking specifically about this part: self.factory.noPage(Exception('Incorrect Content-Type')). Is that the correct way to cancel a request?
Update 1:
My current setup has crashed the Scrapy server, so please don't try to use the code above to solve the problem.
Update 2:
I have setup an Apache-based website for testing using the following structure:
/var/www/scrapper-test/Zend -> /var/www/scrapper-test/Zend.zip (symlink)
/var/www/scrapper-test/Zend.zip
I have noticed that Scrapy discards the one with the .zip extension, but scrapes the one without .zip even though it's just a symbolic link to the same file.
I built this Middleware to exclude any response type that isn't in a whitelist of regular expressions:
from scrapy.http.response.html import HtmlResponse
from scrapy.exceptions import IgnoreRequest
from scrapy import log
import re


class FilterResponses(object):
    """Limit the HTTP response types that Scrapy downloads."""

    @staticmethod
    def is_valid_response(type_whitelist, content_type_header):
        for type_regex in type_whitelist:
            if re.search(type_regex, content_type_header):
                return True
        return False

    def process_response(self, request, response, spider):
        """
        Only allow HTTP response types that match the given list of
        filtering regexes.
        """
        # each spider must define the variable response_type_whitelist as an
        # iterable of regular expressions. ex. (r'text', )
        type_whitelist = getattr(spider, "response_type_whitelist", None)
        content_type_header = response.headers.get('content-type', None)
        if not type_whitelist:
            return response
        elif not content_type_header:
            log.msg("no content type header: {}".format(response.url), level=log.DEBUG, spider=spider)
            raise IgnoreRequest()
        elif self.is_valid_response(type_whitelist, content_type_header):
            log.msg("valid response {}".format(response.url), level=log.DEBUG, spider=spider)
            return response
        else:
            msg = "Ignoring request {}, content-type was not in whitelist".format(response.url)
            log.msg(msg, level=log.DEBUG, spider=spider)
            raise IgnoreRequest()
To use it, add it to settings.py:
DOWNLOADER_MIDDLEWARES = {
    '[project_name].middlewares.FilterResponses': 999,
}
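And as the comment in process_response() notes, each spider then declares its own whitelist. A minimal sketch (spider name and URL are placeholders, assuming a Scrapy version where spiders subclass scrapy.Spider):
import scrapy

class TextOnlySpider(scrapy.Spider):
    name = 'text_only'                      # hypothetical spider
    start_urls = ['http://example.com/']
    # Read by FilterResponses: only responses whose Content-Type matches
    # one of these regexes are passed through.
    response_type_whitelist = (r'text/', )

    def parse(self, response):
        self.logger.debug('got %s', response.url)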
Maybe it is too late, but you can use the Accept header to filter the data that you are looking for.
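For example, a minimal sketch of that idea in settings.py (note that servers are free to ignore the Accept header, so this is a hint rather than a guarantee):
# settings.py - advertise that the spider only wants textual responses.
DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml;q=0.9,text/plain;q=0.8',
}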
The solution is to set up a Node.js proxy and configure Scrapy to use it through the http_proxy environment variable.
What the proxy should do is:
Take HTTP requests from Scrapy and send them to the server being crawled, then give the response back to Scrapy, i.e. intercept all HTTP traffic.
For binary files (based on a heuristic you implement), send a 403 Forbidden error to Scrapy and immediately close the request/response. This saves time and traffic, and Scrapy won't crash.
Sample Proxy Code
That actually works!
var http = require('http');

http.createServer(function(clientReq, clientRes) {
    var options = {
        host: clientReq.headers['host'],
        port: 80,
        path: clientReq.url,
        method: clientReq.method,
        headers: clientReq.headers
    };

    var fullUrl = clientReq.headers['host'] + clientReq.url;

    var proxyReq = http.request(options, function(proxyRes) {
        var contentType = proxyRes.headers['content-type'] || '';
        if (!contentType.startsWith('text/')) {
            // Binary content: refuse it so Scrapy gives up on this URL quickly.
            proxyRes.destroy();
            var httpForbidden = 403;
            clientRes.writeHead(httpForbidden);
            clientRes.write('Binary download is disabled.');
            clientRes.end();
            return;
        }
        clientRes.writeHead(proxyRes.statusCode, proxyRes.headers);
        proxyRes.pipe(clientRes);
    });

    proxyReq.on('error', function(e) {
        console.log('problem with clientReq: ' + e.message);
    });

    proxyReq.end();
}).listen(8080);
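On the Scrapy side, you can point the crawler at this proxy either by exporting http_proxy=http://localhost:8080 before running the crawl (Scrapy's built-in HttpProxyMiddleware reads it) or per request via request.meta, as in this sketch (spider name and URL are placeholders):
# Per-request proxy selection; HttpProxyMiddleware honours request.meta['proxy'].
import scrapy

class ProxiedSpider(scrapy.Spider):
    name = 'proxied'                        # hypothetical spider
    start_urls = ['http://example.com/']

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, meta={'proxy': 'http://localhost:8080'})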
