how to unit test a REST client for an unreliable server? - python

I'm making a Python-based REST client for a 3rd party service that's still under development. The issue is how to test/verify that the client will work under ALL kinds of scenarios, including incorrect responses.
The client uses the Requests library to make the remote REST calls (mostly GET and POST). And for unit testing, I'm thinking of employing the HTTPretty module to simulate/mock the server responses.
The problem is how to deal with the sheer number of possible test cases. Consider the following made-up API:
REQUEST (GET) = http://example.com/new_api?param1=34&param2=hello
RESPONSE = {"value1":34,"value2":"a string"}
I find myself needing to write unit test cases for the following scenarios -
client sending correct number of parameters
client sending incorrect parameter values
client missing a parameter
server's correct responses for above scenarios
server not sending back all the required values
server mixing up value parameters (returning a string instead of a number)
server sending back HTML instead of JSON
... etc
The intent behind all this extensive testing is to help identify where an error could originate from. i.e. is it my client that's having an issue, or the 3rd party server?
Does anyone know of a good way to organize a Python test suite to accommodate these scenarios? Writing unit test functions feels like it will become a never-ending task... :(

That's the goal of unit testing: to test all the cases where you think you might have to handle errors. On the other hand, you do not need to test things which are already handled naturally by the system.
Note that HTTP is an application-level protocol where the client always initiates the request and the server just responds. What I mean by this is that because you are developing the client, you are not responsible for the server's responses. Your goal is just to send the appropriate requests.
On the other hand, there are HTTP responses which might trigger behaviors on the client side. These you want to test. For example, the server answers a 301, and you want to test if your client does the right thing by initiating the next request, grabbing the Location: HTTP header value.
In the case of a REST API (aka hypertext driven), your client will parse the content of the HTTP responses, specifically the set of links and/or their associated rel values. Based on these values, the client may make decisions or expose possible choices to the users. These you have to test.
If the server doesn't give you, inside the HTTP response, the information needed to continue your exploration on the client side, then it's not a REST API but a perfectly valid HTTP API. Simple as that. It becomes even easier to test. Nothing much to do.
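To keep the suite from becoming never-ending, the HTTPretty approach mentioned in the question combines well with parametrized tests. A minimal sketch, assuming pytest and a hypothetical MyClient wrapper around requests that raises a single ClientError for anything it cannot handle:

import httpretty
import pytest
from myclient import MyClient, ClientError  # hypothetical client under test

BAD_RESPONSES = [
    # (status, body, content_type) combinations a misbehaving server might send back
    (200, '{"value1": "not a number", "value2": "a string"}', "application/json"),  # wrong type
    (200, '{"value1": 34}', "application/json"),                                    # missing value2
    (200, "<html><body>oops</body></html>", "text/html"),                           # HTML instead of JSON
    (500, "internal error", "text/plain"),                                          # server error
]

@pytest.mark.parametrize("status,body,content_type", BAD_RESPONSES)
def test_client_rejects_bad_server_responses(status, body, content_type):
    httpretty.enable()
    try:
        httpretty.register_uri(httpretty.GET, "http://example.com/new_api",
                               body=body, status=status, content_type=content_type)
        client = MyClient(base_url="http://example.com")
        # A single well-defined exception type makes it obvious the failure was
        # detected and reported by the client, rather than being a client bug.
        with pytest.raises(ClientError):
            client.get_values(param1=34, param2="hello")
    finally:
        httpretty.disable()
        httpretty.reset()

Adding a new "bad server" scenario then means adding one tuple to the list rather than writing a new test function.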

Related

How to interpose RabbitMQ between REST client and (Python) REST server?

Suppose I develop a REST service hosted in Apache with a Python plugin that services GET, PUT, DELETE, and PATCH, and this service is consumed by an Angular client (or other REST-interacting browser technology). How do I make it scalable with RabbitMQ (AMQP)?
Potential Solution #1
Multiple Apache instances still face off against the browser's HTTP calls.
Each Apache instance uses an AMQP plugin and posts a message to a queue
Python microservices monitor a queue, pull a message, service it, and return a response (see the sketch after this list)
The response is passed back to the Apache plugin, which in turn generates the HTTP response
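For illustration, a rough sketch (pika 1.x and made-up queue names assumed) of what the queue-consuming microservice could look like: it has no HTTP code of its own, it only consumes AMQP messages and publishes a reply that the Apache-side plugin turns back into an HTTP response.

import json
import pika

def handle(payload):
    # stand-in for whatever the service actually does with the request body
    return {"ok": True, "echo": payload}

def on_request(channel, method, props, body):
    result = handle(json.loads(body))
    channel.basic_publish(
        exchange="",
        routing_key=props.reply_to,                        # reply queue named by the caller
        properties=pika.BasicProperties(correlation_id=props.correlation_id),
        body=json.dumps(result),
    )
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="rest_requests")
channel.basic_qos(prefetch_count=1)                        # one message per worker at a time
channel.basic_consume(queue="rest_requests", on_message_callback=on_request)
channel.start_consuming()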
Does this mean the Python microservice no longer has any HTTP server code at all? That would change the component a lot. Perhaps it's best to decide upfront if you want to use this pattern, as it seems it would be a real task to rip out all the HTTP server code.
Other potential solutions? I am genuinely puzzled as to how we're supposed to take a classic REST server component and make it scalable with RabbitMQ/AMQP with minimal disruption.
I would recommend switching from WSGI to ASGI (nginx can help here). I'm not sure why you think RabbitMQ is the solution to your problem, as nothing you described seems like something that would be solved by that approach.
ASGI is not supported by Apache as far as I know, but it allows the server to go off and do work and, while it's working, keep servicing new requests that come in (a gross oversimplification).
If for whatever reason you really want to use job workers (RabbitMQ, etc.), then I would suggest returning a "token" to the user (really just the job_id); they can then call back with that token, and it will report either the current job status or the result.
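A minimal sketch of that token/job pattern (Flask and an in-memory job store assumed, no persistence): the POST returns a token immediately, and the client polls with it until the worker has produced a result.

import threading
import uuid
from flask import Flask, jsonify, request

app = Flask(__name__)
jobs = {}  # job_id -> {"status": ..., "result": ...}

def worker(job_id, payload):
    # stand-in for handing the payload to RabbitMQ and waiting on the reply queue
    jobs[job_id] = {"status": "done", "result": {"echo": payload}}

@app.route("/jobs", methods=["POST"])
def submit():
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "pending", "result": None}
    threading.Thread(target=worker, args=(job_id, request.get_json()), daemon=True).start()
    return jsonify({"token": job_id}), 202          # 202 Accepted: work continues in background

@app.route("/jobs/<job_id>", methods=["GET"])
def status(job_id):
    job = jobs.get(job_id)
    if job is None:
        return jsonify({"error": "unknown token"}), 404
    return jsonify(job)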

Using django for non http requests

Can I use Django to handle non-HTTP requests and responses? I have a Django web application serving up webpages, and I would like to use it to also communicate with other devices (hand-held GPS units sending in status reports and receiving an ack) over TCP, but Django reports that the requests are "code 400, message Bad HTTP/0.9 request type".
[28/Sep/2015 15:14:26] code 400, message Bad HTTP/0.9 request type ('[V1.0.0,244565434376396,1,abcd,2015-09-28')
[28/Sep/2015 15:14:26] "[V1.0.0,244565434376396,1,abcd,2015-09-28 14:14:12,1-2,865456543459367,2,T1]" 400 -
The message from the device is sent as text over tcp with no http parameters at all.
I haven't found any information on how to do this with django, but it would make my life easier if it was possible.
Thanks!
Not that I know of.
Django is a web framework, so it's designed around a certain paradigm if not a certain protocol.
The design is heavily informed - if not by HTTP - by the notions of URL, request, a stateless protocol, et cetera.
If the template system and the routing system were taken away you would be left with a glorified ORM and some useless bits of code.
However, unless you are dealing with existing devices with their own protocol, you can use Django to build a RESTful service to successfully exchange information with something other than bipeds in front of a web browser.
This article on Dr. Dobb's is very informative.
Django REST, although by no means necessary, can help you.
If you are really stuck with legacy devices and protocols, you could write an adapter/proxy that would receive your devices' requests and translate them to RESTful calls, if your protocol looks enough like HTTP semantically rather than syntactically (as in, if you just have to translate QUUX aaa:bbb:ccc: to GET xx/yy/zz).
If it does not share the slightest bit of HTTP's semantics, I'd say Django can't help you much.
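For the adapter/proxy route, here is a minimal sketch (Python 3 socketserver and requests assumed; the endpoint URL and ack format are made up) that accepts the device's raw TCP line and replays it as an ordinary HTTP call to the Django app:

import socketserver
import requests

DJANGO_ENDPOINT = "http://localhost:8000/api/status-reports/"  # hypothetical REST endpoint

class DeviceHandler(socketserver.StreamRequestHandler):
    def handle(self):
        raw = self.rfile.readline().strip().decode("ascii", errors="replace")
        # e.g. "[V1.0.0,244565434376396,1,abcd,2015-09-28 14:14:12,...]"
        resp = requests.post(DJANGO_ENDPOINT, json={"raw": raw}, timeout=10)
        self.wfile.write(b"ACK\n" if resp.ok else b"NAK\n")

if __name__ == "__main__":
    with socketserver.ThreadingTCPServer(("0.0.0.0", 9000), DeviceHandler) as srv:
        srv.serve_forever()

This keeps Django doing what it is good at (HTTP) while the adapter owns the device protocol.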
I second the suggestion that non-HTTP traffic is better handled by other means, but I do have a suggestion as to how to structure a Django app that could do it. HTTP processing takes place in the middleware stack, and you could make your app sit at the top of that stack: either pre-empt the other middlewares by returning the response yourself, or prepare a mock request to pass down to the other handlers and grab the response on the way back to post-process it for your receiver.
This feels hacky and might require a bunch of unorthodox tricks, but that's how I would approach the problem as stated.

http POSTs over a certain size failing when authentication is enabled

I've developed a fairly simple web service using Flask (Python 2.7, current Flask and dependencies), where clients POST a hunk of JSON to the server and get a response.
This is working 100% of the time when no authentication is enabled; straight up POST to my service works great.
Adding HTTP Digest authentication to the endpoint results in the client producing a 'Broken Pipe' error after the 401 - Authentication Required response is sent back... but only if the JSON hunk is more than about 22k.
If the JSON hunk being transmitted in the POST is under ~22k, the client gets its 401 response and cheerfully replies with authentication data as expected.
I'm not sure exactly what the size cut-off is... the largest I've tested successfully with is 21766 bytes, and the smallest that's failed is 43846 bytes. You'll note that 32k is right in that range, and 32k might be a nice default size for a buffer... and this smells like a buffer size problem.
The problem has been observed using a Python client (built with the 'requests' module) and a C++ client (using one of Qt's HTTP client classes). The problem is also observed both when running the Flask app "stand-alone" (that is, via app.run()) and when running behind Apache via mod_wsgi. No SSL is enabled in either case.
It goes as follows:
your client POSTs JSON data without authentication
server receives the request (not necessarily in one long chunk, it might come in parts)
server evaluates the request, finds it is not providing credentials, and so decides to stop processing the request and replies with a 401.
With a short POST the server consumes the whole body and does not get a chance to break off the POST request in the middle. As the POST size grows, the chance of the unauthorized POST request being interrupted is higher.
Your client has two options:
Either start sending credentials right away.
Or try / catch broken pipe and react to it by forming proper Digest based request.
The first feeling is that something is broken, but it is actually a rather reasonable approach: imagine that someone could send a huge POST request and consume resources on your server while not being authorized to do so. The server's reaction seems reasonable in this context.
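For the first option with the requests library, a hedged sketch: "prime" the digest challenge with a small request first so that the large POST carries an Authorization header on its first attempt instead of provoking a 401 mid-upload. This relies on HTTPDigestAuth caching the challenge for reuse on the same auth object (true in current requests versions, but worth verifying for yours); URLs and credentials below are made up.

import requests
from requests.auth import HTTPDigestAuth

auth = HTTPDigestAuth("user", "secret")
session = requests.Session()

# Small request: expected to receive the 401 challenge and let HTTPDigestAuth
# cache the nonce for subsequent calls.
session.get("http://localhost:5000/api/ping", auth=auth, timeout=10)

# Large POST: with the cached challenge, credentials are sent right away.
big_payload = {"data": "x" * 100000}
resp = session.post("http://localhost:5000/api/submit", json=big_payload,
                    auth=auth, timeout=30)
resp.raise_for_status()

The second option is simply to catch the connection error (broken pipe) on the unauthenticated attempt and retry the POST with credentials.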

How to make sure that my AJAX requests are originating from the same server in Python

I have already asked a question about IP Authentication here: TastyPie Authentication from the same server
However, I need something more! An IP address could be very easily spoofed.
Scenario: My API (TastyPie) and Client App (in javascript) are on the same server/site/domain. My users don't login. I want to consume my API in my javascript client side.
Question: How can I make sure (authentication) that my AJAX requests are originating from the same server?
I'm using Tastypie. I need to verify (authenticate) that the requests from the client are being made on the same server/domain etc. I cannot use 'logged in sessions' as my users don't log in.
I have looked at private keys and generating a signature, but they can be viewed in the JavaScript, making that method insecure. If I do it in a way that requests a signature from the server (hiding the private key in some Python code), anyone can make the same HTTP request to get_signature that my JavaScript makes, thus defeating the point.
I also tried to have the Django view embed the signature in the rendered page, eliminating the need to make the get_signature call. This is safe, but it means that I now have to refresh the page every time to get a new signature. From a user's point of view only the first call to the API would work, after which they need to refresh, which again defeats the point.
I cannot believe I'm the only person with this requirement. This is a common scenario I'm sure. Please help :) An example using custom authentication in Tastypie would be welcome too.
Thanks
Added:
Depending on your infrastructure #dragonx's answer might interest you most.
my 2c
You want to make sure that the API can only be used by a client that has visited your website? Hmm, do bots, robots, and crawlers fall into the same category as that client then? Or am I wrong? This can be easily exploited if you really want to secure it.
I cannot believe I'm the only person with this requirement.
Maybe not, but as you can see your API is prone to several attacks, and that can be a reason for someone not sharing your design and making security stricter with real authentication.
EDIT
Since we are talking about AJAX requests, what does the IP part have to do with this? The IP will always be the client's IP! So probably, you want a public API...
I would go with the tokens/session/cookie part.
I 'd go with a generated token that lasts a little while and a flow described below.
I'd go with a rate limiter per time window, like GitHub does, e.g. 60 requests per hour per IP, or more for registered users
To overcome the problem with the refreshing token I would just do this:
Client visits the site
-> server generates API TOKEN INIT
-> Client gets API TOKEN INIT which is valid only for starting 1 request.
Client makes AJAX Request to API
-> Client uses API TOKEN INIT
-> Server checks against API TOKEN INIT and limits
-> Server accepts request
-> Server passes back API TOKEN
-> Client consumes response data and stores API TOKEN for further usage (Will be stored in browser memory via JS)
Client starts communicating with the API for a limited amount of time or number of requests. Note that you also know the init token's date, so you can use it to check against the first visit to the page.
The 1st token is generated via the server when the client visits.
Then the client uses that token in order to obtain a real one, that lasts for some time or something else as of limitation.
This makes someone actually visit the webpage, and only then can he access the API for a limited amount of time, number of requests, etc.
This way you don't need refreshing.
Of course the above scenario could be simplified with only one token and a time limit as mentioned above.
Of course the above scenario is prone to advanced crawlers, etc since you have no authentication.
Of course a clever attacker can grab tokens from the server and repeat the steps, but then you already had that problem from the start.
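To make the simplified one-token variant concrete, here is a minimal sketch (Django plus Tastypie assumed; the token format, header name, and lifetime are my own choices, not from the question): the page view embeds an HMAC-signed, time-limited token, and a custom Tastypie Authentication class verifies it on each API call.

import hashlib
import hmac
import time
from django.conf import settings
from tastypie.authentication import Authentication

TOKEN_LIFETIME = 15 * 60  # seconds; pick whatever window suits you

def make_api_token():
    # call this from the page view and embed the result in the rendered template
    ts = str(int(time.time()))
    sig = hmac.new(settings.SECRET_KEY.encode(), ts.encode(), hashlib.sha256).hexdigest()
    return "%s:%s" % (ts, sig)

class SignedTokenAuthentication(Authentication):
    def is_authenticated(self, request, **kwargs):
        # assumes the JS client sends the token in an X-API-Token header
        token = request.META.get("HTTP_X_API_TOKEN", "")
        try:
            ts, sig = token.split(":", 1)
            age = time.time() - int(ts)
        except ValueError:
            return False
        expected = hmac.new(settings.SECRET_KEY.encode(), ts.encode(), hashlib.sha256).hexdigest()
        return age < TOKEN_LIFETIME and hmac.compare_digest(sig, expected)

As noted above, this only proves the caller rendered (or scraped) your page recently; it is rate limiting by obscurity, not real authentication.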
Some extra points
As the comments suggested, please close off writes to the API. If you have doubts about your implementation (and aren't using auth), or just want extra security, you don't want to be a victim of DoS attacks via writes.
The token scenario described above can also be made more elaborate, e.g. by constantly exchanging tokens.
Just for reference, GAE Cloud Storage uses signed URLs for much the same purpose.
Hope it helps.
PS. Regarding IP spoofing and defense against spoofing attacks, Wikipedia says the following, so reply packets won't be returned to the attacker:
Some upper layer protocols provide their own defense against IP spoofing attacks. For example, Transmission Control Protocol (TCP) uses sequence numbers negotiated with the remote machine to ensure that arriving packets are part of an established connection. Since the attacker normally can't see any reply packets, the sequence number must be guessed in order to hijack the connection. The poor implementation in many older operating systems and network devices, however, means that TCP sequence numbers can be predicted.
If it's purely the same server, you can verify requests against 127.0.0.1 or localhost.
Otherwise the solution is probably at the network level, to have a separate private subnet that you can check against. It should be difficult for an attacker to spoof your subnet without being on your subnet.
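A tiny sketch of that check on the Django side (Python 3 ipaddress module assumed; the trusted subnets are examples). It is only meaningful if REMOTE_ADDR really reflects the original peer, i.e. the app is bound to loopback or a private interface and not behind a proxy that rewrites the address.

from ipaddress import ip_address, ip_network

TRUSTED_NETS = [ip_network("127.0.0.0/8"), ip_network("10.0.0.0/8")]  # example subnets

def request_is_local(request):
    try:
        addr = ip_address(request.META.get("REMOTE_ADDR", ""))
    except ValueError:
        return False
    return any(addr in net for net in TRUSTED_NETS)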
I guess you're a bit confused (or I am, please correct me). That your JS code is published on the same server as your API does not mean AJAX requests will come from your server. The clients download the JS from your server and execute it, which results in requests to your API sent from the clients, not from the same server.
Now if the above scenario correctly describes your case, what you are probably trying to do is to protect your API from bot scraping. The easiest protection is CAPTCHA, and you can find some more ideas on the Wiki page.
If you are concerned that other sites may make AJAX calls to your API to copy your site functionality, you shouldn't be--AJAX requests can only be sent to the same server as the page the JS is running on, unless it is JSONP.
Short answer: It is not possible to prevent a dedicated attacker.
You have no method of identifying a client other than with the information that they give you. For instance, username/password authentication works under the assumption that only a valid client would be able to provide valid credentials. When someone logs in, all you know is that some person provided those credentials -- you assume that this means they are a legitimate user.
Let's take a look at your scenario here, as I understand it. The only method you have of authenticating a client is IP address, a very weak form of authentication. As you stated, this can be easily spoofed, and with some effort your server's responses can be routed back to the attacker's original IP address. If this happens, you can't do anything about it. The fact is, if you assume someone from a valid IP address is a valid user, then spoofers and legitimate users are indistinguishable. This is just like if someone steals your password and tries to log in to StackOverflow. To StackOverflow, the attacker and you are indistinguishable, since all they have to go on is the username and password.
You can do fancy things with the client as mentioned in other answers, such as tokens, time limits, etc., but a dedicated attacker would be able to mimic the actions of a legitimate client, and you wouldn't be able to tell them apart because they would both appear to be from valid IP addresses. For instance, in your last example, if I were an attacker looking to make API calls, I would spoof a legitimate IP address, get the signature, and use it to make an API call, just as a legitimate client would.
If your application is critical enough to warrant this level of thought about security, you should at least think of implementing something like API tokens, public key encryption, or other authentication methods that are more secure than IP addresses to tell your clients apart from any attackers. Authentication by IP address (or other easily forged tokens like hostname or headers) simply won't cut it.
Maybe you could achieve this by using the same-origin policy.
Refer to http://en.wikipedia.org/wiki/Same_origin_policy
As suggested by Venkatesh Bachu, Same Origin Policy and http://en.wikipedia.org/wiki/Cross-Origin_Resource_Sharing (CORS) could be used as a solution.
In your API, you can check Origin header and respond accordingly.
You would need to check whether the Origin header can be modified using extensions like Tamper Data.
A determined hacker can still snoop by pointing the browser at a local proxy server.
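A short sketch of that Origin check (Django-style request.META assumed; the allowed origin is an example). As noted above, it only deters casual cross-site callers, since a non-browser client can send any Origin it likes.

ALLOWED_ORIGINS = {"https://example.com"}  # hypothetical site origin

def origin_allowed(request):
    origin = request.META.get("HTTP_ORIGIN") or request.META.get("HTTP_REFERER", "")
    return any(origin.startswith(allowed) for allowed in ALLOWED_ORIGINS)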
If this app server is running on an ordinary web server that has a configurable listening IP address, set it to 127.0.0.1. With the SocketServer module's TCPServer, it's like
import SocketServer
server = SocketServer.TCPServer(("127.0.0.1", 12345), TheHandlerClass)
Use the netstat command to verify the listening address is correct, i.e. "127.0.0.1":
tcp4 0 0 127.0.0.1.12345 *.* LISTEN
This effectively makes any connection originating outside the same host impossible at the TCP level.
There are two general solution types: in-band solutions using normal web server/client mechanisms, that are easy to implement but have limitations; and out-of-band solutions that rely on you to configure something externally, that take a little more work but don't have the same limitations as in-band.
If you prefer an in-band solution, then the typical approach used to prevent cross-site request forgery (XSRF) would work well. Server issues a token with a limited life span; client uses the token in requests; privacy of token is (sort of) assured by using an HTTPS connection. This approach is used widely, and works well unless you are worried about man-in-the-middle attacks that could intercept the token, or buggy browsers that could leak data to other client-side code that's being naughty.
You can eliminate those limitations, if you're motivated, by introducing client certificates. These are kind of the flip side to the SSL certificates we all use on web servers -- they operate the same way, but are used to identify the client rather than the server. Because the certificate itself never goes over the wire (you install it locally in the browser or other client), you don't have the same threats from man-in-the-middle and browser leakage. This solution isn't used much in the wild because it's confusing to set up (very confusing for the typical user), but if you have a limited number of clients and they are under your control, then it could be feasible to deploy and manage this limited number of client certificates. The certificate operations are handled by the browser, not in client code (i.e. not in JavaScript) so your concern about key data being visible in JavaScript would not apply in this scenario.
Lastly, if you want to skip over the client configuration nonsense, use the ultimate out-of-band solution -- iptables or a similar tool to create an application-level firewall that only allows sessions that originate from network interfaces (like local loopback) that you know for certain can't be accessed off the box.

Python Webserver: How to serve requests asynchronously

I need to create a python middleware that will do the following:
a) Accept http get/post requests from multiple clients.
b) Modify and Dispatch these requests to a backend remote application (via socket communication). I do not have any control over this remote application.
c) Receive processed results from backend application and return these results back to the requesting clients.
Now the clients are expecting a synchronous request/response scenario. But the backend application is not returning the results synchronously. That is, some requests take much longer to process than others. Hence,
Client 1 : send http request C1 --> get response R1
Client 2 : send http request C2 --> get response R2
Client 3 : send http request C3 --> get response R3
Python middleware receives them in some order: C2, C3, C1. Dispatches them in this order to backend (as non-http messages). Backend responds with results in mixed order R1, R3, R2. Python middleware should package these responses back into http response objects and send the response back to the relevant client.
Is there any sample code to program this sort of behavior? There seem to be about 20 different web frameworks for Python, and I'm confused as to which one would be best for this scenario (I would prefer something as lightweight as possible ... I consider Django too heavy ... I tried Bottle, but I am not sure how to go about programming it for this scenario).
================================================
Update (based on discussions below): Requests have a request id. Responses have a response id (which should match the request id that they correspond to). There is only one socket connection between the middleware and the remote backend application. While we can maintain a {request_id : ip_address} dictionary, the issue is how to return the constructed HTTP response object to the correct client. I assume threading might solve this problem, with each thread maintaining its own response object.
Screw frameworks. This is exactly the kind of task for asyncore. This module allows event-based network programming: given a set of sockets, it calls back given handlers when data is ready on any of them. That way, threads are not necessary just to dumbly wait for data on one socket to arrive and painfully pass it to another thread. You would have to implement the HTTP handling yourself, but examples of that can be found. Alternatively, you could use the async feature of uWSGI, which would allow your application to be integrated with an existing webserver, but that does not integrate with asyncore by default, though it wouldn't be hard to make it work. It depends on your specific needs.
Quoting your comment:
The middleware uses a single persistent socket connection to the backend. All requests from middleware are forwarded via this single socket. Clients do send a request id along with their requests. Response id should match the request id. So the question remains: How does the middleware (web server) keep track of which request id belonged to which client? I mean, is there any way for a cgi script in middleware to create a db of tuples like (request_id, client_ip, client_tcp_port), and once a response id matches, then send an http response to clientip:clienttcpport?
Is there any special reason for doing all this processing in a middleware? You should be able to do all this in a decorator, or somewhere else, if more appropriate.
Anyway, you need to maintain a global concurrent dictionary (extend dict and protect it using threading.Lock). Upon a new request, store the given request-id as key, and associate it to the respective client (sender). Whenever your backend responds, retrieve the client from this dictionary, and remove the entry so it doesn't accumulate forever.
UPDATE: someone already extended the dictionary for you - check this answer.
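A hedged sketch of that concurrent dictionary with threads (names are my own): each HTTP handler thread registers its request_id and blocks on an Event, while the single backend-socket reader thread fills in the result and wakes the right handler.

import threading

class PendingRequests:
    def __init__(self):
        self._lock = threading.Lock()
        self._waiting = {}  # request_id -> (threading.Event, [result])

    def register(self, request_id):
        event, box = threading.Event(), [None]
        with self._lock:
            self._waiting[request_id] = (event, box)
        return event, box

    def resolve(self, request_id, result):
        with self._lock:
            entry = self._waiting.pop(request_id, None)
        if entry:
            event, box = entry
            box[0] = result
            event.set()

pending = PendingRequests()

# In the HTTP handler thread, after forwarding the request to the backend socket:
#   event, box = pending.register(request_id)
#   event.wait(timeout=30)        # blocks until the reader thread resolves it
#   response_body = box[0]
#
# In the backend reader thread, for every (response_id, payload) received:
#   pending.resolve(response_id, payload)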
Ultimately you're going from the synchronous HTTP request-response protocol with your clients to an asynchronous queuing/messaging protocol with your backend. So you have two choices: (1) make requests wait until the backend has no outstanding work, then process one at a time, or (2) write something that marries the backend responses with their associated requests (using a dictionary of requests or something similar).
One way might be to run your server in one thread while dealing with your backend in another (see... Run Python HTTPServer in Background and Continue Script Execution) or maybe look at aiohttp (https://docs.aiohttp.org/en/v0.12.0/web.html)
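And a minimal asyncio/aiohttp sketch of the same idea without threads (aiohttp assumed; the newline-delimited JSON framing to the backend and the port numbers are made up): each handler awaits a Future keyed by request id, and one reader task resolves futures as backend replies arrive in any order.

import asyncio
import json
from aiohttp import web

pending = {}  # request_id -> asyncio.Future

async def handle(request):
    payload = await request.json()
    request_id = payload["request_id"]
    fut = asyncio.get_event_loop().create_future()
    pending[request_id] = fut
    writer = request.app["backend_writer"]
    writer.write((json.dumps(payload) + "\n").encode())   # forward to backend socket
    await writer.drain()
    result = await asyncio.wait_for(fut, timeout=30)      # resumes when reader resolves it
    return web.json_response(result)

async def backend_reader(reader):
    while True:
        line = await reader.readline()
        msg = json.loads(line)
        fut = pending.pop(msg["response_id"], None)
        if fut and not fut.done():
            fut.set_result(msg)

async def init(app):
    reader, writer = await asyncio.open_connection("localhost", 9999)  # backend socket
    app["backend_writer"] = writer
    asyncio.ensure_future(backend_reader(reader))

app = web.Application()
app.router.add_post("/api", handle)
app.on_startup.append(init)
web.run_app(app)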
