After reading the introductory articles on REST (Fielding's thesis and others), my perception of statelessness is that there should be no session objects on the server side. Yet I see that Flask (and maybe other REST frameworks in technologies I do not know about) gives us a session object to store information on the server, as in this example:
from flask import Flask, redirect, request, session, url_for

app = Flask(__name__)
app.secret_key = 'change-me'  # needed to sign the session cookie

@app.route('/login', methods=['GET', 'POST'])
def login():
    if request.method == 'POST':
        session['username'] = request.form['username']
        return redirect(url_for('index'))
    ...
Surely I am misunderstanding REST's statelessness. So, what is it really?
The purposes of introducing the statelessness constraint in REST include improvements to visibility, reliability, and scalability: proxies and other intermediaries are better able to participate in communication patterns that involve self-descriptive stateless messages; server death and failover do not result in session-state synchronisation problems; and it is easy to add new servers to handle client load, again without needing to synchronise session state.
REST achieves statelessness through a number of mechanisms:
By designing methods and communication patterns such that they do not require state to be retained server-side after the request.
By designing services that expose capabilities to directly sample and transition server-side state without leftover application state.
By "deferring" or passing state back to the client as part of a message at the end of each request whenever session state or application state is required.
The downside of statelessness is exposed in that last point: applications that need some kind of session state to persist beyond the duration of a single request must have that state sent back to the client as part of the response message. The next time the client wants to issue a request, the state is transferred to the service again, and then back to the client.
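To make that concrete, here is a minimal sketch (assuming Flask; the ITEMS dataset and URIs are hypothetical) of deferring state to the client: each response hands back a cursor that the client must resend, so no session is retained server-side and any server can answer the next request.

from flask import Flask, jsonify, request

app = Flask(__name__)
ITEMS = list(range(100))  # hypothetical dataset standing in for real resources

@app.route('/items')
def list_items():
    # The client supplies its own position; the server remembers nothing.
    offset = int(request.args.get('cursor', 0))
    page = ITEMS[offset:offset + 10]
    # The state is "deferred" back to the client as part of the response.
    return jsonify(items=page, next_cursor=offset + 10)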
You can get more info here: http://soundadvice.id.au/blog/2009/06/
No, you understand it well. There shouldn't be any "session" in a RESTful service. Always check that you can send any URI by mail, keep it in bookmarks, and reference it in links. This is indeed why REST is so important to the Web: no RESTful resources = no more links. Authentication should only be done when accessing the resource representation.
What you can have instead of a session is a user-owned object (for example, a shopping cart) that can be modified by REST methods. This is different from a session since, for example, there could be services where you could authorize other people to see your shopping cart.
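As an illustration, here is a minimal sketch (assuming Flask; the in-memory store and URIs are hypothetical) of a cart exposed as an addressable resource rather than as session state:

from flask import Flask, jsonify, request

app = Flask(__name__)
carts = {}  # hypothetical in-memory store; resource state, not session state

@app.route('/carts/<cart_id>', methods=['GET'])
def get_cart(cart_id):
    # The cart has its own URI, so it can be bookmarked, mailed,
    # or shared with anyone authorized to see it.
    return jsonify(carts.get(cart_id, []))

@app.route('/carts/<cart_id>/items', methods=['POST'])
def add_item(cart_id):
    carts.setdefault(cart_id, []).append(request.get_json())
    return '', 201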
In the REST architectural style, session state is kept entirely on the client. This means data cannot be left on the server in a shared context, and we still have to send repetitive data (per-interaction overhead) in a series of requests.
Because we keep the application state on the client side, the server has less control over consistent application behavior, since the application becomes dependent on the correct implementation of semantics across multiple client versions.
However, this constraint induces the properties of visibility, reliability, and scalability.
Visibility is improved because a monitoring system does not have to look beyond a single request datum in order to determine the full nature of the request.
Reliability is improved because it eases the task of recovering from partial failures.
Scalability is improved because not having to store state between requests allows the server component to quickly free resources, and further simplifies implementation because the server doesn't have to manage resource usage across requests.
see http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm
I have an AWS server that handles end-user registration. It runs an EC2 Linux instance that serves our API via Apache and Python, and is connected to its data on a separate Amazon RDS instance running MySQL.
To administer the system remotely, I set states in a MySQL table to control the availability of the registration API to the public user, and also the level of logging for our Python API, which may reference up to 5 concurrent admin preferences (i.e., not a single "log level").
Because our API provides almost two dozen different functions, we need to check the state of the system's availability before any individual function is accessed. That means an SQL SELECT from that table (which has only one record) for every session of user transactions, which might involve a half-dozen API calls. We need to check whether the availability status has changed, so the user doesn't start an API call and have the database become unavailable in the middle of the process. The same goes for the logging preferences.
The API calls return the server's availability and estimated downtime to the calling program (NOT a web browser interface), which handles that situation gracefully.
Is this a commonly accepted approach for handling this? Should I care if I'm over-polling the status table? And should I set up MySQL with my status table in such a way as to make my constant checking more efficient (e.g., cached) when Python obtains its data?
I should note that we might have thousands of simultaneous users making API requests, not tens of thousands, or millions.
Your strategy seems off-track, here.
Polling a status table should not be a major hot spot. A small table, with proper indexes, queried outside a transaction, is a lightweight operation. With an appropriately-provisioned server, such a query should be done entirely in memory, requiring no disk access.
But that doesn't mean it's a fully viable strategy.
We need to check to see if the availability status has changed, so the user doesn't start an API call and have the database become unavailable in the middle of the process.
This will prove impossible. You need time travel capability for this strategy to succeed.
Consider this: the database becoming unavailable in the middle of a process wouldn't be detected by your approach. Only the lack of availability at the beginning would be detected. And that's easy enough to detect, anyway -- you will realize that as soon as you try to do something.
Set appropriate timeouts. The MySQL client library should have support for a connect timeout, as well as a timeout that will cause your application to see an error if a query runs longer than is acceptable or a network disruption causes the connection to be lost mid-query. I don't know whether this exists or what it's called in Python, but in the C client library this is MYSQL_OPT_READ_TIMEOUT, and it is very handy for preventing a hang when, for whatever reason, you get no response from the database within an acceptable period of time.
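In Python this does exist: for example, PyMySQL (one client library among several, so treat this as a sketch with a hypothetical host and schema) accepts connect, read, and write timeouts directly:

import pymysql  # assuming the PyMySQL client library

# connect_timeout bounds the initial connection attempt; read_timeout
# and write_timeout raise an error instead of hanging mid-query.
conn = pymysql.connect(
    host='db.example.com',  # hypothetical host
    user='api',
    password='secret',
    database='registration',
    connect_timeout=5,
    read_timeout=10,
    write_timeout=10,
)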
Use database transactions, so that a failure to process a request results in no net change to the database. A MySQL transaction is implicitly rolled back if the connection between the application and the database is lost.
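Continuing the PyMySQL sketch (table and column names are hypothetical), wrapping the work in a transaction ensures a failed request leaves no partial changes:

import pymysql  # assuming PyMySQL, as in the previous sketch

conn = pymysql.connect(host='db.example.com', user='api',
                       password='secret', database='registration')
try:
    with conn.cursor() as cur:
        cur.execute("UPDATE accounts SET status = %s WHERE id = %s",
                    ('active', 42))
    conn.commit()  # all statements take effect together...
except pymysql.MySQLError:
    conn.rollback()  # ...or none do; a lost connection rolls back implicitly
    raise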
Implementing error handling and recovery -- written into your code -- is a more viable approach than trying to prevent your code from running when the service is unavailable, because there is no check interval small enough to fully avoid a database becoming unavailable "in the middle" of a request.
In any event, polling a database table with each request seems like the wrong approach, not to mention that an outage on the status table's server makes your service fail unnecessarily when the service itself might have been healthy but unable to prove it.
On the other hand, I don't know your architecture, but assuming your front-end involves something like Amazon Application Load Balancer or HAProxy, the health checks against the API service endpoint can actually perform the test. If you configure your check interval for, say, 10 seconds, and a request to the check endpoint (say, GET /health-check) actually verifies end-to-end availability of the necessary components (e.g. database access), then the API service can effectively take itself offline when a problem occurs. It remains offline until the checks start returning success again.
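As a sketch of such an endpoint (assuming Flask and PyMySQL; the host and database names are hypothetical), the check proves end-to-end database access rather than merely that the process is alive:

from flask import Flask, jsonify
import pymysql

app = Flask(__name__)

@app.route('/health-check')
def health_check():
    # A trivial query proves the database is reachable; on failure the
    # load balancer sees a 503 and takes this node out of rotation.
    try:
        conn = pymysql.connect(host='db.example.com', user='api',
                               password='secret', database='registration',
                               connect_timeout=2, read_timeout=2)
        with conn.cursor() as cur:
            cur.execute('SELECT 1')
        conn.close()
    except pymysql.MySQLError:
        return jsonify(status='unhealthy'), 503
    return jsonify(status='ok'), 200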
The advantage here is that the workload involved in health checking is consistent -- it happens every 10 seconds, increasing with the number of nodes providing the service but not with actual request traffic, because you don't have to perform a check for each request. This means you have a window of a few seconds between the actual loss of availability and its detection, but the requests that get through in the meantime would have failed anyway.
HAProxy -- and presumably other tools like Varnish or Nginx -- can help you handle failures gracefully in other ways as well, by timing out failed requests at a layer in front of the API endpoint so that the caller gets a response even though the service itself didn't respond. An example from one of my environments is a shopping page where the application makes an external API call when a site visitor is browsing items by category. If this request runs longer than it should, the proxy can interrupt the request and return a preconfigured static error response -- say, in JSON or XML that the requesting application will understand -- so that the hard failure becomes a softer one. In this case, the fake response can return an empty JSON array of "items found."
It isn't entirely clear to me whether these APIs are yours or external APIs that you are aggregating. If the latter, then HAProxy is a good solution here too, but facing the other direction -- the back-end faces outward and your service contacts its front-end. You access the external service through the proxy, and the proxy checks the remote service and will immediately return an error to your application if the target API is unhealthy. I use this solution to access an external trouble-ticketing system from one of my apps. An additional advantage here is that the proxy logs allow me to collect usage, performance, and reliability data about all of the many requests passed to that external service, regardless of which of dozens of internal systems may access it, with far better visibility than I could accomplish if I tried to collect it from all of the internal application servers that access that external service.
What is the best way to do user management in a single-page JS (Mithril) app? I want users to log in to load preferences and take on a role so they gain certain permissions. I have a REST API backend written in Python (Falcon web framework). Having read a bit about it, it seems to boil down to sending credentials to the backend and getting a token back. But the question is how that should be done. It seems that tokens are a better method than cookies, but that has effects on the exchange of secrets/tokens; the 'xhr.withCredentials' method seems to be cookie-based, for instance. JWT (JSON Web Tokens) seems like a modern, interesting option, but it's hard to find a clear explanation of how it could be used with a SPA. And once the Mithril app has a token, where should I store it, and how should I use it with subsequent requests?
This isn't so much about Mithril, actually; the only Mithril-related area is the server communication. That is done with the m.request method (docs here), but you need to create an object for all server communication that requires authentication.
That object should have knowledge about the auth system and detect if a token expired, then request a new one, take proper action if things fail, etc. It's a bit of work, but the process is different for most auth systems, so there's not much to do about it, except using something that already exists.
Being a small and lean MVC framework, Mithril doesn't have any security-related features built-in, but the m.request method is very powerful and you should use that inside the auth communication object.
The client-side storage will be in cookies or HTML5 storage. Here's a Stack Exchange answer that goes into more depth: https://security.stackexchange.com/a/80767 but the point is that this isn't Mithril-related either.
Thanks for linking to the tokens vs. cookies article, it was very nice!
I am a newbie and I want to do the following.
I have service endpoints like:
from flask import Flask
from flask_login import login_required  # assuming a decorator such as Flask-Login's

app = Flask(__name__)

# @app.route must be the outermost decorator so the login-protected
# wrapper (not the bare view) is what gets registered for the URL.
@app.route('/')
@login_required
def home():
    pass

@app.route('/add')
@login_required
def add():
    pass

@app.route('/save')
@login_required
def save():
    pass

@app.route('/delete')
@login_required
def delete():
    pass
I would like the user to be authenticated while making these calls.
Question
How can I authenticate REST calls using Python?
How do I make sure that any call landing on one of these endpoints is authenticated?
How do I do all authentication at the HTTP header level, without saving any state, so that it scales better in the future (like Amazon S3), meaning any call might go to a different server and still be able to authenticate itself?
I am entirely new to REST world and don't really know how to achieve this.
Thank you
First, a question: are you authenticating a user, a client, or both?
For authenticating a client I like HTTP MAC Authentication for REST service authentication. Take a look at the Mozilla Services macauthlib and how it's used in their pyramid_macauth project. You should be able to learn from pyramid_macauth as an example in applying macauthlib to secure your services. A search to see if anyone else has tried this with Flask is a good idea, too.
For authenticating users and clients, perhaps take a look at OAuth 2.0 proper (HTTP MAC Auth is a related specification).
I had hoped to post more links, however, this is my first post and it seems I have to earn my way to more links in a response. :)
Security is not for noobs. Use a framework and rely on its implementation. Study the source code, read blogs and papers, and at some point you'll be able to architect your own system.
There are many things that may go wrong, and once you deploy a protocol you may not be able to come back without breaking existing clients.
That said, the usual way of authenticating a request is by using a pair of tokens, usually called a public key and a private (secret) key. A variant is using the private key to generate a short-lived session token. Another variant is using an API key specific to each client. In any case, this token is usually sent in an HTTP header (either a standard cookie or a custom one), but it's also possible to use the request body. Tokens are usually not appended to the URL because the secret may end up in a log file. Also, you should pay attention to how and where you store the secret key.
Depending on the channel (plain HTTP), you may want to use an HMAC to sign requests instead of sending secrets in the wild. You have to watch out for replay attacks. Timing attacks are possible. Cryptographic collisions may be used to defeat your scheme. You may need tokens to avoid CSRF (this is not really needed if web browsers don't come into play, but you don't specify this).
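For orientation only (the advice to rely on a framework stands), here is what HMAC request signing can look like with Python's standard library; the header names and message layout below are arbitrary choices, not a standard:

import hashlib
import hmac
import time

def sign_request(secret_key, method, path, body=b''):
    # secret_key is bytes shared with the server; it never travels over
    # the wire -- only a keyed digest of the request does. Including a
    # timestamp narrows the replay-attack window.
    timestamp = str(int(time.time()))
    message = b'\n'.join([method.encode(), path.encode(),
                          timestamp.encode(), body])
    signature = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    # Send these as headers, e.g. X-Auth-Timestamp and X-Auth-Signature.
    return timestamp, signature

# The server recomputes the digest with its copy of the secret and
# compares it using hmac.compare_digest() to resist timing attacks.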
Again, choose a framework and don't code anything by yourself. Broken software is usually OK to fix, but security holes can do real damage.
Looking at your API, these do not look like RESTful endpoints. A URI should represent an entity, not an action. For instance, if you are dealing with an entity such as a user, you could have yourdomain.com/user and perform the various operations (create, delete, update, and fetch) using the HTTP verbs POST, DELETE, PATCH, and GET (given that you use Flask, this can be achieved very easily).
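As a sketch in Flask of what that entity-style design could look like (the URIs and handler bodies are illustrative only):

from flask import Flask, request

app = Flask(__name__)

# One URI per entity; the HTTP verb selects the operation.
@app.route('/user', methods=['POST'])
def create_user():
    ...  # create the user from request.get_json()
    return '', 201

@app.route('/user/<int:user_id>', methods=['GET', 'PATCH', 'DELETE'])
def user(user_id):
    if request.method == 'GET':
        return {}  # fetch and return the user (Flask jsonifies dicts)
    elif request.method == 'PATCH':
        ...  # apply a partial update from request.get_json()
        return '', 204
    else:
        ...  # delete the user
        return '', 204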
In terms of security, I assume there are multiple schemes, but the one I have used is generating a session token from a key and secret via an initial authentication call. I suggest you look for specialized online resources on generating a key and secret pair, as well as on the session token.
In terms of scaling, I guess your concern is that sessions should not be specific to a given machine. The authentication data can be stored separately from the HTTP front-ends. This way you can add additional web servers to scale your front-end, or add additional data stores, and scale either on an as-needed basis.
I have been reading in multiple places that web servers should be stateless, with a shared-nothing architecture. This helps them scale better.
That means each request has all the information needed to process the request.
This becomes tricky when you have REST endpoints that need authentication.
I have been looking at the ways Flask extensions do this, and the Flask-Login extension is described as follows:
Flask-Login provides user session management for Flask. It handles the common tasks of logging in, logging out, and remembering your users' sessions over extended periods of time.
This seems to go against the philosophy of building a stateless server, doesn't it?
What are better ways to build a stateless server, with authentication provided via HTTP headers, using Python or related Python libraries?
P.S.: Apologies for not posting a programming question here; this is a design issue, I do not know how to solve it, and SO seems to have the right people to answer such questions. Thanks.
Flask-Login uses Flask's built-in session management, which by default uses secure, signed cookies and so is purely client-side.
It can, of course, support server-side sessions if needed; here's an example Redis-backed session store.
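As one hedged sketch of that idea, the Flask-Session extension can keep session data in Redis so the cookie holds only a session id (assuming Flask-Session and redis-py are installed; this may differ from the store originally linked):

import redis
from flask import Flask, session
from flask_session import Session

app = Flask(__name__)
app.config['SESSION_TYPE'] = 'redis'
app.config['SESSION_REDIS'] = redis.Redis(host='localhost', port=6379)
Session(app)  # session data now lives in Redis, server-side

@app.route('/login', methods=['POST'])
def login():
    session['username'] = 'alice'  # stored in Redis, not in the cookie
    return '', 204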
I have the same problem you describe.
While I have built a simple solution for it, I am looking for a better one.
What I currently do is ask the caller (whoever sends the HTTP request) to provide an 'X-User-Info' header whose value is a token. When I receive the request, I use this token to look up the user identity (in Redis, for instance), and all of the subsequent authorization and permission control is based on this identity.
The authentication step does nothing but generate a random token, save it with the user info to Redis, and return the token itself to the caller.
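A minimal sketch of that approach in Flask (the header name comes from the answer above; the Redis layout and the 401 response are my assumptions):

import redis
from flask import Flask, abort, g, request

app = Flask(__name__)
store = redis.Redis()  # maps token -> user identity, as described above

@app.before_request
def authenticate():
    token = request.headers.get('X-User-Info')
    user = store.get(token) if token else None
    if user is None:
        abort(401)  # unknown or missing token
    g.user = user  # downstream handlers read the identity from g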
I'm using CherryPy to make a web-based frontend for SymPy that uses an asynchronous process library on the server side, allowing multiple requests to be processed at once without waiting for each one to complete. To allow the frontend to function as expected, I use one process for the entirety of each session. The client-side JavaScript sends the session id from the cookie to the server when the user submits a request, and the server side currently uses a pair of lists, storing instances of a controller class in one and the corresponding session ids in the other, creating a new interpreter proxy and sending it the input if a non-existent session id is submitted. The only problem with this is that the proxy classes are not deleted when their corresponding sessions expire. Also, I can't see any way to retrieve the session id for which the current request is being served.
My questions about all this are: is there any way to "connect" an arbitrary object to a CherryPy session so that it gets deleted upon session expiration? Is there something I am overlooking here that would greatly simplify things? And does CherryPy's multi-threading negate the problem of synchronous reading of the stdout filehandle from the child process?
You can create your own session type, derived from CherryPy's base session. Use its clean_up method to do your cleanup.
Look at cherrypy/lib/sessions.py for details and sample session implementations.
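As a rough sketch based on reading sessions.py (the 'interpreter' key, the terminate() call, and the registration mechanism are all assumptions to verify against your CherryPy version):

import cherrypy
from cherrypy.lib import sessions

class ProxySession(sessions.RamSession):
    def clean_up(self):
        # Dispose of any interpreter proxy attached to a session that is
        # about to expire ('interpreter' and terminate() are hypothetical).
        now = self.now()
        for sid, (data, expiration_time) in list(self.cache.items()):
            if expiration_time <= now and 'interpreter' in data:
                data['interpreter'].terminate()
        # Let the base class delete the expired entries themselves.
        super(ProxySession, self).clean_up()

# CherryPy resolves storage_type 'proxy' to a class named ProxySession
# on the sessions module, so register it there.
sessions.ProxySession = ProxySession
cherrypy.config.update({
    'tools.sessions.on': True,
    'tools.sessions.storage_type': 'proxy',
})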