Python caching for multiple users


I'm creating a Python wrapper for the Vimeo API, and this is my first time creating a Python distribution. I have a question about caching.
I looked at an existing python-vimeo wrapper to see how it caches the request token. Its author describes the caching like this:
"""By default, this client will cache API requests for 120 seconds. To
override this setting, pass in a different cache_timeout parameter (in
seconds), or to disable caching, set cache_timeout to 0."""
I'm wondering whether this will cause a problem. Suppose more than one user connects to Vimeo through that feature at exactly the same time, and the information is stored on the server like this:
return self._cache.setdefault(key, processor(headers, content))
Won't that cause a problem (information being overwritten in the cache)?
If it does, could you tell me the best solution? I think it would be to store the data in a file named after the authenticated username. Am I right?
Thanks!

I'm not sure I understand the issue, but you could use a prefixed key, where the prefix is the username. So a naive but possibly good approach is to save to the key
username+"_"+key
instead of the plain key.
Key collisions between users would then be very unlikely.
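
The prefixed-key idea can be sketched as follows. This is only an illustration, assuming a plain in-memory dict as the cache; `UserCache`, `make_key`, `get_or_compute` and the token strings are hypothetical names and are not part of python-vimeo.

```python
class UserCache:
    """Toy in-memory cache whose keys are namespaced per user (hypothetical helper)."""

    def __init__(self):
        self._cache = {}

    @staticmethod
    def make_key(username, key):
        # Prefix the cache key with the username so two users never share an entry.
        return username + "_" + key

    def get_or_compute(self, username, key, compute):
        full_key = self.make_key(username, key)
        # Call compute() only on a miss; with dict.setdefault the value
        # expression would be evaluated on every call, hit or miss.
        if full_key not in self._cache:
            self._cache[full_key] = compute()
        return self._cache[full_key]


cache = UserCache()
alice_token = cache.get_or_compute("alice", "request_token", lambda: "token-for-alice")
bob_token = cache.get_or_compute("bob", "request_token", lambda: "token-for-bob")
```

If usernames may themselves contain underscores, a tuple key such as `(username, key)` avoids ambiguous prefixes.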

Related

How to avoid requesting many times the same data from MongoDB in Python

I have a function that reads from the database the current language chosen by the user. The problem is that this function is run many times (around 25) whenever the user requests something, and I can't avoid that.
This is making my service very slow, so I was wondering what is the best way to solve this problem. Any ideas?
This is what I've done, following @JustinEzequiel's comments:
from functools import lru_cache
@lru_cache
def get_locale(self, guildID: int):
    return db.find_one({'id': guildID})['language'], guildID

You will want to cache the results of the database call. This will allow you to avoid multiple DB hits, allowing you to look up the needed information on the fly. The actual implementation of this will vary depending on the object you're returning from your database.
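
A minimal sketch of caching the lookup with `functools.lru_cache`. `FakeCollection` is a hypothetical stand-in for the real MongoDB collection so that the example runs on its own; the function is written at module level and returns only the language string, which is an assumption about what the caller needs.

```python
from functools import lru_cache


class FakeCollection:
    """Hypothetical stand-in for a MongoDB collection; counts queries."""

    def __init__(self, docs):
        self._docs = docs
        self.calls = 0

    def find_one(self, query):
        self.calls += 1
        return self._docs[query["id"]]


db = FakeCollection({42: {"id": 42, "language": "en"}})


@lru_cache(maxsize=1024)
def get_locale(guild_id: int) -> str:
    # The query runs only on a cache miss for this guild_id.
    return db.find_one({"id": guild_id})["language"]


# Simulate the ~25 lookups that happen during one user request.
for _ in range(25):
    locale = get_locale(42)
```

When a guild changes its language, the cached value becomes stale until `get_locale.cache_clear()` is called (or the entry is evicted), so the update path should clear the cache.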

Ignoring a ProtoRPC message field via Cloud Endpoints

I've been working on an AppEngine-based project and I wanted to know if it's possible to ignore a ProtoRPC message field.
With the Java SDK, you can use @ApiResourceProperty to ignore a property (this means it's not contained within the response returned to the browser). However, I have not come across a way of doing this using the Python SDK.
Is there anything like this in the Python SDK?
Thanks, Adil

Nope, unfortunately not (at least not to my knowledge).
Two possible solutions depending on your use-case.
Set field values to None before returning the message in your method. That way they will be skipped/not included in the JSON response.
If your messages are hooked up to datastore models you can use the endpoints-proto-datastore library which allows you to use your ndb models directly in your API methods. Additionally it allows for request_fields and response_fields parameters in the method decorator which will limit the request or response to the specified subset of message/model fields. (internally it creates the necessary message classes for you)

App engine -- when to use memcache vs search index (and search API)?

I am interested in adding a spell checker to my app -- I'm planning on using difflib with a custom word list that's ~147kB large (13,025 words).
When testing user queries against this list, would it make more sense to:
load the dictionary into memcache (I guess from the datastore?) and keep it in memcache or
build an index for the Search API and pass in the user query against it there
I guess what I'm asking is which is faster: memcache or a search index?
Thanks.

Memcache is definitely faster.
Another important consideration is cost. Memcache API calls are free, while Search API calls have their own quota and pricing.
By the way, you may store your library as a static file, because it's small and it does not change. There is no need to store it in the Datastore.
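
As a rough sketch of the static-file approach: load the word list into memory once and query it with `difflib.get_close_matches`. The four-word list, the `load_words` and `suggest` helpers, and the file name mentioned in the comment are assumptions for illustration only.

```python
import difflib


def load_words(lines):
    # Accepts any iterable of lines, e.g. an open file object for a static word list.
    return [line.strip().lower() for line in lines if line.strip()]


# Stand-in for the ~13,000-word static file. In a real app this might be
# load_words(open("words.txt")), executed once at import time
# (the file name is hypothetical).
WORDS = load_words(["apple\n", "banana\n", "orange\n", "grape\n"])


def suggest(query, n=3, cutoff=0.6):
    # Returns up to n words whose similarity to the query is >= cutoff, best match first.
    return difflib.get_close_matches(query.lower(), WORDS, n=n, cutoff=cutoff)


suggestions = suggest("appel")
```

Because the list lives in the instance's memory, each check is a local computation with no service round trip.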

Memcache is faster; however, you need to consider the following.
It is not reliable: entries can be purged at any moment, so your code needs a fallback for data that is not cached.
You can only fetch by key, so, as you said, you would need to store whole dictionaries in memcache objects.
Each memcache entry can store at most 1MB. If your dictionary is larger, you would have to span multiple entries. That is not relevant in your case.
There are some other alternatives. How often will the dictionary be updated?
Here is one alternate strategy.
You could store it in the filesystem (requires app updates), or in GCS if you want to update the dictionary outside of app updates. Then you can load the dictionary into memory in each instance at startup or on first request and cache it at the running-instance level, so you won't have any round trips to services adding latency. This will also be simpler code-wise (i.e. no fallbacks for when it is not in memcache, etc.).
Here is an example. In this case the code lives in a module, which is imported as required. I am using a YAML file for additional configuration; it could just as easily load a dictionary from JSON, or you could define a Python dictionary in the module.
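import yaml  # PyYAML; safe_load is used here in place of the original bare load()
_modsettings = {}
def loadSettings(settings='settings.yaml'):
    # Parse the file only once; later calls return the cached dictionary.
    if not _modsettings:
        try:
            with open(settings, 'r') as f:
                _modsettings.update(yaml.safe_load(f) or {})
        except IOError:
            pass
    return _modsettings
settings = loadSettings()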
Then whenever I want the settings dictionary my code just refers to mymodule.settings.
By importing this module during a warmup request you won't get a race condition, or have to import/parse the dictionary during a user facing request. You can put in more error traps as appropriate ;-)

Is it safe to pass Google App Engine Entity Keys into web pages to maintain context?

I have a simple GAE system that contains models for Account, Project and Transaction.
I am using Django to generate a web page that has a list of Projects in a table that belong to a given Account and I want to create a link to each project's details page. I am generating a link that converts the Project's key to string and includes that in the link to make it easy to lookup the Project object. This gives a link that looks like this:
My Project Name
Is it secure to create links like this? Is there a better way? It feels like a bad way to keep context.
The key string shows up in the linked page and is ugly. Is there a way to avoid showing it?
Thanks.

There are a few examples in the GAE docs that use the same approach, and keys use only characters that are safe to include in URLs. So there is probably no problem.
BTW, I prefer to use the numeric ID (obj_key.id()) when my model uses a number as its identifier, simply because it looks less ugly.

Whether or not this is 'secure' depends on what you mean by that, and how you implement your app. Let's back off a bit and see exactly what's stored in a Key object. Take your key, go to shell.appspot.com, and enter the following:
db.Key(your_key)
this returns something like the following:
datastore_types.Key.from_path(u'TestKind', 1234, _app=u'shell')
As you can see, the key contains the App ID, the kind name, and the ID or name (along with the kind/id pairs of any parent entities - in this case, none). Nothing here you should be particularly concerned about concealing, so there shouldn't be any significant risk of information leakage here.
You mention as a concern that users could guess other URLs - that's certainly possible, since they could decode the key, modify the ID or name, and re-encode the key. If your security model relies on them not guessing other URLs, though, you might want to do one of a couple of things:
Reconsider your app's security model. You shouldn't rely on 'secret URLs' for any degree of real security if you can avoid it.
Use a key name, and set it to a long, random string that users will not be able to guess.
A final concern is what else users could modify. If you handle keys by passing them to db.get, the user could change the kind name, and cause you to fetch a different entity kind to that which you intended. If that entity kind happens to have similarly named fields, you might do things to the entity (such as revealing data from it) that you did not intend. You can avoid this by passing the key to YourModel.get instead, which will check the key is of the correct kind before fetching it.
All this said, though, a better approach is to pass the key ID or name around. You can extract this by calling .id() on the key object (for an ID - .name() if you're using key names), and you can reconstruct the original key with db.Key.from_path('kind_name', id) - or just fetch the entity directly with YourModel.get_by_id.

After doing some more research, I think I can now answer my own question. I wanted to know if using GAE keys or ids was inherently unsafe.
It is, in fact, unsafe without some additional code, since a user could modify URLs in the returned web page or visit URLs that they build manually. This would potentially let an authenticated user edit another user's data just by changing a key ID in a URL.
So for every resource that you allow access to, you need to ensure that the currently authenticated user has the right to be accessing it in the way they are attempting.
This involves writing extra queries for each operation, since it seems there is no built-in way to just say "Users only have access to objects that are owned by them".
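
A minimal sketch of such a per-request ownership check, using a plain dict in place of the datastore. `PROJECTS`, `Forbidden` and `get_project_for_user` are hypothetical names, and the `owner` field is an assumption about how ownership is recorded.

```python
class Forbidden(Exception):
    """Raised when the current user does not own the requested resource."""


# Hypothetical in-memory stand-in for the datastore: project id -> entity.
PROJECTS = {
    1001: {"name": "Alpha", "owner": "alice"},
    1002: {"name": "Beta", "owner": "bob"},
}


def get_project_for_user(project_id, current_user):
    project = PROJECTS.get(project_id)
    # Treat "missing" and "not yours" the same way so ids cannot be probed.
    if project is None or project["owner"] != current_user:
        raise Forbidden(project_id)
    return project


own_project = get_project_for_user(1001, "alice")
```

Routing every handler through one helper like this keeps the check in a single place, so a guessed or edited id in a URL leads to a refusal rather than to another user's data.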

I know this is an old post, but I want to clarify one thing. Sometimes you NEED to work with keys.
When you have an entity with a @Parent relationship, you can't get it by its ID; you need to use the whole key to get it back from the Datastore. In these cases you need to work with the key all the time if you want to retrieve your entity.

They aren't simply increasing; I only have 10 entries in my Datastore and I've already reached 7001.
As long as there is some form of protection so users can't simply guess them, there is no reason not to do it.

Alternative to singleton?

I'm a Python & App Engine (and server-side!) newbie, and I'm trying to create a very simple CMS. Each deployment of the application would have one, and only one, company object, instantiated from something like:
from google.appengine.ext import db
class Company(db.Model):
    name = db.StringProperty()
    profile = db.TextProperty()
    addr = db.TextProperty()
I'm trying to provide the facility to update the company profile and other details.
My first thought was to have a Company entity singleton. But having looked at (although far from totally grasped) this thread I get the impression that it's difficult, and inadvisable, to do this.
So then I thought that perhaps for each deployment of the CMS I could, as a one-off, run a script (triggered by a totally obscure URL) which simply instantiates Company. From then on, I would get this instance with theCompany = Company.all()[0]
Is this advisable?
Then I remembered that someone in that thread suggested simply using a module. So I just created a Company.py file and stuck a few variables in it. I've tried this in the SDK and it seems to work; to my surprise, modified variable values "survived" between requests.
Forgive my ignorance, but I assume these values are only held in memory rather than on disk, unlike Datastore data? Is this a robust solution? (And would the module variables be in scope for all invocations of my application's scripts?)

Global variables are "app-cached." This means that each particular instance of your app will remember these variables' values between requests. However, when an instance is shutdown these values will be lost. Thus I do not think you really want to store these values in module-level variables (unless they are constants which do not need to be updated).
I think your original solution will work fine. You could even create the original entity using the remote API tool so that you don't need an obscure page to instantiate the one and only Company object.
You can also make the retrieval of the singleton Company entity a bit faster if you retrieve it by key.
If you will need to retrieve this entity frequently, then you can avoid round trips to the datastore by using a caching technique. The fastest would be to app-cache the Company entity after you've retrieved it from the datastore. To protect against the entity becoming too out of date, you can also app-cache the time you last retrieved the entity, and if that time is more than N seconds old, re-fetch it from the datastore. For more details on this option and how it compares to alternatives, check out Nick Johnson's article Storage options on App Engine.
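
The app-cache-with-timestamp idea could look roughly like this. It is a sketch only: `fetch` is a hypothetical callable standing in for the datastore read, `MAX_AGE_SECONDS` is an arbitrary choice for N, and the `now` parameter exists just to make the clock replaceable.

```python
import time

_cached_company = None   # module-level, so it survives between requests
_cached_at = 0.0         # time of the last fetch
MAX_AGE_SECONDS = 60     # the "N seconds" from the text; the value is arbitrary


def get_company(fetch, now=time.time):
    """Return the cached entity, re-fetching it when older than MAX_AGE_SECONDS.

    `fetch` is a hypothetical callable that loads the entity from the datastore.
    """
    global _cached_company, _cached_at
    if _cached_company is None or now() - _cached_at > MAX_AGE_SECONDS:
        _cached_company = fetch()
        _cached_at = now()
    return _cached_company
```

Each instance keeps its own copy, so after an edit different instances may serve the old value for up to `MAX_AGE_SECONDS`.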

It sounds like you are trying to provide a way for your app to be configurable on a per-application basis.
Why not use the datastore to store your company entity with a key_name? Then you will always know how to fetch the company entity, and you'll be able to edit the company without redeploying.
company = Company(key_name='c')
# set stuff on company....
company.put()
# later in code...
company = Company.get_by_key_name('c')
Use memcache to store the details of the company and avoid repeated datastore calls.
In addition to memcache, you can use module variables to cache the values. They are cached, as you have seen, between requests.

I think the approach you read about is the simplest:
Use module variables, initialized to None.
Provide accessors (get/setters) for these variables.
When a variable is accessed, if its value is None, fetch it from the database. Otherwise, just use it.
This way, you'll have app-wide variables provided by the module (which won't be instantiated again and again), they will be shared and you won't lose them.
