How to get all the keys from one bucket in Couchbase? - Python

Using the Python SDK, I could not find how to get all the keys from one bucket
in Couchbase.
Docs reference:
http://docs.couchbase.com/sdk-api/couchbase-python-client-2.2.0/api/couchbase.html#item-api-methods
https://github.com/couchbase/couchbase-python-client/tree/master/examples
https://stackoverflow.com/questions/27040667/how-to-get-all-keys-from-couchbase
Is there a simple way to get all the keys?

I'm a little concerned as to why you would want every single key. The number of documents can get very large, and I can't think of a good reason to want every single key.
That being said, here are a couple of ways to do it in Couchbase:
N1QL. First, create a primary index (CREATE PRIMARY INDEX ON bucketname), then select the keys: SELECT META().id FROM bucketname; In Python, you can use N1QLQuery and N1QLRequest to execute these.
Create a map/reduce view index. Literally the default map function when you create a new map/reduce view index is exactly that: function (doc, meta) { emit(meta.id, null); }. In Python, use the View class.
You don't need Python to do these things, by the way, but you can use it if you'd like. Check out the documentation for the Couchbase Python SDK for more information.
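For the N1QL route, a minimal sketch with the 2.x Python SDK might look like the following. It assumes the primary index has already been created; iter_document_keys is a hypothetical helper name, and the sketch relies on Bucket.n1ql_query accepting a plain statement string (the 2.x SDK wraps it in an N1QLQuery for you). The connection string in the usage comment is also an assumption.

```python
def iter_document_keys(bucket, bucket_name):
    """Yield every document key in a bucket via N1QL.

    `bucket` is expected to be a connected couchbase.bucket.Bucket (SDK 2.x).
    Assumes a primary index exists: CREATE PRIMARY INDEX ON `bucketname`
    """
    # Backticks guard bucket names that contain characters such as '-'.
    statement = "SELECT META().id FROM `{}`".format(bucket_name)
    # Each row comes back as a dict such as {"id": "some-key"}.
    for row in bucket.n1ql_query(statement):
        yield row["id"]


# Example usage (assumes a local cluster and a bucket named "bucketname"):
#   from couchbase.bucket import Bucket
#   bucket = Bucket("couchbase://localhost/bucketname")
#   for key in iter_document_keys(bucket, "bucketname"):
#       print(key)
```

Because this is a generator, keys are handed over as the rows arrive rather than being collected into one big list, which matters when the bucket holds many documents.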

"I'm a little concerned as to why you would want every single key. The number of documents can get very large, and I can't think of a good reason to want every single key."
There is a document for every customer with the key being the username for the customer. That username is only held as a one-way hash (along with the password) for authentication. It is not stored in its original form or in a form from which the original can be recovered. It's not feasible to ask the 100 million customers to provide their userids. This came from an actual customer on #seteam.

Related

Is there a way to check if a key=value exists in a DynamoDB table?

I want to check if a specific key has a specific value in a DynamoDB table with/without retrieving the entire item. Is there a way to do this in Python using boto3?
Note: I am looking to match a sort key with its value and check if that specific key-value pair exists in the table.
It sounds like you want to fetch an item by its sort key alone. While this is possible with the scan operation, it's not ideal.
DynamoDB gives us three ways to fetch data: getItem, query and scan.
The getItem operation allows you to fetch a single item using its primary key. The query operation can fetch multiple items within the same partition, but requires you to specify the partition key (and optionally the sort key). The scan operation lets you fetch items by specifying any attribute.
Therefore, if you want to fetch data from DynamoDB without using the full primary key or partition key, you can use the scan operation. However, be careful when using scan. From the docs:
The Scan operation returns one or more items and item attributes by accessing every item in a table or a secondary index.
The scan operation can be horribly inefficient if not used carefully. If you find yourself using scans frequently in your application or in a highly trafficked area of your app, you probably want to reorganize your data model.
What Seth said is 100% accurate, however, if you can add a GSI you can use the query option on the GSI. You could create a GSI that is just the value of the sort key, allowing you to query for records that match that sort key. You can even use the same field, and if you don't need any of the data you can just project the keys, keeping the cost relatively low.
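As a rough sketch of the GSI approach in boto3: the function below queries an index whose partition key is the table's sort key attribute and asks only for a count. The helper name sort_key_value_exists and the index and attribute names in the usage comment are assumptions for illustration; `table` is expected to be a boto3 Table resource.

```python
def sort_key_value_exists(table, index_name, key_name, value):
    """Return True if at least one item in the GSI has key_name == value.

    `table` is expected to be a boto3 DynamoDB Table resource, e.g.
    boto3.resource("dynamodb").Table("my-table") (table name is hypothetical).
    """
    response = table.query(
        IndexName=index_name,
        # Placeholders avoid clashes with DynamoDB reserved words.
        KeyConditionExpression="#k = :v",
        ExpressionAttributeNames={"#k": key_name},
        ExpressionAttributeValues={":v": value},
        Select="COUNT",  # return only a count, not the items themselves
        Limit=1,         # one match is enough to answer "does it exist?"
    )
    return response["Count"] > 0


# Example usage (index and attribute names are hypothetical):
#   import boto3
#   table = boto3.resource("dynamodb").Table("my-table")
#   sort_key_value_exists(table, "sort-key-index", "sk", "order#42")
```

Since only a count comes back, no item attributes are transferred, which is in the same spirit as projecting just the keys on the index.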

Is it bad practice to store a table in a database with no primary key?

I am currently working on a list implementation in python that stores a persistent list as a database:
https://github.com/DarkShroom/sqlitelist
I am tackling a design consideration: it seems that SQLite allows me to store the data without a primary key?
self.c.execute('SELECT * FROM unnamed LIMIT 1 OFFSET {}'.format(key))
this line of code can retrieve by absolute row reference
Is this bad practice? Will I lose the data order at any point? Perhaps it's OK with SQLite, but my design will not translate to other database engines? Any thoughts from people more familiar with databases would be helpful. I am writing this so I don't have to deal with databases!
The documentation says:
If a SELECT statement that returns more than one row does not have an ORDER BY clause, the order in which the rows are returned is undefined.
So you cannot simply use OFFSET to identify rows.
A PRIMARY KEY constraint just tells the database that it must enforce UNIQUE and NOT NULL constraints on the PK columns. If you do not declare a PRIMARY KEY, these constraints are not automatically enforced, but this does not change the fact that you have to identify your rows somehow when you want to access them.
The easiest way to store list entries is to have the position in the list as a separate column. (If your program takes up most of its time inserting or deleting list entries, it might be a better idea to store the list not as an array but as a linked list, i.e., the database does not store the position but a pointer to the next entry.)
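A minimal sketch of the position-column idea with Python's sqlite3 module could look like this. The table name list_items and the helper names are made up for illustration, and only append and lookup are shown; inserting or deleting in the middle would also have to renumber the later positions.

```python
import sqlite3


def create_list(conn):
    # The list index is stored explicitly instead of relying on row order.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS list_items ("
        "position INTEGER PRIMARY KEY, value TEXT)"
    )


def append(conn, value):
    # Next position is one past the current maximum (0 for an empty list).
    (next_pos,) = conn.execute(
        "SELECT COALESCE(MAX(position) + 1, 0) FROM list_items"
    ).fetchone()
    conn.execute(
        "INSERT INTO list_items (position, value) VALUES (?, ?)",
        (next_pos, value),
    )


def get_item(conn, index):
    # Look the row up by its stored position, not by OFFSET.
    row = conn.execute(
        "SELECT value FROM list_items WHERE position = ?", (index,)
    ).fetchone()
    if row is None:
        raise IndexError("list index out of range")
    return row[0]


def all_items(conn):
    # ORDER BY makes the returned order well defined.
    rows = conn.execute("SELECT value FROM list_items ORDER BY position")
    return [row[0] for row in rows]
```

Lookups use a parameterized WHERE clause on the position, so they do not depend on whatever order the engine happens to return rows in.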

Python, gae, ndb - get all keys in a kind

I know how to get all entities by a key using Book.get_by_id(key)
where Book is an ndb.Model.
How do I get all the keys within my Kind?
Is it done using fetch() (https://cloud.google.com/appengine/docs/python/ndb/queryclass#Query_fetch)?
I don't want to get the keys/IDs from a given entity or some value. I just want to retrieve all the available keys, so I can retrieve their respective entities and display them all to the user.
If you only want the keys, use the keys_only keyword in the fetch() method:
Book.query().fetch(keys_only=True)
Then you can fetch all the entities using ndb.get_multi(keys). According to Guido, this may be more efficient than returning the entities in the query (if the entities are already in the cache).
With all_books = Book.query().fetch() the all_books variable will now have every entity of your Book model.
Note, though, that when you have lots of entities in the Book model, it won't be a good idea to load and show them all at once. You will need some kind of pagination (depending on what exactly you're doing); otherwise your pages will take a very long time to load, which makes for a bad experience for your users.
Read more at https://cloud.google.com/appengine/docs/python/ndb/queries
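For the pagination part, one possible sketch uses the query's fetch_page() method, which returns the results together with a cursor and a flag saying whether more results remain. The helper name iter_pages and the default page size are assumptions; `query` is expected to be an ndb query such as Book.query().

```python
def iter_pages(query, page_size=20):
    """Yield lists of entities, one page at a time.

    `query` is expected to be an ndb Query, e.g. Book.query().
    """
    cursor = None
    more = True
    while more:
        # fetch_page returns (results, next_cursor, more_flag).
        results, cursor, more = query.fetch_page(page_size, start_cursor=cursor)
        if results:
            yield results


# Example usage (Book is the model from the question):
#   for page in iter_pages(Book.query(), page_size=50):
#       for book in page:
#           ...
```

In a web handler you would more likely fetch a single page per request and hand the cursor back to the client, rather than looping over every page in one go.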
If you only wish to get all keys, just use
entity.query().fetch(keys_only=True)
which will return a list of all keys of that kind. If you want the IDs rather than the keys, you can also use:
map(lambda key: key.id(), entity.query().fetch(keys_only=True))

Using a sorted set for a notification system

I am using Redis sorted sets to save user notifications. But as I have never built a notification system before, I am asking about my logic.
I need to save 4 things for each notification.
post_id
post_type - A/B
visible - Y/N
checked - Y/N
My question is how can I store this type of structure in sorted sets?
ZADD users_notifications:1 10 1_A_Y_Y
ZADD users_notifications:1 20 2_A_Y_N
....
Is there a better way to do this type of thing in Redis? In the case above I am saving the four things in each element, and I need to split by the underscore in the server language.
It really depends on how you need to query the data.
The most common way to approach this problem is to use a sorted set for the order and a hash for each object.
So:
ZADD notifications:<user-id> <timestamp> <post-id>
HMSET notifications:<user-id>:<post-id> type <type> visible <visible> checked <checked>
You'd use ZRANGE to get the latest notifications in order and then a pipelined call to HMGET to get the attributes for each object.
As I mentioned, it depends on how you need to access the data. If, for example, you always show visible and unchecked notifications to a user, then you probably want to store those IDs in a different sorted set, so that you don't have to query for the status.
Assuming you have such a sorted set, when a user dismisses a notification you'd do:
HSET notifications:<user-id>:<post-id> visible 0
ZREM notifications:<user-id>:visible <post-id>
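Put together in Python, the sorted set plus hash layout might be sketched as below. This assumes the redis-py client (3.5 or later, for hset with mapping) created with decode_responses=True so that members come back as str; the function names are made up for illustration, and `r` is the client instance.

```python
import time

FIELDS = ["type", "visible", "checked"]


def add_notification(r, user_id, post_id, post_type, timestamp=None):
    score = time.time() if timestamp is None else timestamp
    pipe = r.pipeline()
    # Sorted sets keep the ordering; the hash keeps the attributes.
    pipe.zadd("notifications:{}".format(user_id), {post_id: score})
    pipe.zadd("notifications:{}:visible".format(user_id), {post_id: score})
    pipe.hset(
        "notifications:{}:{}".format(user_id, post_id),
        mapping={"type": post_type, "visible": 1, "checked": 0},
    )
    pipe.execute()


def latest_notifications(r, user_id, count=10):
    # Newest first; assumes decode_responses=True so post_ids are str.
    post_ids = r.zrevrange("notifications:{}".format(user_id), 0, count - 1)
    pipe = r.pipeline()
    for post_id in post_ids:
        pipe.hmget("notifications:{}:{}".format(user_id, post_id), FIELDS)
    # One round trip for all hashes, then pair each ID with its fields.
    return [
        (post_id, dict(zip(FIELDS, values)))
        for post_id, values in zip(post_ids, pipe.execute())
    ]


def dismiss_notification(r, user_id, post_id):
    pipe = r.pipeline()
    pipe.hset("notifications:{}:{}".format(user_id, post_id), "visible", 0)
    pipe.zrem("notifications:{}:visible".format(user_id), post_id)
    pipe.execute()
```

The extra notifications:<user-id>:visible set is the "different sorted set" mentioned above, so listing visible notifications does not require reading every hash first.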

Is it safe to pass Google App Engine Entity Keys into web pages to maintain context?

I have a simple GAE system that contains models for Account, Project and Transaction.
I am using Django to generate a web page that has a list of Projects in a table that belong to a given Account and I want to create a link to each project's details page. I am generating a link that converts the Project's key to string and includes that in the link to make it easy to lookup the Project object. This gives a link that looks like this:
My Project Name
Is it secure to create links like this? Is there a better way? It feels like a bad way to keep context.
The key string shows up in the linked page and is ugly. Is there a way to avoid showing it?
Thanks.
There are a few examples in the GAE docs that use the same approach, and keys use only characters that are safe to include in URLs. So there is probably no problem.
BTW, I prefer to use the numeric ID (obj_key.id()) when my model uses a number as its identifier, simply because it looks less ugly.
Whether or not this is 'secure' depends on what you mean by that, and how you implement your app. Let's back off a bit and see exactly what's stored in a Key object. Take your key, go to shell.appspot.com, and enter the following:
db.Key(your_key)
this returns something like the following:
datastore_types.Key.from_path(u'TestKind', 1234, _app=u'shell')
As you can see, the key contains the App ID, the kind name, and the ID or name (along with the kind/id pairs of any parent entities - in this case, none). Nothing here you should be particularly concerned about concealing, so there shouldn't be any significant risk of information leakage here.
You mention as a concern that users could guess other URLs - that's certainly possible, since they could decode the key, modify the ID or name, and re-encode the key. If your security model relies on them not guessing other URLs, though, you might want to do one of a couple of things:
Reconsider your app's security model. You shouldn't rely on 'secret URLs' for any degree of real security if you can avoid it.
Use a key name, and set it to a long, random string that users will not be able to guess.
A final concern is what else users could modify. If you handle keys by passing them to db.get, the user could change the kind name, and cause you to fetch a different entity kind to that which you intended. If that entity kind happens to have similarly named fields, you might do things to the entity (such as revealing data from it) that you did not intend. You can avoid this by passing the key to YourModel.get instead, which will check the key is of the correct kind before fetching it.
All this said, though, a better approach is to pass the key ID or name around. You can extract this by calling .id() on the key object (for an ID - .name() if you're using key names), and you can reconstruct the original key with db.Key.from_path('kind_name', id) - or just fetch the entity directly with YourModel.get_by_id.
After doing some more research, I think I can now answer my own question. I wanted to know if using GAE keys or ids was inherently unsafe.
It is, in fact, unsafe without some additional code, since a user could modify URLs in the returned web page or visit URLs that they build manually. This would potentially let an authenticated user edit another user's data just by changing a key ID in a URL.
So for every resource that you allow access to, you need to ensure that the currently authenticated user has the right to be accessing it in the way they are attempting.
This involves writing extra queries for each operation, since it seems there is no built-in way to just say "Users only have access to objects that are owned by them".
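A per-request ownership check along those lines could be sketched like this. The owner_id attribute, the Forbidden exception, and the helper name are all hypothetical; `model_cls` is expected to be a datastore model class that offers get_by_id, such as the Project model from the question.

```python
class Forbidden(Exception):
    """Raised when the current user may not access the requested entity."""


def get_owned_entity(model_cls, entity_id, current_user_id):
    # get_by_id only looks up entities of model_cls's own kind, so an
    # ID taken from the URL cannot be used to fetch a different kind.
    entity = model_cls.get_by_id(entity_id)
    # Treat "missing" and "not yours" the same so IDs cannot be probed.
    # owner_id is a hypothetical property linking the entity to its owner.
    if entity is None or entity.owner_id != current_user_id:
        raise Forbidden("no such entity for this user")
    return entity


# Example usage inside a request handler (names are hypothetical):
#   project = get_owned_entity(Project, int(project_id), account_id)
```

Calling a helper like this at the top of every view that takes an ID from the URL keeps the check in one place instead of repeating the comparison in each handler.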
I know this is an old post, but I want to clarify one thing. Sometimes you NEED to work with KEYs.
When you have an entity with a #Parent relationship, you can't get it by its ID; you need to use the whole KEY to get it back from the Datastore. In these cases you need to work with the KEY all the time if you want to retrieve your entity.
They aren't simply increasing; I only have 10 entries in my Datastore and I've already reached 7001.
As long as there is some form of protection so users can't simply guess them, there is no reason not to do it.
