I'm using Google App Engine NDB. Sometimes I will want to get all users with a phone number in a specified list. Using queries is extremely expensive for this, so I thought I'll just make the id value of the User entity the phone number of the user so I can fetch directly by ids.
The problem is that the phone number field is optional, so initially a User entity is created without a phone number, and thus with no value for id. So it would be created as user = User() as opposed to user = User(id = phone_number).
So when a user at a later point decides to add a phone number to his account, is there any way to modify that User entity's id value to the new phone number?
The entity ID forms part of the primary key for the entity, so there's no way to change it. Changing it is identical to creating a new entity with the new key and deleting the old one - which is one thing you can do, if you want.
A better solution would be to create a PhoneNumber kind that provides a reference to the associated User, allowing you to do lookups with get operations, but not requiring every user to have exactly one phone number.
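The separate-kind idea can be sketched in plain Python. This is only an illustration of the pattern: the two dicts stand in for the datastore, and the names `users`, `phone_numbers`, `set_phone` and `users_for_numbers` are hypothetical, not NDB API. In NDB the second dict would be a PhoneNumber kind whose id is the number and which holds the User's key.

```python
# Plain-Python stand-in for the datastore: each dict maps an entity key to an entity.
users = {}          # user_id -> user data (the "User" kind)
phone_numbers = {}  # phone number -> user_id (the "PhoneNumber" kind)


def set_phone(user_id, new_number, old_number=None):
    """Attach a number to a user; the User key itself never changes."""
    if new_number in phone_numbers and phone_numbers[new_number] != user_id:
        raise ValueError("number already belongs to another user")
    # Changing a number means deleting the old index entry and adding a new one.
    if old_number is not None and phone_numbers.get(old_number) == user_id:
        del phone_numbers[old_number]
    phone_numbers[new_number] = user_id


def users_for_numbers(numbers):
    """Direct lookups by key, one per number, instead of an IN query."""
    return {n: phone_numbers[n] for n in numbers if n in phone_numbers}


users[1] = {"name": "alice"}                        # created without a phone number
set_phone(1, "+15550001")                           # number added later
set_phone(1, "+15550002", old_number="+15550001")   # number changed later
found = users_for_numbers(["+15550001", "+15550002", "+15550003"])
```

Only the small index entry is deleted and recreated when a number changes, so anything that refers to the User by key keeps working.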
Related
I want to create a table(postgres) that stores data about what items were viewed by what user. authenticated users are no problem but how can I tell one anonymous user from another anonymous user? This is needed for analysis purposes.
maybe store their IP address as unique ID? How can I do this?
I think you should use cookies.
When a user that is not authenticated makes a request, look for a cookie named whatever you like ("nonuserid" in this case). If the cookie is not present, it is a new user, so you should set the cookie with a random id. If it is present, you can use the id in it to identify the anonymous user.
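A minimal sketch of that cookie check, using only the Python standard library. The function name `identify_visitor` and the one-year lifetime are assumptions for illustration; a web framework would normally parse and set cookies for you.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "nonuserid"


def identify_visitor(cookie_header):
    """Return (visitor_id, set_cookie_value_or_None) for an incoming request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    if COOKIE_NAME in cookie:
        # Returning visitor: reuse the id, nothing new to send back.
        return cookie[COOKIE_NAME].value, None
    # New visitor: mint a random id and build the Set-Cookie value.
    visitor_id = uuid.uuid4().hex
    out = SimpleCookie()
    out[COOKIE_NAME] = visitor_id
    out[COOKIE_NAME]["path"] = "/"
    out[COOKIE_NAME]["max-age"] = 60 * 60 * 24 * 365  # assumed lifetime: one year
    return visitor_id, out[COOKIE_NAME].OutputString()


first_id, set_cookie = identify_visitor("")
second_id, nothing_to_set = identify_visitor("%s=%s" % (COOKIE_NAME, first_id))
```

The returned id is what you would store in the views table in place of a user id.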
Option 1
Use the IP address. Check out this answer. There is no need to write the IP into the session because you can always get the client IP when you receive a request.
Option 2
Generate a unique ID with uuid, as the docs describe, and store it in the session under a given name, say USER_ID.
When you get a request from a user, check whether USER_ID is in the session. If so, read its value and write a record to the database such as "user id visited page X". If not, generate an ID and set it.
I asked another question about doing large queries in GAE, to which the answer was pretty much not possible.
What I want to do is this: from an iOS device, I get all the user's contacts phone numbers. So now I have a list of say 250 phone numbers. I want to send these phone numbers back to the server and check to see which of these phone numbers belong to a User account.
So I need to do a query: query = User.query(User.phone.IN(phones_list))
However, with GAE, this is quite an expensive query. It will cost 250 reads for just this one query, and I expect to do this type of query often.
So I came up with a crazy idea. Why don't I host the phone numbers on another host, in another database, where this type of query is cheaper? Then I can have GAE send an HTTP request to my other server to get the desired info.
So I have two questions:
Are there any databases better suited to handling these kinds of queries, where it would be cheaper to do? Or will it all be the same as GAE?
Is this overkill? Is it a good idea? Should I suck it up and pay the cost?
GAE's datastore should be good enough for your service, since your application looks like it could be parallelized very well.
1. use phone number as key_name of User.
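Once the number is the key_name of User, the following code will speed up the lookup and reduce the number of read operations.
# phone_numbers is the list of numbers to check; each User's key_name is its number
cached = memcache.get_multi(phone_numbers)
missing = [n for n in phone_numbers if n not in cached]
fetched = db.get([db.Key.from_path('User', n) for n in missing])
memcache.set_multi(dict((n, u) for n, u in zip(missing, fetched) if u is not None))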
2. store multiple numbers in one datastore entity.
The operation cost of GAE is not directly related to an entity's size, so a large entity that stores many values is another way to save on operation cost.
For example, store together all the phone numbers that share the same number_prefix.
class Number(db.Model):
    # assumption: the entity's key_name is the prefix, so buckets can be fetched by key
    number_prefix = db.StringProperty()
    numbers = db.StringListProperty(indexed=False)

# check numbers 01234567 and 032123124
buckets = Number.get_by_key_name(["01", "03"])
# is 01234567 in buckets[0].numbers?
# is 032123124 in buckets[1].numbers?
This method could be further improved with memcache.
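The prefix-bucket idea from the second approach can be sketched in plain Python. The dict `buckets` stands in for the Number kind, and `PREFIX_LEN` and `registered` are hypothetical names; the two-digit prefix is an assumption taken from the "01" and "03" example above.

```python
from collections import defaultdict

PREFIX_LEN = 2  # assumption: bucket by the first two digits

# Stand-in for the Number kind: prefix (the key name) -> set of registered numbers.
buckets = {
    "01": {"01234567", "01999999"},
    "03": {"03555555"},
}


def registered(numbers):
    """Return the subset of numbers that appear in their prefix bucket."""
    by_prefix = defaultdict(list)
    for n in numbers:
        by_prefix[n[:PREFIX_LEN]].append(n)
    # One fetch per distinct prefix, not one read per number.
    fetched = {p: buckets.get(p, set()) for p in by_prefix}
    return [n for n in numbers if n in fetched[n[:PREFIX_LEN]]]


hits = registered(["01234567", "032123124", "09000000"])
```

With 250 numbers spread over a handful of prefixes, the number of entities read is the number of distinct prefixes rather than 250. The trade-off is that each bucket grows with the user base, so the prefix length has to keep entities under the size limit.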
Generalizing slightly on other ideas offered... assuming that all your search keys are unique to a single User (e.g. email, phone, twitter handle, etc.)
At User write time, you can generate a set of SearchIndex(...) and persist that. Each SearchIndex has the key of the User.
Then at search time you can construct the keys for any SearchIndex and do two ndb.get_multi_async calls. The first to get matching SearchIndex entities, and the second to get the Users associated with those index entities.
From my understanding, @db.transactional(xg=True) allows for transactions across groups, however the following code returns "queries inside transactions must have ancestors".
@db.transactional(xg=True)
def insertUserID(self, userName):
    user = User.gql("WHERE userName = :1", userName).get()
    highestUser = User.all().order('-userID').get()
    nextUserID = highestUser.userID + 1
    user.userID = nextUserID
    user.put()
Do you need to pass in the key for each entity despite being a cross group transaction? Can you please help modify this example accordingly?
An XG transaction can span at most 25 entity groups. An ancestor query is limited to a single entity group, so inside one XG transaction you can run ancestor queries against any of those 25 groups.
A transactional query without parent would potentially include all entity groups in the application and lock everything up, so you get an error message instead.
In App Engine one usually tries to avoid monotonically increasing ids. The auto-assigned ones might go like 101, 10001, 10002 and so on. If you know that you need monotonically increasing ids and that it will work for you performance-wise, how about:
1. Have some kind of model representation of userId to enable key_name usage and direct lookup.
2. Query for userId outside the transaction to get the highest candidate id.
3. In the transaction, do get_or_insert: look up UserId.get_by_key_name(candidateid+1). If it is already present and pointing to a different user, try again with +2 and so on until you find a free one and create it, updating the userid attribute of the user at the same time.
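The retry loop in the steps above can be sketched in plain Python. The dict `user_ids` stands in for the UserId kind, and the local `get_or_insert` only mimics the behaviour of the datastore call of the same name; `claim_next_id` is a hypothetical helper. In the real thing each attempt would run inside its own transaction.

```python
user_ids = {}  # stands in for the UserId kind: candidate id -> user name


def get_or_insert(candidate, user_name):
    """Mimic get_or_insert: create the entry if absent, then return its owner."""
    return user_ids.setdefault(candidate, user_name)


def claim_next_id(highest_seen, user_name):
    """Walk upward from the queried maximum until an id is free or already ours."""
    candidate = highest_seen + 1
    while get_or_insert(candidate, user_name) != user_name:
        candidate += 1
    return candidate


user_ids[7] = "alice"              # someone claimed 7 after our query returned 6
bob_id = claim_next_id(6, "bob")   # 7 is taken, so bob gets 8
bob_again = claim_next_id(6, "bob")  # retrying is safe: bob still gets 8
```

Because an entry that already points at the same user counts as success, a retried request does not burn a second id.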
If the XG-transaction of updating UserId+User is too slow, perhaps create UserId+task in transaction (not XG), and let the executing task associate UserId and User afterwards. Or a single backend that can serialize UserId creation and perhaps allow put_async if you retry to avoid holes in the sequence and do something like 50 creations per second.
If it's possible to use userName as key_name you can do direct lookup instead of query and make things faster and cheaper.
Cross group transactions allow you to perform a transaction across multiple groups, but they don't remove the prohibition on queries inside transactions. You need to perform the query outside the transaction, and pass the ID of the entity in (and then check any invariants specified in the query still hold) - or, as Shay suggests, use IDs so you don't have to do a query in the first place.
Every datastore entity has a key, and a key (among other things) has either a numeric id that App Engine assigns to it or a key_name that you give it.
In your case it looks like you can use the numeric id. After you call put() on the user entity you will have user.key().id() (or user.key.id() if you're using NDB), which will be unique for each user (as long as all the users have the same parent, which is None in your code).
This id is not sequential but is guaranteed to be unique.
I am setting up a simple billing system, where I have a table that lists users and the day they are supposed to be billed. Each user only has one row of this table associated with them.
I need to query the database on a daily basis to get a list of users to be billed on that day.
The model is:
class BillingDay(models.Model):
    user = models.ForeignKey(User)
    day = models.IntegerField(max_length=2)
How would I query against the day field? User.objects.filter(billingday=1) looks at the ID, but I need to get a list of users with 1 as the value for day in billingday.
User.objects.filter(billingday__day=1)
Just as a note, though, you might want to rethink how you're setting this up before you get too far down the rabbit hole. Will users have multiple billing days? My guess would be no. If that's the case, there's no reason for a BillingDay model. It only adds complexity and fragments data. The billing day could just be a field on your user profile.
Now, creating a user profile for a User is in principle no different than having a BillingDay model as a way to add extra data to User, but it's far more extensible. Django has built-in methods for associating a user profile with every User, and you can add more data to the same user profile object over time. BillingDay, by contrast, would be relegated to just one data point, and you'd later have to add additional models (more complexity and fragmentation of data) for other data points down the line.
See Django's documentation on user profiles.
I'm working on an application that lets registered users create or upload content, and allows anonymous users to view that content and browse registered users' pages to find that content - this is very similar to how a site like Flickr, for example, allows people to browse its users' pages.
To do this, I need a way to identify the user in the anonymous HTTP GET request. A user should be able to type http://myapplication.com/browse/<userid>/<contentid> and get to the right page. The userid should be unique, but mustn't be something like the user's email address, for privacy reasons.
Through Google App Engine, I can get the email address associated with the user, but like I said, I don't want to use that. I can have users of my application pick a unique user name when they register, but I would like to make that optional if at all possible, so that the registration process is as short as possible.
Another option is to generate some random cookie (a GUID?) during the registration process and use that, but I don't see an obvious way of guaranteeing uniqueness of such a cookie without a trip to the database.
Is there a way, given an App Engine user object, of getting a unique identifier for that object that can be used in this way?
I'm looking for a Python solution - I forgot that GAE also supports Java now. Still, I expect the techniques to be similar, regardless of the language.
Your timing is impeccable: Just yesterday, a new release of the SDK came out, with support for unique, permanent user IDs. They meet all the criteria you specified.
I think you should distinguish between two types of users:
1) users that have logged in via Google Accounts or that have already registered on your site with a non-google e-mail address
2) users that opened your site for the first time and are not logged in in any way
For the second case, I can see no other way than to generate some random string (e.g. via uuid.uuid4() or from this user's session cookie key), as an anonymous user does not carry any unique information with himself.
For users that are logged in, however, you already have a unique identifier -- their e-mail address. I agree with your privacy concerns -- you shouldn't use it as an identifier. Instead, how about generating a string that seems random, but is in fact generated from the e-mail address? Hashing functions are perfect for this purpose. Example:
>>> import hashlib
>>> email = 'user@host.com'
>>> salt = 'SomeLongStringThatWillBeAppendedToEachEmail'
>>> key = hashlib.sha1('%s$%s' % (email, salt)).hexdigest()
>>> print key
f6cd3459f9a39c97635c652884b3e328f05be0f7
hashlib.sha1 is not a random function: for given data it always returns the same result. It is also considered practically irreversible, so you can present the hashed key on the website without compromising the user's e-mail address, provided the salt stays secret. You can also safely assume that no two hashes of distinct e-mails will be the same (they can be, but the probability of it happening is very, very small). For more information on hashing functions, consult the Wikipedia entry.
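The session above is Python 2. A rough Python 3 equivalent is sketched below; the main difference is that hashlib wants bytes, so the string has to be encoded first. The function name `public_id` is hypothetical, and the salt is the placeholder from the example.

```python
import hashlib

# Placeholder salt from the example above; keep the real one secret and server-side.
SALT = "SomeLongStringThatWillBeAppendedToEachEmail"


def public_id(email):
    """Derive a stable, opaque identifier from an e-mail address."""
    data = ("%s$%s" % (email, SALT)).encode("utf-8")
    return hashlib.sha1(data).hexdigest()


id_a = public_id("user@host.com")
id_b = public_id("other@host.com")
```

The same e-mail always maps to the same 40-character hex string, so no database lookup is needed to recompute it. Swapping in hashlib.sha256 works the same way and gives a longer identifier.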
Do you mean session cookies?
Try http://code.google.com/p/gaeutilities/
What DzinX said. The only way to create an opaque key that can be authenticated without a database roundtrip is using encryption or a cryptographic hash.
Give the user a random number and hash it or encrypt it with a private key. You still run the (tiny) risk of collisions, but you can avoid this by touching the database on key creation, changing the random number in case of a collision. Make sure the random number is cryptographic, and add a long server-side random number to prevent chosen plaintext attacks.
You'll end up with a token like the Google Docs key, basically a signature proving the user is authenticated, which can be verified without touching the database.
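One way to sketch such a self-verifying token is with an HMAC, which is a keyed hash rather than the "hash or encrypt" wording used above, so treat it as a substituted technique. The names `SERVER_SECRET`, `make_token` and `verify_token` are hypothetical, and a real deployment would load the secret from configuration instead of generating it at import time.

```python
import hashlib
import hmac
import secrets

SERVER_SECRET = secrets.token_bytes(32)  # hypothetical; normally loaded from config


def _sign(nonce, secret):
    return hmac.new(secret, nonce.encode("utf-8"), hashlib.sha256).hexdigest()


def make_token(secret=SERVER_SECRET):
    """Return 'nonce.signature', where the nonce is a cryptographic random value."""
    nonce = secrets.token_hex(16)
    return nonce + "." + _sign(nonce, secret)


def verify_token(token, secret=SERVER_SECRET):
    """Check the signature without any database lookup."""
    nonce, sep, sig = token.partition(".")
    if not sep:
        return False
    expected = _sign(nonce, secret)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8"))


token = make_token()
# Change the first character of the nonce to simulate tampering.
tampered = ("1" if token[0] == "0" else "0") + token[1:]
```

Verification only proves the server issued the token. Guaranteeing that no two users receive the same nonce still needs the database check on creation described above.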
However, given the pricing of GAE and the speed of bigtable, you're probably better off using a session ID if you really can't use Google's own authentication.