I have a question about updating a field in the GAE datastore. My problem looks like this:
from google.appengine.ext import db

class A(db.Model):
    a = db.StringProperty()
and then I added a boolean field:
class A(db.Model):
    a = db.StringProperty()
    b = db.BooleanProperty(default=False)
Now I'd like every instance of the model to have b == False.
To update them I could of course pull every entity out of the datastore and put it back, but there are already 700k entities and I don't know how to do that efficiently. I can't fetch them all at once because I get "soft memory limit exceeded" errors, and if I do it in small chunks it costs me a lot of datastore read operations. Do you have any other idea how I could update my datastore?
Cheers
I agree with @ShayErlichmen. However, if you really want to update every entity, the easiest way is to use the MapReduce library:
http://code.google.com/p/appengine-mapreduce/
It's not as easy as it sounds, because the documentation sucks, but this is the place to start:
http://code.google.com/p/appengine-mapreduce/wiki/GettingStartedInPython
You just write a function foo() that checks the value of each entity passed to it and, if necessary, updates the value of your Boolean and writes the entity back.
The library will grab batches of entities and send each batch to a separate task. Each task runs a loop calling your function foo(). Note that the batches run in parallel, so it may spin up a few instances at once, but it tends to be quick.
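A minimal sketch of such a foo(), written as a plain-Python stand-in so it runs without App Engine. FakeEntity and run_in_batches are hypothetical: the first mimics model A, the second only imitates what the MapReduce library does for you (hand batches of entities to the mapper). In the real library I believe the mapper yields datastore operations from mapreduce.operation rather than the entity itself, so check the getting-started page linked above.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class FakeEntity:
    """Hypothetical stand-in for an entity of model A."""
    a: str
    b: Optional[bool] = None  # None for entities written before b existed


def foo(entity: FakeEntity) -> Iterator[FakeEntity]:
    """Mapper: yield the entity only if it actually needs to be written."""
    if entity.b is None:
        entity.b = False
        # In appengine-mapreduce, I believe this would be `yield op.db.Put(entity)`.
        yield entity


def run_in_batches(entities: List[FakeEntity], batch_size: int) -> List[FakeEntity]:
    """Toy driver imitating the library: feed fixed-size batches to foo()."""
    to_write: List[FakeEntity] = []
    for start in range(0, len(entities), batch_size):
        batch = entities[start:start + batch_size]
        for entity in batch:
            to_write.extend(foo(entity))  # collect the puts for this batch
    return to_write
```

Only entities whose b is still None produce a write; entities that already have a value are skipped, which keeps the number of datastore writes down.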
Your new attribute can be in one of three states: None, False and True. Just treat None as False in your code and you won't have to do the update.
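A minimal sketch of "treat None as False", using a hypothetical plain-Python ARecord in place of the real model so it runs anywhere. Note that this only covers reading entities: as far as I know, a datastore query filtering on b == False will still not return entities saved before the b property existed, because those entities have no index entry for b. If you need that query, you do need the update.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ARecord:
    """Hypothetical stand-in for model A (not the App Engine db.Model)."""
    a: str
    b: Optional[bool] = None  # None means "never written"

    @property
    def b_flag(self) -> bool:
        # Both None and False read as False; only an explicit True counts.
        return self.b is True
```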
I just started using Eve and it's really great for quickly getting a full REST API up and running. However, I'm not entirely convinced that REST is perfect in all cases. For example, I'd like to have a simple upvote route that increases the counter of an object. If I manually retrieve the object, increase the counter and update it, I can easily run into problems with the counter getting out of sync. So I'd like to add a simple extra route, e.g. /resource/upvote, that increases the upvote count by one and returns the object.
I don't know how "hacky" this is, so if it's over the top please tell me. I don't see a problem with having custom routes for some important tasks that would be too much work to do in a RESTful way. I know I could treat upvotes as their own resource, but hey, I thought we're doing MongoDB, so let's not be overly relational.
So here is as far as I got:
@app.route('/api/upvote/<type>/<id>')
def upvote(type, id):
    obj = app.data.find_one_raw(type, id)
    obj['score'] += 1
Problem #1: find_one_raw always returns None. I guess I have to convert the id parameter? (I'm using the native MongoDB ObjectId.)
Problem #2: How do I save the object? I don't see a handy, easy-to-use method like save_raw.
Problem #3: Can we wrap the whole thing in a transaction or similar to make sure it's thread-safe? (I'm also new to MongoDB, as you can tell.)
1: type is not actually a keyword, but it is a Python built-in, so using it as a parameter name shadows type() inside the function. Did you mean something like resource_type?
2: There is app.data.insert (to create a new document) and app.data.update (to update an existing one).
3: Apparently there are no transactions in MongoDB, as this thread suggests. (As you can tell, I am new to MongoDB myself.)
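To make problem #3 concrete: the retrieve/increment/save sequence from the question can lose votes when two requests interleave. The sketch below uses plain Python with hypothetical names (it is not Eve's API); it replays that interleaving deterministically and contrasts it with an increment done as a single step. In MongoDB the usual single-step equivalent is an update with the $inc operator, which MongoDB applies atomically to a single document, so a counter does not need a multi-document transaction; the lock here only stands in for that server-side atomicity.

```python
import threading
from typing import Dict


def interleaved_upvotes(doc: Dict[str, int]) -> int:
    """Two clients both read the score before either writes it back."""
    seen_by_a = doc["score"]  # client A reads
    seen_by_b = doc["score"]  # client B reads the same value
    doc["score"] = seen_by_a + 1
    doc["score"] = seen_by_b + 1  # overwrites A's write: one vote is lost
    return doc["score"]


class AtomicScores:
    """Stand-in for a server-side atomic increment (like MongoDB's $inc)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._scores: Dict[str, int] = {}

    def upvote(self, doc_id: str) -> int:
        # Read and write happen as one step, so no other update can slip in between.
        with self._lock:
            self._scores[doc_id] = self._scores.get(doc_id, 0) + 1
            return self._scores[doc_id]
```

Two upvotes through the read/modify/write path leave the score at 1; the single-step version never loses an increment, no matter how many threads call it.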
I have a model called Theme. It has a lot of columns, but I need to retrieve only the field called "name", so I did this:
Theme.objects.only("name")
But it doesn't work, it is still retrieving all the columns.
P.S.: I don't want to use values() because it returns dictionaries rather than model instances. I need a set of model instances so I can access their attributes and methods.
Using only or its counterpart defer does not prevent accessing the deferred attributes. It only delays retrieval of said attributes until they are accessed. So take the following:
for theme in Theme.objects.all():
    print(theme.name)
    print(theme.other_attribute)
This will execute a single query when the loop starts. Now consider the following:
for theme in Theme.objects.only('name'):
    print(theme.name)
    print(theme.other_attribute)
In this case, the other_attribute is not loaded in the initial query at the start of the loop. However, it is added to the model's list of deferred attributes. When you try to access it, another query is executed to retrieve the value of other_attribute. In the second case, a total of n+1 queries is executed for n Theme objects.
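To see where the n+1 comes from, here is a toy model of deferred loading in plain Python. It is not Django's implementation: FakeDB, LazyTheme and only() are hypothetical names, and the "queries" are just a counter. Fields left out of the initial select are fetched one at a time on first access, which is the behaviour described above.

```python
from typing import Any, Dict, List


class FakeDB:
    """Hypothetical in-memory 'table' that counts how many queries it serves."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self.rows = rows
        self.queries = 0

    def select(self, fields: List[str]) -> List[Dict[str, Any]]:
        self.queries += 1
        return [{f: row[f] for f in fields} for row in self.rows]

    def select_field(self, pk: Any, field: str) -> Any:
        self.queries += 1
        return next(row[field] for row in self.rows if row["id"] == pk)


class LazyTheme:
    def __init__(self, db: FakeDB, loaded: Dict[str, Any]) -> None:
        self._db = db
        self.__dict__.update(loaded)  # fields fetched by the initial query

    def __getattr__(self, name: str) -> Any:
        # Python only calls this when normal lookup fails, i.e. for a deferred field.
        if name.startswith("_"):
            raise AttributeError(name)
        value = self._db.select_field(self.id, name)  # one extra query
        setattr(self, name, value)  # cache it so later reads are free
        return value


def only(db: FakeDB, *fields: str) -> List[LazyTheme]:
    # The primary key is always loaded so deferred fields can be fetched later.
    return [LazyTheme(db, row) for row in db.select(["id", *fields])]
```

With 5 rows, reading name costs nothing extra, but reading other_attribute on every object brings the total to 1 + 5 queries.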
The only and defer methods should only ever be used in advanced use cases, after the need for optimization arises and after properly profiling your code. Even then, there are often workarounds that work better than deferring fields. Please read the note at the bottom of the defer documentation.
If what you want is a single column, I think what you are looking for is .values() instead of .only.
I searched around and couldn't really find any information on this. Basically, I have a database "A" and a database "B". What I want to do is create a Python script (that will likely run as a cron job) that collects data from database "A" via SQL, performs an action on it, and then inserts that data into database "B".
I have written it as a set of functions, roughly like this:
Function 1 gets the date the script was last run
Function 2 gets the data from Database "A" based on Function 1
Functions 3-5 perform the needed actions
Function 6 inserts data into Database "B"
My question is: it was mentioned to me that I should use a class to do this rather than just functions. The only problem is, I am honestly a bit hazy on classes and when to use them.
Would a class be better for this? Or is writing this out as functions that feed into each other better? If I were to use a class, could you show me how it would look?
Would a Class be better for this?
Probably not.
Classes are useful when you have multiple, stateful instances that have shared methods. Nothing in your problem description matches those criteria.
There's nothing wrong with having a script with a handful of functions to perform simple data transfers (extract, transform, store).
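For illustration, here is what that function-based layout might look like, as a minimal sketch using two in-memory sqlite3 databases in place of databases A and B. The table names (events, events_copy, etl_runs) and the upper-casing "transform" are hypothetical placeholders for your real schema and actions. Each function takes its inputs as arguments and returns its result, so there is no shared state that would call for a class.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[int, str, str]  # (id, payload, created_at as ISO-8601 text)


def get_last_run(dest: sqlite3.Connection) -> str:
    """Function 1: when did the script last run? (hypothetical etl_runs table)"""
    (last,) = dest.execute("SELECT MAX(ran_at) FROM etl_runs").fetchone()
    return last or "1970-01-01T00:00:00"  # first run: take everything


def extract(source: sqlite3.Connection, since: str) -> List[Row]:
    """Function 2: read rows from database A created after the last run."""
    return source.execute(
        "SELECT id, payload, created_at FROM events WHERE created_at > ? ORDER BY id",
        (since,),
    ).fetchall()


def transform(rows: List[Row]) -> List[Row]:
    """Functions 3-5: placeholder for the real actions (here: normalise payload)."""
    return [(row_id, payload.strip().upper(), created) for row_id, payload, created in rows]


def load(dest: sqlite3.Connection, rows: List[Row], ran_at: str) -> None:
    """Function 6: write to database B and record the run in one transaction."""
    with dest:  # commits on success, rolls back if anything raises
        dest.executemany(
            "INSERT INTO events_copy (id, payload, created_at) VALUES (?, ?, ?)", rows
        )
        dest.execute("INSERT INTO etl_runs (ran_at) VALUES (?)", (ran_at,))


def run(source: sqlite3.Connection, dest: sqlite3.Connection, now: str) -> int:
    """Wire the steps together; returns how many rows were copied."""
    rows = transform(extract(source, get_last_run(dest)))
    load(dest, rows, now)
    return len(rows)
```

Using the run time as the "last run" marker is a simplification: rows created while the script is running could be skipped next time, so storing the newest created_at actually copied would be more robust.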
Two code examples (simplified):
.get outside the transaction (object from .get passed into the transactional function)
@db.transactional
def update_object_1_txn(obj, new_value):
    obj.prop1 = new_value
    return obj.put()
.get inside the transaction
@db.transactional
def update_object2_txn(obj_key, new_value):
    obj = db.get(obj_key)
    obj.prop1 = new_value
    return obj.put()
Is the first example logically sound? Is the transaction there useful at all; does it provide anything? I'm trying to better understand App Engine's transactions. Would choosing the second option prevent concurrent modifications of that object?
To answer your question in one word: yes, your second example is the way to do it. In the boundaries of a transaction, you get some data, change it, and commit the new value.
Your first one is not wrong, though, because you don't read from obj inside the transaction: even if the stored entity has changed since the earlier get, you wouldn't notice. (Keep in mind that put() writes the whole entity, so any property another request changed in the meantime is overwritten with the stale value.) Put another way: as written, your examples aren't good at illustrating the point of a transaction, which is usually called "test and set". See a good Wikipedia article on it here: http://en.wikipedia.org/wiki/Test-and-set
More specific to GAE, as defined in GAE docs, a transaction is:
a set of Datastore operations on one or more entities. Each transaction is guaranteed to be atomic, which means that transactions are never partially applied. Either all of the operations in the transaction are applied, or none of them are applied.
which tells you it doesn't have to be just for test and set; it can also be useful for ensuring that a batch of several entities is committed together, etc.
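A toy model of "test and set" with optimistic concurrency, which is roughly how datastore transactions behave: a commit based on a stale read fails, and (as far as I know) the transactional machinery then retries. VersionedStore, ConcurrentModification and transactional_update are hypothetical plain-Python names, not the App Engine API. The point is the shape of your second example: the read happens inside the retried unit of work, so every retry works on fresh data. With the first example's shape, the object was read outside, so a retry would just resubmit the same stale object.

```python
from typing import Any, Callable, Dict, Tuple


class ConcurrentModification(Exception):
    """Raised when a commit is based on a stale read."""


class VersionedStore:
    """Hypothetical key-value store with a version number per key."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, Any]] = {}

    def get(self, key: str) -> Tuple[int, Any]:
        return self._data.get(key, (0, None))

    def compare_and_set(self, key: str, expected_version: int, value: Any) -> None:
        # "Test and set": only write if nobody has written since our read.
        current_version, _ = self.get(key)
        if current_version != expected_version:
            raise ConcurrentModification(key)
        self._data[key] = (current_version + 1, value)


def transactional_update(
    store: VersionedStore, key: str, change: Callable[[Any], Any], retries: int = 3
) -> Any:
    """Like the second example: read inside the retried unit of work."""
    for _ in range(retries + 1):
        version, value = store.get(key)  # fresh read on every attempt
        new_value = change(value)
        try:
            store.compare_and_set(key, version, new_value)
            return new_value
        except ConcurrentModification:
            continue  # someone else committed first; re-read and retry
    raise ConcurrentModification(key)
```

If another writer commits between the read and the commit, the first attempt fails and the second attempt applies the change on top of the newer value instead of silently overwriting it.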
I need to attach an object to a session in such a way that it does not differ from the one persisted in the DB. It's easier to explain with code:
session.query(type(some_object)).filter_by(id=some_object.id).one()
Is there a more proper way to do that?
session.add(some_object) doesn't work, since an entity with that id may already be attached to this session, and object = session.merge(some_object) doesn't work for me because it carries over state from the detached copy (if I set object.name = 'asdfasdf', that change will still be pending after merging the object).
EDIT:
I found a slightly less ugly way:
some_object = session.merge(some_object)
session.refresh(some_object)
But is there a way to do this in one call?
I need to attach an object to session in such a way that it will not differ from one persisted in db.
"Will not differ from DB" pretty much means you're looking to load it, so query it. Keep in mind that the object might already be present in the target session. So your approach with query(type(object)) is probably the most direct, though you can use get() to hit the primary key directly, and populate_existing() to guarantee that any state already in the session is overwritten:
session.query(type(some_object)).populate_existing().get(some_object.id)
The above calls down to almost the same code paths that refresh() does. The merge/refresh approach you have works too, but it emits at least two SELECT statements.