GAE mapreduce: How to access counters when the counting is done? - python

I have a mapper pipeline where the map function increments a counter using
yield op.counters.Increment("mycounter")
But I don't know how to access the value of "mycounter" after the pipeline has completed. I have seen examples using a completion handler, but they seem to refer to an older mapreduce library, where one could actually define a completion handler.
My best guess is that I need to define a final stage in the pipeline that has access to the mapper pipeline's counters -- but how exactly?

As answered in this related question, this feature is not available right now. There's a feature request in their issue tracker (Issue 208), which is currently with status "Started". Please star it ;-)
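In the meantime, one workaround is a final pipeline stage that loads the persisted job state and reads its counters. This is an untested sketch that reaches into the library's internals (model.MapreduceState and its counters_map), so treat every name here as an assumption that may differ in your version:

```python
from mapreduce import base_handler, model

class ReportCounters(base_handler.PipelineBase):
    def run(self, job_id):
        # Load the persisted state of the finished mapreduce job
        state = model.MapreduceState.get_by_job_id(job_id)
        # counters_map aggregates the counters incremented via op.counters
        return state.counters_map.get("mycounter")
```

You would yield this stage from your outer pipeline after the mapper pipeline, passing along the mapper's job id.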


Batch call Dependency id requirements?

I have a script which does the following:
Create campaign
Create AdSet (requires campaign_id)
Create AdCreative (requires adset_id)
Create Ad (requires creative_id and adset_id)
I am trying to lump all of them into a single batch request. However, I realized that none of these gets created except for my campaign (step 1) if I use remote_create(batch=my_batch). This is probably due to the dependencies on the ids that are needed by each of the subsequent steps.
I read the documentation and it mentions that one can specify dependencies between operations in the request (https://developers.facebook.com/docs/graph-api/making-multiple-requests) via {result=(parent operation name):(JSONPath expression)}.
Is this possible with the python API?
Can this be achieved with the way I am using remote_creates?
Unfortunately the Python SDK doesn't currently support this. There is a GitHub issue for it: https://github.com/facebook/facebook-python-ads-sdk/issues/256.
I have also encountered this issue and have described my workaround in the comments on the issue:
"I found a decent workaround for getting this behaviour without too much trouble. Basically I set the id fields that have dependencies with values like "{result=:$,id}", and prior to doing execute() on the batch object I iterate over ._batch and add a 'name' entry to each operation. When I run execute, sure enough, it works perfectly. Obviously this solution does have its limitations, such as where you are doing multiple calls to the same endpoint that need to be fed into other endpoints: you would have duplicated resource names and would need to customize the names further to string them together.
Anyways, hope this helps someone!"
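To make the mechanism concrete, here is a sketch of the raw Graph API batch payload that this name/JSONPath feature relies on: each operation gets a name, and later operations reference earlier results with {result=&lt;name&gt;:&lt;JSONPath&gt;}. The account id, endpoints, and field values below are illustrative only, and this builds the payload without actually calling the API:

```python
import json

ACT = 'act_123'  # hypothetical ad account id

# Each dict is one operation in the batch; 'name' lets later
# operations reference this operation's JSON result.
batch = [
    {'method': 'POST', 'name': 'create-campaign',
     'relative_url': '%s/campaigns' % ACT,
     'body': 'name=My Campaign&objective=LINK_CLICKS&status=PAUSED'},
    {'method': 'POST', 'name': 'create-adset',
     'relative_url': '%s/adsets' % ACT,
     # campaign_id is filled in server-side from the first operation's result
     'body': 'name=My AdSet&campaign_id={result=create-campaign:$.id}'},
    {'method': 'POST',
     'relative_url': '%s/ads' % ACT,
     'body': 'name=My Ad&adset_id={result=create-adset:$.id}'},
]

# The whole batch is sent as one POST with a JSON-encoded 'batch' parameter.
payload = {'batch': json.dumps(batch)}
```

The workaround quoted above is essentially injecting those 'name' entries into the SDK's internal ._batch list so the server can resolve the {result=...} references.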

pyspark accumulators - understanding their use

I would like to understand what the use of accumulators is. Based upon online examples it seems that we can use them to count specific issues with the data. For example, I have a lot of license numbers; I can count how many of them are not valid using accumulators. But can't we do the same using filter and map operations? Would it be possible to show a good example where accumulators are used? I would appreciate it if you provide sample code in pyspark instead of Java or Scala.
Accumulators are used mostly for diagnostics and retrieving additional data from the actions and typically shouldn't be used as a part of the main logic, especially when called inside transformations*.
Let's start with the first case. You can use an accumulator or a named accumulator to monitor program execution in close to real time (updated per task) and, for example, kill the job if you encounter too many invalid records. The state of named accumulators can be monitored, for example, using the driver UI.
In the case of actions, accumulators can be used to get additional statistics. For example, if you use foreach or foreachPartition to push data to an external system, you can use accumulators to keep track of failures.
* When are accumulators truly reliable?

Python Eve: Add custom route, changing an object manually

I just started using Eve and it's really great for quickly getting a full REST API to run. However, I'm not entirely convinced that REST is perfect in all cases, e.g. I'd like to have a simple upvote route where I can increase the counter of an object. If I manually retrieve the object, increase the counter, and update it, I can easily run into problems with getting out-of-sync. So I'd like to add a simple extra-route, e.g. /resource/upvote that increases the upvote count by one and returns the object.
I don't know how "hacky" this is, so if it's over-the-top please tell me. I don't see a problem with having custom routes for some important tasks that would be too much work to do in a RESTful way. I know I could treat upvotes as its own resource, but hey I thought we're doing MongoDB, so let's not be overly relational.
So here is as far as I got:
@app.route('/api/upvote/<type>/<id>')
def upvote(type, id):
    obj = app.data.find_one_raw(type, id)
    obj['score'] += 1
Problem #1: find_one_raw returns None all the time. I guess I have to convert the id parameter? (I'm using the native MongoDB ObjectId.)
Problem #2: How do I save the object? I don't see a handy easy-to-use method like save_raw.
Problem #3: Can we wrap the whole thing in a transaction or similar to make sure it's thread-safe? (I'm also new to MongoDB, as you can tell.)
1: type happens to shadow a Python built-in. Do you mean to use something like resource_type?
2: There is app.data.insert (to create a new document) or app.data.update (to update an existing one).
3: Apparently there are no transactions in MongoDB, as this thread suggests (as you can tell, I am new to MongoDB myself).

Mapreduce on Google App Engine

I'm very confused with the state and documentation of mapreduce support in GAE.
In the official doc https://developers.google.com/appengine/docs/python/dataprocessing/, there is an example, but:
the application uses mapreduce.input_readers.BlobstoreZipInputReader, and I would like to use mapreduce.input_readers.DatastoreInputReader. The documentation mentions the parameters of DatastoreInputReader, but not the return value sent back to the map function...
the application "demo" (page Helloworld) has a mapreduce.yaml file which IS NOT USED in the application???
So I found http://code.google.com/p/appengine-mapreduce/. There is a complete example with mapreduce.input_readers.DatastoreInputReader, but it is written that the reduce phase isn't supported yet!
So I would like to know if it is possible to implement the first form of mapreduce, with the DatastoreInputReader, to execute a real map/reduce to get a GROUP BY equivalent?
The second example is from the earlier release, which did indeed only support the mapper phase. However, as the first example shows, the full map/reduce functionality is now supported and has been for some time. The mapreduce.yaml is from that earlier version; it is not used now.
I'm not sure what your actual question is. The value sent to the map function from DatastoreInputReader is, not surprisingly, the individual entity which is taken from the kind being mapped over.
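To sketch the GROUP BY idea: with MapreducePipeline, the map function receives one datastore entity at a time (from DatastoreInputReader) and yields (key, value) pairs; the reduce function is then called once per distinct key with all of that key's values. The kind and property names below are hypothetical:

```python
# Map: called once per datastore entity; emit the grouping key with a count.
def visit_map(entity):
    yield (entity.country, 1)

# Reduce: called once per distinct key with the list of mapped values
# (values arrive as strings in this library).
def visit_reduce(key, values):
    yield '%s: %d\n' % (key, sum(int(v) for v in values))

# Wiring sketch (inside a pipeline's run method):
# yield mapreduce_pipeline.MapreducePipeline(
#     'count_by_country',
#     'main.visit_map', 'main.visit_reduce',
#     'mapreduce.input_readers.DatastoreInputReader',
#     'mapreduce.output_writers.BlobstoreOutputWriter',
#     mapper_params={'entity_kind': 'main.Visit'},
#     shards=4)
```

The output is one line per country with its visit count, i.e. the GROUP BY country, COUNT(*) equivalent.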

Python, Webkit: how to get the DOM after the page has loaded?

In my code I've connected to the WebView's load-finished event. The particular callback function takes a webview object and a frame object as arguments. Then I tried executing get_dom_document() on the frame and the webview objects respectively. It seems this method doesn't exist for those objects...
PS: i started with the tips i got here http://www.gnu.org/software/pythonwebkit/
UPDATE (11-Sep-2010): I think the link I shared relates to a new and different project. It's not a solution per se. My bad!
It's definitely there.
And you can't just "take the tips from http://www.gnu.org/software/pythonwebkit/": you actually have to COMPILE THE CODE (reason: standard pywebkitgtk DOES NOT have W3C DOM accessor functions).
Then take a look in pythonwebkit/pywebkitgtk/examples, run browser.py, and you'll see what to do.
l.
I forgot to mention (and it wasn't in the documentation, which I've now updated): you specifically need to check out the "python_codegen" branch, otherwise you just end up with plain vanilla webkit, which is of absolutely no use to you.
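An untested sketch of the callback shape, assuming a pywebkitgtk built from the python_codegen branch (a stock pywebkitgtk lacks these DOM accessors, as noted above, and the exact accessor names should be checked against examples/browser.py):

```python
import gtk
import webkit

def on_load_finished(view, frame):
    # get_dom_document() exists only in the python_codegen build
    doc = view.get_dom_document()
    print doc.title  # hypothetical W3C DOM property access

view = webkit.WebView()
view.connect("load-finished", on_load_finished)
```

This is Python 2 / PyGTK-era code, matching the library as it existed at the time of the question.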
