Running MapReduce jobs on Google App Engine (Python)

I'm running the demo that comes with the mapreduce framework. It's giving me an error:
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/webapp/_webapp25.py", line 703, in __call__
handler.post(*groups)
File "/path/to/mapreduce/base_handler.py", line 68, in post
self.handle()
File "/path/to/mapreduce/handlers.py", line 431, in handle
self.aggregate_state(state, shard_states)
File "/path/to/mapreduce/handlers.py", line 462, in aggregate_state
context.COUNTER_MAPPER_CALLS))
File "/path/to/mapreduce/model.py", line 257, in get
return self.counters.get(counter_name, 0)
AttributeError: 'list' object has no attribute 'get'
Is this something I'm doing wrong, or does the demo not work? Is there more up-to-date code somewhere else?
This is using the code from http://appengine-mapreduce.googlecode.com/svn/trunk/

Not familiar with that code, but the latest code is the MapReduce Bundle you can download from the SDK:
https://developers.google.com/appengine/downloads
It comes with a bit of a demo. I was able to follow this and get this to work:
http://code.google.com/p/appengine-mapreduce/wiki/GettingStartedInPython
Here are some additional notes I made while trying to get MapReduce running:
http://eatdev.tumblr.com/post/17983355135/using-mapreduce-with-django-nonrel-on-app-engine

Related

Google App Engine dev_appserver.py: watcher_ignore_re flag "is not JSON serializable"

When I run dev_appserver.py with the watcher_ignore_re option, I get an error message saying that the regex is not JSON serializable.
Is this a bug in the development server? Am I using this command improperly? The command and call stack are printed below.
C:\Users\mes65\Documents\MyProject>"C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\bin\dev_appserver.py" ^
--watcher_ignore_re="(.*\.git|.*\.idea|tmp\.py)" ^
"C:\Users\mes65\Documents\MyProject"
WARNING 2018-06-06 09:28:59,161 appinfo.py:1622] lxml version "2.3" is deprecated, use one of: "3.7.3"
INFO 2018-06-06 09:28:59,187 devappserver2.py:120] Skipping SDK update check.
Traceback (most recent call last):
File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\dev_appserver.py", line 96, in <module>
_run_file(__file__, globals())
File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\dev_appserver.py", line 90, in _run_file
execfile(_PATHS.script_file(script_name), globals_)
File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\tools\devappserver2\devappserver2.py", line 454, in <module>
main()
File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\tools\devappserver2\devappserver2.py", line 442, in main
dev_server.start(options)
File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\tools\devappserver2\devappserver2.py", line 163, in start
bool(ssl_certificate_paths), options)
File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\tools\devappserver2\metrics.py", line 166, in Start
self._cmd_args = json.dumps(vars(cmd_args)) if cmd_args else None
File "C:\Python27\lib\json\__init__.py", line 244, in dumps
return _default_encoder.encode(obj)
File "C:\Python27\lib\json\encoder.py", line 207, in encode
chunks = self.iterencode(o, _one_shot=True)
File "C:\Python27\lib\json\encoder.py", line 270, in iterencode
return _iterencode(o, 0)
File "C:\Python27\lib\json\encoder.py", line 184, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <_sre.SRE_Pattern object at 0x00000000063C2188> is not JSON serializable
It looks like an issue with the Google Analytics code built into dev_appserver2 (google-cloud-sdk\platform\google_appengine\google\appengine\tools\devappserver2\devappserver2.py, on or around line 316). It tries to send all of your command-line options to Google Analytics. If you clear the analytics client ID by adding the command-line option --google_analytics_client_id= (note: '=' with no value after it), the dev server won't call the analytics code, which is where it tries to JSON-serialize an SRE pattern object and fails. However, since you are on Windows, I find that --watcher_ignore_re does not work anyway, even once you get past this issue.
There is a comment in file_watcher.py
TODO: b/33178251 - Add watcher_ignore_re support for windows.
I also ran into this usability problem on Windows and was really disappointed. I looked for workarounds but did not find a suitable one.
In the end, I decided to write my own implementation of watcher_ignore_re support for Windows. I put the required changes in my GitHub repo.
In a few words, the changes are:
Add _watcher_ignore_re and _skip_files_re properties and their setters.
Add the import statement from google.appengine.tools.devappserver2 import watcher_common and use it in the newly created def _path_ginored.
Filter additional_changes before adding them to the watcher's changed files.
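The filtering step listed above could look roughly like this. It is a minimal sketch, not the code from the repo: filter_ignored and its arguments are hypothetical names, and it assumes the ignore pattern is matched against each changed path from its start, which may differ from how the SDK applies it.

```python
import re


def filter_ignored(changed_paths, ignore_re):
    """Return the changed paths that do not match the ignore pattern.

    Hypothetical helper: `ignore_re` is a compiled regex, or None to keep everything.
    """
    if ignore_re is None:
        return set(changed_paths)
    return {path for path in changed_paths if not ignore_re.match(path)}


# Same pattern as in the question's command line.
pattern = re.compile(r"(.*\.git|.*\.idea|tmp\.py)")
changes = {"app/main.py", "project/.git", "tmp.py", "project/.idea"}
remaining = filter_ignored(changes, pattern)
```

Only the paths left in remaining would then be reported as changed, so edits under .git or .idea would not trigger a reload.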
To resolve the problem mentioned above with the non-serializable regex attributes, they should be dropped from the dictionary before it is serialized. The fix was added in a subsequent commit and can be checked at metrics.py:185-193.
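As an illustration of the idea (not the actual change in metrics.py), the snippet below builds a dictionary from the parsed options and skips every value that json.dumps cannot encode. The function name serializable_args and the option names in the stand-in Namespace are assumptions made for the example.

```python
import argparse
import json
import re


def serializable_args(cmd_args):
    """Return a dict of the parsed options that json.dumps can encode."""
    result = {}
    for name, value in vars(cmd_args).items():
        try:
            json.dumps(value)
        except TypeError:
            # Compiled regexes (and anything else JSON cannot encode) are skipped.
            continue
        result[name] = value
    return result


# Stand-in for the parsed dev_appserver options; the attribute names are illustrative.
options = argparse.Namespace(
    port=8080,
    watcher_ignore_re=re.compile(r"(.*\.git|.*\.idea|tmp\.py)"),
)
payload = json.dumps(serializable_args(options), sort_keys=True)
```

Here payload contains only the port option, because the compiled pattern is the value that makes json.dumps raise TypeError.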
I hope this helps others enjoy developing on GAE on Windows :)

Tweepy error: attribute error

The code was working fine, but all of a sudden this error started being raised, even though I did not change my codebase.
searched_tweets = ( status._json for status in tweepy.Cursor(api.search, q=query, count=300, since=from_date, until=to_date,
File "build/bdist.linux-x86_64/egg/tweepy/cursor.py", line 197, in next
File "build/bdist.linux-x86_64/egg/tweepy/cursor.py", line 108, in next
File "build/bdist.linux-x86_64/egg/tweepy/binder.py", line 245, in _call
File "build/bdist.linux-x86_64/egg/tweepy/binder.py", line 189, in execute
tweepy.error.TweepError:
Failed to send request: 'module' object has no attribute 'HTTPMessage'
I have another Python script that also uses tweepy, and it started throwing an error as well:
stream.filter(track=keyword_list, stall_warnings=True)
File "build/bdist.linux-x86_64/egg/tweepy/streaming.py", line 445, in filter
File "build/bdist.linux-x86_64/egg/tweepy/streaming.py", line 361, in _start
File "build/bdist.linux-x86_64/egg/tweepy/streaming.py", line 294, in _run
AttributeError: 'module' object has no attribute 'HTTPMessage'
I have no idea what is causing this error. I even tried reinstalling tweepy but no luck. Any help is highly appreciated!
Edit: I figured out that it's coming from tweepy's binder.py:
I was able to solve this issue by installing Anaconda3.4 and then running:
pip3 install tweepy
I suppose that it was a dependency or an update issue.

How to get a p12 file working on Python App Engine

I'm having trouble getting identity-toolkit fully working with the Python App Engine sandbox. The sample provided is for a project that does not run in the GAE sandbox.
In the sample project, gitkit-server-config.json is read from a file using os.path, but this is not supported in the GAE sandbox. To get around this, I am creating a GitkitClient directly using the constructor:
gitkit_instance = gitkitclient.GitkitClient(
client_id="123456opg.apps.googleusercontent.com",
service_account_email="my-project@appspot.gserviceaccount.com",
service_account_key="/path/to/my-p12file.p12",
widget_url="http://localhost:8080/callback",
http=None,
project_id="my-project")
Is this the correct way to create the GitkitClient?
The issue now is that when I try to do a password reset while running locally with dev_appserver.py, I get the following stack trace:
File "dashboard.py", line 89, in post
oobResult = gitkit_instance.GetOobResult(self.request.POST,self.request.remote_addr)
File "identitytoolkit/gitkitclient.py", line 366, in GetOobResult
param['action'])
File "identitytoolkit/gitkitclient.py", line 435, in _BuildOobLink
code = self.rpc_helper.GetOobCode(param)
File "identitytoolkit/rpchelper.py", line 104, in GetOobCode
response = self._InvokeGitkitApi('getOobConfirmationCode', request)
File "identitytoolkit/rpchelper.py", line 210, in _InvokeGitkitApi
access_token = self._GetAccessToken()
File "identitytoolkit/rpchelper.py", line 231, in _GetAccessToken
'assertion': self._GenerateAssertion(),
File "identitytoolkit/rpchelper.py", line 259, in _GenerateAssertion
crypt.Signer.from_string(self.service_account_key),
File "oauth2client/_pure_python_crypt.py", line 183, in from_string
raise ValueError('No key could be detected.')
ValueError: No key could be detected.
I'm assuming this is a problem with the .p12 file? I double-checked service_account_key="/path/to/my-p12file.p12" and the file exists. What am I missing here?
FYI to others working on this in the future -
I could not get this working in Python. The documentation doesn't make it clear how to get this working on App Engine. In addition, dependency issues with PyCrypto turned this into a gcc and dependency nightmare.
I was, however, able to get this working in Go, and there is a semi-working example online that will work with some modifications highlighted in its issues and pull request pages. Good luck.
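For anyone still trying the Python route: the traceback in the question shows service_account_key being handed straight to crypt.Signer.from_string, which suggests the constructor expects the key contents rather than a file path. As far as I can tell, the pure-Python signer in oauth2client only recognizes PEM-encoded keys, so a .p12 file would probably need converting first. Both points are assumptions I have not verified against the library. The helpers below (read_key_material, looks_like_pem) are hypothetical names for a quick sanity check before the value is passed on.

```python
import os
import tempfile

# PEM headers for PKCS#1 and PKCS#8 private keys.
PEM_MARKERS = (b"-----BEGIN RSA PRIVATE KEY-----", b"-----BEGIN PRIVATE KEY-----")


def read_key_material(path):
    """Read a key file and return its raw bytes."""
    with open(path, "rb") as key_file:
        return key_file.read()


def looks_like_pem(key_bytes):
    """Return True if the bytes contain a PEM private-key header."""
    return any(marker in key_bytes for marker in PEM_MARKERS)


# Demonstration with a throwaway file; a real key would come from the service account.
with tempfile.NamedTemporaryFile(suffix=".pem", delete=False) as handle:
    handle.write(b"-----BEGIN PRIVATE KEY-----\n(placeholder)\n-----END PRIVATE KEY-----\n")
    demo_path = handle.name

key_bytes = read_key_material(demo_path)
os.remove(demo_path)

path_is_key = looks_like_pem(b"/path/to/my-p12file.p12")  # a path string is not key material
file_is_key = looks_like_pem(key_bytes)
```

If the check fails for the value you are about to pass as service_account_key, that would be consistent with the "No key could be detected" error.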

gdata spreadsheet library for python not working anymore?

I was trying to run a query for data in one of my Google Docs, and it has worked for several months. Starting yesterday or the day before, I noticed that my script no longer works. Has Google updated their API for spreadsheets? Has anybody found a workaround?
My error looks like this:
Traceback (most recent call last):
File "build_packer_image.py", line 311, in <module>
for index, entry in enumerate(client.GetWorksheetsFeed(doc_key).entry):
File "/build/toolchain/mac-10.5-32/lib/python2.7/site-packages/gdata/spreadsheet/service.py", line 129, in GetWorksheetsFeed
converter=gdata.spreadsheet.SpreadsheetsWorksheetsFeedFromString)
File "/build/toolchain/mac-10.5-32/lib/python2.7/site-packages/gdata/service.py", line 1074, in Get
return converter(result_body)
File "/build/toolchain/mac-10.5-32/lib/python2.7/site-packages/gdata/spreadsheet/__init__.py", line 411, in SpreadsheetsWorksheetsFeedFromString
xml_string)
File "/build/toolchain/mac-10.5-32/lib/python2.7/site-packages/atom/__init__.py", line 93, in optional_warn_function
return f(*args, **kwargs)
File "/build/toolchain/mac-10.5-32/lib/python2.7/site-packages/atom/__init__.py", line 127, in CreateClassFromXMLString
tree = ElementTree.fromstring(xml_string.replace('doctype','DOCTYPE'))
File "<string>", line 125, in XML
cElementTree.ParseError: no element found: line 1, column 0
Build step 'Execute shell' marked build as failure
Finished: FAILURE
I am using:
Python 2.7.5
gdata 2.0.18
I am just using a document key and no OAuth in my code, if that makes a difference (I am passing the username and password to the ClientLogin method).
Actually here is the answer to the problem:
The use of client login (using username/password instead of oauth2) is
likely the cause of the error. That protocol was deprecated 3+ years
ago and was just shutdown. If you capture the HTTP response (which
appears to have some HTML content), that might confirm if it is
related to the shutdown. Migrating to OAuth 2 would get your apps
working again.
After sending XML for an update to the spreadsheet, Google responds with a login page.
This means authentication is no longer working for gdata:
https://code.google.com/a/google.com/p/apps-api-issues/issues/detail?id=3851#c2
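To check what actually came back, as the quoted answer suggests, something like the following could be run on the captured response body. It is only a diagnostic sketch: describe_response_body is a hypothetical helper, and how you capture the body depends on your own code.

```python
import xml.etree.ElementTree as ElementTree


def describe_response_body(body):
    """Say whether a response body is empty, broken XML, or XML with a given root."""
    if not body.strip():
        return "empty"
    try:
        root = ElementTree.fromstring(body)
    except ElementTree.ParseError:
        return "not well-formed XML"
    return "XML with root <%s>" % root.tag


empty_result = describe_response_body("")
html_result = describe_response_body("<html><body><p>Sign in</body></html>")
feed_result = describe_response_body('<feed xmlns="http://www.w3.org/2005/Atom"></feed>')
```

An empty body matches the "no element found: line 1, column 0" error in the question, while a login page would typically show up as HTML rather than an Atom feed.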

AttributeError: 'Response' object has no attribute '_dom'

I'm testing the ebaysdk Python library, which lets you connect to eBay. I'm trying the examples from: https://github.com/timotheus/ebaysdk-python/
So far I am stuck on this example:
from ebaysdk.shopping import Connection as Shopping
shopping = Shopping(domain="svcs.sandbox.ebay.com", config_file="ebay.yaml")
response = shopping.execute('FindPopularItems',
{'QueryKeywords': 'Python'})
print response.dict()
When I run it, it gives me this error:
Traceback (most recent call last):
File "ebay-test.py", line 13, in <module>
{'QueryKeywords': 'Python'})
File "/usr/local/lib/python2.7/dist-packages/ebaysdk-2.1.0-py2.7.egg/ebaysdk/connection.py", line 123, in execute
self.error_check()
File "/usr/local/lib/python2.7/dist-packages/ebaysdk-2.1.0-py2.7.egg/ebaysdk/connection.py", line 193, in error_check
estr = self.error()
File "/usr/local/lib/python2.7/dist-packages/ebaysdk-2.1.0-py2.7.egg/ebaysdk/connection.py", line 305, in error
error_array.extend(self._get_resp_body_errors())
File "/usr/local/lib/python2.7/dist-packages/ebaysdk-2.1.0-py2.7.egg/ebaysdk/shopping/__init__.py", line 188, in _get_resp_body_errors
dom = self.response.dom()
File "/usr/local/lib/python2.7/dist-packages/ebaysdk-2.1.0-py2.7.egg/ebaysdk/response.py", line 229, in dom
return self._dom
File "/usr/local/lib/python2.7/dist-packages/ebaysdk-2.1.0-py2.7.egg/ebaysdk/response.py", line 216, in __getattr__
return getattr(self._obj, name)
AttributeError: 'Response' object has no attribute '_dom'
Am I missing something here, or could it be some kind of bug in the library?
Do you have a config file? I had a lot of problems getting started with this SDK. To get the yaml config file to work, I had to specify the directory that it was in. So in your example, it would be (with import os at the top of the script):
shopping = Shopping(domain="svcs.sandbox.ebay.com", config_file=os.path.join(os.path.dirname(os.path.realpath(__file__)), 'ebay.yaml'))
You should also be able to pass debug=True in your Shopping() declaration, as in Shopping(debug=True).
If you have not already done so, make sure to specify your APP_ID and the other necessary values in the config file.
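The path handling can be pulled into a small helper so the script works regardless of the current working directory. This is just a sketch of the same os.path calls used above; config_path is a hypothetical name and the example script location is made up.

```python
import os


def config_path(filename, base_file):
    """Return the absolute path of `filename` in the same directory as `base_file`."""
    base_dir = os.path.dirname(os.path.realpath(base_file))
    return os.path.join(base_dir, filename)


# In a real script you would pass __file__; a made-up location is used here.
example = config_path("ebay.yaml", os.path.join(os.sep, "srv", "app", "ebay-test.py"))
```

The result can then be passed as config_file, for example Shopping(config_file=config_path("ebay.yaml", __file__)).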
You have the wrong domain; it should be open.api.sandbox.ebay.com. See this page on the ebaysdk GitHub.
