Simple Python plugin system

I'm writing a parser for an internal XML-based metadata format in Python. I need to provide different classes for handling different tags. Since there will be a rather large collection of handlers, I've envisioned it as a simple plugin system: I want to load every class in a package and register it with my parser.
My current attempt looks like this:
(Handlers is the package containing the handlers; each handler has a static member tags, which is a tuple of strings.)
class MetadataParser:
    def __init__(self):
        # ...
        self.handlers = {}
        self.currentHandler = None
        for handler in dir(Handlers):      # make a list of all symbols exported by Handlers
            if handler[-7:] == 'Handler':  # and for each of those ending in "Handler"
                handlerMod = my_import('MetadataLoader.Handlers.' + handler)
                self.registerHandler(handlerMod, handlerMod.tags)  # register them for their tags
    # ...
    def registerHandler(self, handler, tags):
        """Register a handler class for each xml tag in a given list of tags."""
        if not isSequenceType(tags):
            tags = (tags,)  # sanity check: make sure the tag list is indeed a sequence
        for tag in tags:
            self.handlers[tag] = handler
However, this does not work. I get the error AttributeError: 'module' object has no attribute 'tags'
What am I doing wrong?

Probably one of your handlerMod modules does not define a tags attribute.

First off, apologies for the poorly formatted/incorrect code.
Also, thanks for looking at it. However, the culprit was, as so often, between the chair and the keyboard: I confused myself by having classes and modules with the same name. The result of my_import (which I now realize I didn't even mention where it comes from... it's from SO: link) is a module named, for instance, areaHandler. I want the class, also named areaHandler. So I merely had to pick out the class with eval('Handlers.' + handler + '.' + handler).
Again, thanks for your time, and sorry about the bandwidth
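(For the record, getattr would avoid the eval here; a minimal sketch, assuming the module and the class share a name as described above:)
handlerClass = getattr(handlerMod, handler)
self.registerHandler(handlerClass, handlerClass.tags)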

I suggest you read the example and explanation on this page, which explains how to write a plug-in architecture.
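In case that page moves, the core of most such plug-in architectures is a discovery-plus-registry loop. Here is a minimal sketch using only the standard library (the package layout and the tags attribute follow the question; the pkgutil/importlib usage is my own illustration):
import pkgutil
import importlib

def load_handlers(package):
    """Import every module in a package and register classes named like their module."""
    handlers = {}
    for _, mod_name, _ in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(package.__name__ + '.' + mod_name)
        cls = getattr(module, mod_name, None)  # class named after its module
        if cls is not None:
            for tag in cls.tags:
                handlers[tag] = cls
    return handlers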

A simple and completely extensible implementation is possible via the extend_me library.
The code could look like this:
from extend_me import ExtensibleByHash

# create meta class
tagMeta = ExtensibleByHash._('Tag', hashattr='name')

# create base class for all tags
class BaseTag(object):
    __metaclass__ = tagMeta

    def __init__(self, tag):
        self.tag = tag

    def process(self, *args, **kwargs):
        raise NotImplementedError()

# create classes for all required tags
class BodyTag(BaseTag):
    class Meta:
        name = 'body'

    def process(self, *args, **kwargs):
        pass  # do processing

class HeadTag(BaseTag):
    class Meta:
        name = 'head'

    def process(self, *args, **kwargs):
        pass  # do some processing here

# implement other tags in this way
# ...

# process tags
def process_tags(tags):
    res_tags = []
    for tag in tags:
        cls = tagMeta.get_class(tag)  # get correct class for each tag
        res_tags.append(cls(tag))     # and add its instance to result
    return res_tags
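Usage might then look like this (a sketch; the tag names come from the classes above):
tags = process_tags(['body', 'head'])
for tag in tags:
    tag.process()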
For more information, look at the documentation or the code.
This library is used in the OpenERP / Odoo RPC library.

Related

How to change a variable value in a Python parent class from a subclass method, just for the class instance

OK, I am not entirely sure my title is accurate, as I do not yet fully understand class inheritance and instances, but I understand it is something I need to grasp moving forward.
Background: I'm attempting to create a custom importer for my bank, to be used with the popular Beancount/fava double-entry ledger accounting system. I originally reported this to fava as a bug, but then realized it's not a bug; it's more my lack of general understanding of Python classes, so I thought it would be better to post here.
So... I have created the following import script, which as I understand it is a subclass of beancount's csv.Importer (https://github.com/beancount/beancount/blob/master/beancount/ingest/importers/csv.py), which in turn is a subclass of beancount's Importer (https://github.com/beancount/beancount/blob/master/beancount/ingest/importer.py).
In my importer I override two methods of csv.Importer, name() and file_account(). My goal is to derive the source account associated with the input file based on the file name and a dictionary look-up. The extract() method I do not wish to override in my subclass; however, in the csv.Importer extract() method there is a reference to self.account that represents the source account to use for extracted transactions. Currently, the way my script is, if I feed it a file named 'SIMPLII_9999_2018-01-01.csv' the account is properly derived as 'Assets:Simplii:Chequing-9999'. However, if I stop short of actually importing the transactions in fava and instead attempt to extract the transactions again from the same file, the derived account becomes 'Assets:Simplii:Chequing-9999:Chequing-9999'.
What I am trying to do is derive the source account from the input file and pass this information as the self.account variable in the parent class (csv.Importer) for my class instance (I think). What is it that I am doing wrong in my class that is causing the derived source account to be carried over to the next instance?
#!/usr/bin/env python3
from beancount.ingest import extract
from beancount.ingest.importers import csv
from beancount.ingest import cache
from beancount.ingest import regression
import re
from os import path
from smart_importer.predict_postings import PredictPostings

class SimpliiImporter(csv.Importer):
    '''
    Importer for the Simplii bank.
    Note: This undecorated class can be regression-tested with
    beancount.ingest.regression.compare_sample_files
    '''
    config = {csv.Col.DATE: 'Date',
              csv.Col.PAYEE: 'Transaction Details',
              csv.Col.AMOUNT_DEBIT: 'Funds Out',
              csv.Col.AMOUNT_CREDIT: 'Funds In'}

    account_map = {'9999': 'Chequing-9999'}

    def __init__(self, *, account, account_map=account_map):
        self.account_map = account_map
        self.account = 'Assets:Simplii'
        super().__init__(
            self.config,
            self.account,
            'CAD',
            [r'Filename: .*SIMPLII_\d{4}_.*\.csv',
             'Contents:\n.*Date, Transaction Details, Funds Out, Funds In'],
            institution='Simplii'
        )

    def name(self):
        cls = self.__class__
        return '{}.{}'.format(cls.__module__, cls.__name__)

    def file_account(self, file):
        __account = None
        if file:
            m = re.match(r'.+SIMPLII_(\d{4})_.*', file.name)
            if m:
                sub_account = self.account_map.get(m[1])
                if sub_account:
                    __account = self.account + ':' + sub_account
        return __account

    def extract(self, file):
        self.account = self.file_account(file)
        return super().extract(file)

@PredictPostings(training_data='/beancount/personal.beancount')
class SmartSimpliiImporter(SimpliiImporter):
    '''
    A smart version of the Simplii importer.
    '''
    pass
So I have managed to get this working; however, I don't think it's the proper way to do it...
I changed the extract function like this:
def extract(self, file):
    self.account = self.file_account(file)
    postings = super().extract(file)
    self.account = 'Assets:Simplii'
    return postings
Basically, I set self.account to the value I need, call the parent class's extract function, saving the results to a variable, reset the self.account variable, and return the results. It seems more of a workaround than the proper way, but at least it's here in case it helps someone else out...
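A variant that avoids mutating shared state in the first place (a sketch, untested against beancount) is to keep the fixed root in its own attribute, so file_account never reads back the value that extract wrote:
class SimpliiImporter(csv.Importer):
    base_account = 'Assets:Simplii'  # fixed root; never reassigned

    def file_account(self, file):
        account = None
        if file:
            m = re.match(r'.+SIMPLII_(\d{4})_.*', file.name)
            if m:
                sub_account = self.account_map.get(m[1])
                if sub_account:
                    # build from the constant base, so repeated extracts
                    # cannot stack ':Chequing-9999' onto a derived value
                    account = self.base_account + ':' + sub_account
        return account

    def extract(self, file):
        self.account = self.file_account(file)
        return super().extract(file)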

What triggers the from_crawler classmethod?

I'm using Scrapy and I have the following functioning pipeline class:
class DynamicSQLlitePipeline(object):

    @classmethod
    def from_crawler(cls, crawler):
        # Here, you get whatever value was passed through the "docket" parameter
        docket = getattr(crawler.spider, "docket")
        return cls(docket)

    def __init__(self, docket):
        try:
            db_path = "sqlite:///" + settings.SETTINGS_PATH + "\\data.db"
            db = dataset.connect(db_path)
            table_name = docket[0:3]  # FIRST 3 LETTERS
            self.my_table = db[table_name]
        except Exception:
            # traceback.exec_print()
            pass

    def process_item(self, item, spider):
        try:
            test = dict(item)
            self.my_table.insert(test)
            print('INSERTED')
        except IntegrityError:
            print('THIS IS A DUP')
        return item  # pipelines should return the item for later pipelines
In my spider I have:
custom_settings = {
    'ITEM_PIPELINES': {
        'myproject.pipelines.DynamicSQLlitePipeline': 600,
    }
}
From a recent question I was pointed to What is the 'cls' variable used for in Python classes?
If I understand correctly, in order for the pipeline object to be instantiated (using the __init__ function), it requires a docket number. The docket number only becomes available once the from_crawler classmethod is run. But what triggers the from_crawler method? Again, the code is working.
The caller of a classmethod has to have a reference to the class object. They may just access it by name, like this:
DynamicSQLlitePipeline.from_crawler(crawler)
… or:
sqlitepipeline.DynamicSQLlitePipeline.from_crawler(crawler)
Or maybe you pass the class object to someone, and they store it and use it later like this:
pipelines[i].from_crawler(crawler)
In Scrapy, the usual way to register a set of pipelines with the framework, according to the docs, is like this:
ITEM_PIPELINES = {
    'myproject.pipelines.PricePipeline': 300,
    'myproject.pipelines.JsonWriterPipeline': 800,
}
(Also see the Extensions user guide, which explains how this fits into a scrapy project.)
Presumably you've done something similar in code you haven't shown us, putting something like 'sqlscraper.pipelines.DynamicSQLlitePipeline' in that dict. At some point, Scrapy goes through that dict, sorts it in order by the values, and instantiates each pipeline. (Because it has the name of the class, as a string, instead of the class object, this is a little trickier, but the details really aren't relevant here.)
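Simplified, the instantiation logic inside Scrapy looks something like this (a sketch of the idea, not Scrapy's actual source):
from scrapy.utils.misc import load_object

# for each 'dotted.path': priority entry in ITEM_PIPELINES, in priority order
cls = load_object('myproject.pipelines.DynamicSQLlitePipeline')
if hasattr(cls, 'from_crawler'):
    pipeline = cls.from_crawler(crawler)  # Scrapy calls the classmethod for you
else:
    pipeline = cls()                      # plain instantiation otherwise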

How to give a class a referenceable string name?

The scenario is I'm using an arg parser to get a command-line argument, auth_application.
The auth_application command can have many values, for example:
cheese
eggs
noodles
pizza
These values are related to a programmable class.
I'd like a way to name the class, possibly using a decorator.
So I can say:
if auth_application == Cheese.__name__:
    return Cheese()
Currently I maintain a tuple of auth_application names and have to expose that to my arg parser class as well as import the classes I need.
Any way to make this better? Is there a decorator for classes to name them?
I'm looking for a python 2.7 solution, but a python 3 solution might be useful to know.
Easy peasy:
class command(object):
    map = {}

    def __init__(self, commandname):
        self.name = commandname

    def __call__(self, cls):
        command.map[self.name] = cls
        return cls

class NullCommand(object):
    pass

@command('cheese')
class Cheese(object):
    pass

@command('eggs')
class Eggs(object):
    pass

def func(auth_application):
    return command.map.get(auth_application, NullCommand)()
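Usage would then be (continuing the example above):
func('cheese')   # -> a Cheese instance, registered by the decorator
func('unknown')  # -> a NullCommand instance, the fallback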
You can just keep a single list of all of your "allowed classes" and iterate over that to find the class being referred to from the command line:
allowed_classes = [Cheese, Eggs, Noodles, Pizza]
for cls in allowed_classes:
    if auth_application.lower() == cls.__name__.lower():
        return cls()
Absolutely you can! You need to understand class attributes:
class NamedClass(object):
    name = "Default"

class Cheese(NamedClass):
    name = "Cheese"

print(Cheese.name)
> Cheese
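To go from the name string back to the class, you could then scan the subclasses (a sketch building on the example above):
def find_named_class(name):
    """Return the NamedClass subclass whose name attribute matches."""
    for cls in NamedClass.__subclasses__():
        if cls.name == name:
            return cls
    return None

find_named_class('Cheese')()  # -> a Cheese instance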
You can use the standard inspect library to get the real class names, without having to augment your classes with any extra data; this works for any class, in any module, even if you don't have the source code.
For instance, to list all the classes defined in mymodule:
import mymodule
import inspect

for name, obj in inspect.getmembers(mymodule, inspect.isclass):
    print name
The obj variable is a real class object, which you can use to create an instance, access class methods, etc.
To get the definition of a class by its name string, you can write a simple search function:
import mymodule
import inspect

def find_class(name):
    """Find a named class in mymodule"""
    for this_name, _cls_ in inspect.getmembers(mymodule, inspect.isclass):
        if this_name == name:
            return _cls_
    return None

....
# Create an instance of the class named in auth_application
find_class(auth_application)(args, kwargs)
NB: Code snippets not tested

Best way to mix and match components in a python app

I have a component that uses a simple pub/sub module I wrote as a message queue. I would like to try out other implementations like RabbitMQ. However, I want to make this backend change configurable so I can switch between my implementation and 3rd party modules for cleanliness and testing.
The obvious answer seems to be to:
Read a config file
Create a modifiable settings object/dict
Modify the target component to lazily load the specified implementation.
Something like:
# component.py
from test.queues import Queue

class Component:
    def __init__(self, Queue=Queue):
        self.queue = Queue()

    def publish(self, message):
        self.queue.publish(message)

# queues.py
import test.settings as settings

def Queue(*args, **kwargs):
    klass = settings.get('queue')
    return klass(*args, **kwargs)
I'm not sure if __init__ should take in the Queue class, but I figure it would make it easy to specify the queue used while testing.
Another thought I had was something like http://www.voidspace.org.uk/python/mock/patch.html, though that seems like it would get messy. The upside would be that I wouldn't have to modify the code to support swapping components.
Any other ideas or anecdotes would be appreciated.
EDIT: Fixed indent.
One thing I've done before is to create a common class that each specific implementation inherits from. Then there's a spec that can easily be followed, and each implementation can avoid repeating certain code they'll all share.
This is a bad example, but you can see how you could make the saver object use any of the classes specified and the rest of your code wouldn't care.
class SaverTemplate(object):
    def __init__(self, name, obj):
        self.name = name
        self.obj = obj

    def save(self):
        raise NotImplementedError

import json

class JsonSaver(SaverTemplate):
    def save(self):
        file = open(self.name + '.json', 'wb')
        json.dump(self.obj, file)
        file.close()

import cPickle

class PickleSaver(SaverTemplate):
    def save(self):
        file = open(self.name + '.pickle', 'wb')
        cPickle.dump(self.obj, file, protocol=cPickle.HIGHEST_PROTOCOL)
        file.close()

import yaml

class YamlSaver(SaverTemplate):
    def save(self):
        file = open(self.name + '.yaml', 'wb')
        yaml.dump(self.obj, file)
        file.close()

saver = PickleSaver('whatever', foo)
saver.save()
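To tie this back to the configurable-backend goal, the concrete saver could then be picked from a settings value (a sketch; the config lookup and the dotted path are made up):
import importlib

def load_class(dotted_path):
    """Resolve a 'package.module.ClassName' string to the class object."""
    module_path, _, class_name = dotted_path.rpartition('.')
    return getattr(importlib.import_module(module_path), class_name)

# e.g. with saver = "savers.PickleSaver" in the config file:
saver_cls = load_class(config['saver'])
saver_cls('whatever', foo).save()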

App Engine (Python) Datastore Precall API Hooks

Background
So let's say I'm making an app for GAE, and I want to use API hooks.
BIG EDIT: In the original version of this question, I described my use case, but some folks correctly pointed out that it was not really suited for API Hooks. Granted! Consider me helped. But now my issue is academic: I still don't know how to use hooks in practice, and I'd like to. I've rewritten my question to make it much more generic.
Code
So I make a model like this:
class Model(db.Model):
    user = db.UserProperty(required=True)

    def pre_put(self):
        # Sets a value, raises an exception, whatever. Use your imagination.
        pass
And then I create a db_hooks.py:
from google.appengine.api import apiproxy_stub_map

def patch_appengine():
    def hook(service, call, request, response):
        assert service == 'datastore_v3'
        if call == 'Put':
            for entity in request.entity_list():
                entity.pre_put()

    apiproxy_stub_map.apiproxy.GetPreCallHooks().Append('preput',
                                                        hook,
                                                        'datastore_v3')
Being TDD-addled, I'm making all this using GAEUnit, so in gaeunit.py, just above the main method, I add:
import db_hooks
db_hooks.patch_appengine()
And then I write a test that instantiates and puts a Model.
Question
While patch_appengine() is definitely being called, the hook never is. What am I missing? How do I make the pre_put function actually get called?
Hooks are a little low level for the task at hand. What you probably want is a custom property class. DerivedProperty, from aetycoon, is just the ticket.
Bear in mind, however, that the 'nickname' field of the user object is probably not what you want - per the docs, it's simply the user part of the email field if they're using a gmail account, otherwise it's their full email address. You probably want to let users set their own nicknames, instead.
The issue here is that within the context of the hook() function an entity is not an instance of db.Model as you are expecting.
In this context, entity is the protocol buffer class confusingly referred to as entity (entity_pb). Think of it like a JSON representation of your real entity: all the data is there, and you could build a new instance from it, but there is no reference to your memory-resident instance that is waiting for its callback.
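If you do need a model instance inside the hook, the old db API can rebuild one from the protocol buffer, but note it is a copy, not the instance that called put() (a sketch, untested):
from google.appengine.ext import db

def hook(service, call, request, response):
    assert service == 'datastore_v3'
    if call == 'Put':
        for entity_pb in request.entity_list():
            # deserialize the protobuf into a fresh model instance (a copy)
            db.model_from_protobuf(entity_pb).pre_put()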
Monkey patching all of the various put/delete methods is the best way to set up Model-level callbacks, as far as I know.†
Since there don't seem to be many resources on how to do this safely with the newer async calls, here's a base model that implements before_put, after_put, before_delete & after_delete hooks:
class HookedModel(db.Model):
    def before_put(self):
        logging.error("before put")

    def after_put(self):
        logging.error("after put")

    def before_delete(self):
        logging.error("before delete")

    def after_delete(self):
        logging.error("after delete")

    def put(self):
        return self.put_async().get_result()

    def delete(self):
        return self.delete_async().get_result()

    def put_async(self):
        return db.put_async(self)

    def delete_async(self):
        return db.delete_async(self)
Inherit your model-classes from HookedModel and override the before_xxx,after_xxx methods as required.
Place the following code somewhere that will get loaded globally in your application (like main.py, if you use a pretty standard-looking layout). This is the part that calls our hooks:
def normalize_entities(entities):
    if not isinstance(entities, (list, tuple)):
        entities = (entities,)
    return [e for e in entities if hasattr(e, 'before_put')]

# monkeypatch put_async to call entity.before_put
db_put_async = db.put_async

def db_put_async_hooked(entities, **kwargs):
    ents = normalize_entities(entities)
    for entity in ents:
        entity.before_put()
    a = db_put_async(entities, **kwargs)
    get_result = a.get_result
    def get_result_with_callback():
        for entity in ents:
            entity.after_put()
        return get_result()
    a.get_result = get_result_with_callback
    return a

db.put_async = db_put_async_hooked

# monkeypatch delete_async to call entity.before_delete
db_delete_async = db.delete_async

def db_delete_async_hooked(entities, **kwargs):
    ents = normalize_entities(entities)
    for entity in ents:
        entity.before_delete()
    a = db_delete_async(entities, **kwargs)
    get_result = a.get_result
    def get_result_with_callback():
        for entity in ents:
            entity.after_delete()
        return get_result()
    a.get_result = get_result_with_callback
    return a

db.delete_async = db_delete_async_hooked
You can save or destroy your instances via model.put() or any of the db.put(), db.put_async(), etc. methods and get the desired effect.
†would love to know if there is an even better solution!?
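For illustration, usage might look like this (Article and its property are made up):
class Article(HookedModel):
    title = db.StringProperty()

    def before_put(self):
        self.title = self.title.strip()  # runs before the datastore write

Article(title='  Hello  ').put()  # before_put fires, then the write, then after_put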
I don't think that hooks are really going to solve this problem. The hooks will only run in the context of your App Engine application, but the user can change their nickname outside of your application using Google Account settings. If they do that, it won't trigger any logic implemented in your hooks.
I think that the real solution to your problem is for your application to manage its own nickname that is independent of the one exposed by the Users entity.
