I need to construct a tool that will be used to create field mappings (between tables) in the most automated manner possible.
Here is the deal: imagine a table being appended to another (let's ignore field types for a second...):
CREATE OR REPLACE TABLE fooA (
    id,
    name,
    type,
    foo
)

CREATE OR REPLACE TABLE otherFooTable (
    idFoo,
    nameFoo,
    spam
)
I am thinking of creating a structure like this:
fieldMap = {'otherFooTable': [('idFoo','id'),('nameFoo','name'),('spam','foo')]}
I would be able to access this using (for example)
print fieldMap['tabelax'][0][1]
It's not a very complex structure, but could I run into problems using it? Are there any suggestions for how to handle this sort of issue? I need to store (for now) at least inputTable (I don't want to repeat it for each field mapped), inputField, and outputField. There is no reason to store outputTable, because that is always known beforehand.
Suggestions and past experiences are deeply appreciated.
PS: perhaps a formal structure (like a class) would be better?
Thanks
I'd honestly just take hints from (or use) SQLAlchemy or Django Models. These are tried and true data representation methods.
Here is a little wrapper class for FooB's to mimic FooA's, but still retain their FooB-ishness.
from collections import namedtuple

# use namedtuple to define some simple classes (requires Py2.6 or later)
FooA = namedtuple('FooA', 'id name type foo')
FooB = namedtuple('FooB', 'idfoo namefoo spam')

# create a wrapper class for FooB's to look like a FooA
class FooAMimic(object):
    attrMap = dict(zip(FooA._fields, FooB._fields))
    # or if the fields aren't nicely ordered, declare this mapping explicitly
    #~ attrMap = { 'id' : 'idfoo', 'name' : 'namefoo', 'foo' : 'spam' }
    def __init__(self, obj):
        self.obj = obj
    def __getattr__(self, aname):
        ob = self.obj
        if aname in self.attrMap:
            return getattr(ob, self.attrMap[aname])
        elif hasattr(ob, aname):
            return getattr(ob, aname)
        else:
            raise AttributeError("no such attribute " + aname)
    def __dir__(self):
        return sorted(set(dir(super(FooAMimic, self))
                          + dir(self.obj)
                          + list(FooA._fields)))
Use it like this:
# make some objects, some FooA, some FooB
fa = FooA('a', 'b', 'c', 'd')
fb = FooB('xx', 'yy', 'zz')
fc = FooA('e', 'f', 'g', 'h')

# create a list of items that are FooA's, or FooA lookalikes
coll = [fa, FooAMimic(fb), fc]

# access the objects like FooA's, but notice that the wrapped FooB
# attributes are still available too
for f in sorted(coll, key=lambda k: k.id):
    print f.id, '=',
    try:
        print f.namefoo, "(really a namefoo)"
    except AttributeError:
        print f.name
Prints:
a = b
e = f
xx = yy (really a namefoo)
Think about this:
class Column( object ):
    def __init__( self, name, type_information=None ):
        self.name = name
        self.type_information = type_information
        self.pk = None
        self.fk_ref = None
    def fk( self, column ):
        self.fk_ref = column

class Table( object ):
    def __init__( self, name, *columns ):
        self.name = name
        self.columns = dict( (c.name, c) for c in columns )
    def column( self, name ):
        return self.columns[ name ]

Table( "FOOA", Column( "id" ), Column( "name" ), Column( "type" ), Column( "foo" ) )
Table( "otherFooTable", Column( "idFoo" ), Column( "nameFoo" ), Column( "spam" ) )
It's not clear at all what you're trying to do or why, so this is as good as anything, since it seems to represent the information you actually have.
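A usage sketch (the variable bindings are mine): the fk hook can record which source column feeds each target column, so the mapping lives on the columns themselves rather than in positional tuples:
fooA = Table( "FOOA", Column( "id" ), Column( "name" ), Column( "type" ), Column( "foo" ) )
other = Table( "otherFooTable", Column( "idFoo" ), Column( "nameFoo" ), Column( "spam" ) )

# record which fooA column each otherFooTable column maps onto
other.column( "idFoo" ).fk( fooA.column( "id" ) )
other.column( "nameFoo" ).fk( fooA.column( "name" ) )
other.column( "spam" ).fk( fooA.column( "foo" ) )

print other.column( "idFoo" ).fk_ref.name   # prints: id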
Try to avoid accessing your data through fixed numerical indexes as in fieldMap['tabelax'][0][1]. After a year of not looking at your code, it may take you (or others) a while to figure out what it all means in human terms (e.g. "the value of idFoo in table tabelax"). Also, if you ever need to change your data structure (e.g. add another field) then some/all your numerical indexes may need fixing. Your code becomes ossified when the risk of breaking the logic prevents you from modifying the data structure.
It is much better to use a class and use class (accessor) methods to access the data structure. That way, the code outside of your class can be preserved even if you need to change your data structure (inside the class) at some future date.
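For example, a minimal sketch of that idea (the class and method names are illustrative, not a prescribed design):
class FieldMap(object):
    """Maps fields of one input table onto the (already known) output table."""
    def __init__(self, input_table):
        self.input_table = input_table
        self._pairs = []   # list of (input_field, output_field)

    def add(self, input_field, output_field):
        self._pairs.append((input_field, output_field))

    def output_field(self, input_field):
        for in_f, out_f in self._pairs:
            if in_f == input_field:
                return out_f
        raise KeyError(input_field)

fm = FieldMap('otherFooTable')
fm.add('idFoo', 'id')
fm.add('nameFoo', 'name')
fm.add('spam', 'foo')

# reads as "the output field for idFoo", not fieldMap['tabelax'][0][1]
print fm.output_field('idFoo')   # prints: id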
This question is posed as a general Python question, but will be exemplified using Kubeflow Pipelines SDK, kfp==1.8.12. In this module, I want to create a helper class around the class dsl.ContainerOp to simplify a lot of my work. As a minimal example, below I use this class to create a component as such:
from kfp import dsl

name = 'My name'
image = 'My image'
docker_entrypoint = "/main.py"
docker_args = [
    '--arg1', 'some arg',
    '--arg2', 'some other arg'
]

component = dsl.ContainerOp(
    name=name,
    image=image,
    arguments=[docker_entrypoint] + docker_args
)
Then, I would like to set one of its attributes related to caching:
use_caching = False

if use_caching:
    staleness = "P30D"
else:
    staleness = "P0D"

component.execution_options.caching_strategy.max_cache_staleness = staleness
which works fine, as expected. Now, I would like to create a ContainerOpHelper class to simplify a lot of my argument passing (the "real" code has a lot of parameters). Problem: I need to set the attribute execution_options.caching_strategy.max_cache_staleness from the class, but I can't figure out how! Here is the helper class, and my attempt to set the attribute:
class ContainerOpHelper(dsl.ContainerOp):
    def __init__(
        self,
        name: str,
        image: str,
        docker_entrypoint: str = None,
        docker_args: list = None,
        use_caching: bool = None
    ):
        super().__init__(
            name=name,
            image=image,
            arguments=([docker_entrypoint] if docker_entrypoint else []) + (docker_args if docker_args else [])
        )

        if use_caching:
            staleness = "P30D"
        else:
            staleness = "P0D"

        # tried to be creative, but this doesn't work
        super.__setattr__("execution_options.caching_strategy.max_cache_staleness", staleness)
This helper class can then be used as such:
component = ContainerOpHelper(
    name='My name',
    image='My image',
    docker_entrypoint="/main.py",
    docker_args=[
        '--arg1', 'some arg',
        '--arg2', 'some other arg'
    ],
    use_caching=False
)
Since the attribute execution_options.caching_strategy.max_cache_staleness is "many levels deep", I'm not sure how I can set it in my helper class. Any ideas?
The solution was fairly simple, as provided by the comment of @quamrana.
Simply set it directly in the child class constructor:
self.execution_options.caching_strategy.max_cache_staleness = staleness
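Putting it together, the helper's constructor might look like this (a sketch assembled from the code above):
class ContainerOpHelper(dsl.ContainerOp):
    def __init__(
        self,
        name: str,
        image: str,
        docker_entrypoint: str = None,
        docker_args: list = None,
        use_caching: bool = False
    ):
        super().__init__(
            name=name,
            image=image,
            arguments=([docker_entrypoint] if docker_entrypoint else []) + (docker_args or [])
        )
        # self already is a ContainerOp here, so plain attribute access works
        self.execution_options.caching_strategy.max_cache_staleness = (
            "P30D" if use_caching else "P0D"
        )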
I have a SQLAlchemy model:
class Ticket(db.Model):
    __tablename__ = 'ticket'

    id = db.Column(INTEGER(unsigned=True), primary_key=True, nullable=False,
                   autoincrement=True)
    cluster = db.Column(db.VARCHAR(128))

    @classmethod
    def get(cls, cluster=None):
        query = db.session.query(Ticket)
        if cluster is not None:
            query = query.filter(Ticket.cluster == cluster)
        return query.one()
If I add a new column and would like to extend the get method, I have to add one more "if xxx is not None" check, like this:
@classmethod
def get(cls, cluster=None, user=None):
    query = db.session.query(Ticket)
    if cluster is not None:
        query = query.filter(Ticket.cluster == cluster)
    if user is not None:
        query = query.filter(Ticket.user == user)
    return query.one()
Is there any way I could make this more efficient? If I have too many columns, the get method would become so ugly.
As always, if you don't want to write something repetitive, use a loop:
@classmethod
def get(cls, **kwargs):
    query = db.session.query(cls)
    for k, v in kwargs.items():
        query = query.filter(getattr(cls, k) == v)
    return query.one()
Because we're no longer setting cluster=None/user=None as defaults (instead, anything the caller doesn't specify simply never appears in kwargs), we no longer need to prevent filters for null values from being added: the only way a null value ends up in the argument list is if the caller actually asked to search for a value of None, so this new code is able to honor that request should it ever take place.
If you prefer to retain the calling convention where cluster and user can be passed positionally (but the user can't search for a value of None), see the initial version of this answer.
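A usage sketch (the values are illustrative): only the keywords actually passed become filters, so no None checks are needed:
ticket = Ticket.get(cluster='some-cluster')
ticket = Ticket.get(cluster='some-cluster', user='alice')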
I have the following SQLAlchemy class defined:
Base = sqlalchemy.ext.declarative.declarative_base()

class NSASecrets(Base):
    __tablename__ = 'nsasecrets'

    id = sqlalchemy.Column(sqlalchemy.Integer, primary_key=True)
    text = sqlalchemy.Column(sqlalchemy.String)
    author = sqlalchemy.Column(sqlalchemy.String)
Now what I want to do is to be able to mask the "author" field depending on some logic, something like:
if allowed:
    nsasecrets = session.query(NSASecrets, mask=False)   # hypothetical "mask" parameter
else:
    nsasecrets = session.query(NSASecrets, mask=True)

for nsasecret in nsasecrets:
    print '{0} {1}'.format(nsasecret.author, nsasecret.text)
So depending on this "mask" parameter, I would like the output to be "John Smith" in the False case (not masked), or "J*** **h" when the output is masked. Now obviously I could do the masking in this very print, but the problem is that prints are scattered around the code, and the only way I see to do this in a controlled, centralized manner is to create the SQLAlchemy objects with already-masked values. So is there any well-known solution to this? Should I create my own session manager that overloads the "query" interface, or am I missing some other possible solution?
Thanks
This is typically the kind of thing we do in Python with something called descriptors. A simple way to combine descriptors with SQLAlchemy mapped columns is to use synonym, though synonym is a bit dated at this point, in favor of a less "magic" system called hybrids. Either can be used here; below is an example of a hybrid:
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.hybrid import hybrid_property

Base = declarative_base()

class NSASecrets(Base):
    __tablename__ = 'nsasecrets'

    id = Column(Integer, primary_key=True)
    _text = Column("text", String)
    _author = Column("author", String)

    def _obfuscate(self, value):
        # keep the first and last characters, star out the rest
        return "%s%s%s" % (value[0], "*" * (len(value) - 2), value[-1])

    @hybrid_property
    def text(self):
        return self._obfuscate(self._text)

    @text.setter
    def text(self, value):
        self._text = value

    @text.expression
    def text(cls):
        return cls._text

    @hybrid_property
    def author(self):
        return self._obfuscate(self._author)

    @author.setter
    def author(self, value):
        self._author = value

    @author.expression
    def author(cls):
        return cls._author

n1 = NSASecrets(text='some text', author="some author")

print n1.text     # s*******t
print n1.author   # s*********r
Note that this doesn't have much to do with querying. Formatting the data as it arrives in a result set is a different way to go, and there are ways to make that happen too, though if you're only concerned about print statements that refer to "text" and "author", it's likely more convenient to keep this as a Python access pattern.
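For completeness, the .expression blocks mean that class-level access still yields the raw columns, so queries filter on the unmasked values (a sketch assuming a configured session):
# compares against the stored, unmasked column value
secrets = session.query(NSASecrets).filter(NSASecrets.author == 'some author')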
Looking at the bottom of the post, you can see I have three classes. The code here is pseudocode written on the fly and untested, but it adequately shows my problem. If we need the actual classes, I can update this question tomorrow when at work. So ignore syntax issues and code that only represents a thought rather than the actual "code" that would do what I describe.
Question 1
If you look at the Item search class method, you can see that when the user does a search I call search on the base class, then based on that result return the correct class/object. This works but seems kludgy. Is there a better way to do this?
Question 2
If you look at the KitItem class, you can see that I am overriding the list price. If the flag calc_list is set to true, I sum the list prices of the components and return that as the list price for the kit. If it's not marked as true, I want to return the "base" list price. However, as far as I know there is no way to access a parent attribute, since in a normal setup that would be meaningless, but with SQLAlchemy and shared table inheritance it could be useful.
TIA
class Item(DeclarativeBase):
    __tablename__ = 'items'

    item_id = Column(Integer, primary_key=True, autoincrement=True)
    sku = Column(Unicode(50), nullable=False, unique=True)
    list_price = Column(Float)
    cost_price = Column(Float)
    item_type = Column(Unicode(1))

    __mapper_args__ = {'polymorphic_on': item_type}

    def __init__(self, sku, list_price, cost_price):
        self.sku = sku
        self.list_price = list_price
        self.cost_price = cost_price

    @classmethod
    def search(cls):
        """
        search based on sku, description, long description
        return item as proper class
        """
        item = DBSession.query(cls).filter(...)  # do search stuff here
        if item.item_type == 'K':  # Better way to do this???
            return DBSession.query(KitItem).get(item.item_id)

class KitItem(Item):
    __mapper_args__ = {'polymorphic_identity': 'K'}

    calc_list = Column(Boolean, nullable=False, default=False)

    @property
    def list_price(self):
        if self.calc_list:
            list_price = 0.0
            for comp in self.components:
                list_price += comp.component.list_price * comp.qty
            return list_price
        else:
            # need help here
            item = DBSession.query(Item).get(self.item_id)
            return item.list_price

class KitComponent(DeclarativeBase):
    __tablename__ = "kit_components"

    kit_id = Column(Integer, ForeignKey('items.item_id'), primary_key=True)
    component_id = Column(Integer, ForeignKey('items.item_id'), primary_key=True)
    qty = Column(Integer, nullable=False, default=1)

    kit = relation(KitItem, backref=backref("components"))
    component = relation(Item)
Answer-1: in fact you do not need to do anything special here: given that you configured your inheritance hierarchy properly, your query will already return the proper class for every row (Item or KitItem). This is the advantage of the ORM. What you could do, though, is configure the query to immediately load the additional columns that belong to children of Item (from your code this is only the calc_list column), which you can do by specifying with_polymorphic('*'):
@classmethod
def search(cls):
    item = DBSession.query(cls).with_polymorphic('*').filter(...)  # do search stuff here
    return item
Read more on this in Basic Control of Which Tables are Queried.
To see the difference, enable SQL logging and compare your test scripts with and without with_polymorphic(...): you will most probably see fewer SQL statements being executed.
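For example, one way to turn that logging on (a sketch; assumes you create the engine yourself):
from sqlalchemy import create_engine

# echo=True makes the engine log every SQL statement it executes
engine = create_engine("sqlite://", echo=True)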
Answer-2: I would not override a stored attribute with one that is purely computed. Instead, I would just create another computed attribute (let's call it total_price), which would look like the following for each of the two classes:
class Item(Base):
    ...
    @property
    def total_price(self):
        return self.list_price

class KitItem(Item):
    ...
    @property
    def total_price(self):
        if self.calc_list:
            _price = 0.0
            for comp in self.components:
                _price += comp.component.list_price * comp.qty
            return _price
        else:
            # note: again, no query is needed here at all, as *self* is all you need
            return self.list_price
Also in this case, you might think of configuring the relationship KitItem.components to be eagerly loaded, so that the calculation of total_price will not trigger additional SQL. But you have to decide for yourself whether this is beneficial for your use cases (again, analyze the SQL generated in your scenario).
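A sketch of one way to do that with a query-time option (the chained joinedload API assumes SQLAlchemy 0.9 or later):
from sqlalchemy.orm import joinedload

# load each kit, its components, and their items in a single round trip
kits = DBSession.query(KitItem).options(
    joinedload(KitItem.components).joinedload(KitComponent.component)
).all()
for kit in kits:
    print kit.total_price   # no extra SQL fired per kit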
What is the best way to accomplish the following, either by subclassing tuple or by some other trick?
region = ( "buffer", "region" )
region.cmd = ( "kill", "mark" )
You can simply subclass tuple without modification and it will work. By subclassing a built-in class it gains the ability to have arbitrary properties assigned to it, like normal user-defined classes.
class Region(tuple):
    pass

region = Region(("buffer", "region"))
region.cmd = ("kill", "mark")
class Region(tuple):
    def __new__(cls, iterable):
        # tuple is immutable, so its contents must be set in __new__
        return super(Region, cls).__new__(cls, iterable)

    def __init__(self, iterable):
        self.cmd = None

region = Region(("buffer", "region"))
region.cmd = ("kill", "mark")
I'm not sure what you're asking.
If you want a nested data structure, I'd probably use a dictionary instead of a tuple:
region = {
    "buffer": ("kill", "mark"),
    "region": ("kill", "mark")
}
Tuples are immutable as well, I believe.
Use a named tuple: http://docs.python.org/dev/library/collections.html#collections.namedtuple
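A sketch of that suggestion, modelling the value pair and its command tuple as named fields (the field names are illustrative):
from collections import namedtuple

Region = namedtuple('Region', 'names cmd')

region = Region(names=("buffer", "region"), cmd=("kill", "mark"))
print region.names   # ('buffer', 'region')
print region.cmd     # ('kill', 'mark')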