I am currently using this piece of code:
class FileSystem(metaclass=Singleton):
    """File System manager based on Spark"""

    def __init__(self, spark):
        self._path = spark._jvm.org.apache.hadoop.fs.Path
        self._fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(
            spark._jsc.hadoopConfiguration()
        )

    @classmethod
    def without_spark(cls):
        with Spark() as spark:
            return cls(spark)
My object obviously depends on the Spark object (another object that I created; I can add its code if you need to see it, but I do not think it is required for my current issue).
It can be used in two different ways, resulting in the same behavior:
fs = FileSystem.without_spark()
# OR
with Spark() as spark:
fs = FileSystem(spark)
My problem is that, even though FileSystem is a singleton, using the class method without_spark makes me enter (__enter__) the context manager of Spark, which leads to a connection to the Spark cluster, and that takes a lot of time. How can I make the first execution of without_spark do the connection, but have subsequent executions only return the already created instance?
The expected behavior would be something like this:
@classmethod
def without_spark(cls):
    if not cls.exists:  # I do not know how to persist this information in the class
        with Spark() as spark:
            return cls(spark)
    else:
        return cls()
I think you are looking for something like
import contextlib

class FileSystem(metaclass=Singleton):
    """File System manager based on Spark"""

    spark = None

    def __init__(self, spark):
        self._path = spark._jvm.org.apache.hadoop.fs.Path
        self._fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(
            spark._jsc.hadoopConfiguration()
        )

    @classmethod
    def without_spark(cls):
        if cls.spark is None:
            cm = cls.spark = Spark()
        else:
            cm = contextlib.nullcontext(cls.spark)
        with cm as s:
            return cls(s)
The first time without_spark is called, a new instance of Spark is created and used as a context manager. Subsequent calls reuse the same Spark instance and use a null context manager.
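A quick sketch of the resulting behavior (assuming the Singleton metaclass returns the cached instance on later calls):
fs1 = FileSystem.without_spark()  # first call: Spark() is entered, cluster connection happens
fs2 = FileSystem.without_spark()  # later calls: nullcontext wraps the cached Spark, no reconnect
assert fs1 is fs2                 # the Singleton metaclass hands back the same instance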
I believe your approach will work as well; you just need to initialize exists to be False, then set it to True the first (and every, really) time you call the class method.
class FileSystem(metaclass=Singleton):
    """File System manager based on Spark"""

    exists = False

    def __init__(self, spark):
        self._path = spark._jvm.org.apache.hadoop.fs.Path
        self._fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(
            spark._jsc.hadoopConfiguration()
        )

    @classmethod
    def without_spark(cls):
        if not cls.exists:
            cls.exists = True
            with Spark() as spark:
                return cls(spark)
        else:
            return cls()
Can't you make the constructor argument optional, and initialize the Spark lazily, e.g. in a property (or functools.cached_property):
from functools import cached_property

class FileSystem(metaclass=Singleton):
    def __init__(self, spark=None):
        self._spark = spark

    @cached_property
    def spark(self):
        if self._spark is None:
            self._spark = Spark()
        return self._spark

    @cached_property
    def path(self):
        return self.spark._jvm.org.apache.hadoop.fs.Path

    @cached_property
    def fs(self):
        with self.spark:
            return self.spark._jvm.org.apache.hadoop.fs.FileSystem.get(
                self.spark._jsc.hadoopConfiguration()
            )
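A hedged usage sketch (assuming Spark() is the context-manager class from the question):
fs = FileSystem()    # instant: no Spark connection is made yet
path_cls = fs.path   # first access builds Spark() lazily via the spark property
fs_handle = fs.fs    # each cached_property is computed once, then cached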
Is there a way to have completion/intellisense on (*args, **kwargs) functions?
For instance:
class GetVar(GetVarInterface):
    @classmethod
    def fromcustom(cls, locorvar, offset=0, varType="int", name=None, deref=False, member=None):
        return GetVarCustom(locorvar, offset, varType, name, deref, member)

class GetVarCustom(GetVar):
    def __init__(self, locorvar, offset=0, varType="int", name=None, deref=False, member=None):
I wanted to implement this without specifying every argument of the constructor (for example using *vars, **kwargs) but didn't want to lose completion/intellisense abilities. Is there a way?
The disadvantage of the current implementation is that you would have to replicate the signature twice for every change...
The only option is to add a comment under the function to hint at the arguments; otherwise you can't: if the IDE sees that a function has undefined arguments, it will show them as undefined.
A "solution" is to just use the common arguments and pass the rest as kwargs, or you can keep the original init.
class Single_Init:
    def __init__(self, val_a, val_b, name=None):
        self.val_a = val_a
        self.val_b = val_b
        self.name = name

class Single_Init_B(Single_Init):
    # The previous constructor is called
    def get_result(self):
        return self.val_a + self.val_b

class Split_Const:
    def op_offset(self, offset):
        self.offset = offset

    def __init__(self, name, member=None, **kwargs):
        """You can also hint in a function comment"""
        self.name = name
        self.member = member
        if 'offset' in kwargs:
            self.offset = kwargs['offset']
        else:
            self.offset = None

if __name__ == '__main__':
    single = Single_Init_B(2, 3)
    print('Single:', single.get_result())
    split = Split_Const('Name')
    split.op_offset(0.5)
    print('Split:', split.offset)
I got the solution outside this site:
import functools

@functools.wraps(functools.partial(GetVarCustom.__init__, 1))
def f(*args, **kwargs):
    return GetVarCustom(*args, **kwargs)
Of course, it would have been easier in the case of a standard function. However, you need to update the assigned attribute of wraps; otherwise it will change the function name.
@functools.wraps(GetVarCustom.value, assigned=['__doc__'])
def getvalue(*args, **kwargs):
    return self_custom.value(*args, **kwargs)
I want to have an abstract class Task and some derived classes like TaskA, TaskB, ...
I need a static method in Task fetching all the tasks and returning a list of them. But the problem is that I have to fetch every task differently. I want Task to be universal, so when I create a new class, for example TaskC, it should work without changing the Task class. Which design pattern should I use?
Let's say every derived Task will have a decorator with its unique id. I am looking for a function that would find a class by id and create an instance of it. How to do it in Python?
There are a couple of ways you could achieve this.
The first and most simple is using the __new__ method as a factory to decide which subclass should be returned.
class Base:
    UUID = "0"

    def __new__(cls, *args, **kwargs):
        if args == "some condition":
            return A(*args, **kwargs)
        elif args == "another condition":
            return B(*args, **kwargs)

class A(Base):
    UUID = "1"

class B(Base):
    UUID = "2"

instance = Base("some", "args", "for", "the", condition=True)
In this example, if you wanted to make sure that the class is selected by UUID, you can replace the if condition to read something like:
if A.UUID == "an argument you passed":
    return A
But it's not really useful: since you have knowledge of the specific UUID, you might as well not bother going through the interface.
Since I don't know what you want the decorator for, I can't think of a way to integrate it.
EDIT TO ADDRESS THE NOTE:
You don't need to update it every time if you write your expressions smartly.
Let's say that the defining factor comes from a config file that says "use class B":
for sub_class in cls.__subclasses__():
    if sub_class.UUID == config.uuid:
        return sub_class(*args, **kwargs)  # make an instance and return it
The problem with that is that a UUID is not meaningful to us as people; it would be easier to understand if we instead used a config.name in every place we have uuid in the example.
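A minimal sketch of that name-based variant (config is a stand-in for whatever configuration object you use):
def subclass_by_name(base, name):
    # match on the class name instead of a UUID
    for sub in base.__subclasses__():
        if sub.__name__ == name:
            return sub
    raise LookupError("no subclass of %s named %r" % (base.__name__, name))

# instance = subclass_by_name(Base, config.name)(*args, **kwargs)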
I fought with this for a long time, and this is exactly what I wanted:
from typing import Dict

def class_id(id: int):
    def func(cls):
        cls.class_id = lambda: id
        return cls
    return func

def find_subclass_by_id(cls: type, id: int) -> type:
    for t in cls.__subclasses__():
        if getattr(t, "class_id")() == id:
            return t

def get_class_id(obj) -> int:
    return getattr(type(obj), "class_id")()

class Task:
    def load(self, dict: Dict) -> None:
        pass

    @staticmethod
    def from_dict(dict: Dict) -> 'Task':
        task_type = int(dict['task_type'])
        t = find_subclass_by_id(Task, task_type)
        obj: Task = t()
        obj.load(dict)
        return obj

    @staticmethod
    def fetch(filter: Dict):
        return [Task.from_dict(doc) for doc in list_of_dicts]

@class_id(1)
class TaskA(Task):
    def load(self, dict: Dict) -> None:
        ...
...
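A hedged usage sketch (TaskA is registered with id 1 above; the dict shape follows from_dict):
doc = {'task_type': '1'}    # hypothetical fetched document
task = Task.from_dict(doc)  # find_subclass_by_id(Task, 1) returns TaskA
assert isinstance(task, TaskA)
assert get_class_id(task) == 1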
I am attempting to modify the code base found at this Github repository. The project aims to create a system which uses function decorators to add functions to a directed acyclic graph representing a pipeline of tasks to perform on some given input.
I would like to modify the project by creating a Task class which has a method process that is decorated to perform the tasks in the pipeline, as opposed to the use of top-level functions, as is the current functionality.
class Task:
    def __init__(self, name):
        self.name = name

    # @Pipeline.task()
    def process(self, input):
        return input

class AdditionTask(Task):
    def __init__(self, name, value):
        super().__init__(name)
        self.value = value

    @Pipeline.task()
    def process(self, input):
        return map(lambda x: x + self.value, input)
I assume this is a valid decoration, given the example3.py code provided in this link describing Python decorators. However, when attempting to do this, an error is generated when applying the decorator:
TypeError: task() missing 1 required positional argument: 'self'
As far as I can tell, when AdditionTask is instantiated, the function decorator is not within the same scope as the instantiation of the Pipeline() object, as evidenced by the SSCCE below:
from collections import deque

# https://github.com/vdymna/generic-python-pipeline/blob/master/pipeline/dag.py
class DAG:
    """Directed acyclic graph structure to manage pipeline task dependencies."""

    def __init__(self):
        self.graph = {}

    def add(self, node, points_to=None):
        """Add a new task node to the graph, with an optional 'points_to' parameter."""
        if node not in self.graph:
            self.graph[node] = []
        if points_to:
            if points_to not in self.graph:
                self.graph[points_to] = []
            self.graph[node].append(points_to)  # todo: need to make sure not to add duplicates
        # if the sorted tasks and the original graph differ in length, there must be a cycle
        if len(self.sort()) != len(self.graph):
            raise Exception('A cycle is detected in the graph')

    def sort(self):
        """Sort all the task nodes based on the dependencies."""
        self.in_degrees()
        nodes_to_visit = deque()
        for node, pointers in self.degrees.items():
            # find all root nodes
            if pointers == 0:
                nodes_to_visit.append(node)
        sorted_nodes = []
        while nodes_to_visit:
            node = nodes_to_visit.popleft()
            for pointer in self.graph[node]:
                self.degrees[pointer] -= 1
                if self.degrees[pointer] == 0:
                    nodes_to_visit.append(pointer)
            sorted_nodes.append(node)
        return sorted_nodes

    def in_degrees(self):
        """Determine the number of in-coming edges for each task node."""
        self.degrees = {}
        for node in self.graph:
            if node not in self.degrees:
                self.degrees[node] = 0
            for pointed in self.graph[node]:
                if pointed not in self.degrees:
                    self.degrees[pointed] = 0
                self.degrees[pointed] += 1
# https://github.com/vdymna/generic-python-pipeline/blob/master/pipeline/pipeline.py
class Pipeline:
    """Create a pipeline by chaining multiple tasks and identifying dependencies."""

    def __init__(self):
        self.tasks = DAG()

    def task(self, depends_on=None):
        """Add a new task to the pipeline and specify the dependency task (optional)."""
        def inner(func):
            if depends_on:
                self.tasks.add(depends_on, func)
            else:
                self.tasks.add(func)
            return func
        return inner

    def run(self, *args):
        """Execute the pipeline and return each task's results."""
        sorted_tasks = self.tasks.sort()
        completed = {}
        for task in sorted_tasks:
            for depend_on, nodes in self.tasks.graph.items():
                if task in nodes:
                    completed[task] = task(completed[depend_on])
            if task not in completed:
                if sorted_tasks.index(task) == 0:
                    completed[task] = task(*args)
                else:
                    completed[task] = task()
        return completed

class Task:
    def __init__(self, name):
        self.name = name

    # @Pipeline.task()
    def process(self, input):
        return input

class AdditionTask(Task):
    def __init__(self, name, value):
        super().__init__(name)
        self.value = value

    @Pipeline.task()
    def process(self, input):
        return map(lambda x: x + self.value, input)

if __name__ == "__main__":
    pipeline = Pipeline()
    add_op = AdditionTask(4)
    print(pipeline.run(range(0, 4)))
Is there a way I can decorate class methods with the function decorator defined in Pipeline, to wrap them into the pipeline's functionality?
The only potential solution I have now is to create a class global in Pipeline: that is, instead of defining self.tasks I would utilize Pipeline.tasks, so that the addition of tasks to the pipeline is independent of scoping. But this has issues if I would like to create multiple Pipelines, and in the administration of the pipeline's tasks and DAG.
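For reference, a minimal sketch of that class-global workaround (it assumes the DAG class above, and shares one DAG across every Pipeline instance, which is exactly the drawback just described):
class Pipeline:
    tasks = DAG()  # class-level, shared by all instances

    @classmethod
    def task(cls, depends_on=None):
        def inner(func):
            if depends_on:
                cls.tasks.add(depends_on, func)
            else:
                cls.tasks.add(func)
            return func
        return inner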
EDIT: Hmm, I think I want something akin to the final advanced decorator example in this blog post. Here a decorator is created that is applied to a class. The decorator inspects the decorated class for attributes which it can then apply another decorator to. In this case, a decorator which determines a function's runtime. From the post,
def time_this(original_function):
    print("decorating")

    def new_function(*args, **kwargs):
        print("starting timer")
        import datetime
        before = datetime.datetime.now()
        x = original_function(*args, **kwargs)
        after = datetime.datetime.now()
        print("Elapsed Time = {0}".format(after - before))
        return x
    return new_function

def time_all_class_methods(Cls):
    class NewCls(object):
        def __init__(self, *args, **kwargs):
            self.oInstance = Cls(*args, **kwargs)

        def __getattribute__(self, s):
            """
            this is called whenever any attribute of a NewCls object is accessed. This function first tries to
            get the attribute off NewCls. If it fails then it tries to fetch the attribute from self.oInstance (an
            instance of the decorated class). If it manages to fetch the attribute from self.oInstance, and
            the attribute is an instance method then `time_this` is applied.
            """
            try:
                x = super(NewCls, self).__getattribute__(s)
            except AttributeError:
                pass
            else:
                return x
            x = self.oInstance.__getattribute__(s)
            if type(x) == type(self.__init__):  # it is an instance method
                return time_this(x)  # this is equivalent of just decorating the method with time_this
            else:
                return x
    return NewCls

# now let's make a dummy class to test it out on:
@time_all_class_methods
class Foo(object):
    def a(self):
        print("entering a")
        import time
        time.sleep(3)
        print("exiting a")

oF = Foo()
oF.a()
Though this again presents the same issue I identified in the original posting: I see no way to pass a Pipeline instance to the decorator described by time_all_class_methods. While I may be able to decorate the Task class with a class akin to time_all_class_methods, it still would not be aware of a given instantiated Pipeline instance to which the decorated attributes would be added, i.e. to Pipeline.tasks.
I've got a large library of Django apps that are shared by a handful of Django projects/sites. Within each project/site there is an option to define a 'Mix In' class that will be mixed in to one of the in-library base classes (which many models sub-class from).
For this example let's say the in-library base class is PermalinkBase and the mix-in class is ProjectPermalinkBaseMixIn.
Because so many models subclass from PermalinkBase, not all the methods/properties defined in ProjectPermalinkBaseMixIn will be utilized by all of PermalinkBase's subclasses.
I'd like to write a decorator that can be applied to methods/properties within ProjectPermalinkBaseMixIn in order to prevent them from running (or at least make them return None) if they are accessed from a non-approved class.
Here's how I'm doing it now:
class ProjectPermalinkBaseMixIn(object):
    """
    Project-specific Mix-In Class to `apps.base.models.PermalinkBase`
    """
    def is_video_in_season(self, season):
        # Ensure this only runs if it is being called from the Video model
        if self.__class__.__name__ != 'Video':
            to_return = None
        else:
            videos_in_season = season.videos_in_this_season.all()
            if self in list(videos_in_season):
                to_return = True
            else:
                to_return = False
        return to_return
Here's how I'd like to do it:
class ProjectPermalinkBaseMixIn(object):
    """
    Project-specific Mix-In Class to `apps.base.models.PermalinkBase`
    """
    @limit_to_model('Video')
    def is_video_in_season(self, season):
        videos_in_season = season.videos_in_this_season.all()
        if self in list(videos_in_season):
            to_return = True
        else:
            to_return = False
        return to_return
Is this possible with decorators? This answer helped me to better understand decorators but I couldn't figure out how to modify it to solve the problem I listed above.
Are decorators the right tool for this job? If so, how would I write the limit_to_model decorator function? If not, what would be the best way to approach this problem?
I was looking at your problem, and I think this might be an overcomplicated way to achieve what you are trying to do. However, I wrote this bit of code:
def disallow_class(*klass_names):
    def function_handler(fn):
        def decorated(self, *args, **kwargs):
            if self.__class__.__name__ in klass_names:
                print("access denied to class: %s" % self.__class__.__name__)
                return None
            return fn(self, *args, **kwargs)
        return decorated
    return function_handler

class MainClass(object):
    @disallow_class('DisallowedClass', 'AnotherDisallowedClass')
    def my_method(self, *args, **kwargs):
        print("my_method running!! %s" % self)

class DisallowedClass(MainClass): pass
class AnotherDisallowedClass(MainClass): pass
class AllowedClass(MainClass): pass

if __name__ == "__main__":
    x = DisallowedClass()
    y = AnotherDisallowedClass()
    z = AllowedClass()
    x.my_method()
    y.my_method()
    z.my_method()
If you run this bit of code on your command line the output will be something like:
access denied to class: DisallowedClass
access denied to class: AnotherDisallowedClass
my_method running!! <__main__.AllowedClass object at 0x7f2b7105ad50>
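For the limit_to_model decorator the question actually asked about, the same shape inverted into an allow-list should work; a minimal, untested sketch:
def limit_to_model(*allowed_names):
    def function_handler(fn):
        def decorated(self, *args, **kwargs):
            # run the method only when the instance's class is explicitly allowed
            if self.__class__.__name__ not in allowed_names:
                return None
            return fn(self, *args, **kwargs)
        return decorated
    return function_handler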
I have some way of building a data structure (out of some file contents, say):
def loadfile(FILE):
    return  # some data structure created from the contents of FILE
So I can do things like
puppies = loadfile("puppies.csv")  # wait for loadfile to work
kitties = loadfile("kitties.csv")  # wait some more
print(len(puppies))
print(puppies[32])
In the above example, I wasted a bunch of time actually reading kitties.csv and creating a data structure that I never used. I'd like to avoid that waste without constantly checking if not kitties whenever I want to do something. I'd like to be able to do
puppies = lazyload("puppies.csv")  # instant
kitties = lazyload("kitties.csv")  # instant
print(len(puppies))  # wait for loadfile
print(puppies[32])
So if I don't ever try to do anything with kitties, loadfile("kitties.csv") never gets called.
Is there some standard way to do this?
After playing around with it for a bit, I produced the following solution, which appears to work correctly and is quite brief. Are there some alternatives? Are there drawbacks to using this approach that I should keep in mind?
class lazyload:
    def __init__(self, FILE):
        self.FILE = FILE
        self.F = None

    def __getattr__(self, name):
        if not self.F:
            print("loading %s" % self.FILE)
            self.F = loadfile(self.FILE)
        return object.__getattribute__(self.F, name)
What might be even better is if something like this worked:
class lazyload:
    def __init__(self, FILE):
        self.FILE = FILE

    def __getattr__(self, name):
        self = loadfile(self.FILE)  # this never gets called again
                                    # since self is no longer a
                                    # lazyload instance
        return object.__getattribute__(self, name)
But this doesn't work because self is local. It actually ends up calling loadfile every time you do anything.
The csv module in the Python standard library will not load the data until you start iterating over it, so it is in fact lazy.
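For example (a small sketch; kitties.csv stands in for the file from the question):
import csv

with open("kitties.csv", newline="") as f:
    reader = csv.reader(f)    # nothing has been read from disk yet
    first_row = next(reader)  # rows are parsed only as you iterate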
Edit: If you need to read through the whole file to build the data structure, having a complex lazy-load object that proxies things is overkill. Just do this:
class Lazywrapper(object):
    def __init__(self, filename):
        self.filename = filename
        self._data = None

    def get_data(self):
        if self._data is None:
            self._build_data()
        return self._data

    def _build_data(self):
        # Now open and iterate over the file to build a datastructure, and
        # put that datastructure as self._data
        ...
With the above class you can do this:
puppies = Lazywrapper("puppies.csv")  # Instant
kitties = Lazywrapper("kitties.csv")  # Instant
print(len(puppies.get_data()))  # Wait
print(puppies.get_data()[32])  # Instant
Also:
allkitties = kitties.get_data()  # wait
print(len(allkitties))
print(allkitties[32])
If you have a lot of data and you don't really need to load all of it, you could also implement something like a class that reads the file until it finds the doggie called "Froufrou" and then stops. But at that point it's likely better to stick the data in a database once and for all and access it from there.
If you're really worried about the if statement, you can use a Stateful object.
from collections.abc import MutableMapping

class LazyLoad(MutableMapping):
    def __init__(self, source):
        self.source = source
        self.process = LoadMe(self)
        self.data = None

    def __getitem__(self, key):
        self.process = self.process.load()
        return self.data[key]

    def __setitem__(self, key, value):
        self.process = self.process.load()
        self.data[key] = value

    def __contains__(self, key):
        self.process = self.process.load()
        return key in self.data
This class delegates the work to a process object which is either a LoadMe or a
DoneLoading object. The LoadMe object will actually load; the DoneLoading object
will not.
Note that there are no if-statements.
class LoadMe(object):
    def __init__(self, parent):
        self.parent = parent

    def load(self):
        # Actually load, setting self.parent.data
        return DoneLoading(self.parent)

class DoneLoading(object):
    def __init__(self, parent):
        self.parent = parent

    def load(self):
        return self
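A hedged usage sketch (assuming LoadMe.load is filled in to populate self.parent.data from self.parent.source; the keys are hypothetical):
table = LazyLoad("kitties.csv")  # instant: process is a LoadMe, no I/O yet
name = table["kitty_32"]         # first access loads, then swaps in DoneLoading
other = table["kitty_7"]         # later accesses call DoneLoading.load(), a no-op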
Wouldn't if not self.F lead to another call to __getattr__, putting you into an infinite loop? I think your approach makes sense, but to be on the safe side, I'd make that line into:
if name == "F" and not self.F:
Also, you could make loadfile a method on the class, depending on what you're doing.
Here's a solution that uses a class decorator to defer initialisation until the first time an object is used:
def lazyload(cls):
    original_init = cls.__init__
    original_getattribute = cls.__getattribute__

    def newinit(self, *args, **kwargs):
        # Just cache the arguments for the eventual initialization.
        self._init_args = args
        self._init_kwargs = kwargs
        self.initialized = False
    newinit.__doc__ = original_init.__doc__

    def performinit(self):
        # We call object's __getattribute__ rather than super(...).__getattribute__
        # or original_getattribute so that no custom __getattribute__ implementations
        # can interfere with what we are doing.
        original_init(self,
                      *object.__getattribute__(self, "_init_args"),
                      **object.__getattribute__(self, "_init_kwargs"))
        del self._init_args
        del self._init_kwargs
        self.initialized = True

    def newgetattribute(self, name):
        if not object.__getattribute__(self, "initialized"):
            performinit(self)
        return original_getattribute(self, name)

    if hasattr(cls, "__getitem__"):
        original_getitem = cls.__getitem__

        def newgetitem(self, key):
            if not object.__getattribute__(self, "initialized"):
                performinit(self)
            return original_getitem(self, key)
        newgetitem.__doc__ = original_getitem.__doc__
        cls.__getitem__ = newgetitem

    if hasattr(cls, "__len__"):
        original_len = cls.__len__

        def newlen(self):
            if not object.__getattribute__(self, "initialized"):
                performinit(self)
            return original_len(self)
        newlen.__doc__ = original_len.__doc__
        cls.__len__ = newlen

    cls.__init__ = newinit
    cls.__getattribute__ = newgetattribute
    return cls
@lazyload
class FileLoader(dict):
    def __init__(self, filename):
        self.filename = filename
        print("Performing expensive load operation")
        self[32] = "Felix"
        self[33] = "Eeek"

kittens = FileLoader("kitties.csv")
print("kittens is instance of FileLoader: %s" % isinstance(kittens, FileLoader))  # Well obviously
print(len(kittens))  # Wait
print(kittens[32])  # No wait
print(kittens[33])  # No wait
print(kittens.filename)  # Still no wait
print(kittens.filename)
The output:
kittens is instance of FileLoader: True
Performing expensive load operation
2
Felix
Eeek
kitties.csv
kitties.csv
I tried to actually restore the original magic methods after the initialization, but it wasn't working out. It may be necessary to proxy additional magic methods; I didn't investigate every scenario.
Note that kittens.initialized will always return True because it kicks off the initialization if it hasn't already been performed. Obviously it would be possible to add an exemption for this attribute so that it would return False if no other operation had been performed on the object, or the check could be changed to the equivalent of a hasattr call and the initialized attribute could be deleted after the initialization.
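For instance, a sketch (an assumption, not something from the original code) of that hasattr-style variant, where the cached constructor arguments double as the "not yet initialized" marker:
def newgetattribute(self, name):
    try:
        object.__getattribute__(self, "_init_args")
    except AttributeError:
        pass  # cached args already consumed: initialization has happened
    else:
        performinit(self)  # cached args still present: initialize first
    return original_getattribute(self, name)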
Here's a hack that makes the "even better" solution work, but I think it's annoying enough that it's probably better to just use the first solution. The idea is to execute the step self = loadfile(self.FILE) by passing the variable name as an attribute:
class lazyload:
    def __init__(self, FILE, var):
        self.FILE = FILE
        self.var = var

    def __getattr__(self, name):
        x = loadfile(self.FILE)
        globals()[self.var] = x
        return object.__getattribute__(x, name)
Then you can do
kitties = lazyload("kitties.csv", "kitties")
#    ^                                ^
#    these two had better match exactly
After you call any method on kitties (aside from kitties.FILE or kitties.var), it will become completely indistinguishable from what you'd have gotten with kitties = loadfile("kitties.csv"). In particular, it will no longer be an instance of lazyload and kitties.FILE and kitties.var will no longer exist.
If you need to use puppies[32], you also need to define a __getitem__ method, because __getattr__ doesn't catch that behaviour.
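A minimal sketch, added to the first lazyload class from the question:
def __getitem__(self, key):
    # indexing does not go through __getattr__, so forward it explicitly
    if not self.F:
        self.F = loadfile(self.FILE)
    return self.F[key]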
I implemented lazy load for my needs; here is the non-adapted code:
class lazy_mask(object):
    '''Fake object, which is substituted in
    place of the masked object'''

    def __init__(self, master, id):
        self.master = master
        self.id = id
        self._result = None
        self.master.add(self)

    def _res(self):
        '''Run the lazy job'''
        if not self._result:
            self._result = self.master.get(self.id)
        return self._result

    def __getattribute__(self, name):
        '''proxy all queries to the masked object'''
        name = name.replace('_lazy_mask', '')
        # print('attr', name)
        if name in ['_result', '_res', 'master', 'id']:  # don't proxy requests for own properties
            return super(lazy_mask, self).__getattribute__(name)
        else:  # but proxy requests for the masked object
            return self._res().__getattribute__(name)

    def __getitem__(self, key):
        '''provide object["key"] access. Else it can raise
        TypeError: 'lazy_mask' object is unsubscriptable'''
        return self._res().__getitem__(key)
(master is a registry object that loads data when I run its get() method)
This implementation works OK with isinstance(), str(), and json.dumps().