I recently started working with Python classes because I need them for oTree, a Python framework for online experiments.
In one file, I define the pages I want to be created, using classes: in oTree, each class corresponds to a new page. The thing is, all the pages (so all the classes) are basically the same, except for two parameters, as shown in the following code:
class Task1(Page):
    form_model = 'player'
    form_fields = ['Envie_WordsList_Toy']
    def is_displayed(self):
        return self.round_number == self.participant.vars['task_rounds'][1]
    def vars_for_template(player):
        WordsList_Toy = Constants.WordsList_Toy.copy()
        random.shuffle(WordsList_Toy)
        return dict(
            WordsList_Toy=WordsList_Toy
        )
    @staticmethod
    def live_method(player, data):
        player.WTP_WordsList_Toy = int(data)
    def before_next_page(self):
        self.participant.vars['Envie_WordsList_Toy'] = self.player.Envie_WordsList_Toy
        self.participant.vars['WTP_WordsList_Toy'] = self.player.WTP_WordsList_Toy
Here, the only things that would change are the name of the class and the suffix of the WordsList_ variables used throughout this code (here, Toy).
Naively, I tried to define a function that takes those two parameters, like this:
def page_creation(Task_Number,name_type):
    class Task+str(Task_Number)(Page):
        form_model = 'player'
        form_fields = ['Envie_WordsList_'+str(name_type)]
        def is_displayed(self):
            return self.round_number == self.participant.vars['task_rounds'][1]
        def vars_for_template(player):
            WordsList_+str(name_type) = Constants.WordsList+str(name_type).copy()
            random.shuffle(WordsList_+str(name_type))
            return dict(
                WordsList_+str(name_type)=WordsList_+str(name_type)
            )
        @staticmethod
        def live_method(player, data):
            player.WTP_WordsList_+str(name_type) = int(data)
        def before_next_page(self):
            self.participant.vars['Envie_WordsList_+str(name_type)'] = self.player.Envie_WordsList_+str(name_type)
            self.participant.vars['WTP_WordsList_+str(name_type)'] = self.player.WTP_WordsList_+str(name_type)
Obviously, it does not work, since it seems to be impossible to construct variable names (or class identifiers) this way. I only started really working with Python a few weeks ago, so some of its aspects still escape me. Could you help me with this issue? Thank you.
You can generate dynamic classes using the type constructor:
MyClass = type("MyClass", (BaseClass1, BaseClass2), {"attr1": "value1", ...})
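To make the equivalence concrete, here is a small sketch (BaseClass1, Written and Built are illustrative names, not from the question) showing that a class statement and the three-argument type() call produce the same kind of class:

```python
class BaseClass1:
    def greet(self):
        return f"hello from {type(self).__name__}"


# An ordinary class statement...
class Written(BaseClass1):
    attr1 = "value1"


# ...and the equivalent three-argument type() call:
# type(name, bases, namespace_dict)
Built = type("Built", (BaseClass1,), {"attr1": "value1"})

print(Built().greet())  # hello from Built
print(Built.attr1)      # value1
```

The first argument becomes the class's __name__, the tuple lists the base classes, and the dict becomes the class body.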
Applied to your case, that would be:
cls = type(f"Task{Task_Number}", (Page, ), {"form_fields": [f"Envie_WordsList_{name_type}"], ...})
Note that you still have to define the common methods, such as is_displayed, as inner functions of the class factory, so that they can use the factory's parameters:
def class_factory(*args, **kwargs):
    ...
    def is_displayed(self):
        return self.round_number == self.participant.vars['task_rounds'][1]
    def vars_for_template(player):
        ...
    # staticmethod wrapping is done below
    def live_method(player, data):
        ...
    cls = type(..., {
        "is_displayed": is_displayed,
        "vars_for_template": vars_for_template,
        "live_method": staticmethod(live_method),
        ...,
    })
    return cls
Like any decorator, @staticmethod can also be applied as a plain function call: {"live_method": staticmethod(live_method)}.
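Putting the pieces together, here is a minimal runnable sketch of such a factory. It assumes a stub Page class and a stub Constants object (both hypothetical stand-ins, since oTree itself is not imported here). The key idea is that names built from strings are read and written with getattr/setattr and used as dict keys, rather than written as identifiers:

```python
import random
from types import SimpleNamespace


class Page:
    """Hypothetical stand-in for oTree's Page base class."""


# Hypothetical stand-in for the app's Constants (one word list per type).
Constants = SimpleNamespace(
    WordsList_Toy=["ball", "doll", "kite"],
    WordsList_Food=["apple", "bread", "rice"],
)


def page_creation(task_number, name_type):
    """Build a Page subclass named Task<task_number> for one word-list type."""
    list_name = f"WordsList_{name_type}"  # e.g. "WordsList_Toy"
    envie_field = f"Envie_{list_name}"    # e.g. "Envie_WordsList_Toy"
    wtp_field = f"WTP_{list_name}"        # e.g. "WTP_WordsList_Toy"

    def is_displayed(self):
        return self.round_number == self.participant.vars['task_rounds'][1]

    def vars_for_template(player):
        # getattr replaces the impossible Constants.WordsList_+str(name_type)
        words = list(getattr(Constants, list_name))
        random.shuffle(words)
        return {list_name: words}

    def live_method(player, data):
        # setattr replaces player.WTP_WordsList_+str(name_type) = ...
        setattr(player, wtp_field, int(data))

    def before_next_page(self):
        self.participant.vars[envie_field] = getattr(self.player, envie_field)
        self.participant.vars[wtp_field] = getattr(self.player, wtp_field)

    return type(f"Task{task_number}", (Page,), {
        "form_model": "player",
        "form_fields": [envie_field],
        "is_displayed": is_displayed,
        "vars_for_template": vars_for_template,
        "live_method": staticmethod(live_method),
        "before_next_page": before_next_page,
    })


Task1 = page_creation(1, "Toy")
Task2 = page_creation(2, "Food")
page_sequence = [Task1, Task2]
```

Each call returns a new class, so the generated pages can go straight into a list such as page_sequence instead of needing separately named class statements.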
I have a class in Python that initializes the attributes of an environment. I am trying to access the topographyRegistry list attribute of my Environment class in a separate function, which should take self and the topography to be added as parameters. When called, e.g. as addTopographyToEnvironment(self, "Mountains"), it should simply append the argument to the topographyRegistry of the Environment class.
When implementing this, I ran into an error saying that 'self' is not defined. Whenever I call the line below, I get:
print (Environment.addTopographyToEnvironment(self, "Mountains"))
^^^^
NameError: name 'self' is not defined
This leads me to believe that I am missing a step in my implementation, but I am unsure what it is exactly.
Here is the relevant code:
class EnvironmentInfo:
    def __init__(self, perceivableFood, perceivableCreatures, regionTopography, lightVisibility):
        self.perceivableFood = perceivableFood
        self.perceivableCreatures = perceivableCreatures
        self.regionTopography = regionTopography
        self.lightVisibility = lightVisibility
class Environment:
    def __init__(self, creatureRegistry, foodRegistry, topographyRegistery, lightVisibility):
        logging.info("Creating new environment")
        self.creatureRegistry = []
        self.foodRegistry = []
        self.topographyRegistery = []
        self.lightVisibility = True
    def displayEnvironment():
        creatureRegistry = []
        foodRegistry = []
        topographyRegistery = ['Grasslands']
        lightVisibility = True
        print (f"Creatures: {creatureRegistry} Food Available: {foodRegistry} Topography: {topographyRegistery} Contains Light: {lightVisibility}")
    def addTopographyToEnvironment(self, topographyRegistery):
        logging.info(
            f"Registering {topographyRegistery} as a region in the Environment")
        self.topographyRegistery.append(topographyRegistery)
def getRegisteredEnvironment(self):
    return self.topographyRegistry
if __name__ == "__main__":
    print (Environment.displayEnvironment()) #Display hardcoded attributes
    print (Environment.addTopographyToEnvironment(self, "Mountains"))#NameError
    print (Environment.getRegisteredEnvironment(self)) #NameError
What am I doing wrong or not understanding when using 'self'?
Edit: If I omit self from the call, I still get an error, this time a TypeError:
print (Environment.addTopographyToEnvironment("Mountains"))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Environment.addTopographyToEnvironment() missing 1 required positional argument: 'topographyRegistery'
Comments
Although you wrote def getRegisteredEnvironment(self):, it wasn't indented under the class, so it is not recognized as a method of Environment.
self is not a keyword; it is just the conventional name for the first parameter of an instance method. When you call a method on an instance (e.g. a = Environment(...) followed by a.displayEnvironment()), Python passes the instance a as self automatically. Outside a method body there is no variable called self, which is why you get a NameError.
Make sure addTopographyToEnvironment is also defined as a method of the class.
In your Environment class, __init__ ignores the arguments passed to the constructor and assigns hard-coded values instead, so I changed that as well; I don't know whether that was intentional.
Regarding your comment on the other answer: if you define def my_method(self) and call it on an object with an extra argument, like a = MyClass(); a.my_method("Mountains"), you get an error along the lines of TypeError: MyClass.my_method() takes 1 positional argument but 2 were given, because a is passed as self in addition to "Mountains".
Your main problem is that you call methods like Environment.addTopographyToEnvironment() on the class itself instead of on an object created from the class. Create an object first with a = Environment(...) (passing whatever arguments it needs), then call a.addTopographyToEnvironment("Mountains") to do what you intended with "Mountains" and that object. What you have may be close to right; it is just missing the proper setup. The article listed under Extra Resources does a great job of explaining the differences between class methods, static methods and instance methods, and is definitely worth reading.
import logging
class EnvironmentInfo:
    def __init__(self, perceivableFood, perceivableCreatures, regionTopography, lightVisibility):
        self.perceivableFood = perceivableFood
        self.perceivableCreatures = perceivableCreatures
        self.regionTopography = regionTopography
        self.lightVisibility = lightVisibility
class Environment:
    def __init__(self, creatureRegistry, foodRegistry, topographyRegistery, lightVisibility):
        logging.info("Creating new environment")
        self.creatureRegistry = creatureRegistry
        self.foodRegistry = foodRegistry
        self.topographyRegistery = topographyRegistery
        self.lightVisibility = lightVisibility
    def displayEnvironment(self):
        # Use the instance's attributes instead of hard-coded local variables
        print(f"Creatures: {self.creatureRegistry} Food Available: {self.foodRegistry} Topography: {self.topographyRegistery} Contains Light: {self.lightVisibility}")
    def addTopographyToEnvironment(self, topography):
        logging.info(f"Registering {topography} as a region in the Environment")
        self.topographyRegistery.append(topography)
    def getRegisteredEnvironment(self):
        return self.topographyRegistery
if __name__ == "__main__":
    # Create an instance first; Python then passes it as self to each method call
    env = Environment([], [], ['Grasslands'], True)
    env.displayEnvironment()
    env.addTopographyToEnvironment("Mountains")
    print(env.getRegisteredEnvironment())  # ['Grasslands', 'Mountains']
Object Instantiation In Python
With all that out of the way, I will answer the question as posed: "Is there a way to grab list attributes that have been initialized using self and append data to them in Python?" I am assuming you mean the contents of the list and not its attributes; the attributes could be listed, or at least printed, with dir().
As a simple example:
class MyClass:
    def __init__(self, my_list):
        self.my_list = my_list
if __name__ == "__main__":
    a = MyClass([1, 2, 3, 4, 5])
    print(a.my_list)
    # will print [1, 2, 3, 4, 5]
    a.my_list.append(6)
    print(a.my_list)
    # will print [1, 2, 3, 4, 5, 6]
    print(dir(a.my_list))
    # will print all object methods and object attributes for the list associated with object "a".
Sub Classing In Python
Given what you have above, it looks like you should be using subclassing, where the subclass calls its parent's __init__ through the built-in super() function. From what I can guess, you'd implement it roughly like this:
import logging
class EnvironmentInfo:
    def __init__(self, perceivableFood, perceivableCreatures, regionTopography, lightVisibility):
        self.perceivableFood = perceivableFood
        self.perceivableCreatures = perceivableCreatures
        self.regionTopography = regionTopography
        self.lightVisibility = lightVisibility
class Environment(EnvironmentInfo):
    def __init__(self, creatureRegistry, foodRegistry, topographyRegistery, lightVisibility, someOtherThingAvailableToEnvironmentButNotEnvironmentInfo):
        logging.info("Creating new environment")
        # super() (with parentheses) gives access to EnvironmentInfo's methods
        super().__init__(foodRegistry, creatureRegistry, topographyRegistery, lightVisibility)
        self.my_var1 = someOtherThingAvailableToEnvironmentButNotEnvironmentInfo
    def displayEnvironment(self):
        # Attributes set by EnvironmentInfo.__init__ are available here
        print(f"Creatures: {self.perceivableCreatures} Food Available: {self.perceivableFood} Topography: {self.regionTopography} Contains Light: {self.lightVisibility}")
    def addTopographyToEnvironment(self, environment):
        return "Whatever this is supposed to return." + environment
    def getRegisteredEnvironment(self):
        return self.regionTopography
    def methodAvailableToSubClassButNotSuper(self):
        return self.my_var1
if __name__ == "__main__":
    a = Environment([], [], [], True, "Only accessible to the sub class")
    print(a.methodAvailableToSubClassButNotSuper())
As the article describes when talking about super(), methods and attributes from the superclass are available to the subclass.
Extra Resources
Class Methods vs Static Methods vs Instance Methods - "Difference #2: Method Defination" gives an example that would be helpful I think.
What is sub classing in Python? - Just glanced at it; probably an okay read.
self represents the instance of the class, and there is no self available outside the class's methods. When you call an instance method you don't pass self yourself: Python passes the instance automatically, and you only supply the parameters after self. So to call a method like addTopographyToEnvironment(self, newVal), create an instance and call the method on it:
env = Environment([], [], [], True)
env.addTopographyToEnvironment("Mountains")
and it should work fine.
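To see what "self is passed automatically" means, here is a minimal sketch using a simplified, hypothetical Environment class (add_topography and topographies are illustrative names, not the question's). A call on an instance is equivalent to calling the function through the class with the instance supplied explicitly as the first argument, and calling through the class with only the topography reproduces the TypeError from the question's edit:

```python
class Environment:
    def __init__(self, topographies):
        self.topographies = list(topographies)

    def add_topography(self, topography):
        self.topographies.append(topography)


env = Environment(["Grasslands"])

# Normal call on an instance: Python passes env as self.
env.add_topography("Mountains")

# Equivalent explicit call through the class: self is supplied by hand.
Environment.add_topography(env, "Desert")

print(env.topographies)  # ['Grasslands', 'Mountains', 'Desert']

# Calling through the class without an instance: "Forest" is bound to
# self, so the topography parameter is missing and a TypeError is raised.
try:
    Environment.add_topography("Forest")
except TypeError as exc:
    print(exc)
```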
Given this example code where we have a series of log processors, I can't help feeling there ought to be a more pythonic/efficient way of deciding which log processor to use to process some data:
class Component1ErrorLogProcessor:
    def process(logToProcess):
        # Do something with the logs
        pass
class Component2ErrorLogProcessor:
    def process(logToProcess):
        # Do something with the logs
        pass
class LogProcessor:
    def __init__(self):
        self.component1 = Component1ErrorLogProcessor()
        self.component2 = Component2ErrorLogProcessor()
    def process_line(self, line, component):
        if component == "Component1Log-" or component == "[Component1]":
            self.component1.process_errors(line)
        elif component == "Component2Log-" or component == "[Component2]":
            self.component2.process_errors(line)
I'd personally use a registry that maps component names to processor classes.
There are several ways to go about this; here's a quick example using a base class:
class ComponentLogProcessor(object):
    _Mapping = {}
    @classmethod
    def register(cls, *component_names):
        for name in component_names:
            cls._Mapping[name] = cls
    @classmethod
    def cls_from_component(cls, component):
        return cls._Mapping[component]
class Component1ErrorLogProcessor(ComponentLogProcessor):
    def process_errors(self, logToProcess):
        # Do something with the logs
        pass
Component1ErrorLogProcessor.register('Component1Log-', '[Component1]')
class Component2ErrorLogProcessor(ComponentLogProcessor):
    def process_errors(self, logToProcess):
        # Do something with the logs
        pass
Component2ErrorLogProcessor.register('Component2Log-', '[Component2]')
class LogProcessor:
    def process_line(self, line, component):
        # Look up the registered class, instantiate it, then process the line
        ComponentLogProcessor.cls_from_component(component)().process_errors(line)
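One variation on the same registry idea, sketched here under the assumption of Python 3.6+, is to register each subclass at class-definition time with __init_subclass__ (PEP 487), so no separate register() call can be forgotten. The components keyword, the for_component method and the returned strings are all illustrative, not part of the answer above:

```python
class ComponentLogProcessor:
    """Base class that records each subclass under its component names."""

    _registry = {}

    def __init_subclass__(cls, components=(), **kwargs):
        super().__init_subclass__(**kwargs)
        for name in components:
            ComponentLogProcessor._registry[name] = cls

    @classmethod
    def for_component(cls, component):
        # Raises KeyError for an unknown component, like a plain dict lookup.
        return cls._registry[component]()

    def process_errors(self, line):
        raise NotImplementedError


class Component1ErrorLogProcessor(ComponentLogProcessor,
                                  components=("Component1Log-", "[Component1]")):
    def process_errors(self, line):
        return f"component1 handled: {line}"


class Component2ErrorLogProcessor(ComponentLogProcessor,
                                  components=("Component2Log-", "[Component2]")):
    def process_errors(self, line):
        return f"component2 handled: {line}"


class LogProcessor:
    def process_line(self, line, component):
        return ComponentLogProcessor.for_component(component).process_errors(line)
```

Adding a new component then only means writing a new subclass with its names; LogProcessor does not change.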
I have created this class, which works as expected. I want to expose only one method, get_enriched_data, so the others are marked private with a leading underscore.
The functionality works, but I'm pretty convinced I'm not doing it the most Pythonic/OOP way:
class MergeClients:
    def __init__(self,source_df,extra_info_df,type_f):
        self.df_all = pd.merge(source_df,extra_info_df, on='clientID', how='left')
        self.avg_age = self._get_avg_age()
        self.type_f = 'Medium'
    def _filter_by_age(self, age):
        return self.df_all[self.df_all['Age'] > age]
    def _filter_by_family_type(self, f_type):
        return self.df_all[self.df_all['familyType'] == f_type]
    def _get_avg_age(self):
        return self.df_all['Age'].mean()
    def get_enriched_data(self):
        self.df_all = self._filter_by_age(self.avg_age)
        self.df_all=self._filter_by_family_type(self.type_f)
        return self.df_all
But I find the code ugly with so many self references; for example, in the get_enriched_data method there are three self references per line. How can I improve this? Any direction on how to write Python classes correctly is welcome.
Edit:
Example of working code:
main_df = pd.DataFrame({'clientID':[1,2,3,4,5],
'Name':['Peter','Margaret','Marc','Alice','Maria']})
extra_info = pd.DataFrame({'clientID':[1,2,3,4,5],'Age':[19,35,18,65,57],'familyType':['Big','Medium','Single','Medium','Medium']})
family_stats = MergeClients(main_df,extra_info,'Medium')
family_filtered = family_stats.get_enriched_data()
There are some odd things about your code. I will point out one thing about instances: every method has access to all attributes, so you don't always need to pass them as parameters:
import pandas as pd
class MergeClients:
    def __init__(self,source_df,extra_info_df,type_f):
        self.df_all = pd.merge(source_df,extra_info_df, on='clientID', how='left')
        self.avg_age = self._get_avg_age()
        self.type_f = type_f  # use the argument instead of hard-coding 'Medium'
    def _filter_by_age(self): #No need for age param
        return self.df_all[self.df_all['Age'] > self.avg_age]
    def _filter_by_family_type(self): #No need for f_type param
        return self.df_all[self.df_all['familyType'] == self.type_f]
    def _get_avg_age(self):
        return self.df_all['Age'].mean()
    def get_enriched_data(self):
        self.df_all = self._filter_by_age()
        self.df_all = self._filter_by_family_type()
        return self.df_all
Since the two methods in question, _filter_by_age() and _filter_by_family_type(), are private by convention, clients of your class are not expected to call them. So if they are only called by other methods of this class, and only the ones you have shown, there is no need to pass parameters that are already attributes.
On the other hand, if a private method should sometimes use an attribute and at other times a different value, I would make that method take a parameter, as you had originally.
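One common way to get both behaviours is a parameter that defaults to None and falls back to the attribute. The sketch below assumes pandas and reuses the column names from the question's example; passing the DataFrame explicitly (the df parameter) is a design choice of this sketch, not of the answer, and it also means get_enriched_data no longer overwrites self.df_all, so calling it twice gives the same result:

```python
import pandas as pd


class MergeClients:
    def __init__(self, source_df, extra_info_df, type_f):
        self.df_all = pd.merge(source_df, extra_info_df, on='clientID', how='left')
        self.avg_age = self.df_all['Age'].mean()
        self.type_f = type_f

    def _filter_by_age(self, df, age=None):
        # Use the stored average unless the caller supplies an age.
        age = self.avg_age if age is None else age
        return df[df['Age'] > age]

    def _filter_by_family_type(self, df, f_type=None):
        # Use the stored family type unless the caller supplies one.
        f_type = self.type_f if f_type is None else f_type
        return df[df['familyType'] == f_type]

    def get_enriched_data(self):
        # Chain the filters on a local variable instead of overwriting self.df_all.
        df = self._filter_by_age(self.df_all)
        return self._filter_by_family_type(df)
```

With the example data from the question, the average age is 38.8, so the result contains Alice and Maria.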
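Methods declared within a Python class can be made effectively 'private' by starting the name with a double underscore (and not ending it with one). Python then mangles the name inside the class, so c.__work() no longer works from outside. For example: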
class Clazz():
    def __work(self):
        print('Working')
    def work(self):
        self.__work()
c = Clazz()
c.work()
c.__work()
The output of this would be:
Working
Traceback (most recent call last):
  File "/Volumes/G-DRIVE Thunderbolt 3/PythonStuff/play.py", line 575, in <module>
    c.__work()
AttributeError: 'Clazz' object has no attribute '__work'
In other words, the __work method has been 'hidden'.
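The hiding is done by name mangling rather than real access control. This small sketch (returning a string instead of printing, purely so the result is easy to check) shows that inside the class body __work is stored as _Clazz__work, which outside code can still reach:

```python
class Clazz:
    def __work(self):
        # Inside the class body this name is rewritten to _Clazz__work.
        return "Working"

    def work(self):
        return self.__work()


c = Clazz()
print(c.work())          # Working
print(c._Clazz__work())  # Working: the mangled name is still reachable

try:
    c.__work()  # no mangling happens outside a class body
except AttributeError as exc:
    print(type(exc).__name__)  # AttributeError
```

A single leading underscore is the usual convention for "internal" methods; double underscores are mainly meant to avoid name clashes in subclasses.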
I'm super new to Python (I started about three weeks ago) and I'm trying to write a script that scrapes web pages for information. After retrieving the information, the script runs it through a function to format it and then passes it to a class that takes 17 variables as parameters. The class uses this information to calculate some other variables and currently has a method to construct a dictionary. The code works as intended, but SonarLint, a plugin I'm using with PyCharm, flags 17 parameters as too many.
I've had a look for alternate ways to pass the information to the class, such as in a tuple or a list but couldn't find much information that seemed relevant. What's the best practice for passing many variables to a class as parameters? Or shouldn't I be using a class for this kind of thing at all?
I've reduced the number of variables and the amount of code for legibility, but here is the class:
class GenericEvent:
    def __init__(self, type, date_scraped, date_of_event, time, link,
                 blurb):
        countdown_delta = date_of_event - date_scraped
        countdown = countdown_delta.days
        if countdown < 0:
            has_passed = True
        else:
            has_passed = False
        self.type = type
        self.date_scraped = date_scraped
        self.date_of_event = date_of_event
        self.time = time
        self.link = link
        self.countdown = countdown
        self.has_passed = has_passed
        self.blurb = blurb
    def get_dictionary(self):
        event_dict = {}
        event_dict['type'] = self.type
        event_dict['scraped'] = self.date_scraped
        event_dict['date'] = self.date_of_event
        event_dict['time'] = self.time
        event_dict['url'] = self.link
        event_dict['countdown'] = self.countdown
        event_dict['blurb'] = self.blurb
        event_dict['has_passed'] = self.has_passed
        return event_dict
After cleaning up the data, I've been passing the variables to the class as keyword arguments, like this:
event_info = GenericEvent(type="Lunar"
date_scraped=30/01/19
date_of_event=28/07/19
time=12:00
link="www.someurl.com"
blurb="Some string.")
and retrieving a dictionary by calling:
event_info.get_dictionary()
I intend to add other methods to the class to be able to perform other operations too (not just to create 1 dictionary) but would like to resolve this before I extend the functionality of the class.
Any help or links would be much appreciated!
One option is a named tuple:
from typing import Any, NamedTuple
class GenericEvent(NamedTuple):
    type: Any
    date_scraped: Any
    date_of_event: Any
    time: Any
    link: str
    blurb: str
    # countdown is computed, so it is a property rather than a field
    @property
    def countdown(self):
        countdown_delta = self.date_of_event - self.date_scraped
        return countdown_delta.days
    @property
    def has_passed(self):
        return self.countdown < 0
    def get_dictionary(self):
        return {
            **self._asdict(),
            'countdown': self.countdown,
            'has_passed': self.has_passed,
        }
(Replace the Anys with the fields’ actual types, e.g. datetime.datetime.)
Or, if you need it to be mutable, use a data class.
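A sketch of the data class variant, assuming the dates are datetime.date objects (the question passes them in a form that is not valid Python, so the concrete values here are illustrative):

```python
from dataclasses import asdict, dataclass
from datetime import date


@dataclass
class GenericEvent:
    type: str
    date_scraped: date
    date_of_event: date
    time: str
    link: str
    blurb: str

    @property
    def countdown(self) -> int:
        # Days until the event; negative once it has passed.
        return (self.date_of_event - self.date_scraped).days

    @property
    def has_passed(self) -> bool:
        return self.countdown < 0

    def get_dictionary(self) -> dict:
        # asdict() covers only the declared fields, so add the derived ones.
        return {**asdict(self), "countdown": self.countdown, "has_passed": self.has_passed}


event_info = GenericEvent(
    type="Lunar",
    date_scraped=date(2019, 1, 30),
    date_of_event=date(2019, 7, 28),
    time="12:00",
    link="www.someurl.com",
    blurb="Some string.",
)
event_info.blurb = "Updated string."  # allowed: dataclass instances are mutable
```

Because countdown and has_passed are properties, they stay correct even after a date field is changed, which would not be true if they were computed once in __init__.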
I don't think there's anything wrong with what you're doing. You could, however, take your parameters in as a single dict object and then deal with them by iterating over the dict or by handling each one explicitly, although in your case that would probably make your code messier.
Since you pass all of your constructor's arguments as keyword arguments, you could just do this:
def __init__(self, **params):
This would give you a dict named params that you could then process. The keys would be your parameter names, and the values the parameter values.
If you aligned your param names with what you want the keys to be in your get_dictionary method's return value, saving off this parameter as a whole could make that method trivial to write.
Here's an abbreviated version of your code (with a few syntax errors fixed) that illustrates this idea:
from pprint import pprint
class GenericEvent:
    def __init__(self, **params):
        pprint(params)
event_info = GenericEvent(type="Lunar",
                          date_scraped="30/01/19",
                          date_of_event="28/07/19",
                          time="12:00",
                          link="www.someurl.com",
                          blurb="Some string.")
Result:
{'blurb': 'Some string.',
'date_of_event': '28/07/19',
'date_scraped': '30/01/19',
'link': 'www.someurl.com',
'time': '12:00',
'type': 'Lunar'}
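Following the suggestion that get_dictionary becomes trivial when the parameter names match the desired keys, here is a minimal sketch (the params attribute name is an assumption of this sketch):

```python
class GenericEvent:
    def __init__(self, **params):
        # Store the keyword arguments exactly as they were passed in.
        self.params = params

    def get_dictionary(self):
        # Return a copy so callers can't modify the stored params by accident.
        return dict(self.params)


event_info = GenericEvent(type="Lunar", date_scraped="30/01/19", date_of_event="28/07/19",
                          time="12:00", link="www.someurl.com", blurb="Some string.")
print(event_info.get_dictionary()["link"])  # www.someurl.com
```

The trade-off is that **params accepts any name, so a misspelled keyword is silently stored instead of raising a TypeError as explicit parameters would.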
I have this main function:
def main(args):
    if type == train_pipeline_type:
        strategy = TrainPipelineStrategy()
    else:
        strategy = TestPipelineStrategy()
    for table in fetch_table_information_by_region(region):
        split_required = DataUtils.load_from_dict(table, "split_required")
        if split_required:
            strategy.split(spark=spark, table_name=table_name,
                           data_loc=filtered_data_location, partition_column=partition_column,
                           split_output_dir=split_output_dir)
            logger.info("Data Split for table : {} completed".format(table_name))
My TrainPipelineStrategy and TestPipelineStrategy classes look like this:
class PipelineTypeStrategy(object):
    def partition_data(self, x):
        # Something
        ...
    def prepare_split_data(self, y):
        # Something
        ...
    def write_split_data(self, z):
        # Something
        ...
    def split(self, p):
        # Something
        ...
class TrainPipelineStrategy(PipelineTypeStrategy):
    """"""
class TestPipelineStrategy(PipelineTypeStrategy):
    def write_split_data(self, y):
        # Something else
        ...
My test case:
I need to test how many times split is called in main, by mocking split.
Here is what I have tried:
@patch('module.PipelineTypeStrategy.TrainPipelineStrategy')
def test_split_data_main_split_data_call_count(self, fake_train):
    fake_train_functions = mock.Mock()
    fake_train_functions.split.return_value = None
    fake_train.return_value = fake_train_functions
    test_args = ["", "--x=6"]
    SplitData.main(args=test_args)
    assert fake_train_functions.split.call_count == 10
When I run my test, it creates the mock but ultimately ends up calling the actual split function. What am I doing wrong?
The main issue with this code is that your patch target would only be right if TrainPipelineStrategy were a class nested inside PipelineTypeStrategy, but TrainPipelineStrategy is a subclass of PipelineTypeStrategy.
Since TrainPipelineStrategy inherits from PipelineTypeStrategy, it has access to split directly, so you can patch split on TrainPipelineStrategy without any reference to PipelineTypeStrategy (unless you specifically want to patch the version of split defined in PipelineTypeStrategy).
If you only want to mock the split method, use the patch.object decorator to replace just split instead of mocking the whole class; it's a bit cleaner. Here's an example:
import unittest
from unittest.mock import patch
# TrainPipelineStrategy and SplitData are imported from your own project modules
class TestClass(unittest.TestCase):
    @patch.object(TrainPipelineStrategy, 'split', return_value=None)
    def test_split_data_main_split_data_call_count(self, mock_split):
        test_args = ["", "--x=6"]
        SplitData.main(args=test_args)
        self.assertEqual(mock_split.call_count, 10)
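Here is a self-contained sketch of the same technique that can be run as is. The simplified main, the TABLES list and the table-dict keys are hypothetical stand-ins for the question's SplitData.main and its table metadata. Because patch.object replaces the attribute on the class object itself, every instance created inside main sees the mock, no matter which module main lives in. Patching the base class's split instead covers both strategies, since neither subclass overrides it:

```python
import unittest
from unittest.mock import patch


class PipelineTypeStrategy:
    def split(self, table_name):
        raise RuntimeError("real split should not run in tests")


class TrainPipelineStrategy(PipelineTypeStrategy):
    pass


class TestPipelineStrategy(PipelineTypeStrategy):
    pass


def main(pipeline_type, tables):
    # Hypothetical simplified main: one split call per table that needs it.
    strategy = TrainPipelineStrategy() if pipeline_type == "train" else TestPipelineStrategy()
    for table in tables:
        if table["split_required"]:
            strategy.split(table_name=table["name"])


# Six tables, of which t0, t2 and t4 require a split.
TABLES = [{"name": f"t{i}", "split_required": i % 2 == 0} for i in range(6)]


class SplitCallCountTest(unittest.TestCase):
    @patch.object(TrainPipelineStrategy, "split", return_value=None)
    def test_train_split_call_count(self, mock_split):
        main("train", TABLES)
        self.assertEqual(mock_split.call_count, 3)

    @patch.object(PipelineTypeStrategy, "split", return_value=None)
    def test_base_patch_covers_both(self, mock_split):
        main("train", TABLES)
        main("test", TABLES)
        self.assertEqual(mock_split.call_count, 6)
```

Running these with unittest.main() (or any test runner) should pass without the real split ever executing; once the decorators exit, the original split is restored.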