Creating Python class with regular expressions

I apologize for the newbie question, but this is my first time working with classes. The class I'm trying to create is intended to perform a regex find and replace on all keys and values within a dictionary. The specific find and replace is defined upon instantiation.
There are two issues that I have. The first issue is that each instance of the class needs to accept a new dictionary. I'm not clear on how to create a class that accepts a general dictionary which I can specify upon creating an instance.
The second issue is that the class I have simply isn't working. I'm receiving the error message TypeError: expected string or buffer in the class line v = re.sub(self.find,self.replace,v).
There are three instances I want to create, one for each input dictionary: input_iter1, input_iter2, and input_iter3.
The following is the class:
class findreplace:
    values = []
    keys = []
    def __init__(self, find, replace):
        self.find = find
        self.replace = replace
    def value(self):
        for k,v in input_iter1.items():
            v = re.sub(self.find,self.replace,v)
            findreplace.values.append(v)
    def key(self):
        for k,v in input_iter1.items():
            k = re.sub(self.find,self.replace,k)
            findreplace.keys.append(k)
The following are the instances:
values1 = findreplace('[)?:(]','')
values1.value()
values2 = findreplace(r'(,\s)(,\s)(\d{5})({e<=1})',r'\2\3')
values2.value()
keys1 = findreplace(r'(?<=^)(.+)(?=$)',r'(?:\1)')
keys1.key()
keys2 = findreplace(r'(?=$)',r'{e}')
keys2.key()
print values
print keys
If anyone has any insight on how I can work around these two issues, I'd be grateful to hear it. Thanks!

First, Python 2 classes should start off this way:
class Foo(object):
Otherwise, you get an "old-style class", which is some ancient crusty thing no one uses.
Also, class names in Python are typically written in CamelCase.
Second, do not use mutable values (like lists!) as class attributes, as you're doing here with keys and values. They'll be shared across all instances of your class! It looks like you're even aware of this, since you refer to findreplace.keys directly, but it doesn't make sense to store instance-specific values in a class attribute like that.
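A minimal sketch of that sharing problem. The class names SharedList and OwnList are made up for illustration and are not from the question:

```python
class SharedList(object):
    items = []              # class attribute: one list shared by every instance


class OwnList(object):
    def __init__(self):
        self.items = []     # instance attribute: a fresh list per instance


a, b = SharedList(), SharedList()
a.items.append("from a")
seen_by_b = list(b.items)   # b sees the item that was appended through a

c, d = OwnList(), OwnList()
c.items.append("from c")
seen_by_d = list(d.items)   # d has its own, still empty, list
```

Here seen_by_b is ['from a'] while seen_by_d is [], which is usually the behaviour people expect from per-object state.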
But, most importantly: why is this a class at all? What does a findreplace represent? It looks like this would be much clearer if it were just a single function.
To answer your actual questions:
You pass in a dictionary just like you're passing in find and replace. Add another argument to __init__, and pass another argument when you construct your class.
Presumably, you're getting the TypeError because one of the values in your dictionary isn't a string, and you can only perform regexes on strings.
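Putting both points together, here is a rough sketch of the single-function version. The name find_replace and the sample dictionary are assumptions for illustration (the question never shows the real input_iter1), and skipping non-string entries is only one possible way to avoid the TypeError:

```python
import re


def find_replace(mapping, pattern, replacement):
    """Return a new dict with the substitution applied to string keys and values."""
    regex = re.compile(pattern)
    result = {}
    for key, value in mapping.items():
        # re.sub only accepts strings, so anything else is passed through untouched.
        # (On Python 2 you would check against basestring instead of str.)
        new_key = regex.sub(replacement, key) if isinstance(key, str) else key
        new_value = regex.sub(replacement, value) if isinstance(value, str) else value
        result[new_key] = new_value
    return result


# Hypothetical input; the dictionary is now an ordinary argument.
input_iter1 = {"(a:b)": "x?y", "n": 5}
cleaned = find_replace(input_iter1, r"[)?:(]", "")
# cleaned == {"ab": "xy", "n": 5}
```

The same function can then be called once per dictionary (input_iter1, input_iter2, input_iter3) without any shared state.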

Where is your definition of the input_iter dicts? What do they look like? Your error indicates that the values of your dicts are not strings.

Related

Overwriting bound methods in python when function requires class attribute

Say I have a class:
class Data():
    def __init__(self):
        self.scores = []
        self.encoding = {1: 'first', 2: 'second', 3: 'third'}
    def build(self):
        self.scores = [1, 2, 3]
    def translate(self):
        return [self.encoding[score] for score in self.scores]
Now I want to be able to translate the columns for a given data object...
# I want to be able to do
d= Data()
d.scores.translate()
# AttributeError: 'list' object has no attribute 'translate'
# Instead of
d= Data()
d.translate()
Now I am fully aware that I am trying to access a method that does NOT exist for that list (translate()). I want to be able to make method calls as is mentioned above (d.scores.translate()) because I may have some specific subslice of d.scores I want to translate.
For Example, if d.scores was a nested numpy array (I only want to translate 1st 5 columns but keep all rows)
#this is what I would like to be able to do
d.scores[:, 1:5].translate()
# And I don't want to build a kwarg in the translate method to handle it like
d.scores.translate(indices=[1])
I know this is more of an implementation question, and I'm wondering what the best practice should be.
Am I trying to force a square peg into a round hole at this point? Should I just give up and define a module function or consider the kwargs? Is that more 'pythonic'?
UPDATE
I should have said this sooner, but I did try the kwarg and staticmethod routes. I just want to know if there are other ways to accomplish this. Maybe through subclassing? Or Python's equivalent of interfaces in Java/C# (if it exists)?

Yes, you are trying to "force a square peg into a round hole".
Your translate method works on the whole scores list, full stop. This can not be changed with some trickery, which is simply not supported in Python.
When you want to do subslices, I would recommend doing it explicitly.
Examples:
# Using args/kwargs:
scores = john.translate(10, 15) # translate subslice 10:15
# Using a new static method:
scores = Person.translate_scores(john.scores[10:15])
It does not look that elegant, but it works.
(BTW: Since you changed your question, my classes might be a little off, but I will not change my answer with every edit you make.)
Your trickery simply does not work, because "scores" is not some part of your main class, but simply an attribute of it, which has its own type. So, when you do "d.scores.translate()" translate is not called on d, but on a list or whatever type scores is. You can not change that, because it is core Python.

You could do it by using a second class and use _scores for the list and a sub-object scores which manipulates _scores:
class DataHelper(object):
    def __init__(self, data_obj):
        self.data_obj = data_obj
    def translate(self, *args):
        ... # work on self.data_obj._scores
class Data(object):
    def __init__(self):
        self.scores = DataHelper(self)
        self._scores = []
With such a class structure, you might be able to do this:
scores = d.scores.translate(1, 5)
And with more trickery, you might be able to even do:
scores = d.scores[1:5].translate()
But for that, you will need a third class. Objects of that class are created temporarily when a scores object is indexed, so that d.scores[1:5] does not create a list slice but a new object with a translate method.
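A rough sketch of that idea, folding the helper and the temporary slice object into one hypothetical class called ScoreView (the name, the property, and the sample scores are assumptions, not part of the original answer):

```python
class ScoreView(object):
    """Hypothetical wrapper: a list of scores that knows how to translate itself."""

    def __init__(self, values, encoding):
        self.values = values
        self.encoding = encoding

    def __getitem__(self, index):
        # Slicing returns another ScoreView instead of a plain list,
        # so the result still has a translate() method.
        if isinstance(index, slice):
            return ScoreView(self.values[index], self.encoding)
        return self.values[index]

    def translate(self):
        return [self.encoding[value] for value in self.values]


class Data(object):
    def __init__(self):
        self._scores = [1, 2, 3]
        self.encoding = {1: 'first', 2: 'second', 3: 'third'}

    @property
    def scores(self):
        # A new view is built on every access; it wraps the underlying list.
        return ScoreView(self._scores, self.encoding)


d = Data()
all_scores = d.scores.translate()        # ['first', 'second', 'third']
some_scores = d.scores[1:3].translate()  # ['second', 'third']
```

This only handles plain one-dimensional slices. A numpy-style d.scores[:, 1:5] would need more work in __getitem__, or a subclass of the array type.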

Efficiently setting attribute values for a class instantiated within another class

I am trying to set the attribute values of a certain class AuxiliaryClass that is instantiated in a method of the MainClass class, in the most efficient way possible.
AuxiliaryClass is instantiated within a method of MainClass - see below. However, AuxiliaryClass has many different attributes and I need to set the value of those attributes once the class has been instantiated - see the last 3 lines of my code.
Note: due to design constraints I cannot explain here, my classes only contain methods, meaning that I need to declare attributes as methods (see below).
class AuxiliaryClass(object):
    def FirstMethod(self):
        return None
    ...
    def NthMethod(self):
        return None
class MainClass(object):
    def Auxiliary(self):
        return AuxiliaryClass()
def main():
    obj = MainClass()
    obj.Auxiliary().FirstMethod = #some_value
    ...
    obj.Auxiliary().NthMethod = #some_other_value
    # ~~> further code
Basically I want to replace these last 3 lines of code with something neater, more elegant and more efficient. I know I could use a dictionary if I was instantiating AuxiliaryClass directly:
d = {'FirstMethod' : some_value,
     ...
     'NthMethod' : some_other_value}
obj = AuxiliaryClass(**d)
But this does not seem to work for the structure of my problem. Finally, I need to set the values of AuxiliaryClass's attributes once MainClass has been instantiated (so I can't set the attribute's values within method Auxiliary).
Is there a better way to do this than obj.Auxiliary().IthMethod = some_value?
EDIT
A couple of people have said that the following lines:
obj.Auxiliary().FirstMethod = #some_value
...
obj.Auxiliary().NthMethod = #some_other_value
will have no effect because they will immediately get garbage collected. I do not really understand what this means, but if I execute the following lines (after the lines above):
print(obj.Auxiliary().FirstMethod())
...
print(obj.Auxiliary().NthMethod())
I am getting the values I entered previously.

To speed things up, and make the customization somewhat cleaner, you can cache the result of the AuxiliaryClass constructor/singleton/accessor, and loop over a dict calling setattr().
Try something like this:
init_values = {
    'FirstMethod' : some_value,
    # ...
    'NthMethod' : some_other_value,
}
def main():
    obj = MainClass()
    aux = obj.Auxiliary() # cache the call, only make it once
    for attr,value in init_values.items(): # python3 here, iteritems() in P2
        setattr(aux, attr, value)
    # other stuff below this point
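Caching the call also matters for correctness when Auxiliary() builds a new object each time, as it does in the code posted in the question. A small sketch, using a trimmed copy of the question's classes and made-up lambda values:

```python
class AuxiliaryClass(object):
    def FirstMethod(self):
        return None


class MainClass(object):
    def Auxiliary(self):
        return AuxiliaryClass()   # a brand new object on every call


obj = MainClass()

# The attribute is set on a temporary object that nothing else refers to.
obj.Auxiliary().FirstMethod = lambda: 42
lost = obj.Auxiliary().FirstMethod()   # a different object: still returns None

# Keeping a reference makes the assignment stick.
aux = obj.Auxiliary()
setattr(aux, 'FirstMethod', lambda: 42)
kept = aux.FirstMethod()               # 42
```

If Auxiliary() is memoized or otherwise returns the same object every time, the first form would work as well. That is an assumption about code not shown in the question.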

I understand what is happening here: my code has a series of decorators before all methods which allow memoization. I do not know exactly how they work but when used the problem described above - namely, that lines of type obj.Auxiliary().IthMethod = some_value get immediately garbage collected - does not occur.
Unfortunately I cannot give further details regarding these decorators as 1) I do not understand them very well and 2) I cannot share this information outside my company. I think under these circumstances it is difficult to answer my question because I cannot fully disclose all the necessary details.

returning functions from a method in python

I have tried looking into the documentation and searching Google, but I am unable to find out the significance of the [clazz] at the end of the method. Could someone help me understand the meaning of the [clazz] at the end of the method? Thanks.
def get_context_setter(context, clazz):
    return {
        int: context.setFieldToInt,
        datetime: context.setFieldToDatetime
    }[clazz]
setFieldToInt and setFieldToDatetime are methods inside context class.

This function returns one of two things. It returns either context.setFieldToInt or context.setFieldToDatetime. It does so by using a dictionary as what would be a switch statement in other programming languages.
It checks whether clazz is a reference to the class int or a reference to the class datetime, and then returns the appropriate method.
It's identical to this code:
def get_context_setter(context, clazz):
    lookup_table = {int: context.setFieldToInt,
                    datetime: context.setFieldToDatetime
                    }
    context_function = lookup_table[clazz] # figure out which to return
    return context_function
Using a dict instead of a switch statement is pretty popular, see Replacements for switch statement in Python? .
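For a runnable illustration, here is a sketch with a stand-in Context class. The real context class is not shown in the question, so the method bodies below are pure assumptions:

```python
from datetime import datetime


class Context(object):
    """Hypothetical stand-in for the question's context class."""

    def __init__(self):
        self.field = None

    def setFieldToInt(self, value):
        self.field = int(value)

    def setFieldToDatetime(self, value):
        self.field = value


def get_context_setter(context, clazz):
    # The dict maps a class to a bound method; [clazz] looks one up.
    return {
        int: context.setFieldToInt,
        datetime: context.setFieldToDatetime,
    }[clazz]


context = Context()
setter = get_context_setter(context, int)   # a bound method, not yet called
setter("7")                                 # same as context.setFieldToInt("7")
# context.field is now 7
```

Passing any class that is not a key in the dict (str, for example) raises KeyError, which plays the role of a missing default branch in a switch.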

More briefly.
The code presented is expecting the class of some object as a parameter poorly named as clazz.
It's then using that class as a dictionary key.
They're essentially trying to accept two different types and call a method on the object type.
class is a keyword in Python.
The author of the code you show chose to use a strange spelling instead of a longer snake_case parameter name like obj_class.
The parameters really should have been named obj, obj_class
Or
instance, instance_class
Even better, the class really need not be a separate parameter.

How can I dynamically construct the contents of a function in Python 2.7?

I've come across many articles which don't quite address what I'm attempting to do. I hope that this isn't a duplicate question.
I am writing a Python script which interfaces with several real-world objects outside of my PC. I have written classes which contain the functions necessary to interface with those objects. I have also successfully written a function, not very object oriented in style, which instantiates instances of those classes, gets data from them, and saves it all to a CSV file. That all works fine. Where I'm getting tripped up is in trying to make the function more adaptable so that I don't have to re-write it every time I want to add another class instance or get a different data point from a pre-existing instance.
The approach that I'm attempting is to create a list which contains names of class instances and specific function names to get data out of those instances. I then pass this list to another function. This other function would ideally create a header for my CSV file (so that the data can be more easily interpreted) and then proceed to gather the data.
Pseudocode:
inst1 = my_class_1()
inst2 = my_class_2()
filename = 'fubar.csv'
control_list = ['inst1', 'value1', 'inst2', 'value2']
def my_function(filename, control_list):
    # Code to create a header for CSV file in the form inst1-value1, inst2-value2
    # Loop to control the number of times to grab data
    # Code which iterates control_list and builds up things like inst1.value1(), inst2.value2(),
    # etc. and then appends the results to a list
    # write results list to filename
If I pass all elements of control_list into my function as strings I can easily generate my header for the results file but I can't figure out how to take those strings and generate something like inst1.value1() so that Python will access the functions within my classes. I think that if I create the list like [inst1.value1(), inst2.value2()] I can get data from my classes but I can't figure out how to create my header file this way.
I'm open to other approaches if the approach I outlined above won't work.

You can do this with ordinary object-oriented attribute access, without any other built-in Python functions.
class MyClass(object):
    def __init__(self, value1):
        self.value1 = value1
inst = MyClass("example")
# get the value of the attribute
inst.value1
# set the value of the attribute
inst.value1 = "hello"

I think you're looking for the getattr function:
class MyClass(object):
    def __init__(self, value1):
        self.value1 = value1
instance = MyClass("example")
fieldname = "value1"
fieldvalue = getattr(instance, fieldname)
# fieldvalue == "example"
With getattr and setattr you can write code that manipulates fields whose name you pass in to your function as parameters.
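Applied to the question's control list, a sketch might look like the following. The Thermometer and Scale classes, their methods, and the instances dict are invented for illustration, and the control list is written as (instance name, method name) pairs rather than a flat list:

```python
class Thermometer(object):      # hypothetical device class
    def temperature(self):
        return 21.5


class Scale(object):            # hypothetical device class
    def weight(self):
        return 3


# Map the names used in the header to the actual objects.
instances = {'inst1': Thermometer(), 'inst2': Scale()}

# Each entry names an instance and the method to call on it.
control_list = [('inst1', 'temperature'), ('inst2', 'weight')]

# The strings give the header directly...
header = ['%s-%s' % pair for pair in control_list]

# ...and getattr turns each method name into a bound method, which is then called.
row = [getattr(instances[name], method)() for name, method in control_list]
```

header and row can then be handed to csv.writer(...).writerow() to produce the file.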

I recently had a similar issue and used namedtuple to solve it.
from collections import namedtuple
value1 = 'value of 1'
value2 = 'value of 2'
ControlList = namedtuple("ControlList", "inst1, inst2")
controllist = ControlList(value1, value2)
>>> print controllist.inst1
value of 1
>>> print controllist.inst2
value of 2
value1 and value2 need not be strings; they can even be instantiated or uninstantiated classes.
The benefit of writing ControlList this way is that you can always expand your control list later: code that reads fields by name keeps working, unlike code that relies on the list having a certain length or on certain values being in certain positions. You always have access to the items in ControlList by attribute name (for example controllist.inst1), and you can alias those attributes inside your functions if you want to avoid touching existing code.

Python: Efficient way to put multiple variables through a function

I have a bunch of variables that are equal to values pulled from a database. Sometimes, the database doesn't have a value and returns "NoneType". I'm taking these variables and using them to build an XML file. When the variable is NoneType, it causes the XML value to read "None" rather than blank as I'd prefer.
My question is: Is there an efficient way to go through all the variables at once and search for a NoneType and, if found, turn it to a blank string?
ex.
from types import *
[Connection to database omitted]
color = database.color
size = database.size
shape = database.shape
name = database.name
... etc
I could obviously do something like this:
if type(color) is NoneType:
    color = ""
but that would become tedious for the 15+ variables I have. Is there a more efficient way to go through and check each variable for its type and then correct it, if necessary? Something like creating a function to do the check/correction and having an automated way of passing each variable through that function?

All the solutions given here will make your code shorter and less tedious, but if you really have a lot of variables I think you will appreciate this, since it won't make you add even a single extra character of code for each variable:
class NoneWrapper(object):
    def __init__(self, wrapped):
        self.wrapped = wrapped
    def __getattr__(self, name):
        value = getattr(self.wrapped, name)
        if value is None:
            return ''
        else:
            return value
mydb = NoneWrapper(database)
color = mydb.color
size = mydb.size
shape = mydb.shape
name = mydb.name
# All of these will be set to an empty string if their
# original value in the database is None
Edit
I thought it was obvious, but I keep forgetting it takes time until all the fun Python magickery becomes second nature. :) So how does NoneWrapper do its magic? It's very simple, really. Each Python class can define some "special" method names that are easy to identify, because they are always surrounded by two underscores on each side. The most common and well-known of these methods is __init__(), which initializes each instance of the class, but there are many other useful special methods, and one of them is __getattr__(). This method is called whenever someone tries to access an attribute of an instance of your class that cannot be found through the normal lookup, and you can use it to customize attribute access.
What NoneWrapper does is define __getattr__, so whenever someone tries to read an attribute of mydb (which is a NoneWrapper instance) other than wrapped itself, it reads the attribute with the specified name from the wrapped object (in this case, database) and returns it, unless its value is None, in which case it returns an empty string.
I should add here that both object variables and methods are attributes, and, in fact, for Python they are essentially the same thing: all attributes are variables that could be changed, and methods just happen to be variables that have their value set to a function of a special type (bound method). So you can also use __getattr__() to control access to functions, which could lead to many interesting uses.
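One detail worth seeing in action is that Python only falls back to __getattr__() when the normal attribute lookup fails, which is why self.wrapped inside NoneWrapper does not recurse. A tiny sketch with a made-up Probe class:

```python
class Probe(object):
    """Hypothetical class showing when __getattr__ is consulted."""

    def __init__(self):
        self.present = 'found normally'

    def __getattr__(self, name):
        # Reached only for names that normal lookup could not find.
        return 'fallback for ' + name


probe = Probe()
normal = probe.present     # 'found normally' (__getattr__ is not involved)
fallback = probe.missing   # 'fallback for missing'
```

To intercept every attribute access, including attributes that do exist, the hook to look at is __getattribute__() instead.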

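The way I would do it, although I don't know if it is the best, would be to put the variables you want to check in a list and then iterate over that list. Note that assigning to the loop variable inside a for statement would not change the original variables, so build a cleaned list and unpack it back into the names:
check_vars = [color, size, shape, name]
color, size, shape, name = ["" if var is None else var for var in check_vars]
To add variables, add them to the list and to the unpacking target on the left.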

If you're already getting them one at a time, it's not that much longer to write:
def none_to_blank(value):
    if value is None:
        return ""
    return value
color = none_to_blank(database.color)
size = none_to_blank(database.size)
shape = none_to_blank(database.shape)
name = none_to_blank(database.name)
Incidentally, use of "import *" is generally discouraged. Import only what you're using.

you can simply use:
color = database.color or ""
another way is to use a function:
def filter_None(var):
    return "" if (var is None) else var
color = filter_None(database.color)
I don't know how the database object is structured but another solution is to modify the database object like:
def myget(self, varname):
    value = self.__dict__[varname]
    return "" if (value is None) else value
DataBase.myget = myget
database = DataBase(...)
[...]
color = database.myget("color")
you can do better using descriptors or properties
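As a sketch of the descriptor route: BlankIfNone and Record below are hypothetical names, and the real database object may not allow this kind of modification at all.

```python
class BlankIfNone(object):
    """Hypothetical data descriptor that reads None back as an empty string."""

    def __init__(self, name):
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self                      # accessed on the class itself
        value = instance.__dict__.get(self.name)
        return "" if value is None else value

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value  # store the raw value unchanged


class Record(object):
    """Stand-in for the database object from the question."""
    color = BlankIfNone("color")
    size = BlankIfNone("size")

    def __init__(self, color=None, size=None):
        self.color = color
        self.size = size


record = Record(color=None, size="large")
color = record.color   # "" because the stored value is None
size = record.size     # "large"
```

The None check lives in one place, and each new field costs one line in the class body.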
