Adding 2 Python class objects together

I am having some problems adding two class objects together.
This is the code given to me, which will run my file (the HyperLogLog) with a sample text file:
import HyperLogLog
import sys

hlls = [HyperLogLog.HyperLogLog() for _ in range(5)]

with open(sys.argv[1], "r") as file:
    for line in file:
        cleanLine = line.replace("\n", "")
        (cmd, set, value) = cleanLine.split(" ")[:3]
        # See if this was an add, count, or merge command
        if cmd == "A":
            hlls[int(set)].add(value)
        elif cmd == "C":
            estimate = hlls[int(set)].count()
            print("Estimate:", estimate, "Real count:", value)
        elif cmd == "M":
            (cmd, m1, m2, m3) = cleanLine.split(" ")
            hlls[int(m3)] = hlls[int(m1)] + hlls[int(m2)]
The bottom-most line is meant to merge hlls[int(m1)] and hlls[int(m2)] into hlls[int(m3)]. Each HyperLogLog instance stores a single attribute M, which is my HyperLogLog vector. I need to write an __add__ method to make the addition line above work. This I have done as follows:
class HyperLogLog:
    def __init__(self):
        self.M = [0 for x in range(m)]

    ##############
    # code altering self.M
    ##############

    def __add__(self, other):
        Sum = other.M
        for i, value in enumerate(other.M):
            if value < self.M[i]:
                Sum[i] = self.M[i]
        self.M = Sum
        return self
This will return the correct value for the m3 set, but it will also alter the self.M value of set m1. How can I return something other than self, so that hlls[int(m3)] is an instance of the HyperLogLog class with the merged self.M value?
If I just return Sum, hlls[int(m3)] is no longer an instance of the HyperLogLog class.
If I change self.M as I do above, I alter the self.M value of hlls[int(m1)].
If I do something like:
def __add__(self, other):
    Sum = other.M
    for i, value in enumerate(other.M):
        if value < self.M[i]:
            Sum[i] = self.M[i]
    self2 = self
    self2.M = Sum
    return self2
The value of self.M of instance hlls[int(m1)] is still changed. I don't understand why.

When you do this:
self2 = self
both self and self2 point to the same object, so when one is changed the other changes as well. The easiest fix would be to create a new HyperLogLog object, so you would replace the line above with:
self2 = HyperLogLog()
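For illustration (not part of the original answer), the same aliasing can be seen with any mutable object, e.g. a plain list:
a = [0, 0, 0]
b = a          # b is just another name for the same list object
b[0] = 99
print(a)       # [99, 0, 0] -- changing b changed a too, because a and b are the same object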

This doesn't create a new object instance. It just assigns another name to the same object.
self2 = self
You should create a new HyperLogLog object in the __add__ method.
Something like this:
def __add__(self, other):
    retval = HyperLogLog()
    retval.M = [max(a, b) for a, b in zip(self.M, other.M)]
    return retval
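With this version the operands are left untouched. A quick, hypothetical check (overwriting M by hand purely for the demonstration):
h1, h2 = HyperLogLog(), HyperLogLog()
h1.M = [3, 1]
h2.M = [2, 5]
h3 = h1 + h2
print(h3.M)  # [3, 5] -- element-wise maximum
print(h1.M)  # [3, 1] -- h1 is unchanged, unlike with the original __add__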

Related

pickle, dill and cloudpickle returning field as empty dict on custom class after process termination

I have an object of a custom class that I am trying to serialize and permanently store.
When I serialize it, store it, load it and use it in the same run, it works fine. It only messes up when I've ended the process and then try to load it again from the pickle file. This is the code that works fine:
first_model = NgramModel(3, name="debug")
for paragraph in text:
    first_model.train(paragraph_to_sentences(text))
    # paragraph_to_sentences just uses regex to do the equivalent of splitting by punctuation
print(first_model.context_options)
# context_options is a dict (counter)
first_model = NgramModel.load_existing_model("debug")
# load_existing_model loads the pickle file. Look in the class code
print(first_model.context_options)
However, when I run this alone, it prints an empty counter:
first_model = NgramModel.load_existing_model("debug")
print(first_model.context_options)
This is a shortened version of the class file (the only two methods that touch the pickle/dill are update_pickle_state and load_existing_model):
import os
import dill
from itertools import count
from collections import Counter
from os import path


class NgramModel:
    context_options: dict[tuple, set[str]] = {}
    ngram_count: Counter[tuple] = Counter()
    n = 0
    pickle_path: str = None
    num_paragraphs = 0
    num_sentences = 0

    def __init__(self, n: int, **kwargs):
        self.n = n
        self.pickle_path = NgramModel.pathify(kwargs.get('name', NgramModel.gen_pickle_name()))  # use name if it exists, else generate a random name

    def train(self, paragraph_as_list: list[str]):
        '''really the central method that coordinates everything else. Takes a list of sentences, generates data (n-grams) from each, updates the fields, and saves the instance (self) to a pickle file'''
        self.num_paragraphs += 1
        for sentence in paragraph_as_list:
            self.num_sentences += 1
            generated = self.generate_Ngrams(sentence)
            self.ngram_count.update(generated)
            for ngram in generated:
                self.add_to_set(ngram)
        self.update_pickle_state()

    def update_pickle_state(self):
        '''saves instance to pickle file'''
        file = open(self.pickle_path, "wb")
        dill.dump(self, file)
        file.close()

    @staticmethod
    def load_existing_model(name: str):
        '''returns object from pickle file'''
        path = NgramModel.pathify(name)
        file = open(path, "rb")
        obj: NgramModel = dill.load(file)
        return obj

    def generate_Ngrams(self, string: str):
        '''ref: https://www.analyticsvidhya.com/blog/2021/09/what-are-n-grams-and-how-to-implement-them-in-python/'''
        words = string.split(" ")
        words = ["<start>"] * (self.n - 1) + words + ["<end>"] * (self.n - 1)
        list_of_tup = []
        for i in range(len(words) + 1 - self.n):
            list_of_tup.append((tuple(words[i + j] for j in range(self.n - 1)), words[i + self.n - 1]))
        return list_of_tup

    def add_to_set(self, ngram: tuple[tuple[str, ...], str]):
        if ngram[0] not in self.context_options:
            self.context_options[ngram[0]] = set()
        self.context_options[ngram[0]].add(ngram[1])

    @staticmethod
    def pathify(name):
        '''converts name to path'''
        return f"models/{name}.pickle"

    @staticmethod
    def gen_pickle_name():
        for i in count():
            new_name = f"unnamed-pickle-{i}"
            if not path.exists(NgramModel.pathify(new_name)):
                return new_name
All the other fields print properly and are complete and correct, except the two dicts.
The problem is that context_options is a mutable class member, not an instance member. If I had to guess, dill is only pickling instance members, since the class definition holds the class members. That would account for why you see a "filled-out" context_options when you're working in the same shell but not when you load fresh: you're using the dirtied class member in the former case.
It's for reasons like this that you generally don't want to use mutable class members (or, similarly, mutable default values in function signatures). More typical is to declare something like context_options: dict[tuple, set[str]] = None and then check for None in __init__ to set it to a default value, e.g. an empty dict. Alternatively, you could use a @dataclass and provide a field initializer, i.e.
@dataclasses.dataclass
class NgramModel:
    context_options: dict[tuple, set[str]] = dataclasses.field(default_factory=dict)
    ...
You can observe what I mean about it being a mutable class member with, for instance...
if __name__ == '__main__':
    ng = NgramModel(3, name="debug")
    print(ng.context_options)   # {}
    ng.context_options[("foo", "bar")] = {"baz", "qux"}
    print(ng.context_options)   # {('foo', 'bar'): {'baz', 'qux'}}
    ng2 = NgramModel(3, name="debug")
    print(ng2.context_options)  # {('foo', 'bar'): {'baz', 'qux'}}
I would expect a brand new ng2 to have the same context that the brand new ng had - empty (or whatever an appropriate default is).
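As a minimal sketch of that fix (not the asker's full class), you could give every instance its own containers in __init__ instead of sharing them at class level:
from collections import Counter

class NgramModel:
    def __init__(self, n: int, **kwargs):
        # per-instance state: each object gets its own dict and Counter
        self.context_options: dict[tuple, set[str]] = {}
        self.ngram_count: Counter[tuple] = Counter()
        self.num_paragraphs = 0
        self.num_sentences = 0
        self.n = n
        self.pickle_path = NgramModel.pathify(kwargs.get('name', NgramModel.gen_pickle_name()))
    # ... the remaining methods can stay exactly as they are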

How to convert an object back into the code used to create it?

For example, if I have a custom Python object like this:
#!/usr/bin/env python3
import os
base_dir = os.path.abspath(".")
class MyFile(dict):
def __init__(self, name, size = None, dir = base_dir):
self.name = name
self.path = os.path.join(dir, name)
self.bytes = size
and somewhere in my program I initialize an instance of my class:
a = MyFile(name = "foo", size = 10)
I want to be able to return the code used to create the object in the first place. For example:
print(a)
# <__main__.MyFile object at 0x102b84470>
# should instead print:
# MyFile(name = "foo", size = 10)
But since my object has some default attribute values, I only want those to show up in the output if they were explicitly included when the object was initialized:
b = MyFile(name = "bar", dir = "/home")
print(b)
# <__main__.MyFile object at 0x102b845c0>
# should instead print:
# MyFile(name = "bar", dir = "/home")
And to be clear, I am not trying to pull this from the source code, because a lot of my objects will be created dynamically, and I want to be able to return the same thing for them as well:
l = [ ("baz", 4), ("buzz", 12) ]
f = [ MyFile(name = n, size = s) for n, s in l ]
print(f)
# [<__main__.MyFile object at 0x1023844a8>, <__main__.MyFile object at 0x102384828>]
# should instead print:
# [ MyFile(name = "baz", size = 4), MyFile(name = "buzz", size = 12) ]
I saw the inspect library (https://docs.python.org/3/library/inspect.html) but it does not seem to have anything that does this. What am I missing? This functionality would be pretty analogous to R's dput function.
At a very basic level you can do this:
class MyClass:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __repr__(self):
        return f'{self.__class__.__name__}({self.a}, {self.b})'

class MyOtherClass(MyClass):
    def method(self):
        pass

c = MyClass(1, 2)
oc = MyOtherClass(3, 4)
print(c, oc)
Result:
MyClass(1, 2) MyOtherClass(3, 4)
This does what you ask, as well as taking subclassing into account to provide the correct class name. But of course things can get complicated for several reasons:
class MyClass:
    def __init__(self, a, b):
        self.a = a + 1
        self.b = b if b < 10 else a
        self.c = 0

    def inc_c(self):
        self.c += 1

    def __repr__(self):
        return f'{self.__class__.__name__}({self.a - 1}, {self.b})'
The value of c isn't covered by the constructor, so the proposed call would set it to 0. And although you could compensate for the + 1 on a, reconstructing b is more complicated, even more so when you realise someone could have changed the value later.
And then you need to consider that subclasses can override behaviour, etc. So doing something like this only makes sense in very limited use cases.
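To make that concrete, here is a small hypothetical session with the second MyClass above:
c = MyClass(1, 2)
c.inc_c()
c.inc_c()
print(repr(c))      # MyClass(1, 2)
c2 = eval(repr(c))
print(c2.c)         # 0 -- the two inc_c() calls are not captured by the repr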
It can be as simple as replacing your code snippet with the following:
import os
base_dir = os.path.abspath(".")

class MyFile(object):
    def __init__(self, name, size = None, dir = base_dir):
        self.name = name
        self.path = os.path.join(dir, name)
        self.bytes = size
        self.remember(name, size, dir)

    def remember(self, name, size, dir):
        self.s = '{}(name = \'{}\'{}{})'.format(
            self.__class__.__name__, name,
            ", size=" + str(size) if size != None else "",
            ', dir="' + dir + '"' if dir != base_dir else "")

    def __repr__(self):
        return self.s
a) for a it returns:
MyFile(name = 'foo', size=10)
b) for b it returns:
MyFile(name = 'bar', dir="/home")
c) for f it returns:
[MyFile(name = 'baz', size=4), MyFile(name = 'buzz', size=12)]
Thanks to everyone who commented and answered. Ultimately, I incorporated their ideas and feedback into the following method, which allowed me to preserve the object's native __repr__ while still getting the behaviors I wanted.
#!/usr/bin/env python3
import os
base_dir = os.path.abspath(".")

class MyFile(dict):
    """
    A custom dict class that auto-populates some keys based on simple input args;
    compatible with unittest.TestCase.assertDictEqual
    """
    def __init__(self, name, size = None, dir = base_dir):
        """
        standard init method
        """
        self.name = name
        self.path = os.path.join(dir, name)
        self.bytes = size
        # auto-populate this key
        self['somekey'] = self.path + ' ' + str(self.bytes)
        # more logic for more complex keys goes here...
        # use these later with `init` and `repr`
        self.args = None
        self.kwargs = None

    @classmethod
    def init(cls, *args, **kwargs):
        """
        alternative method to initialize the object while retaining the args passed
        """
        obj = cls(*args, **kwargs)
        obj.args = args
        obj.kwargs = kwargs
        return obj

    def repr(self):
        """
        returns a text representation of the object that can be used to
        create a new copy of an identical object, displaying only the
        args that were originally used to create the current object instance
        (do not show args that were not passed, e.g. default value args)
        """
        n = 'MyFile('
        if self.args:
            for i, arg in enumerate(self.args):
                n += arg.__repr__()
                if i < len(self.args) - 1 or self.kwargs:
                    n += ', '
        if self.kwargs:
            for i, (k, v) in enumerate(self.kwargs.items()):
                n += str(k) + '=' + v.__repr__()
                if i < len(self.kwargs.items()) - 1:
                    n += ', '
        n += ')'
        return n
Usage:
# normal object initialization
obj1 = MyFile('foo', size=10)
print(obj1) # {'somekey': '/Users/me/test/foo 10'}
# initialize with classmethod instead to preserve args
obj2 = MyFile.init("foo", size = 10)
print(obj2) # {'somekey': '/Users/me/test/foo 10'}
# view the text representation
repr = obj2.repr()
print(repr) # MyFile('foo', size=10)
# re-load a copy of the object from the text representation
obj3 = eval(repr)
print(obj3) # {'somekey': '/Users/me/test/foo 10'}
My use case is representing large, simple data structures (dicts) in my Python code (integration tests), where the data values are dynamically generated from a smaller set of variables. When I have many hundreds of such data structures to include in a test case, it becomes infeasible to write out the MyFile(...) code hundreds of times. This method lets me use a script to ingest the data and then print out the compact Python code needed to recreate it with my custom object class, which I can then just copy/paste into my test cases.

How to use two helper functions in main script from another script

I am getting the following error:
TypeError: _slow_trap_ramp() takes 1 positional argument but 2 were given
Here is the relevant code:
def demag_chip(self):
    coil_probe_constant = float(514.5)
    field_sweep = [50 * i * (-1)**(i + 1) for i in range(20, 0, -1)]  # print as list
    for j in field_sweep:
        ramp = self._slow_trap_ramp(j)

def _set_trap_ramp(self):
    set_trap_ramp = InstrumentsClass.KeysightB2962A.set_trap_ramp
    return set_trap_ramp

def _slow_trap_ramp(self):
    slow_trap_ramp = ExperimentsSubClasses.FraunhoferAveraging.slow_trap_ramp
    return slow_trap_ramp
The error is straightforward.
ramp = self._slow_trap_ramp(j)
You are calling this method with an argument j, but the method doesn't take an argument (other than self, which is used to pass the object).
Re-define your method to accept an argument if you want to pass it one:
def _slow_trap_ramp(self, j):
It looks like your code extract contains methods of some class, whose full definition is not shown, and you are calling one method from another method (self._slow_trap_ramp(j)). When you call a method, Python automatically passes self before any other arguments. So you need to change def _slow_trap_ramp(self) to def _slow_trap_ramp(self, j).
Update in response to comment
To really help, we would need to see more of the class you are writing, and also some info on the other objects you are calling. But I am going to go out on a limb and guess that your code looks something like this:
InstrumentsClass.py
class KeysightB2962A:
    def __init__(self):
        ...
    def set_trap_ramp(self):
        ...
ExperimentsSubClasses.py
class FraunhoferAveraging:
    def __init__(self):
        ...
    def slow_trap_ramp(self, j):
        ...
Current version of main.py
import InstrumentsClass, ExperimentsSubClasses

class MyClass:
    def __init__(self):
        ...

    def demag_chip(self):
        coil_probe_constant = float(514.5)
        field_sweep = [50 * i * (-1)**(i + 1) for i in range(20, 0, -1)]  # print as list
        for j in field_sweep:
            ramp = self._slow_trap_ramp(j)

    def _set_trap_ramp(self):
        set_trap_ramp = InstrumentsClass.KeysightB2962A.set_trap_ramp
        return set_trap_ramp

    def _slow_trap_ramp(self):
        slow_trap_ramp = ExperimentsSubClasses.FraunhoferAveraging.slow_trap_ramp
        return slow_trap_ramp

if __name__ == "__main__":
    my_obj = MyClass()
    my_obj.demag_chip()
If this is the case, then these are the main problems:
1. Python passes self and j to MyClass._slow_trap_ramp, but you've only defined it to accept self (noted above),
2. you are using class methods from KeysightB2962A and FraunhoferAveraging directly instead of instantiating the classes and using the instances' methods, and
3. you are returning references to the methods instead of calling the methods (see the short snippet below).
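On the last point, the difference between referencing a method and calling it is easy to see in isolation (an illustrative snippet, not the asker's classes):
class Greeter:
    def hello(self):
        return "hello"

g = Greeter()
print(g.hello)    # <bound method Greeter.hello of ...> -- a reference to the method
print(g.hello())  # hello -- the parentheses actually call it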
You can fix all of these by changing the code to look like this (see embedded comments):
New version of main.py
import InstrumentsClass, ExperimentsSubClasses

class MyClass:
    def __init__(self):
        # create instances of the relevant classes (note the parentheses at the end)
        self.keysight = InstrumentsClass.KeysightB2962A()
        self.fraun_averaging = ExperimentsSubClasses.FraunhoferAveraging()

    def demag_chip(self):
        coil_probe_constant = float(514.5)
        field_sweep = [50 * i * (-1)**(i + 1) for i in range(20, 0, -1)]  # print as list
        for j in field_sweep:
            ramp = self._slow_trap_ramp(j)

    def _set_trap_ramp(self):
        # call the instance method (note the parentheses at the end)
        return self.keysight.set_trap_ramp()

    def _slow_trap_ramp(self, j):  # accept both self and j
        # call the instance method (note the parentheses at the end)
        return self.fraun_averaging.slow_trap_ramp(j)

if __name__ == "__main__":
    my_obj = MyClass()
    my_obj.demag_chip()

How to fix the __getitem__ method

def pnamedtuple(type_name, field_names, mutable=False):
    pass

class type_name:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self._fields = ['x', 'y']
        self._mutable = False

    def get_x(self):
        return self.x

    def get_y(self):
        return self.y

    def __getitem__(self, i):
        if i > 1 or i < 0:
            raise IndexError
        if i == 0 or i == 'x':
            return self.get_x():
        if i == 1 or i == 'y':
            return self.get_y():
The __getitem__ method should overload the [] (indexing) operator for this class: an index of 0 returns the value of the first field name in the field_names list, an index of 1 returns the value of the second field name in the field_names list, etc. The index can also be a string naming a field. So, for p = Point(1, 2), writing p.get_x(), or p[0], or p['x'] returns 1. Raise an IndexError with an appropriate message if the index is an out-of-bounds int or a string that does not name a field.
I am not sure how to fix the __getitem__ function. Below is the bsc.txt:
c-->t1 = Triple1(1,2,3)
c-->t2 = Triple2(1,2,3)
c-->t3 = Triple3(1,2,3)
# Test __getitem__ functions
e-->t1[0]-->1
e-->t1[1]-->2
e-->t1[2]-->3
e-->t1['a']-->1
e-->t1['b']-->2
e-->t1['c']-->3
^-->t1[4]-->IndexError
^-->t1['d']-->IndexError
^-->t1[3.2]-->IndexError
Can someone tell me how to fix my __getitem__ function to get the output in bsc.txt? Many thanks.
You've spelled __getitem__ incorrectly. Magic methods require two __ underscores before and after them.
So you haven't overloaded the original __getitem__ method, you've simply created a new method named _getitem_.
Python 3 does not allow strings and integers to be compared with > or <; it's best to stick with == if you don't yet know the type of i. You could use isinstance, but here you can easily convert the only two valid integer values to strings (or vice versa), then work only on strings.
def __getitem__(self, i):
    if i == 0:
        i = "x"
    elif i == 1:
        i = "y"

    if i == "x":
        return self.get_x()
    elif i == "y":
        return self.get_y()
    else:
        raise IndexError("Invalid key: {}".format(i))
Your function is interesting, but there are some issues with it:
In Python 3 you can't compare strings with numbers, so you should first check with == against known values and/or types. For example:
def __getitem__(self, i):
    if i in {0, "x"}:
        return self.x
    elif i in {1, "y"}:
        return self.y
    else:
        raise IndexError(repr(i))
But defined like that (in your code or in the example above), t1[X] will always fail for any string X other than "x" or "y", because you don't handle any other value. And that is because:
pnamedtuple looks like it is meant to be a factory like collections.namedtuple, but it fails to be general enough, because you don't use any of your function's arguments at all. And no, type_name is not used either; whatever value it has is thrown away when you make the class declaration.
How to fix it?
1. You need another way to store the value of each field together with its name, for example a dictionary; let's call it self._data.
2. To remember what your fields are called, use the argument of your function, for instance self._fields = field_names.
3. To accept an unknown number of arguments, use * as in __init__(self, *values), then verify that you have the same number of values and fields and build the data structure from point 1 (the dictionary). A minimal sketch of such an __init__ is shown right below.
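For illustration, a hypothetical __init__ along those lines, assuming the class is defined inside pnamedtuple so that field_names is available as a closure variable:
def __init__(self, *values):
    if len(values) != len(field_names):
        raise TypeError('expected {} arguments, got {}'.format(len(field_names), len(values)))
    self._fields = list(field_names)              # remember the field names (point 2)
    self._data = dict(zip(self._fields, values))  # map each field name to its value (point 1)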
Once those are ready, __getitem__ becomes something like:
def __getitem__(self, key):
    if key in self._data:
        return self._data[key]
    elif isinstance(key, int) and 0 <= key < len(self._fields):
        return self._data[self._fields[key]]
    else:
        raise IndexError(repr(key))
Or you can simply inherit from an appropriate namedtuple, and then the only thing you need to do is overwrite its __getitem__, like:
def __getitem__(self, key):
    if key in self._fields:
        return getattr(self, key)
    return super().__getitem__(key)
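Putting that together, a complete sketch of the inheritance approach might look like this (Point and its three fields are just hypothetical examples, not part of the original question):
from collections import namedtuple

class Point(namedtuple("PointBase", ["x", "y", "z"])):
    def __getitem__(self, key):
        # allow lookup by field name in addition to position
        if key in self._fields:
            return getattr(self, key)
        return super().__getitem__(key)

p = Point(1, 2, 3)
print(p[0], p["x"])  # 1 1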

Printing an object of a Python class

I wrote the following program:
def split_and_add(invoer):
    rij = invoer.split('=')
    rows = []
    for line in rij:
        rows.append(process_row(line))
    return rows

def process_row(line):
    temp_coordinate_row = CoordinatRow()
    rij = line.split()
    for coordinate in rij:
        coor = process_coordinate(coordinate)
        temp_coordinate_row.add_coordinaterow(coor)
    return temp_coordinate_row

def process_coordinate(coordinate):
    cords = coordinate.split(',')
    return Coordinate(int(cords[0]), int(cords[1]))

bestand = file_input()
rows = split_and_add(bestand)
for row in range(0, len(rows) - 1):
    rij = rows[row].weave(rows[row + 1])
    print rij
With this class:
class CoordinatRow(object):
    def __init__(self):
        self.coordinaterow = []

    def add_coordinaterow(self, coordinate):
        self.coordinaterow.append(coordinate)

    def weave(self, other):
        lijst = []
        for i in range(len(self.coordinaterow)):
            lijst.append(self.coordinaterow[i])
            try:
                lijst.append(other.coordinaterow[i])
            except IndexError:
                pass
        self.coordinaterow = lijst
        return self.coordinaterow
However, there is an error in:
for row in range(0, len(rows) - 1):
    rij = rows[row].weave(rows[row + 1])
    print rij
The outcome of the print statement is as follows:
[<Coordinates.Coordinate object at 0x021F5630>, <Coordinates.Coordinate object at 0x021F56D0>]
It seems as if the program doesn't access the actual object and print it. What am I doing wrong here?
This isn't an error. This is exactly what it means for Python to "access the actual object and print it". This is what the default string representation for a class looks like.
If you want to customize the string representation of your class, you do that by defining a __repr__ method. The typical way to do it is to write a method that returns something that looks like a constructor call for your class.
Since you haven't shown us the definition of Coordinate, I'll make some assumptions here:
class Coordinate(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

    # your other existing methods

    def __repr__(self):
        return '{}({}, {})'.format(type(self).__name__, self.x, self.y)
If you don't define this yourself, you end up inheriting __repr__ from object, which looks something like:
return '<{} object at {:#010x}>'.format(type(self).__qualname__, id(self))
Sometimes you also want a more human-readable version of your objects. In that case, you also want to define a __str__ method:
def __str__(self):
    return '<{}, {}>'.format(self.x, self.y)
Now:
>>> c = Coordinate(1, 2)
>>> c
Coordinate(1, 2)
>>> print(c)
<1, 2>
But notice that the __str__ of a list calls __repr__ on all of its members:
>>> cs = [c]
>>> print(cs)
[Coordinate(1, 2)]
