How do I merge two objects into one? - python

How can we dynamically (procedurally, at run time) create one new object from two old objects, so that operations performed on the new object are performed on both of the old objects?
As just one example, we might have two streams:
a string stream ... str_strm = io.StringIO()
a file stream ... fl_strm = open("test_file.txt", "w")
In this example, we want every string written to the new stream to be copied to both of the older streams.
import io

class MergedStream:
    def __init__(self, strm1, strm2):
        self._strm1 = strm1
        self._strm2 = strm2

    def write(self, msg: str):
        # forward the same message to both underlying streams
        self._strm1.write(msg)
        self._strm2.write(msg)
        return None

str_strm = io.StringIO()
fl_strm = open("test_file.txt", "w")
ms = MergedStream(str_strm, fl_strm)
I can NOT guarantee that the two old objects both have a method named write.
Our solution should be general and dynamically generate the new merged object no matter what the two old objects are.
How might we create a new object in such a way that anytime we try to act upon the new object, we perform that same action upon the two older objects?
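One possible direction, sketched under the assumption that forwarding *every* attribute lookup is acceptable: instead of copying attributes up front, a proxy can implement `__getattr__`, which Python calls only when normal attribute lookup fails, and fan each lookup out to all wrapped objects. `FanOut` and `call_all` are hypothetical names, and two `StringIO` objects stand in for the string and file streams so nothing touches the disk. Note that `__getattr__` does not intercept special methods such as `__len__`, because Python looks those up on the type.

```python
import io


class FanOut:
    """Hypothetical proxy that forwards every attribute lookup to all wrapped objects."""

    def __init__(self, *targets):
        self._targets = targets

    def __getattr__(self, name):
        # Only reached when normal lookup fails, so self._targets itself is found normally.
        # Raises AttributeError (as __getattr__ should) if any target lacks the attribute.
        attrs = [getattr(target, name) for target in self._targets]
        if all(callable(attr) for attr in attrs):
            def call_all(*args, **kwargs):
                # Call the same method on every target and collect each result
                return [attr(*args, **kwargs) for attr in attrs]
            return call_all
        # Plain data attributes: return one value per target
        return attrs


str_strm = io.StringIO()
other_strm = io.StringIO()  # stands in for open("test_file.txt", "w")
ms = FanOut(str_strm, other_strm)
results = ms.write("hello")  # StringIO.write returns the number of characters written
```

Results come back as a list with one entry per target; deciding when two results are "the same" and should collapse into one value is a separate problem.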
Something like the following is a step in the right direction, but not a complete solution:
class MergedObject:
    @classmethod
    def merge(cls, left, right):
        if left == right:
            return left
        else:
            return cls(left, right)

    def __call__(self, *args, **kwargs):
        try:
            r1 = self._lefty(*args, **kwargs)
            r2 = self._righty(*args, **kwargs)
            return type(self).merge(r1, r2)
        except AttributeError:
            raise AttributeError()

    def __init__(self, lefty, righty):
        self._lefty = lefty
        self._righty = righty
        reserved = dir(self)  # reserved attribute names
        # only attributes that both objects have can be merged pairwise
        attribute_names = set(dir(lefty)).intersection(dir(righty)).difference(reserved)
        for attribute_name in attribute_names:
            latter = getattr(lefty, attribute_name)  # `latter` is the `left-attribute`
            ratter = getattr(righty, attribute_name)  # `ratter` is the `right-attribute`
            setattr(self, attribute_name, type(self).merge(latter, ratter))
SOME WRINKLES TO IRON OUT
I need to be able to merge two objects whose classes do not overload the == operator. That is, even if the class definition contains no def __eq__(), I need to test whether two things are equal. Suppose I merge two objects that both have a method named insert() returning -1. I do not want to carry around duplicate values: I do not want a merged object holding two copies of the number 10, so that x + 5 computes 10 + 5 twice.
There are also issues with overriding Python's "magic" (dunder) methods such as __len__ or __mul__. Even if both old objects have a __len__ method, the new merged object might not end up with a usable __len__ method, because Python looks special methods up on the type, not on the instance.
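A sketch that tackles both wrinkles, under two assumptions. First, results are collapsed when they are the same object or compare equal; for classes without their own `__eq__`, `==` already falls back to identity, so no override is needed. Second, special methods are installed on the class, because implicit calls such as `len(x)` or `x + 5` look them up on the type and bypass both instance attributes and `__getattr__`. `merge`, `Merged`, `_forward`, and the short list of forwarded dunders are hypothetical choices. `__len__` only works when both sides agree, since `len()` must return a plain int.

```python
def merge(left, right):
    # Identity check first; == then falls back to identity for classes without __eq__
    if left is right or left == right:
        return left
    return Merged(left, right)


class Merged:
    """Hypothetical proxy for two objects; results collapse when both sides agree."""

    def __init__(self, left, right):
        self._left = left
        self._right = right

    def __getattr__(self, name):
        # Ordinary attributes and methods are merged lazily, on lookup
        return merge(getattr(self._left, name), getattr(self._right, name))

    def __call__(self, *args, **kwargs):
        # Calling a merged method calls both originals and merges the results
        return merge(self._left(*args, **kwargs), self._right(*args, **kwargs))


def _forward(name):
    # Build a class-level special method that applies `name` to both sides
    def method(self, *args):
        return merge(getattr(self._left, name)(*args),
                     getattr(self._right, name)(*args))
    method.__name__ = name
    return method


# Special methods must live on the class: len(x) and x + 5 never consult
# instance attributes or __getattr__.
for _name in ("__len__", "__add__", "__mul__"):
    setattr(Merged, _name, _forward(_name))
```

With this design, `Merged(10, 10) + 5` evaluates to a single `15`, while `Merged(10, 20) + 5` stays merged as `Merged(15, 25)`.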

Related

How to overwrite an existing dictionary Python 3

Sorry if this is worded badly; I hope you can understand (or edit) my question to make it easier to understand.
I've been using Python's pickle module to pickle/unpickle the state of the objects in a game (I understand this is probably inefficient in storage and generally lazy, but it's only while I'm learning more Python). However, I run into errors when doing this with the classes that present information.
I believe the root issue is that when I unpickle the save data to load it, the existing dictionaries are overwritten with new objects at different memory locations, so the information class is still looking for a room object that the player can no longer enter, because the data was overwritten.
I've made a snippet to reproduce the issue I have:
import pickle

class A(object):
    def __init__(self):
        pass

obj_dict = {
    'a': A(),
    'b': A(),
    ## etc.
}
d = obj_dict['a']

f = open('save', 'wb')
pickle.Pickler(f, 2).dump(obj_dict)
f.close()

f = open('save', 'rb')
obj_dict = pickle.load(f)
f.close()

if d == obj_dict['a']:
    print('success')
else:
    print(str(d) + '\n' + str(obj_dict['a']))
I understand this is probably to be expected when rewriting variables like this, but is there a way around it? Many thanks
Is your issue that you want d == obj_dict['a'] to evaluate to true?
By default (when the class doesn't define __eq__), the == check above compares the identities of the two objects, i.e. do d and obj_dict['a'] refer to the same object in memory?
When you un-pickle your object, it will be created as a new object, in a new chunk of memory and thus your equality check will fail.
You need to override how your equality check behaves to get the behavior you want. The methods you need to override are: __eq__ and __hash__.
In order to track your object through repeated pickling and un-pickling, you'll need to assign a unique id to the object on creation:
import uuid

class A:
    def __init__(self):
        self.id = uuid.uuid4()  # assign a unique, random id
Now you must override the methods mentioned above:
    def __eq__(self, other):
        # is the other object also a class A, and does it have the same id?
        return isinstance(other, A) and self.id == other.id

    def __hash__(self):
        return hash(self.id)
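To check that the id-based equality survives a round trip, the sketch below pickles the dictionary with `pickle.dumps`/`pickle.loads` (used here only to avoid writing a `save` file) and compares the restored object with the old reference `d`.

```python
import pickle
import uuid


class A:
    def __init__(self):
        self.id = uuid.uuid4()  # stored in the instance dict, so it is pickled too

    def __eq__(self, other):
        # equal if the other object is also an A with the same id
        return isinstance(other, A) and self.id == other.id

    def __hash__(self):
        # hash on the same field __eq__ uses, so equal objects hash equally
        return hash(self.id)


obj_dict = {'a': A(), 'b': A()}
d = obj_dict['a']

# Round-trip through bytes instead of a 'save' file (protocol 2, as in the question)
restored = pickle.loads(pickle.dumps(obj_dict, 2))

print(d == restored['a'])  # True: same id, so __eq__ says equal
print(d is restored['a'])  # False: unpickling still creates a new object
```

Because `__hash__` agrees with `__eq__`, the restored object can also be used to look up set or dictionary entries that were keyed by the old one.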

Reading binary file to a list of structs, but deepcopy overwrites first structs

I am reading a binary file into a list of class instances. I have a loop that reads data from the file into an instance. When the instance is filled, I append the instance to a list and start reading again.
This works fine except that one of the elements of the instance is a Rect (i.e. rectangle), which is a user-defined type. Even with deepcopy, the attributes are overwritten.
There are work-arounds, like not having Rect be a user-defined type. However, I can see that this is a situation that I will encounter a lot and was hoping there was a straightforward solution that allows me to read nested types in a loop.
Here is some code:
import copy
from struct import unpack

class Rect:
    def __init__(self):
        self.L = 0

class groundtruthfile:
    def __init__(self):
        self.rect = Rect
        self.ht = int
        self.wt = int
        self.text = ''
        ...

data = []
g = groundtruthfile()
f = open("datafile.dtf", "rb")
length = unpack('i', f.read(4))
for i in range(1, length[0] + 1):  # length is a tuple
    g.rect.L = unpack('i', f.read(4))[0]
    ...
    data.append(copy.deepcopy(g))
The results are exactly what I want, except that every data[i].rect.L holds the value of the last record read.
You have two problems here:
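The rect attribute of a groundtruthfile instance (I'll just put this here...) is the Rect class itself, not an instance of that class. Because copy.deepcopy treats classes as atomic and returns the very same class object, every copy shares that one Rect class, and g.rect.L = ... keeps overwriting the same class attribute. You should be doing: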
self.rect = Rect() # note parentheses
to create an instance, instead (similarly e.g. self.ht = int sets that attribute to the integer class, not an instance); and
The line:
g.rect.L = unpack('i',f.read(4))[0]
explicitly modifies the attribute of the same groundtruthfile instance you've been using all along. You should move the line:
g = groundtruthfile()
inside the loop, so that you create a separate instance each time, rather than trying to create copies.
This is just a minimal fix - it would make sense to actually provide arguments to the various __init__ methods, for example, such that you can create instances in a more intuitive way.
Also, if you're not actually using i in the loop:
for _ in range(length[0]):
is neater than:
for i in range(1,length[0]+1):
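Putting both fixes together gives the minimal read loop below. Several details are assumptions not in the original: the file holds a 4-byte native int count followed by one 4-byte int per record (only `rect.L` is read, since the real layout is elided), the class is renamed `GroundTruth` in CapWords style, and an in-memory `io.BytesIO` replaces `datafile.dtf` so the example runs without the real file.

```python
import io
import struct


class Rect:
    def __init__(self, L=0):
        self.L = L


class GroundTruth:
    # Hypothetical, trimmed-down record; the real class has more (elided) fields
    def __init__(self, rect=None, ht=0, wt=0, text=''):
        self.rect = rect if rect is not None else Rect()  # an instance, not the class
        self.ht = ht
        self.wt = wt
        self.text = text


def read_records(f):
    # Assumed layout: a 4-byte int count, then one 4-byte int (rect.L) per record
    (length,) = struct.unpack('i', f.read(4))
    data = []
    for _ in range(length):
        g = GroundTruth()  # a fresh instance every iteration, so no copying is needed
        g.rect.L = struct.unpack('i', f.read(4))[0]
        data.append(g)
    return data


# Build an in-memory "file" with three records to try it out
buf = io.BytesIO(struct.pack('4i', 3, 10, 20, 30))
records = read_records(buf)
print([r.rect.L for r in records])  # [10, 20, 30]
```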

How to set up a class with all the methods of and functions like a built in such as float, but holds onto extra data?

I am working with 2 data sets of roughly 100,000 values each. These 2 data sets are simply lists. Each item in the list is a small class.
class Datum(object):
    def __init__(self, value, dtype, source, index1=None, index2=None):
        self.value = value
        self.dtype = dtype
        self.source = source
        self.index1 = index1
        self.index2 = index2
For each datum in one list, there is a matching datum in the other list that has the same dtype, source, index1, and index2, which I use to sort the two data sets such that they align. I then do various work with the matching data points' values, which are always floats.
Currently, if I want to determine the relative values of the floats in one data set, I do something like this.
minimum = min([x.value for x in data])
for datum in data:
    datum.value -= minimum
However, it would be nice to have my custom class inherit from float, and be able to act like this.
minimum = min(data)
data = [x - minimum for x in data]
I tried the following.
class Datum(float):
    def __new__(cls, value, dtype, source, index1=None, index2=None):
        new = float.__new__(cls, value)
        new.dtype = dtype
        new.source = source
        new.index1 = index1
        new.index2 = index2
        return new
However, doing
data = [x - minimum for x in data]
removes all of the extra attributes (dtype, source, index1, index2).
How should I set up a class that functions like a float, but holds onto the extra data that I instantiate it with?
UPDATE: I do many types of mathematical operations beyond subtraction, so rewriting all of the methods that work with a float would be very troublesome, and frankly I'm not sure I could rewrite them properly.
I suggest subclassing float and using a couple decorators to "capture" the float output from any method (except for __new__ of course) and returning a Datum object instead of a float object.
First we write the method decorator (which really isn't being used as a decorator below, it's just a function that modifies the output of another function, AKA a wrapper function):
def mydecorator(f, cls):
    #f is the method being modified, cls is its class (in this case, Datum)
    def func_wrapper(*args, **kwargs):
        #*args and **kwargs are all the arguments that were passed to f
        newvalue = f(*args, **kwargs)
        #newvalue now contains the output float would normally produce
        ##Now get cls instance provided as part of args (we need one
        ##if we're going to reattach instance information later):
        try:
            self = args[0]
            ##Now check to make sure new value is an instance of some numerical
            ##type, but NOT a bool or a cls type (which might lead to recursion)
            ##Including ints so things like modulo and round will work right
            if (isinstance(newvalue, float) or isinstance(newvalue, int)) and not isinstance(newvalue, bool) and type(newvalue) != cls:
                ##If newvalue is a float or int, now we make a new cls instance using the
                ##newvalue for value and using the previous self instance information (args[0])
                ##for the other fields
                return cls(newvalue, self.dtype, self.source, self.index1, self.index2)
        #IndexError raised if no args provided, AttributeError raised if self isn't a cls instance
        except (IndexError, AttributeError):
            pass
        ##If newvalue isn't numerical, or we don't have a self, just return what
        ##float would normally return
        return newvalue
    #the function has now been modified and we return the modified version
    #to be used instead of the original version, f
    return func_wrapper
The first decorator only applies to a method to which it is attached. But we want it to decorate all (actually, almost all) the methods inherited from float (well, those that appear in the float's __dict__, anyway). This second decorator will apply our first decorator to all of the methods in the float subclass except for those listed as exceptions (see this answer):
def for_all_methods_in_float(decorator, *exceptions):
    def decorate(cls):
        for attr in float.__dict__:
            if callable(getattr(float, attr)) and attr not in exceptions:
                setattr(cls, attr, decorator(getattr(float, attr), cls))
        return cls
    return decorate
Now we write the subclass much the same as you had before, but decorated, and excluding __new__ from decoration (I guess we could also exclude __init__ but __init__ doesn't return anything, anyway):
@for_all_methods_in_float(mydecorator, '__new__')
class Datum(float):
    def __new__(klass, value, dtype="dtype", source="source", index1="index1", index2="index2"):
        return super(Datum, klass).__new__(klass, value)
    def __init__(self, value, dtype="dtype", source="source", index1="index1", index2="index2"):
        self.value = value
        self.dtype = dtype
        self.source = source
        self.index1 = index1
        self.index2 = index2
        super(Datum, self).__init__()
Here are our testing procedures; iteration seems to work correctly:
d1 = Datum(1.5)
d2 = Datum(3.2)
d3 = d1+d2
assert d3.source == 'source'
L=[d1,d2,d3]
d4=max(L)
assert d4.source == 'source'
L = [i for i in L]
assert L[0].source == 'source'
assert type(L[0]) == Datum
minimum = min(L)
assert [x - minimum for x in L][0].source == 'source'
Notes:
I am using Python 3. Not certain if that will make a difference for you.
This approach effectively overrides EVERY method of float other than the exceptions, even the ones for which the result isn't modified. There may be side effects to this (subclassing a built-in and then overriding all of its methods), e.g. a performance hit or something; I really don't know.
This will also decorate nested classes.
This same approach could also be implemented using a metaclass.
The problem is when you do:
x - minimum
in terms of types, you are doing either:
Datum - float, or Datum - int
Either way, Datum defines no subtraction of its own, so Python falls back to the __sub__ it inherits from its parent class. Since Datum is a type of float, it uses float's subtraction, and the calculation ends up being
float - float
which results in a plain float; Python has no way of knowing how to construct your Datum object unless you tell it.
To solve this, you either need to implement the mathematical operators so that Python knows how to do Datum - float, or come up with a different design.
Assuming that dtype, source, index1 and index2 need to stay the same after a calculation, your class needs, as an example:
    def __sub__(self, other):
        # float(self) is the plain numeric value; the metadata is copied from self
        return Datum(float(self) - other, self.dtype, self.source, self.index1, self.index2)
This should work (not tested), and it will now allow you to do this:
d = Datum(23.0, dtype="float", source="me", index1=1)
e = d - 16
print(float(e), e.dtype, e.source, e.index1, e.index2)
which should result in:
7.0 float me 1 None
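Since the update says many operators are involved, one way to avoid hand-writing each of them is to generate the wrappers in a loop, applying this answer's idea (call float's method, then rebuild a Datum carrying self's metadata) to a chosen list of operators. The operator list and the helpers `_rewrap` and `_make_op` are assumptions for illustration; operators not in the list still return a plain float.

```python
class Datum(float):
    def __new__(cls, value, dtype=None, source=None, index1=None, index2=None):
        new = super().__new__(cls, value)
        new.dtype = dtype
        new.source = source
        new.index1 = index1
        new.index2 = index2
        return new

    def _rewrap(self, result):
        # float's own methods return plain floats; wrap those and pass anything else
        # through unchanged (e.g. NotImplemented, or a complex result from a
        # negative base raised to a fractional power)
        if isinstance(result, float):
            return Datum(result, self.dtype, self.source, self.index1, self.index2)
        return result


def _make_op(name):
    float_op = getattr(float, name)  # the plain float implementation

    def op(self, *args):
        return self._rewrap(float_op(self, *args))

    op.__name__ = name
    return op


for _name in ("__add__", "__radd__", "__sub__", "__rsub__", "__mul__", "__rmul__",
              "__truediv__", "__rtruediv__", "__pow__", "__rpow__", "__neg__", "__abs__"):
    setattr(Datum, _name, _make_op(_name))


d = Datum(23.0, dtype="float", source="me", index1=1)
e = d - 16
print(float(e), e.dtype, e.source, e.index1, e.index2)  # 7.0 float me 1 None
```

Reflected operators such as `__rsub__` matter for expressions like `16 - d`, where Python first tries `int.__sub__`, gets `NotImplemented`, and then falls back to `Datum.__rsub__`.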

are user defined classes mutable

Say I want to create classes for a car, a tractor and a boat. All of these classes have an engine instance, and I want to keep track of all the engines in a single list. If I understand correctly, if the engine object is mutable, I can store it as an attribute of the car and also store the same instance in a list.
I can't find any solid information on whether user-defined classes are mutable, or whether you can choose when you define them. Can anybody shed some light?
User classes are considered mutable. Python doesn't have (absolutely) private attributes, so you can always change a class by reaching into the internals.
For using your class as a key in a dict or storing them in a set, you can define a .__hash__() method and a .__eq__() method, making a promise that your class is immutable. You generally design your class API to not mutate the internal state after creation in such cases.
For example, if your engines are uniquely defined by their id, you can use that as the basis of your hash:
class Engine(object):
    def __init__(self, id):
        self.id = id

    def __hash__(self):
        return hash(self.id)

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.id == other.id
        return NotImplemented
Now you can use instances of class Engine in sets:
>>> eng1 = Engine(1)
>>> eng2 = Engine(2)
>>> eng1 == eng2
False
>>> eng1 == eng1
True
>>> eng1 == Engine(1)
True
>>> engines = set([eng1, eng2])
>>> engines
set([<__main__.Engine object at 0x105ebef10>, <__main__.Engine object at 0x105ebef90>])
>>> engines.add(Engine(1))
>>> engines
set([<__main__.Engine object at 0x105ebef10>, <__main__.Engine object at 0x105ebef90>])
In the above sample I add another Engine(1) instance to the set, but it is recognized as already present and the set didn't change.
Note that as far as lists are concerned, the .__eq__() implementation is the important one; lists don't care if an object is mutable or not, but with the .__eq__() method in place you can test if a given engine is already in a list:
>>> Engine(1) in [eng1, eng2]
True
All objects (with the exception of a few in the standard library, some that implement special access mechanisms using things like descriptors and decorators, or some implemented in C) are mutable. This includes instances of user defined classes, classes themselves, and even the type objects that define the classes. You can even mutate a class object at runtime and have the modifications manifest in instances of the class created before the modification. By and large, things are only immutable by convention in Python if you dig deep enough.
I think you're confusing mutability with how Python keeps references. Consider:
class Foo(object):
    pass

t = (1, 2, Foo())  # t is a tuple, so t is immutable
b = t[2]           # b is an instance of Foo
b.foo = "Hello"    # b is mutable (I just changed it)
print(hash(b))     # b is hashable, although the default hash isn't very useful
d = {b: 3}         # since b is hashable, it can be used as a key in a dictionary (or set)
c = t              # even though t is immutable, we can create multiple references to it
a = [t]            # here we add another reference to t in a list
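For the car/tractor/boat case in the question, the short sketch below shows that putting an Engine in an attribute and in a list stores two references to one object, so a change made through either reference is visible through the other. `Car`, `Boat`, `all_engines`, and the `horsepower` attribute are hypothetical names.

```python
class Engine:
    def __init__(self, horsepower):
        self.horsepower = horsepower


class Car:
    def __init__(self, engine):
        self.engine = engine  # stores a reference, not a copy


class Boat:
    def __init__(self, engine):
        self.engine = engine


all_engines = []  # one list tracking every engine

car = Car(Engine(150))
boat = Boat(Engine(300))
all_engines.extend([car.engine, boat.engine])

car.engine.horsepower = 200          # mutate through the car's reference...
print(all_engines[0].horsepower)     # ...and the list sees it: 200
print(all_engines[0] is car.engine)  # True: same object, two references
```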
Now to your question about getting/storing a list of engines globally -- There are a few different ways to do this, here's one:
class Engine(object):
    def __init__(self, make, model):
        self.make = make
        self.model = model

class EngineFactory(object):
    def __init__(self, **kwargs):
        self._engines = kwargs

    def all_engines(self):
        return self._engines.values()

    def __call__(self, make, model):
        """Return the same object for each (make, model) combination requested."""
        if (make, model) in self._engines:
            return self._engines[(make, model)]
        else:
            a = self._engines[(make, model)] = Engine(make, model)
            return a

engine_factory = EngineFactory()
engine1 = engine_factory('cool_engine', 1.0)
engine2 = engine_factory('cool_engine', 1.0)
engine1 is engine2  # True!!! They're the same engine. Changing engine1 changes engine2
The example above could be improved a little bit by having the EngineFactory._engines dict store weakref.ref objects instead of actually storing real references to the objects. In that case, you'd check to make sure the reference is still alive (hasn't been garbage collected) before you return a new reference to the object.
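A sketch of that improvement, with one substitution: instead of hand-managed `weakref.ref` objects, it uses the standard library's `weakref.WeakValueDictionary`, which drops an entry automatically once its value has been garbage collected. That swap is an assumption of this sketch rather than part of the answer above, and the `**kwargs` seeding from the original factory is omitted.

```python
import weakref


class Engine:
    def __init__(self, make, model):
        self.make = make
        self.model = model


class EngineFactory:
    """Return one shared Engine per (make, model) without keeping dead engines alive."""

    def __init__(self):
        # Values are held weakly: an entry disappears once nothing else references it
        self._engines = weakref.WeakValueDictionary()

    def all_engines(self):
        # Snapshot as a list, since entries may vanish while a live view is iterated
        return list(self._engines.values())

    def __call__(self, make, model):
        engine = self._engines.get((make, model))
        if engine is None:  # never created, or already garbage collected
            engine = Engine(make, model)
            self._engines[(make, model)] = engine
        return engine


engine_factory = EngineFactory()
engine1 = engine_factory('cool_engine', 1.0)
engine2 = engine_factory('cool_engine', 1.0)
print(engine1 is engine2)  # True
```

Holding the new engine in the local variable `engine` before returning it keeps it alive between creation and the caller receiving it.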
EDIT: This is conceptually wrong. "The immutable object in python" can shed some light on why.
class Engine():
    def __init__(self, sn):
        self.sn = sn

a = Engine(42)
b = a
print(a is b)
prints True.

sharing a string between two objects

I want two objects to share a single string object. How do I pass the string object from the first to the second such that any changes applied by one will be visible to the other? I am guessing that I would have to wrap the string in some sort of buffer object and add all sorts of complexity to get it to work.
However, I have a tendency to overthink problems, so undoubtedly there is an easier way. Or maybe sharing the string is the wrong way to go? Keep in mind that I want both objects to be able to edit the string. Any ideas?
Here is an example of a solution I could use:
class Buffer(object):
    def __init__(self):
        self.data = ""
    def assign(self, value):
        self.data = str(value)
    def __getattr__(self, name):
        return getattr(self.data, name)

class Descriptor(object):
    def __get__(self, instance, owner):
        return instance._buffer.data
    def __set__(self, instance, value):
        if not hasattr(instance, "_buffer"):
            if isinstance(value, Buffer):
                instance._buffer = value
                return
            instance._buffer = Buffer()
        instance._buffer.assign(value)

class First(object):
    data = Descriptor()
    def __init__(self, data):
        self.data = data
    def read(self, size=-1):
        if size < 0:
            size = len(self.data)
        data = self.data[:size]
        self.data = self.data[size:]
        return data

class Second(object):
    data = Descriptor()
    def __init__(self, data):
        self.data = data
    def add(self, newdata):
        self.data += newdata
    def reset(self):
        self.data = ""
    def spawn(self):
        return First(self._buffer)
s = Second("stuff")
f = s.spawn()
f.data == s.data
#True
f.read(2)
#"st"
f.data
# "uff"
f.data == s.data
#True
s.data
#"uff"
s._buffer == f._buffer
#True
Again, this seems like absolute overkill for what seems like a simple problem. It also requires the Buffer class, a descriptor, and the _buffer attribute that the descriptor imposes on its owner.
An alternative is to put one of the objects in charge of the string and then have it expose an interface for making changes to the string. Simpler, but not quite the same effect.
I want two objects to share a single string object.
They will, if you simply pass the string -- Python doesn't copy unless you tell it to copy.
How do I pass the string object from the first to the second such that any changes applied by one will be visible to the other?
There can never be any change made to a string object (it's immutable!), so your requirement is trivially met (since a false precondition implies anything).
I am guessing that I would have to wrap the string in a sort of buffer object and do all sorts of complexity to get it to work.
You could use (assuming this is Python 2 and you want a string of bytes) an array.array with a typecode of c. Arrays are mutable, so you can indeed alter them (with mutating methods -- and some operators, which are a special case of methods since they invoke special methods on the object). They don't have the myriad non-mutating methods of strings, so, if you need those, you'll indeed need a simple wrapper (delegating said methods to the str(...) of the array that the wrapper also holds).
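The `'c'` typecode mentioned above exists only in Python 2's `array` module. In Python 3, a rough equivalent for a mutable string of bytes is the built-in `bytearray`: both objects hold a reference to the same `bytearray`, in-place changes are visible through either one, and non-mutating str-like methods such as `upper()` or `find()` still work (returning new objects). `Reader`, `Writer`, and the byte contents are hypothetical.

```python
class Writer:
    def __init__(self, buf):
        self.buf = buf  # a reference to the shared bytearray, not a copy

    def add(self, data):
        # += on a bytearray extends it in place (it would rebind if buf were immutable bytes)
        self.buf += data


class Reader:
    def __init__(self, buf):
        self.buf = buf

    def read(self, size):
        chunk = bytes(self.buf[:size])
        del self.buf[:size]  # slice deletion mutates the shared object in place
        return chunk


shared = bytearray(b"stuff")
w = Writer(shared)
r = Reader(shared)

print(r.read(2))       # b'st'
w.add(b"!")
print(w.buf)           # bytearray(b'uff!')
print(r.buf is w.buf)  # True
```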
It doesn't seem there should be any special complexity, unless of course you want to do something truly weird, as your example code seems to (have an assignment, i.e., a rebinding of a name, magically affect a different name -- that has absolutely nothing to do with whatever object was previously bound to the name you're rebinding, nor does it change that object in any way -- the only object it "changes" is the one holding the attribute, so it's obvious that you need descriptors or other magic on said object).
You appear to come from some language where variables (and particularly strings) are "containers of data" (like C, Fortran, or C++). In Python (like, say, in Java), names (the preferred way to call what others call "variables") always just refer to objects, they don't contain anything except exactly such a reference. Some objects can be changed, some can't, but that has absolutely nothing to do with the assignment statement (see note 1) (which doesn't change objects: it rebinds names).
(note 1): except of course that rebinding an attribute or item does alter the object that "contains" that item or attribute -- objects can and do contain, it's names that don't.
Just put your value to be shared in a list, and assign the list to both objects.
class A(object):
    def __init__(self, strcontainer):
        self.strcontainer = strcontainer
    def upcase(self):
        self.strcontainer[0] = self.strcontainer[0].upper()
    def __str__(self):
        return self.strcontainer[0]

# create a string, inside a shareable list
shared = ['Hello, World!']
x = A(shared)
y = A(shared)
# both objects have the same list
print(id(x.strcontainer))
print(id(y.strcontainer))
# change value in x
x.upcase()
# show how value is changed in both x and y
print(str(x))
print(str(y))
Prints:
10534024
10534024
HELLO, WORLD!
HELLO, WORLD!
I am not a great expert in Python, but I think that if you declare a variable in a module and add a getter/setter to the module for this variable, you will be able to share it this way.
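Here is a sketch of that module-level idea. For self-containment the shared value and its getter/setter sit in the same script; in practice they would live in their own module (for example a hypothetical `shared_state.py`) imported by every object that needs the text. `get_text`, `set_text`, and `Editor` are hypothetical names.

```python
# In practice these three definitions would sit in a separate module,
# e.g. a hypothetical shared_state.py, imported wherever the text is needed.
_text = ""


def get_text():
    return _text


def set_text(value):
    global _text  # rebind the module-level name, so every reader sees the new string
    _text = value


class Editor:
    """Hypothetical object that reads and writes the shared text only via the accessors."""

    def append(self, more):
        set_text(get_text() + more)

    def upcase(self):
        set_text(get_text().upper())


a = Editor()
b = Editor()
a.append("Hello, ")
b.append("World!")
a.upcase()
print(get_text())  # HELLO, WORLD!
```

The string itself is still immutable; what is shared is the single module-level name, which every accessor call reads or rebinds.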
