Serialize both inner and outer class arguments - python

I'm not sure I'm using the right terms in my research (if not, please let me know; I may have missed obvious answers because of that), but I'd like to serialize (i.e. convert to a dictionary or JSON structure) both the main (outer) and inner class arguments of a class.
Here's an example:
class Outer(object):
    def __init__(self, idx, array1, array2):
        self.idx = idx
        # flatten individual values:
        ## unpack first array
        self.prop_a = array1[0]
        self.prop_b = array1[1]
        self.prop_c = array1[2]
        ## unpack second array
        self.prop_d = array2[0]
        self.prop_e = array2[1]
        self.prop_f = array2[2]
        # Nest elements to fit a wanted JSON schema
        class inner1(object):
            def __init__(self, outer):
                self.prop_a = outer.prop_a
                self.prop_b = outer.prop_b
                self.prop_c = outer.prop_c
        class inner2(object):
            def __init__(self, outer):
                self.prop_d = outer.prop_d
                self.prop_e = outer.prop_e
                self.prop_f = outer.prop_f
        self.inner_first = inner1(self)
        self.inner_second = inner2(self)
    def serialize(self):
        return vars(self)
Now I can call both:
import numpy as np
obj = Outer(10, np.array([1,2,3]), np.array([4,5,6]))
obj.prop_a # returns 1, or
obj.inner_first.prop_a # also returns 1
But when I try to serialize it, it prints:
vars(obj) # prints:
{'idx': 10,
'prop_a': 1,
'prop_b': 2,
'prop_c': 3,
'prop_d': 4,
'prop_e': 5,
'prop_f': 6,
'inner_first': <__main__.Outer.__init__.<locals>.inner1 at 0x7f231a4fe3b0>,
'inner_second': <__main__.Outer.__init__.<locals>.inner2 at 0x7f231a4febc0>}
where I want it to print:
vars(obj) # prints:
{'idx': 10,
'prop_a': 1,
'prop_b': 2,
'prop_c': 3,
'prop_d': 4,
'prop_e': 5,
'prop_f': 6,
'inner_first': {'prop_a': 1, 'prop_b': 2, 'prop_c': 3},
'inner_second': {'prop_d': 4, 'prop_e': 5, 'prop_f': 6}}
with the 'inner_first' key being the actual result of vars(obj.inner_first), and the same for the 'inner_second' key.
Ideally I'd like to call the serialize() method to convert my object to the desired output: obj.serialize()
I feel I'm close to the result, but I simply cannot see how to get there.
In the end, I would like to simply do:
obj = Outer(10, np.array([1,2,3]), np.array([4,5,6]))
obj.serialize()
{
'inner_first': {
'prop_a': 1,
'prop_b': 2,
'prop_c': 3
},
'inner_second': {
'prop_d': 4,
'prop_e': 5,
'prop_f': 6
}
}
in order to fit a given JSON structure that I have.
Info: this thread helped me to build the inner classes.
Also note that this question only embeds two "layers" or "levels" of the final structure, but I may have more than 2.
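A minimal sketch of one way to get there, assuming every nested value is either a plain object, a dict/list/tuple, or a JSON-compatible scalar: a hypothetical recursive helper to_serializable() replaces each object with the dict of its instance attributes (what vars() returns), so it works for any number of levels. The sketch passes plain lists instead of NumPy arrays, because NumPy scalars such as numpy.int64 are rejected by json.dumps; calling .tolist() on the arrays first is one way around that.

```python
import json


def to_serializable(value):
    """Recursively convert an object graph into dicts, lists and scalars."""
    if isinstance(value, dict):
        return {key: to_serializable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_serializable(item) for item in value]
    if hasattr(value, "__dict__"):
        # Plain objects (such as the inner1/inner2 instances) become dicts of
        # their instance attributes; the recursion handles any nesting depth.
        return {key: to_serializable(item) for key, item in vars(value).items()}
    return value  # ints, floats, strings, None, ... are returned unchanged


class Outer(object):
    def __init__(self, idx, array1, array2):
        self.idx = idx
        self.prop_a, self.prop_b, self.prop_c = array1
        self.prop_d, self.prop_e, self.prop_f = array2

        class inner1(object):
            def __init__(self, outer):
                self.prop_a = outer.prop_a
                self.prop_b = outer.prop_b
                self.prop_c = outer.prop_c

        class inner2(object):
            def __init__(self, outer):
                self.prop_d = outer.prop_d
                self.prop_e = outer.prop_e
                self.prop_f = outer.prop_f

        self.inner_first = inner1(self)
        self.inner_second = inner2(self)

    def serialize(self):
        return to_serializable(self)


obj = Outer(10, [1, 2, 3], [4, 5, 6])
print(json.dumps(obj.serialize(), indent=2))
```

If only the nested part is wanted (the second target output in the question), serialize() could instead return to_serializable({'inner_first': self.inner_first, 'inner_second': self.inner_second}).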

Related

How is sorted(key=lambda x:) implemented behind the scene?

An example:
names = ["George Washington", "John Adams", "Thomas Jefferson", "James Madison"]
sorted(names, key=lambda name: name.split()[-1].lower())
I know key is used to compare different names, but it could be implemented in two different ways:
First compute all keys for each name, and bind the key and name together in some way, and sort them. The p
Compute the key each time when a comparison happens
The problem with the first approach is that it has to define another data structure to bind the key and the data. The problem with the second approach is that the key might be computed multiple times, that is, name.split()[-1].lower() would be executed many times, which is very time-consuming.
I am just wondering which way Python implements sorted().
The key function is executed just once per value, to produce a (keyvalue, value) pair; this is then used to sort and later on just the values are returned in the sorted order. This is sometimes called a Schwartzian transform.
You can test this yourself; you could count how often the function is called, for example:
>>> def keyfunc(value):
...     keyfunc.count += 1
...     return value
...
>>> keyfunc.count = 0
>>> sorted([0, 8, 1, 6, 4, 5, 3, 7, 9, 2], key=keyfunc)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> keyfunc.count
10
or you could collect all the values that are being passed in; you'll see that they follow the original input order:
>>> def keyfunc(value):
...     keyfunc.arguments.append(value)
...     return value
...
>>> keyfunc.arguments = []
>>> sorted([0, 8, 1, 6, 4, 5, 3, 7, 9, 2], key=keyfunc)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> keyfunc.arguments
[0, 8, 1, 6, 4, 5, 3, 7, 9, 2]
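For comparison, the textbook decorate-sort-undecorate idiom (the Schwartzian transform mentioned above) can be sketched in pure Python as follows. This only illustrates the idea, with a hypothetical helper name sorted_by_key(); CPython itself keeps keys and values in two parallel arrays rather than building tuples, as the source excerpt that follows shows.

```python
def sorted_by_key(values, key):
    """Decorate-sort-undecorate (Schwartzian transform) sketch."""
    # Decorate: call key() exactly once per value, in input order. The index
    # breaks ties, which keeps the sort stable and means the values
    # themselves are never compared.
    decorated = [(key(value), index, value) for index, value in enumerate(values)]
    decorated.sort()
    # Undecorate: drop the keys and indices, keeping the values in sorted order.
    return [value for _, _, value in decorated]


names = ["George Washington", "John Adams", "Thomas Jefferson", "James Madison"]
print(sorted_by_key(names, key=lambda name: name.split()[-1].lower()))
# ['John Adams', 'Thomas Jefferson', 'James Madison', 'George Washington']
```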
If you want to read the CPython source code, the relevant function is called listsort(), and the keyfunc is used in the following loop (saved_ob_item is the input array), which is executed before sorting takes place:
for (i = 0; i < saved_ob_size ; i++) {
    keys[i] = PyObject_CallFunctionObjArgs(keyfunc, saved_ob_item[i],
                                           NULL);
    if (keys[i] == NULL) {
        for (i=i-1 ; i>=0 ; i--)
            Py_DECREF(keys[i]);
        if (saved_ob_size >= MERGESTATE_TEMP_SIZE/2)
            PyMem_FREE(keys);
        goto keyfunc_fail;
    }
}
lo.keys = keys;
lo.values = saved_ob_item;
so in the end, you have two arrays, one with keys and one with the original values. All sort operations act on the two arrays in parallel, sorting the values in lo.keys and moving the elements in lo.values in tandem.
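A simplified Python sketch of the parallel-array idea: the hypothetical parallel_sort() below uses a plain insertion sort instead of CPython's timsort, so it only shows how keys and values move in tandem, not how listsort() is actually implemented.

```python
def parallel_sort(values, key):
    """Sort values by key(value), keeping keys and values in parallel lists."""
    keys = [key(value) for value in values]  # one call per value, in input order
    values = list(values)
    # Stable insertion sort: only keys are compared, and every move applied
    # to keys is mirrored on values so the two lists stay aligned.
    for i in range(1, len(keys)):
        current_key, current_value = keys[i], values[i]
        j = i - 1
        while j >= 0 and keys[j] > current_key:
            keys[j + 1] = keys[j]
            values[j + 1] = values[j]
            j -= 1
        keys[j + 1] = current_key
        values[j + 1] = current_value
    return values


names = ["George Washington", "John Adams", "Thomas Jefferson", "James Madison"]
print(parallel_sort(names, key=lambda name: name.split()[-1].lower()))
```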

Creating a union of two dictionaries

What I am attempting to accomplish is to create a union of two dictionaries (whose keys are single integers, i.e. 1, 2, 3, 4, etc.) by taking the keys out of each dictionary, putting them into two lists, joining the two lists and then putting them back into a new dictionary that contains both lists. However, I am running into the following error:
TypeError: unsupported operand type(s) for +:
'builtin_function_or_method' and 'builtin_function_or_method'
How would I get around this error?
Here are the relevant pieces of code.
class DictSet:
    def __init__(self, elements):
        self.newDict = {}
        for i in elements:
            self.newDict[i] = True
    def union(self, otherset):
        a = self.newDict.keys
        b = otherset.newDict.keys
        list1 = a + b
        new = DictSet(list1)
        return new
def main():
    allints = DictSet([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    odds = DictSet([1, 3, 5, 7, 9])
    evens = DictSet([2, 4, 6, 8, 10])
Why not use dict.update()?
def union(self, otherset):
    res = DictSet([])
    res.newDict = dict(self.newDict)
    res.newDict.update(otherset.newDict)
    return res
You must call the keys() method. Try this:
a = self.newDict.keys()
b = otherset.newDict.keys()
EDIT: I see you are using Python 3. There keys() returns a view object, which does not support +, so convert the dictionaries to lists instead:
a = list(self.newDict)
b = list(otherset.newDict)
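In Python 3 there is also a shorter route: dictionary key views support set operations, so the union can be computed directly without building lists. A sketch of the question's DictSet rewritten that way, keeping the question's attribute names:

```python
class DictSet:
    def __init__(self, elements):
        self.newDict = {}
        for element in elements:
            self.newDict[element] = True

    def union(self, otherset):
        # Python 3 keys() views are set-like, so | returns a plain set of
        # every key found in either dictionary.
        return DictSet(self.newDict.keys() | otherset.newDict.keys())


odds = DictSet([1, 3, 5, 7, 9])
evens = DictSet([2, 4, 6, 8, 10])
print(sorted(odds.union(evens).newDict))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```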

saving more than one value in one Python array position (kinda container)

I would like to use something as a container, but I can't use objects... I believe there is some library or collection or something which could help me.
I want to save a few connected values into one array position:
array = []
array.append(value1 = 1, value2 = 2, value3 = 3)
array.append(value1 = 5, value2 = 7, value3 = 10)
array.append(value1 = 2, value2 = 3, value3 = 3)
Something like this... And then I would like to search in this array like
for n in array:
    n.value1 = ....
But I'm a beginner and don't know much about the language... Can you please help me?
You are looking for a dictionary. It can be used like this:
d = {"value1": 1, "value2": 2, "value3": 3}
for k in d:
    print("key: {}, value: {}".format(k, d[k]))
here are the docs: https://docs.python.org/2/tutorial/datastructures.html#dictionaries
For your problem you'll need a list of dictionaries, like this:
list_of_dict = []
list_of_dict.append({"value1": 1, "value2": 2, "value3": 3})
list_of_dict.append({"value1": 5, "value2": 7, "value3": 10})
list_of_dict.append({"value1": 2, "value2": 3, "value3": 3})
for dct in list_of_dict:
    dct["value1"] = ...
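Building on that, here is a sketch of how searching and updating such a list of dictionaries might look; the filter conditions are made up for illustration.

```python
list_of_dict = [
    {"value1": 1, "value2": 2, "value3": 3},
    {"value1": 5, "value2": 7, "value3": 10},
    {"value1": 2, "value2": 3, "value3": 3},
]

# Search: keep every record whose "value1" field equals 2
matches = [dct for dct in list_of_dict if dct["value1"] == 2]
print(matches)  # [{'value1': 2, 'value2': 3, 'value3': 3}]

# Update: change a field in place for every record matching a condition
for dct in list_of_dict:
    if dct["value3"] == 3:
        dct["value2"] += 100
```

The list comprehension returns references to the same dictionaries, so an update made through list_of_dict is also visible through matches.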
As mentioned in the comment you are looking for a dictionary; see the docs or this tutorial.
Example code:
values = {'value1': 1, 'value2': 2, 'value3': 3}  # naming it dict would shadow the built-in dict type

Issue with Python class instances having a shallow connection

I'm attempting to write a genetic algorithm framework in Python, and am running into issues with shallow/deep copying. My background is mainly C/C++, and I'm struggling to understand how these connections are persisting.
What I am seeing is an explosion in the length of an attribute list within a class. My code is below; I'll point out the problems.
This is the class for a single gene. Essentially, it should have a name, value, and boolean flag. Instances of Gene populate a list within my Individual class.
import random
# gene class
class Gene():
    # constructor
    def __init__(self, name, is_float):
        self.name_ = name
        self.is_float_ = is_float
        self.value_ = self.randomize_gene()
    # create a random gene
    def randomize_gene(self):
        return random.random()
This is my Individual class. Each generation, a population of these is created (I'll show the creation code after the class declaration) and the typical genetic algorithm operations are applied. Of note is the print len(self.Genes_) call, whose output grows each time this class is instantiated.
# individual class
class Individual():
    # genome definition
    Genes_ = [] # genes list
    evaluated_ = False # prevent re-evaluation
    fitness_ = 0.0 # fitness value (from evaluation)
    trace_ = "" # path to trace file
    generation_ = 0 # generation to which this individual belonged
    indiv_ = 0 # identify this individual by number
    # constructor
    def __init__(self, gen, indv):
        # assign indices
        self.generation_ = gen
        self.indiv_ = indv
        self.fitness_ = random.random()
        # populate genome
        for lp in cfg.params_:
            g = Gene(lp[0], lp[1])
            self.Genes_.append(g)
        print len(self.Genes_)
> python ga.py
> 24
> 48
> 72
> 96
> 120
> 144
......
As you can see, each Individual should have 24 genes, however this population explodes quite rapidly.
I create an initial population of new Individuals like this:
# create a randomized initial population
def createPopulation(self, gen):
    loc_population = []
    for i in range(0, cfg.population_size_):
        indv = Individual(gen, i)
        loc_population.append(indv)
    return loc_population
and later on, my main loop (apologies for the whole dump, but I felt it was necessary; if my secondary calls (mutation/crossover) are needed, please let me know):
for i in range(0, cfg.generations_):
    # evaluate current population
    self.evaluate(i)
    # sort population on fitness
    loc_pop = sorted(self.population_, key=operator.attrgetter('fitness_'), reverse=True)
    # create next population & preserve elite individual
    next_population = []
    elitist = copy.deepcopy(loc_pop[0])
    elitist.generation_ = i
    next_population.append(elitist)
    # perform selection
    selection_pool = []
    selection_pool = self.selection(elitist)
    # perform crossover on selection
    new_children = []
    new_children = self.crossover(selection_pool, i)
    # perform mutation on selection
    muties = []
    muties = self.mutation(selection_pool, i)
    # add members to next population
    next_population = next_population + new_children + muties
    # fill out the rest with random
    for j in xrange(len(next_population)-1, cfg.population_size_ - 1):
        next_population.append(Individual(i, j))
    # copy next population over old population
    self.population_ = copy.deepcopy(next_population)
    # clear old lists
    selection_pool[:] = []
    new_children[:] = []
    muties[:] = []
    next_population[:] = []
I'm not completely sure that I understand your question, but I suspect that your problem is that the Genes_ variable in your Individual() class is declared in the class namespace. This namespace is shared by all instances of the class. In other words, every instance of Individual() shares the same Genes_ list.
Consider the following two snippets:
class Individual():
    # genome definition
    genes = []
    def __init__(self):
        for i in xrange(10):
            self.genes.append(i)
ind_1 = Individual()
print ind_1.genes
ind_2 = Individual()
print ind_1.genes
print ind_2.genes
and
class Individual():
    # genome definition
    def __init__(self):
        self.genes = []
        for i in xrange(10):
            self.genes.append(i)
ind_1 = Individual()
print ind_1.genes
ind_2 = Individual()
print ind_1.genes
print ind_2.genes
The first snippet outputs
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
while the second snippet outputs
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In the first scenario, when the second Individual() is instantiated the genes list variable already exists, and the genes from the second individual are added to this existing list.
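The same behaviour can be checked in Python 3 with identity tests instead of prints. The class names below are made up for the illustration; vars() shows that the shared list never lives in the instance's own attribute dictionary.

```python
class SharedGenes:
    genes = []  # class attribute: one list object shared by every instance

    def __init__(self):
        for i in range(3):
            self.genes.append(i)  # mutates the shared class-level list


class OwnGenes:
    def __init__(self):
        self.genes = []  # instance attribute: a fresh list per instance
        for i in range(3):
            self.genes.append(i)


s1, s2 = SharedGenes(), SharedGenes()
o1, o2 = OwnGenes(), OwnGenes()
print(s1.genes is s2.genes, len(s1.genes))       # True 6
print(o1.genes is o2.genes, len(o1.genes))       # False 3
print("genes" in vars(s1), "genes" in vars(o1))  # False True
```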
Rather than creating the Individual() class like this,
# individual class
class Individual():
    # genome definition
    Genes_ = [] # genes list
    # constructor
    def __init__(self, gen, indv):
        # assign indices
        self.generation_ = gen
        self.indiv_ = indv
        self.fitness_ = random.random()
you should consider declaring the Genes_ variable in __init__ so that each Individual() instance gets its own gene list:
# individual class
class Individual():
    # constructor
    def __init__(self, gen, indv):
        # genome definition
        self.Genes_ = [] # genes list
        # assign indices
        self.generation_ = gen
        self.indiv_ = indv
        self.fitness_ = random.random()
When you create a class, you are really creating exactly one 'class object'. These are objects just like any other object in Python; everything in Python is an object, and what those objects do is defined by their methods, not their class! That is the magic of duck typing. In Python you can even create new classes dynamically on the fly.
Anyway, you are adding exactly one list object to the "Genes_" attribute of the one and only "Individual" class object. The upshot is that every instance of the "Individual" class accesses the same "Genes_" list object.
Consider this:
# In 2.2 <= Python < 3.0 you should ALWAYS inherit from 'object'.
class Foobar(object):
    doodah = []
a = Foobar()
b = Foobar()
assert id(a.doodah) == id(b.doodah) # True
In this case, as you can see, "a.doodah" and "b.doodah" are the same object!
class Foobar(object):
    def __init__(self):
        self.doodah = []
a = Foobar()
b = Foobar()
assert id(a.doodah) != id(b.doodah) # True
In this case, they are different objects.
It's possible to have your cake and eat it too. Consider this:
class Foobar(object):
    doodah = []
a = Foobar()
b = Foobar()
a.doodah = 'hlaghalgh'
assert id(a.doodah) != id(b.doodah) # True
In this case a "doodah" attribute is added to the "a" object, which overrides the class attribute.
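One subtlety is worth spelling out: calling a mutating method through an instance changes the shared class attribute, whereas assignment through an instance creates a new instance attribute that shadows it. A short Python 3 sketch reusing the Foobar name:

```python
class Foobar:
    doodah = []


a = Foobar()
b = Foobar()

a.doodah.append(1)        # mutation: changes the single class-level list
print(b.doodah)           # [1]
print(vars(a))            # {} (no instance attribute was created)

a.doodah = "hlaghalgh"    # assignment: creates an instance attribute on a only
print(vars(a))            # {'doodah': 'hlaghalgh'}
print(b.doodah)           # [1] (b still sees the class attribute)

del a.doodah              # removing the instance attribute ...
print(a.doodah is Foobar.doodah)  # True (... exposes the class attribute again)
```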
Hope this helps!

What is the purpose of deepcopy's second parameter, memo?

from copy import*
a=[1,2,3,4]
c={'a':'aaa'}
print c
#{'a': 'aaa'}
b=deepcopy(a,c)
print b
print c
# print {'a': 'aaa', 10310992: 3, 10310980: 4, 10311016: 1, 11588784: [1, 2, 3, 4, [1, 2, 3, 4]], 11566456: [1, 2, 3, 4], 10311004: 2}
Why does c print that?
Please try to use the code, rather than text, because my English is not very good, thank you
In django/utils/tree.py:
def __deepcopy__(self, memodict):
    """
    Utility method used by copy.deepcopy().
    """
    obj = Node(connector=self.connector, negated=self.negated)
    obj.__class__ = self.__class__
    obj.children = deepcopy(self.children, memodict)
    obj.subtree_parents = deepcopy(self.subtree_parents, memodict)
    return obj
import copy
memo = {}
x1 = range(5)
x2=range(6,9)
x3=[2,3,4,11]
y1 = copy.deepcopy(x1, memo)
y2=copy.deepcopy(x2, memo)
y3=copy.deepcopy(x3,memo)
print memo
print id(y1),id(y2),id(y3)
y1[0]='www'
print y1,y2,y3
print memo
Output:
{10310992: 3, 10310980: 4, 10311016: 1, 11588784: [0, 1, 2, 3, 4, [0, 1, 2, 3, 4]], 10311028: 0, 11566456: [0, 1, 2, 3, 4], 10311004: 2}
{11572448: [6, 7, 8], 10310992: 3, 10310980: 4, 10311016: 1, 11572368: [2, 3, 4, 11], 10310956: 6, 10310896: 11, 10310944: 7, 11588784: [0, 1, 2, 3, 4, [0, 1, 2, 3, 4], 6, 7, 8, [6, 7, 8], 11, [2, 3, 4, 11]], 10311028: 0, 11566456: [0, 1, 2, 3, 4], 10310932: 8, 10311004: 2}
11572408 11581280 11580960
['www', 1, 2, 3, 4] [6, 7, 8] [2, 3, 4, 11]
{11572448: [6, 7, 8], 10310992: 3, 10310980: 4, 10311016: 1, 11572368: [2, 3, 4, 11], 10310956: 6, 10310896: 11, 10310944: 7, 11588784: [0, 1, 2, 3, 4, [0, 1, 2, 3, 4], 6, 7, 8, [6, 7, 8], 11, [2, 3, 4, 11]], 10311028: 0, 11566456: ['www', 1, 2, 3, 4], 10310932: 8, 10311004: 2}
No one above gave a good example of how to use it.
Here's what I do:
def __deepcopy__(self, memo):
    copy = type(self)()
    memo[id(self)] = copy
    copy._member1 = self._member1
    copy._member2 = deepcopy(self._member2, memo)
    return copy
where _member1 is an object that does not require a deep copy (like a string or integer), and _member2 is one that does, like another custom type or a list or dict.
I've used the above code on highly tangled object graphs and it works very well.
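A self-contained sketch of that pattern on a small cyclic graph. The Node class and its attributes are hypothetical: name plays the role of _member1 (shared) and neighbours the role of _member2 (deep-copied).

```python
from copy import deepcopy


class Node:
    """Hypothetical graph node used to illustrate __deepcopy__ with memo."""

    def __init__(self, name=""):
        self.name = name          # immutable: safe to share between copies
        self.neighbours = []      # mutable: must be deep-copied

    def __deepcopy__(self, memo):
        copy = type(self)()
        # Register the copy *before* recursing, so a cycle leading back to
        # self finds this copy in memo instead of recursing forever.
        memo[id(self)] = copy
        copy.name = self.name
        copy.neighbours = deepcopy(self.neighbours, memo)
        return copy


a = Node("a")
b = Node("b")
a.neighbours.append(b)
b.neighbours.append(a)        # a <-> b form a cycle

a2 = deepcopy(a)
b2 = a2.neighbours[0]
print(a2 is not a, b2 is not b)   # True True
print(b2.neighbours[0] is a2)     # True: the cycle is preserved in the copy
```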
If you also want to make your classes picklable (for file save/load), there is no analogous memo parameter for __getstate__/__setstate__; the pickle system keeps track of already-referenced objects itself, so you don't need to worry about it.
The above works on PyQt5 classes that you inherit from (as well as pickling - for instance I can deepcopy or pickle a custom QMainWindow, QWidget, QGraphicsItem, etc.)
If there is some initialization code in your constructor that creates new objects, for instance a CustomWidget(QWidget) that creates a new CustomScene(QGraphicsScene), but you'd like to pickle or copy the scene from one CustomWidget to a new one, then one way is to make a new=True parameter in your __init__ and say:
def __init__(..., new=True):
    ....
    if new:
        self._scene = CustomScene()
def __deepcopy__(self, memo):
    copy = type(self)(..., new=False)
    ....
    copy._scene = deepcopy(self._scene, memo)
    ....
That ensures you don't create a CustomScene (or some other big class that does a lot of initialization) twice! You should also use the same setting (new=False) in your __setstate__ method, e.g.:
def __setstate__(self, data):
    self.__init__(...., new=False)
    self._member1 = data['member 1']
    .....
There are other ways to get around the above, but this is the one I converged to and use frequently.
Why talk about pickling as well? Because a typical application needs both, and you maintain them together: if you add a member to your class, you add it to the __setstate__, __getstate__, and __deepcopy__ code. I would make it a rule that for any new class you write, you create those three methods if you plan on supporting copy/paste and file save/load in your app. The alternative is JSON with your own save/load code, but then there is a lot more work for you to do, including memoization.
So to support all of the above, you need __deepcopy__, __setstate__, and __getstate__ methods, and you need to import deepcopy:
from copy import deepcopy
When you write your pickle loader/saver functions (where you call pickle.load()/pickle.dump() to load/save your object hierarchy or graph), use the C implementation for the best speed: _pickle is the faster C implementation and is usually compatible with your app requirements. In Python 3 the standard pickle module already uses _pickle automatically when it is available, so a plain import pickle normally gets you the same speed as import _pickle as pickle.
It's the memo dict, where the id-to-object correspondence is kept to reconstruct complex object graphs perfectly. It's hard to "use the code", but let's try:
>>> import copy
>>> memo = {}
>>> x = range(5)
>>> y = copy.deepcopy(x, memo)
>>> memo
{399680: [0, 1, 2, 3, 4], 16790896: 3, 16790884: 4, 16790920: 1,
438608: [0, 1, 2, 3, 4, [0, 1, 2, 3, 4]], 16790932: 0, 16790908: 2}
>>>
and
>>> id(x)
399680
>>> for j in x: print j, id(j)
...
0 16790932
1 16790920
2 16790908
3 16790896
4 16790884
so as you see the IDs are exactly right. Also:
>>> for k, v in memo.items(): print k, id(v)
...
399680 435264
16790896 16790896
16790884 16790884
16790920 16790920
438608 435464
16790932 16790932
16790908 16790908
you can see that the (immutable) integers map to themselves: they are not copied, only remembered.
So here's a graph:
>>> z = [x, x]
>>> t = copy.deepcopy(z, memo)
>>> print id(t[0]), id(t[1]), id(y)
435264 435264 435264
so you see all the subcopies are the same objects as y (since we reused the memo).
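Because deepcopy looks up id(obj) in memo before copying anything, memo can also be pre-seeded to keep a particular object shared instead of copied. A sketch, with hypothetical names; this relies on how the standard copy module consults memo rather than on a documented guarantee, so treat it as an implementation detail:

```python
import copy

shared_config = {"mode": "fast"}   # hypothetical object we do NOT want copied
state = {"config": shared_config, "data": [1, 2, 3]}

# Pre-seed memo: any object already registered under its id() is returned
# as-is by deepcopy instead of being copied.
memo = {id(shared_config): shared_config}
clone = copy.deepcopy(state, memo)

print(clone["config"] is shared_config)   # True: reused, not copied
print(clone["data"] is state["data"])     # False: still deep-copied
```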
You can read more by checking the Python online documentation:
http://docs.python.org/library/copy.html
The deepcopy() function is recursive, and it will work its way down through a deeply nested object. It uses a dictionary to detect objects it has seen before, which prevents infinite loops. You should just ignore this dictionary.
class A(object):
    def __init__(self, *args):
        self.lst = args
class B(object):
    def __init__(self):
        self.x = self
def my_deepcopy(arg):
    try:
        obj = type(arg)() # get new, empty instance of type arg
        for key in arg.__dict__:
            obj.__dict__[key] = my_deepcopy(arg.__dict__[key])
        return obj
    except AttributeError:
        return type(arg)(arg) # return new instance of a simple type such as str
a = A(1, 2, 3)
b = B()
b.x is b # evaluates to True
c = my_deepcopy(a) # works fine
c = my_deepcopy(b) # recurses forever (RecursionError in Python 3)
from copy import deepcopy
c = deepcopy(b) # this works because of the second, hidden, dict argument
Just ignore the second, hidden, dict argument. Do not try to use it.
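The two guarantees that the hidden dictionary provides (shared references stay shared, and cycles terminate) can be observed with built-in containers, without touching memo at all:

```python
import copy

inner = [1, 2]
outer = [inner, inner]          # the same list referenced twice
loop = []
loop.append(loop)               # a list that contains itself

outer_copy = copy.deepcopy(outer)
print(outer_copy[0] is outer_copy[1])   # True: sharing is preserved
print(outer_copy[0] is inner)           # False: but it is a new list

loop_copy = copy.deepcopy(loop)         # no infinite recursion
print(loop_copy[0] is loop_copy)        # True
```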
Here's a quick illustration I used for explaining this to myself:
import copy
a = [1,2,3]
memo = {}
b = copy.deepcopy(a,memo)
# now memo = {139907464678864: [1, 2, 3], 9357408: 1, 9357440: 2, 9357472: 3, 28258000: [1, 2, 3, [1, 2, 3]]}
key = id(a) # 139907464678864 in the run shown above
print(id(a) == key) #True
print(id(b) == key) #False
print(id(a) == id(memo[key])) #False
print(id(b) == id(memo[key])) #True
in other words:
memo[id_of_initial_object] = copy_of_initial_object
