Python ORM to NumPy arrays

I am building a data-simulation framework where it is much more convenient to work with classes and objects, ORM-style, instead of working with NumPy arrays directly. Nevertheless, the output of the simulation should be a NumPy array. bcolz also looks quite interesting as a backend here.
I would like to map all object attributes to NumPy arrays, so that the arrays work like column-oriented "persistent" storage for my classes. I also need to link "new" attributes to objects, which I can calculate using the NumPy/pandas framework, and then link them back to the objects through the same backend.
Is there any existing solution for such an approach? Would you recommend a way to build it that is HPC-friendly?
So far I have found only django-pandas. PyTables is quite slow at adding new columns/attributes.
Something like this (working with views/pointers into np_array):
class Instance:
    def __init__(self, np_array, np_position):
        self.np_array = np_array
        self.np_position = np_position

    def get_test_property(self):
        return self.np_array[self.np_position]

    def set_test_property(self, value):
        self.np_array[self.np_position] = value

In fact, there is a way to change NumPy or bcolz arrays by reference. A simple example can be found in the following code.
import numpy as np

a = np.arange(10)

class Case:
    def __init__(self, gcv_pointer):
        self.gcv = gcv_pointer          # a view into the backing array

    def gcv_get(self):
        return self.gcv

    def gcv_set(self, value):
        self.gcv[:] = value             # in-place write, visible in `a`

#===============================================================================
# NumPy
#===============================================================================
caseList = []
for i in range(1, 10):
    case = Case(a[i-1:i])               # one-element slice: a view, not a copy
    caseList.append(case)

gcvs = [case.gcv_get() for case in caseList]
caseList[1].gcv_set(5)
caseList[1].gcv_set(13)
caseList[1].gcv[:] = 6

# attributes can also be attached dynamically and still point into `a`
setattr(caseList[1], 'dpd', a[5:6])
caseList[1].dpd
caseList[1].dpd[:] = 888
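Whether this works hinges on NumPy's view semantics: basic slices like a[i-1:i] are views into the same buffer, while fancy indexing silently returns a copy and breaks the link. A minimal sketch of the difference (names are illustrative):

```python
import numpy as np

a = np.arange(10)

view = a[3:4]    # basic slice: a view sharing a's buffer
copy = a[[3]]    # fancy (integer-array) indexing: an independent copy

view[:] = 99     # write-through: a[3] becomes 99
copy[:] = -1     # modifies only the copy; a is untouched
```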

Related

Create a numpy array with a user-defined method which assigns a value

I need to create an empty numpy array (of unknown shape at creation time), and to be able to use specific methods (named "upload" and "download") to update its values and retrieve them.
Some more context: OpenCV has a class called cuda_GpuMat; when you want to perform calculations on the GPU, you are supposed to first create an instance of one, and then assign the np.array you wish to work on using the upload() method, for example:
import numpy as np
import cv2
import cv2.cuda as cvc
x = np.arange(3)
x_gpu = cv2.cuda_GpuMat()
x_gpu.upload(x)
x_gpu = cvc.multiply(x_gpu, x_gpu)
x_gpu.download()
output:
array([[0],
[1],
[4]], dtype=int32)
I want a compatible class for a regular CPU, so I can run the same code (only change: import cv2.cuda as cvc becomes import cv2 as cvc).
I read about ndarray subclassing but couldn't figure out how to do it properly.
What have I tried?
import cv2
import numpy as np

class CpuMat(np.ndarray):
    def __new__(subtype, shape=0, **kwargs):
        obj = super(CpuMat, subtype).__new__(subtype, shape, **kwargs)
        obj.emptyflag = True
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return

    def upload(self, img):
        self.img = img
        self.emptyflag = False

    def download(self):
        return self.img

    def empty(self):
        return self.emptyflag
Obviously this doesn't work because, as in the example above, the cv2 functions operate on the array itself, but in my implementation the array is stored in the "img" attribute of the object.
I've tried several versions of the above (for example, I tried to define a __repr__ method of the class to return self.img, but that has to return a string, so it didn't work), and I'm not sure I'm heading in the right direction.
So any help here would be great.
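One way out is to wrap an array rather than subclass ndarray, and provide module-level functions matching the cvc calls you use. This is a sketch, not the actual cv2 API: the real cuda_GpuMat reshapes data into 2-D Mats, which this stand-in does not reproduce, and the helper names are assumptions.

```python
import numpy as np

class CpuMat:
    """CPU stand-in for cv2.cuda_GpuMat: holds an array, no subclassing."""
    def __init__(self):
        self._img = None

    def upload(self, img):
        self._img = np.asarray(img)

    def download(self):
        return self._img

    def empty(self):
        return self._img is None

def multiply(a, b):
    """Module-level helper so cvc.multiply(x, x)-style calls keep working."""
    out = CpuMat()
    out.upload(a.download() * b.download())
    return out

x_gpu = CpuMat()
x_gpu.upload(np.arange(3))
y = multiply(x_gpu, x_gpu)
```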

Python: How to create an object with expandable numpy array members

I have the following use case:
I am receiving 100 samples per second of several numpy nd-arrays, say a of shape (4, 3), and b of shape (5, 6).
On other instances, I could be receiving c of shape (2, 3), and d of shape (3, 5) and e of some other shape, and so on.
These samples keep arriving for a variable amount of time, from a single sample up to 360000 samples (an hour at 100 per second).
I would like to treat an entire streaming session as an object
I would like to have something like
class Aggregator_ab(object):
    def __init__(self):
        self.a = ???
        self.b = ???

    def add_sample(self, new_a, new_b):
        self.a.update(new_a)  # How can I achieve such an update?
        self.b.update(new_b)  # How can I achieve such an update?
Now, I would like to
access Aggregator_ab's fields as numpy arrays:
agg = Aggregator_ab()  # maybe with parameters?
num_samples = 4
for i in range(num_samples):
    agg.add_sample(new_a=np.zeros((4, 3)), new_b=np.zeros((5, 6)))
assert agg.a.shape[0] == num_samples
assert agg.b.shape[0] == num_samples
assert agg.a.shape[1:] == (4, 3)
assert agg.b.shape[1:] == (5, 6)
And I would also expect regular numpy behavior of the members of agg.
My current code has some problems, and looks something like:
class Aggregator_ab(object):
    def __init__(self):
        self.__as = []
        self.__bs = []

    def add_sample(self, new_a, new_b):
        self.__as.append(new_a)
        self.__bs.append(new_b)

    @property
    def a(self):
        return np.vstack(self.__as)

    @property
    def b(self):
        return np.vstack(self.__bs)
problems:
- I can only get a "full" numpy array after using vstack
- I must run an expensive vstack on every "get"
- I can't reuse the result of previous vstacks
- adding any field requires a lot of boilerplate which I would like to extract
- this only supports very limited use cases, and if I ever want something more, I would have to implement it myself
- going through native Python lists is the only way to scale the array size without paying too much for resizing; had I used vstack to keep a single numpy array, at large sizes I wouldn't be able to keep up with the frame rate
This seems to me like a common use case, thus I believe someone has solved this before me.
Is there some library that does this? I know pandas sounds right, but what do I do if I have fields that are matrices?
If not, then how is this usually dealt with?
What about allocating an array that keeps growing in size? It works like vectors in most common languages:
import numpy as np

class growing_array:
    def __init__(self, shape, growth):
        self._arr = np.empty(shape=(*shape, growth))
        self.incoming_shape = shape
        self.num_entries = 0
        self.growth = growth

    def append(self, incoming_arr):
        if self.num_entries == self._arr.shape[-1]:
            # ndarray.resize keeps the flat buffer, so growing the last
            # axis would scramble existing entries; allocate and copy instead
            bigger = np.empty(shape=(*self.incoming_shape,
                                     self._arr.shape[-1] + self.growth))
            bigger[..., :self.num_entries] = self._arr[..., :self.num_entries]
            self._arr = bigger
        self._arr[..., self.num_entries] = incoming_arr
        self.num_entries += 1

    def get(self):
        return self._arr[..., :self.num_entries]
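A variant of the same idea that stacks along the first axis and doubles capacity, giving amortized O(1) appends. The class and names here are illustrative, not from the question:

```python
import numpy as np

class SampleBuffer:
    """Amortized-growth buffer stacking fixed-shape samples along axis 0."""
    def __init__(self, sample_shape, capacity=8):
        self._data = np.empty((capacity, *sample_shape))
        self._n = 0

    def append(self, sample):
        if self._n == self._data.shape[0]:
            # out of room: double the capacity and copy existing samples over
            bigger = np.empty((2 * self._data.shape[0], *self._data.shape[1:]))
            bigger[:self._n] = self._data[:self._n]
            self._data = bigger
        self._data[self._n] = sample
        self._n += 1

    @property
    def array(self):
        return self._data[:self._n]   # a view over the filled part, no copy

buf = SampleBuffer(sample_shape=(4, 3), capacity=2)
for _ in range(5):
    buf.append(np.zeros((4, 3)))
```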

Automatically initialize multiple instance of class

So far I have mostly used Python for data analysis, but for some time I have been trying to implement things myself. Right now I'm trying to implement a toxicokinetic-toxicodynamic model for fish to analyse the effect of chemicals on them.
So given the following code:
import numpy as np

class fish():
    def __init__(self):
        self.resistance_threshold = np.random.normal(0, 1)
My question is now: say I would like to initialize multiple instances of the fish class (say 1000 fish), each with a different resistance to a chemical, in order to model an agent-based population. How could one achieve this automatically?
I was wondering if there is something like using an index as part of the variable name, e.g.:
for fishid in range(0, 1000):
    fishfishid = fish()  # use the value of fishid in the variable name, i.e. fish1, fish2, fish3, ..., fish999
Now even if there is a possibility to do this in Python, I always have the feeling that creating those 1000 named variables is bad practice. I was wondering if there is an OOP Python approach instead, such as setting up a class "population" which initializes the fish within its own __init__ function, but how would I assign the fish without initializing them first?
Any tips, pointers or links would be greatly appreciated.
You can create a class FishPopulation and then store in it all the Fish you need, based on the size argument. For example, something like this would work:
import numpy as np

class Fish:
    def __init__(self):
        self.resistance_threshold = np.random.normal(0, 1)

class FishPopulation:
    def __init__(self, size=1000):
        self.size = size
        self.fishes = [Fish() for _ in range(size)]

You can iterate over it like this:
fish_population = FishPopulation(size=10)
for fish in fish_population.fishes:
    print(fish.resistance_threshold)
>>>
-0.9658927669391391
-0.5934917229482478
0.8827336199040103
-1.5729644992077412
-0.7682070400307331
1.464407499255235
0.7724449293785645
-0.7296586180041732
-1.1989783570280217
0.15716170041128566
And you can access their indexes like this:
print(fish_population.fishes[0].resistance_threshold)
>>> -0.9658927669391391
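If the population gets large and you mostly need vectorized math over all fish, a column-oriented variant (in the spirit of the first question on this page) keeps one array of thresholds instead of 1000 objects. This is a sketch with illustrative names, not part of the original answer:

```python
import numpy as np

class VectorFishPopulation:
    """Column-oriented population: one NumPy array, not one object per fish."""
    def __init__(self, size=1000, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        self.resistance_threshold = rng.normal(0, 1, size=size)

    def survivors(self, dose):
        # one vectorized comparison over the whole population
        return int(np.count_nonzero(self.resistance_threshold > dose))

pop = VectorFishPopulation(size=1000)
# a dose below every threshold leaves everyone alive
n_alive = pop.survivors(dose=pop.resistance_threshold.min() - 1.0)
```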

Numpy: Array of class instances

This might be a dumb question, but say I want to build a program from the bottom up like so:
class Atom(object):
    def __init__(self):
        '''
        Constructor
        '''

    def atom(self, foo, bar):
        # ...with foo and bar being arrays of atom Params of lengths m & n
        "Do what atoms do"
        return atom_out

...I can put my instances in a dictionary:
class Molecule(Atom):
    def __init__(self):
        ...

    def structure(self, a, b):
        # a = 2D array of size (num_of_atoms, m); 'foo' Params for each atom
        # b = 2D array of size (num_of_atoms, n); 'bar' Params for each atom
        unit = self.atom
        fake_array = {"atom1": unit(a[0], b[0]),
                      "atom2": unit(a[1], b[1]),
                      ...}

    def chemicalBonds(self, this, that, theother):
        ...
My question is: is there a way to do this with numpy arrays so that each element in "real_array" would be an instance of Atom, i.e. the output of the individual computations of the atom function? I could extend this to class Water(Molecule), which would perform fast numpy operations on the large structure and chemicalBonds outputs, hence the need for arrays... Or am I going about this the wrong way?
Also, if I am on the right track, I'd appreciate any tips on how to structure a "hierarchical" program like this, as I'm not sure I'm doing the above correctly and recently discovered that I don't know what I'm doing.
Thanks in advance.
The path to hell is paved with premature optimization... As a beginner in Python, focus on your program and what it is supposed to do; once it is doing that too slowly, you can ask focused questions about how to make it faster. I would stick with learning Python's intrinsic data structures for managing your objects. You can implement your algorithms using numpy arrays with standard data types if you are doing large array operations. Once you have some working code you can do performance testing to determine where you need optimization.
Numpy does allow you to create arrays of objects, and I will give you enough rope to hang yourself with below, but creating an ecosystem of tools to operate on those arrays of objects is not a trivial undertaking. You should first work with Python data structures (buy Beazley's Essential Python Reference), then with numpy's built-in types, then create your own compound numpy types. As a last resort, use the object dtype from the example below.
Good luck!
David
import numpy

class Atom(object):
    def atoms_method(self, foo, bar):
        # ...with foo and bar being arrays of Params of length m & n
        atom_out = foo + bar
        return atom_out

array = numpy.ndarray((10,), dtype=object)   # numpy.object is removed; use the builtin
for i in range(10):                          # xrange is Python 2 only
    array[i] = Atom()

for i in range(10):
    print(array[i].atoms_method(i, 5))
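As the answer hints, the usual middle ground before object arrays is a compound (structured) dtype: each record keeps the per-atom parameters together while the whole array still supports fast vectorized operations. A sketch, with illustrative field names:

```python
import numpy as np

# one record per atom, holding its 'foo' and 'bar' parameters side by side
atom_dtype = np.dtype([("foo", np.float64), ("bar", np.float64)])
atoms = np.zeros(10, dtype=atom_dtype)
atoms["foo"] = np.arange(10)
atoms["bar"] = 5.0

# the equivalent of atoms_method, vectorized over all atoms at once
atom_out = atoms["foo"] + atoms["bar"]
```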

defining an already existing numpy array within a class

I am creating an object in Python. I have a numpy array from an H5 file that I would like to define within it. The numpy array holds coordinates. I was poking around online and found tons of information about creating numpy arrays, or creating objects inside numpy arrays... but I can't find anything on defining an already-made numpy array inside an object.
class Node(object):
    def __init__(self, globalIndex, coordinates):
        # Useful things to record
        self.globalIndex = globalIndex
        self.coordinates = numpy.coordinates
        # Dictionaries to be used
        self.localIndices = {}
        self.GhostLayer = {}
My question: is there a specific way to define my numpy array within this class? If not (the fact that I couldn't find anything about it makes me think that it can't be done), how else could I import a numpy array?
class Node(object):
    def __init__(self, globalIndex, coordinates):
        # Useful things to record
        self.globalIndex = globalIndex
        self.coordinates = coordinates  # now self.coordinates is just another name for your array

assuming you construct it as n = Node(some_index, numpy_coordinate_array_name).
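Because assignment only binds a name, the Node and the caller then share one array; no copy is made. A quick self-contained check of that aliasing (re-declaring the class so the snippet runs on its own):

```python
import numpy as np

class Node(object):
    def __init__(self, globalIndex, coordinates):
        self.globalIndex = globalIndex
        self.coordinates = coordinates   # same array object, not a copy

coords = np.array([1.0, 2.0, 3.0])
n = Node(0, coords)
n.coordinates[0] = 99.0   # also visible through coords: one shared buffer
```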
