Python: How to create an object with expandable numpy array members

I have the following use case:
I am receiving 100 samples per second of several numpy nd-arrays, say a of shape (4, 3), and b of shape (5, 6).
In other instances, I could be receiving c of shape (2, 3), d of shape (3, 5), e of some other shape, and so on.
These samples arrive for some variable time between a single sample and 360000 samples (an hour).
I would like to treat an entire streaming session as a single object, something like:
class Aggregator_ab(object):
    def __init__(self):
        self.a = ???
        self.b = ???
    def add_sample(self, new_a, new_b):
        self.a.update(new_a)  # How can I achieve such an update?
        self.b.update(new_b)  # How can I achieve such an update?
Now, I would like to access Aggregator_ab's fields as numpy arrays:
agg = Aggregator_ab()  # maybe with parameters?
num_samples = 4
for i in range(num_samples):
    agg.add_sample(new_a=np.zeros((4, 3)), new_b=np.zeros((5, 6)))
assert agg.a.shape[0] == num_samples
assert agg.b.shape[0] == num_samples
assert agg.a.shape[1:] == (4, 3)
assert agg.b.shape[1:] == (5, 6)
And I would also expect regular numpy behavior of the members of agg.
My current code has some problems, and looks something like:
class Aggregator_ab(object):
    def __init__(self):
        self.__as = []
        self.__bs = []
    def add_sample(self, new_a, new_b):
        self.__as.append(new_a)
        self.__bs.append(new_b)
    @property
    def a(self):
        return np.vstack(self.__as)
    @property
    def b(self):
        return np.vstack(self.__bs)
Problems:
I can only get a "full" numpy array after calling vstack.
Every "get" pays for an expensive vstack.
Previous vstack results can't be reused.
Adding any field requires a lot of boilerplate, which I would like to factor out.
This supports only very limited use cases; if I ever want something more, I have to implement it myself.
Going through native Python lists is the only way to grow the data without paying too much for resizing. Had I used vstack to maintain a numpy array, I wouldn't be able to keep up with the frame rate at large sizes.
This seems to me like a common use case, thus I believe someone has solved this before me.
Is there some library that does this? I know pandas sounds right, but what do I do if I have fields that are matrices?
If not, then how is this usually dealt with?

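What about allocating an array that keeps growing in size? It works like vectors in most common languages:
import numpy as np
class growing_array:
    def __init__(self, shape, growth):
        # Samples are stored along the last axis; `growth` slots are added at a time.
        self._arr = np.empty(shape=(*shape, growth))
        self.incoming_shape = shape
        self.num_entries = 0
        self.growth = growth
    def append(self, incoming_arr):
        if self.num_entries == self._arr.shape[-1]:
            # Add `growth` more slots. ndarray.resize keeps the flat memory order,
            # which would scramble samples stored along the last axis, so copy instead.
            extra = np.empty(shape=(*self.incoming_shape, self.growth))
            self._arr = np.concatenate((self._arr, extra), axis=-1)
        self._arr[..., self.num_entries] = incoming_arr
        self.num_entries += 1
    def get(self):
        # View of the filled part, with shape (*shape, num_entries).
        return self._arr[..., :self.num_entries]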
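To match the assertions in the question (sample index on the first axis, fields exposed as attributes), the same preallocation idea could be sketched as follows. This is only a sketch under assumptions: the names GrowingArray and Aggregator, the capacity parameter, and the doubling growth policy are illustrative choices, not part of the answer above. Note also that np.vstack on (4, 3) samples gives shape (4n, 3); putting the sample axis first gives (n, 4, 3).

```python
import numpy as np


class GrowingArray:
    """Append-only buffer whose first axis indexes samples (hypothetical name)."""

    def __init__(self, sample_shape, capacity=1024, dtype=float):
        self._sample_shape = tuple(sample_shape)
        self._buf = np.empty((capacity, *self._sample_shape), dtype=dtype)
        self._n = 0  # number of samples written so far

    def append(self, sample):
        if self._n == self._buf.shape[0]:
            # Double the capacity and copy the existing samples across.
            new_cap = max(1, 2 * self._buf.shape[0])
            bigger = np.empty((new_cap, *self._sample_shape),
                              dtype=self._buf.dtype)
            bigger[:self._n] = self._buf
            self._buf = bigger
        self._buf[self._n] = sample
        self._n += 1

    @property
    def values(self):
        # A view of the filled part; no copy is made.
        return self._buf[:self._n]


class Aggregator:
    """Holds one GrowingArray per named field (hypothetical name)."""

    def __init__(self, **field_shapes):
        self._fields = {name: GrowingArray(shape)
                        for name, shape in field_shapes.items()}

    def add_sample(self, **samples):
        for name, sample in samples.items():
            self._fields[name].append(sample)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self.__dict__["_fields"][name].values
        except KeyError:
            raise AttributeError(name) from None


agg = Aggregator(a=(4, 3), b=(5, 6))
for _ in range(4):
    agg.add_sample(a=np.zeros((4, 3)), b=np.zeros((5, 6)))
print(agg.a.shape, agg.b.shape)
```

Doubling the capacity keeps the amortized cost of an append constant, and adding a new field needs no extra boilerplate. A view returned before a reallocation keeps pointing at the old buffer, so it should be fetched again after further appends.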

Related

Why is the returned address and not value being printed?

I wrote code that returns a list of objects with dimensions (5, 3, 3, 3) in Python 3.5. Whenever I try to print the returned value, it prints the addresses of 5 separate 3D lists instead of the list as a whole. Even the type of the returned value shows up as list. What exactly is the problem here?
Here is my initializing and returning function.
class layer(object):
    def __init__(self, inputDimensions, channels, padding, stride, layerInput):
        self.inputDimensions = inputDimensions
        self.channels = channels
        self.padding = padding
        self.stride = stride
        self.layerInput = layerInput
    def getLayerInfo(self):
        return self.inputDimensions, self.channels, self.padding, self.stride
    def getLayerInput(self):
        return self.layerInput
    def getLayerFilterInfo(self):
        return self.filterDimensions, self.numberOfFilters
    def getLayerFilters(self):
        return self.filters
    def initializeFilter(self, filterDimensions, numberOfFilters):
        self.filterDimensions = filterDimensions
        self.numberOfFilters = numberOfFilters
        self.filters = []
        for i in range(0, self.numberOfFilters):
            fil = filters(self.filterDimensions)
            self.filters.append(fil)
Here is my filter class.
class filters(object):
    def __init__(self, dimensions):
        self.dimensions = dimensions
        self.fil = np.random.random_sample(self.dimensions)
Here is a sample of input and output.
In [11]: l.getLayerFilters()
Out[11]:
[<__main__.filters at 0xb195a90>,
<__main__.filters at 0xb1cb588>,
<__main__.filters at 0xb1cb320>,
<__main__.filters at 0xb1cb5c0>,
<__main__.filters at 0xb1cbba8>]
In [12]: type(l.getLayerFilters())
Out[12]: list
In short: Instead of doing this:
fil = filters(self.filterDimensions)
self.filters.append(fil)
you can probably achieve what you want if you do this:
fil = filters(self.filterDimensions)
self.filters += fil.fil
I'm not sure; it depends on what those filters are supposed to be and how you want to combine them in your result. Then again, it seems possible to get rid of the filters class altogether.
As it is, you're creating instances of your filters class and appending them to a list. What you get when you print that list is, as expected and as seen in the question, a list of objects. Those objects of course don't represent themselves as lists; they are general objects with a default string representation that shows only the class the object comes from and its address in memory, to remind you that it is indeed an object and not a class. To make your class show its contents when printed, you could implement the __repr__ method like so:
def __repr__(self):
    return "<Filters: {}>".format(self.fil)
Put that in your filters class, and any print of it should now show you the array inside it (or its representation). It can still be improved.
BTW: Please consider renaming your class to Filters or similar. Upper camel case is what PEP 8 suggests for class names.
"I wanted to know how to return data members"
You need to access them:
filters = l.getLayerFilters()
for f in filters:
    print(f.dimensions, f.fil)
"whenever I try to print the returned value, it prints the addresses"
You never told Python how else it should print your object. Just because those fields are there does not mean Python will show them automatically.
As discussed in the comments, you will need to override that output behavior yourself with a method that returns a single human-readable representation of your class.
As an example:
class filters(object):
    def __init__(self, dimensions):
        self.dimensions = dimensions
        self.fil = np.random.random_sample(self.dimensions)
    def __repr__(self):
        return "dimensions: {}\nfil: {}".format(self.dimensions, self.fil)
Now, try it again
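As a quick check, here is a minimal sketch of what printing the list looks like once __repr__ is defined. The class is renamed Filters here, and the short summary format is an illustrative choice rather than the exact output of the answers above. The last lines show one possible way to get the numbers as a single array, assuming that is what the (5, 3, 3, 3) result was meant to be.

```python
import numpy as np


class Filters:
    def __init__(self, dimensions):
        self.dimensions = dimensions
        self.fil = np.random.random_sample(self.dimensions)

    def __repr__(self):
        # Summarize instead of dumping the whole array.
        return "Filters(dimensions={})".format(self.dimensions)


layer_filters = [Filters((3, 3, 3)) for _ in range(5)]
print(layer_filters)  # the list uses each element's __repr__

# Stack the per-filter arrays to get the values as one ndarray.
stacked = np.stack([f.fil for f in layer_filters])
print(stacked.shape)  # (5, 3, 3, 3)
```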
Some more reading
https://stackoverflow.com/a/2626364/2308683
Understanding repr( ) function in Python

Member functions for numpy records

I have to use numpy arrays of records to save RAM and to have fast access, but I want to use member functions on those records. For example:
X = ones(3, dtype=dtype([('foo', int), ('bar', float)]))
X[1].incrementFooBar()
For an ordinary Python class, I can write:
class QQQ:
    ...
    def incrementFooBar(self):
        self.foo += 1
        self.bar += 1
X = [QQQ(), QQQ(), QQQ()]
X[1].incrementFooBar()
How can I do something like that for numpy records?
I may be wrong, but I don't think there is a way to use member functions on the records in the numpy array like that. Alternatively, you could very easily write a function that accomplishes the same thing:
from numpy import ones, dtype
X = ones(3, dtype=dtype([('foo', int), ('bar', float)]))
def incrementFooBar(X, index):
    X['foo'][index] += 1
    X['bar'][index] += 1
# then, instead of "X[1].incrementFooBar()":
incrementFooBar(X, 1)
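If method-call syntax matters, one possible workaround is a thin wrapper object that remembers the array and the index. This is only a sketch: RecordView and increment_foo_bar are hypothetical names, not NumPy API, and each wrapper is an ordinary Python object, so it should be created on demand rather than stored for every record.

```python
import numpy as np


class RecordView:
    """Hypothetical helper: methods for one record of a structured array."""

    def __init__(self, array, index):
        self._array = array
        self._index = index

    def increment_foo_bar(self):
        # Index the field first, then the row, so the write hits the array.
        self._array["foo"][self._index] += 1
        self._array["bar"][self._index] += 1


X = np.ones(3, dtype=np.dtype([("foo", int), ("bar", float)]))
RecordView(X, 1).increment_foo_bar()
print(X[1])
```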

Python ORM to NumPy arrays

I am building a data simulation framework with a numpy-backed ORM, where it is much more convenient to work with classes and objects than with numpy arrays directly. Nevertheless, the output of the simulation should be a numpy array. bcolz is also quite interesting as a backend here.
I would like to map all object attributes to numpy arrays, so that the numpy arrays work like column-oriented "persistent" storage for my classes. I also need to attach "new" attributes to objects, which I can calculate using the numpy (pandas) framework, and then link them to the objects using the same backend.
Is there any solution for such an approach? Would you recommend a way to build it in an HPC-friendly way?
I have found only django-pandas. PyTables is quite slow at adding new columns (attributes).
Something like this (working on pointers to np_array):
class Instance:
    def __init__(self, np_array, np_position):
        self.np_array = np_array
        self.np_position = np_position
    def get_test_property(self):
        return self.np_array[self.np_position]
    def set_test_property(self, value):
        self.np_array[self.np_position] = value
In fact, there is a way to change NumPy or bcolz arrays by reference. With NumPy, a basic slice of an array is a view, so writing through the slice updates the original array.
A simple example can be found in the following code.
import numpy as np
a = np.arange(10)
class Case:
    def __init__(self, gcv_pointer):
        # gcv_pointer is a one-element slice (a view) of the backing array.
        self.gcv = gcv_pointer
    def GetGCV(self):
        return self.gcv
    def SetGCV(self, value):
        self.gcv[:] = value
#===============================================================================
# NumPy
#===============================================================================
caseList = []
for i in range(1, 10):
    case = Case(a[i-1:i])
    caseList.append(case)
gcvs = [case.GetGCV() for case in caseList]
caseList[1].SetGCV(5)
caseList[1].SetGCV(13)
caseList[1].gcv[:] = 6  # a[1] is now 6
setattr(caseList[1], 'dpd', a[5:6])
caseList[1].dpd
caseList[1].dpd[:] = 888  # a[5] is now 888
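The same view idea could be extended to the column-oriented storage described in the question. The sketch below is an assumption-laden illustration, not an existing library: ColumnStore and Row are hypothetical names, and attribute access is simply forwarded to one NumPy array per attribute.

```python
import numpy as np


class ColumnStore:
    """Hypothetical column-oriented backing store: one array per attribute."""

    def __init__(self, size):
        self.size = size
        self.columns = {}

    def add_column(self, name, dtype=float):
        self.columns[name] = np.zeros(self.size, dtype=dtype)


class Row:
    """Object-style access to row `index` of a ColumnStore."""

    def __init__(self, store, index):
        # Bypass __setattr__ below while storing the bookkeeping fields.
        object.__setattr__(self, "_store", store)
        object.__setattr__(self, "_index", index)

    def __getattr__(self, name):
        store = object.__getattribute__(self, "_store")
        index = object.__getattribute__(self, "_index")
        try:
            return store.columns[name][index]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        store = object.__getattribute__(self, "_store")
        index = object.__getattribute__(self, "_index")
        try:
            store.columns[name][index] = value
        except KeyError:
            raise AttributeError(name) from None


store = ColumnStore(3)
store.add_column("gcv")
rows = [Row(store, i) for i in range(3)]

rows[1].gcv = 5.0            # object-style write goes into the array
store.columns["gcv"] += 1    # vectorized update on the whole column
print(rows[1].gcv, store.columns["gcv"])
```

Adding a new attribute is a single add_column call, and the simulation output is just the column arrays themselves.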

Numpy: Array of class instances

This might be a dumb question, but say I want to build a program from the bottom up like so:
class Atom(object):
    def __init__(self):
        '''
        Constructor
        '''
    def atom(self, foo, bar):
        #...with foo and bar being arrays of atom Params of lengths m & n
        "Do what atoms do"
        return atom_out
...I can put my instances in a dictionary:
class Molecule(Atom):
    def __init__(self):
    def structure(self, a, b):
        #a = 2D array of size (num_of_atoms, m); 'foo' Params for each atom
        #b = 2D array of size (num_of_atoms, n); 'bar' Params for each atom
        unit = self.atom()
        fake_array = {"atom1": unit(a[0], b[0]),
                      "atom2": unit(a[1], b[1]),
                      : : :
                      : : :}
    def chemicalBonds(self, this, that, theother):
        : : :
        : : :
My question is: is there a way to do this with numpy arrays so that each element in "real_array" would be an instance of atom, i.e., the output of the individual computations of the atom function? I can extend this to class Water(molecule): which would perform fast numpy operations on the large structure and chemicalBonds outputs, hence the need for arrays... Or am I going about this the wrong way?
Also, if I am on the right track, I'd appreciate any tips on how to structure a "hierarchical program" like this, as I'm not sure I'm doing the above correctly and recently discovered that I don't know what I'm doing.
Thanks in advance.
The path to hell is paved with premature optimization... As a beginner in Python, focus on your program and what it is supposed to do; once it is doing that too slowly, you can ask focused questions about how to make it faster. I would stick with learning Python's intrinsic data structures for managing your objects. You can implement your algorithms using numpy arrays with standard data types if you are doing large array operations. Once you have some working code, you can do performance testing to determine where you need optimization.
Numpy does allow you to create arrays of objects, and I will give you enough rope to hang yourself with below, but creating an ecosystem of tools to operate on those arrays of objects is not a trivial undertaking. You should first work with Python data structures (buy Beazley's Python Essential Reference), then with numpy's built-in types, then with your own compound numpy types. As a last resort, use the object type from the example below.
Good luck!
David
import numpy
class Atom(object):
    def atoms_method(self, foo, bar):
        #...with foo and bar being arrays of Params of length m & n
        atom_out = foo + bar
        return atom_out
# dtype=object stores references to arbitrary Python objects
# (the numpy.object alias was removed in NumPy 1.24).
array = numpy.ndarray((10,), dtype=object)
for i in range(10):
    array[i] = Atom()
for i in range(10):
    print(array[i].atoms_method(i, 5))
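For comparison, the "compound numpy types" step recommended above might look like the following sketch. The field names foo and bar are taken from the question, but treating each as a single float per atom is an assumption made here for illustration.

```python
import numpy as np

# Hypothetical layout: the question does not say what an atom's parameters are.
atom_dtype = np.dtype([("foo", float), ("bar", float)])

atoms = np.zeros(10, dtype=atom_dtype)
atoms["foo"] = np.arange(10)
atoms["bar"] = 5.0


def atoms_method(atoms):
    # Same arithmetic as Atom.atoms_method, applied to all atoms at once.
    return atoms["foo"] + atoms["bar"]


atom_out = atoms_method(atoms)
print(atom_out)
```

Unlike the object array, this keeps the data in one contiguous block, so the addition runs as a single vectorized operation instead of ten Python method calls.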

How to create view/python reference on scipy sparse matrix?

I am working on an algorithm that uses the diagonal and first off-diagonal blocks of a large (it will be 1e6 x 1e6) block diagonal sparse matrix.
Right now I create a dict that stores the blocks in such a way that I can access them in a matrix-like fashion. For example, B[0,0] (5x5) gives the first block of matrix A (20x20), assuming 5x5 blocks and that matrix A is of type sparse.lil.
This works fine but takes horribly long to run. It is inefficient because it copies the data, as this reference revealed to my astonishment: GetItem Method
Is there a way to store only a view on a sparse matrix in a dict? I would like to change the content and still be able to use the same identifiers. It is fine if it takes a little longer, as it should only be done once. The blocks will have many different dimensions and shapes.
As far as I know, all of the various sparse matrices in scipy.sparse return copies rather than a view of some sort. (Some of the others may be significantly faster at doing so than lil_matrix, though!)
One way of doing what you want is to just work with slice objects. For example:
import scipy.sparse
class SparseBlocks(object):
    def __init__(self, data, chunksize=5):
        self.data = data
        self.chunksize = chunksize
    def _convert_slices(self, slices):
        # Translate block indices into element slices of the underlying matrix.
        newslices = []
        for axslice in slices:
            if isinstance(axslice, slice):
                start, stop = axslice.start, axslice.stop
                if axslice.start is not None:
                    start *= self.chunksize
                if axslice.stop is not None:
                    stop *= self.chunksize
                axslice = slice(start, stop, None)
            elif axslice is not None:
                # Block i covers elements [i*chunksize, (i+1)*chunksize).
                start = axslice * self.chunksize
                axslice = slice(start, start + self.chunksize)
            newslices.append(axslice)
        return tuple(newslices)
    def __getitem__(self, item):
        item = self._convert_slices(item)
        return self.data.__getitem__(item)
    def __setitem__(self, item, value):
        item = self._convert_slices(item)
        return self.data.__setitem__(item, value)
data = scipy.sparse.lil_matrix((20,20))
s = SparseBlocks(data)
s[0,0] = 1
print(s.data)
Now, whenever we modify s[whatever], it modifies the appropriate chunk of s.data. In other words, s[0,0] will return or set s.data[:5, :5], and so on.
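The difference between reading a block (a copy) and assigning to a block (a write into the original) can be checked with a short sketch like this one. It uses only lil_matrix slicing and slice assignment; the variable names are illustrative.

```python
import scipy.sparse

data = scipy.sparse.lil_matrix((20, 20))

block = data[:5, :5]      # slicing returns a new matrix, not a view
block[0, 0] = 7           # changes only the copy
unchanged = data[0, 0]    # the original entry is still zero

data[:5, :5] = 1          # slice assignment writes into the original
total = data.sum()        # 25 entries were set to 1

print(unchanged, total)
```

This is why the wrapper stores only the slice arithmetic and always goes back to the underlying matrix for writes, instead of caching the blocks themselves.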
