Member functions for numpy records - python

I have to use numpy arrays of records to save RAM and to have fast access, but I want to use member functions on those records. For example:
X=ones(3, dtype=dtype([('foo', int), ('bar', float)]))
X[1].incrementFooBar()
For an ordinary Python class, I can write:
class QQQ:
    ...
    def incrementFooBar(self):
        self.foo += 1
        self.bar += 1
X = [QQQ(), QQQ(), QQQ()]
X[1].incrementFooBar()
How can I make something like that, but for numpy records?

I may be wrong, but I don't think there is a way to use member functions on the records in the numpy array like that. Alternatively, you could very easily construct a function to accomplish the same thing:
X=ones(3, dtype=dtype([('foo', int), ('bar', float)]))
def incrementFooBar(X, index):
    X['foo'][index] += 1
    X['bar'][index] += 1
# then, instead of "X[1].incrementFooBar()":
incrementFooBar(X, 1)
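If method-call syntax matters, one option is a small proxy object that remembers the array and an index and is created only when needed, so the data itself stays in the compact record array. This is only a sketch: the `RecordView` class and its method name are hypothetical and not part of NumPy.

```python
import numpy as np

class RecordView:
    """Hypothetical proxy for one record of a structured array."""

    def __init__(self, array, index):
        self._array = array   # reference to the shared array, not a copy
        self._index = index

    def increment_foo_bar(self):
        # Field access returns a view, so this writes into the original array.
        self._array['foo'][self._index] += 1
        self._array['bar'][self._index] += 1

X = np.ones(3, dtype=[('foo', int), ('bar', float)])
RecordView(X, 1).increment_foo_bar()
print(X)
```

The proxy costs one small Python object per call, so it suits occasional per-record operations rather than tight loops.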

Related

Python: How to create an object with expandable numpy array members

I have the following use case:
I am receiving 100 samples per second of several numpy nd-arrays, say a of shape (4, 3), and b of shape (5, 6).
On other instances, I could be receiving c of shape (2, 3), and d of shape (3, 5) and e of some other shape, and so on.
These samples arrive for some variable time between a single sample and 360000 samples (an hour).
I would like to treat an entire streaming session as an object.
I would like to have something like
class Aggregator_ab(object):
    def __init__(self):
        self.a = ???
        self.b = ???
    def add_sample(self, new_a, new_b):
        self.a.update(new_a) # How can I achieve such an update?
        self.b.update(new_b) # How can I achieve such an update?
Now, I would like to access Aggregator_ab's fields as numpy arrays:
agg = Aggregator_ab() # maybe with parameters?
num_samples = 4
for i in range(num_samples):
    agg.add_sample(new_a=np.zeros((4, 3)), new_b=np.zeros((5, 6)))
assert agg.a.shape[0] == num_samples
assert agg.b.shape[0] == num_samples
assert agg.a.shape[1:] == (4, 3)
assert agg.b.shape[1:] == (5, 6)
And I would also expect regular numpy behavior of the members of agg.
My current code has some problems, and looks something like:
class Aggregator_ab(object):
    def __init__(self):
        self.__as = []
        self.__bs = []
    def add_sample(self, new_a, new_b):
        self.__as.append(new_a)
        self.__bs.append(new_b)
    @property
    def a(self):
        return np.vstack(self.__as)
    @property
    def b(self):
        return np.vstack(self.__bs)
problems:
can only get a "full" numpy array after using vstack
must use expensive vstack every "get"
can't utilize previous vstacks
adding any field requires a lot of boilerplate which I would like to extract
This only supports very limited use cases, and if I ever want something more, I would have to implement it myself.
Going through native Python lists is the only way to scale the array size without paying too much for resizing. Had I used vstack to keep a numpy array, at large sizes, I wouldn't be able to keep up with the frame rate.
This seems to me like a common use case, thus I believe someone has solved this before me.
Is there some library that does this? I know pandas sounds right, but what do I do if I have fields that are matrices?
If not, then how is this usually dealt with?
What about allocating an array that keeps growing in size? It works like vectors in most common languages:
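import numpy as np

class growing_array:
    def __init__(self, shape, growth):
        # Samples are stacked along the first axis; `growth` rows are preallocated.
        self._arr = np.empty(shape=(growth, *shape))
        self.incoming_shape = shape
        self.num_entries = 0
        self.growth = growth
    def append(self, incoming_arr):
        if self.num_entries == self._arr.shape[0]:
            # Buffer is full: allocate `growth` more rows and copy the old samples over.
            new_arr = np.empty(shape=(self._arr.shape[0] + self.growth, *self.incoming_shape))
            new_arr[:self.num_entries] = self._arr
            self._arr = new_arr
        self._arr[self.num_entries] = incoming_arr
        self.num_entries += 1
    def get(self):
        # A view of the filled part, with shape (num_entries, *shape).
        return self._arr[:self.num_entries]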
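The same preallocation idea can be extended to several named fields without per-field boilerplate. The following is a rough sketch under stated assumptions, not an established library API: the `Aggregator` class, its `initial_capacity` parameter, and the choice to double the capacity when full (instead of growing by a fixed step) are all illustrative.

```python
import numpy as np

class Aggregator:
    """Hypothetical helper: one preallocated, growable buffer per named field."""

    def __init__(self, initial_capacity=16, **shapes):
        self._capacity = initial_capacity
        self._count = 0
        # Samples are stacked along axis 0 of each buffer.
        self._buffers = {name: np.empty((initial_capacity, *shape))
                         for name, shape in shapes.items()}

    def add_sample(self, **samples):
        if self._count == self._capacity:
            # Doubling keeps the amortized cost of an append constant.
            self._capacity *= 2
            for name, old in list(self._buffers.items()):
                new = np.empty((self._capacity, *old.shape[1:]), dtype=old.dtype)
                new[:self._count] = old
                self._buffers[name] = new
        for name, sample in samples.items():
            self._buffers[name][self._count] = sample
        self._count += 1

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        buffers = self.__dict__.get('_buffers', {})
        if name in buffers:
            return buffers[name][:self._count]   # view of the filled rows
        raise AttributeError(name)

agg = Aggregator(initial_capacity=2, a=(4, 3), b=(5, 6))
for i in range(5):
    agg.add_sample(a=np.full((4, 3), i), b=np.zeros((5, 6)))
print(agg.a.shape, agg.b.shape)
```

Each attribute access returns a view of the filled rows, so no stacking is needed on read. A view obtained before a later growth step keeps pointing at the old buffer, which is worth keeping in mind when holding on to results.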

How to pass one same value as argument to two methods?

I have a simple class like the one below, where connection holds a DB connection:
import pandas as pd
from animal import kettle
class cat:
    def foo(connection):
        a = pd.read_sql('select * from zoo', connection)
        return1 = kettle.boo1(a)
        return2 = kettle.boo2(a)
        return return1, return2
Now I want to pass a to both boo1 and boo2 of kettle. Am I passing it the correct way in foo() above?
I thought the approach above was correct and tried it, but is this the right way to pass it?
animal.py:
class kettle:
    def boo1(return1):
        print(return1)
    def boo2(return2):
        print(return2)
Sorry if this doesn't make any sense; my intention is to pass a to both boo1 and boo2 of the kettle class.
This looks like the correct approach to me: by assigning the return value of pd.read_sql('select * from zoo', connection) to a first and then passing a to kettle.boo1 and kettle.boo2, you ensure that the potentially time-consuming database IO happens only once.
One thing to keep in mind with this design pattern when you are passing objects such as lists/dicts/dataframes is the question of whether kettle.boo1 changes the value that is in a. If it does, kettle.boo2 will receive the modified version of a as an input, which can lead to unexpected behavior.
A very minimal example is the following:
>>> def foo(x):
...     x[0] = 'b'
...
>>> x = ['a'] # define a list of length 1
>>> foo(x) # call a function that modifies the first element in x
>>> print(x) # the value in x has changed
['b']
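One way to guard against this is to hand the mutating function its own copy. A minimal sketch, using a plain list as a stand-in for the query result; the function bodies are invented for illustration. For a pandas DataFrame, `DataFrame.copy()` plays the same role.

```python
def boo1(data):
    data[0] = 'b'        # mutates the object it was given
    return data

def boo2(data):
    return data[0]

a = ['a']
result1 = boo1(list(a))  # boo1 works on a shallow copy
result2 = boo2(a)        # boo2 still sees the original contents
print(a, result1, result2)
```

Copying costs time and memory, so it only makes sense when a callee really does modify its argument.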
There are (many) possible solutions for your problem, whatever that might be. I assume you are just starting out with object-oriented programming in Python and are getting errors along the lines of
unbound method boo1() must be called with kettle instance as first argument
and probably want this solution:
Give your class methods an instance parameter:
def boo1(self, return1):
Instantiate the class kettle in cat.foo:
k = kettle()
Then use it like:
k.boo1(a)
Same for the boo2 method.
Also you probably want to:
return return1 # instead of or after print(return1)
as your methods return None at the moment.
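Put together, the steps above might look like the following sketch. The list `rows` is an assumed stand-in for the DataFrame returned by `pd.read_sql`, and the method bodies are invented purely so that something is returned.

```python
class kettle:
    def boo1(self, data):
        return len(data)      # placeholder computation

    def boo2(self, data):
        return data[0]        # placeholder computation

class cat:
    def foo(self, rows):
        k = kettle()          # instantiate once, reuse for both calls
        return1 = k.boo1(rows)
        return2 = k.boo2(rows)
        return return1, return2

rows = [('lion', 3), ('zebra', 7)]   # stand-in for the query result
result1, result2 = cat().foo(rows)
print(result1, result2)
```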

Implementing new data type in Python - without classes

I'm trying to implement a new data type "Fractions" in Python to represent fractions, where the numerator and denominator are both integers. Moreover, I have to implement the four basic arithmetic operations. The trick is, I can't use classes in this task.
I thought maybe tuples could be a good idea, but I really don't know how to approach this.
Is there an easy way to solve such a problem? Any hint would really help me.
You have two problems. 1) How to encapsulate the data, and 2) How to operate on the data.
First, let's solve encapsulation. Just put everything you need in a tuple:
half = (1,2)
whole = (1,1)
answer = (42,1)
See? The first item is the numerator, the second is the denominator.
Now you need a way to operate on the data. Since we can't use methods, we'll just use regular functions:
def mul(a,b):
    'Multiply two fractions'
    return (a[0]*b[0], a[1]*b[1])
Similarly, implement add(a,b), negate(a), sub(a,b), etc. You might need a simplify(), so you don't end up with 10240000/20480000 after a while.
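A possible sketch of those remaining functions, assuming fractions are `(numerator, denominator)` tuples with a nonzero denominator. Unlike the `mul` above, these versions simplify every result; that is a design choice, not a requirement.

```python
from math import gcd

def simplify(a):
    'Reduce a fraction to lowest terms with a positive denominator'
    num, denom = a
    if denom < 0:
        num, denom = -num, -denom
    g = gcd(num, denom)
    return (num // g, denom // g)

def mul(a, b):
    'Multiply two fractions'
    return simplify((a[0] * b[0], a[1] * b[1]))

def add(a, b):
    'Add two fractions via a common denominator'
    return simplify((a[0] * b[1] + b[0] * a[1], a[1] * b[1]))

def negate(a):
    'Flip the sign of a fraction'
    return (-a[0], a[1])

def sub(a, b):
    'Subtract b from a'
    return add(a, negate(b))

def div(a, b):
    'Divide a by b (b must be nonzero)'
    return mul(a, (b[1], b[0]))

half = (1, 2)
quarter = mul(half, half)
print(add(half, quarter), sub(half, quarter), div(half, quarter))
```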
To make our object-oriented-without-classes suite complete, we need a constructor:
def make_frac(num, denom):
    'Create a fraction with the indicated numerator and denominator'
    return (num, denom)
Finally, place all of these functions in a module, and your task is complete. The user of your library will write something like this:
import your_fraction_lib
half = your_fraction_lib.make_frac(1,2)
quarter = your_fraction_lib.mul(half, half)
three_quarters = your_fraction_lib.add(half, quarter)
If you want to troll your teacher, you could do something along the lines of:
def construct(values):
    def mul(other_fraction):
        new_numerator = values['numerator']*other_fraction['values']['numerator']
        new_denominator = values['denominator']*other_fraction['values']['denominator']
        new_values = {'numerator':new_numerator,'denominator':new_denominator}
        return(construct(new_values))
    return({'values':{'numerator':values['numerator'],'denominator':values['denominator']},'mul':mul})
This allows you to construct objects that contain a mul function that acts much like a class method:
x = construct({'numerator':1,'denominator':2})
y = construct({'numerator':3,'denominator':5})
product = x['mul'](y)
print(product['values']['numerator'],product['values']['denominator'])
>>3 10

Python ORM to NumPy arrays

I am building a data simulation framework with a numpy ORM, where it is much more convenient to work with classes and objects instead of numpy arrays directly. Nevertheless, the output of the simulation should be a numpy array. Also, bcolz is quite interesting as a backend here.
I would like to map all object attributes to numpy arrays, so that numpy arrays work like a column-oriented "persistent" storage for my classes. I also need to link "new" attributes, which I can calculate using the numpy (pandas) framework, to objects, and then link them to the objects using the same back-end.
Is there any solution for such an approach? Would you recommend any way to build it in an HPC way?
I have found only django-pandas. PyTables is quite slow on adding new columns-attributes.
Something like (working on pointers to np_array):
class Instance():
    def __init__(self, np_array, np_position):
        self.np_array = np_array
        self.np_position = np_position
    def get_test_property(self):
        return self.np_array[self.np_position]
    def set_test_property(self, value):
        self.np_array[self.np_position] = value
In fact there is a way to change NumPy or bcolz arrays by reference.
A simple example can be found in the following code.
import numpy as np

a = np.arange(10)

class Case():
    def __init__(self, gcv_pointer):
        # gcv_pointer is a one-element slice, i.e. a view into the shared array
        self.gcv = gcv_pointer
    def GetGCV(self):
        return self.gcv
    def SetGCV(self, value):
        # slice assignment writes through the view into the underlying array
        self.gcv[:] = value

#===============================================================================
# NumPy
#===============================================================================
caseList = []
for i in range(1, 10):
    case = Case(a[i-1:i])
    caseList.append(case)
gcvs = [case.GetGCV() for case in caseList]
caseList[1].SetGCV(5)
caseList[1].SetGCV(13)
caseList[1].gcv[:] = 6
setattr(caseList[1], 'dpd', a[5:6])
caseList[1].dpd
caseList[1].dpd[:] = 888
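The same write-through behaviour can be wrapped in properties, so that object attributes read from and write to column arrays directly. This is a sketch only: the `Instance` class, the `columns` dictionary, and the field name `gcv` are assumptions made for illustration.

```python
import numpy as np

class Instance:
    """Hypothetical object whose attributes live in shared column arrays."""

    def __init__(self, columns, position):
        self._columns = columns      # dict: column name -> 1-D array
        self._position = position    # row index owned by this object

    @property
    def gcv(self):
        return self._columns['gcv'][self._position]

    @gcv.setter
    def gcv(self, value):
        self._columns['gcv'][self._position] = value

columns = {'gcv': np.arange(10.0)}
instances = [Instance(columns, i) for i in range(10)]
instances[3].gcv = 42.0

# A "new" attribute is just another column computed with numpy.
columns['dpd'] = columns['gcv'] * 2
print(columns['gcv'][3], columns['dpd'][3])
```

Because every object holds only a reference and an index, the simulation output is already available as arrays, with no conversion step at the end.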

Numpy: Array of class instances

This might be a dumb question, but say I want to build a program from the bottom up like so:
class Atom(object):
    def __init__(self):
        '''
        Constructor
        '''
    def atom(self, foo, bar):
        #...with foo and bar being arrays of atom Params of lengths m & n
        "Do what atoms do"
        return atom_out
...I can put my instances in a dictionary:
class Molecule(Atom):
    def __init__(self):
    def structure(self, a, b):
        #a = 2D array of size (num_of_atoms, m); 'foo' Params for each atom
        #b = 2D array of size (num_of_atoms, n); 'bar' Params for each atom
        unit = self.atom()
        fake_array = {"atom1": unit(a[0], b[0]),
                      "atom2": unit(a[1], b[1]),
                      : : :
                      : : :}
    def chemicalBonds(self, this, that, theother):
        : : :
        : : :
My question is: is there a way to do this with numpy arrays so that each element in "real_array" would be an instance of atom, i.e., the output of the individual computations of the atom function? I can extend this to class Water(molecule):, which would perform fast numpy operations on the large structure and chemicalBonds outputs, hence the need for arrays... Or is it the case that I'm going about this the wrong way?
Also, if I am on the right track, I'd appreciate any tips on how to structure a "hierarchical program" like this, as I'm not sure I'm doing the above correctly and recently discovered that I don't know what I'm doing.
Thanks in advance.
The path to hell is paved with premature optimization... As a beginner in Python, focus on your program and what it is supposed to do; once it is doing it too slowly, you can ask focused questions about how to make it faster. I would stick with learning Python's intrinsic data structures for managing your objects. You can implement your algorithms using numpy arrays with standard data types if you are doing large array operations. Once you have some working code you can do performance testing to determine where you need optimization.
Numpy does allow you to create arrays of objects, and I will give you enough rope to hang yourself with below, but creating an ecosystem of tools to operate on those arrays of objects is not a trivial undertaking. You should first work with python data structures (buy Beazley's essential python reference), then with numpy's built in types, then creating your own compound numpy types. As a last resort, use the object type from the example below.
Good luck!
David
import numpy

class Atom(object):
    def atoms_method(self, foo, bar):
        #...with foo and bar being arrays of Params of length m & n
        atom_out = foo + bar
        return atom_out

# dtype=object stores references to arbitrary Python objects
array = numpy.ndarray((10,), dtype=object)
for i in range(10):
    array[i] = Atom()
for i in range(10):
    print(array[i].atoms_method(i, 5))
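For comparison, here is a sketch of the "compound numpy types" step recommended before resorting to object arrays. The field names `foo` and `bar` are borrowed from the question and the dtype is an assumption. The per-atom computation becomes one vectorized expression over all atoms.

```python
import numpy as np

# Hypothetical compound dtype: one record per atom.
atom_dtype = np.dtype([('foo', float), ('bar', float)])

atoms = np.zeros(10, dtype=atom_dtype)
atoms['foo'] = np.arange(10)
atoms['bar'] = 5

# Same result as calling atoms_method(i, 5) on each object, without a Python loop.
atom_out = atoms['foo'] + atoms['bar']
print(atom_out)
```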
