python numpy ndarray subclassing for offset changing - python

I am working on a framework for processing incoming data.
The data is received from a socket and added to a numpy array A (used as a buffer) by shifting, something like:
A[:-1] = A[1:]
A[-1] = value
The framework allows loading processing units as classes that have access to the incoming data through an array view pointing to A. Every time new data is received and stored in A, the method execute() is called:
def execute(self,):
    newSample = self.data[-1]
What is important is that new sample is always under index = -1.
A user can also create his own array views in __init__ function:
def __init__(self,):
    self.myData = self.data[-4:] # view that contains last 4 samples
Everything works nicely when I am shifting array A and adding new value at the end. However, for offline testing, I want to load all the data at the start of the framework and run everything else as before (i.e. the same classes implementing data processing).
Of course, I can again create the buffer A as an array of zeros and shift new values into it. However, this copies data between two arrays, which is unnecessary and costs time and memory.
What I was thinking about is to provide a way to change the boundaries of the numpy array or to change the A.data pointer. However, every approach I tried is either not allowed or triggers a warning.
Finally, I am trying to change an internal offset of array A, so that I can advance it and thus make more data available for algorithms. What is important, self.data[-1] has to always point to the newly appeared sample and standard numpy array API should be used.
I have subclassed np.ndarray:
class MyArrayView(np.ndarray):
    def __new__(cls, input_array):
        obj = np.asarray(input_array).view(cls)
        # add the new attribute to the created instance
        obj._offset = 0
        # Finally, we must return the newly created object:
        return obj
    def __array_finalize__(self, obj):
        if obj is None:
            return
        self._offset = getattr(obj, '_offset', None)
    def advance_index(self):
        self._offset += 1
    def __str__(self):
        return super(MyArrayView, self[:]).__str__()
    def __repr__(self):
        return super(MyArrayView, self[:]).__repr__()
    def __getitem__(self, idx):
        if isinstance(idx, slice):
            start = 0
            stop = self._offset
            step = idx.step
            idx = slice(start, stop, step)
        else:
            idx = self._offset + idx
        return super(MyArrayView, self).__getitem__(idx)
that allows me to do the following:
a = np.array([1,2,3,4,5,6,7,8,9,10])
myA = MyArrayView(a)
b = myA
print("b :", b)
for i in range(1,5):
    myA.advance_index()
    print(b[:], b[-1])
print("b :", b)
print("b + 10 :", b + 10)
print("b[:] + 20 :", b[:] + 20)
and gives following output:
b : []
[1] 1
[1 2] 2
[1 2 3] 3
[1 2 3 4] 4
b : [1 2 3 4]
b + 10 : [11 12 13 14]
b[:] + 20 : [21 22 23 24]
So far so good. However, if I check the shape:
print("shape", b[:].shape) # shape (4,)
print("shape", b.shape) # shape (10,)
it is different in those two cases. I have tried to change it using shape=(self.internalIndex,), but that only leads to an error message.
Is this the right approach, one that only requires overriding more methods of np.ndarray? Or should I abandon this solution completely and fall back to shifting the array with each new sample? Or can this be achieved with the standard np.ndarray implementation, given that I need to use the standard numpy API?
I also tried this:
a = np.array([1,2,3,4,5,6,7,8,9,10])
b = a.view()[5:]
print(a.data) # <memory at 0x7f09e01d8f48>
print(b.data) # <memory at 0x7f09e01d8f48> They point to the same memory start!
print(np.byte_bounds(a)) # (50237824, 50237904)
print(np.byte_bounds(b)) # (50237864, 50237904) but the byte_bounds are different
With this in mind, I would say I need to create a view of array a and extend it (or at least move it like a window over a).
However, none of my attempts to change the byte_bounds had any effect.

I admire your bravery, but I am quite sure that subclassing numpy arrays is overkill for your problem and can cause you a lot of headaches. In the end it might cause a performance hit somewhere that far outweighs the array copying you are trying to avoid.
Why not make the slice (i.e. [-4:] or slice(-4, None)) a parameter to your __init__ function or a class attribute and override that in your test?
def __init__(self, lastfour=slice(-4, None)):
    self.myData = self.data[lastfour]
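For the offline case, one possible way to avoid both the subclass and the copy is to step through the pre-loaded data with read-only window views. The sketch below assumes NumPy 1.20 or later (for `sliding_window_view`); the names `recorded` and `window_size` are illustrative and not part of the framework in the question. Each window is a view into the loaded array, and its last element is the newest sample, mirroring `self.data[-1]` in the online case.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Stand-in for data loaded up front in offline mode (hypothetical values).
recorded = np.arange(1, 11)
window_size = 4  # assumed buffer length

# Shape (7, 4): row k is a view of recorded[k:k + window_size], no data copied.
windows = sliding_window_view(recorded, window_size)

newest = []
for window in windows:
    # The newest sample sits at index -1, as in the online buffer.
    newest.append(int(window[-1]))

shares = np.shares_memory(windows, recorded)
print(newest)
print(shares)
```

With this approach a processing unit would receive a fresh view on every step rather than holding one fixed view, so views derived in `__init__` would need to be re-derived per step, for example from a stored slice as suggested above.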

Related

Possible to change the original variables in an array?

Consider this (uncontroversial) simple example:
allvars = []
a = 1
allvars.append(a)
b = 2
allvars.append(b)
c = 3
allvars.append(c)
d = 4
allvars.append(d)
e = 5
allvars.append(e)
for ix in allvars:
    ix = ix + 1 # changes local variable ix, but not array elements
print(allvars) # [1, 2, 3, 4, 5]
for i, ix in enumerate(allvars):
    allvars[i] = ix + 1 # changes array elements ...
print(allvars) # [2, 3, 4, 5, 6]
# ... but not original variables
print(a,b,c,d,e) # 1 2 3 4 5
Even if we had some variables "stored" into a Python list - changing that list did not change the original variables.
It is clear why this happens, if we recall that Python in fact stores pointers (as I read somewhere, "python has names, not variables"):
when we do a = 1; a points to the address of the int object 1
allvars[0], which is where we thought we stored a, also gets the address of the int object 1
In allvars[0] = allvars[0]+1, the final allvars[0] gets the address of the resulting int object, 2
however, that doesn't change the fact that a still points to the int object 1
The thing is, I have a situation where I have to manage a bunch of variables (like a, b ... above) separately. In some parts of the code, though, it would be more straightforward to run a loop over all of them. The variables themselves would have to be updated, because after the phase where the loops are useful, I have remaining processing to do on the variables (a, b etc.) individually, and that processing expects the updated values.
So is there some (not too convoluted) way in Python, to put variables (or maybe, variable names) in a list/array, and then iterate over that array - and change the original variable (names)?
In terms of above example, I'd want to do something like this pseudocode:
...
for i, ix in enumerate(allvars):
    change_originals("allvars[i] = ix + 1")
print(a,b,c,d,e) # this should be 2, 3, 4, 5, 6
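Here you have created a list of int objects, which are immutable. Assigning a new value to a list element rebinds that element to a different object, so the modification is not reflected in the original variables.
There are several possible solutions, depending on your requirements. One is to wrap each value in a mutable object: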
class Val:
    def __init__(self, val = -1):
        self._val = val
    def get_val(self):
        return self._val
    # setter method
    def set_val(self, x):
        self._val = x
allvars = []
one = Val(1)
allvars.append(one)
print(allvars[0]._val)
one.set_val(2)
print(allvars[0]._val)
You can also use a dictionary with keys [1, 2, 3, 4, ...].
Or you can create a list of such objects.
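The dictionary option mentioned above could look like the following minimal sketch. It uses the variable names as keys (an assumption; numeric keys work the same way) and unpacks the values back into individual names once the loop phase is over.

```python
# Keep the values in one dict instead of five separate names.
values = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}

# Updating existing keys while iterating is fine; the dict size does not change.
for name in values:
    values[name] += 1

# Unpack for the later per-variable processing.
a, b, c, d, e = (values[name] for name in "abcde")
print(a, b, c, d, e)
```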
One way I can think of is to store the variable names as strings in a list, then use the exec function. It accepts a string argument, executes that string as Python code, and always returns None. Note that this rebinds names reliably only at module level; inside a function, exec cannot change local variables. So:
# Where allvars contains string names of variables
...
for i, ix in enumerate(allvars):
    exec(f"{allvars[i]} = {ix} + 1")
Another way would use the locals() function, which gives you a dictionary with the names and values of the variables in the current scope. Writing to that dictionary changes the variables only at module level (where it is the same dictionary as globals()); inside a function the changes are not applied to the local variables:
# using locals() function
# Make a list of variable names
allvars_strings = ['a','b','c','d','e']
# Slightly simpler
for i in allvars_strings:
    locals()[i] = locals()[i] + 1
string = ''
print('\n\n')
for i in allvars_strings:
    string += str(locals()[i]) + ', '
print(string[:-2])
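Since both exec and locals() depend on the scope they run in, a scope-independent variation on the "names as strings" idea is to keep the variables as attributes of one object and update them with getattr and setattr. This is only a sketch of that alternative, not what the answers above propose; the name `v` is illustrative.

```python
from types import SimpleNamespace

# The "variables" live as attributes of a single namespace object.
v = SimpleNamespace(a=1, b=2, c=3, d=4, e=5)

# Loop phase: address each variable by its name as a string.
for name in ["a", "b", "c", "d", "e"]:
    setattr(v, name, getattr(v, name) + 1)

# Individual phase: the updated values are available as v.a, v.b, ...
print(v.a, v.b, v.c, v.d, v.e)
```

This behaves the same at module level and inside functions, at the cost of writing `v.a` instead of `a`.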

Making a class process A 2 by 2 Matrix and having issues return it through __str__

I have to process a 2 by 2 matrix through a class and print the output via __str__. I can't really create new functions, and I'm pretty sure the math for the matrices is good; I'm just having some output issues. I marked the part of the code (specifically, the output section) that I cannot modify, but I can modify the class to support it.
Here's my code.
# This code aims to take a 2 by 2 matrix and add, subtract, and multiply it by another matrix, as well as inverse and power it.
# ----------------------------------------------------------------------------------------------------------------------
# This is how we'll use math.nan and only math.nan
import math
# Your classes should go here
class Matrix2x2: # Just initializing as needed.
    def __init__(self,a,b,c,d):
        self.a = a
        self.b = b
        self.c = c
        self.d = d
    def __add__(self,second):
        return(Matrix2x2(self.a+second.a,self.b+second.b,self.c+second.c,self.d+second.d))
    def __sub__(self, second): # Just subtracting instead of adding
        return(Matrix2x2(self.a - second.a,self.b-second.b,self.c-second.c,self.d-second.d))
    def __mul__(self, second): # Multiplying them based on the according spot and some addition.
        return(Matrix2x2(self.a*second.a+self.b*second.c,self.a*second.b+self.b*second.d,self.c*second.a+self.d*second.c,self.c*second.b+self.d*second.d))
    def __pow__(self, power): # Process varies based on what we work with.
        StoredMatrix = Matrix2x2(self.a, self.b, self.c, self.d) # The variables just save information and make the math more clean.
        determinant = 1/((self.a*self.d)-(self.b*self.c)) # Used to simplify inversing and determine if there is an inverse.
        InverseMatrix = Matrix2x2(self.d*determinant,-self.b*determinant,-self.c*determinant, self.a*determinant)
        if power > 0:
            count = 1
            while count < power: # The original matrix is repeatedly multiplied and stored until it matches the power value.
                count+=1
                StoredMatrix *= Matrix2x2(self.a, self.b, self.c, self.d)
            return StoredMatrix
        elif power < 0:
            count = 0
            while count < power:
                count+=1
                InverseMatrix *= Matrix2x2(self.d*determinant,-self.b*determinant,-self.c*determinant,self.a*determinant)
            return InverseMatrix
        if determinant == 0 or power == 0: # This means that there is no inverse, or the power value is 0 and invalid.
            return(Matrix2x2(math.nan, math.nan, math.nan, math.nan))
    def __str__(self):
        return print('[',str(self.a) ,str(self.b) ,']\n' ,'\b[' ,str(self.c) ,str(self.d),']')
# Do NOT use any pre-built packages to perform the below operations, each should
# be coded using regular mathematics operation (+,-,*,/), no numpy or math functions other
# than math.nan
# Code below cannot be modified
A = Matrix2x2(1,2,3,4)
B = Matrix2x2(4,3,2,1)
print('Addition: A+B')
print(A,"+\n",B,"=\n",A+B,sep="")
input(),print('Subtraction: A-B')
print(A,"-\n",B,"=\n",A-B,sep="")
input(),print('Multiplication: A*B')
print(A,"*\n",B,"=\n",A*B,sep="")
input(),print('Multiplication: B*A')
print(B,"*\n",A,"=\n",B*A,sep="")
input(),print('Powers: A^3 ')
print(A,"^3","\n=\n",A**3,sep="")
input(),print('Inverse: A^-1 ')
print(A,"^-1","\n=\n",A**(-1),sep="")
input(),print('Inverse with powers: A^-3 = (A^-1)^3')
print(A,"^-3","\n=\n",A**(-3),sep="")
# code above cannot be modified
# Just for testing, below.
print(A.__add__(B))
print(A.__sub__(B))
print(A.__mul__(B))
print(A.__pow__(3))
print(A.__pow__(-1))
print(A.__pow__(0))
print(A.__pow(-3))
I usually get an error due to a NoneType in the addition step. This stops the program, so I can't see what other errors I would get. I tried converting the values to strings individually using str() and got the same error. I don't think it's the math.nan values either.
Here is an example:
Addition: A+B
[ 1 2 ]
[ 3 4 ]
Traceback (most recent call last):
File "ThisWasPurposelyCensored", line 51, in <module>
print(A,"+\n",B,"=\n",A+B,sep="")
TypeError: __str__ returned non-string (type NoneType)
Process finished with exit code 1
Anyway, how do I avoid that NoneType issue or make this compatible with str, without interfering too much with the math and the required input? I'll provide any more information you might need to help me solve this.
__str__ must return a string, but print() writes to the console and returns None, which causes the TypeError. Rewrite your __str__ method like this:
def __str__(self):
    return '[ {} {} ]\n[ {} {} ]'.format(self.a, self.b, self.c, self.d)
A bit shorter:
def __str__(self):
    return '[ {x.a} {x.b} ]\n[ {x.c} {x.d} ]'.format(x=self)
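To see the fix in isolation, here is a minimal, self-contained sketch. The class name `Mat2` is a stand-in for the asker's `Matrix2x2` and only implements addition; the point is that `__str__` builds and returns a string, which lets `print` combine several matrices in one call.

```python
class Mat2:
    """Reduced stand-in for Matrix2x2 (illustrative only)."""

    def __init__(self, a, b, c, d):
        self.a, self.b, self.c, self.d = a, b, c, d

    def __add__(self, other):
        return Mat2(self.a + other.a, self.b + other.b,
                    self.c + other.c, self.d + other.d)

    def __str__(self):
        # Return the text; never call print() here, because print() returns None.
        return f"[ {self.a} {self.b} ]\n[ {self.c} {self.d} ]"


A = Mat2(1, 2, 3, 4)
B = Mat2(4, 3, 2, 1)
text = str(A + B)

# Same call shape as the unmodifiable test code in the question.
print(A, "+\n", B, "=\n", A + B, sep="")
```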

In an instance of a class, can you use a value returned from a one method, as a parameter for calling another method

class A:
    def __init__(self, matrix=[]):
        self.matrix = matrix
    def dimension(self):
        return len(self.matrix), len(self.matrix[0])
    def reduce_matrix(self, i, j):
        temp = self.matrix[:i-1] + self.matrix[i:]
        M = A([row[:j-1] + row[j:] for row in temp])
        return M
    def determinant(self):
        (nrows, ncols) = self.dimension()
        if nrows != ncols:
            return ("Cannot find determinant of non-square matrix.")
        elif nrows == 1:
            return self.matrix[0][0]
        else:
            M = A(sum([ ((-1)**j) * self.matrix[0][j] * self.reduce_matrix(1,
                j+1).determinant() for j in range(ncols) ]))
            return M
Hello, I just started OOP so I have a question regarding using the returned value of a method in place of the "instance name" that is passed as "self" for some other method.
I included the rest of the code that just makes the calculations for a matrix problem, but I'm only focusing on the "self.reduce_matrix(1, j+1)).determinant()" bit of it. I need to take a matrix (which is a list of lists ex. [[1,2,3],[4,5,6]] ), and perform the "reduce_matrix" method 1 time per column in the matrix, and each time, the "determinant(self)" method passes in the value returned from the other method. When I run it, it says that "list object has no attribute "determinant"" because I can't pass in "self" to the determinant method like I usually do --> "self.determinant(self)"
Any suggestions will be very appreciated
In reduce_matrix:
create M as M = A([row[:j-1] + row[j:] for row in temp])
do not return M.matrix but M itself.
That said, numpy is available for all kinds of matrix operations and more.
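A minimal sketch of the pattern, under some assumptions of my own: the class is renamed `Matrix`, the reduction method is called `minor`, the determinant is returned as a plain number rather than wrapped in a new instance, and a non-square matrix raises `ValueError` instead of returning a message string. Because `minor` returns an instance of the class, the result can call `.determinant()` directly.

```python
class Matrix:
    def __init__(self, rows=None):
        # Avoid a shared mutable default argument.
        self.rows = [] if rows is None else rows

    def dimension(self):
        return len(self.rows), len(self.rows[0])

    def minor(self, i, j):
        """Return a new Matrix without row i and column j (both 1-based)."""
        kept = self.rows[:i - 1] + self.rows[i:]
        return Matrix([row[:j - 1] + row[j:] for row in kept])

    def determinant(self):
        nrows, ncols = self.dimension()
        if nrows != ncols:
            raise ValueError("Cannot find determinant of non-square matrix.")
        if nrows == 1:
            return self.rows[0][0]
        # Cofactor expansion along the first row; each minor is a Matrix,
        # so calling .determinant() on it works recursively.
        return sum((-1) ** j * self.rows[0][j] * self.minor(1, j + 1).determinant()
                   for j in range(ncols))


det2 = Matrix([[1, 2], [3, 4]]).determinant()
det3 = Matrix([[1, 2, 3], [4, 5, 6], [7, 8, 10]]).determinant()
print(det2, det3)
```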

In NumPy, how do I set array b's data to reference the data of array a?

Say I have ndarray a and b of compatible type and shape. I now wish for the data of b to be referring to the data of a. That is, without changing the array b object itself or creating a new one. (Imagine that b is actually an object of a class derived from ndarray and I wish to set its data reference after construction.) In the following example, how do I perform the b.set_data_reference?
import numpy as np
a = np.array([1,2,3])
b = np.empty_like(a)
b.set_data_reference(a)
This would result in b[0] == 1, and setting operations in one array would affect the other array. E.g. if we set a[1] = 22 then we can inspect that b[1] == 22.
N.B.: In case I controlled the creation of array b, I am aware that I could have created it like
b = np.array(a, copy=False)
This is, however, not the case.
NumPy does not support this operation. If you controlled the creation of b, you might be able to create it in such a way that it uses a's data buffer, but after b is created, you can't swap its buffer out for a's.
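If the creation of b can be controlled, one way to make it share a's buffer is to create it as a view. The sketch below also covers the subclass case mentioned in the question; `Tagged` is a hypothetical ndarray subclass used only for illustration.

```python
import numpy as np


class Tagged(np.ndarray):
    """Hypothetical ndarray subclass with no extra behaviour."""


a = np.array([1, 2, 3])

# A view is a new array object that uses the same data buffer as a.
b = a.view(Tagged)

a[1] = 22   # visible through b
b[2] = 33   # visible through a

shared = np.shares_memory(a, b)
print(a, b, shared)
```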
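Every variable in Python is a reference, so you can use = directly, as follows. Note that this rebinds the name b to the same array object as a; it does not change the array that b referred to before.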
import numpy as np
a = np.array([1,2,3])
b = a
You can check that b refers to a as follows:
assert a[1] == b[1]
a[1] = 4
assert a[1] == b[1]
Usually, when functions are not always supposed to create their own buffer, they implement an interface like
import numpy
def func(a, b, c, out=None):
    if out is None:
        # allocate a result buffer only when the caller did not supply one
        out = numpy.empty_like(a)
    # ...
    return out
That way the caller can control whether an existing buffer is used or not.

High memory usage in python

The following simple python code:
class Node:
    NumberOfNodes = 0
    def __init__(self):
        Node.NumberOfNodes += 1
if __name__ == '__main__':
    nodes = []
    for i in xrange(1, 7 * 1000 * 1000):
        if i % 1000 == 0:
            print i
        nodes.append(Node())
takes gigabytes of memory, which seems unreasonable to me. Is that normal in Python?
How could I fix that? (In my original code, I have about 7 million objects, each with 10 fields, and that takes 8 gigabytes of RAM.)
If you have a fixed number of fields, then you can use __slots__ to save quite a lot of memory. Note that __slots__ has some limitations, so make sure you read the Notes on using __slots__ carefully before choosing to use it in your application:
>>> import sys
>>> class Node(object):
...     NumberOfNodes = 0
...     def __init__(self):
...         Node.NumberOfNodes += 1
...
>>> n = Node()
>>> sys.getsizeof(n)
64
>>> class Node(object):
...     __slots__ = ()
...     NumberOfNodes = 0
...     def __init__(self):
...         Node.NumberOfNodes += 1
...
>>> n = Node()
>>> sys.getsizeof(n)
16
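The session above uses a class without instance fields. For objects that do carry fields, a comparison could look like the following sketch (Python 3; the class names and the two fields `x` and `y` are illustrative). The exact byte counts depend on the interpreter version and platform, so the code prints them without assuming specific values.

```python
import sys


class PlainNode:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlotNode:
    # Declares the complete set of instance attributes; no per-instance __dict__.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


plain = PlainNode(1, 2)
slotted = SlotNode(1, 2)

# A plain instance pays for the object itself plus its attribute dictionary.
plain_size = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slot_size = sys.getsizeof(slotted)
print(plain_size, slot_size)

# One of the limitations: attributes not listed in __slots__ are rejected.
try:
    slotted.z = 3
    rejected = False
except AttributeError:
    rejected = True
print(rejected)
```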
Python is an inherently memory heavy programming language. There are some ways you can get around this. __slots__ is one way. Another, more extreme approach is to use numpy to store your data. You can use numpy to create a structured array or record -- a complex data type that uses minimal memory, but suffers a substantial loss of functionality compared to a normal python class. That is, you are working with the numpy array class, rather than your own class -- you cannot define your own methods on your array.
import numpy as np
# data type for a record with three 32-bit ints called x, y and z
dtype = [(name, np.int32) for name in 'xyz']
arr = np.zeros(1000, dtype=dtype)
# access member of x of a record
arr[0]['x'] = 1 # name based access
# or
assert arr[0][0] == 1 # index based access
# accessing all x members of records in array
assert arr['x'].sum() == 1
# size of array used to store elements in memory
assert arr.nbytes == 12000 # 1000 elements * 3 members * 4 bytes per int
See more here.
