Shifting elements in an array by a non-integer in Python

I wanted to know if there is a way to somehow shift an array by a non integer value in python. Let's say I have a matrix T[i,j] and I want to interpolate the value of T[1.3,4.5] by using T[1,4], T[1,5], T[2,4] and T[2,5]. Is there a simple and fast way to do that?
I have been stuck trying to use scipy.ndimage.shift() for the past few hours but I couldn't understand how to make it work.

EDIT: It seems this can be accomplished with scipy.interpolate.interp2d, though interp2d serves a more general purpose and may therefore be slower in this case.
You'd want to do something like iT = interp2d(range(T.shape[0]), range(T.shape[1]), T.T). Note that the .T at the end stands for transpose and has nothing to do with the array being called T. iT can then be called like a function, for instance print(iT(1.1, 2.3)).
It returns arrays rather than single values, which suggests that you can also pass arrays as arguments, i.e. evaluate the interpolation at several points at once.
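For reference, a minimal self-contained version of that recipe (the toy array and the evaluation point are mine; also note that interp2d has been deprecated in recent SciPy releases in favour of RegularGridInterpolator):
import numpy as np
from scipy.interpolate import interp2d

T = np.arange(12, dtype=float).reshape(3, 4)   # toy data: T[i, j] = 4*i + j
iT = interp2d(range(T.shape[0]), range(T.shape[1]), T.T)

print(iT(1.3, 2.5))   # ~7.7, the bilinearly interpolated T[1.3, 2.5]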
I am not aware of any standard way to do this. I'd accomplish it by simply wrapping the array in an instance of some kind of "interpolated array" class. For simplicity, let's start by assuming T is a 1D array:
class InterpolatedArray:
    def __init__(self, array):
        self.array = array

    def __getitem__(self, i):
        i0 = int(i)
        i1 = i0 + 1
        f = i - i0
        return self.array[i0] * (1 - f) + self.array[i1] * f
As you can see, it overloads the subscripting routine, so that when you attempt to access (for instance) array[1.1], this returns 0.9*array[1]+0.1*array[2]. You'd have to explicitly build this object from your previous array:
iT = InterpolatedArray(T)
print(iT[1.1])
As for the two-dimensional case, it works the same, just with a little bit more work on the indexing:
class InterpolatedMatrix:
    def __init__(self, array):
        self.array = array

    def __getitem__(self, index):
        # indexing with iM[i, j] passes the pair as a single tuple
        i, j = index
        i0 = int(i)
        i1 = i0 + 1
        fi = i - i0
        j0 = int(j)
        j1 = j0 + 1
        fj = j - j0
        return self.array[i0, j0] * (1 - fi) * (1 - fj) \
            + self.array[i1, j0] * fi * (1 - fj) \
            + self.array[i0, j1] * (1 - fi) * fj \
            + self.array[i1, j1] * fi * fj
You can probably rewrite some of that code to reduce the number of operations performed.
Note also that if for some reason you want to access every index in the image with some small fixed offset (i.e. T[i+0.3, j+0.5] for every i, j), then it would be better to do this with vectorization, using numpy or scipy.
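For instance, a minimal sketch of that vectorized route using scipy.ndimage.shift, the function mentioned in the question (the toy array and the offsets are mine):
import numpy as np
from scipy.ndimage import shift

T = np.arange(12, dtype=float).reshape(3, 4)

# A shift of (-0.3, -0.5) makes shifted[i, j] correspond to T[i + 0.3, j + 0.5];
# order=1 gives bilinear interpolation, and values near the edges depend on mode.
shifted = shift(T, (-0.3, -0.5), order=1)
print(shifted)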

Related

Numpy.empty() creating array with non-empty values

I'm currently taking some classes on algorithms and data structures and using Python to implement some of the stuff I've been studying.
At the moment I'm implementing a Stack based on a fixed-size array. Given the particularities of python I opted to use numpy.empty().
For a test I've written I'm basically pushing 9 elements into the stack. Up to that point everything is ok because the resulting array has the 9 elements plus space for another 7.
I started popping elements out and when I reach the critical point of just having 4 elements in an array, I expect the array to copy the elements into a new array of size 8.
The thing is that when I create this new array, instead of being created with empty values it is already populated.
Here is an image of my terminal at that specific step when debugging with PDB.
Is there anything I'm missing?
EDIT: It seems that if I use Python 3 everything works as expected; this only happens with Python 2.
import numpy


class StackV2(object):
    """
    This is the Stack version based on fixed-size arrays
    """
    def __init__(self):
        self.array = numpy.empty(1, dtype=str)
        self.size = 0

    def push(self, value):
        self.array[self.size] = value
        self.size += 1
        if len(self.array) == self.size:
            self._resize_array(len(self.array) * 2)

    def pop(self):
        self.array[self.size - 1] = ""
        self.size -= 1
        if len(self.array) == (4 * self.size):
            self._resize_array(len(self.array) // 2)

    def _resize_array(self, factor):
        new_array = numpy.empty(factor, dtype=str)
        print(new_array)
        index = 0
        for i in range(0, self.size):
            new_array[index] = self.array[i]
            index += 1
        self.array = new_array
Short answer
Use numpy.zeros instead of numpy.empty to get rid of the surprise garbage values in your new arrays.
Details
The arrays created by numpy.zeros have all of their elements initialized to a "zero value". For arrays with dtype=str, this will be the empty string ''.
From the Numpy docs:
Notes
empty, unlike zeros, does not set the array values to zero, and may therefore be marginally faster. On the other hand, it requires the user to manually set all the values in the array, and should be used with caution.
The fact that it works in Python 3 (but not Python 2) is undefined behavior. Basically, it's a quirk of the implementation which the Numpy developers didn't plan. The best practice is to not rely on such things in your code. As you've seen, the outcome of an undefined behavior is not guaranteed to be consistent across versions, implementations, different computers that you run your code on, etc.
Also, it sounds like you might be a little bit confused about how Numpy arrays work. A numpy array starts off at a fixed size when you create it. This is unlike a normal Python list [], which grows dynamically as you add values to it.
Also, you don't need both index and i in _resize_array. Just use one or the other, like this:
for i in range(self.size):
    new_array[i] = self.array[i]
Aside from that your code is fine.
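Putting both points together, a minimal sketch of how _resize_array could look with numpy.zeros (I've renamed the factor parameter to new_size, since it is really a size):
def _resize_array(self, new_size):
    # zeros initializes every slot to '' for dtype=str, so the fresh array
    # contains no leftover garbage values
    new_array = numpy.zeros(new_size, dtype=str)
    for i in range(self.size):
        new_array[i] = self.array[i]
    self.array = new_array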

Defining a matrix with unknown size in python

I want to use a matrix in my Python code but I don't know the exact size of my matrix to define it.
For other matrices, I have used np.zeros(a), where a is known.
What should I do to define a matrix with unknown size?
In this case, one approach is to use a Python list, append to it until it has the desired size, and then convert it to a numpy array.
pseudocode:
matrix = []
while matrix not full:
    matrix.append(elt)
matrix = np.array(matrix)
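For example, a minimal concrete version of that sketch (the loop and the row contents are just placeholders):
import numpy as np

rows = []
for k in range(5):                   # stand-in for "while matrix not full"
    rows.append([k, k + 1, k + 2])   # every appended row must have the same length

matrix = np.array(rows)              # shape becomes (5, 3) once the size is known
print(matrix.shape)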
You could write a function that tries to modify the np.array and expands it when it encounters an IndexError:
x = np.random.normal(size=(2, 2))
val = 1.0   # example value to store
r, c = (5, 10)
try:
    x[r, c] = val
except IndexError:
    r0, c0 = x.shape
    r_ = r + 1 - r0
    c_ = c + 1 - c0
    if r_ > 0:
        x = np.concatenate([x, np.zeros((r_, x.shape[1]))], axis=0)
    if c_ > 0:
        x = np.concatenate([x, np.zeros((x.shape[0], c_))], axis=1)
    x[r, c] = val   # retry the assignment on the expanded array
There are problems with this implementation though: first, it makes a copy of the array and returns a concatenation, which can become a bottleneck if you do it many times. Second, the code I provided only works if you're modifying a single element. You could extend it to handle slices, which would take more effort; or you can go the whole nine yards and create a new class that inherits from np.ndarray and overrides the __getitem__ and __setitem__ methods.
Or you could just use a huge matrix, or better yet, see if you can avoid having to work with matrices of unknown size.
If you have a python generator you can use np.fromiter:
def gen():
    yield 1
    yield 2
    yield 3
In [11]: np.fromiter(gen(), dtype='int64')
Out[11]: array([1, 2, 3])
Beware: if you pass an infinite iterator, you will most likely crash Python, so it's often a good idea to cap the length (with the count argument):
In [21]: from itertools import count # an infinite iterator
In [22]: np.fromiter(count(), dtype='int64', count=3)
Out[22]: array([0, 1, 2])
Best practice is usually to either pre-allocate (if you know the size) or build the array as a list first (using list.append). But lists don't build in 2d very well, which I assume you want since you specified a "matrix."
In that case, I'd suggest pre-allocating an oversize scipy.sparse matrix. These can be defined to have a size much larger than your memory, and lil_matrix or dok_matrix can be built sequentially. Then you can pare it down once you enter all of your data.
from scipy.sparse import dok_matrix

dummy = dok_matrix((1000000, 1000000))   # as big as you think you might need
for i, j, data in generator():
    dummy[i, j] = data

s = np.array(list(dummy.keys())).max() + 1
M = dummy.tocsr()[:s, :s]   # or toarray(), tobsr(), etc.
This way you build your array as a Dictionary of Keys (dictionaries support dynamic assignment much better than ndarray does), but you still get a matrix-like object that can be used (somewhat) efficiently for math, even in a partially built state.

Use a function like a numpy array

I'm dealing with a big array D with which I'm running into memory problems. However, the entries of that big array are in fact just copies of elements of a much smaller array B. Now my idea would be to use something like a "dynamic view" into B instead of constructing the full D. For example, is it possible to use a function D_fun like an array, which then reads the correct element of B? I.e. something like
def D_fun(B, I, J):
    i = convert_i_idx(I, J)
    j = convert_j_idx(I, J)
    return B[i, j]
And then I could use D_fun to do some matrix and vector multiplications.
Of course, anything else that would keep me from copying the elements of B repeatedly into a huge matrix would be appreciated.
Edit: I realized that if I invest some time in my other code I can get the matrix D to be a block matrix with the Bs on the diagonal and zeros otherwise.
This is usually done by subclassing numpy.ndarray and overloading __getitem__, __setitem__, __delitem__
(array-like access via []) to remap the indices like D_fun(..) does. Still, I am not sure whether this will work in combination with the parts of numpy that are implemented in C.
Some concerns:
When you're doing calculations on your big matrix D via the small matrix B, numpy might create a copy of D with its real dimensions, thus using more space than wanted.
If several (I1,J1), (I2,J2), ... are mapped to the same (i,j), then D[I1,J1] = newValue will also set D[I2,J2] to newValue.
np.dot uses compiled libraries to perform fast matrix products. That constrains the data type (integer, floats), and requires that the data be contiguous. I'd suggest studying this recent question about large dot products, numpy: efficient, large dot products
Defining a class with a custom __getitem__ is a way of accessing an object with indexing syntax. Look in numpy/lib/index_tricks.py for some interesting examples of this, np.mgrid, np.r_, np.s_ etc. But this is largely a syntax enhancement. It doesn't avoid the issues of defining a robust and efficient mapping between your D and B.
And before trying to do much with subclassing ndarray take a look at the implementation for np.matrix or np.ma. scipy.sparse also creates classes that behave like ndarray in many ways, but does not subclass ndarray.
In your D_fun, are I and J scalars? If so, this conversion would be horribly inefficient. It would be better if they could be arrays, lists or slices (anything that B[atuple] implements), but that can be a lot of work.
def D_fun(B, I, J):
    i = convert_i_idx(I, J)
    j = convert_j_idx(I, J)
    return B[i, j]


def __getitem__(self, atuple):
    # sketch of a __getitem__ version of your function
    I, J = atuple
    # <select B based on I, J?>
    i = convert_i_idx(I, J)
    j = convert_j_idx(I, J)
    return B.__getitem__((i, j))
What is the mapping from D to B like? The simplest, and most efficient mapping would be that D is just a higher dimensional collection of B, i.e.
D = np.array([B0,B1,B2,...,Bn])
D[0,...] == B0
Slightly more complicated is the case where D[n1:n2,....] == B0, a slice
But if the B0 values are scattered around D, your chances of an efficient, reliable mapping are very small.
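Given the edit above (D is block diagonal with copies of B on the diagonal), scipy.sparse can represent exactly that structure without ever materializing the zeros; a minimal sketch, with B, n and the block contents as placeholders:
import numpy as np
from scipy.sparse import block_diag

B = np.arange(6, dtype=float).reshape(2, 3)
n = 4                                    # number of copies of B on the diagonal

# only the diagonal blocks are stored; the off-diagonal zeros never exist in memory
D = block_diag([B] * n, format='csr')

v = np.ones(D.shape[1])
print(D.dot(v))                          # matrix-vector products work directly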

How to create generic 2d array in python

In Java you would do it like this: Node[][] nodes; where Node.java is a custom class. How do I do it in Python, where Node.py is:
class Node(object):
    def __init__(self):
        self.memory = []
        self.temporal_groups = []
I have imported numpy and created an object type
typeObject = numpy.dtype('O') # O stands for python objects
nodes = ???
You can try it this way: inside your Node class, create a function that returns the generic array:
def genArray(a, b):
    return [[0 for y in range(a)] for x in range(b)]
Then you can assign elements the way you want. You might change the 0 to your Node object. Let me know if this helps.
You have two easy options: use numpy or declare a nested list. The latter approach is more conceptually similar to Node[][] since it allows for ragged lists, as does Java, but the former approach will probably make processing faster.
numpy arrays
To make an array in numpy:
import numpy as np
x = np.full((m, n), None, dtype=object)
In numpy, you have to have some idea about the size of the array (here m, n) up-front. There are ways to grow an array, but they are not very efficient, especially for large arrays. np.full will initialize your array with a copy of whatever reference you want. You can modify the elements as you wish after that.
Python lists
To create a ragged list, you do not have to do much:
x = []
This creates an empty list. This is equivalent to Node[][] in Java because that declares a list too. The main difference is that Python lists can change size and are untyped. They are effectively always Object[].
To add more dimensions to the list, just do x.append([]), which will insert a nested list into your outer list. This is similar to defining something like
Node[][] nodes = new Node[m][];
nodes[i] = new Node[n];
where m is the number of nested lists (rows) and n is the number of elements in each list (columns). Again, the main difference is that once you have a list in Python, you can expand or contract it freely, which you cannot do in Java.
Manipulate it with x[i][j] as you would in Java. You can add new sublists by doing x.append([]).
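For instance, a minimal sketch of both routes, using the Node class from the question (m and n are placeholder dimensions):
import numpy as np

m, n = 3, 4

# numpy route: an object array, with each cell assigned its own Node instance
nodes = np.empty((m, n), dtype=object)
for i in range(m):
    for j in range(n):
        nodes[i, j] = Node()

# nested-list route: a list of rows, each row a list of Nodes
nodes_list = [[Node() for _ in range(n)] for _ in range(m)]

print(nodes[0, 0].memory, nodes_list[0][0].memory)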

Which is faster, numpy transpose or flip indices?

I have a dynamic programming algorithm (modified Needleman-Wunsch) which requires the same basic calculation twice, but the calculation is done in the orthogonal direction the second time. For instance, from a given cell (i,j) in matrix scoreMatrix, I want to both calculate a value from values "up" from (i,j), as well as a value from values to the "left" of (i,j). In order to reuse the code I have used a function in which in the first case I send in parameters i,j,scoreMatrix, and in the next case I send in j,i,scoreMatrix.transpose(). Here is a highly simplified version of that code:
def calculateGapCost(i, j, scoreMatrix, gapcost):
    return scoreMatrix[i - 1, j] - gapcost
...
gapLeft = calculateGapCost(i,j,scoreMatrix,gapcost)
gapUp = calculateGapCost(j,i,scoreMatrix.transpose(),gapcost)
...
I realized that I could alternatively send in a function that would in the one case pass through arguments (i,j) when retrieving a value from scoreMatrix, and in the other case reverse them to (j,i), rather than transposing the matrix each time.
def passThrough(i, j, matrix):
    return matrix[i, j]

def flipIndices(i, j, matrix):
    return matrix[j, i]

def calculateGapCost(i, j, scoreMatrix, gapcost, retrieveValue):
    return retrieveValue(i - 1, j, scoreMatrix) - gapcost
...
gapLeft = calculateGapCost(i,j,scoreMatrix,gapcost,passThrough)
gapUp = calculateGapCost(j,i,scoreMatrix,gapcost,flipIndices)
...
However, if numpy's transpose uses some trick I'm unaware of to do the transpose in just a few operations, it may be that transposing is in fact faster than my pass-through idea. Can anyone tell me which would be faster (or if there is a better method I haven't thought of)?
The actual method would call retrieveValue 3 times, and involves 2 matrices that would be referenced (and thus transposed if using that approach).
In NumPy, transpose returns a view with a different shape and strides. It does not touch the data.
Therefore, you will likely find that the two approaches have identical performance, since in essence they are exactly the same.
However, the only way to be sure is to benchmark both.
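If you do want to check on your own data, a minimal benchmark sketch along these lines (the matrix size, indices and repetition count are placeholders) should settle it:
import timeit
import numpy as np

scoreMatrix = np.random.rand(500, 500)
i, j, gapcost = 200, 300, 1.0

# both expressions read the same element, scoreMatrix[i, j - 1]
t_transpose = timeit.timeit(
    lambda: scoreMatrix.transpose()[j - 1, i] - gapcost, number=100000)
t_flip = timeit.timeit(
    lambda: scoreMatrix[i, j - 1] - gapcost, number=100000)

print(t_transpose, t_flip)   # expect very similar timings: .transpose() is just a view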
