How to get over "Theano has no sparse vector" error? - python

The following code works fine when the tensor is a dmatrix as follows:
A = T.dmatrix('A') # Input tensor
X, updates = theano.scan(lambda i: T.sum((A+A[i])*(T.neq(A*A[i],0)), axis=1),
sequences=T.arange(A.shape[0]))
compute = function([A], X)
Sample input:
a = [[1,2,3,0,9],[3,2,6,2,7],[0,0,0,8,0],[1,0,0,0,3]]
compute(a)
Corresponding output:
array([[ 30., 33., 0., 14.],
[ 33., 40., 10., 14.],
[ 0., 10., 16., 0.],
[ 14., 14., 0., 8.]])
The real pain comes into play when I try converting this to a sparse matrix.
A = sparse.csr_matrix(name='A', dtype='int64')
The following error pops up when X is compiled:
...
...
NotImplementedError: Theano has no sparse vectorUse X[a:b, c:d], X[a:b, c:c+1] or X[a:b] instead.
I also tried substituting the addition and multiply operations in the scan function with sparse.basic.add and sparse.basic.mul respectively. No matter what I do, the above error persists.
Please help. What should I do to fix this?

Related

appending a arrays of a list into arrays of another list

I have two lists that each one includes arrays. I want to append arrays of one list at the end of arrays of another one. I mean:
inp1 = [np.array([[2.5, 1.5, 0.]]),
np.array([[3., 2., 0.], [2.1, 2., -1.]])]
inp2 = [np.array([[10., 20., 30.]]),
np.array([[100., 100., 100.], [200., 200., 400.]])]
Then, I want to append first array of inp2 at the end of first array of inp1. Then, second of inp2 at the ned of second of inp1 and so on. I want to have the results as following:
outp= [np.array([[2.5, 1.5, 0.],
[10., 20., 30.]]),
np.array([[3., 2., 0.],
[2.1, 2., -1.],
[100., 100., 100.],
[200., 200., 400.]])]
I tried the following but it gave me another result:
outp1=zip(inp1, inp2)
outp1=list(outp1)
In reality I have hundreds of arrays stored in inp1 and inp2.
What about this :
l = [np.append(array, to_append, axis=0) for (array, to_append) in zip(inp1, inp2)]
print(l)

How do you apply a multivariable function to a 100x100 array?

I am trying to use python to compute the output of a function, say:
$f(x) = x + y$
Where x and y are the coordinates of the point in the array. So, the point 5, 5 would have the value 10. This will essentially generate an image of (x,y) and an associated pixel intensity value.
Right now I have a 100x100 dataframe in Python/Pandas, and want to know how to actually perform this calculation. My best guess is iterate over each row, and using the index of the row (y) and the index of the element (x), pass these two values into the function and set the point to that value.
This is essentially a basic multivariable graphing problem. Was hoping someone had some experience doing stuff like this. Thank you!
There are numpy functions fromfunction and indices. They'll probably do what you want.
import numpy as np
np.fromfunction( lambda r, c: r+c, shape = (5,5))
# array([[0., 1., 2., 3., 4.],
# [1., 2., 3., 4., 5.],
# [2., 3., 4., 5., 6.],
# [3., 4., 5., 6., 7.],
# [4., 5., 6., 7., 8.]])
fromfunction takes a function as the first argument then the shape. It uses the axes' indices in the function. The function requires as many arguments as there are dims in the shape.
np.indices((3,3))
# array([[[0, 0, 0], # Row coordinates
# [1, 1, 1],
# [2, 2, 2]],
#
# [[0, 1, 2], # Column coordinates
# [0, 1, 2],
# [0, 1, 2]]])
These can be used as function arguments to drive your results.
There are also np.ogrid and np.mgrid which generate np.arrays to use in any calculations. A lot depends on exactly what you want to do.
Edit: np.fromfunction with keyword arguments.
def test ( a, b, c, m0=1, m1 =1): # Specify function with kwargs.
return a * m0 + b * m1 + c
np.fromfunction(test, (4, 3, 5 ), m0=100, m1=10) # Change he kwargs at run time.
# array([[[ 0., 1., 2., 3., 4.],
# [ 10., 11., 12., 13., 14.],
# [ 20., 21., 22., 23., 24.]],
# [[100., 101., 102., 103., 104.],
# [110., 111., 112., 113., 114.],
# [120., 121., 122., 123., 124.]],
# [[200., 201., 202., 203., 204.],
# [210., 211., 212., 213., 214.],
# [220., 221., 222., 223., 224.]],
# [[300., 301., 302., 303., 304.],
# [310., 311., 312., 313., 314.],
# [320., 321., 322., 323., 324.]]])

Requesting NumPy/SciPy vectorization replacements of for loops and list comprehensions

I have two different array processing problems that I'd like to solve AQAP (Q=quickly) to ensure that the solutions aren't rate-limiting in my process (using NEAT to train a video game bot). In one case, I want to build a penalty function for making larger column heights, and in the other I want to reward building "islands of a common value.
Operations begin on a 26 row x 6 column numpy array of grayscale values with a black/0 background.
I have working solutions for each problem that already implement some numpy, but I'd like to push for a fully vectorized approach to both.
import numpy as np,
from scipy.ndimage.measurements import label as sp_label
from math import ceil
Both problems start from an array like this:
img= np.array([[ 0., 0., 0., 12., 0., 0.],
[ 0., 0., 0., 14., 0., 0.],
[ 0., 0., 0., 14., 0., 0.],
[ 0., 0., 0., 14., 0., 0.],
[16., 0., 0., 14., 0., 0.],
[16., 0., 0., 12., 0., 0.],
[12., 0., 11., 0., 0., 0.],
[12., 0., 11., 0., 0., 0.],
[16., 0., 15., 0., 15., 0.],
[16., 0., 15., 0., 15., 0.],
[14., 0., 12., 0., 11., 0.],
[14., 0., 12., 0., 11., 0.],
[14., 15., 11., 0., 11., 0.],
[14., 15., 11., 0., 11., 0.],
[13., 16., 12., 0., 13., 0.],
[13., 16., 12., 0., 13., 0.],
[13., 14., 16., 0., 16., 0.],
[13., 14., 16., 0., 16., 0.],
[16., 14., 15., 0., 14., 0.],
[16., 14., 15., 0., 14., 0.],
[14., 16., 14., 0., 11., 0.],
[14., 16., 14., 0., 11., 0.],
[11., 13., 14., 16., 12., 13.],
[11., 13., 14., 16., 12., 13.],
[12., 12., 15., 14., 15., 11.],
[12., 12., 15., 14., 15., 11.]])
The first (column height) problem is currently being solved with:
# define valid connection directions for sp_label
c_valid_conns = np.array((0,1,0,0,1,0,0,1,0,), dtype=np.int).reshape((3,3))
# run the island labeling function sp_label
# c_ncomponents is a simple count of the conected columns in labeled
columns, c_ncomponents = sp_label(img, c_valid_conns)
# calculate out the column lengths
col_lengths = np.array([(columns[columns == n]/n).sum() for n in range(1, c_ncomponents+1)])
col_lengths
to give me this array: [ 6. 22. 20. 18. 14. 4. 4.]
(bonus if the code consistently ignores the labeled region that does not "contain" the bottom of the array (row index 25/-1))
The second problem involves masking for each unique value and calculating the contiguous bodies in each masked array to get me the size of the contiguous bodies:
# initial values to start the ball rolling
values = [11, 12, 13, 14, 15, 16]
isle_avgs_i = [1.25, 2, 0, 1,5, 2.25, 1]
# apply filter masks to img to isolate each value
# Could these masks be pushed out into a third array dimension instead?
masks = [(img == g) for g in np.unique(values)]
# define the valid connectivities (8-way) for the sp_label function
m_valid_conns = np.ones((3,3), dtype=np.int)
# initialize islanding lists
# I'd love to do away with these when I no longer need the .append() method)
mask_isle_avgs, isle_avgs = [],[]
# for each mask in the image:
for i, mask in enumerate(masks):
# run the island labeling function sp_label
# m_labeled is the array containing the sequentially labeled islands
# m_ncomponents is a simple count of the islands in m_labeled
m_labeled, m_ncomponents = sp_label(mask, m_valid_conns)
# collect the average (island size-1)s (halving to account for...
# ... y resolution) for each island into mask_isle_avgs list
# I'd like to vectorize this step
mask_isle_avgs.append((sum([ceil((m_labeled[m_labeled == n]/n).sum()/2)-1
for n in range(1, m_ncomponents+1)]))/(m_ncomponents+1))
# add up the mask isle averages for all the islands...
# ... and collect into isle_avgs list
# I'd like to vectorize this step
isle_avgs.append(sum(mask_isle_avgs))
# initialize a difference list for the isle averages (I also want to do away with this step)
d_avgs = []
# evaluate whether isle_avgs is greater for the current frame or the...
# ... previous frame (isle_avgs_i) and append either the current...
# ... element or 0, depending on whether the delta is non-negative
# I want this command vectorized
[d_avgs.append(isle_avgs[j])
if (isle_avgs[j]-isle_avgs_i[j])>=0
else d_avgs.append(0) for j in range(len(isle_avgs))]
d_avgs
to give me this d_avgs array: [0, 0, 0.46785714285714286, 1.8678571428571429, 0, 0]
(bonus again if the code consistently ignores the labeled region that does not "contain" the bottom of the array (row index 25/-1) to instead give this array:
[0, 0, 0.43452380952380953, 1.6345238095238095, 0, 0] )
I'm looking to remove any list operations and comprehensions and move them into fully vectorized numpy/scipy implementation with the same results.
Any help removing any of these steps would be greatly appreciated.
Here's how I ultimately solved this issue:
######## column height penalty calculation ########
# c_ncomponents is a simple count of the conected columns in labeled
columns, c_ncomponents = sp_label(unit_img, c_valid_conns)
# print(columns)
# throw out the falling block with .isin(x,x[-1]) combined with...
# the mask nonzero(x)
drop_falling = np.isin(columns, columns[-1][np.nonzero(columns[-1])])
col_hts = drop_falling.sum(axis=0)
# print(f'col_hts {col_hts}')
# calculate differentials for the (grounded) column heights
d_col_hts = np.sum(col_hts - col_hts_i)
# print(f'col_hts {col_hts} - col_hts_i {col_hts_i} ===> d_col_hts {d_col_hts}')
# set col_hts_i to current col_hts for next evaluation
col_hts_i = col_hts
# calculate penalty/bonus function
# col_pen = (col_hts**4 - 3**4).sum()
col_pen = np.where(d_col_hts > 0, (col_hts**4 - 3**4), 0).sum()
#
# if col_pen !=0:
# print(f'col_pen: {col_pen}')
######## end column height penalty calculation ########
######## color island bonus calculation ########
# mask the unit_img to remove the falling block
isle_img = drop_falling * unit_img
# print(isle_img)
# broadcast the game board to add a layer for each color
isle_imgs = np.broadcast_to(isle_img,(7,*isle_img.shape))
# define a mask to discriminate on color in each layer
isle_masked = isle_imgs*[isle_imgs==ind_grid[0]]
# reshape the array to return to 3 dimensions
isle_masked = isle_masked.reshape(isle_imgs.shape)
# generate the isle labels
isle_labels, isle_ncomps = sp_label(isle_masked, i_valid_conns)
# determine the island sizes (via return_counts) for all the unique labels
isle_inds, isle_sizes = np.unique(isle_labels, return_counts=True)
# zero out isle_sizes[0] to remove spike for background (500+ for near empty board)
isle_sizes[0] = 0
# evaluate difference to determine whether bonus applies
if isle_sizes_i.sum() != isle_sizes.sum():
# calculate bonus for all island sizes ater throwing away the 0 count
isle_bonus = (isle_sizes**3).sum()
else:
isle_bonus = 0

How to read a saved three-dimensional matrix (Dicom Matrix) of Matlab as Python syntax in Python?

I have saved a 3D matrix with coordinate (row = 288, col = 288, slice(z) =266) in Matlab.
Now I want to load it in Python. Unfortunately, after loading, it is as (row = 288, col = 266, slice(z) =288) in Python.
Given that, Matlab syntax for size: (rows, columns, slices in 3rd dimension) and Python syntax for size: (slices in 3rd dim, rows, columns).
For example, in the following code, when I want to viwe the variable A as an array it is as (row = 288, col = 266, slice(z) =288):
from math import sqrt
from skimage import data
import matplotlib.pyplot as plt
import cv2
import pydicom
import scipy.io as sio
import os
import numpy as np
for root, dirs, files in
os.walk('G:\PCodes\Other_Codes'):
matfiles = [_ for _ in files if _.endswith('.mat')]
for matfile in matfiles: # matfile: 'Final_Volume.mat'
Patient_All_Info = sio.loadmat(os.path.join(root, matfile)) # Patient_All_Info : {dict}
Patient_All_Info.items()
A = Patient_All_Info["Final_Volume"] # A: {ndarray} : (288, 266, 288) - it isn't as (row = 288, col = 288, slice(z) =266) coordinates.
S = np.shape(A) # S: <class 'tuple'>: (288, 288, 266) ?
dcm_image = pydicom.read_file('A')
image = dcm_image.pixel_array
plt.imshow(image, cmap='gray')
plt.show()
How to load a saved 3D matrix (Dicom Matrix) of Matlab in Python?
In an Octave session:
>> x = reshape(1:24,4,3,2);
>> save -v7 'test.mat' x
With Python, loadmat retains the shape and F order:
In [200]: data = loadmat('test.mat')
In [208]: data['x'].shape
Out[208]: (4, 3, 2)
In [209]: data['x'].ravel(order='F')
Out[209]:
array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13.,
14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24.])
Transpose will produce a (2,3,4) array
In [210]: data['x'].T
Out[210]:
array([[[ 1., 2., 3., 4.],
[ 5., 6., 7., 8.],
[ 9., 10., 11., 12.]],
[[13., 14., 15., 16.],
[17., 18., 19., 20.],
[21., 22., 23., 24.]]])
transpose can take an order parameter, eg data['x'].transpose(2,0,1).
(I'm not familiar with dicom, but hopefully this illustrates how loadmat handles a 3d array from MATLAB.)

View specific fields as ndarray with numpy >= 1.13

Data is in a structured array:
import numpy as np
dtype = [(field, float) for field in ['x', 'y', 'z', 'prop1', 'prop2']]
data = np.array([(1,2,3,4,5), (6,7,8,9,10), (11,12,13,14,15)], dtype=dtype)
For some operations, the positions are accessed as a single nx3 array, for example:
positions = data[['x', 'y', 'z']].view(dtype=float).reshape(-1, 3)
ranges = np.sqrt(np.sum(positions**2, 1))
Since numpy 1.12, the following warning is emitted:
FutureWarning: Numpy has detected that you may be viewing or writing to an array returned by selecting multiple fields in a
structured array.
This code may break in numpy 1.13 because this will return a view instead of a copy -- see release notes for details.
Here is the corresponding entry in the release notes:
Indexing a structured array with multiple fields (eg, arr[['f1', 'f3']]) will return a view into the original array in 1.13, instead of a copy. Note the returned view will have extra padding bytes corresponding to intervening fields in the original array, unlike the copy in 1.12, which will affect code such as arr[['f1', 'f3']].view(newdtype).
How to port this code to numpy >=1.13?
Checking on numpy 1.13 the announced change doesn't appear to have happened yet. So let's simulate the future:
The future behavior will presumably be not to copy the data but to create a dtype that has only the fields you want, but the itemsize of the original dtype. So there will be gaps in each element, parts of memory that are not used.
xyz_tp = xyz_tp = np.dtype({'names': list('xyz'),
'formats': tuple(data.dtype.fields[f][0] for f in 'xyz'),
'offsets': tuple(data.dtype.fields[f][1] for f in 'xyz'),
'itemsize': data.dtype.itemsize})
xyz = data.view(xyz_tp)
xyz
# array([( 1., 2., 3.), ( 6., 7., 8.), ( 11., 12., 13.)],
# dtype={'names':['x','y','z'], 'formats':['<f8','<f8','<f8'], 'offsets':[0,8,16], 'itemsize':40})
The not used memory locations and their content are ignored but still there, so if you view with a builtin dtype they'll reappear.
xyz.view(float)
# array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11.,
# 12., 13., 14., 15.])
# Ouch!
The general fix would be to cast to a contiguous (no gaps) dtype with the same fields. This will force a copy
xyz_cont_tp = np.dtype({'names': list('xyz'), 'formats': 3*('<f8',)})
xyz.astype(xyz_cont_tp).view(float).reshape(-1, 3)
# array([[ 1., 2., 3.],
# [ 6., 7., 8.],
# [ 11., 12., 13.]])
In the special case of your selected fields being contiguous and of same type you can also do:
np.lib.stride_tricks.as_strided(data.view(float), shape=(3,3), strides=data.strides + (8,))
# array([[ 1., 2., 3.],
# [ 6., 7., 8.],
# [ 11., 12., 13.]])
This method does not copy data but creates a genuine view.
Other way for several adjacent float fields. Here for 3 fields starting from 'x' we obtain same result with:
np.ndarray((len(data),3), float, data, offset= data.dtype.fields['x'][1], strides= (data.strides[0], np.dtype(float).itemsize))

Categories