python h5py bug when feeding multidimensional dataset

Here is my problem, it works in case 1, not in case 2:
import h5py
import numpy as np
data = np.random.randint(0,256,(5,), np.uint8)
f = h5py.File('test.h5','w')
f.create_dataset('1',(3,5), np.uint8)
f.create_dataset('2',(1,3,5), np.uint8)
print("case 1 before:\n",f['1'].value)
# case 1 before:
# [[0 0 0 0 0]
# [0 0 0 0 0]
# [0 0 0 0 0]]
f['1'][0] = data
print("case 1 after:\n",f['1'].value)
# case 1 after:
# [[ 75 215 125 175 193]
# [ 0 0 0 0 0]
# [ 0 0 0 0 0]]
print()
print()
print("case 2 before:\n",f['2'].value)
# case 2 before:
# [[[0 0 0 0 0]
# [0 0 0 0 0]
# [0 0 0 0 0]]]
f['2'][0][0] = data
print("case 2 after:\n",f['2'].value)
# case 2 after:
# [[[0 0 0 0 0]
# [0 0 0 0 0]
# [0 0 0 0 0]]]
Can anyone explain what I am doing wrong?
(Please do not suggest creating an np.array with the same shape as my dataset, because I work with far more dimensions and a much larger size!)

Don't use chained indexing when making assignments. Instead of
f['2'][0][0] = data
Use
f['2'][0,0] = data
f['2'][0] returns a new array whose data is copied from f['2']. f['2'][0][0] = data assigns data to this new array. The assignment has no effect on f['2'].
In contrast, f['2'][0,0] = data modifies f['2'].
Under the hood, foo[x] calls foo.__getitem__(x),
and foo[x] = y calls foo.__setitem__(x, y).
So f['2'][0][0] = data calls
f.__getitem__('2').__getitem__(0).__setitem__(0, data)
f.__getitem__('2') returns a Dataset,
f.__getitem__('2').__getitem__(0) returns a NumPy array, and
f.__getitem__('2').__getitem__(0).__setitem__(0, data) modifies that NumPy array.
Whereas, f['2'][0,0] = data calls
f.__getitem__('2').__setitem__((0,0), data)
Now it is the Dataset's __setitem__ method that gets called, which naturally gives the Dataset an opportunity to modify its internal data.
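The difference between the two call chains can be reproduced without an HDF5 file. The sketch below uses a hypothetical stand-in class, CopyOnRead (not part of h5py), whose __getitem__ returns a copy and whose __setitem__ writes to the backing store. It only illustrates the mechanism described above; it is not how h5py is implemented.

```python
import numpy as np


class CopyOnRead:
    """Hypothetical stand-in for an on-disk dataset: every read returns a copy."""

    def __init__(self, shape, dtype):
        self._store = np.zeros(shape, dtype)

    def __getitem__(self, key):
        # Reading hands back a fresh array, detached from the store.
        return self._store[key].copy()

    def __setitem__(self, key, value):
        # Writing goes straight to the backing store.
        self._store[key] = value


data = np.arange(1, 6, dtype=np.uint8)
ds = CopyOnRead((1, 3, 5), np.uint8)

ds[0][0] = data           # __getitem__(0) then __setitem__ on the temporary copy
chained_result = ds[0, 0]

ds[0, 0] = data           # a single __setitem__((0, 0), data) on the container
tuple_result = ds[0, 0]

print(chained_result)     # still all zeros
print(tuple_result)       # now holds data
```

The chained form modifies a temporary that is discarded immediately, which is why the dataset appears unchanged.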

Related

Numpy get secondary diagonal with offset=1 and change the values

I have this 6x6 matrix filled with 0s. I got the secondary diagonal in sec_diag. What I am trying to do is replace the values on the diagonal just above the secondary diagonal with the odd numbers from 9 down to 1: [9,7,5,3,1]
import numpy as np
x = np.zeros((6,6), int)
sec_diag = np.diagonal(np.fliplr(x), offset=1)
The result should look like this:
[[0,0,0,0,9,0],
[0,0,0,7,0,0],
[0,0,5,0,0,0],
[0,3,0,0,0,0],
[1,0,0,0,0,0],
[0,0,0,0,0,0]]
EDIT: np.fill_diagonal isn't going to work.

You can use np.roll:
x = np.zeros((6,6),dtype=np.int32)
np.fill_diagonal(np.fliplr(x), [9,7,5,3,1,0])
xr = np.roll(x,-1,axis=1)
print(xr)
Output
[[0 0 0 0 9 0]
[0 0 0 7 0 0]
[0 0 5 0 0 0]
[0 3 0 0 0 0]
[1 0 0 0 0 0]
[0 0 0 0 0 0]]
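As an alternative to the roll approach, the same cells can be addressed directly with integer array indexing. This is a minimal sketch that assumes a square matrix and an offset of exactly one above the secondary diagonal; the names values, rows and cols are illustrative.

```python
import numpy as np

x = np.zeros((6, 6), dtype=int)
values = np.array([9, 7, 5, 3, 1])

# Row i of the offset anti-diagonal sits in column (n - 2 - i).
rows = np.arange(len(values))
cols = x.shape[1] - 2 - rows

x[rows, cols] = values
print(x)
```

This writes in place and avoids creating the rolled copy.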

Maybe you should try with a double loop

Write functions resilient to variable dimension array

I'm struggling to write a function that applies seamlessly to any NumPy array, whatever its number of dimensions.
At one point in my code, I have boolean arrays that I use as masks for other arrays (0 = not passing, 1 = passing).
I would like to "enlarge" those mask arrays by overwriting the zeros adjacent to ones within a defined range.
Example :
input = [0,0,0,0,0,1,0,0,0,0,1,0,0,0]
enlarged_by_1 = [0,0,0,0,1,1,1,0,0,1,1,1,0,0]
enlarged_by_2 = [0,0,0,1,1,1,1,1,1,1,1,1,1,0]
input = [[0,0,0,1,0,0,1,0],
[0,1,0,0,0,0,0,0],
[0,0,0,0,0,0,1,0]]
enlarged_by_1 = [[0,0,1,1,1,1,1,1],
[1,1,1,0,0,0,0,0],
[0,0,0,0,0,1,1,1]]
This is pretty straightforward when inputs are 1D.
However, I would like this function to handle 1D arrays, matrices, 3D arrays, and so on seamlessly.
So for a matrix, the same logic would be applied to each row.
I read about Ellipsis, but it does not seem to be applicable in my case.
Flattening the input, applying the logic and reshaping the array could lead to contamination between individual rows.
I do not want to test the shape of the input array or write a recursive function, as that does not seem very clean to me.
Would you have any suggestions?

The operation that you are describing seems very much like a convolution followed by clipping to ensure that values remain 0 or 1.
For your example input:
import numpy as np
input = np.array([0,0,0,0,0,1,0,0,0,0,1,0,0,0], dtype=int)
print(input)
def enlarge_ones(x, k):
    mask = np.ones(2*k+1, dtype=int)
    return np.clip(np.convolve(x, mask, mode='same'), 0, 1).astype(int)
print(enlarge_ones(input, k=1))
print(enlarge_ones(input, k=3))
which yields
[0 0 0 0 0 1 0 0 0 0 1 0 0 0]
[0 0 0 0 1 1 1 0 0 1 1 1 0 0]
[0 0 1 1 1 1 1 1 1 1 1 1 1 1]
numpy.convolve only works for 1-d arrays. However, one can loop over the array dimensions and, within each dimension, over every 1-d slice. In other words, for a 2-d matrix, first operate on every row and then on every column, and likewise for arrays with more dimensions. Note that this enlarges the ones along every axis, not only along the rows. enlarge_ones would then become something like:
def enlarge_ones(x, k):
    n = len(x.shape)
    if n == 1:
        mask = np.ones(2*k+1, dtype=int)
        return np.clip(np.convolve(x, mask, mode='same')[:len(x)], 0, 1).astype(int)
    else:
        x = x.copy()
        for d in range(n):
            for i in np.ndindex(x.shape[:-1]):
                x[i] = enlarge_ones(x[i], k) # x[i] is 1-d
            x = x.transpose(list(range(1, n)) + [0])
        return x
Note the use of np.transpose to rotate the dimensions so that np.convolve is applied to the 1-d slices along each dimension in turn. The rotation happens exactly n times, which returns the array to its original shape at the end.
x = np.zeros((3, 5, 7), dtype=int)
x[1, 2, 2] = 1
print(x)
print(enlarge_ones(x, k=1))
[[[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]]
[[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]
[0 0 1 0 0 0 0]
[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]]
[[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]]]
[[[0 0 0 0 0 0 0]
[0 1 1 1 0 0 0]
[0 1 1 1 0 0 0]
[0 1 1 1 0 0 0]
[0 0 0 0 0 0 0]]
[[0 0 0 0 0 0 0]
[0 1 1 1 0 0 0]
[0 1 1 1 0 0 0]
[0 1 1 1 0 0 0]
[0 0 0 0 0 0 0]]
[[0 0 0 0 0 0 0]
[0 1 1 1 0 0 0]
[0 1 1 1 0 0 0]
[0 1 1 1 0 0 0]
[0 0 0 0 0 0 0]]]
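The question asks for the enlargement to be applied to each row independently, whereas the recursive version above spreads the ones along every axis. A minimal sketch of a last-axis-only variant follows; it swaps the convolution for shifted slices with Ellipsis, and the function name enlarge_last_axis is an illustrative assumption, not part of the original answer.

```python
import numpy as np


def enlarge_last_axis(mask, k):
    """Set zeros within k positions of a one, along the last axis only."""
    mask = np.asarray(mask).astype(bool)
    out = mask.copy()
    for shift in range(1, k + 1):
        # A one at position i switches on position i + shift ...
        out[..., shift:] |= mask[..., :-shift]
        # ... and position i - shift.
        out[..., :-shift] |= mask[..., shift:]
    return out.astype(int)


row = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
matrix = [[0, 0, 0, 1, 0, 0, 1, 0],
          [0, 1, 0, 0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0, 0, 1, 0]]

print(enlarge_last_axis(row, 1))
print(enlarge_last_axis(row, 2))
print(enlarge_last_axis(matrix, 1))
```

Because the slices only ever touch the last axis, rows cannot contaminate each other, and the same code accepts 1D, 2D or higher-dimensional input without inspecting the shape.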

Efficient way to find coordinates of connected blobs in binary image

I am looking for the coordinates of connected blobs in a binary image (2d numpy array of 0 or 1).
The skimage library provides a very fast way to label blobs within the array (which I found from similar SO posts). However, I want a list of the coordinates of each blob, not a labelled array. I have a solution which extracts the coordinates from the labelled image, but it is very slow, far slower than the initial labelling.
Minimal Reproducible example:
import timeit
from skimage import measure
import numpy as np
binary_image = np.array([
[0,1,0,0,1,1,0,1,1,0,0,1],
[0,1,0,1,1,1,0,1,1,1,0,1],
[0,0,0,0,0,0,0,1,1,1,0,0],
[0,1,1,1,1,0,0,0,0,1,0,0],
[0,0,0,0,0,0,0,1,1,1,0,0],
[0,0,1,0,0,0,0,0,0,0,0,0],
[0,1,0,0,1,1,0,1,1,0,0,1],
[0,0,0,0,0,0,0,1,1,1,0,0],
[0,1,1,1,1,0,0,0,0,1,0,0],
])
print(f"\n\n2d array of type: {type(binary_image)}:")
print(binary_image)
labels = measure.label(binary_image)
print(f"\n\n2d array with connected blobs labelled of type {type(labels)}:")
print(labels)
def extract_blobs_from_labelled_array(labelled_array):
    # The goal is to obtain lists of the coordinates
    # of each distinct blob.
    blobs = []
    label = 1
    while True:
        indices_of_label = np.where(labelled_array==label)
        if not indices_of_label[0].size > 0:
            break
        else:
            blob = list(zip(*indices_of_label))
            label += 1
            blobs.append(blob)
    return blobs
if __name__ == "__main__":
    print("\n\nBeginning extract_blobs_from_labelled_array timing\n")
    print("Time taken:")
    print(
        timeit.timeit(
            'extract_blobs_from_labelled_array(labels)',
            globals=globals(),
            number=1
        )
    )
    print("\n\n")
Output:
2d array of type: <class 'numpy.ndarray'>:
[[0 1 0 0 1 1 0 1 1 0 0 1]
[0 1 0 1 1 1 0 1 1 1 0 1]
[0 0 0 0 0 0 0 1 1 1 0 0]
[0 1 1 1 1 0 0 0 0 1 0 0]
[0 0 0 0 0 0 0 1 1 1 0 0]
[0 0 1 0 0 0 0 0 0 0 0 0]
[0 1 0 0 1 1 0 1 1 0 0 1]
[0 0 0 0 0 0 0 1 1 1 0 0]
[0 1 1 1 1 0 0 0 0 1 0 0]]
2d array with connected blobs labelled of type <class 'numpy.ndarray'>:
[[ 0 1 0 0 2 2 0 3 3 0 0 4]
[ 0 1 0 2 2 2 0 3 3 3 0 4]
[ 0 0 0 0 0 0 0 3 3 3 0 0]
[ 0 5 5 5 5 0 0 0 0 3 0 0]
[ 0 0 0 0 0 0 0 3 3 3 0 0]
[ 0 0 6 0 0 0 0 0 0 0 0 0]
[ 0 6 0 0 7 7 0 8 8 0 0 9]
[ 0 0 0 0 0 0 0 8 8 8 0 0]
[ 0 10 10 10 10 0 0 0 0 8 0 0]]
Beginning extract_blobs_from_labelled_array timing
Time taken:
9.346099977847189e-05
9e-05 is small but so is this image for the example. In reality I am working with very high resolution images for which the function takes approximately 10 minutes.
Is there a faster way to do this?
Side note: I'm only using list(zip()) to try get the numpy coordinates into something I'm used to (I don't use numpy much just Python). Should I be skipping this and just using the coordinates to index as-is? Will that speed it up?

The part of the code that is slow is here:
    while True:
        indices_of_label = np.where(labelled_array==label)
        if not indices_of_label[0].size > 0:
            break
        else:
            blob = list(zip(*indices_of_label))
            label += 1
            blobs.append(blob)
First, a complete aside: you should avoid using while True when you know the number of elements you will be iterating over. It's a recipe for hard-to-find infinite-loop bugs.
Instead, you should use:
for label in range(1, np.max(labelled_array) + 1):
and then you can drop the if ...: break. (Labels start at 1 because 0 is the background, so the range has to run from 1 up to and including the maximum label.)
A second issue is indeed that you are using list(zip(*)), which is slow compared to NumPy functions. Here you could get approximately the same result with np.transpose(indices_of_label), which will get you a 2D array of shape (n_coords, n_dim), ie (n_coords, 2).
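To make the comparison concrete, here is a small sketch on a made-up 2x2 labelled array (the array and variable names are illustrative) showing that np.transpose on the np.where result yields the same coordinates as list(zip(*...)), just as an array instead of a list of tuples.

```python
import numpy as np

labelled = np.array([[0, 1],
                     [1, 2]])

indices_of_label = np.where(labelled == 1)

as_tuples = list(zip(*indices_of_label))      # list of (row, col) tuples
as_array = np.transpose(indices_of_label)     # array of shape (n_coords, 2)

print(as_tuples)
print(as_array)
```

np.argwhere(labelled == 1) gives the same array in a single call.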
But the Big Issue is the expression labelled_array == label. This will examine every pixel of the image once for every label. (Twice, actually, because then you run np.where(), which takes another pass.) This is a lot of unnecessary work, as the coordinates can be found in one pass.
The scikit-image function skimage.measure.regionprops can do this for you. regionprops goes over the image once and returns a list containing one RegionProps object per label. The object has a .coords attribute containing the coordinates of each pixel in the blob. So, here's your code, modified to use that function:
import timeit
from skimage import measure
import numpy as np
binary_image = np.array([
[0,1,0,0,1,1,0,1,1,0,0,1],
[0,1,0,1,1,1,0,1,1,1,0,1],
[0,0,0,0,0,0,0,1,1,1,0,0],
[0,1,1,1,1,0,0,0,0,1,0,0],
[0,0,0,0,0,0,0,1,1,1,0,0],
[0,0,1,0,0,0,0,0,0,0,0,0],
[0,1,0,0,1,1,0,1,1,0,0,1],
[0,0,0,0,0,0,0,1,1,1,0,0],
[0,1,1,1,1,0,0,0,0,1,0,0],
])
print(f"\n\n2d array of type: {type(binary_image)}:")
print(binary_image)
labels = measure.label(binary_image)
print(f"\n\n2d array with connected blobs labelled of type {type(labels)}:")
print(labels)
def extract_blobs_from_labelled_array(labelled_array):
    """Return a list containing coordinates of pixels in each blob."""
    props = measure.regionprops(labelled_array)
    blobs = [p.coords for p in props]
    return blobs
if __name__ == "__main__":
    print("\n\nBeginning extract_blobs_from_labelled_array timing\n")
    print("Time taken:")
    print(
        timeit.timeit(
            'extract_blobs_from_labelled_array(labels)',
            globals=globals(),
            number=1
        )
    )
    print("\n\n")
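If scikit-image is not available, the point about avoiding one full image scan per label can also be sketched with NumPy alone by sorting the flattened labels once and splitting the sorted coordinates at label boundaries. This is a different technique from regionprops, offered only as an illustration; the function name blob_coords_by_sorting is hypothetical, and the sketch assumes label 0 is background and that at least one blob exists.

```python
import numpy as np


def blob_coords_by_sorting(labelled_array):
    """Group pixel coordinates by label with one sort instead of one scan per label."""
    flat = labelled_array.ravel()
    order = np.argsort(flat, kind="stable")   # pixel indices grouped by label
    sorted_labels = flat[order]

    # Drop the background (label 0), which sorts to the front.
    start = np.searchsorted(sorted_labels, 1)
    order = order[start:]
    sorted_labels = sorted_labels[start:]

    # Convert flat indices back to (row, col) coordinates.
    coords = np.column_stack(np.unravel_index(order, labelled_array.shape))

    # Split wherever the label value changes.
    boundaries = np.flatnonzero(np.diff(sorted_labels)) + 1
    return np.split(coords, boundaries)


example_labels = np.array([[0, 1, 0],
                           [2, 1, 0],
                           [2, 0, 3]])
example_blobs = blob_coords_by_sorting(example_labels)
for blob in example_blobs:
    print(blob.tolist())
```

Each returned entry is an array of shape (n_pixels, 2), in ascending label order.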

find mean position and area of labelled objects

I have a 2D labeled image (numpy array), each label represents an object. I have to find the object's center and its area. My current solution:
centers = [np.mean(np.where(label_2d == i),1) for i in range(1,num_obj+1)]
surface_area = np.array([np.sum(label_2d == i) for i in range(1,num_obj+1)])
Note that the label_2d used for centers is not the same as the one used for surface area, so I can't combine both operations. My current code is about 10-100 times too slow.
In C++ I would iterate through the image once (2 for loops) and fill a table (an array), from which I would then calculate centers and surface area.
Since for loops are quite slow in python, I have to find another solution. Any advice?

You could use the center_of_mass function from scipy.ndimage for the first problem and np.bincount for the second. Because these are in the mainstream libraries, they are heavily optimized, so you can expect decent speed gains.
Example:
>>> import numpy as np
>>> from scipy.ndimage import center_of_mass
>>>
>>> a = np.zeros((10,10), dtype=int)
>>> # add some labels:
... a[3:5, 1:3] = 1
>>> a[7:9, 0:3] = 2
>>> a[5:6, 4:9] = 3
>>> print(a)
[[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 1 1 0 0 0 0 0 0 0]
[0 1 1 0 0 0 0 0 0 0]
[0 0 0 0 3 3 3 3 3 0]
[0 0 0 0 0 0 0 0 0 0]
[2 2 2 0 0 0 0 0 0 0]
[2 2 2 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]
>>>
>>> num_obj = 3
>>> surface_areas = np.bincount(a.flat)[1:]
>>> centers = center_of_mass(a, labels=a, index=range(1, num_obj+1))
>>> print(surface_areas)
[4 6 5]
>>> print(centers)
[(3.5, 1.5), (7.5, 1.0), (5.0, 6.0)]
Speed gains depend on the size of your input data though, so I can't make any serious estimates on that. Would be nice if you could add that info (size of a, number of labels, timing results for the method you used and these functions) in the comments.
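The single-pass table the question describes for C++ can also be approximated with np.bincount alone, by using the row and column indices as weights. The following is a minimal sketch on the same example array; it assumes every label from 1 to the maximum occurs at least once (otherwise the division hits a zero count), and the variable names are illustrative.

```python
import numpy as np

label_2d = np.zeros((10, 10), dtype=int)
label_2d[3:5, 1:3] = 1
label_2d[7:9, 0:3] = 2
label_2d[5:6, 4:9] = 3

flat = label_2d.ravel()
rows, cols = np.indices(label_2d.shape)

counts = np.bincount(flat)                            # pixels per label
row_sums = np.bincount(flat, weights=rows.ravel())    # sum of row indices per label
col_sums = np.bincount(flat, weights=cols.ravel())    # sum of column indices per label

# Index 0 is the background, so skip it.
surface_areas = counts[1:]
centers = np.column_stack((row_sums[1:], col_sums[1:])) / counts[1:, None]

print(surface_areas)
print(centers)
```

Here centers is an array of shape (num_obj, 2) holding the mean row and column of each label.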

Counting of adjacent cells in a numpy array

It's past midnight, and maybe someone has an idea how to tackle a problem of mine. I want to count the adjacent cells of each valid value (that is, the array fields with other values, e.g. zeros, in its vicinity) and sum these counts over all valid values.
Example:
import numpy
from scipy import ndimage
s = ndimage.generate_binary_structure(2,2) # Structure can vary
a = numpy.zeros((6,6), dtype=int) # Example array
a[2:4, 2:4] = 1;a[2,4] = 1 # with example value structure
print(a)
>[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 1 1 1 0]
[0 0 1 1 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]]
# The value at position [2,4] is surrounded by 6 zeros, while the one at
# position [2,2] has 5 zeros in the vicinity if 's' is the assumed binary structure.
# Total sum of surrounding zeroes is therefore sum(5+4+6+4+5) == 24
How can I count the zeros in this way if the structure of my values varies?
I believe I need to use SciPy's binary_dilation function, which can enlarge the value structure, but simply counting overlaps doesn't lead me to the correct sum. Or does it?
print(ndimage.binary_dilation(a,s).astype(a.dtype))
[[0 0 0 0 0 0]
[0 1 1 1 1 1]
[0 1 1 1 1 1]
[0 1 1 1 1 1]
[0 1 1 1 1 0]
[0 0 0 0 0 0]]

Use a convolution to count neighbours:
import numpy
import scipy.signal
a = numpy.zeros((6,6), dtype=int) # Example array
a[2:4, 2:4] = 1;a[2,4] = 1 # with example value structure
b = 1-a
c = scipy.signal.convolve2d(b, numpy.ones((3,3)), mode='same')
print(numpy.sum(c * a)) # prints 24.0
b = 1-a allows us to count each zero while ignoring the ones.
We convolve with a 3x3 all-ones kernel, which sets each element to the sum of it and its 8 neighbouring values (other kernels are possible, such as the + kernel for only orthogonally adjacent values). With these summed values, we mask off the zeros in the original input (since we don't care about their neighbours), and sum over the whole array.

I think you already got it. After dilation, the number of ones is 19; minus the 5 of the starting shape, that leaves 14, which is the number of distinct zeros surrounding your shape. Your total of 24 counts some zeros more than once.
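Both numbers from the answers above (24 with overlaps, 14 distinct zeros) can be checked with NumPy alone. The sketch below replaces the SciPy convolution with a sum of eight shifted views of a zero-padded array; the helper name neighbour_sum is an illustrative assumption, and cells outside the array are treated as contributing nothing.

```python
import numpy as np


def neighbour_sum(arr):
    """For each cell, sum the values of its 8 neighbours (outside counts as 0)."""
    height, width = arr.shape
    padded = np.pad(arr, 1, mode="constant", constant_values=0)
    total = np.zeros_like(arr)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            total += padded[1 + dr:1 + dr + height, 1 + dc:1 + dc + width]
    return total


a = np.zeros((6, 6), dtype=int)
a[2:4, 2:4] = 1
a[2, 4] = 1

# Zeros around each one, summed over all ones (shared zeros counted repeatedly).
total_with_overlaps = int((neighbour_sum(1 - a) * a).sum())

# Zeros that touch at least one one, each counted once.
distinct_zeros = int(((neighbour_sum(a) > 0) & (a == 0)).sum())

print(total_with_overlaps, distinct_zeros)
```

Which of the two totals is wanted depends on whether a zero shared by several ones should count once or once per neighbour.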
