how to remove surrounding empty data from 3d array - python

I have an array with shape (136, 512, 512). It contains 0s and 1s indicating an object's shape inside this 3D space. I am trying to reduce the size of the array by removing its empty slices. Essentially, I want to keep all the 1s but remove unnecessary rows and columns while keeping the array rectangular, similar to a hitbox. For example:
(0, 0, 0, 0, 0,
0, 0, 0, 1, 0,
0, 1, 1, 1, 0,
0, 1, 0, 1, 0,
1, 1, 0, 1, 0,
1, 0, 0, 1, 0,
0, 1, 1, 1, 0,
0, 0, 0, 0, 0)
would become:
(0, 0, 0, 1,
0, 1, 1, 1,
0, 1, 0, 1,
1, 1, 0, 1,
1, 0, 0, 1,
0, 1, 1, 1)
but on a 3d scale
(sorry about the horrible formatting, I'm terrible at this.)
This is only necessary because pyplot doesn't seem to be able to plot such a large 3D graph with voxels, or at least takes a very long time on my computer. If anyone knows how to do large-scale 3D plots, that would be great.
EDIT
To clarify, the example is only 2D. In 3D, the crop must take all other rows and columns into account, because every slice must keep the same shape. Not sure if this makes much sense; it's hard to explain in this many dimensions.
Think of it as removing anything outside of the outermost 1s on each side of the cube.

EDIT: For removing only the surrounding brackets, read the excellent answer by Bill.
You can use np.all and np.delete to achieve this.
import numpy as np
l = [[0, 0, 0, 0, 0],
[0, 0, 0, 1, 0],
[0, 1, 1, 1, 0],
[0, 1, 0, 1, 0],
[1, 1, 0, 1, 0],
[1, 0, 0, 1, 0],
[0, 1, 1, 1, 0],
[0, 0, 0, 0, 0]]
arr = np.array(l)
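# np.flatnonzero turns the boolean mask into integer indices, which np.delete accepts on every NumPy version
arr1 = np.delete(arr, np.flatnonzero(np.all(arr == 0, axis=0)), axis=1) # Deletes every all-zero column
arr2 = np.delete(arr1, np.flatnonzero(np.all(arr1 == 0, axis=1)), axis=0) # Deletes every all-zero row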
print(arr)
print(arr2)
Output
[[0 0 0 0 0]
[0 0 0 1 0]
[0 1 1 1 0]
[0 1 0 1 0]
[1 1 0 1 0]
[1 0 0 1 0]
[0 1 1 1 0]
[0 0 0 0 0]]
[[0 0 0 1]
[0 1 1 1]
[0 1 0 1]
[1 1 0 1]
[1 0 0 1]
[0 1 1 1]]
The same can be extended to 3D array too.
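For the 3D case, one possible sketch (not the answer's own code) crops to the bounding box of the nonzero voxels using np.argwhere. Unlike the np.delete approach above, it keeps all-zero slices that lie between occupied ones, which matches the "hitbox" description in the question. The function name crop_to_bounding_box and the small test volume are made up for illustration.

```python
import numpy as np

def crop_to_bounding_box(volume):
    """Return the smallest rectangular sub-array containing every nonzero voxel."""
    volume = np.asarray(volume)
    coords = np.argwhere(volume)          # one row of indices per nonzero voxel
    if coords.size == 0:                  # nothing to keep: return an empty array
        return volume[(slice(0, 0),) * volume.ndim]
    lo = coords.min(axis=0)               # first occupied index along each axis
    hi = coords.max(axis=0) + 1           # one past the last occupied index
    return volume[tuple(slice(a, b) for a, b in zip(lo, hi))]

# Hypothetical small volume standing in for the (136, 512, 512) array
volume = np.zeros((6, 8, 8), dtype=np.uint8)
volume[2:4, 1:5, 3:7] = 1
volume[2, 2, 4] = 0                       # an interior zero is preserved

cropped = crop_to_bounding_box(volume)
print(cropped.shape)                      # (2, 4, 4)
```

The result is a view into the original array; call .copy() on it if the large array should be released.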

Related

How access odd index elements and even index elements and merge them vertically

I started learning NumPy yesterday.
My aim is to extract the odd-index rows and the even-index rows from a NumPy array and merge them side by side vertically.
Let's say I have the array
mat = np.array([[1, 1, 0, 0, 0],
[0, 1, 0, 0, 1],
[1, 0, 0, 1, 1],
[0, 0, 0, 0, 0],
[1, 0, 1, 0, 1]])
What I've tried.
I transposed the array first, because I have to merge side by side vertically.
mat = np.transpose(mat)
Which gives me
[[1 0 1 0 1]
[1 1 0 0 0]
[0 0 0 0 1]
[0 0 1 0 0]
[0 1 1 0 1]]
I've tried accessing odd index elements
odd = mat[1::2]
print(odd)
Gives me
[[1 1 0 0 0] ----> wrong...should be [0,1,0,0,1] right? I'm confused
[0 0 1 0 0]] --->wrong...Should be [0,0,0,0,0] right? Where these are coming from?
My final output should look like
[[0 0 1 1 1]
[1 0 1 0 0]
[0 0 0 0 1]
[0 0 0 1 0]
[1 0 0 1 1]]
Type: np.ndarray
Looks like you want:
mat[np.r_[1:mat.shape[0]:2,:mat.shape[0]:2]].T
Output:
array([[0, 0, 1, 1, 1],
[1, 0, 1, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 1, 0],
[1, 0, 0, 1, 1]])
Intermediate:
np.r_[1:mat.shape[0]:2,:mat.shape[0]:2]
output: array([1, 3, 0, 2, 4])
While the selection of rows is straightforward, there are various ways of combining them.
In [244]: mat = np.array([[1, 1, 0, 0, 0],
...: [0, 1, 0, 0, 1],
...: [1, 0, 0, 1, 1],
...: [0, 0, 0, 0, 0],
...: [1, 0, 1, 0, 1]])
The odd rows:
In [245]: mat[1::2,:] # or mat[1::2]
Out[245]:
array([[0, 1, 0, 0, 1],
[0, 0, 0, 0, 0]])
The even rows:
In [246]: mat[0::2,:]
Out[246]:
array([[1, 1, 0, 0, 0],
[1, 0, 0, 1, 1],
[1, 0, 1, 0, 1]])
Joining the rows vertically (np.vstack can also be used):
In [247]: np.concatenate((mat[1::2,:], mat[0::2,:]), axis=0)
Out[247]:
array([[0, 1, 0, 0, 1],
[0, 0, 0, 0, 0],
[1, 1, 0, 0, 0],
[1, 0, 0, 1, 1],
[1, 0, 1, 0, 1]])
But since you want columns, transpose:
In [248]: np.concatenate((mat[1::2,:], mat[0::2,:]), axis=0).transpose()
Out[248]:
array([[0, 0, 1, 1, 1],
[1, 0, 1, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 1, 0],
[1, 0, 0, 1, 1]])
We could transpose the selections first:
np.concatenate((mat[1::2,:].T, mat[0::2,:].T), axis=1)
or transpose before indexing (note the change in the ':' slice position):
np.concatenate((mat.T[:,1::2], mat.T[:,0::2]), axis=1)
The r_ in the other answer converts the slices into arrays and concatenates them, to make one row indexing array. That's equally valid.
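As a quick check, the variants discussed above can be compared directly. This is only a sketch; the variable names order, via_index, via_concat and via_columns are chosen here for illustration.

```python
import numpy as np

mat = np.array([[1, 1, 0, 0, 0],
                [0, 1, 0, 0, 1],
                [1, 0, 0, 1, 1],
                [0, 0, 0, 0, 0],
                [1, 0, 1, 0, 1]])

# np.r_ expands the two slices into one index array: odd rows first, then even rows
order = np.r_[1:mat.shape[0]:2, 0:mat.shape[0]:2]
print(order)                                   # [1 3 0 2 4]

via_index = mat[order].T                       # index once, then transpose
via_concat = np.concatenate((mat[1::2], mat[0::2]), axis=0).T
via_columns = np.concatenate((mat.T[:, 1::2], mat.T[:, 0::2]), axis=1)

# All three routes produce the same array
print(np.array_equal(via_index, via_concat))   # True
print(np.array_equal(via_index, via_columns))  # True
```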
Here is an alternative approach you can use.
1. Convert the array to a list (optional).
2. Select the nested list items with mat_list[1::2] for odd and mat_list[::2] for even.
3. Join them vertically using np.concatenate with `axis=0`.
4. Transpose the result.
Implementation.
mat = np.array([[1, 1, 0, 0, 0],
[0, 1, 0, 0, 1],
[1, 0, 0, 1, 1],
[0, 0, 0, 0, 0],
[1, 0, 1, 0, 1]])
mat_list = mat.tolist() ##############Optional
l_odd = mat_list[1::2]
l_even= mat_list[::2]
mask = np.concatenate((l_odd, l_even), axis=0)
mask = np.transpose(mask)
print(mask)
output #
[[0 0 1 1 1]
[1 0 1 0 0]
[0 0 0 0 1]
[0 0 0 1 0]
[1 0 0 1 1]]
Checking Type
print(type(mask))
Gives
<class 'numpy.ndarray'>

I'm unable to reshape a 1D np array of np arrays of size 3 without modifying them

rgb_list = []
int_list = [1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1]
for num in range(0, len(int_list)-3, 3):
    rgb_list.append(int_list[num:num+3])
received_array = np.array(rgb_list)
print(received_array)
received_array_2d = np.ndarray.reshape(received_array, (5, 2))
print(received_array_2d)
Up to received_array everything is fine, but when I try to reshape it into a 2D array I get an error. I assume it's because numpy is considering each integer individually, not the 3-element arrays.
ValueError: cannot reshape array of size 30 into shape (5,2)
the output of print(received_array) is
[[1 0 0]
[1 0 0]
[1 1 0]
[1 0 0]
[1 1 1]
[0 0 1]
[0 1 0]
[1 0 1]
[0 1 0]
[0 1 1]]
I want to get a 2D array that resembles this
[[1 0 0] [1 0 0] [1 1 0] [1 0 0] [1 1 1]
[0 0 1] [0 1 0] [1 0 1] [0 1 0] [0 1 1]]
How would I go about doing that?
If you are using numpy arrays, use numpy methods: reshape is appropriate here.
You first need to trim your array to a multiple of the expected dimensions:
int_list = np.array([1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1])
X,Y,Z = 2,5,3
int_list[:X*Y*Z].reshape((X, Y, Z))
output:
array([[[1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 1, 1]],
[[0, 0, 1], [0, 1, 0], [1, 0, 1], [0, 1, 0], [0, 1, 1]],
])
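If the number of columns is not known in advance, it can be derived from the length of the data. The following is a minimal sketch under that assumption; the names flat, rows, pixel, cols and grid are illustrative and not part of the answer above.

```python
import numpy as np

int_list = [1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0,
            0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1]

flat = np.asarray(int_list)
rows, pixel = 2, 3                      # 2 rows of 3-value "pixels"
cols = flat.size // (rows * pixel)      # 32 // 6 = 5 complete pixels per row

# Drop the trailing values that do not fill a complete grid, then reshape
grid = flat[:rows * cols * pixel].reshape(rows, cols, pixel)
print(grid.shape)                       # (2, 5, 3)
```

The result is three-dimensional: the "2D array of triples" in the question is a (2, 5, 3) array in NumPy terms.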

How to transform 1D list of values to 2D grid of 0's and 1's in python [duplicate]

This question already has answers here:
How can I one hot encode in Python?
(22 answers)
Closed 3 years ago.
I would like to take a list of values and transform them to a table (2D-list) of 0's and 1's, with one column for each unique number in the source list and an equal number of rows to the original. Each row will have a 1 if that column index matches the original value-1.
I have code that accomplishes this task, but I'm wondering if there is a better/faster way to do it. (The actual dataset has millions of entries vs. the simplified set below)
Sample Input:
value_list = [1, 2, 1, 3, 6, 5, 4, 3]
Desired output:
output_table = [[1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 1, 0, 0, 0]]
Current Solution:
value_list = [1, 2, 1, 3, 6, 5, 4, 3]
max_val = max(value_list)
# initialize to table of 0's
a = [([0] * max_val) for i in range(len(value_list))]
# overwrite with 1's where required
for i in range(len(value_list)):
    j = value_list[i] - 1
    a[i][j] = 1
print('a = ')
for row in a:
    print(f'{row}')
You can do:
import numpy as np
value_list = [1, 2, 1, 3, 6, 5, 4, 3]
# create matrix of zeros
x = np.zeros(shape=(len(value_list), max(value_list)), dtype='int')
for i,v in enumerate(value_list):
    x[i,v-1] = 1
print(x)
Output:
[[1 0 0 0 0 0]
[0 1 0 0 0 0]
[1 0 0 0 0 0]
[0 0 1 0 0 0]
[0 0 0 0 0 1]
[0 0 0 0 1 0]
[0 0 0 1 0 0]
[0 0 1 0 0 0]]
You can try this:
dummy_list = [0] * max(value_list)  # one zero per column instead of a hard-coded 6
output_table = [dummy_list[:i-1] + [1] + dummy_list[i:] for i in value_list]
Output:
output_table = [[1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 1, 0, 0, 0]]
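Since the question mentions millions of entries, the Python-level loop can likely be avoided altogether. The sketch below uses NumPy integer-array indexing to set every 1 in a single assignment; it assumes, as in the question, that the values are positive integers starting at 1. The names values and one_hot are illustrative.

```python
import numpy as np

value_list = [1, 2, 1, 3, 6, 5, 4, 3]

values = np.asarray(value_list)
one_hot = np.zeros((values.size, values.max()), dtype=int)

# Row i gets a 1 in column values[i] - 1; both index arrays are paired element-wise
one_hot[np.arange(values.size), values - 1] = 1
print(one_hot)
```

Use one_hot.tolist() if a plain 2D list is required.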

reverse flatten numpy array?

I have an array
[[0, 1, 0, 0] [0, 1, 0, 0] [1, 0, 0, 0] ..., [0, 1, 0, 0] [0, 1, 0, 0]
[1, 0, 0, 0]]
with shape (38485,).
I want to reshape it to (38485, 4), like
[[0, 1, 0, 0]
[0, 1, 0, 0]
[1, 0, 0, 0]
.
.
.
[0, 1, 0, 0]
[0, 1, 0, 0]
[1, 0, 0, 0]]
but when I try array.reshape(-1,4) it throws the error ValueError: cannot reshape array of size 38485 into shape (4)
My code to get array:
dataset = pd.read_csv('train.csv')
y = dataset.iloc[:, 6]
fr=np.array([1,0,0,0])
re=np.array([0,1,0,0])
le=np.array([0,0,1,0])
ri=np.array([0,0,0,1])
for i in range(y.shape[0]):
    if y[i]=="Front":
        y[i]=fr
    elif y[i]=="Rear":
        y[i]=re
    elif y[i]=="Left":
        y[i]=le
    elif y[i]=="Right":
        y[i]=ri
array=y.values
Is there any way I can accomplish this?
I fixed this with:
array = np.array([[n for n in row] for row in array])
Thanks to wim
Updated answer:
The variable y is a pandas Series (and y.values a NumPy array) that contained strings and NumPy arrays. Its dtype is object, so NumPy doesn't treat it as a table, even though it is full of 4-element arrays at the end of the preprocessing.
You could either avoid mixing object types by using another variable than y or convert y.values with :
array = np.array([x.astype('int32') for x in y.values])
As an example:
import numpy as np
y = np.array(["left", "right"], dtype = "object")
y[0] = np.array([1,0])
y[1] = np.array([0,1])
print(y)
# [[1 0] [0 1]]
print(y.dtype)
# object
print(y.shape)
# (2,)
y = np.array([x.astype('int32') for x in y])
print(y)
# [[1 0]
# [0 1]]
print(y.dtype)
# int32
print(y.shape)
# (2, 2)
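Another option, sketched here as an assumption rather than as part of the answer, is np.stack, which joins the inner arrays of an object array along a new first axis. It only works when every inner array has the same shape. The three-element array y below is a hypothetical stand-in for the real data.

```python
import numpy as np

# Object-dtype array holding one 4-element array per entry, as in the question
y = np.empty(3, dtype=object)
y[0] = np.array([0, 1, 0, 0])
y[1] = np.array([0, 1, 0, 0])
y[2] = np.array([1, 0, 0, 0])
print(y.shape)         # (3,)

stacked = np.stack(y)  # iterates over y and stacks the inner arrays
print(stacked.shape)   # (3, 4)
```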
Original answer:
Your array is somehow incomplete. It has 38485 elements, many of which look like 4-elements arrays. But somewhere in the middle, there must be at least one inner-array which doesn't have 4 elements. Or you might have a mix of collections (list, array, ).
That could be why the second value isn't defined in the shape.
Here's an example with one (8, 4) array and a copy of it, with just one element missing:
import numpy as np
data = np.array([[0, 1, 0, 0],[0, 1, 0, 0],[1, 0, 0, 0] , [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0],[1, 0, 0, 0]])
print(data.shape)
# (8, 4)
print(data.dtype)
# int64
print(set(len(sub_array) for sub_array in data))
# set([4])
print(data.reshape(-1, 4))
# [[0 1 0 0]
# [0 1 0 0]
# [1 0 0 0]
# [0 1 0 0]
# [0 1 0 0]
# [0 1 0 0]
# [0 1 0 0]
# [1 0 0 0]]
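# dtype=object is required on recent NumPy versions, which refuse to build ragged arrays implicitly
broken_data = np.array([[0, 1, 0, 0],[0, 1, 0, 0],[1, 0, 0, 0] , [1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0],[1, 0, 0, 0]], dtype=object)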
print(broken_data.shape)
# (8, )
print(broken_data.dtype)
# object
print(set(len(sub_array) for sub_array in broken_data))
# set([3, 4])
print(broken_data.reshape(-1, 4))
# [[[0, 1, 0, 0] [0, 1, 0, 0] [1, 0, 0, 0] [1, 0, 0]]
# [[0, 1, 0, 0] [0, 1, 0, 0] [0, 1, 0, 0] [1, 0, 0, 0]]]
print([sub_array for sub_array in broken_data if len(sub_array) != 4])
# [[1, 0, 0]]
Find the sub-arrays that don't have exactly 4 elements and either filter them out or modify them.
You'll then have a (38485,4) array, and you won't have to call reshape.
The array length must be a multiple of 4. 38485 is not a multiple of 4. Otherwise, the reshape as you have written it should work correctly:
array.reshape(-1,4)

How to find cluster sizes in 2D numpy array?

My problem is the following,
I have a 2D numpy array filled with 0 and 1, with an absorbing boundary condition (all the outer elements are 0), for example:
[[0 0 0 0 0 0 0 0 0 0]
[0 0 1 0 0 0 0 0 0 0]
[0 0 1 0 1 0 0 0 1 0]
[0 0 0 0 0 0 1 0 1 0]
[0 0 0 0 0 0 1 0 0 0]
[0 0 0 0 1 0 1 0 0 0]
[0 0 0 0 0 1 1 0 0 0]
[0 0 0 1 0 1 0 0 0 0]
[0 0 0 0 1 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]
I want to create a function that takes this array and its linear dimension L as input parameters, (in this case L = 10) and returns the list of cluster sizes of this array.
By "clusters" I mean the isolated groups of elements 1 of the array
the array element [ i ][ j ] is isolated if all its neighbours are zeros, and its neighbours are the elements:
[i+1][j]
[i-1][j]
[i][j+1]
[i][j-1]
So in the previous array we have 7 clusters of sizes (2,1,2,6,1,1,1)
I tried to complete this task by creating two functions, the first one is a recursive function:
def clust_size(array,i,j):
    count = 0
    if array[i][j] == 1:
        array[i][j] = 0
        if array[i-1][j] == 1:
            count += 1
            array[i-1][j] = 0
            clust_size(array,i-1,j)
        elif array[i][j-1] == 1:
            count += 1
            array[i-1][j] = 0
            clust_size(array,i,j-1)
        elif array[i+1][j] == 1:
            count += 1
            array[i-1][j] = 0
            clust_size(array,i+1,j)
        elif array[i][j+1] == 1:
            count += 1
            array[i-1][j] = 0
            clust_size(array,i,j+1)
    return count+1
It should return the size of one cluster. Every time the function finds an array element equal to 1, it increases the counter "count" and sets the element to 0, so that each 1 is counted only once.
If one of the neighbours of the element is equal to 1 then the function calls itself on that element.
The second function is:
def clust_list(array,L):
    sizes_list = []
    for i in range(1,L-1):
        for i in range(1,L-1):
            count = clust_size(array,i,j)
            sizes_list.append(count)
    return sizes_list
and it should return the list containing the cluster sizes. The for loop iterates from 1 to L-1 because all the outer elements are 0.
This doesn't work and I can't see where the error is...
I was wondering if maybe there's an easier way to do it.
it seems like a percolation problem.
The following link has your answer if you have scipy installed.
http://dragly.org/2013/03/25/working-with-percolation-clusters-in-python/
from pylab import *
from scipy.ndimage import measurements
z2 = array([[0,0,0,0,0,0,0,0,0,0],
[0,0,1,0,0,0,0,0,0,0],
[0,0,1,0,1,0,0,0,1,0],
[0,0,0,0,0,0,1,0,1,0],
[0,0,0,0,0,0,1,0,0,0],
[0,0,0,0,1,0,1,0,0,0],
[0,0,0,0,0,1,1,0,0,0],
[0,0,0,1,0,1,0,0,0,0],
[0,0,0,0,1,0,0,0,0,0],
[0,0,0,0,0,0,0,0,0,0]])
This will identify the clusters:
lw, num = measurements.label(z2)
print(lw)
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 2, 0, 0, 0, 3, 0],
[0, 0, 0, 0, 0, 0, 4, 0, 3, 0],
[0, 0, 0, 0, 0, 0, 4, 0, 0, 0],
[0, 0, 0, 0, 5, 0, 4, 0, 0, 0],
[0, 0, 0, 0, 0, 4, 4, 0, 0, 0],
[0, 0, 0, 6, 0, 4, 0, 0, 0, 0],
[0, 0, 0, 0, 7, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
The following will calculate their area.
area = measurements.sum(z2, lw, index=arange(lw.max() + 1))
print(area)
[ 0. 2. 1. 2. 6. 1. 1. 1.]
This gives what you expect, although I would think that you would have a cluster with 8 members by eye-percolation.
I feel your problem with finding "clusters", is essentially the same problem of finding connected components in a binary image (with values of either 0 or 1) based on 4-connectivity. You can see several algorithms to identify the connected components (or "clusters" as you defined them) in this Wikipedia page:
http://en.wikipedia.org/wiki/Connected-component_labeling
Once the connected components or "clusters" are labelled, you can find any information you want easily, including the area, relative position or any other information you may want.
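The 4-connected labeling described above can also be sketched without SciPy, using an iterative breadth-first flood fill. This is one possible implementation, not the algorithm from any of the answers; the function name cluster_sizes is made up, and the grid is the example from the question. Using a queue instead of recursion avoids hitting Python's recursion limit on large clusters.

```python
from collections import deque
import numpy as np

def cluster_sizes(grid):
    """Return the sizes of all 4-connected clusters of 1s, in row-major discovery order."""
    grid = np.asarray(grid)
    rows, cols = grid.shape
    seen = np.zeros(grid.shape, dtype=bool)    # marks cells already assigned to a cluster
    sizes = []
    for i in range(rows):
        for j in range(cols):
            if grid[i, j] != 1 or seen[i, j]:
                continue
            seen[i, j] = True
            queue = deque([(i, j)])
            size = 0
            while queue:                       # flood-fill one cluster
                r, c = queue.popleft()
                size += 1
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr, nc] == 1 and not seen[nr, nc]):
                        seen[nr, nc] = True
                        queue.append((nr, nc))
            sizes.append(size)
    return sizes

grid = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 1, 0, 1, 0, 0, 0, 1, 0],
                 [0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
                 [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
                 [0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
                 [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
                 [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

sizes = cluster_sizes(grid)
print(sizes)   # [2, 1, 2, 6, 1, 1, 1]
```

The input array is left unmodified, because visited cells are tracked in a separate boolean array rather than being overwritten with 0.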
I believe your approach is almost correct, except that you re-initialize the variable count every time clust_size calls itself recursively. I would add count to the input parameters of clust_size and reset it with count = 0 only before each top-level call in your nested for loops.
Like this, you would call clust_size always like count=clust_size(array, i ,j, count)
I haven't tested it but it seems to me that it should work.
Hope it helps.
A relatively simple problem if you convert this to strings
import numpy as np
arr=np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0,],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0,],
[0, 0, 1, 1, 1, 1, 1, 1, 1, 0,], #modified
[0, 0, 0, 0, 0, 0, 1, 0, 1, 0,],
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0,],
[0, 0, 0, 0, 1, 0, 1, 0, 0, 0,],
[0, 0, 0, 0, 0, 1, 1, 0, 0, 0,],
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0,],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0,],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
arr = "".join([str(x) for x in arr.reshape(-1)])
print([len(x) for x in arr.replace("0"," ").split()])
output
[1, 7, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1] #Cluster sizes
