Finding 3D indices of all matching values in numpy - python

I have a 3D int64 Numpy array, which is output from skimage.measure.label. I need a list of 3D indices that match each of our possible (previously known) values, separated out by which indices correspond to each value.
Currently, we do this by the following idiom:
for cur_idx, count in values_counts.items():
    region = labels[:, :, :] == cur_idx
    [dim1_indices, dim2_indices, dim3_indices] = np.nonzero(region)
While this code works and produces correct output, it is quite slow, especially the np.nonzero part, as we call this 200+ times on a large array. I realize that there is probably a faster way to do this via, say, numba, but we'd like to avoid adding on additional requirements unless needed.
Ultimately, what we're looking for is a list of indices that correspond to each (nonzero) value relatively efficiently. Assume our number of values <1000 but our array size >100x1000x1000. So, for example, on the array created by the following:
x = np.zeros((4,4,4))
x[3,3,3] = 1; x[1,0,3] = 2; x[1,2,3] = 2
we would want some idx_value dict/array such that idx_value[1] = [[3, 3, 3]] and idx_value[2] = [[1, 0, 3], [1, 2, 3]].

I've tried tackling problems similar to the one you describe, and I think the np.argwhere function is probably your best option for reducing runtime (see the numpy documentation). See the code example below for how this could be used per the constraints you identify above.
import numpy as np
x = np.zeros((4,4,4))
x[3,3,3] = 1; x[1,0,3] = 2; x[1,2,3] = 3
# Instantiate dictionary/array to store indices
idx_value = {}
# Get indices for each value
idx_value[3] = np.argwhere(x == 3)
idx_value[2] = np.argwhere(x == 2)
idx_value[1] = np.argwhere(x == 1)
# Display idx_value - consistent with indices we set before
>>> idx_value
{3: array([[1, 2, 3]]), 2: array([[1, 0, 3]]), 1: array([[3, 3, 3]])}
For the first use case, I think you would still have to use a for loop to iterate over the values you're searching over, but it could be done as:
# Instantiate dictionary/array
idx_value = {}
# Now loop by incrementally adding key/value pairs
for cur_idx, count in values_counts.items():
    idx_value[cur_idx] = np.argwhere(labels == cur_idx)
NOTE: This incrementally creates a dictionary where each key is an idx to search for, and each value is a np.array object of shape (N_matches, 3).
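If the 200+ separate scans over the full array are what hurts, a single sort can group every label's indices in one pass. This is a sketch, not part of the original answer; it assumes labels is your 3-D label array and that 0 is background (the stand-in array below is only for runnability):
import numpy as np
labels = np.random.randint(0, 5, size=(4, 4, 4))  # stand-in for your real label array
flat = labels.ravel()
order = np.argsort(flat, kind='stable')  # one sort instead of one scan per value
sorted_labels = flat[order]
values, starts = np.unique(sorted_labels, return_index=True)
idx_value = {}
for v, s, e in zip(values, starts, np.append(starts[1:], flat.size)):
    if v == 0:
        continue  # skip the background label
    # map flat positions back to (dim1, dim2, dim3) index triples
    idx_value[v] = np.column_stack(np.unravel_index(order[s:e], labels.shape))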

Related

Mapping between ids and indices using numpy arrays

I'm working on a graphical application that uses shapes such as quads, trias, lines etc. to represent geometry.
The input data is based on ID's.
A list of points is provided, each with an ID and coordinates (x, y, z)
A list of shapes is provided, each defined using the ids from the list of points
So a tria is defined as N1, N2, N3 where the N's are ID's in the list of points
I'm using VTK to display the data and it uses indices and not ids.
So I have to convert the id based input to index based input and I use the following numpy array approach which works REALLY well (and was provided by someone on this board I think)
# nodes - numpy array of point ids
nodes=np.asarray([1, 15, 56, 101, 150]) # This array can be millions of ids long
nmx = nodes.max()
node_id_to_index = np.empty((nmx + 1,), dtype=np.uint32)
# Using the node id as an index, insert consecutive indices as values into the array
# This gives us an array that can be indexed by ID and return the compact index
node_id_to_index[nodes] = np.arange(len(nodes), dtype=np.uint32)
Now when I have a shape defined using ids I can easily convert it to use indices like this
elems_by_id = np.asarray([56,1,150,15,101,1]) # This array can be millions of ids long
elems_by_index = node_id_to_index[elems_by_id]
# gives [2, 0, 4, 1, 3, 0]
One weakness of the approach is that if the original list of ids contains even a VERY large number, I'm required to allocate an array big enough to hold that many items. Even though I may not have that many entries in the original id list. (The original ID list can have gaps in the ids). I ran into this condition today.....
So my question is - how can I modify this approach to handle lists that contain ids so large that I don't have enough memory to create the mapping array?
Any help will be gratefully received....
Doug
OK - I think I found a solution - credit to @Paul Panzer.
But first some additional info - the input nodes array is sorted and guaranteed to contain only unique ids
elems_by_index = nodes.searchsorted(elems_by_id)
This is only marginally slower than the original approach - so I'll just branch based on the max id in nodes - use the original approach when I can easily allocate enough memory and the second approach when the max id is huge....
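For reference, a minimal end-to-end sketch of the searchsorted variant, using the sample ids from the question (it relies on nodes being sorted and unique, as stated above):
import numpy as np
nodes = np.asarray([1, 15, 56, 101, 150])          # sorted, unique ids
elems_by_id = np.asarray([56, 1, 150, 15, 101, 1])
elems_by_index = nodes.searchsorted(elems_by_id)
print(elems_by_index)  # [2 0 4 1 3 0]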
As I understand, essentially you're looking for a fast way to find the index of a number in a list, e.g. you have a list like:
nodes = [932, 578, 41, ...]
and need a structure that would give
id_to_index[932] == 0
id_to_index[578] == 1
id_to_index[41] == 2
# etc.
(which can be done with something as simple as nodes.index(932), except that wouldn't be fast at all). And your current solution is, essentially, the first part of the pigeonhole sort algorithm, and the problem is that the "number of elements (n) and the length of the range of possible key values (N) are approximately the same" condition isn't met - the range is much bigger in your case, so too much memory is wasted on that auxiliary data structure.
Why not simply use a Python dictionary, by the way? E.g. id_to_index = {932: 0, 578: 1, 41: 2, ...} - is it too slow (your current lookup is O(1); an average dict lookup is also O(1), just with more constant overhead)? Or is it because you want numpy indexing (e.g. id_to_index[[n1, n2, n3]] instead of one by one)? Perhaps, then, you can use SciPy sparse matrices (a single-row matrix instead of an array):
import numpy as np
import scipy.sparse as sp
nodes = np.array([9, 2, 7]) # a small test sample
# your solution with a numpy array
nmx = nodes.max()
node_id_to_index = np.empty((nmx + 1,), dtype=np.uint32)
node_id_to_index[nodes] = np.arange(len(nodes), dtype=np.uint32)
elems_by_id = [7, 2] # an even smaller test sample
elems_by_index = node_id_to_index[elems_by_id]
# gives [2 1]
print(elems_by_index)
# same with a 1 by (nmx + 1) sparse matrix
m = sp.csr_matrix((1, nmx + 1), dtype = np.uint32) # 1 x 10 matrix, stores nothing
m[0, nodes] = np.arange(len(nodes), dtype=np.uint32) # 1 x 10 matrix, stores 3 elements
m_by_index = m[0, elems_by_id] # looking through the 0-th row
print(m_by_index.toarray()[0]) # also gives [2 1]
Not sure if I chose the optimal type of matrix for this, read the descriptions of different types of sparse matrix formats to find the best one for the task.
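For completeness, the plain-dictionary variant mentioned above might look like this (a sketch; it gives up numpy fancy indexing but only needs memory proportional to len(nodes)):
id_to_index = {int(n): i for i, n in enumerate(nodes)}
elems_by_index = np.array([id_to_index[e] for e in elems_by_id])
print(elems_by_index)  # also gives [2 1]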

Create array of index values from list with another list python

I have an array of values as well as another array which I would like to create an index to.
For example:
value_list = np.array([[2,2,3],[255,243,198],[2,2,3],[50,35,3]])
key_list = np.array([[2,2,3],[255,243,198],[50,35,3]])
MagicFunction(value_list,key_list)
# result = [0, 1, 0, 2], which has the same length as value_list
The solutions I have seen online after researching are not quite what I am asking for I believe, any help would be appreciated!
I have this brute force code which provides the result but I don't even want to test it on my actual data size
T = np.zeros((len(value_list)), dtype=np.uint32)
for i in range(len(value_list)):
    for j in range(len(key_list)):
        if sum(value_list[i] == key_list[j]) == 3:
            T[i] = j
The issue is how to get this to be not terribly inefficient. I see two approaches
use a dictionary so that the lookups will be fast. numpy arrays are mutable, and thus not hashable, so you'll have to convert them into, e.g., tuples to use with the dictionary.
Use broadcasting to check value_list against every "key" in key_list in a vectorized fashion. This will at least bring the for loops out of Python, but you will still have to compare every value to every key.
I'm going to assume here too that key_list only has unique "keys".
Here's how you could do the first approach:
value_list = np.array([[2,2,3],[255,243,198],[2,2,3],[50,35,3]])
key_list = np.array([[2,2,3],[255,243,198],[50,35,3]])
key_map = {tuple(key): i for i, key in enumerate(key_list)}
result = np.array([key_map[tuple(value)] for value in value_list])
result # array([0, 1, 0, 2])
And here's the second:
result = np.where((key_list[None] == value_list[:, None]).all(axis=-1))[1]
result # array([0, 1, 0, 2])
Which way is faster might depend on the size of key_list and value_list. I would time both for arrays of typical sizes for you.
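A rough timing harness for that comparison could look like this (a sketch with made-up sizes; substitute your own):
import timeit
import numpy as np
value_list = np.random.randint(0, 8, size=(2000, 3))
key_list = np.unique(value_list, axis=0)  # unique keys covering every value
key_map = {tuple(k): i for i, k in enumerate(key_list)}
t_dict = timeit.timeit(
    lambda: np.array([key_map[tuple(v)] for v in value_list]), number=10)
t_bcast = timeit.timeit(
    lambda: np.where((key_list[None] == value_list[:, None]).all(axis=-1))[1],
    number=10)
print(t_dict, t_bcast)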
EDIT - as noted in the comments, the second solution isn't entirely correct: the likely culprit is that values with no match in key_list are silently skipped by np.where, misaligning the result with value_list (duplicate keys cause a similar problem). Consider using the first solution instead.
Assumptions:
Every element of value_list will be present in key_list (at some position or the other)
We are interested in the index within key_list, of only the first match
Solution:
From the two arrays, we create views of 3-tuples. We then broadcast the two views in two orthogonal directions and then check for element-wise equality on the broadcasted arrays.
import numpy as np
value_list = np.array([[2,2,3],[255,243,198],[2,2,3],[50,35,3]], dtype='uint8')
key_list = np.array([[2,2,3],[255,243,198],[50,35,3]], dtype='uint8')
# Define a new dtype, describing a "structure" of 3 uint8's (since
# your original dtype is uint8). To the fields of this structure,
# give some arbitrary names 'first', 'sec', and 'third'
dt = np.dtype([('first', np.uint8), ('sec', np.uint8), ('third', np.uint8)])
# Now view the arrays as 1-d arrays of 3-tuples, using the dt
v_value_list = value_list.view(dtype=dt).reshape(value_list.shape[0])
v_key_list = key_list.view(dtype=dt).reshape(key_list.shape[0])
result = np.argmax(v_key_list[:,None] == v_value_list[None,:], axis=0)
print (result)
Output:
[0 1 0 2]
Notes:
Though this is a pure numpy solution without any visible loops, it could have hidden inefficiencies, because, it matches every element of value_list with every element of key_list, in contrast with a loop-based search that smartly stops upon the first successful match. Any advantage gained will be dependent upon the actual size of key_list, and upon where the successful matches occur, in key_list. As the size of key_list grows, there might be some erosion of the numpy advantage, especially if the successful matches happen mostly in the earlier part of key_list.
The views that we are creating are in fact numpy structured arrays, where each element of the view is a structure of three uint8 values. One interesting question which I haven't yet explored is, when numpy compares one structure with another, does it perform a comparison of every field in the structure, or does it short-circuit the field comparisons at the first failed field? Any such short-circuiting could imply a small additional advantage to this structured array solution.

Defining a matrix with unknown size in python

I want to use a matrix in my Python code but I don't know the exact size of my matrix to define it.
For other matrices, I have used np.zeros(a), where a is known.
What should I do to define a matrix with unknown size?
In this case, maybe an approach is to use a Python list and append to it until it has the desired size, then cast it to an np.array
pseudocode:
matrix = []
while matrix not full:
    matrix.append(elt)
matrix = np.array(matrix)
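A runnable version of that sketch; row_source here is a hypothetical stand-in for wherever your values actually come from:
import numpy as np
row_source = [[1, 2], [3, 4], [5, 6]]  # hypothetical incoming rows
matrix = []
for elt in row_source:
    matrix.append(elt)
matrix = np.array(matrix)  # shape (3, 2)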
You could write a function that tries to modify the np.array, and expands it if it encounters an IndexError:
import numpy as np

def set_expanding(x, r, c, val):
    """Assign x[r, c] = val, zero-padding the array first if it is too small."""
    try:
        x[r, c] = val
    except IndexError:
        r0, c0 = x.shape
        r_ = r + 1 - r0
        c_ = c + 1 - c0
        if r_ > 0:
            x = np.concatenate([x, np.zeros((r_, x.shape[1]))], axis=0)
        if c_ > 0:
            x = np.concatenate([x, np.zeros((x.shape[0], c_))], axis=1)
        x[r, c] = val  # retry now that the array is big enough
    return x

x = np.random.normal(size=(2, 2))
x = set_expanding(x, 5, 10, 1.0)
There are problems with this implementation though: First, it makes a copy of the array and returns a concatenation of it, which translates to a possible bottleneck if you use it many times. Second, the code I provided only works if you're modifying a single element. You could do it for slices, and it would take more effort to modify the code; or you can go the whole nine yards and create a new object inheriting from np.ndarray that overrides the .__getitem__ and .__setitem__ methods.
Or you could just use a huge matrix, or better yet, see if you can avoid having to work with matrices of unknown size.
If you have a python generator you can use np.fromiter:
def gen():
    yield 1
    yield 2
    yield 3
In [11]: np.fromiter(gen(), dtype='int64')
Out[11]: array([1, 2, 3])
Beware: if you pass an infinite iterator, you will most likely crash Python, so it's often a good idea to cap the length (with the count argument):
In [21]: from itertools import count # an infinite iterator
In [22]: np.fromiter(count(), dtype='int64', count=3)
Out[22]: array([0, 1, 2])
Best practice is usually to either pre-allocate (if you know the size) or build the array as a list first (using list.append). But lists don't build in 2d very well, which I assume you want since you specified a "matrix."
In that case, I'd suggest pre-allocating an oversize scipy.sparse matrix. These can be defined to have a size much larger than your memory, and lil_matrix or dok_matrix can be built sequentially. Then you can pare it down once you enter all of your data.
import numpy as np
from scipy.sparse import dok_matrix
dummy = dok_matrix((1000000, 1000000))  # as big as you think you might need
for i, j, data in generator():  # generator() stands in for your data source
    dummy[i, j] = data
s = np.array(list(dummy.keys())).max() + 1
M = dummy.tocsr()[:s, :s]  # or .tobsr(), .toarray(), etc.
This way you build your array as a Dictionary of Keys (dictionaries support dynamic assignment much better than ndarray does), but you still have a matrix-like output that can be (somewhat) efficiently used for math, even in a partially built state.

How do I declare an array in Python?

How do I declare an array in Python?
variable = []
Now variable refers to an empty list*.
Of course this is an assignment, not a declaration. There's no way to say in Python "this variable should never refer to anything other than a list", since Python is dynamically typed.
*The default built-in Python type is called a list, not an array. It is an ordered container of arbitrary length that can hold a heterogeneous collection of objects (their types do not matter and can be freely mixed). This should not be confused with the array module, which offers a type closer to the C array type; the contents must be homogeneous (all of the same type), but the length is still dynamic.
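A minimal illustration of that difference:
from array import array
a = array('i', [1, 2, 3])  # homogeneous: typecode 'i' admits ints only
lst = [1, "two", 3.0]      # heterogeneous: a list mixes types freely
a.append(4)                # fine
# a.append("five")         # TypeError: an integer is required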
This is a surprisingly complex topic in Python.
Practical answer
Arrays are represented by the class list (see the reference docs, and don't mix them up with generators).
Check out usage examples:
# empty array
arr = []
# init with values (can contain mixed types)
arr = [1, "eels"]
# get item by index (can be negative to access end of array)
arr = [1, 2, 3, 4, 5, 6]
arr[0] # 1
arr[-1] # 6
# get length
length = len(arr)
# supports append and insert
arr.append(8)
arr.insert(6, 7)
Theoretical answer
Under the hood, Python's list is a wrapper for a real array which contains references to items. Also, the underlying array is created with some extra space.
Consequences of this are:
random access is really cheap (arr[6653] costs the same as arr[0])
the append operation is 'for free' (amortized O(1)) while there is extra space left
the insert operation is expensive
Check the table of operation complexities on the Python wiki.
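A quick (unscientific) way to feel the difference between append and insert:
import timeit
print(timeit.timeit('a.append(0)', setup='a = []', number=100000))    # amortized O(1)
print(timeit.timeit('a.insert(0, 0)', setup='a = []', number=100000)) # O(n) each time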
[Figure: the most important differences between an array, an array of references, and a linked list]
You don't actually declare things, but this is how you create an array in Python:
from array import array
intarray = array('i')
For more info see the array module: http://docs.python.org/library/array.html
Now it's possible you don't want an array, but a list; others have answered that already. :)
I think you meant that you want a list with the first 30 cells already filled.
So
f = []
for i in range(30):
    f.append(0)
An example of where this could be used is the Fibonacci sequence.
See problem 2 in Project Euler
This is how:
my_array = [1, 'rebecca', 'allard', 15]
For calculations, use numpy arrays like this:
import numpy as np
a = np.ones((3,2)) # a 2D array with 3 rows, 2 columns, filled with ones
b = np.array([1,2,3]) # a 1D array initialised using a list [1,2,3]
c = np.linspace(2,3,100) # an array with 100 points between (and including) 2 and 3
print(a*1.5) # all elements of a times 1.5
print(a.T+b) # b added to the transpose of a
These numpy arrays can be saved to and loaded from disk (even compressed), and complex calculations with large numbers of elements are C-like fast.
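For instance, with np.save / np.savez_compressed and np.load (using the a and b from above):
np.save('a.npy', a)                      # one array, binary .npy file
np.savez_compressed('ab.npz', a=a, b=b)  # several arrays, compressed .npz
a2 = np.load('a.npy')
ab = np.load('ab.npz')
print((ab['a'] == a).all())  # True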
Much used in scientific environments. See the numpy documentation for more.
JohnMachin's comment should be the real answer.
All the other answers are just workarounds in my opinion!
So:
array = [0] * element_count
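One caveat when extending this idiom to two dimensions (a side note, not part of the original answer): multiplying a list of lists copies references to the same inner list, so build the rows with a comprehension instead:
grid = [[0] * 3] * 3                # three references to the SAME row
grid[0][0] = 1
print(grid)                         # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
grid = [[0] * 3 for _ in range(3)]  # independent rows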
A couple of contributions suggested that arrays in Python are represented by lists. This is incorrect. Python has an independent array implementation in the standard library module array ("array.array()"), so it is incorrect to confuse the two. Lists are lists in Python, so be careful with the nomenclature used.
list_01 = [4, 6.2, 7-2j, 'flo', 'cro']
list_01
Out[85]: [4, 6.2, (7-2j), 'flo', 'cro']
There is one very important difference between list and array.array(). While both of these objects are ordered sequences, array.array() is an ordered homogeneous sequence whereas a list is a non-homogeneous sequence.
You don't declare anything in Python. You just use it. I recommend you start out with something like http://diveintopython.net.
I would normally just do a = [1, 2, 3], which is actually a list, but for arrays look at the formal definition of the array module.
To add to Lennart's answer, an array may be created like this:
from array import array
float_array = array("f",values)
where values can take the form of a tuple, list, or np.array (another array works too, provided its typecode matches):
values = [1, 2, 3]
values = (1, 2, 3)
values = np.array([1, 2, 3], 'f')
# 'i' will work here too, but if the typecode is 'i' then the values have to be ints
# Beware of shadowing the name: after `array = array('f', [1, 2, 3])`,
# any further call like array('f', values) fails with
# TypeError: 'array.array' object is not callable
and the output will still be the same:
print(float_array)
print(float_array[1])
print(isinstance(float_array[1],float))
# array('f', [1.0, 2.0, 3.0])
# 2.0
# True
Most methods for list work with array as well, common ones being pop(), extend(), and append().
Judging from the answers and comments, it appears that the array data structure isn't that popular. I like it though, the same way one might prefer a tuple over a list.
The array structure has stricter rules than a list or np.array, and this can reduce errors and make debugging easier, especially when working with numerical data.
Attempts to insert/append a float to an int array will throw a TypeError:
values = [1,2,3]
int_array = array("i",values)
int_array.append(float(1))
# or int_array.extend([float(1)])
# TypeError: integer argument expected, got float
Keeping values which are meant to be integers (e.g. a list of indices) in the array form may therefore prevent a "TypeError: list indices must be integers, not float", since arrays can be iterated over, just like np.array and lists:
int_array = array('i',[1,2,3])
data = [11,22,33,44,55]
sample = []
for i in int_array:
    sample.append(data[i])
Annoyingly, appending an int to a float array will cause the int to become a float, without throwing an exception.
np.array retains the same data type for its entries too, but instead of giving an error it will change its data type to fit new entries (usually to double or str):
import numpy as np
numpy_int_array = np.array([1,2,3],'i')
for i in numpy_int_array:
    print(type(i))
# <class 'numpy.int32'>
numpy_int_array_2 = np.append(numpy_int_array,int(1))
# still <class 'numpy.int32'>
numpy_float_array = np.append(numpy_int_array,float(1))
# <class 'numpy.float64'> for all values
numpy_str_array = np.append(numpy_int_array,"1")
# <class 'numpy.str_'> for all values
data = [11,22,33,44,55]
sample = []
for i in numpy_int_array_2:
    sample.append(data[i])
# no problem here, but TypeError for the other two
This is true during assignment as well. If the data type is specified, np.array will, wherever possible, transform the entries to that data type:
int_numpy_array = np.array([1,2,float(3)],'i')
# 3 becomes an int
int_numpy_array_2 = np.array([1,2,3.9],'i')
# 3.9 gets truncated to 3 (same as int(3.9))
invalid_array = np.array([1,2,"string"],'i')
# ValueError: invalid literal for int() with base 10: 'string'
# Same error as int('string')
str_numpy_array = np.array([1,2,3],'str')
print(str_numpy_array)
print([type(i) for i in str_numpy_array])
# ['1' '2' '3']
# <class 'numpy.str_'>
or, in essence:
data = [1.2,3.4,5.6]
list_1 = np.array(data,'i').tolist()
list_2 = [int(i) for i in data]
print(list_1 == list_2)
# True
while array will simply give:
invalid_array = array([1,2,3.9],'i')
# TypeError: integer argument expected, got float
Because of this, it is not a good idea to use np.array for type-specific commands. The array structure is useful here. list preserves the data type of the values.
And for something I find rather pesky: the data type is specified as the first argument in array(), but (usually) the second in np.array(). :|
The relation to C is referred to here:
Python List vs. Array - when to use?
Have fun exploring!
Note: The typed and rather strict nature of array leans more towards C rather than Python, and by design Python does not have many type-specific constraints in its functions. Its unpopularity also creates a positive feedback in collaborative work, and replacing it mostly involves an additional [int(x) for x in file]. It is therefore entirely viable and reasonable to ignore the existence of array. It shouldn't hinder most of us in any way. :D
How about this...
>>> a = range(12)
>>> a
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>> a[7]
7
Following on from Lennart, there's also numpy which implements homogeneous multi-dimensional arrays.
Python calls them lists. You can write a list literal with square brackets and commas:
>>> [6,28,496,8128]
[6, 28, 496, 8128]
I had an array of strings and needed an array of the same length of booleans initiated to True. This is what I did
strs = ["Hi","Bye"]
bools = [ True for s in strs ]
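An equivalent, slightly shorter spelling uses list multiplication (safe here because bools are immutable):
bools = [True] * len(strs)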
You can create lists and convert them into arrays, or you can create an array using the numpy module. Below are a few examples to illustrate this. numpy also makes it easier to work with multi-dimensional arrays.
import numpy as np
a = np.array([1, 2, 3, 4])
#For custom inputs
a = np.array([int(x) for x in input().split()])
You can also reshape this array into a 2x2 matrix using the reshape function, which takes the desired dimensions of the matrix as input.
mat = a.reshape(2, 2)
# This creates a list of 5000 zeros
a = [0] * 5000
You can read and write to any element in this list with a[n] notation in the same way as you would with an array.
It does seem to have the same random access performance as an array. I cannot say how it allocates memory because it also supports a mix of different types including strings and objects if you need it to.

2D arrays in Python

What's the best way to create 2D arrays in Python?
What I want is to store values like this:
X , Y , Z
so that I access data like X[2],Y[2],Z[2] or X[n],Y[n],Z[n] where n is variable.
I don't know in the beginning how big n would be so I would like to append values at the end.
>>> a = []
>>> for i in xrange(3):
...     a.append([])
...     for j in xrange(3):
...         a[i].append(i + j)
...
>>> a
[[0, 1, 2], [1, 2, 3], [2, 3, 4]]
>>>
Depending what you're doing, you may not really have a 2-D array.
80% of the time you have simple list of "row-like objects", which might be proper sequences.
myArray = [ ('pi',3.14159,'r',2), ('e',2.71828,'theta',.5) ]
myArray[0][1] == 3.14159
myArray[1][1] == 2.71828
More often, they're instances of a class or a dictionary or a set or something more interesting that you didn't have in your previous languages.
myArray = [ {'pi':3.1415925,'r':2}, {'e':2.71828,'theta':.5} ]
20% of the time you have a dictionary, keyed by a pair
myArray = { (2009,'aug'):(some,tuple,of,values), (2009,'sep'):(some,other,tuple) }
Rarely will you actually need a matrix.
You have a large, large number of collection classes in Python. Odds are good that you have something more interesting than a matrix.
In Python one would usually use lists for this purpose. Lists can be nested arbitrarily, thus allowing the creation of a 2D array. Not every sublist needs to be the same size, so that solves your other problem. Have a look at the examples I linked to.
If you want to do some serious work with arrays then you should use the numpy library. This will allow you for example to do vector addition and matrix multiplication, and for large arrays it is much faster than Python lists.
However, numpy requires that the size is predefined. Of course you can also store numpy arrays in a list, like:
import numpy as np
vec_list = [np.zeros((3,)) for _ in range(10)]
vec_list.append(np.array([1,2,3]))
vec_sum = vec_list[0] + vec_list[1] # possible because we use numpy
print vec_list[10][2] # prints 3
But since your numpy arrays are pretty small I guess there is some overhead compared to using a tuple. It all depends on your priorities.
See also this other question, which is pretty similar (apart from the variable size).
I would suggest that you use a dictionary like so:
arr = {}
arr[1] = (1, 2, 4)
arr[18] = (3, 4, 5)
print(arr[1])
>>> (1, 2, 4)
If you're not sure an entry is defined in the dictionary, you'll need a validation mechanism when calling "arr[x]", e.g. try-except.
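A small sketch of that validation, using the arr from above (index 2 was never set):
try:
    print(arr[2])
except KeyError:
    print("no entry at index 2")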
If you are concerned about memory footprint, the Python standard library contains the array module; these arrays contain elements of the same type.
Please consider the following code:
from numpy import zeros
scores = zeros((len(chain1), len(chain2)), float)  # chain1 and chain2 are defined elsewhere
x = list()
def enter(n):
    y = list()
    for i in range(0, n):
        y.append(int(input("Enter ")))
    return y
for i in range(0, 2):
    x.insert(i, enter(2))
print(x)
Here I made a function to create a 1-D array and inserted it into another array as a member. With multiple 1-D arrays inside an array, you create a multi-dimensional array as the values of n and i change.
