How to find adjacent lines on a regular 3D grid in Python

I have the coordinates of a set of points and want to create surfaces out of them in a Python package. I want to arrange my data before importing it into the package. The points come from a regular grid. First, I create lines based on the location of the points. In this step I only define which point numbers make up each line. My input data is:
coord = np.array(
[[0., 0., 2.], [0., 1., 3.], [0., 2., 2.], [1., 0., 1.], [1., 1., 3.],
[1., 2., 1.], [2., 0., 1.], [2., 1., 1.], [3., 0., 1.], [4., 0., 1.]])
The figure below shows the numbers of the grid points (gray) and the numbers of the lines (blue and red).
The lines are modeled through dictionaries, in which the key is the line number, and the value is a tuple with the start and end points numbers:
In [906]: blue_line
Out[906]: {1: (1, 2), 2: (2, 3), 3: (4, 5), 4: (5, 6), 5: (7, 8)}
In [907]: red_line
Out[907]:
{6: (1, 4),
7: (2, 5),
8: (3, 6),
9: (4, 7),
10: (5, 8),
11: (7, 9),
12: (9, 10)}
To learn more about how the lines are generated, check out this thread. The lines that are used to create the surfaces are stored in a list:
surfaces = [(1, 6, 3, 7), (2, 7, 4, 8), (3, 9, 5, 10)]
As the last step, I want to find the numbers of the lines that are either not used in creating the surfaces, or are used but lie closer than a given limit to the dashed line in the figure above. Again, I have the coordinates of the two points defining that dashed line:
coord_dash = [(2., 2., 2.), (5., 0., 1.)]
adjacency_threshold = 2
I want to have these adjacent lines as another list (shown by a red arrow in the figure):
adjacent_lines = [4, 10, 5, 11, 12]
I have only this rough idea and do not know how to code it in Python. I can only create line numbers and surfaces and need help to find those close lines.

Determining what lines have not been used is straightforward (NumPy's setdiff1d comes in handy for this task):
In [924]: all_line = {**blue_line, **red_line}
In [925]: lines = list(all_line.keys())
In [926]: lines
Out[926]: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
In [927]: used_lines = np.ravel(surfaces)
In [928]: used_lines
Out[928]: array([ 1, 6, 3, 7, 2, 7, 4, 8, 3, 9, 5, 10])
In [929]: unused_lines = np.setdiff1d(lines, used_lines)
In [930]: unused_lines
Out[930]: array([11, 12])
The adjacent lines can be obtained by using NumPy's linalg.norm:
In [954]: midpoints
Out[954]:
{1: array([0. , 0.5, 2.5]),
2: array([0. , 1.5, 2.5]),
3: array([1. , 0.5, 2. ]),
4: array([1. , 1.5, 2. ]),
5: array([2. , 0.5, 1. ]),
6: array([0.5, 0. , 1.5]),
7: array([0.5, 1. , 3. ]),
8: array([0.5, 2. , 1.5]),
9: array([1.5, 0. , 1. ]),
10: array([1.5, 1. , 2. ]),
11: array([2.5, 0. , 1. ]),
12: array([3.5, 0. , 1. ])}
In [955]: mid_dash = np.array(coord_dash).mean(axis=0)
In [956]: mid_dash
Out[956]: array([3.5, 1. , 1.5])
In [957]: adjacent_lines = []
     ...: for idx, point in midpoints.items():
     ...:     if np.linalg.norm(point - mid_dash) < adjacency_threshold:
     ...:         adjacent_lines.append(idx)
In [958]: adjacent_lines
Out[958]: [5, 11, 12]
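The transcript above uses a midpoints dictionary without showing how it is built. A minimal sketch, assuming each midpoint is simply the mean of the line's two end points and that point numbers are 1-based indices into coord, could look like this. Note that it measures midpoint-to-midpoint distance, as the answer does, not the true distance between line segments.

```python
import numpy as np

coord = np.array(
    [[0., 0., 2.], [0., 1., 3.], [0., 2., 2.], [1., 0., 1.], [1., 1., 3.],
     [1., 2., 1.], [2., 0., 1.], [2., 1., 1.], [3., 0., 1.], [4., 0., 1.]])
blue_line = {1: (1, 2), 2: (2, 3), 3: (4, 5), 4: (5, 6), 5: (7, 8)}
red_line = {6: (1, 4), 7: (2, 5), 8: (3, 6), 9: (4, 7), 10: (5, 8),
            11: (7, 9), 12: (9, 10)}
coord_dash = [(2., 2., 2.), (5., 0., 1.)]
adjacency_threshold = 2

all_line = {**blue_line, **red_line}

# Assumption: point numbers are 1-based, so subtract 1 to index into coord.
midpoints = {
    idx: (coord[start - 1] + coord[end - 1]) / 2
    for idx, (start, end) in all_line.items()
}

# Midpoint of the dashed line, then keep the lines whose midpoint is close to it.
mid_dash = np.array(coord_dash).mean(axis=0)
adjacent_lines = [
    idx for idx, point in midpoints.items()
    if np.linalg.norm(point - mid_dash) < adjacency_threshold
]
print(adjacent_lines)  # [5, 11, 12]
```

The unused lines found with setdiff1d (11 and 12) already appear in this result, so no extra merging is needed for this data set.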

Related

Cannot iterate over reverb replay buffer due to dimensionality issue

I am trying to follow tensorflow's REINFORCE agent tutorial. It works when I use their code, but when I substitute my own environment I get this error:
Received incompatible tensor at flattened index 0 from table 'uniform_table'. Specification has (dtype, shape): (int32, [?]). Tensor has (dtype, shape): (int32, [92,1]).
Table signature: 0: Tensor<name: 'step_type/step_type', dtype: int32, shape: [?]>, 1: Tensor<name: 'observation/observation', dtype: double, shape: [?,18]>, 2: Tensor<name: 'action/action', dtype: float, shape: [?,2]>, 3: Tensor<name: 'next_step_type/step_type', dtype: int32, shape: [?]>, 4: Tensor<name: 'reward/reward', dtype: float, shape: [?]>, 5: Tensor<name: 'discount/discount', dtype: float, shape: [?]> [Op:IteratorGetNext]
This is interesting because 92 is exactly the number of steps in the episode.
The table signature when using my environment is:
Trajectory(
{'action': BoundedTensorSpec(shape=(None, 2), dtype=tf.float32, name='action', minimum=array(0., dtype=float32), maximum=array(3.4028235e+38, dtype=float32)),
'discount': BoundedTensorSpec(shape=(None,), dtype=tf.float32, name='discount', minimum=array(0., dtype=float32), maximum=array(1., dtype=float32)),
'next_step_type': TensorSpec(shape=(None,), dtype=tf.int32, name='step_type'),
'observation': BoundedTensorSpec(shape=(None, 18), dtype=tf.float64, name='observation', minimum=array([0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
7.5189e+02, 6.1000e-01, 1.0860e+01, 1.0000e+00, 0.0000e+00,
0.0000e+00, 0.0000e+00, 0.0000e+00]), maximum=array(1.79769313e+308)),
'policy_info': (),
'reward': TensorSpec(shape=(None,), dtype=tf.float32, name='reward'),
'step_type': TensorSpec(shape=(None,), dtype=tf.int32, name='step_type')})
And when using the working tutorial environment:
Trajectory(
{'action': BoundedTensorSpec(shape=(None,), dtype=tf.int64, name='action', minimum=array(0), maximum=array(1)),
'discount': BoundedTensorSpec(shape=(None,), dtype=tf.float32, name='discount', minimum=array(0., dtype=float32), maximum=array(1., dtype=float32)),
'next_step_type': TensorSpec(shape=(None,), dtype=tf.int32, name='step_type'),
'observation': BoundedTensorSpec(shape=(None, 4), dtype=tf.float32, name='observation', minimum=array([-4.8000002e+00, -3.4028235e+38, -4.1887903e-01, -3.4028235e+38],
dtype=float32), maximum=array([4.8000002e+00, 3.4028235e+38, 4.1887903e-01, 3.4028235e+38],
dtype=float32)),
'policy_info': (),
'reward': TensorSpec(shape=(None,), dtype=tf.float32, name='reward'),
'step_type': TensorSpec(shape=(None,), dtype=tf.int32, name='step_type')})
The only dimensional differences are that in my case the agent produces an action composed of 2 scalar numbers while in the tutorial the action is composed of only one, and my observation is longer. Regardless, the unknown dimension precedes the known dimension.
The trajectories that are used as input for the replay buffer also match up; I printed their dimensions as they were created first for my version:
[(92, 1), (92, 1, 18), (92, 1, 2), (92, 1), (92, 1), (92, 1)]
[(92, 1), (92, 1, 18), (92, 1, 2), (92, 1), (92, 1), (92, 1)]
[(92, 1), (92, 1, 18), (92, 1, 2), (92, 1), (92, 1), (92, 1)]
[(92, 1), (92, 1, 18), (92, 1, 2), (92, 1), (92, 1), (92, 1)]
and then for the tutorial version:
[(9, 1), (9, 1, 4), (9, 1), (9, 1), (9, 1), (9, 1)]
[(11, 1), (11, 1, 4), (11, 1), (11, 1), (11, 1), (11, 1)]
[(10, 1), (10, 1, 4), (10, 1), (10, 1), (10, 1), (10, 1)]
[(10, 1), (10, 1, 4), (10, 1), (10, 1), (10, 1), (10, 1)]
[(10, 1), (10, 1, 4), (10, 1), (10, 1), (10, 1), (10, 1)]
[(10, 1), (10, 1, 4), (10, 1), (10, 1), (10, 1), (10, 1)]
[(9, 1), (9, 1, 4), (9, 1), (9, 1), (9, 1), (9, 1)]
[(9, 1), (9, 1, 4), (9, 1), (9, 1), (9, 1), (9, 1)]
[(9, 1), (9, 1, 4), (9, 1), (9, 1), (9, 1), (9, 1)]
[(9, 1), (9, 1, 4), (9, 1), (9, 1), (9, 1), (9, 1)]
So each of the entries in the trajectory, for both versions, has the shape (number of steps, batch size, (if entry itself is a list) value length).
I get the error mentioned at the start of the question when running the second of these two lines of code:
iterator = iter(replay_buffer.as_dataset(sample_batch_size=1))
trajectories, _ = next(iterator)
However, these lines of code run successfully using the tutorial's code, and 'trajectories' is as follows:
Trajectory(
{'action': <tf.Tensor: shape=(1, 50), dtype=int64, numpy=
array([[0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0,
1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0,
1, 0, 0, 0, 1, 1]])>,
'discount': <tf.Tensor: shape=(1, 50), dtype=float32, numpy=
array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
0., 1.]], dtype=float32)>,
'next_step_type': <tf.Tensor: shape=(1, 50), dtype=int32, numpy=
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 2, 0]], dtype=int32)>,
'observation': <tf.Tensor: shape=(1, 50, 4), dtype=float32, numpy=
array([[[ 0.02992676, 0.01392324, 0.03861422, -0.04107672],
[ 0.03020522, -0.18173054, 0.03779269, 0.26353496],
[ 0.02657061, -0.37737098, 0.04306339, 0.56789446],
[ 0.01902319, -0.18287869, 0.05442128, 0.2890832 ],
[ 0.01536562, 0.01142669, 0.06020294, 0.0140486 ],
[ 0.01559415, 0.20563589, 0.06048391, -0.25904846],
[ 0.01970687, 0.39984456, 0.05530294, -0.53205734],
[ 0.02770376, 0.59414685, 0.04466179, -0.80681443],
[ 0.0395867 , 0.39844212, 0.02852551, -0.50042385],
[ 0.04755554, 0.2029299 , 0.01851703, -0.19888948],
[ 0.05161414, 0.39778218, 0.01453924, -0.48567408],
[ 0.05956978, 0.59269595, 0.00482576, -0.7737395 ],
[ 0.0714237 , 0.39750797, -0.01064903, -0.47954214],
[ 0.07937386, 0.5927786 , -0.02023987, -0.7755622 ],
[ 0.09122943, 0.3979408 , -0.03575112, -0.48931554],
[ 0.09918825, 0.20334099, -0.04553743, -0.20811091],
[ 0.10325507, 0.39908352, -0.04969965, -0.5148037 ],
[ 0.11123674, 0.59486884, -0.05999572, -0.82272476],
[ 0.12313411, 0.40061677, -0.07645022, -0.54949903],
[ 0.13114645, 0.20664726, -0.0874402 , -0.2818491 ],
[ 0.1352794 , 0.01287431, -0.09307718, -0.0179748 ],
[ 0.13553688, -0.18079808, -0.09343667, 0.24395113],
[ 0.13192092, -0.37446988, -0.08855765, 0.50576115],
[ 0.12443152, -0.17821889, -0.07844243, 0.18653633],
[ 0.12086715, 0.01793264, -0.0747117 , -0.12982464],
[ 0.1212258 , -0.17604397, -0.0773082 , 0.13838378],
[ 0.11770492, 0.02009523, -0.07454053, -0.17765227],
[ 0.11810682, -0.17388523, -0.07809357, 0.09061581],
[ 0.11462912, 0.02226418, -0.07628125, -0.22564775],
[ 0.1150744 , -0.17168939, -0.08079421, 0.04203164],
[ 0.11164062, 0.02449259, -0.07995357, -0.27500907],
[ 0.11213046, -0.16940299, -0.08545376, -0.00857614],
[ 0.10874241, -0.36320207, -0.08562528, 0.2559689 ],
[ 0.10147836, -0.5570038 , -0.0805059 , 0.52046335],
[ 0.09033829, -0.3608463 , -0.07009663, 0.20353697],
[ 0.08312136, -0.55489945, -0.06602589, 0.47331032],
[ 0.07202338, -0.7490298 , -0.05655969, 0.7444739 ],
[ 0.05704278, -0.5531748 , -0.04167021, 0.43454146],
[ 0.04597928, -0.35748845, -0.03297938, 0.12901925],
[ 0.03882951, -0.16190998, -0.03039899, -0.17388314],
[ 0.03559131, 0.03363356, -0.03387666, -0.47599885],
[ 0.03626398, 0.22921707, -0.04339663, -0.77916366],
[ 0.04084833, 0.42490798, -0.05897991, -1.0851783 ],
[ 0.04934648, 0.6207563 , -0.08068347, -1.39577 ],
[ 0.06176161, 0.4267255 , -0.10859887, -1.1293658 ],
[ 0.07029612, 0.623089 , -0.13118619, -1.4540412 ],
[ 0.0827579 , 0.42979917, -0.16026701, -1.205056 ],
[ 0.09135389, 0.23706956, -0.18436813, -0.96658343],
[ 0.09609528, 0.04483784, -0.2036998 , -0.7370203 ],
[ 0.09699203, 0.24210311, -0.2184402 , -1.0862749 ]]],
dtype=float32)>,
'policy_info': (),
'reward': <tf.Tensor: shape=(1, 50), dtype=float32, numpy=
array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 0.]], dtype=float32)>,
'step_type': <tf.Tensor: shape=(1, 50), dtype=int32, numpy=
array([[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 2]], dtype=int32)>})
So when everything is working correctly, feeding trajectories with entries of shape (number of steps, batch size, (if entry itself is a list) value length) to the replay buffer facilitates creation of a dataset where each entry in each row has shape (batch size, number of steps, (if entry itself is a list) value length).
However, in my version, each entry in each row of the dataset keeps its original shape, causing the error. Does anyone experienced with reverb know why this might be happening?
I did a lot more digging into the TensorFlow backend. The problem is caused by the fact that the cartpole gym wrapper creates a non-batched Python environment, while the default is a batched environment. So when I run my code, an additional (batch) dimension is added to the trajectories before they are stored in the reverb table. However, since I am using the same table signature, attempting to pull an entry out of the table raises an exception saying the dimensions are incorrect, because that signature conflicts with the actual shape of the entries.
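The shape conflict can be illustrated with plain NumPy. This is only a sketch of the mismatch reported in the error message, not TF-Agents or Reverb code; matches_spec is a hypothetical helper, and whether dropping the extra axis or changing the table signature is the right fix depends on the setup.

```python
import numpy as np

def matches_spec(shape, spec):
    """Hypothetical helper: None in spec means 'any length' for that axis."""
    return len(shape) == len(spec) and all(
        s is None or s == dim for dim, s in zip(shape, spec)
    )

spec = (None,)                                # like the table's step_type: shape [?]
step_type = np.ones((92, 1), dtype=np.int32)  # one episode with an extra batch axis

print(matches_spec(step_type.shape, spec))    # False: (92, 1) has one axis too many

unbatched = np.squeeze(step_type, axis=1)     # drop the batch axis of length 1
print(unbatched.shape)                        # (92,)
print(matches_spec(unbatched.shape, spec))    # True
```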

Understanding axes in NumPy

I was going through NumPy documentation, and am not able to understand one point. It mentions, for the example below, the array has rank 2 (it is 2-dimensional). The first dimension (axis) has a length of 2, the second dimension has a length of 3.
[[ 1., 0., 0.],
[ 0., 1., 2.]]
How does the first dimension (axis) have a length of 2?
Edit:
The reason for my confusion is the below statement in the documentation.
The coordinates of a point in 3D space [1, 2, 1] is an array of rank
1, because it has one axis. That axis has a length of 3.
In the original 2D ndarray, I assumed that the number of lists identifies the rank/dimension, and I wrongly assumed that the length of each list denotes the length of each dimension (in that order). So, as per my understanding, the first dimension should be having a length of 3, since the length of the first list is 3.
In numpy, axis ordering follows zyx convention, instead of the usual (and maybe more intuitive) xyz.
Visually, it means that for a 2D array where the horizontal axis is x and the vertical axis is y:
x -->
y 0 1 2
| 0 [[1., 0., 0.],
V 1 [0., 1., 2.]]
The shape of this array is (2, 3) because it is ordered (y, x), with the first axis y of length 2.
And verifying this with slicing:
import numpy as np
a = np.array([[1, 0, 0], [0, 1, 2]], dtype=float)
>>> a
Out[]:
array([[ 1., 0., 0.],
[ 0., 1., 2.]])
>>> a[0, :] # Slice index 0 of first axis
Out[]: array([ 1., 0., 0.]) # Get values along second axis `x` of length 3
>>> a[:, 2] # Slice index 2 of second axis
Out[]: array([ 0., 2.]) # Get values along first axis `y` of length 2
You may be confusing the other sentence with the picture example below. Think of it like this: rank = the number of nested list levels in the array, and the term length in your question can be thought of as length = the number of 'things' in the list (array).
I think they are trying to describe the definition of shape, which in this case is (2, 3).
In that post, I think the key sentence is this:
In NumPy dimensions are called axes. The number of axes is rank.
If you print the numpy array
print(np.array([[1., 0., 0.], [0., 1., 2.]]))
You'll get the following output
#col1 col2 col3
[[ 1. 0. 0.] # row 1
[ 0. 1. 2.]] # row 2
Think of it as a 2 by 3 matrix: 2 rows, 3 columns. It is a 2d array because it is a list of lists (the [[ at the start is a hint that it's 2d).
The 2d numpy array
np.array([[1., 0., 0., 6.], [0., 1., 2., 7.], [3., 4., 5., 8.]])
would print as
#col1 col2 col3 col4
[[ 1. 0. 0. 6.] # row 1
[ 0. 1. 2. 7.] # row 2
[ 3. 4. 5. 8.]] # row 3
This is a 3 by 4 2d array (3 rows, 4 columns).
The length of the first dimension is what len returns:
In [11]: a = np.array([[ 1., 0., 0.], [ 0., 1., 2.]])
In [12]: a
Out[12]:
array([[ 1., 0., 0.],
[ 0., 1., 2.]])
In [13]: len(a) # "length of first dimension"
Out[13]: 2
The second is the length of each "row":
In [14]: [len(aa) for aa in a] # 3 is "length of second dimension"
Out[14]: [3, 3]
Many numpy functions take axis as an argument, for example you can sum over an axis:
In [15]: a.sum(axis=0)
Out[15]: array([ 1., 1., 2.])
In [16]: a.sum(axis=1)
Out[16]: array([ 1., 3.])
The thing to note is that you can have higher dimensional arrays:
In [21]: b = np.array([[[1., 0., 0.], [ 0., 1., 2.]]])
In [22]: b
Out[22]:
array([[[ 1., 0., 0.],
[ 0., 1., 2.]]])
In [23]: b.sum(axis=2)
Out[23]: array([[ 1., 3.]])
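One way to read the axis argument is that summing over an axis removes that axis from the shape. A short sketch on the same arrays (np.newaxis is used here only to rebuild b from a):

```python
import numpy as np

a = np.array([[1., 0., 0.], [0., 1., 2.]])   # shape (2, 3)

print(a.sum(axis=0).shape)   # (3,)  axis 0 (length 2) is summed away
print(a.sum(axis=1).shape)   # (2,)  axis 1 (length 3) is summed away

b = a[np.newaxis, ...]       # shape (1, 2, 3), same data as b above
print(b.sum(axis=2).shape)   # (1, 2)
```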
Keep the following points in mind when considering Numpy axes:
Each sub-level of a list (or array) represents an axis. For example:
import numpy as np
a = np.array([1,2]) # 1 axis
b = np.array([[1,2],[3,4]]) # 2 axes
c = np.array([[[1,2],[3,4]],[[5,6],[7,8]]]) # 3 axes
Axis labels correspond to the level of the sub-list they represent, starting with axis 0 for the outer most list.
To illustrate this, consider the following arrays of different shapes, each with 24 elements:
# 1D Array
a0 = np.array(
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
)
a0.shape # (24,) - here, the length along the 0-axis is 24
# 2D Array
a01 = np.array(
[
[1.1, 1.2, 1.3, 1.4],
[2.1, 2.2, 2.3, 2.4],
[3.1, 3.2, 3.3, 3.4],
[4.1, 4.2, 4.3, 4.4],
[5.1, 5.2, 5.3, 5.4],
[6.1, 6.2, 6.3, 6.4]
]
)
a01.shape # (6, 4) - now, the length along the 0-axis is 6
# 3D Array
# each entry encodes its 1-based position along axes 0, 1, 2 (e.g. 231)
a012 = np.array(
[
[
[111, 112],
[121, 122],
[131, 132]
],
[
[211, 212],
[221, 222],
[231, 232]
],
[
[311, 312],
[321, 322],
[331, 332]
],
[
[411, 412],
[421, 422],
[431, 432]
]
]
)
a012.shape # (4, 3, 2) - and finally, the length along the 0-axis is 4
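The same three shapes can also be produced from one flat array with reshape, which makes it easy to check that the length along axis 0 is what len reports. This is only an illustrative sketch; the names flat, grid and cube are not from the answer above.

```python
import numpy as np

flat = np.arange(1, 25)        # 24 elements, shape (24,)
grid = flat.reshape(6, 4)      # 2 axes
cube = flat.reshape(4, 3, 2)   # 3 axes

for arr in (flat, grid, cube):
    # number of axes, full shape, and length along axis 0
    print(arr.ndim, arr.shape, len(arr))

# The last element sits at the last index along every axis.
print(cube[3, 2, 1])  # 24
```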

How to split an array based on minimum row value using vectorization

I am trying to figure out how to vectorize the following for loop, which splits an array based on the index of the lowest value in each row. I've looked at this link and have been trying to use the numpy.where function, but so far without success.
For example, if an array has n columns, then all the rows where col[0] has the lowest value are put in one array, all the rows where col[1] has the lowest value are put in another, etc.
Here's the code using a for loop.
import numpy
a = numpy.array([[0., 1., 3.],
                 [0., 1., 3.],
                 [0., 1., 3.],
                 [1., 0., 2.],
                 [1., 0., 2.],
                 [1., 0., 2.],
                 [3., 1., 0.],
                 [3., 1., 0.],
                 [3., 1., 0.]])
result_0 = []
result_1 = []
result_2 = []
for value in a:
    if value[0] <= value[1] and value[0] <= value[2]:
        result_0.append(value)
    elif value[1] <= value[0] and value[1] <= value[2]:
        result_1.append(value)
    else:
        result_2.append(value)
print(result_0)
>>[array([ 0. 1. 3.]), array([ 0. 1. 3.]), array([ 0. 1. 3.])]
print(result_1)
>>[array([ 1. 0. 2.]), array([ 1. 0. 2.]), array([ 1. 0. 2.])]
print(result_2)
>>[array([ 3. 1. 0.]), array([ 3. 1. 0.]), array([ 3. 1. 0.])]
First, use argsort to see where the lowest value in each row is:
>>> a.argsort(axis=1)
array([[0, 1, 2],
[0, 1, 2],
[0, 1, 2],
[1, 0, 2],
[1, 0, 2],
[1, 0, 2],
[2, 1, 0],
[2, 1, 0],
[2, 1, 0]])
Note that wherever a row has 0, that is the smallest column in that row.
Now you can build the results:
>>> sortidx = a.argsort(axis=1)
>>> [a[sortidx[:,i] == 0] for i in range(a.shape[1])]
[array([[ 0., 1., 3.],
[ 0., 1., 3.],
[ 0., 1., 3.]]),
array([[ 1., 0., 2.],
[ 1., 0., 2.],
[ 1., 0., 2.]]),
array([[ 3., 1., 0.],
[ 3., 1., 0.],
[ 3., 1., 0.]])]
So it is done with only a single loop over the columns, which will give a huge speedup if the number of rows is much larger than the number of columns.
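If only the position of the minimum matters, np.argmin gives it directly without a full sort. The following is a sketch of that variant rather than the method of the answer above; on ties, argmin returns the first minimal column, which matches the order of the <= checks in the original loop.

```python
import numpy as np

a = np.array([[0., 1., 3.], [0., 1., 3.], [0., 1., 3.],
              [1., 0., 2.], [1., 0., 2.], [1., 0., 2.],
              [3., 1., 0.], [3., 1., 0.], [3., 1., 0.]])

# Column index of the smallest value in each row.
min_col = a.argmin(axis=1)

# One boolean mask per column; the only Python-level loop is over columns.
groups = [a[min_col == i] for i in range(a.shape[1])]

for i, g in enumerate(groups):
    print(i, g.shape)
```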
This is not the best solution since it relies on plain Python loops and is not very efficient for large data sets, but it should get you started.
The idea is to create a list of "buckets", one per column. Then, for each row, enumerate its values, pick the smallest one, and use its offset to append the row to the matching bucket. Finally, the last loop prints the buckets.
Solution using loops:
import numpy
import pprint
# random data set
a = numpy.array([[0, 1, 3],
[0, 1, 3],
[0, 1, 3],
[1, 0, 2],
[1, 0, 2],
[1, 0, 2],
[3, 1, 0],
[3, 1, 0],
[3, 1, 0]])
# create a list of results as big as the depth of elements in an entry
results = list()
for l in range(max(len(i) for i in a)):
    results.append(list())
# don't do the following because all the references to the lists will be the same and you get dups:
# results = [[]]*max(len(i) for i in a)
for value in a:
    res_offset, _val = min(enumerate(value), key=lambda x: x[1]) # get the offset and min value
    results[res_offset].append(value) # store the original Array obj in the correct "bucket"
# print for visualization
for c, r in enumerate(results):
    print("result_%s: %s" % (c, r))
Outputs:
result_0: [array([0, 1, 3]), array([0, 1, 3]), array([0, 1, 3])]
result_1: [array([1, 0, 2]), array([1, 0, 2]), array([1, 0, 2])]
result_2: [array([3, 1, 0]), array([3, 1, 0]), array([3, 1, 0])]
I found a much easier way to do this. I hope that I am interpreting the OP correctly.
My sense is that the OP wants to create a slice of the larger array based upon some set of conditions.
Note that the code above to create the array does not seem to work, at least in Python 3.5. I generated the array as follows.
a = np.array([0., 1., 3., 0., 1., 3., 0., 1., 3., 1., 0., 2., 1., 0., 2.,1., 0., 2.,3., 1., 0.,3., 1., 0.,3., 1., 0.]).reshape([9,3])
Next, I sliced the original array into smaller arrays. Numpy has builtins to help with this.
result_0 = a[np.logical_and(a[:,0] <= a[:,1],a[:,0] <= a[:,2])]
result_1 = a[np.logical_and(a[:,1] <= a[:,0],a[:,1] <= a[:,2])]
result_2 = a[np.logical_and(a[:,2] <= a[:,0],a[:,2] <= a[:,1])]
This will generate new numpy arrays that match the given conditions.
Note that if the user wants to convert these individual rows into a list of arrays, the following code does that.
result_0 = [np.array(x) for x in result_0.tolist()]
result_1 = [np.array(x) for x in result_1.tolist()]
result_2 = [np.array(x) for x in result_2.tolist()]
This should generate the outcome requested in the OP.

Split NumPy array according to values in the array (a condition)

I have an array:
arr = [(1,1,1), (1,1,2), (1,1,3), (1,1,4)...(35,1,22),(35,1,23)]
I want to split my array according to the third value in each tuple. I want each third value of 1 to be the start of a new array. The results should be:
[(1,1,1), (1,1,2),...(1,1,35)][(1,2,1), (1,2,2),...(1,2,46)]
and so on. I know numpy.split should do the trick but I'm lost as to how to write the condition for the split.
Here's a quick idea, working with a 1d array. It can be easily extended to work with your 2d array:
In [385]: x=np.arange(10)
In [386]: I=np.where(x%3==0)
In [387]: I
Out[387]: (array([0, 3, 6, 9]),)
In [389]: np.split(x,I[0])
Out[389]:
[array([], dtype=float64),
array([0, 1, 2]),
array([3, 4, 5]),
array([6, 7, 8]),
array([9])]
The key is to use where to find the indices where you want split to act.
For a 2d array:
First make a sample 2d array, with something interesting in the 3rd column:
In [390]: arr=np.ones((10,3))
In [391]: arr[:,2]=np.arange(10)
In [392]: arr
Out[392]:
array([[ 1., 1., 0.],
[ 1., 1., 1.],
...
[ 1., 1., 9.]])
Then use the same where and boolean to find indexes to split on:
In [393]: I=np.where(arr[:,2]%3==0)
In [395]: np.split(arr,I[0])
Out[395]:
[array([], dtype=float64),
array([[ 1., 1., 0.],
[ 1., 1., 1.],
[ 1., 1., 2.]]),
array([[ 1., 1., 3.],
[ 1., 1., 4.],
[ 1., 1., 5.]]),
array([[ 1., 1., 6.],
[ 1., 1., 7.],
[ 1., 1., 8.]]),
array([[ 1., 1., 9.]])]
I cannot think of any NumPy functions or tricks to do this. A simple solution using a for loop would be:
In [48]: arr = [(1,1,1), (1,1,2), (1,1,3), (1,1,4),(1,2,1),(1,2,2),(1,2,3),(1,3,1),(1,3,2),(1,3,3),(1,3,4),(1,3,5)]
In [49]: result = []
In [50]: for i in arr:
   ....:     if i[2] == 1:
   ....:         tempres = []
   ....:         result.append(tempres)
   ....:     tempres.append(i)
   ....:
In [51]: result
Out[51]:
[[(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 4)],
[(1, 2, 1), (1, 2, 2), (1, 2, 3)],
[(1, 3, 1), (1, 3, 2), (1, 3, 3), (1, 3, 4), (1, 3, 5)]]
From looking at the documentation, it seems that specifying the indices at which to split will work best. For your specific example, the following works if arr is already a 2-dimensional NumPy array:
np.split(arr, np.where(arr[:,2] == 1)[0])
arr[:,2] returns an array of the 3rd entry in each tuple. The colon says to take every row and the 2 says to take the 3rd column, which is the 3rd component.
We then use np.where to return all the places where the 3rd coordinate is a 1. We have to do np.where()[0] to get at the array of locations directly.
We then plug in the indices we've found where the 3rd coordinate is 1 to np.split which splits at the desired locations.
Note that because the first entry has a 1 in the 3rd coordinate it will split before the first entry. This gives us one extra "split" array which is empty.
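A small sketch of the whole procedure, including dropping the empty leading piece. The sample data here is made up for illustration and is much shorter than the array in the question.

```python
import numpy as np

# Illustrative data: the third column restarts at 1 for each new group.
arr = np.array([(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 4),
                (1, 2, 1), (1, 2, 2), (1, 2, 3),
                (1, 3, 1), (1, 3, 2)])

# Row indices where a new group starts.
starts = np.where(arr[:, 2] == 1)[0]

# Splitting at index 0 yields an empty first piece, so filter it out.
pieces = [p for p in np.split(arr, starts) if len(p) > 0]

for p in pieces:
    print(p.tolist())
```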

remove empty numpy array

I have a numpy array:
array([], shape=(0, 4), dtype=float64)
How can I remove this array in a multidimensional array?
I tried
import numpy as np
if array == []:
np.delete(array)
But, the multidimensional array still has this empty array.
EDIT:
The input is
new_array = [array([], shape=(0, 4), dtype=float64),
array([[-0.97, 0.99, -0.98, -0.93 ],
[-0.97, -0.99, 0.59, -0.93 ],
[-0.97, 0.99, -0.98, -0.93 ],
[ 0.70 , 1, 0.60, 0.65]]), array([[-0.82, 1, 0.61, -0.63],
[ 0.92, -1, 0.77, 0.88],
[ 0.92, -1, 0.77, 0.88],
[ 0.65, -1, 0.73, 0.85]]), array([], shape=(0, 4), dtype=float64)]
The expected output after removing the empty arrays is:
new_array = [array([[-0.97, 0.99, -0.98, -0.93 ],
[-0.97, -0.99, 0.59, -0.93 ],
[-0.97, 0.99, -0.98, -0.93 ],
[ 0.70 , 1, 0.60, 0.65]]),
array([[-0.82, 1, 0.61, -0.63],
[ 0.92, -1, 0.77, 0.88],
[ 0.92, -1, 0.77, 0.88],
[ 0.65, -1, 0.73, 0.85]])]
new_array, as printed, looks like a list of arrays. And even if it were an array, it would be a 1d array of dtype=object.
==[] is not the way to check for an empty array:
In [10]: x=np.zeros((0,4),float)
In [11]: x
Out[11]: array([], shape=(0, 4), dtype=float64)
In [12]: x==[]
Out[12]: False
In [14]: 0 in x.shape # check if there's a 0 in the shape
Out[14]: True
Check the syntax for np.delete. It requires an array, an index and an axis, and returns another array. It does not operate in place.
If new_array is a list, a list comprehension would do a nice job of removing the [] arrays:
In [33]: alist=[x, np.ones((2,3)), np.zeros((1,4)),x]
In [34]: alist
Out[34]:
[array([], shape=(0, 4), dtype=float64), array([[ 1., 1., 1.],
[ 1., 1., 1.]]), array([[ 0., 0., 0., 0.]]), array([], shape=(0, 4), dtype=float64)]
In [35]: [y for y in alist if 0 not in y.shape]
Out[35]:
[array([[ 1., 1., 1.],
[ 1., 1., 1.]]), array([[ 0., 0., 0., 0.]])]
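It would also work if new_array was a 1d object array (recent NumPy versions require dtype=object when the elements have different shapes):
new_array=np.array(alist, dtype=object)
newer_array = np.array([y for y in new_array if 0 not in y.shape], dtype=object)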
To use np.delete with new_array, you have to specify which elements:
In [47]: np.delete(new_array,[0,3])
Out[47]:
array([array([[ 1., 1., 1.],
[ 1., 1., 1.]]),
array([[ 0., 0., 0., 0.]])], dtype=object)
To find [0,3] you could use np.where:
np.delete(new_array,np.where([y.size==0 for y in new_array])[0])
Better yet, skip the delete and where and go with a boolean mask:
new_array[np.array([y.size>0 for y in new_array])]
I don't think there's a way of identifying these 'empty' arrays without a list comprehension, since you have to check the shape or size property, not the element's data. Also there's a limit as to what kinds of math you can do across elements of an object array. It's more like a list than a 2d array.
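For the list-of-arrays case from the question, the filtering step can be sketched on its own as follows. The array contents are placeholders; only the shapes mirror the question's input.

```python
import numpy as np

empty = np.zeros((0, 4))   # prints as array([], shape=(0, 4), dtype=float64)
new_array = [empty, np.ones((4, 4)), np.full((4, 4), 0.5), empty]

# Keep only arrays that actually hold elements; size is 0 for any empty shape.
kept = [y for y in new_array if y.size > 0]

print(len(kept))                 # 2
print([y.shape for y in kept])   # [(4, 4), (4, 4)]
```

Checking size rather than comparing with [] works for any shape that contains a zero-length axis.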
I initially had an array of shape (3,11,11), and after multiprocessing with pool.map my array was turned into a list like this:
[array([], shape=(0, 11, 11), dtype=float64),
array([[[ 0.35318114, 0.36152024, 0.35572945, 0.34495254, 0.34169853,
0.36553977, 0.34266126, 0.3492261 , 0.3339431 , 0.34759375,
0.33490712],...
If I convert this list to an array, the shape is (3,), so I used:
myarray = np.vstack(mylist)
and this returned a 3d array with the original shape (3,11,11).
Delete takes the multidimensional array as a parameter. Then you need to specify the subarray to delete and the axis it's on. See http://docs.scipy.org/doc/numpy/reference/generated/numpy.delete.html
np.delete(new_array,<obj indicating subarray to delete (perhaps an array of integers in your case)>, 0)
Also, note that the deletion is not in-place.
