Get the indices of a sorted matrix - Python

I have a 2D np.array:
array([[1523.,  172., 1613.],
       [3216.,  117., 1999.],
       [  85., 1271.,    4.]])
I would like to extract the indices of this np.array, sorted by value.
The result should be (for example):
[[2,2],[2,0],[1,1],[0,1],[2,1],[0,0],[0,2],[1,2],[1,0]]
I already saw how to extract the index of the minimum:
np.unravel_index(np.argmin(act), act.shape)  # (2, 2)
Thank you

Using numpy.argsort with axis=None (which sorts over the flattened array):
>>> import numpy as np
>>>
>>> act = np.array([[1523.,  172., 1613.],
...                 [3216.,  117., 1999.],
...                 [  85., 1271.,    4.]])
>>> n = act.shape[1]
>>> list(zip(*divmod(np.argsort(act, axis=None), n)))
[(2, 2), (2, 0), (1, 1), (0, 1), (2, 1), (0, 0), (0, 2), (1, 2), (1, 0)]
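A variant that builds on the np.unravel_index hint from the question and works for arrays of any dimensionality (a sketch along the same lines, not limited to 2D):
>>> idx = np.argsort(act, axis=None)   # indices into the flattened array
>>> list(zip(*np.unravel_index(idx, act.shape)))
[(2, 2), (2, 0), (1, 1), (0, 1), (2, 1), (0, 0), (0, 2), (1, 2), (1, 0)]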

Related

Cannot iterate over reverb replay buffer due to dimensionality issue

I am trying to follow tensorflow's REINFORCE agent tutorial. It works when I use their code, but when I substitute my own environment I get this error:
Received incompatible tensor at flattened index 0 from table 'uniform_table'. Specification has (dtype, shape): (int32, [?]). Tensor has (dtype, shape): (int32, [92,1]).
Table signature: 0: Tensor<name: 'step_type/step_type', dtype: int32, shape: [?]>, 1: Tensor<name: 'observation/observation', dtype: double, shape: [?,18]>, 2: Tensor<name: 'action/action', dtype: float, shape: [?,2]>, 3: Tensor<name: 'next_step_type/step_type', dtype: int32, shape: [?]>, 4: Tensor<name: 'reward/reward', dtype: float, shape: [?]>, 5: Tensor<name: 'discount/discount', dtype: float, shape: [?]> [Op:IteratorGetNext]
This is interesting because 92 is exactly the number of steps in the episode.
The table signature when using my environment is:
Trajectory(
{'action': BoundedTensorSpec(shape=(None, 2), dtype=tf.float32, name='action', minimum=array(0., dtype=float32), maximum=array(3.4028235e+38, dtype=float32)),
'discount': BoundedTensorSpec(shape=(None,), dtype=tf.float32, name='discount', minimum=array(0., dtype=float32), maximum=array(1., dtype=float32)),
'next_step_type': TensorSpec(shape=(None,), dtype=tf.int32, name='step_type'),
'observation': BoundedTensorSpec(shape=(None, 18), dtype=tf.float64, name='observation', minimum=array([0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
7.5189e+02, 6.1000e-01, 1.0860e+01, 1.0000e+00, 0.0000e+00,
0.0000e+00, 0.0000e+00, 0.0000e+00]), maximum=array(1.79769313e+308)),
'policy_info': (),
'reward': TensorSpec(shape=(None,), dtype=tf.float32, name='reward'),
'step_type': TensorSpec(shape=(None,), dtype=tf.int32, name='step_type')})
And when using the working tutorial environment:
Trajectory(
{'action': BoundedTensorSpec(shape=(None,), dtype=tf.int64, name='action', minimum=array(0), maximum=array(1)),
'discount': BoundedTensorSpec(shape=(None,), dtype=tf.float32, name='discount', minimum=array(0., dtype=float32), maximum=array(1., dtype=float32)),
'next_step_type': TensorSpec(shape=(None,), dtype=tf.int32, name='step_type'),
'observation': BoundedTensorSpec(shape=(None, 4), dtype=tf.float32, name='observation', minimum=array([-4.8000002e+00, -3.4028235e+38, -4.1887903e-01, -3.4028235e+38],
dtype=float32), maximum=array([4.8000002e+00, 3.4028235e+38, 4.1887903e-01, 3.4028235e+38],
dtype=float32)),
'policy_info': (),
'reward': TensorSpec(shape=(None,), dtype=tf.float32, name='reward'),
'step_type': TensorSpec(shape=(None,), dtype=tf.int32, name='step_type')})
The only dimensional differences are that in my case the agent produces an action composed of 2 scalar numbers while in the tutorial the action is composed of only one, and my observation is longer. Regardless, the unknown dimension precedes the known dimension.
The trajectories that are used as input for the replay buffer also match up; I printed their dimensions as they were created, first for my version:
[(92, 1), (92, 1, 18), (92, 1, 2), (92, 1), (92, 1), (92, 1)]
[(92, 1), (92, 1, 18), (92, 1, 2), (92, 1), (92, 1), (92, 1)]
[(92, 1), (92, 1, 18), (92, 1, 2), (92, 1), (92, 1), (92, 1)]
[(92, 1), (92, 1, 18), (92, 1, 2), (92, 1), (92, 1), (92, 1)]
and then for the tutorial version:
[(9, 1), (9, 1, 4), (9, 1), (9, 1), (9, 1), (9, 1)]
[(11, 1), (11, 1, 4), (11, 1), (11, 1), (11, 1), (11, 1)]
[(10, 1), (10, 1, 4), (10, 1), (10, 1), (10, 1), (10, 1)]
[(10, 1), (10, 1, 4), (10, 1), (10, 1), (10, 1), (10, 1)]
[(10, 1), (10, 1, 4), (10, 1), (10, 1), (10, 1), (10, 1)]
[(10, 1), (10, 1, 4), (10, 1), (10, 1), (10, 1), (10, 1)]
[(9, 1), (9, 1, 4), (9, 1), (9, 1), (9, 1), (9, 1)]
[(9, 1), (9, 1, 4), (9, 1), (9, 1), (9, 1), (9, 1)]
[(9, 1), (9, 1, 4), (9, 1), (9, 1), (9, 1), (9, 1)]
[(9, 1), (9, 1, 4), (9, 1), (9, 1), (9, 1), (9, 1)]
So each entry in the trajectory, for both versions, has the shape (number of steps, batch size, value length if the entry is itself a list).
I get the error mentioned at the start of the question when running the second of these two lines of code:
iterator = iter(replay_buffer.as_dataset(sample_batch_size=1))
trajectories, _ = next(iterator)
However, these lines of code run successfully using the tutorial's code, and 'trajectories' is as follows:
Trajectory(
{'action': <tf.Tensor: shape=(1, 50), dtype=int64, numpy=
array([[0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0,
1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0,
1, 0, 0, 0, 1, 1]])>,
'discount': <tf.Tensor: shape=(1, 50), dtype=float32, numpy=
array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
0., 1.]], dtype=float32)>,
'next_step_type': <tf.Tensor: shape=(1, 50), dtype=int32, numpy=
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 2, 0]], dtype=int32)>,
'observation': <tf.Tensor: shape=(1, 50, 4), dtype=float32, numpy=
array([[[ 0.02992676, 0.01392324, 0.03861422, -0.04107672],
[ 0.03020522, -0.18173054, 0.03779269, 0.26353496],
[ 0.02657061, -0.37737098, 0.04306339, 0.56789446],
[ 0.01902319, -0.18287869, 0.05442128, 0.2890832 ],
[ 0.01536562, 0.01142669, 0.06020294, 0.0140486 ],
[ 0.01559415, 0.20563589, 0.06048391, -0.25904846],
[ 0.01970687, 0.39984456, 0.05530294, -0.53205734],
[ 0.02770376, 0.59414685, 0.04466179, -0.80681443],
[ 0.0395867 , 0.39844212, 0.02852551, -0.50042385],
[ 0.04755554, 0.2029299 , 0.01851703, -0.19888948],
[ 0.05161414, 0.39778218, 0.01453924, -0.48567408],
[ 0.05956978, 0.59269595, 0.00482576, -0.7737395 ],
[ 0.0714237 , 0.39750797, -0.01064903, -0.47954214],
[ 0.07937386, 0.5927786 , -0.02023987, -0.7755622 ],
[ 0.09122943, 0.3979408 , -0.03575112, -0.48931554],
[ 0.09918825, 0.20334099, -0.04553743, -0.20811091],
[ 0.10325507, 0.39908352, -0.04969965, -0.5148037 ],
[ 0.11123674, 0.59486884, -0.05999572, -0.82272476],
[ 0.12313411, 0.40061677, -0.07645022, -0.54949903],
[ 0.13114645, 0.20664726, -0.0874402 , -0.2818491 ],
[ 0.1352794 , 0.01287431, -0.09307718, -0.0179748 ],
[ 0.13553688, -0.18079808, -0.09343667, 0.24395113],
[ 0.13192092, -0.37446988, -0.08855765, 0.50576115],
[ 0.12443152, -0.17821889, -0.07844243, 0.18653633],
[ 0.12086715, 0.01793264, -0.0747117 , -0.12982464],
[ 0.1212258 , -0.17604397, -0.0773082 , 0.13838378],
[ 0.11770492, 0.02009523, -0.07454053, -0.17765227],
[ 0.11810682, -0.17388523, -0.07809357, 0.09061581],
[ 0.11462912, 0.02226418, -0.07628125, -0.22564775],
[ 0.1150744 , -0.17168939, -0.08079421, 0.04203164],
[ 0.11164062, 0.02449259, -0.07995357, -0.27500907],
[ 0.11213046, -0.16940299, -0.08545376, -0.00857614],
[ 0.10874241, -0.36320207, -0.08562528, 0.2559689 ],
[ 0.10147836, -0.5570038 , -0.0805059 , 0.52046335],
[ 0.09033829, -0.3608463 , -0.07009663, 0.20353697],
[ 0.08312136, -0.55489945, -0.06602589, 0.47331032],
[ 0.07202338, -0.7490298 , -0.05655969, 0.7444739 ],
[ 0.05704278, -0.5531748 , -0.04167021, 0.43454146],
[ 0.04597928, -0.35748845, -0.03297938, 0.12901925],
[ 0.03882951, -0.16190998, -0.03039899, -0.17388314],
[ 0.03559131, 0.03363356, -0.03387666, -0.47599885],
[ 0.03626398, 0.22921707, -0.04339663, -0.77916366],
[ 0.04084833, 0.42490798, -0.05897991, -1.0851783 ],
[ 0.04934648, 0.6207563 , -0.08068347, -1.39577 ],
[ 0.06176161, 0.4267255 , -0.10859887, -1.1293658 ],
[ 0.07029612, 0.623089 , -0.13118619, -1.4540412 ],
[ 0.0827579 , 0.42979917, -0.16026701, -1.205056 ],
[ 0.09135389, 0.23706956, -0.18436813, -0.96658343],
[ 0.09609528, 0.04483784, -0.2036998 , -0.7370203 ],
[ 0.09699203, 0.24210311, -0.2184402 , -1.0862749 ]]],
dtype=float32)>,
'policy_info': (),
'reward': <tf.Tensor: shape=(1, 50), dtype=float32, numpy=
array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 0.]], dtype=float32)>,
'step_type': <tf.Tensor: shape=(1, 50), dtype=int32, numpy=
array([[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 2]], dtype=int32)>})
So when everything is working correctly, feeding trajectories with entries of shape (number of steps, batch size, value length if the entry is itself a list) to the replay buffer produces a dataset where each entry in each row has shape (batch size, number of steps, value length if the entry is itself a list).
However, in my version, each entry in each row of the dataset keeps its original shape, causing the error. Does anyone experienced with reverb know why this might be happening?
I did a lot more digging into the TensorFlow backend, and the problem is caused by the fact that the CartPole gym wrapper creates a non-batched Python environment, while the default is a batched environment. So when I run my code, an additional (batch) dimension is added to the trajectories before they are stored in the reverb table. However, since I am using the same table signature, an exception is raised when I attempt to pull an entry out of the table: the signature conflicts with the actual shape of the stored entries.
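One workaround that follows from this (a minimal sketch of mine, assuming the stray batch dimension is axis 1 of each trajectory tensor, as the (92, 1, ...) shapes above suggest) is to squeeze that axis out before writing to the buffer:
import tensorflow as tf

def drop_batch_dim(traj):
    # Squeeze the singleton batch axis so shapes match the non-batched
    # table signature, e.g. (92, 1, 18) -> (92, 18) and (92, 1) -> (92,).
    return tf.nest.map_structure(lambda t: tf.squeeze(t, axis=1), traj)
Alternatively, constructing the environment so that it is batched from the start, matching the signature used to create the table, avoids the mismatch altogether.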

Ndarray of lists with a mix of floats and integers?

I have an array of lists (correction: an N-dimensional array):
s_cluster_data
Out[410]:
array([[ 0.9607611 , 0.19538569, 0. ],
[ 1.03990463, 0.22274072, 0. ],
[ 1.09430461, 0.22603228, 0. ],
...,
[ 1.10802461, -0.54190659, 2. ],
[ 0.9288097 , -0.49195368, 2. ],
[ 0.81606986, -0.47141286, 2. ]])
I would like to make the third column an integer. I've tried assigning a dtype as follows:
dtype=[('A','f8'),('B','f8'),('C','i4')]
s_cluster_data = np.array(s_cluster_data, dtype=dtype)
s_cluster_data
Out[414]:
array([[( 0.9607611 , 0.9607611 , 0), ( 0.19538569, 0.19538569, 0),
( 0. , 0. , 0)],
[( 1.03990463, 1.03990463, 1), ( 0.22274072, 0.22274072, 0),
( 0. , 0. , 0)],
[( 1.09430461, 1.09430461, 1), ( 0.22603228, 0.22603228, 0),
( 0. , 0. , 0)],
...,
dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<i4')])
This creates an array of lists of tuples (correction: an array with a structured dtype), with each element of the lists becoming a separate tuple.
I've also tried taking the array apart, reading it in as an array of tuples, and then returning to the original layout.
list_cluster = s_cluster_data.tolist() # py list
tuple_cluster = [tuple(l) for l in list_cluster] # list of tuples
dtype=[('A','f8'),('B','f8'),('C','i4')]
sd_cluster_data = np.array(tuple_cluster, dtype=dtype) # array of tuples with dtype
sd_cluster_data
Out: ...,
(1.0020371 , -0.56034073, 2), (1.18264038, -0.55773913, 2),
(1.00550194, -0.55359672, 2), (1.10802461, -0.54190659, 2),
(0.9288097 , -0.49195368, 2), (0.81606986, -0.47141286, 2)],
dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<i4')])
So ideally the above output is what I would like to see, but with an array of lists, not an array of tuples.
I tried taking the array apart and merging it back together as lists:
x_val_arr = np.array([x[0] for x in sd_cluster_data])
y_val_arr = np.array([x[1] for x in sd_cluster_data])
cluster_id_arr = np.array([x[2] for x in sd_cluster_data])
coordinates_arr = np.stack((x_val_arr,y_val_arr,cluster_id_arr),axis=1)
But once again I get floats in the third column
coordinates_arr
Out[416]:
array([[ 0.9607611 , 0.19538569, 0. ],
[ 1.03990463, 0.22274072, 0. ],
[ 1.09430461, 0.22603228, 0. ],
...,
[ 1.10802461, -0.54190659, 2. ],
[ 0.9288097 , -0.49195368, 2. ],
[ 0.81606986, -0.47141286, 2. ]])
So this is probably a question due to my lack of domain knowledge, but do ndarrays not support mixed data types when the rows are lists rather than tuples?
In [87]: import numpy.lib.recfunctions as rf
In [88]: arr = np.array([[ 0.9607611 , 0.19538569, 0. ],
...: [ 1.03990463, 0.22274072, 0. ],
...: [ 1.09430461, 0.22603228, 0. ],
...: [ 1.10802461, -0.54190659, 2. ],
...: [ 0.9288097 , -0.49195368, 2. ],
...: [ 0.81606986, -0.47141286, 2. ]])
In [89]: arr
Out[89]:
array([[ 0.9607611 , 0.19538569, 0. ],
[ 1.03990463, 0.22274072, 0. ],
[ 1.09430461, 0.22603228, 0. ],
[ 1.10802461, -0.54190659, 2. ],
[ 0.9288097 , -0.49195368, 2. ],
[ 0.81606986, -0.47141286, 2. ]])
There are various ways of constructing a structured array from a 2D array like this. Recent numpy versions provide a convenient unstructured_to_structured function:
In [90]: dt = np.dtype([('A','f8'),('B','f8'),('C','i4')])
In [92]: rf.unstructured_to_structured(arr, dt)
Out[92]:
array([(0.9607611 , 0.19538569, 0), (1.03990463, 0.22274072, 0),
(1.09430461, 0.22603228, 0), (1.10802461, -0.54190659, 2),
(0.9288097 , -0.49195368, 2), (0.81606986, -0.47141286, 2)],
dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<i4')])
Each row of arr has been turned into a structured record, displayed as a tuple.
A functionally equivalent approach is to create a 'blank' array, and assign field values by name:
In [93]: res = np.zeros(arr.shape[0], dt)
In [94]: res
Out[94]:
array([(0., 0., 0), (0., 0., 0), (0., 0., 0), (0., 0., 0), (0., 0., 0),
(0., 0., 0)], dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<i4')])
In [95]: res['A'] = arr[:,0]
In [96]: res['B'] = arr[:,1]
In [97]: res['C'] = arr[:,2]
In [98]: res
Out[98]:
array([(0.9607611 , 0.19538569, 0), (1.03990463, 0.22274072, 0),
(1.09430461, 0.22603228, 0), (1.10802461, -0.54190659, 2),
(0.9288097 , -0.49195368, 2), (0.81606986, -0.47141286, 2)],
dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<i4')])
and to belabor the point, we could also make the structured array from a list of tuples:
In [104]: np.array([tuple(row) for row in arr.tolist()], dt)
Out[104]:
array([(0.9607611 , 0.19538569, 0), (1.03990463, 0.22274072, 0),
(1.09430461, 0.22603228, 0), (1.10802461, -0.54190659, 2),
(0.9288097 , -0.49195368, 2), (0.81606986, -0.47141286, 2)],
dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<i4')])
The problem might be in the way you pass data to np.array. The rows of the array should be tuples.
a = np.array([(0.9607611, 0.19538569, 0.)], dtype='f8, f8, i4')
will create an array
array([(0.9607611, 0.19538569, 0)],
dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<i4')])
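As a quick check (my addition, not from the answers above): once the array is structured, each field is accessed by name and keeps its own dtype, so the third column really is stored as integers:
>>> a['f2']
array([0], dtype=int32)
>>> a['f0']
array([0.9607611])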

Optimal generation of numpy associative array with subarrays and variable length

I have Python-generated data of the form:
fa fb fc
fa1 fb1 [fc01, fc02,..., fc0m]
fa2 fb2 [fc11, fc12,..., fc1m]
... ... ...
fan fbn [fcn1, fcn2,..., fcnm]
I need to create a Python-compatible data structure to store it, maximizing ease of creation, and minimizing memory usage and read/write time. I need to be able to identify columns via field names (i.e. retrieve fa1 with something like data['fa'][0]). fa values are ints, and fb and fc are floats. Neither m nor n are known before runtime, but are known before data is inserted into the data structure, and do not change. m will not exceed 1000, and n won't exceed 10000. Data is generated one row at a time.
Until now, I've used a numpy associative array, asar, of dtype=[('f0,'i2'), ('f1','f8'), ('f2', 'f8', (m))]. However, since I can't just add a new row to a numpy array without deleting and recreating it each time a row is added, I've been using a separate counting variable ind_n, creating asar with asar = numpy.zeroes(n, dtype=dtype), overwriting asar[ind_n]'s zeroes with the data to be added, then incrementing ind_n until it reaches n. This works, but it seems like there must be a better solution (or at least one that allows me to eliminate ind_n). Is there a standard way to create the skeleton of asar (perhaps with something like np.zeroes()), then insert each line of data into the first nonzero row? Or a way to convert a standard python nested list to an associative array, once the nested list has been completely generated? (I know this conversion can definitely be done, but run into issues (e.g. ValueError: setting an array element with a sequence.) when converting the subarray, when I attempt it.)
In [39]: n, m = 5, 3
In [41]: dt=np.dtype([('f0','i2'), ('f1','f8'), ('f2', 'f8', (m))])
In [45]: asar = np.zeros(n, dt)
In [46]: asar
Out[46]:
array([(0, 0., [0., 0., 0.]), (0, 0., [0., 0., 0.]),
(0, 0., [0., 0., 0.]), (0, 0., [0., 0., 0.]),
(0, 0., [0., 0., 0.])],
dtype=[('f0', '<i2'), ('f1', '<f8'), ('f2', '<f8', (3,))])
Filling by field:
In [49]: asar['f0'] = np.arange(5)
In [50]: asar['f1'] = np.random.rand(5)
In [51]: asar['f2'] = np.random.rand(5,3)
In [52]: asar
Out[52]:
array([(0, 0.45120412, [0.86481761, 0.08861093, 0.42212446]),
(1, 0.63926708, [0.43788684, 0.89254029, 0.90637292]),
(2, 0.33844457, [0.80352251, 0.25411018, 0.315124 ]),
(3, 0.24271258, [0.27849709, 0.9905879 , 0.94155558]),
(4, 0.89239324, [0.1580938 , 0.52844036, 0.59092695])],
dtype=[('f0', '<i2'), ('f1', '<f8'), ('f2', '<f8', (3,))])
Generating a list with matching nesting:
In [53]: alist = [(i,i,[10]*3) for i in range(5)]
In [54]: np.array(alist, dt)
Out[54]:
array([(0, 0., [10., 10., 10.]), (1, 1., [10., 10., 10.]),
(2, 2., [10., 10., 10.]), (3, 3., [10., 10., 10.]),
(4, 4., [10., 10., 10.])],
dtype=[('f0', '<i2'), ('f1', '<f8'), ('f2', '<f8', (3,))])
Obviously you could do:
for i, row in enumerate(alist):
    asar[i] = row
enumerate is a nice idiomatic way of generating an index along with a value. But then so is range(n).
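Since the question generates rows one at a time, another pattern worth noting (my sketch, not from the original answers) is to append plain tuples to a Python list and convert once at the end, which eliminates ind_n entirely:
import numpy as np

n, m = 5, 3
dt = np.dtype([('f0','i2'), ('f1','f8'), ('f2','f8',(m,))])

rows = []
for i in range(n):                        # rows arrive one at a time
    rows.append((i, i * 0.5, [0.0] * m))  # tuple at the record level; the subarray may stay a list
asar = np.array(rows, dt)                 # single conversion at the end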
If you know n at the time you create the first record, your solution is essentially correct. You can use np.empty instead of np.zeros, saving a bit (but not much) of time. If you feel bad about ind_n, you can create an array iterator instead:
>>> m = 5
>>> n = 7
>>> dt = [('col1', 'i2'), ('col2', float), ('col3', float, (m,))]
>>> data = [(np.random.randint(10), np.random.random(), np.random.random((m,))) for _ in range(n)]
>>>
>>> rec = np.empty((n,), dt)
>>> irec = np.nditer(rec, op_flags=[['readwrite']], flags=['c_index'])
>>>
>>> for src in data:
...     # roughly equivalent to list.append:
...     next(irec)[()] = src
...     print()
...     # getting the currently valid part:
...     print(irec.operands[0][:irec.index+1])
...
[(9, 0.07368308, [0.44691665, 0.38875103, 0.83522137, 0.39281718, 0.62078615])]
[(9, 0.07368308, [0.44691665, 0.38875103, 0.83522137, 0.39281718, 0.62078615])
(6, 0.82350335, [0.57971597, 0.61270304, 0.05280996, 0.03702404, 0.99159465])]
[(9, 0.07368308, [0.44691665, 0.38875103, 0.83522137, 0.39281718, 0.62078615])
(6, 0.82350335, [0.57971597, 0.61270304, 0.05280996, 0.03702404, 0.99159465])
(3, 0.06565234, [0.88921842, 0.21097122, 0.83276431, 0.01824657, 0.49105466])]
[(9, 0.07368308, [0.44691665, 0.38875103, 0.83522137, 0.39281718, 0.62078615])
(6, 0.82350335, [0.57971597, 0.61270304, 0.05280996, 0.03702404, 0.99159465])
(3, 0.06565234, [0.88921842, 0.21097122, 0.83276431, 0.01824657, 0.49105466])
(2, 0.69806099, [0.87749632, 0.22119474, 0.25623813, 0.26587436, 0.04772489])]
[(9, 0.07368308, [0.44691665, 0.38875103, 0.83522137, 0.39281718, 0.62078615])
(6, 0.82350335, [0.57971597, 0.61270304, 0.05280996, 0.03702404, 0.99159465])
(3, 0.06565234, [0.88921842, 0.21097122, 0.83276431, 0.01824657, 0.49105466])
(2, 0.69806099, [0.87749632, 0.22119474, 0.25623813, 0.26587436, 0.04772489])
(1, 0.77573727, [0.44359522, 0.62471617, 0.65742177, 0.38889958, 0.13901824])]
[(9, 0.07368308, [0.44691665, 0.38875103, 0.83522137, 0.39281718, 0.62078615])
(6, 0.82350335, [0.57971597, 0.61270304, 0.05280996, 0.03702404, 0.99159465])
(3, 0.06565234, [0.88921842, 0.21097122, 0.83276431, 0.01824657, 0.49105466])
(2, 0.69806099, [0.87749632, 0.22119474, 0.25623813, 0.26587436, 0.04772489])
(1, 0.77573727, [0.44359522, 0.62471617, 0.65742177, 0.38889958, 0.13901824])
(0, 0.45797521, [0.79193395, 0.69029592, 0.0541346 , 0.49603146, 0.36146384])]
[(9, 0.07368308, [0.44691665, 0.38875103, 0.83522137, 0.39281718, 0.62078615])
(6, 0.82350335, [0.57971597, 0.61270304, 0.05280996, 0.03702404, 0.99159465])
(3, 0.06565234, [0.88921842, 0.21097122, 0.83276431, 0.01824657, 0.49105466])
(2, 0.69806099, [0.87749632, 0.22119474, 0.25623813, 0.26587436, 0.04772489])
(1, 0.77573727, [0.44359522, 0.62471617, 0.65742177, 0.38889958, 0.13901824])
(0, 0.45797521, [0.79193395, 0.69029592, 0.0541346 , 0.49603146, 0.36146384])
(6, 0.85225039, [0.62028917, 0.4895316 , 0.00922578, 0.66836154, 0.53082779])]

Identify top k values and mark them based on their respective ranking orders

There is a one-dimensional array, for instance, as shown in the following. Are there any functions that can transform this array into another array that keeps only the top 5 elements of the existing array? These five kept elements are marked 5, 4, 3, 2, 1 based on their respective numerical values, and all other elements are marked 0.
9.00E-05
8.74E-05
-6.67E-05
-0.000296984
-0.00016961
-7.49E-06
-0.000102942
-0.000183901
0.000206149
5.62E-05
0.000112588
5.93E-05
9.85E-05
-2.29E-05
5.08E-05
0.00015748
Here is one solution using pandas rank:
s=df.rank(ascending=False)
s.mask(s>5,0).astype(int)
Out[74]:
0 5
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 1
9 0
10 3
11 0
12 4
13 0
14 0
15 2
Name: val, dtype: int32
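For reproducibility (my addition, not part of the original answer): the rank solution above assumes the values are held in a pandas Series named val; a minimal setup sketch:
import pandas as pd

values = [9.00e-05, 8.74e-05, -6.67e-05, -0.000296984, -0.00016961,
          -7.49e-06, -0.000102942, -0.000183901, 0.000206149, 5.62e-05,
          0.000112588, 5.93e-05, 9.85e-05, -2.29e-05, 5.08e-05, 0.00015748]
df = pd.Series(values, name='val')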
If you want the numbers to remain in the same order and obtain a list of tuples with the original number and rank, you could do this (the sort is descending, so the largest value gets rank 1, consistent with the other answers):
numbers = [ 9.00E-05, 8.74E-05, -6.67E-05, -0.000296984, -0.00016961, -7.49E-06, -0.000102942, -0.000183901, 0.000206149, 5.62E-05, 0.000112588, 5.93E-05, 9.85E-05, -2.29E-05, 5.08E-05, 0.00015748]
ranks = { n: (i+1 if i < 5 else 0) for (i,n) in enumerate(sorted(numbers, reverse=True)) }
tagged = [ (n, ranks[n]) for n in numbers ]
# tagged will contain : [(9e-05, 5), (8.74e-05, 0), (-6.67e-05, 0), (-0.000296984, 0), (-0.00016961, 0), (-7.49e-06, 0), (-0.000102942, 0), (-0.000183901, 0), (0.000206149, 1), (5.62e-05, 0), (0.000112588, 3), (5.93e-05, 0), (9.85e-05, 4), (-2.29e-05, 0), (5.08e-05, 0), (0.00015748, 2)]
If the original order doesn't matter, you only need this:
tagged = [ (n, i+1 if i < 5 else 0) for (i,n) in enumerate(sorted(numbers, reverse=True)) ]
# then tagged will be : [(0.000206149, 1), (0.00015748, 2), (0.000112588, 3), (9.85e-05, 4), (9e-05, 5), (8.74e-05, 0), (5.93e-05, 0), (5.62e-05, 0), (5.08e-05, 0), (-7.49e-06, 0), (-2.29e-05, 0), (-6.67e-05, 0), (-0.000102942, 0), (-0.00016961, 0), (-0.000183901, 0), (-0.000296984, 0)]
One way is to use numpy. We assume your array is held in variable arr.
args = arr.argsort()
arr[args[-5:]] = range(5, 0, -1)
arr[args[:-5]] = 0
# array([ 5., 0., 0., 0., 0., 0., 0., 0., 1., 0., 3., 0., 4.,
# 0., 0., 2.])

Scipy constrained minimization does not respect constraint

I apologize if the question seems straightforward and easy. I tried to look for an answer, but did not find one that could solve my problem.
I have a very simple minimization problem: I need to maximize an expected value (in a second phase the objective function will become more complicated):
def EV(q, P):
    return (-1)*np.sum(100 * q * (2*P - 1))
q is a 12-dimensional vector whose elements need to be between 0 and 1 and, clearly, sum to 1. So I proceed to set the bounds and constraints:
cons = {'type': 'eq', 'fun': lambda q: np.sum(q) - 1}
bds = [(0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1)]
P = np.array([0.32510069, 0.96284943, 0.33966465, 0.61696874, 0.77368336,
              0.10127222, 0.47836665, 0.87537657, 0.2086234 , 0.52468426,
              0.31931169, 0.86424427])
Then I call scipy.optimize.minimize:
X0 = np.array([0.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.4, 0])
qstar = scipy.optimize.minimize(fun=EV, x0=X0, args=(P,), method='L-BFGS-B', bounds=bds, constraints=cons)
However, when I print the solution qstar I get the following:
fun: -323.56132559388169
hess_inv: <12x12 LbfgsInvHessProduct with dtype=float64>
jac: array([ 34.97985972, -92.56988847, 32.06706651, -23.39374987,
-54.7366767 , 79.74555274, 4.32666525, -75.0753145 ,
58.27532163, -4.93685093, 36.13766353, -72.84884873])
message: 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
nfev: 26
nit: 1
status: 0
success: True
x: array([ 0., 1., 0., 1., 1., 0., 0., 1., 0., 1., 0., 1.])
Why isn't the solution satisfying the equality constraint? Is it, perhaps, because of the message? Any help is very much appreciated.
Change the solver method to SLSQP. As mentioned in the comments, constraints are only supported by the SLSQP and COBYLA methods, and COBYLA only supports inequality constraints. SLSQP solves the problem by sequential least squares quadratic programming.
import numpy as np
import scipy.optimize

def EV(q, P):
    return (-1)*np.sum(100 * q * (2*P - 1))

cons = {'type': 'eq', 'fun': lambda q: np.sum(q) - 1}
bds = [(0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1)]
P = np.array([0.32510069, 0.96284943, 0.33966465, 0.61696874, 0.77368336,
              0.10127222, 0.47836665, 0.87537657, 0.2086234 , 0.52468426,
              0.31931169, 0.86424427])
X0 = np.array([0.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.4, 0])
qstar = scipy.optimize.minimize(fun=EV, x0=X0, args=(P,), method='SLSQP', bounds=bds, constraints=cons)
print(qstar)
gives me the following output.
fun: -92.56988588438836
jac: array([ 34.97986126, -92.56988621, 32.06707001, -23.39374828,
-54.7366724 , 79.74555588, 4.32666969, -75.07531452,
58.27532005, -4.93685246, 36.13766193, -72.84885406])
message: 'Optimization terminated successfully.'
nfev: 28
nit: 2
njev: 2
status: 0
success: True
x: array([ 2.07808604e-10, 1.00000000e+00, 1.95365391e-10,
0.00000000e+00, 0.00000000e+00, 4.37596612e-10,
5.51522994e-11, 0.00000000e+00, 3.28030922e-10,
8.07265366e-12, 2.14253171e-10, 0.00000000e+00])
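As a quick sanity check (my addition, not part of the original answer), the equality constraint now holds at the solution up to solver tolerance:
print(qstar.x.sum())                 # ~1.0
print(np.isclose(qstar.x.sum(), 1))  # True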
