I am using the sklearn.cluster KMeans package. Once I finish the clustering, how can I find out which values were grouped together?
Say I had 100 data points and KMeans gave me 5 clusters. Now I want to know which data points are in cluster 5. How can I do that?
Is there a function where I give the cluster id and it lists out all the data points in that cluster?
I had a similar requirement, and I am using pandas to create a new DataFrame with the index of the dataset and the labels as columns.
import pandas as pd
from sklearn.cluster import KMeans

data = pd.read_csv('filename')
km = KMeans(n_clusters=5).fit(data)
cluster_map = pd.DataFrame()
cluster_map['data_index'] = data.index.values
cluster_map['cluster'] = km.labels_
Once the DataFrame is available, it is quite easy to filter. For example, to filter all data points in cluster 3:
cluster_map[cluster_map.cluster == 3]
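To recover the original rows rather than just their indices, a hedged one-liner (assuming the index values in data are unique):
data.loc[cluster_map.loc[cluster_map.cluster == 3, 'data_index']]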
If you have a large dataset and you need to extract clusters on-demand, you'll see some speed-up using numpy.where. Here is an example on the iris dataset:
from sklearn.cluster import KMeans
from sklearn import datasets
import numpy as np
iris = datasets.load_iris()
X = iris.data
y = iris.target
km = KMeans(n_clusters=3)
km.fit(X)
Define a function to extract the indices of the cluster_id you provide. (Here are two functions, for benchmarking; they both return the same values):
def ClusterIndicesNumpy(clustNum, labels_array):  # numpy
    return np.where(labels_array == clustNum)[0]

def ClusterIndicesComp(clustNum, labels_array):  # list comprehension
    return np.array([i for i, x in enumerate(labels_array) if x == clustNum])
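As an aside, np.flatnonzero is an equivalent shorthand for the np.where(...)[0] pattern used above:
def ClusterIndicesFlat(clustNum, labels_array):
    # same result as np.where(labels_array == clustNum)[0]
    return np.flatnonzero(labels_array == clustNum)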
Let's say you want all samples that are in cluster 2:
ClusterIndicesNumpy(2, km.labels_)
array([ 52, 77, 100, 102, 103, 104, 105, 107, 108, 109, 110, 111, 112,
115, 116, 117, 118, 120, 122, 124, 125, 128, 129, 130, 131, 132,
134, 135, 136, 137, 139, 140, 141, 143, 144, 145, 147, 148])
Numpy wins the benchmark:
%timeit ClusterIndicesNumpy(2,km.labels_)
100000 loops, best of 3: 4 µs per loop
%timeit ClusterIndicesComp(2,km.labels_)
1000 loops, best of 3: 479 µs per loop
Now you can extract all of your cluster 2 data points like so:
X[ClusterIndicesNumpy(2,km.labels_)]
array([[ 6.9, 3.1, 4.9, 1.5],
[ 6.7, 3. , 5. , 1.7],
[ 6.3, 3.3, 6. , 2.5],
... #truncated
Double-check the first three indices from the truncated array above:
print(X[52], km.labels_[52])
print(X[77], km.labels_[77])
print(X[100], km.labels_[100])
[ 6.9 3.1 4.9 1.5] 2
[ 6.7 3. 5. 1.7] 2
[ 6.3 3.3 6. 2.5] 2
Actually, a very simple way to do this is:
clusters = KMeans(n_clusters=5).fit(df)
df[clusters.labels_ == 0]
The second line returns all the elements of df that belong to cluster 0. (Note that the model must be fitted before labels_ exists.) Similarly, you can find the other clusters' elements.
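Equivalently, fit_predict fits the model and returns the labels in one call. A short sketch, assuming df holds the feature data:
from sklearn.cluster import KMeans

# fit_predict fits the model and returns the cluster label of each row
labels = KMeans(n_clusters=5).fit_predict(df)
df[labels == 0]  # rows assigned to cluster 0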
To get the IDs of the points/samples/observations that are inside each cluster, do this:
Python 2
An example using the Iris data and a nice Pythonic way:
import numpy as np
from sklearn.cluster import KMeans
from sklearn import datasets
np.random.seed(0)
# Use Iris data
iris = datasets.load_iris()
X = iris.data
y = iris.target
# KMeans with 3 clusters
clf = KMeans(n_clusters=3)
clf.fit(X)  # KMeans is unsupervised, so y is not needed
#Coordinates of cluster centers with shape [n_clusters, n_features]
clf.cluster_centers_
#Labels of each point
clf.labels_
# Nice Pythonic way to get the indices of the points for each corresponding cluster
mydict = {i: np.where(clf.labels_ == i)[0] for i in range(clf.n_clusters)}
# Transform this dictionary into list (if you need a list as result)
dictlist = []
for key, value in mydict.iteritems():
    temp = [key, value]
    dictlist.append(temp)
RESULTS
#dict format
{0: array([ 50, 51, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 101, 106, 113, 114,
119, 121, 123, 126, 127, 133, 138, 142, 146, 149]),
1: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]),
2: array([ 52, 77, 100, 102, 103, 104, 105, 107, 108, 109, 110, 111, 112,
115, 116, 117, 118, 120, 122, 124, 125, 128, 129, 130, 131, 132,
134, 135, 136, 137, 139, 140, 141, 143, 144, 145, 147, 148])}
# list format
[[0, array([ 50, 51, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 101, 106, 113, 114,
119, 121, 123, 126, 127, 133, 138, 142, 146, 149])],
[1, array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])],
[2, array([ 52, 77, 100, 102, 103, 104, 105, 107, 108, 109, 110, 111, 112,
115, 116, 117, 118, 120, 122, 124, 125, 128, 129, 130, 131, 132,
134, 135, 136, 137, 139, 140, 141, 143, 144, 145, 147, 148])]]
Python 3
Just change
for key, value in mydict.iteritems():
to
for key, value in mydict.items():
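In Python 3 the list can also be built in one line with a comprehension, equivalent to the loop above:
dictlist = [[key, value] for key, value in mydict.items()]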
You can look at the labels_ attribute.
For example:
km = KMeans(n_clusters=2)
km.fit([[1,2,3],[2,3,4],[5,6,7]])
print(km.labels_)
output: array([1, 1, 0], dtype=int32)
As you can see, the first and second points are in cluster 1, and the last point is in cluster 0.
You can simply store the labels in an array and convert the array to a data frame. Then merge the data that you used to create the K-means model with the new data frame of clusters.
Display the dataframe; now you should see each row with its corresponding cluster. If you want to list all the data with a specific cluster, use something like data.loc[data['cluster_label_name'] == 2], assuming 2 is the cluster you want; a short sketch follows below.
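A minimal sketch of that approach, assuming data is the DataFrame the model was fitted on and cluster_label_name is the column name you chose:
import pandas as pd
from sklearn.cluster import KMeans

km = KMeans(n_clusters=5).fit(data)
data = data.assign(cluster_label_name=km.labels_)  # one label per row
print(data.loc[data['cluster_label_name'] == 2])   # all rows in cluster 2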
Related
I have one numpy array that looks like this:
array([ 0, 1, 2, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16,
18, 19, 20, 22, 27, 28, 29, 32, 33, 34, 36, 37, 38,
39, 42, 43, 44, 45, 47, 48, 51, 52, 54, 55, 56, 60,
65, 66, 67, 68, 69, 70, 71, 73, 74, 75, 77, 78, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 92, 94, 95, 97,
98, 100, 101, 102, 105, 106, 108, 109, 113, 114, 117, 118, 119,
121, 123, 124, 126, 127, 128, 129, 131, 132, 133, 134, 135, 137,
138, 141, 142, 143, 144, 145, 147, 148, 149, 152, 154, 156, 157,
159, 160, 161, 163, 165, 166, 167, 168, 169, 170, 172, 176, 177,
179, 180, 182, 183, 185, 186, 187, 188, 191, 192, 194, 196, 197,
199, 200, 201, 202, 204, 205, 206, 207, 208])
I'm able to convert this to a set using set() with no problem.
However, I have another numpy array that looks like:
array([[ 2],
[ 4],
[ 10],
[ 10],
[ 12],
[ 13],
[ 14],
[ 16],
[ 19],
[ 21],
[ 21],
[ 22],
[ 29],
[209]])
When I try to use set() this gives me an error: TypeError: unhashable type: 'numpy.ndarray'
How can I convert my second numpy array to look like the first array so that I can use set()?
For reference, my second array is converted from a PySpark dataframe column using:
np.array(data2.select('row_num').collect())
And both arrays are used with set() in:
count = sorted(set(range(data1)) - set(np.array(data2.select('row_num').collect())))
As mentioned, use ravel to return a contiguous flattened array.
import numpy as np
arr = np.array(
[[2], [4], [10], [10], [12], [13], [14], [16], [19], [21], [21], [22], [29], [209]]
)
print(set(arr.ravel()))
Outputs:
{2, 4, 10, 12, 13, 14, 16, 209, 19, 21, 22, 29}
This is somewhat equivalent to doing a reshape with a single dimension being the array size:
print(set(arr.reshape(arr.size)))
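Applied to the original PySpark expression, a hedged rewrite (assuming data1 is the intended range bound from the question) would be:
count = sorted(set(range(data1)) - set(np.array(data2.select('row_num').collect()).ravel()))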
I am trying to find all the minima in this graph.
But when I write this code it also gives many extra minima, and I want 8 minima in total.
mini = []
for i in range(1, len(y)-1):
    if y[i-1] >= y[i] and y[i] <= y[i+1]:
        mini.append(i)
print(mini)
Output
[2, 5, 7, 15, 20, 26, 30, 37, 40, 47, 50, 52, 59, 61, 64, 70, 76, 84, 89, 94, 96, 99, 107, 109, 117, 120, 122, 130, 134, 140, 144, 148, 154, 164, 169, 176]
And I want to cut my data according to these minimum values. Can somebody tell me what changes I should make in this code to achieve my goal?
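One way to keep only the significant minima (a sketch, not from the question: it assumes y is a 1-D numpy array, and the prominence value is a placeholder to tune until the 8 expected minima remain) is scipy.signal.find_peaks on the negated signal:
import numpy as np
from scipy.signal import find_peaks

y = np.asarray(y)
# Minima of y are peaks of -y; `prominence` filters out shallow noise dips
minima, _ = find_peaks(-y, prominence=0.5)
print(minima)

# Cut the data into segments at the detected minima
segments = np.split(y, minima)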
Hope someone can shed some light on this. I am trying to learn my way around HDF5 files. Somehow this list of strings gets encoded into the file as an array of integers, but I'm not able to figure out how to decode it. I can load the file back into pandas using the read_hdf function, but that's not the point; I am trying to understand the encoding logic. Summarized here is the example I was working with.
smiles.txt =
structure
[11CH2]1NCCN2C[C@@H]3CCC[C@@H]3c4cccc1c24
[11CH2]1NCCN2[C@@H]3CCC[C@@H]3c4cccc1c24
[11CH3]c1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)C(F)(F)F
[11CH3]c1ccccc1O[C@H]([C@@H]2CNCCO2)c3ccccc3
[11CH3]c1ccccc1S[C@H]([C@@H]2CNCCO2)c3ccccc3
>>> import pandas as pd
>>> df = pd.read_csv('smiles.txt', header=0)
>>> df.to_hdf('smiles.h5', 'table')
I then explore the structure of the newly created HDF5 file:
>>> import h5py
>>> with h5py.File('smiles.h5', 'r') as f:
...     f.visit(print)
table
table/axis0
table/axis1
table/block0_items
table/block0_values
>>> with h5py.File('smiles_temp', 'r') as f:
...     print(list(f.keys()))
...     print(f['/thekey/axis0'][:])
...     print(f['/thekey/axis1'][:])
...     print(f['/thekey/block0_items'][:])
...     print(f['/thekey/block0_values'][:])
['thekey']
[b'structure']
[0 1 2 3 4]
[b'structure']
[array([128, 4, 149, 123, 1, 0, 0, 0, 0, 0, 0, 140, 21,
110, 117, 109, 112, 121, 46, 99, 111, 114, 101, 46, 109, 117,
108, 116, 105, 97, 114, 114, 97, 121, 148, 140, 12, 95, 114,
101, 99, 111, 110, 115, 116, 114, 117, 99, 116, 148, 147, 148,
140, 5, 110, 117, 109, 112, 121, 148, 140, 7, 110, 100, 97,
114, 114, 97, 121, 148, 147, 148, 75, 0, 133, 148, 67, 1,
98, 148, 135, 148, 82, 148, 40, 75, 1, 75, 5, 75, 1,
134, 148, 104, 3, 140, 5, 100, 116, 121, 112, 101, 148, 147,
148, 140, 2, 79, 56, 148, 75, 0, 75, 1, 135, 148, 82,
148, 40, 75, 3, 140, 1, 124, 148, 78, 78, 78, 74, 255,
255, 255, 255, 74, 255, 255, 255, 255, 75, 63, 116, 148, 98,
137, 93, 148, 40, 140, 41, 91, 49, 49, 67, 72, 50, 93,
49, 78, 67, 67, 78, 50, 67, 91, 67, 64, 64, 72, 93,
51, 67, 67, 67, 91, 67, 64, 64, 72, 93, 51, 99, 52,
99, 99, 99, 99, 49, 99, 50, 52, 148, 140, 40, 91, 49,
49, 67, 72, 50, 93, 49, 78, 67, 67, 78, 50, 91, 67,
64, 64, 72, 93, 51, 67, 67, 67, 91, 67, 64, 64, 72,
93, 51, 99, 52, 99, 99, 99, 99, 49, 99, 50, 52, 148,
140, 54, 91, 49, 49, 67, 72, 51, 93, 99, 49, 99, 99,
99, 40, 99, 99, 49, 41, 99, 50, 99, 99, 40, 110, 110,
50, 99, 51, 99, 99, 99, 40, 99, 99, 51, 41, 83, 40,
61, 79, 41, 40, 61, 79, 41, 78, 41, 67, 40, 70, 41,
40, 70, 41, 70, 148, 140, 44, 91, 49, 49, 67, 72, 51,
93, 99, 49, 99, 99, 99, 99, 99, 49, 79, 91, 67, 64,
72, 93, 40, 91, 67, 64, 64, 72, 93, 50, 67, 78, 67,
67, 79, 50, 41, 99, 51, 99, 99, 99, 99, 99, 51, 148,
140, 44, 91, 49, 49, 67, 72, 51, 93, 99, 49, 99, 99,
99, 99, 99, 49, 83, 91, 67, 64, 72, 93, 40, 91, 67,
64, 64, 72, 93, 50, 67, 78, 67, 67, 79, 50, 41, 99,
51, 99, 99, 99, 99, 99, 51, 148, 101, 116, 148, 98, 46],
dtype=uint8)]
How does one go about returning the list of strings using h5py?
Just to clarify, the dataframe displays as:
In [2]: df = pd.read_csv('stack63452223.csv', header=0)
In [3]: df
Out[3]:
structure
0 [11CH2]1NCCN2C[C@@H]3CCC[C@@H]3c4cccc1c24
1 [11CH2]1NCCN2[C@@H]3CCC[C@@H]3c4cccc1c24
2 [11CH3]c1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)...
3 [11CH3]c1ccccc1O[C@H]([C@@H]2CNCCO2)c3ccccc3
4 [11CH3]c1ccccc1S[C@H]([C@@H]2CNCCO2)c3ccccc3
In [11]: df._values
Out[11]:
array([['[11CH2]1NCCN2C[C@@H]3CCC[C@@H]3c4cccc1c24'],
       ['[11CH2]1NCCN2[C@@H]3CCC[C@@H]3c4cccc1c24'],
       ['[11CH3]c1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)C(F)(F)F'],
       ['[11CH3]c1ccccc1O[C@H]([C@@H]2CNCCO2)c3ccccc3'],
       ['[11CH3]c1ccccc1S[C@H]([C@@H]2CNCCO2)c3ccccc3']], dtype=object)
or as a list of strings:
In [24]: df['structure'].to_list()
Out[24]:
['[11CH2]1NCCN2C[C@@H]3CCC[C@@H]3c4cccc1c24',
 '[11CH2]1NCCN2[C@@H]3CCC[C@@H]3c4cccc1c24',
 '[11CH3]c1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)C(F)(F)F',
 '[11CH3]c1ccccc1O[C@H]([C@@H]2CNCCO2)c3ccccc3',
 '[11CH3]c1ccccc1S[C@H]([C@@H]2CNCCO2)c3ccccc3']
The h5 file is written by pytables, which is different from h5py; generally h5py can read pytables files, but the details can be complicated.
The top level keys:
['axis0', 'axis1', 'block0_items', 'block0_values']
A dataframe has axes (row and column). On another occasion I looked at how a dataframe stores its values, and found that it uses blocks, each holding columns with a common dtype. Here you have 1 column, and it is object dtype, since it contains strings.
Strings are a bit awkward in HDF5, especially unicode. numpy arrays use a unicode string dtype; pandas uses object dtype, referencing Python strings stored outside the dataframe. I suspect, then, that in saving such a frame pytables uses a more complex referencing scheme (one that isn't immediately obvious via h5py).
Guess that's a long answer to just say I don't know.
Pandas own h5 load:
In [19]: pd.read_hdf('stack63452223.h5', 'table')
Out[19]:
structure
0 [11CH2]1NCCN2C[C@@H]3CCC[C@@H]3c4cccc1c24
1 [11CH2]1NCCN2[C@@H]3CCC[C@@H]3c4cccc1c24
2 [11CH3]c1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)...
3 [11CH3]c1ccccc1O[C@H]([C@@H]2CNCCO2)c3ccccc3
4 [11CH3]c1ccccc1S[C@H]([C@@H]2CNCCO2)c3ccccc3
The h5 objects also have attrs:
In [38]: f['table'].attrs.keys()
Out[38]: <KeysViewHDF5 ['CLASS', 'TITLE', 'VERSION', 'axis0_variety', 'axis1_variety', 'block0_items_variety', 'encoding', 'errors', 'nblocks', 'ndim', 'pandas_type', 'pandas_version']>
Fiddling around I found that:
In [66]: x=f['table']['block0_values'][0]
In [67]: b''.join(x.view('S1').tolist())
Out[67]: b'\x80\x04\x95y\x01\x8c\x15numpy.core.multiarray\x94\x8c\x0c_reconstruct\x94\x93\x94\x8c\x05numpy\x94\x8c\x07ndarray\x94\x93\x94K\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x05K\x01\x86\x94h\x03\x8c\x05dtype\x94\x93\x94\x8c\x02O8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01|\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK?t\x94b\x89]\x94(\x8c)[11CH2]1NCCN2C[C@@H]3CCC[C@@H]3c4cccc1c24\x94\x8c([11CH2]1NCCN2[C@@H]3CCC[C@@H]3c4cccc1c24\x94\x8c6[11CH3]c1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)C(F)(F)F\x94\x8c,[11CH3]c1ccccc1O[C@H]([C@@H]2CNCCO2)c3ccccc3\x94\x8c,[11CH3]c1ccccc1S[C@H]([C@@H]2CNCCO2)c3ccccc3\x94et\x94b.'
Looks like your strings are there. uint8 is a single-byte dtype, which can be viewed as bytes. Joining them, I see your strings, concatenated in some fashion.
Reformatting:
Out[67]: b'\x80\x04\x95y\x01\x8c\x15numpy.core.multiarray\x94\x8c\x0c_reconstruct\x94\x93\x94\x8c\x05numpy\x94\x8c\x07ndarray\x94\x93\x94K\x85\x94C\x01b\x94\x87\x94R\x94(K\x01K\x05K\x01\x86\x94h\x03\x8c\x05dtype\x94\x93\x94\x8c\x02O8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01|\x94NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK?t\x94b\x89]\x94(\x8c)
[11CH2]1NCCN2C[C@@H]3CCC[C@@H]3c4cccc1c24\x94\x8c(
[11CH2]1NCCN2[C@@H]3CCC[C@@H]3c4cccc1c24\x94\x8c6
[11CH3]c1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)C(F)(F)F\x94\x8c,
[11CH3]c1ccccc1O[C@H]([C@@H]2CNCCO2)c3ccccc3\x94\x8c,
[11CH3]c1ccccc1S[C@H]([C@@H]2CNCCO2)c3ccccc3\x94et\x94b.'
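One further guess (my own, not established above): the leading bytes \x80\x04\x95 are the Python pickle protocol-4 header, and the embedded names numpy.core.multiarray._reconstruct and ndarray suggest the block is a pickled object-dtype array. If so, unpickling the raw bytes may recover the strings:
import pickle
import h5py

with h5py.File('smiles.h5', 'r') as f:
    raw = f['table']['block0_values'][0].tobytes()

# Caveat: only unpickle data you trust; pickle can execute arbitrary code
strings = pickle.loads(raw)
print(strings)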
I have a nested numpy array (dtype=object); it contains 333 arrays that increase steadily in size from 52x1 to 52x333.
I would like to extract and concatenate these arrays efficiently so that I have a single 52x55611 array.
I imagine this may be straightforward, but my attempts using numpy.reshape have been unsuccessful.
If you want to stack them along the second axis, you can use numpy.hstack.
list_of_arrays = [array_1, ..., array_n]  # all these arrays have the same shape[0]
big_array = np.hstack(list_of_arrays)
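A self-contained sketch with stand-in data shaped as the question describes (333 blocks of shape 52xk for k = 1..333, totalling 52x55611):
import numpy as np

# Stand-in for the nested object array from the question
nested = np.empty(333, dtype=object)
for k in range(333):
    nested[k] = np.zeros((52, k + 1))

# hstack concatenates along axis 1; unpack the object array into a list first
big = np.hstack(list(nested))
print(big.shape)  # (52, 55611)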
If I have understood you correctly, you could use numpy.concatenate:
>>> import numpy as np
>>> a = np.array([range(52)])
>>> b = np.array([range(52,104), range(104, 156)])
>>> np.concatenate((a,b))
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51],
[ 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,
65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103],
[104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155]])
I have an algorithm, and I want the last solution it produces to become the new starting solution whenever it satisfies certain conditions. In my case I have this:
First Part
Split the multidimensional array q into 2 parts:
split_at = q[:,3].searchsorted([1,random.randrange(LB,UB-I)])
B = numpy.split(q, split_at)
Change and rename the split matrices:
S=B[1]
SF=B[2]
S2=copy(SF)
S2[:,3]=S2[:,3]+I
Compute a value f:
f=sum(S[:,1]*S[:,3])+sum(S2[:,1]*S2[:,3])
This first part is a mandatory step.
Second Passage
Then I split the array into 2 parts again:
split_at = q[:,3].searchsorted([1,random.randrange(LB,UB-I)])
D = numpy.split(q, split_at)
I rename and change parts of the matrix (as in the first passage):
T=D[1]
TF=D[2]
T2=copy(TF)
T2[:,3]=T2[:,3]+I
u=random.sample(T[:],1)  # randomly select a row from T
v=random.sample(T2[:],1)  # randomly select a row from T2
u=array(u)
v=array(v)
Here is my first problem: I want to continue the algorithm only if v[0,0]-u[0,0]+T[-1,3] <= UB; if not, I want to repeat the Second Passage until the condition is satisfied.
Now I swap one random row from T with another from T2:
x=numpy.where(v==T2)[0][0]
y=numpy.where(u==T)[0][0]
l=np.copy(T[y])
T[y],T2[x]=T2[x],T[y]
T2[x],l=l,T2[x]
I modify and recalculate some values in the matrices:
E=np.copy(T)
E2=np.copy(T2)
E[:,3]=np.cumsum(E[:,0])
E2[:,3]=np.cumsum(E2[:,0])+I
Compute f2:
f2=sum(E[:,1]*E[:,3])+sum(E2[:,1]*E2[:,3])
Here is my second and last problem. I need to iterate this algorithm: if f-f2 < 0, my new starting solution has to be E and E2, my new f has to be f2, and the algorithm should iterate again (recalculating a new f and f2) while excluding the last choice.
Thank you for your patience. I'm a noob :D
EDIT:
I have an example here (this part goes before the code I wrote above):
import numpy
import numpy as np
import random
from numpy import array, transpose, copy
p=[ 29, 85, 147, 98, 89, 83, 49, 7, 48, 88, 106, 97, 2,
107, 33, 144, 123, 84, 25, 42, 17, 82, 125, 103, 31, 110,
34, 100, 36, 46, 63, 18, 132, 10, 26, 119, 133, 15, 138,
113, 108, 81, 118, 116, 114, 130, 134, 86, 143, 126, 104, 52,
102, 8, 90, 11, 87, 37, 68, 75, 69, 56, 40, 70, 35,
71, 109, 5, 131, 121, 73, 38, 149, 20, 142, 91, 24, 53,
57, 39, 80, 79, 94, 136, 111, 78, 43, 92, 135, 65, 140,
148, 115, 61, 137, 50, 77, 30, 3, 93]
w=[106, 71, 141, 134, 14, 53, 57, 128, 119, 6, 4, 2, 140,
63, 51, 126, 35, 21, 125, 7, 109, 82, 95, 129, 67, 115,
112, 31, 114, 42, 91, 46, 108, 60, 97, 142, 85, 149, 28,
58, 52, 41, 22, 83, 86, 9, 120, 30, 136, 49, 84, 38,
70, 127, 1, 99, 55, 77, 144, 105, 145, 132, 45, 61, 81,
10, 36, 80, 90, 62, 32, 68, 117, 64, 24, 104, 131, 15,
47, 102, 100, 16, 89, 3, 147, 48, 148, 59, 143, 98, 88,
118, 121, 18, 19, 11, 69, 65, 123, 93]
p=array(p,'double')
w=array(w,'double')
r=p/w
LB=12
UB=155
I=9
j=p,w,r
j=transpose(j)
k=j[j[:,2].argsort()]
c=np.cumsum(k[:,0])
q=k[:,0],k[:,1],k[:,2],c
q=transpose(q)
o=sum(q[:,1]*q[:,3])
split_at = q[:,3].searchsorted([1,UB-I])
B = numpy.split(q, split_at)
S=B[1]
SF=B[2]
S2=copy(SF)
S2[:,3]=S2[:,3]+I
f=sum(S[:,1]*S[:,3])+sum(S2[:,1]*S2[:,3])
split_at = q[:,3].searchsorted([1,random.randrange(LB,UB-I)])
D = numpy.split(q, split_at)
T=D[1]
TF=D[2]
T2=copy(TF)
T2[:,3]=T2[:,3]+I
u=random.sample(T[:],1)
v=random.sample(T2[:],1)
u=array(u)
v=array(v)
x=numpy.where(v==T2)[0][0]
y=numpy.where(u==T)[0][0]
l=np.copy(T[y])
T[y],T2[x]=T2[x],T[y]
T2[x],l=l,T2[x]
E=np.copy(T)
E2=np.copy(T2)
E[:,3]=np.cumsum(E[:,0])
E2[:,3]=np.cumsum(E2[:,0])+I
f2=sum(E[:,1]*E[:,3])+sum(E2[:,1]*E2[:,3])
I tried:
def DivideRandom(T, T2):
    split_at = q[:,3].searchsorted([1, random.randrange(LB, UB-I)])
    D = numpy.split(q, split_at)
    T = D[1]
    TF = D[2]
    T2 = copy(TF)
    T2[:,3] = T2[:,3] + I

DivideRandom(T, T2)

def SelectJob(u, v):
    u = random.sample(T[:], 1)
    v = random.sample(T2[:], 1)
    u = array(u)
    v = array(v)

SelectJob(u, v)
d=v[0,0]-u[0,0]+T[-1,3]
def Swap(u, v):
    x = numpy.where(v == T2)[0][0]
    y = numpy.where(u == T)[0][0]
    l = np.copy(T[y])
    T[y], T2[x] = T2[x], T[y]
    T2[x], l = l, T2[x]
    E = np.copy(T)
    E2 = np.copy(T2)
    E[:,3] = np.cumsum(E[:,0])
    E2[:,3] = np.cumsum(E2[:,0]) + I
    f2 = sum(E[:,1]*E[:,3]) + sum(E2[:,1]*E2[:,3])
while True:
    if d <= UB:
        Swap(u, v)
    if d > UB:
        DivideRandom(T, T2)
        SelectJob(u, v)
    if d < UB:
        break
You can iterate indefinitely using while True, then stop whenever your conditions are met using break:
count = 0
while True:
    count += 1
    if count == 10:
        break
So for your second example you can try:
while True:
    ...
    if f - f2 < 0:
        # use new variables
        f, E = f2, E2
    else:
        break
Your first problem is similar; loop, test, reset the appropriate variables.