I have a 6 × 14 matrix; each element of the matrix represents a score, and my goal is to find the maximum total score as well as which element was picked from each row.
Exactly one element must be selected from each row, and at most one element can be selected from each column.
If an element from column 14 (the last column) is selected, we stop and take the score accumulated up to that element as the total score.
If an element from the second column is selected, the element in the next row can only be selected from the third column to the last column; in general, the column index must strictly increase from row to row.
We need to start from the first row; we cannot skip it and go to the next row.
For example, if x1,1 (the element in the first row, first column) is selected, we go to the second row and pick x2,3 (which can be picked from the 2nd to the last column), then go to the third row and pick x3,6 (from the 4th to the last column), then the fourth row to pick x4,9 and the fifth row to pick x5,14. We stop here and do not go to the last row, since we have chosen a value from the last column. The total score is x1,1 + x2,3 + x3,6 + x4,9 + x5,14 = 0.73 according to the matrix below.
appr_0:

    1     2     3     4     5     6     7     8     9    10    11    12    13    14
 0.21  0.22  0.31  0.13  0.14  0.05  0.09  0.11  0.12  0.33  0.42  0.10  0.08  0.12
 0.11  0.10  0.13  0.14  0.12  0.15  0.19  0.21  0.22  0.13  0.12  0.07  0.08  0.07
 0.22  0.21  0.12  0.14  0.15  0.08  0.10  0.12  0.15  0.30  0.22  0.11  0.09  0.13
 0.17  0.12  0.18  0.19  0.17  0.15  0.19  0.21  0.22  0.13  0.14  0.15  0.18  0.10
 0.16  0.18  0.19  0.20  0.21  0.18  0.19  0.20  0.21  0.17  0.18  0.17  0.10  0.09
 0.23  0.20  0.11  0.16  0.18  0.09  0.09  0.13  0.16  0.20  0.21  0.17  0.11  0.14
I have tried an iterative (brute-force) approach to find the maximum score, but it was very time-consuming and the Python script wasn't able to run through it within a reasonable time. I'm just wondering if there is another way to write it and optimize it.
import numpy as np
import pandas as pd

# Partial sums per level replace the shared curr_max accumulator, column
# indices increase strictly from row to row, and a path that reaches the
# last column stops early (unused rows are padded with None). Rows are
# collected in a list, since repeated DataFrame.append is slow.
records = []
def record(path, score):
    records.append(list(path) + [None] * (6 - len(path)) + [score])

for j0 in range(14):
    s0 = appr_0[0, j0]
    if j0 == 13: record([j0], s0); continue
    for j1 in range(j0 + 1, 14):
        s1 = s0 + appr_0[1, j1]
        if j1 == 13: record([j0, j1], s1); continue
        for j2 in range(j1 + 1, 14):
            s2 = s1 + appr_0[2, j2]
            if j2 == 13: record([j0, j1, j2], s2); continue
            for j3 in range(j2 + 1, 14):
                s3 = s2 + appr_0[3, j3]
                if j3 == 13: record([j0, j1, j2, j3], s3); continue
                for j4 in range(j3 + 1, 14):
                    s4 = s3 + appr_0[4, j4]
                    if j4 == 13: record([j0, j1, j2, j3, j4], s4); continue
                    for j5 in range(j4 + 1, 14):
                        record([j0, j1, j2, j3, j4, j5], s4 + appr_0[5, j5])

df = pd.DataFrame(records, columns=['days_1', 'days_2', 'days_3',
                                    'days_4', 'days_5', 'days_6', 'score'])
df_max_record = df.loc[df['score'] == df['score'].max()]
The expected df_max_record output will look like this (fake data):
   days_1  days_2  days_3  days_4  days_5  days_6  score
        2       3       7       9      10      13   0.95
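Because the column index must strictly increase from row to row, the problem has optimal substructure and can be solved with dynamic programming instead of six nested loops: working from the last row upward, best[i][j] is the maximum total attainable after picking element (i, j). The sketch below follows the rules as stated in the question (one pick per row, strictly increasing columns, stop on the last column); the helper name best_path is mine, and appr_0 is the matrix from the question.

```python
import numpy as np

# The 6x14 score matrix from the question.
appr_0 = np.array([
    [0.21, 0.22, 0.31, 0.13, 0.14, 0.05, 0.09, 0.11, 0.12, 0.33, 0.42, 0.10, 0.08, 0.12],
    [0.11, 0.10, 0.13, 0.14, 0.12, 0.15, 0.19, 0.21, 0.22, 0.13, 0.12, 0.07, 0.08, 0.07],
    [0.22, 0.21, 0.12, 0.14, 0.15, 0.08, 0.10, 0.12, 0.15, 0.30, 0.22, 0.11, 0.09, 0.13],
    [0.17, 0.12, 0.18, 0.19, 0.17, 0.15, 0.19, 0.21, 0.22, 0.13, 0.14, 0.15, 0.18, 0.10],
    [0.16, 0.18, 0.19, 0.20, 0.21, 0.18, 0.19, 0.20, 0.21, 0.17, 0.18, 0.17, 0.10, 0.09],
    [0.23, 0.20, 0.11, 0.16, 0.18, 0.09, 0.09, 0.13, 0.16, 0.20, 0.21, 0.17, 0.11, 0.14],
])

def best_path(scores):
    rows, cols = scores.shape
    best = scores.astype(float).copy()   # best[i, j]: max total after picking (i, j)
    nxt = np.full((rows, cols), -1)      # best column to pick in row i+1; -1 = stop
    for i in range(rows - 2, -1, -1):
        for j in range(cols - 1):        # picking the last column always stops
            k = j + 1 + int(np.argmax(best[i + 1, j + 1:]))
            best[i, j] += best[i + 1, k]
            nxt[i, j] = k
    j = int(np.argmax(best[0]))          # must start in the first row
    path, i = [j], 0
    while j != cols - 1 and i < rows - 1:
        j = int(nxt[i, j])
        path.append(j)
        i += 1
    return float(best[0, path[0]]), path

score, path = best_path(appr_0)
print(score, [c + 1 for c in path])      # columns reported 1-based
```

This runs in O(rows × cols²) time (O(rows × cols) with a suffix-max array) instead of enumerating every combination of columns.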
I am benchmarking multiple problems for multiple systems using Gekko, and I would like to get my code to return the function calls, iterations, and time it takes to solve. I know that the solver automatically prints all of this data but is there an object or attribute that can be returned to allow my function to return the numeric values?
Here is an example of how the code is set up.
from gekko import GEKKO
import numpy as np

def model(plot=False):
    t = np.linspace(0, 1, 101)
    m = GEKKO(remote=False); m.time=t
    fe = m.Param(np.cos(2*np.pi*t)+3)
    de = m.Var(fe[0])
    e = m.CV(0); e.STATUS=1; e.SPHI=e.SPLO=0; e.WSPHI=1000; e.WSPLO=1
    der = m.MV(0, lb=-1, ub=1); der.STATUS=1
    m.Equations([de.dt() == der, e == fe-de])
    m.options.IMODE=6; m.solve()
    if plot:
        import matplotlib.pyplot as plt
        plt.plot(t, fe)
        plt.plot(t, de)
        plt.plot(t, der)
        plt.show()
    return m.fcalls

if __name__ == "__main__":
    model(plot=True)
The objective function value, iterations, solve time, and solution status are available in Gekko with:
m.options.OBJFCNVAL
m.options.ITERATIONS
m.options.SOLVETIME
m.options.APPSTATUS
You could return these as a list as I've done with summary.
from gekko import GEKKO
import numpy as np

def model(plot=False):
    t = np.linspace(0, 1, 101)
    m = GEKKO(remote=False); m.time=t
    fe = m.Param(np.cos(2*np.pi*t)+3)
    de = m.Var(fe[0])
    e = m.CV(0); e.STATUS=1; e.SPHI=e.SPLO=0; e.WSPHI=1000; e.WSPLO=1
    der = m.MV(0, lb=-1, ub=1); der.STATUS=1
    m.Equations([de.dt() == der, e == fe-de])
    m.options.DIAGLEVEL=1
    m.options.SOLVER=1
    m.options.IMODE=6; m.solve()
    if plot:
        import matplotlib.pyplot as plt
        plt.plot(t, fe)
        plt.plot(t, de)
        plt.plot(t, der)
        plt.savefig('result.png')
    return [m.options.OBJFCNVAL,
            m.options.ITERATIONS,
            m.options.SOLVETIME,
            m.options.APPSTATUS]

if __name__ == "__main__":
    summary = model(plot=True)
    print(summary)
If you want function calls, it is a little more complicated because there are different types of function calls. There are function calls for the objective function and constraints, function calls for 1st derivatives, and function calls for 2nd derivatives. You can get a complete report of all the subroutine calls and the individual and cumulative times for each by setting m.options.DIAGLEVEL=1 or higher. Here is the solver output for this problem:
Number of state variables: 1900
Number of total equations: - 1800
Number of slack variables: - 0
---------------------------------------
Degrees of freedom : 100
----------------------------------------------
Dynamic Control with APOPT Solver
----------------------------------------------
Iter Objective Convergence
0 9.81590E+01 1.00000E+00
1 7.62224E+01 4.00000E-10
2 7.62078E+01 1.10674E-02
3 7.62078E+01 1.00000E-10
4 7.62078E+01 8.32667E-17
5 7.62078E+01 8.32667E-17
Successful solution
---------------------------------------------------
Solver : APOPT (v1.0)
Solution time : 0.5382 sec
Objective : 76.20778997271815
Successful solution
---------------------------------------------------
Some solvers, like IPOPT, don't have the iterations readily available from the API so they are always reported as zero. With APOPT, the summary list is [76.207789973, 5, 0.5253, 1]. The timing and function call report is after the solver summary.
Timer # 1 0.70/ 1 = 0.70 Total system time
Timer # 2 0.54/ 1 = 0.54 Total solve time
Timer # 3 0.05/ 9 = 0.01 Objective Calc: apm_p
Timer # 4 0.00/ 5 = 0.00 Objective Grad: apm_g
Timer # 5 0.02/ 9 = 0.00 Constraint Calc: apm_c
Timer # 6 0.00/ 0 = 0.00 Sparsity: apm_s
Timer # 7 0.00/ 0 = 0.00 1st Deriv #1: apm_a1
Timer # 8 0.00/ 5 = 0.00 1st Deriv #2: apm_a2
Timer # 9 0.02/ 200 = 0.00 Custom Init: apm_custom_init
Timer # 10 0.00/ 200 = 0.00 Mode: apm_node_res::case 0
Timer # 11 0.00/ 600 = 0.00 Mode: apm_node_res::case 1
Timer # 12 0.00/ 200 = 0.00 Mode: apm_node_res::case 2
Timer # 13 0.00/ 400 = 0.00 Mode: apm_node_res::case 3
Timer # 14 0.00/ 4800 = 0.00 Mode: apm_node_res::case 4
Timer # 15 0.00/ 2000 = 0.00 Mode: apm_node_res::case 5
Timer # 16 0.00/ 0 = 0.00 Mode: apm_node_res::case 6
Timer # 17 0.00/ 5 = 0.00 Base 1st Deriv: apm_jacobian
Timer # 18 0.02/ 5 = 0.00 Base 1st Deriv: apm_condensed_jacobian
Timer # 19 0.00/ 1 = 0.00 Non-zeros: apm_nnz
Timer # 20 0.00/ 0 = 0.00 Count: Division by zero
Timer # 21 0.00/ 0 = 0.00 Count: Argument of LOG10 negative
Timer # 22 0.00/ 0 = 0.00 Count: Argument of LOG negative
Timer # 23 0.00/ 0 = 0.00 Count: Argument of SQRT negative
Timer # 24 0.00/ 0 = 0.00 Count: Argument of ASIN illegal
Timer # 25 0.00/ 0 = 0.00 Count: Argument of ACOS illegal
Timer # 26 0.00/ 1 = 0.00 Extract sparsity: apm_sparsity
Timer # 27 0.00/ 17 = 0.00 Variable ordering: apm_var_order
Timer # 28 0.00/ 1 = 0.00 Condensed sparsity
Timer # 29 0.00/ 0 = 0.00 Hessian Non-zeros
Timer # 30 0.00/ 3 = 0.00 Differentials
Timer # 31 0.00/ 0 = 0.00 Hessian Calculation
Timer # 32 0.00/ 0 = 0.00 Extract Hessian
Timer # 33 0.00/ 1 = 0.00 Base 1st Deriv: apm_jac_order
Timer # 34 0.06/ 1 = 0.06 Solver Setup
Timer # 35 0.40/ 1 = 0.40 Solver Solution
Timer # 36 0.00/ 23 = 0.00 Number of Variables
Timer # 37 0.00/ 12 = 0.00 Number of Equations
Timer # 38 0.05/ 17 = 0.00 File Read/Write
Timer # 39 0.00/ 1 = 0.00 Dynamic Init A
Timer # 40 0.02/ 1 = 0.02 Dynamic Init B
Timer # 41 0.02/ 1 = 0.02 Dynamic Init C
Timer # 42 0.00/ 1 = 0.00 Init: Read APM File
Timer # 43 0.00/ 1 = 0.00 Init: Parse Constants
Timer # 44 0.00/ 1 = 0.00 Init: Model Sizing
Timer # 45 0.00/ 1 = 0.00 Init: Allocate Memory
Timer # 46 0.00/ 1 = 0.00 Init: Parse Model
Timer # 47 0.00/ 1 = 0.00 Init: Check for Duplicates
Timer # 48 0.00/ 1 = 0.00 Init: Compile Equations
Timer # 49 0.00/ 1 = 0.00 Init: Check Uninitialized
Timer # 50 0.00/ 205 = 0.00 Evaluate Expression Once
Timer # 51 0.00/ 0 = 0.00 Sensitivity Analysis: LU Factorization
Timer # 52 0.00/ 0 = 0.00 Sensitivity Analysis: Gauss Elimination
Timer # 53 0.00/ 0 = 0.00 Sensitivity Analysis: Total Time
Timers 3, 4, and 5 are probably most relevant to your question. They are objective function requests, 1st derivative requests, and constraint evaluation requests.
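For benchmarking several problems, the four-element summary returned by each model function can be collected into one table. This is a sketch with placeholder lambdas standing in for the actual GEKKO model functions; the names problem_a/problem_b and their numbers are made up for illustration:

```python
import pandas as pd

def collect(models):
    """Run each named model function and tabulate its summary list."""
    rows = {name: fn() for name, fn in models.items()}
    return pd.DataFrame.from_dict(
        rows, orient="index",
        columns=["objective", "iterations", "solve_time", "status"])

# Placeholder functions standing in for solves that each return
# [OBJFCNVAL, ITERATIONS, SOLVETIME, APPSTATUS].
models = {
    "problem_a": lambda: [76.2078, 5, 0.5382, 1],
    "problem_b": lambda: [12.3456, 9, 1.2345, 1],
}
table = collect(models)
print(table)
```

With the real model functions in place of the lambdas, the resulting frame gives one row per benchmark problem, ready for comparison or export.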
I have a data frame with some quantitative columns and one qualitative column. I would like to use describe to compute stats, grouped by the qualitative column. But I do not obtain the order I want for the levels. Here is an example:
df = pd.DataFrame({k: np.random.random(10) for k in "ABC"})
df["qual"] = 5 * ["init"] + 5 * ["final"]
The DataFrame looks like:
A B C qual
0 0.298217 0.675818 0.076533 init
1 0.015442 0.264924 0.624483 init
2 0.096961 0.702419 0.027134 init
3 0.481312 0.910477 0.796395 init
4 0.166774 0.319054 0.645250 init
5 0.609148 0.697818 0.151092 final
6 0.715744 0.067429 0.761562 final
7 0.748201 0.803647 0.482738 final
8 0.098323 0.614257 0.232904 final
9 0.033003 0.590819 0.943126 final
Now I would like to group by the qual column and compute statistical descriptors using describe. I did the following:
ddf = df.groupby("qual").describe().transpose()
ddf.unstack(level=0)
And I got
qual final init
A B C A B C
count 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000
mean 0.440884 0.554794 0.514284 0.211741 0.574539 0.433959
std 0.347138 0.284931 0.338057 0.182946 0.274135 0.355515
min 0.033003 0.067429 0.151092 0.015442 0.264924 0.027134
25% 0.098323 0.590819 0.232904 0.096961 0.319054 0.076533
50% 0.609148 0.614257 0.482738 0.166774 0.675818 0.624483
75% 0.715744 0.697818 0.761562 0.298217 0.702419 0.645250
max 0.748201 0.803647 0.943126 0.481312 0.910477 0.796395
I am close to what I want, but I would like to swap and group the column index like this:

          A            B            C
qual  init  final  init  final  init  final

Is there a way to do it?
Use columns.swaplevel and then sort_index by level=0 and axis='columns':
ddf = df.groupby('qual').describe().T.unstack(level=0)
ddf.columns = ddf.columns.swaplevel(0,1)
ddf = ddf.sort_index(level=0, axis='columns')
Or in one line using DataFrame.swaplevel instead of index.swaplevel:
ddf = ddf.swaplevel(0,1, axis=1).sort_index(level=0, axis='columns')
A B C
qual final init final init final init
count 5.00 5.00 5.00 5.00 5.00 5.00
mean 0.44 0.21 0.55 0.57 0.51 0.43
std 0.35 0.18 0.28 0.27 0.34 0.36
min 0.03 0.02 0.07 0.26 0.15 0.03
25% 0.10 0.10 0.59 0.32 0.23 0.08
50% 0.61 0.17 0.61 0.68 0.48 0.62
75% 0.72 0.30 0.70 0.70 0.76 0.65
max 0.75 0.48 0.80 0.91 0.94 0.80
Try ddf.stack().unstack(level=[0,2]) in place of ddf.unstack(level=0).
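If the goal is specifically init before final (rather than the alphabetical order sort_index produces, which puts final first), the swapped columns can be reordered explicitly with reindex. A sketch using the example frame from the question; the random seed is mine, added only to make the run reproducible:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({k: np.random.random(10) for k in "ABC"})
df["qual"] = 5 * ["init"] + 5 * ["final"]

ddf = df.groupby("qual").describe().T.unstack(level=0)
ddf = ddf.swaplevel(0, 1, axis=1)
# sort_index would order the qual level alphabetically (final before init);
# reindex against an explicit MultiIndex forces init first.
order = pd.MultiIndex.from_product([list("ABC"), ["init", "final"]])
ddf = ddf.reindex(columns=order)
print(ddf.columns.tolist())
```

The same reindex call accepts any custom level order, so it generalizes to more than two group labels.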
I am stuck trying to query nearest neighbors of models from a pdb file using scipy's kd-tree. I have currently implemented a brute-force approach where I compare each model's rmsd value to every other model. I would like to speed up finding each model's nearest neighbors by using a kd-tree.
For reference, a sample of the pdb file I am working with has multiple models in a single file:
MODEL 5
HETATM 1 C1 SIN A 0 13.542 -2.290 0.745 1.00 0.00 C
HETATM 2 O1 SIN A 0 14.446 -2.652 0.010 1.00 0.00 O
HETATM 3 O2 SIN A 0 12.378 -2.189 0.395 1.00 0.00 O
...
TER 627 NH2 A 39
ENDMDL
MODEL 6
HETATM 1 C1 SIN A 0 11.762 2.281 -7.835 1.00 0.00 C
ATOM 26 C TRP A 2 11.341 6.316 -0.847 1.00 0.00 C
ATOM 27 O TRP A 2 11.074 6.179 0.330 1.00 0.00 O
ATOM 28 CB TRP A 2 13.182 7.844 -1.538 1.00 0.00 C
ATOM 29 CG TRP A 2 12.069 8.524 -2.259 1.00 0.00 C
...
HETATM 626 HN2 NH2 A 39 3.093 9.404 -6.782 1.00 0.00 H
TER 627 NH2 A 39
ENDMDL
MODEL 7
HETATM 1 C1 SIN A 0 -16.074 -1.515 -4.262 1.00 0.00 C
HETATM 2 O1 SIN A 0 -16.968 -1.910 -4.992 1.00 0.00 O
...
ATOM 18 OD1 ASP A 1 -12.877 3.426 -8.525 1.00 0.00 O
ATOM 19 OD2 ASP A 1 -13.484 1.785 -9.782 1.00 0.00 O
TER 627 NH2 A 39
ENDMDL
My initial attempt was to represent each model as a list of atom coordinates, where each 3D atom coordinate is itself a list:
print(model_coord)
[
[[1.4579, 0.0, 0.0],... ,[-5.5, 21.5529, 23.7390]],
[[16.5450, 3.3699, 10.1888], ... ,[-0.0963, 24.510883331298828, 20.2952]],
[[17.6256, 2.5858, 12.4808],... ,[-11.6052, 13.1031, 23.8958]]
]
I then received the following error when creating kdtree object:
kdtree = scipy.spatial.KDTree(model_coord)
File "/Library/Python/2.7/site-packages/scipy/spatial/kdtree.py", line 235, in __init__
self.n, self.m = np.shape(self.data)
ValueError: too many values to unpack
However, converting model_coord into a pandas DataFrame allowed me to obtain the n-by-m shape required to create the kdtree object, where each row represents a model and each column holds a 3D atom coordinate:
model_df = pd.DataFrame(model_coord)
print(model_df.to_string())
0 1 2 ...
0 [1.45799, 0.0, 0.0] [3.9140, 2.8670, 0.4530] [7.590, 3.7990, 0.1850] ...
1 [16.5450, 3.3699, 10.1888] [15.9148, 1.9402, 13.6552] [14.4702, 2.6485, 17.0995] ...
2 [17.6256, 2.5858, 12.4808] [16.4266, 2.2781, 16.0749] [12.6480, 2.6846, 16.0066] …
Here is my attempt to query the nearest neighbors of a model within a radius, where epsilon is the radius:
kdtree = scipy.spatial.KDTree(model_df)
for index, model in model_df.iterrows():
model_nn_dist, model_nn_ids = kdtree.query(model,distance_upper_bound=epsilon)
I received the following error because the coordinates are list objects:
model_nn_dist, model_nn_ids=kdtree.query(model,distance_upper_bound=epsilon)
File "/Library/Python/2.7/site-packages/scipy/spatial/kdtree.py", line 521, in query
hits = self.__query(x, k=k, eps=eps, p=p,distance_upper_bound=distance_upper_bound)
File "/Library/Python/2.7/site-packages/scipy/spatial/kdtree.py", line 320, in __query
side_distances = np.maximum(0,np.maximum(x-self.maxes,self.mins-x))
TypeError: unsupported operand type(s) for -: 'list' and 'list'
I attempted to resolve this by converting the atom coordinates into numpy arrays; however, this is the error I receive:
model_nn_dist, model_nn_ids = kdtree.query(model,distance_upper_bound=epsilon)
File "/Library/Python/2.7/site-packages/scipy/spatial/kdtree.py", line 521, in query
hits = self.__query(x, k=k, eps=eps, p=p, distance_upper_bound=distance_upper_bound)
File "/Library/Python/2.7/site-packages/scipy/spatial/kdtree.py", line 320, in __query
side_distances = np.maximum(0,np.maximum(x-self.maxes,self.mins-x))
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I am wondering if there is a better approach or a more suitable data structure to query nearest neighbors of models or sets of coordinates, using kd-trees.
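Assuming every model in the file has the same number of atoms in the same order (as the shared TER 627 records suggest), one workable layout is to flatten each model's N × 3 coordinates into a single 3N-vector, so the tree receives the (n_models, 3N) array KDTree expects. Note that the Euclidean distance between two flattened vectors equals sqrt(N) times the unaligned RMSD, so an RMSD radius epsilon can be converted by multiplying by sqrt(N); this does not perform the optimal superposition that structural RMSD usually implies. A sketch with toy random coordinates standing in for the parsed pdb models:

```python
import numpy as np
from scipy.spatial import cKDTree

# Toy stand-in for the parsed pdb file: n_models models, each with n_atoms atoms.
n_models, n_atoms = 50, 20
rng = np.random.default_rng(0)
model_coord = rng.random((n_models, n_atoms, 3))

# Flatten each model to one 3*n_atoms vector -> shape (n_models, 3*n_atoms).
flat = np.asarray(model_coord, dtype=float).reshape(n_models, -1)

tree = cKDTree(flat)
epsilon = 0.5  # desired RMSD radius
# Euclidean distance on flattened coords = sqrt(n_atoms) * unaligned RMSD.
neighbors = tree.query_ball_point(flat, r=epsilon * np.sqrt(n_atoms))
# neighbors[i] lists the indices of models within epsilon RMSD of model i
# (each list includes i itself, since a model is at distance 0 from itself).
print(len(neighbors[0]))
```

With real data, build model_coord by collecting the coordinate records between each MODEL/ENDMDL pair; all models must have identical atom counts and ordering for the flattening to be meaningful.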
So I have a query; I am accessing an API that gives the following response:
[["22014",201939,"0021401229","APR 15 2015",Team1 vs. Team2","W",
19,4,10,0.4,2,4,0.5,0,0,0,2,2,4,7,5,0,2,1,10,14,1],["22014",201939,"0021401","APR
13 2015",Team1 vs. Team3","W",
15,4,13,0.4,2,8,0.5,0,0,0,2,2,4,7,5,0,8,1,12,14,1],["22014",201939,"0021401192","APR
11 2015",Team1 vs. Team4","W",
22,5,10,0.4,2,6,0.5,0,0,0,2,2,4,7,5,0,2,1,8,14,1]]
I could just as easily have 16 different variables that I assign zero to, then print them out like the following example:
sum_pts = 0
for n in range(0, len(shot_data)):  # range of games; these lengths vary per player
    sum_pts = sum_pts + float(json.dumps(shots_array[n][24]))
print sum_pts / float(len(shots_array))
Output:
>>>
23.75
But I'd rather not create 16 different variables to calculate the averages of the individual elements in this list. I'm looking for an easier way to get the averages for Team1.
I would like the output to eventually look like the following, so that I can apply this to any number of players or individual stats:
Team1 AVGPTS AVGAST AVGSTL AVGREB...
23.75 5.3 2.1 3.2
Or it could be:
Player1 AVGPTS AVGAST AVGSTL AVGREB ...
23.75 5.3 2.1 3.2 ...
To get the averages of the numeric entries (everything from the seventh column onward), you could use the following approach; this avoids the need to define a separate variable for each column:
data = [
    ["22014", 201939, "0021401229", "APR 15 2015", "Team1 vs. Team2", "W",
     19, 4, 10, 0.4, 2, 4, 0.5, 0, 0, 0, 2, 2, 4, 7, 5, 0, 2, 1, 10, 14, 1],
    ["22014", 201939, "0021401", "APR 13 2015", "Team1 vs. Team3", "W",
     15, 4, 13, 0.4, 2, 8, 0.5, 0, 0, 0, 2, 2, 4, 7, 5, 0, 8, 1, 12, 14, 1],
    ["22014", 201939, "0021401192", "APR 11 2015", "Team1 vs. Team4", "W",
     22, 5, 10, 0.4, 2, 6, 0.5, 0, 0, 0, 2, 2, 4, 7, 5, 0, 2, 1, 8, 14, 1]]

length = float(len(data))
values = []
for entry in data:
    values.append(entry[6:])

values = zip(*values)
averages = [sum(v) / length for v in values]
for col in averages:
    print "{:.2f} ".format(col),
This would display:
18.67 4.33 11.00 0.40 2.00 6.00 0.50 0.00 0.00 0.00 2.00 2.00 4.00 7.00 5.00 0.00 4.00 1.00 10.00 14.00 1.00
Note, your data is missing an opening quote before each "Team1 vs. TeamN" string.
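The zip-based averaging above can also be done with numpy, which makes it easy to attach labels like the AVGPTS-style headers in your desired output. The stat names below are hypothetical placeholders; map them to whatever field order the API actually documents:

```python
import numpy as np

data = [
    ["22014", 201939, "0021401229", "APR 15 2015", "Team1 vs. Team2", "W",
     19, 4, 10, 0.4, 2, 4, 0.5, 0, 0, 0, 2, 2, 4, 7, 5, 0, 2, 1, 10, 14, 1],
    ["22014", 201939, "0021401", "APR 13 2015", "Team1 vs. Team3", "W",
     15, 4, 13, 0.4, 2, 8, 0.5, 0, 0, 0, 2, 2, 4, 7, 5, 0, 8, 1, 12, 14, 1],
    ["22014", 201939, "0021401192", "APR 11 2015", "Team1 vs. Team4", "W",
     22, 5, 10, 0.4, 2, 6, 0.5, 0, 0, 0, 2, 2, 4, 7, 5, 0, 2, 1, 8, 14, 1],
]

# Numeric stats start at index 6 of each row.
stats = np.array([row[6:] for row in data], dtype=float)
averages = stats.mean(axis=0)

team = data[0][4].split(" vs.")[0]  # "Team1"
# Hypothetical stat labels; replace with the real API field names.
labels = ["AVGMIN", "AVGFGM", "AVGFGA"] + ["COL%d" % i for i in range(3, len(averages))]
print(team)
print("  ".join("%s=%.2f" % (l, a) for l, a in zip(labels, averages)))
```

Because the stats form a 2-D array, the same code works for any number of rows per player; grouping rows by player (or team) before calling mean gives the per-player table you describe.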