Gurobi Multiple Objective Function Hierarchical Degradation - python

I'm trying to implement a Gurobi model with two objective functions that are solved lexicographically (in a hierarchy), but I'm running into an issue: when optimizing the second objective, the solver degrades the solution to the first one, which should not happen with hierarchical optimization. It degrades the first objective by 1 in order to improve the second by 5. Could this be an error in how I set up my model hierarchically? This is the code where I set up my model:
m = Model('lexMin Model')
m.ModelSense = GRB.MINIMIZE
variable = m.addVars(k.numVars, vtype=GRB.BINARY, name='variable')
m.setObjectiveN(LinExpr(quicksum([variable[j]*k.obj[0][j] for j in range(k.numVars)])),0)
m.setObjectiveN(LinExpr(quicksum([variable[j]*k.obj[1][j] for j in range(k.numVars)])),1)
for i in range(0,k.numConst):
    m.addConstr(quicksum([k.const[i,j]*variable[j] for j in range(k.numVars)]) <= k.constRHS[i])
m.addConstr(quicksum([variable[j]*k.obj[0][j] for j in range(k.numVars)]) >= r2[0][0])
m.addConstr(quicksum([variable[j]*k.obj[0][j] for j in range(k.numVars)]) <= r2[1][0])
m.addConstr(quicksum([variable[j]*k.obj[1][j] for j in range(k.numVars)]) >= r2[1][1])
m.addConstr(quicksum([variable[j]*k.obj[1][j] for j in range(k.numVars)]) <= r2[0][1])
m.Params.ObjNumber = 0
m.ObjNPriority = 1
m.update()
m.optimize()
I've double-checked that the priority of the second objective is 0; the objective values are nowhere near where they'd be if I had prioritized the wrong function. When optimizing the first objective it even finds the right value, but when it moves on to the second objective it chooses values that degrade the first.
The Gurobi output looks like this:
Optimize a model with 6 rows, 375 columns and 2250 nonzeros
Model fingerprint: 0xac5de9aa
Variable types: 0 continuous, 375 integer (375 binary)
Coefficient statistics:
Matrix range [1e+01, 1e+02]
Objective range [1e+01, 1e+02]
Bounds range [1e+00, 1e+00]
RHS range [1e+04, 1e+04]
---------------------------------------------------------------------------
Multi-objectives: starting optimization with 2 objectives ...
---------------------------------------------------------------------------
Multi-objectives: applying initial presolve ...
---------------------------------------------------------------------------
Presolve time: 0.00s
Presolved: 6 rows and 375 columns
---------------------------------------------------------------------------
Multi-objectives: optimize objective 1 () ...
---------------------------------------------------------------------------
Presolve time: 0.00s
Presolved: 6 rows, 375 columns, 2250 nonzeros
Variable types: 0 continuous, 375 integer (375 binary)
Root relaxation: objective -1.461947e+04, 10 iterations, 0.00 seconds
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 -14619.473 0 3 - -14619.473 - - 0s
H 0 0 -14569.00000 -14619.473 0.35% - 0s
H 0 0 -14603.00000 -14619.473 0.11% - 0s
H 0 0 -14608.00000 -14619.473 0.08% - 0s
H 0 0 -14611.00000 -14618.032 0.05% - 0s
0 0 -14617.995 0 5 -14611.000 -14617.995 0.05% - 0s
0 0 -14617.995 0 3 -14611.000 -14617.995 0.05% - 0s
H 0 0 -14613.00000 -14617.995 0.03% - 0s
0 0 -14617.995 0 5 -14613.000 -14617.995 0.03% - 0s
0 0 -14617.995 0 5 -14613.000 -14617.995 0.03% - 0s
0 0 -14617.995 0 7 -14613.000 -14617.995 0.03% - 0s
0 0 -14617.995 0 3 -14613.000 -14617.995 0.03% - 0s
0 0 -14617.995 0 4 -14613.000 -14617.995 0.03% - 0s
0 0 -14617.995 0 6 -14613.000 -14617.995 0.03% - 0s
0 0 -14617.995 0 6 -14613.000 -14617.995 0.03% - 0s
0 0 -14617.995 0 6 -14613.000 -14617.995 0.03% - 0s
0 0 -14617.720 0 7 -14613.000 -14617.720 0.03% - 0s
0 0 -14617.716 0 8 -14613.000 -14617.716 0.03% - 0s
0 0 -14617.697 0 8 -14613.000 -14617.697 0.03% - 0s
0 0 -14617.661 0 9 -14613.000 -14617.661 0.03% - 0s
0 2 -14617.661 0 9 -14613.000 -14617.661 0.03% - 0s
* 823 0 16 -14614.00000 -14616.351 0.02% 2.8 0s
Cutting planes:
Gomory: 6
Cover: 12
MIR: 4
StrongCG: 2
Inf proof: 6
Zero half: 1
Explored 1242 nodes (3924 simplex iterations) in 0.29 seconds
Thread count was 8 (of 8 available processors)
Solution count 6: -14614 -14613 -14611 ... -14569
No other solutions better than -14614
Optimal solution found (tolerance 1.00e-04)
Best objective -1.461400000000e+04, best bound -1.461400000000e+04, gap 0.0000%
---------------------------------------------------------------------------
Multi-objectives: optimize objective 2 () ...
---------------------------------------------------------------------------
Loaded user MIP start with objective -12798
Presolve removed 1 rows and 0 columns
Presolve time: 0.01s
Presolved: 6 rows, 375 columns, 2250 nonzeros
Variable types: 0 continuous, 375 integer (375 binary)
Root relaxation: objective -1.282967e+04, 28 iterations, 0.00 seconds
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 -12829.673 0 3 -12798.000 -12829.673 0.25% - 0s
0 0 -12829.378 0 4 -12798.000 -12829.378 0.25% - 0s
0 0 -12829.378 0 3 -12798.000 -12829.378 0.25% - 0s
0 0 -12828.688 0 4 -12798.000 -12828.688 0.24% - 0s
H 0 0 -12803.00000 -12828.688 0.20% - 0s
0 0 -12825.806 0 5 -12803.000 -12825.806 0.18% - 0s
0 0 -12825.193 0 5 -12803.000 -12825.193 0.17% - 0s
0 0 -12823.156 0 6 -12803.000 -12823.156 0.16% - 0s
0 0 -12822.694 0 7 -12803.000 -12822.694 0.15% - 0s
0 0 -12822.679 0 7 -12803.000 -12822.679 0.15% - 0s
0 2 -12822.679 0 7 -12803.000 -12822.679 0.15% - 0s
Cutting planes:
Cover: 16
MIR: 6
StrongCG: 3
Inf proof: 4
RLT: 1
Explored 725 nodes (1629 simplex iterations) in 0.47 seconds
Thread count was 8 (of 8 available processors)
Solution count 2: -12803 -12798
No other solutions better than -12803
Optimal solution found (tolerance 1.00e-04)
Best objective -1.280300000000e+04, best bound -1.280300000000e+04, gap 0.0000%
So it finds the values (-14613,-12803) instead of (-14614,-12798)

The default MIPGap is 1e-4, and the first objective is degrading by less than that (1/14614 ≈ 0.7e-4). If you lower the MIPGap, your issue should go away. In your code add
m.Params.MIPGap = 1e-6
before the optimize call. (You can equivalently use m.setParam('MIPGap', 1e-6); note that setObjective is for objectives, not parameters.)
One way to reason about this behavior is that since you had a MIPGap of 1e-4, you would have accepted a solution with value -14613 even if you didn't have a second objective.
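To see the arithmetic behind this, here is a quick check of the relative gap between the two first-objective values taken from the log above:

```python
# Relative degradation of the first objective: -14614 (true optimum) vs. -14613.
best = -14614.0
degraded = -14613.0
rel_gap = abs(degraded - best) / abs(best)
print(rel_gap)  # roughly 6.8e-05, i.e. below the default MIPGap of 1e-4
```

So within the default tolerance, -14613 counts as "optimal enough" for the first objective, which is exactly why the second pass is allowed to trade it away.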

Issue in Appending data using Pandas

Problem: I want to build logic that takes data such as attendance date, in-time, and employee ID, and returns a data frame with employee ID, in-time, attendance date, and the slot in which the employee entered. (Suppose the in-time is 9:30:00 on 14-10-2019; if the employee came at 9:30, then for that date and that column it inserts the value one.)
An example is given below.
I tried many times to build the logic for this problem but failed.
I have a dataset that looks like this.
I want an output such that whatever time the employee enters, it inserts data into that time's column only.
This is my code, but it only repeats the last loop:
temp = []
for date in nf['DaiGong']:
    for en in nf['EnNo']:
        for i in nf['DateTime']:
            col = ['EnNo','Date','InTime','9:30-10:30','10:30-11:00','11:00-11:30','11:30-12:30','12:30-13:00','13:00-13:30']
            ndf = pd.DataFrame(columns=col)
            if i < '10:30:00' and i > '09:30:00':
                temp.append(1)
                ndf['9:30-10:30'] = temp
                ndf['InTime'] = i
                ndf['Date'] = date
                ndf['EnNo'] = en
            elif i < '11:00:00' and i > '10:30:00':
                temp.append(1)
                ndf['10:30-11:00'] = temp
                ndf['InTime'] = i
                ndf['Date'] = date
                ndf['EnNo'] = en
            elif i < '11:30:00' and i > '11:00:00':
                temp.append(1)
                ndf['11:00-11:30'] = temp
                ndf['InTime'] = i
                ndf['Date'] = date
                ndf['EnNo'] = en
            elif i < '12:30:00' and i > '11:30:00':
                temp.append(1)
                ndf['11:30-12:30'] = temp
                ndf['InTime'] = i
                ndf['Date'] = date
                ndf['EnNo'] = en
            elif i < '13:00:00' and i > '12:30:00':
                temp.append(1)
                ndf['12:30-13:00'] = temp
                ndf['InTime'] = i
                ndf['Date'] = date
                ndf['EnNo'] = en
            elif i < '13:30:00' and i > '13:00:00':
                temp.append(1)
                ndf['13:00-13:30'] = temp
                ndf['InTime'] = i
                ndf['Date'] = date
                ndf['EnNo'] = en
This is the output of my code.
IIUC,
df = pd.DataFrame({'EnNo': [2, 2, 2, 2, 2, 3, 3, 3, 3],
                   'DaiGong': ['2019-10-12', '2019-10-13', '2019-10-14', '2019-10-15', '2019-10-16', '2019-10-12', '2019-10-13', '2019-10-14', '2019-10-15'],
                   'DateTime': ['09:53:56', '10:53:56', '09:23:56', '11:53:56', '11:23:56', '10:33:56', '12:53:56', '12:23:56', '09:53:56']})
df
DaiGong DateTime EnNo
0 2019-10-12 09:53:56 2
1 2019-10-13 10:53:56 2
2 2019-10-14 09:23:56 2
3 2019-10-15 11:53:56 2
4 2019-10-16 11:23:56 2
5 2019-10-12 10:33:56 3
6 2019-10-13 12:53:56 3
7 2019-10-14 12:23:56 3
8 2019-10-15 09:53:56 3
import datetime
df['DateTime'] = pd.to_datetime(df['DateTime']).dt.time #converting to datetime
def time_range(row):  # I only wrote two conditions - add more
    i = row['DateTime']
    if i < datetime.time(10, 30, 0) and i > datetime.time(9, 30, 0):
        return '9:30-10:30'
    elif i < datetime.time(11, 0, 0) and i > datetime.time(10, 30, 0):
        return '10:30-11:00'
    else:
        return 'greater than 11:00'
df['time range'] = df.apply(time_range, axis=1)
df1 = pd.concat([df[['EnNo', 'DaiGong', 'DateTime']], pd.get_dummies(df['time range'])], axis=1)
df1
EnNo DaiGong DateTime 10:30-11:00 9:30-10:30 greater than 11:00
0 2 2019-10-12 09:53:56 0 1 0
1 2 2019-10-13 10:53:56 1 0 0
2 2 2019-10-14 09:23:56 0 0 1
3 2 2019-10-15 11:53:56 0 0 1
4 2 2019-10-16 11:23:56 0 0 1
5 3 2019-10-12 10:33:56 1 0 0
6 3 2019-10-13 12:53:56 0 0 1
7 3 2019-10-14 12:23:56 0 0 1
8 3 2019-10-15 09:53:56 0 1 0
To get sum of count by employee,
df1.groupby(['EnNo'], as_index=False).sum()
Let me know if you have any questions
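As a variant of the same idea, the if/elif chain can be replaced by binning with pd.cut. This is a minimal sketch on a few rows of the toy data (column names taken from the example above; the bin edges here are illustrative, not the full set from the question):

```python
import pandas as pd

df = pd.DataFrame({'EnNo': [2, 2, 3],
                   'DaiGong': ['2019-10-12', '2019-10-13', '2019-10-12'],
                   'DateTime': ['09:53:56', '10:53:56', '10:33:56']})

# Convert clock-time strings to seconds since midnight so the bin
# edges are plain numbers that pd.cut can work with.
secs = pd.to_timedelta(df['DateTime']).dt.total_seconds()
edges = [9.5 * 3600, 10.5 * 3600, 11 * 3600]   # 09:30, 10:30, 11:00
labels = ['9:30-10:30', '10:30-11:00']
df['time range'] = pd.cut(secs, bins=edges, labels=labels)
print(pd.get_dummies(df['time range']))
```

Adding more slots is then just a matter of extending `edges` and `labels`, instead of writing another elif branch.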
My test data:
df:
EnNo DaiGong DateTime
2 2019-10-12 09:53:56
2 2019-10-13 09:42:00
2 2019-10-14 12:00:01
1 2019-11-01 11:12:00
1 2019-11-02 10:13:45
Create helper datas:
tdr = pd.timedelta_range("09:00:00", "12:30:00", freq="30T")
s = pd.Series(len(tdr) * ["-"])
s[0] = 1
cls = [t.rsplit(":", maxsplit=1)[0] for t in tdr.astype(str)]
cols = [t1 + "-" + t2 for (t1, t2) in zip(cls, cls[1:])]
cols.append(cls[-1] + "-")
tdr:
TimedeltaIndex(['09:00:00', '09:30:00', '10:00:00', '10:30:00', '11:00:00', '11:30:00', '12:00:00', '12:30:00'], dtype='timedelta64[ns]', freq='30T')
cols:
['09:00-09:30', '09:30-10:00', '10:00-10:30', '10:30-11:00', '11:00-11:30', '11:30-12:00', '12:00-12:30', '12:30-']
s:
0 1
1 -
2 -
3 -
4 -
5 -
6 -
7 -
dtype: object
Use 'apply' and 'searchsorted' to get time slots:
df2 = df.DateTime.apply(lambda t: s.shift(tdr.searchsorted(t) - 1, fill_value="-"))
df2.columns = cols
df2:
09:00-09:30 09:30-10:00 10:00-10:30 10:30-11:00 11:00-11:30 11:30-12:00 12:00-12:30 12:30-
0 - 1 - - - - - -
1 - 1 - - - - - -
2 - - - - - - 1 -
3 - - - - 1 - - -
4 - - 1 - - - - -
Finally, concatenate the two data frames:
df_rslt= pd.concat([df,df2],axis=1)
df_rslt:
EnNo DaiGong DateTime 09:00-09:30 09:30-10:00 10:00-10:30 10:30-11:00 11:00-11:30 11:30-12:00 12:00-12:30 12:30-
0 2 2019-10-12 09:53:56 - 1 - - - - - -
1 2 2019-10-13 09:42:00 - 1 - - - - - -
2 2 2019-10-14 12:00:01 - - - - - - 1 -
3 1 2019-11-01 11:12:00 - - - - 1 - - -
4 1 2019-11-02 10:13:45 - - 1 - - - - -

Difference between Gurobi Lazy cuts counter and own counter

I'm implementing a B&C algorithm and keeping a counter that I increment by 1 each time a lazy constraint is added.
After solving, there is a big difference between my count and the number of lazy constraints Gurobi reports. What could be causing this difference?
Thanks.
Changed value of parameter LazyConstraints to 1
Prev: 0 Min: 0 Max: 1 Default: 0
Optimize a model with 67 rows, 442 columns and 1154 nonzeros
Variable types: 22 continuous, 420 integer (420 binary)
Coefficient statistics:
Matrix range [1e+00, 1e+00]
Objective range [1e-01, 5e+00]
Bounds range [1e+00, 1e+00]
RHS range [1e+00, 1e+01]
Presolve removed 8 rows and 42 columns
Presolve time: 0.00s
Presolved: 59 rows, 400 columns, 990 nonzeros
Variable types: 1 continuous, 399 integer (399 binary)
Root relaxation: objective 2.746441e+00, 37 iterations, 0.00 seconds
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 4.18093 0 20 - 4.18093 - - 0s
H 0 0 21.2155889 4.18093 80.3% - 0s
0 0 5.91551 0 31 21.21559 5.91551 72.1% - 0s
H 0 0 18.8660609 5.91551 68.6% - 0s
0 0 6.35067 0 38 18.86606 6.35067 66.3% - 0s
H 0 0 17.9145774 6.35067 64.6% - 0s
0 0 6.85254 0 32 17.91458 6.85254 61.7% - 0s
H 0 0 17.7591641 6.85254 61.4% - 0s
0 0 7.20280 0 50 17.75916 7.20280 59.4% - 0s
H 0 0 17.7516768 7.20280 59.4% - 0s
0 2 7.91616 0 51 17.75168 7.91616 55.4% - 0s
* 80 62 30 17.6301180 8.69940 50.7% 10.7 0s
* 169 138 35 16.3820478 9.10423 44.4% 9.9 1s
* 765 486 22 14.6853796 9.65509 34.3% 9.2 2s
* 1315 762 27 14.6428113 9.97011 31.9% 9.4 3s
* 1324 415 14 12.0742408 9.97011 17.4% 9.4 3s
H 1451 459 11.8261154 10.02607 15.2% 9.7 4s
1458 463 11.78416 15 58 11.82612 10.02607 15.2% 9.6 5s
* 1567 461 33 11.6541357 10.02607 14.0% 10.6 6s
4055 906 11.15860 31 36 11.65414 10.69095 8.26% 12.4 10s
Cutting planes:
Gomory: 4
Flow cover: 1
Lazy constraints: 228
Explored 7974 nodes (98957 simplex iterations) in 14.78 seconds
Thread count was 4 (of 4 available processors)
Solution count 10: 11.6541 11.8261 12.0742 ... 17.9146
Optimal solution found (tolerance 1.00e-04)
Best objective 1.165413573861e+01, best bound 1.165413573861e+01, gap 0.0000%
My Lazy constraints counter: 654
The displayed statistics on cutting planes after the optimization has finished (or stopped) only show the number of cutting planes that were active in the final LP relaxation that was solved. In particular, the number of lazy constraints that are active at that last node may be less than the total number of lazy constraints that were added in a callback. For example, Gurobi may add internal cutting planes during the optimization that dominate the original lazy constraint, or use the lazy constraint from the callback to derive other cuts instead of adding the original one.

Why doesn't the MILP produce a solution when it's obviously solvable?

I'm solving a MILP in a Python script with PuLP and the Gurobi solver, varying the parameters between runs.
A sensitivity analysis is done with a 'for' loop that changes a parameter on every run. The first runs use 'worst case' parameters (a very low-efficiency generator and very bad insulation material), and the parameters are gradually improved while looping through the MILP. At some point, when the parameters are set such that a solution should be found quite quickly, gurobipy does not seem to find one. This is the log:
Changed value of parameter TimeLimit to 300.0
Prev: 1e+100 Min: 0.0 Max: 1e+100 Default: 1e+100
Optimize a model with 8640 rows, 10080 columns and 20158 nonzeros
Variable types: 8640 continuous, 1440 integer (0 binary)
Coefficient statistics:
Matrix range [2e-05, 4e+04]
Objective range [1e+03, 1e+03]
Bounds range [7e-01, 4e+04]
RHS range [1e-02, 3e+04]
Presolve removed 7319 rows and 7331 columns
Presolve time: 0.03s
Presolved: 1321 rows, 2749 columns, 4069 nonzeros
Variable types: 1320 continuous, 1429 integer (1429 binary)
Root relaxation: objective 4.910087e+05, 679 iterations, 0.01 seconds
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 491008.698 0 11 - 491008.698 - - 0s
0 0 491008.698 0 11 - 491008.698 - - 0s
0 2 491008.698 0 11 - 491008.698 - - 0s
30429 24907 491680.652 942 3 - 491011.160 - 1.0 5s
73520 66861 491679.428 958 3 - 491011.996 - 1.0 10s
123770 116802 491762.182 1241 2 - 491012.439 - 1.0 15s
174010 165706 491896.963 1266 2 - 491012.636 - 1.0 20s
221580 212357 491234.860 1144 5 - 491012.931 - 1.0 25s
270004 259925 491187.818 904 5 - 491013.203 - 1.0 30s
322655 311334 491807.797 1254 2 - 491013.349 - 1.0 35s
379633 367554 491194.198 941 5 - 491013.571 - 1.0 40s
434035 420930 494029.008 1375 1 - 491013.695 - 1.0 45s
490442 476293 494016.622 1354 1 - 491013.851 - 1.0 50s
544923 529662 491203.097 990 5 - 491013.947 - 1.0 55s
597268 581228 492312.463 1253 2 - 491014.018 - 1.0 60s
650478 633331 491093.453 383 5 - 491014.133 - 1.0 65s
703246 685374 491755.974 1241 2 - 491014.188 - 1.0 70s
756675 737356 491069.420 272 6 - 491014.250 - 1.0 75s
811974 791502 491560.902 1235 3 - 491014.308 - 1.0 80s
866893 845452 491112.986 497 5 - 491014.345 - 1.0 85s
923793 901357 494014.134 1348 1 - 491014.390 - 1.0 90s
981961 958448 492971.305 1266 2 - 491014.435 - 1.0 95s
1039971 1015276 491545.502 1216 4 - 491014.502 - 1.0 100s
1097780 1071899 491171.468 818 5 - 491014.527 - 1.0 105s
1154447 1127328 491108.438 461 5 - 491014.591 - 1.0 110s
1212776 1184651 491024.147 57 6 - 491014.622 - 1.0 115s
1272535 1243171 495190.479 1266 2 - 491014.643 - 1.0 120s
1332126 1301674 491549.733 1228 3 - 491014.668 - 1.0 125s
1392772 1361287 491549.544 1219 3 - 491014.694 - 1.0 130s
1452380 1419870 491754.309 1237 2 - 491014.717 - 1.0 135s
1511070 1477572 491229.746 1131 5 - 491014.735 - 1.0 140s
1569783 1535126 491130.785 587 5 - 491014.764 - 1.0 145s
1628729 1593010 494026.669 1368 1 - 491014.775 - 1.0 150s
1687841 1651373 493189.023 1264 2 - 491014.810 - 1.0 155s
1747707 1709984 491548.263 1223 3 - 491014.841 - 1.0 160s
1807627 1768777 491160.795 755 5 - 491014.876 - 1.0 165s
1865730 1825486 494030.045 1379 1 - 491014.899 - 1.0 170s
1925615 1884356 494028.562 1374 1 - 491014.923 - 1.0 175s
1984204 1941827 491847.402 1115 2 - 491014.933 - 1.0 180s
2044016 2000572 491244.304 1210 5 - 491014.970 - 1.0 185s
2102125 2057622 491174.413 828 5 - 491014.989 - 1.0 190s
2161393 2115829 491115.089 532 5 - 491015.017 - 1.0 195s
2220721 2174168 491086.511 348 6 - 491015.041 - 1.0 200s
2281194 2233610 infeasible 1433 - 491015.048 - 1.0 205s
2341496 2292542 492824.696 1262 2 - 491015.069 - 1.0 210s
2399836 2349837 491548.142 1224 3 - 491015.084 - 1.0 215s
2459295 2408276 491178.869 853 5 - 491015.088 - 1.0 220s
2519203 2467098 491112.995 488 5 - 491015.106 - 1.0 225s
2578654 2525514 491069.711 270 6 - 491015.123 - 1.0 230s
2636111 2582093 491762.206 1250 2 - 491015.139 - 1.0 235s
2695962 2640805 491237.559 1152 5 - 491015.146 - 1.0 240s
2755319 2699171 491156.897 797 6 - 491015.161 - 1.0 245s
2813620 2756371 491024.109 43 7 - 491015.182 - 1.0 250s
2872810 2814527 492309.743 1255 2 - 491015.185 - 1.0 255s
2932550 2873227 492180.501 1255 2 - 491015.202 - 1.0 260s
2991586 2931246 491244.162 1207 5 - 491015.217 - 1.0 265s
3050385 2988872 491196.181 952 5 - 491015.228 - 1.0 270s
3110478 3047787 491127.746 560 5 - 491015.247 - 1.0 275s
3169730 3105844 491109.579 525 6 - 491015.266 - 1.0 280s
3229972 3165019 494029.916 1376 1 - 491015.276 - 1.0 285s
3289639 3223661 491861.516 1173 2 - 491015.293 - 1.0 290s
3349653 3282631 491862.419 1185 2 - 491015.305 - 1.0 295s
Explored 3409667 nodes (3506772 simplex iterations) in 300.02 seconds
Thread count was 8 (of 8 available processors)
Solution count 0
Time limit reached
Best objective -, best bound 4.910153206264e+05, gap -
('Gurobi status=', 9)
I've increased the maximum solving time to 300s (more takes up too much RAM and the program gets terminated at some point) and played around with parameters (worse parameter settings find a solution!), but nothing seems to work. What might be the problem?
I was able to resolve this by setting the maximum solving time to None and specifying a loose maximum gap.

pandas: rapidly calculating sum of column with certain values

I have a pandas dataframe and I need to calculate the sum of a column of values that fall within a certain window. So for instance, if I have a window of 500, and my initial value is 1000, I want to sum all values that are between 499 and 999, and also between 1001 and 1501.
This is easier to explain with some data:
chrom pos end AFR EUR pi
0 1 10177 10177 0.4909 0.4056 0.495988
1 1 10352 10352 0.4788 0.4264 0.496369
2 1 10617 10617 0.9894 0.9940 0.017083
3 1 11008 11008 0.1346 0.0885 0.203142
4 1 11012 11012 0.1346 0.0885 0.203142
5 1 13110 13110 0.0053 0.0567 0.053532
6 1 13116 13116 0.0295 0.1869 0.176091
7 1 13118 13118 0.0295 0.1869 0.176091
8 1 13273 13273 0.0204 0.1471 0.139066
9 1 13550 13550 0.0008 0.0080 0.007795
10 1 14464 14464 0.0144 0.1859 0.161422
11 1 14599 14599 0.1210 0.1610 0.238427
12 1 14604 14604 0.1210 0.1610 0.238427
13 1 14930 14930 0.4811 0.5209 0.500209
14 1 14933 14933 0.0015 0.0507 0.044505
15 1 15211 15211 0.5371 0.7316 0.470848
16 1 15585 15585 0.0008 0.0020 0.002635
17 1 15644 15644 0.0008 0.0080 0.007795
18 1 15777 15777 0.0159 0.0149 0.030470
19 1 15820 15820 0.4849 0.2714 0.477153
20 1 15903 15903 0.0431 0.4652 0.349452
21 1 16071 16071 0.0091 0.0010 0.011142
22 1 16142 16142 0.0053 0.0020 0.007721
23 1 16949 16949 0.0227 0.0159 0.038759
24 1 18643 18643 0.0023 0.0080 0.009485
25 1 18849 18849 0.8411 0.9911 0.170532
26 2 30923 30923 0.6687 0.9364 0.338400
27 2 20286 46286 0.0053 0.0010 0.006863
28 2 21698 46698 0.0015 0.0010 0.002566
29 2 42159 47159 0.0083 0.0696 0.067187
So I need to subset based on the first two columns. For example, if my window = 500, my chrom = 1 and my pos = 15500, I will need to subset my df to include only those rows that have chrom = 1 and 15000 < pos < 16000.
I would then like to sum the AFR column of this subset of data.
Here is the function I have made:
#vdf is my main dataframe,
#polyChrom is the chromosome to subset by,
#polyPos is the position to subset by.
#Distance is how far the window should be from the polyPos.
#windowSize is the size of the window itself
#E.g. if distance=20000 and windowSize= 500, we are looking at a window
#that is (polyPos-20000)-500 to (polyPos-20000) and a window that is
#(polyPos+20000) to (polyPos+20000)+500.
def mafWindow(vdf, polyChrom, polyPos, distance, windowSize):
    # If start position becomes less than 0, set it to 0
    if polyPos - distance < 0:
        start1 = 0
        end1 = windowSize
    else:
        start1 = polyPos - distance
        end1 = start1 + windowSize
    end2 = polyPos + distance
    start2 = end2 - windowSize
    # subset df
    df = vdf.loc[(vdf['chrom'] == polyChrom) & ((vdf['pos'] <= end1) & (vdf['pos'] >= start1)) |
                 ((vdf['pos'] <= end2) & (vdf['pos'] >= start2))].copy()
    return df.AFR.sum()
This method works by subsetting the dataframe, and it is very slow when the dataframe contains ~55k rows. Is there a quicker, more efficient way of doing this?
The trick is to drop down to numpy arrays. Pandas indexing and slicing is slow.
import pandas as pd

df = pd.DataFrame([[1, 10177, 0.5], [1, 10178, 0.2], [1, 20178, 0.1],
                   [2, 10180, 0.3], [1, 10180, 0.4]], columns=['chrom', 'pos', 'AFR'])

chrom = df['chrom'].values
pos = df['pos'].values
afr = df['AFR'].values

def filter_sum(chrom_arr, pos_arr, afr_arr, chrom_val, pos_start, pos_end):
    return sum(k for i, j, k in zip(chrom_arr, pos_arr, afr_arr)
               if pos_start < j < pos_end and i == chrom_val)

filter_sum(chrom, pos, afr, 1, 10150, 10200)
# 1.1
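For larger frames, a fully vectorized boolean mask avoids the Python-level loop inside filter_sum entirely; a sketch on the same toy data (NumPy assumed):

```python
import numpy as np

chrom = np.array([1, 1, 1, 2, 1])
pos = np.array([10177, 10178, 20178, 10180, 10180])
afr = np.array([0.5, 0.2, 0.1, 0.3, 0.4])

# Select rows on chromosome 1 with 10150 < pos < 10200, then sum AFR.
mask = (chrom == 1) & (pos > 10150) & (pos < 10200)
total = afr[mask].sum()
print(round(total, 6))  # 1.1
```

The two window ranges from the question can be combined with `|` into a single mask before summing.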

Why can't I pass the code test in python3?

What I met is this coding question:
https://www.patest.cn/contests/pat-a-practise/1001
Calculate a + b and output the sum in standard format -- that is, the digits must be separated into groups of three by commas (unless there are less than four digits).
Input
Each input file contains one test case. Each case contains a pair of integers a and b where -1000000 <= a, b <= 1000000. The numbers are separated by a space.
Output
For each test case, you should output the sum of a and b in one line. The sum must be written in the standard format.
Sample Input
-1000000 9
Sample Output
-999,991
This is my code below:
if __name__ == "__main__":
    aline = input()
    astr, bstr = aline.strip().split()
    a, b = int(astr), int(bstr)
    sum = a + b
    sumstr = str(sum)
    result = ''
    while sumstr:
        sumstr, aslice = sumstr[:-3], sumstr[-3:]
        if sumstr:
            result = ',' + aslice + result
        else:
            result = aslice + result
    print(result)
And the test result turned out to be:
时间(Time) 结果(test result) 得分(score) 题目(question number)
语言(programe language) 用时(ms)[time consume] 内存(kB)[memory] 用户[user]
8月22日 15:46 部分正确[Partial Correct](Why?!!!) 11 1001
Python (python3 3.4.2) 25 3184 polar9527
测试点[test point] 结果[result] 用时(ms)[time consume] 内存(kB)[memory] 得分[score]/满分[full credit]
0 答案错误[wrong] 25 3056 0/9
1 答案正确[correct] 19 3056 1/1
10 答案正确[correct] 18 3184 1/1
11 答案正确[correct] 19 3176 1/1
2 答案正确[correct] 17 3180 1/1
3 答案正确[correct] 16 3056 1/1
4 答案正确[correct] 14 3184 1/1
5 答案正确[correct] 17 3056 1/1
6 答案正确[correct] 19 3168 1/1
7 答案正确[correct] 22 3184 1/1
8 答案正确[correct] 21 3164 1/1
9 答案正确[correct] 15 3184 1/1
I can give you a simple example that doesn't match the expected answer: when you enter -1000000 9 as a and b in your input, you'll get -,999,991, which is wrong.
To get the right answer, you really should get to know format in Python.
To solve this question, you can just write your code like this:
if __name__ == "__main__":
    aline = input()
    astr, bstr = aline.strip().split()
    a, b = int(astr), int(bstr)
    sum = a + b
    print('{:,}'.format(sum))
Notice the behavior of your code when you input -1000 and 1: you need to handle the minus sign, because it is not a digit.
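If you'd rather keep the manual slicing approach, one way to fix it is to strip the sign first and re-attach it at the end. A sketch (group_digits is a hypothetical helper, not from the original code):

```python
def group_digits(n):
    # Separate the sign so that only digits are sliced into groups of three.
    sign = '-' if n < 0 else ''
    s = str(abs(n))
    parts = []
    while s:
        s, parts = s[:-3], [s[-3:]] + parts
    return sign + ','.join(parts)

print(group_digits(-1000000 + 9))  # -999,991
print(group_digits(1234567))       # 1,234,567
```

This reproduces the same right-to-left slicing as the original loop, but the sign can no longer end up inside a digit group.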
