I'm solving a MILP in a Python script with PuLP and the Gurobi solver, varying parameters between runs.
A sensitivity analysis is done with a 'for' loop that changes a parameter at every run. The first runs use 'worst case' parameters (a very low-efficiency generator and very poor insulation material), and the parameters are gradually improved while looping through the MILP. At some point, when the parameters are set such that a solution should be found quite quickly, Gurobipy does not seem to find one. This is the log:
Changed value of parameter TimeLimit to 300.0
Prev: 1e+100 Min: 0.0 Max: 1e+100 Default: 1e+100
Optimize a model with 8640 rows, 10080 columns and 20158 nonzeros
Variable types: 8640 continuous, 1440 integer (0 binary)
Coefficient statistics:
Matrix range [2e-05, 4e+04]
Objective range [1e+03, 1e+03]
Bounds range [7e-01, 4e+04]
RHS range [1e-02, 3e+04]
Presolve removed 7319 rows and 7331 columns
Presolve time: 0.03s
Presolved: 1321 rows, 2749 columns, 4069 nonzeros
Variable types: 1320 continuous, 1429 integer (1429 binary)
Root relaxation: objective 4.910087e+05, 679 iterations, 0.01 seconds
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 491008.698 0 11 - 491008.698 - - 0s
0 0 491008.698 0 11 - 491008.698 - - 0s
0 2 491008.698 0 11 - 491008.698 - - 0s
30429 24907 491680.652 942 3 - 491011.160 - 1.0 5s
73520 66861 491679.428 958 3 - 491011.996 - 1.0 10s
123770 116802 491762.182 1241 2 - 491012.439 - 1.0 15s
174010 165706 491896.963 1266 2 - 491012.636 - 1.0 20s
221580 212357 491234.860 1144 5 - 491012.931 - 1.0 25s
270004 259925 491187.818 904 5 - 491013.203 - 1.0 30s
322655 311334 491807.797 1254 2 - 491013.349 - 1.0 35s
379633 367554 491194.198 941 5 - 491013.571 - 1.0 40s
434035 420930 494029.008 1375 1 - 491013.695 - 1.0 45s
490442 476293 494016.622 1354 1 - 491013.851 - 1.0 50s
544923 529662 491203.097 990 5 - 491013.947 - 1.0 55s
597268 581228 492312.463 1253 2 - 491014.018 - 1.0 60s
650478 633331 491093.453 383 5 - 491014.133 - 1.0 65s
703246 685374 491755.974 1241 2 - 491014.188 - 1.0 70s
756675 737356 491069.420 272 6 - 491014.250 - 1.0 75s
811974 791502 491560.902 1235 3 - 491014.308 - 1.0 80s
866893 845452 491112.986 497 5 - 491014.345 - 1.0 85s
923793 901357 494014.134 1348 1 - 491014.390 - 1.0 90s
981961 958448 492971.305 1266 2 - 491014.435 - 1.0 95s
1039971 1015276 491545.502 1216 4 - 491014.502 - 1.0 100s
1097780 1071899 491171.468 818 5 - 491014.527 - 1.0 105s
1154447 1127328 491108.438 461 5 - 491014.591 - 1.0 110s
1212776 1184651 491024.147 57 6 - 491014.622 - 1.0 115s
1272535 1243171 495190.479 1266 2 - 491014.643 - 1.0 120s
1332126 1301674 491549.733 1228 3 - 491014.668 - 1.0 125s
1392772 1361287 491549.544 1219 3 - 491014.694 - 1.0 130s
1452380 1419870 491754.309 1237 2 - 491014.717 - 1.0 135s
1511070 1477572 491229.746 1131 5 - 491014.735 - 1.0 140s
1569783 1535126 491130.785 587 5 - 491014.764 - 1.0 145s
1628729 1593010 494026.669 1368 1 - 491014.775 - 1.0 150s
1687841 1651373 493189.023 1264 2 - 491014.810 - 1.0 155s
1747707 1709984 491548.263 1223 3 - 491014.841 - 1.0 160s
1807627 1768777 491160.795 755 5 - 491014.876 - 1.0 165s
1865730 1825486 494030.045 1379 1 - 491014.899 - 1.0 170s
1925615 1884356 494028.562 1374 1 - 491014.923 - 1.0 175s
1984204 1941827 491847.402 1115 2 - 491014.933 - 1.0 180s
2044016 2000572 491244.304 1210 5 - 491014.970 - 1.0 185s
2102125 2057622 491174.413 828 5 - 491014.989 - 1.0 190s
2161393 2115829 491115.089 532 5 - 491015.017 - 1.0 195s
2220721 2174168 491086.511 348 6 - 491015.041 - 1.0 200s
2281194 2233610 infeasible 1433 - 491015.048 - 1.0 205s
2341496 2292542 492824.696 1262 2 - 491015.069 - 1.0 210s
2399836 2349837 491548.142 1224 3 - 491015.084 - 1.0 215s
2459295 2408276 491178.869 853 5 - 491015.088 - 1.0 220s
2519203 2467098 491112.995 488 5 - 491015.106 - 1.0 225s
2578654 2525514 491069.711 270 6 - 491015.123 - 1.0 230s
2636111 2582093 491762.206 1250 2 - 491015.139 - 1.0 235s
2695962 2640805 491237.559 1152 5 - 491015.146 - 1.0 240s
2755319 2699171 491156.897 797 6 - 491015.161 - 1.0 245s
2813620 2756371 491024.109 43 7 - 491015.182 - 1.0 250s
2872810 2814527 492309.743 1255 2 - 491015.185 - 1.0 255s
2932550 2873227 492180.501 1255 2 - 491015.202 - 1.0 260s
2991586 2931246 491244.162 1207 5 - 491015.217 - 1.0 265s
3050385 2988872 491196.181 952 5 - 491015.228 - 1.0 270s
3110478 3047787 491127.746 560 5 - 491015.247 - 1.0 275s
3169730 3105844 491109.579 525 6 - 491015.266 - 1.0 280s
3229972 3165019 494029.916 1376 1 - 491015.276 - 1.0 285s
3289639 3223661 491861.516 1173 2 - 491015.293 - 1.0 290s
3349653 3282631 491862.419 1185 2 - 491015.305 - 1.0 295s
Explored 3409667 nodes (3506772 simplex iterations) in 300.02 seconds
Thread count was 8 (of 8 available processors)
Solution count 0
Time limit reached
Best objective -, best bound 4.910153206264e+05, gap -
('Gurobi status=', 9)
I've increased the maximum solving time to 300s (more takes up too much RAM and the program gets terminated at some point) and played around with parameters (worse parameter settings find a solution!), but nothing seems to work. What might be the problem?
I was able to resolve this by passing None as the maximum solving time and setting a loose maximal gap instead.
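In PuLP that looks roughly like the following; a minimal sketch, assuming a recent PuLP version whose GUROBI solver accepts timeLimit and gapRel (keyword names vary between PuLP versions, and the problem name and gap value are illustrative):
import pulp

# Build the MILP as in the original script (variables, objective, constraints)...
prob = pulp.LpProblem("energy_model", pulp.LpMinimize)

# timeLimit=None removes the 300 s cap; gapRel sets a loose relative MIP gap,
# so Gurobi stops as soon as an incumbent is proven close enough to the bound.
solver = pulp.GUROBI(msg=True, timeLimit=None, gapRel=0.05)
prob.solve(solver)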
I am trying to scrape both of the 'settle' columns, in conjunction with the base month and the respective table they are from (from this url: https://www.asxenergy.com.au/futures_nz/A)
I am able to run an html parser, but as soon as I attempt to run something similar to this:
table1 = soup.find('table')
table1
it just comes back with nothing being there. I assume I'm making an error regarding the table tag. Would really appreciate some help!
Ideally I would like to be able to get the data from this table and then store it in a dataframe.
To read the tables into pandas DataFrames you can use the next example (as @TimRobers said, the data is loaded with JavaScript from a different URL):
import requests
import pandas as pd
from bs4 import BeautifulSoup

# The page fills the tables with JavaScript; this URL serves the rendered tables.
url = "https://www.asxenergy.com.au/futures_nz/dataset"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# Select only innermost tables (tables that do not contain nested tables).
for table in soup.select("table:not(:has(table))"):
    df = pd.read_html(str(table))[0]
    # The nearest preceding <h2> holds the table's title.
    df["TITLE"] = table.find_previous("h2").get_text(strip=True)
    print(df)
    print("-" * 160)
Prints:
Base Month Bid Size Bid Ask Ask Size High Low Last +/- Vol OpenInt OpenInt +/- Settle TITLE
0 Aug 2022 - - - - 54.00 52.85 54.00 +1.45 30 1610 - 52.55 Otahuhu
1 Sep 2022 - - - - 69.00 66.00 66.00 +1.00 97 1624 - 65.00 Otahuhu
2 Oct 2022 - - - - 84.10 81.75 81.75 +0.30 62 1585 - 81.45 Otahuhu
3 Nov 2022 - - - - 104.00 100.45 100.45 +0.40 62 1192 - 100.05 Otahuhu
4 Dec 2022 - - - - 87.25 84.70 84.70 +0.35 32 952 - 84.35 Otahuhu
5 Jan 2023 - - - - 119.10 118.10 118.20 +0.55 58 524 - 117.65 Otahuhu
6 Feb 2023 - - - - - - - - - 3 - 175.25 Otahuhu
7 Mar 2023 - - - - - - - - - - - 184.20 Otahuhu
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Base Quarter Bid Size Bid Ask Ask Size High Low Last +/- Vol OpenInt OpenInt +/- Settle TITLE
0 Q3 2022 - - - - 75.30 73.15 73.65 -0.65 31 3679 - 74.30 Otahuhu
1 Q4 2022 - - - - 91.60 87.60 88.50 - 65 4109 - 88.50 Otahuhu
2 Q1 2023 - - - - 163.25 158.50 158.50 - 123 3401 - 158.50 Otahuhu
3 Q2 2023 - - - - - - - - - 2403 - 214.00 Otahuhu
4 Q3 2023 - - - - 216.00 216.00 216.00 - 30 2438 - 216.00 Otahuhu
5 Q4 2023 - - - - 143.55 142.00 143.55 - 60 3357 - 143.55 Otahuhu
6 Q1 2024 - - - - - - - - - 3093 - 159.00 Otahuhu
7 Q2 2024 - - - - 197.00 197.00 197.00 - 30 2082 - 197.00 Otahuhu
8 Q3 2024 - - - - - - - - - 2091 - 197.50 Otahuhu
9 Q4 2024 - - - - 145.95 143.00 145.95 - 40 2649 - 145.95 Otahuhu
10 Q1 2025 - - - - 151.00 150.50 150.70 -0.30 46 1838 - 151.00 Otahuhu
11 Q2 2025 - - - - 178.00 175.20 176.00 - 92 1619 - 176.00 Otahuhu
12 Q3 2025 - - - - 178.00 175.20 176.00 - 92 1316 - 176.00 Otahuhu
13 Q4 2025 - - - - 128.45 125.00 125.50 - 100 1845 - 125.50 Otahuhu
----------------------------------------------------------------------------------------------------------------------------------------------------------------
...and so on.
I have a Kafka topic with 40 partitions. In a Kubernetes cluster.
I further have a microservice that consumes from this topic.
Sometimes it happens, within a batch process, that at one point there are some partitions left with unprocessed data while most partitions are finished. Using the kafka-consumer-groups.sh this looks like this:
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
- - - - - kafka-python-2.0.1-f1259971-c8ed-4d98-ba37-40f263b14a78/10.44.2.119 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-328f6a97-22ea-4f59-b702-4173feb9f025/10.44.0.29 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-9a2ea04e-3bf1-40f4-9262-6c14d0791dfc/10.44.7.35 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-81f5be15-535c-436c-996e-f8098d0613a1/10.44.4.26 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-ffcf76e2-f0ed-4894-bc70-ee73220881db/10.44.14.2 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-fc5709a0-a0b5-4324-92ff-02b6ee0f1232/10.44.2.123 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-c058418c-51ec-43e2-b666-21971480665b/10.44.15.2 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-0c14afab-af2a-4668-bb3c-015932fbfd13/10.44.14.5 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-1cb308f0-203f-43ae-9252-e0fc98eb87b8/10.44.14.4 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-42753a7f-80d0-481e-93a6-67445cb1bb5e/10.44.14.6 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-63e97395-e1ec-4cab-8edc-c5dd251932af/10.44.2.122 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-7116fdc2-809f-4f99-b5bd-60fbf2aba935/10.44.1.37 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-f5ef8ff1-f09c-498e-9b27-1bcac94b895b/10.44.2.125 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-8feec117-aa3a-42c0-91e8-0ccefac5f134/10.44.2.121 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-45cc5605-d3c8-4c77-8ca8-88afbde81a69/10.44.14.3 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-9a575ac4-1531-4b2a-b516-12ffa2496615/10.44.5.32 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-d33e112b-a1f4-4699-8989-daee03a5021c/10.44.14.7 kafka-python-2.0.1
my-topic 20 890 890 0 - - -
my-topic 38 857 857 0 - - -
my-topic 28 918 918 0 - - -
my-topic 23 66 909 843 - - -
my-topic 10 888 888 0 - - -
my-topic 2 885 885 0 - - -
my-topic 7 853 853 0 - - -
my-topic 16 878 878 0 - - -
my-topic 15 47 901 854 - - -
my-topic 26 934 934 0 - - -
my-topic 32 898 898 0 - - -
my-topic 21 921 921 0 - - -
my-topic 13 933 933 0 - - -
my-topic 5 879 879 0 - - -
my-topic 12 945 945 0 - - -
my-topic 4 918 918 0 - - -
my-topic 29 924 924 0 - - -
my-topic 39 895 895 0 - - -
my-topic 25 30 926 896 - - -
my-topic 9 915 915 0 - - -
my-topic 35 31 890 859 - - -
my-topic 3 69 897 828 - - -
my-topic 1 911 911 0 - - -
my-topic 6 22 901 879 - - -
my-topic 14 41 881 840 - - -
my-topic 30 900 900 0 - - -
my-topic 22 847 847 0 - - -
my-topic 8 919 919 0 - - -
my-topic 0 902 902 0 - - -
my-topic 18 924 924 0 - - -
my-topic 36 864 864 0 - - -
my-topic 34 929 929 0 - - -
my-topic 24 864 864 0 - - -
my-topic 19 937 937 0 - - -
my-topic 27 859 859 0 - - -
my-topic 11 838 838 0 - - -
my-topic 31 49 922 873 - - -
my-topic 37 882 882 0 - - -
my-topic 17 942 942 0 - - -
my-topic 33 928 928 0 - - -
It further states that the consumer group is rebalancing.
One thing to note here is that under CONSUMER-ID fewer consumers are listed than there should be. There should be 20 consumers, but only 17 are shown in this output, even though all pods are running. This number varies, and I am not sure if it is an output issue or if they are really not there. This also baffles me because when I initially start (all new Kafka and consumer deployments) this does not happen. So it really seems to be related to consumer deployments being scaled, or otherwise killed.
For a short time the consumers then get assigned, but after about half a minute the same picture as above shows again, with the consumer group rebalancing.
This also happens when I scale down, e.g. when I only have 4 consumers. I am not sure what's happening here. The pods all run, and I use the same kind of base code and pattern in other microservices, where it seems to work fine.
I suspect that it has something to do with a consumer pod getting killed because, as I said, with a new deployment it works initially. This batch is also a bit more long-running than the others I have, so a pod kill is more likely during its run. I am also not sure if it has something to do with most partitions already being finished; this could also just be a quirk of my use case.
I noticed this because the processing seemed to take forever, yet new data was still being processed. So I think what happens is that for the brief moment when the consumers are assigned they process data, but they never commit the offset before getting rebalanced, leaving them in an infinite loop. The only slightly related thing I found was this issue, but it is from quite a few versions back and does not fully describe my situation.
I use the kafka-python client and I use the kafka image confluentinc/cp-kafka:5.0.1.
I create the topic using the admin client NewTopic(name='my-topic', num_partitions=40, replication_factor=1) and create the client like so:
consumer = KafkaConsumer(consume_topic,
                         bootstrap_servers=bootstrap_servers,
                         group_id=consume_group_id,
                         value_deserializer=lambda m: json.loads(m))

for message in consumer:
    process(message)
What is going wrong here?
Do I have some configuration error?
Any help is greatly appreciated.
The issue was with the heartbeat configuration. It turns out that while most messages only need seconds to process, a few messages take very long. In these special cases the heartbeat update took too long for some of the consumers, causing the broker to assume the consumer was down and start a rebalance.
I assume what happened next is that the consumers got reassigned to the same message, took too long to process it again, and triggered yet another rebalance, resulting in an endless cycle.
I finally solved it by increasing both session_timeout_ms and heartbeat_interval_ms in the consumer (documented here). I also decreased the batch size so that the heartbeat is updated more regularly.
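In kafka-python these are keyword arguments of KafkaConsumer; a minimal sketch of the adjusted consumer, reusing the variables from the question (the concrete values are illustrative, not the exact ones from my deployment):
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    consume_topic,
    bootstrap_servers=bootstrap_servers,
    group_id=consume_group_id,
    value_deserializer=lambda m: json.loads(m),
    session_timeout_ms=60000,     # raised: how long the broker waits for a heartbeat before declaring the consumer dead
    heartbeat_interval_ms=20000,  # raised: keep this well below session_timeout_ms (a third is the usual rule of thumb)
    max_poll_records=50,          # lowered batch size, so progress is made in smaller chunks
)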
I'm trying to implement a Gurobi model with multiple objective functions (specifically 2) that solves lexicographically (in a hierarchy), but I'm running into an issue: when optimizing the second objective function, the solver degrades the solution to the first one, which should not happen with hierarchical optimization. It degrades the first objective by 1 in order to improve the second by 5. Could this be an error in how I set up my model hierarchically? This is the code where I set up my model:
m = Model('lexMin Model')
m.ModelSense = GRB.MINIMIZE
variable = m.addVars(k.numVars, vtype=GRB.BINARY, name='variable')
m.setObjectiveN(LinExpr(quicksum([variable[j]*k.obj[0][j] for j in range(k.numVars)])),0)
m.setObjectiveN(LinExpr(quicksum([variable[j]*k.obj[1][j] for j in range(k.numVars)])),1)
for i in range(0, k.numConst):
    m.addConstr(quicksum([k.const[i,j]*variable[j] for j in range(k.numVars)]) <= k.constRHS[i])
m.addConstr(quicksum([variable[j]*k.obj[0][j] for j in range(k.numVars)]) >= r2[0][0])
m.addConstr(quicksum([variable[j]*k.obj[0][j] for j in range(k.numVars)]) <= r2[1][0])
m.addConstr(quicksum([variable[j]*k.obj[1][j] for j in range(k.numVars)]) >= r2[1][1])
m.addConstr(quicksum([variable[j]*k.obj[1][j] for j in range(k.numVars)]) <= r2[0][1])
m.Params.ObjNumber = 0
m.ObjNPriority = 1
m.update()
m.optimize()
I've double-checked that the priority of the second function is 0, and the values of the objective functions are nowhere near where they'd be if I had prioritized the wrong function. When optimizing the first function it even finds the right value, but when it moves on to the second objective it chooses values that degrade the first.
The Gurobi output looks like this:
Optimize a model with 6 rows, 375 columns and 2250 nonzeros
Model fingerprint: 0xac5de9aa
Variable types: 0 continuous, 375 integer (375 binary)
Coefficient statistics:
Matrix range [1e+01, 1e+02]
Objective range [1e+01, 1e+02]
Bounds range [1e+00, 1e+00]
RHS range [1e+04, 1e+04]
---------------------------------------------------------------------------
Multi-objectives: starting optimization with 2 objectives ...
---------------------------------------------------------------------------
Multi-objectives: applying initial presolve ...
---------------------------------------------------------------------------
Presolve time: 0.00s
Presolved: 6 rows and 375 columns
---------------------------------------------------------------------------
Multi-objectives: optimize objective 1 () ...
---------------------------------------------------------------------------
Presolve time: 0.00s
Presolved: 6 rows, 375 columns, 2250 nonzeros
Variable types: 0 continuous, 375 integer (375 binary)
Root relaxation: objective -1.461947e+04, 10 iterations, 0.00 seconds
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 -14619.473 0 3 - -14619.473 - - 0s
H 0 0 -14569.00000 -14619.473 0.35% - 0s
H 0 0 -14603.00000 -14619.473 0.11% - 0s
H 0 0 -14608.00000 -14619.473 0.08% - 0s
H 0 0 -14611.00000 -14618.032 0.05% - 0s
0 0 -14617.995 0 5 -14611.000 -14617.995 0.05% - 0s
0 0 -14617.995 0 3 -14611.000 -14617.995 0.05% - 0s
H 0 0 -14613.00000 -14617.995 0.03% - 0s
0 0 -14617.995 0 5 -14613.000 -14617.995 0.03% - 0s
0 0 -14617.995 0 5 -14613.000 -14617.995 0.03% - 0s
0 0 -14617.995 0 7 -14613.000 -14617.995 0.03% - 0s
0 0 -14617.995 0 3 -14613.000 -14617.995 0.03% - 0s
0 0 -14617.995 0 4 -14613.000 -14617.995 0.03% - 0s
0 0 -14617.995 0 6 -14613.000 -14617.995 0.03% - 0s
0 0 -14617.995 0 6 -14613.000 -14617.995 0.03% - 0s
0 0 -14617.995 0 6 -14613.000 -14617.995 0.03% - 0s
0 0 -14617.720 0 7 -14613.000 -14617.720 0.03% - 0s
0 0 -14617.716 0 8 -14613.000 -14617.716 0.03% - 0s
0 0 -14617.697 0 8 -14613.000 -14617.697 0.03% - 0s
0 0 -14617.661 0 9 -14613.000 -14617.661 0.03% - 0s
0 2 -14617.661 0 9 -14613.000 -14617.661 0.03% - 0s
* 823 0 16 -14614.00000 -14616.351 0.02% 2.8 0s
Cutting planes:
Gomory: 6
Cover: 12
MIR: 4
StrongCG: 2
Inf proof: 6
Zero half: 1
Explored 1242 nodes (3924 simplex iterations) in 0.29 seconds
Thread count was 8 (of 8 available processors)
Solution count 6: -14614 -14613 -14611 ... -14569
No other solutions better than -14614
Optimal solution found (tolerance 1.00e-04)
Best objective -1.461400000000e+04, best bound -1.461400000000e+04, gap 0.0000%
---------------------------------------------------------------------------
Multi-objectives: optimize objective 2 () ...
---------------------------------------------------------------------------
Loaded user MIP start with objective -12798
Presolve removed 1 rows and 0 columns
Presolve time: 0.01s
Presolved: 6 rows, 375 columns, 2250 nonzeros
Variable types: 0 continuous, 375 integer (375 binary)
Root relaxation: objective -1.282967e+04, 28 iterations, 0.00 seconds
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 -12829.673 0 3 -12798.000 -12829.673 0.25% - 0s
0 0 -12829.378 0 4 -12798.000 -12829.378 0.25% - 0s
0 0 -12829.378 0 3 -12798.000 -12829.378 0.25% - 0s
0 0 -12828.688 0 4 -12798.000 -12828.688 0.24% - 0s
H 0 0 -12803.00000 -12828.688 0.20% - 0s
0 0 -12825.806 0 5 -12803.000 -12825.806 0.18% - 0s
0 0 -12825.193 0 5 -12803.000 -12825.193 0.17% - 0s
0 0 -12823.156 0 6 -12803.000 -12823.156 0.16% - 0s
0 0 -12822.694 0 7 -12803.000 -12822.694 0.15% - 0s
0 0 -12822.679 0 7 -12803.000 -12822.679 0.15% - 0s
0 2 -12822.679 0 7 -12803.000 -12822.679 0.15% - 0s
Cutting planes:
Cover: 16
MIR: 6
StrongCG: 3
Inf proof: 4
RLT: 1
Explored 725 nodes (1629 simplex iterations) in 0.47 seconds
Thread count was 8 (of 8 available processors)
Solution count 2: -12803 -12798
No other solutions better than -12803
Optimal solution found (tolerance 1.00e-04)
Best objective -1.280300000000e+04, best bound -1.280300000000e+04, gap 0.0000%
So it finds the values (-14613,-12803) instead of (-14614,-12798)
The default MIPGap is 1e-4, and the first objective is degrading by less than that (1/14614 ≈ 0.7e-4). If you lower the MIPGap, your issue should go away. In your code, add
m.setParam('MIPGap', 1e-6)
before the optimize.
One way to reason about this behavior is that since you had a MIPGap of 1e-4, you would have accepted a solution with value -14613 even if you didn't have a second objective.
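For completeness, gurobipy also exposes per-objective degradation tolerances for hierarchical models; a minimal sketch, assuming the ObjNAbsTol attribute selected via the ObjNumber parameter (note that for MIPs the permitted degradation also depends on the optimality gap of the earlier pass, which is why lowering MIPGap matters here):
m.Params.ObjNumber = 0      # select the first (highest-priority) objective
m.ObjNAbsTol = 0.0          # allow zero absolute degradation in later passes
m.setParam('MIPGap', 1e-6)  # tighten the gap so the first pass is proven (almost) exactly
m.optimize()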
I've been trying to develop a scraping code to retrieve tables from an italian Fantasy Football website. To do so, I wanted to parse the html of using python, BeautifulSoup and pandas.
However, when I parse the html code with BeautifulSoup, I cannot find any tables:
This code:
>>> # import libraries
>>> import requests
>>> from bs4 import BeautifulSoup
>>> # define url of interest, request it and parse it
>>> url = 'https://www.fantacalcio.it/voti-fantacalcio-serie-a'
>>> response = requests.get(url)
>>> soup = BeautifulSoup(response.text, 'lxml')
>>> # find the first table in the code
>>> print(soup.find('table'))
None
I am new to html, but after some research I understood that the tables of interest might be contained in a pseudo-element, which does not appear in the html code of the requested URL.
Is there a way to scrape the information contained in these tables?
This is one of the tables highlighted in Chrome
This is the related html snippet from the Chrome inspector tool, where the information is still available
This is how the same snippet looks after parsing:
>>> search = soup.find('div', id='Ata')
>>> print(search.prettify())
<div class="row no-gutter tbvoti" data-team="1" id="Ata">
</div>
empty...
Is it somehow possible to access the data?
Thank you very much for your help
If you go to the Network tab you will find the following URL, which retrieves the table data. This link gives you the first table's info; in the same way you can fetch all the other tables.
https://www.fantacalcio.it/Servizi/Voti.ashx?s=2019-20&g=16&tv=314303547921&t=1
You can use the pandas library to read the table info and load it into a DataFrame.
import pandas as pd

# read_html fetches the URL and returns a list of DataFrames, one per table.
url = 'https://www.fantacalcio.it/Servizi/Voti.ashx?s=2019-20&g=16&tv=314303547921&t=1'
df = pd.read_html(url)
print(df[0])
Piggybacking on the answer above (so accept KunduK's answer as it's correct), you can iterate through the tables and create a list of the dataframes. I couldn't find where they get that parameter for t, so I just have it going through them all.
import pandas as pd

dfs = []
# The t parameter appears to index the teams; try a range and skip misses.
for i in range(1, 200):
    try:
        url = 'https://www.fantacalcio.it/Servizi/Voti.ashx?s=2019-20&g=16&tv=314303547921&t=%s' % i
        dfs.append(pd.read_html(url)[0])
    except Exception:
        # No table at this t value (or the request failed); skip it.
        continue
Output:
print (dfs)
[ ATALANTA VOTO e FANTAVOTO ... BONUS/MALUS
ATALANTA Fantacalcio ... Fonte unica Fantacalcio.it
Unnamed: 0_level_2 Unnamed: 1_level_2 V Fv ... Rp Rs Au As
0 PGOLLINI NaN 65 45 ... - - - -
1 DCASTAGNE NaN 55 55 ... - - - -
2 DDJIMSITI NaN 55 55 ... - - - -
3 DGOSENS NaN 6 6 ... - - - -
4 DHATEBOER NaN 6 6 ... - - - -
5 DPALOMINO NaN 55 55 ... - - - -
6 DTOLOI NaN 6 6 ... - - - -
7 CCOLLEY E NaN 6 - ... - - - -
8 CDE ROON NaN 55 55 ... - - - -
9 CFREULER NaN 55 55 ... - - - -
10 CMALINOVSKYI NaN 65 95 ... - - - -
11 CPASALIC NaN 55 5 ... - - - -
12 ABARROW NaN 65 75 ... - - - 1
13 AMURIEL NaN 5 5 ... - - - -
14 ALLGASPERINI NaN 6 6 ... - - - -
[15 rows x 15 columns], BOLOGNA VOTO e FANTAVOTO ... BONUS/MALUS
BOLOGNA Fantacalcio ... Fonte unica Fantacalcio.it
Unnamed: 0_level_2 Unnamed: 1_level_2 V Fv ... Rp Rs Au As
0 PSKORUPSKI NaN 6 5 ... - - - -
1 DBANI NaN 6 6 ... - - - -
2 DDANILO LAR NaN 55 45 ... - - - -
3 DDENSWIL NaN 55 55 ... - - - -
4 DMBAYE NaN 6 - ... - - - -
5 DTOMIYASU NaN 65 75 ... - - - 1
6 CPOLI V 7 10 ... - - - -
7 CSVANBERG NaN 6 6 ... - - - -
8 CMEDEL NaN 6 6 ... - - - -
9 CDZEMAILI NaN 55 55 ... - - - -
10 AORSOLINI NaN 65 6 ... - - - -
11 APALACIO NaN 75 10 ... - - - -
12 ASANSONE N NaN 6 55 ... - - - -
13 ASANTANDER NaN 55 55 ... - - - -
14 ALLMIHAJLOVIC NaN 65 65 ... - - - -
[15 rows x 15 columns],
....
I'm implementing a B&C and keeping a counter that is incremented by 1 each time a lazy constraint is added.
After solving, there is a big difference between my count and what Gurobi reports as lazy constraints. What could be causing this difference?
Thanks.
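For reference, my counting pattern is essentially the following; a minimal sketch, where find_violated_cuts stands in for the actual separation routine (hypothetical here):
from gurobipy import GRB

lazy_count = 0  # incremented once per cbLazy() call

def callback(model, where):
    global lazy_count
    if where == GRB.Callback.MIPSOL:  # a new integer-feasible solution was found
        for lhs, rhs in find_violated_cuts(model):  # placeholder for the separation logic
            model.cbLazy(lhs <= rhs)
            lazy_count += 1

model.Params.LazyConstraints = 1
model.optimize(callback)
print('My Lazy constraints counter:', lazy_count)
This is the Gurobi log: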
Changed value of parameter LazyConstraints to 1
Prev: 0 Min: 0 Max: 1 Default: 0
Optimize a model with 67 rows, 442 columns and 1154 nonzeros
Variable types: 22 continuous, 420 integer (420 binary)
Coefficient statistics:
Matrix range [1e+00, 1e+00]
Objective range [1e-01, 5e+00]
Bounds range [1e+00, 1e+00]
RHS range [1e+00, 1e+01]
Presolve removed 8 rows and 42 columns
Presolve time: 0.00s
Presolved: 59 rows, 400 columns, 990 nonzeros
Variable types: 1 continuous, 399 integer (399 binary)
Root relaxation: objective 2.746441e+00, 37 iterations, 0.00 seconds
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 4.18093 0 20 - 4.18093 - - 0s
H 0 0 21.2155889 4.18093 80.3% - 0s
0 0 5.91551 0 31 21.21559 5.91551 72.1% - 0s
H 0 0 18.8660609 5.91551 68.6% - 0s
0 0 6.35067 0 38 18.86606 6.35067 66.3% - 0s
H 0 0 17.9145774 6.35067 64.6% - 0s
0 0 6.85254 0 32 17.91458 6.85254 61.7% - 0s
H 0 0 17.7591641 6.85254 61.4% - 0s
0 0 7.20280 0 50 17.75916 7.20280 59.4% - 0s
H 0 0 17.7516768 7.20280 59.4% - 0s
0 2 7.91616 0 51 17.75168 7.91616 55.4% - 0s
* 80 62 30 17.6301180 8.69940 50.7% 10.7 0s
* 169 138 35 16.3820478 9.10423 44.4% 9.9 1s
* 765 486 22 14.6853796 9.65509 34.3% 9.2 2s
* 1315 762 27 14.6428113 9.97011 31.9% 9.4 3s
* 1324 415 14 12.0742408 9.97011 17.4% 9.4 3s
H 1451 459 11.8261154 10.02607 15.2% 9.7 4s
1458 463 11.78416 15 58 11.82612 10.02607 15.2% 9.6 5s
* 1567 461 33 11.6541357 10.02607 14.0% 10.6 6s
4055 906 11.15860 31 36 11.65414 10.69095 8.26% 12.4 10s
Cutting planes:
Gomory: 4
Flow cover: 1
Lazy constraints: 228
Explored 7974 nodes (98957 simplex iterations) in 14.78 seconds
Thread count was 4 (of 4 available processors)
Solution count 10: 11.6541 11.8261 12.0742 ... 17.9146
Optimal solution found (tolerance 1.00e-04)
Best objective 1.165413573861e+01, best bound 1.165413573861e+01, gap 0.0000%
My Lazy constraints counter: 654
The displayed statistics on cutting planes after the optimization has finished (or stopped) only show the number of cutting planes that were active in the final LP relaxation that was solved. In particular, the number of lazy constraints that are active at that last node may be less than the total number of lazy constraints that were added in a callback. For example, Gurobi may add internal cutting planes during the optimization that dominate the original lazy constraint, or use the lazy constraint from the callback to derive other cuts instead of adding the original one.