How to split a string in a list on \n - Python

I have a list like this:
['Sehingga 8 Ogos 2021: Jumlah kes COVID-19 yang dilaporkan adalah 18,688 kes (1,262,540 kes)\n\nPecahan setiap negeri (Kumulatif):\n\nSelangor - 6,565 (465,015)\nWPKL - 1,883 (140,404)\nJohor - 1,308 (100,452)\nSabah -Lagi 1,379 (93,835)\nSarawak - 581 (81,328)\nNegeri Sembilan - 1,140 (78,777)\nKedah - 1,610 (56,598)\nPulau Pinang - 694 (52,368)\nKelantan - 870 (49,433)\nPerak - 861 (43,924)\nMelaka - 526 (35,584)\nPahang - 602 (29,125)\nTerengganu - 598 (20,696)\nWP Labuan - 2 (9,711)\nWP Putrajaya - 63 (4,478)\nPerlis - 6 (812)\n\n- KPK KKM']
How can I extract each segment based on the start and end positions of the \n characters?
Expected output:
1) Sehingga 8 Ogos 2021
2) Selangor - 6,565 (465,015)
3) WPKL - 1,883 (140,404)
4) Johor - 1,308 (100,452)
and so on...

You can use the re module for the task:
import re

lst = [
    "Sehingga 8 Ogos 2021: Jumlah kes COVID-19 yang dilaporkan adalah 18,688 kes (1,262,540 kes)\n\nPecahan setiap negeri (Kumulatif):\n\nSelangor - 6,565 (465,015)\nWPKL - 1,883 (140,404)\nJohor - 1,308 (100,452)\nSabah -Lagi 1,379 (93,835)\nSarawak - 581 (81,328)\nNegeri Sembilan - 1,140 (78,777)\nKedah - 1,610 (56,598)\nPulau Pinang - 694 (52,368)\nKelantan - 870 (49,433)\nPerak - 861 (43,924)\nMelaka - 526 (35,584)\nPahang - 602 (29,125)\nTerengganu - 598 (20,696)\nWP Labuan - 2 (9,711)\nWP Putrajaya - 63 (4,478)\nPerlis - 6 (812)\n\n- KPK KKM"
]

out = []
for v in lst:
    for g in re.findall(r"^(.*?\(.*?\))\n", v, flags=re.M):
        out.append(g.split(":")[0])

print(*out, sep="\n")
Prints:
Sehingga 8 Ogos 2021
Selangor - 6,565 (465,015)
WPKL - 1,883 (140,404)
Johor - 1,308 (100,452)
Sabah -Lagi 1,379 (93,835)
Sarawak - 581 (81,328)
Negeri Sembilan - 1,140 (78,777)
Kedah - 1,610 (56,598)
Pulau Pinang - 694 (52,368)
Kelantan - 870 (49,433)
Perak - 861 (43,924)
Melaka - 526 (35,584)
Pahang - 602 (29,125)
Terengganu - 598 (20,696)
WP Labuan - 2 (9,711)
WP Putrajaya - 63 (4,478)
Perlis - 6 (812)
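If the regex feels heavy, the same result can be obtained with plain str.splitlines, keeping only lines that end in a parenthesised total. A minimal sketch on a shortened version of the input:

```python
# Shortened sample of the string from the question
data = ("Sehingga 8 Ogos 2021: Jumlah kes COVID-19 yang dilaporkan adalah 18,688 kes (1,262,540 kes)\n\n"
        "Pecahan setiap negeri (Kumulatif):\n\n"
        "Selangor - 6,565 (465,015)\n"
        "WPKL - 1,883 (140,404)\n\n"
        "- KPK KKM")

# Keep lines ending with a "(...)" total, then drop everything after ":" on the first line
out = [line.split(":")[0]
       for line in data.splitlines()
       if line.endswith(")") and "(" in line]

print(out)
# ['Sehingga 8 Ogos 2021', 'Selangor - 6,565 (465,015)', 'WPKL - 1,883 (140,404)']
```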

Scrape a table to get specific data out

I am trying to scrape both the 'settle' columns in conjunction with the base month and what respective table they are from (from this url: https://www.asxenergy.com.au/futures_nz/A)
I am able to run an html parser, but as soon as I attempt to run something similar to this:
table1 = soup.find('table')
table1
it just comes back with nothing being there. I assume I'm making an error regarding the table tag. Would really appreciate some help!
Ideally I would like to be able to get the data from this table and then store it in a dataframe.
To read the tables into pandas DataFrames you can use the following example (as @TimRobers said, the data is loaded with JavaScript from a different URL):
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.asxenergy.com.au/futures_nz/dataset"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

for table in soup.select("table:not(:has(table))"):
    df = pd.read_html(str(table))[0]
    df["TITLE"] = table.find_previous("h2").get_text(strip=True)
    print(df)
    print("-" * 160)
Prints:
Base Month Bid Size Bid Ask Ask Size High Low Last +/- Vol OpenInt OpenInt +/- Settle TITLE
0 Aug 2022 - - - - 54.00 52.85 54.00 +1.45 30 1610 - 52.55 Otahuhu
1 Sep 2022 - - - - 69.00 66.00 66.00 +1.00 97 1624 - 65.00 Otahuhu
2 Oct 2022 - - - - 84.10 81.75 81.75 +0.30 62 1585 - 81.45 Otahuhu
3 Nov 2022 - - - - 104.00 100.45 100.45 +0.40 62 1192 - 100.05 Otahuhu
4 Dec 2022 - - - - 87.25 84.70 84.70 +0.35 32 952 - 84.35 Otahuhu
5 Jan 2023 - - - - 119.10 118.10 118.20 +0.55 58 524 - 117.65 Otahuhu
6 Feb 2023 - - - - - - - - - 3 - 175.25 Otahuhu
7 Mar 2023 - - - - - - - - - - - 184.20 Otahuhu
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Base Quarter Bid Size Bid Ask Ask Size High Low Last +/- Vol OpenInt OpenInt +/- Settle TITLE
0 Q3 2022 - - - - 75.30 73.15 73.65 -0.65 31 3679 - 74.30 Otahuhu
1 Q4 2022 - - - - 91.60 87.60 88.50 - 65 4109 - 88.50 Otahuhu
2 Q1 2023 - - - - 163.25 158.50 158.50 - 123 3401 - 158.50 Otahuhu
3 Q2 2023 - - - - - - - - - 2403 - 214.00 Otahuhu
4 Q3 2023 - - - - 216.00 216.00 216.00 - 30 2438 - 216.00 Otahuhu
5 Q4 2023 - - - - 143.55 142.00 143.55 - 60 3357 - 143.55 Otahuhu
6 Q1 2024 - - - - - - - - - 3093 - 159.00 Otahuhu
7 Q2 2024 - - - - 197.00 197.00 197.00 - 30 2082 - 197.00 Otahuhu
8 Q3 2024 - - - - - - - - - 2091 - 197.50 Otahuhu
9 Q4 2024 - - - - 145.95 143.00 145.95 - 40 2649 - 145.95 Otahuhu
10 Q1 2025 - - - - 151.00 150.50 150.70 -0.30 46 1838 - 151.00 Otahuhu
11 Q2 2025 - - - - 178.00 175.20 176.00 - 92 1619 - 176.00 Otahuhu
12 Q3 2025 - - - - 178.00 175.20 176.00 - 92 1316 - 176.00 Otahuhu
13 Q4 2025 - - - - 128.45 125.00 125.50 - 100 1845 - 125.50 Otahuhu
----------------------------------------------------------------------------------------------------------------------------------------------------------------
...and so on.
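To collect every table into a single DataFrame, as the question asks, the per-table frames can be concatenated. A minimal self-contained sketch on a made-up two-table snippet (the titles and values here are invented for illustration, not scraped data):

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Invented HTML mimicking the page's title + table layout
html = """
<h2>Otahuhu</h2>
<table><tr><th>Base Month</th><th>Settle</th></tr>
<tr><td>Aug 2022</td><td>52.55</td></tr></table>
<h2>Benmore</h2>
<table><tr><th>Base Month</th><th>Settle</th></tr>
<tr><td>Aug 2022</td><td>40.10</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
frames = []
for table in soup.select("table"):
    df = pd.read_html(StringIO(str(table)))[0]
    # Tag each frame with the heading that precedes its table
    df["TITLE"] = table.find_previous("h2").get_text(strip=True)
    frames.append(df)

# One DataFrame with all tables stacked
all_data = pd.concat(frames, ignore_index=True)
print(all_data)
```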

How to filter a list with index

ListA = ['Sehingga 30 Ogos 2021: Jumlah kes COVID-19 yang dilaporkan adalah 19,268 kes (1,725,357 kes).\nPecahan setiap negeri (Kumulatif):\nSelangor - 3,567 (599,624)\nWPKL - 672 (172,236)\nSabah - 2,310 (145,249)\nJohor - 2,265 (136,488)\nSarawak - 2,028 (114,273)\nKedah - 2,084 (97,100)\nNegeri Sembilan - 269 (91,261)\nPulau Pinang - 1,780 (84,759)\nKelantan - 1,308 (76,047)\nPerak - 1,144 (66,889)\nMelaka - 395 (48,141)\nPahang - 788 (43,335)\nTerengganu - 544 (32,884)\nWP Labuan - 2 (9,808)\nWP Putrajaya - 41 (5,373)\nPerlis - 71 (1,890)\nREAD MORE...']
Expected output:
Sehingga 30 Ogos 2021: Jumlah kes COVID-19 yang dilaporkan adalah 19,268 kes (1,725,357 kes).
Pecahan setiap negeri (Kumulatif):
Selangor - 3,567 (599,624)
WPKL - 672 (172,236)
Sabah - 2,310 (145,249)
Johor - 2,265 (136,488)
Sarawak - 2,028 (114,273)
Kedah - 2,084 (97,100)
Negeri Sembilan - 269 (91,261)
Pulau Pinang - 1,780 (84,759)
Kelantan - 1,308 (76,047)
Perak - 1,144 (66,889)
Melaka - 395 (48,141)
Pahang - 788 (43,335)
Terengganu - 544 (32,884)
WP Labuan - 2 (9,808)
WP Putrajaya - 41 (5,373)
Perlis - 71 (1,890)
and each state has its own index, for example:
index[0] = Selangor - 3,567 (599,624)
index[1] = WPKL - 672 (172,236)
index[2] = Sabah - 2,310 (145,249)
and so on..
I tried with a regex:
out = []
for v in listView:
    regex_list = re.findall(r"^(.*?\(.*?\))\n", v.replace('.\n\n', '\n').replace('.', ':'), flags=re.M)
    for g in regex_list:
        out.append(g.split(":")[0])
But some indices are not accurate. For example, index[2] returns Johor - 2,265 (136,488) instead of Sabah - 2,310 (145,249).
I think the below is what you are looking for
data = 'Sehingga 30 Ogos 2021: Jumlah kes COVID-19 yang dilaporkan adalah 19,268 kes (1,725,357 kes).\nPecahan setiap negeri (Kumulatif):\nSelangor - 3,567 (599,624)\nWPKL - 672 (172,236)\nSabah - 2,310 (145,249)\nJohor - 2,265 (136,488)\nSarawak - 2,028 (114,273)\nKedah - 2,084 (97,100)\nNegeri Sembilan - 269 (91,261)\nPulau Pinang - 1,780 (84,759)\nKelantan - 1,308 (76,047)\nPerak - 1,144 (66,889)\nMelaka - 395 (48,141)\nPahang - 788 (43,335)\nTerengganu - 544 (32,884)\nWP Labuan - 2 (9,808)\nWP Putrajaya - 41 (5,373)\nPerlis - 71 (1,890)\nREAD MORE...'
items = data.split('\n')
for item in items:
    print(item)
Output:
Sehingga 30 Ogos 2021: Jumlah kes COVID-19 yang dilaporkan adalah 19,268 kes (1,725,357 kes).
Pecahan setiap negeri (Kumulatif):
Selangor - 3,567 (599,624)
WPKL - 672 (172,236)
Sabah - 2,310 (145,249)
Johor - 2,265 (136,488)
Sarawak - 2,028 (114,273)
Kedah - 2,084 (97,100)
Negeri Sembilan - 269 (91,261)
Pulau Pinang - 1,780 (84,759)
Kelantan - 1,308 (76,047)
Perak - 1,144 (66,889)
Melaka - 395 (48,141)
Pahang - 788 (43,335)
Terengganu - 544 (32,884)
WP Labuan - 2 (9,808)
WP Putrajaya - 41 (5,373)
Perlis - 71 (1,890)
READ MORE...
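If you only want the state lines, so that index[0] is Selangor, index[1] is WPKL and so on, you can filter the split lines with a pattern. A sketch on a shortened version of the data, assuming every state line has the form "Name - n (m)":

```python
import re

# Shortened sample of the string from the question
data = ('Sehingga 30 Ogos 2021: Jumlah kes COVID-19 yang dilaporkan adalah 19,268 kes (1,725,357 kes).\n'
        'Pecahan setiap negeri (Kumulatif):\n'
        'Selangor - 3,567 (599,624)\n'
        'WPKL - 672 (172,236)\n'
        'Sabah - 2,310 (145,249)\n'
        'READ MORE...')

# Keep only lines shaped like "Name - 1,234 (5,678)"
states = [line for line in data.split('\n')
          if re.fullmatch(r'.+ - [\d,]+ \([\d,]+\)', line)]

print(states[2])  # Sabah - 2,310 (145,249)
```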

Kafka python client alternating between assigned and rebalancing not processing data

I have a Kafka topic with 40 partitions. In a Kubernetes cluster.
I further have a microservice that consumes from this topic.
Sometimes, during a batch process, some partitions are left with unprocessed data while most partitions are finished. Using kafka-consumer-groups.sh, this looks like this:
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
- - - - - kafka-python-2.0.1-f1259971-c8ed-4d98-ba37-40f263b14a78/10.44.2.119 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-328f6a97-22ea-4f59-b702-4173feb9f025/10.44.0.29 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-9a2ea04e-3bf1-40f4-9262-6c14d0791dfc/10.44.7.35 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-81f5be15-535c-436c-996e-f8098d0613a1/10.44.4.26 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-ffcf76e2-f0ed-4894-bc70-ee73220881db/10.44.14.2 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-fc5709a0-a0b5-4324-92ff-02b6ee0f1232/10.44.2.123 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-c058418c-51ec-43e2-b666-21971480665b/10.44.15.2 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-0c14afab-af2a-4668-bb3c-015932fbfd13/10.44.14.5 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-1cb308f0-203f-43ae-9252-e0fc98eb87b8/10.44.14.4 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-42753a7f-80d0-481e-93a6-67445cb1bb5e/10.44.14.6 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-63e97395-e1ec-4cab-8edc-c5dd251932af/10.44.2.122 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-7116fdc2-809f-4f99-b5bd-60fbf2aba935/10.44.1.37 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-f5ef8ff1-f09c-498e-9b27-1bcac94b895b/10.44.2.125 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-8feec117-aa3a-42c0-91e8-0ccefac5f134/10.44.2.121 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-45cc5605-d3c8-4c77-8ca8-88afbde81a69/10.44.14.3 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-9a575ac4-1531-4b2a-b516-12ffa2496615/10.44.5.32 kafka-python-2.0.1
- - - - - kafka-python-2.0.1-d33e112b-a1f4-4699-8989-daee03a5021c/10.44.14.7 kafka-python-2.0.1
my-topic 20 890 890 0 - - -
my-topic 38 857 857 0 - - -
my-topic 28 918 918 0 - - -
my-topic 23 66 909 843 - - -
my-topic 10 888 888 0 - - -
my-topic 2 885 885 0 - - -
my-topic 7 853 853 0 - - -
my-topic 16 878 878 0 - - -
my-topic 15 47 901 854 - - -
my-topic 26 934 934 0 - - -
my-topic 32 898 898 0 - - -
my-topic 21 921 921 0 - - -
my-topic 13 933 933 0 - - -
my-topic 5 879 879 0 - - -
my-topic 12 945 945 0 - - -
my-topic 4 918 918 0 - - -
my-topic 29 924 924 0 - - -
my-topic 39 895 895 0 - - -
my-topic 25 30 926 896 - - -
my-topic 9 915 915 0 - - -
my-topic 35 31 890 859 - - -
my-topic 3 69 897 828 - - -
my-topic 1 911 911 0 - - -
my-topic 6 22 901 879 - - -
my-topic 14 41 881 840 - - -
my-topic 30 900 900 0 - - -
my-topic 22 847 847 0 - - -
my-topic 8 919 919 0 - - -
my-topic 0 902 902 0 - - -
my-topic 18 924 924 0 - - -
my-topic 36 864 864 0 - - -
my-topic 34 929 929 0 - - -
my-topic 24 864 864 0 - - -
my-topic 19 937 937 0 - - -
my-topic 27 859 859 0 - - -
my-topic 11 838 838 0 - - -
my-topic 31 49 922 873 - - -
my-topic 37 882 882 0 - - -
my-topic 17 942 942 0 - - -
my-topic 33 928 928 0 - - -
It further states that the consumer group is rebalancing.
One thing to note here is that fewer consumers are listed under CONSUMER-ID than there should be. It should be 20 consumers, but in this output only 17 are shown even though all pods are running. This number varies and I am not sure if it is an output issue or if they are really not there. This also baffles me because when I initially start (all new Kafka and consumer deployments) this does not happen. So it really seems to be related to consumer deployments being scaled or otherwise killed.
For a short time the consumers get assigned, and after about half a minute the same picture as above shows again, with the consumer group rebalancing.
This happens also when I scale down. E.g. when I only have 4 consumers. I am not sure what's happening here. The pods all run and I use the same kind of base code and pattern in other microservices where it seems to work fine.
I suspect that it has something to do with a consumer pod getting killed because, as I said, with a new deployment it works initially. This batch is also a bit more long-running than the others I have so a pod kill is more likely during its run. I am also not sure if it has something to do with most partitions already being finished, this could also just be a quirk of my use case.
I noticed this because the processing seemed to take forever, yet new data was still being processed. So I think what happens is that for the brief moment when the consumers are assigned they process data, but they never commit the offset before getting rebalanced, leaving them in an infinite loop. The only slightly related thing I found was this issue, but it is from quite a few versions before and does not fully describe my situation.
I use the kafka-python client and the Kafka image confluentinc/cp-kafka:5.0.1.
I create the topic using the admin client NewTopic(name='my-topic', num_partitions=40, replication_factor=1) and create the client like so:
consumer = KafkaConsumer(consume_topic,
                         bootstrap_servers=bootstrap_servers,
                         group_id=consume_group_id,
                         value_deserializer=lambda m: json.loads(m))

for message in consumer:
    process(message)
What is going wrong here?
Do I have some configuration error?
Any help is greatly appreciated.
The issue was with the heartbeat configuration. It turns out that while most messages only need seconds to process, a few messages take very long. In these cases the heartbeat update took too long for some of the consumers, causing the broker to assume the consumer was down and start a rebalance.
I assume what happened next is that the consumers got reassigned to the same message, took too long to process it again, and triggered yet another rebalance, resulting in an endless cycle.
I finally solved it by increasing both session_timeout_ms and heartbeat_interval_ms in the consumer (documented here). I also decreased the batch size so that the heartbeat is updated more regularly.
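A sketch of what that configuration change could look like in kafka-python; the topic, group, broker address, and numeric values below are illustrative assumptions, not the exact ones used:

```python
import json

from kafka import KafkaConsumer

# Illustrative values -- tune to your actual processing times.
consumer = KafkaConsumer(
    "my-topic",
    bootstrap_servers="kafka:9092",
    group_id="my-group",
    value_deserializer=lambda m: json.loads(m),
    session_timeout_ms=60_000,     # broker waits longer before declaring the consumer dead
    heartbeat_interval_ms=20_000,  # heartbeats sent well within the session timeout
    max_poll_records=50,           # smaller batches, so poll() is called more often
    max_poll_interval_ms=600_000,  # allow long per-message processing between polls
)
```

Note that session_timeout_ms must stay within the broker's group.max.session.timeout.ms, and heartbeat_interval_ms is typically kept at no more than a third of the session timeout.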

How can I get elements using lxml

https://bankchart.kz/spravochniki/reytingi_cbr/2/2019/7
How can I get the text from each column, that is, from the last three blocks with the class <div class="col-currency-rate"> of each <div class="row">? I got the table, but what to do next?
>>> tree.xpath('//div[@class="table-currency"]/div[@class="row"]')
[<Element div at 0x7fcac2a47ba8>, <Element div at 0x7fcac2a47c00>, <Element div at 0x7fcac2a47c58>, <Element div at 0x7fcac2a47cb0>, <Element div at 0x7fcac2a47d08>, <Element div at 0x7fcac2a47d60>, <Element div at 0x7fcac2a47db8>, <Element div at 0x7fcac2a47e10>, <Element div at 0x7fcac2a47e68>, <Element div at 0x7fcac2a47ec0>, <Element div at 0x7fcac2a47f18>, <Element div at 0x7fcac2a47f70>, <Element div at 0x7fcac2a47fc8>, <Element div at 0x7fcac2a4e050>, <Element div at 0x7fcac2a4e0a8>, <Element div at 0x7fcac2a4e100>, <Element div at 0x7fcac2a4e158>, <Element div at 0x7fcac2a4e1b0>, <Element div at 0x7fcac2a4e208>, <Element div at 0x7fcac2a4e260>, <Element div at 0x7fcac2a4e2b8>, <Element div at 0x7fcac2a4e310>, <Element div at 0x7fcac2a4e368>, <Element div at 0x7fcac2a4e3c0>, <Element div at 0x7fcac2a4e418>, <Element div at 0x7fcac2a4e470>, <Element div at 0x7fcac2a4e4c8>, <Element div at 0x7fcac2a4e520>]
>>> len(tree.xpath('//div[@class="table-currency"]/div[@class="row"]'))
28
The HTML:
<div class="table-currency">
<div class="row"><div class="col col-currency">
2.
<img rel="nofollow" src="https://st6.prosto.im/cache/st6/1/0/5/5/1055/1055.jpg" width="16" height="16" alt="">
<a target="_blank" href="/spravochniki/reytingi_banka/2/1057">
ForteBank
</a></div><div class="col col-headery col-currency-rate"><p>Активы банков, тыс. тенге</p></div><div class="col col-headery col-currency-rate"><p>Прирост за июль 2019 года, тыс. тенге</p></div><div class="col col-headery col-currency-rate"><p>Прирост с начала 2019 года, тыс. тенге</p></div><div class="col col-currency-rate"><p>1 985 956 865</p></div><div class="col col-currency-rate"><p></p><p class="arrow-up">+89 298 547</p><p></p></div><div class="col col-currency-rate"><p></p><p class="arrow-up">+390 999 868</p><p></p></div></div>
<div class="row"><div class="col col-currency">
3.
<img rel="nofollow" src="https://st6.prosto.im/cache/st6/1/0/9/5/1095/1095.png" width="16" height="16" alt="">
<a target="_blank" href="/spravochniki/reytingi_banka/2/1076">
Сбербанк России
</a></div><div class="col col-headery col-currency-rate"><p>Активы банков, тыс. тенге</p></div><div class="col col-headery col-currency-rate"><p>Прирост за июль 2019 года, тыс. тенге</p></div><div class="col col-headery col-currency-rate"><p>Прирост с начала 2019 года, тыс. тенге</p></div><div class="col col-currency-rate"><p>1 983 840 092</p></div><div class="col col-currency-rate"><p></p><p class="arrow-up">+88 853 745</p><p></p></div><div class="col col-currency-rate"><p></p><p class="arrow-up">+119 145 827</p><p></p></div></div>
</div>
A complete solution with specific XPath expressions:
from lxml import html
import requests

url = 'https://bankchart.kz/spravochniki/reytingi_cbr/2/2019/7'
doc = html.document_fromstring(requests.get(url).content)

for row in doc.xpath('//div[@class="table-currency"]/div[@class="row"]'):
    bank_name = row.xpath('descendant::a/text()')[0].strip()
    print(bank_name)
    for cur_rate in row.xpath('div[contains(@class, "col-currency-rate")][position() > last() - 3]'):
        print('-', cur_rate.text_content())
    print()
Details:
descendant::a/text() - XPath to extract the text node of an a element that is a descendant of the current row
div[contains(@class, "col-currency-rate")][position() > last() - 3] - XPath to select div elements whose class attribute contains the given value and whose position falls within the last three (last() is the position of the last element, so position() > last() - 3 selects the third-from-last element onward)
The output:
Народный банк Казахстана
- 8 729 518 087
- +101 401 107
- -190 957 466
ForteBank
- 1 985 956 865
- +89 298 547
- +390 999 868
Сбербанк России
- 1 983 840 092
- +88 853 745
- +119 145 827
Kaspi Bank
- 1 907 391 103
- +12 378 770
- +233 318 909
Банк ЦентрКредит
- 1 495 599 542
- +34 795 443
- -14 202 851
АТФБанк
- 1 314 405 536
- +1 661 967
- -19 558 254
First Heartland Jýsan Bank
- 1 217 617 065
- +52 641 777
- -553 564 176
Жилстройсбербанк Казахстана
- 1 148 974 349
- +7 721 823
- +261 041 394
Евразийский банк
- 1 040 820 999
- -25 910 447
- -25 911 373
Ситибанк Казахстан
- 758 117 020
- +48 724 924
- +82 877 576
Банк "Bank RBK"
- 618 310 738
- +21 856 874
- +62 626 834
Альфа-Банк
- 504 777 556
- +17 401 839
- +51 157 130
Altyn Bank («Народный банк Казахстана»)
- 421 018 633
- -20 058 555
- +33 720 048
Нурбанк
- 408 442 557
- +7 065 511
- -18 282 545
Хоум Кредит энд Финанс Банк
- 372 901 871
- -2 127 105
- +33 983 288
Банк Китая в Казахстане
- 324 386 349
- +11 609 880
- +4 997 316
Банк ВТБ
- 184 247 490
- +5 800 194
- +40 725 927
First Heartland Bank (Банк ЭкспоКреди)
- 173 058 018
- -17 261 535
- +16 047 168
Торгово-промышленный Банк Китая в Алматы
- 140 792 847
- +6 365 348
- -26 137 736
Банк Kassa Nova
- 133 910 512
- +954 985
- +4 039 523
Tengri Bank (Punjab National Bank)
- 133 721 602
- +1 136 896
- -485 570
Азия Кредит Банк
- 99 659 306
- -3 790 116
- -21 420 844
Capital Bank Kazakhstan
- 85 702 895
- -3 165 322
- +4 469 187
KZI Bank (Казахстан Зират Интернешнл)
- 65 240 704
- -3 412 060
- -126 750
Шинхан Банк Казахстан
- 43 323 406
- -7 588 366
- +722 399
Исламский Банк "Al-Hilal"
- 30 562 279
- +2 411 098
- -1 430 198
Заман-Банк
- 22 969 984
- -168 105
- +5 544 675
Национальный Банк Пакистана
- 4 705 084
- -20 113
- -131 233
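The position() > last() - 3 predicate can be verified on a trimmed-down version of the question's HTML (a self-contained sketch, with placeholder header text):

```python
from lxml import html

# Minimal row: three header cells followed by three value cells,
# all sharing the "col-currency-rate" class as on the real page.
snippet = """
<div class="row">
  <div class="col col-headery col-currency-rate"><p>h1</p></div>
  <div class="col col-headery col-currency-rate"><p>h2</p></div>
  <div class="col col-headery col-currency-rate"><p>h3</p></div>
  <div class="col col-currency-rate"><p>1 985 956 865</p></div>
  <div class="col col-currency-rate"><p>+89 298 547</p></div>
  <div class="col col-currency-rate"><p>+390 999 868</p></div>
</div>
"""

row = html.fromstring(snippet)
# Both predicates apply in order: first filter by class, then keep the last three
values = [d.text_content().strip()
          for d in row.xpath('div[contains(@class, "col-currency-rate")]'
                             '[position() > last() - 3]')]
print(values)  # ['1 985 956 865', '+89 298 547', '+390 999 868']
```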
Try using this
import requests
import bs4 as bs

base_url = 'https://bankchart.kz/spravochniki/reytingi_cbr/2/2019/7'
soup = bs.BeautifulSoup(requests.get(base_url).text, 'lxml')
rows = soup.find_all('div', {'class': 'row'})
final = list()
# rows[1:] to skip the column headers
for bank in rows[1:]:
    bank_data = list()
    # Bank name
    bank_data.append(bank.find('a').text.strip('\n'))
    # Image
    bank_data.append(bank.find('img')['src'])
    # Separate name so the outer `rows` is not shadowed
    rates = bank.find_all('div', {'class': 'col col-currency-rate'})
    for values in rates:
        for x in values.find_all('p'):
            if x.text:
                # All three values
                bank_data.append(x.text)
    final.append(bank_data)

for x in final:
    print(x)
Check if this works for you.

Why doesn't the MILP produce a solution when it's obviously solvable?

I'm solving a MILP in a Python script with PuLP and the Gurobi solver, varying parameters.
A sensitivity analysis is done with a for loop, changing a parameter on every run. The first runs use 'worst case' parameters (a very low-efficiency generator and very bad insulation material), and the parameters gradually improve while looping through the MILP. At some point, when the parameters are set such that a solution should be found quite quickly, gurobipy does not seem to find a solution. This is the log:
Changed value of parameter TimeLimit to 300.0
Prev: 1e+100 Min: 0.0 Max: 1e+100 Default: 1e+100
Optimize a model with 8640 rows, 10080 columns and 20158 nonzeros
Variable types: 8640 continuous, 1440 integer (0 binary)
Coefficient statistics:
Matrix range [2e-05, 4e+04]
Objective range [1e+03, 1e+03]
Bounds range [7e-01, 4e+04]
RHS range [1e-02, 3e+04]
Presolve removed 7319 rows and 7331 columns
Presolve time: 0.03s
Presolved: 1321 rows, 2749 columns, 4069 nonzeros
Variable types: 1320 continuous, 1429 integer (1429 binary)
Root relaxation: objective 4.910087e+05, 679 iterations, 0.01 seconds
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 491008.698 0 11 - 491008.698 - - 0s
0 0 491008.698 0 11 - 491008.698 - - 0s
0 2 491008.698 0 11 - 491008.698 - - 0s
30429 24907 491680.652 942 3 - 491011.160 - 1.0 5s
73520 66861 491679.428 958 3 - 491011.996 - 1.0 10s
123770 116802 491762.182 1241 2 - 491012.439 - 1.0 15s
174010 165706 491896.963 1266 2 - 491012.636 - 1.0 20s
221580 212357 491234.860 1144 5 - 491012.931 - 1.0 25s
270004 259925 491187.818 904 5 - 491013.203 - 1.0 30s
322655 311334 491807.797 1254 2 - 491013.349 - 1.0 35s
379633 367554 491194.198 941 5 - 491013.571 - 1.0 40s
434035 420930 494029.008 1375 1 - 491013.695 - 1.0 45s
490442 476293 494016.622 1354 1 - 491013.851 - 1.0 50s
544923 529662 491203.097 990 5 - 491013.947 - 1.0 55s
597268 581228 492312.463 1253 2 - 491014.018 - 1.0 60s
650478 633331 491093.453 383 5 - 491014.133 - 1.0 65s
703246 685374 491755.974 1241 2 - 491014.188 - 1.0 70s
756675 737356 491069.420 272 6 - 491014.250 - 1.0 75s
811974 791502 491560.902 1235 3 - 491014.308 - 1.0 80s
866893 845452 491112.986 497 5 - 491014.345 - 1.0 85s
923793 901357 494014.134 1348 1 - 491014.390 - 1.0 90s
981961 958448 492971.305 1266 2 - 491014.435 - 1.0 95s
1039971 1015276 491545.502 1216 4 - 491014.502 - 1.0 100s
1097780 1071899 491171.468 818 5 - 491014.527 - 1.0 105s
1154447 1127328 491108.438 461 5 - 491014.591 - 1.0 110s
1212776 1184651 491024.147 57 6 - 491014.622 - 1.0 115s
1272535 1243171 495190.479 1266 2 - 491014.643 - 1.0 120s
1332126 1301674 491549.733 1228 3 - 491014.668 - 1.0 125s
1392772 1361287 491549.544 1219 3 - 491014.694 - 1.0 130s
1452380 1419870 491754.309 1237 2 - 491014.717 - 1.0 135s
1511070 1477572 491229.746 1131 5 - 491014.735 - 1.0 140s
1569783 1535126 491130.785 587 5 - 491014.764 - 1.0 145s
1628729 1593010 494026.669 1368 1 - 491014.775 - 1.0 150s
1687841 1651373 493189.023 1264 2 - 491014.810 - 1.0 155s
1747707 1709984 491548.263 1223 3 - 491014.841 - 1.0 160s
1807627 1768777 491160.795 755 5 - 491014.876 - 1.0 165s
1865730 1825486 494030.045 1379 1 - 491014.899 - 1.0 170s
1925615 1884356 494028.562 1374 1 - 491014.923 - 1.0 175s
1984204 1941827 491847.402 1115 2 - 491014.933 - 1.0 180s
2044016 2000572 491244.304 1210 5 - 491014.970 - 1.0 185s
2102125 2057622 491174.413 828 5 - 491014.989 - 1.0 190s
2161393 2115829 491115.089 532 5 - 491015.017 - 1.0 195s
2220721 2174168 491086.511 348 6 - 491015.041 - 1.0 200s
2281194 2233610 infeasible 1433 - 491015.048 - 1.0 205s
2341496 2292542 492824.696 1262 2 - 491015.069 - 1.0 210s
2399836 2349837 491548.142 1224 3 - 491015.084 - 1.0 215s
2459295 2408276 491178.869 853 5 - 491015.088 - 1.0 220s
2519203 2467098 491112.995 488 5 - 491015.106 - 1.0 225s
2578654 2525514 491069.711 270 6 - 491015.123 - 1.0 230s
2636111 2582093 491762.206 1250 2 - 491015.139 - 1.0 235s
2695962 2640805 491237.559 1152 5 - 491015.146 - 1.0 240s
2755319 2699171 491156.897 797 6 - 491015.161 - 1.0 245s
2813620 2756371 491024.109 43 7 - 491015.182 - 1.0 250s
2872810 2814527 492309.743 1255 2 - 491015.185 - 1.0 255s
2932550 2873227 492180.501 1255 2 - 491015.202 - 1.0 260s
2991586 2931246 491244.162 1207 5 - 491015.217 - 1.0 265s
3050385 2988872 491196.181 952 5 - 491015.228 - 1.0 270s
3110478 3047787 491127.746 560 5 - 491015.247 - 1.0 275s
3169730 3105844 491109.579 525 6 - 491015.266 - 1.0 280s
3229972 3165019 494029.916 1376 1 - 491015.276 - 1.0 285s
3289639 3223661 491861.516 1173 2 - 491015.293 - 1.0 290s
3349653 3282631 491862.419 1185 2 - 491015.305 - 1.0 295s
Explored 3409667 nodes (3506772 simplex iterations) in 300.02 seconds
Thread count was 8 (of 8 available processors)
Solution count 0
Time limit reached
Best objective -, best bound 4.910153206264e+05, gap -
('Gurobi status=', 9)
I've increased the maximum solving time to 300 s (more takes up too much RAM and the program gets terminated at some point) and played around with parameters (worse parameter settings find a solution!), but nothing seems to work. What might be the problem?
I was able to resolve this by setting the maximum solving time to None and setting a loose maximum gap.
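In PuLP this could look roughly like the following; gapRel and timeLimit are the generic solver arguments in recent PuLP versions (PuLP maps gapRel to Gurobi's MIPGap), so check your version's API before relying on them:

```python
import pulp

# Sketch of the solver configuration, assuming a recent PuLP release.
solver = pulp.GUROBI(
    msg=True,
    timeLimit=None,  # no time limit: let the solver run until it finds an incumbent
    gapRel=0.01,     # accept any incumbent within 1% of the best bound
)

# prob.solve(solver)  # `prob` is the LpProblem built elsewhere in the script
```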
