Scraping an ASPX page with Beautiful Soup - Python

I am trying to scrape this page using Beautiful Soup. I first tried to find any API/JSON behind the page, which I couldn't, and then I tried BS with an HTML parser, but I can't get anywhere with it.
I am not able to do so because the page calls a function with onclick GetFiiStatistics('F-INDEX FUTURES').
How can I go about scraping pages like these?
webpage:
https://www.motilaloswal.com/markets/derivative-market/FII-Statistics.aspx

The data is loaded via JavaScript from their API. Unfortunately, I don't know how the INPUT, CATAGORY and FLAG values are computed:
import json
import requests
import pandas as pd
from bs4 import BeautifulSoup
api_url = 'https://www.motilaloswal.com/ControllerBeta/APIRequest.aspx'
data = {
    'INPUT': '3yDBtksOiDjLLYaySd5NYgCcFnUOx8Jh2c8SRJvEhAs=',
    'CATAGORY': 'Q668CVoAKYOr7whA+PW25A==',
    'FLAG': 'VrTQQDRa72uSLVOyZO5Nqg=='
}
json_data = json.loads(requests.post(api_url, data=data).json())
# uncomment this to print all data:
# print(json.dumps(json_data, indent=4))
df = pd.DataFrame(json_data)
print(df)
Prints:
IndexType Date BuyContracts BuyValue SellContracts SellValue Net_BuySellValue OIContracts_eod OIValue_eod
0 INDEX FUTURES 2020-07-14T00:00:00 67037 4403.320 88350 6048.750 -1645.430 123985 8799.15
1 INDEX FUTURES 2020-07-13T00:00:00 53612 3644.000 61174 4142.300 -498.300 105126 7803.56
2 INDEX FUTURES 2020-07-10T00:00:00 62735 4313.250 83916 5923.270 -1610.020 113346 8369.47
3 INDEX FUTURES 2020-07-09T00:00:00 67222 4773.340 51094 3442.520 1330.820 119977 8971.28
4 INDEX FUTURES 2020-07-08T00:00:00 83367 5661.640 69042 4659.280 1002.360 106027 7686.32
.. ... ... ... ... ... ... ... ... ...
495 INDEX FUTURES 2018-07-05T00:00:00 28566 2585.923 21393 1881.601 704.322 243417 19847.72
496 INDEX FUTURES 2018-07-04T00:00:00 21339 1875.184 26507 2425.003 -549.819 245786 20201.28
497 INDEX FUTURES 2018-07-03T00:00:00 30019 2610.728 28837 2563.647 47.081 237564 19322.63
498 INDEX FUTURES 2018-07-02T00:00:00 24976 2191.751 29501 2541.589 -349.838 226100 18203.48
499 INDEX FUTURES 2018-06-29T00:00:00 45399 3814.480 27297 2371.817 1442.663 227041 18387.33
[500 rows x 9 columns]
EDIT: To scrape other tabs, change the data= parameter in the request:
import json
import requests
import pandas as pd
from bs4 import BeautifulSoup
api_url = 'https://www.motilaloswal.com/ControllerBeta/APIRequest.aspx'
data_future = {
    'INPUT': '3yDBtksOiDjLLYaySd5NYgCcFnUOx8Jh2c8SRJvEhAs=',
    'CATAGORY': 'Q668CVoAKYOr7whA+PW25A==',
    'FLAG': 'VrTQQDRa72uSLVOyZO5Nqg=='
}
data_option = {
    'INPUT': '3yDBtksOiDjLLYaySd5NYmcoKC8x7z5PFO880mjcQ2U=',
    'CATAGORY': 'Q668CVoAKYOr7whA+PW25A==',
    'FLAG': 'VrTQQDRa72uSLVOyZO5Nqg=='
}
data_stock_future = {
    'INPUT': '7F5jZM46TTOICwT1N6AfqkP7gWI2CpTGCWmll4bhYow=',
    'CATAGORY': 'Q668CVoAKYOr7whA+PW25A==',
    'FLAG': 'VrTQQDRa72uSLVOyZO5Nqg=='
}
data_stock_option = {
    'INPUT': '7F5jZM46TTOICwT1N6Afqv1N6pCh+1OrTfhSwG6Azes=',
    'CATAGORY': 'Q668CVoAKYOr7whA+PW25A==',
    'FLAG': 'VrTQQDRa72uSLVOyZO5Nqg=='
}
json_data = json.loads(requests.post(api_url, data=data_stock_option).json()) # <-- change data= to data_stock_future or data_option ...
# uncomment this to print all data:
# print(json.dumps(json_data, indent=4))
df = pd.DataFrame(json_data)
print(df)
Prints:
IndexType Date BuyContracts BuyValue SellContracts SellValue Net_BuySellValue OIContracts_eod OIValue_eod
0 STOCK OPTIONS 2020-07-14T00:00:00 52546 4349.020 54139 4525.850 -176.830 115998 7661.010
1 STOCK OPTIONS 2020-07-13T00:00:00 50604 4242.330 52221 4413.040 -170.710 110663 7329.990
2 STOCK OPTIONS 2020-07-10T00:00:00 82502 6218.200 82608 6219.420 -1.220 109680 7232.900
3 STOCK OPTIONS 2020-07-09T00:00:00 64743 4725.430 64613 4714.460 10.970 104780 6945.740
4 STOCK OPTIONS 2020-07-08T00:00:00 75481 5201.770 75713 5220.580 -18.810 100200 6584.390
.. ... ... ... ... ... ... ... ... ...
495 STOCK OPTIONS 2018-07-05T00:00:00 94696 6728.086 93256 6617.059 111.027 66773 4471.483
496 STOCK OPTIONS 2018-07-04T00:00:00 68719 4762.333 69376 4794.350 -32.017 59195 4005.045
497 STOCK OPTIONS 2018-07-03T00:00:00 64283 4351.226 64982 4347.153 4.073 53028 3581.946
498 STOCK OPTIONS 2018-07-02T00:00:00 74479 4913.606 74730 4897.239 16.367 44627 3024.131
499 STOCK OPTIONS 2018-06-29T00:00:00 69730 4694.675 68447 4645.350 49.325 35486 2421.744
[500 rows x 9 columns]
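Not part of the original answer, but if you want all four tabs in one go, a small loop over the payload dicts defined above works (assuming the encrypted INPUT/CATAGORY/FLAG values keep being accepted by the site):
import json
import requests
import pandas as pd

api_url = 'https://www.motilaloswal.com/ControllerBeta/APIRequest.aspx'

# reuses data_future, data_option, data_stock_future, data_stock_option from above
payloads = {
    'INDEX FUTURES': data_future,
    'INDEX OPTIONS': data_option,
    'STOCK FUTURES': data_stock_future,
    'STOCK OPTIONS': data_stock_option,
}

frames = []
for tab, payload in payloads.items():
    json_data = json.loads(requests.post(api_url, data=payload).json())
    df = pd.DataFrame(json_data)
    df['Tab'] = tab  # remember which tab the rows came from
    frames.append(df)

all_tabs = pd.concat(frames, ignore_index=True)
print(all_tabs)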

Related

How to extract information from atom feed based on condition?

I have the output of an API request given below.
From each atom:entry I need to extract
<c:series href="http://company.com/series/product/123"/>
<c:series-order>2020-09-17T00:00:00Z</c:series-order>
<f:assessment-low precision="0">980</f:assessment-low>
I tried to extract them into different lists with BeautifulSoup, but that wasn't successful because some entries have a date but no price (I've shown an example below). How could I extract them conditionally, or at least put N/A for entries where the price is omitted?
soup = BeautifulSoup(request.text, "html.parser")
date = soup.find_all('c:series-order')
value = soup.find_all('f:assessment-low')
quot = soup.find_all('c:series')
p_day = []
p_val = []
q_val = []
for i in date:
    p_day.append(i.text)
for j in value:
    p_val.append(j.text)
for j in quot:
    q_val.append(j.get('href'))
d2 = {'date': p_day,
      'price': p_val,
      'quote': q_val
      }
and
<atom:feed xmlns:atom="http://www.w3.org/2005/Atom" xmlns:a="http://company.com/ns/assets" xmlns:c="http://company.com/ns/core" xmlns:f="http://company.com/ns/fields" xmlns:s="http://company.com/ns/search">
<atom:id>http://company.com/search</atom:id>
<atom:title> COMPANYSearch Results</atom:title>
<atom:updated>2022-11-24T19:36:19.104414Z</atom:updated>
<atom:author>COMPANY</atom:author>
<atom:generator> COMPANY/search Endpoint</atom:generator>
<atom:link href="/search" rel="self" type="application/atom"/>
<s:first-result>1</s:first-result>
<s:max-results>15500</s:max-results>
<s:selected-count>212</s:selected-count>
<s:returned-count>212</s:returned-count>
<s:query-time>PT0.036179S</s:query-time>
<s:request version="1.0">
<s:scope>
<s:series>http://company.com/series/product/123</s:series>
</s:scope>
<s:constraints>
<s:compare field="c:series-order" op="ge" value="2018-10-01"/>
<s:compare field="c:series-order" op="le" value="2022-11-18"/>
</s:constraints>
<s:options>
<s:first-result>1</s:first-result>
<s:max-results>15500</s:max-results>
<s:order-by key="commodity-name" direction="ascending" xml:lang="en"/>
<s:no-currency-rate-scheme>no-element</s:no-currency-rate-scheme>
<s:precision>embed</s:precision>
<s:include-last-commit-time>false</s:include-last-commit-time>
<s:include-result-types>live</s:include-result-types>
<s:relevance-score algorithm="score-logtfidf"/>
<s:lang-data-missing-scheme>show-available-language-content</s:lang-data-missing-scheme>
</s:options>
</s:request>
<s:facets/>
<atom:entry>
<atom:title>http://company.com/series-item/product/123-pricehistory-20200917000000</atom:title>
<atom:id>http://company.com/series-item/product/123-pricehistory-20200917000000</atom:id>
<atom:updated>2020-09-17T17:09:43.55243Z</atom:updated>
<atom:relevance-score>60800</atom:relevance-score>
<atom:content type="application/vnd.icis.iddn.entity+xml"><a:price-range>
<c:id>http://company.com/series-item/product/123-pricehistory-20200917000000</c:id>
<c:version>1</c:version>
<c:type>series-item</c:type>
<c:created-on>2020-09-17T17:09:43.55243Z</c:created-on>
<c:descriptor href="http://company.com/descriptor/price-range"/>
<c:domain href="http://company.com/domain/product"/>
<c:released-on>2020-09-17T21:30:00Z</c:released-on>
<c:series href="http://company.com/series/product/123"/>
<c:series-order>2020-09-17T00:00:00Z</c:series-order>
<f:assessment-low precision="0">980</f:assessment-low>
<f:assessment-high precision="0">1020</f:assessment-high>
<f:mid precision="1">1000</f:mid>
<f:assessment-low-delta>0</f:assessment-low-delta>
<f:assessment-high-delta>+20</f:assessment-high-delta>
<f:delta-type href="http://company.com/ref-data/delta-type/regular"/>
</a:price-range></atom:content>
</atom:entry>
<atom:entry>
<atom:title>http://company.com/series-item/product/123-pricehistory-20200910000000</atom:title>
<atom:id>http://company.com/series-item/product/123-pricehistory-20200910000000</atom:id>
<atom:updated>2020-09-10T18:57:55.128308Z</atom:updated>
<atom:relevance-score>60800</atom:relevance-score>
<atom:content type="application/vnd.icis.iddn.entity+xml"><a:price-range>
<c:id>http://company.com/series-item/product/123-pricehistory-20200910000000</c:id>
<c:version>1</c:version>
<c:type>series-item</c:type>
<c:created-on>2020-09-10T18:57:55.128308Z</c:created-on>
<c:descriptor href="http://company.com/descriptor/price-range"/>
<c:domain href="http://company.com/domain/product"/>
<c:released-on>2020-09-10T21:30:00Z</c:released-on>
<c:series href="http://company.com/series/product/123"/>
<c:series-order>2020-09-10T00:00:00Z</c:series-order>
<!-- for example, here there is no price -->
<f:delta-type href="http://company.com/ref-data/delta-type/regular"/>
</a:price-range></atom:content>
</atom:entry>
You may try to iterate per entry, use the xml parser to get a proper result, and check whether the element exists:
soup = BeautifulSoup(request.text, 'xml')
data = []
for i in soup.select('entry'):
    data.append({
        'date': i.find('series-order').text,
        'value': i.find('assessment-low').text if i.find('assessment-low') else None,
        'quot': i.find('series').get('href')
    })
data
or with html.parser:
soup = BeautifulSoup(xml, 'html.parser')
data = []
for i in soup.find_all('atom:entry'):
    data.append({
        'date': i.find('c:series-order').text,
        'value': i.find('f:assessment-low').text if i.find('f:assessment-low') else None,
        'quot': i.find('c:series').get('href')
    })
data
Output:
[{'date': '2020-09-17T00:00:00Z',
'value': '980',
'quot': 'http://company.com/series/product/123'},
{'date': '2020-09-10T00:00:00Z',
'value': None,
'quot': 'http://company.com/series/product/123'}]
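A small follow-up of my own (not from the thread): the list of dicts drops straight into pandas, with the missing price surfacing as None/NaN:
import pandas as pd

df = pd.DataFrame(data)  # 'data' is the list built in either snippet above
print(df)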
You can try this:
split your request.text by <atom:entry>
deal with each section separately
use enumerate to identify the section that it came from
entries = request.text.split("<atom:entry>")
p_day = []
p_val = []
q_val = []
for i, entry in enumerate(entries):
    soup = BeautifulSoup(entry, "html.parser")
    date = soup.find_all('c:series-order')
    value = soup.find_all('f:assessment-low')
    quot = soup.find_all('c:series')
    for d in date:
        p_day.append([i, d.text])
    for v in value:
        p_val.append([i, v.text])
    for q in quot:
        q_val.append([i, q.get('href')])
d2 = {'date': p_day,
      'price': p_val,
      'quote': q_val
      }
print(d2)
OUTPUT:
{'date': [[1, '2020-09-17T00:00:00Z'], [2, '2020-09-10T00:00:00Z']],
'price': [[1, '980']],
'quote': [[1, 'http://company.com/series/product/123'],
[2, 'http://company.com/series/product/123']]}
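If you need one record per entry, a quick regroup by that enumerate index works (my own sketch, not from the thread; entries without a price simply never get a 'price' key):
rows = {}
for i, d in p_day:
    rows.setdefault(i, {})['date'] = d
for i, v in p_val:
    rows.setdefault(i, {})['price'] = v
for i, q in q_val:
    rows.setdefault(i, {})['quote'] = q
print(rows)
# -> {1: {'date': '2020-09-17T00:00:00Z', 'price': '980', 'quote': 'http://company.com/series/product/123'},
#     2: {'date': '2020-09-10T00:00:00Z', 'quote': 'http://company.com/series/product/123'}}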

Trying to grab just the first href in each table row

I'm trying to grab just the first href in each row of an HTML table. Using find_all on the soup object doesn't work because there are multiple tables, so I used soup.select() to isolate just that table and work from there, but it doesn't seem to be working.
I tried using find_all on the soup object alone, and tried looping through the table rows with find(), but it said that it returns 'NoneType'.
I would like to be able to store a list that starts ["/players/a/abrinal01.html", "/players/a/acyqu01.html", ...]
url = 'https://www.basketball-reference.com/leagues/NBA_2019_per_game.html'
res = requests.get(url)
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')
table = soup.find("table", { "id" : "per_game_stats" })
You can access the desired data by anchoring the parsing from the outer div wrapper with the id of all_per_game_stats:
import requests
from bs4 import BeautifulSoup as soup
d = soup(requests.get('https://www.basketball-reference.com/leagues/NBA_2019_per_game.html').text, 'html.parser')
data = [b.td.a['href'] for b in d.find('div', {'id':'all_per_game_stats'}).table.find_all('tr') if b.td]
Output:
['/players/a/abrinal01.html', '/players/a/acyqu01.html', '/players/a/adamsja01.html', '/players/a/adamsst01.html', '/players/a/adebaba01.html', '/players/a/adelde01.html', '/players/a/akoonde01.html', '/players/a/aldrila01.html', '/players/a/alkinra01.html', '/players/a/allengr01.html', '/players/a/allenja01.html', '/players/a/allenka01.html', '/players/a/aminual01.html', '/players/a/anderju01.html', '/players/a/anderky01.html', '/players/a/anderry01.html', '/players/a/anderry01.html', '/players/a/anderry01.html', '/players/a/anigbik01.html', '/players/a/antetgi01.html', '/players/a/antetko01.html', '/players/a/anthoca01.html', '/players/a/anunoog01.html', '/players/a/arcidry01.html', '/players/a/arizatr01.html', '/players/a/arizatr01.html', '/players/a/arizatr01.html', '/players/a/augusdj01.html', '/players/a/aytonde01.html', '/players/b/bacondw01.html', '/players/b/baglema01.html', '/players/b/bakerro01.html', '/players/b/bakerro01.html', '/players/b/bakerro01.html', '/players/b/baldwwa01.html', '/players/b/balllo01.html', '/players/b/bambamo01.html', '/players/b/bareajo01.html', '/players/b/barneha02.html', '/players/b/barneha02.html', '/players/b/barneha02.html', '/players/b/bartowi01.html', '/players/b/bateske01.html', '/players/b/batumni01.html', '/players/b/bayleje01.html', '/players/b/baynear01.html', '/players/b/bazemke01.html', '/players/b/bealbr01.html', '/players/b/beaslma01.html', '/players/b/beaslmi01.html', '/players/b/belinma01.html', '/players/b/belljo01.html', '/players/b/bembrde01.html', '/players/b/bendedr01.html', '/players/b/bertada02.html', '/players/b/bertada01.html', '/players/b/beverpa01.html', '/players/b/birchkh01.html', '/players/b/biyombi01.html', '/players/b/bjeline01.html', '/players/b/blakean01.html', '/players/b/bledser01.html', '/players/b/blossja01.html', '/players/b/bogdabo01.html', '/players/b/bogdabo02.html', '/players/b/bogutan01.html', '/players/b/boldejo01.html', '/players/b/bongais01.html', '/players/b/bookede01.html', '/players/b/bouchch01.html', '/players/b/bradlav01.html', '/players/b/bradlav01.html', '/players/b/bradlav01.html', '/players/b/bradlto01.html', '/players/b/breweco01.html', '/players/b/breweco01.html', '/players/b/breweco01.html', '/players/b/bridgmi01.html', '/players/b/bridgmi02.html', '/players/b/briscis01.html', '/players/b/broekry01.html', '/players/b/brogdma01.html', '/players/b/brookdi01.html', '/players/b/brookma01.html', '/players/b/brownbr01.html', '/players/b/brownja02.html', '/players/b/brownlo01.html', '/players/b/brownst02.html', '/players/b/browntr01.html', '/players/b/brunsja01.html', '/players/b/bryanth01.html', '/players/b/bullore01.html', '/players/b/bullore01.html', '/players/b/bullore01.html', '/players/b/burketr01.html', '/players/b/burketr01.html', '/players/b/burketr01.html', '/players/b/burksal01.html', '/players/b/burksal01.html', '/players/b/burksal01.html', '/players/b/burksal01.html', '/players/b/burtode02.html', '/players/b/butleji01.html', '/players/b/butleji01.html', '/players/b/butleji01.html', '/players/c/cabocbr01.html', '/players/c/caldejo01.html', '/players/c/caldwke01.html', '/players/c/canaais01.html', '/players/c/canaais01.html', '/players/c/canaais01.html', '/players/c/canaais01.html', '/players/c/capelca01.html', '/players/c/carrode01.html', '/players/c/carteje01.html', '/players/c/cartevi01.html', '/players/c/cartewe01.html', '/players/c/cartemi01.html', '/players/c/cartemi01.html', '/players/c/cartemi01.html', '/players/c/carusal01.html', '/players/c/casspom01.html', 
'/players/c/caulewi01.html', '/players/c/caupatr01.html', '/players/c/cavanty01.html', '/players/c/chandty01.html', '/players/c/chandty01.html', '/players/c/chandty01.html', '/players/c/chandwi01.html', '/players/c/chandwi01.html', '/players/c/chandwi01.html', '/players/c/chealjo01.html', '/players/c/chiozch01.html', '/players/c/chrisma01.html', '/players/c/chrisma01.html', '/players/c/chrisma01.html', '/players/c/clarkga01.html', '/players/c/clarkia01.html', '/players/c/clarkjo01.html', '/players/c/collijo01.html', '/players/c/colliza01.html', '/players/c/collida01.html', '/players/c/colsobo01.html', '/players/c/conlemi01.html', '/players/c/connapa01.html', '/players/c/cookqu01.html', '/players/c/couside01.html', '/players/c/covinro01.html', '/players/c/covinro01.html', '/players/c/covinro01.html', '/players/c/crabbal01.html', '/players/c/craigto01.html', '/players/c/crawfja01.html', '/players/c/creekmi01.html', '/players/c/creekmi01.html', '/players/c/creekmi01.html', '/players/c/crowdja01.html', '/players/c/cunnida01.html', '/players/c/curryse01.html', '/players/c/curryst01.html', '/players/d/danietr01.html', '/players/d/davisan02.html', '/players/d/davisde01.html', '/players/d/davised01.html', '/players/d/davisty01.html', '/players/d/dedmode01.html', '/players/d/dekkesa01.html', '/players/d/dekkesa01.html', '/players/d/dekkesa01.html', '/players/d/delgaan01.html', '/players/d/dellama01.html', '/players/d/dellama01.html', '/players/d/dellama01.html', '/players/d/denglu01.html', '/players/d/derozde01.html', '/players/d/derrima01.html', '/players/d/diallch01.html', '/players/d/diallha01.html', '/players/d/dienggo01.html', '/players/d/dinwisp01.html', '/players/d/divindo01.html', '/players/d/doncilu01.html', '/players/d/dorsety01.html', '/players/d/dorsety01.html', '/players/d/dorsety01.html', '/players/d/dotsoda01.html', '/players/d/doziepj01.html', '/players/d/dragigo01.html', '/players/d/drumman01.html', '/players/d/dudleja01.html', '/players/d/dunnkr01.html', '/players/d/duranke01.html', '/players/d/duvaltr01.html', '/players/e/edwarvi01.html', '/players/e/ellenhe01.html', '/players/e/ellenhe01.html', '/players/e/ellenhe01.html', '/players/e/ellinwa01.html', '/players/e/ellinwa01.html', '/players/e/ellinwa01.html', '/players/e/embiijo01.html', '/players/e/ennisja01.html', '/players/e/ennisja01.html', '/players/e/ennisja01.html', '/players/e/eubandr01.html', '/players/e/evansja02.html', '/players/e/evansja01.html', '/players/e/evansja01.html', '/players/e/evansja01.html', '/players/e/evansty01.html', '/players/e/exumda01.html', '/players/f/farieke01.html', '/players/f/farieke01.html', '/players/f/farieke01.html', '/players/f/favorde01.html', '/players/f/feliccr01.html', '/players/f/feltora01.html', '/players/f/fergute01.html', '/players/f/ferreyo01.html', '/players/f/finnedo01.html', '/players/f/forbebr01.html', '/players/f/fournev01.html', '/players/f/foxde01.html', '/players/f/frazime01.html', '/players/f/fraziti01.html', '/players/f/fraziti01.html', '/players/f/fraziti01.html', '/players/f/fredeji01.html', '/players/f/fryech01.html', '/players/f/fultzma01.html', '/players/g/gallida01.html', '/players/g/gallola01.html', '/players/g/garrebi01.html', '/players/g/gasolma01.html', '/players/g/gasolma01.html', '/players/g/gasolma01.html', '/players/g/gasolpa01.html', '/players/g/gasolpa01.html', '/players/g/gasolpa01.html', '/players/g/gayru01.html', '/players/g/georgpa01.html', '/players/g/gibsota01.html', '/players/g/gilesha01.html', '/players/g/gilgesh01.html', 
'/players/g/goberru01.html', '/players/g/goodwbr01.html', '/players/g/gordoaa01.html', '/players/g/gordoer01.html', '/players/g/gortama01.html', '/players/g/grahade01.html', '/players/g/grahatr01.html', '/players/g/grantje01.html', '/players/g/grantje02.html', '/players/g/grantdo01.html', '/players/g/greenda02.html', '/players/g/greendr01.html', '/players/g/greenge01.html', '/players/g/greenja01.html', '/players/g/greenja01.html', '/players/g/greenja01.html', '/players/g/greenje02.html', '/players/g/griffbl01.html', '/players/h/hamilda02.html', '/players/h/hannadu01.html', '/players/h/hardati02.html', '/players/h/hardati02.html', '/players/h/hardati02.html', '/players/h/hardeja01.html', '/players/h/harklma01.html', '/players/h/harremo01.html', '/players/h/harride01.html', '/players/h/harriga01.html', '/players/h/harrijo01.html', '/players/h/harrito02.html', '/players/h/harrito02.html', '/players/h/harrito02.html', '/players/h/harrian01.html', '/players/h/harrian01.html', '/players/h/harrian01.html', '/players/h/harrian01.html', '/players/h/harrish01.html', '/players/h/hartjo01.html', '/players/h/harteis01.html', '/players/h/hasleud01.html', '/players/h/haywago01.html', '/players/h/hensojo01.html', '/players/h/hernaju01.html', '/players/h/hernawi01.html', '/players/h/hezonma01.html', '/players/h/hicksis01.html', '/players/h/hieldbu01.html', '/players/h/highsha01.html', '/players/h/hilarne01.html', '/players/h/hillge01.html', '/players/h/hillge01.html', '/players/h/hillge01.html', '/players/h/hillso01.html', '/players/h/holidaa01.html', '/players/h/holidjr01.html', '/players/h/holidju01.html', '/players/h/holidju01.html', '/players/h/holidju01.html', '/players/h/hollajo02.html', '/players/h/holliro01.html', '/players/h/holmeri01.html', '/players/h/hoodro01.html', '/players/h/hoodro01.html', '/players/h/hoodro01.html', '/players/h/horfoal01.html', '/players/h/houseda01.html', '/players/h/howardw01.html', '/players/h/huertke01.html', '/players/h/humphis01.html', '/players/h/hunterj01.html', '/players/h/hutchch01.html', '/players/i/ibakase01.html', '/players/i/iguodan01.html', '/players/i/ilyaser01.html', '/players/i/inglejo01.html', '/players/i/ingraan01.html', '/players/i/ingrabr01.html', '/players/i/irvinky01.html', '/players/i/isaacjo01.html', '/players/i/iwundwe01.html', '/players/j/jacksde01.html', '/players/j/jacksfr01.html', '/players/j/jacksja02.html', '/players/j/jacksjo02.html', '/players/j/jacksju01.html', '/players/j/jacksju01.html', '/players/j/jacksju01.html', '/players/j/jacksre01.html', '/players/j/jamesle01.html', '/players/j/jeffeam01.html', '/players/j/jenkijo01.html', '/players/j/jenkijo01.html', '/players/j/jenkijo01.html', '/players/j/jerebjo01.html', '/players/j/johnsal02.html', '/players/j/johnsam01.html', '/players/j/johnsbj01.html', '/players/j/johnsbj01.html', '/players/j/johnsbj01.html', '/players/j/johnsja01.html', '/players/j/johnsst04.html', '/players/j/johnsst04.html', '/players/j/johnsst04.html', '/players/j/johnsty01.html', '/players/j/johnsty01.html', '/players/j/johnsty01.html', '/players/j/johnswe01.html', '/players/j/johnswe01.html', '/players/j/johnswe01.html', '/players/j/jokicni01.html', '/players/j/jonesda03.html', '/players/j/jonesde02.html', '/players/j/jonesja04.html', '/players/j/jonesje01.html', '/players/j/joneste01.html', '/players/j/jonesty01.html', '/players/j/jordade01.html', '/players/j/jordade01.html', '/players/j/jordade01.html', '/players/j/josepco01.html', '/players/k/kaminfr01.html', '/players/k/kanteen01.html', 
'/players/k/kanteen01.html', '/players/k/kanteen01.html', '/players/k/kennalu01.html', '/players/k/kiddgmi01.html', '/players/k/kingge03.html', '/players/k/klebima01.html', '/players/k/knighbr03.html', '/players/k/knighbr03.html', '/players/k/knighbr03.html', '/players/k/knoxke01.html', '/players/k/korkmfu01.html', '/players/k/kornelu01.html', '/players/k/korveky01.html', '/players/k/korveky01.html', '/players/k/korveky01.html', '/players/k/koufoko01.html', '/players/k/kurucro01.html', '/players/k/kuzmaky01.html', '/players/l/labissk01.html', '/players/l/labissk01.html', '/players/l/labissk01.html', '/players/l/lambje01.html', '/players/l/lavinza01.html', '/players/l/laymaja01.html', '/players/l/leaftj01.html', '/players/l/leeco01.html', '/players/l/leeco01.html', '/players/l/leeco01.html', '/players/l/leeda03.html', '/players/l/lemonwa01.html', '/players/l/lenal01.html', '/players/l/leonaka01.html', '/players/l/leoname01.html', '/players/l/leuerjo01.html', '/players/l/leverca01.html', '/players/l/lillada01.html', '/players/l/linje01.html', '/players/l/linje01.html', '/players/l/linje01.html', '/players/l/livinsh01.html', '/players/l/loftoza01.html', '/players/l/looneke01.html', '/players/l/lopezbr01.html', '/players/l/lopezro01.html', '/players/l/loveke01.html', '/players/l/lowryky01.html', '/players/l/loydjo01.html', '/players/l/lucaska01.html', '/players/l/luwawti01.html', '/players/l/luwawti01.html', '/players/l/luwawti01.html', '/players/l/lydonty01.html', '/players/l/lylestr01.html', '/players/m/machasc01.html', '/players/m/macksh01.html', '/players/m/macksh01.html', '/players/m/macksh01.html', '/players/m/maconda01.html', '/players/m/macurjp01.html', '/players/m/mahinia01.html', '/players/m/makerth01.html', '/players/m/makerth01.html', '/players/m/makerth01.html', '/players/m/marjabo01.html', '/players/m/marjabo01.html', '/players/m/marjabo01.html', '/players/m/markkla01.html', '/players/m/martija01.html', '/players/m/masonfr01.html', '/players/m/matenya01.html', '/players/m/matthwe02.html', '/players/m/matthwe02.html', '/players/m/matthwe02.html', '/players/m/matthwe02.html', '/players/m/mbahalu01.html', '/players/m/mccalta01.html', '/players/m/mccawpa01.html', '/players/m/mccawpa01.html', '/players/m/mccawpa01.html', '/players/m/mccolcj01.html', '/players/m/mccontj01.html', '/players/m/mcderdo01.html', '/players/m/mcgeeja01.html', '/players/m/mcgruro01.html', '/players/m/mckinal01.html', '/players/m/mclembe01.html', '/players/m/mcraejo01.html', '/players/m/meeksjo01.html', '/players/m/mejrisa01.html', '/players/m/meltode01.html', '/players/m/metuch01.html', '/players/m/middlkh01.html', '/players/m/milescj01.html', '/players/m/milescj01.html', '/players/m/milescj01.html', '/players/m/milleda01.html', '/players/m/millema01.html', '/players/m/millspa02.html', '/players/m/millspa01.html', '/players/m/miltosh01.html', '/players/m/mirotni01.html', '/players/m/mirotni01.html', '/players/m/mirotni01.html', '/players/m/mitchdo01.html', '/players/m/mitrona01.html', '/players/m/monkma01.html', '/players/m/monrogr01.html', '/players/m/monrogr01.html', '/players/m/monrogr01.html', '/players/m/monrogr01.html', '/players/m/mooreet01.html', '/players/m/moreler01.html', '/players/m/moreler01.html', '/players/m/moreler01.html', '/players/m/morrija01.html', '/players/m/morrima03.html', '/players/m/morrima02.html', '/players/m/morrima02.html', '/players/m/morrima02.html', '/players/m/morrimo01.html', '/players/m/motiedo01.html', '/players/m/motlejo01.html', '/players/m/mudiaem01.html', 
'/players/m/murraja01.html', '/players/m/musadz01.html', '/players/m/muscami01.html', '/players/m/muscami01.html', '/players/m/muscami01.html', '/players/m/mykhasv01.html', '/players/m/mykhasv01.html', '/players/m/mykhasv01.html', '/players/n/naderab01.html', '/players/n/nancela02.html', '/players/n/napiesh01.html', '/players/n/netora01.html', '/players/n/niangge01.html', '/players/n/noahjo01.html', '/players/n/noelne01.html', '/players/n/nowitdi01.html', '/players/n/ntilila01.html', '/players/n/nunnaja01.html', '/players/n/nunnaja01.html', '/players/n/nunnaja01.html', '/players/n/nurkiju01.html', '/players/n/nwabada01.html', '/players/o/onealro01.html', '/players/o/oquinky01.html', '/players/o/ojelese01.html', '/players/o/okafoja01.html', '/players/o/okoboel01.html', '/players/o/okogijo01.html', '/players/o/oladivi01.html', '/players/o/olynyke01.html', '/players/o/osmande01.html', '/players/o/oubreke01.html', '/players/o/oubreke01.html', '/players/o/oubreke01.html', '/players/p/pachuza01.html', '/players/p/parkeja01.html', '/players/p/parkeja01.html', '/players/p/parkeja01.html', '/players/p/parketo01.html', '/players/p/parsoch01.html', '/players/p/pattepa01.html', '/players/p/pattoju01.html', '/players/p/paulch01.html', '/players/p/payneca01.html', '/players/p/payneca01.html', '/players/p/payneca01.html', '/players/p/paytoel01.html', '/players/p/paytoga02.html', '/players/p/pinsoth01.html', '/players/p/plumlma01.html', '/players/p/plumlmi01.html', '/players/p/poeltja01.html', '/players/p/pondequ01.html', '/players/p/porteot01.html', '/players/p/porteot01.html', '/players/p/porteot01.html', '/players/p/portibo01.html', '/players/p/portibo01.html', '/players/p/portibo01.html', '/players/p/poweldw01.html', '/players/p/powelno01.html', '/players/p/poythal01.html', '/players/q/qizh01.html', '/players/r/rabbiv01.html', '/players/r/randlch01.html', '/players/r/randlju01.html', '/players/r/redicjj01.html', '/players/r/reedda01.html', '/players/r/reynoca01.html', '/players/r/richajo01.html', '/players/r/richama01.html', '/players/r/riverau01.html', '/players/r/riverau01.html', '/players/r/riverau01.html', '/players/r/robinde01.html', '/players/r/robindu01.html', '/players/r/robingl02.html', '/players/r/robinje01.html', '/players/r/robinmi01.html', '/players/r/rondora01.html', '/players/r/rosede01.html', '/players/r/rosste01.html', '/players/r/roziete01.html', '/players/r/rubiori01.html', '/players/r/russeda01.html', '/players/s/sabondo01.html', '/players/s/sampsbr01.html', '/players/s/sampsja02.html', '/players/s/saricda01.html', '/players/s/saricda01.html', '/players/s/saricda01.html', '/players/s/satorto01.html', '/players/s/schrode01.html', '/players/s/scottmi01.html', '/players/s/scottmi01.html', '/players/s/scottmi01.html', '/players/s/sefolth01.html', '/players/s/seldewa01.html', '/players/s/seldewa01.html', '/players/s/seldewa01.html', '/players/s/sextoco01.html', '/players/s/shamela01.html', '/players/s/shamela01.html', '/players/s/shamela01.html', '/players/s/shumpim01.html', '/players/s/shumpim01.html', '/players/s/shumpim01.html', '/players/s/siakapa01.html', '/players/s/siberjo01.html', '/players/s/simmobe01.html', '/players/s/simmojo02.html', '/players/s/simmojo02.html', '/players/s/simmojo02.html', '/players/s/simmoko01.html', '/players/s/simonan01.html', '/players/s/smartma01.html', '/players/s/smithde03.html', '/players/s/smithde03.html', '/players/s/smithde03.html', '/players/s/smithis01.html', '/players/s/smithjr01.html', '/players/s/smithja02.html', '/players/s/smithja02.html', 
'/players/s/smithja02.html', '/players/s/smithja02.html', '/players/s/smithzh01.html', '/players/s/snellto01.html', '/players/s/spaldra01.html', '/players/s/spaldra01.html', '/players/s/spaldra01.html', '/players/s/spellom01.html', '/players/s/stausni01.html', '/players/s/stausni01.html', '/players/s/stausni01.html', '/players/s/stephdj01.html', '/players/s/stephla01.html', '/players/s/sumneed01.html', '/players/s/swanica01.html', '/players/s/swanica01.html', '/players/s/swanica01.html', '/players/t/tatumja01.html', '/players/t/teaguje01.html', '/players/t/templga01.html', '/players/t/templga01.html', '/players/t/templga01.html', '/players/t/teodomi01.html', '/players/t/terreja01.html', '/players/t/terryem01.html', '/players/t/terryem01.html', '/players/t/terryem01.html', '/players/t/theisda01.html', '/players/t/thomais02.html', '/players/t/thomakh01.html', '/players/t/thomala01.html', '/players/t/thompkl01.html', '/players/t/thomptr01.html', '/players/t/thornsi01.html', '/players/t/tollian01.html', '/players/t/townska01.html', '/players/t/trentga02.html', '/players/t/trieral01.html', '/players/t/tuckepj01.html', '/players/t/turneev01.html', '/players/t/turnemy01.html', '/players/u/udohek01.html', '/players/u/ulisty01.html', '/players/v/valanjo01.html', '/players/v/valanjo01.html', '/players/v/valanjo01.html', '/players/v/vandeja01.html', '/players/v/vanvlfr01.html', '/players/v/vonleno01.html', '/players/v/vucevni01.html', '/players/w/wadedw01.html', '/players/w/wagnemo01.html', '/players/w/waitedi01.html', '/players/w/walkeke02.html', '/players/w/walkelo01.html', '/players/w/walljo01.html', '/players/w/wallaty01.html', '/players/p/princta02.html', '/players/w/wanambr01.html', '/players/w/warretj01.html', '/players/w/washbju01.html', '/players/w/watanyu01.html', '/players/w/welshth01.html', '/players/w/westbru01.html', '/players/w/whitede01.html', '/players/w/whiteok01.html', '/players/w/whiteha01.html', '/players/w/wiggian01.html', '/players/w/willial03.html', '/players/w/willicj01.html', '/players/w/willijo04.html', '/players/w/willike04.html', '/players/w/willilo02.html', '/players/w/willima02.html', '/players/w/williro04.html', '/players/w/willitr02.html', '/players/w/wilsodj01.html', '/players/w/winslju01.html', '/players/w/woodch01.html', '/players/w/woodch01.html', '/players/w/woodch01.html', '/players/w/wrighde01.html', '/players/w/wrighde01.html', '/players/w/wrighde01.html', '/players/y/yabusgu01.html', '/players/y/youngni01.html', '/players/y/youngth01.html', '/players/y/youngtr01.html', '/players/z/zelleco01.html', '/players/z/zellety01.html', '/players/z/zellety01.html', '/players/z/zellety01.html', '/players/z/zizican01.html', '/players/z/zubaciv01.html', '/players/z/zubaciv01.html', '/players/z/zubaciv01.html']
I would use a set comprehension to remove duplicates, and I also think nth-of-type to select the appropriate column reads more cleanly. This uses bs4 4.7.1.
import requests
from bs4 import BeautifulSoup as bs
soup = bs(requests.get('https://www.basketball-reference.com/leagues/NBA_2019_per_game.html').text, 'html.parser')
links = {i['href'] for i in soup.select('#per_game_stats td:nth-of-type(1) a')}
print(links)
You could also use the following css selector:
[csk] > a
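For completeness, a sketch of my own showing how that attribute selector could be plugged in (it assumes the player-name cells inside #per_game_stats carry a csk attribute with a direct a child, which is what the selector relies on):
import requests
from bs4 import BeautifulSoup as bs

soup = bs(requests.get('https://www.basketball-reference.com/leagues/NBA_2019_per_game.html').text, 'html.parser')
# [csk] matches any element with a csk attribute; scope it to the stats table
links = {a['href'] for a in soup.select('#per_game_stats [csk] > a')}
print(links)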

Python: Write pandas dataframe to google sheets: AttributeError: 'module' object has no attribute 'open'

I have this Python file that writes to my Google sheet just fine using the Google Sheets API, following the code from this example: Python Google API writing to Google Sheet:
values = [
    ["Item", "Cost", "Stocked", "Ship Date",],
    ["Wheel", "20.50", "4", "3/1/2016",],
    ["Door", "15", "2", "3/15/2016",],
    ["Engine", "100", "1", "30/20/2016",],
    ["Totals", "=SUM(B2:B4)", "=SUM(C2:C4)", "=MAX(D2:D4)",],
]
Body = {
    'values': values,
    'majorDimension': 'ROWS',
}
result = service.spreadsheets().values().update(
    spreadsheetId=spreadsheetID, range=rangeName,
    valueInputOption='USER_ENTERED', body=Body).execute()
print("Writing OK!!")
However I want to write a dataframe that looks like this:
Date ORCL TSLA IBM YELP MSFT
0 10/24/2016 37.665958 202.759995 145.080612 34.48 59.564964
1 10/25/2016 37.754536 202.339996 145.379303 33.950001 59.555199
2 10/26/2016 37.705326 202.240005 146.275406 33.490002 59.203667
3 10/27/2016 37.616749 204.009995 147.759277 32.740002 58.686134
4 10/28/2016 37.567539 199.970001 147.046234 32.290001 58.461548
5 10/31/2016 37.813587 197.729996 148.086884 32.66 58.510365
246 10/16/2017 48.860001 350.600006 146.830002 43.52 77.650002
247 10/17/2017 49.189999 355.75 146.539993 43.200001 77.589996
248 10/18/2017 49.580002 359.649994 159.529999 44.580002 77.610001
249 10/19/2017 49.349998 351.809998 160.899994 44.439999 77.910004
250 10/20/2017 49.25 345.100006 162.070007 44.52 78.809998
251 10/23/2017 49.310001 337.019989 159.550003 43.599998 78.830002
I have tried the approach from here:
Appending pandas Data Frame to Google spreadsheet
import gspread
import gc
import pandas as pd
gc = gspread.authorize(credentials)
sh.share('otto#gmail.com', perm_type='user', role='writer')
sh = gc.open_by_key('1C09vB5F8zcyOrY4w_rctVUedXYJCZqtyoTc-bB0bgBY')
sheetName = 'sheet1'
rangeName = "Sheet1!A1:F252"
I declared my dataframe:
df = pd.DataFrame()
After some processing, a non-empty data frame has been created and populated:
(same dataframe as shown above)
# Output_conn = gc.open("SheetName").worksheet("xyz")
# Here 'SheetName' is the google spreadsheet and 'xyz' is a sheet in the workbook
Output_conn = gc.open(sheetName).worksheet(spreadsheetid)
for i, row in df.iterrows():
    Output_conn.append_row(row)
from: Appending pandas Data Frame to Google spreadsheet
I have this error msg at this line:
Output_conn = gc.open(sheetName).worksheet(spreadsheetid)
AttributeError: 'module' object has no attribute 'open'
Updated code:
import gspread
import gc
import pandas as pd
gc = gspread.authorize(credentials)
sh.share('otto#gmail.com', perm_type='user', role='writer')
sh = gc.open_by_key('1C09vB5F8zcyOrY4w_rctVUedXYJCZqtyoTc-bB0bgBY')
Error msg: UnboundLocalError: local variable 'sh' referenced before assignment
Any help? Thanks!
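There is no accepted answer in this thread, but here is a minimal sketch of one way to untangle it: the AttributeError comes from the name gc, which is first the imported gc module and only later rebound to the gspread client, and sh is used before it is assigned. Renaming the client and reordering the calls avoids both errors (credentials and the spreadsheet key are placeholders from the question; append_row is one API call per row, so it is slow for large frames):
import gspread
import pandas as pd

client = gspread.authorize(credentials)  # avoid the name 'gc' so the gc module isn't shadowed
sh = client.open_by_key('1C09vB5F8zcyOrY4w_rctVUedXYJCZqtyoTc-bB0bgBY')  # define sh before using it
worksheet = sh.sheet1

# df is the dataframe shown above; write the header row, then the data rows
worksheet.append_row(df.columns.tolist())
for _, row in df.iterrows():
    worksheet.append_row(row.astype(str).tolist())  # cast to plain strings so the values JSON-serialize cleanly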

Extracting data with BeautifulSoup and output to CSV

As mentioned in previous questions, I am using Beautiful Soup with Python to retrieve weather data from a website.
Here's what the website's feed looks like:
<channel>
<title>2 Hour Forecast</title>
<source>Meteorological Services Singapore</source>
<description>2 Hour Forecast</description>
<item>
<title>Nowcast Table</title>
<category>Singapore Weather Conditions</category>
<forecastIssue date="18-07-2016" time="03:30 PM"/>
<validTime>3.30 pm to 5.30 pm</validTime>
<weatherForecast>
<area forecast="TL" lat="1.37500000" lon="103.83900000" name="Ang Mo Kio"/>
<area forecast="SH" lat="1.32100000" lon="103.92400000" name="Bedok"/>
<area forecast="TL" lat="1.35077200" lon="103.83900000" name="Bishan"/>
<area forecast="CL" lat="1.30400000" lon="103.70100000" name="Boon Lay"/>
<area forecast="CL" lat="1.35300000" lon="103.75400000" name="Bukit Batok"/>
<area forecast="CL" lat="1.27700000" lon="103.81900000" name="Bukit Merah"/>`
<channel>
I managed to retrieve the information I need using this code:
import requests
from bs4 import BeautifulSoup
import urllib3

# getting the validTime
r = requests.get('http://www.nea.gov.sg/api/WebAPI/?dataset=2hr_nowcast&keyref=781CF461BB6606AD907750DFD1D07667C6E7C5141804F45D')
soup = BeautifulSoup(r.content, "xml")
time = soup.find('validTime').string
print "validTime: " + time

# getting the date
for currentdate in soup.find_all('item'):
    element = currentdate.find('forecastIssue')
    print "date: " + element['date']

# getting the time
for currentdate in soup.find_all('item'):
    element = currentdate.find('forecastIssue')
    print "time: " + element['time']

for area in soup.find('weatherForecast').find_all('area'):
    area_attrs_li = [area.attrs for area in soup.find('weatherForecast').find_all('area')]
    print area_attrs_li
Here are my results :
{'lat': u'1.34039000', 'lon': u'103.70500000', 'name': u'Jurong West',
'forecast': u'LR'}, {'lat': u'1.31200000', 'lon': u'103.86200000', 'name':
u'Kallang', 'forecast': u'LR'},
How do I remove the u' prefix from the result? I tried a method I found while googling, but it doesn't seem to work.
I'm not strong in Python and have been stuck on this for quite a while.
EDIT: I tried doing this:
f = open("C:\\scripts\\nea.csv" , 'wt')
try:
for area in area_attrs_li:
writer = csv.writer(f)
writer.writerow( (time, element['date'], element['time'], area_attrs_li))
finally:
f.close()
print open("C:/scripts/nea.csv", 'rt').read()
It worked; however, I would like to split the area apart, as the records are duplicated in the CSV:
Thank you.
EDIT 1 - Topic:
You're missing escape characters:
C:\scripts>python neaweather.py
File "neaweather.py", line 30
writer.writerow( ('time', 'element['date']', 'element['time']', 'area_attrs_li') )
^
SyntaxError: invalid syntax
The escaped version:
writer.writerow( ('time', 'element[\'date\']', 'element[\'time\']', 'area_attrs_li') )
EDIT 2:
if you want to insert values:
writer.writerow( (time, element['date'], element['time'], area_attrs_li) )
EDIT 3:
to split the result into different lines:
for area in area_attrs_li:
    writer.writerow((time, element['date'], element['time'], area))
EDIT 4:
The splitting is not fully correct, but it should give a better understanding of how to parse and split data so you can adapt it to your needs.
To split the area element again, as you show in your image, you can parse it:
for area in area_attrs_li:
    area = str(area)  # area is a dict of attributes, so stringify it before the replacements below
    # cut off the characters you don't need
    area = area.replace('[', '')
    area = area.replace(']', '')
    area = area.replace('{', '')
    area = area.replace('}', '')
    # remove other characters
    area = area.replace("u'", "\"").replace("'", "\"")
    # split the string into a list
    areaList = area.split(",")
    # create your own csv-separator
    ownRowElement = ';'.join(areaList)
    writer.writerow((time, element['date'], element['time'], ownRowElement))
Offtopic:
This works for me:
import csv
import json

x = """[
{'lat': u'1.34039000', 'lon': u'103.70500000', 'name': u'Jurong West','forecast': u'LR'}
]"""
jsontxt = json.loads(x.replace("u'", "\"").replace("'", "\""))
f = csv.writer(open("test.csv", "w+"))
# Write the CSV header; if you don't need that, remove this line
f.writerow(['lat', 'lon', 'name', 'forecast'])
for jsontext in jsontxt:
    f.writerow([jsontext["lat"],
                jsontext["lon"],
                jsontext["name"],
                jsontext["forecast"],
                ])
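For completeness, a simpler route of my own (not from the thread): each area in area_attrs_li is already a dict of plain attribute values, so writing the fields directly avoids the u'...' reprs and the string surgery above entirely:
import csv

# assumes time, element and area_attrs_li exist as built in the question's code (Python 2)
with open("C:\\scripts\\nea.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerow(['validTime', 'date', 'time', 'name', 'forecast', 'lat', 'lon'])
    for area in area_attrs_li:
        writer.writerow([time, element['date'], element['time'],
                         area['name'], area['forecast'], area['lat'], area['lon']])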

AttributeError when extracting data from a URL in Python

I am using the code below to try to extract the data in the table at this URL. However, I get the following error message:
Error: `AttributeError: 'NoneType' object has no attribute 'find'` in the line
`data = iter(soup.find("table", {"class": "tablestats"}).find("th", {"class": "header"}).find_all_next("tr"))`
My code is as follows:
from bs4 import BeautifulSoup
import requests
r = requests.get(
"http://www.federalreserve.gov/econresdata/researchdata/feds200628_1.html")
soup = BeautifulSoup(r.content)
data = iter(soup.find("table", {"class": "tablestats"}).find("th", {"class": "header"}).find_all_next("tr"))
headers = (next(data).text, next(data).text)
table_items = [(a.text, b.text) for ele in data for a, b in [ele.find_all("td")]]
for a, b in table_items:
    print(u"Date={}, Maturity={}".format(a, b if b.strip() else "null"))
Thank You
from bs4 import BeautifulSoup
import requests
r = requests.get(
"http://www.federalreserve.gov/econresdata/researchdata/feds200628_1.html")
soup = BeautifulSoup(r.content)
# column headers
h = soup.find_all("th", scope="col")
# get all the tr tags after the headers
final = [[t.th.text] + [ele.text for ele in t.find_all("td")] for t in h[-1].find_all_next("tr")]
headers = [th.text for th in h]
The final output list contains all the rows as individual lists:
[['2015-06-05', '4.82039691', '-4.66420959', '-4.18904598',
'-3.94541434', '1.1477', '2.9361', '3.3588', '0.6943', '1.5881',
'2.3034', '2.7677', '3.0363', '3.1801', '3.2537', '3.2930', '3.3190',
'3.3431', '3.3707', '3.4038', '3.4428', '3.4871', '3.5357', '3.5876',
'3.6419', '3.6975', '3.7538', '3.8100', '3.8656', '3.9202', '3.9734',
'4.0250', '4.0748', '4.1225', '4.1682', '4.2117', '4.2530', '4.2921',
'0.3489', '0.7464', '1.1502', '1.4949', '1.7700', '1.9841', '2.1500',
'2.2800', '2.3837', '2.4685', '2.5396', '2.6006', '2.6544', '2.7027',
'2.7469', '2.7878', '2.8260', '2.8621', '2.8964', '2.9291', '2.9603',
'2.9901', '3.0187', '3.0461', '3.0724', '3.0976', '3.1217', '3.1448',
'3.1669', '3.1881', '0.3487', '0.7469', '1.1536', '1.5039', '1.7862',
'2.0078', '2.1811', '2.3179', '2.4277', '2.5181', '2.5943', '2.6603',
'2.7190', '2.7722', '2.8215', '2.8677', '2.9117', '2.9538', '2.9944',
'3.0338', '3.0721', '3.1094', '3.1458', '3.1814', '3.2161', '3.2501',
'3.2832', '3.3156', '3.3472', '3.3781', '1.40431658', '9.48795888'],
['2015-06-04', '4.64953424', '-4.52780982', '-3.98051369',
......................................
The headers:
['BETA0', 'BETA1', 'BETA2', 'BETA3', 'SVEN1F01', 'SVEN1F04', 'SVEN1F09', 'SVENF01', 'SVENF02', 'SVENF03', 'SVENF04', 'SVENF05', 'SVENF06', 'SVENF07', 'SVENF08', 'SVENF09', 'SVENF10', 'SVENF11', 'SVENF12', 'SVENF13', 'SVENF14', 'SVENF15', 'SVENF16', 'SVENF17', 'SVENF18', 'SVENF19', 'SVENF20', 'SVENF21', 'SVENF22', 'SVENF23', 'SVENF24', 'SVENF25', 'SVENF26', 'SVENF27', 'SVENF28', 'SVENF29', 'SVENF30', 'SVENPY01', 'SVENPY02', 'SVENPY03', 'SVENPY04', 'SVENPY05', 'SVENPY06', 'SVENPY07', 'SVENPY08', 'SVENPY09', 'SVENPY10', 'SVENPY11', 'SVENPY12', 'SVENPY13', 'SVENPY14', 'SVENPY15', 'SVENPY16', 'SVENPY17', 'SVENPY18', 'SVENPY19', 'SVENPY20', 'SVENPY21', 'SVENPY22', 'SVENPY23', 'SVENPY24', 'SVENPY25', 'SVENPY26', 'SVENPY27', 'SVENPY28', 'SVENPY29', 'SVENPY30', 'SVENY01', 'SVENY02', 'SVENY03', 'SVENY04', 'SVENY05', 'SVENY06', 'SVENY07', 'SVENY08', 'SVENY09', 'SVENY10', 'SVENY11', 'SVENY12', 'SVENY13', 'SVENY14', 'SVENY15', 'SVENY16', 'SVENY17', 'SVENY18', 'SVENY19', 'SVENY20', 'SVENY21', 'SVENY22', 'SVENY23', 'SVENY24', 'SVENY25', 'SVENY26', 'SVENY27', 'SVENY28', 'SVENY29', 'SVENY30', 'TAU1', 'TAU2']
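As a follow-up of my own (not part of the original answer), the two pieces drop straight into pandas, assuming each row in final carries the date plus one value per header:
import pandas as pd

# hypothetical combination of the parsed rows and headers into a DataFrame
df = pd.DataFrame(final, columns=["Date"] + headers)
print(df.head())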
There are a lot of issues in your code:
There is no table with class 'tablestats'.
There are no 'th' fields with class 'header'.
The following line:
table_items = [(a.text, b.text) for ele in data for a, b in [ele.find_all("td")]]
doesn't return just 2 values, so it can't be unpacked into a, b.
