Trying to grab just the first href in each table row - python
I'm trying to grab just the first href in each row of an HTML table. Calling find_all on the soup object doesn't work because the page contains multiple tables, so I used soup.select() to isolate just that table and work from there, but it doesn't seem to be working.
I tried using find_all on the soup object alone, and I tried looping through the table rows with find(), but that failed with an error saying it returns 'NoneType'.
I would like to end up with a list that starts ['/players/a/abrinal01.html', '/players/a/acyqu01.html', etc.]
import requests
import bs4

url = 'https://www.basketball-reference.com/leagues/NBA_2019_per_game.html'
res = requests.get(url)
res.raise_for_status()  # stop early if the download failed
soup = bs4.BeautifulSoup(res.text, 'html.parser')
table = soup.find("table", { "id" : "per_game_stats" })
You can get the desired data by anchoring the search at the outer div wrapper whose id is all_per_game_stats:
import requests
from bs4 import BeautifulSoup as soup
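# Parse the page (note that BeautifulSoup is imported under the alias "soup").
d = soup(requests.get('https://www.basketball-reference.com/leagues/NBA_2019_per_game.html').text, 'html.parser')
# b.td.a is the first link in the first <td> of a row; rows without any <td> are skipped.
data = [b.td.a['href'] for b in d.find('div', {'id':'all_per_game_stats'}).table.find_all('tr') if b.td]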
Output:
['/players/a/abrinal01.html', '/players/a/acyqu01.html', '/players/a/adamsja01.html', '/players/a/adamsst01.html', '/players/a/adebaba01.html', '/players/a/adelde01.html', '/players/a/akoonde01.html', '/players/a/aldrila01.html', '/players/a/alkinra01.html', '/players/a/allengr01.html', '/players/a/allenja01.html', '/players/a/allenka01.html', '/players/a/aminual01.html', '/players/a/anderju01.html', '/players/a/anderky01.html', '/players/a/anderry01.html', '/players/a/anderry01.html', '/players/a/anderry01.html', '/players/a/anigbik01.html', '/players/a/antetgi01.html', '/players/a/antetko01.html', '/players/a/anthoca01.html', '/players/a/anunoog01.html', '/players/a/arcidry01.html', '/players/a/arizatr01.html', '/players/a/arizatr01.html', '/players/a/arizatr01.html', '/players/a/augusdj01.html', '/players/a/aytonde01.html', '/players/b/bacondw01.html', '/players/b/baglema01.html', '/players/b/bakerro01.html', '/players/b/bakerro01.html', '/players/b/bakerro01.html', '/players/b/baldwwa01.html', '/players/b/balllo01.html', '/players/b/bambamo01.html', '/players/b/bareajo01.html', '/players/b/barneha02.html', '/players/b/barneha02.html', '/players/b/barneha02.html', '/players/b/bartowi01.html', '/players/b/bateske01.html', '/players/b/batumni01.html', '/players/b/bayleje01.html', '/players/b/baynear01.html', '/players/b/bazemke01.html', '/players/b/bealbr01.html', '/players/b/beaslma01.html', '/players/b/beaslmi01.html', '/players/b/belinma01.html', '/players/b/belljo01.html', '/players/b/bembrde01.html', '/players/b/bendedr01.html', '/players/b/bertada02.html', '/players/b/bertada01.html', '/players/b/beverpa01.html', '/players/b/birchkh01.html', '/players/b/biyombi01.html', '/players/b/bjeline01.html', '/players/b/blakean01.html', '/players/b/bledser01.html', '/players/b/blossja01.html', '/players/b/bogdabo01.html', '/players/b/bogdabo02.html', '/players/b/bogutan01.html', '/players/b/boldejo01.html', '/players/b/bongais01.html', '/players/b/bookede01.html', 
'/players/b/bouchch01.html', '/players/b/bradlav01.html', '/players/b/bradlav01.html', '/players/b/bradlav01.html', '/players/b/bradlto01.html', '/players/b/breweco01.html', '/players/b/breweco01.html', '/players/b/breweco01.html', '/players/b/bridgmi01.html', '/players/b/bridgmi02.html', '/players/b/briscis01.html', '/players/b/broekry01.html', '/players/b/brogdma01.html', '/players/b/brookdi01.html', '/players/b/brookma01.html', '/players/b/brownbr01.html', '/players/b/brownja02.html', '/players/b/brownlo01.html', '/players/b/brownst02.html', '/players/b/browntr01.html', '/players/b/brunsja01.html', '/players/b/bryanth01.html', '/players/b/bullore01.html', '/players/b/bullore01.html', '/players/b/bullore01.html', '/players/b/burketr01.html', '/players/b/burketr01.html', '/players/b/burketr01.html', '/players/b/burksal01.html', '/players/b/burksal01.html', '/players/b/burksal01.html', '/players/b/burksal01.html', '/players/b/burtode02.html', '/players/b/butleji01.html', '/players/b/butleji01.html', '/players/b/butleji01.html', '/players/c/cabocbr01.html', '/players/c/caldejo01.html', '/players/c/caldwke01.html', '/players/c/canaais01.html', '/players/c/canaais01.html', '/players/c/canaais01.html', '/players/c/canaais01.html', '/players/c/capelca01.html', '/players/c/carrode01.html', '/players/c/carteje01.html', '/players/c/cartevi01.html', '/players/c/cartewe01.html', '/players/c/cartemi01.html', '/players/c/cartemi01.html', '/players/c/cartemi01.html', '/players/c/carusal01.html', '/players/c/casspom01.html', '/players/c/caulewi01.html', '/players/c/caupatr01.html', '/players/c/cavanty01.html', '/players/c/chandty01.html', '/players/c/chandty01.html', '/players/c/chandty01.html', '/players/c/chandwi01.html', '/players/c/chandwi01.html', '/players/c/chandwi01.html', '/players/c/chealjo01.html', '/players/c/chiozch01.html', '/players/c/chrisma01.html', '/players/c/chrisma01.html', '/players/c/chrisma01.html', '/players/c/clarkga01.html', 
'/players/c/clarkia01.html', '/players/c/clarkjo01.html', '/players/c/collijo01.html', '/players/c/colliza01.html', '/players/c/collida01.html', '/players/c/colsobo01.html', '/players/c/conlemi01.html', '/players/c/connapa01.html', '/players/c/cookqu01.html', '/players/c/couside01.html', '/players/c/covinro01.html', '/players/c/covinro01.html', '/players/c/covinro01.html', '/players/c/crabbal01.html', '/players/c/craigto01.html', '/players/c/crawfja01.html', '/players/c/creekmi01.html', '/players/c/creekmi01.html', '/players/c/creekmi01.html', '/players/c/crowdja01.html', '/players/c/cunnida01.html', '/players/c/curryse01.html', '/players/c/curryst01.html', '/players/d/danietr01.html', '/players/d/davisan02.html', '/players/d/davisde01.html', '/players/d/davised01.html', '/players/d/davisty01.html', '/players/d/dedmode01.html', '/players/d/dekkesa01.html', '/players/d/dekkesa01.html', '/players/d/dekkesa01.html', '/players/d/delgaan01.html', '/players/d/dellama01.html', '/players/d/dellama01.html', '/players/d/dellama01.html', '/players/d/denglu01.html', '/players/d/derozde01.html', '/players/d/derrima01.html', '/players/d/diallch01.html', '/players/d/diallha01.html', '/players/d/dienggo01.html', '/players/d/dinwisp01.html', '/players/d/divindo01.html', '/players/d/doncilu01.html', '/players/d/dorsety01.html', '/players/d/dorsety01.html', '/players/d/dorsety01.html', '/players/d/dotsoda01.html', '/players/d/doziepj01.html', '/players/d/dragigo01.html', '/players/d/drumman01.html', '/players/d/dudleja01.html', '/players/d/dunnkr01.html', '/players/d/duranke01.html', '/players/d/duvaltr01.html', '/players/e/edwarvi01.html', '/players/e/ellenhe01.html', '/players/e/ellenhe01.html', '/players/e/ellenhe01.html', '/players/e/ellinwa01.html', '/players/e/ellinwa01.html', '/players/e/ellinwa01.html', '/players/e/embiijo01.html', '/players/e/ennisja01.html', '/players/e/ennisja01.html', '/players/e/ennisja01.html', '/players/e/eubandr01.html', '/players/e/evansja02.html', 
'/players/e/evansja01.html', '/players/e/evansja01.html', '/players/e/evansja01.html', '/players/e/evansty01.html', '/players/e/exumda01.html', '/players/f/farieke01.html', '/players/f/farieke01.html', '/players/f/farieke01.html', '/players/f/favorde01.html', '/players/f/feliccr01.html', '/players/f/feltora01.html', '/players/f/fergute01.html', '/players/f/ferreyo01.html', '/players/f/finnedo01.html', '/players/f/forbebr01.html', '/players/f/fournev01.html', '/players/f/foxde01.html', '/players/f/frazime01.html', '/players/f/fraziti01.html', '/players/f/fraziti01.html', '/players/f/fraziti01.html', '/players/f/fredeji01.html', '/players/f/fryech01.html', '/players/f/fultzma01.html', '/players/g/gallida01.html', '/players/g/gallola01.html', '/players/g/garrebi01.html', '/players/g/gasolma01.html', '/players/g/gasolma01.html', '/players/g/gasolma01.html', '/players/g/gasolpa01.html', '/players/g/gasolpa01.html', '/players/g/gasolpa01.html', '/players/g/gayru01.html', '/players/g/georgpa01.html', '/players/g/gibsota01.html', '/players/g/gilesha01.html', '/players/g/gilgesh01.html', '/players/g/goberru01.html', '/players/g/goodwbr01.html', '/players/g/gordoaa01.html', '/players/g/gordoer01.html', '/players/g/gortama01.html', '/players/g/grahade01.html', '/players/g/grahatr01.html', '/players/g/grantje01.html', '/players/g/grantje02.html', '/players/g/grantdo01.html', '/players/g/greenda02.html', '/players/g/greendr01.html', '/players/g/greenge01.html', '/players/g/greenja01.html', '/players/g/greenja01.html', '/players/g/greenja01.html', '/players/g/greenje02.html', '/players/g/griffbl01.html', '/players/h/hamilda02.html', '/players/h/hannadu01.html', '/players/h/hardati02.html', '/players/h/hardati02.html', '/players/h/hardati02.html', '/players/h/hardeja01.html', '/players/h/harklma01.html', '/players/h/harremo01.html', '/players/h/harride01.html', '/players/h/harriga01.html', '/players/h/harrijo01.html', '/players/h/harrito02.html', '/players/h/harrito02.html', 
'/players/h/harrito02.html', '/players/h/harrian01.html', '/players/h/harrian01.html', '/players/h/harrian01.html', '/players/h/harrian01.html', '/players/h/harrish01.html', '/players/h/hartjo01.html', '/players/h/harteis01.html', '/players/h/hasleud01.html', '/players/h/haywago01.html', '/players/h/hensojo01.html', '/players/h/hernaju01.html', '/players/h/hernawi01.html', '/players/h/hezonma01.html', '/players/h/hicksis01.html', '/players/h/hieldbu01.html', '/players/h/highsha01.html', '/players/h/hilarne01.html', '/players/h/hillge01.html', '/players/h/hillge01.html', '/players/h/hillge01.html', '/players/h/hillso01.html', '/players/h/holidaa01.html', '/players/h/holidjr01.html', '/players/h/holidju01.html', '/players/h/holidju01.html', '/players/h/holidju01.html', '/players/h/hollajo02.html', '/players/h/holliro01.html', '/players/h/holmeri01.html', '/players/h/hoodro01.html', '/players/h/hoodro01.html', '/players/h/hoodro01.html', '/players/h/horfoal01.html', '/players/h/houseda01.html', '/players/h/howardw01.html', '/players/h/huertke01.html', '/players/h/humphis01.html', '/players/h/hunterj01.html', '/players/h/hutchch01.html', '/players/i/ibakase01.html', '/players/i/iguodan01.html', '/players/i/ilyaser01.html', '/players/i/inglejo01.html', '/players/i/ingraan01.html', '/players/i/ingrabr01.html', '/players/i/irvinky01.html', '/players/i/isaacjo01.html', '/players/i/iwundwe01.html', '/players/j/jacksde01.html', '/players/j/jacksfr01.html', '/players/j/jacksja02.html', '/players/j/jacksjo02.html', '/players/j/jacksju01.html', '/players/j/jacksju01.html', '/players/j/jacksju01.html', '/players/j/jacksre01.html', '/players/j/jamesle01.html', '/players/j/jeffeam01.html', '/players/j/jenkijo01.html', '/players/j/jenkijo01.html', '/players/j/jenkijo01.html', '/players/j/jerebjo01.html', '/players/j/johnsal02.html', '/players/j/johnsam01.html', '/players/j/johnsbj01.html', '/players/j/johnsbj01.html', '/players/j/johnsbj01.html', '/players/j/johnsja01.html', 
'/players/j/johnsst04.html', '/players/j/johnsst04.html', '/players/j/johnsst04.html', '/players/j/johnsty01.html', '/players/j/johnsty01.html', '/players/j/johnsty01.html', '/players/j/johnswe01.html', '/players/j/johnswe01.html', '/players/j/johnswe01.html', '/players/j/jokicni01.html', '/players/j/jonesda03.html', '/players/j/jonesde02.html', '/players/j/jonesja04.html', '/players/j/jonesje01.html', '/players/j/joneste01.html', '/players/j/jonesty01.html', '/players/j/jordade01.html', '/players/j/jordade01.html', '/players/j/jordade01.html', '/players/j/josepco01.html', '/players/k/kaminfr01.html', '/players/k/kanteen01.html', '/players/k/kanteen01.html', '/players/k/kanteen01.html', '/players/k/kennalu01.html', '/players/k/kiddgmi01.html', '/players/k/kingge03.html', '/players/k/klebima01.html', '/players/k/knighbr03.html', '/players/k/knighbr03.html', '/players/k/knighbr03.html', '/players/k/knoxke01.html', '/players/k/korkmfu01.html', '/players/k/kornelu01.html', '/players/k/korveky01.html', '/players/k/korveky01.html', '/players/k/korveky01.html', '/players/k/koufoko01.html', '/players/k/kurucro01.html', '/players/k/kuzmaky01.html', '/players/l/labissk01.html', '/players/l/labissk01.html', '/players/l/labissk01.html', '/players/l/lambje01.html', '/players/l/lavinza01.html', '/players/l/laymaja01.html', '/players/l/leaftj01.html', '/players/l/leeco01.html', '/players/l/leeco01.html', '/players/l/leeco01.html', '/players/l/leeda03.html', '/players/l/lemonwa01.html', '/players/l/lenal01.html', '/players/l/leonaka01.html', '/players/l/leoname01.html', '/players/l/leuerjo01.html', '/players/l/leverca01.html', '/players/l/lillada01.html', '/players/l/linje01.html', '/players/l/linje01.html', '/players/l/linje01.html', '/players/l/livinsh01.html', '/players/l/loftoza01.html', '/players/l/looneke01.html', '/players/l/lopezbr01.html', '/players/l/lopezro01.html', '/players/l/loveke01.html', '/players/l/lowryky01.html', '/players/l/loydjo01.html', 
'/players/l/lucaska01.html', '/players/l/luwawti01.html', '/players/l/luwawti01.html', '/players/l/luwawti01.html', '/players/l/lydonty01.html', '/players/l/lylestr01.html', '/players/m/machasc01.html', '/players/m/macksh01.html', '/players/m/macksh01.html', '/players/m/macksh01.html', '/players/m/maconda01.html', '/players/m/macurjp01.html', '/players/m/mahinia01.html', '/players/m/makerth01.html', '/players/m/makerth01.html', '/players/m/makerth01.html', '/players/m/marjabo01.html', '/players/m/marjabo01.html', '/players/m/marjabo01.html', '/players/m/markkla01.html', '/players/m/martija01.html', '/players/m/masonfr01.html', '/players/m/matenya01.html', '/players/m/matthwe02.html', '/players/m/matthwe02.html', '/players/m/matthwe02.html', '/players/m/matthwe02.html', '/players/m/mbahalu01.html', '/players/m/mccalta01.html', '/players/m/mccawpa01.html', '/players/m/mccawpa01.html', '/players/m/mccawpa01.html', '/players/m/mccolcj01.html', '/players/m/mccontj01.html', '/players/m/mcderdo01.html', '/players/m/mcgeeja01.html', '/players/m/mcgruro01.html', '/players/m/mckinal01.html', '/players/m/mclembe01.html', '/players/m/mcraejo01.html', '/players/m/meeksjo01.html', '/players/m/mejrisa01.html', '/players/m/meltode01.html', '/players/m/metuch01.html', '/players/m/middlkh01.html', '/players/m/milescj01.html', '/players/m/milescj01.html', '/players/m/milescj01.html', '/players/m/milleda01.html', '/players/m/millema01.html', '/players/m/millspa02.html', '/players/m/millspa01.html', '/players/m/miltosh01.html', '/players/m/mirotni01.html', '/players/m/mirotni01.html', '/players/m/mirotni01.html', '/players/m/mitchdo01.html', '/players/m/mitrona01.html', '/players/m/monkma01.html', '/players/m/monrogr01.html', '/players/m/monrogr01.html', '/players/m/monrogr01.html', '/players/m/monrogr01.html', '/players/m/mooreet01.html', '/players/m/moreler01.html', '/players/m/moreler01.html', '/players/m/moreler01.html', '/players/m/morrija01.html', '/players/m/morrima03.html', 
'/players/m/morrima02.html', '/players/m/morrima02.html', '/players/m/morrima02.html', '/players/m/morrimo01.html', '/players/m/motiedo01.html', '/players/m/motlejo01.html', '/players/m/mudiaem01.html', '/players/m/murraja01.html', '/players/m/musadz01.html', '/players/m/muscami01.html', '/players/m/muscami01.html', '/players/m/muscami01.html', '/players/m/mykhasv01.html', '/players/m/mykhasv01.html', '/players/m/mykhasv01.html', '/players/n/naderab01.html', '/players/n/nancela02.html', '/players/n/napiesh01.html', '/players/n/netora01.html', '/players/n/niangge01.html', '/players/n/noahjo01.html', '/players/n/noelne01.html', '/players/n/nowitdi01.html', '/players/n/ntilila01.html', '/players/n/nunnaja01.html', '/players/n/nunnaja01.html', '/players/n/nunnaja01.html', '/players/n/nurkiju01.html', '/players/n/nwabada01.html', '/players/o/onealro01.html', '/players/o/oquinky01.html', '/players/o/ojelese01.html', '/players/o/okafoja01.html', '/players/o/okoboel01.html', '/players/o/okogijo01.html', '/players/o/oladivi01.html', '/players/o/olynyke01.html', '/players/o/osmande01.html', '/players/o/oubreke01.html', '/players/o/oubreke01.html', '/players/o/oubreke01.html', '/players/p/pachuza01.html', '/players/p/parkeja01.html', '/players/p/parkeja01.html', '/players/p/parkeja01.html', '/players/p/parketo01.html', '/players/p/parsoch01.html', '/players/p/pattepa01.html', '/players/p/pattoju01.html', '/players/p/paulch01.html', '/players/p/payneca01.html', '/players/p/payneca01.html', '/players/p/payneca01.html', '/players/p/paytoel01.html', '/players/p/paytoga02.html', '/players/p/pinsoth01.html', '/players/p/plumlma01.html', '/players/p/plumlmi01.html', '/players/p/poeltja01.html', '/players/p/pondequ01.html', '/players/p/porteot01.html', '/players/p/porteot01.html', '/players/p/porteot01.html', '/players/p/portibo01.html', '/players/p/portibo01.html', '/players/p/portibo01.html', '/players/p/poweldw01.html', '/players/p/powelno01.html', '/players/p/poythal01.html', 
'/players/q/qizh01.html', '/players/r/rabbiv01.html', '/players/r/randlch01.html', '/players/r/randlju01.html', '/players/r/redicjj01.html', '/players/r/reedda01.html', '/players/r/reynoca01.html', '/players/r/richajo01.html', '/players/r/richama01.html', '/players/r/riverau01.html', '/players/r/riverau01.html', '/players/r/riverau01.html', '/players/r/robinde01.html', '/players/r/robindu01.html', '/players/r/robingl02.html', '/players/r/robinje01.html', '/players/r/robinmi01.html', '/players/r/rondora01.html', '/players/r/rosede01.html', '/players/r/rosste01.html', '/players/r/roziete01.html', '/players/r/rubiori01.html', '/players/r/russeda01.html', '/players/s/sabondo01.html', '/players/s/sampsbr01.html', '/players/s/sampsja02.html', '/players/s/saricda01.html', '/players/s/saricda01.html', '/players/s/saricda01.html', '/players/s/satorto01.html', '/players/s/schrode01.html', '/players/s/scottmi01.html', '/players/s/scottmi01.html', '/players/s/scottmi01.html', '/players/s/sefolth01.html', '/players/s/seldewa01.html', '/players/s/seldewa01.html', '/players/s/seldewa01.html', '/players/s/sextoco01.html', '/players/s/shamela01.html', '/players/s/shamela01.html', '/players/s/shamela01.html', '/players/s/shumpim01.html', '/players/s/shumpim01.html', '/players/s/shumpim01.html', '/players/s/siakapa01.html', '/players/s/siberjo01.html', '/players/s/simmobe01.html', '/players/s/simmojo02.html', '/players/s/simmojo02.html', '/players/s/simmojo02.html', '/players/s/simmoko01.html', '/players/s/simonan01.html', '/players/s/smartma01.html', '/players/s/smithde03.html', '/players/s/smithde03.html', '/players/s/smithde03.html', '/players/s/smithis01.html', '/players/s/smithjr01.html', '/players/s/smithja02.html', '/players/s/smithja02.html', '/players/s/smithja02.html', '/players/s/smithja02.html', '/players/s/smithzh01.html', '/players/s/snellto01.html', '/players/s/spaldra01.html', '/players/s/spaldra01.html', '/players/s/spaldra01.html', '/players/s/spellom01.html', 
'/players/s/stausni01.html', '/players/s/stausni01.html', '/players/s/stausni01.html', '/players/s/stephdj01.html', '/players/s/stephla01.html', '/players/s/sumneed01.html', '/players/s/swanica01.html', '/players/s/swanica01.html', '/players/s/swanica01.html', '/players/t/tatumja01.html', '/players/t/teaguje01.html', '/players/t/templga01.html', '/players/t/templga01.html', '/players/t/templga01.html', '/players/t/teodomi01.html', '/players/t/terreja01.html', '/players/t/terryem01.html', '/players/t/terryem01.html', '/players/t/terryem01.html', '/players/t/theisda01.html', '/players/t/thomais02.html', '/players/t/thomakh01.html', '/players/t/thomala01.html', '/players/t/thompkl01.html', '/players/t/thomptr01.html', '/players/t/thornsi01.html', '/players/t/tollian01.html', '/players/t/townska01.html', '/players/t/trentga02.html', '/players/t/trieral01.html', '/players/t/tuckepj01.html', '/players/t/turneev01.html', '/players/t/turnemy01.html', '/players/u/udohek01.html', '/players/u/ulisty01.html', '/players/v/valanjo01.html', '/players/v/valanjo01.html', '/players/v/valanjo01.html', '/players/v/vandeja01.html', '/players/v/vanvlfr01.html', '/players/v/vonleno01.html', '/players/v/vucevni01.html', '/players/w/wadedw01.html', '/players/w/wagnemo01.html', '/players/w/waitedi01.html', '/players/w/walkeke02.html', '/players/w/walkelo01.html', '/players/w/walljo01.html', '/players/w/wallaty01.html', '/players/p/princta02.html', '/players/w/wanambr01.html', '/players/w/warretj01.html', '/players/w/washbju01.html', '/players/w/watanyu01.html', '/players/w/welshth01.html', '/players/w/westbru01.html', '/players/w/whitede01.html', '/players/w/whiteok01.html', '/players/w/whiteha01.html', '/players/w/wiggian01.html', '/players/w/willial03.html', '/players/w/willicj01.html', '/players/w/willijo04.html', '/players/w/willike04.html', '/players/w/willilo02.html', '/players/w/willima02.html', '/players/w/williro04.html', '/players/w/willitr02.html', '/players/w/wilsodj01.html', 
'/players/w/winslju01.html', '/players/w/woodch01.html', '/players/w/woodch01.html', '/players/w/woodch01.html', '/players/w/wrighde01.html', '/players/w/wrighde01.html', '/players/w/wrighde01.html', '/players/y/yabusgu01.html', '/players/y/youngni01.html', '/players/y/youngth01.html', '/players/y/youngtr01.html', '/players/z/zelleco01.html', '/players/z/zellety01.html', '/players/z/zellety01.html', '/players/z/zellety01.html', '/players/z/zizican01.html', '/players/z/zubaciv01.html', '/players/z/zubaciv01.html', '/players/z/zubaciv01.html']
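The 'NoneType' error mentioned in the question can be reproduced offline. One likely cause (an assumption, since the failing loop is not shown) is that header rows contain no link, so find() returns None for them. The sketch below uses a small hypothetical HTML snippet rather than the live page, and the variable names are illustrative only:

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the stats table: a header row made only of <th>
# cells, followed by data rows whose first link is the player link.
html = """
<table id="per_game_stats">
  <tr><th>Rk</th><th>Player</th><th>Tm</th></tr>
  <tr><th>1</th><td><a href="/players/a/abrinal01.html">Alex Abrines</a></td><td><a href="/teams/OKC/2019.html">OKC</a></td></tr>
  <tr><th>2</th><td><a href="/players/a/acyqu01.html">Quincy Acy</a></td><td><a href="/teams/PHO/2019.html">PHO</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"id": "per_game_stats"})

first_links = []
for row in table.find_all("tr"):
    link = row.find("a")  # first <a> in this row, in document order
    if link is None:      # header row: no link, so find() returned None
        continue
    first_links.append(link["href"])

print(first_links)
# ['/players/a/abrinal01.html', '/players/a/acyqu01.html']
```

Checking for None before indexing plays the same role as the "if b.td" filter in the list comprehension above.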
I would use a set comprehension to remove duplicates, and I think selecting the appropriate column with nth-of-type reads more cleanly. This uses bs4 4.7.1.
import requests
from bs4 import BeautifulSoup as bs
soup = bs(requests.get('https://www.basketball-reference.com/leagues/NBA_2019_per_game.html').text, 'html.parser')
links = {i['href'] for i in soup.select('#per_game_stats td:nth-of-type(1) a')}
print(links)
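A set removes the duplicates but does not keep the page order, while the question asks for a list that starts with specific entries. If order matters, a minimal alternative is to deduplicate through dict keys. The sample values below are copied from the output of the first answer; the variable names are illustrative:

```python
# Sample hrefs in page order, including repeats (presumably one row per team
# for players who changed teams during the season).
hrefs = [
    "/players/a/anderky01.html",
    "/players/a/anderry01.html",
    "/players/a/anderry01.html",
    "/players/a/anderry01.html",
    "/players/a/anigbik01.html",
]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops the repeats without reordering the list.
unique_links = list(dict.fromkeys(hrefs))
print(unique_links)
# ['/players/a/anderky01.html', '/players/a/anderry01.html', '/players/a/anigbik01.html']
```

The same call could be applied to a list comprehension over soup.select(...) in place of the set comprehension.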
You could also use the following CSS selector:
[csk] > a
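As a rough sketch of how that selector could be used with select(): the HTML below is a hypothetical stand-in for one table row, and it assumes that only the player cell carries a csk attribute, which should be checked against the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical row layout: the player cell has a csk attribute,
# the team cell does not (an assumption about the real markup).
html = """
<table id="per_game_stats">
  <tr>
    <th>1</th>
    <td csk="Abrines,Alex"><a href="/players/a/abrinal01.html">Alex Abrines</a></td>
    <td><a href="/teams/OKC/2019.html">OKC</a></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# "[csk] > a" matches <a> elements that are direct children of any element
# with a csk attribute; the id prefix limits the match to this one table.
csk_links = [a["href"] for a in soup.select("#per_game_stats [csk] > a")]
print(csk_links)
# ['/players/a/abrinal01.html']
```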
Related
Couldn't find all "href" values by XPath in one pass
I couldn't find all the 'href' values with "find_elements_by_xpath". Is there another way to get this data? Thanks!
!pip install selenium
from selenium import webdriver
import time
import pandas as pd
browser = webdriver.Chrome(executable_path='./chromedriver.exe')
browser.implicitly_wait(5)
browser.get("https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons")
linkPath = '//ul[#class = "sc-eWvPqa cePswM"]/li/a'
product_links = browser.find_elements_by_xpath(linkPath)
print(product_links)
href is an attribute of the anchor tag, so the XPath //a should locate all of them. In Selenium you can also locate them by tag name. I will use XPath:
from selenium import webdriver
from selenium.webdriver.common.by import By  # needed for By.XPATH below

browser = webdriver.Chrome(executable_path='./chromedriver.exe')
browser.get("https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons")
browser.maximize_window()
linkPath = "//a"  # every anchor element on the page
product_links = browser.find_elements(By.XPATH, linkPath)
print(len(product_links))
for link in product_links:
    # get_attribute returns None for anchors that have no href
    print(link.get_attribute('href'))
Output:
288
https://tw.yahoo.com/ https://tw.buy.yahoo.com/ https://tw.bid.yahoo.com/ https://tw.usedcar.yahoo.com/ https://tw.mall.yahoo.com/activity?p=mall-1-0-180921-channelgroupbuy http://mail.yahoo.com.tw/ https://yahoomode.tumblr.com/yahooapp/ https://tw.mall.yahoo.com/ https://tw.mall.yahoo.com/search/store?p= https://tw.user.mall.yahoo.com/my/home https://tw.sc.mall.yahoo.com/mcart/preview https://tw.user.mall.yahoo.com/my/order/orderList https://login.yahoo.com/config/login?.intl=tw&.src=mktg1&done=https%3A%2F%2Ftw.mall.yahoo.com%2Fstore%2Fwatsons https://tw.user.mall.yahoo.com/my/point https://tw.mall.yahoo.com/ https://tw.user.mall.yahoo.com/my/home https://tw.user.mall.yahoo.com/my/order/orderList https://tw.user.mall.yahoo.com/sc/view/home https://tw.user.mall.yahoo.com/my/notification https://tw.user.mall.yahoo.com/my/point https://tw.user.mall.yahoo.com/my/order/ratingList https://tw.user.mall.yahoo.com/my/followupStore https://tw.user.mall.yahoo.com/my/watchlist https://tw.user.mall.yahoo.com/my/ecoupon https://tw.user.mall.yahoo.com/my/voucher/unused https://tw.user.mall.yahoo.com/my/member https://tw.user.mall.yahoo.com/my/setting https://tw.user.mall.yahoo.com/my/customerqa https://tw.help.yahoo.com/kb/shopping-mall-web/SLN35152.html https://tw.mall.yahoo.com/ https://tw.mall.yahoo.com/store/watsons https://tw.mall.yahoo.com/store/watsons https://tw.mall.yahoo.com/store/watsons https://tw.mall.yahoo.com/store/watsons/rating/list https://tw.mall.yahoo.com/chat/watsons
https://tw.mall.yahoo.com/store/watsons/stIntroMgt https://tw.mall.yahoo.com/store_vip/watsons https://tw.mall.yahoo.com/store/watsons/stNoteMgt https://tw.mall.yahoo.com/store/watsons/edm https://tw.mall.yahoo.com/store/watsons None None https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=2689&path=2689 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=2797&path=2797 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=712&path=712 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=664&path=664 None https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=1553&path=1553 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=2505&path=2505 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=1452&path=1452 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=1060&path=1060 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=827&path=827 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=1003&path=1003 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=1024&path=1024 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=875&path=875 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=958&path=958 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=979&path=979 https://tw.mall.yahoo.com/store_vip/watsons https://tw.mall.yahoo.com/store/watsons/promo https://tw.rcv.mall.yahoo.com/rcv/askEcoupon?s=5Ir7dQTxCtYebEIVRr7qdbQJrQ-- https://tw.mall.yahoo.com/store/watsons/promoCode?id=407205 https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101274 https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101277 https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101328 https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101279 https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101281 https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101286 
https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101276 https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101216 https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101318 https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101272 https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101228 https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101287 https://tw.mall.yahoo.com/store/watsons https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4792&path=4792 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4802&path=4793,4802 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4805&path=4794,4805 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4807&path=4794,4807 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4799&path=4793,4799 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4801&path=4793,4801 https://tw.mall.yahoo.com/search?q=%E6%B4%BB%E6%B2%9B%E5%A4%9A&sid=watsons https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=1539 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4810&path=4794,4810 https://tw.mall.yahoo.com/search?q=%E6%B4%BB%E6%B2%9B%E5%A4%9A&sid=watsons https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4806&path=4794,4806 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4806&path=4794,4806 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4800&path=4793,4800 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4800&path=4793,4800 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4812&path=4793,4812 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4812&path=4793,4812 https://tw.mall.yahoo.com/search?m=list&sid=watsons https://tw.mall.yahoo.com/search?q=%E7%BE%8E%E8%88%92%E5%BE%8B&sid=watsons https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=3283&path=2689,3283 
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4808&path=4794,4808 https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4809&path=4794,4809 https://tw.mall.yahoo.com/search?q=%E8%92%82%E8%8A%AC%E5%A6%AE%E4%BA%9E&sid=watsons https://member.watsons.com.tw/NewsView.aspx?NewsID=AhmGIZT9gReFoAM18vHFyA%3d%3d https://member.watsons.com.tw/NewsView.aspx?NewsID=PIzM00tt2hmQiVi9trVqWg%3d%3d https://tw.mall.yahoo.com/activity?p=mall-1-0-200422-member https://tw.mall.yahoo.com/item/p0330180543850 https://tw.mall.yahoo.com/item/p0330231079018 https://tw.mall.yahoo.com/item/p0330149612638 https://tw.mall.yahoo.com/item/p033089261341 https://tw.mall.yahoo.com/item/p0330190914741 https://tw.mall.yahoo.com/item/p0330206204422 https://tw.mall.yahoo.com/item/p033053127759 https://tw.mall.yahoo.com/item/p0330207304791 https://tw.mall.yahoo.com/item/p0330228841925 https://tw.mall.yahoo.com/item/p0330226336955 https://tw.mall.yahoo.com/item/p0330119743323 https://tw.mall.yahoo.com/item/p0330180543850 https://tw.mall.yahoo.com/item/p0330106434075 https://tw.mall.yahoo.com/item/p0330229833822 https://tw.mall.yahoo.com/item/p033098709784 https://tw.mall.yahoo.com/item/p03304442775 https://tw.mall.yahoo.com/item/p0330226584503 https://tw.mall.yahoo.com/item/p0330142835205 https://tw.mall.yahoo.com/item/p0330230215997 https://tw.mall.yahoo.com/item/p03304614678 https://tw.mall.yahoo.com/item/p033014991670 https://tw.mall.yahoo.com/item/p033041688721 https://tw.mall.yahoo.com/item/p0330172392339 https://tw.mall.yahoo.com/item/p0330222713237 https://tw.mall.yahoo.com/item/p033058974074 https://tw.mall.yahoo.com/item/p0330143315411 https://tw.mall.yahoo.com/item/p0330220872850 https://tw.mall.yahoo.com/item/p0330231079018 https://tw.mall.yahoo.com/item/p033092697026 https://tw.mall.yahoo.com/item/p0330219296877 https://tw.mall.yahoo.com/item/p0330230912735 https://tw.mall.yahoo.com/item/p0330230651934 https://tw.mall.yahoo.com/item/p0330199709328 
https://tw.mall.yahoo.com/item/p0330229012247 https://tw.mall.yahoo.com/item/p0330142835202 https://tw.mall.yahoo.com/item/p0330158688717 https://tw.mall.yahoo.com/item/p0330227790346 None https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=4792 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=4793 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=4794 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=2689 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=3122 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=3126 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=3285 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=3952 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=3415 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=2150 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=2797 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=3155 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=757 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=749 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=740 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=3063 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=3312 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=3352 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=731 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=3061 
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=712 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=664 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=1553 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=2505 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=1452 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=1539 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=1060 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=1046 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=827 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=1003 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=1024 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=875 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=918 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=2937 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=958 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=979 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=1797 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=2691 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=3186 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=2865 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=939 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=892 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=1757 
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=4431 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=2950 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pay=zero https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pay=card https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pay=store https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pay=install https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ship=711 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ship=family https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ship=hilife_pick https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ship=hilife_cash_on_delivery https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?status=instk https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?status=video https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?minr=1 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?minr=2 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?minr=3 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?minr=4 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?minr=5 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?s=-createtime https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?s=-rating https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?s=-rating_count https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?s=price https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?s=-price 
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?view=both https://tw.mall.yahoo.com/item/p0330231079018 https://tw.mall.yahoo.com/item/p0330221397264 https://tw.mall.yahoo.com/item/p0330201617111 https://tw.mall.yahoo.com/item/p0330149612638 https://tw.mall.yahoo.com/item/p0330199239561 https://tw.mall.yahoo.com/item/p0330157722030 https://tw.mall.yahoo.com/item/p0330195847496 https://tw.mall.yahoo.com/item/p0330199516056 https://tw.mall.yahoo.com/item/p033018957484 https://tw.mall.yahoo.com/item/p03304110080 https://tw.mall.yahoo.com/item/p0330180543850 https://tw.mall.yahoo.com/item/p0330182407803 https://tw.mall.yahoo.com/item/p0330127191918 https://tw.mall.yahoo.com/item/p033016523362 https://tw.mall.yahoo.com/item/p0330223567951 https://tw.mall.yahoo.com/item/p0330195847488 https://tw.mall.yahoo.com/item/p0330175780701 https://tw.mall.yahoo.com/item/p0330157492679 https://tw.mall.yahoo.com/item/p03304614678 https://tw.mall.yahoo.com/item/p033018936979 https://tw.mall.yahoo.com/item/p0330128871794 https://tw.mall.yahoo.com/item/p0330208015628 https://tw.mall.yahoo.com/item/p0330230696349 https://tw.mall.yahoo.com/item/p0330225341921 https://tw.mall.yahoo.com/item/p033076908568 https://tw.mall.yahoo.com/item/p0330226395844 https://tw.mall.yahoo.com/item/p03304110059 https://tw.mall.yahoo.com/item/p03304109225 https://tw.mall.yahoo.com/item/p0330194282004 https://tw.mall.yahoo.com/item/p0330224835920 https://tw.mall.yahoo.com/item/p0330228042972 https://tw.mall.yahoo.com/item/p0330106434075 https://tw.mall.yahoo.com/item/p03304109964 https://tw.mall.yahoo.com/item/p0330162452392 https://tw.mall.yahoo.com/item/p0330207304791 https://tw.mall.yahoo.com/item/p033076908463 https://tw.mall.yahoo.com/item/p03304057564 https://tw.mall.yahoo.com/item/p033037459923 https://tw.mall.yahoo.com/item/p0330212962835 https://tw.mall.yahoo.com/item/p0330212400568 https://tw.mall.yahoo.com/item/p03304109924 https://tw.mall.yahoo.com/item/p03304109929 
https://tw.mall.yahoo.com/item/p0330157347297 https://tw.mall.yahoo.com/item/p033069529068 https://tw.mall.yahoo.com/item/p0330202092777 https://tw.mall.yahoo.com/item/p0330158688717 https://tw.mall.yahoo.com/item/p0330226336941 https://tw.mall.yahoo.com/item/p0330223693370 https://tw.mall.yahoo.com/item/p0330111297761 https://tw.mall.yahoo.com/item/p0330223745715 https://tw.mall.yahoo.com/item/p0330230577786 https://tw.mall.yahoo.com/item/p03304110017 https://tw.mall.yahoo.com/item/p0330107487926 https://tw.mall.yahoo.com/item/p0330101184035 https://tw.mall.yahoo.com/item/p033017662348 https://tw.mall.yahoo.com/item/p033069529070 https://tw.mall.yahoo.com/item/p0330200970871 https://tw.mall.yahoo.com/item/p0330227790329 https://tw.mall.yahoo.com/item/p0330181952679 https://tw.mall.yahoo.com/item/p033064463660 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pg=2 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pg=3 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pg=4 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pg=5 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pg=6 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pg=7 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pg=8 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pg=9 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pg=10 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pg=11 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pg=2 https://tw.mall.yahoo.com/chat/watsons?rr=1637492343029 https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons#backToTop https://itunes.apple.com/tw/app/id778296354?mt=8 https://play.google.com/store/apps/details?id=com.yahoo.mobile.client.android.ecstore&hl=zh_TW 
https://www.facebook.com/Ybestbuy https://tw.mall.yahoo.com/activity?p=mall-1-0-200424-newcorp003 https://tw.mall.yahoo.com/help/help.html https://tw.mall.yahoo.com/help/return.html https://policies.yahoo.com/tw/zh-hant/yahoo/terms/utos/index.htm

Because of the language barrier, I could not tell which links on the page are products. I think they can be located with the XPath //a[@rel='nofollow'].

Update 1:

linkPath = "//a"
product_links = driver.find_elements(By.XPATH, linkPath)
print(len(product_links))
for link in product_links:
    address = link.get_attribute('href')
    # some anchors have no href, so address can be None
    if address and '/item/p' in address:
        print(address)

Output:

https://tw.mall.yahoo.com/item/p0330180543850 https://tw.mall.yahoo.com/item/p0330231079018 https://tw.mall.yahoo.com/item/p0330149612638 https://tw.mall.yahoo.com/item/p033089261341 https://tw.mall.yahoo.com/item/p0330190914741 https://tw.mall.yahoo.com/item/p0330206204422 https://tw.mall.yahoo.com/item/p033053127759 https://tw.mall.yahoo.com/item/p0330207304791 https://tw.mall.yahoo.com/item/p0330228841925 https://tw.mall.yahoo.com/item/p0330226336955 https://tw.mall.yahoo.com/item/p0330230651934 https://tw.mall.yahoo.com/item/p033098709784 https://tw.mall.yahoo.com/item/p0330229833822 https://tw.mall.yahoo.com/item/p0330142835202 https://tw.mall.yahoo.com/item/p0330227790346 https://tw.mall.yahoo.com/item/p0330172392339 https://tw.mall.yahoo.com/item/p0330226584503 https://tw.mall.yahoo.com/item/p0330220872850 https://tw.mall.yahoo.com/item/p0330180543850 https://tw.mall.yahoo.com/item/p0330230215997 https://tw.mall.yahoo.com/item/p0330199709328 https://tw.mall.yahoo.com/item/p0330229012247 https://tw.mall.yahoo.com/item/p0330142835205 https://tw.mall.yahoo.com/item/p03304614678 https://tw.mall.yahoo.com/item/p033058974074 https://tw.mall.yahoo.com/item/p03304442775 https://tw.mall.yahoo.com/item/p033092697026 https://tw.mall.yahoo.com/item/p0330230912735 https://tw.mall.yahoo.com/item/p0330106434075 https://tw.mall.yahoo.com/item/p0330222713237 
https://tw.mall.yahoo.com/item/p0330119743323 https://tw.mall.yahoo.com/item/p033041688721 https://tw.mall.yahoo.com/item/p0330231079018 https://tw.mall.yahoo.com/item/p0330158688717 https://tw.mall.yahoo.com/item/p033014991670 https://tw.mall.yahoo.com/item/p0330219296877 https://tw.mall.yahoo.com/item/p0330143315411 https://tw.mall.yahoo.com/item/p0330231079018 https://tw.mall.yahoo.com/item/p0330221397264 https://tw.mall.yahoo.com/item/p0330201617111 https://tw.mall.yahoo.com/item/p0330149612638 https://tw.mall.yahoo.com/item/p0330199239561 https://tw.mall.yahoo.com/item/p0330157722030 https://tw.mall.yahoo.com/item/p0330195847496 https://tw.mall.yahoo.com/item/p0330199516056 https://tw.mall.yahoo.com/item/p033018957484 https://tw.mall.yahoo.com/item/p03304110080 https://tw.mall.yahoo.com/item/p0330180543850 https://tw.mall.yahoo.com/item/p0330182407803 https://tw.mall.yahoo.com/item/p0330127191918 https://tw.mall.yahoo.com/item/p033016523362 https://tw.mall.yahoo.com/item/p0330223567951 https://tw.mall.yahoo.com/item/p0330195847488 https://tw.mall.yahoo.com/item/p0330175780701 https://tw.mall.yahoo.com/item/p0330157492679 https://tw.mall.yahoo.com/item/p03304614678 https://tw.mall.yahoo.com/item/p033018936979 https://tw.mall.yahoo.com/item/p0330128871794 https://tw.mall.yahoo.com/item/p0330208015628 https://tw.mall.yahoo.com/item/p0330230696349 https://tw.mall.yahoo.com/item/p0330225341921 https://tw.mall.yahoo.com/item/p033076908568 https://tw.mall.yahoo.com/item/p0330226395844 https://tw.mall.yahoo.com/item/p03304110059 https://tw.mall.yahoo.com/item/p0330194282004 https://tw.mall.yahoo.com/item/p03304109225 https://tw.mall.yahoo.com/item/p0330224835920 https://tw.mall.yahoo.com/item/p0330228042972 https://tw.mall.yahoo.com/item/p0330106434075 https://tw.mall.yahoo.com/item/p0330162452392 https://tw.mall.yahoo.com/item/p03304109964 https://tw.mall.yahoo.com/item/p0330207304791 https://tw.mall.yahoo.com/item/p03304057564 
https://tw.mall.yahoo.com/item/p033076908463 https://tw.mall.yahoo.com/item/p033037459923 https://tw.mall.yahoo.com/item/p0330212962835 https://tw.mall.yahoo.com/item/p0330157347297 https://tw.mall.yahoo.com/item/p03304109929 https://tw.mall.yahoo.com/item/p0330212400568 https://tw.mall.yahoo.com/item/p03304109924 https://tw.mall.yahoo.com/item/p033069529068 https://tw.mall.yahoo.com/item/p0330202092777 https://tw.mall.yahoo.com/item/p0330158688717 https://tw.mall.yahoo.com/item/p0330226336941 https://tw.mall.yahoo.com/item/p0330223693370 https://tw.mall.yahoo.com/item/p0330111297761 https://tw.mall.yahoo.com/item/p0330223745715 https://tw.mall.yahoo.com/item/p0330230577786 https://tw.mall.yahoo.com/item/p0330101184035 https://tw.mall.yahoo.com/item/p03304110017 https://tw.mall.yahoo.com/item/p0330107487926 https://tw.mall.yahoo.com/item/p033017662348 https://tw.mall.yahoo.com/item/p0330200970871 https://tw.mall.yahoo.com/item/p033069529070 https://tw.mall.yahoo.com/item/p0330227790329 https://tw.mall.yahoo.com/item/p033064463660 https://tw.mall.yahoo.com/item/p0330181952679
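The output above repeats several item URLs, and some anchors on the page have no href at all (one of the printed values is None). If only distinct product links are wanted, something like the following could be applied to the collected href values. It is a minimal sketch, not part of the original answer: the helper name unique_item_links and the marker parameter are hypothetical, and it assumes the hrefs have already been read from the page (for example with get_attribute('href')).

```python
from typing import Iterable, List, Optional


def unique_item_links(hrefs: Iterable[Optional[str]], marker: str = "/item/p") -> List[str]:
    """Keep hrefs that contain `marker`, dropping None values and repeats.

    The order of first appearance is preserved.
    """
    seen = set()
    result = []
    for href in hrefs:
        # Anchors without an href attribute yield None; skip them.
        if href is None or marker not in href:
            continue
        if href not in seen:
            seen.add(href)
            result.append(href)
    return result


# A few values taken from the output above, plus a None and a non-product link.
sample = [
    "https://tw.mall.yahoo.com/item/p0330180543850",
    None,
    "https://tw.mall.yahoo.com/help/help.html",
    "https://tw.mall.yahoo.com/item/p0330231079018",
    "https://tw.mall.yahoo.com/item/p0330180543850",
]
result = unique_item_links(sample)
for url in result:
    print(url)
```

Using a set for membership checks alongside a list keeps the original page order, which a plain set() conversion would not guarantee.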
Scraping an ASPX page with Beautiful Soup
I am trying to scrape this page using Beautiful Soup. I first looked for an API or JSON endpoint behind the page, but couldn't find one. I then tried Beautiful Soup with the HTML parser, but I can't get anywhere with it, because the page loads its data by calling a function from an onclick handler: GetFiiStatistics('F-INDEX FUTURES'). How can I go about scraping pages like this? Webpage: https://www.motilaloswal.com/markets/derivative-market/FII-Statistics.aspx
The data is loaded via JavaScript from their API. Unfortunately, I don't know how the INPUT, CATAGORY and FLAG values are computed:

import json
import requests
import pandas as pd
from bs4 import BeautifulSoup

api_url = 'https://www.motilaloswal.com/ControllerBeta/APIRequest.aspx'

data = {
    'INPUT': '3yDBtksOiDjLLYaySd5NYgCcFnUOx8Jh2c8SRJvEhAs=',
    'CATAGORY': 'Q668CVoAKYOr7whA+PW25A==',
    'FLAG': 'VrTQQDRa72uSLVOyZO5Nqg=='
}

json_data = json.loads(requests.post(api_url, data=data).json())

# uncomment this to print all data:
# print(json.dumps(json_data, indent=4))

df = pd.DataFrame(json_data)
print(df)

Prints:

         IndexType                 Date  BuyContracts  BuyValue  SellContracts  SellValue  Net_BuySellValue  OIContracts_eod  OIValue_eod
0    INDEX FUTURES  2020-07-14T00:00:00         67037  4403.320          88350   6048.750         -1645.430           123985      8799.15
1    INDEX FUTURES  2020-07-13T00:00:00         53612  3644.000          61174   4142.300          -498.300           105126      7803.56
2    INDEX FUTURES  2020-07-10T00:00:00         62735  4313.250          83916   5923.270         -1610.020           113346      8369.47
3    INDEX FUTURES  2020-07-09T00:00:00         67222  4773.340          51094   3442.520          1330.820           119977      8971.28
4    INDEX FUTURES  2020-07-08T00:00:00         83367  5661.640          69042   4659.280          1002.360           106027      7686.32
..             ...                  ...           ...       ...            ...        ...               ...              ...          ...
495  INDEX FUTURES  2018-07-05T00:00:00         28566  2585.923          21393   1881.601           704.322           243417     19847.72
496  INDEX FUTURES  2018-07-04T00:00:00         21339  1875.184          26507   2425.003          -549.819           245786     20201.28
497  INDEX FUTURES  2018-07-03T00:00:00         30019  2610.728          28837   2563.647            47.081           237564     19322.63
498  INDEX FUTURES  2018-07-02T00:00:00         24976  2191.751          29501   2541.589          -349.838           226100     18203.48
499  INDEX FUTURES  2018-06-29T00:00:00         45399  3814.480          27297   2371.817          1442.663           227041     18387.33

[500 rows x 9 columns]

EDIT: To scrape the other tabs, change the data= parameter in the request:

import json
import requests
import pandas as pd
from bs4 import BeautifulSoup

api_url = 'https://www.motilaloswal.com/ControllerBeta/APIRequest.aspx'

data_future = {
    'INPUT': '3yDBtksOiDjLLYaySd5NYgCcFnUOx8Jh2c8SRJvEhAs=',
    'CATAGORY': 'Q668CVoAKYOr7whA+PW25A==',
    'FLAG': 'VrTQQDRa72uSLVOyZO5Nqg=='
}

data_option = {
    'INPUT': '3yDBtksOiDjLLYaySd5NYmcoKC8x7z5PFO880mjcQ2U=',
    'CATAGORY': 'Q668CVoAKYOr7whA+PW25A==',
    'FLAG': 'VrTQQDRa72uSLVOyZO5Nqg=='
}

data_stock_future = {
    'INPUT': '7F5jZM46TTOICwT1N6AfqkP7gWI2CpTGCWmll4bhYow=',
    'CATAGORY': 'Q668CVoAKYOr7whA+PW25A==',
    'FLAG': 'VrTQQDRa72uSLVOyZO5Nqg=='
}

data_stock_option = {
    'INPUT': '7F5jZM46TTOICwT1N6Afqv1N6pCh+1OrTfhSwG6Azes=',
    'CATAGORY': 'Q668CVoAKYOr7whA+PW25A==',
    'FLAG': 'VrTQQDRa72uSLVOyZO5Nqg=='
}

json_data = json.loads(requests.post(api_url, data=data_stock_option).json())  # <-- change data= to data_stock_future or data_option ...

# uncomment this to print all data:
# print(json.dumps(json_data, indent=4))

df = pd.DataFrame(json_data)
print(df)

Prints:

         IndexType                 Date  BuyContracts  BuyValue  SellContracts  SellValue  Net_BuySellValue  OIContracts_eod  OIValue_eod
0    STOCK OPTIONS  2020-07-14T00:00:00         52546  4349.020          54139   4525.850          -176.830           115998     7661.010
1    STOCK OPTIONS  2020-07-13T00:00:00         50604  4242.330          52221   4413.040          -170.710           110663     7329.990
2    STOCK OPTIONS  2020-07-10T00:00:00         82502  6218.200          82608   6219.420            -1.220           109680     7232.900
3    STOCK OPTIONS  2020-07-09T00:00:00         64743  4725.430          64613   4714.460            10.970           104780     6945.740
4    STOCK OPTIONS  2020-07-08T00:00:00         75481  5201.770          75713   5220.580           -18.810           100200     6584.390
..             ...                  ...           ...       ...            ...        ...               ...              ...          ...
495  STOCK OPTIONS  2018-07-05T00:00:00         94696  6728.086          93256   6617.059           111.027            66773     4471.483
496  STOCK OPTIONS  2018-07-04T00:00:00         68719  4762.333          69376   4794.350           -32.017            59195     4005.045
497  STOCK OPTIONS  2018-07-03T00:00:00         64283  4351.226          64982   4347.153             4.073            53028     3581.946
498  STOCK OPTIONS  2018-07-02T00:00:00         74479  4913.606          74730   4897.239            16.367            44627     3024.131
499  STOCK OPTIONS  2018-06-29T00:00:00         69730  4694.675          68447   4645.350            49.325            35486     2421.744

[500 rows x 9 columns]
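The json.loads(... .json()) call in the answer suggests that the endpoint returns a JSON string whose content is itself JSON, so it has to be decoded twice. The sketch below illustrates that idea offline, without contacting the site. The exact wire format is an assumption inferred from the answer's code, decode_payload is a hypothetical helper, and the sample row reuses values from the first line of the INDEX FUTURES output.

```python
import json
from datetime import datetime


def decode_payload(body_text):
    """Decode a JSON body that may itself be a JSON-encoded string."""
    decoded = json.loads(body_text)
    # If the first pass yields a string, the payload was encoded twice.
    if isinstance(decoded, str):
        decoded = json.loads(decoded)
    return decoded


# One row copied from the printed output above (subset of columns).
rows = [
    {
        "IndexType": "INDEX FUTURES",
        "Date": "2020-07-14T00:00:00",
        "BuyValue": 4403.32,
        "SellValue": 6048.75,
        "Net_BuySellValue": -1645.43,
    }
]

# Simulate the assumed response: JSON wrapped in a JSON string.
body_text = json.dumps(json.dumps(rows))

records = decode_payload(body_text)
for record in records:
    # The Date column arrives as an ISO 8601 string; parse it for sorting or filtering.
    record["Date"] = datetime.fromisoformat(record["Date"])

print(records[0]["Date"].date(), records[0]["Net_BuySellValue"])
```

The isinstance check lets the same helper handle a normally encoded response as well, which may be useful if the API ever stops double-encoding.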
requests and bs4 cannot read the whole html
I am trying to get all the hrefs in the list on this website: https://nihongonosensei.net/?page_id=10246. The website is very simple and clean, and after reviewing the source I found nothing dynamic. However, if I run

import requests
url = 'https://nihongonosensei.net/?page_id=10246'
r = requests.get(url)
r.text

r.text contains only around 20,000 characters. More than half of the HTML is missing. I also tried copying the whole HTML from "view page source" and loading it directly into BeautifulSoup:

from bs4 import BeautifulSoup
html = ''  # too long to copy. Here is the link: view-source:https://nihongonosensei.net/?page_id=10246
soup = BeautifulSoup(html, 'html.parser')

Still only around 20,000 characters are retained, and the top half of the HTML is missing. My question: is there a character limit in requests or BeautifulSoup? If so, how can I remove it? If not, why can't I get the full HTML? Thanks a lot! Rachel
import requests
from bs4 import BeautifulSoup

r = requests.get("https://nihongonosensei.net/?page_id=10246")
soup = BeautifulSoup(r.text, 'html.parser')

# collect every anchor that has an href and keep only absolute links
for item in soup.find_all("a", href=True):
    item = item.get("href")
    if item.startswith("http"):
        print(item)

Output:

https://nihongonosensei.net/ http://nihongonosensei.net/?p=3547 http://nihongonosensei.net/?p=3563 http://nihongonosensei.net/?p=3568 http://nihongonosensei.net/?p=3600 http://nihongonosensei.net/?p=3614 http://nihongonosensei.net/?p=3618 http://nihongonosensei.net/?p=3622 http://nihongonosensei.net/?p=3626 http://nihongonosensei.net/?p=3633 http://nihongonosensei.net/?p=3695 http://nihongonosensei.net/?p=3697 http://nihongonosensei.net/?p=3702 http://nihongonosensei.net/?p=3707 http://nihongonosensei.net/?p=3710 http://nihongonosensei.net/?p=3712 http://nihongonosensei.net/?p=3714 http://nihongonosensei.net/?p=3719 http://nihongonosensei.net/?p=3722 http://nihongonosensei.net/?p=3726 http://nihongonosensei.net/?p=3730 http://nihongonosensei.net/?p=3733 http://nihongonosensei.net/?p=3735 http://nihongonosensei.net/?p=5236 http://nihongonosensei.net/?p=5238 http://nihongonosensei.net/?p=5240 http://nihongonosensei.net/?p=5244 http://nihongonosensei.net/?p=5618 http://nihongonosensei.net/?p=5620 http://nihongonosensei.net/?p=5961 http://nihongonosensei.net/?p=5965 http://nihongonosensei.net/?p=5967 http://nihongonosensei.net/?p=5970 http://nihongonosensei.net/?p=5972 http://nihongonosensei.net/?p=6772 http://nihongonosensei.net/?p=7977 http://nihongonosensei.net/?p=7979 http://nihongonosensei.net/?p=7983 http://nihongonosensei.net/?p=7985 http://nihongonosensei.net/?p=7987 http://nihongonosensei.net/?p=8869 http://nihongonosensei.net/?p=8891 http://nihongonosensei.net/?p=9192 http://nihongonosensei.net/?p=9197 http://nihongonosensei.net/?p=9198 http://nihongonosensei.net/?p=9199 http://nihongonosensei.net/?p=9219 http://nihongonosensei.net/?p=9221 http://nihongonosensei.net/?p=9223 
http://nihongonosensei.net/?p=9249 http://nihongonosensei.net/?p=9280 http://nihongonosensei.net/?p=9320 http://nihongonosensei.net/?p=9322 http://nihongonosensei.net/?p=9324 http://nihongonosensei.net/?p=9327 http://nihongonosensei.net/?p=9329 http://nihongonosensei.net/?p=9353 http://nihongonosensei.net/?p=9359 http://nihongonosensei.net/?p=9360 http://nihongonosensei.net/?p=13973 http://nihongonosensei.net/?p=13972 http://nihongonosensei.net/?p=13974 http://nihongonosensei.net/?p=11851 http://nihongonosensei.net/?p=11858 http://nihongonosensei.net/?p=12202 http://nihongonosensei.net/?p=12999 http://nihongonosensei.net/?p=13112 http://nihongonosensei.net/?p=13364 http://nihongonosensei.net/?p=13494 http://nihongonosensei.net/?p=14887 http://nihongonosensei.net/?p=14889 http://nihongonosensei.net/?p=14915 http://nihongonosensei.net/?p=14918 http://nihongonosensei.net/?p=17745 http://nihongonosensei.net/?p=18155 http://nihongonosensei.net/?p=18159 http://nihongonosensei.net/?p=18188 http://nihongonosensei.net/?p=18206 http://nihongonosensei.net/?p=18204 http://nihongonosensei.net/?p=18223 http://nihongonosensei.net/?p=18407 http://nihongonosensei.net/?p=18460 http://nihongonosensei.net/?p=18461 http://nihongonosensei.net/?p=18578 http://nihongonosensei.net/?p=18611 http://nihongonosensei.net/?p=18696 http://nihongonosensei.net/?p=18705 http://nihongonosensei.net/?p=18707 http://nihongonosensei.net/?p=18763 http://nihongonosensei.net/?p=3738 http://nihongonosensei.net/?p=3745 http://nihongonosensei.net/?p=3759 http://nihongonosensei.net/?p=3776 http://nihongonosensei.net/?p=3778 http://nihongonosensei.net/?p=3781 http://nihongonosensei.net/?p=3783 http://nihongonosensei.net/?p=3785 http://nihongonosensei.net/?p=3797 http://nihongonosensei.net/?p=3799 http://nihongonosensei.net/?p=3801 http://nihongonosensei.net/?p=3804 http://nihongonosensei.net/?p=3809 http://nihongonosensei.net/?p=3824 http://nihongonosensei.net/?p=3826 http://nihongonosensei.net/?p=13941 
http://nihongonosensei.net/?p=3833 http://nihongonosensei.net/?p=4097 http://nihongonosensei.net/?p=5058 http://nihongonosensei.net/?p=5246 http://nihongonosensei.net/?p=5248 http://nihongonosensei.net/?p=5251 http://nihongonosensei.net/?p=5253 http://nihongonosensei.net/?p=5255 http://nihongonosensei.net/?p=5616 http://nihongonosensei.net/?p=5614 http://nihongonosensei.net/?p=5978 http://nihongonosensei.net/?p=5982 http://nihongonosensei.net/?p=5974 http://nihongonosensei.net/?p=6203 http://nihongonosensei.net/?p=6205 http://nihongonosensei.net/?p=11829 http://nihongonosensei.net/?p=11830 http://nihongonosensei.net/?p=6209 http://nihongonosensei.net/?p=6211 http://nihongonosensei.net/?p=7909 http://nihongonosensei.net/?p=7970 http://nihongonosensei.net/?p=7972 http://nihongonosensei.net/?p=7974 http://nihongonosensei.net/?p=7990 http://nihongonosensei.net/?p=7992 http://nihongonosensei.net/?p=8008 http://nihongonosensei.net/?p=8010 http://nihongonosensei.net/?p=8012 http://nihongonosensei.net/?p=9447 http://nihongonosensei.net/?p=9452 http://nihongonosensei.net/?p=9876 http://nihongonosensei.net/?p=9884 http://nihongonosensei.net/?p=9890 http://nihongonosensei.net/?p=9891 http://nihongonosensei.net/?p=9945 http://nihongonosensei.net/?p=14072 http://nihongonosensei.net/?p=14073 http://nihongonosensei.net/?p=10533 http://nihongonosensei.net/?p=10532 http://nihongonosensei.net/?p=11855 http://nihongonosensei.net/?p=11521 http://nihongonosensei.net/?p=18734 http://nihongonosensei.net/?p=18726 http://nihongonosensei.net/?p=11862 http://nihongonosensei.net/?p=11864 http://nihongonosensei.net/?p=11866 http://nihongonosensei.net/?p=12025 http://nihongonosensei.net/?p=12027 http://nihongonosensei.net/?p=12115 http://nihongonosensei.net/?p=13076 http://nihongonosensei.net/?p=13142 http://nihongonosensei.net/?p=13145 http://nihongonosensei.net/?p=13453 http://nihongonosensei.net/?p=13456 http://nihongonosensei.net/?p=13459 http://nihongonosensei.net/?p=13479 
http://nihongonosensei.net/?p=13483 http://nihongonosensei.net/?p=3535 http://nihongonosensei.net/?p=14896 http://nihongonosensei.net/?p=18263 http://nihongonosensei.net/?p=18324 http://nihongonosensei.net/?p=18366 http://nihongonosensei.net/?p=18373 http://nihongonosensei.net/?p=18381 http://nihongonosensei.net/?p=18398 http://nihongonosensei.net/?p=18680 http://nihongonosensei.net/?p=18682 http://nihongonosensei.net/?p=18684 http://nihongonosensei.net/?p=1700 http://nihongonosensei.net/?p=1708 http://nihongonosensei.net/?p=1713 http://nihongonosensei.net/?p=1718 http://nihongonosensei.net/?p=1735 http://nihongonosensei.net/?p=1742 http://nihongonosensei.net/?p=1745 http://nihongonosensei.net/?p=1748 http://nihongonosensei.net/?p=1752 http://nihongonosensei.net/?p=1755 http://nihongonosensei.net/?p=1758 http://nihongonosensei.net/?p=1761 http://nihongonosensei.net/?p=1764 http://nihongonosensei.net/?p=1767 http://nihongonosensei.net/?p=1770 http://nihongonosensei.net/?p=1773 http://nihongonosensei.net/?p=1777 http://nihongonosensei.net/?p=1782 http://nihongonosensei.net/?p=1785 http://nihongonosensei.net/?p=1788 http://nihongonosensei.net/?p=1791 http://nihongonosensei.net/?p=1794 http://nihongonosensei.net/?p=1797 http://nihongonosensei.net/?p=1801 http://nihongonosensei.net/?p=1804 http://nihongonosensei.net/?p=1807 http://nihongonosensei.net/?p=1810 http://nihongonosensei.net/?p=1813 http://nihongonosensei.net/?p=1816 http://nihongonosensei.net/?p=1819 http://nihongonosensei.net/?p=1823 http://nihongonosensei.net/?p=1828 http://nihongonosensei.net/?p=1835 http://nihongonosensei.net/?p=1838 http://nihongonosensei.net/?p=12082 http://nihongonosensei.net/?p=3470 http://nihongonosensei.net/?p=3477 http://nihongonosensei.net/?p=3484 http://nihongonosensei.net/?p=3492 http://nihongonosensei.net/?p=3553 http://nihongonosensei.net/?p=3559 http://nihongonosensei.net/?p=13970 http://nihongonosensei.net/?p=6331 http://nihongonosensei.net/?p=6335 
http://nihongonosensei.net/?p=6339 http://nihongonosensei.net/?p=6341 http://nihongonosensei.net/?p=6769 http://nihongonosensei.net/?p=8506 http://nihongonosensei.net/?p=8857 http://nihongonosensei.net/?p=9283 http://nihongonosensei.net/?p=9306 http://nihongonosensei.net/?p=9308 http://nihongonosensei.net/?p=9312 http://nihongonosensei.net/?p=9314 http://nihongonosensei.net/?p=9422 http://nihongonosensei.net/?p=9462 http://nihongonosensei.net/?p=9860 http://nihongonosensei.net/?p=11635 http://nihongonosensei.net/?p=12073 http://nihongonosensei.net/?p=12784 http://nihongonosensei.net/?p=12795 http://nihongonosensei.net/?p=12821 http://nihongonosensei.net/?p=12824 http://nihongonosensei.net/?p=12830 http://nihongonosensei.net/?p=12832 http://nihongonosensei.net/?p=12834 http://nihongonosensei.net/?p=12987 http://nihongonosensei.net/?p=12995 http://nihongonosensei.net/?p=13018 http://nihongonosensei.net/?p=3761 http://nihongonosensei.net/?p=13326 http://nihongonosensei.net/?p=13327 http://nihongonosensei.net/?p=13340 http://nihongonosensei.net/?p=13344 http://nihongonosensei.net/?p=17748 http://nihongonosensei.net/?p=17758 http://nihongonosensei.net/?p=17767 http://nihongonosensei.net/?p=17771 http://nihongonosensei.net/?p=18162 http://nihongonosensei.net/?p=18165 http://nihongonosensei.net/?p=18171 http://nihongonosensei.net/?p=18202 http://nihongonosensei.net/?p=18199 http://nihongonosensei.net/?p=18314 http://nihongonosensei.net/?p=18312 http://nihongonosensei.net/?p=18399 http://nihongonosensei.net/?p=18400 http://nihongonosensei.net/?p=18585 http://nihongonosensei.net/?p=18589 http://nihongonosensei.net/?p=18591 http://nihongonosensei.net/?p=18301 http://nihongonosensei.net/?p=18701 http://nihongonosensei.net/?p=18773 http://nihongonosensei.net/?p=18775 http://nihongonosensei.net/?p=18788 http://nihongonosensei.net/?p=18790 http://nihongonosensei.net/?p=18792 http://nihongonosensei.net/?p=18821 http://nihongonosensei.net/?p=3571 http://nihongonosensei.net/?p=9936 
http://nihongonosensei.net/?p=3578 http://nihongonosensei.net/?p=5980 http://nihongonosensei.net/?p=3609 http://nihongonosensei.net/?p=3680 http://nihongonosensei.net/?p=3828 http://nihongonosensei.net/?p=6345 http://nihongonosensei.net/?p=6347 http://nihongonosensei.net/?p=6351 http://nihongonosensei.net/?p=7905 http://nihongonosensei.net/?p=7907 http://nihongonosensei.net/?p=8063 http://nihongonosensei.net/?p=18470 http://nihongonosensei.net/?p=18471 http://nihongonosensei.net/?p=9425 http://nihongonosensei.net/?p=9426 http://nihongonosensei.net/?p=9465 http://nihongonosensei.net/?p=9466 http://nihongonosensei.net/?p=9872 http://nihongonosensei.net/?p=10058 http://nihongonosensei.net/?p=11304 http://nihongonosensei.net/?p=11948 http://nihongonosensei.net/?p=18497 http://nihongonosensei.net/?p=18499 http://nihongonosensei.net/?p=18501 http://nihongonosensei.net/?p=12143 http://nihongonosensei.net/?p=12789 http://nihongonosensei.net/?p=12882 http://nihongonosensei.net/?p=12885 http://nihongonosensei.net/?p=12886 http://nihongonosensei.net/?p=13074 http://nihongonosensei.net/?p=13087 http://nihongonosensei.net/?p=13092 http://nihongonosensei.net/?p=13136 http://nihongonosensei.net/?p=13151 http://nihongonosensei.net/?p=13371 http://nihongonosensei.net/?p=18157 http://nihongonosensei.net/?p=18219 http://nihongonosensei.net/?p=18221 http://nihongonosensei.net/?p=18266 http://nihongonosensei.net/?p=18292 http://nihongonosensei.net/?p=18293 http://nihongonosensei.net/?p=18392 http://nihongonosensei.net/?p=18488 http://nihongonosensei.net/?p=18489 http://nihongonosensei.net/?p=18593 http://nihongonosensei.net/?p=18595 http://nihongonosensei.net/?p=18612 http://nihongonosensei.net/?p=18613 http://nihongonosensei.net/?p=18657 http://nihongonosensei.net/?p=18659 http://nihongonosensei.net/?p=18662 http://nihongonosensei.net/?p=18664 http://nihongonosensei.net/?p=12827 http://nihongonosensei.net/?p=4094 http://nihongonosensei.net/?p=18732 http://nihongonosensei.net/?p=18728 
http://nihongonosensei.net/?p=18720 http://nihongonosensei.net/?p=18722 http://nihongonosensei.net/?p=18730 http://nihongonosensei.net/?p=18724 http://nihongonosensei.net/?p=4094 http://nihongonosensei.net/?p=3500 http://nihongonosensei.net/?p=3526 http://nihongonosensei.net/?p=3529 http://nihongonosensei.net/?p=3474 http://nihongonosensei.net/?p=3585 http://nihongonosensei.net/?p=3606 http://nihongonosensei.net/?p=3643 http://nihongonosensei.net/?p=3650 http://nihongonosensei.net/?p=3656 http://nihongonosensei.net/?p=5062 http://nihongonosensei.net/?p=5941 http://nihongonosensei.net/?p=5943 http://nihongonosensei.net/?p=5945 http://nihongonosensei.net/?p=5947 http://nihongonosensei.net/?p=5949 http://nihongonosensei.net/?p=5984 http://nihongonosensei.net/?p=7024 http://nihongonosensei.net/?p=7026 http://nihongonosensei.net/?p=7096 http://nihongonosensei.net/?p=7098 http://nihongonosensei.net/?p=7100 http://nihongonosensei.net/?p=7102 http://nihongonosensei.net/?p=7104 http://nihongonosensei.net/?p=7152 http://nihongonosensei.net/?p=10116 http://nihongonosensei.net/?p=3550 http://nihongonosensei.net/?p=8048 http://nihongonosensei.net/?p=6349 http://nihongonosensei.net/?p=8051 http://nihongonosensei.net/?p=8058 http://nihongonosensei.net/?p=8061 http://nihongonosensei.net/?p=8070 http://nihongonosensei.net/?p=8080 http://nihongonosensei.net/?p=8082 http://nihongonosensei.net/?p=4085 http://nihongonosensei.net/?p=4088 http://nihongonosensei.net/?p=8540 http://nihongonosensei.net/?p=8542 http://nihongonosensei.net/?p=8558 http://nihongonosensei.net/?p=8564 http://nihongonosensei.net/?p=8665 http://nihongonosensei.net/?p=8669 http://nihongonosensei.net/?p=8672 http://nihongonosensei.net/?p=8675 http://nihongonosensei.net/?p=8710 http://nihongonosensei.net/?p=8705 http://nihongonosensei.net/?p=7981 http://nihongonosensei.net/?p=8724 http://nihongonosensei.net/?p=8730 http://nihongonosensei.net/?p=8733 http://nihongonosensei.net/?p=8856 http://nihongonosensei.net/?p=9310 
http://nihongonosensei.net/?p=9352 http://nihongonosensei.net/?p=5242 http://nihongonosensei.net/?p=9385 http://nihongonosensei.net/?p=9386 http://nihongonosensei.net/?p=9488 http://nihongonosensei.net/?p=9487 http://nihongonosensei.net/?p=12075 http://nihongonosensei.net/?p=18193 http://nihongonosensei.net/?p=18350 http://nihongonosensei.net/?p=18351 http://nihongonosensei.net/?p=18406 http://nihongonosensei.net/?p=18428 http://nihongonosensei.net/?p=18447 http://nihongonosensei.net/?p=18587 http://nihongonosensei.net/?p=18698 http://nihongonosensei.net/?p=18695 http://nihongonosensei.net/?p=18703 http://nihongonosensei.net/?p=3659 http://nihongonosensei.net/?p=3673 http://nihongonosensei.net/?p=3676 http://nihongonosensei.net/?p=3683 http://nihongonosensei.net/?p=3686 http://nihongonosensei.net/?p=18190 http://nihongonosensei.net/?p=3747 http://nihongonosensei.net/?p=3749 http://nihongonosensei.net/?p=3753 http://nihongonosensei.net/?p=5951 http://nihongonosensei.net/?p=5953 http://nihongonosensei.net/?p=5955 http://nihongonosensei.net/?p=5957 http://nihongonosensei.net/?p=7068 http://nihongonosensei.net/?p=7071 http://nihongonosensei.net/?p=7075 http://nihongonosensei.net/?p=7121 http://nihongonosensei.net/?p=3541 http://nihongonosensei.net/?p=8004 http://nihongonosensei.net/?p=6343 http://nihongonosensei.net/?p=8144 http://nihongonosensei.net/?p=8143 http://nihongonosensei.net/?p=8150 http://nihongonosensei.net/?p=8152 http://nihongonosensei.net/?p=8161 http://nihongonosensei.net/?p=8164 http://nihongonosensei.net/?p=8257 http://nihongonosensei.net/?p=9482 http://nihongonosensei.net/?p=8261 http://nihongonosensei.net/?p=8159 http://nihongonosensei.net/?p=8272 http://nihongonosensei.net/?p=8274 http://nihongonosensei.net/?p=8277 http://nihongonosensei.net/?p=8279 http://nihongonosensei.net/?p=9215 http://nihongonosensei.net/?p=9217 http://nihongonosensei.net/?p=9859 http://nihongonosensei.net/?p=10102 http://nihongonosensei.net/?p=18631 
http://nihongonosensei.net/?p=18632 http://nihongonosensei.net/?p=11303 http://nihongonosensei.net/?p=12781 http://nihongonosensei.net/?p=12812 http://nihongonosensei.net/?p=12799 http://nihongonosensei.net/?p=12802 http://nihongonosensei.net/?p=12809 http://nihongonosensei.net/?p=13150 http://nihongonosensei.net/?p=11946 http://nihongonosensei.net/?p=13152 http://nihongonosensei.net/?p=18503 http://nihongonosensei.net/?p=3582 http://nihongonosensei.net/?p=17664 http://nihongonosensei.net/?p=17751 http://nihongonosensei.net/?p=18264 http://nihongonosensei.net/?p=18267 http://nihongonosensei.net/?p=18265 http://nihongonosensei.net/?p=18303 http://nihongonosensei.net/?p=18393 http://nihongonosensei.net/?p=8281 http://nihongonosensei.net/?p=18614 http://nihongonosensei.net/?p=18676 http://nihongonosensei.net/?p=18678 http://nihongonosensei.net/?p=18816 http://nihongonosensei.net/?p=18818 http://nihongonosensei.net/?p=18812 http://nihongonosensei.net/?p=18809 http://nihongonosensei.net/?p=18807 http://nihongonosensei.net/?p=18805 http://nihongonosensei.net/?p=18803 http://nihongonosensei.net/?p=18330 http://nihongonosensei.net/?p=3446 http://nihongonosensei.net/?p=3662 http://nihongonosensei.net/?p=5182 http://nihongonosensei.net/?p=9262 http://nihongonosensei.net/?p=9264 http://nihongonosensei.net/?p=3647 http://nihongonosensei.net/?p=8567 http://nihongonosensei.net/?p=9343 http://nihongonosensei.net/?p=8045 http://nihongonosensei.net/?p=18305 http://nihongonosensei.net/?p=18307 http://nihongonosensei.net/?p=18427 http://nihongonosensei.net/?p=18615 http://nihongonosensei.net/?p=18713 http://nihongonosensei.net/?p=18715 http://nihongonosensei.net/?p=18717 http://nihongonosensei.net/?p=18736 http://nihongonosensei.net/?p=3668 http://nihongonosensei.net/?p=5180 http://nihongonosensei.net/?p=4090 http://nihongonosensei.net/?p=11943 http://nihongonosensei.net/?p=11950 http://nihongonosensei.net/?p=11941 http://nihongonosensei.net/?p=12816 
http://nihongonosensei.net/?p=18323 http://nihongonosensei.net/?p=18349 http://nihongonosensei.net/?p=18784 http://nihongonosensei.net/?p=18786 http://nihongonosensei.net/?p=18814 http://nihongonosensei.net/?p=18405 http://nihongonosensei.net/?p=5233 http://nihongonosensei.net/?p=7154 http://nihongonosensei.net/?p=7938 http://nihongonosensei.net/?p=7943 http://nihongonosensei.net/?p=8509 http://nihongonosensei.net/?p=8541 http://nihongonosensei.net/?p=8886 http://nihongonosensei.net/?p=8889 http://nihongonosensei.net/?p=9440 http://nihongonosensei.net/?p=9441 http://nihongonosensei.net/?p=3639 http://nihongonosensei.net/?p=3575 http://nihongonosensei.net/?p=3603 http://nihongonosensei.net/?p=11627 http://nihongonosensei.net/?p=11953 http://nihongonosensei.net/?p=11955 http://nihongonosensei.net/?p=17914 http://nihongonosensei.net/?p=18195 http://nihongonosensei.net/?p=18217 http://nihongonosensei.net/?p=18348 http://nihongonosensei.net/?p=18371 http://nihongonosensei.net/?p=18375 http://nihongonosensei.net/?p=18377 http://nihongonosensei.net/?p=18379 http://nihongonosensei.net/?p=18653 http://nihongonosensei.net/?p=18655 http://nihongonosensei.net/?p=13346 http://nihongonosensei.net/?p=13347 http://nihongonosensei.net/?p=13348 http://nihongonosensei.net/?p=13358 http://nihongonosensei.net/?p=13362 http://nihongonosensei.net/?p=13373 http://nihongonosensei.net/?p=13369 http://nihongonosensei.net/?p=13379 http://nihongonosensei.net/?p=13385 http://nihongonosensei.net/?p=13462 http://nihongonosensei.net/?p=13466 http://nihongonosensei.net/?p=14905 http://nihongonosensei.net/?p=17576 http://nihongonosensei.net/?p=17593 http://nihongonosensei.net/?p=17597 http://nihongonosensei.net/?p=17600 http://nihongonosensei.net/?p=17917 http://nihongonosensei.net/?p=18268 http://nihongonosensei.net/?p=18363 http://nihongonosensei.net/?p=19118 http://nihongonosensei.net/ http://nihongonosensei.net/?cat=7 http://nihongonosensei.net/?cat=3 http://nihongonosensei.net/?page_id=10246 
http://nihongonosensei.net/?page_id=10246#linkn1 http://nihongonosensei.net/?page_id=10246#linkn2 http://nihongonosensei.net/?page_id=10246#linkn3 http://nihongonosensei.net/?page_id=10246#linkn4n5 http://nihongonosensei.net/?page_id=10246#linkn0 http://nihongonosensei.net/?page_id=13879 http://nihongonosensei.net/?page_id=8874 http://nihongonosensei.net/?p=17729 http://nihongonosensei.net/?page_id=8874#link2019 http://nihongonosensei.net/?page_id=8874#link30 http://nihongonosensei.net/?page_id=8874#link29 http://nihongonosensei.net/?page_id=8874#link28 http://nihongonosensei.net/?page_id=8874#link27 http://nihongonosensei.net/?page_id=8874#link26 http://nihongonosensei.net/?page_id=8874#link25 http://nihongonosensei.net/?page_id=8874#link24 http://nihongonosensei.net/?page_id=8874#link23 http://nihongonosensei.net/?page_id=4945 http://nihongonosensei.net/?page_id=5094 http://nihongonosensei.net/?page_id=13794 http://nihongonosensei.net/?page_id=13794#link1 http://nihongonosensei.net/?page_id=13794#link2 http://nihongonosensei.net/?page_id=13825 http://nihongonosensei.net/?page_id=13827 http://nihongonosensei.net/?page_id=1904 https://thk.kanzae.net/
Using beautiful soup inside a loop to search for urls
I am trying to write a Python script that goes to the website https://www.premierleague.com/players, takes a list of football player names from a spreadsheet I have (400+ footballer names), and, inside a loop, searches for the link to each football player's page. For example: https://www.premierleague.com/players/4040/Benik-Afobe/overview. The final bit of the script is commented out as I haven't finalised that yet, but for context, this is what I'm eventually going to get to: it will take the list of URLs I have obtained, look up each player's link to the player image, and append it to a list.
I managed to get it to work for an individual player (Benik Afobe), but since adding the 'players_list' and trying a loop, I get the following error:
Traceback (most recent call last):
  File "C:/Users/Liam/Documents/GitHub/Football_Scraping/fantast_pl_images.py", line 33, in <module>
    player_link = soup.find('a', href=re.compile('%s'))['href'] %player
TypeError: 'NoneType' object is not subscriptable
Does anyone know what I'm doing wrong and how to get my loop working?
The repo of my project can be found here: https://github.com/leej11/Football_Scraping
# Import the Libraries that I need
import urllib3
import certifi
from bs4 import BeautifulSoup
import re
import pandas as pd
# Specify the URL
url = 'https://www.premierleague.com/players'
http = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where())
response = http.request('GET', url, headers={'User-Agent': 'Mozilla/5.0'})
#Parse the html using beautiful soup and store in variable 'soup'
soup = BeautifulSoup(response.data, "html.parser")
#Importing the list of players I want to scrape the image of
players_list = pd.read_csv('epl_players_anki_clean.csv')
#Test that it's pulling all of the players names correctly
print(players_list.iloc[:,0])
print (type(players_list))
#Convert the pandas dataframe to a list of strings, with each item being the string of a player name
list_of_players = players_list['name'].values.tolist()
print(list_of_players)
#Setup an empty list to append the player links to
player_link_list = []
#Loop over the list of player names, and search for the player url and append it to the player_link_list
for player in players_list:
    player_link = soup.find('a', href=re.compile('%s'))['href'] %player
    print(player_link)
    player_url = 'https://www.premierleague.com' + '%s' %player_link
    print(player_url)
    player_link_list.append(player_url)
##### The final step
##### To be worked on in a bit, basically take the list of links and loop over it pulling out the player image links and appending them to a list
#####
# url2 = player_url
# http2 = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where())
# response2 = http2.request('GET', url2, headers={'User-Agent': 'Mozilla/5.0'})
# soup2 = BeautifulSoup(response2.data, "html.parser")
# player_img = str(soup2.find("img", {'alt':'Benik Afobe'})['data-player'])
# print(player_img)
#
# photo_link = 'http://platform-static-files.s3.amazonaws.com/premierleague/photos/players/250x250/' + '%s' %player_img + '.png'
# print(photo_link)
It appears that the Premier League's player listing is dynamic: a browser script loads additional players as the user scrolls down. A single request made with requests or urllib will therefore not return all the players, and you will have to use a browser automation tool such as selenium.
Install it with pip install selenium, then install the proper driver for the web browser you are using: http://selenium-python.readthedocs.io/installation.html#drivers
import re
import time
import csv
from selenium import webdriver
driver = webdriver.Chrome('/path/to/driver')  # substitute Chrome with the browser you are using
driver.get('https://www.premierleague.com/players')
# Keep scrolling until the page height stops growing
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(0.5)
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
time.sleep(10)
# Each match is a (player id, hyphenated name) pair
players = re.findall(r'www\.premierleague\.com/players/(.*?)/(.*?)/overview', driver.page_source)
with open('epl_players_anki_clean.csv') as f:
    csv_filedata = list(csv.reader(f))
player_dict = {re.sub('-', ' ', b):(a, b) for a, b in players}
# Header row with a new 'url' column, followed by one row per player
new_rows = [csv_filedata[0]+['url']]+[a+['https://www.premierleague.com/players/{}/{}/overview'.format(*player_dict[a[0]])] for a in csv_filedata[1:]]
with open('players.csv', 'a') as f:
    write = csv.writer(f)
    write.writerows(new_rows)
player_dict stores the following (truncated data):
Output:
{u'Asmir Begovic': [u'2537', u'Asmir-Begovic'], u'Ragnar Klavan': [u'15608', u'Ragnar-Klavan'], u'Eddie Nketiah': [u'14451', u'Eddie-Nketiah'], u'Mohamed Salah': [u'5178', u'Mohamed-Salah'], u'Jos\xe9 Holebas': [u'5713', u'Jos\xe9-Holebas'], u'Christian Eriksen': [u'4845', u'Christian-Eriksen'], u'Kurt Zouma': [u'5175', u'Kurt-Zouma'], u'Gareth Barry': [u'1308', u'Gareth-Barry'], u'Diego Costa': [u'4941', u'Diego-Costa'], u'Sam McQueen': [u'9649', u'Sam-McQueen'], u'Roque Mesa': [u'22575', 
u'Roque-Mesa'], u'Siem de Jong': [u'4885', u'Siem-de-Jong'], u'Lazar Markovic': [u'5078', u'Lazar-Markovic'], u'Adam Federici': [u'3182', u'Adam-Federici'], u'Dean Marney': [u'2359', u'Dean-Marney'], u'Nathan Broadhead': [u'14636', u'Nathan-Broadhead'], u'Alex Pritchard': [u'4433', u'Alex-Pritchard'], u'Matthew Pennington': [u'9895', u'Matthew-Pennington'], u'Tomas Kalas': [u'4101', u'Tomas-Kalas'], u'Nathan Ak\xe9': [u'4499', u'Nathan-Ak\xe9'], u'Mathias Normann': [u'23556', u'Mathias-Normann'], u'Grzegorz Krychowiak': [u'12735', u'Grzegorz-Krychowiak'], u'Wojciech Szczesny': [u'3543', u'Wojciech-Szczesny'], u'Charlie Adam': [u'4081', u'Charlie-Adam'], u'Marko Grujic': [u'13985', u'Marko-Grujic'], u'Harry Maguire': [u'9566', u'Harry-Maguire'], u'Isaiah Brown': [u'4674', u'Isaiah-Brown'], u'Matz Sels': [u'16723', u'Matz-Sels'], u'Leighton Baines': [u'3030', u'Leighton-Baines'], u'Marouane Fellaini': [u'3604', u'Marouane-Fellaini'], u'Jairo Riedewald': [u'4878', u'Jairo-Riedewald'], u'Glenn Murray': [u'4772', u'Glenn-Murray'], u'Tom Cadman': [u'14547', u'Tom-Cadman'], u'Ryan Shawcross': [u'3158', u'Ryan-Shawcross'], u"N'Golo Kant\xe9": [u'13492', u"N'Golo-Kant\xe9"], u'Aaron Ramsey': [u'3548', u'Aaron-Ramsey'], u'Stephen Kingsley': [u'10517', u'Stephen-Kingsley'], u'Eliaquim Mangala': [u'5334', u'Eliaquim-Mangala'], u'Josh Tymon': [u'13477', u'Josh-Tymon'], u'Mohamed Diam\xe9': [u'3982', u'Mohamed-Diam\xe9'], u'Sofiane Boufal': [u'12584', u'Sofiane-Boufal'], u'Nya Kirby': [u'15134', u'Nya-Kirby'], u'Max Melbourne': [u'15160', u'Max-Melbourne'], u'Marcin Bulka': [u'23695', u'Marcin-Bulka'], u'Rub\xe9n Sobrino': [u'16608', u'Rub\xe9n-Sobrino'], u'Tareiq Holmes Dennis': [u'8340', u'Tareiq-Holmes-Dennis'], u'Martin Cranie': [u'2559', u'Martin-Cranie'], u'Connor Mahoney': [u'7985', u'Connor-Mahoney'], u'Jamaal Lascelles': [u'9257', u'Jamaal-Lascelles'], u'Phil Foden': [u'14805', u'Phil-Foden'], u'Arijanet Muric': [u'19911', u'Arijanet-Muric'], u'Sadio Man\xe9': [u'6519', 
u'Sadio-Man\xe9'], u"Aiden O'Neill": [u'20095', u"Aiden-O'Neill"], u'Steve Cook': [u'8045', u'Steve-Cook'], u'Samuel Shashoua': [u'15142', u'Samuel-Shashoua'], u'Kyle Bartley': [u'3312', u'Kyle-Bartley'], u'Bojan': [u'4898', u'Bojan'], u'Jason Puncheon': [u'4084', u'Jason-Puncheon'], u'Damien Delaney': [u'1911', u'Damien-Delaney'], u'Steven Defour': [u'5345', u'Steven-Defour'], u'Christian Walton': [u'8159', u'Christian-Walton'], u'Timothy Fosu Mensah': [u'13561', u'Timothy-Fosu-Mensah'], u'Michael Keane': [u'4333', u'Michael-Keane'], u'Levi Lumeka': [u'14170', u'Levi-Lumeka'], u'Chancel Mbemba': [u'5850', u'Chancel-Mbemba'], u'Brice Dja Dj\xe9dj\xe9': [u'5577', u'Brice-Dja-Dj\xe9dj\xe9'], u'Vurnon Anita': [u'4550', u'Vurnon-Anita'], u'Jefferson Montero': [u'10518', u'Jefferson-Montero'], u'Toby Alderweireld': [u'4916', u'Toby-Alderweireld'], u'Dominic Calvert Lewin': [u'9576', u'Dominic-Calvert-Lewin'], u'Brad Jackson': [u'13246', u'Brad-Jackson'], u'James McArthur': [u'4224', u'James-McArthur'], u'Mat Ryan': [u'12192', u'Mat-Ryan'], u'Bartosz Kapustka': [u'19679', u'Bartosz-Kapustka'], u'Robert Snodgrass': [u'4558', u'Robert-Snodgrass'], u'Jonathan Leko': [u'13866', u'Jonathan-Leko'], u'Harry Arter': [u'8050', u'Harry-Arter'], u'Connor Goldson': [u'9634', u'Connor-Goldson'], u'Shaun Hobson': [u'21691', u'Shaun-Hobson'], u'Ayoze P\xe9rez': [u'10487', u'Ayoze-P\xe9rez'], u'Marc Pugh': [u'8049', u'Marc-Pugh'], u'Luciano Narsingh': [u'7122', u'Luciano-Narsingh'], u'Michael Folivi': [u'14298', u'Michael-Folivi'], u'Adri\xe1n': [u'4852', u'Adri\xe1n'], u'Mason Holgate': [u'10564', u'Mason-Holgate'], u'Joy Mukena': [u'15118', u'Joy-Mukena'], u"Lewis O'Brien": [u'24353', u"Lewis-O'Brien"], u'Javier Manquillo': [u'4918', u'Javier-Manquillo'], u"Dara O'Shea": [u'15154', u"Dara-O'Shea"], u"Clinton N'Jie": [u'6903', u"Clinton-N'Jie"], u'Yoan Gouffran': [u'4554', u'Yoan-Gouffran'], u'Michael Carrick': [u'1634', u'Michael-Carrick'], u'Moha': [u'13778', u'Moha'], u'Michy 
Batshuayi': [u'7450', u'Michy-Batshuayi'], u'Nathaniel Chalobah': [u'4105', u'Nathaniel-Chalobah'], u'Ryan Inniss': [u'4760', u'Ryan-Inniss'], u'Etienne Capoue': [u'4843', u'Etienne-Capoue'], u'Badou Ndiaye': [u'20538', u'Badou-Ndiaye'], u'Alexandre Lacazette': [u'6899', u'Alexandre-Lacazette'], u'Charlie Rowan': [u'14285', u'Charlie-Rowan'], u'Nathan Ferguson': [u'23976', u'Nathan-Ferguson'], u'Anthony Georgiou': [u'15146', u'Anthony-Georgiou'], u'Dujon Sterling': [u'14572', u'Dujon-Sterling'], u'Axel Tuanzebe': [u'13559', u'Axel-Tuanzebe'], u'Emre Can': [u'5001', u'Emre-Can'], u'Sam Surridge': [u'13195', u'Sam-Surridge'], u'Ryan Kent': [u'13509', u'Ryan-Kent'], u'Marc Albrighton': [u'3564', u'Marc-Albrighton'], u'Joe Williams': [u'10454', u'Joe-Williams'], u'Tom Heaton': [u'2933', u'Tom-Heaton'], u'Danny Rose': [u'3507', u'Danny-Rose'], u'Nathan Redmond': [u'3811', u'Nathan-Redmond'], u'Chicharito': [u'4161', u'Chicharito'], u'Dean Whitehead': [u'2980', u'Dean-Whitehead'], u'M.J. Williams': [u'10464', u'M.J.-Williams'], u'Harry Winks': [u'7488', u'Harry-Winks'], u'Josh Sims': [u'15374', u'Josh-Sims'], u'Charlie Gilmour': [u'14453', u'Charlie-Gilmour'], u'Aaron Wan Bissaka': [u'14164', u'Aaron-Wan-Bissaka'], u'Marc Muniesa': [u'4822', u'Marc-Muniesa'], u'Beni Baningime': [u'14623', u'Beni-Baningime'], u'Demarai Gray': [u'7946', u'Demarai-Gray'], u'Junior Stanislas': [u'3766', u'Junior-Stanislas'], u'Liam Rosenior': [u'2464', u'Liam-Rosenior'], u'Nathaniel Clyne': [u'4604', u'Nathaniel-Clyne'], u'Kamil Grabara': [u'19909', u'Kamil-Grabara'], u'Anthony Martial': [u'11272', u'Anthony-Martial'], u'Ben Foster': [u'2932', u'Ben-Foster'], u'Laurent Depoitre': [u'16747', u'Laurent-Depoitre'], u'Mike van der Hoorn': [u'4877', u'Mike-van-der-Hoorn'], u'Didier Ndong': [u'20708', u'Didier-Ndong'], u'Jordon Mutch': [u'3333', u'Jordon-Mutch'], u'Harry Kane': [u'3960', u'Harry-Kane'], u'Fernandinho': [u'4804', u'Fernandinho'], u'Riyad Mahrez': [u'8983', u'Riyad-Mahrez'], 
u'Kleton Perntreou': [u'14144', u'Kleton-Perntreou'], u'Dion Henry': [u'10855', u'Dion-Henry'], u'Kelechi Iheanacho': [u'13554', u'Kelechi-Iheanacho'], u'Salom\xf3n Rond\xf3n': [u'6030', u'Salom\xf3n-Rond\xf3n'], u'Ryan Allsop': [u'3732', u'Ryan-Allsop'], u'Erik Pieters': [u'4821', u'Erik-Pieters'], u'Willy Caballero': [u'10466', u'Willy-Caballero'], u'Claudio Yacob': [u'4673', u'Claudio-Yacob'], u'Craig Dawson': [u'4198', u'Craig-Dawson'], u'Jayson Molumby': [u'15293', u'Jayson-Molumby'], u'Lucas Leiva': [u'3137', u'Lucas-Leiva'], u'Martin Dubravka': [u'6451', u'Martin-Dubravka'], u'Bruno': [u'8162', u'Bruno'], u'Sam Johnstone': [u'4331', u'Sam-Johnstone'], u'Jes\xfas G\xe1mez': [u'11070', u'Jes\xfas-G\xe1mez'], u'Tomer Hemed': [u'13234', u'Tomer-Hemed'], u'Victor Moses': [u'3983', u'Victor-Moses'], u'Vincent Janssen': [u'15481', u'Vincent-Janssen'], u'Lo\xefc Remy': [u'4572', u'Lo\xefc-Remy'], u'Craig Cathcart': [u'3160', u'Craig-Cathcart'], u'Leroy Fer': [u'4810', u'Leroy-Fer'], u"Kieran O'Hara": [u'13584', u"Kieran-O'Hara"], u'Ola Aina': [u'10439', u'Ola-Aina'], u'Winston Reid': [u'4209', u'Winston-Reid'], u'Jose Baxter': [u'3608', u'Jose-Baxter'], u'Michael Obafemi': [u'21532', u'Michael-Obafemi'], u'Bruno Martins Indi': [u'11177', u'Bruno-Martins-Indi'], u'Laurent Koscielny': [u'4030', u'Laurent-Koscielny'], u'Borja Bast\xf3n': [u'16622', u'Borja-Bast\xf3n'], u'Daryl Janmaat': [u'10480', u'Daryl-Janmaat'], u'Freddy Woodman': [u'10479', u'Freddy-Woodman'], u'Jordy Hiwula Mayifuila': [u'10949', u'Jordy-Hiwula-Mayifuila'], u'Raphael Spiegel': [u'4679', u'Raphael-Spiegel'], u'Anthony Knockaert': [u'8982', u'Anthony-Knockaert'], u'Harry Lewis': [u'14982', u'Harry-Lewis'], u'Henrikh Mkhitaryan': [u'5102', u'Henrikh-Mkhitaryan'], u'Santiago Cazorla': [u'4477', u'Santiago-Cazorla'], u'Sean Scannell': [u'8887', u'Sean-Scannell'], u'Christian Atsu': [u'4859', u'Christian-Atsu'], u'Pascal Gro\xdf': [u'22542', u'Pascal-Gro\xdf'], u'Charlie Austin': [u'9468', 
u'Charlie-Austin'], u'Sam Byram': [u'8945', u'Sam-Byram'], u'Daniel Sturridge': [u'3154', u'Daniel-Sturridge'], u'Ga\xebtan Bong': [u'5721', u'Ga\xebtan-Bong'], u'Martin Kelly': [u'3644', u'Martin-Kelly'], u'Jack Payne': [u'9664', u'Jack-Payne'], u'Michel Vorm': [u'4398', u'Michel-Vorm'], u'Oriol Romeu': [u'4286', u'Oriol-Romeu'], u'Philip Billing': [u'8882', u'Philip-Billing'], u'Matthew Lowton': [u'4487', u'Matthew-Lowton'], u'Wayne Hennessey': [u'2569', u'Wayne-Hennessey'], u'Geoff Cameron': [u'4636', u'Geoff-Cameron'], u'Tammy Abraham': [u'13286', u'Tammy-Abraham'], u'Elvis Manu': [u'12374', u'Elvis-Manu'], u'Marvin Zeegelaar': [u'10123', u'Marvin-Zeegelaar'], u'Jordy Clasie': [u'12365', u'Jordy-Clasie'], u'Wayne Routledge': [u'2681', u'Wayne-Routledge'], u'Tom Anderson': [u'8234', u'Tom-Anderson'], u'Stephen Duke McKenna': [u'23738', u'Stephen-Duke-McKenna'], u'Harry Charsley': [u'14632', u'Harry-Charsley'], u'Erik Lamela': [u'4842', u'Erik-Lamela'], u'Elias Kachunga': [u'19611', u'Elias-Kachunga'], u'Molla Wagu\xe9': [u'21730', u'Molla-Wagu\xe9'], u'Ilkay G\xfcndogan': [u'5101', u'Ilkay-G\xfcndogan'], u'Ashley Williams': [u'4403', u'Ashley-Williams'], u'Lewis Grabban': [u'8055', u'Lewis-Grabban'], u'Seamus Coleman': [u'3600', u'Seamus-Coleman'], u'Jason Denayer': [u'11002', u'Jason-Denayer'], u'Jack Wilshere': [u'3547', u'Jack-Wilshere'], u'Calum Chambers': [u'4620', u'Calum-Chambers'], u'Samir Nasri': [u'3546', u'Samir-Nasri'], u'Alexis S\xe1nchez': [u'4973', u'Alexis-S\xe1nchez'], u'Kyle Walker': [u'3955', u'Kyle-Walker'], u'Martin Olsson': [u'2867', u'Martin-Olsson'], u'Modou Barrow': [u'10520', u'Modou-Barrow'], u'Robbie Brady': [u'4158', u'Robbie-Brady'], u'Tom Davies': [u'13389', u'Tom-Davies'], u'Fraser Forster': [u'3170', u'Fraser-Forster'], u'Francis Coquelin': [u'3549', u'Francis-Coquelin'], u'Matt Targett': [u'4815', u'Matt-Targett'], u'Davy Klaassen': [u'4886', u'Davy-Klaassen'], u"Stefan O'Connor": [u'10425', u"Stefan-O'Connor"], u'Fraser 
Hornby': [u'23744', u'Fraser-Hornby'], u'Tim Krul': [u'3169', u'Tim-Krul'], u'Ryan Hill': [u'21858', u'Ryan-Hill'], u'J\xfcrgen Locadia': [u'7124', u'J\xfcrgen-Locadia'], u'Ki Sung yueng': [u'4656', u'Ki-Sung-yueng'], u'Leon Britton': [u'2152', u'Leon-Britton'], u'Mesut \xd6zil': [u'4714', u'Mesut-\xd6zil'], u'Alex Denny': [u'14643', u'Alex-Denny'], u'Nemanja Matic': [u'3861', u'Nemanja-Matic'], u'Ryan Fraser': [u'8052', u'Ryan-Fraser'], u'Julian Speroni': [u'2664', u'Julian-Speroni'], u'Joel Campbell': [u'4254', u'Joel-Campbell'], u'Robert Elliot': [u'2214', u'Robert-Elliot'], u'Tosin Adarabioyo': [u'13549', u'Tosin-Adarabioyo'], u'Jack Colback': [u'3713', u'Jack-Colback'], u'Soufyan Ahannach': [u'24695', u'Soufyan-Ahannach'], u'Aaron Connolly': [u'21653', u'Aaron-Connolly'], u'Yasin Ben El Mhanni': [u'20879', u'Yasin-Ben-El-Mhanni'], u'Kazenga Lua Lua': [u'3173', u'Kazenga-Lua-Lua'], u'Ben Chilwell': [u'13491', u'Ben-Chilwell'], u'Aaron Ramsdale': [u'13703', u'Aaron-Ramsdale']}
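As for the original TypeError: in the question's loop the pattern passed to re.compile is the literal string '%s', because % player is applied to the result of the subscript rather than to the pattern. No href contains '%s', so soup.find returns None and subscripting it fails. The loop also iterates over the DataFrame (which yields column labels) instead of list_of_players. A minimal sketch of a lookup that avoids both problems, using only the regex from the answer above; build_player_urls and sample_source are hypothetical names, and the sample string merely stands in for driver.page_source:

```python
import re

# Same pattern as in the answer above: captures (player_id, slug) pairs.
PLAYER_LINK = re.compile(r'www\.premierleague\.com/players/(.*?)/(.*?)/overview')


def build_player_urls(page_source, names):
    """Map each name to its overview URL, or None when no link matches."""
    lookup = {slug.replace('-', ' '): (player_id, slug)
              for player_id, slug in PLAYER_LINK.findall(page_source)}
    urls = {}
    for name in names:
        match = lookup.get(name)  # None instead of a KeyError or TypeError
        if match is None:
            urls[name] = None
        else:
            urls[name] = 'https://www.premierleague.com/players/{}/{}/overview'.format(*match)
    return urls


# Hypothetical stand-in for driver.page_source
sample_source = (
    '<a href="//www.premierleague.com/players/4040/Benik-Afobe/overview">Benik Afobe</a>'
    '<a href="//www.premierleague.com/players/3960/Harry-Kane/overview">Harry Kane</a>'
)
result = build_player_urls(sample_source, ['Benik Afobe', 'Harry Kane', 'Unknown Player'])
for name, url in result.items():
    print(name, '->', url)
```

Names that are spelled differently in the spreadsheet and on the site (accents, hyphenated surnames) come back as None here, so they can be listed and fixed by hand rather than stopping the whole run.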
AttributeError when extracting data from a URL in Python
I am using the code below to try to extract the data from the table at this URL. However, I get the following error message:
`AttributeError: 'NoneType' object has no attribute 'find'`
in the line
`data = iter(soup.find("table", {"class": "tablestats"}).find("th", {"class": "header"}).find_all_next("tr"))`
My code is as follows:
from bs4 import BeautifulSoup
import requests
r = requests.get(
    "http://www.federalreserve.gov/econresdata/researchdata/feds200628_1.html")
soup = BeautifulSoup(r.content)
data = iter(soup.find("table", {"class": "tablestats"}).find("th", {"class": "header"}).find_all_next("tr"))
headers = (next(data).text, next(data).text)
table_items = [(a.text, b.text) for ele in data for a, b in [ele.find_all("td")]]
for a, b in table_items:
    print(u"Date={}, Maturity={}".format(a, b if b.strip() else "null"))
Thank you
from bs4 import BeautifulSoup
import requests
r = requests.get(
    "http://www.federalreserve.gov/econresdata/researchdata/feds200628_1.html")
soup = BeautifulSoup(r.content)
# column headers
h = soup.find_all("th", scope="col")
# get all the tr tags after the headers
final = [[t.th.text] + [ele.text for ele in t.find_all("td")] for t in h[-1].find_all_next("tr")]
headers = [th.text for th in h]
The final output list holds each row as an individual list:
[['2015-06-05', '4.82039691', '-4.66420959', '-4.18904598', '-3.94541434', '1.1477', '2.9361', '3.3588', '0.6943', '1.5881', '2.3034', '2.7677', '3.0363', '3.1801', '3.2537', '3.2930', '3.3190', '3.3431', '3.3707', '3.4038', '3.4428', '3.4871', '3.5357', '3.5876', '3.6419', '3.6975', '3.7538', '3.8100', '3.8656', '3.9202', '3.9734', '4.0250', '4.0748', '4.1225', '4.1682', '4.2117', '4.2530', '4.2921', '0.3489', '0.7464', '1.1502', '1.4949', '1.7700', '1.9841', '2.1500', '2.2800', '2.3837', '2.4685', '2.5396', '2.6006', '2.6544', '2.7027', '2.7469', '2.7878', '2.8260', '2.8621', '2.8964', '2.9291', '2.9603', '2.9901', '3.0187', '3.0461', '3.0724', '3.0976', '3.1217', '3.1448', '3.1669', '3.1881', '0.3487', '0.7469', '1.1536', '1.5039', '1.7862', '2.0078', '2.1811', '2.3179', '2.4277', '2.5181', '2.5943', '2.6603', '2.7190', '2.7722', '2.8215', '2.8677', '2.9117', '2.9538', '2.9944', '3.0338', '3.0721', '3.1094', '3.1458', '3.1814', '3.2161', '3.2501', '3.2832', '3.3156', '3.3472', '3.3781', '1.40431658', '9.48795888'], ['2015-06-04', '4.64953424', '-4.52780982', '-3.98051369', ......................................
The headers: ['BETA0', 'BETA1', 'BETA2', 'BETA3', 'SVEN1F01', 'SVEN1F04', 'SVEN1F09', 'SVENF01', 'SVENF02', 'SVENF03', 'SVENF04', 'SVENF05', 'SVENF06', 'SVENF07', 'SVENF08', 'SVENF09', 'SVENF10', 'SVENF11', 'SVENF12', 'SVENF13', 'SVENF14', 'SVENF15', 'SVENF16', 'SVENF17', 'SVENF18', 'SVENF19', 'SVENF20', 'SVENF21', 'SVENF22', 'SVENF23', 'SVENF24', 'SVENF25', 'SVENF26', 'SVENF27', 'SVENF28', 'SVENF29', 'SVENF30', 'SVENPY01', 'SVENPY02', 'SVENPY03', 'SVENPY04', 'SVENPY05', 'SVENPY06', 'SVENPY07', 'SVENPY08', 'SVENPY09', 'SVENPY10', 'SVENPY11', 'SVENPY12', 'SVENPY13', 'SVENPY14', 'SVENPY15', 'SVENPY16', 'SVENPY17', 'SVENPY18', 'SVENPY19', 'SVENPY20', 'SVENPY21', 'SVENPY22', 'SVENPY23', 'SVENPY24', 'SVENPY25', 'SVENPY26', 'SVENPY27', 'SVENPY28', 'SVENPY29', 'SVENPY30', 'SVENY01', 'SVENY02', 'SVENY03', 'SVENY04', 'SVENY05', 'SVENY06', 'SVENY07', 'SVENY08', 'SVENY09', 'SVENY10', 'SVENY11', 'SVENY12', 'SVENY13', 'SVENY14', 'SVENY15', 'SVENY16', 'SVENY17', 'SVENY18', 'SVENY19', 'SVENY20', 'SVENY21', 'SVENY22', 'SVENY23', 'SVENY24', 'SVENY25', 'SVENY26', 'SVENY27', 'SVENY28', 'SVENY29', 'SVENY30', 'TAU1', 'TAU2']
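To work with the result it can help to label each value with its column name. The following is only a sketch, assuming each row is the date followed by one value per header, as the sample output suggests; rows_to_records and the 'Date' key are hypothetical names, and the short lists stand in for the full headers and final lists:

```python
def rows_to_records(headers, rows):
    """Turn [date, v1, v2, ...] rows into dicts keyed by column name."""
    records = []
    for row in rows:
        date, values = row[0], row[1:]
        if len(values) != len(headers):
            # Skip rows that don't line up with the headers instead of mislabelling them
            continue
        record = {'Date': date}
        record.update(zip(headers, values))
        records.append(record)
    return records


# Shortened stand-ins for the `headers` and `final` lists produced above
sample_headers = ['BETA0', 'BETA1', 'BETA2']
sample_final = [
    ['2015-06-05', '4.82039691', '-4.66420959', '-4.18904598'],
    ['2015-06-04', '4.64953424', '-4.52780982', '-3.98051369'],
    ['malformed row'],
]
records = rows_to_records(sample_headers, sample_final)
print(records[0]['Date'], records[0]['BETA0'])
```

The values stay as strings here; converting them with float() is left out because the page may contain empty cells.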
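There are a lot of issues in your code.
There is no table with class 'tablestats', so soup.find("table", {"class": "tablestats"}) returns None; calling .find on that None is what raises the AttributeError.
There are no 'th' elements with class 'header' either.
The following line
table_items = [(a.text, b.text) for ele in data for a, b in [ele.find_all("td")]]
also fails: ele.find_all("td") doesn't return just 2 values, so the result can't be unpacked into a, b.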
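The unpacking problem can be illustrated without the network. In this sketch each row is a plain list of cell texts, standing in for [td.text for td in ele.find_all("td")]; cells_to_pair is a hypothetical helper, and the sample values are taken from the output shown earlier:

```python
def cells_to_pair(cells):
    """Return (first, second) for a row, or None if the row has fewer than two cells."""
    if len(cells) < 2:
        return None
    # Extra cells are ignored rather than raising ValueError
    first, second, *_rest = cells
    return first, second


rows = [
    ['2015-06-05', '4.82039691', '-4.66420959'],  # more than two cells
    ['2015-06-04', ''],                           # exactly two cells, second one empty
    [],                                           # row with no <td> cells at all
]
table_items = [pair for pair in map(cells_to_pair, rows) if pair is not None]
for a, b in table_items:
    print(u"Date={}, Maturity={}".format(a, b if b.strip() else "null"))
```

The same idea applies to soup.find itself: check the result against None before chaining another .find onto it.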