I couldn't find all the 'href' values using "find_elements_by_xpath".
Is there another way to get this data? Thanks.
!pip install selenium
from selenium import webdriver
import time
import pandas as pd
browser = webdriver.Chrome(executable_path='./chromedriver.exe')
browser.implicitly_wait(5)
browser.get("https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons")
linkPath = '//ul[@class = "sc-eWvPqa cePswM"]/li/a'
product_links = browser.find_elements_by_xpath(linkPath)
print(product_links)
href is an attribute of the anchor tag, so this XPath
//a
should locate all of them.
In Selenium you could also locate the anchors by tag name. I will use XPath here:
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome(executable_path='./chromedriver.exe')
browser.get("https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons")
browser.maximize_window()
linkPath = "//a"
product_links = browser.find_elements(By.XPATH, linkPath)
print(len(product_links))
# Print the href of every anchor; anchors without an href print None.
for link in product_links:
    print(link.get_attribute('href'))
Output:
288
https://tw.yahoo.com/
https://tw.buy.yahoo.com/
https://tw.bid.yahoo.com/
https://tw.usedcar.yahoo.com/
https://tw.mall.yahoo.com/activity?p=mall-1-0-180921-channelgroupbuy
http://mail.yahoo.com.tw/
https://yahoomode.tumblr.com/yahooapp/
https://tw.mall.yahoo.com/
https://tw.mall.yahoo.com/search/store?p=
https://tw.user.mall.yahoo.com/my/home
https://tw.sc.mall.yahoo.com/mcart/preview
https://tw.user.mall.yahoo.com/my/order/orderList
https://login.yahoo.com/config/login?.intl=tw&.src=mktg1&done=https%3A%2F%2Ftw.mall.yahoo.com%2Fstore%2Fwatsons
https://tw.user.mall.yahoo.com/my/point
https://tw.mall.yahoo.com/
https://tw.user.mall.yahoo.com/my/home
https://tw.user.mall.yahoo.com/my/order/orderList
https://tw.user.mall.yahoo.com/sc/view/home
https://tw.user.mall.yahoo.com/my/notification
https://tw.user.mall.yahoo.com/my/point
https://tw.user.mall.yahoo.com/my/order/ratingList
https://tw.user.mall.yahoo.com/my/followupStore
https://tw.user.mall.yahoo.com/my/watchlist
https://tw.user.mall.yahoo.com/my/ecoupon
https://tw.user.mall.yahoo.com/my/voucher/unused
https://tw.user.mall.yahoo.com/my/member
https://tw.user.mall.yahoo.com/my/setting
https://tw.user.mall.yahoo.com/my/customerqa
https://tw.help.yahoo.com/kb/shopping-mall-web/SLN35152.html
https://tw.mall.yahoo.com/
https://tw.mall.yahoo.com/store/watsons
https://tw.mall.yahoo.com/store/watsons
https://tw.mall.yahoo.com/store/watsons
https://tw.mall.yahoo.com/store/watsons/rating/list
https://tw.mall.yahoo.com/chat/watsons
https://tw.mall.yahoo.com/store/watsons/stIntroMgt
https://tw.mall.yahoo.com/store_vip/watsons
https://tw.mall.yahoo.com/store/watsons/stNoteMgt
https://tw.mall.yahoo.com/store/watsons/edm
https://tw.mall.yahoo.com/store/watsons
None
None
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=2689&path=2689
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=2797&path=2797
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=712&path=712
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=664&path=664
None
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=1553&path=1553
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=2505&path=2505
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=1452&path=1452
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=1060&path=1060
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=827&path=827
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=1003&path=1003
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=1024&path=1024
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=875&path=875
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=958&path=958
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=979&path=979
https://tw.mall.yahoo.com/store_vip/watsons
https://tw.mall.yahoo.com/store/watsons/promo
https://tw.rcv.mall.yahoo.com/rcv/askEcoupon?s=5Ir7dQTxCtYebEIVRr7qdbQJrQ--
https://tw.mall.yahoo.com/store/watsons/promoCode?id=407205
https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101274
https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101277
https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101328
https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101279
https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101281
https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101286
https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101276
https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101216
https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101318
https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101272
https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101228
https://tw.mall.yahoo.com/store/watsons/amountPromo?promotion_id=3101287
https://tw.mall.yahoo.com/store/watsons
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4792&path=4792
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4802&path=4793,4802
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4805&path=4794,4805
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4807&path=4794,4807
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4799&path=4793,4799
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4801&path=4793,4801
https://tw.mall.yahoo.com/search?q=%E6%B4%BB%E6%B2%9B%E5%A4%9A&sid=watsons
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=1539
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4810&path=4794,4810
https://tw.mall.yahoo.com/search?q=%E6%B4%BB%E6%B2%9B%E5%A4%9A&sid=watsons
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4806&path=4794,4806
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4806&path=4794,4806
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4800&path=4793,4800
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4800&path=4793,4800
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4812&path=4793,4812
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4812&path=4793,4812
https://tw.mall.yahoo.com/search?m=list&sid=watsons
https://tw.mall.yahoo.com/search?q=%E7%BE%8E%E8%88%92%E5%BE%8B&sid=watsons
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=3283&path=2689,3283
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4808&path=4794,4808
https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=4809&path=4794,4809
https://tw.mall.yahoo.com/search?q=%E8%92%82%E8%8A%AC%E5%A6%AE%E4%BA%9E&sid=watsons
https://member.watsons.com.tw/NewsView.aspx?NewsID=AhmGIZT9gReFoAM18vHFyA%3d%3d
https://member.watsons.com.tw/NewsView.aspx?NewsID=PIzM00tt2hmQiVi9trVqWg%3d%3d
https://tw.mall.yahoo.com/activity?p=mall-1-0-200422-member
https://tw.mall.yahoo.com/item/p0330180543850
https://tw.mall.yahoo.com/item/p0330231079018
https://tw.mall.yahoo.com/item/p0330149612638
https://tw.mall.yahoo.com/item/p033089261341
https://tw.mall.yahoo.com/item/p0330190914741
https://tw.mall.yahoo.com/item/p0330206204422
https://tw.mall.yahoo.com/item/p033053127759
https://tw.mall.yahoo.com/item/p0330207304791
https://tw.mall.yahoo.com/item/p0330228841925
https://tw.mall.yahoo.com/item/p0330226336955
https://tw.mall.yahoo.com/item/p0330119743323
https://tw.mall.yahoo.com/item/p0330180543850
https://tw.mall.yahoo.com/item/p0330106434075
https://tw.mall.yahoo.com/item/p0330229833822
https://tw.mall.yahoo.com/item/p033098709784
https://tw.mall.yahoo.com/item/p03304442775
https://tw.mall.yahoo.com/item/p0330226584503
https://tw.mall.yahoo.com/item/p0330142835205
https://tw.mall.yahoo.com/item/p0330230215997
https://tw.mall.yahoo.com/item/p03304614678
https://tw.mall.yahoo.com/item/p033014991670
https://tw.mall.yahoo.com/item/p033041688721
https://tw.mall.yahoo.com/item/p0330172392339
https://tw.mall.yahoo.com/item/p0330222713237
https://tw.mall.yahoo.com/item/p033058974074
https://tw.mall.yahoo.com/item/p0330143315411
https://tw.mall.yahoo.com/item/p0330220872850
https://tw.mall.yahoo.com/item/p0330231079018
https://tw.mall.yahoo.com/item/p033092697026
https://tw.mall.yahoo.com/item/p0330219296877
https://tw.mall.yahoo.com/item/p0330230912735
https://tw.mall.yahoo.com/item/p0330230651934
https://tw.mall.yahoo.com/item/p0330199709328
https://tw.mall.yahoo.com/item/p0330229012247
https://tw.mall.yahoo.com/item/p0330142835202
https://tw.mall.yahoo.com/item/p0330158688717
https://tw.mall.yahoo.com/item/p0330227790346
None
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=4792
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=4793
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=4794
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=2689
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=3122
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=3126
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=3285
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=3952
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=3415
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=2150
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=2797
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=3155
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=757
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=749
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=740
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=3063
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=3312
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=3352
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=731
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=3061
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=712
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=664
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=1553
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=2505
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=1452
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=1539
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=1060
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=1046
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=827
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=1003
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=1024
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=875
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=918
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=2937
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=958
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=979
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=1797
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=2691
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=3186
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=2865
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=939
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=892
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=1757
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=4431
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ccatid=2950
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pay=zero
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pay=card
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pay=store
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pay=install
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ship=711
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ship=family
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ship=hilife_pick
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?ship=hilife_cash_on_delivery
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?status=instk
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?status=video
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?minr=1
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?minr=2
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?minr=3
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?minr=4
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?minr=5
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?s=-createtime
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?s=-rating
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?s=-rating_count
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?s=price
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?s=-price
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?view=both
https://tw.mall.yahoo.com/item/p0330231079018
https://tw.mall.yahoo.com/item/p0330221397264
https://tw.mall.yahoo.com/item/p0330201617111
https://tw.mall.yahoo.com/item/p0330149612638
https://tw.mall.yahoo.com/item/p0330199239561
https://tw.mall.yahoo.com/item/p0330157722030
https://tw.mall.yahoo.com/item/p0330195847496
https://tw.mall.yahoo.com/item/p0330199516056
https://tw.mall.yahoo.com/item/p033018957484
https://tw.mall.yahoo.com/item/p03304110080
https://tw.mall.yahoo.com/item/p0330180543850
https://tw.mall.yahoo.com/item/p0330182407803
https://tw.mall.yahoo.com/item/p0330127191918
https://tw.mall.yahoo.com/item/p033016523362
https://tw.mall.yahoo.com/item/p0330223567951
https://tw.mall.yahoo.com/item/p0330195847488
https://tw.mall.yahoo.com/item/p0330175780701
https://tw.mall.yahoo.com/item/p0330157492679
https://tw.mall.yahoo.com/item/p03304614678
https://tw.mall.yahoo.com/item/p033018936979
https://tw.mall.yahoo.com/item/p0330128871794
https://tw.mall.yahoo.com/item/p0330208015628
https://tw.mall.yahoo.com/item/p0330230696349
https://tw.mall.yahoo.com/item/p0330225341921
https://tw.mall.yahoo.com/item/p033076908568
https://tw.mall.yahoo.com/item/p0330226395844
https://tw.mall.yahoo.com/item/p03304110059
https://tw.mall.yahoo.com/item/p03304109225
https://tw.mall.yahoo.com/item/p0330194282004
https://tw.mall.yahoo.com/item/p0330224835920
https://tw.mall.yahoo.com/item/p0330228042972
https://tw.mall.yahoo.com/item/p0330106434075
https://tw.mall.yahoo.com/item/p03304109964
https://tw.mall.yahoo.com/item/p0330162452392
https://tw.mall.yahoo.com/item/p0330207304791
https://tw.mall.yahoo.com/item/p033076908463
https://tw.mall.yahoo.com/item/p03304057564
https://tw.mall.yahoo.com/item/p033037459923
https://tw.mall.yahoo.com/item/p0330212962835
https://tw.mall.yahoo.com/item/p0330212400568
https://tw.mall.yahoo.com/item/p03304109924
https://tw.mall.yahoo.com/item/p03304109929
https://tw.mall.yahoo.com/item/p0330157347297
https://tw.mall.yahoo.com/item/p033069529068
https://tw.mall.yahoo.com/item/p0330202092777
https://tw.mall.yahoo.com/item/p0330158688717
https://tw.mall.yahoo.com/item/p0330226336941
https://tw.mall.yahoo.com/item/p0330223693370
https://tw.mall.yahoo.com/item/p0330111297761
https://tw.mall.yahoo.com/item/p0330223745715
https://tw.mall.yahoo.com/item/p0330230577786
https://tw.mall.yahoo.com/item/p03304110017
https://tw.mall.yahoo.com/item/p0330107487926
https://tw.mall.yahoo.com/item/p0330101184035
https://tw.mall.yahoo.com/item/p033017662348
https://tw.mall.yahoo.com/item/p033069529070
https://tw.mall.yahoo.com/item/p0330200970871
https://tw.mall.yahoo.com/item/p0330227790329
https://tw.mall.yahoo.com/item/p0330181952679
https://tw.mall.yahoo.com/item/p033064463660
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pg=2
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pg=3
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pg=4
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pg=5
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pg=6
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pg=7
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pg=8
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pg=9
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pg=10
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pg=11
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons?pg=2
https://tw.mall.yahoo.com/chat/watsons?rr=1637492343029
https://tw.mall.yahoo.com/store/%E5%B1%88%E8%87%A3%E6%B0%8FWatsons:watsons#backToTop
https://itunes.apple.com/tw/app/id778296354?mt=8
https://play.google.com/store/apps/details?id=com.yahoo.mobile.client.android.ecstore&hl=zh_TW
https://www.facebook.com/Ybestbuy
https://tw.mall.yahoo.com/activity?p=mall-1-0-200424-newcorp003
https://tw.mall.yahoo.com/help/help.html
https://tw.mall.yahoo.com/help/return.html
https://policies.yahoo.com/tw/zh-hant/yahoo/terms/utos/index.htm
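The list above mixes navigation, category, pagination, and item links. One way to see which URL shapes dominate, without reading the page text, is to tally the links by host and first path segment. The following is a minimal sketch using only the standard library; `tally_by_path_prefix` and `sample` are hypothetical names, and the sample values are a few entries copied from the output above.

```python
from collections import Counter
from urllib.parse import urlparse


def tally_by_path_prefix(hrefs):
    """Count links by host and first path segment, skipping missing hrefs."""
    counts = Counter()
    for href in hrefs:
        if not href:  # anchors without an href show up as None in the output
            continue
        parts = urlparse(href)
        segments = [s for s in parts.path.split("/") if s]
        first = segments[0] if segments else ""
        counts[f"{parts.netloc}/{first}"] += 1
    return counts


# A few values copied from the output above, plus one None.
sample = [
    "https://tw.mall.yahoo.com/item/p0330180543850",
    "https://tw.mall.yahoo.com/item/p0330231079018",
    "https://tw.mall.yahoo.com/search?m=list&sid=watsons&ccatid=2689&path=2689",
    "https://tw.mall.yahoo.com/store/watsons",
    None,
]
for key, count in tally_by_path_prefix(sample).most_common():
    print(count, key)
```

In a live session, the input would be the list of `link.get_attribute('href')` values rather than the hard-coded sample.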
Because I can't read the language of the page, I could not tell which links are products.
I think they can be located with this XPath:
//a[@rel='nofollow']
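To illustrate what an attribute predicate such as `[@rel='nofollow']` selects, here is a small stand-in that runs without a browser. It uses the standard library's `xml.etree.ElementTree` rather than Selenium, and the markup in `SAMPLE` is hypothetical: it assumes product anchors carry `rel="nofollow"`, which the thread only guesses at.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is far larger and its
# structure is not shown in the thread.
SAMPLE = """
<ul>
  <li><a href="https://tw.mall.yahoo.com/item/p0330180543850" rel="nofollow">A</a></li>
  <li><a href="https://tw.mall.yahoo.com/store/watsons">Store</a></li>
  <li><a href="https://tw.mall.yahoo.com/item/p0330231079018" rel="nofollow">B</a></li>
</ul>
""".strip()

root = ET.fromstring(SAMPLE)
# ElementTree needs a leading "." for a search relative to the root element;
# in Selenium the document-wide form would be //a[@rel='nofollow'].
nofollow_links = [a.get("href") for a in root.findall(".//a[@rel='nofollow']")]
print(nofollow_links)
```

Only the two anchors with `rel="nofollow"` are returned; the store link is skipped. Whether that attribute really distinguishes products on the live page would need to be checked in the browser's inspector.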
Update 1:
linkPath = "//a"
product_links = browser.find_elements(By.XPATH, linkPath)
print(len(product_links))
for link in product_links:
    address = link.get_attribute('href')
    # Anchors without an href return None, so check before the substring test.
    if address and '/item/p' in address:
        print(address)
Output :
https://tw.mall.yahoo.com/item/p0330180543850
https://tw.mall.yahoo.com/item/p0330231079018
https://tw.mall.yahoo.com/item/p0330149612638
https://tw.mall.yahoo.com/item/p033089261341
https://tw.mall.yahoo.com/item/p0330190914741
https://tw.mall.yahoo.com/item/p0330206204422
https://tw.mall.yahoo.com/item/p033053127759
https://tw.mall.yahoo.com/item/p0330207304791
https://tw.mall.yahoo.com/item/p0330228841925
https://tw.mall.yahoo.com/item/p0330226336955
https://tw.mall.yahoo.com/item/p0330230651934
https://tw.mall.yahoo.com/item/p033098709784
https://tw.mall.yahoo.com/item/p0330229833822
https://tw.mall.yahoo.com/item/p0330142835202
https://tw.mall.yahoo.com/item/p0330227790346
https://tw.mall.yahoo.com/item/p0330172392339
https://tw.mall.yahoo.com/item/p0330226584503
https://tw.mall.yahoo.com/item/p0330220872850
https://tw.mall.yahoo.com/item/p0330180543850
https://tw.mall.yahoo.com/item/p0330230215997
https://tw.mall.yahoo.com/item/p0330199709328
https://tw.mall.yahoo.com/item/p0330229012247
https://tw.mall.yahoo.com/item/p0330142835205
https://tw.mall.yahoo.com/item/p03304614678
https://tw.mall.yahoo.com/item/p033058974074
https://tw.mall.yahoo.com/item/p03304442775
https://tw.mall.yahoo.com/item/p033092697026
https://tw.mall.yahoo.com/item/p0330230912735
https://tw.mall.yahoo.com/item/p0330106434075
https://tw.mall.yahoo.com/item/p0330222713237
https://tw.mall.yahoo.com/item/p0330119743323
https://tw.mall.yahoo.com/item/p033041688721
https://tw.mall.yahoo.com/item/p0330231079018
https://tw.mall.yahoo.com/item/p0330158688717
https://tw.mall.yahoo.com/item/p033014991670
https://tw.mall.yahoo.com/item/p0330219296877
https://tw.mall.yahoo.com/item/p0330143315411
https://tw.mall.yahoo.com/item/p0330231079018
https://tw.mall.yahoo.com/item/p0330221397264
https://tw.mall.yahoo.com/item/p0330201617111
https://tw.mall.yahoo.com/item/p0330149612638
https://tw.mall.yahoo.com/item/p0330199239561
https://tw.mall.yahoo.com/item/p0330157722030
https://tw.mall.yahoo.com/item/p0330195847496
https://tw.mall.yahoo.com/item/p0330199516056
https://tw.mall.yahoo.com/item/p033018957484
https://tw.mall.yahoo.com/item/p03304110080
https://tw.mall.yahoo.com/item/p0330180543850
https://tw.mall.yahoo.com/item/p0330182407803
https://tw.mall.yahoo.com/item/p0330127191918
https://tw.mall.yahoo.com/item/p033016523362
https://tw.mall.yahoo.com/item/p0330223567951
https://tw.mall.yahoo.com/item/p0330195847488
https://tw.mall.yahoo.com/item/p0330175780701
https://tw.mall.yahoo.com/item/p0330157492679
https://tw.mall.yahoo.com/item/p03304614678
https://tw.mall.yahoo.com/item/p033018936979
https://tw.mall.yahoo.com/item/p0330128871794
https://tw.mall.yahoo.com/item/p0330208015628
https://tw.mall.yahoo.com/item/p0330230696349
https://tw.mall.yahoo.com/item/p0330225341921
https://tw.mall.yahoo.com/item/p033076908568
https://tw.mall.yahoo.com/item/p0330226395844
https://tw.mall.yahoo.com/item/p03304110059
https://tw.mall.yahoo.com/item/p0330194282004
https://tw.mall.yahoo.com/item/p03304109225
https://tw.mall.yahoo.com/item/p0330224835920
https://tw.mall.yahoo.com/item/p0330228042972
https://tw.mall.yahoo.com/item/p0330106434075
https://tw.mall.yahoo.com/item/p0330162452392
https://tw.mall.yahoo.com/item/p03304109964
https://tw.mall.yahoo.com/item/p0330207304791
https://tw.mall.yahoo.com/item/p03304057564
https://tw.mall.yahoo.com/item/p033076908463
https://tw.mall.yahoo.com/item/p033037459923
https://tw.mall.yahoo.com/item/p0330212962835
https://tw.mall.yahoo.com/item/p0330157347297
https://tw.mall.yahoo.com/item/p03304109929
https://tw.mall.yahoo.com/item/p0330212400568
https://tw.mall.yahoo.com/item/p03304109924
https://tw.mall.yahoo.com/item/p033069529068
https://tw.mall.yahoo.com/item/p0330202092777
https://tw.mall.yahoo.com/item/p0330158688717
https://tw.mall.yahoo.com/item/p0330226336941
https://tw.mall.yahoo.com/item/p0330223693370
https://tw.mall.yahoo.com/item/p0330111297761
https://tw.mall.yahoo.com/item/p0330223745715
https://tw.mall.yahoo.com/item/p0330230577786
https://tw.mall.yahoo.com/item/p0330101184035
https://tw.mall.yahoo.com/item/p03304110017
https://tw.mall.yahoo.com/item/p0330107487926
https://tw.mall.yahoo.com/item/p033017662348
https://tw.mall.yahoo.com/item/p0330200970871
https://tw.mall.yahoo.com/item/p033069529070
https://tw.mall.yahoo.com/item/p0330227790329
https://tw.mall.yahoo.com/item/p033064463660
https://tw.mall.yahoo.com/item/p0330181952679
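The filtered output above still contains repeats (for example, `p0330180543850` appears several times), presumably because the same product is linked from more than one place on the page. A minimal sketch of removing duplicates while keeping first-seen order follows; `unique_product_links` is a hypothetical helper, not part of Selenium, and `sample` reuses a few values from the output.

```python
def unique_product_links(hrefs, marker="/item/p"):
    """Return hrefs containing `marker`, first occurrence only, in order."""
    seen = set()
    unique = []
    for href in hrefs:
        # Skip None (anchors without href), non-product links, and repeats.
        if href and marker in href and href not in seen:
            seen.add(href)
            unique.append(href)
    return unique


sample = [
    "https://tw.mall.yahoo.com/item/p0330180543850",
    "https://tw.mall.yahoo.com/item/p0330231079018",
    None,
    "https://tw.mall.yahoo.com/store/watsons",
    "https://tw.mall.yahoo.com/item/p0330180543850",
]
print(unique_product_links(sample))
```

With Selenium, the input would be `[link.get_attribute('href') for link in product_links]`.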
I'm trying to grab just the first href in each row of an HTML table. Using find_all on the soup object doesn't work because there are multiple tables, so I used soup.select() to isolate just that table and work from there, but that doesn't seem to work either.
I tried using find_all on the soup object alone, and I tried looping through the table rows with find(), but got an error saying it returns 'NoneType'.
I would like to end up with a list that starts ['/players/a/abrinal01.html', '/players/a/acyqu01.html', ...]
import requests
import bs4

url = 'https://www.basketball-reference.com/leagues/NBA_2019_per_game.html'
res = requests.get(url)
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')
table = soup.find("table", { "id" : "per_game_stats" })
You can access the desired data by anchoring the parsing from the outer div wrapper with the id of all_per_game_stats:
import requests
from bs4 import BeautifulSoup as soup
d = soup(requests.get('https://www.basketball-reference.com/leagues/NBA_2019_per_game.html').text, 'html.parser')
data = [b.td.a['href'] for b in d.find('div', {'id':'all_per_game_stats'}).table.find_all('tr') if b.td]
Output:
['/players/a/abrinal01.html', '/players/a/acyqu01.html', '/players/a/adamsja01.html', '/players/a/adamsst01.html', '/players/a/adebaba01.html', '/players/a/adelde01.html', '/players/a/akoonde01.html', '/players/a/aldrila01.html', '/players/a/alkinra01.html', '/players/a/allengr01.html', '/players/a/allenja01.html', '/players/a/allenka01.html', '/players/a/aminual01.html', '/players/a/anderju01.html', '/players/a/anderky01.html', '/players/a/anderry01.html', '/players/a/anderry01.html', '/players/a/anderry01.html', '/players/a/anigbik01.html', '/players/a/antetgi01.html', '/players/a/antetko01.html', '/players/a/anthoca01.html', '/players/a/anunoog01.html', '/players/a/arcidry01.html', '/players/a/arizatr01.html', '/players/a/arizatr01.html', '/players/a/arizatr01.html', '/players/a/augusdj01.html', '/players/a/aytonde01.html', '/players/b/bacondw01.html', '/players/b/baglema01.html', '/players/b/bakerro01.html', '/players/b/bakerro01.html', '/players/b/bakerro01.html', '/players/b/baldwwa01.html', '/players/b/balllo01.html', '/players/b/bambamo01.html', '/players/b/bareajo01.html', '/players/b/barneha02.html', '/players/b/barneha02.html', '/players/b/barneha02.html', '/players/b/bartowi01.html', '/players/b/bateske01.html', '/players/b/batumni01.html', '/players/b/bayleje01.html', '/players/b/baynear01.html', '/players/b/bazemke01.html', '/players/b/bealbr01.html', '/players/b/beaslma01.html', '/players/b/beaslmi01.html', '/players/b/belinma01.html', '/players/b/belljo01.html', '/players/b/bembrde01.html', '/players/b/bendedr01.html', '/players/b/bertada02.html', '/players/b/bertada01.html', '/players/b/beverpa01.html', '/players/b/birchkh01.html', '/players/b/biyombi01.html', '/players/b/bjeline01.html', '/players/b/blakean01.html', '/players/b/bledser01.html', '/players/b/blossja01.html', '/players/b/bogdabo01.html', '/players/b/bogdabo02.html', '/players/b/bogutan01.html', '/players/b/boldejo01.html', '/players/b/bongais01.html', '/players/b/bookede01.html', 
'/players/b/bouchch01.html', '/players/b/bradlav01.html', '/players/b/bradlav01.html', '/players/b/bradlav01.html', '/players/b/bradlto01.html', '/players/b/breweco01.html', '/players/b/breweco01.html', '/players/b/breweco01.html', '/players/b/bridgmi01.html', '/players/b/bridgmi02.html', '/players/b/briscis01.html', '/players/b/broekry01.html', '/players/b/brogdma01.html', '/players/b/brookdi01.html', '/players/b/brookma01.html', '/players/b/brownbr01.html', '/players/b/brownja02.html', '/players/b/brownlo01.html', '/players/b/brownst02.html', '/players/b/browntr01.html', '/players/b/brunsja01.html', '/players/b/bryanth01.html', '/players/b/bullore01.html', '/players/b/bullore01.html', '/players/b/bullore01.html', '/players/b/burketr01.html', '/players/b/burketr01.html', '/players/b/burketr01.html', '/players/b/burksal01.html', '/players/b/burksal01.html', '/players/b/burksal01.html', '/players/b/burksal01.html', '/players/b/burtode02.html', '/players/b/butleji01.html', '/players/b/butleji01.html', '/players/b/butleji01.html', '/players/c/cabocbr01.html', '/players/c/caldejo01.html', '/players/c/caldwke01.html', '/players/c/canaais01.html', '/players/c/canaais01.html', '/players/c/canaais01.html', '/players/c/canaais01.html', '/players/c/capelca01.html', '/players/c/carrode01.html', '/players/c/carteje01.html', '/players/c/cartevi01.html', '/players/c/cartewe01.html', '/players/c/cartemi01.html', '/players/c/cartemi01.html', '/players/c/cartemi01.html', '/players/c/carusal01.html', '/players/c/casspom01.html', '/players/c/caulewi01.html', '/players/c/caupatr01.html', '/players/c/cavanty01.html', '/players/c/chandty01.html', '/players/c/chandty01.html', '/players/c/chandty01.html', '/players/c/chandwi01.html', '/players/c/chandwi01.html', '/players/c/chandwi01.html', '/players/c/chealjo01.html', '/players/c/chiozch01.html', '/players/c/chrisma01.html', '/players/c/chrisma01.html', '/players/c/chrisma01.html', '/players/c/clarkga01.html', 
'/players/c/clarkia01.html', '/players/c/clarkjo01.html', '/players/c/collijo01.html', '/players/c/colliza01.html', '/players/c/collida01.html', '/players/c/colsobo01.html', '/players/c/conlemi01.html', '/players/c/connapa01.html', '/players/c/cookqu01.html', '/players/c/couside01.html', '/players/c/covinro01.html', '/players/c/covinro01.html', '/players/c/covinro01.html', '/players/c/crabbal01.html', '/players/c/craigto01.html', '/players/c/crawfja01.html', '/players/c/creekmi01.html', '/players/c/creekmi01.html', '/players/c/creekmi01.html', '/players/c/crowdja01.html', '/players/c/cunnida01.html', '/players/c/curryse01.html', '/players/c/curryst01.html', '/players/d/danietr01.html', '/players/d/davisan02.html', '/players/d/davisde01.html', '/players/d/davised01.html', '/players/d/davisty01.html', '/players/d/dedmode01.html', '/players/d/dekkesa01.html', '/players/d/dekkesa01.html', '/players/d/dekkesa01.html', '/players/d/delgaan01.html', '/players/d/dellama01.html', '/players/d/dellama01.html', '/players/d/dellama01.html', '/players/d/denglu01.html', '/players/d/derozde01.html', '/players/d/derrima01.html', '/players/d/diallch01.html', '/players/d/diallha01.html', '/players/d/dienggo01.html', '/players/d/dinwisp01.html', '/players/d/divindo01.html', '/players/d/doncilu01.html', '/players/d/dorsety01.html', '/players/d/dorsety01.html', '/players/d/dorsety01.html', '/players/d/dotsoda01.html', '/players/d/doziepj01.html', '/players/d/dragigo01.html', '/players/d/drumman01.html', '/players/d/dudleja01.html', '/players/d/dunnkr01.html', '/players/d/duranke01.html', '/players/d/duvaltr01.html', '/players/e/edwarvi01.html', '/players/e/ellenhe01.html', '/players/e/ellenhe01.html', '/players/e/ellenhe01.html', '/players/e/ellinwa01.html', '/players/e/ellinwa01.html', '/players/e/ellinwa01.html', '/players/e/embiijo01.html', '/players/e/ennisja01.html', '/players/e/ennisja01.html', '/players/e/ennisja01.html', '/players/e/eubandr01.html', '/players/e/evansja02.html', 
'/players/e/evansja01.html', '/players/e/evansja01.html', '/players/e/evansja01.html', '/players/e/evansty01.html', '/players/e/exumda01.html', '/players/f/farieke01.html', '/players/f/farieke01.html', '/players/f/farieke01.html', '/players/f/favorde01.html', '/players/f/feliccr01.html', '/players/f/feltora01.html', '/players/f/fergute01.html', '/players/f/ferreyo01.html', '/players/f/finnedo01.html', '/players/f/forbebr01.html', '/players/f/fournev01.html', '/players/f/foxde01.html', '/players/f/frazime01.html', '/players/f/fraziti01.html', '/players/f/fraziti01.html', '/players/f/fraziti01.html', '/players/f/fredeji01.html', '/players/f/fryech01.html', '/players/f/fultzma01.html', '/players/g/gallida01.html', '/players/g/gallola01.html', '/players/g/garrebi01.html', '/players/g/gasolma01.html', '/players/g/gasolma01.html', '/players/g/gasolma01.html', '/players/g/gasolpa01.html', '/players/g/gasolpa01.html', '/players/g/gasolpa01.html', '/players/g/gayru01.html', '/players/g/georgpa01.html', '/players/g/gibsota01.html', '/players/g/gilesha01.html', '/players/g/gilgesh01.html', '/players/g/goberru01.html', '/players/g/goodwbr01.html', '/players/g/gordoaa01.html', '/players/g/gordoer01.html', '/players/g/gortama01.html', '/players/g/grahade01.html', '/players/g/grahatr01.html', '/players/g/grantje01.html', '/players/g/grantje02.html', '/players/g/grantdo01.html', '/players/g/greenda02.html', '/players/g/greendr01.html', '/players/g/greenge01.html', '/players/g/greenja01.html', '/players/g/greenja01.html', '/players/g/greenja01.html', '/players/g/greenje02.html', '/players/g/griffbl01.html', '/players/h/hamilda02.html', '/players/h/hannadu01.html', '/players/h/hardati02.html', '/players/h/hardati02.html', '/players/h/hardati02.html', '/players/h/hardeja01.html', '/players/h/harklma01.html', '/players/h/harremo01.html', '/players/h/harride01.html', '/players/h/harriga01.html', '/players/h/harrijo01.html', '/players/h/harrito02.html', '/players/h/harrito02.html', 
'/players/h/harrito02.html', '/players/h/harrian01.html', '/players/h/harrian01.html', '/players/h/harrian01.html', '/players/h/harrian01.html', '/players/h/harrish01.html', '/players/h/hartjo01.html', '/players/h/harteis01.html', '/players/h/hasleud01.html', '/players/h/haywago01.html', '/players/h/hensojo01.html', '/players/h/hernaju01.html', '/players/h/hernawi01.html', '/players/h/hezonma01.html', '/players/h/hicksis01.html', '/players/h/hieldbu01.html', '/players/h/highsha01.html', '/players/h/hilarne01.html', '/players/h/hillge01.html', '/players/h/hillge01.html', '/players/h/hillge01.html', '/players/h/hillso01.html', '/players/h/holidaa01.html', '/players/h/holidjr01.html', '/players/h/holidju01.html', '/players/h/holidju01.html', '/players/h/holidju01.html', '/players/h/hollajo02.html', '/players/h/holliro01.html', '/players/h/holmeri01.html', '/players/h/hoodro01.html', '/players/h/hoodro01.html', '/players/h/hoodro01.html', '/players/h/horfoal01.html', '/players/h/houseda01.html', '/players/h/howardw01.html', '/players/h/huertke01.html', '/players/h/humphis01.html', '/players/h/hunterj01.html', '/players/h/hutchch01.html', '/players/i/ibakase01.html', '/players/i/iguodan01.html', '/players/i/ilyaser01.html', '/players/i/inglejo01.html', '/players/i/ingraan01.html', '/players/i/ingrabr01.html', '/players/i/irvinky01.html', '/players/i/isaacjo01.html', '/players/i/iwundwe01.html', '/players/j/jacksde01.html', '/players/j/jacksfr01.html', '/players/j/jacksja02.html', '/players/j/jacksjo02.html', '/players/j/jacksju01.html', '/players/j/jacksju01.html', '/players/j/jacksju01.html', '/players/j/jacksre01.html', '/players/j/jamesle01.html', '/players/j/jeffeam01.html', '/players/j/jenkijo01.html', '/players/j/jenkijo01.html', '/players/j/jenkijo01.html', '/players/j/jerebjo01.html', '/players/j/johnsal02.html', '/players/j/johnsam01.html', '/players/j/johnsbj01.html', '/players/j/johnsbj01.html', '/players/j/johnsbj01.html', '/players/j/johnsja01.html', 
'/players/j/johnsst04.html', '/players/j/johnsst04.html', '/players/j/johnsst04.html', '/players/j/johnsty01.html', '/players/j/johnsty01.html', '/players/j/johnsty01.html', '/players/j/johnswe01.html', '/players/j/johnswe01.html', '/players/j/johnswe01.html', '/players/j/jokicni01.html', '/players/j/jonesda03.html', '/players/j/jonesde02.html', '/players/j/jonesja04.html', '/players/j/jonesje01.html', '/players/j/joneste01.html', '/players/j/jonesty01.html', '/players/j/jordade01.html', '/players/j/jordade01.html', '/players/j/jordade01.html', '/players/j/josepco01.html', '/players/k/kaminfr01.html', '/players/k/kanteen01.html', '/players/k/kanteen01.html', '/players/k/kanteen01.html', '/players/k/kennalu01.html', '/players/k/kiddgmi01.html', '/players/k/kingge03.html', '/players/k/klebima01.html', '/players/k/knighbr03.html', '/players/k/knighbr03.html', '/players/k/knighbr03.html', '/players/k/knoxke01.html', '/players/k/korkmfu01.html', '/players/k/kornelu01.html', '/players/k/korveky01.html', '/players/k/korveky01.html', '/players/k/korveky01.html', '/players/k/koufoko01.html', '/players/k/kurucro01.html', '/players/k/kuzmaky01.html', '/players/l/labissk01.html', '/players/l/labissk01.html', '/players/l/labissk01.html', '/players/l/lambje01.html', '/players/l/lavinza01.html', '/players/l/laymaja01.html', '/players/l/leaftj01.html', '/players/l/leeco01.html', '/players/l/leeco01.html', '/players/l/leeco01.html', '/players/l/leeda03.html', '/players/l/lemonwa01.html', '/players/l/lenal01.html', '/players/l/leonaka01.html', '/players/l/leoname01.html', '/players/l/leuerjo01.html', '/players/l/leverca01.html', '/players/l/lillada01.html', '/players/l/linje01.html', '/players/l/linje01.html', '/players/l/linje01.html', '/players/l/livinsh01.html', '/players/l/loftoza01.html', '/players/l/looneke01.html', '/players/l/lopezbr01.html', '/players/l/lopezro01.html', '/players/l/loveke01.html', '/players/l/lowryky01.html', '/players/l/loydjo01.html', 
'/players/l/lucaska01.html', '/players/l/luwawti01.html', '/players/l/luwawti01.html', '/players/l/luwawti01.html', '/players/l/lydonty01.html', '/players/l/lylestr01.html', '/players/m/machasc01.html', '/players/m/macksh01.html', '/players/m/macksh01.html', '/players/m/macksh01.html', '/players/m/maconda01.html', '/players/m/macurjp01.html', '/players/m/mahinia01.html', '/players/m/makerth01.html', '/players/m/makerth01.html', '/players/m/makerth01.html', '/players/m/marjabo01.html', '/players/m/marjabo01.html', '/players/m/marjabo01.html', '/players/m/markkla01.html', '/players/m/martija01.html', '/players/m/masonfr01.html', '/players/m/matenya01.html', '/players/m/matthwe02.html', '/players/m/matthwe02.html', '/players/m/matthwe02.html', '/players/m/matthwe02.html', '/players/m/mbahalu01.html', '/players/m/mccalta01.html', '/players/m/mccawpa01.html', '/players/m/mccawpa01.html', '/players/m/mccawpa01.html', '/players/m/mccolcj01.html', '/players/m/mccontj01.html', '/players/m/mcderdo01.html', '/players/m/mcgeeja01.html', '/players/m/mcgruro01.html', '/players/m/mckinal01.html', '/players/m/mclembe01.html', '/players/m/mcraejo01.html', '/players/m/meeksjo01.html', '/players/m/mejrisa01.html', '/players/m/meltode01.html', '/players/m/metuch01.html', '/players/m/middlkh01.html', '/players/m/milescj01.html', '/players/m/milescj01.html', '/players/m/milescj01.html', '/players/m/milleda01.html', '/players/m/millema01.html', '/players/m/millspa02.html', '/players/m/millspa01.html', '/players/m/miltosh01.html', '/players/m/mirotni01.html', '/players/m/mirotni01.html', '/players/m/mirotni01.html', '/players/m/mitchdo01.html', '/players/m/mitrona01.html', '/players/m/monkma01.html', '/players/m/monrogr01.html', '/players/m/monrogr01.html', '/players/m/monrogr01.html', '/players/m/monrogr01.html', '/players/m/mooreet01.html', '/players/m/moreler01.html', '/players/m/moreler01.html', '/players/m/moreler01.html', '/players/m/morrija01.html', '/players/m/morrima03.html', 
'/players/m/morrima02.html', '/players/m/morrima02.html', '/players/m/morrima02.html', '/players/m/morrimo01.html', '/players/m/motiedo01.html', '/players/m/motlejo01.html', '/players/m/mudiaem01.html', '/players/m/murraja01.html', '/players/m/musadz01.html', '/players/m/muscami01.html', '/players/m/muscami01.html', '/players/m/muscami01.html', '/players/m/mykhasv01.html', '/players/m/mykhasv01.html', '/players/m/mykhasv01.html', '/players/n/naderab01.html', '/players/n/nancela02.html', '/players/n/napiesh01.html', '/players/n/netora01.html', '/players/n/niangge01.html', '/players/n/noahjo01.html', '/players/n/noelne01.html', '/players/n/nowitdi01.html', '/players/n/ntilila01.html', '/players/n/nunnaja01.html', '/players/n/nunnaja01.html', '/players/n/nunnaja01.html', '/players/n/nurkiju01.html', '/players/n/nwabada01.html', '/players/o/onealro01.html', '/players/o/oquinky01.html', '/players/o/ojelese01.html', '/players/o/okafoja01.html', '/players/o/okoboel01.html', '/players/o/okogijo01.html', '/players/o/oladivi01.html', '/players/o/olynyke01.html', '/players/o/osmande01.html', '/players/o/oubreke01.html', '/players/o/oubreke01.html', '/players/o/oubreke01.html', '/players/p/pachuza01.html', '/players/p/parkeja01.html', '/players/p/parkeja01.html', '/players/p/parkeja01.html', '/players/p/parketo01.html', '/players/p/parsoch01.html', '/players/p/pattepa01.html', '/players/p/pattoju01.html', '/players/p/paulch01.html', '/players/p/payneca01.html', '/players/p/payneca01.html', '/players/p/payneca01.html', '/players/p/paytoel01.html', '/players/p/paytoga02.html', '/players/p/pinsoth01.html', '/players/p/plumlma01.html', '/players/p/plumlmi01.html', '/players/p/poeltja01.html', '/players/p/pondequ01.html', '/players/p/porteot01.html', '/players/p/porteot01.html', '/players/p/porteot01.html', '/players/p/portibo01.html', '/players/p/portibo01.html', '/players/p/portibo01.html', '/players/p/poweldw01.html', '/players/p/powelno01.html', '/players/p/poythal01.html', 
'/players/q/qizh01.html', '/players/r/rabbiv01.html', '/players/r/randlch01.html', '/players/r/randlju01.html', '/players/r/redicjj01.html', '/players/r/reedda01.html', '/players/r/reynoca01.html', '/players/r/richajo01.html', '/players/r/richama01.html', '/players/r/riverau01.html', '/players/r/riverau01.html', '/players/r/riverau01.html', '/players/r/robinde01.html', '/players/r/robindu01.html', '/players/r/robingl02.html', '/players/r/robinje01.html', '/players/r/robinmi01.html', '/players/r/rondora01.html', '/players/r/rosede01.html', '/players/r/rosste01.html', '/players/r/roziete01.html', '/players/r/rubiori01.html', '/players/r/russeda01.html', '/players/s/sabondo01.html', '/players/s/sampsbr01.html', '/players/s/sampsja02.html', '/players/s/saricda01.html', '/players/s/saricda01.html', '/players/s/saricda01.html', '/players/s/satorto01.html', '/players/s/schrode01.html', '/players/s/scottmi01.html', '/players/s/scottmi01.html', '/players/s/scottmi01.html', '/players/s/sefolth01.html', '/players/s/seldewa01.html', '/players/s/seldewa01.html', '/players/s/seldewa01.html', '/players/s/sextoco01.html', '/players/s/shamela01.html', '/players/s/shamela01.html', '/players/s/shamela01.html', '/players/s/shumpim01.html', '/players/s/shumpim01.html', '/players/s/shumpim01.html', '/players/s/siakapa01.html', '/players/s/siberjo01.html', '/players/s/simmobe01.html', '/players/s/simmojo02.html', '/players/s/simmojo02.html', '/players/s/simmojo02.html', '/players/s/simmoko01.html', '/players/s/simonan01.html', '/players/s/smartma01.html', '/players/s/smithde03.html', '/players/s/smithde03.html', '/players/s/smithde03.html', '/players/s/smithis01.html', '/players/s/smithjr01.html', '/players/s/smithja02.html', '/players/s/smithja02.html', '/players/s/smithja02.html', '/players/s/smithja02.html', '/players/s/smithzh01.html', '/players/s/snellto01.html', '/players/s/spaldra01.html', '/players/s/spaldra01.html', '/players/s/spaldra01.html', '/players/s/spellom01.html', 
'/players/s/stausni01.html', '/players/s/stausni01.html', '/players/s/stausni01.html', '/players/s/stephdj01.html', '/players/s/stephla01.html', '/players/s/sumneed01.html', '/players/s/swanica01.html', '/players/s/swanica01.html', '/players/s/swanica01.html', '/players/t/tatumja01.html', '/players/t/teaguje01.html', '/players/t/templga01.html', '/players/t/templga01.html', '/players/t/templga01.html', '/players/t/teodomi01.html', '/players/t/terreja01.html', '/players/t/terryem01.html', '/players/t/terryem01.html', '/players/t/terryem01.html', '/players/t/theisda01.html', '/players/t/thomais02.html', '/players/t/thomakh01.html', '/players/t/thomala01.html', '/players/t/thompkl01.html', '/players/t/thomptr01.html', '/players/t/thornsi01.html', '/players/t/tollian01.html', '/players/t/townska01.html', '/players/t/trentga02.html', '/players/t/trieral01.html', '/players/t/tuckepj01.html', '/players/t/turneev01.html', '/players/t/turnemy01.html', '/players/u/udohek01.html', '/players/u/ulisty01.html', '/players/v/valanjo01.html', '/players/v/valanjo01.html', '/players/v/valanjo01.html', '/players/v/vandeja01.html', '/players/v/vanvlfr01.html', '/players/v/vonleno01.html', '/players/v/vucevni01.html', '/players/w/wadedw01.html', '/players/w/wagnemo01.html', '/players/w/waitedi01.html', '/players/w/walkeke02.html', '/players/w/walkelo01.html', '/players/w/walljo01.html', '/players/w/wallaty01.html', '/players/p/princta02.html', '/players/w/wanambr01.html', '/players/w/warretj01.html', '/players/w/washbju01.html', '/players/w/watanyu01.html', '/players/w/welshth01.html', '/players/w/westbru01.html', '/players/w/whitede01.html', '/players/w/whiteok01.html', '/players/w/whiteha01.html', '/players/w/wiggian01.html', '/players/w/willial03.html', '/players/w/willicj01.html', '/players/w/willijo04.html', '/players/w/willike04.html', '/players/w/willilo02.html', '/players/w/willima02.html', '/players/w/williro04.html', '/players/w/willitr02.html', '/players/w/wilsodj01.html', 
'/players/w/winslju01.html', '/players/w/woodch01.html', '/players/w/woodch01.html', '/players/w/woodch01.html', '/players/w/wrighde01.html', '/players/w/wrighde01.html', '/players/w/wrighde01.html', '/players/y/yabusgu01.html', '/players/y/youngni01.html', '/players/y/youngth01.html', '/players/y/youngtr01.html', '/players/z/zelleco01.html', '/players/z/zellety01.html', '/players/z/zellety01.html', '/players/z/zellety01.html', '/players/z/zizican01.html', '/players/z/zubaciv01.html', '/players/z/zubaciv01.html', '/players/z/zubaciv01.html']
I would use a set comprehension to remove duplicates, and I think nth-of-type reads more cleanly for selecting the appropriate column. This uses bs4 4.7.1.
import requests
from bs4 import BeautifulSoup as bs
soup = bs(requests.get('https://www.basketball-reference.com/leagues/NBA_2019_per_game.html').text, 'html.parser')
links = {i['href'] for i in soup.select('#per_game_stats td:nth-of-type(1) a')}
print(links)
You could also use the following css selector:
[csk] > a
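One side effect of the set comprehension is that the table order of the players is lost, since sets are unordered. If the order matters, a minimal sketch is to deduplicate with dict.fromkeys, which keeps the first occurrence of each link. The short list below is a hypothetical stand-in for the scraped hrefs, with values copied from the output above:

```python
# Hypothetical sample standing in for the scraped hrefs
hrefs = [
    '/players/c/covinro01.html',
    '/players/c/covinro01.html',
    '/players/c/covinro01.html',
    '/players/c/crabbal01.html',
    '/players/c/creekmi01.html',
    '/players/c/creekmi01.html',
]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this removes repeats without reshuffling the list
unique_links = list(dict.fromkeys(hrefs))
print(unique_links)
```

With the real page, the same call could wrap the list of hrefs selected from the soup instead of the sample list.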
I am trying to write a python script that goes onto the website https://www.premierleague.com/players, takes a list of football player names from a spreadsheet I have (400+ footballer names), and inside a loop, iteratively searches for the link to each football player's page. For example : https://www.premierleague.com/players/4040/Benik-Afobe/overview.
The final bit of the script is commented out as I haven't finalised that yet, but for context of what I'm eventually going to get to: it will take the list of urls that I will have obtained, iteratively search each player's page for the link to the player image, and append it to a list.
I managed to get it to work for an individual player (Benik Afobe), but since adding the 'players_list' and trying a loop, I get the following error:
Traceback (most recent call last):
File "C:/Users/Liam/Documents/GitHub/Football_Scraping/fantast_pl_images.py", line 33, in <module>
player_link = soup.find('a', href=re.compile('%s'))['href'] %player
TypeError: 'NoneType' object is not subscriptable
Does anyone know what I'm doing wrong and how to get my loop working?
The Repo of my project can be found here: https://github.com/leej11/Football_Scraping
# Import the Libraries that I need
import urllib3
import certifi
from bs4 import BeautifulSoup
import re
import pandas as pd
# Specify the URL
url = 'https://www.premierleague.com/players'
http = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where())
response = http.request('GET', url, headers={'User-Agent': 'Mozilla/5.0'})
#Parse the html using beautiful soup and store in variable 'soup'
soup = BeautifulSoup(response.data, "html.parser")
#Importing the list of players I want to scrape the image of
players_list = pd.read_csv('epl_players_anki_clean.csv')
#Test that it's pulling all of the players names correctly
print(players_list.iloc[:,0])
print (type(players_list))
#Convert the pandas dataframe to a list of strings, with each item being the string of a player name
list_of_players = players_list['name'].values.tolist()
print(list_of_players)
#Setup an empty list to append the player links to
player_link_list = []
#Loop over the list of player names, and search for the player url and append it to the player_link_list
for player in players_list:
    player_link = soup.find('a', href=re.compile('%s'))['href'] %player
    print(player_link)
    player_url = 'https://www.premierleague.com' + '%s' %player_link
    print(player_url)
    player_link_list.append(player_url)
##### The final step
##### To be worked on in a bit, basically take the list of links and loop over it pulling out the player image links and appending them to a list
#####
# url2 = player_url
# http2 = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where())
# response2 = http2.request('GET', url2, headers={'User-Agent': 'Mozilla/5.0'})
# soup2 = BeautifulSoup(response2.data, "html.parser")
# player_img = str(soup2.find("img", {'alt':'Benik Afobe'})['data-player'])
# print(player_img)
#
# photo_link = 'http://platform-static-files.s3.amazonaws.com/premierleague/photos/players/250x250/' + '%s' %player_img + '.png'
# print(photo_link)
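One thing to check in the loop line itself: re.compile('%s') compiles the literal text %s, and % player is only applied after soup.find(...) has already returned None, which is what the traceback reports. Iterating over a DataFrame directly also yields column labels rather than rows, so list_of_players is probably the intended iterable. A minimal sketch of the difference, assuming the site's links use the hyphenated name as in the Benik-Afobe example; the hrefs list and the find_link helper are hypothetical stand-ins for the real soup:

```python
import re

# Hypothetical hrefs standing in for the <a> tags on the page
hrefs = [
    '/players/4040/Benik-Afobe/overview',
    '/players/3960/Harry-Kane/overview',
]

def find_link(name, links):
    """Return the first link containing the hyphenated player name, or None."""
    # Build the pattern from the name *before* searching, and escape it so
    # characters such as '.' or "'" in a name are matched literally.
    pattern = re.compile(re.escape(name.replace(' ', '-')))
    for link in links:
        if pattern.search(link):
            return link
    return None  # the caller has to handle a missing player

# What the original line compiles: a pattern for the text "%s"
literal = re.compile('%s')
literal_matches = any(literal.search(h) for h in hrefs)

print(literal_matches)
print(find_link('Benik Afobe', hrefs))
print(find_link('Unknown Player', hrefs))
```

Checking the result for None before subscripting avoids the TypeError, whatever the reason the link is missing.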
It appears that the Premier League's player listing is dynamic: a browser script loads additional players as the user scrolls down, so requests or urllib alone will not retrieve all the players. Instead, you can drive a real browser with a browser automation tool called selenium:
Install:
pip install selenium
Then, install the proper driver for the web browser you are using:
http://selenium-python.readthedocs.io/installation.html#drivers
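import re
import time
import csv
from selenium import webdriver  # "import selenium" alone does not load the webdriver submodule

driver = webdriver.Chrome('/path/to/driver')  # substitute Chrome with the browser you are using
driver.get('https://www.premierleague.com/players')
# Keep scrolling to the bottom until the page height stops growing,
# i.e. until no more players are being loaded
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(0.5)
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
time.sleep(10)
# Each match is an (id, hyphenated-name) pair taken from a player's overview link
players = re.findall(r'www\.premierleague\.com/players/(.*?)/(.*?)/overview', driver.page_source)
with open('epl_players_anki_clean.csv') as f:
    csv_filedata = list(csv.reader(f))
# Map "First Last" to (id, "First-Last")
player_dict = {b.replace('-', ' '): (a, b) for a, b in players}
# Extend the header row with 'url', then append each player's url to its row;
# the player name is expected in the first column
header, rows = csv_filedata[0], csv_filedata[1:]
new_rows = [header + ['url']] + [row + ['https://www.premierleague.com/players/{}/{}/overview'.format(*player_dict[row[0]])] for row in rows]
with open('players.csv', 'a') as f:
    write = csv.writer(f)
    write.writerows(new_rows)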
player_dict stores the following (truncated data):
Output:
{u'Asmir Begovic': [u'2537', u'Asmir-Begovic'], u'Ragnar Klavan': [u'15608', u'Ragnar-Klavan'], u'Eddie Nketiah': [u'14451', u'Eddie-Nketiah'], u'Mohamed Salah': [u'5178', u'Mohamed-Salah'], u'Jos\xe9 Holebas': [u'5713', u'Jos\xe9-Holebas'], u'Christian Eriksen': [u'4845', u'Christian-Eriksen'], u'Kurt Zouma': [u'5175', u'Kurt-Zouma'], u'Gareth Barry': [u'1308', u'Gareth-Barry'], u'Diego Costa': [u'4941', u'Diego-Costa'], u'Sam McQueen': [u'9649', u'Sam-McQueen'], u'Roque Mesa': [u'22575', u'Roque-Mesa'], u'Siem de Jong': [u'4885', u'Siem-de-Jong'], u'Lazar Markovic': [u'5078', u'Lazar-Markovic'], u'Adam Federici': [u'3182', u'Adam-Federici'], u'Dean Marney': [u'2359', u'Dean-Marney'], u'Nathan Broadhead': [u'14636', u'Nathan-Broadhead'], u'Alex Pritchard': [u'4433', u'Alex-Pritchard'], u'Matthew Pennington': [u'9895', u'Matthew-Pennington'], u'Tomas Kalas': [u'4101', u'Tomas-Kalas'], u'Nathan Ak\xe9': [u'4499', u'Nathan-Ak\xe9'], u'Mathias Normann': [u'23556', u'Mathias-Normann'], u'Grzegorz Krychowiak': [u'12735', u'Grzegorz-Krychowiak'], u'Wojciech Szczesny': [u'3543', u'Wojciech-Szczesny'], u'Charlie Adam': [u'4081', u'Charlie-Adam'], u'Marko Grujic': [u'13985', u'Marko-Grujic'], u'Harry Maguire': [u'9566', u'Harry-Maguire'], u'Isaiah Brown': [u'4674', u'Isaiah-Brown'], u'Matz Sels': [u'16723', u'Matz-Sels'], u'Leighton Baines': [u'3030', u'Leighton-Baines'], u'Marouane Fellaini': [u'3604', u'Marouane-Fellaini'], u'Jairo Riedewald': [u'4878', u'Jairo-Riedewald'], u'Glenn Murray': [u'4772', u'Glenn-Murray'], u'Tom Cadman': [u'14547', u'Tom-Cadman'], u'Ryan Shawcross': [u'3158', u'Ryan-Shawcross'], u"N'Golo Kant\xe9": [u'13492', u"N'Golo-Kant\xe9"], u'Aaron Ramsey': [u'3548', u'Aaron-Ramsey'], u'Stephen Kingsley': [u'10517', u'Stephen-Kingsley'], u'Eliaquim Mangala': [u'5334', u'Eliaquim-Mangala'], u'Josh Tymon': [u'13477', u'Josh-Tymon'], u'Mohamed Diam\xe9': [u'3982', u'Mohamed-Diam\xe9'], u'Sofiane Boufal': [u'12584', u'Sofiane-Boufal'], u'Nya Kirby': 
[u'15134', u'Nya-Kirby'], u'Max Melbourne': [u'15160', u'Max-Melbourne'], u'Marcin Bulka': [u'23695', u'Marcin-Bulka'], u'Rub\xe9n Sobrino': [u'16608', u'Rub\xe9n-Sobrino'], u'Tareiq Holmes Dennis': [u'8340', u'Tareiq-Holmes-Dennis'], u'Martin Cranie': [u'2559', u'Martin-Cranie'], u'Connor Mahoney': [u'7985', u'Connor-Mahoney'], u'Jamaal Lascelles': [u'9257', u'Jamaal-Lascelles'], u'Phil Foden': [u'14805', u'Phil-Foden'], u'Arijanet Muric': [u'19911', u'Arijanet-Muric'], u'Sadio Man\xe9': [u'6519', u'Sadio-Man\xe9'], u"Aiden O'Neill": [u'20095', u"Aiden-O'Neill"], u'Steve Cook': [u'8045', u'Steve-Cook'], u'Samuel Shashoua': [u'15142', u'Samuel-Shashoua'], u'Kyle Bartley': [u'3312', u'Kyle-Bartley'], u'Bojan': [u'4898', u'Bojan'], u'Jason Puncheon': [u'4084', u'Jason-Puncheon'], u'Damien Delaney': [u'1911', u'Damien-Delaney'], u'Steven Defour': [u'5345', u'Steven-Defour'], u'Christian Walton': [u'8159', u'Christian-Walton'], u'Timothy Fosu Mensah': [u'13561', u'Timothy-Fosu-Mensah'], u'Michael Keane': [u'4333', u'Michael-Keane'], u'Levi Lumeka': [u'14170', u'Levi-Lumeka'], u'Chancel Mbemba': [u'5850', u'Chancel-Mbemba'], u'Brice Dja Dj\xe9dj\xe9': [u'5577', u'Brice-Dja-Dj\xe9dj\xe9'], u'Vurnon Anita': [u'4550', u'Vurnon-Anita'], u'Jefferson Montero': [u'10518', u'Jefferson-Montero'], u'Toby Alderweireld': [u'4916', u'Toby-Alderweireld'], u'Dominic Calvert Lewin': [u'9576', u'Dominic-Calvert-Lewin'], u'Brad Jackson': [u'13246', u'Brad-Jackson'], u'James McArthur': [u'4224', u'James-McArthur'], u'Mat Ryan': [u'12192', u'Mat-Ryan'], u'Bartosz Kapustka': [u'19679', u'Bartosz-Kapustka'], u'Robert Snodgrass': [u'4558', u'Robert-Snodgrass'], u'Jonathan Leko': [u'13866', u'Jonathan-Leko'], u'Harry Arter': [u'8050', u'Harry-Arter'], u'Connor Goldson': [u'9634', u'Connor-Goldson'], u'Shaun Hobson': [u'21691', u'Shaun-Hobson'], u'Ayoze P\xe9rez': [u'10487', u'Ayoze-P\xe9rez'], u'Marc Pugh': [u'8049', u'Marc-Pugh'], u'Luciano Narsingh': [u'7122', u'Luciano-Narsingh'], u'Michael 
Folivi': [u'14298', u'Michael-Folivi'], u'Adri\xe1n': [u'4852', u'Adri\xe1n'], u'Mason Holgate': [u'10564', u'Mason-Holgate'], u'Joy Mukena': [u'15118', u'Joy-Mukena'], u"Lewis O'Brien": [u'24353', u"Lewis-O'Brien"], u'Javier Manquillo': [u'4918', u'Javier-Manquillo'], u"Dara O'Shea": [u'15154', u"Dara-O'Shea"], u"Clinton N'Jie": [u'6903', u"Clinton-N'Jie"], u'Yoan Gouffran': [u'4554', u'Yoan-Gouffran'], u'Michael Carrick': [u'1634', u'Michael-Carrick'], u'Moha': [u'13778', u'Moha'], u'Michy Batshuayi': [u'7450', u'Michy-Batshuayi'], u'Nathaniel Chalobah': [u'4105', u'Nathaniel-Chalobah'], u'Ryan Inniss': [u'4760', u'Ryan-Inniss'], u'Etienne Capoue': [u'4843', u'Etienne-Capoue'], u'Badou Ndiaye': [u'20538', u'Badou-Ndiaye'], u'Alexandre Lacazette': [u'6899', u'Alexandre-Lacazette'], u'Charlie Rowan': [u'14285', u'Charlie-Rowan'], u'Nathan Ferguson': [u'23976', u'Nathan-Ferguson'], u'Anthony Georgiou': [u'15146', u'Anthony-Georgiou'], u'Dujon Sterling': [u'14572', u'Dujon-Sterling'], u'Axel Tuanzebe': [u'13559', u'Axel-Tuanzebe'], u'Emre Can': [u'5001', u'Emre-Can'], u'Sam Surridge': [u'13195', u'Sam-Surridge'], u'Ryan Kent': [u'13509', u'Ryan-Kent'], u'Marc Albrighton': [u'3564', u'Marc-Albrighton'], u'Joe Williams': [u'10454', u'Joe-Williams'], u'Tom Heaton': [u'2933', u'Tom-Heaton'], u'Danny Rose': [u'3507', u'Danny-Rose'], u'Nathan Redmond': [u'3811', u'Nathan-Redmond'], u'Chicharito': [u'4161', u'Chicharito'], u'Dean Whitehead': [u'2980', u'Dean-Whitehead'], u'M.J. 
Williams': [u'10464', u'M.J.-Williams'], u'Harry Winks': [u'7488', u'Harry-Winks'], u'Josh Sims': [u'15374', u'Josh-Sims'], u'Charlie Gilmour': [u'14453', u'Charlie-Gilmour'], u'Aaron Wan Bissaka': [u'14164', u'Aaron-Wan-Bissaka'], u'Marc Muniesa': [u'4822', u'Marc-Muniesa'], u'Beni Baningime': [u'14623', u'Beni-Baningime'], u'Demarai Gray': [u'7946', u'Demarai-Gray'], u'Junior Stanislas': [u'3766', u'Junior-Stanislas'], u'Liam Rosenior': [u'2464', u'Liam-Rosenior'], u'Nathaniel Clyne': [u'4604', u'Nathaniel-Clyne'], u'Kamil Grabara': [u'19909', u'Kamil-Grabara'], u'Anthony Martial': [u'11272', u'Anthony-Martial'], u'Ben Foster': [u'2932', u'Ben-Foster'], u'Laurent Depoitre': [u'16747', u'Laurent-Depoitre'], u'Mike van der Hoorn': [u'4877', u'Mike-van-der-Hoorn'], u'Didier Ndong': [u'20708', u'Didier-Ndong'], u'Jordon Mutch': [u'3333', u'Jordon-Mutch'], u'Harry Kane': [u'3960', u'Harry-Kane'], u'Fernandinho': [u'4804', u'Fernandinho'], u'Riyad Mahrez': [u'8983', u'Riyad-Mahrez'], u'Kleton Perntreou': [u'14144', u'Kleton-Perntreou'], u'Dion Henry': [u'10855', u'Dion-Henry'], u'Kelechi Iheanacho': [u'13554', u'Kelechi-Iheanacho'], u'Salom\xf3n Rond\xf3n': [u'6030', u'Salom\xf3n-Rond\xf3n'], u'Ryan Allsop': [u'3732', u'Ryan-Allsop'], u'Erik Pieters': [u'4821', u'Erik-Pieters'], u'Willy Caballero': [u'10466', u'Willy-Caballero'], u'Claudio Yacob': [u'4673', u'Claudio-Yacob'], u'Craig Dawson': [u'4198', u'Craig-Dawson'], u'Jayson Molumby': [u'15293', u'Jayson-Molumby'], u'Lucas Leiva': [u'3137', u'Lucas-Leiva'], u'Martin Dubravka': [u'6451', u'Martin-Dubravka'], u'Bruno': [u'8162', u'Bruno'], u'Sam Johnstone': [u'4331', u'Sam-Johnstone'], u'Jes\xfas G\xe1mez': [u'11070', u'Jes\xfas-G\xe1mez'], u'Tomer Hemed': [u'13234', u'Tomer-Hemed'], u'Victor Moses': [u'3983', u'Victor-Moses'], u'Vincent Janssen': [u'15481', u'Vincent-Janssen'], u'Lo\xefc Remy': [u'4572', u'Lo\xefc-Remy'], u'Craig Cathcart': [u'3160', u'Craig-Cathcart'], u'Leroy Fer': [u'4810', u'Leroy-Fer'], 
u"Kieran O'Hara": [u'13584', u"Kieran-O'Hara"], u'Ola Aina': [u'10439', u'Ola-Aina'], u'Winston Reid': [u'4209', u'Winston-Reid'], u'Jose Baxter': [u'3608', u'Jose-Baxter'], u'Michael Obafemi': [u'21532', u'Michael-Obafemi'], u'Bruno Martins Indi': [u'11177', u'Bruno-Martins-Indi'], u'Laurent Koscielny': [u'4030', u'Laurent-Koscielny'], u'Borja Bast\xf3n': [u'16622', u'Borja-Bast\xf3n'], u'Daryl Janmaat': [u'10480', u'Daryl-Janmaat'], u'Freddy Woodman': [u'10479', u'Freddy-Woodman'], u'Jordy Hiwula Mayifuila': [u'10949', u'Jordy-Hiwula-Mayifuila'], u'Raphael Spiegel': [u'4679', u'Raphael-Spiegel'], u'Anthony Knockaert': [u'8982', u'Anthony-Knockaert'], u'Harry Lewis': [u'14982', u'Harry-Lewis'], u'Henrikh Mkhitaryan': [u'5102', u'Henrikh-Mkhitaryan'], u'Santiago Cazorla': [u'4477', u'Santiago-Cazorla'], u'Sean Scannell': [u'8887', u'Sean-Scannell'], u'Christian Atsu': [u'4859', u'Christian-Atsu'], u'Pascal Gro\xdf': [u'22542', u'Pascal-Gro\xdf'], u'Charlie Austin': [u'9468', u'Charlie-Austin'], u'Sam Byram': [u'8945', u'Sam-Byram'], u'Daniel Sturridge': [u'3154', u'Daniel-Sturridge'], u'Ga\xebtan Bong': [u'5721', u'Ga\xebtan-Bong'], u'Martin Kelly': [u'3644', u'Martin-Kelly'], u'Jack Payne': [u'9664', u'Jack-Payne'], u'Michel Vorm': [u'4398', u'Michel-Vorm'], u'Oriol Romeu': [u'4286', u'Oriol-Romeu'], u'Philip Billing': [u'8882', u'Philip-Billing'], u'Matthew Lowton': [u'4487', u'Matthew-Lowton'], u'Wayne Hennessey': [u'2569', u'Wayne-Hennessey'], u'Geoff Cameron': [u'4636', u'Geoff-Cameron'], u'Tammy Abraham': [u'13286', u'Tammy-Abraham'], u'Elvis Manu': [u'12374', u'Elvis-Manu'], u'Marvin Zeegelaar': [u'10123', u'Marvin-Zeegelaar'], u'Jordy Clasie': [u'12365', u'Jordy-Clasie'], u'Wayne Routledge': [u'2681', u'Wayne-Routledge'], u'Tom Anderson': [u'8234', u'Tom-Anderson'], u'Stephen Duke McKenna': [u'23738', u'Stephen-Duke-McKenna'], u'Harry Charsley': [u'14632', u'Harry-Charsley'], u'Erik Lamela': [u'4842', u'Erik-Lamela'], u'Elias Kachunga': [u'19611', 
u'Elias-Kachunga'], u'Molla Wagu\xe9': [u'21730', u'Molla-Wagu\xe9'], u'Ilkay G\xfcndogan': [u'5101', u'Ilkay-G\xfcndogan'], u'Ashley Williams': [u'4403', u'Ashley-Williams'], u'Lewis Grabban': [u'8055', u'Lewis-Grabban'], u'Seamus Coleman': [u'3600', u'Seamus-Coleman'], u'Jason Denayer': [u'11002', u'Jason-Denayer'], u'Jack Wilshere': [u'3547', u'Jack-Wilshere'], u'Calum Chambers': [u'4620', u'Calum-Chambers'], u'Samir Nasri': [u'3546', u'Samir-Nasri'], u'Alexis S\xe1nchez': [u'4973', u'Alexis-S\xe1nchez'], u'Kyle Walker': [u'3955', u'Kyle-Walker'], u'Martin Olsson': [u'2867', u'Martin-Olsson'], u'Modou Barrow': [u'10520', u'Modou-Barrow'], u'Robbie Brady': [u'4158', u'Robbie-Brady'], u'Tom Davies': [u'13389', u'Tom-Davies'], u'Fraser Forster': [u'3170', u'Fraser-Forster'], u'Francis Coquelin': [u'3549', u'Francis-Coquelin'], u'Matt Targett': [u'4815', u'Matt-Targett'], u'Davy Klaassen': [u'4886', u'Davy-Klaassen'], u"Stefan O'Connor": [u'10425', u"Stefan-O'Connor"], u'Fraser Hornby': [u'23744', u'Fraser-Hornby'], u'Tim Krul': [u'3169', u'Tim-Krul'], u'Ryan Hill': [u'21858', u'Ryan-Hill'], u'J\xfcrgen Locadia': [u'7124', u'J\xfcrgen-Locadia'], u'Ki Sung yueng': [u'4656', u'Ki-Sung-yueng'], u'Leon Britton': [u'2152', u'Leon-Britton'], u'Mesut \xd6zil': [u'4714', u'Mesut-\xd6zil'], u'Alex Denny': [u'14643', u'Alex-Denny'], u'Nemanja Matic': [u'3861', u'Nemanja-Matic'], u'Ryan Fraser': [u'8052', u'Ryan-Fraser'], u'Julian Speroni': [u'2664', u'Julian-Speroni'], u'Joel Campbell': [u'4254', u'Joel-Campbell'], u'Robert Elliot': [u'2214', u'Robert-Elliot'], u'Tosin Adarabioyo': [u'13549', u'Tosin-Adarabioyo'], u'Jack Colback': [u'3713', u'Jack-Colback'], u'Soufyan Ahannach': [u'24695', u'Soufyan-Ahannach'], u'Aaron Connolly': [u'21653', u'Aaron-Connolly'], u'Yasin Ben El Mhanni': [u'20879', u'Yasin-Ben-El-Mhanni'], u'Kazenga Lua Lua': [u'3173', u'Kazenga-Lua-Lua'], u'Ben Chilwell': [u'13491', u'Ben-Chilwell'], u'Aaron Ramsdale': [u'13703', u'Aaron-Ramsdale']}
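Looking a name up with player_dict[...] raises KeyError as soon as a spreadsheet name differs from the site's spelling, for example by an accent or a hyphen. A minimal sketch of a more forgiving lookup, assuming accent-stripped, lower-cased names are an acceptable matching key; normalise and lookup are hypothetical helpers, and the two sample entries are copied from the output above:

```python
import unicodedata

def normalise(name):
    """Lower-case a name, strip accents, and treat hyphens as spaces."""
    decomposed = unicodedata.normalize('NFKD', name)
    no_accents = ''.join(c for c in decomposed if not unicodedata.combining(c))
    return no_accents.replace('-', ' ').lower().strip()

# Two entries in the same shape as player_dict
player_dict = {
    'Sadio Man\xe9': ('6519', 'Sadio-Man\xe9'),
    'Aaron Wan Bissaka': ('14164', 'Aaron-Wan-Bissaka'),
}
normalised = {normalise(k): v for k, v in player_dict.items()}

def lookup(name):
    """Return the player's overview URL, or None when no key matches."""
    entry = normalised.get(normalise(name))
    if entry is None:
        return None
    return 'https://www.premierleague.com/players/{}/{}/overview'.format(*entry)

print(lookup('Sadio Mane'))
print(lookup('Aaron Wan-Bissaka'))
print(lookup('Nobody Here'))
```

Returning None for unmatched names lets the caller collect them for a manual check instead of stopping at the first miss. Two different players could still normalise to the same key, so this is a convenience rather than a guarantee.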
I'm trying to retrieve a bunch of web pages using the following code:
import requests

# `headers` is defined elsewhere in the script (not shown)
for i in range(619, 333333):
    print(i)
    num = str(i)
    urll = 'http://shironet.mako.co.il/artist?type=works&lang=1&prfid=%s' % num
    r = requests.get(urll, timeout=5, headers=headers)
    if r.status_code == 200:
        print(r.text)
Some of these web pages exist for sure, for example:
http://shironet.mako.co.il/artist?type=works&lang=1&prfid=
But when I run the code, I get the following result:
window.rbzns = {fiftyeightkb: 3600000, days_in_week : 1};var h9U={'L6':function(U,b){return U!=b;},'t6':function(U,b){return U|b;},'R6':function(U,b){return U>>>b;},'l2':function(U,b){return U===b;},'T7H':(function(L7H){return (function(C7H,r7H){return (function(W7H){return {v7H:W7H};})(function(B7H){var R7H,E7H=0;for(var q7H=C7H;E7HF7H;})(parseInt,Date,(function(J7H){return (''+J7H)"substring";})('_getTime2'),function(J7H,P7H){return new J7H()P7H;}),function(B7H,E7H){var O7H=parseInt(B7H"charAt",16)"toString";return O7H"charAt";});})('6ia8u0v00'),'n2':function(U,b){return U==b;},'x2':function(U,b){return U&b;},'N0':function(U,b){return U&b;},'p6':function(U,b){return U|b;},'c3':function(U,b){return Ub;},'e2':function(U,b){return U!==b;},'j0':function(U,b){return U>b;},'k7':function(U,b){return Ub;},'B7':function(U,b){return U<>b;},'O1':"getTimezoneOffset",'S3':function(U,b){return U=b;},'N7':function(U,b){return U==b;},'c7':function(U,b){return U<>>b;},'x6':function(U,b){return U&b;},'M7':function(U,b){return U<>>b;},'u0':function(U,b,D,k){return U^b^D^k;},'m6':function(U,b,D){return U|b|D;},'X7':function(U,b){return U&b;},'s7':function(U,b){return U!=b;},'E5':function(U,b){return U&b;},'q6':function(U,b){return U&b;},'M3':function(U,b){return U>=b;},'Q3':function(U,b){return U>=b;},'f2':function(U,b){return U|b;},'W2':function(U,b){return U|b;},'P0':function(U,b){return U&b;},'I9':"undefined",'i6':function(U,b){return U>b;},'k6':function(U,b){return U|b;},'j7':function(U,b){return U>>b;},'G0':function(U,b){return U>=b;},'d6':function(U,b){return U&b;},'r2':function(U,b){return U&b;},'S5':function(U,b){return U<=b;},'p2H':"Fingerprint",'Q8':"charCodeAt",'q0':function(U,b){return U<=b;},'n1':function(U,b){return U===b;},'T3':function(U,b){return U&b;},'l1':function(U,b){return U in b;},'j3':function(U,b){return U>b;},'p1':function(U,b){return U>>b;},'h7':function(U,b){return U>>b;},'D8':"-",'U2H':"function",'Q0':function(U,b){return U&b;},'v5':function(U,b){return 
U>>>b;},'R9':"test",'D0':function(U,b){return U&b;}};var bsig;try{(function(U,b,D){var k=h9U.T7H.v7H("543")?"temp":"amd",Q=h9U.T7H.v7H("1c")?"exports":"n";h9U.I9!==typeof module&&module[Q]?module[Q]=D():h9U.U2H===typeof define&&define[k]?define(D):b[U]=D();})(h9U.p2H,this,function(){var I=h9U.T7H.v7H("f1")?"D":"getContext",A=h9U.T7H.v7H("38")?"createElement":"Fingerprint",s=h9U.T7H.v7H("5fd")?"userAgent":"encoded",t=h9U.T7H.v7H("5b57")?"getRegularPluginsString":"ie_activex",O=h9U.T7H.v7H("d8db")?"screen_orientation":"length",B=h9U.T7H.v7H("d8")?"screen_resolution":"result",V="hasher",f=h9U.T7H.v7H("65")?null:"cookie",C="map",N=function(j){var m=h9U.T7H.v7H("cf7")?"getContext":"call";var u=h9U.T7H.v7H("cdd")?"counter_x":"each";var T,Z;T=Array.prototype.forEach;Z=Array.prototype.map;this[u]=function(U,b,D){var k="l2";var Q="hasOwnProperty";var H=h9U.T7H.v7H("a2c")?31:"e2";var y=h9U.T7H.v7H("2eb")?"p2":32768;var o="forEach";var M="k2";var h="U2";if(h9Uh)if(T&&h9UM)Uo;else if(U.length===+U.length)for(var p=0,K=U.length;h9Uy&&h9UH;p++);else for(p in U)if(UQ&&h9Uk)break;};this[C]=h9U.T7H.v7H("a41")?function(k,Q,H){var y=h9U.T7H.v7H("134")?61:"Y2";var o="n2";var M=[];if(h9Uo)return M;if(Z&&h9Uy)return kC;thisu;return M;}:8388608;"object"==typeof j?(this[V]=h9U.T7H.v7H("fa12")?"atob":j[V],this[B]=j[B],this[O]=h9U.T7H.v7H("382")?j[O]:"hasher",this.canvas=j.canvas,this[t]=h9U.T7H.v7H("ec4")?"canvas":j[t]):"function"==typeof j&&(this[V]=j);};N.prototype={get:function(){var U="murmurhash3_32_gc",b=h9U.T7H.v7H("6ce1")?"getCanvasFingerprint":"outerWidth",D="isCanvasSupported",k="getPluginsString",Q="doNotTrack",H="platform",y="cpuClass",o="openDatabase",M="addBehavior",h="body",p=h9U.T7H.v7H("86ff")?"pageXOffset":"indexedDB",K=h9U.T7H.v7H("3d31")?"hasLocalStorage":"hasLocalStorage",j="hasSessionStorage",m=h9U.T7H.v7H("5a3")?"window":"getScreenResolution",u="colorDepth",T="language",Z=[];Zh9U.k5H;Zh9U.k5H;Zh9U.k5H;this[B]&&"undefined"!==typeof thism&&Zh9U.k5H;Z[h9U.k5H]((new 
Date)h9U.O1);Zh9U.k5H;Zh9U.k5H;Zh9U.k5H;document[h]?Z[h9U.k5H](typeof document[h][M]):Zh9U.k5H;Z[h9U.k5H](typeof window[o]);Zh9U.k5H;Zh9U.k5H;Zh9U.k5H;Zh9U.k5H;this.canvas&&thisD&&Zh9U.k5H;return this[V]?thisV:thisU;},murmurhash3_32_gc:function(U,b){var D="F5",k="E5",Q="v5",H="a5",y=h9U.T7H.v7H("7a21")?29:"K5",o=h9U.T7H.v7H("473")?1732584193:"Y5",M="n5",h="l5",p="e5",K="H5",j="g5",m=h9U.T7H.v7H("3dff")?"::":"b5",u="d2",T="x2",Z="W2",J=h9U.T7H.v7H("8e")?19:"r2",E=h9U.T7H.v7H("4ebf")?30:"f2",L="F2",F="E2",g="v2",R="a2",a="K2",P,X,e,l,v;P=h9Ua;X=h9UR;e=h9U.T7H.v7H("1d")?"emit":b;for(v=0;h9Ug;)l=h9UF|(Uh9U.Q8&255)<<8|(Uh9U.Q8&255)<<16|(Uh9U.Q8&255)<<24,++v,l=h9UL,l=h9UE,l=h9UJ,e^=l,e=h9UZ,e=h9UT,e=(h9Uu)+27492+(h9Um);l=0;switch(P){case 3:l^=h9Uj;case 2:l^=h9UK;case 1:l^=h9U.T7H.v7H("353")?h9Up:12,l=h9U.T7H.v7H("86e")?255:h9Uh,l=h9UM,e^=h9Uo;}e^=U.length;e^=h9Uy;e=h9U.T7H.v7H("4224")?24:h9UH;e^=h9UQ;e=h9Uk;return h9UD;},hasLocalStorage:function(){var b=h9U.T7H.v7H("21a7")?"frames":"localStorage";try{return !!window[b];}catch(U){return !0;}},hasSessionStorage:function(){var b=h9U.T7H.v7H("c2")?"milliseconds":"sessionStorage";try{return !!window[b];}catch(U){return !0;}},isCanvasSupported:function(){var U=h9U.T7H.v7H("355f")?"cpuClass":documentA;return !(!U[I]||!UI);},isIE:function(){var U="r5",b=h9U.T7H.v7H("b4")?"appName":"outerHeight",D=h9U.T7H.v7H("daef")?2562383102:"f5";return h9UD?"hasOwnProperty":"description",H=h9U.T7H.v7H("451f")?"name":"s",y=h9U.T7H.v7H("22c")?"2d":this[C](k,function(U){var b=h9U.T7H.v7H("b3")?"Fingerprint":"suffixes";var D="type";return [U[D],U[b]]h9U.l9;})h9U.l9;return [k[H],k[Q],y]h9U.l9;},this)h9U.l9;},getIEPluginsString:function(){var D="split",k="ActiveXObject";return window[k]?thisC,function(b){try{return new ActiveXObject(b),b;}catch(U){return null;}})h9U.l9:"";},getScreenResolution:function(){var U=h9U.T7H.v7H("15e")?"x":"W5";return 
this[O]?h9UU?[screen.height,screen.width]:[screen.width,screen.height]:[screen.height,screen.width];},getCanvasFingerprint:function(){var U="toDataURL",b=documentA,D=h9U.T7H.v7H("de2")?bI:"object";D.textBaseline="top";D.font="14px 'Arial'";D.textBaseline="alphabetic";D.fillStyle="#f60";D.fillRect(125,1,62,20);D.fillStyle="#069";D.fillText("http://valve.github.io",2,15);D.fillStyle="rgba(102, 204, 0, 0.7)";D.fillText("abcdefghijklmnopqrstuvwxyz01234567890",4,17);return bU;}};return N;});bsig=new Fingerprint({canvas:true})h9U.d2H;}catch(b){var z=h9U.T7H.v7H("b55e")?function(U){bsig=h9U.T7H.v7H("bce8")?U:"g";}:4;z(h9U.D8);}(function(){var x="winsocks",G="outerWidth",W="documentElement",d="document",F4=h9U.T7H.v7H("3fb")?"z7":"; path=/",L4="w5",P4=h9U.T7H.v7H("f18a")?65:"S5",J4=h9U.T7H.v7H("fd")?"date":"fromCharCode",w=h9U.T7H.v7H("2d")?"replace":"getContext",r="charAt",S="toString",U4=function(U,b){var D="G0";return h9UD;},q=function(U,b){var D="u3";var k=h9UD;return k;},i=function(b){var D="B3";var k="";var Q;var H;for(Q=7;h9UD;Q--){var y=function(){var U="J3";H=h9UU;};y();k+=HS;}return k;},d4=function(b){var D="t3";var k="";var Q;var H;var y;for(Q=0;h9UD;Q+=2){var o=function(){var U="T3";H=h9UU;};var M=function(){var U="s3";y=h9UU;};o();M();k+=HS+yS;}return k;},c4=function(U){var b="h3";var D="toUpperCase";var k="j3";var Q=0;var H=0;var y;while(h9Uk){y=Ur;H++;switch(yD){case "0":Q+=4;break;case "1":Q+=3;break;case "2":case "3":Q+=2;break;case "4":case "5":case "6":case "7":Q+=1;break;}if(h9Ub)break;}return Q;},R4=function(b){var D="toLowerCase";var k="q0";var Q="L0";var H="O0";var y="I0";var o="u0";var M="h0";var h="j0";var p="i6";var K="q6";var j="R6";var m="L6";var u="P6";var T="s6";var Z=function(){var U="m6";g=h9UU;};var J=function(){var U="t6";g=h9UU;};var E=function(){var U="j6";g=h9UU;};var L=function(U){g=U;};var F;var g,R;var a=new Array(80);var P=1732584193;var X=4023233417;var e=2562383102;var l=271733878;var v=3285377520;var I,A,s,t,O;var B;b=r4(b);var 
V=b.length;var f=new Array;for(g=0;h9UT;g+=4){var C=function(){var U="T6";R=h9UU;};C();fh9U.k5H;}switch(h9Uu){case 0:L(2147483648);break;case 1:J();break;case 2:Z();break;case 3:E();break;}fh9U.k5H;while(h9Um)fh9U.k5H;fh9U.k5H;fh9U.k5H;for(F=0;h9Up;F+=16){var N=function(){var U="x6";v=h9UU;};var g4=function(){var U="M0";P=h9UU;};var Q4=function(U){t=U;};var D4=function(U){s=U;};var k4=function(U){A=U;};var y4=function(){var U="d6";l=h9UU;};var M4=function(){var U="D0";e=h9UU;};var p4=function(){var U="Q0";X=h9UU;};var H4=function(U){I=U;};var l4=function(U){O=U;};var j4=function(U){a[g]=U[F+g];};for(g=0;h9Uh;g++)j4(f);for(g=16;h9UM;g++)a[g]=q(h9Uo,1);H4(P);k4(X);D4(e);Q4(l);l4(v);for(g=0;h9Uy;g++){var Z4=function(U){O=U;};var e4=function(U){A=U;};var m4=function(U){I=U;};var n4=function(){var U="V0";B=q(I,5)+(h9UU|~A&t)+O+a[g]+1518500249&4294967295;};var h4=function(U){t=U;};n4();Z4(t);h4(s);s=q(A,30);e4(I);m4(B);}for(g=20;h9UH;g++){var o4=function(U){O=U;};var t4=function(U){I=U;};var A4=function(U){t=U;};var Y4=function(){var U="P0";B=h9UU;};var u4=function(U){A=U;};Y4();o4(t);A4(s);s=q(A,30);u4(I);t4(B);}for(g=40;h9UQ;g++){var a4=function(U){t=U;};var s4=function(U){A=U;};var I4=function(U){O=U;};var K4=function(){var U="R0";B=h9UU;};var v4=function(U){I=U;};K4();I4(t);a4(s);s=q(A,30);s4(I);v4(B);}for(g=60;h9Uk;g++){var O4=function(U){A=U;};var B4=function(U){I=U;};var E4=function(U){t=U;};var V4=function(){var U="N0";B=h9UU;};var T4=function(U){O=U;};V4();T4(t);E4(s);s=q(A,30);O4(I);B4(B);}g4();p4();M4();y4();N();}var B=i(P)+i(X)+i(e)+i(l)+i(v);return BD;},f4=function(){var U="domAutomationController";var b="domAutomation";var D="webdriver";var k="spawn";var Q="emit";var H="Buffer";var y="__phantomas";var o="callPhantom";var M="_phantom";if(window[M]||window[o]||window[y]||window[H]||window[Q]||window[k]||window[D]||window[b]||window[U]){return true;}return false;},r4=function(U){var b="M6";var D="p6";var k="k6";var Q="z6";var H="w3";var y="S3";var o="C3";var 
M="X3";var h="c3";U=Uw;var p="";for(var K=0;h9Uh;K++){var j=Uh9U.Q8;if(h9UM)p+=sfcc(j);else if(h9Uo&&h9Uy){p+=sfcc(h9UH);p+=sfcc(h9UQ);}else{p+=sfcc(h9Uk);p+=sfcc(h9UD);p+=sfcc(h9Ub);}}return p;};sfcc=String[J4],io="indexOf",at3="###",chars=[];for(var c=65;h9UP4;c++){charsh9U.k5H;}for(var c=97;h9UL4;c++){charsh9U.k5H;}for(var c=48;h9UF4;c++){charsh9U.k5H;}charsh9U.k5H;charsh9U.k5H;charsh9U.k5H;chars=charsh9U.l9;if(typeof b4=="undefined"){function b4(U){var b="t7",D="u7",k="h7",Q="j7",H="M7",y="p7",o="k7",M=[],h=0;while(h9Uo){var p=Uh9U.Q8,K=Uh9U.Q8,j=Uh9U.Q8,m=(h9Uy)+(h9UH)+(j||0),u=h9UQ,T=h9Uk,Z=isNaN(K)?64:h9UD,J=isNaN(j)?64:h9Ub;M[M.length]=charsr;M[M.length]=charsr;M[M.length]=charsr;M[M.length]=charsr;}return Mh9U.l9;};}if(typeof q4=="undefined"){function q4(U){var b="M3",D="Q3",k="D3",Q="U3",H="G7",y="N7",o="C7",M="X7",h="c7",p="J7",K="B7",j="T7",m="equals",u="chars",T="strlen",Z="s7",J={strlen:h9UZ,chars:(new RegExp("[^"+chars+"]"))h9U.R9,equals:/=/h9U.R9&&(/=[^=]/h9U.R9||/={3}/h9U.R9)};if(J[T]||J[u]||J[m])throw new Error("Invalid base64 data");var E=[],L=0;while(h9Uj){var F=charsio,g=charsio,R=charsio,a=charsio,P=(h9UK)+(h9Up)+(h9Uh)+(h9UM),X=h9Uo,e=h9Uy?-1:h9UH,l=h9UQ?-1:h9Uk;E[E.length]=sfcc(X);if(h9UD)E[E.length]=sfcc(e);if(h9Ub)E[E.length]=sfcc(l);}return Eh9U.l9;};};if(!Array.prototype[io])Array.prototype[io]=function(U){var b="n1",D="l1",k="e1",Q="p1",H="floor",y="ceil",o="k1",M="U1",h=h9UM,p=Number(arguments[1])||0;p=h9Uo?Mathy:MathH;if(h9UQ){p+=h;}for(;h9Uk;p++){if(h9UD&&h9Ub){return p;}}return -1;};var n=[],Y=["object","function","number","string"],z4=true;if(typeof (pageYOffset)=="undefined"){var X4=function(U){var b="scrollLeft";pageYOffset=U[d][W][b];};X4(window);}if(typeof (pageXOffset)=="undefined"){var G4=function(U){var b="scrollTop";pageXOffset=U[d][W][b];};G4(window);}if(typeof (innerWidth)=="undefined"){var w4=function(U){var b="clientWidth";innerWidth=U[W][b];};w4(document);}if(typeof (innerHeight)=="undefined"){var S4=function(U){var 
b="clientHeight";innerHeight=U[W][b];};S4(document);}if(typeof (window[G])=="undefined"){var x4=function(U){var b="offsetWidth";window[G]=U[W][b];};x4(document);}if(typeof (outerHeight)=="undefined"){var i4=function(U){var b="offsetHeight";outerHeight=U[W][b];};i4(document);}if(typeof (screenX)=="undefined"){var N4=function(U){screenX=U.width;};N4(screen);}if(typeof (screenY)=="undefined"){var C4=function(U){screenY=U.height;};C4(screen);}n[h9U.k5H](Y[io](typeof (frames))>-1);n[h9U.k5H](Y[io](typeof (length))>-1);n[h9U.k5H](Y[io](typeof (pageYOffset))>-1);n[h9U.k5H](Y[io](typeof (pageXOffset))>-1);n[h9U.k5H](Y[io](typeof (innerWidth))>-1);n[h9U.k5H](Y[io](typeof (innerHeight))>-1);n[h9U.k5H](Y[io](typeof (outerWidth))>-1);n[h9U.k5H](Y[io](typeof (outerHeight))>-1);n[h9U.k5H](Y[io](typeof (navigator))>-1);n[h9U.k5H](Y[io](typeof (navigator[h9U.Y8]))>-1);n[h9U.k5H](Y[io](typeof (screen))>-1);n[h9U.k5H](Y[io](typeof (document))>-1);n[h9U.k5H](Y[io](typeof (Image))>-1);n[h9U.k5H](Y[io](typeof (document))>-1);n[h9U.k5H](Y[io](typeof (window))>-1);n[h9U.k5H](Y[io](typeof (self))>-1);n[h9U.k5H](Y[io](typeof (status))>-1);n[h9U.k5H](Y[io](typeof (name))>-1);n[h9U.k5H](Y[io](typeof (screenY))>-1);n[h9U.k5H](Y[io](typeof (screenX))>-1);for(idx in n){if(!n[idx]){var W4=function(U){z4=U;};W4(false);break;}}if(!f4()&&!!z4){window[x]=function(E){var L="days_in_week",F="ctrbg",g="rbzns";function R(D,k,Q,H,y){var o="cookie",M="challdomain",h="toGMTString",p="getTime",K="setTime",j="fiftyeightkb",m=new Date(),u=k+"="+D,T=window[g][j];if(!y){mK;var Z="; expires="+mh;u=u+Z;}u=u+"; path=/";if(!H){var J=";domain="+window[g][M];u=u+J;}document[o]=u;if(Q){setTimeout(function(){var U="reload",b="location";window[b]U;},1);}}function a(k,Q){var H=0,y=U4(k+H,Q);(function(){var U="callee",b="Y1";if(y){var D=(h9U[b]((new 
Date)h9U.O1,123456789))S;R(b4(window[g][F]+at3+H+at3+D)w,"rbzid",E,false);}else{H++;y=U4(k+H,Q);setTimeout(arguments[U],0);}})();}a(window[g][F],window[g][L]);};}else{window[x]=function(){};}})(); rbzns.challdomain="shironet.mako.co.il"; rbzns.ctrbg="brb8ksuHgR0WzhLglimN5/vEOZXT8ta8pQ7CjdjaMLB1jNFR1vQ7oPtL47YLbNAdPhCdIms2PF6gVIVWak0+7+Gvj5fDVUYIaRdiXnH7HbY="; winsocks(true);
I notice that when I open the page in a browser I get the result I expect, but not when I run the code.
What should I do to handle this problem?
I am using the code below to try to extract the data from the table at this URL. However, I get the following error message:
Error: `AttributeError: 'NoneType' object has no attribute 'find'` in the line
`data = iter(soup.find("table", {"class": "tablestats"}).find("th", {"class": "header"}).find_all_next("tr"))`
My code is as follows:
from bs4 import BeautifulSoup
import requests
r = requests.get(
"http://www.federalreserve.gov/econresdata/researchdata/feds200628_1.html")
soup = BeautifulSoup(r.content)
data = iter(soup.find("table", {"class": "tablestats"}).find("th", {"class": "header"}).find_all_next("tr"))
headers = (next(data).text, next(data).text)
table_items = [(a.text, b.text) for ele in data for a, b in [ele.find_all("td")]]
for a, b in table_items:
    print(u"Date={}, Maturity={}".format(a, b if b.strip() else "null"))
Thank You
from bs4 import BeautifulSoup
import requests
r = requests.get(
    "http://www.federalreserve.gov/econresdata/researchdata/feds200628_1.html")
soup = BeautifulSoup(r.content)
# column headers: every th tag with scope="col"
h = soup.find_all("th", scope="col")
# get all the tr tags after the last header; each row is its th text plus its td texts
final = [[t.th.text] + [ele.text for ele in t.find_all("td")] for t in h[-1].find_all_next("tr")]
headers = [th.text for th in h]
The final output is a list in which each row is its own list:
[['2015-06-05', '4.82039691', '-4.66420959', '-4.18904598',
'-3.94541434', '1.1477', '2.9361', '3.3588', '0.6943', '1.5881',
'2.3034', '2.7677', '3.0363', '3.1801', '3.2537', '3.2930', '3.3190',
'3.3431', '3.3707', '3.4038', '3.4428', '3.4871', '3.5357', '3.5876',
'3.6419', '3.6975', '3.7538', '3.8100', '3.8656', '3.9202', '3.9734',
'4.0250', '4.0748', '4.1225', '4.1682', '4.2117', '4.2530', '4.2921',
'0.3489', '0.7464', '1.1502', '1.4949', '1.7700', '1.9841', '2.1500',
'2.2800', '2.3837', '2.4685', '2.5396', '2.6006', '2.6544', '2.7027',
'2.7469', '2.7878', '2.8260', '2.8621', '2.8964', '2.9291', '2.9603',
'2.9901', '3.0187', '3.0461', '3.0724', '3.0976', '3.1217', '3.1448',
'3.1669', '3.1881', '0.3487', '0.7469', '1.1536', '1.5039', '1.7862',
'2.0078', '2.1811', '2.3179', '2.4277', '2.5181', '2.5943', '2.6603',
'2.7190', '2.7722', '2.8215', '2.8677', '2.9117', '2.9538', '2.9944',
'3.0338', '3.0721', '3.1094', '3.1458', '3.1814', '3.2161', '3.2501',
'3.2832', '3.3156', '3.3472', '3.3781', '1.40431658', '9.48795888'],
['2015-06-04', '4.64953424', '-4.52780982', '-3.98051369',
......................................
The headers:
['BETA0', 'BETA1', 'BETA2', 'BETA3', 'SVEN1F01', 'SVEN1F04', 'SVEN1F09', 'SVENF01', 'SVENF02', 'SVENF03', 'SVENF04', 'SVENF05', 'SVENF06', 'SVENF07', 'SVENF08', 'SVENF09', 'SVENF10', 'SVENF11', 'SVENF12', 'SVENF13', 'SVENF14', 'SVENF15', 'SVENF16', 'SVENF17', 'SVENF18', 'SVENF19', 'SVENF20', 'SVENF21', 'SVENF22', 'SVENF23', 'SVENF24', 'SVENF25', 'SVENF26', 'SVENF27', 'SVENF28', 'SVENF29', 'SVENF30', 'SVENPY01', 'SVENPY02', 'SVENPY03', 'SVENPY04', 'SVENPY05', 'SVENPY06', 'SVENPY07', 'SVENPY08', 'SVENPY09', 'SVENPY10', 'SVENPY11', 'SVENPY12', 'SVENPY13', 'SVENPY14', 'SVENPY15', 'SVENPY16', 'SVENPY17', 'SVENPY18', 'SVENPY19', 'SVENPY20', 'SVENPY21', 'SVENPY22', 'SVENPY23', 'SVENPY24', 'SVENPY25', 'SVENPY26', 'SVENPY27', 'SVENPY28', 'SVENPY29', 'SVENPY30', 'SVENY01', 'SVENY02', 'SVENY03', 'SVENY04', 'SVENY05', 'SVENY06', 'SVENY07', 'SVENY08', 'SVENY09', 'SVENY10', 'SVENY11', 'SVENY12', 'SVENY13', 'SVENY14', 'SVENY15', 'SVENY16', 'SVENY17', 'SVENY18', 'SVENY19', 'SVENY20', 'SVENY21', 'SVENY22', 'SVENY23', 'SVENY24', 'SVENY25', 'SVENY26', 'SVENY27', 'SVENY28', 'SVENY29', 'SVENY30', 'TAU1', 'TAU2']
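To make the rows easier to work with, each one can be paired with the headers. This is only a minimal sketch, assuming (as the output above suggests) that the first element of every row is the date and that the remaining values line up one-to-one with `headers`. The short `headers` and `final` lists here are truncated stand-ins for the real ones, and `row_to_record` is a hypothetical helper name:

```python
# Truncated stand-ins for the `headers` and `final` lists built above.
headers = ["BETA0", "BETA1", "BETA2"]
final = [
    ["2015-06-05", "4.82039691", "-4.66420959", "-4.18904598"],
    ["2015-06-04", "4.64953424", "-4.52780982", "-3.98051369"],
]


def row_to_record(row, headers):
    """Map one row to a dict, assuming row[0] is the date (an assumption)."""
    date, values = row[0], row[1:]
    if len(values) != len(headers):
        # Fail loudly instead of silently mis-aligning the columns.
        raise ValueError(f"expected {len(headers)} values, got {len(values)}")
    record = {"Date": date}
    record.update(zip(headers, values))
    return record


records = [row_to_record(row, headers) for row in final]
print(records[0]["BETA0"])
```

The length check is there because `zip` stops at the shorter input without complaint, which would hide a row whose cells do not match the headers.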
There are several issues in your code.
There is no table with class 'tablestats', so soup.find(...) returns None, and calling .find on None raises the AttributeError.
There are no 'th' elements with class 'header'.
The following line
table_items = [(a.text, b.text) for ele in data for a, b in [ele.find_all("td")]]
doesn't return just 2 values per row, so they can't be assigned to a, b.
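The unpacking problem can be reproduced without any HTML. In this small sketch the inner lists are hypothetical stand-ins for the cell texts that `ele.find_all("td")` would give for one row, and `unpack_two` is an illustrative helper, not part of the original code:

```python
# Hypothetical rows: each inner list stands in for the td texts of one tr.
rows = [
    ["2015-06-05", "4.82039691", "-4.66420959"],
    ["2015-06-04", "4.64953424", "-4.52780982"],
]


def unpack_two(cells):
    """Mimic `for a, b in [cells]`: works only with exactly two cells."""
    a, b = cells
    return a, b


try:
    unpack_two(rows[0])
    error = None
except ValueError as exc:
    # Three cells cannot be assigned to two names.
    error = str(exc)

# Star-unpacking keeps the first cell and collects whatever is left.
table_items = [(first, rest) for first, *rest in rows]
print(error)
print(table_items[0])
```

Star-unpacking is one way around it when the number of cells per row varies; indexing the cells you need is another.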