I want to scrape the URL below:
https://turo.com/search?airportCode=EWR&customDelivery=true&defaultZoomLevel=11&endDate=04%2F05%2F2019&endTime=11%3A00&international=true&isMapSearch=false&itemsPerPage=200&location=EWR&locationType=Airport&maximumDistanceInMiles=30&sortType=RELEVANCE&startDate=03%2F05%2F2019&startTime=10%3A00
I want the href links of all the cars listed so that I can scrape further, but I am unable to get them.
Please help.
You can request the JSON response behind the search directly and work with that:
import requests
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
base_url = 'https://turo.com'
url = 'https://turo.com/api/search?'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
'Accept': '*/*',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9',
'Connection': 'keep-alive',
'Host': 'turo.com',
'Referer': 'https://turo.com/search'}
params = {
'airportCode': 'EWR',
'customDelivery': 'true',
'defaultZoomLevel': '11',
'endDate': '04/05/2019',
'endTime': '11:00',
'international': 'true',
'isMapSearch': 'false',
'itemsPerPage': '200',
'location': 'EWR',
'locationType': 'Airport',
'maximumDistanceInMiles': '30',
'sortType': 'RELEVANCE',
'startDate': '03/05/2019',
'startTime': '10:00'}
response = requests.get(url, headers=headers, params=params)
data = response.json()
search_id = data['searchId']
print(search_id)
# Print the link for every listing
for ele in data['list']:
    link = ele['vehicle']['url']
    print(base_url + link)
# The same links collected in a list
links_list = [base_url + ele['vehicle']['url'] for ele in data['list']]
# If you want to manipulate the data as a table
df = json_normalize(data['list'])
Output:
https://turo.com/rentals/cars/nj/jersey-city/ford-mustang/385739
https://turo.com/rentals/suvs/nj/jersey-city/mazda-cx-9/84783
https://turo.com/rentals/cars/nj/chatham-township/mercedes-benz-s-class/439410
https://turo.com/rentals/cars/nj/edgewater/bmw-3-series/360338
https://turo.com/rentals/cars/nj/edgewater/mercedes-benz-s-class/528415
https://turo.com/rentals/cars/nj/jersey-city/mercedes-benz-c-class/546598
https://turo.com/rentals/cars/nj/wallington/bmw-3-series/523288
https://turo.com/rentals/cars/nj/edgewater/mercedes-benz-c-class/402723
https://turo.com/rentals/cars/nj/east-brunswick/dodge-charger/474370
https://turo.com/rentals/cars/nj/dover/mini-cooper/562685
https://turo.com/rentals/cars/nj/east-brunswick/dodge-challenger/245816
https://turo.com/rentals/cars/nj/city-of-orange/bmw-3-series/441085
https://turo.com/rentals/cars/nj/aberdeen-township/chevrolet-corvette/329734
https://turo.com/rentals/suvs/nj/montvale/nissan-rogue/551766
https://turo.com/rentals/cars/nj/haledon/ford-mustang/527092
https://turo.com/rentals/suvs/nj/montvale/toyota-rav4/551789
https://turo.com/rentals/cars/nj/wallington/bmw-3-series/506047
https://turo.com/rentals/suvs/nj/englewood/toyota-rav4/205796
https://turo.com/rentals/cars/nj/jersey-city/audi-tt-rs/454914
https://turo.com/rentals/suvs/nj/newark/jeep-grand-cherokee/547858
https://turo.com/rentals/suvs/nj/scotch-plains/acura-mdx/494289
https://turo.com/rentals/suvs/nj/jersey-city/dodge-journey/371145
https://turo.com/rentals/cars/nj/jersey-city/tesla-model-s/450268
https://turo.com/rentals/cars/nj/ridgefield-park/bmw-m3/479442
https://turo.com/rentals/suvs/nj/edgewater/bmw-x3/509043
https://turo.com/rentals/suvs/nj/lodi/kia-sorento/535604
https://turo.com/rentals/suvs/nj/scotch-plains/bmw-x5/555187
https://turo.com/rentals/suvs/nj/west-new-york/audi-q5/317033
https://turo.com/rentals/suvs/nj/jersey-city/honda-cr-v/523284
https://turo.com/rentals/suvs/nj/old-bridge-township/acura-mdx/384309
https://turo.com/rentals/cars/pa/wind-gap/bmw-3-series/558761
https://turo.com/rentals/suvs/nj/jersey-city/toyota-rav4/442146
https://turo.com/rentals/cars/nj/hillsborough-township/mercedes-benz-c-class/330064
https://turo.com/rentals/cars/nj/jersey-city/audi-a8/518856
https://turo.com/rentals/cars/nj/mount-olive-township/bmw-3-series/528020
https://turo.com/rentals/cars/il/lincolnwood/mercedes-benz-cls-class/207362
https://turo.com/rentals/cars/nj/jersey-city/bmw-5-series/515357
https://turo.com/rentals/cars/nj/newark/audi-a6/108287
https://turo.com/rentals/cars/nj/montvale/audi-a3/552084
https://turo.com/rentals/suvs/nj/jersey-city/bmw-x6-m/325161
https://turo.com/rentals/minivans/nj/jersey-city/honda-odyssey/256481
https://turo.com/rentals/cars/nj/dover/bmw-1-series/562679
https://turo.com/rentals/suvs/nj/fort-lee/audi-q7/437196
https://turo.com/rentals/cars/nj/edgewater/jaguar-xf/528929
https://turo.com/rentals/cars/nj/jersey-city/bmw-3-series/309937
https://turo.com/rentals/cars/nj/jersey-city/bmw-1-series/517828
https://turo.com/rentals/suvs/nj/old-bridge-township/jeep-wrangler/506051
https://turo.com/rentals/suvs/nj/jersey-city/bmw-x5/533213
https://turo.com/rentals/suvs/nj/morristown/honda-pilot/310610
https://turo.com/rentals/cars/nj/jersey-city/chevrolet-camaro/207528
https://turo.com/rentals/minivans/nj/jersey-city/honda-odyssey/506466
https://turo.com/rentals/suvs/nj/edgewater/acura-mdx/522686
https://turo.com/rentals/cars/nj/bloomfield/porsche-911/153991
https://turo.com/rentals/suvs/nj/west-new-york/land-rover-range-rover-sport/295016
https://turo.com/rentals/suvs/nj/jersey-city/infiniti-fx/533200
https://turo.com/rentals/suvs/nj/jersey-city/bmw-x5/551272
https://turo.com/rentals/suvs/nj/newark/kia-sorento/554517
https://turo.com/rentals/suvs/nj/jersey-city/honda-cr-v/517423
https://turo.com/rentals/suvs/nj/rahway/jeep-grand-cherokee/410048
https://turo.com/rentals/cars/nj/clifton/dodge-challenger/513745
https://turo.com/rentals/cars/nj/jersey-city/bmw-z4/569205
https://turo.com/rentals/suvs/nj/union-city/acura-rdx/159384
https://turo.com/rentals/cars/nj/roselle/chevrolet-impala/389132
https://turo.com/rentals/suvs/nj/newark/ford-edge/543229
https://turo.com/rentals/cars/nj/wallington/bmw-3-series/560189
https://turo.com/rentals/suvs/nj/rahway/nissan-murano/491762
https://turo.com/rentals/suvs/nj/sayreville/ford-escape/561262
https://turo.com/rentals/cars/nj/randolph/bmw-3-series/476134
https://turo.com/rentals/cars/nj/jersey-city/dodge-charger/353789
https://turo.com/rentals/suvs/nj/bloomfield/land-rover-range-rover-sport/526192
https://turo.com/rentals/cars/nj/hillsborough-township/infiniti-g-sedan/466276
https://turo.com/rentals/suvs/nj/edgewater/lexus-rx-350/455895
https://turo.com/rentals/trucks/nj/jackson/toyota-tundra/506040
https://turo.com/rentals/cars/nj/jersey-city/tesla-model-3/458395
https://turo.com/rentals/cars/nj/jersey-city/mercedes-benz-e-class/533931
https://turo.com/rentals/suvs/nj/monroe-township/bmw-x3/526846
https://turo.com/rentals/suvs/nj/roselle-park/volkswagen-tiguan/534144
https://turo.com/rentals/suvs/pa/easton/mercedes-benz-glk-class/422203
https://turo.com/rentals/cars/nj/jersey-city/bmw-3-series/543432
https://turo.com/rentals/cars/nj/edison/infiniti-q50/559651
https://turo.com/rentals/suvs/nj/elizabeth/jeep-grand-cherokee/566712
https://turo.com/rentals/cars/nj/edison/infiniti-q50/448312
https://turo.com/rentals/suvs/nj/rahway/toyota-highlander/524049
https://turo.com/rentals/cars/nj/newark/ford-mustang/551401
https://turo.com/rentals/suvs/nj/montvale/porsche-macan/551814
https://turo.com/rentals/cars/nj/hackensack/porsche-cayman/514871
https://turo.com/rentals/cars/nj/city-of-orange/mercedes-benz-c-class/414725
https://turo.com/rentals/cars/nj/east-brunswick/ford-flex/340268
https://turo.com/rentals/cars/nj/burlington/tesla-model-s/316123
https://turo.com/rentals/cars/nj/jersey-city/tesla-model-3/552630
https://turo.com/rentals/suvs/nj/jackson/ford-explorer/534670
https://turo.com/rentals/cars/nj/haledon/honda-insight/421898
https://turo.com/rentals/cars/nj/elizabeth/bmw-5-series/548519
https://turo.com/rentals/suvs/nj/montvale/mercedes-benz-glc-class/552101
https://turo.com/rentals/suvs/nj/jersey-city/bmw-x5/357098
https://turo.com/rentals/trucks/nj/weehawken/ford-f-150/405386
https://turo.com/rentals/cars/nj/jersey-city/audi-a4/541799
https://turo.com/rentals/suvs/nj/edgewater/jeep-wrangler/219798
https://turo.com/rentals/trucks/nj/montvale/ford-f-150/551838
https://turo.com/rentals/cars/nj/little-falls/lexus-is-250/179987
https://turo.com/rentals/cars/nj/old-bridge-township/maserati-granturismo-convertible/488496
https://turo.com/rentals/cars/nj/jersey-city/jaguar-f-type/420320
https://turo.com/rentals/suvs/nj/east-brunswick/gmc-acadia/348063
https://turo.com/rentals/cars/nj/garfield/mercedes-benz-e-class/500408
https://turo.com/rentals/cars/nj/edison/mercedes-benz-c-class/315170
https://turo.com/rentals/cars/nj/bayonne/mercedes-benz-cls-class/480787
https://turo.com/rentals/cars/nj/hammonton/ford-mustang/534026
https://turo.com/rentals/cars/nj/south-amboy/bmw-7-series/138755
https://turo.com/rentals/cars/nj/jersey-city/bmw-i3/344512
https://turo.com/rentals/cars/pa/easton/pontiac-g3/343750
https://turo.com/rentals/cars/nj/jersey-city/bmw-5-series/362079
https://turo.com/rentals/cars/nj/montvale/bmw-4-series/551976
https://turo.com/rentals/cars/nj/haledon/honda-civic/369357
https://turo.com/rentals/cars/nj/phillipsburg/porsche-boxster/420778
https://turo.com/rentals/suvs/nj/city-of-orange/infiniti-qx30/427614
https://turo.com/rentals/suvs/nj/jersey-city/toyota-highlander/196708
https://turo.com/rentals/suvs/nj/jersey-city/mercedes-benz-glk-class/546930
https://turo.com/rentals/cars/nj/elizabeth/mercedes-benz-e-class/473235
https://turo.com/rentals/cars/nj/newark/volkswagen-jetta/297019
https://turo.com/rentals/suvs/nj/little-ferry/honda-cr-v/483880
https://turo.com/rentals/cars/nj/jersey-city/mazda-3/379389
https://turo.com/rentals/cars/nj/jersey-city/tesla-model-3/539962
https://turo.com/rentals/cars/nj/jackson/cadillac-cts/506062
https://turo.com/rentals/cars/ct/sharon/bmw-7-series/536091
https://turo.com/rentals/suvs/nj/east-orange/porsche-macan/467839
https://turo.com/rentals/cars/nj/roselle/nissan-versa/473960
https://turo.com/rentals/cars/nj/hoboken/mercedes-benz-e-class/533198
https://turo.com/rentals/suvs/nj/hillsborough-township/gmc-acadia/553441
https://turo.com/rentals/cars/nj/lodi/tesla-model-3/568337
https://turo.com/rentals/cars/nj/jersey-city/mazda-6/528355
https://turo.com/rentals/cars/nj/west-new-york/mercedes-benz-e-class/544361
https://turo.com/rentals/suvs/nj/edison/gmc-terrain/416012
https://turo.com/rentals/cars/nj/newark/ford-focus/284564
https://turo.com/rentals/cars/nj/old-bridge-township/bmw-5-series/560039
https://turo.com/rentals/cars/nj/linwood/toyota-camry/219654
https://turo.com/rentals/cars/nj/jersey-city/honda-fit/429840
https://turo.com/rentals/suvs/nj/jersey-city/bmw-x5/429301
https://turo.com/rentals/suvs/nj/montvale/jaguar-f-pace/551780
https://turo.com/rentals/suvs/nj/elizabeth/ford-explorer/361114
https://turo.com/rentals/cars/nj/jersey-city/infiniti-g37/69252
https://turo.com/rentals/cars/nj/jersey-city/tesla-model-3/483663
https://turo.com/rentals/minivans/nj/linden/toyota-sienna/565755
https://turo.com/rentals/suvs/nj/dover/jeep-wrangler/273594
https://turo.com/rentals/suvs/nj/jersey-city/mercedes-benz-glk-class/147897
https://turo.com/rentals/suvs/nj/montvale/audi-q7/552088
https://turo.com/rentals/cars/nj/montvale/nissan-altima/551762
https://turo.com/rentals/cars/nj/jersey-city/bmw-3-series/382246
https://turo.com/rentals/cars/nj/newark/chevrolet-malibu/337149
https://turo.com/rentals/cars/nj/fairview/honda-accord/441008
https://turo.com/rentals/cars/pa/easton/ford-fiesta/324654
https://turo.com/rentals/cars/nj/montvale/mercedes-benz-c-class/552099
https://turo.com/rentals/suvs/nj/west-new-york/land-rover-lr3/317945
https://turo.com/rentals/cars/nj/montvale/bmw-4-series/551975
https://turo.com/rentals/cars/pa/lansdale/volvo-s80/502585
https://turo.com/rentals/suvs/nj/montvale/alfa-romeo-stelvio/567102
https://turo.com/rentals/cars/pa/easton/toyota-yaris/286270
https://turo.com/rentals/cars/nj/jersey-city/hyundai-accent/406944
https://turo.com/rentals/minivans/nj/jackson/toyota-sienna/506024
https://turo.com/rentals/suvs/nj/jersey-city/land-rover-range-rover-evoque/253107
https://turo.com/rentals/suvs/nj/bloomfield/land-rover-range-rover-sport/307494
https://turo.com/rentals/cars/nj/montvale/mercedes-benz-e-class/552109
https://turo.com/rentals/minivans/nj/newark/honda-odyssey/535200
https://turo.com/rentals/cars/nj/jersey-city/ford-mustang/500739
https://turo.com/rentals/suvs/nj/elizabeth/toyota-rav4/503028
https://turo.com/rentals/cars/nj/montvale/audi-a6/552092
https://turo.com/rentals/suvs/nj/paterson/honda-cr-v/312733
https://turo.com/rentals/cars/nj/montvale/bmw-5-series/551973
https://turo.com/rentals/cars/nj/teaneck/tesla-model-s/520126
https://turo.com/rentals/suvs/nj/little-falls/acura-mdx/222780
https://turo.com/rentals/suvs/nj/wallington/bmw-x3/440188
https://turo.com/rentals/suvs/nj/belleville/jeep-cherokee/501526
https://turo.com/rentals/suvs/nj/harrison/volkswagen-touareg/531531
https://turo.com/rentals/cars/nj/jersey-city/ford-mustang/438281
https://turo.com/rentals/suvs/nj/roxbury-township/bmw-x1/562709
https://turo.com/rentals/cars/nj/montvale/mercedes-benz-c-class/551803
https://turo.com/rentals/cars/pa/easton/nissan-versa/276913
https://turo.com/rentals/suvs/nj/city-of-orange/kia-sorento/101234
https://turo.com/rentals/suvs/nj/montvale/bmw-x5/551968
https://turo.com/rentals/minivans/pa/easton/toyota-sienna/215018
https://turo.com/rentals/cars/nj/edgewater/hyundai-elantra-touring/476041
https://turo.com/rentals/cars/nj/haledon/nissan-altima/318098
https://turo.com/rentals/suvs/nj/paterson/toyota-rav4/553455
https://turo.com/rentals/cars/nj/union-city/hyundai-sonata/461816
https://turo.com/rentals/cars/nj/montvale/jaguar-xe/551782
https://turo.com/rentals/cars/nj/northvale/bmw-4-series/483229
https://turo.com/rentals/cars/nj/harrison/audi-a5/518618
https://turo.com/rentals/cars/nj/newark/chevrolet-camaro/409288
https://turo.com/rentals/suvs/nj/montvale/porsche-cayenne/551802
https://turo.com/rentals/suvs/ct/west-haven/lexus-gx-460/557327
https://turo.com/rentals/trucks/nj/south-river/dodge-ram-1500/537439
https://turo.com/rentals/suvs/nj/hoboken/gmc-terrain/217576
https://turo.com/rentals/cars/nj/mount-olive-township/chevrolet-cruze/392840
https://turo.com/rentals/cars/nj/old-bridge-township/mercedes-benz-cls-class/488471
https://turo.com/rentals/cars/nj/garfield/nissan-altima/355678
https://turo.com/rentals/suvs/nj/jersey-city/infiniti-qx60/454257
https://turo.com/rentals/cars/nj/montvale/toyota-camry/551790
https://turo.com/rentals/cars/nj/edgewater/volkswagen-jetta/519761
https://turo.com/rentals/suvs/nj/weehawken/chevrolet-tahoe/97631
https://turo.com/rentals/cars/nj/elizabeth/honda-accord/524828
https://turo.com/rentals/cars/nj/montville/volkswagen-jetta/553559
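The two steps in the answer (turning the browser URL into params, then reading the links out of the JSON) can be tried without touching the network. This is only a minimal sketch, not Turo's documented API: the sample payload is invented and merely mirrors the structure the answer reads (list, then vehicle, then url), and params_from_url and extract_links are hypothetical helper names.

```python
from urllib.parse import urlsplit, parse_qsl

BASE_URL = "https://turo.com"


def params_from_url(search_url):
    """Turn the query string of a browser URL into a dict usable as requests params."""
    query = urlsplit(search_url).query
    # parse_qsl decodes percent-escapes, so '04%2F05%2F2019' becomes '04/05/2019'
    return dict(parse_qsl(query, keep_blank_values=True))


def extract_links(data, base_url=BASE_URL):
    """Collect absolute listing URLs from a search payload.

    Assumes the structure used in the answer: data['list'][i]['vehicle']['url'].
    Entries without that key are skipped instead of raising KeyError.
    """
    links = []
    for entry in data.get("list", []):
        path = entry.get("vehicle", {}).get("url")
        if path:
            links.append(base_url + path)
    return links


# Hypothetical payload, shaped like the response the answer works with
sample = {
    "searchId": "example",
    "list": [
        {"vehicle": {"url": "/rentals/cars/nj/jersey-city/ford-mustang/385739"}},
        {"vehicle": {}},  # malformed entry, skipped
    ],
}

params = params_from_url(
    "https://turo.com/search?airportCode=EWR&endDate=04%2F05%2F2019&endTime=11%3A00"
)
links = extract_links(sample)
print(params)
print(links)
```

With a live response, extract_links(response.json()) would take the place of the list comprehension, at the cost of silently skipping entries that lack a url.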
Related
I'm writing a Python script to scrape a table from this site (this is public information about ocean tide levels).
One of the stations I'd like to scrape is Punta del Este, code 83.0, for any given day. But my script returns a different table than the browser, even when the POST request seems to have the same input.
When I fill the form in my browser, the headers and data sent to the server are these:
So I wrote my script to make a POST request as follows:
import requests
from datetime import datetime, timedelta
from bs4 import BeautifulSoup
url = 'https://www.ambiente.gub.uy/SIH-JSF/paginas/sdh/consultaHDMCApublic.xhtml'
s = requests.Session()
r = s.get(url, verify=False)
soupGet = BeautifulSoup(r.content, 'lxml')
#JSESSIONID = s.cookies['JSESSIONID']
javax_faces_ViewState = soupGet.find("input", {"type": "hidden", "name":"javax.faces.ViewState"})['value']
headersSih = {
'Accept': 'application/xml, text/xml, */*; q=0.01',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'es-ES,es;q=0.6',
'Connection': 'keep-alive',
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
# 'Cookie': 'JSESSIONID=FBE5ZdMQVFrgQ-P6K_yTc1bw.dinaguasihappproduccion',
'Faces-Request': 'partial/ajax',
'Origin': 'https://www.ambiente.gub.uy',
'Referer': url,
'Sec-Fetch-Dest': 'empty',
'Sec-Fetch-Mode': 'cors',
'Sec-Fetch-Site': 'same-origin',
'Sec-GPC': '1',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
'X-Requested-With': 'XMLHttpRequest',
}
ini_date = datetime.strftime(fecha0 , '%d/%m/%Y %H:%M')
end_date = datetime.strftime(fecha0 + timedelta(days=1), '%d/%m/%Y %H:%M')
codigo = 830
dataSih = {
'javax.faces.partial.ajax': 'true',
'javax.faces.source': 'formConsultaHorario:j_idt64',
'javax.faces.partial.execute': '#all',
'javax.faces.partial.render': 'formConsultaHorario:pnlhorarioConsulta',
'formConsultaHorario:j_idt64': 'formConsultaHorario:j_idt64',
'formConsultaHorario': 'formConsultaHorario',
'formConsultaHorario:estacion_focus': '',
'formConsultaHorario:estacion_input': codigo,
'formConsultaHorario:fechaDesde_input': ini_date,
'formConsultaHorario:fechaHasta_input': end_date,
'formConsultaHorario:variables_focus': '',
'formConsultaHorario:variables_input': '26', # Variable: H,Nivel
'formConsultaHorario:fcal_focus': '',
'formConsultaHorario:fcal_input': '7', # Tipo calculo: Ingresado
'formConsultaHorario:ptiempo_focus': '',
'formConsultaHorario:ptiempo_input': '2', #Paso de tiempo: Escala horaria
'javax.faces.ViewState': javax_faces_ViewState,
}
page = s.post(url, headers=headersSih, data=dataSih)
However, when I do it in the browser I get a table full of data, while the Python request returns (in page.text) a table saying "No data was found".
Is there something I'm missing? I've tried changing a lot of things, but nothing seems to do the trick.
The data on this website may be loaded by JavaScript, which Requests does not execute. If you want to get the data from there, use Selenium.
I'm working on scraping barchart.com using modified code from this Stack Overflow question:
The header and payload information are from the XHR of the website I was attempting to scrape.
from urllib.parse import unquote
geturl=r'https://www.barchart.com/options/highest-implied-volatility'
apiurl=r'https://www.barchart.com/proxies/core-api/v1/quotes/get'
getheaders={
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-US,en;q=0.9',
'cache-control': 'max-age=0',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36'
}
getpay={
'page': 'all'
}
s=requests.Session()
r=s.get(geturl,params=getpay, headers=getheaders)
headersIV = {
'method': 'GET',
'scheme': 'https',
'authority': 'www.barchart.com',
'Host' : 'www.barchart.com',
'Accept': 'application/json',
'Accept-Encoding': 'gzip, deflate, br',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Safari/605.1.15',
'Accept-Language': 'en-us',
'Referer': 'https://www.barchart.com/options/highest-implied-volatility',
'Connection': 'keep-alive',
'X-XSRF-TOKEN': 'eyJpdiI6Ik8vQTBkcGxZVVF1aG5QeE9TUnk5L3c9PSIsInZhbHVlIjoiMDd6STJyM1FPZEtMMFdLNEcrVjNNWUMva1l3WWxwblMvdEFZMEIzSllzalFySGFoblcyRzgrRmNZa1RMRHdZcTlBVExQTjBQUEhVdTVaNWhMZlJ0ZFM4c3ZaeHMvVmptM2FGQXJobnM1WTl1REx1d3M1eDI2RUc2SEtHY2wzTnUiLCJtYWMiOiIyNGExYjI3N2JkOGRiZGEwYjY4MTQ3OGFiYmYxZGE3ZmJhZmQyMDQwM2NiZTc0YTMzZDFkNjI4ZGIwZmY2YTU0In0=',
'path': '/proxies/core-api/v1/options/get?fields=symbol%2CbaseSymbol%2CbaseLastPrice%2CbaseSymbolType%2CsymbolType%2CstrikePrice%2CexpirationDate%2CdaysToExpiration%2CbidPrice%2Cmidpoint%2CaskPrice%2ClastPrice%2Cvolume%2CopenInterest%2CvolumeOpenInterestRatio%2Cvolatility%2CtradeTime%2CsymbolCode%2ChasOptions&orderBy=volatility&baseSymbolTypes=stock&between(lastPrice%2C.10%2C)=&between(daysToExpiration%2C15%2C)=&between(tradeTime%2C2021-10-21%2C2021-10-22)=&orderDir=desc&between(volatility%2C60%2C)=&limit=200&between(volume%2C500%2C)=&between(openInterest%2C100%2C)=&in(exchange%2C(AMEX%2CNASDAQ%2CNYSE))=&meta=field.shortName%2Cfield.type%2Cfield.description&hasOptions=true&raw=1',
}
payloadIV={
'fields': 'symbol,baseSymbol,baseLastPrice,baseSymbolType,symbolType,strikePrice,expirationDate,daysToExpiration,bidPrice,midpoint,askPrice,lastPrice,volume,openInterest,volumeOpenInterestRatio,volatility,tradeTime,symbolCode,hasOptions',
'orderBy': 'volatility',
'baseSymbolTypes': 'stock',
'between(lastPrice,.10,)':'',
'between(daysToExpiration,15,)':'',
'between(tradeTime,2021-10-21,2021-10-22)':'',
'orderDir': 'desc',
'between(volatility,60,)':'',
'limit': '200',
'between(volume,500,)':'',
'between(openInterest,100,)':'',
'in(exchange,(AMEX,NASDAQ,NYSE))':'',
'meta': 'field.shortName,field.type,field.description',
'hasOptions': 'true',
'raw': '1'
}
r=s.get(apiurl,params=payloadIV,headers=headersIV)
j=r.json()
print(j)
It returns this error message: {'error': {'message': 'Internal error.', 'code': 500}}
I am pretty new to scraping data through APIs and XHR requests. I think I am doing many things correctly, but I don't know where I am making the mistake.
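Two things in the request above may be worth checking, though neither is confirmed as the cause of the 500 error. First, method, scheme, authority and path are HTTP/2 pseudo-headers that browser dev tools list next to the real headers; when passed to requests they are sent as ordinary header fields. Second, the X-XSRF-TOKEN value is hard-coded, while such tokens are commonly tied to the session opened by the first s.get(). The sketch below drops the pseudo-headers and rebuilds the token from a cookie dict. The cookie name XSRF-TOKEN and both helper names are assumptions, and the cookie value is a made-up sample.

```python
from urllib.parse import unquote

# Names that dev tools show for HTTP/2 requests (with a leading colon);
# they describe the request line and are not regular header fields.
PSEUDO_HEADERS = {"method", "scheme", "authority", "path"}


def strip_pseudo_headers(headers):
    """Return a copy of headers without HTTP/2 pseudo-header names."""
    return {
        name: value
        for name, value in headers.items()
        if name.lstrip(":").lower() not in PSEUDO_HEADERS
    }


def with_session_token(headers, cookies, cookie_name="XSRF-TOKEN"):
    """Copy headers and set X-XSRF-TOKEN from the URL-encoded session cookie.

    cookie_name is an assumption; check the real cookie names in the session.
    """
    updated = dict(headers)
    if cookie_name in cookies:
        updated["X-XSRF-TOKEN"] = unquote(cookies[cookie_name])
    return updated


copied_headers = {
    "method": "GET",
    "authority": "www.barchart.com",
    "Accept": "application/json",
    "X-XSRF-TOKEN": "stale-token-copied-from-the-browser",
}
sample_cookies = {"XSRF-TOKEN": "abc%3D%3D"}  # invented value

clean_headers = with_session_token(strip_pseudo_headers(copied_headers), sample_cookies)
print(clean_headers)
```

With a live session, the cookie dict would come from s.cookies.get_dict() after the first s.get(geturl, ...), and clean_headers would be passed to the second request.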
I am trying to scrape the top-selling products from "shopee.com.my" with Scrapy, and I also tried with requests, but I failed to get a valid JSON object. My requests code is given below:
import requests as r
import json
data = {
'authority': 'shopee.com.my',
'method': 'GET',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-US,en;q=0.9',
'scheme': 'https',
'accept': '*/*, application/json',
'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36',
'x-api-source': 'pc',
'x-requested-with': 'XMLHttpRequest',
'x-shopee-language': 'en',
'sec-fetch-dest': 'empty',
'sec-fetch-mode': 'cors',
'sec-fetch-site': 'same-origin',
}
subcat_url = '/Boys-Fashion-cat.27.2427'
id = subcat_url.split('.')[-1]
data['path'] = f'/api/v2/search_items/?by=sales&limit=50&match_id={id}&newest=0&order=desc&page_type=search&version=2'
data['referer'] = f'https://shopee.com.my{subcat_url}?page=0&sortBy=sales'
url = f'https://shopee.com.my/api/v2/search_items/?by=sales&match_id={id}&newest=0&order=desc&page_type=search&version=2'
req = r.get(url, headers=data)
items = req.json()['items']
print(items)
print(f'Items length: {len(items)}')
Here is my Scrapy code:
import scrapy
import json
from scrapy import Request
from scrapy.http.cookies import CookieJar
header_data = {'authority': 'shopee.com.my',
'method': 'GET',
'scheme': 'https',
'accept': '*/*',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-US,en;q=0.9',
'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36',
# 'cookie': 'SPC_U=-; SPC_IA=-1; SPC_EC=-; SPC_F=7jrWAm4XYNNtyVAk83GPknN8NbCMQEIk; REC_T_ID=476673f8-eeb0-11ea-8919-48df374df85c; _gcl_au=1.1.1197882328.1599225148; _med=refer; _fbp=fb.2.1599225150134.114138691; language=en; _ga=GA1.3.1167355736.1599225151; csrftoken=mu9M72KLd73P9QJusB9zFBP6wV3NGg85; _gid=GA1.3.273342972.1603211749; SPC_SI=yxvc89nmqe97ldvpo6wgeybtc8berzyd; welcomePkgShown=true; AMP_TOKEN=%24NOT_FOUND; REC_MD_41_1000027=1603289427_0_50_0_48; SPC_CT_48918e31="1603289273.lUS7x9IuKN5vNbhzibZCOHrIf6vVQmykU/TXxiOii7w="; SPC_CT_57540430="1603289278.FLT3IdzHC32RmEzFxkOi9pI7qhKIs/yq328elYMuwps="; SPC_CT_50ee4e78="1603289299.gvjW32HwgiQGN/4kj2Ac3YFrpqyHVTO8+UjM+uzxy4E="; _dc_gtm_UA-61915055-6=1; SPC_CT_75d7a2b7="1603289557.t5FvxXhnJacZrKkjnIWCUbAgAxAQ3hG5c1tZBzafwc4="; SPC_R_T_ID="n6Ek85JJY1JZATlhgutfB4KB3qrbmFDYX1+udv1EBAPegPE9xuzM8HFeCy1duskY9+DVLJxe4RqaabhyUuojHQG0NI2TqegihbAge+s3k7w="; SPC_T_IV="SGNXqyZ1jtRYpo5kFeKtYg=="; SPC_R_T_IV="SGNXqyZ1jtRYpo5kFeKtYg=="; SPC_T_ID="n6Ek85JJY1JZATlhgutfB4KB3qrbmFDYX1+udv1EBAPegPE9xuzM8HFeCy1duskY9+DVLJxe4RqaabhyUuojHQG0NI2TqegihbAge+s3k7w="',
'sec-fetch-dest': 'empty',
'sec-fetch-mode': 'cors',
'sec-fetch-site': 'same-origin',
'x-api-source': 'pc',
'x-requested-with': 'XMLHttpRequest',
'x-shopee-language': 'en',
}
class TestSpider(scrapy.Spider):
    name = 'test'
    allowed_domains = ['shopee.com', 'shopee.com.my', 'shopee.com.my/api/']

    def start_requests(self):
        subcat_url = '/Baby-Toddler-Play-cat.27.23785'
        id = subcat_url.split('.')[-1]
        header_data['path'] = f'/api/v2/search_items/?by=sales&limit=50&match_id={id}&newest=0&order=desc&page_type=search&version=2'
        header_data['referer'] = f'https://shopee.com.my{subcat_url}?page=0&sortBy=sales'
        url = f'https://shopee.com.my/api/v2/search_items/?by=sales&limit=50&match_id={id}&newest=0&order=desc&page_type=search&version=2'
        yield Request(url=url, headers=header_data)

    def parse_data(self, response):
        try:
            jdata = json.loads(response.body)
            return None
        except Exception as e:
            print(f'exception: {e}')
            print(response.body)
            return None
        items = jdata['items']
        for item in items:
            name = item['name']
            image_path = item['image']
            absolute_image = f'https://cf.shopee.com.my/file/{image_path}_tn'
            print(f'this is absolute image {absolute_image}')
            monthly_sold = 'pending'
            price = float(item['price'])/100000
            total_sold = item['sold']
            location = item['shop_location']
            stock = item['stock']
            print(name)
            print(price)
            print(total_sold)
            print(location)
            print(stock)
I am not using cookies now; I also tried with fresh cookies but still got no response.
Here are some example links. Some of them always return a valid JSON object, but others do not return any response. The browser links and the matching API links are below:
https://shopee.com.my/Kids-Sports-Outdoor-Play-cat.27.21700?page=0&sortBy=sales
https://shopee.com.my/api/v2/search_items/?by=sales&limit=50&match_id=21700&newest=0&order=desc&page_type=search&version=2
https://shopee.com.my/Bath-Toiletries-cat.27.2422
https://shopee.com.my/api/v2/search_items/?by=sales&limit=50&match_id=2422&newest=0&order=desc&page_type=search&version=2
You can also see the API links in the browser's network tab.
I think you are missing a required header. I sent the request with the headers below and it worked:
from pprint import pprint
import requests
headers = {
'authority': 'shopee.com.my',
'pragma': 'no-cache',
'cache-control': 'no-cache',
'x-shopee-language': 'en',
'x-requested-with': 'XMLHttpRequest',
'if-none-match-': '55b03-c3d70d78b473147beeb6551fa9df8ca0',
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36',
'x-api-source': 'pc',
'accept': '*/*',
'sec-fetch-site': 'same-origin',
'sec-fetch-mode': 'cors',
'sec-fetch-dest': 'empty',
'referer': 'https://shopee.com.my/Kids-Sports-Outdoor-Play-cat.27.21700?page=0&sortBy=sales',
'accept-language': 'es-US,es;q=0.9,en-US;q=0.8,en;q=0.7,es-419;q=0.6',
# 'cookie': '_gcl_au=1.1.1866522785.1603486253; _fbp=fb.2.1603486253254.1114160447; SPC_IA=-1; SPC_EC=-; SPC_U=-; SPC_F=9RO26eJM7IQiFlxki0dAdQCcCsgPwz67; REC_T_ID=71a698d6-1571-11eb-9baf-48df3757c438; SPC_SI=mall.n58BgakbNjCD5RDYlsQJ8EurmBkH5HIY; SPC_CT_c49f0fdc="1603486254.GqWz1BPlfz3MKmUufL3eTwFqgUfdKWcWVf2xiJI7nSk="; SPC_R_T_ID="89vber/2TKnfACAmGbXpxC3BzHc0ajEQMPxgMbAlZnQlgEo7YWmya0sf/KRt1FsoZvaFYKoNDk+Rh9YWLWsNMH324iqgZePbam1q9QpYQlE="; SPC_T_IV="vko6vAtWsyHuqteFHAoPIA=="; SPC_R_T_IV="vko6vAtWsyHuqteFHAoPIA=="; SPC_T_ID="89vber/2TKnfACAmGbXpxC3BzHc0ajEQMPxgMbAlZnQlgEo7YWmya0sf/KRt1FsoZvaFYKoNDk+Rh9YWLWsNMH324iqgZePbam1q9QpYQlE="; AMP_TOKEN=%24NOT_FOUND; _ga=GA1.3.602723004.1603486255; _gid=GA1.3.657631736.1603486255; _dc_gtm_UA-61915055-6=1; language=en',
}
params = (
('by', 'sales'),
('limit', '50'),
('match_id', '21700'),
('newest', '0'),
('order', 'desc'),
('page_type', 'search'),
('version', '2'),
)
response = requests.get('https://shopee.com.my/api/v2/search_items/', headers=headers, params=params)
pprint(response.json())
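To find out which header actually makes the difference, it can help to diff the failing header set against the working one. The following is a small sketch: diff_headers is a hypothetical helper, and the two dicts are trimmed-down excerpts of the headers shown above. Names are compared case-insensitively because HTTP header names are case-insensitive.

```python
def diff_headers(failing, working):
    """Compare two header dicts by case-insensitive name.

    Returns (missing, changed): names present only in `working`,
    and names present in both whose values differ.
    """
    failing_lc = {name.lower(): value for name, value in failing.items()}
    working_lc = {name.lower(): value for name, value in working.items()}

    missing = sorted(set(working_lc) - set(failing_lc))
    changed = sorted(
        name
        for name in set(working_lc) & set(failing_lc)
        if working_lc[name] != failing_lc[name]
    )
    return missing, changed


# Excerpt of the headers from the question
failing = {
    "accept": "*/*, application/json",
    "x-api-source": "pc",
    "x-requested-with": "XMLHttpRequest",
}
# Excerpt of the headers from this answer
working = {
    "accept": "*/*",
    "x-api-source": "pc",
    "X-Requested-With": "XMLHttpRequest",
    "if-none-match-": "55b03-c3d70d78b473147beeb6551fa9df8ca0",
    "pragma": "no-cache",
}

missing, changed = diff_headers(failing, working)
print("missing:", missing)
print("changed:", changed)
```

Adding the missing headers to the failing request one at a time then shows which of them the server really requires.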
I have been looking at the requests the site makes when I log in and have tried to replicate them, but with no luck. Could someone please help me write some code that uses requests to log in to the site? The login URL is https://www.supplystore.com.au/shop/login.aspx and I'm pretty sure the site uses reCAPTCHA v3.
import requests
from bs4 import BeautifulSoup as bs

s = requests.session()

def load_task():
    # user and password must be defined elsewhere
    login_url = 'https://www.supplystore.com.au/shop/login.aspx'
    r_login = s.get(login_url)
    soup = bs(r_login.text, 'html.parser')
    __VIEWSTATE = soup.find('input', {'name': '__VIEWSTATE'})['value']
    #print(__VIEWSTATE)
    __VIEWSTATEGENERATOR = soup.find('input', {'name': '__VIEWSTATEGENERATOR'})['value']
    #print(__VIEWSTATEGENERATOR)
    form_data = {'__EVENTTARGET': '',
                 '__EVENTARGUMENT': '',
                 '__VIEWSTATE': __VIEWSTATE,
                 'ctl00$ctl00$ctl00$ContentPlaceHolderDefault$PageContentPlaceholder$loginForm$UserName': user,
                 'ctl00$ctl00$ctl00$ContentPlaceHolderDefault$PageContentPlaceholder$loginForm$Password': password,
                 'ctl00$ctl00$ctl00$ContentPlaceHolderDefault$PageContentPlaceholder$loginForm$RememberMeSet': 'on',
                 'ctl00$ctl00$ctl00$ContentPlaceHolderDefault$PageContentPlaceholder$loginForm$RedirectUrl': '/',
                 'ctl00$ctl00$ctl00$ContentPlaceHolderDefault$PageContentPlaceholder$loginForm$Login': 'Login',
                 '__VIEWSTATEGENERATOR': __VIEWSTATEGENERATOR}
    headers = {
        'origin': login_url,
        'referer': 'https://www.supplystore.com.au/shop/login.aspx',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36',
        'DNT': '1',
    }
    send_r = s.post(login_url, data=form_data, headers=headers)
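ASP.NET WebForms pages often carry more hidden fields than __VIEWSTATE and __VIEWSTATEGENERATOR (for example __EVENTVALIDATION), so one option is to collect every hidden input instead of naming them one by one. A minimal sketch with the standard library's html.parser follows. The sample HTML is invented and is not the real login page, and hidden_fields is a hypothetical helper. It also does nothing about reCAPTCHA: if the server validates a reCAPTCHA token, a plain POST is still likely to be rejected.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            # A missing value attribute is submitted as an empty string
            self.fields[attributes["name"]] = attributes.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden form fields found in the HTML text."""
    collector = HiddenInputCollector()
    collector.feed(html)
    collector.close()
    return collector.fields


# Invented stand-in for r_login.text
sample_html = """
<form method="post" action="login.aspx">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTA4" />
  <input type="hidden" name="__VIEWSTATEGENERATOR" value="C2EE9ABB" />
  <input type="hidden" name="__EVENTVALIDATION" value="AAAA" />
  <input type="text" name="UserName" />
</form>
"""

form_data = hidden_fields(sample_html)
print(form_data)
```

The user name, password and button fields can then be added with form_data.update({...}) before the POST.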
I am writing a program in Python 2.7 that should send a POST request and show the result in the browser. Here is my code, which should explain it much better than I can:
import requests, json, webbrowser
pid = "AQ6723"
size = "660"
recaptcha = ""
baseURL = 'http://www.adidas.co.uk/on/demandware.store/Sites-adidas-GB-Site/en_GB/Cart-MiniAddProduct'
payload = {
'dwfrm_cart_continueShopping': 'Continue+Shopping',
'layer': 'Add+To+Bag+overlay',
'pid': '%20' + pid + '_' + size,
'g-recaptcha-response': recaptcha,
'Quantity': "1",
'masterPid': pid,
'ajax': "true"
}
headers = {
'Host': 'www.adidas.co.uk',
'Connection': 'keep-alive',
'Content-Length': '85',
'Accept': '*/*',
'Origin': 'http://www.adidas.co.uk',
'X-Requested-With': 'XMLHttpRequest',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36',
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'en-US,en;q=0.8,de;q=0.6',
}
print(pid)
finishedProduct = requests.post(baseURL, data = json.dumps(payload), headers = headers)
webbrowser.open(finishedProduct)
This obviously isn't correct, but how could I achieve this? I just want to be able to see the result of the POST request in the browser, which ultimately would be a product in the cart.
There is no way to do this natively in Python. If you are looking to automate your browser, look into Selenium, which has Python bindings.
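If seeing the returned HTML is enough, a lighter option is to write the response body to a file and open that file. This is only a sketch, written for Python 3 rather than 2.7: it shows the markup the server returned, but the browser does not share the cookies of the requests session, so it will not show the item in a cart. save_response_html and show_in_browser are hypothetical helpers, and the HTML string stands in for finishedProduct.text.

```python
import os
import tempfile
import webbrowser
from pathlib import Path


def save_response_html(html_text):
    """Write html_text to a temporary .html file and return its file:// URI."""
    fd, name = tempfile.mkstemp(suffix=".html")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(html_text)
    return Path(name).as_uri()


def show_in_browser(html_text):
    """Save the HTML and ask the default browser to open it."""
    return webbrowser.open(save_response_html(html_text))


# Stand-in for finishedProduct.text
page_uri = save_response_html("<html><body><p>Added to bag</p></body></html>")
print(page_uri)
```

Calling show_in_browser(finishedProduct.text) after the POST would open the saved page in the default browser.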