Python - Requests pulling HTML instead of JSON

I'm building a Python web scraper (personal use) and am running into some trouble retrieving a JSON file. I was able to find the request URL I need, but when I run my script (I'm using Requests) the URL returns HTML instead of the JSON shown in the Chrome Developer Tools console. Here's my current script:
import requests
import json
url = 'https://nytimes.wd5.myworkdayjobs.com/Video?clientRequestID=1f1a6071627946499b4b09fd0f668ef0'
r = requests.get(url)
print(r.text)
Completely new to Python, so any push in the right direction is greatly appreciated. Thanks!

It looks like that website chooses its response format based on the Accept header sent with the request. So try:
import requests
import json
url = 'https://nytimes.wd5.myworkdayjobs.com/Video?clientRequestID=1f1a6071627946499b4b09fd0f668ef0'
r = requests.get(url, headers={'accept': 'application/json'})
print(r.json())
You can have a look at the full API documentation for further reference: http://docs.python-requests.org/en/latest/api/.
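Some servers ignore the Accept header and send HTML anyway, so it can help to check the Content-Type before parsing. A minimal sketch of that check (the helper name and sample bodies below are mine, not from the answer):

```python
import json

def decode_json_response(content_type, text):
    """Return parsed JSON if the server actually sent JSON, else None.

    content_type: value of the Content-Type response header
    text: the response body as a string
    """
    if 'application/json' in content_type:
        return json.loads(text)
    return None

# A server that honored the Accept header:
print(decode_json_response('application/json;charset=UTF-8', '{"jobs": []}'))
# A server that ignored it and sent HTML:
print(decode_json_response('text/html', '<html></html>'))
```

With a real response you would call it as `decode_json_response(r.headers.get('Content-Type', ''), r.text)`.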

Related

Is it possible to send python requests data in format "&username=login&password=password"

I need to send Python requests data as application/x-www-form-urlencoded. Couldn't find the answer. It must be that format, otherwise the site won't let me through :(
A simple POST with the data= keyword should work; Requests then encodes the body as application/x-www-form-urlencoded for you (the URL below is a placeholder for the real endpoint):
import requests
url = 'https://example.com/login'  # replace with the site's actual login URL
r = requests.post(url, data={'username': 'login', 'password': 'password'})
If the site instead expected JSON, you would post it like this (note this sends application/json, not the urlencoded format you need):
import requests
r = requests.post(url, json={'username': 'login', 'password': 'password'})
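To see exactly what body `data=` produces, you can inspect a prepared request without sending anything over the network (the URL here is a placeholder and is never contacted):

```python
import requests

req = requests.Request(
    'POST',
    'https://example.com/login',  # placeholder endpoint, never actually contacted
    data={'username': 'login', 'password': 'password'},
).prepare()

# Requests has already urlencoded the dict and set the header:
print(req.body)                     # username=login&password=password
print(req.headers['Content-Type'])  # application/x-www-form-urlencoded
```

This confirms the body matches the `&username=login&password=password` format the site expects.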

get request using python requests module

I'm trying to get the flight information and prices from https://www.easyjet.com using the requests module.
In the browser, when I fill in the form on easyjet.com and click submit, it internally fetches the data with the following call:
https://www.easyjet.com/ejavailability/api/v15/availability/query?AdditionalSeats=0&AdultSeats=1&ArrivalIata=%23PARIS&ChildSeats=0&DepartureIata=%23LONDON&IncludeAdminFees=true&IncludeFlexiFares=false&IncludeLowestFareSeats=true&IncludePrices=true&Infants=0&IsTransfer=false&LanguageCode=EN&MaxDepartureDate=2018-02-23&MinDepartureDate=2018-02-23
When I try to mimic the same call with the following code, I don't get the expected response. I'm pretty new to this domain; can anyone help me understand what is going wrong?
Here is my code:
import requests
url = 'https://www.easyjet.com/en/'
url1 = 'https://www.easyjet.com/ejavailability/api/v15/availability/query?AdditionalSeats=0&AdultSeats=1&ArrivalIata=%23PARIS&ChildSeats=0&DepartureIata=%23LONDON&IncludeAdminFees=true&IncludeFlexiFares=false&IncludeLowestFareSeats=true&IncludePrices=true&Infants=0&IsTransfer=false&LanguageCode=EN&MaxDepartureDate=2018-02-23&MinDepartureDate=2018-02-21'
http = requests.Session()
response = http.get(url, verify=False)
response1 = http.get(url1, verify=False)
print(response1.text)
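One common reason such a mimicked XHR call fails is that the API expects browser-like headers (Accept, Referer, User-Agent) that requests does not send by default. A sketch of setting them on a session; which headers the easyjet API actually checks is an assumption on my part, not something verified:

```python
import requests

# Headers a real browser would send with the XHR call; which of these
# the server actually requires is an assumption, not verified.
browser_headers = {
    'Accept': 'application/json, text/plain, */*',
    'Referer': 'https://www.easyjet.com/en/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
}

session = requests.Session()
session.headers.update(browser_headers)
# session.get(url)   # first load the page so the session picks up cookies
# session.get(url1)  # then call the availability API with the same session
print(session.headers['Referer'])
```

Using one Session for both the page load and the API call also carries over any cookies the site sets, which some APIs check.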

Python : Extract requests query parameters that any given URL receives

Background:
Typically, if I want to see what type of requests a website is getting, I would open up chrome developer tools (F12), go to the Network tab and filter the requests I want to see.
Example:
Once I have the request URL, I can simply parse the URL for the query string parameters I want.
This is a very manual task and I thought I could write a script that does this for any URL I provide. I thought Python would be great for this.
Task:
I have found a library called requests that I use to validate the URL before opening.
testPage = "http://www.google.com"
validatedRequest = str(requests.get(testPage, verify=False).url)
page = urlopen(validatedRequest)
However, I am unsure of how to get the requests that the URL I enter receives. Is this possible in python? A point in the right direction would be great. Once I know how to access these request headers, I can easily parse through.
Thank you.
You can use the urlparse function to fetch the query params:
Demo:
import requests
import urllib
from urlparse import urlparse
testPage = "http://www.google.com"
validatedRequest = str(requests.get(testPage, verify=False).url)
page = urllib.urlopen(validatedRequest)
print urlparse(page.url).query
Result:
gfe_rd=cr&dcr=0&ei=ISdiWuOLJ86dX8j3vPgI
Tested in python2.7
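In Python 3 the same demo looks like this (urlparse moved into urllib.parse); a query string is hardcoded here so the example works offline:

```python
from urllib.parse import urlparse, parse_qs

# In Python 3, urlparse and parse_qs live in urllib.parse.
url = 'http://www.google.com/?gfe_rd=cr&dcr=0'

print(urlparse(url).query)            # gfe_rd=cr&dcr=0
print(parse_qs(urlparse(url).query))  # {'gfe_rd': ['cr'], 'dcr': ['0']}
```

`parse_qs` goes one step further than the original answer and splits the query string into a dict, which saves the manual parsing step mentioned in the question.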

Error 403 when getting data from Airbnb API

I am trying to pull data from the Airbnb API, but I get an HTTP 403 (Forbidden) error when running my code, which means I somehow don't have access to the Airbnb server. However, I do have an API key. Can someone please help me out here?
This is my code:
#Import required modules
import amadeus
import urllib2
import json
client_id= "**********"
#URL
URL = "https://api.airbnb.com/v2/search_results? client_id=***********otqw18e8nh5nty&locale=en-US&currency=USD&_format=for_search_results_with_minimal_pricing&_limit=10&_offset=0&fetch_facets=true&guests=1&ib=false&ib_add_photo_flow=true&location=Lake%20Tahoe%2C%20CA%2C%20US&min_bathrooms=0&min_bedrooms=0&min_beds=1&min_num_pic_urls=10&price_max=210&price_min=40&sort=1&user_lat=37.3398634&user_lng=-122.0455164"
print URL
#Convert to Json format
json_obj = urllib2.urlopen(URL)
data = json.load(json_obj)
print data
You have to send your API key in the request like this:
import urllib2
request = urllib2.Request("yourURL", headers={"X-Airbnb-OAuth-Token" : "yourapikey"})
contents = urllib2.urlopen(request).read()
(I'm not 100% sure, but maybe it helps.)
Remove the space between '?' and 'client_id=...'.
When I do that and then make a curl call, I get results.
And: never post your API key on sites like this.
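Letting a library build the query string from a dict avoids accidents like that stray space altogether. A sketch using a prepared request from the requests library (nothing is sent, and the parameter set is trimmed down from the question's URL):

```python
import requests

req = requests.Request(
    'GET',
    'https://api.airbnb.com/v2/search_results',
    params={'client_id': 'YOUR_KEY', 'locale': 'en-US', 'currency': 'USD'},
).prepare()

# requests percent-encodes and joins the parameters itself,
# so a malformed query string with stray spaces cannot happen.
print(req.url)
```

The printed URL is `https://api.airbnb.com/v2/search_results?client_id=YOUR_KEY&locale=en-US&currency=USD`, assembled without any hand-written '&' or '?' characters.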

python/scrapy for dynamic content

I am trying to write a Python/Scrapy script to get a list of ads from https://www.donedeal.ie/search/search?section=cars&adType=forsale&source=&sort=relevance%20desc&max=30&start=0; I'm interested in getting the URLs of individual ads. I found that the page makes an XHR POST request to https://www.donedeal.ie/search/api/v3/find/.
I tried a scrapy shell session to test my idea:
from scrapy.http import FormRequest
url = 'https://www.donedeal.ie/search/api/v3/find/'
payload = {'section': "cars", 'adType': "forsale", 'source': "", 'sort': "relevance desc", 'area': '', 'max': '30', 'start':'0'}
req = FormRequest(url, formdata=payload)
fetch(req)
but I get no response. In Chrome DevTools I saw that this request returns a JSON response with item IDs, which I could use to build the URLs myself.
I tried a Selenium approach as well, giving the page time to load the dynamic content, but that didn't seem to work either. Completely lost at this stage :(
The problem is with the call; the payload is almost OK.
The site you want to scrape accepts only JSON as the payload, so you should change your FormRequest to something like this:
import json
from scrapy import Request

yield Request(url, method='POST',
              body=json.dumps(payload),
              headers={'Content-Type': 'application/json'})
This is because FormRequest is for simulating HTML forms (the content type is set to application/x-www-form-urlencoded), not JSON calls.
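The difference between the two encodings is easy to see with just the standard library (a trimmed-down version of the payload is used here):

```python
import json
from urllib.parse import urlencode

payload = {'section': 'cars', 'adType': 'forsale'}

# What FormRequest would send (application/x-www-form-urlencoded):
print(urlencode(payload))   # section=cars&adType=forsale

# What the donedeal API expects (application/json):
print(json.dumps(payload))  # {"section": "cars", "adType": "forsale"}
```

Same data, two different wire formats; the server rejects the first and accepts the second.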
I was not able to create a working example with Scrapy.
However, I did come up with two other solutions for you.
In the examples below, response contains JSON data.
Working Example #1 using urllib2 — Tested with Python 2.7.10
import urllib2
url = 'https://www.donedeal.ie/search/api/v3/find/'
payload = '{"section":"cars","adType":"forsale","source":"","sort":"relevance desc","max":30,"start":0,"area":[]}'
req = urllib2.Request(url)
req.add_header('Content-Type', 'application/json')
response = urllib2.urlopen(req, payload).read()
Working Example #2 using requests — Tested with Python 2.7.10 and 3.3.5 and 3.5.0
import requests
url = 'https://www.donedeal.ie/search/api/v3/find/'
payload = {"section": "cars", "adType": "forsale", "source": "", "sort": "relevance desc", "max": 30, "start": 0, "area": []}
response = requests.post(url, json=payload).content
Note that payload here is a plain dict: the json= keyword serializes it and sets the Content-Type header for you. (Passing the pre-encoded string from Example #1 to json= would JSON-encode it a second time.)
