How can I get proper response back from scrapy?

I am trying to scrape search results from this company register, but when I extract the company name, the results come back incomplete: the name appears to be split across two HTML elements around the search keyword.
Is there a way to join these pieces together? This is my spider:
import scrapy


class QuotesSpider(scrapy.Spider):
    name = 'gov2'
    start_urls = ['https://beta.companieshouse.gov.uk/search/companies?q=a']

    def parse(self, response):
        for i in response.css('ul.results-list'):
            yield {
                'company_name': i.css('li.type-company h3 a::text').extract(),
                'address': i.css('li.type-company p::text').extract(),
            }
As you can see, my results are missing some parts.
I hope one of you can see what is going on. Thank you!

As I see it, you want all the text inside the a and p tags, and those tags contain further nested tags.
Try this, which joins the text nodes and collapses the extra whitespace with a regex:
import re

import scrapy


class QuotesSpider(scrapy.Spider):
    name = 'gov2'
    start_urls = ['https://beta.companieshouse.gov.uk/search/companies?q=a']

    def parse(self, response):
        for i in response.css('ul.results-list'):
            yield {
                # 'a ::text' (with a space) also selects text of nested tags
                'company_name': re.sub(r'\s+', ' ', ''.join(i.css('li.type-company h3 a ::text').extract())),
                'address': re.sub(r'\s+', ' ', ''.join(i.css('li.type-company p ::text').extract())),
            }

I modified the code above, still using the regex, so that it yields one item per company for cleaner output:
import re

import scrapy


class QuotesSpider(scrapy.Spider):
    name = 'gov2'
    start_urls = ['https://beta.companieshouse.gov.uk/search/companies?q=a']

    def parse(self, response):
        # Iterate over each company entry rather than the whole results list
        for i in response.css('.type-company'):
            yield {
                'company_name': re.sub(r'\s+', ' ', ''.join(i.css('h3 a ::text').extract())),
                'address': re.sub(r'\s+', ' ', ''.join(i.css('p ::text').extract())),
            }
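
The join-and-collapse step used in both answers can be isolated and tried without running a spider. The following is a minimal sketch: the helper name join_text_nodes and the sample fragments are made up for illustration, and the final .strip() is an addition that the answers above do not include.

```python
import re


def join_text_nodes(parts):
    """Join extracted text fragments and collapse runs of whitespace."""
    joined = ''.join(parts)
    # Replace every run of spaces, tabs, and newlines with a single space,
    # then drop the leading and trailing space that may remain.
    return re.sub(r'\s+', ' ', joined).strip()


# Hypothetical fragments, shaped like what '::text' might return when a
# name is split across nested tags.
fragments = ['\n    ', 'A', ' & B TRADING LTD\n  ']
print(join_text_nodes(fragments))
```

In a spider, the list returned by .extract() or .getall() would be passed in place of the sample fragments.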

Related

Scraping Tripadvisor attractions using scrapy and python

I am trying to scrape TripAdvisor's attractions, but I cannot get the names and addresses of each attraction. I suspect I wrote product.css(...) wrong (there are jsons?).
Can anyone tell me how to correct the code to get the name and address of each attraction?
My current code:
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'https://www.tripadvisor.com/Attractions-g187427-Activities-oa90-Spain'
    ]

    def parse(self, response):
        for link in response.css('.EsZYd a::attr(href)'):
            yield response.follow(link.get(), callback=self.parse_categories)

    def parse_categories(self, response):
        products = response.css('div.eeqnt')
        for product in products:
            yield {
                'name' : product.css('h1.WlYyy cPsXC GeSzT::text').get().strip(),
                'address' : product.css('span.WlYyy cacGK Wb::text').get().strip(),
            }
Updated code (exporting info for each attraction on each listing page):
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
'https://www.tripadvisor.com/Attractions-g274862-Activities-a_allAttractions.true-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa30-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa60-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa90-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa120-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa150-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa180-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa210-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa240-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa270-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa300-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa330-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa360-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa390-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa420-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa450-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa480-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa510-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa540-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa570-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa600-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa630-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa660-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa690-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa720-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa750-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa780-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa810-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa840-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa870-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa900-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa930-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa960-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa990-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1020-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1050-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1080-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1110-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1140-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1170-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1200-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1230-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1260-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1290-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1320-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1350-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1380-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1410-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1440-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1470-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1500-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1530-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1560-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1590-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1620-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1650-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1680-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1710-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1740-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1770-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1800-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1830-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1860-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1890-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1920-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1950-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa1980-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2010-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2040-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2070-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2100-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2130-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2160-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2190-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2220-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2250-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2280-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2310-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2340-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2370-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2400-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2430-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2460-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2490-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2520-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2550-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2580-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2610-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2640-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2670-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2700-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2730-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2760-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2790-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2820-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2850-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2880-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2910-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2940-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa2970-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa3000-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa3030-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa3060-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa3090-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa3120-Slovenia.html',
'https://www.tripadvisor.com/Attractions-g274862-Activities-oa3150-Slovenia.html'
    ]

    def parse(self, response):
        for link in response.css('.EsZYd a::attr(href)').getall():
            yield response.follow(link, callback=self.parse_categories)

    def parse_categories(self, response):
        yield {
            'name': response.css('h1.WlYyy.cPsXC.GeSzT::text').get(),
            'reviews': response.xpath('(//*[@class="cfIVb"])[1]//text()').getall(),
            'address': response.xpath('(//*[@class="dGWve"])//text()').getall(),
            'url': response.url,
        }

This is not really a Python problem; it is about CSS selectors.
Multiple CSS classes on one element must be chained with dots, not spaces: WlYyy.cPsXC.GeSzT.
My best suggestion is to use Chrome with DevTools. It lets you copy the path to a specific element as a CSS selector or XPath: right-click the element in the DOM tree and choose the Copy menu item.
Avoid using classes (especially ones without semantic meaning) as anchor points. They may change from page to page, or over time.
It is better to rely on semantically meaningful nodes. In your case,
the XPath for the title would look like this: //main//header//div[@data-automation="main_h1"]//h1.
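
To illustrate the dot rule, here is a small sketch that turns the value of a class attribute, as copied from the page source, into a CSS selector. The helper name classes_to_css is hypothetical and is not part of Scrapy.

```python
def classes_to_css(tag, class_attr):
    """Build a CSS selector that requires every class in class_attr.

    class_attr is the raw attribute value, with classes separated by spaces.
    """
    # Each class name gets its own leading dot, with no spaces in between.
    return tag + ''.join('.' + name for name in class_attr.split())


print(classes_to_css('h1', 'WlYyy cPsXC GeSzT'))
```

The resulting string can be used as the element part of a selector, for example followed by ::text in response.css(...).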

You don't need a for loop on each attraction page: each page describes a single attraction, so select directly from the response.
import scrapy
from scrapy.crawler import CrawlerProcess


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'https://www.tripadvisor.com/Attractions-g187427-Activities-oa90-Spain'
    ]

    def parse(self, response):
        for link in response.css('.EsZYd a::attr(href)').getall():
            #print(link)
            yield response.follow(link, callback=self.parse_categories)

    def parse_categories(self, response):
        yield {
            'name': response.css('h1.WlYyy.cPsXC.GeSzT::text').get(),
            'address': ''.join(response.xpath('(//*[@class="hxQKk"])[1]//text()').getall()[:-1]),
            'url': response.url,
        }


if __name__ == "__main__":
    # The spider class is passed to crawl(), not to the CrawlerProcess constructor
    process = CrawlerProcess()
    process.crawl(QuotesSpider)
    process.start()
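
The long start_urls list in the updated question code follows a regular pattern (an offset that grows by 30 up to 3150), so it could probably be generated instead of typed out. This is a sketch under that assumption; the function name build_start_urls and its parameters are made up for illustration.

```python
BASE = 'https://www.tripadvisor.com/Attractions-g274862-Activities-{}-Slovenia.html'


def build_start_urls(last_offset=3150, step=30):
    """Return the first listing page followed by the paginated 'oaN' pages."""
    # The first page uses a different path segment than the paginated ones.
    urls = [BASE.format('a_allAttractions.true')]
    urls += [
        BASE.format('oa{}'.format(offset))
        for offset in range(step, last_offset + step, step)
    ]
    return urls


start_urls = build_start_urls()
print(len(start_urls))
```

Inside the spider, start_urls = build_start_urls() would replace the literal list.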

Scrapy not scraping if one item missing

I built my first Scrapy spider over several hours in the last two days, but I am stuck right now. My main goal is to extract all the data so I can filter it later in a CSV. The data that is really crucial for me (companies without web pages!) gets dropped, because Scrapy can't find the XPath I provided when an item has no homepage. I tried an if statement, but it is not working.
Example with a website: https://www.achern.de/de/Wirtschaft/Unternehmen-A-Z/Unternehmen?view=publish&item=company&id=1345
I use this XPath selector: response.xpath("//div[@class='cCore_contactInformationBlockWithIcon cCore_wwwIcon']/a/@href").extract()
Example without a website: https://www.achern.de/de/Wirtschaft/Unternehmen-A-Z/Unternehmen?view=publish&item=company&id=1512
Spider Code:
# -*- coding: utf-8 -*-
import scrapy


class AchernSpider(scrapy.Spider):
    name = 'achern'
    allowed_domains = ['www.achern.de']
    start_urls = ['https://www.achern.de/de/Wirtschaft/Unternehmen-A-Z/']

    def parse(self, response):
        for href in response.xpath("//ul[@class='cCore_list cCore_customList']/li[*][*]/a/@href"):
            url = response.urljoin(href.extract())
            yield scrapy.Request(url, callback= self.scrape)

    def scrape(self, response):
        #Extracting the content using css selectors
        print("Processing:"+response.url)
        firma = response.css('div>#cMpu_publish_company>h2.cCore_headline::text').extract()
        anschrift = response.xpath("//div[contains(@class,'cCore_addressBlock_address')]/text()").extract()
        tel = response.xpath("//div[@class='cCore_contactInformationBlockWithIcon cCore_phoneIcon']/text()").extract()
        mail = response.xpath(".//div[@class='cCore_contactInformationBlock']//*[contains(text(), '@')]/text()").extract()
        web1 = response.xpath("//div[@class='cCore_contactInformationBlockWithIcon cCore_wwwIcon']/a/@href").extract()
        if "http:" not in web1:
            web = "na"
        else:
            web = web1
        row_data=zip(firma,anschrift,tel,mail,web1) #web1 must be changed to web but then it only give out "n" for every link
        #Give the extracted content row wise
        for item in row_data:
            #create a dictionary to store the scraped info
            scraped_info = {
                'Firma' : item[0],
                'Anschrift' : item[1] +' 77855 Achern',
                'Telefon' : item[2],
                'Mail' : item[3],
                'Web' : item[4],
            }
            #yield or give the scraped info to scrapy
            yield scraped_info
So overall, it should also export the items that are currently dropped, even when "web" is missing.
I hope someone can help. Greetings, S

Using
response.css(".cCore_wwwIcon > a::attr(href)").get()
gives you either None or the website address, so you can use or to provide a default:
website = response.css(".cCore_wwwIcon > a::attr(href)").get() or 'na'
Also, I refactored your scraper to use css selectors. Note that I've used .get() instead of .extract() to get a single item, not a list, which cleans up the code quite a bit.
import scrapy
from scrapy.crawler import CrawlerProcess


class AchernSpider(scrapy.Spider):
    name = 'achern'
    allowed_domains = ['www.achern.de']
    start_urls = ['https://www.achern.de/de/Wirtschaft/Unternehmen-A-Z/']

    def parse(self, response):
        for url in response.css("[class*=cCore_listRow] > a::attr(href)").extract():
            yield scrapy.Request(url, callback=self.scrape)

    def scrape(self, response):
        # Extracting the content using css selectors
        firma = response.css('.cCore_headline::text').get()
        anschrift = response.css('.cCore_addressBlock_address::text').get()
        tel = response.css(".cCore_phoneIcon::text").get()
        mail = response.css("[href^=mailto]::attr(href)").get().replace('mailto:', '')
        website = response.css(".cCore_wwwIcon > a::attr(href)").get() or 'na'
        scraped_info = {
            'Firma': firma,
            'Anschrift': anschrift + ' 77855 Achern',
            'Telefon': tel,
            'Mail': mail,
            'Web': website,
        }
        yield scraped_info


if __name__ == "__main__":
    p = CrawlerProcess()
    p.crawl(AchernSpider)
    p.start()
Output:
with website:
{'Firma': 'Wölfinger Fahrschule GmbH', 'Anschrift': 'Güterhallenstraße 8 77855 Achern', 'Telefon': '07841 6738132', 'Mail': 'info@woelfinger-fahrschule.de', 'Web': 'http://www.woelfinger-fahrschule.de'}
without website:
{'Firma': 'Zappenduster-RC Steffen Liepe', 'Anschrift': 'Am Kirchweg 16 77855 Achern', 'Telefon': '07841 6844700', 'Mail': 'Zappenduster-Rc@hotmail.de', 'Web': 'na'}
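
A likely reason the original spider dropped companies without a website is that zip() stops at its shortest input: when .extract() returns an empty list for the web field, zip() produces no rows at all. The sketch below reproduces this with plain lists. The sample values are taken from the output above, and the helper first_or_default is hypothetical.

```python
firma = ['Zappenduster-RC Steffen Liepe']
tel = ['07841 6844700']
web = []  # what .extract() returns when no link matches

# zip() yields nothing because one of its inputs is empty.
rows_with_zip = list(zip(firma, tel, web))
print(rows_with_zip)


def first_or_default(values, default='na'):
    """Return the first extracted value, or a default for an empty list."""
    return values[0] if values else default


# Building the row field by field keeps the item even without a website.
row = {
    'Firma': first_or_default(firma),
    'Telefon': first_or_default(tel),
    'Web': first_or_default(web),
}
print(row)
```

Using .get() with an or default, as in the answer, has the same effect as first_or_default here.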

Learning Python and Scrapy

I am trying to learn Scrapy and Python, and I'm having an issue I don't understand. I run the same piece of code once in the terminal and once through a script, and the results are different. The terminal gives me all the titles (what I want); the script gives me only the first.
for title in response.css('div.section-content ul'):
    item = {
        'title' : title.css('li h3 a::text').extract_first(),
    }
I'm trying to extract all the movie names on the iTunes movies page.
Any help is appreciated. Thanks
UPDATE
import scrapy


class ItunesSpider(scrapy.Spider):
    name = 'itunes'
    allowed_domains = ['apple.com']
    start_urls = ['apple.com/itunes/charts/movies/']

    def parse(self, response):
        self.log ('I just visited: ' + response.url)
        for title in response.css('div.section-content ul'):
            item = { 'title' : title.css('li h3 a::text').extract_first(), }
            yield item

You have a simple mistake in your CSS selector: you are looping over all ul elements (and there is only one). What you want is to loop over the li elements instead.
# -*- coding: utf-8 -*-
import scrapy


class ItunesSpider(scrapy.Spider):
    name = 'itunes'
    allowed_domains = ['apple.com']
    start_urls = ['https://apple.com/itunes/charts/movies/']

    def parse(self, response):
        self.log ('I just visited: ' + response.url)
        for title in response.css('div.section-content ul li'):
            title = title.css('h3 a::text').extract_first()
            self.log('Title: %s' % title)
            item = {
                'title' : title,
            }
            yield item
Why it works differently for you "in the terminal", I do not know.
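
The difference between looping over the ul and looping over the li elements can be reproduced on a tiny document. This sketch uses the standard library's ElementTree in place of Scrapy selectors so that it runs on its own. The markup and movie names are made up, and find() plays the role of extract_first() by returning only the first match.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup with one list that holds two entries.
MARKUP = (
    '<div><ul>'
    '<li><h3><a>Movie A</a></h3></li>'
    '<li><h3><a>Movie B</a></h3></li>'
    '</ul></div>'
)
root = ET.fromstring(MARKUP)

# One iteration per <ul>: only the first title under each list is found.
titles_per_ul = [ul.find('li/h3/a').text for ul in root.iter('ul')]

# One iteration per <li>: every entry contributes its own title.
titles_per_li = [li.find('h3/a').text for li in root.iter('li')]

print(titles_per_ul)
print(titles_per_li)
```

With a single ul on the page, the first loop runs once and so yields a single title, which matches the behaviour described in the question.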

How to crawl data from the linked webpages on a webpage we are crawling

I am crawling the names of the colleges on this webpage, but I also want to crawl the number of faculties in each college, which is shown on the college's own page (reached by clicking the college name).
What should I add to this code to get that result?
The result should be in the form [(name1, faculty1), (name2, faculty2), ...]
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "student"
    start_urls = [
        'http://www.engineering.careers360.com/colleges/list-of-engineering-colleges-in-karnataka?sort_filter=alpha',
    ]

    def parse(self, response):
        for students in response.css('li.search-result'):
            yield {
                'name': students.css('div.title a::text').extract(),
            }

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "student"
    start_urls = [
        'http://www.engineering.careers360.com/colleges/list-of-engineering-colleges-in-karnataka?sort_filter=alpha',
    ]

    def parse(self, response):
        for students in response.css('li.search-result'):
            # SELECT_URL is a placeholder for the selector of the college page link
            url = response.urljoin(students.css(SELECT_URL).get())
            req = scrapy.Request(url, callback=self.parse_student)
            # Carry the name along so the next callback can read it
            req.meta['name'] = students.css('div.title a::text').extract()
            yield req

    def parse_student(self, response):
        yield {
            'name': response.meta.get('name'),
            # SELECTOR is a placeholder for the selector of the faculty count
            'other data': response.css(SELECTOR).get(),
        }
It should be something like this.
You send the college name along in the meta data of the request.
That allows you to read it back in the callback that handles the response.
If the name is also available on the page you scrape in parse_student, you might consider scraping it from that page instead of sending it in the meta data.

Scraping interactive website

I'm trying to scrape course names and student counts from Udacity to find out which courses are the most popular. I managed to write the code for the item:
import scrapy


class UdacityItem(scrapy.Item):
    name=scrapy.Field()
    users=scrapy.Field()
and spider:
import scrapy
from Udacity.items import UdacityItem
import re


class DmozSpider(scrapy.Spider):
    name = "UdSpider"
    allowed_domains = ["udacity.com"]
    start_urls = ["https://www.udacity.com/courses/all"]

    def parse(self, response):
        sites = response.xpath('//h3/a')
        for s in sites:
            t=UdacityItem()
            #name & url
            t['name']=s.xpath('text()').extract()[0].strip()
            url=response.urljoin(s.xpath('@href').extract()[0])
            #request
            req=scrapy.Request(url, callback=self.second)
            req.meta['item']=t
            #execute
            yield req

    def second(self,response):
        t=response.meta['item']
        strong =response.xpath('//strong[@data-course-student-count]/text()').extract()[0]
        t['users']=strong
        yield t
As a result, I get the course name, but instead of the number of students I get the text 'thousands of'. When I open an example page in a browser, I see that 'thousands of' is the initial value and that 1-2 seconds later the text changes to the actual number (which is what I want).
Here are my questions:
Why does this replacement happen? Is it JavaScript code? I would
like to understand the mechanism behind this change.
How can I capture the actual number of students using Scrapy? I hope this is possible.
Thank you in advance for your help.

To get the enrollments count, you would have to simulate the API request to https://www.udacity.com/api/summaries endpoint for a specific course id, which can be extracted from the URL itself - for example, it is ud898 for the https://www.udacity.com/course/javascript-promises--ud898 URL.
Complete spider:
import json
import re
from urllib.parse import quote_plus  # Python 3 location of quote_plus

import scrapy


class UdacityItem(scrapy.Item):
    name = scrapy.Field()
    users = scrapy.Field()


class DmozSpider(scrapy.Spider):
    name = "UdSpider"
    allowed_domains = ["udacity.com"]
    start_urls = ["https://www.udacity.com/courses/all"]

    def parse(self, response):
        sites = response.xpath('//h3/a')
        for s in sites:
            t = UdacityItem()
            # name & url
            t['name'] = s.xpath('text()').extract()[0].strip()
            url = response.urljoin(s.xpath('@href').extract()[0])
            # request
            req = scrapy.Request(url, callback=self.second)
            req.meta['item'] = t
            # execute
            yield req

    def second(self, response):
        queries = [{
            "limit": 1,
            "model": "CourseStudentsSummary",
            "locator": {
                "sample_frequency": "daily",
                "content_context": [{
                    # the course id is the part of the URL after "--"
                    "node_key": re.search(r'--(.*?)$', response.url).group(1)
                }]
            }
        }]
        yield scrapy.Request(method="GET",
                             url="https://www.udacity.com/api/summaries?queries=" + quote_plus(json.dumps(queries)),
                             callback=self.parse_totals)

    def parse_totals(self, response):
        # the first 5 bytes of the body are skipped before the JSON is parsed
        print(json.loads(response.body[5:].strip())["summaries"]["default"][0]["data"]["total_enrollments"])
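
The URL-building step in second() can be checked on its own, without sending any request. This is a sketch that reuses the endpoint and query structure from the answer, and the endpoint may no longer exist. The function names course_id_from_url and build_summary_url are made up for illustration.

```python
import json
import re
from urllib.parse import quote_plus

# Endpoint as given in the answer above; treat it as an assumption.
API_BASE = 'https://www.udacity.com/api/summaries?queries='


def course_id_from_url(url):
    """Return the part of the course URL after '--', or None if absent."""
    match = re.search(r'--(.*?)$', url)
    return match.group(1) if match else None


def build_summary_url(course_url):
    """Build the API URL that asks for the enrollment summary of one course."""
    course_id = course_id_from_url(course_url)
    if course_id is None:
        raise ValueError('no course id found in ' + course_url)
    queries = [{
        'limit': 1,
        'model': 'CourseStudentsSummary',
        'locator': {
            'sample_frequency': 'daily',
            'content_context': [{'node_key': course_id}],
        },
    }]
    # The JSON query is URL-encoded and appended to the endpoint.
    return API_BASE + quote_plus(json.dumps(queries))


example = 'https://www.udacity.com/course/javascript-promises--ud898'
print(course_id_from_url(example))
print(build_summary_url(example))
```

Returning None for URLs without a course id lets the spider skip pages such as the course listing instead of raising an AttributeError on .group(1).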
