Scraping content from AJAX onclick pop-up - python

I'm attempting to scrape information from this page using Python: https://j2c-com.com/Euronaval14/catalogueWeb/catalogue.php?lang=gb. I'm specifically interested in the pop-up that appears when you click on an individual exhibitor's name. The challenging part is that the page uses a lot of JavaScript to make AJAX calls to load the data.
I've examined the network calls when clicking on an exhibitor and it appears that the AJAX call goes to this URL (for the first exhibitor in the list, "A.I.A.D. and MOD ITALY"): https://j2c-com.com/Euronaval14/catalogueWeb/ajaxSociete.php?cle=D000365D000365&rnd=0.005115277832373977
I understand where the cle parameter comes from (it's the id of the <span> tag); what I don't quite get is where the rnd parameter is derived. Is it just a random number? I tried supplying a random number with each request, but the HTML returned is missing the actual content of the pop-up.
This leads me to believe that either the rnd parameter isn't a random number, or I need some kind of cookie present for the actual data to come through in the response.
Here's my code so far; I'm using Requests and BeautifulSoup to parse the HTML:
import random
import decimal
import requests
from bs4 import BeautifulSoup

#base_url = 'https://j2c-com.com/Euronaval14/catalogueWeb/catalogue.php?lang=gb'
base_url = 'https://j2c-com.com/Euronaval14/catalogueWeb/cataloguerecherche.php?listeFavoris=&typeRecherche=1&typeRechSociete=&typeSociete=&typeMarque=&typeDescriptif=&typeActivite=&choixSociete=&choixPays=&choixActivite=&choixAgent=&choixPavillon=&choixZoneExpo=&langue=gb&rnd=0.1410133063327521'

def generate_random_number(i, d):
    "Produce a random number between 0 and 1, with 16 decimal digits"
    return str(decimal.Decimal('%d.%d' % (random.randint(0, i), random.randint(0, d))))

r = requests.get(base_url)
soup = BeautifulSoup(r.text)
table = soup.find('table', {'id': 'tableResultat'})
trs = table.findAll('tr')

for tr in trs:
    span = tr.find('span')
    cle = span.get('id')
    url = 'https://j2c-com.com/Euronaval14/catalogueWeb/ajaxSociete.php?cle=' + cle + '&rnd=' + generate_random_number(0, 9999999999999999)
    pop = requests.post(url)
    print url
    print pop.text
    break
Can you help me understand how I can successfully capture the pop-up data, or what I'm doing wrong? Thanks in advance!

It is not about the rnd parameter. That one is completely random, filled in by the Math.random() JS function.
As you suspected, it is about cookies: the PHPSESSID cookie must be sent with every subsequent request. Just start a requests.Session() and use it for every request you make:
The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance.
...
# start session
session = requests.Session()

r = session.get(base_url)
soup = BeautifulSoup(r.text)
table = soup.find('table', {'id': 'tableResultat'})
trs = table.findAll('tr')

for tr in trs:
    span = tr.find('span')
    cle = span.get('id')
    url = 'https://j2c-com.com/Euronaval14/catalogueWeb/ajaxSociete.php?cle=' + cle + '&rnd=' + generate_random_number(0, 9999999999999999)
    pop = session.post(url)  # <-- the POST request here contains cookies returned by the first GET call
    print url
    print pop.text
    break
It prints (note that the HTML is now filled with the required data):
https://j2c-com.com/Euronaval14/catalogueWeb/ajaxSociete.php?cle=D000365D000365&rnd=0.1625497943120751
<table class='divAdresse'>
<tr>
<td class='ficheAdresse' valign='top'>Via Nazionale 54<br>IT-00184 - Roma<br><img
src='../../intranetJ2C/images/flags/IT.gif' style='margin-right:5px;'>ITALY<br><br>Phone: +39 06 488
0247 | Fax: +39 06 482 74 76<br><br>Website: <a href='http://www.aiad.it' target='_new'>www.aiad.it</a></td>
</tr>
</table>
<br>
<b class="divMarque">Contact:</b><br>
<font class="ficheAdresse"> Carlo Festucci - Secretary General<br>
c.festucci#aiad.it</font>
<br><br>
<div id='divTexte' class='ficheTexte'></div>
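If you want to confirm for yourself that the cookie is what unlocks the data, you can inspect what the first GET stored on the session. A minimal check (not part of the original flow), reusing base_url from above:

import requests

session = requests.Session()
session.get(base_url)  # base_url as defined earlier
print(session.cookies.get_dict())  # expect a 'PHPSESSID' entry if the cookie theory holds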
UPD.
The reason you were not getting results for the other exhibitors in the table is difficult to explain, but the main point here is to simulate all the subsequent AJAX requests that are fired under the hood when you click on a row in the browser:
import random
import decimal
import requests
from bs4 import BeautifulSoup

base_url = 'https://j2c-com.com/Euronaval14/catalogueWeb/cataloguerecherche.php?listeFavoris=&typeRecherche=1&typeRechSociete=&typeSociete=&typeMarque=&typeDescriptif=&typeActivite=&choixSociete=&choixPays=&choixActivite=&choixAgent=&choixPavillon=&choixZoneExpo=&langue=gb&rnd=0.1410133063327521'
fiche_url = 'https://j2c-com.com/Euronaval14/catalogueWeb/fiche.php'
reload_url = 'https://j2c-com.com/Euronaval14/catalogueWeb/reload.php'
data_url = 'https://j2c-com.com/Euronaval14/catalogueWeb/ajaxSociete.php'

def generate_random_number(i, d):
    "Produce a random number between 0 and 1, with 16 decimal digits"
    return str(decimal.Decimal('%d.%d' % (random.randint(0, i), random.randint(0, d))))

# start session
session = requests.Session()
r = session.get(base_url)
soup = BeautifulSoup(r.content)

for span in soup.select('table#tableResultat tr span'):
    cle = span.get('id')

    session.post(reload_url)
    session.post(fiche_url, data={'page': 'page:catalogue',
                                  'pasFavori': '1',
                                  'listeFavoris': '',
                                  'cle': cle,
                                  'stand': '',
                                  'rnd': generate_random_number(0, 9999999999999999)})
    session.post(reload_url)

    pop = session.post(data_url, data={'cle': cle,
                                       'rnd': generate_random_number(0, 9999999999999999)})
    print pop.text
Prints:
<table class='divAdresse'><tr><td class='ficheAdresse' valign='top'>Via Nazionale 54<br>IT-00184 - Roma<br><img src='../../intranetJ2C/images/flags/IT.gif' style='margin-right:5px;'>ITALY<br><br>Phone: +39 06 488 0247 | Fax: +39 06 482 74 76<br><br>Website: Contact:</b><br><font class="ficheAdresse"> Carlo Festucci - Secretary General<br><a href="mailto:c.festucci#aiad.it">c.festucci#aiad.it</font><br><br><div id='divTexte' class='ficheTexte'></div>
<table class='divAdresse'><tr><td class='ficheAdresse' valign='top'>An der Faehre 2<br>27809 - Lemwerder<br><img src='../../intranetJ2C/images/flags/DE.gif' style='margin-right:5px;'>GERMANY<br><br>Phone: +49 421 673 30 | Fax: +49 421 673 3115<br><br>Website: <a href='http://www.abeking.com' target='_new'>www.abeking.com</a></td></tr></table><br><b class="divMarque">Contact:</b><br><font class="ficheAdresse"> Thomas Haake - Sales Director Navy</font><br><br><div id='divTexte' class='ficheTexte'></div>
<table class='divAdresse'><tr><td class='ficheAdresse' valign='top'>Mohamed Bin Khalifa Street (street 15)<br>PO Box 107241<br>107241 - Abu Dhabi<br><img src='../../intranetJ2C/images/flags/AE.gif' style='margin-right:5px;'>UNITED ARAB EMIRATES<br><br>Phone: +971 2 445 5551 | Fax: +971 2 445 0644</td></tr></table><br><b class="divMarque">Contact:</b><br><font class="ficheAdresse"> Pierre Baz - Business Development<br>pierre.baz#abudhabimar.com</font><br><br><div id='divTexte' class='ficheTexte'></div>
...

Related

Having problems while scraping a Klaytn Scope table

I have some problems scraping this website using Python with bs4 and requests: https://scope.klaytn.com/account/0xb5471a00bcc02ea297df2c4a4fd1d073465c662b?tabId=tokenBalance
from bs4 import BeautifulSoup
import requests
import json
import urllib3
import pandas as pd
urllib3.disable_warnings()
I want to scrape the Token Balance table, so I send a request, but nothing responds.
How can I scrape these 'Token Balance' table values? When I use the 'find' method to get the table values, it prints 'None':
html = requests.get(url, verify=False).text
soup = BeautifulSoup(html, 'html.parser')
title = soup.find('span', {'class': 'ValueWithUnit__value'})
print(title)
This page is populated via an API. For example, the table is obtained from the following address: https://api-cypress-v3.scope.klaytn.com/v2/accounts/0xb5471a00bcc02ea297df2c4a4fd1d073465c662b/ftBalances?page=1
Account info: https://api-cypress-v3.scope.klaytn.com/v2/accounts/0xb5471a00bcc02ea297df2c4a4fd1d073465c662b
So you can get the table:
import requests

url = 'https://api-cypress-v3.scope.klaytn.com/v2/accounts/0xb5471a00bcc02ea297df2c4a4fd1d073465c662b/ftBalances?page=1'
response = requests.get(url)
data = response.json()

# 'tokens' maps each token address to its metadata;
# 'result' holds the balance entries keyed by 'tokenAddress'
for token in data['tokens']:
    tokenName = data['tokens'][token]['tokenName']
    tokenAmount = next(amount['amount'] for amount in data['result'] if amount['tokenAddress'] == token)
    print(tokenName, tokenAmount)
OUTPUT:
Ironscale 128
Bloater 158
Lantern-Eye 144
Gaia's Tears 101
Redgill 11
...
Blue Egg 1
Green Egg 1
Health Vial 1
Mana Vial 1
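The account-info endpoint mentioned above can be queried the same way. Its response schema isn't shown in the question or answer, so this sketch simply pretty-prints whatever JSON comes back for inspection:

import json
import requests

account_url = 'https://api-cypress-v3.scope.klaytn.com/v2/accounts/0xb5471a00bcc02ea297df2c4a4fd1d073465c662b'
response = requests.get(account_url)
print(json.dumps(response.json(), indent=2))  # dump the raw JSON to discover the available fields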

How to do simple pagination loop with beautifulsoup

I need help understanding how to paginate and loop over 5 different pages that share the same URL (http://www.chartsinfrance.net/charts/albums.php,p2), where only the last part of the URL changes with the page number.
I can scrape the data of the first page, but I don't understand how to get the other URLs and scrape all the data in one loop, ending up with all 250 songs in a single run of the script!
import requests
from bs4 import BeautifulSoup

req = requests.get('http://www.chartsinfrance.net/charts/albums.php')
soup = BeautifulSoup(req.text, "html.parser")
charts = soup.select('.c1_td1')

Auteurs = []
Titre = []
Rang = []
Evolution = []

for chart in charts:
    Rang = chart.select_one('.c1_td2').get_text()
    Auteurs = chart.select_one('.c1_td5 a').get_text()
    Evolution = chart.select_one('.c1_td3').get_text()
    Titre = chart.select_one('.c1_td5 .noir11').get_text()
    print('--------')
    print(Auteurs)
    print(Titre)
    print(Rang)
    print(Evolution)
You can put your code into a while True: loop, where you load the soup, extract the information about the songs, and then select the link to the next page.
If the link to the next page exists, load the new soup and continue the loop.
If not, break out of the loop.
For example:
import requests
from bs4 import BeautifulSoup

url = 'http://www.chartsinfrance.net/charts/albums.php'

while True:
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    charts = soup.select('.c1_td1')

    Auteurs = []
    Titre = []
    Rang = []
    Evolution = []

    for chart in charts:
        Rang = chart.select_one('.c1_td2').get_text()
        Auteurs = chart.select_one('.c1_td5 a').get_text()
        Evolution = chart.select_one('.c1_td3').get_text()
        Titre = chart.select_one('.c1_td5 .noir11').get_text()
        print('--------')
        print(Auteurs)
        print(Titre)
        print(Rang)
        print(Evolution)

    next_link = soup.select_one('a:contains("→ Suite du classement")')
    if next_link:
        url = 'http://www.chartsinfrance.net' + next_link['href']
    else:
        break
Prints:
--------
Lady Gaga
Chromatica
1
Entrée
--------
Johnny Hallyday
Johnny 69
2
Entrée
--------
...
--------
Bof
Pulp Fiction
248
-115
--------
Trois Cafés Gourmands
Un air de Live
249
-30
--------
Various Artists
Salut Les Copains 60 Ans
250
Entrée
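Since only the suffix of the URL changes (albums.php,p2 for page 2), a fixed-range loop also works when you know there are exactly 5 pages. A minimal sketch, assuming pages 2 through 5 all follow the ,pN pattern from the question:

import requests
from bs4 import BeautifulSoup

# Page 1 has no suffix; pages 2-5 are assumed to use the ',pN' suffix.
urls = ['http://www.chartsinfrance.net/charts/albums.php']
urls += ['http://www.chartsinfrance.net/charts/albums.php,p%d' % n for n in range(2, 6)]

for url in urls:
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    for chart in soup.select('.c1_td1'):
        print(chart.select_one('.c1_td5 a').get_text())  # artist of each chart entry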

Web Scraping - Get to Page 2

How do I get to page two of the data sets? No matter what I do, it only returns page 1.
import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

myURL = 'https://jobs.collinsaerospace.com/search-jobs/'
uClient = uReq(myURL)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")
container = page_soup.findAll("section", {"id":"search-results"}, {"data-current-page":"4"})

for child in container:
    for heading in child.find_all('h2'):
        print(heading.text)
The site actually uses JSON to return the HTML containing all of the entries. The API for this allows a page number to be specified, as well as the number of records to return per page; increasing the latter will further improve speed.
The JSON that is returned contains 3 keys: the filter information, the results HTML, and a flag indicating whether any jobs were returned. That last entry can be used to signal when you have reached the end of the pages.
You might want to look at the very popular Python requests library, which simplifies generating the correct URLs for you and is also fast.
import bs4
import requests
from bs4 import BeautifulSoup as soup

params = {
    "CurrentPage": 1,
    "RecordsPerPage": 100,
    "SearchResultsModuleName": "Search Results",
    "SearchFiltersModuleName": "Search Filters",
    "SearchType": 5,
}

myURL = 'https://jobs.collinsaerospace.com/search-jobs/results'

page = 1
more_jobs = True

while more_jobs:
    print(f"\nPage {page}")
    params['CurrentPage'] = page
    req = requests.get(myURL, params=params)
    json = req.json()
    page_soup = soup(json['results'], "html.parser")
    container = page_soup.findAll("section", {"id":"search-results"}, {"data-current-page":"4"})
    for child in container:
        for heading in child.find_all('h2'):
            print(heading.text)
    more_jobs = json['hasJobs']  # Did this return any jobs?
    page += 1
Try the following script to get the results from whatever pages you are interested in. All you need to do is change the range as per your requirement. I could have defined a while loop to exhaust the whole content but that is not the question you asked.
import requests
from bs4 import BeautifulSoup

link = 'https://jobs.collinsaerospace.com/search-jobs/results?'

params = {
    'CurrentPage': '',
    'RecordsPerPage': 15,
    'Distance': 50,
    'SearchResultsModuleName': 'Search Results',
    'SearchFiltersModuleName': 'Search Filters',
    'SearchType': 5
}

for page in range(1, 5):  # This is where you change the range to get the results from whatever pages you want
    params['CurrentPage'] = page
    res = requests.get(link, params=params)
    soup = BeautifulSoup(res.json()['results'], "lxml")
    for name in soup.select("h2"):
        print(name.text)
Try this:
import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

for letter in range(10):
    myURL = 'https://jobs.collinsaerospace.com/search-jobs/' + str(letter) + ' '
    uClient = uReq(myURL)
    page_html = uClient.read()
    uClient.close()
    page_soup = soup(page_html, "html.parser")
    container = page_soup.findAll("section", {"id":"search-results"}, {"data-current-page":"4"})
    for child in container:
        for heading in child.find_all('h2'):
            print(heading.text)
Output of the first 3 pages:
0
SYSTEMS / APPLICATIONS ENGINEER
Data Scientist
Sr Engineer, Drafter/Product Definition
Finance and Accounting Intern
Senior Software Engineer - CT3
Intern Manufacturing Engineer
Staff Eng., Reliability Engineering
Software Developer
Configuration Management Specialist
Disassembler I--2nd Shift
Disassembler I--3rd Shift
Manager, Supplier Performance
Manager, Supplier Performance
Assoc Eng, Mfg Engrg-Ops, ME P1
Manager, Supplier Performance
1
Assembly Operator (UK7014) 1 1 1 1
Senior Administrator (DF1040) 1 1 1
Tester 1
Assembler 1
Assembler 1
Finisher 1
Painter 1
Technician 1 Manufacturing/Operations
Assembler 1 - 1st Shift
Supply Chain Analyst 1
Assembler (W7006) 1
Assembler (W7006) 1
Supplier Quality Engineer 1
Supplier Inspection Engineer 1
Assembler 1 - 1st Shift
2
Assembler I-FAA-2
Senior/Business Analyst-2
Operational Technical Support Level 2
Project Engineer - 2 – EMU Program
Line & Surface Plate Inspector Class 2
Software Engineer (LVL 2) - Embedded UAV Controls
Software Engineer (LVL 2 / JAVA) - Air Combat Training
Software Engineer (Level 2) - Mission Simulation & Training
Electrical Engineer (LVL 2) - Mission Systems Design Tools
Quality Inspector II
GET/PGET
GET/PGET
Production Supervisor - 2nd shift
Software Developer
Trainee Operator/ Operator
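A compact way to combine the JSON-based answers above is to wrap the paging logic in a generator. This is a sketch under the same assumptions as those answers (the 'results' key holds HTML, the 'hasJobs' flag signals more pages); job_titles is a hypothetical helper name:

import requests
from bs4 import BeautifulSoup

def job_titles(records_per_page=100):
    """Yield job titles page by page until the API reports no more jobs."""
    url = 'https://jobs.collinsaerospace.com/search-jobs/results'
    params = {
        'CurrentPage': 1,
        'RecordsPerPage': records_per_page,
        'SearchResultsModuleName': 'Search Results',
        'SearchFiltersModuleName': 'Search Filters',
        'SearchType': 5,
    }
    while True:
        data = requests.get(url, params=params).json()
        soup = BeautifulSoup(data['results'], 'html.parser')
        for heading in soup.select('h2'):
            yield heading.text
        if not data['hasJobs']:  # flag from the JSON indicating whether jobs were returned
            break
        params['CurrentPage'] += 1

for title in job_titles():
    print(title)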

BeautifulSoup scraping table id with python

I'm new to scraping and am learning to use BeautifulSoup, but I'm having trouble scraping a table. Here is the HTML I'm trying to parse:
<table id="ctl00_mainContent_DataList1" cellspacing="0" style="width:80%;border-collapse:collapse;">
<tbody>
<tr><td><table width="90%" cellpadding="5" cellspacing="0">...</table></td></tr>
<tr><td><table width="90%" cellpadding="5" cellspacing="0">...</table></td></tr>
<tr><td><table width="90%" cellpadding="5" cellspacing="0">...</table></td></tr>
<tr><td><table width="90%" cellpadding="5" cellspacing="0">...</table></td></tr>
...
My code:
from urllib.request import urlopen
from bs4 import BeautifulSoup
quote_page = 'https://www.bcdental.org/yourdentalhealth/findadentist.aspx'
page = urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
table = soup.find('table', id="ctl00_mainContent_DataList1")
rows = table.findAll('tr')
I get AttributeError: 'NoneType' object has no attribute 'findAll'. I'm using Python 3.6 and a Jupyter Notebook for this, in case that matters.
EDIT:
The table data that I'm trying to parse only shows up on the page after submitting a search (in the city field, select Burnaby, and hit Search). The table ctl00_mainContent_DataList1 is the list of dentists that shows up after the search is submitted.
First: I use requests because it is easier to work with cookies, headers, etc.
The page is generated by ASP.NET, and it sends the values __VIEWSTATE, __VIEWSTATEGENERATOR and __EVENTVALIDATION, which you have to include in the POST request too.
You have to load the page with a GET request first; then you can read those values.
You can also use requests.Session() to keep the cookies, which can be needed too.
Next you have to copy those values, add the parameters from the form, and send everything using POST.
In the code I put only the parameters which are always sent.
'526' is the code for Vancouver. You can find the other codes in the <select> tag (see the sketch after the results below).
If you want other options, you may have to add other parameters, e.g. ctl00$mainContent$chkUndr4Ref: on is for "Children: 3 & Under - Diagnose & Refer".
EDIT: Because there is a <table> inside each <tr>, find_all('tr') returned too many elements (both the outer tr and the inner tr), and find_all('td') later gave the same td many times. I changed find_all('tr') into find_all('table') and it should stop duplicating data.
import requests
from bs4 import BeautifulSoup

url = 'https://www.bcdental.org/yourdentalhealth/findadentist.aspx'

# --- session ---

s = requests.Session()  # to automatically copy cookies
#s.headers.update({'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0'})

# --- GET request ---

# get page to get cookies and params
response = s.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# --- set params ---

params = {
    # session - copy from GET request
    #'EktronClientManager': '',
    #'__VIEWSTATE': '',
    #'__VIEWSTATEGENERATOR': '',
    #'__EVENTVALIDATION': '',

    # main options
    'ctl00$terms': '',
    'ctl00$mainContent$drpCity': '526',
    'ctl00$mainContent$txtPostalCode': '',
    'ctl00$mainContent$drpSpecialty': 'GP',
    'ctl00$mainContent$drpLanguage': '0',
    'ctl00$mainContent$drpSedation': '0',
    'ctl00$mainContent$btnSearch': '+Search+',

    # other options
    #'ctl00$mainContent$chkUndr4Ref': 'on',
}

# copy from GET request
for key in ['EktronClientManager', '__VIEWSTATE', '__VIEWSTATEGENERATOR', '__EVENTVALIDATION']:
    value = soup.find('input', id=key)['value']
    params[key] = value
    #print(key, ':', value)

# --- POST request ---

# get page with table - using params
response = s.post(url, data=params)#, headers={'Referer': url})
soup = BeautifulSoup(response.text, 'html.parser')

# --- data ---

table = soup.find('table', id='ctl00_mainContent_DataList1')
if not table:
    print('no table')
    #table = soup.find_all('table')
    #print('count:', len(table))
    #print(response.text)
else:
    for row in table.find_all('table'):
        for column in row.find_all('td'):
            text = ', '.join(x.strip() for x in column.text.split('\n') if x.strip()).strip()
            print(text)
        print('-----')
Part of result:
Map
Dr. Kashyap Vora, 6145 Fraser Street, Vancouver V5W 2Z9
604 321 1869, www.voradental.ca
-----
Map
Dr. Niloufar Shirzad, Harbour Centre DentalL19 - 555 Hastings Street West, Vancouver V6B 4N6
604 669 1195, www.harbourcentredental.com
-----
Map
Dr. Janice Brennan, 902 - 805 Broadway West, Vancouver V5Z 1K1
604 872 2525
-----
Map
Dr. Rosemary Chang, 1240 Kingsway, Vancouver V5V 3E1
604 873 1211
-----
Map
Dr. Mersedeh Shahabaldine, 3641 Broadway West, Vancouver V6R 2B8
604 734 2114, www.westkitsdental.com
-----
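As for finding the other city codes mentioned above: they live in the <option> elements of the city drop-down on the GET page. A minimal sketch, assuming the drop-down's HTML id follows the usual ASP.NET convention of replacing '$' with '_' in the field name (so 'ctl00_mainContent_drpCity' — verify against the actual page source):

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.bcdental.org/yourdentalhealth/findadentist.aspx')
soup = BeautifulSoup(r.text, 'html.parser')

city_select = soup.find('select', id='ctl00_mainContent_drpCity')  # assumed id, derived from ctl00$mainContent$drpCity
if city_select:
    for option in city_select.find_all('option'):
        print(option.get('value'), option.text.strip())  # e.g. 526 Vancouver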

Beautiful Soup nested div (Adding extra function)

I am trying to extract the company name, address, and zip code from www.quicktransportsolutions.com. I have written the following code to crawl the site and return the information I need.
import requests
from bs4 import BeautifulSoup

def trade_spider(max_pages):
    page = 1
    while page <= max_pages:
        url = 'http://www.quicktransportsolutions.com/carrier/missouri/adrian.php'
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text)
        for link in soup.findAll('div', {'class': 'well well-sm'}):
            title = link.string
            print(link)

trade_spider(1)
After running the code, I see the information that I want, but I am confused about how to get it to print without all of the non-pertinent information.
Above the print(link), I thought I could have link.string pull the company names, but that failed. Any suggestions?
Output:
<div class="well well-sm">
<b>2 OLD BOYS TRUCKING LLC</b><br><u><span itemprop="name"><b>2 OLD BOYS TRUCKING</b></span></u><br> <span itemprop="address" itemscope="" itemtype="http://schema.org/PostalAddress"><span itemprop="streetAddress">227 E 2ND</span>
<br>
<span itemprop="addressLocality">Adrian</span>, <span itemprop="addressRegion">MO</span> <span itemprop="postalCode">64720</span></br></span><br>
Trucks: 2 Drivers: 2<br>
<abbr class="initialism" title="Unique Number to identify Companies operating commercial vehicles to transport passengers or haul cargo in interstate commerce">USDOT</abbr> 2474795 <br><span class="glyphicon glyphicon-phone"></span><b itemprop="telephone"> 417-955-0651</b>
<br><a href="/inspectionreports/2-old-boys-trucking-usdot-2474795.php" itemprop="url" target="_blank" title="Trucking Company 2 OLD BOYS TRUCKING Inspection Reports">
Everyone,
Thanks for the help so far... I'm trying to add an extra function to my little crawler. I have written the following code:
def Crawl_State_Page(max_pages):
    url = 'http://www.quicktransportsolutions.com/carrier/alabama/trucking-companies.php'
    while i <= len(url):
        response = requests.get(url)
        soup = BeautifulSoup(response.content)
        table = soup.find("table", {"class" : "table table-condensed table-striped table-hover table-bordered"})
        for link in table.find_all(href=True):
            print link['href']
Output:
abbeville.php
adamsville.php
addison.php
adger.php
akron.php
alabaster.php
alberta.php
albertville.php
alexander-city.php
alexandria.php
aliceville.php
alpine.php
... # goes all the way to Z; I cut the output short for spacing
What I'm trying to accomplish here is to pull all of the hrefs for the city .php pages and write them to a file. But right now, I am stuck in an infinite loop where it keeps cycling through the same URL. Any tips on how to increment it? My end goal is to create another function that feeds back into my trade_spider with the www.site.com/state/city.php pages and then loops through all 50 states... Something to the effect of:
while i < len(states,cities):
    url = "http://www.quicktransportsolutions.com/carrier" + states + cities[i] +"
And then this would loop into my trade_spider function, pulling all of the information that I needed.
But before I get to that part, I need a bit of help getting out of my infinite loop. Any suggestions? Or foreseeable issues that I am going to run into?
I tried to create a crawler that would cycle through every link on the page and then, if it found content that trade_spider could crawl, write it to a file... However, that was a bit out of my skill set for now. So I'm trying this method.
I would rely on the itemprop attributes of the different tags for each company. They are conveniently set for name, url, address, etc.:
import requests
from bs4 import BeautifulSoup

def trade_spider(max_pages):
    page = 1
    while page <= max_pages:
        url = 'http://www.quicktransportsolutions.com/carrier/missouri/adrian.php'
        response = requests.get(url)
        soup = BeautifulSoup(response.content)
        for company in soup.find_all('div', {'class': 'well well-sm'}):
            link = company.find('a', itemprop='url').get('href').strip()
            name = company.find('span', itemprop='name').text.strip()
            address = company.find('span', itemprop='address').text.strip()
            print name, link, address
            print "----"
        page += 1  # advance the counter so the loop terminates

trade_spider(1)
Prints:
2 OLD BOYS TRUCKING /truckingcompany/missouri/2-old-boys-trucking-usdot-2474795.php 227 E 2ND
Adrian, MO 64720
----
HILLTOP SERVICE & EQUIPMENT /truckingcompany/missouri/hilltop-service-equipment-usdot-1047604.php ROUTE 2 BOX 453
Adrian, MO 64720
----
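As for the infinite-loop part of the updated question: the while i <= len(url) condition never changes, so the loop never ends. A plain for loop over the table's links avoids having a counter to increment at all. A minimal sketch, reusing the table class from the question; collect_city_urls is a hypothetical helper name, and the href-joining assumes the links are relative to the state directory as shown in the output above:

import requests
from bs4 import BeautifulSoup

def collect_city_urls(state_url):
    """Collect every city .php link from a state page, visiting the page exactly once."""
    response = requests.get(state_url)
    soup = BeautifulSoup(response.content, 'html.parser')
    table = soup.find('table', {'class': 'table table-condensed table-striped table-hover table-bordered'})
    base = state_url.rsplit('/', 1)[0]  # e.g. http://www.quicktransportsolutions.com/carrier/alabama
    return [base + '/' + link['href'] for link in table.find_all('a', href=True)]

for city_url in collect_city_urls('http://www.quicktransportsolutions.com/carrier/alabama/trucking-companies.php'):
    print(city_url)  # each of these could then be fed into a trade_spider-style scraper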
