I'm a big fan of stackoverflow and typically find solutions to my problems through this website. However, the following problem has bothered me for so long that it forced me to create an account here and ask directly:
I'm trying to scrape this link: What I want are the rows "TRCS Asset Class" and "Currency".
For starters, I'm using this code:
from bs4 import BeautifulSoup
import urllib2
url = ''
req = urllib2.urlopen(url)
raw = req.read()
soup = BeautifulSoup(raw)
print soup.prettify()
The html code returned (see below) is different from what you can see in your browser upon clicking the link:
Does anyone know what I'm missing here and how I could get it to work?

Thanks, Teemu Risikko - a comment (albeit not the solution) of the website you linked got me on the right path.
In case someone else is bumping into the same problem, here is my solution: I'm getting the data via requests and not via traditional "scraping" (e.g. BeautifulSoup or lxml).
Navigate to the website using Google Chrome.
Right-click on the website and select "Inspect".
On the top navigation bar select "Network".
Limit network monitor to "XHR".
One of the entries (marked with an arrow) shows the link that can be used with the requests library.
import requests
url = ''
headers = {'X-AG-Access-Token': YOUR_ACCESS_TOKEN}
r = requests.get(url, headers=headers)
Which gets me this:
{u'Asset Class': [u'Units'],
u'Asset Class URL': [u''],
u'Currency': [u'CAD'],
u'Currency URL': [u''],
u'Exchange': [u'TOR'],
u'IsQuoteOf.mdaas': [{u'Is Quote Of': [u'Convertible Debentures Income Units'],
u'URL': [u''],
u'quoteOfInstrument': [u'21475768667'],
u'quoteOfInstrument URL': [u'']}],
u'Mic': [u'XTSE'],
u'PERM ID': [u'21475776041'],
u'Quote Type': [u'equity'],
u'RIC': [u'OCV_u.TO'],
u'Ticker': [u'OCV.UN'],
u'entityType': [u'Quote']}
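
Since the response is JSON, the two fields asked about can be read straight from the parsed dictionary. This is a minimal sketch, assuming the payload has the shape printed above (each value wrapped in a one-element list). `first_value` is a hypothetical helper, and `sample` is a trimmed copy of that output rather than a live response.

```python
def first_value(payload, key, default=None):
    """Return the first element of the list stored under key, or default."""
    values = payload.get(key) or []
    return values[0] if values else default

# Trimmed copy of the output shown above; in practice this would be r.json().
sample = {
    "Asset Class": ["Units"],
    "Currency": ["CAD"],
    "Exchange": ["TOR"],
    "RIC": ["OCV_u.TO"],
}

asset_class = first_value(sample, "Asset Class")
currency = first_value(sample, "Currency")
print(asset_class, currency)
```

Using a helper with a default avoids a KeyError or IndexError if a field is missing or empty for some instrument.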

Using the default user-agent with a lot of pages will give you a different looking page because it is using an outdated user-agent. This is what your output is telling you.
Reference on Changing user-agents
Though this may be your problem, it does not exactly answer the question about getting dynamically applied changes on a webpage. To get the dynamically loaded data, you need to emulate the requests that the page's JavaScript makes on load. If you make the same requests that the JavaScript makes, you will get the same data that it gets.
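
One way to picture emulating the JavaScript request is to build the same GET request the browser sends, with a browser-like User-Agent and an Accept header asking for JSON. The sketch below uses only the standard library and does not send anything. The endpoint URL and the User-Agent string are placeholders, not values from the original post, and should be replaced with what the browser's network monitor shows.

```python
import urllib.request

# Placeholder values: copy the real ones from the browser's network monitor.
API_URL = "https://example.com/api/data"
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)

def build_xhr_request(url, user_agent=BROWSER_UA, extra_headers=None):
    """Build (but do not send) a GET request that resembles the page's XHR call."""
    headers = {"User-Agent": user_agent, "Accept": "application/json"}
    headers.update(extra_headers or {})
    return urllib.request.Request(url, headers=headers, method="GET")

req = build_xhr_request(API_URL)
# urllib stores header names in capitalized form, e.g. "User-agent".
print(req.get_method(), req.full_url, req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen` would send it. The same headers dictionary can also be handed to `requests.get(url, headers=...)`.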


ERROR: 'NoneType' object has no attribute 'find_all'

I'm doing web scraping of a web page called: CVE Trends
import bs4, requests, webbrowser
LINK = ""
response = requests.get(LINK)
link_tweets = []
for a_tweet in a_tweets:
    link_tweet = str(a_tweet.get('href'))
    if PRE_LINK in link_tweet:
        link_tweets.append(link_tweet)
from pprint import pprint
This is the code that I've written so far. I've tried many approaches, but it always gives the same error:
'NoneType' object has no attribute 'find_all'
Can someone help me please? I really need this.
Thanks in advance for any answer.
This happens because the response is not what you expect.
The website loads its content with JavaScript, so the data is not in the HTML that the request returns.
Instead of scraping the page, you can get the data from the JSON endpoint assigned to LINK below.
Here is one possible solution:
import requests
import json
from urlextract import URLExtract

LINK = ""
PRE_LINK = ""  # placeholder: set this to the link prefix to filter on
link_tweets = []

# library for URL extraction
extractor = URLExtract()

# fetch the response from LINK (a JSON response)
html = requests.get(LINK).text
# convert the string to a JSON object
twitt_json = json.loads(html)
twitt_datas = twitt_json.get('data')

for twitt_data in twitt_datas:
    # extract tweets
    twitts = twitt_data.get('tweets')
    for twitt in twitts:
        # extract the tweet text and check the condition
        twitt_text = twitt.get('tweet_text')
        if PRE_LINK in twitt_text:
            # find URLs in the text
            urls_list = extractor.find_urls(twitt_text)
            for url in urls_list:
                if PRE_LINK in url:
                    link_tweets.append(url)
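
The traversal above can be tried without any network access. This is a minimal sketch, assuming the payload has the nested shape the answer's code implies (`data`, then `tweets`, then `tweet_text`). The sample payload and the `PRE_LINK` value are made up for illustration, and a simple regular expression stands in for `urlextract`.

```python
import re

# Hypothetical prefix and payload, for illustration only.
PRE_LINK = "https://example.com/"
sample = {
    "data": [
        {"tweets": [
            {"tweet_text": "Details at https://example.com/advisory/1 today"},
            {"tweet_text": "Unrelated https://other.test/page"},
        ]},
        {"tweets": [{"tweet_text": "no link here"}]},
    ]
}

# Rough URL pattern; urlextract handles many more cases than this.
URL_RE = re.compile(r"https?://\S+")

def collect_links(payload, prefix):
    """Return every URL found in any tweet_text that contains prefix."""
    links = []
    for item in payload.get("data") or []:
        for tweet in item.get("tweets") or []:
            text = tweet.get("tweet_text") or ""
            for url in URL_RE.findall(text):
                if prefix in url:
                    links.append(url)
    return links

print(collect_links(sample, PRE_LINK))
```

The `or []` and `or ""` fallbacks keep the loop from failing with a NoneType error when a key is missing, which is the same class of error the question ran into.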
This is happening because soup.find("div", class_="tweet_text") is not finding anything, so it returns None. The underlying cause is that the site you're trying to scrape is populated using JavaScript, so when you send a GET request to the site, this is what you're getting back:
web scraping BS4 table located but empty findall [duplicate]

I am trying to scrape a table from a website:
After importing the url
When I inspect the website, I see that my table is there between tags:
Still, when I use:
it returns an empty list. Can someone point out what I did wrong?
BeautifulSoup doesn't evaluate JavaScript.
It looks like all those tables are being generated by JavaScript. You could use dryscrape to evaluate the page before passing it on to BeautifulSoup.
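
Before reaching for a JavaScript-capable tool, it can help to confirm that the rows really are absent from the raw HTML. This is a minimal sketch using only the standard library's `html.parser`. The two HTML strings are invented stand-ins for what a plain HTTP request returns and what the browser shows after scripts run.

```python
from html.parser import HTMLParser

class RowCounter(HTMLParser):
    """Count <tr> tags that appear inside any <table>."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # current nesting depth of <table> tags
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
        elif tag == "tr" and self.depth > 0:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1

def count_table_rows(html):
    """Return the number of table rows present in the given HTML string."""
    parser = RowCounter()
    parser.feed(html)
    parser.close()
    return parser.rows

# Invented examples: static markup vs. markup after scripts have run.
static_html = "<html><body><table id='t'></table><script>fill()</script></body></html>"
rendered_html = "<table id='t'><tr><td>a</td></tr><tr><td>b</td></tr></table>"

print(count_table_rows(static_html), count_table_rows(rendered_html))
```

A count of zero on the downloaded HTML, with rows visible in the browser, points to script-generated content.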

python requests problem: cloudflare error message "enable cookies"

I was planning on creating a basic web scraper for the site however my efforts were stopped early due to an error. When requesting to the url, rather than displaying the html of the website, or even the entrance captcha, I am redirected to a cloudflare page with the error message "enable cookies". Both my code and the response are shown below
import requests
import cfscrape
session = requests.session()
response = session.get('')
Adding Browser/User-Agent Filtering to cloudscraper did the trick for me.
import cloudscraper
from bs4 import BeautifulSoup
# Adding Browser / User-Agent Filtering should help ie.
# will give you only desktop firefox User-Agents on Windows
scraper = cloudscraper.create_scraper(browser={'browser': 'firefox','platform': 'windows','mobile': False})
html = scraper.get("").content
soup = BeautifulSoup(html, 'html.parser')
import cloudscraper
from bs4 import BeautifulSoup
scraper = cloudscraper.create_scraper()
html = scraper.get("").content
soup = BeautifulSoup(html, 'html.parser')
cloudscraper.exceptions.CloudflareReCaptchaProvider: Cloudflare reCaptcha detected, unfortunately you haven't loaded an anti reCaptcha provider correctly via the 'recaptcha' parameter.
Next Step ?
3rd Party reCaptcha Solvers
cloudscraper currently supports the following 3rd party reCaptcha solvers, should you require them.

Download PDF with chrome plugin in python selenium

I'm trying to extract a PDF from this site, which uses the native Google Chrome PDF viewer to open the PDF in the first place; its content type is application/pdf. The issue is that the site URLs I get aren't actually links to the PDF but rather to a .zul page where the JavaScript loads or fetches the PDF.
Here's my download code below:
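from selenium import webdriver

def download_pdf(url, idx, save_dir):
    options = webdriver.ChromeOptions()
    # disable the built-in PDF viewer and set the download directory
    profile = {"plugins.plugins_list": [{"enabled": False, "name": "Chrome PDF Viewer"}],
               "download.default_directory": save_dir}
    # apply the preferences; without this call the profile dict is never used
    options.add_experimental_option("prefs", profile)
    driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver", chrome_options=options)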
The problem that I'm encountering with the above code is that I get the following readout from driver.page_source:
EDIT: Included the link

how to copy all the code of a URL with python

I want to copy all the code of a URL using Python 3.6, but I can only copy part of the code, and I don't know why.
So far, I tried with "requests" module
import requests
page = requests.get("")
and "urllib"
import urllib.request
site = urllib.request.urlopen("")
The part of the code with information of the "Reaction Details", like "Name", "ID" and "Abbreviation" are missing, but they are visible if I inspect the code on the developer bar of Chrome.
The code I'm able to download using the two codes above is:
Thank you.
It's being inserted by a JavaScript script, so neither requests nor urllib will find it. You need a browser for this; try Selenium or PhantomJS,
something like:
from selenium import webdriver
driver = webdriver.Chrome('./chromedriver')
Try getting this url instead:,rxn00001)
