from bs4 import BeautifulSoup
import requests

# Fetch the daily-history page and parse it
url_ = "https://www.wunderground.com/history/daily/ca/toronto/CYTZ/date/2016-6-25"
mypage = requests.get(url_).text
soup = BeautifulSoup(mypage, 'html.parser')
soup.find_all('tr')  # returns []
I was trying to fetch weather data from Wunderground. BeautifulSoup fetched the page source, but I don't know why soup.find_all('tr') keeps giving me an empty list ([]). Does anyone know why?
Thank you!
The table data (most probably) gets populated by JavaScript, so it isn't in the HTML that requests downloads. Take a look at this question.
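For what it's worth, here is a minimal sketch of that idea using Selenium to render the page before parsing. It assumes Chrome and a matching chromedriver are installed; neither Selenium nor the crude wait is part of the original question:

import time
from bs4 import BeautifulSoup
from selenium import webdriver

# Selenium drives a real browser, so the page's JavaScript runs and the
# table rows exist by the time we grab the rendered source.
driver = webdriver.Chrome()
driver.get("https://www.wunderground.com/history/daily/ca/toronto/CYTZ/date/2016-6-25")
time.sleep(5)  # crude wait for the JS to populate the table; WebDriverWait would be more robust
html = driver.page_source
driver.quit()

soup = BeautifulSoup(html, 'html.parser')
rows = soup.find_all('tr')  # should no longer be empty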
I am new here and have read through many of the historical posts, but cannot find exactly what I am looking for.
I am new to web scraping and have successfully scraped data from a handful of sites.
However, I am having an issue with this code: I am trying to extract the product titles using Beautiful Soup, but something in the code is not returning the data. Any help would be appreciated:
from bs4 import BeautifulSoup
import requests

webpage = requests.get('https://groceries.asda.com/aisle/beer-wine-spirits/spirits/whisky/1215685911554-1215685911575-1215685911576')
sp = BeautifulSoup(webpage.content, 'html.parser')
title = sp.find_all('h3', class_='co-product__title')
print(title)  # prints an empty list
I assume my issue lies somewhere in the find_all call, but I cannot quite work out how to resolve it.
Regards
Milan
You could try this link instead; it seems to pull the information you desire:
from bs4 import BeautifulSoup
import requests
webpage = requests.get("https://groceries.asda.com/api/items/iconmetadata?request_origin=gi")
sp = BeautifulSoup(webpage.content, "html.parser")
print(sp)
Let me know if this helps.
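If the endpoint returns JSON, which the /api/ path suggests but which I have not verified, requests can decode it directly instead of routing it through BeautifulSoup. A sketch:

import requests

# Assumption: the endpoint returns JSON; .json() raises ValueError if not.
resp = requests.get("https://groceries.asda.com/api/items/iconmetadata?request_origin=gi")
data = resp.json()

# Inspect the top-level structure before digging for product titles
print(type(data))
print(str(data)[:500])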
Try this:
from bs4 import BeautifulSoup
import requests
import pandas as pd
webpage = requests.get('https://groceries.asda.com/aisle/beer-wine-spirits/spirits/whisky/1215685911554-1215685911575-1215685911576')
sp = BeautifulSoup(webpage.content, 'html.parser')
title = sp.find_all('h3', {'class':'co-product__title'})
print(title[0])
Also, I prefer:
sp = BeautifulSoup(webpage.text, 'lxml')
Also note that find_all returns a list with all elements of that class. If you want just the first match, use .find instead, e.g.:
title = sp.find('h3', {'class':'co-product__title'})
Sorry to rain on this parade, but you won't be able to scrape this data without a webdriver; alternatively, you can call the API directly. You should research how to get post-rendered JavaScript in Python.
I am having some problems with pandas read_html. When I tried to read the table with pandas alone, it wouldn't work, so I tried requests and BeautifulSoup instead and solved the problem. But I would like to know why I could not get the table using pandas the first time. Thank you.
First code:
import pandas as pd
url = 'https://finance.naver.com/item/sise_day.nhn?code=005930&page=1'
r = pd.read_html(url)[0]
Second code that I tried:
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://finance.naver.com/item/sise_day.nhn?code=005930&page=1'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
table = str(soup.select("table"))  # serialize the matched tables back to HTML
data = pd.read_html(table)[0]      # let pandas parse the extracted markup
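One common cause, offered here as an assumption rather than something verified against this site, is that pd.read_html fetches the URL with Python's default user agent, which some servers reject or answer with an empty page. Fetching the page yourself with a browser-like User-Agent header and handing the text to read_html sidesteps that:

import requests
import pandas as pd

url = 'https://finance.naver.com/item/sise_day.nhn?code=005930&page=1'
# Assumption: the server dislikes the default Python user agent, so we
# send a browser-like one and let read_html parse the downloaded text.
headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get(url, headers=headers)
data = pd.read_html(r.text)[0]
print(data.head())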
Hi, I'm trying to crawl the correct CSS to go with the HTML table created from BeautifulSoup. The table is done, but the CSS is not. Can anyone take a look at my code and perhaps suggest a better way to crawl the stylesheet?
I can see two issues:
1. I'm not locating the correct stylesheet on the page matching the table.
2. The way I inject the CSS into the HTML file is awkward, if not outright broken.
import urllib.request
import io
from bs4 import BeautifulSoup

url = "https://www.etax.nat.gov.tw/etw-main/web/ETW183W2_10805/"
url_css = "https://www.etax.nat.gov.tw/etwmain/resources/web/css/main.fia.css"

# Parse the page and grab its first table
soup = BeautifulSoup(urllib.request.urlopen(url).read(), features="html.parser", from_encoding='utf-16')
soup_table = soup.findAll('table')[0]

# Fetch the stylesheet
soup_css = BeautifulSoup(urllib.request.urlopen(url_css).read(), features="html.parser", from_encoding='utf-16')

with io.open("soup_table.html", "w", encoding='utf-16') as f:
    f.write(str(soup_table))
    # The awkward part: the CSS ends up inside <script> tags
    f.write("<script>")
    f.write(str(soup_css))
    f.write("</script>")
There is no error message; the table just doesn't look right without proper styling.
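One likely fix, sketched under the assumption that this stylesheet is the one the table actually needs and that it decodes as UTF-8: CSS belongs in a <style> tag (or a <link> to the stylesheet URL), not in <script>, which the browser treats as JavaScript:

import io
import urllib.request
from bs4 import BeautifulSoup

url = "https://www.etax.nat.gov.tw/etw-main/web/ETW183W2_10805/"
url_css = "https://www.etax.nat.gov.tw/etwmain/resources/web/css/main.fia.css"

soup = BeautifulSoup(urllib.request.urlopen(url).read(), "html.parser")
table_html = str(soup.findAll("table")[0])
css_text = urllib.request.urlopen(url_css).read().decode("utf-8", errors="replace")

with io.open("soup_table.html", "w", encoding="utf-8") as f:
    # <style>, not <script>: the browser would otherwise try to run the CSS as JS
    f.write("<style>")
    f.write(css_text)
    f.write("</style>")
    f.write(table_html)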
I want to scrape the airplane arrivals from a website with Python 2.7 and export them to Excel, but something is wrong with my code:
import urllib2
import unicodecsv as csv
from bs4 import BeautifulSoup

# Prepare the CSV output
filename = r'output.csv'
resultcsv = open(filename, "wb")
output = csv.writer(resultcsv, delimiter=';', quotechar='"', quoting=csv.QUOTE_NONNUMERIC, encoding='latin-1')

# Fetch and parse the arrivals page
url = "https://www.flightradar24.com/data/airports/bud/arrivals"
page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page, 'html.parser')
data = soup.find('div', {"class": "row cnt-schedule-table"})
print data
I need the contents of the div with class row cnt-schedule-table. What am I doing wrong?
I believe the problem is that you are trying to get data from a JavaScript-loaded data set. Instead of reading the page directly, you'll need to mimic the requests the page makes to populate it.
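A sketch of that approach; the endpoint below is a placeholder, not flightradar24's documented API. You would find the real URL by opening the arrivals page with the browser's developer tools and watching the Network tab for the JSON request the page makes:

import requests

# HYPOTHETICAL endpoint: copy the real one from the browser's Network tab.
api_url = "https://www.flightradar24.com/.../arrivals.json"  # placeholder

resp = requests.get(api_url, headers={"User-Agent": "Mozilla/5.0"})
data = resp.json()  # assumes the endpoint answers with JSON
print(data)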
When I try to scrape data with the following code:
from bs4 import BeautifulSoup
import requests

start_url = requests.get('http://www.indiaproperty.com/chennai-property-search-allresidential-properties-for-sale-in-velachery-results')
soup = BeautifulSoup(start_url.content, 'html.parser')
properties = soup.findAll('a', {'class': 'paddl10'})
for eachproperty in properties:
    print eachproperty['href']
I do not see any output, and it does not give any error either. I checked whether the request was getting redirected with the following code:
import requests

response = requests.get('http://www.indiaproperty.com/chennai-property-search-allresidential-properties-for-sale-in-velachery-results')
response.url  # note: a bare expression only echoes in the interactive shell
Then it too shows nothing, and no error either.
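A few diagnostics worth running, sketched with standard requests attributes only; whether the site blocks the client, redirects it, or renders via JavaScript is left to verify:

import requests

url = 'http://www.indiaproperty.com/chennai-property-search-allresidential-properties-for-sale-in-velachery-results'
response = requests.get(url)

# Bare expressions only echo in the interactive shell; a script must print.
print response.status_code   # 200, 403, 404...?
print response.history       # a non-empty list means we were redirected
print response.url           # final URL after any redirects
print response.text[:500]    # peek at what the server actually returned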