I cannot download the table even though the HTML shows there is a table. Is there any way to fix it?
The code works fine when I point it at other websites, so I'm not sure what is wrong with this one.
import requests
import pandas as pd
from io import StringIO

url = "https://www.cmoney.tw/etf/e210.aspx?key=0050"
response = requests.get(url)

# parse the first table on the page
listed = pd.read_html(StringIO(response.text))[0]
listed.columns = listed.iloc[0, :]        # promote the first row to headers
listed = listed[["標的代號", "標的名稱"]]  # keep the ticker and name columns
listed = listed.iloc[1:]                  # drop the header row from the data
listed
ValueError: No tables found
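"No tables found" here usually means the table is not in the HTML that requests receives, either because the server refuses the default python-requests client or because the table is rendered by JavaScript. A minimal check worth trying first, assuming the former (the browser-like User-Agent is my assumption, not something this site is confirmed to require):

import requests
import pandas as pd
from io import StringIO

url = "https://www.cmoney.tw/etf/e210.aspx?key=0050"
# pretend to be a regular browser in case the default client is blocked
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(url, headers=headers)
print(response.status_code)        # confirm the request succeeded
print("<table" in response.text)   # confirm a table really is in the payload
listed = pd.read_html(StringIO(response.text))[0]

If "<table" never shows up in response.text, the table is built by JavaScript and a plain HTTP fetch will not see it.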
I want to scrape multiple pages of a website using Python, but I'm getting a "Remote Connection closed" error.
Here is my code:
import pandas as pd

url_link = 'https://www.taneps.go.tz/epps/viewAllAwardedContracts.do?d-3998960-p={}&selectedItem=viewAllAwardedContracts.do'
LIST = []
for number in range(1, 5379):
    url = url_link.format(number)
    dframe = pd.read_html(url, header=None)[0]
    LIST.append(dframe)
Result_df = pd.concat(LIST)
Result_df.to_csv('Taneps_contracts.csv')
Any idea how to solve it?
For me, just using requests to fetch the HTML before passing it to read_html gets the data. I edited your code to:
import requests
import pandas as pd

url_link = 'https://www.taneps.go.tz/epps/viewAllAwardedContracts.do?d-3998960-p={}&selectedItem=viewAllAwardedContracts.do'
LIST = []
for number in range(1, 5379):
    url = url_link.format(number)
    r = requests.get(url)  # fetch the page -> HTML in r.text
    dframe = pd.read_html(r.text, header=None)[0]
    LIST.append(dframe)
Result_df = pd.concat(LIST)
Result_df.to_csv('Taneps_contracts.csv')
I didn't even have to add headers, but if this isn't enough for you (i.e., if the program breaks or you don't end up with 53,770+ rows), try adding convincing headers or using something like HTMLSession instead of calling requests.get directly.
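For reference, a minimal sketch of that HTMLSession fallback using the third-party requests-html package, shown for a single page only (a sketch under that assumption, not a verified fix for this site):

import pandas as pd
from requests_html import HTMLSession  # pip install requests-html

url = 'https://www.taneps.go.tz/epps/viewAllAwardedContracts.do?d-3998960-p=1&selectedItem=viewAllAwardedContracts.do'
session = HTMLSession()   # a requests-compatible session with browser-like defaults
r = session.get(url)      # r behaves like a requests response, so r.text works as before
dframe = pd.read_html(r.text, header=None)[0]
print(dframe.head())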
I have a web scraping project and I've run into a problem in my code:
import requests
from bs4 import BeautifulSoup
import pandas as pd

response = requests.get('https://bama.ir/car')
products = []
prices = []
kilometrs = []
soup = BeautifulSoup(response.text, 'html.parser')
for a in soup.find_all('li', href=True, attrs={'class': 'car-list-item-li list-data-main'}):
    name = a.find('div', attrs={'class': 'title'})
    price = a.find('p', attrs={'class': 'cost single-price'})
    kilometr = a.find('div', attrs={'class': 'car-func-details'})
    products.append(name.text)
    prices.append(price.text)
    kilometrs.append(kilometr.text)
    print(kilometr.text)
df = pd.DataFrame({'Product Name': products, 'Price': prices, 'kilometr': kilometrs})
df.to_csv('products.csv', index=False, encoding='utf-8')
a.find() is not working and I have no idea why! Can you help me?
Indeed, your request returns a 403 Forbidden status code.
The website is protected by Cloudflare; take a look at a package like cloudscraper (https://github.com/VeNoMouS/cloudscraper).
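A minimal sketch of how cloudscraper would slot into your code (untested against the live site; the parsing logic itself is unchanged):

import cloudscraper  # pip install cloudscraper
from bs4 import BeautifulSoup

# create_scraper() returns a requests-like session that solves
# Cloudflare's anti-bot challenge before handing the page back
scraper = cloudscraper.create_scraper()
response = scraper.get('https://bama.ir/car')
print(response.status_code)  # should now be 200 instead of 403
soup = BeautifulSoup(response.text, 'html.parser')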
I am trying to download specific data as part of my work.
The data is located at the URL shown in the code below.
The source indicates how to download it through a GET request, but when I make my request:
import requests
import pandas as pd
url="https://estadisticas.bcrp.gob.pe/estadisticas/series/api/PN01210PM/csv/2015-01/2019-01"
r = pd.read_csv(url)
it doesn't read the data as it should (open the link in a browser to see the expected output).
When I try
s = requests.get(url, verify=False)  # you can set verify=True
df = pd.DataFrame(s)
the data still isn't right.
What else can I do? It is supposed to download the data as CSV so that I don't have to clean it by hand.
To get the content as CSV, you can replace all of the HTML line breaks with newline characters.
Please let me know if this works for you:
import requests
import pandas as pd
from io import StringIO

url = "https://estadisticas.bcrp.gob.pe/estadisticas/series/api/PN01210PM/csv/2015-01/2019-01"
# the API returns the CSV with <br> as line separators, so normalize them first
content = requests.get(url, verify=False).text.replace("<br>", "\n").strip()
csv = StringIO(content)
r = pd.read_csv(csv)
print(r)
I was using BeautifulSoup (bs4) in Python to download a wallpaper from nmgncp.com.
However, the code downloads only a 16 KB file, whereas the full image is around 300 KB.
Please help me; I have even tried the wget.download method.
PS: I am using Python 3.6 on Windows 10.
Here is my code:
from bs4 import BeautifulSoup
import requests

url = 'http://www.nmgncp.com/dark-wallpaper-1920x1080.html'
html = requests.get(url)
soup = BeautifulSoup(html.text, "lxml")
a = soup.findAll('img')[0].get('src')
newurl = 'http://www.nmgncp.com/' + a
print(newurl)
response = requests.get(newurl)
if response.status_code == 200:
    with open("C:/Users/KD/Desktop/Python_practice/newwww.jpg", 'wb') as f:
        f.write(response.content)
The source of your problem is a protection mechanism: the image URL requires a Referer header, otherwise the server redirects to the HTML page.
Fixed source code:
from bs4 import BeautifulSoup
import requests

url = 'http://www.nmgncp.com/dark-wallpaper-1920x1080.html'
html = requests.get(url)
soup = BeautifulSoup(html.text, "lxml")
a = soup.findAll('img')[0].get('src')
newurl = 'http://www.nmgncp.com' + a
print(newurl)
# sending a referer satisfies the hotlink protection
response = requests.get(newurl, headers={'referer': newurl})
if response.status_code == 200:
    with open("C:/Users/KD/Desktop/Python_practice/newwww.jpg", 'wb') as f:
        f.write(response.content)
First of all, http://www.nmgncp.com/dark-wallpaper-1920x1080.html is an HTML document. Second, when you try to download an image by its direct URL (like http://www.nmgncp.com/data/out/95/4351795-dark-wallpaper-1920x1080.jpg), the server will also redirect you to an HTML document. This is most probably because the host (nmgncp.com) does not want to serve direct links to its images. It can check whether an image was requested directly by looking at the HTTP Referer header and deciding whether it is valid. So in this case you have to put in some extra effort to make the host believe you are a valid caller of direct URLs.
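As a side note, a real browser would normally send the embedding page, not the image URL itself, as the referer. A variant of the download step built on that assumption (hypothetical and untested against this host):

import requests

page_url = 'http://www.nmgncp.com/dark-wallpaper-1920x1080.html'
image_url = 'http://www.nmgncp.com/data/out/95/4351795-dark-wallpaper-1920x1080.jpg'

# mimic a browser: claim the image was loaded from the gallery page
response = requests.get(image_url, headers={'referer': page_url})
if response.status_code == 200:
    with open('newwww.jpg', 'wb') as f:
        f.write(response.content)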
I am trying to access the "Yield Curve Data" available on this page. It has a radio button which, upon clicking "Submit", returns a zip file from which I want to extract the data (the "Retrieve all data" option). My code is as follows. From the statement print result.read() I realize that result is actually an HTML document, and I don't see any data in it, so I am confused as to where to go from here.
import urllib, urllib2
import csv
from StringIO import StringIO
import pandas as pd
import os
from zipfile import ZipFile
my_url = 'http://www.bankofcanada.ca/rates/interest-rates/bond-yield-curves/'
data = urllib.urlencode({'lastchange': 'all'})
request = urllib2.Request(my_url, data)
result = urllib2.urlopen(request)
Thank You
You're going to need to send a POST request to the following endpoint:
http://www.bankofcanada.ca/stats/results/csv
with the following form data:
lookupPage: lookup_yield_curve.php
startRange: 1986-01-01
searchRange: all
This should give you the file.
You may also need to fake your user agent.
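Since the question's code is Python 2 (urllib2, StringIO), here is a sketch of that POST in modern Python with requests. The endpoint and form fields come from this answer; the output filename and User-Agent string are my own assumptions:

import requests

url = 'http://www.bankofcanada.ca/stats/results/csv'
form = {
    'lookupPage': 'lookup_yield_curve.php',
    'startRange': '1986-01-01',
    'searchRange': 'all',
}
# a browser-like User-Agent, in case the server rejects the default one
headers = {'User-Agent': 'Mozilla/5.0'}

response = requests.post(url, data=form, headers=headers)
response.raise_for_status()

# assumed output path; the response body should be the file the form returns
with open('yield_curve_data.csv', 'wb') as f:
    f.write(response.content)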