To download a CSV file from a particular URL - python

How do I download a CSV file from a URL in a Jupyter notebook? I know how to use the wget command on Google Colab, but what should I do in a Jupyter notebook?

If you have a direct link to the file, you can fetch it with requests and hand the decoded text to pandas:
import pandas as pd
import io
import requests
url="https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
s=requests.get(url).content
c=pd.read_csv(io.StringIO(s.decode('utf-8')))
Code is from this question: Pandas read_csv from url
Have a look at that as well maybe :)

import pandas as pd
data = pd.read_csv("https://abc/xyz.csv")
For more, check the documentation:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

You can use:
import pandas as pd

def download_csv_to_df(urllink):
    # pd.read_csv accepts a URL directly and returns a DataFrame
    converted_dataframe = pd.read_csv(urllink)
    return converted_dataframe
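
If the goal is to save the file to disk (the closest equivalent of wget) instead of loading it into a DataFrame, the standard library is enough and behaves the same in Jupyter as in a plain script. A minimal sketch, assuming the URL points directly at the CSV; `download_file` and the destination path are hypothetical names:

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url, dest):
    """Stream the resource at `url` into the local file `dest`."""
    dest = Path(dest)
    # Create the target folder if it does not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=30) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Example call (hypothetical destination; needs network access):
# download_file(
#     "https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv",
#     "data/countries.csv",
# )
```

The function returns the destination path, so the result can be passed straight to pd.read_csv if a DataFrame is needed afterwards.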

Related

ValueError: No tables found by using pd.read_html

I cannot download the table even though the HTML shows that there is one. Is there any way to fix it?
The code works when I switch to another website's link, so I am not sure what is wrong with this website.
import requests
import pandas as pd
from io import StringIO
import datetime
import os
url = "https://www.cmoney.tw/etf/e210.aspx?key=0050"
response = requests.get(url)
listed = pd.read_html(response.text)[0]
listed.columns = listed.iloc[0,:]
listed = listed[["標的代號","標的名稱"]]
listed = listed.iloc[1:]
listed
ValueError: No tables found
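
No answer is recorded for this question. One frequent cause of this error, offered here as an assumption and not verified against this site, is that the table is built by JavaScript after the page loads, so the HTML returned by requests.get contains no table element for pd.read_html to find. A small standard-library check (the helper name `count_tables` is hypothetical) can confirm whether the raw response contains any tables before parsing:

```python
from html.parser import HTMLParser


class _TableCounter(HTMLParser):
    """Counts opening <table> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "table":
            self.count += 1


def count_tables(html_text):
    """Return the number of <table> elements found in `html_text`."""
    parser = _TableCounter()
    parser.feed(html_text)
    return parser.count


# With the response from the question one would call:
# count_tables(response.text)
```

If this returns 0, the data is probably fetched by a separate request (visible in the browser's network tab) or the page needs a browser-driven tool to render it.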

Not able to download a CSV file from a website

Maybe someone knows what the problem is with downloading from the site below. I run this code in Jupyter, and nothing happens.
import requests
import os
url = 'http://www.football-data.co.uk/mmz4281/1920/E0.csv'
response = requests.get(url)
with open(os.path.join("folder", "file"), 'wb') as f:
    f.write(response.content)
I've also tried this code and it works fine on my side, assuming folder and file were defined correctly. Alternatively, you can try using pandas which can read a CSV file from URL. So the code would become:
import pandas as pd
import csv
url = '{Some CSV target}'
df = pd.read_csv(url)
df.to_csv('{absolute path to CSV}', sep=',', index=False, quoting=csv.QUOTE_ALL)
It started working only after I declared a proxy.
I'm practicing on my work laptop, so the local network may have been blocking my requests.
I hope this helps someone.
Thanks to everyone for your help!
import requests
import os
os.environ['HTTP_PROXY'] = 'your proxy'
url = 'http://www.football-data.co.uk/mmz4281/1920/E0.csv'
response = requests.get(url)
with open(os.path.join("C://DownloadLocation", "file.csv"), 'wb') as f:
    f.write(response.content)
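
Setting the environment variable affects every request made by the process, and HTTP_PROXY only covers http:// URLs (https:// URLs use HTTPS_PROXY). The proxy can also be supplied per call; requests accepts a `proxies` dictionary for this. The sketch below shows the standard-library counterpart using `urllib.request.ProxyHandler`, not the requests-based code from the answer; the proxy address is a placeholder and `make_opener` is a hypothetical helper:

```python
import urllib.request


def make_opener(proxy_url):
    """Build a URL opener that routes HTTP and HTTPS traffic through `proxy_url`."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Example (placeholder proxy address; needs network access):
# opener = make_opener("http://proxy.example.com:8080")
# url = "http://www.football-data.co.uk/mmz4281/1920/E0.csv"
# with opener.open(url, timeout=30) as resp:
#     data = resp.read()
```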

Downloading CSV data through requests.get() returns PHP text

I am trying to download specific data as part of my work; the data is available at the URL used in the code below.
The source explains how to download it with a GET request, but when I make my request:
import requests
import pandas as pd
url="https://estadisticas.bcrp.gob.pe/estadisticas/series/api/PN01210PM/csv/2015-01/2019-01"
r=pd.to_csv(url)
it doesn't read the data as it should (open the link in a browser to compare).
When I try
s=requests.get(url,verify=False) # you can set verify=True
df=pd.DataFrame(s)
the data is not right either.
What else can I do? It is supposed to download the data as CSV, so that I don't have to clean it.
To get the content as CSV, you can replace all HTML line breaks with newline characters.
Please let me know if this works for you:
import requests
import pandas as pd
from io import StringIO
url = "https://estadisticas.bcrp.gob.pe/estadisticas/series/api/PN01210PM/csv/2015-01/2019-01"
content = requests.get(url,verify=False).text.replace("<br>","\n").strip()
csv = StringIO(content)
r = pd.read_csv(csv)
print(r)
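
The replacement above matches only the exact string `<br>`. Some servers emit `<br/>` or `<br />` instead, possibly in upper case. A minimal sketch that handles these variants with a regular expression and the standard csv module; the sample text is made up for illustration and is not actual output from the BCRP endpoint, and `br_text_to_rows` is a hypothetical helper:

```python
import csv
import io
import re

# Matches <br>, <br/>, and <br /> in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def br_text_to_rows(raw_text):
    """Turn text whose rows are separated by HTML line breaks into CSV rows."""
    cleaned = BR_TAG.sub("\n", raw_text).strip()
    return list(csv.reader(io.StringIO(cleaned)))


# Made-up sample in the shape of a <br>-separated response.
sample = 'Period,"Value"<br>Jan.2015,"1.5"<BR/>Feb.2015,"1.7"<br />'
rows = br_text_to_rows(sample)
```

The cleaned text can equally be wrapped in StringIO and passed to pd.read_csv, as in the answer above.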

Can't Download Full File in Python

I was using bs4 in Python to download a wallpaper from nmgncp.com.
However, the code downloads only a 16 KB file, whereas the full image is around 300 KB.
Please help me. I have even tried the wget.download method.
PS: I am using Python 3.6 on Windows 10.
Here is my code:
from bs4 import BeautifulSoup
import requests
import datetime
import time
import re
import wget
import os
url='http://www.nmgncp.com/dark-wallpaper-1920x1080.html'
html=requests.get(url)
soup=BeautifulSoup(html.text,"lxml")
a = soup.findAll('img')[0].get('src')
newurl='http://www.nmgncp.com/'+a
print(newurl)
response = requests.get(newurl)
if response.status_code == 200:
    with open("C:/Users/KD/Desktop/Python_practice/newwww.jpg", 'wb') as f:
        f.write(response.content)
The source of your problem is a protection on the site: the image URL requires a referer; otherwise it redirects to the HTML page.
Fixed source code:
from bs4 import BeautifulSoup
import requests
import datetime
import time
import re
import wget
import os
url='http://www.nmgncp.com/dark-wallpaper-1920x1080.html'
html=requests.get(url)
soup=BeautifulSoup(html.text,"lxml")
a = soup.findAll('img')[0].get('src')
newurl='http://www.nmgncp.com'+a
print(newurl)
response = requests.get(newurl, headers={'referer': newurl})
if response.status_code == 200:
    with open("C:/Users/KD/Desktop/Python_practice/newwww.jpg", 'wb') as f:
        f.write(response.content)
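
The two listings differ in how they concatenate the site root and the `src` attribute (with and without a trailing slash), which is easy to get wrong. A sketch that resolves the image URL against the page URL with `urllib.parse.urljoin` and builds the Referer header; `image_request_parts` is a hypothetical helper, the `src` value is inferred from the direct image URL quoted in this thread, and sending the page URL as the referer (the fixed code sends the image URL itself) is an assumption about what the server accepts:

```python
from urllib.parse import urljoin


def image_request_parts(page_url, img_src):
    """Return the absolute image URL and the headers to send with the request."""
    # urljoin handles relative ("data/x.jpg") and root-relative ("/data/x.jpg") paths.
    image_url = urljoin(page_url, img_src)
    headers = {"Referer": page_url}
    return image_url, headers


page = "http://www.nmgncp.com/dark-wallpaper-1920x1080.html"
url, headers = image_request_parts(
    page, "/data/out/95/4351795-dark-wallpaper-1920x1080.jpg"
)
# These can then be passed on: requests.get(url, headers=headers)
```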
First of all, http://www.nmgncp.com/dark-wallpaper-1920x1080.html is an HTML document. Second, when you try to download the image by its direct URL (for example http://www.nmgncp.com/data/out/95/4351795-dark-wallpaper-1920x1080.jpg), you are also redirected to an HTML document. This is most probably because the host (nmgncp.com) does not want to provide direct links to its images. The server can check whether an image was requested directly by inspecting the HTTP Referer header and deciding whether it is valid. So in this case you have to put in some more effort to make the server treat your request as a legitimate one.
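
A 16 KB file where roughly 300 KB was expected suggests that the server sent an HTML page instead of the image. One way to catch this before writing to disk is to inspect the first bytes of the body. A sketch, with `looks_like_image` as a hypothetical helper that covers only the JPEG and PNG file signatures:

```python
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def looks_like_image(body):
    """Return True if `body` starts with a JPEG or PNG file signature."""
    return body.startswith(JPEG_MAGIC) or body.startswith(PNG_MAGIC)


# With a requests response one would check before saving:
# if response.status_code == 200 and looks_like_image(response.content):
#     ...write response.content to disk...
```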

Understanding how to extract data from HTML File

I am trying to access the "Yield Curve Data" available on this page. It has a radio button which, upon clicking "Submit", results in a zip file from which I want to get the data. I want the data from the "Retrieve all data" option. My code is as follows. From the statement print result.read() I realize that result is actually an HTML document. My difficulty is in understanding how to extract the data from result, as I don't see any data in it. I am confused as to where to go from here.
import urllib, urllib2
import csv
from StringIO import StringIO
import pandas as pd
import os
from zipfile import ZipFile
my_url = 'http://www.bankofcanada.ca/rates/interest-rates/bond-yield-curves/'
data = urllib.urlencode({'lastchange': 'all'})
request = urllib2.Request(my_url, data)
result = urllib2.urlopen(request)
Thank You
You're going to need to send a POST request to the following endpoint:
http://www.bankofcanada.ca/stats/results/csv
With the following form data:
lookupPage: lookup_yield_curve.php
startRange: 1986-01-01
searchRange: all
This should give you the file.
You may also need to fake your useragent.
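
The question's code is Python 2 (urllib2). In Python 3 the same POST can be built with urllib.request. A minimal sketch, assuming the endpoint and form fields quoted above are still valid (not verified here); the User-Agent string is an arbitrary placeholder and `build_csv_request` is a hypothetical helper:

```python
import urllib.parse
import urllib.request

ENDPOINT = "http://www.bankofcanada.ca/stats/results/csv"
FORM_DATA = {
    "lookupPage": "lookup_yield_curve.php",
    "startRange": "1986-01-01",
    "searchRange": "all",
}


def build_csv_request(user_agent="Mozilla/5.0"):
    """Build a POST request carrying the form fields and a browser-like User-Agent."""
    body = urllib.parse.urlencode(FORM_DATA).encode("ascii")
    # Supplying `data` makes urllib send the request as POST.
    return urllib.request.Request(
        ENDPOINT, data=body, headers={"User-Agent": user_agent}
    )


request = build_csv_request()
# To send it (needs network access):
# with urllib.request.urlopen(request, timeout=30) as result:
#     payload = result.read()
```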
