Downloading CSV data with requests.get() returns PHP text instead - python

I am trying to download specific data as part of my work;
the data is located at the URL shown in the code below.
The source explains how to download it with a GET request, but when I make my request:
import requests
import pandas as pd
url="https://estadisticas.bcrp.gob.pe/estadisticas/series/api/PN01210PM/csv/2015-01/2019-01"
r=pd.to_csv(url)
the data isn't read the way it should be (compare with opening the link in a browser).
When I try
s=requests.get(url,verify=False) # you can set verify=True
df=pd.DataFrame(s)
the data isn't right either.
What else can I do? It is supposed to download the data as CSV, so that I don't have to clean it.

To get the content as CSV, you can replace all HTML line breaks with newline characters.
Please let me know if this works for you:
import requests
import pandas as pd
from io import StringIO
url = "https://estadisticas.bcrp.gob.pe/estadisticas/series/api/PN01210PM/csv/2015-01/2019-01"
content = requests.get(url,verify=False).text.replace("<br>","\n").strip()
csv = StringIO(content)
r = pd.read_csv(csv)
print(r)
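
The `replace("<br>", "\n")` call only matches that exact spelling of the tag. A slightly more tolerant variant is sketched below; it also handles `<br/>` and `<BR />`. The helper name `html_breaks_to_newlines` and the sample string are assumptions for illustration, not the real BCRP response.

```python
import csv
import re
from io import StringIO

# Matches <br>, <br/> and <br /> in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)

def html_breaks_to_newlines(text):
    """Replace HTML line-break tags with newline characters."""
    return BR_TAG.sub("\n", text).strip()

# Hypothetical sample that mimics CSV rows separated by HTML line breaks.
sample = 'Month,Value<br>"Jan.2015",1.5<BR/>"Feb.2015",2.5<br />'
rows = list(csv.reader(StringIO(html_breaks_to_newlines(sample))))
print(rows)
```

The cleaned text could equally be wrapped in `StringIO` and handed to `pd.read_csv`.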

Related

Getting question marks using Python requests with Excel file

I'm new to Python 3 and requests. I found a dataset on Harvard Dataverse, but I've been stuck for hours trying to extract it. Instead of readable data I get question marks in my content. I found similar issues but I'm still unable to solve mine.
Can anyone help me, please?
It would be much appreciated ;)
Many thanks!
import requests
import pandas as pd
import csv
import sys
#print(sys.executable)
#print(sys.version)
#print(sys.version_info)
url = "https://dataverse.harvard.edu/api/access/datafile/5856951"
r = requests.get(url)
print(type(r))
print('*************')
print('Response Code:', r.status_code)
print('*************')
print('Response Headers:\n', r.headers)
print('*************')
print('Response Content:\n',r.text)
print(r.encoding)
print(r.content)
with open('myfile.csv', mode='w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(r.text)
df = pd.read_csv('myfile.csv')
data = pd.DataFrame(df)
print("The content of the file is:\n", data)
print(data.head(10))
It seems that the request URL does not return a valid JSON response; instead it returns the whole Excel file that contains the dataset you want.
Rather than working with the response object directly, first save the response body to an Excel file, 'dataset.xlsx', and then open that file to get the results you want.
The following code saves the response to an Excel file. You can then use the xlrd Python library (https://www.geeksforgeeks.org/reading-excel-file-using-python/) to extract data from the file.
import requests
url = "https://dataverse.harvard.edu/api/access/datafile/5856951"
resp = requests.get(url)
# Write the raw bytes; the context manager closes the file afterwards
with open('dataset.xlsx', 'wb') as f:
    f.write(resp.content)
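
Before saving, it can help to confirm what the server actually sent. The sketch below assumes the download is a modern `.xlsx` workbook, which is a ZIP archive and therefore starts with the bytes `PK\x03\x04`. The helper name `looks_like_xlsx` is hypothetical. Note that xlrd 2.0 and later reads only legacy `.xls` files, so pandas relies on openpyxl for `.xlsx`.

```python
import io
import zipfile

def looks_like_xlsx(data):
    """Return True if the bytes look like a ZIP-based .xlsx workbook."""
    if not data.startswith(b"PK\x03\x04"):
        return False
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # Every .xlsx package carries this manifest at its root.
            return "[Content_Types].xml" in archive.namelist()
    except zipfile.BadZipFile:
        return False

# Offline demonstration: an in-memory archive stands in for a download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
print(looks_like_xlsx(buffer.getvalue()))      # True
print(looks_like_xlsx(b"col_a,col_b\n1,2\n"))  # False
```

With a real response, `looks_like_xlsx(resp.content)` could gate a call such as `pd.read_excel(io.BytesIO(resp.content))`, which needs openpyxl installed.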

ValueError: No tables found by using pd.read_html

I cannot download the table even though the HTML shows that there is one. Is there any way to fix this?
The code works when I switch to a link from another website, so I'm not sure what is wrong with this one.
import requests
import pandas as pd
from io import StringIO
import datetime
import os
url = "https://www.cmoney.tw/etf/e210.aspx?key=0050"
response = requests.get(url)
listed = pd.read_html(response.text)[0]
listed.columns = listed.iloc[0,:]
listed = listed[["標的代號","標的名稱"]]
listed = listed.iloc[1:]
listed
ValueError: No tables found

To download a CSV file from a particular URL

How do I download a CSV file from a URL in a Jupyter notebook? I know how to use the wget command on the Google Colab platform, but what should I do in a Jupyter notebook?
If you have a direct link to the file, you can fetch it and load it with pandas:
import pandas as pd
import io
import requests
url="https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
s=requests.get(url).content
c=pd.read_csv(io.StringIO(s.decode('utf-8')))
Code is from this question: Pandas read_csv from url
Have a look at that as well maybe :)
import pandas as pd
data = pd.read_csv("https://abc/xyz.csv")
For more, check the documentation:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
You can use:
import pandas as pd
def download_csv_to_df(urllink):
    # read_csv accepts a URL and returns the parsed DataFrame
    converted_dataframe = pd.read_csv(urllink)
    return converted_dataframe
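
The answers above load the CSV straight into a DataFrame. If the goal is a file on disk, as `wget` would produce, a standard-library sketch follows. The helper name `download_file` and the sample file are assumptions for illustration. A `file://` URL stands in for a remote link so the example runs offline.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path

def download_file(url, destination):
    """Stream the resource at `url` into the local file `destination`."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return Path(destination)

# Offline demonstration: a file:// URL stands in for a remote CSV link.
workdir = Path(tempfile.mkdtemp())
source = workdir / "source.csv"
source.write_text("name,code\nPeru,PE\n", encoding="utf-8")
saved = download_file(source.as_uri(), workdir / "copy.csv")
print(saved.read_text(encoding="utf-8"))
```

In a notebook the same function could be called with an `https://` link, and the saved file then opened with `pd.read_csv`.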

Understanding how to extract data from HTML File

I am trying to access the "Yield Curve Data" available on this page. The page has a radio button, and clicking "Submit" returns a zip file from which I want to extract the data. I want the data from the "Retrieve all data" option. My code is below; from the statement print result.read() I can see that result is actually an HTML document. My difficulty is in understanding how to extract the data from result, since I don't see any data in it. I am confused about where to go from here.
import urllib, urllib2
import csv
from StringIO import StringIO
import pandas as pd
import os
from zipfile import ZipFile
my_url = 'http://www.bankofcanada.ca/rates/interest-rates/bond-yield-curves/'
data = urllib.urlencode({'lastchange': 'all'})
request = urllib2.Request(my_url, data)
result = urllib2.urlopen(request)
Thank You
You're going to need to send a POST request to the following endpoint:
http://www.bankofcanada.ca/stats/results/csv
With the following form data:
lookupPage: lookup_yield_curve.php
startRange: 1986-01-01
searchRange: all
This should give you the file.
You may also need to spoof your User-Agent header.
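
A minimal Python 3 sketch of that POST request, using only the standard library. The endpoint and form fields are the ones listed above; the User-Agent string is a hypothetical browser-like value. Whether the endpoint still responds this way is not verified here. The request is built but not sent, so the example runs offline.

```python
import urllib.parse
import urllib.request

ENDPOINT = "http://www.bankofcanada.ca/stats/results/csv"

FORM_DATA = {
    "lookupPage": "lookup_yield_curve.php",
    "startRange": "1986-01-01",
    "searchRange": "all",
}

# Hypothetical browser-like User-Agent; any realistic value may do.
HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}

def build_request():
    """Build (but do not send) the POST request with the form data."""
    body = urllib.parse.urlencode(FORM_DATA).encode("ascii")
    # Supplying `data` makes urllib issue a POST instead of a GET.
    return urllib.request.Request(ENDPOINT, data=body, headers=HEADERS)

req = build_request()
print(req.get_method())
print(req.data.decode("ascii"))
```

Sending it would be `urllib.request.urlopen(req).read()`, which returns the raw response bytes.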

How to open .csv file from a url with Python?

I'm trying to open a csv file from a url but for some reason I get an error saying that there is an invalid mode or filename. I'm not sure what the issue is. Help?
url = "http://...."
data = open(url, "r")
read = csv.DictReader(data)
Download the stream, then process:
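import csv
import urllib.request
url = "http://httpbin.org/get"
response = urllib.request.urlopen(url)
# Decode the downloaded bytes, then feed the reader one line at a time
data = response.read().decode("utf-8")
read = csv.DictReader(data.splitlines())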
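
`csv.DictReader` expects an iterable of text lines, so a downloaded body has to be decoded and split before it is parsed; a single long string would be iterated character by character. A small offline sketch, with made-up bytes standing in for `response.read()`:

```python
import csv

# Hypothetical payload standing in for the bytes from response.read().
data = b"city,medals\nOslo,3\nTurin,5\n"

lines = data.decode("utf-8").splitlines()
records = list(csv.DictReader(lines))
print(records[0]["city"], records[1]["medals"])
```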
I recommend pandas for this:
import pandas as pd
read = pd.read_csv("http://....", ...)
Please see the documentation.
You can do the following:
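import csv
import io
import urllib.request
url = 'http://winterolympicsmedals.com/medals.csv'
response = urllib.request.urlopen(url)
# Wrap the byte stream so csv.reader receives text lines (Python 3)
cr = csv.reader(io.TextIOWrapper(response, encoding='utf-8', newline=''))
for row in cr:
    print(row)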
Slightly tongue in cheek:
>>> import json
>>> for line in file(','):
... print json.loads('['+line+']')
CSV is not a well-defined format. JSON is, so this will parse a certain type of CSV correctly every time.
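
The "certain type of CSV" is one whose fields are already valid JSON literals: double-quoted strings, numbers, `true`, `false`, `null`. A short sketch of the trick and its limit; the helper name `parse_json_style_line` and the sample lines are made up for illustration.

```python
import json

def parse_json_style_line(line):
    """Parse one comma-separated line whose fields are JSON literals."""
    return json.loads("[" + line + "]")

print(parse_json_style_line('"Oslo, Norway", 1952, true'))
# ['Oslo, Norway', 1952, True]

# An unquoted text field is not valid JSON, so ordinary CSV is rejected.
try:
    parse_json_style_line("Oslo, 1952")
except json.JSONDecodeError as error:
    print("not JSON-style:", error.msg)
```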
