How to open a .csv file from a URL with Python?

I'm trying to open a CSV file from a URL, but I get an error saying that there is an invalid mode or filename. I'm not sure what the issue is. Help?
import csv
url = "http://...."
data = open(url, "r")  # open() expects a local file path, not a URL
read = csv.DictReader(data)

Download the stream, then process:
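import csv
import urllib.request  # Python 3 name for urllib2
url = "http://httpbin.org/get"  # replace with the URL of your CSV file
response = urllib.request.urlopen(url)
data = response.read().decode("utf-8")  # bytes -> str
# DictReader needs an iterable of lines, not one big string
read = csv.DictReader(data.splitlines())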
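`csv.DictReader` iterates over its argument, and iterating over a plain string yields single characters rather than lines, so a downloaded body has to be split into lines or wrapped in a file-like object first. A minimal sketch, assuming the body is already decoded text with a header row; the helper name `rows_from_text` and the sample data are hypothetical.

```python
import csv
import io


def rows_from_text(text):
    """Parse CSV text (e.g. a downloaded response body) into a list of dicts.

    io.StringIO gives DictReader a file-like object; with newline="" the
    csv module can also handle quoted fields that contain newlines.
    """
    return list(csv.DictReader(io.StringIO(text, newline="")))


sample = "name,score\nalice,3\nbob,5\n"  # made-up sample data
for row in rows_from_text(sample):
    print(row["name"], row["score"])
```

Wrapping in `io.StringIO` instead of calling `splitlines()` keeps multi-line quoted fields intact, since the csv module sees the original line endings.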

I recommend pandas for this:
import pandas as pd
read = pd.read_csv("http://....")  # read_csv accepts a URL; add other arguments as needed
Please see the documentation.

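You can do the following:
import codecs
import csv
import urllib.request  # Python 3 name for urllib2
url = 'http://winterolympicsmedals.com/medals.csv'
response = urllib.request.urlopen(url)
# urlopen yields bytes; csv.reader needs str, so decode line by line
cr = csv.reader(codecs.iterdecode(response, 'utf-8'))
for row in cr:
    print(row)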
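In Python 3, `urllib.request.urlopen` returns a binary stream while `csv.reader` needs text, so the bytes must be decoded before parsing. One way, sketched here, is `io.TextIOWrapper`, which decodes any binary stream lazily. This assumes UTF-8 content; `io.BytesIO` stands in for the HTTP response so the sketch runs offline, and `read_csv_stream` is a hypothetical helper name.

```python
import csv
import io


def read_csv_stream(binary_stream, encoding="utf-8"):
    """Yield CSV rows from a binary stream without reading it all at once."""
    # newline="" is what the csv module documentation recommends for file objects
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    yield from csv.reader(text_stream)


# Stand-in for urllib.request.urlopen(url): any readable binary stream works
fake_response = io.BytesIO(b"id,name\n1,alpha\n2,beta\n")
for row in read_csv_stream(fake_response):
    print(row)
```

With a real download, the call would be `read_csv_stream(urllib.request.urlopen(url))`.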

Slightly tongue in cheek:
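>>> import json
>>> for line in open('data.csv'):  # 'data.csv' is a hypothetical file name
...     print(json.loads('[' + line + ']'))
CSV is not a well-defined format. JSON is, so this will parse a certain type of CSV correctly every time: lines whose fields are all valid JSON values, such as numbers and double-quoted strings.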
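To make the joke concrete: wrapping a line in brackets turns it into a JSON array, so the trick works only when every field is already valid JSON. A small sketch comparing it with the `csv` module; the function names and sample lines are made up for illustration.

```python
import csv
import json


def parse_line_as_json(line):
    """Parse one CSV line by treating it as the body of a JSON array."""
    return json.loads("[" + line + "]")


def parse_line_as_csv(line):
    """Parse one CSV line with the csv module (every field comes back as str)."""
    return next(csv.reader([line]))


print(parse_line_as_json('1,"two",3.5'))  # [1, 'two', 3.5]
print(parse_line_as_csv('1,"two",3.5'))   # ['1', 'two', '3.5']

try:
    parse_line_as_json("1,two,3.5")  # a bare word is not a valid JSON value
except json.JSONDecodeError as err:
    print("JSON trick failed:", err.msg)
```

The JSON version gives you typed values for free, but it rejects unquoted text fields, which are perfectly ordinary in CSV.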

Related

Getting question marks using Python requests with Excel file

I'm new to Python 3 and requests. I found a dataset on Harvard Dataverse, but I've been stuck for hours trying to extract it. Instead, I get question marks in my content and no readable data. I found similar issues, but I'm still unable to solve mine.
Can anyone help me, please?
It would be so much appreciated ;)
Many thanks!
import requests
import pandas as pd
import csv
import sys
#print(sys.executable)
#print(sys.version)
#print(sys.version_info)
url = "https://dataverse.harvard.edu/api/access/datafile/5856951"
r = requests.get(url)
print(type(r))
print('*************')
print('Response Code:', r.status_code)
print('*************')
print('Response Headers:\n', r.headers)
print('*************')
print('Response Content:\n',r.text)
print(r.encoding)
print(r.content)
with open('myfile.csv', mode='w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(r.text)
df = pd.read_csv('myfile.csv')
data = pd.DataFrame(df)
print("The content of the file is:\n", data)
print(data.head(10))
It seems that the request URL is not returning text (or a valid JSON response); instead, it returns the whole Excel file that contains the dataset you want.
Instead of accessing the response object directly, first save the response to an Excel file, 'dataset.xlsx', and then read that file to get the results you want.
The following code saves the response to an Excel file. Then you can use the xlrd Python library (https://www.geeksforgeeks.org/reading-excel-file-using-python/) to extract data from the file. Note that xlrd 2.0 and later read only legacy .xls files; for .xlsx, openpyxl (or pandas.read_excel, which uses it) is the usual choice.
import requests
url = "https://dataverse.harvard.edu/api/access/datafile/5856951"
resp = requests.get(url)
with open('dataset.xlsx', 'wb') as f:  # binary mode: the body is an Excel file
    f.write(resp.content)
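The question marks appear because `r.text` decodes a binary file as if it were text. Checking the first bytes of `resp.content` tells you what was actually downloaded before you choose how to parse it. A minimal sketch: the signatures are the standard ZIP (used by .xlsx), OLE2 (legacy .xls) and PDF headers; `sniff_format` is a hypothetical helper, and anything it does not recognize is reported as "unknown".

```python
def sniff_format(content):
    """Guess a file type from its leading bytes (magic numbers)."""
    if content.startswith(b"PK\x03\x04"):
        # ZIP container: .xlsx is a ZIP file, but so are .docx and plain .zip
        return "xlsx"
    if content.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "xls"  # OLE2 compound document used by legacy Excel files
    if content.startswith(b"%PDF"):
        return "pdf"
    return "unknown"


print(sniff_format(b"PK\x03\x04rest-of-zip"))  # xlsx
print(sniff_format(b"name,score\n"))           # unknown
```

If it reports "xlsx", `pandas.read_excel(io.BytesIO(resp.content))` can usually read the bytes directly (it needs the openpyxl package for .xlsx), so saving to disk first is optional.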

Get attached PDF file from HTTP request

I would like to download a file like this: https://www.bbs.unibo.it/conferma/?var=FormScaricaBrochure&brochureid=61305 with Python.
The problem is that it is not a direct link to the file; I only have the file ID in the query string.
I tried this code:
import requests
remote_url = "https://www.bbs.unibo.it/conferma/"
r = requests.get(remote_url, params = {"var":"FormScaricaBrochure", "brochureid": 61305})
But only the HTML is returned. How can I get the attached PDF?
You can use this example to download the file using only the brochureid:
import requests
url = "https://www.bbs.unibo.it/wp-content/themes/bbs/brochure-download.php?post_id={brochureid}&presentazione=true"
brochureid = 61305
with open("file.pdf", "wb") as f_out:
    f_out.write(requests.get(url.format(brochureid=brochureid)).content)
This downloads the PDF to file.pdf.
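The answer formats the query string by hand. An equivalent sketch builds it with `urllib.parse.urlencode`, which is also how `requests` encodes a `params=` dict, and checks that the body really is a PDF rather than an HTML page (the symptom in the question). The endpoint and parameter names come from the answer; the helper names are hypothetical.

```python
from urllib.parse import urlencode

DOWNLOAD_ENDPOINT = "https://www.bbs.unibo.it/wp-content/themes/bbs/brochure-download.php"


def brochure_url(brochureid):
    """Build the direct download URL for a brochure id."""
    query = urlencode({"post_id": brochureid, "presentazione": "true"})
    return DOWNLOAD_ENDPOINT + "?" + query


def looks_like_pdf(content):
    """A PDF normally starts with b'%PDF'; an HTML page does not."""
    return content.startswith(b"%PDF")


print(brochure_url(61305))
# https://www.bbs.unibo.it/wp-content/themes/bbs/brochure-download.php?post_id=61305&presentazione=true
```

With requests, `requests.get(DOWNLOAD_ENDPOINT, params={"post_id": 61305, "presentazione": "true"})` should request the same URL, and `looks_like_pdf(response.content)` can guard the write to file.pdf.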

Python: Read URLs from text file and save result error

I am using the following code to read the URLs in a text file and save the results in another text file:
import requests
with open('text.txt', 'r') as f: #text file containing the URLS
    for url in f:
        f = requests.get(url)
        print (url)
        print(f.text)
        file=open("output.txt", "a") #output file
For some reason, I get a {"error":"Permission denied"} message for each URL. I can paste the URL into the browser and get the correct response. I also tried the following code, and it worked fine for a single URL.
import requests
link = "http://vanhatpainetutkartat.maanmittauslaitos.fi/getFiles.php?path=W50%2F4%2F4524"
f = requests.get(link)
print(f.text, file=open("output11.txt", "a"))
The text file contains the following URLs:
http://vanhatpainetutkartat.maanmittauslaitos.fi/getFiles.php?path=22_Topografikartta_20k%2F3%2F3742%2F374207
http://vanhatpainetutkartat.maanmittauslaitos.fi/getFiles.php?path=W50%2F4%2F4524
http://vanhatpainetutkartat.maanmittauslaitos.fi/getFiles.php?path=W50%2F4%2F4432
http://vanhatpainetutkartat.maanmittauslaitos.fi/getFiles.php?path=21_Peruskartta_20k%2F3%2F3341%2F334112
I assume I am missing something very simple. Any clues?
Many thanks
Each line has a trailing newline. Simply strip it:
for url in f:
    url = url.rstrip('\n')
    ...
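A slightly more defensive version of the URL loop, as a sketch: `strip()` also removes the `\r` left by Windows line endings and any stray spaces, and blank lines are skipped. Collecting the URLs first also avoids reusing `f` for both the file and the response, as the question's code does. `read_urls` is a hypothetical helper; `io.StringIO` stands in for `text.txt`.

```python
import io


def read_urls(lines):
    """Return the non-empty, whitespace-stripped lines from an iterable."""
    urls = []
    for line in lines:
        url = line.strip()  # drops '\n', '\r\n' and surrounding spaces
        if url:
            urls.append(url)
    return urls


sample_file = io.StringIO("http://example.com/a\r\n\nhttp://example.com/b\n")
print(read_urls(sample_file))  # ['http://example.com/a', 'http://example.com/b']
```

In the real script, pass the open `text.txt` file object to `read_urls` and call `requests.get` on each returned URL.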
You have to use response.content (the raw bytes) from the response. You can use this code in a loop:
import requests
download_url = "http://vanhatpainetutkartat.maanmittauslaitos.fi/getFiles.php?path=W50%2F4%2F4524"
response = requests.get(download_url, stream=True)
with open("document.txt", 'wb') as file:
    file.write(response.content)
# the with block closes the file, so no explicit file.close() is needed
print("Completed")
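Inside a loop, writing every response to the same `document.txt` would overwrite it on each iteration. One option, sketched here under the assumption that the `path` query parameter is unique per URL, is to derive the file name from it; `filename_for` and the naming scheme are hypothetical.

```python
import re
from urllib.parse import parse_qs, urlparse


def filename_for(url, suffix=".txt"):
    """Turn the URL's 'path' query parameter into a safe local file name."""
    query = parse_qs(urlparse(url).query)
    path_value = query.get("path", ["download"])[0]  # parse_qs percent-decodes
    # Replace anything that is not a letter, digit, '-' or '_' with '_'
    safe = re.sub(r"[^A-Za-z0-9_-]", "_", path_value)
    return safe + suffix


url = "http://vanhatpainetutkartat.maanmittauslaitos.fi/getFiles.php?path=W50%2F4%2F4524"
print(filename_for(url))  # W50_4_4524.txt
```

In the loop, `open(filename_for(download_url), 'wb')` would then take the place of the fixed `"document.txt"`.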

Not able to download a CSV file from a website

Maybe someone knows what the problem is with downloading from the site below. I run this code in Jupyter, and nothing happens.
import requests
import os
url = 'http://www.football-data.co.uk/mmz4281/1920/E0.csv'
response = requests.get(url)
with open(os.path.join("folder", "file"), 'wb') as f:
    f.write(response.content)
I tried this code and it works fine on my side, assuming folder and file are defined correctly (open() does not create missing folders). Alternatively, you can use pandas, which can read a CSV file directly from a URL. The code would then become:
import pandas as pd
import csv
url = '{Some CSV target}'
df = pd.read_csv(url)
df.to_csv('{absolute path to CSV}', sep=',', index=False, quoting=csv.QUOTE_ALL)
It started working only after I declared a proxy.
I'm practicing on my work laptop, so maybe the local network is blocking my requests.
I hope this is helpful for someone.
Thanks to everyone for your help!
import requests
import os
os.environ['HTTP_PROXY'] = 'your proxy'  # placeholder, e.g. 'http://host:port'
url = 'http://www.football-data.co.uk/mmz4281/1920/E0.csv'
response = requests.get(url)
with open(os.path.join("C://DownloadLocation", "file.csv"), 'wb') as f:
    f.write(response.content)
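Setting `os.environ['HTTP_PROXY']` affects every request in the process and only covers `http://` URLs (`https://` URLs use `HTTPS_PROXY`). As a more scoped alternative, here is a standard-library sketch that swaps in `urllib.request` with an explicit `ProxyHandler`; the proxy address is a placeholder and `make_proxy_opener` is a hypothetical helper. With requests, the equivalent is the `proxies=` argument, e.g. `requests.get(url, proxies={"http": proxy, "https": proxy})`.

```python
import urllib.request


def make_proxy_opener(proxy_url):
    """Build an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Placeholder address: replace with your network's actual proxy
opener = make_proxy_opener("http://proxy.example.com:8080")
# Usage (needs network access through the proxy):
# with opener.open("http://www.football-data.co.uk/mmz4281/1920/E0.csv") as response:
#     with open("file.csv", "wb") as f:
#         f.write(response.read())
```

Only requests made through this `opener` use the proxy, so other code in the same notebook is unaffected.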

Getting a CSV file from a URL

I am trying to download a CSV file from a web portal. When doing it manually, we log in to the URL and click the Download CSV button, which then prompts us to save the file. We are using Python 3.
I am trying to do this with a Python script, but when we execute the script we get an HTML page named "Download CSV"; when we click on it, we get the CSV file.
import urllib.request
import requests
session = requests.session()
playload = {'j_username':'avinash.reddy', 'j_password':'password'}
r = session.post('https://url_of_the_portal/auth/login','data=playload')
r = session.get('URL_of_the_page_where_the_csv_file_exiests')
url='https://url_of_the_portal/review/download/bm_sis'
print ('done')
urllib.request.urlretrieve (url, "Download CSV")
I think it should look something like this, plus your login credentials.
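import codecs
import csv
import urllib.request  # Python 3 name for urllib2
url = 'http://winterolympicsmedals.com/medals.csv'
response = urllib.request.urlopen(url)
# urlopen yields bytes; csv.reader needs str, so decode line by line
cr = csv.reader(codecs.iterdecode(response, 'utf-8'))
for row in cr:
    print(row)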
Alternatively, with requests:
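import csv
import requests
url = 'http://winterolympicsmedals.com/medals.csv'
r = requests.get(url)
# iter_lines() yields bytes in Python 3; decode them for csv.reader
text = (line.decode('utf-8') for line in r.iter_lines())
reader = csv.reader(text, delimiter=',')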
Or, streaming the response:
import csv
from contextlib import closing
import requests
url = "http://download-and-process-csv-efficiently/python.csv"
with closing(requests.get(url, stream=True)) as r:
    # decode each streamed line from bytes to str for csv.reader
    lines = (line.decode('utf-8') for line in r.iter_lines())
    reader = csv.reader(lines, delimiter=',', quotechar='"')
    for row in reader:
        # Handle each row here...
        print(row)
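The likely root cause in the question is that `urllib.request.urlretrieve` does not share the cookies of the `requests` session that logged in, so the portal serves its HTML page instead of the file (the `'data=playload'` argument is also a literal string rather than the `data=playload` dict). With requests, the minimal fix is probably to fetch the download URL through the same `session` and write `r.content` to a file. The same idea using only the standard library is an opener with a shared cookie jar, sketched below; the URLs are the question's placeholders, the form field names are assumed to be what the portal expects, and `download.csv` is a hypothetical output name.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session_opener():
    """Build an opener whose cookie jar persists between requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login_and_download(opener, login_url, credentials, csv_url, out_path):
    """Log in with a form POST, then download csv_url with the same cookies."""
    form = urllib.parse.urlencode(credentials).encode("utf-8")
    opener.open(login_url, data=form).close()  # session cookie lands in the jar
    with opener.open(csv_url) as response, open(out_path, "wb") as out:
        out.write(response.read())


opener, jar = make_session_opener()
# login_and_download(
#     opener,
#     "https://url_of_the_portal/auth/login",
#     {"j_username": "...", "j_password": "..."},
#     "https://url_of_the_portal/review/download/bm_sis",
#     "download.csv",
# )
```

Portals that add CSRF tokens or JavaScript-driven logins will need more than this; the sketch only covers a plain form POST followed by a cookie-authenticated GET.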